[netcdf-java] Data precision while aggregating data
Sachin Kumar Bhate
skbhate at ngi.msstate.edu
Thu May 15 16:45:20 MDT 2008
Rich,
OK, so let's see whether the time is being rounded to the nearest
millisecond or simply truncated in netcdf-java.
I ran the tests below to see how the time is calculated in netcdf-java.
To keep the case simple, I used Java's standard base time as the 'unit'
string.
Test-1, which uses 'ucar.nc2.units.DateUnit', resulted in '0'
milliseconds, while Test-2, which uses Java's Calendar class, resulted
in '734' milliseconds.
So I think the time is being truncated instead of rounded to the nearest
millisecond in netcdf-java. Internally it must still be using an
instance of Java's Calendar class, but it may be truncating the final
result.
This may explain the one-second difference we are getting when reading
that start value (i.e. 47865.791666666511).
I may be wrong; John may have a better theory. (Test cases below.)
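To make the truncation-versus-rounding question concrete, here is a
minimal sketch in plain Java. It does not use netcdf-java and says
nothing about what DateUnit actually does internally; the class name
TruncateVsRound and its helper methods are made up for illustration.
It converts the start value from "days since 1858-11-17 00:00:00 UTC"
to milliseconds since the Java epoch, once by truncating and once by
rounding.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration only; not netcdf-java code.
public class TruncateVsRound {
    static final double MS_PER_DAY = 86400000.0; // 1 day = 86400 s
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Milliseconds since the Java epoch for 1858-11-17 00:00:00 UTC.
    static long originMillis() {
        GregorianCalendar cal = new GregorianCalendar(UTC, Locale.US);
        cal.clear();                     // start from zeroed fields
        cal.set(1858, 10, 17, 0, 0, 0);  // month is 0-based: 10 == November
        return cal.getTimeInMillis();
    }

    // Convert decimal days since the origin to epoch milliseconds.
    static long toMillis(double days, boolean round) {
        double offsetMs = days * MS_PER_DAY;
        // A plain (long) cast truncates; Math.round goes to the nearest ms.
        long offset = round ? Math.round(offsetMs) : (long) offsetMs;
        return originMillis() + offset;
    }

    // Format epoch milliseconds in UTC, to whole seconds.
    static String format(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        fmt.setTimeZone(UTC);
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        double days = 47865.791666666511;
        System.out.println("truncated: " + format(toMillis(days, false)));
        System.out.println("rounded:   " + format(toMillis(days, true)));
    }
}
```

With the start value from this thread, the truncated conversion lands
one millisecond short of the hour and prints 18:59:59, while the rounded
one prints 19:00:00, which is the one-second difference under discussion.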
TEST-CASES:
unitString = "days since 1970-01-01 00:00:00 UTC";
This is also the standard base time (a.k.a. 'epoch') used in Java.
1. Test-case-1: Using 'ucar.nc2.units.DateUnit' class
>>>
import ucar.nc2.units.DateUnit;
DateUnit du = new DateUnit(unitString);
long originMS = du.getDateOrigin().getTime();
System.out.println("Test-case-1: "+ originMS+ " milliseconds");
<<<<
Result: Test-case-1: 0 milliseconds
2. Test-case-2: Using 'java.util.Calendar' class
>>>
import java.util.*;
Calendar cal = Calendar.getInstance(Locale.US);
cal.setTimeZone(new SimpleTimeZone(0,"UTC"));
cal.set(1970, 0, 1,0, 0, 0); // 01/01/1970
System.out.println("Test-case-2: " + cal.getTimeInMillis() + " milliseconds");
<<<<
Result: Test-case-2: 734 milliseconds
(Note: this result can be anything between 0 and 999 milliseconds,
because the Calendar instance keeps the millisecond part of the time
at which it was created.)
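A side note on where the stray milliseconds in Test-case-2 come from.
As far as I can tell from the java.util.Calendar Javadoc,
set(year, month, date, hourOfDay, minute, second) leaves the other
fields, MILLISECOND included, at their previous values unless clear()
is called first. Below is a minimal sketch of that behaviour using only
the JDK (nothing from netcdf-java); the class and method names are made
up for illustration.

```java
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration only; plain JDK, no netcdf-java.
public class CalendarMillis {
    // Epoch built as in Test-case-2, but with every field cleared first.
    static long clearedEpochMillis() {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.US);
        cal.clear();                   // resets all fields, including MILLISECOND
        cal.set(1970, 0, 1, 0, 0, 0);  // 1970-01-01 00:00:00 UTC
        return cal.getTimeInMillis();
    }

    // Same as Test-case-2: MILLISECOND keeps the value it had when the
    // Calendar was created from the current time.
    static long unclearedEpochMillis() {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.US);
        cal.set(1970, 0, 1, 0, 0, 0);
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        System.out.println("cleared:   " + clearedEpochMillis() + " milliseconds");
        System.out.println("uncleared: " + unclearedEpochMillis() + " milliseconds");
    }
}
```

The cleared variant should print 0 milliseconds; the uncleared one
prints some value between 0 and 999 that changes from run to run.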
-Sachin.
Rich Signell wrote:
> John,
>
> Your comment about java representing time in milliseconds since 1970
> gave me an idea:
> perhaps the problem is simply a difference in the way that rounding is
> done by the routine that calculates (year, mon, day, hour, min, sec)
> from decimal days.
>
> In the Matlab routine, time is rounded by default to the nearest second.
>
> In the Java routine, is time rounded to the nearest millisecond, or
> perhaps not even rounded, but simply truncated?
>
> As a test, I tried adding 0.5 milliseconds to my time value:
> 47865.791666666511 + 0.5/24/3600/1000 = 47865.79166667230
>
> and sure enough, I get the result I was looking for:
>
> 05-Dec-1989 19:00:00
>
> -Rich
>
> On Thu, May 15, 2008 at 1:10 PM, John Caron <caron at unidata.ucar.edu> wrote:
>
>> Rich Signell wrote:
>>
>>> John,
>>>
>>> Four replies to your four comments: ;-)
>>>
>>> On Wed, May 14, 2008 at 9:08 PM, John Caron <caron at unidata.ucar.edu>
>>> wrote:
>>>
>>>> I'm not quite sure where the inaccuracy comes in, likely converting
>>>> between Date and udunits representation. I'll have to see what I can do.
>>>>
>>>> A few comments:
>>>>
>>>> 1) double has 53 bits of accuracy giving slightly under 16 decimal
>>>> digits of accuracy. thus:
>>>>
>>>> public void testDoublePrecision() {
>>>> double dval = 47865.7916666665110000;
>>>> System.out.println(" dval= "+dval);
>>>> }
>>>>
>>>> prints:
>>>>
>>>> dval= 47865.79166666651
>>>>
>>>>
>>> Okay, you lost the lowest bit, but you should still be fine. You
>>> still have 11 places after the decimal point. In Matlab, which uses
>>> double precision arithmetic, I don't get a problem converting to
>>> gregorian until we drop to 8 places after the decimal point:
>>>
>>> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511) =>
>>> 05-Dec-1989 19:00:00
>>> datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666651) =>
>>> 05-Dec-1989 19:00:00
>>> datestr(datenum([1858 11 17 0 0 0]) + 47865.7916666665) =>
>>> 05-Dec-1989 19:00:00
>>> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666) =>
>>> 05-Dec-1989 19:00:00
>>> datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666) =>
>>> 05-Dec-1989 18:59:59
>>>
>> yes, it does seem funny we are losing so much precision. It probably has to
>> do with converting internally to a Java date, which uses millisecs since
>> 1970.
>>
>>
>>>
>>>
>>>> 2) preserving lowest bits of accuracy is tricky, and requires care,
>>>> which I promise has not (yet) happened in the CDM units handling. in
>>>> general, relying on the lowest bits being preserved is dicey.
>>>>
>>> That's okay -- we don't need to preserve that lowest bit.
>>>
>> how many bits do you need to preserve?
>>
>>
>>
>>>> 3) what is the definition of a "day". how accurate do you need that?
>>>> All I could find was this note in the units package:
>>>>
>>>> * Interval between 2 successive passages of sun through vernal equinox
>>>> * (365.242198781 days -- see
>>>> * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
>>>> * http://aa.usno.navy.mil/AA/
>>>> * and http://adswww.colorado.edu/adswww/astro_coord.html):
>>>>
>>>> you may agree, but what if someone uses a different meaning for "day" ??
>>>>
>>> Take a look at udunits.dat:
>>> http://www.unidata.ucar.edu/software/udunits/udunits-1/udunits.txt
>>>
>>> A "day" is precisely defined as 86400 seconds.
>>> A "sidereal day" is a different unit.
>>>
>> yes, the 86400 is clear. but how many days are there between date1 and
>> date2? you have to deal with leap years etc
>>
>>
>>>> 4) IMHO, using udunits for calendar date is a mistake. it's a units
>>>> package, not a calendar package.
>>>>
>>> Maybe, but I think to solve the current problem, we could just find
>>> out where the computations are dropping the double precision.
>>>
>> yes, that's the short term solution
>>
>>
>>
>>>> 5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um,
>>>> unreadable to humans.
>>>>
>>> What is unreadable about that? Yes, it's a big number with a lot
>>> of precision, and an older date, but I think it's perfectly readable
>>> and unambiguous. And as I mentioned, it's an internationally
>>> recognized convention called "Modified Julian Date".
>>>
>> it's unreadable because you can't tell what actual date it represents,
>> without using software.
>>
>>
>>>> 6) I earlier proposed to CF that we allow ISO date strings, more
>>>> readable, not ambiguous, and doesn't have a precision problem. Various
>>>> CF authorities thought it wasn't needed because it was redundant with
>>>> the udunits representation.
>>>>
>>> I think allowing ISO date strings in CF would be a good idea, and I
>>> also think allowing a two integer representation in CF would be a good
>>> idea (we use Julian day, and milliseconds since midnight as our two
>>> integer vectors). But that idea was also not too popular. Several
>>> people thought it would be a good idea, including Balaji, but there
>>> was concern about the need to modify all existing CF applications to
>>> handle these new time conventions. But if this was just handled in
>>> UDUNITS, I don't think this would be much problem, as I would think
>>> that most CF-compliant apps have used the UDUNITS library to do their
>>> math.
>>>
>> part of my point to CF is that one must use udunits (which has both C and
>> Java versions, as well as multiple releases. do they always agree?). It's a
>> mistake to tie long-term semantics as important as time to a single software
>> package. better to document what it's supposed to mean, so it can be
>> independently implemented.
>>
>>
>
>
>
>
--
Sachin Kumar Bhate, Research Associate
MSU-High Performance Computing Collaboratory, NGI
John C. Stennis Space Center, MS 39529
http://www.northerngulfinstitute.org/