[netcdf-java] Data precision while aggregating data

John Caron caron at unidata.ucar.edu
Thu May 15 11:10:25 MDT 2008



Rich Signell wrote:
> John,
> 
> Four replies to your four comments:   ;-)
> 
> On Wed, May 14, 2008 at 9:08 PM, John Caron <caron at unidata.ucar.edu> wrote:
>> I'm not quite sure where the inaccuracy comes in, likely converting between
>> Date and udunits representation. I'll have to see what I can do.
>>
>> A few comments:
>>
>> 1) double has 53 bits of accuracy giving slightly under 16 decimal digits of
>> accuracy. thus:
>>
>>  public void testDoublePrecision() {
>>    double dval = 47865.7916666665110000;
>>    System.out.println(" dval= "+dval);
>>  }
>>
>> prints:
>>
>>  dval= 47865.79166666651
>>
> 
> Okay, you lost the lowest bit, but you should still be fine.   You
> still have 11 places after the decimal point.    In Matlab, which uses
> double precision arithmetic, I don't get a problem converting to
> gregorian until we drop to 8 places after the decimal point:
> 
> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511) =>
> 05-Dec-1989 19:00:00
> datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666651)   =>
> 05-Dec-1989 19:00:00
> datestr(datenum([1858 11 17 0 0 0]) + 47865.7916666665)    =>
> 05-Dec-1989 19:00:00
> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666)      =>
> 05-Dec-1989 19:00:00
> datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666)        =>
> 05-Dec-1989 18:59:59

yes, it does seem funny we are losing so much precision. It probably has to do with converting 
internally to a Java date, which uses millisecs since 1970.

> 
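
To illustrate how a conversion to milliseconds since 1970 can drop a whole second, here is a minimal sketch. It is not the CDM or udunits code; the class and method names are hypothetical, and it assumes the value is simply scaled to epoch milliseconds. Whether truncation is the actual cause in netcdf-java is not confirmed in this thread.

```java
import java.time.Instant;

class MjdToInstant {
    // Days from the MJD epoch (1858-11-17T00:00:00Z) to the Unix epoch (1970-01-01T00:00:00Z).
    static final double MJD_OF_UNIX_EPOCH = 40587.0;
    static final double MILLIS_PER_DAY = 86_400_000.0;

    // Round to the nearest millisecond.
    static long toEpochMillisRounded(double mjd) {
        return Math.round((mjd - MJD_OF_UNIX_EPOCH) * MILLIS_PER_DAY);
    }

    // Truncate toward zero, as a plain (long) cast would.
    static long toEpochMillisTruncated(double mjd) {
        return (long) ((mjd - MJD_OF_UNIX_EPOCH) * MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        double mjd = 47865.791666666511;
        // prints 1989-12-05T19:00:00Z
        System.out.println(Instant.ofEpochMilli(toEpochMillisRounded(mjd)));
        // prints 1989-12-05T18:59:59.999Z
        System.out.println(Instant.ofEpochMilli(toEpochMillisTruncated(mjd)));
    }
}
```

The input sits about 0.013 ms below 19:00:00, so truncating lands on 18:59:59.999, and a formatter that then drops the milliseconds shows 18:59:59. Rounding to the nearest millisecond gives 19:00:00.
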
>> 2) preserving the lowest bits of accuracy is tricky, and requires care, which I
>> promise has not (yet) happened in the CDM units handling. in general,
>> relying on the lowest bits being preserved is dicey.
> 
> That's okay -- we don't need to preserve that lowest bit.

how many bits do you need to preserve?


>> 3) what is the definition of a "day". how accurate do you need that? All I
>> could find was this note in the units package:
>>
>>         * Interval between 2 successive passages of sun through vernal
>> equinox
>>         * (365.242198781 days -- see
>>         * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
>>         * http://aa.usno.navy.mil/AA/
>>         * and http://adswww.colorado.edu/adswww/astro_coord.html):
>>
>> you may agree, but what if someone uses a different meaning for "day" ??
> 
> Take a look at udunits.dat:
> http://www.unidata.ucar.edu/software/udunits/udunits-1/udunits.txt
> 
> A "day" is precisely defined as 86400 seconds.
> A "sidereal day" is a different unit.

yes, the 86400 is clear. but how many days are there between date1 and date2? you have to deal with 
leap years etc.
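
As a sketch of the day-counting question, the snippet below counts whole days between two calendar dates with `java.time`, which applies the proleptic ISO (Gregorian) leap-year rules. The class and method names are hypothetical, and treating both dates as Gregorian is an assumption; a dataset that uses a different calendar would need different rules.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class DaysBetween {
    // Whole days from start (inclusive) to end (exclusive), ISO calendar rules.
    static long daysBetween(String startIso, String endIso) {
        LocalDate start = LocalDate.parse(startIso);
        LocalDate end = LocalDate.parse(endIso);
        return ChronoUnit.DAYS.between(start, end);
    }

    public static void main(String[] args) {
        // From the MJD epoch to the date in this thread: prints 47865
        System.out.println(daysBetween("1858-11-17", "1989-12-05"));
        // 1900 is not a leap year, 2000 is: prints 1 then 2
        System.out.println(daysBetween("1900-02-28", "1900-03-01"));
        System.out.println(daysBetween("2000-02-28", "2000-03-01"));
    }
}
```
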

> 
>> 4) IMHO, using udunits for calendar dates is a mistake. it's a units package,
>> not a calendar package.
> 
> Maybe, but I think to solve the current problem, we could just find
> out where the computations are dropping the double precision.

yes, that's the short-term solution


> 
>> 5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um,
>> unreadable to humans.
> 
> What is unreadable about that?   Yes, it's a big number with a lot
> of precision, and an older date, but I think it's perfectly readable
> and unambiguous.    And as I mentioned, it's an internationally
> recognized convention called "Modified Julian Date".

it's unreadable because you can't tell what actual date it represents without using software.

> 
>> 6) I earlier proposed to CF that we allow ISO date strings, which are more
>> readable, unambiguous, and don't have a precision problem. Various CF
>> authorities thought it wasn't needed because it was redundant with the udunits
>> representation.
> 
> I think allowing ISO date strings in CF would be a good idea, and I
> also think allowing a two integer representation in CF would be a good
> idea (we use Julian day, and milliseconds since midnight as our two
> integer vectors).   But that idea was also not too popular.   Several
> people thought it would be a good idea, including Balaji, but there
> was concern about the need to modify all existing CF applications to
> handle these new time conventions.     But if this was just handled in
> UDUNITS, I don't think this would be much of a problem, as I would think
> that most CF-compliant apps have used the UDUNITS library to do their
> math.
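
A two-integer time representation like the one described above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and it assumes the day count is a Modified Julian Day number (days since 1858-11-17 00:00 UTC) paired with milliseconds since midnight UTC, which may differ from the exact convention Rich's group uses.

```java
class TwoIntegerTime {
    static final long MILLIS_PER_DAY = 86_400_000L;
    // Modified Julian Day number of 1970-01-01.
    static final long MJD_OF_UNIX_EPOCH = 40587L;

    // Split epoch milliseconds into {day number, milliseconds since midnight}.
    // floorDiv/floorMod keep the result correct for times before 1970.
    static long[] split(long epochMillis) {
        long day = Math.floorDiv(epochMillis, MILLIS_PER_DAY) + MJD_OF_UNIX_EPOCH;
        long msOfDay = Math.floorMod(epochMillis, MILLIS_PER_DAY);
        return new long[] {day, msOfDay};
    }

    // Recombine the two integers into epoch milliseconds.
    static long join(long day, long msOfDay) {
        return (day - MJD_OF_UNIX_EPOCH) * MILLIS_PER_DAY + msOfDay;
    }

    public static void main(String[] args) {
        long[] parts = split(628887600000L); // 1989-12-05T19:00:00Z
        // prints 47865 68400000
        System.out.println(parts[0] + " " + parts[1]);
    }
}
```

Because both parts are integers, the round trip through `split` and `join` is exact to the millisecond, with no floating-point rounding involved.
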

part of my point to CF is that one must use udunits (which has both C and Java versions, as well as 
multiple releases; do they always agree?). It's a mistake to tie long-term semantics as important as 
time to a single software package. better to document what it's supposed to mean, so it can be 
independently implemented.
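
For comparison, an ISO date string of the kind proposed in point 6 round-trips without any floating-point step. The sketch below uses only the standard `java.time.Instant` parser; the class and method names are hypothetical.

```java
import java.time.Instant;

class IsoRoundTrip {
    // Parse an ISO-8601 UTC timestamp into milliseconds since 1970.
    static long toEpochMillis(String iso) {
        return Instant.parse(iso).toEpochMilli();
    }

    // Parse and re-format; the string comes back unchanged for a canonical UTC input.
    static String roundTrip(String iso) {
        return Instant.ofEpochMilli(toEpochMillis(iso)).toString();
    }

    public static void main(String[] args) {
        // prints 628887600000
        System.out.println(toEpochMillis("1989-12-05T19:00:00Z"));
        // prints 1989-12-05T19:00:00Z
        System.out.println(roundTrip("1989-12-05T19:00:00Z"));
    }
}
```
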

