[netcdf-java] Data precision while aggregating data
John Caron
caron at unidata.ucar.edu
Thu May 15 11:10:25 MDT 2008
Rich Signell wrote:
> John,
>
> Four replies to your four comments: ;-)
>
> On Wed, May 14, 2008 at 9:08 PM, John Caron <caron at unidata.ucar.edu> wrote:
>> I'm not quite sure where the inaccuracy comes in, likely converting between
>> Date and udunits representation. I'll have to see what I can do.
>>
>> A few comments:
>>
>> 1) double has 53 bits of accuracy giving slightly under 16 decimal digits of
>> accuracy. thus:
>>
>> public void testDoublePrecision() {
>> double dval = 47865.7916666665110000;
>> System.out.println(" dval= "+dval);
>> }
>>
>> prints:
>>
>> dval= 47865.79166666651
>>
>
> Okay, you lost the lowest bit, but you should still be fine. You
> still have 11 places after the decimal point. In Matlab, which uses
> double precision arithmetic, I don't get a problem converting to
> gregorian until we drop to 8 places after the decimal point:
>
> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511) =>
> 05-Dec-1989 19:00:00
> datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666651) =>
> 05-Dec-1989 19:00:00
> datestr(datenum([1858 11 17 0 0 0]) + 47865.7916666665) =>
> 05-Dec-1989 19:00:00
> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666) =>
> 05-Dec-1989 19:00:00
> datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666) =>
> 05-Dec-1989 18:59:59
yes, it does seem funny we are losing so much precision. It probably has to do with converting
internally to a Java date, which uses millisecs since 1970.
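
One way this could happen, as a sketch only: if the fractional day count is turned into
whole milliseconds since 1970 by truncation rather than rounding, a value that sits a few
microseconds before 19:00:00 lands on 18:59:59.999, which prints as 18:59:59 at one-second
resolution. The class and method names below are made up for illustration and are not the
CDM or udunits code; the only outside fact assumed is that 1970-01-01 is Modified Julian
Day 40587.

```java
import java.time.Instant;

public class MjdToMillis {
    // Modified Julian Day number of 1970-01-01T00:00:00Z (the Java epoch).
    static final double MJD_OF_JAVA_EPOCH = 40587.0;
    static final double MILLIS_PER_DAY = 86400000.0;

    // Truncates toward zero, as a plain (long) cast does.
    static long toEpochMillisTruncated(double mjd) {
        return (long) ((mjd - MJD_OF_JAVA_EPOCH) * MILLIS_PER_DAY);
    }

    // Rounds to the nearest millisecond instead.
    static long toEpochMillisRounded(double mjd) {
        return Math.round((mjd - MJD_OF_JAVA_EPOCH) * MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        double mjd = 47865.7916666665110000;
        System.out.println(Instant.ofEpochMilli(toEpochMillisTruncated(mjd)));
        System.out.println(Instant.ofEpochMilli(toEpochMillisRounded(mjd)));
    }
}
```

The truncated version prints 1989-12-05T18:59:59.999Z and the rounded one prints
1989-12-05T19:00:00Z, so a single truncation step would be enough to explain a one-second
difference in the formatted date.
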
>
>> 2) preserving lowest bits of accuracy is tricky, and requires care, which I
>> promise has not (yet) happened in the CDM units handling. in general,
>> relying on the lowest bits being preserved is dicey.
>
> That's okay -- we don't need to preserve that lowest bit.
how many bits do you need to preserve?
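
As a rough way to put a number on that: the spacing between adjacent doubles near
47865.79 days can be read off with Math.ulp and converted to seconds. This sketch only
measures the resolution of the double representation itself; it says nothing about what
the CDM units code preserves. The class and method names are invented for illustration.

```java
public class DayResolution {
    static final double SECONDS_PER_DAY = 86400.0;

    // Gap between a day count and the next representable double, in seconds.
    static double resolutionSeconds(double days) {
        return Math.ulp(days) * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        double days = 47865.7916666665110000;
        System.out.println("ulp in days    = " + Math.ulp(days));
        System.out.println("ulp in seconds = " + resolutionSeconds(days));
    }
}
```

For values of this size the gap is 2^-37 days, which is under a microsecond, so the
representation itself leaves headroom for millisecond timestamps.
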
>> 3) what is the definition of a "day". how accurate do you need that? All I
>> could find was this note in the units package:
>>
>> * Interval between 2 successive passages of sun through vernal
>> equinox
>> * (365.242198781 days -- see
>> * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
>> * http://aa.usno.navy.mil/AA/
>> * and http://adswww.colorado.edu/adswww/astro_coord.html):
>>
>> you may agree, but what if someone uses a different meaning for "day" ??
>
> Take a look at udunits.dat:
> http://www.unidata.ucar.edu/software/udunits/udunits-1/udunits.txt
>
> A "day" is precisely defined as 86400 seconds.
> A "sidereal day" is a different unit.
yes, the 86400 is clear. but how many days are there between date1 and date2? you have to deal with
leap years etc.
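
For the proleptic Gregorian (ISO) calendar, the day count can be left to a calendar
library rather than a units package. A minimal sketch using java.time, which assumes that
calendar and ignores leap seconds; the class and method names are hypothetical:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DaysBetween {
    // Whole calendar days from start (inclusive) to end (exclusive), ISO calendar.
    static long daysBetween(LocalDate start, LocalDate end) {
        return ChronoUnit.DAYS.between(start, end);
    }

    public static void main(String[] args) {
        LocalDate mjdEpoch = LocalDate.of(1858, 11, 17);
        LocalDate date = LocalDate.of(1989, 12, 5);
        System.out.println(daysBetween(mjdEpoch, date)); // 47865
    }
}
```

A calendar with no leap years, or with 360-day years, would give a different count for
the same pair of dates, so the calendar has to be stated along with the unit.
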
>
>> 4) IMHO, using udunits for calendar date is a mistake. it's a units package,
>> not a calendar package.
>
> Maybe, but I think to solve the current problem, we could just find
> out where the computations are dropping the double precision.
yes, that's the short-term solution
>
>> 5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um,
>> unreadable to humans.
>
> What is unreadable about that? Yes, it's a big number with a lot
> of precision, and an older date, but I think it's perfectly readable
> and unambiguous. And as I mentioned, it's an internationally
> recognized convention called "Modified Julian Date".
it's unreadable because you can't tell what actual date it represents without using software.
>
>> 6) I earlier proposed to CF that we allow ISO date strings: more readable,
>> not ambiguous, and without a precision problem. Various CF authorities
>> thought it wasn't needed because it was redundant with the udunits
>> representation.
>
> I think allowing ISO date strings in CF would be a good idea, and I
> also think allowing a two integer representation in CF would be a good
> idea (we use Julian day, and milliseconds since midnight as our two
> integer vectors). But that idea was also not too popular. Several
> people thought it would be a good idea, including Balaji, but there
> was concern about the need to modify all existing CF applications to
> handle these new time conventions. But if this was just handled in
> UDUNITS, I don't think this would be much of a problem, as I would think
> that most CF-compliant apps have used the UDUNITS library to do their
> math.
part of my point to CF is that, as things stand, one must use udunits (which has both C and Java
versions, as well as multiple releases. do they always agree?). It's a mistake to tie long-term
semantics as important as time to a single software package. better to document what it's supposed
to mean, so it can be independently implemented.
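
To illustrate the two alternatives discussed above, here is a sketch that decodes an ISO
date string and a two-integer (day number, milliseconds since midnight) pair using only
the standard library. It assumes the day number counts from 1858-11-17 UTC and that there
are no leap seconds; the class and method names are hypothetical and not part of any CF
or udunits API.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class ExactTimeEncodings {
    // Assumed origin for the integer day number (the Modified Julian Date epoch).
    static final LocalDate MJD_EPOCH = LocalDate.of(1858, 11, 17);

    // Decode (day number since 1858-11-17, milliseconds since midnight UTC).
    static Instant fromDayAndMillis(long mjdDay, long millisOfDay) {
        return MJD_EPOCH.plusDays(mjdDay)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .plusMillis(millisOfDay);
    }

    public static void main(String[] args) {
        Instant fromIso = Instant.parse("1989-12-05T19:00:00Z");
        Instant fromPair = fromDayAndMillis(47865, 19L * 3600 * 1000);
        System.out.println(fromIso + " " + fromPair + " " + fromIso.equals(fromPair));
    }
}
```

Both encodings use integer arithmetic only, so they decode to exactly the same instant
with no floating-point rounding involved.
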