Re: [netcdf-java] Data precision while aggregating data



Rich Signell wrote:
John,

Four replies to your four comments:   ;-)

On Wed, May 14, 2008 at 9:08 PM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
I'm not quite sure where the inaccuracy comes in; likely in converting between
Date and the udunits representation. I'll have to see what I can do.

A few comments:

1) A double has 53 bits of precision, giving slightly under 16 decimal digits
of precision. Thus:

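 public class DoublePrecisionTest {
   public static void main(String[] args) {
     // This literal has more significant digits than a double can hold
     double dval = 47865.7916666665110000;
     System.out.println(" dval= " + dval);
   }
 }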

prints:

 dval= 47865.79166666651
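How much resolution is actually left can be checked directly. The sketch below (class and method names are illustrative, not from the thread) uses the standard `Math.ulp` to measure the gap between adjacent doubles near 47865.79. That gap is 2^-37 day, roughly 0.63 microseconds, so a double day count of this size can in principle resolve time far more finely than a millisecond.

```java
public class DoubleResolution {
  // Spacing between adjacent doubles near the given value, converted
  // from days to seconds (1 day = 86400 s).
  static double ulpSeconds(double days) {
    return Math.ulp(days) * 86400.0;
  }

  public static void main(String[] args) {
    double days = 47865.7916666665110000;
    System.out.println("ulp in days:    " + Math.ulp(days));
    System.out.println("ulp in seconds: " + ulpSeconds(days));
  }
}
```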


Okay, you lost the lowest bit, but you should still be fine. You
still have 11 places after the decimal point. In Matlab, which uses
double-precision arithmetic, I don't get a problem converting to
Gregorian until we drop to 8 places after the decimal point:

datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511) => 05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666651)  => 05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.7916666665)   => 05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666)    => 05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666)     => 05-Dec-1989 18:59:59

Yes, it does seem odd that we are losing so much precision. It probably has to do with converting internally to a Java Date, which uses milliseconds since 1970.
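If the Java Date suspicion is right, one plausible way (an assumption, not a diagnosis of the CDM code) for 19:00:00 to become 18:59:59 is truncating rather than rounding when fractional days are converted to whole milliseconds. The hypothetical sketch below shows the effect: 47865.7916666665110000 days since 1858-11-17 is about 0.013 ms short of 19:00:00, so a `(long)` cast lands on 18:59:59.999, which displays as 18:59:59 at one-second resolution, while `Math.round` lands on 19:00:00.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class MjdToDate {
  // Modified Julian Date epoch: 1858-11-17T00:00:00Z
  static final Instant MJD_EPOCH =
      ZonedDateTime.of(1858, 11, 17, 0, 0, 0, 0, ZoneOffset.UTC).toInstant();
  // "day" taken as exactly 86400 s, as in udunits
  static final double MILLIS_PER_DAY = 86400.0 * 1000.0;

  // Truncating conversion: a value just below a whole millisecond
  // drops to the previous millisecond
  static Instant truncated(double days) {
    long millis = (long) (days * MILLIS_PER_DAY);
    return MJD_EPOCH.plusMillis(millis);
  }

  // Rounding conversion: snaps to the nearest millisecond
  static Instant rounded(double days) {
    long millis = Math.round(days * MILLIS_PER_DAY);
    return MJD_EPOCH.plusMillis(millis);
  }

  public static void main(String[] args) {
    double days = 47865.7916666665110000;
    System.out.println("truncated: " + truncated(days)); // 1989-12-05T18:59:59.999Z
    System.out.println("rounded:   " + rounded(days));   // 1989-12-05T19:00:00Z
  }
}
```

Rounding to the nearest representable unit, rather than truncating toward zero, is the usual choice whenever a floating-point time is stored in an integer tick count.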

2) Preserving the lowest bits of precision is tricky and requires care, which I
promise has not (yet) happened in the CDM units handling. In general,
relying on the lowest bits being preserved is dicey.

That's okay -- we don't need to preserve that lowest bit.

How many bits do you need to preserve?


3) What is the definition of a "day"? How accurate do you need that to be? All I
could find was this note in the units package:

        * Interval between 2 successive passages of sun through vernal equinox
        * (365.242198781 days -- see
        * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
        * http://aa.usno.navy.mil/AA/
        * and http://adswww.colorado.edu/adswww/astro_coord.html):

You may agree, but what if someone uses a different meaning for "day"?

Take a look at udunits.dat:
http://www.unidata.ucar.edu/software/udunits/udunits-1/udunits.txt

A "day" is precisely defined as 86400 seconds.
A "sidereal day" is a different unit.

Yes, the 86400 is clear. But how many days are there between date1 and date2? You have to deal with leap years, etc.
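Counting whole days between two calendar dates is a calendar question rather than a units question. As a sketch (class and method names are illustrative), Java's `java.time` applies the proleptic Gregorian leap-year rules and reports 47865 days from the MJD epoch to 1989-12-05, consistent with the Matlab output above. Note that `java.time` treats every day as exactly 86400 seconds, so leap seconds are not counted.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DaysBetween {
  // Whole days between two calendar dates; java.time applies the
  // (proleptic) Gregorian leap-year rules.
  static long daysBetween(LocalDate from, LocalDate to) {
    return ChronoUnit.DAYS.between(from, to);
  }

  public static void main(String[] args) {
    LocalDate mjdEpoch = LocalDate.of(1858, 11, 17);
    LocalDate date = LocalDate.of(1989, 12, 5);
    System.out.println(daysBetween(mjdEpoch, date)); // 47865
  }
}
```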


4) IMHO, using udunits for calendar dates is a mistake. It's a units package,
not a calendar package.

Maybe, but I think to solve the current problem, we could just find
out where the computations are dropping the double precision.

Yes, that's the short-term solution.



5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um,
unreadable to humans.

What is unreadable about that? Yes, it's a big number with a lot
of precision, and an older date, but I think it's perfectly readable
and unambiguous. And as I mentioned, it's an internationally
recognized convention called "Modified Julian Date".

It's unreadable because you can't tell what actual date it represents
without using software.


6) I earlier proposed to CF that we allow ISO date strings, which are more readable,
unambiguous, and don't have a precision problem. Various CF authorities
thought this wasn't needed because it was redundant with the udunits
representation.

I think allowing ISO date strings in CF would be a good idea, and I
also think allowing a two-integer representation in CF would be a good
idea (we use Julian day and milliseconds since midnight as our two
integer vectors). But that idea was also not too popular. Several
people thought it would be a good idea, including Balaji, but there
was concern about the need to modify all existing CF applications to
handle these new time conventions. But if this were just handled in
UDUNITS, I don't think this would be much of a problem, as I would think
that most CF-compliant apps have used the UDUNITS library to do their
math.
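The two-integer representation sidesteps floating point entirely. A hypothetical decoder might look like the sketch below. It assumes the day integer is a chronological Julian Day Number as defined by `java.time`'s `JulianFields.JULIAN_DAY` (where 2440588 is 1970-01-01), which may differ in detail from the convention Rich's group actually uses. Because only integer arithmetic is involved, 19:00:00 comes back exactly, and it matches what parsing the equivalent ISO 8601 string gives.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.JulianFields;

public class TwoIntegerTime {
  // Hypothetical decoder for a (day, milliseconds-since-midnight) pair.
  // Assumption: the day is a chronological Julian Day Number as defined
  // by java.time's JulianFields.JULIAN_DAY (2440588 = 1970-01-01).
  static LocalDateTime decode(long julianDay, long millisOfDay) {
    LocalDate date = LocalDate.of(1970, 1, 1).with(JulianFields.JULIAN_DAY, julianDay);
    LocalTime time = LocalTime.ofNanoOfDay(millisOfDay * 1_000_000L);
    return LocalDateTime.of(date, time);
  }

  public static void main(String[] args) {
    // 19:00:00 is 68,400,000 ms after midnight
    LocalDateTime fromInts = decode(2447866L, 68_400_000L);
    LocalDateTime fromIso = LocalDateTime.parse("1989-12-05T19:00:00");
    System.out.println(fromInts);                 // 1989-12-05T19:00
    System.out.println(fromInts.equals(fromIso)); // true
  }
}
```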

Part of my point to CF is that one must use udunits (which has both C and Java versions, as well as multiple releases; do they always agree?). It's a mistake to tie long-term semantics as important as time to a single software package. Better to document what it's supposed to mean, so it can be independently implemented.

