Re: [netcdf-java] Data precision while aggregating data

Rich and all,

Looks like the problem may not be in the parsing as double and losing that one
decimal place. It may be somewhere else in udunits.

Just wrote a small test method in Java, and it looks like I have to drop at least
8 decimal places to make a difference of one second.

>>>
import java.util.*;

public class PreciseDateTest {

    public static String getPreciseDate(double startValue) {

        final double daysToMS = 24 * 60 * 60 * 1000;

        Calendar cal = Calendar.getInstance(new SimpleTimeZone(0, "GMT"));
        cal.set(1858, 10, 17, 0, 0, 0);   // month 10 = November, i.e. 11/17/1858
        // Caveat: cal.set() does not clear the MILLISECOND field, so the base
        // time keeps 0-999 ms of wall-clock leftovers, and the (long) cast
        // below truncates; both can shift the printed second between runs.

        long calTimeMS = cal.getTimeInMillis();
        calTimeMS = calTimeMS + (long) (startValue * daysToMS);

        Date newDate = new Date(calTimeMS);
        return newDate.toGMTString();   // deprecated, but fine for a quick test
    }
}
>>>

test results:
val1: 47865.7916666665110000
Date: 5 Dec 1989 19:00:00 GMT
val2: 47865.79166666651
Date: 5 Dec 1989 19:00:00 GMT
val3: 47865.7916666665
Date: 5 Dec 1989 19:00:00 GMT
val4: 47865.791666666
Date: 5 Dec 1989 19:00:00 GMT
val5: 47865.79166666
Date: 5 Dec 1989 19:00:00 GMT
val6: 47865.7916666
Date: 5 Dec 1989 19:00:00 GMT
val7: 47865.791666
Date: 5 Dec 1989 19:00:00 GMT
val8: 47865.79166
Date: 5 Dec 1989 18:59:59 GMT
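
Incidentally, the one-second drop looks like a truncation artifact: the
fractional day lands a fraction of a millisecond below the second boundary, and
a (long) cast chops it down. Rounding to the nearest millisecond avoids that.
A minimal sketch (class and method names are mine, not CDM code), using the
fact that the MJD epoch 1858-11-17 falls 40587 days before the Unix epoch:

```java
import java.util.Date;

public class MjdToDate {
    // 1858-11-17 00:00 UTC (the MJD epoch) is 40587 days before 1970-01-01 UTC.
    static final long MJD_AT_UNIX_EPOCH = 40587L;
    static final long MS_PER_DAY = 86400000L;

    public static Date mjdToDate(double mjd) {
        double epochMs = (mjd - MJD_AT_UNIX_EPOCH) * MS_PER_DAY;
        // Math.round() goes to the nearest millisecond; a plain (long) cast
        // truncates toward zero and can land one millisecond (and hence one
        // printed second) early when the value sits just under a boundary.
        return new Date(Math.round(epochMs));
    }
}
```

With rounding, 47865.79166666651 comes out as 19:00:00 GMT; with a truncating
cast the same value prints 18:59:59 GMT.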

-Sachin

Rich Signell wrote:
John,

Four replies to your four comments:   ;-)

On Wed, May 14, 2008 at 9:08 PM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
I'm not quite sure where the inaccuracy comes in, likely converting between
Date and udunits representation. I'll have to see what I can do.

A few comments:

1) double has 53 bits of precision, giving slightly under 16 decimal digits of
accuracy. Thus:

 public void testDoublePrecision() {
   double dval = 47865.7916666665110000;
   System.out.println(" dval= "+dval);
 }

prints:

 dval= 47865.79166666651


Okay, you lost the lowest bit, but you should still be fine.   You
still have 11 places after the decimal point.    In Matlab, which uses
double precision arithmetic, I don't get a problem converting to
Gregorian until we drop to 8 places after the decimal point:

datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511) =>
05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666651)   =>
05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.7916666665)    =>
05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666)      =>
05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666)        =>
05-Dec-1989 18:59:59

2) preserving the lowest bits of accuracy is tricky and requires care, which I
promise has not (yet) happened in the CDM units handling. In general, relying
on the lowest bits being preserved is dicey.

That's okay -- we don't need to preserve that lowest bit.
3) what is the definition of a "day"? How accurate do you need it to be? All I
could find was this note in the units package:

        * Interval between 2 successive passages of sun through vernal equinox
        * (365.242198781 days -- see
        * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
        * http://aa.usno.navy.mil/AA/
        * and http://adswww.colorado.edu/adswww/astro_coord.html):

you may agree, but what if someone uses a different meaning for "day"?

Take a look at udunits.dat:
http://www.unidata.ucar.edu/software/udunits/udunits-1/udunits.txt

A "day" is precisely defined as 86400 seconds.
A "sidereal day" is a different unit.

4) IMHO, using udunits for calendar dates is a mistake. It's a units package,
not a calendar package.

Maybe, but I think to solve the current problem, we could just find
out where the computations are dropping the double precision.

5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um,
unreadable to humans.

What is unreadable about that?   Yes, it's a big number with a lot
of precision, and an older date, but I think it's perfectly readable
and unambiguous.    And as I mentioned, it's an internationally
recognized convention called "Modified Julian Date".

6) I earlier proposed to CF that we allow ISO date strings: more readable,
not ambiguous, and no precision problem. Various CF authorities thought it
wasn't needed because it was redundant with the udunits representation.

I think allowing ISO date strings in CF would be a good idea, and I
also think allowing a two-integer representation in CF would be a good
idea (we use Julian day, and milliseconds since midnight, as our two
integer vectors).   But that idea was also not too popular.   Several
people thought it would be a good idea, including Balaji, but there
was concern about the need to modify all existing CF applications to
handle these new time conventions.     But if this were just handled in
UDUNITS, I don't think it would be much of a problem, as I would think
that most CF-compliant apps use the UDUNITS library to do their
math.

-Rich

Rich Signell wrote:
Jon,

The precision of the time vector with "units since XXXX" must
definitely be considered carefully, but we did think about this.

We want to store all our oceanographic time series data with the same
time convention to facilitate aggregation and minimize mods to
existing software.

Choosing time as double precision with units of "days since 1858-11-17
00:00"  should give us a precision of:
 - Better than 3.0e-5 milliseconds until August 31, 2132 and
 - Better than 3.0e-4 milliseconds until October 12, 4596!

(This is actually the definition of "Modified Julian Day", which is
one of the few internationally recognized time conventions that starts
at midnight. See http://tycho.usno.navy.mil/mjd.html for more info.
It also has the advantage of being a date by which nearly all the
world had finally switched to a Gregorian calendar, and early enough
so that most of the data we want to represent will have positive time
values.)
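
One way to sanity-check the available resolution (my own back-of-envelope, not
from udunits) is to ask how far apart adjacent doubles are near a present-day
MJD value, using Math.ulp, and express that gap in milliseconds:

```java
public class MjdUlp {
    public static void main(String[] args) {
        double mjd = 47865.79166666651;        // 5 Dec 1989 19:00 UTC
        double ulpDays = Math.ulp(mjd);        // gap to the next double, in days
        double ulpMs = ulpDays * 86400000.0;   // same gap in milliseconds
        System.out.println("resolution near MJD 47865: " + ulpMs + " ms");
        // about 6.3e-4 ms near this date
    }
}
```

The exact figure depends on how the intermediate arithmetic is done, so treat
this as an order-of-magnitude check: the representation itself is comfortably
sub-millisecond, which is why a whole dropped second points at the conversion
code rather than the storage format.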

The bug Sachin reported is a big deal for us, since we want to serve
our hundreds of oceanographic time series files as CF-compliant, via
NcML with the THREDDS Data Server, without changing any of the
original files.    The original files are NetCDF, but with a
non-standard convention for time:  an integer array with Julian day,
and a second integer array with milliseconds since midnight.    This
allows integer math with time to give results with no round-off
problems.
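
To illustrate why the two-integer form is exact, here is a sketch of combining
the pair with pure long arithmetic (names are mine, and I am assuming the
integer day is an MJD-style day number counted from 1858-11-17, which may
differ from the actual file convention):

```java
public class TwoIntTime {
    static final long MJD_AT_UNIX_EPOCH = 40587L; // MJD of 1970-01-01 00:00 UTC
    static final long MS_PER_DAY = 86400000L;

    // Pure long arithmetic: no floating point and no rounding anywhere,
    // so the millisecond result is exact for any representable date.
    public static long toEpochMs(int day, int msSinceMidnight) {
        return (day - MJD_AT_UNIX_EPOCH) * MS_PER_DAY + msSinceMidnight;
    }
}
```

For example, day 47865 with 68400000 ms (19 hours) after midnight combines to
the millisecond instant for 5 Dec 1989 19:00:00 GMT with no precision loss.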

We have a Matlab script (which uses double precision math) to take
our two-integer time format and create NcML for a CF-compliant
time array using a start and increment.   That script produces NcML like
this:

<variable name="time" shape="time" type="double">
 <attribute name="units" value="days since 1858-11-17 00:00:00 UTC"/>
 <attribute name="long_name" value="Modified Julian Day"/>
 <values start="47865.7916666665110000" increment="0.0416666666666667"/>
</variable>
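
As Sachin observed, the loss happens at parse time: the nearest double to that
start value differs from it around the 12th decimal place, while BigDecimal
keeps every digit. A quick check (values taken from the messages in this
thread):

```java
import java.math.BigDecimal;

public class ParsePrecision {
    public static void main(String[] args) {
        String s = "47865.7916666665110000";
        double d = Double.parseDouble(s);      // rounds to the nearest double
        BigDecimal bd = new BigDecimal(s);     // keeps all digits and the scale
        System.out.println("double:     " + d);                  // 47865.79166666651
        System.out.println("BigDecimal: " + bd.toPlainString()); // 47865.7916666665110000
    }
}
```

So both the NcML reader (parsing as double) and our Matlab script are limited
to roughly 11 places after the decimal point at this magnitude, which, per the
tests above, should still be plenty.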

As Sachin mentioned, the start time for this file is "05-Dec-1989
19:00:00", and as proof that we have sufficient precision, when we
simply load the time vector in NetCDF-Java and do the double precision
math in Matlab, we get the right start time:

datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511)

ans =  05-Dec-1989 19:00:00

but when we use the NetCDF-Java time routines to convert to Gregorian, we
get

05-Dec-1989 18:59:59 GMT

Clearly our users will not accept this.   I hope this can get resolved
soon!!!!

-Rich

On Tue, May 13, 2008 at 2:52 AM, Jon Blower <jdb@xxxxxxxxxxxxxxxxxxxx>
wrote:
Hi,

 I have seen similar issues (time values being out by a second or two).
 I was wondering whether it's something to do with udunits and
 calculating dates on the basis of "units since XXXXXX".  I seem to
 remember an earlier conversation on this list (or maybe on the CF
 list) concerning how udunits defines the length of certain time-spans
 (e.g. a month) and wondered whether this might be the issue?  Jonathan
 Gregory recommended against using "months since" and "years since" and
 sticking to seconds or days to avoid ambiguities in the length of a
 month/year.  But maybe this is a red herring.

 Whatever the issue is I'd be very keen to understand it as it's
 affecting me too!

 Cheers, Jon


 On Mon, May 12, 2008 at 9:31 PM, Sachin Kumar Bhate
 <skbhate@xxxxxxxxxxxxxxx> wrote:


John,

The NcML file shown below attempts to aggregate time series files,
overriding the time values for each 'time' variable.

The aggregation works great and I can access the time values as well,
but I see that there is a loss of precision in the new time values when
I access values for a coordinate data variable.

For example:

<<<<
   URI = 'http://www.gri.msstate.edu/rsearch_data/nopp/test_agg_precision.ncml';
   String var = "T_20";

   GridDataset gid = GridDataset.open(URI);
   GeoGrid Grid = gid.findGridByName(var);
   GridCoordSys GridCoordS = (GridCoordSys) Grid.getCoordinateSystem();

   java.util.Date d[] = GridCoordS.getTimeDates();

   System.out.println("DateString: " + d[0].toGMTString());
>>>>

The output from the above code for the 1st time value in the java Date
array:

DateString: 5 Dec 1989 18:59:59 GMT

But the correct value should be:

DateString: 5 Dec 1989 19:00:00 GMT

Just out of curiosity, I tried to print the 1st time value being read
from the NcML by 'ucar.nc2.ncml.NcmlReader.readValues()'. I get:

Start = 47865.79166666651;   (parsed as double)

but the 1st start value specified in the NcML is '47865.7916666665110000'.

I don't care about the trailing '0's, but the digit '1' in the 12th
decimal place is being dropped and may be causing this problem.

Parsing it as a 'BigDecimal' does read in the correct value, though:

Start-BigDecimal: 47865.7916666665110000

I am just guessing here; I am not sure if this is what is causing the
precision problem.

Will appreciate your help.

thanks..

Sachin

--
Sachin Kumar Bhate, Research Associate
MSU-High Performance Computing Collaboratory, NGI
John C. Stennis Space Center, MS 39529
http://www.northerngulfinstitute.org/

_______________________________________________
netcdf-java mailing list
netcdf-java@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/



 --
 --------------------------------------------------------------
 Dr Jon Blower Tel: +44 118 378 5213 (direct line)
 Technical Director Tel: +44 118 378 8741 (ESSC)
 Reading e-Science Centre Fax: +44 118 378 6413
 ESSC Email: jdb@xxxxxxxxxxxxxxxxxxxx
 University of Reading
 3 Earley Gate
 Reading RG6 6AL, UK
 --------------------------------------------------------------



--
Sachin Kumar Bhate, Research Associate
MSU-High Performance Computing Collaboratory, NGI
John C. Stennis Space Center, MS 39529
http://www.northerngulfinstitute.org/


