[netcdf-java] Data precision while aggregating data

John Caron caron at unidata.ucar.edu
Wed May 14 19:08:21 MDT 2008


I'm not quite sure where the inaccuracy comes in; it is likely in the conversion between Date and the udunits 
representation. I'll have to see what I can do.

A few comments:

1) A double has a 53-bit significand, giving slightly under 16 decimal digits of precision. Thus:

   public void testDoublePrecision() {
     double dval = 47865.7916666665110000;
     System.out.println(" dval= "+dval);
   }

prints:

   dval= 47865.79166666651

2) Preserving the lowest bits of precision is tricky and requires care, which I promise has not (yet) 
been taken in the CDM units handling. In general, relying on the lowest bits being preserved is dicey.

3) What is the definition of a "day", and how accurately do you need it? All I could find was this note 
in the units package:

	 * Interval between 2 successive passages of sun through vernal equinox
	 * (365.242198781 days -- see
	 * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
	 * http://aa.usno.navy.mil/AA/
	 * and http://adswww.colorado.edu/adswww/astro_coord.html):

You may agree, but what if someone uses a different meaning for "day"?

4) IMHO, using udunits for calendar dates is a mistake. It's a units package, not a calendar package.

5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um, unreadable to humans.

6) I earlier proposed to CF that we allow ISO date strings, which are more readable, unambiguous, and don't 
have a precision problem. Various CF authorities thought this wasn't needed because it was redundant 
with the udunits representation.
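
One place where an 18:59:59 could come from, offered as a guess rather than a diagnosis of the CDM code: the start value in the NcML sits about 1.3 microseconds before 19:00:00, so any conversion that drops the fractional millisecond instead of rounding lands in the previous second. The sketch below uses java.time.Instant, not the udunits or CDM classes, and the class and method names are made up for illustration. It also prints the spacing between adjacent doubles near the start value, which is 2^-37 days, or roughly 6.3e-4 ms.

```java
import java.time.Instant;

public class MjdToInstant {
    // Modified Julian Day epoch, taken from the units string in this thread.
    static final Instant MJD_EPOCH = Instant.parse("1858-11-17T00:00:00Z");
    static final double MILLIS_PER_DAY = 86_400_000.0;

    // Convert by dropping the fractional millisecond (what a plain cast does).
    static Instant truncating(double days) {
        return MJD_EPOCH.plusMillis((long) (days * MILLIS_PER_DAY));
    }

    // Convert by rounding to the nearest millisecond.
    static Instant rounding(double days) {
        return MJD_EPOCH.plusMillis(Math.round(days * MILLIS_PER_DAY));
    }

    public static void main(String[] args) {
        double start = 47865.7916666665110000; // start value from the NcML

        // Spacing between adjacent doubles near this value, in days and in ms.
        double ulpDays = Math.ulp(start);
        System.out.println("ulp (days) = " + ulpDays);
        System.out.println("ulp (ms)   = " + ulpDays * MILLIS_PER_DAY);

        System.out.println("truncated  = " + truncating(start));
        System.out.println("rounded    = " + rounding(start));
    }
}
```

With the start value from the NcML, the truncating conversion gives 1989-12-05T18:59:59.999Z and the rounding one gives 1989-12-05T19:00:00Z. Whether the units handling actually truncates somewhere is an assumption that would need checking against the code.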



Rich Signell wrote:
> Jon,
> 
> The precision of the time vector with "units since XXXX" must
> definitely be considered carefully, but we did think about this.
> 
> We want to store all our oceanographic time series data with the same
> time convention to facilitate aggregation and minimize mods to
> existing software.
> 
> Choosing time as double precision with units of "days since 1858-11-17
> 00:00"  should give us a precision of:
>   - Better than 3.0e-5 milliseconds until August 31, 2132 and
>   - Better than 3.0e-4 milliseconds until October 12, 4596!
> 
> (This actually is the definition of "Modified Julian Day", which is
> one of the few internationally recognized time conventions that starts
> at midnight. See http://tycho.usno.navy.mil/mjd.html for more info.
> It also has the advantage of being a date by which nearly all the
> world had finally switched to a Gregorian calendar, and early enough
> so that most of the data we want to represent will have positive time
> values.)
> 
> The bug Sachin reported is a big deal for us, since we want to use
> NcML and THREDDS as a way of serving our hundreds of oceanographic
> time series files as CF compliant using NcML with the THREDDS data
> server without changing any of the original files.    The original
> files are NetCDF, but with a non-standard convention for time:  an
> integer array with julian day, and a second integer array with
> milliseconds since midnight.    This allows integer math with time to
> give results with no round off problems.
> 
> We have a script in Matlab (that uses double precision math) to take
> our two integer format for time and create NcML for a CF-compliant
> time array using start and increment.   That script produces NcML like
> this:
> 
> <variable name="time" shape="time" type="double">
>   <attribute name="units" value="days since 1858-11-17 00:00:00 UTC"/>
>   <attribute name="long_name" value="Modified Julian Day"/>
>   <values start="47865.7916666665110000" increment="0.0416666666666667"/>
> </variable>
> 
> As Sachin mentioned, the start time for this file is  "05-Dec-1989
> 19:00:00", and as proof that we have sufficient precision, when we
> simply load the time vector in NetCDF-java and do the double precision
> math in Matlab, we get the right start time:
> 
> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511)
> 
> ans =  05-Dec-1989 19:00:00
> 
> but when we use the NetCDF-Java time routines to convert to Gregorian, we get
> 
> 05-Dec-1989 18:59:59 GMT
> 
> Clearly our users will not accept this.   I hope this can get resolved soon!!!!
> 
> -Rich
> 
> On Tue, May 13, 2008 at 2:52 AM, Jon Blower <jdb at mail.nerc-essc.ac.uk> wrote:
>> Hi,
>>
>>  I have seen similar issues (time values being out by a second or two).
>>   I was wondering whether it's something to do with udunits and
>>  calculating dates on the basis of "units since XXXXXX".  I seem to
>>  remember an earlier conversation on this list (or maybe on the CF
>>  list) concerning how udunits defines the length of certain time-spans
>>  (e.g. a month) and wondered whether this might be the issue?  Jonathan
>>  Gregory recommended against using "months since" and "years since" and
>>  sticking to seconds or days to avoid ambiguities in the length of a
>>  month/year.  But maybe this is a red herring.
>>
>>  Whatever the issue is I'd be very keen to understand it as it's
>>  affecting me too!
>>
>>  Cheers, Jon
>>
>>
>>  On Mon, May 12, 2008 at 9:31 PM, Sachin Kumar Bhate
>>  <skbhate at ngi.msstate.edu> wrote:
>>
>>
>>  >  John,
>>  >
>>  >  The NcML  file shown below attempts to aggregate time series files,
>>  >  overriding
>>  >  the time values for each 'time' variable.
>>  >
>>  >  The aggregation works great and I can access the time values as well,
>>  >  but I see that there is loss of precision in the new time values, when I
>>  >  access
>>  >  values for a coordinate data variable.
>>  >
>>  >  For example:
>>  >
>>  >  <<<<
>>  >    String URI =
>>  >  'http://www.gri.msstate.edu/rsearch_data/nopp/test_agg_precision.ncml';
>>  >    String var="T_20";
>>  >
>>  >    GridDataset gid = GridDataset.open(URI);
>>  >    GeoGrid Grid = gid.findGridByName(var);
>>  >    GridCoordSys GridCoordS = (GridCoordSys) Grid.getCoordinateSystem();
>>  >
>>  >     java.util.Date d[] = GridCoordS.getTimeDates();
>>  >
>>  >     System.out.println("DateString: "+d[0].toGMTString());
>>  >   >>>>>
>>  >
>>  >  The output from the above code for the 1st time value in the java Date
>>  >  array.
>>  >
>>  >  DateString: 5 Dec 1989 18:59:59 GMT
>>  >
>>  >  But, the correct value should be
>>  >
>>  >  DateString: 5 Dec 1989 19:00:00 GMT
>>  >
>>  >
>>  >  Just out of curiosity I tried to print the 1st time value being read
>>  >  from the NcML,
>>  >  by 'ucar.nc2.ncml.NcmlReader.readValues()'. I get,
>>  >
>>  >  Start = 47865.79166666651;   (Parsed as double)
>>  >
>>  >  but,  the 1st start value specified in NcML is  '47865.7916666665110000'.
>>  >
>>  >  Don't care about the trailing '0s', but the digit '1' in the 12th decimal
>>  >  place is being dropped and may be causing this
>>  >  problem.
>>  >
>>  >  Although, parsing it as a 'BigDecimal' does read in the correct value.
>>  >
>>  >  Start-BigDecimal: 47865.7916666665110000
>>  >
>>  >
>>  >  I am just guessing here, I am not sure if this is what is causing the
>>  >  precision problem.
>>  >
>>  >  Will appreciate your help.
>>  >
>>  >  thanks..
>>  >
>>  >  Sachin
>>  >
>>  >  --
>>  >  Sachin Kumar Bhate, Research Associate
>>  >  MSU-High Performance Computing Collaboratory, NGI
>>  >  John C. Stennis Space Center, MS 39529
>>  >  http://www.northerngulfinstitute.org/
>>  >
>>  >
>>  >
>>  >  _______________________________________________
>>  >  netcdf-java mailing list
>>  >  netcdf-java at unidata.ucar.edu
>>  >  For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/
>>  >
>>
>>
>>
>>  --
>>  --------------------------------------------------------------
>>  Dr Jon Blower Tel: +44 118 378 5213 (direct line)
>>  Technical Director Tel: +44 118 378 8741 (ESSC)
>>  Reading e-Science Centre Fax: +44 118 378 6413
>>  ESSC Email: jdb at mail.nerc-essc.ac.uk
>>  University of Reading
>>  3 Earley Gate
>>  Reading RG6 6AL, UK
>>  --------------------------------------------------------------
>>
>>
>>
> 
> 
> 
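
A small sketch for the observation quoted above that the final '1' disappears when the start value is read as a double. Double.toString prints the shortest decimal string that still identifies the double uniquely, so the missing digit is a printing effect, not in itself a sign that the value was damaged. The double does differ slightly from the decimal text, but by no more than half the spacing between adjacent doubles. The class and method names here are hypothetical and nothing below comes from NcmlReader.

```java
import java.math.BigDecimal;

public class StartValueParsing {
    // True when the short printed form parses back to the identical double.
    static boolean roundTrips(double d) {
        return Double.parseDouble(Double.toString(d)) == d;
    }

    // Distance between the decimal text and the double it parses to.
    static BigDecimal parseError(String text) {
        double d = Double.parseDouble(text);
        return new BigDecimal(d).subtract(new BigDecimal(text)).abs();
    }

    public static void main(String[] args) {
        String text = "47865.7916666665110000"; // start value from the NcML
        double d = Double.parseDouble(text);

        // Shortest form that identifies the double.
        System.out.println("printed double  : " + d);
        // Exact binary value held by the double, written out in decimal.
        System.out.println("exact double    : " + new BigDecimal(d));
        // The decimal text itself, with no binary rounding.
        System.out.println("decimal text    : " + new BigDecimal(text));

        System.out.println("round trip same : " + roundTrips(d));
        System.out.println("parse error     : " + parseError(text).toPlainString() + " days");
    }
}
```

If the round trip prints true, the parsed double carries everything a double can hold for this value, and the 18:59:59 has to come from a later step.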

