[netcdf-java] Data precision while aggregating data
John Caron
caron at unidata.ucar.edu
Wed May 14 19:08:21 MDT 2008
I'm not quite sure where the inaccuracy comes in; most likely it is in converting between Date and the udunits
representation. I'll have to see what I can do.
A few comments:
1) A double has a 53-bit significand, which gives slightly under 16 decimal digits of precision. Thus:
public void testDoublePrecision() {
double dval = 47865.7916666665110000;
System.out.println(" dval= "+dval);
}
prints:
dval= 47865.79166666651
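To put a number on that, the spacing between adjacent doubles near this value can be checked with Math.ulp. This is only a sketch; the class name DoubleSpacing and the helper spacingMillis are made up for illustration and are not part of netcdf-java:

```java
public class DoubleSpacing {
    static final double MILLIS_PER_DAY = 86_400_000.0;

    /** Gap between 'days' and the next larger double, converted from days to milliseconds. */
    static double spacingMillis(double days) {
        return Math.ulp(days) * MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        double dval = 47865.7916666665110000;
        // Size of one unit in the last place at this magnitude, in days and in milliseconds.
        System.out.println(" ulp(dval) in days   = " + Math.ulp(dval));
        System.out.println(" ulp(dval) in millis = " + spacingMillis(dval));
        // The long literal and the shorter printed form denote the same double.
        System.out.println(" same double         = "
                + (dval == Double.parseDouble("47865.79166666651")));
    }
}
```

Near 47865.79 the spacing is 2^-37 days, roughly 7.3e-12 days or about 6e-4 milliseconds, so the shorter printed form is the same double as the longer literal.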
2) Preserving the lowest bits of precision is tricky and requires care, which I promise has not (yet)
happened in the CDM units handling. In general, relying on the lowest bits being preserved is dicey.
3) What is the definition of a "day", and how accurately do you need it? All I could find was this note
in the units package:
* Interval between 2 successive passages of sun through vernal equinox
* (365.242198781 days -- see
* http://www.ast.cam.ac.uk/pubinfo/leaflets/,
* http://aa.usno.navy.mil/AA/
* and http://adswww.colorado.edu/adswww/astro_coord.html):
You may agree with that, but what if someone uses a different meaning for "day"?
4) IMHO, using udunits for calendar dates is a mistake. It's a units package, not a calendar package.
5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um, unreadable to humans.
6) I earlier proposed to CF that we allow ISO date strings, which are more readable, unambiguous, and don't
have a precision problem. Various CF authorities thought this wasn't needed because it was redundant
with the udunits representation.
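One way a one-second error could arise from the value quoted below, sketched under assumptions: exactly 19:00:00 is 19/24 of a day, i.e. about 47865.791666666664, while the NcML start value 47865.791666666511 is about 13 microseconds earlier. If a conversion truncates to whole milliseconds instead of rounding, the result lands on 18:59:59.999, which a formatter that only shows seconds prints as 18:59:59. Whether the CDM/udunits code path actually truncates has not been checked; the class MjdToInstant and its methods are hypothetical and only illustrate the arithmetic:

```java
import java.time.Instant;

public class MjdToInstant {
    // Epoch of "days since 1858-11-17 00:00:00 UTC" (Modified Julian Day).
    static final Instant MJD_EPOCH = Instant.parse("1858-11-17T00:00:00Z");
    static final double MILLIS_PER_DAY = 86_400_000.0;

    /** Converts by casting to long, which drops the fractional millisecond. */
    static Instant truncated(double mjd) {
        return MJD_EPOCH.plusMillis((long) (mjd * MILLIS_PER_DAY));
    }

    /** Converts by rounding to the nearest whole millisecond. */
    static Instant rounded(double mjd) {
        return MJD_EPOCH.plusMillis(Math.round(mjd * MILLIS_PER_DAY));
    }

    public static void main(String[] args) {
        double start = 47865.7916666665110000; // start value from the NcML
        System.out.println("truncated: " + truncated(start));
        System.out.println("rounded:   " + rounded(start));
    }
}
```

For this input the truncating version gives 1989-12-05T18:59:59.999Z and the rounding version gives 1989-12-05T19:00:00Z, which would match the difference between the NetCDF-Java and Matlab results if Matlab's datestr rounds to the nearest second.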
Rich Signell wrote:
> Jon,
>
> The precision of the time vector with "units since XXXX" must
> definitely be considered carefully, but we did think about this.
>
> We want to store all our oceanographic time series data with the same
> time convention to facilitate aggregation and minimize mods to
> existing software.
>
> Choosing time as double precision with units of "days since 1858-11-17
> 00:00" should give us a precision of:
> - Better than 3.0e-5 milliseconds until August 31, 2132 and
> - Better than 3.0e-4 milliseconds until October 12, 4596!
>
> (This is actually the definition of "Modified Julian Day", which is
> one of the few internationally recognized time conventions that starts
> at midnight. See http://tycho.usno.navy.mil/mjd.html for more info.
> It also has the advantage of being a date by which nearly all the
> world had finally switched to a Gregorian calendar, and early enough
> so that most of the data we want to represent will have positive time
> values.)
>
> The bug Sachin reported is a big deal for us, since we want to use
> NcML and THREDDS as a way of serving our hundreds of oceanographic
> time series files as CF compliant using NcML with the THREDDS data
> server without changing any of the original files. The original
> files are NetCDF, but with a non-standard convention for time: an
> integer array with julian day, and a second integer array with
> milliseconds since midnight. This allows integer math with time to
> give results with no round off problems.
>
> We have a script in Matlab (that uses double precision math) to take
> our two integer format for time and create NcML for a CF-compliant
> time array using start and increment. That script produces NcML like
> this:
>
> <variable name="time" shape="time" type="double">
> <attribute name="units" value="days since 1858-11-17 00:00:00 UTC"/>
> <attribute name="long_name" value="Modified Julian Day"/>
> <values start="47865.7916666665110000" increment="0.0416666666666667"/>
> </variable>
>
> As Sachin mentioned, the start time for this file is "05-Dec-1989
> 19:00:00", and as proof that we have sufficient precision, when we
> simply load the time vector in NetCDF-java and do the double precision
> math in Matlab, we get the right start time:
>
> datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511)
>
> ans = 05-Dec-1989 19:00:00
>
> but when we use the NetCDF-Java time routines to convert to Gregorian, we get
>
> 05-Dec-1989 18:59:59 GMT
>
> Clearly our users will not accept this. I hope this can get resolved soon!!!!
>
> -Rich
>
> On Tue, May 13, 2008 at 2:52 AM, Jon Blower <jdb at mail.nerc-essc.ac.uk> wrote:
>> Hi,
>>
>> I have seen similar issues (time values being out by a second or two).
>> I was wondering whether it's something to do with udunits and
>> calculating dates on the basis of "units since XXXXXX". I seem to
>> remember an earlier conversation on this list (or maybe on the CF
>> list) concerning how udunits defines the length of certain time-spans
>> (e.g. a month) and wondered whether this might be the issue? Jonathan
>> Gregory recommended against using "months since" and "years since" and
>> sticking to seconds or days to avoid ambiguities in the length of a
>> month/year. But maybe this is a red herring.
>>
>> Whatever the issue is I'd be very keen to understand it as it's
>> affecting me too!
>>
>> Cheers, Jon
>>
>>
>> On Mon, May 12, 2008 at 9:31 PM, Sachin Kumar Bhate
>> <skbhate at ngi.msstate.edu> wrote:
>>
>>
>>> John,
>> >
>> > The NcML file shown below attempts to aggregate time series files,
>> > overriding
>> > the time values for each 'time' variable.
>> >
>> > The aggregation works great and I can access the time values as well,
>> > but I see that there is loss of precision in the new time values, when I
>> > access
>> > values for a coordinate data variable.
>> >
>> > For example:
>> >
>> > <<<<
>> > String URI =
>> > "http://www.gri.msstate.edu/rsearch_data/nopp/test_agg_precision.ncml";
>> > String var="T_20";
>> >
>> > GridDataset gid = GridDataset.open(URI);
>> > GeoGrid Grid = gid.findGridByName(var);
>> > GridCoordSys GridCoordS = (GridCoordSys) Grid.getCoordinateSystem();
>> >
>> > java.util.Date d[] = GridCoordS.getTimeDates();
>> >
>> > System.out.println("DateString: "+d[0].toGMTString());
>> > >>>>>
>> >
>> > The output from the above code for the 1st time value in the java Date
>> > array.
>> >
>> > DateString: 5 Dec 1989 18:59:59 GMT
>> >
>> > But, the correct value should be
>> >
>> > DateString: 5 Dec 1989 19:00:00 GMT
>> >
>> >
>> > Just out of curiosity I tried to print the 1st time value being read
>> > from the NcML,
>> > by 'ucar.nc2.ncml.NcmlReader.readValues()'. I get,
>> >
>> > Start = 47865.79166666651; (Parsed as double)
>> >
>> > but, the 1st start value specified in NcML is '47865.7916666665110000'.
>> >
>> > Don't care about the trailing '0s', but the digit '1' in the 12th decimal
>> > place is being dropped and may be causing this problem.
>> >
>> > Although, parsing it as a 'BigDecimal' does read in the correct value.
>> >
>> > Start-BigDecimal: 47865.7916666665110000
>> >
>> >
>> > I am just guessing here; I am not sure if this is what is causing the
>> > precision problem.
>> >
>> > Will appreciate your help.
>> >
>> > thanks..
>> >
>> > Sachin
>> >
>> > --
>> > Sachin Kumar Bhate, Research Associate
>> > MSU-High Performance Computing Collaboratory, NGI
>> > John C. Stennis Space Center, MS 39529
>> > http://www.northerngulfinstitute.org/
>> >
>> >
>> >
>> > _______________________________________________
>> > netcdf-java mailing list
>> > netcdf-java at unidata.ucar.edu
>> > For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/
>> >
>>
>>
>>
>> --
>> --------------------------------------------------------------
>> Dr Jon Blower Tel: +44 118 378 5213 (direct line)
>> Technical Director Tel: +44 118 378 8741 (ESSC)
>> Reading e-Science Centre Fax: +44 118 378 6413
>> ESSC Email: jdb at mail.nerc-essc.ac.uk
>> University of Reading
>> 3 Earley Gate
>> Reading RG6 6AL, UK
>> --------------------------------------------------------------
>>
>>