Re: [netcdf-java] Data precision while aggregating data

Rich,

OK, so let's see whether the time is being rounded to the nearest millisecond or simply truncated in netcdf-java.

I ran a test to see how the time is calculated in netcdf-java. To keep the case simple, I used Java's standard base time as the 'units' string and ran the tests shown below.

Test 1, which uses 'ucar.nc2.units.DateUnit', resulted in 0 milliseconds, while Test 2, which uses Java's
Calendar class, resulted in 734 milliseconds.

So, I think the time is being truncated rather than rounded to the nearest millisecond in netcdf-java. Internally it must still be using an instance of Java's Calendar class, but it may be truncating the final result.

This may explain the one-second difference we are getting when reading that start value (i.e., 47865.791666666511).

I may be wrong; John may have a better theory. (Test cases below.)

TEST-CASES:

unitString = "days since 1970-01-01 00:00:00 UTC";
which is also the standard base time (a.k.a. the 'epoch') used in Java.

1. Test-case-1: Using the 'ucar.nc2.units.DateUnit' class
        >>>
        import ucar.nc2.units.DateUnit;

        // DateUnit(String) parses the udunits string (it declares 'throws Exception',
        // so call it from code that handles that)
        DateUnit du = new DateUnit(unitString);
        long originMS = du.getDateOrigin().getTime(); // ms since the Java epoch
        System.out.println("Test-case-1: " + originMS + " milliseconds");
        <<<

        Result:   Test-case-1:  0 milliseconds

2. Test-case-2: Using the 'java.util.Calendar' class
        >>>
        import java.util.*;

        Calendar cal = Calendar.getInstance(Locale.US);
        cal.setTimeZone(new SimpleTimeZone(0, "UTC"));
        cal.set(1970, 0, 1, 0, 0, 0); // 01/01/1970 00:00:00
        System.out.println("Test-case-2: " + cal.getTimeInMillis() + " milliseconds");
        <<<

        Result:   Test-case-2:  734 milliseconds

        (Note: This result will vary anywhere between 0 and 999 milliseconds, because
        Calendar.getInstance() captures the current time and set(...) does not reset
        the millisecond field.)

-Sachin.
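
A minimal sketch of the suspected difference, assuming the conversion is simply
days * 86,400,000 -> milliseconds (the class name TruncateVsRound and the
hard-coded MJD origin are illustrative assumptions, not the actual CDM code path):

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class TruncateVsRound {
        public static void main(String[] args) {
            // 1858-11-17 00:00:00 UTC (the MJD origin) is 40587 days before the Java epoch
            long originMS = -40587L * 86400L * 1000L;

            double days = 47865.791666666511;    // the start value from this thread
            double ms = days * 86400.0 * 1000.0; // fractional milliseconds since the origin

            long truncated = (long) ms;          // drops the fraction
            long rounded = Math.round(ms);       // nearest millisecond

            SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss.SSS");
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            System.out.println("truncated: " + fmt.format(new Date(originMS + truncated)));
            System.out.println("rounded:   " + fmt.format(new Date(originMS + rounded)));
        }
    }

Truncation prints 05-Dec-1989 18:59:59.999 and rounding prints 05-Dec-1989 19:00:00.000,
which is exactly the one-second difference once fractional seconds are dropped from
the display.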
Rich Signell wrote:
John,

Your comment about Java representing time in milliseconds since 1970
gave me an idea:
perhaps the problem is simply a difference in the way that rounding is
done by the routine that calculates (year, mon, day, hour, min, sec)
from decimal days.

In the Matlab routine, time is rounded by default to the nearest second.

In the Java routine, is time rounded to the nearest millisecond, or
perhaps not even rounded, but simply truncated?

As a test, I tried adding 0.5 milliseconds to my time value:
47865.791666666511 + 0.5/24/3600/1000 = 47865.79166667230

and sure enough, I get the result I was looking for:

05-Dec-1989 19:00:00

-Rich
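
Rich's half-millisecond nudge works because adding half of the smallest unit
before truncating is the standard way to turn truncation into round-to-nearest.
A minimal sketch (class and variable names invented for illustration):

    public class HalfMsNudge {
        public static void main(String[] args) {
            double days = 47865.791666666511;
            double halfMs = 0.5 / 24 / 3600 / 1000; // 0.5 ms expressed in days

            long plain  = (long) (days * 86400000.0);            // ...18:59:59.999
            long nudged = (long) ((days + halfMs) * 86400000.0); // ...19:00:00.000
            System.out.println(nudged - plain);                  // prints 1 (one ms)
        }
    }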

On Thu, May 15, 2008 at 1:10 PM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
Rich Signell wrote:
John,

Four replies to your four comments:   ;-)

On Wed, May 14, 2008 at 9:08 PM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
I'm not quite sure where the inaccuracy comes in; likely in converting between
the Date and udunits representations. I'll have to see what I can do.

A few comments:

1) A double has 53 bits of mantissa, giving slightly under 16 decimal digits of
accuracy. Thus:

 public void testDoublePrecision() {
   double dval = 47865.7916666665110000; // more digits than a double can represent
   System.out.println(" dval= " + dval);
 }

prints:

 dval= 47865.79166666651
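
As a sanity check on this point (a sketch, not from the original thread):
Math.ulp reports the gap between adjacent doubles at a given magnitude, i.e.
the best resolution a double can offer near MJD-sized day values:

    public class UlpCheck {
        public static void main(String[] args) {
            double days = 47865.79166666651;
            double ulpDays = Math.ulp(days); // ~7.3e-12 days
            System.out.println(ulpDays * 86400.0 * 1.0e6 + " microseconds"); // ~0.63
        }
    }

Sub-microsecond resolution, which supports Rich's point below that the double
itself is not the bottleneck.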

Okay, you lost the lowest bit, but you should still be fine. You still have 11
places after the decimal point. In Matlab, which uses double-precision
arithmetic, I don't get a problem converting to Gregorian until we drop to
8 places after the decimal point:

datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666511) => 05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666651)  => 05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.7916666665)   => 05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.791666666)    => 05-Dec-1989 19:00:00
datestr(datenum([1858 11 17 0 0 0]) + 47865.79166666)     => 05-Dec-1989 18:59:59

Yes, it does seem funny we are losing so much precision. It probably has to
do with converting internally to a Java date, which uses milliseconds since 1970.
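
To put numbers on "dropping places after the decimal point", here is a
back-of-the-envelope sketch (the class name TrimmedDigits is invented; the
reference value 47865 + 19/24 is exactly 19:00:00) showing how far short of
19:00:00 each trimmed value falls; whether the displayed second then flips
depends on whether the final conversion rounds or truncates:

    public class TrimmedDigits {
        public static void main(String[] args) {
            double exact = 47865.0 + 19.0 / 24.0; // 19:00:00 exactly
            double[] trimmed = {
                47865.791666666511, // the value in the file
                47865.7916666665,
                47865.791666666,
                47865.79166666      // 8 decimal places
            };
            for (double d : trimmed) {
                System.out.printf("%.11f is %.3f ms before 19:00:00%n",
                                  d, (exact - d) * 86400.0 * 1000.0);
            }
        }
    }

The trimmed values fall roughly 0.013, 0.014, 0.058, and 0.576 ms short,
respectively.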

2) Preserving the lowest bits of accuracy is tricky and requires care, which I
promise has not (yet) happened in the CDM units handling. In general, relying
on the lowest bits being preserved is dicey.
That's okay -- we don't need to preserve that lowest bit.
how many bits do you need to preserve?


3) What is the definition of a "day", and how accurate do you need it to be?
All I could find was this note in the units package:

       * Interval between 2 successive passages of sun through vernal
equinox
       * (365.242198781 days -- see
       * http://www.ast.cam.ac.uk/pubinfo/leaflets/,
       * http://aa.usno.navy.mil/AA/
       * and http://adswww.colorado.edu/adswww/astro_coord.html):

You may agree, but what if someone uses a different meaning for "day"?
Take a look at udunits.dat:
http://www.unidata.ucar.edu/software/udunits/udunits-1/udunits.txt

A "day" is precisely defined as 86400 seconds.
A "sidereal day" is a different unit.
Yes, the 86400 is clear, but how many days are there between date1 and date2?
You have to deal with leap years, etc.
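
A small illustration of the leap-year point (the class name LeapYearDays is
invented for this sketch): calendar arithmetic and fixed 86400-second days
give different answers for "one year":

    import java.util.Calendar;
    import java.util.TimeZone;

    public class LeapYearDays {
        public static void main(String[] args) {
            Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
            cal.clear();
            cal.set(2008, Calendar.JANUARY, 1);
            long start = cal.getTimeInMillis();

            cal.add(Calendar.YEAR, 1); // calendar arithmetic: 2008-01-01 -> 2009-01-01
            long end = cal.getTimeInMillis();

            // 2008 is a leap year, so "one year" is 366 udunits days here
            System.out.println((end - start) / 86400000.0); // prints 366.0
        }
    }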

4) IMHO, using udunits for calendar dates is a mistake. It's a units package,
not a calendar package.
Maybe, but I think to solve the current problem, we could just find
out where the computations are dropping the double precision.
Yes, that's the short-term solution.


5) "47865.7916666665110000 days since 1858-11-17 00:00:00 UTC" is, um,
unreadable to humans.
What is unreadable about that? Yes, it's a big number with a lot of precision
and an older date, but I think it's perfectly readable and unambiguous. And as
I mentioned, it's an internationally recognized convention called "Modified
Julian Date".
It's unreadable because you can't tell what actual date it represents without
using software.

6) I earlier proposed to CF that we allow ISO date strings: more readable, not
ambiguous, and without a precision problem. Various CF authorities thought it
wasn't needed because it was redundant with the udunits representation.
I think allowing ISO date strings in CF would be a good idea, and I also think
allowing a two-integer representation in CF would be a good idea (we use
Julian day and milliseconds since midnight as our two integer vectors). But
that idea was also not too popular. Several people thought it would be a good
idea, including Balaji, but there was concern about the need to modify all
existing CF applications to handle these new time conventions. But if this was
just handled in UDUNITS, I don't think it would be much of a problem, as I
would think that most CF-compliant apps have used the UDUNITS library to do
their math.
Part of my point to CF is that one must then use udunits (which has both C and
Java versions, as well as multiple releases; do they always agree?). It's a
mistake to tie long-term semantics as important as time to a single software
package. Better to document what it's supposed to mean, so it can be
independently implemented.
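
For comparison, a hedged sketch of the ISO 8601 alternative, using the
pre-java.time classes available in 2008 (the pattern below handles only the
literal 'Z' suffix, a simplifying assumption): the instant travels as integer
fields, so there is no decimal fraction to round or truncate:

    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class IsoExample {
        public static void main(String[] args) throws ParseException {
            SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            iso.setTimeZone(TimeZone.getTimeZone("UTC"));

            // the same instant Rich has been chasing, written for humans
            Date d = iso.parse("1989-12-05T19:00:00Z");
            System.out.println(d.getTime() + " ms since 1970-01-01"); // exact integer
        }
    }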





--
Sachin Kumar Bhate, Research Associate
MSU-High Performance Computing Collaboratory, NGI
John C. Stennis Space Center, MS 39529
http://www.northerngulfinstitute.org/


