[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

971020: Precision of DOUBLE address@hidden



Jan asks:

> Keywords: 199710191607.KAA18217
>
> Now I have a difficult question for you !
> We would like to use netCDF for the restart files of our GCM. This means
> that we have to store the data with the precision of the machine in
> netCDF. I have done a few tests where I store a R*4 and R*8 as float and
> double in netCDF and see how it looks when I read it again.
>
> On a Pentium it works fine. In any case the double external type
> contains everything up to the last bit. On a VPP700 things are not that
> simple.
>
> true R*4 -> archived correctly in the double and float external types
> true R*8 -> archived correctly in the double external type
> (DOUBLE PRECISION declaration in FORTRAN)
> promoted R*8 -> neither format contains the correct full data !
>
> I could not yet try a Cray C90 as I do not have the 3.3.1 version there.
> Anyway I would not be able to not try all computers so I would like from
> you if you know, on theoretical or experimental grounds, if it will work
> on all or most computers. That is if the data is stored in the double type
> I can get all my bites back !


Steve replies:

" I expect that only the following cases are guaranteed to maintain the
" bit-pattern:
"
"     1.  Internal, 4-byte IEEE floating-point values stored as
"         netCDF floats; and
"
"     2.  Internal, 8-byte IEEE floating-point values stored as netCDF
"       doubles.
" I've copied a more authoritative person on this reply (Glenn Davis).
" Hopefully, you'll hear from him on this matter.

Steve is correct. I'll amplify his response.

The netcdf file format uses IEEE standand floating point representation.

        IEEE single (32 bits) (usually C 'float' and FORTRAN 'REAL' or
'REAL*4')
                 1 bit  for sign
                 8 bits for exponent
                23 bits for mantissa

        IEEE double (64 bits) (usually C 'float' and FORTRAN 'REAL' or
'REAL*4')
                 1 bit  for sign
                11 bits for exponent
                52 bits for mantissa

(The mantissa is "normalized", so that effectively there is one
more bit of mantissa which is assummed to be 1.)

Almost all modern computers use this floating point representation.
Notably, some "current production" CRAY machines do not.
(Watch out, though. Many new CRAY machines use IEEE floating point).

        Cray single (64 bits)
                 1 bit  for sign
                15 bits for exponent
                48 bits for mantissa (Not normalized)

        Cray double (128 bits)
                (not often used)

Analysing the situtation, we see that the CRAY single is "close" to
an IEEE double (NC_DOUBLE). The CRAY single has a bigger exponent, and
therefore can contain bigger numbers (10^2465 vs IEEE 10^308).
The CRAY single has fewer mantissa bits than the IEEE double, so
the IEEE double has greater precision
(2.2204460492503131e-16 vs CRAY 7.10542735760100e-15)

Fact:
Within the range of IEEE double
(-1.7976931348623158e308 < x < 1.7976931348623158e308)
every number represented in CRAY single is exactly representable
in IEEE double.

So, you shouldn't have a problem.
In your model running on old-style CRAY, use the REAL declaration
that gives you CRAY single. Typically this REAL, or REAL*8, but
the compiler flags muddy the issue. Save your data as NC_DOUBLE.
You should get back the same number.
(If you don't, I need to fix something in the ncx_cray.c.)

Be careful, though. Bit for bit comparisions are tricky for
CRAY floating point. Because the numbers are not normalized,
there are typically 2 valid CRAY representations for a
given number. Also, the functions which scan and print
(base 10) ascii representations are often tricky to
use to make comparisions.
Use the difference of the numbers, or the output of the
frexp() function for comparisions.

For more detailed discussion of these issues, see
the CRAY document "Sources of Numerical Differences" by
Dr. C. J. Suchyata.

Thank you for using netcdf.

-glenn