
Re: FW: contact person regarding MIT/LL study



>To: address@hidden
>From: "Shoup, Ryan" <address@hidden>
>Subject: Re: 20050214: contact person regarding MIT/LL study
>Organization: MIT
>Keywords: 200502142058.j1EKwMv2021835, netCDF suitability

Hi Ryan,

> As mentioned below, we are performing a small study to investigate the use
> of a scientific data format for one of the satellite links on the next
> generation GOES satellite.  The idea would be to package calibrated
> satellite instrument data into a netCDF format.  Users at receive sites
> would receive the GOES downlink data in netCDF format and convert the data
> or whatever.  Right now, the GOES satellite contains data that is in a
> unique GOES-specific format called the GVAR format.  This format is
> essentially a binary format for the data.
> 
> With this background, I had the following questions:
> 
> 1) Are you aware of netCDF being used [or planned to be used] on any
> satellite transmission link?

No.  NetCDF was originally designed as an API for a direct-access file
format, so it may not be appropriate for data transmission.  A program
that writes a netCDF file can define variables in one order and write
them to a disk file in a different order with no loss of efficiency,
because the library can seek to any file offset before writing.
However, you can't define variables in one order and write them to a
transmission stream in a different order, because you can't seek on a
pipe.  That may not be an important difference if you always write
complete files on the satellite and then transmit them serially, but
if you intend to transmit data as it becomes available, rather than
waiting until all the data in a file is written and the file is
closed, netCDF may not be appropriate.
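
To make that concrete, here is a rough sketch using the netCDF-3 C
API (the file, dimension, and variable names are just for
illustration, and error checking is omitted).  The two variables are
defined in one order but written in the opposite order, which works
on a disk file only because the library can seek to each variable's
offset:

    #include <netcdf.h>

    int main(void) {
        int ncid, dimid, var_a, var_b;
        float a[10], b[10];
        for (int i = 0; i < 10; i++) { a[i] = i; b[i] = 10 - i; }

        /* define mode: declare the dimension and variables, a before b */
        nc_create("example.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "x", 10, &dimid);
        nc_def_var(ncid, "a", NC_FLOAT, 1, &dimid, &var_a);
        nc_def_var(ncid, "b", NC_FLOAT, 1, &dimid, &var_b);
        nc_enddef(ncid);

        /* data mode: write b before a -- the library seeks to each
           variable's offset, which a pipe or stream cannot do */
        nc_put_var_float(ncid, var_b, b);
        nc_put_var_float(ncid, var_a, a);

        nc_close(ncid);
        return 0;
    }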

> 2) For a satellite transmission link, maintaining as low a bandwidth as
> possible for the link is quite important.  In your opinion, would netCDF
> unnecessarily add too much overhead for use in a satellite transmission
> link?

It would depend somewhat on characteristics of the data.  A
particularly bad example might be trying to transmit a stream of 9-bit
values using netCDF 16-bit shorts, in which 7 bits out of every 16
would be wasted.  Part of the overhead netCDF adds is unused bits for
quantities that don't fit neatly in its 8-, 16-, 32-, or 64-bit types.
There's not much overhead in the file header, and rectangular arrays
of data are stored compactly.  However, if you need to transmit a data
structure such as a ragged array with variable-length rows, the most
natural netCDF representation is a rectangular array with missing
values padding out each row to the maximum row size, so that might add
considerable overhead.
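
For example, here is a sketch of that natural representation (names
invented, error checking omitted): three rows of lengths 5, 2, and 4
stored as a 3 x 5 rectangular variable, where the unwritten cells are
left at the default fill value, so 4 of the 15 cells carry no
information:

    #include <netcdf.h>

    #define NROWS  3
    #define MAXLEN 5    /* the longest row sets the padded width */

    int main(void) {
        int ncid, dimids[2], varid;
        float row0[] = {1, 2, 3, 4, 5};
        float row1[] = {6, 7};
        float row2[] = {8, 9, 10, 11};
        float *rows[] = {row0, row1, row2};
        size_t lens[] = {5, 2, 4};

        nc_create("ragged.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "row", NROWS, &dimids[0]);
        nc_def_dim(ncid, "col", MAXLEN, &dimids[1]);
        nc_def_var(ncid, "data", NC_FLOAT, 2, dimids, &varid);
        nc_enddef(ncid);

        /* write each row; cells never written stay at the fill value,
           which is pure padding on a transmission link */
        for (size_t i = 0; i < NROWS; i++) {
            size_t start[2] = {i, 0};
            size_t count[2] = {1, lens[i]};
            nc_put_vara_float(ncid, varid, start, count, rows[i]);
        }

        nc_close(ncid);
        return 0;
    }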

> 3) Can netCDF be encoded and decoded at rates of say 10 - 50 Mbps using
> standard PC or Unix workstations?

Yes; I don't think encoding speed would be much of an issue.  NetCDF
just uses IEEE floating-point representation for floats and doubles
and big-endian representation for integers, so encoding and decoding
are fairly trivial and fast.
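
For what it's worth, here is roughly what encoding one value amounts
to (a sketch of the idea, not the library's actual code).  On IEEE
hardware, floats and doubles are copied essentially byte for byte,
and integers only need reordering on little-endian machines:

    #include <stdint.h>

    /* Put a 32-bit integer into big-endian (network) byte order,
       which is what netCDF's external representation calls for. */
    void encode_int32_be(int32_t v, unsigned char out[4]) {
        uint32_t u = (uint32_t)v;
        out[0] = (unsigned char)(u >> 24);
        out[1] = (unsigned char)(u >> 16);
        out[2] = (unsigned char)(u >> 8);
        out[3] = (unsigned char)(u);
    }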

> 4) Are the software APIs for netCDF robust enough to handle bit errors in a
> netCDF-formatted data object [i.e., a .nc file]?  We ask this because we
> attempted to inject a single bit error into a .nc file and then used Windows
> APIs to decode the .nc file.  We noticed that the API hung our computer in
> some instances.

No, a single bit error in the initial header information, before the
data, could be very bad.  All the metadata for a netCDF file is stored
in its header, followed by all the data.  The metadata in the header
includes fields that hold the offset to the beginning of each
variable, the dimension sizes for each variable, etc.  A single bit
error in a variable offset would point to the wrong place in the file
for the start of that variable's data.  A single bit error in
the type code for a variable would change the representation of the
variable's type, thus causing the wrong decoding to be used,
e.g. reading integers as floating-point numbers or text as doubles.  I
could imagine some bit errors causing a reader to seem to hang, e.g. a
high-order bit in a dimension size flipped might specify a huge size
for variables that use that dimension, so the reader might try to
read much more data than was available.
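
As a rough illustration of that last case (the numbers here are made
up), a single flipped high-order bit in a dimension length turns a
modest variable into one the reader thinks is gigabytes long:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t dimlen  = 1000;                  /* length stored in the header */
        uint32_t corrupt = dimlen ^ (1u << 30);   /* one high-order bit flipped */

        printf("original:  %u values (%u bytes as floats)\n",
               dimlen, 4 * dimlen);
        printf("corrupted: %u values (about 4 GB as floats)\n", corrupt);
        return 0;
    }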

There is considerable consistency checking when a netCDF file is read,
so a bit error in the header might produce an error indication that
the file is corrupt, but there would be no way to recover, because
netCDF files contain no redundancy such as checksums.  A single bit
error in the data portion of a netCDF dataset would likely go
undetected, so it could produce an arbitrarily bad data value.
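
If you did go down this road, you would probably want to add that
redundancy at the application level, since the format itself provides
none.  Here is a minimal sketch of the idea (a toy rolling checksum,
not a real CRC; a production link should use a proper CRC or rely on
the transmission protocol's error detection): compute it over the .nc
file before transmission and verify it after receipt.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy 32-bit checksum over a whole file, just to illustrate the
       kind of redundancy netCDF itself does not provide. */
    uint32_t file_checksum(const char *path) {
        FILE *fp = fopen(path, "rb");
        uint32_t sum = 0;
        int c;
        if (fp == NULL)
            return 0;
        while ((c = getc(fp)) != EOF)
            sum = sum * 31u + (uint32_t)c;
        fclose(fp);
        return sum;
    }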

In summary, I think transmission formats designed to handle dropped
bits and noise, and to provide error detection, error correction, and
compression, are generally better for transmission than a
general-purpose file format designed for efficient disk access.  On
the other hand, a data model, API, and disk format like netCDF's is
usually a better choice for insulating applications from the details
of how data is represented in files or on servers, and for letting
data providers capture and preserve self-describing data for later
use by others.

Getting the best of both worlds typically involves a special format
for data transmission, followed by "decoders" that store the
transmitted data in a more convenient form for later access.  For
example, much atmospheric science data is transmitted in the GRIB
format and then decoded into netCDF files for use by applications.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden          http://www.unidata.ucar.edu/staff/russ