RE: NetCDF for underway oceanographic data storage

Phil Morgan (Phil.Morgan@ml.csiro.au)
Thu, 17 Sep 92 15:09:29 EST

G'day all [Hi, all netCDF users]

I have a few comments on this topic. (I have included a copy of
Lindsay Pender's email that did not get distributed to the netcdfgroup.)
If you wish to reply, please reply to the netcdfgroup so we can all be
informed of this discussion. I think there may be many interested
parties in this group.

Cheers
-Phil Morgan

Additional comments at end by Lindsay Pender

*******************************************************

LINDSAY PENDER WRITES:
=====================
>> >From pender@ml.csiro.au Mon Sep 14 21:14:10 1992
>> >Subject: NetCDF for underway oceanographic data storage
>> >
>> >I read with interest your mail message expressing your intention to use netCDF
>> >for underway data storage. I have also considered this approach for the same
>> >reasons you give, however came up with some conceptual difficulties when I was
>> >looking at ways to implement it. It may be that your data is different, but in
>> >our case we have data coming from many different sources, each with different
>> >sampling rates. Some of our instruments are sampled at 2.5kHz, while others are
>> >as slow as once a minute. For an underway data storage system using netCDF how
>> >do you store such data with only one 'unlimited' dimension? What I have
>> >considered doing, is to collect data from the various instruments into fixed
>> >length blocks, and then after some suitable time writing all of the data to a
>> >netCDF file with the now known dimensions. Using this scheme, I would have to
>> >carry an extra variable for each instrument - the time stamp for each block.
>> >
>> >Any comment?
>> >
>> >Regards
>> >
>> >Lindsay Pender

TIM HOLT WRITES:
===============
>> OSU currently can manage its data by logging 1 minute averages for
>> all instruments. No one yet has asked for finer resolution from our
>> common use equipment. CTD, ADCP, and other such higher resolution
>> systems are managed and logged by their own software and are currently
>> independent of the new netCDF system. Soon though, I will need to merge
>> in some finer res. data (5 second GPS and ADCP). Here is my scheme, and
>> I'm real curious what kinds of alternatives others can suggest.
>>
>> I'll see if I can describe my idea with a CDL file. It may not be the best
>> way, but I guess it will work...
>>
>>
>>
>> <<< BEGIN multi_res.cdl >>>
>>
>> netcdf multires {
>>
>> dimensions:
>> min_max_mean = 3; // store 3 numbers: min, max, mean
>> ten_hz = 600; // number of 10.0 Hz samples in 1 minute
>> five_hz = 300; // number of 5.0 Hz samples in 1 minute
>> twopoint5_hz = 150; // number of 2.5 Hz samples in 1 minute
>> one_hz = 60; // number of 1.0 Hz samples in 1 minute
>> five_second = 12; // number of 0.2 Hz (5-second) samples in 1 minute
>> time = unlimited; // the "time" dimension
>>
>> variables:
>> long time(time); // seconds since some fixed point in time
>> float gps_lat(time); // gps latitude in sample period
>> float gps_lon(time); // gps longitude in sample period
>> short n_sats(time); // number of satellites used in fix
>> float raw_gps_lat(time, five_second); // raw gps latitude
>> float raw_gps_lon(time, five_second); // raw gps longitude
>> float sea_temp(time, min_max_mean); // sea surface temperature
>> float towed_ctd_temp (time, ten_hz); // raw CTD temperature
>> float towed_ctd_cond (time, ten_hz); // raw CTD conductivity
>> }
>>
>> <<< END multi_res.cdl >>>
>>
>> The idea is to pick the least common denominator (1 minute data) and
>> pack anything that's a finer resolution into a new dimension.
>>
>> I did try this scheme for a towed vehicle logging/display system, but I
>> found the netCDF time overhead (on a PC) was too high for me to log real
>> time, raw 24 Hz CTD data. Too many variables to log -- more than the
>> simple example above. I still used the same idea, but went to a simpler
>> ASCII file for quick I/O.
>>
>> Comments???
>>
>> Tim Holt, OSU Oceanography
>> holtt@oce.orst.edu
>>


Reading in data and saving to a file in real-time will always be
limited by the sampling rate and the number of samples monitored.
Saving directly to netCDF format adds extra processing overhead.

For fast sampling and/or a large number of samples, it is best to
save the "continuously" sampled data records from an instrument
directly to a file (ASCII, or binary [fastest]). For example, read
data into a record in Fortran (yes, I know it's an extension) or a
structure in C, etc., and write out the whole record in binary format.
This file of records acts as a buffer from which you can run a
program to convert the file of records into netCDF format.
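The record-buffering step above can be sketched in Python using the
standard struct module (the record layout, field names, and buffer are
my own illustration, not from the original post):

```python
import io
import struct

# Hypothetical fixed-length record: one long timestamp + three float readings.
RECORD_FMT = "<l3f"                       # little-endian: long, 3 x float32
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def write_record(buf, timestamp, readings):
    """Append one binary record to the instrument buffer."""
    buf.write(struct.pack(RECORD_FMT, timestamp, *readings))

def read_records(buf):
    """Read the buffer back as (timestamp, [readings]) tuples."""
    while True:
        chunk = buf.read(RECORD_SIZE)
        if len(chunk) < RECORD_SIZE:
            break
        t, *vals = struct.unpack(RECORD_FMT, chunk)
        yield t, vals

# In-memory stand-in for the per-instrument buffer file; a real logger
# would use open("instr1.dat", "wb") and convert to netCDF later.
buf = io.BytesIO()
write_record(buf, 716700000, [12.5, 35.25, 3.5])    # values chosen to be
write_record(buf, 716700060, [12.75, 35.0, 3.25])   # exact in float32
buf.seek(0)
records = list(read_records(buf))
print(records)
```

The conversion program then only has to walk this file of fixed-length
records and copy each field into the corresponding netCDF variable.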

A picture of this for 2 instruments follows (this generalises to N
instruments, each with a different number of component data elements).

+---------+
| instr#1 | -->> (Read&Save) -->> instr#1 file --> (convert) ---> netCDF
+---------+                       of records


+---------+
| instr#2 | -->> (Read&Save) -->> instr#2 file --> (convert) ---> netCDF
+---------+                       of records


If data is acquired at a relatively slow rate then you may well
have plenty of time to write directly to a netCDF format file.

The netCDF data could be kept in separate files for each instrument,
or all data could be merged into a single file.

Separate instrument logged files (data at same sampling rate for each file)
================================
If all data from one instrument share the same time base then this is easy.
The time dimension can be set to "unlimited". Each instrument log file
will have its own time variable appropriate for the sampling rate.
Comparisons between different instruments will need to account for the
different time base in each file. This should be no problem but we do
have several (instrument) files.

Other specialised instruments with their own data acquisition and data
storage format (e.g. ADCP, CTD) could have their data converted to netCDF
files after acquisition is complete.

If there are several data components at different sampling rates then
data at the same sampling rate could be grouped together in a file.
Thus each file will have its own time base.

ONE "MERGED" FILE (data components with different sampling rates)
=================
If data components are sampled at different rates but using the same
clock then there will be a common denominator (common time base) and the
method suggested by Tim Holt IS EXCELLENT. Lindsay's concern of different
sampling rates can be accommodated by Tim's method as long as there is a
common clock from which the sampling rates are referenced.

lindsay>> Some of our instruments are sampled at 2.5kHz, while others are
lindsay>> as slow as once a minute. For an underway data storage system using netCDF how
lindsay>> do you store such data with only one 'unlimited' dimension?

The above case should encompass most common data acquisition situations.
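The index arithmetic behind this common-clock packing is simple. A small
illustration (the function and constant names are my own, and the 60 s
block length matches Tim's 1 minute least common denominator):

```python
# Map a sample time (seconds since start of logging) to (block, slot)
# indices for the pack-into-a-new-dimension scheme, assuming one common
# clock and 60-second blocks along the unlimited "time" dimension.
BLOCK_SECONDS = 60

def block_indices(t_seconds, rate_hz):
    """Return (time index, within-block index) for a sample at t_seconds."""
    block = int(t_seconds // BLOCK_SECONDS)
    slot = int(round((t_seconds - block * BLOCK_SECONDS) * rate_hz))
    return block, slot

# A 10 Hz sample taken 90.0 s after the start: block 1, slot 300.
print(block_indices(90.0, 10.0))
# A 5-second (0.2 Hz) sample at 125 s: block 2, slot 1.
print(block_indices(125.0, 0.2))
```

Because every rate is referenced to the same clock, the slot index is
always an integer and never overflows the fixed per-rate dimension.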

However, if high-speed acquisition uses different clocks then the
instruments do not share an exact common time base.

Lindsay Pender has 2 solutions
==============================
1. The easiest solution may be to record the time base for each data
component sampled at a different rate and from a different clock.

Lindsay>> What I have
>> >considered doing, is to collect data from the various instruments into fixed
>> >length blocks, and then after some suitable time writing all of the data to a
>> >netCDF file with the now known dimensions. Using this scheme, I would have to
>> >carry an extra variable for each instrument - the time stamp for each block.
>> >
I believe that Lindsay is suggesting something like this ...(rough CDL)
dimensions:
xsample_no = 300 // say, no. of samples in blocks of x
ysample_no = 40 // say, no. of samples in blocks of y

// These are the user defined no. of blocks to read from each
// instrument before writing out to a netCDF file
indexx = 1000 // say
indexy = 500 // say

variables:
// Instrument #1 data
float signalx(indexx,xsample_no,other dims)
long timex(indexx) // time stamps for each block

// Instrument #2 data
float signaly(indexy,ysample_no,other dims)
long timey(indexy) // time stamps for each block

This will require the acquisition program to count the number of
samples and write out a netCDF file at appropriate times. Application
programs will need to use the individual time stamps for each block
of data from each instrument.
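A rough sketch of that bookkeeping (block sizes and names are invented
for illustration, following the signalx/timex layout above):

```python
# Collect (time, value) samples into fixed-length blocks, stamping each
# block with the time of its first sample, as in Lindsay's scheme.
XSAMPLE_NO = 300   # hypothetical samples per block for instrument x

def blockify(samples, block_len):
    """Split (time, value) samples into full fixed-length blocks.

    Returns (blocks, stamps); an incomplete trailing block is held
    back until enough samples arrive to fill it."""
    blocks, stamps = [], []
    for i in range(0, len(samples) - block_len + 1, block_len):
        chunk = samples[i:i + block_len]
        stamps.append(chunk[0][0])            # timestamp of first sample
        blocks.append([v for _, v in chunk])  # signal values only
    return blocks, stamps

samples = [(t, float(t) * 0.1) for t in range(650)]  # 650 fake samples
blocks, timex = blockify(samples, XSAMPLE_NO)
# Two full blocks of 300 samples; 50 samples held over for the next write.
print(len(blocks), timex)
```

Once `indexx` blocks have accumulated, `blocks` maps onto
`signalx(indexx, xsample_no)` and `stamps` onto `timex(indexx)`.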

If data acquisition is fast and processor time is limited, it may be
necessary to write all data to a binary file and later convert
to a netCDF file.

2. PADDING (Info directly from Lindsay)

When sampling rates do not have a common clock one could use
Tim Holt's scheme by rounding up the block length for each
instrument such that in the chosen block time interval
(common for all instruments) each sample from each
instrument was guaranteed to fit within the block. Note now that
the number of samples in consecutive blocks may be different,
depending upon the relative timing of the block and instrument
sampling. This can be handled by using _FillValue for the unused
samples in the block.
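In Python terms the padding might look like this (FILL_VALUE stands in
for the variable's _FillValue attribute; the rounded-up block length is
invented):

```python
FILL_VALUE = -9999.0   # stand-in for the netCDF variable's _FillValue
BLOCK_LEN = 152        # worst-case rounded-up length for one instrument

def pad_block(samples, block_len=BLOCK_LEN, fill=FILL_VALUE):
    """Pad a variable-length block of samples out to the fixed length."""
    if len(samples) > block_len:
        raise ValueError("block length was not rounded up far enough")
    return samples + [fill] * (block_len - len(samples))

# Consecutive blocks may carry, say, 150 or 151 real samples; both fit,
# and readers skip trailing fill values when unpacking the block.
block = pad_block([20.1] * 150)
print(len(block), block[-1])
```

Readers then simply ignore any trailing fill values when reconstructing
the time series for each block.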

============end of file=======

==============================================================================
Phil Morgan    mail:  CSIRO Oceanography                        _--_|\
                      GPO Box 1538,                            /      \
                      Hobart Tas 7008, AUSTRALIA               \_.--._/
               email: morgan@drought.ml.csiro.au                     -----
               phone: (002) 206236  +61 02 206236                 \   /
               fax:   (002) 240530  +61 02 240530                  \*/
==============================================================================