Re: [netcdf-java] point data

To: netcdf-java <netcdf-java@xxxxxxxxxxxxxxxx>
Subject: Re: [netcdf-java] point data
From: Rich Signell <rsignell@xxxxxxxx>
Date: Mon, 25 Jan 2010 10:42:58 -0500

NetCDF-Java folk,

I'm trying to figure out how best to store the Global and US "Surface
summary of day data" at:

    http://www.ncdc.noaa.gov/oa/climate/climatedata.html#daily

in NetCDF format with the CDM Point Feature type conventions:

     http://www.unidata.ucar.edu/software/netcdf-java/CDM/CFpoints.html

This is daily-averaged surface data (temperature, air pressure, etc.) that
starts in 1929 with just a few stations and now has thousands of
global stations.  It's stored on an FTP site with a directory for
each year, each containing gzip-compressed text files, one per
station.  The files in the 2010 directory are replaced every few days
with updated files.

In their present form the compressed text files take up 2.9GB, but a
single NetCDF file with 22 vars x 81 years x 365 days x 10,000 stations
would hold about 6.5 billion values, or roughly 26GB as 4-byte floats
without compression.

So looking at the Point Data specs, it seems we could take several approaches:

1. Write with fixed (time, station) dimensions, fill missing values with
NaN, and use NetCDF-4 deflate compression.
2. Use 5.8.2.2 Ragged array (contiguous) representation
3. Use 5.8.2.3 Ragged array (non-contiguous) representation
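
To make the difference concrete, here is a minimal sketch of the
contiguous layout (option 2) in plain Python, not NetCDF-Java code.
The names row_size, temp, station_slice and append_obs and the sample
values are made up for illustration. Each station's records sit in one
block of a flat array, and a per-station count says how long each
block is:

```python
from itertools import accumulate

# Hypothetical sample: three stations with 3, 1 and 2 daily records.
row_size = [3, 1, 2]                          # records per station, in station order
temp = [10.0, 11.0, 12.0, 5.0, 20.0, 21.0]    # all station blocks concatenated


def station_slice(row_size, station):
    """Return (start, stop) of one station's block in the flat array."""
    starts = [0] + list(accumulate(row_size))
    return starts[station], starts[station + 1]


def append_obs(values, row_size, station, value):
    """Add one record to a station; every later block shifts by one."""
    _, stop = station_slice(row_size, station)
    values.insert(stop, value)   # O(n): moves all records stored after this block
    row_size[station] += 1


first_block = station_slice(row_size, 1)      # block of the middle station
append_obs(temp, row_size, 0, 13.0)           # new record for the first station
shifted_block = station_slice(row_size, 2)    # last station's block has moved
```

Reading one station is a single slice, but adding a record to any
station except the last means moving everything stored after it.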

Since the records in the files are updated regularly, perhaps option
2 is out, so I'm leaning toward option 3, in which each data variable
has just one dimension and all the station data is written into it,
with another variable that specifies the station ID each record
corresponds to.
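
As I understand option 3, it would look roughly like the sketch below.
This is plain Python rather than the actual CDM/CF encoding, and the
names (station_id, station_index, time, temp), the station labels "A"
and "B", and the values are all made up for illustration:

```python
# One entry per station (hypothetical labels).
station_id = ["A", "B"]

# One entry per observation; all three lists share the single "obs" dimension.
station_index = []   # index into station_id for each observation
time = []            # day number of each observation (arbitrary reference)
temp = []            # one example data variable


def append_obs(station, day, value):
    """Append one record at the end, whichever station it belongs to."""
    station_index.append(station)
    time.append(day)
    temp.append(value)


def series(station):
    """Collect (day, value) pairs for one station by scanning the index."""
    return [(t, v) for s, t, v in zip(station_index, time, temp) if s == station]


# Records can arrive interleaved, as when files are refreshed every few days.
append_obs(0, 1, 10.0)
append_obs(1, 1, 5.0)
append_obs(0, 2, 11.0)
append_obs(1, 2, 6.0)

series_a = series(0)
series_b = series(1)
```

New records are only ever appended, so nothing already written has to
move. The cost is that reading one station's time series means
scanning or indexing the station_index variable.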

Does this sound right?

Thanks,
Rich
-- 
Dr. Richard P. Signell   (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598

Follow-Ups:
- Re: [netcdf-java] point data
  - From: John Caron
