NetCDF-Java folk,

I'm trying to figure out how best to store the Global and US "Surface summary of day data" at:
http://www.ncdc.noaa.gov/oa/climate/climatedata.html#daily
in NetCDF format with the CDM Point Feature type conventions:
http://www.unidata.ucar.edu/software/netcdf-java/CDM/CFpoints.html

This is daily-averaged surface data (temp, air pressure, etc.) that starts in 1929 with just a few stations, and now has thousands of global stations. It's stored on an FTP site with a directory for each year, each containing gzip-compressed text files, one per station. The files in the 2010 directory are replaced every few days with new, updated files.

In their present form the compressed text files take up 2.9GB, but if we made a single NetCDF file with 22 vars x 81 years x 10,000 stations it would be 29TB without compression.

So looking at the Point Data specs, it seems we could take several approaches:

1. Write with fixed time,station dimensions, fill missing values with NaN, and use the NetCDF4 deflation.
2. Use 5.8.2.2 Ragged array (contiguous) representation.
3. Use 5.8.2.3 Ragged array (non-contiguous) representation.

Since the records in the files are updated regularly, perhaps option 2 is out, so I'm leaning toward option 3, in which you have just one dimension for each data variable and write all the station data into it, but you have another variable which specifies the station ID each value corresponds to.

Does this sound right?

Thanks,
Rich

--
Dr. Richard P. Signell   (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598
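
To make option 3 concrete, here is a minimal sketch of the non-contiguous (indexed) ragged layout in plain Python, with lists standing in for NetCDF variables. The station IDs, dates, and temperatures are made-up sample values, and the names `station_index`, `append_obs`, `series_for`, and `dense_size_bytes` are hypothetical, not taken from the CDM spec. The last helper is a back-of-envelope size estimate for option 1, assuming one 4-byte value per variable per day and 365 days per year; under those assumptions it gives about 26 GB, and other assumptions would change the total.

```python
# Station dimension: one entry per station.
station_ids = []
# Observation dimension: every variable below has one entry per observation.
station_index = []   # index into station_ids for each observation
obs_time = []        # observation date (ISO string, for illustration only)
obs_temp = []        # daily mean temperature (made-up values)


def append_obs(station_id, day, temp):
    """Append one observation; records may arrive in any station order."""
    if station_id not in station_ids:
        station_ids.append(station_id)
    station_index.append(station_ids.index(station_id))
    obs_time.append(day)
    obs_temp.append(temp)


def series_for(station_id):
    """Collect the (time, temp) pairs belonging to one station."""
    s = station_ids.index(station_id)
    return [(t, v) for i, t, v in zip(station_index, obs_time, obs_temp) if i == s]


def dense_size_bytes(n_vars, n_years, n_stations,
                     days_per_year=365, bytes_per_value=4):
    """Rough uncompressed size of a fixed (time, station) array per variable."""
    return n_vars * n_years * days_per_year * n_stations * bytes_per_value


# Hypothetical sample records; the last one arrives late and is simply appended.
for rec in [
    ("030050", "1929-01-01", 4.1),
    ("030050", "1929-01-02", 3.8),
    ("725090", "2010-06-01", 18.3),
    ("030050", "2010-06-01", 11.0),
]:
    append_obs(*rec)

print(station_ids)
print(station_index)
print(series_for("030050"))
print(dense_size_bytes(22, 81, 10_000))
```

Because each observation carries its own station index, updated files can be appended along the observation dimension without rewriting earlier stations' data, which is the property that makes this layout attractive when records change regularly.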