John,

Below are the answers to your questions -- let me know if it's not enough info.

Lauren

======================================
Lauren E. Hay, Ph.D.        Tel: (303) 236-7279
U.S. Geological Survey      Fax: (303) 236-5034
Box 25046, MS 412, DFC      Email: lhay@xxxxxxxx
Lakewood, CO 80225
======================================

From: John Caron <caron@xxxxxxxxxxxxxxxx>
To: Rich Signell <rsignell@xxxxxxxx>
Cc: netcdf-java <netcdf-java@xxxxxxxxxxxxxxxx>, Roland Viger <rviger@xxxxxxxx>, Steven Markstrom <markstro@xxxxxxxx>, Lauren E Hay <lhay@xxxxxxxx>, Nate Booth <nlbooth@xxxxxxxx>
Date: 01/25/2010 10:21 AM
Subject: Re: [netcdf-java] point data

> Hi Rich and all:
>
> This is an interesting challenge: getting good read response on such a
> large dataset. First, you have to decide what kinds of queries you want
> to support and what kind of response time is needed. I have generally
> assumed that the common queries you want to optimize are:
>
> 1) get data over a time range for all stations in a lat/lon box.
> 2) get data for a single station over a time range, or for all time.
> 3) get data for a specified list of stations.
>
> Usually I would break the data into multiple files based on time range,
> aiming for a file size of 50-500 MB. I also use a different format for
> current vs. archived data, so that the current dataset can be added to
> dynamically, while the archived data is rewritten (once) for speed of
> retrieval. Again, it all depends on what queries you want to optimize,
> so I'll wait for your thoughts on that.

We ran into this problem in the past, so we made a separate file for each station and each variable. Is there a problem with having too many files? Can we have a file by year that contains only stations with data for that year? Or -- if we don't care how many files -- one file per station per variable per year? It does not matter to me. The current project will have data for a set time period, but we hope to use this structure for other projects whose files will be updated as new data is collected.

> Another question is what clients need to access this data. Are you
> writing your own web service, do you just want remote access from the
> IDV, or ??

We anticipate that our web services will use the OPeNDAP API. I'm not the person to answer this one.

> I would think that if we're careful, we can get netCDF-4 sizes that are
> similar to compressed text, but we'll have to experiment. The data
> appears to be integer or float with a fixed dynamic range, which is
> amenable to storing as an integer with scale/offset. Integer data
> compresses much better than floating point, due to the noise in the low
> bits of the mantissa. So one task you should get started on is to
> examine each field and decide its data type. If it is floating point,
> decide on its range and the number of significant bits.
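For concreteness, here is a minimal sketch of the scale/offset packing described above: given a field's known range and a bit count, derive the packing parameters and round-trip one value. The scale_factor and add_offset names follow the standard CF packing convention; the field itself (a temperature with a -60..60 degree C range, packed into a 16-bit short) is a hypothetical example, not taken from the thread.

    // Minimal sketch of CF-style scale/offset packing: derive
    // scale_factor and add_offset from a field's range and bit count,
    // then round-trip one sample value. The field is hypothetical.
    public class ScaleOffsetDemo {
      public static void main(String[] args) {
        double min = -60.0, max = 60.0;   // known data range (degrees C)
        int nbits = 16;                   // pack into a signed short

        // Reserve the most negative short (Short.MIN_VALUE) as _FillValue,
        // leaving 2^16 - 2 usable packed values.
        long nvalues = (1L << nbits) - 2;
        double scaleFactor = (max - min) / nvalues;   // ~0.00183
        double addOffset = (max + min) / 2.0;         // 0.0

        // pack: round((x - add_offset) / scale_factor)
        double x = 21.347;
        short packed = (short) Math.round((x - addOffset) / scaleFactor);

        // unpack: packed * scale_factor + add_offset;
        // round-trip error is at most scale_factor / 2 (~0.0009 C here)
        double unpacked = packed * scaleFactor + addOffset;
        System.out.printf("x=%f packed=%d unpacked=%f%n", x, packed, unpacked);
      }
    }

Packing this way keeps only the significant bits, which is what should let netCDF-4's deflate get close to compressed-text sizes, though as John says, you'd have to experiment.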