Hi Tom-

On 6/21/12 3:13 PM, Tom Kunicki wrote:
Do your files happen to have an unlimited dimension when it is not required?
Probably most have an unlimited time dimension - in some cases the files are still being appended to, in others not. For context, I'm looking at this from the RAMADDA side, but I would assume that the TDS has the same issues, since they use (essentially) the same code.
In the past we've had performance issues dealing with static data sets, only to later realize the slow load times were due to the reading of data associated with an unlimited dimension (i.e. "time"). When a dimension is unlimited, the values associated with it are stored sparsely throughout the file (in the classic format, record variables are interleaved record by record). Converting the unlimited dimension to fixed significantly decreased the time-to-open for these files (the values for the "time" axis are then stored contiguously, no longer sparsely). You'll want unlimited if you intend to append data along that dimension in the future; otherwise, make it fixed if you're concerned about performance on initial open.
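A quick way to check whether a file carries a record dimension is to open it and ask. A minimal sketch against the netCDF-Java 4.3 API (the class name here is just for illustration):

import java.io.IOException;
import ucar.nc2.Dimension;
import ucar.nc2.NetcdfFile;

public class CheckUnlimited {
  public static void main(String[] args) throws IOException {
    NetcdfFile ncf = NetcdfFile.open(args[0]);  // open read-only
    try {
      // getUnlimitedDimension() returns null when every dimension is fixed-size
      Dimension unlimited = ncf.getUnlimitedDimension();
      if (unlimited == null) {
        System.out.println("All dimensions are fixed");
      } else {
        System.out.println("Unlimited dimension: " + unlimited.getShortName()
            + " (current length " + unlimited.getLength() + ")");
      }
    } finally {
      ncf.close();
    }
  }
}

To convert an existing file, recent versions of the nccopy utility from the netCDF C distribution accept -u, which rewrites unlimited dimensions as fixed-size ones.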
That makes sense, and if it has to seek far into the 3.2 GB file, I can see where that would matter. However, I still think most of the time is related to OS caching. For example, on my 3.2 GB file (with an unlimited dimension), the first time I run my sample program it takes ~50 seconds to open the file using either method (FeatureDataset or GridDataset). I exit the program (so there's no VM/netCDF caching) and run it again, and it takes < 0.5 seconds.
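For the record, the timing test only needs a few lines. This is a sketch along the lines of the test described, not the attached TestOpen.java; run it twice, from separate JVM invocations, to see the cold- vs. warm-cache difference:

import ucar.nc2.dt.grid.GridDataset;

public class TimeGridOpen {
  public static void main(String[] args) throws Exception {
    long start = System.nanoTime();
    // open as a grid dataset (throws IOException on failure)
    GridDataset gds = GridDataset.open(args[0]);
    long elapsedMs = (System.nanoTime() - start) / 1000000;
    System.out.println("GridDataset.open took " + elapsedMs + " ms");
    gds.close();
  }
}

The first invocation pays the disk I/O cost; the second typically hits the OS page cache, which matches the ~50 s vs. < 0.5 s numbers above.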
Don
Tom Kunicki
Center for Integrated Data Analytics
U.S. Geological Survey
8505 Research Way
Middleton, WI 53562

On Jun 21, 2012, at 4:13 PM, Don Murray wrote:

Just as a followup, the attached program tests the speed of opening a file using the method in FeatureScan vs. GridDataset.open. In my test, the latter is actually faster by a few milliseconds. The real slowdown is the initial OS caching of the file (in this case a 3.3 GB file). Once the file is in the OS cache, both methods are pretty quick. Thanks to John (and Roland) for their help.

Don

On 6/20/12 8:14 PM, John Caron wrote:

On 6/19/2012 3:19 PM, Don Murray wrote:

Hi- I have a bunch of netCDF files and I want to quickly determine whether they are grids, trajectories, or point features. For grids, I've been using GridDataset gds = GridDataset.open(path) and catching the exception if it's not a grid, but for a 3.3 GB file that can take 2 minutes (or longer) to open and create the dataset if it is a grid. I was wondering if there's a quicker method of determining the feature type of a netCDF file. Thanks for your help.

Don

Hi Don: The most convenient thing is to use ToolsUI / FeatureTypes / FeatureScan and give it a file or directory. It will try to figure out the type and report on what it finds. The code is in ucar.nc2.ft.scan.FeatureScan.java; you can copy the parts you need. It's an ongoing process - I think I'm not doing it as well as it can be done. Send me reports on files it misidentifies.

John

<TestOpen.java>
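For the original question - programmatically determining the feature type - the factory manager can do the scan without committing to grids up front. A sketch, assuming the netCDF-Java 4.x ucar.nc2.ft API (class name is illustrative):

import java.util.Formatter;
import ucar.nc2.constants.FeatureType;
import ucar.nc2.ft.FeatureDataset;
import ucar.nc2.ft.FeatureDatasetFactoryManager;

public class WhatFeatureType {
  public static void main(String[] args) throws Exception {
    Formatter errlog = new Formatter();
    // FeatureType.ANY lets the registered factories try each feature type
    // in turn, which is roughly what FeatureScan drives
    FeatureDataset fd = FeatureDatasetFactoryManager.open(
        FeatureType.ANY, args[0], null, errlog);
    if (fd == null) {
      System.out.println("No feature type recognized: " + errlog);
    } else {
      System.out.println("Feature type: " + fd.getFeatureType());
      fd.close();
    }
  }
}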
--
Don Murray
NOAA/ESRL/PSD and CIRES
303-497-3596
http://www.esrl.noaa.gov/psd/people/don.murray/