Hi Tom-

On 6/21/12 3:13 PM, Tom Kunicki wrote:
Do your files happen to have an unlimited dimension when it is not required?
Probably most have an unlimited time dimension - in some cases the files are still being appended to, in others not. For context, I'm looking at this from the RAMADDA side, but I would assume that the TDS has the same issues, since they use (essentially) the same code.
In the past we've had performance issues dealing with static data sets, only to later realize the slow load times were due to the reading of data associated with an unlimited dimension (i.e. "time"). When a dimension is unlimited, the values associated with it are stored sparsely throughout the file (in the classic format, record variables are interleaved record by record). Converting the unlimited dimension to fixed significantly decreased the time-to-open for these files (the values for the "time" axis are then stored contiguously, no longer sparsely). You'll want unlimited if you intend to append data along that dimension in the future; otherwise, make it fixed if you're concerned about performance on initial open.
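A quick way to check whether a file carries a record dimension is to open it and ask. A minimal sketch against the netCDF-Java 4.3 API (the class name here is just for illustration):

import java.io.IOException;
import ucar.nc2.Dimension;
import ucar.nc2.NetcdfFile;

public class CheckUnlimited {
  public static void main(String[] args) throws IOException {
    NetcdfFile ncf = NetcdfFile.open(args[0]);  // open read-only
    try {
      // getUnlimitedDimension() returns null when every dimension is fixed-size
      Dimension unlimited = ncf.getUnlimitedDimension();
      if (unlimited == null) {
        System.out.println("All dimensions are fixed");
      } else {
        System.out.println("Unlimited dimension: " + unlimited.getShortName()
            + " (current length " + unlimited.getLength() + ")");
      }
    } finally {
      ncf.close();
    }
  }
}

To convert an existing file, recent versions of the nccopy utility from the netCDF C distribution accept -u, which rewrites unlimited dimensions as fixed-size ones.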
That makes sense, and if it has to seek far into the 3.2 GB file, I can see where that would matter. However, I still think most of the time is related to OS caching. For example, on my 3.2 GB file (with an unlimited dimension), the first time I run my sample program it takes ~50 seconds to open the file using either method (FeatureDataset or GridDataset). I exit the program (so there's no VM/netCDF caching) and run it again, and it takes < 0.5 seconds.
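For the record, the timing test only needs a few lines. This is a sketch along the lines of the test described, not the attached TestOpen.java; run it twice, from separate JVM invocations, to see the cold- vs. warm-cache difference:

import ucar.nc2.dt.grid.GridDataset;

public class TimeGridOpen {
  public static void main(String[] args) throws Exception {
    long start = System.nanoTime();
    // open as a grid dataset (throws IOException on failure)
    GridDataset gds = GridDataset.open(args[0]);
    long elapsedMs = (System.nanoTime() - start) / 1000000;
    System.out.println("GridDataset.open took " + elapsedMs + " ms");
    gds.close();
  }
}

The first invocation pays the disk I/O cost; the second typically hits the OS page cache, which matches the ~50 s vs. < 0.5 s numbers above.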
Don
Tom Kunicki
Center for Integrated Data Analytics
U.S. Geological Survey
8505 Research Way
Middleton, WI 53562

On Jun 21, 2012, at 4:13 PM, Don Murray wrote:

Just as a followup, the attached program tests the speed of opening a file using the method in FeatureScan vs. GridDataset.open. In my test, the latter is actually faster by a few milliseconds. The real slowdown is the initial OS caching of the file (in this case a 3.3 GB file). Once the file is in the OS cache, both methods are pretty quick. Thanks to John (and Roland) for their help.

Don

On 6/20/12 8:14 PM, John Caron wrote:

On 6/19/2012 3:19 PM, Don Murray wrote:

Hi- I have a bunch of netCDF files and I want to quickly determine whether they are grids, trajectories, or point features. For grids, I've been using GridDataset gds = GridDataset.open(path) and catching the exception if it's not a grid, but for a 3.3 GB file that can take 2 minutes (or longer) to open and create the dataset if it is a grid. I was wondering if there's a quicker method of determining the feature type of a netCDF file. Thanks for your help.

Don

Hi Don: The most convenient thing is to use ToolsUI / FeatureTypes / FeatureScan and give it a file or directory. It will try to figure out the type and report on what it finds. The code is in ucar.nc2.ft.scan.FeatureScan.java; you can copy the parts you need. It's an ongoing process - I think I'm not doing it as well as it can be done. Send me reports on files it misidentifies.

John

<TestOpen.java>
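For the original question - programmatically determining the feature type - the factory manager can do the scan without committing to grids up front. A sketch, assuming the netCDF-Java 4.x ucar.nc2.ft API (class name is illustrative):

import java.util.Formatter;
import ucar.nc2.constants.FeatureType;
import ucar.nc2.ft.FeatureDataset;
import ucar.nc2.ft.FeatureDatasetFactoryManager;

public class WhatFeatureType {
  public static void main(String[] args) throws Exception {
    Formatter errlog = new Formatter();
    // FeatureType.ANY lets the registered factories try each feature type
    // in turn, which is roughly what FeatureScan drives
    FeatureDataset fd = FeatureDatasetFactoryManager.open(
        FeatureType.ANY, args[0], null, errlog);
    if (fd == null) {
      System.out.println("No feature type recognized: " + errlog);
    } else {
      System.out.println("Feature type: " + fd.getFeatureType());
      fd.close();
    }
  }
}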
--
Don Murray
NOAA/ESRL/PSD and CIRES
303-497-3596
http://www.esrl.noaa.gov/psd/people/don.murray/