
Re: [netcdf-java] Reading very large THREDDS catalogs...

On 9/22/2011 8:18 AM, Roland Schweitzer wrote:
On 09/22/2011 09:13 AM, Roland Schweitzer wrote:
Hi,

Some folks at NCAR have put together a THREDDS catalog (http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml) which I would like to read to prepare configuration information for LAS. The catalog consists of 3000+ catalogRef elements that point to other local catalogs. When running through this catalog doing the obvious thing:
    List<InvDataset> datasets = catalog.getDatasets();
    for (Iterator<InvDataset> iterator = datasets.iterator(); iterator.hasNext();) {
        InvDataset invDataset = iterator.next();
        System.out.println("\t" + invDataset.getName());
    }
Addendum:

Of course, you have to actually look at the datasets in the sub-catalogs to force the catalogRef to be read... like this:
    List<InvDataset> datasets = catalog.getDatasets();
    for (Iterator<InvDataset> iterator = datasets.iterator(); iterator.hasNext();) {
        InvDataset invDataset = iterator.next();
        System.out.println("\t" + invDataset.getName());
        List<InvDataset> subDatasets = invDataset.getDatasets();
        for (Iterator<InvDataset> subIt = subDatasets.iterator(); subIt.hasNext();) {
            InvDataset subDataset = subIt.next();
            System.out.println("\t\t" + subDataset.getName());
        }
    }

But the point is the same: the JVM heap gets larger as each successive dataset (catalogRef) is read, as observed by setting the JVM options to log garbage collection. This makes sense in that the catalogRef gets read and the information is kept in memory. The problem is that eventually you will run out of heap; when you run out depends on how much memory you give the JVM.
If folks are going to be publishing catalogs this large, we need some way to read them in a memory-efficient way. I know that once I reach the bottom of the loop I'm finished with that dataset, and it would be ok with me to boot it out of memory, but I haven't figured out a clever way to do that.

What are the options for reading such a large catalog using the Java-netCDF tools?
Roland
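
As a rough sketch of the "boot it out of memory" idea above: instead of expanding each catalogRef through the parent catalog (which keeps every sub-catalog reachable from the top-level tree), read each referenced catalog with a standalone factory call and let it go out of scope once processed. This is an untested illustration, not code from the thread; the getXlinkHref() and getBaseURI() calls are assumptions about the thredds.catalog API in netCDF-Java 4.2, so check them against the javadoc.

    import java.net.URI;
    import thredds.catalog.InvCatalogFactory;
    import thredds.catalog.InvCatalogImpl;
    import thredds.catalog.InvCatalogRef;
    import thredds.catalog.InvDataset;

    public class LargeCatalogWalk {
        public static void main(String[] args) {
            InvCatalogFactory factory = InvCatalogFactory.getDefaultFactory(false); // no validation
            InvCatalogImpl top =
                factory.readXML("http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml");

            for (InvDataset ds : top.getDatasets()) {
                if (ds instanceof InvCatalogRef) {
                    InvCatalogRef ref = (InvCatalogRef) ds;
                    // Resolve the catalogRef href against the parent catalog's base URI.
                    URI target = top.getBaseURI().resolve(ref.getXlinkHref());
                    // Read the sub-catalog on its own rather than expanding it through the
                    // parent tree; no reference to it survives this block, so it can be
                    // garbage-collected before the next catalogRef is read.
                    InvCatalogImpl sub = factory.readXML(target.toString());
                    System.out.println("\t" + ref.getName());
                    for (InvDataset subDataset : sub.getDatasets()) {
                        System.out.println("\t\t" + subDataset.getName());
                    }
                } else {
                    System.out.println("\t" + ds.getName());
                }
            }
        }
    }

If the factory turns out to cache catalogs internally, creating a new InvCatalogFactory per sub-catalog would be the simple fallback.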
TDS >= 4.2.8 has a new option to turn off catalog caching, added to support ESG catalogs. Use it when you have a large number of static catalogs to minimize memory use. It seems to work AFAICT, with a minor performance penalty.

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html#CatalogCaching
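
For reference, that setting lives in the TDS's threddsConfig.xml. Based on the page linked above it should look roughly like the snippet below, but check that page for the exact element names and default values:

    <!-- threddsConfig.xml: disable in-memory caching of static catalogs (TDS >= 4.2.8) -->
    <Catalog>
      <cache>false</cache>
    </Catalog>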


