
Re: [netcdf-java] Reading very large THREDDS catalogs...

On 9/22/2011 8:18 AM, Roland Schweitzer wrote:
On 09/22/2011 09:13 AM, Roland Schweitzer wrote:
Hi,

Some folks at NCAR have put together a THREDDS catalog (http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml) which I would like to read to prepare configuration information for LAS. The catalog consists of 3000+ catalogRef elements that point to other local catalogs. When running through this catalog doing the obvious thing:
    List<InvDataset> datasets = catalog.getDatasets();
    for (Iterator<InvDataset> iterator = datasets.iterator(); iterator.hasNext();) {
        InvDataset invDataset = iterator.next();
        System.out.println("\t" + invDataset.getName());
    }
Addendum:

Of course, you have to actually look at the datasets in the sub-catalogs to force the catalogRef to be read... like this:
    List<InvDataset> datasets = catalog.getDatasets();
    for (Iterator<InvDataset> iterator = datasets.iterator(); iterator.hasNext();) {
        InvDataset invDataset = iterator.next();
        System.out.println("\t" + invDataset.getName());
        List<InvDataset> subDatasets = invDataset.getDatasets();
        for (Iterator<InvDataset> subIt = subDatasets.iterator(); subIt.hasNext();) {
            InvDataset subDataset = subIt.next();
            System.out.println("\t\t" + subDataset.getName());
        }
    }

But the point is the same: the JVM heap gets larger as each successive dataset (catalogRef) is read, as observed by setting the JVM options to log garbage collection. This makes sense in that the catalogRef gets read and the information is kept in memory. The problem is that eventually you will run out of heap; when you run out depends on how much memory you give the JVM.
If folks are going to be publishing catalogs this large, we need some way to read them in a memory-efficient way. I know that once I reach the bottom of the loop I'm finished with that dataset, and it would be ok with me to boot it out of memory, but I haven't figured out a clever way to do that.

What are the options for reading such a large catalog using the Java-netCDF tools?
Roland
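
As a rough sketch of the "boot it out of memory" idea above: instead of expanding each catalogRef through the parent catalog (which keeps every sub-catalog reachable from the top-level tree), read each referenced catalog with a standalone factory call and let it go out of scope once processed. This is an untested illustration, not code from the thread; the getXlinkHref() and getBaseURI() calls are assumptions about the thredds.catalog API in netCDF-Java 4.2, so check them against the javadoc.

    import java.net.URI;
    import thredds.catalog.InvCatalogFactory;
    import thredds.catalog.InvCatalogImpl;
    import thredds.catalog.InvCatalogRef;
    import thredds.catalog.InvDataset;

    public class LargeCatalogWalk {
        public static void main(String[] args) {
            InvCatalogFactory factory = InvCatalogFactory.getDefaultFactory(false); // no validation
            InvCatalogImpl top =
                factory.readXML("http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml");

            for (InvDataset ds : top.getDatasets()) {
                if (ds instanceof InvCatalogRef) {
                    InvCatalogRef ref = (InvCatalogRef) ds;
                    // Resolve the catalogRef href against the parent catalog's base URI.
                    URI target = top.getBaseURI().resolve(ref.getXlinkHref());
                    // Read the sub-catalog on its own rather than expanding it through the
                    // parent tree; no reference to it survives this block, so it can be
                    // garbage-collected before the next catalogRef is read.
                    InvCatalogImpl sub = factory.readXML(target.toString());
                    System.out.println("\t" + ref.getName());
                    for (InvDataset subDataset : sub.getDatasets()) {
                        System.out.println("\t\t" + subDataset.getName());
                    }
                } else {
                    System.out.println("\t" + ds.getName());
                }
            }
        }
    }

If the factory turns out to cache catalogs internally, creating a new InvCatalogFactory per sub-catalog would be the simple fallback.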
TDS >= 4.2.8 has a new option to turn off catalog caching, added to support ESG catalogs. Use it when you have a large number of static catalogs to minimize memory use. It seems to work AFAICT, with a minor performance penalty.

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html#CatalogCaching
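
For reference, that setting lives in the TDS's threddsConfig.xml. Based on the page linked above it should look roughly like the snippet below, but check that page for the exact element names and default values:

    <!-- threddsConfig.xml: disable in-memory caching of static catalogs (TDS >= 4.2.8) -->
    <Catalog>
      <cache>false</cache>
    </Catalog>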


