Re: [netcdf-java] Reading very large THREDDS catalogs...

On 09/22/2011 09:13 AM, Roland Schweitzer wrote:
Hi,

Some folks at NCAR have put together a THREDDS catalog (http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml) which I would like to read to prepare configuration information for LAS. The catalog consists of 3000+ catalogRef elements that point to other local catalogs. When running through this catalog doing the obvious thing:

    List<InvDataset> datasets = catalog.getDatasets();
    for (Iterator<InvDataset> iterator = datasets.iterator(); iterator.hasNext();) {
        InvDataset invDataset = iterator.next();
        System.out.println("\t" + invDataset.getName());
    }

Addendum:

Of course, you have to actually ask for the datasets in the sub-catalogs to trigger the read of each catalogRef... Like this:

    List<InvDataset> datasets = catalog.getDatasets();
    for (Iterator<InvDataset> iterator = datasets.iterator(); iterator.hasNext();) {
        InvDataset invDataset = iterator.next();
        System.out.println("\t" + invDataset.getName());
        // Asking a catalogRef for its datasets is what triggers the read of the sub-catalog.
        List<InvDataset> subDatasets = invDataset.getDatasets();
        for (Iterator<InvDataset> subIt = subDatasets.iterator(); subIt.hasNext();) {
            InvDataset subDataset = subIt.next();
            System.out.println("\t\t" + subDataset.getName());
        }
    }

But the point is the same.

the JVM heap grows as each successive dataset (catalogRef) is read, which I observed by turning on garbage-collection logging in the JVM. This makes sense: each catalogRef gets read and the resulting information is kept in memory. The problem is that eventually you run out of heap; when that happens depends only on how much memory you give the JVM.
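
For anyone reproducing this, the standard HotSpot GC-logging flags look something like the following (the heap size and main class here are just examples):

    java -verbose:gc -XX:+PrintGCDetails -Xmx512m CatalogCrawl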

If folks are going to publish catalogs this large, we need some way to read them in a memory-efficient way. I know that once I reach the bottom of the loop I'm finished with that dataset, and it would be fine with me to boot it out of memory, but I haven't figured out a clever way to do that.
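
One idea, sketched below, is to skip the lazy expansion of each catalogRef entirely and instead read each referenced catalog standalone, so that nothing links the sub-catalog back into the parent and it becomes eligible for garbage collection as soon as the loop iteration finishes. This is a rough, untested sketch: the CatalogCrawl class is made up, and it assumes InvCatalogFactory.readXML() and InvCatalog.resolveUri() behave the way I expect.

    import java.net.URI;
    import thredds.catalog.InvCatalogFactory;
    import thredds.catalog.InvCatalogImpl;
    import thredds.catalog.InvCatalogRef;
    import thredds.catalog.InvDataset;

    public class CatalogCrawl {
        public static void main(String[] args) throws Exception {
            String top = "http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml";
            InvCatalogFactory factory = InvCatalogFactory.getDefaultFactory(false); // no validation
            InvCatalogImpl catalog = factory.readXML(top);

            for (InvDataset invDataset : catalog.getDatasets()) {
                System.out.println("\t" + invDataset.getName());
                if (invDataset instanceof InvCatalogRef) {
                    // Resolve the catalogRef's href against the parent catalog and read it
                    // standalone, instead of calling getDatasets() on the ref, which would
                    // cache the sub-catalog inside the parent.
                    InvCatalogRef ref = (InvCatalogRef) invDataset;
                    URI uri = catalog.resolveUri(ref.getXlinkHref());
                    InvCatalogImpl sub = factory.readXML(uri.toString());
                    for (InvDataset subDataset : sub.getDatasets()) {
                        System.out.println("\t\t" + subDataset.getName());
                    }
                    // 'sub' goes out of scope here with no reference from the parent
                    // catalog, so it should be collectible on the next GC.
                }
            }
        }
    }

Whether this actually keeps the heap flat is something I would want to verify against the GC log.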

What are the options for reading such a large catalog using the Java-netCDF tools?

Roland

_______________________________________________
netcdf-java mailing list
netcdf-java@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/


