
Re: FW: Catalog generator crawling remote OPeNDAP/DODS servers



Hi Bas,

Your plan sounds good. Here are a few comments and answers:

1) We have some JUnit tests in the test/src directory that might help you get going. Take a look at test/src/thredds/cataloggen/TestCollectionLevelScanner.java. If you can get that running and play with it some, it should help you understand CollectionLevelScanner and CrawlableDataset. Some of the files in test/src/thredds/crawlabledataset might be helpful to look at as well. Most of the tests use data from the test/data directory (including some data to crawl), so you should have everything you need to get the tests working for you. There are a few spots in the crawlabledataset tests that use data local to my machine (sorry, we're still trying to clean up our testing).

2) The ant build script, build.xml, contains a "makeWar" target for building the thredds.war file. Also, "compile" and "test-setup" might come in handy for getting the above tests running ("cleanCompileTest" does both after a clean). If you have trouble with that, I can certainly build a .war with your files included. Just let me know.

3) Hmm. Nothing jumps out at me as to why your "Test" class wouldn't work. Does it actually fail to generate a catalog? Or just generate a skeleton catalog?

4) Yeah, the collection, catalog, and current levels are kind of confusing. Basically, if you have a hierarchical data collection (I'll talk in terms of CrawlableDatasetFile, where the collection is represented by a local file directory structure containing the data files), the directory at the top of your data collection would be the collection level. CollectionLevelScanner only scans one level (directory) at a time. The current level is the directory that you want to scan. The catalog level is where it gets confusing. The issue arises when you want to build a catalog that shows multiple levels of the collection. To do this, you need to construct a CollectionLevelScanner and call generateCatalog() on it for each level to be cataloged and then piece the resulting catalogs together (see StandardCatalogBuilder for an example of this). Because the service base URLs might be relative, you need to know the relative location of the datasets currently being crawled to construct the dataset URL. The catalog level specifies that relative location. You leave the catalog level null if you are not building a multi-level catalog.

If I ever get around to refactoring this stuff, this is one of the things I might rethink. Sorry it is confusing. Hope this explanation helps.
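As a self-contained analogy (using plain java.io.File rather than the THREDDS classes, so the names here are illustrative, not the actual API), the one-level-at-a-time scan and the role the catalog level plays can be sketched like this:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LevelScanSketch {

    // Scan exactly one directory level, the way CollectionLevelScanner
    // scans one level per call. Each child is reported by its path
    // relative to the collection root -- the piece of information a
    // relative service base URL needs, and the role the "catalog level"
    // plays when a multi-level catalog is stitched together.
    public static List<String> scanOneLevel(File collectionLevel, File currentLevel) {
        List<String> relativePaths = new ArrayList<>();
        File[] children = currentLevel.listFiles();
        if (children == null)
            return relativePaths; // currentLevel is not a directory
        String root = collectionLevel.getAbsolutePath();
        for (File child : children) {
            String relative = child.getAbsolutePath()
                    .substring(root.length() + 1)
                    .replace(File.separatorChar, '/');
            relativePaths.add(relative + (child.isDirectory() ? "/" : ""));
        }
        return relativePaths;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        // Collection level and current level start out the same. To show a
        // sub-directory's contents in the same catalog, you would scan again
        // with that sub-directory as the current level (and the same
        // collection root) and merge the results, as StandardCatalogBuilder
        // does with the real classes.
        for (String path : scanOneLevel(root, root))
            System.out.println(path);
    }
}
```

This is only a sketch of the crawl pattern, not of the actual CollectionLevelScanner signatures; the real class also takes the service and filter into account.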

5) You can see the XML representation of an InvCatalog by creating an InvCatalogFactory

     InvCatalogFactory fac = InvCatalogFactory.getDefaultFactory( false );

and calling one of the writeXML() methods on it.

Hope this all helps. Let me know if you have more questions.

Ethan

Bas Retsios wrote:
Hello Ethan,

Thank you for your e-mail.
I hope you have some time and patience to help me, as I am now convinced that the best approach is to go with a CrawlableDataset.

I have come up with the following approach:
- Understand how the class CrawlableDatasetFile works
- For this, I will need to easily "run" a part of THREDDS from within Eclipse. As I cannot find existing code that does this, I will write my own "Test" class to perform some calls involving CrawlableDatasetFile.
- After understanding the existing functionality, I will write a similar class, named e.g. CrawlableDatasetDods. If more classes are needed, I will write them as well. As much as possible, I will copy code from DodsDirDatasetSource and similar classes.
- I will change my "Test" class to make calls that involve the new CrawlableDatasetDods classes.
- All new CrawlableDatasetDods-related classes will be tested extensively with my "Test" class, using at least two different DODS servers as a source.
- As I do not know how to build the thredds.war file, I will submit all classes to you, and would then like you to send me a new .war file containing all of THREDDS.
- If you think the classes I have created are useful, you can include them in the official THREDDS release. I have to discuss the copyright with my boss, but we do have an "open-source" policy for everything we develop.
- Unless I have badly misestimated the time needed for the above, my intention is to have everything ready within one week.

Please inform me if you have a better approach.

I have started writing a simple "Test" class. See the attachment, Test.java. In it, you will see that I could not figure out how to properly use CollectionLevelScanner, CrawlableDataset, InvService and InvCatalogImpl. Could you please help me improve the "Test" class, in particular in the following respects:

1. I have no data that CrawlableDatasetFile could crawl. Do you have a link to some data that I can extract onto my hard disk?
2. Help me understand collectionPath, collectionLevel and catalogLevel by changing my code in Test.java (note that I read the JavaDoc of CollectionLevelScanner before writing Test.java).
3. Point me to some code with which I could verify that a proper InvCatalogImpl was generated (other than catalog.getName()).

Thanks in advance,

Bas Retsios.

==
Software Developer
IT Department, Sector Remote Sensing & GIS
International Institute for Geo-information Science and Earth Observation (ITC)
P.O. Box 6,  7500 AA Enschede, The Netherlands
Phone +31 (0)53 4874 573, telefax +31 (0)53 4874 335
E-mail address@hidden, Internet http://www.itc.nl
------------------------------------------------------------------------

package thredds.cataloggen;

import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

import thredds.catalog.InvCatalogImpl;
import thredds.catalog.InvService;
import thredds.crawlabledataset.CrawlableDatasetFactory;
import thredds.crawlabledataset.CrawlableDataset;
import thredds.crawlabledataset.CrawlableDatasetFilter;

public class Test {

    public static void main(String[] args) throws IOException,
            IllegalArgumentException, ClassNotFoundException,
            NoSuchMethodException, IllegalAccessException,
            InvocationTargetException, InstantiationException {

        String collectionPath = "file:///d:/thredds/";
        CrawlableDataset collectionLevel = CrawlableDatasetFactory
                .createCrawlableDataset("file:///d:/thredds/", null, null);
        CrawlableDataset catalogLevel = CrawlableDatasetFactory
                .createCrawlableDataset("file:///d:/thredds/", null, null);

        CrawlableDataset currentLevel = null;
        CrawlableDatasetFilter filter = null;

        InvService service = new InvService("my_service", "File",
                "file:///d:/", null, null);

        CollectionLevelScanner cls = new CollectionLevelScanner(collectionPath,
                collectionLevel, catalogLevel, currentLevel, filter, service);
        cls.scan();
        InvCatalogImpl catalog = cls.generateCatalog();
        String s = catalog.getName();
        int i = 0;
    }

}

--
Ethan R. Davis                                Telephone: (303) 497-8155
Software Engineer                             Fax:       (303) 497-8690
UCAR Unidata Program Center                   E-mail:    address@hidden
P.O. Box 3000
Boulder, CO  80307-3000                       http://www.unidata.ucar.edu/
---------------------------------------------------------------------------