[thredds] Set harvest attribute using datasetScan

Hi all, I'm setting up a geospatial data and metadata portal based on a
THREDDS catalog and the GeoNetwork engine and web application. I am working
on Linux CentOS and my applications are deployed with Tomcat 8.

I am populating a THREDDS catalog from a filesystem containing
meteorological data. GeoNetwork then harvests the catalog and populates the
application. However, since I'm updating the data on the web side, I would
like each dataset to be harvested only once.

I tried to set the 'harvest' attribute in the catalog, but without
success. Here's an excerpt of my catalog.xml file:

  <datasetScan name="AUXILIARY" ID="testAUXILIARY"
               path="AUXILIARY" location="content/testdata/auxiliary-aux"
               harvest="true">
    <metadata inherited="true">
      <serviceName>all</serviceName>
      <dataType>Grid</dataType>
      <dataFormatType>NetCDF</dataFormatType>
      <DatasetType harvest="true"></DatasetType>
      <harvest>true</harvest>
      <keyword>WRF outputs</keyword>
      <documentation type="summary">This is a summary for my test ARPA
        catalog for WRF runs. Runs are made at 12Z and 00Z, with analysis and
        forecasts every 6 hours out to 60 hours. Horizontal = 93 by 65
        points, resolution 81.27 km, LambertConformal projection. Vertical =
        1000 to 100 hPa pressure levels.</documentation>
      <timeCoverage>
        <end>present</end>
        <duration>5 years</duration>
      </timeCoverage>
      <variables vocabulary="GRIB-1" />
      <variables vocabulary="">
        <variable name="Z_sfc" vocabulary_name="Geopotential H"
                  units="gpm">Geopotential height, gpm</variable>
      </variables>
    </metadata>

    <filter>
      <include wildcard="*wrfout_*"/>
    </filter>

    <addDatasetSize/>
    <addTimeCoverage
        datasetNameMatchPattern="([0-9]{2})_([0-9]{4})-([0-9]{2})-([0-9]{2})_([0-9]{2}):([0-9]{2}):([0-9]{2})"
        startTimeSubstitutionPattern="$2-$3-$4T$5:00:00"
        duration="6 hours" />

    <namer>
      <regExpOnName
          regExp="([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})"
          replaceString="WRF $1-$2-$3T$4:00:00" />
      <regExpOnName
          regExp="([0-9]{2})_([0-9]{4})-([0-9]{2})-([0-9]{2})_([0-9]{2}):([0-9]{2}):([0-9]{2})"
          replaceString="WRF Domain-$1 $2-$3-$4T$5:00:00" />
    </namer>

  </datasetScan>


Even though I set harvest="true" on the datasetScan, the attribute is not
inherited by the generated datasets, so the harvester does not pick up the
files. I could simply ignore the 'harvest' attribute while harvesting, but
my aim is to harvest only new files, using an auxiliary catalog that
contains symbolic links (and updating the THREDDS path after harvesting).
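
To make the goal concrete, this is roughly what I would expect each
generated dataset entry to look like if the attribute were propagated. It is
a hand-written sketch, not output from my server: the file name and ID are
made up, and the urlPath simply reuses the path of the scan above.

```xml
<!-- Hypothetical entry: name, ID and urlPath are placeholders, not real output -->
<dataset name="wrfout_d01_2018-01-01_00:00:00"
         ID="testAUXILIARY/wrfout_d01_2018-01-01_00:00:00"
         urlPath="AUXILIARY/wrfout_d01_2018-01-01_00:00:00"
         harvest="true"/>
```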

Am I missing something? How would you systematically add the harvest
attribute to all inner datasets in a nested filesystem? Or would it make
sense to create two catalogs using the time filter options (e.g. everything
up to yesterday in one catalog, and today's files in another)? Could you
show me an example of how to use those filters in a datasetScan?

Many thanks,
Chiara



-- 
Chiara Scaini