
Re: [uaf_tech] Next UAF telcon: June 10th, 12:30pm EDT



Hi John,

Thanks for weighing in; that was helpful. Since you ended with "Not sure if I covered all the issues," can we circle back to what this says about the original issue that Rich raised?

The choice to have TDS translate
      <end>present</end>
      <duration>7 days</duration>
into
      Start: 2010-06-03 12:04:57Z
      End: 2010-06-10 12:04:57Z
      Duration: 7 days
has implications for data discovery services and crawling.  While the first encoding ("present" with a duration) remains true when new files are added to the underlying aggregation, the second encoding has to be altered or it becomes out of date.   Does Unidata envision that metadata harvesters will ping these datasets on a regular basis to get the updated information?  Is there (or should there be) metadata in the THREDDS catalog to tell crawlers which datasets require periodic pinging and at what frequency?  Is RAMADDA sensitive to these issues?  In short, what are your thoughts on the data discovery process for datasets that extend to "the present"?

    - Steve

===========================

John Caron wrote:
Hi Rich, et al:

I agree that modifying NcML in the TDS when files arrive is not a viable solution. You need to use a scan element for this, although we are replacing <scan> elements with <collection> elements (in FMRC right now, will be extended to other aggregations in 4.3).

1) Specifying the time range in the catalog for this case is possible. Here's how we do it on motherlode:

       <timeCoverage>
         <end>present</end>
         <duration>7 days</duration>
       </timeCoverage>

This means that the starting time is "present" minus 7 days. The TDS generates the actual ISO dates in the catalog, e.g., at this moment:

TimeCoverage:

Start: 2010-06-03 12:04:57Z
End: 2010-06-10 12:04:57Z
Duration: 7 days

A bit more detail at:

http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/v1.0.2/InvCatalogSpec.html#timeCoverageType
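The translation the TDS applies here is simply "start equals end minus duration, with 'present' resolved at catalog-generation time". A minimal sketch of that rule in Python (the function name is illustrative, not the TDS API):

```python
from datetime import datetime, timedelta, timezone

def resolve_time_coverage(end="present", duration_days=7):
    """Resolve a timeCoverage with end="present" and a fixed duration
    into concrete start/end instants, mirroring what the TDS writes
    into the generated catalog. Illustrative only, not the TDS code."""
    if end == "present":
        end_time = datetime.now(timezone.utc)
    else:
        end_time = datetime.fromisoformat(end)
    start_time = end_time - timedelta(days=duration_days)
    return start_time, end_time

start, end = resolve_time_coverage()
```

Note that the result is only correct at the instant it is computed, which is exactly the staleness problem raised above.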


2) One can also generate time ranges from the filename; see "Adding timeCoverage" in

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/DatasetScan.html

This is used when you have files with the starting time embedded in the filename and a known duration.
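As a rough sketch of that idea in Python (the yyyymmdd_hhmm filename pattern, the sample filename, and the function name are assumptions for illustration, not the datasetScan syntax):

```python
import re
from datetime import datetime, timedelta

def time_coverage_from_filename(filename, duration_hours=24):
    """Derive a start/end time range from a timestamp embedded in a
    filename plus a known duration -- the same idea the datasetScan
    "addTimeCoverage" element implements. The yyyymmdd_hhmm pattern
    here is an assumption for illustration."""
    m = re.search(r"(\d{8})_(\d{4})", filename)
    if m is None:
        raise ValueError("no timestamp found in filename: %s" % filename)
    start = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M")
    return start, start + timedelta(hours=duration_hours)

start, end = time_coverage_from_filename("ncom_region1_20100609_0000.nc")
```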


3) We are moving towards automatic generation of the time coverage. As Rich mentioned, we do that now in the FMRC, and we will try to extend it to other aggregations where the time coordinate can be extracted.

Not sure if I covered all the issues.

John

Rich Signell wrote:
Guys,

Sorry to send this twice, but I wanted to cc John Caron and Ethan
Davis to allow them to comment.

-Rich

On Wed, Jun 9, 2010 at 6:02 PM, Rich Signell <address@hidden> wrote:
Ted,

With time aggregations, the virtual dataset is served dynamically via
THREDDS as new data arrives without modifying the underlying catalog
that specifies the aggregation.  We don't want to be modifying NcML
in the catalog every time a file arrives.  So it seems we have two
choices: (1) have the crawler actually read the last time value; since
the data are CF-compliant, this is easy (there is a NetCDF-Java
function for it), and I think both ncISO and RAMADDA already do this.
(2) We ask Unidata to modify the TDS so that it automatically
generates the stop time as THREDDS metadata.  It already does this for
FMRC aggregations.  On the plus side, this ensures that we get the
right time without reading the time values.  The disadvantage is that
it would only work for TDS-served data.
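For option (1), the crawler-side logic amounts to decoding the last value of the CF time coordinate. NetCDF-Java does this properly; the sketch below is a toy Python version that handles only the common "unit since date" units form, just to show how little is needed:

```python
import re
from datetime import datetime, timedelta

# Toy decoder for a CF time coordinate's last value. Real crawlers
# (ncISO, RAMADDA via NetCDF-Java) handle calendars and more unit
# forms; this covers only "<unit> since <yyyy-mm-dd[ hh:mm:ss]>".
_UNITS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

def cf_end_time(time_values, units):
    m = re.match(r"(\w+)\s+since\s+([\d-]+)(?:[ T]([\d:]+))?", units)
    if m is None:
        raise ValueError("unsupported units string: %s" % units)
    unit, date, clock = m.group(1), m.group(2), m.group(3) or "00:00:00"
    epoch = datetime.strptime(date + " " + clock, "%Y-%m-%d %H:%M:%S")
    return epoch + timedelta(seconds=time_values[-1] * _UNITS[unit])

end = cf_end_time([0, 6, 12, 18], "hours since 2010-06-09 00:00:00")
```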

-Rich

On Wed, Jun 9, 2010 at 5:42 PM, Ted Habermann <address@hidden> wrote:
Rich et al.,

Seems to me our first choice should be to use an existing standard for
describing time periods. In my experience the most commonly used is ISO
8601. Describing time periods of known duration is straightforward if we
know the starting point. For example, a period with a duration of 7 days
starting today would be 20100609/P7D. There are probably a couple of ways
to express this explicitly in NcML:

<attribute name="time_coverage_start" value="2010-06-09"/>
<attribute name="time_coverage_duration" value="P7D"/>

or, it may make sense to just calculate the end time and write it into the
file:

<attribute name="time_coverage_start" value="2010-06-09"/>
<attribute name="time_coverage_end" value="2010-06-16"/>
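Either way, the end-time calculation is simple date arithmetic on the ISO 8601 duration. A minimal sketch (it handles only day- and week-based durations like P7D; a real ISO 8601 parser covers years, months, and time components too):

```python
import re
from datetime import date, timedelta

def add_iso_duration(start, duration):
    """Compute an explicit end date from a start date plus an ISO 8601
    duration such as "P7D". Toy parser: only PnW and PnD forms."""
    m = re.fullmatch(r"P(?:(\d+)W)?(?:(\d+)D)?", duration)
    if m is None:
        raise ValueError("unsupported duration: %s" % duration)
    weeks, days = int(m.group(1) or 0), int(m.group(2) or 0)
    return start + timedelta(weeks=weeks, days=days)

end = add_iso_duration(date(2010, 6, 9), "P7D")
```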

If we are dealing with collection-level NcML (?), one could say
<attribute name="time_coverage_start" value="present"/>
<attribute name="time_coverage_duration" value="P7D"/>

I'm not sure offhand how this would get translated to ISO. Maybe
<gmd:temporalElement>
  <gmd:EX_TemporalExtent>
    <gmd:extent>
      <gml:TimePeriod gml:id="t3">
        <gml:beginPosition indeterminatePosition="now"/>
        <gml:endPosition>P7D</gml:endPosition>
      </gml:TimePeriod>
    </gmd:extent>
  </gmd:EX_TemporalExtent>
</gmd:temporalElement>

Ted






On 6/9/2010 12:34 PM, Steve Hankin wrote:

David Neufeld wrote:

Hi Rich, Steve,

I think if we move toward a model where metadata is handled as a service,
as opposed to a static file, this problem starts to go away.

Agree in principle.  I have argued this same point of view with Ted -- that
we should not insist that metadata be inserted into files if that metadata
is derivable from information already contained in the file.

Ideas for implementing this approach?  The most appealing to me is that TDS,
itself, would generate data discovery metadata such as

time_coverage_start = "present minus 30 days";   // a running archive
time_coverage_end = "present plus 10 days";   // a forecast

based upon coordinates and use metadata found inside the dataset, and
perhaps some new ncML directives that govern the "metadata service".  But
the questions remain: who would do this work and when?  And what should UAF
do in the interim (i.e. now)?
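As a sketch, resolving such expressions is straightforward; note that the "present minus/plus N days" phrasing is only the proposal above, not an existing convention:

```python
import re
from datetime import datetime, timedelta, timezone

def resolve_relative_time(expr):
    """Resolve hypothetical attribute values like "present minus 30 days"
    or "present plus 10 days" to a concrete UTC instant. The phrase
    grammar is the proposal above, not an existing convention."""
    if expr == "present":
        return datetime.now(timezone.utc)
    m = re.fullmatch(r"present (minus|plus) (\d+) days", expr)
    if m is None:
        raise ValueError("unparseable time expression: %s" % expr)
    delta = timedelta(days=int(m.group(2)))
    now = datetime.now(timezone.utc)
    return now - delta if m.group(1) == "minus" else now + delta
```

The open questions (who implements this in the TDS, and when) stand regardless of the notation.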

    - Steve

So for example, if we generate metadata dynamically and it contains the
standard static attributes alongside dynamically retrieved values for
geographic and temporal bounds, then we're in good shape at the catalog
level.  There is still the issue of how often to harvest the metadata in
other clearinghouses like RAMADDA or Geonetwork, but that can be left more
for the portal provider to determine.

Dave

On 6/9/2010 10:39 AM, Steve Hankin wrote:


Rich Signell wrote:

UAF Folks,

I can't make the 12:30 ET/9:30 PT meeting tomorrow, but here are my two
issues:

Hi Rich,

Sorry you cannot make it.  With that in mind, I have started the conversation here by email ...
here by email ...

1) How to handle temporal metadata for time aggregated datasets that are
changing every day (or perhaps every 15 min for the HF Radar measurements).
I got bit by this when I did a temporal/geospatial search in RAMADDA for
UAF data in the Gulf of Mexico during the last week and turned up no
datasets.  It should have turned up the NCOM Region 1 model data, HF radar
data and USGS COAWST model results.   I'm pretty sure the problem is that
RAMADDA harvested the data from the clean catalog more than a week ago, so
the "stop dates" in the metadata database are older than one week ago.   How
should this best be fixed?

Might this be best addressed by using the Unidata Data Discovery Attribute
recommendations
(http://www.unidata.ucar.edu/software/netcdf-java/formats/DataDiscoveryAttConvention.html)?
They offer the global attribute:

   time_coverage_end = "present"

Arguably within UAF we should insert such global attributes into the
relevant datasets and also work to communicate the need back to the data
providers to do so on their own THREDDS servers.  An alternative to consider
is putting this information into the THREDDS metadata instead of into the
ncML of the dataset.

btw: A seeming omission in the Unidata recommendations is any way to
represent "3 months ago" as the start time.  A start time of this style is
pretty common in operational outputs.


2) How to represent FMRC data.   If we scan a catalog with a Forecast Model
Run Collection, we currently get hundreds of datasets, because the FMRC
automatically produces datasets for the daily forecasts as well as the "Best
Time Series" dataset that most people are interested in.   In the latest
version of the THREDDS Data Server (4.2 beta), the provider can specify that
only the best time series dataset be exposed.   This will help
significantly, but it will take a while to get everybody with FMRCs
retrofitted.   I will bring this up on the Model Data Interoperability Google
Group.

Might be best to hold off on this topic until you are on the phone, since you
are our resident expert.  No?

   - Steve

--
==== Ted Habermann ===========================
     Enterprise Data Systems Group Leader
     NOAA, National Geophysical Data Center
     V: 303.497.6472   F: 303.497.6513
     "I entreat you, I implore you, I exhort you,
     I challenge you: To speak with conviction.
     To say what you believe in a manner that bespeaks
     the determination with which you believe it.
     Because contrary to the wisdom of the bumper sticker,
     it is not enough these days to simply QUESTION AUTHORITY.
     You have to speak with it, too."
     Taylor Mali, www.taylormali.com
==== address@hidden ==================


--
Dr. Richard P. Signell   (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598






NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.