Re: [thredds] Problem between OPeNDAP and TDS when netCDF file is modified

On 05/02/2012 04:28 PM, Hoop wrote:
All,

This is my latest in a now monthly series of requests for help with
doing aggregations with our TDS.  The problem I first reported back
on 23 February, wherein aggregations don't notice time steps added
to the final file in the time series, is unresolved.  Since I last
wrote (4 April), we upgraded to 4.2.10.  There was no effect that we
could discern.  Whether we use NcML or FeatureCollection, new time
steps in the final file go unnoticed until Tomcat is restarted.
Fabulously, if a new file is added without restarting Tomcat, the
initial time steps in the new final file are added to the aggregation,
leaving a gap where the time steps added to the previous "final" file
since the last Tomcat restart should be.  This leads to complaints of
the aggregation not being CF-compliant, since it appears to have
uneven spacing in time.

Interestingly, doing the aggregation in RAMADDA works as we would
expect, since it is frequently rebuilding the aggregation.  So, while
it is perhaps less efficient than TDS, at least it is reliable.

-Hoop

Ethan,

I've deleted Claude's original post from the cascade below, and neatened up
the Subject: line, which will no doubt screw up threading.  In any case, our
web system administrator tells me that we had such a NetcdfFileCache element
all along, with maxFiles set to 0 (I don't have all of our TDS config files
in front of me, alas).  I also found, in the online documentation at:

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html#DiskCache

the following:

  <FeatureCollection>
    <dir>/tomcat_home/content/thredds/cache/collection/</dir>
    <maxSize>20 Mb</maxSize>
    <jvmPercent>2</jvmPercent>
  </FeatureCollection>

Eliminating the <dir> and <jvmPercent> elements, and setting maxSize to zero
(and, hopefully, putting it in the correct TDS config file. >SIGH<), we
restarted Tomcat.  The results were initially disheartening, as a timestep
added to the final file an hour before was not included in the
featureCollection aggregation, but was picked up by the NcML aggregation of
the same time series.
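
If I've understood the documentation correctly, the element our
administrator should have ended up with in threddsConfig.xml is simply:

  <FeatureCollection>
    <maxSize>0</maxSize>
  </FeatureCollection>

i.e., the collection disk cache capped at zero, so nothing stale can be
served out of it.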

I just checked again (about an hour later), and it's still not part of the
featureCollection aggregation.  So, we still have no solution AFAIK.  Of
course, I don't know which TDS config file our web system administrator
put the maxSize element in.  The web page above says it should have gone in:

${tomcat_home}/content/thredds/threddsConfig.xml

Our web system administrator has gone home for the day, but I've asked him
in e-mail just which config file he put that element in.

-Hoop

On 04/04/12 14:51, Hoop wrote:
Ethan,

Thanks for responding.  I'm dubious that this will be effective.
Our web system administrator looked around and found a different
cache directory for collections.  When he cleared this out, the
new invocation of Tomcat resulted in the missing timesteps finding
their way into the featureCollection aggregation.  It thus strikes
me that this is indeed the cache that needs to be cleaned out and/or
disregarded by the aggregation-making daemon process.

Nonetheless, we'll try it and get back to you.

-Hoop

---------------------------- Original message -------------------------------
Re: [thredds] Pb between OpenDap and THREDDS when netcdf file are modifed

     * To: thredds@xxxxxxxxxxxxxxxx
     * Subject: Re: [thredds] Pb between OpenDap and THREDDS when netcdf file
are modifed
     * From: Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
     * Date: Wed, 04 Apr 2012 13:55:46 -0600

Hi Hoop,

Try turning off the NetcdfFile caching in your threddsConfig.xml by
setting NetcdfFileCache/maxFiles to zero:

   <NetcdfFileCache>
     <maxFiles>0</maxFiles>
   </NetcdfFileCache>

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html#FileCache

This will turn off the NetcdfFile cache globally but not the aggregation
caches. There may be some performance issues in turning this off but we
suspect that OS file caching may make it negligible.

Let us know what you see. I'll get back to you on the XML checker stuff
in another email.

Ethan

On 04/03/12 11:47, Hoop wrote:
Ethan,

Additional information:  our web system administrator checked the
logs, and found that the software daemon that is supposed to check
and rebuild the aggregation if need be was indeed running, but
finding nothing to do.  Worse, restarting Tomcat, which with NcML
aggregation would pick up the more recent time steps, did not change
things.  The time series still ends 2012/03/28, as it did
when I first created the featureCollection version of the
aggregation, even though the final file has added five time steps.
The NcML version of the aggregation did pick up the new time steps
when Tomcat was restarted.

Hoping for a detailed response,
-Hoop

On 04/02/12 11:39, Hoop wrote:
Ethan,

Well, that got me just where NcML aggregation got me: an aggregation
that does not notice new timesteps added to the latest file.  It also
created two new time-like variables (time_offset and time_run) and
threw away most of the metadata I had for the time variable.  My only
reason for using "Latest" instead of letting it default to "Penultimate"
was in the forlorn hope of getting my second value of the attribute
time:actual_range picked up.

I am still getting the same error messages from the XML checker
that TDS runs on its configuration files.  I wonder if I'm ever
going to hear back about this difference that makes a difference
between the published XSDs and the online-documentation.  Here are
the error messages:

[2012-03-29T19:16:15GMT]
readCatalog(): full path=/usr/share/tomcat5/content/thredds/catalog.xml;
path=catalog.xml
readCatalog(): valid catalog -- ----Catalog Validation version 1.0.01
*** XML parser error (36:14)= cvc-complex-type.2.4.a: Invalid content
was found starting with element 'filter'. One of
'{"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addLatest,
"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addProxies,
"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addDatasetSize,
"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addTimeCoverage}'
is expected.
*** XML parser error (54:50)= cvc-complex-type.2.4.a: Invalid content
was found starting with element 'update'. One of
'{"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":fmrcConfig,
"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":pointConfig,
"http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2":netcdf}' is
expected.

readCatalog(): full
path=/usr/share/tomcat5/content/thredds/enhancedCatalog.xml;
path=enhancedCatalog.xml
readCatalog(): valid catalog -- ----Catalog Validation version 1.0.01

-Hoop

------ original message --------------
Hi Hoop,

Try adding the following to your featureCollection element

   <metadata inherited="true">
     <serviceName>all</serviceName>
   </metadata>

Also, since your most recent dataset is the one that is changing, you
might want to change protoDataset@choice from "Latest" to "Penultimate"
(which is the default, so you could just drop protoDataset all
together). Also, since data files in your dataset don't age off, it
probably isn't too important which dataset is used but probably better
to not use the one that gets updated. The protoDataset is used to
populate the metadata in the feature dataset.
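
In other words, something like:

   <protoDataset choice="Penultimate" />

or just leave the protoDataset element out entirely and get the same
behavior.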

Since your datasets are a simple timeseries rather than a full-blown
FMRC, you will probably want to add

   <fmrcConfig datasetTypes="Best"/>

The fmrcConfig@datasetTypes value tells the featureCollection which
types of FMRC datasets to create. With the value "Best", the forecast
types are left off and only the "Best Time Series" dataset is created.
Not the best dataset name for a simple time series grid (it's not just
the best time series, it's the only one!) but that's what we have for the
moment. If you want to let people see the underlying files, you could
add "Files" to the fmrcConfig@datasetTypes value.
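
For example (just a sketch; adjust to taste):

   <fmrcConfig datasetTypes="Best Files"/>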

I'm including the link to the FeatureCollection tutorial [1] which I
forgot to point out in an earlier email when I gave you the link to the
reference docs [2].

Hope that helps,

Ethan

[1]
http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/tutorial/FeatureCollectionsTutorial.html

[2]
http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/collections/FeatureCollections.html

On 3/26/2012 11:13 AM, Hoop wrote:
Ethan,

The catalog is attached.  The filter element is in a datasetScan
element that we use to generically wrap our NetCDF files, and
not included within the featureCollection element or any other
aggregation element.  It is meant to generally apply throughout our
installation.

Sample files may be obtained from:

    ftp://ftp.cdc.noaa.gov/Datasets/noaa.oisst.v2.highres/
The files for this year are updated on a daily basis, barring
problems.

Let me know what else I can do to help.

-Hoop

On 03/24/12 23:02, thredds-request@xxxxxxxxxxxxxxxx wrote:
Today's Topics:
    5. Re: Pb between OpenDap and THREDDS when netcdf file are
       modifed (Ethan Davis)
----------------------------------------------------------------------
Message: 5
Date: Sat, 24 Mar 2012 23:02:53 -0600
From: Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
To: thredds@xxxxxxxxxxxxxxxx
Subject: Re: [thredds] Pb between OpenDap and THREDDS when netcdf file
   are modifed
Message-ID: <4F6EA6FD.8080906@xxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1

Hi Hoop,

Can you send us (or point us to) a few sample files and send us your
full catalog?

Is the filter you mention below part of your featureCollection element?

Ethan

On 3/9/2012 1:59 PM, Hoop wrote:
Ethan,

I don't believe John ever responded as you had requested.
I did my best to try "featureCollection", but I got nowhere.
It doesn't help that the XSDs specify required elements
(for "update" and "filter") that are not mentioned in the
online documentation; the validation process that TDS runs
at start-up informed me of those errors.  I have no clue how
to correct them.  Here is the attempt I made:

<featureCollection name="SST_NOAA_OISST_V2_HighResFC" featureType="FMRC"
  harvest="true" path="Datasets/aggro/OISSThires.nc">
  <collection
   spec="/Datasets/noaa.oisst.v2.highres/sst.day.mean.#yyyy#.v2.nc$"
   name="SST_OISST_V2_HighResFC" olderThan="15 min" />
  <protoDataset choice="Latest" change="0 0 7 * * ? *" />
  <update startup="true" rescan="0 0 * * * ? *" />
</featureCollection>

My use of "filter" is as follows:

      <filter>
         <include wildcard="*.nc"/>
         <exclude wildcard="*.data"/>
         <exclude wildcard="*.f"/>
         <exclude wildcard="*.gbx"/>
         <exclude wildcard="*.txt"/>
         <exclude wildcard="README"/>
      </filter>

Someone want to tell me what I did wrong in each case?

Thanks,
-Hoop

-------- Original Message --------
Subject:        Re: [thredds] Pb between OpenDap and THREDDS when
netcdf file are modifed
Date:   Thu, 23 Feb 2012 22:03:38 -0700
From:   Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
To:     thredds@xxxxxxxxxxxxxxxx

Hi Hoop,

The dynamic dataset handling in the NcML aggregation code was designed
to deal with the appearance of new datasets more than data being
appended to existing datasets. The NcML aggregations are also limited to
straightforward aggregations based on homogeneity of dimensions and
coordinate variables; they don't use any coordinate system or higher
level feature information that might be available. This makes straight
NcML aggregation somewhat fragile and hard to generalize to more complex
situations.

FeatureCollections are designed to use the CDM's understanding of
coordinate systems and feature types to both simplify configuration and
make aggregations more robust and general.

While the FMRC collection capability was designed for a time series of
forecast runs, I believe it should handle a simple time series of grids
as well. (John, can you add more information on this?)

Ethan

On 2/23/2012 3:21 PM, Hoop wrote:
Ethan,

This reminds me of an issue we are having, with version 4.2.7.
Here is the relevant snippet from our config:
<dataset name="SST NOAA OISST V2 HighRes" ID="SST_OISST_V2_HighRes"
         urlPath="Datasets/aggro/OISSThires.nc" serviceName="odap"
         dataType="grid">
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <aggregation dimName="time" type="joinExisting" recheckEvery="15 min">
            <scan location="/Projects/Datasets/noaa.oisst.v2.highres/"
                  regExp="sst\.day\.mean\.....\.v2\.nc$" subdirs="false"/>
        </aggregation>
    </netcdf>
</dataset>

The behavior we are getting in our time series, which is based on
NetCDF files with a year's worth of time steps (or less), is as follows:
in between reboots of Tomcat, new time steps added to the latest file
are not added to the aggregation.  However, if the calendar marches along
and a new file for a new year is added to our archive without rebooting
Tomcat, the timesteps for the new file are added, without the ones that
would complete the previous year, resulting in a discontinuity along the
time axis.  And someone somewhere may e-mail us complaining that our
OPeNDAP object is not CF-compliant because the time steps aren't all of
the same size.  %}

I looked at the featureCollection documentation link you gave, but since
our data are not forecasts, nor point data, nor in GRIB2 format, that
didn't seem the right fit.  Maybe I'm wrong; I'm severely sleep-deprived
right now....

We also have some time series in monthly files (to keep the individual
file size under 2 Gbytes).  We have not tried aggregating any of those
time series.  Could be an interesting challenge.

Thanks for any help.

-Hoop
_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit:
http://www.unidata.ucar.edu/mailing_lists/
Good Afternoon Hoop,

After reading through the e-mail thread, I'm guessing that the issue you are having relates to a file in the NetcdfFileCache not staying current with your updated final file. Adding a new final file, rather than updating an existing file, doesn't show the same issue because the new file is not cached. I would suggest effectively turning off the NetcdfFileCache by setting the maxFiles parameter to zero in your threddsConfig.xml file:

  <NetcdfFileCache>
    <maxFiles>0</maxFiles>
  </NetcdfFileCache>

That way, the aggregation will not grab a stale file out of the cache.
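
Also, for what it's worth, the XML validator errors you quoted earlier look
like element-ordering problems rather than missing elements.  I haven't
checked this against the published schema, but the messages suggest that
<update> must come before <protoDataset> inside featureCollection, and that
<filter> must come before any addLatest/addProxies/addDatasetSize/
addTimeCoverage elements inside your datasetScan.  So, an untested guess
based only on the validator output:

  <featureCollection name="SST_NOAA_OISST_V2_HighResFC" featureType="FMRC"
    harvest="true" path="Datasets/aggro/OISSThires.nc">
    <collection
     spec="/Datasets/noaa.oisst.v2.highres/sst.day.mean.#yyyy#.v2.nc$"
     name="SST_OISST_V2_HighResFC" olderThan="15 min" />
    <update startup="true" rescan="0 0 * * * ? *" />
    <protoDataset choice="Latest" change="0 0 7 * * ? *" />
  </featureCollection>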

Regards,
  Lansing


