Re: [thredds] Problem between OPeNDAP and TDS when netCDF file is modified

To: thredds@xxxxxxxxxxxxxxxx
Subject: Re: [thredds] Problem between OPeNDAP and TDS when netCDF file is modified
From: Hoop <don.k.hooper@xxxxxxxx>
Date: Wed, 04 Apr 2012 17:13:17 -0600
Ethan,

I've deleted Claude's original post from the cascade below, and neatened up
the Subject: line, which will no doubt screw up threading.  In any case, our
web system administrator tells me that we had such a NetcdfFileCache element
all along, with maxFiles set to 0 (I don't have all of our TDS config files
in front of me, alas).  I also found, in the online documentation at:

http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html#DiskCache

the following:

 <FeatureCollection>
   <dir>/tomcat_home/content/thredds/cache/collection/</dir>
   <maxSize>20 Mb</maxSize>
   <jvmPercent>2</jvmPercent>
 </FeatureCollection>

Eliminating the <dir> and <jvmPercent> elements, and setting maxSize to zero
(and, hopefully, putting it in the correct TDS config file.  >SIGH<), we
restarted Tomcat.  The results were initially disheartening, as a timestep
added to the final file an hour before was not included in the
featureCollection aggregation, but was picked up by the NcML aggregation of
the same time series.
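
One way to pin down exactly which timesteps the featureCollection
aggregation lacks is to compare its time coordinate values against those of
the NcML aggregation.  A minimal sketch in Python, assuming the two lists of
time values have already been read from each aggregation by whatever client
is at hand; the function name and the sample values are hypothetical:

```python
def missing_steps(reference_times, candidate_times):
    """Return the values of reference_times absent from candidate_times, in order."""
    seen = set(candidate_times)
    return [t for t in reference_times if t not in seen]


# Hypothetical time values (e.g. days since some epoch), one list per aggregation.
ncml_times = [77520.0, 77521.0, 77522.0, 77523.0]
fc_times = [77520.0, 77521.0, 77522.0]

# Steps the NcML aggregation has that the featureCollection aggregation does not.
print(missing_steps(ncml_times, fc_times))
```

Both aggregations read the same underlying files, so exact equality on the
values should be adequate here; a tolerance would be needed if the two time
axes used different units.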

I just checked again (about an hour later), and it's still not part of the
featureCollection aggregation.  So, we still have no solution AFAIK.  Of
course, I don't know which TDS config file our web system administrator
put the maxSize element in.  The web page above says it should have gone in:

${tomcat_home}/content/thredds/threddsConfig.xml

Our web system administrator has gone home for the day, but I've asked him
in e-mail just which config file he put that element in.
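
A quick way to see which cache settings a given config file actually carries
is to parse it and list them.  A minimal sketch in Python, using only the
element names quoted in this thread (NetcdfFileCache/maxFiles and
FeatureCollection/maxSize); the helper names and the sample document,
including its root element name, are assumptions for illustration, not a
copy of our configuration:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def cache_settings(xml_text, parents=("NetcdfFileCache", "FeatureCollection")):
    """Map 'Parent/child' to the text of each child found under the named parents."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        if local_name(elem.tag) in parents:
            for child in elem:
                key = f"{local_name(elem.tag)}/{local_name(child.tag)}"
                found[key] = (child.text or "").strip()
    return found


# Hypothetical stand-in for the contents of threddsConfig.xml.
sample = """<threddsConfig>
  <NetcdfFileCache>
    <maxFiles>0</maxFiles>
  </NetcdfFileCache>
  <FeatureCollection>
    <maxSize>0</maxSize>
  </FeatureCollection>
</threddsConfig>"""

for key, value in sorted(cache_settings(sample).items()):
    print(key, "=", value)
```

Running the same function over the text of each candidate config file would
show which one, if any, contains the maxSize element.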

-Hoop

On 04/04/12 14:51, Hoop wrote:
> Ethan,
> 
> Thanks for responding.  I'm dubious that this will be effective.
> Our web system administrator looked around and found a different
> cache directory for collections.  When he cleared this out, the
> new invocation of Tomcat resulted in the missing timesteps finding
> their way into the featureCollection aggregation.  It thus strikes
> me that this is indeed the cache that needs to be cleaned out and/or
> disregarded by the aggregation-making daemon process.
> 
> Nonetheless, we'll try it and get back to you.
> 
> -Hoop
> 
> ---------------------------- Original message -------------------------------
> Re: [thredds] Pb between OpenDap and THREDDS when netcdf file are modifed
> 
>     * To: thredds@xxxxxxxxxxxxxxxx
>     * Subject: Re: [thredds] Pb between OpenDap and THREDDS when netcdf file
> are modifed
>     * From: Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
>     * Date: Wed, 04 Apr 2012 13:55:46 -0600
> 
> Hi Hoop,
> 
> Try turning off the NetcdfFile caching in your threddsConfig.xml by
> setting NetcdfFileCache/maxFiles to zero:
> 
>   <NetcdfFileCache>
>     <maxFiles>0</maxFiles>
>   </NetcdfFileCache>
> 
> http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html#FileCache
> 
> This will turn off the NetcdfFile cache globally but not the aggregation
> caches. There may be some performance issues in turning this off but we
> suspect that OS file caching may make it negligible.
> 
> Let us know what you see. I'll get back to you on the XML checker stuff
> in another email.
> 
> Ethan
> 
> On 04/03/12 11:47, Hoop wrote:
>> Ethan,
>>
>> Additional information:  our web system administrator checked the
>> logs, and found that the software daemon that is supposed to check
>> and rebuild the aggregation if need be was indeed running, but
>> finding nothing to do.  Worse, restarting Tomcat, which with NcML
>> aggregation would pick up the more recent time steps, did not
>> change things.  The time series still ends 2012/03/28, as it did
>> when I first created the featureCollection version of the
>> aggregation, even though the final file has added five time steps.
>> The NcML version of the aggregation did pick up the new time steps
>> when Tomcat was restarted.
>>
>> Hoping for a detailed response,
>> -Hoop
>>
>> On 04/02/12 11:39, Hoop wrote:
>>> Ethan,
>>>
>>> Well, that got me just where NcML aggregation got me: an aggregation
>>> that does not notice new timesteps added to the latest file.  It also
>>> created two new time-like variables (time_offset and time_run) and
>>> threw away most of the metadata I had for the time variable.  My only
>>> reason for using "Latest" instead of letting it default to "Penultimate"
>>> was in the forlorn hope of getting my second value of the attribute
>>> time:actual_range picked up.
>>>
>>> I am still getting the same error messages from the XML checker
>>> that TDS runs on its configuration files.  I wonder if I'm ever
>>> going to hear back about this difference that makes a difference
>>> between the published XSDs and the online-documentation.  Here are
>>> the error messages:
>>>
>>> [2012-03-29T19:16:15GMT]
>>> readCatalog(): full path=/usr/share/tomcat5/content/thredds/catalog.xml;
>>> path=catalog.xml
>>> readCatalog(): valid catalog -- ----Catalog Validation version 1.0.01
>>> *** XML parser error (36:14)= cvc-complex-type.2.4.a: Invalid content
>>> was found starting with element 'filter'. One of
>>> '{"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addLatest,
>>> "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addProxies,
>>> "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addDatasetSize,
>>> "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":addTimeCoverage}'
>>> is expected.
>>> *** XML parser error (54:50)= cvc-complex-type.2.4.a: Invalid content
>>> was found starting with element 'update'. One of
>>> '{"http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":fmrcConfig,
>>> "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0":pointConfig,
>>> "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2":netcdf}' is
>>> expected.
>>>
>>> readCatalog(): full
>>> path=/usr/share/tomcat5/content/thredds/enhancedCatalog.xml;
>>> path=enhancedCatalog.xml
>>> readCatalog(): valid catalog -- ----Catalog Validation version 1.0.01
>>>
>>> -Hoop
>>>
>>> ------ original message --------------
>>> Hi Hoop,
>>>
>>> Try adding the following to your featureCollection element
>>>
>>>   <metadata inherited="true">
>>>     <serviceName>all</serviceName>
>>>   </metadata>
>>>
>>> Also, since your most recent dataset is the one that is changing, you
>>> might want to change protoDataset@choice from "Latest" to "Penultimate"
>>> (which is the default, so you could just drop protoDataset
>>> altogether). Also, since data files in your dataset don't age off, it
>>> probably isn't too important which dataset is used but probably better
>>> to not use the one that gets updated. The protoDataset is used to
>>> populate the metadata in the feature dataset.
>>>
>>> Since your datasets are a simple timeseries rather than a full-blown
>>> FMRC, you will probably want to add
>>>
>>>   <fmrcConfig datasetTypes="Best"/>
>>>
>>> The fmrcConfig@datasetTypes value tells the featureCollection which
>>> types of FMRC datasets to create. With the value "Best", the forecast
>>> types are left off and only the "Best Time Series" dataset is created.
>>> Not the best dataset name for a simple time series grid (it's not just
>>> the best time series, it's the only one!) but that's what we have for the
>>> moment. If you want to let people see the underlying files, you could
>>> add "Files" to the fmrcConfig@datasetTypes value.
>>>
>>> I'm including the link to the FeatureCollection tutorial [1] which I
>>> forgot to point out in an earlier email when I gave you the link to the
>>> reference docs [2].
>>>
>>> Hope that helps,
>>>
>>> Ethan
>>>
>>> [1]
>>> http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/tutorial/FeatureCollectionsTutorial.html
>>>
>>> [2]
>>> http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/collections/FeatureCollections.html
>>>
>>> On 3/26/2012 11:13 AM, Hoop wrote:
>>>> Ethan,
>>>>
>>>> The catalog is attached.  The filter element is in a datasetScan
>>>> element that we use to generically wrap our NetCDF files, and
>>>> not included within the featureCollection element or any other
>>>> aggregation element.  It is meant to generally apply throughout our
>>>> installation.
>>>>
>>>> Sample files may be obtained from:
>>>>
>>>>    ftp://ftp.cdc.noaa.gov/Datasets/noaa.oisst.v2.highres/
>>>>
>>>> The files for this year are updated on a daily basis, barring
>>>> problems.
>>>>
>>>> Let me know what else I can do to help.
>>>>
>>>> -Hoop
>>>>
>>>> On 03/24/12 23:02, thredds-request@xxxxxxxxxxxxxxxx wrote:
>>>>> Message: 5
>>>>> Date: Sat, 24 Mar 2012 23:02:53 -0600
>>>>> From: Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
>>>>> To: thredds@xxxxxxxxxxxxxxxx
>>>>> Subject: Re: [thredds] Pb between OpenDap and THREDDS when netcdf file
>>>>>   are modifed
>>>>> Message-ID: <4F6EA6FD.8080906@xxxxxxxxxxxxxxxx>
>>>>> Content-Type: text/plain; charset=ISO-8859-1
>>>>>
>>>>> Hi Hoop,
>>>>>
>>>>> Can you send us (or point us to) a few sample files and send us your
>>>>> full catalog?
>>>>>
>>>>> Is the filter you mention below part of your featureCollection element?
>>>>>
>>>>> Ethan
>>>>>
>>>>> On 3/9/2012 1:59 PM, Hoop wrote:
>>>>>> Ethan,
>>>>>>
>>>>>> I don't believe John ever responded as you had requested.
>>>>>> I did my best to try "featureCollection", but I got nowhere.
>>>>>> It doesn't help that the XSDs specify required elements
>>>>>> (for "update" and "filter") that are not mentioned in the
>>>>>> online documentation; the validation process that TDS runs
>>>>>> at start-up informed me of those errors.  I have no clue how
>>>>>> to correct them.  Here is the attempt I made:
>>>>>>
>>>>>> <featureCollection name="SST_NOAA_OISST_V2_HighResFC" featureType="FMRC"
>>>>>>  harvest="true" path="Datasets/aggro/OISSThires.nc">
>>>>>>  <collection
>>>>>>   spec="/Datasets/noaa.oisst.v2.highres/sst.day.mean.#yyyy#.v2.nc$"
>>>>>>   name="SST_OISST_V2_HighResFC" olderThan="15 min" />
>>>>>>  <protoDataset choice="Latest" change="0 0 7 * * ? *" />
>>>>>>  <update startup="true" rescan="0 0 * * * ? *" />
>>>>>> </featureCollection>
>>>>>>
>>>>>> My use of "filter" is as follows:
>>>>>>
>>>>>>      <filter>
>>>>>>         <include wildcard="*.nc"/>
>>>>>>         <exclude wildcard="*.data"/>
>>>>>>         <exclude wildcard="*.f"/>
>>>>>>         <exclude wildcard="*.gbx"/>
>>>>>>         <exclude wildcard="*.txt"/>
>>>>>>         <exclude wildcard="README"/>
>>>>>>      </filter>
>>>>>>
>>>>>> Someone want to tell me what I did wrong in each case?
>>>>>>
>>>>>> Thanks,
>>>>>> -Hoop
>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject:        Re: [thredds] Pb between OpenDap and THREDDS when 
>>>>>>> netcdf file are modifed
>>>>>>> Date:   Thu, 23 Feb 2012 22:03:38 -0700
>>>>>>> From:   Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
>>>>>>> To:     thredds@xxxxxxxxxxxxxxxx
>>>>>>>
>>>>>>> Hi Hoop,
>>>>>>>
>>>>>>> The dynamic dataset handling in the NcML aggregation code was designed
>>>>>>> to deal with the appearance of new datasets more than data being
>>>>>>> appended to existing datasets. The NcML aggregations are also limited to
>>>>>>> straightforward aggregations based on homogeneity of dimensions and
>>>>>>> coordinate variables; they don't use any coordinate system or higher
>>>>>>> level feature information that might be available. This makes straight
>>>>>>> NcML aggregation somewhat fragile and hard to generalize to more complex
>>>>>>> situations.
>>>>>>>
>>>>>>> FeatureCollections are designed to use the CDM's understanding of
>>>>>>> coordinate systems and feature types to both simplify configuration and
>>>>>>> make aggregations more robust and general.
>>>>>>>
>>>>>>> While the FMRC collection capability was designed for a time series of
>>>>>>> forecast runs, I believe it should handle a simple time series of grids
>>>>>>> as well. (John, can you add more information on this?)
>>>>>>>
>>>>>>> Ethan
>>>>>>>
>>>>>>> On 2/23/2012 3:21 PM, Hoop wrote:
>>>>>>>> Ethan,
>>>>>>>>
>>>>>>>> This reminds me of an issue we are having, with version 4.2.7.
>>>>>>>> Here is the relevant snippet from our config:
>>>>>>>> <dataset name="SST NOAA OISST V2 HighRes" ID="SST_OISST_V2_HighRes"
>>>>>>>>     urlPath="Datasets/aggro/OISSThires.nc" serviceName="odap"
>>>>>>>>     dataType="grid">
>>>>>>>>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>>>>>>>>         <aggregation dimName="time" type="joinExisting"
>>>>>>>>                      recheckEvery="15 min">
>>>>>>>>             <scan location="/Projects/Datasets/noaa.oisst.v2.highres/"
>>>>>>>>                   regExp="sst\.day\.mean\.....\.v2\.nc$"
>>>>>>>>                   subdirs="false"/>
>>>>>>>>         </aggregation>
>>>>>>>>     </netcdf>
>>>>>>>> </dataset>
>>>>>>>>
>>>>>>>> The behavior we are getting in our time series, which is based on
>>>>>>>> NetCDF files with a year's worth of time steps (or less), is as follows:
>>>>>>>> In between re-boots of Tomcat, new time steps added to the latest file
>>>>>>>> are not added to the aggregation.  However, if the calendar marches along
>>>>>>>> and a new file for a new year is added to our archive without rebooting
>>>>>>>> Tomcat, the timesteps for the new file are added, without the ones that
>>>>>>>> would complete the previous year, resulting in a discontinuity along the
>>>>>>>> time axis.  And someone somewhere may e-mail us complaining that our
>>>>>>>> OPeNDAP object is not CF-compliant because the time steps aren't all of
>>>>>>>> the same size.  %}
>>>>>>>>
>>>>>>>> I looked at the featureCollection documentation link you gave, but since
>>>>>>>> our data are not forecasts, nor point data, nor in GRIB2 format, that
>>>>>>>> didn't seem the right fit.  Maybe I'm wrong; I'm severely sleep-deprived
>>>>>>>> right now....
>>>>>>>>
>>>>>>>> We also have some time series in monthly files (to keep the individual
>>>>>>>> file size under 2 Gbytes).  We have not tried aggregating any of those
>>>>>>>> time series.  Could be an interesting challenge.
>>>>>>>>
>>>>>>>> Thanks for any help.
>>>>>>>>
>>>>>>>> -Hoop
>>>>>>> _______________________________________________
>>>>>>> thredds mailing list
>>>>>>> thredds@xxxxxxxxxxxxxxxx
>>>>>>> For list information or to unsubscribe,  visit: 
>>>>>>> http://www.unidata.ucar.edu/mailing_lists/ 