
Re: phone message -- sample XML to follow (here it is)



Hi guys:

The good news is that I've found the problem with the caching. Performance is 
now a lot better, though I don't have a measurement, and a lot may depend on 
your server.

The bad news (maybe) is that I am only going to fix this in the 4.0 version of 
NcML/TDS. We are pushing hard to get this out to beta this month. I'd love to 
have you start to use it, to get feedback on other issues that may be lurking.

The main problem was the "anonymous" inner aggregations. To get the caching 
right, we need to give them ids, e.g.:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <netcdf ncoords="36500" id="first100">
      <aggregation type="union">
        <netcdf location="pr_A2.00010101-01001231.nc"/>
        <netcdf location="tasmax_A2.00010101-01001231.nc"/>
        <netcdf location="tasmin_A2.00010101-01001231.nc"/>
      </aggregation>
    </netcdf>
    <netcdf ncoords="36500" id="sec100">
      <aggregation type="union">
        <netcdf location="pr_A2.01010101-02001231.nc"/>
        <netcdf location="tasmax_A2.01010101-02001231.nc"/>
        <netcdf location="tasmin_A2.01010101-02001231.nc"/>
      </aggregation>
    </netcdf>
    <netcdf ncoords="36500" id="third100">
      <aggregation type="union">
        <netcdf location="pr_A2.02010101-03001231.nc"/>
        <netcdf location="tasmax_A2.02010101-03001231.nc"/>
        <netcdf location="tasmin_A2.02010101-03001231.nc"/>
      </aggregation>
    </netcdf>
  </aggregation>
</netcdf>

I might be able to generate auto ids, but for now they have to be added by 
hand. As I said, this will only be useful in the 4.0 version. I'll get a release 
out later today in case you want to try it.

John

Steve Hankin wrote:
> Hi John,
> 
> Thanks for looking into this.   At this moment Kevin is modifying the
> code that creates the ncML aggregation configuration from the contents
> of our database.  It looks like we will be "down to the wire" in seeing
> how much faster TDS becomes when we start using the improved ncML (the
> changes are bigger than just moving the ncoords attribute).
> Can we ask you to "stand by" and maybe be willing to set your peepers on
> it later today?  Kevin's preliminary tests indicated that we will still
> be getting the cache hit failures (that is, for unknown reasons TDS rebuilds
> the aggregation in cache instead of reusing what it saved previously).
> But we don't have an up-to-date TDS site to show you yet.
> 
>    - Steve
> 
> John Caron wrote:
>> hi kevin:
>>
>> your ftp site is pretty slow (600 KB/sec) - is it throttled, or just
>> overwhelmed? should i wait until tonight to try to download these files?
>>
>> Kevin OBrien wrote:
>>  
>>> Hi John -
>>>
>>> I did as you suggested and moved the ncoords attribute to the outer
>>> aggregation and  I was able to get to the aggregation in around 28
>>> seconds.  Just to confirm that it wasn't something system-related, I
>>> changed the xml configuration back, and verified that when the ncoords
>>> attribute was in the inner aggregation, it took around 2 minutes to
>>> open.  So that's  a big speed improvement!    I think I will expand the
>>> aggregation xml to include full experiments and see how the performance
>>> changes.
>>>
>>> By the way, you can get all of these files at
>>>  
>>> ftp://nomads.gfdl.noaa.gov/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/
>>>
>>>
>>>
>>> and you'll see there are many more that would actually be configured
>>> into the aggregation.
>>>
>>> One thing I did notice and have a question about - after I moved the
>>> ncoords attribute to the outer aggregation, and I restarted the server -
>>> a cache file showed up in the cacheAged directory.   When I then just
>>> restarted the server to test the use of cache, after I had opened the
>>> aggregation (which again took around 30 seconds), I noticed that the
>>> cache file in the cacheAged directory had apparently been updated (at
>>> least the time stamp of the file was new).  If nothing in the
>>> aggregation has changed, should it be updating the cache file?  Or
>>> should it use the cache file already there?
>>>
>>> thanks -
>>> kevin
>>>
>>> John Caron wrote:
>>>    
>>>> Hi Kevin, Steve:
>>>>
>>>> You should try putting the ncoords attribute on the outer aggregation:
>>>>
>>>> <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
>>>>         ncoords="36500">
>>>>   <aggregation type="union">
>>>>     <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/pr_A2.00010101-01001231.nc" />
>>>>     <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/tasmax_A2.00010101-01001231.nc" />
>>>>     <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/tasmin_A2.00010101-01001231.nc" />
>>>>   </aggregation>
>>>> </netcdf>
>>>>
>>>> let me know if that helps.
>>>>
>>>> I'd like to test this nested aggregation as a use case. Can I get
>>>> those 9 files? Thanks.
>>>>
>>>>
>>>> Steve Hankin wrote:
>>>>      
>>>>> (This is a continuation of the conversation that Kevin O'Brien
>>>>> started with you.)
>>>>>
>>>>> Hi John,
>>>>>
>>>>> Below is the ncML and TDS configuration information.  It all "works"
>>>>> ...  except the caching.  Any clues?
>>>>>
>>>>>      - Steve
>>>>>
>>>>> ===
>>>>>
>>>>> This from threddsConfig.xml
>>>>>
>>>>>   <AggregationCache>
>>>>>     <dir>/home/pmel/DataPortal/apache-tomcat-5.5.25/content/thredds/cacheAged/</dir>
>>>>>     <scour>24 hours</scour>
>>>>>     <maxAge>90 days</maxAge>
>>>>>   </AggregationCache>
>>>>>
>>>>> ===
>>>>>
>>>>> And this is the latest ncML that Kevin tested:  "It took nearly two
>>>>> minutes to open the aggregation the first time.  After that, accesses
>>>>> were quick -- evidently caching was working.  Then I restarted the
>>>>> tomcat server, and again it took nearly two minutes to open the
>>>>> aggregation.  I could see that the cache file in the caching
>>>>> directory was again updated after the second tomcat restart (ie, the
>>>>> cache was rewritten rather than used)..."
>>>>>
>>>>> <catalog name="test IPCC Datasets"
>>>>>          xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
>>>>>          xmlns:xlink="http://www.w3.org/1999/xlink">
>>>>>
>>>>>   <service name="thisDODS3" serviceType="OpenDAP" base="/thredds/dodsC/" />
>>>>>
>>>>>   <dataset ID="CM2Q-d2_1PctTo4x_j1 atmos daily all vars 00010101-03001231 test"
>>>>>            name="CM2Q-d2_1PctTo4x_j1 atmos daily all vars 00010101-03001231 test"
>>>>>            urlPath="ipcc_ar4_CM2.0_R1_1to4x-0_daily_atmos_00010101-03001231_test">
>>>>>
>>>>>     <serviceName>thisDODS3</serviceName>
>>>>>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>>>>>       <aggregation dimName="time" type="joinExisting">
>>>>>         <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>>>>>           <aggregation type="union">
>>>>>             <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/pr_A2.00010101-01001231.nc"
>>>>>                     ncoords="36500" />
>>>>>             <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/tasmax_A2.00010101-01001231.nc"
>>>>>                     ncoords="36500" />
>>>>>             <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/tasmin_A2.00010101-01001231.nc"
>>>>>                     ncoords="36500" />
>>>>>           </aggregation>
>>>>>         </netcdf>
>>>>>         <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>>>>>           <aggregation type="union">
>>>>>             <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/pr_A2.01010101-02001231.nc"
>>>>>                     ncoords="36500" />
>>>>>             <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/tasmax_A2.01010101-02001231.nc"
>>>>>                     ncoords="36500" />
>>>>>             <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/tasmin_A2.01010101-02001231.nc"
>>>>>                     ncoords="36500" />
>>>>>           </aggregation>
>>>>>         </netcdf>
>>>>>         <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>>>>>           <aggregation type="union">
>>>>>             <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/pr_A2.02010101-03001231.nc"
>>>>>                     ncoords="36500" />
>>>>>             <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/tasmax_A2.02010101-03001231.nc"
>>>>>                     ncoords="36500" />
>>>>>             <netcdf location="file:/data/gfdl_cm2_0/CM2Q-d2_1PctTo4x_j1/pp/atmos/ts/daily/tasmin_A2.02010101-03001231.nc"
>>>>>                     ncoords="36500" />
>>>>>           </aggregation>
>>>>>         </netcdf>
>>>>>       </aggregation>
>>>>>     </netcdf>
>>>>>   </dataset>
>>>>>
>>>>> </catalog>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Steve Hankin wrote:
>>>>>        
>>>>>> Hi John,
>>>>>>
>>>>>> We're at the phone number in the signature line below.  Will follow
>>>>>> this email shortly with some XML fragments ... hoping maybe you have
>>>>>> a suggestion.
>>>>>>
>>>>>>    - Steve
>>>>>>
>>>>>>           
>>>>> -- 
>>>>> Steve Hankin, NOAA/PMEL -- address@hidden
>>>>> 7600 Sand Point Way NE, Seattle, WA 98115-0070
>>>>> ph. (206) 526-6080, FAX (206) 526-6744
>>>>>
>>>>> "The only thing necessary for the triumph of evil is for good men
>>>>> to do nothing." -- Edmund Burke
>>>>>
>>>>>         
>