Hi Giles:

Some questions are embedded below:

On 5/25/2010 10:33 AM, Rich Signell wrote:

I actually don't have that much experience with this type of data, but
I'm cc'ing this to John Caron for his opinion.   He is very actively
working on time series stuff and I'm sure he has better ideas than I
on the appropriate approach to take.

John, please group reply, as I'm interested in the reply as well!


On Mon, May 24, 2010 at 2:22 PM, <address@hidden> wrote:
Hi Rich

Good to hear from you. Fun stuff this netCDF/NcML. I've got a lot to learn
and would welcome your opinions/advice...

I/we are exploring this path (using remote NcML) because we would like to
set up a server which serves up many (several dozen) time-series datasets
("live" data from wave buoys and tide gauges and met sensors as well as
WAM model forecasts, etc) from several (around 20) ports. The datasets
will be continuously updated - every 30mins for wave data, every 6-10mins
for tide data, either by appending to existing files or by "rolling" to new
files every... not sure yet, it will depend on source and data type - let's
say every month, as a guess... We will then need to aggregate the sequences
of monthly files when we want to retrieve longer records.

A number of other requirements:
1. Speed/efficiency is important, when we query for data it must come back
fast. Scan aggregations without caching will definitely be too slow.

What will the queries be? Which of them need to be fast?

In what format do you want the data to be returned?

2. It is essential that we can re-create the exact state of the datasets
at specific times - for re-creating queries at a later time in case
questions arise. This makes me wary of the caching built into TDS - unless
the "refresh every" time is set very small, in which case what is the

If data has not yet arrived, you will get different results. How does that fit into the need for reproducibility?

3. I don't like the idea of having to restart the TDS every time a dataset
definition is updated in the catalog.xml (it would need to be restarted
very frequently).

Not sure what you mean by the dataset definition. Why is it getting updated?

The above leads me to attempt a system where the NcML (joinExisting)
aggregations are done explicitly in remote files. I am making them nice
efficient aggregations where the coordinate dimensions are explicitly
specified in the NcML. These NcML files are being updated by the same
python scripts that are adding/appending the new data as it comes in. In
this way the TDS catalog.xml only needs to be changed and the TDS
restarted when new datasets are defined.
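
A minimal sketch of how an update script might regenerate such an explicit
joinExisting NcML file, so that no directory scan is needed. This is not
taken from the scripts described above: it assumes the per-file coordinate
count is declared through the NcML "ncoords" attribute, and the file names,
counts and the "time" dimension name are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# NcML 2.2 namespace used by netCDF-Java / TDS.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_join_existing_ncml(dim_name, files):
    """Return NcML text for an explicit joinExisting aggregation.

    files is a list of (location, ncoords) pairs, one per component file,
    where ncoords is the number of coordinates that file contributes along
    dim_name. Declaring it up front means the files do not have to be
    opened just to size the aggregation.
    """
    root = ET.Element("netcdf", {"xmlns": NCML_NS})
    agg = ET.SubElement(
        root, "aggregation", {"dimName": dim_name, "type": "joinExisting"}
    )
    for location, ncoords in files:
        ET.SubElement(
            agg, "netcdf", {"location": location, "ncoords": str(ncoords)}
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical monthly files; the counts would come from the update script.
monthly_files = [("tide_2010-04.nc", 4320), ("tide_2010-05.nc", 3600)]
ncml_text = build_join_existing_ncml("time", monthly_files)
print(ncml_text)
```

The script that appends new data would call this again with the updated
count for the current month and overwrite the .ncml file. Writing to a
temporary file and renaming it into place would avoid a client reading a
half-written aggregation.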

This also means that we can point our NetCDF data clients (Matlab, Python,
and we will somehow need to get something working in C#.NET) directly at
the NcML files rather than the TDS if we feel like it - not quite sure why
we would at this stage though...

Hope that paints the scene. I welcome your comments.

The answer to your second question is yes. We want to serve the virtual
datasets via OpenDAP.
This whole process is an exploration of whether NetCDF/NcML/TDS/OpenDAP
would offer a better (flexible/robust/transparent/extensible) solution
than the SQL databases we currently use for the same purpose...

What issues do you have with the SQL database?

Kind regards



Interesting finding.  And <address@hidden> is
definitely the right place to ask.   But why do you want to use remote
NcML if you are running a THREDDS Data Server?
Do you want to embed that NcML into a thredds catalog so you can serve
the virtual dataset created by your NcML via OpenDAP and other


On Mon, May 24, 2010 at 4:40 AM, Giles Lesser
<address@hidden>  wrote:

Further to my earlier posts, I have done some more investigation (using
toolsUI-4.0.49.jar) and have determined that:

a) if I point either the njTbx or toolsUI at a URL such as
then they both fail, however toolsUI returns the rather more helpful
error message
"Server returned HTTP response code: 400 for URL:...."
whereas njTbx returns
"read error: stream is closed"


b) if I simply change the "http" at the start of the URL to "dods" then
everything works just fine. ie
works fine in both njTbx and toolsUI.
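
The scheme swap described in (b) can be sketched as a small helper. The URL
below is a hypothetical placeholder, not the one used in the post, and the
"dods" prefix is only known from this thread to work with netCDF-Java based
clients such as njTbx and toolsUI; other clients may not recognise it.

```python
def to_dods_url(url):
    """Return url with a leading "http://" replaced by "dods://".

    URLs that do not start with "http://" are returned unchanged.
    """
    prefix = "http://"
    if url.startswith(prefix):
        return "dods://" + url[len(prefix):]
    return url


# Hypothetical server and dataset path, for illustration only.
example_url = "http://example.com:8080/thredds/dodsC/test/allSpec1d.ncml"
print(to_dods_url(example_url))
```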

This leads me to the (poorly understood) conclusion that the problem is
likely somewhere in the configuration of Tomcat/thredds, where an extension
of .ncml on an HTTP URL confuses the server, whereas an .ncml extension on
a DODS URL is passed on (to thredds?) and evaluates OK.

I would like to understand why this occurs, however I appreciate that this
forum is probably not the place for help. I will attempt to post to the
unidata-thredds mailing list, however if someone here can explain why this
occurs I would be very grateful.

Thanks for your help.

On 14/05/2010 11:13 AM, address@hidden wrote:
Hi Sachin

My catalog.xml file follows. As you can see this is a test server we are
experimenting with to see if OpenDAP serves the needs we have (and that I
am just a beginner....)

I was/am attempting to access the LAST dataset declared in this
catalog.xml file.

You will also notice that I was experimenting with making the dataType of
the datasets "Point" or "Grid" - as I'm not sure which is appropriate. The
.nc files are NOT grid results in the numerical modelling sense of the
word; they contain output (wave spectra) for several forecast times at
several discrete stations.

The NcML files aggregate multiple .nc files along the analysisTime
dimension. I will also include a .ncml file FYI.

Many thanks


start of catalog.xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Giles' Simple Little THREDDS Server Catalog"


    <service name="all" base="" serviceType="compound">
      <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/"
      <service name="http" serviceType="HTTPServer"
base="/thredds/fileServer/" />

    <datasetRoot path="enviroData/regional/australasia/bom/wam"
location="/home/enviroData/regional/australasia/bom/wam/" />
    <datasetRoot path="enviroData/regional/australasia/bom/wam/2008"
location="/home/enviroData/regional/australasia/bom/wam/2008/" />
    <datasetRoot path="enviroData/regional/australasia/bom/wam/2009"
location="/home/enviroData/regional/australasia/bom/wam/2009/" />
    <datasetRoot path="enviroData/regional/australasia/bom/wam/2010"
location="/home/enviroData/regional/australasia/bom/wam/2010/" />

    <dataset name="The first WAM forecast" ID="WAM2008072200"

    <dataset name="BOM WAM forecasts 2008 - 1D" ID="BOMWAM2008_1D"
    <dataset name="BOM WAM forecasts 2008 - 2D" ID="BOMWAM2008_2D"

<dataset name="BOM WAM forecasts 2009 - 1D" ID="BOMWAM2009_1D"
<dataset name="BOM WAM forecasts 2009 - 2D" ID="BOMWAM2009_2D"

<dataset name="BOM WAM forecasts 2010 - 1D" ID="BOMWAM2010_1D"
<dataset name="BOM WAM forecasts 2010 - 2D" ID="BOMWAM2010_2D"

    <dataset name="BOM WAM forecasts ALL - 1D" ID="BOMWAM_ALL_1D"
        <aggregation dimName="validTime" type="joinExisting"

    <dataset name="BOM WAM forecasts ALL - 2D" ID="BOMWAM_ALL_2D"


end of catalog.xml

start of allSpec1d.ncml
<?xml version="1.0" encoding="UTF-8"?>
      <aggregation dimName="analysisTime" type="joinExisting">
        <scan location="\." suffix="spec1d_*.nc" subdirs="true" />
end of allSpec1d.ncml


I can see from your previous post that you can access the NcML locally via
njtbx, so it seems that your TDS may have some issues, as it is unable to
open the dataset.

Can you provide your catalog.xml?


On Thu, May 13, 2010 at 6:34 PM, <address@hidden>

Hi Sachin

Wow, quick response!

No, sorry I can't - well I could, but it wouldn't be any use. It is a
(trial) server running on our internal network. No access from the

Anything else I can do to help?




Can you provide the complete URL?




On Thu, May 13, 2010 at 6:13 PM, <address@hidden>
Dearest njtbx-users

I am using njtbx to access data aggregated (joinExisting) using a
file. When I point njtbx at the file directly all works fine. When
the (same) NcML file using a THREDDS OpenDAP server and then try
the data via the OpenDAP URL njtbx fails (despite being able to
OpenDAP dataset quite happily using a web browser). I would
help in figuring out how to debug this problem.

The problem seems to be caused by njtbx failing to create a
the OpenDAP URL. For example, when the mDataset function is
file location it returns:

          numGrids: 0
     numDimensions: 6
      numVariables: 10

However, when pointed to the OpenDAP URL mDataset returns:

Unable to open the dataset:

read error: stream is closed
Attempt to reference field of non-structure array.

Debugging further leads to the discovery that the (Matlab) error
the function njGetInfo

The java object NCid provided as input to that function appears to
K>>    NCid

NCid =


However the getGridDataset() method fails to return anything
K>>    NCid.getGridDataset()

ans =


and hence line 21 of njGetInfo fails:
K>>    nGrid = size(NCid.getGridDataset().getGrids());
??? Attempt to reference field of non-structure array.

My question is: Is this clearly a bug in the Java toolbox? Or is it
possible that something about either my underlying NetCDF files,
aggregation file, or TDS dataset definition could be at fault?

I can post/provide examples of .nc files, NcML file, and TDS
file if that would help anyone.

Any pointers very gratefully received.

Many thanks


