
Re: NCML



Rob Weingruber wrote:

Hi John Again ;-)

John Caron wrote:



Rob Weingruber wrote:

Hi Again ;-)

Looks like the NCML is the way to go.  Thanks for the suggestion ;-)

I created a super simple NCML that aggregates several files into a single virtual dataset, joined by an existing 'Time' variable. Piece of cake. And the GeoGrid API was nice enough to then give me all of the 'Valid Times' for that virtual dataset. This 'Time' variable is the semantic equivalent of the 'valid time' for the file. However, there is also the issue of the 'Generated Time' for a file (i.e., generated at 12:00 Z, but valid for 15:00 Z). This would be used in requests such as 'give me the 15:00 Z forecast generated at 12:00 Z'. I see that there might be 2 ways to glue on the generated time information: a) as an attribute in each of the files that make up the dataset, or b) join on a new gen time variable. Which would be best and most performant, in your opinion? Would the latter even be possible, considering that we would still need to join on the existing 'valid time' variable? Or would we just join on the 'valid time', and then attach a new gen-time variable (and value) to each of the files (within the NCML for that virtual dataset)?


I am currently working on a new kind of NcML aggregation called "forecastModelRunCollection", which deals with a 2D time, "valid" and "generated". I hope to have an alpha version in the next week or two. There is some partially completed code in the 2.2.17 snapshot. I will probably make some UML diagrams, and I'll send them along to you for your feedback when I do.

Gladly will take a look at these/this whenever you're ready for me to. This sounds like
exactly what we might need...

Make sense?


Also, I recall that we agreed the performance would be fine for, say, 10,000 files within a virtual dataset defined by NCML. Did I misinterpret, or is that reasonable?


I think there will be some optimizations needed to scale up to that size. It will probably work, given enough memory (I forget if JVMs are still restricted to 2 GB heaps). I'd like to measure its memory use, so perhaps you could help me test and debug datasets of this size?


Glad to help here too. I think I have an old JBuilder around, that has an OptimizeIt license too....

One thing I thought of recently, is: does NCML allow datetime coordValue's to be placed *into* the NCML (thereby avoiding a file.open when those coordValues are queried via the API)? I tried the following, to no avail**:

  <aggregation dimName="Time" type="joinExisting">
    <netcdf location="file:/d2/www/data/ncmlTest/DPG/2006070611/wrfout_d01_2006-07-06_080000.DPG_F.nc" coordValue="2006-07-06 08:00:00Z"/>...

The reasoning behind this is that I would like to place the 'valid' (and 'gen') times into the NCML, where each coordValue would theoretically match the value in the file for that specific netcdf file. If the API could then *use the values directly from the NCML*, then there might be no need to open the file(s) when geoGrid.getTimes() is called. The point being that if we can avoid opening files for valid and gen time information, then we'd improve performance for datasets with lots and lots of files. What do you think?

** "To no avail" - means that it worked, but I tried moving the netcdf files out of the way, to see if they would be opened for a geoGrid.getTimes() call, and exceptions were thrown. It all worked when I left the files where they were supposed to be, but that wasn't the point ;-)
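
As an illustration of the idea above (getting a valid time without opening the data file), here is a minimal Python sketch that derives the timestamp from the file name alone. The naming pattern is an assumption inferred from the single example path in this message, and the helper name valid_time_from_location is hypothetical; it is not part of NcML or the netCDF-Java library.

```python
import re
from datetime import datetime, timezone

# Assumed naming pattern, inferred from the one example path in this thread:
#   wrfout_d01_YYYY-MM-DD_HHMMSS.<suffix>.nc
_WRF_NAME = re.compile(r"wrfout_d\d+_(\d{4}-\d{2}-\d{2})_(\d{6})\.")


def valid_time_from_location(location: str) -> str:
    """Return the time encoded in a file name as an ISO 8601 UTC string.

    Hypothetical helper: only the name is inspected, the file is never opened.
    """
    match = _WRF_NAME.search(location)
    if match is None:
        raise ValueError(f"no timestamp found in {location!r}")
    stamp = datetime.strptime(match.group(1) + match.group(2), "%Y-%m-%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


location = (
    "file:/d2/www/data/ncmlTest/DPG/2006070611/"
    "wrfout_d01_2006-07-06_080000.DPG_F.nc"
)
print(valid_time_from_location(location))  # 2006-07-06T08:00:00Z
```

A string produced this way could be written into the coordValue attribute when the NCML is generated, so the time is known before any file is touched. Whether the library then skips opening the file depends on the version in use.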


I've just been working on some of that in 2.2.17, see new section "Defining coordinates on a JoinExisting aggregation" in

 http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html.

It looks like what you were doing above is mostly correct (assuming your files have an existing coordinate variable called "Time" with length 1), but the current version is not handling it. I would recommend that you use the form "2006-07-06T08:00:00Z" so that we can use space delimiters when there's more than one coord value.
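
To make the delimiter point concrete, here is a small Python sketch that shows why a date containing a space is ambiguous once several values share one attribute, and that writes a joinExisting aggregation using the "T" form. It is only an illustration under stated assumptions: build_join_existing is a hypothetical helper, and the second file path is made up for the example.

```python
import xml.etree.ElementTree as ET

# NcML 2.2 namespace URI.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"

# Splitting on whitespace cannot tell where one "date time" value ends.
spaced = "2006-07-06 08:00:00Z 2006-07-06 09:00:00Z".split()
iso = "2006-07-06T08:00:00Z 2006-07-06T09:00:00Z".split()
print(len(spaced), len(iso))  # 4 2


def build_join_existing(dim_name, members):
    """Hypothetical helper: members is a list of (location, [coord values])."""
    root = ET.Element("netcdf", {"xmlns": NCML_NS})
    agg = ET.SubElement(
        root, "aggregation", {"dimName": dim_name, "type": "joinExisting"}
    )
    for location, values in members:
        for value in values:
            # A value with embedded whitespace would break space delimiting.
            if any(ch.isspace() for ch in value):
                raise ValueError(f"coord value {value!r} contains whitespace")
        ET.SubElement(
            agg, "netcdf", {"location": location, "coordValue": " ".join(values)}
        )
    return ET.tostring(root, encoding="unicode")


members = [
    (
        "file:/d2/www/data/ncmlTest/DPG/2006070611/"
        "wrfout_d01_2006-07-06_080000.DPG_F.nc",
        ["2006-07-06T08:00:00Z"],
    ),
    # Hypothetical file holding two time steps, to show the space delimiter.
    ("file:/data/example/two_steps.nc",
     ["2006-07-06T09:00:00Z", "2006-07-06T10:00:00Z"]),
]
print(build_join_existing("Time", members))
```

Rejecting values with embedded whitespace up front means a "date time" string is caught when the NCML is generated rather than when the aggregation is read.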

Also, the coord values can be cached (you have to enable this, see the last section "Aggregation Caching") if you want to let the library read the values the first time.

This code is so new I'm not sure I have even done a release with it. I'm working at home today, I'll check when I'm in tomorrow...

This refers to the joinExisting aggregations. You probably really want to use the new "Forecast Model Run" Aggregation that I'm working on now. It will be similar, but take into account the 2D time coordinate.