Re: [netcdf-java] Aggregation on an inner time dimension

To: John Caron <caron@xxxxxxxxxxxxxxxx>
Subject: Re: [netcdf-java] Aggregation on an inner time dimension
From: Martin Price <mpricemetoffice@xxxxxxxxxxxxxx>
Date: Mon, 17 Aug 2009 10:18:41 +0100

Hi John

>> Yes we extract 10-year plus time series of hourly hindcast data from
>> six-hourly model run files,  either one point or more usually a
>> handful of them.  These are then processed for climatology statistics,
>> to look at specific events, or supplied to a user.  Converting them to
>> e.g. monthly files does improve extraction time somewhat, but not
>> nearly as much as permuting them so time is the inner dimension.  To
>> lowest order, permuting reduces extraction time by a factor of
>> length_of_time_dimension.
>>
>
> If I'm thinking about this correctly, this will be true when you only want 1
> point, since each time point will cost you a disk read.
>
> If you want a "handful", for example 6 points along lon (the inner
> dimension of (time, lat, lon) in the current implementation), then I'm
> thinking you would get the same performance.

Yes I think you're right for this dataset in its current form, but an
inner aggregation would open up the possibility of dramatically
increasing performance by concatenating the files together into say
daily, weekly, or monthly files.  I've tried this with time as the
outer dimension and it increases performance a bit, I think because it
reduces the time it takes to build the FMRC and the overhead of
opening extra files.  But if disk seek time is around 5 ms and read
time for a double around 0.002 ms, then if all the data are contiguous
on disk you can read in thousands of data points in the time it takes
to do a single seek.  I'm anything but an expert on IO so I don't know
how far this approach would scale, but we've tested it up to daily
files of hourly data (for a different dataset, accessing the
individual files in a loop using netCDF-Java) and we did get very close
to a factor of 24 speedup.
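
To make that arithmetic concrete, here is a rough back-of-envelope
sketch in Python.  It is only a toy cost model, not a measurement: the
5 ms and 0.002 ms figures are the guesses above, the function names are
made up for illustration, and it assumes one grid point is extracted
from one file, ignoring file-open overhead, OS caching and read-ahead.

```python
SEEK_MS = 5.0    # assumed cost of one disk seek (figure quoted above)
READ_MS = 0.002  # assumed cost of reading one double once positioned


def extraction_time_ms(n_times, time_is_inner):
    """Estimate the time to read one grid point at n_times steps from one file."""
    if time_is_inner:
        # The series for the point is contiguous: one seek, then n_times reads.
        return SEEK_MS + n_times * READ_MS
    # Time is the outer dimension: each time step sits in its own slab,
    # so every value costs a separate seek.
    return n_times * (SEEK_MS + READ_MS)


def speedup(n_times):
    """Ratio of time-outer to time-inner extraction time under this model."""
    return extraction_time_ms(n_times, False) / extraction_time_ms(n_times, True)


if __name__ == "__main__":
    # Hourly data concatenated into daily, weekly and 30-day files.
    for label, n in [("daily", 24), ("weekly", 24 * 7), ("monthly", 24 * 30)]:
        print(f"{label:8s} n_times={n:4d} speedup ~ {speedup(n):.1f}")
```

For daily files (24 time steps) this model gives a speedup of about
23.8, in line with the factor we measured.  For longer files the read
time starts to matter, so the model predicts a gain somewhat below the
length of the time dimension.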

> We are working on an experimental "ncstream" protocol that allows a writer to
> write data in any order, and the reader rearranges as needed, but it's not
> ready for use yet.

Do you know when this might be available?  I need to decide whether
it's worth writing something bespoke for this... and if there's a
solution in the pipeline in the libraries it's probably not.

> BTW, who is consuming the output? An internal process that you control, or
> ???

They're used by an analyst.  Most often some statistical analysis is
done and the results used to create a report for an external user at
their site.


Thanks for your help, I'd got about as far as I could from looking at the APIs.

kind regards,
Martin

-- 
Martin Price
Ocean Forecasting Research and Development
Met Office, FitzRoy Road, Exeter, EX1 3PB, United Kingdom
Tel: +44 (0)1392 886982   Fax: +44 (0)1392 885681
email: mpricemetoffice at googlemail dot com
http://www.metoffice.gov.uk
