Re: [netcdf-java] Reading contiguous data in NetCDF files

Hi John,

> That's interesting. When you finish your analysis, I'll try to
> incorporate your improvements into the library.

Well, I don't plan any more analysis really.  The code I now use to read
data can be found in my DataReadingStrategy (see the readData method):
http://www.resc.rdg.ac.uk/trac/ncWMS/browser/branches/edal-refactor-20100416/src/java/uk/ac/rdg/resc/edal/cdm/DataReadingStrategy.java
Note that I'm using the read() method of a non-enhanced variable, then
applying the enhancements.
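
In outline, the pattern looks something like this (the file name
"example.nc" and variable name "sst" are made up, and this is a sketch
rather than the actual ncWMS code): open the file without enhancement,
read the raw packed values, and apply scale/offset by hand.

    import ucar.ma2.Array;
    import ucar.ma2.IndexIterator;
    import ucar.nc2.Attribute;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.Variable;

    public class RawReadSketch {
        public static void main(String[] args) throws Exception {
            // NetcdfFile.open() (as opposed to NetcdfDataset.openDataset())
            // gives access to the raw, non-enhanced variables.
            NetcdfFile nc = NetcdfFile.open("example.nc");
            try {
                Variable v = nc.findVariable("sst");
                Array raw = v.read();  // raw packed values, no enhancement

                // Apply scale/offset by hand, as the enhanced layer would.
                double scale  = attr(v, "scale_factor", 1.0);
                double offset = attr(v, "add_offset", 0.0);
                IndexIterator it = raw.getIndexIterator();
                while (it.hasNext()) {
                    double unpacked = it.getDoubleNext() * scale + offset;
                    // ... use the unpacked value ...
                }
            } finally {
                nc.close();
            }
        }

        private static double attr(Variable v, String name, double deflt) {
            Attribute a = v.findAttribute(name);
            return (a == null) ? deflt : a.getNumericValue().doubleValue();
        }
    }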

> I'm guessing the "time spent creating the LatLonRect" is constructing
> the lat/lon bounding box?

Yes, I think so.  This might be worth caching rather than calculating
afresh each time.  But most users won't be calling this method 64k times,
I guess!

Cheers, Jon

-----Original Message-----
From: netcdf-java-bounces@xxxxxxxxxxxxxxxx
[mailto:netcdf-java-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John Caron
Sent: 22 July 2010 14:49
To: Jon Blower
Cc: netcdf-java@xxxxxxxxxxxxxxxx
Subject: Re: [netcdf-java] Reading contiguous data in NetCDF files

Jon Blower wrote:
> Hi John,
> 
> I've been doing some more tests on this.  Joe was absolutely right, on
> the first read of data from a file the situation is completely i/o
> bound, but subsequent reads come from disk cache and performance is
> limited by the speed of conversion to Java types. 
> 
> I've also found out that using GridDatatype.makeSubset() is inefficient
> for my use case of "many small read operations" since it creates a lot
> of georeferencing objects with each invocation (profiling shows that
> most of the time is spent creating the LatLonRect, strangely).  I've
> found it much more efficient to use the lower-level objects like
> Variable.read() to do the reading, having used the GridDatatype to
> identify which axis is which.  This means that there is very little
> penalty for each read operation, particularly if the read operations are
> sorted in order of increasing offset (this means that buffering in the
> RAF does its job well).  It actually turns out to be quite efficient to
> read data point-by-point to satisfy the use case of reading sparse
> points from a data file.
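
A rough sketch of that sorted point-read pattern (file and variable
names are hypothetical; for a variable stored contiguously, sorting by
the row-major index approximates sorting by file offset):

    import java.util.Arrays;
    import java.util.Comparator;
    import ucar.ma2.Array;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.Variable;

    public class SortedPointReads {
        public static void main(String[] args) throws Exception {
            NetcdfFile nc = NetcdfFile.open("example.nc");
            try {
                Variable v = nc.findVariable("sst");  // assume shape (y, x)
                final int nx = v.getDimension(1).getLength();

                // Hypothetical sparse set of (y, x) points to sample.
                int[][] points = { {120, 7}, {3, 250}, {40, 40} };

                // Sort by row-major linear index: for a contiguous variable
                // this approximates increasing file offset, so the RAF's
                // buffer is reused instead of thrashed.
                Arrays.sort(points, new Comparator<int[]>() {
                    public int compare(int[] a, int[] b) {
                        long ia = (long) a[0] * nx + a[1];
                        long ib = (long) b[0] * nx + b[1];
                        return (ia < ib) ? -1 : (ia == ib ? 0 : 1);
                    }
                });

                int[] shape = {1, 1};
                for (int[] p : points) {
                    Array a = v.read(new int[] {p[0], p[1]}, shape);
                    double value = a.getDouble(0);
                    // ... store the value against its point ...
                }
            } finally {
                nc.close();
            }
        }
    }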

That's interesting. When you finish your analysis, I'll try to
incorporate your improvements into the library.

GridDatatype is old code, and obviously needs work. I'm guessing the
"time spent creating the LatLonRect" is constructing the lat/lon bounding
box? I think I made an attempt to defer that since it's rarely needed,
but had to back it out (?). I'll give it another try ASAP.
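
Deferring it would presumably amount to a lazy-initialization pattern
along these lines (a hypothetical holder class, not the actual CDM
code): the box is computed on first request instead of eagerly in the
constructor.

    import ucar.unidata.geoloc.LatLonPointImpl;
    import ucar.unidata.geoloc.LatLonRect;

    public class LazyBoundingBox {
        private LatLonRect llbb;  // null until first requested

        public synchronized LatLonRect getLatLonBoundingBox() {
            if (llbb == null) {
                // Stand-in for the expensive scan over the coordinate axes.
                llbb = new LatLonRect(new LatLonPointImpl(-90, -180),
                                      new LatLonPointImpl(90, 180));
            }
            return llbb;
        }
    }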


> 
> This may ultimately bubble up to simplifications in THREDDS-WMS, which
> I'll let you know about after more experimentation.  It's possible that
> the WMS will no longer need the capability to open datasets with
> Enhance.ScaleMissingDefer.
> 
> Thanks for your help,
> Jon
> 
> 
> -----Original Message-----
> From: John Caron [mailto:caron@xxxxxxxxxxxxxxxx] 
> Sent: 19 July 2010 15:23
> To: Jon Blower
> Cc: netcdf-java@xxxxxxxxxxxxxxxx
> Subject: Re: [netcdf-java] Reading contiguous data in NetCDF files
> 
> Hi Jon:
> 
> Jon Blower wrote:
>> Thanks John and Joe.  Yes, I do know that disk I/O is the limiting
>> factor, but optimising it isn't easy due to all the buffers and disk
>> caches (as you and Joe have pointed out).  Interestingly, I can "see"
>> these caches.  When I read random chunks of data from a file, sometimes
>> a read takes ~1ms, sometimes ~5ms and sometimes ~10ms, with not much in
>> between these values (a trimodal distribution).  I think these must be
>> three levels of caching.  Also, if I run the same test multiple times on
>> the same file, the number of 10ms reads drops off, and the number of 1ms
>> reads increases.  (I'm on a Windows XP laptop with a 5400 rpm hard
>> drive.)
>>
>> I guess the only way to bypass the caches would be to cycle between a
>> large set of data files, which are in total bigger than the disk caches.
>> (I'm trying to simulate a busy server environment.)
> 
> If your server is running Linux or Solaris instead of Windows XP, you
> will have different I/O results.
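
The kind of per-read timing described above can be reproduced with a
few lines (the file name and read count are made up); a histogram of
the printed latencies shows the modes:

    import java.io.RandomAccessFile;
    import java.util.Random;

    public class ReadLatency {
        public static void main(String[] args) throws Exception {
            RandomAccessFile raf = new RandomAccessFile("big.nc", "r");
            byte[] buf = new byte[8192];
            Random rand = new Random();
            long range = raf.length() - buf.length;
            for (int i = 0; i < 1000; i++) {
                long pos = (long) (rand.nextDouble() * range);
                long start = System.nanoTime();
                raf.seek(pos);
                raf.readFully(buf);
                // One line per read; plot these to see the modes.
                System.out.println((System.nanoTime() - start) / 1000 + " us");
            }
            raf.close();
        }
    }
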
>>
>> By the way, I've been digging in the IOSPs and the ucar RandomAccessFile
>> class.  The ucar RAF seems to be the same as java.io.RAF except that it
>> implements an 8k buffer which is supposed to increase performance.  But
>> the code of N3raf (which extends N3iosp and I assume is the default
>> class used for data reading) uses raf.readToByteChannel(), which
>> bypasses the 8k buffer.  So could a java.io.RAF have been used in this
>> case?
> 
> ucar.RAF forked java.RAF to add buffering back in Java 1.0 days. It has
> accumulated various other conveniences since then; I think byte ordering
> is one (?). Also, java.RAF is a final class, so it can't be subclassed by
> HTTPRandomAccessFile. For these and various other reasons, one could not
> revert to java.RAF except by forking the CDM code.
> 
> raf.readToByteChannel() is an experiment to try to allow "streamed
> reading", i.e. direct file-to-network transfer. It can only be used in
> such a restricted manner that it's not very useful generally. It's not
> used by the Grid/WMS code.
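
For reference, a minimal sketch of using the buffered ucar RAF directly
(the file name is made up, and the three-argument constructor with an
explicit buffer size is assumed from the ucar.unidata.io API):

    import ucar.unidata.io.RandomAccessFile;

    public class BufferedRafSketch {
        public static void main(String[] args) throws Exception {
            // Third argument is the buffer size in bytes (the 8k default).
            RandomAccessFile raf = new RandomAccessFile("example.nc", "r", 8192);
            try {
                byte[] record = new byte[256];
                raf.seek(0);
                raf.readFully(record);  // fills the internal buffer from disk
                raf.readFully(record);  // likely served from the buffer, no I/O
            } finally {
                raf.close();
            }
        }
    }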
> 
>>
>> To expand a little on my use case: in general, to create a 
>> low-resolution map of data for a WMS, one has to read only a small 
>> fraction of the available data in the file.  So I'm looking for an 
>> efficient way to read sparse clouds of data (not evenly-spaced).  
>> Reading point-by-point is not efficient, but nor is reading lots of 
>> data, converting it to new types, then throwing most of it away.
> 
> What about writing low-resolution versions of the data and using that
> when possible?
> 
>>
>> Cheers, Jon
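
One way to serve the low-resolution use case without writing separate
files is a strided read, which Variable.read(sectionSpec) supports; a
sketch with hypothetical names (every 10th point of an assumed
1000x1000 grid):

    import ucar.ma2.Array;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.Variable;

    public class StridedRead {
        public static void main(String[] args) throws Exception {
            NetcdfFile nc = NetcdfFile.open("example.nc");
            try {
                Variable v = nc.findVariable("sst");  // assume shape (1000, 1000)
                // Section spec is start:end:stride for each dimension,
                // so this returns a 100x100 subsample.
                Array lowRes = v.read("0:999:10,0:999:10");
                System.out.println("read " + lowRes.getSize() + " points");
            } finally {
                nc.close();
            }
        }
    }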

_______________________________________________
netcdf-java mailing list
netcdf-java@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/ 


