Re: [netcdf-java] Reading contiguous data in NetCDF files

Hi Jon:

Jon Blower wrote:
Thanks John and Joe. Yes, I do know that disk I/O is the limiting factor, but optimising it isn’t easy due to all the buffers and disk caches (as you and Joe have pointed out). Interestingly, I can “see” these caches. When I read random chunks of data from a file, sometimes a read takes ~1ms, sometimes ~5ms and sometimes ~10ms, with not much in between these values (a trimodal distribution). I think these must be three levels of caching. Also, if I run the same test multiple times on the same file, the number of 10ms reads drops off, and the number of 1ms reads increases. (I’m on a Windows XP laptop with a 5400 rpm hard drive.)

I guess the only way to bypass the caches would be to cycle between a large set of data files, which are in total bigger than the disk caches. (I’m trying to simulate a busy server environment.)

If your server is running Linux or Solaris instead of Windows XP, you will see different I/O results.
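
For what it's worth, that kind of timing test is easy to reproduce; here is a minimal sketch (the file name, read size, and iteration count are just placeholders, and this uses plain java.io.RandomAccessFile rather than the ucar class):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.Random;

    public class RandomReadTiming {
      public static void main(String[] args) throws IOException {
        String path = "testdata.nc"; // placeholder file name
        int chunkSize = 8 * 1024;    // placeholder read size
        int nReads = 1000;           // placeholder iteration count

        byte[] buf = new byte[chunkSize];
        Random rand = new Random();

        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
          long maxOffset = raf.length() - chunkSize;
          for (int i = 0; i < nReads; i++) {
            long offset = (long) (rand.nextDouble() * maxOffset);
            long start = System.nanoTime();
            raf.seek(offset);
            raf.readFully(buf);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println(micros); // histogram these to see the 1/5/10 ms modes
          }
        }
      }
    }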

By the way, I’ve been digging in the IOSPs and the ucar RandomAccessFile class. The ucar RAF seems to be the same as java.io.RAF except that it implements an 8k buffer which is supposed to increase performance. But the code of N3raf (which extends N3iosp and I assume is the default class used for data reading) uses raf.readToByteChannel(), which bypasses the 8k buffer. So could a java.io.RAF have been used in this case?


ucar.RAF forked java.RAF to add buffering back in the Java 1.0 days. It has 
accumulated various other conveniences since then; I think byte ordering is one 
(?). Also, java.RAF is a final class, so it can't be subclassed by 
HTTPRandomAccessFile. For these and other reasons, one could not revert to 
using java.RAF except by forking the CDM code.
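
The buffering itself is simple in outline. The following is only an illustration of the idea, not the actual ucar.unidata.io.RandomAccessFile code:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    /**
     * Illustration only: a tiny buffered wrapper over java.io.RandomAccessFile,
     * sketching the kind of read buffering the ucar RAF adds. Not the CDM code.
     */
    class BufferedRaf {
      private final RandomAccessFile raf;
      private final byte[] buffer = new byte[8192]; // 8k window into the file
      private long bufferStart = -1;                // file offset of buffer[0]
      private int bufferLength = 0;                 // valid bytes in the buffer

      BufferedRaf(String path) throws IOException {
        this.raf = new RandomAccessFile(path, "r");
      }

      /** Read one byte at the given file offset, refilling the buffer on a miss. */
      int readByteAt(long offset) throws IOException {
        if (bufferStart < 0 || offset < bufferStart || offset >= bufferStart + bufferLength) {
          raf.seek(offset);
          bufferLength = raf.read(buffer);
          bufferStart = offset;
          if (bufferLength <= 0) return -1; // EOF
        }
        return buffer[(int) (offset - bufferStart)] & 0xFF;
      }

      void close() throws IOException {
        raf.close();
      }
    }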

raf.readToByteChannel() is an experiment to allow "streamed reading", i.e. 
direct file-to-network transfer. It can only be used in such a restricted manner that 
it's not very useful generally. It's not used by the Grid/WMS code.
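
In plain java.nio terms, that "streamed reading" idea corresponds roughly to FileChannel.transferTo(), which can hand bytes from the file straight to a socket channel. A sketch of the general mechanism (not the CDM implementation; the byte range and destination channel are whatever the caller supplies):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.WritableByteChannel;

    public class StreamedRead {
      /**
       * Copy nbytes starting at offset from the file straight to the destination
       * channel (e.g. a socket), avoiding an intermediate user-space copy where
       * the OS supports it.
       */
      static long streamRange(String path, long offset, long nbytes, WritableByteChannel dest)
          throws IOException {
        try (FileInputStream fis = new FileInputStream(path)) {
          FileChannel fc = fis.getChannel();
          long total = 0;
          while (total < nbytes) {
            long n = fc.transferTo(offset + total, nbytes - total, dest);
            if (n <= 0) break; // nothing more could be transferred
            total += n;
          }
          return total;
        }
      }
    }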

To expand a little on my use case: in general, to create a low-resolution map of data for a WMS, one has to read only a small fraction of the available data in the file. So I’m looking for an efficient way to read sparse clouds of data (not evenly-spaced). Reading point-by-point is not efficient, but nor is reading lots of data, converting it to new types, then throwing most of it away.
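
This doesn't cover the irregular-cloud case, but strided (evenly-spaced) subsets at least can be pushed down into the read itself via a section spec, so only the requested points come back as an Array. A rough sketch against the netcdf-java API (the file name, variable name, and dimension sizes are invented):

    import ucar.ma2.Array;
    import ucar.ma2.InvalidRangeException;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.Variable;

    import java.io.IOException;

    public class SparseSubsetRead {
      public static void main(String[] args) throws IOException, InvalidRangeException {
        NetcdfFile ncfile = NetcdfFile.open("ocean_temps.nc");
        try {
          Variable temp = ncfile.findVariable("temperature"); // shape (time, y, x) assumed
          // Section spec is "start:end:stride" per dimension: every 10th point of one time step.
          Array lowRes = temp.read("0, 0:719:10, 0:1439:10");
          System.out.println("read " + lowRes.getSize() + " points");
        } finally {
          ncfile.close();
        }
      }
    }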

What about writing low-resolution versions of the data and using those when 
possible?
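
One way to precompute such a version, sketched generically (simple block averaging over a 2D grid; missing-value handling and the actual file writing are omitted):

    /** Illustration only: average factor-by-factor blocks of a 2D grid to make a coarser grid. */
    public class Downsample {
      static float[][] blockAverage(float[][] data, int factor) {
        int ny = data.length / factor;
        int nx = data[0].length / factor;
        float[][] out = new float[ny][nx];
        for (int j = 0; j < ny; j++) {
          for (int i = 0; i < nx; i++) {
            float sum = 0;
            for (int dj = 0; dj < factor; dj++)
              for (int di = 0; di < factor; di++)
                sum += data[j * factor + dj][i * factor + di];
            out[j][i] = sum / (factor * factor);
          }
        }
        return out;
      }
    }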


Cheers, Jon


