reading raw (packed) data from NetCDF files and avoiding missing-value check
Jon Blower
jdb at mail.nerc-essc.ac.uk
Fri Oct 27 07:55:44 MDT 2006
Hi Don,
Actually I think the former (a method to read raw data) is better than
the latter (not setting missing data metadata) because I still need a
method to do the unpacking etc on my data points of interest. This
needs the invalidDataMissing, fillValueMissing attributes to be set,
but I choose when to apply them, rather than them being applied on
every single data point that is read.
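
A minimal sketch of the idea, in plain Java rather than nj22 code: the
raw chunk is kept in its packed form, and scale, offset and the
missing-value test are applied only to the points that end up in the
image. The class DeferredUnpack, its methods and the single missingValue
parameter are hypothetical names for illustration, not part of the
library, and the simple strided sampling stands in for whatever
interpolation the application really does.

```java
/** Hypothetical helper (not nj22 API): unpack only the points that are plotted. */
final class DeferredUnpack {
    private final double scaleFactor;
    private final double addOffset;
    private final short missingValue; // packed value that marks missing data (assumed)

    DeferredUnpack(double scaleFactor, double addOffset, short missingValue) {
        this.scaleFactor = scaleFactor;
        this.addOffset = addOffset;
        this.missingValue = missingValue;
    }

    /** Unpack one raw value; a missing value is returned as NaN. */
    double unpack(short packed) {
        if (packed == missingValue) {
            return Double.NaN;
        }
        return packed * scaleFactor + addOffset;
    }

    /**
     * Build an outWidth x outHeight image from a larger row-major raw chunk
     * by strided sampling, unpacking only the sampled points.
     */
    double[] sample(short[] rawChunk, int chunkWidth, int chunkHeight,
                    int outWidth, int outHeight) {
        double[] image = new double[outWidth * outHeight];
        for (int j = 0; j < outHeight; j++) {
            int srcJ = j * chunkHeight / outHeight; // source row for output row j
            for (int i = 0; i < outWidth; i++) {
                int srcI = i * chunkWidth / outWidth; // source column for output column i
                image[j * outWidth + i] = unpack(rawChunk[srcJ * chunkWidth + srcI]);
            }
        }
        return image;
    }
}
```

For a 100x100 image cut from a 1000x1000 chunk, unpack() then runs
10,000 times instead of 1,000,000.
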
Regards, Jon
On 27/10/06, Don Murray <dmurray at unidata.ucar.edu> wrote:
> Hi Jon-
>
> Thanks for the explanation. It sounds like a method to read
> the raw data would be useful or, better yet, a constructor
> for GeoGrid that would take a boolean for not setting missing
> data (akin to all the setInvalidDataMissing(), setFillValueMissing()
> methods), but still allow the coordinate system enhancements.
>
> Don
>
> Jon Blower wrote:
> > Hi Don,
> >
> > The problem is caused by my use of the nj22 library. In my
> > application I need to create an image from a NetCDF file as quickly as
> > possible. The image will often be of much lower resolution than the
> > source data, but will not necessarily be in the same coordinate
> > reference system.
> >
> > If I want to create a 100x100 image, I need to read at least 10,000
> > data points. However, reading 10,000 individual points appears to be
> > very slow (especially for an NcML aggregation) so I am compromising by
> > reading chunks of contiguous data at a time. This means that I often
> > end up reading considerably more data than I need to make the image.
> > I perform the necessary interpolation in my application and throw away
> > the unwanted data.
> >
> > If I read packed data using an "enhanced" variable, then every single
> > point is internally checked to see if it is a missing value, and every
> > single point is unpacked (scale and offset applied). Through
> > profiling, I established this to be an expensive operation because it
> > is being applied to many more data points than I need. Therefore I
> > employed a method whereby data are read in their packed form, without
> > being checked for missing values. I then perform the check just for
> > the 10,000 points that I need to plot in my image. This is
> > considerably and demonstrably faster, although as with all
> > optimisation problems, it's a compromise.
> >
> > Does this clear things up? As far as changes to the libraries go, it
> > would be handy to have a method in GeoGrid for reading "raw" (packed)
> > data as fast as possible, and giving the user the opportunity to
> > unpack the data later.
> >
> > Best wishes,
> > Jon
> >
> > On 27/10/06, Don Murray <dmurray at unidata.ucar.edu> wrote:
> >> Jon and John-
> >>
> >> Why is it so much slower using the GeoGrid directly? Perhaps
> >> there can be some performance tuning on the GeoGrid side to
> >> avoid people having to jump through the hoops that Jon is?
> >> Is it because the GeoGrid scales and offsets the entire grid
> >> before subsetting, instead of subsetting and then scaling and
> >> offsetting (which seems to be what Jon ends up doing)? Jon,
> >> when you say you are scaling and offsetting only the individual
> >> values, is this all the values in the subset or if not, what
> >> percentage of the subset are you doing this on?
> >>
> >> We've been doing some profiling of the netcdf-java reading
> >> in the IDV and if this is an area where we could get some
> >> performance enhancements, I'd like to implement something
> >> in the IDV.
> >>
> >> Don
> >>
> >> Jon Blower wrote:
> >> > Hi John (cc list),
> >> >
> >> > Thanks for your help - I found a solution that works well in my app.
> >> > As you suggested, I open the dataset without enhancement, then add
> >> > the coordinate systems:
> >> >
> >> > NetcdfDataset nc = NetcdfDataset.openDataset(location, false, null);
> >> > // Add the coordinate systems
> >> > CoordSysBuilder.addCoordinateSystems(nc, null);
> >> > GridDataset gd = new GridDataset(nc);
> >> > GeoGrid geogrid = gd.findGridByName(varID);
> >> >
> >> > I then create an EnhanceScaleMissingImpl:
> >> >
> >> > EnhanceScaleMissingImpl enhanced = new
> >> > EnhanceScaleMissingImpl((VariableDS)geogrid.getVariable());
> >> >
> >> > (Unfortunately this class is package-private so I made a copy from the
> >> > source code in my local directory. Could this class be made public
> >> > please?)
> >> >
> >> > This means that when I read data using geogrid.subset() it does not
> >> > check for missing values or unpack the data and is therefore quicker.
> >> > I then do enhanced.convertScaleOffsetMissing() only on the individual
> >> > values I need to work with. Seems to work well and is pretty quick.
> >> > Is there anything dangerous in the above?
> >> >
> >> > Thanks again,
> >> > Jon
> >> >
> >> >
> >> > On 26/10/06, John Caron <caron at unidata.ucar.edu> wrote:
> >> >> Hi Jon:
> >> >>
> >> >> Jon Blower wrote:
> >> >> > Hi John,
> >> >> >
> >> >> > I need some of the functionality of a GridDataset to allow me to
> >> >> > read coordinate system information. Also, I might be opening an
> >> >> > NcML aggregation. Is it sensible to use
> >> >> > NetcdfDataset.getReferencedFile()? In the case of an NcML
> >> >> > aggregation, is it possible to get a handle to a specific
> >> >> > NetcdfFile (given relevant information such as the timestep)?
> >> >>
> >> >> You are getting into the internals, so it's a bit dangerous.
> >> >>
> >> >> I think this will work:
> >> >>
> >> >> // don't enhance
> >> >> NetcdfDataset ncd = NetcdfDataset.openDataset(location, false, null);
> >> >> // add coord info
> >> >> ucar.nc2.dataset.CoordSysBuilder.addCoordinateSystems(ncd, null);
> >> >> // make into a grid
> >> >> GridDataset gds = new GridDataset(ncd);
> >> >>
> >> >> BTW, you will want to switch to the new GridDataset in
> >> >> ucar.nc2.dt.grid when you start using 2.2.17. It should be compatible,
> >> >> let me know.
> >> >>
> >> >>
> >> >> >
> >> >> > On a related note, is it efficient to read data from a NetcdfFile
> >> >> > (or NetcdfDataset) point-by-point? I have been assuming that reading
> >> >> > contiguous chunks of data is more efficient than reading individual
> >> >> > points, even if it means reading more data than I actually need, but
> >> >> > perhaps this is not the case? Unfortunately I'm not at my usual
> >> >> > computer so I can't do a quick check myself. If reading data
> >> >> > point-by-point is efficient (enough) my problem goes away.
> >> >>
> >> >> It depends on data locality. If the points are close together on disk,
> >> >> then they are likely to be already in the random access file buffer.
> >> >> The bigger the buffer, the more likely this is. You can try different
> >> >> buffer sizes with:
> >> >>
> >> >> NetcdfDataset openDataset(String location, boolean enhance, int
> >> >> buffer_size, ucar.nc2.util.CancelTask cancelTask, Object spiObject);
> >> >>
> >> >>
> >> >>
> >> >> >
> >> >> > Thanks, Jon
> >> >> >
> >> >> > On 26/10/06, John Caron <caron at unidata.ucar.edu> wrote:
> >> >> >
> >> >> >> Hi Jon:
> >> >> >>
> >> >> >> One obvious thing would be to open it as a NetcdfFile, not a
> >> >> >> GridDataset. Is that a possibility?
> >> >> >>
> >> >> >> Jon Blower wrote:
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > I'm writing an application that reads data from NetCDF files and
> >> >> >> > produces images. I've noticed (through profiling) that a slow
> >> >> >> > point in the data reading process is the unpacking of packed
> >> >> >> > data (i.e. applying scale and offset) and checking for missing
> >> >> >> > values. I would like to minimize the use of these calls.
> >> >> >> >
> >> >> >> > To cut a long post short, I would like to find a low-level
> >> >> >> > function that allows me to read the packed data, exactly as they
> >> >> >> > appear in the file. I can then "manually" apply the unpacking
> >> >> >> > and missing-value checks only to those data points that I need
> >> >> >> > to display.
> >> >> >> >
> >> >> >> > I'm using nj22, version 2.2.16. I've tried reading data from
> >> >> >> > GeoGrid.subset() but this (of course) performs the unpacking. I
> >> >> >> > then tried getting the "unenhanced" variable object through
> >> >> >> > GeoGrid.getVariable().getOriginalVariable(), but (unexpectedly)
> >> >> >> > this also seems to perform unpacking and missing-value checks - I
> >> >> >> > expected it to give raw data.
> >> >> >> >
> >> >> >> > Can anyone help me to find a function for reading raw (packed)
> >> >> >> > data without performing missing-value checks?
> >> >> >> >
> >> >> >> > Thanks in advance,
> >> >> >> > Jon
> >> >> >> >
> >> >> >>
> >> >> >> ==============================================================================
> >> >> >> To unsubscribe netcdf-java, visit:
> >> >> >> http://www.unidata.ucar.edu/mailing-list-delete-form.html
> >> >> >> ==============================================================================
> >>
> >> --
> >> *************************************************************
> >> Don Murray UCAR Unidata Program
> >> dmurray at unidata.ucar.edu P.O. Box 3000
> >> (303) 497-8628 Boulder, CO 80307
> >> http://www.unidata.ucar.edu/staff/donm
> >> *************************************************************
> >>
> >>
> >>
> >
> >
>
--
--------------------------------------------------------------
Dr Jon Blower Tel: +44 118 378 5213 (direct line)
Technical Director Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre Fax: +44 118 378 6413
ESSC Email: jdb at mail.nerc-essc.ac.uk
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------