
Re: reading raw (packed) data from NetCDF files and avoiding missing-value check



Hi Don,

Actually I think the former (a method to read raw data) is better than
the latter (not setting missing-data metadata), because I still need a
method to do the unpacking etc. on my data points of interest.  This
needs the invalidDataMissing and fillValueMissing attributes to be set,
but I choose when to apply them, rather than having them applied to
every single data point that is read.
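
Roughly, what I want to end up doing is something like this (just an
untested sketch: assume "raw" is a 2-D y-x slice of packed values read
from the unenhanced dataset, "enhanced" is the EnhanceScaleMissingImpl
from the code further down the thread, "imagePoints" is an illustrative
list of the (j, i) grid indices I actually need, and isMissing() does
the missing-value test I think it does):

    // Unpack and missing-check only the points that end up in the image
    Index idx = raw.getIndex();
    for (int[] p : imagePoints) {
        double packed = raw.getDouble(idx.set(p[0], p[1]));      // still packed
        double val = enhanced.convertScaleOffsetMissing(packed);  // unpack this point only
        if (!enhanced.isMissing(val)) {
            // use val for the corresponding image pixel
        }
    }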

Regards, Jon

On 27/10/06, Don Murray <address@hidden> wrote:
Hi Jon-

Thanks for the explanation.  It sounds like a method to read
the raw data would be useful, or better yet a constructor for
GeoGrid that takes a boolean to skip setting the missing-data
checks (akin to the setInvalidDataMissing() and setFillValueMissing()
methods) while still allowing the coordinate system enhancements.

Don

Jon Blower wrote:
> Hi Don,
>
> The problem is caused by my use of the nj22 library.  In my
> application I need to create an image from a NetCDF file as quickly as
> possible.  The image will often be of much lower resolution than the
> source data, but will not necessarily be in the same coordinate
> reference system.
>
> If I want to create a 100x100 image, I need to read at least 10,000
> data points.  However, reading 10,000 individual points appears to be
> very slow (especially for an NcML aggregation) so I am compromising by
> reading chunks of contiguous data at a time.  This means that I often
> end up reading considerably more data than I need to make the image.
> I perform the necessary interpolation in my application and throw away
> the unwanted data.
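>
> Concretely, the chunked read is just a plain hyperslab read covering the
> bounding box of the points I need - something along these lines (a rough
> sketch only: the origin/shape indices are made up, geogrid is as in the
> code lower down the thread, and the values come back still packed because
> the dataset was opened without enhancement):
>
>     // One contiguous read covering the bounding box of the image points
>     int[] origin = new int[] {t, z, yMin, xMin};
>     int[] shape  = new int[] {1, 1, yMax - yMin + 1, xMax - xMin + 1};
>     Array chunk = geogrid.getVariable().read(origin, shape);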
>
> If I read packed data using an "enhanced" variable, then every single
> point is internally checked to see if it is a missing value, and every
> single point is unpacked (scale and offset applied).  Through
> profiling, I established this to be an expensive operation because it
> is being applied to many more data points than I need.  Therefore I
> employed a method whereby data are read in their packed form, without
> being checked for missing values.  I then perform the check just for
> the 10,000 points that I need to plot in my image.  This is
> considerably and demonstrably faster, although as with all
> optimisation problems, it's a compromise.
>
> Does this clear things up?  As far as changes to the libraries go, it
> would be handy to have a method in GeoGrid for reading "raw" (packed)
> data as fast as possible, and giving the user the opportunity to
> unpack the data later.
>
> Best wishes,
> Jon
>
> On 27/10/06, Don Murray <address@hidden> wrote:
>> Jon and John-
>>
>> Why is it so much slower using the GeoGrid directly?  Perhaps
>> there can be some performance tuning on the GeoGrid side to
>> avoid people having to jump through the hoops that Jon is?
>> Is it because the GeoGrid scales and offsets the entire grid
>> before subsetting, instead of subsetting first and then applying
>> the scale and offset (which seems to be what Jon ends up doing)?
>> Jon, when you say you are scaling and offsetting only the individual
>> values, is this all the values in the subset, or if not, what
>> percentage of the subset are you doing this on?
>>
>> We've been doing some profiling of the netcdf-java reading
>> in the IDV and if this is an area where we could get some
>> performance enhancements, I'd like to implement something
>> in the IDV.
>>
>> Don
>>
>> Jon Blower wrote:
>> > Hi John (cc list),
>> >
>> > Thanks for your help - I found a solution that works well in my app.
>> > As you suggested, I open the dataset without enhancement, then add
>> > the coordinate systems:
>> >
>> >            nc = NetcdfDataset.openDataset(location, false, null);
>> >            // Add the coordinate systems
>> >            CoordSysBuilder.addCoordinateSystems(nc, null);
>> >            GridDataset gd = new GridDataset(nc);
>> >            GeoGrid geogrid = gd.findGridByName(varID);
>> >
>> > I then create an EnhanceScaleMissingImpl:
>> >
>> >            EnhanceScaleMissingImpl enhanced =
>> >                new EnhanceScaleMissingImpl((VariableDS) geogrid.getVariable());
>> >
>> > (Unfortunately this class is package-private, so I made a local copy
>> > of it from the source code.  Could this class be made public,
>> > please?)
>> >
>> > This means that when I read data using geogrid.subset() it does not
>> > check for missing values or unpack the data and is therefore quicker.
>> > I then do enhanced.convertScaleOffsetMissing() only on the individual
>> > values I need to work with.  Seems to work well and is pretty quick.
>> > Is there anything dangerous in the above?
>> >
>> > Thanks again,
>> > Jon
>> >
>> >
>> > On 26/10/06, John Caron <address@hidden> wrote:
>> >> Hi Jon:
>> >>
>> >> Jon Blower wrote:
>> >> > Hi John,
>> >> >
>> >> > I need some of the functionality of a GridDataset to allow me to
>> >> > read coordinate system information.  Also, I might be opening an
>> >> > NcML aggregation.  Is it sensible to use
>> >> > NetcdfDataset.getReferencedFile()?  In the case of an NcML
>> >> > aggregation, is it possible to get a handle to a specific
>> >> > NetcdfFile (given relevant information such as the timestep)?
>> >>
>> >> You are getting into the internals, so it's a bit dangerous.
>> >>
>> >> I think this will work:
>> >>
>> >>  NetcdfDataset ncd = NetcdfDataset.openDataset(location, false, null); // don't enhance
>> >>  ucar.nc2.dataset.CoordSysBuilder.addCoordinateSystems(ncd, null);     // add coord info
>> >>  GridDataset gds = new GridDataset(ncd);                               // make into a grid
>> >>
>> >> BTW, you will want to switch to the new GridDataset in
>> >> ucar.nc2.dt.grid when you start using 2.2.17.  It should be
>> >> compatible; let me know.
>> >>
>> >>
>> >> >
>> >> > On a related note, is it efficient to read data from a NetcdfFile
>> >> > (or NetcdfDataset) point-by-point?  I have been assuming that reading
>> >> > contiguous chunks of data is more efficient than reading individual
>> >> > points, even if it means reading more data than I actually need, but
>> >> > perhaps this is not the case?  Unfortunately I'm not at my usual
>> >> > computer so I can't do a quick check myself.  If reading data
>> >> > point-by-point is efficient (enough) my problem goes away.
>> >>
>> >> It depends on data locality. If the points are close together on disk,
>> >> then they will likely already be in the random access file buffer.
>> >> The bigger the buffer, the more likely that is; you can try different
>> >> buffer sizes with:
>> >>
>> >> NetcdfDataset openDataset(String location, boolean enhance, int
>> >> buffer_size, ucar.nc2.util.CancelTask cancelTask, Object spiObject);
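>> >>
>> >> e.g. something like (the buffer size here is arbitrary, just to show
>> >> the call):
>> >>
>> >>  NetcdfDataset ncd = NetcdfDataset.openDataset(location, false,
>> >>      1024 * 1024, null, null);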
>> >>
>> >>
>> >>
>> >> >
>> >> > Thanks, Jon
>> >> >
>> >> > On 26/10/06, John Caron <address@hidden> wrote:
>> >> >
>> >> >> Hi Jon:
>> >> >>
>> >> >> One obvious thing would be to open it as a NetcdfFile, not a
>> >> >> GridDataset. Is that a possibility?
>> >> >>
>> >> >> Jon Blower wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > I'm writing an application that reads data from NetCDF files and
>> >> >> > produces images.  I've noticed (through profiling) that a slow
>> >> >> > point in the data reading process is the unpacking of packed data
>> >> >> > (i.e. applying scale and offset) and checking for missing values.
>> >> >> > I would like to minimize the use of these calls.
>> >> >> >
>> >> >> > To cut a long post short, I would like to find a low-level
>> >> >> > function that allows me to read the packed data, exactly as they
>> >> >> > appear in the file.  I can then "manually" apply the unpacking
>> >> >> > and missing-value checks only to those data points that I need
>> >> >> > to display.
>> >> >> >
>> >> >> > I'm using nj22, version 2.2.16.  I've tried reading data from
>> >> >> > GeoGrid.subset() but this (of course) performs the unpacking.  I
>> >> >> > then tried getting the "unenhanced" variable object through
>> >> >> > GeoGrid.getVariable().getOriginalVariable(), but (unexpectedly)
>> >> >> > this also seems to perform unpacking and missing-value checks -
>> >> >> > I expected it to give raw data.
>> >> >> >
>> >> >> > Can anyone help me to find a function for reading raw (packed)
>> >> >> > data without performing missing-value checks?
>> >> >> >
>> >> >> > Thanks in advance,
>> >> >> > Jon
>> >> >> >
>> >> >>
>> >> >>
>> >>
>> --
>> *************************************************************
>> Don Murray                               UCAR Unidata Program
>> address@hidden                        P.O. Box 3000
>> (303) 497-8628                              Boulder, CO 80307
>> http://www.unidata.ucar.edu/staff/donm
>> *************************************************************
>>
>>
>>
>
>

--
*************************************************************
Don Murray                               UCAR Unidata Program
address@hidden                        P.O. Box 3000
(303) 497-8628                              Boulder, CO 80307
http://www.unidata.ucar.edu/staff/donm
*************************************************************





--
--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: address@hidden
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------

===============================================================================
To unsubscribe netcdf-java, visit:
http://www.unidata.ucar.edu/mailing-list-delete-form.html
===============================================================================