
Re: reading raw (packed) data from NetCDF files and avoiding missing-value check



Hi John,

are you aware of strided data access, where you can read every 2nd or 10th
point etc?

Yes I am, thanks - but because requested CRSs don't generally dovetail
nicely with the data's native CRS, this doesn't always work well.
However, it is certainly useful in some cases.
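To make the idea concrete for anyone following along: a strided read touches
only every Nth index along each axis.  The toy sketch below just shows the
index arithmetic; the class and method names are illustrative and not part
of the nj22 API:

```java
// Toy subsampling: keep every `stride`-th point of a 2-D grid, which is
// the effect of a strided read.  Illustrative names only - not nj22 code.
class StridedSketch {
    static double[][] subsample(double[][] data, int stride) {
        int ny = (data.length + stride - 1) / stride;    // ceil(rows/stride)
        int nx = (data[0].length + stride - 1) / stride; // ceil(cols/stride)
        double[][] out = new double[ny][nx];
        for (int j = 0; j < ny; j++)
            for (int i = 0; i < nx; i++)
                out[j][i] = data[j * stride][i * stride];
        return out;
    }
}
```

In nj22 itself the stride is expressed through the range/section arguments
of a read call rather than a copy loop like this.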

sounds like you are finding the nearest neighbor in the large array, based on
what points are needed in your output CRS?

Exactly right, yes.
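For a regular source axis that lookup is just a rounded division plus a
clamp - a toy sketch (illustrative names; real axes may be irregular or in a
different CRS, which is where the actual complexity lives):

```java
// Nearest-neighbour index on a regular 1-D axis defined by an origin and a
// constant spacing.  Illustrative only - not the nj22 API.
class NearestNeighbourSketch {
    static int nearestIndex(double coord, double origin, double spacing, int n) {
        int i = (int) Math.round((coord - origin) / spacing);
        if (i < 0) i = 0;          // clamp to the valid index range
        if (i >= n) i = n - 1;
        return i;
    }
}
```

Doing this once per output pixel (100x100 = 10,000 lookups) is cheap; the
expensive part is how the underlying data points are then read.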

i will also look at trying to make the converter faster.
do you have any timing test output that would be instructive?

All my timing data is from the NetBeans profiler
(http://profiler.netbeans.org/), which I have found to be extremely
useful for identifying "hot spots" in code.  This produces "live"
graphical output so I'm afraid I don't have anything to send you.  I
could send some screenshots if you like but it would be much more
meaningful for you to run the profiler yourself if this is possible.

Thanks, Jon


On 27/10/06, John Caron <address@hidden> wrote:


Jon Blower wrote:
> Hi Don,
>
> The problem is caused by my use of the nj22 library.  In my
> application I need to create an image from a NetCDF file as quickly as
> possible.  The image will often be of much lower resolution than the
> source data, but will not necessarily be in the same coordinate
> reference system.

are you aware of strided data access, where you can read every 2nd or 10th point
etc?

sounds like you are finding the nearest neighbor in the large array, based on
what points are needed in your output CRS?

>
> If I want to create a 100x100 image, I need to read at least 10,000
> data points.  However, reading 10,000 individual points appears to be
> very slow (especially for an NcML aggregation) so I am compromising by
> reading chunks of contiguous data at a time.  This means that I often
> end up reading considerably more data than I need to make the image.
> I perform the necessary interpolation in my application and throw away
> the unwanted data.
>
> If I read packed data using an "enhanced" variable, then every single
> point is internally checked to see if it is a missing value, and every
> single point is unpacked (scale and offset applied).  Through
> profiling, I established this to be an expensive operation because it
> is being applied to many more data points than I need.  Therefore I
> employed a method whereby data are read in their packed form, without
> being checked for missing values.  I then perform the check just for
> the 10,000 points that I need to plot in my image.  This is
> considerably and demonstrably faster, although as with all
> optimisation problems, it's a compromise.
>
> Does this clear things up?  As far as changes to the libraries go, it
> would be handy to have a method in GeoGrid for reading "raw" (packed)
> data as fast as possible, and giving the user the opportunity to
> unpack the data later.

that seems reasonable, i will see how easy it is to do.
in any case, some fine-grained control is needed.

i will also look at trying to make the converter faster.
do you have any timing test output that would be instructive?
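For reference, the unpack-on-demand approach Jon describes above boils down
to something like this - a toy sketch, with an illustrative fill value and
names rather than nj22's own:

```java
// Unpack-on-demand: keep the data in its packed (e.g. short) form and apply
// the missing-value test and scale/offset only to the points that are
// actually plotted.  The fill value and names here are illustrative.
class UnpackSketch {
    static final short PACKED_MISSING = -32768; // assumed fill value

    // Returns NaN for a missing point, otherwise the unpacked value.
    static double convert(short packed, double scale, double offset) {
        if (packed == PACKED_MISSING) return Double.NaN;
        return packed * scale + offset;
    }
}
```

Applied to only the ~10,000 plotted points instead of every point read,
this is where the measured speed-up comes from.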



>
> Best wishes,
> Jon
>
> On 27/10/06, Don Murray <address@hidden> wrote:
>
>> Jon and John-
>>
>> Why is it so much slower using the GeoGrid directly?  Perhaps
>> there can be some performance tuning on the GeoGrid side to
>> avoid people having to jump through the hoops that Jon is?
>> Is it because the GeoGrid scales and offsets the entire grid
>> before subsetting, instead of subsetting and then scaling and
>> offsetting (which seems to be what Jon ends up doing)?  Jon,
>> when you say you are scaling and offsetting only the individual
>> values, is this all the values in the subset or if not, what
>> percentage of the subset are you doing this on?
>>
>> We've been doing some profiling of the netcdf-java reading
>> in the IDV and if this is an area where we could get some
>> performance enhancements, I'd like to implement something
>> in the IDV.
>>
>> Don
>>
>> Jon Blower wrote:
>> > Hi John (cc list),
>> >
>> > Thanks for your help - I found a solution that works well in my app.
>> > As you suggested, I open the dataset without enhancement, then added
>> > the coordinate systems:
>> >
>> >            nc = NetcdfDataset.openDataset(location, false, null);
>> >            // Add the coordinate systems
>> >            CoordSysBuilder.addCoordinateSystems(nc, null);
>> >            GridDataset gd = new GridDataset(nc);
>> >            GeoGrid geogrid = gd.findGridByName(varID);
>> >
>> > I then create an EnhanceScaleMissingImpl:
>> >
>> >            EnhanceScaleMissingImpl enhanced = new
>> > EnhanceScaleMissingImpl((VariableDS)geogrid.getVariable());
>> >
>> > (Unfortunately this class is package-private so I made a copy from the
>> > source code in my local directory.  Could this class be made public
>> > please?)
>> >
>> > This means that when I read data using geogrid.subset() it does not
>> > check for missing values or unpack the data and is therefore quicker.
>> > I then do enhanced.convertScaleOffsetMissing() only on the individual
>> > values I need to work with.  Seems to work well and is pretty quick.
>> > Is there anything dangerous in the above?
>> >
>> > Thanks again,
>> > Jon
>> >
>> >
>> > On 26/10/06, John Caron <address@hidden> wrote:
>> >> Hi Jon:
>> >>
>> >> Jon Blower wrote:
>> >> > Hi John,
>> >> >
>> >> > I need some of the functionality of a GridDataset to allow me to read
>> >> > coordinate system information.  Also, I might be opening an NcML
>> >> > aggregation.  Is it sensible to use NetcdfDataset.getReferencedFile()?
>> >> > In the case of an NcML aggregation, is it possible to get a handle to
>> >> > a specific NetcdfFile (given relevant information such as the
>> >> > timestep)?
>> >>
>> >> You are getting into the internals, so it's a bit dangerous.
>> >>
>> >> I think this will work:
>> >>
>> >>  NetcdfDataset ncd = NetcdfDataset.openDataset(location, false, null); // don't enhance
>> >>  ucar.nc2.dataset.CoordSysBuilder.addCoordinateSystems(ncd, null); // add coord info
>> >>  GridDataset gds = new GridDataset(ncd); // make into a grid
>> >>
>> >> BTW, you will want to switch to the new GridDataset in
>> >> ucar.nc2.dt.grid when you start using 2.2.17. It should be compatible;
>> >> let me know.
>> >>
>> >>
>> >> >
>> >> > On a related note, is it efficient to read data from a NetcdfFile (or
>> >> > NetcdfDataset) point-by-point?  I have been assuming that reading
>> >> > contiguous chunks of data is more efficient than reading individual
>> >> > points, even if it means reading more data than I actually need, but
>> >> > perhaps this is not the case?  Unfortunately I'm not at my usual
>> >> > computer so I can't do a quick check myself.  If reading data
>> >> > point-by-point is efficient (enough) my problem goes away.
>> >>
>> >> It depends on data locality. If the points are close together on disk,
>> >> then they will likely already be in the random access file buffer.
>> >> The bigger the buffer, the more likely this is; you can try different
>> >> buffer sizes with:
>> >>
>> >> NetcdfDataset openDataset(String location, boolean enhance, int
>> >> buffer_size, ucar.nc2.util.CancelTask cancelTask, Object spiObject);
>> >>
>> >>
>> >>
>> >> >
>> >> > Thanks, Jon
>> >> >
>> >> > On 26/10/06, John Caron <address@hidden> wrote:
>> >> >
>> >> >> Hi Jon:
>> >> >>
>> >> >> One obvious thing would be to open it as a NetcdfFile, not a
>> >> >> GridDataset. Is that a possibility?
>> >> >>
>> >> >> Jon Blower wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > I'm writing an application that reads data from NetCDF files and
>> >> >> > produces images.  I've noticed (through profiling) that a slow point
>> >> >> > in the data reading process is the unpacking of packed data (i.e.
>> >> >> > applying scale and offset) and checking for missing values.  I would
>> >> >> > like to minimize the use of these calls.
>> >> >> >
>> >> >> > To cut a long post short, I would like to find a low-level function
>> >> >> > that allows me to read the packed data, exactly as they appear in the
>> >> >> > file.  I can then "manually" apply the unpacking and missing-value
>> >> >> > checks only to those data points that I need to display.
>> >> >> >
>> >> >> > I'm using nj22, version 2.2.16.  I've tried reading data from
>> >> >> > GeoGrid.subset() but this (of course) performs the unpacking.  I then
>> >> >> > tried getting the "unenhanced" variable object through
>> >> >> > GeoGrid.getVariable().getOriginalVariable(), but (unexpectedly) this
>> >> >> > also seems to perform unpacking and missing-value checks - I expected
>> >> >> > it to give raw data.
>> >> >> >
>> >> >> > Can anyone help me to find a function for reading raw (packed) data
>> >> >> > without performing missing-value checks?
>> >> >> >
>> >> >> > Thanks in advance,
>> >> >> > Jon
>> >> >> >
>> >> >>
>> >> >>
>> >>
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>> --
>> *************************************************************
>> Don Murray                               UCAR Unidata Program
>> address@hidden                        P.O. Box 3000
>> (303) 497-8628                              Boulder, CO 80307
>> http://www.unidata.ucar.edu/staff/donm
>> *************************************************************
>>
>>
>>
>
>



--
--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: address@hidden
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------

===============================================================================
To unsubscribe netcdf-java, visit:
http://www.unidata.ucar.edu/mailing-list-delete-form.html
===============================================================================