
[netCDF #YUZ-640378]: Associative Lookup



Hi Lee,

Here's an idea for how to do it, but I don't have an implementation. It uses 
the unlimited record index to get direct access to the data within a specified 
lat/lon region.

Goals:

    * To support writing data from irregularly scattered stations in
      one pass 
    * To not waste much space in the file
    * To permit direct retrievals by location (and time, with an easy
      generalization) 

Assumptions:

    * Use of record dimension as a list dimension rather than for time
    * Resolution of location lookup is fixed

For this example, assume forest-fire fuel is in scattered circles with known 
center and radius.  Given the lat,lon of a fire, you want to quickly find out 
if it is in any of these known circles and if so get the data on smoke 
potential, combustibility, fuel, etc.

Create an int grid variable, say loc_cells, dimensioned by (nlats,nlons): 
dimensions:

   ...
  rec=unlimited;
  nlats = 500; nlons=500;
variables:
  int loc_cells(nlats,nlons);  // cells containing record numbers 
                               // of list headers for locs in that 
                               // cell
  int smoke_potential(rec);
  int combustibility(rec);
  int fuel_type(rec);
  float lat_center(rec);       // for a circle of a given type
  float lon_center(rec);
  float radius(rec);
  int next(rec);               // next data for this cell
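
Here is a minimal sketch of how a writer might fill this structure in one pass. It is written in Python, with plain in-memory lists standing in for the netCDF variables, so it makes no netCDF library calls. The grid bounds, the cell_index helper, the FuelFile class, and the use of -1 to mark an empty cell are all assumptions made for illustration. As a simplification, each circle is filed only under the cell that contains its center.

```python
NLATS, NLONS = 500, 500
# Assumed geographic extent covered by the grid (not from the original message).
LAT_MIN, LAT_MAX = 30.0, 50.0
LON_MIN, LON_MAX = -125.0, -105.0
END_OF_LIST = -1  # assumed marker for "no record" / end of a cell's list


def cell_index(lat, lon):
    """Map (lat, lon) to a (row, col) cell at the fixed lookup resolution."""
    i = int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * NLATS)
    j = int((lon - LON_MIN) / (LON_MAX - LON_MIN) * NLONS)
    # Clamp so points on or beyond the edges still land in a valid cell.
    return min(max(i, 0), NLATS - 1), min(max(j, 0), NLONS - 1)


class FuelFile:
    """In-memory stand-in for the file: one list per record variable."""

    def __init__(self):
        self.loc_cells = [[END_OF_LIST] * NLONS for _ in range(NLATS)]
        self.lat_center, self.lon_center, self.radius = [], [], []
        self.smoke_potential, self.combustibility, self.fuel_type = [], [], []
        self.next = []

    def append_circle(self, lat, lon, radius, smoke, combust, fuel):
        """Append one record and make it the new head of its cell's list."""
        i, j = cell_index(lat, lon)
        rec = len(self.next)  # next free record number
        self.lat_center.append(lat)
        self.lon_center.append(lon)
        self.radius.append(radius)
        self.smoke_potential.append(smoke)
        self.combustibility.append(combust)
        self.fuel_type.append(fuel)
        # Link to the previous head (or END_OF_LIST), then become the head.
        self.next.append(self.loc_cells[i][j])
        self.loc_cells[i][j] = rec
        return rec
```

Adding each new record at the head of its cell's list means earlier records never need rewriting. Only the single loc_cells entry changes, which is what allows writing in one pass.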

The reader just reads the loc_cells variable into memory on opening the file.  
When the data corresponding to a fire at a particular (lat,lon) is desired, 
the single int for the cell containing that location is looked up.  It 
specifies the record number at the head of a list for that cell location.  You 
read that record's data for each desired variable, including the next 
variable.  If this doesn't give you a match (say the fire isn't in that 
record's circle), you use the value of next(rec) to get another record that 
identifies another region in the same lat,lon cell.  You keep traversing the 
linked list until you get to, say, -1 in the next variable, indicating the end 
of the list for that lat,lon cell.  You can also follow the lists corresponding 
to surrounding cells 
...
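
The list traversal described above can be sketched as follows. This is Python with small in-memory lists standing in for the file's variables, not an implementation against the netCDF library. The find_circle function, the sample values, and the choice to measure distance as plain Euclidean distance in degrees (with radius also in degrees) are assumptions for illustration.

```python
import math

END_OF_LIST = -1  # assumed end-of-list marker, as suggested in the text

# Tiny stand-in for the file: a 2x2 loc_cells grid and three records.
loc_cells = [[2, END_OF_LIST],
             [END_OF_LIST, 1]]
rec_vars = {
    "lat_center":      [10.2, 30.0, 10.6],
    "lon_center":      [20.2, 40.0, 20.6],
    "radius":          [0.15, 0.5, 0.1],
    "smoke_potential": [7, 4, 9],
    "next":            [END_OF_LIST, END_OF_LIST, 0],  # record 2 -> record 0
}


def find_circle(loc_cells, rec_vars, cell, lat, lon):
    """Walk one cell's list; return the first record whose circle
    contains (lat, lon), or None if the list ends without a match."""
    i, j = cell
    rec = loc_cells[i][j]  # head of the list for this cell
    while rec != END_OF_LIST:
        dlat = lat - rec_vars["lat_center"][rec]
        dlon = lon - rec_vars["lon_center"][rec]
        if math.hypot(dlat, dlon) <= rec_vars["radius"][rec]:
            return rec
        rec = rec_vars["next"][rec]  # follow the link to the next record
    return None


# Record 2 (the head of cell (0, 0)) is rejected, then record 0 matches.
rec = find_circle(loc_cells, rec_vars, (0, 0), 10.25, 20.25)
if rec is not None:
    print(rec, rec_vars["smoke_potential"][rec])
```

To handle circles that spill over a cell boundary, the same function could be called for each surrounding cell as well.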

This assumes your grid of cells has the right resolution, so that only a few 
data records (forest circle regions) occur in each cell.

Note that this problem can be solved more easily with a real GIS system, where 
your regions are polygonal, each region has a database record associated with 
it, and fast computational geometry algorithms are used to quickly determine 
which of many (possibly overlapping) regions a specified location is in, then 
retrieving its database record.  Using netCDF instead of a database makes 
tradeoffs of portability, performance, and features.

In particular, while each netCDF access to the value of a record variable for a 
particular record number is direct, it might be a separate disk access for each 
such access, so the netCDF solution would be relatively slow if there were long 
lists in each cell to search or lots of record variables.
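
To make that cost concrete, here is a rough back-of-the-envelope count in Python. It assumes the worst case mentioned above, where every single-value read is a separate disk access. The two strategies and the function names are hypothetical: "eager" reads all seven record variables for each record visited, while "lazy" reads only the geometry and next variables during traversal and fetches the three payload variables once a match is found.

```python
RECORD_VARS = ["smoke_potential", "combustibility", "fuel_type",
               "lat_center", "lon_center", "radius", "next"]
# Variables needed just to test a circle and follow the list.
GEOMETRY_VARS = ["lat_center", "lon_center", "radius", "next"]


def reads_eager(records_visited):
    """Worst-case accesses when every variable is read for every record."""
    return records_visited * len(RECORD_VARS)


def reads_lazy(records_visited, matched):
    """Worst-case accesses when payload is read only for a matching record."""
    payload = len(RECORD_VARS) - len(GEOMETRY_VARS)
    return records_visited * len(GEOMETRY_VARS) + (payload if matched else 0)


# A list of 5 records, with the match found at the last one.
print(reads_eager(5), reads_lazy(5, matched=True))
```

Either way the count grows linearly with list length, which is why the cell resolution should keep the lists short.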

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: YUZ-640378
Department: Support netCDF
Priority: Normal
Status: Closed