
[Staging #VMF-551087]: most efficient ordering of array dimensions



Hi Tom,

> Can you tell me what the CF convention is (ambiguous online) for ordering
> array dimensions of netcdf dimensions? Aka if I have a variable, say,
> temperature with latitude, longitude, and time dimensions (with time
> set to /unlimited), is it recommended the array be sent to netcdf as
> temperature(latitude,longitude,time)? And does "most rapidly varying
> dimension" typically mean the dimension with the most elements (aka
> unlimited being potentially the most)?

In general, dimensions for gridded data may be in any order in
CF-compliant data.  CF evolved from the earlier COARDS conventions,
which mandated a strict order for dimensions, but CF relaxed the
COARDS rules to allow more flexibility.

An exception is if you have data that fits one of the "feature types"
defined and described in the new CF "Chapter 9.  Discrete Sampling
Geometries":

  http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html#discrete-sampling-geometries

In that case the order of dimensions is specified, to facilitate
software that recognizes observational data that is a collection of
points, time series, trajectories, or profiles.

The phrase "most rapidly varying dimension" generally refers to the
order in which the data is laid out on disk, and has nothing to do
with the size of dimensions.  In CDL and the C interface, it is the
last dimension listed for a variable (the Fortran interface reverses
the order).  For example, 3D data might be stored in rows of fixed
time and longitude, with adjacent values differing by latitude, in
which case latitude would be the most rapidly varying dimension.
This would be appropriate and efficient if the data were most commonly
accessed in latitude rows or in longitude-latitude slices, but would
be inefficient if the most common access was reading a time series at
a grid point.
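
To make the layout in that example concrete, here is a minimal sketch
in plain Python (no netCDF library involved).  The grid sizes and the
helper name row_major_strides are made up for illustration; the only
assumption is a row-major layout, where the last dimension varies
fastest.

```python
def row_major_strides(shape):
    """Elements to skip to advance one step along each dimension
    when the last dimension varies fastest (row-major order)."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)

# Hypothetical grid sizes, chosen only for illustration.
dims = ("time", "longitude", "latitude")
shape = (365, 360, 180)
strides = dict(zip(dims, row_major_strides(shape)))
print(strides)
```

Neighbors in latitude sit next to each other (stride 1), while
consecutive time steps at one grid point are a full
longitude-latitude slice apart.  Latitude is the most rapidly varying
dimension here even though it is the smallest of the three.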

With netCDF-3 classic format files, the data provider must choose the
order of dimensions, guided by convenience in writing data in the
order it's generated or by efficiency of access for later readers of
the data.  (The one constraint is that an unlimited dimension must be
the first, most slowly varying, dimension of a variable.)  Sometimes
those two orders are different, which may require reordering the data
for the most common patterns of access.  Different patterns of access
might dictate different orders of dimensions, but with netCDF-3 you
have to pick one or make multiple copies of the data using different
layouts.
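
As a rough illustration of that trade-off, the sketch below counts how
many separate contiguous runs a read has to touch in a row-major
layout.  It is a simplified model, not how the netCDF library does its
I/O, and the grid sizes and the function name contiguous_runs are
hypothetical.

```python
def contiguous_runs(shape, counts):
    """Number of separate contiguous runs needed to read a hyperslab
    of size `counts` from a row-major array of size `shape`."""
    i = len(shape) - 1
    # Trailing dimensions that are read in full merge into longer runs.
    while i > 0 and counts[i] == shape[i]:
        i -= 1
    runs = 1
    for c in counts[:i]:
        runs *= c
    return runs

# Layout A: (time, longitude, latitude), latitude varies fastest.
shape_a = (365, 360, 180)
series_a = contiguous_runs(shape_a, (365, 1, 1))    # time series at a point
map_a = contiguous_runs(shape_a, (1, 360, 180))     # one full map

# Layout B: (longitude, latitude, time), time varies fastest.
shape_b = (360, 180, 365)
series_b = contiguous_runs(shape_b, (1, 1, 365))
map_b = contiguous_runs(shape_b, (360, 180, 1))

print("A:", series_a, map_a)
print("B:", series_b, map_b)
```

Each layout makes one access pattern a single contiguous read and
scatters the other into many small ones, which is why a single
dimension order cannot serve both equally well.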

With netCDF-4, it's possible to compromise using "chunking"
(multidimensional tiling).  There are no CF conventions for chunking
yet.  Like compression, chunking is a performance characteristic that
may improve efficiency for diverse patterns of access without
requiring any change to the software that reads the data, since both
compression and chunking are transparent to readers.
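
To see why chunking can be a compromise, here is a small sketch that
counts the chunks a read overlaps.  The chunk shapes and grid sizes are
hypothetical, reads are assumed to start on a chunk boundary, and the
count ignores caching and compression, so treat it as a rough guide
only.

```python
from math import prod

def chunks_touched(counts, chunk_shape):
    """Chunks overlapped by a hyperslab of size `counts` that starts
    on a chunk boundary (ceiling division along each dimension)."""
    return prod(-(-c // k) for c, k in zip(counts, chunk_shape))

# Hypothetical (time, longitude, latitude) grid of 365 x 360 x 180.
series = (365, 1, 1)      # time series at one grid point
one_map = (1, 360, 180)   # one full longitude-latitude map

results = {}
for chunk_shape in [(1, 360, 180), (365, 1, 1), (73, 36, 18)]:
    results[chunk_shape] = (chunks_touched(series, chunk_shape),
                            chunks_touched(one_map, chunk_shape))
    print(chunk_shape, results[chunk_shape])
```

The first two chunk shapes each favor one access pattern at the
expense of the other, much like a fixed dimension order.  The third
shape is not optimal for either pattern but keeps both reads to a
modest number of chunks.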

If you want to know more about chunking and performance, see the 2011
netCDF workshop slides on the subject:

  http://www.unidata.ucar.edu/netcdf/workshops/2011/nc4chunking/WhatIsChunking.html
  http://www.unidata.ucar.edu/software/netcdf/workshops/2011/chunk_cache/index.html

We hope to provide better guidance in the netCDF documentation on
chunking and compression in a future release ...

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: VMF-551087
Department: Support netCDF
Priority: Normal
Status: Closed