Re: [netcdfgroup] New C++ interface for netCDF

On Mon, 24 Feb 2020 at 18:27 Chris Barker <chris.barker@xxxxxxxx> wrote:

> Sounds interesting. One suggestion:
>
> It's a subtle distinction, but rather than an "interface to netcdf",
> consider building a library for working with data that conforms to
> the netcdf data model, that has netcdf-C as one back-end.
>
> This is how xarray is built, and it opens the door to other file
> formats, like zarr, or even grib, etc.

Thanks Chris. I could isolate most netCDF-C API functions in a 'storage
adaptor' class for future extensibility. The I/O API is basically the
same across N-dimensional array formats, and many libraries (e.g.
bjoern-andres/marray, xtensor) provide BLAS-like slicing over contiguous
storage. Efficient views over chunked storage are more difficult, but
they can be implemented with a generic caching iterator, similar to the
approach used by nccopy. I'm working on a C++ implementation.
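
As a rough illustration of the 'storage adaptor' idea, here is a minimal
sketch, not the library's actual design. The names `storage_adaptor` and
`memory_adaptor` are hypothetical, and an in-memory row-major array
stands in for a real backend. The `read` arguments follow the
start/count/stride convention of `nc_get_vars_double`, so a netCDF-C
backend could forward them more or less directly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical backend-neutral interface (illustrative names only).
class storage_adaptor {
public:
    virtual ~storage_adaptor() = default;
    // Extent of each dimension, slowest-varying first.
    virtual std::vector<std::size_t> shape() const = 0;
    // Read a strided hyperslab: start index, element count and stride
    // per dimension. The result is returned in row-major order.
    virtual std::vector<double> read(
        const std::vector<std::size_t>& start,
        const std::vector<std::size_t>& count,
        const std::vector<std::size_t>& stride) const = 0;
};

// Stand-in backend: a contiguous row-major array held in memory.
class memory_adaptor : public storage_adaptor {
    std::vector<std::size_t> shape_;
    std::vector<double> data_;

public:
    memory_adaptor(std::vector<std::size_t> shape, std::vector<double> data)
        : shape_(std::move(shape)), data_(std::move(data)) {
        const std::size_t n = std::accumulate(
            shape_.begin(), shape_.end(), std::size_t{1},
            std::multiplies<std::size_t>());
        assert(n == data_.size());
    }

    std::vector<std::size_t> shape() const override { return shape_; }

    std::vector<double> read(
        const std::vector<std::size_t>& start,
        const std::vector<std::size_t>& count,
        const std::vector<std::size_t>& stride) const override {
        const std::size_t rank = shape_.size();
        assert(start.size() == rank && count.size() == rank &&
               stride.size() == rank);

        // Element step of each dimension in the flat row-major array.
        std::vector<std::size_t> step(rank, 1);
        for (std::size_t d = rank; d-- > 1;) step[d - 1] = step[d] * shape_[d];

        std::size_t total = 1;
        for (std::size_t d = 0; d < rank; ++d) {
            // The last requested index must lie inside the dimension.
            assert(count[d] == 0 ||
                   start[d] + (count[d] - 1) * stride[d] < shape_[d]);
            total *= count[d];
        }

        std::vector<double> out;
        out.reserve(total);
        std::vector<std::size_t> idx(rank, 0);  // position within the slab
        for (std::size_t n = 0; n < total; ++n) {
            std::size_t offset = 0;
            for (std::size_t d = 0; d < rank; ++d)
                offset += (start[d] + idx[d] * stride[d]) * step[d];
            out.push_back(data_.at(offset));
            // Advance like an odometer, last dimension fastest.
            for (std::size_t d = rank; d-- > 0;) {
                if (++idx[d] < count[d]) break;
                idx[d] = 0;
            }
        }
        return out;
    }
};
```

Slicing code written against `storage_adaptor` never sees the file
format; a chunk-caching backend would implement the same two functions.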

The problem is figuring out how to index and slice the array, since
metadata encoding schemes differ vastly between formats. Flexible
slicing is the primary goal of this library. Consider how difficult this
would be with the Unidata netCDF libraries (taken from the code example):

```
auto slice = tcw.select(
    ncpp::selection<date::sys_days>{"time", start, end, 2},
    ncpp::selection<double>{"latitude", 77.5, 80},
    ncpp::selection<double>{"longitude", 7.5, 10}
);

// tcw(2002-07-01 12:00,80,7.5)    = -23261
// tcw(2002-07-01 12:00,80,10)     = -23675
// tcw(2002-07-01 12:00,77.5,7.5)  = -23473
// tcw(2002-07-01 12:00,77.5,10)   = -23216
// ...
```
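
To show what a value-based selection has to do underneath, here is a
minimal sketch of turning a closed coordinate interval into a start
index and a count. It is not the ncpp implementation: `index_range` and
`select_range` are hypothetical names, and the sketch assumes the
coordinate variable is strictly monotonic (ascending or descending), as
CF requires of coordinate variables. Striding and time decoding are left
out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result type: first index and number of elements selected.
struct index_range {
    std::size_t start;
    std::size_t count;
};

// Map the closed interval [lo, hi] of coordinate values onto indices of
// `coord`. Assumes `coord` is strictly monotonic; direction is detected
// from its first and last values.
inline index_range select_range(const std::vector<double>& coord,
                                double lo, double hi) {
    assert(lo <= hi);
    if (coord.empty()) return {0, 0};

    const auto begin = coord.begin();
    const auto end = coord.end();
    std::size_t first = 0;
    std::size_t last = 0;  // one past the final selected index

    if (coord.front() <= coord.back()) {
        // Ascending axis, e.g. longitude 0, 2.5, 5, ...
        first = static_cast<std::size_t>(std::lower_bound(begin, end, lo) - begin);
        last = static_cast<std::size_t>(std::upper_bound(begin, end, hi) - begin);
    } else {
        // Descending axis, e.g. latitude 90, 87.5, 85, ...
        const std::greater<double> desc;
        first = static_cast<std::size_t>(std::lower_bound(begin, end, hi, desc) - begin);
        last = static_cast<std::size_t>(std::upper_bound(begin, end, lo, desc) - begin);
    }
    return {first, last - first};
}
```

The resulting start and count per dimension are what a hyperslab read
such as `nc_get_vara_double` expects, which is why the selection above
can return values in the file's own axis order (latitude 80 before 77.5
when the axis is stored north to south).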

The CF conventions are the only major standard which can unambiguously
associate an array with labeled axes (coordinate variables), coordinate
chains (`instance_dimension` attribute) and related variables
(`ancillary_variables` attribute). CF conventions are currently only
defined for the netCDF data model, and there is not yet a standardized
and portable metadata mapping from the netCDF data model to other
backends. For example, if you convert a GRIB-2 file to netCDF using
Unidata CDM, ecCodes, or wgrib2, you may get very different results,
including mislabeled variables. There are just too many edge cases.

xarray allows for more control over dataset indexing and value 
conversion. While I recognize that this is useful, I initially want to 
prioritize simple and unambiguous operations on standards-compliant 
files. For example, date/time conversion should work the same as it does 
in ncdump, and coordinate variables have to meet certain preconditions. 
CF compliance also makes high-level, automatic indexing possible for
discrete sampling geometries, using the `featureType` and `cf_role`
attributes to determine logical structure.

--
John Buonagurio