
[netcdfgroup] Proposal for Generated Filter Code

# Proposal for Generated Filter Code

I am starting work on a new netcdf utility that takes a
simplified filter specification and uses it to generate a
complete HDF5 filter wrapper [2] plus an NCZarr filter wrapper.
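
Much of that generation amounts to filling fixed C templates with values taken from the specification. As a minimal sketch (the function `gen_class_table` and its template are hypothetical, not the prototype's actual output), the following C code emits the `H5Z_class2_t` table that an HDF5 filter wrapper [1] uses to register itself. The field order follows HDF5's public struct: version, filter id, encoder/decoder flags, name, `can_apply`, `set_local`, and the filter callback.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator step: emit the H5Z_class2_t table that registers
 * a filter with HDF5. The text goes into out (at most outlen bytes,
 * NUL-terminated); the return value is snprintf's: the full length of the
 * generated text, or a negative value on an encoding error. */
int gen_class_table(char *out, size_t outlen, const char *name, unsigned id)
{
    char upper[64];
    size_t i;

    /* Upper-case the preferred name to form the table's identifier. */
    for (i = 0; name[i] != '\0' && i + 1 < sizeof upper; i++)
        upper[i] = (char)toupper((unsigned char)name[i]);
    upper[i] = '\0';

    return snprintf(out, outlen,
        "const H5Z_class2_t H5Z_%s[1] = {{\n"
        "    H5Z_CLASS_T_VERS,            /* version of this struct */\n"
        "    (H5Z_filter_t)%u,            /* HDF5-assigned filter id */\n"
        "    1, 1,                        /* encoder, decoder present */\n"
        "    \"%s\",                      /* filter name */\n"
        "    NULL, NULL,                  /* can_apply, set_local */\n"
        "    (H5Z_func_t)H5Z_filter_%s    /* the filter callback */\n"
        "}};\n",
        upper, id, name, name);
}
```

With the Zstandard values from Appendix A (`"zstandard"`, 32015), this produces a table named `H5Z_ZSTANDARD` that refers to a callback `H5Z_filter_zstandard`, which the generator would also have to emit.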

The raison d'être is that building an HDF5 filter
wrapper [2,3] from scratch is complex, time-consuming, and
error-prone. Using a code generator is likely to simplify this
process. At the very least, it will produce base code that a
filter builder can modify to build the desired wrapper.

This program is analogous to, say, the yacc parser generator,
which converts an annotated BNF grammar into a full-blown parser.

**What I need:** I have a simple prototype working, but I need
some community input on this idea. Would anyone use it? Is the
proposed specification (Appendix A) reasonably simple to construct?

If you want to participate, use this
[GitHub discussion](https://github.com/Unidata/netcdf-c/discussions/2288).

# Specification Overview

The filter specification is written in JSON, although in a
highly stylized form. It was derived from the NumCodecs [4] format,
but with significant extensions to support the netCDF-4/HDF5
wrapper format.

Two visible extensions to standard JSON are:

1. Single-line comments are supported, beginning with '#'.
2. An alternate string delimiter is provided: the backquote
(`` ` ``) character. It was chosen because that character very
rarely occurs in C code, so code fragments can be quoted without
escaping their double quotes and backslashes.
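
The proposal does not say how the utility reads these extensions. One plausible approach, sketched here as an assumption, is a small pre-pass that rewrites the extended syntax as strict JSON so an ordinary JSON parser can take over. The function `normalize_spec` is hypothetical. Because a '#' inside a backquoted string is copied through, C preprocessor lines such as `#include` in a "prefix" fragment survive the pass.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pre-pass that rewrites the extended syntax as strict JSON:
 *   - '#' outside any string starts a comment that runs to end of line;
 *   - `...` is a raw string, re-emitted as a "..." string with escapes.
 * Returns 0 on success, -1 if out is too small or a string is unterminated. */
int normalize_spec(const char *in, char *out, size_t outlen)
{
    size_t n = 0;
    if (outlen == 0) return -1;
/* Append one character, always leaving room for the final NUL. */
#define PUT(ch) do { if (n + 1 >= outlen) return -1; out[n++] = (ch); } while (0)
    while (*in) {
        char c = *in++;
        if (c == '#') {                      /* comment: drop up to the newline */
            while (*in && *in != '\n') in++;
        } else if (c == '"') {               /* ordinary JSON string: copy as-is */
            PUT(c);
            while (*in && *in != '"') {
                if (*in == '\\' && in[1]) PUT(*in++);  /* keep \x pairs together */
                PUT(*in++);
            }
            if (!*in) return -1;             /* unterminated string */
            PUT(*in++);                      /* closing quote */
        } else if (c == '`') {               /* raw string: escape for JSON */
            PUT('"');
            while (*in && *in != '`') {
                unsigned char r = (unsigned char)*in++;
                if (r == '"' || r == '\\') { PUT('\\'); PUT((char)r); }
                else if (r == '\n') { PUT('\\'); PUT('n'); }
                else if (r == '\t') { PUT('\\'); PUT('t'); }
                else if (r < 0x20) {         /* other control characters */
                    char esc[7];
                    snprintf(esc, sizeof esc, "\\u%04x", (unsigned)r);
                    for (int k = 0; esc[k] != '\0'; k++) PUT(esc[k]);
                } else PUT((char)r);
            }
            if (!*in) return -1;             /* unterminated raw string */
            in++;                            /* skip the closing backquote */
            PUT('"');
        } else {
            PUT(c);
        }
    }
#undef PUT
    out[n] = '\0';
    return 0;
}
```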

The basic specification is a JSON dictionary with very specific
keys that are used to control code generation.

Appendix A shows a draft specification for the Zstandard filter wrapper. The dictionary keys that supply the filter information are:
* **"id"** -- specifies the NumCodecs name (zstd) and the HDF5-assigned
identifier (32015); it also specifies an alternate preferred name.
* **"parameters"** -- a dictionary whose keys are the parameter names
as specified by NumCodecs, and whose values are keywords
indicating the type of the corresponding parameter.
The allowable types are "integer", "float", or an enumeration
(not described here).
* **"initialize"** -- the value is a piece of code to initialize the filter before use.
* **"finalize"** -- the value is a piece of code to shut down the filter after all use is complete.
* **"prefix"** -- arbitrary code to insert at the front of the filter wrapper; typically used to include filter-library-specific headers.
* **"suffix"** -- arbitrary code to insert at the end of the filter wrapper; typically used to include filter-library-specific utility functions.
* **"encode"** -- a function name plus the code for a user-provided function that invokes the filter's encoding/compression capability; this function has a very specific signature.
* **"decode"** -- a function name plus the code for a user-provided function that invokes the filter's decoding/decompression capability; this function also has a very specific signature.
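
HDF5 hands a filter its parameters as an array of `unsigned int` (`cd_values`), so generated code has to map each typed parameter onto that array. The sketch below assumes one 32-bit slot per "integer" or "float" parameter, in declaration order, with a float stored as its bit pattern rather than converted numerically. The types `param` and `param_kind` and the functions `pack_params` and `unpack_float` are hypothetical names, not part of the proposal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Hypothetical parameter kinds mirroring the spec's "integer" and "float". */
typedef enum { PARAM_INTEGER, PARAM_FLOAT } param_kind;

typedef struct {
    const char *name;          /* NumCodecs parameter name, e.g. "level" */
    param_kind kind;
    union { int i; float f; } value;
} param;

/* Pack parameters, in declaration order, into an HDF5-style cd_values
 * array, one slot each; stops early if max_values slots are used up.
 * Returns the number of slots filled. */
size_t pack_params(const param *params, size_t nparams,
                   unsigned int *cd_values, size_t max_values)
{
    size_t n = 0;
    for (size_t i = 0; i < nparams && n < max_values; i++) {
        if (params[i].kind == PARAM_INTEGER) {
            cd_values[n++] = (unsigned int)params[i].value.i;
        } else {
            uint32_t bits;
            /* Copy the float's bit pattern (IEEE-754 on common platforms). */
            memcpy(&bits, &params[i].value.f, sizeof bits);
            cd_values[n++] = (unsigned int)bits;
        }
    }
    return n;
}

/* Inverse, as generated filter code might use it: rebuild a float. */
float unpack_float(unsigned int slot)
{
    uint32_t bits = (uint32_t)slot;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Storing the bit pattern, rather than a rounded integer, lets the filter recover the exact float value on the other side.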

# References
[1] [HDF5 Filter Specification](https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf)<br>
[2] [Registered HDF5 Filter Plugins](https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins)<br>
[3] [User Contributed Filters](https://support.hdfgroup.org/services/contributions.html#filters)<br>
[4] [NumCodecs](https://numcodecs.readthedocs.io/en/stable/)<br>

# Appendix A. Zstandard Draft Example
````
{
    "id": {"zstd": 32015, "preferred": "zstandard"},
    "parameters": [{"level": "integer"}],
    "encode": {"name": "zstd_compress",
               "code": # The signature is standardized
        `
        int zstd_compress(size_t srclen, void* srcbuf, size_t* dstlenp, void** dstbufp, size_t cd_nelmts, const unsigned int* cd_values)
        {
            int ret = NC_NOERR;
            size_t dstlen;
            void* dstbuf = NULL;
            /* The compression level is the first (and only) parameter. */
            if(cd_nelmts < 1) {ret = NC_EFILTER; goto cleanup;}
            /* Worst-case compressed size; this call can itself fail. */
            dstlen = ZSTD_compressBound(srclen);
            if(ZSTD_isError(dstlen)) {ret = NC_EFILTER; goto cleanup;}
            /* Prepare the destination buffer. */
            if((dstbuf = malloc(dstlen))==NULL) {ret = NC_ENOMEM; goto cleanup;}
            dstlen = ZSTD_compress(dstbuf, dstlen, srcbuf, srclen, /*level*/(int)cd_values[0]);
            if(ZSTD_isError(dstlen)) {ret = NC_EFILTER; goto cleanup;}
            if(dstlenp) *dstlenp = dstlen;
            /* Ownership of the buffer passes to the caller. */
            if(dstbufp) {*dstbufp = dstbuf; dstbuf = NULL;}
        cleanup:
            free(dstbuf); /* non-NULL only on error or if the caller declined it */
            return ret;
        }`},

    "decode": {"name": "zstd_decompress",
               "code": # The signature is standardized
        `...`},
    "prefix": `...`,
    "suffix": `...`
}
````
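
The "encode" and "decode" functions never see HDF5 directly, so the generated wrapper has to adapt them to HDF5's filter callback convention (`H5Z_func_t`). That callback receives the direction in `flags`, returns the number of valid bytes now in `*buf`, and returns 0 to signal failure. The sketch below shows that adaptation under stated assumptions. It assumes the user functions return an `int` status that is 0 on success, as the `ret`/`NC_NOERR` logic in the draft suggests. It substitutes a hypothetical toy codec (`demo_encode`/`demo_decode`) for `zstd_compress`. `DEMO_FLAG_REVERSE` stands in for HDF5's `H5Z_FLAG_REVERSE` so the code compiles without HDF5 headers.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for HDF5's H5Z_FLAG_REVERSE (H5Zpublic.h). */
#define DEMO_FLAG_REVERSE 0x0100u
/* Hypothetical toy codec: encoding prepends this marker byte. */
#define DEMO_MARKER 0xABu

/* Toy encoder with the standardized signature; returns 0 on success. */
static int demo_encode(size_t srclen, void *srcbuf, size_t *dstlenp,
                       void **dstbufp, size_t cd_nelmts,
                       const unsigned int *cd_values)
{
    unsigned char *dst;
    (void)cd_nelmts; (void)cd_values;          /* no parameters needed */
    if ((dst = malloc(srclen + 1)) == NULL) return 1;
    dst[0] = (unsigned char)DEMO_MARKER;
    memcpy(dst + 1, srcbuf, srclen);
    *dstlenp = srclen + 1;
    *dstbufp = dst;
    return 0;
}

/* Toy decoder: checks and strips the marker byte. */
static int demo_decode(size_t srclen, void *srcbuf, size_t *dstlenp,
                       void **dstbufp, size_t cd_nelmts,
                       const unsigned int *cd_values)
{
    const unsigned char *src = srcbuf;
    unsigned char *dst;
    (void)cd_nelmts; (void)cd_values;
    if (srclen < 1 || src[0] != DEMO_MARKER) return 1;   /* not our data */
    if ((dst = malloc(srclen)) == NULL) return 1;         /* srclen >= 1 */
    memcpy(dst, src + 1, srclen - 1);
    *dstlenp = srclen - 1;
    *dstbufp = dst;
    return 0;
}

/* Same parameter list and return type as HDF5's H5Z_func_t: returns the
 * number of valid bytes now in *buf, or 0 on failure (*buf untouched). */
static size_t demo_filter(unsigned int flags, size_t cd_nelmts,
                          const unsigned int cd_values[], size_t nbytes,
                          size_t *buf_size, void **buf)
{
    size_t dstlen = 0;
    void *dstbuf = NULL;
    int err;

    if (flags & DEMO_FLAG_REVERSE)      /* reading: undo the filter */
        err = demo_decode(nbytes, *buf, &dstlen, &dstbuf, cd_nelmts, cd_values);
    else                                /* writing: apply the filter */
        err = demo_encode(nbytes, *buf, &dstlen, &dstbuf, cd_nelmts, cd_values);
    if (err != 0) {
        free(dstbuf);                   /* NULL unless the codec left a buffer */
        return 0;
    }
    free(*buf);                         /* replace the input buffer */
    *buf = dstbuf;
    *buf_size = dstlen;                 /* at least dstlen bytes are allocated */
    return dstlen;
}
```

In a real generated wrapper, the calls would go to the functions named in the "encode" and "decode" entries (for example `zstd_compress`). The buffers would more likely be managed with HDF5's `H5allocate_memory`/`H5free_memory` than with plain `malloc`/`free`.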

