[netcdfgroup] Proposal for Generated Filter Code

# Proposal for Generated Filter Code

I am starting work on a new netcdf utility that takes a
simplified filter specification and uses it to generate a
complete HDF5 filter wrapper [2] plus an NCZarr filter wrapper.

The raison d'etre is that the process of building an HDF5 filter
wrapper [2,3] from scratch is complex, time-consuming, and
error-prone. Using a code generator is likely to simplify this
process. At the very least, it will produce base code that a
filter builder can modify to build the desired wrapper.

This program is analogous to, say, the yacc parser generator,
which converts an annotated BNF grammar into a full-blown parser.

**What I need:** I have a simple prototype working, but I need
some community input on this idea. Would anyone use it? Is the
proposed specification (Appendix A) reasonably simple to construct?

If you want to participate, use this
[GitHub discussion](https://github.com/Unidata/netcdf-c/discussions/2288).

# Specification Overview

The filter specification is written in JSON, although it is
highly stylized. It was derived from the NumCodecs [4] format
but with significant extensions to support the Netcdf-4/HDF5
wrapper format.

A couple of visible extensions with respect to JSON are:

1. Single-line comments are supported, beginning with '#'.
2. An alternate string delimiter is provided using the '`'
character; it was chosen because occurrences of that delimiter in C
code are very uncommon.
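
The two extensions above can be reduced to standard JSON by a small preprocessing pass. The following is a minimal sketch, not the prototype's actual code: the function name `spec_to_json` is hypothetical, and it assumes that '#' starts a comment only outside of strings and that a backtick string has no escape mechanism for an embedded backtick.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the extended spec syntax to standard JSON.
 * Returns a malloc'd string that the caller must free, or NULL on
 * allocation failure or an unterminated string. */
char *spec_to_json(const char *src)
{
    size_t n = strlen(src);
    /* Worst case: every input byte becomes a 6-byte \u00XX escape. */
    char *out = malloc(6 * n + 1);
    char *q = out;
    const char *p = src;

    if (out == NULL) return NULL;
    while (*p != '\0') {
        if (*p == '#') {
            /* Comment: skip to end of line, keeping the newline itself. */
            while (*p != '\0' && *p != '\n') p++;
        } else if (*p == '"') {
            /* Ordinary JSON string: copy verbatim, including escape pairs. */
            *q++ = *p++;
            while (*p != '\0' && *p != '"') {
                if (*p == '\\' && p[1] != '\0') *q++ = *p++;
                *q++ = *p++;
            }
            if (*p != '"') { free(out); return NULL; }
            *q++ = *p++;
        } else if (*p == '`') {
            /* Backtick string: re-quote it and escape what JSON requires. */
            p++;
            *q++ = '"';
            while (*p != '\0' && *p != '`') {
                unsigned char c = (unsigned char)*p++;
                if (c == '"' || c == '\\') { *q++ = '\\'; *q++ = (char)c; }
                else if (c == '\n') { *q++ = '\\'; *q++ = 'n'; }
                else if (c == '\t') { *q++ = '\\'; *q++ = 't'; }
                else if (c < 0x20) { q += sprintf(q, "\\u%04x", c); }
                else *q++ = (char)c;
            }
            if (*p != '`') { free(out); return NULL; }
            p++;
            *q++ = '"';
        } else {
            *q++ = *p++;
        }
    }
    *q = '\0';
    return out;
}
```

The result can then be handed to any standard JSON parser, which keeps the generator's own parsing logic small.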

The basic specification is a JSON dictionary with very specific
keys that are used to control code generation.

A draft specification for the Zstandard filter wrapper is shown in Appendix A. The dictionary keys provide the following filter information:
* **"id"** -- specifies the NumCodecs name (Zstd) and the HDF5 assigned
identifier (32015); it also specifies an alternate preferred name.
* **"parameters"** -- a dictionary whose keys are the parameter names
as specified by NumCodecs and whose values are keywords
indicating the type of the corresponding parameter.
The allowable types are "integer", "float", or an enumeration
(not described here).
* **"initialize"** -- the value is a piece of code to initialize the filter before use.
* **"finalize"** -- the value is a piece of code to shut down the filter after all use is complete.
* **"prefix"** -- arbitrary code to insert at the front of the filter wrapper;
typically used to include filter-library-specific headers.
* **"suffix"** -- arbitrary code to insert at the end of the filter wrapper;
typically used to include filter-library-specific utility functions.
* **"encode"** -- a function name plus the code for a user-provided function
that invokes the filter's encoding/compression capability; this has a very specific signature.
* **"decode"** -- a function name plus the code for a user-provided function
that invokes the filter's decoding/decompression capability; this has a very specific signature.
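
The proposal does not say how the declared parameter types map onto the `unsigned int` `cd_values` array that the encode function in Appendix A receives. One plausible convention, sketched here purely as an assumption, stores an "integer" parameter in one slot and copies the bits of a "float" parameter into one slot. All four helper names are hypothetical, and the sketch assumes `int`, `float`, and `unsigned int` are all 32 bits wide.

```c
#include <string.h>

/* Hypothetical helpers: pack typed filter parameters into cd_values slots.
 * memcpy is used so the bit pattern is preserved without relying on
 * implementation-defined integer conversions or pointer aliasing. */

unsigned int param_from_int(int v)
{
    unsigned int cd;
    memcpy(&cd, &v, sizeof cd);
    return cd;
}

int param_to_int(unsigned int cd)
{
    int v;
    memcpy(&v, &cd, sizeof v);
    return v;
}

unsigned int param_from_float(float v)
{
    unsigned int cd;
    memcpy(&cd, &v, sizeof cd);
    return cd;
}

float param_to_float(unsigned int cd)
{
    float v;
    memcpy(&v, &cd, sizeof v);
    return v;
}
```

Under this convention the Zstandard "level" parameter would occupy `cd_values[0]`, which matches how the draft encode function reads it.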

# References
[1] [HDF5 Filter Specification](https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf)<br>
[2] [Registered HDF5 Filter Plugins](https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins)<br>
[3] [User Contributed Filters](https://support.hdfgroup.org/services/contributions.html#filters)<br>
[4] [NumCodecs](https://numcodecs.readthedocs.io/en/stable/)<br>

# Appendix A. Zstandard Draft Example
````
{
    "id": {"zstd": 32015, "preferred": "zstandard"},
    "parameters": [{"level": "integer"}],
    "encode": ["name": "zstd_compress",
               "code": # The signature is standardized
        `
        size_t zstd_compress(size_t srclen, void* srcbuf, size_t* dstlenp, void** dstbufp, size_t cd_nelmts, const unsigned int* cd_values)
        {
            int ret = NC_NOERR;
            size_t dstlen;
            void* dstbuf;
            dstlen = (size_t)ZSTD_compressBound(srclen);
            if(ZSTD_isError(dstlen)) {ret = NC_EFILTER; goto cleanup;}
            /* Prepare the destination buffer. */
            if((dstbuf = malloc(dstlen))==NULL) {ret = NC_ENOMEM; goto cleanup;}
            dstlen = ZSTD_compress(dstbuf, dstlen, srcbuf, srclen, /*level*/cd_values[0]);
            if(ZSTD_isError(dstlen)) {ret = NC_EFILTER; goto cleanup;}
            if(dstlenp) *dstlenp = dstlen;
            if(dstbufp) *dstbufp = dstbuf;
        cleanup:
            return ret;
        }`],

    "decode": ["name": "zstd_decompress",
               "code": # The signature is standardized
        `...`],
    "prefix": `...`,
    "suffix": `...`
}
````

