High Performance Netcdf-4 Proposal

This document outlines a proposal to create an alternate Netcdf-4 file format targeted at high-performance, READ-ONLY access. For the purposes of this document, this format will be called NCX.

Limitations of the Existing Netcdf-4 format

It is currently the case that the Netcdf-4 file format uses the existing HDF5 file format to store its data. From a high-performance point of view, the HDF5 format is limited in a number of ways.

  1. It does not support multi-threaded access; currently all API calls must be serialized using a single global lock.
  2. MPIO support is provided, but is totally embedded in the HDF5 library. There is no ability for user control and optimization.
  3. The HDF5 file format is completely fixed and opaque and there is limited support for performance-specific organizations. The two exceptions are:
    • Chunking parameterization is allowed to control how data is co-located.
    • Compression (on a per-chunk basis) allows data to be compressed thus supporting faster reads.

Rationale for a New NCX Format

What is being proposed is a new format for read-only access to "Netcdf4-like" files that provides the following capabilities.

  1. A simple-as-possible file format with a specification independent of any implementation.
  2. Keeping the existing Netcdf-4 data-model.
  3. Some ability to re-arrange the data in the file to support specific access patterns. This would include keeping the HDF5 chunking and compression concepts.
  4. Support for community-developed tools that can re-organize the data within the file.

In addition, NCX is intended to be sufficiently simple that multiple, independent implementations can be constructed in a variety of programming languages. This is in contrast to the situation with HDF5, where the file format is so complex that only one complete implementation exists: the one provided by the HDF Group.

A Draft File Format

The NCX format proposed in this section is preliminary. Alternative proposals are encouraged.

The basic format builds on the concept of a single-file file system format (aka SFFS).

The basic idea is that a single file is organized to contain a file system, including a root plus inodes plus data blocks, all within a single file that is treated as if it were a heap.

The SFFS approach has a number of useful properties.

Simplicity: The basic SFFS layout is relatively simple. As with an on-disk file system, it uses a superblock plus a set of inodes each of which points to a tree of data blocks. Such an organization avoids the complexity of e.g. the HDF5 b-trees while providing a very general data layout.
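As a concrete illustration of this layout, the sketch below packs a superblock for an in-memory file image and reads it back. The magic number, field widths, and record layouts here are invented purely for illustration and are not part of any proposed specification.

```python
# Hypothetical SFFS on-disk records: a superblock followed by an inode
# table, each inode pointing at data blocks. All names and sizes are
# illustrative, not a proposed spec.
import struct

SUPERBLOCK = struct.Struct("<4sIQQ")  # magic, version, inode-table offset, inode count
INODE = struct.Struct("<QQQ")         # virtual-file size, first-block offset, block count

def write_superblock(image, inode_offset, inode_count):
    """Pack a superblock at the start of an in-memory file image."""
    image[:SUPERBLOCK.size] = SUPERBLOCK.pack(b"NCX\x00", 1, inode_offset, inode_count)

image = bytearray(4096)
write_superblock(image, inode_offset=64, inode_count=2)
magic, version, ioff, icount = SUPERBLOCK.unpack_from(image, 0)
```

Reading the file then amounts to locating the inode table via the superblock and walking each inode's block list, much as a conventional file-system driver would.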

Dynamicity: As with a normal file system, a file in the SFFS can be dynamically extended (or shortened) at its end.

Annotation: Since the SFFS simulates a file system, it is possible to add files that describe other files already in the SFFS. In effect, one can create a file that provides "annotations" about other files in the SFFS; these annotations can include, for example, indices pointing into an existing file.

Capability for Reorganization: As long as the basic inode structure is maintained, it is possible to move chunks of data around to support better IO performance. One could even redivide the existing data into larger or smaller data chunks.

Mapping Netcdf-4 to an SFFS

Meta-data: The metadata for the netcdf-4 file can itself be contained in a single, virtual file in the SFFS.

Primitive-Typed Variables: Consider a variable consisting of primitive types of fixed size: signed or unsigned ints of various sizes, enums, or chars. Assume the dimensions are all fixed size (not unlimited).

Such a variable can easily be laid out in a contiguous format, possibly using HDF5-style chunking and compression.
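For the fixed-size case, the contiguous layout reduces to simple offset arithmetic. A minimal sketch, assuming row-major order and 4-byte ints (the variable name and sizes are illustrative only):

```python
# Byte offset of element (i, j) of a hypothetical variable int v(d1, d2)
# stored contiguously in row-major order. Illustrative only.
def element_offset(i, j, d2, elem_size=4):
    return (i * d2 + j) * elem_size

# For v(3, 4): element (2, 1) begins at byte (2*4 + 1) * 4 = 36
offset = element_offset(2, 1, d2=4)
```

With chunking, the same arithmetic is applied twice: once to locate the chunk, and once to locate the element within the (possibly decompressed) chunk.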

Unlimited Dimensions Case 1 (Initial Unlimited): Extending the previous case, a primitive-typed variable might have one or more unlimited dimensions. For the case of a single, initial unlimited dimension, the variable can be stored exactly as if it had no unlimited dimensions. This is because it is possible to dynamically extend a file to accommodate changes in the size of the unlimited dimension.

Unlimited Dimensions Case 2 (Multiple Unlimited): Consider the following.

    dimensions: d1=..., d2=..., d3=..., du=UNLIMITED;
    variables: int v(d1,d2,du,d3);

For this case, we have a number of options. One option (assuming read-only as we are) is to start the file containing v with n intra-file offsets pointing to the subparts of the variable defined by the unlimited dimension. That is, for this example, we have an initial index of d1 x d2 offsets, where each offset points to the start of each of the subarrays of size du x d3. This case generalizes to multiple unlimited dimensions in the obvious(?) way.

Note how this differs from the netcdf-3 case where all variables with an unlimited dimension are co-mingled. However also note that we could re-organize this in a variety of ways to support parallel IO for specific access patterns.

String Typed Variables: This case is fairly easy: each string can be stored with a preceding count, the strings being laid out linearly with some form of index pointing to the offset of each string.

Even simpler, and again possible because the format is read-only, is to store every string in a slot the size of the largest string. This produces internal fragmentation, but allows us to treat strings as fixed-size objects.
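Both string layouts can be sketched briefly. This is an illustrative sketch under assumed encodings (4-byte counts, NUL padding), not a proposed wire format:

```python
# Two illustrative string layouts: (a) count-prefixed strings packed
# linearly with an offset index, and (b) fixed-size slots padded to the
# maximum string length (internal fragmentation, fixed-size access).
import struct

def pack_counted(strings):
    out, offsets = bytearray(), []
    for s in strings:
        offsets.append(len(out))
        data = s.encode("utf-8")
        out += struct.pack("<I", len(data)) + data   # 4-byte count prefix
    return bytes(out), offsets

def pack_fixed(strings):
    width = max(len(s.encode("utf-8")) for s in strings)
    return b"".join(s.encode("utf-8").ljust(width, b"\x00") for s in strings), width

data, offs = pack_counted(["sea", "surface", "temp"])
fixed, width = pack_fixed(["sea", "surface", "temp"])
```

In the fixed-size layout, string k simply lives at byte k * width, so no index is needed at all.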

Opaque Typed Variables: This is essentially the same situation as strings.

VLEN Typed Variables: One approach is to treat each vlen object as a separate file of its own length. Another approach is to use the String approach because we know the maximum size of all the vlens.

Compound Typed Variables: Again we have some options: we could store each compound object in field order (as with a C struct), with each field following the next.

Alternately, we could store them in the equivalent of "column order", where all instances of the first field (assuming an array of compounds) are stored one after another, then all instances of the second field, and so on.
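The two compound layouts can be contrasted directly. A minimal sketch, assuming a hypothetical compound type {int i; double x;} with unaligned little-endian packing:

```python
# Field order ("row order", like an array of C structs) vs column order
# (all first fields, then all second fields). Types are illustrative.
import struct

records = [(1, 2.5), (3, 4.5)]   # array of compound {int i; double x;}

# Row order: each record's fields are adjacent.
row_order = b"".join(struct.pack("<id", i, x) for i, x in records)

# Column order: all ints first, then all doubles.
col_order = (struct.pack("<%di" % len(records), *(r[0] for r in records)) +
             struct.pack("<%dd" % len(records), *(r[1] for r in records)))
```

Row order favors reading whole records in one access; column order favors scanning a single field across many records.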

Misc. Notes

Why Read-Only?

The "process" implied here is as follows.

  1. The data file is created using the existing read-write model of the netcdf-c library.
  2. A special program (e.g. nccopy) is used to take the original file as a whole and convert it to the NCX format.

The point is that when the NCX file is created, the whole of the dataset is available. This means, for example, that specialized layouts of variable-length data (strings, vlens, unlimited dimensions) can be achieved because the totality of the data is known. If an attempt were made to write the original dataset piecemeal using the NCX format, the whole of the dataset would not be available, so certain kinds of layout optimizations would not be possible.

Use of Docker

I considered using Docker (esp. docker commit) as an alternative. This has the advantage that one could even include programs in the 'file'. However, security considerations made this approach untenable until Docker sand-boxing is completely reliable and trusted.

Comments:

My initial thoughts are that the benefits/effort ratio for a read-only format for the netCDF-4 data model may not be large enough to justify the work, though I've never worked with single-file virtual filesystems. They might make development easier.

If only a small subset of users would need a read-only representation of netCDF data, the benefits for those users would have to be very high to justify the work.

The read-only constraint would certainly permit a simpler file format, as nothing need be dynamically sized or variable length, since the size of everything can be fixed when the file is created.

John Caron developed a Java implementation of an experimental streaming format, ncstream, that was append-only, to optimize performance of remote access to subsets of data. It was read-only from the client side, so experience with that might have some relevance.

As far as MPIO and optimization are concerned, I tend to cede leadership in the high-performance computing space to HDF5, because they have the resources, expertise, HPC users, and access to a variety of HPC platforms needed to develop and support HPC software. The fact that netCDF-4 inherits improvements in HDF5 performance for free is a compelling feature of the current layering. And with all the netCDF-4 data out there that's already represented using HDF5 format, adding maintenance and support for a new format won't necessarily mean savings.

Anyway, that's a few thoughts off the top of my head, but maybe not much insight.

Posted by Russ Rew on July 08, 2016 at 08:17 PM MDT #

Interesting post, for the last section, could you tell me why you want to store programs in the file? This sounds crazy, but certainly you have some rationale in it.

Posted by jialin liu on July 11, 2016 at 10:50 AM MDT #

The reason that including code might be of interest is to allow built-in support for e.g. specialized algorithms such as compression.

Posted by dmh on July 12, 2016 at 11:50 AM MDT #

I think the idea of another binary format in the HPC space has merit.

Although netCDF-4/HDF5 offers some great parallel I/O features, it does fall short of what some users require.

Consider that these are extremely expensive systems, running extremely valuable software. Bad I/O performance can be a significant waste of a very expensive resource. So a great deal of programmer effort may be expended on I/O performance.

This leads to multiple teams writing I/O systems that attempt to beat netCDF-4/HDF5 performance (which really is hard to beat, when it is being used correctly).

The result is a lot of extra effort by scientific programmers. This is the very sort of problem netCDF should solve.

I think Dennis' idea has merit if a core objective is to support the high performance needs of climate models.

Some of the capabilities of the parallel IO library (PIO from NCAR) may also be of interest.

Some more detailed observations:

* I think read-only is too strong a term (as well as being a bit confusing - how do I write a read-only file?) Maybe immutable is the correct term.

* I think you could achieve this by disabling re-entering of define mode for a file, and adding a few additional API calls that allow the specification of max string/varray; some restriction on multiple unlimited dimensions may also be needed.

* Struct values should be written in row order, because that is how they are stored in memory, and storing one element of a struct can be done with one disk access instead of one per column.

There is a meeting coming up in Boulder in early September. At this meeting will be some of the engineers who have decided to write their own I/O system, and convert the files to the mandatory netCDF-4 format as a final step. Perhaps we should all get together and ask them what features and performance improvements impel them to this decision.

Posted by Edward Hartnett on August 02, 2016 at 01:55 PM MDT #


Unidata Developer's Blog
A weblog about software development by Unidata developers*