Mapping between CDM and netCDF-4 Data Models
last modified: August 13, 2012
The CDM data model is close to, but not identical to, the netCDF-4 data model. However, there is a complete two-way mapping between the two models.
DataTypes
From netCDF-4 to CDM
- A netCDF-4 Compound is a CDM Structure. Both can be arbitrarily nested.
- A netCDF-4 Enum is a CDM enum1, enum2, or enum4, and references an EnumTypedef which holds the (enum, String) map.
- A netCDF-4 Vlen is mapped to a CDM variable length Dimension.
- A netCDF-4 Opaque type is a CDM opaque type, but the length of the data cannot be found until you read the data.
- NetCDF-4 signed and unsigned byte, short, int, long are mapped to CDM byte, short, int, long. If unsigned, the attribute _Unsigned = "true" is added to the CDM Variable.
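A minimal sketch of the integer rule, assuming a hypothetical `Nc4Type` enum as a stand-in for the netCDF-4 atomic integer types (it is not a netCDF-Java class). The CDM type keeps the storage width, and the signedness survives only as the `_Unsigned` attribute, so a reader that honours it has to widen the raw bits itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the netCDF-4 atomic integer types; not a netCDF-Java class.
enum Nc4Type {
    BYTE("byte", false), UBYTE("byte", true),
    SHORT("short", false), USHORT("short", true),
    INT("int", false), UINT("int", true),
    INT64("long", false), UINT64("long", true);

    final String cdmType;   // CDM type with the same storage width
    final boolean unsigned; // true if the netCDF-4 type is unsigned

    Nc4Type(String cdmType, boolean unsigned) {
        this.cdmType = cdmType;
        this.unsigned = unsigned;
    }
}

class TypeMappingSketch {
    // Attributes the CDM Variable would carry for a variable of this netCDF-4 type.
    static Map<String, String> cdmAttributes(Nc4Type type) {
        Map<String, String> atts = new LinkedHashMap<>();
        if (type.unsigned) {
            atts.put("_Unsigned", "true"); // signedness is kept only as an attribute
        }
        return atts;
    }

    // A reader honouring _Unsigned reinterprets the raw bits, e.g. byte -1 means 255.
    static int asUnsigned(byte raw) {
        return raw & 0xFF;
    }

    public static void main(String[] args) {
        for (Nc4Type t : Nc4Type.values()) {
            System.out.println(t + " -> " + t.cdmType + " " + cdmAttributes(t));
        }
    }
}
```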
From CDM to netCDF-4
- A CDM array of Opaque may have a different length for each Opaque object, but a netCDF-4 opaque type has a single fixed size. The writer may have to read the data to find the maximum length, or ???
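One possible approach, sketched under the assumption that the opaque objects are already in memory as `byte[]` values: scan once for the maximum length, use it as the fixed size of the netCDF-4 opaque type, and zero-pad the shorter objects. The padding policy is an assumption, and the original length of each object is lost unless it is stored elsewhere.

```java
import java.util.Arrays;
import java.util.List;

class OpaqueSizing {
    // Largest object length; a candidate for the fixed size of the netCDF-4 opaque type.
    static int maxLength(List<byte[]> objects) {
        int max = 0;
        for (byte[] obj : objects) {
            max = Math.max(max, obj.length);
        }
        return max;
    }

    // Copies each object into a buffer of the given size, zero-padding objects shorter than it.
    static byte[][] padToFixedSize(List<byte[]> objects, int size) {
        byte[][] out = new byte[objects.size()][];
        for (int i = 0; i < objects.size(); i++) {
            out[i] = Arrays.copyOf(objects.get(i), size);
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> data = List.of(new byte[] {1, 2}, new byte[] {3, 4, 5, 6}, new byte[] {7});
        int size = maxLength(data);
        System.out.println(size + " " + Arrays.deepToString(padToFixedSize(data, size)));
    }
}
```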
Type Definitions
From netCDF-4 to CDM
- A netCDF-4 Enumeration Type becomes a CDM EnumTypedef.
- All other netCDF-4 type definitions are repeated for each CDM variable that uses them. The attribute _Typedef = "typename" is added to the CDM Variable, where typename is the name of the netCDF-4 type.
From CDM to netCDF-4
- A CDM EnumTypedef becomes a netCDF-4 Enumeration Type.
- If a CDM Variable has an attribute _Typedef = "typename", then the Variable's type definition is made into a netCDF-4 type named typename.
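Because a type definition is repeated for every CDM Variable that uses it, several Variables may carry the same `_Typedef` name when writing. A minimal sketch of grouping them so that each netCDF-4 type is defined once, assuming each Variable is reduced to its name and a map of String attributes (a hypothetical simplification of a CDM Variable):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TypedefGrouping {
    // Maps each _Typedef name to the variables that use it, in the order they were given.
    static Map<String, List<String>> groupByTypedef(Map<String, Map<String, String>> varAttributes) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> var : varAttributes.entrySet()) {
            String typename = var.getValue().get("_Typedef");
            if (typename != null) {
                groups.computeIfAbsent(typename, k -> new ArrayList<>()).add(var.getKey());
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> vars = new LinkedHashMap<>();
        vars.put("obs1", Map.of("_Typedef", "obs_t"));
        vars.put("obs2", Map.of("_Typedef", "obs_t"));
        vars.put("time", Map.of("units", "s"));
        System.out.println(groupByTypedef(vars)); // {obs_t=[obs1, obs2]}
    }
}
```

The sketch does not check that Variables sharing a `_Typedef` name really have identical definitions; a writer would probably need to.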
Attributes
In the CDM, an attribute value may only be a scalar or a 1D array of signed byte, short, int, long, float, double, or String. A char type is mapped to a String.
From netCDF-4 to CDM
- An attribute of compound type in netCDF-4 is flattened, by making each field a separate attribute, with name attName.fieldName in the CDM.
- If the compound attribute is for a compound variable, and the field name of the attribute matches a field name of the variable, the attribute is added to that field instead of being flattened.
- An attribute of enum type in netCDF-4 becomes a String type in the CDM. ???
- An attribute of opaque type in netCDF-4 becomes a byte type in the CDM.
- An attribute of type vlen of T in netCDF-4 becomes an array of T in the CDM.
- An attribute of unsigned byte, short, or int type in netCDF-4 is promoted to a signed short, int, or long, respectively, in the CDM.
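Two of these rules are simple enough to sketch directly. The code assumes a compound attribute value is held as an ordered map from field name to value (a hypothetical representation, not the netCDF-Java `Attribute` class). It flattens that map into `attName.fieldName` attributes and shows the widening used to promote unsigned integers into the next larger signed type. The special case for compound variables, where a matching field's attribute is attached to that field instead, is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AttributeMappingSketch {
    // Flattens one compound attribute into one CDM attribute per field.
    static Map<String, Object> flatten(String attName, Map<String, Object> fields) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            flat.put(attName + "." + field.getKey(), field.getValue());
        }
        return flat;
    }

    // Unsigned promotion: read the raw bits as unsigned, widen to the next signed type.
    static short promoteUByte(byte raw) { return (short) (raw & 0xFF); }
    static int promoteUShort(short raw) { return raw & 0xFFFF; }
    static long promoteUInt(int raw) { return raw & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("x", 1.5);
        fields.put("label", "north");
        System.out.println(flatten("origin", fields)); // {origin.x=1.5, origin.label=north}
        System.out.println(promoteUByte((byte) 0xFF) + " " + promoteUShort((short) 0xFFFF)
                + " " + promoteUInt(0xFFFFFFFF)); // 255 65535 4294967295
    }
}
```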
From CDM to netCDF-4
- Attributes on member variables of Structures are made into a compound attribute on the parent Structure.
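In this direction, same-named attributes on the members of a Structure are gathered into one compound value whose fields are named after the members. A minimal sketch, again using plain maps in place of CDM Variables and Attributes (a hypothetical simplification). How to handle a member that lacks the attribute is left open here; the sketch simply skips it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CompoundAttributeSketch {
    // memberAtts: member name -> (attribute name -> value), in member order.
    // Returns attribute name -> (member name -> value): one compound value per attribute name.
    static Map<String, Map<String, Object>> gather(Map<String, Map<String, Object>> memberAtts) {
        Map<String, Map<String, Object>> compound = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> member : memberAtts.entrySet()) {
            for (Map.Entry<String, Object> att : member.getValue().entrySet()) {
                compound.computeIfAbsent(att.getKey(), k -> new LinkedHashMap<>())
                        .put(member.getKey(), att.getValue());
            }
        }
        return compound;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> members = new LinkedHashMap<>();
        members.put("lat", Map.of("units", "degrees_north"));
        members.put("lon", Map.of("units", "degrees_east"));
        System.out.println(gather(members)); // {units={lat=degrees_north, lon=degrees_east}}
    }
}
```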
Differences between netCDF-4 C and Java libraries
Fixed length Strings with anonymous dimension
- HDF5 object: type = 3 (String) with a dimension.
- C library: turns these into variable length Strings
- Java library: turns these into char arrays, with an anonymous dimension
Enum Typedefs
- If an enum typedef is not used by any variable, it will not show up in the enum typedefs. (bug?)
Attributes
- When a variable is chunked, an integer array attribute named _ChunkSize is added to the variable, holding the chunk size for each dimension.
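To make the `_ChunkSize` values concrete: with one chunk size per dimension, the number of chunks along a dimension is the dimension length divided by the chunk size, rounded up. The helper is hypothetical and only does that arithmetic; it does not read the attribute from a file.

```java
import java.util.Arrays;

class ChunkArithmetic {
    // Number of chunks along each dimension, given the variable shape and the _ChunkSize values.
    static int[] chunksPerDimension(int[] shape, int[] chunkSize) {
        if (shape.length != chunkSize.length) {
            throw new IllegalArgumentException("need one chunk size per dimension");
        }
        int[] counts = new int[shape.length];
        for (int i = 0; i < shape.length; i++) {
            counts[i] = (shape[i] + chunkSize[i] - 1) / chunkSize[i]; // ceiling division
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = chunksPerDimension(new int[] {100, 360, 720}, new int[] {1, 90, 100});
        System.out.println(Arrays.toString(counts)); // [100, 4, 8]
    }
}
```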
Notes
1) Char arrays are interpreted as UTF-8 byte arrays (Strings) when they are attributes, but data arrays are not: they are run through unsignedToShort() and cast to char. This seems like trouble, since the two paths decode non-ASCII characters differently.
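A small demonstration of the mismatch. Here `unsignedToShort` is reimplemented as a hypothetical stand-in for the library method of that name, assuming it reinterprets a byte as a value in 0..255. The two UTF-8 bytes of "é" decode to one character on the attribute path but to two characters on the data path, which amounts to ISO-8859-1 decoding.

```java
import java.nio.charset.StandardCharsets;

class CharDecodingDemo {
    // Hypothetical stand-in for the library's unsignedToShort(): reinterpret a byte as 0..255.
    static short unsignedToShort(byte b) {
        return (short) (b & 0xFF);
    }

    // Attribute path: the bytes are decoded as UTF-8.
    static String asAttribute(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Data path: each byte becomes one char.
    static String asData(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) unsignedToShort(bytes[i]);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9
        System.out.println(asAttribute(bytes).length() + " vs " + asData(bytes).length()); // 1 vs 2
    }
}
```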
2) netCDF-4 allows arbitrary composition of vlens. The CDM tries to map these to a variable-length dimension, to get a ragged array, rather than making the vlen part of the data type. But Arrays are rectangular, so it's a difficult fit.
We could define an ArrayRagged which maps to C multidimensional arrays.
It's natural to map
int data(x,y,*) -> int(*) data(x,y)
but it doesn't generalize well to nested vlens. The netCDF-4 solution is to declare each type separately and chain them:
int(*) type1;
type1(*) type2;
type2 data(x,y);
Array.isVariableLength(): an IOSP might return an ArrayInt from int data(*), but it needs to return an ArrayObject for int data(3,*), with Array.isVariableLength() true.
- int(*) returns ArrayInt.
- int(3,*) returns ArrayObject(3) with ArrayInt(*) inside.
- int(*,3) returns Array(n,3), whatever n happens to be.
- int(3,*,*) returns ArrayObject(3) with ArrayObject(*) inside with ArrayInt(*) inside.
- int(*,3,*) returns ArrayObject(n) with ArrayObject(3) inside with ArrayInt(*) inside.
- int(*,*,3) returns ArrayObject(n) with ArrayInt(*,3) inside, or ArrayObject(n) with ArrayObject(*) inside with ArrayInt(3) inside.
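Plain Java arrays can stand in for these shapes and may help keep the cases straight. In this sketch an `Object[]` plays the role of ArrayObject and an `int[]` the role of ArrayInt; it is only an analogy, not the ucar.ma2 API. It builds an int(3,*) value and an int(*,3,*) value (with n = 2) and counts the ints at the leaves.

```java
class RaggedShapes {
    // int(3,*): three rows, each an int[] of its own length (ArrayObject(3) of ArrayInt(*)).
    static Object[] int3Vlen() {
        return new Object[] {new int[] {1}, new int[] {2, 3}, new int[] {4, 5, 6}};
    }

    // int(*,3,*): outer length n = 2, each element holds three ragged int[] rows.
    static Object[] intVlen3Vlen() {
        return new Object[] {
            new Object[] {new int[] {}, new int[] {1}, new int[] {2, 3}},
            new Object[] {new int[] {4}, new int[] {5}, new int[] {6}}
        };
    }

    // Counts the ints at the leaves, recursing through the Object[] levels.
    static int countInts(Object node) {
        if (node instanceof int[]) {
            return ((int[]) node).length;
        }
        int total = 0;
        for (Object child : (Object[]) node) {
            total += countInts(child);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countInts(int3Vlen()) + " " + countInts(intVlen3Vlen())); // 6 6
    }
}
```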
struct {
int i1;
float vf(*);
} s(3);
is like float(3,*): an ArrayObject(3) with an ArrayFloat(*) in each element, inside the ArrayStructure.
This is getting out of control.
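The same analogy extends to the Structure case. In this sketch a hypothetical Java record (Java 16+) stands in for one element of s(3), not for the ArrayStructure API: each element has a fixed `i1` and a `vf` array of its own length, so the `vf` values across the three elements form the ragged float(3,*) described above.

```java
import java.util.Arrays;

class StructWithVlen {
    // One element of s(3): a fixed int member and a variable-length float member.
    record S(int i1, float[] vf) {}

    static S[] example() {
        return new S[] {
            new S(1, new float[] {0.5f}),
            new S(2, new float[] {}),
            new S(3, new float[] {1f, 2f, 3f})
        };
    }

    public static void main(String[] args) {
        for (S s : example()) {
            System.out.println(s.i1() + " " + Arrays.toString(s.vf()));
        }
    }
}
```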
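3) Attributes: in netCDF-4 they can be of user-defined types; in the CDM they are a 1D array of a primitive type or String. For example, the first listing below is the ncdump output for a file with an enum attribute, and the second is the same file as seen through the CDM: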
netcdf tst_enums {
types:
ubyte enum Bradys {Mike = 8, Carol = 7, Greg = 6, Marsha = 5, Peter = 4, Jan = 3, Bobby = 2, Whats-her-face = 1, Alice = 0} ;
// global attributes:
Bradys :brady_attribute = Alice, Peter, Mike ;
}
netcdf R:/testdata/netcdf4/nc4/tst_enums.nc {
types:
enum Bradys { 'Alice' = 0, 'Whats-her-face' = 1, 'Bobby' = 2, 'Jan' = 3, 'Peter' = 4, 'Marsha' = 5, 'Greg' = 6, 'Carol' = 7, 'Mike' = 8};
:brady_attribute = "Alice", "Peter", "Mike";
}
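In this example the attribute stores the ubyte values 0, 4 and 8, and the CDM shows the corresponding names. A minimal sketch of that lookup, assuming the EnumTypedef's (value, name) map is available as a plain `Map<Integer, String>` (a hypothetical simplification, not the netCDF-Java EnumTypedef class):

```java
import java.util.Map;

class EnumAttributeSketch {
    // The Bradys enum from the example, as a value -> name map.
    static final Map<Integer, String> BRADYS = Map.of(
            0, "Alice", 1, "Whats-her-face", 2, "Bobby", 3, "Jan", 4, "Peter",
            5, "Marsha", 6, "Greg", 7, "Carol", 8, "Mike");

    // Converts stored enum values into the Strings shown by the CDM.
    static String[] toStrings(int[] values, Map<Integer, String> enumMap) {
        String[] names = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            String name = enumMap.get(values[i]);
            if (name == null) {
                throw new IllegalArgumentException("no enum name for value " + values[i]);
            }
            names[i] = name;
        }
        return names;
    }

    public static void main(String[] args) {
        // brady_attribute = Alice, Peter, Mike is stored as the ubyte values 0, 4, 8.
        String[] names = toStrings(new int[] {0, 4, 8}, BRADYS);
        System.out.println(String.join(", ", names)); // Alice, Peter, Mike
    }
}
```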