NetCDF Streaming Format (ncstream) is a write-optimized encoding of CDM datasets. Ncstream consists of a series of header and data messages, in any order. Writes are always appended. Later messages override earlier ones whenever they overlap or conflict. To add or modify structural metadata, simply append a new header message. Each data message identifies the variable and the section (rectangular subset) of data it contains. A variable's data thus consists of the collection of data messages for it, if any.
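The "later messages override earlier ones" rule can be illustrated with a minimal Python sketch. It assumes a one-dimensional variable and represents each data message as a hypothetical `(start, values)` pair standing in for the section and its data; neither `assemble_variable` nor this tuple layout comes from the ncstream code.

```python
def assemble_variable(length, messages, fill=None):
    """Rebuild a 1-D variable from data messages applied in stream order.

    Each message is a hypothetical (start, values) pair: the section's
    starting index and the values it carries. Later messages overwrite
    earlier ones wherever their sections overlap.
    """
    values = [fill] * length
    for start, chunk in messages:
        if start < 0 or start + len(chunk) > length:
            raise ValueError("section lies outside the variable's shape")
        values[start:start + len(chunk)] = chunk
    return values


# Two appended messages; the second overlaps the first at index 2.
stream = [(0, [1, 2, 3]), (2, [9, 9])]
result = assemble_variable(5, stream)
```

Index 4 is never written by any message, so it keeps the fill value, which matches the statement that a variable's data consists of whatever data messages exist for it, if any.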
Messages are encoded using Google's Protobuf library.
"Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format."
The main advantage of protobuf over XML is performance, since both message size and parsing speed are improved. A very important feature of protobuf is the ability to evolve the message structure in a way that doesn't break previous code.
We don't use protobuf messages for the data itself, since protobuf messages are built in memory, and we need to be able to stream (write data directly from its source onto the output stream, e.g. a socket). The data is simply linearized in the usual netCDF way and written to the stream. Every data message includes an NcStreamProto.Data message identifying the variable and the section that the data represents.
Variable length datatypes like String and Opaque use the vdataMessage for data transfer. First the number of objects is written, then each object, preceded by its length in bytes as a vlen. Strings are encoded as UTF-8 bytes. Opaque is just a bag of bytes.
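As a rough illustration of this layout, the Python sketch below encodes a list of strings as a count followed by length-prefixed UTF-8 bytes. It assumes that the count and each vlen use the same base-128 varint encoding as protobuf, which this page does not state explicitly. The helper names (`encode_varint`, `decode_varint`, `encode_strings`, `decode_strings`) are hypothetical and not part of any ncstream library.

```python
def encode_varint(n):
    """Base-128 varint (ASSUMED encoding for the count and for vlen)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf, pos):
    """Return (value, position just past the varint)."""
    value = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not (b & 0x80):
            return value, pos
        shift += 7


def encode_strings(strings):
    """Number of objects first, then each object preceded by its byte length."""
    out = bytearray(encode_varint(len(strings)))
    for s in strings:
        raw = s.encode("utf-8")
        out += encode_varint(len(raw))  # length in bytes, not characters
        out += raw
    return bytes(out)


def decode_strings(buf, pos=0):
    """Inverse of encode_strings."""
    count, pos = decode_varint(buf, pos)
    result = []
    for _ in range(count):
        n, pos = decode_varint(buf, pos)
        result.append(buf[pos:pos + n].decode("utf-8"))
        pos += n
    return result
```

Opaque values would follow the same pattern with the UTF-8 step removed, since each object is already a byte string.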
TDS 4.0 currently has a prototype service using ncstream, similar to OPeNDAP, which can be used by the Netcdf-Java 4.0 library. The classic model has been tested, and the extended model processing is mostly complete.
An ncstream is a sequence of one or more messages:
ncstream = {message}
message = headerMessage | dataMessage | vdataMessage | errorMessage
headerMessage = MAGIC_HEADER, vlen, NcStreamProto.Header
dataMessage = MAGIC_DATA, vlen, NcStreamProto.Data, vlen, (byte)*vlen
vdataMessage = MAGIC_DATA, vlen, NcStreamProto.Data, vn, (vlen, bytes)*vn
errorMessage = MAGIC_ERR, vlen, NcStreamProto.Error
vlen = variable length encoded positive integer == length of the following object in bytes
vn = variable length encoded positive integer == number of objects that follow
NcStreamProto.Header = Header message encoded by protobuf
NcStreamProto.Data = Data message encoded by protobuf
data = actual bytes of data, encoding described by the NcStreamProto.Data message
primitives:
MAGIC_HEADER= 0xad, 0xec, 0xce, 0xda
MAGIC_DATA = 0xab, 0xec, 0xce, 0xba
MAGIC_ERR = 0xab, 0xad, 0xba, 0xda
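The grammar above can be exercised with a short Python sketch that frames and splits messages. It treats the protobuf payloads as opaque byte strings and assumes vlen is a protobuf-style base-128 varint, which this page does not spell out. All function names are hypothetical. The sketch handles only the fixed-length dataMessage form: because vdataMessage shares MAGIC_DATA, a real reader would have to decode NcStreamProto.Data first to know which form follows.

```python
import io

MAGIC_HEADER = bytes([0xAD, 0xEC, 0xCE, 0xDA])
MAGIC_DATA = bytes([0xAB, 0xEC, 0xCE, 0xBA])
MAGIC_ERR = bytes([0xAB, 0xAD, 0xBA, 0xDA])


def write_vlen(out, n):
    """Write n as a base-128 varint (ASSUMED vlen encoding)."""
    while n >= 0x80:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_vlen(inp):
    """Read one base-128 varint from the stream."""
    value = 0
    shift = 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("stream ended inside a vlen")
        value |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            return value
        shift += 7


def read_block(inp):
    """Read a vlen, then that many bytes."""
    n = read_vlen(inp)
    block = inp.read(n)
    if len(block) != n:
        raise EOFError("stream ended inside a block")
    return block


def write_header_message(out, header_proto):
    out.write(MAGIC_HEADER)
    write_vlen(out, len(header_proto))
    out.write(header_proto)


def write_data_message(out, data_proto, payload):
    out.write(MAGIC_DATA)
    write_vlen(out, len(data_proto))
    out.write(data_proto)
    write_vlen(out, len(payload))
    out.write(payload)


def read_messages(inp):
    """Yield (kind, proto_bytes, payload_or_None) until the stream ends."""
    while True:
        magic = inp.read(4)
        if not magic:
            return
        if magic == MAGIC_HEADER:
            yield ("header", read_block(inp), None)
        elif magic == MAGIC_DATA:
            proto = read_block(inp)
            yield ("data", proto, read_block(inp))
        elif magic == MAGIC_ERR:
            yield ("error", read_block(inp), None)
        else:
            raise ValueError("unknown magic bytes: %r" % magic)


# Round trip with placeholder bytes standing in for encoded protobuf messages.
buf = io.BytesIO()
write_header_message(buf, b"HDR")
write_data_message(buf, b"D", b"\x00\x01\x02\x03")
buf.seek(0)
messages = list(read_messages(buf))
```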
The protobuf messages are defined by
(these are files on Unidata's SVN server)
An ncstream dataset starts with MAGIC_START, followed by a set of messages.
ncstreamDataset = MAGIC_START, ncstream
MAGIC_START = 0x43, 0x44, 0x46, 0x53 // 'CDFS'
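A reader can use MAGIC_START to check that a file or stream is an ncstream dataset before parsing any messages. The following Python sketch shows one way to do that; `open_ncstream_dataset` is a hypothetical helper name, not an existing API.

```python
import io

MAGIC_START = bytes([0x43, 0x44, 0x46, 0x53])  # ASCII 'CDFS'


def open_ncstream_dataset(inp):
    """Consume MAGIC_START and return the stream positioned at the first message."""
    magic = inp.read(len(MAGIC_START))
    if magic != MAGIC_START:
        raise ValueError("not an ncstream dataset: got %r" % magic)
    return inp


# The bytes after 'CDFS' here are just the start of a header message magic.
stream = open_ncstream_dataset(io.BytesIO(b"CDFS\xad\xec\xce\xda"))
rest = stream.read()
```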
There is just enough information in the stream to break the stream into messages and to know what kind of message it is. To interpret the message correctly, one must have the correct proto file. To interpret the data stream correctly, one must have the header information. (is that really true? maybe only for structs)
NcStreamProto.Data contains the full name of the variable the data belongs to, the DataType and Section, and whether the data is big-endian or little-endian. ?? Note that in Java, DataOutputStream always writes in big-endian order.
message Data {
  required string varName = 1;
  required DataType dataType = 2;
  required Section section = 3;
  optional bool bigend = 4 [default=true];
}
Primitive types (byte, char, short, int, long, float, double): arrays of primitives are stored in row-major order. The endian-ness is specified in the NcStreamProto.Data message when needed.
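To make the row-major layout and the `bigend` flag concrete, here is a small Python sketch that linearizes a two-dimensional array of 32-bit ints. The function names and the nested-list input are assumptions made for illustration; the actual writer works on CDM arrays in Java.

```python
import struct


def encode_int_array(rows, bigend=True):
    """Linearize a 2-D list of 32-bit ints in row-major order.

    Row-major means the last index varies fastest, so each row is
    written in full before the next one starts.
    """
    order = ">" if bigend else "<"
    flat = [value for row in rows for value in row]
    return struct.pack("%s%di" % (order, len(flat)), *flat)


def decode_int_array(data, shape, bigend=True):
    """Inverse of encode_int_array for a (nrows, ncols) shape."""
    nrows, ncols = shape
    order = ">" if bigend else "<"
    flat = struct.unpack("%s%di" % (order, nrows * ncols), data)
    return [list(flat[r * ncols:(r + 1) * ncols]) for r in range(nrows)]


payload = encode_int_array([[1, 2], [3, 4]])
```

With the default `bigend=True` the byte order matches what Java's DataOutputStream produces for `writeInt`.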
Variable length types (String, Opaque): First the number of objects is written, then each object, preceded by its length in bytes as a vlen. Strings are encoded as UTF-8 bytes. Opaque is just a bag of bytes. What about vlen, e.g. int (3, *) ??
Structure types (Structure, Sequence): An array of StructureData. Can be encoded in row or column order (?). What about vlens ??
This document is maintained by John Caron and was last updated August 17, 2009