hdf/netcdf prototype

        I read with interest the plans to develop the prototype in the near 
future.  We are developing software under a NASA grant to manipulate and 
display large atmospheric datasets and we chose to use netCDF as our primary 
data interface, but would like to support HDF as well.

        Our software is being written to run on a variety of Unix platforms, 
but we are developing code on an IBM RS6000.  Therefore we would very much
like a prototype version that will run on the RS6000.  We would also be happy 
to test the prototype when it becomes available and provide feedback.

        In our project we are creating a layer on top of netCDF to better
support large datasets divided into multiple files.  I would be most interested
in learning more about the SILO project to see how the netCDF model has been
extended and how it may relate to our work.  Is information or code available
on SILO for examination?

        I have reviewed the design document and think the plans outlined will
result in a useful product.  How would HDF palettes appear in netCDF (as a 
variable)?  netCDF requires that variables and dimensions have unique names, 
whereas HDF has no such constraints.  How would an HDF object get a name in 
netCDF?  Page 8 of the document discusses how variables will be stored in
HDF.  From our standpoint, fast hyperslab access is important for retrieving
and displaying data values quickly.  For derivable quantities (p.9),
equations could be stored as attributes, although netCDF currently has no
way to parse them.  I agree that this is a worthwhile addition to the model.

        Thanks.

--------------------------------------------------------------------------
Keith Searight, Research Programmer             keith@xxxxxxxxxxxxxxxxxxxx
Univ. of Illinois at Urbana-Champaign     
Dept. of Atmospheric Sciences                           Ph. (217) 333-8132
105 S. Gregory Ave., Urbana, IL  61801                  Fax (217) 244-4393

>From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 23 2003 Oct -0600 09:59:18 
Message-ID: <wrxvfqgkkrt.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 23 Oct 2003 09:59:18 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <200307091637.h69Gb8Ld001151@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: automatic type conversion issues: the big picture
Organization: UCAR/Unidata

Howdy all!

I just thought I would share some things that came up last week with
respect to automatic type conversion. This will lead to a bunch of
questions for Quincey, and eventually a set of requirements for the
HDF5 team with respect to a new feature.

Since this will be a long discussion I'll break it into a number of
messages, in a possibly vain attempt to trick people into reading
them!

Naturally I hope that others will step in to clear me up on any
misconceptions I might have. I've only spent a couple of days on this,
so I have probably got a few things wrong.

It turns out that HDF5 does not convert between float and int, in
either direction. That is, if you have a bunch of ints and try to
write them as floats, HDF5 gives you an error at the write attempt,
complaining that the source type can't be converted to the
destination type.

Netcdf, on the other hand, does do these sorts of conversions.

Quincey tells me that this feature will be added to HDF5, but since I
needed it to continue my work, I hacked out a couple of quick
functions to fake it in netcdf-4. When HDF5 is upgraded with this
functionality, I'll toss out mine.

There are two parts to this. First, the data must be scanned for any
"range errors." Second, the data must be converted to the file's type
and written.


>From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 23 2003 Oct -0600 10:00:36 
Message-ID: <wrxsmlkkkpn.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 23 Oct 2003 10:00:36 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: automatic type conversion issues: range errors
Organization: UCAR/Unidata

In the type conversion process, range errors will require special
handling to live up to the netcdf-3 standard.

Netcdf defines a range error as occurring when you try to store a
number that is too large (or too small) for the destination type,
typically when going from one type into a more restrictive one.

For example, let's say you have a length-2 array of long:

long arr[] = {10, 1232134};

Now you want to write this out as a byte (i.e. a signed one-byte
int). The first array element is no problem; the second is too large
to fit.

The netcdf answer to this is to write the first array element as
instructed, then to write a fill value for the second, and return the
NC_ERANGE error.
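The rule just described can be sketched in C for the long-to-byte
example. This is a minimal illustration, not netcdf's own code:
put_long_as_byte, MY_NOERR, MY_ERANGE and MY_FILL_BYTE are
hypothetical names standing in for the real netcdf.h constants
(NC_NOERR, NC_ERANGE, NC_FILL_BYTE), and the destination is an
in-memory array rather than a file.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the netcdf.h constants. */
#define MY_NOERR     0
#define MY_ERANGE    1
#define MY_FILL_BYTE ((signed char)-127)

/* External NC_BYTE is a signed 8-bit integer. */
#define BYTE_MIN (-128L)
#define BYTE_MAX 127L

/* Convert n longs to signed bytes. Every element is written; any value
   that does not fit is replaced by the fill value, and MY_ERANGE is
   returned to report that at least one substitution happened. */
static int put_long_as_byte(const long *src, signed char *dst, size_t n)
{
    int status = MY_NOERR;
    size_t i;

    for (i = 0; i < n; i++) {
        if (src[i] < BYTE_MIN || src[i] > BYTE_MAX) {
            dst[i] = MY_FILL_BYTE;
            status = MY_ERANGE;   /* note the error, but keep going */
        } else {
            dst[i] = (signed char)src[i];
        }
    }
    return status;
}
```

With long arr[] = {10, 1232134}, the call leaves 10 in the first
slot, the fill value in the second, and returns MY_ERANGE; a caller
following the netcdf convention would treat that status as "written,
with substitutions" rather than as a failed write.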

Uniquely (I believe) among netcdf errors, the NC_ERANGE error
indicates that the operation (i.e. the write of the array) DID take
place, but that at least one range error was found, and each
out-of-range value was replaced with a fill value.

Usually, as is the C convention, a netcdf function returning an error
should not be expected to have completed its operation.

Quincey, what does HDF do if you try to write a too-large long into a
signed char? Does it give an error?


>From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 23 2003 Oct -0600 10:04:13 
Message-ID: <wrxoew8kkjm.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 23 Oct 2003 10:04:13 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: automatic type conversion issues: conversion
Organization: UCAR/Unidata

According to the netcdf docs, the data conversion itself is done just
the way the C language would convert the type.

In my type conversion function, I determine whether a conversion is
needed. If so, I allocate new memory and copy the data one element at
a time, letting the C compiler convert the type. Then I write out the
data in the new type.
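A minimal sketch of that copy loop for one type pair (double in
memory, int in the file); copy_double_to_int is a hypothetical name.
The cast does the work, so the result follows C's rules, which
discard the fractional part (truncation toward zero). C leaves the
conversion undefined when the truncated value doesn't fit in the
target type, which is one reason the range check has to happen before
(or alongside) the copy.

```c
#include <stdlib.h>

/* Copy n doubles into a newly allocated int buffer, one element at a
   time, letting the cast apply C's conversion rules (the fractional
   part is discarded). The values must already be range-checked.
   Returns NULL if allocation fails; the caller frees the buffer. */
static int *copy_double_to_int(const double *src, size_t n)
{
    int *dst = malloc(n * sizeof *dst);
    size_t i;

    if (dst == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        dst[i] = (int)src[i];
    return dst;
}
```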

I'll post the two functions as well, in case anyone wants to tell me
what bugs there are.

As a hint, there is one quite major bug in my type conversion
function, but nc_test hasn't caught it yet. Can you?

Ed

>From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 23 2003 Oct -0600 10:05:48 
Message-ID: <wrxk76wkkgz.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 23 Oct 2003 10:05:48 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: automatic type conversion issues: range checking function
Organization: UCAR/Unidata

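/* Check that each of the len values of mem_type in orig_data fits in
   file_type. Out-of-range values are overwritten in place with 0 and
   NC_ERANGE is returned. Returns NC_ECHAR if file_type is NC_CHAR but
   mem_type is not. */
int
data_range_check(nc_type file_type, nc_type mem_type, void *orig_data,
                 size_t len)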
{
   short *shortp;
   int *intp;
   float *floatp;
   double *doublep;
   size_t i;
   int retval = NC_NOERR;

   /* I got these from the netcdf-3.5 dist (ncx.h). They are the limits
      of the netCDF external (on-disk) types, which the file format
      fixes independently of the host machine. */
#define X_SCHAR_MIN     (-128)
#define X_SCHAR_MAX     127
#define X_UCHAR_MAX     255U
#define X_SHORT_MIN     (-32768)
#define X_SHORT_MAX     32767
#define X_INT_MIN       (-2147483647-1)
#define X_INT_MAX       2147483647
#define X_UINT_MAX      4294967295U
#define X_FLOAT_MAX     3.40282347e+38f
#define X_FLOAT_MIN     (-X_FLOAT_MAX)

   switch (file_type)
   {
   case NC_BYTE:
      switch (mem_type)
      {
      case NC_SHORT:
         for (shortp=(short *)orig_data, i=0; i<len; i++, shortp++)
            if (*shortp > X_SCHAR_MAX || *shortp < X_SCHAR_MIN)
            {
               retval = NC_ERANGE;
               *shortp = 0;
            }
         break;
      case NC_INT:
         for (intp=(int *)orig_data, i=0; i<len; i++, intp++)
            if (*intp > X_SCHAR_MAX || *intp < X_SCHAR_MIN)
            {
               retval = NC_ERANGE;
               *intp = 0;
            }
         break;
      case NC_FLOAT:
         for (floatp=(float *)orig_data, i=0; i<len; i++, floatp++)
            if (*floatp > X_SCHAR_MAX || *floatp < X_SCHAR_MIN)
            {
               retval = NC_ERANGE;
               *floatp = 0;
            }
         break;
      case NC_DOUBLE:
         for (doublep=(double *)orig_data, i=0; i<len; i++, doublep++)
            if (*doublep > X_SCHAR_MAX || *doublep < X_SCHAR_MIN)
            {
               retval = NC_ERANGE;
               *doublep = 0;
            }
         break;
      default:
         break;
      }
      break;
   case NC_CHAR:
      if (mem_type != NC_CHAR)
         return NC_ECHAR;
      break;
   case NC_SHORT:
      switch (mem_type)
      {
      case NC_INT:
         for (intp=(int *)orig_data, i=0; i<len; i++, intp++)
            if (*intp > X_SHORT_MAX || *intp < X_SHORT_MIN)
            {
               retval = NC_ERANGE;
               *intp = 0;
            }
         break;
      case NC_FLOAT:
         for (floatp=(float *)orig_data, i=0; i<len; i++, floatp++)
            if (*floatp > X_SHORT_MAX || *floatp < X_SHORT_MIN)
            {
               retval = NC_ERANGE;
               *floatp = 0;
            }
         break;
      case NC_DOUBLE:
         for (doublep=(double *)orig_data, i=0; i<len; i++, doublep++)
            if (*doublep > X_SHORT_MAX || *doublep < X_SHORT_MIN)
            {
               retval = NC_ERANGE;
               *doublep = 0;
            }
         break;
      default:
         break;
      }
      break;
   case NC_INT:
      switch (mem_type)
      {
      case NC_FLOAT:
         for (floatp=(float *)orig_data, i=0; i<len; i++, floatp++)
            if (*floatp > X_INT_MAX || *floatp < X_INT_MIN)
            {
               retval = NC_ERANGE;
               *floatp = 0;
            }
         break;
      case NC_DOUBLE:
         for (doublep=(double *)orig_data, i=0; i<len; i++, doublep++)
            if (*doublep > X_INT_MAX || *doublep < X_INT_MIN)
            {
               retval = NC_ERANGE;
               *doublep = 0;
            }
         break;
      default:
         break;
      }
      break;
   case NC_FLOAT:
      switch (mem_type)
      {
      case NC_DOUBLE:
         for (doublep=(double *)orig_data, i=0; i<len; i++, doublep++)
            if (*doublep > X_FLOAT_MAX || *doublep < X_FLOAT_MIN)
            {
               retval = NC_ERANGE;
               *doublep = 0;
            }
         break;
      default:
         break;
      }
      break;
   case NC_DOUBLE:
      /* Everything will fit in a double! */
      break;
   default:
      break;
   }
   return retval;
}

>From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 23 2003 Oct -0600 10:07:09 
Message-ID: <wrxfzhkkkeq.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 23 Oct 2003 10:07:09 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: automatic type conversion issues: conversion function
Organization: UCAR/Unidata

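/* When a conversion is needed, copy *orig_data into a newly malloc'd
   buffer of an intermediate type (double for integer-to-floating-point,
   long for floating-point-to-integer, or unsigned char relabeled
   NC_BYTE for _NC_UCHAR data), point *data at it, update *mem_type,
   and increment *mem_allocated so the caller knows to free it.
   Otherwise *data simply points at *orig_data. */
int
data_type_convert(nc_type *mem_type, nc_type file_type,
                  void **orig_data, size_t len, void **data,
                  int *mem_allocated)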
{
   short *shortp;
   size_t i;
   int *intp;
   float *floatp;
   double *doublep;
   unsigned char *ucharp;
   signed char *bytep;
   unsigned char *uchar_data = NULL;
   double *double_data = NULL;
   long *long_data = NULL;

   if ((*mem_type == NC_SHORT ||
        *mem_type == NC_INT ||
        *mem_type == NC_BYTE) && 
       (file_type == NC_FLOAT ||
        file_type == NC_DOUBLE))
   {
      /* Convert the data to a double instead... */
      if (!(double_data = malloc(len * sizeof(double))))
         return NC_ENOMEM;
      switch (*mem_type)
      {
      case NC_BYTE:
         for (bytep=(signed char *)(*orig_data), i=0; i<len; i++, bytep++)
            double_data[i] = (double)*bytep;
         break;
      case NC_SHORT:
         for (shortp=(short *)(*orig_data), i=0; i<len; i++, shortp++)
            double_data[i] = (double)*shortp;
         break;
      case NC_INT:
         for (intp=(int *)(*orig_data), i=0; i<len; i++, intp++)
            double_data[i] = (double)*intp;
         break;
      default:
         break;
      }
      *mem_type = NC_DOUBLE;
      *data = double_data;
      (*mem_allocated)++;
   }
   else if ((*mem_type == NC_FLOAT ||
             *mem_type == NC_DOUBLE) &&
            (file_type == NC_BYTE || 
             file_type == NC_SHORT ||
             file_type == NC_INT))
   {
      /* Convert the data to a long instead... */
      if (!(long_data = malloc(len * sizeof(long))))
         return NC_ENOMEM;
      switch (*mem_type)
      {
      case NC_FLOAT:
         for (floatp=(float *)(*orig_data), i=0; i<len; i++, floatp++)
            long_data[i] = (long)*floatp;
         break;
      case NC_DOUBLE:
         for (doublep=(double *)(*orig_data), i=0; i<len; i++, doublep++)
            long_data[i] = (long)*doublep;
         break;
      default:
         break;
      }
      *mem_type = NC_LONG;
      *data = long_data;
      (*mem_allocated)++;
   }
   else if (*mem_type == _NC_UCHAR &&
            file_type == NC_BYTE)
   {
      /* Convert the data to an unsigned char instead... */
      if (!(uchar_data = malloc(len * sizeof(unsigned char))))
         return NC_ENOMEM;
      for (ucharp=(unsigned char *)(*orig_data), i=0; i<len; i++, ucharp++)
         uchar_data[i] = (unsigned char)*ucharp;
      *mem_type = NC_BYTE;
      *data = uchar_data;
      (*mem_allocated)++;
   }
   else
   {
      *data = (void *)(*orig_data);
   }
   return NC_NOERR;

}

>From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 23 2003 Oct -0600 10:12:21 
Message-ID: <wrxbrs8kk62.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 23 Oct 2003 10:12:21 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: automatic type conversion issues: future development
Organization: UCAR/Unidata


The code I posted could obviously be combined with the "odometer
algorithm" in the original netcdf code. I haven't done that yet, but
when I do it will fix a number of bugs and also give me support for
the mapping functions. (And I still don't
understand how those are meant to be used!)
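For readers who haven't met it, the "odometer" idea is to step an
index vector through a hyperslab the way a car odometer rolls over:
bump the last dimension, and when it passes the end of its range,
reset it and carry into the next dimension to the left. A minimal
sketch follows; odometer_next is a hypothetical helper written for
illustration, not the routine from the netcdf sources, and it assumes
every count[d] is at least 1.

```c
#include <stddef.h>

/* Step idx[] to the next position in the hyperslab described by
   start[] and count[], varying the last dimension fastest (row-major
   order). Assumes every count[d] >= 1 and that idx[] starts at
   start[]. Returns 1 after moving to a new position, or 0 (with idx[]
   back at start[]) once every position has been visited. */
static int odometer_next(size_t ndims, const size_t *start,
                         const size_t *count, size_t *idx)
{
    size_t d = ndims;

    while (d-- > 0) {
        if (++idx[d] < start[d] + count[d])
            return 1;          /* this digit didn't roll over */
        idx[d] = start[d];     /* roll over and carry to the left */
    }
    return 0;                  /* carried out of the first dimension */
}
```

Starting from idx = start and looping until it returns 0 visits all
count[0]*count[1]*... positions once, in row-major order.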

In the HDF5 world, I would like to see your type conversion meet the
following requirements:

1 - Convert types the way C does.
2 - Check for range errors and return an error if one or more occur,
but complete the operation anyway, substituting fill values for the
out-of-range values.
3 - Obviously, only check the subset of the data actually read or
written. That is, if a hyperslab is to be written, don't range-check
or convert values outside the hyperslab.
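Requirement 3 can be illustrated with a small sketch: given a 2-D
row-major buffer and a hyperslab selected by start and count, only the
selected elements are range-checked. count_out_of_range_2d is a
hypothetical helper that checks doubles against the range of a 16-bit
short; a real version would also convert, and would handle any number
of dimensions.

```c
#include <stddef.h>

/* Range limits of a 16-bit signed integer (netCDF's NC_SHORT). */
#define SHORT_LO (-32768.0)
#define SHORT_HI 32767.0

/* Count the values in the hyperslab rows [start[0], start[0]+count[0])
   and columns [start[1], start[1]+count[1]) of a row-major buffer with
   ncols columns that would not fit in a short. Elements outside the
   hyperslab are never read. */
static size_t count_out_of_range_2d(const double *buf, size_t ncols,
                                    const size_t start[2],
                                    const size_t count[2])
{
    size_t r, c, bad = 0;

    for (r = start[0]; r < start[0] + count[0]; r++)
        for (c = start[1]; c < start[1] + count[1]; c++) {
            double v = buf[r * ncols + c];
            if (v < SHORT_LO || v > SHORT_HI)
                bad++;
        }
    return bad;
}
```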

Quincey, what do you think of all that?

Ed

>From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 23 2003 Oct -0600 10:15:44 
Message-ID: <wrx7k2wkk0f.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 23 Oct 2003 10:15:44 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <wrxbrs8kk62.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: automatic type conversion issues: future development
Organization: UCAR/Unidata

Ed Hartnett <ed@xxxxxxxxxxxxxxxx> writes:

> There is an obvious combination of the code I posted, and the
> "odometer algorithm" in the original netcdf code. I haven't yet
> combined these, but when I do it will fix a number of bugs, and also
> give me support for the mapping functions as well. (And I still don't
> understand how those are meant to be used!)

Just one more elaboration...

I realize that the range check and conversion should be done in the
same pass through the data; I've just kept them separate until now so
that I could get a better handle on the problem...
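One way to fold the range check and the conversion into a single pass,
without writing every type pair out by hand, is a C macro that stamps
out one function per (memory type, file type) pair. This is only a
sketch of that idea: DEFINE_CONVERT and the generated function names
are hypothetical, CONV_NOERR and CONV_ERANGE are local stand-ins for
NC_NOERR and NC_ERANGE, and the fill values -127 and -32767 match
netcdf's default fill values for byte and short (NC_FILL_BYTE,
NC_FILL_SHORT).

```c
#include <stddef.h>

#define CONV_NOERR  0
#define CONV_ERANGE 1

/* Define NAME(src, dst, n): convert n values of SRC_T to DST_T in one
   pass. Values outside [LO, HI] become FILL and make the function
   return CONV_ERANGE; all other values are converted with a C cast. */
#define DEFINE_CONVERT(NAME, SRC_T, DST_T, LO, HI, FILL)        \
    static int NAME(const SRC_T *src, DST_T *dst, size_t n)     \
    {                                                            \
        int status = CONV_NOERR;                                 \
        size_t i;                                                \
        for (i = 0; i < n; i++) {                                \
            if (src[i] < (LO) || src[i] > (HI)) {                \
                dst[i] = (FILL);                                 \
                status = CONV_ERANGE;                            \
            } else {                                             \
                dst[i] = (DST_T)src[i];                          \
            }                                                    \
        }                                                        \
        return status;                                           \
    }

DEFINE_CONVERT(convert_int_to_schar, int, signed char, -128, 127, -127)
DEFINE_CONVERT(convert_double_to_short, double, short,
               -32768.0, 32767.0, -32767)
```

Note that a NaN input would slip past both comparisons here; a real
version would need to test for it explicitly before casting.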

Ed

>From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 23 2003 Oct -0600 10:21:04 
Message-ID: <wrx3cdkkjrj.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 23 Oct 2003 10:21:04 -0600
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: question for Russ - signed vs. unsigned char and NC_BYTE
Organization: UCAR/Unidata

Here's a question I have about the netcdf interface.

Suppose I approach an unknown file, which contains an attribute. I can
use nc_inq_att to find out the type of the att. Suppose it is NC_BYTE.

How do I read that attribute? With nc_get_att_schar or with
nc_get_att_uchar?

That is, is it signed or unsigned?

The netcdf manual says that when writing data out, NC_BYTE will be
treated as signed. But then why do we have nc_get_att_uchar?
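The heart of the question can be shown without netcdf at all: the
same stored byte reads differently depending on whether it is
interpreted as signed char or unsigned char. A minimal sketch in plain
C; read_byte_both_ways is a hypothetical helper, and the comment
assumes the two's-complement representation used on essentially all
current machines.

```c
#include <string.h>

/* Reinterpret one stored byte both ways. On a two's-complement
   machine the bit pattern 0xFF reads as -1 through signed char and
   as 255 through unsigned char. */
static void read_byte_both_ways(unsigned char raw, int *as_signed,
                                int *as_unsigned)
{
    signed char s;

    memcpy(&s, &raw, 1);       /* copy the bits, not the value */
    *as_signed = s;
    *as_unsigned = raw;
}
```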

Thanks!

Ed