
[netCDFJava #VKJ-807633]: NcML _Netcdf4Dimid, scalar shape compliance, and string separators



Hi Charlie:

> Hi John and NcML people,
> 
> I hope you are well.
> 
> We are preparing NCO 4.3.9 with the bold claim that its "Output
> validates without errors against NcML 2.2 schema." 

The NcML schema lives here:

  http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2.xsd

Technically, "validate" means XML schema validation, which most XML parsers 
will check for you.
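
As a rough illustration of what schema validation amounts to, here is a minimal sketch using the standard JAXP validation API (javax.xml.validation). The class name NcmlSchemaCheck and its helper methods are made up for this example and are not part of netcdf-java; the sketch also assumes the schema URL above is reachable from your machine.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of netcdf-java or NCO.
public class NcmlSchemaCheck {
  static final String SCHEMA_URL =
      "http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2.xsd";

  // Compile an XSD into a reusable Schema object.
  public static Schema loadSchema(StreamSource xsd) throws SAXException {
    SchemaFactory factory =
        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    return factory.newSchema(xsd);
  }

  // Returns null if the document validates, otherwise the first error message.
  public static String validate(Schema schema, StreamSource xml)
      throws IOException {
    Validator validator = schema.newValidator();
    try {
      validator.validate(xml);
      return null;
    } catch (SAXException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) throws Exception {
    Schema schema = loadSchema(new StreamSource(SCHEMA_URL));
    String error = validate(schema, new StreamSource(new File(args[0])));
    System.out.println(error == null ? "valid" : "invalid: " + error);
  }
}
```

Running it as `java NcmlSchemaCheck in_grp.ncml` would check only the XML structure; it says nothing about whether the CDM library can actually read the result.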


> While I think this
> is true in letter, I'm unsure this is true in spirit, mainly because
> the toolsui interface (which I use to check NCO output for schema
> compliance) generates NcML with "_Netcdf4Dimid" elements for which
> documentation is scarce and may be outdated.  Those elements appear to
> be either incomplete or unnecessary---it's clear that they are
> dimension IDs, yet only one is recorded even for multidimensional
> variables. Still, the toolsui schema does not complain when
> _Netcdf4Dimid elements are omitted, as NCO currently does.

I think "in spirit" probably means that the netcdf-java / CDM library does the 
right thing with the file. Schema validation is only the first layer of that; 
CF compliance is the next layer, and it has nothing to do with the XML schema.

_Netcdf4Dimid is a "real" attribute in netcdf-4 files, apparently meaning:

// on dimension scales, holds a scalar H5T_NATIVE_INT which is the (zero-based)
// dimension ID for this dimension. used to maintain creation order

It's a kludge for using HDF5; I'm leaving those attributes in, in case the 
user cares about creation order. 
The netCDF C library probably removes them, since it puts the dimensions in 
creation order. 


> 
> Should I add _Netcdf4Dimid elements to NCO NcML output?
> If so, is there a rule for which dimension to add that element for
> in the case of multi-dimensional variables?

No, you should not. I assume the problem is that you are comparing the output 
of Java and C?
When comparing the Java and C libraries, I have code that ignores these 
attributes. There are a few other things to ignore as well:

      // added by cdm
      if (name.equals(CDM.CHUNK_SIZE)) return false;
      if (name.equals(CDM.FILL_VALUE)) return false;
      if (name.equals("_lastModified")) return false;

      // hidden by nc4
      if (name.equals(Nc4.NETCDF4_DIMID)) return false;  // preserve the order of the dimensions
      if (name.equals(Nc4.NETCDF4_COORDINATES)) return false;  // ??
      if (name.equals(Nc4.NETCDF4_STRICT)) return false;

where:

  // special attribute names used by netcdf4 library
  // only on the multi-dimensional coordinate variables of the netCDF model (2D chars);
  // appears to hold the dimension ids of the 2 dimensions
  static public final String NETCDF4_COORDINATES  = "_Netcdf4Coordinates";
  // on dimension scales, holds a scalar H5T_NATIVE_INT which is the (zero-based)
  // dimension ID for this dimension; used to maintain creation order
  static public final String NETCDF4_DIMID  = "_Netcdf4Dimid";
  // global - when using classic model
  static public final String NETCDF4_STRICT  = "_nc3_strict";


  public static final String CHUNK_SIZE = "_ChunkSize";
  public static final String FILL_VALUE = "_FillValue";
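
If you want the same filter on your side, here is a self-contained sketch of the idea, using the literal attribute names from the constants above. The class AttributeFilter and the method shouldCompare are invented for this illustration; they are not the actual netcdf-java comparison code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: decides whether an attribute should take part
// in a Java-vs-C output comparison.
public class AttributeFilter {
  private static final Set<String> IGNORED =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
          // added by cdm
          "_ChunkSize", "_FillValue", "_lastModified",
          // hidden by nc4
          "_Netcdf4Dimid", "_Netcdf4Coordinates", "_nc3_strict")));

  // Returns false for attributes that only one of the two libraries exposes.
  public static boolean shouldCompare(String name) {
    return !IGNORED.contains(name);
  }

  public static void main(String[] args) {
    for (String name : new String[] {"units", "_Netcdf4Dimid", "long_name"}) {
      System.out.println(name + " -> " + shouldCompare(name));
    }
  }
}
```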


> In any case, I attach a sample input file and its NcML output
> generated by ncks in case you have the time and inclination to check
> whether the NcML is truly standards-compliant in a way that only a
> human can. Also wondering whether NcML really wants shape="" elements
> for scalar variables, which would seem redundant, yet I will go by
> your recommendation.

shape is not technically required, but I think the code needs it. One could 
say that if it is not specified, the variable is assumed to be scalar. For 
now, it is safer to leave it in.

> 
> Also, I rather randomly picked a separator = "*|*" for strings, in
> order to avoid generating NcML with ambiguous whitespace separators
> for arrays of strings. If there is a preferred string separator,
> please let me know.

I use "," for readability, but it needs to be something that does not already 
occur in any of the strings.
To be sure, you should scan the strings first. Otherwise "*|*" is as good as 
anything.
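
One way to do that scan, as a sketch only: try each candidate separator, join the strings with it, split the result again, and accept the candidate only if you get the original strings back. The round trip also catches a multi-character separator that appears by accident across the boundary between two values. The class SeparatorPicker and the method pickSeparator are names made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for choosing a safe NcML string separator.
public class SeparatorPicker {
  // Returns the first candidate that survives a join/split round trip.
  public static String pickSeparator(List<String> values,
                                     List<String> candidates) {
    for (String sep : candidates) {
      if (sep.isEmpty()) continue;            // an empty separator is useless
      if (values.isEmpty()) return sep;       // nothing to clash with
      String joined = String.join(sep, values);
      // Pattern.quote treats the separator literally; limit -1 keeps
      // trailing empty strings so they are not silently dropped.
      List<String> back = Arrays.asList(joined.split(Pattern.quote(sep), -1));
      if (back.equals(values)) return sep;
    }
    throw new IllegalArgumentException(
        "every candidate separator clashes with the data");
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("one, two", "three four");
    String sep = pickSeparator(values, Arrays.asList(" ", ",", "*|*"));
    System.out.println("separator: " + sep);
    System.out.println("joined:    " + String.join(sep, values));
  }
}
```

With the two sample strings in main, blank and "," are both rejected because they occur in the data, so "*|*" is chosen.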

BTW, in your example, reading in_grp.ncml is barfing because g11/string_var is 
a scalar in the original file, but since its value has embedded blanks and 
blank is the default separator, the reader sees 33 values. So you need the 
separator.

Thanks for your test file. I'm checking to see what issues it comes up with 
(just trying to open the NcML in ToolsUI/viewer).

For example, CDM doesn't actually support unsigned longs; we just pretend they 
are signed. I'll think about a workaround for NcML reading. I'll let you know 
if I see anything else.

Regards,
John
 

> 
> Thanks!
> c
> 
> p.s. output generated by current ncks snapshot with
> ncks --xml in_grp.nc > in_grp.ncml
> --
> Charlie Zender, Earth System Sci. & Computer Sci.
> University of California, Irvine 949-891-2429 )'(
> 
> 

Ticket Details
===================
Ticket ID: VKJ-807633
Department: Support netCDF Java
Priority: Normal
Status: Open