Re: NetCDF/PERL and strings (or CHARs if you prefer)

On Fri, 12 Nov 1999, Steve Diggs wrote:

> Robb,
> I got your name from Steve Emmerson at Unidata.
> My problem, I'm sure, is a common one.  I'm not new to either NetCDF or
> Perl, but I've only been using the Perl/NetCDF interface for a few days.
> I'm having issues with the way that NetCDF treats string data.
> For instance, I might have an array of scalars that looks like this:
> @atmospheric_conditions = qw ( cloudy sunny fog rain );
> @count_ac = ($#atmospheric_conditions);
> This is close to my actual problem, so if I decide to write this array of
> atmospheric conditions to a NetCDF file, I would do the following:
> my @start = (0);
> my $varid_ac   = NetCDF::vardef($ncid, 'ATM_COND', NetCDF::CHAR, $dimid);
> # .. then I leave define mode and write the data out
> NetCDF::varput($ncid, $varid_ac, \@start, \@count_ac,
>       \@atmospheric_conditions);
> I will only get back: 'clo' (the 1st 3 chars from the 1st element of the
> array above).  What's the deal?  How on earth do I encode variable length
> strings in Perl arrays into NetCDF CHAR arrays? 


NetCDF doesn't handle variable-length strings; you need to pad every string
with nulls to a fixed length. The CDL files define the variable types and
dimensions for all the writes. Download the decoders package and look at the
netCDF Perl decoders for examples. I will also send an example as an attachment.
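The 'clo' result follows from the count: $#atmospheric_conditions is the last index (3), not the number of elements, and along a CHAR dimension the count is measured in characters, so only three characters are written. One way to apply the padding advice is to give the CHAR variable two dimensions, (number of strings, maximum length), and pass every string padded with NULs to that length. The sketch below covers only the padding and the start/count arithmetic in plain Perl; the helper name pad_strings and the width of 8 are illustrative assumptions, and the netCDF-Perl call itself is only mentioned in a comment.

```perl
use strict;
use warnings;

# Pad each string with NUL characters to a fixed width and return one flat
# list of single characters, ready to be written to a CHAR variable with
# dimensions (number of strings, width). Longer strings are truncated.
sub pad_strings {
    my ($strings, $width) = @_;
    my @flat;
    for my $s (@$strings) {
        my $fixed = substr($s, 0, $width);            # truncate if too long
        $fixed .= "\0" x ($width - length($fixed));   # NUL-pad to $width
        push @flat, split(//, $fixed);                # one character per element
    }
    return @flat;
}

my @atmospheric_conditions = qw(cloudy sunny fog rain);
my $width = 8;                              # assumed maximum string length
my @chars = pad_strings(\@atmospheric_conditions, $width);

# Start and count for the whole (4, 8) block. The first count is the number
# of strings, not $#atmospheric_conditions (the last index, 3).
my @start = (0, 0);
my @count = (scalar(@atmospheric_conditions), $width);
print "start=(@start) count=(@count) chars=", scalar(@chars), "\n";
# With netCDF-Perl, \@start, \@count and \@chars would then be passed to
# NetCDF::varput, as the attached script does for its site names.
```

Because every row has the same width, a reader can recover each string by cutting its row at the first NUL.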

> Better yet, if I give
> this output NetCDF data set to someone unfamiliar with the data, what
> gyrations do they have to go through in NetCDF/Perl to extract the values
> correctly?

It would be better if you wrote the extraction process using NetCDF Perl
or the NetCDF library. This package is not the easiest to use, but once you
have some working examples, it makes sense.
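On the reading side, the site-name loop in the attached script applies chr to each element returned by NetCDF::varget and stops at the first NUL, which suggests varget hands CHAR data back as numeric character codes. The plain-Perl sketch below does the same job for a whole block of padded strings; unpad_strings is a hypothetical helper name, and the input list merely stands in for what varget would fill in.

```perl
use strict;
use warnings;

# Turn a flat list of character codes, read back from a CHAR variable with
# dimensions (number of strings, width), into Perl strings. Each row is cut
# at its first NUL, which removes the padding added when it was written.
sub unpad_strings {
    my ($codes, $width) = @_;
    my @strings;
    for (my $first = 0; $first < @$codes; $first += $width) {
        my $last = $first + $width - 1;
        $last = $#$codes if $last > $#$codes;          # tolerate a short last row
        my $s = join('', map { chr($_) } @$codes[$first .. $last]);
        $s =~ s/\0.*//s;                               # drop the NUL padding
        push @strings, $s;
    }
    return @strings;
}

# Illustrative input: "fog" and "rain" padded to width 5, as character codes.
my @codes = map { ord($_) } split(//, "fog\0\0" . "rain\0");
print join(',', unpad_strings(\@codes, 5)), "\n";    # prints fog,rain
```

The only thing a reader of the file needs to know is the padded width, which is the size of the second (character) dimension.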


> I'm sure that you've answered this question before, so a solid example or
> really good documentation would be sufficient.  BTW, why isn't it
> apparent in the NetCDF-Perl documentation that the weirdness above happens
> and what to do about it?
> thanks,
> -sd
> p.s. I'll send you a piece of code that illustrates this behaviour in my
> next message to you.
> -- 
> --------------------------------------------------------------------
> Steve Diggs                                   Voice: (858)534-1108
> Scripps Institution of Oceanography           FAX  : (858)534-7383
> WOCE Hydrographic Program Office/STS          EMAIL: address@hidden
> 9500 Gilman Drive                             WWW  : whpo.ucsd.edu
> La Jolla, CA 92093-0214
> --------------------------------------------------------------------

Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
# statsproc
# written by:         M. Baltuch
# date written:       February 1997
# description:
#   This file parses the raw stats input and writes both a netCDF file
#   containing the latency data, as well as a topo file.
# input files:   ~ftp/pub/idd/ldmstats/latency.input         raw latency input
#                ~/usr/local/ldm/etc/stats.conf        site configuration file
# output files:  ~ftp/pub/idd/ldmstats/topology.cur          current topo data
#                ~ftp/pub/idd/ldmstats/<feed><yyyymm>.nc     latency data
#                          feed    datastream
#                          yyyy    year
#                          mm      month number
#                ~ftp/pub/idd/ldmstats/<yyyymmddhh>.stats    old style stats files
#                          yyyy    year
#                          mm      month number
#                          dd      day number
#                          hh      hour number
#                ~/usr/local/ldm/logs/latency.log            log file
# Modification History:
#  who           when         what

use NetCDF;

# necessary files

$latencydir = "/home/ftp/pub/idd/ldmstats";
$idddir = "/home/idd";
$ncbindir = "/usr/local/ldm/util";
$archivedir = "/data/iddstats";

$inputfile = "$latencydir/latency.input";
$rawfile = "$latencydir/latency.input.1";
$archivefile = "$archivedir/latency.dat";
$conffile = "$idddir/etc/stats.conf";
$topofile = "$latencydir/topology.cur";
$junkfile = "$latencydir/stats.junk";
$logfile = "$idddir/logs/latency.log";
$cdlfile = "$idddir/etc/latency.cdl";

# shift the current input file

rename $inputfile, $rawfile;
system("touch $inputfile");
chmod 0664,$inputfile;

# make sure any writes to rawfile have completed


# copy the raw file to the archive

#system("cat $rawfile >> $archivefile");

# open necessary files

open (RAWFILE, "$rawfile") || &bad_exit("Could not open $rawfile: $!");
open (CONFFILE, "$conffile") || &bad_exit("Could not open $conffile: $!");
open (TOPOFILE, ">> $topofile") || &bad_exit("Could not open $topofile: $!");
open (LOGFILE, ">> $logfile") || &bad_exit("Could not open $logfile: $!");

# log start


# read in the configuration file

while (<CONFFILE>) {
    ($confname, $confstatus) = split (/[ \t\n]+/);
    $HOSTSTATUS{$confname} = $confstatus;
}


# main processing loop

$binfh = 'fh00';
$numproc = 0;
$nc_flag = 0;

while (<RAWFILE>) {
    undef $avglat;
    undef $maxlat;
    if (/LDMBINSTAT/) {
        while(<RAWFILE>) {
            if (/LDMEND/) {
                last;
            }
            else {
                ($version,$host,$bin,$feed,$numprods,$numbytes,$plus) =
                    split(' ', $_ );
                $host = lc($host);
                if ($plus eq "+") {
                    chomp ( $_ ); # get rid of the newline
                    chop ( $_ ); # get rid of plus sign
                    $add = <RAWFILE>;
                    $_ .= $add;
                    ($lsttime,$origin,$avglat,$maxlat) = split(' ', $add);
                    ($maxlat, $dummy) = split('\@',$maxlat);
                    $nc_flag = 1;
                }
                else {
                    &print_log("OLDSTATS $host $bin $feed");
                    $nc_flag = 0;
                }

                if ($bin !~ /^199|^200/) {
                    open (JUNK, ">> $junkfile") ||
                        &bad_exit("Could not open $junkfile: $!");
                    print JUNK;
                    close JUNK;
                    &print_log("BADENTRY $host $bin $feed $avglat $maxlat");
                    $nc_flag = 0;
                }
                elsif (!defined $BINFILES{$bin}) {
                    open($binfh, ">> $latencydir/$bin.stats") ||
                        &bad_exit("Could not open $latencydir/$bin.stats: $!");
                    chmod 0664,"$latencydir/$bin.stats";
                    $BINFILES{$bin} = $binfh;
                    $binfh++;   # next handle name: fh00 -> fh01 -> ...
                    $bfh++;     # number of stats files currently open
                }

                # change HRS to HDS

                ( @F ) = split( / /, $_ );
                $F[ 1 ] = lc( $F[ 1 ] );
                #print $_;
                if (defined $BINFILES{$bin}) {
                    $fh = $BINFILES{$bin};
                    print $fh "@F";
                }
                undef( @F );

                # write to the netCDF file

                if ($nc_flag == 1) {
                    &write_ncfile;
                }

                # check for too many file handles open

                if ($bfh == 15) {
                    foreach $key (keys %BINFILES) {
                        close($BINFILES{$key});
                    }
                    undef %BINFILES;
                    $bfh = 0;
                }
            }
        }
    }
    elsif (/TOPOLOGY/) {
        while(<RAWFILE>) {
            if (/TOPOEND/) {
                last;
            }
            else {
                print TOPOFILE;
            }
        }
    }
}

# close all stats files

foreach $key (keys %BINFILES) {
    close($BINFILES{$key});
}

# close all netCDF files;

foreach $key (keys %NCFILEIDS) {
    NetCDF::close($NCFILEIDS{$key});
}

# close up the rest

close RAWFILE;
close CONFFILE;
close TOPOFILE;

# make sure that the topology file has the right permissions

chmod 0664,$topofile;

# log finish

&print_log("Finished: $numproc sites processed");
close LOGFILE;

# eliminate old data in stats files
chdir( "$latencydir" ) ;
@FILES = split( /[ \t\n]+/, `/bin/ls -rx *.stats` ) ;
for( $i = 0; $i <= 24 && $i <= $#FILES; $i++ ) {
        open( STATS, $FILES[ $i ] ) || die "could not open $FILES[ $i ]: $!\n" ;
        while( <STATS> ) {
                ( $version, $host, $bintime, $feedtype, $number, $bytes,
                        $theBin, $source, $avgLat, $maxLat )
                                = split( ' ', $_ );
                $DATA{ "$host $feedtype $source" } = $_ ;
        }
        close( STATS ) ;
        open( STATS, ">$FILES[ $i ]" ) ||
                        die "could not open $FILES[ $i ]: $!\n" ;
        foreach $entry ( sort keys( %DATA ) ) {
                print STATS $DATA{ $entry } ;
        }
        close( STATS ) ;
        undef( %DATA ) ;
}

# all done

exit 0;

# bad exit routine
sub bad_exit {
    local($err_str) = @_;
    local($date_str) = &get_date();

    print STDERR "$date_str procstats: $err_str\n";
    print LOGFILE "$date_str procstats: $err_str\n";
    exit -1;
}

# Date routine.  Gets date and time as GMT in the same format as the LDM log
# file.
sub get_date {
    @month_array = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    local($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
        gmtime(time);
    local($date_string) =
        sprintf("%s %d %02d:%02d:%02d UTC", $month_array[$mon], $mday,
                $hour, $min,$sec);
    return $date_string;
}

# Log message printing routine
sub print_log {
    local($msg_str) = @_;
    local($date_str) = &get_date();

    print LOGFILE "$date_str procstats: $msg_str\n";
}

# write data to netCDF file
sub write_ncfile {
    if ($feed eq "NONE" || $feed eq "DOWN") {
        return;
    }
    local($filename) = "$feed";
    $filename =~ s/\Q|// ;
    $filename .= substr $bin, 0, 6;
    $filename .= ".nc";
    local($yeardir) = substr $bin, 0, 4;
    local($ncdir) = "$latencydir/$yeardir";

    local($recnum) = 0;

# get the hour and day indices

    $bin =~ /\d{6}(\d{2})(\d{2})/;
    local($daynum) = $1;
    $daynum -= 1;
    local($hournum) = $2;

# do we need to make a new netCDF file?

    if (! -e "$ncdir/$filename") {

        # make sure the year directory exists first

        if (! -e $ncdir) {
            mkdir($ncdir,0775) || &bad_exit("Can't create $ncdir: $!");
        }

        # now create the file

        system("$ncbindir/ncgen -o $ncdir/$filename $cdlfile");
    }

# open the appropriate netCDF file if not already open

    if (!defined $NCFILEIDS{$filename}) {
        $ncid = NetCDF::open("$ncdir/$filename", NetCDF::WRITE);
        $NCFILEIDS{$filename} = $ncid;
        $numncid++;

        # get the size of the record dimension

        $dimid = NetCDF::dimid($ncid,"siteNum");
        $nameid = "xxxxxxx";
        $rsize = -1;
        NetCDF::diminq($ncid, $dimid, $nameid, $rsize);
        $RECSIZE{$filename} = $rsize;

        # read the site_name variable to get the recnum's

        $varid = NetCDF::varid($ncid,"site_name");
        $SITEVARIDS{$filename} = $varid;

        if ($rsize > 0) {
            for ($i = 0; $i < $RECSIZE{$filename}; $i++) {
                @start = ($i, 0);
                @count = (1, 80);
                @sitename = ("\0" x 80);
                NetCDF::varget($ncid, $varid, \@start, \@count, \@sitename);
                # stop at the first NUL, or at a backslash left by records
                # that were padded with '\0' (a literal backslash) earlier
                $newsite = "";
                for ($j = 0; $j < 80; $j++) {
                    $siteChr = chr($sitename[$j]);
                    last if( $siteChr eq "\0" || $siteChr eq "\\" ) ;
                    $newsite .= $siteChr ;
                }
                $SITENUM{$filename}{$newsite} = $i;
            }
        }

        # get the variable id for the avg and max latency variables, as well
        # as for the host status variable

        $STATUSVARIDS{$filename}  = NetCDF::varid($ncid, "node_status");
        $MAXVARIDS{$filename} = NetCDF::varid($ncid, "max_latency");
        $AVGVARIDS{$filename} = NetCDF::varid($ncid, "avg_latency");
    }
    else {
        $ncid = $NCFILEIDS{$filename};
    }

    # do we know about this site yet in this file
    if (!defined $SITENUM{$filename}{$host}) {
        if (!defined $HOSTSTATUS{$host}) {
            &print_log("HOSTEXIST $host $bin $feed $avglat $maxlat");
        }
        $SITENUM{$filename}{$host} = $RECSIZE{$filename};
        $RECSIZE{$filename}++;    # the new site occupies the next record
        @start = ($SITENUM{$filename}{$host}, 0);
        @count = (1, 80);
        # pad the host name with NULs out to maxNameLength (80)
        for ($i = 0; $i < 80; $i++) {
            $padstr[$i] = "\0";
        }
        for ($i = 0; $i < length($host); $i++) {
            $padstr[$i] = substr($host, $i, 1);
        }
        NetCDF::varput($ncid,$SITEVARIDS{$filename},\@start, \@count,\@padstr);

        @stat_indices  = ($SITENUM{$filename}{$host});
        $hoststat = $HOSTSTATUS{$host} * 1;
        NetCDF::varput1($ncid,$STATUSVARIDS{$filename}, \@stat_indices,
                        $hoststat);
    }

    $hindex = $SITENUM{$filename}{$host};
    @indices = ($hindex,$daynum,$hournum);
    $avglat *= 1.0;
    $maxlat *= 1.0;
    NetCDF::varput1($ncid, $AVGVARIDS{$filename}, \@indices, $avglat);
    NetCDF::varput1($ncid, $MAXVARIDS{$filename}, \@indices, $maxlat);

    # finally, make sure we don't have too many netCDF files open

    if ($numncid == 21) {
        foreach $key (keys %NCFILEIDS) {
            NetCDF::close($NCFILEIDS{$key});
        }
        undef %NCFILEIDS;
        undef %RECSIZE;
        undef %SITENUM;
        undef %SITEVARIDS;
        undef %MAXVARIDS;
        undef %AVGVARIDS;
        undef %STATUSVARIDS;
        $numncid = 0;
    }
}

netcdf latency  {       // IDD latency definition

dimensions:
        siteNum = UNLIMITED;
        day = 31;
        hour = 24;
        maxNameLength = 80;

variables:
        float avg_latency(siteNum, day, hour);
                avg_latency:long_name = "Average Latency Report";
                avg_latency:_FillValue = -99999.f;
                avg_latency:units = "seconds";
        float max_latency(siteNum, day, hour);
                max_latency:long_name = "Maximum Latency Report";
                max_latency:_FillValue = -99999.f;
                max_latency:units = "seconds";
        byte node_status(siteNum);
                node_status:long_name = "host relay status";
                node_status:_FillValue = '\377';
        char site_name(siteNum, maxNameLength);
                site_name:long_name = "Site Hostname Index";

        :title = "Latency definition";

// general notes:
// Files will contain one month's worth of data for a single feed.  It will be
// stored in a directory indicating the year.  A configuration file will be
// kept in ~ldm/etc/stats.conf.  This will provide current relay/leaf status
// for each host site and will be used when first adding a site to the data
// file.
}
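The notes above, together with the write_ncfile routine in the attached script, describe how a latency report is located: one file per feed and month, named <feed><yyyymm>.nc inside a directory for the year, with the day and hour taken from the yyyymmddhh bin and the day shifted to a zero-based index. The plain-Perl sketch below restates only that mapping. nc_location is a hypothetical name, the feed and bin values are made up, and the sketch mirrors the script's string handling (including dropping the first | in the feed name) without opening any files.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring write_ncfile: map a feed name and a
# yyyymmddhh bin to the file path and the (day, hour) indices used for
# avg_latency(siteNum, day, hour). Returns nothing for a malformed bin.
sub nc_location {
    my ($latencydir, $feed, $bin) = @_;
    my ($year, $month, $day, $hour) =
        $bin =~ /^(\d{4})(\d{2})(\d{2})(\d{2})/ or return;
    (my $name = $feed) =~ s/\|//;          # drop the first '|', as the script does
    my $path = "$latencydir/$year/$name$year$month.nc";
    return ($path, $day - 1, $hour + 0);   # day index is zero-based
}

my ($path, $daynum, $hournum) =
    nc_location("/home/ftp/pub/idd/ldmstats", "HDS", "1999111205");
print "$path day=$daynum hour=$hournum\n";
```

Keeping the day dimension at 31 means every month fits the same CDL, at the cost of unused (fill-valued) slots in shorter months.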
