Re: NetCDF & perl



Brian,

> To: address@hidden
> From: Brian Olsen <address@hidden>
> Subject: NetCDF & perl
> Organization: University of Utah
> Keywords: 200508162220.j7GMK7jo022999 netCDF Perl

The above message contained the following:

> I work for the University of Utah/MesoWest team.  We
> receive the file FSL.CompressedNetCDF.MADIS.mesonet2.*
> via the FSL2 distribution.  We use perl to process the
> data for our purposes.  I want to enable more variables
> than we currently do.  Some of these new variables are
> new data types with different dimensions.  This
> presents a problem since I'm pretty ignorant about how
> the perl module works.

The perl extension module for netCDF is, basically, a one-for-one
interface to the netCDF-2 C API.  Information on the netCDF-2 C API can
be found at

    http://www.unidata.ucar.edu/cgi-bin/man-cgi?netcdf2+3

> I'll give a bare-bones version of the script:
> 
> #!/usr/bin/perl
> use NetCDF;
> $infile = $ARGV[0]; # the mesonet2 file
> 
> $ncid = NetCDF::open($infile, NetCDF::NOWRITE);
> NetCDF::inquire($ncid,$ndims,$nvars,$natts,$recdim);
> NetCDF::diminq($ncid,$recdim,$dimname,$nrec);
> 
> for ($varcount=0;$varcount<$nvars;$varcount++)
> {
>   NetCDF::varinq($ncid,$varcount,$varname,$type,$dims,\@dimids,$atts);
> 
>   # READ #1 FOR 1D FLOAT AND DOUBLE  Variables
>   if($varname eq "temperature" || $varname eq "relHumidity" ||
>      $varname eq "altimeter" || $varname eq "windDir" ||
>      $varname eq "windSpeed" || $varname eq "windGust" ||
>      $varname eq "precipAccum" || $varname eq "solarRadiation" ||
>      $varname eq "observationTime" || $varname eq "stationPressure" ||
>      $varname eq "visibility" || $varname eq "soilTemperature" ||
>      $varname eq "fuelTemperature" || $varname eq "fuelMoisture" ||
>      $varname =~ /roadSubsurfaceTemp/ || $varname eq "latitude" ||
>      $varname eq "longitude" )

Rather than looping over all the netCDF variables and testing each
variable's name with "if" statements, I suggest accessing the variables
you need directly by name: use the NetCDF::varid function to get a
variable's ID and NetCDF::varinq to get its shape.
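Here is a minimal sketch of that approach. It assumes the module provides NetCDF::varid, which would be the counterpart of ncvarid() in the netCDF-2 C API that the module mirrors. It also assumes, as READ #2 already does, that varget returns character data as numeric codes. The subroutine names read_var and to_strings are hypothetical. Reading each variable in a single varget call, rather than one call per record, should also reduce the number of library calls.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire netCDF variable by name in one call.
# Returns a reference to the flat list of values (last dimension varying
# fastest) and a reference to the list of dimension sizes. Assumes the
# NetCDF module provides varid/varinq/diminq/varget as in the netCDF-2 API.
sub read_var {
    my ($ncid, $name) = @_;
    require NetCDF;    # loaded at run time

    # Assumed to return -1 when the name is unknown, as ncvarid() does.
    my $varid = NetCDF::varid($ncid, $name);
    die "no variable named '$name'\n" if $varid < 0;

    my ($vname, $type, $ndims, $natts);
    my @dimids;
    NetCDF::varinq($ncid, $varid, $vname, $type, $ndims, \@dimids, $natts);

    # Look up the size of each of the variable's dimensions.
    my @sizes;
    for my $dimid (@dimids) {
        my ($dimname, $size);
        NetCDF::diminq($ncid, $dimid, $dimname, $size);
        push @sizes, $size;
    }

    my @start = (0) x @sizes;    # origin of every dimension
    my @count = @sizes;          # every dimension in full
    my @values;
    NetCDF::varget($ncid, $varid, \@start, \@count, \@values);
    return (\@values, \@sizes);
}

# Hypothetical helper for 2-D character variables read in one call:
# split the flat list of character codes into one string per record,
# dropping the NUL padding as READ #2 does.
sub to_strings {
    my ($codes, $len) = @_;
    my @strings;
    for (my $i = 0; $i < @$codes; $i += $len) {
        my @row = @$codes[$i .. $i + $len - 1];
        push @strings, join '', map { chr($_) } grep { $_ != 0 } @row;
    }
    return @strings;
}
```

For example, my ($codes, $sizes) = read_var($ncid, "stationId") followed by my @stationId = to_strings($codes, $sizes->[1]) would take the place of the per-record loop in READ #2. For a 1-D variable such as "temperature", the returned list can be used directly as the per-observation array.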

>   {
>     NetCDF::diminq($ncid,$dimids[0],$dimname,$dimsize);
>     @start = (0);
>     @count = ($dimsize);
>     NetCDF::varget($ncid,$varcount,\@start,\@count,\@values);
> 
>     for ($val=0;$val<$dimsize;$val++)
>     {
>       $$varname[$val] = $values[$val];
>     }
>   }
> 
>   # READ #2 FOR 2D STRING VARIABLES
>   if($varname eq "stationName" || $varname eq "stationId" ||
>      $varname eq "dataProvider" || $varname eq "rawMessage" )
>   {
>     for ($recs=0;$recs<$nrec;$recs++)
>     {
>       NetCDF::diminq($ncid,$dimids[1],$dimname,$dimsize);
>       @start = ($recs,0);
>       @count = (1,$dimsize);
>       NetCDF::varget($ncid,$varcount,\@start,\@count,\@values);
>       $$varname[$recs] = "";
>       for ($val=0;$val<$dimsize;$val++)
>       {
>         if ($values[$val] != 0)
>         {
>           $$varname[$recs] .= chr( $values[$val]);
>         }
>       }
>     }
>   }
> 
>   # READ #3 FOR 2D SHORT INTEGERS
>   if($varname eq "precipType" || $varname eq "precipIntensity")
>   {
>     for ($recs=0;$recs<$nrec;$recs++)
>     {
>       NetCDF::diminq($ncid,$dimids[1],$dimname,$dimsize);
>       @start = ($recs,0);
>       @count = (1,$dimsize);
>       NetCDF::varget($ncid,$varcount,\@start,\@count,\@values);
>       $$varname[$recs] = $values[0];
>     }
>   }
> 
>   # READ #4 FOR 1D SHORT INTEGERS
>   if($varname eq "code1PST")
>   {
>     for ($recs=0;$recs<$nrec;$recs++)
>     {
>       NetCDF::diminq($ncid,$dimids[0],$dimname,$dimsize);
>       @start = ($recs);
>       @count = (2);
>       NetCDF::varget($ncid,$varcount,\@start,\@count,\@values);
>       $$varname[$recs] = $values[$recs];
>     }
>   }
> }
> 
> for ($recs=0;$recs<$nrec;$recs++)
> {
>   # do stuff
> }
> # END OF SCRIPT
> 
> So, you can see that the script is in two parts.  The first part
> loads the NetCDF data into massive arrays.  The name of each array
> is the name of the NetCDF variable being collected.  The second
> part iterates on $recs from 0 to the total number of observations,
> extracting the "recs'th" element of each array: thus, piecing
> together a single observation from all the arrays.  Anyway, the
> first part is what I'm concerned about.
> 
> I have no idea whether this is an efficient way to extract data.
> It seems that I need a different type of extraction for different
> variable types, hence the four "READ" blocks.  The first two READs
> were written before I inherited this script.  I was able to get
> READ #3 to work after much trial and error, although I have a
> feeling I'm possibly not catching all the data (2nd dimension?).
> I can't get the fourth READ to work at all.

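It could be that the shape of the "code1PST" variable (i.e., the number,
order, and size of its dimensions) doesn't match your assumptions.  Even
if it is a 1-D variable, READ #4 is inconsistent: "@count = (2)" requests
two values starting at record $recs, so varget returns only $values[0]
and $values[1].  "$values[$recs]" is therefore the wrong value for record
1 and undefined for every later record, and for the last record the
request runs past the end of the dimension.  For a 1-D variable,
"@count = (1)" together with "$values[0]" would be consistent.  Similarly,
READ #3 reads a whole row ("@count = (1,$dimsize)") but keeps only
"$values[0]", so the other values in each record are discarded.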
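One way to keep "@start" and "@count" in step with a variable's real shape is to build them from the dimension sizes that NetCDF::diminq reports for each entry of "@dimids", rather than hard-coding them. Here is a minimal sketch. The sub record_slab is hypothetical, and it assumes the record dimension comes first, as netCDF requires for record variables.

```perl
use strict;
use warnings;

# Hypothetical helper: given a record index and the sizes of a variable's
# dimensions (record dimension first), return references to the @start
# and @count vectors that select exactly that one record.
sub record_slab {
    my ($rec, @sizes) = @_;
    die "variable has no dimensions\n" unless @sizes;
    die "record $rec out of range\n" if $rec < 0 || $rec >= $sizes[0];

    my @start = ($rec, (0) x $#sizes);      # record $rec, origin elsewhere
    my @count = (1, @sizes[1 .. $#sizes]);  # one record, other dims in full
    return (\@start, \@count);
}

# A 2-D variable of 100 records by 3 values per record.
my ($start, $count) = record_slab(5, 100, 3);
print "start=(@$start) count=(@$count)\n";   # start=(5 0) count=(1 3)

# A 1-D variable: one value per record, returned as $values[0].
($start, $count) = record_slab(5, 100);
print "start=(@$start) count=(@$count)\n";   # start=(5) count=(1)
```

Passing the two returned references to NetCDF::varget as its start and count arguments should read exactly one record. That record's values then occupy "@values" from index 0 onward, whatever the variable's rank.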

> So, I basically need help understanding how the various functions
> work, and what the arguments "@start" and "@count" mean.

"@start" is a 1D vector of indices that specify the starting point for
data transfer, one index per dimension of the variable.  The indices are
0-based, so each one must be less than the size of the corresponding
dimension.

"@count" is a 1D vector of lengths that specify the number of points
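along each dimension for data transfer, one length per dimension.  For
every dimension, the start index plus the count must not exceed the size
of that dimension.  The values are returned in "@values" as a flat list
with the last dimension varying fastest.  For example, for a 2-D variable,
"@start = ($rec,0)" and "@count = (1,$len)" select the $len values of
record $rec.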
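To make those rules concrete, here is a pure-Perl illustration that involves no netCDF calls. The hypothetical sub hyperslab applies the same "@start"/"@count" checks to an in-memory 2-D array. It returns the selected values flattened with the last dimension varying fastest, which is the order in which a netCDF read delivers them.

```perl
use strict;
use warnings;

# Hypothetical illustration of netCDF hyperslab selection on an in-memory
# 2-D array (an array of array references). Returns the selected values
# flattened with the last dimension varying fastest.
sub hyperslab {
    my ($data, $start, $count) = @_;
    my @shape = (scalar @$data, scalar @{ $data->[0] });
    die "need one start index and one count per dimension\n"
        unless @$start == 2 && @$count == 2;
    for my $d (0, 1) {
        die "start/count out of range in dimension $d\n"
            if $start->[$d] < 0 || $count->[$d] < 0
            || $start->[$d] + $count->[$d] > $shape[$d];
    }
    my @values;
    for my $i ($start->[0] .. $start->[0] + $count->[0] - 1) {
        for my $j ($start->[1] .. $start->[1] + $count->[1] - 1) {
            push @values, $data->[$i][$j];
        }
    }
    return @values;
}

# A variable of 3 records by 4 values per record.
my @var = ([10, 11, 12, 13],
           [20, 21, 22, 23],
           [30, 31, 32, 33]);

print join(' ', hyperslab(\@var, [1, 0], [1, 4])), "\n";  # 20 21 22 23
print join(' ', hyperslab(\@var, [0, 2], [3, 2])), "\n";  # 12 13 22 23 32 33
```

The first call reads one whole record, as READ #2 does. The second reads the last two values of every record in a single request.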

I suggest studying the netCDF-2 API mentioned earlier.

You could also use the alternative perl module described in

    http://my.unidata.ucar.edu/cgi-bin/getfile?file=/content/support/help/MailArchives/netcdfgroup-list/msg00315.html

You could also write a netCDF-3 based C program to decode the data.

> --Brian Olsen
> MesoWest

Regards,
Steve Emmerson

> NOTE: All email exchanges with Unidata User Support are recorded in the
> Unidata inquiry tracking system and then made publicly available
> through the web.  If you do not want to have your interactions made
> available in this way, you must let us know in each email you send to us.