Re: perl metar decoder -- parsing DIRmin/max wrong ? (fwd)

NOTE: The decoders mailing list is no longer active. The list archives are made available for historical reasons.

To: bit-bucket <bit-bucket@xxxxxxxxxxxxxxxx>
Subject: Re: perl metar decoder -- parsing DIRmin/max wrong ? (fwd)
From: Robb Kambic <rkambic@xxxxxxxxxxxxxxxx>
Date: Thu, 4 Mar 2004 14:40:05 -0700 (MST)



==============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
rkambic@xxxxxxxxxxxxxxxx                   WWW: http://www.unidata.ucar.edu/
==============================================================================

---------- Forwarded message ----------
Date: Thu, 04 Mar 2004 11:00:50 -0600
From: David Larson <davidl@xxxxxxxxxxxxxxxxxx>
To: Robb Kambic <rkambic@xxxxxxxxxxxxxxxx>
Subject: Re: perl metar decoder --

I've looked into this problem, which I didn't know existed.

Your code is now:

  # Separate bulletins into reports
  if( /=\n/ ) {
      s#=\s+\n#=\n#g ;
      @reports = split( /=\n/ ) ;
  } else {
      #@reports = split ( /\n/ ) ;
      s#\n# #g ;
      next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
      $reports[ 0 ] = $_ ;
  }

But given your assumption that a bulletin without "=" separators contains
only one report (which seems reasonable), only one split is really
needed, right?  If there is no equals sign, split( /=\n/ ) finds nothing
to split on, so the entire line is placed into the first report. This
seems to be a slight simplification:

   # Separate bulletins into reports
   if( /=\n/ ) {
        s#=\s+\n#=\n#g ;
   } else {
        s#\n# #g ;
   }
   @reports = split( /=\n/ ) ;
   ... snip ... the next line is placed down many lines
   next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;

Also, it is an error to have multiple time specifications in any report,
right?  So that can be generalized as well, as I have done above.
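
For reference, here is a minimal sketch of the simplified logic as a
standalone function. split_bulletin() is a hypothetical helper, not part
of metar2nc, and it assumes the time-group check is applied to each
report after the split rather than to the whole bulletin.

```perl
use strict;
use warnings;

# split_bulletin() is a hypothetical helper, not part of metar2nc.
# It returns the list of usable reports found in one bulletin body.
sub split_bulletin {
    my ($body) = @_;

    if ( $body =~ /=\n/ ) {
        # Normalise "=   \n" to "=\n" so the split sees a single separator.
        $body =~ s#=\s+\n#=\n#g;
    } else {
        # No "=" separators: assume one report and join it onto one line.
        $body =~ s#\n# #g;
    }

    # One split serves both branches; without "=" it yields one element.
    my @reports = split( /=\n/, $body );

    # Drop any report that carries two time groups on the same line,
    # which suggests two reports were appended together.
    return grep { !/\d{4,6}Z.*\d{4,6}Z/ } @reports;
}

my @one = split_bulletin("K4BL 021745Z 12005KT 3SM BR\n    8/2// T00061006\n");
print scalar(@one), " report(s) without '=' separators\n";
```

With this arrangement a bulletin that has no "=" and contains two time
groups yields an empty list instead of skipping via next.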

You asked for my comments, and well, there you have them!  :-)  I might
take a closer look at the rest of the changes as well, but that will be
delayed a bit.

I sure appreciate your quick responses to all correspondence.

Dave

Robb Kambic wrote:

David,

Yes, I know about the problem. The problem exists in bulletins that don't
use the = sign to separate reports. The solution is to assume that bulletins
that don't use = only have one report. I scanned many raw reports and this
seems to be true, so I changed the code to:

<               @reports = split ( /\n/ ) ;
---

             #@reports = split ( /\n/ ) ;
             s#\n# #g ;
             next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
             $reports[ 0 ] = $_;


The new code is attached.  I'm also working on a newer version of the
decoder; it's in the ftp decoders directory, i.e.

metar2nc.new and metar.cdl.new

The pqact.conf entry needs to change \2:yy to \2:yyyy because it now uses
the century too.  The cdl is different: it merges vars that have different
units into one, i.e. winds in knots, mph, and m/s are all stored as winds
in m/s.  It also stores all reports per station in one record.  Take a
look; I would appreciate any comments before it's released.
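
As an illustration of merging the wind units, here is a minimal sketch
of the conversion to m/s. The helper name wind_to_mps() and the unit
codes KT, MPH, and MPS are assumptions made for this example; they are
not taken from metar2nc.new.

```perl
use strict;
use warnings;

# Conversion factors to metres per second.
# The unit codes used as keys are assumptions for this sketch.
my %TO_MPS = (
    KT  => 1852 / 3600,        # 1 knot = 1852 m per hour
    MPH => 1609.344 / 3600,    # 1 statute mile = 1609.344 m
    MPS => 1,
);

# wind_to_mps() is a hypothetical helper, not part of the decoder.
# It returns the speed in m/s, or nothing for an unknown unit code.
sub wind_to_mps {
    my ( $speed, $units ) = @_;
    return unless exists $TO_MPS{$units};
    return $speed * $TO_MPS{$units};
}

printf "%.2f m/s\n", wind_to_mps( 5, "KT" );
```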

Robb...


On Tue, 2 Mar 2004, David Larson wrote:

Robb,

I've been chasing down a problem that seems to cause perfectly good
reports to be discarded by the perl metar decoder.  There is a comment
in the 2.4.4 decoder that reads "reports appended together wrongly"; the
code in this area takes the first line as the report to process and
discards the next line.

To walk through this, I'll refer to the following report:

132
SAUS80 KWBC 021800 RRD
METAR
K4BL 021745Z 12005KT 3SM BR OVC008 01/M01 RMK SLP143 NOSPECI 60011
    8/2// T00061006 10011 21017 51007

The decoder attempts to classify the report type ($rep_type on line 257
of metar2nc). In doing so, it classifies this report as a "SPECI",
which isn't what you'd expect from visual inspection of the report.
However, perl is doing the right thing, given that it is asked to match
on #(METAR|SPECI) \d{4,6}Z?\n#: the "NOSPECI 60011" that ends the first
line of the report, in the remarks, matches that pattern.

The solution is probably to anchor the pattern to the start of the text
with a caret.  Seems to work pretty well so far.

I've changed the lines (257-263) in metar2nc-v2.4.4 from:

   if( s#(METAR|SPECI) \d{4,6}Z?\n## ) {
       $rep_type = $1 ;
   } elsif( s#(METAR|SPECI)\s*\n## ) {
       $rep_type = $1 ;
   } else {
       $rep_type = "METAR" ;
   }

To:

   if( s#^(METAR|SPECI) \d{4,6}Z?\n## ) {
       $rep_type = $1 ;
   } elsif( s#^(METAR|SPECI)\s*\n## ) {
       $rep_type = $1 ;
   } else {
       $rep_type = "METAR" ;
   }

I simply added the caret (^) to bind the pattern to the start of the report.
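
A minimal sketch that reproduces the misclassification on the K4BL
report above and shows the effect of the caret. classify() is a
hypothetical wrapper written for this example; the bulletin text is
assumed to already have its header lines stripped.

```perl
use strict;
use warnings;

# Bulletin body after the header lines have been stripped.
my $bulletin =
      "METAR\n"
    . "K4BL 021745Z 12005KT 3SM BR OVC008 01/M01 RMK SLP143 NOSPECI 60011\n"
    . "    8/2// T00061006 10011 21017 51007\n";

# classify() is a hypothetical wrapper around the if/elsif/else above.
# It returns the report type and the text left after the substitution.
sub classify {
    my ( $text, $anchored ) = @_;

    my $with_time = $anchored ? qr/^(METAR|SPECI) \d{4,6}Z?\n/
                              : qr/(METAR|SPECI) \d{4,6}Z?\n/;
    my $bare      = $anchored ? qr/^(METAR|SPECI)\s*\n/
                              : qr/(METAR|SPECI)\s*\n/;

    my $type = "METAR";    # default when neither pattern matches
    if ( $text =~ s/$with_time// ) {
        $type = $1;
    } elsif ( $text =~ s/$bare// ) {
        $type = $1;
    }
    return ( $type, $text );
}

for my $anchored ( 0, 1 ) {
    my ( $type, $rest ) = classify( $bulletin, $anchored );
    printf "%-10s rep_type=%s\n",
        ( $anchored ? "anchored" : "unanchored" ), $type;
}
```

Without the caret the first pattern matches "SPECI 60011\n" inside
NOSPECI, so the type comes back as SPECI and that text is cut out of the
report. With the caret the first pattern fails and the second one
removes the leading "METAR\n" line, leaving the remarks intact.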

Let me know what you think.
Dave



------------------------------------------------------------------------

#! /usr/local/bin/perl
#
#  usage: metar2nc cdlfile [datadir] [yymm] < ncfile
#
#
#chdir( "/home/rkambic/code/decoders/src/metar" ) ;

use NetCDF ;
use Time::Local ;
# process command line switches
while ($_ = $ARGV[0], /^-/) {
         shift;
      last if /^--$/;
             /^(-v)/ && $verbose++;
}
# process input parameters
if( $#ARGV == 0 ) {
        $cdlfile = $ARGV[ 0 ] ;
} elsif( $#ARGV == 1 ) {
        $cdlfile = $ARGV[ 0 ] ;
        if( $ARGV[ 1 ] =~ /^\d/ ) {
                $yymm = $ARGV[ 1 ] ;
        } else {
                $datadir = $ARGV[ 1 ] ;
        }
} elsif( $#ARGV == 2 ) {
        $cdlfile = $ARGV[ 0 ] ;
        $datadir = $ARGV[ 1 ] ;
        $yymm = $ARGV[ 2 ] ;
} else {
        die "usage: metar2nc cdlfile [datadir] [yymm] < ncfile $!\n" ;
}
print "Missing cdlfile file $cdlfile: $!\n" unless  -e $cdlfile ;

if( -e "util/ncgen" ) {
        $ncgen = "util/ncgen" ;
} elsif( -e "/usr/local/ldm/util/ncgen" ) {
        $ncgen = "/usr/local/ldm/util/ncgen" ;
} elsif( -e "/upc/netcdf/bin/ncgen" ) {
        $ncgen = "/upc/netcdf/bin/ncgen" ;
} elsif( -e "./ncgen" ) {
        $ncgen = "./ncgen" ;
} else {
        open( NCGEN, "which ncgen |" ) ;
        $ncgen = <NCGEN> ;
        close( NCGEN ) ;

        if( $ncgen =~ /no ncgen/ ) {
                die "Can't find NetCDF utility 'ncgen' in PATH, util/ncgen
/usr/local/ldm/util/ncgen, /upc/netcdf/bin/ncgen, or ./ncgen : $!\n" ;
        } else {
                $ncgen = "ncgen" ;
        }
}
# the data and the metadata directories
$datadir = "." if( ! $datadir ) ;
$metadir = $datadir . "/../metadata/surface/metar" ;
# redirect STDOUT and STDERR
open( STDOUT, ">$datadir/metarLog.$$.log" ) ||
                die "could not open $datadir/metarLog.$$.log: $!\n" ;
open( STDERR, ">&STDOUT" ) ||
                die "could not dup stdout: $!\n" ;
select( STDERR ) ; $| = 1 ;
select( STDOUT ) ; $| = 1 ;

die "Missing cdlfile file $cdlfile: $!\n" unless  -e $cdlfile ;

# year and month
if( ! $yymm ) {
        $theyear = (gmtime())[ 5 ] ;
        $theyear = ( $theyear < 100 ? $theyear : $theyear - 100 ) ;
        $theyear = sprintf( "%02d", $theyear ) ;
        $themonth = (gmtime())[ 4 ] ;
        $themonth++ ;
        $yymm = $theyear . sprintf( "%02d", $themonth ) ;
} else {
        $theyear = substr( $yymm, 0, 2 ) ;
        $themonth = substr( $yymm, 2 ) ;
}
# file used for bad metars or prevention of overwrites to ncfiles
open( OPN, ">>$datadir/rawmetars.$$.nc" ) ||
                die "could not open $datadir/rawmetars.$$.nc: $!\n" ;
# set error handling to verbose only
$result = NetCDF::opts( VERBOSE ) ;

# set interrupt handler
$SIG{ 'INT' }  = 'atexit' ;
$SIG{ 'KILL' }  = 'atexit' ;
$SIG{ 'TERM' }  = 'atexit' ;
$SIG{ 'QUIT' }  = 'atexit' ;

# set defaults

$F = -99999 ;
$A = \$F ;
$S1 = "\0" ;
$AS1 = \$S1 ;
$S2 = "\0\0" ;
$AS2 = \$S2 ;
$S3 = "\0\0\0" ;
$AS3 = \$S3 ;
$S4 = "\0\0\0\0" ;
$AS4 = \$S4 ;
$S8 = "\0" x 8 ;
$AS8 = \$S8 ;
$S10 = "\0" x 10 ;
$AS10 = \$S10 ;
$S15 = "\0" x 15 ;
$AS15 = \$S15 ;
$S32 = "\0" x 32 ;
$AS32 = \$S32 ;
$S128 = "\0" x 128 ;
$AS128 = \$S128 ;

%CDL = (
"rep_type", 0, "stn_name", 1, "wmo_id", 2, "lat", 3, "lon", 4, "elev", 5,
"ob_hour", 6, "ob_min", 7, "ob_day", 8, "time_obs", 9, "time_nominal", 10,
"AUTO", 11, "UNITS", 12, "DIR", 13, "SPD", 14, "GUST", 15, "VRB", 16,
"DIRmin", 17, "DIRmax", 18, "prevail_VIS_SM", 19, "prevail_VIS_KM", 20,
"plus_VIS_SM", 21, "plus_VIS_KM", 22, "prevail_VIS_M", 23, "VIS_dir", 24,
"CAVOK", 25, "RVRNO", 26, "RV_designator", 27, "RV_above_max", 28,
"RV_below_min", 29, "RV_vrbl", 30, "RV_min", 31, "RV_max", 32,
"RV_visRange", 33, "WX", 34, "vert_VIS", 35, "cloud_type", 36, "cloud_hgt", 37,
"cloud_meters", 38, "cloud_phenom", 39, "T", 40, "TD", 41,
"hectoPasc_ALTIM", 42, "inches_ALTIM", 43, "NOSIG", 44, "TornadicType", 45,
"TornadicLOC", 46, "TornadicDIR", 47, "BTornadic_hh", 48, "BTornadic_mm", 49,
"ETornadic_hh", 50, "ETornadic_mm", 51, "AUTOindicator", 52,
"PKWND_dir", 53, "PKWND_spd", 54, "PKWND_hh", 55, "PKWND_mm", 56,
"WshfTime_hh", 57, "WshfTime_mm", 58, "Wshft_FROPA", 59, "VIS_TWR", 60,
"VIS_SFC", 61, "VISmin", 62, "VISmax", 63, "VIS_2ndSite", 64,
"VIS_2ndSite_LOC", 65, "LTG_OCNL", 66, "LTG_FRQ", 67, "LTG_CNS", 68,
"LTG_CG", 69, "LTG_IC", 70, "LTG_CC", 71, "LTG_CA", 72, "LTG_DSNT", 73,
"LTG_AP", 74, "LTG_VcyStn", 75, "LTG_DIR", 76, "Recent_WX", 77,
"Recent_WX_Bhh", 78, "Recent_WX_Bmm", 79, "Recent_WX_Ehh", 80,
"Recent_WX_Emm", 81, "Ceiling_min", 82, "Ceiling_max", 83,
"CIG_2ndSite_meters", 84, "CIG_2ndSite_LOC", 85, "PRESFR", 86, "PRESRR", 87,
"SLPNO", 88, "SLP", 89, "SectorVIS_DIR", 90, "SectorVIS", 91, "GR", 92,
"GRsize", 93, "VIRGA", 94, "VIRGAdir", 95, "SfcObscuration", 96,
"OctsSkyObscured", 97, "CIGNO", 98, "Ceiling_est", 99, "Ceiling", 100,
"VrbSkyBelow", 101, "VrbSkyLayerHgt", 102, "VrbSkyAbove", 103,
"Sign_cloud", 104, "Sign_dist", 105, "Sign_dir", 106, "ObscurAloft", 107,
"ObscurAloftSkyCond", 108, "ObscurAloftHgt", 109, "ACFTMSHP", 110,
"NOSPECI", 111, "FIRST", 112, "LAST", 113, "Cloud_low", 114,
"Cloud_medium", 115, "Cloud_high", 116, "SNINCR", 117, "SNINCR_TotalDepth", 118,
"SN_depth", 119, "SN_waterequiv", 120, "SunSensorOut", 121, "SunShineDur", 122,
"PRECIP_hourly", 123, "PRECIP_amt", 124, "PRECIP_24_amt", 125, "T_tenths", 126,
"TD_tenths", 127, "Tmax", 128, "Tmin", 129, "Tmax24", 130, "Tmin24", 131,
"char_Ptend", 132, "Ptend", 133, "PWINO", 134, "FZRANO", 135, "TSNO", 136,
"PNO", 137, "maintIndicator", 138, "PlainText", 139, "report", 140,
"remarks", 141 ) ;

# default netCDF record structure, contains all vars for the METAR reports
@defaultrec = ( $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS3, $A, $A,
$A, $A, $A, $A, $A, $A, $A, $A, $A, $AS2, $A, $A, [( $S3, $S3, $S3, $S3 )],
[( $F, $F, $F, $F )], [( $F, $F, $F, $F )], [( $F, $F, $F, $F )],
[( $F, $F, $F, $F )], [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], $AS32, $A,
[( $S4, $S4, $S4, $S4, $S4, $S4 )], [( $F, $F, $F, $F, $F, $F )],
[( $F, $F, $F, $F, $F, $F )], [( $S4, $S4, $S4, $S4, $S4, $S4 )],
$A, $A, $A, $A, $A, $AS15, $AS10, $AS2, $A, $A, $A, $A, $AS4, $A, $A,
$A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS10, $A, $A, $A, $A, $A,
$A, $A, $A, $A, $A, $AS2, [( $S8, $S8, $S8 )], [( $F, $F, $F )],
[( $F, $F, $F )], [( $F, $F, $F )], [( $F, $F, $F )], $A, $A, $A, $A,
$A, $A, $A, $A, $AS2, $A, $A, $A, $A, $AS2, $AS8, $A, $A, $A, $A,
$AS3, $A, $AS3, $AS10, $AS10, $AS10, $AS8, $AS3, $A, $A, $A, $A, $A,
$AS1, $AS1, $AS1, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A,
$A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS128, $AS128, $AS128 ) ;

# two-fold purpose array: if entry > 0, then var is requested and its value
# is the position in the record, except first entry
@W = ( 0 ) x ( $#defaultrec +1 ) ;
$W[ 0 ] = -1 ;

# open cdl and create record structure according to variables
open( CDL, "$cdlfile" ) || die "could not open $cdlfile: $!\n" ;
$i = 0 ;
while( <CDL> ) {
        if( s#^\s*(char|int|long|double|float) (\w{1,25})## ) {
                ( $number ) = $CDL{ $2 } ;
                push( @rec, $defaultrec[ $number ] ) ;
                $W[ $number ] = $i++ ;
        }
}
close CDL ;
undef( @defaultrec ) ;
undef( %CDL ) ;

# read in station data
if( -e "etc/sfmetar_sa.tbl" ) {
        $sfile = "etc/sfmetar_sa.tbl" ;
} elsif( -e "./sfmetar_sa.tbl" ) {
        $sfile = "./sfmetar_sa.tbl" ;
} else {
        die "Can't find sfmetar_sa.tbl station file.: $!\n" ;
}
open( STATION, "$sfile" ) || die "could not open $sfile: $!\n" ;

while( <STATION> ) {
        s#^(\w{3,6})?\s+(\d{4,5}).{40}## ;
        $id = $1 ;
        $wmo_id = $2 ;
        $wmo_id = "0" . $wmo_id if( length( $wmo_id ) == 4 ) ;
        ( $lat, $lon, $elev ) = split ;
        $lat = sprintf( "%7.2f", $lat / 100 ) ;
        $lon = sprintf( "%7.2f", $lon / 100) ;

        # set these vars ( $wmo_id, $lat, $lon, $elev )
        $STATIONS{ "$id" } = "$wmo_id $lat $lon $elev" ;
}
close STATION ;

# read in list of already processed reports if it exists
# open metar.lst, list of reports processed in the last 4 hours.
if( -e "$datadir/metar.lst" ) {
        open( LST, "$datadir/metar.lst" ) ||
                die "could not open $datadir/metar.lst: $!\n" ;
        while( <LST> ) {
                ( $stn, $rtptime, $hr ) = split ;
                $reportslist{ "$stn $rtptime" }  = $hr ;
        }
        close LST ;
        #unlink( "$datadir/metar.lst" ) ;
}
# Now begin parsing file and decoding observations breaking on cntrl C
$/ = "\cC" ;

# set select processing here from STDIN
START:
while( 1 ) {
        open( STDIN, '-' ) ;
        vec($rin,fileno(STDIN),1) = 1;
        $timeout = 1200 ; # 20 minutes
        $nfound = select( $rout = $rin, undef, undef, $timeout );
        # timed out
        if( ! $nfound ) {
                print "Shut down, time out 20 minutes\n" ;
                &atexit() ;
        }
        &atexit( "eof" ) if( eof( STDIN ) ) ;

        # Process each line of metar bulletins, header first
        $_ = <STDIN> ;
        #next unless /METAR|SPECI/ ;
        s#\cC## ;
        s#\cM##g ;
        s#\cA\n## ;
        s#\c^##g ;

        s#\d\d\d \n## ;
        s#\w{4}\d{1,2} \w{4} (\d{2})(\d{2})(\d{2})?.*\n## ;
        $tday = $1 ;
        $thour = $2 ;
        $thour = "23" if( $thour eq "24" ) ;
        $tmin = $3 ;
        $tmin = "00" unless( $tmin ) ;
        next unless ( $tday && defined( $thour ) ) ;
        $time_trans = thetime( "trans" ) ;
        if( s#(METAR|SPECI) \d{4,6}Z?\n## ) {
                $rep_type = $1 ;
        } elsif( s#(METAR|SPECI)\s*\n## ) {
                $rep_type = $1 ;
        } else {
                $rep_type = "METAR" ;
        }
        # Separate bulletins into reports
        if( /=\n/ ) {
                s#=\s+\n#=\n#g ;
                @reports = split( /=\n/ ) ;
        } else {
                #@reports = split ( /\n/ ) ;
                s#\n# #g ;
                next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
                $reports[ 0 ] = $_ ;
        }
