
Re: perl metar decoder -- parsing DIRmin/max wrong ? (fwd)




===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================

---------- Forwarded message ----------
Date: Thu, 04 Mar 2004 11:00:50 -0600
From: David Larson <address@hidden>
To: Robb Kambic <address@hidden>
Subject: Re: perl metar decoder --

I've looked into this problem, which I didn't know existed.

Your code is now:

   # Separate bulletins into reports
   if( /=\n/ ) {
       s#=\s+\n#=\n#g ;
       @reports = split( /=\n/ ) ;
   } else {
       #@reports = split ( /\n/ ) ;
       s#\n# #g ;
       next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
       $reports[ 0 ] = $_ ;
   }

But given your assumption that bulletins without an equals sign will not,
and cannot, contain multiple reports (which seems reasonable), there
really only needs to be one split, right?  If there is no equals sign,
the entire bulletin, now joined into one line, is placed into the first
report. This seems to be a slight simplification:

    # Separate bulletins into reports
    if( /=\n/ ) {
         s#=\s+\n#=\n#g ;
    } else {
         s#\n# #g ;
    }
    @reports = split( /=\n/ ) ;
    # ... snip ... (the next line is placed many lines further down)
    next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;

Also, it is an error to have multiple time specifications in any report,
right?  So the time check can be applied to every report, as I have done above.
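A minimal Perl sketch of the simplified splitting above, assuming the bulletin header lines have already been stripped. `split_bulletin()` is a hypothetical name used only for illustration; it applies the two-time-group check to every report, as suggested, rather than only to bulletins without an equals sign.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the single-split approach: normalise the
# "=" terminators if present, otherwise treat the bulletin as one report,
# then split once and drop reports that carry two time groups.
sub split_bulletin {
    my ($bulletin) = @_;    # header lines assumed already removed
    local $_ = $bulletin;
    if ( /=\n/ ) {
        s#=\s+\n#=\n#g;     # "=   \n" becomes "=\n"
    } else {
        s#\n# #g;           # no "=": join all lines into a single report
    }
    my @reports = split( /=\n/ );
    # Two time groups (e.g. 021745Z ... 021750Z) mean reports were run together.
    return grep { !/\d{4,6}Z.*\d{4,6}Z/ } @reports;
}

my @reports = split_bulletin("K1 021745Z 12005KT=\nK2 021750Z 00000KT= \n");
print scalar(@reports), "\n";    # 2
```

Because `.` does not match a newline, a report that still spans several lines (the "=" case) is only rejected if both time groups sit on the same line.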

You asked for my comments, and well, there you have them!  :-)  I might
take a closer look at the rest of the changes as well, but that will be
delayed a bit.

I sure appreciate your quick responses to all correspondence.

Dave

Robb Kambic wrote:

>David,
>
>Yes, I know about the problem. The problem exists in bulletins that don't
>use the = sign to separate reports. The solution is to assume that bulletins
>that don't use = contain only one report. I scanned many raw reports and this
>seems to be true, so I changed the code to:
>
><               @reports = split ( /\n/ ) ;
>---
>
>
>>              #@reports = split ( /\n/ ) ;
>>              s#\n# #g ;
>>              next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
>>              $reports[ 0 ] = $_;
>>
>>
>
>The new code is attached.  I'm also working on a newer version of the
>decoder; it's in the ftp decoders directory, i.e.:
>
>metar2nc.new and metar.cdl.new
>
>The pqact.conf entry needs to change \2:yy to \2:yyyy because it now uses
>the century too.  The CDL is different: it merges variables that differ only
>in units into one, i.e., wind speeds in knots, mph, and m/s are all stored
>as m/s.  It also stores all reports for a station in one record.  Take a
>look; I would appreciate any comments before it's released.
>
>Robb...
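Robb's note says the new CDL stores wind speeds reported in knots, mph, or m/s in a single m/s variable. The sketch below shows that kind of conversion, assuming a wind group of the form dddff[Ggg]UNIT; the helper names and the MPH suffix are assumptions for illustration and are not taken from metar2nc.new. The factors are the exact definitions 1 kt = 1852 m/h and 1 mph = 1609.344 m/h.

```perl
use strict;
use warnings;

# Assumed conversion table: exact definitions 1 kt = 1852 m/h, 1 mph = 1609.344 m/h.
my %TO_MPS = (
    KT  => 1852 / 3600,
    MPH => 1609.344 / 3600,
    MPS => 1,
);

# Hypothetical helper: convert a speed in the given unit to m/s.
sub wind_to_mps {
    my ( $speed, $unit ) = @_;
    my $factor = $TO_MPS{ uc $unit };
    die "unknown wind unit '$unit'\n" unless defined $factor;
    return $speed * $factor;
}

# Hypothetical helper: decode a wind group such as 12005KT or 27015G25MPS.
# Returns an empty list if the group does not look like a wind group.
sub parse_wind_group {
    my ($group) = @_;
    return unless $group =~ /^(\d{3}|VRB)(\d{2,3})(?:G(\d{2,3}))?(KT|MPS|MPH)$/;
    my ( $dir, $spd, $gust, $unit ) = ( $1, $2, $3, $4 );
    return (
        dir  => $dir,
        spd  => wind_to_mps( $spd, $unit ),
        gust => defined $gust ? wind_to_mps( $gust, $unit ) : undef,
    );
}

my %wind = parse_wind_group("12005KT");
printf "%s %.2f m/s\n", $wind{dir}, $wind{spd};    # 120 2.57 m/s
```

Keeping one table of factors means a new unit only needs one more entry rather than another netCDF variable.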
>
>
>On Tue, 2 Mar 2004, David Larson wrote:
>
>
>
>>Robb,
>>
>>I've been chasing down a problem that seems to cause perfectly good
>>reports to be discarded by the perl metar decoder.  There is a comment
>>in the 2.4.4 decoder that reads "reports appended together wrongly"; the
>>code in this area takes the first line as the report to process and
>>discards the next line.
>>
>>To walk through this, I'll refer to the following report:
>>
>>132
>>SAUS80 KWBC 021800 RRD
>>METAR
>>K4BL 021745Z 12005KT 3SM BR OVC008 01/M01 RMK SLP143 NOSPECI 60011
>>     8/2// T00061006 10011 21017 51007=
>>
>>The decoder attempts to classify the report type ($rep_type, line 257
>>of metar2nc); in doing so, it classifies this report as a "SPECI",
>>which isn't what you'd expect by visual inspection of the report.
>>However, perl is doing the right thing, given that it is asked to match
>>on #(METAR|SPECI) \d{4,6}Z?\n#, which matches "SPECI 60011\n" (part of
>>"NOSPECI 60011") in the remarks of the report.
>>
>>The solution is probably to bind the pattern to the start of the report
>>text with a caret.  It seems to work pretty well so far.
>>
>>I've changed the lines (257-263) in metar2nc-v2.4.4 from:
>>
>>    if( s#(METAR|SPECI) \d{4,6}Z?\n## ) {
>>        $rep_type = $1 ;
>>    } elsif( s#(METAR|SPECI)\s*\n## ) {
>>        $rep_type = $1 ;
>>    } else {
>>        $rep_type = "METAR" ;
>>    }
>>
>>To:
>>
>>    if( s#^(METAR|SPECI) \d{4,6}Z?\n## ) {
>>        $rep_type = $1 ;
>>    } elsif( s#^(METAR|SPECI)\s*\n## ) {
>>        $rep_type = $1 ;
>>    } else {
>>        $rep_type = "METAR" ;
>>    }
>>
>>I simply added the caret (^) to bind the pattern to the start of the report.
>>
>>Let me know what you think.
>>Dave
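To reproduce the misclassification outside the decoder, the sketch below runs both versions of the two report-type patterns against the sample bulletin body, assuming the header lines have already been removed as metar2nc does before this test. It only matches rather than substituting; `rep_type()` and `%patterns` are illustrative names. Without the /m modifier, ^ matches only at the very start of the string, so the anchored patterns can see only the leading METAR line.

```perl
use strict;
use warnings;

# Sample bulletin body from the message above, header lines already stripped.
my $body = "METAR\n"
         . "K4BL 021745Z 12005KT 3SM BR OVC008 01/M01 RMK SLP143 NOSPECI 60011\n"
         . "     8/2// T00061006 10011 21017 51007=\n";

# The two patterns tried in order, without and with the caret.
my %patterns = (
    unanchored => [ qr/(METAR|SPECI) \d{4,6}Z?\n/,  qr/(METAR|SPECI)\s*\n/ ],
    anchored   => [ qr/^(METAR|SPECI) \d{4,6}Z?\n/, qr/^(METAR|SPECI)\s*\n/ ],
);

# Return the first captured report type, defaulting to METAR like metar2nc.
sub rep_type {
    my ( $text, $pats ) = @_;
    for my $re (@$pats) {
        return $1 if $text =~ $re;
    }
    return "METAR";
}

print rep_type( $body, $patterns{unanchored} ), "\n";    # SPECI
print rep_type( $body, $patterns{anchored} ), "\n";      # METAR
```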
>>
>>
>>
>
>===============================================================================
>Robb Kambic                               Unidata Program Center
>Software Engineer III                     Univ. Corp for Atmospheric Research
>address@hidden            WWW: http://www.unidata.ucar.edu/
>===============================================================================
>
>------------------------------------------------------------------------
>
>#! /usr/local/bin/perl
>#
>#  usage: metar2nc cdlfile [datadir] [yymm] < ncfile
>#
>#
>#chdir( "/home/rkambic/code/decoders/src/metar" ) ;
>
>use NetCDF ;
>use Time::Local ;
># process command line switches
>while ($_ = $ARGV[0], /^-/) {
>       shift;
>       last if /^--$/;
>       /^(-v)/ && $verbose++;
>}
># process input parameters
>if( $#ARGV == 0 ) {
>       $cdlfile = $ARGV[ 0 ] ;
>} elsif( $#ARGV == 1 ) {
>       $cdlfile = $ARGV[ 0 ] ;
>       if( $ARGV[ 1 ] =~ /^\d/ ) {
>               $yymm = $ARGV[ 1 ] ;
>       } else {
>               $datadir = $ARGV[ 1 ] ;
>       }
>} elsif( $#ARGV == 2 ) {
>       $cdlfile = $ARGV[ 0 ] ;
>       $datadir = $ARGV[ 1 ] ;
>       $yymm = $ARGV[ 2 ] ;
>} else {
>       die "usage: metar2nc cdlfile [datadir] [yymm] < ncfile\n" ;
>}
>print "Missing cdlfile file $cdlfile: $!\n" unless  -e $cdlfile ;
>
>if( -e "util/ncgen" ) {
>       $ncgen = "util/ncgen" ;
>} elsif( -e "/usr/local/ldm/util/ncgen" ) {
>       $ncgen = "/usr/local/ldm/util/ncgen" ;
>} elsif( -e "/upc/netcdf/bin/ncgen" ) {
>       $ncgen = "/upc/netcdf/bin/ncgen" ;
>} elsif( -e "./ncgen" ) {
>       $ncgen = "./ncgen" ;
>} else {
>       open( NCGEN, "which ncgen |" ) ;
>       $ncgen = <NCGEN> ;
>       close( NCGEN ) ;
>
>       # some versions of 'which' print "no ncgen in ...", others print nothing
>       if( ! defined( $ncgen ) || $ncgen =~ /no ncgen/ ) {
>               die "Can't find NetCDF utility 'ncgen' in PATH, util/ncgen, /usr/local/ldm/util/ncgen, /upc/netcdf/bin/ncgen, or ./ncgen\n" ;
>       } else {
>               $ncgen = "ncgen" ;
>       }
>}
># the data and the metadata directories
>$datadir = "." if( ! $datadir ) ;
>$metadir = $datadir . "/../metadata/surface/metar" ;
># redirect STDOUT and STDERR
>open( STDOUT, ">$datadir/metarLog.$$.log" ) ||
>               die "could not open $datadir/metarLog.$$.log: $!\n" ;
>open( STDERR, ">&STDOUT" ) ||
>               die "could not dup stdout: $!\n" ;
>select( STDERR ) ; $| = 1 ;
>select( STDOUT ) ; $| = 1 ;
>
>die "Missing cdlfile file $cdlfile: $!\n" unless  -e $cdlfile ;
>
># year and month
>if( ! $yymm ) {
>       $theyear = (gmtime())[ 5 ] ;
>       $theyear = ( $theyear < 100 ? $theyear : $theyear - 100 ) ;
>       $theyear = sprintf( "%02d", $theyear ) ;
>       $themonth = (gmtime())[ 4 ] ;
>       $themonth++ ;
>       $yymm = $theyear . sprintf( "%02d", $themonth ) ;
>} else {
>       $theyear = substr( $yymm, 0, 2 ) ;
>       $themonth = substr( $yymm, 2 ) ;
>}
># file used for bad metars or prevention of overwrites to ncfiles
>open( OPN, ">>$datadir/rawmetars.$$.nc" ) ||
>               die "could not open $datadir/rawmetars.$$.nc: $!\n" ;
># set error handling to verbose only
>$result = NetCDF::opts( VERBOSE ) ;
>
># set interrupt handler
>$SIG{ 'INT' }  = 'atexit' ;
>$SIG{ 'KILL' }  = 'atexit' ;  # no effect: SIGKILL cannot be caught
>$SIG{ 'TERM' }  = 'atexit' ;
>$SIG{ 'QUIT' }  = 'atexit' ;
>
># set defaults
>
>$F = -99999 ;
>$A = \$F ;
>$S1 = "\0" ;
>$AS1 = \$S1 ;
>$S2 = "\0\0" ;
>$AS2 = \$S2 ;
>$S3 = "\0\0\0" ;
>$AS3 = \$S3 ;
>$S4 = "\0\0\0\0" ;
>$AS4 = \$S4 ;
>$S8 = "\0" x 8 ;
>$AS8 = \$S8 ;
>$S10 = "\0" x 10 ;
>$AS10 = \$S10 ;
>$S15 = "\0" x 15 ;
>$AS15 = \$S15 ;
>$S32 = "\0" x 32 ;
>$AS32 = \$S32 ;
>$S128 = "\0" x 128 ;
>$AS128 = \$S128 ;
>
>%CDL = (
>"rep_type", 0, "stn_name", 1, "wmo_id", 2, "lat", 3, "lon", 4, "elev", 5,
>"ob_hour", 6, "ob_min", 7, "ob_day", 8, "time_obs", 9, "time_nominal", 10,
>"AUTO", 11, "UNITS", 12, "DIR", 13, "SPD", 14, "GUST", 15, "VRB", 16,
>"DIRmin", 17, "DIRmax", 18, "prevail_VIS_SM", 19, "prevail_VIS_KM", 20,
>"plus_VIS_SM", 21, "plus_VIS_KM", 22, "prevail_VIS_M", 23, "VIS_dir", 24,
>"CAVOK", 25, "RVRNO", 26, "RV_designator", 27, "RV_above_max", 28,
>"RV_below_min", 29, "RV_vrbl", 30, "RV_min", 31, "RV_max", 32,
>"RV_visRange", 33, "WX", 34, "vert_VIS", 35, "cloud_type", 36, "cloud_hgt", 37,
>"cloud_meters", 38, "cloud_phenom", 39, "T", 40, "TD", 41,
>"hectoPasc_ALTIM", 42, "inches_ALTIM", 43, "NOSIG", 44, "TornadicType", 45,
>"TornadicLOC", 46, "TornadicDIR", 47, "BTornadic_hh", 48, "BTornadic_mm", 49,
>"ETornadic_hh", 50, "ETornadic_mm", 51, "AUTOindicator", 52,
>"PKWND_dir", 53, "PKWND_spd", 54, "PKWND_hh", 55, "PKWND_mm", 56,
>"WshfTime_hh", 57, "WshfTime_mm", 58, "Wshft_FROPA", 59, "VIS_TWR", 60,
>"VIS_SFC", 61, "VISmin", 62, "VISmax", 63, "VIS_2ndSite", 64,
>"VIS_2ndSite_LOC", 65, "LTG_OCNL", 66, "LTG_FRQ", 67, "LTG_CNS", 68,
>"LTG_CG", 69, "LTG_IC", 70, "LTG_CC", 71, "LTG_CA", 72, "LTG_DSNT", 73,
>"LTG_AP", 74, "LTG_VcyStn", 75, "LTG_DIR", 76, "Recent_WX", 77,
>"Recent_WX_Bhh", 78, "Recent_WX_Bmm", 79, "Recent_WX_Ehh", 80,
>"Recent_WX_Emm", 81, "Ceiling_min", 82, "Ceiling_max", 83,
>"CIG_2ndSite_meters", 84, "CIG_2ndSite_LOC", 85, "PRESFR", 86, "PRESRR", 87,
>"SLPNO", 88, "SLP", 89, "SectorVIS_DIR", 90, "SectorVIS", 91, "GR", 92,
>"GRsize", 93, "VIRGA", 94, "VIRGAdir", 95, "SfcObscuration", 96,
>"OctsSkyObscured", 97, "CIGNO", 98, "Ceiling_est", 99, "Ceiling", 100,
>"VrbSkyBelow", 101, "VrbSkyLayerHgt", 102, "VrbSkyAbove", 103,
>"Sign_cloud", 104, "Sign_dist", 105, "Sign_dir", 106, "ObscurAloft", 107,
>"ObscurAloftSkyCond", 108, "ObscurAloftHgt", 109, "ACFTMSHP", 110,
>"NOSPECI", 111, "FIRST", 112, "LAST", 113, "Cloud_low", 114,
>"Cloud_medium", 115, "Cloud_high", 116, "SNINCR", 117, "SNINCR_TotalDepth", 118,
>"SN_depth", 119, "SN_waterequiv", 120, "SunSensorOut", 121, "SunShineDur", 122,
>"PRECIP_hourly", 123, "PRECIP_amt", 124, "PRECIP_24_amt", 125, "T_tenths", 126,
>"TD_tenths", 127, "Tmax", 128, "Tmin", 129, "Tmax24", 130, "Tmin24", 131,
>"char_Ptend", 132, "Ptend", 133, "PWINO", 134, "FZRANO", 135, "TSNO", 136,
>"PNO", 137, "maintIndicator", 138, "PlainText", 139, "report", 140,
>"remarks", 141 ) ;
>
># default netCDF record structure, contains all vars for the METAR reports
>@defaultrec = ( $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS3, $A, $A,
> $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS2, $A, $A, [( $S3, $S3, $S3, $S3 )],
> [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], [( $F, $F, $F, $F )],
> [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], $AS32, $A,
> [( $S4, $S4, $S4, $S4, $S4, $S4 )], [( $F, $F, $F, $F, $F, $F )],
> [( $F, $F, $F, $F, $F, $F )], [( $S4, $S4, $S4, $S4, $S4, $S4 )],
> $A, $A, $A, $A, $A, $AS15, $AS10, $AS2, $A, $A, $A, $A, $AS4, $A, $A,
> $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS10, $A, $A, $A, $A, $A,
> $A, $A, $A, $A, $A, $AS2, [( $S8, $S8, $S8 )], [( $F, $F, $F )],
> [( $F, $F, $F )], [( $F, $F, $F )], [( $F, $F, $F )], $A, $A, $A, $A,
> $A, $A, $A, $A, $AS2, $A, $A, $A, $A, $AS2, $AS8, $A, $A, $A, $A,
> $AS3, $A, $AS3, $AS10, $AS10, $AS10, $AS8, $AS3, $A, $A, $A, $A, $A,
> $AS1, $AS1, $AS1, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A,
> $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS128, $AS128, $AS128 ) ;
>
># dual-purpose array: if an entry is > 0, the var is requested and its value
># is the position in the record, except for the first entry
>@W = ( 0 ) x ( $#defaultrec +1 ) ;
>$W[ 0 ] = -1 ;
>
># open cdl and create record structure according to variables
>open( CDL, "$cdlfile" ) || die "could not open $cdlfile: $!\n" ;
>$i = 0 ;
>while( <CDL> ) {
>       if( s#^\s*(char|int|long|double|float) (\w{1,25})## ) {
>               ( $number ) = $CDL{ $2 } ;
>               push( @rec, $defaultrec[ $number ] ) ;
>               $W[ $number ] = $i++ ;
>       }
>}
>close CDL ;
>undef( @defaultrec ) ;
>undef( %CDL ) ;
>
># read in station data
>if( -e "etc/sfmetar_sa.tbl" ) {
>       $sfile = "etc/sfmetar_sa.tbl" ;
>} elsif( -e "./sfmetar_sa.tbl" ) {
>       $sfile = "./sfmetar_sa.tbl" ;
>} else {
>       die "Can't find sfmetar_sa.tbl station file.: $!\n" ;
>}
>open( STATION, "$sfile" ) || die "could not open $sfile: $!\n" ;
>
>while( <STATION> ) {
>       s#^(\w{3,6})?\s+(\d{4,5}).{40}## ;
>       $id = $1 ;
>       $wmo_id = $2 ;
>       $wmo_id = "0" . $wmo_id if( length( $wmo_id ) == 4 ) ;
>       ( $lat, $lon, $elev ) = split ;
>       $lat = sprintf( "%7.2f", $lat / 100 ) ;
>       $lon = sprintf( "%7.2f", $lon / 100) ;
>
>       # set these vars ( $wmo_id, $lat, $lon, $elev )
>       $STATIONS{ "$id" } = "$wmo_id $lat $lon $elev" ;
>}
>close STATION ;
>
># read in list of already processed reports if it exists
># open metar.lst, list of reports processed in the last 4 hours.
>if( -e "$datadir/metar.lst" ) {
>       open( LST, "$datadir/metar.lst" ) ||
>               die "could not open $datadir/metar.lst: $!\n" ;
>       while( <LST> ) {
>               ( $stn, $rtptime, $hr ) = split ;
>               $reportslist{ "$stn $rtptime" }  = $hr ;
>       }
>       close LST ;
>       #unlink( "$datadir/metar.lst" ) ;
>}
># Now begin parsing the input and decoding observations; each bulletin ends in Control-C
>$/ = "\cC" ;
>
># set select processing here from STDIN
>START:
>while( 1 ) {
>       open( STDIN, '-' ) ;
>       vec($rin,fileno(STDIN),1) = 1;
>       $timeout = 1200 ; # 20 minutes
>       $nfound = select( $rout = $rin, undef, undef, $timeout );
>       # timed out
>       if( ! $nfound ) {
>               print "Shut down, time out 20 minutes\n" ;
>               &atexit() ;
>       }
>       &atexit( "eof" ) if( eof( STDIN ) ) ;
>
>       # Process each line of metar bulletins, header first
>       $_ = <STDIN> ;
>       #next unless /METAR|SPECI/ ;
>       s#\cC## ;
>       s#\cM##g ;
>       s#\cA\n## ;
>       s#\c^##g ;
>
>       s#\d\d\d \n## ;
>       s#\w{4}\d{1,2} \w{4} (\d{2})(\d{2})(\d{2})?.*\n## ;
>       $tday = $1 ;
>       $thour = $2 ;
>       $thour = "23" if( $thour eq "24" ) ;
>       $tmin = $3 ;
>       $tmin = "00" unless( $tmin ) ;
>       next unless ( $tday && defined( $thour ) ) ;
>       $time_trans = thetime( "trans" ) ;
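>       # anchor to the start of the text so remarks such as "NOSPECI 60011"
>       # are not mistaken for the report-type line
>       if( s#^(METAR|SPECI) \d{4,6}Z?\n## ) {
>               $rep_type = $1 ;
>       } elsif( s#^(METAR|SPECI)\s*\n## ) {
>               $rep_type = $1 ;
>       } else {
>               $rep_type = "METAR" ;
>       }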
>       # Separate bulletins into reports
>       if( /=\n/ ) {
>               s#=\s+\n#=\n#g ;
>               @reports = split( /=\n/ ) ;
>       } else {
>               #@reports = split ( /\n/ ) ;
>               s#\n# #g ;
>               next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
>               $reports[ 0 ] = $_ ;
>       }
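The attached script reads one bulletin per `<STDIN>` by setting `$/` to Control-C, then strips the framing and parses the WMO heading before classifying the report. A minimal sketch of just that step follows, assuming a bulletin framed as SOH, CR CR LF line endings, a three-digit sequence number, the heading, and a closing ETX; `next_bulletin()` is a hypothetical helper, and the `select()` timeout loop is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: read one Control-C (ETX) terminated bulletin from $fh,
# strip the framing the same way metar2nc does, and parse the WMO heading.
# Returns a hash ref, or undef at end of input or when no heading is found.
sub next_bulletin {
    my ($fh) = @_;
    local $/ = "\cC";                      # one readline returns one bulletin
    my $text = <$fh>;
    return undef unless defined $text;
    $text =~ s#\cC##;                      # ETX terminator
    $text =~ s#\cM##g;                     # carriage returns
    $text =~ s#\cA\n##;                    # SOH line
    $text =~ s#\c^##g;                     # stray RS (0x1E) characters
    $text =~ s#\d\d\d \n##;                # 3-digit sequence number line
    # WMO heading, e.g. "SAUS80 KWBC 021800 RRD": day, hour, optional minute
    return undef
        unless $text =~ s#\w{4}\d{1,2} \w{4} (\d{2})(\d{2})(\d{2})?.*\n##;
    return {
        day  => $1,
        hour => $2,
        min  => defined $3 ? $3 : "00",
        body => $text,
    };
}
```

An in-memory filehandle (`open my $fh, '<', \$raw`) is a convenient way to feed it test bulletins without a live LDM feed.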
>
>