Re: perl metar decoder - multi-report bulletins and equal

NOTE: The decoders mailing list is no longer active. The list archives are made available for historical reasons.

David,

This is discouraging; I spent hours looking through raw bulletins to "try" to
make the decoder correct. I might try looking at the station ID before
processing, though I don't know if that will help.  My philosophy is that it's
better to disregard bulletins/reports than to enter "bad" data into a file.
That said, your example bulletin should be discarded. Ugh.  I will let you
know about my new ideas.
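
For what it's worth, here is a rough, untested sketch of the station ID check
I have in mind.  The %STATIONS hash is the one metar2nc already builds from
sfmetar_sa.tbl; the other names (@kept, @reports here) are just illustrative:

    # Hypothetical pre-check: keep only candidate reports whose first
    # token is a station ID found in sfmetar_sa.tbl.
    my @kept ;
    for my $rpt ( @reports ) {
        my ( $id ) = $rpt =~ /^\s*(\w{3,6})\b/ ;
        if( defined( $id ) && exists( $STATIONS{ $id } ) ) {
            push( @kept, $rpt ) ;
        } else {
            print "discarding report with unrecognized station ID\n" ;
        }
    }
    @reports = @kept ;

That would at least keep lines with unrecognized station IDs from ever
reaching the netCDF file.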

Robb...


On Thu, 4 Mar 2004, David Larson wrote:

I do see non-US bulletins that are split "badly" according to this code
change ...

628
SANK31 MNMG 041706
METAR
MNPC 041700Z 08012KT 7000 BKN016 29/26 Q1015
MNRS 041700Z 06010KT 9999 FEW022 BKN250 30/23 Q1012
MNJG 041700Z 36004KT 7000 VCRA BKN016 22/17 Q1015
MNJU 041700Z 10006KT 9999 SCT025 32/19 Q1013
MNCH 041700Z 02010KT 9999 SCT030 33/21 Q1012
MNMG 041700Z 07016KT 9999 SCT025 32/20 Q1011 A2988
MNBL 041700Z 10008KT 9999 SCT019 SCT070 29/25 Q1014

Not only is each line a new report, but what makes it worse is that the
*last* entry *is* separated by an equal sign!  Yuck.  Perhaps this is just a
junk bulletin?  I'm surprised that it could even go out this way.
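
If bulletins like this turn out to be common, one possible workaround (a
rough sketch only, not tested against the feed; split_bulletin is a made-up
name) would be to fall back to line-per-report splitting whenever a bulletin
has no equal signs but every line looks like "CCCC ddhhmmZ ...":

    # Sketch: split a bulletin into reports, treating each line as a
    # report when there is no "=" separator but every non-empty line
    # starts with an identifier and a ddhhmmZ time group.
    sub split_bulletin {
        my( $bull ) = @_ ;
        if( $bull =~ /=\n/ ) {
            $bull =~ s#=\s+\n#=\n#g ;
            return split( /=\n/, $bull ) ;
        }
        my @lines = grep { /\S/ } split( /\n/, $bull ) ;
        return @lines
            if( @lines > 1 && ! grep { ! /^\w{3,6}\s+\d{6}Z\b/ } @lines ) ;
        # otherwise assume a single report, as the current code does
        $bull =~ s#\n# #g ;
        return () if( $bull =~ /\d{4,6}Z.*\d{4,6}Z/ ) ;
        return ( $bull ) ;
    }

It would still not handle the mixed style of the MNMG bulletin above, where
only the last report carries an equal sign, so that one probably does have
to be treated as junk.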

Does anyone you know of even attempt to use the perl metar decoder
for non-US stations?  I've looked at it long enough to estimate the work
involved as a *lot*.

Dave

David Larson wrote:

> I've looked into this problem, which I didn't know existed.
>
> Your code is now:
>
>   # Separate bulletins into reports
>   if( /=\n/ ) {
>       s#=\s+\n#=\n#g ;
>       @reports = split( /=\n/ ) ;
>   } else {
>       #@reports = split ( /\n/ ) ;
>       s#\n# #g ;
>       next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
>       $reports[ 0 ] = $_ ;
>   }
>
> But based on your assumption that these bulletins will not, and
> cannot, contain multiple reports (which appears to be reasonable),
> there really only needs to be one split, right?  Because if there is
> no equal sign, the entire line will be placed into the first report.
> This seems to be a slight simplification:
>
>    # Separate bulletins into reports
>    if( /=\n/ ) {
>         s#=\s+\n#=\n#g ;
>    } else {
>         s#\n# #g ;
>    }
>    @reports = split( /=\n/ ) ;
>    ... snip ... (the following line is placed much further down in the code)
>    next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
>
> Also, it is an error to have multiple time specifications in any
> report, right?  So that can be generalized as well, as I have done above.
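>
> As a quick sanity check of that simplification, this stand-alone snippet
> (the sample bulletins are made up for the test) splits an equal-separated
> bulletin into two reports, leaves a single-report bulletin alone, and the
> generalized time check then weeds out anything with two time groups:
>
>    #!/usr/bin/perl
>    # stand-alone check of the simplified splitting; bulletins are fabricated
>    use strict ;
>    use warnings ;
>
>    for my $bulletin (
>        "KDEN 041753Z 09005KT 10SM FEW070 12/M03 A3012=\n" .
>        "KBOU 041755Z 10004KT 10SM CLR 13/M02 A3011=\n",
>        "K4BL 021745Z 12005KT 3SM BR OVC008 01/M01 RMK SLP143\n" ) {
>
>        local $_ = $bulletin ;
>        if( /=\n/ ) {
>            s#=\s+\n#=\n#g ;
>        } else {
>            s#\n# #g ;
>        }
>        my @reports = split( /=\n/ ) ;
>        # stand-in for the "next if" check, applied per report here
>        @reports = grep { ! /\d{4,6}Z.*\d{4,6}Z/ } @reports ;
>        print scalar( @reports ), " report(s)\n" ;
>    }
>
> which prints "2 report(s)" and then "1 report(s)".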
>
> You asked for my comments, and well, there you have them!  :-)  I
> might take a closer look at the rest of the changes as well, but that
> will be delayed a bit.
>
> I sure appreciate your quick responses to all correspondence.
>
> Dave
>
> Robb Kambic wrote:
>
>> David,
>>
>> Yes, I know about the problem. The problem exists in bulletins that don't
>> use the = sign to separate reports. The solution is to assume that bulletins
>> that don't use = only have one report. I scanned many raw reports and this
>> seems to be true, so I changed the code to:
>>
>> <               @reports = split ( /\n/ ) ;
>> ---
>>
>>
>>>              #@reports = split ( /\n/ ) ;
>>>              s#\n# #g ;
>>>              next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
>>>              $reports[ 0 ] = $_;
>>>
>>
>>
>> The new code is attached.  I'm also working on a newer version of the
>> decoder; it's in the ftp decoders directory, i.e.
>>
>> metar2nc.new and metar.cdl.new
>>
>> The pqact.conf entry needs to change \2:yy to \2:yyyy because it now uses
>> the century too.  The cdl is different: it merges vars that have different
>> units into one, e.g. wind knots, mph, and m/s are all stored as winds in
>> m/s.  It also stores all reports per station in one record.  Take a look;
>> I would appreciate any comments before it's released.
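>>
>> Roughly, the unit merging amounts to something like this (the conversion
>> factors are standard; the sub and variable names are illustrative, not
>> the actual names used in metar2nc.new):
>>
>>     # normalize a reported wind speed to m/s before it is stored
>>     # 1 knot = 0.514444 m/s, 1 mph = 0.44704 m/s
>>     sub wind_to_mps {
>>         my( $spd, $units ) = @_ ;      # $units is "KT", "MPH" or "MPS"
>>         return $spd * 0.514444 if( $units eq "KT" ) ;
>>         return $spd * 0.44704  if( $units eq "MPH" ) ;
>>         return $spd ;                  # already m/s
>>     }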
>>
>> Robb...
>>
>>
>> On Tue, 2 Mar 2004, David Larson wrote:
>>
>>
>>
>>> Robb,
>>>
>>> I've been chasing down a problem that seems to cause perfectly good
>>> reports to be discarded by the perl metar decoder.  There is a comment
>>> in the 2.4.4 decoder that reads "reports appended together wrongly";
>>> the code in this area takes the first line as the report to process and
>>> discards the next line.
>>>
>>> To walk through this, I'll refer to the following report:
>>>
>>> 132
>>> SAUS80 KWBC 021800 RRD
>>> METAR
>>> K4BL 021745Z 12005KT 3SM BR OVC008 01/M01 RMK SLP143 NOSPECI 60011
>>>     8/2// T00061006 10011 21017 51007
>>>
>>> The decoder attempts to classify the report type ($rep_type on line 257
>>> of metar2nc).  In doing so, it classifies this report as a "SPECI",
>>> which isn't what you'd expect from visual inspection of the report.
>>> However, perl is doing the right thing, given that it is asked to match
>>> on #(METAR|SPECI) \d{4,6}Z?\n#, which matches the "SPECI 60011" inside
>>> the NOSPECI remark.
>>>
>>> The solution is probably to bind the pattern to the start of the report
>>> with a caret.  It seems to work pretty well so far.
>>>
>>> I've changed the lines (257-263) in metar2nc-v2.4.4 from:
>>>
>>>    if( s#(METAR|SPECI) \d{4,6}Z?\n## ) {
>>>        $rep_type = $1 ;
>>>    } elsif( s#(METAR|SPECI)\s*\n## ) {
>>>        $rep_type = $1 ;
>>>    } else {
>>>        $rep_type = "METAR" ;
>>>    }
>>>
>>> To:
>>>
>>>    if( s#^(METAR|SPECI) \d{4,6}Z?\n## ) {
>>>        $rep_type = $1 ;
>>>    } elsif( s#^(METAR|SPECI)\s*\n## ) {
>>>        $rep_type = $1 ;
>>>    } else {
>>>        $rep_type = "METAR" ;
>>>    }
>>>
>>> I simply added the caret (^) to bind the pattern to the start of the
>>> report.
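>>>
>>> A small stand-alone demonstration of the difference, using the K4BL
>>> report from above (with its "METAR" header line still attached, which
>>> is what the decoder sees at that point):
>>>
>>>    #!/usr/bin/perl
>>>    use strict ;
>>>    use warnings ;
>>>
>>>    my $report =
>>>        "METAR\n" .
>>>        "K4BL 021745Z 12005KT 3SM BR OVC008 01/M01 RMK SLP143 NOSPECI 60011\n" .
>>>        "    8/2// T00061006 10011 21017 51007\n" ;
>>>
>>>    for my $caret ( "", "^" ) {
>>>        local $_ = $report ;
>>>        my $rep_type ;
>>>        if( s#$caret(METAR|SPECI) \d{4,6}Z?\n## ) {
>>>            $rep_type = $1 ;
>>>        } elsif( s#$caret(METAR|SPECI)\s*\n## ) {
>>>            $rep_type = $1 ;
>>>        } else {
>>>            $rep_type = "METAR" ;
>>>        }
>>>        print $caret ? "with caret:    " : "without caret: ", "$rep_type\n" ;
>>>    }
>>>
>>> Without the caret this prints SPECI (the "SPECI 60011" in the remarks is
>>> what matches); with the caret it prints METAR.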
>>>
>>> Let me know what you think.
>>> Dave
>>>
>>>
>>
>> ------------------------------------------------------------------------
>>
>> #! /usr/local/bin/perl
>> #
>> #  usage: metar2nc cdlfile [datadir] [yymm] < ncfile
>> #
>> #
>> #chdir( "/home/rkambic/code/decoders/src/metar" ) ;
>>
>> use NetCDF ;
>> use Time::Local ;
>> # process command line switches
>> while ($_ = $ARGV[0], /^-/) {
>>      shift;
>>       last if /^--$/;
>>          /^(-v)/ && $verbose++;
>> }
>> # process input parameters
>> if( $#ARGV == 0 ) {
>>     $cdlfile = $ARGV[ 0 ] ;
>> } elsif( $#ARGV == 1 ) {
>>     $cdlfile = $ARGV[ 0 ] ;
>>     if( $ARGV[ 1 ] =~ /^\d/ ) {
>>         $yymm = $ARGV[ 1 ] ;
>>     } else {
>>         $datadir = $ARGV[ 1 ] ;
>>     }
>> } elsif( $#ARGV == 2 ) {
>>     $cdlfile = $ARGV[ 0 ] ;
>>     $datadir = $ARGV[ 1 ] ;
>>     $yymm = $ARGV[ 2 ] ;
>> } else {
>>     die "usage: metar2nc cdlfile [datatdir] [yymm] < ncfile $!\n" ;
>> }
>> print "Missing cdlfile file $cdlfile: $!\n" unless  -e $cdlfile ;
>>
>> if( -e "util/ncgen" ) {
>>     $ncgen = "util/ncgen" ;
>> } elsif( -e "/usr/local/ldm/util/ncgen" ) {
>>     $ncgen = "/usr/local/ldm/util/ncgen" ;
>> } elsif( -e "/upc/netcdf/bin/ncgen" ) {
>>     $ncgen = "/upc/netcdf/bin/ncgen" ;
>> } elsif( -e "./ncgen" ) {
>>     $ncgen = "./ncgen" ;
>> } else {
>>     open( NCGEN, "which ncgen |" ) ;
>>     $ncgen = <NCGEN> ;
>>     close( NCGEN ) ;
>>
>>     if( $ncgen =~ /no ncgen/ ) {
>>         die "Can't find NetCDF utility 'ncgen' in PATH, util/ncgen
>> /usr/local/ldm/util/ncgen, /upc/netcdf/bin/ncgen, or ./ncgen : $!\n" ;
>>     } else {
>>         $ncgen = "ncgen" ;
>>     }
>> }
>> # the data and the metadata directories
>> $datadir = "." if( ! $datadir ) ;
>> $metadir = $datadir . "/../metadata/surface/metar" ;
>> # redirect STDOUT and STDERR
>> open( STDOUT, ">$datadir/metarLog.$$.log" ) ||
>>         die "could not open $datadir/metarLog.$$.log: $!\n" ;
>> open( STDERR, ">&STDOUT" ) ||
>>         die "could not dup stdout: $!\n" ;
>> select( STDERR ) ; $| = 1 ;
>> select( STDOUT ) ; $| = 1 ;
>>
>> die "Missing cdlfile file $cdlfile: $!\n" unless  -e $cdlfile ;
>>
>> # year and month
>> if( ! $yymm ) {
>>     $theyear = (gmtime())[ 5 ] ;
>>     $theyear = ( $theyear < 100 ? $theyear : $theyear - 100 ) ;
>>     $theyear = sprintf( "%02d", $theyear ) ;
>>     $themonth = (gmtime())[ 4 ] ;
>>     $themonth++ ;
>>     $yymm = $theyear . sprintf( "%02d", $themonth ) ;
>> } else {
>>     $theyear = substr( $yymm, 0, 2 ) ;
>>     $themonth = substr( $yymm, 2 ) ;
>> }
>> # file used for bad metars or prevention of overwrites to ncfiles
>> open( OPN, ">>$datadir/rawmetars.$$.nc" ) ||
>>         die "could not open $datadir/rawmetars.$$.nc: $!\n" ;
>> # set error handling to verbose only
>> $result = NetCDF::opts( VERBOSE ) ;
>>
>> # set interrupt handler
>> $SIG{ 'INT' }  = 'atexit' ;
>> $SIG{ 'KILL' }  = 'atexit' ;
>> $SIG{ 'TERM' }  = 'atexit' ;
>> $SIG{ 'QUIT' }  = 'atexit' ;
>>
>> # set defaults
>>
>> $F = -99999 ;
>> $A = \$F ;
>> $S1 = "\0" ;
>> $AS1 = \$S1 ;
>> $S2 = "\0\0" ;
>> $AS2 = \$S2 ;
>> $S3 = "\0\0\0" ;
>> $AS3 = \$S3 ;
>> $S4 = "\0\0\0\0" ;
>> $AS4 = \$S4 ;
>> $S8 = "\0" x 8 ;
>> $AS8 = \$S8 ;
>> $S10 = "\0" x 10 ;
>> $AS10 = \$S10 ;
>> $S15 = "\0" x 15 ;
>> $AS15 = \$S15 ;
>> $S32 = "\0" x 32 ;
>> $AS32 = \$S32 ;
>> $S128 = "\0" x 128 ;
>> $AS128 = \$S128 ;
>>
>> %CDL = (
>> "rep_type", 0, "stn_name", 1, "wmo_id", 2, "lat", 3, "lon", 4,
>> "elev", 5,
>> "ob_hour", 6, "ob_min", 7, "ob_day", 8, "time_obs", 9,
>> "time_nominal", 10, "AUTO", 11, "UNITS", 12, "DIR", 13, "SPD", 14,
>> "GUST", 15, "VRB", 16, "DIRmin", 17, "DIRmax", 18, "prevail_VIS_SM",
>> 19, "prevail_VIS_KM", 20, "plus_VIS_SM", 21, "plus_VIS_KM", 22,
>> "prevail_VIS_M", 23, "VIS_dir", 24, "CAVOK", 25, "RVRNO", 26,
>> "RV_designator", 27, "RV_above_max", 28, "RV_below_min", 29,
>> "RV_vrbl", 30, "RV_min", 31, "RV_max", 32, "RV_visRange", 33, "WX",
>> 34, "vert_VIS", 35, "cloud_type", 36, "cloud_hgt", 37,
>> "cloud_meters", 38, "cloud_phenom", 39, "T", 40, "TD", 41,
>> "hectoPasc_ALTIM", 42, "inches_ALTIM", 43, "NOSIG", 44,
>> "TornadicType", 45, "TornadicLOC", 46, "TornadicDIR", 47,
>> "BTornadic_hh", 48, "BTornadic_mm", 49,
>> "ETornadic_hh", 50, "ETornadic_mm", 51, "AUTOindicator", 52,
>> "PKWND_dir", 53, "PKWND_spd", 54, "PKWND_hh", 55, "PKWND_mm", 56,
>> "WshfTime_hh", 57, "WshfTime_mm", 58, "Wshft_FROPA", 59, "VIS_TWR", 60,
>> "VIS_SFC", 61, "VISmin", 62, "VISmax", 63, "VIS_2ndSite", 64,
>> "VIS_2ndSite_LOC", 65, "LTG_OCNL", 66, "LTG_FRQ", 67, "LTG_CNS", 68,
>> "LTG_CG", 69, "LTG_IC", 70, "LTG_CC", 71, "LTG_CA", 72, "LTG_DSNT", 73,
>> "LTG_AP", 74, "LTG_VcyStn", 75, "LTG_DIR", 76, "Recent_WX", 77,
>> "Recent_WX_Bhh", 78, "Recent_WX_Bmm", 79, "Recent_WX_Ehh", 80,
>> "Recent_WX_Emm", 81, "Ceiling_min", 82, "Ceiling_max", 83,
>> "CIG_2ndSite_meters", 84, "CIG_2ndSite_LOC", 85, "PRESFR", 86,
>> "PRESRR", 87,
>> "SLPNO", 88, "SLP", 89, "SectorVIS_DIR", 90, "SectorVIS", 91, "GR", 92,
>> "GRsize", 93, "VIRGA", 94, "VIRGAdir", 95, "SfcObscuration", 96,
>> "OctsSkyObscured", 97, "CIGNO", 98, "Ceiling_est", 99, "Ceiling", 100,
>> "VrbSkyBelow", 101, "VrbSkyLayerHgt", 102, "VrbSkyAbove", 103,
>> "Sign_cloud", 104, "Sign_dist", 105, "Sign_dir", 106, "ObscurAloft",
>> 107,
>> "ObscurAloftSkyCond", 108, "ObscurAloftHgt", 109, "ACFTMSHP", 110,
>> "NOSPECI", 111, "FIRST", 112, "LAST", 113, "Cloud_low", 114,
>> "Cloud_medium", 115, "Cloud_high", 116, "SNINCR", 117,
>> "SNINCR_TotalDepth", 118,
>> "SN_depth", 119, "SN_waterequiv", 120, "SunSensorOut", 121,
>> "SunShineDur", 122,
>> "PRECIP_hourly", 123, "PRECIP_amt", 124, "PRECIP_24_amt", 125,
>> "T_tenths", 126,
>> "TD_tenths", 127, "Tmax", 128, "Tmin", 129, "Tmax24", 130, "Tmin24",
>> 131, "char_Ptend", 132, "Ptend", 133, "PWINO", 134, "FZRANO", 135,
>> "TSNO", 136, "PNO", 137, "maintIndicator", 138, "PlainText", 139,
>> "report", 140, "remarks", 141 ) ;
>>
>> # default netCDF record structure, contains all vars for the METAR reports
>> @defaultrec = ( $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS3,
>> $A, $A,
>> $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS2, $A, $A, [( $S3, $S3, $S3,
>> $S3 )],
>> [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], [(
>> $F, $F, $F, $F )], [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], $AS32,
>> $A,
>> [( $S4, $S4, $S4, $S4, $S4, $S4 )], [( $F, $F, $F, $F, $F, $F )],
>> [( $F, $F, $F, $F, $F, $F )], [( $S4, $S4, $S4, $S4, $S4, $S4 )],
>> $A, $A, $A, $A, $A, $AS15, $AS10, $AS2, $A, $A, $A, $A, $AS4, $A, $A,
>> $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS10, $A, $A, $A, $A, $A,
>> $A, $A, $A, $A, $A, $AS2, [( $S8, $S8, $S8 )], [( $F, $F, $F )],
>> [( $F, $F, $F )], [( $F, $F, $F )], [( $F, $F, $F )], $A, $A, $A, $A,
>> $A, $A, $A, $A, $AS2, $A, $A, $A, $A, $AS2, $AS8, $A, $A, $A, $A,
>> $AS3, $A, $AS3, $AS10, $AS10, $AS10, $AS8, $AS3, $A, $A, $A, $A, $A,
>> $AS1, $AS1, $AS1, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A,
>> $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS128, $AS128,
>> $AS128 ) ;
>>
>> # two-fold purpose array: if entry > 0, then var is requested and its value
>> # is the position in the record, except first entry
>> @W = ( 0 ) x ( $#defaultrec +1 ) ;
>> $W[ 0 ] = -1 ;
>>
>> # open cdl and create record structure according to variables
>> open( CDL, "$cdlfile" ) || die "could not open $cdlfile: $!\n" ;
>> $i = 0 ;
>> while( <CDL> ) {
>>     if( s#^\s*(char|int|long|double|float) (\w{1,25})## ) {
>>         ( $number ) = $CDL{ $2 } ;
>>         push( @rec, $defaultrec[ $number ] ) ;
>>         $W[ $number ] = $i++ ;
>>     }
>> }
>> close CDL ;
>> undef( @defaultrec ) ;
>> undef( %CDL ) ;
>>
>> # read in station data
>> if( -e "etc/sfmetar_sa.tbl" ) {
>>     $sfile = "etc/sfmetar_sa.tbl" ;
>> } elsif( -e "./sfmetar_sa.tbl" ) {
>>     $sfile = "./sfmetar_sa.tbl" ;
>> } else {
>>     die "Can't find sfmetar_sa.tbl station file.: $!\n" ;
>> }
>> open( STATION, "$sfile" ) || die "could not open $sfile: $!\n" ;
>>
>> while( <STATION> ) {
>>     s#^(\w{3,6})?\s+(\d{4,5}).{40}## ;
>>     $id = $1 ;
>>     $wmo_id = $2 ;
>>     $wmo_id = "0" . $wmo_id if( length( $wmo_id ) == 4 ) ;
>>     ( $lat, $lon, $elev ) = split ;
>>     $lat = sprintf( "%7.2f", $lat / 100 ) ;
>>     $lon = sprintf( "%7.2f", $lon / 100) ;
>>
>>     # set these vars ( $wmo_id, $lat, $lon, $elev )
>>     $STATIONS{ "$id" } = "$wmo_id $lat $lon $elev" ;
>> }
>> close STATION ;
>>
>> # read in list of already processed reports if it exists
>> # open metar.lst, list of reports processed in the last 4 hours.
>> if( -e "$datadir/metar.lst" ) {
>>     open( LST, "$datadir/metar.lst" ) ||
>>         die "could not open $datadir/metar.lst: $!\n" ;
>>     while( <LST> ) {
>>         ( $stn, $rtptime, $hr ) = split ;
>>         $reportslist{ "$stn $rtptime" }  = $hr ;
>>     }
>>     close LST ;
>>     #unlink( "$datadir/metar.lst" ) ;
>> }
>> # Now begin parsing file and decoding observations breaking on cntrl C
>> $/ = "\cC" ;
>>
>> # set select processing here from STDIN
>> START:
>> while( 1 ) {
>>     open( STDIN, '-' ) ;
>>     vec($rin,fileno(STDIN),1) = 1;
>>     $timeout = 1200 ; # 20 minutes
>>     $nfound = select( $rout = $rin, undef, undef, $timeout );
>>     # timed out
>>     if( ! $nfound ) {
>>         print "Shut down, time out 20 minutes\n" ;
>>         &atexit() ;
>>     }
>>     &atexit( "eof" ) if( eof( STDIN ) ) ;
>>
>>     # Process each line of metar bulletins, header first
>>     $_ = <STDIN> ;
>>     #next unless /METAR|SPECI/ ;
>>     s#\cC## ;
>>     s#\cM##g ;
>>     s#\cA\n## ;
>>     s#\c^##g ;
>>
>>     s#\d\d\d \n## ;
>>     s#\w{4}\d{1,2} \w{4} (\d{2})(\d{2})(\d{2})?.*\n## ;
>>     $tday = $1 ;
>>     $thour = $2 ;
>>     $thour = "23" if( $thour eq "24" ) ;
>>     $tmin = $3 ;
>>     $tmin = "00" unless( $tmin ) ;
>>     next unless ( $tday && defined( $thour ) ) ;
>>     $time_trans = thetime( "trans" ) ;
>>     if( s#(METAR|SPECI) \d{4,6}Z?\n## ) {
>>         $rep_type = $1 ;
>>     } elsif( s#(METAR|SPECI)\s*\n## ) {
>>         $rep_type = $1 ;
>>     } else {
>>         $rep_type = "METAR" ;
>>     }
>>     # Separate bulletins into reports
>>     if( /=\n/ ) {
>>         s#=\s+\n#=\n#g ;
>>         @reports = split( /=\n/ ) ;
>>     } else {
>>         #@reports = split ( /\n/ ) ;
>>         s#\n# #g ;
>>         next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
>>         $reports[ 0 ] = $_ ;
>>     }
>>
>>
>


==============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
rkambic@xxxxxxxxxxxxxxxx                   WWW: http://www.unidata.ucar.edu/
==============================================================================

