
Re: perl metar decoder - multi-report bulletins and equal



David,

This is discouraging; I spent hours looking through raw bulletins trying to
make the decoder handle them correctly. I might try looking at the station ID
before processing, though I don't know yet whether that will help.  My
philosophy is that it's better to disregard bulletins/reports than to enter
"bad" data into a file. By that standard, your example bulletin should be
discarded. ugh.  I'll let you know how my new ideas work out.
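Roughly what I have in mind (untested, and the station ID pattern is just a
guess) is to split a bulletin that has no = signs into one report per line
only when every line looks like a report on its own:

   # sketch only: split a bulletin without = separators into one report
   # per line, but only if every line starts with "CCCC ddhhmmZ"
   my @lines = split( /\n/ ) ;
   if( @lines > 1 && ! grep( !/^\w{4} \d{6}Z /, @lines ) ) {
       @reports = @lines ;
   } else {
       s#\n# #g ;
       $reports[ 0 ] = $_ ;
   }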

Robb...


On Thu, 4 Mar 2004, David Larson wrote:

> I do see non-US bulletins that are split "badly" according to this code
> change ...
>
> 628
> SANK31 MNMG 041706
> METAR
> MNPC 041700Z 08012KT 7000 BKN016 29/26 Q1015
> MNRS 041700Z 06010KT 9999 FEW022 BKN250 30/23 Q1012
> MNJG 041700Z 36004KT 7000 VCRA BKN016 22/17 Q1015
> MNJU 041700Z 10006KT 9999 SCT025 32/19 Q1013
> MNCH 041700Z 02010KT 9999 SCT030 33/21 Q1012
> MNMG 041700Z 07016KT 9999 SCT025 32/20 Q1011 A2988
> MNBL 041700Z 10008KT 9999 SCT019 SCT070 29/25 Q1014=
>
> Not only is each line a separate report, but to make matters worse, the
> *last* entry *is* terminated by an equal sign!  Yuck.  Perhaps this is just
> a junk bulletin?  I'm surprised that it could even go out this way.
>
> Does anyone you know of even attempt to use the perl metar decoder for
> non-US stations?  I've looked at it long enough to estimate the work
> involved as a *lot*.
>
> Dave
>
> David Larson wrote:
>
> > I've looked into this problem, which I didn't know existed.
> >
> > Your code is now:
> >
> >   # Separate bulletins into reports
> >   if( /=\n/ ) {
> >       s#=\s+\n#=\n#g ;
> >       @reports = split( /=\n/ ) ;
> >   } else {
> >       #@reports = split ( /\n/ ) ;
> >       s#\n# #g ;
> >       next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
> >       $reports[ 0 ] = $_ ;
> >   }
> >
> > But based on your assumption that these bulletins will not, and
> > cannot, contain multiple reports (which seems reasonable), there
> > really only needs to be one split, right?  If there is no equal sign,
> > the split leaves the entire bulletin in the first report.  This seems
> > to be a slight simplification:
> >
> >    # Separate bulletins into reports
> >    if( /=\n/ ) {
> >         s#=\s+\n#=\n#g ;
> >    } else {
> >         s#\n# #g ;
> >    }
> >    @reports = split( /=\n/ ) ;
> >    ... snip ... (the next line actually appears much further down in the code)
> >    next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
> >
> > Also, it is an error to have multiple time specifications in any
> > report, right?  So that can be generalized as well, as I have done above.
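> > FWIW, a quick throwaway check with made-up reports shows that guard
> > behaving the way we want:
> >
> >    # throwaway check of the multiple-time-group guard (made-up reports,
> >    # not decoder code)
> >    for ( "KDEN 021753Z 18010KT 10SM FEW250 28/08 A3002",
> >          "KDEN 021753Z 18010KT KBOU 021755Z 20008KT" ) {
> >        print( /\d{4,6}Z.*\d{4,6}Z/ ? "skip: $_\n" : "keep: $_\n" ) ;
> >    }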
> >
> > You asked for my comments, and well, there you have them!  :-)  I
> > might take a closer look at the rest of the changes as well, but that
> > will be delayed a bit.
> >
> > I sure appreciate your quick responses to all correspondence.
> >
> > Dave
> >
> > Robb Kambic wrote:
> >
> >> David,
> >>
> >> Yes, I know about the problem.  It occurs in bulletins that don't use
> >> the = sign to separate reports.  The solution is to assume that
> >> bulletins that don't use = contain only one report.  I scanned many
> >> raw bulletins and this seems to be true, so I changed the code to:
> >>
> >> <               @reports = split ( /\n/ ) ;
> >> ---
> >>
> >>
> >>>              #@reports = split ( /\n/ ) ;
> >>>              s#\n# #g ;
> >>>              next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
> >>>              $reports[ 0 ] = $_;
> >>>
> >>
> >>
> >> The new code is attached.  I'm also working on a newer version of the
> >> decoder; it's in the ftp decoders directory, i.e.
> >>
> >> metar2nc.new and metar.cdl.new
> >>
> >> The pqact.conf entry needs to change \2:yy to \2:yyyy because the new
> >> version uses the century too.  The cdl is different: it merges vars
> >> that have different units into one, e.g. wind speeds in knots, mph,
> >> and m/s are all stored in m/s.  It also stores all reports for a
> >> station in one record.  Take a look; I would appreciate any comments
> >> before it's released.
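> >> The unit merging is just the usual conversion factors; this is not the
> >> exact code in metar2nc.new, but the idea is along these lines:
> >>
> >>     # idea only: normalize wind speed to m/s before storing
> >>     # 1 knot = 0.514444 m/s, 1 mph = 0.44704 m/s
> >>     sub wind_to_mps {
> >>         my( $spd, $units ) = @_ ;
> >>         return $spd * 0.514444 if( $units eq "KT" ) ;
> >>         return $spd * 0.44704 if( $units eq "MPH" ) ;
> >>         return $spd ;    # already m/s ( MPS )
> >>     }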
> >>
> >> Robb...
> >>
> >>
> >> On Tue, 2 Mar 2004, David Larson wrote:
> >>
> >>
> >>
> >>> Robb,
> >>>
> >>> I've been chasing down a problem that seems to cause perfectly good
> >>> reports to be discarded by the perl metar decoder.  There is a comment
> >>> in the 2.4.4 decoder that reads "reports appended together wrongly";
> >>> the code in that area takes the first line as the report to process
> >>> and discards the next line.
> >>>
> >>> To walk through this, I'll refer to the following report:
> >>>
> >>> 132
> >>> SAUS80 KWBC 021800 RRD
> >>> METAR
> >>> K4BL 021745Z 12005KT 3SM BR OVC008 01/M01 RMK SLP143 NOSPECI 60011
> >>>     8/2// T00061006 10011 21017 51007=
> >>>
> >>> The decoder attempts to classify the report type ($rep_type on line 257
> >>> of metar2nc), and in doing so it classifies this report as a "SPECI" ...
> >>> which isn't what you'd expect from visual inspection of the report.
> >>> However, perl is doing the right thing, given that it is asked to match
> >>> #(METAR|SPECI) \d{4,6}Z?\n#, and that pattern matches inside the remarks
> >>> of the report.
> >>>
> >>> The solution is probably to bind the text to the start of the line with
> >>> a caret.  Seems to work pretty well so far.
> >>>
> >>> I've changed the lines (257-263) in metar2nc-v2.4.4 from:
> >>>
> >>>    if( s#(METAR|SPECI) \d{4,6}Z?\n## ) {
> >>>        $rep_type = $1 ;
> >>>    } elsif( s#(METAR|SPECI)\s*\n## ) {
> >>>        $rep_type = $1 ;
> >>>    } else {
> >>>        $rep_type = "METAR" ;
> >>>    }
> >>>
> >>> To:
> >>>
> >>>    if( s#^(METAR|SPECI) \d{4,6}Z?\n## ) {
> >>>        $rep_type = $1 ;
> >>>    } elsif( s#^(METAR|SPECI)\s*\n## ) {
> >>>        $rep_type = $1 ;
> >>>    } else {
> >>>        $rep_type = "METAR" ;
> >>>    }
> >>>
> >>> I simply added the caret (^) to bind the pattern to the start of the
> >>> report.
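> >>>
> >>> To see the false match in isolation, this throwaway snippet (not
> >>> decoder code, report text shortened) shows the unanchored pattern
> >>> finding the SPECI inside the NOSPECI remark, while the anchored one
> >>> does not:
> >>>
> >>>    # throwaway demo: the unanchored pattern matches "SPECI 60011\n"
> >>>    # inside the NOSPECI remark
> >>>    $_ = "K4BL 021745Z 3SM BR RMK SLP143 NOSPECI 60011\n 51007=\n" ;
> >>>    print "unanchored match: $1\n" if( /(METAR|SPECI) \d{4,6}Z?\n/ ) ;
> >>>    print "anchored: no match\n" unless( /^(METAR|SPECI) \d{4,6}Z?\n/ ) ;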
> >>>
> >>> Let me know what you think.
> >>> Dave
> >>>
> >>>
> >>
> >>
> >> ===============================================================================
> >>
> >> Robb Kambic                       Unidata Program Center
> >> Software Engineer III               Univ. Corp for Atmospheric Research
> >> address@hidden           WWW: http://www.unidata.ucar.edu/
> >> ===============================================================================
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> #! /usr/local/bin/perl
> >> #
> >> #  usage: metar2nc cdlfile [datadir] [yymm] < ncfile
> >> #
> >> #
> >> #chdir( "/home/rkambic/code/decoders/src/metar" ) ;
> >>
> >> use NetCDF ;
> >> use Time::Local ;
> >> # process command line switches
> >> while ($_ = $ARGV[0], /^-/) {
> >>      shift;
> >>       last if /^--$/;
> >>          /^(-v)/ && $verbose++;
> >> }
> >> # process input parameters
> >> if( $#ARGV == 0 ) {
> >>     $cdlfile = $ARGV[ 0 ] ;
> >> } elsif( $#ARGV == 1 ) {
> >>     $cdlfile = $ARGV[ 0 ] ;
> >>     if( $ARGV[ 1 ] =~ /^\d/ ) {
> >>         $yymm = $ARGV[ 1 ] ;
> >>     } else {
> >>         $datadir = $ARGV[ 1 ] ;
> >>     }
> >> } elsif( $#ARGV == 2 ) {
> >>     $cdlfile = $ARGV[ 0 ] ;
> >>     $datadir = $ARGV[ 1 ] ;
> >>     $yymm = $ARGV[ 2 ] ;
> >> } else {
> >>     die "usage: metar2nc cdlfile [datatdir] [yymm] < ncfile $!\n" ;
> >> }
> >> print "Missing cdlfile file $cdlfile: $!\n" unless  -e $cdlfile ;
> >>
> >> if( -e "util/ncgen" ) {
> >>     $ncgen = "util/ncgen" ;
> >> } elsif( -e "/usr/local/ldm/util/ncgen" ) {
> >>     $ncgen = "/usr/local/ldm/util/ncgen" ;
> >> } elsif( -e "/upc/netcdf/bin/ncgen" ) {
> >>     $ncgen = "/upc/netcdf/bin/ncgen" ;
> >> } elsif( -e "./ncgen" ) {
> >>     $ncgen = "./ncgen" ;
> >> } else {
> >>     open( NCGEN, "which ncgen |" ) ;
> >>     $ncgen = <NCGEN> ;
> >>     close( NCGEN ) ;
> >>
> >>     if( $ncgen =~ /no ncgen/ ) {
> >>         die "Can't find NetCDF utility 'ncgen' in PATH, util/ncgen
> >> /usr/local/ldm/util/ncgen, /upc/netcdf/bin/ncgen, or ./ncgen : $!\n" ;
> >>     } else {
> >>         $ncgen = "ncgen" ;
> >>     }
> >> }
> >> # the data and the metadata directories
> >> $datadir = "." if( ! $datadir ) ;
> >> $metadir = $datadir . "/../metadata/surface/metar" ;
> >> # redirect STDOUT and STDERR
> >> open( STDOUT, ">$datadir/metarLog.$$.log" ) ||
> >>         die "could not open $datadir/metarLog.$$.log: $!\n" ;
> >> open( STDERR, ">&STDOUT" ) ||
> >>         die "could not dup stdout: $!\n" ;
> >> select( STDERR ) ; $| = 1 ;
> >> select( STDOUT ) ; $| = 1 ;
> >>
> >> die "Missing cdlfile file $cdlfile: $!\n" unless  -e $cdlfile ;
> >>
> >> # year and month
> >> if( ! $yymm ) {
> >>     $theyear = (gmtime())[ 5 ] ;
> >>     $theyear = ( $theyear < 100 ? $theyear : $theyear - 100 ) ;
> >>     $theyear = sprintf( "%02d", $theyear ) ;
> >>     $themonth = (gmtime())[ 4 ] ;
> >>     $themonth++ ;
> >>     $yymm = $theyear . sprintf( "%02d", $themonth ) ;
> >> } else {
> >>     $theyear = substr( $yymm, 0, 2 ) ;
> >>     $themonth = substr( $yymm, 2 ) ;
> >> }
> >> # file used for bad metars or prevention of overwrites to ncfiles
> >> open( OPN, ">>$datadir/rawmetars.$$.nc" ) ||         die "could not
> >> open $datadir/rawmetars.$$.nc: $!\n" ;
> >> # set error handling to verbose only
> >> $result = NetCDF::opts( VERBOSE ) ;
> >>
> >> # set interrupt handler
> >> $SIG{ 'INT' }  = 'atexit' ;
> >> $SIG{ 'KILL' }  = 'atexit' ;
> >> $SIG{ 'TERM' }  = 'atexit' ;
> >> $SIG{ 'QUIT' }  = 'atexit' ;
> >>
> >> # set defaults
> >>
> >> $F = -99999 ;
> >> $A = \$F ;
> >> $S1 = "\0" ;
> >> $AS1 = \$S1 ;
> >> $S2 = "\0\0" ;
> >> $AS2 = \$S2 ;
> >> $S3 = "\0\0\0" ;
> >> $AS3 = \$S3 ;
> >> $S4 = "\0\0\0\0" ;
> >> $AS4 = \$S4 ;
> >> $S8 = "\0" x 8 ;
> >> $AS8 = \$S8 ;
> >> $S10 = "\0" x 10 ;
> >> $AS10 = \$S10 ;
> >> $S15 = "\0" x 15 ;
> >> $AS15 = \$S15 ;
> >> $S32 = "\0" x 32 ;
> >> $AS32 = \$S32 ;
> >> $S128 = "\0" x 128 ;
> >> $AS128 = \$S128 ;
> >>
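> >> # %CDL maps each variable name that can appear in the cdl file to its
> >> # index into @defaultrec (and into the @W request array) below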
> >> %CDL = (
> >> "rep_type", 0, "stn_name", 1, "wmo_id", 2, "lat", 3, "lon", 4,
> >> "elev", 5,
> >> "ob_hour", 6, "ob_min", 7, "ob_day", 8, "time_obs", 9,
> >> "time_nominal", 10, "AUTO", 11, "UNITS", 12, "DIR", 13, "SPD", 14,
> >> "GUST", 15, "VRB", 16, "DIRmin", 17, "DIRmax", 18, "prevail_VIS_SM",
> >> 19, "prevail_VIS_KM", 20, "plus_VIS_SM", 21, "plus_VIS_KM", 22,
> >> "prevail_VIS_M", 23, "VIS_dir", 24, "CAVOK", 25, "RVRNO", 26,
> >> "RV_designator", 27, "RV_above_max", 28, "RV_below_min", 29,
> >> "RV_vrbl", 30, "RV_min", 31, "RV_max", 32, "RV_visRange", 33, "WX",
> >> 34, "vert_VIS", 35, "cloud_type", 36, "cloud_hgt", 37,
> >> "cloud_meters", 38, "cloud_phenom", 39, "T", 40, "TD", 41,
> >> "hectoPasc_ALTIM", 42, "inches_ALTIM", 43, "NOSIG", 44,
> >> "TornadicType", 45, "TornadicLOC", 46, "TornadicDIR", 47,
> >> "BTornadic_hh", 48, "BTornadic_mm", 49,
> >> "ETornadic_hh", 50, "ETornadic_mm", 51, "AUTOindicator", 52,
> >> "PKWND_dir", 53, "PKWND_spd", 54, "PKWND_hh", 55, "PKWND_mm", 56,
> >> "WshfTime_hh", 57, "WshfTime_mm", 58, "Wshft_FROPA", 59, "VIS_TWR", 60,
> >> "VIS_SFC", 61, "VISmin", 62, "VISmax", 63, "VIS_2ndSite", 64,
> >> "VIS_2ndSite_LOC", 65, "LTG_OCNL", 66, "LTG_FRQ", 67, "LTG_CNS", 68,
> >> "LTG_CG", 69, "LTG_IC", 70, "LTG_CC", 71, "LTG_CA", 72, "LTG_DSNT", 73,
> >> "LTG_AP", 74, "LTG_VcyStn", 75, "LTG_DIR", 76, "Recent_WX", 77,
> >> "Recent_WX_Bhh", 78, "Recent_WX_Bmm", 79, "Recent_WX_Ehh", 80,
> >> "Recent_WX_Emm", 81, "Ceiling_min", 82, "Ceiling_max", 83,
> >> "CIG_2ndSite_meters", 84, "CIG_2ndSite_LOC", 85, "PRESFR", 86,
> >> "PRESRR", 87,
> >> "SLPNO", 88, "SLP", 89, "SectorVIS_DIR", 90, "SectorVIS", 91, "GR", 92,
> >> "GRsize", 93, "VIRGA", 94, "VIRGAdir", 95, "SfcObscuration", 96,
> >> "OctsSkyObscured", 97, "CIGNO", 98, "Ceiling_est", 99, "Ceiling", 100,
> >> "VrbSkyBelow", 101, "VrbSkyLayerHgt", 102, "VrbSkyAbove", 103,
> >> "Sign_cloud", 104, "Sign_dist", 105, "Sign_dir", 106, "ObscurAloft",
> >> 107,
> >> "ObscurAloftSkyCond", 108, "ObscurAloftHgt", 109, "ACFTMSHP", 110,
> >> "NOSPECI", 111, "FIRST", 112, "LAST", 113, "Cloud_low", 114,
> >> "Cloud_medium", 115, "Cloud_high", 116, "SNINCR", 117,
> >> "SNINCR_TotalDepth", 118,
> >> "SN_depth", 119, "SN_waterequiv", 120, "SunSensorOut", 121,
> >> "SunShineDur", 122,
> >> "PRECIP_hourly", 123, "PRECIP_amt", 124, "PRECIP_24_amt", 125,
> >> "T_tenths", 126,
> >> "TD_tenths", 127, "Tmax", 128, "Tmin", 129, "Tmax24", 130, "Tmin24",
> >> 131, "char_Ptend", 132, "Ptend", 133, "PWINO", 134, "FZRANO", 135,
> >> "TSNO", 136, "PNO", 137, "maintIndicator", 138, "PlainText", 139,
> >> "report", 140, "remarks", 141 ) ;
> >>
> >> # default netCDF record structure, contains all vars for the METAR reports
> >> @defaultrec = ( $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS3,
> >> $A, $A,
> >> $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS2, $A, $A, [( $S3, $S3, $S3,
> >> $S3 )],
> >> [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], [(
> >> $F, $F, $F, $F )], [( $F, $F, $F, $F )], [( $F, $F, $F, $F )], $AS32,
> >> $A,
> >> [( $S4, $S4, $S4, $S4, $S4, $S4 )], [( $F, $F, $F, $F, $F, $F )],
> >> [( $F, $F, $F, $F, $F, $F )], [( $S4, $S4, $S4, $S4, $S4, $S4 )],
> >> $A, $A, $A, $A, $A, $AS15, $AS10, $AS2, $A, $A, $A, $A, $AS4, $A, $A,
> >> $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS10, $A, $A, $A, $A, $A,
> >> $A, $A, $A, $A, $A, $AS2, [( $S8, $S8, $S8 )], [( $F, $F, $F )],
> >> [( $F, $F, $F )], [( $F, $F, $F )], [( $F, $F, $F )], $A, $A, $A, $A,
> >> $A, $A, $A, $A, $AS2, $A, $A, $A, $A, $AS2, $AS8, $A, $A, $A, $A,
> >> $AS3, $A, $AS3, $AS10, $AS10, $AS10, $AS8, $AS3, $A, $A, $A, $A, $A,
> >> $AS1, $AS1, $AS1, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A,
> >> $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $A, $AS128, $AS128,
> >> $AS128 ) ;
> >>
> >> # two-fold purpose array: if an entry is > 0, then the var is requested
> >> # and its value is the position in the record, except the first entry
> >> @W = ( 0 ) x ( $#defaultrec +1 ) ;
> >> $W[ 0 ] = -1 ;
> >>
> >> # open cdl and create record structure according to variables
> >> open( CDL, "$cdlfile" ) || die "could not open $cdlfile: $!\n" ;
> >> $i = 0 ;
> >> while( <CDL> ) {
> >>     if( s#^\s*(char|int|long|double|float) (\w{1,25})## ) {
> >>         ( $number ) = $CDL{ $2 } ;
> >>         push( @rec, $defaultrec[ $number ] ) ;
> >>         $W[ $number ] = $i++ ;
> >>     }
> >> }
> >> close CDL ;
> >> undef( @defaultrec ) ;
> >> undef( %CDL ) ;
> >>
> >> # read in station data
> >> if( -e "etc/sfmetar_sa.tbl" ) {
> >>     $sfile = "etc/sfmetar_sa.tbl" ;
> >> } elsif( -e "./sfmetar_sa.tbl" ) {
> >>     $sfile = "./sfmetar_sa.tbl" ;
> >> } else {
> >>     die "Can't find sfmetar_sa.tbl station file.: $!\n" ;
> >> }
> >> open( STATION, "$sfile" ) || die "could not open $sfile: $!\n" ;
> >>
> >> while( <STATION> ) {
> >>     s#^(\w{3,6})?\s+(\d{4,5}).{40}## ;
> >>     $id = $1 ;
> >>     $wmo_id = $2 ;
> >>     $wmo_id = "0" . $wmo_id if( length( $wmo_id ) == 4 ) ;
> >>     ( $lat, $lon, $elev ) = split ;
> >>     $lat = sprintf( "%7.2f", $lat / 100 ) ;
> >>     $lon = sprintf( "%7.2f", $lon / 100) ;
> >>
> >>     # set these vars ( $wmo_id, $lat, $lon, $elev )
> >>     $STATIONS{ "$id" } = "$wmo_id $lat $lon $elev" ;
> >> }
> >> close STATION ;
> >>
> >> # read in list of already processed reports if it exists
> >> # open metar.lst, list of reports processed in the last 4 hours.
> >> if( -e "$datadir/metar.lst" ) {
> >>     open( LST, "$datadir/metar.lst" ) ||         die "could not open
> >> $datadir/metar.lst: $!\n" ;
> >>     while( <LST> ) {
> >>         ( $stn, $rtptime, $hr ) = split ;
> >>         $reportslist{ "$stn $rtptime" }  = $hr ;
> >>     }
> >>     close LST ;
> >>     #unlink( "$datadir/metar.lst" ) ;
> >> }
> >> # Now begin parsing file and decoding observations breaking on cntrl C
> >> $/ = "\cC" ;
> >>
> >> # set select processing here from STDIN
> >> START:
> >> while( 1 ) {
> >>     open( STDIN, '-' ) ;
> >>     vec($rin,fileno(STDIN),1) = 1;
> >>     $timeout = 1200 ; # 20 minutes
> >>     $nfound = select( $rout = $rin, undef, undef, $timeout );
> >>     # timed out
> >>     if( ! $nfound ) {
> >>         print "Shut down, time out 20 minutes\n" ;
> >>         &atexit() ;
> >>     }
> >>     &atexit( "eof" ) if( eof( STDIN ) ) ;
> >>
> >>     # Process each line of metar bulletins, header first
> >>     $_ = <STDIN> ;
> >>     #next unless /METAR|SPECI/ ;
> >>     s#\cC## ;
> >>     s#\cM##g ;
> >>     s#\cA\n## ;
> >>     s#\c^##g ;
> >>
> >>     s#\d\d\d \n## ;
> >>     s#\w{4}\d{1,2} \w{4} (\d{2})(\d{2})(\d{2})?.*\n## ;
> >>     $tday = $1 ;
> >>     $thour = $2 ;
> >>     $thour = "23" if( $thour eq "24" ) ;
> >>     $tmin = $3 ;
> >>     $tmin = "00" unless( $tmin ) ;
> >>     next unless ( $tday && defined( $thour ) ) ;
> >>     $time_trans = thetime( "trans" ) ;
> >>     if( s#(METAR|SPECI) \d{4,6}Z?\n## ) {
> >>         $rep_type = $1 ;
> >>     } elsif( s#(METAR|SPECI)\s*\n## ) {
> >>         $rep_type = $1 ;
> >>     } else {
> >>         $rep_type = "METAR" ;
> >>     }
> >>     # Separate bulletins into reports
> >>     if( /=\n/ ) {
> >>         s#=\s+\n#=\n#g ;
> >>         @reports = split( /=\n/ ) ;
> >>     } else {
> >>         #@reports = split ( /\n/ ) ;
> >>         s#\n# #g ;
> >>         next if( /\d{4,6}Z.*\d{4,6}Z/ ) ;
> >>         $reports[ 0 ] = $_ ;
> >>     }
> >>
> >>
> >
>

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================