[LDM #RJS-786355]: Regular expressions

Subject: [LDM #RJS-786355]: Regular expressions
Date: Sun, 05 Apr 2009 10:34:12 -0600

Dave,

> > Backreference \n always refers to the subexpression enclosed by the
> > n-th unescaped left parenthesis.
> 
> OK. This is what I read in some documentation that I eventually dug up
> on the Web after I sent my support request.

> > As I recall, the first field in a WMO header has six characters: four
> > letters followed by two digits.  The above ERE, however, would match,
> > for example, "SIVA ", "SIWA ", "SHV ", "SHXX ", and "SSA " -- which
> > don't fit the pattern of the first field of a WMO header.
> 
> That problem occurred to me late last night, so I stuck wild card
> characters in wherever I needed to get the first field up to 6
> characters. The new version looks like:
> 
> WMO  (^S[IMN]V[^GINS]..)|(^S[IMN]W[^KZ]..)|(^S(HV...|HXX|S[^X]...))|
> (^SX(VD..|V.50|US(2[03]|08|40|82|86)))|(^Y[HO]XX84) .... ([0-3][0-9])
> ([0-2][0-9])..
> FILE -close    data/surface/(\9:yy)(\9:mm)\9\(10)_boy.wmo
> 
> I've chosen \9 and \(10) to try to match the day and hour information
> in the pattern, based on the number of unescaped left parentheses
> preceding those fields, of which I count 8.
> 
> The files that get saved as a result are named, literally,
> "(:yy)(:mm)_boy.wmo", so the choices of \9 and \(10) don't appear to match
> anything. This suggests to me that there might be (for this purpose)
> effectively fewer than 8 parenthetical expressions preceding the day
> and hour fields, unless there's another error in there somewhere.

The "|" operator has the lowest precedence, so many of the subpatterns
between "|" operators can lose their outermost parentheses.

The subpattern "^S(HV...|HXX|S[^X]...)" will match the four character
string "SHXX" as well as many six character strings, which is probably
not what you want.
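
A minimal sketch of that point, using Python's re module as a stand-in
for ERE matching (an assumption; the two agree for a pattern this
simple). The sample identifiers are made up for illustration.

```python
import re

# The subpattern under discussion, copied from the message.
SUBPATTERN = re.compile(r"^S(HV...|HXX|S[^X]...)")

def matched_text(identifier):
    """Return the part of `identifier` that the subpattern matches, or None."""
    m = SUBPATTERN.search(identifier)
    return m.group(0) if m else None

# Hypothetical identifiers, not taken from real products.
for identifier in ("SHXX", "SHXX84", "SHVA01", "SSVX01"):
    print(identifier, "->", matched_text(identifier))
```

The "HXX" alternative is satisfied after only four characters, while
the "HV..." and "S[^X]..." alternatives each consume six.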

The whole sequence of alternatives that precedes
" .... ([0-3][0-9])([0-2][0-9]).." should be enclosed in a single pair
of parentheses, because together those alternatives are what match the
first six characters. As the ERE stands now, the date and hour portion
belongs only to the last alternative, so only product-identifiers that
start with a "Y" will match on the date and hour.
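
The effect can be checked with a short sketch. It uses Python's re
module in place of ERE matching and assumes that the line breaks in the
quoted pattern are only email wrapping. The product-identifiers are
made up for illustration.

```python
import re

# The quoted pattern, joined onto one logical line (assumes the line
# breaks in the email are wrapping only).
ORIGINAL = (r"(^S[IMN]V[^GINS]..)|(^S[IMN]W[^KZ]..)|(^S(HV...|HXX|S[^X]...))|"
            r"(^SX(VD..|V.50|US(2[03]|08|40|82|86)))|(^Y[HO]XX84)"
            r" .... ([0-3][0-9])([0-2][0-9])..")

def day_and_hour(product_id):
    """Return (group 9, group 10) for a match, or None if nothing matches."""
    m = re.search(ORIGINAL, product_id)
    if m is None:
        return None
    return m.group(9), m.group(10)

# Hypothetical product-identifiers.
print(day_and_hour("SIVA01 KWBC 051200"))  # matches, but groups 9 and 10 are unset
print(day_and_hour("YHXX84 KWBC 051200"))  # only this form reaches the day and hour
```

For the "S" identifiers the match ends after the first field, so
groups 9 and 10 never take part in it. An unset group is consistent
with the empty substitutions in the file names quoted above, though
that link is an inference.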

I'm not exactly sure what you're trying to match, but the following
EREs might help (each is a single ERE that has been wrapped onto two
lines here):

(^S[IMN]V[^GINS]..|^S[IMN]W[^KZ]..|^S(HV...|HXX..|S[^X]...)|^SX(VD..|V.50|US(2[03]|08|40|82|86))|^Y[HO]XX84)
 .... ([0-3][0-9])([0-2][0-9])..

^(S(([IMN](V[^GINS]|W[^KZ]))..|(HV.|HXX|S[^X].)..|X(VD..|V.50|US(2[03]|08|40|82|86)))|Y[HO]XX84)
 .... ([0-3][0-9])([0-2][0-9])..

Backreferences for the day and hour would be, respectively, \5 and \6
for the first ERE and \8 and \9 for the second.
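
Those group numbers can be double-checked with a sketch like the one
below. It again uses Python's re module as a stand-in for ERE matching,
and the sample product-identifier is made up for illustration.

```python
import re

# The two suggested EREs, each joined onto one logical line.
ERE_1 = (r"(^S[IMN]V[^GINS]..|^S[IMN]W[^KZ]..|^S(HV...|HXX..|S[^X]...)"
         r"|^SX(VD..|V.50|US(2[03]|08|40|82|86))|^Y[HO]XX84)"
         r" .... ([0-3][0-9])([0-2][0-9])..")
ERE_2 = (r"^(S(([IMN](V[^GINS]|W[^KZ]))..|(HV.|HXX|S[^X].)..|"
         r"X(VD..|V.50|US(2[03]|08|40|82|86)))|Y[HO]XX84)"
         r" .... ([0-3][0-9])([0-2][0-9])..")

def day_hour(pattern, day_group, hour_group, product_id):
    """Return the (day, hour) groups for a match, or None if nothing matches."""
    m = re.search(pattern, product_id)
    return (m.group(day_group), m.group(hour_group)) if m else None

SAMPLE = "SXUS20 KWBC 051200"  # hypothetical product-identifier

# .groups is the number of parenthesized subexpressions in each pattern.
print(re.compile(ERE_1).groups, day_hour(ERE_1, 5, 6, SAMPLE))
print(re.compile(ERE_2).groups, day_hour(ERE_2, 8, 9, SAMPLE))
```

The first ERE has six groups in total and the second has nine, so the
day and hour are the last two groups in each.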

> > To simplify things, you can always break up a complicated ERE into
> > multiple pqact(1) entries, each one handling a subset of the
> > complicated ERE.
> 
> I'll try this. Would there be any complications arising from using
> "FILE -close" on each entry?

If your computer is fast enough to handle the rate at which files
are opened and closed, then there shouldn't be any complications.
The LDM log file will tell you if the pqact(1) process is falling
behind.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: RJS-786355
Department: Support LDM
Priority: Normal
Status: Closed
