[LDM #GRO-598397]: Question about regular expression in pqact file



Howard,

> I ran across the following issue when working on my pqact.conf_exp
> file.
> 
> I have files that are similar to the following example:
> 
> WWRFDM01-RCI_20090904T0000_20090904T0015_20090904T2330_00_ZRENCInwnd.nc.gz
> 
> I want to match these files and pipe them to a decoder I have. Here is
> the regular expression I am using.
> 
> ^(WWRFDM01-RCI)_(.*)T(.*)([0-9][0-9])_(.*)_(.*)T(.*)([0-9][0-9])_([0-9][0-9])(.*)_(.*RENCI.*nc.gz$)
> 
> which does parse the pattern correctly.
> 
> However, I want to pass the complete file name to my script.  I have
> been doing this by reconstructing the original file name using the
> back references.  However we have just made a change to our system and
> the filename is now being parsed using the above regex into 11
> different groups.  This is a problem for which I see three possible
> solutions (and maybe there are others).
> 
> 1) recode the regular expression so that it captures fewer groups.  I'm
> working on that...

The regex(1) utility, which comes with the LDM package, can probably help.  
Execute the command "man regex" for more information.  You can also use it to 
time your matches and, thus, improve their efficiency.
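
As a rough illustration of option 1, the sketch below counts the parenthesized
subexpressions in the original pattern and in a tightened alternative. The
TIGHTER pattern is hypothetical: it assumes every timestamp looks like the one
in the sample name (eight digits, "T", four digits) and that nothing follows
the two-digit field before the final underscore. Python's re module is used
only as a stand-in for checking group counts; pqact itself uses POSIX regular
expressions, so confirm any revised pattern with regex(1) as well.

```python
import re

# Sample file name taken from the original message.
NAME = "WWRFDM01-RCI_20090904T0000_20090904T0015_20090904T2330_00_ZRENCInwnd.nc.gz"

# Original pattern from the message: 11 parenthesized subexpressions.
ORIGINAL = (
    r"^(WWRFDM01-RCI)_(.*)T(.*)([0-9][0-9])_(.*)_(.*)T(.*)([0-9][0-9])"
    r"_([0-9][0-9])(.*)_(.*RENCI.*nc.gz$)"
)

# Hypothetical tightened pattern (an assumption, not the author's):
# explicit digit counts replace the ".*" runs, the dot before "gz" is
# escaped, and only five fields are captured.
TIGHTER = (
    r"^WWRFDM01-RCI_([0-9]{8}T[0-9]{4})_([0-9]{8}T[0-9]{4})"
    r"_([0-9]{8}T[0-9]{4})_([0-9][0-9])_(.*RENCI.*nc\.gz)$"
)


def group_count(pattern):
    """Return the number of capturing groups in a pattern."""
    return re.compile(pattern).groups


def fields(pattern, name):
    """Return the captured fields as a tuple, or None if there is no match."""
    m = re.search(pattern, name)
    return m.groups() if m else None


if __name__ == "__main__":
    print(group_count(ORIGINAL), group_count(TIGHTER))
    print(fields(TIGHTER, NAME))
```

With nine or fewer groups, every field stays reachable through "\1" to "\9".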

> 2) figure out how to express \10 and \11. Do you know the syntax for
> that? I haven't found it anywhere

The pqact(1) configuration-file syntax for backreferences beyond "\9" is 
"\(nn)" (e.g., "\(10)", "\(11)").  For more information, see 
<http://www.unidata.ucar.edu/software/ldm/ldm-6.8.1/basics/pqact.conf.html#argref>.

> 3) Use a token that represents the entire pattern. Is there such a
> pattern and if so what is it?

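If you nest the entire pattern in another pair of parentheses, then the entire 
matching string is available via the backreference "\1", because 
subexpressions are numbered by the order of their opening parentheses.  If you 
do this, then you'll have to increment all other backreferences by one (e.g., 
the old "\1" becomes "\2").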
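
The renumbering can be checked with a short sketch. It wraps the pattern from
the original message in one extra pair of parentheses and confirms that group 1
then holds the whole file name while every earlier group moves up by one. The
names WRAPPED, whole_name and shift_holds are made up for this illustration.
Python's re module is only a stand-in here: pqact uses POSIX regular
expressions, so the text captured by the inner ".*" groups may differ between
the two engines, but the group numbering works the same way.

```python
import re

# Sample file name taken from the original message.
NAME = "WWRFDM01-RCI_20090904T0000_20090904T0015_20090904T2330_00_ZRENCInwnd.nc.gz"

# Original pattern from the message: 11 parenthesized subexpressions.
ORIGINAL = (
    r"^(WWRFDM01-RCI)_(.*)T(.*)([0-9][0-9])_(.*)_(.*)T(.*)([0-9][0-9])"
    r"_([0-9][0-9])(.*)_(.*RENCI.*nc.gz$)"
)

# The same pattern nested in one more pair of parentheses, so the new
# outermost group is number 1 and the old groups become 2 through 12.
WRAPPED = "(" + ORIGINAL + ")"


def whole_name(name):
    """Return the text of group 1 of WRAPPED, or None if there is no match."""
    m = re.search(WRAPPED, name)
    return m.group(1) if m else None


def shift_holds(name):
    """Check that old group i equals new group i + 1 for every old group."""
    old = re.search(ORIGINAL, name)
    new = re.search(WRAPPED, name)
    if old is None or new is None:
        return False
    return all(
        new.group(i + 1) == old.group(i)
        for i in range(1, old.re.groups + 1)
    )


if __name__ == "__main__":
    print(whole_name(NAME))
    print(shift_holds(NAME))
```

In a pqact entry the decoder would then receive the complete name as "\1",
without any need for "\(10)" or higher.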

> Thanks much
> Howard
> --
> Howard Lander <mailto:address@hidden>
> Senior Research Software Developer
> Renaissance Computing Institute <http://www.renci.org>
> The University of North Carolina at Chapel Hill
> Duke University
> North Carolina State University
> 100 Europa Drive
> Suite 540
> Chapel Hill, NC 27517
> 919-445-9651

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: GRO-598397
Department: Support LDM
Priority: Normal
Status: Closed