[LDM #GRO-598397]: Question about regular expression in pqact file



Howard,

> I ran across the following issue when working on my pqact.conf_exp
> file.
> 
> I have files that are similar to the following example:
> 
> WWRFDM01-RCI_20090904T0000_20090904T0015_20090904T2330_00_ZRENCInwnd.nc.gz
> 
> I want to match these files and pipe them to a decoder I have. Here is
> the regular expression I am using.
> 
> ^(WWRFDM01-RCI)_(.*)T(.*)([0-9][0-9])_(.*)_(.*)T(.*)([0-9][0-9])_([0-9][0-9])(.*)_(.*RENCI.*nc.gz$)
> 
> which does parse the pattern correctly.
> 
> However, I want to pass the complete file name to my script.  I have
> been doing this by reconstructing the original file name using the
> back references.  However we have just made a change to our system and
> the filename is now being parsed using the above regex into 11
> different groups.  This is a problem for which I see three possible
> solutions (and maybe there are others).
> 
> 1) recode the regular expression so that it captures fewer groups.  I'm
> working on that...

The regex(1) utility, which comes with the LDM package, can probably help.  
Execute the command "man regex" for more information.  You can also use it to 
time your matches and, thus, improve their efficiency.

> 2) figure out how to express \10 and \11. Do you know the syntax for
> that? I haven't found it anywhere

The pqact(1) configuration-file syntax for backreferences beyond "\9" is 
"\(nn)" (e.g., "\(10)", "\(11)").  For more information, see 
<http://www.unidata.ucar.edu/software/ldm/ldm-6.8.1/basics/pqact.conf.html#argref>.
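
As a rough check of why the two-digit references are needed, the sketch below 
counts the capture groups in the pattern quoted above and rebuilds the file 
name from them. It uses Python's "re" module as a stand-in, which is an 
assumption: pqact uses its own regular-expression library and its own "\(nn)" 
syntax, and the variable names here are purely illustrative.

```python
import re

# Pattern and sample file name copied unchanged from the message above.
PATTERN = (r"^(WWRFDM01-RCI)_(.*)T(.*)([0-9][0-9])_(.*)_(.*)T(.*)([0-9][0-9])"
           r"_([0-9][0-9])(.*)_(.*RENCI.*nc.gz$)")
FILENAME = ("WWRFDM01-RCI_20090904T0000_20090904T0015_"
            "20090904T2330_00_ZRENCInwnd.nc.gz")

compiled = re.compile(PATTERN)
match = compiled.match(FILENAME)

# Number of parenthesized groups; anything above 9 needs "\(nn)" in pqact.
group_count = compiled.groups

# g[0] is group 1, ..., g[9] is group 10, g[10] is group 11.
g = match.groups()
rebuilt = (f"{g[0]}_{g[1]}T{g[2]}{g[3]}_{g[4]}_{g[5]}T{g[6]}{g[7]}"
           f"_{g[8]}{g[9]}_{g[10]}")

print(group_count)          # 11
print(rebuilt == FILENAME)  # True
```

The rebuilt string corresponds to writing 
"\1_\2T\3\4_\5_\6T\7\8_\9\(10)_\(11)" in the pqact action, so the last two 
pieces cannot be expressed with "\1" through "\9" alone.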

> 3) Use a token that represents the entire pattern. Is there such a
> pattern and if so what is it?

If you nest the entire pattern in another pair of parentheses, then the entire 
matching string is available via the backreference "\1".  If you do this, then 
you'll have to increment all other backreferences by one (the old "\1" becomes 
"\2", and so on up to the old "\(11)", which becomes "\(12)").
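
The effect of the extra pair of parentheses can be sketched as follows, again 
using Python's "re" module as a stand-in for the pqact matcher (an assumption; 
the names INNER and WRAPPED are illustrative only). The original pattern is 
wrapped unchanged, so group 1 becomes the whole file name and every original 
group moves up by one.

```python
import re

# Original pattern and sample file name from the message above.
INNER = (r"^(WWRFDM01-RCI)_(.*)T(.*)([0-9][0-9])_(.*)_(.*)T(.*)([0-9][0-9])"
         r"_([0-9][0-9])(.*)_(.*RENCI.*nc.gz$)")
FILENAME = ("WWRFDM01-RCI_20090904T0000_20090904T0015_"
            "20090904T2330_00_ZRENCInwnd.nc.gz")

# Nest the entire pattern in one more pair of parentheses.
WRAPPED = "(" + INNER + ")"

match = re.match(WRAPPED, FILENAME)

whole_name = match.group(1)   # the complete matching string
old_group_1 = match.group(2)  # what used to be group 1
old_group_11 = match.group(12)  # what used to be group 11

print(whole_name == FILENAME)  # True
print(old_group_1)             # WWRFDM01-RCI
print(old_group_11)            # ZRENCInwnd.nc.gz
```

With this form the script only needs "\1" to receive the complete file name, 
and no reconstruction from the individual pieces is required.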

> Thanks much
> Howard
> --
> Howard Lander <mailto:howard@xxxxxxxxx>
> Senior Research Software Developer
> Renaissance Computing Institute <http://www.renci.org>
> The University of North Carolina at Chapel Hill
> Duke University
> North Carolina State University
> 100 Europa Drive
> Suite 540
> Chapel Hill, NC 27517
> 919-445-9651

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: GRO-598397
Department: Support LDM
Priority: Normal
Status: Closed


 
 