Extended regular expressions (ERE-s) are used in several places in the LDM system -- primarily as a pattern for matching
The official description of ERE-s can be found at http://www.opengroup.org/onlinepubs/7908799/xbd/re.html#tag_007_004 .
An excellent tool for determining a matching ERE from example text can be found at http://txt2re.com.
Here is an incomplete summary of ERE syntax:
Text:.
Any single character[
chars]
Character class: One of chars[^
chars]
Character class: None of chars text1|
text2 Alternative: text1 or text2 Quantifiers:?
0 or 1 of the preceding text*
0 or N of the preceding text (N > 0)+
1 or N of the preceding text (N > 1){M,N}
M through N of the preceding text (either may be missing) Grouping:(
text)
Grouping of text (either to set the borders of an alternative or for making backreferences where the Nth group can be used in a pqact PIPE action with (for example)\
N) Anchors:^
Start of string anchor$
End of string anchor Escaping:\
char escape that particular character (for instance to specify the characters ".[]()
" etc.)
Don't use ERE-s with a ".*" prefix because:
The inefficiency of pathological ERE-s can be seen by using the UNIX® time utility with the LDM's regex utility. First the non-pathological case:
The above indicates that ten-thousand comparisons of the given string against the ERE took 0.04 seconds of user-time.$ time regex -n 10000 \ -s 'lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd' \ "some-sort-of-pattern" no match real 0m0.044s user 0m0.040s sys 0m0.000s
The timing of the corresponding pathological ERE is much different:
The above took 17.72 seconds of user-time, meaning that the non-pathological ERE is 443 times more efficient than the pathological one. More complex pathological ERE-s have even worse results.$ time regex -n 10000 \ -s 'lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd' \ ".*some-sort-of-pattern" no match real 0m18.424s user 0m17.720s sys 0m0.020s