Extended regular expressions (ERE-s) are used in several places in the LDM system -- primarily as a pattern for matching
The official description of ERE-s can be found at http://www.opengroup.org/onlinepubs/7908799/xbd/re.html#tag_007_004 .
An excellent tool for determining a matching ERE from example text can be found at http://txt2re.com.
Here is an incomplete summary of ERE syntax:
Text:.Any single character[chars]Character class: One of chars[^chars]Character class: None of chars text1|text2 Alternative: text1 or text2 Quantifiers:?0 or 1 of the preceding text*0 or N of the preceding text (N > 0)+1 or N of the preceding text (N > 1){M,N}M through N of the preceding text (either may be missing) Grouping:(text)Grouping of text (either to set the borders of an alternative or for making backreferences where the Nth group can be used in a pqact PIPE action with (for example)\N) Anchors:^Start of string anchor$End of string anchor Escaping:\char escape that particular character (for instance to specify the characters ".[]()" etc.)
Don't use ERE-s with a ".*" prefix because:
The inefficiency of pathological ERE-s can be seen by using the UNIX® time utility with the LDM's regex utility. First the non-pathological case:
$ time regex -n 10000 \
-s 'lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd' \
"some-sort-of-pattern"
no match
real 0m0.044s
user 0m0.040s
sys 0m0.000s
The above indicates that ten-thousand comparisons of the given string
against the ERE took 0.04 seconds of user-time.
The timing of the corresponding pathological ERE is much different:
$ time regex -n 10000 \
-s 'lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd' \
".*some-sort-of-pattern"
no match
real 0m18.424s
user 0m17.720s
sys 0m0.020s
The above took 17.72 seconds of user-time, meaning that the
non-pathological ERE is 443 times more efficient than the pathological
one. More complex pathological ERE-s have even worse results.