Regular Expressions
The Digital Mars regular expression engine is common to both the C++ Library RegExp and the D Library RegExp. Regular expressions are patterns specified by the following sequences:
c | matches literal character c. |
. | matches any character. |
* | matches previous character/subexpression 0 or more times. |
+ | matches previous character/subexpression 1 or more times. |
? | matches previous character/subexpression 0 or 1 times. |
^ | matches beginning of line. |
$ | matches end of line. |
{n} | matches previous character/subexpression n times. |
{n,} | matches previous character/subexpression n or more times. |
{n,m} | matches previous character/subexpression n to m times. |
[class] | matches a character in the character class. |
[^class] | matches a character not in the character class. |
A|B | matches regular expression A or regular expression B. |
(exp) | matches regular subexpression exp. |
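The constructs in the table above can be tried out in any engine that shares them. The sketch below uses Python's `re` module purely as a stand-in: it is a different engine from the Digital Mars one, and `re.fullmatch` and `re.findall` are Python calls, not part of the RegExp libraries described here. Only syntax listed in the table is used in the patterns.

```python
import re

# Quantifiers: * (0 or more), + (1 or more), ? (0 or 1)
star = re.fullmatch(r"ab*", "a") is not None               # zero b's is allowed
plus = re.fullmatch(r"ab+", "a") is not None               # at least one b is required
optional = re.fullmatch(r"colou?r", "color") is not None   # the u may be absent

# Counted repetition: {n}, {n,}, {n,m}
exact = re.fullmatch(r"a{3}", "aaa") is not None
at_least = re.fullmatch(r"a{2,}", "aaaaa") is not None
between = re.fullmatch(r"a{2,4}", "aaaaa") is not None     # five a's exceeds m = 4

# Character classes: [class] and [^class]
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# Alternation A|B inside a subexpression (exp)
pet = re.fullmatch(r"(cat|dog)s?", "dogs")
pet_kind = pet.group(1)

# Anchors: ^ (beginning of line) and $ (end of line)
whole_line = re.search(r"^abc$", "abc") is not None
partial_line = re.search(r"^abc$", "xabc") is not None

# . matches any character
any_char = re.fullmatch(r"a.c", "a-c") is not None
```

Other engines may differ in edge cases, so results should be confirmed against the engine actually in use.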
Escape sequences:
\nnn | starts a 1-, 2-, or 3-digit octal sequence, where n is an octal digit. If nnn is larger than 0377, the 3rd digit is not part of the sequence and is not consumed. For maximum portability to other regular expression engines, use exactly 3 digits. |
\xXX | starts a 1- or 2-digit hex sequence, where X is a hex character. If the first character after the \x is not a hex character, the value of the sequence is 'x' and the XX is not consumed. For maximum portability to other regular expression engines, use exactly 2 digits. |
\uUUUU | is a Unicode sequence of exactly 4 hex characters after the \u. If any of the four is not a hex character, the value of the sequence is 'u' and the UUUU is not consumed. |
\b | matches a backspace character (in character classes only). |
\b | matches a word boundary (when not in character classes). |
\B | matches when not on a word boundary (when not in character classes). |
\cC | matches the control character corresponding to the letter C. |
\d | matches a 0..9 digit. |
\D | matches any character but a 0..9 digit. |
\f | matches a formfeed character. |
\n | matches a linefeed character. |
\r | matches a carriage return character. |
\s | matches whitespace, one of \f, \n, \r, \t or \v. |
\S | matches any character but those recognized by \s. |
\t | matches a tab character. |
\v | matches a vertical tab character. |
\w | matches a word character. |
\W | matches any character that is not a word character. |
\\ | matches the \ character. |
\char | matches char literally if char is not one of the above. |
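A few of the escape sequences above are sketched below, again using Python's `re` module as a stand-in rather than the Digital Mars engine. The numeric escapes follow the portability advice in the table (exactly 3 octal digits, exactly 2 hex digits, exactly 4 hex characters after \u), because Python does not implement the shorter forms or the fallback to a literal 'x' or 'u'. Python also lacks \cC, and its \s and \w accept more characters than the lists above, so the inputs stick to characters both descriptions agree on.

```python
import re

# \d matches a 0..9 digit; \D matches anything else
digits = re.findall(r"\d+", "order 66, aisle 7")
non_digits = re.findall(r"\D+", "66abc7")

# \w matches a word character
words = re.findall(r"\w+", "foo bar-9")

# \b is a word boundary outside a character class...
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
# ...and \B matches only where there is no word boundary
inner = re.findall(r"\Bcat", "cat concat")
# Inside a character class, \b is the backspace character (0x08)
has_backspace = re.search(r"[\b]", "a\x08b") is not None

# \s on a tab and a linefeed
spaces = re.findall(r"\s", "a\tb\nc")

# Numeric escapes, all spelling the letter 'A' (code 65)
octal_a = re.fullmatch(r"\101", "A") is not None
hex_a = re.fullmatch(r"\x41", "A") is not None
unicode_a = re.fullmatch(r"\u0041", "A") is not None

# \\ matches a backslash; \char matches char literally
backslash = re.fullmatch(r"\\", "\\") is not None
literal_dot = re.fullmatch(r"\.", "a") is not None   # an escaped . is no longer "any character"
```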
attributes[] contains zero or more of the following:
g | global match - used to pick off a sequence of matches rather than starting over each time from the beginning of the input string. |
i | case-insensitive match. |
m | multiline match. |
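The effect of these attributes can be approximated in Python's `re` module, which is used here only as a stand-in and does not take an attributes string. The mapping is an assumption for illustration: `re.IGNORECASE` stands in for i, `re.MULTILINE` for m, and `re.finditer` for the "pick off a sequence of matches" behaviour of g. The sketch also assumes that multiline has its usual meaning of letting ^ and $ match at line breaks, which the table does not spell out.

```python
import re

text = "one\nTwo\nthree"

# i: case-insensitive matching
case_sensitive = re.findall(r"t\w+", text)
case_insensitive = re.findall(r"t\w+", text, re.IGNORECASE)

# m: with multiline, ^ and $ match at each line break (assumed meaning)
single_line = re.findall(r"^\w+$", text)
multi_line = re.findall(r"^\w+$", text, re.MULTILINE)

# g: without a global match, every search starts over from the beginning
# of the input and finds the same first match again
numbers = "10 20 30"
first_try = re.search(r"\d+", numbers).group()
second_try = re.search(r"\d+", numbers).group()

# With a global-style scan, each match resumes where the previous one ended
sequence = [m.group() for m in re.finditer(r"\d+", numbers)]
```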