digitalmars.com                      
Last update Sat Jun 18 16:54:33 2011

regexp.h

RegExp is a C++ class for handling regular expressions, a powerful method of string pattern matching. The class provides the foundation for adding pattern matching to programs such as grep, text editors, awk, and sed. The regular expression language is the commonly used one; however, some of the very advanced forms may behave slightly differently.

The linkable runtime library version of RegExp is built for ASCII characters. To use it with Unicode, compile the file \dm\src\core\regexp.cpp with _UNICODE defined:

dmc -c regexp -D_UNICODE

RegExp class members are:
RegExp();
~RegExp();

unsigned re_nsub;
regmatch_t *pmatch;

int compile(char *pattern,
   char *attributes, int ref);
int test(char *string,
   int startindex = 0);

char *replace(char *format);
char *replace2(char *format);

static char *replace3(char *format,
    char *input,
    unsigned re_nsub, regmatch_t *pmatch);
static char *replace4(char *input,
    regmatch_t *text, char *replacement);

struct regmatch_t
Is a simple type representing a match. It contains the members:
int rm_so; index into the input string of the start of a match
int rm_eo; index just past the end of the match
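Because rm_eo is the index just past the end of the match, the two members describe a half-open range, and the match length is rm_eo - rm_so. The following is a minimal sketch of copying the matched text out of the input string. Span and matched_text() are hypothetical names used so that the example compiles without regexp.h; they are not part of the library.

```cpp
#include <string>

// Hypothetical stand-in for regmatch_t, so this compiles without regexp.h.
struct Span {
    int rm_so;  // index of the first character of the match
    int rm_eo;  // index just past the last character of the match
};

// Copy the matched characters [rm_so, rm_eo) out of input.
std::string matched_text(const char *input, const Span &m)
{
    return std::string(input + m.rm_so, input + m.rm_eo);
}
```

For example, with input "hello world" and a span of {6, 11}, matched_text() returns "world".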
RegExp();
This is the constructor. It builds a regular expression object.
~RegExp();
This is the destructor.
unsigned re_nsub;
regmatch_t *pmatch;
re_nsub is the number of parenthesized subexpressions in the regular expression. pmatch[0] contains the match for the entire regular expression; the matches for the subexpressions, if any, are in pmatch[1..re_nsub].
int compile(char *pattern, char *attributes, int ref);
Compiles a regular expression given by pattern and modified by attributes into an internal format. A regular expression must be compiled before it is used. Separating the compilation step from the match step means that the time consuming compilation needs to be done only once, and then the fast executing internal format can be used for repeated matches.
pattern
is the regular expression string.
attributes
is NULL or a string containing either or both of the characters i and g:
i the regular expression is case insensitive
g the regular expression is global
ref
is a flag to the RegExp object:
0 The RegExp object should make its own copy of pattern.
1 The RegExp object can refer to the caller's copy of the pattern string. This means that the pattern string must not be freed or deleted until the RegExp object has been destroyed.
Return Value
!=0 Successful compilation
0 Failed to compile - the regular expression was not valid
int test(char *string, int startindex = 0);
Scans a string starting at position startindex looking for a match against the previously compiled regular expression.

Return Value

!=0 Successful match. Member pmatch[0] is set to the location of the overall match, and pmatch[1..re_nsub] hold any subexpression matches.
0 Failed to find a match.
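The compile-once, match-many pattern and the offset pairs reported by test() can be illustrated with the standard C++ <regex> library. This is an analogy only: it does not use RegExp, and std::regex syntax and matching rules may differ from those of RegExp. Span and test_from() are hypothetical names introduced for the sketch.

```cpp
#include <regex>
#include <string>
#include <vector>

struct Span { int rm_so; int rm_eo; };  // hypothetical stand-in for regmatch_t

// Search s starting at startindex. On success, fill spans with one entry for
// the overall match followed by one per subexpression, as offsets from the
// start of s (mirroring the layout of pmatch[0..re_nsub]).
bool test_from(const std::regex &re, const std::string &s, int startindex,
               std::vector<Span> &spans)
{
    std::smatch m;
    if (!std::regex_search(s.begin() + startindex, s.end(), m, re))
        return false;
    spans.clear();
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (!m[i].matched) {             // subexpression took no part in the match
            spans.push_back({-1, -1});   // assumption: mark it with negative offsets
            continue;
        }
        // position() is relative to the search start, so add startindex back.
        int so = startindex + static_cast<int>(m.position(i));
        spans.push_back({so, so + static_cast<int>(m.length(i))});
    }
    return true;
}
```

The std::regex object is constructed once, which corresponds to the compile() step, and can then be passed to test_from() for any number of searches.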
char *replace(char *format);
Once a regular expression has been run through compile() and matched against a source string with test(), then replace() can be used to merge a format string with the matched text.

The format string consists of:

& replace this character with the text of the overall match, specified by member pmatch[0].
\n replace with the nth subexpression, where n is 1..9, specified by member pmatch[n].
\c replace with the character c.
c any other characters c are copied to the output string.

Return Value

The merged output string is returned. It is allocated with malloc(), so the caller should release it with free().
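The format rules for replace() can be sketched as a small expansion routine. This is an illustration of the documented rules, not the library's implementation. It assumes that & expands to the overall match in pmatch[0], that a reference to a subexpression beyond re_nsub (or one with negative offsets) expands to nothing, and that a trailing lone backslash is copied as is. It also returns a std::string rather than a malloc()'d buffer. Span and expand_format() are hypothetical names.

```cpp
#include <string>

struct Span { int rm_so; int rm_eo; };  // hypothetical stand-in for regmatch_t

// Expand a replace()-style format against one match.
// pmatch[0] is the overall match; pmatch[1..re_nsub] are the subexpressions.
std::string expand_format(const char *format, const char *input,
                          unsigned re_nsub, const Span *pmatch)
{
    std::string out;
    for (const char *p = format; *p; ++p) {
        if (*p == '&') {
            // & : the overall match
            out.append(input + pmatch[0].rm_so, input + pmatch[0].rm_eo);
        } else if (*p == '\\' && p[1] != '\0') {
            ++p;
            if (*p >= '1' && *p <= '9') {
                // \n : the nth subexpression
                unsigned n = *p - '0';
                // Assumption: out-of-range or unset spans expand to nothing.
                if (n <= re_nsub && pmatch[n].rm_so >= 0)
                    out.append(input + pmatch[n].rm_so, input + pmatch[n].rm_eo);
            } else {
                out += *p;  // \c : the character c itself
            }
        } else {
            out += *p;      // any other character is copied unchanged
        }
    }
    return out;
}
```

With input "John Smith", an overall match of {0, 10}, and subexpressions {0, 4} and {5, 10}, the format `\2, \1` expands to "Smith, John".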
char *replace2(char *format);
This is more advanced than replace() in that it can handle more than 9 subexpressions. The format string consists of:
$$ $
$& The matched substring.
$` The portion of string that precedes the matched substring.
$' The portion of string that follows the matched substring.
$n The nth capture, where n is a single digit 1-9 and $n is not followed by a decimal digit. If n <= re_nsub and the nth capture is undefined, use the empty string instead. If n > re_nsub, no characters are copied to the output.
$nn The nnth capture, where nn is a two-digit decimal number 01-99. If nn <= re_nsub and the nnth capture is undefined, use the empty string instead. If nn > re_nsub, no characters are copied to the output.
$c where c is any other character causes $c to be copied to the output string.
c any other characters c are copied to the output string.
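The replace2() format rules above can be sketched the same way. This is an illustration of the documented table, not the library's implementation. It assumes that an undefined capture is marked by negative offsets, that $0 and $00 are treated as the "$c" case, and that a trailing lone $ is copied as is; none of these details are stated above. Span and expand_dollar_format() are hypothetical names, and the result is a std::string rather than a malloc()'d buffer.

```cpp
#include <cctype>
#include <string>

struct Span { int rm_so; int rm_eo; };  // hypothetical stand-in for regmatch_t

// Expand a replace2()-style format against one match.
std::string expand_dollar_format(const char *format, const char *input,
                                 unsigned re_nsub, const Span *pmatch)
{
    std::string out;
    for (const char *p = format; *p; ++p) {
        if (*p != '$' || p[1] == '\0') {  // ordinary character, or a trailing '$'
            out += *p;
            continue;
        }
        char c = *++p;                    // the character after '$'
        if (c == '$') {
            out += '$';                                          // $$
        } else if (c == '&') {
            out.append(input + pmatch[0].rm_so, input + pmatch[0].rm_eo);
        } else if (c == '`') {
            out.append(input, input + pmatch[0].rm_so);          // text before the match
        } else if (c == '\'') {
            out.append(input + pmatch[0].rm_eo);                 // text after the match
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            unsigned n = c - '0';
            bool two = std::isdigit(static_cast<unsigned char>(p[1])) != 0;
            if (two)
                n = n * 10 + (p[1] - '0');                       // $nn
            if (n == 0) {                 // assumption: $0 and $00 are not captures
                out += '$';
                out += c;
                continue;
            }
            if (two)
                ++p;                      // consume the second digit
            // n > re_nsub copies nothing; an undefined capture is the empty string.
            if (n <= re_nsub && pmatch[n].rm_so >= 0)
                out.append(input + pmatch[n].rm_so, input + pmatch[n].rm_eo);
        } else {
            out += '$';                   // $c : copied through unchanged
            out += c;
        }
    }
    return out;
}
```

With input "x foo=bar y", an overall match of {2, 9}, and captures {2, 5} and {6, 9}, the format "$2=$1" expands to "bar=foo".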
static char *replace3(char *format, char *input, unsigned re_nsub, regmatch_t *pmatch);
replace3() is the same as replace2(), except that it does not need a RegExp instance. Instead, it requires the parameters:
input The source string that the matched text comes from.
re_nsub Number of subexpression matches.
pmatch[1+re_nsub] Array of those subexpression matches, with pmatch[0] being the match for the entire expression.
static char *replace4(char *input, regmatch_t *text, char *replacement);
replace4() does not require a RegExp instance. It performs a simple merge of input with replacement. The characters from input[text->rm_so] up to, but not including, input[text->rm_eo] are replaced with replacement.
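The merge performed by replace4() amounts to splicing the replacement into the input in place of the matched range. A minimal sketch follows, assuming the range is half-open as given by the rm_so and rm_eo members of regmatch_t. Span and splice() are hypothetical names, and the sketch returns a std::string; it is not the library's implementation.

```cpp
#include <string>

struct Span { int rm_so; int rm_eo; };  // hypothetical stand-in for regmatch_t

// Replace the characters [rm_so, rm_eo) of input with replacement.
std::string splice(const char *input, const Span &text, const char *replacement)
{
    std::string out(input, input + text.rm_so);  // text before the match
    out += replacement;                          // the replacement itself
    out += input + text.rm_eo;                   // text after the match
    return out;
}
```

For example, splicing "there" into "hello world" over the span {6, 11} gives "hello there".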