www.digitalmars.com

D Programming Language 1.0


Last update Sun Dec 30 20:34:43 2012

Regular Expressions

Regular expressions are a powerful tool for pattern matching on strings of text. They are built into the core of languages like Perl, Ruby, and JavaScript, and Perl and Ruby are particularly renowned for handling them adroitly. So why aren't they part of the D core language? Read on to see how they're done in D compared with Ruby.

This article explains how to use regular expressions in D. It doesn't explain regular expressions themselves; after all, people have written entire books on that topic. D's implementation of regular expressions is contained entirely in the Phobos library module std.regexp. For a more advanced treatment of using regular expressions in conjunction with template metaprogramming, see Templates Revisited.

In Ruby a regular expression can be created as a special literal:

r = /pattern/
s = /p[1-5]\s*/

D has no special literal syntax for regular expressions; instead, they are created by constructing RegExp objects:

RegExp r, s;
r = RegExp("pattern");
s = RegExp(r"p[1-5]\s*");

When a pattern contains backslash characters (\), it is convenient to write it as a wysiwyg string literal, marked with the r prefix, so the backslashes need not be escaped. Here r and s are of type RegExp, and type inference can declare and initialize them in one step:

auto r = RegExp("pattern");
auto s = RegExp(r"p[1-5]\s*");
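A compiled RegExp can be kept and reused across many inputs. The sketch below assumes D 1.0's std.regexp, where the optional second argument to RegExp is an attribute string ("i" requests case-insensitive matching) and the test() member returns non-zero when the pattern is found; the helper name matches() is purely illustrative.

```d
import std.regexp;
import std.stdio;

// Hypothetical helper: true if re matches anywhere in text.
bool matches(RegExp re, char[] text)
{
    // test() returns non-zero when the pattern is found.
    return re.test(text) != 0;
}

void main()
{
    // Compile the pattern once; "i" makes the match case-insensitive.
    auto re = RegExp(r"p[1-5]\s*", "i");

    // Reuse the same compiled object for several inputs.
    writefln("%s", matches(re, "p3 "));  // true
    writefln("%s", matches(re, "P5"));   // true: "i" ignores case
    writefln("%s", matches(re, "p9"));   // false: 9 is outside [1-5]
    writefln("%s", matches(re, "xp1"));  // true: the match need not start at index 0
}
```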

To check for a match of a string s with a regular expression in Ruby, use the =~ operator, which returns the index of the first match:

s = "abcabcabab"
s =~ /b/   # match, returns 1
s =~ /f/   # no match, returns nil

In D this looks like:

auto s = "abcabcabab";
std.regexp.find(s, "b");    /* match, returns 1 */
std.regexp.find(s, "f");    /* no match, returns -1 */

Note the parallel with std.string.find, which searches for a plain substring rather than for a regular expression match.
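The difference matters as soon as the pattern contains a metacharacter. In this sketch, which assumes the D 1.0 signatures of std.string.find and std.regexp.find (both returning an index, or -1 when nothing is found), a dot is literal for the substring search but matches any character for the regular expression search:

```d
import std.regexp;
import std.string;
import std.stdio;

void main()
{
    char[] s = "3.14";

    // Substring search: "." is an ordinary character, found at index 1.
    int i = std.string.find(s, ".");

    // Regular expression search: "." matches any character, so index 0.
    int j = std.regexp.find(s, ".");

    // Escaping the dot makes the regular expression match it literally.
    int k = std.regexp.find(s, r"\.");

    writefln("%s %s %s", i, j, k);   // 1 0 1
}
```

The fully qualified names are needed here because both modules define a function called find.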

The Ruby =~ operator also sets implicitly defined variables describing the match: $` holds the text before the match, $& the matched text, and $' the text after it:

s = "abcdef"
if s =~ /c/
    "#{$`}[#{$&}]#{$'}"   # generates the string ab[c]def
end

The function std.regexp.search() returns a RegExp object describing the match, or null if there is no match, so the same information is available in D:

auto m = std.regexp.search("abcdef", "c");
if (m)
    writefln("%s[%s]%s", m.pre, m.match(0), m.post);

Or even more concisely as:

if (auto m = std.regexp.search("abcdef", "c"))
    writefln("%s[%s]%s", m.pre, m.match(0), m.post); // writes ab[c]def
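Besides the whole match, the returned RegExp also exposes parenthesized subexpressions. A minimal sketch, assuming match(n) returns the nth group, with match(0) being the entire match as above; yearOf() is a hypothetical helper:

```d
import std.regexp;
import std.stdio;

// Hypothetical helper: extracts the year from a YYYY-MM-DD date, or null.
char[] yearOf(char[] text)
{
    if (auto m = std.regexp.search(text, r"(\d+)-(\d+)-(\d+)"))
        return m.match(1);   // first parenthesized group
    return null;             // search() found no match
}

void main()
{
    auto m = std.regexp.search("Updated 2012-12-30", r"(\d+)-(\d+)-(\d+)");
    if (m)
    {
        writefln("%s", m.match(0));                                // 2012-12-30
        writefln("%s/%s/%s", m.match(3), m.match(2), m.match(1));  // 30/12/2012
        writefln("[%s]", m.pre);                                   // [Updated ]
    }
    writefln("%s", yearOf("Updated 2012-12-30"));                  // 2012
}
```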

Search and Replace

Search and replace gets more interesting. In Ruby, "a" is replaced with "ZZ" either for the first occurrence only or for every occurrence:

s = "Strap a rocket engine on a chicken."
s.sub(/a/, "ZZ")    # result: StrZZp a rocket engine on a chicken.
s.gsub(/a/, "ZZ")   # result: StrZZp ZZ rocket engine on ZZ chicken.

In D:

auto s = "Strap a rocket engine on a chicken.";
sub(s, "a", "ZZ");        // result: StrZZp a rocket engine on a chicken.
sub(s, "a", "ZZ", "g");   // result: StrZZp ZZ rocket engine on ZZ chicken.
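The last argument is an attribute string, so other attributes can be combined with "g". A short sketch, assuming sub() accepts "i" (ignore case) just as the RegExp constructor does:

```d
import std.regexp;
import std.stdio;

void main()
{
    char[] s = "Apple and apricot";

    // No attributes: only the first lowercase "a" is replaced.
    writefln("%s", sub(s, "a", "_"));         // Apple _nd apricot

    // "g": every lowercase "a" is replaced.
    writefln("%s", sub(s, "a", "_", "g"));    // Apple _nd _pricot

    // "gi": every "a" is replaced regardless of case.
    writefln("%s", sub(s, "a", "_", "gi"));   // _pple _nd _pricot
}
```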

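The replacement string can refer to parts of the match using $& (the matched text), $` (the text before it), $' (the text after it), $$ (a literal dollar sign), and $1 .. $9 (the parenthesized subexpressions):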

sub(s, "[ar]", "[$&]", "g"); // result: St[r][a]p [a] [r]ocket engine on [a] chicken.
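Numbered references become useful once the pattern has parenthesized groups. This sketch assumes $n in the replacement string refers to the nth group, which makes it easy to reorder parts of each match:

```d
import std.regexp;
import std.stdio;

void main()
{
    // Swap two words: $1 and $2 refer to the two parenthesized groups.
    writefln("%s", sub("John Smith", r"(\w+) (\w+)", "$2, $1"));      // Smith, John

    // With "g", every key=value pair is reversed.
    writefln("%s", sub("a=1, b=2", r"(\w+)=(\w+)", "$2=$1", "g"));   // 1=a, 2=b
}
```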

Or the replacement string can be provided by a delegate:

sub(s, "[ar]",
   (RegExp m) { return toupper(m.match(0)); },
   "g");    // result: StRAp A Rocket engine on A chicken.
The toupper() function comes from std.string.
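Because the replacement is a delegate, it can also carry state from one match to the next. A minimal sketch, with numberMatches() as a hypothetical helper that appends a running count to each match; std.string.toString(int) is assumed for the number-to-string conversion:

```d
import std.regexp;
import std.string;
import std.stdio;

// Hypothetical helper: appends 1, 2, 3, ... after each successive match.
char[] numberMatches(char[] s, char[] pattern)
{
    int n = 0;   // captured by the delegate and updated on every call
    return sub(s, pattern,
        (RegExp m) { n++; return m.match(0) ~ std.string.toString(n); },
        "g");
}

void main()
{
    writefln("%s", numberMatches("a-b-c", "-"));   // a-1b-2c
}
```

If the pattern never matches, sub() hands back the input unchanged and the delegate is never called.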

Looping

A foreach loop can iterate over all the matches within a string:

import std.stdio;
import std.regexp;

void main()
{
    foreach(m; RegExp("ab").search("abcabcabab"))
    {
        writefln("%s[%s]%s", m.pre, m.match(0), m.post);
    }
}
// Prints:
// [ab]cabcabab
// abc[ab]cabab
// abcabc[ab]ab
// abcabcab[ab]
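Since m.pre is the text preceding the current match, its length is that match's starting index. The sketch below uses this to collect every match position in a single loop; matchPositions() is a hypothetical helper, not part of std.regexp:

```d
import std.regexp;
import std.stdio;

// Hypothetical helper: returns the start index of every match of pattern.
int[] matchPositions(char[] pattern, char[] text)
{
    int[] positions;
    foreach (m; RegExp(pattern).search(text))
        positions ~= cast(int) m.pre.length;   // length of the preceding text = start index
    return positions;
}

void main()
{
    foreach (p; matchPositions("ab", "abcabcabab"))
        writefln("%s", p);   // prints 0, 3, 6 and 8 on separate lines
}
```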

Conclusion

D's regular expression handling is as powerful as Ruby's, even though its syntax isn't as concise.




