D - regexp suggestion

Pavel Minayev (14/14) Feb 08 2002 It would be really nice to have a method of RegExp similar to test(),

Walter (4/18) Feb 08 2002 I believe you can already do that with regexp by looking at the match ar...

Pavel Minayev (4/6) Feb 08 2002 array

Walter (3/9) Feb 08 2002 You can also use the "g" attribute.

Pavel Minayev (4/5) Feb 08 2002 Sorry, I'm not very familiar with regexp... how is

Walter (4/9) Feb 09 2002 If you use the "g" attribute to the RegExp constructor, and repeated cal...

Pavel Minayev (4/6) Feb 09 2002 But doesn't it try to search for the regexp further if it doesn't

Walter (4/10) Feb 09 2002 calls

Pavel Minayev (14/17) Feb 09 2002 Then I don't understand how it can be used to tokenize the string.

Sean L. Palmer (6/24) Feb 09 2002 I think sscanf could do this if it could return a pointer to how far it ...

Pavel Minayev (5/8) Feb 09 2002 got

Sean L. Palmer (7/15) Feb 09 2002 sscanf has a lot more power than most people realize. I myself didn't

Walter (7/13) Feb 09 2002 If you're changing the regular expression you're searching for, which is

Pavel Minayev (33/38) Feb 09 2002 for

Pavel Minayev (5/9) Feb 09 2002 Sorry =) This should of course look:

Walter (6/6) Feb 09 2002 All you have to do is:

Karl Bochert (12/23) Feb 09 2002 Looks really awkward. Why doesn't the RegExp class have some query functi...

Pavel Minayev (5/11) Feb 10 2002 If the first token will be r2, and not r1, but there are some r1s

Walter (7/21) Feb 10 2002 Yes, but if you are using multiple RegExp's on the same string, you need...

Pavel Minayev (14/17) Feb 10 2002 two

Karl Bochert (16/37) Feb 10 2002 I may be missing the point here but:

Pavel Minayev (10/13) Feb 10 2002 overall

Karl Bochert (37/57) Feb 10 2002 I probably have some details wrong here, but

Pavel Minayev (5/23) Feb 10 2002 Yep, right. Now I have all the tokens, how do I determine

Walter (5/9) Feb 10 2002 There is no difference if the global attribute is set. If the global

Karl Bochert (6/18) Feb 10 2002 I think I understand. match() without the global attribute set finds al...

Walter (5/14) Feb 10 2002 That's not a problem with parenthesized subexpressions. You can tell whi...

Pavel Minayev (4/7) Feb 10 2002 Walter, where is that match[][] thing? match() returns char[][], which

Walter (6/13) Feb 10 2002 which

Pavel Minayev (9/11) Feb 11 2002 char[][] is the list of tokens, or, to be more exact, the list of their

Walter (14/21) Feb 11 2002 Suppose

Karl Bochert (9/36) Feb 11 2002 Or:

"Pavel Minayev" <evilone omen.ru> writes:

It would be really nice to have a method of RegExp similar to test(),
but only matching regexp at the position given, not advancing
further on error, and returning number of bytes read (or 0 on failure).
It could be used for easy token parsing:

    RegExp identifier = new RegExp('\w+', "");
    char[] code, token;
    int pos;
    ...
    int count = identifier.get(code, pos);
    if (count)
    {
        token = code[pos .. pos + count];
        pos += count;    // next token
    }
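A minimal sketch of the proposed behaviour, written against present-day D's std.regex (which replaced the 2002 RegExp class discussed here), not the API of the time. The helper name `matchLengthAt` is hypothetical. It assumes the caller builds the pattern with a leading `^`, so searching the slice that starts at `pos` cannot drift further into the string; it returns the number of chars matched, or 0 on failure.

```d
import std.regex;

// Hypothetical helper (not part of any library): how many chars of `code`,
// starting exactly at `pos`, match `re`; 0 if the text at `pos` does not match.
// Assumes `re` begins with `^`, so matchFirst cannot skip ahead past `pos`.
// As with the proposed get(), a zero-length match looks the same as failure.
size_t matchLengthAt(string code, size_t pos, Regex!char re)
{
    auto m = matchFirst(code[pos .. $], re);
    return m.empty ? 0 : m.hit.length;
}
```

With the anchored identifier pattern compiled once, a call to `matchLengthAt(code, pos, identifier)` can stand in for `identifier.get(code, pos)` in the loop above. This sketch is about correctness, not speed: whether the engine gives up immediately after `^` fails is an implementation detail.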

Feb 08 2002

"Walter" <walter digitalmars.com> writes:

I believe you can already do that with regexp by looking at the match array
and using it to slice the input array.

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a41ccn$2m50$1 digitaldaemon.com...
 It would be really nice to have a method of RegExp similar to test(),
 but only matching regexp at the position given, not advancing
 further on error, and returning number of bytes read (or 0 on failure).
 It could be used for easy token parsing:

     RegExp identifier = new RegExp('\w+', "");
     char[] code, token;
     int pos;
     ...
     int count = identifier.get(code, pos);
     if (count)
     {
         token = code[pos .. pos + count];
         pos += count;    // next token
     }

Feb 08 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a41imc$2pnk$1 digitaldaemon.com...
 I believe you can already do that with regexp by looking at the match array
 and using it to slice the input array.

Yes, but it's sloooooow!

Feb 08 2002

"Walter" <walter digitalmars.com> writes:

You can also use the "g" attribute.

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a41jep$2q3p$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a41imc$2pnk$1 digitaldaemon.com...
 I believe you can already do that with regexp by looking at the match array
 and using it to slice the input array.

 Yes, but it's sloooooow!

Feb 08 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a41oek$2se5$1 digitaldaemon.com...

 You can also use the "g" attribute.

Sorry, I'm not very familiar with regexp... how is
it supposed to do what I want?

Feb 08 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a42jse$6h1$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a41oek$2se5$1 digitaldaemon.com...

 You can also use the "g" attribute.

 Sorry, I'm not very familiar with regexp... how is
 it supposed to do what I want?

If you use the "g" attribute to the RegExp constructor, repeated calls
to exec() will each pick up where the previous one left off.

Feb 09 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a42tc9$hrc$1 digitaldaemon.com...

 If you use the "g" attribute to the RegExp constructor, and repeated calls
 to exec() will each pick up where the previous left off.

But doesn't it try to search for the regexp further if it doesn't
match at the current position?

Feb 09 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a433vk$l3i$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a42tc9$hrc$1 digitaldaemon.com...

 If you use the "g" attribute to the RegExp constructor, and repeated calls
 to exec() will each pick up where the previous left off.

 But doesn't it try to search for the regexp further if it doesn't
 match in current position?

Yes.

Feb 09 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a43tq3$11uk$2 digitaldaemon.com...

 But doesn't it try to search for the regexp further if it doesn't
 match in current position?

 Yes.

Then I don't understand how it can be used to tokenize the string.
Suppose I have:

    foo123 = bar456 + 789;

Now I first search for the identifier, and get "foo123" and
"bar456". Then I search for numbers and get "123", "456"
and "789" - and only the latter is correct...

With my suggestion implemented, however, it'd look somewhat
different. First I check for identifier, and get "foo123".
Now I advance after the end of that token, and perform another
check... when I get to "789", I check if it matches an
identifier /\w.../ - it doesn't, so I check if it is a number
/[0-9]+/ and succeed... that's how it is supposed to work.
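Pavel's objection can be checked directly. The sketch below uses present-day D's std.regex rather than the 2002 RegExp class; `matchAll` plays the role of repeated global ("g") exec() calls. An unanchored global search for `\d+` also reports the digit tails of identifiers, while an anchored scan that tries identifiers first reports only the standalone number. Both function names are hypothetical.

```d
import std.regex;

// Every `\d+` match anywhere in the text, as a global search reports them.
string[] allNumbers(string text)
{
    string[] found;
    foreach (m; matchAll(text, regex(`\d+`)))
        found ~= m.hit;
    return found;
}

// Only numbers that start a token: identifiers are consumed first, so the
// digits inside "foo123" are never offered to the number pattern.
string[] numberTokens(string text)
{
    auto ident = regex(`^[A-Za-z_]\w*`);
    auto number = regex(`^\d+`);
    string[] found;
    size_t pos = 0;
    while (pos < text.length)
    {
        auto rest = text[pos .. $];
        auto mi = matchFirst(rest, ident);
        if (!mi.empty) { pos += mi.hit.length; continue; }
        auto mn = matchFirst(rest, number);
        if (!mn.empty) { found ~= mn.hit; pos += mn.hit.length; continue; }
        ++pos; // whitespace or punctuation: skip one char
    }
    return found;
}
```

On `foo123 = bar456 + 789;` the first function yields "123", "456" and "789"; the second yields only "789".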

Feb 09 2002

"Sean L. Palmer" <spalmer iname.com> writes:

I think sscanf could do this if it could return a pointer to how far it got
in the input string during processing in addition to how many fields were
converted.  sscanf as it exists in C is not so useful.

Sean

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a443lq$147s$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a43tq3$11uk$2 digitaldaemon.com...

 But doesn't it try to search for the regexp further if it doesn't
 match in current position?

 Yes.

 Then I don't understand how it can be used to tokenize the string.
 Suppose I have:

     foo123 = bar456 + 789;

 Now I first search for the identifier, and get "foo123" and
 "bar456". Then I search for numbers and get "123", "456"
 and "789" - and only the latter is correct...

 With my suggestion implemented, however, it'd look somewhat
 different. First I check for identifier, and get "foo123".
 Now I advance after the end of that token, and perform another
 check... when I get to "789", I check if it matches an
 identifier /\w.../ - it doesn't, so I check if it is a number
 /[0-9]+/ and succeed... that's how it is supposed to work.

Feb 09 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a444t2$14qa$1 digitaldaemon.com...

 I think sscanf could do this if it could return a pointer to how far it got
 in the input string during processing in addition to how many fields were
 converted.  sscanf as it exists in C is not so useful.

Also, if only sscanf understood regexps... =)
That's why I suggest RegExp.scan();

Feb 09 2002

"Sean L. Palmer" <spalmer iname.com> writes:

sscanf has a lot more power than most people realize.  I myself didn't
discover a lot of it until recently.  But it won't tell you where it got to
in the string.

Sean

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a447tq$161o$1 digitaldaemon.com...
 "Sean L. Palmer" <spalmer iname.com> wrote in message
 news:a444t2$14qa$1 digitaldaemon.com...

 I think sscanf could do this if it could return a pointer to how far it got
 in the input string during processing in addition to how many fields were
 converted.  sscanf as it exists in C is not so useful.

 Also, if only sscanf understood regexps... =)
 That's why I suggest RegExp.scan();

Feb 09 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a443lq$147s$1 digitaldaemon.com...
 With my suggestion implemented, however, it'd look somewhat
 different. First I check for identifier, and get "foo123".
 Now I advance after the end of that token, and perform another
 check... when I get to "789", I check if it matches an
 identifier /\w.../ - it doesn't, so I check if it is a number
 /[0-9]+/ and succeed... that's how it is supposed to work.

If you're changing the regular expression you're searching for, which is
what you're doing by switching from looking for an identifier to looking for
a number, you'll need to create a new RegExp for each different regular
expression. Then apply them as required to the remainder of the input
string.

Feb 09 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a446n4$15hm$1 digitaldaemon.com...

 If you're changing the regular expression you're searching for, which is
 what you're doing by switching from looking for an identifier to looking for
 a number, you'll need to create a new RegExp for each different regular
 expression. Then apply them as required to the remainder of the input
 string.

I pre-create them all in the form of an array:

    RegExp[] tokens;

    static this()
    {
        tokens =
            new RegExp('\w+', ""),    // word
            new RegExp('\d+', ""),    // number
            ...
    }

Now how do I apply them to the remainder of the input string (whatever
this means)? I can of course first retrieve identifiers, and remove
them from the array, then get rid of numbers, symbols... etc. But it
would be damn slow.

This could also be done by a "regexp comparison" function, if there
were one:

    // read a token
    for (int i = 0; i < tokens.length; i++)
    {
        // RegExp.cmp() (proposed) returns the number of chars at the
        // beginning of given string that match the regexp, or 0 if no match
        int len = tokens[i].cmp(text[pos .. text.length]);
        if (len)
        {
            // match!
            token = text[pos .. pos + len];
            pos += len;
            break;    // the first pattern that matches wins
        }
    }

Regexp comparison is a good idea anyhow, IMO. Can be used for lots
of different things.
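The same "try each pattern at the current position" loop can be sketched today with D's std.regex, using a leading `^` as a stand-in for the proposed cmp(). This is not the 2002 API; `Token` and `tokenize` are hypothetical names, and every pattern passed in is assumed to start with `^`. The index of the pattern that matched serves as the token's type.

```d
import std.regex;

// Hypothetical token record: index of the matching pattern ("type") and its text.
struct Token
{
    size_t kind;
    string text;
}

// Try each anchored pattern in order at the current position and keep the first
// non-empty match. Order matters: list identifiers before numbers so "foo666"
// is never split. Characters that no pattern matches are skipped one at a time.
Token[] tokenize(string text, Regex!char[] patterns)
{
    Token[] result;
    size_t pos = 0;
    outer: while (pos < text.length)
    {
        foreach (i, re; patterns)
        {
            auto m = matchFirst(text[pos .. $], re);
            if (!m.empty && m.hit.length)
            {
                result ~= Token(i, m.hit);
                pos += m.hit.length;
                continue outer;
            }
        }
        ++pos; // no pattern matched here
    }
    return result;
}
```

With `[regex("^[A-Za-z_]\\w*"), regex("^\\d+")]` as the pattern list, `foo666 = 42` yields an identifier (kind 0) "foo666" and a number (kind 1) "42"; the `=` is skipped.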

Feb 09 2002

"Pavel Minayev" <evilone omen.ru> writes:

         tokens =
             new RegExp('\w+', ""),    // word
             new RegExp('\d+', ""),    // number
             ...

Sorry =) This should of course read:

         tokens =
             new RegExp('\w+', "") ~    // word
             new RegExp('\d+', "") ~    // number
             ...

Feb 09 2002

"Walter" <walter digitalmars.com> writes:

All you have to do is:

    r1 = new RegExp(...);

    m1 = r1.match(input);
    if (m1.length)
        m2 = r2.match(input[&m1[0][0] - &input[0] .. input.length]);

and so on...
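In the snippet above, the slice passed to r2 starts at the offset where r1's match begins; to continue after that match, the slice would start at that offset plus `m1[0].length`. In present-day D's std.regex (not the 2002 API) that remainder is available directly as `Captures.post`. A sketch with a hypothetical function name:

```d
import std.regex;

// Search for `r1`, then search for `r2` only in the text after r1's match.
// Returns r2's matched text, or null if either search fails.
string searchAfter(string input, Regex!char r1, Regex!char r2)
{
    auto m1 = matchFirst(input, r1);
    if (m1.empty)
        return null;
    // m1.post is the slice of `input` that follows the whole match
    auto m2 = matchFirst(m1.post, r2);
    return m2.empty ? null : m2.hit;
}
```

For example, searching `x = 12 + 34` for `=` and then `\d+` gives "12". The same search on `12 = x` finds no digits after the `=` and returns null.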

Feb 09 2002

Karl Bochert <kbochert ix.netcom.com> writes:

On Sat, 9 Feb 2002 15:56:56 -0800, "Walter" <walter digitalmars.com> wrote:
 All you have to do is:
 
     r1 = new RegExp(...);
 
     m1 = r1.match(input);
     if (m1.length)
         m2 = r2.match(input[&m1[0][0] - &input[0] .. input.length]);
 
 and so on...
 
 

Looks really awkward. Why doesn't the RegExp class have some query functions
to hide the gore?

    r1 = new RegExp (...);
    r1.exec(input);

    x = r1.matches ();        // returns number of parenthesized matches
    tail = r1.tail ();        // returns portion of input after match
    m1 = r1.getMatch (n);     // returns the nth matching substring

Regular expressions are very powerful but can also be very complicated.
Shouldn't the class help by providing well-named queries?
In addition it would be more like PCRE, which is already well understood.
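For comparison, present-day D's std.regex (not the 2002 RegExp class) exposes roughly these queries on its `Captures` result. `length - 1` gives the group count, `post` gives the tail, and indexing gives the nth substring. The sketch below maps Karl's proposed names onto them; `MatchInfo` and `query` are hypothetical.

```d
import std.regex;

// Karl's proposed queries, expressed with std.regex's Captures.
// The names in the comments are his proposals, not an existing API.
struct MatchInfo
{
    size_t groups;   // r1.matches(): number of parenthesized groups
    string tail;     // r1.tail(): input after the overall match
    string group1;   // r1.getMatch(1): text of the first group
}

MatchInfo query(string input, Regex!char re)
{
    auto m = matchFirst(input, re);
    if (m.empty)
        return MatchInfo.init;   // no match: zero groups, null strings
    return MatchInfo(m.length - 1, m.post, m.length > 1 ? m[1] : null);
}
```

Matching `(\w+)=(\w+)` against `key=value; rest` reports 2 groups, a tail of "; rest" and a first group of "key".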

Karl Bochert

Feb 09 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a44fdn$18t6$1 digitaldaemon.com...

 All you have to do is:

     r1 = new RegExp(...);

     m1 = r1.match(input);
     if (m1.length)
         m2 = r2.match(input[&m1[0][0] - &input[0] .. input.length]);

 and so on...

If the first token matches r2 and not r1, but there are some r1 matches
further on in the string, the first match() will skip the r2 token and
return the r1 match.

Feb 10 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a45a2l$1kk4$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a44fdn$18t6$1 digitaldaemon.com...

 All you have to do is:

     r1 = new RegExp(...);

     m1 = r1.match(input);
     if (m1.length)
         m2 = r2.match(input[&m1[0][0] - &input[0] .. input.length]);

 and so on...

 If the first token will be r2, and not r1, but there are some r1s
 further in the string, the first match() will skip the r2 and
 get the r1.

Yes, but if you are using multiple RegExp's on the same string, you need to
decide which slices get searched for which patterns. If you are using one
RegExp, just set the "g" attribute. If you use one RegExp to search for two
different patterns, use parenthesized subexpressions, and the match[][]
return will tell you which one was matched.

Feb 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a45e05$1m8o$1 digitaldaemon.com...

 RegExp, just set the "g" attribute. If you use one RegExp to search for two
 different patterns, use parenthesized subexpressions, and the match[][]
 return will tell you which one was matched.

This will tokenize the string, but once I have all the tokens, there's -
once again - the problem of how to determine the type of each token, given
its regexp. Once again suppose the token was "foo666". Once again I need
to check all possible versions, and if I check for the number first, I'll
have a match - "666"... of course a check can be done for starting
position == 0 - which involves too many checks, IMO, or the regexp can
have "^" inserted at the front... but even then, each token gets checked
twice - first in the RegExp.match(), then by my type detection routine.
Wouldn't it be slow?

I'm not asking for much... just the version of test() with for-loop
removed.

Feb 10 2002

Karl Bochert <kbochert ix.netcom.com> writes:

On Sun, 10 Feb 2002 17:54:52 +0300, "Pavel Minayev" <evilone omen.ru> wrote:
 "Walter" <walter digitalmars.com> wrote in message
 news:a45e05$1m8o$1 digitaldaemon.com...
 
 RegExp, just set the "g" attribute. If you use one RegExp to search for two
 different patterns, use parenthesized subexpressions, and the match[][]
 return will tell you which one was matched.

 
 This will tokenize the string, but once I have all the tokens, there's -
 once again - the problem how to determine the type of each token, having
 its regexp. Once again suppose the token was "foo666". Once again I need
 to check all possible versions, and if I check for the number first, I'll
 have a match - "666"... of course a check can be done for starting
 position == 0 - which involves too many checks, IMO, or the regexp can
 have "^" inserted at the front... but even then, each token gets checked
 twice - first in the RegExp.match(), then by my type detection routine.
 Wouldn't it be slow?
 
 I'm not asking for much... just the version of test() with for-loop
 removed.
 

I may be missing the point here, but:

The power of regular expressions is their ability to search for multiple
patterns at once. If the next thing in the input is either a number or
a word which could have embedded digits, then
    "[A-Za-z_]\w*"                      matches a word
    "\d+"                               matches a number
    "([A-Za-z_]\w*)|(\d+)"              matches a word or a number
and
    "[\t ]*(?:([A-Za-z_]\w*)|(\d+))"    matches any spaces followed by a word or a number.

In the last 2 cases, the result of the search is up to 3 substrings: the
overall match, and the substrings within the parentheses. Perform the search,
and then the lengths of the substrings will tell you what you found.
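A sketch of the single-regex approach in present-day D's std.regex (not the 2002 RegExp class); `nextToken` is a hypothetical name. Two details matter. First, the word alternative starts with `[A-Za-z_]`, because `\w` also matches digits. Second, the non-capturing `(?:...)` keeps the leading blanks attached to both alternatives, since `|` otherwise splits the whole pattern. Which group is non-empty gives the token type in the same pass that finds the token.

```d
import std.regex;

// Classify the token at the start of `text` with one anchored regex.
// Returns 1 for a word, 2 for a number, 0 if neither starts the text;
// `len` receives how many chars (leading blanks included) were consumed.
int nextToken(string text, out size_t len)
{
    auto re = regex(`^[\t ]*(?:([A-Za-z_]\w*)|(\d+))`);
    auto m = matchFirst(text, re);
    if (m.empty)
        return 0;
    len = m.hit.length;
    return m[1].length ? 1 : 2;   // group 1 took part: word; else group 2: number
}
```

On `"  bar123 456"` this reports a word and consumes 8 chars. On `" 456 baz"` it reports a number and consumes 4. The regex is compiled on every call only to keep the sketch short.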

Documentation on standard regex's can be found at:
http://compy.ww.tu-berlin.de/doc/packages/pcre/pcre.html
among many other places.

Feb 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Karl Bochert" <kbochert ix.netcom.com> wrote in message
news:1103_1013361883 bose...

 In the last 2 cases, the result of the search is up to 3 substrings: the
 overall match, and the substrings within the parentheses. Perform the search
 and then the lengths of the substrings will tell you what you found.

How can these lengths tell? Token type is determined by the forming
characters (described by regexp in my case), not by the length - or
am I missing something? Suppose the input was:

    foo bar123 456 baz

Now I get the following tokens:

    "foo", "bar123", "baz", "123", "456"

How do I know that "123" is not supposed to be here?

Feb 10 2002

Karl Bochert <kbochert ix.netcom.com> writes:

On Sun, 10 Feb 2002 20:32:11 +0300, "Pavel Minayev" <evilone omen.ru> wrote:
 "Karl Bochert" <kbochert ix.netcom.com> wrote in message
 news:1103_1013361883 bose...
 
 In the last 2 cases, the result of the search is up to 3 substrings: the
 overall match, and the substrings within the parentheses. Perform the search
 and then the lengths of the substrings will tell you what you found.

 
 How can these lengths tell? Token type is determined by the forming
 characters (described by regexp in my case), not by the length - or
 am I missing something? Suppose the input was:
 
     foo bar123 456 baz
 
 Now I get the following tokens:
 
     "foo", "bar123", "baz", "123", "456"
 
 How do I know that "123" is not supposed to be here?
 

I probably have some details wrong here, but
declare a regular expression (a word must start with a letter or '_',
since \w also matches digits):
    p = new RegExp('([A-Za-z_]\w*)|(\d+)', "");
then:
    p.match ("123test")
produces 3 substrings:
    "123"    -- the overall match
    ""       -- the match for the first set of parens
    "123"    -- the match for the second set of parens

In PCRE (the common C implementation) the substrings are returned
as an array of pointers into the string (6 in this case). I suspect
D returns an equivalent array of offsets (slices?)  into the string?

The non-zero length of the third substring shows that a number ("\d+") was
found.

In your example:
    p.exec ("foo bar123 baz 123");
produces:
    "foo"
    "foo"
    ""

and:
    p.exec ("bar123 baz 123");
produces:
    "bar123"
    "bar123"
    ""

and:
    p.exec ("123 456");
produces:
    "123"
    ""
    "123"
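These three results can be reproduced with present-day D's std.regex. This is a stand-in for the 2002 RegExp class, whose exact return values may differ. The sketch uses `[A-Za-z_]\w*` for the word alternative. With `\w[\w\d]*` instead, the first group would capture all of "123test", because `\w` also matches digits. `parts` is a hypothetical name.

```d
import std.regex;

// Karl's three substrings for the first match of the word-or-number pattern:
// [overall match, group 1 (word), group 2 (number)].
// All three are null when nothing matches; an alternative that did not take
// part in the match yields an empty string.
string[3] parts(string input)
{
    string[3] result;
    auto m = matchFirst(input, regex(`([A-Za-z_]\w*)|(\d+)`));
    if (!m.empty)
        result = [m.hit, m[1], m[2]];
    return result;
}
```

`parts("123test")` gives "123", "" and "123". `parts("foo bar123 baz 123")` gives "foo", "foo" and "".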

I have used exec() here because it is probably the same as PCRE's exec
function. I have read the RegExp documentation but do not understand the
difference between the exec() and match() methods. Maybe match() is
just exec() anchored to the start of the text?

Karl

Feb 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Karl Bochert" <kbochert ix.netcom.com> wrote in message
news:1103_1013375566 bose...

 In your example:
     p.exec ("foo bar123 baz 123");
 produces:
     "foo"
     "foo"
     ""

 and:
     p.exec ("bar123 baz 123");
 produces:
     "bar123"
     "bar123"
     ""

 and:
     p.exec ("123 456");
 produces:
     "123"
     ""
     "123"

Yep, right. Now I have all the tokens, how do I determine
the _type_ of each (identifier, number, string...), with
regexp describing those types?

Feb 10 2002

"Walter" <walter digitalmars.com> writes:

"Karl Bochert" <kbochert ix.netcom.com> wrote in message
news:1103_1013375566 bose...
 I have used exec() here because it is probably the same as PCRE's exec
 function. I have read the RegExp documentation but do not understand the
 difference between the exec() and match() methods. Maybe match() is
 just exec() anchored to the start of the text?

There is no difference if the global attribute is set. If the global
attribute is not set, then match returns an array of all the matches in the
input.

Feb 10 2002

Karl Bochert <kbochert ix.netcom.com> writes:

On Sun, 10 Feb 2002 15:47:34 -0800, "Walter" <walter digitalmars.com> wrote:
 
 "Karl Bochert" <kbochert ix.netcom.com> wrote in message
 news:1103_1013375566 bose...
 I have used exec() here because it is probably the same as PCRE's exec
 function. I have read the RegExp documentation but do not understand the
 difference between the exec() and match() methods. Maybe match() is
 just exec() anchored to the start of the text?

 
 There is no difference if the global attribute is set. If the global
 attribute is not set, then match returns an array of all the matches in the
 input.
 

I think I understand. match() without the global attribute set finds all
matches in the subject string, but loses the 'which substring' information.
That might explain Pavel's problem -- to parse the next token and get its
type info he should use exec() or global match().

Karl Bochert

Feb 10 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a461ka$1tv0$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a45e05$1m8o$1 digitaldaemon.com...

 RegExp, just set the "g" attribute. If you use one RegExp to search for two
 different patterns, use parenthesized subexpressions, and the match[][]
 return will tell you which one was matched.

 This will tokenize the string, but once I have all the tokens, there's -
 once again - the problem how to determine the type of each token, having
 its regexp.

That's not a problem with parenthesized subexpressions. You can tell which
one got the match by the index in match[][]. The second index 0 is the
overall match, subsequent indices are the matches for each subexpression.

Feb 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a470kt$2art$1 digitaldaemon.com...

 That's not a problem with parenthesized subexpressions. You can tell which
 one got the match by the index in match[][]. The second index 0 is the
 overall match, subsequent indices are the matches for each subexpression.

Walter, where is that match[][] thing? match() returns char[][], which
ain't what I need...

Feb 10 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a47ir2$2i4j$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a470kt$2art$1 digitaldaemon.com...

 That's not a problem with parenthesized subexpressions. You can tell which
 one got the match by the index in match[][]. The second index 0 is the
 overall match, subsequent indices are the matches for each subexpression.

 Walter, where is that match[][] thing? match() returns char[][], which
 ain't what I need...

It sounds like just what you need. I guess I just don't understand what's
wrong.

Feb 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a47r1i$2lhb$1 digitaldaemon.com...

 It sounds like just what you need. I guess I just don't understand what's
 wrong.

char[][] is the list of tokens, or, to be more exact, the list of their
_values_. But how do I know their _types_ (string or number or ..)? Suppose
the regexp was:

    ([A-Za-z_]+|[0-9]+)

And I get 10 tokens. How do I tell if the first one matched the [A-Za-z_]+ part
or the [0-9]+ part, without checking it separately (which results in two
checks per token)?

Feb 11 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a485eh$2rbg$1 digitaldaemon.com...
 char[][] is the list of tokens, or, to be more exact, the list of their
 _values_. But how do I know their _types_ (string or number or ..)? Suppose
 the regexp was:

     ([A-Za-z_]+|[0-9]+)

 And I get 10 tokens. How do I tell if the first matched [A-Za-z_]+ part
 or the [0-9]+ part, without checking it separately (which results in two
 checks per token)?

You can tell which parenthesized subexpression matched by checking to see
which index it was in:

    char[][] m;
    r = new RegExp("(a)|(b)", "g");    // search for "a" or "b"
    while ((m = r.exec("a b and a b")) != null)
    {
        if (m[1])
            ; // matched an "a"
        else if (m[2])
            ; // matched a "b"
    }
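The same idea in present-day D's std.regex (not the 2002 RegExp class): `matchAll` walks the matches the way repeated global exec() calls do, and whichever group is non-empty says which alternative matched. `countAB` is a hypothetical name.

```d
import std.regex;

// Count how many matches came from each parenthesized alternative,
// mirroring the m[1] / m[2] test in the loop above.
size_t[2] countAB(string input)
{
    size_t[2] counts;
    foreach (m; matchAll(input, regex(`(a)|(b)`)))
    {
        if (m[1].length)
            ++counts[0];      // first group took part: matched an "a"
        else if (m[2].length)
            ++counts[1];      // second group took part: matched a "b"
    }
    return counts;
}
```

For `a b and a b` this counts three "a" matches (one of them inside "and") and two "b" matches.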

Feb 11 2002

Karl Bochert <kbochert ix.netcom.com> writes:

On Mon, 11 Feb 2002 14:57:58 -0800, "Walter" <walter digitalmars.com> wrote:
 
 "Pavel Minayev" <evilone omen.ru> wrote in message
 news:a485eh$2rbg$1 digitaldaemon.com...
 char[][] is the list of tokens, or, to be more exact, the list of their
 _values_. But how do I know their _types_ (string or number or ..)? Suppose
 the regexp was:

     ([A-Za-z_]+|[0-9]+)

 And I get 10 tokens. How do I tell if the first matched [A-Za-z_]+ part
 or the [0-9]+ part, without checking it separately (which results in two
 checks per token)?

 
 You can tell which parenthesized subexpression matched by checking to see
 which index it was in:
 
     char[][] m;
     r = new RegExp("(a)|(b)", "g");    // search for "a" or "b"
     while ((m = r.exec("a b and a b")) != null)
     {
         if (m[1])
             ; // matched an "a"
         else if (m[2])
             ; // matched a "b"
     }
 

Or:
    m = r.exec (...);
    switch (m.length) {
    case 0:      // no match
    case 2: // matched 'a'
    case 3:   //matched 'b'
    ...
???
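Switching on the match count is unlikely to distinguish the two cases, at least in present-day D's std.regex. The 2002 RegExp class may have behaved differently. There, `Captures.length` is the number of groups plus one whenever anything matches, whichever alternative succeeded, and 0 only when nothing matched. A quick check (hypothetical function name):

```d
import std.regex;

// Capture count for the first match of `(a)|(b)` in `input`:
// groups + 1 on any match, 0 when there is no match.
size_t captureCount(string input)
{
    return matchFirst(input, regex(`(a)|(b)`)).length;
}
```

Both "a" and "b" give 3, so testing which group is non-empty, as in the loop above, is what tells them apart.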

Feb 11 2002
