digitalmars.D - Questions about builtin RegExp

Andrew Fedoniouk (10/10) Feb 16 2006 1) Will builtin RegExp increase minimal size of D executable?

Oskar Linde (12/21) Feb 17 2006 No. This was as far as I understood one of the considerations.

Andrew Fedoniouk (10/29) Feb 17 2006 And what is this opNext() doing exactly?

Regan Heath (5/30) Feb 17 2006 I think you're thinking inside the box. :)

Andrew Fedoniouk (11/44) Feb 18 2006 I believe there is a sort of misunderstanding about what scripting is an...

Regan Heath (13/61) Feb 18 2006 I think there is some overlap, i.e. some scripting tasks do not require ...

Andrew Fedoniouk (23/42) Feb 18 2006 1) Scripting languages are usually used embedded in some other

Walter Bright (5/13) Feb 18 2006 I agree. But I don't believe that there's anything special about scripti...

kris (3/5) Feb 18 2006 Really? Do you have some kind of data to back that assertion?

Walter Bright (12/16) Feb 18 2006 Peer reviewed statistical research studies? Nope. But it's a pretty good...

Lucas Goss (6/7) Feb 18 2006 I've never used scripting languages for that purpose. The only reason

Walter Bright (7/16) Feb 17 2006 No.

Andrew Fedoniouk (42/51) Feb 17 2006 Next questions then:

Ivan Senji (46/126) Feb 17 2006 Instead of an answer a quick example of what I tried and what works:

Andrew Fedoniouk (25/151) Feb 17 2006 Thanks, Ivan, see below:

Ivan Senji (6/25) Feb 17 2006 Naturally, but this was just a see-if-it-can-be-done example. :)

Andrew Fedoniouk (13/27) Feb 17 2006 :)

Ivan Senji (23/61) Feb 17 2006 Well it wouldn't be the first time that the documentation is

Andrew Fedoniouk (8/29) Feb 17 2006 And what is this opNext for then?

Walter Bright (3/7) Feb 17 2006 m/regex/g => RegExp("regex", "g")

Walter Bright (8/9) Feb 17 2006 For startsWith(), sure. But if that was all regex was used for, nobody w...

Walter Bright (16/48) Feb 17 2006 None. Operator overloading requires one object be a class or a struct. B...

Andrew Fedoniouk (24/77) Feb 17 2006 And this RegExp("string") ~~ "string" is more honest, isn't it?

Walter Bright (8/27) Feb 18 2006 That doesn't give the match results, though.

Andrew Fedoniouk (10/27) Feb 18 2006 Who cares in most of cases?

Walter Bright (9/17) Feb 18 2006 In a very large fraction of cases, it matters. After all, if you are

Andrew Fedoniouk (25/42) Feb 18 2006 Probably in some Perl-ish use cases this is really so needed.

Walter Bright (3/9) Feb 19 2006 I'd like to see strtok() parse an email address out of a body of text.

Andrew Fedoniouk (16/22) Feb 19 2006 I don't really understand "parse an email address out of a body of text....

Chris Sauls (24/59) Feb 19 2006 I think he meant something more like (using MatchExpr, sorry):

Unknown W. Brackets (21/52) Feb 19 2006 Andrew Fedoniouk,

Regan Heath (44/59) Feb 19 2006 Here's how I'd do it:

Walter Bright (3/4) Feb 19 2006 Yours is a lot of code to do what a regex does. Now recognize a url ...

Regan Heath (10/15) Feb 19 2006 This is true, though my code is likely faster.

Georg Wrede (23/83) Feb 20 2006 DISCLAIMER INSERTED WHEN PROOFREADING:

Georg Wrede (13/39) Feb 20 2006 Had I to do stuff on the M$ "platform", I'd definitely look long and

"Andrew Fedoniouk" <news terrainformatica.com> writes:

1) Will builtin RegExp increase minimal size of D executable?
I mean if this executable is not using regexp at all.

2) Is it possible to override operator ~~ ?

3) What is the main purpose of incorporating
interpreted regexps into a natively compiled language?

4) When is a regexp checked for syntax correctness:
at compile time or at runtime?  "..." ~~ "..."
If ~~ is part of the language syntax, then one can assume that the expression
is getting compiled somehow.

Andrew.

Feb 16 2006

Oskar Linde <olREM OVEnada.kth.se> writes:

Andrew Fedoniouk wrote:

 1) Will builtin RegExp increase minimal size of D executable?
 I mean if this executable is not using regexp at all.

No. This was as far as I understood one of the considerations.
 
 2) Is it possible to override operator ~~ ?

Yes. opMatch() and opNext().
 
 3) What is the main purpose of incorporating
 interpreted regexps into a natively compiled language?

To make regexps more accessible, I guess. It makes D seem like an alternative to
scripting languages.
 
 4) When is a regexp checked for syntax correctness:
 at compile time or at runtime?  "..." ~~ "..."
 If ~~ is part of the language syntax, then one can assume that the expression
 is getting compiled somehow.

At runtime. For now at least. In the future it could possibly be compiled at
compile time, but there will still always be a need to support run-time
regexps anyway.

/Oskar

Feb 17 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

"Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
news:dt40sg$29nc$1 digitaldaemon.com...
 Andrew Fedoniouk wrote:

 1) Will builtin RegExp increase minimal size of D executable?
 I mean if this executable is not using regexp at all.

 No. This was as far as I understood one of the considerations.

 2) Is it possible to override operator ~~ ?

 Yes. opMatch() and opNext().

And what exactly does opNext() do?
Next sub-expression, or next match from the last position matched (/g)?

 3) What is the main purpose of incorporating
 interprettable regexps in natively compileable language?

 To make regexps more accessible I guess. Makes D seem like a alternative 
 to
 scripting languages.

???

An alternative to a scripting language can be another scripting language.
An alternative to a natively compiled language can be another natively
compiled language.

 4) When happens check of regexp for syntax correctness -
 at compile time or at runtime?  "..." ~~ "..."
 If ~~ is a part of language syntax then one can assume that expression
 is getting compiled somehow.

 At runtime. For now atleast. In the future it could possibly be compiled 
 at
 compile time, but there will still always be a need to support run-time
 regexps anyway.

Having "builtin" regexps without strings in the language seems unnatural.

Andrew.

Feb 17 2006

"Regan Heath" <regan netwin.co.nz> writes:

On Fri, 17 Feb 2006 20:46:01 -0800, Andrew Fedoniouk  
<news terrainformatica.com> wrote:
 "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
 news:dt40sg$29nc$1 digitaldaemon.com...
 Andrew Fedoniouk wrote:

 1) Will builtin RegExp increase minimal size of D executable?
 I mean if this executable is not using regexp at all.

 No. This was as far as I understood one of the considerations.

 2) Is it possible to override operator ~~ ?

 Yes. opMatch() and opNext().

 And what is this opNext() doing exactly?
 next sub-expression, next match from last position matched (/g) ?

 3) What is the main purpose of incorporating
 interprettable regexps in natively compileable language?

 To make regexps more accessible I guess. Makes D seem like a alternative
 to
 scripting languages.

 ???

 alternative to some scripting language can be another scripting language.
 alternative to some natively compileable language can be another natively
 compileable language.

I think you're thinking inside the box. :)
With the recent additions is it not possible to write scripts in D?

Regan

Feb 17 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:ops45qq5rn23k2f5 nrage.netwin.co.nz...
 On Fri, 17 Feb 2006 20:46:01 -0800, Andrew Fedoniouk 
 <news terrainformatica.com> wrote:
 "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
 news:dt40sg$29nc$1 digitaldaemon.com...
 Andrew Fedoniouk wrote:

 1) Will builtin RegExp increase minimal size of D executable?
 I mean if this executable is not using regexp at all.

 No. This was as far as I understood one of the considerations.

 2) Is it possible to override operator ~~ ?

 Yes. opMatch() and opNext().

 And what is this opNext() doing exactly?
 next sub-expression, next match from last position matched (/g) ?

 3) What is the main purpose of incorporating
 interprettable regexps in natively compileable language?

 To make regexps more accessible I guess. Makes D seem like a alternative
 to
 scripting languages.

 ???

 alternative to some scripting language can be another scripting language.
 alternative to some natively compileable language can be another natively
 compileable language.

 I think you're thinking inside the box. :)
 With the recent additions is it not possible to write scripts in D?

I believe there is a misunderstanding about what scripting is and
why there are scripting (typeless) languages, compiled bytecode languages, and
compiled native languages.
These three groups have their own niches. D as a compiled language will never
reach the flexibility of e.g. prototype-based JavaScript or Ruby. There are just
different definitions of flexibility
for these groups: different and sometimes even orthogonal tasks.

Andrew.

Feb 18 2006

"Regan Heath" <regan netwin.co.nz> writes:

On Sat, 18 Feb 2006 00:36:23 -0800, Andrew Fedoniouk  
<news terrainformatica.com> wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:ops45qq5rn23k2f5 nrage.netwin.co.nz...
 On Fri, 17 Feb 2006 20:46:01 -0800, Andrew Fedoniouk
 <news terrainformatica.com> wrote:
 "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
 news:dt40sg$29nc$1 digitaldaemon.com...
 Andrew Fedoniouk wrote:

 1) Will builtin RegExp increase minimal size of D executable?
 I mean if this executable is not using regexp at all.

 No. This was as far as I understood one of the considerations.

 2) Is it possible to override operator ~~ ?

 Yes. opMatch() and opNext().

 And what is this opNext() doing exactly?
 next sub-expression, next match from last position matched (/g) ?

 3) What is the main purpose of incorporating
 interprettable regexps in natively compileable language?

 To make regexps more accessible I guess. Makes D seem like a  
 alternative
 to
 scripting languages.

 ???

 alternative to some scripting language can be another scripting  
 language.
 alternative to some natively compileable language can be another  
 natively
 compileable language.

 I think you're thinking inside the box. :)
 With the recent additions is it not possible to write scripts in D?

 I beleive there is a sort of misunderstanding about what scripting is and
 why there are scripting (typeless) languages, compiled bytecoded and
 compiled native.
 These three groups has their own niches. D as a compiled language will  
 never
 reach
 flexibility of e.g. prototype based JavaScript or Ruby. There are just
 different definitions of flexibility
 for these groups - different and sometimes even orthogonal tasks .

I think there is some overlap, i.e. some scripting tasks do not require
the flexibility you mention; instead the important factor may be one or
more of:
  - how fast can I code the solution
  - how easily can I code the solution
  - how easily can I maintain the solution
  - how likely is my solution to contain bugs
  - how easy will it be to find those bugs

Assuming you're a D programmer and assuming the D std lib contains the  
tools to achieve your task, why not use D?

Regan

Feb 18 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

 I beleive there is a sort of misunderstanding about what scripting is and
 why there are scripting (typeless) languages, compiled bytecoded and
 compiled native.
 These three groups has their own niches. D as a compiled language will 
 never
 reach
 flexibility of e.g. prototype based JavaScript or Ruby. There are just
 different definitions of flexibility
 for these groups - different and sometimes even orthogonal tasks .

 I think there is some overlap, i.e. some scripting tasks do not require 
 the flexibilty you mention, instead the important factor may be one or 
 more of:
  - how fast can I code the solution
  - how easily can I code the solution
  - how easily can I maintain the solution
  - how likely is my solution to contain bugs
  - how easy will it be to find those bugs

 Assuming you're a D programmer and assuming the D std lib contains the 
 tools to achieve your task, why not use D?

1) Scripting languages are usually used embedded in some other
environment.
This use case is quite different from the D execution model: a different life
cycle.
2) Scripting languages are safe. It takes tremendous effort to cause a GPF in a
scripting environment. In D, causing a GPF is a piece of cake. I mean not
because of bugs in the language or libs, but because you can dereference a
null object pointer, for example.
3) Scripting languages provide a very high-level and convenient set of
ready-to-use, task-oriented classes/objects.
Example: for building D projects you would rather use make or build scripts
than D itself, right? Even if you had something like std.build, I bet you
would use some scripting tool for your builds.

What I want to say:

Writing a fast scripting engine in D is possible, and this is what D is best
for (among other things).
But writing something D-ish in a scripting language.... Completely different
areas of use, in short.

Feb 18 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:dt6ma6$1jt0$1 digitaldaemon.com...
 I beleive there is a sort of misunderstanding about what scripting is and
 why there are scripting (typeless) languages, compiled bytecoded and 
 compiled native.
 These three groups has their own niches. D as a compiled language will 
 never reach
 flexibility of e.g. prototype based JavaScript or Ruby. There are just 
 different definitions of flexibility
 for these groups - different and sometimes even orthogonal tasks .

I agree. But I don't believe that there's anything special about scripting 
that makes it especially suited for regex, but regex is a large reason 
people use scripting languages.

Feb 18 2006

kris <fu bar.org> writes:

Walter Bright wrote:
[snip]
 regex is a large reason 
 people use scripting languages. 

Really? Do you have some kind of data to back that assertion?

Feb 18 2006

"Walter Bright" <newshound digitalmars.com> writes:

"kris" <fu bar.org> wrote in message news:dt7m3l$2hc5$1 digitaldaemon.com...
 Walter Bright wrote:
 [snip]
 regex is a large reason people use scripting languages.

 Really? Do you have some kind of data to back that assertion?

Peer reviewed statistical research studies? Nope. But it's a pretty good 
impression one gets by reading the examples in manuals for scripting 
languages, listening to what people say about those languages, and looking 
at a sampling of actual scripts.

Here's a quote from "Programming Perl"'s preface by Larry Wall: "Perl is no 
longer just for text processing." That means, to me, that Perl was DESIGNED 
to be a text processing language. Why would the backbone of that, regex, not 
be why a large number of people use Perl?

Perl stands for "Practical Extraction and Report Language", i.e. text 
manipulation. Larry goes out of his way to say that Perl is a superset of 
sed and awk, which are regex string manipulation scripting languages.

Feb 18 2006

Lucas Goss <lgoss007 gmail.com> writes:

Walter Bright wrote:
 ... but regex is a large reason people use scripting languages.

I've never used scripting languages for that purpose. The only reason 
I've used scripting languages is because they are often times easier, 
quicker, and have a huge library to write portable code. D almost 
matches them in being as easy and as quick, but lacks the huge standard 
library.

Feb 18 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:dt3v1o$27nk$1 digitaldaemon.com...
 1) Will builtin RegExp increase minimal size of D executable?
 I mean if this executable is not using regexp at all.

No.

 2) Is it possible to override operator ~~ ?

Overload, yes. With opMatch().

 3) What is the main purpose of incorporating
 interpreted regexps into a natively compiled language?

Make them easier to use.

 4) When is a regexp checked for syntax correctness:
 at compile time or at runtime?  "..." ~~ "..."

Right now, at runtime. But the compiler is allowed to diagnose it at compile 
time, if it's a string literal.

 If ~~ is a part of language syntax then one can assume that expression
 is getting compiled somehow.

Feb 17 2006
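
The runtime versus compile-time distinction in the answer to question 4 can be illustrated with a small sketch. It does not use the ~~ operator discussed in this thread; it assumes the std.regex module of present-day Phobos (regex, ctRegex, matchFirst), which is not the library the posters were using. The helper names isIdentRuntime and isIdentCompileTime are hypothetical.

```d
import std.regex : ctRegex, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: the pattern string is parsed when this code runs,
// so a malformed pattern would only be reported at runtime.
bool isIdentRuntime(string s)
{
    auto re = regex(`^[A-Za-z_]\w*$`);
    return !matchFirst(s, re).empty;
}

// Hypothetical helper: ctRegex builds the matcher during compilation,
// so a malformed pattern literal is rejected by the compiler instead.
bool isIdentCompileTime(string s)
{
    auto re = ctRegex!(`^[A-Za-z_]\w*$`);
    return !matchFirst(s, re).empty;
}

void main()
{
    writeln(isIdentRuntime("opMatch"));
    writeln(isIdentCompileTime("2fast"));
}
```

Both helpers accept the same inputs; the difference is only in when the pattern is processed, which is the point Andrew is asking about.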

"Andrew Fedoniouk" <news terrainformatica.com> writes:

Thanks, Walter,

 2) Is it possible to override operator ~~ ?

 Overload, yes. With opMatch().

Next questions then:
[char string literal] ~~ [char string literal]

1) For which object do I need to override opMatch so that
it gets invoked in the line above?

2) For some types of RE-like expressions there is no need
to create an instance of RegExp, e.g. the test
"*.ext" ~~ file_name
can be implemented many times faster than standard RE creation/invocation.

3) Some objects have no string representation of the match operation.
For example, a CSS selector as an object has a match operation with
a DOM element as an argument. But you have a requirement:

"Both operands must be implicitly convertible to char[]."

What to do in this case?

 3) What is the main purpose of incorporating
 interprettable regexps in natively compileable language?

 Make them easier to use.

Easier? What is wrong with the standard way:

   regexp re = new regexp(".....");
   re.test(...);

And easier does not mean more effective.

while( true )
{
    if( "mask" ~~ file_name )
       ....
}

As far as I understand you will generate:

while( true )
{
     regexp re = new regexp("mask");
     re.test(file_name);
       ....
}


 4) When happens check of regexp for syntax correctness -
 at compile time or at runtime?  "..." ~~ "..."

 Right now, at runtime. But the compiler is allowed to diagnose it at 
 compile time, if it's a string literal.

If it does not compile this regexp at compile time, then this is just a fake
and not a solution at all for a language of D's level.
Even Perl compiles its regular expressions at compile time.

So the real meaning of the
  arg1 ~~ arg2
notation is just a shortcut for
  arg1.test(arg2)

In general shortcuts are good, but in this particular case
it has the hidden side effect of creating a new RegExp object on each test
invocation.

Andrew.

Feb 17 2006

Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:

Andrew Fedoniouk wrote:
 Thanks, Walter,
 
 
2) Is it possible to override operator ~~ ?

Overload, yes. With opMatch().

 
 
 Next questions then:
 [char string literal] ~~ [char string literal]
 
 1) For what object I need to override opMatch to be able
 to get it invoked in the line above?
 
 2) For some types of RE (alike) expressions there is no need
 to create instance of RegExp, e.g. test
 "*.ext" ~~ file_name
 can be implemented times faster than standard RE creation/invocation.
 
 3) Some objects has no string representation of match operation.
 For example CSS selector as an object has match operation with
 DOM element as an argument. But you have a requirement:
 
 "Both operands must be implicitly convertible to char[]."
 
 What to do in this case?

Instead of an answer a quick example of what I tried and what works:

<CODE>
import std.stdio;

class ArrayBeginsWith
{
   static ArrayBeginsWith opCall(int a)
   {
     check = a;
     return instance;
   }
   static ArrayBeginsWith instance;
   static int check;
   static this()
   {
     instance = new ArrayBeginsWith;
   }
   static bool opMatch(int[] nums)
   {
     if(nums.length < 1)return false;
     if(nums[0] == check) return true;
     else return false;
   }
}

static bool opMatch(int[] nums)
{
   if(nums.length < 2)return false;
   if(nums[0] == 0 && nums[1] == 1) return true;
   else return false;
}


void main()
{
   static int[] somearray1 = [0,1,2];
   static int[] somearray2 = [2,1,2];

   writefln(ArrayBeginsWith(0) ~~ somearray1);
   writefln(ArrayBeginsWith(0) ~~ somearray2);

   writefln(ArrayBeginsWith(2) ~~ somearray1);
   writefln(ArrayBeginsWith(2) ~~ somearray2);
}
</CODE>


 
 
3) What is the main purpose of incorporating
interprettable regexps in natively compileable language?

Make them easier to use.

 
 
 Easier? What is wrong with standard way:
 
    regexp re = new regexp(".....");
    re.test(...);
 

Nothing is wrong with this, but ~~ is easier :)

 And easier is not mean more effective.
 
 while( true )
 {
     if( "mask" ~~ file_name )
        ....
 }
 
 As far as I understand you will generate:
 
 while( true )
 {
      regexp re = new regexp("mask");
      re.test(file_name);
        ....
 }
 

I don't think this is too hard to optimize away. The compiler can even
generate a global RegExp instance for each regular expression literal and
use it many times.

 
 
4) When happens check of regexp for syntax correctness -
at compile time or at runtime?  "..." ~~ "..."

Right now, at runtime. But the compiler is allowed to diagnose it at 
compile time, if it's a string literal.

 
 
 If it does not compile this regexp at compile time than this is just a fake 
 and not a
 a solution at all for the language of D level.
 Even Perl compiles its regular expresions in compile time.
 
 So the real meaning of
   arg1 ~~ arg2
 notation is just a shortcut of
   arg1.test(arg2)
 
 In general shortcuts are good but in this particular case
 it has hidden side effects in creation of new RegExp object on each test 
 invocation.
 

This generation of new RegExp doesn't have to be true. But ~~ provides 
us with a feature of testing arbitrary types for arbitrary things.

Feb 17 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

Thanks, Ivan, see below:


"Ivan Senji" <ivan.senji_REMOVE_ _THIS__gmail.com> wrote in message 
news:dt5b54$h1q$1 digitaldaemon.com...
 Andrew Fedoniouk wrote:
 Thanks, Walter,


2) Is it possible to override operator ~~ ?

Overload, yes. With opMatch().


 Next questions then:
 [char string literal] ~~ [char string literal]

 1) For what object I need to override opMatch to be able
 to get it invoked in the line above?

 2) For some types of RE (alike) expressions there is no need
 to create instance of RegExp, e.g. test
 "*.ext" ~~ file_name
 can be implemented times faster than standard RE creation/invocation.

 3) Some objects has no string representation of match operation.
 For example CSS selector as an object has match operation with
 DOM element as an argument. But you have a requirement:

 "Both operands must be implicitly convertible to char[]."

 What to do in this case?

 Instead of an answer a quick example of what I tried and what works:

 <CODE>
 import std.stdio;

 class ArrayBeginsWith
 {
   static ArrayBeginsWith opCall(int a)
   {
     check = a;
     return instance;
   }
   static ArrayBeginsWith instance;
   static int check;
   static this()
   {
     instance = new ArrayBeginsWith;
   }
   static bool opMatch(int[] nums)
   {
     if(nums.length < 1)return false;
     if(nums[0] == check) return true;
     else return false;
   }
 }

 static bool opMatch(int[] nums)
 {
   if(nums.length < 2)return false;
   if(nums[0] == 0 && nums[1] == 1) return true;
   else return false;
 }


 void main()
 {
   static int[] somearray1 = [0,1,2];
   static int[] somearray2 = [2,1,2];

   writefln(ArrayBeginsWith(0) ~~ somearray1);
   writefln(ArrayBeginsWith(0) ~~ somearray2);

   writefln(ArrayBeginsWith(2) ~~ somearray1);
   writefln(ArrayBeginsWith(2) ~~ somearray2);
 }
 </CODE>

bool startsWith( int[] arr, int v )
{
    if(arr.length < 1) return false;
    return arr[0] == v;
}

and its usage:

static int[] somearray2 = [2,1,2];

if( somearray2.startsWith( 0 ) ) ...

will be more a) compact b) human readable c) maintainable d) natural

The same applies to

bool match( const char[] str, RegExp re )
{
   ...
}

if( mystr.match(someRe)  ) ....

------------------------------------

I would go with a normal implementation as free functions instead of this :p~~.



3) What is the main purpose of incorporating
interprettable regexps in natively compileable language?

Make them easier to use.


 Easier? What is wrong with standard way:

    regexp re = new regexp(".....");
    re.test(...);

 Nothing is wrong with this, but ~~ is easier :)

 And easier is not mean more effective.

 while( true )
 {
     if( "mask" ~~ file_name )
        ....
 }

 As far as I understand you will generate:

 while( true )
 {
      regexp re = new regexp("mask");
      re.test(file_name);
        ....
 }

 I don't think this is to hard to optimize away. Compiler can even generate 
 global RegExp instance for each regular expression literal and use it many 
 times.

4) When happens check of regexp for syntax correctness -
at compile time or at runtime?  "..." ~~ "..."

Right now, at runtime. But the compiler is allowed to diagnose it at 
compile time, if it's a string literal.


 If it does not compile this regexp at compile time than this is just a 
 fake and not a
 a solution at all for the language of D level.
 Even Perl compiles its regular expresions in compile time.

 So the real meaning of
   arg1 ~~ arg2
 notation is just a shortcut of
   arg1.test(arg2)

 In general shortcuts are good but in this particular case
 it has hidden side effects in creation of new RegExp object on each test 
 invocation.

 This generation of new RegExp doesn't have to be true. But ~~ provides us 
 with a feature of testing arbitrary types for arbitrary things.

As I said, having a function named 'match' with clearly defined
parameters
is way better than making the syntax of the language look like an Xmas tree,
with all possible smiley notations (http://www.helpbytes.co.uk/smileys.php)

Andrew.

Feb 17 2006

Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:

Andrew Fedoniouk wrote:
 Thanks, Ivan, see below:

...

 
 bool startsWith( int[] arr, int v )
 {
     if(arr.length < 1) return false;
     return arr[0] == v;
 }
 
 and its usage:
 
 static int[] somearray2 = [2,1,2];
 
 if( somearray2.startsWith( 0 ) ) ...
 
 will be more a) compact b) human readable c) maintainable d) natural
 

Naturally, but this was just a see-if-it-can-be-done example.  :)

 As I said having defined function with name 'match' and clearly defined 
 parameters
 is way better than to make syntax of the language look like an Xmas Tree -

Well, I don't see it like that; I see it as an abstract concept of
"matching", which can be interpreted as an elementary operation. Plus
we can overload ~~ to mean matching of any kind we want that makes sense.

Feb 17 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

 static int[] somearray2 = [2,1,2];

 if( somearray2.startsWith( 0 ) ) ...

 will be more a) compact b) human readable c) maintainable d) natural

 Naturally, but this was just a see-if-it-can-be-done example.  :)

:D or better :~~D

 As I said having defined function with name 'match' and clearly defined 
 parameters
 is way better than to make syntax of the language look like an Xmas 
 Tree -

 Well i don't see it like that, I see it as a abstracted concept of 
 "matching", and that can be interpreted as an elementary operation. Plus 
 we can overload ~~ to mean matching of any kind we want that makes sense.

:)

1) According to http://www.digitalmars.com/d/expression.html#MatchExpression
"Both operands must be implicitly convertible to char[]. "
so your "matching of any kind we want " is not strictly true.

2) ~~ has side effects. Moreover, it is implemented as a stateful comparison, so
consecutive ~~'s on the same arguments will yield different results.

3)
while(true)
{
    bool r = "a" ~~ r"\w";
}

must allocate a new RegExp.

Feb 17 2006

Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:

Andrew Fedoniouk wrote:
static int[] somearray2 = [2,1,2];

if( somearray2.startsWith( 0 ) ) ...

will be more a) compact b) human readable c) maintainable d) natural

Naturally, but this was just a see-if-it-can-be-done example.  :)

 
 
 :D or better :~~D
 

That's a good smiley.

 
As I said having defined function with name 'match' and clearly defined 
parameters
is way better than to make syntax of the language look like an Xmas 
Tree -

Well i don't see it like that, I see it as a abstracted concept of 
"matching", and that can be interpreted as an elementary operation. Plus 
we can overload ~~ to mean matching of any kind we want that makes sense.

 
 :)
 
 1) According to http://www.digitalmars.com/d/expression.html#MatchExpression
 "Both operands must be implicitly convertible to char[]. "
 so yours "matching of any kind we want " is not strictly true.

Well it wouldn't be the first time that the documentation is 
wrong/incomplete. Both types *do* have to be implicitly convertible to 
char[] unless you use a match expression with your own type with defined 
  opMatch operator.

 
 2) ~~ has sidefects. Moreover it is implemented as statefull comparison so
 consequent ~~'s on the same arguments will yeld to different results.
 

char[] ~~ char[] is implemented that way, but a user's Foo ~~ Bar[] doesn't
have to behave that way (though it can if it makes sense, e.g. when there are
more matches)

 3)
 while(true)
 {
     bool r = "a" ~~ r"\w";
 }
 
 must allocate new RegExp.

Why?

Why couldn't a compiler optimize this away into something like:

RegExp __regexp0001;
static this()
{
   __regexp0001 = new RegExp("a");
}

and then later whenever literal "a" is used as regex:
while(true)
{
     bool r = __regexp0001 ~~ r"\w";
}

So it is true that a new RegExp is allocated but it needs only to be 
done once.

Feb 17 2006
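
The "allocate once, reuse in the loop" transformation Ivan describes can also be written by hand. The following is a minimal sketch under stated assumptions: it uses a module-level variable initialised in a static constructor, mirroring the __regexp0001 idea, but relies on std.regex from present-day Phobos rather than the ~~ operator or the RegExp class of the thread. The names cachedWordRe and hasWordChar are hypothetical, and the sketch treats \w as the pattern and "a" as the subject.

```d
import std.regex;
import std.stdio : writeln;

// Module-level regex object, built once at start-up
// (module-level variables are per-thread in D, and so is `static this`).
Regex!char cachedWordRe;

static this()
{
    cachedWordRe = regex(`\w`);
}

// Hypothetical helper: reuses the cached object instead of
// constructing a new regex on every call.
bool hasWordChar(string s)
{
    return !matchFirst(s, cachedWordRe).empty;
}

void main()
{
    foreach (i; 0 .. 3)
        writeln(hasWordChar("a"));   // no regex is constructed inside the loop
}
```

Whether a compiler could perform this hoisting automatically for a built-in operator is exactly what the two posters disagree about; the sketch only shows that the manual form is short.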

"Andrew Fedoniouk" <news terrainformatica.com> writes:

 3)
 while(true)
 {
     bool r = "a" ~~ r"\w";
 }

 must allocate new RegExp.

 Why?

 Why couldn't a compiler optimize this away into something like:

 RegExp __regexp0001;
 static this()
 {
   __regexp0001 = new RegExp("a");
 }

 and then later whenever literal "a" is used as regex:
 while(true)
 {
     bool r = __regexp0001 ~~ r"\w";
 }

 So it is true that a new RegExp is allocated but it needs only to be done 
 once.

And what is opNext for, then?

And more: traditionally there are two "test" operations in RegExps:
'match' and 'test', as far as I remember.
match returns the matched substring and test returns a boolean.

There is also the /g flag, which allows scanning the whole string (Perl):

$i = 0;
while ($string =~ m/regex/g) {

}

So what exactly does this ~~ do?

Andrew.

Feb 17 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:dt5ton$10qu$1 digitaldaemon.com...
 There is also the /g flag, which allows scanning the whole string (Perl):
 $i = 0;
 while ($string =~ m/regex/g) {

 }
 So what exactly does this ~~ do?
 Andrew.

m/regex/g  =>  RegExp("regex", "g")

Feb 17 2006
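
For readers unfamiliar with what the Perl /g loop does, here is a hedged sketch of the same "scan the whole string" behaviour. It assumes matchAll from the std.regex module of present-day Phobos, not the RegExp("regex", "g") form mentioned above, whose exact behaviour is not shown in this thread. The function name allNumbers is hypothetical.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Hypothetical helper: collects every non-overlapping match,
// continuing each search from where the previous match ended.
string[] allNumbers(string text)
{
    string[] hits;
    auto re = regex(`\d+`);
    foreach (m; matchAll(text, re))
        hits ~= m.hit;   // m.hit is the text of the current match
    return hits;
}

void main()
{
    writeln(allNumbers("a1 b22 c333"));
}
```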

"Walter Bright" <newshound digitalmars.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:dt5eo9$kgu$1 digitaldaemon.com...
 will be more a) compact b) human readable c) maintainable d) natural

For startsWith(), sure. But if that was all regex was used for, nobody would
have ever invented them. Regexes can search for arbitrarily complex
patterns, and are used that way. Writing a library of custom functions for
each is out of the question.

What you're also missing in the examples is using the match result, not just 
testing for the match.

Feb 17 2006
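
The difference between testing for a match and using the match result can be sketched as follows. This is an illustration only, assuming matchFirst and its capture groups from present-day std.regex; the thread's own mechanism (the implicit match results of ~~) is not reproduced here. The pattern and the name userAndHost are hypothetical and deliberately simplistic.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: returns the two captured groups of the first
// match, or an empty array when nothing matches.
string[] userAndHost(string text)
{
    auto re = regex(`(\w+)@(\w+)\.com`);
    auto m = matchFirst(text, re);
    if (m.empty)
        return [];
    return [m[1], m[2]];   // m[0] would be the whole match
}

void main()
{
    writeln(userAndHost("mail bob@example.com today"));
}
```

A boolean-only test would answer "is there an address?"; the captures answer "which one?", which is the case Walter argues is common.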

"Walter Bright" <newshound digitalmars.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:dt591g$erk$1 digitaldaemon.com...
 Next questions then:
 [char string literal] ~~ [char string literal]

 1) For which object do I need to override opMatch so that
 it gets invoked in the line above?

None. Operator overloading requires one object be a class or a struct. But 
you could do:

    RegExp("string") ~~ "string"

and overload opMatch for RegExp.

 2) For some types of RE-like expressions there is no need
 to create an instance of RegExp, e.g. the test
 "*.ext" ~~ file_name
 can be implemented many times faster than standard RE creation/invocation.

Sure. Create your own MyReg object, and use it like:

    MyReg("*.ext") ~~ filename

 3) Some objects has no string representation of match operation.
 For example CSS selector as an object has match operation with
 DOM element as an argument. But you have a requirement:

 "Both operands must be implicitly convertible to char[]."

 What to do in this case?

Operator overloading happens before implicit conversions.

 3) What is the main purpose of incorporating
 interpretable regexps in a natively compilable language?

 Make them easier to use.

 Easier? What is wrong with standard way:

   regexp re = new regexp(".....");
   re.test(...);

For whatever reason, people find that confusing and impractical.

 And easier is not mean more effective.

True. I didn't say it was more effective.

 If it does not compile this regexp at compile time then this is just a 
 fake and not a solution at all for a language of D's level.
 Even Perl compiles its regular expressions at compile time.

It isn't worth trying to do them at compile time if the feature itself 
doesn't catch on.

 So the real meaning of
  arg1 ~~ arg2
 notation is just a shortcut of
  arg1.test(arg2)

It's more than that, because of the implicit declaration of the match 
results.

 In general shortcuts are good but in this particular case
 it has hidden side effects in creation of new RegExp object on each test 
 invocation.

Yes, but why is that a bad thing?

Feb 17 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote in message 
news:dt6da8$1ci6$1 digitaldaemon.com...
 "Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
 news:dt591g$erk$1 digitaldaemon.com...
 Next questions then:
 [char string literal] ~~ [char string literal]

 1) For what object I need to override opMatch to be able
 to get it invoked in the line above?

 None. Operator overloading requires one object be a class or a struct. But 
 you could do:

    RegExp("string") ~~ "string"

 and overload opMatch for RegExp.

And this RegExp("string") ~~ "string" is more honest, isn't it?

Or as in Harmonia:

string s = ....
bool r = s.like("str*");


 2) For some types of RE (alike) expressions there is no need
 to create instance of RegExp, e.g. test
 "*.ext" ~~ file_name
 can be implemented times faster than standard RE creation/invocation.

 Sure. Create your own MyReg object, and use it like:

    MyReg("*.ext") ~~ filename

But I want my own function for char[] ~~ char[] !
Simple pattern match does not require compilation phase
or even memory allocation...
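
As an illustration of that claim, here is a minimal sketch in Python of a wildcard matcher that walks both strings with plain indices and never builds a compiled pattern. The function name `like` is borrowed from the Harmonia call quoted above, but this is not Harmonia's code; the supported syntax (`*` for any run of characters, `?` for one character) is an assumption.

```python
def like(text, pattern):
    """Return True if `text` matches the wildcard `pattern`.

    Assumed syntax: '*' matches any run of characters (including none),
    '?' matches exactly one character, everything else matches itself.
    """
    t = 0        # index into text
    p = 0        # index into pattern
    star = -1    # position of the most recent '*' in pattern, if any
    mark = 0     # text position where that '*' currently stops matching

    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            # Remember the star and first try matching it against nothing.
            star = p
            mark = t
            p += 1
        elif p < len(pattern) and (pattern[p] == "?" or pattern[p] == text[t]):
            t += 1
            p += 1
        elif star != -1:
            # Mismatch: let the last '*' swallow one more character and retry.
            p = star + 1
            mark += 1
            t = mark
        else:
            return False

    # Only trailing '*' may remain in the pattern.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)
```

The matcher keeps four integers of state and does no allocation of its own, which is the property the post is arguing for. It only answers yes or no, though, so it does not provide the match results discussed elsewhere in this thread.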


 3) Some objects has no string representation of match operation.
 For example CSS selector as an object has match operation with
 DOM element as an argument. But you have a requirement:

 "Both operands must be implicitly convertible to char[]."

 What to do in this case?

 Operator overloading happens before implicit conversions.

I don't understand why not allow this:
bool opMatch(char[] a, char[] b) ?


 3) What is the main purpose of incorporating
 interpretable regexps in a natively compilable language?

 Make them easier to use.

 Easier? What is wrong with standard way:

   regexp re = new regexp(".....");
   re.test(...);

 For whatever reason, people find that confusing and impractical.

uh, people....  I see.

 And easier is not mean more effective.

 True. I didn't say it was more effective.

 If it does not compile this regexp at compile time then this is just a 
 fake and not a solution at all for a language of D's level.
 Even Perl compiles its regular expressions at compile time.

 It isn't worth trying to do them at compile time if the feature itself 
 doesn't catch on.

 So the real meaning of
  arg1 ~~ arg2
 notation is just a shortcut of
  arg1.test(arg2)

 It's more than that, because of the implicit declaration of the match 
 results.

 In general shortcuts are good but in this particular case
 it has hidden side effects in creation of new RegExp object on each test 
 invocation.

 Yes, but why is that a bad thing?

You need to explain very well what is going on under the hood of this ~~
- it is a stateful operator (if it is /g).

<ot>

I am using stream tokenizer in Harmonia instead of this /g.
(class TokenizerT(CHAR) // harmonia/string.d)

Simple like(pattern)  method is enough in 90% of cases.

Perl is completely different story - it is built around RegExp.
And it is typeless.

</ot>

BTW: Have you seen Nemerle and its way of meta-programming?
http://nemerle.org/

Andrew.

Feb 17 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:dt6gbc$1eig$1 digitaldaemon.com...
 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:dt6da8$1ci6$1 digitaldaemon.com...
 None. Operator overloading requires one object be a class or a struct. 
 But you could do:

    RegExp("string") ~~ "string"

 and overload opMatch for RegExp.

 And this RegExp("string") ~~ "string" is more honest, isn't it?

 Or as in Harmonia:

 string s = ....
 bool r = s.like("str*");

That doesn't give the match results, though.


 Sure. Create your own MyReg object, and use it like:
    MyReg("*.ext") ~~ filename

 But I want my own function for char[] ~~ char[] !

Consider overloading the '+' in '1+2'? To overload operators, one of the 
operands must be a user defined type.


 I don't understand why not allow this:
 bool opMatch(char[] a, char[] b) ?

For the same reason opAdd(int a, int b) is not allowed. Such a function 
would apply globally, all the library code will break, etc.
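
The same rule can be illustrated outside D. The Python sketch below is only an analogy: Python also offers no way to redefine an operator between two built-in strings, but a user-defined wrapper can supply one. `Glob` is a hypothetical stand-in for the MyReg object mentioned above, and `@` stands in for `~~`.

```python
import fnmatch


class Glob:
    """Hypothetical user-defined pattern type, analogous to MyReg("*.ext")."""

    def __init__(self, pattern):
        self.pattern = pattern

    def __matmul__(self, name):
        # Handles Glob("*.ext") @ "file.ext"
        return fnmatch.fnmatchcase(name, self.pattern)

    def __rmatmul__(self, name):
        # Handles "file.ext" @ Glob("*.ext"): str has no '@' of its own,
        # so Python falls back to the user-defined operand.
        return fnmatch.fnmatchcase(name, self.pattern)
```

Because the operator lives on `Glob`, only expressions that mention a `Glob` are affected. An expression with two plain strings keeps its built-in meaning everywhere, so library code cannot be changed behind its back.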


 BTW: Have you seen Nemerle and its way of meta-programming?
 http://nemerle.org/

I don't know anything about it. I'll take a look at the link.

Feb 18 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote in message 
news:dt6nug$1lhe$1 digitaldaemon.com...
 Or as in Harmonia:

 string s = ....
 bool r = s.like("str*");

 That doesn't give the match results, though.

Who cares in most of cases?
user input validation tasks or simple filename matching ...

When you need match results you will use regexp
or something more effective like tokenizers.

 Sure. Create your own MyReg object, and use it like:
    MyReg("*.ext") ~~ filename

 But I want my own function for char[] ~~ char[] !

 Consider overloading the '+' in '1+2'? To overload operators, one of the 
 operands must be a user defined type.


 I don't understand why not allow this:
 bool opMatch(char[] a, char[] b) ?

 For the same reason opAdd(int a, int b) is not allowed. Such a function 
 would apply globally, all the library code will break, etc.


 BTW: Have you seen Nemerle and its way of meta-programming?
 http://nemerle.org/

 I don't know anything about it. I'll take a look at the link.

Take a look.  A bit ugly for my taste, but
some ideas of Nemerle macros can be reused.
They allow you to add your own problem-specific notation and
syntax to the language.

Feb 18 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:dt7qm4$2kn0$1 digitaldaemon.com...
 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:dt6nug$1lhe$1 digitaldaemon.com...
 string s = ....
 bool r = s.like("str*");


 That doesn't give the match results, though.
 Who cares in most of cases?

In a very large fraction of cases, it matters. After all, if you are 
searching a posting for an embedded email address, it doesn't do much good 
to only know that one is/isn't there. One is searching for it so one can do 
something with it.

 When you need match results you will use regexp
 or something more effective like tokenizers.

Writing a real lexer takes a lot of effort. That's why people invented 
regex, it'll handle most jobs without having to write a lexer. C's strtok() 
is embarrassingly inadequate.

Feb 18 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote in message 
news:dt80n7$2qiu$3 digitaldaemon.com...
 "Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
 news:dt7qm4$2kn0$1 digitaldaemon.com...
 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:dt6nug$1lhe$1 digitaldaemon.com...
 string s = ....
 bool r = s.like("str*");


 That doesn't give the match results, though.
 Who cares in most of cases?

 In a very large fraction of cases, it matters. After all, if you are 
 searching a posting for an embedded email address, it doesn't do much good 
 to only know that one is/isn't there. One is searching for it so one can 
 do something with it.

Probably in some Perl-ish use cases this is really needed.

In my http://blocknote.net, hyperlink auto-recognition starts working
on each complete non-whitespace sequence, so I already know the position.
But this is a particular use case.


 When you need match results you will use regexp
 or something more effective like tokenizers.

 Writing a real lexer takes a lot of effort. That's why people invented 
 regex, it'll handle most jobs without having to write a lexer. C's 
 strtok() is embarrassingly inadequate.

Why?

Here is a simple tokenizer for C/C++/D-like texts:

module harmonia.string;

class TokenizerT(CHAR)
{
  enum token { EOT, SPACE, WORD, QUOTE, DELIMETER, COMMENT }  ...
}

And

module harmonia.html.scanner;

is a simple HTML/XML push parser (scanner).

----------------------
I mean that std.lib should have multiple text handling tools.
RegExp is not the only one possible.

I would like to see something like the customizable
TokenizerT above in std lib.
Frequently such a tokenizer is what is really needed, rather than
regexp or scripting-style poor man's tokenizing using array.split and the 
like.

Andrew.

Feb 18 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message 
news:dt87fd$314d$1 digitaldaemon.com...
 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:dt80n7$2qiu$3 digitaldaemon.com...
 Writing a real lexer takes a lot of effort. That's why people invented 
 regex, it'll handle most jobs without having to write a lexer. C's 
 strtok() is embarrassingly inadequate.

 Why?

I'd like to see strtok() parse an email address out of a body of text.

Feb 19 2006

"Andrew Fedoniouk" <news terrainformatica.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote in message 
news:dt9ho8$20e4$3 digitaldaemon.com...

 Writing a real lexer takes a lot of effort. That's why people invented 
 regex, it'll handle most jobs without having to write a lexer. C's 
 strtok() is embarrassingly inadequate.

 Why?

 I'd like to see strtok() parse an email address out of a body of text.

I don't really understand "parse an email address out of a body of text."

Do you mean something like this:

char* pw = text;
url u;

forever
{
  pw = strtok( pw, " \t\n\r" ); if( !pw ) return;
  if( !u.parse(pw) ) continue;
  if( u.protocol() == url::MAILTO )
     //found - do something here
     ;
};

?

Andrew.

Feb 19 2006

Chris Sauls <ibisbasenji gmail.com> writes:

Andrew Fedoniouk wrote:
 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:dt9ho8$20e4$3 digitaldaemon.com...
 
 
Writing a real lexer takes a lot of effort. That's why people invented 
regex, it'll handle most jobs without having to write a lexer. C's 
strtok() is embarrassingly inadequate.

Why?

I'd like to see strtok() parse an email address out of a body of text.

 
 
 I don't really understand "parse an email address out of a body of text."
 
 Do you mean something like this:
 
 char* pw = text;
 url u;
 
 forever
 {
   pw = strtok( pw, " \t\n\r" ); if( !pw ) return;
   if( !u.parse(pw) ) continue;
   if( u.protocol() == url::MAILTO )
      //found - do something here
      ;
 };
 
 ?
 
 Andrew. 
 
 

I think he meant something more like (using MatchExpr, sorry):












Granted, I just tossed that together in five seconds flat, so it's probably not quite 
right.  I'm just recently starting to lean into the RegExp camp myself.  It's made parsing 
of Lyra scripts a dream.

One thing I miss from a scripting language in doing the above is PHP's lovely list() 
construct.  Pretending we had this in D:








-- Chris Nicholson-Sauls

Feb 19 2006

"Unknown W. Brackets" <unknown simplemachines.org> writes:

Andrew Fedoniouk,

What he's saying is... essentially... please take this string:

char[] some_text = "The email address Walter is posting from is 
newshound digitalmars.com.  The headers for your message have 
<news terrainformatica.com>, so I would assume that is your address.  My 
address can be found in this HTML: <a 
href=\"mailto:unknown simplemachines.org\">my email</a>";

Now use strtok to output just the email addresses.  I would expect the 
output to be like this:

1: newshound digitalmars.com
2: news terrainformatica.com
3: unknown simplemachines.org

How many lines will it take to grab those addresses, without using a 
regular expression?  You can use "like()" all you like, and strtok(), or 
even strpos()...

He does not mean a whitespace separated list of addresses, why would you 
need to work to parse that?  Most people would not use a regular 
expression for that, it'd be silly.

I think you're looking at this from a different angle than Walter is.

Just illustrating,
-[Unknown]


 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:dt9ho8$20e4$3 digitaldaemon.com...
 
 Writing a real lexer takes a lot of effort. That's why people invented 
 regex, it'll handle most jobs without having to write a lexer. C's 
 strtok() is embarrassingly inadequate.

 Why?

 I'd like to see strtok() parse an email address out of a body of text.

 
 I don't really understand "parse an email address out of a body of text."
 
 Do you mean something like this:
 
 char* pw = text;
 url u;
 
 forever
 {
   pw = strtok( pw, " \t\n\r" ); if( !pw ) return;
   if( !u.parse(pw) ) continue;
   if( u.protocol() == url::MAILTO )
      //found - do something here
      ;
 };
 
 ?
 
 Andrew.

Feb 19 2006

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 19 Feb 2006 14:47:43 -0800, Unknown W. Brackets  
<unknown simplemachines.org> wrote:
 Andrew Fedoniouk,

 What he's saying is... essentially... please take this string:

 char[] some_text = "The email address Walter is posting from is  
 newshound digitalmars.com.  The headers for your message have  
 <news terrainformatica.com>, so I would assume that is your address.  My  
 address can be found in this HTML: <a  
 href=\"mailto:unknown simplemachines.org\">my email</a>";

 Now use strtok to output just the email addresses.  I would expect the  
 output to be like this:

 1: newshound digitalmars.com
 2: news terrainformatica.com
 3: unknown simplemachines.org

 How many lines will it take to grab those addresses, without using a  
 regular expression?  You can use "like()" all you like, and strtok(), or  
 even strpos()...

Here's how I'd do it:

import std.stdio;
import std.string;

char[] some_text = "The email address Walter is posting from is  
newshound digitalmars.com.  The headers for your message have  
<news terrainformatica.com>, so I would assume that is your address.  My  
address can be found in this HTML: <a  
href=\"mailto:unknown simplemachines.org\">my email</a>";

void main()
{
	char[][] res;	
	res = parse_string(some_text);
	foreach(int i, char[] r; res)
		writefln("%d. %s",i+1,r);
}

bool valid_email_char(char c)
{
	char* special = "<>()[]\\.,;: \"";
	if (c == '.') return true;
	if (c <= 0x1F) return false;
	if (c == 0x7F) return false;
	if (c == ' ') return false;
	if (strchr(special,c)) return false;
	return true;
}

char[][] parse_string(char[] text)
{
	char[][] res;
	char* raw = toStringz(text);
	char* p;
	char* e;
	
	for(p = strchr(raw,' '); p; p = strchr(e,' ')) {
		for(e = p+1; valid_email_char(*e); e++) {}
		if (e > raw && *(e-1) == '.') e--;
		for(; p > raw && valid_email_char(*(p-1)); p--) {}
		res ~= p[0..(e-p)]; //add .dup if required
	}
	
	return res;
}

Regan

Feb 19 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:ops48ur1em23k2f5 nrage.netwin.co.nz...
 Here's how I'd do it:

Yours is a lot of code to do what a regex does. Now recognize a url <g>.
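
For comparison, a regex-based version of the same task might look like the sketch below. It is written in Python rather than D, the pattern is deliberately simplified and is not a full address grammar, and the sample text uses placeholder example.com addresses rather than the ones in the thread. `EMAIL`, `find_emails` and `SAMPLE` are names made up for this illustration.

```python
import re

# Simplified assumption: a run of local-part characters, "@", then two or
# more dot-separated domain labels. Real address syntax is more permissive.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(text):
    """Return every substring of `text` that looks like an email address."""
    return EMAIL.findall(text)


# Placeholder text shaped like the challenge string: one bare address,
# one in angle brackets, one inside an HTML mailto link.
SAMPLE = (
    "Walter posts from walter@example.com.  The headers have "
    "<andrew@example.org>, and the HTML says "
    '<a href="mailto:unknown@example.net">my email</a>'
)

if __name__ == "__main__":
    for i, address in enumerate(find_emails(SAMPLE), start=1):
        print(f"{i}: {address}")
```

The whole recognizer is one pattern, and the sentence-ending period after the first address is left out because every dot in the domain must be followed by another label.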

Feb 19 2006

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 19 Feb 2006 18:52:19 -0800, Walter Bright  
<newshound digitalmars.com> wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:ops48ur1em23k2f5 nrage.netwin.co.nz...
 Here's how I'd do it:

 Yours is a lot of code to do what a regex does.

This is true, though my code is likely faster.

 Now recognize a url <g>.

Nah. You've made your point.. in fact I was secretly trying to help. <g>

Regex is a good general purpose string parsing facility. I personally find  
composing a regex can be complicated, likely it's easier with practice. A  
custom piece of code is probably faster and I find it easier to tweak. In  
the end, unless it was performance critical or has resisted my initial  
efforts at composing a regex, I'd probably use a regex.

Regan

Feb 19 2006

Georg Wrede <georg.wrede nospam.org> writes:

Regan Heath wrote:
 Walter Bright <newshound digitalmars.com> wrote:
 "Regan Heath" <regan netwin.co.nz> wrote 

 Here's how I'd do it:

 Yours is a lot of code to do what a regex does.

 
 This is true, though my code is likely faster.
 
 Now recognize a url <g>.

 
 Nah. You've made your point.. in fact I was secretly trying to help. <g>

DISCLAIMER INSERTED WHEN PROOFREADING:

I'm not attacking you, or anybody's opinion here, I'm just thinking 
aloud -- mostly to sort out my own opinion on this issue!  :-)

 Regex is a good general purpose string parsing facility. I personally 
 find  composing a regex can be complicated, likely it's easier with 
 practice. A  custom piece of code is probably faster and I find it 
 easier to tweak. In  the end, unless it was performance critical or has 
 resisted my initial  efforts at composing a regex, I'd probably use a 
 regex.

Heh, interestingly, I have the same feeling about all three!! (I.e. 
composing nontrivial regexes is hard, custom code is faster and easier 
to tweak.)

But I can't but wonder whether I'm wrong on all three!

In other words, writing custom code to do the same as a nontrivial 
regexp might feel the easier choice at the outset, but the sheer number 
of lines required (for example for the url recognition task) makes the 
code error prone and unobvious.

And I too _feel_ that the custom code would be faster, but, on second 
thought, I'd probably have to do some intensive optimizing cycles if I 
were up against an average regexp implementation. ;-( This regexp stuff is 
"well understood" and has been polished over decades, after all.

As to "easier to tweak", suppose that Boss comes to you 2 months later 
and wants this Url Recognizer (which you had to write in a hurry to 
compete with the regexp guy in the next cubicle) to only accept 
top-level domains in country specific urls, you'd be hard put to know 
where to start tweaking, while the other guy gets it right in 30 seconds 
flat tweaking his regexp code.

(The boss' tweak accepts foo.fi but not foo.bar.fi nor foo.com)
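
A minimal sketch of what such a 30-second tweak could look like, in Python. It assumes that "country specific" means a two-letter top-level domain and that exactly one label may precede it. The names `ANY_HOST`, `CC_HOST` and `accepts_host` are hypothetical.

```python
import re

# Before the tweak (assumed starting point): any dotted host name.
ANY_HOST = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# After the tweak: exactly one label followed by a two-letter TLD.
CC_HOST = re.compile(r"[A-Za-z0-9-]+\.[A-Za-z]{2}")


def accepts_host(host):
    """Return True only for hosts of the form name.cc (two-letter TLD)."""
    return CC_HOST.fullmatch(host) is not None
```

The change from `ANY_HOST` to `CC_HOST` touches only the tail of the pattern, which is the kind of local edit the paragraph above has in mind.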

 Here's how I'd do it:
 
 import std.stdio;
 import std.string;
 
 char[] some_text = "The email address Walter is posting from is 
newshound digitalmars.com.  The headers for your message have 
<news terrainformatica.com>, so I would assume that is your address.  My 
address can be found in this HTML: <a 
href=\"mailto:unknown simplemachines.org\">my email</a>";
 
 void main()
 {
     char[][] res;   
     res = parse_string(some_text);
     foreach(int i, char[] r; res)
         writefln("%d. %s",i+1,r);
 }
 
 bool valid_email_char(char c)
 {
     char* special = "<>()[]\\.,;: \"";
     if (c == '.') return true;
     if (c <= 0x1F) return false;
     if (c == 0x7F) return false;
     if (c == ' ') return false;
     if (strchr(special,c)) return false;
     return true;
 }
 
 char[][] parse_string(char[] text)
 {
     char[][] res;
     char* raw = toStringz(text);
     char* p;
     char* e;
     
     for(p = strchr(raw,' '); p; p = strchr(e,' ')) {
         for(e = p+1; valid_email_char(*e); e++) {}
         if (e > raw && *(e-1) == '.') e--;
         for(; p > raw && valid_email_char(*(p-1)); p--) {}
         res ~= p[0..(e-p)]; //add .dup if required
     }
     
     return res;
 }

Feb 20 2006

Georg Wrede <georg.wrede nospam.org> writes:

Andrew Fedoniouk wrote:
 "Walter Bright" <newshound digitalmars.com>
 "Andrew Fedoniouk" <news terrainformatica.com>


 In general shortcuts are good but in this particular case it has
 hidden side effects in creation of new RegExp object on each test
 invocation.

 
 Yes, but why is that a bad thing?

 
 
 You need to explain very well what is going on under the hood of this
 ~~ - it is a stateful operator (if it is /g).
 
 <ot>
 
 I am using stream tokenizer in Harmonia instead of this /g. (class
 TokenizerT(CHAR) // harmonia/string.d)
 
 Simple like(pattern)  method is enough in 90% of cases.
 
 Perl is completely different story - it is built around RegExp. And
 it is typeless.
 
 </ot>
 
 BTW: Have you seen Nemerle and its way of meta-programming? 
 http://nemerle.org/

Had I to do stuff on the M$ "platform", I'd definitely look long and 


The macro thing looks quite a bit like what I had in mind last winter 
when we were discussing whether the high level (that is, 
metaprogramming) features of D should be implemented in a syntax 
distinct from the "normal" language syntax or not.

Seems I lost. :-)
(No hard feelings, Walter and Don are really amazing me, over and over 
again!)

Still, there's a lot of obvious stuff that seems trivial with a separate 
syntax, while either impossible or cumbersome with the current one. (But 
hey, with the rate W&D are going, all that will also be fixed by D 1.5.)

Feb 20 2006
