digitalmars.D - $`, $', $&, $n - sugar or cyclamates?

Walter Bright (19/19) Feb 15 2006 D dramatically improves the convenience of string handling over C++. But...

Hasan Aljudy (2/28) Feb 15 2006 I don't have much to do with regexes .. but please .. the $ sign is ugly!...
Ameer Armaly (2/25) Feb 15 2006
Trevor Parscal (4/23) Feb 15 2006 Leave the $ sign for scripting languages...
Derek Parnell (23/46) Feb 15 2006 Thanks for this Walter. Although it adds no new functionality to

Sean Kelly (5/28) Feb 15 2006 And this was my concern too. But perhaps this is a bridge best left
Walter Bright (5/16) Feb 15 2006 You're right in that all it really does is offer an easier way to get at...

Tom (8/27) Feb 15 2006 On the contrary I think "$" is a very valuable symbol and should be used...
John Demme (4/15) Feb 15 2006 Oh Bob no... Don't turn D into Perl. I like the $ for short cuts and su...

Walter Bright (6/11) Feb 15 2006 <g>. I considered setting this up as a vote:

pragma (28/39) Feb 15 2006 Well, assuming that your mind is made up on this way or no way, I'd have...

Walter Bright (4/17) Feb 15 2006 Yes, opMatch. Already done!

Ivan Senji (23/33) Feb 16 2006 Walter!! You are really crazy! (In a really really good way)

Walter Bright (3/4) Feb 16 2006 It's supposed to work <g>.

Dave (12/23) Feb 15 2006 I think both apply and are not mutually exclusive <g>
Unknown W. Brackets (16/33) Feb 15 2006 I personally don't see why it has to be 1 or 2. I think compromise is a...
jicman (2/13) Feb 16 2006

jicman (2/17) Feb 16 2006 I agree. Perl is perl, D is D.

S. Chancellor (8/33) Feb 15 2006 With this you've essentially bound syntax to the RegExp class, or are

Derek Parnell (15/17) Feb 15 2006 I use regular expression matching a lot in the type of programming I do,

Walter Bright (4/10) Feb 15 2006 All you need to use it with your own custom type is provide an opMatch()...

Kris (39/58) Feb 15 2006 There seem to be multiple issues here. The first one, which you ask abou...

Walter Bright (22/63) Feb 15 2006 Nothing, really. But are they more readable than _match.pre, etc.?

Oskar Linde (30/42) Feb 16 2006 Have you considered making this more general? I.e. for all if statements...

Walter Bright (3/32) Feb 16 2006 I never thought of that. It's an intriguing idea.

pragma (3/45) Feb 16 2006 Something along these lines would *most certainly* get my vote!

kris (2/55) Feb 16 2006 Yes ~ mine too

Sean Kelly (3/28) Feb 16 2006 Mine too.

Sean Kelly (30/58) Feb 16 2006 Hold on. Walter, can you explain this injection business a bit? For

Oskar Linde (11/42) Feb 16 2006 Those are AndAndExpression and OrOrExpression and will not inject anythi...

Sean Kelly (11/56) Feb 16 2006 Very weird. So a MatchExpression by itself has a boolean result but

Oskar Linde (6/19) Feb 16 2006 No, not boolean. A MatchExpression has a _Match* result. This result is ...

Sean Kelly (4/15) Feb 16 2006 Oh right. And pointers can be implicitly evaluated as logical

Julio César Carrascal Urquijo (2/5) Feb 16 2006 This is a great idea. I like it.

Walter Bright (16/21) Feb 16 2006 There is one problem with it: every time an IfStatement is added to exis...

Oskar Linde (45/70) Feb 17 2006 Coding guidelines would probably say that $ should be assigned to a

pragma (8/20) Feb 17 2006 Sort of like an auto 'auto' declaration? I gather that the point is tha...

Oskar Linde (9/29) Feb 17 2006 Yes, exactly so. The scope of such variables declared in the operand of

Walter Bright (9/24) Feb 17 2006 Yes, that's a problem.

Fredrik Olsson (7/45) Feb 17 2006 Yes! This one I like.
Sean Kelly (4/36) Feb 17 2006 I like it. Assuming this were implemented, would it affect all

Sai (1/7) Feb 17 2006 I personally like the former, it does not need special 'if' syntax.

Ivan Senji (8/45) Feb 17 2006 How would this scale to something like
Georg Wrede (6/38) Feb 17 2006 I'm uneasy with this. We're playing with fundamental constructs here.

Kris (6/18) Feb 17 2006 I'm all for getting some kind of regex sugar in the grammar, but also fe...

Sean Kelly (6/11) Feb 17 2006 As long as these new features don't break old code, I'm fine with Walter...

Kris (2/12) Feb 17 2006 That would be cool.

Sean Kelly (16/25) Feb 17 2006 True enough. However, the above syntax is currently illegal, so there's...

Deewiant (5/12) Feb 17 2006 That would sort of make the whole token pointless IMO - easier just to d...

kris (23/106) Feb 16 2006 Well, there's always "in" ...

Walter Bright (32/85) Feb 16 2006 Fair enough. Let's see what others think.

kris (54/92) Feb 16 2006 That doesn't mean D should adopt arbitrary symbols, Walter. If you want

Sean Kelly (19/58) Feb 16 2006 I'm branching Ares before I check in this last block of changes. In the...

Thomas Kuehne (14/23) Feb 16 2006 -----BEGIN PGP SIGNED MESSAGE-----

Sean Kelly (8/30) Feb 16 2006 The same as the problems with std::vector in C++ (though I don't
Kris (2/5) Feb 16 2006 Easy fix ~ change the bool alias to byte, instead of bit :-)

Sean Kelly (7/13) Feb 16 2006 I already use byte in some cases :-) But it lacks the boolean value

Kris (4/15) Feb 16 2006 Yes, you're right of course. Would be just great if Walter would add a t...

Regan Heath (34/53) Feb 16 2006 A true bool would make several people happy.. but once one existed peopl...

Sean Kelly (10/49) Feb 16 2006 This is only a slippery slope if we want it to be ;-) I think the
Walter Bright (4/7) Feb 16 2006 I regularly do bit masking and shifting on ints. I'm so used to it, I do...

Derek Parnell (10/18) Feb 16 2006 YOU ARE DEAD WRONG! Sheesh!!! Not all us are blessed with your abilities...

Walter Bright (9/24) Feb 16 2006 What about using some functions instead:

Derek Parnell (9/35) Feb 16 2006 You mean like std.regexp library functions? Oh that's right ... we have ...

Kris (2/9) Feb 16 2006 Besides, its easy to use op-overloads for such things as necessary.

Derek Parnell (15/44) Feb 16 2006 I regard the syntax
Bruno Medeiros (11/69) Feb 17 2006 An interesting idea, but maybe, to avoid syntax conflicts, we

Regan Heath (4/15) Feb 17 2006 I like it.

Walter Bright (51/105) Feb 16 2006 Those had to go because === was indistinguishable from == in many fonts.

Sean Kelly (24/37) Feb 16 2006 This is really more of a library issue than a compiler issue. My

Walter Bright (20/43) Feb 16 2006 I was concerned that code that did not use MatchExpressions might

Sean Kelly (6/13) Feb 16 2006 Perhaps I'm being idealistic, as I simply don't believe the runtime

Walter Bright (10/22) Feb 16 2006 Consider that there's no way to implement C, D, etc., without some runti...

Sean Kelly (9/31) Feb 17 2006 Just to be clear, by "standard library code" I actually meant D code

Walter Bright (4/12) Feb 17 2006 I think it's implied by it being part of the language spec. Regardless, ...

Kris (31/71) Feb 16 2006 Can't say that I agree, but my opinion matters rather little anyway

Walter Bright (3/14) Feb 16 2006 How do you interpret the fact that it has failed to gain traction among ...

Kris (21/37) Feb 16 2006 I noted a few reasons previously, regarding differing approaches and

Walter Bright (14/33) Feb 16 2006 This might be a circular result - people don't use regex in C because

Sean Kelly (16/47) Feb 16 2006 For what it's worth, the latest release of Ares trims a lot of fat out

Walter Bright (4/9) Feb 16 2006 Very little actually changed, what I did was resort the order so it was ...

Georg Wrede (15/22) Feb 16 2006 Hmm.

Walter Bright (6/14) Feb 16 2006 There are a lot of cool things you can do in script languages because th...

Georg Wrede (3/21) Feb 17 2006 Neither do I.

Walter Bright (3/11) Feb 17 2006 My answer is because they're inconvenient to use in C/C++.
James Dunne (31/58) Feb 20 2006 My answer is that regular expressions simply aren't powerful enough for

Georg Wrede (6/10) Feb 16 2006 Would it be correct to assume that if we had compile-time regexps, then

Georg Wrede (40/78) Feb 16 2006 There are 2 things reducing its usage.

Walter Bright (3/6) Feb 16 2006 I think the $` is pretty much dead now .

Sean Kelly (4/11) Feb 16 2006 I'm half inclined to suggest -> for ~~, though there doesn't seem to be

Walter Bright (7/18) Feb 16 2006 Two cons:

James Dunne (6/32) Feb 15 2006 I'd rather make my code easier to read than write. I don't use regexps
Roberto Mariottini (17/37) Feb 15 2006 No.
Oskar Linde (19/28) Feb 16 2006 ` is not readily available on all keyboards. Some fonts also have proble...

Walter Bright (7/22) Feb 16 2006 $1, $2, $3, ...

Oskar Linde (40/51) Feb 17 2006 So why does _match have to be a pointer? Would something like this not w...

Walter Bright (7/21) Feb 17 2006 I wanted it to work with both pointers to structs and to class reference...

bobef (10/36) Feb 16 2006 It is nice feature but I don't think such thing should be part of the
Charles (5/24) Feb 16 2006 Sweet jesus ... the horror.
David Medlock (16/42) Feb 16 2006 I havent read this whole thread, but pardon if this has been suggested.

Walter Bright (4/6) Feb 16 2006 Why, indeed. Oskar has brought it up, and he and you are right. I'm goin...

Hasan Aljudy (6/17) Feb 16 2006 I agree with the "foreach" point/suggestion ..

"Walter Bright" <newshound digitalmars.com> writes:

D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

? Syntactic sugar is often a good idea, but at what point do they become 
cyclamates and cause cancer in laboratory animals? Will these $ tokens 
render D more accessible, but perhaps too unreadable?
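
For readers who do not know the Perl names, here is a rough sketch of what the four values hold. It is not the `~~` syntax from this post: it assumes the `std.regex` module of the current standard library instead of `std.regexp`, and the helper name `matchParts`, the input string and the pattern are all made up for illustration. The pattern is assumed to contain at least one capture group.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: returns [pre, post, whole match, group 1],
/// or null when the pattern does not match.
string[] matchParts(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    if (m.empty)          // plays the role of `if (re ~~ string)`
        return null;
    return [
        m.pre,            // text before the match   (_match.pre,      proposed $`)
        m.post,           // text after the match    (_match.post,     proposed $')
        m.hit,            // the whole match         (_match.match(0), proposed $&)
        m[1],             // first capture group     (_match.match(1), proposed $1)
    ];
}

void main()
{
    writeln(matchParts("set key = value;", `(\w+) = (\w+)`));
}
```

With the sample input, the text before the match is `set `, the text after it is `;`, the whole match is `key = value`, and the first group is `key`.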

Feb 15 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Walter Bright wrote:
 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable? 
 
 

I don't have much to do with regexes .. but please .. the $ sign is ugly!!

Feb 15 2006

"Ameer Armaly" <ameer_armaly hotmail.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote in message 
news:dt088e$1svm$2 digitaldaemon.com...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

 So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

 Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable?

Hmm... I don't do much with regular expressions, but the presence of too 
much sugar can be counterproductive; I personally think the standard lib is 
the place for that kind of thing.

Feb 15 2006

Trevor Parscal <Trevor_member pathlink.com> writes:

In article <dt088e$1svm$2 digitaldaemon.com>, Walter Bright says...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

? Syntactic sugar is often a good idea, but at what point do they become 
cyclamates and cause cancer in laboratory animals? Will these $ tokens 
render D more accessible, but perhaps too unreadable? 

Leave the $ sign for scripting languages...

Thanks,
Trevor Parscal

Feb 15 2006

Derek Parnell <derek psych.ward> writes:

On Wed, 15 Feb 2006 13:59:33 -0800, Walter Bright wrote:

 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }

Thanks for this Walter. Although it adds no new functionality to
applications, it does say that D is a serious player in making string
handling programs easier to write and maintain. I expect that std.regexp
will still stay around and that this new feature is merely a portal into
that library.
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable?

My first thought was "ouch! - not pleasant". After some consideration I'm
now leaning towards the idea that we should hold off on implementing these
shortcuts for now and wait to see if they are actually required or not. And
then, if there is a crying need for them, to come up with a set of
shortcuts that will be acceptable enough. 

Currently the '$' symbol is associated with arrays and lengths, and not as
a general purpose lead-in character to symbol values. To mix these two
disparate concepts in coders' minds might not be fruitful. However, there
may be other alternatives yet to be discovered, so the concept ought not to
be totally abandoned just yet.
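
As a reminder of the existing meaning referred to here, a minimal sketch: inside an index or slice expression, `$` stands for the length of the array being indexed. The function names `lastOf` and `tailOf` are illustrative only.

```d
import std.stdio : writeln;

/// Last element: `$` is the array's length inside the brackets.
int lastOf(int[] a)
{
    return a[$ - 1];
}

/// Everything after the first element, as a slice up to the end.
int[] tailOf(int[] a)
{
    return a[1 .. $];
}

void main()
{
    int[] a = [10, 20, 30, 40];
    writeln(lastOf(a));
    writeln(tailOf(a));
}
```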

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 10:32:50 AM

Feb 15 2006

Sean Kelly <sean f4.ca> writes:

Derek Parnell wrote:
 On Wed, 15 Feb 2006 13:59:33 -0800, Walter Bright wrote:
  
 Should we do some aliases:

     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)

 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable?

 
 My first thought was "ouch! - not pleasant". After some consideration I'm
 now leaning towards the idea that we should hold off on implementing these
 shortcuts for now and wait to see if they are actually required or not. And
 then, if there is a crying need for them, to come up with a set of
 shortcuts that will be acceptable enough. 

Agreed.

 Currently the '$' symbol is associated with arrays and lengths, and not as
 a general purpose lead-in character to symbol values. To mix these two
 disparate concepts in coders minds might not be fruitful. However, there
 may be other alternatives yet to be discovered, so the concept ought not to
 be totally abandoned just yet.

And this was my concern too.  But perhaps this is a bridge best left 
ignored until there's a reason to jump.


Sean

Feb 15 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:15w2x5659i8ey$.p4zbzif24wfw$.dlg 40tude.net...
 Thanks for this Walter. Although it adds no new functionality to
 applications, it does say that D is a serious player in making string
 handling programs easier to write and maintain. I expect that std.regexp
 will still stay around and that this new feature is merely a portal into
 that library.

You're right in that all it really does is offer an easier way to get at 
std.regexp.

 My first thought was "ouch! - not pleasant". After some consideration I'm
 now leaning towards the idea that we should hold off on implementing these
 shortcuts for now and wait to see if they are actually required or not. 
 And
 then, if there is a crying need for them, to come up with a set of
 shortcuts that will be acceptable enough.

That's why I didn't do them yet.

Feb 15 2006

Tom <Tom_member pathlink.com> writes:

In article <dt088e$1svm$2 digitaldaemon.com>, Walter Bright says...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

? Syntactic sugar is often a good idea, but at what point do they become 
cyclamates and cause cancer in laboratory animals? Will these $ tokens 
render D more accessible, but perhaps too unreadable? 

On the contrary I think "$" is a very valuable symbol and should be used. Though
the symbol "`" is very inconvenient (at least for the Spanish keyboard layout), ugly
and could lead to confusion with "'" symbol - as I've seen many times and which
I personally don't like to see used in such a way as "$'" -.

Maybe "$[" and "$]", don't know.

Just my opinion,

Tom;

Feb 15 2006

John Demme <me teqdruid.com> writes:

Walter Bright wrote:
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and such,
but please no random symbols.  I like $match.pre and $length, etc... but $&
and $` don't mean anything to me!

Feb 15 2006

"Walter Bright" <newshound digitalmars.com> writes:

"John Demme" <me teqdruid.com> wrote in message 
news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

<g>. I considered setting this up as a vote:

Vote for 1:

(1) If I wanted to write ugly programs I'd use Perl, not D.

(2) Cool! I can now dump my Perl scripts and use D!

Feb 15 2006

pragma <pragma_member pathlink.com> writes:

In article <dt0hbb$25iq$2 digitaldaemon.com>, Walter Bright says...
"John Demme" <me teqdruid.com> wrote in message 
news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

<g>. I considered setting this up as a vote:

Vote for 1:

(1) If I wanted to write ugly programs I'd use Perl, not D.

(2) Cool! I can now dump my Perl scripts and use D! 

Well, assuming that your mind is made up on this way or no way, I'd have to lean
toward (2).  It's there to be used, but if I object to it personally, I can
abstain from using it.

Just some food for thought, as I think there's plenty left to be worked out in
this concept. :)

IMHO, using "~~" as a token doesn't look right yet, but that's probably because
this would be the first time that token has been used in a programming language
(unless I'm mistaken).  The only thing I could possibly suggest to use
differently would be the at-cost ("@") symbol:

if("regular expression" @ "operand"){ /*...*/ }

This looks a little more arithmetic to my eye than "~~". :)

The dollar-sign operators look good, but "$n" seems limited to me.  Why not open
this up to array-indexing so it's more compatible with foreach, arrays and other
things D?  Also, what about if I want to pass the set of matches as an array?
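
One way to picture the array-style access asked about here, as a sketch only: it assumes the current `std.regex` module rather than the `_match` object under discussion, and the helper name `allGroups` is hypothetical. The match result there is a range, so it can be indexed, walked with foreach, and copied into an ordinary array.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: collects the whole match (index 0) and every
/// capture group into a plain string array.
string[] allGroups(string input, string pattern)
{
    string[] groups;
    foreach (g; matchFirst(input, regex(pattern)))
        groups ~= g;      // empty result when nothing matched
    return groups;
}

void main()
{
    foreach (i, g; allGroups("Feb 2006-02-15", `(\d+)-(\d+)-(\d+)`))
        writeln(i, ": ", g);
}
```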

The '$x' tokens are sure to lex great, but isn't this running the risk of
overloading the '$' symbol a bit much (from a visual standpoint)?  

if("$\w*" ~~ "hello world"){
mystring[0..$&.length] = $&; //eek!
}

Also, am I to assume that we'll get an "opProcess" operator overload to use on
our classes?  As long as _match is flexible enough to accept any type, this
could really work.  To my eye, the compiler could accept a custom class or
struct as the _match value (kind of like an internal 'auto') so long as its
namespace provides the .pre, .post, .match members.  All-in-all, it would be a
rather nice side effect of all this, as things like Spirit have been difficult
to implement as D has fewer operator overloads than C++.


- Eric Anderton at yahoo

Feb 15 2006

"Walter Bright" <newshound digitalmars.com> writes:

"pragma" <pragma_member pathlink.com> wrote in message 
news:dt0mfk$29qc$1 digitaldaemon.com...
 Also, am I to assume that we'll get an "opProcess" operator overload to 
 use on
 our classes?

Yes, opMatch. Already done!

  As long as _match is flexible enough to accept any type, this
 could really work.  To my eye, the compiler could accept a custom class or
 struct as the _match value (kind of like an internal 'auto') so long as 
 its
 namespace provides the .pre, .post, .match members.

Already done!

 All-in-all, it would be a
 rather nice side effect of all this, as things like Spirit have been 
 difficult
 to implement as D has fewer operator overloads than C++.


 - Eric Anderton at yahoo

Feb 15 2006

Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:

Walter Bright wrote:
 "pragma" <pragma_member pathlink.com> wrote in message 
 news:dt0mfk$29qc$1 digitaldaemon.com...
 
Also, am I to assume that we'll get an "opProcess" operator overload to 
use on
our classes?

 
 
 Yes, opMatch. Already done!
 

Walter!! You are really crazy! (In a really really good way)

I just tried this for fun and it works:

<code>

import std.stdio;

class ArrayBeginsWith0and1
{
   // Invoked for `ArrayBeginsWith0and1 ~~ nums`:
   // true when nums has at least two elements and starts with 0, 1.
   static bool opMatch(int[] nums)
   {
     return nums.length >= 2 && nums[0] == 0 && nums[1] == 1;
   }
}

void main()
{
   static int[] somearray1 = [0,1,2];
   static int[] somearray2 = [2,1,2];

   writefln(ArrayBeginsWith0and1 ~~ somearray1); //prints true
   writefln(ArrayBeginsWith0and1 ~~ somearray2); //prints false
}

</code>

I hope this isn't a bug that this works?

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Ivan Senji" <ivan.senji_REMOVE_ _THIS__gmail.com> wrote in message 
news:dt1u3t$d97$1 digitaldaemon.com...
 I hope this isn't a bug that this works?

It's supposed to work <g>.

Feb 16 2006

Dave <Dave_member pathlink.com> writes:

In article <dt0hbb$25iq$2 digitaldaemon.com>, Walter Bright says...
"John Demme" <me teqdruid.com> wrote in message 
news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

<g>. I considered setting this up as a vote:

Vote for 1:

(1) If I wanted to write ugly programs I'd use Perl, not D.

(2) Cool! I can now dump my Perl scripts and use D! 

I think both apply and are not mutually exclusive <g>

For me, the big part of supporting the most common regex operation in the
language itself is that quick scripts using it can be kicked out without having
to import something or remember the details of the RegExp class. Crazy (or
lazy?), but I find that appealing when comparing it to a scripting language. So
that's a vote for (2).

I've never been a big fan of most of Perl's syntactical sugar - just too easy to
miss something when you're reading it, so that's a vote for (1). And besides,
one will never be able to copy and paste much of anything from Perl into D so
there isn't any 'sweet' benefit there either <g>

- Dave

Feb 15 2006

"Unknown W. Brackets" <unknown simplemachines.org> writes:

I personally don't see why it has to be 1 or 2.  I think compromise is a 
great thing.

I should note first that I actually like $ in scripting languages, 
because it tends to make variables stand out (not hide them.)

You seem to be suggesting either using _match.match(0) (ick!) or $&.... 
why?  Why can't it be:

    $pre => _match.pre
    $post => _match.post
    $match => _match.match(0)
    $5 => _match.match(5)

Yes, yes, I realize this looks more like those scripting-language 
variables, but it's also clearer than Perl's syntax, and almost as easy 
to type.  I would spend more time making sure I'm pressing the right 
symbol than typing "pre" or some such.

Just my opinion.

-[Unknown]


 "John Demme" <me teqdruid.com> wrote in message 
 news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

 
 <g>. I considered setting this up as a vote:
 
 Vote for 1:
 
 (1) If I wanted to write ugly programs I'd use Perl, not D.
 
 (2) Cool! I can now dump my Perl scripts and use D!

Feb 15 2006

jicman <jicman_member pathlink.com> writes:

1

Walter Bright says...
"John Demme" <me teqdruid.com> wrote in message 
news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

<g>. I considered setting this up as a vote:

Vote for 1:

(1) If I wanted to write ugly programs I'd use Perl, not D.

(2) Cool! I can now dump my Perl scripts and use D!

Feb 16 2006

jicman <jicman_member pathlink.com> writes:

John Demme says...
Walter Bright wrote:
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and such,
but please no random symbols.  I like $match.pre and $length, ect... but $&
and $` don't mean anything to me!

I agree.  Perl is perl, D is D.

Feb 16 2006

S. Chancellor <dnewsgr mephit.kicks-ass.org> writes:

On 2006-02-15 13:59:33 -0800, "Walter Bright" <newshound digitalmars.com> said:

 D dramatically improves the convenience of string handling over C++. 
 But while I think using the library std.regexp is straightforward, 
 obviously it just isn't gaining traction. People like the shortcut 
 approaches Ruby and Perl use for regular expressions, hence the new D 
 match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they 
 become cyclamates and cause cancer in laboratory animals? Will these $ 
 tokens render D more accessible, but perhaps too unreadable?

With this you've essentially bound syntax to the RegExp class, or are 
you not using that for this?    I do believe I recall some statements 
by you in the past against standard libraries being an integral part of 
the computer language.  Though, I'm too lazy to dig them up right now.

My preference is that this match syntax be removed, and the aliases 
never see the light of day.  I use perl for this sort of stuff.

-S.

Feb 15 2006

Derek Parnell <derek psych.ward> writes:

On Wed, 15 Feb 2006 18:06:45 -0800, S. Chancellor wrote:
 
 My preference is that this match syntax be removed, and the aliases 
 never see the light of day.  I use perl for this sort of stuff.

I use regular expression matching a lot in the type of programming I do,
e.g. Build, and I suspect I'd find perl far too slow for the purpose. 

I haven't used the std.regexp library because it doesn't really support
Unicode correctly so I've written simple functions to do some pattern matching
for my needs. And as I've just found out, the new pattern matching just
uses the standard library and Unicode support is not there, so I still
can't use it.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 1:38:45 PM

Feb 15 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:sedgdqrvihce.1s7xzb5qubodc$.dlg 40tude.net...
 I haven't used the std.regexp library because it doesn't really support
 Unicode correctly so I've written simple functions to some pattern 
 matching
 for my needs. And as I've just found out, the new pattern matching just
 uses the standard library and Unicode support is not there, so I still
 can't use it.

All you need to use it with your own custom type is provide an opMatch() 
overload.
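
A sketch of the general idea, under loud assumptions: the exact `opMatch` signature is not spelled out in this post, so the code below does not use it. It uses an ordinary method named `match` on a hypothetical `Literal` type, returning a hypothetical `LiteralMatch` struct that carries the same pre/hit/post pieces. It only shows that a non-regex matcher can offer the same shape of result.

```d
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical result type mirroring the pre/post/match members.
struct LiteralMatch
{
    bool matched;
    string pre, hit, post;
}

/// Hypothetical matcher that looks for a literal substring, not a regex.
struct Literal
{
    string needle;

    LiteralMatch match(string haystack) const
    {
        immutable i = haystack.indexOf(needle);
        if (i < 0)
            return LiteralMatch(false);
        immutable end = i + needle.length;
        return LiteralMatch(true, haystack[0 .. i], haystack[i .. end], haystack[end .. $]);
    }
}

void main()
{
    auto m = Literal("~~").match("re ~~ text");
    if (m.matched)
        writeln(m.pre, "|", m.hit, "|", m.post);
}
```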

Feb 15 2006

"Kris" <fu bar.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

 So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

 Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable?


There seem to be multiple issues here. The first one, which you ask about, 
is related to the syntax. At first blush, the ~~ looks like an approximate 
approximation, and then making D look like a malformed Perl is surely a 
mistake. What the heck is wrong with $match.pre, $match.post, 
$match.index(n) instead? At least they're readable :-)

Additionally, I thought '~' was used for concatenation? Because '+' is 
overloaded in other languages? Isn't that just exactly what you're now doing 
with '~' ? I mean, what does a "pattern within" operation have to do with 
concatenation?

Then, you say this is applicable only to char[]. What about wchar[] and 
dchar[]? Are they now relegated to second-class citizens? It's no use 
converting those arrays into char[] on the fly ~ apart from the heap 
activity and conversion that would ensue (for both operands; one of which 
could be rather substantial), $match.pre and friends would also have to do 
conversions back into the original format. Ugghh.

Yet another issue is with respect to case-folding (which is often used with 
regex expressions). You see, unicode case-folding does not follow the 
trivial rules of ASCII ~ you can't just call tolower() and hope for the 
best. Thus, there needs to be some mechanism to support alternate, more 
appropriate, converters.
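
The gap between ASCII and Unicode case handling can be illustrated with a small sketch. It assumes present-day Phobos (`std.ascii` and `std.uni`), which postdates this thread; the two helper names are made up for illustration.

```d
import std.ascii : asciiLower = toLower;
import std.stdio : writeln;
import std.uni : uniLower = toLower;

/// Hypothetical helper: compares two characters using ASCII-only lowering.
bool sameIgnoringCaseAscii(dchar a, dchar b)
{
    // std.ascii.toLower leaves every non-ASCII character unchanged.
    return asciiLower(a) == asciiLower(b);
}

/// Hypothetical helper: compares two characters using Unicode-aware lowering.
bool sameIgnoringCaseUnicode(dchar a, dchar b)
{
    return uniLower(a) == uniLower(b);
}

void main()
{
    writeln(sameIgnoringCaseAscii('A', 'a'));    // true
    writeln(sameIgnoringCaseAscii('É', 'é'));    // false
    writeln(sameIgnoringCaseUnicode('É', 'é'));  // true
}
```

Per-character lowering is still only a simple mapping; full case folding, where one character can map to several, needs more than this.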

In retrospect, much of this should probably be handled via template usage 
(for the different UTF types). And the converter issue can be resolved by 
supporting some kind of assignable or plug-in module. All of this can be 
handled by a templated class. I attempted to do just this with your RegExp 
class, but ran into problems related to how patterns are stored in the 
"instruction" stream (size differences between char and dchar, for example).

I'm an advocate for potentially getting regex support into the grammar but, 
on the face of it, your approach just doesn't appear to be considered in a 
particularly thorough manner. There again, perhaps you've already addressed 
the above issues, and the resolution is just not currently visible?

Perhaps this whole thing should wait until after we see what can be done 
with the regex templates, so that there's some experience behind the 
grammar? I mean, that would surely be better than having to remove the above 
at some point in the future. What's the big rush with built-in regex anyway? 
I really do think it should wait until we have some solid experience with 
regex templates ~ don't you think it's rather likely we'll learn something 
really useful that applies directly to a built-in grammar?

- Kris

Feb 15 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Kris" <fu bar.com> wrote in message news:dt0q7n$2cuo$1 digitaldaemon.com...
 There seem to be multiple issues here. The first one, which you ask about, 
 is related to the syntax. At first blush, the ~~ looks like an approximate 
 approximation, and then making D look like a malformed Perl is surely a 
 mistake.

If you've got a better idea for tokens ~~ and !~ ?

 What the heck is wrong with $match.pre, $match.post, $match.index(n) 
 instead? At least they're readable :-)

Nothing, really. But are they more readable than _match.pre, etc.?

 Additionally, I thought '~' was used for concatenation?

It is.

 Because '+' is overloaded in other languages? Isn't that just exactly what 
 you're now doing with '~' ?

'=' and '==' mean entirely different things. So does / and /*. I don't think 
~~ need have anything to do with complement or concatenation.

 I mean, what does a "pattern within" operation have to do with 
 concatenation?

Nothing at all.

 Then, you say this is applicable only to char[]. What about wchar[] and 
 dchar[]? Are they now relegated to second-class citizens? It's no use 
 converting those arrays into char[] on the fly ~ apart from the heap 
 activity and conversion that would ensue (for both operands; one of which 
 could be rather substantial), $match.pre and friends would also have to do 
 conversions back into the original format. Ugghh.

That is a problem, one that would get solved when RegExp can do wchar and 
dchar. That isn't a technical problem, it's more of a getting around to it 
problem.

 Yet another issue is with respect to case-folding (which is often used 
 with regex expressions). You see, unicode case-folding does not follow the 
 trivial rules of ASCII ~ you can't just call tolower() and hope for the 
 best. Thus, there needs to be some mechanism to support alternate, more 
 appropriate, converters.

I agree that case is an issue. That's why this also works:

    if (RegExp("string", "i") ~~ "string") ...

and can work with any class type as the left operand, as long as it 
overloads opMatch.
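
For comparison, a rough equivalent of the case-insensitive match can be sketched with present-day Phobos. This assumes `std.regex` (not the `std.regexp` module or the `~~` operator discussed here); `matchesIgnoringCase` is a name invented for the example.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: true if `pattern` matches anywhere in `text`,
/// ignoring case.
bool matchesIgnoringCase(string text, string pattern)
{
    // The "i" flag requests case-insensitive matching.
    auto m = matchFirst(text, regex(pattern, "i"));
    return !m.empty;
}

void main()
{
    writeln(matchesIgnoringCase("A STRING here", "string")); // true
    writeln(matchesIgnoringCase("nothing", "xyz"));          // false
}
```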

 In retrospect, much of this should probably be handled via template usage 
 (for the different UTF types). And the converter issue can be resolved by 
 supporting some kind of assignable or plug-in module. All of this can be 
 handled by a templated class. I attempted to do just this with your RegExp 
 class, but ran into problems related to how patterns are stored in the 
 "instruction" stream (size differences between char and dchar, for 
 example).

I don't agree. The problem I ran into with this approach is the injection of 
the declaration _match into the current scope.

 I'm an advocate for potentially getting regex support into the grammar 
 but, on the face of it, your approach just doesn't appear to be considered 
 in a particularly thorough manner. There again, perhaps you've already 
 addressed the above issues, and the resolution is just not currently 
 visible?

I considered many ways of doing it, and have actually been thinking about it 
for months. This seemed to be the most practical. I hope I answered your 
questions about it.

 Perhaps this whole thing should wait until after we see what can be done 
 with the regex templates, so that there's some experience behind the 
 grammar? I mean, that would surely be better than having to remove the 
 above at some point in the future. What's the big rush with built-in regex 
 anyway? I really do think it should wait until we have some solid 
 experience with regex templates ~ don't you think it's rather likely we'll 
 learn something really useful that applies directly to a built-in grammar?

I don't think this takes away from the regex templates. I hope to use the 
regex templates in conjunction with this syntactic sugar to create optimized 
regex evaluation.

Feb 15 2006

Oskar Linde <olREM OVEnada.kth.se> writes:

Walter Bright wrote:

 "Kris" <fu bar.com> wrote in message
 news:dt0q7n$2cuo$1 digitaldaemon.com...

 In retrospect, much of this should probably be handled via template usage
 (for the different UTF types). And the converter issue can be resolved by
 supporting some kind of assignable or plug-in module. All of this can be
 handled by a templated class. I attempted to do just this with your
 RegExp class, but ran into problems related to how patterns are stored in
 the "instruction" stream (size differences between char and dchar, for
 example).

 
 I don't agree. The problem I ran into with this approach is the injection
 of the declaration _match into the current scope.

Have you considered making this more general? I.e. for all if statements,
inject a variable that takes the value of the entire condition expression.
(Using _result as a placeholder for such an identifier.)

if ("..." ~~ "...") {
  _result.match(0);
}

if (myFunc()) {
  _result.whatever();
}

Why should this behavior be reserved for opMatch() only? Isn't this a very
common coding pattern that could also become less verbose by this:

SomeType result;
if ( (result = getSomething())) {
        doSomethingWith(result);
}

(becoming:

if (getSomething()) {
        doSomethingWith(_result);
}

)
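
For reference, present-day D lets the result be named inside the condition itself, which gives a similar shortening without an injected identifier. A minimal sketch; `Config`, `getSomething` and `describe` are hypothetical names used only for illustration.

```d
import std.stdio : writeln;

class Config
{
    string name;
    this(string name) { this.name = name; }
}

/// Hypothetical lookup: returns null when nothing is found.
Config getSomething(bool found)
{
    return found ? new Config("default") : null;
}

string describe(bool found)
{
    // `result` is declared in the condition and is visible only
    // inside the then-branch.
    if (auto result = getSomething(found))
        return "got " ~ result.name;
    return "nothing";
}

void main()
{
    writeln(describe(true));  // got default
    writeln(describe(false)); // nothing
}
```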

One suggestion would be to call _result $. Giving $ the semantics of a
"scope injected value". This would go hand in hand with an earlier
suggestion of changing the $ for index operations too:

Assume [] introduces a new scope, then a $ within [] would refer to whatever
is being indexed.

char[] cutHeadAndTail = myString[1 .. $.length-1];
Image subImage = myImage[$.upperLeft .. $.middle];
char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];
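
For contrast with the proposal, here is a sketch of how the first and third lines are written with the existing meaning of `$` (the length of the array being indexed). It assumes present-day Phobos for `indexOf`; the function names are invented for the example.

```d
import std.stdio : writeln;
import std.string : indexOf;

/// Drops the first and last character; returns "" for very short input.
string cutHeadAndTail(string s)
{
    // Inside the brackets, $ is the length of the array being sliced.
    return s.length >= 2 ? s[1 .. $ - 1] : "";
}

/// Returns the text between the first `open` and the first `close`,
/// or "" if the delimiters are missing or out of order.
string between(string text, char open, char close)
{
    auto start = text.indexOf(open);
    auto end = text.indexOf(close);
    return (start >= 0 && end > start) ? text[start + 1 .. end] : "";
}

void main()
{
    writeln(cutHeadAndTail("[abc]"));            // abc
    writeln(between("x { body } y", '{', '}'));  // " body " without the quotes
}
```

Here the array has to be named twice (once for `indexOf`, once for the slice), which is the repetition the `$.indexOf` form would remove.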

/Oskar

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
news:dt1aif$2qpd$1 digitaldaemon.com...
 Have you considered making this more general? I.e. for all if statements,
 inject a variable that takes the value of the entire condition expression.
 (Using _result as a placeholder for such an identifier.)

 if ("..." ~~ "...") {
  _result.match(0);
 }

 if (myFunc()) {
  _result.whatever();
 }

 Why should this behavior be reserved for opMatch() only? Isn't this a very
 common coding pattern that could also become less verbose by this:

 SomeType result;
 if ( (result = getSomething())) {
        doSomethingWith(result);
 }

 (becoming:

 if (getSomething()) {
        doSomethingWith(_result);
 }

 )

 One suggestion would be to call _result $. Giving $ the semantics of a
 "scope injected value". This would go hand in hand with an earlier
 suggestion of changing the $ for index operations too:

 Assume [] introduces a new scope, then a $ within [] would refer to 
 whatever
 is being indexed.

 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

I never thought of that. It's an intriguing idea.

Feb 16 2006

pragma <pragma_member pathlink.com> writes:

In article <dt1eje$2uvu$2 digitaldaemon.com>, Walter Bright says...
"Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
news:dt1aif$2qpd$1 digitaldaemon.com...
 Have you considered making this more general? I.e. for all if statements,
 inject a variable that takes the value of the entire condition expression.
 (Using _result as a placeholder for such an identifier.)

 if ("..." ~~ "...") {
  _result.match(0);
 }

 if (myFunc()) {
  _result.whatever();
 }

 Why should this behavior be reserved for opMatch() only? Isn't this a very
 common coding pattern that could also become less verbose by this:

 SomeType result;
 if ( (result = getSomething())) {
        doSomethingWith(result);
 }

 (becoming:

 if (getSomething()) {
        doSomethingWith(_result);
 }

 )

 One suggestion would be to call _result $. Giving $ the semantics of a
 "scope injected value". This would go hand in hand with an earlier
 suggestion of changing the $ for index operations too:

 Assume [] introduces a new scope, then a $ within [] would refer to 
 whatever
 is being indexed.

 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

I never thought of that. It's an intriguing idea. 

Something along these lines would *most certainly* get my vote!

- Eric Anderton at yahoo

Feb 16 2006

kris <fu bar.org> writes:

pragma wrote:
 In article <dt1eje$2uvu$2 digitaldaemon.com>, Walter Bright says...
 
"Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
news:dt1aif$2qpd$1 digitaldaemon.com...

Have you considered making this more general? I.e. for all if statements,
inject a variable that takes the value of the entire condition expression.
(Using _result as a placeholder for such an identifier.)

if ("..." ~~ "...") {
 _result.match(0);
}

if (myFunc()) {
 _result.whatever();
}

Why should this behavior be reserved for opMatch() only? Isn't this a very
common coding pattern that could also become less verbose by this:

SomeType result;
if ( (result = getSomething())) {
       doSomethingWith(result);
}

(becoming:

if (getSomething()) {
       doSomethingWith(_result);
}

)

One suggestion would be to call _result $. Giving $ the semantics of a
"scope injected value". This would go hand in hand with an earlier
suggestion of changing the $ for index operations too:

Assume [] introduces a new scope, then a $ within [] would refer to 
whatever
is being indexed.

char[] cutHeadAndTail = myString[1 .. $.length-1];
Image subImage = myImage[$.upperLeft .. $.middle];
char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

I never thought of that. It's an intriguing idea. 

 
 
 Something along these lines would *most certainly* get my vote!
 
 - Eric Anderton at yahoo

Yes ~ mine too

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

kris wrote:
 pragma wrote:
 In article <dt1eje$2uvu$2 digitaldaemon.com>, Walter Bright says...

 "Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
 news:dt1aif$2qpd$1 digitaldaemon.com...
 One suggestion would be to call _result $. Giving $ the semantics of a
 "scope injected value". This would go hand in hand with an earlier
 suggestion of changing the $ for index operations too:

 Assume [] introduces a new scope, then a $ within [] would refer to 
 whatever
 is being indexed.

 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

 I never thought of that. It's an intriguing idea.


 Something along these lines would *most certainly* get my vote!

 
 Yes ~ mine too

Mine too.


Sean

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Sean Kelly wrote:
 kris wrote:
 pragma wrote:
 In article <dt1eje$2uvu$2 digitaldaemon.com>, Walter Bright says...

 "Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
 news:dt1aif$2qpd$1 digitaldaemon.com...
 One suggestion would be to call _result $. Giving $ the semantics of a
 "scope injected value". This would go hand in hand with an earlier
 suggestion of changing the $ for index operations too:

 Assume [] introduces a new scope, then a $ within [] would refer to 
 whatever
 is being indexed.

 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

 I never thought of that. It's an intriguing idea.


 Something along these lines would *most certainly* get my vote!

 Yes ~ mine too

 
 Mine too.

Hold on.  Walter, can you explain this injection business a bit?  For 
example, the effect here seems clear:

if( "x" ~~ "y" ) {
     _match.blah;
}

But what about this:

if( "x" ~~ "y" && "y" ~~ "z" ) {
     _match.blah;
}

And this:

if( "x" ~~ "y" || "y" ~~ "z" ) {
     _match.blah;
}

Does _match represent the result of the last match sub-expression 
evaluated?  And is there any way to know which expression succeeded? 
Does the fact that the injected value is a _Match* mean that I might 
potentially have an array of objects I could iterate through?  And 
finally, could you clarify the spec in this regard?

Also, with respect to the above proposal, how might this work:

int numStudents();
float avgGrade();

if( numStudents() < 10 || avgGrade() > 50.0 ) {

}

While the result of each subexpression is actually boolean (just as in 
the match expression above), the values we'd be interested in are the 
integer and float.  But in the above example, the float might not be 
evaluated at all.  I'd merely like to voice this as a qualifier to my 
initial support of this idea above :-)
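
The short-circuit concern can be shown with a small sketch. The return values and the `calls` counter are assumptions added for illustration; only the two function names come from the example above.

```d
import std.stdio : writeln;

int calls; // counts how often avgGrade() actually runs

int numStudents() { return 5; }

float avgGrade()
{
    ++calls;
    return 72.5f;
}

bool smallOrStrongClass()
{
    // || short-circuits: avgGrade() runs only if the left side is false.
    return numStudents() < 10 || avgGrade() > 50.0;
}

void main()
{
    writeln(smallOrStrongClass()); // true
    writeln(calls);                // 0
}
```

Since the right operand never ran, there is no float value that an injected variable could refer to.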


Sean

Feb 16 2006

Oskar Linde <olREM OVEnada.kth.se> writes:

Sean Kelly wrote:

 Hold on.  Walter, can you explain this injection business a bit?  For
 example, the effect here seems clear:
 
 if( "x" ~~ "y" ) {
      _match.blah;
 }
 
 But what about this:
 
 if( "x" ~~ "y" && "y" ~~ "z" ) {
      _match.blah;
 }
 
 And this:
 
 if( "x" ~~ "y" || "y" ~~ "z" ) {
      _match.blah;
 }

Those are AndAndExpression and OrOrExpression and will not inject anything.
Only a pure if(MatchExpression) injects anything.

 Also, with respect to the above proposal, how might this work:
 
 int numStudents();
 float avgGrade();
 
 if( numStudents() < 10 || avgGrade() > 50.0 ) {
 
 }

In this case, $ would always refer to the value of (numStudents() < 10 ||
avgGrade() > 50.0), which is bool and must always be true. (It would be
interesting to change the || expression into returning the left value if it
is nonzero and the right value otherwise, without converting anything to
bool, but I'm not fully sure what implications that would have...)

 While the result of each subexpression is actually boolean (just as in
 the match expression above), the values we'd be interested in are the
 integer and float.  But in the above example, the float might not be
 evaluated at all.  I'd merely like to voice this as a qualifier to my
 initial support of this idea above :-)

This is probably impossible. How would the compiler know what subexpressions
are interesting and how would those be referred to?

/Oskar

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Oskar Linde wrote:
 Sean Kelly wrote:
 
 Hold on.  Walter, can you explain this injection business a bit?  For
 example, the effect here seems clear:

 if( "x" ~~ "y" ) {
      _match.blah;
 }

 But what about this:

 if( "x" ~~ "y" && "y" ~~ "z" ) {
      _match.blah;
 }

 And this:

 if( "x" ~~ "y" || "y" ~~ "z" ) {
      _match.blah;
 }

 
 Those are AndAndExpression and OrOrExpression and will not inject anything.
 Only a pure if(MatchExpression) injects anything.

Very weird.  So a MatchExpression by itself has a boolean result but 
injects a value into the following scope?

 Also, with respect to the above proposal, how might this work:

 int numStudents();
 float avgGrade();

 if( numStudents() < 10 || avgGrade() > 50.0 ) {

 }

 
 In this case, $ would always refer to the value of (numStudents() < 10 ||
 avgGrade() > 50.0), which is bool and must always be true. (It would be
 interesting to change the || expression into returning the left value if it
 is nonzero and the right value otherwise, without converting anything to
 bool, but I'm not fully sure what implications that would have...)

So based on the above, your suggestion would only be useful for single 
call expressions:

if( numStudents() )
     printf( "%i students\n", $.whatever );

Seems reasonable I suppose.

 While the result of each subexpression is actually boolean (just as in
 the match expression above), the values we'd be interested in are the
 integer and float.  But in the above example, the float might not be
 evaluated at all.  I'd merely like to voice this as a qualifier to my
 initial support of this idea above :-)

 
 This is probably impossible. How would the compiler know what subexpressions
 are interesting and how would those be referred to?

That's fine.  I was merely trying to sort out the implications of this 
new feature.


Sean

Feb 16 2006

Oskar Linde <olREM OVEnada.kth.se> writes:

Sean Kelly wrote:

 Oskar Linde wrote:
 
 Those are AndAndExpression and OrOrExpression and will not inject
 anything. Only a pure if(MatchExpression) injects anything.

 
 Very weird.  So a MatchExpression by itself has a boolean result but
 injects a value into the following scope?

No, not boolean. A MatchExpression has a _Match* result. This result is what
gets injected into the following scope. My suggestion is just a
generalization of this.

 So based on the above, your suggestion would only be useful for single
 call expressions:
 
 if( numStudents() )
      printf( "%i students\n", $.whatever );
 

Yes.

/Oskar

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Oskar Linde wrote:
 Sean Kelly wrote:
 
 Oskar Linde wrote:
 Those are AndAndExpression and OrOrExpression and will not inject
 anything. Only a pure if(MatchExpression) injects anything.

 Very weird.  So a MatchExpression by itself has a boolean result but
 injects a value into the following scope?

 
 No, not boolean. A MatchExpression has a _Match* result. This result is what
 gets injected into the following scope. My suggestion is just a
 generalization of this.

Oh right.  And pointers can be implicitly evaluated as logical 
expressions.  Makes sense now.
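
The same pointer-as-condition behaviour shows up with associative arrays. A minimal sketch, assuming present-day D; `lookup` and the sample data are hypothetical.

```d
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical helper: returns the grade as text, or "absent".
string lookup(int[string] grades, string name)
{
    // `in` yields a pointer to the stored value, or null if the key is
    // missing; a non-null pointer counts as true in the condition.
    if (auto p = name in grades)
        return to!string(*p);
    return "absent";
}

void main()
{
    int[string] grades = ["ann": 90];
    writeln(lookup(grades, "ann")); // 90
    writeln(lookup(grades, "bob")); // absent
}
```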


Sean

Feb 16 2006

Julio César Carrascal Urquijo writes:

Oskar Linde wrote:
 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

This is a great idea. I like it.

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Julio César Carrascal Urquijo" <jcesar phreaker.net> wrote in message 
news:dt28a3$o2q$1 digitaldaemon.com...
 Oskar Linde wrote:
 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

 This is a great idea. I like it.

There is one problem with it: every time an IfStatement is added to existing 
code, it will break all uses of $ in the ThenStatement:

----- before --------
if (foo())
    $.bar = 3;
------ after ---------
if (foo())
{
     if (abc())
        $.bar = 3;    // uh-oh!
}
----------------------

This is of course a trivial example, but consider if the $ appeared in a 
large block of code.

Feb 16 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Walter Bright wrote:
 "Julio César Carrascal Urquijo" <jcesar phreaker.net> wrote in message 
 news:dt28a3$o2q$1 digitaldaemon.com...
 Oskar Linde wrote:
 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

 This is a great idea. I like it.

 
 There is one problem with it: every time an IfStatement is added to existing 
 code, it will break all uses of $ in the ThenStatement:
 
 ----- before --------
 if (foo())
     $.bar = 3;
 ------ after ---------
 if (foo())
 {
      if (abc())
         $.bar = 3;    // uh-oh!
 }
 ----------------------
 
 This is of course a trivial example, but consider if the $ appeared in a 
 large block of code. 
 

Coding guidelines would probably say that $ should be assigned to a 
named variable for all but the simplest blocks:
if (foo()) {
	auto myvar = $;
	...
}
The $ would be kind of elusive and only usable in its outermost scope. 
But the MatchExpression injected _match has the same problem. Consider 
the following hypothetical refactoring example:

const char[] two_argument_function_call = 
r"([_a-zA-Z][_0-9a-zA-Z]*)\(([^,\(\)]+),([^,\(\)]+)\)";

// Find function-calls
if (two_argument_function_call ~~ str) {
	// Swap the order of arguments for functions named array_*
	if ("array_(.+)" ~~ _match.match(1)) {
		// Need access to results from outer _match.
	}
	...
}

And here is something the current MatchExpression behavior suffers from 
that a general scope variable would not:

if (a ~~ b) {
	if (c == d && e ~~ f) {
		do_something(_match.match(0)); // (*)
	}
}				

*) here e ~~ f is not injecting its result and _match refers to the 
result of a ~~ b

The apparent innocent change of removing the condition c == d from the 
if-statement will suddenly and silently have a side effect of injecting 
a shadowing _match variable and thus alter the argument to do_something().

Maybe this is a good time to consider Ben Hinkle's suggested 
declare-and-init operator := as a non-verbose way of naming 
sub-expressions.
http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
(Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
thread)

if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
}
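
With explicitly named results, the refactoring example above keeps both matches reachable. A sketch, assuming present-day Phobos `std.regex` and `if (auto ...)` in place of `~~` and `:=`; `swapArrayCallArgs` is a name invented for the example.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: for a two-argument call to a function named
/// array_*, returns the call with its arguments swapped; otherwise null.
string swapArrayCallArgs(string str)
{
    auto call = regex(r"([_a-zA-Z][_0-9a-zA-Z]*)\(([^,\(\)]+),([^,\(\)]+)\)");

    if (auto callMatch = matchFirst(str, call))
    {
        // The inner match has its own name, so it cannot shadow the
        // outer one.
        if (auto nameMatch = matchFirst(callMatch[1], regex(r"array_(.+)")))
        {
            return callMatch[1] ~ "(" ~ callMatch[3] ~ "," ~ callMatch[2] ~ ")";
        }
    }
    return null;
}

void main()
{
    writeln(swapArrayCallArgs("array_copy(a,b)")); // array_copy(b,a)
    writeln(swapArrayCallArgs("foo(a,b)") is null); // true
}
```

Adding or removing an unrelated condition in either `if` does not change which result a name refers to.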

/Oskar

Feb 17 2006

pragma <pragma_member pathlink.com> writes:

In article <dt4nqs$2erg$1 digitaldaemon.com>, Oskar Linde says...
Maybe this is a good time to consider Ben Hinkle's suggested 
declare-and-init operator := as a non-verbose way of naming 
sub-expressions.
http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
(Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
thread)

if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
}

Sort of like an auto 'auto' declaration?  I gather that the point is that the
lvalue to the := expression is transparent to the context in which it is used
(kind of inlining a variable creation and assignment)?

Also, how about using $.outer instead?

Link for "SSO" thread (with syntax examples at bottom of post):
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/33645


- Eric Anderton at yahoo

Feb 17 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

pragma wrote:
 In article <dt4nqs$2erg$1 digitaldaemon.com>, Oskar Linde says...
 Maybe this is a good time to consider Ben Hinkle's suggested 
 declare-and-init operator := as a non-verbose way of naming 
 sub-expressions.
 http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
 (Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
 thread)

 if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
 }

 
 Sort of like an auto 'auto' declaration?  I gather that the point is that the
 lvalue to the := expression is transparent to the context in which it is used
 (kind of inlining a variable creation and assignment)?

Yes, exactly so. The scope of such variables declared in the operand of 
for example if-statements should probably be similar to the scope of 
variables declared in the init-part of a for-declaration.

 Also, how about using $.outer instead?

$.outer could collide with a member identifier. Maybe using the keyword 
super somehow... or append another $, like $$ for outer, $$$ for 
outer(outer). I don't think it's very necessary when you can do
auto outer = $; before starting the inner scope.
/Oskar

Feb 17 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
news:dt4nqs$2erg$1 digitaldaemon.com...
 The apparent innocent change of removing the condition c == d from the 
 if-statement will suddenly and silently have a side effect of injecting a 
 shadowing _match variable and thus alter the argument to do_something().

Yes, that's a problem.

 Maybe this is a good time to consider Ben Hinkle's suggested 
 declare-and-init operator := as a non-verbose way of naming 
 sub-expressions.
 http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
 (Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
 thread)

 if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
 }

It's a workable proposal. But it overlaps the functionality of 'auto' 
declarations a bit much. And:

    if (auto m = a ~~ b)

might be a little wordy? Perhaps:

    if (m; a ~~ b)

sort of along the lines of foreach?

Feb 17 2006

Fredrik Olsson <peylow gmail.com> writes:

Walter Bright skrev:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dt4nqs$2erg$1 digitaldaemon.com...
 
The apparent innocent change of removing the condition c == d from the 
if-statement will suddenly and silently have a side effect of injecting a 
shadowing _match variable and thus alter the argument to do_something().

 
 
 Yes, that's a problem.
 
 
Maybe this is a good time to consider Ben Hinkle's suggested 
declare-and-init operator := as a non-verbose way of naming 
sub-expressions.
http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
(Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
thread)

if (m := a ~~ b) {
  ...
  if (n := c ~~ m.match(0)) {
  ...
  }
}

 
 
 It's a workable proposal. But it overlaps the functionality of 'auto' 
 declarations a bit much. And:
 
     if (auto m = a ~~ b)
 
 might be a little wordy? Perhaps:
 
     if (m; a ~~ b)
 
 sort of along the lines of foreach? 
 
 

Yes! This one I like.

I have shuddered a lot while reading this thread; I do not like too much 
magic happening in my code. This one is neat and simple, consistent with 
existing syntax. And most importantly, it makes it quite hard to write 
incorrect code.

// Fredrik Olsson

Feb 17 2006

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dt4nqs$2erg$1 digitaldaemon.com...
 The apparent innocent change of removing the condition c == d from the 
 if-statement will suddenly and silently have a side effect of injecting a 
 shadowing _match variable and thus alter the argument to do_something().

 
 Yes, that's a problem.
 
 Maybe this is a good time to consider Ben Hinkle's suggested 
 declare-and-init operator := as a non-verbose way of naming 
 sub-expressions.
 http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
 (Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
 thread)

 if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
 }

 
 It's a workable proposal. But it overlaps the functionality of 'auto' 
 declarations a bit much. And:
 
     if (auto m = a ~~ b)
 
 might be a little wordy? Perhaps:
 
     if (m; a ~~ b)
 
 sort of along the lines of foreach?

I like it.  Assuming this were implemented, would it affect all 
conditional expressions except foreach?


Sean

Feb 17 2006

Sai <Sai_member pathlink.com> writes:

     if (auto m = a ~~ b)
 
 might be a little wordy? Perhaps:
 
     if (m; a ~~ b)
 

I personally like the former, it does not need special 'if' syntax.

Feb 17 2006

Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:

Walter Bright wrote:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dt4nqs$2erg$1 digitaldaemon.com...
 
The apparent innocent change of removing the condition c == d from the 
if-statement will suddenly and silently have a side effect of injecting a 
shadowing _match variable and thus alter the argument to do_something().

 
 
 Yes, that's a problem.
 
 
Maybe this is a good time to consider Ben Hinkle's suggested 
declare-and-init operator := as a non-verbose way of naming 
sub-expressions.
http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
(Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
thread)

if (m := a ~~ b) {
  ...
  if (n := c ~~ m.match(0)) {
  ...
  }
}

 
 
 It's a workable proposal. But it overlaps the functionality of 'auto' 
 declarations a bit much. And:
 
     if (auto m = a ~~ b)
 
 might be a little wordy? Perhaps:
 
     if (m; a ~~ b)
 
 sort of along the lines of foreach? 
 

How would this scale to something like

if((a ~~ b) && (c ~~ d))

would it be:

if( m; a~~b && n; c~~d) ?

This looks confusing to me. Wouldn't ':' look better here:

if( m: a~~b && n: c~~d) ?

But I think I like Ben's declare-and-init := operator best in this case.

Feb 17 2006

Georg Wrede <georg.wrede nospam.org> writes:

Walter Bright wrote:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dt4nqs$2erg$1 digitaldaemon.com...
 
 The apparent innocent change of removing the condition c == d from
 the if-statement will suddenly and silently have a side effect of
 injecting a shadowing _match variable and thus alter the argument
 to do_something().

 
 
 Yes, that's a problem.
 
 
 Maybe this is a good time to consider Ben Hinkle's suggested 
 declare-and-init operator := as a non-verbose way of naming 
 sub-expressions. 
 http://www.digitalmars.com/d/archives/digitalmars/D/28198.html 
 (Also similar to Serg Kovrovs suggestion on the Semantic Scope
 Operator thread)
 
 if (m := a ~~ b) { ... if (n := c ~~ m.match(0)) { ... } }

 
 
 It's a workable proposal. But it overlaps the functionality of 'auto'
  declarations a bit much. And:
 
 if (auto m = a ~~ b)
 
 might be a little wordy? Perhaps:
 
 if (m; a ~~ b)
 
 sort of along the lines of foreach?

I'm uneasy with this. We're playing with fundamental constructs here.

if( ; )

is something so pivotal, that we should give this careful thought.

If it took us 4 years of hard work to get rid of bit, what will happen 
when this gets rushed into the language without due diligence?

Feb 17 2006

"Kris" <fu bar.com> writes:

"Georg Wrede" <georg.wrede nospam.org> wrote
[snip]
 if (auto m = a ~~ b)

 might be a little wordy? Perhaps:

 if (m; a ~~ b)

 sort of along the lines of foreach?

 I'm uneasy with this. We're playing with fundamental constructs here.

 if( ; )

 is something so pivotal, that we should give this careful thought.

 If it took us 4 years of hard work to get rid of bit, what will happen 
 when this gets rushed into the language without due diligence?

I'm all for getting some kind of regex sugar in the grammar, but also feel a 
bit alarmed about the sudden rush to 'slam' all this into the language. 
Seems like it would be wiser to approach this whole thing in smaller steps: 
let's see how foreach() goes first?

Feb 17 2006

Sean Kelly <sean f4.ca> writes:

Kris wrote:
 
 I'm all for getting some kind of regex sugar in the grammar, but also feel a 
 bit alarmed about the sudden rush to 'slam' all this into the language. 
 Seems like it would be wiser to approach this whole thing in smaller steps: 
 let's see how foreach() goes first? 

As long as these new features don't break old code, I'm fine with Walter 
trying things out.  After all, the best way to solicit input is often to 
give people something to play with.  But it would be nice if there were 
a way to have these features flagged as "experimental."


Sean

Feb 17 2006

"Kris" <fu bar.com> writes:

"Sean Kelly" <sean f4.ca> wrote...
 Kris wrote:
 I'm all for getting some kind of regex sugar in the grammar, but also 
 feel a bit alarmed about the sudden rush to 'slam' all this into the 
 language. Seems like it would be wiser to approach this whole thing in 
 smaller steps: let's see how foreach() goes first?

 As long as these new features don't break old code, I'm fine with Walter 
 trying things out.  After all, the best way to solicit input is often to 
 give people something to play with.  But it would be nice if there were a 
 way to have these features flagged as "experimental."

That would be cool.

Feb 17 2006

Sean Kelly <sean f4.ca> writes:

Georg Wrede wrote:
 
 I'm uneasy with this. We're playing with fundamental constructs here.
 
 if( ; )
 
 is something so pivotal, that we should give this careful thought.
 
 If it took us 4 years of hard work to get rid of bit, what will happen 
 when this gets rushed into the language without due diligence?

True enough.  However, the above syntax is currently illegal, so there's 
no chance of something breaking, and C/C++ already allow declarations in 
if blocks via the traditional method:

if( int x = foo() ) {}

One of Walter's other suggestions was to use this syntax, with the 
qualification that it was a bit verbose.

One thing I like about the proposed syntax is that it's already how 
foreach works, so the semantic meaning is mostly just being extended to 
if and while blocks.  The 'for' syntax doesn't match this however, which 
may be one argument in favor of the more traditional 'auto' method.

Personally, my primary interest is that the syntax be both consistent 
and obvious.  Both of the above work for me, but I favor "if( x; foo() 
)" if implicit type determination is mandatory.  If it's not, I'm 
ambivalent.
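
For reference, here is a minimal sketch of the declaration-in-condition form being discussed. It assumes the std.regex module of present-day Phobos (matchFirst, regex), not the RegExp class or the proposed ~~ operator from this thread, and wavStem is a hypothetical helper named only for illustration.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Returns the part of `filename` before a trailing ".wav", or null if
// the name does not end in ".wav".
string wavStem(string filename)
{
    // `m` is declared inside the condition and is visible only in the
    // body of the if statement; an empty match result tests as false.
    if (auto m = matchFirst(filename, regex(r"^(.*)\.wav$")))
    {
        return m[1]; // first capture group
    }
    return null;
}

void main()
{
    writeln(wavStem("drum.wav"));
}
```

The match object gets a name chosen by the programmer and a scope limited to the if body, which is the property both the "auto" form and the "if (m; ...)" form are after.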


Sean

Feb 17 2006

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

Oskar Linde wrote:
 Coding guidelines would probably say that $ should be assigned to a
 named variable for all but the simplest blocks:
 if (foo()) {
     auto myvar = $;
     ...
 }
 The $ would be kind of elusive and only usable in its outermost scope.

That would sort of make the whole token pointless IMO - easier just to do
something like:

if ((myvar = foo()) != 0)

or whatever, I'm not sure exactly how the syntax currently works for this.

Feb 17 2006

kris <fu bar.org> writes:

Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message news:dt0q7n$2cuo$1 digitaldaemon.com...
 
There seem to be multiple issues here. The first one, which you ask about, 
is related to the syntax. At first blush, the ~~ looks like an approximate 
approximation, and then making D look like a malformed Perl is surely a 
mistake.

 
 
 If you've got a better idea for tokens ~~ and !~ ?

Well, there's always "in" ...

if (".wav$" in filename)
     ...

plus the !in variation. Don't you find that somewhat more appealing?



What the heck is wrong with $match.pre, $match.post, $match.index(n) 
instead? At least they're readable :-)

 
 
 Nothing, really. But are they more readable than _match.pre, etc.?

I believe the shortened versions ($pre, $post, $group[n] etc) are much 
more readable. This type of thing is why some of us were so adamant 
about saving the $ sign as a prefix for meta-tags, vis-a-vis $time, 
$file, $line and, of course, $length


Additionally, I thought '~' was used for concatenation?

 
 
 It is.
 
 
Because '+' is overloaded in other languages? Isn't that just exactly what 
you're now doing with '~' ?

 
 
 '=' and '==' mean entirely different things. So does / and /*. I don't think 
 ~~ need have anything to do with complement or concatenation.

The first two are at least related. But the argument is flawed: choosing 
arbitrary symbols for operators does not make the language easier to 
grasp. At least "in" has some relevant meaning to it.


Then, you say this is applicable only to char[]. What about wchar[] and 
dchar[]? Are they now relegated to second-class citizens? It's no use 
converting those arrays into char[] on the fly ~ apart from the heap 
activity and conversion that would ensue (for both operands; one of which 
could be rather substantial), $match.pre and friends would also have to do 
conversions back into the original format. Ugghh.

 
 
 That is a problem, one that would get solved when RegExp can do wchar and 
 dchar. That isn't a technical problem, it's more of a getting around to it 
 problem.

Well, since grammar supported regex has elevated itself to the top of 
the priority list, perhaps wchar/dchar support might tag along with it?


Yet another issue is with respect to case-folding (which is often used 
with regex expressions). You see, unicode case-folding does not follow the 
trivial rules of ASCII ~ you can't just call tolower() and hope for the 
best. Thus, there needs to be some mechanism to support alternate, more 
appropriate, converters.

 
 
 I agree that case is an issue. That's why this also works:
 
     if (RegExp("string", "i") ~~ "string") ...
 
 and can work with any class type as the left operand, as long as it 
 overloads opMatch.

That's a good solution. Do you have a unicode 'folder' ?


In retrospect, much of this should probably be handled via template usage 
(for the different UTF types). And the converter issue can be resolved by 
supporting some kind of assignable or plug-in module. All of this can be 
handled by a templated class. I attempted to do just this with your RegExp 
class, but ran into problems related to how patterns are stored in the 
"instruction" stream (size differences between char and dchar, for 
example).

 
 
 I don't agree. The problem I ran into with this approach is the injection of 
 the declaration _match into the current scope.

I don't understand the relevance of that, Walter. What does _match have 
to do with the need to support utf8,utf16 and utf32?


I'm an advocate for potentially getting regex support into the grammar 
but, on the face of it, your approach just doesn't appear to be considered 
in a particularly thorough manner. There again, perhaps you've already 
addressed the above issues, and the resolution is just not currently 
visible?

 
 
 I considered many ways of doing it, and have actually been thinking about it 
 for months. This seemed to be the most practical. I hope I answered your 
 questions about it.

No, but the opMatch() is a good solution for that aspect.



Perhaps this whole thing should wait until after we see what can be done 
with the regex templates, so that there's some experience behind the 
grammar? I mean, that would surely be better than having to remove the 
above at some point in the future. What's the big rush with built-in regex 
anyway? I really do think it should wait until we have some solid 
experience with regex templates ~ don't you think it's rather likely we'll 
learn something really useful that applies directly to a built-in grammar?

 
 
 I don't think this takes away from the regex templates. I hope to use the 
 regex templates in conjunction with this syntactic sugar to create optimized 
 regex evaluation. 

Perhaps, but I really don't see the need for this sudden rush to get 
regex support into the grammar. Experience with regex templates is 
almost certain to uncover some conflict in this regard ~ one that will 
likely have to be compromised to fit in with the current syntax. That's 
just Murphy's law. What's the big hurry?

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"kris" <fu bar.org> wrote in message news:dt1cm1$2t76$1 digitaldaemon.com...
 Walter Bright wrote:
 Well, there's always "in" ...

 if (".wav$" in filename)
     ...
 plus the !in variation. Don't you find that somewhat more appealing?

Not really. I think it also conflicts with 'in' already.

What the heck is wrong with $match.pre, $match.post, $match.index(n) 
instead? At least they're readable :-)

 Nothing, really. But are they more readable than _match.pre, etc.?

 I believe the shortened versions ($pre, $post, $group[n] etc) are much 
 more readable. This type of thing is why some of us were so adamant about 
 saving the $ sign as a prefix for meta-tags, vis-a-vis $time, $file, $line 
 and, of course, $length

Fair enough. Let's see what others think.

Additionally, I thought '~' was used for concatenation?

 It is.
Because '+' is overloaded in other languages? Isn't that just exactly 
what you're now doing with '~' ?

 '=' and '==' mean entirely different things. So does / and /*. I don't 
 think ~~ need have anything to do with complement or concatenation.

 The first two are at least related. But the argument is flawed: choosing 
 arbitrary symbols for operators does not make the language easier to 
 grasp.

It's all a matter of what you're used to. Who'd have thought that '!' for 
'not' would feel natural? It was a kludge invented for C. Now it's standard.

 At least "in" has some relevant meaning to it.

It would be overloading its existing meaning, which means that it'll take 
semantic, rather than syntactic, analysis to disambiguate. This is potential 
trouble.

 That is a problem, one that would get solved when RegExp can do wchar and 
 dchar. That isn't a technical problem, it's more of a getting around to 
 it problem.

 Well, since grammar supported regex has elevated itself to the top of the 
 priority list, perhaps wchar/dchar support might tag along with it?

The thing is, RegExp has been in there from the beginning, but it has gone 
unused and even its existence is overlooked. I don't believe that's because 
it isn't useful - look at Ruby, Perl, Javascript, etc. Those languages 
heavily use regex. Is there something inherent about *script* languages 
that make them nice for regex? I don't believe there is, I think it gets 
heavily used in those languages because the syntactic sugar makes it easy 
to use.

I've been blasted for putting strings in the language (instead of as a 
library String class), for putting complex numbers in, and for associative 
arrays. I think the results speak for these being a success. If regex's are 
heavily used, then the extra sugar for them becomes worthwhile as well.

Who uses regex in C++? Hardly anyone. I'm betting it's because using them 
sucks in C++, not because people don't use regex's.


 I agree that case is an issue. That's why this also works:

     if (RegExp("string", "i") ~~ "string") ...

 and can work with any class type as the left operand, as long as it 
 overloads opMatch.

 That's a good solution. Do you have a unicode 'folder' ?

No. But that's a library issue, not a language issue. Match expressions are 
set up so that one can completely control their behavior with a custom 
class.

In retrospect, much of this should probably be handled via template usage 
(for the different UTF types). And the converter issue can be resolved by 
supporting some kind of assignable or plug-in module. All of this can be 
handled by a templated class. I attempted to do just this with your 
RegExp class, but ran into problems related to how patterns are stored in 
the "instruction" stream (size differences between char and dchar, for 
example).

 I don't agree. The problem I ran into with this approach is the injection 
 of the declaration _match into the current scope.

 I don't understand the relevance of that, Walter. What does _match have to 
 do with the need to support utf8,utf16 and utf32?

Nothing. But _match *does* have a lot to do with the inadequacy of a pure 
template solution. Not even mixins will work in a nice way here.

 I don't think this takes away from the regex templates. I hope to use the 
 regex templates in conjunction with this syntactic sugar to create 
 optimized regex evaluation.

 Perhaps, but I really don't see the need for this sudden rush to get regex 
 support into the grammar. Experience with regex templates is almost 
 certain to uncover some conflict in this regard ~ one that will likely 
 have to be compromised to fit in with the current syntax. That's just 
 Murphy's law. What's the big hurry?

I thought it fit in well with D's new capability of being runnable in a 
script-like fashion. If this opens up a reasonably broad new range of 
applications that D is a good fit for, that's good. I might be wrong, of 
course, as I've been with the bit data type (a complete botch). Match 
expressions don't break anything, were not expensive to implement, and the 
only way to see how they'll work out is to try them.

Feb 16 2006

kris <fu bar.org> writes:

Walter Bright wrote:
 "kris" <fu bar.org> wrote in message news:dt1cm1$2t76$1 digitaldaemon.com...
 
Walter Bright wrote:
Well, there's always "in" ...

if (".wav$" in filename)
    ...
plus the !in variation. Don't you find that somewhat more appealing?

 
 
 Not really. I think it also conflicts with 'in' already.

But not from the user's standpoint.


 It's all a matter of what you're used to. Who'd have thought that '!' for 
 'not' would feel natural? It was a kludge invented for C. Now it's standard.

That doesn't mean D should adopt arbitrary symbols, Walter. If you want 
rapid adoption, then the more you can do to make the language 
"approachable", the more success you'll have. There was a similar issue 
with === and !==, and you thankfully deprecated them :-)


At least "in" has some relevant meaning to it.

 
 
 It would be overloading its existing meaning, which means that it'll take 
 semantic, rather than syntactic, analysis to disambiguate. This is potential 
 trouble.

I can see that there "might" be trouble for the compiler and, if so, 
that would be an issue. However, for a developer, the meaning of "in" 
with respect to its use with AA and potentially regex-patterns is 
consistent. One is asking the question "does this thing on the left 
exist within the thing on the right". It even takes care of getting the 
operand ordering correct. Thus, I'd urge you to at least see if there's 
actually a notable problem for the compiler to handle this before 
writing the idea off.
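
As a point of reference for the existing meaning of "in", here is a small sketch of how it behaves with associative arrays in present-day D. The names lookupSize and sizes are hypothetical, chosen only for illustration.

```d
import std.stdio : writeln;

// Looks up `name` in the table and returns its size, or -1 when absent.
int lookupSize(int[string] sizes, string name)
{
    // For associative arrays, `key in aa` yields a pointer to the stored
    // value, or null when the key is not present.
    if (auto p = name in sizes)
    {
        return *p;
    }
    return -1;
}

void main()
{
    int[string] sizes = ["intro.wav": 1024];
    writeln(lookupSize(sizes, "intro.wav"));
    writeln(lookupSize(sizes, "outro.wav"));
}
```

Here the left operand is a key and the right operand is a container, so a pattern-matching "in" would have to be told apart from this case by the operand types.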


 The thing is, RegExp has been in there from the beginning, but it has gone 
 unused and even its existence is overlooked. I don't believe that's because 
 it isn't useful - look at Ruby, Perl, Javascript, etc. Those languages 
 heavily use regex. Is there something inherent about *script* languages 
 that make them nice for regex? I don't believe there is, I think it gets 
 heavily used in those languages because the syntactic sugar makes it easy 
 to use.

Heck, I've used regex in all manner of ways. I don't think visibility is 
the problem; rather, I suspect there's a limited set of domains where it 
applies in a systems language. Some of those can be addressed in 
other ways, particularly where performance is a concern; hence regex may 
not get used as much as it might. In scripting languages there's often a 
need for Q & D pattern-matching, with little regard for a potentially 
more efficient mechanism. Horses for courses.


 I've been blasted for putting strings in the language (instead of as a 
 library String class), for putting complex numbers in, and for associative 
 arrays. I think the results speak for these being a success. If regex's are 
 heavily used, then the extra sugar for them becomes worthwhile as well.

That's getting a bit off topic, isn't it? OK, I'll go with it:

I'm an advocate for getting regex support in the grammar, but I'm 
certainly not an advocate for tying Phobos to the compiler (RegExp has a 
notable resultant import set; because of this I refactored it for Ares 
and Mango).

Without a clearly defined means to decouple Phobos from the compiler, 
you're effectively erecting barriers for other solutions to clamber over 
(as Sean vaguely intimated earlier). What's missing from all this 
built-in stuff is a clean and documented means to have it supported 
outside of Phobos. After all, the compiler is injecting explicit 
references for AA code, utf conversion code, regex code, and a variety 
of other things. What's next?

In short: you're (a) building more and more library functionality 
directly into the language without providing a means to cleanly support 
alternate implementations, extensions, or otherwise decouple the 
compiler. And (b) by doing so, you're (perhaps inadvertently) stifling 
some innovation and causing some headaches for the very people who are 
trying to help D along the road to acceptance. It would really help if 
you'd be somewhat sensitive to these aspects rather than persistently 
ignoring them.

For instance, how does one change .sort to use a different sorting 
algorithm? How does one change the hashing function for non-classes? How 
can one unhook RegExp+OutBuffer+String+Others, and replace it? etc. etc. 
If D is intended to be a closed-shop, Phobos-only environment, then some 
of us are presumably wasting our time supporting the language; right?

I don't suppose that was the answer you were looking for <g>


 Who uses regex in C++? Hardly anyone. I'm betting it's because using them 
 sucks in C++, not because people don't use regex's.

Again, it's horses for courses. BTW, regex does not suck in C, so why C++ ?

 I thought it fit in well with D's new capability of being runnable in a 
 script-like fashion. If this opens up a reasonably broad new range of 
 applications that D is a good fit for, that's good. I might be wrong, of 
 course, as I've been with the bit data type (a complete botch). Match 
 expressions don't break anything, were not expensive to implement, and the 
 only way to see how they'll work out is to try them. 

I figured that was the motivation. The "cost" you speak of considers 
only how much effort it takes you to get the functionality into the 
compiler, test it a bit, document the usage, and respond to the flak ;-)

BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and 
provide a nice library class/struct instead? You might even reuse the 
old code from Zortech/Zorland days.

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

kris wrote:
 Walter Bright wrote:
 
 I've been blasted for putting strings in the language (instead of as a 
 library String class), for putting complex numbers in, and for 
 associative arrays. I think the results speak for these being a 
 success. If regex's are heavily used, then the extra sugar for them 
 becomes worthwhile as well.

 
 That's getting a bit off topic, isn't it? OK, I'll go with it:
 
 I'm an advocate for getting regex support in the grammar, but I'm 
 certainly not an advocate for tying Phobos to the compiler (RegExp has a 
 notable resultant import set; because of this I refactored it for Ares 
 and Mango).
 
 Without a clearly defined means to decouple Phobos from the compiler, 
 you're effectively erecting barriers for other solutions to clamber over 
 (as Sean vaguely intimated earlier). What's missing from all this 
 built-in stuff is a clean and documented means to have it supported 
 outside of Phobos. After all, the compiler is injecting explicit 
 references for AA code, utf conversion code, regex code, and a variety 
 of other things. What's next?

I'm branching Ares before I check in this last block of changes.  In the 
new branch I'm simply going to move all necessary Phobos std code 
required into dmdrt/util and will plan to trim it down over time.  Not 
ideal, I know, but better than trying to play catch-up with heavily 
modified code such as the version of RegExp you provided.  For the rest, 
I agree completely, but then I've already said as much in d.D.announce :-)

 Who uses regex in C++? Hardly anyone. I'm betting it's because using 
 them sucks in C++, not because people don't use regex's.

 
 Again, it's horses for courses. BTW, regex does not suck in C, so why C++ ?

The lack of a standard library component is a significant factor IMO, 
as are the widely divergent syntaxes supported by third party libraries. 
Personally, I haven't used regular expressions in D because I haven't 
needed to yet, not because they weren't a language feature.  But I can't 
help liking this being built-in from a language perspective, even if 
this is balanced by practical concerns.

 I thought it fit in well with D's new capability of being runnable in 
 a script-like fashion. If this opens up a reasonably broad new range 
 of applications that D is a good fit for, that's good. I might be 
 wrong, of course, as I've been with the bit data type (a complete 
 botch). Match expressions don't break anything, were not expensive to 
 implement, and the only way to see how they'll work out is to try them. 

 
 I figured that was the motivation. The "cost" you speak of considers 
 only how much effort it takes you to get the functionality into the 
 compiler, test it a bit, document the usage, and respond to the flak ;-)

 BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and 
 provide a nice library class/struct instead? You might even reuse the 
 old code from Zortech/Zorland days.

If it helps, I'll send you a case of beer or something ;-)  But if 
there's universal agreement that packed bit arrays were a mistake then 
they need to be out pre-1.0 and broken code be damned.  I really don't 
want to see a 1.0 D release containing features that even the designer 
thinks should not exist.


Sean

Feb 16 2006

Thomas Kuehne <thomas-dloop kuehne.cn> writes:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sean Kelly schrieb am 2006-02-16:
 kris wrote:

[snip]

 BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and 
 provide a nice library class/struct instead? You might even reuse the 
 old code from Zortech/Zorland days.

 If it helps, I'll send you a case of beer or something ;-)  But if 
 there's universal agreement that packed bit arrays were a mistake then 
 they need to be out pre-1.0 and broken code be damned.  I really don't 
 want to see a 1.0 D release containing features that even the designer 
 thinks should not exist.

What is the cost of keeping bit[] in the language?

Currently, every type - including void - can be used as the type on an
array element. What would be the consequences for generic programming
if T -> T[] isn't guaranteed to succeed?

Thomas



-----BEGIN PGP SIGNATURE-----

iD8DBQFD9QXm3w+/yD4P9tIRAruLAJ96SNaO7jn85lXJxyxXmMVsS3bPZACdG1pd
KBuKJE2ogwPwg0YSHeGIJ+A=
=+ZUL
-----END PGP SIGNATURE-----

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Thomas Kuehne wrote:
 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1
 
 Sean Kelly schrieb am 2006-02-16:
 kris wrote:

 
 [snip]
 
 BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and 
 provide a nice library class/struct instead? You might even reuse the 
 old code from Zortech/Zorland days.

 If it helps, I'll send you a case of beer or something ;-)  But if 
 there's universal agreement that packed bit arrays were a mistake then 
 they need to be out pre-1.0 and broken code be damned.  I really don't 
 want to see a 1.0 D release containing features that even the designer 
 thinks should not exist.

 
 What is the cost of keeping bit[] in the language?
 
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

The same as the problems with std::vector<bool> in C++ (though I don't 
have any specific references handy).  I think the true ramifications of 
this in D won't be completely apparent until the language has been in 
use a bit longer however.
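
A rough illustration of the generic-programming concern, assuming std.bitmanip.BitArray from present-day Phobos as a stand-in for a packed bit array (the built-in bit[] discussed here is not used). hasAddressableElements is a hypothetical helper.

```d
import std.bitmanip : BitArray;

// True when the address of an element of container type C can be taken,
// something generic code written against T[] may rely on.
enum hasAddressableElements(C) = is(typeof((C c) { auto p = &c[0]; }));

// Ordinary arrays store each element in its own addressable slot.
static assert(hasAddressableElements!(bool[]));
static assert(hasAddressableElements!(int[]));

// A packed bit array returns each bit by value, so there is no element
// to point at.
static assert(!hasAddressableElements!BitArray);

void main() {}
```

Code that takes pointers or references to elements works for every ordinary T[] but not for a packed representation, which is the same kind of surprise std::vector<bool> causes in C++.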

One thought I had was to leave bit in place, perhaps deprecated, and add 
'bool' as a non-packed but otherwise equivalent type.


Sean

Feb 16 2006

"Kris" <fu bar.com> writes:

"Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

Easy fix ~ change the bool alias to byte, instead of bit :-)

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

 
 Easy fix ~ change the bool alias to byte, instead of bit :-) 

I already use byte in some cases :-)  But it lacks the boolean value 
safety of bit, so I tend to litter my code with asserts just to be sure 
something didn't get screwed up... or simply make sure I'm only 
comparing to zero and not-zero.  Either way, it's more error prone than 
I'd like.


Sean

Feb 16 2006

"Kris" <fu bar.com> writes:

"Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

 Easy fix ~ change the bool alias to byte, instead of bit :-)

 I already use byte in some cases :-)  But it lacks the boolean value 
 safety of bit, so I tend to litter my code with asserts just to be sure 
 something didn't get screwed up... or simply make sure I'm only comparing 
 to zero and not-zero.  Either way, it's more error prone than I'd like.

Yes, you're right of course. Would be just great if Walter would add a true 
*cough* bool *cough* type that doesn't try to pack itself when used with 
arrays. Packed bits are great too, but for different reasons.

Feb 16 2006

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu bar.com> wrote:
 "Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

 Easy fix ~ change the bool alias to byte, instead of bit :-)

 I already use byte in some cases :-)  But it lacks the boolean value
 safety of bit, so I tend to litter my code with asserts just to be sure
 something didn't get screwed up... or simply make sure I'm only  
 comparing
 to zero and not-zero.  Either way, it's more error prone than I'd like.

 Yes, you're right of course. Would be just great if Walter would add a  
 true
 *cough* bool *cough* type that doesn't try to pack itself when used with
 arrays.

A true bool would make several people happy.. but once one existed people  
would then want:

class A {}
A a = new A();
if (a) // error: not a boolean result

right? That would bother me.

 Packed bits are great too, but for different reasons.

Indeed, I can think of several uses for packed bits, e.g.
  - Using them as a bunch of flags, generally boolean on/off flags.
  - Representing/dissecting packed data, e.g. TCP headers.
  - Assembling/converting data, e.g. 8bit to 7bit characters for SMS  
messages.

All of these can be done with & | ^ etc., but it would be nicer (more  
readable, easier to write) if we could index the data.

I've suggested this before, but is it perhaps possible to allow us to  
perform array operations on the basic types: byte, short, int, long? For  
the same reason that bit[] does not work, these could not provide a full  
set of array functionality, but they could provide much that would be of  
use, I suspect.

Examples:

int flags;
...
if (flags[5]) //check for flag
    flags[5] = 1; //set flag

void foo(long header) {
   int length = header[0..5]; //copy bits to lvalue.
...

For the 3rd task, converting from 8bit to 7bit some sort of stream that  
allowed bits to be sent to it and assembled would be the ideal way, I  
suspect.

In the end it's just syntactic sugar for & | and ^. The question is: does  
it make the code clearer? I think so. Does it make bit manipulation easier  
to code? I think so. Is that enough to make it a valuable feature?
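
For comparison, a sketch of what the indexing and slicing forms above would stand for when written out with masks and shifts. The helper names testBit, withBit and bitSlice are hypothetical, and the slice is assumed to cover bits lo up to, but not including, hi.

```d
import std.stdio : writeln;

// Written-out form of the proposed `flags[n]` read: is bit n set?
bool testBit(int flags, int n)
{
    return ((flags >> n) & 1) != 0;
}

// Written-out form of the proposed `flags[n] = 1`: returns flags with bit n set.
int withBit(int flags, int n)
{
    return flags | (1 << n);
}

// Written-out form of the proposed `header[lo .. hi]`: extracts bits lo
// up to, but not including, hi. Assumes 0 < hi - lo < 64.
ulong bitSlice(ulong value, int lo, int hi)
{
    immutable width = hi - lo;
    immutable mask = (1UL << width) - 1;
    return (value >> lo) & mask;
}

void main()
{
    writeln(testBit(withBit(0, 5), 5));
    writeln(bitSlice(0b1011_0110, 0, 5));
}
```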

Regan

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Regan Heath wrote:
 On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu bar.com> wrote:
 "Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

 Easy fix ~ change the bool alias to byte, instead of bit :-)

 I already use byte in some cases :-)  But it lacks the boolean value
 safety of bit, so I tend to litter my code with asserts just to be sure
 something didn't get screwed up... or simply make sure I'm only 
 comparing
 to zero and not-zero.  Either way, it's more error prone than I'd like.

 Yes, you're right of course. Would be just great if Walter would add a 
 true
 *cough* bool *cough* type that doesn't try to pack itself when used with
 arrays.

 
 A true bool would make several people happy.. but once one existed 
 people would then want:
 
 class A {}
 A a = new A();
 if (a) // error: not a boolean result
 
 right? That would bother me.

This is only a slippery slope if we want it to be ;-)  I think the 
intent behind adding 'bool' was twofold: first, 'bit' loses meaning if 
it never actually refers to a bit, and second, it allows 'bit' to be 
deprecated for a while so people can change their code.

 Packed bits are great too, but for different reasons.

 
 Indeed, I can think of several uses for packed bits. i.e.
  - Using them as a bunch of flags, generally boolean on/off flags.
  - Representing/dissecting packed data, e.g. TCP headers.
  - Assembling/converting data i.e. 8bit to 7bit characters for SMS 
 messages.
 
 all of these can be done with & | ^ etc but it would be nice, i.e. more 
 readable, easier to write if we could index the data.

Aye.  I like the idea of packed bit arrays in general.  I just don't 
want them to be mandatory for the built-in boolean type--I run into too 
many situations where I want to do something that the existing syntax 
doesn't support and I'm stuck using an array of bytes instead.


Sean

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, does 
 it make the code clearer, I think so. Does it make bit manipulation easier 
 to code, I think so. Is that enough to make it a valuable feature?

I regularly do bit masking and shifting on ints. I'm so used to it, I don't 
think that adding sugar for it would help any.

Feb 16 2006

Derek Parnell <derek psych.ward> writes:

On Thu, 16 Feb 2006 17:25:23 -0800, Walter Bright wrote:

 "Regan Heath" <regan netwin.co.nz> wrote in message 
 news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, does 
 it make the code clearer, I think so. Does it make bit manipulation easier 
 to code, I think so. Is that enough to make it a valuable feature?

 
 I regularly do bit masking and shifting on ints. I'm so used to it, I don't 
 think that adding sugar for it would help any.

YOU ARE DEAD WRONG! Sheesh!!! Not all of us are blessed with your abilities.
That's why I don't do Assembler anymore and that's why we use higher level
languages than machine code.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
17/02/2006 12:40:36 PM

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:edpqlnztl599.19xc3uf14ntbh.dlg 40tude.net...
 On Thu, 16 Feb 2006 17:25:23 -0800, Walter Bright wrote:

 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, 
 does
 it make the code clearer, I think so. Does it make bit manipulation 
 easier
 to code, I think so. Is that enough to make it a valuable feature?

 I regularly do bit masking and shifting on ints. I'm so used to it, I 
 don't
 think that adding sugar for it would help any.

 YOU ARE DEAD WRONG! Sheesh!!! Not all of us are blessed with your abilities.

What about using some functions instead:

    int setBit(inout int v, int b)
    {
        return v |= 1 << b;
    }

?
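
A slightly fuller sketch of that idea, assuming a present-day D compiler where `ref` takes the place of `inout`. The companions `clearBit` and `testBit` are hypothetical names made up for illustration, not anything in Phobos:

```d
// Set bit b of v in place and return the new value.
int setBit(ref int v, int b)
{
    return v |= 1 << b;
}

// Clear bit b of v in place and return the new value.
int clearBit(ref int v, int b)
{
    return v &= ~(1 << b);
}

// Report whether bit b of v is set; v is not modified.
bool testBit(int v, int b)
{
    return (v & (1 << b)) != 0;
}

void main()
{
    int flags = 0;
    setBit(flags, 5);
    assert(testBit(flags, 5));
    clearBit(flags, 5);
    assert(!testBit(flags, 5));
}
```

The call sites read as words rather than operators, which is the readability being asked for, at the cost of a function call the compiler will normally inline.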

 That's why I don't do Assembler anymore and that's why we use higher level
 languages than machine code.

<g>

Feb 16 2006

Derek Parnell <derek psych.ward> writes:

On Thu, 16 Feb 2006 17:48:54 -0800, Walter Bright wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:edpqlnztl599.19xc3uf14ntbh.dlg 40tude.net...
 On Thu, 16 Feb 2006 17:25:23 -0800, Walter Bright wrote:

 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, 
 does
 it make the code clearer, I think so. Does it make bit manipulation 
 easier
 to code, I think so. Is that enough to make it a valuable feature?

 I regularly do bit masking and shifting on ints. I'm so used to it, I 
 don't
 think that adding sugar for it would help any.

 YOU ARE DEAD WRONG! Sheesh!!! Not all of us are blessed with your abilities.

 
 What about using some functions instead:
 
     int setBit(inout int v, int b)
     {
         return v |= 1 << b;
     }
 
 ?

You mean like std.regexp library functions? Oh that's right ... we have ~~
now; silly me.


-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
17/02/2006 1:49:07 PM

Feb 16 2006

"Kris" <fu bar.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote
 "Regan Heath" <regan netwin.co.nz> wrote in message 
 news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, does 
 it make the code clearer, I think so. Does it make bit manipulation 
 easier to code, I think so. Is that enough to make it a valuable feature?

 I regularly do bit masking and shifting on ints. I'm so used to it, I 
 don't think that adding sugar for it would help any.

Besides, it's easy to use op-overloads for such things as necessary.

Feb 16 2006

Derek Parnell <derek psych.ward> writes:

On Fri, 17 Feb 2006 13:54:47 +1300, Regan Heath wrote:

 On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu bar.com> wrote:
 "Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

 Easy fix ~ change the bool alias to byte, instead of bit :-)

 I already use byte in some cases :-)  But it lacks the boolean value
 safety of bit, so I tend to litter my code with asserts just to be sure
 something didn't get screwed up... or simply make sure I'm only  
 comparing
 to zero and not-zero.  Either way, it's more error prone than I'd like.

 Yes, you're right of course. Would be just great if Walter would add a  
 true
 *cough* bool *cough* type that doesn't try to pack itself when used with
 arrays.

 
 A true bool would make several people happy.. but once one existed people  
 would then want:
 
 class A {}
 A a = new A();
 if (a) //error not boolean result.
 
 right? That would bother me.

I regard the syntax

   if ( <identifier> )

as shorthand for

    if ( <identifier> != 0 )

or 

    if ( <identifier> !is null)

as appropriate, so this would not fall foul of a native boolean
implementation.
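
To make that reading concrete, here is a small sketch, assuming a present-day D compiler. The helper `truthy` is a hypothetical name that only exists to wrap the `if ( <identifier> )` form so both cases can be checked side by side:

```d
class A {}

// Wraps the bare `if (x)` test so it can be compared with the long forms.
bool truthy(T)(T x)
{
    if (x)
        return true;
    return false;
}

void main()
{
    A a = new A();
    A none = null;
    int n = 3;

    // For a class reference, `if (a)` agrees with `if (a !is null)`.
    assert(truthy(a) == (a !is null));
    assert(truthy(none) == (none !is null));

    // For an integer, `if (n)` agrees with `if (n != 0)`.
    assert(truthy(n) == (n != 0));
    assert(truthy(0) == false);
}
```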

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
17/02/2006 12:37:40 PM

Feb 16 2006

Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:

Regan Heath wrote:
 On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu bar.com> wrote:
 "Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

 Easy fix ~ change the bool alias to byte, instead of bit :-)

 I already use byte in some cases :-)  But it lacks the boolean value
 safety of bit, so I tend to litter my code with asserts just to be sure
 something didn't get screwed up... or simply make sure I'm only 
 comparing
 to zero and not-zero.  Either way, it's more error prone than I'd like.

 Yes, you're right of course. Would be just great if Walter would add a 
 true
 *cough* bool *cough* type that doesn't try to pack itself when used with
 arrays.

 
 A true bool would make several people happy.. but once one existed 
 people would then want:
 
 class A {}
 A a = new A();
 if (a) //error not boolean result.
 
 right? That would bother me.
 

I favor a true bool, but would still like to keep the if(<int>) idiom.

 Packed bits are great too, but for different reasons.

 
 Indeed, I can think of several uses for packed bits. i.e.
  - Using them as a bunch of flags, generally boolean on/off flags.
  - Representing/dissecting packed data, i.e. tcp headers.
  - Assembling/converting data i.e. 8bit to 7bit characters for SMS 
 messages.
 
 all of these can be done with & | ^ etc but it would be nice, i.e. more 
 readable, easier to write if we could index the data.
 
 I've suggested this before but is it perhaps possible to allow us to 
 perform array operations on the basic types: byte, short, int, long. For 
 the same reason that bit[] does not work, these could not provide a full 
 set of array functionality, but it could provide much that would be of 
 use, I suspect.
 
 Examples:
 
 int flags;
 ....
 if (flags[5]) //check for flag
     flags[5] = 1; //set flag
 
 void foo(long header) {
   int length = header[0..5]; //copy bits to lvalue.
 ....
 

An interesting idea, but maybe, to avoid syntax conflicts, we 
should have:
   if (flags.bits[5])
     flags.bits[5] = 0;

(the name "bits" could be something else)
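
Something close to this can already be approximated in a library with operator overloads. A rough sketch, assuming a present-day D compiler; the struct `Bits` and its layout are invented here for illustration and are not a language or Phobos feature:

```d
// Hypothetical wrapper giving indexed access to the bits of a uint.
struct Bits
{
    uint* p;   // the integer being viewed

    // view[i] : read bit i
    bool opIndex(size_t i) const
    {
        assert(i < 32);
        return ((*p >> i) & 1) != 0;
    }

    // view[i] = v : write bit i
    void opIndexAssign(bool v, size_t i)
    {
        assert(i < 32);
        if (v)
            *p |= 1u << i;
        else
            *p &= ~(1u << i);
    }

    // view[lo .. hi] : read bits lo up to, but not including, hi
    uint opSlice(size_t lo, size_t hi) const
    {
        assert(lo <= hi && hi <= 32 && hi - lo < 32);
        return (*p >> lo) & ((1u << (hi - lo)) - 1);
    }
}

void main()
{
    uint flags = 0;
    auto bits = Bits(&flags);
    if (!bits[5])          // check for flag
        bits[5] = true;    // set flag
    assert(flags == 32);

    uint header = 0b1011_0110;
    assert(Bits(&header)[0 .. 4] == 0b0110);  // copy the low four bits out
}
```

The wrapper holds a pointer, so writes through it change the original integer; a built-in `.bits` property would presumably hide that detail.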


-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to 
be... unnatural."

Feb 17 2006

"Regan Heath" <regan netwin.co.nz> writes:

On Fri, 17 Feb 2006 15:38:38 +0000, Bruno Medeiros  
<daiphoenixNO SPAMlycos.com> wrote:
 if (flags[5]) //check for flag
     flags[5] = 1; //set flag
  void foo(long header) {
   int length = header[0..5]; //copy bits to lvalue.
 ....

 An interesting idea, but maybe, to avoid syntax conflicts, we  
 should have:
    if (flags.bits[5])
      flags.bits[5] = 0;

 (the name "bits" could maybe be other)

I like it.

Regan

Feb 17 2006

"Walter Bright" <newshound digitalmars.com> writes:

"kris" <fu bar.org> wrote in message news:dt1jhm$1m3$1 digitaldaemon.com...
 Walter Bright wrote:
 Not really. I think it also conflicts with 'in' already.

 but not from the user's standpoint

Can't separate the two.

 That doesn't mean D should adopt arbitrary symbols, Walter. If you want 
 rapid adoption, then the more you can do to make the language 
 "approachable", the more success you'll have. There was a similar issue 
 with === and !==, and you thankfully deprecated them :-)

Those had to go because === was indistinguishable from == in many fonts.

 It would be overloading its existing meaning, which means that it'll take 
 semantic, rather than syntactic, analysis to disambiguate. This is 
 potential trouble.

 I can see that there "might" be trouble for the compiler and, if so, that 
 would be an issue. However, for a developer, the meaning of "in" with 
 respect to its use with AA and potentially regex-patterns is consistent.

The trouble starts happening when you overload the operators. Doing this 
with 'in' will result in similar problems that C++ has with '+' being 
sometimes plus, sometimes concatenate.

 One is asking the question "does this thing on the left exist within the 
 thing on the right". It even takes care of getting the operand ordering 
 correct. Thus, I'd urge you to at least see if there's actually a notable 
 problem for the compiler to handle this before writing the idea off.

It's not a problem with the compiler. It's a conceptual problem for the 
user. When I see 'in' I think of containers. That's completely different 
from regex.

 Heck, I've used regex in all manner of ways. I don't think visibility is 
 the problem; rather, I suspect there's a limited set of domains where it 
 applies in a systems language. Some of the those can be addressed in other 
 ways, particularly where performance is a concern; hence regex may not get 
 used as much as it might. In scripting languages there's often a need for 
 Q & D pattern-matching, with little regard for a potentially more 
 efficient mechanism. Horses for courses.

Scripting languages have 3 main programming characteristics:

1) dynamic typing
2) great string handling
3) runtime script generation & execution

A lot of people turn to them because of (2). There's no reason C++ and D 
can't do (2) as well. C++ doesn't because the C++ community has adopted the 
principle of "if it can be done as a library, it must be done as a library, 
no matter how unbelievably wretched that might turn out." So when C++ 
programmers want to do strings, they switch to Perl, Ruby, Python, etc.

As to string manipulation in a systems app - is a compiler a systems app? I 
believe it is, and there's a bunch of tedious string manipulation in it. 
Everything from handling the command line arguments to manipulating file 
names to formatting error messages to reading config files. It's astonishing 
how that stuff shrinks and becomes a pleasure to code rather than tedium 
when the string handling sugar is applied.
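
As a rough illustration of the command-line and error-message cases, here is a sketch assuming a present-day D compiler and its `std.regex` module (a later library than the `std.regexp` discussed in this thread). The function `parseOption` and the option format are made up for the example:

```d
import std.regex : matchFirst, regex;

// Split "--name=value" into its parts; returns false for any other shape.
bool parseOption(string arg, out string name, out string value)
{
    auto m = matchFirst(arg, regex(`^--(\w+)=(.*)$`));
    if (m.empty)
        return false;
    name = m[1];
    value = m[2];
    return true;
}

void main()
{
    string name, value;
    assert(parseOption("--output=a.out", name, value));
    assert(name == "output" && value == "a.out");

    // pre, hit and post are the text before, inside and after the match.
    auto m = matchFirst("foo.d:12: error", regex(`:(\d+):`));
    assert(m.pre == "foo.d");
    assert(m.hit == ":12:");
    assert(m.post == " error");
    assert(m[1] == "12");
}
```

The `pre`, `hit` and `post` members cover the same ground as the pre-match, match and post-match variables this thread is about, only spelled out as names.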

I also write a number of garden variety string processing apps, such as the 
one that turns newsgroup postings into the "D archives". I want to do them 
in D. I don't want to install/learn Ruby/Python/Perl. I see no reason why D 
cannot dominate that problem space well.

 I'm an advocate for getting regex support in the grammar,

I thought you were arguing against that <g>.

 but I'm certainly not an advocate for tying Phobos to the compiler (RegExp 
 has a notable resultant import set; because of this I refactored it for 
 Ares and Mango).
 Without a clearly defined means to decouple Phobos from the compiler, 
 you're effectively erecting barriers for other solutions to clamber over 
 (as Sean vaguely intimated earlier). What's missing from all this built-in 
 stuff is a clean and documented means to have it supported outside of 
 Phobos. After all, the compiler is injecting explicit references for AA 
 code, utf conversion code, regex code, and a variety of other things. 
 What's next?

The compiler actually does not emit any explicit references to RegExp. It's 
all done by a reference to object._Match. _Match operates as a proxy to 
RegExp, but the compiler knows nothing about that.

 In short: you're (a) building more and more library functionality directly 
 into the language without providing a means to cleanly support alternate 
 implementations, extensions, or otherwise decouple the compiler. And (b) 
 by doing so, you're (perhaps inadvertently) stifling some innovation and 
 causing some headaches for the very people who are trying to help D along 
 the road to acceptance. It would really help if you'd be somewhat 
 sensitive to these aspects rather than persistently ignoring them.

 For instance, how does one change .sort to use a different sorting 
 algorithm? How does one change the hashing function for non-classes? How 
 can one unhook RegExp+OutBuffer+String+Others, and replace it? etc. etc. 
 If D is intended to be a closed-shop, Phobos-only environment, then some 
 of us are presumably wasting our time supporting the language; right?

Regex is non-trivial. There's no way to have any sort of language support 
for it without it being in the library. Anyone working on D libraries or 
other things is welcome to use RegExp, so I am just not understanding what 
the problem is. Phobos isn't a closed shop, the license on the files allows 
anyone to do pretty much anything they want with it.

Also, let me reiterate that the compiler does *not* emit any hardcoded 
references to RegExp, nor does it know anything at all about regex's. It 
uses object._Match, which is a proxy to whatever the language implementor 
wants to use.

RegExp could probably remove its dependence on OutBuffer, though.

 Who uses regex in C++? Hardly anyone. I'm betting it's because using them 
 sucks in C++, not because people don't use regex's.

 Again, it's horses for courses. BTW, regex does not suck in C, so why C++ 
 ?

It sucks in C, and why do I say that? I've shipped a C compiler for 22 years 
now, and not once, not ever, did anyone ask for a regex library for it. 
Regex wasn't put in the C standard, or the C++ one. Yet regex is considered 
a core capability of several other languages. There are many ways to 
interpret that - I am interpreting it as meaning that regex sucks in C, and 
so people seem to just never even think of using C when they need to process 
strings.


 BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and 
 provide a nice library class/struct instead? You might even reuse the old 
 code from Zortech/Zorland days.

I know Stewart is using bit[], I want to hear his opinion first. If he says 
dump it, I'm agreeable.

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 
 The compiler actually does not emit any explicit references to RegExp. It's 
 all done by a reference to object._Match. _Match operates as a proxy to 
 RegExp, but the compiler knows nothing about that.

This is really more of a library issue than a compiler issue.  My 
concern is that, since internal/object.d now imports std.regexp, the 
runtime code can no longer be built without at least a skeleton regexp 
module available.  And if the regexp implementation changes then the 
runtime must be rebuilt.  I'll admit that the current approach is 
probably best given that std.regexp exists and code duplication is a Bad 
Thing, but it still creates a language dependency on library code, even 
if the compiler isn't emitting RegExp calls directly.

 Regex is non-trivial. There's no way to have any sort of language support 
 for it without it being in the library. Anyone working on D libraries or 
 other things is welcome to use RegExp, so I am just not understanding what 
 the problem is. Phobos isn't a closed shop, the license on the files allows 
 anyone to do pretty much anything they want with it.

I agree.  And this works fine for Phobos.  But if Phobos is to be a 
template for future standard library implementations, then it should be 
designed in a way that allows for closed-source compiler implementations 
as well.

Also, what if a library writer decides to exploit the regular expression 
support provided by the language, and merely implements his RegExp class 
as a veneer over the built-in functionality?  It creates an odd sort of 
circular dependency.  I'd originally considered the same thing for UTF 
transcoding using the built-in foreach mechanism, but as that code is 
relatively simple, it's not as much of an issue.

I assume there's no plan to remove std.regexp from Phobos now that 
language support is in place?

 Also, let me reiterate that the compiler does *not* emit any hardcoded 
 references to RegExp, nor does it know anything at all about regex's. It 
 uses object._Match, which is a proxy to whatever the language implementor 
 wants to use.

Understood.  In fact I'll vouch for this since I've had a close look at 
the code.


Sean

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Sean Kelly" <sean f4.ca> wrote in message 
news:dt394j$1n2j$1 digitaldaemon.com...
 This is really more of a library issue than a compiler issue.  My concern 
 is that, since internal/object.d now imports std.regexp, the runtime code 
 can no longer be built without at least a skeleton regexp module 
 available.  And if the regexp implementation changes then the runtime must 
 be rebuilt.  I'll admit that the current approach is probably best given 
 that std.regexp exists and code duplication is a Bad Thing, but it still 
 creates a language dependency on library code, even if the compiler isn't 
 emitting RegExp calls directly.

I was concerned that code that did not use MatchExpressions might 
inadvertently link in the std.regexp module, which would be a Bad Thing. It 
does not, so I'm not convinced this is a bad thing.

 Regex is non-trivial. There's no way to have any sort of language support 
 for it without it being in the library. Anyone working on D libraries or 
 other things is welcome to use RegExp, so I am just not understanding 
 what the problem is. Phobos isn't a closed shop, the license on the files 
 allows anyone to do pretty much anything they want with it.

 I agree.  And this works fine for Phobos.  But if Phobos is to be a 
 template for future standard library implementations, then it should be 
 designed in a way that allows for closed-source compiler implementations 
 as well.

Sure, and std.regexp's license allows it to be used in closed source. It's a 
different license from dmd's source code, and the reason for the difference 
is so that people can use it for just the purpose you suggest. If one wanted 
to reimplement (or better, extend) RegExp in order to support, say, Perl 6 
regex, all that object._Match needs are about 4 trivial members, which 
shouldn't be a burden.

Other than that, why reimplement RegExp?

 Also, what if a library writer decides to exploit the regular expression 
 support provided by the language, and merely implements his RegExp class 
 as a veneer over the built-in functionality?  It creates an odd sort of 
 circular dependency.

At some point, he'll need a regex implementation. And the license for 
std.RegExp allows him to use/adapt it as required.

 I assume there's no plan to remove std.regexp from Phobos now that 
 language support is in place?

I'm just not getting it - why should it be removed? There never was a plan 
to remove it. And why would an implementation of a D runtime library not 
want to do a regex implementation? Of course, it's a lot of work to 
implement a regex, but one can just copy over std.RegExp and use/adapt it as 
required, as the license allows that. So I am just not getting what the 
problem is.

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 
 I'm just not getting it - why should it be removed? There never was a plan 
 to remove it. And why would an implementation of a D runtime library not 
 want to do a regex implementation? Of course, it's a lot of work to 
 implement a regex, but one can just copy over std.RegExp and use/adapt it as 
 required, as the license allows that. So I am just not getting what the 
 problem is. 

Perhaps I'm being idealistic, as I simply don't believe the runtime 
should rely on standard library code.  Up to now that's been achievable, 
but the solution for this particular feature is less clear.  But I'll 
drop the issue for now and mull it over a bit.


Sean

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Sean Kelly" <sean f4.ca> wrote in message 
news:dt3hev$1taf$1 digitaldaemon.com...
 Walter Bright wrote:
 I'm just not getting it - why should it be removed? There never was a 
 plan to remove it. And why would an implementation of a D runtime library 
 not want to do a regex implementation? Of course, it's a lot of work to 
 implement a regex, but one can just copy over std.RegExp and use/adapt it 
 as required, as the license allows that. So I am just not getting what 
 the problem is.

 Perhaps I'm being idealistic, as I simply don't believe the runtime should 
 rely on standard library code.  Up to now that's been achievable, but the 
 solution for this particular feature is less clear.  But I'll drop the 
 issue for now and mull it over a bit.

Consider that there's no way to implement C, D, etc., without some runtime 
library. Just doing a long divide relies on library code. There's the 
startup code (you can't just jmp to main()), shutdown code, exception 
handling support, etc.

C/C++ have gone the odd route of making the library *part of the language*, 
so, for example, a compiler can recognize strlen and replace it with custom 
code. To my mind this gives the worst of both worlds - no syntactic sugar 
and no library flexibility.

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:dt3hev$1taf$1 digitaldaemon.com...
 Walter Bright wrote:
 I'm just not getting it - why should it be removed? There never was a 
 plan to remove it. And why would an implementation of a D runtime library 
 not want to do a regex implementation? Of course, it's a lot of work to 
 implement a regex, but one can just copy over std.RegExp and use/adapt it 
 as required, as the license allows that. So I am just not getting what 
 the problem is.

 Perhaps I'm being idealistic, as I simply don't believe the runtime should 
 rely on standard library code.  Up to now that's been achievable, but the 
 solution for this particular feature is less clear.  But I'll drop the 
 issue for now and mull it over a bit.

 
 Consider that there's no way to implement C, D, etc., without some runtime 
 library. Just doing a long divide relies on library code. There's the 
 startup code (you can't just jmp to main()), shutdown code, exception 
 handling support, etc.

Just to be clear, by "standard library code" I actually meant D code 
specifically.  I fully expect the standard C library to be used by the D 
runtime.  But as the C runtime likely calls C standard library 
functions, I suppose there's little reason to expect otherwise from D.

 C/C++ have gone the odd route of making the library *part of the language*, 
 so, for example, a compiler can recognize strlen and replace it with custom 
 code. To my mind this gives the worst of both worlds - no syntactic sugar 
 and no library flexibility. 

I've heard this mentioned before and it seems a bit odd to me.  Does the 
spec actually mention this anywhere, or is it merely implied by having 
the library spec be a part of the language spec?


Sean

Feb 17 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Sean Kelly" <sean f4.ca> wrote in message 
news:dt5d4e$i49$2 digitaldaemon.com...
 Walter Bright wrote:
 C/C++ have gone the odd route of making the library *part of the 
 language*, so, for example, a compiler can recognize strlen and replace 
 it with custom code. To my mind this gives the worst of both worlds - no 
 syntactic sugar and no library flexibility.

 I've heard this mentioned before and it seems a bit odd to me.  Does the 
 spec actually mention this anywhere, or is it merely implied by having the 
 library spec be a part of the language spec?

I think it's implied by it being part of the language spec. Regardless, it 
is true, and many compilers (including DMC) take advantage of it.

Feb 17 2006

"Kris" <fu bar.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote ...
[snip]
 One is asking the question "does this thing on the left exist within the 
 thing on the right". It even takes care of getting the operand ordering 
 correct. Thus, I'd urge you to at least see if there's actually a notable 
 problem for the compiler to handle this before writing the idea off.

 It's not a problem with the compiler. It's a conceptual problem for the 
 user. When I see 'in' I think of containers. That's completely different 
 from regex.

Can't say that I agree, but my opinion matters rather little anyway <g>


 I'm an advocate for getting regex support in the grammar,

 I thought you were arguing against that <g>.

Not at all. I've been an advocate for it in the past also. It's certain 
other aspects of built-in functionality that I consistently have a beef 
with.


 In short: you're (a) building more and more library functionality 
 directly into the language without providing a means to cleanly support 
 alternate implementations, extensions, or otherwise decouple the 
 compiler. And (b) by doing so, you're (perhaps inadvertently) stifling 
 some innovation and causing some headaches for the very people who are 
 trying to help D along the road to acceptance. It would really help if 
 you'd be somewhat sensitive to these aspects rather than persistently 
 ignoring them.

 For instance, how does one change .sort to use a different sorting 
 algorithm? How does one change the hashing function for non-classes? How 
 can one unhook RegExp+OutBuffer+String+Others, and replace it? etc. etc. 
 If D is intended to be a closed-shop, Phobos-only environment, then some 
 of us are presumably wasting our time supporting the language; right?

 Regex is non-trivial. There's no way to have any sort of language support 
 for it without it being in the library. Anyone working on D libraries or 
 other things is welcome to use RegExp, so I am just not understanding what 
 the problem is. Phobos isn't a closed shop, the license on the files 
 allows anyone to do pretty much anything they want with it.

It's one thing to hear you say that; yet the proof is in the pudding. It's 
actually quite tricky to disentangle the compiler from Phobos. Some parts 
simply cannot be decoupled at all (at this time).  It's not a criticism of 
you personally, but the above concerns are very real and the frustration is 
something you perhaps need to know about.

If I read your answer a particular way, it can be interpreted as saying "why 
would you *not* want to use Phobos?". That would be an example of stifling 
innovation, for all kinds of reasons.


 Also, let me reiterate that the compiler does *not* emit any hardcoded 
 references to RegExp, nor does it know anything at all about regex's. It 
 uses object._Match, which is a proxy to whatever the language implementor 
 wants to use.

 RegExp could probably remove its dependence on OutBuffer, though.

Probably. On the same topic, you've often 'lectured' about the need to 
decouple such that the "libraries don't end up like Java". Yet RegExp 
imports String too, which in turn imports all these (std.format in 
particular):

private import std.stdio;
private import std.utf;
private import std.uni;
private import std.array;
private import std.format;
private import std.ctype;
private import std.stdarg;

It's quite easy to eliminate OutBuffer and String from RegExp. There's an 
adjusted version of it in circulation, if you'd like to forego the effort.


 It sucks in C, and why do I say that? I've shipped a C compiler for 22 
 years now, and not once, not ever, did anyone ask for a regex library for 
 it. Regex wasn't put in the C standard, or the C++ one. Yet regex is 
 considered a core capability of several other languages. There are many 
 ways to interpret that - I am interpreting it as meaning that regex sucks 
 in C, and so people seem to just never even think of using C when they 
 need to process strings.

I'm surprised that you'd interpret it that way. I've used regex in C for 
decades. There was one great implementation from, uhhh, Ian somebody from 
Edinburgh Uni, which generated x86 code on the fly. I used that to great 
effect ~ a truly impressive utility.

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Kris" <fu bar.com> wrote in message news:dt3cc2$1pc7$1 digitaldaemon.com...
 It sucks in C, and why do I say that? I've shipped a C compiler for 22 
 years now, and not once, not ever, did anyone ask for a regex library for 
 it. Regex wasn't put in the C standard, or the C++ one. Yet regex is 
 considered a core capability of several other languages. There are many 
 ways to interpret that - I am interpreting it as meaning that regex sucks 
 in C, and so people seem to just never even think of using C when they 
 need to process strings.

 I'm surprised that you'd interpret it that way. I've used regex in C for 
 decades. There was one great implementation from, uhhh, Ian somebody from 
 Edinburgh Uni, which generated x86 code on the fly. I used that to great 
 effect ~ a truly impressive utility.

How do you interpret the fact that it has failed to gain traction among the 
general C population?

Feb 16 2006

"Kris" <fu bar.com> writes:

"Walter Bright" <newshound digitalmars.com> wrote...
 "Kris" <fu bar.com> wrote in message 
 news:dt3cc2$1pc7$1 digitaldaemon.com...
 It sucks in C, and why do I say that? I've shipped a C compiler for 22 
 years now, and not once, not ever, did anyone ask for a regex library 
 for it. Regex wasn't put in the C standard, or the C++ one. Yet regex is 
 considered a core capability of several other languages. There are many 
 ways to interpret that - I am interpreting it as meaning that regex 
 sucks in C, and so people seem to just never even think of using C when 
 they need to process strings.

 I'm surprised that you'd interpret it that way. I've used regex in C for 
 decades. There was one great implementation from, uhhh, Ian somebody from 
 Edinburgh Uni, which generated x86 code on the fly. I used that to great 
 effect ~ a truly impressive utility.

 How do you interpret the fact that it has failed to gain traction among 
 the general C population?

I noted a few reasons previously, regarding differing approaches and 
mindsets between script developers and systems developers. Even when the 
same person does both. George Wrede just posted some very similar reasoning 
too. The upshot is that (IMO) the general C population rarely have a 
compelling need for regex. Where regex might seem (perhaps mistakenly) like 
using a sledgehammer to crack a nut in C, its usage is often not given a 
second thought in scripts.

Speaking personally, I don't expect high performance out of a script, and 
don't give two hoots about Q & D hacking therein. That's not the case with 
systems-programming (for me), where I'm likely to use something more 
lightweight as appropriate. On the other hand, I've written a lot of the 
type of code that really benefits from the state-machinery exposed by a good 
regex engine. Other times I've hand-tuned my own state-machines to do the 
work instead. Sometimes in assembly.

As noted previously, I don't think it's a question of visibility at all ~ 
more a question of task, applicability, priorities, and various other cost 
factors.

One has to wonder how much script-regex actually leverages the power within? 
I'd bet a large % are completely trivial. The kind which can easily be 
handled by other (more efficient) means in systems languages.

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Kris" <fu bar.com> wrote in message news:dt3foh$1s1d$1 digitaldaemon.com...
 I noted a few reasons previously, regarding differing approaches and 
 mindsets between script developers and systems developers. Even when the 
 same person does both. George Wrede just posted some very similar 
 reasoning too. The upshot is that (IMO) the general C population rarely 
 have a compelling need for regex. Where regex might seem (perhaps 
 mistakenly) like using a sledgehammer to crack a nut in C, its usage is 
 often not given a second thought in scripts.

This might be a circular result - people don't use regex in C because 
regex's suck in C, so there is no incentive to improve it because there 
aren't any users. People just get used to going to another language to use 
regex, and never stop to think it doesn't have to be that way.

 Speaking personally, I don't expect high performance out of a script, and 
 don't give two hoots about Q & D hacking therein. That's not the case with 
 systems-programming (for me), where I'm likely to use something more 
 lightweight as appropriate.

There's a lot of string processing work done in C that is not performance 
sensitive - like dealing with the command line arguments.

 On the other hand, I've written a lot of the type of code that really 
 benefits from the state-machinery exposed by a good regex engine. Other 
 times I've hand-tuned my own state-machines to do the work instead. 
 Sometimes in assembly.

Sure. And building in some syntactic sugar for regex isn't going to sabotage 
optimization.

 One has to wonder how much script-regex actually leverages the power 
 within? I'd bet a large % are completely trivial.

I agree with that.

 The kind which can easily be handled by other (more efficient) means in 
 systems languages.

I'm not sure that efficiency is the only goal here - productivity is a big 
one, too, and one often uses regex in parts of the program that don't need 
performance. I know I sure get tired of strlen/strcmp/memcpy for routine 
non-performance-critical code.

Feb 16 2006
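
As an illustration of the command-line case mentioned above, here is a minimal sketch in present-day D using Phobos `std.regex` (not the 2006 `std.regexp` or the `~~` operator under discussion). The `Option` struct, the `parseOption` helper, and the `--name=value` argument shape are all assumptions made for the example.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical option record; the names are illustrative only.
struct Option
{
    string name;
    string value;
}

/// Parses an argument of the assumed form `--name=value`.
/// Returns false (leaving `opt` at its default) when the shape does not match.
bool parseOption(string arg, out Option opt)
{
    auto m = matchFirst(arg, regex(`^--([A-Za-z][A-Za-z0-9_]*)=(.*)$`));
    if (m.empty)
        return false;
    opt = Option(m[1], m[2]);   // group 1 is the name, group 2 the value
    return true;
}

void main(string[] args)
{
    foreach (arg; args[1 .. $])
    {
        Option opt;
        if (parseOption(arg, opt))
            writeln(opt.name, " -> ", opt.value);
        else
            writeln("ignored: ", arg);
    }
}
```

The same check written with index arithmetic is not hard, but the pattern states the accepted shape in one place, which is the productivity argument being made here.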

Sean Kelly <sean f4.ca> writes:

Kris wrote:
 "Walter Bright" <newshound digitalmars.com> wrote ...
 RegExp could probably remove its dependence on OutBuffer, though.

 
 Probably. On the same topic, you've often 'lectured' about the need to 
 decouple such that the "libraries don't end up like Java" . Yet RegExp 
 imports String too, which in turn imports all these (std.format in 
 particular):
 
 private import std.stdio;
 private import std.utf;
 private import std.uni;
 private import std.array;
 private import std.format;
 private import std.ctype;
 private import std.stdarg;
 
 It's quite easy to eliminate OutBuffer and String from RegExp. There's an 
 adjusted version of it in circulation, if you'd like to forego the effort.

For what it's worth, the latest release of Ares trims a lot of fat out 
of std.string, so far as runtime dependencies are concerned.  The only 
modules that are actually required by some portion of the runtime are:

std.ctype
std.outbuffer
std.regexp
std.string
std.utf

And outbuffer should be easy enough to remove from this list.  I'd have 
continued to use your modified std.regexp for this release except the 
deltas between the 146 and 147 versions of std.regexp were tremendous. 
It would have taken hours to sort out a workable merge of that file, so 
falling back on the new Phobos version seemed preferable.

 It sucks in C, and why do I say that? I've shipped a C compiler for 22 
 years now, and not once, not ever, did anyone ask for a regex library for 
 it. Regex wasn't put in the C standard, or the C++ one. Yet regex is 
 considered a core capability of several other languages. There are many 
 ways to interpret that - I am interpreting it as meaning that regex sucks 
 in C, and so people seem to just never even think of using C when they 
 need to process strings.

 
 I'm surprised that you'd interpret it that way. I've used regex in C for 
 decades. There was one great implementation from, uhhh, Ian somebody from 
 Edinburgh Uni, which generated x86 code on the fly. I used that to great 
 effect ~ a truly impressive utility. 

That sounds pretty cool.


Sean

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Sean Kelly" <sean f4.ca> wrote in message 
news:dt3nss$22bj$1 digitaldaemon.com...
 And outbuffer should be easy enough to remove from this list.  I'd have 
 continued to use your modified std.regexp for this release except the 
 deltas between the 146 and 147 versions of std.regexp were tremendous. It 
 would have taken hours to sort out a workable merge of that file, so 
 falling back on the new Phobos version seemed preferable.

Very little actually changed; what I did was re-sort the order so it was more 
appealing in Ddoc format, and add the Ddoc comments.

Feb 16 2006

Georg Wrede <georg.wrede nospam.org> writes:

Walter Bright wrote:

 It sucks in C, and why do I say that? I've shipped a C compiler for
 22 years now, and not once, not ever, did anyone ask for a regex
 library for it. Regex wasn't put in the C standard, or the C++ one.
 Yet regex is considered a core capability of several other languages.
 There are many ways to interpret that - I am interpreting it as
 meaning that regex sucks in C, and so people seem to just never even
 think of using C when they need to process strings.

Hmm.

Regexes being a big thing for interpreted languages is much thanks to 
the Q&D convenience. Also systems scripting needs it for nontrivial 
filtering, and of course complicated line rewriting.

C folks tend to "peek directly" into the strings because it's cheap, and 
you have a sense of complete control.

Using regexps in C needs a total change of paradigm. Regexps are kind of 
"top down" things, whereas traditionally "peeking into strings" is 
bottom-up programming.

You'd also have to learn regexps. The trivial things are trivial in 
C-style too, and the non-trivial stuff gets avoided because of the 
up-front investment. Folks rather do nested ifs and stuff.

Conversely, many interpreted languages make it inefficient to do "peek" 
kind of programming, as compared to using regexps.

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:43F53BE5.8020900 nospam.org...
 Using regexps in C needs a total change of paradigm. Regexps are kind of 
 "top down" things, whereas traditionally "peeking into strings" is 
 bottom-up programming.

 You'd also have to learn regexps. The trivial things are trivial in 
 C-style too, and the non-trivial stuff gets avoided because of the 
 up-front investment. Folks rather do nested ifs and stuff.

 Conversely, many interpreted languages make it inefficient to do "peek" 
 kind of programming, as compared to using regexps.

There are a lot of cool things you can do in script languages because they 
are interpreted, and one doesn't care about efficiency. Those things are 
simply incompatible with D. But I don't see any inherent advantages script 
languages should have in implementing regex.

Feb 16 2006

Georg Wrede <georg.wrede nospam.org> writes:

Walter Bright wrote:
 "Georg Wrede" <georg.wrede nospam.org> wrote in message 
 news:43F53BE5.8020900 nospam.org...
 
 Using regexps in C needs a total change of paradigm. Regexps are
 kind of "top down" things, whereas traditionally "peeking into
 strings" is bottom-up programming.
 
 You'd also have to learn regexps. The trivial things are trivial in
  C-style too, and the non-trivial stuff gets avoided because of the
  up-front investment. Folks rather do nested ifs and stuff.
 
 Conversely, many interpreted languages make it inefficient to do
 "peek" kind of programming, as compared to using regexps.

 
 There are a lot of cool things you can do in script languages because
 they are interpreted, and one doesn't care about efficiency. Those
 things are simply incompatible with D. But I don't see any inherent
 advantages script languages should have in implementing regex.

Neither do I.

But the question was, how come regexps aren't _used_ as much as we'd expect.

Feb 17 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:43F58CDB.9090504 nospam.org...
 Walter Bright wrote:
 There are a lot of cool things you can do in script languages because
 they are interpreted, and one doesn't care about efficiency. Those
 things are simply incompatible with D. But I don't see any inherent
 advantages script languages should have in implementing regex.

 Neither do I.

 But the question was, how come regexps aren't _used_ as much as we'd 
 expect.

My answer is because they're inconvenient to use in C/C++.

Feb 17 2006

James Dunne <james.jdunne gmail.com> writes:

Georg Wrede wrote:
 Walter Bright wrote:
 
 "Georg Wrede" <georg.wrede nospam.org> wrote in message 
 news:43F53BE5.8020900 nospam.org...

 Using regexps in C needs a total change of paradigm. Regexps are
 kind of "top down" things, whereas traditionally "peeking into
 strings" is bottom-up programming.

 You'd also have to learn regexps. The trivial things are trivial in
  C-style too, and the non-trivial stuff gets avoided because of the
  up-front investment. Folks rather do nested ifs and stuff.

 Conversely, many interpreted languages make it inefficient to do
 "peek" kind of programming, as compared to using regexps.


 There are a lot of cool things you can do in script languages because
 they are interpreted, and one doesn't care about efficiency. Those
 things are simply incompatible with D. But I don't see any inherent
 advantages script languages should have in implementing regex.

 
 
 Neither do I.
 
 But the question was, how come regexps aren't _used_ as much as we'd 
 expect.

My answer is that regular expressions simply aren't powerful enough for 
the kinds of string processing that I need to do regularly (no pun 
intended).  Regular expressions represent regular languages.  Not all 
languages are regular, of course.

<rant>
My other beef with regular expressions is that there are so many 
competing standards for them, and on top of that some are not even 
standardized (i.e. MS Visual Studio .NET 2003).

You never know if one implementation uses longest-match or one uses 
shortest-match; you never know how newlines are handled; you never know 
if Unicode is supported; you never know the run-time performance of your 
regex; you never know the syntax for selecting match indices (0 based 
or 1 based, use '\1'? Record match with {} or with \(\) or with () ??) etc.

There are simply too many variables with regular expressions as they 
exist in all their forms to be relied upon.  Finally, they're just plain 
ugly and nearly impossible to debug.
</rant>

Following that rant, I can put a positive spin here and say that Ragel 
state machine compiler is an excellent model to work from!  One can 
insert custom code between state transitions for debugging and even for 
complex logic!  Why can't we have compiler-support for this type of 
power? :)

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/MU/S d-pu s:+ a-->? C++++$ UL+++ P--- L+++ !E W-- N++ o? K? w--- O 
M--  V? PS PE Y+ PGP- t+ 5 X+ !R tv-->!tv b- DI++(+) D++ G e++>e 
h>--->++ r+++ y+++
------END GEEK CODE BLOCK------

James Dunne

Feb 20 2006
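
The point above that not all languages are regular can be made concrete with nested parentheses: checking that they balance needs an unbounded counter, which a classical regular expression cannot express. The sketch below is present-day D, and the `isBalanced` helper is hypothetical, written only for this illustration.

```d
import std.stdio : writeln;

/// Hypothetical helper: true when every '(' in `s` has a matching ')'.
bool isBalanced(string s)
{
    size_t depth = 0;              // number of currently open parentheses
    foreach (c; s)
    {
        if (c == '(')
        {
            ++depth;
        }
        else if (c == ')')
        {
            if (depth == 0)
                return false;      // a ')' with nothing left to close
            --depth;
        }
    }
    return depth == 0;             // anything still open means unbalanced
}

void main()
{
    writeln(isBalanced("f(a, g(b))"));
    writeln(isBalanced("f(a, g(b)"));
}
```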

Georg Wrede <georg.wrede nospam.org> writes:

kris wrote:

 I'm an advocate for getting regex support in the grammar, but I'm 
 certainly not an advocate for tying Phobos to the compiler (RegExp has a 
 notable resultant import set; because of this I refactored it for Ares 
 and Mango).

Would it be correct to assume that if we had compile-time regexps, then 
the resultant import set would be effectively zero? (As long as we of 
course don't also use regexps that aren't compile-time compilable?)

Since (IMHO) most shortish programs only use literal regexes, this would 
be quite important.

Feb 16 2006
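
For readers coming to this later: present-day Phobos does offer a compile-time regex for literal patterns, `std.regex.ctRegex`. The sketch below only shows its shape; it makes no claim about the resulting import set, which is the question raised above. The `looksLikeDate` helper and the date pattern are assumptions for the example.

```d
import std.regex : ctRegex, matchFirst;
import std.stdio : writeln;

/// Hypothetical helper: true when `s` has the shape YYYY-MM-DD.
bool looksLikeDate(string s)
{
    // The pattern is a literal, so the matcher is built during compilation.
    auto datePattern = ctRegex!(`^\d{4}-\d{2}-\d{2}$`);
    return !matchFirst(s, datePattern).empty;
}

void main()
{
    writeln(looksLikeDate("2006-02-16"));
    writeln(looksLikeDate("Feb 16 2006"));
}
```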

Georg Wrede <georg.wrede nospam.org> writes:

Walter Bright wrote:
 "kris" <fu bar.org> wrote 
 Walter Bright wrote:


At least "in" has some relevant meaning to it.

 
 It would be overloading its existing meaning, which means that it'll take 
 semantic, rather than syntactic, analysis to disambiguate. This is potential 
 trouble.

Sad. "in" did sound good. :-)

That is a problem, one that would get solved when RegExp can do wchar and 
dchar. That isn't a technical problem, it's more of a getting around to 
it problem.

Well, since grammar supported regex has elevated itself to the top of the 
priority list, perhaps wchar/dchar support might tag along with it?

 
 The thing is, RegExp has been in there from the beginning, but it has gone 
 unused and even its existence is overlooked. I don't believe that's because 
 it isn't useful - look at Ruby, Perl, Javascript, etc. Those languages 
 heavily use regex. Is there something inherent about *script* languages 
 that make them nice for regex? I don't believe there is, I think it gets 
 heavily used in those languages because the syntactic sugar makes it easy 
 to use.

There are 2 things reducing its usage.

First, the using itself has been awkward.

Second, and more important, most real-world uses of regex involve 
literals. And that implies compile-time compilation, if they are to be 
perceived efficient.

I don't think this takes away from the regex templates. I hope to use the 
regex templates in conjunction with this syntactic sugar to create 
optimized regex evaluation.

Perhaps, but I really don't see the need for this sudden rush to get regex 
support into the grammar. Experience with regex templates is almost 
certain to uncover some conflict in this regard ~ one that will likely 
have to be compromised to fit in with the current syntax. That's just 
Murphy's law. What's the big hurry?

 
 I thought it fit in well with D's new capability of being runnable in a 
 script-like fashion.

Experience has shown that using D as a scripting language in a 
production environment, currently needs some method of 
compiler-version-locking.

In other words, if a script is written for D.130, then something should 
ensure that it stays compiled with that version, even after the system D 
compiler gets updated.

If this is not done, then system scripts break at unexpected times (i.e. 
the first time that particular script is run after the compiler is 
updated to the first version that breaks the script). In a production 
environment it is plain impossible to search and test-run each D script 
any time the compiler gets updated.

This problem is made even worse by the run-time library not having any 
version identifier. It sure would be nice if one could leave the old 
run-time libraries as-is, and only add the new one next to them. The 
binaries should choose the right one automagically.

The way we are using D scripting (digitalmars.D.announce:2674) is 
version independent (meaning we can use _any_ DMD), but of course the 
individual D scripts introduce compiler version dependencies by themselves.

One solution to all the above mentioned problems, would of course be a 
"dscript.d" binary, that takes care of everything. (A good starting 
point would be to use the above mentioned scripting script.) Then every 
D script would start with



but that would then totally obviate the DMD -run parameter!

 If this opens up a reasonably broad new range of 
 applications that D is a good fit for, that's good. I might be wrong, of 
 course, as I've been with the bit data type (a complete botch). Match 
 expressions don't break anything, were not expensive to implement, and the 
 only way to see how they'll work out is to try them. 

I think the current implementation is good. I don't like to see any 
$whatever (or even worse, $` $´ $' $") implemented!!!! We don't like to 
see D become Perl.

And hey, Perl itself has been moving away from the 
$-unrememberable-fly-droppings stuff. AND even _bash_ has been starting 
to avoid them lately! (See man bash.)

Syntactic sugar is ok in general. But not "semantic" or "hieroglyphic" 
sugar. Let's see how the brand new stuff works, and whether any 
additional sugar ever becomes needed here!

Feb 16 2006
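
The version-locking concern above can be partly addressed inside a script itself. The following is a minimal sketch in present-day D using the `__VERSION__` special token. It does not pin a compiler; it only makes an untested compiler fail loudly at build time instead of at some later run. The `inTestedRange` helper and the 2000 to 2999 range are assumptions chosen for illustration.

```d
import std.stdio : writeln;

/// Hypothetical helper: true when `actual` lies in [lowest, highest].
bool inTestedRange(long actual, long lowest, long highest)
{
    return actual >= lowest && actual <= highest;
}

// Assumed range for illustration only: any 2.x front end.
enum long lowestTested = 2000;
enum long highestTested = 2999;

// Evaluated during compilation; the build stops here on an untested compiler.
static assert(inTestedRange(__VERSION__, lowestTested, highestTested),
    "this script has not been tested with this compiler version");

void main()
{
    writeln("compiled with front end version ", __VERSION__);
}
```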

"Walter Bright" <newshound digitalmars.com> writes:

"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:43F530AC.9010101 nospam.org...
 Syntactic sugar is ok in general. But not "semantic" or "hieroglyphic" 
 sugar. Let's see how the brand new stuff works, and whether any additional 
 sugar ever becomes needed here!

I think the $` is pretty much dead now <g>.

Feb 16 2006

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message news:dt0q7n$2cuo$1 digitaldaemon.com...
 There seem to be multiple issues here. The first one, which you ask about, 
 is related to the syntax. At first blush, the ~~ looks like an approximate 
 approximation, and then making D look like a malformed Perl is surely a 
 mistake.

 
 If you've got a better idea for tokens ~~ and !~ ?

I'm half inclined to suggest -> for ~~, though there doesn't seem to be 
an obvious corresponding 'not' version.


Sean

Feb 16 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Sean Kelly" <sean f4.ca> wrote in message 
news:dt2cra$ssu$2 digitaldaemon.com...
 Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message 
 news:dt0q7n$2cuo$1 digitaldaemon.com...
 There seem to be multiple issues here. The first one, which you ask 
 about, is related to the syntax. At first blush, the ~~ looks like an 
 approximate approximation, and then making D look like a malformed Perl 
 is surely a mistake.

 If you've got a better idea for tokens ~~ and !~ ?

 I'm half inclined to suggest -> for ~~, though there doesn't seem to be an 
 obvious corresponding 'not' version.

Two cons:

1) people see -> and they're going to think the C/C++ meaning. Heck, I often 
mistakenly use -> in D instead of '.'. For that reason -> should never 
result in valid D code.

2) as you suggested, !-> doesn't look too hot :-(

Feb 16 2006

James Dunne <james.jdunne gmail.com> writes:

Walter Bright wrote:
 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable? 
 
 

I'd rather make my code easier to read than write.  I don't use regexps 
just for that reason.

-- 
Regards,
James Dunne

Feb 15 2006

Roberto Mariottini <Roberto_member pathlink.com> writes:

In article <dt088e$1svm$2 digitaldaemon.com>, Walter Bright says...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

Fairly good.

Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

?

No.
That's why I hate perl. I have to look in the manual to know what the hell $`
means, and be careful about it really being a ` and not a '.

Syntactic sugar is often a good idea, but at what point do they become 
cyclamates and cause cancer in laboratory animals? Will these $ tokens 
render D more accessible, but perhaps too unreadable? 

Yes.

All those $'$`$&$3 are useful only to make my eyes cross. If you want to use $
use it as an abbreviation of 'match', so you'll get:

$pre => _match.pre
$post => _match.post
$(0) => _match.match(0)
$(n) => _match.match(n)

So once I know that $ stands for 'match', I can easily work out what $pre, $post,
$(0) and $(3) stand for.

Ciao

---
http://www.mariottini.net/roberto/

Feb 15 2006

Oskar Linde <olREM OVEnada.kth.se> writes:

Walter Bright wrote:

 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post

` is not readily available on all keyboards. Some fonts also have problems
differentiating between the three Latin-1 ticks (` ' ´) (the straight tick
(apostrophe) (') looks like a right tick (acute accent) (´)  in many
fonts). 

     $& => _match.match(0)
     $n => _match.match(n)

Is n meant to be an integer expression or a numeric literal?

 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

IMHO, Both. It makes D less readable for sure. I also think this repulses
more people in general than it attracts some odd perl hackers. :)

In this case, I don't even think the syntactic sugar makes the code much
faster to write (which in reality, I think, is psychological more than a
real problem). 

If verbosity is to be avoided, I would suggest (as in my earlier reply to
this thread) that $ replaces _match. This would give:

$.pre
$.post
$[0]
$[n]
(or $.match(n), but why not overload opIndex?)

/Oskar

Feb 16 2006
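
For comparison, the named-accessor style argued for above is roughly what present-day Phobos `std.regex` provides: the result of `matchFirst` has `pre`, `post`, `hit`, and indexing by group number. This sketch uses that later API, not the `_match` variable from this thread. The `splitAroundMatch` helper is hypothetical and assumes the pattern contains at least one capture group.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: returns [pre, whole match, group 1, post] for the
/// first match of `pattern` in `text`, or an empty array when nothing matches.
/// Assumes `pattern` has at least one capture group.
string[] splitAroundMatch(string text, string pattern)
{
    auto m = matchFirst(text, regex(pattern));
    if (m.empty)
        return [];
    return [m.pre, m.hit, m[1], m.post];
}

void main()
{
    writeln(splitAroundMatch("width=640;height=480", `(\d+);`));
}
```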

"Walter Bright" <newshound digitalmars.com> writes:

"Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
news:dt1ccm$2ssg$1 digitaldaemon.com...
     $& => _match.match(0)
     $n => _match.match(n)

 Is n meant to be an integer expression or a numeric literal?

$1, $2, $3, ...

 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

 IMHO, Both. It makes D less readable for sure. I also think this repulses
 more people in general than it attracts some odd perl hackers. :)

I'm a little surprised at the uniformly negative reaction to the perl-ish 
notation. But that's good, as it makes the right way to go for D clear.

 If verbosity is to be avoided, I would suggest (as in my earlier reply to
 this thread) that $ replaces _match. This would give:

 $.pre
 $.post
 $[0]
 $[n]
 (or $.match(n), but why not overload opIndex?)

That was the original plan, but when _match is of type T*, the [ ] cannot be 
overloaded.

Feb 16 2006

Oskar Linde <olREM OVEnada.kth.se> writes:

Walter Bright wrote:

 
 "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
 news:dt1ccm$2ssg$1 digitaldaemon.com...
 $.pre
 $.post
 $[0]
 $[n]
 (or $.match(n), but why not overload opIndex?)

 
 That was the original plan, but when _match is of type T*, the [ ] cannot
 be overloaded.

So why does _match have to be a pointer? Would something like this not work?
(from object.d, added void *_this, opIndex and changed this->_this)

/* ***************************** _Match **************************** */

/* **
 * Default type for _match.
 * Implemented as a proxy for RegExp, so that object doesn't pull in
 * the entire std.regexp.
 */

import std.regexp;

struct _Match
{
    void *_this;

    char[] match(size_t n)
    {
        return (cast(RegExp)_this).match(n);
    }

    char[] opIndex(size_t n)
    {
        return match(n);
    }

    _Match opNext()
    {
        RegExp r = (cast(RegExp)_this).opNext();
        if (r)
            return cast(_Match)_this;
        r = cast(RegExp)_this;
        delete r;
        return null;
    }

    char[] pre()
    {
        return (cast(RegExp)_this).pre();
    }

    char[] post()
    {
        return (cast(RegExp)_this).post();
    }
}

/Oskar

Feb 17 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
news:dt41c8$2a9a$1 digitaldaemon.com...
 Walter Bright wrote:

 "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
 news:dt1ccm$2ssg$1 digitaldaemon.com...
 $.pre
 $.post
 $[0]
 $[n]
 (or $.match(n), but why not overload opIndex?)

 That was the original plan, but when _match is of type T*, the [ ] cannot
 be overloaded.

 So why does _match have to be a pointer?

I wanted it to work with both pointers to structs and to class references.

 Would something like this not work?

The problem with that is testing:

    _Match m;
    if (m)

doesn't work if _Match is a struct.

Feb 17 2006
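
The two obstacles named above, indexing and truth-testing a struct, apply to the compiler of the time. In present-day D a struct can define both `opIndex` and `opCast` to `bool`, so a sketch along these lines compiles. The `Match` type here is a hypothetical stand-in for the proxy discussed in this exchange, not the Phobos type.

```d
import std.stdio : writeln;

/// Hypothetical stand-in for the `_Match` proxy; not the Phobos type.
struct Match
{
    string[] groups;   // groups[0] is the whole match; empty means no match

    /// Lets `if (m)` and `!m` compile for a struct value.
    bool opCast(T : bool)()
    {
        return groups.length != 0;
    }

    /// Gives `m[n]` the meaning of `match(n)`.
    string opIndex(size_t n)
    {
        return groups[n];
    }
}

void main()
{
    auto hit = Match(["ab12", "12"]);
    Match miss;

    if (hit)
        writeln("matched: ", hit[0], ", group 1: ", hit[1]);
    if (!miss)
        writeln("no match");
}
```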

bobef <bobef lessequal.com> writes:

Walter Bright wrote:
 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable? 
 
 

It is a nice feature, but I don't think such a thing should be part of the 
language. I don't think it is so common. Maybe I am wrong... The other 
thing I don't like is too many reserved words... I personally 
wouldn't try to catch up with Ruby or Perl. I believe a comparison between 
D/C/C++ and a virtual machine or scripting language is foolish. But it 
depends on what the goals of D are: a larger audience or higher quality. 
Because, in my opinion, trying to catch up with a scripting language is a 
regression. But as I said, it is a very nice feature. I will use it myself, 
but I wouldn't judge a language by this...

Feb 16 2006

"Charles" <noone nowhere.com> writes:

Sweet jesus ... the horror.




"Walter Bright" <newshound digitalmars.com> wrote in message
news:dt088e$1svm$2 digitaldaemon.com...
 D dramatically improves the convenience of string handling over C++. But
 while I think using the library std.regexp is straightforward, obviously it
 just isn't gaining traction. People like the shortcut approaches Ruby and
 Perl use for regular expressions, hence the new D match-expression support.

 So, now we have:

     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }

 Should we do some aliases:

     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)

 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

Feb 16 2006

David Medlock <noone nowhere.com> writes:

Walter Bright wrote:
 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable? 
 
 

I haven't read this whole thread, so pardon me if this has been suggested.
Why doesn't the regular expression stuff use foreach?

struct Match {
   short start, end;
}

foreach( Match m ; "[0-9]" ~~ mystring )
{
   writefln( "Found number:%s", mystring[m.start..m.end] );
}

Basically this implements a callback methodology for regexes, similar to:

void match( char[] regex, char[] str, bool delegate( Match m, char[] s ) 
  dg );

Obviously this doesn't cover all cases, but I'm just curious why it isn't 
used.

-DavidM

Feb 16 2006
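
The foreach idea above is close to how present-day Phobos works: `std.regex.matchAll` returns a range of matches that `foreach` can walk directly. The sketch below uses that later API rather than the `~~` operator from this thread, and the `findNumbers` helper is hypothetical.

```d
import std.regex : matchAll, regex;
import std.stdio : writefln;

/// Hypothetical helper: collects every run of digits found in `text`.
string[] findNumbers(string text)
{
    string[] found;
    foreach (m; matchAll(text, regex(`[0-9]+`)))
        found ~= m.hit;        // `hit` is the text of the current match
    return found;
}

void main()
{
    foreach (n; findNumbers("a1 b22 c333"))
        writefln("Found number: %s", n);
}
```

Each `m` also carries `pre` and `post`, so no implicit `_match` variable is needed inside the loop body.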

"Walter Bright" <newshound digitalmars.com> writes:

"David Medlock" <noone nowhere.com> wrote in message 
news:dt2mpk$17aj$1 digitaldaemon.com...
 I haven't read this whole thread, so pardon me if this has been suggested.
 Why doesn't the regular expression stuff use foreach?

Why, indeed. Oskar has brought it up, and he and you are right. I'm going to 
reevaluate this based on the feedback in this thread.

Feb 16 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Walter Bright wrote:
 "David Medlock" <noone nowhere.com> wrote in message 
 news:dt2mpk$17aj$1 digitaldaemon.com...
 
I haven't read this whole thread, so pardon me if this has been suggested.
Why doesn't the regular expression stuff use foreach?

 
 
 Why, indeed. Oskar has brought it up, and he and you are right. I'm going to 
 reevaluate this based on the feedback in this thread. 
 
 

I agree with the "foreach" suggestion.
IMO building regex into the language to the point where a ~~ expression 
automatically generates a "_match" variable is just going too far. A 
Match struct/class and a foreach implementation make it much more 
consistent and clean.

Feb 16 2006
