
digitalmars.D - $`, $', $&, $n - sugar or cyclamates?

"Walter Bright" <newshound digitalmars.com> writes:
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

? Syntactic sugar is often a good idea, but at what point do they become 
cyclamates and cause cancer in laboratory animals? Will these $ tokens 
render D more accessible, but perhaps too unreadable? 
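In Perl, $` holds the text before the match, $' the text after it, $& the whole match and $n the n-th parenthesized group. As a rough sketch of what each proposed alias would stand for, here is the same information pulled out with present-day Phobos std.regex. This is an assumption for illustration only: that module postdates this thread and is not the std.regexp/_match machinery being discussed, and describeMatch is a hypothetical helper.

```d
import std.regex : matchFirst, regex;

/// Hypothetical helper: returns the four values Perl calls
/// $` (pre), $' (post), $& (whole match) and $n (group n).
/// 'n' must be a valid group index for 'pattern'.
string[4] describeMatch(string input, string pattern, size_t n)
{
    auto m = matchFirst(input, regex(pattern));
    if (m.empty)                      // no match: nothing to report
        return ["", "", "", ""];
    return [m.pre, m.post, m.hit, m[n]];
}

unittest
{
    auto parts = describeMatch("key=value;", `(\w+)=(\w+)`, 2);
    assert(parts[0] == "");           // $`
    assert(parts[1] == ";");          // $'
    assert(parts[2] == "key=value");  // $&
    assert(parts[3] == "value");      // $2
}
```

The named accessors (pre, post, hit, [n]) carry the same information as the punctuation variables, which is the readability trade-off the question is about.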
Feb 15 2006
Hasan Aljudy <hasan.aljudy gmail.com> writes:
Walter Bright wrote:
 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable? 
 
 

I don't have much to do with regexes... but please, the $ sign is ugly!!
Feb 15 2006
"Ameer Armaly" <ameer_armaly hotmail.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote in message 
news:dt088e$1svm$2 digitaldaemon.com...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

 So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

 Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable?

Hmm... I don't do much with regular expressions, but the presence of too much sugar can be counterproductive; I personally think the standard library is the place for that kind of thing.

Feb 15 2006
Trevor Parscal <Trevor_member pathlink.com> writes:
In article <dt088e$1svm$2 digitaldaemon.com>, Walter Bright says...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

? Syntactic sugar is often a good idea, but at what point do they become 
cyclamates and cause cancer in laboratory animals? Will these $ tokens 
render D more accessible, but perhaps too unreadable? 

Leave the $ sign for scripting languages...

Thanks,
Trevor Parscal
Feb 15 2006
Derek Parnell <derek psych.ward> writes:
On Wed, 15 Feb 2006 13:59:33 -0800, Walter Bright wrote:

 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }

Thanks for this, Walter. Although it adds no new functionality to applications, it does say that D is a serious player in making string-handling programs easier to write and maintain. I expect that std.regexp will still stay around and that this new feature is merely a portal into that library.

 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable?

My first thought was "ouch! - not pleasant". After some consideration I'm now leaning towards the idea that we should hold off on implementing these shortcuts for now and wait to see if they are actually required or not. And then, if there is a crying need for them, to come up with a set of shortcuts that will be acceptable enough.

Currently the '$' symbol is associated with arrays and lengths, and not as a general purpose lead-in character to symbol values. To mix these two disparate concepts in coders' minds might not be fruitful. However, there may be other alternatives yet to be discovered, so the concept ought not to be totally abandoned just yet.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 10:32:50 AM
Feb 15 2006
Sean Kelly <sean f4.ca> writes:
Derek Parnell wrote:
 On Wed, 15 Feb 2006 13:59:33 -0800, Walter Bright wrote:
  
 Should we do some aliases:

     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)

 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable?

My first thought was "ouch! - not pleasant". After some consideration I'm now leaning towards the idea that we should hold off on implementing these shortcuts for now and wait to see if they are actually required or not. And then, if there is a crying need for them, to come up with a set of shortcuts that will be acceptable enough.

Agreed.
 Currently the '$' symbol is associated with arrays and lengths, and not as
 a general purpose lead-in character to symbol values. To mix these two
 disparate concepts in coders minds might not be fruitful. However, there
 may be other alternatives yet to be discovered, so the concept ought not to
 be totally abandoned just yet.

And this was my concern too. But perhaps this is a bridge best left ignored until there's a reason to jump.

Sean
Feb 15 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:15w2x5659i8ey$.p4zbzif24wfw$.dlg 40tude.net...
 Thanks for this Walter. Although it adds no new functionality to
 applications, it does say that D is a serious player in making string
 handling programs easier to write and maintain. I expect that std.regexp
 will still stay around and that this new feature is merely a portal into
 that library.

You're right in that all it really does is offer an easier way to get at std.regexp.
 My first thought was "ouch! - not pleasant". After some consideration I'm
 now leaning towards the idea that we should hold off on implementing these
 shortcuts for now and wait to see if they are actually required or not. 
 And
 then, if there is a crying need for them, to come up with a set of
 shortcuts that will be acceptable enough.

That's why I didn't do them yet.
Feb 15 2006
Tom <Tom_member pathlink.com> writes:
In article <dt088e$1svm$2 digitaldaemon.com>, Walter Bright says...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

? Syntactic sugar is often a good idea, but at what point do they become 
cyclamates and cause cancer in laboratory animals? Will these $ tokens 
render D more accessible, but perhaps too unreadable? 

On the contrary, I think "$" is a very valuable symbol and should be used. The "`" symbol, though, is very inconvenient (at least on a Spanish keyboard layout), ugly, and easily confused with the "'" symbol; I've seen that happen many times, and I personally don't like to see them used in such a way as "$'". Maybe "$[" and "$]"? I don't know.

Just my opinion,
Tom;
Feb 15 2006
John Demme <me teqdruid.com> writes:
Walter Bright wrote:
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

Oh Bob no... Don't turn D into Perl. I like the $ for shortcuts and such, but please no random symbols. I like $match.pre and $length, etc., but $& and $` don't mean anything to me!
Feb 15 2006
"Walter Bright" <newshound digitalmars.com> writes:
"John Demme" <me teqdruid.com> wrote in message 
news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

<g>. I considered setting this up as a vote. Vote for 1:

(1) If I wanted to write ugly programs I'd use Perl, not D.
(2) Cool! I can now dump my Perl scripts and use D!
Feb 15 2006
pragma <pragma_member pathlink.com> writes:
In article <dt0hbb$25iq$2 digitaldaemon.com>, Walter Bright says...
"John Demme" <me teqdruid.com> wrote in message 
news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

<g>. I considered setting this up as a vote. Vote for 1:

(1) If I wanted to write ugly programs I'd use Perl, not D.
(2) Cool! I can now dump my Perl scripts and use D!

Well, assuming that your mind is made up on this way or no way, I'd have to lean toward (2). It's there to be used, but if I object to it personally, I can abstain from using it.

Just some food for thought, as I think there's plenty left to be worked out in this concept. :)

IMHO, using "~~" as a token doesn't look right yet, but that's probably because this would be the first time that token has been used in a programming language (unless I'm mistaken). The only thing I could possibly suggest using differently would be the at-cost ("@") symbol:

    if("regular expression" @ "operand"){ /*...*/ }

This looks a little more arithmetic to my eye than "~~". :)

The dollar-sign operators look good, but "$n" seems limited to me. Why not open this up to array-indexing so it's more compatible with foreach, arrays and other things D? Also, what about if I want to pass the set of matches as an array?

The '$x' tokens are sure to lex great, but isn't this running the risk of overloading the '$' symbol a bit much (from a visual standpoint)?

    if("$\w*" ~~ "hello world"){
        mystring[0..$&.length] = $&; //eek!
    }

Also, am I to assume that we'll get an "opProcess" operator overload to use on our classes? As long as _match is flexible enough to accept any type, this could really work. To my eye, the compiler could accept a custom class or struct as the _match value (kind of like an internal 'auto') so long as its namespace provides the .pre, .post, .match members. All-in-all, it would be a rather nice side effect of all this, as things like Spirit have been difficult to implement as D has fewer operator overloads than C++.

- Eric Anderton at yahoo
Feb 15 2006
"Walter Bright" <newshound digitalmars.com> writes:
"pragma" <pragma_member pathlink.com> wrote in message 
news:dt0mfk$29qc$1 digitaldaemon.com...
 Also, am I to assume that we'll get an "opProcess" operator overload to 
 use on
 our classes?

Yes, opMatch. Already done!
  As long as _match is flexible enough to accept any type, this
 could really work.  To my eye, the compiler could accept a custom class or
 struct as the _match value (kind of like an internal 'auto') so long as 
 its
 namespace provides the .pre, .post, .match members.

Already done!
 All-in-all, it would be a
 rather nice side effect of all this, as things like Spirit have been 
 difficult
 to implement as D has fewer operator overloads than C++.


 - Eric Anderton at yahoo 

Feb 15 2006
Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:
Walter Bright wrote:
 "pragma" <pragma_member pathlink.com> wrote in message 
 news:dt0mfk$29qc$1 digitaldaemon.com...
 
Also, am I to assume that we'll get an "opProcess" operator overload to 
use on
our classes?

Yes, opMatch. Already done!

Walter!! You are really crazy! (In a really really good way)

I just tried this for fun and it works:

<code>
import std.stdio;

class ArrayBeginsWith0and1
{
    static bool opMatch(int[] nums)
    {
        if(nums.length < 2)return false;
        if(nums[0] == 0 && nums[1] == 1)
            return true;
        else
            return false;
    }
}

void main()
{
    static int[] somearray1 = [0,1,2];
    static int[] somearray2 = [2,1,2];

    writefln(ArrayBeginsWith0and1 ~~ somearray1); //prints true
    writefln(ArrayBeginsWith0and1 ~~ somearray2); //prints false
}
</code>

I hope this isn't a bug that this works?
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Ivan Senji" <ivan.senji_REMOVE_ _THIS__gmail.com> wrote in message 
news:dt1u3t$d97$1 digitaldaemon.com...
 I hope this isn't a bug that this works?

It's supposed to work <g>.
Feb 16 2006
Dave <Dave_member pathlink.com> writes:
In article <dt0hbb$25iq$2 digitaldaemon.com>, Walter Bright says...
"John Demme" <me teqdruid.com> wrote in message 
news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

<g>. I considered setting this up as a vote. Vote for 1:

(1) If I wanted to write ugly programs I'd use Perl, not D.
(2) Cool! I can now dump my Perl scripts and use D!

I think both apply and are not mutually exclusive <g>

For me, the big part of supporting the most common regex operation in the language itself is that quick scripts using it can be kicked out without having to import something or remember the details of the RegExp class. Crazy (or lazy?), but I find that appealing when comparing it to a scripting language. So that's a vote for (2).

I've never been a big fan of most of Perl's syntactical sugar; it's just too easy to miss something when you're reading it, so that's a vote for (1).

And besides, one will never be able to copy and paste much of anything from Perl into D, so there isn't any 'sweet' benefit there either <g>

- Dave
Feb 15 2006
"Unknown W. Brackets" <unknown simplemachines.org> writes:
I personally don't see why it has to be 1 or 2.  I think compromise is a 
great thing.

I should note first that I actually like $ in scripting languages, 
because it tends to make variables stand out (not hide them.)

You seem to be suggesting either using _match.match(0) (ick!) or $&.... 
why?  Why can't it be:

    $pre => _match.pre
    $post => _match.post
    $match => _match.match(0)
    $5 => _match.match(5)

Yes, yes, I realize this looks more like those scripting-language 
variables, but it's also clearer than Perl's syntax, and almost as easy 
to type.  I would spend more time making sure I'm pressing the right 
symbol than typing "pre" or some such.

Just my opinion.

-[Unknown]


 "John Demme" <me teqdruid.com> wrote in message 
 news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

<g>. I considered setting this up as a vote. Vote for 1:

(1) If I wanted to write ugly programs I'd use Perl, not D.
(2) Cool! I can now dump my Perl scripts and use D!

Feb 15 2006
jicman <jicman_member pathlink.com> writes:
1

Walter Bright says...
"John Demme" <me teqdruid.com> wrote in message 
news:dt0fvp$23bj$1 digitaldaemon.com...
 Oh Bob no... Don't turn D into Perl.  I like the $ for short cuts and 
 such,
 but please no random symbols.  I like $match.pre and $length, ect... but 
 $&
 and $` don't mean anything to me!

<g>. I considered setting this up as a vote. Vote for 1:

(1) If I wanted to write ugly programs I'd use Perl, not D.
(2) Cool! I can now dump my Perl scripts and use D!

Feb 16 2006
jicman <jicman_member pathlink.com> writes:
John Demme says...
Walter Bright wrote:
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

Oh Bob no... Don't turn D into Perl. I like the $ for short cuts and such, but please no random symbols. I like $match.pre and $length, ect... but $& and $` don't mean anything to me!

I agree. Perl is Perl, D is D.
Feb 16 2006
S. Chancellor <dnewsgr mephit.kicks-ass.org> writes:
On 2006-02-15 13:59:33 -0800, "Walter Bright" <newshound digitalmars.com> said:

 D dramatically improves the convenience of string handling over C++. 
 But while I think using the library std.regexp is straightforward, 
 obviously it just isn't gaining traction. People like the shortcut 
 approaches Ruby and Perl use for regular expressions, hence the new D 
 match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they 
 become cyclamates and cause cancer in laboratory animals? Will these $ 
 tokens render D more accessible, but perhaps too unreadable?

With this you've essentially bound syntax to the RegExp class, or are you not using that for this? I do believe I recall some statements by you in the past against standard libraries being an integral part of the computer language, though I'm too lazy to dig them up right now.

My preference is that this match syntax be removed, and the aliases never see the light of day. I use Perl for this sort of stuff.

-S.
Feb 15 2006
Derek Parnell <derek psych.ward> writes:
On Wed, 15 Feb 2006 18:06:45 -0800, S. Chancellor wrote:
 
 My preference is that this match syntax be removed, and the aliases 
 never see the light of day.  I use perl for this sort of stuff.

I use regular expression matching a lot in the type of programming I do, e.g. Build, and I suspect I'd find Perl far too slow for the purpose. I haven't used the std.regexp library because it doesn't really support Unicode correctly, so I've written simple functions to do some pattern matching for my needs. And as I've just found out, the new pattern matching just uses the standard library and Unicode support is not there, so I still can't use it.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 1:38:45 PM
Feb 15 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:sedgdqrvihce.1s7xzb5qubodc$.dlg 40tude.net...
 I haven't used the std.regexp library because it doesn't really support
 Unicode correctly so I've written simple functions to some pattern 
 matching
 for my needs. And as I've just found out, the new pattern matching just
 uses the standard library and Unicode support is not there, so I still
 can't use it.

All you need to use it with your own custom type is provide an opMatch() overload.
Feb 15 2006
"Kris" <fu bar.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

 So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

 Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable?

There seem to be multiple issues here. The first one, which you ask about, is related to the syntax. At first blush, the ~~ looks like an approximate approximation, and then making D look like a malformed Perl is surely a mistake.

What the heck is wrong with $match.pre, $match.post, $match.index(n) instead? At least they're readable :-)

Additionally, I thought '~' was used for concatenation? Because '+' is overloaded in other languages? Isn't that just exactly what you're now doing with '~'? I mean, what does a "pattern within" operation have to do with concatenation?

Then, you say this is applicable only to char[]. What about wchar[] and dchar[]? Are they now relegated to second-class citizens? It's no use converting those arrays into char[] on the fly ~ apart from the heap activity and conversion that would ensue (for both operands; one of which could be rather substantial), $match.pre and friends would also have to do conversions back into the original format. Ugghh.

Yet another issue is with respect to case-folding (which is often used with regex expressions). You see, unicode case-folding does not follow the trivial rules of ASCII ~ you can't just call tolower() and hope for the best. Thus, there needs to be some mechanism to support alternate, more appropriate, converters.

In retrospect, much of this should probably be handled via template usage (for the different UTF types). And the converter issue can be resolved by supporting some kind of assignable or plug-in module. All of this can be handled by a templated class. I attempted to do just this with your RegExp class, but ran into problems related to how patterns are stored in the "instruction" stream (size differences between char and dchar, for example).

I'm an advocate for potentially getting regex support into the grammar but, on the face of it, your approach just doesn't appear to be considered in a particularly thorough manner. There again, perhaps you've already addressed the above issues, and the resolution is just not currently visible?

Perhaps this whole thing should wait until after we see what can be done with the regex templates, so that there's some experience behind the grammar? I mean, that would surely be better than having to remove the above at some point in the future. What's the big rush with built-in regex anyway? I really do think it should wait until we have some solid experience with regex templates ~ don't you think it's rather likely we'll learn something really useful that applies directly to a built-in grammar?

- Kris
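The UTF-width and case-folding concerns above can be made concrete. As a sketch only, assuming present-day Phobos std.regex (which did not exist in 2006), the engine is templated on the character type, so UTF-16 and UTF-32 text can be searched without first converting to char[], and the "i" flag of regex() requests case-insensitive matching. The helper names are hypothetical, and nothing here claims that the library's folding rules satisfy every Unicode case-folding requirement raised in the post.

```d
import std.regex : matchFirst, regex;

/// Hypothetical helper: first run of digits in UTF-32 text.
/// Pattern, input and result all stay dstring; no char[] round trip.
dstring firstNumber(dstring text)
{
    auto m = matchFirst(text, regex(`\d+`d));
    return m.empty ? null : m.hit;
}

/// Hypothetical helper: case-insensitive search in UTF-16 text,
/// using the "i" flag of regex().
bool containsIgnoringCase(wstring text, wstring pattern)
{
    return !matchFirst(text, regex(pattern, "i")).empty;
}
```

Because the result of matchFirst is a slice of the original wstring or dstring, pre/post/hit-style accessors come back in the caller's own encoding rather than being converted.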
Feb 15 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Kris" <fu bar.com> wrote in message news:dt0q7n$2cuo$1 digitaldaemon.com...
 There seem to be multiple issues here. The first one, which you ask about, 
 is related to the syntax. At first blush, the ~~ looks like an approximate 
 approximation, and then making D look like a malformed Perl is surely a 
 mistake.

If you've got a better idea for tokens ~~ and !~ ?
 What the heck is wrong with $match.pre, $match.post, $match.index(n) 
 instead? At least they're readable :-)

Nothing, really. But are they more readable than _match.pre, etc.?
 Additionally, I thought '~' was used for concatenation?

It is.
 Because '+' is overloaded in other languages? Isn't that just exactly what 
 you're now doing with '~' ?

'=' and '==' mean entirely different things. So does / and /*. I don't think ~~ need have anything to do with complement or concatenation.
 I mean, what does a "pattern within" operation have to do with 
 concatenation?

Nothing at all.
 Then, you say this is applicable only to char[]. What about wchar[] and 
 dchar[]? Are they now relegated to second-class citizens? It's no use 
 converting those arrays into char[] on the fly ~ apart from the heap 
 activity and conversion that would ensue (for both operands; one of which 
 could be rather substantial), $match.pre and friends would also have to do 
 conversions back into the original format. Ugghh.

That is a problem, one that would get solved when RegExp can do wchar and dchar. That isn't a technical problem, it's more of a getting around to it problem.
 Yet another issue is with respect to case-folding (which is often used 
 with regex expressions). You see, unicode case-folding does not follow the 
 trivial rules of ASCII ~ you can't just call tolower() and hope for the 
 best. Thus, there needs to be some mechanism to support alternate, more 
 appropriate, converters.

I agree that case is an issue. That's why this also works:

    if (RegExp("string", "i") ~~ "string") ...

and it can work with any class type as the left operand, as long as that type overloads opMatch.
 In retrospect, much of this should probably be handled via template usage 
 (for the different UTF types). And the converter issue can be resolved by 
 supporting some kind of assignable or plug-in module. All of this can be 
 handled by a templated class. I attempted to do just this with your RegExp 
 class, but ran into problems related to how patterns are stored in the 
 "instruction" stream (size differences between char and dchar, for 
 example).

I don't agree. The problem I ran into with this approach is the injection of the declaration _match into the current scope.
 I'm an advocate for potentially getting regex support into the grammar 
 but, on the face of it, your approach just doesn't appear to be considered 
 in a particularly thorough manner. There again, perhaps you've already 
 addressed the above issues, and the resolution is just not currently 
 visible?

I considered many ways of doing it, and have actually been thinking about it for months. This seemed to be the most practical. I hope I answered your questions about it.
 Perhaps this whole thing should wait until after we see what can be done 
 with the regex templates, so that there's some experience behind the 
 grammar? I mean, that would surely be better than having to remove the 
 above at some point in the future. What's the big rush with built-in regex 
 anyway? I really do think it should wait until we have some solid 
 experience with regex templates ~ don't you think it's rather likely we'll 
 learn something really useful that applies directly to a built-in grammar?

I don't think this takes away from the regex templates. I hope to use the regex templates in conjunction with this syntactic sugar to create optimized regex evaluation.
Feb 15 2006
Oskar Linde <olREM OVEnada.kth.se> writes:
Walter Bright wrote:

 "Kris" <fu bar.com> wrote in message
 news:dt0q7n$2cuo$1 digitaldaemon.com...

 In retrospect, much of this should probably be handled via template usage
 (for the different UTF types). And the converter issue can be resolved by
 supporting some kind of assignable or plug-in module. All of this can be
 handled by a templated class. I attempted to do just this with your
 RegExp class, but ran into problems related to how patterns are stored in
 the "instruction" stream (size differences between char and dchar, for
 example).

I don't agree. The problem I ran into with this approach is the injection of the declaration _match into the current scope.

Have you considered making this more general? I.e. for all if statements, inject a variable that takes the value of the entire condition expression. (Using _result as a placeholder for such an identifier.)

    if ("..." ~~ "...") {
        _result.match(0);
    }

    if (myFunc()) {
        _result.whatever();
    }

Why should this behavior be reserved for opMatch() only? Isn't this a very common coding pattern that could also become less verbose by this:

    SomeType result;
    if ( (result = getSomething())) {
        doSomethingWith(result);
    }

(becoming:

    if (getSomething()) {
        doSomethingWith(_result);
    }

)

One suggestion would be to call _result $, giving $ the semantics of a "scope-injected value". This would go hand in hand with an earlier suggestion of changing the $ for index operations too:

Assume [] introduces a new scope; then a $ within [] would refer to whatever is being indexed.

    char[] cutHeadAndTail = myString[1 .. $.length-1];
    Image subImage = myImage[$.upperLeft .. $.middle];
    char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

/Oskar
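For comparison, present-day D already covers part of this proposal, though with an explicitly named variable rather than an injected _result: an if condition may declare a variable scoped to that statement, and inside [] the $ symbol already means the length of the array being indexed. A minimal sketch under those assumptions; lookupOr, cutHeadAndTail and between are hypothetical helpers, and indexOf comes from std.string.

```d
import std.string : indexOf;

/// Hypothetical helper. 'p' is declared in the condition and is only in
/// scope inside the if statement; 'key in table' yields int* or null.
int lookupOr(int[string] table, string key, int fallback)
{
    if (auto p = key in table)
        return *p;
    return fallback;
}

/// Inside [], '$' is the length of the array being sliced,
/// so s[1 .. $ - 1] drops the first and last characters.
string cutHeadAndTail(string s)
{
    return s.length >= 2 ? s[1 .. $ - 1] : s;
}

/// Text between the first 'open' and the first 'close' after it.
/// Unlike the proposed $.indexOf(...), 'text' has to be named again.
string between(string text, char open, char close)
{
    auto a = text.indexOf(open);
    if (a < 0)
        return null;
    immutable start = cast(size_t) a + 1;
    auto b = text[start .. $].indexOf(close);
    if (b < 0)
        return null;
    return text[start .. start + cast(size_t) b];
}
```

The declared-variable form keeps the name visible at the point of use, which is the main difference from an implicitly injected _result or $.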
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
news:dt1aif$2qpd$1 digitaldaemon.com...
 Have you considered making this more general? I.e. for all if statements,
 inject a variable that takes the value of the entire condition expression.
 (Using _result as a placeholder for such an identifier.)

 if ("..." ~~ "...) {
  _result.match(0);
 }

 if (myFunc()) {
  _result.whatever();
 }

 Why should this behavior be reserved for opMatch() only? Isn't this a very
 common coding pattern that could also become less verbose by this:

 SomeType result;
 if ( (result = getSomething())) {
        doSomethingWith(result);
 }

 (becoming:

 if (getSomething()) {
        doSomethingWith(_result);
 }

 )

 One suggestion would be to call _result $. Giving $ the semantics of a
 "scope injected value". This would go hand in hand with an earlier
 suggestion of changing the $ for index operations too:

 Assume [] introduces a new scope, then a $ within [] would refer to 
 whatever
 is being indexed.

 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

I never thought of that. It's an intriguing idea.
Feb 16 2006
pragma <pragma_member pathlink.com> writes:
In article <dt1eje$2uvu$2 digitaldaemon.com>, Walter Bright says...
I never thought of that. It's an intriguing idea.

Something along these lines would *most certainly* get my vote!

- Eric Anderton at yahoo
Feb 16 2006
parent reply kris <fu bar.org> writes:
pragma wrote:
 In article <dt1eje$2uvu$2 digitaldaemon.com>, Walter Bright says...

I never thought of that. It's an intriguing idea.

Something along these lines would *most certainly* get my vote!

- Eric Anderton at yahoo

Yes ~ mine too
Feb 16 2006
parent reply Sean Kelly <sean f4.ca> writes:
kris wrote:
 pragma wrote:
 In article <dt1eje$2uvu$2 digitaldaemon.com>, Walter Bright says...

I never thought of that. It's an intriguing idea.

Something along these lines would *most certainly* get my vote!

Yes ~ mine too

Mine too.

Sean
Feb 16 2006
parent reply Sean Kelly <sean f4.ca> writes:
Sean Kelly wrote:
 kris wrote:
 pragma wrote:
 In article <dt1eje$2uvu$2 digitaldaemon.com>, Walter Bright says...

I never thought of that. It's an intriguing idea.

Something along these lines would *most certainly* get my vote!

Yes ~ mine too

Mine too.

Hold on. Walter, can you explain this injection business a bit? For example, the effect here seems clear:

if( "x" ~~ "y" ) {
     _match.blah;
}

But what about this:

if( "x" ~~ "y" && "y" ~~ "z" ) {
     _match.blah;
}

And this:

if( "x" ~~ "y" || "y" ~~ "z" ) {
     _match.blah;
}

Does _match represent the result of the last match sub-expression evaluated? And is there any way to know which expression succeeded? Does the fact that the injected value is a _Match* mean that I might potentially have an array of objects I could iterate through? And finally, could you clarify the spec in this regard?

Also, with respect to the above proposal, how might this work:

int numStudents();
float avgGrade();

if( numStudents() < 10 || avgGrade() > 50.0 ) {

}

While the result of each subexpression is actually boolean (just as in the match expression above), the values we'd be interested in are the integer and float. But in the above example, the float might not be evaluated at all. I'd merely like to voice this as a qualifier to my initial support of this idea above :-)

Sean
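In present-day D, using Phobos std.regex (which postdates this 2006 thread), the "which sub-match succeeded?" question for the || case disappears if each attempt binds its own named result. A minimal sketch, with hypothetical patterns and function name:

```d
import std.regex;

// Each alternative binds its own name, so the body always knows which one matched.
string whichMatched(string s)
{
    if (auto m = matchFirst(s, `x(\d+)`))
        return "x:" ~ m[1];
    else if (auto n = matchFirst(s, `y(\d+)`))
        return "y:" ~ n[1];
    return "none";
}
```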
Feb 16 2006
parent reply Oskar Linde <olREM OVEnada.kth.se> writes:
Sean Kelly wrote:

 Hold on.  Walter, can you explain this injection business a bit?  For
 example, the effect here seems clear:
 
 if( "x" ~~ "y" ) {
      _match.blah;
 }
 
 But what about this:
 
 if( "x" ~~ "y" && "y" ~~ "z" ) {
      _match.blah;
 }
 
 And this:
 
 if( "x" ~~ "y" || "y" ~~ "z" ) {
      _match.blah;
 }

Those are AndAndExpression and OrOrExpression and will not inject anything. Only a pure if(MatchExpression) injects anything.
 Also, with respect to the above proposal, how might this work:
 
 int numStudents();
 float avgGrade();
 
 if( numStudents() < 10 || avgGrade() > 50.0 ) {
 
 }

In this case, $ would always refer to the value of (numStudents() < 10 || avgGrade() > 50.0), which is bool and must always be true. (It would be interesting to change the || expression into returning the left value if it is nonzero and the right value otherwise, without converting anything to bool, but I'm not fully sure what implications that would have...)
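The Perl-style || described in the aside (yield the first non-zero operand rather than a bool) is not what D's || does; D's || always produces bool. It can be approximated with a small library template. The name orElse is hypothetical, and the lazy parameter keeps the right operand unevaluated when the left one is non-zero, mirroring the short-circuit behavior of ||.

```d
// Returns `a` if it is non-zero (or non-null), otherwise evaluates and returns `b`.
// `lazy` means `b` is only evaluated when it is actually needed.
T orElse(T)(T a, lazy T b)
{
    return a ? a : b;
}
```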
 While the result of each subexpression is actually boolean (just as in
 the match expression above), the values we'd be interested in are the
 integer and float.  But in the above example, the float might not be
 evaluated at all.  I'd merely like to voice this as a qualifier to my
 initial support of this idea above :-)

This is probably impossible. How would the compiler know what subexpressions are interesting and how would those be referred to?

/Oskar
Feb 16 2006
parent reply Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 Sean Kelly wrote:
 
 Hold on.  Walter, can you explain this injection business a bit?  For
 example, the effect here seems clear:

 if( "x" ~~ "y" ) {
      _match.blah;
 }

 But what about this:

 if( "x" ~~ "y" && "y" ~~ "z" ) {
      _match.blah;
 }

 And this:

 if( "x" ~~ "y" || "y" ~~ "z" ) {
      _match.blah;
 }

Those are AndAndExpression and OrOrExpression and will not inject anything. Only a pure if(MatchExpression) injects anything.

Very weird. So a MatchExpression by itself has a boolean result but injects a value into the following scope?
 Also, with respect to the above proposal, how might this work:

 int numStudents();
 float avgGrade();

 if( numStudents() < 10 || avgGrade() > 50.0 ) {

 }

In this case, $ would always refer to the value of (numStudents() < 10 || avgGrade() > 50.0), which is bool and must always be true. (It would be interesting to change the || expression into returning the left value if it is nonzero and the right value otherwise, without converting anything to bool, but I'm not fully sure what implications that would have...)

So based on the above, your suggestion would only be useful for single call expressions:

if( numStudents() )
     printf( "%i students\n", $.whatever );

Seems reasonable I suppose.
 While the result of each subexpression is actually boolean (just as in
 the match expression above), the values we'd be interested in are the
 integer and float.  But in the above example, the float might not be
 evaluated at all.  I'd merely like to voice this as a qualifier to my
 initial support of this idea above :-)

This is probably impossible. How would the compiler know what subexpressions are interesting and how would those be referred to?

That's fine. I was merely trying to sort out the implications of this new feature.

Sean
Feb 16 2006
parent reply Oskar Linde <olREM OVEnada.kth.se> writes:
Sean Kelly wrote:

 Oskar Linde wrote:
 
 Those are AndAndExpression and OrOrExpression and will not inject
 anything. Only a pure if(MatchExpression) injects anything.

Very weird. So a MatchExpression by itself has a boolean result but injects a value into the following scope?

No, not boolean. A MatchExpression has a _Match* result. This result is what gets injected into the following scope. My suggestion is just a generalization of this.
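D has a built-in case of the same shape: key in aa on an associative array yields a pointer to the value, or null, and that pointer can be declared and tested in a single if condition, much as the _Match* result is here. A minimal sketch with a hypothetical roster table:

```d
// `course in roster` is an int* (null when the key is absent), so it can be
// bound and tested in the same if condition; p is scoped to the then-branch.
int enrolled(int[string] roster, string course)
{
    if (auto p = course in roster)
        return *p;
    return 0;
}
```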
 So based on the above, your suggestion would only be useful for single
 call expressions:
 
 if( numStudents() )
      printf( "%i students\n", $.whatever );
 

Yes.

/Oskar
Feb 16 2006
parent Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 Sean Kelly wrote:
 
 Oskar Linde wrote:
 Those are AndAndExpression and OrOrExpression and will not inject
 anything. Only a pure if(MatchExpression) injects anything.

Very weird. So a MatchExpression by itself has a boolean result but injects a value into the following scope?

No, not boolean. A MatchExpression has a _Match* result. This result is what gets injected into the following scope. My suggestion is just a generalization of this.

Oh right. And pointers can be implicitly evaluated as logical expressions. Makes sense now.

Sean
Feb 16 2006
prev sibling parent reply Julio César Carrascal Urquijo writes:
Oskar Linde wrote:
 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

This is a great idea. I like it.
Feb 16 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Julio César Carrascal Urquijo" <jcesar phreaker.net> wrote in message 
news:dt28a3$o2q$1 digitaldaemon.com...
 Oskar Linde wrote:
 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];

This is a great idea. I like it.

There is one problem with it: every time an IfStatement is added to existing code, it will break all uses of $ in the ThenStatement:

----- before --------
if (foo())
    $.bar = 3;
------ after ---------
if (foo())
{
    if (abc())
        $.bar = 3;    // uh-oh!
}
----------------------

This is of course a trivial example, but consider if the $ appeared in a large block of code.
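The objection applies to an implicit, unnamed value: with explicit names, wrapping the body in a new if cannot change what an existing name refers to. A present-day D sketch, using associative-array lookups as stand-ins for foo() and abc() (the endpoint function and the configuration keys are hypothetical):

```d
// Outer and inner results each get a name, so adding the inner if later
// leaves `host` meaning exactly what it meant before.
string endpoint(string[string] cfg)
{
    if (auto host = "host" in cfg)
    {
        if (auto port = "port" in cfg)
            return *host ~ ":" ~ *port;
        return *host;
    }
    return "unset";
}
```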
Feb 16 2006
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Walter Bright wrote:
 "Julio César Carrascal Urquijo" <jcesar phreaker.net> wrote in message 
 news:dt28a3$o2q$1 digitaldaemon.com...
 Oskar Linde wrote:
 char[] cutHeadAndTail = myString[1 .. $.length-1];
 Image subImage = myImage[$.upperLeft .. $.middle];
 char[] contents = text[$.indexOf('{')+1 .. $.indexOf('}')];


There is one problem with it: every time an IfStatement is added to existing code, it will break all uses of $ in the ThenStatement:

----- before --------
if (foo())
    $.bar = 3;
------ after ---------
if (foo())
{
    if (abc())
        $.bar = 3;    // uh-oh!
}
----------------------

This is of course a trivial example, but consider if the $ appeared in a large block of code.

Coding guidelines would probably say that $ should be assigned to a named variable for all but the simplest blocks:

if (foo()) {
    auto myvar = $;
    ...
}

The $ would be kind of elusive and only usable in its outermost scope.

But the MatchExpression-injected _match has the same problem. Consider the following hypothetical refactoring example:

const char[] two_argument_function_call =
    r"([_a-zA-Z][_0-9a-zA-Z]*)\(([^,\(\)]+),([^,\(\)]+)\)";

// Find function calls
if (two_argument_function_call ~~ str) {
    // Swap the order of arguments for functions named array_*
    if ("array_(.+)" ~~ _match.match(1)) {
        // Need access to results from the outer _match.
    }
    ...
}

And here is something the current MatchExpression behavior suffers from that a general scope variable would not:

if (a ~~ b) {
    if (c == d && e ~~ f) {
        do_something(_match.match(0)); // (*)
    }
}

*) Here e ~~ f does not inject its result, so _match refers to the result of a ~~ b.

The apparently innocent change of removing the condition c == d from the if-statement will suddenly and silently have the side effect of injecting a shadowing _match variable, and thus alter the argument to do_something().

Maybe this is a good time to consider Ben Hinkle's suggested declare-and-init operator := as a non-verbose way of naming sub-expressions.
http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
(Also similar to Serg Kovrov's suggestion on the Semantic Scope Operator thread)

if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
}

/Oskar
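The refactoring case can be written in present-day D with Phobos std.regex, where each match result is an ordinary named variable, so the inner match cannot hide the outer one. This is a sketch, not a refactoring tool: swapArrayArgs is a hypothetical name, only the first two-argument call in the string is examined, and the character classes are written [^,()] (parentheses are literal inside a class) instead of using the escapes in the pattern above.

```d
import std.regex;

// name(arg1,arg2), following the pattern in the post above.
enum twoArgCall = `([_a-zA-Z][_0-9a-zA-Z]*)\(([^,()]+),([^,()]+)\)`;

// Swap the arguments if the first two-argument call is to an array_* function.
string swapArrayArgs(string str)
{
    if (auto m = matchFirst(str, twoArgCall))            // outer result: m
    {
        if (auto n = matchFirst(m[1], `^array_(.+)$`))   // inner result: n
        {
            // Both results are still reachable: n for the name, m for the rest.
            return m.pre ~ "array_" ~ n[1] ~ "(" ~ m[3] ~ "," ~ m[2] ~ ")" ~ m.post;
        }
    }
    return str;
}
```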
Feb 17 2006
next sibling parent reply pragma <pragma_member pathlink.com> writes:
In article <dt4nqs$2erg$1 digitaldaemon.com>, Oskar Linde says...
Maybe this is a good time to consider Ben Hinkle's suggested 
declare-and-init operator := as a non-verbose way of naming 
sub-expressions.
http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
(Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
thread)

if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
}

Sort of like an auto 'auto' declaration? I gather that the point is that the lvalue to the := expression is transparent to the context in which it is used (kind of inlining a variable creation and assignment)?

Also, how about using $.outer instead?

Link for "SSO" thread (with syntax examples at bottom of post): digitalmars.D/33645

- Eric Anderton at yahoo
Feb 17 2006
parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
pragma wrote:
 In article <dt4nqs$2erg$1 digitaldaemon.com>, Oskar Linde says...
 Maybe this is a good time to consider Ben Hinkle's suggested 
 declare-and-init operator := as a non-verbose way of naming 
 sub-expressions.
 http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
 (Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
 thread)

 if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
 }

Sort of like an auto 'auto' declaration? I gather that the point is that the lvalue to the := expression is transparent to the context in which it is used (kind of inlining a variable creation and assignment)?

Yes, exactly so. The scope of such variables declared in the operand of for example if-statements should probably be similar to the scope of variables declared in the init-part of a for-declaration.
 Also, how about using $.outer instead?

$.outer could collide with a member identifier. Maybe using the keyword super somehow... or append another $, like $$ for outer, $$$ for outer(outer). I don't think it's very necessary when you can do

auto outer = $;

before starting the inner scope.

/Oskar
Feb 17 2006
prev sibling next sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
news:dt4nqs$2erg$1 digitaldaemon.com...
 The apparent innocent change of removing the condition c == d from the 
 if-statement will suddenly and silently have a side effect of injecting a 
 shadowing _match variable and thus alter the argument to do_something().

Yes, that's a problem.
 Maybe this is a good time to consider Ben Hinkle's suggested 
 declare-and-init operator := as a non-verbose way of naming 
 sub-expressions.
 http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
 (Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
 thread)

 if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
 }

It's a workable proposal. But it overlaps the functionality of 'auto' declarations a bit much. And:

    if (auto m = a ~~ b)

might be a little wordy? Perhaps:

    if (m; a ~~ b)

sort of along the lines of foreach?
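For what it's worth, the if (auto m = ...) form called a little wordy here is legal in present-day D, and Phobos std.regex (which postdates this thread) returns a match object exposing roughly the fields discussed at the top of the thread: pre, post, the whole match as hit, and groups by index. A minimal sketch; the pattern and the dissect function are hypothetical:

```d
import std.regex;

// m is declared, tested for a successful match, and scoped to the then-branch.
string dissect(string input)
{
    if (auto m = matchFirst(input, `(\d+)-(\d+)`))
        return m.pre ~ "|" ~ m.hit ~ "|" ~ m.post ~ "|" ~ m[1] ~ "|" ~ m[2];
    return "no match";
}
```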
Feb 17 2006
next sibling parent Fredrik Olsson <peylow gmail.com> writes:
Walter Bright skrev:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dt4nqs$2erg$1 digitaldaemon.com...
 
The apparent innocent change of removing the condition c == d from the 
if-statement will suddenly and silently have a side effect of injecting a 
shadowing _match variable and thus alter the argument to do_something().

Yes, that's a problem.
Maybe this is a good time to consider Ben Hinkle's suggested 
declare-and-init operator := as a non-verbose way of naming 
sub-expressions.
http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
(Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
thread)

if (m := a ~~ b) {
  ...
  if (n := c ~~ m.match(0)) {
  ...
  }
}

It's a workable proposal. But it overlaps the functionality of 'auto' declarations a bit much. And:

    if (auto m = a ~~ b)

might be a little wordy? Perhaps:

    if (m; a ~~ b)

sort of along the lines of foreach?

I have shuddered a lot while reading this thread; I do not like too much magic happening in my code. This one is neat and simple, and consistent with existing syntax. And most importantly, it makes it quite hard to write incorrect code.

// Fredrik Olsson
Feb 17 2006
prev sibling next sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dt4nqs$2erg$1 digitaldaemon.com...
 The apparent innocent change of removing the condition c == d from the 
 if-statement will suddenly and silently have a side effect of injecting a 
 shadowing _match variable and thus alter the argument to do_something().

Yes, that's a problem.
 Maybe this is a good time to consider Ben Hinkle's suggested 
 declare-and-init operator := as a non-verbose way of naming 
 sub-expressions.
 http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
 (Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
 thread)

 if (m := a ~~ b) {
   ...
   if (n := c ~~ m.match(0)) {
   ...
   }
 }

It's a workable proposal. But it overlaps the functionality of 'auto' declarations a bit much. And:

    if (auto m = a ~~ b)

might be a little wordy? Perhaps:

    if (m; a ~~ b)

sort of along the lines of foreach?

I like it. Assuming this were implemented, would it affect all conditional expressions except foreach?

Sean
Feb 17 2006
parent Sai <Sai_member pathlink.com> writes:
     if (auto m = a ~~ b)
 
 might be a little wordy? Perhaps:
 
     if (m; a ~~ b)
 

I personally like the former, it does not need special 'if' syntax.
Feb 17 2006
prev sibling next sibling parent Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:
Walter Bright wrote:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dt4nqs$2erg$1 digitaldaemon.com...
 
The apparent innocent change of removing the condition c == d from the 
if-statement will suddenly and silently have a side effect of injecting a 
shadowing _match variable and thus alter the argument to do_something().

Yes, that's a problem.
Maybe this is a good time to consider Ben Hinkle's suggested 
declare-and-init operator := as a non-verbose way of naming 
sub-expressions.
http://www.digitalmars.com/d/archives/digitalmars/D/28198.html
(Also similar to Serg Kovrovs suggestion on the Semantic Scope Operator 
thread)

if (m := a ~~ b) {
  ...
  if (n := c ~~ m.match(0)) {
  ...
  }
}

It's a workable proposal. But it overlaps the functionality of 'auto' declarations a bit much. And:

    if (auto m = a ~~ b)

might be a little wordy? Perhaps:

    if (m; a ~~ b)

sort of along the lines of foreach?

How would this scale to something like

if((a ~~ b) && (c ~~ d))

Would it be:

if( m; a~~b && n; c~~d)

? This looks confusing to me. Wouldn't ':' look better here:

if( m: a~~b && n: c~~d)

? But I think I like Ben's declare-and-init := operator best in this case.
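Under the auto form, an && of two matches does not have to squeeze two declarations into one condition; nesting gives each result its own name. A present-day D sketch with Phobos std.regex, using hypothetical patterns and a hypothetical pairUp function:

```d
import std.regex;

// Equivalent of if ((a ~~ b) && (c ~~ d)) with both results named.
string pairUp(string first, string second)
{
    if (auto m = matchFirst(first, `x(\d+)`))
    {
        if (auto n = matchFirst(second, `y(\d+)`))
            return m[1] ~ "," ~ n[1];   // both m and n are in scope here
    }
    return "";
}
```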
Feb 17 2006
prev sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Walter Bright wrote:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dt4nqs$2erg$1 digitaldaemon.com...
 
 The apparent innocent change of removing the condition c == d from
 the if-statement will suddenly and silently have a side effect of
 injecting a shadowing _match variable and thus alter the argument
 to do_something().

Yes, that's a problem.
 Maybe this is a good time to consider Ben Hinkle's suggested 
 declare-and-init operator := as a non-verbose way of naming 
 sub-expressions. 
 http://www.digitalmars.com/d/archives/digitalmars/D/28198.html 
 (Also similar to Serg Kovrovs suggestion on the Semantic Scope
 Operator thread)
 
 if (m := a ~~ b) { ... if (n := c ~~ m.match(0)) { ... } }

It's a workable proposal. But it overlaps the functionality of 'auto' declarations a bit much. And:

    if (auto m = a ~~ b)

might be a little wordy? Perhaps:

    if (m; a ~~ b)

sort of along the lines of foreach?

I'm uneasy with this. We're playing with fundamental constructs here.

if( ; )

is something so pivotal that we should give this careful thought.

If it took us 4 years of hard work to get rid of bit, what will happen when this gets rushed into the language without due diligence?
Feb 17 2006
next sibling parent reply "Kris" <fu bar.com> writes:
"Georg Wrede" <georg.wrede nospam.org> wrote
[snip]
 if (auto m = a ~~ b)

 might be a little wordy? Perhaps:

 if (m; a ~~ b)

 sort of along the lines of foreach?

I'm uneasy with this. We're playing with fundamental constructs here.

if( ; )

is something so pivotal that we should give this careful thought.

If it took us 4 years of hard work to get rid of bit, what will happen when this gets rushed into the language without due diligence?

I'm all for getting some kind of regex sugar in the grammar, but also feel a bit alarmed about the sudden rush to 'slam' all this into the language. Seems like it would be wiser to approach this whole thing in smaller steps: let's see how foreach() goes first?
Feb 17 2006
parent reply Sean Kelly <sean f4.ca> writes:
Kris wrote:
 
 I'm all for getting some kind of regex sugar in the grammar, but also feel a 
 bit alarmed about the sudden rush to 'slam' all this into the language. 
 Seems like it would be wiser to approach this whole thing in smaller steps: 
 let's see how foreach() goes first? 

As long as these new features don't break old code, I'm fine with Walter trying things out. After all, the best way to solicit input is often to give people something to play with. But it would be nice if there were a way to have these features flagged as "experimental."

Sean
Feb 17 2006
parent "Kris" <fu bar.com> writes:
"Sean Kelly" <sean f4.ca> wrote...
 Kris wrote:
 I'm all for getting some kind of regex sugar in the grammar, but also 
 feel a bit alarmed about the sudden rush to 'slam' all this into the 
 language. Seems like it would be wiser to approach this whole thing in 
 smaller steps: let's see how foreach() goes first?

As long as these new features don't break old code, I'm fine with Walter trying things out. After all, the best way to solicit input is often to give people something to play with. But it would be nice if there were a way to have these features flagged as "experimental."

That would be cool.
Feb 17 2006
prev sibling parent Sean Kelly <sean f4.ca> writes:
Georg Wrede wrote:
 
 I'm uneasy with this. We're playing with fundamental constructs here.
 
 if( ; )
 
 is something so pivotal, that we should give this careful thought.
 
 If it took us 4 years of hard work to get rid of bit, what will happen 
 when this gets rushed into the language without due diligence?

True enough. However, the above syntax is currently illegal, so there's no chance of something breaking, and C/C++ already allow declarations in if blocks via the traditional method:

if( int x = foo() ) {}

One of Walter's other suggestions was to use this syntax, with the qualification that it was a bit verbose. One thing I like about the proposed syntax is that it's already how foreach works, so the semantic meaning is mostly just being extended to if and while blocks. The 'for' syntax doesn't match this, however, which may be one argument in favor of the more traditional 'auto' method.

Personally, my primary interest is that the syntax be both consistent and obvious. Both of the above work for me, but I favor "if( x; foo() )" if implicit type determination is mandatory. If it's not, I'm ambivalent.

Sean
Feb 17 2006
prev sibling parent Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Oskar Linde wrote:
 Coding guidelines would probably say that $ should be assigned to a
 named variable for all but the simplest blocks:
 if (foo()) {
     auto myvar = $;
     ...
 }
 The $ would be kind of elusive and only usable in its outermost scope.

That would sort of make the whole token pointless IMO - easier just to do something like:

if ((myvar = foo()) != 0)

or whatever; I'm not sure exactly how the syntax currently works for this.
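The assign-then-compare pattern works as written in D. A sketch using std.string.indexOf as a stand-in for foo() (keyOf is a hypothetical name); note that present-day D compilers reject a bare if (pos = expr) as a probable typo for ==, which is why the explicit comparison is there.

```d
import std.string : indexOf;

// Assign inside the condition and compare explicitly.
// A bare `if (pos = expr)` is rejected by present-day D compilers.
string keyOf(string line)
{
    ptrdiff_t pos;
    if ((pos = line.indexOf('=')) >= 0)
        return line[0 .. pos];
    return line;
}
```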
Feb 17 2006
prev sibling next sibling parent reply kris <fu bar.org> writes:
Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message news:dt0q7n$2cuo$1 digitaldaemon.com...
 
There seem to be multiple issues here. The first one, which you ask about, 
is related to the syntax. At first blush, the ~~ looks like an approximate 
approximation, and then making D look like a malformed Perl is surely a 
mistake.

If you've got a better idea for tokens ~~ and !~ ?

Well, there's always "in" ...

    if (".wav$" in filename)
        ...

plus the !in variation. Don't you find that somewhat more appealing?
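One detail in the example above: the pattern's "." is unescaped, so ".wav$" also matches names such as songxwav. A present-day D sketch with Phobos std.regex (isWav is a hypothetical name) that escapes the dot:

```d
import std.regex;

// `\.` matches a literal dot; `$` anchors at the end of the string.
bool isWav(string filename)
{
    return !matchFirst(filename, `\.wav$`).empty;
}
```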
What the heck is wrong with $match.pre, $match.post, $match.index(n) 
instead? At least they're readable :-)

Nothing, really. But are they more readable than _match.pre, etc.?

I believe the shortened versions ($pre, $post, $group[n] etc) are much more readable. This type of thing is why some of us were so adamant about saving the $ sign as a prefix for meta-tags, vis-a-vis $time, $file, $line and, of course, $length
Additionally, I thought '~' was used for concatenation?

It is.
Because '+' is overloaded in other languages? Isn't that just exactly what 
you're now doing with '~' ?

'=' and '==' mean entirely different things. So does / and /*. I don't think ~~ need have anything to do with complement or concatenation.

The first two are at least related. But the argument is flawed: choosing arbitrary symbols for operators does not make the language easier to grasp. At least "in" has some relevant meaning to it.
Then, you say this is applicable only to char[]. What about wchar[] and 
dchar[]? Are they now relegated to second-class citizens? It's no use 
converting those arrays into char[] on the fly ~ apart from the heap 
activity and conversion that would ensue (for both operands; one of which 
could be rather substantial), $match.pre and friends would also have to do 
conversions back into the original format. Ugghh.

That is a problem, one that would get solved when RegExp can do wchar and dchar. That isn't a technical problem, it's more of a getting around to it problem.

Well, since grammar supported regex has elevated itself to the top of the priority list, perhaps wchar/dchar support might tag along with it?
Yet another issue is with respect to case-folding (which is often used 
with regex expressions). You see, unicode case-folding does not follow the 
trivial rules of ASCII ~ you can't just call tolower() and hope for the 
best. Thus, there needs to be some mechanism to support alternate, more 
appropriate, converters.

I agree that case is an issue. That's why this also works:

    if (RegExp("string", "i") ~~ "string") ...

and can work with any class type as the left operand, as long as it overloads opMatch.
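Present-day Phobos std.regex (which postdates this thread) addresses both points raised here: it is templated on the character type, so char, wchar and dchar strings work without conversion, and the "i" flag requests case-insensitive matching. A sketch; matchesIgnoringCase is a hypothetical name, and the pattern must share the input's character type.

```d
import std.regex;

// Works for string, wstring and dstring: the regex is built from a pattern
// of the same character type as the input. "i" requests case-insensitive matching.
bool matchesIgnoringCase(S)(S input, S pattern)
{
    return !matchFirst(input, regex(pattern, "i")).empty;
}
```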

That's a good solution. Do you have a Unicode 'folder'?
In retrospect, much of this should probably be handled via template usage 
(for the different UTF types). And the converter issue can be resolved by 
supporting some kind of assignable or plug-in module. All of this can be 
handled by a templated class. I attempted to do just this with your RegExp 
class, but ran into problems related to how patterns are stored in the 
"instruction" stream (size differences between char and dchar, for 
example).

I don't agree. The problem I ran into with this approach is the injection of the declaration _match into the current scope.

I don't understand the relevance of that, Walter. What does _match have to do with the need to support utf8,utf16 and utf32?
I'm an advocate for potentially getting regex support into the grammar 
but, on the face of it, your approach just doesn't appear to be considered 
in a particularly thorough manner. There again, perhaps you've already 
addressed the above issues, and the resolution is just not currently 
visible?

I considered many ways of doing it, and have actually been thinking about it for months. This seemed to be the most practical. I hope I answered your questions about it.

No, but the opMatch() is a good solution for that aspect.
Perhaps this whole thing should wait until after we see what can be done 
with the regex templates, so that there's some experience behind the 
grammar? I mean, that would surely be better than having to remove the 
above at some point in the future. What's the big rush with built-in regex 
anyway? I really do think it should wait until we have some solid 
experience with regex templates ~ don't you think it's rather likely we'll 
learn something really useful that applies directly to a built-in grammar?

I don't think this takes away from the regex templates. I hope to use the regex templates in conjunction with this syntactic sugar to create optimized regex evaluation.

Perhaps, but I really don't see the need for this sudden rush to get regex support into the grammar. Experience with regex templates is almost certain to uncover some conflict in this regard ~ one that will likely have to be compromised to fit in with the current syntax. That's just Murphy's law. What's the big hurry?
Feb 16 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"kris" <fu bar.org> wrote in message news:dt1cm1$2t76$1 digitaldaemon.com...
 Walter Bright wrote:
 Well, there's always "in" ...

 if (".wav$" in filename)
     ...
 plus the !in variation. Don't you find that somewhat more appealing?

Not really. I think it also conflicts with 'in' already.
What the heck is wrong with $match.pre, $match.post, $match.index(n) 
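 instead? At least they're readable :-)

Nothing, really. But are they more readable than _match.pre, etc.?

I believe the shortened versions ($pre, $post, $group[n] etc) are much more readable. This type of thing is why some of us were so adamant about saving the $ sign as a prefix for meta-tags, vis-a-vis $time, $file, $line and, of course, $length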

Fair enough. Let's see what others think.
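Additionally, I thought '~' was used for concatenation?

It is.

Because '+' is overloaded in other languages? Isn't that just exactly 
what you're now doing with '~' ?

'=' and '==' mean entirely different things. So does / and /*. I don't think ~~ need have anything to do with complement or concatenation.

The first two are at least related. But the argument is flawed: choosing arbitrary symbols for operators does not make the language easier to grasp.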

It's all a matter of what you're used to. Who'd have thought that '!' for 'not' would feel natural? It was a kludge invented for C. Now it's standard.
 At least "in" has some relevant meaning to it.

It would be overloading its existing meaning, which means that it'll take semantic, rather than syntactic, analysis to disambiguate. This is potential trouble.
 That is a problem, one that would get solved when RegExp can do wchar and 
 dchar. That isn't a technical problem, it's more of a getting around to 
 it problem.

priority list, perhaps wchar/dchar support might tag along with it?

The thing is, RegExp has been in there from the beginning, but it has gone unused and even its existence is overlooked. I don't believe that's because it isn't useful - look at Ruby, Perl, JavaScript, etc. Those languages heavily use regex. Is there something inherent about *script* languages that makes them nice for regex? I don't believe there is; I think it gets heavily used in those languages because the syntactic sugar makes it easy to use. I've been blasted for putting strings in the language (instead of as a library String class), for putting complex numbers in, and for associative arrays. I think the results speak for these being a success. If regexes are heavily used, then the extra sugar for them becomes worthwhile as well. Who uses regex in C++? Hardly anyone. I'm betting it's because using them sucks in C++, not because people don't use regexes.
 I agree that case is an issue. That's why this also works:

     if (RegExp("string", "i") ~~ "string") ...

 and can work with any class type as the left operand, as long as it 
 overloads opMatch.


No. But that's a library issue, not a language issue. Match expressions are set up so that one can completely control their behavior with a custom class.
In retrospect, much of this should probably be handled via template usage 
(for the different UTF types). And the converter issue can be resolved by 
supporting some kind of assignable or plug-in module. All of this can be 
handled by a templated class. I attempted to do just this with your 
RegExp class, but ran into problems related to how patterns are stored in 
the "instruction" stream (size differences between char and dchar, for 
example).

of the declaration _match into the current scope.

do with the need to support utf8,utf16 and utf32?

Nothing. But _match *does* have a lot to do with the inadequacy of a pure template solution. Not even mixins will work in a nice way here.
 I don't think this takes away from the regex templates. I hope to use the 
 regex templates in conjunction with this syntactic sugar to create 
 optimized regex evaluation.

support into the grammar. Experience with regex templates is almost certain to uncover some conflict in this regard ~ one that will likely have to be compromised to fit in with the current syntax. That's just Murphy's law. What's the big hurry?

I thought it fit in well with D's new capability of being runnable in a script-like fashion. If this opens up a reasonably broad new range of applications that D is a good fit for, that's good. I might be wrong, of course, as I've been with the bit data type (a complete botch). Match expressions don't break anything, were not expensive to implement, and the only way to see how they'll work out is to try them.
Feb 16 2006
kris <fu bar.org> writes:
Walter Bright wrote:
 "kris" <fu bar.org> wrote in message news:dt1cm1$2t76$1 digitaldaemon.com...
 
Walter Bright wrote:
Well, there's always "in" ...

if (".wav$" in filename)
    ...
plus the !in variation. Don't you find that somewhat more appealing?

Not really. I think it also conflicts with 'in' already.

but not from the user's standpoint
 It's all a matter of what you're used to. Who'd have thought that '!' for 
 'not' would feel natural? It was a kludge invented for C. Now it's standard.

That doesn't mean D should adopt arbitrary symbols, Walter. If you want rapid adoption, then the more you can do to make the language "approachable", the more success you'll have. There was a similar issue with === and !==, and you thankfully deprecated them :-)
At least "in" has some relevant meaning to it.

It would be overloading its existing meaning, which means that it'll take semantic, rather than syntactic, analysis to disambiguate. This is potential trouble.

I can see that there "might" be trouble for the compiler and, if so, that would be an issue. However, for a developer, the meaning of "in" with respect to its use with AA and potentially regex-patterns is consistent. One is asking the question "does this thing on the left exist within the thing on the right". It even takes care of getting the operand ordering correct. Thus, I'd urge you to at least see if there's actually a notable problem for the compiler to handle this before writing the idea off.
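Here is a minimal sketch in current D that makes kris's reading of "in" concrete. It assumes a hypothetical Pattern wrapper, which is not part of Phobos, built on std.regex. D lets a type overload "in" through opBinary!"in", so the pattern can sit on the left and the string on the right, in the operand order kris describes. The "i" flag plays the same role as the second argument in RegExp("string", "i").

```d
import std.regex : Regex, matchFirst, regex;

// Hypothetical wrapper (not a Phobos type) that lets a compiled pattern sit
// on the left of `in`, in the spirit of `if (".wav$" in filename)`.
struct Pattern
{
    Regex!char re;

    // flags are std.regex flags, e.g. "i" for case-insensitive matching.
    this(string pattern, string flags = "")
    {
        re = regex(pattern, flags);
    }

    // `Pattern(...) in s` is lowered to `Pattern(...).opBinary!"in"(s)`:
    // true if the pattern matches anywhere in s.
    bool opBinary(string op : "in")(string s)
    {
        return !matchFirst(s, re).empty;
    }
}
```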
 The thing is, RegExp has been in there from the beginning, but it has gone 
 unused and even its existence is overlooked. I don't believe that's because 
 it isn't useful - look at Ruby, Perl, Javascript, etc. Those languages 
 heavily use regex. Is there something inherent about *script* languages 
 that makes them nice for regex? I don't believe there is, I think it gets 
 heavily used in those languages because the syntactic sugar makes it easy 
 to use.

Heck, I've used regex in all manner of ways. I don't think visibility is the problem; rather, I suspect there's a limited set of domains where it applies in a systems language. Some of those can be addressed in other ways, particularly where performance is a concern; hence regex may not get used as much as it might. In scripting languages there's often a need for Q & D pattern-matching, with little regard for a potentially more efficient mechanism. Horses for courses.
 I've been blasted for putting strings in the language (instead of as a 
 library String class), for putting complex numbers in, and for associative 
 arrays. I think the results speak for these being a success. If regexes are 
 heavily used, then the extra sugar for them becomes worthwhile as well.

That's getting a bit off topic, isn't it? OK, I'll go with it: I'm an advocate for getting regex support in the grammar, but I'm certainly not an advocate for tying Phobos to the compiler (RegExp has a notable resultant import set; because of this I refactored it for Ares and Mango). Without a clearly defined means to decouple Phobos from the compiler, you're effectively erecting barriers for other solutions to clamber over (as Sean vaguely intimated earlier). What's missing from all this built-in stuff is a clean and documented means to have it supported outside of Phobos. After all, the compiler is injecting explicit references for AA code, utf conversion code, regex code, and a variety of other things. What's next?

In short: you're (a) building more and more library functionality directly into the language without providing a means to cleanly support alternate implementations, extensions, or otherwise decouple the compiler. And (b) by doing so, you're (perhaps inadvertently) stifling some innovation and causing some headaches for the very people who are trying to help D along the road to acceptance. It would really help if you'd be somewhat sensitive to these aspects rather than persistently ignoring them.

For instance, how does one change .sort to use a different sorting algorithm? How does one change the hashing function for non-classes? How can one unhook RegExp+OutBuffer+String+Others, and replace it? etc. etc. If D is intended to be a closed-shop, Phobos-only environment, then some of us are presumably wasting our time supporting the language; right?

I don't suppose that was the answer you were looking for <g>
 Who uses regex in C++? Hardly anyone. I'm betting it's because using them 
 sucks in C++, not because people don't use regexes.

Again, it's horses for courses. BTW, regex does not suck in C, so why in C++?
 I thought it fit in well with D's new capability of being runnable in a 
 script-like fashion. If this opens up a reasonably broad new range of 
 applications that D is a good fit for, that's good. I might be wrong, of 
 course, as I've been with the bit data type (a complete botch). Match 
 expressions don't break anything, were not expensive to implement, and the 
 only way to see how they'll work out is to try them. 

I figured that was the motivation. The "cost" you speak of considers only how much effort it takes you to get the functionality into the compiler, test it a bit, document the usage, and respond to the flak ;-) BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and provide a nice library class/struct instead? You might even reuse the old code from Zortech/Zorland days.
Feb 16 2006
Sean Kelly <sean f4.ca> writes:
kris wrote:
 Walter Bright wrote:
 
 I've been blasted for putting strings in the language (instead of as a 
 library String class), for putting complex numbers in, and for 
 associative arrays. I think the results speak for these being a 
 success. If regexes are heavily used, then the extra sugar for them 
 becomes worthwhile as well.

That's getting a bit off topic, isn't it? OK, I'll go with it: I'm an advocate for getting regex support in the grammar, but I'm certainly not an advocate for tying Phobos to the compiler (RegExp has a notable resultant import set; because of this I refactored it for Ares and Mango). Without a clearly defined means to decouple Phobos from the compiler, you're effectively erecting barriers for other solutions to clamber over (as Sean vaguely intimated earlier). What's missing from all this built-in stuff is a clean and documented means to have it supported outside of Phobos. After all, the compiler is injecting explicit references for AA code, utf conversion code, regex code, and a variety of other things. What's next?

I'm branching Ares before I check in this last block of changes. In the new branch I'm simply going to move all necessary Phobos std code required into dmdrt/util and will plan to trim it down over time. Not ideal, I know, but better than trying to play catch-up with heavily modified code such as the version of RegExp you provided. For the rest, I agree completely, but then I've already said as much in d.D.announce :-)
 Who uses regex in C++? Hardly anyone. I'm betting it's because using 
 them sucks in C++, not because people don't use regexes.

Again, it's horses for courses. BTW, regex does not suck in C, so why in C++?

The lack of a standard library component is a significant factor IMO. As are the widely divergent syntaxes supported by third party libraries. Personally, I haven't used regular expressions in D because I haven't needed to yet, not because they weren't a language feature. But I can't help liking this being built-in from a language perspective, even if this is balanced by practical concerns.
 I thought it fit in well with D's new capability of being runnable in 
 a script-like fashion. If this opens up a reasonably broad new range 
 of applications that D is a good fit for, that's good. I might be 
 wrong, of course, as I've been with the bit data type (a complete 
 botch). Match expressions don't break anything, were not expensive to 
 implement, and the only way to see how they'll work out is to try them. 

I figured that was the motivation. The "cost" you speak of considers only how much effort it takes you to get the functionality into the compiler, test it a bit, document the usage, and respond to the flak ;-) BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and provide a nice library class/struct instead? You might even reuse the old code from Zortech/Zorland days.

If it helps, I'll send you a case of beer or something ;-) But if there's universal agreement that packed bit arrays were a mistake then they need to be out pre-1.0 and broken code be damned. I really don't want to see a 1.0 D release containing features that even the designer thinks should not exist. Sean
Feb 16 2006
Thomas Kuehne <thomas-dloop kuehne.cn> writes:
Sean Kelly wrote on 2006-02-16:
 kris wrote:

[snip]
 BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and 
 provide a nice library class/struct instead? You might even reuse the 
 old code from Zortech/Zorland days.

If it helps, I'll send you a case of beer or something ;-) But if there's universal agreement that packed bit arrays were a mistake then they need to be out pre-1.0 and broken code be damned. I really don't want to see a 1.0 D release containing features that even the designer thinks should not exist.

What is the cost of keeping bit[] in the language? Currently, every type - including void - can be used as the type on an array element. What would be the consequences for generic programming if T -> T[] isn't guaranteed to succeed? Thomas
Feb 16 2006
Sean Kelly <sean f4.ca> writes:
Thomas Kuehne wrote:
 Sean Kelly wrote on 2006-02-16:
 kris wrote:

[snip]
 BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and 
 provide a nice library class/struct instead? You might even reuse the 
 old code from Zortech/Zorland days.

there's universal agreement that packed bit arrays were a mistake then they need to be out pre-1.0 and broken code be damned. I really don't want to see a 1.0 D release containing features that even the designer thinks should not exist.

What is the cost of keeping bit[] in the language? Currently, every type - including void - can be used as the type on an array element. What would be the consequences for generic programming if T -> T[] isn't guaranteed to succeed?

The same as the problems with std::vector<bool> in C++ (though I don't have any specific references handy). I think the true ramifications of this in D won't be completely apparent until the language has been in use a bit longer however. One thought I had was to leave bit in place, perhaps deprecated, and add 'bool' as a non-packed but otherwise equivalent type. Sean
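Here is a small sketch of Thomas's generic-programming point in current D. These hypothetical helpers assume that every element of a T[] can be referenced and addressed on its own. A packed bit array cannot offer that in general, which is the std::vector<bool> problem Sean mentions. In current D, bool occupies a full byte, so bool[] goes through the same code as int[].

```d
// Hypothetical generic helper: assumes every element of a T[] has its own
// address. A packed bit array cannot honour that in general.
T* elementAddress(T)(T[] arr, size_t i)
{
    return &arr[i];
}

// Hypothetical generic helper: needs a reference to each element.
void fillAll(T)(T[] arr, T value)
{
    foreach (ref e; arr)
        e = value;
}

// In current D, bool is byte-sized, so bool[] is an ordinary unpacked array.
static assert(bool.sizeof == 1);
```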
Feb 16 2006
"Kris" <fu bar.com> writes:
"Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

Easy fix ~ change the bool alias to byte, instead of bit :-)
Feb 16 2006
Sean Kelly <sean f4.ca> writes:
Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

Easy fix ~ change the bool alias to byte, instead of bit :-)

I already use byte in some cases :-) But it lacks the boolean value safety of bit, so I tend to litter my code with asserts just to be sure something didn't get screwed up... or simply make sure I'm only comparing to zero and not-zero. Either way, it's more error prone than I'd like. Sean
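The pitfall Sean describes fits in a few lines of current D. A byte used as a flag can hold any of 256 values, so a test written as == 1 and a test written as != 0 can disagree. A real bool only ever holds false or true. The function names below are hypothetical.

```d
// A byte "flag" may hold any value from -128 to 127, not just 0 and 1.
bool setByEquality(byte flag) { return flag == 1; }  // fragile: misses 2, -1, ...
bool setByNonZero(byte flag)  { return flag != 0; }  // robust: any non-zero is "set"
```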
Feb 16 2006
"Kris" <fu bar.com> writes:
"Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

Easy fix ~ change the bool alias to byte, instead of bit :-)

I already use byte in some cases :-) But it lacks the boolean value safety of bit, so I tend to litter my code with asserts just to be sure something didn't get screwed up... or simply make sure I'm only comparing to zero and not-zero. Either way, it's more error prone than I'd like.

Yes, you're right of course. Would be just great if Walter would add a true *cough* bool *cough* type that doesn't try to pack itself when used with arrays. Packed bits are great too, but for different reasons.
Feb 16 2006
"Regan Heath" <regan netwin.co.nz> writes:
On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu bar.com> wrote:
 "Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

Easy fix ~ change the bool alias to byte, instead of bit :-)

I already use byte in some cases :-) But it lacks the boolean value safety of bit, so I tend to litter my code with asserts just to be sure something didn't get screwed up... or simply make sure I'm only comparing to zero and not-zero. Either way, it's more error prone than I'd like.

Yes, you're right of course. Would be just great if Walter would add a true *cough* bool *cough* type that doesn't try to pack itself when used with arrays.

A true bool would make several people happy, but once one existed people would then want:

    class A {}
    A a = new A();
    if (a) // error: not a boolean result

right? That would bother me.
 Packed bits are great too, but for different reasons.

Indeed, I can think of several uses for packed bits, e.g.:

- Using them as a bunch of flags, generally boolean on/off flags.
- Representing/dissecting packed data, e.g. TCP headers.
- Assembling/converting data, e.g. 8-bit to 7-bit characters for SMS messages.

All of these can be done with & | ^ etc., but it would be nice (i.e. more readable, easier to write) if we could index the data.

I've suggested this before, but is it perhaps possible to allow us to perform array operations on the basic types: byte, short, int, long? For the same reason that bit[] does not work, these could not provide a full set of array functionality, but they could provide much that would be of use, I suspect. Examples:

    int flags;
    ...
    if (flags[5])      // check for flag
        flags[5] = 1;  // set flag

    void foo(long header) {
        int length = header[0..5];  // copy bits to lvalue.
        ...

For the 3rd task, converting from 8-bit to 7-bit, some sort of stream that allowed bits to be sent to it and assembled would be the ideal way, I suspect.

In the end it's just syntactic sugar for & | and ^. The question is, does it make the code clearer, I think so. Does it make bit manipulation easier to code, I think so. Is that enough to make it a valuable feature?

Regan
Feb 16 2006
Sean Kelly <sean f4.ca> writes:
Regan Heath wrote:
 On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu bar.com> wrote:
 "Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

Easy fix ~ change the bool alias to byte, instead of bit :-)

I already use byte in some cases :-) But it lacks the boolean value safety of bit, so I tend to litter my code with asserts just to be sure something didn't get screwed up... or simply make sure I'm only comparing to zero and not-zero. Either way, it's more error prone than I'd like.

Yes, you're right of course. Would be just great if Walter would add a true *cough* bool *cough* type that doesn't try to pack itself when used with arrays.

A true bool would make several people happy, but once one existed people would then want:

    class A {}
    A a = new A();
    if (a) // error: not a boolean result

right? That would bother me.

This is only a slippery slope if we want it to be ;-) I think the intent behind adding 'bool' was twofold: first, 'bit' loses meaning if it never actually refers to a bit, and second, it allows 'bit' to be deprecated for a while so people can change their code.
 Packed bits are great too, but for different reasons.

Indeed, I can think of several uses for packed bits, e.g.:

- Using them as a bunch of flags, generally boolean on/off flags.
- Representing/dissecting packed data, e.g. TCP headers.
- Assembling/converting data, e.g. 8-bit to 7-bit characters for SMS messages.

All of these can be done with & | ^ etc., but it would be nice (i.e. more readable, easier to write) if we could index the data.

Aye. I like the idea of packed bit arrays in general. I just don't want them to be mandatory for the built-in boolean type--I run into too many situations where I want to do something that the existing syntax doesn't support and I'm stuck using an array of bytes instead. Sean
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, does 
 it make the code clearer, I think so. Does it make bit manipulation easier 
 to code, I think so. Is that enough to make it a valuable feature?

I regularly do bit masking and shifting on ints. I'm so used to it, I don't think that adding sugar for it would help any.
Feb 16 2006
Derek Parnell <derek psych.ward> writes:
On Thu, 16 Feb 2006 17:25:23 -0800, Walter Bright wrote:

 "Regan Heath" <regan netwin.co.nz> wrote in message 
 news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, does 
 it make the code clearer, I think so. Does it make bit manipulation easier 
 to code, I think so. Is that enough to make it a valuable feature?

I regularly do bit masking and shifting on ints. I'm so used to it, I don't think that adding sugar for it would help any.

YOU ARE DEAD WRONG! Sheesh!!! Not all us are blessed with your abilities. That's why I don't do Assembler anymore and that's why we use higher level languages than machine code. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 17/02/2006 12:40:36 PM
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:edpqlnztl599.19xc3uf14ntbh.dlg 40tude.net...
 On Thu, 16 Feb 2006 17:25:23 -0800, Walter Bright wrote:

 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, does 
 it make the code clearer, I think so. Does it make bit manipulation easier 
 to code, I think so. Is that enough to make it a valuable feature?

I regularly do bit masking and shifting on ints. I'm so used to it, I don't think that adding sugar for it would help any.

YOU ARE DEAD WRONG! Sheesh!!! Not all us are blessed with your abilities.

What about using some functions instead:

    int setBit(inout int v, int b) { return v |= 1 << b; }

?
 That's why I don't do Assembler anymore and that's why we use higher level
 languages than machine code.

<g>
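Here is Walter's function-based suggestion fleshed out as a minimal sketch in current D. In current D the by-reference parameter is spelled ref; D of this era wrote it as inout. clearBit and testBit are hypothetical companions added for illustration. uint is used instead of int so that bit 31 does not collide with the sign bit.

```d
// Set bit b of v in place; returns the new value.
uint setBit(ref uint v, uint b)   { return v |= 1u << b; }

// Clear bit b of v in place; returns the new value.
uint clearBit(ref uint v, uint b) { return v &= ~(1u << b); }

// Report whether bit b of v is set.
bool testBit(uint v, uint b)      { return (v & (1u << b)) != 0; }
```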
Feb 16 2006
Derek Parnell <derek psych.ward> writes:
On Thu, 16 Feb 2006 17:48:54 -0800, Walter Bright wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:edpqlnztl599.19xc3uf14ntbh.dlg 40tude.net...
 On Thu, 16 Feb 2006 17:25:23 -0800, Walter Bright wrote:

 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, does 
 it make the code clearer, I think so. Does it make bit manipulation easier 
 to code, I think so. Is that enough to make it a valuable feature?

I regularly do bit masking and shifting on ints. I'm so used to it, I don't think that adding sugar for it would help any.

YOU ARE DEAD WRONG! Sheesh!!! Not all us are blessed with your abilities.

What about using some functions instead:

    int setBit(inout int v, int b) { return v |= 1 << b; }

?

You mean like std.regexp library functions? Oh that's right ... we have ~~ now; silly me. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 17/02/2006 1:49:07 PM
Feb 16 2006
"Kris" <fu bar.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote
 "Regan Heath" <regan netwin.co.nz> wrote in message 
 news:ops43d5lmc23k2f5 nrage.netwin.co.nz...
 In the end it's just syntactic sugar for & | and ^. The question is, does 
 it make the code clearer, I think so. Does it make bit manipulation 
 easier to code, I think so. Is that enough to make it a valuable feature?

I regularly do bit masking and shifting on ints. I'm so used to it, I don't think that adding sugar for it would help any.

Besides, it's easy to use op-overloads for such things as necessary.
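The sketch below shows what Kris means. It assumes a hypothetical Bits wrapper, which is not part of Phobos. opIndex, opIndexAssign and opSlice provide Regan's flags[5] and header[0..5] style on top of ordinary & | ~ << >> arithmetic. The wrapper is restricted to unsigned types so that right shifts never sign-extend.

```d
import std.traits : isUnsigned;

// Hypothetical wrapper (not a Phobos type): index syntax over an integer's bits.
struct Bits(T) if (isUnsigned!T)
{
    T value;

    // b[i]: read bit i (bit 0 is the least significant bit).
    bool opIndex(size_t i) const
    {
        return ((value >> i) & 1) != 0;
    }

    // b[i] = x: set or clear bit i.
    void opIndexAssign(bool x, size_t i)
    {
        if (x)
            value |= cast(T)(cast(T)1 << i);
        else
            value &= cast(T)~(cast(T)1 << i);
    }

    // b[lo .. hi]: bits lo (inclusive) to hi (exclusive), shifted down to bit 0.
    T opSlice(size_t lo, size_t hi) const
    {
        immutable width = hi - lo;
        immutable T mask = width >= T.sizeof * 8
            ? T.max
            : cast(T)((cast(T)1 << width) - 1);
        return cast(T)((value >> lo) & mask);
    }
}
```

In this form, Bruno's flags.bits[5] spelling would amount to exposing such a wrapper as a property of the integer.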
Feb 16 2006
Derek Parnell <derek psych.ward> writes:
On Fri, 17 Feb 2006 13:54:47 +1300, Regan Heath wrote:

 On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu bar.com> wrote:
 "Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

Easy fix ~ change the bool alias to byte, instead of bit :-)

I already use byte in some cases :-) But it lacks the boolean value safety of bit, so I tend to litter my code with asserts just to be sure something didn't get screwed up... or simply make sure I'm only comparing to zero and not-zero. Either way, it's more error prone than I'd like.

Yes, you're right of course. Would be just great if Walter would add a true *cough* bool *cough* type that doesn't try to pack itself when used with arrays.

A true bool would make several people happy, but once one existed people would then want:

    class A {}
    A a = new A();
    if (a) // error: not a boolean result

right? That would bother me.

I regard the syntax if ( <identifier> ) as shorthand for if ( <identifier> != 0 ) or if ( <identifier> !is null) as appropriate, so this would not fall foul of a native boolean implementation. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 17/02/2006 12:37:40 PM
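Derek's reading matches how current D evaluates conditions. A class reference is tested against null, and an integer is tested against zero. Here is a minimal sketch with hypothetical function names:

```d
class A {}

// `if (a)` on a class reference performs the same test as `a !is null`.
bool shorthand(A a)
{
    if (a)
        return true;
    return false;
}

bool explicitTest(A a) { return a !is null; }

// `if (n)` on an integer performs the same test as `n != 0`.
bool shorthandInt(int n)
{
    if (n)
        return true;
    return false;
}
```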
Feb 16 2006
Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:
Regan Heath wrote:
 On Thu, 16 Feb 2006 16:36:48 -0800, Kris <fu bar.com> wrote:
 "Sean Kelly" <sean f4.ca> wrote
 Kris wrote:
 "Thomas Kuehne" <thomas-dloop kuehne.cn> wrote ...
 Currently, every type - including void - can be used as the type on an
 array element. What would be the consequences for generic programming
 if T -> T[] isn't guaranteed to succeed?

Easy fix ~ change the bool alias to byte, instead of bit :-)

I already use byte in some cases :-) But it lacks the boolean value safety of bit, so I tend to litter my code with asserts just to be sure something didn't get screwed up... or simply make sure I'm only comparing to zero and not-zero. Either way, it's more error prone than I'd like.

Yes, you're right of course. Would be just great if Walter would add a true *cough* bool *cough* type that doesn't try to pack itself when used with arrays.

A true bool would make several people happy, but once one existed people would then want:

    class A {}
    A a = new A();
    if (a) // error: not a boolean result

right? That would bother me.

 Packed bits are great too, but for different reasons.

Indeed, I can think of several uses for packed bits, e.g.:

- Using them as a bunch of flags, generally boolean on/off flags.
- Representing/dissecting packed data, e.g. TCP headers.
- Assembling/converting data, e.g. 8-bit to 7-bit characters for SMS messages.

All of these can be done with & | ^ etc., but it would be nice (i.e. more readable, easier to write) if we could index the data.

I've suggested this before, but is it perhaps possible to allow us to perform array operations on the basic types: byte, short, int, long? For the same reason that bit[] does not work, these could not provide a full set of array functionality, but they could provide much that would be of use, I suspect. Examples:

    int flags;
    ...
    if (flags[5])      // check for flag
        flags[5] = 1;  // set flag

    void foo(long header) {
        int length = header[0..5];  // copy bits to lvalue.
        ...

should have:

    if (flags.bits[5])
        flags.bits[5] = 0;

(the name "bits" could be something else)

-- Bruno Medeiros - CS/E student "Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
Feb 17 2006
"Regan Heath" <regan netwin.co.nz> writes:
On Fri, 17 Feb 2006 15:38:38 +0000, Bruno Medeiros  
<daiphoenixNO SPAMlycos.com> wrote:
 if (flags[5]) //check for flag
     flags[5] = 1; //set flag
  void foo(long header) {
   int length = header[0..5]; //copy bits to lvalue.
 ....

should have:

    if (flags.bits[5])
        flags.bits[5] = 0;

(the name "bits" could be something else)

I like it. Regan
Feb 17 2006
"Walter Bright" <newshound digitalmars.com> writes:
"kris" <fu bar.org> wrote in message news:dt1jhm$1m3$1 digitaldaemon.com...
 Walter Bright wrote:
 Not really. I think it also conflicts with 'in' already.


Can't separate the two.
 That doesn't mean D should adopt arbitrary symbols, Walter. If you want 
 rapid adoption, then the more you can do to make the language 
 "approachable", the more success you'll have. There was a similar issue 
 with === and !==, and you thankfully deprecated them :-)

Those had to go because === was indistinguishable from == in many fonts.
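For reference, identity comparison is now written with is and !is, which is what === and !== used to express. == compares values by calling opEquals. Here is a minimal sketch with a hypothetical Point class:

```d
class Point
{
    int x, y;

    this(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    // Value equality: same coordinates, regardless of which object it is.
    override bool opEquals(Object o)
    {
        auto p = cast(Point) o;
        return p !is null && p.x == x && p.y == y;
    }
}
```

Two separately allocated Point(1, 2) objects compare equal with == but are still distinct under is.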
 It would be overloading its existing meaning, which means that it'll take 
 semantic, rather than syntactic, analysis to disambiguate. This is 
 potential trouble.

would be an issue. However, for a developer, the meaning of "in" with respect to its use with AA and potentially regex-patterns is consistent.

The trouble starts happening when you overload the operators. Doing this with 'in' will result in similar problems that C++ has with '+' being sometimes plus, sometimes concatenate.
 One is asking the question "does this thing on the left exist within the 
 thing on the right". It even takes care of getting the operand ordering 
 correct. Thus, I'd urge you to at least see if there's actually a notable 
 problem for the compiler to handle this before writing the idea off.

It's not a problem with the compiler. It's a conceptual problem for the user. When I see 'in' I think of containers. That's completely different from regex.
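For comparison, here is the container meaning of "in" that Walter has in mind. With a built-in associative array, key in aa yields a pointer to the stored value, or null when the key is absent. This is a minimal sketch, and lookupOr is a hypothetical helper name.

```d
// Returns the value stored under key, or fallback if the key is absent.
int lookupOr(int[string] counts, string key, int fallback)
{
    if (auto p = key in counts)  // p is int*, or null when key is missing
        return *p;
    return fallback;
}
```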
 Heck, I've used regex in all manner of ways. I don't think visibility is 
 the problem; rather, I suspect there's a limited set of domains where it 
 applies in a systems language. Some of those can be addressed in other 
 ways, particularly where performance is a concern; hence regex may not get 
 used as much as it might. In scripting languages there's often a need for 
 Q & D pattern-matching, with little regard for a potentially more 
 efficient mechanism. Horses for courses.

Scripting languages have 3 main programming characteristics:

1) dynamic typing
2) great string handling
3) runtime script generation & execution

A lot of people turn to them because of (2). There's no reason C++ and D can't do (2) as well. C++ doesn't because the C++ community has adopted the principle of "if it can be done as a library, it must be done as a library, no matter how unbelievably wretched that might turn out." So when C++ programmers want to do strings, they switch to Perl, Ruby, Python, etc.

As to string manipulation in a systems app - is a compiler a systems app? I believe it is, and there's a bunch of tedious string manipulation in it. Everything from handling the command line arguments to manipulating file names to formatting error messages to reading config files. It's astonishing how that stuff shrinks and becomes a pleasure to code rather than tedium when the string handling sugar is applied.

I also write a number of garden variety string processing apps, such as the one that turns newsgroup postings into the "D archives". I want to do them in D. I don't want to install/learn Ruby/Python/Perl. I see no reason why D cannot dominate that problem space well.
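Here is a rough illustration of the compiler-style string chores Walter lists (file names, error messages, config files), written as a sketch in current D. It uses only std.path, std.format, std.string and std.algorithm. The helper names, the .obj extension and the "key = value" config format are assumptions made for the example.

```d
import std.algorithm.searching : findSplit;
import std.format : format;
import std.path : baseName, setExtension;
import std.string : strip;

// "src/prog.d" -> "prog.obj" (the .obj naming is only an example).
string objectFileFor(string source)
{
    return setExtension(baseName(source), "obj");
}

// A compiler-style error message such as "foo.d(12): Error: ...".
string errorMessage(string file, size_t line, string msg)
{
    return format("%s(%s): Error: %s", file, line, msg);
}

// Parses one "key = value" config line; returns false if there is no '='.
bool parseSetting(string line, out string key, out string value)
{
    auto parts = line.findSplit("=");
    if (parts[1].length == 0)
        return false;
    key = parts[0].strip;
    value = parts[2].strip;
    return true;
}
```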
 I'm an advocate for getting regex support in the grammar,

I thought you were arguing against that <g>.
 but I'm certainly not an advocate for tying Phobos to the compiler (RegExp 
 has a notable resultant import set; because of this I refactored it for 
 Ares and Mango).
 Without a clearly defined means to decouple Phobos from the compiler, 
 you're effectively erecting barriers for other solutions to clamber over 
 (as Sean vaguely intimated earlier). What's missing from all this built-in 
 stuff is a clean and documented means to have it supported outside of 
 Phobos. After all, the compiler is injecting explicit references for AA 
 code, utf conversion code, regex code, and a variety of other things. 
 What's next?

The compiler actually does not emit any explicit references to RegExp. It's all done by a reference to object._Match. _Match operates as a proxy to RegExp, but the compiler knows nothing about that.
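To make the proxy arrangement concrete, here is a minimal C sketch under stated assumptions. The hypothetical `MatchProxy` table stands in for `object._Match`, and its accessors mirror the `_match.pre`, `_match.post` and `_match.match(0)` members from the start of the thread. The calling code depends only on the table, so the backend could be swapped for another implementation. Here the backend is a literal substring search, not a real regex engine. None of these names come from DMD or Phobos.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical match record: offsets of the whole match in the subject. */
typedef struct {
    const char *subject;
    size_t start;   /* offset where the match begins */
    size_t end;     /* offset one past the end of the match */
} MatchResult;

/* Hypothetical proxy table: the only thing the calling code knows about. */
typedef struct {
    int (*test)(const char *pattern, const char *subject, MatchResult *out);
} MatchProxy;

/* Stand-in backend: literal substring search instead of a regex engine. */
static int substring_test(const char *pattern, const char *subject,
                          MatchResult *out)
{
    const char *hit = strstr(subject, pattern);
    if (hit == NULL)
        return 0;
    out->subject = subject;
    out->start = (size_t)(hit - subject);
    out->end = out->start + strlen(pattern);
    return 1;
}

static const MatchProxy substring_backend = { substring_test };

/* Copy s[from, to) into buf, truncating to fit and NUL-terminating. */
static void copy_span(const char *s, size_t from, size_t to,
                      char *buf, size_t cap)
{
    size_t n = to - from;
    if (cap == 0)
        return;
    if (n > cap - 1)
        n = cap - 1;
    memcpy(buf, s + from, n);
    buf[n] = '\0';
}

/* Accessors mirroring _match.pre, _match.post and _match.match(0). */
static void match_pre(const MatchResult *m, char *buf, size_t cap)
{
    copy_span(m->subject, 0, m->start, buf, cap);
}

static void match_post(const MatchResult *m, char *buf, size_t cap)
{
    copy_span(m->subject, m->end, strlen(m->subject), buf, cap);
}

static void match_whole(const MatchResult *m, char *buf, size_t cap)
{
    copy_span(m->subject, m->start, m->end, buf, cap);
}

/* Roughly what `if (pattern ~~ subject)` needs: one call through the table. */
static int match_expr(const MatchProxy *proxy, const char *pattern,
                      const char *subject, MatchResult *m)
{
    return proxy->test(pattern, subject, m);
}
```

For example, matching `"bar"` against `"foobarbaz"` gives a pre of `"foo"`, a post of `"baz"` and a whole match of `"bar"`. A function-pointer table is only one way to get this decoupling in C. The D runtime's actual hook is `object._Match`, as described above.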
 In short: you're (a) building more and more library functionality directly 
 into the language without providing a means to cleanly support alternate 
 implementations, extensions, or otherwise decouple the compiler. And (b) 
 by doing so, you're (perhaps inadvertently) stifling some innovation and 
 causing some headaches for the very people who are trying to help D along 
 the road to acceptance. It would really help if you'd be somewhat 
 sensitive to these aspects rather than persistently ignoring them.

 For instance, how does one change .sort to use a different sorting 
 algorithm? How does one change the hashing function for non-classes? How 
 can one unhook RegExp+OutBuffer+String+Others, and replace it? etc. etc. 
 If D is intended to be a closed-shop, Phobos-only environment, then some 
 of us are presumably wasting our time supporting the language; right?

Regex is non-trivial. There's no way to have any sort of language support for it without it being in the library. Anyone working on D libraries or other things is welcome to use RegExp, so I am just not understanding what the problem is. Phobos isn't a closed shop, the license on the files allows anyone to do pretty much anything they want with it. Also, let me reiterate that the compiler does *not* emit any hardcoded references to RegExp, nor does it know anything at all about regex's. It uses object._Match, which is a proxy to whatever the language implementor wants to use. RegExp could probably remove its dependence on OutBuffer, though.
 Who uses regex in C++? Hardly anyone. I'm betting it's because using them 
 sucks in C++, not because people don't use regex's.

?

It sucks in C, and why do I say that? I've shipped a C compiler for 22 years now, and not once, not ever, did anyone ask for a regex library for it. Regex wasn't put in the C standard, or the C++ one. Yet regex is considered a core capability of several other languages. There are many ways to interpret that - I am interpreting it as meaning that regex sucks in C, and so people seem to just never even think of using C when they need to process strings.
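For comparison, this is roughly what pulling out one capture group costs in C using the POSIX `<regex.h>` API (`regcomp`, `regexec`, `regfree`). That API is not part of ISO C and is assumed to be available, as it is on Unix-like systems. The helper name `first_capture` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy capture group `group` of `pattern` (POSIX ERE) applied to `subject`
   into buf. Returns 1 on a match, 0 on no match or on any error. */
static int first_capture(const char *pattern, const char *subject,
                         size_t group, char *buf, size_t cap)
{
    regex_t re;
    regmatch_t m[10];
    int ok = 0;

    if (group >= 10 || cap == 0)
        return 0;
    /* The pattern is compiled at run time, even when it is a literal. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, subject, 10, m, 0) == 0 && m[group].rm_so != -1) {
        size_t n = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (n > cap - 1)
            n = cap - 1;
        memcpy(buf, subject + m[group].rm_so, n);
        buf[n] = '\0';
        ok = 1;
    }
    regfree(&re);   /* forgetting this leaks the compiled pattern */
    return ok;
}
```

For example, `first_capture("([a-z]+)\\.d$", "regexp.d", 1, buf, sizeof buf)` leaves `"regexp"` in `buf`. All of that bookkeeping (fixed-size match arrays, offsets, manual cleanup) is what the `~~` sugar hides.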
 BTW: perhaps it would be appropriate to deprecate bit[] before 1.0 and 
 provide a nice library class/struct instead? You might even reuse the old 
 code from Zortech/Zorland days.

I know Stewart is using bit[], I want to hear his opinion first. If he says dump it, I'm agreeable.
Feb 16 2006
Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 
 The compiler actually does not emit any explicit references to RegExp. It's 
 all done by a reference to object._Match. _Match operates as a proxy to 
 RegExp, but the compiler knows nothing about that.

This is really more of a library issue than a compiler issue. My concern is that, since internal/object.d now imports std.regexp, the runtime code can no longer be built without at least a skeleton regexp module available. And if the regexp implementation changes then the runtime must be rebuilt. I'll admit that the current approach is probably best given that std.regexp exists and code duplication is a Bad Thing, but it still creates a language dependency on library code, even if the compiler isn't emitting RegExp calls directly.
 Regex is non-trivial. There's no way to have any sort of language support 
 for it without it being in the library. Anyone working on D libraries or 
 other things is welcome to use RegExp, so I am just not understanding what 
 the problem is. Phobos isn't a closed shop, the license on the files allows 
 anyone to do pretty much anything they want with it.

I agree. And this works fine for Phobos. But if Phobos is to be a template for future standard library implementations, then it should be designed in a way that allows for closed-source compiler implementations as well. Also, what if a library writer decides to exploit the regular expression support provided by the language, and merely implements his RegExp class as a veneer over the built-in functionality? It creates an odd sort of circular dependency. I'd originally considered the same thing for UTF transcoding using the built-in foreach mechanism, but as that code is relatively simple it's not as much of an issue. I assume there's no plan to remove std.regexp from Phobos now that language support is in place?
 Also, let me reiterate that the compiler does *not* emit any hardcoded 
 references to RegExp, nor does it know anything at all about regex's. It 
 uses object._Match, which is a proxy to whatever the language implementor 
 wants to use.

Understood. In fact I'll vouch for this since I've had a close look at the code. Sean
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt394j$1n2j$1 digitaldaemon.com...
 This is really more of a library issue than a compiler issue.  My concern 
 is that, since internal/object.d now imports std.regexp, the runtime code 
 can no longer be built without at least a skeleton regexp module 
 available.  And if the regexp implementation changes then the runtime must 
 be rebuilt.  I'll admit that the current approach is probably best given 
 that std.regexp exists and code duplication is a Bad Thing, but it still 
 creates a language dependency on library code, even if the compiler isn't 
 emitting RegExp calls directly.

I was concerned that code that did not use MatchExpressions might inadvertently link in the std.regexp module, which would be a Bad Thing. It does not, so I'm not convinced this is a bad thing.
 Regex is non-trivial. There's no way to have any sort of language support 
 for it without it being in the library. Anyone working on D libraries or 
 other things is welcome to use RegExp, so I am just not understanding 
 what the problem is. Phobos isn't a closed shop, the license on the files 
 allows anyone to do pretty much anything they want with it.

I agree. And this works fine for Phobos. But if Phobos is to be a template for future standard library implementations, then it should be designed in a way that allows for closed-source compiler implementations as well.

Sure, and std.regexp's license allows it to be used in closed source. It's a different license from dmd's source code, and the reason for the difference is so that people can use it for just the purpose you suggest. If one wanted to reimplement (or better, extend) RegExp in order to support, say, Perl 6 regex, all that object._Match needs are about 4 trivial members, which shouldn't be a burden. Other than that, why reimplement RegExp?
 Also, what if a library writer decides to exploit the regular expression 
 support provided by the language, and merely implements his RegExp class 
 as a veneer over the built-in functionality?  It creates an odd sort of 
 circular dependency.

At some point, he'll need a regex implementation. And the license for std.RegExp allows him to use/adapt it as required.
 I assume there's no plan to remove std.regexp from Phobos now that 
 language support is in place?

I'm just not getting it - why should it be removed? There never was a plan to remove it. And why would an implementation of a D runtime library not want to do a regex implementation? Of course, it's a lot of work to implement a regex, but one can just copy over std.RegExp and use/adapt it as required, as the license allows that. So I am just not getting what the problem is.
Feb 16 2006
Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 
 I'm just not getting it - why should it be removed? There never was a plan 
 to remove it. And why would an implementation of a D runtime library not 
 want to do a regex implementation? Of course, it's a lot of work to 
 implement a regex, but one can just copy over std.RegExp and use/adapt it as 
 required, as the license allows that. So I am just not getting what the 
 problem is. 

Perhaps I'm being idealistic, as I simply don't believe the runtime should rely on standard library code. Up to now that's been achievable, but the solution for this particular feature is less clear. But I'll drop the issue for now and mull it over a bit. Sean
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt3hev$1taf$1 digitaldaemon.com...
 Walter Bright wrote:
 I'm just not getting it - why should it be removed? There never was a 
 plan to remove it. And why would an implementation of a D runtime library 
 not want to do a regex implementation? Of course, it's a lot of work to 
 implement a regex, but one can just copy over std.RegExp and use/adapt it 
 as required, as the license allows that. So I am just not getting what 
 the problem is.

Perhaps I'm being idealistic, as I simply don't believe the runtime should rely on standard library code. Up to now that's been achievable, but the solution for this particular feature is less clear. But I'll drop the issue for now and mull it over a bit.

Consider that there's no way to implement C, D, etc., without some runtime library. Just doing a long divide relies on library code. There's the startup code (you can't just jmp to main()), shutdown code, exception handling support, etc. C/C++ have gone the odd route of making the library *part of the language*, so, for example, a compiler can recognize strlen and replace it with custom code. To my mind this gives the worst of both worlds - no syntactic sugar and no library flexibility.
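Both halves of that point can be seen in plain C. On 32-bit x86, GCC typically compiles a 64-bit division into a call to the libgcc helper `__divdi3`. GCC also treats `strlen` as a built-in that it may fold to a constant when the argument is a literal, unless `-fno-builtin` is given. The sketch below only shows source that can trigger both behaviours. What actually gets emitted depends on the target and the flags, so check the assembly (for example with `gcc -S -m32 -O2`) rather than taking it as guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* On 32-bit x86 this is usually lowered to a call to libgcc's __divdi3:
   a "language" operator that is really a runtime-library function. */
int64_t divide64(int64_t a, int64_t b)
{
    return a / b;
}

/* Because strlen is specified by the C standard, the compiler may treat it
   as a built-in and replace this whole call with the constant 5. */
size_t literal_length(void)
{
    return strlen("hello");
}
```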
Feb 16 2006
Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:dt3hev$1taf$1 digitaldaemon.com...
 Walter Bright wrote:
 I'm just not getting it - why should it be removed? There never was a 
 plan to remove it. And why would an implementation of a D runtime library 
 not want to do a regex implementation? Of course, it's a lot of work to 
 implement a regex, but one can just copy over std.RegExp and use/adapt it 
 as required, as the license allows that. So I am just not getting what 
 the problem is.

Perhaps I'm being idealistic, as I simply don't believe the runtime should rely on standard library code. Up to now that's been achievable, but the solution for this particular feature is less clear. But I'll drop the issue for now and mull it over a bit.

Consider that there's no way to implement C, D, etc., without some runtime library. Just doing a long divide relies on library code. There's the startup code (you can't just jmp to main()), shutdown code, exception handling support, etc.

Just to be clear, by "standard library code" I actually meant D code specifically. I fully expect the standard C library to be used by the D runtime. But as the C runtime likely calls C standard library functions, I suppose there's little reason to expect otherwise from D.
 C/C++ have gone the odd route of making the library *part of the language*, 
 so, for example, a compiler can recognize strlen and replace it with custom 
 code. To my mind this gives the worst of both worlds - no syntactic sugar 
 and no library flexibility. 

I've heard this mentioned before and it seems a bit odd to me. Does the spec actually mention this anywhere, or is it merely implied by having the library spec be a part of the language spec? Sean
Feb 17 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt5d4e$i49$2 digitaldaemon.com...
 Walter Bright wrote:
 C/C++ have gone the odd route of making the library *part of the 
 language*, so, for example, a compiler can recognize strlen and replace 
 it with custom code. To my mind this gives the worst of both worlds - no 
 syntactic sugar and no library flexibility.

I've heard this mentioned before and it seems a bit odd to me. Does the spec actually mention this anywhere, or is it merely implied by having the library spec be a part of the language spec?

I think it's implied by it being part of the language spec. Regardless, it is true, and many compilers (including DMC) take advantage of it.
Feb 17 2006
"Kris" <fu bar.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote ...
[snip]
 One is asking the question "does this thing on the left exist within the 
 thing on the right". It even takes care of getting the operand ordering 
 correct. Thus, I'd urge you to at least see if there's actually a notable 
 problem for the compiler to handle this before writing the idea off.

It's not a problem with the compiler. It's a conceptual problem for the user. When I see 'in' I think of containers. That's completely different from regex.

Can't say that I agree, but my opinion matters rather little anyway <g>
 I'm an advocate for getting regex support in the grammar,

I thought you were arguing against that <g>.

Not at all. I've been an advocate for it in the past also. It's certain other aspects of built-in functionality that I consistently have a beef with.
 In short: you're (a) building more and more library functionality 
 directly into the language without providing a means to cleanly support 
 alternate implementations, extensions, or otherwise decouple the 
 compiler. And (b) by doing so, you're (perhaps inadvertently) stifling 
 some innovation and causing some headaches for the very people who are 
 trying to help D along the road to acceptance. It would really help if 
 you'd be somewhat sensitive to these aspects rather than persistently 
 ignoring them.

 For instance, how does one change .sort to use a different sorting 
 algorithm? How does one change the hashing function for non-classes? How 
 can one unhook RegExp+OutBuffer+String+Others, and replace it? etc. etc. 
 If D is intended to be a closed-shop, Phobos-only environment, then some 
 of us are presumably wasting our time supporting the language; right?

Regex is non-trivial. There's no way to have any sort of language support for it without it being in the library. Anyone working on D libraries or other things is welcome to use RegExp, so I am just not understanding what the problem is. Phobos isn't a closed shop, the license on the files allows anyone to do pretty much anything they want with it.

It's one thing to hear you say that; yet the proof is in the pudding. It's actually quite tricky to disentangle the compiler from Phobos. Some parts simply cannot be decoupled at all (at this time). It's not a criticism of you personally, but the above concerns are very real and the frustration is something you perhaps need to know about. If I read your answer a particular way, it can be interpreted as saying "why would you *not* want to use Phobos?". That would be an example of stifling innovation, for all kinds of reasons.
 Also, let me reiterate that the compiler does *not* emit any hardcoded 
 references to RegExp, nor does it know anything at all about regex's. It 
 uses object._Match, which is a proxy to whatever the language implementor 
 wants to use.

 RegExp could probably remove its dependence on OutBuffer, though.

Probably. On the same topic, you've often 'lectured' about the need to decouple such that the "libraries don't end up like Java". Yet RegExp imports String too, which in turn imports all these (std.format in particular):

    private import std.stdio;
    private import std.utf;
    private import std.uni;
    private import std.array;
    private import std.format;
    private import std.ctype;
    private import std.stdarg;

It's quite easy to eliminate OutBuffer and String from RegExp. There's an adjusted version of it in circulation, if you'd like to forgo the effort.
 It sucks in C, and why do I say that? I've shipped a C compiler for 22 
 years now, and not once, not ever, did anyone ask for a regex library for 
 it. Regex wasn't put in the C standard, or the C++ one. Yet regex is 
 considered a core capability of several other languages. There are many 
 ways to interpret that - I am interpreting it as meaning that regex sucks 
 in C, and so people seem to just never even think of using C when they 
 need to process strings.

I'm surprised that you'd interpret it that way. I've used regex in C for decades. There was one great implementation from, uhhh, Ian somebody from Edinburgh Uni, which generated x86 code on the fly. I used that to great effect ~ a truly impressive utility.
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Kris" <fu bar.com> wrote in message news:dt3cc2$1pc7$1 digitaldaemon.com...
 It sucks in C, and why do I say that? I've shipped a C compiler for 22 
 years now, and not once, not ever, did anyone ask for a regex library for 
 it. Regex wasn't put in the C standard, or the C++ one. Yet regex is 
 considered a core capability of several other languages. There are many 
 ways to interpret that - I am interpreting it as meaning that regex sucks 
 in C, and so people seem to just never even think of using C when they 
 need to process strings.

I'm surprised that you'd interpret it that way. I've used regex in C for decades. There was one great implementation from, uhhh, Ian somebody from Edinburgh Uni, which generated x86 code on the fly. I used that to great effect ~ a truly impressive utility.

How do you interpret the fact that it has failed to gain traction among the general C population?
Feb 16 2006
"Kris" <fu bar.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote...
 "Kris" <fu bar.com> wrote in message 
 news:dt3cc2$1pc7$1 digitaldaemon.com...
 It sucks in C, and why do I say that? I've shipped a C compiler for 22 
 years now, and not once, not ever, did anyone ask for a regex library 
 for it. Regex wasn't put in the C standard, or the C++ one. Yet regex is 
 considered a core capability of several other languages. There are many 
 ways to interpret that - I am interpreting it as meaning that regex 
 sucks in C, and so people seem to just never even think of using C when 
 they need to process strings.

I'm surprised that you'd interpret it that way. I've used regex in C for decades. There was one great implementation from, uhhh, Ian somebody from Edinburgh Uni, which generated x86 code on the fly. I used that to great effect ~ a truly impressive utility.

How do you interpret the fact that it has failed to gain traction among the general C population?

I noted a few reasons previously, regarding differing approaches and mindsets between script developers and systems developers. Even when the same person does both. Georg Wrede just posted some very similar reasoning too. The upshot is that (IMO) the general C population rarely have a compelling need for regex. Where regex might seem (perhaps mistakenly) like using a sledgehammer to crack a nut in C, its usage is often not given a second thought in scripts.

Speaking personally, I don't expect high performance out of a script, and don't give two hoots about Q & D hacking therein. That's not the case with systems-programming (for me), where I'm likely to use something more lightweight as appropriate.

On the other hand, I've written a lot of the type of code that really benefits from the state-machinery exposed by a good regex engine. Other times I've hand-tuned my own state-machines to do the work instead. Sometimes in assembly.

As noted previously, I don't think it's a question of visibility at all ~ more a question of task, applicability, priorities, and various other cost factors. One has to wonder how much script-regex actually leverages the power within. I'd bet a large % are completely trivial. The kind which can easily be handled by other (more efficient) means in systems languages.
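As an illustration of the "more lightweight" alternative, here is a minimal hand-written state machine in C. It accepts the same strings as the regular expression `^[0-9]+\.[0-9]+$`, that is, a decimal number such as `3.14`. The pattern and the `is_decimal` name are arbitrary examples, not taken from the thread. The point is that a trivial regex often collapses to a handful of states and no library dependency.

```c
#include <ctype.h>

/* States for ^[0-9]+\.[0-9]+$ */
enum dec_state { S_START, S_INT, S_DOT, S_FRAC, S_REJECT };

int is_decimal(const char *s)
{
    enum dec_state st = S_START;

    for (; *s != '\0' && st != S_REJECT; ++s) {
        int digit = isdigit((unsigned char)*s);
        switch (st) {
        case S_START:  st = digit ? S_INT : S_REJECT; break;
        case S_INT:    st = digit ? S_INT : (*s == '.' ? S_DOT : S_REJECT); break;
        case S_DOT:    st = digit ? S_FRAC : S_REJECT; break;
        case S_FRAC:   st = digit ? S_FRAC : S_REJECT; break;
        case S_REJECT: break;
        }
    }
    /* Accept only if the input ended while in the final (fraction) state. */
    return st == S_FRAC;
}
```

For example, `"3.14"` is accepted, while `"42"`, `"3."`, `".5"` and `"1.2.3"` are all rejected.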
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Kris" <fu bar.com> wrote in message news:dt3foh$1s1d$1 digitaldaemon.com...
 I noted a few reasons previously, regarding differing approaches and 
 mindsets between script developers and systems developers. Even when the 
 same person does both. Georg Wrede just posted some very similar 
 reasoning too. The upshot is that (IMO) the general C population rarely 
 have a compelling need for regex. Where regex might seem (perhaps 
 mistakenly) like using a sledgehammer to crack a nut in C, its usage is 
 often not given a second thought in scripts.

This might be a circular result - people don't use regex in C because regex's suck in C, so there is no incentive to improve it because there aren't any users. People just get used to going to another language to use regex, and never stop to think it doesn't have to be that way.
 Speaking personally, I don't expect high performance out of a script, and 
 don't give two hoots about Q & D hacking therein. That's not the case with 
 systems-programming (for me), where I'm likely to use something more 
 lightweight as appropriate.

There's a lot of string processing work done in C that is not performance sensitive - like dealing with the command line arguments.
 On the other hand, I've written a lot of the type of code that really 
 benefits from the state-machinery exposed by a good regex engine. Other 
 times I've hand-tuned my own state-machines to do the work instead. 
 Sometimes in assembly.

Sure. And building in some syntactic sugar for regex isn't going to sabotage optimization.
 One has to wonder how much script-regex actually leverages the power 
 within? I'd bet a large % are completely trivial.

I agree with that.
 The kind which can easily be handled by other (more efficient) means in 
 systems languages.

I'm not sure that efficiency is the only goal here - productivity is a big one, too, and one often uses regex in parts of the program that don't need performance. I know I sure get tired of strlen/strcmp/memcpy for routine non-performance-critical code.
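Here is the kind of routine, non-performance-critical code in question, sketched in C: replacing a file name's extension (say `foo.c` to `foo.obj`) using only `strrchr`, `strlen`, `malloc` and `memcpy`. The function name `replace_extension` and its exact behaviour are illustrative assumptions. In a language with built-in slicing and `~` concatenation, the same job is roughly one expression.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `name` with its extension replaced by
   `ext` (which includes the dot), or with `ext` appended if `name` has no
   '.'. Simplification: a '.' in a directory part is treated as an
   extension. The caller must free() the result. Returns NULL on OOM. */
char *replace_extension(const char *name, const char *ext)
{
    const char *dot = strrchr(name, '.');
    size_t stem = dot ? (size_t)(dot - name) : strlen(name);
    size_t extlen = strlen(ext);
    char *out = malloc(stem + extlen + 1);   /* +1 for the terminator */

    if (out == NULL)
        return NULL;
    memcpy(out, name, stem);
    memcpy(out + stem, ext, extlen + 1);     /* copies the '\0' as well */
    return out;
}
```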
Feb 16 2006
Sean Kelly <sean f4.ca> writes:
Kris wrote:
 "Walter Bright" <newshound digitalmars.com> wrote ...
 RegExp could probably remove its dependence on OutBuffer, though.

Probably. On the same topic, you've often 'lectured' about the need to decouple such that the "libraries don't end up like Java". Yet RegExp imports String too, which in turn imports all these (std.format in particular):

    private import std.stdio;
    private import std.utf;
    private import std.uni;
    private import std.array;
    private import std.format;
    private import std.ctype;
    private import std.stdarg;

It's quite easy to eliminate OutBuffer and String from RegExp. There's an adjusted version of it in circulation, if you'd like to forgo the effort.

For what it's worth, the latest release of Ares trims a lot of fat out of std.string, so far as runtime dependencies are concerned. The only modules that are actually required by some portion of the runtime are:

    std.ctype
    std.outbuffer
    std.regexp
    std.string
    std.utf

And outbuffer should be easy enough to remove from this list. I'd have continued to use your modified std.regexp for this release except the deltas between the 146 and 147 versions of std.regexp were tremendous. It would have taken hours to sort out a workable merge of that file, so falling back on the new Phobos version seemed preferable.
 It sucks in C, and why do I say that? I've shipped a C compiler for 22 
 years now, and not once, not ever, did anyone ask for a regex library for 
 it. Regex wasn't put in the C standard, or the C++ one. Yet regex is 
 considered a core capability of several other languages. There are many 
 ways to interpret that - I am interpreting it as meaning that regex sucks 
 in C, and so people seem to just never even think of using C when they 
 need to process strings.

I'm surprised that you'd interpret it that way. I've used regex in C for decades. There was one great implementation from, uhhh, Ian somebody from Edinburgh Uni, which generated x86 code on the fly. I used that to great effect ~ a truly impressive utility.

That sounds pretty cool. Sean
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt3nss$22bj$1 digitaldaemon.com...
 And outbuffer should be easy enough to remove from this list.  I'd have 
 continued to use your modified std.regexp for this release except the 
 deltas between the 146 and 147 versions of std.regexp were tremendous. It 
 would have taken hours to sort out a workable merge of that file, so 
 falling back on the new Phobos version seemed preferable.

Very little actually changed; what I did was re-sort the order so it was more appealing in Ddoc format, and add the Ddoc comments.
Feb 16 2006
Georg Wrede <georg.wrede nospam.org> writes:
Walter Bright wrote:

 It sucks in C, and why do I say that? I've shipped a C compiler for
 22 years now, and not once, not ever, did anyone ask for a regex
 library for it. Regex wasn't put in the C standard, or the C++ one.
 Yet regex is considered a core capability of several other languages.
 There are many ways to interpret that - I am interpreting it as
 meaning that regex sucks in C, and so people seem to just never even
 think of using C when they need to process strings.

Hmm. Regexes being a big thing for interpreted languages is largely thanks to the Q&D convenience. Also systems scripting needs it for nontrivial filtering, and of course complicated line rewriting.

C folks tend to "peek directly" into the strings because it's cheap, and you have a sense of complete control.

Using regexps in C needs a total change of paradigm. Regexps are kind of "top down" things, whereas traditionally "peeking into strings" is bottom-up programming.

You'd also have to learn regexps. The trivial things are trivial in C-style too, and the non-trivial stuff gets avoided because of the up-front investment. Folks would rather do nested ifs and stuff.

Conversely, many interpreted languages make it inefficient to do "peek" kind of programming, as compared to using regexps.
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:43F53BE5.8020900 nospam.org...
 Using regexps in C needs a total change of paradigm. Regexps are kind of 
 "top down" things, whereas traditionally "peeking into strings" is 
 bottom-up programming.

 You'd also have to learn regexps. The trivial things are trivial in 
 C-style too, and the non-trivial stuff gets avoided because of the 
 up-front investment. Folks would rather do nested ifs and stuff.

 Conversely, many interpreted languages make it inefficient to do "peek" 
 kind of programming, as compared to using regexps.

There are a lot of cool things you can do in script languages because they are interpreted, and one doesn't care about efficiency. Those things are simply incompatible with D. But I don't see any inherent advantages script languages should have in implementing regex.
Feb 16 2006
Georg Wrede <georg.wrede nospam.org> writes:
Walter Bright wrote:
 "Georg Wrede" <georg.wrede nospam.org> wrote in message 
 news:43F53BE5.8020900 nospam.org...
 
 Using regexps in C needs a total change of paradigm. Regexps are
 kind of "top down" things, whereas traditionally "peeking into
 strings" is bottom-up programming.
 
 You'd also have to learn regexps. The trivial things are trivial in
 C-style too, and the non-trivial stuff gets avoided because of the
 up-front investment. Folks would rather do nested ifs and stuff.
 
 Conversely, many interpreted languages make it inefficient to do
 "peek" kind of programming, as compared to using regexps.

There are a lot of cool things you can do in script languages because they are interpreted, and one doesn't care about efficiency. Those things are simply incompatible with D. But I don't see any inherent advantages script languages should have in implementing regex.

Neither do I. But the question was, how come regexps aren't _used_ as much as we'd expect.
Feb 17 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:43F58CDB.9090504 nospam.org...
 Walter Bright wrote:
 There are a lot of cool things you can do in script languages because
 they are interpreted, and one doesn't care about efficiency. Those
 things are simply incompatible with D. But I don't see any inherent
 advantages script languages should have in implementing regex.

Neither do I. But the question was, how come regexps aren't _used_ as much as we'd expect.

My answer is because they're inconvenient to use in C/C++.
Feb 17 2006
James Dunne <james.jdunne gmail.com> writes:
Georg Wrede wrote:
 Walter Bright wrote:
 
 "Georg Wrede" <georg.wrede nospam.org> wrote in message 
 news:43F53BE5.8020900 nospam.org...

 Using regexps in C needs a total change of paradigm. Regexps are
 kind of "top down" things, whereas traditionally "peeking into
 strings" is bottom-up programming.

 You'd also have to learn regexps. The trivial things are trivial in
 C-style too, and the non-trivial stuff gets avoided because of the
 up-front investment. Folks would rather do nested ifs and stuff.

 Conversely, many interpreted languages make it inefficient to do
 "peek" kind of programming, as compared to using regexps.

There are a lot of cool things you can do in script languages because they are interpreted, and one doesn't care about efficiency. Those things are simply incompatible with D. But I don't see any inherent advantages script languages should have in implementing regex.

Neither do I. But the question was, how come regexps aren't _used_ as much as we'd expect.

My answer is that regular expressions simply aren't powerful enough for the kinds of string processing that I need to do regularly (no pun intended). Regular expressions represent regular languages. Not all languages are regular, of course.

<rant>
My other beef with regular expressions is that there are so many competing standards for them, and on top of that some are not even standardized (e.g. MS Visual Studio .NET 2003). You never know if one implementation uses longest-match or one uses shortest-match; you never know how newlines are handled; you never know if Unicode is supported; you never know the run-time performance of your regex; you never know the syntax for selecting match indices (0-based or 1-based? use '\1'? record a match with {}, with \(\) or with ()?), etc. There are simply too many variables with regular expressions, in all the forms they exist in, for them to be relied upon. Finally, they're just plain ugly and nearly impossible to debug.
</rant>

Following that rant, I can put a positive spin here and say that the Ragel state machine compiler is an excellent model to work from! One can insert custom code between state transitions for debugging and even for complex logic! Why can't we have compiler support for this type of power? :)

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/MU/S d-pu s:+ a-->? C++++$ UL+++ P--- L+++ !E W-- N++ o? K? w--- O M-- V? PS PE Y+ PGP- t+ 5 X+ !R tv-->!tv b- DI++(+) D++ G e++>e h>--->++ r+++ y+++
------END GEEK CODE BLOCK------

James Dunne
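The "not all languages are regular" point has a classic example: a classical regular expression cannot check that parentheses are balanced to arbitrary depth. That check needs an unbounded counter (or a stack), which a finite automaton does not have. Some engines, such as PCRE, add recursive patterns that go beyond regular languages, but the point here concerns classical regexes. Here is a minimal counter-based check in C. The name `balanced_parens` is illustrative.

```c
/* Returns 1 if every '(' in s is closed by a later ')' and no ')' appears
   without a matching '('. The depth counter is unbounded in principle,
   which is exactly what a finite-state regex matcher cannot track. */
int balanced_parens(const char *s)
{
    long depth = 0;

    for (; *s != '\0'; ++s) {
        if (*s == '(')
            ++depth;
        else if (*s == ')' && --depth < 0)
            return 0;   /* a ')' with no matching '(' */
    }
    return depth == 0;
}
```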
Feb 20 2006
Georg Wrede <georg.wrede nospam.org> writes:
kris wrote:

 I'm an advocate for getting regex support in the grammar, but I'm 
 certainly not an advocate for tying Phobos to the compiler (RegExp has a 
 notable resultant import set; because of this I refactored it for Ares 
 and Mango).

Would it be correct to assume that if we had compile-time regexps, then the resultant import set would be effectively zero? (As long as we of course don't also use regexps that aren't compile-time compilable?) Since (IMHO) most shortish programs only use literal regexes, this would be quite important.
Feb 16 2006
prev sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Walter Bright wrote:
 "kris" <fu bar.org> wrote 
 Walter Bright wrote:


At least "in" has some relevant meaning to it.

It would be overloading its existing meaning, which means that it'll take semantic, rather than syntactic, analysis to disambiguate. This is potential trouble.

Sad. "in" did sound good. :-)
That is a problem, one that would get solved when RegExp can do wchar and 
dchar. That isn't a technical problem, it's more of a getting around to 
it problem.

Well, since grammar supported regex has elevated itself to the top of the priority list, perhaps wchar/dchar support might tag along with it?

The thing is, RegExp has been in there from the beginning, but it has gone unused and even its existence is overlooked. I don't believe that's because it isn't useful - look at Ruby, Perl, Javascript, etc. Those languages heavily use regex. Is there something inherent about *script* languages that makes them nice for regex? I don't believe there is; I think it gets heavily used in those languages because the syntactic sugar makes it easy to use.

There are 2 things reducing its usage. First, using it has been awkward. Second, and more important, most real-world uses of regex involve literals. And that implies compile-time compilation, if they are to be perceived as efficient.
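Georg's point about literal patterns can be illustrated with present-day Phobos, whose std.regex replaced the std.regexp discussed in this thread (an assumption of this sketch; none of this existed in 2006). ctRegex! turns a literal pattern into matching code during compilation, while regex parses the same pattern when the program runs. The helper names firstNumberCT and firstNumberRT are hypothetical. Note that this only moves the compilation step; the program still imports std.regex.

```d
import std.regex : ctRegex, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: first run of digits in s, or "" if there is none.
// The literal pattern is compiled into specialized code at compile time.
string firstNumberCT(string s)
{
    auto re = ctRegex!`[0-9]+`;
    auto m = matchFirst(s, re);
    return m.empty ? "" : m.hit;
}

// Hypothetical helper: same pattern, parsed and compiled at run time.
string firstNumberRT(string s)
{
    auto re = regex(`[0-9]+`);
    auto m = matchFirst(s, re);
    return m.empty ? "" : m.hit;
}

void main()
{
    writeln(firstNumberCT("abc 123 def")); // 123
    writeln(firstNumberRT("abc 123 def")); // 123
}
```

Both helpers return the same results; the difference is only where the pattern gets compiled.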
I don't think this takes away from the regex templates. I hope to use the 
regex templates in conjunction with this syntactic sugar to create 
optimized regex evaluation.

Perhaps, but I really don't see the need for this sudden rush to get regex support into the grammar. Experience with regex templates is almost certain to uncover some conflict in this regard ~ one that will likely have to be compromised to fit in with the current syntax. That's just Murphy's law. What's the big hurry?

I thought it fit in well with D's new capability of being runnable in a script-like fashion.

Experience has shown that using D as a scripting language in a production environment currently needs some method of compiler-version-locking. In other words, if a script is written for D.130, then something should ensure that it stays compiled with that version, even after the system D compiler gets updated. If this is not done, then system scripts break at unexpected times (i.e. the first time that particular script is run after the compiler is updated to the first version that breaks the script). In a production environment it is plain impossible to search for and test-run every D script each time the compiler gets updated. This problem is made even worse by the run-time library not having any version identifier. It sure would be nice if one could leave the old run-time libraries as-is, and only add the new one next to them. The binaries should choose the right one automagically. The way we are using D scripting (digitalmars.D.announce:2674) is version independent (meaning we can use _any_ DMD), but of course the individual D scripts introduce compiler version dependencies by themselves. One solution to all the above-mentioned problems would of course be a "dscript.d" binary that takes care of everything. (A good starting point would be to use the above-mentioned scripting script.) Then every D script would start with

    #! /usr/local/bin/dscript

but that would then totally obviate the DMD -run parameter!
 If this opens up a reasonably broad new range of 
 applications that D is a good fit for, that's good. I might be wrong, of 
 course, as I've been with the bit data type (a complete botch). Match 
 expressions don't break anything, were not expensive to implement, and the 
 only way to see how they'll work out is to try them. 

I think the current implementation is good. I don't like to see any $whatever (or even worse, $` $´ $' $") implemented!!!! We don't like to see D become Perl. And hey, Perl itself has been moving away from the $-unrememberable-fly-droppings stuff. AND even _bash_ has been starting to avoid them lately! (See man bash.) Syntactic sugar is ok in general. But not "semantic" or "hieroglyphic" sugar. Let's see how the brand new stuff works, and whether any additional sugar ever becomes needed here!
Feb 16 2006
parent "Walter Bright" <newshound digitalmars.com> writes:
"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:43F530AC.9010101 nospam.org...
 Syntactic sugar is ok in general. But not "semantic" or "hieroglyphic" 
 sugar. Let's see how the brand new stuff works, and whether any additional 
 sugar ever becomes needed here!

I think the $` is pretty much dead now <g>.
Feb 16 2006
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message news:dt0q7n$2cuo$1 digitaldaemon.com...
 There seem to be multiple issues here. The first one, which you ask about, 
 is related to the syntax. At first blush, the ~~ looks like an approximate 
 approximation, and then making D look like a malformed Perl is surely a 
 mistake.

If you've got a better idea for tokens ~~ and !~ ?

I'm half inclined to suggest -> for ~~, though there doesn't seem to be an obvious corresponding 'not' version. Sean
Feb 16 2006
parent "Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt2cra$ssu$2 digitaldaemon.com...
 Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message 
 news:dt0q7n$2cuo$1 digitaldaemon.com...
 There seem to be multiple issues here. The first one, which you ask 
 about, is related to the syntax. At first blush, the ~~ looks like an 
 approximate approximation, and then making D look like a malformed Perl 
 is surely a mistake.

If you've got a better idea for tokens ~~ and !~ ?

I'm half inclined to suggest -> for ~~, though there doesn't seem to be an obvious corresponding 'not' version.

Two cons: 1) people see -> and they're going to think the C/C++ meaning. Heck, I often mistakenly use -> in D instead of '.'. For that reason -> should never result in valid D code. 2) as you suggested, !-> doesn't look too hot :-(
Feb 16 2006
prev sibling next sibling parent James Dunne <james.jdunne gmail.com> writes:
Walter Bright wrote:
 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable? 
 
 

I'd rather make my code easier to read than to write. For that very reason, I don't use regexps.
-- 
Regards,
James Dunne
Feb 15 2006
prev sibling next sibling parent Roberto Mariottini <Roberto_member pathlink.com> writes:
In article <dt088e$1svm$2 digitaldaemon.com>, Walter Bright says...
D dramatically improves the convenience of string handling over C++. But 
while I think using the library std.regexp is straightforward, obviously it 
just isn't gaining traction. People like the shortcut approaches Ruby and 
Perl use for regular expressions, hence the new D match-expression support.

So, now we have:

    if (regular_expression ~~ string)
    {
            _match.pre
            _match.post
            _match.match(n)
    }

Fairly good.
Should we do some aliases:

    $` => _match.pre
    $' => _match.post
    $& => _match.match(0)
    $n => _match.match(n)

?

No. That's why I hate perl. I have to look in the manual to know what the hell $` means, and be careful about it really being a ` and not a '.
Syntactic sugar is often a good idea, but at what point do they become 
cyclamates and cause cancer in laboratory animals? Will these $ tokens 
render D more accessible, but perhaps too unreadable? 

Yes. All those $'$`$&$3 are useful only to make my eyes cross. If you want to use $, use it as an abbreviation of 'match', so you'll get:

    $pre  => _match.pre
    $post => _match.post
    $(0)  => _match.match(0)
    $(n)  => _match.match(n)

So once I know that $ stands for 'match', I can easily infer what $pre, $post, $(0) and $(3) stand for.
Ciao
---
http://www.mariottini.net/roberto/
Feb 15 2006
prev sibling next sibling parent reply Oskar Linde <olREM OVEnada.kth.se> writes:
Walter Bright wrote:

 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post

` is not readily available on all keyboards. Some fonts also have problems differentiating between the three Latin-1 ticks (` ' ´) (the straight tick (apostrophe) (') looks like a right tick (acute accent) (´) in many fonts).
     $& => _match.match(0)
     $n => _match.match(n)

Is n meant to be an integer expression or a numeric literal?
 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

IMHO, both. It makes D less readable for sure. I also think this repulses more people in general than it attracts some odd perl hackers. :) In this case, I don't even think the syntactic sugar makes the code much faster to write (which, in reality, I think is more a psychological problem than a real one). If verbosity is to be avoided, I would suggest (as in my earlier reply to this thread) that $ replaces _match. This would give:

    $.pre
    $.post
    $[0]
    $[n]

(or $.match(n), but why not overload opIndex?)
/Oskar
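The spelling proposed here (.pre, .post and [n] on a match object) happens to be close to what present-day Phobos offers. A minimal sketch, assuming today's std.regex rather than the 2006 std.regexp; the subject string and pattern are illustrative only:

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Two capture groups: a single digit, then the rest of the digit run.
    auto m = matchFirst("abc123def", regex(`([0-9])([0-9]+)`));
    if (m)                 // a match object converts to bool: did it match?
    {
        writeln(m.pre);    // text before the match: abc
        writeln(m.post);   // text after the match:  def
        writeln(m[0]);     // whole match:           123
        writeln(m[1]);     // first group:           1
        writeln(m[2]);     // second group:          23
    }
}
```

Here [n] is an ordinary opIndex on a value type, which is the kind of overload the reply below discusses.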
Feb 16 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
news:dt1ccm$2ssg$1 digitaldaemon.com...
     $& => _match.match(0)
     $n => _match.match(n)


$1, $2, $3, ...
 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

more people in general than it attracts some odd perl hackers. :)

I'm a little surprised at the uniformly negative reaction to the perl-ish notation. But that's good, as it makes the right way to go for D clear.
 If verbosity is to be avoided, I would suggest (as in my earlier reply to
 this thread) that $ replaces _match. This would give:

 $.pre
 $.post
 $[0]
 $[n]
 (or $.match(n), but why not overload opIndex?)

That was the original plan, but when _match is of type T*, the [ ] cannot be overloaded.
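A small illustration of why [ ] cannot be overloaded through a T*, written in present-day D; the struct Groups is hypothetical. Indexing a pointer is built-in pointer arithmetic, so p[n] never reaches the struct's opIndex. Only an explicit dereference, (*p)[n], does.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a match object with an overloaded [].
struct Groups
{
    string[] parts;
    string opIndex(size_t n) const { return parts[n]; }
}

void main()
{
    auto g = Groups(["123", "1", "23"]);
    Groups* p = &g;

    writeln(g[1]);      // opIndex on the struct value: 1
    writeln((*p)[1]);   // dereference first, then opIndex: 1
    // p[0] is pointer indexing: it yields the Groups value itself,
    // and p[1] would read past the end of g instead of calling opIndex.
    writeln(p[0] == g); // true
}
```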
Feb 16 2006
parent reply Oskar Linde <olREM OVEnada.kth.se> writes:
Walter Bright wrote:

 
 "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
 news:dt1ccm$2ssg$1 digitaldaemon.com...
 $.pre
 $.post
 $[0]
 $[n]
 (or $.match(n), but why not overload opIndex?)

That was the original plan, but when _match is of type T*, the [ ] cannot be overloaded.

So why does _match have to be a pointer? Would something like this not work? (from object.d, added void *_this, opIndex and changed this->_this)

    /* ***************************** _Match **************************** */

    /**
     * Default type for _match.
     * Implemented as a proxy for RegExp, so that object doesn't pull in
     * the entire std.regexp.
     */

    import std.regexp;

    struct _Match
    {
        void *_this;

        char[] match(size_t n)
        {
            return (cast(RegExp)_this).match(n);
        }

        char[] opIndex(size_t n)
        {
            return match(n);
        }

        _Match opNext()
        {
            RegExp r = (cast(RegExp)_this).opNext();
            if (r)
                return cast(_Match)_this;
            r = cast(RegExp)_this;
            delete r;
            return null;
        }

        char[] pre()
        {
            return (cast(RegExp)_this).pre();
        }

        char[] post()
        {
            return (cast(RegExp)_this).post();
        }
    }

/Oskar
Feb 17 2006
parent "Walter Bright" <newshound digitalmars.com> writes:
"Oskar Linde" <olREM OVEnada.kth.se> wrote in message 
news:dt41c8$2a9a$1 digitaldaemon.com...
 Walter Bright wrote:

 "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
 news:dt1ccm$2ssg$1 digitaldaemon.com...
 $.pre
 $.post
 $[0]
 $[n]
 (or $.match(n), but why not overload opIndex?)

That was the original plan, but when _match is of type T*, the [ ] cannot be overloaded.

So why does _match have to be a pointer?

I wanted it to work with both pointers to structs and to class references.
 Would something like this not work?

The problem with that is testing:

    _Match m;
    if (m)

doesn't work if _Match is a struct.
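For what it's worth, present-day D (D2) lets a struct take part in if (m) by defining opCast!bool, a facility that postdates this discussion. A minimal sketch; MatchResult and its fields are hypothetical and are not the _Match from object.d:

```d
import std.stdio : writeln;

// Hypothetical struct-based match result: empty groups means "no match".
struct MatchResult
{
    string[] groups;

    // Used implicitly by if (m), !m, while (m), and similar conditions.
    bool opCast(T : bool)() const { return groups.length != 0; }

    string opIndex(size_t n) const { return groups[n]; }
}

void main()
{
    MatchResult none;                       // default-initialized: no groups
    auto hit = MatchResult(["123", "1"]);

    if (none) writeln("unexpected");        // not taken
    if (hit)  writeln("matched ", hit[0]);  // prints: matched 123
}
```

With this, a value type can be tested like a reference and still overload [ ].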
Feb 17 2006
prev sibling next sibling parent bobef <bobef lessequal.com> writes:
Walter Bright wrote:
 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable? 
 
 

It is a nice feature, but I don't think such a thing should be part of the language. I don't think it is that common. Maybe I am wrong... The other thing I don't like is the large number of reserved words... Personally, I wouldn't try to catch up with Ruby or Perl. I believe comparing D/C/C++ with virtual-machine or scripting languages is foolish. But it depends on what the goals of D are: a larger audience or higher quality. Because, in my opinion, trying to catch up with a scripting language is a regression. But as I said, it is a very nice feature. I will use it myself, but I wouldn't judge a language by this...
Feb 16 2006
prev sibling next sibling parent "Charles" <noone nowhere.com> writes:
Sweet Jesus ... the horror.




"Walter Bright" <newshound digitalmars.com> wrote in message
news:dt088e$1svm$2 digitaldaemon.com...
 D dramatically improves the convenience of string handling over C++. But
 while I think using the library std.regexp is straightforward, obviously it
 just isn't gaining traction. People like the shortcut approaches Ruby and
 Perl use for regular expressions, hence the new D match-expression support.

 So, now we have:

     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }

 Should we do some aliases:

     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)

 ? Syntactic sugar is often a good idea, but at what point do they become
 cyclamates and cause cancer in laboratory animals? Will these $ tokens
 render D more accessible, but perhaps too unreadable?

Feb 16 2006
prev sibling parent reply David Medlock <noone nowhere.com> writes:
Walter Bright wrote:
 D dramatically improves the convenience of string handling over C++. But 
 while I think using the library std.regexp is straightforward, obviously it 
 just isn't gaining traction. People like the shortcut approaches Ruby and 
 Perl use for regular expressions, hence the new D match-expression support.
 
 So, now we have:
 
     if (regular_expression ~~ string)
     {
             _match.pre
             _match.post
             _match.match(n)
     }
 
 Should we do some aliases:
 
     $` => _match.pre
     $' => _match.post
     $& => _match.match(0)
     $n => _match.match(n)
 
 ? Syntactic sugar is often a good idea, but at what point do they become 
 cyclamates and cause cancer in laboratory animals? Will these $ tokens 
 render D more accessible, but perhaps too unreadable? 
 
 

I haven't read this whole thread, but pardon me if this has been suggested. Why doesn't the regular expression stuff use foreach?

    struct Match { short start, end; }

    foreach( Match m ; "[0-9]" ~~ mystring )
    {
        writefln( "Found number:%s", mystring[m.start..m.end] );
    }

Basically this implements a callback methodology for regexes, similar to:

    void match( char[] regex, char[] str, bool delegate( Match m, char[] s ) dg );

Obviously this doesn't cover all cases, but I'm just curious why it isn't used.
-DavidM
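The callback methodology described here is essentially how D's foreach works on a user type through opApply: the loop body is passed in as a delegate. A minimal sketch, assuming present-day std.regex in place of the proposed ~~ operator; Match mirrors the struct in the post (with size_t offsets), and Matches is a hypothetical name:

```d
import std.regex : matchAll, regex;
import std.stdio : writefln;

// Offsets of one match within the subject string, as in the post.
struct Match { size_t start, end; }

// Hypothetical iterable: foreach over it calls opApply with the loop body as dg.
struct Matches
{
    string pattern, subject;

    int opApply(scope int delegate(Match) dg)
    {
        foreach (c; matchAll(subject, regex(pattern)))
        {
            immutable start = c.pre.length;   // bytes before this match
            if (auto r = dg(Match(start, start + c.hit.length)))
                return r;                     // loop body did break/return
        }
        return 0;
    }
}

void main()
{
    string mystring = "a1b22c333";
    foreach (Match m; Matches("[0-9]+", mystring))
        writefln("Found number: %s", mystring[m.start .. m.end]);
}
```

Run as written, this prints the runs 1, 22 and 333, one per line. Returning the delegate's non-zero result is what lets break inside the loop body stop the iteration.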
Feb 16 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"David Medlock" <noone nowhere.com> wrote in message 
news:dt2mpk$17aj$1 digitaldaemon.com...
 I haven't read this whole thread, but pardon me if this has been suggested.
 Why doesn't the regular expression stuff use foreach?

Why, indeed. Oskar has brought it up, and he and you are right. I'm going to reevaluate this based on the feedback in this thread.
Feb 16 2006
parent Hasan Aljudy <hasan.aljudy gmail.com> writes:
Walter Bright wrote:
 "David Medlock" <noone nowhere.com> wrote in message 
 news:dt2mpk$17aj$1 digitaldaemon.com...
 
I haven't read this whole thread, but pardon me if this has been suggested.
Why doesn't the regular expression stuff use foreach?

Why, indeed. Oskar has brought it up, and he and you are right. I'm going to reevaluate this based on the feedback in this thread.

I agree with the "foreach" point/suggestion .. IMO building regex into the language to the point where a ~~ expression automatically generates a "_match" variable is just going too far. A Match struct/class and a foreach implementation make it much more consistent and clean.
Feb 16 2006