digitalmars.D.announce - DMD 0.147 release

reply "Walter Bright" <newshound digitalmars.com> writes:
Added match expressions.

http://www.digitalmars.com/d/changelog.html
Feb 15 2006
next sibling parent reply Chr. Grade <Chr._member pathlink.com> writes:
Nifty feature. Would be handy if regex searches were included as well - for
continuous buffers and for chunked buffers.

Chr. Grade

In article <dt088d$1svm$1 digitaldaemon.com>, Walter Bright says...
Added match expressions.

http://www.digitalmars.com/d/changelog.html

Feb 15 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Chr. Grade" <Chr._member pathlink.com> wrote in message 
news:dt0ait$1v78$1 digitaldaemon.com...
 Nifty feature. Would be handy if regex searches be included as well - for
 continuous buffers and for chunked buffers.

Not sure what you mean?
Feb 15 2006
next sibling parent Derek Parnell <derek psych.ward> writes:
On Wed, 15 Feb 2006 14:49:14 -0800, Walter Bright wrote:

 "Chr. Grade" <Chr._member pathlink.com> wrote in message 
 news:dt0ait$1v78$1 digitaldaemon.com...
 Nifty feature. Would be handy if regex searches be included as well - for
 continuous buffers and for chunked buffers.

Not sure what you mean?

Oh, I'm positive I don't know what Chr. means :-)

What is a "continuous buffer" in this context? What is a "chunked buffer"?

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 10:11:17 AM
Feb 15 2006
prev sibling parent reply Chr. Grade <Chr._member pathlink.com> writes:
I obviously lack the necessary terminology.
Trying with pseudo-code:

// --- Situation: find matches, chunks of data in a list, no continuous
// buffer, memcpys to duplicate and concatenate data inefficient

slist List    = ...; // containers with some data differing in size
rxres Results = List ~~ "regex+";

// Hopefully indexed all potentially dangling matches between two
// chunks (?)...

while( !Results )
.. = Results.nFirst,
.. = Results.nLast,
.. = Results.get_ptr,
Results++;

// --- Situation: find matches in a continuous buffer:

utf16 Text[]  = ...;
rxres Results = Text ~~ "foo+";

while( !Results )
print( Results++ );

---

Maybe this explains what I meant, maybe it is just absurd.

Chr. Grade


In article <dt0b6l$1vpe$1 digitaldaemon.com>, Walter Bright says...
"Chr. Grade" <Chr._member pathlink.com> wrote in message 
news:dt0ait$1v78$1 digitaldaemon.com...
 Nifty feature. Would be handy if regex searches be included as well - for
 continuous buffers and for chunked buffers.

Not sure what you mean?

Feb 15 2006
parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 16 Feb 2006 00:13:16 +0000 (UTC), Chr. Grade wrote:

 I obviously lack the necessary terminology.
 Trying with pseudo-code:
 
 // --- Situation: find matches, chunks of data in a list, no continuous
 // buffer, memcpys to duplicate and concatenate data inefficient
 
 slist List    = ...; // containers with some data differing in size
 rxres Results = List ~~ "regex+";
 
 // Hopefully indexed all potentially dangling matches between two
 // chunks (?)...
 
 while( !Results )
 .. = Results.nFirst,
 .. = Results.nLast,
 .. = Results.get_ptr,
 Results++;
 
 // --- Situation: find matches in a continuous buffer:
 
 utf16 Text[]  = ...;
 rxres Results = Text ~~ "foo+";
 
 while( !Results )
 print( Results++ );
 
 ---
 
 Maybe this explains what I meant, maybe it is just absurd.
 

I'm really sorry, but this has just made it worse for me. I have absolutely no idea what you are trying to do or say.

Are you talking about a list of pointers to strings and searching over the referenced strings in one ~~ operation?

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 11:20:32 AM
Feb 15 2006
parent Chr. Grade <Chr._member pathlink.com> writes:
In article <4z8zsk5s3ozv$.1xsunk1521nn9.dlg 40tude.net>, Derek Parnell says...

 Maybe this explains what I meant, maybe it is just absurd.
 

I'm really sorry, but this has just made it worse for me. I have absolutely no idea what you are trying to do or say. Are you talking about a list of pointers to strings and searching over the referenced strings in one ~~ operation?

Yes, whole list in one operation, indexing matches. The regexp engine would have to do the pointer hopping as needed.

Here's an example of what I mean, but it can't handle discontinuous buffers:

www.boost.org/libs/regex/example/snippets/regex_search_example.cpp

The code there could be wrapped up in a class/struct which only exposes the iteration through the map/list with matches via overloaded operators.

Chr. Grade
-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 11:20:32 AM

Feb 15 2006
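
The "dangling match between two chunks" problem described above can be illustrated with a small sketch. It assumes the current Phobos module std.regex (not the std.regexp of this thread or the ~~ operator), and the function names and sample chunks are hypothetical. It only shows why per-chunk searching is not enough; it does not implement a copy-free chunked search, since it finds the straddling match by concatenating first, which is exactly the copy the proposal wants to avoid.

```d
import std.array : join;
import std.regex : matchFirst, regex;

/// True if some single chunk, searched on its own, contains a match.
bool matchesWithinOneChunk(string[] chunks, string pattern)
{
    auto re = regex(pattern);
    foreach (chunk; chunks)
        if (!matchFirst(chunk, re).empty)
            return true;
    return false;
}

/// First match in the concatenation of all chunks, or "" if there is none.
string firstMatchAcrossChunks(string[] chunks, string pattern)
{
    auto m = matchFirst(chunks.join(), regex(pattern));
    return m.empty ? "" : m.hit;
}

void main()
{
    // Hypothetical data: one logical text split in the middle of a match.
    string[] chunks = ["xxfo", "ooyy"];

    // Neither chunk contains "foo+" by itself ...
    assert(!matchesWithinOneChunk(chunks, "foo+"));

    // ... but the joined text does, straddling the chunk boundary.
    assert(firstMatchAcrossChunks(chunks, "foo+") == "fooo");
}
```

A real chunk-aware engine would have to carry matcher state across the boundary instead of joining the buffers.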
prev sibling next sibling parent reply clayasaurus <clayasaurus gmail.com> writes:
Nice, these match expressions make things really handy. At first I was 
confused on what I would use them for, but I started programming for a 
little bit and already found a use for them. Namely, assert(filename ~~ 
"*.wav"), assuming I understand it correctly.

Walter Bright wrote:
 Added match expressions.
 
 http://www.digitalmars.com/d/changelog.html
 
 
 

Feb 15 2006
next sibling parent clayasaurus <clayasaurus gmail.com> writes:
One thing I forgot to ask, do we have

this()
in
{
}
out
{
}
body
{
}

Yet? Thanks.
~ Clay

clayasaurus wrote:
 Nice, these match expressions make things really handy. At first I was 
 confused on what I would use them for, but I started programming for a 
 little bit and already found a use for them. Namely, assert(filename ~~ 
 "*.wav"), assuming I understand it correctly.
 
 Walter Bright wrote:
 Added match expressions.

 http://www.digitalmars.com/d/changelog.html


Feb 15 2006
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"clayasaurus" <clayasaurus gmail.com> wrote in message 
news:dt0dm6$228a$1 digitaldaemon.com...
 Nice, these match expressions make things really handy. At first I was 
 confused on what I would use them for, but I started programming for a 
 little bit and already found a use for them. Namely, assert(filename ~~ 
 "*.wav"), assuming I understand it correctly.

It's the other way around, the regexp is on the left. Also, operating system wildcard thing isn't the one used, it's real regular expressions from std.regexp. So you'd write it as:

     assert(".wav$" ~~ filename);

which means any string ending in ".wav". std.path.fnmatch() does operating system style wildcards like "*.wav" - I could make that work with the match expressions too if there's a desire (because operator overloading works with it!).

Another example of things you can do:

     assert("^abc" ~~ string);    // (1)

matches any string that starts with the string "abc". It's a little klunky to do otherwise,

     assert(string.length >= 3 && string[0..3] == "abc");    // (2)

Currently, evaluating ("^abc"~~string) invokes the full std.regexp machinery. But a compiler is free to optimize (1) into (2). I'm thinking of Eric and Don's examples of generating custom recognizers for static regex strings. This could make D's regex support into a real screamer.
Feb 15 2006
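
The equivalence between forms (1) and (2) above can be checked with a short sketch. It is written against the current Phobos std.regex rather than the ~~ operator, which is an assumption on my part, and the function names are hypothetical. ctRegex is used because it builds the matcher from a constant pattern at compile time, which is in the spirit of the "custom recognizers for static regex strings" idea, though it is not necessarily what was being planned here.

```d
import std.algorithm.searching : startsWith;
import std.regex : ctRegex, matchFirst;

/// Form (1): anchored regular expression.
bool startsWithAbcRegex(string s)
{
    auto re = ctRegex!(`^abc`);   // matcher generated at compile time
    return !matchFirst(s, re).empty;
}

/// Form (2): the hand-written length check plus slice comparison.
bool startsWithAbcSlice(string s)
{
    return s.length >= 3 && s[0 .. 3] == "abc";
}

void main()
{
    foreach (s; ["abc", "abcdef", "ab", "xabc", ""])
    {
        // Both forms agree, and both agree with the library helper.
        assert(startsWithAbcRegex(s) == startsWithAbcSlice(s));
        assert(startsWithAbcSlice(s) == s.startsWith("abc"));
    }
}
```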
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 15 Feb 2006 15:50:03 -0800, Walter Bright wrote:


 Also, operating system 
 wildcard thing isn't the one used, it's real regular expressions from 
 std.regexp. So you'd write it as:
 
     assert(".wav$" ~~ filename);
 
 which means any string ending in ".wav".

Should that be ...

     assert("\.wav$" ~~ filename);

otherwise it would also match things like "somefile.awav", because doesn't the "." in the regexp represent 'any-character'?

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 11:16:13 AM
Feb 15 2006
next sibling parent "Walter Bright" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:1fyp16zonzb9q$.1qxsxpiy1s1ry.dlg 40tude.net...
 On Wed, 15 Feb 2006 15:50:03 -0800, Walter Bright wrote:
 So you'd write it as:

     assert(".wav$" ~~ filename);

 which means any string ending in ".wav".

Should that be ... assert("\.wav$" ~~ filename); otherwise it would also match things like "somefile.awav" because doesn't the "." in the regexp represents 'any-character'.

Yes. <g>
Feb 15 2006
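
The difference between the unescaped and the escaped dot, and between regular expressions and OS-style wildcards, can be seen in a minimal sketch. It assumes the current Phobos modules std.regex and std.path (globMatch is the present-day counterpart of the fnmatch mentioned earlier); the function names are hypothetical.

```d
import std.path : globMatch;
import std.regex : matchFirst, regex;

/// Unescaped dot: "." matches any single character before "wav".
bool endsWithLoose(string name)
{
    return !matchFirst(name, regex(".wav$")).empty;
}

/// Escaped dot: only a literal "." before "wav" is accepted.
bool endsWithStrict(string name)
{
    return !matchFirst(name, regex(r"\.wav$")).empty;
}

/// OS-style wildcard, where "*" means "any run of characters".
bool globWav(string name)
{
    return globMatch(name, "*.wav");
}

void main()
{
    assert(endsWithLoose("song.wav"));
    assert(endsWithLoose("somefile.awav"));    // the false positive
    assert(endsWithStrict("song.wav"));
    assert(!endsWithStrict("somefile.awav"));
    assert(globWav("song.wav"));
    assert(!globWav("somefile.awav"));
}
```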
prev sibling parent reply clayasaurus <clayasaurus gmail.com> writes:
Derek Parnell wrote:
 On Wed, 15 Feb 2006 15:50:03 -0800, Walter Bright wrote:
 
 
 Also, operating system 
 wildcard thing isn't the one used, it's real regular expressions from 
 std.regexp. So you'd write it as:

     assert(".wav$" ~~ filename);

 which means any string ending in ".wav".

Should that be ... assert("\.wav$" ~~ filename); otherwise it would also match things like "somefile.awav" because doesn't the "." in the regexp represents 'any-character'.

Hrm. The compiler tells me it is an unidentified escape sequence.
Feb 16 2006
parent reply Sean Kelly <sean f4.ca> writes:
clayasaurus wrote:
 Derek Parnell wrote:
 On Wed, 15 Feb 2006 15:50:03 -0800, Walter Bright wrote:


 Also, operating system wildcard thing isn't the one used, it's real 
 regular expressions from std.regexp. So you'd write it as:

     assert(".wav$" ~~ filename);

 which means any string ending in ".wav".

Should that be ... assert("\.wav$" ~~ filename); otherwise it would also match things like "somefile.awav" because doesn't the "." in the regexp represents 'any-character'.

Hrm. The compiler tells me it is an unidentified escape sequence.

Try "\.wav$"r :-)

Sean
Feb 16 2006
next sibling parent "Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt2uj7$1e4l$1 digitaldaemon.com...
 Try "\.wav$"r :-)

r"\.wav$"
Feb 16 2006
prev sibling parent pragma <pragma_member pathlink.com> writes:
In article <dt2uj7$1e4l$1 digitaldaemon.com>, Sean Kelly says...
clayasaurus wrote:
 Derek Parnell wrote:
 On Wed, 15 Feb 2006 15:50:03 -0800, Walter Bright wrote:


 Also, operating system wildcard thing isn't the one used, it's real 
 regular expressions from std.regexp. So you'd write it as:

     assert(".wav$" ~~ filename);

 which means any string ending in ".wav".

Should that be ... assert("\.wav$" ~~ filename); otherwise it would also match things like "somefile.awav" because doesn't the "." in the regexp represents 'any-character'.

Hrm. The compiler tells me it is an unidentified escape sequence.

Try "\.wav$"r :-)

Or use backticks instead:

     assert(`\.wav$` ~~ filename);

- Eric Anderton at yahoo
Feb 17 2006
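
For reference, the three spellings discussed in this sub-thread denote the same six-character pattern. The sketch below uses only core language features (no library calls); the constant names are hypothetical. The plain "\.wav$" spelling is left out because the compiler rejects \. as an escape sequence, as reported above.

```d
// Ordinary literal: the backslash itself has to be escaped.
enum escaped = "\\.wav$";

// WYSIWYG literal with the r prefix: backslashes are taken literally.
enum rPrefix = r"\.wav$";

// WYSIWYG literal in backticks: same content again.
enum backtick = `\.wav$`;

// All three are the same string, checked at compile time.
static assert(escaped == rPrefix);
static assert(rPrefix == backtick);
static assert(escaped.length == 6);   // \ . w a v $

void main() {}
```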
prev sibling parent reply Chr. Grade <Chr._member pathlink.com> writes:
Currently, evaluating ("^abc"~~string) invokes the full std.regexp 
machinery. But a compiler is free to optimize (1) into (2). I'm thinking of 
Eric and Don's examples of generating custom recognizers for static regex 
strings. This could make D's regex support into a real screamer. 

Static regex? Umm...

Again, this might be absurd, but there could be a type "regex".

regex rxSome = "|&|=";
regex rxMore = "[a-n]";
regex rxMerge = "foo($rxSome)?($rxMore)+";

Whereas...

char[] cpSome = "...";
char[] cpMore = "...";

.. would lead to a less readable:

char[] cpMerge = "foo" . cpSome . "?" . cpMore . "+";

---

Chr. Grade
Feb 15 2006
parent "Walter Bright" <newshound digitalmars.com> writes:
"Chr. Grade" <Chr._member pathlink.com> wrote in message 
news:dt0gmk$24v3$1 digitaldaemon.com...
Currently, evaluating ("^abc"~~string) invokes the full std.regexp
machinery. But a compiler is free to optimize (1) into (2). I'm thinking 
of
Eric and Don's examples of generating custom recognizers for static regex
strings. This could make D's regex support into a real screamer.

Static regex? Umm... Again, this might be absurd, but there could be a type "regex". regex rxSome = "|&|="; regex rxMore = "[a-n]"; regex rxMerge = "foo($rxSome)?($rxMore)+";

---------------------------
import std.regexp;

auto rxSome = RegExp("|&|=");

if (rxSome ~~ "string") ...
-----------------------------

works now.
Feb 15 2006
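
Composing a larger pattern from smaller ones, as proposed above, can be sketched with plain string concatenation. Note that D concatenates with ~ rather than the . of the pseudo-code. The sketch assumes the current Phobos std.regex; mergePattern and the sample sub-patterns are hypothetical and chosen only for illustration.

```d
import std.regex : matchFirst, regex;

/// Builds "foo(<some>)?(<more>)+" from two sub-patterns.
string mergePattern(string some, string more)
{
    return "foo(" ~ some ~ ")?(" ~ more ~ ")+";
}

/// True if the merged pattern is found anywhere in text.
bool matchesMerged(string text, string some, string more)
{
    return !matchFirst(text, regex(mergePattern(some, more))).empty;
}

void main()
{
    // Hypothetical sub-patterns.
    enum some = "&|=";
    enum more = "[a-n]";

    assert(mergePattern(some, more) == "foo(&|=)?([a-n])+");
    assert(matchesMerged("foo=abc", some, more));
    assert(matchesMerged("fook", some, more));
    assert(!matchesMerged("fooz", some, more));   // 'z' is outside [a-n]
}
```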
prev sibling next sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Added match expressions.

Interesting. So where can I find documentation on pattern syntax? The docs for std.regexp don't seem to mention it. Is it just the classic textbook syntax, or are there differences?

Sean
Feb 15 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt0ds9$226k$1 digitaldaemon.com...
 Walter Bright wrote:
 Added match expressions.

Interesting. So where can I find documentation on pattern syntax? The docs for std.regexp doesn't seem to mention it. Is it just the classic textbook syntax, or are there differences?

There's a link in the std_regexp page to it:

www.digitalmars.com/ctg/regular.html

It's the classic syntax.
Feb 15 2006
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 15 Feb 2006 15:51:40 -0800, Walter Bright wrote:

 
 There's a link in the std_regexp page to it: 
 www.digitalmars.com/ctg/regular.html

There are a couple of problems with this link. It doesn't work when one uses the downloaded html docs. This is because it links to a file that is not part of the downloaded stuff. But more importantly, the syntax is wrong. The actual html you use is (notice the twin double quotes)

  <a href=""../../ctg/regular.html"">Regular expressions</a>

but it would be better to use something like ...

  <a href="http://www.digitalmars.com/ctg/regular.html">Regular expressions</a>

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 10:59:41 AM
Feb 15 2006
parent Sean Kelly <sean f4.ca> writes:
Derek Parnell wrote:
 On Wed, 15 Feb 2006 15:51:40 -0800, Walter Bright wrote:
 
 There's a link in the std_regexp page to it: 
 www.digitalmars.com/ctg/regular.html

There is a couple of problems with this link. It doesn't work when one uses the downloaded html docs. This is because it uses a link to a file that is not a part of the downloaded stuff. But more importantly, the syntax is wrong.

Got me. I'm looking at the online docs (http://www.digitalmars.com/d/phobos/std_regexp.html) and both links at the top of the page just link to std_regexp.html. Thus my question.

Sean
Feb 15 2006
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:dt0ds9$226k$1 digitaldaemon.com...
 Walter Bright wrote:
 Added match expressions.

docs for std.regexp doesn't seem to mention it. Is it just the classic textbook syntax, or are there differences?

There's a link in the std_regexp page to it: www.digitalmars.com/ctg/regular.html It's the classic syntax.

Awesome! This will take some getting used to, but it promises to be of tremendous use. Don't ask me why a built-in feature seems preferable to the same thing in library code, but it does :-p Perhaps some of it is that this will work for both compile-time and run-time evaluation, while the library version would likely be different for each.

Sean
Feb 15 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt0fp5$23qb$1 digitaldaemon.com...
 Awesome!  This will take some getting used to, but it promises to be of 
 tremendous use.  Don't ask me why a built-in feature seems preferable to 
 the same thing in library code, but it does :-p

I don't really know why either. std.regexp has been in D since day 1, but it's been completely overlooked, and I regularly get comments about D not doing regular expressions. If this is what it takes, then so be it <g>.
Feb 15 2006
parent jicman <jicman_member pathlink.com> writes:
Walter Bright says...
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt0fp5$23qb$1 digitaldaemon.com...
 Awesome!  This will take some getting used to, but it promises to be of 
 tremendous use.  Don't ask me why a built-in feature seems preferable to 
 the same thing in library code, but it does :-p

I don't really know why either. std.regexp has been in D since day 1, but it's been completely overlooked, and I regularly get comments about D not doing regular expressions. If this is what it takes, then so be it <g>.

Most of the programs that I've done with D use std.regexp, so I use it all the time.
Feb 16 2006
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 15 Feb 2006 13:52:12 -0800, Walter Bright wrote:

 Added match expressions.

Too lazy to test sorry. Do match expressions support Unicode or just ASCII?

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 10:44:23 AM
Feb 15 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:k0lbfijz1ng3$.7oaf5rf2w9ut$.dlg 40tude.net...
 On Wed, 15 Feb 2006 13:52:12 -0800, Walter Bright wrote:

 Added match expressions.

Too lazy to test sorry. Do match expressions support Unicode or just ASCII?

I know it works with ASCII, and it's supposed to work with UTF. I wouldn't be surprised if the latter is buggy, though, since I haven't written test cases for it. It's designed, however, so the compiler itself need know nothing about regular expressions. The work is all done by std.regexp.
Feb 15 2006
parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 15 Feb 2006 16:29:21 -0800, Walter Bright wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:k0lbfijz1ng3$.7oaf5rf2w9ut$.dlg 40tude.net...
 On Wed, 15 Feb 2006 13:52:12 -0800, Walter Bright wrote:

 Added match expressions.

Too lazy to test sorry. Do match expressions support Unicode or just ASCII?

I know it works with ASCII, and it's supposed to work with UTF. I wouldn't be surprised if the latter is buggy, though, since I haven't written test cases for it. It's designed, however, so the compiler itself need know nothing about regular expressions. The work is all done by std.regexp.

Seems to be working, but more unittests could be written.

void main()
{
    assert( "\uff16" ~~ "\u2341\uff16" );  // succeeds correctly
    //assert( "\xff" ~~ "\u2341\uff16" );  // fails correctly
    //assert( "^\uff16" ~~ "\u2341\uff16" );  // fails correctly
    assert( "\uff16$" ~~ "\u2341\uff16" );  // succeeds correctly
}

BTW, one side effect of the new matching syntax is that you don't have to explicitly import std.regexp.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 11:56:43 AM
Feb 15 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg 40tude.net...
 Seems to be working, but more unittests could be written.

 void main()
 {
    assert( "\uff16" ~~ "\u2341\uff16" );  // succeeds correctly
    //assert( "\xff" ~~ "\u2341\uff16" );  // fails correctly
    //assert( "^\uff16" ~~ "\u2341\uff16" );  // fails correctly
    assert( "\uff16$" ~~ "\u2341\uff16" );  // succeeds correctly
 }

You can use !~ for the fails cases.
 BTW, one side effect of the new matching syntax is that you don't have to
 explicitly import std.regexp.

That was on purpose. It uses a proxy.
Feb 15 2006
next sibling parent Derek Parnell <derek psych.ward> writes:
On Wed, 15 Feb 2006 17:13:48 -0800, Walter Bright wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg 40tude.net...
 Seems to be working, but more unittests could be written.


And here they are ...

void main()
{
    char[] target = "\u2341\u2201\uff16";
    assert( "\xff" !~ target ); // fails correctly
    assert( "\x22" !~ target ); // fails correctly
    assert( ".\x22." !~ target ); // fails correctly
    assert( "\uff16" ~~ target ); // succeeds correctly
    assert( "^\uff16" !~ target ); // fails correctly
    assert( "\uff16$" ~~ target ); // succeeds correctly
    assert( "\u2341" ~~ target ); // succeeds correctly
    assert( "^\u2341" ~~ target ); // succeeds correctly
    assert( "\u2341$" !~ target ); // fails correctly
    assert( "\u2201" ~~ target ); // succeeds correctly
    assert( "^\u2201" !~ target ); // fails correctly
    assert( "\u2201$" !~ target ); // fails correctly
    assert( "\u2201\uff16" ~~ target ); // succeeds correctly
    assert( "^\u2201\uff16" !~ target ); // fails correctly
    assert( "\u2201\uff16$" ~~ target ); // succeeds correctly
    assert( "\u2341\u2201" ~~ target ); // succeeds correctly
    assert( "^\u2341\u2201" ~~ target ); // succeeds correctly
    assert( "\u2341\u2201$" !~ target ); // fails correctly
    assert( "\u2341\u2201\uff16" ~~ target ); // succeeds correctly
    assert( "^\u2341\u2201\uff16" ~~ target ); // succeeds correctly
    assert( "\u2341\u2201\uff16$" ~~ target ); // succeeds correctly
    assert( "^\u2341\u2201\uff16$" ~~ target ); // succeeds correctly
    //assert( "\u2341.\uff16" ~~ target ); // fails
    //assert( "^\u2341.\uff16" ~~ target ); // fails
    //assert( "\u2341.\uff16$" ~~ target ); // fails
    //assert( "^\u2341.\uff16$" ~~ target ); // fails
    assert( "\u2341.." ~~ target ); // succeeds correctly
    assert( "^\u2341.." ~~ target ); // succeeds correctly
    //assert( "\u2341..$" ~~ target ); // fails
    //assert( "^\u2341..$" ~~ target ); // fails
    assert( ".." ~~ target ); // succeeds correctly
    assert( "^.." ~~ target ); // succeeds correctly
    assert( "..$" ~~ target ); // succeeds correctly
    assert( "^..$" !~ target ); // fails correctly
    assert( "..\uff16" ~~ target ); // succeeds correctly
    //assert( "^..\uff16" ~~ target ); // fails
    assert( "..\uff16$" ~~ target ); // succeeds correctly
    //assert( "^..\uff16$" ~~ target ); // fails
    assert( "..." ~~ target ); // succeeds correctly
    assert( "^..." ~~ target ); // succeeds correctly
    assert( "...$" ~~ target ); // succeeds correctly
    //assert( "^...$" ~~ target ); // fails
}

It seems that the pattern "." only tries to match a single byte and not a single character.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
16/02/2006 1:16:12 PM
Feb 15 2006
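
The byte-versus-character distinction behind these results can be made concrete. The sketch below assumes the current Phobos std.regex, which decodes UTF-8 into code points before matching; it says nothing about how the std.regexp of this release behaves beyond what the tests above report. The helper name matchesExactlyDots is hypothetical.

```d
import std.array : replicate;
import std.range : walkLength;
import std.regex : matchFirst, regex;

/// True if target consists of exactly n "." matches from start to end.
bool matchesExactlyDots(string target, size_t n)
{
    auto re = regex("^" ~ replicate(".", n) ~ "$");
    return !matchFirst(target, re).empty;
}

void main()
{
    // Same test string as above: three characters, each 3 bytes in UTF-8.
    string target = "\u2341\u2201\uff16";

    assert(target.length == 9);        // length counts UTF-8 code units
    assert(target.walkLength == 3);    // walking the string counts code points

    // A code-point-aware engine sees three characters, not nine bytes.
    assert(matchesExactlyDots(target, 3));
    assert(!matchesExactlyDots(target, 9));
}
```

A byte-oriented "." would give the opposite answers for the last two assertions, which is consistent with the failing cases listed above.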
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 "Derek Parnell" <derek psych.ward> wrote in message 
 news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg 40tude.net...
 
 BTW, one side effect of the new matching syntax is that you don't have to
 explicitly import std.regexp.

That was on purpose. It uses a proxy.

As cool as this is, I don't entirely like the prospect of cutting yet more ties between standard library components and runtime code. My approach with Ares has been to separate the two, which until now has meant moving only std.utf into the DMD runtime. Now it looks like std.regexp will end up there as well (along with std.outbuffer perhaps).

With the new language features, is there any reason to continue regex library support? Just how much can't be done by the built-in syntax?

Sean
Feb 15 2006
parent reply "Kris" <fu bar.com> writes:
"Sean Kelly" <sean f4.ca> wrote ...
 Walter Bright wrote:
 "Derek Parnell" <derek psych.ward> wrote in message 
 news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg 40tude.net...

 BTW, one side effect of the new matching syntax is that you don't have 
 to
 explicitly import std.regexp.

That was on purpose. It uses a proxy.

As cool as this is, I don't entirely like the prospect of cutting yet more ties between standard library components and runtime code. My approach with Ares has been to separate the two, which until now has meant moving only std.utf into the DMD runtime. Now it looks like std.regex will end up there as well (along with std.outbuffer perhaps). With the new language features, is there any reason to continue regex library support? Just how much can't be done by the built-in syntax?

I agree. And it's hard to fathom what the sudden rush to get this is about. I listed a number of (IMO) serious issues on the main forum, so I'll add my support here that hooking RegExp (and all its various imports) into the compiler is just bad news *at this point in time*

Let's just suppose for a minute that the regex-templates work out well. It seems to me that any built-in support for regex (within the D grammar) would be nothing more than a thin veneer over the template syntax (for regex-templates), to make it somewhat more palatable for the masses? That may not come to pass, but it seems that we should at least wait until there's a bit of education and experience in this regard, rather than hurriedly tie the grammar to something which clearly has a number of fundamental problems.

Again; what's the big rush here?

- Kris
Feb 15 2006
parent reply Sean Kelly <sean f4.ca> writes:
Kris wrote:
 "Sean Kelly" <sean f4.ca> wrote ...
 Walter Bright wrote:
 "Derek Parnell" <derek psych.ward> wrote in message 
 news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg 40tude.net...

 BTW, one side effect of the new matching syntax is that you don't have 
 to
 explicitly import std.regexp.


ties between standard library components and runtime code. My approach with Ares has been to separate the two, which until now has meant moving only std.utf into the DMD runtime. Now it looks like std.regex will end up there as well (along with std.outbuffer perhaps). With the new language features, is there any reason to continue regex library support? Just how much can't be done by the built-in syntax?

I agree. And it's hard to fathom what the sudden rush to get this is about. I listed a number of (IMO) serious issues on the main forum, so I'll add my support here that hooking RegExp (and all its various imports) into the compiler is just bad news *at this point in time*

I'm willing to let the new language feature mature in place. And while I think it's unnecessary given that it can be done just as well in library code, there's something about making regex handling a first class citizen that increases its appeal. However, though I can understand Walter's desire to leverage existing code, I think it's a terrible mistake to make language features rely on library code, even if the relationship is concealed. I don't think this is an issue for D in general (since the language spec obviously doesn't require this) so much as DMD specifically however.

For example, the GC code currently imports std.thread to do various things. Now let's say that the private implementation of std.thread changes, and the changes have an impact on inlinable functions. If the GC code isn't rebuilt, and if it was compiled with the -inline option set, all hell could break loose. More generally however, such ties make it very difficult for third party library writers to provide alternate standard library implementations to work with DMD (similar to STLPort or STLSoft for C++), because the compiler runtime must be rebuilt to operate with any new library used. And it's difficult to be certain just what low-level features the runtime may rely on without well-defined points of interaction. This is something I'm completely unfamiliar with coming from a C/C++ background, and it makes me wonder if any other compiled languages are like this as well.

Personally, I would love it if more attention were paid to defining necessary library interaction in D. This is probably the most significant thing I've done in Ares and is what I think gives Ares the most credibility as a replacement standard library. And while I would love for Walter to assume control of the DMD runtime and GC portions, doing so would require some care (and discussion) given to how language features such as regex are implemented: does the runtime truly need to interact with the standard library? If so, how?

Implicit UTF conversions during foreach seems a reasonable language feature as such code is relatively simple to implement, but regular expression processing is somewhat complicated. Is this a language feature that may be ignored by compilers that target embedded processors, simply because of size/complexity? Can I expect to see shoddy regex implementations in some compilers such that I'm inclined to use a library implementation anyway? My real concern isn't D now so much as D five years from now--I like the language enough that I really want it to succeed :-)
 Let's just suppose for a minute that the regex-templates work out well. It 
 seems to me that any built-in support for regex (within the D grammar) would 
 be nothing more than a thin veneer over the template syntax (for 
 regex-templates), to make it somewhat  more palatable for the masses? That 
 may not come to pass, but it seems that we should at least wait until 
 there's a bit of education and experience in this regard, rather than 
 hurriedly tie the grammar to something which clearly has a number of 
 fundamental problems. Again; what's the big rush here?

It makes an odd sort of sense that a language designed by a compiler writer should have built-in regex support. And it seems like Walter has been thinking about this for a while, so I'm willing to see how it goes. But as a library writer, the way that Walter implements these changes gives me fits ;-)

Sean
Feb 16 2006
next sibling parent kris <fu bar.org> writes:
Sean Kelly wrote:
 Kris wrote:
 
 "Sean Kelly" <sean f4.ca> wrote ...

 Walter Bright wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg 40tude.net...

 BTW, one side effect of the new matching syntax is that you don't 
 have to
 explicitly import std.regexp.

That was on purpose. It uses a proxy.

As cool as this is, I don't entirely like the prospect of cutting yet more ties between standard library components and runtime code. My approach with Ares has been to separate the two, which until now has meant moving only std.utf into the DMD runtime. Now it looks like std.regex will end up there as well (along with std.outbuffer perhaps). With the new language features, is there any reason to continue regex library support? Just how much can't be done by the built-in syntax?

I agree. And it's hard to fathom what the sudden rush to get this is about. I listed a number of (IMO) serious issues on the main forum, so I'll add my support here that hooking RegExp (and all its various imports) into the compiler is just bad news *at this point in time*

I'm willing to let the new language feature mature in place. And while I think it's unnecessary given that it can be done just as well in library code, there's something about making regex handling a first class citizen that increases its appeal. However, though I can understand Walter's desire to leverage existing code, I think it's a terrible mistake to make language features rely on library code, even if the relationship is concealed. I don't think this is an issue for D in general (since the language spec obviously doesn't require this) so much as DMD specifically however. For example, the GC code currently imports std.thread to do various things. Now let's say that the private implementation of std.thread changes, and the changes have an impact on inlinable functions. If the GC code isn't rebuilt, and if it was compiled with the -inline option set, all hell could break loose. More generally however, such ties make it very difficult for third party library writers to provide alternate standard library implementations to work with DMD (similar to STLPort or STLSoft for C++), because the compiler runtime must be rebuilt to operate with any new library used. And it's difficult to be certain just what low-level features the runtime may rely on without well-defined points of interaction. This is something I'm completely unfamiliar with coming from a C/C++ background, and it makes me wonder if any other compiled languages are like this as well. Personally, I would love it if more attention were paid to defining necessary library interaction in D. This is probably the most significant thing I've done in Ares and is what I think gives Ares the most credibility as a replacement standard library. And while I would love for Walter to assume control of the DMD runtime and GC portions, doing so would require some care (and discussion) given to how language features such as regex are implemented: does the runtime truly need to interact with the standard library? If so, how? 
Implicit UTF conversions during foreach seems a reasonable language feature as such code is relatively simple to implement, but regular expression processing is somewhat complicated. Is this a language feature that may be ignored by compilers that target embedded processors, simply because of size/complexity? Can I expect to see shoddy regex implementations in some compilers such that I'm inclined to use a library implementation anyway? My real concern isn't D now so much as D five years from now--I like the language enough that I really want it to succeed :-)
 Let's just suppose for a minute that the regex-templates work out 
 well. It seems to me that any built-in support for regex (within the D 
 grammar) would be nothing more than a thin veneer over the template 
 syntax (for regex-templates), to make it somewhat  more palatable for 
 the masses? That may not come to pass, but it seems that we should at 
 least wait until there's a bit of education and experience in this 
 regard, rather than hurriedly tie the grammar to something which 
 clearly has a number of fundamental problems. Again; what's the big 
 rush here?

It makes an odd sort of sense that a language designed by a compiler writer should have built-in regex support. And it seems like Walter has been thinking about this for a while, so I'm willing to see how it goes. But as a library writer, the way that Walter implements these changes gives me fits ;-) Sean

Amen. I made an eerily similar post on the D forum.
Feb 16 2006
prev sibling parent reply Kyle Furlong <kylefurlong gmail.com> writes:
Sean Kelly wrote:
 Kris wrote:
 "Sean Kelly" <sean f4.ca> wrote ...
 Walter Bright wrote:
 "Derek Parnell" <derek psych.ward> wrote in message 
 news:fdfenjm7wj46.1xmq12pyjxp8c$.dlg 40tude.net...

 BTW, one side effect of the new matching syntax is that you don't 
 have to
 explicitly import std.regexp.


more ties between standard library components and runtime code. My approach with Ares has been to separate the two, which until now has meant moving only std.utf into the DMD runtime. Now it looks like std.regex will end up there as well (along with std.outbuffer perhaps). With the new language features, is there any reason to continue regex library support? Just how much can't be done by the built-in syntax?

I agree. And it's hard to fathom what the sudden rush to get this is about. I listed a number of (IMO) serious issues on the main forum, so I'll add my support here that hooking RegExp (and all its various imports) into the compiler is just bad news *at this point in time*

I'm willing to let the new language feature mature in place. And while I think it's unnecessary given that it can be done just as well in library code, there's something about making regex handling a first class citizen that increases its appeal. However, though I can understand Walter's desire to leverage existing code, I think it's a terrible mistake to make language features rely on library code, even if the relationship is concealed. I don't think this is an issue for D in general (since the language spec obviously doesn't require this) so much as DMD specifically however. For example, the GC code currently imports std.thread to do various things. Now let's say that the private implementation of std.thread changes, and the changes have an impact on inlinable functions. If the GC code isn't rebuilt, and if it was compiled with the -inline option set, all hell could break loose. More generally however, such ties make it very difficult for third party library writers to provide alternate standard library implementations to work with DMD (similar to STLPort or STLSoft for C++), because the compiler runtime must be rebuilt to operate with any new library used. And it's difficult to be certain just what low-level features the runtime may rely on without well-defined points of interaction. This is something I'm completely unfamiliar with coming from a C/C++ background, and it makes me wonder if any other compiled languages are like this as well. Personally, I would love it if more attention were paid to defining necessary library interaction in D. This is probably the most significant thing I've done in Ares and is what I think gives Ares the most credibility as a replacement standard library. And while I would love for Walter to assume control of the DMD runtime and GC portions, doing so would require some care (and discussion) given to how language features such as regex are implemented: does the runtime truly need to interact with the standard library? If so, how? 
Implicit UTF conversions during foreach seems a reasonable language feature as such code is relatively simple to implement, but regular expression processing is somewhat complicated. Is this a language feature that may be ignored by compilers that target embedded processors, simply because of size/complexity? Can I expect to see shoddy regex implementations in some compilers such that I'm inclined to use a library implementation anyway? My real concern isn't D now so much as D five years from now--I like the language enough that I really want it to succeed :-)
 Let's just suppose for a minute that the regex-templates work out 
 well. It seems to me that any built-in support for regex (within the D 
 grammar) would be nothing more than a thin veneer over the template 
 syntax (for regex-templates), to make it somewhat  more palatable for 
 the masses? That may not come to pass, but it seems that we should at 
 least wait until there's a bit of education and experience in this 
 regard, rather than hurriedly tie the grammar to something which 
 clearly has a number of fundamental problems. Again; what's the big 
 rush here?

It makes an odd sort of sense that a language designed by a compiler writer should have built-in regex support. And it seems like Walter has been thinking about this for a while, so I'm willing to see how it goes. But as a library writer, the way that Walter implements these changes gives me fits ;-) Sean

We (the Titan team) have run into this sort of issue with Titan. When trying to untangle the dmd runtime code, we have found huge reliance on library code, both libc and phobos. Another issue that makes porting difficult is the lack of a standard definition of what language features expand to what runtime functions. These two things have made dealing with the dmd runtime extremely hackish and untenable. Walter, for the love of Bob, do something about this.
Feb 16 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Kyle Furlong wrote:
 
 We (the Titan team) have run into this sort of issue with Titan. When 
 trying to untangle the dmd runtime code, we have found huge reliance on 
 library code, both libc and phobos. Another issue that makes porting 
 difficult is the lack of a standard definition of what language features 
 expand to what runtime functions. These two things have made dealing 
 with the dmd runtime extremely hackish and untenable. Walter, for the 
 love of Bob, do something about this.

In the meantime, I suggest using Ares as a starting point. It still uses libc functionality (which I think is unavoidable), but should otherwise be much cleaner to build off of. If you have any questions about library interaction (since I've yet to document a lot of this) feel free to ask in the forums. Sean
Feb 16 2006
parent "Kris" <fu bar.com> writes:
"Sean Kelly" <sean f4.ca> wrote
 Kyle Furlong wrote:
 We (the Titan team) have run into this sort of issue with Titan. When 
 trying to untangle the dmd runtime code, we have found huge reliance on 
 library code, both libc and phobos. Another issue that makes porting 
 difficult is the lack of a standard definition of what language features 
 expand to what runtime functions. These two things have made dealing with 
 the dmd runtime extremely hackish and untenable. Walter, for the love of 
 Bob, do something about this.

In the meantime, I suggest using Ares as a starting point. It still uses libc functionality (which I think is unavoidable), but should otherwise be much cleaner to build off of.

I'll second that. Ares is as good an isolation of D compiler support as you're likely to see anywhere.
Feb 16 2006
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Kyle Furlong" <kylefurlong gmail.com> wrote in message 
news:dt2mkm$17aa$1 digitaldaemon.com...
 We (the Titan team) have run into this sort of issue with Titan. When 
 trying to untangle the dmd runtime code, we have found huge reliance on 
 library code, both libc and phobos.

Sure, but I'm not sure why this is a problem.
 Another issue that makes porting difficult is the lack of a standard 
 definition of what language features expand to what runtime functions. 
 These two things have made dealing with the dmd runtime extremely hackish 
 and untenable. Walter, for the love of Bob, do something about this.

Feb 16 2006
parent reply Kyle Furlong <kylefurlong gmail.com> writes:
Walter Bright wrote:
 "Kyle Furlong" <kylefurlong gmail.com> wrote in message 
 news:dt2mkm$17aa$1 digitaldaemon.com...
 We (the Titan team) have run into this sort of issue with Titan. When 
 trying to untangle the dmd runtime code, we have found huge reliance on 
 library code, both libc and phobos.

Sure, but I'm not sure why this is a problem.
 Another issue that makes porting difficult is the lack of a standard 
 definition of what language features expand to what runtime functions. 
 These two things have made dealing with the dmd runtime extremely hackish 
 and untenable. Walter, for the love of Bob, do something about this.


It basically forces us to write our own libc. And yes one can argue that libc is as necessary as air for any platform, but I'm of the purist type, and feel that the runtime should be self contained. You also did not respond to my request for a runtime standard, is this something you are unwilling to do? i.e. is each vendor's runtime going to be completely incompatible with others?
Feb 16 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Kyle Furlong" <kylefurlong gmail.com> wrote in message
news:dt3gn8$1sq0$1 digitaldaemon.com...
 Walter Bright wrote:
 "Kyle Furlong" <kylefurlong gmail.com> wrote in message
 news:dt2mkm$17aa$1 digitaldaemon.com...
 We (the Titan team) have run into this sort of issue with Titan. When
 trying to untangle the dmd runtime code, we have found huge reliance on
 library code, both libc and phobos.

Sure, but I'm not sure why this is a problem.
 Another issue that makes porting difficult is the lack of a standard
 definition of what language features expand to what runtime functions.
 These two things have made dealing with the dmd runtime extremely
 hackish and untenable. Walter, for the love of Bob, do something about
 this.


It basically forces us to write our own libc. And yes one can argue that libc is as necessary as air for any platform, but I'm of the purist type, and feel that the runtime should be self contained.

Since D is designed to interface directly to C, the C runtime is kind of a given. I also don't see much point in reimplementing things like strlen, strtod, etc. These have been around for decades, they're well optimized, and bug free.
 You also did not respond to my request for a runtime standard, is this
 something you are unwilling to do? i.e. is each vendor's runtime going to
 be completely incompatible with others?

I'm not sure what you're asking. Are you asking if Phobos is a D standard?
Feb 16 2006
next sibling parent reply Kyle Furlong <kylefurlong gmail.com> writes:
Walter Bright wrote:
 "Kyle Furlong" <kylefurlong gmail.com> wrote in message
 news:dt3gn8$1sq0$1 digitaldaemon.com...
 Walter Bright wrote:
 "Kyle Furlong" <kylefurlong gmail.com> wrote in message
 news:dt2mkm$17aa$1 digitaldaemon.com...
 We (the Titan team) have run into this sort of issue with Titan. When
 trying to untangle the dmd runtime code, we have found huge reliance on
 library code, both libc and phobos.

 Another issue that makes porting difficult is the lack of a standard
 definition of what language features expand to what runtime functions.
 These two things have made dealing with the dmd runtime extremely
 hackish and untenable. Walter, for the love of Bob, do something about
 this.


libc is as necessary as air for any platform, but I'm of the purist type, and feel that the runtime should be self contained.

Since D is designed to interface directly to C, the C runtime is kind of a given. I also don't see much point in reimplementing things like strlen, strtod, etc. These have been around for decades, they're well optimized, and bug free.
 You also did not respond to my request for a runtime standard, is this
 something you are unwilling to do? i.e. is each vendor's runtime going to
 be completely incompatible with others?

I'm not sure what you're asking. Are you asking if Phobos is a D standard?

So your dmd compiler emits references to the _d_whatever extern(C) functions in the runtime, correct? I'm asking if this is going to be a standard, part of the spec, or vendor specific.
Feb 16 2006
parent "Walter Bright" <newshound digitalmars.com> writes:
"Kyle Furlong" <kylefurlong gmail.com> wrote in message 
news:dt3oer$22lg$1 digitaldaemon.com...
 So your dmd compiler emits references to the _d_whatever extern(C) 
 functions in the runtime, correct?

Yes.
 I'm asking if this is going to be a standard, part of the spec, or vendor 
 specific.

All the stuff in the internal package is meant to be vendor specific. So, yes. For another vendor, the names may change, there may be more or fewer such routines.
Feb 17 2006
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 "Kyle Furlong" <kylefurlong gmail.com> wrote in message
 news:dt3gn8$1sq0$1 digitaldaemon.com...
 
 You also did not respond to my request for a runtime standard, is this
 something you are unwilling to do? i.e. is each vendor's runtime going to
 be completely incompatible with others?

I'm not sure what you're asking. Are you asking if Phobos is a D standard?

I think Kyle is wondering whether a compiler writer could simply sit down and write a D compiler, sans standard library, given the D spec. I think this is possible insofar as language features are concerned, but it may be less clear regarding any points of contact between the runtime and the GC or standard library code. For example, internal/gc/gc.d contains a bunch of extern (C) functions (prefixed with "_d_") which probably really belong to the runtime code. But if this is true, what are the points of contact between the runtime and GC? Using Phobos as a guide, one might think a compiler writer must provide a garbage collector as a part of the runtime, while I consider these logically separate libraries. I think the real goal here is to define a clear separation of labor, so a compiler writer can do his part, a library writer his part, a platform writer his part, and each can be assured that if they follow the spec then their libraries should link against any implementation of the other pieces and work without error. This is really what I'm making an effort to define, and why I fussed so much over this RegExp business. Until this last release, things had been distilled to a few well-defined extern (C) functions--no need for imports at all :-) Sean
Feb 16 2006
next sibling parent Kyle Furlong <kylefurlong gmail.com> writes:
Sean Kelly wrote:
 Walter Bright wrote:
 "Kyle Furlong" <kylefurlong gmail.com> wrote in message
 news:dt3gn8$1sq0$1 digitaldaemon.com...

 You also did not respond to my request for a runtime standard, is this
 something you are unwilling to do? i.e. is each vendor's runtime 
 going to
 be completely incompatible with others?

I'm not sure what you're asking. Are you asking if Phobos is a D standard?

I think Kyle is wondering whether a compiler writer could simply sit down and write a D compiler, sans standard library, given the D spec. I think this is possible insofar as language features are concerned, but it may be less clear regarding any points of contact between the runtime and the GC or standard library code. For example, internal/gc/gc.d contains a bunch of extern (C) functions (prefixed with "_d_") which probably really belong to the runtime code. But if this is true, what are the points of contact between the runtime and GC? Using Phobos as a guide, one might think a compiler writer must provide a garbage collector as a part of the runtime, while I consider these logically separate libraries. I think the real goal here is to define a clear separation of labor, so a compiler writer can do his part, a library writer his part, a platform writer his part, and each can be assured that if they follow the spec then their libraries should link against any implementation of the other pieces and work without error. This is really what I'm making an effort to define, and why I fussed so much over this RegExp business. Until this last release, things had been distilled to a few well-defined extern (C) functions--no need for imports at all :-) Sean

Well put Sean, I would be very interested in Walter's take on these issues.
Feb 16 2006
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt3phq$23iu$1 digitaldaemon.com...
 I think Kyle is wondering whether a compiler writer could simply sit down 
 and write a D compiler, sans standard library, given the D spec.  I think 
 this is possible insofar as language features are concerned, but it may be 
 less clear regarding any points of contact between the runtime and the GC 
 or standard library code.  For example, internal/gc/gc.d contains a bunch 
 of extern (C) functions (prefixed with "_d_") which probably really belong 
 to the runtime code.  But if this is true, what are the points of contact 
 between the runtime and GC?

For the language implementor, the stuff in std.gc. How operator new interfaces with the gc is up to the language implementor.
 Using Phobos as a guide, one might think a compiler writer must provide a 
 garbage collector as a part of the runtime, while I consider these 
 logically separate libraries.  I think the real goal here is to define a 
 clear separation of labor, so a compiler writer can do his part, a library 
 writer his part, a platform writer his part, and each can be assured that 
 if they follow the spec then their libraries should link against any 
 implementation of the other pieces and work without error.  This is really 
 what I'm making an effort to define, and why I fussed so much over this 
 RegExp business.  Until this last release, things had been distilled to a 
 few well-defined extern (C) functions--no need for imports at all :-)

Feb 17 2006
parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:dt3phq$23iu$1 digitaldaemon.com...
 I think Kyle is wondering whether a compiler writer could simply sit down 
 and write a D compiler, sans standard library, given the D spec.  I think 
 this is possible insofar as language features are concerned, but it may be 
 less clear regarding any points of contact between the runtime and the GC 
 or standard library code.  For example, internal/gc/gc.d contains a bunch 
 of extern (C) functions (prefixed with "_d_") which probably really belong 
 to the runtime code.  But if this is true, what are the points of contact 
 between the runtime and GC?

For the language implementor, the stuff in std.gc. How operator new interfaces with the gc is up to the language implementor.

But what if the user wants to employ a non-standard GC? There have already been questions about this for real-time programming and other specialized applications. While I'm coming to understand your argument about the necessary reliance of runtime code on library code, I do believe that D can only benefit if the scope of this reliance and the means of interaction are well-defined. You've mentioned that, according to their specs, the C/C++ libraries are inextricably intertwined with the compiler definition, and have said that you consider this something you've sought to fix in D. And while I don't have the experience with writing C/C++ compilers that you do (and therefore have little exposure to this particular issue), it does seem we somewhat agree on what the correct approach for library design should be.

As a point of discussion, I'd like to outline what I've done with Ares thus far.

First, it's important to note that I consider the runtime to be a distinct library containing anything required for basic language support, the garbage collector similarly separated and devoted to memory management, and the standard library as a third distinct entity which contains all components and interfaces the user is expected to actually interact with. Phobos already has this basic separation, but the points of interaction between each component aren't particularly well-defined. For example, if someone wants to provide a specialized garbage collector, what does he do? A bit of research will reveal that some modules from internal/gc should be removed and a new class of type GC should be created, but this requires more interaction with low-level code than most users want to have.

Second, I believe it's important that the need to import modules across these library boundaries should be avoided if at all possible, as doing so creates a compile-time dependency between them. Also, it seems logical to assume that the runtime and GC code might not be written in D at all, so the points of interaction should be equally accessible from other languages, implying that all such points of interaction should be extern (C) functions. This also has the side-benefit of allowing the functions to be declared in the module they're called, as the name mangling scheme ignores declaration placement.

Since the purpose of a garbage collector is to allocate and manage memory, I see little need to extend its interface beyond this. Therefore I consider the "_d_" prefixed calls in internal/gc/gc.d to really belong to the runtime, where I've moved them. Currently, a GC library in Ares is required to expose these functions (which are wrapped in a static class instance for user access in the standard library):

    extern (C) void gc_init();
    extern (C) void gc_term();

    alias void function( void *p, void *dummy ) gc_finalizer;
    extern (C) void gc_setFinalizer( void *p, gc_finalizer fn );

    extern (C) void gc_enable();
    extern (C) void gc_disable();
    extern (C) void gc_collect();

    extern (C) void* gc_malloc( size_t sz );
    extern (C) void* gc_calloc( size_t nm, size_t sz );
    extern (C) void* gc_realloc( void* p, size_t sz );
    extern (C) void gc_free( void* p );

    extern (C) size_t gc_sizeOf( void* p );
    extern (C) size_t gc_capacityOf( void* p );

    extern (C) void gc_addRoot( void* p );
    extern (C) void gc_addRange( void* pbeg, void* pend );

    extern (C) void gc_removeRoot( void* p );
    extern (C) void gc_removeRange( void* pbeg, void* pend );

    extern (C) void gc_pin( void* p );
    extern (C) void gc_unpin( void* p );

I've also considered requiring that the runtime expose an os_getStaticDataSegment function and potentially other OS-specific memory related functions, but haven't gotten around to it so far.

The remaining points of interaction are all provided by the standard library. First, the callbacks for runtime errors, all of which are expected to throw exceptions as default behavior (though onAssert can be hooked at run-time if the user wishes to signal the debugger or something similar):

    extern (C) void onAssert( char[] file, uint line );
    extern (C) void onOutOfMemory();
    extern (C) void onArrayBoundsError( char[] file, size_t line );
    extern (C) void onSwitchError( char[] file, size_t line );
    extern (C) void onInvalidUtfError( char[] msg, size_t idx );

And then a way to monitor and control multithreading for debugging or GC use:

    extern (C) bit multiThreaded();
    extern (C) void suspendAllThreads();
    extern (C) void resumeAllThreads();
    extern (C) void scanAllThreads( void delegate( void*, void* ) fn );

Also, I suspect I'll now be adding a set of functions for RegExp interaction, but haven't done that yet.

So you can see that, so far, there has been no need to import any modules across library boundaries--all imports are either internal or of C headers (which can be easily declared in the module they're called if desired). I think Phobos could ultimately benefit from such an arrangement, but it's really not critical at this point.

Sean
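[Editorial sketch, not part of the original post.] The three-way split described above can be illustrated with a small single-module example in present-day D. Everything prefixed with sketch_ is a hypothetical name chosen to avoid clashing with real runtime symbols; the functions only mirror the shape of gc_malloc, gc_calloc, gc_free and onOutOfMemory from the list above. The "GC" here does no collection at all and simply forwards to the C allocator, so this is a sketch of the interaction points under those assumptions, not of how Ares implements them.

```d
import core.stdc.stdlib : calloc, free, malloc;

// ---- "Standard library" side: the error callback ------------------------
// Hypothetical exception type; the runtime and GC never name it directly.
class SketchOutOfMemory : Exception
{
    this() { super("out of memory"); }
}

// Default behavior is to throw, as the post describes for the on* callbacks.
extern (C) void sketch_onOutOfMemory()
{
    throw new SketchOutOfMemory();
}

// ---- "GC library" side: allocation entry points -------------------------
// A non-collecting stand-in: it forwards to the C allocator and reports
// failure through the callback instead of throwing an exception itself.
extern (C) void* sketch_gc_malloc(size_t sz)
{
    void* p = malloc(sz);
    if (p is null && sz != 0)
        sketch_onOutOfMemory();
    return p;
}

extern (C) void* sketch_gc_calloc(size_t nm, size_t sz)
{
    void* p = calloc(nm, sz);
    if (p is null && nm != 0 && sz != 0)
        sketch_onOutOfMemory();
    return p;
}

extern (C) void sketch_gc_free(void* p)
{
    free(p);
}

// ---- "Runtime" side: language support built on the GC interface ---------
// Stand-in for what a compiler-emitted array allocation helper might do:
// it only needs the extern (C) entry points, not any GC module internals.
int[] sketch_newIntArray(size_t n)
{
    auto p = cast(int*) sketch_gc_calloc(n, int.sizeof);
    return p[0 .. n]; // zero-filled because calloc clears the memory
}
```

In a real split each section would live in its own library, and the runtime would carry only the extern (C) prototypes of the functions it calls rather than importing the GC or standard library modules. Swapping in a different allocator would then be a link-time decision.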
Feb 17 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt5cbc$i49$1 digitaldaemon.com...
 Walter Bright wrote:
 For the language implementor, the stuff in std.gc. How operator new 
 interfaces with the gc is up to the language implementor.

already been questions about this for real-time programming and other specialized applications.

I don't know what you mean by non-standard. It must implement the interface in std.gc, and operator new and delete need to work. Other than that, there are a wide range of gc implementation strategies one can use.
 First, it's important to note that I consider the runtime to be a distinct 
 library containing anything required for basic language support, the 
 garbage collector similarly separated and devoted to memory management, 
 and the standard library as a third distinct entity which contains all 
 components and interfaces the user is expected to actually interact with. 
 Phobos already has this basic separation, but the points of interaction 
 between each component aren't particularly well-defined. For example, if 
 someone wants to provide a specialized garbage collector, what does he 
 do?  A bit of research will reveal that some modules from internal/gc 
 should be removed and a new class of type GC should be created, but this 
 requires more interaction with low-level code than most users want to 
 have.

Writing a gc is non-trivial, and someone who is up to that task I doubt will have much difficulty with the interface to it. You're right in that one can't casually create a GC class, but I don't see that as a fault in the interface.
 Second, I believe it's important that the need to import modules across 
 these library boundaries should be avoided if at all possible, as doing so 
 creates a compile-time dependency between them.  Also, it seems logical to 
 assume that the runtime and GC code might not be written in D at all, so 
 the points of interaction should be equally accessible from other 
 languages, implying that all such points of interaction should be extern 
 (C) functions.  This also has the side-benefit of allowing the functions 
 to be declared in the module they're called, as the name mangling scheme 
 ignores declaration placement.

I don't see the reason why one would want to write a new GC that is not in D. If one wants to use an existing one, say the Boehm GC which is in C, all one needs is a simple wrapper of D functions around the Boehm ones.
 So you can see that, so far, there has been no need to import any modules 
 across library boundaries--all imports are either internal or of C headers 
 (which can be easily declared in the module they're called if desired).  I 
 think Phobos could ultimately benefit from such an arrangement, but it's 
 really not critical at this point.

I see what you're doing, but what is the advantage of avoiding doing the import if you're going to need that code anyway?
Feb 17 2006
parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:dt5cbc$i49$1 digitaldaemon.com...
 Walter Bright wrote:
 For the language implementor, the stuff in std.gc. How operator new 
 interfaces with the gc is up to the language implementor.

already been questions about this for real-time programming and other specialized applications.

I don't know what you mean by non-standard. It must implement the interface in std.gc, and operator new and delete need to work. Other than that, there are a wide range of gc implementation strategies one can use.

By non-standard I simply meant a different implementation.
 First, it's important to note that I consider the runtime to be a distinct 
 library containing anything required for basic language support, the 
 garbage collector similarly separated and devoted to memory management, 
 and the standard library as a third distinct entity which contains all 
 components and interfaces the user is expected to actually interact with. 
 Phobos already has this basic separation, but the points of interaction 
 between each component aren't particularly well-defined. For example, if 
 someone wants to provide a specialized garbage collector, what does he 
 do?  A bit of research will reveal that some modules from internal/gc 
 should be removed and a new class of type GC should be created, but this 
 requires more interaction with low-level code than most users want to 
 have.

Writing a gc is non-trivial, and someone who is up to that task I doubt will have much difficulty with the interface to it. You're right in that one can't casually create a GC class, but I don't see that as a fault in the interface.

True enough.
 Second, I believe it's important that the need to import modules across 
 these library boundaries should be avoided if at all possible, as doing so 
 creates a compile-time dependency between them.  Also, it seems logical to 
 assume that the runtime and GC code might not be written in D at all, so 
 the points of interaction should be equally accessible from other 
 languages, implying that all such points of interaction should be extern 
 (C) functions.  This also has the side-benefit of allowing the functions 
 to be declared in the module they're called, as the name mangling scheme 
 ignores declaration placement.

I don't see the reason why one would want to write a new GC that is not in D. If one wants to use an existing one, say the Boehm GC which is in C, all one needs is a simple wrapper of D functions around the Boehm ones.

I meant that more as a general statement rather than regarding the GC specifically--I think it's more likely that portions of the runtime code will not be written in D than the GC. But as D code can call C functions directly, why not use that for library interaction? It seems more straightforward than creating wrappers. Also, I think the thread control functions may be useful for a debugger (which may well be written in C/C++), and the GC functions might be useful in mixed-code applications. Wrappers could again be created, but I don't see the point.
 So you can see that, so far, there has been no need to import any modules 
 across library boundaries--all imports are either internal or of C headers 
 (which can be easily declared in the module they're called if desired).  I 
 think Phobos could ultimately benefit from such an arrangement, but it's 
 really not critical at this point.

I see what you're doing, but what is the advantage of avoiding doing the import if you're going to need that code anyway?

Largely to avoid compile-time dependencies between libraries, as I feel it's important that a user should be able to download an alternate standard library or GC and use it simply by linking it in. And while this could also be accomplished by documenting that UDTs should perhaps not be used and compiling against header modules, it seems more straightforward to simply define things at the code level.

Another benefit I discovered is that this approach allows specialized functionality to be exposed or code paths to be optimized specifically for library use. For example, I'd originally defined a Thread.count method which I knew was being called by the GC. But when I got around to looking at the GC code I realized that it didn't actually care how many threads were running so much as whether critical sections were necessary to ensure correct behavior. And this revealed that the way I was tracking thread count--modified by the newly created thread before entering user code--was not only incorrect, but the fact that Thread.count passed through a critical section of its own made it effectively useless to the GC code. The redesigned function serves one purpose: to indicate whether Thread.start has ever been called by the application, and thus whether memory synchronization issues might be present or mutual exclusion might be necessary. No critical sections are used, and indeed, a count of threads isn't even maintained--just a bit flag. This approach was obvious in light of what the GC needed, but it was not at all apparent from the context of what a standard library user might be interested in.

Finally, defining specific means of interaction allows behavior to be modified quite easily. When a system error occurs in Ares, rather than throwing an exception directly the runtime instead passes relevant information to a callback exposed by the standard library. Thus the runtime has no dependence on the exception object definition (aside from the requirements imposed by the stack unwinding code itself), and the user has a clear means of hooking the error handling mechanism if different behavior is desired--the behavior of onAssertError can be modified, for example, so the user can signal the debugger immediately instead of waiting for an exception to propagate. If the modules were imported and exceptions thrown directly, this would obviously not be possible.

Since this approach seems to provide at least marginal benefit, I would like to turn things around and ask what the advantage is of importing modules directly as Phobos does? I can see that it offers immediate relief if the library writer decides he needs more functionality than has been predetermined, but with a prototype standard library already in place I would think that such needs should already be obvious. Are there other advantages as well?

Sean
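
For illustration only, the callback arrangement and the "has a thread ever been started" flag described in this post might look roughly like the following in present-day D. Apart from onAssertError, every name here is hypothetical, and this is a sketch of the idea rather than the actual Ares source.

```d
// Sketch only: apart from onAssertError, these names are hypothetical.
alias AssertHandler = void function(string file, size_t line);

// Default behavior: turn the failure into an exception.
void defaultAssertHandler(string file, size_t line)
{
    throw new Exception("assertion failure", file, line);
}

// The handler and the flag live on the standard library side;
// the runtime only ever sees the functions below.
private __gshared AssertHandler currentHandler = &defaultAssertHandler;
private __gshared bool threadEverStarted; // a plain flag, not a thread count

// Lets the user replace the behavior, e.g. to signal a debugger.
void setAssertHandler(AssertHandler h)
{
    currentHandler = h;
}

// Called by the runtime when an assert fails. The runtime needs no
// knowledge of any exception class to make this call.
void onAssertError(string file, size_t line)
{
    currentHandler(file, line);
}

// Called once a thread is started; the flag is never cleared.
void noteThreadStarted()
{
    threadEverStarted = true;
}

// What a GC could ask instead of "how many threads are running?".
bool needsLocking()
{
    return threadEverStarted;
}
```

The point of the design is that the runtime depends only on two function signatures, so a different standard library can supply its own definitions at link time.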
Feb 18 2006
parent "Walter Bright" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:dt7u8d$2o5g$1 digitaldaemon.com...
 Since this approach seems to provide at least marginal benefit, I would 
 like to turn things around and ask what the advantage is of importing 
 modules directly as Phobos does?  I can see that it offers immediate 
 relief if the library writer decides he needs more functionality than has 
 been predetermined, but with a prototype standard library already in place 
 I would think that such needs should already be obvious.  Are there other 
 advantages as well?

You have made some good points.
Feb 20 2006
prev sibling next sibling parent reply Kyle Furlong <kylefurlong gmail.com> writes:
Walter Bright wrote:
 Added match expressions.
 
 http://www.digitalmars.com/d/changelog.html
 
 
 

Is it possible to drop in compile-time regex support? (i.e. Eric's solution)
Feb 15 2006
parent reply pragma <pragma_member pathlink.com> writes:
In article <dt0k9b$27jr$1 digitaldaemon.com>, Kyle Furlong says...
Walter Bright wrote:
 Added match expressions.
 
 http://www.digitalmars.com/d/changelog.html
 
 
 

Is it possible to drop in compile-time regex support? (i.e. Eric's solution)

IMHO, it's not quite ready for prime-time yet. In fact, some parts of it are still somewhat incomplete. :( - Eric Anderton at yahoo
Feb 15 2006
parent reply James Dunne <james.jdunne gmail.com> writes:
pragma wrote:
 In article <dt0k9b$27jr$1 digitaldaemon.com>, Kyle Furlong says...
 
Walter Bright wrote:

Added match expressions.

http://www.digitalmars.com/d/changelog.html

Is it possible to drop in compile-time regex support? (i.e. Eric's solution)

IMHO, it's not quite ready for prime-time yet. In fact, some parts of it are still somewhat incomplete. :( - Eric Anderton at yahoo

Not to knock Eric's great efforts at compile-time regex (which is seriously cool, btw), but I would be more impressed at code generation of regex parsing. Have the compiler itself write out some highly optimized goto-like code and have it parse known regex strings at runtime in the fastest way possible. Reminds me of the approach of the Ragel state machine (link on D Links page), but doesn't have to be anywhere near as complicated. -- Regards, James Dunne
Feb 15 2006
parent reply BCS <BCS_member pathlink.com> writes:
James Dunne wrote:
 pragma wrote:
 
 In article <dt0k9b$27jr$1 digitaldaemon.com>, Kyle Furlong says...

 Walter Bright wrote:

 Added match expressions.

 http://www.digitalmars.com/d/changelog.html

Is it possible to drop in compile-time regex support? (i.e. Eric's solution)

IMHO, it's not quite ready for prime-time yet. In fact, some parts of it are still somewhat incomplete. :( - Eric Anderton at yahoo

Not to knock Eric's great efforts at compile-time regex (which is seriously cool, btw), but I would be more impressed at code generation of regex parsing. Have the compiler itself write out some highly optimized goto-like code and have it parse known regex strings at runtime in the fastest way possible. Reminds me of the approach of the Ragel state machine (link on D Links page), but doesn't have to be anywhere near as complicated.

I am not commenting on the regex support in particular (I haven't used it yet), however I think that the introduction of this _type_ of feature is a good thing, if it is done carefully.

To elaborate, the use of templates to implement compile time regex just seems like an error prone mess (a fantastic, made by a genius mess, but still a mess). While templates can be very powerful and get a lot of stuff done, I think that the language should include support for compile time programming that is not just a side effect of other features.

As an example of what I would like to see more of, I posted a while ago proposing a witheach statement: digitalmars.D/32232

There are a few other tasks that I think should be easily done at compile time, construction of a balanced binary search tree for instance.
Feb 16 2006
parent reply Georg Wrede <georg.wrede nospam.org> writes:
BCS wrote:
 To elaborate, the use of templates to implement compile time regex just 
 seems like an error prone mess (a fantastic, made by a genius mess, but 
 still a mess). 

Probably nobody thinks that compile time regexes should be implemented with template metaprogramming. While D template programming screams compared to C++, shoving work on the template system does slow down compilation unnecessarily, compared to doing things 'the proper way'. This would erode the absolutely coolest feature DMD has: a blazing compilation speed.
 While templates can be very powerful and get a lot of 
 stuff done, I think that the language should include support for compile 
 time programming that is not just a side effect of other features.

At the time, I think they just served to demonstrate a few things:

- that you (or actually, Don) really can do most amazing things with templates
- that showing this would motivate Walter to add effort and priority to implementing them properly (i.e. non-template)
- serve as a vehicle to demonstrate the utility (of both template programming in itself, and the utility of compile-time regexes)
Feb 17 2006
parent reply James Dunne <james.jdunne gmail.com> writes:
Georg Wrede wrote:
 BCS wrote:
 
 To elaborate, the use of templates to implement compile time regex 
 just seems like an error prone mess (a fantastic, made by a genius 
 mess, but still a mess). 

Probably nobody thinks that compile time regexes should be implemented with template metaprogramming. While D template programming screams compared to C++, shoving work on the template system does slow down compilation unnecessarily, compared to doing things 'the proper way'. This would erode the absolutely coolest feature DMD has: a blazing compilation speed.
 While templates can be very powerful and get a lot of stuff done, I 
 think that the language should include support for compile time 
 programming that is not just a side effect of other features.

At the time, I think they just served to demonstrate a few things: - that you (or actually, Don) really can do most amazing things with templates - that showing this would motivate Walter to add effort and priority to implementing them properly (i.e. non-template) - serve as a vehicle to demonstrate the utility (of both template programming in itself, and the utility of compile-time regexes)

DMD has the speed; that's great and all, but we simply can't assume all implementations of the D language will be equivalent in performance. (Someone is going to write one in Java, I just know it). IMO, basing language decisions off reference implementations is a Bad Thing. -- Regards, James Dunne
Feb 18 2006
parent clayasaurus <clayasaurus gmail.com> writes:
James Dunne wrote:
 Georg Wrede wrote:
 BCS wrote:

 To elaborate, the use of templates to implement compile time regex 
 just seems like an error prone mess (a fantastic, made by a genius 
 mess, but still a mess). 

Probably nobody thinks that compile time regexes should be implemented with template metaprogramming. While D template programming screams compared to C++, shoving work on the template system does slow down compilation unnecessarily, compared to doing things 'the proper way'. This would erode the absolutely coolest feature DMD has: a blazing compilation speed.
 While templates can be very powerful and get a lot of stuff done, I 
 think that the language should include support for compile time 
 programming that is not just a side effect of other features.

At the time, I think they just served to demonstrate a few things: - that you (or actually, Don) really can do most amazing things with templates - that showing this would motivate Walter to add effort and priority to implementing them properly (i.e. non-template) - serve as a vehicle to demonstrate the utility (of both template programming in itself, and the utility of compile-time regexes)

DMD has the speed; that's great and all, but we simply can't assume all implementations of the D language will be equivalent in performance. (Someone is going to write one in Java, I just know it). IMO, basing language decisions off reference implementations is a Bad Thing.

No, but we can assume implementations of D will be faster than C++ since Walter's DMD is twice as fast as DMC for building Empire, even though they share the same optimizer, code gen, and linker. The D frontend, which is open source, gives D its speed. For me, the fast compile times compared to C++ are a big feature of D.
Feb 18 2006
prev sibling next sibling parent reply Tom <Tom_member pathlink.com> writes:
In article <dt088d$1svm$1 digitaldaemon.com>, Walter Bright says...
Added match expressions.

http://www.digitalmars.com/d/changelog.html

A question: I wonder, do you fix the regressions that arise on each of these releases? (I really ask myself 'cos I don't see those fixes in the changelog, or maybe I'm wrong)

Thanks in advance,

P.S.: Another little question (I know, it's a second one :-D), sorry about my ignorance of common emoticons and stuff, what does <g> mean?

Tom
Feb 15 2006
next sibling parent James Dunne <james.jdunne gmail.com> writes:
Tom wrote:
 In article <dt088d$1svm$1 digitaldaemon.com>, Walter Bright says...
 
Added match expressions.

http://www.digitalmars.com/d/changelog.html

A question: I wonder, do you fix the regressions that arise on each of these releases? (I really ask myself 'cos I don't see that fixes in the changelog or maybe i'm wrong) Thanks in advance, P.S.: Another little question (i know, it's a second one :-D), sorry about my ignorance of common emoticons and stuff, what does <g> means? Tom;

<grin> -- Regards, James Dunne
Feb 15 2006
prev sibling parent "Walter Bright" <newshound digitalmars.com> writes:
"Tom" <Tom_member pathlink.com> wrote in message 
news:dt0m6n$29jv$1 digitaldaemon.com...
 A question: I wonder, do you fix the regressions that arise on each of 
 these
 releases? (I really ask myself 'cos I don't see that fixes in the 
 changelog or
 maybe i'm wrong)

I try to do the most important ones first.
 P.S.: Another little question (i know, it's a second one :-D), sorry about 
 my
 ignorance of common emoticons and stuff, what does <g> means?

grin
Feb 15 2006
prev sibling next sibling parent huangliang <huangliang_member pathlink.com> writes:
MatchExpression is a robust feature, but too robust.
We do not need another text-oriented language; Perl already fills that place.

D is complex enough, please don't give it more syntax.
I suggest freezing features and improving the existing ones.

How about 'implicit template instantiation', 'function and delegate', etc.?
Feb 16 2006
prev sibling next sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Walter Bright wrote:
 Added match expressions.
 
 http://www.digitalmars.com/d/changelog.html

Cool! You really have to be working like 25 hours a day at this! What does "When a MatchExpression is the operand of an IfStatement or WhileStatement, special handling happens." in the doc mean? And another question: I assume all literal regexes will some day be compiled at compile time, right? Are we there yet?
Feb 16 2006
next sibling parent "Charles" <noone nowhere.com> writes:
 What does

     "When a MatchExpression is the operand of an IfStatement
     or WhileStatement, special handling happens."

.... Trouble.

Just curious, was this 'built in regex' on anyone's wish list besides Matthew's?

Charlie

"Georg Wrede" <georg.wrede nospam.org> wrote in message news:43F48BBB.1050001 nospam.org...
 Walter Bright wrote:
 Added match expressions.

 http://www.digitalmars.com/d/changelog.html

Cool! You really have to be working like 25 hours a day at this! What does "When a MatchExpression is the operand of an IfStatement or WhileStatement, special handling happens." in the doc mean? And another question: I assume all literal regexes will some day be compiled at compile time, right? Are we there yet?

Feb 16 2006
prev sibling parent "Walter Bright" <newshound digitalmars.com> writes:
"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:43F48BBB.1050001 nospam.org...
 What does

    "When a MatchExpression is the operand of an IfStatement
    or WhileStatement, special handling happens."

 in the doc mean?

It's explained in the IfStatement and WhileStatement sections.
 And another question: I assume all literal regexes will some day be 
 compiled at compile time, right?

Yes.
 Are we there yet?

Not even close :-(
Feb 16 2006
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
 Added match expressions.
 
 http://www.digitalmars.com/d/changelog.html

So this

    int[] x, y;
    ...
    x=y~~42;

won't work anymore....

Stewart.

-- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/M d- s:- C++ a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y ------END GEEK CODE BLOCK------ My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Feb 16 2006
next sibling parent James Dunne <james.jdunne gmail.com> writes:
Stewart Gordon wrote:
 Walter Bright wrote:
 
 Added match expressions.

 http://www.digitalmars.com/d/changelog.html

So this int[] x, y; ... x=y~~42; won't work anymore.... Stewart.

If you're not using whitespace to delineate your tokens in the first place, you should expect things like this.

-- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/MU/S d-pu s:+ a-->? C++++$ UL+++ P--- L+++ !E W-- N++ o? K? w--- O M-- V? PS PE Y+ PGP- t+ 5 X+ !R tv-->!tv b- DI++(+) D++ G e++>e h>--->++ r+++ y+++ ------END GEEK CODE BLOCK------ James Dunne
Feb 16 2006
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message 
news:dt2476$ii2$1 digitaldaemon.com...
 Walter Bright wrote:
 Added match expressions.

 http://www.digitalmars.com/d/changelog.html

So this int[] x, y; ... x=y~~42; won't work anymore....

That's right. Neither will:

    x = !~y;

It's in the same vein that:

    x = y/*p;

never worked, either.
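
As a small illustration of the tokenizing issue, here is how the two spellings can be written with whitespace so that the lexer cannot merge the operators. This is a sketch in present-day D, which has no `~~` token; the function names are made up for the example.

```d
// Hypothetical helper names; the expressions are the ones from the thread.

// With a space between the operators this is "y concatenated with (~42)",
// i.e. the bitwise complement of 42 appended to the array.
int[] appendComplement(int[] y)
{
    return y ~ ~42;
}

// Without the space, "y/*p" would start a block comment.
// With it, this is "y divided by the value p points to".
int divideByPointee(int y, int* p)
{
    return y / *p;
}
```

A lexer takes the longest token it can at each position, which is why adding a new two-character operator can change the meaning of code that previously relied on two adjacent one-character operators.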
Feb 16 2006
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
 "Stewart Gordon" <smjg_1998 yahoo.com> wrote in message 
 news:dt2476$ii2$1 digitaldaemon.com...
 Walter Bright wrote:
 Added match expressions.

 http://www.digitalmars.com/d/changelog.html

int[] x, y; ... x=y~~42; won't work anymore....

That's right. Neither will: x = !~y; It's in the same vein that: x = y/*p; never worked, either.

At least neither of those two is syntactically valid now.

Why are MatchExpression and NotMatchExpression separate nonterminals? Why not simply

    MatchExpression:
        EqualExpression ~~ RelExpression
        EqualExpression !~ RelExpression

or even

    EqualExpression:
        RelExpression
        EqualExpression == RelExpression
        EqualExpression != RelExpression
        EqualExpression is RelExpression
        EqualExpression !is RelExpression
        EqualExpression ~~ RelExpression
        EqualExpression !~ RelExpression

?

Stewart.

-- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/M d- s:- C++ a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y ------END GEEK CODE BLOCK------ My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Feb 16 2006
next sibling parent Sean Kelly <sean f4.ca> writes:
Stewart Gordon wrote:
 Walter Bright wrote:
 "Stewart Gordon" <smjg_1998 yahoo.com> wrote in message 
 news:dt2476$ii2$1 digitaldaemon.com...
 Walter Bright wrote:
 Added match expressions.

 http://www.digitalmars.com/d/changelog.html

int[] x, y; ... x=y~~42; won't work anymore....

That's right. Neither will: x = !~y; It's in the same vein that: x = y/*p; never worked, either.

At least neither of those two is syntactically valid now. Why are MatchExpression and NotMatchExpression separate nonterminals? Why not simply MatchExpression: EqualExpression ~~ RelExpression EqualExpression !~ RelExpression

I think because MatchExpression injects a _Match* object into the following scope, while NotMatchExpression does not. Sean
Feb 16 2006
prev sibling parent "Walter Bright" <newshound digitalmars.com> writes:
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message 
news:dt2j3t$13tb$1 digitaldaemon.com...
 Why are MatchExpression and NotMatchExpression separate nonterminals?

Because IfStatement handles them differently.
Feb 16 2006
prev sibling parent reply Wang Zhen <nehzgnaw gmail.com> writes:
Although syntactically correct, MatchExpression in StaticIfCondition or 
StaticAssert does not compile. For example:

void main(){static if(!("" ~~ "")){}static assert("" ~~ "");}

Is this intended or an unimplemented feature?


Walter Bright wrote:
 Added match expressions.
 
 http://www.digitalmars.com/d/changelog.html
 
 
 

Feb 17 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Wang Zhen" <nehzgnaw gmail.com> wrote in message 
news:dt49iv$2hm5$1 digitaldaemon.com...
 Although syntactically correct, MatchExpression in StaticIfCondition or 
 StaticAssert do not compile. For example:

 void main(){static if(!("" ~~ "")){}static assert("" ~~ "");}

 Is this intended or an unimplemented feature?

The problem is that getting it to work requires the compiler itself to understand regular expressions. Currently, it does not.
Feb 17 2006
next sibling parent "Craig Black" <cblack ara.com> writes:
 The problem is that getting it to work requires the compiler itself to 
 understand regular expressions. Currently, it does not.

You could also perhaps use compile-time templates to evaluate static if regexes. However, it would be another compiler dependency on a library.

-Craig
Feb 17 2006
prev sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Walter Bright wrote:
 "Wang Zhen" <nehzgnaw gmail.com> wrote in message 
 news:dt49iv$2hm5$1 digitaldaemon.com...
 
 Although syntactically correct, MatchExpression in
 StaticIfCondition or StaticAssert do not compile. For example:
 
 void main(){static if(!("" ~~ "")){}static assert("" ~~ "");}
 
 Is this intended or an unimplemented feature?

The problem is that getting it to work requires the compiler itself to understand regular expressions. Currently, it does not.

Intriguing. I'd sure love to hear more about this. I take it understanding regular expressions is much more than just compiling them? (Like what the runtime does, or Perl, etc.?)
Feb 17 2006
next sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:43F658D2.2000608 nospam.org...
 Walter Bright wrote:
 "Wang Zhen" <nehzgnaw gmail.com> wrote in message 
 news:dt49iv$2hm5$1 digitaldaemon.com...

 Although syntactically correct, MatchExpression in
 StaticIfCondition or StaticAssert do not compile. For example:

 void main(){static if(!("" ~~ "")){}static assert("" ~~ "");}

 Is this intended or an unimplemented feature?

The problem is that getting it to work requires the compiler itself to understand regular expressions. Currently, it does not.

Intriguing. I'd sure love to hear more about this.

If the compiler is to constant fold regular expressions, then it needs to build in to the compiler exactly what would happen if the regex code was evaluated at runtime.
 I take it understanding regular expressions is much more than just 
 compiling them? (Like what the runtime does, or Perl, etc.?)

I think the confusion here is the difference between compiling a string literal, and compiling the regular expression within the string literal. DMD currently does the former, the latter is done at runtime by std.regexp.
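
To make the distinction concrete, here is a sketch using the std.regex module found in later versions of Phobos, not the std.regexp module or the `~~` operator discussed in this thread. `regex` parses the pattern when the program runs, while `ctRegex` processes the pattern during compilation. The helper name `bothMatch` is made up for the example.

```d
import std.regex : ctRegex, matchFirst, regex;

// Hypothetical helper: checks one input against the same pattern twice.
bool bothMatch(string input)
{
    // The pattern text is parsed when this line executes at run time.
    auto atRunTime = regex(`a+b`);

    // The pattern text is parsed, and a matcher generated, during compilation.
    auto atCompileTime = ctRegex!(`a+b`);

    return !matchFirst(input, atRunTime).empty
        && !matchFirst(input, atCompileTime).empty;
}
```

In both cases the match itself still happens at run time; only the handling of the pattern moves to compile time in the second form.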
Feb 17 2006
parent reply Georg Wrede <georg.wrede nospam.org> writes:
Walter Bright wrote:
 "Georg Wrede" <georg.wrede nospam.org> wrote in message 
 news:43F658D2.2000608 nospam.org...
 
 Walter Bright wrote:
 
 "Wang Zhen" <nehzgnaw gmail.com> wrote in message 
 news:dt49iv$2hm5$1 digitaldaemon.com...
 
 
 Although syntactically correct, MatchExpression in 
 StaticIfCondition or StaticAssert do not compile. For example:
 
 void main(){static if(!("" ~~ "")){}static assert("" ~~ "");}
 
 Is this intended or an unimplemented feature?

The problem is that getting it to work requires the compiler itself to understand regular expressions. Currently, it does not.

Intriguing. I'd sure love to hear more about this.

If the compiler is to constant fold regular expressions, then it needs to build in to the compiler exactly what would happen if the regex code was evaluated at runtime.

Yes. IMHO in essence, the binary machine code, which the runtime also would build. What I have a hard time seeing is, how this differs from building a normal function at compile time? And eventually storing both in the executable image. (I'd give you more intelligent questions, but I'm too baffled.)
Feb 21 2006
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:43FB25FC.8090806 nospam.org...
 Walter Bright wrote:
 If the compiler is to constant fold regular expressions, then it
 needs to build in to the compiler exactly what would happen if the
 regex code was evaluated at runtime.

Yes. IMHO in essence, the binary machine code, which the runtime also would build. What I have a hard time seeing is, how this differs from building a normal function at compile time?

Consider the strlen() function. Compiling a strlen() function and generating machine code for it is a very different thing from the compiler knowing what strlen is and replacing:

    strlen("abc")

with:

    3
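
For comparison, later versions of D gained compile-time function execution, where the compiler interprets an ordinary function when its result is needed as a constant. A minimal sketch, assuming a present-day D compiler; `strlenCt` is a made-up name modelled on the strlen example:

```d
// Hypothetical helper, written as an ordinary run-time function.
size_t strlenCt(string s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

// Initializing an enum requires a compile-time constant, so the compiler
// has to evaluate the call itself while compiling.
enum folded = strlenCt("abc");
static assert(folded == 3);
```

The function is not special-cased by the compiler; it is simply run by an interpreter inside the compiler, and the same function remains callable on run-time data.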
Feb 21 2006
next sibling parent reply "Lionello Lunesu" <lio remove.lunesu.com> writes:
Interesting indeed.

Is there no way to "fold constants" in this kind of code too? If you know 
the inputs to a function are all constant, can't you simply replace the 
inputs + function call with the function's output?

Would be really cool if this kind of general constant folding could take 
place. The compiler would need to keep track of all constant variables, and 
flagging outputs of operations with constants as constants too. In your 
example, since the input to the strlen function is a constant, the compiler 
could just call the strlen-code itself and replace the actual call with that 
call's output.

I have no experience whatsoever with compiler writing, so I'm probably 
overlooking MANY things :-)

Lio.

"Walter Bright" <newshound digitalmars.com> wrote in message 
news:dtfin6$29hi$1 digitaldaemon.com...
 "Georg Wrede" <georg.wrede nospam.org> wrote in message 
 news:43FB25FC.8090806 nospam.org...
 Walter Bright wrote:
 If the compiler is to constant fold regular expressions, then it
 needs to build in to the compiler exactly what would happen if the
 regex code was evaluated at runtime.

Yes. IMHO in essence, the binary machine code, which the runtime also would build. What I have a hard time seeing is, how this differs from building a normal function at compile time?

Consider the strlen() function. Compiling a strlen() function and generating machine code for it is a very different thing from the compiler knowing what strlen is and replacing: strlen("abc") with: 3

Feb 22 2006
next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Lionello Lunesu skrev:
 Interesting indeed.
 
 Is there no way to "fold constants" in this kind of code too? If you know 
 the inputs to a function are all constant, can't you simply replace the 
 inputs + function call with the function's output?

Disclaimer: I don't know much about this. Most is pure speculation.

I guess it is theoretically possible, but the compiler has to know that the function is pure. That is:

a) The function can not have any side effects.
b) The result has to be deterministic and only depend on the arguments.

This means that the function can not call any function not fulfilling a and b, and that it can not rely on things like floating point rounding state etc.

In the general case, the compiler has no way of knowing this. The function may be externally defined, and only resolved at link time. For stdlib-functions the compiler could of course be given this knowledge beforehand (like strlen).

For functions fully known to the compiler, inlining followed by constant folding could theoretically have the same effect, but I don't think any compilers are smart enough to identify pure blocks of code in a general fashion and being able to evaluate them at compile time. Somewhat easier would be to identify pure functions and evaluate them at compile time. I guess this is going much further than current constant folding.

The problems I see are:

a) Hard for the compiler to tell if a function is pure. In many cases it is not even possible (The halting problem has an example of such an undecidable function).
b) The compiler needs a way to evaluate the function at compile time.
c) The compiler has no way of knowing the function space and time complexity.

It would be interesting if there was a way to flag functions as being pure. The compiler could then try to evaluate the function at compile time or reduce the number of calls to the function at run time similar to what a common sub-expression removal optimization would do.

/Oskar
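
As it happens, later versions of D added a `pure` attribute along these lines. The sketch below assumes a present-day D compiler; the function names are made up for the example. The compiler checks the attribute, so a pure function cannot touch mutable global state or call impure code.

```d
int callCount; // mutable global state

// `pure`: may not read or write mutable globals.
int square(int x) pure
{
    return x * x;
}

// Not pure: the result depends on, and changes, global state.
int nextTicket(int x)
{
    return x + callCount++;
}

// With constant arguments, the call can be evaluated during compilation.
enum nine = square(3);
static assert(nine == 9);

// A function literal marked pure is rejected if it calls impure code.
static assert(!__traits(compiles, () pure { return nextTicket(1); }));
```

Note that in present-day D the annotation is not what enables compile-time evaluation; the compiler attempts it whenever a constant is required and reports an error if the function does something it cannot interpret.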
Feb 22 2006
next sibling parent Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Oskar Linde wrote:
 It would be interesting if there was a way to flag functions as being
 pure. 

This is what I've always thought declaring a function as "const", like can be done in C++, should do. Optimisation avenues galore.
Feb 22 2006
prev sibling parent reply "Lionello Lunesu" <lio remove.lunesu.com> writes:
"Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
news:dthe8b$1jg1$1 digitaldaemon.com...
 Lionello Lunesu skrev:
 Interesting indeed.

 Is there no way to "fold constants" in this kind of code too? If you know 
 the inputs to a function are all constant, can't you simply replace the 
 inputs + function call with the function's output?

Disclaimer: I don't know much about this. Most is pure speculation. I guess it is theoretically possible, but the compiler has to know that the function is pure. That is: a) The function can not have any side effects.

Good point. Completely forgot about that.
 b) The result has to be deterministic and only depend on the arguments.

Yeah, imagine the compiler calling rand(), taking a void (very constant), returning 123 or so.. assuming it's constant! :-)
 a) Hard for the compiler to tell if a function is pure. In many cases it 
 is not even possible (The halting problem has an example of such an 
 undecidable function).

Let's see. If the function only uses the inputs, without even dereferencing them, then it's pretty clear I suppose. But you're right, it's complex.
 b) The compiler needs a way to evaluate the function at compile time.

That's easy, by just calling it.
 c) The compiler has no way of knowing the function space and time 
 complexity.

How is this different from a) ?
 It would be interesting if there was a way to flag functions as being 
 pure. The compiler could then try to evaluate the function at compile time 
 or reduce the number of calls to the function at run time similar to what 
 a common sub-expression removal optimization would do.

Indeed. Something like C++ "const", but then for real, and not removable by a cast. A "pure" function would simply have a number of restrictions, I suppose something like: not allowed to reference any data outside the function (globals, class members, etc). Lio.
Feb 22 2006
parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Lionello Lunesu skrev:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dthe8b$1jg1$1 digitaldaemon.com...
 a) Hard for the compiler to tell if a function is pure. In many cases it 
 is not even possible (The halting problem has an example of such an 
 undecidable function).

Let's see. If the function only uses the inputs, without even dereferencing them, then it's pretty clear I suppose. But you're right, it's complex.

The function has to halt also. An infinite loop can be impossible for the compiler to detect. One would not want the compilation to hang.
 c) The compiler has no way of knowing the function space and time 
 complexity.

How is this different from a) ?

This is similar to a), but since a) is provably undecidable, probably not harder. :)

If the function call takes five hours to complete, the compilation would take five hours times the number of times the function got called with different arguments. Also, if the function uses 2 GB of stack space, the compiler might run out of memory... The compiler would have to execute the function for a certain amount of time, and break it if it doesn't return.

/ Oskar
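
A toy version of that "run it for a while, then give up" idea, as a sketch in present-day D. The evaluator, its names, and the choice of the Collatz iteration as the workload are all invented for illustration; a real compiler would apply the budget to its own interpreter instead.

```d
// Result of a budgeted evaluation: either it finished, or it was cut off.
struct Outcome
{
    bool finished;
    ulong steps;
}

// Hypothetical evaluator: counts Collatz steps needed to reach 1 from n,
// but gives up once `budget` iterations have been spent.
Outcome collatzSteps(ulong n, ulong budget)
{
    ulong steps = 0;
    while (n != 1)
    {
        if (steps == budget)
            return Outcome(false, steps); // out of budget: report failure
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return Outcome(true, steps);
}
```

The caller decides what a cut-off means; for a compiler the natural choice would be to report an error or fall back to calling the function at run time.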
Feb 22 2006
prev sibling parent Sean Kelly <sean f4.ca> writes:
Lionello Lunesu wrote:
 Interesting indeed.
 
 Is there no way to "fold constants" in this kind of code too? If you know 
 the inputs to a function are all constant, can't you simply replace the 
 inputs + function call with the function's output?

If the function can be inlined and the operations it contains are also subject to constant folding then the optimizer should already do this. Otherwise, while it's possible in some cases I don't know of a compiler that does this. I believe this has been talked about on the C++ forums as "atomic functions." Sean
Feb 22 2006
prev sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Walter Bright wrote:
 "Georg Wrede" <georg.wrede nospam.org> wrote
 Walter Bright wrote:
 
 If the compiler is to constant fold regular expressions, then it 
 needs to build in to the compiler exactly what would happen if
 the regex code was evaluated at runtime.

Yes. IMHO in essence, the binary machine code, which the runtime also would build. What I have a hard time seeing is, how this differs from building a normal function at compile time?

Consider the strlen() function. Compiling a strlen() function and generating machine code for it is a very different thing from the compiler knowing what strlen is and replacing: strlen("abc") with: 3

Either I'm getting too old for this business, or you're only giving pseudo answers.

(1) If we were to stop the compiler dead in its tracks, and I compiled the function "manually" and returned it to the compiler, would we still have a problem here?

(2) {-- and this I've so far avoided to bring up, out of courtesy --}, if Don can do it with templates, what's so impossible doing it the regular way??

-------------------

Just a cross-check: [I think] we're talking about compiling a single regular expression.

My definition: "a compiled regular expression" is any piece of machine code that takes *one string* as the argument, and returns (depending on which of the 2 kinds it is) either a boolean (as in found or not), or an integer denoting position of First Match. Such a piece of machine code is a function that conforms to one of the following signatures:

    bool foo(char[]); // match
    int bar(char[]); // search
Feb 22 2006
parent reply Don Clugston <dac nospam.com.au> writes:
Georg Wrede wrote:
 Walter Bright wrote:
 "Georg Wrede" <georg.wrede nospam.org> wrote
 Walter Bright wrote:

 If the compiler is to constant fold regular expressions, then it 
 needs to build in to the compiler exactly what would happen if
 the regex code was evaluated at runtime.

Yes. IMHO in essence, the binary machine code, which the runtime also would build. What I have a hard time seeing is, how this differs from building a normal function at compile time?

Consider the strlen() function. Compiling a strlen() function and generating machine code for it is a very different thing from the compiler knowing what strlen is and replacing: strlen("abc") with: 3

Either I'm getting too old for this business, or you're only giving pseudo answers. (1) If we were to stop the compiler dead in its tracks, and I compiled the function "manually" and returned it to the compiler, would we still have a problem here?

That would be OK. The issue is that the compiler is a tool for converting text to machine code. It has no mechanism for executing the machine code.
 (2) {-- and this I've so far avoided to bring up, out of courtesy --},
 if Don can do it with templates, what's so impossible doing it the 
 regular way??

The compiler does have a mechanism for executing the "template language" at compile time, which is what my code is using. But, the template language (which I'll call Double D (DD) :-) ) is fundamentally different to the ordinary D language (eg, it has no variables).

Conceivably, a compiler could convert a D function into a DD metafunction, provided that it doesn't write to any variables except at initialisation, and doesn't use any control structures other than "if-else" and "return", and all of its parameters are compile-time constants. But that would be so restricted as to be almost useless. Of course the compiler itself could have the DD code built into it, but DD is a horribly inefficient language, and it would be hideous to program from inside the compiler.

What could perhaps be done is to allow functions with all-constant parameters to be converted into overloads. eg we have the DD metafunction

int strlenT!(char [] s)

Then, if we could define some kind of syntax like

const alias int strlen(char [] s) strlenT!(s);

as an overload of strlen, so that if all parameters are compile-time constants, then the reference to strlen becomes a template instantiation.

More generally, if the lookup mechanism for functions was changed to be: If the first n parameters of a function are all compile-time constants, C1, C2, ... with the remainder being variables or constants, V1, V2, ... try to find a matching template. eg, given

func(C1, C2, C3, V1, V2, C4)

the following functions are looked for, in this order:

func!(C1, C2, C3)(V1, V2, C4);
func!(C1, C2)(C3, V1, V2, C4);
func!(C1)(C2, C3, V1, V2, C4);
func(C1, C2, C3, V1, V2, C4);

Note that as soon as a template is found, the search stops. eg if there is a func!(C1, C2, C3) which doesn't have a (V1, V2, C4) member function, compilation will fail even if a function func(p1, p2, p3, p4, p5, p6) exists.
This is superficially akin to overloading 'const' parameters in C++, but unlike C++ "const" would actually mean "constant" and not just "I'm not _supposed_ to change it".
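
To make the strlenT idea concrete, here is a minimal sketch of a metafunction next to its runtime counterpart. It is written in present-day D, not the DMD 0.147 dialect, and the bodies are assumptions for illustration; only the names strlenT and strlen come from the post. The automatic lookup proposed above is not shown, since the caller still has to choose the template form explicitly with "!".

```d
// "DD"-style metafunction: no variables, only static if and recursion.
// Everything here is evaluated while the template is instantiated.
template strlenT(string s)
{
    static if (s.length == 0)
        enum size_t strlenT = 0;
    else
        enum size_t strlenT = 1 + strlenT!(s[1 .. $]);
}

// Ordinary runtime function, used when the argument is not a constant.
size_t strlen(const(char)[] s)
{
    size_t n = 0;
    foreach (c; s)
        ++n;
    return n;
}

// The template result is a compile-time constant, so it can be checked
// without running the program.
static assert(strlenT!("abc") == 3);

unittest
{
    char[] buf = "abcd".dup;          // runtime data
    assert(strlen(buf) == 4);
}
```

The recursion illustrates the point that the template language has no mutable state: each step is a fresh instantiation, not a loop iteration.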
Feb 23 2006
next sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Don Clugston wrote:
 Georg Wrede wrote:
 Walter Bright wrote:
 Georg Wrede wrote:
 Walter Bright wrote:

 If the compiler is to constant fold regular expressions, then it 
 needs to build in to the compiler exactly what would happen if
 the regex code was evaluated at runtime.

Yes. IMHO in essence, the binary machine code, which the runtime also would build. What I have a hard time seeing is, how this differs from building a normal function at compile time?

Consider the strlen() function. Compiling a strlen() function and generating machine code for it is a very different thing from the compiler knowing what strlen is and replacing: strlen("abc") with: 3

Either I'm getting too old for this business, or you're only giving pseudo answers. (1) If we were to stop the compiler dead in its tracks, and I compiled the function "manually" and returned it to the compiler, would we still have a problem here?

That would be OK. The issue is that the compiler is a tool for converting text to machine code. It has no mechanism for executing the machine code.

Aaaaaah... heureka. So there's a wavelength problem here! What I've been talking about all along is 'a regexp compiled into a function, but _not_run_ at compile time'.

** So, Don's regexps can be both "compiled" and "run" at compile time, whereas what I've been wishing all along is a "compile-time compiled but not compile-time run" regexp!

In other words, a profoundly normal function, just that it happens to be written in RegexpLanguage instead of vanilla D (Or C, or asm).

(Gees, I hope this same wavelength problem wasn't the reason for last winter's unsuccessful regexp discussions.) :-(
Feb 23 2006
parent Don Clugston <dac nospam.com.au> writes:
Georg Wrede wrote:
 Don Clugston wrote:
 Georg Wrede wrote:
 Walter Bright wrote:
 Georg Wrede wrote:
 Walter Bright wrote:

 If the compiler is to constant fold regular expressions, then it 
 needs to build in to the compiler exactly what would happen if
 the regex code was evaluated at runtime.

Yes. IMHO in essence, the binary machine code, which the runtime also would build. What I have a hard time seeing is, how this differs from building a normal function at compile time?

Consider the strlen() function. Compiling a strlen() function and generating machine code for it is a very different thing from the compiler knowing what strlen is and replacing: strlen("abc") with: 3

Either I'm getting too old for this business, or you're only giving pseudo answers. (1) If we were to stop the compiler dead in its tracks, and I compiled the function "manually" and returned it to the compiler, would we still have a problem here?

That would be OK. The issue is that the compiler is a tool for converting text to machine code. It has no mechanism for executing the machine code.

Aaaaaah... heureka. So there's a wavelength problem here! What I've been talking all along, is 'a regexp compiled into a function, but _not_run_ at compile time.

Oh dear, I think I've just confused you. I was only referring to strlen, not to regexps. I was trying to explain Walter's statement about why it's difficult for a compiler writer.
 ** So, Don's regexps can be both "compiled" and "run" at compile time, 
 whereas what I've been wishing all along is a "compile-time compiled but 
 not compile-time run" regexp!

No, you were right the first time. At compile time, the regexp pattern string is compiled into an ordinary function. Example: the trivial case

bool b = test!("abc")(str);

compiles to something like:

int test_a(char [] str) { return str.length>=3 && str[0..3]=="abc"; }
bool b = test_a(str);

It doesn't actually call the test_a function at compile time. It's only something like strlen!("abc"), where all of the parameters are known at compile time, which is "run" at compile time. In the regexp case, it's the "make a regexp engine" code which is run at compile time. The engine itself is only run at runtime.
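
The trivial case above can be sketched as a function template. This is not the actual regexp code being discussed; it is an assumed, much simpler version that only handles literal prefixes, written in present-day D. The pattern is a template parameter, so its length and contents are fixed when the template is instantiated, while str stays a runtime value.

```d
// Sketch only: handles a literal prefix, not a real regexp.
// test!("abc") instantiates an ordinary function specialised for "abc",
// roughly like the hand-written test_a above.
bool test(string pattern)(const(char)[] str)
{
    return str.length >= pattern.length
        && str[0 .. pattern.length] == pattern;
}

unittest
{
    char[] input = "abcdef".dup;      // runtime data
    assert(test!("abc")(input));
    assert(!test!("abc")("ab"));
}
```

The instantiation happens at compile time; the comparison against str happens only when the program runs.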
 In other words, a profoundly normal function, just that it happens to be 
 written in RegexpLanguage instead of vanilla D (Or C, or asm).

Exactly.
Feb 23 2006
prev sibling parent Georg Wrede <georg.wrede nospam.org> writes:
(I put stuff in D.dtl.)

georg
Feb 23 2006
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Georg Wrede wrote:
 Walter Bright wrote:
 "Wang Zhen" <nehzgnaw gmail.com> wrote in message 
 news:dt49iv$2hm5$1 digitaldaemon.com...

 Although syntactically correct, MatchExpression in
 StaticIfCondition or StaticAssert do not compile. For example:

 void main(){static if(!("" ~~ "")){}static assert("" ~~ "");}

 Is this intended or an unimplemented feature?

The problem is that getting it to work requires the compiler itself to understand regular expressions. Currently, it does not.

Intriguing. I'd sure love to hear more about this. I take it understanding regular expressions is much more than just compiling them? (Like what the runtime does, or Perl, etc.?)

A problem is that there are a number of dialects of regexp. The spec doesn't seem to indicate which dialect is being used. Among the differences between them is whether subexpressions are parenthesised by \(...\) or simply (...).

Another issue is whether we expect implementations to support the Unicode extensions to regexps described here

http://www.textpad.info/forum/viewtopic.php?t=4778

No doubt there are other differences....

Whichever we choose, the behaviour of using std.regexp directly, ~~ evaluated at runtime and ~~ evaluated at compile time must be consistent. But that isn't hard - the compiler would just call the same code that std.regexp uses.

Stewart.

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++ a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
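
The idea of the compiler calling the same code at compile time and at run time can be sketched with a stand-in for the matcher. Assumptions: this uses compile-time function evaluation as found in present-day D, which is not a feature of the compiler discussed in this thread, and contains() is a hypothetical substring search, not std.regexp or the ~~ operator.

```d
// Hypothetical stand-in for a regexp engine: true if needle occurs in haystack.
bool contains(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length > haystack.length)
        return false;
    foreach (i; 0 .. haystack.length - needle.length + 1)
        if (haystack[i .. i + needle.length] == needle)
            return true;
    return false;
}

// Evaluated by the compiler, in the spirit of static assert("" ~~ "").
static assert(contains("", ""));
static assert(!contains("abc", "abd"));

unittest
{
    // The very same function, called at run time.
    char[] text = "hello world".dup;
    assert(contains(text, "lo w"));
}
```

Because one function body serves both uses, the compile-time and run-time results cannot drift apart, which is the consistency requirement raised above.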
Feb 21 2006