www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Overlapping functionality: IFTI, templates, is-expressions

reply Russell Lewis <webmaster villagersonline.com> writes:
As you can tell from my recent posts (and bug reports), I've been 
pounding on D's template mechanisms a lot recently.  I'm trying to build 
a parser generator using templates.  I'm now starting my 3rd major 
generation of the library, because I keep finding fundamental weaknesses 
in each method I try.

I'm writing this post to observe, publicly, that there is *A LOT* over 
overlap between functions with IFTI, template specializations, and 
is-expressions.  However, the overlap isn't perfect; AFAIK, none of 
these 3 is a strict superset of the others.  So, it looks to me like I 
can't just choose one method and stick with it.  I would like to 
advocate for two changes to the language:


1) The 3 mechanisms should be explicitly connected.  That is, 2 of them 
should be defined to just be syntax sugar of the third.  For instance, a 
template with specializations, IMHO, should be declared (in the spec) to 
be syntax sugar for one mega-template (taking a Tuple argument) where 
the various alternatives are deduced using static if() and 
is-expressions.  The spec should, of course, exactly document how each 
type of template specialization is mapped to is-expressions.  Somehow, 
IFTI should be likewise linked it, though I'm a little less clear how.

2) Templates *MUST* support multiple Tuples.  Imagine trying to declare 
a template specialization which takes a delegate (or delegate type) with 
an arbitrary number of arguments, but the template also takes an 
arbitrary number of other parameters.  I've found ways to do it which 
work for my special case, but I don't know of any way to do it in the 
general case.


In truth, I have more ideas than just the two above, but the two above 
would solve most of my issues.

Thoughts?
	Russ
Mar 19 2008
next sibling parent reply BCS <BCS pathlink.com> writes:
Russell Lewis wrote:
 As you can tell from my recent posts (and bug reports), I've been 
 pounding on D's template mechanisms a lot recently.  I'm trying to build 
 a parser generator using templates.  I'm now starting my 3rd major 
 generation of the library, because I keep finding fundamental weaknesses 
 in each method I try.
 

Do you /need/ one or do you want to /wright/ one? I ask because I have a working version of this already. http://www.dsource.org/projects/scrapple/browser/trunk/dparser
 [...]

I haven't read your proposals in detail but the first one seems impractical because it would be very hard to construct the combined template. Consider, for example, a template where it is used to build other instances of it's self. Also I /think/ you can overload templates from different modules if the args are different. As to the second >1 tuple idea, firstly, it can be done in a round about way (see below) and it has been proposed but no usable way to implement it has been put forward. (A few ides have been floated but they have had problems) template A(a...) { template B(b...) { // template code uses a & b } } A!(int, char, Object).B!(1, 'g', null)...
 In truth, I have more ideas than just the two above, but the two above 
 would solve most of my issues.
 
 Thoughts?
     Russ

Mar 19 2008
parent reply Russell Lewis <webmaster villagersonline.com> writes:
BCS wrote:
 Russell Lewis wrote:
 As you can tell from my recent posts (and bug reports), I've been 
 pounding on D's template mechanisms a lot recently.  I'm trying to 
 build a parser generator using templates.  I'm now starting my 3rd 
 major generation of the library, because I keep finding fundamental 
 weaknesses in each method I try.

Do you /need/ one or do you want to /wright/ one? I ask because I have a working version of this already. http://www.dsource.org/projects/scrapple/browser/trunk/dparser

I don't know whether I *need* one...but it's a very fun exercise to write one. I have your svn repository checked out on my machine...but I haven't delved deeping into it, yet. From a cursory inspection, it didn't seem like it fit my needs. But I may be wrong. :)
  > [...]
 
 I haven't read your proposals in detail but the first one seems 
 impractical because it would be very hard to construct the combined 
 template. Consider, for example, a template where it is used to build 
 other instances of it's self. Also I /think/ you can overload templates 
 from different modules if the args are different.

What I'm thinking is something like this: ORIGINAL CODE (viewing template specialization as syntax sugar) template foo(T : int) { ... } template foo(T : char) { ... } REWRITTEN CODE (expressing specialization as is-expressions) template foo(TPL...) { static if(TPL.length == 1 && is( TPL[0] == int )) { ... } else static if(TPL.length == 1 && is( TPL[0] == char)) { ... } else static assert(false, "ERROR: Unsupported specialization of foo"); } END CODE
 As to the second >1 tuple idea, firstly, it can be done in a round about 
 way (see below) and it has been proposed but no usable way to implement 
 it has been put forward. (A few ides have been floated but they have had 
 problems)

Certainly, we need (someday) a syntax to allow multiple tuples in the same set of parameters, and that will require some sort of "grouping" syntax for the instantiator. And you can hack multiple-tuples using int params if you require the instantiators to play nicely with you (the first param is the length of the first tuple, the rest of the template parameters are one big tuple, the concatenation of your two tuples). But, at least for some cases (specialization) there is some low-hanging fruit, such as: * Args of function pointers & delegates EXAMPLE CODE template foo(void delegate(ARGS...), REST...) { ... } END CODE * Typesafe tuples (analogous to typesafe variadic functions) EXAMPLE CODE template foo(void delegate()... DGS, REST...) { ... } END CODE * Combine the two above, to give a tuple-of-tuples? EXAMPLE CODE template foo(void delegate(ARGS...)... DGS, REST...) { ... } END CODE
Mar 19 2008
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Russell Lewis wrote:
 What I'm thinking is something like this:
 
 ORIGINAL CODE (viewing template specialization as syntax sugar)
 
    template foo(T : int) { ... }
    template foo(T : char) { ... }
 
 REWRITTEN CODE (expressing specialization as is-expressions)
 
    template foo(TPL...)
    {
      static if(TPL.length == 1 && is( TPL[0] == int )) { ... }
      else
      static if(TPL.length == 1 && is( TPL[0] == char)) { ... }
      else
      static assert(false, "ERROR: Unsupported specialization of foo");
    }
 
 END CODE

Defining template specialization in terms of a single template with static if's would be a breaking change. Two reasons come to mind: 1. Template instantiation changes from "best match" to "first match". 2. SFINAE is no longer true. #1 This would shift the work of template matching from the compiler writer to the programmer. Ordering of templates would become important to the programmer. The viability of this will partially depend on if someone come up with an example where no single template ordering would work properly. If there's no examples or only pathological examples (that may break many implementations anyway), I'd be fine with the loss of functionality. #2 I know from past discussions that C++ programmers hold this very near and dear to their heart. Personally, it seems like a back door to long compilation times. I don't have any problem with enforcing criteria for templates to be defined up front, but I'm certain others don't agree with me. Does the loss of SFINAE make templates too close to generics? I know C# will refuse to compile code if the up-front criteria for a generic type isn't specific enough.
Mar 19 2008
next sibling parent reply Russell Lewis <webmaster villagersonline.com> writes:
Jason House wrote:
 Defining template specialization in terms of a single template with static
 if's would be a breaking change.  Two reasons come to mind:
   1. Template instantiation changes from "best match" to "first match".
   2. SFINAE is no longer true.
 
 #1 This would shift the work of template matching from the compiler writer
 to the programmer.  Ordering of templates would become important to the
 programmer.  The viability of this will partially depend on if someone come
 up with an example where no single template ordering would work properly. 
 If there's no examples or only pathological examples (that may break many
 implementations anyway), I'd be fine with the loss of functionality.

I miscommunicated. I didn't mean that templates would be dropped into the is-expression template in lexical order. I was assuming that the complier would generate them in the correct order to maintain "best match." So the most restrictive cases would be handled first, getting less and less restrictive as you go. And before you wonder, "how will we ever specify the right order to put things in?" remember that we need just that sort of language specification if anybody else is ever going to write another D compiler. That is, we *must* have some strict, duplicatable definition of "best match" if we will ever have any D compiler other than dmd.
 #2 I know from past discussions that C++ programmers hold this very near and
 dear to their heart.  Personally, it seems like a back door to long
 compilation times.  I don't have any problem with enforcing criteria for
 templates to be defined up front, but I'm certain others don't agree with
 me.  Does the loss of SFINAE make templates too close to generics?  I know
 C# will refuse to compile code if the up-front criteria for a generic type
 isn't specific enough.

(quoting the Joker) "I have a name for my pain, and it is SFINAE!" Now I understand why dmd works (or actually, fails to work) the way it does. When I have some deeply-nested set of templates, and there is some quirkly little problem deep within them, then dmd will often give me some totally useless error: foo.d: 25: template "wrapper_way_on_the_outside" has an error GRRRR!!!! I thought that this was a bug in the compiler... The only way to debug these problems is to unwrap the template, by hand, using repeated aliases over and over and over, dozens of compiles, until you finally find the problem. It's usually that some minor template way off in the bowels of my library wasn't properly overloaded for some specific case. void myFunctionThatWillNotCompile() { alias deepestNestedTemplate!(ARGS) a1; alias nextLevel!(deepestNestedTemplate!(ARGS)) a2; alias adNauseum!(...) a3; /+ ... this code commented out because SFINAE makes ... dmd produce error messages that don't tell me ... what I need to know +/ } Worse, it means that if I have a properly-specialized template for some case, but the implementation within it has a small error, my code is going to get routed to the general, catch-all template (which it never should have hit), and the errors that I actually *do* see won't make any sense. EEEK!!! I know that there must be a reason why people argue for SFINAE, but, from my recent experience, I have to disagree. I will say (with all modestly and readiness to learn otherwise) that "SFINAE IS EVIL." Here's my modest proposal: * If there is a "best match," force the code to use it. Don't ignore the best match because of internal compilation errors. * If people really need SFINAE, then give them a way to express it explicitly. Something like: template foo(int n) { static if(n == 0) try_next_best_specialization; }
Mar 19 2008
parent reply Jason House <jason.james.house gmail.com> writes:
Russell Lewis wrote:

 Jason House wrote:
 #2 I know from past discussions that C++ programmers hold this very near
 #and
 dear to their heart.  Personally, it seems like a back door to long
 compilation times.  I don't have any problem with enforcing criteria for
 templates to be defined up front, but I'm certain others don't agree with
 me.  Does the loss of SFINAE make templates too close to generics?  I
 know C# will refuse to compile code if the up-front criteria for a
 generic type isn't specific enough.

(quoting the Joker) "I have a name for my pain, and it is SFINAE!" Now I understand why dmd works (or actually, fails to work) the way it does. When I have some deeply-nested set of templates, and there is some quirkly little problem deep within them, then dmd will often give me some totally useless error: foo.d: 25: template "wrapper_way_on_the_outside" has an error GRRRR!!!! I thought that this was a bug in the compiler... The only way to debug these problems is to unwrap the template, by hand, using repeated aliases over and over and over, dozens of compiles, until you finally find the problem. It's usually that some minor template way off in the bowels of my library wasn't properly overloaded for some specific case. void myFunctionThatWillNotCompile() { alias deepestNestedTemplate!(ARGS) a1; alias nextLevel!(deepestNestedTemplate!(ARGS)) a2; alias adNauseum!(...) a3; /+ ... this code commented out because SFINAE makes ... dmd produce error messages that don't tell me ... what I need to know +/ } Worse, it means that if I have a properly-specialized template for some case, but the implementation within it has a small error, my code is going to get routed to the general, catch-all template (which it never should have hit), and the errors that I actually *do* see won't make any sense. EEEK!!! I know that there must be a reason why people argue for SFINAE, but, from my recent experience, I have to disagree. I will say (with all modestly and readiness to learn otherwise) that "SFINAE IS EVIL." Here's my modest proposal: * If there is a "best match," force the code to use it. Don't ignore the best match because of internal compilation errors. * If people really need SFINAE, then give them a way to express it explicitly. Something like: template foo(int n) { static if(n == 0) try_next_best_specialization; }

Making it explicit was discussed on the mailing list in the past (a few months ago?). IIRC, nothing came of it.
Mar 19 2008
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Jason House wrote:
 Russell Lewis wrote:
 
 Jason House wrote:
 #2 I know from past discussions that C++ programmers hold this very near
 #and
 dear to their heart.  Personally, it seems like a back door to long
 compilation times.  I don't have any problem with enforcing criteria for
 templates to be defined up front, but I'm certain others don't agree with
 me.  Does the loss of SFINAE make templates too close to generics?  I
 know C# will refuse to compile code if the up-front criteria for a
 generic type isn't specific enough.

Now I understand why dmd works (or actually, fails to work) the way it does. When I have some deeply-nested set of templates, and there is some quirkly little problem deep within them, then dmd will often give me some totally useless error: foo.d: 25: template "wrapper_way_on_the_outside" has an error GRRRR!!!! I thought that this was a bug in the compiler... The only way to debug these problems is to unwrap the template, by hand, using repeated aliases over and over and over, dozens of compiles, until you finally find the problem. It's usually that some minor template way off in the bowels of my library wasn't properly overloaded for some specific case. void myFunctionThatWillNotCompile() { alias deepestNestedTemplate!(ARGS) a1; alias nextLevel!(deepestNestedTemplate!(ARGS)) a2; alias adNauseum!(...) a3; /+ ... this code commented out because SFINAE makes ... dmd produce error messages that don't tell me ... what I need to know +/ } Worse, it means that if I have a properly-specialized template for some case, but the implementation within it has a small error, my code is going to get routed to the general, catch-all template (which it never should have hit), and the errors that I actually *do* see won't make any sense. EEEK!!! I know that there must be a reason why people argue for SFINAE, but, from my recent experience, I have to disagree. I will say (with all modestly and readiness to learn otherwise) that "SFINAE IS EVIL." Here's my modest proposal: * If there is a "best match," force the code to use it. Don't ignore the best match because of internal compilation errors. * If people really need SFINAE, then give them a way to express it explicitly. Something like: template foo(int n) { static if(n == 0) try_next_best_specialization; }

Making it explicit was discussed on the mailing list in the past (a few months ago?). IIRC, nothing came of it.

I think the main problem is that Walter basically hasn't ever spent time writing template-heavy code like this. Remember it took a lot of convincing just to get him to agree that D needed templates at all. So a dozen people can tell him it's a good idea, but it's hard for him to verify this is so for himself, and adding features ad-hoc when people ask for them is clearly not a good design recipe. --bb
Mar 19 2008
prev sibling next sibling parent Russell Lewis <webmaster villagersonline.com> writes:
Jason House wrote:
 #2 I know from past discussions that C++ programmers hold this very near and
 dear to their heart.  Personally, it seems like a back door to long
 compilation times.  I don't have any problem with enforcing criteria for
 templates to be defined up front, but I'm certain others don't agree with
 me.  Does the loss of SFINAE make templates too close to generics?  I know
 C# will refuse to compile code if the up-front criteria for a generic type
 isn't specific enough.

To put a humorous twist on my SFINAE rant, let's consider this pseudo-program: Bag BuyGroceries(SHOPPING_LIST...) () { auto myCar = GetInCar!(CalcCargoSize!(SHOPPING_LIST)); auto home = GetCurLocation(); auto myBank = GetBank(); auto store = GetStore(); DriveTo(myCar, myBank); auto money = myBank.Withdraw(CalcCost!(SHOPPING_LIST)); DriveTo(myCar, store); auto ret = Buy!(SHOPPING_LIST)(store); DriveTo(myCar, home); return ret; } If you didn't have enough money in your bank account, then, because of SFINAE, the compiler would give you the following error message: program.d: 12: You do not own a car.
Mar 19 2008
prev sibling parent reply Russell Lewis <webmaster villagersonline.com> writes:
Jason House wrote:
 #2 (SFINAE) I know from past discussions that C++ programmers hold this very
near and
 dear to their heart.  Personally, it seems like a back door to long
 compilation times.  I don't have any problem with enforcing criteria for
 templates to be defined up front, but I'm certain others don't agree with
 me.  Does the loss of SFINAE make templates too close to generics?  I know
 C# will refuse to compile code if the up-front criteria for a generic type
 isn't specific enough.

FYI: SFINAE is "Substitution Failure Is Not An Error". See Wikipedia. I thought about this some more, wondering why C++ programmers would have come to this conclusion. I now think it likely that they did this specifically because they didn't have static if() and is-expressions. If you don't have those two tools, then the only thing you can do is to run ahead into something, and hope it works...but if it doesn't work, you provide a more general alternative. Is it possible that SFINAE was a hack to get around C++'s lack of compile-time features? A hack that we can now dispense of?
Mar 19 2008
next sibling parent Don Clugston <dac nospam.com.au> writes:
Russell Lewis wrote:
 Jason House wrote:
 #2 (SFINAE) I know from past discussions that C++ programmers hold 
 this very near and
 dear to their heart.  Personally, it seems like a back door to long
 compilation times.  I don't have any problem with enforcing criteria for
 templates to be defined up front, but I'm certain others don't agree with
 me.  Does the loss of SFINAE make templates too close to generics?  I 
 know
 C# will refuse to compile code if the up-front criteria for a generic 
 type
 isn't specific enough.

FYI: SFINAE is "Substitution Failure Is Not An Error". See Wikipedia. I thought about this some more, wondering why C++ programmers would have come to this conclusion. I now think it likely that they did this specifically because they didn't have static if() and is-expressions. If you don't have those two tools, then the only thing you can do is to run ahead into something, and hope it works...but if it doesn't work, you provide a more general alternative. Is it possible that SFINAE was a hack to get around C++'s lack of compile-time features? A hack that we can now dispense of?

Possibly. It's hard to come up with code that looks cleaner with specialisation and SFINAE than it does with a static if. My experience was similar to yours -- I stopped using SFINAE.
Mar 20 2008
prev sibling parent reply Jason House <jason.james.house gmail.com> writes:
Russell Lewis Wrote:

 Jason House wrote:
 #2 (SFINAE) I know from past discussions that C++ programmers hold this very
near and
 dear to their heart.  Personally, it seems like a back door to long
 compilation times.  I don't have any problem with enforcing criteria for
 templates to be defined up front, but I'm certain others don't agree with
 me.  Does the loss of SFINAE make templates too close to generics?  I know
 C# will refuse to compile code if the up-front criteria for a generic type
 isn't specific enough.

FYI: SFINAE is "Substitution Failure Is Not An Error". See Wikipedia.

Sorry, I should have defined it in my post. That's what I get for thinking about too many things at once.
 I thought about this some more, wondering why C++ programmers would have 
 come to this conclusion.  I now think it likely that they did this 
 specifically because they didn't have static if() and is-expressions. 
 If you don't have those two tools, then the only thing you can do is to 
 run ahead into something, and hope it works...but if it doesn't work, 
 you provide a more general alternative.

Sounds right.
 Is it possible that SFINAE was a hack to get around C++'s lack of 
 compile-time features?  A hack that we can now dispense of?

I don't know if it's that simple. What if I want a specialization where T extends from two interfaces? I think the current syntax can handle simple stuff, but I don't know if it can go beyond that. Maybe some would argue the more complex cases should be done with static if anyway... Personally, I wouldn't mind some kind of optional in clause for a templated definition that can do a few static asserts to force more complex conditions... I don't know what would be the best syntax for that.
Mar 20 2008
next sibling parent reply Russell Lewis <webmaster villagersonline.com> writes:
Jason House wrote:
 Is it possible that SFINAE was a hack to get around C++'s lack of 
 compile-time features?  A hack that we can now dispense of?

I don't know if it's that simple. What if I want a specialization where T extends from two interfaces? I think the current syntax can handle simple stuff, but I don't know if it can go beyond that. Maybe some would argue the more complex cases should be done with static if anyway... Personally, I wouldn't mind some kind of optional in clause for a templated definition that can do a few static asserts to force more complex conditions... I don't know what would be the best syntax for that.

Good idea! But just to make things clear, let's make it an "SFINAE block." How about this definition: BEGIN CODE template foo(PARAMS) sfinae { // any failures in this block of code will cause DMD to // silently try a different template specialization } body { // any failures in this block of code are hard failures // ...but this block of code should be able to reference // symbols declared in the sfinae block } END CODE
Mar 20 2008
parent Jason House <jason.james.house gmail.com> writes:
Russell Lewis Wrote:

 Jason House wrote:
 Is it possible that SFINAE was a hack to get around C++'s lack of 
 compile-time features?  A hack that we can now dispense of?

I don't know if it's that simple. What if I want a specialization where T extends from two interfaces? I think the current syntax can handle simple stuff, but I don't know if it can go beyond that. Maybe some would argue the more complex cases should be done with static if anyway... Personally, I wouldn't mind some kind of optional in clause for a templated definition that can do a few static asserts to force more complex conditions... I don't know what would be the best syntax for that.

Good idea! But just to make things clear, let's make it an "SFINAE block." How about this definition: BEGIN CODE template foo(PARAMS) sfinae { // any failures in this block of code will cause DMD to // silently try a different template specialization } body { // any failures in this block of code are hard failures // ...but this block of code should be able to reference // symbols declared in the sfinae block } END CODE

Yes, that's exactly the kind of thing I'm thinking of.
Mar 20 2008
prev sibling parent BCS <BCS pathlink.com> writes:
Jason House wrote:

 Personally, I wouldn't mind some kind of optional in
 clause for a templated definition that can do a few
 static asserts to force more complex conditions...
 I don't know what would be the best syntax for that.

pragma(fail) ??
Mar 20 2008
prev sibling parent reply BCS <BCS pathlink.com> writes:
Russell Lewis wrote:

 
 I don't know whether I *need* one...but it's a very fun exercise to 
 write one.  I have your svn repository checked out on my machine...but I 
 haven't delved deeping into it, yet.  From a cursory inspection, it 
 didn't seem like it fit my needs.  But I may be wrong. :)
 

What do you need/want? Outside not having () groups, crashing on Left recursive grammars and a funky syntax for action rules, I think is fairly complete. I'm actually starting to work on it again and might be able to work in something if you need it.
 
 
 Certainly, we need (someday) a syntax to allow multiple tuples in the 
 same set of parameters, and that will require some sort of "grouping" 
 syntax for the instantiator.  And you can hack multiple-tuples using int 
 params if you require the instantiators to play nicely with you (the 
 first param is the length of the first tuple, the rest of the template 
 parameters are one big tuple, the concatenation of your two tuples).
 
 But, at least for some cases (specialization) there is some low-hanging 
 fruit, such as:
 

[...] Those cases might be of use.
Mar 19 2008
parent reply Russell Lewis <webmaster villagersonline.com> writes:
BCS wrote:
 Russell Lewis wrote:
 
 I don't know whether I *need* one...but it's a very fun exercise to 
 write one.  I have your svn repository checked out on my machine...but 
 I haven't delved deeping into it, yet.  From a cursory inspection, it 
 didn't seem like it fit my needs.  But I may be wrong. :)

What do you need/want? Outside not having () groups, crashing on Left recursive grammars and a funky syntax for action rules, I think is fairly complete. I'm actually starting to work on it again and might be able to work in something if you need it.

I don't really have logical reasoning here, so take what I'm about to say as "the story of what Russ did" instead of "substantive criticism of any details": The thing that I stumbled on was that, if I understood you correctly, your project is a port of the dmd front-end. What I'm looking for is a parser for the D language which is explicitly not tied to dmd. Not because dmd is bad (I use it all the time), but because I think that, for the health of the community, I think we really need more than one compiler. Ports of dmd don't count, IMHO. The other thing I'm doing is that I'm experimenting with my new parser architecture (infill delegates instead of building parse trees in the first pass), so I don't expect that other parser libraries are likely to be places to play with that. I posted a description of this parser architecture in one of my (very few) posts to digitalmars.D.learn BTW, my current (95% working, just had some corner cases that wouldn't auto-detect the right specialized templates) parser library uses syntax like this: or!(':', Parse_block, /* parses a nonterminal */ chain!("import", array!(IDENT, ','), ';'), ... many other alternativess ...) (delegate void(... auto-generated arg types ...) { ... process successful parse here ... }); It's far from perfect, but it's fun. It includes the templates chain!(...) (sequences) optional!(...) (things that can be skipped) array!(...) (repeated sequences, with or without dividers) or!(...) (alternatives) and they nest nicely. :)
Mar 19 2008
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to Russell,

 I don't really have logical reasoning here, so take what I'm about to
 say as "the story of what Russ did" instead of "substantive criticism
 of any details":
 
 The thing that I stumbled on was that, if I understood you correctly,
 your project is a port of the dmd front-end.  What I'm looking for is
 a parser for the D language which is explicitly not tied to dmd.

Um. OK that understanding would make a big difference. What I have is a general parser generator. I also have a (non-functional) D grammar for it.
  Not
 because dmd is bad (I use it all the time), but because I think that,
 for the health of the community, I think we really need more than one
 compiler.

that's why I'm writing it.
 
 The other thing I'm doing is that I'm experimenting with my new parser
 architecture (infill delegates instead of building parse trees in the
 first pass), so I don't expect that other parser libraries are likely
 to be places to play with that.  I posted a description of this parser
 architecture in one of my (very few) posts to digitalmars.D.learn
 

You /might/ be able to use what I have to build that. You would need to embed the delegate generation/passing stuff in the action though.
 BTW, my current (95% working, just had some corner cases that wouldn't
 auto-detect the right specialized templates) parser library uses
 syntax like this:
 
 or!(':',
 Parse_block,  /* parses a nonterminal */
 chain!("import", array!(IDENT, ','), ';'),
 ... many other alternativess ...)
 (delegate void(... auto-generated arg types ...)
 {
 ... process successful parse here ...
 });

Errk!! Ow, Ow, ow. To much extra syntax! (for my taste) Mine works off a single string: |mixin(MakeMixin!("root",ReduceWhite(" | next : | bar / baz | | pig / owl * horse owl ; | root : | beer / next baz | | pig / baz horse owl | | keg / cat horse+ owl? baz | | pig / cat* cat cat | | beer / twin * car; | twin : | beer / king qween* ; | car : | keg / king qween qween king;"))); that, in the right place, with a few more template defined, make a parser.
Mar 19 2008
parent reply Russell Lewis <webmaster villagersonline.com> writes:
BCS wrote:
 or!(':',
 Parse_block,  /* parses a nonterminal */
 chain!("import", array!(IDENT, ','), ';'),
 ... many other alternativess ...)
 (delegate void(... auto-generated arg types ...)
 {
 ... process successful parse here ...
 });

Errk!! Ow, Ow, ow. To much extra syntax! (for my taste) Mine works off a single string:

True! But what I'm attempting to do, at this point in the development, is to develop an as-simple-as-I-can-get-it template parser library. As I see it, the *next* step is to build a string parser which calls the parser library. And, if I say so myself, I am, IMHO, getting pretty close to "as dense as you can get without having to parse strings at compile time." Ofc, if you have a parser library which is functional even if it's a little clunky, then hopefully you could use that library to scan input strings, build parse trees, and then generate parsers from that. :) So introduce me to your parser a bit. How do you handle things like: * Single character tokens (matching the . operator in D syntax, for instance) * Multi-character tokens (matching D keywords) * Lexer-recognized tokens (IDENT, NUM, CHAR_LITERAL, etc.) Also, how good are you at handling ambiguous grammars? How about any built-in way to handle repeated strings of elements, and turn them into arrays? Finally, how do you define the action-code? As a little more background, I actually wrote a parser generator that generated parsers from an extended-Bison syntax. It was slow, but functioned, and the output that the parser produced were auto-generated D structs and arrays that mirrored the parse tree. I still found it hard to read the grammar code. Here's the first few lines of the grammar for D: BEGIN CODE (IN MY BISON-DERIVED GRAMMAR) %token IDENT %{char*%} %token NUM %{char*%} %token STRING %{char*%} %token CHAR %{char*%} %token SHEBANG %{char*%} module_df: // http://digitalmars.com/d/module.html SHEBANG? ("module" IDENT_list=name ';')? /* HACK: *= is broken */ (decl_def+=decl_defs)? ; IDENT_list: [IDENT=ident,'.']+=name ; END CODE Problem was, when I tried to write complex rules, all in one integrated block, that things got hopelessly complex: BEGIN CODE class_declaration: // http://digitalmars.com/d/class.html ("auto"=is_auto|"scope"=is_scope)? "class" IDENT=name (template_parms_decl=template_parms)? (':' [("public"=is_public|"protected"=is_protected|"package"=is_package|"p ivate"=is_private)? type=type,',']+=super_interfaces)? '{' /* HACK: *= is broken */ (decl_def+=decl_defs)? '}' ; END CODE The nonterminal above will parse (nearly?) all of the class declarations that you can find in the DMD sources, or in dstress...but can you read it? I'm not so much drawing conclusions ("this sort of grammar can never work") as looking for alternatives ("I wonder if..."). Tell me the details about your design!
Mar 19 2008
parent reply BCS <BCS pathlink.com> writes:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Russell Lewis wrote:
 BCS wrote:
 
 or!(':',
 Parse_block,  /* parses a nonterminal */
 chain!("import", array!(IDENT, ','), ';'),
 ... many other alternativess ...)
 (delegate void(... auto-generated arg types ...)
 {
 ... process successful parse here ...
 });

Errk!! Ow, Ow, ow. To much extra syntax! (for my taste) Mine works off a single string:

True! But what I'm attempting to do, at this point in the development, is to develop an as-simple-as-I-can-get-it template parser library. As I see it, the *next* step is to build a string parser which calls the parser library. And, if I say so myself, I am, IMHO, getting pretty close to "as dense as you can get without having to parse strings at compile time."

I'll grant that. I came from the other side; In my system the string parser is an integral part of the system. I don't think it could be trivially removed (or have bean added after the rest was written).
 
 Ofc, if you have a parser library which is functional even if it's a 
 little clunky, then hopefully you could use that library to scan input 
 strings, build parse trees, and then generate parsers from that. :)
 

I don't follow: Are you talking about parsing at run time or compile time?
 
 So introduce me to your parser a bit.  How do you handle things like:
 * Single character tokens (matching the . operator in D syntax, for
   instance)
 * Multi-character tokens (matching D keywords)
 * Lexer-recognized tokens (IDENT, NUM, CHAR_LITERAL, etc.)
 

(attached is an example parser implementation) Short version; I don't, I don't, and I do. Longer version. I don't even bother with lexing. That is left completely up to the programmer. They can even do it in Lex/Flex if they want. What I expect is an object that will give me it's "position" and can be set back to that "position" by feeding that value back in. This object is then passed back to named functions that the programmer supplys and these functions do the lexing or read into a pre lexed stream or whatever.
 Also, how good are you at handling ambiguous grammars?
 

It doesn't care. It work by matching the first match given so if there is ambiguity, it won't know and will still produce something as output.
 How about any built-in way to handle repeated strings of elements, and 
 turn them into arrays?

Yes, mostly. The grammar definitions allow '+', '*' and '?'. But can't handle groups except as another non terminal. The annotated terms are aggregated into an Object that can return the array of items.
 
 Finally, how do you define the action-code?
 

This is done be referencing a function for each disjunct in the rule. The terminals and actions code for the system (the actual D code the user needs to wright) are suppled by defining explicit string specializations of the "Terminal" and "Action" templates.
 
 
 I'm not so much drawing conclusions ("this sort of grammar can never 
 work") as looking for alternatives ("I wonder if...").  Tell me the 
 details about your design!

My generator will consume the (almost unmodified) D grammar in the spec. I actually have a sed script that will extract it for use (it's not done but the rest isn't to complex). I still need to factor out left recursion but I think I can automate that. My current project is allowing special cases action rules that will make this easy (e.g. process "item items*" as a left sides tree). Attached is a parser that uses dparse to, at runtime, parse the specification grammar that dparse uses at compile time. It might be a bit to new for dparse (I have been working on it and don't want to commit it in a bad state) but it should give you an idea of how things work. BTW the lexer (stuff before about line 210) in there is a bit cheesy because I'm lazy and that works. The /real/ guts start around line 315.
Mar 20 2008
parent reply Russell Lewis <webmaster villagersonline.com> writes:
BCS wrote:
 Russell Lewis wrote:

Thanks for the example code. I'll look it over soon, but right now I have to do my real job. :( I do have one response and one more question, though... (huge snip)
 Ofc, if you have a parser library which is functional even if it's a 
 little clunky, then hopefully you could use that library to scan input 
 strings, build parse trees, and then generate parsers from that. :)

I don't follow: Are you talking about parsing at run time or compile time?

Both, actually. My hope is to build a a parser library which is good for runtime parses, but which also, with CTFE and a string-to-template compile-time lexer, could be used to parse the actual grammar code that people give me as input. That is: stringInput -> compile-time-lexer -> myLibrary -> CTFE -> parse tree of user's grammar -> code which uses my library -> generated parser We'll see if it works...
 Also, how good are you at handling ambiguous grammars?

It doesn't care. It work by matching the first match given so if there is ambiguity, it won't know and will still produce something as output.

I should have also asked about lookahead. I didn't, because I view lookahead & ambiguity as the same thing in my parser. :) A key problem I had with Bison was that it couldn't handle C or D grammars in the "natural" form. The canonical example of this is the token string IDENT '*' '*' ... '*' IDENT ... which could be parsed as "declaration of pointer variable" (in which case the stars would naturally be part of "type" nonterminals, attached the to left side), or "multiplication where 2nd element needs lots of dereferencing" (in which case most of the stars would natrually be part of "expression" nonterminals, attached to the right side). This the classic reduce/shift conflict problem in yacc/Bison. Ofc, you can write grammars which don't need arbitrary recursion, but a high priority for me is to support natural grammar definitions and do the hard work in the library. So: can you handle arbitrary lookahead? Or do you, like Bison, make a one-time choice of "reduce or shift" and then you're stuck with it?
Mar 20 2008
parent reply BCS <BCS pathlink.com> writes:
Russell Lewis wrote:
 BCS wrote:
 
 Russell Lewis wrote:

Thanks for the example code. I'll look it over soon, but right now I have to do my real job. :( I do have one response and one more question, though... (huge snip)
 Ofc, if you have a parser library which is functional even if it's a 
 little clunky, then hopefully you could use that library to scan 
 input strings, build parse trees, and then generate parsers from 
 that. :)

I don't follow: Are you talking about parsing at run time or compile time?

Both, actually. My hope is to build a a parser library which is good for runtime parses, but which also, with CTFE and a string-to-template compile-time lexer, could be used to parse the actual grammar code that people give me as input. That is: stringInput -> compile-time-lexer -> myLibrary -> CTFE -> parse tree of user's grammar -> code which uses my library -> generated parser We'll see if it works...

you are ambitious! You might run into some issues with the CTFE retuning parsed structures. IIRC it's very restrictive about what it will return.
 Also, how good are you at handling ambiguous grammars?

It doesn't care. It work by matching the first match given so if there is ambiguity, it won't know and will still produce something as output.

I should have also asked about lookahead. I didn't, because I view lookahead & ambiguity as the same thing in my parser. :)

 
 So: can you handle arbitrary lookahead?  Or do you, like Bison, make a 
 one-time choice of "reduce or shift" and then you're stuck with it?

Mine is a recursive decent LL parser that works by way of "assume this is one of those and if not back-out and try again". So it is LL(*).
Mar 20 2008
parent reply Russell Lewis <webmaster villagersonline.com> writes:
BCS wrote:
 So: can you handle arbitrary lookahead?  Or do you, like Bison, make a 
 one-time choice of "reduce or shift" and then you're stuck with it?

Mine is a recursive decent LL parser that works by way of "assume this is one of those and if not back-out and try again". So it is LL(*).

Cool! Though that leads me to another question: when do the actions run? Inline, while you're attempting paths that might fail, or only after you know they can't fail?
Mar 20 2008
parent reply BCS <BCS pathlink.com> writes:
Russell Lewis wrote:
 BCS wrote:
 
 So: can you handle arbitrary lookahead?  Or do you, like Bison, make 
 a one-time choice of "reduce or shift" and then you're stuck with it?

Mine is a recursive decent LL parser that works by way of "assume this is one of those and if not back-out and try again". So it is LL(*).

Cool! Though that leads me to another question: when do the actions run? Inline, while you're attempting paths that might fail, or only after you know they can't fail?

In-line. That plays hell with side effects as you can't trigger on dropped paths for back out. Also there's a time penalty, but given that efficiency is so bad to begin with it's not much of an issue. If a user needed to have side effects then building a lambda tree might work. In 2.0 this would be easy with the non-broken closers. One thing I've bean thinking of is trying to build in a memoizer: "we tried to parse a foo at location 294 before and got this, lets use it" I'm not sure how or how well it will work but I've got some ideas.
Mar 20 2008
parent reply Russell Lewis <webmaster villagersonline.com> writes:
BCS wrote:
 One thing I've bean thinking of is trying to build in a memoizer: "we 
 tried to parse a foo at location 294 before and got this, lets use it" 
 I'm not sure how or how well it will work but I've got some ideas.

I have an idea that didn't work with my design, but it probably would with yours: you should be able to do a hybrid parser with bottom-up parsings of unambiguous strings. Basically, the concept is that you look for certain tokens (or strings of tokens) which are only used in a certain reduction. Then, you build a low-level scanner which scans the lexer's output (utterly without any understanding of the larger contexts) and collapses any patterns that it recognizes. You can perform multiple passes, and collapse even strings of things that include nonterminals, if you get lucky. Then, once you run out of bottom-up parsings, you run the top-down parser. Things that were bottom-up parsed are effectively memoized. There are other tricks you can use, too, by mixing the bottom-up and top-down parsers. For instance, in D the token ? is (as far as I remember) only used as part of a conditional expression. Your parser library could detect that in the grammar, and say, "whenever I see ? in the bottom-up parser scan, I will run the top-down parser to parse an 'expression' nonterminal as the next thing, then look for a ':' token and then do it again." Likewise, '&' only comes before an expression, as do all of the assignment operators and most of the binary operators. In most of these cases, you don't have any efficient way to bottom-up parse the stuff to the *left* of the key-token, but you can at least memoize the stuff on the right. Of course, it only works on grammars that happen to have unambiguous opportunities, but in D, there are a lot of those.
Mar 20 2008
parent BCS <BCS pathlink.com> writes:
Russell Lewis wrote:
 I have an idea that didn't work with my design, but it probably would 
 with yours: you should be able to do a hybrid parser with bottom-up 
 parsings of unambiguous strings.  Basically, the concept is that you 
 look for certain tokens (or strings of tokens) which are only used in a 
 certain reduction.  Then, you build a low-level scanner which scans the 
 lexer's output (utterly without any understanding of the larger 
 contexts) and collapses any patterns that it recognizes.  You can 
 perform multiple passes, and collapse even strings of things that 
 include nonterminals, if you get lucky.  Then, once you run out of 
 bottom-up parsings, you run the top-down parser.  Things that were 
 bottom-up parsed are effectively memoized.
 

Actually this could be implemented under underneath, beside or on top of mine (I'm not sure what the correct way to describe it it). The lexer object could be passed into a different function in the same struct as the parser that does what you describe. Because each of the non terminals has a publicly callable function for it, the mixed parser idea would be easy. (automating the extraction of the available rules would be a trick though)
 There are other tricks you can use, too, by mixing the bottom-up and 
 top-down parsers.  For instance, in D the token ? is (as far as I 
 remember) only used as part of a conditional expression.  Your parser 
 library could detect that in the grammar, and say, "whenever I see ? in 
 the bottom-up parser scan, I will run the top-down parser to parse an 
 'expression' nonterminal as the next thing, then look for a ':' token 
 and then do it again."  Likewise, '&' only comes before an expression, 
 as do all of the assignment operators and most of the binary operators. 
  In most of these cases, you don't have any efficient way to bottom-up 
 parse the stuff to the *left* of the key-token,

that's no problem, just use an RR parser rather than an LL one. The only issue would be that a lexer that can also walk backwards is more complicated.
 but you can at least 
 memoize the stuff on the right.
 
 Of course, it only works on grammars that happen to have unambiguous 
 opportunities, but in D, there are a lot of those.

Mar 20 2008
prev sibling parent Robert Fraser <fraserofthenight gmail.com> writes:
Russell Lewis wrote:
  What I'm looking for is a
 parser for the D language which is explicitly not tied to dmd.  Not 
 because dmd is bad (I use it all the time), but because I think that, 
 for the health of the community, I think we really need more than one 
 compiler.  Ports of dmd don't count, IMHO.

I just learned about this one, but it looks pretty awesome: http://code.google.com/p/dil/
Mar 19 2008
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Russell Lewis wrote:

 As you can tell from my recent posts (and bug reports), I've been
 pounding on D's template mechanisms a lot recently.  I'm trying to build
 a parser generator using templates.  I'm now starting my 3rd major
 generation of the library, because I keep finding fundamental weaknesses
 in each method I try.
 
 I'm writing this post to observe, publicly, that there is *A LOT* over
 overlap between functions with IFTI, template specializations, and
 is-expressions.  However, the overlap isn't perfect; AFAIK, none of
 these 3 is a strict superset of the others.  So, it looks to me like I
 can't just choose one method and stick with it.  I would like to
 advocate for two changes to the language:

I had to do a google search for IFTI. The top hits pointed me back to your bug reports and posts... I'm pretty sure it means "implicit function template instantiation"
Mar 19 2008
next sibling parent BCS <BCS pathlink.com> writes:
Jason House wrote:

 
 I had to do a google search for IFTI.  The top hits pointed me back to your
 bug reports and posts...  I'm pretty sure it means "implicit function
 template instantiation"

bingo. p.s. That reminds me... isn't it REALLY annoying when you Google for something and it points you to something YOU wrote!? (almost as bad as Windows saying to talk to the sys admin when you are the sys admin)
Mar 19 2008
prev sibling parent Russell Lewis <webmaster villagersonline.com> writes:
Jason House wrote:
 I had to do a google search for IFTI.  The top hits pointed me back to your
 bug reports and posts...  I'm pretty sure it means "implicit function
 template instantiation"

(sheepish grin) You're right, that's what I meant. Sorry!
Mar 19 2008
prev sibling parent reply Russell Lewis <webmaster villagersonline.com> writes:
While I'm glad for the discussion about parsers, we lost track of the 
original post.  What do people think of my suggestions:

* Make IFTI and template-specialization syntax sugar for is-expressions 
(in the right order so the deductions work properly)

* Support multiple tuples in the same template
Mar 21 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Russell Lewis:
 * Support multiple tuples in the same template

For that I think D has to stop flattening tuples, and I think this can be a quite positive thing, allowing 2D/nD arrays of values, trees of them, and generally their nesting. Bye, bearophile
Mar 21 2008
parent renoX <renosky free.fr> writes:
bearophile a écrit :
 Russell Lewis:
 * Support multiple tuples in the same template

For that I think D has to stop flattening tuples

I *definitely* agree!!!! In all the case I've run against 'flattening by default' is a mistake (be it with parameter passing or with structure/record construction): this *loose information* by default which is a big no no in my view. Of course sometimes the developer wants to have the tuples flattened, so there should be a way to do it, but this shouldn't be the default. Regards, renoX
, and I think this
 can be a quite positive thing, allowing 2D/nD arrays of values, trees
 of them, and generally their nesting.
 
 Bye, bearophile

Mar 22 2008
prev sibling next sibling parent BCS <ao pathlink.com> writes:
Reply to Russell,

 While I'm glad for the discussion about parsers,

"The D NG: where all the good stuff is mis-labeled and off topic!" I have seen some threads that are 15+ deeper than where they went OT and /are still worth reading/. :-) (already said my piece on the original topic.)
Mar 21 2008
prev sibling parent reply Jason House <jason.james.house gmail.com> writes:
Russell Lewis wrote:

 While I'm glad for the discussion about parsers, we lost track of the
 original post.  What do people think of my suggestions:
 
 * Make IFTI and template-specialization syntax sugar for is-expressions
 (in the right order so the deductions work properly)

Since templates could eventually span multiple modules, I don't know if one could simply write static if and is expressions to cover all functionality. When limited to a single module, I think there should be a one to one mapping of some kind.
 
 * Support multiple tuples in the same template

I can appreciate the trouble not having this can cause. As long as it doesn't ruin the cool, clean syntax of lightweight use of generic code with tuples, I'm all for it.
Mar 22 2008
parent reply Russell Lewis <webmaster villagersonline.com> writes:
Jason House wrote:
 Russell Lewis wrote:
 
 While I'm glad for the discussion about parsers, we lost track of the
 original post.  What do people think of my suggestions:

 * Make IFTI and template-specialization syntax sugar for is-expressions
 (in the right order so the deductions work properly)

Since templates could eventually span multiple modules, I don't know if one could simply write static if and is expressions to cover all functionality. When limited to a single module, I think there should be a one to one mapping of some kind.

Do templates currently have the ability to span multiple modules? With other types of symbols, this type of overloading isn't allowed (you have to use aliases to explicitly bring in overloads from other modules). If templates are the same way, then we don't have a problem. And just to shortcut the OT discussion...this type of overloading seems very wrong at first, but it is done by design because without it you face some really nasty problems with large programs. It's a "this makes sense in hindsight design feature that we learned from C++." Read previous (long) NG threads about it.
Mar 22 2008
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Russell Lewis wrote:

 Jason House wrote:
 Russell Lewis wrote:
 
 While I'm glad for the discussion about parsers, we lost track of the
 original post.  What do people think of my suggestions:

 * Make IFTI and template-specialization syntax sugar for is-expressions
 (in the right order so the deductions work properly)

Since templates could eventually span multiple modules, I don't know if one could simply write static if and is expressions to cover all functionality. When limited to a single module, I think there should be a one to one mapping of some kind.

Do templates currently have the ability to span multiple modules? With other types of symbols, this type of overloading isn't allowed (you have to use aliases to explicitly bring in overloads from other modules). If templates are the same way, then we don't have a problem.

posts by others lead me to believe that they can't right now. The same posts also talk about why that's a problem. For example, a custom traits class may know nothing about a new user defined type, and to work would require a specialization for the new type.
 And just to shortcut the OT discussion...this type of overloading seems
 very wrong at first, but it is done by design because without it you
 face some really nasty problems with large programs.  It's a "this makes
 sense in hindsight design feature that we learned from C++."  Read
 previous (long) NG threads about it.

I wonder a bit if the overload sets (an upcoming feature of D2) would be useful for templates as well...
Mar 22 2008
parent BCS <ao pathlink.com> writes:
Reply to Jason,

 posts by others lead me to believe that they can't right now.  The
 same posts also talk about why that's a problem.  For example, a
 custom traits class may know nothing about a new user defined type,
 and to work would require a specialization for the new type.
 

The best I expect you could do would be to have the new UDT generate some hooks to be used by the traits lib. I kind of do some of this in some of my template coding. I have a number of places where I expect a template end up as a type with certain members (I end up duck typing stuff) and what not. Another one that would be interesting would be to require the UDT to defined code generators that the lib then calls and mixes in. Short of something like that, I think your sunk whatever you do. One side needs to know about the other side at some level. The alternative (no body known about anybody but it all just works) was the objective of M$'s IP project. They called it an abstraction ecology. After 10 years and untold millions they had "the world most expensive C compiler" and no ecology.
Mar 22 2008
prev sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Russell Lewis wrote:
 Jason House wrote:
 Russell Lewis wrote:

 While I'm glad for the discussion about parsers, we lost track of the
 original post.  What do people think of my suggestions:

 * Make IFTI and template-specialization syntax sugar for is-expressions
 (in the right order so the deductions work properly)

Since templates could eventually span multiple modules, I don't know if one could simply write static if and is expressions to cover all functionality. When limited to a single module, I think there should be a one to one mapping of some kind.

Do templates currently have the ability to span multiple modules? With other types of symbols, this type of overloading isn't allowed (you have to use aliases to explicitly bring in overloads from other modules). If templates are the same way, then we don't have a problem. And just to shortcut the OT discussion...this type of overloading seems very wrong at first, but it is done by design because without it you face some really nasty problems with large programs. It's a "this makes sense in hindsight design feature that we learned from C++." Read previous (long) NG threads about it.

Are you saying templates don't need to overload across modules? If so, what is the D alternative to the non-intrusive traits template pattern used in C++? --bb
Mar 22 2008
parent Russell Lewis <webmaster villagersonline.com> writes:
Bill Baxter wrote:
 Are you saying templates don't need to overload across modules?  If so, 
 what is the D alternative to the non-intrusive traits template pattern 
 used in C++?

I tried, but I failed, to shortcut that rabbit trail. So let me take the next step, and repeat the subtle problems with silent cross-module overloading. The problem comes about when two different programmers, in different libraries, write templates with the same name. Say that programmer A writes a template foo with two specializations: BEGIN CODE template foo(int A,TPL...) { ... } template foo(TPL...) { ... } END CODE Now, ages later, when somebody links that with another library, we find this template: BEGIN CODE template foo(char c,TPL...) { ... } END CODE Now, when you use the template: alias foo!('c') myTemplateInstance; which library are you trying to use?
Mar 22 2008