
digitalmars.D.announce - DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Watch, discuss, upvote!

https://news.ycombinator.com/newest

https://twitter.com/D_Programming/status/476386465166135296

https://www.facebook.com/dlang.org/posts/863635576983458

http://www.reddit.com/r/programming/comments/27sjxf/dconf_2014_day_1_talk_4_inside_the_regular/


Andrei
Jun 10 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Tuesday, 10 June 2014 at 15:37:11 UTC, Andrei Alexandrescu 
wrote:
 Watch, discuss, upvote!

 https://news.ycombinator.com/newest

 https://twitter.com/D_Programming/status/476386465166135296

 https://www.facebook.com/dlang.org/posts/863635576983458

 http://www.reddit.com/r/programming/comments/27sjxf/dconf_2014_day_1_talk_4_inside_the_regular/


 Andrei
http://youtu.be/hkaOciiP11c
Jun 10 2014
parent "Joakim" <dlang joakim.airpost.net> writes:
On Tuesday, 10 June 2014 at 17:19:42 UTC, Dicebot wrote:
 On Tuesday, 10 June 2014 at 15:37:11 UTC, Andrei Alexandrescu 
 wrote:
 Watch, discuss, upvote!

 https://news.ycombinator.com/newest

 https://twitter.com/D_Programming/status/476386465166135296

 https://www.facebook.com/dlang.org/posts/863635576983458

 http://www.reddit.com/r/programming/comments/27sjxf/dconf_2014_day_1_talk_4_inside_the_regular/


 Andrei
http://youtu.be/hkaOciiP11c
Great talk, just finished watching the YouTube upload. I zoned out during the livestream - it was late over here and I was falling asleep during this fairly technical talk - but now that I'm awake, I enjoyed going through it. I never knew how regular expression engines are implemented, so this was a good introduction to the topic and to how D made your approach easier or harder. A model talk for DConf, particularly given the great results on the regex-dna benchmark.
Jun 12 2014
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
At about 40:42, on the "Thoughts on static regex" slide, it 
says "even compile-time printf would be awesome". There is a 
patch about __ctWrite on GitHub; it should be fixed and merged.

Bye,
bearophile
Jun 10 2014
parent reply "Atila Neves" <atila.neves gmail.com> writes:
On Tuesday, 10 June 2014 at 19:36:57 UTC, bearophile wrote:
 At about 40:42, on the "Thoughts on static regex" slide, it 
 says "even compile-time printf would be awesome". There is a 
 patch about __ctWrite on GitHub; it should be fixed and merged.

 Bye,
 bearophile
I wish I'd taken the mic at the end; two days later Adam D. Ruppe said what I was thinking of saying: unit test and debug the CTFE function at runtime, then use it at compile time once it's ready for production.

Yes, Dmitry brought up compiler bugs. But if you write a compile-time unit test and it fails while the run-time ones still pass, you'll know the failure wasn't caused by your own code.

Maybe there's still a place for something more than pragma(msg), but I'd definitely advocate for the above, at least in the beginning. If anything, easier ways to write compile-time unit tests would be, to me, preferable to a compile-time printf.

Atila
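The workflow described above might look like this minimal sketch (the generator function and its names are invented for illustration):

```d
// Hypothetical example: develop and unittest the generator at run time,
// then evaluate the very same function at compile time.
string makeGetter(string field) {
    return "int " ~ field ~ "() { return _" ~ field ~ "; }";
}

// Run-time unit test: debuggable with a normal debugger or printf.
unittest {
    assert(makeGetter("x") == "int x() { return _x; }");
}

// Compile-time use: the same function, forced through CTFE by mixin.
struct S {
    int _x;
    mixin(makeGetter("x"));
}

// If this ever fails while the unittest above still passes,
// the fault is in CTFE, not in your own code.
static assert(S(42).x == 42);
```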
Jun 11 2014
next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 11 June 2014 at 18:03:06 UTC, Atila Neves wrote:
 I wish I'd taken the mic at the end, and 2 days later Adam D. 
 Ruppe said what I was thinking of saying: unit test and debug 
 the CTFE function at runtime and then use it at compile-time 
 when it's ready for production.
Aye. It wasn't long ago that this wasn't really possible because of how incomplete and buggy CTFE was - you kinda had to do it with special code - but now so much of the language works that there's a good chance that if it works at runtime, it will work at compile time too. I was really surprised with CTFE a few months ago when I tried to use my dom.d with it... and it actually worked. That's amazing to me.

But anyway, in general, the CTFE mixin stuff could be replaced with an external code generator, so yeah, that's the way I write them now - as a standalone code generator, then I go back and enum the result to actually use it. (BTW I also like to generate fairly pretty code, e.g. indented properly, just because it makes it easier to read.)
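The "standalone generator, then mix it in" pattern can be sketched like this (all names invented; the same function serves both roles):

```d
// Hypothetical sketch: one generator function, two ways to use it.
import std.stdio;

string generateEnum(string name, string[] members) {
    string code = "enum " ~ name ~ " {\n"; // pretty-printed output,
    foreach (m; members)                   // indented for readability
        code ~= "    " ~ m ~ ",\n";
    return code ~ "}\n";
}

version (Generator) {
    // Run as an external code generator:
    //   dmd -version=Generator thisfile.d && ./thisfile > generated.d
    void main() { write(generateEnum("Color", ["red", "green", "blue"])); }
} else {
    // Or mix the result in directly at compile time.
    mixin(generateEnum("Color", ["red", "green", "blue"]));
    static assert(cast(int) Color.green == 1);
}
```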
 Yes, Dmitry brought up compiler bugs. But if you write a 
 compile-time UT and it fails, you'll know it wasn't because of 
 your own code because the run-time ones still pass.
Yeah, good point too.
Jun 11 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
12-Jun-2014 03:29, Adam D. Ruppe wrote:
 On Wednesday, 11 June 2014 at 18:03:06 UTC, Atila Neves wrote:
 I wish I'd taken the mic at the end, and 2 days later Adam D. Ruppe
 said what I was thinking of saying: unit test and debug the CTFE
 function at runtime and then use it at compile-time when it's ready
 for production.
Aye. It wasn't long ago that this wasn't really possible because of how incomplete and buggy CTFE was, you kinda had to do it with special code, but now so much of the language works, there's a good chance if it works at runtime it will work at compile time too. I was really surprised with CTFE a few months ago when I tried to use my dom.d with it... and it actually worked. That's amazing to me. But anyway, in general, the ctfe mixin stuff could be replaced with an external code generator, so yeah that's the way I write them now - as a code generator standalone thing then go back and enum it to actually use. (BTW I also like to generate fairly pretty code, e.g. indented properly, just because it makes it easier to read.)
This is one thing I'm losing sleep over - what precisely is so good about CTFE code generation in a _practical_ context (a DSL that is quite stable, not just tiny helpers)?

By the end of the day it's just a choice between writing a trivial line in your favorite build system (NOT make) vs having to wait a couple of minutes each build, hoping the compiler won't hit your system's memory limits. And these couple of minutes are more like 30 minutes at times. Worse yet, unlike a proper build system it doesn't keep track of actual changes (the same regex patterns get recompiled over and over); at this point seamless integration into the language starts feeling like a joke.

And speaking of seamless integration: just generate a symbol name out of the pattern at CTFE to link to later - at least this much can be done relatively fast. And voila, even the clunky run-time generation is not half-bad at integration.

Unless things improve dramatically, CTFE code generation + mixin is just our funny painful toy.

-- 
Dmitry Olshansky
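The "generate a symbol name at CTFE, link to it later" idea could be sketched like this (the naming scheme and helper are invented, not from std.regex):

```d
// Hedged sketch: derive a stable, unique symbol name from a regex
// pattern at compile time. The actual compiled matcher would live in a
// separately built object file and be resolved at link time.
string symbolFor(string pattern) {
    uint h = 2166136261u;            // FNV-1a hash of the pattern
    foreach (char c; pattern) {
        h = (h ^ c) * 16777619u;     // wraps on overflow, which is fine
    }
    import std.conv : to;
    return "ctre_" ~ h.to!string;
}

// The same pattern always maps to the same symbol, so a cache miss
// surfaces as a link error rather than a long CTFE rebuild.
static assert(symbolFor("a+b*") == symbolFor("a+b*"));
static assert(symbolFor("a+b*") != symbolFor("c?"));
```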
Jun 12 2014
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Dmitry Olshansky:

 Unless things improve dramatically CTFE code generation +
An alternative and much faster JITter for LLVM - something like this could make CTFE on LDC2 very quick:

http://llvm.org/devmtg/2014-04/PDFs/LightningTalks/fast-jit-code-generation.pdf

Bye,
bearophile
Jun 12 2014
prev sibling next sibling parent reply "Colin" <grogan.colin gmail.com> writes:
On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:
 12-Jun-2014 03:29, Adam D. Ruppe wrote:
 On Wednesday, 11 June 2014 at 18:03:06 UTC, Atila Neves wrote:
 I wish I'd taken the mic at the end, and 2 days later Adam D. 
 Ruppe
 said what I was thinking of saying: unit test and debug the 
 CTFE
 function at runtime and then use it at compile-time when it's 
 ready
 for production.
Aye. It wasn't long ago that this wasn't really possible because of how incomplete and buggy CTFE was, you kinda had to do it with special code, but now so much of the language works, there's a good chance if it works at runtime it will work at compile time too. I was really surprised with CTFE a few months ago when I tried to use my dom.d with it... and it actually worked. That's amazing to me. But anyway, in general, the ctfe mixin stuff could be replaced with an external code generator, so yeah that's the way I write them now - as a code generator standalone thing then go back and enum it to actually use. (BTW I also like to generate fairly pretty code, e.g. indented properly, just because it makes it easier to read.)
And these couple of minutes are more like 30 minutes at times. Worse yet, unlike a proper build system it doesn't keep track of actual changes (the same regex patterns get recompiled over and over); at this point seamless integration into the language starts feeling like a joke.
Maybe a change to the compiler to write any mixin'd string out to a temporary file (along with some identifier information and the line of code that generated it), and at the next compilation try reading it back from that file iff the line of code that generated it hasn't changed?

Then there'd be no heavy work for the compiler to do, apart from reading that file into a string.
Jun 12 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 12 June 2014 at 10:40:56 UTC, Colin wrote:
 Maybe a change to the compiler to write any mixin'd string out 
 to a temporary file (along with some identifier information and 
 the line of code that generated it) and at the next compilation 
 time try reading it back from that file iff the line of code 
 that generated it hasnt changed?

 Then, there'd be no heavy work for the compiler to do, apart 
 from read that file in to a string.
The compiler can cache the return value of a function that gets called from inside a mixin statement (for a given argument set). As CTFE is implicitly pure (no global state at compile time), the generated code can simply be re-used for the same argument set.

Re-using it between compiler invocations is trickier, because that is only legal if the generator function and all the stuff it indirectly uses have not changed either. Ignoring this requirement can result in nasty build issues that are only fixed by a clean build. Too harmful in my opinion.
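A small illustration of why such caching would be legal (the generator function is invented): CTFE cannot touch run-time globals, so a call is a pure function of its arguments.

```d
// A run-time global. CTFE can neither read nor write it, which is
// exactly what makes per-argument-set caching sound.
int counter;

string gen(string pattern) {
    // ++counter; // uncommenting this makes the CTFE evaluation below
    //            // fail: globals aren't accessible at compile time
    return "matcher_" ~ pattern;
}

enum a = gen("abc"); // forced through CTFE
enum b = gen("abc"); // same argument set - a compiler cache could
                     // simply reuse the first result here
static assert(a == b);
```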
Jun 12 2014
next sibling parent "Colin" <grogan.colin gmail.com> writes:
On Thursday, 12 June 2014 at 12:31:09 UTC, Dicebot wrote:
 On Thursday, 12 June 2014 at 10:40:56 UTC, Colin wrote:
 Maybe a change to the compiler to write any mixin'd string out 
 to a temporary file (along with some identifier information 
 and the line of code that generated it) and at the next 
 compilation time try reading it back from that file iff the 
 line of code that generated it hasnt changed?

 Then, there'd be no heavy work for the compiler to do, apart 
 from read that file in to a string.
The compiler can cache the return value of a function that gets called from inside a mixin statement (for a given argument set). As CTFE is implicitly pure (no global state at compile time), the generated code can simply be re-used for the same argument set. Re-using it between compiler invocations is trickier, because that is only legal if the generator function and all the stuff it indirectly uses have not changed either. Ignoring this requirement can result in nasty build issues that are only fixed by a clean build. Too harmful in my opinion.
Yeah, it's quite dangerous, I agree. I was only thinking of a solution to the problem above, where a ctRegex is recompiled every time whether it was changed or not. I'm sure there's some way of keeping track of all the dependent D modules' filenames and, if any of them in the chain has changed, recalculating the string mixin. The only trouble with that is that there'd be a good chunk of checking for every mixin, which would slow the compiler down in normal use cases.
Jun 12 2014
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 06/12/2014 02:31 PM, Dicebot wrote:
 Compiler can cache return value of function that get called from inside
 mixin statement (for a given argument set). As CTFE is implicitly pure
 (no global state at compile-time) later generated code can be simply
 re-used for same argument set.

 Re-using it between compiler invocations is more tricky because it is
 only legal if generator function and all stuff they indirectly use have
 not changed too. Ignoring this requirement can result in nasty build
 issues that are only fixed by clean build. Too harmful in my opinion.
Clearly, nirvana is continuous compilation, where the compiler performs explicit dependency management at the level of nodes in the syntax tree.
Jun 12 2014
parent "Dicebot" <public dicebot.lv> writes:
On Thursday, 12 June 2014 at 12:49:23 UTC, Timon Gehr wrote:
 On 06/12/2014 02:31 PM, Dicebot wrote:
 Compiler can cache return value of function that get called 
 from inside
 mixin statement (for a given argument set). As CTFE is 
 implicitly pure
 (no global state at compile-time) later generated code can be 
 simply
 re-used for same argument set.

 Re-using it between compiler invocations is more tricky 
 because it is
 only legal if generator function and all stuff they indirectly 
 use have
 not changed too. Ignoring this requirement can result in nasty 
 build
 issues that are only fixed by clean build. Too harmful in my 
 opinion.
Clearly, nirvana is continuous compilation, where the compiler performs explicit dependency management at the level of nodes in the syntax tree.
Yeah, I was wondering if we can merge some of rdmd's functionality into the compiler to speed up rebuilds and do better dependency tracking. But I am not sure it can fit nicely into the current frontend architecture.
Jun 12 2014
prev sibling next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
On 12.06.2014 11:17, Dmitry Olshansky wrote:
 This is one thing I'm losing sleep over - what precisely is so good about
 CTFE code generation in a _practical_ context (a DSL that is quite stable,
 not just tiny helpers)?

 By the end of day it's just about having to write a trivial line in your
 favorite build system (NOT make) vs having to wait for a couple of
 minutes each build hoping the compiler won't hit your system's memory
 limits.

 And these couple of minutes are more like 30 minutes at times. Worse
 yet, unlike a proper build system it doesn't keep track of actual changes
 (the same regex patterns get recompiled over and over); at this point
 seamless integration into the language starts feeling like a joke.

 And speaking of seamless integration: just generate a symbol name out of
 pattern at CTFE to link to later, at least this much can be done
 relatively fast. And voila even the clunky run-time generation is not
 half-bad at integration.

 Unless things improve dramatically CTFE code generation + mixin is just
 our funny painful toy.
you should write a big top post about your CTFE experience/problems - it is important enough
Jun 12 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/12/14, 4:04 AM, dennis luehring wrote:
 you should write a big top post about your CTFE experience/problems - it
 is important enough
yes please
Jun 12 2014
prev sibling next sibling parent Artur Skawina via Digitalmars-d-announce writes:
On 06/12/14 11:17, Dmitry Olshansky via Digitalmars-d-announce wrote:
 This is one thing I'm losing sleep over - what precisely is so good about CTFE code
generation in a _practical_ context (a DSL that is quite stable, not just tiny
helpers)?
Language integration; direct access to meta data (such as types, but also constants).
 By the end of day it's just about having to write a trivial line in your
favorite build system (NOT make) vs having to wait for a couple of minutes each
build hoping the compiler won't hit your system's memory limits.
If it really was only about an extra makefile rule, then CTFE wouldn't make much difference; it would just be an explicitly-requested smarter version of constant folding. But that is not the case.

Simple example: create a function that implements an algorithm which is derived from some type given to it as input. /Derived/ does not mean that it only contains some conditionally executed code that depends on some property of that type; it means that the algorithm itself is determined from the type.

With the external-generator solution you can emit a templated function, but what you can *not* do is emit code based on meta-data or CT introspection - because the necessary data simply isn't available when the external generator runs. With CTFE you have direct access to all the data, and generating the code becomes almost trivial. It makes a night-and-day type of difference.

While you could implement a sufficiently-smart-generator that could handle some subset of the functionality of CTFE, it would be prohibitively expensive to do so, wouldn't scale, and would often be pointless if you had to resort to generating code containing mixin expressions anyway. There's a reason why this isn't done in other languages that don't have CTFE.
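The point above can be made concrete with a small sketch (the function and type names are invented): the shape of the generated code is derived from the type's own field list, which an external generator running before compilation cannot see.

```d
import std.conv : to;

// The algorithm is *determined by* T: one concatenation per field,
// discovered via compile-time introspection of the type itself.
string toCsvRow(T)(T value) {
    string row;
    foreach (i, memberName; __traits(allMembers, T)) {
        if (i > 0) row ~= ",";
        row ~= __traits(getMember, value, memberName).to!string;
    }
    return row;
}

struct Point { int x; int y; }

unittest {
    // works at run time...
    assert(toCsvRow(Point(1, 2)) == "1,2");
}
// ...and through CTFE, with zero external tooling:
static assert(toCsvRow(Point(1, 2)) == "1,2");
```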
 Unless things improve dramatically CTFE code generation + mixin is just our
funny painful toy.
The code snippets posted here are of course just toy programs. That does not mean CTFE and mixins are merely toys - they enable writing code in ways that just aren't practically possible in other languages. The fact that there isn't much such code publicly available is just a function of D's microscopic user base.

Real Programmers write mixins that write mixins.

artur
Jun 12 2014
prev sibling next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:
 This is one thing I'm losing sleep over - what precisely is so 
 good about CTFE code generation in a _practical_ context (a DSL 
 that is quite stable, not just tiny helpers)?

 By the end of day it's just about having to write a trivial 
 line in your favorite build system (NOT make) vs having to wait 
 for a couple of minutes each build hoping the compiler won't 
 hit your system's memory limits.
Oh, this is a very good question :) There are two unrelated concerns here:

1) Reflection. It is less of an issue for pure DSL solutions, because those don't provide any good reflection capabilities anyway, but other code generation approaches have very similar problems. By doing all code generation in a separate build step you potentially lose many of the guarantees that keep the various parts of your application in sync.

2) Moving forward. You use the traditional reasoning of a DSL generally being something rare and normally stable. This fits most common DSL usage, but the tight in-language integration D makes possible brings new opportunities for using DSLs and code generation casually all over your program. I totally expect programming culture to evolve to the point where something like 90% of all application code is generated in a typical project. D has a good base for promoting such a paradigm switch, and reducing any unnecessary mental context switches is very important here.

This was pretty much the point I was trying to make with my DConf talk (and have probably failed :) )
 And these couple of minutes are more like 30 minutes at a 
 times. Worse yet unlike proper build system it doesn't keep 
 track of actual changes (same regex patterns get recompiled 
 over and over), at this point seamless integration into the 
 language starts felling like a joke.

 And speaking of seamless integration: just generate a symbol 
 name out of pattern at CTFE to link to later, at least this 
 much can be done relatively fast. And voila even the clunky 
 run-time generation is not half-bad at integration.

 Unless things improve dramatically CTFE code generation + mixin 
 is just our funny painful toy.
Unfortunately, the current implementation of the frontend falls a long way behind the language's capabilities. There are no fundamental reasons why it can't work with a better compiler. In fact, deadalnix has made a very good case for SDC taking over as the next D frontend exactly because of things like a CTFE JIT.
Jun 12 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
12-Jun-2014 16:25, Dicebot wrote:
 On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:
 This one thing I'm loosing sleep over - what precisely is so good in
 CTFE code generation in _practical_ context (DSL that is quite stable,
 not just tiny helpers)?

 By the end of day it's just about having to write a trivial line in
 your favorite build system (NOT make) vs having to wait for a couple
 of minutes each build hoping the compiler won't hit your system's
 memory limits.
Oh, this is a very good question :) There are two unrelated concerns here:
It's always nice to ask something on the D NG - so many good answers I can hardly choose whom to reply to ;) So this is a kind of broadcast.

Yes, the answer seems spot on - reflection! But allow me to retort.

I'm not talking about a completely stand-alone generator. Just as well, the generator tool could be written in D using the same exact sources as your D program does, including the static introspection and type-awareness. Then the generator itself is a library + "an invocation script" in D.

The Q is specifically about CTFE in this scenario, including not only the obvious shortcomings of the design but the fundamental ones of compilation inside of compilation. Unlike proper compilation, it has nothing persistent to back it up. It feels backwards - a bit like C++ TMP but, of course, much-much better.
 1)

 Reflection. It is less of an issue for pure DSL solutions because those
 don't provide any good reflection capabilities anyway, but other code
 generation approaches have very similar problems.

 By doing all code generation in separate build step you potentially lose
 many of guarantees of keeping various parts of your application in sync.
Use the same sources for the generator. In essence all is the same, just relying on separate runs and linkage instead of mixin. The necessary "hooks" to link to later could indeed be generated with a tiny bit of CTFE. Yes, deeply embedded stuff might not be that easy. The scope and damage are smaller though.
 2)

 Moving forward. You use traditional reasoning of DSL generally being
 something rare and normally stable. This fits most common DSL usage but
 tight in-language integration D makes possible brings new opportunities
 of using DSL and code generation casually all other your program.
Well, I'm biased by heavy-handed ones. Say I have a (no longer) secret plan of doing a next-gen parser generator in D. Needless to say, that means swaths of non-trivial code generation. I'm all for embedding nicely, but I see very little _practical_ gain in CTFE+mixin here EVEN if CTFE didn't suck. See the point above about using the same metadata and types as the user application would.
 I totally expect programming culture to evolve to the point where
 something like 90% of all application code is being generated in typical
 project. D has good base for promoting such paradigm switch and reducing
 any unnecessary mental context switches is very important here.

 This was pretty much the point I was trying to make with my DConf talk (
 and have probably failed :) )
I liked the talk, but you know ... by the 4th or 5th talk with CTFE/mixin I think I might have been distracted :)

More specifically, this bright future of 90%+ concise DSL-driven programs is undermined by a simple truth - no amount of improvement in CTFE would make generators run faster than an optimized standalone tool invocation. The tool (a library written in D) may read D metadata just fine. I heard D build times are an important part of its adoption, so...
 And these couple of minutes are more like 30 minutes at a times. Worse
 yet unlike proper build system it doesn't keep track of actual changes
 (same regex patterns get recompiled over and over), at this point
 seamless integration into the language starts felling like a joke.

 And speaking of seamless integration: just generate a symbol name out
 of pattern at CTFE to link to later, at least this much can be done
 relatively fast. And voila even the clunky run-time generation is not
 half-bad at integration.

 Unless things improve dramatically CTFE code generation + mixin is
 just our funny painful toy.
Unfortunately current implementation of frontend falls behind language capabilities a lot. There are no fundamental reasons why it can't work with better compiler.
It might solve most of the _current_ problems, but I foresee fundamental issues with "no global state" in CTFE that, say 10 years from now, would look a lot like `#include` in C++. A major one: there is no way for the compiler not to recompile generated code, as it has no knowledge of how it might have changed from the previous run.
 In fact, deadlnix has made a very good case for
 SDC taking over as next D frontend exactly because of things like CTFE JIT.
Yeah, we ought to help him! -- Dmitry Olshansky
Jun 12 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 12 June 2014 at 16:42:38 UTC, Dmitry Olshansky wrote:
 It's always nice to ask something on D NG, so many good answers 
 I can hardly choose whom to reply ;) So this is kind of 
 broadcast.

 Yes, the answer seems spot on - reflection! But allow me to 
 retort.

 I'm not talking about completely stand-alone generator. Just as 
 well generator tool could be written in D using the same exact 
 sources as your D program does. Including the static 
 introspection and type-awareness. Then generator itself is a 
 library + "an invocation script" in D.

 The Q is specifically of CTFE in this scenario, including not 
 only obvious shortcomings of design, but fundamental ones of 
 compilation inside of compilation. Unlike proper compilation is 
 has nothing persistent to back it up. It feels backwards, a bit 
 like C++ TMP but, of course, much-much better.

 1)

 Reflection. It is less of an issue for pure DSL solutions 
 because those
 don't provide any good reflection capabilities anyway, but 
 other code
 generation approaches have very similar problems.

 By doing all code generation in separate build step you 
 potentially lose
 many of guarantees of keeping various parts of your 
 application in sync.
Use the same sources for the generator. In essence all is the same, just relying on separate runs and linkage, not mixin. Necessary "hooks" to link to later could indeed be generated with a tiny bit of CTFE. Yes, deeply embedded stuff might not be that easy. The scope and damage is smaller though.
 2)

 Moving forward. You use traditional reasoning of DSL generally 
 being
 something rare and normally stable. This fits most common DSL 
 usage but
 tight in-language integration D makes possible brings new 
 opportunities
 of using DSL and code generation casually all other your 
 program.
Well, I'm biased by heavy-handed ones. Say I have a (no longer) secret plan of doing a next-gen parser generator in D. Needless to say, that means swaths of non-trivial code generation. I'm all for embedding nicely, but I see very little _practical_ gain in CTFE+mixin here EVEN if CTFE didn't suck. See the point above about using the same metadata and types as the user application would.
Consider something like the REST API generator I described during DConf. Different code is generated in different contexts from the same declarative description - both for the server and the client. Right now the simple fact that you import the very same module from both gives a solid 100% guarantee that API usage between those two programs stays in sync.

In your proposed scenario there will be two different generated files, imported by the server and the client respectively. A tiny typo in writing your build script will result in a hard-to-detect run-time bug while the code itself still happily compiles. You may keep the convenience, but losing the guarantees hurts a lot. To be able to verify the static correctness of your program / group of programs, the type system needs to be aware of how the generated code relates to the original source.

Also, this approach does not scale. I can totally imagine you doing it for two or three DSLs in a single program, probably even a dozen. But something like 100+? A huge mess to maintain. In my experience all build systems are incredibly fragile beasts; trusting them with something that impacts program correctness and won't be detected at compile time is just too dangerous.
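The sync guarantee being described could be sketched like this (all names invented; in a real project the two mixins would live in separate server and client programs importing one shared module):

```d
// One declarative description - the single source of truth.
enum apiMethods = ["getUser", "putUser"];

// Server side: generate dispatch stubs from the description.
string genServer() {
    string code;
    foreach (m; apiMethods)
        code ~= "void handle_" ~ m ~ "() { /* dispatch request */ }\n";
    return code;
}

// Client side: generate proxy stubs from the *same* description.
string genClient() {
    string code;
    foreach (m; apiMethods)
        code ~= "void call_" ~ m ~ "() { /* send request */ }\n";
    return code;
}

mixin(genServer()); // in reality: mixed into the server program
mixin(genClient()); // in reality: mixed into the client program

// The two sides cannot drift apart - both expansions read apiMethods,
// and any mismatch would be a compile error, not a run-time bug:
static assert(is(typeof(handle_getUser)) && is(typeof(call_putUser)));
```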
 I totally expect programming culture to evolve to the point 
 where
 something like 90% of all application code is being generated 
 in typical
 project. D has good base for promoting such paradigm switch 
 and reducing
 any unnecessary mental context switches is very important here.

 This was pretty much the point I was trying to make with my 
 DConf talk (
 and have probably failed :) )
I liked the talk, but you know ... by the 4th or 5th talk with CTFE/mixin I think I might have been distracted :) More specifically, this bright future of 90%+ concise DSL-driven programs is undermined by a simple truth - no amount of improvement in CTFE would make generators run faster than an optimized standalone tool invocation. The tool (a library written in D) may read D metadata just fine. I heard D build times are an important part of its adoption, so...
Adoption - yes. Production usage - less so (though still important). The difference between 1 second and 5 seconds is very important. Between 10 seconds and 1 minute - not so much. A JIT will probably be slower than stand-alone generators, but not that much slower.
 It might solve most of _current_ problems, but I foresee 
 fundamental issues of "no global state" in CTFE that in say 10 
 years from now would look a lot like `#include` in C++.
I hope that 10 years from now we will consider having global state in RTFE a stone-age relic :P
 A major one is there is no way for compiler to not recompile 
 generated code as it has no knowledge of how it might have 
 changed from the previous run.
Why can't we merge basic build-system functionality akin to rdmd into the compiler itself? It makes perfect sense to me, as the build process can benefit a lot from being semantically aware.
Jun 14 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/14/14, 8:05 AM, Dicebot wrote:
 Adoption - yes. Production usage - less so (though still important).
 Difference between 1 second and 5 seconds is very important. Between 10
 seconds and 1 minute - not so much.
Wait, what? -- Andrei
Jun 14 2014
parent "Dicebot" <public dicebot.lv> writes:
On Saturday, 14 June 2014 at 15:25:11 UTC, Andrei Alexandrescu 
wrote:
 On 6/14/14, 8:05 AM, Dicebot wrote:
 Adoption - yes. Production usage - less so (though still 
 important).
 Difference between 1 second and 5 seconds is very important. 
 Between 10
 seconds and 1 minute - not so much.
Wait, what? -- Andrei
If the build time becomes long enough that it forces you to switch mental context, it matters less how long it takes - you are much more likely to do something else and return to it later. Of course it can also get to the famous C++ hours of build time, which is the next level of inconvenience :) But a reasonably big and complicated project won't build in 5 seconds anyway (even with a perfect compiler), so eventually pure build time becomes less of a selling point. Still important, but not _that_ important.
Jun 14 2014
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
14-Jun-2014 19:05, Dicebot wrote:
 On Thursday, 12 June 2014 at 16:42:38 UTC, Dmitry Olshansky wrote:
[snip]
 Well, I'm biased by heavy-handed ones. Say I have a (no longer) secret
 plan of doing a next-gen parser generator in D. Needless to say swaths
 of non-trivial code generation. I'm all for embedding nicely but I see
 very little _practical_ gains in CTFE+mixin here EVEN if CTFE wouldn't
 suck. See the point above about using the same metadata and types as
 the user application would.
Consider something like the REST API generator I described during DConf. Different code is generated in different contexts from the same declarative description - both for server and client. Right now the simple fact that you import the very same module from both gives a solid 100% guarantee that API usage between those two programs stays in sync.
But let's face it - it's a one-time job to get it right in your favorite build tool. Then you have a fast and cached (re)build. By comparison, the costs of CTFE generation are paid in full during _each_ build.
 In your proposed scenario there will be two different generated files
 imported by server and client respectively. Tiny typo in writing your
 build script will result in hard to detect run-time bug while code
 itself still happily compiles.
Or a link error if we go a hybrid path where the imported module is emitting declarations/hooks via CTFE to be linked to by the proper generated code. This is something I'm thinking could be a practical solution. I.e. currently, to get around wasting cycles again and again:

module a;
bool verify(string s){
    static re = ctRegex!"....";
    return match(s, re);
}

// module b;
import a;
void foo(){ ... verify("blah"); ... }

vs the would-be hybrid approach:

module gen_re;
void main() //or wrap it in a tiny template mixin
{
    generateCtRegex(
        //all patterns
    );
}

module b;
import std.regex; //notice no import of a
void foo(){
    ...
    static re = ctRegex!(...); // ...
}

using ctRegex as usual in b, but any miss of the compiled cache would lead to a link error. In fact it might be the best of both worlds if there is a switch to try full CTFE vs the link-time external option.
 You may keep convenience but losing guarantees hurts a lot. To be able
 to verify static correctness of your program / group of programs type
 system needs to be aware how generated code relates to original source.
The build system does it. We have this problem with all external deps anyway (i.e. who verifies that the right version of libXYZ is linked and not some other?)
 Also this approach does not scale. I can totally imagine you doing it
 for two or three DSL in single program, probably even dozen. But
 something like 100+?
Not everything is suitable, of course. Some stuff is good only inline and on the spot. But it does use the same sources; it may look a lot like this in the case of REST generators:

import everything;
void main(){
    foreach(m; module){
        //... generate client code from meta-data
    }
}

Waiting for 100+ DSLs compiled by a JIT interpreter that can't optimize a thing (pretty much by definition - or use separate flags for that?) is not going to be fun either.
 Huge mess to maintain. According to my experience
 all builds systems are incredibly fragile beasts, trusting them
 something that impacts program correctness and won't be detected at
 compile time is just too dangerous.
Could be, but we have dub, which should be simple and nice. I had a very positive experience with scons and half-generated sources.
 I heard D builds times are important part of its adoption so...
Adoption - yes. Production usage - less so (though still important). The difference between 1 second and 5 seconds is very important. Between 10 seconds and 1 minute - not so much. JIT will probably be slower than stand-alone generators, but not that much slower.
 It might solve most of _current_ problems, but I foresee fundamental
 issues of "no global state" in CTFE that in say 10 years from now
 would look a lot like `#include` in C++.
I hope that 10 years from now we will consider having global state in RTFE a stone-age relict :P
Well, no amount of purity dismisses the point that a cache is a cache. When I say global in D I mean thread/fiber local.
 A major one is there is no way for compiler to not recompile generated
 code as it has no knowledge of how it might have changed from the
 previous run.
Why can't we merge basic build-system functionality akin to rdmd into the compiler itself? It makes perfect sense to me, as the build process can benefit a lot from being semantically aware.
I wouldn't cross my fingers, but yes - ideally it would need the powers of a build system, making it that much more complicated. Then it could cache results, including template instantiations, across modules and across separate invocations of the tool. It's a distant dream though. The currently available caching at the level of object files is very coarse-grained and not really helpful to the problem at hand.

-- 
Dmitry Olshansky
Jun 14 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Saturday, 14 June 2014 at 16:34:35 UTC, Dmitry Olshansky wrote:
 Consider something like REST API generator I have described 
 during
 DConf. There is different code generated in different contexts 
 from same
 declarative description - both for server and client. Right 
 now simple
 fact that you import very same module from both gives solid 
 100%
 guarantee that API usage between those two programs stays in 
 sync.
But let's face it - it's a one-time job to get it right in your favorite build tool. Then you have fast and cached (re)build. Comparatively costs of CTFE generation are paid in full during _each_ build.
There is no such thing as a one-time job in programming, unless you work alone and abandon any long-term maintenance. As time goes by, any mistake that can possibly happen will inevitably happen.
 In your proposed scenario there will be two different 
 generated files
 imported by server and client respectively. Tiny typo in 
 writing your
 build script will result in hard to detect run-time bug while 
 code
 itself still happily compiles.
Or a link error if we go a hybrid path where the imported module is emitting declarations/hooks via CTFE to be linked to by the proper generated code. This is something I'm thinking that could be a practical solution. <snip>
What is the benefit of this approach over simply keeping all ctRegex bodies in a separate package, compiling it as a static library, and referring to it from the actual app by its own unique symbol? This is something that does not need any changes in the compiler or Phobos - it is just a matter of project layout. It does not work for more complicated cases where you actually need access to the generated sources (generated templates, for example).
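For concreteness, the layout being suggested might look roughly like this (a sketch; the module names and the pattern are made up). Only the library module ever instantiates ctRegex, so its CTFE cost is paid once per library build:

```d
// regexes.d - built once into a static library
module regexes;
import std.regex;

auto emailRe()
{
    // the only ctRegex instantiation in the whole project
    static re = ctRegex!(`\w+@\w+\.\w+`);
    return re;
}

// app.d - rebuilt frequently; links against the library and never
// re-runs the ctRegex instantiation itself:
//
//     import regexes, std.regex;
//     bool isEmail(string s) { return !matchFirst(s, emailRe).empty; }
```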
 You may keep convenience but losing guarantees hurts a lot. To 
 be able
 to verify static correctness of your program / group of 
 programs type
 system needs to be aware how generated code relates to 
 original source.
Build system does it. We have this problem with all of external deps anyway (i.e. who verifies the right version of libXYZ is linked not some other?)
It is somewhat worse because you don't routinely change external libraries, as opposed to local sources.
 Huge mess to maintain. According to my experience
 all builds systems are incredibly fragile beasts, trusting them
 something that impacts program correctness and won't be 
 detected at
 compile time is just too dangerous.
Could be, but we have dub which should be simple and nice. I had very positive experience with scons and half-generated sources.
dub is terrible at defining any complicated build models. Pretty much anything that is not a single-step compile-them-all approach can only be done by calling an external shell script. If using external generators is necessary, I will take make over anything else :)
 <snip>
tl;dr: I believe that we should improve compiler technology to achieve the same results, instead of promoting temporary hacks as the true way to do things. Relying on the build system is likely the most practical solution today, but it is not a solution I am satisfied with, and hardly one I can accept as the accomplished target. An imaginary compiler that continuously runs as a daemon/service, is capable of JIT-ing, and provides basic dependency tracking as part of the compilation step should behave as well as any external solution, with much better correctness guarantees and overall user experience out of the box.
Jun 15 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
15-Jun-2014 20:21, Dicebot пишет:
 On Saturday, 14 June 2014 at 16:34:35 UTC, Dmitry Olshansky wrote:
 But let's face it - it's a one-time job to get it right in your
 favorite build tool. Then you have fast and cached (re)build.
 Comparatively costs of CTFE generation are paid in full during _each_
 build.
There is no such thing as one-time job in programming unless you work alone and abandon any long-term maintenance. As time goes any mistake that can possibly happen will inevitably happen.
The frequency of such an event is orders of magnitude smaller. Let's not take arguments to the extreme - then doing anything is futile due to the potential for mistakes it introduces sooner or later.
 In your proposed scenario there will be two different generated files
 imported by server and client respectively. Tiny typo in writing your
 build script will result in hard to detect run-time bug while code
 itself still happily compiles.
Or a link error if we go a hybrid path where the imported module is emitting declarations/hooks via CTFE to be linked to by the proper generated code. This is something I'm thinking that could be a practical solution. <snip>
What is the benefit of this approach over simply keeping all ctRegex bodies in a separate package, compiling it as a static library, and referring to it from the actual app by its own unique symbol? This is something that does not need any changes in the compiler or Phobos - it is just a matter of project layout.
Automation. Dumping the body of a ctRegex is manual work after all, including putting it under the right symbol. In the proposed scheme it's just a matter of copy-pasting a pattern after the initial setup has been done.
 It does not work for more complicated cases where you actually need
 access to the generated sources (generated templates, for example).
Indeed, this is a limitation, and the import of generated source would be required.
 You may keep convenience but losing guarantees hurts a lot. To be able
 to verify static correctness of your program / group of programs type
 system needs to be aware how generated code relates to original source.
Build system does it. We have this problem with all of external deps anyway (i.e. who verifies the right version of libXYZ is linked not some other?)
It is somewhat worse because you don't routinely change external libraries, as opposed to local sources.
But surely we have libraries that are built as separate projects and are "external" dependencies, right? There is nothing new here, except that "D file --> obj --> lib file" is changed to "generator --> generated D file --> obj file".
 Huge mess to maintain. According to my experience
 all builds systems are incredibly fragile beasts, trusting them
 something that impacts program correctness and won't be detected at
 compile time is just too dangerous.
Could be, but we have dub which should be simple and nice. I had very positive experience with scons and half-generated sources.
dub is terrible at defining any complicated build models. Pretty much anything that is not single step compile-them-all approach can only be done via calling external shell script.
I'm not going to like dub then ;)
 If using external generators is
 necessary I will take make over anything else :)
Then I understand your point about inevitable mistakes, it's all in the tool.
 <snip>
tl; dr: I believe that we should improve compiler technology to achieve same results instead of promoting temporary hacks as the true way to do things. Relying on build system is likely to be most practical solution today but it is not solution I am satisfied with and hardly one I can accept as accomplished target. Imaginary compiler that continuously runs as daemon/service, is capable of JIT-ing and provides basic dependency tracking as part of compilation step should behave as good as any external solution with much better correctness guarantees and overall user experience out of the box.
What I want to point out is to not mistake goals and the means to an end. No matter what we call it, CTFE code generation is just a means to an end, with serious limitations (especially as it stands today, in the real world). Seamless integration is not about packing everything into a single compiler invocation:

dmd src/*.d

Generation is generation; as long as it's fast and automatic, it solves the problem(s) metaprogramming was established to solve. For instance, if the D compiler allowed external tools as plugins (just an example to show the means-vs-ends distinction) with some form of the following construct:

mixin(call_external_tool("args", 3, 14, 15, .92));

it would make any generation totally practical *today*. This was proposed before, and dismissed out of fear of security risks, without ever identifying the proper set of restrictions. After all, we have textual mixins of potential security risk - no problem. Let's focus on the facts that this has the benefits of:

- sane debugging of the plug-in (it's just a program with the usual symbols)
- fast, as the tool could be built with full optimization flags or run as a service
- trivially able to cache things across builds and even per each AST node
- easy to implement (as in next release)
- may include things inexpressible in CTFE, like calling into external systems and vendor-specific tools

That would for instance give us the ability to have practical C-->D transparent header inclusion, say:

extern mixin(htod("some_header.h"));

How long till the C preprocessor is working at CTFE? How long till it's practical to do:

mixin(htod(import("some_header.h")));

and have it done optimally fast at CTFE? My answer is: no amount of JITing CTFE and compiler architecture improvements in the foreseeable future will get it better than standalone tool(s), due to the mentioned _fundamental_ limitations. There are real practical boundaries on where an internal interpreter can stay competitive.

-- 
Dmitry Olshansky
Jun 15 2014
parent "Dicebot" <public dicebot.lv> writes:
On Sunday, 15 June 2014 at 21:38:18 UTC, Dmitry Olshansky wrote:
 15-Jun-2014 20:21, Dicebot пишет:
 On Saturday, 14 June 2014 at 16:34:35 UTC, Dmitry Olshansky 
 wrote:
 But let's face it - it's a one-time job to get it right in 
 your
 favorite build tool. Then you have fast and cached (re)build.
 Comparatively costs of CTFE generation are paid in full 
 during _each_
 build.
There is no such thing as one-time job in programming unless you work alone and abandon any long-term maintenance. As time goes any mistake that can possibly happen will inevitably happen.
The frequency of such event is orders of magnitude smaller. Let's not take arguments to supreme as then doing anything is futile due to the potential of mistake it introduces sooner or later.
It is more likely to happen if you change your build scripts more often - and this is exactly what you propose. I am not going to say it is impractical, just mentioning the flaws that make me seek a better solution.
 Automation. Dumping the body of ctRegex is manual work after 
 all, including putting it with the right symbol. In proposed 
 scheme it's just a matter of copy-pasting a pattern after 
 initial setup has been done.
I think defining regexes in a separate module is even less effort than adding a few lines to the build script ;)
 It is somewhat worse because you don't routinely change 
 external
 libraries, as opposed to local sources.
But surely we have libraries that are built as separate project and are "external" dependencies, right? There is nothing new here except that "d-->obj-->lib file" is changed to "generator-->generated D file--->obj file".
Ok, I am probably convinced on this one. Incidentally, I do always prefer full-source builds over splitting out static libraries inside the application itself. When there is enough RAM for dmd, of course :)
 Huge mess to maintain. According to my experience
dub is terrible at defining any complicated build models. Pretty much anything that is not single step compile-them-all approach can only be done via calling external shell script.
I'm not going to like dub then ;)
It is primarily a source dependency manager, not a build tool. I remember Sonke mentioning it is intentionally kept simplistic, to guarantee that no platform-unique features are ever needed. For anything complicated I'd probably wrap the dub call inside a makefile that prepares all the necessary extra files.
 If using external generators is
 necessary I will take make over anything else :)
Then I understand your point about inevitable mistakes, it's all in the tool.
make is actually pretty good if you don't care about platforms other than Linux. Well, apart from the stupid whitespace sensitivity. But it is incredibly good at defining build systems with chained dependencies.
 What I want to point out is to not mistake goals and the means 
 to an end. No matter how we call it CTFE code generation is 
 just a means to an end, with serious limitations (especially as 
 it stands today, in the real world).
I agree. What I do disagree about is the definition of the goal. It is not just "generating code", it is "generating code in a manner understood by the compiler".
 For instance if D compiler allowed external tools as plugins 
 (just an example to show means vs ends distinction) with some 
 form of the following construct:

 mixin(call_external_tool("args", 3, 14, 15, .92));

 it would make any generation totally practical *today*.
But this is exactly the case where language integration gives you nothing over a build-system solution :) If the compiler itself is not aware of how code gets generated from the arguments, there is no real advantage in putting the tool invocation inline.
 How long till C preprocessor is working at CTFE? How long till 
 it's practical to do:

 mixin(htod(import("some_header.h")));

 and have it done optimally fast at CTFE?
Never, but it is not really about being fast or convenient. For htod you don't want just C grammar / preprocessor support - you want it as good as in real C compilers.
 My answer is - no amount of JITing CTFE and compiler 
 architecture improvements in foreseeable future will get it 
 better then standalone tool(s), due to the mentioned 
 _fundamental_ limitations.

 There are real practical boundaries on where an internal 
 interpreter can stay competitive.
I don't see any fundamental practical boundaries - quality-of-implementation ones, sure. Quite the contrary: I totally see how a better compiler can easily outperform any external tools for most build tasks, despite somewhat worse JIT codegen - it has the huge advantage of being able to work on the language's semantic entities and not just files. That allows much smarter caching and dependency tracking, something external tools will never be able to achieve.
Jun 16 2014
prev sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:
 This one thing I'm losing sleep over - what precisely is so 
 good in CTFE code generation in a _practical_ context (DSL that 
 is quite stable, not just tiny helpers)?
I've asked this same question before, and my answer is mostly the same as Dicebot's: I think reflection is the important bit. Of course, even there it is sometimes useful to break it into two steps (one just prints the data out, kind of like dmd -X, then a regular program reads that and generates the code), but I find it really useful to read D code and generate stuff based on it.
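A tiny sketch of that read-D-and-generate style (the struct and the generated dump() function are invented for illustration): compile-time reflection walks the fields of a type, and a mixin injects code built from them.

```d
import std.conv : to;

struct Point { int x; int y; }

// Inspect a struct's fields at compile time and build the source of
// a dump() function for it.
string makeDump(T)()
{
    string code = "string dump(" ~ T.stringof ~ " v) { return \"\"";
    foreach (name; __traits(allMembers, T))
        code ~= ` ~ "` ~ name ~ `=" ~ v.` ~ name ~ `.to!string ~ " "`;
    return code ~ "; }";
}

mixin(makeDump!Point()); // injects: string dump(Point v) { ... }

unittest
{
    assert(dump(Point(1, 2)) == "x=1 y=2 ");
}
```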
 By the end of day it's just about having to write a trivial 
 line in your favorite build system (NOT make)
it is actually pretty trivial in make too...
Jun 12 2014
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11-Jun-2014 22:03, Atila Neves пишет:
 On Tuesday, 10 June 2014 at 19:36:57 UTC, bearophile wrote:
 At about 40.42 in the "Thoughts on static regex" it is written:
 "even compile-time printf would be awesome". There is a patch for
 __ctWrite on GitHub; it should be fixed and merged.

 Bye,
 bearophile
I wish I'd taken the mic at the end; 2 days later Adam D. Ruppe said what I was thinking of saying: unit test and debug the CTFE function at runtime, then use it at compile time when it's ready for production.
Yes, that's a starting point - a function working at R-T.
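A minimal example of that workflow (parseInt here is illustrative, not from the talk): write the function as ordinary run-time D, debug it with a unittest, then evaluate the very same code under CTFE.

```d
int parseInt(string s)
{
    int v = 0;
    foreach (c; s)
    {
        assert(c >= '0' && c <= '9');
        v = v * 10 + (c - '0');
    }
    return v;
}

unittest
{
    assert(parseInt("123") == 123); // exercised and debugged at run time
}

enum answer = parseInt("456");      // same function, now run via CTFE
static assert(answer == 456);
```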
 Yes, Dmitry brought up compiler bugs. But if you write a compile-time UT
 and it fails, you'll know it wasn't because of your own code because the
 run-time ones still pass.
It doesn't help that it's not your fault :) And with a bit of __ctfe's to work around compiler bugs, you won't be so sure of your code anymore.
 Maybe there's still a place for something more than pragma msg, but I'd
 definitely advocate for the above at least in the beginning. If
 anything, easier ways to write compile-time UTs would be, to me,
 preferable to a compile-time printf.
There is a nice assertCTFEable written by Kenji in Phobos. I think it's our private magic for now, but I see no reason not to expose it somewhere.
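From memory (so treat this as a sketch rather than the exact Phobos source), the helper is essentially a one-liner: force the test block through CTFE via static assert, then run it again normally.

```d
// Run the same checks both at compile time and at run time.
void assertCTFEable(alias dg)()
{
    static assert({ dg(); return true; }()); // compile-time pass
    dg();                                    // run-time pass
}

unittest
{
    assertCTFEable!({
        int x = 21;
        assert(x * 2 == 42);
    });
}
```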
 Atila
-- Dmitry Olshansky
Jun 12 2014
parent "Atila Neves" <atila.neves gmail.com> writes:
On Thursday, 12 June 2014 at 08:42:49 UTC, Dmitry Olshansky wrote:
 11-Jun-2014 22:03, Atila Neves пишет:
 On Tuesday, 10 June 2014 at 19:36:57 UTC, bearophile wrote:
 At about 40.42 in the "Thoughts on static regex" it is written:
 "even compile-time printf would be awesome". There is a patch for
 __ctWrite on GitHub; it should be fixed and merged.

 Bye,
 bearophile
I wish I'd taken the mic at the end, and 2 days later Adam D. Ruppe said what I was thinking of saying: unit test and debug the CTFE function at runtime and then use it at compile-time when it's ready for production.
Yes, that's a starting point - a function working at R-T.
 Yes, Dmitry brought up compiler bugs. But if you write a 
 compile-time UT
 and it fails, you'll know it wasn't because of your own code 
 because the
 run-time ones still pass.
It doesn't help that it's not your fault :) And with a bit of __ctfe's to workaround compiler bugs you won't be so sure of your code anymore.
 Maybe there's still a place for something more than pragma 
 msg, but I'd
 definitely advocate for the above at least in the beginning. If
 anything, easier ways to write compile-time UTs would be, to 
 me,
 preferable to a compile-time printf.
There is nice assertCTFEable written by Kenji in Phobos. I think it's our private magic for now but I see no reason not to expose it somewhere.
 Atila
It helps; you won't lose time looking at your code and wondering. I thought of the __ctfe problem though: that would mean different code paths, and what I said wouldn't be valid anymore.

Atila
Jun 12 2014