www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Executing pure D at compile-time

reply kris <foo bar.com> writes:
Following on from the "Regex Redux thread, it seems to me there's an 
easy way to execute pure D at compile-time. A few elements are needed:

1) the ability to describe a compile-time function call
2) the facility to pass arguments to it, and recieve a return value
3) a means of identifiying the D code to execute
4) a manner in which the pure D is executed
5) a mechanism for ensuring the executed code is docile

What follows is purely an illustration, since there are a number of ways 
to achieve the same result:

1) Making the call. let's assume standard calling syntax is enabled. 
Perhaps something like this (at the call site):

#  char[] result = regex ("[0-9]", "abc123");

That may not be an entirely practical syntax, but it hopefully gets the 
idea across?


2) Argument passing. The D source file is text, so a simple 
implementation might pass text-arguments to the compile-time function. 
Given that a mixin() is text-based also, it might make sense for the 
function to return text too e.g.

# char[] regex (char[] pattern, char[] string) {}

Note that this function is composed of nothing but standard D code. The 
args are represented by strings for the sake of simplicity; but it could 
also be something more sophisticated.


3) Function identity. The compiler would need to distinguish between 
compile-time functions and all other code (so that it knows what is 
what). One way to do this is to introduce a variation upon public / 
private / package, called 'extension':

# extension char[] regex (char[] pattern, char[] string) {}

With the 'extension' keyword, the compiler can identify 'regex' as a 
compile-time function. Thus, the regex call noted earlier would be 
evaluated as a /compile-time invocation/ of the regex function; as 
opposed to a runtime call. Note that these 'extension' functions would 
be omitted from the target binary: they are for compile-time use only.


4) Execution. One could write a D interpreter and embed it in the 
compiler, but that's perhaps a bit impractical. Instead, why not simply 
recurse the compiler (or spawn a child instance) to generate a seperate 
binary instance of the compile-time function?

Under Win32, for example, the binary could be generated as a .exe file, 
and be passed arguments as normal. The return of the function could be 
captured via an stdout pipe. Better, a dll could be generated instead, 
and be dynamically bound to the executing compiler. The latter has 
several benefits, the most obvious being raw throughput.


5) The problem with enabling pure D at compile-time is a catch-22. You 
want the expressive power and raw execution speed, but you want to 
ensure it doesn't do anything bad. This is a problem regardless of how 
#4 is implemented. However, I rather suspect the OS will provide the 
answer for such concerns? I mean, doesn't Vista (for example) provide an 
execution 'sandbox' where the target is not permitted to create any 
handles? Without handles, there's no file, socket or registry access. 
That's a nice sandbox.


How about an illustrative example? The regex discussed previously?

========
module main;

import regex;

void main()
{
     // result is generated at compile-time ...
     char[] result = regex ("[0-9]", "abc123");
     writefln (result);
}

-----

module regex;

import std.regexp;

extension char[] regex (char[] pattern, char[] string)
{
      auto exp = new RegExp (args[0]);
      return exp.find (args[1]);
}

=========

Note that import operates here exactly as it does today. As does 
everything else. The distinction is the introduction of an 'extension' 
keyword, and the mechanism to invoke the described function at 
compile-time from a call-site (rather than generating a runtime call).

All told, the various posts on compile-time functionality are really all 
about compiler extensions. The degree of extension is just different 
across posts. Supporting a pure D approach is certainly better than 
inventing another language inside D itself; is it not?

Taking this a little further, there's no need for the 'extension' code 
to be generated for each invocation. It can easily be cached by the 
compiler at runtime; particularly a dll implementation. Indeed, assuming 
the sandbox is in place, there's nothing to prevent one from using 
pre-compiled extensions instead:

==========
module regex;

extension char[] regex (char[] pattern, char[] string);
==========

In this case, the extension is simply /declared/ like an extern D 
function would normally be. (there's a assumption that the dll name 
would be somehow bound to the function name. And, of course, the 
assumption that one can sandbox).

The beauty of this approach is in the simplicity and the power. Occam's 
Razor would appear to be at work.

Thoughts?

- Kris
Feb 08 2007
next sibling parent kenny <funisher gmail.com> writes:
this approach is actually quite nice.

In the spawned processes, pragma(lib) won't work and neither will the
-L-lmysqlclient work either. It'd be cool to have the D program be able to
contact mysql and get data directly out of mysql, but it's far too much of a
security risk, I think. With no external libraries (just D code), it would make
file opening very difficult (although not impossible, calling directly the
kernel). If .so files were created (I'm on linux) it could be possible to put
the so execution into a chroot of only the cwd. I dunno if there's a solution
in windows though.

TANGENT: though if chroot were possible in windows as well, then libs could be
imported (only if they're in the cwd or symlinked) because a makefile would
have to put the libs there or symlink them, and a makefile has full access to
the machine, so rm -rf / is possible in a make file already. a compromised
makefile would be required to make D compromise the machine, because import
("/etc/shadow") won't be possible.

Anyway, I think this is a super awesome idea. Could be REALLY AWESOME!


kris wrote:
 Following on from the "Regex Redux thread, it seems to me there's an 
 easy way to execute pure D at compile-time. A few elements are needed:
 
 1) the ability to describe a compile-time function call
 2) the facility to pass arguments to it, and recieve a return value
 3) a means of identifiying the D code to execute
 4) a manner in which the pure D is executed
 5) a mechanism for ensuring the executed code is docile
 
 What follows is purely an illustration, since there are a number of ways 
 to achieve the same result:
 
 1) Making the call. let's assume standard calling syntax is enabled. 
 Perhaps something like this (at the call site):
 
 #  char[] result = regex ("[0-9]", "abc123");
 
 That may not be an entirely practical syntax, but it hopefully gets the 
 idea across?
 
 
 2) Argument passing. The D source file is text, so a simple 
 implementation might pass text-arguments to the compile-time function. 
 Given that a mixin() is text-based also, it might make sense for the 
 function to return text too e.g.
 
 # char[] regex (char[] pattern, char[] string) {}
 
 Note that this function is composed of nothing but standard D code. The 
 args are represented by strings for the sake of simplicity; but it could 
 also be something more sophisticated.
 
 
 3) Function identity. The compiler would need to distinguish between 
 compile-time functions and all other code (so that it knows what is 
 what). One way to do this is to introduce a variation upon public / 
 private / package, called 'extension':
 
 # extension char[] regex (char[] pattern, char[] string) {}
 
 With the 'extension' keyword, the compiler can identify 'regex' as a 
 compile-time function. Thus, the regex call noted earlier would be 
 evaluated as a /compile-time invocation/ of the regex function; as 
 opposed to a runtime call. Note that these 'extension' functions would 
 be omitted from the target binary: they are for compile-time use only.
 
 
 4) Execution. One could write a D interpreter and embed it in the 
 compiler, but that's perhaps a bit impractical. Instead, why not simply 
 recurse the compiler (or spawn a child instance) to generate a seperate 
 binary instance of the compile-time function?
 
 Under Win32, for example, the binary could be generated as a .exe file, 
 and be passed arguments as normal. The return of the function could be 
 captured via an stdout pipe. Better, a dll could be generated instead, 
 and be dynamically bound to the executing compiler. The latter has 
 several benefits, the most obvious being raw throughput.
 
 
 5) The problem with enabling pure D at compile-time is a catch-22. You 
 want the expressive power and raw execution speed, but you want to 
 ensure it doesn't do anything bad. This is a problem regardless of how 
 #4 is implemented. However, I rather suspect the OS will provide the 
 answer for such concerns? I mean, doesn't Vista (for example) provide an 
 execution 'sandbox' where the target is not permitted to create any 
 handles? Without handles, there's no file, socket or registry access. 
 That's a nice sandbox.
 
 
 How about an illustrative example? The regex discussed previously?
 
 ========
 module main;
 
 import regex;
 
 void main()
 {
     // result is generated at compile-time ...
     char[] result = regex ("[0-9]", "abc123");
     writefln (result);
 }
 
 -----
 
 module regex;
 
 import std.regexp;
 
 extension char[] regex (char[] pattern, char[] string)
 {
      auto exp = new RegExp (args[0]);
      return exp.find (args[1]);
 }
 
 =========
 
 Note that import operates here exactly as it does today. As does 
 everything else. The distinction is the introduction of an 'extension' 
 keyword, and the mechanism to invoke the described function at 
 compile-time from a call-site (rather than generating a runtime call).
 
 All told, the various posts on compile-time functionality are really all 
 about compiler extensions. The degree of extension is just different 
 across posts. Supporting a pure D approach is certainly better than 
 inventing another language inside D itself; is it not?
 
 Taking this a little further, there's no need for the 'extension' code 
 to be generated for each invocation. It can easily be cached by the 
 compiler at runtime; particularly a dll implementation. Indeed, assuming 
 the sandbox is in place, there's nothing to prevent one from using 
 pre-compiled extensions instead:
 
 ==========
 module regex;
 
 extension char[] regex (char[] pattern, char[] string);
 ==========
 
 In this case, the extension is simply /declared/ like an extern D 
 function would normally be. (there's a assumption that the dll name 
 would be somehow bound to the function name. And, of course, the 
 assumption that one can sandbox).
 
 The beauty of this approach is in the simplicity and the power. Occam's 
 Razor would appear to be at work.
 
 Thoughts?
 
 - Kris
 
 

Feb 08 2007
prev sibling next sibling parent Johan Granberg <lijat.meREM OVE.gmail.com> writes:
kris wrote:
...
 - Kris

votes++
Feb 08 2007
prev sibling next sibling parent janderson <askme me.com> writes:
kris wrote:
 Following on from the "Regex Redux thread, it seems to me there's an 
 easy way to execute pure D at compile-time. A few elements are needed:
 
 1) the ability to describe a compile-time function call
 2) the facility to pass arguments to it, and recieve a return value
 3) a means of identifiying the D code to execute
 4) a manner in which the pure D is executed
 5) a mechanism for ensuring the executed code is docile
  
 Thoughts?
 
 - Kris
 

I like this approach (essentially its what I was suggesting in a previous thread). This would be an very powerful addition to D. How would it determine 5 when you can create objects. I'm guessing each method in the class would need to be labeled extension. All the compiler would do is validate that "extension" is true (like a constant), otherwise not compile. As for naming I prefer the term "compiletime". My last thought on the matter which I haven't posted was a slightly different one (take it or leave it). Why not extend the template syntax so that it looks like every day D code. First I would get rid of the static if, its inside a template, that should be able to be figured out. Next I would add case statements and for loops. Also compile time variables (int float, arrays, associative arrays). Because these are in a template they would be detected as compile time. However that wouldn't be nearly as powerful because you have no way to reuse code and you can't create new objects.
Feb 08 2007
prev sibling next sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Following on from the "Regex Redux thread, it seems to me there's an 
 easy way to execute pure D at compile-time. A few elements are needed:
 
 1) the ability to describe a compile-time function call
 2) the facility to pass arguments to it, and recieve a return value
 3) a means of identifiying the D code to execute
 4) a manner in which the pure D is executed
 5) a mechanism for ensuring the executed code is docile

6) Ensuring that the code executing at compile-time has full access to program's symbols. 1-5 define a way to define dual functions in D, which is a good thing. But without (6), they are useless. Andrei
Feb 08 2007
parent reply kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 Following on from the "Regex Redux thread, it seems to me there's an 
 easy way to execute pure D at compile-time. A few elements are needed:

 1) the ability to describe a compile-time function call
 2) the facility to pass arguments to it, and recieve a return value
 3) a means of identifiying the D code to execute
 4) a manner in which the pure D is executed
 5) a mechanism for ensuring the executed code is docile

6) Ensuring that the code executing at compile-time has full access to program's symbols. 1-5 define a way to define dual functions in D, which is a good thing. But without (6), they are useless.

On the contrary, they have access to the arguments passed to them; and to thoose arguments only. This should be viewed as a "good thing", since it sandboxes the extent of behaviour. To do otherwise flies in the face of everything good about encapsulation. Taking a step back though, what does it really matter? We're talking about something that a diminishingly small number of programmers would actually apply in everyday usage. Thus, I would hope some serious 'trade off' consideration would be given to such an approach. After all, there's that assertion that /used/ to be on the D web site: the one about how D is a "practical language for practical programmers" or something? Embedding yet another DSL into the compiler would appear to be a long road for very little practical gain. - Kris
Feb 08 2007
next sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Following on from the "Regex Redux thread, it seems to me there's an 
 easy way to execute pure D at compile-time. A few elements are needed:

 1) the ability to describe a compile-time function call
 2) the facility to pass arguments to it, and recieve a return value
 3) a means of identifiying the D code to execute
 4) a manner in which the pure D is executed
 5) a mechanism for ensuring the executed code is docile

6) Ensuring that the code executing at compile-time has full access to program's symbols. 1-5 define a way to define dual functions in D, which is a good thing. But without (6), they are useless.

On the contrary, they have access to the arguments passed to them; and to thoose arguments only. This should be viewed as a "good thing", since it sandboxes the extent of behaviour. To do otherwise flies in the face of everything good about encapsulation.

Symbols are interlinked. If you pass a class name to a metafunction, you'd sure hope to be able to access its base class name from there.
 Taking a step back though, what does it really matter? We're talking 
 about something that a diminishingly small number of programmers would 
 actually apply in everyday usage.
 
 Thus, I would hope some serious 'trade off' consideration would be given 
 to such an approach. After all, there's that assertion that /used/ to be 
 on the D web site: the one about how D is a "practical language for 
 practical programmers" or something? Embedding yet another DSL into the 
 compiler would appear to be a long road for very little practical gain.

As discussed, there is plentiful evidence of an increase in effective use of advanced language technology for very practical applications. In Loki or Boost, only the limitations of C++, and the consequent explosion in complexity and palatability, have put a ceiling on the usefulness of the libraries - not the inability of library writers nor the backlash from "practical programmers". I think such an attitude does little to help the language and ultimately the very practical programmers that the concern is all about. Walter told me that D didn't even have templates. Knowing Walter, I know he added them because he found them useful. Andrei
Feb 08 2007
parent kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 Taking a step back though, what does it really matter? We're talking 
 about something that a diminishingly small number of programmers would 
 actually apply in everyday usage.

 Thus, I would hope some serious 'trade off' consideration would be 
 given to such an approach. After all, there's that assertion that 
 /used/ to be on the D web site: the one about how D is a "practical 
 language for practical programmers" or something? Embedding yet 
 another DSL into the compiler would appear to be a long road for very 
 little practical gain.

As discussed, there is plentiful evidence of an increase in effective use of advanced language technology for very practical applications. In Loki or Boost, only the limitations of C++, and the consequent explosion in complexity and palatability, have put a ceiling on the usefulness of the libraries - not the inability of library writers nor the backlash from "practical programmers". I think such an attitude does little to help the language and ultimately the very practical programmers that the concern is all about.

"Backlash from practical programmers"? Please explain?
Feb 08 2007
prev sibling parent Carlos Santander <csantander619 gmail.com> writes:
kris escribió:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Following on from the "Regex Redux thread, it seems to me there's an 
 easy way to execute pure D at compile-time. A few elements are needed:

 1) the ability to describe a compile-time function call
 2) the facility to pass arguments to it, and recieve a return value
 3) a means of identifiying the D code to execute
 4) a manner in which the pure D is executed
 5) a mechanism for ensuring the executed code is docile

6) Ensuring that the code executing at compile-time has full access to program's symbols. 1-5 define a way to define dual functions in D, which is a good thing. But without (6), they are useless.

On the contrary, they have access to the arguments passed to them; and to thoose arguments only.

Sorry, I'm just catching up. How would your regex example work if it can only access the arguments? It wouldn't be able to do "new RegExp". -- Carlos Santander Bernal
Feb 12 2007
prev sibling next sibling parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
kris wrote:
...
 
 Thoughts?
 
 - Kris
 
 

Hmmm, why doesn't someone write a D interpreter using templates? Then it could work like this: template createAConstant( char[] args ) { const int createAConstant = execute!(" import std.math; int main( char[] args ) { // maybe do something with args return std.math.PI; } ", args); } It seems very daunting since, well, damn it just is. But we can bootstrap it. Just implement a small, turing complete, amount of D using templates. Then you can write a large part of the interpreter using the D code that the templates implemented. Then you could write an even larger part in the D that the small-D implemented. Ultimately it would be cool if the D front end shared a lot of the same D code as the compile-time interpreter, which would help adoption of new features.
Feb 08 2007
parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Chad J wrote:
 Hmmm, why doesn't someone write a D interpreter using templates?

I'd say it's impractical with the current implementation of templates. Each template instantiation is remembered. You can't just run stuff and forget about it. Think, hundreds of megs of ram to 'interpret' a simple program. That's one of the reasons why ctrace runs so bloody slow :P -- Tomasz Stachowiak
Feb 08 2007
parent Chad J <gamerChad _spamIsBad_gmail.com> writes:
Tom S wrote:
 Chad J wrote:
 
 Hmmm, why doesn't someone write a D interpreter using templates?

I'd say it's impractical with the current implementation of templates. Each template instantiation is remembered. You can't just run stuff and forget about it. Think, hundreds of megs of ram to 'interpret' a simple program. That's one of the reasons why ctrace runs so bloody slow :P -- Tomasz Stachowiak

Yeah. This seems to be a big problem with D templates though - they are getting pushed far enough nowadays that their unoptimized implementation is just not enough. Perhaps we need the D compiler to be a bit more wise about how it instantiates templates. It should probably consider discarding some templates early (particularly those that lack member functions/classes/structs/whatever), since some of these are unlikely to be instantiated again. Or maybe by default it should just not remember templates at all. Then it could unroll some of the common tail recursion. I also hear it generates object code while instantiating - baaad, it should probably do that lazily. These improvements should help template metaprogramming in general, not just to create weird stuff like a D interpreter in metacode :)
Feb 08 2007
prev sibling parent Reiner Pope <xxxx xxx.xxx> writes:
Another approach is to harness what's already possible in templates, and 
just ask the compiler to do a little more. In many cases, the template 
coding style is nice, but there's just a lot of syntactical junk lying 
around. Consider atoi (from ddl's meta.conv):

# /* *****************************************************
#  *  char [] itoa!(long n);
#  */
# template itoa(long n)
# {
#     static if (n<0)
#         const char [] itoa = "-" ~ itoa!(-n);
#     else static if (n<10L)
#         const char [] itoa = decimaldigit!(n);
#     else
#         const char [] itoa = itoa!(n/10L) ~ decimaldigit!(n%10L);
# }

It's easy to understand, but compared to a practically identical 
implementation in 'pure D', it's very clumsy:

# char[] itoa(long n)
# {
#     if (n < 0)
#         return itoa(-n);
#     if (n < 10L)
#         return digits[n..n+1];
#     return itoa(n/10L) ~ digits(n%10L);
# }

It's clumsy for a few reasons:
   - you have to write 'const char[] itoa = ' whenever you mean return
   - you have to write 'static if' instead of 'if'
   - you don't explicitly get to document the return type, which even 
led to the comment shown, which would just be wasteful in 'pure D'

However, looking at it from the other point of view, a compiler could 
easily evaluate the 'pure D' version at compile time by converting all 
the 'if's to 'static if's, the 'return's to 'const char[] itoa =', the 
function calls to template insantiations and changing the prototype from 
'char[] itoa' to 'template itoa.'

Additionally, the compiler could then recognise this as a function call, 
and discard the intermediate templates, avoiding some of the memory 
issues involved with compile-time programming.

In fact, if you can avoid the aliasing issue[1] you can convert any D 
code in this fashion, as long as you keep away from asm. Avoiding the 
aliasing issue effectively means that the functions you call can't 
modify their parameters, which satisfies the requirements of template D 
that all variables are 'const'. This means that all statements will have 
to be converted into assign statements (eg Foo = x;)

The two not-so-obvious conversions are:
  - mutable variables
  - loops.

However, they turn out to not too difficult. Mutable variables:

# int x = 1;
# x = foo(x);
# x++;
# x = foo(x);

becomes

# const int x__0 = 1;
# const int x__1 = foo!(x__0);
# const int x__2 = x__1 + 1;
# const int x__3 = foo!(x__2);

and loops are converted into recursion:

# int x = 5;
# for (int i = 0; i < 5; i++)
# {
#    x++;
# }
# return x;

becomes:

# const int x = 5;
# template __loop(int x, int i=0)
# {
#   static if (! (i__0 < 5) )
#       alias x x__final;
#   else
#   {
#       const int x__0 = x + 1;
#
#       const int i__0 = i + 1;
#       alias __loop!(x__0, i__0).x__final x__final;
#   }
# }
# const int Value = __loop!(x);

I believe that all of these conversions could be done automatically, 
which would allow 'pure D' code[2] to be used in templates.

However, I prefer a more general approach such as kris suggested, 
because it does allow aliasing (which can certainly be useful), 
including any type of D code. The point of this post is both to show 
that, although templates are quite capable for metaprogramming, they are 
syntactically ugly, and using a no-pointer subset of 'pure D' in 
templates can be considered just a syntactic improvement.

Cheers,

Reiner

[1]Basically, this involves using const to show that your function is 
indeed 'pure functional'. Annotating functions thus would be useful for 
more than just compile-time programming, I'm sure.

Avoiding the aliasing issue allows everything to be converted to static 
single assignment (which means that it is treated as const), allowing 
const-folding. You could trivially meet this requirement by only 
allowing built-in types (and not even arrays), but you may be able to 
permit more with a 'const' system. In particular, I'm pretty sure that a 
'const' array could easily be const folded, and you could probably 
support structs and const pointer-to-structs. I'm dubious about classes, 
though.

[2]Of course, I mean the alias-free subset thereof.
Feb 08 2007