
digitalmars.D - CTFE - compiling other languages inside D

reply "Marco Leise" <Marco.Leise gmx.de> writes:
With the recent ORM and RegEx projects dissecting SQL statements and  
regular expressions turning them into D code, I am impressed by the  
possibilities of CTFE. Where are the limitations to this system? An  
unlikely example would be a C compiler within CTFE that takes a string of  
C source code and turns it into a D mixin. Is that possible?
I have written a visualizer that is supposed to work on the web and as a
stand-alone application. So I chose JavaScript to implement it, since it
is the only programming language available to web developers across
platforms. Then I used Rhino JavaScript for Java, implemented a few web
APIs in Java (HTML canvas, AJAX) and had an applet for IE8 and older and a
stand-alone application in one go. Rhino compiles JavaScript into Java
byte code, but because JavaScript is untyped there is not much to
optimize, and the authors of Rhino had to write their own JavaScript to
Java byte code compiler. So they cannot use the possibly advanced
optimization features of the actual Java compiler.
Now there is V8, the JavaScript engine in Chrome which is written in C++,  
uses JIT compilation and profiling and is *very* fast. But if I wanted to  
use D and speed was my only concern I would sacrifice some of the  
ECMAScript standard:
- everything is typed
- code cannot be altered or generated at runtime (i.e. no eval)
Some things may be difficult to do. In the following case "obj" gains a
new field later on, so the for loop has to iterate over only the first two
fields of that data type, excluding 'newField'.

var obj = { a : 1, b : "text" };
for (var key in obj) {
	...
}
obj.newField = 0.5;

Another tricky case: there have to be either two versions of foo, one for
"a" and one for "b", or a common type for "a" and "b" that includes a
numeric field "n" and a text field "c". The latter means "a" and "b" would
become objects of the same type, and the compiler would have to check
whether this is at all possible.

function foo(x) {
	...
}
a = { n : 1 };
b = { c : 'w' };
foo(a);
foo(b);
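
The "two versions of foo" option maps fairly directly onto D templates. The following is only a sketch of what a translator might emit, not something taken from an existing tool; the struct names A and B and the body of foo are assumptions made up for illustration.

```d
import std.conv : to;

// Assumed static types for the two object literals.
struct A { long n; }     // a = { n : 1 }
struct B { string c; }   // b = { c : 'w' }

// One template; the compiler instantiates a separate foo for each argument type.
string foo(T)(T x)
{
    // static if is decided at compile time, so every instantiation
    // contains only the branch that matches its type.
    static if (is(typeof(x.n)))
        return "n = " ~ x.n.to!string;
    else static if (is(typeof(x.c)))
        return "c = " ~ x.c;
    else
        static assert(false, "foo: unsupported type " ~ T.stringof);
}

void main()
{
    import std.stdio : writeln;
    writeln(foo(A(1)));     // uses the instantiation foo!A
    writeln(foo(B("w")));   // uses the instantiation foo!B
}
```

A call with a type that has neither field is rejected at compile time, which matches the idea that unsupported ECMAScript constructs should end in an error message.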

At the end of the day it wouldn't be 100% ECMAScript any more, but it  
would allow that code to compile in an optimized way and at the same time  
run within a browser. Any standard ECMA feature that doesn't work would  
result in an error message. This would probably also allow browsers to  
apply more optimizations on the resulting code in terms of JIT  
compilation. Here is a small snippet in JavaScript and in D:

function log(x) {
	...
}

var x = 3;
x *= 1.2345;
arr = new Array(10);
for (var i = 0; i < 10; ++i) arr[i] = i;
log(arr[5]);

---------------

void log(long x) {	// we need a 'long' version of the log function
	...
}

double x = 3;	// x is later assigned a floating point number
x *= 1.2345;
long[10] arr;	// the slots in this array are all assigned integral numbers before any of them are read
for (int i = 0; i < 10; ++i) arr[i] = i;
log(arr[5]);
Aug 10 2011
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
Marco Leise wrote:
 An unlikely example would be a C compiler within CTFE that takes a
 string of C source code and turns it into a D mixin. Is that
 possible?

It'd be a fair amount of work, but it should be possible. It's tempting to try to implement that as an alternative to D bindings modules.

mixin include_C ( import("stdio.h") );

Turning JavaScript into D is probably harder yet... but I still think it could be done.

tbh though, I think you'd be better off using a JavaScript interpreter and duplicating a little bit of effort to optimize stuff. So you write it in JavaScript, then use a JavaScript engine in your distributable app. Functions that are CPU intensive are then rewritten in D so the script can call them and get better speed out of it.
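
To give a feel for the scale involved, here is a minimal sketch that handles one tiny corner of C: lines of the form "#define NAME value" are turned into D enums at compile time. The function names words and definesToEnums and the header contents are invented for this example, and the header is an inline string rather than import("stdio.h") so the snippet compiles without the -J switch. Everything else in C (types, function prototypes, macros with arguments) is simply ignored.

```d
// Split a line into whitespace-separated words. Written by hand so that
// it clearly runs under CTFE without relying on any library code.
string[] words(string s)
{
    string[] result;
    string current;
    foreach (char ch; s)
    {
        if (ch == ' ' || ch == '\t' || ch == '\r')
        {
            if (current.length) { result ~= current; current = ""; }
        }
        else
            current ~= ch;
    }
    if (current.length) result ~= current;
    return result;
}

// Hypothetical converter: emits "enum NAME = value;" for every
// "#define NAME value" line and skips all other lines.
string definesToEnums(string cSource)
{
    string result;
    string line;
    // The extra newline makes sure the last line is processed too.
    foreach (char ch; cSource ~ "\n")
    {
        if (ch != '\n') { line ~= ch; continue; }
        auto w = words(line);
        if (w.length == 3 && w[0] == "#define")
            result ~= "enum " ~ w[1] ~ " = " ~ w[2] ~ ";\n";
        line = "";
    }
    return result;
}

// Stand-in for the text of a real header file.
enum header = "#define MAX_PATH_LEN 260\n#define RETRY_COUNT 3\nint puts(const char *s);\n";

// The converter runs inside the compiler; its result is compiled as D code.
mixin(definesToEnums(header));

static assert(MAX_PATH_LEN == 260);
static assert(RETRY_COUNT == 3);

void main()
{
    import std.stdio : writeln;
    writeln(MAX_PATH_LEN, " ", RETRY_COUNT);
}
```

A real include_C would need a C preprocessor and a declaration parser on top of this, which is where the "fair amount of work" comes in.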
Aug 10 2011
parent reply "Nick Sabalausky" <a a.a> writes:
"Adam D. Ruppe" <destructionator gmail.com> wrote in message 
news:j1ufc0$avd$1 digitalmars.com...
 Marco Leise wrote:
 An unlikely example would be a C compiler within CTFE that takes a
 string of C source code and turns it into a D mixin. Is that
 possible?

It'd be a fair amount of work, but it should be possible. It's tempting to try to implement that as an alternative to D bindings modules. mixin include_C ( import("stdio.h") );

It's a neat possibility, but the downside of that approach, I suspect, is that it may slow down compilation. With that approach, "stdio.h" has to be processed *every* time your program is compiled, not just whenever "stdio.h" is changed (which is what you would get if the conversion were done with a separate tool and a proper buildsystem). Also, I'm sure that CTFE is slower than running an already compiled tool. It would have to be slower, since it *is* interpreted, after all.

This is another reason why CTFE really needs to support IO access (I really believe the strict adherence to "CTFE must be *guaranteed* stateless" is a mistake. It's right to strongly discourage it, but making the ban this strict is taking things too far - similar to Java's ban on pointers). Then, include_C could be implemented roughly like this:

string include_C(string filename)
{
	auto cache = filename~".cache";
	if(exists(cache) && timestamp(cache) >= timestamp(filename))
		return loadFile(cache);
	else
	{
		auto result = convert_C(loadFile(filename));
		saveFile(cache, result);
		return result;
	}
}

string convert_C(string src)
{
	// Do the conversion, possibly even by invoking a pre-compiled tool.
}

// Only gets processed if stdio.h has changed
mixin( include_C("stdio.h") );

The other big benefit, of course, is that we'd finally get compile-time write*() for free. This would also open the door for a CTFE/library-based buildsystem that doesn't require a dedicated "makefile" or equivalent, which is an interesting prospect.
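
Since CTFE cannot do file I/O, the cache logic above cannot run inside the compiler as written. As a sketch only, the same compile-if-newer check can be written as an ordinary run-time helper, for example inside a separate conversion tool that a build script calls. The names includeC and convertC and the demo file name are assumptions; convertC is a placeholder and does no real translation.

```d
import std.file : exists, readText, timeLastModified, write;

// Placeholder for the real C-to-D translator (hypothetical).
string convertC(string src)
{
    return "/* translated:\n" ~ src ~ "*/\n";
}

// Return the converted text, reusing the cache file when it is at least
// as new as the header it was generated from.
string includeC(string filename)
{
    immutable cache = filename ~ ".cache";
    if (exists(cache) && timeLastModified(cache) >= timeLastModified(filename))
        return readText(cache);

    auto result = convertC(readText(filename));
    write(cache, result);   // std.file.write creates or overwrites the file
    return result;
}

void main()
{
    import std.file : remove, tempDir;
    import std.path : buildPath;
    import std.stdio : writeln;

    // Demo on a throwaway file in the temp directory.
    auto header = buildPath(tempDir(), "ctfe_cache_demo.h");
    write(header, "#define ANSWER 42\n");
    scope (exit)
    {
        remove(header);
        if (exists(header ~ ".cache")) remove(header ~ ".cache");
    }

    auto first = includeC(header);    // converts and writes the cache
    auto second = includeC(header);   // normally served from the cache
    writeln(first == second);
}
```

The generated D text could then be pulled into the program with a string import, so the expensive conversion only happens when the header changes.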
Aug 10 2011
next sibling parent Robert Clipsham <robert octarineparrot.com> writes:
On 10/08/2011 21:32, Marco Leise wrote:
 For starters, how about this?:
 static string someExternalText = __ctfeReadFile("external.txt");
 static byte[] chipInitialState = __ctfeReadFile("initial_state.bin");
 Every external file used in compiling a source file would be added to
 the list of files to check for their modification date in relation to
 the resulting object file. This ensures that the object is recreated
 when either of the sources change. The list can be in a separate file
 per each D source using this feature.

 This offers:
 - no execution of arbitrary commands
 - usual compile-if-newer logic doesn't reinvent the wheel
 - compile-time conversion of C headers
 - add snippets in domain specific languages by their own respective
 source files
 - include microcode blobs and other binary data in your modules if desired

 Personally I think this idea rocks, but YMMV :p .

You can already do that!

enum _ = import("someFile");

Then when compiling, use the -J switch (required) to specify the include directory.

--
Robert
http://octarineparrot.com/
Aug 10 2011
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Marco Leise wrote:
 For starters, how about this?:
      static string someExternalText = __ctfeReadFile("external.txt");
      static byte[] chipInitialState = __ctfeReadFile("initial_state.bin");

static string someExternalText = import("external.txt");
static byte[] chipInitialState = import("initial_state.bin");

(You need to pass the -Jpath switch)
Aug 10 2011
parent David Nadlinger <see klickverbot.at> writes:
On 8/10/11 11:43 PM, Marco Leise wrote:
 Am 10.08.2011, 22:49 Uhr, schrieb Timon Gehr <timon.gehr gmx.ch>:

 Marco Leise wrote:
 For starters, how about this?:
 static string someExternalText = __ctfeReadFile("external.txt");
 static byte[] chipInitialState = __ctfeReadFile("initial_state.bin");

static string someExternalText = import("external.txt"); static byte[] chipInitialState = import("initial_state.bin"); (You need to pass the -Jpath switch)

Oh err, well. I was a little behind time it seems *g*. So Ary Manzana was wrong saying "I think it's possible, though CTFE can't access outside resources."

Well, you can't really use import() from CTFE, because the file name must be a compile-time constant. This would make things like importing other files based on include statements in some file cumbersome, as you would have to resort to a strange combination of templates and CTFE.

David
Aug 10 2011
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
Nick Sabalausky wrote:
 "Adam D. Ruppe" <destructionator gmail.com> wrote in message
 news:j1ufc0$avd$1 digitalmars.com...
 Marco Leise wrote:
 An unlikely example would be a C compiler within CTFE that takes a
 string of C source code and turns it into a D mixin. Is that
 possible?

It'd be a fair amount of work, but it should be possible. It's tempting to try to implement that as an alternative to D bindings modules. mixin include_C ( import("stdio.h") );

It's a neat possibility, but the downside of that approach, I suspect, is that it may slow down compilation. With that approach, "stdio.h" has to be processed *every* time your program is compiled, not just whenever "stdio.h" is changed (which is what you would get if the conversion were done with a separate tool and a proper buildsystem). Also, I'm sure that CTFE is slower than running an already compiled tool. It would have to be slower, since it *is* interpreted, after all.

This is another reason why CTFE really needs to support IO access (I really believe the strict adherence to "CTFE must be *guaranteed* stateless" is a mistake. It's right to strongly discourage it, but making the ban this strict is taking things too far - similar to Java's ban on pointers). Then, include_C could be implemented roughly like this: [snip.]

Another interesting way to speed up compilation of CTFE-heavy code in an incremental build environment would be to allow the compiler to cache the internal AST representation of the code after semantic analysis in some kind of temporary file.
Aug 10 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/10/11 11:09 AM, Marco Leise wrote:
 With the recent ORM and RegEx projects dissecting SQL statements and
 regular expressions turning them into D code, I am impressed by the
 possibilities of CTFE. Where are the limitations to this system? An
 unlikely example would be a C compiler within CTFE that takes a string
 of C source code and turns it into a D mixin. Is that possible?

Yah, that would be possible (albeit difficult). I think, however, that better applications of CTFE are not for translating full-blown languages into D. Instead, the best added value would be to translate small DSLs into D code. Examples include:

* regex (I'm very glad Dmitry found the time to implement that - static regexen will long serve as a poster child of CTFE's power);
* SQL - embedded SQL integrated perfectly with D data would be awesome and relatively easy to define;
* Tokenizers (think lex);
* Parsers (think yacc, antlr etc);
* String interpolation (think Python's format, printf-style format parsed statically etc);
* Make :o);
* Protocol description;
* Automata, transducers of various kinds;
* and more.

I hope Dmitry's work will mark a growing trend of defining DSLs in D.

Andrei
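
As a rough illustration of the "small DSL into D code" pattern, the sketch below turns a one-line field description into a struct definition at compile time. The mini-language ("name:type" pairs separated by spaces) and the function makeStruct are invented for this example and do not come from any of the projects mentioned.

```d
// Hypothetical mini-DSL: "name:type name:type ..." lists the fields of a struct.
// makeStruct returns D source text, which is then compiled via mixin.
string makeStruct(string name, string spec)
{
    string code = "struct " ~ name ~ " {\n";
    string field, fieldType;
    bool inType = false;
    // The trailing space flushes the last name:type pair.
    foreach (char ch; spec ~ " ")
    {
        if (ch == ':')
            inType = true;
        else if (ch == ' ')
        {
            if (field.length && fieldType.length)
                code ~= "    " ~ fieldType ~ " " ~ field ~ ";\n";
            field = "";
            fieldType = "";
            inType = false;
        }
        else if (inType)
            fieldType ~= ch;
        else
            field ~= ch;
    }
    return code ~ "}\n";
}

// Evaluated by CTFE; the generated struct is a normal D type afterwards.
mixin(makeStruct("Point", "x:int y:int label:string"));

static assert(is(typeof(Point.x) == int));
static assert(is(typeof(Point.label) == string));

void main()
{
    import std.stdio : writeln;
    auto p = Point(3, 4, "demo");
    writeln(p.x + p.y, " ", p.label);
}
```

Mistakes in the DSL text surface as ordinary compile errors in the generated code; a more serious implementation would validate the input and report errors in terms of the DSL itself.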
Aug 10 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-08-10 20:04, Andrei Alexandrescu wrote:
 On 8/10/11 11:09 AM, Marco Leise wrote:
 With the recent ORM and RegEx projects dissecting SQL statements and
 regular expressions turning them into D code, I am impressed by the
 possibilities of CTFE. Where are the limitations to this system? An
 unlikely example would be a C compiler within CTFE that takes a string
 of C source code and turns it into a D mixin. Is that possible?

Yah, that would be possible (albeit difficult). I think, however, that better applications of CTFE are not for translating full-blown languages into D. Instead, the best added value would be to translate small DSLs into D code. Examples include: * regex (I'm very glad Dmitry found the time to implement that - static regexen will long serve as a poster child of CTFE's power); * SQL - embedded SQL integrated perfectly with D data would be awesome and relatively easy to define; * Tokenizers (think lex); * Parsers (think yacc, antlr etc); * String interpolation (think Python's format, printf-style format parsed statically etc); * Make :o); * Protocol description; * Automata, transducers of various kinds; * and more. I hope Dmitry's work will mark a growing trend of defining DSLs in D. Andrei

I think it would be better if D were more DSL-friendly instead of embedding code inside string mixins.

--
/Jacob Carlborg
Aug 11 2011
prev sibling next sibling parent Ary Manzana <ary esperanto.org.ar> writes:
On 8/10/11 2:09 PM, Marco Leise wrote:
 With the recent ORM and RegEx projects dissecting SQL statements and
 regular expressions turning them into D code, I am impressed by the
 possibilities of CTFE. Where are the limitations to this system? An
 unlikely example would be a C compiler within CTFE that takes a string
 of C source code and turns it into a D mixin. Is that possible?

I think it's possible, though CTFE can't access outside resources. In my ideal language with CTFE capabilities, you say how to connect to the database and at compile-time the classes are generated from that.
Aug 10 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 10.08.2011, 20:34 Uhr, schrieb Nick Sabalausky <a a.a>:

 "Adam D. Ruppe" <destructionator gmail.com> wrote in message
 news:j1ufc0$avd$1 digitalmars.com...
 Marco Leise wrote:
 An unlikely example would be a C compiler within CTFE that takes a
 string of C source code and turns it into a D mixin. Is that
 possible?

It'd be a fair amount of work, but it should be possible. It's tempting to try to implement that as an alternative to D bindings modules. mixin include_C ( import("stdio.h") );

It's a neat possibility, but the downside of that approach, I suspect, is that it may slow down compilation. With that approach, "stdio.h" has to be processed *every* time your program is compiled, not just whenever "stdio.h" is changed (which is what you would get if the conversion were done with a separate tool and a proper buildsystem). Also, I'm sure that CTFE is slower than running an already compiled tool. It would have to be slower, since it *is* interpreted, after all.

This is another reason why CTFE really needs to support IO access (I really believe the strict adherence to "CTFE must be *guaranteed* stateless" is a mistake. It's right to strongly discourage it, but making the ban this strict is taking things too far - similar to Java's ban on pointers). Then, include_C could be implemented roughly like this:

string include_C(string filename)
{
	auto cache = filename~".cache";
	if(exists(cache) && timestamp(cache) >= timestamp(filename))
		return loadFile(cache);
	else
	{
		auto result = convert_C(loadFile(filename));
		saveFile(cache, result);
		return result;
	}
}

string convert_C(string src)
{
	// Do the conversion, possibly even by invoking a pre-compiled tool.
}

// Only gets processed if stdio.h has changed
mixin( include_C("stdio.h") );

The other big benefit, of course, is that we'd finally get compile-time write*() for free. This would also open the door for a CTFE/library-based buildsystem that doesn't require a dedicated "makefile" or equivalent, which is an interesting prospect.

Although there are other languages allowing you to call external programs during compilation, it feels like opening Pandora's box, and people will start sending code around that does "rm -rf ~/*". Then again, the same effect can be accomplished later at runtime, so I don't know if there is really any objective difference. I wouldn't mind if there was a compiler switch to enable compile-time I/O for exactly the things you mentioned.

For starters, how about this?:

static string someExternalText = __ctfeReadFile("external.txt");
static byte[] chipInitialState = __ctfeReadFile("initial_state.bin");

Every external file used in compiling a source file would be added to the list of files whose modification dates are checked against the resulting object file. This ensures that the object is recreated when any of the sources changes. The list can be kept in a separate file for each D source using this feature.

This offers:
- no execution of arbitrary commands
- usual compile-if-newer logic doesn't reinvent the wheel
- compile-time conversion of C headers
- add snippets in domain specific languages by their own respective source files
- include microcode blobs and other binary data in your modules if desired

Personally I think this idea rocks, but YMMV :p .
Aug 10 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 10.08.2011, 22:49 Uhr, schrieb Timon Gehr <timon.gehr gmx.ch>:

 Marco Leise wrote:
 For starters, how about this?:
      static string someExternalText = __ctfeReadFile("external.txt");
      static byte[] chipInitialState =  
 __ctfeReadFile("initial_state.bin");

static string someExternalText = import("external.txt"); static byte[] chipInitialState = import("initial_state.bin"); (You need to pass the -Jpath switch)

Oh err, well. I was a little behind time it seems *g*. So Ary Manzana was wrong saying "I think it's possible, though CTFE can't access outside resources."
Aug 10 2011
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Heh, I just tried to outsmart the compiler by mixing in a version
declaration from another file, but it complains about version being
defined after use.

E.g.
ctfe.d:
version = foo;

main.d:
mixin(import("ctfe.d"));
version(foo)  // won't fly, version foo defined after use
{
}
Aug 10 2011