digitalmars.D - Writing a compiler in CTFE

reply MysteryMan <YouCanContactMe Mystery.esp> writes:
I would like to create a compiler. I like D, but I hate it! I want 
to migrate to a new compiler, possibly a personal compiler that 
I can easily customize and tweak to my heart's content.

For speed of development, instead of having to compile a compiler 
that then compiles the program, I figured using D's CTFE and 
import could work. For a monolithic compiler, a file import 
(import("file")) is used. This is all easily within D's grasp.
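
A minimal sketch of that idea, under stated assumptions: the toy 
language (space-separated integers that get summed) and the names 
compileToD, source and generated are made up for illustration and 
are not from any existing project. The front end is an ordinary D 
function that DMD evaluates at compile time, and its output is 
mixed in as D code.

```d
// Hypothetical front end: turns toy source text into a D expression string.
// No Phobos is used, so the function stays trivially CTFE-friendly.
string compileToD(string src)
{
    string result = "0";
    string tok;
    foreach (c; src ~ " ")          // trailing space flushes the last token
    {
        if (c == ' ' || c == '\n')
        {
            if (tok.length)
                result ~= " + " ~ tok;
            tok = "";
        }
        else
            tok ~= c;
    }
    return result;
}

// With the -J<dir> compiler switch the text could come from a file instead:
//   enum source = import("program.toy");   // file name is an assumption
enum source = "1 2 3";

// compileToD runs under CTFE because its result initializes an enum.
enum generated = compileToD(source);
static assert(generated == "0 + 1 + 2 + 3");

void main()
{
    int value = mixin(generated);   // the generated D code is compiled here
    assert(value == 6);
}
```

Nothing is run ahead of time as a separate compiler binary: changing 
compileToD and rebuilding the program is the whole edit-test cycle.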


The process is as follows:

We write D code for DMD utilizing all the power of the D language, 
but minimize complexity, since it is for bootstrapping only and 
ideally exists only at version 0 of the compiler. It parses our 
new language's grammar, and our new compiler is then written in 
its own language.


We can break the process up into 4 stages:

[Our new compiler's source code written in its own language]
->
[D source code that compiles sources in our new language at CTFE] 
->
[DMD] ->
[Have the binary run on the source code from stage 1]

After these steps have been done, one has a binary that is the 
bootstrap compiler, which can be used as the "core" compiler for 
the new language. It accepts the core language, which should be 
minimally specified to avoid complexity, bugs, etc., but still 
completely expressive.

To get the next version of the compiler away from DMD, one must 
then alter the source code to supply new binary code generators 
in place of the ones we relied on in stage 2. This is a lot of 
work, as all semantics must be remapped from the DMD design to 
the new language's design.

This last stage is where all the thought must be put in so we can 
minimize design time.


So, we start with a well specified but arbitrary programming 
language that has symbols and semantics for those symbols.

For example, we have the super tiny compiler, which is written in 
JavaScript: 
https://github.com/jamiebuilds/the-super-tiny-compiler/blob/master/the-super-tiny-compiler.js

To make life more interesting, just assume this is done in D.

This could be our input to DMD's CTFE engine, for which we 
would need a D parser that can parse the source code (it maps 
D constructs to D constructs, so this is very easy; in fact, we 
can just `mixin` the code directly. Imagine a mixinjs which mixes 
in JS source code that was converted to D: a bit more 
complicated, but still doable).


What's interesting about this method is that one can 
always (assuming no broken compatibilities) use D to generate a 
new bootstrap and also use the last version to bootstrap itself.

The bootstrapped compiler automatically has all the features 
that DMD has; for example, all the architectures are available 
(this does require recompiling the bootstrap with the new dmd args).

What's more, if we already had a CTFE compiler for our 
language, we could use it inside any D program. As I've already 
shown with mixinjs, we could have a mixin(import!(js)(file)) 
which converts the JS code to D code and mixes it in directly. 
Some plumbing may be required, but it would allow us to import 
not only D code into D but also other languages (those that are 
easily representable in D).
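
One way such plumbing might look, as a rough sketch only. `import` 
is a keyword in D, so import!(js)(file) cannot be spelled literally; 
the template name importAs and the toy translator jsToD below are 
hypothetical stand-ins, and the "translation" handles nothing but a 
leading `var`.

```d
// Hypothetical toy translator: rewrites a JavaScript-like "var" declaration
// into a D "auto" declaration. A real translator would need a full parser.
string jsToD(string src)
{
    enum prefix = "var ";
    if (src.length >= prefix.length && src[0 .. prefix.length] == prefix)
        return "auto " ~ src[prefix.length .. $];
    return src;
}

// Hypothetical helper standing in for the post's import!(js)(file).
// Any function of type string -> string can be plugged in as the translator.
template importAs(alias translator, string source)
{
    enum importAs = translator(source);   // evaluated by CTFE
}

void main()
{
    // With -J<dir> the second argument could be import("file.js").
    mixin(importAs!(jsToD, "var answer = 6 * 7;"));
    assert(answer == 42);
}
```

Passing the translator as an alias parameter keeps the mixin site the 
same for every source language; only the translator function changes.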


For example, suppose we had a C to D compiler in the above sense. 
import!C(C_file) will take any C file and map the source to D 
source (most of the syntax is identical, so it is an easy mapping).

Some work is required; for example, it would have to map 
#include "X" directives to import!C(X);. With some plumbing work 
we can use any C code with D.


Such a concept would be very powerful indeed! But to accomplish 
this in a general way, we need a very general way to specify a 
compiler framework in D (one that works in CTFE for rapid 
production) that makes it easy to represent most popular 
languages.


Most of the work is in translating one grammar to the other, and 
therefore this new framework must be able to make translation 
easy.

E.g., the for loop in C is identical to the for loop in D, so a 
direct mapping can be used. In MATLAB code the for loop looks 
like for i = 1:10. This is, for the most part, just a 
rearrangement of the for loop in C, so it too has a direct mapping.
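
That rearrangement can be written down as a small sketch. The 
function name forHeaderToD is hypothetical, and the code assumes 
exactly the spacing shown above, integer bounds, and no step form 
(a:s:b). Since the MATLAB range 1:10 includes both ends, the D loop 
uses <=.

```d
// Translates "for i = 1:10" into "for (int i = 1; i <= 10; ++i)".
string forHeaderToD(string header)
{
    // Expected shape: "for " ~ name ~ " = " ~ lower ~ ":" ~ upper
    size_t eq, colon;
    foreach (i, c; header)
    {
        if (c == '=') eq = i;
        if (c == ':') colon = i;
    }
    assert(header.length > 4 && header[0 .. 4] == "for "
        && eq > 5 && colon > eq + 2 && colon + 1 < header.length,
        "unsupported loop header");
    string name  = header[4 .. eq - 1];       // between "for " and " ="
    string lower = header[eq + 2 .. colon];   // between "= " and ":"
    string upper = header[colon + 1 .. $];    // everything after ":"
    return "for (int " ~ name ~ " = " ~ lower ~ "; "
        ~ name ~ " <= " ~ upper ~ "; ++" ~ name ~ ")";
}

static assert(forHeaderToD("for i = 1:10") == "for (int i = 1; i <= 10; ++i)");

void main()
{
    int sum = 0;
    // The translated header plus a D body is mixed in as one statement.
    mixin(forHeaderToD("for i = 1:10") ~ " { sum += i; }");
    assert(sum == 55);
}
```

Even this easy case needs decisions a grammar mapping has to record 
somewhere, such as the inclusive upper bound and the type of the loop 
variable.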

The best I can understand it is that we have our input language's 
grammar and we want to map it to the D language grammar. 
Hence we have a mapping between grammars.

This is a very complex issue because of several corner cases. 
What I am proposing here is a discussion of ways to express this 
problem so as to maximize expressivity while minimizing effort 
(the good old min/max problem we all know and love).

I will start by expressing my two current positions on this 
problem:

One of the first problems is to settle on terminology and discuss 
the pathological issues that exist.
Jul 01 2018
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 1 July 2018 at 13:51:36 UTC, MysteryMan wrote:
 [...]
 For speed of development, instead of having to compile a 
 compiler that then compiles the program I figured using D's 
 CTFE and import could work. For a monolithic compiler file 
 import is used. This is all easily within D's grasp.
Take a look at https://github.com/UplinkCoder/pl0stuff, which 
actually does this. However, I would like to point out that 
without an archive of all the source modules being available at 
compile time, it'd become very hard.
Jul 01 2018