digitalmars.D - How hard would it be to create a new backend in D?

reply rempas <rempas tutanota.com> writes:
I was wondering how easy it would be to create a new backend in 
D? From what I know, all three of DMD, LDC and GDC use 
the same frontend, which is the one from DMD. So I would suppose 
that there is a way to do that and that DMD has probably evolved 
over the years to make its frontend more portable. However, how 
straightforward is it? Is it legit to do it or will it require 
"hacking" (if you understand what I mean)? So it's not about 
working hard but about whether the work makes sense or whether 
I'll constantly run into obstacles that no one would be able to help me with.

I know some of you will ask, so: I want to use 
[mir](https://github.com/vnmakarov/mir) as a backend because of 
its fast compile times and its great runtime performance! So 
yeah, not a question purely out of curiosity!
Aug 05 2022
next sibling parent reply IGotD- <nise nise.com> writes:
On Friday, 5 August 2022 at 20:37:19 UTC, rempas wrote:
 I was wondering how easy it would be to create a new backend in 
 D? From what I know all three of DMD, LDC and GDC are using 
 the same frontend which is the one from DMD. So I would suppose 
 that there is a way to do that and that probably DMD has 
 evolved over the years to make its frontend more portable. 
 However, how straightforward is it? Is it legit to do it or will 
 it require "hacking" (if you understand what I mean). So it's 
 not about working hard but if the work makes sense or if I'll 
 constantly find obstacles that no one would be able to help me 
 with.

 I know some of you will ask so, I want to use 
 [mir](https://github.com/vnmakarov/mir) as a backend because of 
 its fast compile times and its great runtime performance! So 
 yeah, not a question purely out of curiosity!
Why would you want to create a new backend when you have LLVM and GCC? Scrap the DMD backend. The only claimed benefit is that it is fast, which isn't much compared to the amount of time spent on maintenance. It's not the 90s anymore, when you only had an Intel Pentium target. Now you have many Intel optimization options, ARM (many models there as well), RISC-V, PowerPC etc., and more are coming. Let's focus on the language, because CPU support is out of scope for the D project.
Aug 05 2022
parent reply rempas <rempas tutanota.com> writes:
On Friday, 5 August 2022 at 21:54:52 UTC, IGotD- wrote:
 Why would you want to create a new backend when you have LLVM 
 and GCC? Scrap the DMD backend. The only claimed benefit is 
 that it is fast which isn't much compared to the amount of time 
 spent on maintenance.
Because LDC (LLVM) is terribly slow (and GDC is worse on my system)! I have a dream to create a cross-platform toolchain and package/project manager (no, DUB will not do) to create a new ecosystem where everyone will compile everything from source, with the many benefits this adds! But with a compiler that is so slow, there is no way that anyone (including me) will want to compile everything from source. So my goal is simply not achievable with the current compilers. Mir compiles about 4-5 times faster than `GCC -O2` while having about 83%-85% of its runtime performance (on average). It's still not at TCC level, but it's much, much better!
 It's not the 90s anymore where you only had an Intel Pentium 
 target. Now you have many Intel optimization options, ARM 
 (many models there as well), RISC V, PowerPC etc. and more are 
 coming.
I don't understand what you're trying to say with this one. Do you mean that LLVM has support for a lot of CPU ISAs so it's a good backend? If yes, then mir has support for a couple of them as well.
 Let's focus on the language because CPU support is out of scope 
 for the D project.
Again, not sure what you mean with that...
Aug 05 2022
parent reply welkam <wwwelkam gmail.com> writes:
On Saturday, 6 August 2022 at 04:33:41 UTC, rempas wrote:
 a new ecosystem where everyone will compile everything from
 source with the many benefits this adds!
So a Gentoo? Now, since Gentoo has been mentioned, someone is required to mention Arch in the responses. The only benefit I would want from compiling everything myself instead of downloading precompiled binaries is that I could enable specific optimizations for my system. The only backends that have all those optimizations are GCC and LLVM. What other benefits do you see that are worth the hassle? I think when talking about creating executables from source code it's better to use the word "build" instead of "compile" to describe that process. In order to build the program you need to compile and link. The last time I built a debug version of DMD on my system, around half of the time was spent linking, so even if you get a significantly faster backend, the whole build time won't change significantly. You need to think about the whole pipeline if you want big changes.
Aug 06 2022
parent rempas <rempas tutanota.com> writes:
On Saturday, 6 August 2022 at 19:35:16 UTC, welkam wrote:
 So a Gentoo? Now since Gentoo has been mentioned someone is 
 required to mention Arch in the responses.

 The only benefit I would want from compiling everything myself 
 instead of downloading precompiled binaries is that I could 
 enable specific optimizations for my system. The only backends 
 that have all those optimizations are GCC and LLVM. What other 
 benefits do you see that are worth the hassle?
Having the ability to create custom builds! Most of the time this is not a problem, but there may be cases where you have to build your own version of a package (because you need the 'X' feature) and the built-in package manager may not have the greatest support for mixing its packages with your local ones. Another one: cross-compatibility! No more wasting time on platform-specific bugs because this distro has this built in and that distro has that option enabled, yada yada yada. One package manager, all the systems! And as everything and everyone builds from source, we can have one cross-compiler and we can build a local version of the program. No more "upload your binary", which, yes, happened to me (even though we did find the solution without having to do that, but still...).
 I think when talking about creating executables from source 
 code it's better to use the word build instead of compiling to 
 describe that process. In order to build the program you need 
 to compile and link. The last time I built debug version of DMD 
 on my system around half of the time was spent linking so even 
 if you can get a significantly faster backend the whole build 
 time won't change significantly. You need to think about the 
 whole pipeline if you want big changes.
That's fair, but linkers have become faster and faster! See `lld` for example! In my experience, most projects (at least the C ones, as this is where we have tons of huge projects to test) spend around 90% of the whole build in compiling when using `lld` as the linker (and no link-time optimization). I've seen C++ projects being different, as C++ probably outputs more complex symbols so the linker has to do more work, and I don't remember for D (even though I'm mostly sure that it falls in the C category, where link times are shorter). So in the end, I think that improving the compile time (producing object files) does matter. Also, there is always the option of creating the final executable in one shot! This can be extremely useful in release builds, when you don't care about the object files anyway. Is there any D backend that has been built to do that? Actually, the only compilers that I personally know to do that are [Vox](https://github.com/MrSmith33/vox) and [Vlang](https://github.com/vlang/v) when using its "native" backend.
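
To put rough numbers on the compile-versus-link argument, here is a back-of-the-envelope sketch in D using Amdahl's law. The function name `buildSpeedup` is made up for this illustration, and the 4x figure is simply the lower end of the 4-5x compile-speed claim from earlier in the thread, applied to the whole compile phase. That is a simplification, since a new backend would not make the frontend any faster.

```d
import std.stdio : writefln;

/// Overall build speedup when only the compile phase gets faster.
/// compileFraction: share of the original build time spent compiling (0..1).
/// compileSpeedup:  how many times faster the compile phase becomes.
double buildSpeedup(double compileFraction, double compileSpeedup)
{
    // The link phase is assumed to be untouched by the new backend.
    immutable linkFraction = 1.0 - compileFraction;
    return 1.0 / (linkFraction + compileFraction / compileSpeedup);
}

void main()
{
    // Half of the build is linking, as in the DMD debug build example.
    writefln("50%% compile: %.2fx", buildSpeedup(0.5, 4.0));
    // 90% of the build is compiling, as in the lld experience above.
    writefln("90%% compile: %.2fx", buildSpeedup(0.9, 4.0));
}
```

Under these assumptions the first case gives 1.60x and the second about 3.08x, so whether a faster backend pays off depends heavily on how much of the build is really spent in the linker.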
Aug 06 2022
prev sibling next sibling parent reply user1234 <user1234 12.de> writes:
On Friday, 5 August 2022 at 20:37:19 UTC, rempas wrote:
 I was wondering how easy it would be to create a new backend in 
 D? From what I know all three of DMD, LDC and GDC are using 
 the same frontend which is the one from DMD. So I would suppose 
 that there is a way to do that and that probably DMD has 
 evolved over the years to make its frontend more portable. 
 However, how straightforward is it? Is it legit to do it or will 
 it require "hacking" (if you understand what I mean). So it's 
 not about working hard but if the work makes sense or if I'll 
 constantly find obstacles that no one would be able to help me 
 with.

 I know some of you will ask so, I want to use 
 [mir](https://github.com/vnmakarov/mir) as a backend because of 
 its fast compile times and its great runtime performance! So 
 yeah, not a question purely out of curiosity!
I think you can try things without writing a backend. If I believe the diagram of what currently works, it seems possible to use LDC to produce LLVM IR, which MIR could then in theory consume. But MIR does not produce native executables, so why?
Aug 05 2022
parent reply rempas <rempas tutanota.com> writes:
On Friday, 5 August 2022 at 22:09:44 UTC, user1234 wrote:
 I think you can try things without writing a backend. If I 
 believe the diagram of what currently works that seems possible 
 to use LDC to produce LLVM IR and then MIR in theory could use 
 that, ...
LLVM is the slow part of LDC (D's frontend is actually faster than C's frontend for LLVM, as LDC compiles code faster than Clang), so this will not help...
 but MIR does not produce native executables so why?
What do you mean by native? ELF (at least for Unix)? Mir has its own format, which is a binary format (including machine instructions). It then uses its own linker, so no problem! The format is also cross-platform, so in general there are no problems with it. It also uses a JIT, so it can probably do some more optimizations (or at least there is room to make them; I don't know what happens at the moment). When it comes to its runtime performance, there is a directory called "c-benchmarks" where I have run the tests (compiling mir from the `bbv` branch), and compared to `GCC -O2`, mir has about 83%-85% of its runtime performance while compiling code about 4-5 times faster on average! If you wonder why I care about compilation times so much, please check my other reply in this thread.
Aug 05 2022
next sibling parent reply cmyka <mauricehuuskes hotmail.com> writes:
On Saturday, 6 August 2022 at 04:41:24 UTC, rempas wrote:
 [...]
In what way do you wish to use MIR? A D frontend that generates MIR or some kind of LLVM-MIR pass? Could be an interesting project if not quite ambitious. I wouldn't let the previous user discourage you here. :)
Aug 05 2022
parent rempas <rempas tutanota.com> writes:
On Saturday, 6 August 2022 at 05:28:06 UTC, cmyka wrote:
 In what way do you wish to use MIR? A D frontend that generates 
 MIR or some kind of LLVM-MIR pass? Could be an interesting 
 project if not quite ambitious. I wouldn't let the previous 
 user discourage you here. :)
Here's the thing... I don't know! That's the point. I wonder what D's frontend (which is practically DMD's frontend) generates. It has to generate some kind of global IR, and then probably LDC takes that and turns it into LLVM IR and GDC takes it and turns it into GCC IR. So I suppose that DMD has to produce some kind of independent intermediate representation. I don't think that there is another way that things can happen... So I wonder how I can get started and whether the process is straightforward, because if it is not, then I may also think about building a language that is based on D or something like that, idk... As for getting discouraged, I didn't see it as someone trying to discourage me. I think that the guys gave their opinion nicely, so in any case, I'm still thinking about it. But making my own frontend is still on the table. The result of this thread will show!
Aug 05 2022
prev sibling parent reply user1234 <user1234 12.de> writes:
On Saturday, 6 August 2022 at 04:41:24 UTC, rempas wrote:
 On Friday, 5 August 2022 at 22:09:44 UTC, user1234 wrote:
 I think you can try things without writing a backend. If I 
 believe the diagram of what currently works that seems 
 possible to use LDC to produce LLVM IR and then MIR in theory 
 could use that, ...
LLVM is the slow part of LDC (D's frontend is actually faster than C's frontend for LLVM, as LDC compiles code faster than Clang), so this will not help...
I suggested experimenting with MIR like that; it was not a proposal for the final design. Experimenting using LLVM IR could be useful to determine whether working seriously on the project is worthwhile.

Anyway, if you want to put your hands on the hard stuff from the start, I think you have two options.

1. Create an AST visitor that generates MIR format after DMDFE semantics
2. Create the MIR representation after the part of the backend that generates DMD IR (s2ir, e2ir, etc.) has run.

The second option might be easier because the production will most of the time map 1:1 to a MIR equivalent.

The first option would IMO be harder because of forward references and imports. And even without that, it would require splitting the visiting into several passes (decls, aggregate members, function headers, function bodies).

About the "how hard": I think that compiler programming is not hard, but it takes time. I estimate that this could take you from 1 month to 3 months to finish; however, you'd get results much earlier, e.g. if you handle just a few constructs.
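
The first option (an AST visitor that emits backend instructions) can be sketched on a toy scale. Everything below is hypothetical: the node classes `IntLit`, `Add`, `Return`, `Func`, the `Visitor` interface and the `Emitter` are invented for this illustration and are not DMD's real classes, and the emitted text is only loosely modelled on an assembly-like IR; it has not been checked against MIR's actual textual syntax.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical toy AST; DMD's real node classes are far richer.
abstract class Expr { abstract void accept(Visitor v); }

final class IntLit : Expr
{
    long value;
    this(long value) { this.value = value; }
    override void accept(Visitor v) { v.visit(this); }
}

final class Add : Expr
{
    Expr lhs, rhs;
    this(Expr lhs, Expr rhs) { this.lhs = lhs; this.rhs = rhs; }
    override void accept(Visitor v) { v.visit(this); }
}

final class Return
{
    Expr value;
    this(Expr value) { this.value = value; }
}

final class Func
{
    string name;
    Return body_;
    this(string name, Return body_) { this.name = name; this.body_ = body_; }
}

interface Visitor
{
    void visit(IntLit e);
    void visit(Add e);
}

// Walks expressions bottom-up and appends one pseudo-instruction per operation.
final class Emitter : Visitor
{
    string[] lines;        // emitted pseudo-IR, one instruction per entry
    string result;         // operand holding the value of the last visited expression
    private int nextTemp;  // counter for fresh temporaries t0, t1, ...

    void visit(IntLit e) { result = e.value.to!string; }

    void visit(Add e)
    {
        e.lhs.accept(this);
        string l = result;
        e.rhs.accept(this);
        string r = result;
        result = "t" ~ (nextTemp++).to!string;
        lines ~= "  add " ~ result ~ ", " ~ l ~ ", " ~ r;
    }

    void emit(Func f)
    {
        lines ~= f.name ~ ": func";
        f.body_.value.accept(this);
        lines ~= "  ret " ~ result;
        lines ~= "  endfunc";
    }
}

void main()
{
    // Toy equivalent of: int f() { return 1 + 2; }
    auto f = new Func("f", new Return(new Add(new IntLit(1), new IntLit(2))));
    auto e = new Emitter;
    e.emit(f);
    foreach (line; e.lines) writeln(line);
}
```

A real visitor would have to handle the ordering problems mentioned above (forward references, imports, several passes), which this sketch sidesteps by having a single function with no declarations to resolve.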
Aug 06 2022
parent reply rempas <rempas tutanota.com> writes:
On Saturday, 6 August 2022 at 07:10:43 UTC, user1234 wrote:
 I suggested to experiment MIR like that, it was not a proposal 
 on the final design.
 Experimenting using LLVM IR could be useful to determine if 
 working seriously on the project is worth.

 Anyway if you want to put your hand in the hard stuff from the 
 start I think you have two options.

 1. Create an AST visitor that generates MIR format after DMDFE 
 semantics
 2. Create the MIR representation after the part of the backend 
 that generates DMD IR (s2ir, e2ir, etc.) has run.

 The second option might be easier because the production will 
 most of the time map 1:1 to a MIR equivalent.

 The first option would IMO be harder because of forward 
 references and imports. and even without that, that would 
 require to split visiting in several passes (decls, aggregate 
 members, function headers, function bodies)
Thank you for the info! The thing is (and why I asked the question originally): how do I find info about how to get started? I don't even know how DMD works and how its IR works. Does DMD's frontend parse the text and then output something like LLVM IR (but for DMD), which we can take and then translate for the final backend that we need (in our case mir), or something else? That's what I want to know. So yeah, is there legit documentation or something, or do backend developers have to guess how things work and do "hacking"?
 About the "how hard" I think that compiler programming is not 
 hard but that takes time. I estimate that this could take you 
 from 1 month to 3 months to finish; however, you'd get results 
 much earlier, e.g if you handle just a few constructs.
That's actually pretty nice! I don't mind putting in the work, but I want the work to be straightforward and to make sense. I would expect to see actual documentation and info about how things work in detail. If not, then I would probably spend the time to design and implement my own language.
Aug 06 2022
parent reply user1234 <user1234 12.de> writes:
On Saturday, 6 August 2022 at 08:04:37 UTC, rempas wrote:
 On Saturday, 6 August 2022 at 07:10:43 UTC, user1234 wrote:
 I suggested to experiment MIR like that, it was not a proposal 
 on the final design.
 Experimenting using LLVM IR could be useful to determine if 
 working seriously on the project is worth.

 Anyway if you want to put your hand in the hard stuff from the 
 start I think you have two options.

 1. Create an AST visitor that generates MIR format after DMDFE 
 semantics
 2. Create the MIR representation after the part of the backend 
 that generates DMD IR (s2ir, e2ir, etc.) has run.

 The second option might be easier because the production will 
 most of the time map 1:1 to a MIR equivalent.

 The first option would IMO be harder because of forward 
 references and imports. and even without that, that would 
 require to split visiting in several passes (decls, aggregate 
 members, function headers, function bodies)
Thank you for the info! The thing is (and why I asked the question originally): how do I find info about how to get started? I don't even know how DMD works and how its IR works. Does DMD's frontend parse the text and then output something like LLVM IR (but for DMD), which we can take and then translate for the final backend that we need (in our case mir), or something else? That's what I want to know. So yeah, is there legit documentation or something, or do backend developers have to guess how things work and do "hacking"? [...]
You'll have to read the DMD code to get familiar with its code base (another way in the past was fixing bugs; unfortunately there are not many easy ones anymore). Fortunately you won't have to understand the whole thing. At first I'd suggest you follow the lifetime of one particular construct, and do that for each big family of nodes.

Choose
- a Type (maybe the one for `int`)
- a Statement (maybe the ReturnStatement)
- a Declaration (the FunctionDeclaration)
- an Expression (maybe the IntegerExp).

Try to follow what is happening during the different passes. That way you'll have a good idea of what the compiler does for

```d
int i(){return 0;}
```

and where you could generate MIR stuff.
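
As a mental model for that exercise, here is a tiny sketch of the four node families for `int i(){return 0;}` and two passes over them. The struct names echo the ones mentioned above, but their fields and the `semantic`/`codegen` functions are invented for this illustration; DMD's real classes and passes look different, and the output string is not real MIR.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical stand-ins for the four node families; not DMD's definitions.
enum Ty { int_, void_ }                                   // "Type"

struct IntegerExp { long value; Ty type = Ty.int_; }      // "Expression"
struct ReturnStatement { IntegerExp exp; }                // "Statement"
struct FunctionDeclaration                                // "Declaration"
{
    string ident;
    Ty returnType;
    ReturnStatement body_;
}

/// Toy "semantic" pass: the returned expression must match the return type.
bool semantic(const FunctionDeclaration fd)
{
    return fd.body_.exp.type == fd.returnType;
}

/// Toy "codegen" pass: the place where a backend would be handed the function.
string codegen(const FunctionDeclaration fd)
{
    return fd.ident ~ ": ret " ~ fd.body_.exp.value.to!string;
}

void main()
{
    // Toy model of: int i(){return 0;}
    auto fd = FunctionDeclaration("i", Ty.int_, ReturnStatement(IntegerExp(0)));
    if (semantic(fd))
        writeln(codegen(fd));
}
```

Following one real construct through DMD means finding where each of these four kinds of node is created by the parser, where semantic analysis touches it, and where the glue code hands it to the backend.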
Aug 06 2022
parent rempas <rempas tutanota.com> writes:
On Saturday, 6 August 2022 at 08:31:16 UTC, user1234 wrote:
 You'll have to read DMD code to get familiar with its code base 
 (another way in the past was fixing bugs, unfortunately there 
 are not many easy ones anymore). Fortunately you won't have 
 to understand the whole thing. At first I'd suggest you 
 to follow the lifetime of one particular construct and that for 
 each big family of node.

 Choose
 - a Type (maybe the one for `int`)
 - a Statement (maybe the ReturnStatement)
 - a Declaration (the FunctionDeclaration)
 - an Expression (maybe the IntegerExp).

 Try to follow what is happening during the different passes.
 That way you'll have a good idea of what the compiler does for

 ```d
 int i(){return 0;}
 ```

 and where you could generate MIR stuff.
Thanks my friend! I'll try to read and understand the code, and if I end up being able to create anything, I'll share it here! Have a great day!
Aug 06 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/5/2022 1:37 PM, rempas wrote:
 I was wondering how easy it would be to create a new backend in D? From what I 
 know all three of DMD, LDC and GDC are using the same frontend which is the 
 one from DMD. So I would suppose that there is a way to do that and that 
 probably DMD has evolved over the years to make its frontend more portable. 
 However, how straightforward is it? Is it legit to do it or will it require 
 "hacking" (if you understand what I mean). So it's not about working hard but if 
 the work makes sense or if I'll constantly find obstacles that no one would be 
 able to help me with.
The dmd backend is already in D :-) But since it's all Boost Licensed, anyone can use 0..100% of it for their own backend project. No asking is required. Have fun!
Aug 06 2022
next sibling parent reply welkam <wwwelkam gmail.com> writes:
On Saturday, 6 August 2022 at 07:39:36 UTC, Walter Bright wrote:
 The dmd backend is already in D :-)

 But since it's all Boost Licensed, anyone can use 0..100% of it 
 for their own backend project. No asking is required.

 Have fun!
But where is the inliner located? I read that several other languages (Jai, Zig) are trying to make their own x86_64 backends because GCC and LLVM are too slow. If the DMD backend had its IR well documented and the inliner implemented outside the frontend, I could see a future where it could be used by other languages.
Aug 06 2022
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Saturday, 6 August 2022 at 18:02:34 UTC, welkam wrote:
 But where is inliner located? I read that several other 
 languages (Jai, Zig) are trying to make their own x86_64 
 backends because GCC and LLVM are too slow. If DMD backend had 
 its IR well documented and inliner implemented not in the 
 frontend I could see a future where it could be used by other 
 languages.
DMD's inliner is being moved to the backend: https://github.com/dlang/dmd/pull/14194
Aug 06 2022
parent welkam <wwwelkam gmail.com> writes:
On Saturday, 6 August 2022 at 18:35:18 UTC, Paul Backus wrote:
 On Saturday, 6 August 2022 at 18:02:34 UTC, welkam wrote:
 But where is inliner located? I read that several other 
 languages (Jai, Zig) are trying to make their own x86_64 
 backends because GCC and LLVM are too slow. If DMD backend had 
 its IR well documented and inliner implemented not in the 
 frontend I could see a future where it could be used by other 
 languages.
DMD's inliner is being moved to the backend: https://github.com/dlang/dmd/pull/14194
Awesome.
Aug 06 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/6/2022 11:02 AM, welkam wrote:
 On Saturday, 6 August 2022 at 07:39:36 UTC, Walter Bright wrote:
 The dmd backend is already in D :-)

 But since it's all Boost Licensed, anyone can use 0..100% of it for their own 
 backend project. No asking is required.

 Have fun!
But where is inliner located?
DMD has two inliners, a front end one and a back end one. Here's the back end one: https://github.com/dlang/dmd/blob/master/compiler/src/dmd/backend/inliner.d
 I read that several other languages (Jai, Zig) are 
 trying to make their own x86_64 backends because GCC and LLVM are too slow. If 
 DMD backend had its IR well documented and inliner implemented not in the 
 frontend I could see a future where it could be used by other languages.
The IR is very simple: https://github.com/dlang/dmd/blob/master/compiler/src/dmd/backend/el.d#L70 It's a binary tree. Not clever at all. It's about 50 lines of declaration. To see it in action, use the --b --f switches: `./dmd test.d -c --b --f`, which will pretty-print the IR before and after optimization. There are some other switches, like --r, which will show the register allocator at work.
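
To illustrate what "a binary tree" means for an IR, here is a minimal sketch in D. It is not the declaration from el.d: the `Node` struct, the `Op` enum and the helpers `constant`, `variable`, `binary`, `show` and `fold` are all made up for this example, and the constant folding is just one simple stand-in for "optimization". The printed form is likewise not what `--b --f` outputs.

```d
import std.conv : to;
import std.stdio : writeln;

enum Op { const_, var, add, mul }

// Hypothetical node: an operator plus at most two children.
struct Node
{
    Op op;
    long value;    // used by Op.const_
    string name;   // used by Op.var
    Node* left;    // null for leaves
    Node* right;   // null for leaves
}

Node* constant(long v)   { return new Node(Op.const_, v); }
Node* variable(string n) { return new Node(Op.var, 0, n); }
Node* binary(Op op, Node* l, Node* r) { return new Node(op, 0, null, l, r); }

/// Pretty-prints the tree in fully parenthesised infix form.
string show(const Node* n)
{
    final switch (n.op)
    {
        case Op.const_: return n.value.to!string;
        case Op.var:    return n.name;
        case Op.add:    return "(" ~ show(n.left) ~ " + " ~ show(n.right) ~ ")";
        case Op.mul:    return "(" ~ show(n.left) ~ " * " ~ show(n.right) ~ ")";
    }
}

/// Folds subtrees whose two operands are both constants.
Node* fold(Node* n)
{
    if (n.left is null) return n; // leaf: nothing to fold
    auto l = fold(n.left);
    auto r = fold(n.right);
    if (l.op == Op.const_ && r.op == Op.const_)
        return constant(n.op == Op.add ? l.value + r.value : l.value * r.value);
    return binary(n.op, l, r);
}

void main()
{
    auto tree = binary(Op.add, variable("x"),
                       binary(Op.mul, constant(2), constant(3)));
    writeln("before: ", show(tree));
    writeln("after:  ", show(fold(tree)));
}
```

For this input the sketch prints `(x + (2 * 3))` before folding and `(x + 6)` after it, which is the same before/after idea as the pretty-printed IR dumps, on a much smaller scale.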
Aug 07 2022
prev sibling parent rempas <rempas tutanota.com> writes:
On Saturday, 6 August 2022 at 07:39:36 UTC, Walter Bright wrote:
 The dmd backend is already in D :-)

 But since it's all Boost Licensed, anyone can use 0..100% of it 
 for their own backend project. No asking is required.

 Have fun!
Thank you! It just happened that I missed this reply and wouldn't even see it if it wasn't for another reply that quotes it...
Aug 06 2022