
digitalmars.D - Is there any language that native-compiles faster than D?

reply Per Nordlöw <per.nordlow gmail.com> writes:
After having evaluated the compilation speed of D compared to 
other languages at

     https://github.com/nordlow/compiler-benchmark

I wonder; is there any language that compiles to native code 
anywhere nearly as fast or faster than D, except C?

If so it most likely needs to use a backend other than LLVM.

I believe Jai is supposed to do that but it hasn't been released 
yet.
Aug 20 2020
next sibling parent reply kinke <noone nowhere.com> writes:
On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?

 If so it most likely needs to use a backend other than LLVM.

 I believe Jai is supposed to do that but it hasn't been 
 released yet.
Pardon me, but that code seems anything but representative to me: no structs, no classes, no control flow, just a few integer additions and calls. It even makes D look worse than it is, simply because object.d is imported but totally unused.

E.g., on my Win64 box with DMD 2.093, compiling this:

-----
int add_int_n0_h0(int x) { return x + 15440; }
int add_int_n0(int x) { return x + add_int_n0_h0(x) + 95485; }
int add_int_n1_h0(int x) { return x + 37523; }
int add_int_n1(int x) { return x + add_int_n1_h0(x) + 92492; }
int add_int_n2_h0(int x) { return x + 39239; }
int add_int_n2(int x) { return x + add_int_n2_h0(x) + 12248; }

int main() {
    int int_sum = 0;
    int_sum += add_int_n0(0);
    int_sum += add_int_n1(1);
    int_sum += add_int_n2(2);
    return int_sum;
}
-----

with `dmd -o- bla.d` takes about 37 ms, while creating an empty object.d and compiling with `dmd -o- bla.d object.d` takes 24 ms. There are no default includes for C, so this would make the comparison fairer.

Additionally, generics and templates are completely different concepts and can't be compared.
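The benchmark inputs above follow a simple mechanical pattern, so they are easy to regenerate at any size. A minimal Python sketch of such a generator (function names and the random seeding here are illustrative, not taken from the actual compiler-benchmark scripts):

```python
# Sketch of a generator for the add_int_* benchmark pattern shown above.
# gen_d_source and its parameter names are hypothetical, not the real
# compiler-benchmark code.
import random


def gen_d_source(function_count: int) -> str:
    random.seed(0)  # deterministic constants, so runs are reproducible
    lines = []
    for n in range(function_count):
        a = random.randrange(100_000)
        b = random.randrange(100_000)
        lines.append(f"int add_int_n{n}_h0(int x) {{ return x + {a}; }}")
        lines.append(f"int add_int_n{n}(int x) {{ return x + add_int_n{n}_h0(x) + {b}; }}")
    lines.append("int main() {")
    lines.append("    int int_sum = 0;")
    for n in range(function_count):
        lines.append(f"    int_sum += add_int_n{n}({n});")
    lines.append("    return int_sum;")
    lines.append("}")
    return "\n".join(lines)


print(gen_d_source(3))
```

Writing the result to a file and timing `dmd -o-` on it reproduces the kind of measurement discussed in this thread.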
Aug 20 2020
next sibling parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 20 August 2020 at 21:21:39 UTC, kinke wrote:
 with `dmd -o- bla.d` takes about 37ms, while creating an empty 
 object.d and compiling with `dmd -o- bla.d object.d` takes 
 24ms. There's no default includes for C, so this would make the 
 comparison more fair.
Thanks. But for really large files that won't make that big of a difference. I'll add the generation of the object.d file as well.
 Additionally, generics and templates are completely different 
 concepts and can't be compared.
I'm aware of that. But I wanted to start somewhere and expand from there.
Aug 20 2020
next sibling parent reply kinke <noone nowhere.com> writes:
On Thursday, 20 August 2020 at 21:34:54 UTC, Per Nordlöw wrote:
 But for really large files that won't make that big of 
 difference.
Of course not, but your benchmark is as tiny as it gets. ;)
 I'll add the generation of the object.d file aswell.
For linking, a D main requires the _d_cmain template imported by object.d, so you'll have to make it extern(C) in that case.
 I'm aware of that. But I wanted to start somewhere and expand 
 from there.
I don't think languages can be compared like that. One would probably need a reasonably sized, non-contrived project and port it to each language, exploiting each language's features, but then runtime and standard libraries would play a significant role as well (object.d kinda already does). Another strong suit of D, its module system and compiling multiple modules at once, isn't reflected either.
Aug 20 2020
next sibling parent kinke <noone nowhere.com> writes:
On Thursday, 20 August 2020 at 21:49:16 UTC, kinke wrote:
 Of course not, but your benchmark is as tiny as it gets. ;)
Ah sorry, I just now saw that the table was generated with `--function-count=200 --function-depth=450`.
Aug 20 2020
prev sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 20 August 2020 at 21:49:16 UTC, kinke wrote:
 For linking, a D main requires the _d_cmain template imported 
 by object.d, so you'll have to make it extern(C) in that case.
Moreover, I just realized I should probably add a test case with `-betterC` as well. Or maybe make it the default until `-betterC` is no longer sufficient.
Aug 20 2020
prev sibling parent kinke <noone nowhere.com> writes:
I've just seen that you use ldmd2, not ldc2 directly. This makes 
quite a difference for such tiny code, in my case (with -c, not 
-o-) something like 60ms vs. 42ms.
Aug 20 2020
prev sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 20 August 2020 at 21:21:39 UTC, kinke wrote:
 Pardon me, but that code seems everything but remotely 
 representative to me - no structs, no classes, no control flow, 
 just a few integer additions and calls.
I warmly welcome suggestions on how to improve the relevance of these tests. ;)
Aug 21 2020
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 8/20/20 4:50 PM, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to other 
 languages at
 
      https://github.com/nordlow/compiler-benchmark
 
 I wonder; is there any language that compiles to native code anywhere 
 nearly as fast or faster than D, except C?
 
 If so it most likely needs to use a backend other than LLVM.
 
 I believe Jai is supposed to do that but it hasn't been released yet.
I tried Python a while ago, the build-run cycle for a simple program was about the same. For Perl it was faster.
Aug 20 2020
next sibling parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 20 August 2020 at 22:20:19 UTC, Andrei Alexandrescu 
wrote:
 I tried Python a while ago, the build-run cycle for a simple 
 program was about the same. For Perl it was faster.
The same as what? D? That entirely depends on the complexity of the Python program. What kind of app/program were you testing?
Aug 20 2020
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 8/20/20 6:35 PM, Per Nordlöw wrote:
 On Thursday, 20 August 2020 at 22:20:19 UTC, Andrei Alexandrescu wrote:
 I tried Python a while ago, the build-run cycle for a simple program 
 was about the same. For Perl it was faster.
 The same as what? D? That entirely depends on the complexity of 
 the Python program. What kind of app/program were you testing?
It was a simple wc program with rdmd versus python.
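For reference, a word-count comparison like that is easy to reproduce. Below is a minimal Python wc sketch (an assumption about the shape of the test, not the actual program used), which `rdmd` would be racing against on the D side:

```python
# Minimal word-count (wc) sketch: the Python side of a build-run
# comparison where rdmd compiles and runs the D equivalent.
# Hypothetical illustration, not the actual test program.
import sys


def wc(text: str) -> tuple[int, int, int]:
    """Return (lines, words, bytes), like the Unix wc utility."""
    return (text.count("\n"), len(text.split()), len(text.encode()))


sample = "the quick brown fox\njumps over the lazy dog\n"
print(wc(sample))  # (2, 9, 44)
```

Since Python skips a separate compile-and-link step, the interesting number is how close `rdmd`'s compile-then-run total gets to this.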
Aug 20 2020
prev sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 20 August 2020 at 22:20:19 UTC, Andrei Alexandrescu 
wrote:
 On 8/20/20 4:50 PM, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at
 
      https://github.com/nordlow/compiler-benchmark
 
 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?
 
 If so it most likely needs to use a backend other than LLVM.
 
 I believe Jai is supposed to do that but it hasn't been 
 released yet.
I tried Python a while ago, the build-run cycle for a simple program was about the same. For Perl it was faster.
Neither perl nor python compile their code by default.
Aug 20 2020
parent reply Atila Neves <atila.neves gmail.com> writes:
On Thursday, 20 August 2020 at 23:16:40 UTC, Stefan Koch wrote:
 On Thursday, 20 August 2020 at 22:20:19 UTC, Andrei 
 Alexandrescu wrote:
 On 8/20/20 4:50 PM, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at
 
      https://github.com/nordlow/compiler-benchmark
 
 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?
 
 If so it most likely needs to use a backend other than LLVM.
 
 I believe Jai is supposed to do that but it hasn't been 
 released yet.
I tried Python a while ago, the build-run cycle for a simple program was about the same. For Perl it was faster.
Neither perl nor python compile their code by default.
Yes, they do. Just not to x86.
Aug 26 2020
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 26 August 2020 at 09:19:47 UTC, Atila Neves wrote:
 On Thursday, 20 August 2020 at 23:16:40 UTC, Stefan Koch wrote:
 On Thursday, 20 August 2020 at 22:20:19 UTC, Andrei 
 Alexandrescu wrote:
 On 8/20/20 4:50 PM, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared 
 to other languages at
 
      https://github.com/nordlow/compiler-benchmark
 
 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?
 
 If so it most likely needs to use a backend other than LLVM.
 
 I believe Jai is supposed to do that but it hasn't been 
 released yet.
I tried Python a while ago, the build-run cycle for a simple program was about the same. For Perl it was faster.
Neither perl nor python compile their code by default.
Yes, they do. Just not to x86.
If they compile to bytecode, that's still not native code, which is what was asked, if I understood the question correctly.
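Python in particular does compile, just to bytecode rather than native code, and the standard library exposes that step directly:

```python
# Python source is compiled to a bytecode code object before the
# interpreter executes it; `compile` exposes that step explicitly,
# and `dis` shows the resulting bytecode in readable form.
import dis
import types

code = compile("x = 1 + 2", "<example>", "exec")

# The result is a code object holding bytecode, not machine code.
assert isinstance(code, types.CodeType)
print(type(code.co_code).__name__)  # bytes: the raw bytecode buffer
dis.dis(code)                       # human-readable bytecode listing
```

So "compiles fast" is true for CPython, but the output is interpreted bytecode, not the native code the original question was about.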
Aug 26 2020
prev sibling next sibling parent reply Guillaume Piolat <first.name gmail.com> writes:
On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?
Object Pascal / Delphi has been very fast in the past, perhaps the existing Pascal compilers still are.
Aug 20 2020
parent reply oddp <oddp posteo.de> writes:
On 2020-08-21 00:31, Guillaume Piolat via Digitalmars-d wrote:
 On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to other languages
at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code anywhere nearly
as fast or faster 
 than D, except C?
Object Pascal / Delphi has been very fast in the past, perhaps the existing Pascal compilers still are.
Same goes for good old Ada. It has blazingly fast compilation times in conjunction with gcc-gnat, but I can't tell whether there has been any significant progress on the llvm-gnat front in recent months.
Aug 20 2020
parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 20 August 2020 at 23:57:09 UTC, oddp wrote:
 Same goes for good old ada. It has blazing fast compilation 
 times in conjunction with gcc-gnat,
Nope. I just added support for it in compiler-benchmark. It is crazy slow:

./benchmark --languages=D,Ada --function-count=20 --function-depth=450 --run-count=1

gives

| Lang-uage | Oper-ation | Temp-lated | Time [s/fn] | Slowdown vs [Best] | Version | Exec |
| :---: | :---: | --- | :---: | :---: | :---: | :---: |
| D | Build | No | 0.150 | 1.0 [D] | v2.093.1-541-ge54c041a4 | `dmd` |
| Ada | Build | No | 9.031 | 60.2 [D] | 10.2.0 | `gnat-10` |
Aug 22 2020
parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Sunday, 23 August 2020 at 00:06:02 UTC, Per Nordlöw wrote:
 On Thursday, 20 August 2020 at 23:57:09 UTC, oddp wrote:
 Same goes for good old ada. It has blazing fast compilation 
 times in conjunction with gcc-gnat,
 Nope. I just added support for it in compiler-benchmark. It is 
 crazy slow:

 ./benchmark --languages=D,Ada --function-count=20 --function-depth=450 --run-count=1

 gives

 | Lang-uage | Oper-ation | Temp-lated | Time [s/fn] | Slowdown vs [Best] | Version | Exec |
 | :---: | :---: | --- | :---: | :---: | :---: | :---: |
 | D | Build | No | 0.150 | 1.0 [D] | v2.093.1-541-ge54c041a4 | `dmd` |
 | Ada | Build | No | 9.031 | 60.2 [D] | 10.2.0 | `gnat-10` |
Correction, should be

| Lang-uage | Oper-ation | Temp-lated | Time [s/fn] | Slowdown vs [Best] | Version | Exec |
| :---: | :---: | --- | :---: | :---: | :---: | :---: |
| D | Build | No | 0.163 | 1.0 [D] | v2.093.1-541-ge54c041a4 | `dmd` |
| Ada | Build | No | 1.596 | 9.8 [D] | 10.2.0 | `gnat-10` |

Builds are 10x slower than D.
Aug 22 2020
parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Sunday, 23 August 2020 at 00:18:44 UTC, Per Nordlöw wrote:
 Builds are 10x slower than D.
And the slowdown of Ada compared to D gets larger with function size.
Aug 22 2020
parent reply oddp <oddp posteo.de> writes:
On 2020-08-23 02:20, Per Nordlöw via Digitalmars-d wrote:
 On Sunday, 23 August 2020 at 00:18:44 UTC, Per Nordlöw wrote:
 Builds are 10x slower than D.
And the slowdown of Ada compared to D gets larger with function size.
It might get even slower if you apply:

-        f.write(Tm('''   GNAT.OS_Lib.OS_Exit(Integer(${T}));
+        f.write(Tm('''   GNAT.OS_Lib.OS_Exit(Integer(${T}_sum));

Currently, we're seeing:

main.adb:9049:32: invalid use of subtype mark in expression or call
gnatmake: "generated/ada/main.adb" compilation error
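For context, the patch above changes which name the generated Ada passes to `Integer(...)`. Assuming `Tm` is an alias for Python's `string.Template` (an assumption; the actual alias in compiler-benchmark may differ), the substitution behaves like this:

```python
# Sketch of the template substitution in the patch above, assuming
# Tm aliases Python's string.Template (hedged: the real alias in
# compiler-benchmark may differ).
from string import Template as Tm

# Fixed line: ${T} expands to the type name and "_sum" stays literal,
# so the generated Ada exits with the accumulated sum variable.
fixed = Tm('GNAT.OS_Lib.OS_Exit(Integer(${T}_sum));').substitute(T="int")
print(fixed)  # GNAT.OS_Lib.OS_Exit(Integer(int_sum));

# Buggy line: passed the bare type name to Integer(), which is the
# "invalid use of subtype mark" GNAT complains about.
buggy = Tm('GNAT.OS_Lib.OS_Exit(Integer(${T}));').substitute(T="int")
print(buggy)  # GNAT.OS_Lib.OS_Exit(Integer(int));
```

The braced `${T}` form matters here: it lets the substitution stop before the literal `_sum` suffix.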
Aug 23 2020
next sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Sunday, 23 August 2020 at 07:36:19 UTC, oddp wrote:
 It might get even slower if you apply:

 -        f.write(Tm('''   GNAT.OS_Lib.OS_Exit(Integer(${T}));
 +        f.write(Tm('''   
 GNAT.OS_Lib.OS_Exit(Integer(${T}_sum));

 Currently, we're seeing:

 main.adb:9049:32: invalid use of subtype mark in expression or 
 call
 gnatmake: "generated/ada/main.adb" compilation error
Fixed. Thanks!
Aug 23 2020
prev sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Sunday, 23 August 2020 at 07:36:19 UTC, oddp wrote:
 It might get even slower if you apply:

 -        f.write(Tm('''   GNAT.OS_Lib.OS_Exit(Integer(${T}));
 +        f.write(Tm('''   
 GNAT.OS_Lib.OS_Exit(Integer(${T}_sum));

 Currently, we're seeing:

 main.adb:9049:32: invalid use of subtype mark in expression or 
 call
 gnatmake: "generated/ada/main.adb" compilation error
Indeed, it does! :)

./benchmark --languages=D,Ada --function-count=20 --function-depth=450 --run-count=1

gives

| Lang-uage | Oper-ation | Temp-lated | Time [s/fn] | Slowdown vs [Best] | Version | Exec |
| :---: | :---: | --- | :---: | :---: | :---: | :---: |
| D | Build | No | 0.170 | 1.0 [D] | v2.093.1-541-ge54c041a4 | `dmd` |
| Ada | Build | No | 9.065 | 53.3 [D] | 10.2.0 | `gnat-10` |

!
Aug 23 2020
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?
Yes. D1. :)
Aug 20 2020
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2020-08-20 22:50, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to other 
 languages at
 
      https://github.com/nordlow/compiler-benchmark
 
 I wonder; is there any language that compiles to native code anywhere 
 nearly as fast or faster than D, except C?
I'm surprised that you only have gccgo and not the reference/official (or whatever it's called) implementation. That one uses a fully custom tool chain, i.e. custom compiler, custom assembler, custom object format and custom linker. I'm sure it's faster than gccgo, and it might be faster than D as well.

There's some work on a new Rust backend [1] as well. Not sure if that's usable yet.

What about Nim and Vala, don't they count since they're generating C code?

[1] https://jason-williams.co.uk/a-possible-new-backend-for-rust (first hit on Google)

--
/Jacob Carlborg
Aug 20 2020
next sibling parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Friday, 21 August 2020 at 06:20:18 UTC, Jacob Carlborg wrote:
 I'm surprised that you only have gccgo and not the 
 reference/official (or whatever it's called) implementation. 
 That one uses a fully custom tool chain, i.e. custom compiler, 
 custom assembler, custom object format and custom linker. I'm 
 sure it's faster than gccgo and it might be faster than D as 
 well.
Well it's only a matter of having the time to add it. What's the easiest way to install the reference/official toolchain on Ubuntu?
Aug 21 2020
parent reply Jacob Carlborg <doob me.com> writes:
On Friday, 21 August 2020 at 11:36:51 UTC, Per Nordlöw wrote:

 Well it's only a matter of having the time to add it.
Fair enough. I would have guessed that the reference compiler is the most commonly used, and it's famous for being fast, most likely faster than DMD. Therefore I would expect anyone comparing compilation speed to pick the reference compiler first, not gccgo.
 What's the easiest way to install the reference/official 
 toolchain on Ubuntu?
For the latest version, download the binaries from here: https://golang.org/dl/. Otherwise, install the "golang" package using the package manager.

--
/Jacob Carlborg
Aug 21 2020
next sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Friday, 21 August 2020 at 14:37:35 UTC, Jacob Carlborg wrote:
 For the latest version, download the binaries from here: 
 https://golang.org/dl/. Otherwise install the "golang" package 
 using the package manager.

 --
 /Jacob Carlborg
According to https://github.com/golang/go/wiki/Ubuntu

    sudo add-apt-repository ppa:longsleep/golang-backports
    sudo apt update
    sudo apt install golang-go

will get you Go 1.15 on Ubuntu 20.04.
Aug 21 2020
prev sibling parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Friday, 21 August 2020 at 14:37:35 UTC, Jacob Carlborg wrote:
 For the latest version, download the binaries from here: 
 https://golang.org/dl/. Otherwise install the "golang" package 
 using the package manager.

 --
 /Jacob Carlborg
I updated compiler-benchmark with support for the reference Go compiler. I'll update the numbers now; it will take a while. In the meantime you can evaluate it yourself using, for instance,

    ./benchmark --languages=D,Go --function-count=200 --function-depth=450 --run-count=1

DMD is still far ahead of Go as well; about 2.5x faster on check and 10x faster on build for my contrived example.

I also added a script

    ./install-compilers.sh

you can use to install most of the compilers I use in my tests on Ubuntu 20.04.
Aug 21 2020
next sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Friday, 21 August 2020 at 23:08:05 UTC, Per Nordlöw wrote:
 I also added a script
     ./install-compilers.sh
Renamed it to the more verbose ./install-compilers-on-ubuntu-20.04.sh
Aug 21 2020
prev sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Friday, 21 August 2020 at 23:08:05 UTC, Per Nordlöw wrote:
 ./benchmark --languages=D,Go --function-count=200 
 --function-depth=450 --run-count=1

 DMD is still far ahead of Go aswell; about 2.5x faster on check 
 and 10x faster on build for my contrived example.
Here's the preliminary Markdown-formatted table with the reference Go compiler version 1.15 added:

| Lang-uage | Oper-ation | Temp-lated | Time [s/fn] | Slowdown vs [Best] | Version | Exec |
| :---: | :---: | --- | :---: | :---: | :---: | :---: |
| D | Check | No | 0.634 | 1.0 [D] | v2.093.1-538-ge9c22d712 | `dmd` |
| D | Check | No | 0.691 | 1.1 [D] | 1.23.0 | `ldmd2` |
| D | Check | Yes | 1.600 | 2.5 [D] | v2.093.1-538-ge9c22d712 | `dmd` |
| D | Check | Yes | 1.647 | 2.6 [D] | 1.23.0 | `ldmd2` |
| D | Build | No | 1.518 | 1.0 [D] | v2.093.1-538-ge9c22d712 | `dmd` |
| D | Build | No | 17.536 | 11.6 [D] | 1.23.0 | `ldmd2` |
| D | Build | Yes | 2.696 | 1.8 [D] | v2.093.1-538-ge9c22d712 | `dmd` |
| D | Build | Yes | 18.178 | 12.0 [D] | 1.23.0 | `ldmd2` |
| Go | Check | No | 1.554 | 2.5 [D] | 1.15 | `gotype` |
| Go | Check | No | 2.232 | 3.5 [D] | 9.3.0 | `gccgo-9` |
| Go | Check | No | 2.244 | 3.5 [D] | 10.2.0 | `gccgo-10` |
| Go | Build | No | 13.717 | 9.0 [D] | 1.15 | `go` |
| Go | Build | No | 53.743 | 35.4 [D] | 9.3.0 | `gccgo-9` |
| Go | Build | No | 57.711 | 38.0 [D] | 10.2.0 | `gccgo-10` |
Aug 21 2020
prev sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Friday, 21 August 2020 at 06:20:18 UTC, Jacob Carlborg wrote:
 What about Nim and Vala, don't they count since they're 
 generating C code?
I haven't included backends that generate C because I don't think they are relevant to this metric as dmd is faster than both GCC and Clang at compiling C-style D code.
Aug 21 2020
prev sibling next sibling parent reply Mike James <foo bar.com> writes:
On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?

 If so it most likely needs to use a backend other than LLVM.

 I believe Jai is supposed to do that but it hasn't been 
 released yet.
Turbo Pascal ;-)

It compiled so fast at a demonstration to customers that they thought it was broken...

-=mike=-
Aug 21 2020
next sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Friday, 21 August 2020 at 11:42:51 UTC, Mike James wrote:
 Turbo Pascal ;-)

 It compiled so fast at a demonstration to customers they 
 thought it was broke...
Ahh, yes, I remember! It was my first language (in high school). :)

Techniques outlined here: https://prog21.dadgum.com/47.html

See also: https://www.reddit.com/r/programming/comments/1fpu6u/how_could_turbo_pascal_be_so_fast/
Aug 21 2020
prev sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 21 August 2020 at 11:42:51 UTC, Mike James wrote:
 It compiled so fast at a demonstration to customers that they 
 thought it was broken...
That was me using D1 again after a while; it is virtually instant even for medium-sized programs.

For my little webassembly demo I made a couple of weeks ago, when you do the "try it yourself" thing, it actually runs ldc on the input and sends the output to the browser right there. No caching or anything fancy... but since it compiles in 40 ms you probably barely notice.
Aug 21 2020
parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Friday, 21 August 2020 at 12:14:35 UTC, Adam D. Ruppe wrote:
 For my little webassembly demo I made a couple weeks ago, when 
 you do the "try it yourself" thing, it actually runs ldc on the 
 input and sends the output to the browser right there. No 
 caching or anything fancy... but since it compiles in 40 ms you 
 probably barely notice.
How big is that program? Are you saying ldc's webassembly backend is much faster than its native (x86) backend?
Aug 21 2020
parent Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 21 August 2020 at 12:51:29 UTC, Per Nordlöw wrote:
 How big is that program?
Small, < 1000 lines, it is a little tetris game.
 Are you saying ldc's webassembly backend is much faster than 
 its native (x86) backend?
No, the difference is probably because my custom druntime and stdlib are more minimal. D as a language remains fast, but the stdlib has gotten slower to compile as time goes on.
Aug 21 2020
prev sibling next sibling parent reply James Lu <jamtlu gmail.com> writes:
On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?

 If so it most likely needs to use a backend other than LLVM.

 I believe Jai is supposed to do that but it hasn't been 
 released yet.
V8 JavaScript compiles faster:

$ d8 --always-opt --trace-opt --single-threaded --no-compilation-cache mandelbrot.js | ts -s "%.s"
0.000025 [compiling method 0x3fc50821068d <JSFunction (sfi = 0x3fc508210165)> using TurboFan]
0.001651 [optimizing 0x3fc50821068d <JSFunction (sfi = 0x3fc508210165)> - took 4.455, 41.945, 0.052 ms]
0.001707 [optimizing 0x3fc508085b65 <JSFunction Complex (sfi = 0x3fc508210261)> because --always-opt]
0.001736 [compiling method 0x3fc508085b65 <JSFunction Complex (sfi = 0x3fc508210261)> using TurboFan]
0.001763 [optimizing 0x3fc508085b65 <JSFunction Complex (sfi = 0x3fc508210261)> - took 0.125, 0.284, 0.019 ms]
0.001789 [optimizing 0x3fc508211665 <JSFunction iterate_mandelbrot (sfi = 0x3fc508210229)> because --always-opt]
0.001817 [compiling method 0x3fc508211665 <JSFunction iterate_mandelbrot (sfi = 0x3fc508210229)> using TurboFan]
0.001842 [optimizing 0x3fc508211665 <JSFunction iterate_mandelbrot (sfi = 0x3fc508210229)> - took 0.167, 1.197, 0.022 ms]
0.001868 [optimizing 0x3fc508085b85 <JSFunction abs (sfi = 0x3fc508210299)> because --always-opt]
0.001892 [compiling method 0x3fc508085b85 <JSFunction abs (sfi = 0x3fc508210299)> using TurboFan]
0.001916 [optimizing 0x3fc508085b85 <JSFunction abs (sfi = 0x3fc508210299)> - took 0.125, 0.421, 0.025 ms]
0.002093 [optimizing 0x3fc508085bbd <JSFunction mul (sfi = 0x3fc508210309)> because --always-opt]
0.002337 [compiling method 0x3fc508085bbd <JSFunction mul (sfi = 0x3fc508210309)> using TurboFan]
0.002365 [optimizing 0x3fc508085bbd <JSFunction mul (sfi = 0x3fc508210309)> - took 0.134, 0.550, 0.023 ms]
0.002389 [optimizing 0x3fc508085ba1 <JSFunction add (sfi = 0x3fc5082102d1)> because --always-opt]
0.002498 [compiling method 0x3fc508085ba1 <JSFunction add (sfi = 0x3fc5082102d1)> using TurboFan]

--single-threaded disables compilation background tasks
--always-opt makes V8 immediately compile the function without profiling

Timestamps thanks to "ts" from moreutils.
$ time dmd -c mandelbrot.d

real    0m0.507s
user    0m0.416s
sys     0m0.094s

V8 compiles 202x faster. -c omits linking, which can be slow. That's using the struct/double version. (I also ran it with timestamps on the verbose version; writing to disk accounted for a negligible amount of time.)

Obviously, take these results with a grain of salt, since --always-opt makes the compiled program very slow. But still, even with it off, V8 can compile very fast.

---

LDC2 takes 1.695 seconds to compile with no flags, and 2.126 seconds with -O2. QuickJS takes around 0.36 seconds to compile, but the resulting program is unbearably slow.

However, I suspect that by the time the fastest and most balanced D compiler has finished compiling and executing the program, V8 will already have finished executing it.
Aug 25 2020
next sibling parent reply James Lu <jamtlu gmail.com> writes:
On Wednesday, 26 August 2020 at 01:13:47 UTC, James Lu wrote:
 However, I suspect by the time the fastest and most balanced D 
 compiler finishes compiling the program and executing it, V8 
 will already have finished executing it.
DMD -O doesn't make a significant difference over DMD, clocking in at 12 seconds total.

LDC2 -O/-O2 has the best compilation-execution total, clocking in at 5.4 seconds. Subtracting the link time gives 5.0 seconds. Its code generation phase (measured as the time between -v's "code" step and the link command) takes 2.6 seconds.

In contrast, compilation+execution for V8 JavaScript is 2.0 seconds. V8's --trace-opt says it compiles each function once, and compilation (SSA creation, optimization, and code generation) takes a grand total of 0.027 seconds.

LDC's code generation is over 100 times slower. Surely there are opportunities for profiling and optimization here. (dmd's code generation is too bad to count.)
Aug 25 2020
next sibling parent James Lu <jamtlu gmail.com> writes:
On Wednesday, 26 August 2020 at 01:31:01 UTC, James Lu wrote:
 LDC's code generation is over 100 times slower. Surely there's
Sorry, approximately 100 times slower, not over 100 times slower.
Aug 25 2020
prev sibling parent James Lu <jamtlu gmail.com> writes:
On Wednesday, 26 August 2020 at 01:31:01 UTC, James Lu wrote:
 On Wednesday, 26 August 2020 at 01:13:47 UTC, James Lu wrote:
 However, I suspect by the time the fastest and most balanced D 
 compiler finishes compiling the program and executing it, V8 
 will already have finished executing it.
 DMD -O doesn't make a significant difference over DMD, clocking 
 in at 12 seconds total.

 LDC2 -O/-O2 has the best compilation-execution total, clocking 
 in at 5.4 seconds. Subtracting the link time gives 5.0 seconds. 
 Its code generation phase (measured as the time between -v's 
 "code" step and the link command) takes 2.6 seconds.

 In contrast, compilation+execution for V8 JavaScript is 2.0 
 seconds. V8's --trace-opt says it compiles each function once, 
 and compilation (SSA creation, optimization, and code 
 generation) takes a grand total of 0.027 seconds.

 LDC's code generation is over 100 times slower. Surely there 
 are opportunities for profiling and optimization here. (dmd's 
 code generation is too bad to count.)
I wonder if anyone in the D community has the expertise to modify or rewrite DMD's backend so that its generated code is at most 1.5-2x slower at normal, non-SIMD tasks, something like a poor man's LuaJIT or V8, while retaining the compilation speed.
Aug 25 2020
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/25/20 9:13 PM, James Lu wrote:
 V8 JavaScript compiles faster:
 
 $ d8 --always-opt --trace-opt --single-threaded --no-compilation-cache 
 mandelbrot.js | ts -s "%.s"
Interesting. What is the result of that compilation? A traditional binary file, or a webassembly?
Aug 26 2020
next sibling parent James Lu <jamtlu gmail.com> writes:
On Wednesday, 26 August 2020 at 13:16:08 UTC, Andrei Alexandrescu 
wrote:
 On 8/25/20 9:13 PM, James Lu wrote:
 V8 JavaScript compiles faster:
 
 $ d8 --always-opt --trace-opt --single-threaded 
 --no-compilation-cache mandelbrot.js | ts -s "%.s"
Interesting. What is the result of that compilation? A traditional binary file, or a webassembly?
It results in machine code (in my case, x86) in memory.
Aug 26 2020
prev sibling parent James Lu <jamtlu gmail.com> writes:
On Wednesday, 26 August 2020 at 13:16:08 UTC, Andrei Alexandrescu 
wrote:
 On 8/25/20 9:13 PM, James Lu wrote:
 V8 JavaScript compiles faster:
 
 $ d8 --always-opt --trace-opt --single-threaded 
 --no-compilation-cache mandelbrot.js | ts -s "%.s"
Interesting. What is the result of that compilation? A traditional binary file, or a webassembly?
In an earlier post in this thread, I added up V8's internal timings to suggest that its codegen phase (SSA building, SSA optimization, conversion to machine code) runs 100 times faster than LLVM on similar code in D.
Aug 26 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/25/20 9:13 PM, James Lu wrote:
 On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to other 
 languages at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code anywhere 
 nearly as fast or faster than D, except C?

 If so it most likely needs to use a backend other than LLVM.

 I believe Jai is supposed to do that but it hasn't been released yet.
 V8 JavaScript compiles faster:

 $ d8 --always-opt --trace-opt --single-threaded --no-compilation-cache mandelbrot.js | ts -s "%.s"
 0.000025 [compiling method 0x3fc50821068d <JSFunction (sfi = 0x3fc508210165)> using TurboFan]
 0.001651 [optimizing 0x3fc50821068d <JSFunction (sfi = 0x3fc508210165)> - took 4.455, 41.945, 0.052 ms]
 0.001707 [optimizing 0x3fc508085b65 <JSFunction Complex (sfi = 0x3fc508210261)> because --always-opt]
 0.001736 [compiling method 0x3fc508085b65 <JSFunction Complex (sfi = 0x3fc508210261)> using TurboFan]
 0.001763 [optimizing 0x3fc508085b65 <JSFunction Complex (sfi = 0x3fc508210261)> - took 0.125, 0.284, 0.019 ms]
 0.001789 [optimizing 0x3fc508211665 <JSFunction iterate_mandelbrot (sfi = 0x3fc508210229)> because --always-opt]
 0.001817 [compiling method 0x3fc508211665 <JSFunction iterate_mandelbrot (sfi = 0x3fc508210229)> using TurboFan]
 0.001842 [optimizing 0x3fc508211665 <JSFunction iterate_mandelbrot (sfi = 0x3fc508210229)> - took 0.167, 1.197, 0.022 ms]
 0.001868 [optimizing 0x3fc508085b85 <JSFunction abs (sfi = 0x3fc508210299)> because --always-opt]
 0.001892 [compiling method 0x3fc508085b85 <JSFunction abs (sfi = 0x3fc508210299)> using TurboFan]
 0.001916 [optimizing 0x3fc508085b85 <JSFunction abs (sfi = 0x3fc508210299)> - took 0.125, 0.421, 0.025 ms]
 0.002093 [optimizing 0x3fc508085bbd <JSFunction mul (sfi = 0x3fc508210309)> because --always-opt]
 0.002337 [compiling method 0x3fc508085bbd <JSFunction mul (sfi = 0x3fc508210309)> using TurboFan]
 0.002365 [optimizing 0x3fc508085bbd <JSFunction mul (sfi = 0x3fc508210309)> - took 0.134, 0.550, 0.023 ms]
 0.002389 [optimizing 0x3fc508085ba1 <JSFunction add (sfi = 0x3fc5082102d1)> because --always-opt]
 0.002498 [compiling method 0x3fc508085ba1 <JSFunction add (sfi = 0x3fc5082102d1)> using TurboFan]
I just want to point out that ts is not an accurate timestamping system. The shell is starting ts and d8 simultaneously, and it's very possible that d8 has done a lot of stuff by the time ts gets around to deciding what timestamp 0 is.

In fact, d8 could be completely finished, and all its output buffered in the pipe, before ts does anything. Especially when these times are so short.

Not saying the data is wrong, but I am not certain this is proof. Use the shell's time builtin.

-Steve
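A minimal Python sketch of the suggested approach: wrap the whole child process in the measured window, so startup time and all buffered output are counted, avoiding the `ts` pipe race entirely. (Illustrative only; the commented-out dmd invocation is just an example, not a measurement from this thread.)

```python
import subprocess
import time

def time_command(argv, runs=5):
    """Time a command by timing the subprocess call itself, so the
    child's startup and all of its output fall inside the window."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best  # best-of-N damps scheduler and cache noise

# Example (placeholder command):
# print(time_command(["dmd", "-o-", "bla.d"]))
```

Best-of-N is used instead of the mean because one-off scheduler hiccups only ever make a run slower, never faster.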
Aug 26 2020
parent reply James Lu <jamtlu gmail.com> writes:
On Wednesday, 26 August 2020 at 14:45:48 UTC, Steven 
Schveighoffer wrote:
 On 8/25/20 9:13 PM, James Lu wrote:
 V8 JavaScript compiles faster:
 
 $ d8 --always-opt --trace-opt --single-threaded
 I just want to point out that ts is not an accurate timestamping 
 system. The shell is starting ts and d8 simultaneously, and it's 
 very possible that d8 has done a lot of stuff by the time ts gets 
 around to deciding what timestamp 0 is.
 
 In fact, d8 could be completely finished, and all its output 
 buffered in the pipe, before ts does anything. Especially when 
 these times are so short.
 
 Not saying the data is wrong, but I am not certain this is proof. 
 Use shell time builtin.
 
 -Steve
$ time d8 --always-opt --trace-opt --single-threaded --no-compilation-cache mandelbrot.js
[compiling method 0x182b08210471 <JSFunction (sfi = 0x182b08210129)> using TurboFan]
[optimizing 0x182b08210471 <JSFunction (sfi = 0x182b08210129)> - took 0.888, 1.326, 0.026 ms]
[optimizing 0x182b0821073d <JSFunction main (sfi = 0x182b082101e5)> because --always-opt]
[compiling method 0x182b0821073d <JSFunction main (sfi = 0x182b082101e5)> using TurboFan]
[optimizing 0x182b0821073d <JSFunction main (sfi = 0x182b082101e5)> - took 0.401, 3.229, 0.044 ms]
[optimizing 0x182b08085bdd <JSFunction Complex (sfi = 0x182b0821021d)> because --always-opt]
[compiling method 0x182b08085bdd <JSFunction Complex (sfi = 0x182b0821021d)> using TurboFan]
[optimizing 0x182b08085bdd <JSFunction Complex (sfi = 0x182b0821021d)> - took 0.138, 0.296, 0.028 ms]
[optimizing 0x182b08210709 <JSFunction iterate_mandelbrot (sfi = 0x182b082101ad)> because --always-opt]
[compiling method 0x182b08210709 <JSFunction iterate_mandelbrot (sfi = 0x182b082101ad)> using TurboFan]
[optimizing 0x182b08210709 <JSFunction iterate_mandelbrot (sfi = 0x182b082101ad)> - took 0.228, 1.619, 0.035 ms]
[optimizing 0x182b08085bfd <JSFunction abs (sfi = 0x182b08210255)> because --always-opt]
[compiling method 0x182b08085bfd <JSFunction abs (sfi = 0x182b08210255)> using TurboFan]
[optimizing 0x182b08085bfd <JSFunction abs (sfi = 0x182b08210255)> - took 0.213, 0.502, 0.033 ms]
[optimizing 0x182b08085c35 <JSFunction mul (sfi = 0x182b082102c5)> because --always-opt]
[compiling method 0x182b08085c35 <JSFunction mul (sfi = 0x182b082102c5)> using TurboFan]
[optimizing 0x182b08085c35 <JSFunction mul (sfi = 0x182b082102c5)> - took 0.183, 0.643, 0.030 ms]
[optimizing 0x182b08085c19 <JSFunction add (sfi = 0x182b0821028d)> because --always-opt]
[compiling method 0x182b08085c19 <JSFunction add (sfi = 0x182b0821028d)> using TurboFan]
[optimizing 0x182b08085c19 <JSFunction add (sfi = 0x182b0821028d)> - took 0.150, 0.449, 0.041 ms]

real	0m0.052s
user	0m0.022s
sys	0m0.021s

--always-opt makes V8 compile the function the first time it is called, so we can ignore interpreter overhead. I changed the code to quit after one function call, so we would measure how long it takes to compile and run the compilation.

For the sake of transparency, I modified some of the code to move it into a main function to ensure it would compile the code. Surprisingly, doing this reduced compilation time.

Here is the exact code I used: https://gist.github.com/CrazyPython/3552e1405dbb4b640810f6443cd0a015
Aug 26 2020
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/26/20 12:00 PM, James Lu wrote:
 On Wednesday, 26 August 2020 at 14:45:48 UTC, Steven Schveighoffer wrote:
 On 8/25/20 9:13 PM, James Lu wrote:
 V8 JavaScript compiles faster:

 $ d8 --always-opt --trace-opt --single-threaded
 I just want to point out that ts is not an accurate timestamping 
 system. The shell is starting ts and d8 simultaneously, and it's 
 very possible that d8 has done a lot of stuff by the time ts gets 
 around to deciding what timestamp 0 is.
 
 In fact, d8 could be completely finished, and all its output 
 buffered in the pipe, before ts does anything. Especially when 
 these times are so short.
 
 Not saying the data is wrong, but I am not certain this is proof. 
 Use shell time builtin.
 $ time d8 --always-opt --trace-opt --single-threaded --no-compilation-cache mandelbrot.js
 [compiling method 0x182b08210471 <JSFunction (sfi = 0x182b08210129)> using TurboFan]
 [optimizing 0x182b08210471 <JSFunction (sfi = 0x182b08210129)> - took 0.888, 1.326, 0.026 ms]
 [optimizing 0x182b0821073d <JSFunction main (sfi = 0x182b082101e5)> because --always-opt]
 [compiling method 0x182b0821073d <JSFunction main (sfi = 0x182b082101e5)> using TurboFan]
 [optimizing 0x182b0821073d <JSFunction main (sfi = 0x182b082101e5)> - took 0.401, 3.229, 0.044 ms]
 [optimizing 0x182b08085bdd <JSFunction Complex (sfi = 0x182b0821021d)> because --always-opt]
 [compiling method 0x182b08085bdd <JSFunction Complex (sfi = 0x182b0821021d)> using TurboFan]
 [optimizing 0x182b08085bdd <JSFunction Complex (sfi = 0x182b0821021d)> - took 0.138, 0.296, 0.028 ms]
 [optimizing 0x182b08210709 <JSFunction iterate_mandelbrot (sfi = 0x182b082101ad)> because --always-opt]
 [compiling method 0x182b08210709 <JSFunction iterate_mandelbrot (sfi = 0x182b082101ad)> using TurboFan]
 [optimizing 0x182b08210709 <JSFunction iterate_mandelbrot (sfi = 0x182b082101ad)> - took 0.228, 1.619, 0.035 ms]
 [optimizing 0x182b08085bfd <JSFunction abs (sfi = 0x182b08210255)> because --always-opt]
 [compiling method 0x182b08085bfd <JSFunction abs (sfi = 0x182b08210255)> using TurboFan]
 [optimizing 0x182b08085bfd <JSFunction abs (sfi = 0x182b08210255)> - took 0.213, 0.502, 0.033 ms]
 [optimizing 0x182b08085c35 <JSFunction mul (sfi = 0x182b082102c5)> because --always-opt]
 [compiling method 0x182b08085c35 <JSFunction mul (sfi = 0x182b082102c5)> using TurboFan]
 [optimizing 0x182b08085c35 <JSFunction mul (sfi = 0x182b082102c5)> - took 0.183, 0.643, 0.030 ms]
 [optimizing 0x182b08085c19 <JSFunction add (sfi = 0x182b0821028d)> because --always-opt]
 [compiling method 0x182b08085c19 <JSFunction add (sfi = 0x182b0821028d)> using TurboFan]
 [optimizing 0x182b08085c19 <JSFunction add (sfi = 0x182b0821028d)> - took 0.150, 0.449, 0.041 ms]
 
 real	0m0.052s
 user	0m0.022s
 sys	0m0.021s
 
 --always-opt makes V8 compile the function the first time it is 
 called, so we can ignore interpreter overhead. I changed the code 
 to quit after one function call, so we would measure how long it 
 takes to compile and run the compilation.
 
 For the sake of transparency, I modified some of the code to move 
 it into a main function to ensure it would compile the code. 
 Surprisingly, doing this reduced compilation time.
 
 Here is the exact code I used: 
 https://gist.github.com/CrazyPython/3552e1405dbb4b640810f6443cd0a015
Thanks, that's really impressive.

D has some significant overhead which might explain some of the discrepancy. Compiling an empty main function with dmd takes 0.128 seconds on my system, but of course comparing my system to yours isn't going to be useful.

Compiling an empty main with -betterC takes 0.065 seconds, which means about half the overhead is spent compiling D runtime setup things?

It is difficult to assign what is overhead and what is compilation, especially for a JIT compiler, so the claim of "100x" may not be accurate, especially with these low timings. Still, it definitely seems faster than D.

-Steve
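The overhead-vs-compilation question can be made concrete by measuring total compile time at two different module sizes and solving for the fixed per-invocation cost. A hedged Python sketch of that idea, with purely illustrative numbers (they are not measurements from this thread):

```python
def split_overhead(t_small, n_small, t_large, n_large):
    """Fit t = overhead + per_fn * n through two (time, #functions)
    measurements. Returns (fixed overhead, marginal cost per function)."""
    per_fn = (t_large - t_small) / (n_large - n_small)
    overhead = t_small - per_fn * n_small
    return overhead, per_fn

# Illustrative numbers only: 0.065 s for an empty module,
# 0.565 s for a generated module with 1000 small functions.
overhead, per_fn = split_overhead(0.065, 0, 0.565, 1000)
print(overhead, per_fn)  # roughly 0.065 s fixed, 0.0005 s per function
```

With more than two sample sizes a least-squares fit over the same model gives the intercept (process startup, runtime imports) separately from the slope (actual per-function compilation work).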
Aug 26 2020
parent Atila Neves <atila.neves gmail.com> writes:
On Wednesday, 26 August 2020 at 16:24:03 UTC, Steven 
Schveighoffer wrote:
 On 8/26/20 12:00 PM, James Lu wrote:
 On Wednesday, 26 August 2020 at 14:45:48 UTC, Steven 
 Schveighoffer wrote:
 On 8/25/20 9:13 PM, James Lu wrote:
 V8 JavaScript compiles faster:
 Compiling an empty main function with dmd takes 0.128 seconds 
 on my system, but of course comparing my system to yours isn't 
 going to be useful.

 Compiling with -betterC an empty main takes 0.065 seconds, 
 which means about half the overhead is spent compiling D 
 runtime setup things?
Weird. On my machine both take the same amount of time: ~16ms.
Aug 27 2020
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Aug 26, 2020 at 01:31:01AM +0000, James Lu via Digitalmars-d wrote:
[...]
 DMD -O doesn't make a significant difference over DMD, clocking in at
 12 seconds total.
[...]

DMD's optimizer is a joke compared to modern optimizing backends like LDC/LLVM or GCC. These days I don't even look at DMD for anything remotely performance-related. I consistently get 15-20% faster executables from LDC than from DMD (even without any optimization flags!), and for compute-heavy programs with -O2/-O3, the difference can be up to 40-50%.

Now that LDC releases are closely tracking DMD releases, I honestly have lost interest in DMD codegen quality, and only use DMD for rapid prototyping during development. For everything else, LDC is my go-to compiler.

(And don't even get me started on backend codegen bugs triggered by -O and/or -inline. After getting bitten a few times by a couple of those, I stay away from dmd -O / dmd -inline like the plague. If I want optimization, I use LDC instead.)

On Wed, Aug 26, 2020 at 01:38:27AM +0000, James Lu via Digitalmars-d wrote:
[...]
 I wonder if anyone in the D community has the expertise to change
 modify or rewrite DMD's backend to be up to be at most 1.5-2x slower
 at normal, non-SIMD tasks, up to a poor version of LuaJIT or V8 while
 retaining the speed.
Supposedly Walter is one of the only people who understands the backend well enough to be able to make significant improvements to it. However, Walter is busy with other D-related stuff (important language-level stuff), and we really don't want his time to be spent optimizing a backend that, to be frank, almost nobody is interested in these days.

(I'm willing to be pleasantly surprised, though. If Walter can singlehandedly clean up DMD's optimizer and hone it at least to the same ballpark as LDC/GDC, then I'll be all ears. But I'm not holding my breath.)

T

-- 
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
Aug 25 2020
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
H. S. Teoh wrote:

 I wonder if anyone in the D community has the expertise to change
 modify or rewrite DMD's backend to be up to be at most 1.5-2x slower
 at normal, non-SIMD tasks, up to a poor version of LuaJIT or V8 while
 retaining the speed.
Supposedly Walter is one of the only people who understands the backend well enough to be able to make significant improvements to it.
that's why there is no reason to "improve" the current DMD backend at all. it is much easier to throw it away and write a brand new one, SSA-based. i bet that a bog-standard SSA with a linear register allocator will generate code at least as good as DMD -O, but it will be faster, and more maintainable.

it is also easy to retarget, because most analysis (and even spilling, partially) is done on the SSA level, and you only have to port the instruction selector. so no problem maintaining backends for x86, x86_64 and arm (even in the same executable). also, the same backend can be used to jit ctfe code later.

now we only need somebody to do it.
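For readers unfamiliar with the term: SSA (static single assignment) form just means every variable is assigned exactly once, with later assignments renamed to fresh versions. A toy Python sketch of renaming straight-line three-address code into SSA (no control flow, hence no phi nodes; illustrative only, nothing like a real backend):

```python
def to_ssa(instructions):
    """Rename destinations so each variable is assigned exactly once.
    Instructions are (dest, op, src1, src2) tuples; sources are
    variable names (which must be defined earlier) or int constants."""
    version = {}  # variable name -> latest version number

    def use(v):
        if isinstance(v, int):
            return v  # constants pass through unchanged
        return f"{v}.{version[v]}"  # reference the latest version

    out = []
    for dest, op, a, b in instructions:
        a, b = use(a), use(b)  # rename uses before the new definition
        version[dest] = version.get(dest, -1) + 1
        out.append((f"{dest}.{version[dest]}", op, a, b))
    return out

code = [("x", "add", 1, 2),
        ("x", "add", "x", 3),    # reassignment becomes x.1
        ("y", "mul", "x", "x")]
print(to_ssa(code))
```

With control flow, joins between branches additionally require phi nodes, which is where the "a little tricky" part mentioned later in the thread comes in.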
Aug 25 2020
next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 26 August 2020 at 04:37:06 UTC, ketmar wrote:
 H. S. Teoh wrote:

 I wonder if anyone in the D community has the expertise to 
 change
 modify or rewrite DMD's backend to be up to be at most 1.5-2x 
 slower
 at normal, non-SIMD tasks, up to a poor version of LuaJIT or 
 V8 while
 retaining the speed.
Supposedly Walter is one of the only people who understands the backend well enough to be able to make significant improvements to it.
that's why there is no reason to "improve" current DMD backend at all.
Perhaps we should not be that quick to downplay DMD just because it does not optimize as heavily as GDC and LDC at their max settings.

I may be too theoretical, but I think using only relatively basic optimizations for a release build might be preferable to always using the most aggressive settings. Why? Because a program usually spends almost all its time in a tiny fraction of itself. To get a performant program one has to profile where that is and do some hand-optimization anyway, regardless of compiler optimizations, enough to avoid hand-written assembly and things like `foreach(vector; cast(long[])intArray){...}` in the critical parts. But max-optimizing the whole program, for me, just seems to bloat binary size and compile times for relatively little benefit.

Also, one supposedly wants to benchmark the critical parts. With conservative optimization, the benchmarks are faster to compile and supposedly more reliable. There is less surface for compiler-caused performance regressions, and your code is more likely to stay fast if you decide you need to optimize for size instead.
Aug 26 2020
parent ketmar <ketmar ketmar.no-ip.org> writes:
Dukc wrote:

 On Wednesday, 26 August 2020 at 04:37:06 UTC, ketmar wrote:
 H. S. Teoh wrote:

 I wonder if anyone in the D community has the expertise to change
 modify or rewrite DMD's backend to be up to be at most 1.5-2x slower
 at normal, non-SIMD tasks, up to a poor version of LuaJIT or V8 while
 retaining the speed.
Supposedly Walter is one of the only people who understands the backend well enough to be able to make significant improvements to it.
that's why there is no reason to "improve" current DMD backend at all.
Perhaps we should not be that quick to downplay DMD just because it does not optimize as heavily as GDC and LDC at max settings.
it's not the reason, at least for me. the real reason is that the DMD backend is virtually impenetrable. it is a giant black box with the label "DO NOT ENTER IF YOUR NAME IS NOT WALTER" on its side.

an SSA backend is much easier to maintain, much easier to retarget, and optimisations over SSA can be nicely layered, from "nothing" to "a set of aggressive multipass optimisers". the best thing is that those optimisers are mostly independent of each other; they only need to maintain the SSA invariant. so you can write a lot of them, each doing one simple optimisation at a time, and run them as long as you want.
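The "layered, mostly independent passes" idea can be sketched on a toy IR of (dest, op, a, b) tuples, where each pass is a plain function from instruction list to instruction list. A hedged Python illustration (the IR, pass names and the add-only folding are invented for this sketch, not taken from any real backend):

```python
def const_fold(code):
    """One job only: evaluate ops whose operands are all constants."""
    env, out = {}, []  # env maps folded names to their constant values
    for dest, op, a, b in code:
        a, b = env.get(a, a), env.get(b, b)  # substitute known constants
        if op == "add" and isinstance(a, int) and isinstance(b, int):
            env[dest] = a + b  # folded away; no instruction emitted
        else:
            out.append((dest, op, a, b))
    return out

def dead_code(code, live_roots):
    """One job only: drop instructions whose result is never used."""
    live = set(live_roots)
    kept = []
    for dest, op, a, b in reversed(code):  # backward liveness sweep
        if dest in live:
            kept.append((dest, op, a, b))
            live |= {v for v in (a, b) if isinstance(v, str)}
    return kept[::-1]

code = [("t0", "add", 1, 2),
        ("t1", "add", "t0", "x"),   # x is a live external input
        ("t2", "add", "t0", 5)]     # result never used
code = const_fold(code)             # t0 and t2 fold to constants
code = dead_code(code, {"t1"})      # only t1 is demanded
print(code)
```

Because each pass only consumes and produces the same IR shape, they compose in any order, which is the maintainability argument being made here.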
Aug 26 2020
prev sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 26 August 2020 at 04:37:06 UTC, ketmar wrote:
 also, the same backend can be used to jit ctfe code later.

 now we only need somebody to do it.
CTFE needs a different code path from the regular backend. You need to be able to hook many things which usually you wouldn't need to hook.
Aug 26 2020
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
Stefan Koch wrote:

 On Wednesday, 26 August 2020 at 04:37:06 UTC, ketmar wrote:
 also, the same backend can be used to jit ctfe code later.

 now we only need somebody to do it.
CTFE needs a different code path from the regular backend. You need to be able to hook many things which usually you wouldn't need to hook.
you're right... with the current backend. but with a universal SSA backend, once you have lowered the code to SSA, it doesn't matter anymore. for native code, the lowering engine can emit direct memory-manipulation SSA opcodes, and for CTFE it can emit function calls. the backend doesn't care; it will still produce machine code you can either write to disk or run directly. or don't even bother producing machine code at all, but run some SSA optimisers and execute the SSA code directly.
Aug 26 2020
parent reply drug <drug2004 bk.ru> writes:
On 8/26/20 4:02 PM, ketmar wrote:
 you're right... with the current backend. but with universal SSA 
 backend, once you lowered the code to SSA, it doesn't matter anymore. 
 for native code, lowering engine can emit direct memory manipulation SSA 
 opcodes, and for CTFE it can emit function calls. the backend doesn't 
 care, it will still produce machine code you can either write to disk, 
 or run directly. or don't even bother producing machine code at all, but 
 run some SSA optimisers and execute SSA code directly.
What are the disadvantages of an SSA-based backend?
Aug 26 2020
next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
drug wrote:

 What are disadvantages of SSA based backend?
somebody has to write it.
Aug 26 2020
prev sibling next sibling parent Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Wednesday, 26 August 2020 at 13:07:03 UTC, drug wrote:
 On 8/26/20 4:02 PM, ketmar wrote:
 you're right... with the current backend. but with universal 
 SSA backend, once you lowered the code to SSA, it doesn't 
 matter anymore. for native code, lowering engine can emit 
 direct memory manipulation SSA opcodes, and for CTFE it can 
 emit function calls. the backend doesn't care, it will still 
 produce machine code you can either write to disk, or run 
 directly. or don't even bother producing machine code at all, 
 but run some SSA optimisers and execute SSA code directly.
What are disadvantages of SSA based backend?
If I'm not wrong, from what I remember LLVM IR is SSA, so I guess there's a lot of literature around the pros and cons of the SSA approach...
Aug 26 2020
prev sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 26 August 2020 at 13:07:03 UTC, drug wrote:
 On 8/26/20 4:02 PM, ketmar wrote:
 you're right... with the current backend. but with universal 
 SSA backend, once you lowered the code to SSA, it doesn't 
 matter anymore. for native code, lowering engine can emit 
 direct memory manipulation SSA opcodes, and for CTFE it can 
 emit function calls. the backend doesn't care, it will still 
 produce machine code you can either write to disk, or run 
 directly. or don't even bother producing machine code at all, 
 but run some SSA optimisers and execute SSA code directly.
What are disadvantages of SSA based backend?
Well-formed SSA is a little tricky to generate, and it does not map well onto hardware. Without a few dedicated rewrite and optimization passes, it produces code which is dog slow.
Aug 26 2020
parent ketmar <ketmar ketmar.no-ip.org> writes:
Stefan Koch wrote:

 Well formed SSA is a little tricky to generate.
 And does not map well on hardware.
that is not what SSA is used for. ;-)

also, well-formed SSA is dead easy to generate: just don't try to be smart, and don't write "locals reuse logic" at all. other passes will take care of eliminating redundant loads and locals. and then a simple linear scan register allocator will give you very surprising results. ;-)

The Great Secret of SSA is "don't be smart". each SSA pass should do only one thing, it should do it in the easiest possible way, and it should only care about repairing the SSA damage it has done. and then you can easily add more passes, and choose between generated code quality and time spent on optimising.
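A linear scan register allocator really is only a few lines: sort live intervals by start, expire finished ones, hand out free registers, and spill under pressure. A toy Python sketch under stated simplifications (it naively spills the newcomer, whereas the classic algorithm spills the active interval that ends last; everything here is illustrative, not production code):

```python
def linear_scan(intervals, num_regs):
    """intervals: {name: (start, end)} live ranges.
    Returns {name: register index or "spill"}."""
    order = sorted(intervals, key=lambda v: intervals[v][0])
    free = list(range(num_regs))
    active = []   # (end, name) pairs, kept sorted by end
    alloc = {}
    for name in order:
        start, end = intervals[name]
        while active and active[0][0] < start:   # expire finished ranges
            _, old = active.pop(0)
            free.append(alloc[old])              # their register is free again
        if free:
            alloc[name] = free.pop()
            active.append((end, name))
            active.sort()
        else:
            alloc[name] = "spill"  # simplification: spill the newcomer
    return alloc

# "b" dies before "c" starts, so "c" can reuse b's register:
print(linear_scan({"a": (0, 4), "b": (1, 2), "c": (3, 6)}, 2))
```

The whole thing is a single pass over the intervals, which is why it pairs well with a compiler that is optimizing for compile speed rather than peak code quality.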
Aug 26 2020
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?
You could add Crystal [1] as well for completeness. It uses LLVM as its backend, so it might not be fast.

BTW, I see that you download DMD from dlang.org. The binaries at dlang.org are compiled with DMD itself. You should compile DMD yourself using LDC, with the appropriate flags. It will give you a boost, perhaps 30%.

[1] https://crystal-lang.org

-- 
/Jacob Carlborg
Aug 26 2020
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 26 August 2020 at 11:49:06 UTC, Jacob Carlborg 
wrote:
 [snip]

 BTW, I see that you download DMD from dlang.org. The binaries 
 at dlang.org are compiled with DMD itself. You should compile 
 DMD yourself using LDC, with the appropriate flags. It will 
 give you a boost, perhaps 30%.

 [1] https://crystal-lang.org

 --
 /Jacob Carlborg
After [1] I was under the impression that the Linux versions were already compiled with LDC. Is it only the Windows release that is compiled with LDC?

[1] https://dlang.org/changelog/2.091.0.html#windows
Aug 26 2020
parent reply Jacob Carlborg <doob me.com> writes:
On Wednesday, 26 August 2020 at 12:40:28 UTC, jmh530 wrote:

 Is it only the Windows release that is compiled with LDC?
Yes, unfortunately. -- /Jacob Carlborg
Aug 27 2020
parent Atila Neves <atila.neves gmail.com> writes:
On Thursday, 27 August 2020 at 09:54:36 UTC, Jacob Carlborg wrote:
 On Wednesday, 26 August 2020 at 12:40:28 UTC, jmh530 wrote:

 Is it only the Windows release that is compiled with LDC?
Yes, unfortunately. -- /Jacob Carlborg
Didn't that change recently? In any case, on Arch Linux dmd is compiled with ldc.
Aug 27 2020
prev sibling next sibling parent reply MrSmith <mrsmith33 yandex.ru> writes:
On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?
Hi, you may want to try benchmarking my compiler for the Vox language:
https://github.com/MrSmith33/tiny_jit

I designed it with high compilation speed in mind. I added a statically compiled Linux build to the CI tag. Or compile manually with:

source> ~/dlang/ldc-1.22.0/bin/ldc2 -d-version=cli -m64 -O3 -release -boundscheck=off -enable-inlining -flto=full -i main.d -of=./tjc

So far it can only produce executables for win64, but that should be enough for your purpose.
Aug 26 2020
next sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 16:08:15 UTC, MrSmith wrote:
 Hi, you may want to try benchmarking my compiler of Vox 
 language:
 https://github.com/MrSmith33/tiny_jit
 I designed it with high compilation speed in mind.
Looks really interesting! Thanks. I'll try it out soon.
Aug 26 2020
prev sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 16:08:15 UTC, MrSmith wrote:
 https://github.com/MrSmith33/tiny_jit
What kinds of memory management are supported/planned and how does this interact with slices? I can't find any code examples in Vox. Are they represented as strings inside the .d-files?
Aug 26 2020
next sibling parent reply MrSmith <mrsmith33 yandex.ru> writes:
On Wednesday, 26 August 2020 at 16:34:39 UTC, Per Nordlöw wrote:
 On Wednesday, 26 August 2020 at 16:08:15 UTC, MrSmith wrote:
 https://github.com/MrSmith33/tiny_jit
 What kinds of memory management are supported/planned and how 
 does this interact with slices?
 
 I can't find any code examples in Vox. Are they represented as 
 strings inside the .d-files?
Currently only manual MM. Check source/tests for small examples and https://github.com/MrSmith33/rltut_2019 for a bigger project. Also see spec/index.md for some docs.
Aug 26 2020
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 17:01:44 UTC, MrSmith wrote:
 Currently only manual MM. Check source/tests for small examples 
 and https://github.com/MrSmith33/rltut_2019 for bigger project. 
 Also see spec/index.md for some docs.
Can you elaborate on what you mean by manual MM? Refcounted? I'm very curious.
Aug 26 2020
parent MrSmith <mrsmith33 yandex.ru> writes:
On Wednesday, 26 August 2020 at 20:13:55 UTC, Per Nordlöw wrote:
 Can you elaborate on what you mean manual MM? Refcounted? I'm 
 very curious.
Currently you can either do static/stack allocation, or use host-provided functions or the OS API to dynamically allocate memory (alloc/free, mmap/VirtualAlloc/HeapAlloc etc). No GC or RC is planned. Probably I will add more first-class support for allocators, like in Jai/Zig/Odin.
Aug 26 2020
prev sibling parent reply MrSmith <mrsmith33 yandex.ru> writes:
On Wednesday, 26 August 2020 at 16:34:39 UTC, Per Nordlöw wrote:
 On Wednesday, 26 August 2020 at 16:08:15 UTC, MrSmith wrote:
 https://github.com/MrSmith33/tiny_jit
 What kinds of memory management are supported/planned and how 
 does this interact with slices?
 
 I can't find any code examples in Vox. Are they represented as 
 strings inside the .d-files?
I ran the benchmark myself and here is what the Vox code should look like, based on the D code:

i64 add_long_n0_h0[T](i64 x) {
    return x + 87734;
}
i64 add_long_n0[T](i64 x) {
    return x + add_long_n0_h0[i64](x) + 40209;
}
i64 main() {
    i64 long_sum = 0;
    long_sum += add_long_n0[i64](0);
    return long_sum;
}
Aug 26 2020
next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 17:27:48 UTC, MrSmith wrote:
 I run benchmark myself and here is what Vox code should look 
 like based on D code:
Ok, thanks.

How should I best build vox for maximum performance? These alternatives correctly produce a binary:

dmd -i main.d
ldmd2 -i main.d
ldmd2 -O -release -i main.d

but running the binary produced by

ldmd2 -O -release -i main.d

prints

Running 218 tests

but never completes...
Aug 26 2020
parent reply MrSmith <mrsmith33 yandex.ru> writes:
On Wednesday, 26 August 2020 at 19:26:17 UTC, Per Nordlöw wrote:
 How should I best build vox for maximum performance?
It needs the cli version passed:

ldc2 -d-version=cli -m64 -O3 -release -boundscheck=off -enable-inlining -flto=full -i main.d -of=./tjc
Aug 26 2020
next sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 20:10:09 UTC, MrSmith wrote:
 It needs cli version passed:
 ldc2 -d-version=cli -m64 -O3 -release -boundscheck=off 
 -enable-inlining -flto=full -i main.d -of=./tjc
Ahh, nice.
Aug 26 2020
prev sibling next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 20:10:09 UTC, MrSmith wrote:
 It needs cli version passed:
 ldc2 -d-version=cli -m64 -O3 -release -boundscheck=off 
 -enable-inlining -flto=full -i main.d -of=./tjc
Can `vox` only output Windows binaries?
Aug 26 2020
parent reply MrSmith <mrsmith33 yandex.ru> writes:
On Wednesday, 26 August 2020 at 22:05:51 UTC, Per Nordlöw wrote:
 Can `vox` only output Windows binaries?
Yes. ELF and the System V ABI are WIP.
Aug 26 2020
parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 22:11:34 UTC, MrSmith wrote:
 Yes. ELF and SytemV ABI is WIP
What flags should I feed to the compiler? When I do

vox main.vox

my output binary is a main.exe Windows binary.
Aug 26 2020
prev sibling next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 20:10:09 UTC, MrSmith wrote:
 ldc2 -d-version=cli -m64 -O3 -release -boundscheck=off 
 -enable-inlining -flto=full -i main.d -of=./tjc
I built that binary on my system and called it `vox`. It is indeed fast; about 2.5x faster than dmd:

| Lang-uage | Oper-ation | Temp-lated | Time [us/#fn] | Slowdown vs [Best] | Version | Exec |
| :---: | :---: | --- | :---: | :---: | :---: | :---: |
| D | Build | No | 16.4 | 2.4 [Vox] | v2.093.1-697-g537aa8eb1 | `dmd` |
| D | Build | No | 188.1 | 27.6 [Vox] | 1.23.0 | `ldmd2` |
| D | Build | Yes | 30.8 | 4.5 [Vox] | v2.093.1-697-g537aa8eb1 | `dmd` |
| D | Build | Yes | 204.2 | 29.9 [Vox] | 1.23.0 | `ldmd2` |
| Vox | Build | No | 6.8 | 1.0 [Vox] | master | `vox` |

I've added support for the untemplated Vox language to compiler-benchmark if `vox` is found in the path. I'll add the templated version now.

Great work!
Aug 26 2020
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 22:21:46 UTC, Per Nordlöw wrote:
 I've added support for untemplated Vox language to 
 compiler-benchmark if `vox` is found in the path.
I've updated the docs as well, at
https://github.com/nordlow/compiler-benchmark
Aug 26 2020
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 22:22:52 UTC, Per Nordlöw wrote:
 I've updated the docs aswell at
 https://github.com/nordlow/compiler-benchmark
I haven't updated the benchmarks yet, though.
Aug 26 2020
next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 22:27:39 UTC, Per Nordlöw wrote:
 I've haven't updated the benchmarks yet, though.
I've pushed support for Vox generics now as well.
Aug 26 2020
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 22:56:54 UTC, Per Nordlöw wrote:
 I've pushed support for Vox-generics now aswell.
Hardly any difference in compile-time for the generic version. Impressive.
Aug 26 2020
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 23:03:08 UTC, Per Nordlöw wrote:
 Hardly any difference in compile-time for the generic version. 
 Impressive.
Is Vox so fast because it doesn't (yet) support implicit function template instantiation (IFTI)?
Aug 27 2020
parent reply MrSmith <mrsmith33 yandex.ru> writes:
On Thursday, 27 August 2020 at 09:47:38 UTC, Per Nordlöw wrote:
 Is Vox so fast because it doesn't (yet) support implicit 
 function template instantiation (IFTI)?
IFTI is supported, but only in simple cases. Relevant tests:
https://github.com/MrSmith33/tiny_jit/blob/master/source/tests/passing.d#L3342-L3434

Macros are not yet implemented. But I have variadic templates. (See the tests below the IFTI tests.)

Here is a fun one. Combining #foreach, a variadic template function and type functions to get writeln functionality. selectPrintFunc gets run via CTFE and returns an alias to the relevant function. $ functions work like traits. $type is $alias restricted to types only.

void printStr(u8[]);
void printInt(i64 i);

$alias selectPrintFunc($type T) {
    if ($isInteger(T))
        return printInt;
    if ($isSlice(T))
        return printStr;
    $compileError("Invalid type");
}

void write[Args...](Args... args) {
    #foreach(i, arg; args) {
        alias func = selectPrintFunc(Args[i]);
        func(arg);
    }
}

void run() {
    write("Hello", 42);
}
Aug 27 2020
next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Thursday, 27 August 2020 at 10:29:41 UTC, MrSmith wrote:
 IFTI is supported, but only in simple cases. Relevant tests: 
 https://github.com/MrSmith33/tiny_jit/blob/master/source/tests/passing.d#L3342-L3434
Nice. Are/will Vox's overloading rules be different from D's?

Is there a list of D features you don't want? And a list of non-D features you plan to add?
Aug 27 2020
next sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Thursday, 27 August 2020 at 14:52:06 UTC, Per Nordlöw wrote:
 Is there a list of D features you don't want?
 And a list of non-D features you plan?
I just read todo.txt. Do you plan to add qualifiers for pure and safe code?
Aug 27 2020
prev sibling parent MrSmith <mrsmith33 yandex.ru> writes:
On Thursday, 27 August 2020 at 14:52:06 UTC, Per Nordlöw wrote:
 Nice. Are/Will Vox's overloading rules be different from D's?

 Is there a list of D features you don't want?
 And a list of non-D features you plan?
Haven't given overloading a thought yet, but the D way seems totally ok. Up to this point the lack of overloading hasn't been a major pain point, though.

I tried to list the main differences from D in the readme. I want to have performant ways of introspection and code generation that utilize CTFE as much as possible. I may add some support for polymorphism, like signatures/traits.

I'm not a big fan of poisonous attributes, so I'm not adding them atm (but I have one now, #ctfe, for ctfe-only functions/structs). There is more basic stuff still missing from the language that I need to focus on.
Aug 27 2020
prev sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Thursday, 27 August 2020 at 10:29:41 UTC, MrSmith wrote:
 void write[Args...](Args... args)
What is the pro of this syntax with `Args... args` compared to D's `void write(Args...)(Args args)`?
Oct 05 2020
parent MrSmith <mrsmith33 yandex.ru> writes:
On Monday, 5 October 2020 at 14:19:36 UTC, Per Nordlöw wrote:
 What is the pro of this syntax with `Args... args` compared to 
 D's
 void write(Args...)(Args args)
 ?
I think it was due to a simpler implementation. This way you know that Args is variadic as early as parse time.
Oct 05 2020
prev sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 22:27:39 UTC, Per Nordlöw wrote:
 On Wednesday, 26 August 2020 at 22:22:52 UTC, Per Nordlöw wrote:
 I've updated the docs as well at
 https://github.com/nordlow/compiler-benchmark
I haven't updated the benchmarks yet, though.
Here are at least the numbers for

    ./benchmark --languages=D,Vox --function-count=200 --function-depth=450 --run-count=1

output in Markdown-table format:

| Lang-uage | Oper-ation | Temp-lated | Time [us/#fn] | Slowdown vs [Best] | Version | Exec |
| :---: | :---: | --- | :---: | :---: | :---: | :---: |
| D | Check | No | 6.9 | 1.0 [D] | v2.093.1-697-g537aa8eb1 | `dmd` |
| D | Check | No | 7.5 | 1.1 [D] | 1.23.0 | `ldmd2` |
| D | Check | Yes | 17.4 | 2.5 [D] | v2.093.1-697-g537aa8eb1 | `dmd` |
| D | Check | Yes | 18.8 | 2.7 [D] | 1.23.0 | `ldmd2` |
| D | Build | No | 16.8 | 2.4 [Vox] | v2.093.1-697-g537aa8eb1 | `dmd` |
| D | Build | No | 192.8 | 27.3 [Vox] | 1.23.0 | `ldmd2` |
| D | Build | Yes | 29.7 | 4.2 [Vox] | v2.093.1-697-g537aa8eb1 | `dmd` |
| D | Build | Yes | 211.0 | 29.9 [Vox] | 1.23.0 | `ldmd2` |
| Vox | Build | No | 7.1 | 1.0 [Vox] | master | `vox` |
| Vox | Build | Yes | 7.9 | 1.1 [Vox] | master | `vox` |

Vox build equals dmd check in speed! I guess it's time to start running the binary as well to see if there are any speed differences.
Aug 26 2020
parent reply MrSmith <mrsmith33 yandex.ru> writes:
On Wednesday, 26 August 2020 at 23:07:28 UTC, Per Nordlöw wrote:
 I guess it's time to start running the binary aswell to see if 
 there are any speed differences.
Vox uses SSA form + linear scan register allocation, but no other major optimizations are done yet. I would guess performance lands somewhere between the debug and release builds of other compilers. You may want to check the --print-mem and --print-time flags to get detailed stats.
Aug 26 2020
parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 23:14:57 UTC, MrSmith wrote:
 Vox ...
What does the macro syntax look like in Vox?
Aug 27 2020
prev sibling parent Jacob Carlborg <doob me.com> writes:
On Wednesday, 26 August 2020 at 20:10:09 UTC, MrSmith wrote:
 On Wednesday, 26 August 2020 at 19:26:17 UTC, Per Nordlöw wrote:
 How should I best build vox for maximum performance?
It needs the cli version passed:

    ldc2 -d-version=cli -m64 -O3 -release -boundscheck=off -enable-inlining -flto=full -i main.d -of=./tjc
Should add the following flags as well for best performance:

    --mcpu=native --defaultlib=libdruntime-ldc-lto,libphobos2-ldc-lto

The first enables extra features the current CPU supports, like SSE. The second links against druntime and Phobos compiled with LTO enabled, instead of the regular libraries.

--
/Jacob Carlborg
Aug 27 2020
prev sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 26 August 2020 at 17:27:48 UTC, MrSmith wrote:
 I run benchmark myself and here is what Vox code should look 
 like based on D code:
What about adding Vox support for performing semantic analysis only, similar to dmd's -o- flag?
Aug 26 2020
prev sibling next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?

 If so it most likely needs to use a backend other than LLVM.

 I believe Jai is supposed to do that but it hasn't been 
 released yet.
I just added support for Apple's Swift. It's massively slow on Linux. Check is 61 times slower than dmd and build is 42 times slower than dmd.

    ./benchmark --languages=D,Swift --function-count=200 --function-depth=450 --run-count=1

outputs (in Markdown)

| Lang-uage | Oper-ation | Temp-lated | Op Time [us/#fn] | Slowdown vs [Best] | Run Time [us/#fn] | Version | Exec |
| :---: | :---: | --- | :---: | :---: | :---: | :---: | :---: |
| D | Check | No | 7.5 | 1.0 [D] | N/A | v2.094.0-rc.1-75-ga0875a7e0 | `dmd` |
| D | Check | No | 8.5 | 1.1 [D] | N/A | 1.23.0 | `ldmd2` |
| D | Check | Yes | 19.8 | 2.6 [D] | N/A | v2.094.0-rc.1-75-ga0875a7e0 | `dmd` |
| D | Check | Yes | 22.9 | 3.0 [D] | N/A | 1.23.0 | `ldmd2` |
| D | Build | No | 27.1 | 1.0 [D] | 50 | v2.094.0-rc.1-75-ga0875a7e0 | `dmd` |
| D | Build | No | 205.6 | 7.6 [D] | 108 | 1.23.0 | `ldmd2` |
| D | Build | Yes | 38.2 | 1.4 [D] | 31 | v2.094.0-rc.1-75-ga0875a7e0 | `dmd` |
| D | Build | Yes | 214.7 | 7.9 [D] | 113 | 1.23.0 | `ldmd2` |
| Swift | Check | No | 461.4 | 61.3 [D] | N/A | 5.3 | `swiftc` |
| Swift | Build | No | 1133.4 | 41.8 [D] | 61 | 5.3 | `swiftc` |

I'll rerun all the benchmarks now.
Sep 25 2020
parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Friday, 25 September 2020 at 16:01:32 UTC, Per Nordlöw wrote:
 I just added support for Apple's Swift. It's massively slow on 
 Linux. Check is 61 times slower than dmd and build is 42 times 
 slower than dmd.
The update comments at [1] give some clues to the problems with "Big Agenda Languages" in general and Swift in particular. Jonathan Blow explains the meaning of the term "Big Agenda Languages" at [2]. This should be mentioned when people ask us why we work on Dlang.

[1] https://stackoverflow.com/questions/25537614/why-is-swift-compile-time-so-slow

[2] https://github.com/BSVino/JaiPrimer/blob/master/JaiPrimer.md
Sep 28 2020
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On Thursday, 20 August 2020 at 20:50:25 UTC, Per Nordlöw wrote:
 After having evaluated the compilation speed of D compared to 
 other languages at

     https://github.com/nordlow/compiler-benchmark

 I wonder; is there any language that compiles to native code 
 anywhere nearly as fast or faster than D, except C?
You should have a look at the self-hosted Zig compiler as well. I'm not sure if it's mature enough to benchmark (it currently only supports a small subset of Zig). You would probably need to compile it from source as well. Have a look at this post [1] I made, which mentions how the Zig compiler uses incremental compilation. Even a full (non-incremental) build seems really fast, although I don't know how well it scales.

[1] https://forum.dlang.org/post/ctelroirrkqpkrlupajp forum.dlang.org

--
/Jacob Carlborg
Sep 29 2020
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Tuesday, 29 September 2020 at 08:03:07 UTC, Jacob Carlborg 
wrote:
 Have a look at this post [1] I made, that mentions how the Zig 
 compiler uses incremental compilation.

 https://forum.dlang.org/post/ctelroirrkqpkrlupajp forum.dlang.org
Thanks. Have you tried building and using the self-hosted version?
Sep 29 2020
parent Jacob Carlborg <doob me.com> writes:
On Tuesday, 29 September 2020 at 08:23:28 UTC, Per Nordlöw wrote:

 Thanks. Have you tried building and using the self-hosted 
 version?
No, I have not. I've looked at a couple of live-coding videos about Zig and the self-hosted compiler.

--
/Jacob Carlborg
Sep 29 2020