
digitalmars.D - Potential of a compiler that creates the executable at once

reply rempas <rempas tutanota.com> writes:
A couple of months ago, I found out about a language called 
[Vox](https://github.com/MrSmith33/vox) whose compiler uses a 
design I haven't seen in any other: instead of creating object 
files and then linking them together, it always creates the 
executable in one go. This means that every time we change 
something in our code, the whole thing has to be recompiled. 
Naturally, you would expect this to be a huge problem because we 
would have to wait a long time after every small change to our 
project, but here is the thing... With this design, compilation 
can become really, really fast (of course, the design of the 
compiler matters too)!

At some point about 3 months ago, the creator of the language 
said that Vox could by then compile 1.2M LoC/s, which is really, 
really fast. 99% of projects will never reach that size, so your 
project will always compile in less than a second no matter what! 
What is even more impressive is that Vox is single-threaded, so 
we could get a much bigger performance boost when parsing the 
files for symbols and errors if we had multithreading support!

Of course, not creating object files and then linking them means 
that we don't have to write out a lot of object files and then 
link them all into a big executable; instead we start building 
the executable right away and add everything to it as we go. You 
can see how this can save a lot of time! And CPUs are so fast 
these days that with multithreading we can compile millions of 
lines of code in less than a second, so even the very rare huge 
projects will compile very fast.
What's even more impressive is that Vox is not even the fastest 
compiler out there. TCC is even faster (about 4-5 times)! I have 
personally tried to see how fast TCC can compile on my CPU, a 
Ryzen 5 2400G. I was able to compile 4M LoC in 700ms! Yeah, the 
speeds are crazy! And my CPU is an average one; if you were to 
build a PC now, you would get something at least 20% faster with 
at least 2 more threads!

However, this was not the best test. It was only one-line 
functions with the same assembly code in them, without any 
preprocessing or linked libraries, so I don't know if that played 
any role; but it was 8 files using 8 threads, and the speed is 
just unreal! And TCC DOES create object files and then links 
them. How much faster could it be if it used the same design Vox 
uses (and how much slower would Vox be if it used the design 
regular compilers use)?
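For anyone who wants to try a (much smaller) version of this benchmark, here is a sketch of the kind of harness I mean. The generator and names are hypothetical illustrations; it just uses whatever `tcc` or `cc` it finds on the PATH, and skips the timing step if neither exists.

```python
import os
import shutil
import subprocess
import tempfile
import time

def gen_source(nfuncs, with_main=False):
    """Generate C source made of identical one-line functions."""
    lines = ["int fn_%d(void) { return %d; }" % (i, i) for i in range(nfuncs)]
    if with_main:
        lines.append("int main(void) { return fn_0(); }")
    return "\n".join(lines) + "\n"

def time_compile(nfuncs=1000):
    """Time a single source -> executable compilation, if a compiler exists."""
    cc = shutil.which("tcc") or shutil.which("cc")
    if cc is None:
        print("no C compiler found, skipping")
        return None
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "bench.c")
        exe = os.path.join(d, "bench")
        with open(src, "w") as f:
            f.write(gen_source(nfuncs, with_main=True))
        t0 = time.perf_counter()
        # Source straight to executable in one invocation
        subprocess.run([cc, src, "-o", exe], check=True)
        dt = time.perf_counter() - t0
        print("%s: %d lines in %.3fs" % (cc, nfuncs + 1, dt))
        return dt

if __name__ == "__main__":
    time_compile()
```

Bumping `nfuncs` into the millions reproduces the spirit of the 4M LoC test, though of course identical one-liners are the easiest possible input for a compiler.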

Of course, TCC doesn't produce optimized code, but still: even 
compared with GCC at "-O0", it compiles 4-7 times faster than 
GCC. So if TCC could optimize code as well as GCC and used the 
design Vox uses, I could see it being able to compile around 
1-1.5M LoC/s!

I am personally really interested in this design and inspired by 
it to make my own compiler. It also solves a lot of problems that 
we would have to take into account with the classic method. One 
thing I thought of is the ability to also export your project as 
a library (mostly shared/dynamic), so in case you have something 
really huge like 10+M LoC (Linux kernel, I'm talking to you!), 
you could split it into "sub-projects" that are libraries and 
then link them all together.

Another idea would be to check the type of the files that are 
passed to the compiler and, if they are source files, not create 
object files for them, as they would not be kept anyway. So the 
following would apply:

```
my_lang -c test3.lang // Compile mode! Outputs the object file
// "test3.o".

my_lang test1.lang test2.lang test3.o -o=TEST // Create an
// executable. "test1.lang" and "test2.lang" are source files, so we
// won't create object files for them but will go straight to
// creating a binary out of them. "test3.o" is an object file, so we
// will "copy-paste" its symbols into the final binary.
```
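The dispatch such a driver would do can be sketched like this (hypothetical `my_lang` front end; the `.lang` extension is from the example above):

```python
def plan(args):
    """Decide what to do with each input file by its extension."""
    sources = [a for a in args if a.endswith(".lang")]
    objects = [a for a in args if a.endswith(".o")]
    # Source files go straight to the in-memory code generator;
    # object files just get their symbols copied into the output.
    return {"compile_direct": sources, "link_in": objects}

print(plan(["test1.lang", "test2.lang", "test3.o"]))
# prints {'compile_direct': ['test1.lang', 'test2.lang'], 'link_in': ['test3.o']}
```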

This is probably the best of both worlds!

So I thought I'd share this and see what your thoughts are! How 
fast could DMD be using this design? Or even better, what if we 
created a new backend for DMD that would be faster than the 
current one? D could be very competitive!
Feb 10 2022
next sibling parent reply Araq <rumpf_a web.de> writes:
On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
 This is probably the best of both worlds!
It's a very bad idea, it's in fact so bad that I wouldn't call it a "design":

- Since everything is recompiled all the time regardless, there is no incentive for "modularity" in the language design. Nor is there any incentive to keep the compiler's internals clean. Soon everything in the compiler operates on an enormous mutable graph internally, encouraging many, many bugs.
- You'll likely run into memory management problems too, as you cannot free memory when everything is connected to everything else. Even if you are willing to use a GC, the GC cannot help you much as your liveset simply keeps growing.
- Every compiler bugfix tends to add code to a compiler, so it'll get slower over time.
- The same is true for the memory consumption, it'll get worse over time.
- Every optimization you add to the compiler must not destroy your lovely compile times. So everything in the compiler is speed-critical and has to be optimized. Almost anything you do ends up being on the critical path.
- This does not only affect optimizations (which can depend on algorithms that are O(n^3), btw) but also all sorts of linting phases. And static analysis gets more important over time too.

In summary: people expect optimizers and static analysis to get better too and demand more of their tools. Your "design" doesn't allow for this. And in an IDE setting you might be able to skip all the expensive optimization steps, but not the static analyser steps.
Feb 10 2022
parent rempas <rempas tutanota.com> writes:
On Thursday, 10 February 2022 at 10:38:05 UTC, Araq wrote:
 It's a very bad idea, it's in fact so bad that I wouldn't call 
 it a "design":

 - Since everything is recompiled all the time regardless, there 
 is no incentive for "modularity" in the language design. Nor is 
 there any incentive to keep the compiler's internals clean. 
 Soon everything in the compiler operates on an enormous mutable 
 graph internally, encouraging many, many bugs.
 - You'll likely run into memory management problems too as you 
 cannot free memory  as everything is connected to everything 
 else. Even if you are willing to use a GC the GC cannot help 
 you much as your liveset simply keeps growing.
 - Every compiler bugfix tends to add code to a compiler, so 
 it'll get slower over time.
 - The same is true for the memory consumption, it'll get worse 
 over time.
 - Every optimization you add to the compiler must not destroy 
 your lovely compile-times. So everything in the compiler is 
 speed-critical and has to be optimized. Almost anything you do 
 ends up being on the critical path.
 - This does not only affect optimizations (which can depend on 
 algorithms that are O(n^3) btw) but also all sorts of linting 
 phases. And static analysis gets more important over time too.

 In summary: People expect optimizers and static analysis to get 
 better too and demand more of their tools. Your "design" 
 doesn't allow for this. And in an IDE setting you might be able 
 to skip all the expensive optimization steps, but not the 
 static analyser steps.
Thank you for your reply! I suppose you are right and I'm glad I asked people with more experience than me. It would be fun to hear more negative thoughts to see all the things that I'm missing.
Feb 10 2022
prev sibling next sibling parent reply bauss <jj_1337 live.dk> writes:
On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
 At some point about 3 months ago, the creator of the language 
 said that at that point, Vox can compile 1.2M LoC/S which is 
 really really fast and this is a point that 99% of the projects 
 will not reach so your project will always compiler in less 
 than a second no matter what! What is even more impressive is 
 that Vox is single thread so when parsing the files for symbols 
 and errors, why could get a much bigger performance boost if we 
 had multithread support!
You see, there's a large misconception here.

Typically slow compile times aren't due to the LoC a project has, but rather what happens during the compilation. E.g. template instantiation, functions executed at CTFE, preprocessing, optimization etc.

I've seen projects with only a couple thousand lines of code compile slower than projects with hundreds of thousands of lines of code.

Generally most compilers can read large source files and parse their tokens really fast; it's usually what happens afterwards that is the bottleneck.

Say you have a project that compiles very slowly: usually you won't start out by cutting the number of lines you have, because that's often not easy or even possible, but rather you profile where the compiler is spending most of its time and then attempt to resolve it, e.g. perhaps you're running unnecessary nested loops at compile time, and so on.
Feb 10 2022
next sibling parent Mark <smarksc gmail.com> writes:
On Thursday, 10 February 2022 at 11:54:59 UTC, bauss wrote:
 On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
 At some point about 3 months ago, the creator of the language 
 said that at that point, Vox can compile 1.2M LoC/S which is 
 really really fast and this is a point that 99% of the 
 projects will not reach so your project will always compiler 
 in less than a second no matter what! What is even more 
 impressive is that Vox is single thread so when parsing the 
 files for symbols and errors, why could get a much bigger 
 performance boost if we had multithread support!
You see, there's a large misconception here. Typically slow compile times aren't due to the LoC a project has, but rather what happens during the compilation. Ex. template instantiation, functions executed at ctfe, preprocessing, optimization etc.
If you generate an executable directly (without going through compilation to object files and then linking), then you can save some compile time on these tasks, no?

For instance, you can maintain some sort of global cache so that repeated instantiations of the same template (in different compilation units) are detected during compilation; this then saves you time on compiling something that you have already compiled before. I assume that such repeated instantiations are very common when there is heavy usage of the standard library.

The same goes for identical CTFEs and any other compilation step that can potentially repeat in different compilation units. Assuming link-time optimization, the end result (the executable) should be the same, but the compile times will be different.
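The global-cache idea can be sketched roughly like this (hypothetical names; a real compiler would key on mangled symbols and hashed template arguments rather than plain strings):

```python
# Sketch of a whole-program instantiation cache: when the compiler sees
# the same template instantiated with the same arguments in another
# "compilation unit", it reuses the already-generated code instead of
# regenerating it.

instantiation_cache = {}
codegen_calls = 0

def instantiate(template_name, args):
    """Return (and cache) the generated code for template_name!(args)."""
    global codegen_calls
    key = (template_name, tuple(args))
    if key in instantiation_cache:
        return instantiation_cache[key]   # cache hit: no codegen work
    codegen_calls += 1                    # cache miss: do the work once
    code = "code for %s!(%s)" % (template_name, ", ".join(args))
    instantiation_cache[key] = code
    return code

# Two "modules" both using standard-library-style templates:
for unit in (["to", "format"], ["format", "map"]):
    for t in unit:
        instantiate(t, ["string"])

print(codegen_calls)  # prints 3: "format" is generated once, not twice
```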
Feb 10 2022
prev sibling parent rempas <rempas tutanota.com> writes:
On Thursday, 10 February 2022 at 11:54:59 UTC, bauss wrote:
 You see, there's a large misconception here.

 Typically slow compile times aren't due to the LoC a project 
 has, but rather what happens during the compilation.

 Ex. template instantiation, functions executed at ctfe, 
 preprocessing, optimization etc.

 I've seen projects with only a couple thousand lines of code 
 compile slower than projects with hundreds of thousands of 
 lines of code.
Yeah, of course! There is no misconception here; templates play a role. When talking about LoC/s I'm talking about clean lines, and this is why I made it clear that in my example with TCC I didn't use any preprocessor, hence the 4M LoC were exactly 4M.
 Generally most compiles can read large source files and parse 
 their tokens etc. really fast, it's usually what happens 
 afterwards that are the bottleneck.

 Say if you have a project that is compiling very slow, usually 
 you won't start out by cutting the amount of lines you have, 
 because that's often not as easy or even possible, but rather 
 you profile where the compiler is spending most of its time and 
 then you attempt to resolve it, ex. perhaps you're running 
 nested loops that are unnecessary etc. at compile-time and so 
 on.
Of course, the backend is what matters. TCC goes from source file to object file directly. GCC/D/Rust etc. go from source file to IR (maybe DMD doesn't, but LDC and GDC do), then to assembly, and then to the object file, so this takes many times longer than doing it directly. But even then, TCC/Vox are many times faster still, so there is something more to it. Idk...
Feb 11 2022
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
This is actually the reason behind why dmd will create a single object file when 
given multiple source files on the command line. It's also why dmd can create a 
library directly.

I've toyed with the idea of generating an executable directly many times.
Feb 10 2022
parent reply rempas <rempas tutanota.com> writes:
On Thursday, 10 February 2022 at 20:39:33 UTC, Walter Bright 
wrote:
 This is actually the reason behind why dmd will create a single 
 object file when given multiple source files on the command 
 line. It's also why dmd can create a library directly.

 I've toyed with the idea of generating an executable directly 
 many times.
That's nice to hear! However, does DMD generate object files directly, or "asm" files that are passed to a C compiler? If I remember correctly, people told me that LDC2 needs to pass its output to a C compiler, so what's the case with DMD? I tried to compile a C library (code converted to D to use with DMD, rather than using "ImportC") with GCC and DMD, and it turns out that DMD is about 70-80% faster than GCC, which is good, but I suppose it could be even better, given the design of D as a language, if DMD outputs object files directly. Do you think there are any very bad places in DMD's backend? Has anyone on the team thought about rewriting the backend (or parts of it) from scratch?
Feb 11 2022
next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Friday, 11 February 2022 at 12:34:21 UTC, rempas wrote:
 On Thursday, 10 February 2022 at 20:39:33 UTC, Walter Bright 
 wrote:
 [...]
That's nice to hear! However, does DMD generates object files directly or "asm" files that are passed to a C compile? If I remember correctly, LDC2 needs to pass the output to a C compiler as people told me so what's the case from DMD? [...]
The object emission code in the backend is quite inefficient, it needs to be rewritten (it's horrible old code anyway)
Feb 11 2022
next sibling parent reply rempas <rempas tutanota.com> writes:
On Friday, 11 February 2022 at 14:52:09 UTC, max haughton wrote:
 The object emission code in the backend is quite inefficient, 
 it needs to be rewritten (it's horrible old code anyway)
I would love it if they did it, but I can't complain that they don't. OpenHub reports that DMD consists of 961K LoC!! I know that D is a huge language, so the frontend will be a good part of that, and code for some other stuff (including a lot of the backend) will probably not change. But this is still A LOT to work on! Maybe they can do it for D 3.0, along with removing the need for the GC to use Phobos (and the ability to turn that off in the compiler); then I can see D becoming as big as it was intended to be! But dreams are free...
Feb 11 2022
parent reply user1234 <user1234 12.de> writes:
On Friday, 11 February 2022 at 15:17:16 UTC, rempas wrote:
 On Friday, 11 February 2022 at 14:52:09 UTC, max haughton wrote:
 The object emission code in the backend is quite inefficient, 
 it needs to be rewritten (it's horrible old code anyway)
I would love if they would do it but I can't complain that they don't. Openhub reports that [DMD] consists of 961K LoC!!
OpenHub and their metrics are old trash. It's more like 170K according to D-Scanner.
Feb 11 2022
parent reply user1234 <user1234 12.de> writes:
On Friday, 11 February 2022 at 16:41:33 UTC, user1234 wrote:
 On Friday, 11 February 2022 at 15:17:16 UTC, rempas wrote:
 On Friday, 11 February 2022 at 14:52:09 UTC, max haughton 
 wrote:
 The object emission code in the backend is quite inefficient, 
 it needs to be rewritten (it's horrible old code anyway)
I would love if they would do it but I can't complain that they don't. Openhub reports that [DMD] consists of 961K LoC!!
Openhub and their metrics are old trash. It's more 170K according to D-Scanner.
wait... it's 175K. I hadn't pulled for 8 months or so. There's much new code that has been committed since, notably ImportC.
Feb 11 2022
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 11, 2022 at 04:47:46PM +0000, user1234 via Digitalmars-d wrote:
 On Friday, 11 February 2022 at 16:41:33 UTC, user1234 wrote:
 On Friday, 11 February 2022 at 15:17:16 UTC, rempas wrote:
 On Friday, 11 February 2022 at 14:52:09 UTC, max haughton wrote:
 
 The object emission code in the backend is quite inefficient, it
 needs to be rewritten (it's horrible old code anyway)
I would love if they would do it but I can't complain that they don't. Openhub reports that [DMD] consists of 961K LoC!!
Openhub and their metrics are old trash. It's more 170K according to D-Scanner.
wait... it's 175K. I had not pulled since 8 monthes or so. There's much new code that was commited since, with importC notably.
I pulled just this week, and running `wc` on *.d *.c *.h says there are 365K lines. I'm not sure what the *.h files are for, since DMD is now bootstrapping. Excluding *.h yields 347K lines. But a lot of those are actually blank lines and comments; excluding // comments, /**/ and /++/ block comments, and blank lines yields 175K. The 961K probably comes from the myriad test cases in the testsuite, where more lines is actually a *good* thing.

But really, LoC is an unreliable measure of code complexity. Token count would be more reflective of the actual complexity of the code, though even that is questionable. Writing `enum x = 1 + 1;` would be 7 tokens vs. `enum x = 2;` which is 5 tokens, for example, but the former may actually make code easier to read in certain cases (e.g., if the longer expression makes intent clearer than the shorter one).

Compressed size may be an even better approximation, because a high degree of complexity approaches Kolmogorov complexity in the limit, which is a measure of the information content of the data. Stripping comments and compressing (with the best compression algorithm you can find), for example, would give a good approximation to the actual complexity of the code.

Though of course, even that fails to measure the inherent level of complexity in language constructs. So you couldn't meaningfully compare compressed sizes across different languages, for example.

T

-- 
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
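The compressed-size idea can be tried as a toy experiment (just a sketch using zlib as the compressor; comment stripping is left out, and a stronger compressor would approximate Kolmogorov complexity better):

```python
import zlib

def complexity(source):
    """Rough complexity proxy: compressed size of the source text."""
    return len(zlib.compress(source.encode(), 9))

# Two files with the same LoC but very different information content:
boilerplate = "int fn() { return 0; }\n" * 200          # 200 identical lines
varied = "".join("int f%d() { return %d * %d; }\n" % (i, i, i + 1)
                 for i in range(200))                    # 200 distinct lines

print(complexity(boilerplate) < complexity(varied))  # prints True
```

Both inputs are 200 lines, yet the repetitive one compresses to a fraction of the size, which is exactly the point: LoC counts them the same, compressed size does not.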
Feb 11 2022
next sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:

 I pulled just this week, and running `wc` on *.d *.c *.h says...
https://github.com/AlDanial/cloc would yield a more practical metric, at least as far as "practical metric" in terms of LoC goes.
Feb 11 2022
next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Friday, 11 February 2022 at 17:44:45 UTC, Stanislav Blinov 
wrote:
 On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:

 I pulled just this week, and running `wc` on *.d *.c *.h 
 says...
https://github.com/AlDanial/cloc would yield a more practical metric, at least as far as "practical metric" in terms of LoC goes.
```
---------------------------------------------------------------
Language          files     blank   comment      code
---------------------------------------------------------------
D                  3867     75824     88426    431299
HTML                114     11405       967     61083
C/C++ Header         57      2729       992     23332
C                    93       830       797      3346
C++                  19       532       139      2249
---------------------------------------------------------------
```
This includes the test suite and other stuff that isn't technically the compiler-proper.
Feb 11 2022
next sibling parent reply rempas <rempas tutanota.com> writes:
On Friday, 11 February 2022 at 18:02:21 UTC, max haughton wrote:
 On Friday, 11 February 2022 at 17:44:45 UTC, Stanislav Blinov 
 wrote:
 On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:

 I pulled just this week, and running `wc` on *.d *.c *.h 
 says...
https://github.com/AlDanial/cloc would yield a more practical metric, at least as far as "practical metric" in terms of LoC goes.
``` --------------------------------------------------------------------------------------- Language files blank comment code --------------------------------------------------------------------------------------- D 3867 75824 88426 431299 HTML 114 11405 967 61083 C/C++ Header 57 2729 992 23332 C 93 830 797 3346 C++ 19 532 139 2249 ``` this includes the test suite and other stuff that isn't technically the compiler-proper.
Interesting! We could remove the "test-suite" directory, and we could tell it to only parse "D" language files, which would give us cleaner results. "cloc" is actually what I use, and for DragonFlyBSD it gave me the same number OpenHub gave, so I really wonder why other source code or languages get different results...
Feb 11 2022
parent user1234 <user1234 12.de> writes:
On Friday, 11 February 2022 at 20:19:16 UTC, rempas wrote:
 On Friday, 11 February 2022 at 18:02:21 UTC, max haughton wrote:
 On Friday, 11 February 2022 at 17:44:45 UTC, Stanislav Blinov 
 wrote:
 On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:

 I pulled just this week, and running `wc` on *.d *.c *.h 
 says...
https://github.com/AlDanial/cloc would yield a more practical metric, at least as far as "practical metric" in terms of LoC goes.
``` --------------------------------------------------------------------------------------- Language files blank comment code --------------------------------------------------------------------------------------- D 3867 75824 88426 431299 HTML 114 11405 967 61083 C/C++ Header 57 2729 992 23332 C 93 830 797 3346 C++ 19 532 139 2249 ``` this includes the test suite and other stuff that isn't technically the compiler-proper.
Interesting! We could remove the "test-suit" directory and we could tell it to only parse "D" language files which will give us more "clean" results.
This is the number I gave yesterday. D-Scanner counts sloc more cleverly than the other tools mentionned. The report in detail: dmd/src/build.d: 740 dmd/src/dmd/access.d: 181 dmd/src/dmd/aggregate.d: 362 dmd/src/dmd/aliasthis.d: 93 dmd/src/dmd/apply.d: 58 dmd/src/dmd/argtypes_aarch64.d: 96 dmd/src/dmd/argtypes_sysv_x64.d: 199 dmd/src/dmd/argtypes_x86.d: 190 dmd/src/dmd/arrayop.d: 176 dmd/src/dmd/arraytypes.d: 43 dmd/src/dmd/astbase.d: 2640 dmd/src/dmd/astcodegen.d: 83 dmd/src/dmd/astenums.d: 55 dmd/src/dmd/ast_node.d: 4 dmd/src/dmd/asttypename.d: 73 dmd/src/dmd/attrib.d: 484 dmd/src/dmd/backend/aarray.d: 244 dmd/src/dmd/backend/backconfig.d: 335 dmd/src/dmd/backend/backend.d: 7 dmd/src/dmd/backend/barray.d: 72 dmd/src/dmd/backend/bcomplex.d: 127 dmd/src/dmd/backend/blockopt.d: 1367 dmd/src/dmd/backend/cc.d: 534 dmd/src/dmd/backend/cdef.d: 223 dmd/src/dmd/backend/cg87.d: 2521 dmd/src/dmd/backend/cgcod.d: 1743 dmd/src/dmd/backend/cgcs.d: 445 dmd/src/dmd/backend/cgcse.d: 78 dmd/src/dmd/backend/cgcv.d: 58 dmd/src/dmd/backend/cg.d: 162 dmd/src/dmd/backend/cgelem.d: 3342 dmd/src/dmd/backend/cgen.d: 232 dmd/src/dmd/backend/cgobj.d: 1827 dmd/src/dmd/backend/cgreg.d: 539 dmd/src/dmd/backend/cgsched.d: 1595 dmd/src/dmd/backend/cgxmm.d: 1352 dmd/src/dmd/backend/cod1.d: 3447 dmd/src/dmd/backend/cod2.d: 3650 dmd/src/dmd/backend/cod3.d: 4719 dmd/src/dmd/backend/cod4.d: 3039 dmd/src/dmd/backend/cod5.d: 102 dmd/src/dmd/backend/codebuilder.d: 167 dmd/src/dmd/backend/code.d: 434 dmd/src/dmd/backend/code_x86.d: 114 dmd/src/dmd/backend/compress.d: 63 dmd/src/dmd/backend/cv4.d: 2 dmd/src/dmd/backend/cv8.d: 638 dmd/src/dmd/backend/dcgcv.d: 2196 dmd/src/dmd/backend/dcode.d: 52 dmd/src/dmd/backend/debugprint.d: 279 dmd/src/dmd/backend/disasm86.d: 3316 dmd/src/dmd/backend/divcoeff.d: 129 dmd/src/dmd/backend/dlist.d: 197 dmd/src/dmd/backend/drtlsym.d: 468 dmd/src/dmd/backend/dt.d: 316 dmd/src/dmd/backend/dtype.d: 892 dmd/src/dmd/backend/dvarstats.d: 230 dmd/src/dmd/backend/dvec.d: 287 
dmd/src/dmd/backend/dwarf2.d: 1 dmd/src/dmd/backend/dwarf.d: 20 dmd/src/dmd/backend/dwarfdbginf.d: 1721 dmd/src/dmd/backend/dwarfeh.d: 308 dmd/src/dmd/backend/ee.d: 59 dmd/src/dmd/backend/el.d: 112 dmd/src/dmd/backend/elem.d: 1649 dmd/src/dmd/backend/elfobj.d: 1699 dmd/src/dmd/backend/elpicpie.d: 483 dmd/src/dmd/backend/errors.di: 2 dmd/src/dmd/backend/evalu8.d: 1628 dmd/src/dmd/backend/exh.d: 28 dmd/src/dmd/backend/filespec.d: 158 dmd/src/dmd/backend/fp.d: 11 dmd/src/dmd/backend/gdag.d: 527 dmd/src/dmd/backend/gflow.d: 1034 dmd/src/dmd/backend/global.d: 343 dmd/src/dmd/backend/glocal.d: 419 dmd/src/dmd/backend/gloop.d: 2129 dmd/src/dmd/backend/go.d: 247 dmd/src/dmd/backend/goh.d: 58 dmd/src/dmd/backend/gother.d: 1136 dmd/src/dmd/backend/gsroa.d: 330 dmd/src/dmd/backend/iasm.d: 80 dmd/src/dmd/backend/mach.d: 137 dmd/src/dmd/backend/machobj.d: 1361 dmd/src/dmd/backend/md5.d: 152 dmd/src/dmd/backend/md5.di: 9 dmd/src/dmd/backend/melf.d: 317 dmd/src/dmd/backend/mem.d: 19 dmd/src/dmd/backend/mscoff.d: 82 dmd/src/dmd/backend/mscoffobj.d: 1002 dmd/src/dmd/backend/newman.d: 1065 dmd/src/dmd/backend/nteh.d: 445 dmd/src/dmd/backend/obj.d: 299 dmd/src/dmd/backend/oper.d: 444 dmd/src/dmd/backend/os.d: 409 dmd/src/dmd/backend/out.d: 989 dmd/src/dmd/backend/pdata.d: 112 dmd/src/dmd/backend/ph2.d: 63 dmd/src/dmd/backend/ptrntab.d: 986 dmd/src/dmd/backend/rtlsym.d: 4 dmd/src/dmd/backend/symbol.d: 1259 dmd/src/dmd/backend/symtab.d: 50 dmd/src/dmd/backend/ty.d: 50 dmd/src/dmd/backend/type.d: 94 dmd/src/dmd/backend/util2.d: 162 dmd/src/dmd/backend/var.d: 395 dmd/src/dmd/backend/xmm.d: 1 dmd/src/dmd/blockexit.d: 229 dmd/src/dmd/builtin.d: 263 dmd/src/dmd/canthrow.d: 141 dmd/src/dmd/chkformat.d: 801 dmd/src/dmd/cli.d: 94 dmd/src/dmd/clone.d: 858 dmd/src/dmd/common/file.d: 239 dmd/src/dmd/common/int128.d: 325 dmd/src/dmd/common/outbuffer.d: 358 dmd/src/dmd/common/string.d: 72 dmd/src/dmd/compiler.d: 195 dmd/src/dmd/cond.d: 427 dmd/src/dmd/console.d: 68 dmd/src/dmd/constfold.d: 1254 
dmd/src/dmd/cparse.d: 2374 dmd/src/dmd/cppmangle.d: 1264 dmd/src/dmd/cppmanglewin.d: 850 dmd/src/dmd/ctfeexpr.d: 1203 dmd/src/dmd/ctorflow.d: 82 dmd/src/dmd/dcast.d: 2122 dmd/src/dmd/dclass.d: 515 dmd/src/dmd/declaration.d: 956 dmd/src/dmd/delegatize.d: 111 dmd/src/dmd/denum.d: 116 dmd/src/dmd/dimport.d: 174 dmd/src/dmd/dinifile.d: 194 dmd/src/dmd/dinterpret.d: 4214 dmd/src/dmd/dmacro.d: 238 dmd/src/dmd/dmangle.d: 634 dmd/src/dmd/dmdparams.d: 12 dmd/src/dmd/dmodule.d: 721 dmd/src/dmd/dmsc.d: 86 dmd/src/dmd/doc.d: 2928 dmd/src/dmd/dscope.d: 388 dmd/src/dmd/dstruct.d: 281 dmd/src/dmd/dsymbol.d: 1015 dmd/src/dmd/dsymbolsem.d: 3557 dmd/src/dmd/dtemplate.d: 4167 dmd/src/dmd/dtoh.d: 1708 dmd/src/dmd/dversion.d: 83 dmd/src/dmd/e2ir.d: 3795 dmd/src/dmd/eh.d: 189 dmd/src/dmd/entity.d: 38 dmd/src/dmd/errors.d: 358 dmd/src/dmd/escape.d: 986 dmd/src/dmd/expression.d: 2608 dmd/src/dmd/expressionsem.d: 7109 dmd/src/dmd/file_manager.d: 140 dmd/src/dmd/foreachvar.d: 193 dmd/src/dmd/frontend.d: 215 dmd/src/dmd/func.d: 1650 dmd/src/dmd/globals.d: 262 dmd/src/dmd/glue.d: 941 dmd/src/dmd/gluelayer.d: 32 dmd/src/dmd/hdrgen.d: 2231 dmd/src/dmd/iasm.d: 20 dmd/src/dmd/iasmdmd.d: 2625 dmd/src/dmd/iasmgcc.d: 231 dmd/src/dmd/id.d: 20 dmd/src/dmd/identifier.d: 125 dmd/src/dmd/impcnvtab.d: 230 dmd/src/dmd/imphint.d: 9 dmd/src/dmd/importc.d: 115 dmd/src/dmd/init.d: 125 dmd/src/dmd/initsem.d: 772 dmd/src/dmd/inlinecost.d: 202 dmd/src/dmd/inline.d: 1067 dmd/src/dmd/intrange.d: 444 dmd/src/dmd/json.d: 621 dmd/src/dmd/lambdacomp.d: 239 dmd/src/dmd/lexer.d: 2303 dmd/src/dmd/lib.d: 54 dmd/src/dmd/libelf.d: 319 dmd/src/dmd/libmach.d: 318 dmd/src/dmd/libmscoff.d: 418 dmd/src/dmd/libomf.d: 311 dmd/src/dmd/link.d: 543 dmd/src/dmd/mars.d: 1759 dmd/src/dmd/mtype.d: 3396 dmd/src/dmd/nogc.d: 127 dmd/src/dmd/nspace.d: 60 dmd/src/dmd/ob.d: 1345 dmd/src/dmd/objc.d: 293 dmd/src/dmd/objc_glue.d: 629 dmd/src/dmd/opover.d: 1066 dmd/src/dmd/optimize.d: 757 dmd/src/dmd/parse.d: 5760 dmd/src/dmd/parsetimevisitor.d: 
226 dmd/src/dmd/permissivevisitor.d: 3 dmd/src/dmd/printast.d: 87 dmd/src/dmd/root/aav.d: 145 dmd/src/dmd/root/array.d: 499 dmd/src/dmd/root/bitarray.d: 89 dmd/src/dmd/root/complex.d: 35 dmd/src/dmd/root/ctfloat.d: 113 dmd/src/dmd/root/env.d: 29 dmd/src/dmd/root/file.d: 108 dmd/src/dmd/root/filename.d: 462 dmd/src/dmd/root/hash.d: 38 dmd/src/dmd/root/longdouble.d: 460 dmd/src/dmd/root/man.d: 48 dmd/src/dmd/root/optional.d: 27 dmd/src/dmd/root/port.d: 84 dmd/src/dmd/root/region.d: 56 dmd/src/dmd/root/response.d: 188 dmd/src/dmd/root/rmem.d: 134 dmd/src/dmd/root/rootobject.d: 9 dmd/src/dmd/root/speller.d: 132 dmd/src/dmd/root/string.d: 113 dmd/src/dmd/root/stringtable.d: 182 dmd/src/dmd/root/strtold.d: 284 dmd/src/dmd/root/utf.d: 136 dmd/src/dmd/s2ir.d: 878 dmd/src/dmd/safe.d: 91 dmd/src/dmd/sapply.d: 54 dmd/src/dmd/scanelf.d: 173 dmd/src/dmd/scanmach.d: 195 dmd/src/dmd/scanmscoff.d: 164 dmd/src/dmd/scanomf.d: 265 dmd/src/dmd/semantic2.d: 396 dmd/src/dmd/semantic3.d: 888 dmd/src/dmd/sideeffect.d: 178 dmd/src/dmd/statement.d: 660 dmd/src/dmd/statement_rewrite_walker.d: 71 dmd/src/dmd/statementsem.d: 2645 dmd/src/dmd/staticassert.d: 20 dmd/src/dmd/staticcond.d: 254 dmd/src/dmd/stmtstate.d: 73 dmd/src/dmd/strictvisitor.d: 222 dmd/src/dmd/target.d: 815 dmd/src/dmd/templateparamsem.d: 88 dmd/src/dmd/tocsym.d: 407 dmd/src/dmd/toctype.d: 141 dmd/src/dmd/tocvdebug.d: 689 dmd/src/dmd/todt.d: 842 dmd/src/dmd/toir.d: 529 dmd/src/dmd/tokens.d: 198 dmd/src/dmd/toobj.d: 747 dmd/src/dmd/traits.d: 1219 dmd/src/dmd/transitivevisitor.d: 481 dmd/src/dmd/typesem.d: 2798 dmd/src/dmd/typinf.d: 123 dmd/src/dmd/utils.d: 132 dmd/src/dmd/visitor.d: 117 dmd/src/dmd/vsoptions.d: 384 dmd/src/vcbuild/msvc-lib.d: 26 total: 174122
Feb 12 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
None of the C or C++ code is part of dmd; it is there to interface with the C 
backends of gdc and ldc. dmd is 100% D.
Feb 11 2022
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 11, 2022 at 05:44:45PM +0000, Stanislav Blinov via Digitalmars-d wrote:
 On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:
 
 I pulled just this week, and running `wc` on *.d *.c *.h says...
https://github.com/AlDanial/cloc would yield a more practical metric, at least as far as "practical metric" in terms of LoC goes.
I'm skeptical of any LoC metric.

T

-- 
What do you mean the Internet isn't filled with subliminal messages? What about all those buttons marked "submit"??
Feb 11 2022
parent reply rempas <rempas tutanota.com> writes:
On Friday, 11 February 2022 at 18:13:34 UTC, H. S. Teoh wrote:
 I'm skeptical of any LoC metric.


 T
This reminds me of what Walter said before! It is actually so simple that I don't understand what's so hard about it!

```
int val = 200; // This is a line of code

// This is a comment

/* This is a comment
This counts as a comment too! */

int function_test() {
  int v = 10;
}
```

The above has:
Lines of code: 4
Empty lines: 3
Comments: 2

Don't we all agree that this is how we should count it?
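A sketch of that counting scheme (counting each `//` line and each `/* */` block as one comment, matching the totals above):

```python
def count_lines(source):
    """Count (code, empty, comment) where a comment is either one //
    line or one whole /* ... */ block, and everything else is code."""
    code = empty = comments = 0
    in_block = False
    for line in source.splitlines():
        s = line.strip()
        if in_block:                  # still inside a /* */ block
            if "*/" in s:
                in_block = False
        elif not s:
            empty += 1
        elif s.startswith("//"):
            comments += 1
        elif s.startswith("/*"):
            comments += 1
            if "*/" not in s:
                in_block = True
        else:
            code += 1                 # trailing // on a code line is still code
    return code, empty, comments

src = """int val = 200; // This is a line of code

// This is a comment

/* This is a comment
This counts as a comment too! */

int function_test() {
  int v = 10;
}"""
print(count_lines(src))  # prints (4, 3, 2)
```

Of course, as soon as comments nest, appear inside string literals, or start mid-line, this "simple" scheme needs a real lexer, which is part of why tools disagree.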
Feb 11 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 11, 2022 at 08:23:10PM +0000, rempas via Digitalmars-d wrote:
 On Friday, 11 February 2022 at 18:13:34 UTC, H. S. Teoh wrote:
 I'm skeptical of any LoC metric.
[...]
 This reminds me of what Walter said before! It is actually so simple
 that I don't understand what's so hard about it!
[...] It's not that it's *hard*. It's pretty straightforward, and everybody knows what it means. The problem is the mostly-unfounded *interpretations* that people put on it.

In the bad ole days, LoC used to be a metric used by employers to measure their programmers' productivity. (I *hope* they don't do that anymore, but you never know...) Which is completely ridiculous, because the amount of code you write has very little correlation with the amount of effort you put into it. It's trivial to write 1000 lines of sloppy boilerplate code that accomplishes little; it's a lot harder to condense that into 50 lines of code that does the same thing 10x faster and with 10% of the memory requirements.

One of the hardest bug fixes I've done at my job involved a 1-line fix for a subtle race condition that took 3+ months to track down and identify. I guess they should fire me for non-productivity, because by the LoC metric I've done almost zero work in that time. Good luck with the race condition, though; adding another 1000 LoC to the code ain't getting rid of the race, it'd only obscure it even further and make it just about impossible to find and fix. And some of my best bug fixes involve *deleting* poorly-written redundant code and writing a much shorter replacement. I guess they should *really* fire me for that, because by the LoC metric I've not only been unproductive, but *counter*productive. :-P

By the above, it should be clear that the assumption that LoC is a good measure of complexity is an unfounded one. If project A has 10000 LoC and project B has 10000 LoC, does it mean they are of equal complexity? Hardly. Project A could be mostly boilerplate, copy-pasta, redundant code, and poorly-implemented, poorly-chosen O(n^2) algorithms, which has 10000 LoC simply because there's so much useless redundancy. Project B could be a collection of fine-tuned, hand-optimized professional algorithms that could do a LOT under the hood, and it has 10000 LoC because it actually has a large number of algorithms implemented, and was able to fit them all into 10000 LoC because each individual piece was written to be as concise as needed to express the algorithm and no more. In terms of actual complexity, project A might as well be kindergarten-level compared to project B's PhD sophistication. What does their respective LoC tell us about their complexity? Basically nothing.

And don't even get me started on code quality vs. LoC. An IOCCC entry can easily fit an entire flight simulator into a single page of code, for example. Don't expect anybody to be able to read it, though (not even the author :-D). A more properly-written flight simulator would occupy a lot more than a single page of code, but in terms of complexity, they'd be about the same, give or take. But by the LoC metric, the two ought to be so far apart they should be completely unrelated to each other. Again, the value of LoC as a metric here is practically nil.

--T
Feb 11 2022
next sibling parent reply rempas <rempas tutanota.com> writes:
On Friday, 11 February 2022 at 22:08:57 UTC, H. S. Teoh wrote:
 [It's not that it's *hard*... practically nil.]


 --T
I hear you loud and clear! It's very funny how "professionals" and their companies work worse than most hobbyist programmers. This is why I don't want to become a "professional" and work for a company, and why I FUCKING HATE when everyone talks about programming based on what's popular and what you should learn to get a "job". Fuck this shit! I remember someone telling me the same thing when we were discussing Qt and I said how bloated it is, and the guy said that this is probably due to this reason (as even though Qt offers free licenses, a company is behind it). I haven't written almost anything, but even with the few things that I tried, I would always see how many things I could do with so few lines of code, and I would always wonder how some projects take hundreds of thousands of lines of code or even millions! Like, wtf are they doing? Even software that is minimal (see suckless) still does about 80% of what the other "big and complete" software does with about 10% of the codebase, so bloatware is a thing no matter how you see it! You can't explain these numbers away!
Feb 11 2022
parent reply forkit <forkit gmail.com> writes:
On Saturday, 12 February 2022 at 06:29:37 UTC, rempas wrote:
 I haven't written almost anything ...
 ...
umm..reasoning that involves negation is extremely difficult. Walter will not be happy.
Feb 11 2022
parent rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 07:06:21 UTC, forkit wrote:
 umm..reasoning that involves negation is extremely difficult.
Of course, I was saying that to justify my idea about software being bloated: there will always be something that does 80% of it with just 10% of the code-base. Also, I'm just gonna try my first (and hopefully last) book about Compiler Design, and if it succeeds and I'm able to make a full compiler (including a linker), then I may even offer to make a backend for D in case someone wants to work on the backend. Or maybe Walter and the other folks will want to adopt it and make it the official backend of DMD. In any case, I would be glad to offer my help if that means improving D!
 Walter will not be happy.
Given the fact that Walter has to actually do real work and at the same time he's here answering every single crap question that we ask (I post the most crap, not gonna lie), I'm really impressed by his ability to stay calm. Makes me appreciate him more!
Feb 11 2022
prev sibling parent user1234 <user1234 12.de> writes:
On Friday, 11 February 2022 at 22:08:57 UTC, H. S. Teoh wrote:
 On Fri, Feb 11, 2022 at 08:23:10PM +0000, rempas via 
 Digitalmars-d wrote:
 [...]
[...]
 [...]
[...] In the bad ole days, LoC used to be a metric used by employers to measure their programmers' productivity. (I *hope* they don't do that anymore, but you never know...) [...]
That's why I said earlier that OpenHub is old trash. Their estimation [for DMD](https://www.openhub.net/p/dmd/estimated_cost) is based on a model from **the late 70's**.
Feb 12 2022
prev sibling parent user1234 <user1234 12.de> writes:
On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:
 On Fri, Feb 11, 2022 at 04:47:46PM +0000, user1234 via 
 Digitalmars-d wrote:
 On Friday, 11 February 2022 at 16:41:33 UTC, user1234 wrote:
 On Friday, 11 February 2022 at 15:17:16 UTC, rempas wrote:
 [...]
Openhub and their metrics are old trash. It's more like 170K according to D-Scanner.
wait... it's 175K. I hadn't pulled for 8 months or so. There's much new code that was committed since, notably with ImportC.
I pulled just this week, and running `wc` on *.d *.c *.h says there are 365K lines. I'm not sure what the *.h files are for,
Ah yes, the .h files... D-Scanner does not take them into account. They are still used by GDC, I believe.
Feb 11 2022
prev sibling next sibling parent reply rempas <rempas tutanota.com> writes:
On Friday, 11 February 2022 at 16:47:46 UTC, user1234 wrote:
 Openhub and their metrics are old trash. It's more 170K 
 according to D-Scanner.
wait... it's 175K. I hadn't pulled for 8 months or so. There's much new code that was committed since, notably with ImportC.
Thank you for the information! It seems pretty impressive to me that DMD only has 175K LoC in its code base, given how huge D is! Even without the recent commits (and how many could they be?), this seems too little to me. In that case, we can talk about re-writing it, but again, that's up to the developers to decide.
Feb 11 2022
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 11, 2022 at 08:00:14PM +0000, rempas via Digitalmars-d wrote:
[...]
 Thank you for the information! It seems pretty impressive to me that
 DMD only has 175K LoC in its code base, given how huge D is! Even
 without the recent commits (and how many could they be?), this seems
 too little to me. In that case, we can talk about re-writing it, but
 again, that's up to the developers to decide.
https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/ T -- "I'm running Windows '98." "Yes." "My computer isn't working now." "Yes, you already said that." -- User-Friendly
Feb 11 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
The backend is currently 127,748 lines of code, including the optimizer.
Feb 11 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/11/2022 6:52 AM, max haughton wrote:
 The object emission code in the backend is quite inefficient,
It's faster than any other compiler.
 it needs to be rewritten (it's horrible old code anyway)
I suppose that depends on what you're used to.

The basic design is pretty simple - there's a code gen function for each expression node type. The optimizer uses standard data flow analysis math. There's a separate pass for register allocation, and one for scheduling.

The design was originally written for the 8086. It survived extension to 32 bits, then 64 bits, then SIMD. The complexity comes from the complexity of the x86 instruction set, and the choice of instructions is very dependent on the shape of the expression trees. The only thing it has really failed at is the x87, which everyone wants to leave behind anyway.
Feb 11 2022
next sibling parent reply forkit <forkit gmail.com> writes:
On Saturday, 12 February 2022 at 07:13:15 UTC, Walter Bright 
wrote:
 On 2/11/2022 6:52 AM, max haughton wrote:
 The object emission code in the backend is quite inefficient,
It's faster than any other compiler.
It sure is! That's the primary reason I became interested in D - the speed of compilation using dmd, compared to all later versions of VS (including VS2022) - at least in my experience. I don't care how great a programming language is, slow compilation is a real turn off! Hooray for dmd!!
Feb 11 2022
next sibling parent reply rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 07:51:38 UTC, forkit wrote:
 I don't care how great a programming language is, slow 
 compilation is a real turn off!

 Hooray for dmd!!
Thank you! That's the reason I don't use Rust at all, for anything (even if it is so popular and has so much support). That's also the reason I am OBSESSED with TCC and was inspired to learn about compilers and make my own. Funny enough, there are people that don't care about compilation speed and are willing to have their project compile even twice as slow in exchange for 5% runtime performance. The same people, of course, don't have any problem using Python in other cases...
Feb 12 2022
parent reply forkit <forkit gmail.com> writes:
On Saturday, 12 February 2022 at 08:12:03 UTC, rempas wrote:
 Funny enough there are people that don't care about compilation 
 speed and are willing to have their project compile even twice 
 as fast for 5% runtime performance. The same people of course 
 don't have any problem using Python in other cases...
Yeah .. users... erh...

.. but compiler writers are a different breed altogether. (well, they used to be anyway)

It used to be that the golden rule of compiler writers was "performance is (almost) everything". i.e.

- Compile time performance -> how long it takes to generate code.
- Runtime performance -> how fast that code runs.

(Almost) nothing else used to matter (to compiler writers). Why almost? Cause in the end, you need accurate results more than you need speed. (ref: Expert C Programming - P van der Linden 1994)

I see the performance of (other) compilers these days, and I wonder.. whatever happened to that breed of compiler writers... from long ago...

Luckily, we still have one of them.
Feb 12 2022
next sibling parent rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 11:04:48 UTC, forkit wrote:
 Yeah .. users... erh...

 .. but compiler writers are a different breed all together.

 (well, they used to be anyway)

 It used to be, that the golden rule of compiler writers was 
 "performance is (almost) everything".

 i.e.

 - Compile time performance -> how long it takes to generate 
 code.

 - Runtime performance -> how fast that code runs.

 (Almost) nothing else used to matter (to compiler writers)

 Why almost? Cause in the end, you need accurate results more 
 than you need speed.

 (ref: Expert C Programming -  P van der Linden 1994)

 I see the performance of (other) compilers these days, and I 
 wonder.. whatever happened to that breed of compiler 
 writers... from long ago...
Makes total sense to me. It's the same way people choose Python (or C++ or Rust or JS or whatever) over C: because runtime performance is not the only thing that matters. Development speed matters too. Super fast compilation times will allow the dream of Gentoo and the *BSDs to come true, and everyone will be able to compile everything from source, with all the advantages this offers.

Another thing to mention is that I was also obsessed with the compiler that generates the code that "runs faster" in the past, but then I realized something. Runtime performance is a really, really, really complicated topic! First of all, runtime performance may not (and probably will not) be critical every time to begin with. But development time will always show! A compiler that builds my code fast and allows me to save-and-run as much as I want, a compiler that manages the memory for me because I will make mistakes as I'm human, a compiler that does immutability by default (because again, humans make mistakes), a compiler that will allow me to express myself the way I want and focus all my time on actually solving the problem rather than finding a way to bypass the language's limitations, etc. THIS IS what matters the most!

Even when runtime performance is important, the optimizations that the compiler does will mostly not offer you more than 20% runtime performance, so what you should do is either use faster algorithms and/or change the design of your program (and maybe remove some unnecessary features). I finally understand that now! I don't chase pure raw compiler-optimization runtime performance but good/smart program designs! Of course, I want my compiler to not generate unnecessary instructions, but again, MY design is what will make the program faster.

Unfortunately, we live in a generation where people are OBSESSED with numbers, ignoring their meaning and what's behind them! I don't want to throw around big words either! I have learned, and I'm still learning day by day, and I'm (hopefully) getting better!
 Luckily, we still have one of them.
We have a couple of people that think this way. Which one do you refer to?
Feb 12 2022
prev sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Saturday, 12 February 2022 at 11:04:48 UTC, forkit wrote:
 ...

 I see the performance of (other) compilers these days, and I 
 wonder.. whatever happened to that breed of compiler 
 writers... from long ago...
They went on to create Eiffel, Delphi, .NET, Java, V8, GraalVM (née Maxine), OCaml, Go and Dart.
Feb 12 2022
parent reply rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 15:54:29 UTC, Paulo Pinto wrote:
 They went on to create Eiffel, Delphi, .NET, Java, V8, GraalVM 
 (née Maxine), OCaml, Go and Dart.
Half of the things you mentioned are interpreters. Bytecode, JITs, call them whatever you want; not true compilers/transpilers that produce a native binary. Dart doesn't compile fast (or why do I have this idea?) and Go is very fast but isn't extremely fast (and was a very annoying and limited language the last time I checked). Not to offend anything and anyone here, but these examples don't do it for me and probably for most people here.
Feb 12 2022
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Saturday, 12 February 2022 at 19:06:55 UTC, rempas wrote:
 On Saturday, 12 February 2022 at 15:54:29 UTC, Paulo Pinto 
 wrote:
 They went on to create Eiffel, Delphi, .NET, Java, V8, GraalVM 
 (née Maxine), OCaml, Go and Dart.
Half of the things you mentioned are interpreters. Bytecode, JITs, call them whatever you want; not true compilers/transpilers that produce a native binary. Dart doesn't compile fast (or why do I have this idea?) and Go is very fast but isn't extremely fast (and was a very annoying and limited language the last time I checked). Not to offend anything and anyone here, but these examples don't do it for me and probably for most people here.
That only shows how little you know of them, and the available toolchains. If you want to actually educate yourself about them, there is plenty of material available.
Feb 12 2022
parent rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 19:18:12 UTC, Paulo Pinto wrote:
 That only shows how little you know of them, and the available 
 toolchains.

 If you want to actually educate yourself about them, there is 
 plenty of material available.
You are right in what you are saying, but what made you say that from my comment? I suppose that I was wrong about Dart, and Go has probably gotten better. But other than that, what was my mistake? I'm not saying that to be ironic, I really want to see your point of view. You understand that I can learn about 2-3 languages, but I cannot do research on every language you listed. Thank you!
Feb 12 2022
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Feb 12, 2022 at 07:51:38AM +0000, forkit via Digitalmars-d wrote:
[...]
 That's the primary reason I became interested in D - the speed of
 compilation, using dmd.
[...]
 I don't care how great a programming language is, slow compilation is
 a real turn off!
 
 Hooray for dmd!!
I use dmd for the code-compile-test cycle because of the fast turnaround. For small programs dmd is so fast it's almost like programming in a scripting language(!). For larger programs it's less so, but still impressively fast for compile times.

Runtime performance of executables compiled by dmd, however, is a disappointment. I consistently get 20%-40% runtime performance improvement by compiling with ldc/gdc, esp. for CPU-intensive programs.

So my usual workflow is dmd for code-compile-test, ldc -O2 for release builds.

T -- Amateurs built the Ark; professionals built the Titanic.
Feb 12 2022
parent reply rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 16:13:55 UTC, H. S. Teoh wrote:
 I use dmd for the code-compile-test cycle because of the fast 
 turnaround. For small programs dmd is so fast it's almost like 
 programming in a scripting language(!). For larger programs 
 it's less so, but still impressively fast for compile times.

 Runtime performance of executables compiled by dmd, however, is 
 a disappointment.  I consistently get 20%-40% runtime 
 performance improvement by compiling with ldc/gdc, esp. for 
 CPU-intensive programs.

 So my usual workflow is dmd for code-compile-test, ldc -O2 for 
 release builds.


 T
If you get to a point where runtime becomes too slow for a specific task, then I don't think that 20%-40% will make such a big difference really. There may be cases where even the smallest performance boost will make the difference, but were there a lot of those in your experience? The funny stuff is that I may be stupid and talking about things I don't have experience with, but I'm just talking with logic in mind, so if I'm wrong then please make sure to properly correct me and tell me your experience on this topic. Thank you!
Feb 12 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Feb 12, 2022 at 07:31:28PM +0000, rempas via Digitalmars-d wrote:
[...]
 If you get to a point that runtime becomes too slow for a specific
 task then I don't think that 20%-40% will make such of a big
 difference really. There may be cases that even the smallest
 performance boost will make the difference but were that a lot in your
 experience?
[...] 20%-40% is a HUGE difference. Think about a 60fps 3D game where you have only 16ms to update the screen for the next frame. If your code takes ~13ms to update a frame when compiled with LDC -O2, then compiling with dmd will not even be an option, because it would not be able to meet the framerate and the game will be jerky and unplayable. If the difference is 2% or 3% then there may still be room for negotiation. 20%-40% is half an order of magnitude. There is no way you can compromise with that.

Also, for long-running CPU-intensive computations, which one would you rather have: your complex computation finishing in 2 days, which may just make the deadline, or ~4 days, which will definitely *not* meet the deadline? Again, if the difference is 2% or 3% then you may still be able to work with it. 20%-40% is unacceptable.

T -- Written on the window of a clothing store: No shirt, no shoes, no service.
Feb 12 2022
next sibling parent rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 20:22:44 UTC, H. S. Teoh wrote:
 20%-40% is a HUGE difference. Think about a 60fps 3D game where 
 you have only 16ms to update the screen for the next frame. If 
 your code takes ~13ms to update a frame when compiled with LDC 
 -O2, then compiling D will not even be an option because it 
 would not be able to meet the framerate and the game will be 
 jerky and unplayable.  If the difference is 2% or 3% then there 
 may still be room for negotiation. 20%-40% is half an order of 
 magnitude. There is no way you can compromise with that.

 Also, for long-running CPU-intensive computations, which one 
 would you rather have: your complex computation to finish in 2 
 days, which may just make the deadline, or ~4 days, which will 
 definitely *not* meet the deadline?  Again, if the difference 
 is 2% or 3% then you may still be able to work with it. 20%-40% 
 is unacceptable.


 T
Game dev was the thing I was sure about, and the first thing that comes to mind when we talk about runtime performance. The second example was a good one too! Thank you!
Feb 12 2022
prev sibling parent max haughton <maxhaton gmail.com> writes:
On Saturday, 12 February 2022 at 20:22:44 UTC, H. S. Teoh wrote:
 On Sat, Feb 12, 2022 at 07:31:28PM +0000, rempas via 
 Digitalmars-d wrote: [...]
 If you get to a point that runtime becomes too slow for a 
 specific task then I don't think that 20%-40% will make such 
 of a big difference really. There may be cases that even the 
 smallest performance boost will make the difference but were 
 that a lot in your experience?
[...] 20%-40% is a HUGE difference. [...] Again, if the difference is 2% or 3% then you may still be able to work with it. 20%-40% is unacceptable. T
The thing with dmd isn't just the performance; it's also quite buggy when it starts optimizing. Quite a few libraries have a gotcha that has to be worked around due to dmd (*especially* `-inline`) - the inliner can basically ignore language semantics, which can break NRVO, for example.
Feb 12 2022
prev sibling next sibling parent reply rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 07:13:15 UTC, Walter Bright 
wrote:
 The complexity comes from the complexity of the x86 instruction 
 set and the choice of instructions is very dependent on the 
 shape of the expression trees.
You could email the creator of Vox and ask him about the general structure of Vox and about tricks with x86_64-specific stuff, as this is what Vox targets, so he may be specialized in this ISA and know stuff that you don't (which may also improve the runtime performance of programs compiled with DMD). Of course, in case you didn't check, the source of Vox is written in D and it is 36K LoC (at least that's what the README.md says), so you could also have a look (I did, and it even looks readable to a n00b like me). If I knew assembly and machine language (and compiler design in general), I would do it myself to save you some time and then directly email you, but unfortunately I'm not able to do that now. But tbh, DMD is very fast as it is now, given the fact that it does optimizations (Vox and TCC don't do any, if I'm not mistaken). And my post was meant to start a discussion and see what others think about this topic, not to say that DMD is slow, cause that would be a lie ;)
 The only thing it has really failed at is the x87, which 
 everyone wants to leave behind anyway.
Why, what was bad about it? Can I get a little background on this one?
Feb 12 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/12/2022 12:08 AM, rempas wrote:
 The only thing it has really failed at is the x87, which everyone wants to 
 leave behind anyway.
Why, what was bad about it? Can I get a little background on this one?
It doesn't assign variables to x87 registers. The reason it doesn't is because the x87 is a stack machine, meaning the registers all shift position. There is a way to fix that by using FXCH instructions, but I never got around to doing that.
Feb 12 2022
parent rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 09:00:18 UTC, Walter Bright 
wrote:
 It doesn't assign variables to x87 registers. The reason it 
 doesn't is because the x87 is a stack machine, meaning the 
 registers all shift position.

 There is a way to fix that by using FXCH instructions, but I 
 never got around to doing that.
Thanks for the info! Yeah, I agree with you! It seems we should all forget about the x87 then.
Feb 12 2022
prev sibling parent reply max haughton <maxhaton gmail.com> writes:
On Saturday, 12 February 2022 at 07:13:15 UTC, Walter Bright 
wrote:
 On 2/11/2022 6:52 AM, max haughton wrote:
 The object emission code in the backend is quite inefficient,
It's faster than any other compiler.
 it needs to be rewritten (it's horrible old code anyway)
I suppose that depends on what you're used to. The basic design is pretty simple - there's a code gen function for each expression node type. The optimizer uses standard data flow analysis math. There's a separate pass for register allocation, and one for scheduling. The design was originally written for the 8086. It survived extension to 32 bits, then 64 bits, then SIMD. The complexity comes from the complexity of the x86 instruction set and the choice of instructions is very dependent on the shape of the expression trees. The only thing it has really failed at is the x87, which everyone wants to leave behind anyway.
I'm specifically talking about the file that handles ELF files; it's very messy and uses some absolutely enormous structs, which are naturally very slow by virtue of their size.
Feb 12 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/12/2022 9:20 AM, max haughton wrote:
 I'm specifically talking about the file that handles elf files, it's
 very messy and uses some absolutely enormous structs which are
 naturally very slow by virtue of their size.
The elf generator was written nearly 30 years ago, and has never been refactored properly to modernize it. It could sure use it, but I'm not so sure it would speed things up noticeably. If you want to take a crack at it, feel free!
Feb 12 2022
parent max haughton <maxhaton gmail.com> writes:
On Sunday, 13 February 2022 at 00:41:38 UTC, Walter Bright wrote:
 On 2/12/2022 9:20 AM, max haughton wrote:
 I'm specifically talking about the file that handles elf 
 files, it's very messy and uses some absolutely enormous 
 structs which are naturally very slow by virtue of their size.
The elf generator was written nearly 30 years ago, and has never been refactored properly to modernize it. It could sure use it, but I'm not so sure it would speed things up noticeably. If you want to take a crack at it, feel free!
It's on my list. The reason why it's slow is that the structs are very large compared to a cache line, so the CPU has to pull in (optimistically; the CPU might pull in several lines at once) 64 bytes but only uses about 10 of them in a given iteration. There is an O(n^2) algorithm in there, but I'm not sure it's a particularly big N in normal programs.
Feb 12 2022
prev sibling next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 11 February 2022 at 12:34:21 UTC, rempas wrote:
 That's nice to hear! However, does DMD generate object files 
 directly or "asm" files that are passed to a C compiler? If I 
 remember correctly, LDC2 needs to pass the output to a C 
 compiler, as people told me, so what's the case for DMD?
DMD goes from its own backend block tree to an object file, without writing assembly. In fact, only recently was the ability to output asm added for debugging purposes: https://dlang.org/blog/2022/01/24/the-binary-language-of-moisture-vaporators/ On Linux dmd invokes gcc by default to create an executable, but only to link the resulting object files, not to compile C/assembly code. LDC goes from LLVM IR to machine code, but it can output assembly with the `-output-s` flag. GDC does generate assembly text to the tmp folder and then invokes `gas` the GNU assembler, it can't directly write machine code.
Feb 11 2022
next sibling parent rempas <rempas tutanota.com> writes:
On Friday, 11 February 2022 at 16:40:42 UTC, Dennis wrote:
 DMD goes from its own backend block tree to an object file, 
 without writing assembly. In fact, only recently was the 
 ability to output asm added for debugging purposes:
 https://dlang.org/blog/2022/01/24/the-binary-language-of-moisture-vaporators/

 On Linux dmd invokes gcc by default to create an executable, 
 but only to link the resulting object files, not to compile 
 C/assembly code.

 LDC goes from LLVM IR to machine code, but it can output 
 assembly with the `-output-s` flag.

 GDC does generate assembly text to the tmp folder and then 
 invokes `gas` the GNU assembler, it can't directly write 
 machine code.
Thank you! This sums it up perfectly! Can you choose to pass it directly to the linker with DMD on Linux? Something like setting "ld" (or another linker of course) as the "C" compiler, idk...
Feb 11 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/11/2022 8:40 AM, Dennis wrote:
 In fact, only recently was the ability to output asm added for 
 debugging purposes:
 https://dlang.org/blog/2022/01/24/the-binary-language-of-moisture-vaporators/
I use that dumb feature most every day. It's the most productivity enhancing feature I've added in a long time. For example, I formerly wrote:

    import core.stdc.stdio;
    int main() {
       printf("%d\n", expression);
       return 0;
    }

    dmd test
    ./test

every time I wanted to see what `expression` evaluated to. Now I just do:

    int test() { return expression; }

    dmd -c test -vasm

Shazzam! Mucho less typitty-tippity-tip-typing!
Feb 11 2022
parent reply rempas <rempas tutanota.com> writes:
On Saturday, 12 February 2022 at 07:08:17 UTC, Walter Bright 
wrote:
 I use that dumb feature most every day. It's the most 
 productivity enhancing feature I've added in a long time.

 For example, I formerly wrote:

     import core.stdio;
     int main() {
        printf("%d\n", expression);
        return 0;
     }

     dmd test
     ./test

 every time I wanted to see what `expression` evaluated to. Now 
 I just do:

    int test() { return expression; }

    dmd -c test -vasm

 Shazzam! Mucho less typitty-tippity-tip-typing!
THANK YOU!!! Every compiler needs that, even the ones that can output binary formats directly (TCC, I'm talking to you!), because it is easier to see the assembly than to imagine the instructions in your head; humans are known to make mistakes. Thank you for adding this Walter!
Feb 11 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/11/2022 11:44 PM, rempas wrote:
 THANK YOU!!! Every compiler needs that even if they can output binary formats directly (TCC I'm talking to you!) because it is easier to see the assembly rather than imagine the instructions in your head as humans are known to make mistakes. Thank you for adding this Walter!
You're quite welcome!
Feb 12 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/11/2022 4:34 AM, rempas wrote:
 That's nice to hear! However, does DMD generate object files directly
It generates object files directly. No "asm" step. The intermediate code is converted directly to machine code.
 Do you think that there are any very bad places in DMD's backend? Has anyone in the team thought about re-writing the backend (or parts of it) from the beginning?
It has evolved over time, but the basic design has held up very well. The main difficulty is the very complex nature of the x86 CPU, which leads to endless special cases.
Feb 11 2022
prev sibling next sibling parent reply Dave P. <dave287091 gmail.com> writes:
On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
 [...]
I think it would be interesting to combine a compiler and a linker into a single executable. Not necessarily for speed reasons, but for better diagnostics and the possibility of type checking external symbols. Linker errors can sometimes be hard to understand in the presence of inlining and optimizations. The linker will report references to symbols not present in your code or present in completely different places.

For example:

```D
extern(D) int some_func(int x);

pragma(inline, true)
private int foo(int x){
    return some_func(x);
}

pragma(inline, true)
private int bar(int x){
    return foo(x);
}

pragma(inline, true)
private int baz(int x){
    return bar(x);
}

pragma(inline, true)
private int qux(int x){
    return baz(x);
}

int main(){
    return qux(2);
}
```

When you go to compile it:

```sh
Undefined symbols for architecture arm64:
  "__D7example9some_funcFiZi", referenced from:
      __D7example3fooFiZi in example.o
      __D7example3barFiZi in example.o
      __D7example3bazFiZi in example.o
      __D7example3quxFiZi in example.o
      __Dmain in example.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Error: /usr/bin/cc failed with status: 1
```

The linker sees references to the extern function in places where I never wrote that in my source code. In a nontrivial project this can be quite confusing if you’re not used to this quirk of the linking process.

If the compiler is invoking the linker for you anyway, why can’t it read the object files and libraries and tell you exactly what is missing and where in your code you reference it?
Feb 10 2022
next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Thursday, 10 February 2022 at 22:06:30 UTC, Dave P. wrote:
 On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
 [...]
I think it would be interesting to combine a compiler and a linker into a single executable. Not necessarily for speed reasons, but for better diagnostics and the possibility of type checking external symbols. Linker errors can sometimes be hard to understand in the presence of inlining and optimizations. The linker will report references to symbols not present in your code or present in completely different places. [...]
This goes away if you do a debug build, which most people (all the professionals I'm aware of) do. And why should the compiler do something the linker is going to do anyway? It would have to wait until after linking anyway, because you might want a symbol to be defined somewhere else.
Feb 10 2022
parent Dave P. <dave287091 gmail.com> writes:
On Thursday, 10 February 2022 at 22:11:13 UTC, max haughton wrote:
 On Thursday, 10 February 2022 at 22:06:30 UTC, Dave P. wrote:
 On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
 [...]
I think it would be interesting to combine a compiler and a linker into a single executable. Not necessarily for speed reasons, but for better diagnostics and the possibility of type checking external symbols. Linker errors can sometimes be hard to understand in the presence of inlining and optimizations. The linker will report references to symbols not present in your code or present in completely different places. [...]
This goes away if you do a debug build, which most (all professionals I'm aware of) people do.
That *is* a debug build:

```sh
ldc2 example.d -O0
Undefined symbols for architecture arm64:
  "__D7example9some_funcFiZi", referenced from:
      __D7example3fooFiZi in example.o
      __D7example3barFiZi in example.o
      __D7example3bazFiZi in example.o
      __D7example3quxFiZi in example.o
      __Dmain in example.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Error: /usr/bin/cc failed with status: 1
```

I’m used to it at this point, but for people new to the C-style model of separate compilation it is extremely confusing. It’s made worse by the name mangling required to get C linkers to link code from more modern languages.
 And why should the compiler do something the linker is going to 
 do anyway? It would have to wait until after linking anyway 
 because you might want a symbol to be defined somewhere else.
You would still give the compiler libraries if you wanted them defined elsewhere and in my idea the compiler would also be the linker so there is no “the linker is going to do anyway”.
Feb 10 2022
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/10/2022 2:06 PM, Dave P. wrote:
 Undefined symbols for architecture arm64:
    "__D7example9some_funcFiZi", referenced from:
        __D7example3fooFiZi in example.o
        __D7example3barFiZi in example.o
        __D7example3bazFiZi in example.o
        __D7example3quxFiZi in example.o
        __Dmain in example.o
 ld: symbol(s) not found for architecture arm64
Things I have never been able to explain, even to long time professional programmers:

1. what "undefined symbol" means
2. what "multiply defined symbol" means
3. how linkers resolve symbols

Our own runtime library illustrates this bafflement. In druntime, there are these "hooks" where one can replace the default function that deals with assertion errors. Such hooks are entirely unnecessary. To override a symbol in a library, just write your own function with the same name and link it in before the library.

I have never been able to explain these to people. I wonder if it is because it is so simple, people think "that can't be right". With the hook thing, they'll ask me to re-explain it several times, then they'll say "are you sure?" and they still don't believe it.
Feb 10 2022
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 11/02/2022 11:52 AM, Walter Bright wrote:
 I have never been able to explain these to people. I wonder if it is 
 because it is so simple, people think "that can't be right". With the 
 hook thing, they'll ask me to re-explain it several times, then they'll 
 say "are you sure?" and they still don't believe it.
It does depend on a few factors: the compiler, linker, and build/package manager all playing along. Not to mention shared library support being good enough, with the common use cases clearly described. For me personally there are enough unknowns in the general case that I would avoid using it in production.
Feb 10 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/10/2022 4:03 PM, rikki cattermole wrote:
 On 11/02/2022 11:52 AM, Walter Bright wrote:
 I have never been able to explain these to people. I wonder if it is because 
 it is so simple, people think "that can't be right". With the hook thing, 
 they'll ask me to re-explain it several times, then they'll say "are you 
 sure?" and they still don't believe it.
It does depend on a few factors. Compiler, linker, build/package manager all playing along. Not to mention shared library support actually good enough with clear common use cases all described. For me personally there are a few unknowns for the general case that I would avoid using it in production.
All linkers work this way.
Feb 10 2022
prev sibling next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Thursday, 10 February 2022 at 22:52:45 UTC, Walter Bright 
wrote:
 On 2/10/2022 2:06 PM, Dave P. wrote:
 Undefined symbols for architecture arm64:
    "__D7example9some_funcFiZi", referenced from:
        __D7example3fooFiZi in example.o
        __D7example3barFiZi in example.o
        __D7example3bazFiZi in example.o
        __D7example3quxFiZi in example.o
        __Dmain in example.o
 ld: symbol(s) not found for architecture arm64
Things I have never been able to explain, even to long time professional programmers: 1. what "undefined symbol" means 2. what "multiply defined symbol" means 3. how linkers resolve symbols Our own runtime library illustrates this bafflement. In druntime, there are these "hooks" where one can replace the default function that deals with assertion errors. Such hooks are entirely unnecessary. To override a symbol in a library, just write your own function with the same name and link it in before the library. I have never been able to explain these to people. I wonder if it is because it is so simple, people think "that can't be right". With the hook thing, they'll ask me to re-explain it several times, then they'll say "are you sure?" and they still don't believe it.
If by hook you mean a callback of sorts that can be overridden, then the problem solved is not strictly the same as a weakly defined function. If you have multiple libraries in the same playpen, it simply doesn't work to have them all trying to override the same symbols. If they can neatly hook and unhook things, that goes away.
Feb 10 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/10/2022 7:45 PM, max haughton wrote:
 If by hook you mean a callback of sorts that can be overrided, then the problem solved is not strictly the same as a weakly defined function. If you have multiple library's in the same playpen then it simply doesn't work to have them all trying to override the same symbols. If they can neatly hook and unhook things that goes away.
That's not how multiple libraries work.

Suppose you have 3 libraries, A, B, and C. You have an object file X. The linker command is:

    link X.obj A.lib B.lib C.lib

X refers to "foo". All 4 define "foo". Which one gets picked?

    X.foo

That's it. There are no unresolved symbols to look for.

Now, suppose only B and C define "foo". Which one gets picked?

    B.foo

because it is not in X. Then, A is looked at, and it is not in A. Then, B is looked at, and it is in B. C is not looked at because foo is now resolved.

It has nothing to do with weak definitions. It's simple: "foo" is referenced, so a definition must be found. Look in the libraries in the order they are supplied to the linker. That's it.

Want to not use the library definition? Define it yourself in X. No need for hooking. No need for anything clever at all. Just define it in your .obj file.

----

Now suppose X.obj and Y.obj both define foo. Link with:

    link X.obj Y.obj A.lib B.lib C.lib

You get a message:

    Multiple definition of "foo", found in X.obj and Y.obj

because order does not matter for .obj files as far as symbols go. All the symbols in .obj files get added.
Feb 10 2022
next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 11 February 2022 at 06:33:20 UTC, Walter Bright wrote:
 Now suppose X.obj and Y.obj both define foo. Link with:

     link X.obj Y.obj A.lib B.lib C.lib

 You get a message:

     Multiple definition of "foo", found in X.obj and Y.obj
Unless your compiler places all functions in COMDATs of course. https://github.com/dlang/dmd/blob/a176f0359a07fa5a252518b512f3b085a43a77d8/src/dmd/backend/backconfig.d#L303 https://issues.dlang.org/show_bug.cgi?id=15342
Feb 11 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/11/2022 1:42 AM, Dennis wrote:
 On Friday, 11 February 2022 at 06:33:20 UTC, Walter Bright wrote:
 Now suppose X.obj and Y.obj both define foo. Link with:

     link X.obj Y.obj A.lib B.lib C.lib

 You get a message:

     Multiple definition of "foo", found in X.obj and Y.obj
Unless your compiler places all functions in COMDATs of course. https://github.com/dlang/dmd/blob/a176f0359a07fa5a252518b512f3b085a43a77d8/src/dmd/backend/backconfig.d#L303 https://issues.dlang.org/show_bug.cgi?id=15342
Yes, common blocks (of which COMDATs are a kind) are all treated as identical and one is selected, but only if they have already been added by the linker. If the linker finds a COMDAT that resolves an undefined symbol, it does not go looking for another one. COMDATs came about because C++ has a proclivity to spew identical functions into multiple object files. D does, too.
Feb 11 2022
prev sibling next sibling parent Dennis <dkorpel gmail.com> writes:
On Friday, 11 February 2022 at 06:33:20 UTC, Walter Bright wrote:
 Now suppose X.obj and Y.obj both define foo. Link with:

     link X.obj Y.obj A.lib B.lib C.lib

 You get a message:

     Multiple definition of "foo", found in X.obj and Y.obj
Don't rely on this when using DMD though, since it likes to place all functions in COMDATs, meaning the linker will just pick one `foo` instead of raising an error. https://github.com/dlang/dmd/blob/a176f0359a07fa5a252518b512f3b085a43a77d8/src/dmd/backend/backconfig.d#L303 https://issues.dlang.org/show_bug.cgi?id=15342
Feb 11 2022
prev sibling next sibling parent sfp <sfp hush.ai> writes:
On Friday, 11 February 2022 at 06:33:20 UTC, Walter Bright wrote:
 On 2/10/2022 7:45 PM, max haughton wrote:
 If by hook you mean a callback of sorts that can be overrided, 
 then the problem solved is not strictly the same as a weakly 
 defined function. If you have multiple library's in the same 
 playpen then it simply doesn't work to have them all trying to 
 override the same symbols. If they can neatly hook and unhook 
 things that goes away.
That's not how multiple libraries work. Suppose you have 3 libraries, A, B, and C. You have an object file X. The linker command is: link X.obj A.lib B.lib C.lib X refers to "foo". All 4 define "foo". Which one gets picked? X.foo That's it. There are no unresolved symbols to look for. Now, suppose only B and C define "foo". Which one gets picked? B.foo because it is not in X. Then, A is looked at, and it is not in A. Then, B is looked at, and it is in B. C is not looked at because it is now resolved. It has nothing to do with weak definitions. It's a simple "foo" is referenced. Got to find a definition. Look in the libraries in the order they are supplied to the linker. That's it. Want to not use the library definition? Define it yourself in X. No need for hooking. No need for anything clever at all. Just define it in your .obj file. ---- Now suppose X.obj and Y.obj both define foo. Link with: link X.obj Y.obj A.lib B.lib C.lib You get a message: Multiple definition of "foo", found in X.obj and Y.obj because order does not matter for .obj files as far as symbols go. All the symbols in .obj files get added.
You have now successfully explained this to at least one programmer! :-) Very good explanation, and very simple mechanism indeed. Had no idea it worked this way. Inspired by this, I did a little searching and found this blog post: http://www.samanbarghi.com/blog/2014/09/05/how-to-wrap-a-system-call-libc-function-in-linux/ One of these days I should get around to learning all the things the toolchain can actually do for me!
Feb 11 2022
prev sibling parent reply max haughton <maxhaton gmail.com> writes:
On Friday, 11 February 2022 at 06:33:20 UTC, Walter Bright wrote:
 On 2/10/2022 7:45 PM, max haughton wrote:
 If by hook you mean a callback of sorts that can be overrided, 
 then the problem solved is not strictly the same as a weakly 
 defined function. If you have multiple library's in the same 
 playpen then it simply doesn't work to have them all trying to 
 override the same symbols. If they can neatly hook and unhook 
 things that goes away.
That's not how multiple libraries work. Suppose you have 3 libraries, A, B, and C. You have an object file X. The linker command is: link X.obj A.lib B.lib C.lib X refers to "foo". All 4 define "foo". Which one gets picked? X.foo That's it. There are no unresolved symbols to look for. Now, suppose only B and C define "foo". Which one gets picked? B.foo because it is not in X. Then, A is looked at, and it is not in A. Then, B is looked at, and it is in B. C is not looked at because it is now resolved. It has nothing to do with weak definitions. It's a simple "foo" is referenced. Got to find a definition. Look in the libraries in the order they are supplied to the linker. That's it. Want to not use the library definition? Define it yourself in X. No need for hooking. No need for anything clever at all. Just define it in your .obj file. ---- Now suppose X.obj and Y.obj both define foo. Link with: link X.obj Y.obj A.lib B.lib C.lib You get a message: Multiple definition of "foo", found in X.obj and Y.obj because order does not matter for .obj files as far as symbols go. All the symbols in .obj files get added.
If all the libraries rely on hooking something, you will silently break all but one, whereas the process of overriding a runtime hook can be made into an atomic operation that fails in a reasonable manner if wielded incorrectly.

Doing things based on the order at link-time is simply not good practice in the general case. It's OK if you control everything in the stack and want to (say) override malloc, but controlling what happens on an assertion is exactly the kind of thing that resolution at link-time can make into a real nightmare to do cleanly (and mutably: you might want to catch assertions differently when acting as a web server than when loading data).

Also, linking (especially around shared libraries) doesn't work in exactly the same way on all platforms, so minimizing the entropy of a given link (minimize possible outcomes, so minimal magic) can be a real win when it comes to making a program that builds and runs reliably on different platforms. At Symmetry we have had real issues with shared libraries, for reasons more complicated than mentioned here granted, so we actually cannot ship anything with dmd even if we wanted to.
Feb 11 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/11/2022 9:20 AM, max haughton wrote:
 If all the libraries rely on hooking something you will silently break all but one, whereas the process of overriding a runtime hook can be made into an atomic operation that can fail in a reasonable manner if wielded incorrectly.
Sorry, I don't follow that. I don't know what atomic ops have to do with it.
 Doing things based on the order at link-time is simply not good practice in the general case. It's OK if you control all the things in the stack and want to (say) override malloc, but controlling what happens on an assertion is exactly the kind of thing that resolution at link-time can make into a real nightmare to do cleanly (and mutably, you might want to catch assertions differently when acting as a web server than when loading data).
All link operations conform to the ordering I described. I can't think of a way that is simpler, cleaner, or easier to understand. Hooking certainly ain't.
 Also linking (especially around shared libraries) doesn't work in exactly the 
 same way on all platforms, so basically maximizing the entropy of a given link 
 (minimize possible outcomes, so minimal magic) can be a real win when it comes 
 to making a program that builds and runs reliably on different platforms. At 
 Symmetry we have had real issues with shared libraries, for reasons more 
 complicated than mentioned here granted, so we actually cannot ship anything 
 with dmd even if we wanted to.
DLLs (shared libraries) are a different story because they are all-or-nothing. In fact, they aren't actually libraries at all in the programming sense. They aren't linked in, either, there's no linking involved when accessing a DLL.
Feb 11 2022
prev sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 10 February 2022 at 22:52:45 UTC, Walter Bright 
wrote:
 On 2/10/2022 2:06 PM, Dave P. wrote:
 Undefined symbols for architecture arm64:
    "__D7example9some_funcFiZi", referenced from:
        __D7example3fooFiZi in example.o
        __D7example3barFiZi in example.o
        __D7example3bazFiZi in example.o
        __D7example3quxFiZi in example.o
        __Dmain in example.o
 ld: symbol(s) not found for architecture arm64
Things I have never been able to explain, even to long time professional programmers: 1. what "undefined symbol" means 2. what "multiply defined symbol" means 3. how linkers resolve symbols Our own runtime library illustrates this bafflement. In druntime, there are these "hooks" where one can replace the default function that deals with assertion errors. Such hooks are entirely unnecessary. To override a symbol in a library, just write your own function with the same name and link it in before the library. I have never been able to explain these to people. I wonder if it is because it is so simple, people think "that can't be right". With the hook thing, they'll ask me to re-explain it several times, then they'll say "are you sure?" and they still don't believe it.
I absolutely don’t want my executable defined by the order things happen to appear on the linker command line. I don’t want that incidentally and I don’t want to do it deliberately. The boat sailed on this long ago, I just want everything to be in the executable please with errors on duplicates, unless it’s dead code. Same goes for import paths btw. I don’t want imports selected based on the order of import paths, I want hard errors on any duplication of fully-qualified modules. D has amazing compile-time features for deciding what to compile or not, what to call and not. I want to use those, not rely on the details of how I cobble together my build (or how some automated tool does it for me).
Feb 12 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/12/2022 2:00 AM, John Colvin wrote:
 I absolutely don’t want my executable defined by the order things happen to appear on the linker command line. I don’t want that incidentally and I don’t want to do it deliberately. The boat sailed on this long ago, I just want everything to be in the executable please with errors on duplicates, unless it’s dead code.
For better or worse, that's how linkers work. Though you could write a tool to scan libraries for multiple definitions. Most of the work is already done for you in dmd's source code.
Feb 12 2022
prev sibling parent rempas <rempas tutanota.com> writes:
On Thursday, 10 February 2022 at 22:06:30 UTC, Dave P. wrote:
 I think it would be interesting to combine a compiler and a 
 linker into a single executable. Not necessarily for speed 
 reasons, but for better diagnostics and the possibility of type 
 checking external symbols. Linker errors can sometimes be hard 
 to understand in the presence of inlining and optimizations. 
 The linker will report references to symbols not present in 
 your code or present in completely different places.

 For example:

 ```D
 extern(D) int some_func(int x);

 pragma(inline, true)
 private int foo(int x){
     return some_func(x);
 }

 pragma(inline, true)
 private int bar(int x){
     return foo(x);
 }

 pragma(inline, true)
 private int baz(int x){
     return bar(x);
 }

 pragma(inline, true)
 private int qux(int x){
     return baz(x);
 }

 int main(){
     return qux(2);
 }

 ```

 When you go to compile it:

 ```sh
 Undefined symbols for architecture arm64:
   "__D7example9some_funcFiZi", referenced from:
       __D7example3fooFiZi in example.o
       __D7example3barFiZi in example.o
       __D7example3bazFiZi in example.o
       __D7example3quxFiZi in example.o
       __Dmain in example.o
 ld: symbol(s) not found for architecture arm64
 clang: error: linker command failed with exit code 1 (use -v to 
 see invocation)
 Error: /usr/bin/cc failed with status: 1
 ```

 The linker sees references to the extern function in places 
 where I never wrote that in my source code. In a nontrivial 
 project this can be quite confusing if you’re not used to this 
 quirk of the linking process.

 If the compiler is invoking the linker for you anyway, why 
 can’t it read the object files and libraries and tell you 
 exactly what is missing and where in your code you reference it?
Yeah, error messages could ALWAYS be better in any compiler (even rustc) at any time. This design would make it even easier to do like you explained. Thank you!
Feb 11 2022
prev sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
 A couple of months ago, I found out about a language called 
 [Vox](https://github.com/MrSmith33/vox) which uses a design 
 that I haven't seen before by any other compiler which is to 
 not create object files and then link them together but 
 instead, always create an executable at once.
TCC (*Tiny C Compiler*) has done this for well over a decade. TCC grew out of Bellard's entry in an obfuscated-C programming contest and was then updated to be more complete. https://www.bellard.org/tcc/

I believe most of a compiler's code base involves optimization for various architectures and versions of CPUs, along with cross-compiling. GNU/GCC has tons of legacy code in the back that it still uses, I believe.

To note, back in 1996 or thereabouts I wrote an assembler that took x86 and could compile itself. It wasn't compatible with any other code and couldn't use object files or anything (*as it was all made from scratch when I was 12-14*), but it did compile directly to a COM file. I'll just say from experience: there are advantages, but they don't outweigh the disadvantages. That's my flat opinion going from here.
Feb 10 2022
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/10/2022 8:18 PM, Era Scarecrow wrote:
  To note, back in 1996 or about there i wrote an assembler that took x86 and could compiler itself. But wasn't compatible with any other code and couldn't use object files or anything (*as it was all made from scratch when i was 12-14*). However it did compiler directly to a COM file. I'll just say from experience, there are advantages but they don't outweigh the disadvantages. That's my flat opinion going from here.
Back in the olden days, creating a DOS executable was trivial. Things have gotten much more complicated.
Feb 10 2022
prev sibling next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Friday, 11 February 2022 at 04:18:42 UTC, Era Scarecrow wrote:
 On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
 A couple of months ago, I found out about a language called 
 [Vox](https://github.com/MrSmith33/vox) which uses a design 
 that I haven't seen before by any other compiler which is to 
 not create object files and then link them together but 
 instead, always create an executable at once.
TCC (*Tiny C Compiler*) does this like 10 years ago. TCC was originally made as part of the obfuscation programming challenge, and then got updated to be more complete. https://www.bellard.org/tcc/ I believe most of the compilers base is involving optimization for various architectures and versions of CPU's, along with cross-compiling. GNU/GCC has tons of legacy code in the back that it still uses i believe. To note, back in 1996 or about there i wrote an assembler that took x86 and could compiler itself. But wasn't compatible with any other code and couldn't use object files or anything (*as it was all made from scratch when i was 12-14*). However it did compiler directly to a COM file. I'll just say from experience, there are advantages but they don't outweigh the disadvantages. That's my flat opinion going from here.
Optimizations are slow, and optimizations that aren't a total mess when implemented require abstraction. Making those abstractions cheap is difficult, so you end up with LLVM and GCC being slower even on debug builds because they have more layers of abstraction (or rather take fewer shortcuts). It's probably very possible to equalise this performance with a more niche compiler, but it would also probably require a really immense effort, starting from scratch around a new concept (a la LLVM).

As for legacy code, there probably are branches being tested for old processors in places, but for the most part GCC's algorithms may look a bit crude because of their C heritage (i.e. some of GCC's development practices are very 1980s compared to LLVM, and will probably scare off new money and minds and kill the project in the long run), but they are still the benchmark to beat. The Itanium scheduler won't be running on an X86 target, to be clear.

I'm also not convinced the compiler assembling code itself is all that useful. It probably is marginally faster, but on a modern system I couldn't measure it as significant on basically any workload. It's basically performance theatre; the performance of the semantic analysis, or of moving bytes around prior to object code however it's emitted, is much more important.

The dmd backend gets a 6/10 for me when it comes to performance. The algorithms are very simple; it should really be faster than it is. The parts that actually emit the object code are particularly slow.
Feb 10 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/10/2022 10:36 PM, max haughton wrote:
 The dmd backend gets a 6/10 for me when it comes to performance. The
algorithms 
 are very simple, it should really be faster than it is. The parts that
actually 
 emit the object code are particularly slow.
Much of that comes from supporting 4 very different object file formats.
Feb 11 2022
prev sibling next sibling parent rempas <rempas tutanota.com> writes:
On Friday, 11 February 2022 at 04:18:42 UTC, Era Scarecrow wrote:
  I believe most of the compilers base is involving optimization 
 for various architectures and versions of CPU's, along with 
 cross-compiling.
Yeah, but when I don't cross-compile, I only compile for one OS and one instruction set. Code for the other cases will not get executed, so I cannot see how this can play a role. TCC also supports a lot of architectures and operating systems (even Windows natively, if I'm not wrong). Unless I don't understand what you mean...
 GNU/GCC has tons of legacy code in the back that it still uses 
 i believe.
Yeah, that's a problem we will never be able to solve. New and better practices will always be invented, so to get the best possible performance we must always re-write stuff (or parts of it), and in the case of big compilers this will be a pain in the ass, and I understand it...
 To note, back in 1996 or thereabouts I wrote an assembler that 
 took x86 and could compile itself. But it wasn't compatible with 
 any other code and couldn't use object files or anything (*as 
 it was all made from scratch when I was 12-14*). However, it did 
 compile directly to a COM file. I'll just say from experience, 
 there are advantages but they don't outweigh the disadvantages. 
 That's my flat opinion going from here.
I wonder what we can do to keep the advantages and take away the disadvantages. The second idea I had is probably the answer but I would like someone to say something about it directly. Thank you for your time!
Feb 11 2022
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 11 February 2022 at 04:18:42 UTC, Era Scarecrow wrote:
 On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
 A couple of months ago, I found out about a language called 
 [Vox](https://github.com/MrSmith33/vox) which uses a design 
 that I haven't seen before by any other compiler which is to 
 not create object files and then link them together but 
 instead, always create an executable at once.
TCC (*Tiny C Compiler*) has been doing this for ages. TCC originally started as Bellard's entry in the Obfuscated C Code Contest, and was then extended to be a more complete compiler. https://www.bellard.org/tcc/
If one wants to get really historic, it is also what Turbo Pascal did up to version 3.0. With Turbo Pascal 4.0 they went back to the more classic object file/linker model, and there is a good reason for that: separate compilation, and linking modules and libraries, are a thing. If you build the compiler for direct executable production, you still have to support normal object file/library handling, i.e. you put the functionality of the linker into your compiler.
Feb 11 2022
next sibling parent rempas <rempas tutanota.com> writes:
On Friday, 11 February 2022 at 17:36:03 UTC, Patrick Schluter 
wrote:
 If one wants to get really historic, it is also what Turbo 
 Pascal did up to version 3.0. With Turbo Pascal 4.0 they went 
 back to the more classic object file/linker model, and there is 
 a good reason for that: separate compilation, and linking 
 modules and libraries, are a thing. If you build the compiler 
 for direct executable production, you still have to support 
 normal object file/library handling, i.e. you put the 
 functionality of the linker into your compiler.
Yep, and that's what I love about it! You can have two ways to do the same thing and choose based on what's best for the case.

For example, if your project has 10M LoC, even if you can compile 1M LoC/s (which is a very big number), your project will need 10 seconds to build, which will make it very annoying. In that case, we use the classic method of creating object files for the files that were changed and then linking them together. However, if your project is 1M LoC or less, that is less than 1 second to build, which is not noticeable at all. The same goes when the end-user compiles the software from source and doesn't care about (and won't even keep) the object files, because he/she is not a developer. In that case it makes sense not to waste time creating the object files and to go straight to creating the executable/library.

If we are to make a new compiler (which I plan to), we should create a whole toolchain that consists of all the tools. Sounds complex, I know, but what's the point if we don't advance? Make another compiler that outputs assembly, so it will always have dependencies and be slow to compile (slow compared to outputting machine code directly)?
Feb 11 2022
prev sibling parent Era Scarecrow <rtcvb32 yahoo.com> writes:
On Friday, 11 February 2022 at 17:36:03 UTC, Patrick Schluter 
wrote:
 If one wants to get really historic, it is also what Turbo 
 Pascal did up to version 3.0. With Turbo Pascal 4.0 they went 
 back to the more classic object file/linker model
Mmm, hard to say on various compilers; I never had the money when I was younger to pay for said compilers/toolsets, and now most of them (*the currently popular ones*) are free (*I might have a couple of Turbo compilers with a C++ programming book, but I never touched them*). No doubt many earlier commercial compilers didn't have separate architectures and probably just did x86. But it's been a long time since the 16-bit MS-DOS age, when that was more common. Though if optimizations are dropped you can probably have a very lean toolset, maybe even enough to build an entire distro from sources on a CD. Though the last time I tried to build libc it took a very long time; not recommended.
Feb 11 2022