
D - Compilation model

reply "Martin M. Pedersen" <mmp www.moeller-pedersen.dk> writes:
Currently, D is using the same compilation model as C/C++, but I'm not
convinced that it cannot be done better.

We have (at least):
- directories
- modules
- source files
- object files
- libraries
- binaries

The main reason for using the same compilation model as C/C++ is link
compatibility, as I see it. But there is room for changes while keeping link
compatibility. There is a one-to-one correspondence between directories and
modules. I suggest that we keep it this way. There is also a one-to-one
correspondence between source files and object files. I suggest that this be
changed.

How about letting the compiler compile a whole module/directory at once, and
emit a library. Object files within the library should be of much smaller
granularity than the source files, and there would be many more of
them. Ideally, one object file per method or public data symbol. But it would
not really be a concern using the compiler, because we - users of the
compiler - would not mess with the object files, only libraries. The
benefits would be:

- The compiler would be able to do module-wide optimizations.
- The linker (an existing, traditional linker) would be able to filter away
more unused code and data.
- When compiling a module, a source file needs only to be parsed once.
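The proposed granularity can be sketched with today's C toolchains (a hedged stand-in, since no D compiler works this way yet; gcc and GNU binutils are assumed, and all file names here are invented for illustration). `-ffunction-sections` gives each function its own section, which approximates "one object file per symbol" inside a library:

```shell
# A minimal sketch of the proposed granularity, using a C toolchain as a
# stand-in (gcc + GNU binutils assumed; file names are invented).
# -ffunction-sections puts each function in its own section, so the
# linker can pull in or discard functions individually, much like
# "one object file per symbol" inside a library.
cat > mod.c <<'EOF'
const char *used(void)   { return "used"; }
const char *unused(void) { return "unused"; }
EOF
cat > prog.c <<'EOF'
extern const char *used(void);
int main(void) { return used()[0] == 'u' ? 0 : 1; }
EOF
gcc -c -ffunction-sections -fdata-sections mod.c prog.c
ar rcs libmod.a mod.o                    # the "module" becomes a library
gcc -Wl,--gc-sections -o app prog.o libmod.a
nm --defined-only app | grep -w used     # kept: it is referenced
```

On this setup, `nm` should no longer list `unused` in the final binary, since its section is never referenced and `--gc-sections` discards it.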

In the traditional model, when a change is made outside a module, it might
happen that only a portion of the module needs to be recompiled. But in many
cases, perhaps most often, such changes mean that the whole
module needs to be recompiled. This leads me to think that we might as well
treat the module as the translation unit, and gain the benefits above.

Comments?

Regards,
Martin M. Pedersen
May 14 2003
next sibling parent "Walter" <walter digitalmars.com> writes:
D is designed so that an advanced compiler can compile the entire project in
one go. For the time being it follows the traditional C/C++ development
model because I have limited resources, and this takes maximum advantage of
existing tools.

"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> wrote in message
news:b9uh9j$2fls$1 digitaldaemon.com...
May 14 2003
prev sibling next sibling parent "Sean L. Palmer" <palmer.sean verizon.net> writes:
I find no flaw in your arguments, and agree it would be an improvement.

Sean

"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> wrote in message
news:b9uh9j$2fls$1 digitaldaemon.com...
May 14 2003
prev sibling next sibling parent reply Helmut Leitner <helmut.leitner chello.at> writes:
I agree, especially with the need to resolve the one-to-one relationship
between source file (module) and object file.

One way to do this could be manually, by using a 
  #pragma split
which could produce from a source file
  module.d:
  ...independent code, part 1
  #pragma split
  ...independent code, part 2
  #pragma split
  ...independent code, part 3
three separate object files (via a compiler "split" option) 
  module_001.obj
  module_002.obj
  module_003.obj
for inclusion into libraries.

This would make it possible to reduce the footprint of D applications:

- embedded people won't consider D with its current 60K
  minimum exe size (they think in the range 1-8K).

- needing Win32 structs, you will have to compile and link
  some windows.d file into your application. Its 500 init
  default structures add about 30K. This could be split
  into reasonable parts without creating 500 independent
  source files, to have a space-optimized interface.

- it would allow a more efficient reuse situation, because adding
  some code to a module would not automatically mean adding
  to the footprint of every application using the module.

Of course it would be more elegant if this effect could be
transparent to the programmer, so that the split parts were
produced as one granular object file that could be linked just
like typical monolithic object files. But I think linkers won't
support this.

"Martin M. Pedersen" wrote:
 
 Currently, D is using the same compilation model as C/C++, but I'm not
 convinced that it cannot be done better.
 
 We have (at least):
 - directories
 - modules
 - source files
 - object files
 - libraries
 - binaries
 
 The main reason for using the same compilation model as C/C++ is link
 compability, as I see it. But there is room for changes, while keeping link
 compability. There is a one-to-one correspondance between directories and
 modules. I suggest that we keep it this way. There is also a one-to-one
 correspondance between source files and object file. I suggest that is
 changed.
 
 How about letting the compiler compile a whole module/directory at once, and
 emit a library. Object files within the library should be of much smaller
 smaller granularity than the source files, and there would be many more of
 them. Idially one object file per method or public data symbol. But it would
 not really be a concern using the compiler, because we - users of the
 compiler - would not mess with the object files, only libraries. The
 benefits would be:
 
 - The compiler would be able to do module-wide optimizations.
 - The linker (an existing, traditional linker) would be able to filter away
 more unused code and data.
 - When compiling a module, a source file needs only to be parsed once.
 
 In the traditional model, when a change is made outside a module, it might
 happen that only a portion of the module needs to be recompiled. But in many
 cases, perhaps the most often cases, such changes means that the whole
 module needs to be recompiled. This leads me to think that we might as well
 treat the module as the translation unit, and gain the benefits above.
 
 Comments?
 
 Regards,
 Martin M. Pedersen

-- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
May 14 2003
next sibling parent Mark T <Mark_member pathlink.com> writes:
In article <3EC32D89.2F4CD480 chello.at>, Helmut Leitner says...
I agree, esp. with the need to resolve the one-to-one relationship
between source file (module) and object file.

I think that Walter's current mapping is fine. It doesn't prevent a D build system from doing a global optimization of the whole "program" for release, which would be nice for real-time or number-crunching apps.
One way to do this could be manually, by using a 
  #pragma split
which could produce from a source file
  module.d:
  ...independent code, part 1
  #pragma split
  ...independent code, part 2
  #pragma split
  ...independent code, part 3
three separate object files (via a compiler "split" option) 
  module_001.obj
  module_002.obj
  module_003.obj
for inclusion into libraries.

yuk pragma stuff
This would allow to reduce the footprint of D applications:

- embedded people won't consider D with its current 60K
  minimum exe size (they think in the range 1-8K).

I do embedded; it comes in all sizes. The folks working in the above size range do NOT use a 32-bit (or bigger) processor, which was a requirement for D early on. Most everything I work on now is 32-bit, with some 16-bit processors. Reuse, project complexity (makefile structure), etc. are usually more important. The D module appears to support this.
May 16 2003
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EC32D89.2F4CD480 chello.at...
 I agree, esp. with the need to resolve the one-to-one relationship
 between source file (module) and object file.

 One way to do this could be manually, by using a
   #pragma split
 which could produce from a source file
   module.d:
   ...independent code, part 1
   #pragma split
   ...independent code, part 2
   #pragma split
   ...independent code, part 3
 three separate object files (via a compiler "split" option)
   module_001.obj
   module_002.obj
   module_003.obj
 for inclusion into libraries.

The D compiler automatically generates COMDATs for each function in a module, so they are individually linked in. It's equivalent to doing the above.
 This would allow to reduce the footprint of D applications:
 - embedded people won't consider D with its current 60K
   minimium exe size (they think in the range 1-8K).

The D footprint is about 24k larger than the equivalent C footprint.
 - needing Win32 structs you will need to compile and link
   some windows.d file into your application. Its 500 init
   default structures add about 30K. This could be split
   into reasonable parts without creating 500 independent
   source files to have a space-optimized interface.

Actually, I need to remove the need to link in the struct inits for zero-initialized structs.
May 16 2003
parent reply Helmut Leitner <helmut.leitner chello.at> writes:
Walter wrote:
 
 "Helmut Leitner" <helmut.leitner chello.at> wrote in message
 news:3EC32D89.2F4CD480 chello.at...
 I agree, esp. with the need to resolve the one-to-one relationship
 between source file (module) and object file.

 One way to do this could be manually, by using a
   #pragma split
 which could produce from a source file
   module.d:
   ...independent code, part 1
   #pragma split
   ...independent code, part 2
   #pragma split
   ...independent code, part 3
 three separate object files (via a compiler "split" option)
   module_001.obj
   module_002.obj
   module_003.obj
 for inclusion into libraries.

The D compiler automatically generates COMDATs for each function in a module, so they are individually linked in. It's equivalent to doing the above.

Hey, that's great! I tried to check it, and it seems to work. Is there a linker option that gives a more detailed list of the code (text) area?
 This would allow to reduce the footprint of D applications:
 - embedded people won't consider D with its current 60K
   minimium exe size (they think in the range 1-8K).

The D footprint is about 24k larger than the equivalent C footprint.

Both seem a bit large, given that the linker removes dead code.
 - needing Win32 structs you will need to compile and link
   some windows.d file into your application. Its 500 init
   default structures add about 30K. This could be split
   into reasonable parts without creating 500 independent
   source files to have a space-optimized interface.

Actually, I need to remove the need to link in the struct inits for zero-initialized structs.

That would be great for the Win32 API, because otherwise one has to work around this.

-- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
May 19 2003
parent "Walter" <walter digitalmars.com> writes:
"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EC938EB.5E35CB39 chello.at...
 The D compiler automatically generates COMDATs for each function in a
 module, so they are individually linked in. It's equivalent to doing the
 above.

Is there a linker option that gives a more detailed list of the code (text) area?

/MAP
 This would allow to reduce the footprint of D applications:
 - embedded people won't consider D with its current 60K
   minimium exe size (they think in the range 1-8K).



They're a bit large on Win32 systems because (unfortunately) there is a lot of code always linked in that deals with exceptions thrown from VC++ compiled DLLs. This is necessary so that DMC++ code can call DLLs built with VC++ and catch exceptions thrown by it. This code has no relevance for embedded systems, and so won't be there.
 - needing Win32 structs you will need to compile and link
   some windows.d file into your application. Its 500 init
   default structures add about 30K. This could be split
   into reasonable parts without creating 500 independent
   source files to have a space-optimized interface.

 Actually, I need to remove the need to link in the struct inits for zero-initialized structs.

 That would be great for the Win32 API, because otherwise one has to work around this.

I've got a lot of things like this that need to be done.
May 20 2003
prev sibling parent reply midiclub tiscali.de writes:
In article <b9uh9j$2fls$1 digitaldaemon.com>, Martin M. Pedersen says...
Currently, D is using the same compilation model as C/C++, but I'm not
convinced that it cannot be done better.

It is not really the same model as C. Compared with other languages, it resembles almost anything more than it does C.
We have (at least):
- directories

- modules

one module corresponds to one source file, which corresponds to one object file.
- source files
- object files
- libraries

Libraries are not part of the compilation model - they are used as an aggregate of compiled modules, and possibly their source (or, in the future, possibly parsed source).
- binaries

The main reason for using the same compilation model as C/C++ is link
compability, as I see it. But there is room for changes, while keeping link
compability. There is a one-to-one correspondance between directories and
modules. I suggest that we keep it this way. There is also a one-to-one
correspondance between source files and object file. I suggest that is
changed.

Where do you see a correspondence of directories to modules? You can keep many modules in one directory.
How about letting the compiler compile a whole module/directory at once, and
emit a library. Object files within the library should be of much smaller
smaller granularity than the source files, and there would be many more of
them. Idially one object file per method or public data symbol. But it would
not really be a concern using the compiler, because we - users of the
compiler - would not mess with the object files, only libraries. The
benefits would be:

- The compiler would be able to do module-wide optimizations.
- The linker (an existing, traditional linker) would be able to filter away
more unused code and data.
- When compiling a module, a source file needs only to be parsed once.

The compiler does inter-module optimisations, as far as possible. For that, the complete parsed source of the application and used libraries is contained in memory during the project compilation. This causes the whole library code to be re-parsed once per project compilation, but not more often than that. Maybe Walter could implement dumping of the parse trees to disk - this would be a great and major optimisation.
In the traditional model, when a change is made outside a module, it might
happen that only a portion of the module needs to be recompiled. But in many
cases, perhaps the most often cases, such changes means that the whole
module needs to be recompiled. This leads me to think that we might as well
treat the module as the translation unit, and gain the benefits above.

Modules are as long as a programmer can keep track of - a few dozen pages at most. The current D compiler doesn't take any significant time to compile such an amount. GCC does, and hence it could be of relevance there... Object files need not be of smaller granularity, since:

- unittest/start-up code is common to a module, and must be included anyway no matter how much of the module you use;
- splitting a module apart into the tiniest parts is a common misconception - it shouldn't help. Any decent linker is able to eliminate all functions which are not referenced anywhere. This means methods of a class and functions taking part in a unittest are never removed. All other unused functions are.
Comments?

Seems you're not exactly sure what you're talking about.

-i.
May 15 2003
next sibling parent reply "Martin M. Pedersen" <mmp www.moeller-pedersen.dk> writes:
Hi,

 Where do you see a correspondence of directories to modules? You can keep many
 modules in one directory.

Sorry, I have been away from D for a while. I was thinking more in line with Java's packages. In a larger D project, I would probably organize the modules such that there would be a main module per directory, the one you imported elsewhere, and a number of submodules. In such a case I was thinking of treating all the sources in the directory as one translation unit.
 Any decent linker is able to eliminate all functions which are
 not referenced anywhere.

Then I'm sorry that I don't have a decent linker, or don't know how to
operate one efficiently. Given these sources...

=== main.c
#include <stdio.h>
extern const char* foo();
int main() { puts(foo()); return 0; }

=== foobar.c
extern const char* baz();
const char* bar() { return baz(); }
const char* foo() { return "foo"; }

=== baz.c
const char* baz() { return "This should not be in the executable"; }

.. I'm able to find "This should not be in the executable" in the
executable. I have tried this with the linkers of MSVC6 and DMC.
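For what it's worth, the same experiment can be scripted with a GNU toolchain (gcc and binutils assumed; this is a stand-in, not the MSVC6 or DMC setup from the post), where function-level linking is spelled `-ffunction-sections` plus `--gc-sections`:

```shell
# Re-run the experiment above with and without function-level linking
# (gcc + GNU binutils assumed). The sources are the ones quoted in the
# post.
cat > main.c <<'EOF'
#include <stdio.h>
extern const char* foo();
int main() { puts(foo()); return 0; }
EOF
cat > foobar.c <<'EOF'
extern const char* baz();
const char* bar() { return baz(); }
const char* foo() { return "foo"; }
EOF
cat > baz.c <<'EOF'
const char* baz() { return "This should not be in the executable"; }
EOF
# Naive build: bar() drags in baz() and its string.
gcc -o plain main.c foobar.c baz.c
# Function-level build: the unreferenced bar()/baz() sections are dropped.
gcc -ffunction-sections -fdata-sections -Wl,--gc-sections \
    -o gc main.c foobar.c baz.c
nm --defined-only gc | grep -cw baz || echo "baz eliminated"
```

The telltale string should show up in `strings plain` but the `baz` symbol should be gone from the `gc` binary, matching what Walter describes `-Nc` doing later in the thread.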
 Seems you're not exactly sure what you're talking about.

If I were, I would not ask for comments :-)

Regards,
Martin M. Pedersen
May 15 2003
next sibling parent Ilya Minkov <midiclub 8ung.at> writes:
Hello.

Martin M. Pedersen wrote:
 Then I'm sorry that I don't have a decent linker, or don't know how
 to operate them efficiently. Given these sources...

 .. I'm able to find "This should not be in the executable" in the 
 executable. I have tried this with the linker of MSVC6 and DMC.

This doesn't show much, since "This should not be ..." is data, not code, and is stored separately. However, the code seems to be there too; I checked it. :( I'm yet to try some other linkers. Maybe Borland or Watcom?

But from the times I worked with Delphi, back in version 2 or 3, I can remember the linker stripping out functions even from debug executables, in the midst of actively used modules. This manifested itself in debugger complaints when using the real-time expression evaluator: "This function has been eliminated by linker." This sometimes gave me a clue to what was wrong with my code - that it contained some silly plug (a "bone") instead of the real code - a function call - which was intended.

Sorry for my misconception - this probably comes from me reading too much documentation containing hidden advertisements. :> But if such claims exist, there probably is some product which fulfills them...
 Seems you're not exactly sure what you're talking about

If I was, I would not ask for comments :-)

It appears that I'm also not.

-i.
May 15 2003
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> wrote in message
news:ba0irh$1g5e$1 digitaldaemon.com...
 .. I'm able to find "This should not be in the executable" in the
 executable. I have tried this with the linker of MSVC6 and DMC.

You need to compile with -Nc (function-level linking). It's off by default for C because some legacy C will fail with -Nc. The D compiler does this automatically.
May 20 2003
prev sibling next sibling parent Helmut Leitner <helmut.leitner chello.at> writes:
midiclub tiscali.de wrote:
 - splitting a module apart into tiniest parts is a common misconception - it
 shouldn't help. Any decent linker is able to eliminate all functions which are
 not referenced anywhere. This means, methods of a class and functions taking
 part in a unittest are never removed. All other unused functions are.

I have often heard this but never come upon a system that really did it. So please name a linker for Windows or Linux that shows this behaviour together with current D object modules.

-- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
May 15 2003
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
<midiclub tiscali.de> wrote in message
news:b9vvrj$r9r$1 digitaldaemon.com...
 In article <b9uh9j$2fls$1 digitaldaemon.com>, Martin M. Pedersen says...
 For that, the complete parsed source of the application and used libraries is
 contained in memory during the project compilation. This causes the whole
 library code to be re-parsed once per project compilation, but not more often
 than that. Maybe Walter could implement dumping of the parse trees to disk -
 this would be a great and major optimisation.

That was the original plan, but the compiler parses so fast it wasn't justifiable to make the effort.
May 16 2003
parent reply "Sean L. Palmer" <palmer.sean verizon.net> writes:
D compiler is really really fast.  Gotta hand it to ya.

Sean

"Walter" <walter digitalmars.com> wrote in message
news:ba312c$uav$1 digitaldaemon.com...
May 16 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:ba36nl$1430$1 digitaldaemon.com...
 D compiler is really really fast.  Gotta hand it to ya.

Thanks. I haven't even expended any effort tuning it for speed. I just structured the language so it would be fast to parse.
May 16 2003
next sibling parent reply Garen Parham <nospam garen.net> writes:
Walter wrote:

 Thanks. I haven't even expended any effort tuning it for speed. I just
 structured the language so it would be fast to parse.

The whole thing is fast. It's so fast I couldn't tell the difference between doing something like:

  $ export DMD=/path/to/blah

and running the compiler itself. DMC++ is also super fast. What would you attribute that to?
May 16 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Garen Parham" <nospam garen.net> wrote in message
news:ba4f4p$29kc$1 digitaldaemon.com...
 DMC++ is also super fast.  What would you attribute that to?

Profile, profile, profile!
May 18 2003
parent reply Garen Parham <nospam garen.net> writes:
Walter wrote:


 
 Profile, profile, profile!

That's it? I was thinking maybe you'd say you had some ingenious bottom-up strategies tightly knit with the target/languages or something. :)
May 18 2003
parent "Walter" <walter digitalmars.com> writes:
"Garen Parham" <nospam garen.net> wrote in message
news:ba8kn1$ugt$1 digitaldaemon.com...
 Walter wrote:
 Profile, profile, profile!

 That's it? I was thinking maybe you'd say you had some ingenious bottom-up strategies tightly knit with the target/languages or something. :)

I don't think there's anything ingenious in the code. I just have a lot of experience knowing what eats cycles and what doesn't <g>. I must have tried dozens of different ways to do symbol tables. I can tell you, though, if you want a slow compiler, use Lex and Yacc.
May 20 2003
prev sibling parent reply Ilya Minkov <midiclub 8ung.at> writes:
Walter wrote:
 Thanks. I haven't even expended any effort tuning it for speed. I just
 structured the language so it would be fast to parse.

assert (Walter == GeniousWizard); AND LET TEH WORLD CRASH IF THIS ASSERT FAILS! :>

-i.
May 18 2003
parent "Walter" <walter digitalmars.com> writes:
"Ilya Minkov" <midiclub 8ung.at> wrote in message
news:ba8h98$qsu$1 digitaldaemon.com...
 Walter wrote:
 Thanks. I haven't even expended any effort tuning it for speed. I just
 structured the language so it would be fast to parse.

assert (Walter == GeniousWizard); AND LET TEH WORLD CRASH IF THIS ASSERT FAILS! :>

LOL. If you want to see the source to the lexer/parser, it's included with the download. It looks pretty straightforward, until you compare it with the source to other compilers.
May 20 2003