digitalmars.D - Filesize of D binaries?

Mike Simons <ixulai gmail.com> writes:
Someone recently pointed out to me that a simple hello world sample that only
imports std.stdio and prints a single line to console compiles to a rather
hefty 250k! The program was 7 lines long! The compiled C equivalent is 7k.

I also tried compiling one of the samples from the site, which was roughly 150
lines, and that ended up being 400k.

I'm using gdc on linux.
Is there something I'm missing (i.e. is the compiler chucking in stuff I don't
need?) or is this for real?
Is it just an issue with immature compilers?

I know D has to put RTTI and GC code in my binary, but what else is going
in there?
Jan 06 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
writef and friends call a function to handle the formatting. Since they're variadic functions, that function needs to be able to handle any type. So it contains code for every primitive type, structs, classes, etc. The compiler can't (currently) determine which of those are unused, so it includes all of that code, plus any code they in turn need, and so on.

C's printf() should have the same problem, except C has fewer types, so the superfluous code is smaller, and it doesn't include typeinfo. C++'s iostreams can be written to include only the code that's strictly necessary.

Something similar to C++'s iostreams could be written in D, either using similar syntax (lots of small function calls on a line, optionally using operator overloading) or using writef-like syntax (using variadic templates and static ifs).

writef and friends predate variadic templates, so that's why those aren't used. I'm not sure if they also predate operator overloading[1], but it could be.

I think Mango[2] has an output mechanism that works like iostreams.

[1]: They were both present when I started using D
[2]: http://dsource.org/projects/mango
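
To make the point about run-time formatting concrete, here is a minimal sketch in C, not Phobos code. It assumes a hypothetical function, format_tagged, that learns the type of each argument only at run time from a tag string. The tag letters and the function name are invented for illustration. Because the type is only known when the program runs, the function has to carry a branch for every type it supports, and the linker has to keep all of them.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical run-time formatter: each character in `tags` names the type
 * of the next variadic argument ('i' int, 'u' unsigned, 'f' double,
 * 's' C string, 'c' char, 'p' pointer). The tags stand in for the run-time
 * type information a variadic formatting function receives; they are an
 * assumption for this sketch, not Phobos's actual encoding.
 *
 * Writes the concatenated arguments into `out` (capacity `cap`, always
 * NUL-terminated) and returns the number of characters stored.
 */
size_t format_tagged(char *out, size_t cap, const char *tags, ...)
{
    va_list ap;
    size_t len = 0;

    assert(out != NULL && cap > 0 && tags != NULL);
    out[0] = '\0';

    va_start(ap, tags);
    for (const char *t = tags; *t != '\0' && len + 1 < cap; ++t) {
        size_t room = cap - len;
        int n;

        /* Which branch runs is decided by data, so the compiler and linker
         * must keep every branch (and whatever it calls) in the binary,
         * even if a given program only ever passes "s". */
        switch (*t) {
        case 'i': n = snprintf(out + len, room, "%d", va_arg(ap, int)); break;
        case 'u': n = snprintf(out + len, room, "%u", va_arg(ap, unsigned)); break;
        case 'f': n = snprintf(out + len, room, "%g", va_arg(ap, double)); break;
        case 's': n = snprintf(out + len, room, "%s", va_arg(ap, const char *)); break;
        case 'c': n = snprintf(out + len, room, "%c", va_arg(ap, int)); break;
        case 'p': n = snprintf(out + len, room, "%p", va_arg(ap, void *)); break;
        default:  n = -1; break;           /* unknown tag: stop formatting */
        }

        if (n < 0)
            break;
        if ((size_t)n >= room) {           /* output was truncated to fit */
            len = cap - 1;
            break;
        }
        len += (size_t)n;
    }
    va_end(ap);
    return len;
}
```

A call such as format_tagged(buf, sizeof buf, "is", 42, "hi") only exercises two of the six branches, yet the other four are still linked in. A formatter that resolves types at compile time (iostreams-style overloads, or variadic templates) would let unused per-type code be left out.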
Jan 06 2007
Mike Simons <ixulai gmail.com> writes:
Frits Van Bommel wrote:
"""writef and friends call a function to handle the formatting. Since
they're variadic functions, that function needs to be able to handle any
type. So it contains code for every primitive type, structs, classes,
etc. The compiler can't (currently) determine which of those are unused
so it includes all of that code, plus any code they in turn need, and so on. """

Ah, that would explain it. My main concern was that binary size would keep
growing in proportion as programs got larger, but given what you've said it
shouldn't. In fact, beyond a certain number of imports the size should stop
rising so drastically, as more and more dependencies are satisfied by code
that is already linked?

Anders F Björklund wrote:
"""250K sounds a bit much, as if debugging symbols weren't stripped?"""

The exact command I used was "gdc hello.d -o hello". I tried various other flags
like -O3 and -frelease, and none of them made any difference whatsoever.
However, when I compiled the exact same code using gdc on win32 (under mingw) it
was only ~100k.

I guess this is due to size differences between the static libs on each
platform.
Jan 06 2007
Leandro Lucarella <llucarella integratech.com.ar> writes:
Try to strip both binaries (C and D), and compile them with similar flags too:

    $ strip my_c_hello_world
    $ strip my_d_hello_world

Then you can compare the sizes more accurately.

But surely, small D programs are much larger than C ones, because D has a bigger runtime (there is a lot of module initialization code, the GC, etc.). And std.stdio doesn't look like the cause of the size, since there is no appreciable difference in size between

    import std.stdio;
    int main() { writefln("Hello"); return 0; }

and

    int main() { return 0; }

at least with DMD.

Anyway, 150K of extra code for initialization looks like too much. (That is the difference in size between a stripped "null" D program compiled with DMD and the same program compiled in C. Both are dynamically linked, so most of the extra code should be in Phobos.)

When linked statically, the DMD binary is 640KB and the C binary is 430KB (both stripped), which looks like a fair difference, since the D version has all the GC code in it, for example.

The test programs are:

    luca@azazel:/tmp$ cat null.d
    int main() { return 0; }
    luca@azazel:/tmp$ cat null.c
    int main() { return 0; }

Compiled with (dynamically linked):

    dmd null.d
    gcc -o null_c null.c

And statically linked:

    dmd -c null.d
    gcc null.o -o null -m32 -lphobos -lpthread -lm -static
    gcc -o null_c -static null.c

And again, all stripped before comparing sizes.

-- 
Leandro Lucarella
Integratech S.A.
4571-5252
Jan 08 2007
Anders F Björklund <afb algonet.se> writes:
Mike Simons wrote:

 Someone recently pointed out to me that a simple hello world sample that only
 imports std.stdio and prints a single line to console compiles to a rather
 hefty 250k! The program was 7 lines long! The compiled C equivalent is 7k.

250K sounds a bit much, as if debugging symbols weren't stripped?

 I'm using gdc on linux.
 Is there something I'm missing (i.e. is the compiler chucking in stuff I don't
 need?) or is this for real?
 Is it just an issue with immature compilers?

Phobos is statically linked, libstdc++ is usually dynamically linked.
On platforms where C++ is static, like Mac OS X 10.3, it's more even:

    31    hello.sh   (#!/bin/sh)
    637   hello.jar  (Java)
    12K   hello_c    (C)
    156K  hello_d    (D)
    368K  hello_cpp  (C++)

It's usually not a problem, unless you are *really* cramped for space.
An advantage is that you avoid the libstdc++ version portability hell,
i.e. you can just run the program, without extra runtime dependencies.

--anders

PS. The Java version is smallest or largest, depending on whether you
include the size of the JVM runtime or not... (If you compile to an EXE
using GCJ, it's more like 10M!) Of course, it also took longest to run
(with JVM startup).
Jan 06 2007