digitalmars.D - Program size, linking matter, and static this()

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,


Late last night Walter and I figured out a few interesting tidbits of 
information. Allow me to give some context, discuss them, and sketch a 
few approaches for improving things.

A while ago Walter wanted to enable function-level linking, i.e. only 
get the needed functions from a given (and presumably large) module. So 
he arranged things so that a library contains many small object "files" 
(that actually are generated from a single .d file and never exist on 
disk, only inside the library file, which can be considered an archive 
like tar). Then the linker would only pick the used object "files" from 
the library and link those in. Unfortunately that didn't have nearly the 
expected impact - essentially the size of most binaries stayed the same. 
The mystery was unsolved, and Walter needed to move on to other things.

One particularly annoying issue is that even programs that don't 
ostensibly use anything from an imported module may balloon inexplicably 
in size. Consider:

import std.path;
void main(){}

This program, after stripping and all, is some 750KB in size. Removing 
the import line reduces the size to 218KB. That includes the runtime 
support, garbage collector, and such, and I'll consider it a baseline. 
(A similar but separate discussion could be focused on reducing the 
baseline size, but herein I'll consider it constant.)

What we'd simply want is to be able to import stuff without blatantly 
paying for what we don't use. If a program imports std.path and uses no 
function from it, it should be as large as a program without the import. 
Furthermore, the increase should be incremental - using 2-3 functions 
from std.path should only increase the executable size by a little, not 
suddenly link in all code in that module.

But in experiments it seemed like program size would increase in sudden 
amounts when certain modules were included. After much investigation we 
figured that the following fateful causal sequence happened:

1. Some modules define static constructors with "static this()" or 
"static shared this()", and/or static destructors.

2. These constructors/destructors are linked in automatically whenever a 
module is included.

3. Importing a module with a static constructor (or destructor) will 
generate its ModuleInfo structure, which contains static information 
about all module members. In particular, it keeps virtual table pointers 
for all classes defined inside the module.

4. That means generating ModuleInfo references all virtual functions defined 
in that module, whether they're used or not.

5. The phenomenon is transitive, e.g. even if std.path has no static 
constructors but imports std.datetime which does, a ModuleInfo is 
generated for std.path too, in addition to the one for std.datetime. So 
now classes inside std.path (if any) will be all linked in.

6. It follows that a module that defines classes which in turn use other 
functions in other modules, and has static constructors (or includes 
other modules that do) will balloon the size of the executable suddenly.
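
The chain in steps 1-4 can be observed from inside a program. What follows is a minimal sketch, assuming a single-file program built with dmd; the class `Unused` and the helper `classesListedIn` are hypothetical names introduced only for illustration. It walks the ModuleInfo records that druntime exposes and collects the classes recorded for one module, including a class that nothing in the program calls.

```d
import std.stdio;

// A class that the program never instantiates or calls directly.
class Unused
{
    void hello() { writeln("never called directly"); }
}

// A static constructor; per step 3 above, this is what forces a ModuleInfo
// to be generated for the module.
static this() {}

// Hypothetical helper: names of the classes recorded in the ModuleInfo
// of the module called `moduleName`.
string[] classesListedIn(string moduleName)
{
    string[] names;
    foreach (m; ModuleInfo)            // druntime iterates every registered module
    {
        if (m.name != moduleName)
            continue;
        foreach (c; m.localClasses)    // TypeInfo_Class entries kept by the module
            names ~= c.name;
    }
    return names;
}

void main()
{
    // Lists the fully qualified class names found for this module.
    writeln(classesListedIn(__MODULE__));
}
```

Because each class record points at its vtable, everything those virtual functions call has to be linked too, which is the ballooning described in step 6.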

There are a few approaches that we can use to improve the state of affairs.

A. On the library side, use static constructors and destructors 
sparingly inside druntime and std. We can use lazy initialization 
instead of compulsively initializing library internals. I think this is 
often a worthy thing to do in any case (dynamic libraries etc) because 
it only does work if and when work needs to be done at the small cost of 
a check upon each use.

B. On the compiler side, we could use a similar lazy initialization 
trick to only reference class methods in the module if they're actually 
needed. I'm being vague here because I'm not sure what and how that can 
be done.
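
Approach A could look roughly like the sketch below. It is only an illustration under assumptions: `buildTable`, `tableCache` and `squares` are hypothetical names, not anything in Phobos, and the table stands in for whatever a module would otherwise set up in `static this()`.

```d
import std.stdio;

// Hypothetical expensive setup that a `static this()` would otherwise run
// at program start, whether or not the data is ever used.
private int[] buildTable()
{
    auto t = new int[](16);
    foreach (i, ref x; t)
        x = cast(int)(i * i);
    return t;
}

// Module-level variables are thread-local by default in D, so each thread
// fills its own copy on first use.
private int[] tableCache;

// Accessor that replaces direct use of a variable filled by a static constructor.
int[] squares()
{
    if (tableCache is null)        // the small check paid on each use
        tableCache = buildTable();
    return tableCache;
}

void main()
{
    writeln(squares()[3]);
}
```

With no static constructor left in the module, nothing forces its ModuleInfo on behalf of importers, and the setup cost is paid only by programs that actually call `squares()`.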

Here's a list of all files in std using static cdtors:

std/__fileinit.d
std/concurrency.d
std/cpuid.d
std/cstream.d
std/datebase.d
std/datetime.d
std/encoding.d
std/internal/math/biguintcore.d
std/internal/math/biguintx86.d
std/internal/processinit.d
std/internal/windows/advapi32.d
std/mmfile.d
std/parallelism.d
std/perf.d
std/socket.d
std/stdiobase.d
std/uri.d

The majority of them don't do a lot of work and are not much used inside 
phobos, so they don't blow up the executable. The main one that could 
receive some attention is std.datetime. It has a few static ctors and a 
lot of classes. Essentially just importing std.datetime or any std 
module that transitively imports std.datetime (and there are many of 
them) ends up linking in most of Phobos and blows the size up from the 
218KB baseline to 700KB.

Jonathan, could I impose on you to replace all static cdtors in 
std.datetime with lazy initialization? I looked through it and it 
strikes me as a reasonably simple job, but I think you'd know better 
what to do than me.

A similar effort could be conducted to reduce or eliminate static cdtors 
from druntime. I made the experiment of commenting them all out, and that 
reduced the size of the baseline from 218KB to 200KB. This is a good 
amount, but not as dramatic as what we can get by working on std.datetime.


Thanks,

Andrei
Dec 16 2011
next sibling parent "Nick Sabalausky" <a a.a> writes:
Interesting stuff.

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:jcg2lu$17p2$1 digitalmars.com...
 We can use lazy initialization instead of compulsively initializing 
 library internals. I think this is often a worthy thing to do in any case 
 (dynamic libraries etc) because it only does work if and when work needs 
 to be done at the small cost of a check upon each use.
That also has the benefit of reducing the risk of dreaded circular ctor dependency problems.
Dec 16 2011
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 16 Dec 2011 13:29:18 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Hello,


 Late last night Walter and I figured a few interesting tidbits of  
 information. Allow me to give some context, discuss them, and sketch a  
 few approaches for improving things.

 A while ago Walter wanted to enable function-level linking, i.e. only  
 get the needed functions from a given (and presumably large) module. So  
 he arranged things that a library contains many small object "files"  
 (that actually are generated from a single .d file and never exist on  
 disk, only inside the library file, which can be considered an archive  
 like tar). Then the linker would only pick the used object "files" from  
 the library and link those in. Unfortunately that didn't have nearly the  
 expected impact - essentially the size of most binaries stayed the same.  
 The mystery was unsolved, and Walter needed to move on to other things.

 One particularly annoying issue is that even programs that don't  
 ostensibly use anything from an imported module may balloon inexplicably  
 in size. Consider:

 import std.path;
 void main(){}

 This program, after stripping and all, has some 750KB in size. Removing  
 the import line reduces the size to 218KB. That includes the runtime  
 support, garbage collector, and such, and I'll consider it a baseline.  
 (A similar but separate discussion could be focused on reducing the  
 baseline size, but herein I'll consider it constant.)

 What we'd simply want is to be able to import stuff without blatantly  
 paying for what we don't use. If a program imports std.path and uses no  
 function from it, it should be as large as a program without the import.  
 Furthermore, the increase should be incremental - using 2-3 functions  
 from std.path should only increase the executable size by a little, not  
 suddenly link in all code in that module.

 But in experiments it seemed like program size would increase in sudden  
 amounts when certain modules were included. After much investigation we  
 figured that the following fateful causal sequence happened:

 1. Some modules define static constructors with "static this()" or  
 "static shared this()", and/or static destructors.

 2. These constructors/destructors are linked in automatically whenever a  
 module is included.

 3. Importing a module with a static constructor (or destructor) will  
 generate its ModuleInfo structure, which contains static information  
 about all module members. In particular, it keeps virtual table pointers  
 for all classes defined inside the module.

 4. That means generating ModuleInfo refers all virtual functions defined  
 in that module, whether they're used or not.

 5. The phenomenon is transitive, e.g. even if std.path has no static  
 constructors but imports std.datetime which does, a ModuleInfo is  
 generated for std.path too, in addition to the one for std.datetime. So  
 now classes inside std.path (if any) will be all linked in.

 6. It follows that a module that defines classes which in turn use other  
 functions in other modules, and has static constructors (or includes  
 other modules that do) will balloon the size of the executable suddenly.

 There are a few approaches that we can use to improve the state of  
 affairs.

 A. On the library side, use static constructors and destructors  
 sparingly inside druntime and std. We can use lazy initialization  
 instead of compulsively initializing library internals. I think this is  
 often a worthy thing to do in any case (dynamic libraries etc) because  
 it only does work if and when work needs to be done at the small cost of  
 a check upon each use.

 B. On the compiler side, we could use a similar lazy initialization  
 trick to only refer class methods in the module if they're actually  
 needed. I'm being vague here because I'm not sure what and how that can  
 be done.
I disagree with this assessment. It's good to know the cause of the problem, but let's look at the root issue -- reflection. The only reason to include class information for classes not being referenced is to be able to construct/use classes at runtime instead of at compile time. But if you look at D's runtime reflection capabilities, they are quite poor. You can only construct a class at runtime if it has a zero-arg constructor.

So essentially, we are paying the penalty of having runtime reflection in terms of bloat, but get very very little benefit.

I think there are two things that need to be considered:

1. We eventually should have some reasonably complete runtime reflection capability
2. Runtime reflection and shared libraries go hand-in-hand. With shared library support, the bloat penalty isn't nearly as significant.

I don't think the right answer is to avoid using features of the language because the compiler/runtime has some design deficiencies. At some point these deficiencies will be fixed, and then we are left with a library that has seemingly odd design choices that we can't change.

-Steve
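
To make the zero-arg limitation concrete, here is a minimal sketch using `Object.factory` from druntime. The classes `WithDefault` and `NeedsArg` and the helper `makeByName` are hypothetical, and the empty static constructor is an assumption added only to be sure the module gets a ModuleInfo.

```d
// A class that can be built by name: it has an implicit zero-arg constructor.
class WithDefault
{
    string describe() { return "built by name"; }
}

// A class whose only constructor takes an argument.
class NeedsArg
{
    int value;
    this(int v) { value = v; }
}

// Assumption: an empty static constructor guarantees a ModuleInfo for this module.
static this() {}

// Hypothetical helper: build a class of this module from its unqualified name.
// Object.factory returns null when the class is unknown or has no default constructor.
Object makeByName(string className)
{
    return Object.factory(__MODULE__ ~ "." ~ className);
}

void main()
{
    auto a = cast(WithDefault) makeByName("WithDefault");
    assert(a !is null);                       // constructed from a string alone
    assert(makeByName("NeedsArg") is null);   // no zero-arg constructor, so no object
}
```

Neither class is named at the point of construction, only in the cast used to check the result, which is why the class information has to be reachable at runtime.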
Dec 16 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 1:23 PM, Steven Schveighoffer wrote:
 I disagree with this assessment. It's good to know the cause of the
 problem, but let's look at the root issue -- reflection. The only reason
 to include class information for classes not being referenced is to be
 able to construct/use classes at runtime instead of at compile time. But
 if you look at D's runtime reflection capabilities, they are quite poor.
 You can only construct a class at runtime if it has a zero-arg constructor.

 So essentially, we are paying the penalty of having runtime reflection
 in terms of bloat, but get very very little benefit.
I'd almost agree, but the code shown doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables.
 I think there are two things that need to be considered:

 1. We eventually should have some reasonably complete runtime reflection
 capability
 2. Runtime reflection and shared libraries go hand-in-hand. With shared
 library support, the bloat penalty isn't nearly as significant.

 I don't think the right answer is to avoid using features of the
 language because the compiler/runtime has some design deficiencies. At
 some point these deficiencies will be fixed, and then we are left with a
 library that has seemingly odd design choices that we can't change.
Runtime reflection is great, but I think it's a separate issue from what's discussed here. Andrei
Dec 16 2011
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2011-12-16 20:48, Andrei Alexandrescu wrote:
 On 12/16/11 1:23 PM, Steven Schveighoffer wrote:
 I disagree with this assessment. It's good to know the cause of the
 problem, but let's look at the root issue -- reflection. The only reason
 to include class information for classes not being referenced is to be
 able to construct/use classes at runtime instead of at compile time. But
 if you look at D's runtime reflection capabilities, they are quite poor.
 You can only construct a class at runtime if it has a zero-arg
 constructor.

 So essentially, we are paying the penalty of having runtime reflection
 in terms of bloat, but get very very little benefit.
I'd almost agree, but the code shown doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables.
There is other runtime reflection functionality that can be used.
 I think there are two things that need to be considered:

 1. We eventually should have some reasonably complete runtime reflection
 capability
 2. Runtime reflection and shared libraries go hand-in-hand. With shared
 library support, the bloat penalty isn't nearly as significant.

 I don't think the right answer is to avoid using features of the
 language because the compiler/runtime has some design deficiencies. At
 some point these deficiencies will be fixed, and then we are left with a
 library that has seemingly odd design choices that we can't change.
Runtime reflection is great, but I think it's a separate issue from what's discussed here.
I don't think it's completely separate. Can the compiler know if runtime reflection is used or not? -- /Jacob Carlborg
Dec 16 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 2:47 PM, Jacob Carlborg wrote:
 I don't think it's completely separate. Can the compiler know if runtime
 reflection is used or not?
Yes. Reflection is used if reflection primitive functions are called. Andrei
Dec 16 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-12-16 21:49, Andrei Alexandrescu wrote:
 On 12/16/11 2:47 PM, Jacob Carlborg wrote:
 I don't think it's completely separate. Can the compiler know if runtime
 reflection is used or not?
Yes. Reflection is used if reflection primitive functions are called. Andrei
Yeah, but how does the compiler know which are primitive functions, hard code them in the compiler? Or perhaps the compiler already needs to know this. -- /Jacob Carlborg
Dec 16 2011
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 16 Dec 2011 14:48:33 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 12/16/11 1:23 PM, Steven Schveighoffer wrote:
 I disagree with this assessment. It's good to know the cause of the
 problem, but let's look at the root issue -- reflection. The only reason
 to include class information for classes not being referenced is to be
 able to construct/use classes at runtime instead of at compile time. But
 if you look at D's runtime reflection capabilities, they are quite poor.
 You can only construct a class at runtime if it has a zero-arg  
 constructor.

 So essentially, we are paying the penalty of having runtime reflection
 in terms of bloat, but get very very little benefit.
I'd almost agree, but the code shown doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables.
You cannot know until link time whether factory is used when compiling individual files. By then it's probably too late to exclude them. The point is that you can instantiate unreferenced classes simply by calling them out by name.
 I think there are two things that need to be considered:

 1. We eventually should have some reasonably complete runtime reflection
 capability
 2. Runtime reflection and shared libraries go hand-in-hand. With shared
 library support, the bloat penalty isn't nearly as significant.

 I don't think the right answer is to avoid using features of the
 language because the compiler/runtime has some design deficiencies. At
 some point these deficiencies will be fixed, and then we are left with a
 library that has seemingly odd design choices that we can't change.
Runtime reflection is great, but I think it's a separate issue from what's discussed here.
I'm not pushing for runtime reflection, all I'm saying is, I don't think it's worth changing how the library is written to work around something because the *compiler* is incorrectly implemented/designed.

So why don't we just leave the code size situation as-is? 500kb is not a terribly significant amount, but dlls are on the horizon (Walter has publicly said so). Then size becomes a moot point.

If we get reflection, then you will find that having excluded all the runtime information when not used is going to hamper D's reflection capability, and we'll probably have to start putting it back in anyway.

In short, dlls will solve the problem, let's work on that instead of shuffling around code.

-Steve
Dec 16 2011
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 16 Dec 2011 16:28:03 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 So why don't we just leave the code size situation as-is?  500kb is not  
 a terribly significant amount, but dlls are on the horizon (Walter has  
 publicly said so).  Then size becomes a moot point.

 If we get reflection, then you will find that having excluded all the  
 runtime information when not used is going to hamper D's reflection  
 capability, and we'll probably have to start putting it back in anyway.

 In short, dlls will solve the problem, let's work on that instead of  
 shuffling around code.
The other valid option I see is removing the link to the virtual tables, thereby disabling reflection via factory until we can implement full reflection. -Steve
Dec 16 2011
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 16 December 2011 at 21:28:03 UTC, Steven Schveighoffer 
wrote:
 In short, dlls will solve the problem, let's work on that 
 instead of shuffling around code.
I wouldn't want to cripple either - put all the reflection info in the dll, but keep it sufficiently decoupled so the linker can strip it out when statically linking. The effort in decoupling most of the code isn't great.
Dec 16 2011
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 16 Dec 2011 16:48:47 -0500, Adam D. Ruppe  
<destructionator gmail.com> wrote:

 On Friday, 16 December 2011 at 21:28:03 UTC, Steven Schveighoffer wrote:
 In short, dlls will solve the problem, let's work on that instead of  
 shuffling around code.
I wouldn't want to cripple either - put all the reflection info in the dll, but keep it sufficiently decoupled so the linker can strip it out when statically linking. The effort in decoupling most of the code isn't great.
The only way I can think of to decouple it is to disable it with a compiler switch, since the compiler is the one including the info. I envision a nasty world where libraries are built 4 ways, with two orthogonal factors -- dynamic vs. static, and reflection vs. no reflection. Oh, hello visual C++, what are you doing here? -Steve
Dec 16 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 3:28 PM, Steven Schveighoffer wrote:
 On Fri, 16 Dec 2011 14:48:33 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 12/16/11 1:23 PM, Steven Schveighoffer wrote:
 I disagree with this assessment. It's good to know the cause of the
 problem, but let's look at the root issue -- reflection. The only reason
 to include class information for classes not being referenced is to be
 able to construct/use classes at runtime instead of at compile time. But
 if you look at D's runtime reflection capabilities, they are quite poor.
 You can only construct a class at runtime if it has a zero-arg
 constructor.

 So essentially, we are paying the penalty of having runtime reflection
 in terms of bloat, but get very very little benefit.
I'd almost agree, but the code shown doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables.
You cannot know until link time whether factory is used when compiling individual files. By then it's probably too late to exclude them.
I'm not an expert in linkers, but my understanding is that linkers naturally remove unused object files. That, coupled with dmd's ability to break compilation output in many pseudo-object files, would take care of the matter. Truth be told, once you link in Object.factory(), bam - all classes are linked.
 The point is that you can instantiate unreferenced classes simply by
 calling them out by name.
Yah, but you must call a function to do that.
 I'm not pushing for runtime reflection, all I'm saying is, I don't think
 it's worth changing how the library is written to work around something
 because the *compiler* is incorrectly implemented/designed.

 So why don't we just leave the code size situation as-is? 500kb is not a
 terribly significant amount, but dlls are on the horizon (Walter has
 publicly said so). Then size becomes a moot point.

 If we get reflection, then you will find that having excluded all the
 runtime information when not used is going to hamper D's reflection
 capability, and we'll probably have to start putting it back in anyway.

 In short, dlls will solve the problem, let's work on that instead of
 shuffling around code.
I think there are more issues with static this() than simply executable size, as discussed. Also, adding dynamic linking capability does not mean we give up on static linking. A lot of programs use static linking by choice, and for good reasons.

Andrei
Dec 16 2011
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 16 Dec 2011 17:00:45 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 12/16/11 3:28 PM, Steven Schveighoffer wrote:
 On Fri, 16 Dec 2011 14:48:33 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 12/16/11 1:23 PM, Steven Schveighoffer wrote:
 I disagree with this assessment. It's good to know the cause of the
 problem, but let's look at the root issue -- reflection. The only  
 reason
 to include class information for classes not being referenced is to be
 able to construct/use classes at runtime instead of at compile time.  
 But
 if you look at D's runtime reflection capabilities, they are quite  
 poor.
 You can only construct a class at runtime if it has a zero-arg
 constructor.

 So essentially, we are paying the penalty of having runtime reflection
 in terms of bloat, but get very very little benefit.
I'd almost agree, but the code shown doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables.
You cannot know until link time whether factory is used when compiling individual files. By then it's probably too late to exclude them.
I'm not an expert in linkers, but my understanding is that linkers naturally remove unused object files. That, coupled with dmd's ability to break compilation output in many pseudo-object files, would take care of the matter. Truth be told, once you link in Object.factory(), bam - all classes are linked.
Factory doesn't directly reference classes, it does so through the moduleinfo tree/array (not sure what it is). So the way it works is, the linker includes the module info because it's defined as static data, which includes the vtable functions, and factory can instantiate non-referenced classes because of this fact, not the other way around.
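
The direction of the dependency can be sketched by writing the lookup by hand. This is a simplified stand-in under assumptions, not druntime's actual implementation: `Hidden` and `createViaModuleInfo` are hypothetical names, and the empty static constructor is there only to make sure the module has a ModuleInfo.

```d
// A class the lookup code below never mentions by type.
class Hidden
{
    int answer() { return 42; }
}

// Assumption: guarantees a ModuleInfo for this module, as discussed in the thread.
static this() {}

// Hypothetical stand-in for a factory: search the ModuleInfo records for a
// class with the given fully qualified name, then default-construct it.
Object createViaModuleInfo(string qualifiedName)
{
    foreach (m; ModuleInfo)
        foreach (c; m.localClasses)
            if (c.name == qualifiedName)
                return c.create();   // null if the class has no default constructor
    return null;
}

void main()
{
    auto h = cast(Hidden) createViaModuleInfo(__MODULE__ ~ ".Hidden");
    assert(h !is null && h.answer() == 42);
}
```

The lookup function refers only to ModuleInfo, never to `Hidden`, so it is the module's static data that keeps the class and its vtable in the executable.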
 I'm not pushing for runtime reflection, all I'm saying is, I don't think
 it's worth changing how the library is written to work around something
 because the *compiler* is incorrectly implemented/designed.

 So why don't we just leave the code size situation as-is? 500kb is not a
 terribly significant amount, but dlls are on the horizon (Walter has
 publicly said so). Then size becomes a moot point.

 If we get reflection, then you will find that having excluded all the
 runtime information when not used is going to hamper D's reflection
 capability, and we'll probably have to start putting it back in anyway.

 In short, dlls will solve the problem, let's work on that instead of
 shuffling around code.
I think there are more issues with static this() than simply executable size, as discussed. Also, adding dynamic linking capability does not mean we give up on static linking. A lot of programs use static linking by choice, and for good reasons.
Even statically linked programs might use runtime reflection. I agree the issue is not static linking vs. dynamic linking, but dynamic linking would hide the problem quite well. Note that on Linux today, the executable is not truly static -- OS libs are dynamically linked.

Another option is to disable runtime reflection via a compiler switch (which would sever the ties between moduleinfo and classinfo). Then we simply must make sure we don't use factory in the library anywhere.

-Steve
Dec 16 2011
parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
Am 16.12.2011, 23:08 Uhr, schrieb Steven Schveighoffer  
<schveiguy yahoo.com>:

 Note that on Linux today, the executable is not truly static -- OS libs  
 are dynamically linked.
That should hold true for any OS. Otherwise, how would the program communicate with the kernel and drivers, i.e. render a button on the screen? Some dynamically linked in functions must provide the interface to that "administrative singleton" that manages system resources.
Dec 18 2011
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 18 Dec 2011 18:02:10 -0500, Marco Leise <Marco.Leise gmx.de> wrote:

 Am 16.12.2011, 23:08 Uhr, schrieb Steven Schveighoffer  
 <schveiguy yahoo.com>:

 Note that on Linux today, the executable is not truly static -- OS libs  
 are dynamically linked.
That should hold true for any OS. Otherwise, how would the program communicate with the kernel and drivers, i.e. render a button on the screen? Some dynamically linked in functions must provide the interface to that "administrative singleton" that manages system resources.
Not necessarily. On Linux, system calls provide the "interface" between the code and the OS. A system call is essentially an OS interrupt, similar to a network protocol. You don't need dynamic linking to implement it. Remember, Linux didn't even support dynamic libraries before kernel 1.2 maybe? Hm... must check wikipedia...

But my point is, if the intention is that you have a myriad of D based libraries or executables on your system, then druntime and phobos enter the same realm as glibc.

-Steve
Dec 19 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Dec 16, 2011, at 2:00 PM, Andrei Alexandrescu wrote:
 I'm not an expert in linkers, but my understanding is that linkers
 naturally remove unused object files. That, coupled with dmd's ability
 to break compilation output in many pseudo-object files, would take care
 of the matter. Truth be told, once you link in Object.factory(), bam -
 all classes are linked.
There's an old bugzilla entry that may apply: http://d.puremagic.com/issues/show_bug.cgi?id=879
Dec 16 2011
prev sibling parent "Martin Nowak" <dawg dawgfoto.de> writes:
 I'm not an expert in linkers, but my understanding is that linkers  
 naturally remove unused object files. That, coupled with dmd's ability  
 to break compilation output in many pseudo-object files, would take care  
 of the matter. Truth be told, once you link in Object.factory(), bam -  
 all classes are linked.
That's strange, because Object.factory should only require TypeInfo_Class which only indirectly iterates through all modules. The ModuleInfos do drag in all their classes so what we currently don't get is a module with only some of its classes.

What OS are you using? Can you bundle up some files that reproduce this?
Jan 18 2012
prev sibling parent reply torhu <no spam.invalid> writes:
On 16.12.2011 22:28, Steven Schveighoffer wrote:
 In short, dlls will solve the problem, let's work on that instead of
 shuffling around code.
How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe.
Dec 16 2011
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, December 16, 2011 23:30:44 torhu wrote:
 On 16.12.2011 22:28, Steven Schveighoffer wrote:
 In short, dlls will solve the problem, let's work on that instead of
 shuffling around code.
How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe.
You have to stick it all in the DLL anyway (since you can't know which parts will and won't be used), so the whole issue of not including used functionality goes away completely. There's no point in worrying about how much unused functionality gets included when you have no choice but to include everything regardless of whether it's actually used. - Jonathan M Davis
Dec 16 2011
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 16 Dec 2011 17:30:44 -0500, torhu <no spam.invalid> wrote:

 On 16.12.2011 22:28, Steven Schveighoffer wrote:
 In short, dlls will solve the problem, let's work on that instead of
 shuffling around code.
How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe.
The DLL is loaded into memory once. With static linking, it's loaded every time you run an exe. -Steve
Dec 19 2011
parent reply torhu <no spam.invalid> writes:
On 19.12.2011 16:08, Steven Schveighoffer wrote:
 On Fri, 16 Dec 2011 17:30:44 -0500, torhu<no spam.invalid>  wrote:

  On 16.12.2011 22:28, Steven Schveighoffer wrote:
  In short, dlls will solve the problem, let's work on that instead of
  shuffling around code.
How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe.
The DLL is loaded into memory once. With static linking, it's loaded every time you run an exe.
I thought we were talking about distribution sizes, not memory use. But anyway, DLLs won't do a lot as long as people don't have a whole bunch of D programs installed.
Dec 19 2011
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Dec 2011 13:09:18 -0500, torhu <no spam.invalid> wrote:

 On 19.12.2011 16:08, Steven Schveighoffer wrote:
 On Fri, 16 Dec 2011 17:30:44 -0500, torhu<no spam.invalid>  wrote:

  On 16.12.2011 22:28, Steven Schveighoffer wrote:
  In short, dlls will solve the problem, let's work on that instead of
  shuffling around code.
How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe.
The DLL is loaded into memory once. With static linking, it's loaded every time you run an exe.
I thought we were talking about distribution sizes, not memory use. But anyway, DLLs won't do a lot as long as people don't have a whole bunch of D programs installed.
Right, in order for dlls to make a difference, you need to separate the library install from the exe install, as is done most of the time. If you are installing one D application on your box, what would be the issue with the size anyway? The complaint is generally that the size is much bigger than a hello world compiled for C/C++, which obviously doesn't take into account that the C/C++ standard libraries are DLLs. -Steve
Dec 19 2011
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2011-12-19 19:09, torhu wrote:
 On 19.12.2011 16:08, Steven Schveighoffer wrote:
 On Fri, 16 Dec 2011 17:30:44 -0500, torhu<no spam.invalid> wrote:

 On 16.12.2011 22:28, Steven Schveighoffer wrote:
 In short, dlls will solve the problem, let's work on that instead of
 shuffling around code.
How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe.
The DLL is loaded into memory once. With static linking, it's loaded every time you run an exe.
I thought we were talking about distribution sizes, not memory use. But anyway, DLL's won't do a lot as long as people don't have a whole bunch of D programs installed.
It could be useful for a package manager. Theoretically all installed packages could share the same dynamic library. But I would guess the packages would depend on different versions of the library and the package manager would end up installing a whole bunch of different versions of Phobos and druntime. -- /Jacob Carlborg
Dec 19 2011
parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
Am 19.12.2011, 20:43 Uhr, schrieb Jacob Carlborg <doob me.com>:

 It could be useful for a package manager. Theoretically all installed  
 packages could share the same dynamic library. But I would guess the the  
 packages would depend on different versions of the library and the  
 package manager would end up installing a whole bunch of different  
 versions of the Phobos and druntime.
No! Let's please try to get closer to something that works with package managers than the situation on Windows. On Windows I see few applications that install libraries separately, unless they started on Linux or the libraries are established like DirectX. In the past DLLs from newly installed programs used to overwrite existing DLLs. IIRC the DLLs were then checked for their versions by installers, so they are never downgraded, but that still broke some applications with library updates that changed the API. Starting with Vista, there is the winsxs directory that - as I understand it - keeps a copy of every version of every dll associated to the programs that installed/use them.

Package managers are close to my ideal world:
- different API versions (major revisions) can be installed in parallel
- applications link to the API version they were designed for
- bug fixes replace the old DLL for the whole system, all applications benefit
- RAM is shared between applications that use the same DLL

I'd think it would be bad to make cuts here. If you cannot even imagine an operating system with 1000 little apps like type/cat, cp/copy, sed etc... written in D, because they would all link statically against the runtime and cause major bloat, then that is turning off another few % of C users and purists. You don't drive an off-road car, because you go off-roads so often, but because you could imagine it. (Please buy small cars for city use.)

Linking against different library versions goes in practice like this: There is at least one version installed, maybe libphobos2.so.1.057. The 1 would be a major revision (one where hard deprecations occur), then there is a link named libphobos2.so.1 to that file, that all applications using API version 1 link against. So the actual file can be updated to libphobos2.so.1.058 without recompiles or breakage.
Dec 20 2011
parent "dsimcha" <dsimcha yahoo.com> writes:
On Tuesday, 20 December 2011 at 20:51:38 UTC, Marco Leise wrote:
 Am 19.12.2011, 20:43 Uhr, schrieb Jacob Carlborg <doob me.com>:
 On Windows I see few applications that install libraries 
 separately, unless they started on Linux or the libraries are 
 established like DirectX. In the past DLLs from newly installed 
 programs used to overwrite existing DLLs. IIRC the DLLs were 
 then checked for their versions by installers, so they are 
 never downgraded, but that still broke some applications with 
 library updates that changed the API. Starting with Vista, 
 there is the winsxs difrectory that - as I understand it - 
 keeps a copy of every version of every dll associated to the 
 programs that installed/use them.
Minor nitpick: winsxs has been around since XP.
Dec 20 2011
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-12-16 20:23, Steven Schveighoffer wrote:
 I disagree with this assessment. It's good to know the cause of the
 problem, but let's look at the root issue -- reflection. The only reason
 to include class information for classes not being referenced is to be
 able to construct/use classes at runtime instead of at compile time. But
 if you look at D's runtime reflection capabilities, they are quite poor.
 You can only construct a class at runtime if it has a zero-arg constructor.
It's not very useful as is, but you can create your own version that doesn't call the constructor and that can be more useful sometimes. I'm using that technique in my serialization library and providing a special method that can act as a constructor. -- /Jacob Carlborg
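
A minimal sketch of the limitation under discussion, assuming druntime's Object.factory as the runtime construction facility. The module name app and both classes are made up for illustration; the lookup uses the fully qualified class name and only succeeds when a zero-argument constructor is available.

```d
module app;

import std.stdio : writeln;

// Hypothetical class with a zero-argument constructor.
class NoArgs
{
    this() {}
}

// Hypothetical class whose only constructor takes an argument.
class NeedsArg
{
    int value;
    this(int value) { this.value = value; }
}

void main()
{
    // Looked up by fully qualified name at run time and default-constructed.
    Object a = Object.factory("app.NoArgs");
    writeln("NoArgs created: ", a !is null);

    // There is no zero-argument constructor to call, so factory
    // reports failure by returning null.
    Object b = Object.factory("app.NeedsArg");
    writeln("NeedsArg created: ", b !is null);
}
```

Anything beyond this, such as building an instance without running a constructor as the serialization library does, has to be written by hand on top of the class info.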
Dec 16 2011
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, December 16, 2011 12:29:18 Andrei Alexandrescu wrote:
 Jonathan, could I impose on you to replace all static cdtors in
 std.datetime with lazy initialization? I looked through it and it
 strikes me as a reasonably simple job, but I think you'd know better
 what to do than me.
 
 A similar effort could be conducted to reduce or eliminate static cdtors
 from druntime. I made the experiment of commenting them all, and that
 reduced the size of the baseline from 218KB to 200KB. This is a good
 amount, but not as dramatic as what we can get by working on std.datetime.
Hmm. I had a reply for this already, but it seems to have disappeared, so I'll try again. You could make core.time use property functions instead of the static immutable variables that it's using now for ticksPerSec and appOrigin, but in order to do that right would require introducing a mutex or synchronized block (which is really just a mutex under the hood anyway), and I'm loath to do that in time-related code. ticksPerSec gets used all over the place in TickDuration, and that could have a negative impact on performance for something that needs to be really fast (since it's used in stuff like StopWatch and benchmarking). On top of that, in order to maintain the current semantics, the property functions would have to be pure, which they can't be without doing some nasty casting to convince the compiler that stuff which isn't pure is actually pure.

For std.datetime, the problem would be reduced if a class could be created in CTFE and still be around at runtime, but we can't do that yet, and it wouldn't completely solve the problem, since the shared static constructor related to LocalTime has to call tzset. So, some sort of runtime initialization must be done. And the instances for the singleton are not only immutable, but the functions for getting them are pure. So, once again, some nasty casting would be required to get it to work without breaking purity. And once again, we'd have to introduce a mutex. And for both core.time and std.datetime we're talking about a mutex that would be needed only briefly to ensure that we don't end up with two threads trying to initialize the variable at the same time. After that, it would just be impeding performance for no value. They're classic situations for static constructors - initializing static immutable variables - and really, they _should_ be using static constructors. If we have to get rid of them, it's to get around other problems in the language or compiler instead of fixing those problems. So, on some level, that seems like a failure on the part of the language and the compiler. If we _have_ to find a workaround, then we have to find a workaround, but I find the need to be distasteful to say the least. I previously tried to get rid of the static constructors in std.datetime and couldn't precisely because they're needed unless you play major casting games to get around immutable and pure.

If we play nice, it's impossible to get rid of the static constructors in std.datetime. It probably is possible if we do nasty casting, but (much as I hate to use the word) it seems like this is a hack to get around the fact that the compiler isn't dealing with static constructors as well as we'd like. I'd _really_ like to see this fixed at the compiler level.

And honestly, I think that a far worse problem with static constructors is circular dependencies. _That_ is something that needs to be addressed with regards to static constructors. In general at this point, it's looking like static constructors are turning out to be a bit of a failure on some level, given the issues that we're having because of them, and I think that we should fix the language and/or compiler so that they _aren't_ a failure.

- Jonathan M Davis
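
To make the trade-off concrete, here is a minimal sketch of what replacing a static constructor with a lazily initialized property could look like. It is not the actual core.time code: queryTicksPerSec, cachedTicks and ticksReady are hypothetical names, and the constant stands in for a real OS query. It shows both costs mentioned above: every call takes a lock, and the accessor cannot be marked pure because it touches mutable global state.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the OS query a static constructor would perform.
long queryTicksPerSec()
{
    return 1_000_000;
}

// Mutable process-wide storage. __gshared bypasses the type system's
// sharing checks, so every access below happens under the lock.
private __gshared long cachedTicks;
private __gshared bool ticksReady;

// Lazily initialized accessor. It cannot be `pure`: it reads and writes
// mutable globals, which is the purity problem described above.
@property long ticksPerSec()
{
    long result;
    synchronized // one thread at a time; this is the mutex cost on each call
    {
        if (!ticksReady)
        {
            cachedTicks = queryTicksPerSec();
            ticksReady = true;
        }
        result = cachedTicks;
    }
    return result;
}

void main()
{
    writeln(ticksPerSec); // first call runs the initialization
    writeln(ticksPerSec); // later calls still pay for the lock
}
```

Avoiding the lock on the fast path would need atomics or a double-checked scheme, and keeping the accessor pure would need the kind of casting described above.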
Dec 16 2011
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 12/16/2011 08:41 PM, Jonathan M Davis wrote:
 On Friday, December 16, 2011 12:29:18 Andrei Alexandrescu wrote:
 Jonathan, could I impose on you to replace all static cdtors in
 std.datetime with lazy initialization? I looked through it and it
 strikes me as a reasonably simple job, but I think you'd know better
 what to do than me.

 A similar effort could be conducted to reduce or eliminate static cdtors
 from druntime. I made the experiment of commenting them all, and that
 reduced the size of the baseline from 218KB to 200KB. This is a good
 amount, but not as dramatic as what we can get by working on std.datetime.
Hmm. I had reply for this already, but it seems to have disappeared, so I'll try again. You could make core.time use property functions instead of the static immutable variables that it's using now for ticksPerSec and appOrigin, but in order to do that right would require introducing a mutex or synchronized block (which is really just a mutex under the hood anyway), and I'm loathe to do that in time-related code. ticksPerSec gets used all over the place in TickDuration, and that could have a negative impact on performance for something that needs to be really fast (since it's used in stuff like StopWatch and benchmarking). On top of that, in order to maintain the current semantics, the property functions would have to be pure, which they can't be without doing some nasty casting to convince the compiler that stuff which isn't pure is actually pure.
lazy variables would resolve this.
 For std.datetime, the problem would be reduced if a class could be created in
 CTFE and still be around at runtime, but we can't do that yet, and it wouldn't
 completely solve the problem, since the shared static constructor related to
 LocalTime has to call tzset. So, some sort of runtime initialization must be
 done. And the instances for the singleton are not only immutable, but the
 functions for getting them are pure. So, once again, some nasty casting would
 be required to get it to work without breaking purity. And once again, we'd
 have introduce a mutex. And for both core.time and std.datetime we're talking
 about a mutex would be needed only briefly to ensure that we don't end up with
 two threads trying to initialize the variable at the same time. After that, it
 would just be impeding performance for no value. They're classic situations
 for static constructors - initializing static immutable variables - and
 really, they _should_ be using static constructors. If we have to get rid of
 them, it's to get around other problems in the language or compiler instead of
 fixing those problems. So, on some level, that seems like a failure on the part
 of the language
no.
 and the compiler.
yes. Although I am not severely affected by 500kb of bloat.
 If we _have_ to find a workaround, then we
 have to find a workaround, but I find the need to be distasteful to say the
 least. I previously tried to get rid of the static constructors in
 std.datetime and couldn't precisely because they're needed unless you play
 major casting games to get around immutable and pure.

 If we play nice, it's impossible to get rid of the static constructors in
 std.datetime. It probably is possible if we do nasty casting, but (much as I
 hate to use the word) it seems like this is a hack to get around the fact that
 the compiler isn't dealing with static constructors as well as we'd like. I'd
 _really_ like to see this fixed at the compiler level.

 And honestly, I think that a far worse problem with static constructors is
 circular dependencies. _That_ is something that needs to be addressed with
 regards to static constructors.
Circular dependencies are not to be blamed on the design of static constructors.
 In general at this point, it's looking like
 static constructors are turning out to be a bit of a failure on some level,
 given the issues that we're having because of them, and I think that we should
 fix the language and/or compiler so that they _aren't_ a failure.

 - Jonathan M Davis
We are having (minor!!) problems because the task of initializing global data in a modular way is inherently hard. Just have a look how other languages handle initialization of global data and you'll notice that the D solution is actually very sensible.
Dec 16 2011
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, December 16, 2011 21:06:49 Timon Gehr wrote:
 On 12/16/2011 08:41 PM, Jonathan M Davis wrote:
 On Friday, December 16, 2011 12:29:18 Andrei Alexandrescu wrote:
 Jonathan, could I impose on you to replace all static cdtors in
 std.datetime with lazy initialization? I looked through it and it
 strikes me as a reasonably simple job, but I think you'd know better
 what to do than me.
 
 A similar effort could be conducted to reduce or eliminate static
 cdtors
 from druntime. I made the experiment of commenting them all, and that
 reduced the size of the baseline from 218KB to 200KB. This is a good
 amount, but not as dramatic as what we can get by working on
 std.datetime.> 
Hmm. I had reply for this already, but it seems to have disappeared, so I'll try again. You could make core.time use property functions instead of the static immutable variables that it's using now for ticksPerSec and appOrigin, but in order to do that right would require introducing a mutex or synchronized block (which is really just a mutex under the hood anyway), and I'm loathe to do that in time-related code. ticksPerSec gets used all over the place in TickDuration, and that could have a negative impact on performance for something that needs to be really fast (since it's used in stuff like StopWatch and benchmarking). On top of that, in order to maintain the current semantics, the property functions would have to be pure, which they can't be without doing some nasty casting to convince the compiler that stuff which isn't pure is actually pure.
lazy variables would resolve this.
True, but we don't have them.
 Circular dependencies are not to be blamed on the design of static
 constructors.
Yes they are. static constructors completely chicken out on them. Not only is there no real attempt to determine whether the static constructors are actually dependent (which granted, isn't an easy problem), but there is _zero_ support in the language for resolving such circular dependencies. There's no way to say that they _aren't_ dependent even if you can clearly see that they aren't. The solution used in Phobos (which won't work in std.datetime due to the use of immutable and pure) is to create a C module which has the code from the static constructor and then have a separate module which calls it in its static constructor. It works, but it's not pretty (and it doesn't always work - e.g. std.datetime), and it would be _far_ better if you could just mark a static constructor as not depending on anything or mark it as not depending on a specific module or something similar. And given how disgusting it generally is to even figure out what's causing a circular dependency when the runtime won't start your program because of it, I really think that this is a problem which should be resolved. static constructors need to be improved.
 In general at this point, it's looking like
 static constructors are turning out to be a bit of a failure on some
 level, given the issues that we're having because of them, and I think
 that we should fix the language and/or compiler so that they _aren't_ a
 failure.
 
 - Jonathan M Davis
We are having (minor!!) problems because the task of initializing global data in a modular way is inherently hard. Just have a look how other languages handle initialization of global data and you'll notice that the D solution is actually very sensible.
Yes. The situation with D is better than that of many other languages, but what problems we do have can be _really_ annoying to deal with. Having to deal with circular dependencies due to static module constructors which aren't actually interdependent is one of the most annoying issues in D IMHO. - Jonathan M Davis
Dec 16 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 12/16/2011 09:31 PM, Jonathan M Davis wrote:
 On Friday, December 16, 2011 21:06:49 Timon Gehr wrote:
 On 12/16/2011 08:41 PM, Jonathan M Davis wrote:
 On Friday, December 16, 2011 12:29:18 Andrei Alexandrescu wrote:
 Jonathan, could I impose on you to replace all static cdtors in
 std.datetime with lazy initialization? I looked through it and it
 strikes me as a reasonably simple job, but I think you'd know better
 what to do than me.

 A similar effort could be conducted to reduce or eliminate static
 cdtors
 from druntime. I made the experiment of commenting them all, and that
 reduced the size of the baseline from 218KB to 200KB. This is a good
 amount, but not as dramatic as what we can get by working on
 std.datetime.>
Hmm. I had reply for this already, but it seems to have disappeared, so I'll try again. You could make core.time use property functions instead of the static immutable variables that it's using now for ticksPerSec and appOrigin, but in order to do that right would require introducing a mutex or synchronized block (which is really just a mutex under the hood anyway), and I'm loathe to do that in time-related code. ticksPerSec gets used all over the place in TickDuration, and that could have a negative impact on performance for something that needs to be really fast (since it's used in stuff like StopWatch and benchmarking). On top of that, in order to maintain the current semantics, the property functions would have to be pure, which they can't be without doing some nasty casting to convince the compiler that stuff which isn't pure is actually pure.
lazy variables would resolve this.
True, but we don't have them.
 Circular dependencies are not to be blamed on the design of static
 constructors.
Yes they are.
No. They arise from the design of the module hierarchy.
 static constructors completely chicken out on them. Not only is
 there no real attempt to determine whether the static constructors are
 actually dependent (which granted, isn't an easy problem),
I don't think that is an option.
 but there is _zero_ support in the language for resolving such circular
dependencies. There's no
 way to say that they _aren't_ dependent even if you can clearly see that they
 aren't.
Yes there is. The compiler and runtime understand that they are not mutually dependent if their modules are not mutually dependent. Package level is the right level for dealing with such issues because the circular dependencies are a modularity problem.
 The solution used in Phobos (which won't work in std.datetime due to
 the use of immutable and pure) is to create a C module which has the code from
 the static constructor and then have a separate module which calls it in its
 static constructor.
You don't need a C function if you just factor out every variable it initializes to the separate D module. __fileinit.d works that way. I don't see why stdiobase.d could not do the same.
 It works, but it's not pretty (and it doesn't always work
 - e.g. std.datetime), and it would be _far_ better if you could just mark a
 static constructor as not depending on anything or mark it as not depending on
 a specific module or something similar.
How would that be checked?
 And given how disgusting it generally
 is to even figure out what's causing a circular dependency when the runtime
 won't start your program because of it, I really think that this is a problem
 which should resolved. static constructors need to be improved.
Nobody has figured out how to solve the problem of modular global data initialization. That is because there probably is no solution.
 In general at this point, it's looking like
 static constructors are turning out to be a bit of a failure on some
 level, given the issues that we're having because of them, and I think
 that we should fix the language and/or compiler so that they _aren't_ a
 failure.

 - Jonathan M Davis
We are having (minor!!) problems because the task of initializing global data in a modular way is inherently hard. Just have a look how other languages handle initialization of global data and you'll notice that the D solution is actually very sensible.
Yes. The situation with D is better than that of many other languages, but what prodblems we do have can be _really_ annoying to deal with. Have to deal with circular dependencies due to static module constructors which aren't actually interdependent is one of the most annoying issues in D IMHO.
Adding a language construct that turns off the checking entirely (as you seem to suggest) is not at all better than having to create a few additional source files.
Dec 16 2011
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, December 16, 2011 22:41:14 Timon Gehr wrote:
 On 12/16/2011 09:31 PM, Jonathan M Davis wrote:
 You don't need a C function if you just factor out every variable it
 initializes to the separate D module. __fileinit.d works that way. I
 don't see why stdiobase.d could not do the same.
That only works if the variable being initialized is in the new module instead of the original module, which you can't always do.
 It works, but it's not pretty (and it doesn't always work
 - e.g. std.datetime), and it would be _far_ better if you could just
 mark a static constructor as not depending on anything or mark it as
 not depending on a specific module or something similar.
How would that be checked?
It wouldn't be. It wouldn't need to be. The programmer is telling the compiler that there isn't a dependency. It's up to the programmer to make sure that it's right, and if it's wrong, it's their fault. There are plenty of other features like that in D - just not SafeD.
 annoying issues in D IMHO.
Adding a language construct that turns off the checking entirely (as you seem to suggest) is not at all better than having to create a few additional source files.
I completely disagree. For instance, it's impossible to move the singleton instances of UTC and LocalTime from std.datetime into another module without breaking encapsulation, and it's definitely impossible to do it and leave them as members of their respective classes. Those static constructors clearly don't rely on any other modules except for the one which gives the declaration for tzset (and has no static constructors). But if std.file needed a module constructor, we'd end up with a circular dependency between std.datetime and std.file when clearly nothing in std.datetime's static constructor relies on std.file in any way shape or form. It would be a huge improvement to be able to just mark those static constructors as not relying on any other modules having their static constructors run first. As it stands, it's a royal pain to deal with any circular dependencies which pop up and because of that, it quickly becomes best practice to avoid static constructors as much as possible, which is a big problem IMHO. Factoring out the static constructor's contents into a separate module is not always possible, and it's an ugly solution IMHO. I'd _much_ rather have a feature where I can tell the compiler that there is no circular dependency so that it can appropriately order the loading of the modules. - Jonathan M Davis
Dec 16 2011
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 12/16/2011 11:39 PM, Jonathan M Davis wrote:
 [...]  For instance, it's impossible to move the singleton
 instances of UTC and LocalTime from std.datetime into another module without
 breaking encapsulation.
In what way would encapsulation be broken by just moving the class to a helper module?
Dec 16 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 4:39 PM, Jonathan M Davis wrote:
 It wouldn't be. It wouldn't need to be. The programmer is telling the compiler
 that there isn't a dependency. It's up to the programmer to make sure that
 it's right, and it's wrong, it's their fault. There are plenty of other
 features like that in D - just not SafeD.
I don't see progress here over arranging packages and modules to reflect program structure in a way that clarifies it to the human /and/ the compiler.
 annoying issues in D IMHO.
Adding a language construct that turns off the checking entirely (as you seem to suggest) is not at all better than having to create a few additional source files.
I completely disagree. For instance, it's impossible to move the singleton instances of UTC and LocalTime from std.datetime into another module without breaking encapsulation, and it's definitely impossible to do it and leave them as members of their respective classes.
Maybe there's an issue with the design. Maybe Singleton (the most damned of all patterns) is not the best choice here. Or maybe the use of an inheritance hierarchy with a grand total of 4 classes. Or maybe the encapsulation could be rethought.

The general point is, a design lives within a language. Any language is going to disallow a few designs or make them unsuitable for particular situations. This is, again, multiplied by the context: it's the standard library.
 Those static constructors clearly
 don't rely on any other modules except for the one which gives the declaration
 for tzset (and has no static constructors). But if std.file needed a module
 constructor, we'd end up with a circular dependency between std.datetime and
 std.file when clearly nothing in std.datetime's static constructor relies on
 std.file in any way shape or form. It would be a huge improvement to be able to
 just mark those static constructors as not relying on any other modules having
 their static constructors run first. As it stands, it's a royal pain to deal
 with any circular dependencies which pop up and because of that, it quickly
 becomes best practice to avoid static constructors as much as possible, which
 is a big problem IMHO.
I think this point has gotten into an extreme, a corner of the design space. Yeah, sky's blue, apple pie is good (and too much of it gives diabetes), and module dependencies can be messy. But it strikes me as a bit backwards to add instructions in the core language to lessen guarantees and make things even messier, when alternatives exist that foster better dependency control for the very rare situations that need intervention. It's just not proportional response. The persona using such a feature would be quite an odd combination - a developer with sophisticated enough needs to want unchecked dependencies as a feature, yet naive enough to be unable to solve the problem without the feature, and yet again sophisticated enough to not make mistakes in using said feature.
 Factoring out the static constructor's contents into a separate module is not
 always possible, and it's an ugly solution IMHO. I'd _much_ rather have a
 feature where I can tell the compiler that there is no circular dependency so
 that it can appropriately order the loading of the modules.
But what's the appropriate order then? :o) Andrei
Dec 16 2011
next sibling parent reply maarten van damme <maartenvd1994 gmail.com> writes:
how did other languages solve this issue? I can't imagine D being the only
language with static constructors, do they have that problem too?
Dec 16 2011
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/16/2011 3:18 PM, maarten van damme wrote:
 how did other languages solve this issue? I can't imagine D beeing the only
 language with static constructors, do they have that problem too?
In C++, the order that static constructors run is implementation defined. No guarantees at all. The programmer has no reasonable way to control the order in which they are done. (Of course, C++ doesn't even have modules, so the notion of a module constructor is tenuous at best.)
Dec 16 2011
parent Sean Kelly <sean invisibleduck.org> writes:
On Dec 16, 2011, at 3:24 PM, Walter Bright wrote:

 On 12/16/2011 3:18 PM, maarten van damme wrote:
 how did other languages solve this issue? I can't imagine D being the only
 language with static constructors, do they have that problem too?
 In C++, the order that static constructors run is implementation defined. No guarantees at all. The programmer has no reasonable way to control the order in which they are done.

 (Of course, C++ doesn't even have modules, so the notion of a module constructor is tenuous at best.)
This aspect of C++ drives me absolutely crazy. Though I imagine it bothers a lot of people given all the coverage static initialization has gotten in C++ literature.
Dec 16 2011
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 12/17/2011 12:18 AM, maarten van damme wrote:
 how did other languages solve this issue? I can't imagine D beeing the
 only language with static constructors, do they have that problem too?
is to call the static constructor lazily upon class load time. That means it can be called at an arbitrary point during your program execution. And if you accidentally have circular dependencies between static constructors, your program may or may not blow up or behave badly.
Dec 16 2011
prev sibling parent Somedude <lovelydear mailmetrash.com> writes:
Le 17/12/2011 00:18, maarten van damme a écrit :
 how did other languages solve this issue? I can't imagine D beeing the
 only language with static constructors, do they have that problem too?
AFAIK, I believe like in D, it's best practice to avoid static well, even though the running order is well-defined. The dependency injection design pattern seems to help here.
Dec 17 2011
prev sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, December 16, 2011 17:13:49 Andrei Alexandrescu wrote:
 Maybe there's an issue with the design. Maybe Singleton (the most damned
 of all patterns) is not the best choice here. Or maybe the use of an
 inheritance hierarchy with a grand total of 4 classes. Or maybe the
 encapsulation could be rethought.
 
 The general point is, a design lives within a language. Any language is
 going to disallow a few designs or make them unsuitable for particular
 situation. This is, again, multiplied by the context: it's the standard
 library.
I don't know what's wrong with singletons. It's a great pattern in certain circumstances. In this case, it avoids unnecessary allocations every time that you do something like Clock.currTime(). There's no reason to keep allocating new instances of LocalTime and wasting memory. The data in all of them would be identical. And since the time zone has to be dynamic, it requires either a class or function pointers (or delegates). And since multiple functions are involved per time zone, it's far cleaner to use a class. It has the added benefit of giving you a nice place to do stuff like ask the time zone its name. So, I don't see what could be better than using classes for the time zones like it does now. And given the fact that it's completely unnecessary and wasteful to allocate multiple instances of UTC and LocalTime, it seems to me that the singleton pattern is exactly the correct solution for this problem.

There would be fewer potential issues with circular dependencies if std.datetime were broken up, but the consensus seems to be that we don't want to do that. Regardless, if I find a way to lazily load the singletons in spite of immutable and pure, then there won't be any more need for the static constructors for them. There's still one for the unit tests, but worse comes to worst, that functionality could be moved to a function which is called by the first unittest block.
 But what's the appropriate order then? :o)
It doesn't matter. The static constructors in std.datetime have no dependencies on other modules at all aside from object and the core module which holds the declaration for tzset. In neither case do they depend on any other static constructors. In my experience, that's almost always the case. But because of how circular dependencies are treated, the compiler/runtime considers it a circular dependency as soon as two modules which import each other directly - or worse, indirectly - both have module constructors, regardless of whether there is anything even vaguely interdependent about those static constructors and what they initialize. So, you're forced to move stuff into other modules, and in some cases (such as when pure or immutable is being used), that may not work. Clearly, I'm not going to win any arguments on this, given that both you and Walter are definitely opposed, but I definitely think that the current situation with circular dependencies is one of D's major warts. - Jonathan M Davis
Dec 16 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 5:50 PM, Jonathan M Davis wrote:
 On Friday, December 16, 2011 17:13:49 Andrei Alexandrescu wrote:
 Maybe there's an issue with the design. Maybe Singleton (the most damned
 of all patterns) is not the best choice here. Or maybe the use of an
 inheritance hierarchy with a grand total of 4 classes. Or maybe the
 encapsulation could be rethought.

 The general point is, a design lives within a language. Any language is
 going to disallow a few designs or make them unsuitable for particular
 situation. This is, again, multiplied by the context: it's the standard
 library.
I don't know what's wrong with singletons.
http://en.wikipedia.org/wiki/Singleton_pattern Second paragraph.
 It's a great pattern in certain
 circumstances. In this case, it avoids unnecessary allocations every time that
 you do something like Clock.currTime(). There's no reason to keep allocating
 new instances of LocalTime and wasting memory. The data in all of them would
 be identical. And since the time zone has to be dynamic, it requires either a
 class or function pointers (or delegates). And since multiple functions are
 involved per time zone, it's far cleaner to use class. It has the added
 benefit of giving you a nice place to do stuff like ask the time zone its name.
 So, I don't see what could be better than using classes for the time zones
 like it does now. And given the fact that it's completely unnecessary and
 wasteful to allocate multiple instances of UTC and LocalTime, it seems to me
 that the singleton pattern is exactly the correct solution for this problem.
You're using a stilted version of it. Most often the singleton object is created lazily upon the first access, whereas std.datetime creates the object (and therefore shotguns linkage with the garbage collector) even if never needed. But what I'm trying here is to lift the level of discourse. The Singleton sounds like the solution of choice already presupposing that inheritance and polymorphism are good decisions. What I'm trying to say is that D should be rich enough to allow you considerable freedom in the design space, so we should have enough means to navigate around this one particular issue. I don't think we can say with a straight face we can't avoid use of static this inside std.datetime.
 There would be fewer potential issues with circular dependencies if
 std.datetime were broken up, but the consensus seems to be that we don't want
 to do that. Regardless, if I find a way to lazily load the singletons in spite
 of immutable and pure, then there won't be any more need for the static
 constructors for them. There's still one for the unit tests, but worse comes
 to worst, that functionality could be moved to a function which is called by
 the first unittest block.
Maybe the choice of immutable and pure is too restrictive. How about making the object returned const?
 But what's the appropriate order then? :o)
It doesn't matter. The static constructors in std.datetime has no dependencies on other modules at all aside from object and the core module which holds the declaration for tzset. In neither case does it depend on any other static constructors. In my experience, that's almost always the case. But because of how circular dependencies are treated, the compiler/runtime considers it a circular dependency as soon as two modules which import each other directly - or worse, indirectly - both have module constructors, regardless of whether there is anything even vaguely interdependent about those static constructors and what they initialize. So, you're forced to move stuff into other modules, and in some cases (such as when pure or immutable is being used), that may not work.
Under what circumstances doesn't it work, and how would adding _more_ support for _less_ safety be better than a glorified cast that you can use _today_?
 Clearly, I'm not going to win any arguments on this, given that both you and
 Walter are definitely opposed, but I definitely think that the current
situation
 with circular dependencies is one of D's major warts.
I'm not nailed to the floor. Any good arguments would definitely change my opinion. Andrei
Dec 16 2011
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, December 16, 2011 18:05:56 Andrei Alexandrescu wrote:
 On 12/16/11 5:50 PM, Jonathan M Davis wrote:
 http://en.wikipedia.org/wiki/Singleton_pattern
 
 Second paragraph.
Valid points, but it's still useful under some circumstances. I don't actually use it very often personally. It just made sense here. Thanks for the link.
 You're using a stilted version of it. Most often the singleton object is
 created lazily upon the first access, whereas std.datetime creates the
 object (and therefore shotguns linkage with the garbage collector) even
 if never needed.
 
 But what I'm trying here is to lift the level of discourse. The
 Singleton sounds like the solution of choice already presupposing that
 inheritance and polymorphism are good decisions. What I'm trying to say
 is that D should be rich enough to allow you considerable freedom in the
 design space, so we should have enough means to navigate around this one
 particular issue. I don't think we can say with a straight face we can't
 avoid use of static this inside std.datetime.
The only reason that it's not lazily loading is because of the purity issue and the fact that it would require a mutex. The mutex we can live with. pure can't be gotten around easily, but I'll figure it out.

As for the general design, SysTime needs to be able to dynamically adjust its value based on the time zone upon request (e.g. asking for the SysTime as a string or asking for that SysTime's year). That essentially requires that the set of functions required for the calculations be swappable (preferably as a group, since that's far cleaner). Encapsulating it in a class gives you that polymorphic behavior quite nicely and also groups the various functions quite nicely. It also gives you a nice place to put some stuff like the time zone's name. Sure, we could theoretically change it to be a struct which holds function pointers, but that seems to me like you're pretty much just trying to redesign classes that way. I think that the basic design is solid.
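For reference, a minimal sketch of what lazy loading with a mutex could look like. This is an illustration under stated assumptions, not std.datetime's code: LocalTimeSketch and instance() are hypothetical names, the object is mutable rather than immutable, and the accessor is deliberately not pure, which is exactly the obstacle described above.

```d
// Hypothetical stand-in for a time zone class; not std.datetime's actual code.
class LocalTimeSketch
{
    private this() {}

    string name() const { return "local"; }

    // Not pure: it reads and writes static state.
    static LocalTimeSketch instance()
    {
        static bool checked;                 // thread-local by default in D
        __gshared LocalTimeSketch shared_;   // one instance for all threads

        if (!checked)
        {
            synchronized                     // entered at most once per thread
            {
                if (shared_ is null)
                    shared_ = new LocalTimeSketch;
            }
            checked = true;
        }
        return shared_;
    }
}
```

The thread-local flag keeps the lock off the fast path: each thread takes the mutex once, and later calls only test a thread-local bool. Nothing is allocated unless instance() is actually called.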
 There would be fewer potential issues with circular dependencies if
 std.datetime were broken up, but the consensus seems to be that we don't
 want to do that. Regardless, if I find a way to lazily load the
 singletons in spite of immutable and pure, then there won't be any more
 need for the static constructors for them. There's still one for the
 unit tests, but worse comes to worst, that functionality could be moved
 to a function which is called by the first unittest block.
Maybe the choice of immutable and pure is too restrictive. How about making the object returned const?
SysTime holds an immutable TimeZone (currently with Rebindable). In theory, this should have the advantage of making it possible to pass a SysTime across with send and receive, but bugs in the compiler currently make it impossible to construct an immutable SysTime. So, all TimeZone objects are const, or they won't work with SysTime. And since there's not normally a reason to change any of the values in a TimeZone (they don't hold much data in the first place), that's really not a problem. The only problem with making it immutable has to do with the singleton. I suppose that it could be changed to Rebindable!(immutable TimeZone) like in SysTime, but when I designed it, there didn't seem much point to that, since it had to be constructed at runtime and required a static constructor regardless. And I was trying to make absolutely as much in std.datetime pure as possible, which inevitably led to the singletons being pure. Making them impure makes it so that a variety of other functions can't be pure and would break code. I don't remember how much, however. Regardless, to avoid breaking code, it has to be pure. It's possible that the code breakage would be worth it, but I'd have to mess around with it to see. With appropriate casts, pure can be subverted, but that's obviously ugly.
 Under what circumstances it doesn't work,
I couldn't move the singletons out of std.datetime in that way. pure disallows it.
 and how would adding _more_
 support for _less_ safety would be better than a glorified cast that you
 can use _today_?

 Clearly, I'm not going to win any arguments on this, given that both you
 and Walter are definitely opposed, but I definitely think that the
 current situation with circular dependencies is one of D's major warts.
I'm not nailed to the floor. Any good arguments would definitely change my opinion.
I don't think that I have ever seen an _actual_ circular dependency when a program blows up because of it. It's always a case of the two modules doing completely unrelated stuff with their static constructors. It's generally incredibly obvious that there's no interdependency, but the compiler/runtime isn't smart enough to see that. And if you use static constructors much (which invariably happens if you have much in the way of immutable variables which are commonly used enough to put at module or class scope), you run into this problem fairly easily. And given the large amount of inter-module importing in Phobos, it's _very_ easy to run into the problem there if we use static constructors.

When such circular dependencies happen, it's a royal pain to sort out what's going on - especially if the modules don't import each other directly. The error messages have improved, but it's still nasty to sort out exactly what's happening. And then fixing it? Assuming that you can use the solution that some of Phobos' modules use by having a secondary module for the initialization, then there's a way to do it, but that solution is quite ugly IMHO, and regardless of that, it's _not_ in the least bit obvious. I don't know that I ever would have thought of it myself (maybe, maybe not).

So, the programmer is essentially faced with a situation where they have two modules with static constructors that they can clearly see are completely unrelated, but they're going to have to do some major refactoring to get around the issue that the compiler and runtime _aren't_ smart enough to see that the order in which the modules are initialized doesn't matter at all. _If_ they think of the solution that Phobos uses or are lucky enough to have someone else point it out to them _and_ it's actually possible to refactor the static constructor out like that, then the solution is doable, albeit arguably on the ugly side. But that's assuming a lot IMHO.
By contrast, we could have a simple feature that was explained in the documentation along with static constructors which made it easy to tell the compiler that the order doesn't matter - either by saying that it doesn't matter at all or that it doesn't matter in regards to a specific module. e.g.

nodepends(std.file)
static this()
{
}

Now the code doesn't have to be redesigned to get around the fact that the compiler just isn't smart enough to figure it out on its own. Sure, the feature is potentially unsafe, but so are plenty of other features in D. The best situation would be if the compiler was smart enough to figure it out for itself, but barring that this definitely seems like a far cleaner solution than having to try and figure out how to break up some of the initialization code for a module into a separate module, especially when features such as immutable and pure tend to make such separation impossible without some nasty casts. It would just be way simpler to have a feature which allowed you to tell the compiler that there was no dependency.

I'd probably feel differently about this if static constructors tended to have actual interdependencies, but they are almost invariably used for initializing immutable variables and the like and have no dependencies on other modules at all. It's other stuff in the modules which has those interdependencies.

- Jonathan M Davis
Dec 16 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 6:54 PM, Jonathan M Davis wrote:
 By contrast, we could have a simple feature that was explained in the
 documenation along with static constructors which made it easy to tell the
 compiler that the order doesn't matter - either by saying that it doesn't
 matter at all or that it doesn't matter in regards to a specific module. e.g.

  nodepends(std.file)
 static this()
 {
 }

 Now the code doesn't have to be redesigned to get around the fact that the
 compiler just isn't smart enough to figure it out on its own. Sure, the feature
 is potentially unsafe, but so are plenty of other features in D.
That is hardly a good argument in favor of the feature :o). One issue that you might have not considered is that this is more brittle than it might seem. Even though the dependency pattern is "painfully obvious" to the human at a point in time, maintenance work can easily change that, and in very non-obvious ways (e.g. dependency cycles spanning multiple modules). I've seen it happening in C++, and when you realize it it's quite mind-boggling.
 The best
 situation would be if the compiler was smart enough to figure it out for
 itself, but barring that this definitely seems like a far cleaner solution than
 having to try and figure out how to break up some of the initialization code
 for a module into a separate module, especially when features such as
 immutable and pure tend to make such separation impossible without some nasty
 casts. It would just be way simpler to have a feature which allowed you to
 tell the compiler that there was no dependency.
I think the only right approach to this must be principled - either by CTFEing the constructor or by guaranteeing it calls no functions that may close a dependency cycle. Even without that, I'd say we're in very good shape. Andrei
Dec 16 2011
parent reply deadalnix <deadalnix gmail.com> writes:
Le 17/12/2011 02:39, Andrei Alexandrescu a écrit :
 On 12/16/11 6:54 PM, Jonathan M Davis wrote:
 By contrast, we could have a simple feature that was explained in the
 documenation along with static constructors which made it easy to tell
 the
 compiler that the order doesn't matter - either by saying that it doesn't
 matter at all or that it doesn't matter in regards to a specific
 module. e.g.

  nodepends(std.file)
 static this()
 {
 }

 Now the code doesn't have to be redesigned to get around the fact that
 the
 compiler just isn't smart enough to figure it out on its own. Sure,
 the feature
 is potentially unsafe, but so are plenty of other features in D.
That is hardly a good argument in favor of the feature :o). One issue that you might have not considered is that this is more brittle than it might seem. Even though the dependency pattern is "painfully obvious" to the human at a point in time, maintenance work can easily change that, and in very non-obvious ways (e.g. dependency cycles spanning multiple modules). I've seen it happening in C++, and when you realize it it's quite mind-boggling.
 The best
 situation would be if the compiler was smart enough to figure it out for
 itself, but barring that this definitely seems like a far cleaner
 solution than
 having to try and figure out how to break up some of the
 initialization code
 for a module into a separate module, especially when features such as
 immutable and pure tend to make such separation impossible without
 some nasty
 casts. It would just be way simpler to have a feature which allowed
 you to
 tell the compiler that there was no dependency.
I think the only right approach to this must be principled - either by CTFEing the constructor or by guaranteeing it calls no functions that may close a dependency cycle. Even without that, I'd say we're in very good shape. Andrei
Very good point. CTFE is improving with each version of dmd, and is a real alternative to static this(). It should be considered when appropriate; it has many benefits.
Dec 17 2011
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 17, 2011 19:44:28 deadalnix wrote:
 Very good point. CTFE is improving with each version of dmd, and is a
 real alternative to static this(); It should be considered when
 appropriate, it has many benefits.
I think that in general, the uses for static this fall into one of two categories:

1. Initializing stuff that can't be initialized at compile time. This includes stuff like classes or AAs as well as stuff which needs to be initialized with a value which isn't known until runtime (e.g. when the program started running).

2. Calling functions which need to be called at the beginning of the program (e.g. a function which does something to the environment that the program is running in).

rarer of the two use cases. So, ultimately static this may become very rare.

- Jonathan M Davis
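To make the split concrete, here is a small sketch, assuming made-up names (makeSquares, squares, startStamp) that are not from Phobos. The first value can be computed by CTFE and needs no static constructor; the second depends on when the program starts and so falls into category 1.

```d
import core.stdc.time : time;

// Pure helper that CTFE can evaluate.
int[4] makeSquares() pure
{
    int[4] result;
    foreach (i, ref e; result)
        e = cast(int) (i * i);
    return result;
}

// A module-scope initializer is evaluated at compile time,
// so no static constructor is involved here.
immutable int[4] squares = makeSquares();

// Category 1: the value is only known once the program is running.
immutable long startStamp;

shared static this()
{
    startStamp = time(null);
}
```

Only the second variable makes the module participate in static constructor ordering and the cycle check.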
Dec 17 2011
parent Somedude <lovelydear mailmetrash.com> writes:
Le 18/12/2011 03:01, Jonathan M Davis a écrit :
 On Saturday, December 17, 2011 19:44:28 deadalnix wrote:
 Very good point. CTFE is improving with each version of dmd, and is a
 real alternative to static this(); It should be considered when
 appropriate, it has many benefits.
I think that in general, the uses for static this fall into one of two categories: 1. Initializing stuff that can't be initialized at compile time. This includes stuff like classes or AAs as well as stuff which needs to be initialized with a value which isn't known until runtime (e.g. when the program started running). 2. Calling functions which need to be called at the beginning of the program (e.g. a function which does something to the environment that the program is running in). rarer of the two use cases. So, ultimately static this may become very rare. - Jonathan M Davis
Google Guice or picocontainer to deal with this issue. In the case of datetime, though, I suspect it would be a using a hammer to crush a fly.
Dec 18 2011
prev sibling parent reply so <so so.so> writes:
On Sat, 17 Dec 2011 01:50:51 +0200, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Friday, December 16, 2011 17:13:49 Andrei Alexandrescu wrote:
 Maybe there's an issue with the design. Maybe Singleton (the most damned
 of all patterns) is not the best choice here. Or maybe the use of an
 inheritance hierarchy with a grand total of 4 classes. Or maybe the
 encapsulation could be rethought.

 The general point is, a design lives within a language. Any language is
 going to disallow a few designs or make them unsuitable for particular
 situation. This is, again, multiplied by the context: it's the standard
 library.
I don't know what's wrong with singletons. It's a great pattern in certain circumstances.
I don't like patterns much, but when it comes to singleton I absolutely hate it. Just ask yourself what it does to earn that fancy name. NOTHING. It is nothing but a hype of those who want to rule everything with one paradigm. Generic solutions/rules/paradigms are our final target WHEN they are elegant. If you are using singleton in your C++/D (or any other M-P language) code, do yourself a favor and trash that book you learned it from.

---
class A {
static A make();
}

class B;
B makeB();
---

What can A.make do that makeB cannot? (Other than creating objects of two different types :P )
Dec 17 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/17/11 6:34 AM, so wrote:
 If you are using singleton in your C++/D (or any other M-P language)
 code, do yourself a favor and trash that book you learned it from.

 ---
 class A {
 static A make();
 }

 class B;
 B makeB();
 ---

 What A.make can do makeB can not? (Other than creating objects of two
 different types :P )
Singleton has two benefits. One, you can't accidentally create more than one instance. The second, which is often overlooked, is that you still benefit from polymorphism (as opposed to making its state global). Andrei
Dec 17 2011
next sibling parent reply so <so so.so> writes:
On Sat, 17 Dec 2011 21:20:33 +0200, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 12/17/11 6:34 AM, so wrote:
 If you are using singleton in your C++/D (or any other M-P language)
 code, do yourself a favor and trash that book you learned it from.

 ---
 class A {
 static A make();
 }

 class B;
 B makeB();
 ---

 What A.make can do makeB can not? (Other than creating objects of two
 different types :P )
Singleton has two benefits. One, you can't accidentally create more than one instance. The second, which is often overlooked, is that you still benefit of polymorphism (as opposed to making its state global). Andrei
Now I am puzzled: "makeB" does both and does them better (better as it doesn't expose any detail to the user).
Dec 17 2011
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 17 December 2011 at 21:02:58 UTC, so wrote:
 On Sat, 17 Dec 2011 21:20:33 +0200, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 12/17/11 6:34 AM, so wrote:
 If you are using singleton in your C++/D (or any other M-P 
 language)
 code, do yourself a favor and trash that book you learned it 
 from.

 ---
 class A {
 static A make();
 }

 class B;
 B makeB();
 ---

 What A.make can do makeB can not? (Other than creating 
 objects of two
 different types :P )
Singleton has two benefits. One, you can't accidentally create more than one instance. The second, which is often overlooked, is that you still benefit of polymorphism (as opposed to making its state global). Andrei
Now i am puzzled, "makeB" does both and does better. (better as it doesn't expose any detail to user)
Both of your examples are the singleton pattern if `make` returns the same instance every time, and arguably (optionally?) A or B shouldn't be instantiable in any other way. I suspect that the reason a static member function is prevalent is because it's easy to just make the constructor private (and not have to mess with things like C++'s `friend`). In D, there's no real difference because you can still use private members as long as you're in the same module. The only difference between them I can see is that the module-level function doesn't expose the class name directly when using the function, which is but a minor improvement.
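A rough sketch of the two spellings, assuming both live in the same module. The bodies are invented for illustration, since the original example only declared the functions, and the instances here are thread-local.

```d
class A
{
    private this() {}

    // Static member function: the usual singleton spelling.
    static A make()
    {
        static A instance;        // thread-local by default
        if (instance is null)
            instance = new A;
        return instance;
    }
}

class B
{
    private this() {}
}

// Module-level function: it can still call B's private constructor,
// because private in D grants access to everything in the same module.
B makeB()
{
    static B instance;
    if (instance is null)
        instance = new B;
    return instance;
}
```

Either way, code outside the module cannot write new A or new B, so both functions are the only way in.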
Dec 17 2011
parent reply so <so so.so> writes:
On Sat, 17 Dec 2011 23:12:16 +0200, Jakob Ovrum <jakobovrum gmail.com>  
wrote:

 I suspect that the reason a static member function is prevalent is  
 because it's easy to just make the constructor private (and not have to  
 mess with things like C++'s `friend`). In D, there's no real difference  
 because you can still use private members as long as you're in the same  
 module.
Exactly. There is no difference between "static A.make" and "makeA" in D.
 The only difference between them I can see is that the module-level  
 function doesn't expose the class name directly when using the function,  
 which is but a minor improvement.
You have to expose it either way, no? "A.make" instead of "makeA"
Dec 18 2011
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Sunday, 18 December 2011 at 08:56:56 UTC, so wrote:
 You have to expose either way no? "A.make" instead of "makeA"
Yeah, in most sane code, I would imagine so. But still, the original example was just `make` versus `A.make`. They could both obscure their return type through various means (like auto), but imo it makes less sense to do so for the static member function - I would be surprised to call `A.make` and not get a value of type `A`. But it would only be a tiny improvement and I don't think it's really relevant to the singleton pattern.
Dec 18 2011
parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Sunday, 18 December 2011 at 09:26:58 UTC, Jakob Ovrum wrote:
 On Sunday, 18 December 2011 at 08:56:56 UTC, so wrote:
 You have to expose either way no? "A.make" instead of "makeA"
Yeah, in most sane code, I would imagine so. But still, the original example was just `make` versus `A.make`. They could both obscure their return type through various means (like auto), but imo it makes less sense to do so for the static member function - I would be surprised to call `A.make` and not get a value of type `A`. But it would only be a tiny improvement and I don't think it's really relevant to the singleton pattern.
Sorry, I'm wrong, that wasn't the case at all. The original example was indeed `A.make` versus `makeB`.
Dec 18 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 17, 2011 13:20:33 Andrei Alexandrescu wrote:
 On 12/17/11 6:34 AM, so wrote:
 If you are using singleton in your C++/D (or any other M-P language)
 code, do yourself a favor and trash that book you learned it from.
 
 ---
 class A {
 static A make();
 }
 
 class B;
 B makeB();
 ---
 
 What A.make can do makeB can not? (Other than creating objects of two
 different types :P )
Singleton has two benefits. One, you can't accidentally create more than one instance. The second, which is often overlooked, is that you still benefit of polymorphism (as opposed to making its state global).
Yes. There are occasions when singleton is very useful and makes perfect sense. There's every possibility that it's a design pattern which is overused, and if you don't need it, you probably shouldn't use it, but there _are_ cases where it's useful.

In the case of std.datetime, the UTC and LocalTime classes are singletons because there's absolutely no point in ever allocating more than one of them. It would be a waste of memory. Imagine if

auto time = Clock.currTime();

had to allocate a LocalTime object every time. That's a lot of useless heap allocation. By making it a singleton, it's far more efficient. Currently, it does _no_ heap allocation, and once the singleton becomes lazy, it'll only allocate on the first call.

I don't see a valid reason _not_ to use a singleton in this case - certainly not as long as time zones are classes, and I think that they make the most sense as classes considering what they have to do and how they have to behave.

- Jonathan M Davis
Dec 17 2011
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/16/2011 1:41 PM, Timon Gehr wrote:
 Adding a language construct that turns off the checking entirely (as you seem
to
 suggest) is not at all better than having to create a few additional source
files.
I also don't really see how turning off checking is even slightly more elegant than using a dirty cast. The additional source file thing is best because it fits in with the guarantees of the language - it is not a hack nor does it require trust in the programmer to get it right. It's not going to have heisenbugs where it working or not depends on arbitrary link order.
Dec 16 2011
prev sibling next sibling parent "Martin Nowak" <dawg dawgfoto.de> writes:
 Yes they are. static constructors completely chicken out on them. Not  
 only is
 there no real attempt to determine whether the static constructors are
 actually dependent (which granted, isn't an easy problem), but there is  
 _zero_
 support in the language for resolving such circular dependencies.  
 There's no
 way to say that they _aren't_ dependent even if you can clearly see that  
 they
 aren't. The solution used in Phobos (which won't work in std.datetime  
 due to
 the use of immutable and pure) is to create a C module which has the  
 code from
 the static constructor and then have a separate module which calls it in  
 its
 static constructor.
Which is a hack because that C function is a compiler wall while the dependency persists.
Btw., stdiobase and datebase are obsolete; the cycles have vanished. You would get this only if std.dateparse had a shared static ctor too.

Cycle detected between modules with ctors/dtors:
std.date -> std.dateparse -> std.date
object.Exception src/rt/minfo.d(309): Aborting!

There is a cleaner hack to solve the issue but I really don't like it.
It's two DAGs that are iterated, one for "shared static this" and one for "static this".

----
module a;
import b;

shared static this()
{
}
----
module b;
import a, core.atomic : cas;

shared bool initialized;

static this()
{
    if (!cas(&initialized, false, true))
        return;
    ...
}
----
Jan 18 2012
prev sibling parent "Martin Nowak" <dawg dawgfoto.de> writes:
On Wed, 18 Jan 2012 12:14:07 +0100, Martin Nowak <dawg dawgfoto.de> wrote:

 Yes they are. static constructors completely chicken out on them. Not  
 only is
 there no real attempt to determine whether the static constructors are
 actually dependent (which granted, isn't an easy problem), but there is  
 _zero_
 support in the language for resolving such circular dependencies.  
 There's no
 way to say that they _aren't_ dependent even if you can clearly see  
 that they
 aren't. The solution used in Phobos (which won't work in std.datetime  
 due to
 the use of immutable and pure) is to create a C module which has the  
 code from
 the static constructor and then have a separate module which calls it  
 in its
 static constructor.
Which is a hack because that C function is a compiler wall while the dependency persists.
Btw., stdiobase and datebase are obsolete; the cycles have vanished. You would get this only if std.dateparse had a shared static ctor too.

Cycle detected between modules with ctors/dtors:
std.date -> std.dateparse -> std.date
object.Exception src/rt/minfo.d(309): Aborting!

There is a cleaner hack to solve the issue but I really don't like it.
It's two DAGs that are iterated, one for "shared static this" and one for "static this".

----
module a;
import b;

shared static this()
{
}
----
module b;
import a, core.atomic : cas;

shared bool initialized;

static this()
{
    if (!cas(&initialized, false, true))
        return;
    ...
}
----
Forget about it. Immutable initialization shouldn't work from thread local ctors. But hey I found a bug and it already had a number http://d.puremagic.com/issues/show_bug.cgi?id=4923.
Jan 18 2012
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 1:41 PM, Jonathan M Davis wrote:
 You could make core.time use property functions instead of the static
 immutable variables that it's using now for ticksPerSec and
 appOrigin, but in order to do that right would require introducing a
 mutex or synchronized block (which is really just a mutex under the
 hood anyway), and I'm loathe to do that in time-related code.
 ticksPerSec gets used all over the place in TickDuration, and that
 could have a negative impact on performance for something that needs
 to be really fast (since it's used in stuff like StopWatch and
 benchmarking). On top of that, in order to maintain the current
 semantics, the property functions would have to be pure, which they
 can't be without doing some nasty casting to convince the compiler
 that stuff which isn't pure is actually pure.

 For std.datetime, the problem would be reduced if a class could be
 created in CTFE and still be around at runtime, but we can't do that
 yet, and it wouldn't completely solve the problem, since the shared
 static constructor related to LocalTime has to call tzset. So, some
 sort of runtime initialization must be done. And the instances for
 the singleton are not only immutable, but the functions for getting
 them are pure. So, once again, some nasty casting would be required
 to get it to work without breaking purity. And once again, we'd have to
 introduce a mutex. And for both core.time and std.datetime we're
 talking about a mutex that would be needed only briefly to ensure that we
 don't end up with two threads trying to initialize the variable at
 the same time. After that, it would just be impeding performance for
 no value. They're classic situations for static constructors -
 initializing static immutable variables - and really, they _should_
 be using static constructors. If we have to get rid of them, it's to
 get around other problems in the language or compiler instead of
 fixing those problems. So, on some level, that seems like a failure
 on the part of the language and the compiler. If we _have_ to find a
 workaround, then we have to find a workaround, but I find the need to
 be distasteful to say the least. I previously tried to get rid of the
 static constructors in std.datetime and couldn't precisely because
 they're needed unless you play major casting games to get around
 immutable and pure.

 If we play nice, it's impossible to get rid of the static
 constructors in std.datetime. It probably is possible if we do nasty
 casting, but (much as I hate to use the word) it seems like this is a
 hack to get around the fact that the compiler isn't dealing with
 static constructors as well as we'd like. I'd _really_ like to see
 this fixed at the compiler level.
I understand and empathize with the sentiment, and I agree with most of the technical points at face value, save for a few details. But there are other things at stake.

Consider scope. Many arguments applicable to application code are not quite fit for the standard library. The stdlib is where the compiler innards, the runtime innards, and the OS innards all meet, and the role of the stdlib is to provide nice abstractions to client code. Inside the stdlib it's entirely expected to find things like __traits most nobody heard of, casts, and other things that would be normally shunned in application code. I'd be more worried if there was no possibility to do what we need to do. The standard library is not a place to play it nice. We can't afford to say "well yeah everyone's binary is bloated and slower to start but we didn't like the cast that would have taken care of that".

As another matter, there is value in minimizing compulsive work during library startup. Consider for example this code in std.datetime:

shared static this()
{
    tzset();
    _localTime = new immutable(LocalTime)();
}

This summons the garbage collector right off the bat, thus wiping off anyone's chance of compiling and linking without a GC - as many people seem to want to do. And that happens not to programs that import and use std.datetime, but to programs using any part of the standard library that transitively imports std.datetime, even for the most innocuous uses, and even if they never, ever use _localTime! That one line essentially locks out 75% of the standard library to anyone wishing to ever avoid using the GC.
 And honestly, I think that a far worse problem with static
 constructors is circular dependencies. _That_ is something that
 needs to be addressed with regards to static constructors. In general
 at this point, it's looking like static constructors are turning out
 to be a bit of a failure on some level, given the issues that we're
 having because of them, and I think that we should fix the language
 and/or compiler so that they _aren't_ a failure.
Here I totally disagree. The design is sound. The issues discussed here are entirely detail implementation artifacts. Andrei
Dec 16 2011
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, December 16, 2011 14:44:48 Andrei Alexandrescu wrote:
 On 12/16/11 1:41 PM, Jonathan M Davis wrote:
 I understand and empathize with the sentiment, and I agree with most of
 the technical points at face value, save for a few details. But there
 are other things at stake.
 
 Consider scope. Many arguments applicable to application code are not
 quite fit for the standard library. The stdlib is the connection between
 the compiler innards, the runtime innards, and the OS innards all meet,
 and the role of the stdlib is to provide nice abstractions to client
 code. Inside the stdlib it's entirely expected to find things like
 __traits most nobody heard of, casts, and other things that would be
 normally shunned in application code. I'd be more worried if there was
 no possibility to do what we need to do. The standard library is not a
 place to play it nice. We can't afford to say "well yeah everyone's
 binary is bloated and slower to start but we didn't like the cast that
 would have taken care of that".
I'm not completely against this precisely because of this, but at the same time, it strikes me as completely ridiculous to have to resort to some nasty casting simply to reduce the binary size of the base executable. I'd much rather see the compiler improved such that this isn't necessary.
 As another matter, there is value in minimizing compulsive work during
 library startup. Consider for example this code in std.datetime:
 
 shared static this()
 {
 tzset();
 _localTime = new immutable(LocalTime)();
 }
 
 This summons the garbage collector right off the bat, thus wiping off
 anyone's chance of compiling and linking without a GC - as many people
 seem to want to do. And that happens not to programs that import and use
 std.datetime, but to program using any part of the standard library that
 transitively imports std.datetime, even for the most innocuous uses, and
 even if they never, ever use _localtime! That one line essentially locks
 out 75% of the standard library to anyone wishing to ever avoid using
 the GC.
This, on the other hand, is of much greater concern, and is a much better argument for using the ugly casting necessary to get rid of the static constructors, even if the compiler did a fantastic job at cutting out the extra cruft in the binary - though as far as the GC goes, it might not be an issue once CTFE is good enough to create classes at compile time that still exist at runtime. Unfortunately, the necessity of tzset would remain however.
 And honestly, I think that a far worse problem with static
 constructors is circular dependencies. _That_ is something that
 needs to be addressed with regards to static constructors. In general
 at this point, it's looking like static constructors are turning out
 to be a bit of a failure on some level, given the issues that we're
 having because of them, and I think that we should fix the language
 and/or compiler so that they _aren't_ a failure.
Here I totally disagree. The design is sound. The issues discussed here are entirely detail implementation artifacts.
As far as the binary size goes, I completely agree that it's an implementation issue, but I definitely think that the issue with circular dependencies is a design issue which needs to be addressed. The basics of static constructors wouldn't have to change drastically, but there should at least be a way to indicate to the compiler that there is not actually a circular dependency.

I don't think that I have ever seen druntime blow up on a circular dependency where there was actually a circular dependency. It's just that the compiler (or druntime or both) isn't smart enough to determine whether the static constructors _actually_ create a circular dependency. It has no way of determining which module's static constructors should be called first and gives up. We need a way to give it that information so that it can order them when they aren't actually interdependent.

_That_ is the design flaw that I see in static constructors, and it's one of the most annoying issues in the language IMHO (which arguably just goes to show how good D is in general, I suppose).

- Jonathan M Davis
Dec 16 2011
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 16 Dec 2011 15:58:28 -0500, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Friday, December 16, 2011 14:44:48 Andrei Alexandrescu wrote:
 As another matter, there is value in minimizing compulsive work during
 library startup. Consider for example this code in std.datetime:

 shared static this()
 {
 tzset();
 _localTime = new immutable(LocalTime)();
 }

 This summons the garbage collector right off the bat, thus wiping off
 anyone's chance of compiling and linking without a GC - as many people
 seem to want to do. And that happens not to programs that import and use
 std.datetime, but to program using any part of the standard library that
 transitively imports std.datetime, even for the most innocuous uses, and
 even if they never, ever use _localtime! That one line essentially locks
 out 75% of the standard library to anyone wishing to ever avoid using
 the GC.
This, on the other hand, is of much greater concern, and is a much better argument for using the ugly casting necessary to get rid of the static constructors, even if the compiler did a fantastic job at cutting out the extra cruft in the binary - though as far as the GC goes, it might not be an issue once CTFE is good enough to create classes at compile time that still exist at runtime. Unfortunately, the necessity of tzset would remain however.
This can be solved with malloc and emplace -Steve
Dec 16 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 3:43 PM, Steven Schveighoffer wrote:
 On Fri, 16 Dec 2011 15:58:28 -0500, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:

 On Friday, December 16, 2011 14:44:48 Andrei Alexandrescu wrote:
 As another matter, there is value in minimizing compulsive work during
 library startup. Consider for example this code in std.datetime:

 shared static this()
 {
 tzset();
 _localTime = new immutable(LocalTime)();
 }

 This summons the garbage collector right off the bat, thus wiping off
 anyone's chance of compiling and linking without a GC - as many people
 seem to want to do. And that happens not to programs that import and use
 std.datetime, but to program using any part of the standard library that
 transitively imports std.datetime, even for the most innocuous uses, and
 even if they never, ever use _localtime! That one line essentially locks
 out 75% of the standard library to anyone wishing to ever avoid using
 the GC.
This, on the other hand, is of much greater concern, and is a much better argument for using the ugly casting necessary to get rid of the static constructors, even if the compiler did a fantastic job at cutting out the extra cruft in the binary - though as far as the GC goes, it might not be an issue once CTFE is good enough to create classes at compile time that still exist at runtime. Unfortunately, the necessity of tzset would remain however.
This can be solved with malloc and emplace
Sure you meant static ubyte[__traits(classInstanceSize, T)] and emplace :o). Andrei
Dec 16 2011
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 16 Dec 2011 16:48:18 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 12/16/11 3:43 PM, Steven Schveighoffer wrote:
 On Fri, 16 Dec 2011 15:58:28 -0500, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:

 On Friday, December 16, 2011 14:44:48 Andrei Alexandrescu wrote:
 As another matter, there is value in minimizing compulsive work during
 library startup. Consider for example this code in std.datetime:

 shared static this()
 {
 tzset();
 _localTime = new immutable(LocalTime)();
 }

 This summons the garbage collector right off the bat, thus wiping off
 anyone's chance of compiling and linking without a GC - as many people
 seem to want to do. And that happens not to programs that import and  
 use
 std.datetime, but to program using any part of the standard library  
 that
 transitively imports std.datetime, even for the most innocuous uses,  
 and
 even if they never, ever use _localtime! That one line essentially  
 locks
 out 75% of the standard library to anyone wishing to ever avoid using
 the GC.
This, on the other hand, is of much greater concern, and is a much better argument for using the ugly casting necessary to get rid of the static constructors, even if the compiler did a fantastic job at cutting out the extra cruft in the binary - though as far as the GC goes, it might not be an issue once CTFE is good enough to create classes at compile time that still exist at runtime. Unfortunately, the necessity of tzset would remain however.
This can be solved with malloc and emplace
Sure you meant static ubyte[__traits(classInstanceSize, T)] and emplace :o).
That works too! -Steve
Dec 16 2011
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Dec 16, 2011, at 1:48 PM, Andrei Alexandrescu wrote:

 On 12/16/11 3:43 PM, Steven Schveighoffer wrote:
 
 
 This can be solved with malloc and emplace
Sure you meant static ubyte[__traits(classInstanceSize, T)] and emplace :o).
Don't forget the 16 byte alignment :-)
Dec 16 2011
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 12/17/2011 12:11 AM, Sean Kelly wrote:
 On Dec 16, 2011, at 1:48 PM, Andrei Alexandrescu wrote:

 On 12/16/11 3:43 PM, Steven Schveighoffer wrote:
 This can be solved with malloc and emplace
Sure you meant static ubyte[__traits(classInstanceSize, T)] and emplace :o).
Don't forget the 16 byte alignment :-)
Which is currently relatively easy: http://d.puremagic.com/issues/show_bug.cgi?id=6635
Dec 16 2011
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Sean Kelly:

 On Dec 16, 2011, at 1:48 PM, Andrei Alexandrescu wrote:
 Sure you meant static ubyte[__traits(classInstanceSize, T)]
 and emplace :o).
Don't forget the 16 byte alignment :-)
Is it possible to support this in D2/D3? align(16) static ubyte[__traits(classInstanceSize, T)] _localTime; There are some situations I'd like a static array to be aligned to 16 bytes. Bye, bearophile
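
For readers following along, the static-buffer-plus-emplace idea from this subthread might look roughly like the sketch below. Zone is a hypothetical stand-in for a class such as LocalTime, and zoneStorage, zoneInstance and zone() are names made up for illustration. Rather than assuming align(16) is honored on a static array, the sketch pads the buffer by 15 bytes and rounds the start address up, which is one way (not necessarily the way Phobos would do it) to satisfy the alignment concern. It is deliberately not thread-safe.

```d
import std.conv : emplace;

// Hypothetical stand-in for a class such as LocalTime.
final class Zone
{
    int offsetMinutes;
    this(int offsetMinutes) { this.offsetMinutes = offsetMinutes; }
}

enum zoneSize = __traits(classInstanceSize, Zone);

// Raw static storage, padded so a 16-byte-aligned window always fits.
__gshared ubyte[zoneSize + 15] zoneStorage;
__gshared Zone zoneInstance;

// Not thread-safe: this only sketches the allocation strategy.
Zone zone()
{
    if (zoneInstance is null)
    {
        // Round the buffer's start address up to the next multiple of 16.
        auto addr = (cast(size_t) zoneStorage.ptr + 15) & ~cast(size_t) 15;
        void[] chunk = (cast(void*) addr)[0 .. zoneSize];
        // Construct the object in place; no GC allocation happens here.
        zoneInstance = emplace!Zone(chunk, 60);
    }
    return zoneInstance;
}
```

The object lives in the program's static data, so nothing here requires the GC to be initialized; the price is that the instance is never destroyed or moved.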
Dec 16 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 2:58 PM, Jonathan M Davis wrote:
 Unfortunately, the necessity of tzset would remain however.
Why? From http://pubs.opengroup.org/onlinepubs/007904875/functions/tzset.html:

"The tzset() function shall use the value of the environment variable TZ to set time conversion information used by ctime(), localtime(), mktime(), and strftime(). If TZ is absent from the environment, implementation-defined default timezone information shall be used."

I'd expect a good standard library implementation for D would call tzset() once per process instance, lazily, inside the wrapper functions for the four functions above. Alternatively, people could call the stdc.* versions and expect tzset() to _not_ have been called.

That strikes the right balance between convenience, flexibility, and efficiency.

Andrei
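
A minimal sketch of what "once per process, lazily" could look like, under stated assumptions: expensiveInit is a hypothetical stand-in for the real tzset() call (which on POSIX systems is declared in core.sys.posix.time), and ensureInitialized, initialized and initCount are names invented for this illustration, not anything in std.datetime. It uses a double-checked flag guarded by a synchronized block, so the lock is taken only until the flag is set.

```d
import core.atomic : atomicLoad, atomicStore;

private shared bool initialized;   // process-wide "already done" flag
private __gshared int initCount;   // only here so the effect is observable

// Hypothetical stand-in for tzset().
void expensiveInit()
{
    ++initCount;
}

// Each wrapper function would call this before touching time zone state.
void ensureInitialized()
{
    // Fast path: after the first call this is a single atomic load.
    if (atomicLoad(initialized))
        return;

    synchronized // statement-level global mutex
    {
        // Re-check under the lock in case another thread won the race.
        if (!atomicLoad(initialized))
        {
            expensiveInit();
            atomicStore(initialized, true);
        }
    }
}
```

The trade-off raised elsewhere in the thread still applies: a function written this way is neither pure nor free of synchronization on its first call, so making it look pure would need the casting that was objected to.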
Dec 16 2011
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, December 16, 2011 16:58:51 Andrei Alexandrescu wrote:
 On 12/16/11 2:58 PM, Jonathan M Davis wrote:
 Unfortunately, the necessity of tzset would remain however.
Why? From http://pubs.opengroup.org/onlinepubs/007904875/functions/tzset.html: "The tzset() function shall use the value of the environment variable TZ to set time conversion information used by ctime(), localtime(), mktime(), and strftime(). If TZ is absent from the environment, implementation-defined default timezone information shall be used." I'd expect a good standard library implementation for D would call tzset() once per process instance, lazily, inside the wrapper functions for the four functions above. Alternatively, people could call the stdc.* versions and expect tzset() to _not_ have been called. That strikes the right balance between convenience, flexibility, and efficiency.
I mean that if CTFE was advanced enough that I could do

immutable _localTime = new LocalTime();

then I could eliminate the shared static constructor for UTC completely, but the tzset for LocalTime would still be required. It _should_ be run once per process, and it's currently in a shared static constructor, so that's what it does. It's just not currently lazy. Regardless, my point was that even if CTFE were that advanced, the static constructor would still be required. If it's changed so that it's lazily loaded, then it can be moved out of the static constructor, but the CTFE solution wouldn't be enough.

I'll look at what it would take to get rid of the static constructors and make the singletons load lazily, but it will require subverting the type system, since it's going to have to break both immutable and pure to be loaded lazily.

- Jonathan M Davis
Dec 16 2011
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/16/2011 3:54 PM, Jonathan M Davis wrote:
 I'll look at what it would take to get rid of the static constructors and make
 the singletons load lazily, but it will require subverting the type system,
 since it's going to have to break both immutable and pure to be loaded lazily.
Sure, but having a way to tell the compiler "assume this constructor does not have any dependencies" also subverts the type system.
Dec 16 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 5:54 PM, Jonathan M Davis wrote:
 On Friday, December 16, 2011 16:58:51 Andrei Alexandrescu wrote:
 On 12/16/11 2:58 PM, Jonathan M Davis wrote:
 Unfortunately, the necessity of tzset would remain however.
Why? From http://pubs.opengroup.org/onlinepubs/007904875/functions/tzset.html:
"The tzset() function shall use the value of the environment variable TZ
 to set time conversion information used by ctime(), localtime(),
 mktime(), and strftime(). If TZ is absent from the environment,
 implementation-defined default timezone information shall be
 used."

 I'd expect a good standard library implementation for D would call
 tzset() once per process instance, lazily, inside the wrapper
 functions for the four functions above. Alternatively, people could
 call the stdc.* versions and expect tzset() to _not_ have been
 called.

 That strikes the right balance between convenience, flexibility,
 and efficiency.
I mean that if CTFE was advanced enough that I could do immutable _localTime = new LocalTime(); then I could eliminate the shared static constructor for UTC completely, but the tzset for LocalTime would still be required. It _should_ be run once per process, and it's currently in a shared static constructor, so that's what it does. It's just not currently lazy. Regardless, my point was that even if CTFE were that advanced, the static constructor would still be required. If it's changed so that it's lazily loaded, then it can be moved out of the static constructor, but the CTFE solution wouldn't be enough. I'll look at what it would take to get rid of the static constructors and make the singletons load lazily, but it will require subverting the type system, since it's going to have to break both immutable and pure to be loaded lazily.
I think it's all a matter of terminology. Calling tzset during module initialization is not "required", doing it otherwise is not "impossible", and the standard library does not have to always "play it nice". :o) One more thing - could you take the time to explain why you believe calling tzset() compulsively is needed? Thanks, Andrei
Dec 16 2011
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, December 16, 2011 18:47:02 Andrei Alexandrescu wrote:
 One more thing - could you take the time to explain why you believe
 calling tzset() compulsively is needed?
Some of the C stuff that LocalTime uses requires it. If LocalTime is lazily initialized, then it can be called then though rather than in the shared static constructor. - Jonathan M Davis
Dec 16 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 6:53 PM, Jonathan M Davis wrote:
 On Friday, December 16, 2011 18:47:02 Andrei Alexandrescu wrote:
 One more thing - could you take the time to explain why you believe
 calling tzset() compulsively is needed?
Some of the C stuff that LocalTime uses requires it. If LocalTime is lazily initialized, then it can be called then though rather than in the shared static constructor. - Jonathan M Davis
Thanks. Sounds like we have a plan! Andrei
Dec 16 2011
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Dec 16, 2011, at 12:44 PM, Andrei Alexandrescu wrote:
 Consider scope. Many arguments applicable to application code are not
 quite fit for the standard library. The stdlib is the connection between
 the compiler innards, the runtime innards, and the OS innards all meet,
 and the role of the stdlib is to provide nice abstractions to client
 code. Inside the stdlib it's entirely expected to find things like
 __traits most nobody heard of, casts, and other things that would be
 normally shunned in application code. I'd be more worried if there was
 no possibility to do what we need to do. The standard library is not a
 place to play it nice. We can't afford to say "well yeah everyone's
 binary is bloated and slower to start but we didn't like the cast that
 would have taken care of that".
I think this is a reasonable assertion about druntime, but the standard library itself should require very little black magic, though the use of obscure features (like __traits) could be commonplace.
Dec 16 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 4:55 PM, Sean Kelly wrote:
 On Dec 16, 2011, at 12:44 PM, Andrei Alexandrescu wrote:
 Consider scope. Many arguments applicable to application code are
 not quite fit for the standard library. The stdlib is the
 connection between the compiler innards, the runtime innards, and
 the OS innards all meet, and the role of the stdlib is to provide
 nice abstractions to client code. Inside the stdlib it's entirely
 expected to find things like __traits most nobody heard of, casts,
 and other things that would be normally shunned in application
 code. I'd be more worried if there was no possibility to do what we
 need to do. The standard library is not a place to play it nice. We
 can't afford to say "well yeah everyone's binary is bloated and
 slower to start but we didn't like the cast that would have taken
 care of that".
I think this is a reasonable assertion about druntime, but the standard library itself should require very little black magic,
"Very little" sounds almost enough :o).
 though the use of obscure features (like __traits) could be
 commonplace.
Absolutely. Andrei
Dec 16 2011
prev sibling next sibling parent reply Trass3r <un known.com> writes:
A related issue is phobos being an intermodule dependency monster.
A simple hello world pulls in almost 30 modules!
And std.stdio is supposed to be just a simple wrapper around C FILE.
Dec 16 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 3:38 PM, Trass3r wrote:
 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
 And std.stdio is supposed to be just a simple wrapper around C FILE.
In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime. Once we solve the static constructor issue, function-level linking should take care of pulling only the minimum needed. One interesting fact is that a lot of issues that I tended to take non-critically ("templates cause bloat", "intermodule dependencies cause bloat", "static linking creates large programs") looked a whole lot differently when I looked closer at causes and effects. Andrei
Dec 16 2011
next sibling parent reply Trass3r <un known.com> writes:
Am 16.12.2011, 22:45 Uhr, schrieb Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 On 12/16/11 3:38 PM, Trass3r wrote:
 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
 And std.stdio is supposed to be just a simple wrapper around C FILE.
In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime.
Yep, the 30 modules is a measure I took before that commit.
 Once we solve the static constructor issue, function-level linking  
 should take care of pulling only the minimum needed.
Also by pulling in I just meant the imports. But the planned lazy semantic analysis should improve the situation.
Dec 16 2011
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 12/16/2011 10:53 PM, Trass3r wrote:
 Am 16.12.2011, 22:45 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 12/16/11 3:38 PM, Trass3r wrote:
 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
 And std.stdio is supposed to be just a simple wrapper around C FILE.
In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime.
Yep, the 30 modules is a measure I took before that commit.
 Once we solve the static constructor issue, function-level linking
 should take care of pulling only the minimum needed.
Also by pulling in I just meant the imports. But the planned lazy semantic analysis should improve the situation.
I think it is already lazy?

---
module a;
void foo(){ imanundefinedsymbolandcauseacompileerror(); }
---

---
module b;
import a;
void main(){ foo(); }
---
Dec 16 2011
prev sibling next sibling parent Bane <branimir.milosavljevic gmail.com> writes:
Andrei Alexandrescu Wrote:

 On 12/16/11 3:38 PM, Trass3r wrote:
 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
 And std.stdio is supposed to be just a simple wrapper around C FILE.
In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime. Once we solve the static constructor issue, function-level linking should take care of pulling only the minimum needed. One interesting fact is that a lot of issues that I tended to take non-critically ("templates cause bloat", "intermodule dependencies cause bloat", "static linking creates large programs") looked a whole lot differently when I looked closer at causes and effects. Andrei
http://wiki.freepascal.org/Size_Matters Otherwise a great language that never did manage to remove "bloated" factor from its name. Many people stopped using it because of that, including me. I guess people do not like bloat when programming systems stuff.
Dec 16 2011
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 16 December 2011 at 21:45:43 UTC, Andrei Alexandrescu 
wrote:
 Once we solve the static constructor issue, function-level 
 linking should take care of pulling only the minimum needed.
This sounds fantastic.
 One interesting fact is that a lot of issues that I tended to 
 take non-critically ("templates cause bloat", "intermodule 
 dependencies cause bloat", "static linking creates large 
 programs") looked a whole lot differently when I looked closer 
 at causes and effects.
I'd be careful not to overgeneralize from this though; templates do have the potential to bloat things up, etc. Though static linking has and always shall rock.

(For bloated templates, I had a monster of one in web.d that shrunk the binary by about three megabytes by refactoring some of it into regular functions. Shaved two seconds off the compile time too! Note this binary is my work project, so your results may vary with my library. It was basically inlining several kilobytes of the same stuff into hundreds of different functions... 10 kb * 300 functions = lots of code.)
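
As an illustration of the kind of refactoring described above (not the actual web.d code, which isn't shown in this thread), the usual shape is a thin template that does only the type-dependent step and forwards to an ordinary function. The names render and renderField are hypothetical.

```d
import std.conv : to;

// Ordinary function: compiled once, however many types call into it.
string renderField(string name, string value)
{
    return "<input name=\"" ~ name ~ "\" value=\"" ~ value ~ "\">";
}

// Thin template: only the conversion to string is instantiated per type T.
string render(T)(string name, T value)
{
    return renderField(name, to!string(value));
}
```

Each instantiation of render then contributes only a conversion and a call, instead of another copy of the whole formatting body.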
Dec 16 2011
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/16/2011 1:45 PM, Andrei Alexandrescu wrote:
 On 12/16/11 3:38 PM, Trass3r wrote:
 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
 And std.stdio is supposed to be just a simple wrapper around C FILE.
In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime.
Another thing is to avoid using classes for things where one does not expect it to ever be derived from. Use a struct instead, as referencing parts of the struct implementation will not pull in the whole of it, nor is there a vtbl[] to pull it all in. For example, in std.datetime there's "final class Clock". It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct.
Dec 16 2011
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 16 Dec 2011 17:55:47 -0500, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 12/16/2011 1:45 PM, Andrei Alexandrescu wrote:
 On 12/16/11 3:38 PM, Trass3r wrote:
 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
 And std.stdio is supposed to be just a simple wrapper around C FILE.
In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime.
Another thing is to avoid using classes for things where one does not expect it to ever be derived from. Use a struct instead, as referencing parts of the struct implementation will not pull in the whole of it, nor is there a vtbl[] to pull it all in. For example, in std.datetime there's "final class Clock". It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct.
Although I don't disagree with you that it should be a struct and not a class, does it have anything in its vtbl anyways if it's final? I'm just trying to understand what gets pulled in when you import a module with static ctors... -Steve
Dec 19 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/19/2011 7:17 AM, Steven Schveighoffer wrote:
 On Fri, 16 Dec 2011 17:55:47 -0500, Walter Bright <newshound2 digitalmars.com>
 wrote:
 For example, in std.datetime there's "final class Clock". It inherits nothing,
 and nothing can be derived from it. The comments for it say it is merely a
 namespace. It should be a struct.
Although I don't disagree with you that it should be a struct and not a class, does it have anything in its vtbl anyways if it's final?
Yes. The pointers to Object's functions, and a pointer to the TypeInfo for that class.
 I'm just trying to
 understand what gets pulled in when you import a module with static ctors...
Write some trivial code snippets, compile them, and take a look at the object file with obj2asm.
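
Besides inspecting object files, the vtbl point can be poked at from within the language. The sketch below assumes nothing about std.datetime itself: ClockAsClass and ClockAsStruct are hypothetical namespace-style types, and vtblEntries is a helper made up for this example. It relies on typeid of a class type yielding a TypeInfo_Class whose vtbl member lists the virtual table slots.

```d
// Hypothetical namespace-style class, in the spirit of "final class Clock".
final class ClockAsClass
{
    static long ticks() { return 42; }
}

// The same namespace expressed as a struct that cannot be instantiated.
struct ClockAsStruct
{
    @disable this();
    static long ticks() { return 42; }
}

// Number of slots in the class's virtual table, read from its TypeInfo_Class.
size_t vtblEntries()
{
    return typeid(ClockAsClass).vtbl.length;
}
```

Even with no virtual methods of its own, the final class reports a non-empty vtbl, while the struct version has no class TypeInfo or vtbl at all. What that costs in the final binary is still best checked in the object file, as suggested.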
Dec 19 2011
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Dec 2011 13:09:42 -0500, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 12/19/2011 7:17 AM, Steven Schveighoffer wrote:
 On Fri, 16 Dec 2011 17:55:47 -0500, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:
 For example, in std.datetime there's "final class Clock". It inherits  
 nothing,
 and nothing can be derived from it. The comments for it say it is  
 merely a
 namespace. It should be a struct.
Although I don't disagree with you that it should be a struct and not a class, does it have anything in its vtbl anyways if it's final?
Yes. The pointers to Object's functions, and a pointer to the TypeInfo for that class.
Well pointers to Object's functions shouldn't add any bloat. The TypeInfo may, but that shouldn't pull in any real code from the module, right?
 I'm just trying to
 understand what gets pulled in when you import a module with static  
 ctors...
Write some trivial code snippets, compile them, and take a look at the object file with obj2asm.
I'll rephrase -- I'm trying to understand what's *supposed* to happen :) Trusting that the compiler is doing it right isn't always correct. Though it probably is in this case. -Steve
Dec 19 2011
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/16/2011 2:55 PM, Walter Bright wrote:
 For example, in std.datetime there's "final class Clock". It inherits nothing,
 and nothing can be derived from it. The comments for it say it is merely a
 namespace. It should be a struct.
Or perhaps it should be in its own module.
Dec 19 2011
parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
On 19.12.2011 at 19:08, Walter Bright
<newshound2 digitalmars.com> wrote:

 On 12/16/2011 2:55 PM, Walter Bright wrote:
 For example, in std.datetime there's "final class Clock". It inherits  
 nothing,
 and nothing can be derived from it. The comments for it say it is  
 merely a
 namespace. It should be a struct.
Or perhaps it should be in its own module.
When I first saw it I thought "That's how _Java_ goes about free functions: Make it a class." :)
Dec 20 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/20/11 2:58 PM, Marco Leise wrote:
 On 19.12.2011 at 19:08, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 12/16/2011 2:55 PM, Walter Bright wrote:
 For example, in std.datetime there's "final class Clock". It inherits
 nothing,
 and nothing can be derived from it. The comments for it say it is
 merely a
 namespace. It should be a struct.
Or perhaps it should be in its own module.
When I first saw it I thought "That's how _Java_ goes about free functions: Make it a class." :)
Same here. If I had my way I'd rethink the name of those functions. Having a cutesy prefix "Clock." is hardly justifiable. Andrei
Dec 20 2011
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, December 20, 2011 17:32:53 Andrei Alexandrescu wrote:
 On 12/20/11 2:58 PM, Marco Leise wrote:
 On 19.12.2011 at 19:08, Walter Bright
 <newshound2 digitalmars.com> wrote:
 On 12/16/2011 2:55 PM, Walter Bright wrote:
 For example, in std.datetime there's "final class Clock". It
 inherits
 nothing,
 and nothing can be derived from it. The comments for it say it is
 merely a
 namespace. It should be a struct.
Or perhaps it should be in its own module.
When I first saw it I thought "That's how _Java_ goes about free functions: Make it a class." :)
Same here. If I had my way I'd rethink the name of those functions. Having a cutesy prefix "Clock." is hardly justifiable.
It's not the only place in Phobos which uses a class as a namespace. I believe that both std.process and std.windows.registry are doing the same thing. In this case, it nicely groups all of the functions that are grabbing the time in one form or another. They're all effectively grabbing the time from the system clock, so they're grouped on Clock. - Jonathan M Davis
Dec 20 2011
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Wednesday, 21 December 2011 at 02:10:30 UTC, Jonathan M Davis 
wrote:
 On Tuesday, December 20, 2011 17:32:53 Andrei Alexandrescu 
 wrote:
 On 12/20/11 2:58 PM, Marco Leise wrote:
 On 19.12.2011 at 19:08, Walter Bright
 <newshound2 digitalmars.com> wrote:
 On 12/16/2011 2:55 PM, Walter Bright wrote:
 For example, in std.datetime there's "final class Clock". 
 It
 inherits
 nothing,
 and nothing can be derived from it. The comments for it 
 say it is
 merely a
 namespace. It should be a struct.
Or perhaps it should be in its own module.
When I first saw it I thought "That's how _Java_ goes about free functions: Make it a class." :)
Same here. If I had my way I'd rethink the name of those functions. Having a cutesy prefix "Clock." is hardly justifiable.
It's not the only place in Phobos which uses a class as a namespace. I believe that both std.process and std.windows.registry are doing the same thing. In this case, it nicely groups all of the functions that are grabbing the time in one form or another. They're all effectively grabbing the time from the system clock, so they're grouped on Clock. - Jonathan M Davis
Sounds like the perfect candidate for its own module.
Dec 20 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, December 21, 2011 06:18:59 Jakob Ovrum wrote:
 On Wednesday, 21 December 2011 at 02:10:30 UTC, Jonathan M Davis
 It's not the only place in Phobos which uses a class as a
 namespace. I believe that both std.process and
 std.windows.registry are doing the same thing.
 
 In this case, it nicely groups all of the functions that are
 grabbing the time in one form or another. They're all
 effectively grabbing the time from the system clock, so they're
 grouped on Clock.
 
 - Jonathan M Davis
Sounds like the perfect candidate for its own module.
Not out of the question, I suppose, but it would make an awfully small module and would inevitably make it that much harder for people to figure out how to get the current time. - Jonathan M Davis
Dec 20 2011
parent reply so <so so.so> writes:
On Wed, 21 Dec 2011 07:34:30 +0200, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Wednesday, December 21, 2011 06:18:59 Jakob Ovrum wrote:
 On Wednesday, 21 December 2011 at 02:10:30 UTC, Jonathan M Davis
 It's not the only place in Phobos which uses a class as a
 namespace. I believe that both std.process and
 std.windows.registry are doing the same thing.

 In this case, it nicely groups all of the functions that are
 grabbing the time in one form or another. They're all
 effectively grabbing the time from the system clock, so they're
 grouped on Clock.

 - Jonathan M Davis
Sounds like the perfect candidate for its own module.
Not out of the question, I suppose, but it would make an awfully small module and would inevitably make it that much harder for people to figure out how to get the current time. - Jonathan M Davis
Supporting module nesting in single file wouldn't hurt, would it?

module main;
module nested
{
}
Dec 21 2011
parent Michal Minich <michal.minich gmail.com> writes:
On 21. 12. 2011 14:22, so wrote:
 Supporting module nesting in single file wouldn't hurt, would it?

 module main;
 module nested
 {
 }
Kind of...

import std.stdio;

// A parameterless template acts as a named scope for its members.
template MyNamespaceImpl ()
{
    int i;
}

alias MyNamespaceImpl!() MyNamespace;

void main ()
{
    MyNamespace.i = 1;
    with (MyNamespace)
    {
        i = 2;
    }
    writeln(MyNamespace.i);
    readln();
}
Dec 21 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, December 20, 2011 21:34:30 Jonathan M Davis wrote:
 On Wednesday, December 21, 2011 06:18:59 Jakob Ovrum wrote:
 On Wednesday, 21 December 2011 at 02:10:30 UTC, Jonathan M Davis
 
 It's not the only place in Phobos which uses a class as a
 namespace. I believe that both std.process and
 std.windows.registry are doing the same thing.
 
 In this case, it nicely groups all of the functions that are
 grabbing the time in one form or another. They're all
 effectively grabbing the time from the system clock, so they're
 grouped on Clock.
 
 - Jonathan M Davis
Sounds like the perfect candidate for its own module.
Not out of the question, I suppose, but it would make an awfully small module and would inevitably make it that much harder for people to figure out how to get the current time.
Not to mention, I quite like the effect that you get with it as a class, since it's explicit that it's coming from the clock, whereas if it were a module, that wouldn't be the case. You get the same effect with std.process' Environment. When you're calling functions on it, it's explicit that you're getting information from and affecting the environment. In a way, it's like a singleton, but there's nothing to instantiate. - Jonathan M Davis
Dec 20 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Dec 16, 2011, at 1:45 PM, Andrei Alexandrescu wrote:

 On 12/16/11 3:38 PM, Trass3r wrote:
 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
 And std.stdio is supposed to be just a simple wrapper around C FILE.
In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime.
Once upon a time, a minimal D app was roughly 65K. TypeInfo has ballooned a lot since then however. It's worth considering whether you're writing a Windows or Posix app as well, since the Posix headers are far more extensive (and thus may result in far more ModuleInfo instances).
Dec 16 2011
prev sibling parent Somedude <lovelydear mailmetrash.com> writes:
On 16/12/2011 22:45, Andrei Alexandrescu wrote:
 On 12/16/11 3:38 PM, Trass3r wrote:
 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
 And std.stdio is supposed to be just a simple wrapper around C FILE.
In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime. Once we solve the static constructor issue, function-level linking should take care of pulling only the minimum needed. One interesting fact is that a lot of issues that I tended to take non-critically ("templates cause bloat", "intermodule dependencies cause bloat", "static linking creates large programs") looked a whole lot differently when I looked closer at causes and effects. Andrei
Fantastic ! :)
Dec 17 2011
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Dec 16, 2011, at 1:38 PM, Trass3r wrote:

 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
This was one of the major motivations for separating druntime from phobos. The last thing anyone wants is for something in runtime to print to the console and end up pulling in 80% of the standard library as a result.
Dec 16 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/16/11 5:08 PM, Sean Kelly wrote:
 On Dec 16, 2011, at 1:38 PM, Trass3r wrote:

 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
This was one of the major motivations for separating druntime from phobos. The last thing anyone wants is for something in runtime to print to the console and end up pulling in 80% of the standard library as a result.
Well, right now druntime itself may have become the interdependency knot it once wanted to shun :o). Commenting out all static cdtors from druntime only reduced the code size from 218KB to 200KB for a do-nothing program, so most of druntime is compulsively linked and loaded. I think we can improve things a bit there. Andrei
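For readers unfamiliar with the alternative to static cdtors discussed in this thread, the lazy-initialization pattern can be sketched roughly as below. This is an illustration under assumptions, not code from druntime or Phobos: `table`, `buildTable` and `squares` are hypothetical names.

```d
// Eager form, shown only for contrast; a static constructor like this is
// what the thread identifies as the trigger for the extra linking:
//
//     static this() { table = buildTable(); }

private int[] table;            // module-level variables are thread-local in D2

// Hypothetical helper standing in for whatever setup a module needs.
private int[] buildTable()
{
    auto t = new int[](4);
    foreach (i, ref e; t)
        e = cast(int)(i * i);
    return t;
}

// Lazy form: the table is built on first use, at the cost of one check per call.
int[] squares()
{
    if (table is null)
        table = buildTable();
    return table;
}

void main()
{
    assert(squares()[3] == 9);
    assert(squares() is squares());   // built once, then reused
}
```

Because `table` is thread-local here, each thread builds its own copy on first use; shared state would need synchronization on top of this.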
Dec 16 2011
parent reply Sean Kelly <sean invisibleduck.org> writes:
On Dec 16, 2011, at 3:16 PM, Andrei Alexandrescu wrote:

 On 12/16/11 5:08 PM, Sean Kelly wrote:
 On Dec 16, 2011, at 1:38 PM, Trass3r wrote:

 A related issue is phobos being an intermodule dependency monster.
 A simple hello world pulls in almost 30 modules!
This was one of the major motivations for separating druntime from phobos. The last thing anyone wants is for something in runtime to print to the console and end up pulling in 80% of the standard library as a result.
Well, right now druntime itself may have become the interdependency knot it once wanted to shun :o).
The first place to look would be rt/. I know there's some tool that generates dependency graphs for D. Does Descent do that?
Dec 16 2011
parent torhu <no spam.invalid> writes:
On 17.12.2011 00:34, Sean Kelly wrote:
 On Dec 16, 2011, at 3:16 PM, Andrei Alexandrescu wrote:

  On 12/16/11 5:08 PM, Sean Kelly wrote:
  On Dec 16, 2011, at 1:38 PM, Trass3r wrote:

  A related issue is phobos being an intermodule dependency monster.
  A simple hello world pulls in almost 30 modules!
This was one of the major motivations for separating druntime from phobos. The last thing anyone wants is for something in runtime to print to the console and end up pulling in 80% of the standard library as a result.
Well, right now druntime itself may have become the interdependency knot it once wanted to shun :o).
The first place to look would be rt/. I know there's some tool that generates dependency graphs for D. Does Descent do that?
Maybe this is the tool you're thinking of: http://www.shfls.org/w/d/dimple/
Dec 16 2011
prev sibling next sibling parent Richard Webb <webby beardmouse.org.uk> writes:
On 16/12/2011 18:29, Andrei Alexandrescu wrote:
 Here's a list of all files in std using static cdtors:

 std/__fileinit.d
 std/concurrency.d
 std/cpuid.d
 std/cstream.d
 std/datebase.d
 std/datetime.d
 std/encoding.d
 std/internal/math/biguintcore.d
 std/internal/math/biguintx86.d
 std/internal/processinit.d
 std/internal/windows/advapi32.d
 std/mmfile.d
 std/parallelism.d
 std/perf.d
 std/socket.d
 std/stdiobase.d
 std/uri.d
On a slightly related note: http://d.puremagic.com/issues/show_bug.cgi?id=5614 Basically, do the static constructors in __fileinit and mmfile need to exist on a (hypothetical) 64bit Windows build?
Dec 16 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Dec 16, 2011, at 10:29 AM, Andrei Alexandrescu wrote:

 But in experiments it seemed like program size would increase in sudden amounts when certain modules were included. After much investigation we figured that the following fateful causal sequence happened:

 1. Some modules define static constructors with "static this()" or "static shared this()", and/or static destructors.

 2. These constructors/destructors are linked in automatically whenever a module is included.

 3. Importing a module with a static constructor (or destructor) will generate its ModuleInfo structure, which contains static information about all module members. In particular, it keeps virtual table pointers for all classes defined inside the module.
What is gained from having class vtbls referenced by ModuleInfo? Could we put them elsewhere?
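To make the reference in question concrete, here is a small sketch of how a program can walk the per-module class lists through druntime's ModuleInfo. `Widget` and `listedInModuleInfo` are hypothetical names used only for this example. As far as I can tell, Object.factory looks classes up by name through the same lists, which is one thing the reference buys.

```d
import std.stdio;

class Widget {}   // hypothetical class, only here to have something to find

// Returns true if some ModuleInfo linked into the program lists the class.
bool listedInModuleInfo(TypeInfo_Class wanted)
{
    // druntime lets a program iterate over every ModuleInfo it contains.
    foreach (m; ModuleInfo)
    {
        foreach (c; m.localClasses)
        {
            if (c is wanted)
                return true;
        }
    }
    return false;
}

void main()
{
    writeln("Widget listed in a ModuleInfo: ", listedInModuleInfo(typeid(Widget)));
}
```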
Dec 16 2011
prev sibling next sibling parent "Martin Nowak" <dawg dawgfoto.de> writes:
On Fri, 16 Dec 2011 19:29:18 +0100, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

We'd need the linker's help to do any of this. Unreferenced symbols should be emitted using a kind of vague linkage (multiobj partly does this). I-reference-everything structures like ModuleInfo should only create weak references. This implies that localClasses might contain only part of the actual module. People can use the designated export attribute to force unused symbols to be emitted.
Dec 16 2011
prev sibling next sibling parent reply "Martin Nowak" <dawg dawgfoto.de> writes:
On Sat, 17 Dec 2011 07:09:50 +0100, Martin Nowak <dawg dawgfoto.de> wrote:

We'd need the linker to do anything of this. Unreferenced symbols should be outputted using kind of vague linkage (multiobj partly does this). I-reference-everything stuff link ModuleInfos should only create weak references. This includes that localClasses
More concretely: if we emitted weakly defined symbols (null) for whatever a ModuleInfo references, the linker would not need to open further object files to find a definition. But if another definition is linked in, it replaces the weak definition. The program would then need to skip the dummy symbols (null) at runtime.
 might contain only
 part of the actual module. People can use the designated export  
 attribute to forcefully
 output unused symbols.
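The runtime half of this idea, skipping entries that resolved to null, can be sketched in plain D as follows. This only simulates the situation with an ordinary array; it does not use real weak linkage, and `InitFn`, `initA`, `initB` and `runAll` are hypothetical names.

```d
alias InitFn = void function();

int initialized;
void initA() { ++initialized; }
void initB() { ++initialized; }

// Run every entry that was actually linked in; null marks a dummy that
// the linker never replaced with a real definition.
int runAll(const InitFn[] table)
{
    int ran = 0;
    foreach (fn; table)
    {
        if (fn is null)
            continue;
        fn();
        ++ran;
    }
    return ran;
}

void main()
{
    InitFn[] table = [&initA, null, &initB];
    assert(runAll(table) == 2);
    assert(initialized == 2);
}
```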
Dec 16 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/17/11 12:27 AM, Martin Nowak wrote:
 We'd need the linker to do anything of this. Unreferenced symbols
 should be outputted using
 kind of vague linkage (multiobj partly does this).
 I-reference-everything stuff link ModuleInfos
 should only create weak references. This includes that localClasses
More concretely: if we emitted weakly defined symbols (null) for whatever a ModuleInfo references, the linker would not need to open further object files to find a definition. But if another definition is linked in, it replaces the weak definition. The program would then need to skip the dummy symbols (null) at runtime.
I think it would be awesome to exploit weak symbols. Andrei
Dec 16 2011
prev sibling parent reply Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
16.12.2011 21:29, Andrei Alexandrescu wrote:
 Hello,


 Late last night Walter and I figured a few interesting tidbits of
 information. Allow me to give some context, discuss them, and sketch a
 few approaches for improving things.

 A while ago Walter wanted to enable function-level linking, i.e. only
 get the needed functions from a given (and presumably large) module. So
 he arranged things that a library contains many small object "files"
 (that actually are generated from a single .d file and never exist on
 disk, only inside the library file, which can be considered an archive
 like tar). Then the linker would only pick the used object "files" from
 the library and link those in. Unfortunately that didn't have nearly the
 expected impact - essentially the size of most binaries stayed the same.
 The mystery was unsolved, and Walter needed to move on to other things.

 One particularly annoying issue is that even programs that don't
 ostensibly use anything from an imported module may balloon inexplicably
 in size. Consider:

 import std.path;
 void main(){}

 This program, after stripping and all, has some 750KB in size. Removing
 the import line reduces the size to 218KB. That includes the runtime
 support, garbage collector, and such, and I'll consider it a baseline.
 (A similar but separate discussion could be focused on reducing the
 baseline size, but herein I'll consider it constant.)

 What we'd simply want is to be able to import stuff without blatantly
 paying for what we don't use. If a program imports std.path and uses no
 function from it, it should be as large as a program without the import.
 Furthermore, the increase should be incremental - using 2-3 functions
 from std.path should only increase the executable size by a little, not
 suddenly link in all code in that module.

 But in experiments it seemed like program size would increase in sudden
 amounts when certain modules were included. After much investigation we
 figured that the following fateful causal sequence happened:

 1. Some modules define static constructors with "static this()" or
 "static shared this()", and/or static destructors.

 2. These constructors/destructors are linked in automatically whenever a
 module is included.

 3. Importing a module with a static constructor (or destructor) will
 generate its ModuleInfo structure, which contains static information
 about all module members. In particular, it keeps virtual table pointers
 for all classes defined inside the module.

 4. That means generating ModuleInfo refers all virtual functions defined
 in that module, whether they're used or not.

 5. The phenomenon is transitive, e.g. even if std.path has no static
 constructors but imports std.datetime which does, a ModuleInfo is
 generated for std.path too, in addition to the one for std.datetime. So
 now classes inside std.path (if any) will be all linked in.

 6. It follows that a module that defines classes which in turn use other
 functions in other modules, and has static constructors (or includes
 other modules that do) will baloon the size of the executable suddenly.

 There are a few approaches that we can use to improve the state of affairs.

 A. On the library side, use static constructors and destructors
 sparingly inside druntime and std. We can use lazy initialization
 instead of compulsively initializing library internals. I think this is
 often a worthy thing to do in any case (dynamic libraries etc) because
 it only does work if and when work needs to be done at the small cost of
 a check upon each use.

 B. On the compiler side, we could use a similar lazy initialization
 trick to only refer class methods in the module if they're actually
 needed. I'm being vague here because I'm not sure what and how that can
 be done.

 Here's a list of all files in std using static cdtors:

 std/__fileinit.d
 std/concurrency.d
 std/cpuid.d
 std/cstream.d
 std/datebase.d
 std/datetime.d
 std/encoding.d
 std/internal/math/biguintcore.d
 std/internal/math/biguintx86.d
 std/internal/processinit.d
 std/internal/windows/advapi32.d
 std/mmfile.d
 std/parallelism.d
 std/perf.d
 std/socket.d
 std/stdiobase.d
 std/uri.d

 The majority of them don't do a lot of work and are not much used inside
 phobos, so they don't blow up the executable. The main one that could
 receive some attention is std.datetime. It has a few static ctors and a
 lot of classes. Essentially just importing std.datetime or any std
 module that transitively imports std.datetime (and there are many of
 them) ends up linking in most of Phobos and blows the size up from the
 218KB baseline to 700KB.

 Jonathan, could I impose on you to replace all static cdtors in
 std.datetime with lazy initialization? I looked through it and it
 strikes me as a reasonably simple job, but I think you'd know better
 what to do than me.

 A similar effort could be conducted to reduce or eliminate static cdtors
 from druntime. I made the experiment of commenting them all out, and that
 reduced the size of the baseline from 218KB to 200KB. This is a good
 amount, but not as dramatic as what we can get by working on std.datetime.


 Thanks,

 Andrei
Really sorry, but it sounds silly to me. It's a minor problem. Does anyone really care about a 600 KiB (3.5x) size change in an empty program? Yes, they do, but only if there are no other size increases in real programs.

Now dmd has at least a _two orders of magnitude_ file size increase. I posted that problem four months ago in "Building GtkD app on Win32 results in 111 MiB file mostly from zeroes".

An example of this bug is in the archive:
http://deoma-cmd.ru/files/other/gtkD-1.5.1-size.7z

Built version (with *.exe and *.lib files):
http://deoma-cmd.ru/files/other/gtkD-1.5.1-size-built.7z

Detailed description:
GtkD is built using a single object file (gtk-one-obj.lib) or separate (one per source file) object files (gtk-sep-obj.lib).

Then main.d, which imports gtk.Main, is built using those libraries.

Then the zeroCount utility is built and run over the resulting files:
--------------------------------------------------
Now let's calculate zero bytes counts:
--------------------------------------------------
 Zero bytes|     %|    Non-zero| Total bytes|        File
    3628311| 21.56|    13202153|    16830464|gtk-one-obj.lib
    1953124| 15.98|    10272924|    12226048|gtk-sep-obj.lib
  127968798| 99.00|     1298430|   129267228|main-one-obj.exe
     743821| 37.51|     1239183|     1983004|main-sep-obj.exe
Done.

So we have to use a very slow per-file build to produce a good (not 100 MiB) executable.
No matter which *.exe is launched, its process allocates ~20 MiB of RAM (loaded Gtk DLLs).

The second dmd issue (discovered because of the 99.00% of zeros) is that _it doesn't use the bss section_.
Let's look at a C++ program built using Microsoft's cl:
---
char arr[1024 * 1024 * 10];
void main() { }
---
It results in a ~10 KiB executable, because `arr` is initialized with zero bytes and put in the bss section. If one of its elements is set to non-zero:
---
char arr[1024 * 1024 * 10] = { 1 };
void main() { }
---
The array can't be in .bss any more, and the resulting executable grows by ~10 MiB.

The following D program results in a ~10 MiB executable:
---
ubyte[1024 * 1024 * 10] arr;
void main() { }
---
So, if there really is a reason not to use .bss, it should be clearly explained.

If the described issues aren't much more significant than "static this()", please show me where I am wrong.
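
For reference, a zero-byte count like the one in the table above could be produced by something along these lines. This is only a sketch: the actual zeroCount utility is in the linked archive and is not reproduced here, and the names `countZeros` and `zeroPercent` are made up for illustration.

```d
import std.algorithm : count;
import std.file : read;
import std.stdio : writefln;

// Count the zero bytes in a buffer.
size_t countZeros(const(ubyte)[] data)
{
    return data.count!(b => b == 0);
}

// Percentage of zero bytes; an empty buffer is reported as 0%.
double zeroPercent(size_t zeros, size_t total)
{
    return total == 0 ? 0.0 : 100.0 * zeros / total;
}

void main(string[] args)
{
    // Each command-line argument is treated as a file to scan.
    foreach (path; args[1 .. $])
    {
        auto data = cast(const(ubyte)[]) read(path);
        immutable zeros = countZeros(data);
        // Columns: zero bytes | % | non-zero | total bytes | file
        writefln("%12d|%6.2f|%12d|%12d|%s", zeros,
                 zeroPercent(zeros, data.length),
                 data.length - zeros, data.length, path);
    }
}
```

Run as, for example, `zerocount main-one-obj.exe main-sep-obj.exe` (the program name is arbitrary); it prints one row per file in the same column order as the table.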
Dec 20 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/20/11 9:00 AM, Denis Shelomovskij wrote:
 16.12.2011 21:29, Andrei Alexandrescu wrote:
[snip]
 Really sorry, but it sounds silly to me. It's a minor problem. Does
 anyone really care about a 600 KiB (3.5x) size change in an empty
 program? Yes, they do, but only if there are no other size increases in
 real programs.
In my experience, in a system programming language people do care about baseline size for one reason or another. I'd agree the reason is often overstated. But I did notice that people take a look at D and use "hello, world" size as a proxy for the language's overall overhead - runtime, handling of linking etc. You may or may not care about the conclusions of our investigation, but we and a category of people do care, for a variety of project sizes and approaches to building them.
 Now dmd has at least a _two orders of magnitude_ file size increase. I
 posted that problem four months ago in "Building GtkD app on Win32
 results in 111 MiB file mostly from zeroes".
[snip]
 ---
 char arr[1024 * 1024 * 10];
 void main() { }
 ---
[snip]
 If described issues aren't much more significant than "static this()",
 show me where am I wrong, please.
Using BSS is a nice optimization, but not all compilers do it and I know for a fact MSVC didn't have it for a long time. That's probably why I got used to thinking "poor style" when seeing a large statically-sized buffer with static duration. I'd say both issues deserve to be looked at, and saying one is more significant than the other would be difficult.

Andrei
Dec 20 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/20/2011 6:23 AM, Andrei Alexandrescu wrote:
 On 12/20/11 9:00 AM, Denis Shelomovskij wrote:
 Now dmd has at least a _two orders of magnitude_ file size increase. I
 posted that problem four months ago in "Building GtkD app on Win32
 results in 111 MiB file mostly from zeroes".
[snip]
 ---
 char arr[1024 * 1024 * 10];
 void main() { }
 ---
[snip]
 If described issues aren't much more significant than "static this()",
 show me where am I wrong, please.
Using BSS is a nice optimization, but not all compilers do it and I know for a fact MSVC didn't have it for a long time. That's probably why I got used to thinking "poor style" when seeing a large statically-sized buffer with static duration. I'd say both issues deserve to be looked at, and saying one is more significant than the other would be difficult.
First off, dmd most definitely puts 0 initialized static data into the BSS segment. So what's going on here?

1. char data is not initialized to 0, it is initialized to 0xFF. Non-zero data cannot be put in BSS.

2. Static data goes, by default, into thread local storage. BSS data is not thread local. To put it in global data, it has to be declared with __gshared.

So,

__gshared byte arr[1024 * 1024 *10];

will go into BSS.

There is pretty much no reason to have such huge arrays in static data. Instead, dynamically allocate them.
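
The two points above can be checked with a small sketch like the following. The array names and sizes are arbitrary, and which section each array lands in is as described in the post, not something this code verifies:

```d
// Default initializers decide whether a static array is all zeros.
static assert(char.init == 0xFF);  // char defaults to 0xFF, not 0
static assert(ubyte.init == 0);
static assert(byte.init == 0);

char[1024] a;             // thread-local, filled with 0xFF
ubyte[1024] b;            // thread-local, zero-filled
__gshared ubyte[1024] c;  // global and zero-filled: per the post, a BSS candidate

void main()
{
    // The run-time contents match the compile-time defaults.
    assert(a[0] == 0xFF);
    assert(b[0] == 0 && c[0] == 0);
}
```

To see where each array actually ends up, compile the module and inspect the object file (for example with obj2asm).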
Dec 20 2011
parent Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
21.12.2011 0:22, Walter Bright wrote:
 First off, dmd most definitely puts 0 initialized static data into the
 BSS segment. So what's going on here?

 1. char data is not initialized to 0, it is initialized to 0xFF.
 Non-zero data cannot be put in BSS.
Sorry, it was because of copying C code in my post. ubyte array was tested in D.
 2. Static data goes, by default, into thread local storage. BSS data is
 not thread local. To put it in global data, it has to be declared with
 __gshared.
I completely forgot about TLS.
 So,

 __gshared byte arr[1024 * 1024 *10];

 will go into BSS.

 There is pretty much no reason to have such huge arrays in static data.
 Instead, dynamically allocate them.
Of course, it was just an example of a huge executable. Now I see that dmd uses BSS, thank you for the explanation!

I still think that zero-filled TLS arrays could occupy no space in the executable, but that would need support from both the compiler and the D run-time system, and it is surely not worth the time it would take to implement.

I apologize for the unfair accusation.
Dec 21 2011
prev sibling next sibling parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
On 20.12.2011 at 16:00, Denis Shelomovskij
<verylonglogin.reg gmail.com> wrote:

 The second dmd issue (discovered because of the 99.00% of zeros) is
 that _it doesn't use the bss section_.
 Let's look at a C++ program built using Microsoft's cl:
 ---
 char arr[1024 * 1024 * 10];
 void main() { }
 ---
 It results in a ~10 KiB executable, because `arr` is initialized with zero
 bytes and put in the bss section. If one of its elements is set to non-zero:
 ---
 char arr[1024 * 1024 * 10] = { 1 };
 void main() { }
 ---
 The array can't be in .bss any more and resulting executable size will  
 be increased by adding ~10MiB. The following D program results in ~10MiB  
 executable:
 ---
 ubyte[1024 * 1024 * 10] arr;
 void main() { }
 ---
 So, if there really is a reason not to use .bss, it should be clearly  
 explained.



 If described issues aren't much more significant than "static this()",  
 show me where am I wrong, please.
+1. I didn't know about .bss, but static arrays of zeroes (global, struct, class) increasing the executable size looked like a problem wanting a solution. I hope it is easy to solve for dmd and is just an unimportant issue, so was never implemented.
Dec 20 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/20/2011 1:07 PM, Marco Leise wrote:
 +1. I didn't know about .bss, but static arrays of zeroes (global, struct,
 class) increasing the executable size looked like a problem wanting a solution.
 I hope it is easy to solve for dmd and is just an unimportant issue, so was
 never implemented.
I added a faq entry for this.
Dec 20 2011
parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
On 20.12.2011 at 22:39, Walter Bright
<newshound2 digitalmars.com> wrote:

 On 12/20/2011 1:07 PM, Marco Leise wrote:
 +1. I didn't know about .bss, but static arrays of zeroes (global,  
 struct,
 class) increasing the executable size looked like a problem wanting a  
 solution.
 I hope it is easy to solve for dmd and is just an unimportant issue, so  
 was
 never implemented.
I added a faq entry for this.
Ok, I jumped on the bandwagon too early. Personally I only had this problem with classes and structs.

struct Test {
    byte arr[1024 * 1024 *10];
}

and

class Test {
    byte arr[1024 * 1024 *10];
}

both create a 10MB executable. While for the class, init may contain more data than just that one field, I don't see the struct adding anything or going into TLS. Can these initializers also go into .bss?
Dec 20 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/20/2011 5:52 PM, Marco Leise wrote:
 Ok, I jumped on the bandwagon too early. Personally I only had this problem
 with classes and structs.

 struct Test {
 byte arr[1024 * 1024 *10];
 }

 and

 class Test {
 byte arr[1024 * 1024 *10];
 }

 both create a 10MB executable. While for the class, init may contain more data
 than just that one field, I don't see the struct adding anything or going into
 TLS. Can these initializers also go into .bss?
The struct one already does. Compile it, obj2asm it, and you'll see it there.
Dec 20 2011
parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
On 21.12.2011 at 07:11, Walter Bright
<newshound2 digitalmars.com> wrote:

 On 12/20/2011 5:52 PM, Marco Leise wrote:
 Ok, I jumped on the bandwagon too early. Personally I only had this
 problem with classes and structs.

 struct Test {
 byte arr[1024 * 1024 *10];
 }

 and

 class Test {
 byte arr[1024 * 1024 *10];
 }

 both create a 10MB executable. While for the class, init may contain  
 more data
 than just that one field, I don't see the struct adding anything or  
 going into
 TLS. Can these initializers also go into .bss?
The struct one already does. Compile it, obj2asm it, and you'll see it there.
Ah, I see it now. Sorry for the noise!
Dec 26 2011
parent reply "Marco Leise" <Marco.Leise gmx.de> writes:
On 27.12.2011 at 03:42, Marco Leise <Marco.Leise gmx.de> wrote:

 On 21.12.2011 at 07:11, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 12/20/2011 5:52 PM, Marco Leise wrote:
 Ok, I jumped on the bandwagon too early. Personally I only had this
 problem with classes and structs.

 struct Test {
 byte arr[1024 * 1024 *10];
 }

 and

 class Test {
 byte arr[1024 * 1024 *10];
 }

 both create a 10MB executable. While for the class, init may contain  
 more data
 than just that one field, I don't see the struct adding anything or  
 going into
 TLS. Can these initializers also go into .bss?
The struct one already does. Compile it, obj2asm it, and you'll see it there.
Ah, I see it now. Sorry for the noise!
It is back again! The following struct in my main module increases the executable size by 10MB with DMD 2.057:

struct Test {
    byte abcd[10 * 1024 * 1024];
}

The size does not increase with either of these declarations, which create static arrays at module level:

byte abcd[10 * 1024 * 1024];
__gshared byte abcd[10 * 1024 * 1024];
Jan 18 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/18/2012 1:43 AM, Marco Leise wrote:
 It is back again! The following struct in my main module increases the
 executable size by 10MB with DMD 2.057:

 struct Test {
 byte abcd[10 * 1024 * 1024];
 }
Compile it and obj2asm the result, and you'll see it goes into the BSS segment:

_TEXT	segment dword use32 public 'CODE'	;size is 0
_TEXT	ends
_DATA	segment para use32 public 'DATA'	;size is 12
_DATA	ends
CONST	segment para use32 public 'CONST'	;size is 0
CONST	ends
_BSS	segment para use32 public 'BSS'	;size is 10485760
_BSS	ends
FLAT	group
	extrn	_D19TypeInfo_S3foo4Test6__initZ
	public	_D3foo4Test6__initZ
FMB	segment dword use32 public 'DATA'	;size is 0
FMB	ends
FM	segment dword use32 public 'DATA'	;size is 4
FM	ends
FME	segment dword use32 public 'DATA'	;size is 0
FME	ends
	extrn	_D15TypeInfo_Struct6__vtblZ
	public	_D3foo12__ModuleInfoZ
_D19TypeInfo_S3foo4Test6__initZ	COMDAT flags=x0 attr=x10 align=x0

_TEXT	segment
	assume	CS:_TEXT
_TEXT	ends
_DATA	segment
_D3foo12__ModuleInfoZ:
	db	004h,000h,000h,0ffffff80h,000h,000h,000h,000h	;........
	db	066h,06fh,06fh,000h	;foo.
_DATA	ends
CONST	segment
CONST	ends
_BSS	segment
_BSS	ends
FMB	segment
FMB	ends
FM	segment
	dd	offset FLAT:_D3foo12__ModuleInfoZ
FM	ends
FME	segment
FME	ends
_D19TypeInfo_S3foo4Test6__initZ	comdat
	dd	offset FLAT:_D15TypeInfo_Struct6__vtblZ
	db	000h,000h,000h,000h	;....
	db	008h,000h,000h,000h	;....
	dd	offset FLAT:_D19TypeInfo_S3foo4Test6__initZ[03Ch]
	db	000h,000h,0ffffffa0h,000h,000h,000h,000h,000h	;........
	db	000h,000h,000h,000h,000h,000h,000h,000h	;........
	db	000h,000h,000h,000h,000h,000h,000h,000h	;........
	db	000h,000h,000h,000h,000h,000h,000h,000h	;........
	db	000h,000h,000h,000h,000h,000h,000h,000h	;........
	db	001h,000h,000h,000h,066h,06fh,06fh,02eh	;....foo.
	db	054h,065h,073h,074h,000h	;Test.
_D19TypeInfo_S3foo4Test6__initZ	ends
	end
-------------------------------------------------
Adding a void main(){} yields an executable of 145,948 bytes.
Jan 18 2012
next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
On 18.01.2012 at 11:18, Walter Bright
<newshound2 digitalmars.com> wrote:

 On 1/18/2012 1:43 AM, Marco Leise wrote:
 It is back again! The following struct in my main module increases the
 executable size by 10MB with DMD 2.057:

 struct Test {
 byte abcd[10 * 1024 * 1024];
 }
Compiling it and obj2asm'ing the result, and you'll see it goes into the BSS segment: [...] Adding a void main(){} yields an executable of 145,948 bytes.
Thanks for checking back. I'll have to experiment a bit to narrow this one down. It comes and goes like a ghost. I was using Linux 64-bit and the switches -O -release on a medium size code base.
Jan 18 2012
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
I tried different versions of DMD 2.057:
- compiled from sources in the release zip (Gentoo ebuild)
- using the 32-bit binaries in the release zip
- compiling the latest 32-bit version of DMD from the repository
I tried different compiler flags or no flags at all, compiled similar code  
in C++ to see if the linker is ok and tried -m32 and -m64, all to no  
avail. Then I found a solution that I can hardly imagine happening only on  
my unique snow-flake of a system ;) :

struct Test {
     __gshared byte abcd[10 * 1024 * 1024];
}

If it weren't for your own test results, I'd assume there is a small  
compiler bug in the code that decides what can go into .bss, that makes it  
look only for data explicitly flagged as __gshared, but not other  
immutable data. (Something like that anyway.)
I back-tracked the compiler code to where it either calls obj_bytes (good  
case, goes into .bss) or obj_lidata (bad case) to write the 10 MB of  
zeros. But there were so many call sites, that I figured someone with  
inside knowledge would figure it out faster.

As a side-effect of this experiment I found this combination to do funny  
things at runtime:

--------------------------------------------------

struct Test {
	byte arr1[1024 * 1024 * 10];
	__gshared byte arr2[1024 * 1024 * 10];
}

int main() {
	Test test;
	return 0;
}

--------------------------------------------------

-- Marco

On 18.01.2012 at 11:18, Walter Bright
<newshound2 digitalmars.com> wrote:

 On 1/18/2012 1:43 AM, Marco Leise wrote:
 It is back again! The following struct in my main module increases the
 executable size by 10MB with DMD 2.057:

 struct Test {
 byte abcd[10 * 1024 * 1024];
 }
Compiling it and obj2asm'ing the result, and you'll see it goes into the BSS segment: [...] Adding a void main(){} yields an executable of 145,948 bytes.
Jan 19 2012
prev sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
P.S.: I could have realized it earlier: DMD uses the Windows PE BSS  
section quite well! It is Linux where the .bss section is not used! I'll  
file a bug report about this after lunch and look forward to smaller  
executables under Linux any time soon :D
Jan 19 2012
prev sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tuesday, 20 December 2011 at 14:01:04 UTC, Denis Shelomovskij 
wrote:
 Detailed description:
 GtkD is built using a single object file (gtk-one-obj.lib) or separate
 (one per source file) object files (gtk-sep-obj.lib).

 Then main.d, which imports gtk.Main, is built using those
 libraries.

 Then the zeroCount utility is built and run over the resulting files:
 --------------------------------------------------
 Now let's calculate zero bytes counts:
 --------------------------------------------------
  Zero bytes|     %|    Non-zero| Total bytes|        File
     3628311| 21.56|    13202153|    16830464|gtk-one-obj.lib
     1953124| 15.98|    10272924|    12226048|gtk-sep-obj.lib
   127968798| 99.00|     1298430|   129267228|main-one-obj.exe
      743821| 37.51|     1239183|     1983004|main-sep-obj.exe
 Done.

 So we have to use very slow per-file build to produce a good 
 (not 100 MiB) executable.
 No matter what *.exe is launched, its process allocates ~20MiB 
 of RAM (loaded Gtk dll-s).
I believe this is bug 2254:
http://d.puremagic.com/issues/show_bug.cgi?id=2254

The cause is the way DMD builds libraries. The old way of building libraries (using a librarian) does not create libraries that exhibit this problem when linked with an executable.
Dec 20 2011