www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Suggestion - use local imports in Phobos

reply "Idan Arye" <GenericNPC gmail.com> writes:
One of Phobos' design goals was internal modularity. As Walter 
mentioned in another 
thread(http://forum.dlang.org/thread/kojnq5$2qst$1 digitalmars.com#post-kok6q3:24jen:241
40digitalmars.com), 
this goal has been greatly compromised. A long list of imports is 
a common sight in the heads of Phobos' source files. This, 
ofcourse, is done for a good reason - we all know the benefits of 
DRY.

The advantages for internal modularity, as written in the Phobos 
readme(=`index.d`), are:

  1) "It's dis­cour­ag­ing to pull in a megabyte of code bloat by 
just try­ing to read a file into an array of bytes."
  2) "Class in­de­pen­dence also means that classes that turn out 
to be mis­takes can be dep­re­cated and re­designed with­out 
forc­ing a rewrite of the rest of the class li­brary."

The second advantage comes in direct conflict with the DRY 
principle - if Foo is to enjoy bug fixes in Bar, Foo must also 
suffer from breaking changes in Bar. Also a change that breaks 
library code will probably break user code as well, so those 
changes are discouraged anyways.

As for the first advantage, I believe it can be achieved with 
local imports.

Many modules import other modules solely for usage in unit tests. 
Those imports are redundant if you don't unit-test Phobos - and 
most projects written in D don't run the standard library unit 
tests. If those imports were local to the unit test, they 
wouldn't be imported in outside code.

Also, a huge portion of Phobos is written in templates. If an 
import is local to a template, and the template is not 
instantiated, then the module is not imported.

Due to these characteristics of Phobos, I believe making the 
imports local to the unit tests and templates that use them will 
reduce the number of imports the compiler has to do.


Another advantage of making the imports local(whenever possible) 
is that it would make it easier to remove imports. Currently, if 
a change to an implementation in Phobos removes it dependency on 
a module, you can't remove them module's name from the import 
list, because maybe some other part of this module needs that 
import. If that import was local to the implementation that used 
it, you could remove the import safely, knowing that if it is 
needed somewhere else, it is imported locally there.


One big disadvantage of this suggestion is that implementing it 
is a tedious job - like I said, Phobos has many templates, so the 
compiler won't alert us about a missing module unless we 
instantiate the template that needs it - and some templates can 
have multiple "instantiation paths", that some of them might not 
use the module!

So, we will have to scan the source files manually to determine 
which section requires which modules. Luckily, this change is not 
needed to be done at once, since it is not a breaking change, and 
should not affect user code(though it will make Phobos pull 
requests harder to merge).


Also, I'm not really familiar with the internals of dmd - how 
much impact will importing the same module many times have on the 
compilation performance? Will it be more or less than what we 
save by reducing the number of imported modules?


Opinions?
Jun 04 2013
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jun 04, 2013 at 08:59:15PM +0200, Idan Arye wrote:
[...]
 Many modules import other modules solely for usage in unit tests.
 Those imports are redundant if you don't unit-test Phobos - and most
 projects written in D don't run the standard library unit tests. If
 those imports were local to the unit test, they wouldn't be imported
 in outside code.

AFAIK, Phobos code wraps such imports in version(unittest) blocks, so this shouldn't be a problem. If there are stray imports that aren't wrapped and are only needed for unittests, you should file a bug. [...]
 Due to these characteristics of Phobos, I believe making the imports
 local to the unit tests and templates that use them will reduce the
 number of imports the compiler has to do.

This breaks DRY because some imports are used by multiple unittests. As long as the imports are wrapped in version(unittest) blocks, I don't see this as a problem.
 Another advantage of making the imports local(whenever possible) is
 that it would make it easier to remove imports. Currently, if a change
 to an implementation in Phobos removes it dependency on a module, you
 can't remove them module's name from the import list, because maybe
 some other part of this module needs that import. If that import was
 local to the implementation that used it, you could remove the import
 safely, knowing that if it is needed somewhere else, it is imported
 locally there.

This is a good point, not just for Phobos, but for D code in general. [...]
 Also, I'm not really familiar with the internals of dmd - how much
 impact will importing the same module many times have on the
 compilation performance? Will it be more or less than what we save
 by reducing the number of imported modules?

I may be wrong, but I seem to recall hearing/seeing somewhere that a repeated import is just ignored by DMD, since it has already loaded the imported module. If not, then DMD should be fixed to do that. :) T -- Doubt is a self-fulfilling prophecy.
Jun 04 2013
prev sibling next sibling parent "Idan Arye" <GenericNPC gmail.com> writes:
On Tuesday, 4 June 2013 at 19:20:26 UTC, H. S. Teoh wrote:
 On Tue, Jun 04, 2013 at 08:59:15PM +0200, Idan Arye wrote:
 [...]
 Due to these characteristics of Phobos, I believe making the 
 imports
 local to the unit tests and templates that use them will 
 reduce the
 number of imports the compiler has to do.

This breaks DRY because some imports are used by multiple unittests. As long as the imports are wrapped in version(unittest) blocks, I don't see this as a problem.

That's like saying that defining a local `i` variable for using in a `for` loop breaks DRY, because you could have defined it once at global scope. Importing the same module over and over does not break DRY, just like calling the same function in multiple places does not break DRY. Breaking DRY means to write the internal code of that function in several places - or implementing the same things in several modules.
 [...]
 Also, I'm not really familiar with the internals of dmd - how 
 much
 impact will importing the same module many times have on the
 compilation performance? Will it be more or less than what we 
 save
 by reducing the number of imported modules?

I may be wrong, but I seem to recall hearing/seeing somewhere that a repeated import is just ignored by DMD, since it has already loaded the imported module. If not, then DMD should be fixed to do that. :)

I'm pretty sure dmd ignores already loaded modules - if it didn't, trying to compile anything that uses Phobos would enter an infinite loop/recursion - and I'm not sure about ignoring already imported modules, but it probably does that too. Not doing it would have a major impact on compilation performance, considering that most Phobos modules import many of the other modules. The problem is that ignoring an import, while cheaper than doing the import and much much much cheaper than loading the module, is not a free action. Before the compiler decides to ignore an import, it needs to perform a lookup to check that it was already imported. If an import is local to a template, this lookup will be performed every time the template is instantiated - and if the lookup is not fast enough, this can add up to harm performance. I believe that lookup is pretty fast, but as I'm not familiar with the dmd implementation I can't know this for sure, so I thought it would be better to bring it up.
Jun 04 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-06-04 20:59, Idan Arye wrote:
 One of Phobos' design goals was internal modularity. As Walter mentioned
 in another
 thread(http://forum.dlang.org/thread/kojnq5$2qst$1 digitalmars.com#post-kok6q3:24jen:241:40digitalmars.com),
 this goal has been greatly compromised. A long list of imports is a
 common sight in the heads of Phobos' source files. This, ofcourse, is
 done for a good reason - we all know the benefits of DRY.

Tango contains some duplicated code just to avoid dependencies between modules. -- /Jacob Carlborg
Jun 04 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/4/13 5:13 PM, Jacob Carlborg wrote:
 On 2013-06-04 20:59, Idan Arye wrote:
 One of Phobos' design goals was internal modularity. As Walter mentioned
 in another
 thread(http://forum.dlang.org/thread/kojnq5$2qst$1 digitalmars.com#post-kok6q3:24jen:241:40digitalmars.com),

 this goal has been greatly compromised. A long list of imports is a
 common sight in the heads of Phobos' source files. This, ofcourse, is
 done for a good reason - we all know the benefits of DRY.

Tango contains some duplicated code just to avoid dependencies between modules.

One question would be where the right balance is, and how to make sure we strike it. Andrei
Jun 04 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-06-05 00:43, Andrei Alexandrescu wrote:

 One question would be where the right balance is, and how to make sure
 we strike it.

Yes, exactly. The duplication in Tango is usually some minor array function or trait which is used internally by the module. Usually it's just enough for what the module needs and not necessarily a direct copy of some other function. Example, the tango.net.http.HttpCookies module contains a toLower function for ASCII. That's a small utility function, just four lines long. -- /Jacob Carlborg
Jun 05 2013
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/4/13, Idan Arye <GenericNPC gmail.com> wrote:
   1) "It's dis=ADcour=ADag=ADing to pull in a megabyte of code bloat by
 just try=ADing to read a file into an array of bytes."

Isn't this the linker's job? If nothing in an import is used, I would expect no code to be linked into the exectuable (well, except maybe module constructors..).
Jun 04 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 04 Jun 2013 18:44:21 -0400, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 On 6/4/13, Idan Arye <GenericNPC gmail.com> wrote:
   1) "It's discouraging to pull in a megabyte of code bloat by
 just trying to read a file into an array of bytes."

Isn't this the linker's job? If nothing in an import is used, I would expect no code to be linked into the exectuable (well, except maybe module constructors..).

If the module contains static ctors, those may thwart any efforts by the linker. -Steve
Jun 04 2013
prev sibling next sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Tuesday, 4 June 2013 at 22:43:12 UTC, Andrei Alexandrescu 
wrote:
 One question would be where the right balance is, and how to 
 make sure we strike it.

 Andrei

I would say the principle of least surprise could be used here. i.e you may not be totally surprised to have a dependency upon the gzip module in an http module, but you don't really expect to have a dependency upon LevenshteinDistance, for instance.
Jun 06 2013
prev sibling next sibling parent "anonymous" <anonymous example.com> writes:
On Tuesday, 4 June 2013 at 18:59:17 UTC, Idan Arye wrote:
[...]
 Opinions?

Issue 7016 blocks this.
Jun 06 2013
prev sibling next sibling parent "Tyler Jameson Little" <beatgammit gmail.com> writes:
 Due to these characteristics of Phobos, I believe making the 
 imports
 local to the unit tests and templates that use them will 
 reduce the
 number of imports the compiler has to do.

This breaks DRY because some imports are used by multiple unittests. As long as the imports are wrapped in version(unittest) blocks, I don't see this as a problem.

That's like saying that defining a local `i` variable for using in a `for` loop breaks DRY, because you could have defined it once at global scope. Importing the same module over and over does not break DRY, just like calling the same function in multiple places does not break DRY. Breaking DRY means to write the internal code of that function in several places - or implementing the same things in several modules.

I completely agree. I write a lot of Go code, and in Go, unused imports are a compile-time error, which is a pretty nice feature because you know what the dependencies are for a package. In D, this is not the case (though I would like it to be a warning at least, but that's another issue...), but at least Phobos can declare which imports are needed, and where they are needed. For example, if a unittest requires std.datetime, an import is added to the top of the file. Then later, if this requirement is removed, the import remains because it is not immediately obvious if some other unittest needs it. Sooner or later, the imports stack up and it's hard to tell which imports are required. If unittest dependencies are localized, they can be removed from the tests that do not need it. I think this would simplify the task of removing inter-dependencies later if this ever becomes a priority. This is already what I do in my own D code, and for this very reason. Just my 2c.
Jun 06 2013
prev sibling parent "Tyler Jameson Little" <beatgammit gmail.com> writes:
 Tango contains some duplicated code just to avoid dependencies 
 between modules.

FWIW, so does Go's standard library. For small pieces of code, I think this is reasonable.
Jun 06 2013