www.digitalmars.com

digitalmars.D - Please integrate build framework into the compiler

reply davidl <davidl 126.com> writes:
1. The compiler knows in which situations a file needs to be recompiled.

Consider a file whose header file is unchanged: its object file is still
required for linking, but no other file that imports it needs
recompilation in this case. If a file's header file changes, and thus
its interface changes, every file that imports it should be
recompiled.
The compiler could emit build commands the way rebuild does.

I would enjoy:

dmd -buildingcommand abc.d  > responsefile

dmd  responsefile

I think we need to eliminate useless recompilation as much as we can,
given the growing size of D projects.

2. Maintaining the build without compiler support is costly.
Mar 21 2009
next sibling parent reply grauzone <none example.net> writes:
I don't really understand what you mean. But if you want the compiler to 
scan for dependencies, I fully agree.

I claim that we don't even need incremental compilation. It would be 
better if the compiler would scan for dependencies, and if a source file 
has changed, recompile the whole project in one go. This would be simple 
and efficient.

Here are some arguments that speak for this approach:

- A full compiler is the only piece of software that can build a 
correct/complete module dependency graph. This is because you need full 
semantic analysis to catch all import statements. For example, you can 
use a string mixin to generate import statements: mixin("import bla;"). 
No naive dependency scanner would be able to detect this import. You 
need CTFE capabilities, which require almost a full compiler. (Actually, 
dsss uses the dmd frontend for dependency scanning.)

- Speed. Incremental compilation is godawfully slow (10 times slower 
than compiling all files in one dmd invocation). You could pass all 
changed files to dmd at once, but this is broken and often causes linker 
errors (ask the dsss author for details lol). Recompiling the whole 
thing every time is faster.

- Long dependency chains. Unlike in C/C++, you can't separate a module 
into interface and implementation. Compared to C++, it's as if a change 
to one .c file triggers recompilation of a _lot_ of other .c files. This 
makes incremental compilation really look useless. Unless you move 
modules into libraries and use them through .di files.
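The first argument above, that only a full frontend can see mixin-generated imports, is easy to illustrate with a toy sketch. This is Python standing in for a D frontend; the module names and sources are invented, and the "semantic" scanner only fakes CTFE with a second textual pass:

```python
import re

# Hypothetical D sources: module name -> source text.
sources = {
    "app":  'import util;\nmixin("import bla;");\nvoid main() {}',
    "util": 'void helper() {}',
    "bla":  'void hidden() {}',
}

def naive_imports(src):
    """A naive textual scanner: only sees literal 'import x;' statements."""
    return set(re.findall(r'^\s*import\s+([\w.]+)\s*;', src, re.MULTILINE))

def semantic_imports(src):
    """Stand-in for a full frontend: also looks inside string mixins,
    which in real D would require CTFE."""
    deps = naive_imports(src)
    for mixed in re.findall(r'mixin\("([^"]*)"\);', src):
        deps |= naive_imports(mixed)
    return deps

print(naive_imports(sources["app"]))     # misses 'bla' entirely
print(semantic_imports(sources["app"]))  # sees both 'util' and 'bla'
```

A real frontend evaluates arbitrary CTFE code to produce the mixin string, which is why no regex-based tool can ever be complete here.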

I would even go so far as to say that dmd should automatically follow all 
imports and compile them in one go. This would be faster than having a 
separate responsefile step, because the source code needs to be analyzed 
only once. To prevent compilation of imported library headers, the 
compiler could provide a new include switch for library code. Modules 
inside "library" include paths wouldn't be compiled.

Hell, maybe I'll even manage to come up with a compiler patch, to turn 
this into reality.
Mar 21 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 I don't really understand what you mean. But if you want the compiler to 
 scan for dependencies, I fully agree.
 
 I claim that we don't even need incremental compilation. It would be 
 better if the compiler would scan for dependencies, and if a source file 
 has changed, recompile the whole project in one go. This would be simple 
 and efficient.

That's precisely what rdmd does. Andrei
Mar 21 2009
parent reply grauzone <none example.net> writes:
Andrei Alexandrescu wrote:
 grauzone wrote:
 I don't really understand what you mean. But if you want the compiler 
 to scan for dependencies, I fully agree.

 I claim that we don't even need incremental compilation. It would be 
 better if the compiler would scan for dependencies, and if a source 
 file has changed, recompile the whole project in one go. This would be 
 simple and efficient.

That's precisely what rdmd does.

This looks really good, but I couldn't get it to work. Am I doing something wrong?

--- o.d:
module o;
import tango.io.Stdout;
void k() { Stdout("foo").newline; }

--- u.d:
module u;
import o;
void main() { k(); }

$ rdmd u.d
/tmp/u-1000-20-49158160-A46C236CDE107E3B9F053881E4257C2D.o:(.data+0x38): undefined reference to `_D1o12__ModuleInfoZ'
/tmp/u-1000-20-49158160-A46C236CDE107E3B9F053881E4257C2D.o: In function `_Dmain':
u.d:(.text._Dmain+0x4): undefined reference to `_D1o1kFZv'
collect2: ld returned 1 exit status
--- errorlevel 1
rdmd: Couldn't compile or execute u.d.

$ dmd|grep Compiler
Digital Mars D Compiler v1.041
 Andrei

Mar 21 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 Andrei Alexandrescu wrote:
 grauzone wrote:
 I don't really understand what you mean. But if you want the compiler 
 to scan for dependencies, I fully agree.

 I claim that we don't even need incremental compilation. It would be 
 better if the compiler would scan for dependencies, and if a source 
 file has changed, recompile the whole project in one go. This would 
 be simple and efficient.

That's precisely what rdmd does.

 This looks really good, but I couldn't get it to work. Am I doing something wrong?

 --- o.d:
 module o;
 import tango.io.Stdout;
 void k() { Stdout("foo").newline; }

 --- u.d:
 module u;
 import o;
 void main() { k(); }

 $ rdmd u.d
 /tmp/u-1000-20-49158160-A46C236CDE107E3B9F053881E4257C2D.o:(.data+0x38): undefined reference to `_D1o12__ModuleInfoZ'
 /tmp/u-1000-20-49158160-A46C236CDE107E3B9F053881E4257C2D.o: In function `_Dmain':
 u.d:(.text._Dmain+0x4): undefined reference to `_D1o1kFZv'
 collect2: ld returned 1 exit status
 --- errorlevel 1
 rdmd: Couldn't compile or execute u.d.

 $ dmd|grep Compiler
 Digital Mars D Compiler v1.041

Should work, but I tested only with D2. You may want to pass --chatty to rdmd and see what commands it invokes. Andrei
Mar 21 2009
parent reply grauzone <none example.net> writes:
My rdmd doesn't know --chatty. Probably the zip file for dmd 1.041 
contains an outdated, buggy version. Where can I find the up-to-date 
source code?

Another question, rdmd just calls dmd, right? How does it scan for 
dependencies, or is this step actually done by dmd itself?
Mar 21 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 My rdmd doesn't know --chatty. Probably the zip file for dmd 1.041 
 contains an outdated, buggy version. Where can I find the up-to-date 
 source code?

Hold off on that for now.
 Another question, rdmd just calls dmd, right? How does it scan for 
 dependencies, or is this step actually done by dmd itself?

rdmd invokes dmd -v to get deps. It's an interesting idea to add a compilation mode to rdmd that asks dmd to generate headers and diff them against the old headers. That way we can implement incremental rebuilds without changing the compiler. Andrei
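The header-diffing scheme described here can be sketched as a small decision procedure. This is a Python stand-in, not rdmd's actual code; the module names, data structures, and header strings are all made up, and it only handles one level of importers (a real tool would propagate the check transitively):

```python
import hashlib

def digest(text):
    # Fingerprint of a generated header (.di) file's contents.
    return hashlib.sha256(text.encode()).hexdigest()

def modules_to_recompile(changed, new_headers, old_digests, importers):
    """changed: modules whose .d source changed.
    new_headers: module -> freshly generated header text.
    old_digests: module -> digest of the previously cached header.
    importers: module -> set of modules that import it."""
    dirty = set(changed)                        # changed sources always rebuild
    for mod in changed:
        if digest(new_headers[mod]) != old_digests.get(mod):
            dirty |= importers.get(mod, set())  # interface changed: rebuild users
    return dirty

old = {"o": digest("void k();")}
importers = {"o": {"u"}}
# Body-only change: the header is identical, so 'u' is left alone.
print(modules_to_recompile({"o"}, {"o": "void k();"}, old, importers))
# Signature change: the header differs, so 'u' must be rebuilt too.
print(modules_to_recompile({"o"}, {"o": "void k(int);"}, old, importers))
```

The point of diffing headers rather than sources is exactly the body-only case: an implementation tweak never touches the generated interface, so dependents are skipped.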
Mar 21 2009
next sibling parent Ary Borenszweig <ary esperanto.org.ar> writes:
davidl wrote:
 On Sun, 22 Mar 2009 12:18:03 +0800, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 grauzone wrote:
 My rdmd doesn't know --chatty. Probably the zip file for dmd 1.041 
 contains an outdated, buggy version. Where can I find the up-to-date 
 source code?

Hold off on that for now.
 Another question, rdmd just calls dmd, right? How does it scan for 
 dependencies, or is this step actually done by dmd itself?

rdmd invokes dmd -v to get deps. It's an interesting idea to add a compilation mode to rdmd that asks dmd to generate headers and diff them against the old headers. That way we can implement incremental rebuilds without changing the compiler. Andrei

The bad news is that public imports ruin the simplicity of dependencies, though in most cases D projects use private imports. Maybe we can further restrict public imports.

Yes. They could give a compile-time error... always. ;-)
Mar 22 2009
prev sibling parent reply grauzone <none example.net> writes:
Andrei Alexandrescu wrote:
 grauzone wrote:
 My rdmd doesn't know --chatty. Probably the zip file for dmd 1.041 
 contains an outdated, buggy version. Where can I find the up-to-date 
 source code?

Hold off on that for now.
 Another question, rdmd just calls dmd, right? How does it scan for 
 dependencies, or is this step actually done by dmd itself?

rdmd invokes dmd -v to get deps. It's an interesting idea to add a compilation mode to rdmd that asks dmd to generate headers and diff them against the old headers. That way we can implement incremental rebuilds without changing the compiler.

Is this just an "interesting idea", or are you actually considering implementing it? Anyway, maybe you could pressure Walter to fix that dmd bug that stops dsss from being efficient. I can't advertise this enough.
 Andrei

Mar 23 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 Andrei Alexandrescu wrote:
 grauzone wrote:
 My rdmd doesn't know --chatty. Probably the zip file for dmd 1.041 
 contains an outdated, buggy version. Where can I find the up-to-date 
 source code?

Hold off on that for now.
 Another question, rdmd just calls dmd, right? How does it scan for 
 dependencies, or is this step actually done by dmd itself?

rdmd invokes dmd -v to get deps. It's an interesting idea to add a compilation mode to rdmd that asks dmd to generate headers and diff them against the old headers. That way we can implement incremental rebuilds without changing the compiler.

Is this just an "interesting idea", or are you actually considering implementing it?

I would if there were a compelling case made in favor of it. Andrei
Mar 23 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from grauzone (none example.net)'s article
 I claim that we don't even need incremental compilation. It would be
 better if the compiler would scan for dependencies, and if a source file
 has changed, recompile the whole project in one go. This would be simple
 and efficient.

I'm surprised that this could possibly be more efficient than incremental compilation, but I've never worked on a project large enough for compile times to be a major issue, so I've never really looked into this.

If incremental compilation were removed from the spec, meaning the compiler would always know about the whole program when compiling, I assume (correct me if I'm wrong) that the following restrictions could be removed:

1. std.traits could offer a way to get a tuple of all derived classes, essentially the opposite of BaseTypeType.
2. Since DMD would know about all derived classes when compiling the base class, it would be feasible to allow templates to add virtual functions to classes. IMHO, this would be an absolute godsend, as it is currently a _huge_ limitation of templates.
3. For the same reason, method calls to classes with no derived classes could be made directly instead of through the vtable.

Of course, these restrictions would still apply to libraries that use .di files. If incremental compilation is actually causing more problems than it solves anyhow, it would be great to get rid of it along with the annoying restrictions it creates.
Mar 21 2009
next sibling parent grauzone <none example.net> writes:
dsimcha wrote:
 == Quote from grauzone (none example.net)'s article
 I claim that we don't even need incremental compilation. It would be
 better if the compiler would scan for dependencies, and if a source file
 has changed, recompile the whole project in one go. This would be simple
 and efficient.

I'm surprised that this could possibly be more efficient than incremental compilation, but I've never worked on a project large enough for compile times to be a major issue, so I've never really looked into this.

Maybe incremental compilation could be faster, but dmd has a bug that forces tools like dsss/rebuild to use a slower method. Instead of invoking the compiler once to recompile all modules that depend on changed files, it has to start a new compiler process for each file.
 If incremental compilation were removed from the spec, meaning the compiler
would
 always know about the whole program when compiling, I assume (correct me if I'm
 wrong) that would mean the following restrictions could be removed:
 
 1.  std.traits could offer a way to get a tuple of all derived classes,
 essentially the opposite of BaseTypeType.
 2.  Since DMD would know about all derived classes when compiling the base
class,
 it would be feasible to allow templates to add virtual functions to classes.
 IMHO, this would be an absolute godsend, as it is currently a _huge_
limitation of
 templates.
 3.  For the same reason, methods calls to classes with no derived classes
could be
 made directly instead of through the vtable.

And you could do all kinds of interprocedural optimizations.
 Of course, these restrictions would still apply to libraries that use .di
files.
 If incremental compilation is actually causing more problems than it solves
 anyhow, it would be great to get rid of it along with the annoying
restrictions it
 creates.

It seems Microsoft thought the same: C# goes without incremental compilation. But for now, D's build model is too similar to C/C++'s for that ability to be removed completely.
Mar 21 2009
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
dsimcha wrote:
 1.  std.traits could offer a way to get a tuple of all derived classes,
 essentially the opposite of BaseTypeType.
 2.  Since DMD would know about all derived classes when compiling the base
class,
 it would be feasible to allow templates to add virtual functions to classes.
 IMHO, this would be an absolute godsend, as it is currently a _huge_
limitation of
 templates.
 3.  For the same reason, methods calls to classes with no derived classes
could be
 made directly instead of through the vtable.

This is only if there is no dynamic linking.
Mar 21 2009
prev sibling next sibling parent BCS <none anon.com> writes:
Hello grauzone,

 I would even go so far to say, that dmd should automatically follow
 all imports and compile them in one go. This would be faster than
 having a separate responsefile step, because the source code needs to
 be analyzed only once. To prevent compilation of imported library
 headers, the compiler could provide a new include switch for library
 code. Modules inside "library" include paths wouldn't be compiled.

Adding that without a way to turn it off would kill D in some cases. I have a project where DMD uses up >30% of the available address space compiling one module. If I were forced to compile all modules at once, it might not work, end of story. That said, for many cases, I don't see a problem with having that feature available.
Mar 21 2009
prev sibling next sibling parent Christopher Wright <dhasenan gmail.com> writes:
grauzone wrote:
 - Long dependency chains. Unlike in C/C++, you can't separate a module 
 into interface and implementation. Compared to C++, it's as if a change 
 to one .c file triggers recompilation of a _lot_ of other .c files. This 
 makes incremental compilation really look useless. Unless you move 
 modules into libraries and use them through .di files.

You can use interfaces for this, though that is not always possible.
Mar 21 2009
prev sibling next sibling parent reply davidl <davidl 126.com> writes:
On Sun, 22 Mar 2009 04:19:31 +0800, grauzone <none example.net> wrote:

 I don't really understand what you mean. But if you want the compiler to  
 scan for dependencies, I fully agree.

 I claim that we don't even need incremental compilation. It would be  
 better if the compiler would scan for dependencies, and if a source file  
 has changed, recompile the whole project in one go. This would be simple  
 and efficient.

This may not be true. Consider the dwt lib case: once you have tweaked a module very little (meaning you do not modify any interface that connects with outside modules, or code that could possibly affect modules in the same package), the optimal way is:

dmd -c your_tweaked_module
link all_obj

That's much faster than regenerating all other object files. Yes, feeding them all to DMD compiles really fast, but writing all the object files to disk costs much time.

And your impression of incremental compilation seems to be misguided by the rebuild and dsss system. Rebuild takes no advantage of di files, thus it has to recompile every time, even when the module's dependencies on all other di files are unchanged.

I posted several blocking header generation bugs in DMD, with fixes. With just those small changes, dmd can generate almost all header files correctly. I tested tango, dwt, and dwt-addons. Those projects are very big, and some make advanced use of templates. So the header generation building strategy is really not far away.

Little self-promotion here, and in case Walter misses some of them:
http://d.puremagic.com/issues/show_bug.cgi?id=2744
http://d.puremagic.com/issues/show_bug.cgi?id=2745
http://d.puremagic.com/issues/show_bug.cgi?id=2747
http://d.puremagic.com/issues/show_bug.cgi?id=2748
http://d.puremagic.com/issues/show_bug.cgi?id=2751

In C++, a sophisticated makefile carefully builds the .h dependencies of .c files; once .h files are updated, the .c files based on them need to be recompiled. The same detection can be made by comparing old .di files and new .di files for equality.
Mar 21 2009
parent grauzone <none example.net> writes:
 Little self-promotion here, and in case Walter misses some of them:
 http://d.puremagic.com/issues/show_bug.cgi?id=2744
 http://d.puremagic.com/issues/show_bug.cgi?id=2745
 http://d.puremagic.com/issues/show_bug.cgi?id=2747
 http://d.puremagic.com/issues/show_bug.cgi?id=2748
 http://d.puremagic.com/issues/show_bug.cgi?id=2751

If it's about bugs, it would (probably) be easier for Walter to fix that code generation bug that forces dsss/rebuild to invoke a new dmd process to recompile each outdated file separately. This would bring a critical speedup for incremental compilation (from absolutely useless to relatively useful), and all impatient D users with middle-sized source bases could be happy.
 In c++, a sophisticated makefile carefully build .h dependencies of .c 
 files. Thus, once .h files are updated, then .c files which are based on 
 them need to be recompile. This detection can be made by comparison of 
 old .di files and new .di files by testing their equality.

This sounds like a really nice idea, but it's also quite complex. For example, to guarantee correctness, the D compiler would _always_ have to read the .di file when importing a module (and not the .d file directly). If it didn't do that, it could "accidentally" use information that isn't included in the .di file (like function bodies when doing inlining). This means you'd have to generate the .di files first. When doing this, you'd also have to deal with circular dependencies, which will bring extra headaches. And of course, you'd need to fix all those .di generation bugs.

It's actually a bit scary that the compiler not only has to be able to parse D code, but also to output D source code again. And .di files are not even standardized. It's perhaps messy enough to deem it unrealistic. Still, a nice idea.
Mar 21 2009
prev sibling next sibling parent davidl <davidl 126.com> writes:
On Sun, 22 Mar 2009 12:18:03 +0800, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 grauzone wrote:
 My rdmd doesn't know --chatty. Probably the zip file for dmd 1.041  
 contains an outdated, buggy version. Where can I find the up-to-date  
 source code?

Hold off on that for now.
 Another question, rdmd just calls dmd, right? How does it scan for  
 dependencies, or is this step actually done by dmd itself?

rdmd invokes dmd -v to get deps. It's an interesting idea to add a compilation mode to rdmd that asks dmd to generate headers and diff them against the old headers. That way we can implement incremental rebuilds without changing the compiler. Andrei

The bad news is that public imports ruin the simplicity of dependencies, though in most cases D projects use private imports. Maybe we can further restrict public imports.

I suggest we add a new module style of interfacing. Public imports are only allowed in those modules, and an interface module can only have public imports. Example:

all.d:
module(interface) all;
public import blah;
public import blah.foo;

An interface module cannot import another interface module, thus no public import chain can be created. The shortcoming of it is:

module(interface) subpack.all;
public import subpack.mod;

module(interface) all;
public import subpack.mod;  // duplication here.
public import subpack1.mod1;
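The way public imports complicate dependency tracking can be modeled with a small sketch (Python, invented module names): a public import re-exports the imported interface, so an interface change propagates through every chain of public imports, while a private import stops after one hop:

```python
def dependents(changed, imports, public_imports):
    """Modules needing recompilation when `changed`'s interface changes.
    imports: importer -> set of modules it imports (public or private).
    public_imports: importer -> subset of its imports that are re-exported."""
    # First find every module whose own interface "carries" the change,
    # i.e. the transitive closure over public-import edges.
    exposes = {changed}
    grew = True
    while grew:
        grew = False
        for mod, pubs in public_imports.items():
            if mod not in exposes and pubs & exposes:
                exposes.add(mod)
                grew = True
    # Anything importing one of those modules must be recompiled.
    return {mod for mod, imps in imports.items() if imps & exposes}

imports = {"all": {"blah"}, "app": {"all"}}
public_imports = {"all": {"blah"}}   # 'all' re-exports 'blah'
# 'app' never mentions 'blah', yet a change to blah's interface reaches it.
print(dependents("blah", imports, public_imports))
```

With only private imports the `exposes` set never grows, which is why the post above calls private-import dependencies "simple".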
Mar 21 2009
prev sibling next sibling parent reply "Kristian Kilpi" <kjkilpi gmail.com> writes:
On Sat, 21 Mar 2009 22:19:31 +0200, grauzone <none example.net> wrote:
 I don't really understand what you mean. But if you want the compiler to  
 scan for dependencies, I fully agree.

 I claim that we don't even need incremental compilation. It would be  
 better if the compiler would scan for dependencies, and if a source file  
 has changed, recompile the whole project in one go. This would be simple  
 and efficient.

Well, why not get rid of the imports altogether... Ok, that would not be feasible because of the way compilers (D, C++, etc) are built nowadays.

I find adding #includes/imports laborious. (Is this component already #included/imported? Where's that class defined? Did I forget something?) And when you modify or refactor the file, you have to update the #includes/imports accordingly... (In case of modification/refactoring) the easiest way is just to compile the file and see if there are errors... Of course, that approach will not help to remove the unnecessary #includes/imports.

So, sometimes (usually?) I give up, create one huge #include/import file that #includes/imports all the stuff, and use that instead. Efficient? Pretty? No. Easy? Simple? Yes.

#includes/imports are redundant information: the source code of course describes what's used in it. So, the compiler could be aware of the whole project (and the libraries used) instead of one file at a time.
Mar 22 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Kristian Kilpi wrote:
 #includes/imports are redundant information: the source code of course 
 describes what's used in it. So, the compiler could be aware of the 
 whole project (and the libraries used) instead of one file at the time.

That's not sufficient. I'm using SDL right now; if I type 'Surface s;', should that import sdl.surface or cairo.Surface? How is the compiler to tell? How should the compiler find out where to look for classes named Surface? Should it scan everything under /usr/local/include/d/? That's going to be pointlessly expensive.
Mar 22 2009
next sibling parent dennis luehring <dl.soluz gmx.net> writes:
 Such things should of course be told to the compiler somehow. By using
 the project configuration, or by other means. (It's only a matter of
 definition.)

maybe like Delphi did it: there is a file called .dpr (Delphi project) which holds the absolute/relative paths of the imports used in the project. It could be seen as a Delphi source-based makefile.

test.dpr:
---
project test;

uses // like D's import
  unit1 in '\temp\unit1.pas',
  unit2 in '\bla\unit2.pas',
  unit3 in '\blub\unit3.pas',
  ...
---

unit1.pas:
---
uses unit2, unit3;
interface
...
implementation
...
---

and the source files (.pas) are compiled into a Delphi-compiler-specific "object file format" called .dcu (Delphi compiled unit), which holds all the intelligent data the compiler needs when a unit is used several times (if the compiler finds a .dcu it will use it, or compile the .pas to a .dcu if needed).

I think that the blazing fast parser (and the absence of generic programming features) makes Delphi the fastest compiler out there; the compile speed is comparable to sending a message through icq or saving a small file.

Does the dmd compiler have rich compile/link-time intermediate files?

And btw: if we do compile-time benchmarks, Delphi is the only hard-to-beat reference. But I still don't like Delphi :-)
Mar 22 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Kristian Kilpi wrote:
 On Sun, 22 Mar 2009 14:14:39 +0200, Christopher Wright 
 <dhasenan gmail.com> wrote:
 
 Kristian Kilpi wrote:
 #includes/imports are redundant information: the source code of 
 course describes what's used in it. So, the compiler could be aware 
 of the whole project (and the libraries used) instead of one file at 
 the time.

That's not sufficient. I'm using SDL right now; if I type 'Surface s;', should that import sdl.surface or cairo.Surface? How is the compiler to tell? How should the compiler find out where to look for classes named Surface? Should it scan everything under /usr/local/include/d/? That's going to be pointlessly expensive.

Such things should of course be told to the compiler somehow: by using the project configuration, or by other means. (It's only a matter of definition.) For example, if my project contains the Surface class, then 'Surface s;' should of course refer to it. If some library (used by the project) also has a Surface class, then one should use some other way to refer to it (e.g. sdl.Surface).

Then I want to deal with a library type with the same name as my builtin type. You can come up with a convention that does the right thing 90% of the time, but produces strange errors on occasion.
 But my point was that the compilers today do not have knowledge about 
 the projects as a whole. That makes this kind of 'scanning' too 
 expensive (in the current compiler implementations). But if the 
 compilers were built differently that wouldn't have to be true.

If you want a system that accepts plugins, you will never have access to the entire project. If you are writing a library, you will never have access to the entire project. So a compiler has to address those needs, too.
 If I were to create/design a compiler (which I am not ;) ), it would be 
 something like this:
 
 Every file is cached (why to read and parse files over and over again, 
 if not necessary). These cache files would contain all the information 
 (parse trees, interfaces, etc) needed during the compilation (of the 
 whole project). Also, they would contain the compilation results too 
 (i.e. assembly). So, these cache/database files would logically replace 
 the old object files.
 
 That is, there would be database for the whole project. When something 
 gets changed, the compiler knows what effect it has and what's required 
 to do.

All this is helpful for developers. It's not helpful if you are merely compiling everything once, but then, the overhead would only be experienced on occasion.
 And finally, I would also change the format of libraries. A library 
 would be one file only. No more header/.di -files; one compact file 
 containing all the needed information (in a binary formated database 
 that can be read very quickly).

Why binary? If your program can operate efficiently with a textual representation, it's easier to test, easier to debug, and less susceptible to changes in internal structures. Additionally, a database in a binary format will require special tools to examine. You can't just pop it open in a text editor to see what functions are defined.
Mar 22 2009
parent reply "Nick Sabalausky" <a a.a> writes:
"Christopher Wright" <dhasenan gmail.com> wrote in message 
news:gq6lms$1815$1 digitalmars.com...
 And finally, I would also change the format of libraries. A library would 
 be one file only. No more header/.di -files; one compact file containing 
 all the needed information (in a binary formated database that can be 
 read very quickly).

Why binary? If your program can operate efficiently with a textual representation, it's easier to test, easier to debug, and less susceptible to changes in internal structures. Additionally, a database in a binary format will require special tools to examine. You can't just pop it open in a text editor to see what functions are defined.

"If your program can operate efficiently with a textual representation..."

I think that's the key right there. Most of the time, parsing a sensibly-designed text format is going to be a bit slower than reading in an equivalent sensibly-designed (as opposed to over-engineered [pet-peeve]ex: GOLD Parser Builder's .cgt format[/pet-peeve]) binary format. First off, there's simply more raw data to be read off the disk and processed; then you've got the actual tokenizing/syntax-parsing itself; and then anything that isn't supposed to be interpreted as a string (like ints and bools) needs to get converted to its proper internal representation. And then for saving, you go through all the same, but in reverse. (Also, mixed human/computer editing of a text file can sometimes be problematic.)

With a sensibly-designed binary format (and a sensible systems language like D, as opposed to C# or Java), all you really need to do is load a few chunks into memory and apply some structs over top of them. Toss in some trivial version checks and maybe some endian fixups and you're done. Very little processing and memory is needed.

I can certainly appreciate the other benefits of text formats, though, and certainly agree that there are cases where the performance of using a text format would be perfectly acceptable. But it can add up. And I often wonder how much faster and more memory-efficient things like Linux and the web could have been if they weren't so big on sticking damn near everything into "convenient" text formats.
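The "structs over a memory chunk" idea can be sketched with Python's struct module standing in for D struct overlays; the 12-byte record layout, the LIBF magic value, and the field names are all invented for illustration:

```python
import struct

# A made-up fixed on-disk record: magic, version, flags, payload length.
# '<' pins the byte order (little-endian), which is the "endian fixup":
# the layout is the same no matter what machine reads it.
HEADER = struct.Struct("<4sHHI")

def write_record(payload, version=1):
    return HEADER.pack(b"LIBF", version, 0, len(payload)) + payload

def read_record(blob):
    # "Apply a struct over the chunk": one fixed-offset decode, no parsing.
    magic, version, _flags, length = HEADER.unpack_from(blob, 0)
    if magic != b"LIBF":
        raise ValueError("not a library file")
    if version != 1:                 # the trivial version check
        raise ValueError("unsupported format version")
    return blob[HEADER.size:HEADER.size + length]

blob = write_record(b"symbol table bytes")
print(read_record(blob))
```

Reading is a couple of fixed-size copies and comparisons; the textual equivalent would need tokenizing plus string-to-int conversions for every field, which is the asymmetry the post describes.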
Mar 24 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Nick Sabalausky:
 I often wonder how much faster and more 
 memory-efficient things like linux and the web could have been if they 
 weren't so big on sticking damn near everything into "convenient" text 
 formats.

Maybe not much, because today textual files can be compressed and decompressed on the fly. CPUs are now fast enough that even with compression the I/O is usually the bottleneck anyway. Bye, bearophile
Mar 24 2009
parent reply "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:gqbe2k$13al$1 digitalmars.com...
 Nick Sabalausky:
 I often wonder how much faster and more
 memory-efficient things like linux and the web could have been if they
 weren't so big on sticking damn near everything into "convenient" text
 formats.

Maybe not much, because today textual files can be compressed and decompressed on the fly. CPUs are now fast enough that even with compression the I/O is usually the bottleneck anyway.

I've become more and more wary of this "CPUs are now fast enough..." phrase that keeps getting tossed around these days. The problem is, that argument gets used SO much that on the fastest computer I've ever owned, I've actually experienced *basic text-entry boxes* (with no real bells or whistles or anything) that had *seconds* of delay. That never once happened to me on my "slow" Apple 2.

The unfortunate truth is that the speed and memory of modern systems are constantly getting used to rationalize shoddy bloatware practices, and we wind up with systems that are even *slower* than they were back on less-powerful hardware. It's pathetic, and drives me absolutely nuts.
Mar 24 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Nick Sabalausky:
 That never once happened to me on my "slow" Apple 2.<

See here too :-)
http://hubpages.com/hub/_86_Mac_Plus_Vs_07_AMD_DualCore_You_Wont_Believe_Who_Wins

Yet, what I have written is often true :-) Binary data can't be compressed as well as textual data, and lzop is I/O bound in most situations:
http://www.lzop.org/

Bye,
bearophile
Mar 24 2009
parent reply "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:gqbgma$189l$1 digitalmars.com...
 Nick Sabalausky:
 That never once happened to me on my "slow" Apple 2.<

See here too :-) http://hubpages.com/hub/_86_Mac_Plus_Vs_07_AMD_DualCore_You_Wont_Believe_Who_Wins

Excellent article :)
 Yet, what I have written is often true :-)
 Binary data can't be compressed as well as textual data,

Doesn't really matter, since binary data (assuming a format that isn't over-engineered) is already smaller than the same data in text form. Text compresses well *because* it contains so much more excess redundant data than binary data does. I could stick 10GB of zeros to the end of a 1MB binary file and suddenly it would compress far better than any typical text file.
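That intuition is easy to check with any general-purpose compressor. A minimal sketch using Python's zlib (the particular strings and sizes are just illustrative):

```python
import os
import zlib

# Highly redundant text, like most config files and markup.
text = b"the quick brown fox jumps over the lazy dog\n" * 200
# Data with no redundancy at all, standing in for a dense binary format.
rand = os.urandom(len(text))

print(len(text), len(zlib.compress(text)))  # text shrinks dramatically
print(len(rand), len(zlib.compress(rand)))  # random data barely changes size
```

The text shrinks to a tiny fraction of its size precisely because it was so redundant to begin with; the incompressible data stays roughly the same size, because there was no slack left to remove.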
 and lzop is I/O bound in most situations:
 http://www.lzop.org/

I'm not really sure why you're bringing up compression...? Do you mean that the actual disk access time of a text format can be brought down to the time of an equivalent binary format by storing the text file in a compressed form?
Mar 24 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Nick Sabalausky:
 Doesn't really matter, since binary data (assuming a format that isn't 
 over-engineered) is already smaller than the same data in text form.

If you take compression into account too, compressed text is sometimes smaller than both the equivalent binary file and that binary file compressed (because good compressors are often able to spot redundancy better in text files than in arbitrarily structured binary files).
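A quick sketch consistent with that claim, assuming low-precision numeric data (the data set and both encodings here are invented for illustration; the outcome depends heavily on the data):

```python
import random
import struct
import zlib

random.seed(0)  # fixed seed so the comparison is reproducible
vals = [round(random.uniform(0, 10), 2) for _ in range(1000)]

# Textual encoding: one decimal number per line, e.g. b"8.84\n7.58\n..."
as_text = ("\n".join(str(v) for v in vals)).encode()
# Binary encoding: a raw little-endian double (8 bytes) per value.
as_bin = b"".join(struct.pack("<d", v) for v in vals)

comp_text = zlib.compress(as_text, 9)
comp_bin = zlib.compress(as_bin, 9)

print(len(as_text), len(as_bin))      # raw: the binary file is smaller
print(len(comp_text), len(comp_bin))  # compressed: the text side wins here
```

Two-decimal values carry only a few bits of information each, and the digit text exposes that to the compressor; the doubles bury it in near-random mantissa bytes that deflate cannot squeeze.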
I'm not really sure why you're bringing up compression...?<

Because experiments have shown it solves, or greatly reduces, the problem you were talking about.
 Do you mean that 
 the actual disk access time of a text format can be brought down to the time 
 of an equivalent binary format by storing the text file in a compressed 
 form?

It's not always true, but it happens often enough, or the difference becomes small enough to be outweighed by the clarity advantages of the textual format (and sometimes the compressed text is actually faster to load, but this is less common). Bye, bearophile
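A back-of-envelope model of that trade-off; every throughput number below is an assumption chosen for illustration, not a measurement:

```python
# Is reading a compressed text file faster than reading it raw?
disk_mb_s = 100.0    # assumed sequential disk throughput
decomp_mb_s = 400.0  # assumed lzop-class decompression speed (output side)
ratio = 2.0          # assumed compression ratio for typical text
size_mb = 100.0      # uncompressed file size

t_raw = size_mb / disk_mb_s  # read everything from disk
# Read the smaller compressed file, then decompress it (modeled serially).
t_comp = (size_mb / ratio) / disk_mb_s + size_mb / decomp_mb_s

print(t_raw, t_comp)  # with these assumptions, the compressed path is faster
```

With these numbers the compressed path comes out ahead; with a slower decompressor or a faster disk the balance flips, which is exactly the "not always true, but often enough" situation described above.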
Mar 24 2009
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Nick Sabalausky wrote:
 But it can add up. And I often wonder how much faster and more 
 memory-efficient things like linux and the web could have been if they 
 weren't so big on sticking damn near everything into "convenient" text 
 formats.

Most programs only need to load up text on startup. So the cost of parsing the config file is linear in the number of times you start the application, and linear in the size of the config file. If there were a binary database format in place of libraries, I would be fine with it, as long as there were a convenient way to get the textual version.
Mar 24 2009
prev sibling parent "Kristian Kilpi" <kjkilpi gmail.com> writes:
On Sun, 22 Mar 2009 14:14:39 +0200, Christopher Wright  
<dhasenan gmail.com> wrote:

 Kristian Kilpi wrote:
 #includes/imports are redundant information: the source code of course  
 describes what's used in it. So, the compiler could be aware of the  
 whole project (and the libraries used) instead of one file at the time.

That's not sufficient. I'm using SDL right now; if I type 'Surface s;', should that import sdl.surface or cairo.Surface? How is the compiler to tell? How should the compiler find out where to look for classes named Surface? Should it scan everything under /usr/local/include/d/? That's going to be pointlessly expensive.

Such things should of course be told to the compiler somehow, by using the project configuration or by other means. (It's only a matter of definition.) For example, if my project contains the Surface class, then 'Surface s;' should of course refer to it. If some library (used by the project) also has a Surface class, then one should use some other way to refer to it (e.g. sdl.Surface).

But my point was that the compilers today do not have knowledge about the project as a whole. That makes this kind of 'scanning' too expensive (in the current compiler implementations). But if the compilers were built differently, that wouldn't have to be true.

If I were to create/design a compiler (which I am not ;) ), it would be something like this: every file is cached (why read and parse files over and over again if it's not necessary?). These cache files would contain all the information (parse trees, interfaces, etc.) needed during the compilation (of the whole project). They would also contain the compilation results (i.e. assembly). So these cache/database files would logically replace the old object files. That is, there would be a database for the whole project. When something gets changed, the compiler knows what effect it has and what needs to be done.

And finally, I would also change the format of libraries. A library would be one file only. No more header/.di files; one compact file containing all the needed information (in a binary formatted database that can be read very quickly).
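The recompilation rule running through this thread — rebuild a module when its own source changes, or when the *interface* of something it imports changes, but not when only a dependency's body changes — can be modeled in a few lines. This is a hypothetical toy, not how dmd or any real build tool actually works:

```python
import hashlib

def digest(s):
    return hashlib.sha1(s.encode()).hexdigest()

# Hypothetical modules: source split into an exported interface and a body.
modules = {
    "surface": {"interface": "class Surface { void draw(); }",
                "body": "void draw() { }"},
    "app":     {"interface": "",
                "body": "import surface; void main() { }"},
}
imports = {"surface": [], "app": ["surface"]}

def build(cache):
    """Return the modules that need recompiling, updating the cache."""
    rebuilt = []
    for name, m in modules.items():
        own = digest(m["interface"] + m["body"])  # the module's own source
        # Only the *interfaces* of imported modules matter, not their bodies.
        deps = tuple(digest(modules[d]["interface"]) for d in imports[name])
        stamp = (own, deps)
        if cache.get(name) != stamp:
            rebuilt.append(name)
            cache[name] = stamp
    return rebuilt

cache = {}
build(cache)                     # first build: everything is compiled
modules["surface"]["body"] = "void draw() { /* new */ }"
build(cache)                     # only "surface" rebuilds; "app" is untouched
modules["surface"]["interface"] += " void blit();"
build(cache)                     # interface changed: both modules rebuild
```

A compiler-integrated build would get the `imports` graph for free from semantic analysis (catching even mixin-generated imports), which is exactly what external dependency scanners struggle with.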
Mar 22 2009
prev sibling parent "Unknown W. Brackets" <unknown simplemachines.org> writes:
Actually, dmd is so fast I never bother with these "build" utilities.  I 
just send it all the files and have it rebuild every time, deleting all 
the object files afterward.

This is very fast, even for larger projects.  It appears (to me) the 
static cost of calling dmd is much greater than the dynamic cost of 
compiling a file.  These toolkits always compile a, then b, then c, 
which takes like 2.5 times as long as compiling a, b, and c at once.

That said, if dmd were made to link into other programs, these toolkits 
could hook into it, and have the fixed cost only once (theoretically) - 
but still dynamically decide which files to compile.  This seems ideal.

-[Unknown]


davidl wrote:
 
 1. compiler know in what situation a file need to be recompiled
 
 Consider the file given the same header file, then the obj file of this 
 will be required for linking, all other files import this file shouldn't 
 require any recompilation in this case. If a file's header file changes, 
 thus the interface changes, all files import this file should be 
 recompiled.
 Compiler can emit building command like rebuild does.
 
 I would enjoy:
 
 dmd -buildingcommand abc.d  > responsefile
 
 dmd  responsefile
 
 I think we need to eliminate useless recompilation as much as we should 
 with consideration of the growing d project size.
 
 2. maintaining the build without compiler support costs
 

Mar 22 2009