www.digitalmars.com

digitalmars.D - Adding ccache-like output caching to dmd

Per Nordlöw <per.nordlow gmail.com> writes:
Has anyone considered integrating ccache-like caching of output files into `dmd`, indexed by digests based on

- environment variables,
- process arguments, which, in turn, decide
- input file contents (including import files detected during the first uncached compile),
- the dmd compiler binary's fingerprint,
- ...probably something more I missed?

The initial call stores that list alongside the content hash and the resulting binary (or binaries).

If not, would anyone have any strong objections to adding this?
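The list above could be folded into a single cache key along these lines. This is a minimal Python sketch under stated assumptions, not an existing dmd option: `cache_key` and `_feed` are hypothetical helpers, the caller decides which environment variables matter, the compiler is fingerprinted by hashing its binary (an assumption), and `input_paths` is expected to already include the imports detected on the first uncached compile.

```python
import hashlib
import os


def _feed(h, data: bytes) -> None:
    # Length-prefix each field so distinct inputs can't run together.
    h.update(len(data).to_bytes(8, "little"))
    h.update(data)


def cache_key(args, env_names, input_paths, compiler_path, env=None):
    """Hypothetical helper: one SHA-256 hex digest over all cache inputs."""
    env = os.environ if env is None else env
    h = hashlib.sha256()
    for name in sorted(env_names):
        value = env.get(name)
        _feed(h, b"env")
        _feed(h, name.encode())
        # Distinguish "unset" from "set to the empty string".
        _feed(h, b"\x00" if value is None else b"\x01" + value.encode())
    for arg in args:  # order matters on a command line, so don't sort
        _feed(h, b"arg")
        _feed(h, arg.encode())
    for path in sorted(input_paths):
        with open(path, "rb") as f:
            _feed(h, b"file")
            _feed(h, path.encode())
            _feed(h, f.read())
    # Assumption: fingerprint the compiler by hashing its binary.
    with open(compiler_path, "rb") as f:
        _feed(h, b"compiler")
        _feed(h, f.read())
    return h.hexdigest()
```

Because every field is length-prefixed and tagged, two different sets of inputs cannot serialize to the same byte stream by accident, and an unset environment variable hashes differently from an empty one.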
Dec 28 2020
Max Haughton <maxhaton gmail.com> writes:
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
 [...]
If it's implemented in a sensible manner, I don't see why not. My only worry is that dmd code tends to be a weird blend of C, C++, and Java; if the cache is properly wrapped up in a way that compartmentalizes the things that can go wrong, then go for it.
Dec 28 2020
Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
 [...]
The issue is that, because of string imports, you don't know the full set of files you are depending on, which means any change can cause any file to be required.
Dec 29 2020
John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
 On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
 [...]
 The issue is that because of string imports you don't know the full set of files you are depending on. which means any change can cause any file to be required.
In general, it's unknown what files a given D build depends on until after the build has (mostly) happened. This is true for string imports, but also for regular imports.

Conceptually, we split inputs into:

- Y: inputs knowable only after compilation is done (the set of the contents of all imported files, string or code)
- X: inputs known ahead of time (e.g. the command-line flags to DMD)

Object files are O. The set of names of the files containing Y is referred to as S.

The compiler is then a pure function F(X, Y) -> O. A real compiler invocation is C(X, [Y]) -> O, where [Y] means Y is implicit. But the compiler can give us S, so we can instead say the compiler is C(X, [Y]) -> (O, S).

The only way S will change is if X or Y changes. It (roughly :-p ) follows that we can build a persistent nested map Hash(X) -> ((S, Hash(Y)) -> O).

We calculate Hash(X) before compiling and look it up in the map to get (S, Hash(Y)). If it's not there, we need to recompile and store a new entry in the outer map. If it is, we read all the files in S and use them to calculate Hash(Y)'; if Hash(Y)' == Hash(Y), we proceed to get O, otherwise we recompile and store a new entry in the inner map.

Or something like that, you get the idea... It's not intractable, it's just a bit fiddly.
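The nested map described above can be sketched in Python. This is a minimal, in-memory illustration, not a dmd or dub feature: `NestedCache`, `hash_y`, and the `compile_fn` callback (standing in for C(X, [Y]) -> (O, S)) are hypothetical names, X is assumed to be a tuple of strings such as command-line flags, and a real tool would persist the map to disk.

```python
import hashlib


def hash_y(paths):
    """Hash(Y): digest over the current contents of every file named in S."""
    h = hashlib.sha256()
    for path in sorted(paths):
        try:
            with open(path, "rb") as f:
                data = b"\x01" + f.read()
        except FileNotFoundError:
            data = b"\x00"  # a vanished file must also change Hash(Y)'
        for part in (path.encode(), data):
            h.update(len(part).to_bytes(8, "little"))
            h.update(part)
    return h.hexdigest()


class NestedCache:
    """In-memory stand-in for the map Hash(X) -> ((S, Hash(Y)) -> O)."""

    def __init__(self, compile_fn):
        # compile_fn(x) plays the role of C(X, [Y]) -> (O, S).
        self.compile_fn = compile_fn
        self.outer = {}  # Hash(X) -> {(S, Hash(Y)): O}

    def build(self, x):
        hx = hashlib.sha256(repr(x).encode()).hexdigest()
        inner = self.outer.setdefault(hx, {})
        # Re-read the files in each recorded S and compare Hash(Y)' to Hash(Y).
        for (s, hy), o in inner.items():
            if hash_y(s) == hy:
                return o  # cache hit: reuse O without compiling
        # Miss: compile, then record a new inner entry under the returned S.
        o, s = self.compile_fn(x)
        s = frozenset(s)
        inner[(s, hash_y(s))] = o
        return o
```

One caveat of this sketch: it hashes S after compiling, so a file edited while the compiler runs could be recorded under the wrong Hash(Y). A real implementation would want the exact bytes the compiler read.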
Dec 29 2020
Per Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 29 December 2020 at 16:43:33 UTC, John Colvin wrote:
 Or something like that, you get the idea... It's not 
 intractable, it's just a bit fiddly.
Superb answer. Does https://github.com/dlang/dub/pull/2044 match this design?
Dec 29 2020
John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 29 December 2020 at 19:57:21 UTC, Per Nordlöw wrote:
 On Tuesday, 29 December 2020 at 16:43:33 UTC, John Colvin wrote:
 Or something like that, you get the idea... It's not 
 intractable, it's just a bit fiddly.
 Superb answer. Does this design match https://github.com/dlang/dub/pull/2044
Not really, although maybe it should? If I understand correctly (I haven't reviewed the implementation), that PR is using dub's normal rebuild rules w.r.t. changed files and is just swapping out access times for content hashes. Maybe I'm mistaken, but I don't think dub pays any attention to changes in files that aren't source files.
Dec 29 2020
John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 29 December 2020 at 20:09:25 UTC, John Colvin wrote:
 On Tuesday, 29 December 2020 at 19:57:21 UTC, Per Nordlöw wrote:
 On Tuesday, 29 December 2020 at 16:43:33 UTC, John Colvin wrote:
 Or something like that, you get the idea... It's not 
 intractable, it's just a bit fiddly.
 Superb answer. Does this design match https://github.com/dlang/dub/pull/2044
 Not really, although maybe it should? If I understand correctly (I haven't reviewed the implementation), that PR is using dub's normal rebuild rules w.r.t. changed files and is just swapping out access times for content hashes. Maybe I'm mistaken, but I don't think dub pays any attention to changes in files that aren't source files.
s/access times/modification times/
s/source files/in sourceFiles\/sourcePaths/
Dec 29 2020
drug <drug2004 bk.ru> writes:
On 12/29/20 11:09 PM, John Colvin wrote:
 On Tuesday, 29 December 2020 at 19:57:21 UTC, Per Nordlöw wrote:
 On Tuesday, 29 December 2020 at 16:43:33 UTC, John Colvin wrote:
 Or something like that, you get the idea... It's not intractable, 
 it's just a bit fiddly.
 Superb answer. Does this design match https://github.com/dlang/dub/pull/2044
 Not really, although maybe it should? If I understand correctly (I haven't reviewed the implementation), that PR is using dub's normal rebuild rules w.r.t. changed files and is just swapping out access times for content hashes.
Dub already provides something like S, say S* [1], so currently the compiler invocation is F(Z) -> O, where Z = (X, S*). The PR implements this:

    rebuild = false
    foreach (file: Z) {
        if !file.exists or Hash(file) != BuildCache[file] {
            rebuild = true
            break
        }
    }
    if (rebuild) {
        buildWithCompiler
        BuildCache = Hash(Z)
    }
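A runnable Python rendering of that loop might look like this. It is a sketch, not the PR's actual code: `file_hash`, `needs_rebuild`, `build_if_needed`, and the `build_with_compiler` callback are hypothetical names, `build_cache` is assumed to be a plain dict from path to SHA-256 digest, and only the file part of Z is covered (the X flags would need to be hashed separately).

```python
import hashlib
import os


def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def needs_rebuild(z_files, build_cache):
    """True if any file in Z is missing or its content hash changed."""
    for path in z_files:
        # Check existence first so we never try to hash a missing file.
        if not os.path.exists(path) or file_hash(path) != build_cache.get(path):
            return True
    return False


def build_if_needed(z_files, build_cache, build_with_compiler):
    """Rebuild and refresh the cache only when needs_rebuild says so."""
    if needs_rebuild(z_files, build_cache):
        build_with_compiler()
        # BuildCache = Hash(Z): replace the old entries wholesale.
        build_cache.clear()
        build_cache.update({p: file_hash(p) for p in z_files})
        return True
    return False
```

Using content hashes instead of modification times means merely touching a file no longer forces a rebuild, while any real edit still does.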
 Maybe I'm mistaken, but I
 don't think dub pays any attention to changes in files that aren't 
 source files.
Perhaps I misunderstand, but if you mean that source files are *.d files only, then you are mistaken: Z contains *.{sdl|json}, string imports, binary libraries, etc.

[1] Not exactly S: dub scans all string import paths and adds their contents to S*, so it can add files that are not used in the current build.
Dec 30 2020
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
 On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
 [...]
 The issue is that because of string imports you don't know the full set of files you are depending on. which means any change can cause any file to be required.
If we pass the complete set of files (instead of relying on [string] import paths, which are not very precise), this is definitely doable. Sure, the developer "experience" would be a bit clumsier, but that's not a big deal either: a wrapper tool could first compile your code with `dmd -i -makedeps` [1], save the currently known set of files, and then use that set for incremental compilation.

[1]: coming soon: https://github.com/dlang/dmd/pull/12049
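A wrapper tool like the one described would need to read the dependency list back. The sketch below assumes that `-makedeps` writes Makefile-style rules (`target: dep1 dep2 \` with backslash-newline continuations) and that file names contain no spaces; `parse_makedeps` and `known_files` are hypothetical helpers, and invoking dmd itself is left out.

```python
import re


def parse_makedeps(text):
    """Map each target to its dependency list from Makefile-style rules."""
    # Join backslash-newline continuations into single logical lines.
    joined = re.sub(r"\\\r?\n", " ", text)
    rules = {}
    for line in joined.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        # The target ends at the first ':' followed by whitespace or end of
        # line, so a Windows drive letter such as "C:\" is not split on.
        m = re.match(r"\s*(\S+?):(?:\s+|$)(.*)", line)
        if m is None:
            continue
        rules.setdefault(m.group(1), []).extend(m.group(2).split())
    return rules


def known_files(text):
    """The de-duplicated, sorted set of files any target depends on."""
    return sorted({dep for deps in parse_makedeps(text).values() for dep in deps})
```

The list returned by `known_files` is what the wrapper would save, hash, and compare on the next run before deciding whether to invoke dmd again.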
Dec 29 2020
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 29 December 2020 at 17:41:49 UTC, Petar Kirov [ZombineDev] wrote:
 On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
 On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
 [...]
 The issue is that because of string imports you don't know the full set of files you are depending on. which means any change can cause any file to be required.
 If we pass the complete set of files (instead of using relying on [string] import paths, which not very precise), this definitely doable. Sure, the developer "experience" would be a bit more clumsy, but not a big deal either - a wrapper tool could first compile your code with `dmd -i -makedeps` [1] and then save the currently known set of files and then the incremental compilation would use it. [1]: coming soon: https://github.com/dlang/dmd/pull/12049
Edit: What John Colvin said :D
Dec 29 2020
Per Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 29 December 2020 at 17:43:49 UTC, Petar Kirov [ZombineDev] wrote:
 [1]: coming soon: https://github.com/dlang/dmd/pull/12049
Great addition. Will the new dub caching pull request benefit from using -makedeps?
Dec 30 2020
Per Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
 The issue is that because of string imports you don't know the 
 full set of files you are depending on.
 which means any change can cause any file to be required.
If dmd, during the initial (uncached) build, logs all the imported files, including string imports, and writes them to a cache description together with their individual content hashes, and we pessimistically rebuild every time anything changes, I don't see how this can be an issue. Can you elaborate on which case I've missed?
Dec 29 2020
Per Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 29 December 2020 at 19:26:20 UTC, Per Nordlöw wrote:
 Can you elaborate on which case I've missed?
Unless CTFE incorporates non-deterministic state, but AFAICT it isn't allowed to do that, since the functions it calls must all be pure.
Dec 29 2020
Per Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 29 December 2020 at 19:26:20 UTC, Per Nordlöw wrote:
 If we, in dmd, during the initial (uncached) build log all the 
 imported files including string imports and output them to a 
 cache description together with their individual content hashes 
 and pessimistically rebuild every time anything changes I don't 
 see how this can be an issue. Can you elaborate on which case 
 I've missed?
Thanks, John Colvin, for your thorough answer. Both I and others will greatly benefit from my making my language as formal as yours. ;)
Dec 29 2020
Ali Çehreli <acehreli yahoo.com> writes:
On 12/28/20 3:14 PM, Per Nordlöw wrote:
 Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
Related: https://forum.dlang.org/post/r812of$11n7$1 digitalmars.com

Ali
Dec 29 2020
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
 [...]
Or we could just use Nix [1] (TL;DR version: [2]) :P That said, Nix mostly deals with high-level caching and won't help with incremental compilation. Check out the previous efforts in this area: [3] [4]

[1]: https://edolstra.github.io/pubs/phd-thesis.pdf
[2]: https://nixos.org/guides/how-nix-works.html
[3]: https://www.youtube.com/watch?v=WHb7y3JYEBQ
[4]: https://github.com/dlang/dmd/pull/7843
Dec 29 2020
Johan <j j.nl> writes:
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
 [...]
FWIW, I feel this is much better handled by a build system that invokes the compiler than by the compiler itself. Handling the build environment, input/intermediate/output files (timestamps, interdependencies, etc.), invoking (or caching) the substep tool, ..., are core tasks of a build system tool. Caching would add a lot of non-core-task complexity to a compiler. The specific task of optimization and machine code generation is cacheable by LDC (see `--cache`), but that is a much more limited task.

-Johan
Dec 29 2020