
digitalmars.D - Why?

reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
https://github.com/dlang/dmd/pull/16348

*sigh*

/P
Apr 04
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348
 
 *sigh*
 
 /P
I can certainly see it being used with shared libraries. However, there will need to be some changes, or at least some acknowledgement of per-file CLI args, and support for passing it via the import path switch.
Apr 04
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348
 
 *sigh*
 
 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
Apr 04
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Apr 04
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
Apr 04
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/04/2024 1:27 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
You can and probably will still zip them up. Getting the number of files down makes it a bit easier to work with when moving them around or using it. I may like the idea of it, but not enough to be arguing for it, so my main concern is making sure Walter isn't simplifying it down to a point where we will have issues with it. I.e. I'm recommending the spec be a little more complicated than the code he has written so far (my improved spec): https://gist.github.com/rikkimax/d75bdd1cb9eb9aa7bacb532b69ed398c
Apr 04
parent Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 12:33:58 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 05/04/2024 1:27 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 [...]
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
You can and probably will still zip them up. Getting the number of files down makes it a bit easier to work with when moving them around or using it. I may like the idea of it, but not enough to be arguing for it, so my main concern is making sure Walter isn't simplifying it down to a point where we will have issues with it. I.e. recommending the spec should be a little more complicated than what the code he has written so far (my improved spec): https://gist.github.com/rikkimax/d75bdd1cb9eb9aa7bacb532b69ed398c
The package manager should take care of moving files around following a given configuration, not the users. Let the package manager be cool enough to handle the specific problem that SAR files are supposed to solve.
Apr 04
prev sibling parent reply Hipreme <msnmancini hotmail.com> writes:
On Thursday, 4 April 2024 at 12:27:00 UTC, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
The .di feature is a big flop for D. The files are nearly useless, and their generation is dumb since no code analysis is done to remove unneeded imports. It should analyze the code and check whether each imported type is actually used or not; also, public imports are always kept. There's no reason nowadays to use automatic .di generation. One must either completely ignore this feature or just write their own .di files.
Apr 04
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 13:15:50 UTC, Hipreme wrote:
 On Thursday, 4 April 2024 at 12:27:00 UTC, Paolo Invernizzi 
 wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 [...]
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
The .di feature is a big flop of D. They are nearly useless, their generation is dumb since no code analysis is done for removing useless imports. It should analyze the code and: check if the type import is used or not. Also, public imports are always kept. There's no reason nowadays to use auto DI generation. One must either completely ignore this feature or just create their own DI files.
Let's put aside the implementation of the automatic way to generate '.di' and rephrase: are .di files intended to be the correct way to expose the _public_ API of an opaque library binary? What problem does SAR aim to solve that a package manager can't? Why does D need it?
Apr 04
next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/04/2024 3:11 AM, Paolo Invernizzi wrote:
 Let's put aside the implementation of the automatic way to generate 
 '.di', and rephrase: Are .di intended to be the correct way to expose 
 /public/ API of an opaque library binary?
I intend to see this happen, yes. If we don't do this, language features like inference simply make it impossible to link against code that has used it. It will also mean shared libraries can hide their internal details, which is something I care about greatly.
Apr 04
prev sibling parent reply Hipreme <msnmancini hotmail.com> writes:
On Thursday, 4 April 2024 at 14:11:35 UTC, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 13:15:50 UTC, Hipreme wrote:
 On Thursday, 4 April 2024 at 12:27:00 UTC, Paolo Invernizzi 
 wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 [...]
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
The .di feature is a big flop of D. They are nearly useless, their generation is dumb since no code analysis is done for removing useless imports. It should analyze the code and: check if the type import is used or not. Also, public imports are always kept. There's no reason nowadays to use auto DI generation. One must either completely ignore this feature or just create their own DI files.
Let's put aside the implementation of the automatic way to generate '.di', and rephrase: Are .di intended to be the correct way to expose _public_ API of an opaque library binary? What's the problem SAR targets to solve that a package manager can't solve? Why D needs it?
On the case of it being opaque, I could not care less. Yes, .di files are the correct way to expose a public API; when they are used correctly, they can greatly reduce the compilation time of your files. SAR solves exactly 0 things. It wasn't even put to the test beforehand, and Dennis did benchmark SAR vs `dmd -i` in the thread discussion: `dmd -i` was faster. Walter said that a cold run could be way faster though, since the files won't need to be mem-mapped.
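As an aside, generating and consuming .di interface files needs nothing beyond existing compiler flags. A rough sketch of the zip-plus-headers workflow (the paths and the "mylib" library name are made up):

```
# build the shared library and emit .di interface files into include/
dmd -shared -fPIC -H -Hd=include -of=libmylib.so source/mylib/*.d

# a consumer compiles against the .di files and links the binary
dmd -I=include -L-L. -L-lmylib app.d
```

Zip up `include/` and the binary and you have the C++-style "include directory plus library" distribution described above.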
Apr 04
parent reply Adam Wilson <flyboynw gmail.com> writes:
On Thursday, 4 April 2024 at 15:36:31 UTC, Hipreme wrote:
 SAR solves exactly 0 things. It wasn't even put to test and 
 Dennis even tested in the thread discussion SAR vs dmd -i. DMD 
 -i was faster. Walter said that a cold run could be way faster 
 though since the files won't need to be mem mapped.
Now ... to convince Walter that loading the whole archive into RAM once will be better than mem-mapping... RAM is cheap and source code is not a big memory hit.
Apr 05
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/5/2024 5:20 PM, Adam Wilson wrote:
 Now ... to convince Walter that loading the whole archive into RAM once will
be 
 better than mem-mapping...
 
 RAM is cheap and source code is not a big memory hit.
There's a switch in the source code to read it all at once or use memory mapping, so I could benchmark which is faster. There's no significant difference, probably because the files weren't large enough.

BTW, I recall that executable files are not read into memory and then jumped to. They are memory-mapped, so the executable can start up much faster. Pieces of the executable are loaded in on demand, although the OS will speculatively load in pieces, too.
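For anyone who wants to reproduce that comparison outside the compiler, here is a minimal D sketch of the two read strategies (the file name is hypothetical; this is not the compiler's actual code):

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : read;
import std.mmfile : MmFile;
import std.stdio : writefln;

void main()
{
    enum path = "big_source_archive.sar"; // hypothetical input file

    // Strategy 1: slurp the whole file into memory with one read.
    auto sw = StopWatch(AutoStart.yes);
    auto bytes = cast(const(ubyte)[]) read(path);
    writefln("read(): %s bytes in %s", bytes.length, sw.peek);

    // Strategy 2: memory-map it; pages are only faulted in when touched,
    // so this mostly measures the cost of setting up the mapping.
    sw.reset();
    auto mm = new MmFile(path);
    auto mapped = cast(const(ubyte)[]) mm[];
    writefln("mmap(): %s bytes in %s", mapped.length, sw.peek);
}
```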
Apr 09
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 10/04/2024 5:18 AM, Walter Bright wrote:
 BTW, I recall that executable files are not read into memory and then 
 jumped to. They are memory-mapped files, this is so the executable can 
 start up much faster. Pieces of the executable are loaded in on demand, 
 although the OS will speculatively load in pieces, too.
That doesn't sound right. Address randomization, Windows remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.
Apr 09
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 10/04/2024 5:22 AM, Richard (Rikki) Andrew Cattermole wrote:
 On 10/04/2024 5:18 AM, Walter Bright wrote:
 BTW, I recall that executable files are not read into memory and then 
 jumped to. They are memory-mapped files, this is so the executable can 
 start up much faster. Pieces of the executable are loaded in on 
 demand, although the OS will speculatively load in pieces, too.
That doesn't sound right. Address randomization, Windows remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.
It appears to have been true as of Windows 2000: https://learn.microsoft.com/en-us/archive/msdn-magazine/2002/march/windows-2000-loader-what-goes-on-inside-windows-2000-solving-the-mysteries-of-the-loader (see LdrpMapDll). However, I don't think those two features existed at the time.

As of Windows Internals 5, the cache manager uses 256kb blocks as part of memory mapping (very useful information, that!). It would be worth double checking that this is the default for std.mmap.

So it seems I'm half right: there is no way Windows could be memory mapping a binary when address randomization is turned on for a given block that has rewrites for symbol locations, but it may be memory mapping large blocks of data when it doesn't.
Apr 09
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
https://issues.dlang.org/show_bug.cgi?id=24494
Apr 09
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
 Address randomization, Windows remapping of symbols at runtime (with state
that 
 is kept around so you can do it later), all suggest it isn't like that now.
Code generation has changed to be PIC (Position Independent Code) so this is workable.
Apr 09
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 10/04/2024 7:04 AM, Walter Bright wrote:
 On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
 Address randomization, Windows remapping of symbols at runtime (with 
 state that is kept around so you can do it later), all suggest it 
 isn't like that now.
Code generation has changed to be PIC (Position Independent Code) so this is workable.
From Windows Internals 5 and WinAPI docs, it seems as though it does memory map initially. But the patching will activate CoW, so in effect it isn't memory mapped if you need to patch. Which is quite useful information for me while working with Unicode tables. If you need to patch? That won't be shared. If you don't need to patch? Who cares how much ROM is used! Don't be afraid to use 256kb in a single table. Just don't use pointers... no matter what and it'll be shared.
Apr 09
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/9/2024 12:11 PM, Richard (Rikki) Andrew Cattermole wrote:
 On 10/04/2024 7:04 AM, Walter Bright wrote:
 On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
 Address randomization, Windows remapping of symbols at runtime (with state 
 that is kept around so you can do it later), all suggest it isn't like that
now.
Code generation has changed to be PIC (Position Independent Code) so this is workable.
From Windows Internals 5 and WinAPI docs, it seems as though it does memory map initially. But the patching will activate CoW, so in effect it isn't memory mapped if you need to patch.
Right, so the executable is designed to not need patching.
 If you don't need to patch? Who cares how much ROM is used! Don't be afraid to 
 use 256kb in a single table. Just don't use pointers... no matter what and
it'll 
 be shared.
Instead of using pointers, use offsets from the beginning of the file.
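A tiny D illustration of the offset-based approach (names and data invented): the table itself contains no pointers, so it needs no runtime relocation and the pages backing it stay shareable.

```d
// Entries store offsets into one shared string blob instead of pointers.
// (Illustrative data only, not from any real table.)
struct Entry
{
    uint nameOffset; // offset into `blob`, not a pointer
    uint nameLength;
    uint value;
}

immutable string blob = "alphabetagamma";

immutable Entry[3] table = [
    Entry(0, 5, 100), // "alpha"
    Entry(5, 4, 200), // "beta"
    Entry(9, 5, 300), // "gamma"
];

string nameOf(in Entry e) @safe pure nothrow @nogc
{
    return blob[e.nameOffset .. e.nameOffset + e.nameLength];
}

unittest
{
    assert(nameOf(table[1]) == "beta");
    assert(table[2].value == 300);
}
```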
Apr 09
prev sibling parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
On Tuesday, 9 April 2024 at 17:18:02 UTC, Walter Bright wrote:
 On 4/5/2024 5:20 PM, Adam Wilson wrote:
 Now ... to convince Walter that loading the whole archive into 
 RAM once will be better than mem-mapping...
 
 RAM is cheap and source code is not a big memory hit.
There's a switch in the source code to read it all at once or use memory mapping, so I could benchmark which is faster. There's no significant difference, probably because the files weren't large enough. BTW, I recall that executable files are not read into memory and then jumped to. They are memory-mapped files, this is so the executable can start up much faster. Pieces of the executable are loaded in on demand, although the OS will speculatively load in pieces, too.
Who managed to convince you to spend time working on this? The solution to poor Phobos build time is to stop abusing templates for code that doesn't exist, for problems that nobody has: https://github.com/dlang/phobos/tree/master/phobos/sys
Apr 09
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/9/2024 10:58 AM, ryuukk_ wrote:
 Who managed to convince you to spend time working on this?
Nobody. I've wanted to do it for decades, just never got around to it. What triggered it was my proposal to Adam to split Phobos modules into a much more granular structure, which would increase the number of files in it by a factor of 5 or more. (The current structure has each module being a grab bag of marginally related functions.) A more granular structure would hopefully reduce the "every module imports every other module" problem Phobos has. But lots more modules increase aggregate file lookup time. One cannot really tell how well it works without trying it.
Apr 09
parent reply Adam Wilson <flyboynw gmail.com> writes:
On Tuesday, 9 April 2024 at 19:11:28 UTC, Walter Bright wrote:
 Nobody. I've wanted to do it for decades, just never got around 
 to it. What triggered it was my proposal to Adam to split 
 Phobos modules into a much more granular structure, which would 
 increase the number of files in it by a factor of 5 or more. 
 (The current structure is of each module being a grab bag of 
 marginally related functions.) A more granular nature would 
 hopefully reduce the "every module imports every other module" 
 problem Phobos has.
Great. Now everybody is going to think that I started this. For the record, I did **not** start this. Walter sent me this idea out of the blue after I pointed out that working with hundreds (or thousands) of files in Phobos was going to be unwieldy. This wasn't the problem I was thinking of, but frankly the format does have certain advantages from a distribution standpoint. Although honestly, we're going to end up unpacking the files for other tools to use anyways.
Apr 10
parent Paulo Pinto <pjmlp progtools.org> writes:
On Wednesday, 10 April 2024 at 10:17:53 UTC, Adam Wilson wrote:
 On Tuesday, 9 April 2024 at 19:11:28 UTC, Walter Bright wrote:
 [...]
Great. Now everybody is going to think that I started this. For the record I did **not** start this. Walter sent me this idea out of the blue after I pointed out that working with hundreds (or thousands) of files in Phobos was going to be unwieldy. This wasn't the problem I was thinking of, but frankly the format does have certain advantages from a distribution standpoint. Although honestly, we're going to end up unpacking the files for other tools to use anyways.
Not only do we not care about file access times for JAR/WAR/EAR files and DLLs, we happily ship binary libraries instead of parsing source code all the time. This looks to me like yet another distraction.
Apr 10
prev sibling parent reply Hipreme <msnmancini hotmail.com> writes:
On Thursday, 4 April 2024 at 09:55:32 UTC, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
Rationale: https://github.com/dlang/dmd/blob/8e94bc644fc72dc3f72a00791eb52b40230ceb26/changelog/dmd.source-archive.dd#L79 The part I'm very interested in is the compilation time. This may reduce by a lot the compilation time required by dub libraries. But there would be a requirement of doing a synchronized change between all the compilers and our existing tools :) For development, this feature might be useless. I'll also need to start thinking now about how to support those.
 2. To compile all the source files at once with DMD, the command line can get extremely long, and certainly unwieldy. With .sar files, you may not even need a makefile or builder, just:

This is actually something I like. It is already possible to do that with `dmd -i -i=std` or something like that (see the sketch below). The main feature one gains from makefiles or builders isn't declaring the files you're using, it is defining the version configuration. Also, I'll be renaming the thread, since its name doesn't open up for any discussion.
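For reference, a minimal sketch of that `-i` workflow (file and directory names are made up):

```
# compile app.d plus everything it transitively imports from the import path,
# with no file list and no makefile
dmd -i -I=source app.d

# -i excludes std.* and core.* by default; -i=std pulls Phobos sources in too
dmd -i -i=std -I=source app.d
```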
Apr 04
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 10:48:50 UTC, Hipreme wrote:
 On Thursday, 4 April 2024 at 09:55:32 UTC, Paolo Invernizzi 
 wrote:
 [...]
Rationale: https://github.com/dlang/dmd/blob/8e94bc644fc72dc3f72a00791eb52b40230ceb26/changelog/dmd.source-archive.dd#L79 [...]
My 2 cents: there will be NO advantages in compilation time.
 [...]
What can't you do with `dmd -I` that you can do with sar?
Apr 04
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2024 4:12 AM, Paolo Invernizzi wrote:
 My 2 cents: there will be NO advantages in compilation time.
Unfortunately, some things cannot be benchmarked until they are built.
Apr 09
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Tuesday, 9 April 2024 at 17:19:41 UTC, Walter Bright wrote:
 On 4/4/2024 4:12 AM, Paolo Invernizzi wrote:
 My 2 cents: there will be NO advantages in compilation time.
Unfortunately, some things cannot be benchmarked until they are built.
Exactly, mine is a bet ... but hey, I'll be happy to lose it, of course!
Apr 09
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 9 April 2024 at 18:49:21 UTC, Paolo Invernizzi wrote:
 On Tuesday, 9 April 2024 at 17:19:41 UTC, Walter Bright wrote:
 On 4/4/2024 4:12 AM, Paolo Invernizzi wrote:
 My 2 cents: there will be NO advantages in compilation time.
Unfortunately, some things cannot be benchmarked until they are built.
Exactly, mine is a bet ... but hey, I'll be happy to lose it, of course!
I will also bet that any difference in compile time will be extremely insignificant. I don't bet against decades of filesystem read optimizations. Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.

I did reduce stats semi-recently for DMD and saved a significant percentage of stats, but I don't really think it saved insane amounts of time. It was more of an "oh, I thought of a better way to do this". I think at the time, there was some resistance to adding more stats to the compiler due to the same misguided optimization beliefs, and so I started looking at it. If reducing stats by 90% wasn't significant, reducing them again likely isn't going to be noticed. See https://github.com/dlang/dmd/pull/14582

The only benefit I might see in this is to *manage* the source as one item. But I don't really know that we need a new custom format. `tar` is pretty simple. ARSD has a tar implementation that I lifted for my raylib-d installer which allows reading tar files with about [100 lines of code](https://github.com/schveiguy/raylib-d/blob/9906279494f1f83b2c4c9550779d46962af7c342/install/source/app.d#L22-L132).

-Steve
Apr 09
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
 I will also bet that any difference in compile time will be extremely 
 insignificant. I don't bet against decades of filesystem read optimizations. 
 Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.
On my timing of compiling hello world, a 1.412s build becomes 1.375s, about 37 milliseconds faster. Most of the savings appear to come from the fact that when the archive is first accessed, its table of contents is loaded into the path cache and file cache that you developed. Then, no stats are done on the filesystem.
 I did reduce stats semi-recently for DMD and saved a significant percentage of 
 stats, I don't really think it saved insane amounts of time. It was more of a 
 "oh, I thought of a better way to do this". I think at the time, there was
some 
 resistance to adding more stats to the compiler due to the same misguided 
 optimization beliefs, and so I started looking at it. If reducing stats by 90% 
 wasn't significant, reducing them again likely isn't going to be noticed.
 
 See https://github.com/dlang/dmd/pull/14582
Nice. I extended it so files in an archive are tracked.
 The only benefit I might see in this is to *manage* the source as one item.
The convenience of being able to distribute a "header only" library as one file may be significant.

I've always liked things that didn't need an installation program. An install should be "copy the file onto your system" and uninstall should be "delete the file"!

Back in the days of CD software, my compiler was set up so no install was necessary, just put the CD in the drive and run it. You didn't even have to set the environment variables, as the compiler would look for its files relative to where the executable file was (argv[0]). You can see vestiges of that still in today's dmd. Of course, to get it to run faster you'd XCOPY it onto the hard drive. Though some users were flummoxed by the absence of INSTALL.EXE and I'd have to explain how to use XCOPY.
 But 
 I don't really know that we need a new custom format. `tar` is pretty simple. 
 ARSD has a tar implementation that I lifted for my raylib-d installer which 
 allows reading tar files with about [100 lines of 
 code](https://github.com/schveiguy/raylib-d/blob/9906279494f1f83b2c4c9550779d46962af7c342/install/source/app.d#L22-L132).
Thanks for the code.

A tar file is serial, meaning one has to read the entire file to see what is in it (because it was designed for tape systems where data is simply appended). The tar file doesn't have a table of contents, the filename is limited to 100 characters, and the path is limited to 155 characters. Sar files have a table of contents at the beginning, and unlimited filespec sizes.

P.S. the code that actually reads the .sar file is about 20 lines! (Excluding checking for corrupt files, and the header structure definition.) The archive reader and writer can be encapsulated in a separate module, so anyone can replace it with a different format.
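To make the index-at-the-front idea concrete, here is a toy D reader for such an archive. The on-disk layout below is invented for illustration only; it is not the actual .sar layout from the PR.

```d
import std.bitmanip : littleEndianToNative;
import std.exception : enforce;
import std.stdio : File;

// Invented layout: [uint entryCount] then, per entry,
// [uint nameLength][name bytes][ulong dataOffset][ulong dataSize],
// followed by the raw file contents.
struct IndexEntry { string name; ulong offset; ulong size; }

IndexEntry[] readIndex(File f)
{
    ubyte[4] u32;
    enforce(f.rawRead(u32[]).length == 4, "truncated archive");
    auto entries = new IndexEntry[littleEndianToNative!uint(u32)];

    foreach (ref e; entries)
    {
        enforce(f.rawRead(u32[]).length == 4, "truncated index");
        auto name = new char[littleEndianToNative!uint(u32)];
        if (name.length)
            f.rawRead(name);

        ubyte[8] off, len;
        f.rawRead(off[]);
        f.rawRead(len[]);
        e = IndexEntry(cast(string) name,
                       littleEndianToNative!ulong(off),
                       littleEndianToNative!ulong(len));
    }
    return entries; // member contents can now be read lazily via offset/size
}
```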
Apr 09
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright wrote:
 On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
 I will also bet that any difference in compile time will be 
 extremely insignificant. I don't bet against decades of 
 filesystem read optimizations. Saving e.g. microseconds on a 
 1.5 second build isn't going to move the needle.
On my timing on compiling hello world, a 1.412s build becomes 1.375s, 35 milliseconds faster. Most of the savings appear to be due to when the archive is first accessed, its table of contents is loaded into the path cache and file cache that you developed. Then, no stats are done on the filesystem.
Yes, the nice thing is knowing you will not have to ask the filesystem for something you know doesn't exist. Pre-loading the directory structure could do the same thing, but I think that's definitely not as efficient.
 The only benefit I might see in this is to *manage* the source 
 as one item.
The convenience of being able to distribute a "header only" library as one file may be significant. I've always liked things that didn't need an installation program. An install should be "copy the file onto your system" and uninstall should be "delete the file" ! Back in the days of CD software, my compiler was set up so no install was necessary, just put the CD in the drive and run it. You didn't even have to set the environment variables, as the compiler would look for its files relative to where the executable file was (argv[0]). You can see vestiges of that still in today's dmd. Of course, to get it to run faster you'd XCOPY it onto the hard drive. Though some users were flummoxed by the absence of INSTALL.EXE and I'd have to explain how to use XCOPY.
Consider that Java archives (`.jar` files) are distributed as a package instead of individual `.class` files. And Microsoft (and other C compilers) can produce "pre-compiled headers" that take away some of the initial steps of compilation. I think there would be enthusiastic support for D archive files that reduce some of the compilation steps, or provide extra features (e.g. predetermined inference or matching compile-time switches). Especially since you aren't going to directly edit these archive files, you will be mechanically generating them, so why not do more inside there?
 A tar file is serial, meaning one has to read the entire file 
 to see what it is in it (because it was designed for tape 
 systems where data is simply appended).
You can index a tar file easily. Each file is preceded by a header with the information about the file (including size). So you can determine the catalog by seeking to each header. Note also that we can work with tar files to add indexes that are backwards compatible with existing tools. Remember, we are generating this *from a tool that we control*. Prepending an index "file" is trivial.
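A hedged sketch of that header walk in D, using only the classic 512-byte tar header (name at offset 0, octal size at offset 124); long-name and pax extension entries are not special-cased here:

```d
import core.stdc.stdio : SEEK_CUR;
import std.algorithm.searching : countUntil;
import std.conv : to;
import std.stdio : File;

struct TarMember { string name; ulong offset; ulong size; }

// Build a catalog of a tar file by hopping from header to header.
TarMember[] indexTar(string path)
{
    TarMember[] members;
    auto f = File(path, "rb");
    ubyte[512] header;

    while (f.rawRead(header[]).length == header.length)
    {
        if (header[0] == 0)
            break; // an all-zero block marks the end of the archive

        // name: up to 100 bytes, NUL-terminated if shorter
        auto nameField = cast(const(char)[]) header[0 .. 100];
        auto nul = nameField.countUntil('\0');
        auto name = (nul < 0 ? nameField : nameField[0 .. nul]).idup;

        // size: octal ASCII in the 12 bytes at offset 124
        auto sizeField = cast(const(char)[]) header[124 .. 136];
        size_t digits;
        while (digits < sizeField.length
               && sizeField[digits] >= '0' && sizeField[digits] <= '7')
            ++digits;
        auto size = sizeField[0 .. digits].to!ulong(8);

        members ~= TarMember(name, f.tell, size);

        // member data is padded out to the next 512-byte boundary
        f.seek(cast(long) ((size + 511) & ~511UL), SEEK_CUR);
    }
    return members;
}
```

An index prepended as an ordinary first member would then let a D-aware reader skip this walk entirely while the archive stays readable by stock tar tools.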
 The tar file doesn't have a table of contents, the filename is 
 limited to 100 characters, and the path is limited to 155 
 characters.
I'm not too worried about such things. I've never run into filename length problems with tar. But also, most modern tar formats do not have these limitations: https://www.gnu.org/software/tar/manual/html_section/Formats.html
 Sar files have a table of contents at the beginning, and 
 unlimited filespec sizes.

 P.S. the code that actually reads the .sar file is about 20 
 lines! (Excluding checking for corrupt files, and the header 
 structure definition.) The archive reader and writer can be 
 encapsulated in a separate module, so anyone can replace it 
 with a different format.
I would suggest we replace it with a modern tar format for maximum compatibility with existing tools. We already have seen the drawbacks of using the abandoned `sdl` format for dub packages. We should not repeat that mistake. -Steve
Apr 10
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Wednesday, 10 April 2024 at 16:42:53 UTC, Steven Schveighoffer 
wrote:
 On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright 
 wrote:
 [...]
Yes, the nice thing is knowing you will not have to ask the filesystem for something you know doesn't exist. Pre-loading the directory structure could do the same thing, but I think that's definitely not as efficient. [...]
C++ compilers are already on the next level, past PCH, with C++ modules. VC++ uses a database format for BMI (Binary Module Interface), has open sourced it, and there are some people trying to champion it as a means to have C++ tooling similar to what Java and .NET IDEs can do with JVM/CLR metadata. https://devblogs.microsoft.com/cppblog/open-sourcing-ifc-sdk-for-cpp-modules/
Apr 10
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/10/2024 9:54 AM, Paulo Pinto wrote:
 C++ compilers are already on the next level, past PCH, with C++ modules.
 
 VC++ uses a database format for BMI (Binary Module Interface), has open
sourced 
 it, and there are some people trying to champion it as means to have C++
tooling 
 similar to what Java and .NET IDEs can do with JVM/CLR metadata.
 
 https://devblogs.microsoft.com/cppblog/open-sourcing-ifc-sdk-for-cpp-modules/
That's more or less what my C++ compiler did back in the 1990s. The symbol table and AST were created in a memory-mapped file, which could be read back in to jump-start the next compilation. Yes, it was faster. But the problem C++ has is that compiling it is inherently slow due to the design of the language. My experience with that led to D being fast to compile, because I knew what to get rid of. With a language that compiles fast, it isn't worthwhile to have a binary precompiled module.
Apr 10
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
We certainly could do more with .sar files, we just have to start somewhere.

If we're going to add features to a .tar file, like an index, aren't we then creating our own format and won't be able to use existing .tar programs?

Yes, one can skip through a .tar archive indexing as one goes. The problem is one winds up reading the .tar archive. With the .sar format, the index is at the beginning and none of the rest of the file is read in, unless actually needed. .tar is the only archive format I'm aware of that does not have an index section, and that's because it's designed for append-only magtapes. (Talk about ancient obsolete technology!)

Many archive formats also include optional compression, and various compression methods at that. All that support would have to be added to the compiler, as otherwise I'll get the bug reports "dmd failed with my .zip file!"

Still, the concept of presenting things as a single file is completely distinct from the file format used. The archive format being pluggable is certainly an option.
Apr 10
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Thursday, 11 April 2024 at 01:36:57 UTC, Walter Bright wrote:
 We certainly could do more with .sar files, we just have to 
 start somewhere.

 If we're going to add features to a .tar file, like an index, 
 aren't we then creating our own format and won't be able to use 
 existing .tar programs?
No. tar programs would work fine with it. We could indicate they are normal files, and normal tar programs would just extract an "index" file when expanding, or we could indicate they are vendor-specific extensions, which should be ignored or processed as normal files by other tar programs. We are not the first ones to think of these things, it is in the spec.
 Yes, one can skip through a .tar archive indexing as one goes. 
 The problem is one winds up reading the .tar archive. With the 
 .sar format, the index is at the beginning and none of the rest 
 of the file is read in, unless actually needed. .tar is the 
 only archive format I'm aware of that does not have an index 
 section, and that's because it's designed for append-only 
 magtapes. (Talk about ancient obsolete technology!)
This would be a fallback, when an index isn't provided as the first file. So normal tar source files could be supported.
 Many archive formats also include optional compression, and 
 various compression methods at that. All that support would 
 have to be added to the compiler, as otherwise I'll get the bug 
 reports "dmd failed with my .zip file!"
tar format doesn't have compression, though the tar executable supports it. I wouldn't recommend zip files as a supported archive format, and using compressed tarballs would definitely result in reading the whole file (you can't skip N bytes when you don't know the compressed size).
 Still, the concept of presenting things as a single file is 
 completely distinct from the file format used. The archive 
 format being pluggable is certainly an option.
I stress again, we should not introduce esoteric formats that are mostly equivalent to existing formats without a good reason. The first option should be to use existing formats, seeing if we can fit our use case into them. If that is impossible or prevents certain features, then we can consider using a new format. It should be a high bar to add new file formats to the toolchain, as this affects all tools that people depend on and use. Think of why we use standard object formats instead of our own format (which would allow much more tight integration with the language). -Steve
Apr 11
next sibling parent Nick Treleaven <nick geany.org> writes:
On Thursday, 11 April 2024 at 15:28:34 UTC, Steven Schveighoffer 
wrote:
 On Thursday, 11 April 2024 at 01:36:57 UTC, Walter Bright wrote:
 If we're going to add features to a .tar file, like an index, 
 aren't we then creating our own format and won't be able to 
 use existing .tar programs?
No. tar programs would work fine with it. We could indicate they are normal files, and normal tar programs would just extract an "index" file when expanding, or we could indicate they are vendor-specific extensions, which should be ignored or processed as normal files by other tar programs. We are not the first ones to think of these things, it is in the spec.
Sounds like a good solution. Users would be able to use e.g. any GUI program that supports tar to extract a file from the archive. The advantage is for reading. D-specific tools should be used to write the file. If there is any concern about this, it could even have a different extension so long as the file format is standard tar - users that know this can still benefit from tar readers. There seems to be precedent for this - apparently .jar files are .zip files.
 Yes, one can skip through a .tar archive indexing as one goes. 
 The problem is one winds up reading the .tar archive. With the 
 .sar format, the index is at the beginning and none of the 
 rest of the file is read in, unless actually needed. .tar is 
 the only archive format I'm aware of that does not have an 
 index section, and that's because it's designed for 
 append-only magtapes. (Talk about ancient obsolete technology!)
This would be a fallback, when an index isn't provided as the first file. So normal tar source files could be supported.
Or just error if a tar file doesn't have the expected index file.
Apr 11
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/11/2024 8:28 AM, Steven Schveighoffer wrote:
 Think of why we use standard object formats instead of our own format (which 
 would allow much more tight integration with the language).
We use standard object formats because we don't have a linker. I've spent a lot of time trying to understand their byzantine structure. It's not fun work.

I mentioned that the archive support can be pluggable. It's only two functions with a generic interface to them. If we aren't going to move forward with source archives, it would be a giant waste of time to learn .tar and all its variations.

I chose to invent the .sar format because it's 20 lines of code to read them, and about the same to write them. Even doing a survey of the top 10 archive formats would have taken more time than the entire PR, let alone the time spent debating them.

The source archive PR is a proof of concept. The actual archive format is irrelevant.
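Purely to illustrate the shape of that pluggable surface, a hypothetical sketch (names and signatures invented; not the actual dmd code):

```d
// Two functions (hypothetical) behind which any container format
// (.sar, .tar, .zip, ...) could be implemented.
struct ArchiveMember
{
    string name;         // path of the member inside the archive
    const(ubyte)[] data; // member contents
}

// Decide whether `bytes` is an archive this module understands and,
// if so, return the members it contains.
bool tryReadArchive(const(ubyte)[] bytes, out ArchiveMember[] members);

// Serialize a set of members back into a single archive blob.
ubyte[] writeArchive(const(ArchiveMember)[] members);
```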
 or we could indicate they are vendor-specific extensions
Wouldn't that defeat the purpose of being a .tar format?
 It should be a high bar to add new file formats to the toolchain, as this affects all tools that people depend on and use.

Using a .tar format would affect all the dlang source code tools just as much as using the .sar format would.
Apr 13
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Sunday, 14 April 2024 at 06:04:02 UTC, Walter Bright wrote:
 On 4/11/2024 8:28 AM, Steven Schveighoffer wrote:
 Think of why we use standard object formats instead of our own 
 format (which would allow much more tight integration with the 
 language).
We use standard object formats because we don't have a linker. I've spent a lot of time trying to understand their byzantine structure. It's not fun work.
Exactly, we don't need to be responsible for all the things. Using standard object format means we don't have to write our own linker.
 I mentioned that the archive support can be pluggable. It's 
 only two functions with a generic interface to them. If we 
 aren't going to move forward with source archives, it would be 
 a giant waste of time to learn .tar and all its variations.
Fair point. If this doesn't fly, then learning all the variations of tar might not be applicable (though I can say I personally "learned" tar in about 15 minutes, it's really simple).
 I chose to invent the .sar format because it's 20 lines of code 
 to read them, and about the same to write them. Even doing a 
 survey of the top 10 archive formats would have taken more time 
 than the entire PR, let alone the time spent debating them.
This misses the point. It's not that it's easy to add to the compiler. Both are easy, both are straightforward, one might be easier than the other, but it's probably a wash (maybe 2 hours vs 4 hours?) The problem is *all the other tools* that people might want to use. And specifically, I'm talking about IDEs. You have a 20 line solution in D, how does that help an IDE written in Java? However, Java has `tar` support that is tried and tested, and probably already in the IDE codebase itself. Writing 20 lines of code isn't "mission accomplished". We now have to ask all IDE providers to support this for symbol lookup. That's what I'm talking about.
 The source archive PR is a proof of concept. The actual archive 
 format is irrelevant.
This is good, and I understand what you are trying to say. As long as it remains PoC, with the expectation that if it turns out to be useful, we address these ecosystem issues, then I have no objections.
 or we could indicate they are vendor-specific extensions
Wouldn't that defeat the purpose of being a .tar format?
No, vendor-specific sections are in the spec. Existing tar programs would still read these just fine. But even if we wanted to avoid that, adding an index can be done by including a specific filename that the D compiler recognizes as the index.
 It should be a high bar to add new file formats to the toolchain, as this affects all tools that people depend on and use.

 Using a .tar format would affect all the dlang source code tools just as much as using the .sar format would.
Yes, of course. It's just: will there be a ready-made library available for whatever language/libraries the IDEs are using? With .sar, the answer is no (it hasn't been invented yet). With .tar, it's likely yes.

-Steve
Apr 14