
digitalmars.D - Why?

reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
https://github.com/dlang/dmd/pull/16348

*sigh*

/P
Apr 04
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348
 
 *sigh*
 
 /P
I can certainly see it being used with shared libraries. However, there will need to be some changes, or at least some acknowledgement of per-file CLI args, and support for passing it via the import path switch.
Apr 04
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348
 
 *sigh*
 
 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
Apr 04
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Apr 04
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
Apr 04
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/04/2024 1:27 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
You can and probably will still zip them up. Getting the number of files down makes it a bit easier to work with when moving them around or using it. I may like the idea of it, but not enough to be arguing for it, so my main concern is making sure Walter isn't simplifying it down to a point where we will have issues with it. I.e. I'm recommending the spec be a little more complicated than the code he has written so far (my improved spec): https://gist.github.com/rikkimax/d75bdd1cb9eb9aa7bacb532b69ed398c
Apr 04
parent Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 12:33:58 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 05/04/2024 1:27 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 [...]
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
You can and probably will still zip them up. Getting the number of files down makes it a bit easier to work with when moving them around or using it. I may like the idea of it, but not enough to be arguing for it, so my main concern is making sure Walter isn't simplifying it down to a point where we will have issues with it. I.e. recommending the spec should be a little more complicated than what the code he has written so far (my improved spec): https://gist.github.com/rikkimax/d75bdd1cb9eb9aa7bacb532b69ed398c
The package manager should take care of moving files around following a given configuration, not the users. Let the package manager be cool enough to handle the specific problem that SAR files are supposed to solve.
Apr 04
prev sibling parent reply Hipreme <msnmancini hotmail.com> writes:
On Thursday, 4 April 2024 at 12:27:00 UTC, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
I can certainly see it being used with shared libraries.
If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for library? What's the problem in revamping the way dub download / organise / handle source distributions?
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
The .di feature is a big flop for D. The files are nearly useless, and their generation is dumb since no code analysis is done to remove unneeded imports. It should analyze the code and check whether each imported type is actually used or not; also, public imports are always kept. There's no reason nowadays to use automatic .di generation. One must either completely ignore this feature or just write their own .di files.
Apr 04
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 13:15:50 UTC, Hipreme wrote:
 On Thursday, 4 April 2024 at 12:27:00 UTC, Paolo Invernizzi 
 wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 [...]
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
The .di feature is a big flop of D. They are nearly useless, their generation is dumb since no code analysis is done for removing useless imports. It should analyze the code and: check if the type import is used or not. Also, public imports are always kept. There's no reason nowadays to use auto DI generation. One must either completely ignore this feature or just create their own DI files.
Let's put aside the implementation of the automatic way to generate '.di' and rephrase: are .di files intended to be the correct way to expose the _public_ API of an opaque library binary? What problem does SAR aim to solve that a package manager can't? Why does D need it?
Apr 04
next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 05/04/2024 3:11 AM, Paolo Invernizzi wrote:
 Let's put aside the implementation of the automatic way to generate 
 '.di', and rephrase: Are .di intended to be the correct way to expose 
 /public/ API of an opaque library binary?
I intend to see this happen, yes. If we don't do this, language features like inference simply make it impossible to link against code that has used it. It will also mean shared libraries can hide their internal details, which is something I care about greatly.
Apr 04
prev sibling parent reply Hipreme <msnmancini hotmail.com> writes:
On Thursday, 4 April 2024 at 14:11:35 UTC, Paolo Invernizzi wrote:
 On Thursday, 4 April 2024 at 13:15:50 UTC, Hipreme wrote:
 On Thursday, 4 April 2024 at 12:27:00 UTC, Paolo Invernizzi 
 wrote:
 On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
 [...]
No I meant shared libraries. Specifically the distribution of the source files that act as the interface to it. That way you have the .sar file and the .dll and that's everything you need to use it.
Aren't 'di' sources the target solution for library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need, what's the burden in doing a zip with the shared library binary and the 'di' directories? /P
The .di feature is a big flop of D. They are nearly useless, their generation is dumb since no code analysis is done for removing useless imports. It should analyze the code and: check if the type import is used or not. Also, public imports are always kept. There's no reason nowadays to use auto DI generation. One must either completely ignore this feature or just create their own DI files.
Let's put aside the implementation of the automatic way to generate '.di', and rephrase: Are .di intended to be the correct way to expose _public_ API of an opaque library binary? What's the problem SAR targets to solve that a package manager can't solve? Why D needs it?
On the case of it being opaque, I could not care less. Yes, .di files are the correct way to expose a public API; when they are used correctly, they can greatly reduce the compilation time of your files. SAR solves exactly 0 things. It wasn't even put to the test beforehand, and Dennis did benchmark SAR vs `dmd -i` in the thread discussion: `dmd -i` was faster. Walter said that a cold run could be way faster though, since the files won't need to be mem-mapped.
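As an aside, generating and consuming .di interface files needs nothing beyond existing compiler flags. A rough sketch of the zip-plus-headers workflow (the paths and the "mylib" library name are made up):

```
# build the shared library and emit .di interface files into include/
dmd -shared -fPIC -H -Hd=include -of=libmylib.so source/mylib/*.d

# a consumer compiles against the .di files and links the binary
dmd -I=include -L-L. -L-lmylib app.d
```

Zip up `include/` and the binary and you have the C++-style "include directory plus library" distribution described above.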
Apr 04
parent reply Adam Wilson <flyboynw gmail.com> writes:
On Thursday, 4 April 2024 at 15:36:31 UTC, Hipreme wrote:
 SAR solves exactly 0 things. It wasn't even put to test and 
 Dennis even tested in the thread discussion SAR vs dmd -i. DMD 
 -i was faster. Walter said that a cold run could be way faster 
 though since the files won't need to be mem mapped.
Now ... to convince Walter that loading the whole archive into RAM once will be better than mem-mapping... RAM is cheap and source code is not a big memory hit.
Apr 05
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/5/2024 5:20 PM, Adam Wilson wrote:
 Now ... to convince Walter that loading the whole archive into RAM once will
be 
 better than mem-mapping...
 
 RAM is cheap and source code is not a big memory hit.
There's a switch in the source code to read it all at once or use memory mapping, so I could benchmark which is faster. There's no significant difference, probably because the files weren't large enough.

BTW, I recall that executable files are not read into memory and then jumped to. They are memory-mapped, so the executable can start up much faster. Pieces of the executable are loaded in on demand, although the OS will speculatively load in pieces, too.
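For anyone who wants to reproduce that comparison outside the compiler, here is a minimal D sketch of the two read strategies (the file name is hypothetical; this is not the compiler's actual code):

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : read;
import std.mmfile : MmFile;
import std.stdio : writefln;

void main()
{
    enum path = "big_source_archive.sar"; // hypothetical input file

    // Strategy 1: slurp the whole file into memory with one read.
    auto sw = StopWatch(AutoStart.yes);
    auto bytes = cast(const(ubyte)[]) read(path);
    writefln("read(): %s bytes in %s", bytes.length, sw.peek);

    // Strategy 2: memory-map it; pages are only faulted in when touched,
    // so this mostly measures the cost of setting up the mapping.
    sw.reset();
    auto mm = new MmFile(path);
    auto mapped = cast(const(ubyte)[]) mm[];
    writefln("mmap(): %s bytes in %s", mapped.length, sw.peek);
}
```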
Apr 09
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 10/04/2024 5:18 AM, Walter Bright wrote:
 BTW, I recall that executable files are not read into memory and then 
 jumped to. They are memory-mapped files, this is so the executable can 
 start up much faster. Pieces of the executable are loaded in on demand, 
 although the OS will speculatively load in pieces, too.
That doesn't sound right. Address randomization, Windows remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.
Apr 09
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 10/04/2024 5:22 AM, Richard (Rikki) Andrew Cattermole wrote:
 On 10/04/2024 5:18 AM, Walter Bright wrote:
 BTW, I recall that executable files are not read into memory and then 
 jumped to. They are memory-mapped files, this is so the executable can 
 start up much faster. Pieces of the executable are loaded in on 
 demand, although the OS will speculatively load in pieces, too.
That doesn't sound right. Address randomization, Windows remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.
It appears to have been true as of Windows 2000: https://learn.microsoft.com/en-us/archive/msdn-magazine/2002/march/windows-2000-loader-what-goes-on-inside-windows-2000-solving-the-mysteries-of-the-loader (see LdrpMapDll). However, I don't think those two features existed at the time.

As of Windows Internals 5, the cache manager uses 256kb blocks as part of memory mapping (very useful information, that!). It would be worth double checking that this is the default for std.mmap.

So it seems I'm half right: there is no way Windows could be memory mapping a binary when address randomization is turned on for a given block that has rewrites for symbol locations, but it may be memory mapping large blocks of data when it doesn't.
Apr 09
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
https://issues.dlang.org/show_bug.cgi?id=24494
Apr 09
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
 Address randomization, Windows remapping of symbols at runtime (with state
that 
 is kept around so you can do it later), all suggest it isn't like that now.
Code generation has changed to be PIC (Position Independent Code) so this is workable.
Apr 09
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 10/04/2024 7:04 AM, Walter Bright wrote:
 On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
 Address randomization, Windows remapping of symbols at runtime (with 
 state that is kept around so you can do it later), all suggest it 
 isn't like that now.
Code generation has changed to be PIC (Position Independent Code) so this is workable.
From Windows Internals 5 and WinAPI docs, it seems as though it does memory map initially. But the patching will activate CoW, so in effect it isn't memory mapped if you need to patch. Which is quite useful information for me while working with Unicode tables. If you need to patch? That won't be shared. If you don't need to patch? Who cares how much ROM is used! Don't be afraid to use 256kb in a single table. Just don't use pointers... no matter what and it'll be shared.
Apr 09
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/9/2024 12:11 PM, Richard (Rikki) Andrew Cattermole wrote:
 On 10/04/2024 7:04 AM, Walter Bright wrote:
 On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
 Address randomization, Windows remapping of symbols at runtime (with state 
 that is kept around so you can do it later), all suggest it isn't like that
now.
Code generation has changed to be PIC (Position Independent Code) so this is workable.
From Windows Internals 5 and WinAPI docs, it seems as though it does memory map initially. But the patching will activate CoW, so in effect it isn't memory mapped if you need to patch.
Right, so the executable is designed to not need patching.
 If you don't need to patch? Who cares how much ROM is used! Don't be afraid to 
 use 256kb in a single table. Just don't use pointers... no matter what and
it'll 
 be shared.
Instead of using pointers, use offsets from the beginning of the file.
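A tiny D illustration of the offset-based approach (names and data invented): the table itself contains no pointers, so it needs no runtime relocation and the pages backing it stay shareable.

```d
// Entries store offsets into one shared string blob instead of pointers.
// (Illustrative data only, not from any real table.)
struct Entry
{
    uint nameOffset; // offset into `blob`, not a pointer
    uint nameLength;
    uint value;
}

immutable string blob = "alphabetagamma";

immutable Entry[3] table = [
    Entry(0, 5, 100), // "alpha"
    Entry(5, 4, 200), // "beta"
    Entry(9, 5, 300), // "gamma"
];

string nameOf(in Entry e) @safe pure nothrow @nogc
{
    return blob[e.nameOffset .. e.nameOffset + e.nameLength];
}

unittest
{
    assert(nameOf(table[1]) == "beta");
    assert(table[2].value == 300);
}
```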
Apr 09
prev sibling parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
On Tuesday, 9 April 2024 at 17:18:02 UTC, Walter Bright wrote:
 On 4/5/2024 5:20 PM, Adam Wilson wrote:
 Now ... to convince Walter that loading the whole archive into 
 RAM once will be better than mem-mapping...
 
 RAM is cheap and source code is not a big memory hit.
There's a switch in the source code to read it all at once or use memory mapping, so I could benchmark which is faster. There's no significant difference, probably because the files weren't large enough. BTW, I recall that executable files are not read into memory and then jumped to. They are memory-mapped files, this is so the executable can start up much faster. Pieces of the executable are loaded in on demand, although the OS will speculatively load in pieces, too.
Who managed to convince you to spend time working on this? The solution to poor Phobos build time is to stop abusing templates for code that doesn't exist, for problems that nobody has: https://github.com/dlang/phobos/tree/master/phobos/sys
Apr 09
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/9/2024 10:58 AM, ryuukk_ wrote:
 Who managed to convince you to spend time working on this?
Nobody. I've wanted to do it for decades, just never got around to it. What triggered it was my proposal to Adam to split Phobos modules into a much more granular structure, which would increase the number of files in it by a factor of 5 or more. (The current structure has each module being a grab bag of marginally related functions.) A more granular structure would hopefully reduce the "every module imports every other module" problem Phobos has. But lots more modules increase aggregate file lookup time. One cannot really tell how well it works without trying it.
Apr 09
parent reply Adam Wilson <flyboynw gmail.com> writes:
On Tuesday, 9 April 2024 at 19:11:28 UTC, Walter Bright wrote:
 Nobody. I've wanted to do it for decades, just never got around 
 to it. What triggered it was my proposal to Adam to split 
 Phobos modules into a much more granular structure, which would 
 increase the number of files in it by a factor of 5 or more. 
 (The current structure is of each module being a grab bag of 
 marginally related functions.) A more granular nature would 
 hopefully reduce the "every module imports every other module" 
 problem Phobos has.
Great. Now everybody is going to think that I started this. For the record, I did **not** start this. Walter sent me this idea out of the blue after I pointed out that working with hundreds (or thousands) of files in Phobos was going to be unwieldy. This wasn't the problem I was thinking of, but frankly the format does have certain advantages from a distribution standpoint. Although honestly, we're going to end up unpacking the files for other tools to use anyways.
Apr 10
parent Paulo Pinto <pjmlp progtools.org> writes:
On Wednesday, 10 April 2024 at 10:17:53 UTC, Adam Wilson wrote:
 On Tuesday, 9 April 2024 at 19:11:28 UTC, Walter Bright wrote:
 [...]
Great. Now everybody is going to think that I started this. For the record I did **not** start this. Walter sent me this idea out of the blue after I pointed out that working with hundreds (or thousands) of files in Phobos was going to be unwieldy. This wasn't the problem I was thinking of, but frankly the format does have certain advantages from a distribution standpoint. Although honestly, we're going to end up unpacking the files for other tools to use anyways.
Not only do we not care about file access times for JAR/WAR/EAR files and DLLs, we happily ship binary libraries instead of parsing source code all the time. This looks to me like yet another distraction.
Apr 10
prev sibling parent reply Hipreme <msnmancini hotmail.com> writes:
On Thursday, 4 April 2024 at 09:55:32 UTC, Paolo Invernizzi wrote:
 https://github.com/dlang/dmd/pull/16348

 *sigh*

 /P
Rationale: https://github.com/dlang/dmd/blob/8e94bc644fc72dc3f72a00791eb52b40230ceb26/changelog/dmd.source-archive.dd#L79 The part I'm very interested in is the compilation time. This may reduce by a lot the compilation time required by dub libraries. But there would be a requirement of doing a synchronized change between all the compilers and our existing tools :) For development, this feature might be useless. I'll also need to start thinking now about how to support those.
 2. To compile all the source files at once with DMD, the command line can get extremely long, and certainly unwieldy. With .sar files, you may not even need a makefile or builder, just:

This is actually something I like. It is already possible to do that with `dmd -i -i=std` or something like that (see the sketch below). The main feature one gains from makefiles or builders isn't declaring the files you're using, it is defining the version configuration. Also, I'll be renaming the thread, since its name doesn't open up for any discussion.
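For reference, a minimal sketch of that `-i` workflow (file and directory names are made up):

```
# compile app.d plus everything it transitively imports from the import path,
# with no file list and no makefile
dmd -i -I=source app.d

# -i excludes std.* and core.* by default; -i=std pulls Phobos sources in too
dmd -i -i=std -I=source app.d
```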
Apr 04
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 4 April 2024 at 10:48:50 UTC, Hipreme wrote:
 On Thursday, 4 April 2024 at 09:55:32 UTC, Paolo Invernizzi 
 wrote:
 [...]
Rationale: https://github.com/dlang/dmd/blob/8e94bc644fc72dc3f72a00791eb52b40230ceb26/changelog/dmd.source-archive.dd#L79 [...]
My 2 cents: there will be NO advantages in compilation time.
 [...]
What can't you do with `dmd -I` that you can do with sar?
Apr 04
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2024 4:12 AM, Paolo Invernizzi wrote:
 My 2 cents: there will be NO advantages in compilation time.
Unfortunately, some things cannot be benchmarked until they are built.
Apr 09
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Tuesday, 9 April 2024 at 17:19:41 UTC, Walter Bright wrote:
 On 4/4/2024 4:12 AM, Paolo Invernizzi wrote:
 My 2 cents: there will be NO advantages in compilation time.
Unfortunately, some things cannot be benchmarked until they are built.
Exactly, mine is a bet ... but hey, I'll be happy to lose it, of course!
Apr 09
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 9 April 2024 at 18:49:21 UTC, Paolo Invernizzi wrote:
 On Tuesday, 9 April 2024 at 17:19:41 UTC, Walter Bright wrote:
 On 4/4/2024 4:12 AM, Paolo Invernizzi wrote:
 My 2 cents: there will be NO advantages in compilation time.
Unfortunately, some things cannot be benchmarked until they are built.
Exactly, mine is a bet ... but hey, I'll be happy to lose it, of course!
I will also bet that any difference in compile time will be extremely insignificant. I don't bet against decades of filesystem read optimizations. Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.

I did reduce stats semi-recently for DMD and saved a significant percentage of stats, but I don't really think it saved insane amounts of time. It was more of an "oh, I thought of a better way to do this". I think at the time, there was some resistance to adding more stats to the compiler due to the same misguided optimization beliefs, and so I started looking at it. If reducing stats by 90% wasn't significant, reducing them again likely isn't going to be noticed. See https://github.com/dlang/dmd/pull/14582

The only benefit I might see in this is to *manage* the source as one item. But I don't really know that we need a new custom format. `tar` is pretty simple. ARSD has a tar implementation that I lifted for my raylib-d installer which allows reading tar files with about [100 lines of code](https://github.com/schveiguy/raylib-d/blob/9906279494f1f83b2c4c9550779d46962af7c342/install/source/app.d#L22-L132).

-Steve
Apr 09
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
 I will also bet that any difference in compile time will be extremely 
 insignificant. I don't bet against decades of filesystem read optimizations. 
 Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.
On my timing of compiling hello world, a 1.412s build becomes 1.375s, about 37 milliseconds faster. Most of the savings appear to come from the fact that when the archive is first accessed, its table of contents is loaded into the path cache and file cache that you developed. Then, no stats are done on the filesystem.
 I did reduce stats semi-recently for DMD and saved a significant percentage of 
 stats, I don't really think it saved insane amounts of time. It was more of a 
 "oh, I thought of a better way to do this". I think at the time, there was
some 
 resistance to adding more stats to the compiler due to the same misguided 
 optimization beliefs, and so I started looking at it. If reducing stats by 90% 
 wasn't significant, reducing them again likely isn't going to be noticed.
 
 See https://github.com/dlang/dmd/pull/14582
Nice. I extended it so files in an archive are tracked.
 The only benefit I might see in this is to *manage* the source as one item.
The convenience of being able to distribute a "header only" library as one file may be significant.

I've always liked things that didn't need an installation program. An install should be "copy the file onto your system" and uninstall should be "delete the file"!

Back in the days of CD software, my compiler was set up so no install was necessary, just put the CD in the drive and run it. You didn't even have to set the environment variables, as the compiler would look for its files relative to where the executable file was (argv[0]). You can see vestiges of that still in today's dmd. Of course, to get it to run faster you'd XCOPY it onto the hard drive. Though some users were flummoxed by the absence of INSTALL.EXE and I'd have to explain how to use XCOPY.
 But 
 I don't really know that we need a new custom format. `tar` is pretty simple. 
 ARSD has a tar implementation that I lifted for my raylib-d installer which 
 allows reading tar files with about [100 lines of 
 code](https://github.com/schveiguy/raylib-d/blob/9906279494f1f83b2c4c9550779d46962af7c342/install/source/app.d#L22-L132).
Thanks for the code.

A tar file is serial, meaning one has to read the entire file to see what is in it (because it was designed for tape systems where data is simply appended). The tar file doesn't have a table of contents, the filename is limited to 100 characters, and the path is limited to 155 characters. Sar files have a table of contents at the beginning, and unlimited filespec sizes.

P.S. the code that actually reads the .sar file is about 20 lines! (Excluding checking for corrupt files, and the header structure definition.) The archive reader and writer can be encapsulated in a separate module, so anyone can replace it with a different format.
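To make the index-at-the-front idea concrete, here is a toy D reader for such an archive. The on-disk layout below is invented for illustration only; it is not the actual .sar layout from the PR.

```d
import std.bitmanip : littleEndianToNative;
import std.exception : enforce;
import std.stdio : File;

// Invented layout: [uint entryCount] then, per entry,
// [uint nameLength][name bytes][ulong dataOffset][ulong dataSize],
// followed by the raw file contents.
struct IndexEntry { string name; ulong offset; ulong size; }

IndexEntry[] readIndex(File f)
{
    ubyte[4] u32;
    enforce(f.rawRead(u32[]).length == 4, "truncated archive");
    auto entries = new IndexEntry[littleEndianToNative!uint(u32)];

    foreach (ref e; entries)
    {
        enforce(f.rawRead(u32[]).length == 4, "truncated index");
        auto name = new char[littleEndianToNative!uint(u32)];
        if (name.length)
            f.rawRead(name);

        ubyte[8] off, len;
        f.rawRead(off[]);
        f.rawRead(len[]);
        e = IndexEntry(cast(string) name,
                       littleEndianToNative!ulong(off),
                       littleEndianToNative!ulong(len));
    }
    return entries; // member contents can now be read lazily via offset/size
}
```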
Apr 09
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright wrote:
 On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
 I will also bet that any difference in compile time will be 
 extremely insignificant. I don't bet against decades of 
 filesystem read optimizations. Saving e.g. microseconds on a 
 1.5 second build isn't going to move the needle.
On my timing on compiling hello world, a 1.412s build becomes 1.375s, 35 milliseconds faster. Most of the savings appear to be due to when the archive is first accessed, its table of contents is loaded into the path cache and file cache that you developed. Then, no stats are done on the filesystem.
Yes, the nice thing is knowing you will not have to ask the filesystem for something you know doesn't exist. Pre-loading the directory structure could do the same thing, but I think that's definitely not as efficient.
 The only benefit I might see in this is to *manage* the source 
 as one item.
The convenience of being able to distribute a "header only" library as one file may be significant. I've always liked things that didn't need an installation program. An install should be "copy the file onto your system" and uninstall should be "delete the file" ! Back in the days of CD software, my compiler was set up so no install was necessary, just put the CD in the drive and run it. You didn't even have to set the environment variables, as the compiler would look for its files relative to where the executable file was (argv[0]). You can see vestiges of that still in today's dmd. Of course, to get it to run faster you'd XCOPY it onto the hard drive. Though some users were flummoxed by the absence of INSTALL.EXE and I'd have to explain how to use XCOPY.
Consider that Java archives (`.jar` files) are distributed as a package instead of individual `.class` files. And Microsoft (and other C compilers) can produce "pre-compiled headers" that take away some of the initial steps of compilation. I think there would be enthusiastic support for D archive files that reduce some of the compilation steps, or provide extra features (e.g. predetermined inference or matching compile-time switches). Especially since you aren't going to directly edit these archive files, you will be mechanically generating them, so why not do more inside there?
 A tar file is serial, meaning one has to read the entire file 
 to see what it is in it (because it was designed for tape 
 systems where data is simply appended).
You can index a tar file easily. Each file is preceded by a header with the information about the file (including size). So you can determine the catalog by seeking to each header. Note also that we can work with tar files to add indexes that are backwards compatible with existing tools. Remember, we are generating this *from a tool that we control*. Prepending an index "file" is trivial.
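A hedged sketch of that header walk in D, using only the classic 512-byte tar header (name at offset 0, octal size at offset 124); long-name and pax extension entries are not special-cased here:

```d
import core.stdc.stdio : SEEK_CUR;
import std.algorithm.searching : countUntil;
import std.conv : to;
import std.stdio : File;

struct TarMember { string name; ulong offset; ulong size; }

// Build a catalog of a tar file by hopping from header to header.
TarMember[] indexTar(string path)
{
    TarMember[] members;
    auto f = File(path, "rb");
    ubyte[512] header;

    while (f.rawRead(header[]).length == header.length)
    {
        if (header[0] == 0)
            break; // an all-zero block marks the end of the archive

        // name: up to 100 bytes, NUL-terminated if shorter
        auto nameField = cast(const(char)[]) header[0 .. 100];
        auto nul = nameField.countUntil('\0');
        auto name = (nul < 0 ? nameField : nameField[0 .. nul]).idup;

        // size: octal ASCII in the 12 bytes at offset 124
        auto sizeField = cast(const(char)[]) header[124 .. 136];
        size_t digits;
        while (digits < sizeField.length
               && sizeField[digits] >= '0' && sizeField[digits] <= '7')
            ++digits;
        auto size = sizeField[0 .. digits].to!ulong(8);

        members ~= TarMember(name, f.tell, size);

        // member data is padded out to the next 512-byte boundary
        f.seek(cast(long) ((size + 511) & ~511UL), SEEK_CUR);
    }
    return members;
}
```

An index prepended as an ordinary first member would then let a D-aware reader skip this walk entirely while the archive stays readable by stock tar tools.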
 The tar file doesn't have a table of contents, the filename is 
 limited to 100 characters, and the path is limited to 155 
 characters.
I'm not too worried about such things. I've never run into filename length problems with tar. But also, most modern tar formats do not have these limitations: https://www.gnu.org/software/tar/manual/html_section/Formats.html
 Sar files have a table of contents at the beginning, and 
 unlimited filespec sizes.

 P.S. the code that actually reads the .sar file is about 20 
 lines! (Excluding checking for corrupt files, and the header 
 structure definition.) The archive reader and writer can be 
 encapsulated in a separate module, so anyone can replace it 
 with a different format.
I would suggest we replace it with a modern tar format for maximum compatibility with existing tools. We already have seen the drawbacks of using the abandoned `sdl` format for dub packages. We should not repeat that mistake. -Steve
Apr 10
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Wednesday, 10 April 2024 at 16:42:53 UTC, Steven Schveighoffer 
wrote:
 On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright 
 wrote:
 [...]
Yes, the nice thing is knowing you will not have to ask the filesystem for something you know doesn't exist. Pre-loading the directory structure could do the same thing, but I think that's definitely not as efficient. [...]
C++ compilers are already on the next level, past PCH, with C++ modules. VC++ uses a database format for BMI (Binary Module Interface), has open sourced it, and there are some people trying to champion it as a means to have C++ tooling similar to what Java and .NET IDEs can do with JVM/CLR metadata. https://devblogs.microsoft.com/cppblog/open-sourcing-ifc-sdk-for-cpp-modules/
Apr 10
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/10/2024 9:54 AM, Paulo Pinto wrote:
 C++ compilers are already on the next level, past PCH, with C++ modules.
 
 VC++ uses a database format for BMI (Binary Module Interface), has open
sourced 
 it, and there are some people trying to champion it as means to have C++
tooling 
 similar to what Java and .NET IDEs can do with JVM/CLR metadata.
 
 https://devblogs.microsoft.com/cppblog/open-sourcing-ifc-sdk-for-cpp-modules/
That's more or less what my C++ compiler did back in the 1990s. The symbol table and AST were created in a memory-mapped file, which could be read back in to jump-start the next compilation. Yes, it was faster. But the problem C++ has is that compiling it is inherently slow due to the design of the language. My experience with that led to D being fast to compile, because I knew what to get rid of. With a language that compiles fast, it isn't worthwhile to have a binary precompiled module.
Apr 10
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
We certainly could do more with .sar files, we just have to start somewhere.

If we're going to add features to a .tar file, like an index, aren't we then creating our own format and won't be able to use existing .tar programs?

Yes, one can skip through a .tar archive indexing as one goes. The problem is one winds up reading the .tar archive. With the .sar format, the index is at the beginning and none of the rest of the file is read in, unless actually needed. .tar is the only archive format I'm aware of that does not have an index section, and that's because it's designed for append-only magtapes. (Talk about ancient obsolete technology!)

Many archive formats also include optional compression, and various compression methods at that. All that support would have to be added to the compiler, as otherwise I'll get the bug reports "dmd failed with my .zip file!"

Still, the concept of presenting things as a single file is completely distinct from the file format used. The archive format being pluggable is certainly an option.
Apr 10
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Thursday, 11 April 2024 at 01:36:57 UTC, Walter Bright wrote:
 We certainly could do more with .sar files, we just have to 
 start somewhere.

 If we're going to add features to a .tar file, like an index, 
 aren't we then creating our own format and won't be able to use 
 existing .tar programs?
No. tar programs would work fine with it. We could indicate they are normal files, and normal tar programs would just extract an "index" file when expanding, or we could indicate they are vendor-specific extensions, which should be ignored or processed as normal files by other tar programs. We are not the first ones to think of these things, it is in the spec.
 Yes, one can skip through a .tar archive indexing as one goes. 
 The problem is one winds up reading the .tar archive. With the 
 .sar format, the index is at the beginning and none of the rest 
 of the file is read in, unless actually needed. .tar is the 
 only archive format I'm aware of that does not have an index 
 section, and that's because it's designed for append-only 
 magtapes. (Talk about ancient obsolete technology!)
This would be a fallback, when an index isn't provided as the first file. So normal tar source files could be supported.
 Many archive formats also include optional compression, and 
 various compression methods at that. All that support would 
 have to be added to the compiler, as otherwise I'll get the bug 
 reports "dmd failed with my .zip file!"
tar format doesn't have compression, though the tar executable supports it. I wouldn't recommend zip files as a supported archive format, and using compressed tarballs would definitely result in reading the whole file (you can't skip N bytes when you don't know the compressed size).
 Still, the concept of presenting things as a single file is 
 completely distinct from the file format used. The archive 
 format being pluggable is certainly an option.
I stress again, we should not introduce esoteric formats that are mostly equivalent to existing formats without a good reason. The first option should be to use existing formats, seeing if we can fit our use case into them. If that is impossible or prevents certain features, then we can consider using a new format. It should be a high bar to add new file formats to the toolchain, as this affects all tools that people depend on and use. Think of why we use standard object formats instead of our own format (which would allow much more tight integration with the language). -Steve
Apr 11
next sibling parent Nick Treleaven <nick geany.org> writes:
On Thursday, 11 April 2024 at 15:28:34 UTC, Steven Schveighoffer 
wrote:
 On Thursday, 11 April 2024 at 01:36:57 UTC, Walter Bright wrote:
 If we're going to add features to a .tar file, like an index, 
 aren't we then creating our own format and won't be able to 
 use existing .tar programs?
No. tar programs would work fine with it. We could indicate they are normal files, and normal tar programs would just extract an "index" file when expanding, or we could indicate they are vendor-specific extensions, which should be ignored or processed as normal files by other tar programs. We are not the first ones to think of these things, it is in the spec.
Sounds like a good solution. Users would be able to use e.g. any GUI program that supports tar to extract a file from the archive. The advantage is for reading. D-specific tools should be used to write the file. If there is any concern about this, it could even have a different extension so long as the file format is standard tar - users that know this can still benefit from tar readers. There seems to be precedent for this - apparently .jar files are .zip files.
 Yes, one can skip through a .tar archive indexing as one goes. 
 The problem is one winds up reading the .tar archive. With the 
 .sar format, the index is at the beginning and none of the 
 rest of the file is read in, unless actually needed. .tar is 
 the only archive format I'm aware of that does not have an 
 index section, and that's because it's designed for 
 append-only magtapes. (Talk about ancient obsolete technology!)
This would be a fallback, when an index isn't provided as the first file. So normal tar source files could be supported.
Or just error if a tar file doesn't have the expected index file.
Apr 11
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/11/2024 8:28 AM, Steven Schveighoffer wrote:
 Think of why we use standard object formats instead of our own format (which 
 would allow much more tight integration with the language).
We use standard object formats because we don't have a linker. I've spent a lot of time trying to understand their byzantine structure. It's not fun work.

I mentioned that the archive support can be pluggable. It's only two functions with a generic interface to them. If we aren't going to move forward with source archives, it would be a giant waste of time to learn .tar and all its variations.

I chose to invent the .sar format because it's 20 lines of code to read them, and about the same to write them. Even doing a survey of the top 10 archive formats would have taken more time than the entire PR, let alone the time spent debating them.

The source archive PR is a proof of concept. The actual archive format is irrelevant.
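Purely to illustrate the shape of that pluggable surface, a hypothetical sketch (names and signatures invented; not the actual dmd code):

```d
// Two functions (hypothetical) behind which any container format
// (.sar, .tar, .zip, ...) could be implemented.
struct ArchiveMember
{
    string name;         // path of the member inside the archive
    const(ubyte)[] data; // member contents
}

// Decide whether `bytes` is an archive this module understands and,
// if so, return the members it contains.
bool tryReadArchive(const(ubyte)[] bytes, out ArchiveMember[] members);

// Serialize a set of members back into a single archive blob.
ubyte[] writeArchive(const(ArchiveMember)[] members);
```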
 or we could indicate they are vendor-specific extensions
Wouldn't that defeat the purpose of being a .tar format?
 It should be a high bar to add new file formats to the toolchain, as this affects all tools that people depend on and use.

Using a .tar format would affect all the dlang source code tools just as much as using the .sar format would.
Apr 13
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Sunday, 14 April 2024 at 06:04:02 UTC, Walter Bright wrote:
 On 4/11/2024 8:28 AM, Steven Schveighoffer wrote:
 Think of why we use standard object formats instead of our own 
 format (which would allow much more tight integration with the 
 language).
We use standard object formats because we don't have a linker. I've spent a lot of time trying to understand their byzantine structure. It's not fun work.
Exactly, we don't need to be responsible for all the things. Using standard object format means we don't have to write our own linker.
 I mentioned that the archive support can be pluggable. It's 
 only two functions with a generic interface to them. If we 
 aren't going to move forward with source archives, it would be 
 a giant waste of time to learn .tar and all its variations.
Fair point. If this doesn't fly, then learning all the variations of tar might not be applicable (though I can say I personally "learned" tar in about 15 minutes, it's really simple).
 I chose to invent the .sar format because it's 20 lines of code 
 to read them, and about the same to write them. Even doing a 
 survey of the top 10 archive formats would have taken more time 
 than the entire PR, let alone the time spent debating them.
This misses the point. It's not that it's easy to add to the compiler. Both are easy, both are straightforward, one might be easier than the other, but it's probably a wash (maybe 2 hours vs 4 hours?) The problem is *all the other tools* that people might want to use. And specifically, I'm talking about IDEs. You have a 20 line solution in D, how does that help an IDE written in Java? However, Java has `tar` support that is tried and tested, and probably already in the IDE codebase itself. Writing 20 lines of code isn't "mission accomplished". We now have to ask all IDE providers to support this for symbol lookup. That's what I'm talking about.
 The source archive PR is a proof of concept. The actual archive 
 format is irrelevant.
This is good, and I understand what you are trying to say. As long as it remains PoC, with the expectation that if it turns out to be useful, we address these ecosystem issues, then I have no objections.
 or we could indicate they are vendor-specific extensions
Wouldn't that defeat the purpose of being a .tar format?
No, vendor-specific sections are in the spec. Existing tar programs would still read these just fine. But even if we wanted to avoid that, adding an index can be done by including a specific filename that the D compiler recognizes as the index.
 It should be a high bar to add new file formats to the toolchain, as this affects all tools that people depend on and use.

 Using a .tar format would affect all the dlang source code tools just as much as using the .sar format would.
Yes, of course. It's just: will there be a ready-made library available for whatever language/libraries the IDEs are using? With .sar, the answer is no (it hasn't been invented yet). With .tar, it's likely yes.

-Steve
Apr 14