
digitalmars.D - Would you pay for GC?

reply Elronnd <elronnd elronnd.net> writes:
Apropos recent discussion, here is a serious question: would you 
pay for either of these?

- High-throughput/scalable gc.  High sustained allocation rates, 
large heaps, many cores, compacting&generational

- Concurrent gc.  No pauses
Jan 24 2022
next sibling parent reply Random Dude <wtqdzouiyyrhaijcyy nthrw.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
I'd pay to have it removed and replaced with ARC. GC in its current form cannot compete with other, more performant GCs, and it shouldn't. D is in a unique position to enable people to write code as if they're writing Python and also accommodate them when they want to do low-level optimizations.

If we could just have automatic reference counting, both the GC and no-GC people would be happy. It's okay if that route changes how pointers work (metadata would have to be added and some code would break); this is the right move in the long run.
Jan 24 2022
next sibling parent Elronnd <elronnd elronnd.net> writes:
On Tuesday, 25 January 2022 at 06:13:31 UTC, Random Dude wrote:
 GC in it's current form can not compete with other more 
 performant GCs
I think it can, for reasons I've explained elsewhere. But that's a bit beside the point; the question is: what _would_ you do _if_ it could?
 D is in a unique position to enable people to write code
 as if they're writing python and also accommodate them
 when they want to do low-level optimizations.
Fast AND expressive is a much sexier value proposition than fast XOR expressive.
Jan 24 2022
prev sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 06:13:31 UTC, Random Dude wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
I'd pay to have it removed and replaced with ARC. GC in it's current form can not compete with other more performant GCs and it shouldn't. D is in a unique position to enable people to write code as if they're writing python and also accommodate them when they want to do low-level optimizations. If we could just have automatic reference counting both the GC and No-GC people would be happy. It's okay if that route changes how pointers work (metadata would have to be added and some code would break), this is the right move in the long run.
ARC will also not compete unless one goes the extra mile of making the compiler ARC-aware: eliding retain/release calls, doing cascade deletions in background threads, taking care that cascade deletions don't overflow the stack in destructor calls, providing multicore-friendly versions of the counting operations, and so on.

If you are paying to replace the GC with ARC without also putting in the money to reach Swift's level of performance (which is still pretty lame versus last-gen tracing GCs in Java/.NET), then you will be getting lemons.

https://forums.swift.org/t/a-roadmap-for-improving-swift-performance-predictability-arc-improvements-and-ownership-control/54206

I can already see it: the forums inundated with complaints about ARC performance versus other languages.
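
To make the cost concrete, here is a minimal, deliberately naive reference-counted handle in D (illustrative only, not a proposed API). Every copy is a retain and every destructor a release; nothing in it is deferred, atomic, or cascade-safe, which is exactly the work listed above:

    struct Rc(T)
    {
        private T* payload;
        private size_t* count;

        static Rc make(T value)
        {
            Rc r;
            r.payload = new T;
            *r.payload = value;
            r.count = new size_t;
            *r.count = 1;
            return r;
        }

        this(this)               // postblit: the implicit "retain"
        {
            if (count) ++*count; // not atomic: a real ARC needs a multicore-friendly version
        }

        ~this()                  // the implicit "release"
        {
            if (count && --*count == 0)
            {
                // Naive cascade deletion: if T itself holds Rc fields, this
                // recurses and can overflow the stack on long chains, which is
                // why a production ARC defers it to a background queue.
                destroy(*payload);
            }
        }
    }

    void use(Rc!int handle)      // pass-by-value copy: one retain/release pair
    {
    }

    void main()
    {
        auto h = Rc!int.make(42);
        use(h);                  // an ARC-aware compiler could elide this pair;
                                 // a library-only wrapper cannot
    }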
Jan 24 2022
next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, provide 
 multicore friendly versions of them,.....
Indeed. See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa). So it's a bit strange to assume we can do one but not the other. And tracing makes a better starting point due to the generational hypothesis.
Jan 24 2022
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 25/01/2022 8:22 PM, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of making 
 the compiler ARC aware, elide retain/release calls, do cascade 
 deletions in background threads, take care on cascade deletions to 
 avoid stack overflows on destructor calls, provide multicore friendly 
 versions of them,.....
Indeed.  See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa).  So it's a bit strange to assume we can do one but not the other.  And tracing makes a better starting point due to the generational hypothesis.
RC shines when deterministic destruction is required, i.e. when you have an external resource bound to a D type. But it is horrible as a language default: types like a pointer or a slice should not be bound to any particular memory management strategy in a native language, and RC is very expensive compared to a GC due to the constant cache invalidations it causes.

I want to get RC properly into D without having to rely on a struct wrapper, so that the compiler can know that eliding of calls is allowed. Plus, if it's in the language, we can get a const string type with classes and all!
Jan 25 2022
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 08:24:22 UTC, rikki cattermole 
wrote:
 On 25/01/2022 8:22 PM, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, 
 provide multicore friendly versions of them,.....
Indeed.  See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa).  So it's a bit strange to assume we can do one but not the other.  And tracing makes a better starting point due to the generational hypothesis.
RC shines for when deterministic destruction is required. ...
That is the naive idea, until a cascade deletion of a graph-based data structure happens.
Jan 25 2022
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/25/22 4:32 AM, Paulo Pinto wrote:
 On Tuesday, 25 January 2022 at 08:24:22 UTC, rikki cattermole wrote:
 On 25/01/2022 8:22 PM, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of making 
 the compiler ARC aware, elide retain/release calls, do cascade 
 deletions in background threads, take care on cascade deletions to 
 avoid stack overflows on destructor calls, provide multicore 
 friendly versions of them,.....
Indeed.  See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa). So it's a bit strange to assume we can do one but not the other.  And tracing makes a better starting point due to the generational hypothesis.
RC shines for when deterministic destruction is required. ...
That is the naive idea, until a cascade deletion of a graph based datastructure happens.
I use ARC for determinism only, not memory deallocation:

https://github.com/schveiguy/iopipe/blob/master/source/iopipe/refc.d

e.g., when I want the last reference to a buffered output stream to flush its buffer and close the file when going out of scope. I don't care about the memory management; that's fine for the GC to clean up. As an added benefit, it's trivially `@safe`.

-Steve
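
The pattern looks roughly like this; a sketch with made-up types, not the actual iopipe code. The count only decides when the destructor's side effect runs; the memory itself can stay GC-managed:

    import std.stdio : File;

    struct BufferedOut
    {
        private static struct State
        {
            File file;
            char[] pending;
            size_t refs;
        }
        private State* state;    // GC-allocated shared state, no manual free needed

        this(string path)
        {
            state = new State(File(path, "w"), null, 1);
        }

        this(this) { if (state) ++state.refs; }

        ~this()
        {
            if (state && --state.refs == 0)
            {
                state.file.write(state.pending); // deterministic flush...
                state.file.close();              // ...and close, exactly once, at the last release
            }
        }

        void put(const(char)[] s) { state.pending ~= s; }
    }

    void main()
    {
        auto o = BufferedOut("/tmp/example.txt"); // hypothetical path
        auto copy = o;        // refs == 2, same underlying state
        copy.put("hello\n");
        // whichever copy is destroyed last flushes and closes, deterministically
    }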
Jan 25 2022
prev sibling parent norm <norm.rowtree gmail.com> writes:
On Tuesday, 25 January 2022 at 08:24:22 UTC, rikki cattermole 
wrote:
 On 25/01/2022 8:22 PM, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, 
 provide multicore friendly versions of them,.....
Indeed.  See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa).  So it's a bit strange to assume we can do one but not the other.  And tracing makes a better starting point due to the generational hypothesis.
RC shines for when deterministic destruction is required.
In a small code base it might, but in larger software RC is on par with GC because it is impossible to keep track of all the references; you end up with leaks and dangling shared_ptrs because someone has a ref... somewhere.

The best option in a large code base is RAII with value types until you cannot, then unique_ptr equivalents that can only have a single owner at any one time (without hackery). Only reach for RC/shared_ptr when you absolutely must have multiple owners, and even then interrogate your design.
Jan 25 2022
prev sibling parent reply Araq <rumpf_a web.de> writes:
On Tuesday, 25 January 2022 at 07:22:36 UTC, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, 
 provide multicore friendly versions of them,.....
Indeed. See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa). So it's a bit strange to assume we can do one but not the other. And tracing makes a better starting point due to the generational hypothesis.
Only if you take the "deferred" RC route, which Swift/Rust/C++/Nim do not! Without the "deferred" aspect, RC remains quite a different beast: a different algorithm, different runtime profiles, different memory consumption; it enables different optimizations and, of course, has different problems.
Jan 25 2022
next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 09:42:25 UTC, Araq wrote:
 On Tuesday, 25 January 2022 at 07:22:36 UTC, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, 
 provide multicore friendly versions of them,.....
Indeed. See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa). So it's a bit strange to assume we can do one but not the other. And tracing makes a better starting point due to the generational hypothesis.
Only if you take the "deferred" RC route, which Swift/Rust/C++/Nim do not! Without the "deferred" aspect RC remains quite a different beast. Different algorithm, different runtime profiles, different memory consumptions, enables different optimizations and of course different problems.
Indeed, that was kind of my point: unless one is willing to invest the required resources, a bit like you guys are doing with Nim, an RC implementation will not magically outperform a modern tracing GC, only naive implementations of tracing GCs.
Jan 25 2022
prev sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 January 2022 at 09:42:25 UTC, Araq wrote:
 Only if you take the "deferred" RC route, which 
 Swift/Rust/C++/Nim do not!
What do you mean by "deferred"? An RC increment when taking a reference from the heap, but not when the reference is taken from the stack, plus periodic stack scanning?

Actually, in C++ (and to some extent in Objective-C) you minimize reference counting by using programmer knowledge: you increment when you take the reference from the heap, and from there on you use a borrowed (raw) pointer down the call tree.

Anyway, the key problem is not solved by "deferred RC". The key problem can only be solved by segmenting the heap in the type system.
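
A sketch of that convention in D terms, with invented names: the count is touched once where the handle is taken from shared storage, and everything below that point works on a borrowed raw pointer:

    struct Node { int value; Node* next; }

    struct SharedNode                       // minimal counted handle, illustration only
    {
        Node* node;
        size_t* count;
        this(this) { if (count) ++*count; }
        ~this() { if (count) --*count; }    // freeing elided for brevity
    }

    int leaf(const(Node)* n)   { return n.value; }     // borrowed: no count traffic
    int middle(const(Node)* n) { return leaf(n) + 1; } // borrowed: no count traffic

    int process(SharedNode root)            // the single retain happens when the caller
    {                                       // copies the handle out of shared storage
        return middle(root.node);           // below this point, only raw pointers
    }

    void main()
    {
        auto n = new Node(7, null);
        auto handle = SharedNode(n, new size_t);
        *handle.count = 1;
        assert(process(handle) == 8);
    }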
Jan 25 2022
next sibling parent Araq <rumpf_a web.de> writes:
On Tuesday, 25 January 2022 at 09:58:14 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 January 2022 at 09:42:25 UTC, Araq wrote:
 Only if you take the "deferred" RC route, which 
 Swift/Rust/C++/Nim do not!
What do you mean by "deferred"? RC increment when taking a reference from the heap, but not when the reference is taken from the stack + periodic stack scanning?
That's what it means, yes.
 Actually, in C++ (and to some extent in Objective-C) you 
 minimize reference counting by using programmer knowledge. You 
 increment when you take it from the heap and from thereon you 
 use a borrowed (raw) pointer down the call tree.

 Anyway, the key problem is not solved by "deferred RC". The key 
 problem can only be solved by segmenting the heap in the type 
 system.
I didn't claim that deferred RC is a "solution". My post was a reply to another post.
Jan 25 2022
prev sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 09:58:14 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 January 2022 at 09:42:25 UTC, Araq wrote:
 Only if you take the "deferred" RC route, which 
 Swift/Rust/C++/Nim do not!
What do you mean by "deferred"? RC increment when taking a reference from the heap, but not when the reference is taken from the stack + periodic stack scanning? Actually, in C++ (and to some extent in Objective-C) you minimize reference counting by using programmer knowledge. You increment when you take it from the heap and from thereon you use a borrowed (raw) pointer down the call tree. ...
Pity that programmer knowledge can't do much about ABI requirements.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1116r0.pdf

Also, that trick only works in single-developer code bases; good luck not introducing a memory corruption some months or years down the line. My experience with COM proves that is generally what happens when one decides to be clever about manually optimizing AddRef/Release calls.
Jan 25 2022
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 January 2022 at 10:56:50 UTC, Paulo Pinto wrote:
 Pity that programmer knowledge can't do much to ABI 
 requirements.
C++ has an ABI?
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1116r0.pdf
So this is basically about giving a shared_ptr the semantics of a unique_ptr. That is not required for what we are talking about here.
 Also that trick only works in single developer code bases, good 
 luck not introducing a memory corruption some months/years down 
 the line.
No, this is not a big issue if you create proper ADTs. The issue is that it is very difficult for a compiler to distinguish between objects that "wrap ownership" around a data structure and nodes within a data structure, in particular what happens to ownership when those nodes are rearranged. The programmer, however, should have good and solid knowledge about this, so you only need to increment on the root object if you know that nodes do not escape below a certain point in the call tree. (And you might be able to wrap this in a reference type specific to the ADT.)

Anyway, in C++ you almost always use unique_ptr; shared_ptr is the exception. So you usually have very few shared_ptrs, and therefore they are not all that hard to reason about.

For a language like D you could have ARC + a borrow checker + the ability to constrain ARC pointers (to get a unique_ptr) for shared objects, and something GC-like for objects local to actors/tasks.
Jan 25 2022
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 11:47:26 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 January 2022 at 10:56:50 UTC, Paulo Pinto wrote:
 Pity that programmer knowledge can't do much to ABI 
 requirements.
C++ has an ABI?
Yes, the one from the compiler and OS vendor shipping their C++ compilers on their platform.
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1116r0.pdf
So this is basically about giving a shared_ptr the semantics of a unique_ptr. That is not required for what we are talking about here.
 Also that trick only works in single developer code bases, 
 good luck not introducing a memory corruption some 
 months/years down the line.
No, this is not a big issue if you create proper ADTs. The issue is that it is very difficult for a compiler to distinguish between objects that "wrap ownership" around a data-structure and nodes within a datastructure; in particular what happens to ownership when those nodes are rearranged. However, the programmer should have good and solid knowledge about this, so you only need to increment on the root-object if you know that nodes do not escape below a point in the call tree. (And you might be able to wrap this in a reference-type specific to the ADT). Anyway, in C++ you tend almost always to use unique_ptr, shared_ptr is the exception. So you usually have very few shared_ptrs and therefore they are not all that hard to reason about.
As someone who does security as part of DevOps assignments: what programmers should be able to do and what they actually deploy into production aren't always the same. That is how we end up with the magical 70% number being quoted in several security reports.
 For a language like D you could have ARC + borrow checker + the 
 ability to constrain ARC-pointers (to get a unique_ptr) for 
 shared objects and something GC-like for objects local to 
 actors/tasks.
In theory yes; in practice someone has to put down the money to make it happen and ensure that the performance gains are worth the money spent on it.
Jan 25 2022
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 January 2022 at 12:29:46 UTC, Paulo Pinto wrote:
 For a language like D you could have ARC + borrow checker + 
 the ability to constrain ARC-pointers (to get a unique_ptr) 
 for shared objects and something GC-like for objects local to 
 actors/tasks.
In theory yes, in practice someone has to put down the money to make it happen and ensure that the performance gains are worth the money spent into it.
Actually, all it takes is for the core team to make it a priority. The existing GC can be taken as a starting point for local GCs, and you don't have to start with ARC; you can start with a well-designed RC as a foundation to evolve from. What is needed to get this ball rolling as an open-source project is a focus on making the compiler more modular, especially the backend interface. So I don't think this is strictly a money issue. Leadership needs to:

1. provide a clean compiler architecture that allows adding additional static analysis

2. pick a memory management "coordination" design that can evolve in an open-source-friendly manner (e.g. a protocol for purging unused/cached resources)

This is only doable if leadership makes it a priority, as creating a better compiler architecture based on DMD is out of scope for individual contributors. Without making memory management a priority, as a strategy, nothing will happen. The reason is that good modern memory management requires solid static analysis, and that is hard for outsiders to add to the current compiler architecture. It is also clear that maintaining a separate branch of the compiler over time is not productive (given how D evolves, e.g. the sudden addition of ImportC). It would be more satisfying to just create your own language then… *which many D users seem to do!*

Given that last fact, it becomes clear that this is not a money issue. People apparently find creating compiler tech enjoyable if they have a good starting point to evolve from. I don't think the D foundation necessarily has to provide a memory management solution that fits system-level programming, but if it wants this to work as a well-functioning open-source project, then the foundation must make sure that the core compiler infrastructure has well-defined interfaces and follows an open design philosophy, so that people can evolve solutions that fit their concrete projects and interests. (Java had an advantage in having a well-defined VM/IR that made it easy for outsiders to build on.)
Jan 25 2022
prev sibling next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 If you are paying to replace GC with ARC, without putting the 
 money to reach Swift level of performance (which is still 
 pretty lame versus last gen tracing GCs in Java/.NET), then you 
 will be getting lemons.
With per-fiber/task/actor ownership + unique_ptr, you significantly reduce the need to increment the refcount.
Jan 25 2022
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 25/01/2022 10:23 PM, Ola Fosheim Grøstad wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 If you are paying to replace GC with ARC, without putting the money to 
 reach Swift level of performance (which is still pretty lame versus 
 last gen tracing GCs in Java/.NET), then you will be getting lemons.
With per fiber/task/actor ownership + unique_ptr you reduce the need to increase the refcount significantly.
With `scope` (and without `ref` or `return`) it may even be possible to elide all reference-counting calls except for the last decrement.
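
As a small illustration of the property `scope` provides (shown here with a plain slice, since the RC elision itself is hypothetical): the callee cannot escape the parameter, so nothing it does can extend the object's lifetime past the call:

    @safe int countSpaces(scope const(char)[] text)
    {
        int n;
        foreach (c; text)
            if (c == ' ') ++n;
        // `text` cannot be stored in a global or returned, so a hypothetical
        // RC-aware compiler would not need a retain/release pair around this call.
        return n;
    }

    @safe void main()
    {
        string s = "a b c";
        assert(countSpaces(s) == 2);
    }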
Jan 25 2022
prev sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 09:23:01 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 If you are paying to replace GC with ARC, without putting the 
 money to reach Swift level of performance (which is still 
 pretty lame versus last gen tracing GCs in Java/.NET), then 
 you will be getting lemons.
With per fiber/task/actor ownership + unique_ptr you reduce the need to increase the refcount significantly.
It doesn't matter if it is still worse than the competition.
Jan 25 2022
prev sibling parent reply Tejas <notrealemail gmail.com> writes:
On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 On Tuesday, 25 January 2022 at 06:13:31 UTC, Random Dude wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
I'd pay to have it removed and replaced with ARC. GC in it's current form can not compete with other more performant GCs and it shouldn't. D is in a unique position to enable people to write code as if they're writing python and also accommodate them when they want to do low-level optimizations. If we could just have automatic reference counting both the GC and No-GC people would be happy. It's okay if that route changes how pointers work (metadata would have to be added and some code would break), this is the right move in the long run.
ARC will also not compete, unless one goes the extra mile of making the compiler ARC aware, elide retain/release calls, do cascade deletions in background threads, take care on cascade deletions to avoid stack overflows on destructor calls, provide multicore friendly versions of them,..... If you are paying to replace GC with ARC, without putting the money to reach Swift level of performance (which is still pretty lame versus last gen tracing GCs in Java/.NET), then you will be getting lemons. https://forums.swift.org/t/a-roadmap-for-improving-swift-performance-predictability-arc-improvements-and-ownership-control/54206 I can already see it, the forums being inundated with complains about ARC performance versus other languages.
Even then people are dissatisfied, apparently. I asked Reddit why ARC isn't used more widely despite Swift being so successful and was **swiftly** (pun intended 😉) corrected that Swift's user share has become 50% of what it once was at its peak.

https://www.reddit.com/r/Compilers/comments/s6r9wo/pure_arc_in_a_low_levelprogramming_language/htpx4g1/?utm_medium=android_app&utm_source=share&context=3
Jan 25 2022
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 26, 2022 at 02:09:24AM +0000, Tejas via Digitalmars-d wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
[...]
 If you are paying to replace GC with ARC, without putting the money
 to reach Swift level of performance (which is still pretty lame
 versus last gen tracing GCs in Java/.NET), then you will be getting
 lemons.
 
 https://forums.swift.org/t/a-roadmap-for-improving-swift-performance-predictability-arc-improvements-and-ownership-control/54206
 
 I can already see it, the forums being inundated with complains
 about ARC performance versus other languages.
Even then people are dissatisfied, apparently. I asked Reddit why ARC isn't used more widely despite Swift being so successful and was **swiftly**(pun intended 😉) corrected that Swift user share has become 50% of what it once was at it's peak.
[...]

Cognitive dissonance. :-P

T

--
Try to keep an open mind, but not so open your brain falls out. -- theboz
Jan 25 2022
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 January 2022 at 02:09:24 UTC, Tejas wrote:
 I asked Reddit why ARC isn't used more widely despite Swift 
 being so successful and was **swiftly**(pun intended 😉) 
 corrected that Swift user share has become 50% of what it once 
 was at it's peak.
Bullshit argument. There is much less demand for iOS-only or Android-only development than cross-platform. Swift is not cross-platform. Thus Dart and other solutions are cheaper. Cheaper wins. What makes Swift annoying is related to Objective-C requirements. Swift + C++ is ok for development of Apple-only applications.
Jan 25 2022
prev sibling next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
If it were delivered, the foundation would probably be happy to give at least some money (not that we have an unlimited supply), on the condition that it were open-sourced. Speaking as a user of D, I wouldn't use a forked compiler should one be required.
Jan 25 2022
parent Mike Parker <aldacron gmail.com> writes:
On Tuesday, 25 January 2022 at 10:19:48 UTC, max haughton wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
If it was delivered the foundation would probably be happy to give at least some money (not that we have an unlimited supply), on the condition that it were open-sourced.
A contract for this sort of work is always a possibility. That's what the HR fund is for:

https://www.flipcause.com/secure/cause_pdetails/NTUxOTc=

Anyone serious about doing a project like this (any relatively complex project, not just a new GC) can get in touch and we can discuss it. I'm not saying the foundation *would* pay for any particular project, but the discussion and possibly a meeting could lead to that if agreement is reached that it's worth doing.
Jan 25 2022
prev sibling next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?
No. D's GC is already plenty good enough right now.
Jan 25 2022
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Tuesday, 25 January 2022 at 13:09:58 UTC, Adam D Ruppe wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?
No. D's GC is already plenty good enough right now.
While I might pay for a good GC implementation to be worked on and added to D, unless you need real-time behavior or a heavy workload with the GC active a lot, I don't see the need for it. So, as Ruppe says, the current one is probably good enough.

I'd almost prefer to set the GC up with its own thread/core where it works at regular intervals; having recently gotten an 8-core machine, I can't seem to keep all my cores busy, even when trying hard.
Jan 27 2022
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 27, 2022 at 09:11:18PM +0000, Era Scarecrow via Digitalmars-d wrote:
[...]
 [...] Recently having just gotten a 8 core machine i can't seem to
 keep all my cores busy, even when trying hard.
I recently also upgraded to an 8-core AMD CPU with hyperthreading, but I find myself wishing it was 16 cores or 32... maybe even that 80-core Intel experiment from a number of years ago. It just takes forever to churn through the large amounts of computation I throw at it. With the high-volume compute-intensive tasks I'm doing, one can never have enough CPUs... :-P

T

--
"I suspect the best way to deal with procrastination is to put off the procrastination itself until later. I've been meaning to try this, but haven't gotten around to it yet." -- swr
Jan 27 2022
prev sibling next sibling parent Elronnd <elronnd elronnd.net> writes:
On Thursday, 27 January 2022 at 21:11:18 UTC, Era Scarecrow wrote:
 I'd almost prefer to set and have the GC with it's own 
 thread/core where it works at regular intervals
Sadly, doesn't work as well as we'd like. Concurrent GC exists and does peg its own cores, but hurts mainline application performance; I hear 10-50% (depending on workload). Contention sucks...
Jan 27 2022
prev sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 28/01/2022 10:11 AM, Era Scarecrow wrote:
 I'd almost prefer to set and have the GC with it's own thread/core where 
 it works at regular intervals; Recently having just gotten a 8 core 
 machine i can't seem to keep all my cores busy, even when trying hard.
We already do this (more or less).

    uint parallel = 99; // number of additional threads for marking (limited by cpuid.threadsPerCPU-1)

https://github.com/dlang/druntime/blob/master/src/core/gc/config.d#L26
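
For completeness, that field is one of the standard druntime GC options, so (assuming a stock druntime) it can be tuned per run without touching GC code; the values below are only examples:

    // On the command line of any D program linked against druntime:
    //     ./myapp "--DRT-gcopt=parallel:2"
    //
    // Or baked into the program itself:
    extern(C) __gshared string[] rt_options = [ "gcopt=parallel:2" ];

    void main()
    {
        auto data = new int[](1_000_000); // marking of this heap may now use
                                          // the extra helper threads
    }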
Jan 27 2022
prev sibling next sibling parent reply bachmeier <no spam.net> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
What you are going to hear is "I'd like someone else to do a bunch of work, and if it benefits me, I'll think about using it." If you're serious about this, you should put together an extensive set of numbers demonstrating clear failure of D's garbage collector, failure of existing D solutions, and well-defined opportunities for improvement.
Jan 25 2022
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jan 25, 2022 at 02:24:48PM +0000, bachmeier via Digitalmars-d wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would you pay
 for either of these?
 
 - High-throughput/scalable gc.  High sustained allocation rates,
 large heaps, many cores, compacting&generational
 
 - Concurrent gc.  No pauses
What you are going to hear is "I'd like someone else to do a bunch of work, and if it benefits me, I'll think about using it." If you're serious about this, you should put together an extensive set of numbers demonstrating clear failure of D's garbage collector, failure of existing D solutions, and well-defined opportunities for improvement.
+1. Around these parts I hear a lot of complaints about GC this, GC that, but I've yet to see actual performance measurements that show just how bad the GC is. It would be nice to see some actual numbers (and the actual code where the bad GC performance happens) that would show us just where D's GC is not up to the task, so that we have a concrete way of measuring any progress (or lack thereof) made on the GC.

T

--
The only difference between male factor and malefactor is just a little emptiness inside.
Jan 25 2022
prev sibling next sibling parent reply IGotD- <nise nise.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
No, I wouldn't pay for either of those, because that's not where the problem lies. The problem is that the maintainers refuse to realize that the language/runtime are too limited and cannot support any of the proposed GC types.

D has two options: either add managed pointers to the language or use library pointer types (like C++'s unique_ptr etc.). The problem is that the runtime and standard library also need to be changed in order to support switching GC types, depending on which route they take. After the D project adds the necessary pieces to support plug-and-play GC types, new GC types will emerge naturally as many people start to tinker with them.
Jan 25 2022
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
After reading my book on GC, you're kinda right.

Right now a generational GC wouldn't be possible in D due to not having 
write barriers.

However, this is not a language limitation; it can be freely added to the compiler implementation as an opt-in feature. The GC interface is, of course, also freely modifiable and would need to change accordingly.

Right now there are only two sorts of wins I can see being possible.

1) Make the current conservative GC support snapshotting for concurrency 
on Windows.
2) Support a task/fiber aware GC. This will kinda give us a generational 
GC, without actually being a generational GC.

Either way, there is still no reason to think we need to change the language to make more advanced GCs possible.

Just to be clear, that book clearly states that a generational GC is not 
always the best solution. It is not worth complicating the language by 
adding a whole new pointer type just to make this possible even if it 
was required (which it absolutely isn't).
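
For readers unfamiliar with the term: a write barrier is a small hook the compiler would emit around pointer stores into the heap. A rough sketch of the idea, with hypothetical names rather than any druntime interface:

    bool[void*] rememberedSet;  // stand-in for a real card table / remembered set

    void gcWriteBarrier(void* parent, void** slot, void* newValue)
    {
        // Record that `parent` may now reference a young-generation object, so a
        // minor collection can find the edge without scanning the old generation.
        rememberedSet[parent] = true;
        *slot = newValue;
    }

    struct Node { Node* next; }

    void main()
    {
        auto a = new Node;
        auto b = new Node;
        // What the compiler would lower `a.next = b;` into under an opt-in barrier:
        gcWriteBarrier(a, cast(void**) &a.next, b);
        assert(a.next is b);
    }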
Jan 25 2022
parent reply Elronnd <elronnd elronnd.net> writes:
On Wednesday, 26 January 2022 at 00:03:26 UTC, rikki cattermole 
wrote:
 After reading my book on GC, you're kinda right.
Thanks for the show of confidence :)
 2) Support a task/fiber aware GC. This will kinda give us a 
 generational GC, without actually being a generational GC.
Thread-local GC is a thing. It is good for false sharing too (with real threads); it can move contended objects away from owned ones. But I see no reason why fibre-local heaps should need to be much different from thread-local heaps. One Java implementation used the high bits of the stack pointer as a thread identifier/TLS pointer/etc.

I would like to see adaptive nursery size. That is good for non-fibre-based web stuff, and also e.g. video games. Imagine: you tell the GC every tick/request, and it tunes the nursery size to the 99th-percentile allocation size per frame. The actual GC is pretty much free since all the stuff you allocated over the course of the frame is gone. Then you have all the safety of the full GC approach and nearly all the performance of the manual arena approach (and much better than malloc/free).
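
A rough sketch of that per-frame idea, assuming the runtime exposed a frame-boundary hook (it does not today, and all names here are invented): allocation is a pointer bump, and the "collection" at the end of a frame is a reset plus a resize toward the observed per-frame demand:

    struct FrameNursery
    {
        private ubyte[] buffer;
        private size_t used;
        private size_t requested;
        private size_t highWater;   // stand-in for a real 99th-percentile estimate

        void[] allocate(size_t n)
        {
            requested += n;         // track demand even when we overflow
            if (used + n > buffer.length)
                return null;        // would fall back to the full GC heap
            auto mem = buffer[used .. used + n];
            used += n;
            return mem;
        }

        // Called once per tick/request: everything allocated during the frame is
        // assumed dead, so "collection" is just a bump-pointer reset.
        void endFrame()
        {
            if (requested > highWater) highWater = requested;
            if (buffer.length < highWater * 2)
                buffer.length = highWater * 2; // grow so the fallback path stays rare
            used = 0;
            requested = 0;
        }
    }

    void main()
    {
        FrameNursery nursery;
        foreach (frame; 0 .. 3)
        {
            nursery.allocate(256); // first frame overflows; later frames are served
            nursery.endFrame();    // from the grown nursery
        }
    }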
Jan 25 2022
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 January 2022 at 06:20:06 UTC, Elronnd wrote:
 Thread-local gc is a thing.  Good for false sharing too (w/real 
 threads); can move contended objects away from owned ones.  But 
 I see no reason why fibre-local heaps should need to be much 
 different from thread-local heaps.
The difference is that you maybe have 8 threads, but maybe 10000 tasks. So in the latter case you cannot let the heap-owner collect its own garbage.
Jan 26 2022
parent reply Elronnd <elronnd elronnd.net> writes:
On Wednesday, 26 January 2022 at 08:20:51 UTC, Ola Fosheim 
Grøstad wrote:
 The difference is that you maybe have 8 threads, but maybe 
 10000 tasks. So in the latter case you cannot let the 
 heap-owner collect its own garbage.
Yes. Good point. The more I think about it, the more I see differences and opportunities to profit from doing things differently.
Jan 26 2022
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 January 2022 at 08:32:44 UTC, Elronnd wrote:
 Yes.  Good point.  The more I think about it, the more I see 
 differences and opportunities to profit from doing things 
 differently.
Yes, if the load is somewhat even and you have 16 (8+8) cores, then you could let 15 tasks run and pick one of the hundreds of others to collect, with little impact on latency. But you need heuristics to pick the one with the most garbage that can be delayed without penalty (e.g. if the task recently started waiting for a network response or is marked as low priority). So even though the situations seem conceptually similar, I think a *good dedicated implementation* would be very different! :-D

Sounds like a fun project to me!!
Jan 26 2022
prev sibling parent reply IGotD- <nise nise.com> writes:
On Wednesday, 26 January 2022 at 06:20:06 UTC, Elronnd wrote:
 Thread-local gc is a thing.  Good for false sharing too (w/real 
 threads); can move contended objects away from owned ones.  But 
 I see no reason why fibre-local heaps should need to be much 
 different from thread-local heaps.
I would like to challenge the idea that a thread-aware GC would do much for performance. Pegging memory to one thread is unusual and often doesn't correspond to reality. Take, for example, a computer game with a large amount of vertex data where you decide to split the workload over several threads: you don't make a thread-local copy of that data but keep the original vertex data global, and even the destination buffer would be global.

What I can think of is a server with one thread per client, with data that no other thread works on. Perhaps there a thread-local GC could be beneficial. My experience is that this thread model isn't good programming, and servers should instead be completely async, meaning any thread might handle the next piece of work.

As I see it, a thread-aware GC doesn't do much for performance but complicates things for the programmer.
Jan 28 2022
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 28 January 2022 at 10:18:32 UTC, IGotD- wrote:
 On Wednesday, 26 January 2022 at 06:20:06 UTC, Elronnd wrote:
 Thread-local gc is a thing.  Good for false sharing too 
 (w/real threads); can move contended objects away from owned 
 ones.  But I see no reason why fibre-local heaps should need 
 to be much different from thread-local heaps.
I would like to challenge the idea that thread aware GC would do much for performance. Pegging memory to one thread is unusual and doesn't often correspond to the reality. For example a computer game with large amount of vertex data where you decide to split up the workload on several threads. You don't make a thread local copy of that data but keep the original vertex data global and even destination buffer would be global. What I can think of is a server with one thread per client with data that no other reason thread works on. Perhaps there thread local GC could be benefitial. My experience is that this thread model isn't good programming and servers should instead be completely async meaning any thread might handle the next partial work. As I see it thread aware GC doesn't do much for performance but complicates it for the programmer.
You can have your cake and eat it too, using something like Pony reference capabilities. The memory isn't copied in practice, just logically, and there is just one owner at a time.

https://tutorial.ponylang.io/reference-capabilities/reference-capabilities.html#the-list-of-reference-capabilities

That is something that would be impossible to put into D's type system without turning it into something else. Also, most key developers on Pony have moved on to either Verona (https://www.microsoft.com/en-us/research/project/project-verona), which carries on these ideas, or Rust (https://www.wallaroo.ai/blog/wallaroo-move-to-rust), so most likely Pony will have a hard time improving itself.
Jan 28 2022
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 28 January 2022 at 11:11:54 UTC, Paulo Pinto wrote:
 You can have your cake and eat it too, using something like 
 Pony capabilities.
Pony is very much a high-level language, though, which has some advantages, such as being able to collect actors that no longer respond to any events. But that is too high level for D.
Jan 28 2022
prev sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Friday, 28 January 2022 at 11:11:54 UTC, Paulo Pinto wrote:
 [snip]

 That is something that would be impossible to put into D's 
 typesystem, without turning it into something else.

 [snip]
Well, D would do it in a D way rather than in a Pony way... for instance, Pony's val is similar to D's immutable, but not the same. The question would be what from Pony's reference capabilities it would make sense to add to D. I think the one with the most obvious benefit would be iso; some people have talked about wanting something like that in the language. It's somewhat different from Rust's borrow checker in that it only allows one mutable alias, whereas Rust allows that or as many const aliases as you want (but not both).
Jan 28 2022
parent Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 28 January 2022 at 15:34:49 UTC, jmh530 wrote:
 On Friday, 28 January 2022 at 11:11:54 UTC, Paulo Pinto wrote:
 [snip]

 That is something that would be impossible to put into D's 
 typesystem, without turning it into something else.

 [snip]
Well D would do it in a D way rather than in a pony way...for instance pony's val is similar to D's immutable but not the same. The question would be what from pony's reference capabilities would it make sense to add to D. I think the one with the most obvious benefit would be iso. Some people have talked about wanting something like that in the language. It's somewhat different from Rust's borrow checker in that it only allows one mutable alias, whereas Rust allows that or as many const aliases as you want (but not both).
Yeah, maybe that would be possible then. Still, better to stabilize D's approach to lifetimes first.
Jan 28 2022
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 28 January 2022 at 10:18:32 UTC, IGotD- wrote:
 On Wednesday, 26 January 2022 at 06:20:06 UTC, Elronnd wrote:
 Thread-local gc is a thing.  Good for false sharing too 
 (w/real threads); can move contended objects away from owned 
 ones.  But I see no reason why fibre-local heaps should need 
 to be much different from thread-local heaps.
I would like to challenge the idea that thread aware GC would do much for performance. Pegging memory to one thread is unusual and doesn't often correspond to the reality. For example a computer game with large amount of vertex data where you decide to split up the workload on several threads. You don't make a thread local copy of that data but keep the original vertex data global and even destination buffer would be global.
Which is why you would want ARC for shared objects and a local GC for tasks/actors. Then what you need, for more flexibility and optimization, is static analysis that determines whether local objects can be turned into shared objects. If that is possible, you could put them in a separate region of the GC heap with space for an RC field at a negative offset.
 What I can think of is a server with one thread per client with 
 data that no other reason thread works on.
It shouldn't be per thread, but per actor/task/fiber.
 My experience is that this thread model isn't good programming 
 and servers should instead be completely async meaning any 
 thread might handle the next partial work.
You have experience with this model? From where?

Actually, it could be massively beneficial if you have short-lived actors and most objects have trivial destructors: then you can simply release the entire local heap with no scanning. You basically get to configure the system to use arena allocators with a GC fallback for out-of-memory situations. That is useful for actors where most of the memory they hold is released towards the end of the actor's lifetime.
 As I see it thread aware GC doesn't do much for performance but 
 complicates it for the programmer.
You cannot discuss performance without selecting a particular realistic application, which is why system-level programming requires multiple choices and configurations if you want automatic memory management. There is simply no model that works well in all scenarios. What is needed for D is to find a combination that works for current high-level-programming D users and also makes automatic memory management more useful in more system-level programming scenarios. Perfect should be considered out of scope.
Jan 28 2022
prev sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?
No. No problem => no solution needed.
Jan 26 2022