
digitalmars.D - Would you pay for GC?

reply Elronnd <elronnd elronnd.net> writes:
Apropos recent discussion, here is a serious question: would you 
pay for either of these?

- High-throughput/scalable gc.  High sustained allocation rates, 
large heaps, many cores, compacting&generational

- Concurrent gc.  No pauses
Jan 24 2022
next sibling parent reply Random Dude <wtqdzouiyyrhaijcyy nthrw.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
I'd pay to have it removed and replaced with ARC. GC in its current form cannot compete with other, more performant GCs, and it shouldn't. D is in a unique position to enable people to write code as if they're writing Python and also accommodate them when they want to do low-level optimizations.

If we could just have automatic reference counting, both the GC and no-GC people would be happy. It's okay if that route changes how pointers work (metadata would have to be added and some code would break); this is the right move in the long run.
Jan 24 2022
next sibling parent Elronnd <elronnd elronnd.net> writes:
On Tuesday, 25 January 2022 at 06:13:31 UTC, Random Dude wrote:
 GC in it's current form can not compete with other more 
 performant GCs
I think it can, for reasons I've explained elsewhere. But that's a bit beside the point; the question is: what _would_ you do _if_ it could?
 D is in a unique position to enable people to write code
 as if they're writing python and also accommodate them
 when they want to do low-level optimizations.
Fast AND expressive is a much sexier value proposition than fast XOR expressive.
Jan 24 2022
prev sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 06:13:31 UTC, Random Dude wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
I'd pay to have it removed and replaced with ARC. GC in it's current form can not compete with other more performant GCs and it shouldn't. D is in a unique position to enable people to write code as if they're writing python and also accommodate them when they want to do low-level optimizations. If we could just have automatic reference counting both the GC and No-GC people would be happy. It's okay if that route changes how pointers work (metadata would have to be added and some code would break), this is the right move in the long run.
ARC will also not compete unless one goes the extra mile of making the compiler ARC-aware: eliding retain/release calls, doing cascade deletions in background threads, taking care that cascade deletions don't overflow the stack in destructor calls, providing multicore-friendly versions of the counting operations, and so on.

If you are paying to replace the GC with ARC without also putting in the money to reach Swift's level of performance (which is still pretty lame versus last-gen tracing GCs in Java/.NET), then you will be getting lemons.

https://forums.swift.org/t/a-roadmap-for-improving-swift-performance-predictability-arc-improvements-and-ownership-control/54206

I can already see it: the forums inundated with complaints about ARC performance versus other languages.
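
To make the cost concrete, here is a minimal, deliberately naive reference-counted handle in D (illustrative only, not a proposed API). Every copy is a retain and every destructor a release; nothing in it is deferred, atomic, or cascade-safe, which is exactly the work listed above:

    struct Rc(T)
    {
        private T* payload;
        private size_t* count;

        static Rc make(T value)
        {
            Rc r;
            r.payload = new T;
            *r.payload = value;
            r.count = new size_t;
            *r.count = 1;
            return r;
        }

        this(this)               // postblit: the implicit "retain"
        {
            if (count) ++*count; // not atomic: a real ARC needs a multicore-friendly version
        }

        ~this()                  // the implicit "release"
        {
            if (count && --*count == 0)
            {
                // Naive cascade deletion: if T itself holds Rc fields, this
                // recurses and can overflow the stack on long chains, which is
                // why a production ARC defers it to a background queue.
                destroy(*payload);
            }
        }
    }

    void use(Rc!int handle)      // pass-by-value copy: one retain/release pair
    {
    }

    void main()
    {
        auto h = Rc!int.make(42);
        use(h);                  // an ARC-aware compiler could elide this pair;
                                 // a library-only wrapper cannot
    }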
Jan 24 2022
next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, provide 
 multicore friendly versions of them,.....
Indeed. See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa). So it's a bit strange to assume we can do one but not the other. And tracing makes a better starting point due to the generational hypothesis.
Jan 24 2022
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 25/01/2022 8:22 PM, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of making 
 the compiler ARC aware, elide retain/release calls, do cascade 
 deletions in background threads, take care on cascade deletions to 
 avoid stack overflows on destructor calls, provide multicore friendly 
 versions of them,.....
Indeed.  See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa).  So it's a bit strange to assume we can do one but not the other.  And tracing makes a better starting point due to the generational hypothesis.
RC shines when deterministic destruction is required, i.e. when you have an external resource bound to a D type. But it is horrible as a language default: types like a pointer or a slice should not be bound to any particular memory management strategy in a native language, and RC is very expensive compared to a GC due to the constant cache invalidations it causes.

I want to get RC properly into D without having to rely on a struct wrapper, so that the compiler can know that eliding of calls is allowed. Plus, if it's in the language, we can get a const string type with classes and all!
Jan 25 2022
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 08:24:22 UTC, rikki cattermole 
wrote:
 On 25/01/2022 8:22 PM, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, 
 provide multicore friendly versions of them,.....
Indeed.  See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa).  So it's a bit strange to assume we can do one but not the other.  And tracing makes a better starting point due to the generational hypothesis.
RC shines for when deterministic destruction is required. ...
That is the naive idea, until a cascade deletion of a graph-based data structure happens.
Jan 25 2022
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/25/22 4:32 AM, Paulo Pinto wrote:
 On Tuesday, 25 January 2022 at 08:24:22 UTC, rikki cattermole wrote:
 On 25/01/2022 8:22 PM, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of making 
 the compiler ARC aware, elide retain/release calls, do cascade 
 deletions in background threads, take care on cascade deletions to 
 avoid stack overflows on destructor calls, provide multicore 
 friendly versions of them,.....
Indeed.  See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa). So it's a bit strange to assume we can do one but not the other.  And tracing makes a better starting point due to the generational hypothesis.
RC shines for when deterministic destruction is required. ...
That is the naive idea, until a cascade deletion of a graph based datastructure happens.
I use ARC for determinism only, not memory deallocation:

https://github.com/schveiguy/iopipe/blob/master/source/iopipe/refc.d

e.g., when I want the last reference to a buffered output stream to flush its buffer and close the file when going out of scope. I don't care about the memory management; that's fine for the GC to clean up. As an added benefit, it's trivially `@safe`.

-Steve
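
The pattern looks roughly like this; a sketch with made-up types, not the actual iopipe code. The count only decides when the destructor's side effect runs; the memory itself can stay GC-managed:

    import std.stdio : File;

    struct BufferedOut
    {
        private static struct State
        {
            File file;
            char[] pending;
            size_t refs;
        }
        private State* state;    // GC-allocated shared state, no manual free needed

        this(string path)
        {
            state = new State(File(path, "w"), null, 1);
        }

        this(this) { if (state) ++state.refs; }

        ~this()
        {
            if (state && --state.refs == 0)
            {
                state.file.write(state.pending); // deterministic flush...
                state.file.close();              // ...and close, exactly once, at the last release
            }
        }

        void put(const(char)[] s) { state.pending ~= s; }
    }

    void main()
    {
        auto o = BufferedOut("/tmp/example.txt"); // hypothetical path
        auto copy = o;        // refs == 2, same underlying state
        copy.put("hello\n");
        // whichever copy is destroyed last flushes and closes, deterministically
    }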
Jan 25 2022
prev sibling parent norm <norm.rowtree gmail.com> writes:
On Tuesday, 25 January 2022 at 08:24:22 UTC, rikki cattermole 
wrote:
 On 25/01/2022 8:22 PM, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, 
 provide multicore friendly versions of them,.....
Indeed.  See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa).  So it's a bit strange to assume we can do one but not the other.  And tracing makes a better starting point due to the generational hypothesis.
RC shines for when deterministic destruction is required.
In a small code base it might, but in larger software RC is on par with GC because it is impossible to keep track of all the references; you end up with leaks and dangling shared_ptrs because someone has a ref... somewhere.

The best option in a large code base is RAII with value types until you cannot, then unique_ptr equivalents that can only have a single owner at any one time (without hackery). Only reach for RC/shared_ptr when you absolutely must have multiple owners, and even then interrogate your design.
Jan 25 2022
prev sibling parent reply Araq <rumpf_a web.de> writes:
On Tuesday, 25 January 2022 at 07:22:36 UTC, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, 
 provide multicore friendly versions of them,.....
Indeed. See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa). So it's a bit strange to assume we can do one but not the other. And tracing makes a better starting point due to the generational hypothesis.
Only if you take the "deferred" RC route, which Swift/Rust/C++/Nim do not! Without the "deferred" aspect, RC remains quite a different beast: a different algorithm, different runtime profiles, different memory consumption; it enables different optimizations and, of course, has different problems.
Jan 25 2022
next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 09:42:25 UTC, Araq wrote:
 On Tuesday, 25 January 2022 at 07:22:36 UTC, Elronnd wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 ARC will also not compete, unless one goes the extra mile of 
 making the compiler ARC aware, elide retain/release calls, do 
 cascade deletions in background threads, take care on cascade 
 deletions to avoid stack overflows on destructor calls, 
 provide multicore friendly versions of them,.....
Indeed. See Bacon et al, 'Unified Theory of Garbage Collection': increasingly sophisticated RC approaches tracing (and vice versa). So it's a bit strange to assume we can do one but not the other. And tracing makes a better starting point due to the generational hypothesis.
Only if you take the "deferred" RC route, which Swift/Rust/C++/Nim do not! Without the "deferred" aspect RC remains quite a different beast. Different algorithm, different runtime profiles, different memory consumptions, enables different optimizations and of course different problems.
Indeed, that was kind of my point: unless one is willing to invest the required resources, a bit like you guys are doing with Nim, an RC implementation will not magically outperform a modern tracing GC, only naive implementations of tracing GCs.
Jan 25 2022
prev sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 January 2022 at 09:42:25 UTC, Araq wrote:
 Only if you take the "deferred" RC route, which 
 Swift/Rust/C++/Nim do not!
What do you mean by "deferred"? An RC increment when taking a reference from the heap, but not when the reference is taken from the stack, plus periodic stack scanning?

Actually, in C++ (and to some extent in Objective-C) you minimize reference counting by using programmer knowledge: you increment when you take the reference from the heap, and from there on you use a borrowed (raw) pointer down the call tree.

Anyway, the key problem is not solved by "deferred RC". The key problem can only be solved by segmenting the heap in the type system.
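
A sketch of that convention in D terms, with invented names: the count is touched once where the handle is taken from shared storage, and everything below that point works on a borrowed raw pointer:

    struct Node { int value; Node* next; }

    struct SharedNode                       // minimal counted handle, illustration only
    {
        Node* node;
        size_t* count;
        this(this) { if (count) ++*count; }
        ~this() { if (count) --*count; }    // freeing elided for brevity
    }

    int leaf(const(Node)* n)   { return n.value; }     // borrowed: no count traffic
    int middle(const(Node)* n) { return leaf(n) + 1; } // borrowed: no count traffic

    int process(SharedNode root)            // the single retain happens when the caller
    {                                       // copies the handle out of shared storage
        return middle(root.node);           // below this point, only raw pointers
    }

    void main()
    {
        auto n = new Node(7, null);
        auto handle = SharedNode(n, new size_t);
        *handle.count = 1;
        assert(process(handle) == 8);
    }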
Jan 25 2022
next sibling parent Araq <rumpf_a web.de> writes:
On Tuesday, 25 January 2022 at 09:58:14 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 January 2022 at 09:42:25 UTC, Araq wrote:
 Only if you take the "deferred" RC route, which 
 Swift/Rust/C++/Nim do not!
What do you mean by "deferred"? RC increment when taking a reference from the heap, but not when the reference is taken from the stack + periodic stack scanning?
That's what it means, yes.
 Actually, in C++ (and to some extent in Objective-C) you 
 minimize reference counting by using programmer knowledge. You 
 increment when you take it from the heap and from thereon you 
 use a borrowed (raw) pointer down the call tree.

 Anyway, the key problem is not solved by "deferred RC". The key 
 problem can only be solved by segmenting the heap in the type 
 system.
I didn't claim that deferred RC is a "solution". My post was a reply to another post.
Jan 25 2022
prev sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 09:58:14 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 January 2022 at 09:42:25 UTC, Araq wrote:
 Only if you take the "deferred" RC route, which 
 Swift/Rust/C++/Nim do not!
What do you mean by "deferred"? RC increment when taking a reference from the heap, but not when the reference is taken from the stack + periodic stack scanning? Actually, in C++ (and to some extent in Objective-C) you minimize reference counting by using programmer knowledge. You increment when you take it from the heap and from thereon you use a borrowed (raw) pointer down the call tree. ...
Pity that programmer knowledge can't do much about ABI requirements.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1116r0.pdf

Also, that trick only works in single-developer code bases; good luck not introducing a memory corruption some months or years down the line. My experience with COM proves that is generally what happens when one decides to be clever about manually optimizing AddRef/Release calls.
Jan 25 2022
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 January 2022 at 10:56:50 UTC, Paulo Pinto wrote:
 Pity that programmer knowledge can't do much to ABI 
 requirements.
C++ has an ABI?
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1116r0.pdf
So this is basically about giving a shared_ptr the semantics of a unique_ptr. That is not required for what we are talking about here.
 Also that trick only works in single developer code bases, good 
 luck not introducing a memory corruption some months/years down 
 the line.
No, this is not a big issue if you create proper ADTs. The issue is that it is very difficult for a compiler to distinguish between objects that "wrap ownership" around a data structure and nodes within a data structure, in particular what happens to ownership when those nodes are rearranged. The programmer, however, should have good and solid knowledge about this, so you only need to increment on the root object if you know that nodes do not escape below a certain point in the call tree. (And you might be able to wrap this in a reference type specific to the ADT.)

Anyway, in C++ you almost always use unique_ptr; shared_ptr is the exception. So you usually have very few shared_ptrs, and therefore they are not all that hard to reason about.

For a language like D you could have ARC + a borrow checker + the ability to constrain ARC pointers (to get a unique_ptr) for shared objects, and something GC-like for objects local to actors/tasks.
Jan 25 2022
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 11:47:26 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 January 2022 at 10:56:50 UTC, Paulo Pinto wrote:
 Pity that programmer knowledge can't do much to ABI 
 requirements.
C++ has an ABI?
Yes, the one from the compiler and OS vendor shipping their C++ compilers on their platform.
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1116r0.pdf
So this is basically about giving a shared_ptr the semantics of a unique_ptr. That is not required for what we are talking about here.
 Also that trick only works in single developer code bases, 
 good luck not introducing a memory corruption some 
 months/years down the line.
No, this is not a big issue if you create proper ADTs. The issue is that it is very difficult for a compiler to distinguish between objects that "wrap ownership" around a data-structure and nodes within a datastructure; in particular what happens to ownership when those nodes are rearranged. However, the programmer should have good and solid knowledge about this, so you only need to increment on the root-object if you know that nodes do not escape below a point in the call tree. (And you might be able to wrap this in a reference-type specific to the ADT). Anyway, in C++ you tend almost always to use unique_ptr, shared_ptr is the exception. So you usually have very few shared_ptrs and therefore they are not all that hard to reason about.
As someone who does security as part of DevOps assignments: what programmers should be able to do and what they actually deploy into production aren't always the same. That is how we end up with the magical 70% number being quoted in several security reports.
 For a language like D you could have ARC + borrow checker + the 
 ability to constrain ARC-pointers (to get a unique_ptr) for 
 shared objects and something GC-like for objects local to 
 actors/tasks.
In theory yes; in practice someone has to put down the money to make it happen and ensure that the performance gains are worth the money spent on it.
Jan 25 2022
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 January 2022 at 12:29:46 UTC, Paulo Pinto wrote:
 For a language like D you could have ARC + borrow checker + 
 the ability to constrain ARC-pointers (to get a unique_ptr) 
 for shared objects and something GC-like for objects local to 
 actors/tasks.
In theory yes, in practice someone has to put down the money to make it happen and ensure that the performance gains are worth the money spent into it.
Actually, all it takes is for the core team to make it a priority. The existing GC can be taken as a starting point for local GCs, and you don't have to start with ARC; you can start with a well-designed RC as a foundation to evolve from. What is needed to get this ball rolling as an open-source project is a focus on making the compiler more modular, especially the backend interface. So I don't think this is strictly a money issue. Leadership needs to:

1. provide a clean compiler architecture that allows adding additional static analysis

2. pick a memory management "coordination" design that can evolve in an open-source-friendly manner (e.g. a protocol for purging unused/cached resources)

This is only doable if leadership makes it a priority, as creating a better compiler architecture based on DMD is out of scope for individual contributors. Without making memory management a priority, as a strategy, nothing will happen. The reason is that good modern memory management requires solid static analysis, and that is hard for outsiders to add to the current compiler architecture. It is also clear that maintaining a separate branch of the compiler over time is not productive (given how D evolves, e.g. the sudden addition of ImportC). It would be more satisfying to just create your own language then… *which many D users seem to do!*

Given that last fact, it becomes clear that this is not a money issue. People apparently find creating compiler tech enjoyable if they have a good starting point to evolve from. I don't think the D foundation necessarily has to provide a memory management solution that fits system-level programming, but if it wants this to work as a well-functioning open-source project, then the foundation must make sure that the core compiler infrastructure has well-defined interfaces and follows an open design philosophy, so that people can evolve solutions that fit their concrete projects and interests. (Java had an advantage in having a well-defined VM/IR that made it easy for outsiders to build on.)
Jan 25 2022
prev sibling next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 If you are paying to replace GC with ARC, without putting the 
 money to reach Swift level of performance (which is still 
 pretty lame versus last gen tracing GCs in Java/.NET), then you 
 will be getting lemons.
With per-fiber/task/actor ownership + unique_ptr, you significantly reduce the need to increment the refcount.
Jan 25 2022
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 25/01/2022 10:23 PM, Ola Fosheim Grøstad wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 If you are paying to replace GC with ARC, without putting the money to 
 reach Swift level of performance (which is still pretty lame versus 
 last gen tracing GCs in Java/.NET), then you will be getting lemons.
With per fiber/task/actor ownership + unique_ptr you reduce the need to increase the refcount significantly.
With `scope` (and without `ref` or `return`) it may even be possible to elide all reference-counting calls except for the last decrement.
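
As a small illustration of the property `scope` provides (shown here with a plain slice, since the RC elision itself is hypothetical): the callee cannot escape the parameter, so nothing it does can extend the object's lifetime past the call:

    @safe int countSpaces(scope const(char)[] text)
    {
        int n;
        foreach (c; text)
            if (c == ' ') ++n;
        // `text` cannot be stored in a global or returned, so a hypothetical
        // RC-aware compiler would not need a retain/release pair around this call.
        return n;
    }

    @safe void main()
    {
        string s = "a b c";
        assert(countSpaces(s) == 2);
    }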
Jan 25 2022
prev sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 25 January 2022 at 09:23:01 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 If you are paying to replace GC with ARC, without putting the 
 money to reach Swift level of performance (which is still 
 pretty lame versus last gen tracing GCs in Java/.NET), then 
 you will be getting lemons.
With per fiber/task/actor ownership + unique_ptr you reduce the need to increase the refcount significantly.
It doesn't matter if it is still worse than the competition.
Jan 25 2022
prev sibling parent reply Tejas <notrealemail gmail.com> writes:
On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
 On Tuesday, 25 January 2022 at 06:13:31 UTC, Random Dude wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
I'd pay to have it removed and replaced with ARC. GC in it's current form can not compete with other more performant GCs and it shouldn't. D is in a unique position to enable people to write code as if they're writing python and also accommodate them when they want to do low-level optimizations. If we could just have automatic reference counting both the GC and No-GC people would be happy. It's okay if that route changes how pointers work (metadata would have to be added and some code would break), this is the right move in the long run.
ARC will also not compete, unless one goes the extra mile of making the compiler ARC aware, elide retain/release calls, do cascade deletions in background threads, take care on cascade deletions to avoid stack overflows on destructor calls, provide multicore friendly versions of them,..... If you are paying to replace GC with ARC, without putting the money to reach Swift level of performance (which is still pretty lame versus last gen tracing GCs in Java/.NET), then you will be getting lemons. https://forums.swift.org/t/a-roadmap-for-improving-swift-performance-predictability-arc-improvements-and-ownership-control/54206 I can already see it, the forums being inundated with complains about ARC performance versus other languages.
Even then people are dissatisfied, apparently. I asked Reddit why ARC isn't used more widely despite Swift being so successful and was **swiftly** (pun intended 😉) corrected that Swift's user share has become 50% of what it once was at its peak.

https://www.reddit.com/r/Compilers/comments/s6r9wo/pure_arc_in_a_low_levelprogramming_language/htpx4g1/?utm_medium=android_app&utm_source=share&context=3
Jan 25 2022
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 26, 2022 at 02:09:24AM +0000, Tejas via Digitalmars-d wrote:
 On Tuesday, 25 January 2022 at 07:13:41 UTC, Paulo Pinto wrote:
[...]
 If you are paying to replace GC with ARC, without putting the money
 to reach Swift level of performance (which is still pretty lame
 versus last gen tracing GCs in Java/.NET), then you will be getting
 lemons.
 
 https://forums.swift.org/t/a-roadmap-for-improving-swift-performance-predictability-arc-improvements-and-ownership-control/54206
 
 I can already see it, the forums being inundated with complains
 about ARC performance versus other languages.
Even then people are dissatisfied, apparently. I asked Reddit why ARC isn't used more widely despite Swift being so successful and was **swiftly**(pun intended 😉) corrected that Swift user share has become 50% of what it once was at it's peak.
[...]

Cognitive dissonance. :-P

T

--
Try to keep an open mind, but not so open your brain falls out. -- theboz
Jan 25 2022
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 January 2022 at 02:09:24 UTC, Tejas wrote:
 I asked Reddit why ARC isn't used more widely despite Swift 
 being so successful and was **swiftly**(pun intended 😉) 
 corrected that Swift user share has become 50% of what it once 
 was at it's peak.
Bullshit argument. There is much less demand for iOS-only or Android-only development than cross-platform. Swift is not cross-platform. Thus Dart and other solutions are cheaper. Cheaper wins. What makes Swift annoying is related to Objective-C requirements. Swift + C++ is ok for development of Apple-only applications.
Jan 25 2022
prev sibling next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
If it were delivered, the foundation would probably be happy to give at least some money (not that we have an unlimited supply), on the condition that it were open-sourced. Speaking as a user of D, I wouldn't use a forked compiler should one be required.
Jan 25 2022
parent Mike Parker <aldacron gmail.com> writes:
On Tuesday, 25 January 2022 at 10:19:48 UTC, max haughton wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
If it was delivered the foundation would probably be happy to give at least some money (not that we have an unlimited supply), on the condition that it were open-sourced.
A contract for this sort of work is always a possibility. That's what the HR fund is for:

https://www.flipcause.com/secure/cause_pdetails/NTUxOTc=

Anyone serious about doing a project like this (any relatively complex project, not just a new GC) can get in touch and we can discuss it. I'm not saying the foundation *would* pay for any particular project, but the discussion and possibly a meeting could lead to that if agreement is reached that it's worth doing.
Jan 25 2022
prev sibling next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?
No. D's GC is already plenty good enough right now.
Jan 25 2022
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Tuesday, 25 January 2022 at 13:09:58 UTC, Adam D Ruppe wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?
No. D's GC is already plenty good enough right now.
While I might pay for a good GC implementation to be worked on and added to D, unless you need real-time behavior or a heavy workload with the GC active a lot, I don't see the need for it. So, as Ruppe says, the current one is probably good enough.

I'd almost prefer to set the GC up with its own thread/core where it works at regular intervals; having recently gotten an 8-core machine, I can't seem to keep all my cores busy, even when trying hard.
Jan 27 2022
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 27, 2022 at 09:11:18PM +0000, Era Scarecrow via Digitalmars-d wrote:
[...]
 [...] Recently having just gotten a 8 core machine i can't seem to
 keep all my cores busy, even when trying hard.
I recently also upgraded to an 8-core AMD CPU with hyperthreading, but I find myself wishing it was 16 cores or 32... maybe even that 80-core Intel experiment from a number of years ago. It just takes forever to churn through the large amounts of computation I throw at it. With the high-volume compute-intensive tasks I'm doing, one can never have enough CPUs... :-P

T

--
"I suspect the best way to deal with procrastination is to put off the procrastination itself until later. I've been meaning to try this, but haven't gotten around to it yet." -- swr
Jan 27 2022
prev sibling next sibling parent Elronnd <elronnd elronnd.net> writes:
On Thursday, 27 January 2022 at 21:11:18 UTC, Era Scarecrow wrote:
 I'd almost prefer to set and have the GC with it's own 
 thread/core where it works at regular intervals
Sadly, doesn't work as well as we'd like. Concurrent GC exists and does peg its own cores, but hurts mainline application performance; I hear 10-50% (depending on workload). Contention sucks...
Jan 27 2022
prev sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 28/01/2022 10:11 AM, Era Scarecrow wrote:
 I'd almost prefer to set and have the GC with it's own thread/core where 
 it works at regular intervals; Recently having just gotten a 8 core 
 machine i can't seem to keep all my cores busy, even when trying hard.
We already do this (more or less).

    uint parallel = 99; // number of additional threads for marking (limited by cpuid.threadsPerCPU-1)

https://github.com/dlang/druntime/blob/master/src/core/gc/config.d#L26
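
For completeness, that field is one of the standard druntime GC options, so (assuming a stock druntime) it can be tuned per run without touching GC code; the values below are only examples:

    // On the command line of any D program linked against druntime:
    //     ./myapp "--DRT-gcopt=parallel:2"
    //
    // Or baked into the program itself:
    extern(C) __gshared string[] rt_options = [ "gcopt=parallel:2" ];

    void main()
    {
        auto data = new int[](1_000_000); // marking of this heap may now use
                                          // the extra helper threads
    }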
Jan 27 2022
prev sibling next sibling parent reply bachmeier <no spam.net> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
What you are going to hear is "I'd like someone else to do a bunch of work, and if it benefits me, I'll think about using it." If you're serious about this, you should put together an extensive set of numbers demonstrating clear failure of D's garbage collector, failure of existing D solutions, and well-defined opportunities for improvement.
Jan 25 2022
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jan 25, 2022 at 02:24:48PM +0000, bachmeier via Digitalmars-d wrote:
 On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would you pay
 for either of these?
 
 - High-throughput/scalable gc.  High sustained allocation rates,
 large heaps, many cores, compacting&generational
 
 - Concurrent gc.  No pauses
What you are going to hear is "I'd like someone else to do a bunch of work, and if it benefits me, I'll think about using it." If you're serious about this, you should put together an extensive set of numbers demonstrating clear failure of D's garbage collector, failure of existing D solutions, and well-defined opportunities for improvement.
+1. Around these parts I hear a lot of complaints about GC this, GC that, but I've yet to see actual performance measurements that show just how bad the GC is. It would be nice to see some actual numbers (and the actual code where the bad GC performance happens) that would show us just where D's GC is not up to the task, so that we have a concrete way of measuring any progress (or lack thereof) made on the GC.

T

--
The only difference between male factor and malefactor is just a little emptiness inside.
Jan 25 2022
prev sibling next sibling parent reply IGotD- <nise nise.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?

 - High-throughput/scalable gc.  High sustained allocation 
 rates, large heaps, many cores, compacting&generational

 - Concurrent gc.  No pauses
No, I wouldn't pay for either of those, because that's not where the problem lies. The problem is that the maintainers refuse to realize that the language/runtime are too limited and cannot support any of the proposed GC types.

D has two options: either add managed pointers to the language or use library pointer types (like C++'s unique_ptr etc.). The problem is that the runtime and standard library also need to be changed in order to support switching GC types, depending on which route they take. After the D project adds the necessary pieces to support plug-and-play GC types, new GC types will emerge naturally as many people start to tinker with them.
Jan 25 2022
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
After reading my book on GC, you're kinda right.

Right now a generational GC wouldn't be possible in D due to not having 
write barriers.

However, this is not a language limitation; it can be freely added to the compiler implementation as an opt-in feature. The GC interface is, of course, also freely modifiable and would need to change accordingly.

Right now there are only two sorts of wins I can see being possible.

1) Make the current conservative GC support snapshotting for concurrency 
on Windows.
2) Support a task/fiber aware GC. This will kinda give us a generational 
GC, without actually being a generational GC.

Either way, there is still no reason to think we need to change the language to make more advanced GCs possible.

Just to be clear, that book clearly states that a generational GC is not 
always the best solution. It is not worth complicating the language by 
adding a whole new pointer type just to make this possible even if it 
was required (which it absolutely isn't).
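
For readers unfamiliar with the term: a write barrier is a small hook the compiler would emit around pointer stores into the heap. A rough sketch of the idea, with hypothetical names rather than any druntime interface:

    bool[void*] rememberedSet;  // stand-in for a real card table / remembered set

    void gcWriteBarrier(void* parent, void** slot, void* newValue)
    {
        // Record that `parent` may now reference a young-generation object, so a
        // minor collection can find the edge without scanning the old generation.
        rememberedSet[parent] = true;
        *slot = newValue;
    }

    struct Node { Node* next; }

    void main()
    {
        auto a = new Node;
        auto b = new Node;
        // What the compiler would lower `a.next = b;` into under an opt-in barrier:
        gcWriteBarrier(a, cast(void**) &a.next, b);
        assert(a.next is b);
    }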
Jan 25 2022
parent reply Elronnd <elronnd elronnd.net> writes:
On Wednesday, 26 January 2022 at 00:03:26 UTC, rikki cattermole 
wrote:
 After reading my book on GC, you're kinda right.
Thanks for the show of confidence :)
 2) Support a task/fiber aware GC. This will kinda give us a 
 generational GC, without actually being a generational GC.
Thread-local GC is a thing. It is good for false sharing too (with real threads); it can move contended objects away from owned ones. But I see no reason why fibre-local heaps should need to be much different from thread-local heaps. One Java implementation used the high bits of the stack pointer as a thread identifier/TLS pointer/etc.

I would like to see adaptive nursery size. That is good for non-fibre-based web stuff, and also e.g. video games. Imagine: you tell the GC every tick/request, and it tunes the nursery size to the 99th-percentile allocation size per frame. The actual GC is pretty much free since all the stuff you allocated over the course of the frame is gone. Then you have all the safety of the full GC approach and nearly all the performance of the manual arena approach (and much better than malloc/free).
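
A rough sketch of that per-frame idea, assuming the runtime exposed a frame-boundary hook (it does not today, and all names here are invented): allocation is a pointer bump, and the "collection" at the end of a frame is a reset plus a resize toward the observed per-frame demand:

    struct FrameNursery
    {
        private ubyte[] buffer;
        private size_t used;
        private size_t requested;
        private size_t highWater;   // stand-in for a real 99th-percentile estimate

        void[] allocate(size_t n)
        {
            requested += n;         // track demand even when we overflow
            if (used + n > buffer.length)
                return null;        // would fall back to the full GC heap
            auto mem = buffer[used .. used + n];
            used += n;
            return mem;
        }

        // Called once per tick/request: everything allocated during the frame is
        // assumed dead, so "collection" is just a bump-pointer reset.
        void endFrame()
        {
            if (requested > highWater) highWater = requested;
            if (buffer.length < highWater * 2)
                buffer.length = highWater * 2; // grow so the fallback path stays rare
            used = 0;
            requested = 0;
        }
    }

    void main()
    {
        FrameNursery nursery;
        foreach (frame; 0 .. 3)
        {
            nursery.allocate(256); // first frame overflows; later frames are served
            nursery.endFrame();    // from the grown nursery
        }
    }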
Jan 25 2022
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 January 2022 at 06:20:06 UTC, Elronnd wrote:
 Thread-local gc is a thing.  Good for false sharing too (w/real 
 threads); can move contended objects away from owned ones.  But 
 I see no reason why fibre-local heaps should need to be much 
 different from thread-local heaps.
The difference is that you maybe have 8 threads, but maybe 10000 tasks. So in the latter case you cannot let the heap-owner collect its own garbage.
Jan 26 2022
parent reply Elronnd <elronnd elronnd.net> writes:
On Wednesday, 26 January 2022 at 08:20:51 UTC, Ola Fosheim 
Grøstad wrote:
 The difference is that you maybe have 8 threads, but maybe 
 10000 tasks. So in the latter case you cannot let the 
 heap-owner collect its own garbage.
Yes. Good point. The more I think about it, the more I see differences and opportunities to profit from doing things differently.
Jan 26 2022
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 January 2022 at 08:32:44 UTC, Elronnd wrote:
 Yes.  Good point.  The more I think about it, the more I see 
 differences and opportunities to profit from doing things 
 differently.
Yes, if the load is somewhat even and you have 16 (8+8) cores, then you could let 15 tasks run and pick one of the hundreds of others to collect, with little impact on latency. But you need heuristics to pick the one with the most garbage that can be delayed without penalty (e.g. if the task recently started waiting for a network response or is marked as low priority). So even though the situations seem conceptually similar, I think a *good dedicated implementation* would be very different! :-D

Sounds like a fun project to me!!
Jan 26 2022
prev sibling parent reply IGotD- <nise nise.com> writes:
On Wednesday, 26 January 2022 at 06:20:06 UTC, Elronnd wrote:
 Thread-local gc is a thing.  Good for false sharing too (w/real 
 threads); can move contended objects away from owned ones.  But 
 I see no reason why fibre-local heaps should need to be much 
 different from thread-local heaps.
I would like to challenge the idea that a thread-aware GC would do much for performance. Pegging memory to one thread is unusual and often doesn't correspond to reality. Take, for example, a computer game with a large amount of vertex data where you decide to split the workload over several threads: you don't make a thread-local copy of that data but keep the original vertex data global, and even the destination buffer would be global.

What I can think of is a server with one thread per client, with data that no other thread works on. Perhaps there a thread-local GC could be beneficial. My experience is that this thread model isn't good programming, and servers should instead be completely async, meaning any thread might handle the next piece of work.

As I see it, a thread-aware GC doesn't do much for performance but complicates things for the programmer.
Jan 28 2022
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 28 January 2022 at 10:18:32 UTC, IGotD- wrote:
 On Wednesday, 26 January 2022 at 06:20:06 UTC, Elronnd wrote:
 Thread-local gc is a thing.  Good for false sharing too 
 (w/real threads); can move contended objects away from owned 
 ones.  But I see no reason why fibre-local heaps should need 
 to be much different from thread-local heaps.
I would like to challenge the idea that thread aware GC would do much for performance. Pegging memory to one thread is unusual and doesn't often correspond to the reality. For example a computer game with large amount of vertex data where you decide to split up the workload on several threads. You don't make a thread local copy of that data but keep the original vertex data global and even destination buffer would be global. What I can think of is a server with one thread per client with data that no other reason thread works on. Perhaps there thread local GC could be benefitial. My experience is that this thread model isn't good programming and servers should instead be completely async meaning any thread might handle the next partial work. As I see it thread aware GC doesn't do much for performance but complicates it for the programmer.
You can have your cake and eat it too, using something like Pony reference capabilities. The memory isn't copied in practice, just logically, and there is just one owner at a time.

https://tutorial.ponylang.io/reference-capabilities/reference-capabilities.html#the-list-of-reference-capabilities

That is something that would be impossible to put into D's type system without turning it into something else. Also, most key developers on Pony have moved on to either Verona (https://www.microsoft.com/en-us/research/project/project-verona), which carries on these ideas, or Rust (https://www.wallaroo.ai/blog/wallaroo-move-to-rust), so most likely Pony will have a hard time improving itself.
Jan 28 2022
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 28 January 2022 at 11:11:54 UTC, Paulo Pinto wrote:
 You can have your cake and eat it too, using something like 
 Pony capabilities.
Pony is very much a high-level language, though, which has some advantages, such as being able to collect actors that no longer respond to any events. But that is too high level for D.
Jan 28 2022
prev sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Friday, 28 January 2022 at 11:11:54 UTC, Paulo Pinto wrote:
 [snip]

 That is something that would be impossible to put into D's 
 typesystem, without turning it into something else.

 [snip]
Well, D would do it in a D way rather than in a Pony way... for instance, Pony's val is similar to D's immutable, but not the same. The question would be what from Pony's reference capabilities it would make sense to add to D. I think the one with the most obvious benefit would be iso; some people have talked about wanting something like that in the language. It's somewhat different from Rust's borrow checker in that it only allows one mutable alias, whereas Rust allows that or as many const aliases as you want (but not both).
Jan 28 2022
parent Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 28 January 2022 at 15:34:49 UTC, jmh530 wrote:
 On Friday, 28 January 2022 at 11:11:54 UTC, Paulo Pinto wrote:
 [snip]

 That is something that would be impossible to put into D's 
 typesystem, without turning it into something else.

 [snip]
Well D would do it in a D way rather than in a pony way...for instance pony's val is similar to D's immutable but not the same. The question would be what from pony's reference capabilities would it make sense to add to D. I think the one with the most obvious benefit would be iso. Some people have talked about wanting something like that in the language. It's somewhat different from Rust's borrow checker in that it only allows one mutable alias, whereas Rust allows that or as many const aliases as you want (but not both).
Yeah, maybe that would be possible then. Still, better to stabilize D's approach to lifetimes first.
Jan 28 2022
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 28 January 2022 at 10:18:32 UTC, IGotD- wrote:
 On Wednesday, 26 January 2022 at 06:20:06 UTC, Elronnd wrote:
 Thread-local gc is a thing.  Good for false sharing too 
 (w/real threads); can move contended objects away from owned 
 ones.  But I see no reason why fibre-local heaps should need 
 to be much different from thread-local heaps.
I would like to challenge the idea that thread aware GC would do much for performance. Pegging memory to one thread is unusual and doesn't often correspond to the reality. For example a computer game with large amount of vertex data where you decide to split up the workload on several threads. You don't make a thread local copy of that data but keep the original vertex data global and even destination buffer would be global.
Which is why you would want ARC for shared objects and a local GC for tasks/actors. Then what you need, for more flexibility and optimization, is static analysis that determines whether local objects can be turned into shared objects. If that is possible, you could put them in a separate region of the GC heap with space for an RC field at a negative offset.
 What I can think of is a server with one thread per client with 
 data that no other reason thread works on.
It shouldn't be per thread, but per actor/task/fiber.
 My experience is that this thread model isn't good programming 
 and servers should instead be completely async meaning any 
 thread might handle the next partial work.
You have experience with this model? From where?

Actually, it could be massively beneficial if you have short-lived actors and most objects have trivial destructors: then you can simply release the entire local heap with no scanning. You basically get to configure the system to use arena allocators with a GC fallback for out-of-memory situations. That is useful for actors where most of the memory they hold is released towards the end of the actor's lifetime.
 As I see it thread aware GC doesn't do much for performance but 
 complicates it for the programmer.
You cannot discuss performance without selecting a particular realistic application, which is why system-level programming requires multiple choices and configurations if you want automatic memory management. There is simply no model that works well in all scenarios. What is needed for D is to find a combination that works for current high-level-programming D users and also makes automatic memory management more useful in more system-level programming scenarios. Perfect should be considered out of scope.
Jan 28 2022
prev sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Tuesday, 25 January 2022 at 03:37:57 UTC, Elronnd wrote:
 Apropos recent discussion, here is a serious question: would 
 you pay for either of these?
No. No problem => no solution needed.
Jan 26 2022