
digitalmars.D - Incremental garbage collection

reply Chris Katko <ckatko gmail.com> writes:
I just found out that Unity has incremental garbage collection. I 
didn't know that was a possibility for GCs.

https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html

I'm just curious what the D language community's thoughts are on 
it. The tradeoff is: for a longer total time / lower throughput, 
you reduce stress on individual frames, which prevents hiccups. 
That's pretty darn important for games and soft/hard real-time 
systems.

Is that an option on a hypothetical level, for D? Or does D's 
language design preclude that? I recall some sort of issue with 
needing to insert memory barriers into D to work, or something 
like that.

Also, it's hard to find one specific place for "news" regarding 
D's garbage collector development and "omg we deleted everything 
GC from the standard library" projects. Are there any major 
updates on those fronts?
Jan 20
next sibling parent Chris Katko <ckatko gmail.com> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I just found out that Unity has incremental garbage collection. 
 I didn't know that was a possibility for GCs.

 [...]
Oh, also, here's a blog post from when they released it at the end of 2018: https://blog.unity.com/technology/feature-preview-incremental-garbage-collection
Jan 20
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 20, 2022 at 11:56:42PM +0000, Chris Katko via Digitalmars-d wrote:
 I just found out that Unity has incremental garbage collection. I
 didn't know that was a possibility for GCs.
[...] AFAIK, Ruby has had it ages ago. It's not a new idea.
 Is that an option on a hypothetical level, for D? Or does D's language
 design preclude that? I recall some sort of issue with needing to insert
 memory barriers into D to work or something like that.
Write barriers are required for an incremental GC to work. Write barriers are unlikely to be accepted in D because they introduce synchronization overhead per pointer write. This would kill D's performance in low-level code, and I don't think Walter will ever accept this.
 Also, it's hard to find one specific place for "news" regarding D's
 garbage collector development and "omg we deleted everything GC from
 the standard library" projects. Are there any major updates on those
 fronts?
https://dlang.org/blog/the-gc-series/

The GC isn't going anywhere. AFAIK there's some work to eliminate *unnecessary* GC usage in Phobos, et al, but there is no such project to remove *all* GC use from Phobos.

T

-- 
Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel
Jan 20
prev sibling next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I'm just curious what the D language community's thoughts are 
 on it. The tradeoff is: For
 a longer total time / lower throughput, you reduce stress on 
 individual frames which prevents hiccups. That's pretty darn 
 important for games and soft/hard real-time systems.
You can already avoid pauses by simply not allocating. I would prefer to see a generational GC (much higher throughput). Note also that incremental is not a panacea; the mutator can still get ahead of the collector; Go has a somewhat laughable solution to this.

On Friday, 21 January 2022 at 00:55:43 UTC, H. S. Teoh wrote:
 Write barriers are required for an incremental GC to work. 
 Write barriers are unlikely to be accepted in D because they 
 introduce synchonization overhead per pointer write. This would 
 kill D's performance in low-level code, and I don't think 
 Walter will ever accept this.
I have heard this claim before. I don't buy it.

1. Barriers are not that slow. It is not a 'synchronization'; it is a cheap compare-and-branch that will be correctly predicted. (Nobody complains about bounds checking...)
2. You do not need a barrier on _every_ pointer write.
3. It can be a compile-time option to disable it.
4. Lack of typed pointers (gc/non-gc) can be solved conservatively with monomorphization.
Jan 20
parent reply Brad Roberts <braddr puremagic.com> writes:
On 1/20/2022 5:46 PM, Elronnd via Digitalmars-d wrote:
 On Friday, 21 January 2022 at 00:55:43 UTC, H. S. Teoh wrote:
 Write barriers are required for an incremental GC to work. Write 
 barriers are unlikely to be accepted in D because they introduce 
 synchronization overhead per pointer write. This would kill D's 
 performance in low-level code, and I don't think Walter will ever 
 accept this.
I have heard this claim before.  I don't buy it.
This is clearly an area where anecdotes like "Walter will never accept this" aren't useful. There's ample evidence to show that he accepts when he's proven wrong.

To make any progress towards a better GC, it's going to have to be demonstrated. And the proof will justify the changes. If they're good and the complexities required to get there aren't awful, then they'll make it into the code base. If they're not, then they won't. The status quo is easy to get stuck in and helps no one.

Someone with a good background in GCs needs to step up and be its champion. I expect that several people would rally to help once some basic momentum is established. So, willing to be the rallying cry and step up?
Jan 20
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 21/01/2022 3:17 PM, Brad Roberts wrote:
 To make any progress towards a better GC it's going to have to be 
 demonstrated.  And the proof will justify the changes.  If they're good 
 and the complexities required to get there aren't awful, then they'll 
 make into the code base.  If they're not, then they wont.  The status 
 quo is easy to get stuck in and help no one.
 
 Someone with a good background in GC's needs to step up and be it's 
 champion.  I expect that several people would rally to help once some 
 basic momentum is established.  So, willing to be the rallying cry and 
 step up?
I've recently bought (but am waiting on delivery of) The Garbage Collection Handbook. It covers generational GCs, memory allocators (which is what I'm interested in), and some 13 pages on write barriers.

It would be an interesting experiment to add a -enablewb flag and have it enable more GC options at runtime, to see if this allowed some better-fitting GCs.
Jan 20
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 04:17:53 UTC, rikki cattermole 
wrote:
 I've recently bought (but waiting on delivery) The Garbage 
 Collection Handbook. It covers generational GC's, memory 
 allocators (which is what I'm interested in) and like 13 pages 
 on write barriers.
I've browsed through it. The writing style is very accessible, I think. It also has a chapter on real time garbage collection ("wait-free" collection), but I guess that requires a different take on the language-runtime. Some also require OS-support, I guess.
 It would be an interesting experiment to see if we added a 
 -enablewb flag and have it enable more GC options at runtime, 
 if this allowed some better fitting GC's.
I think many programmers would shoot themselves in the foot in @nogc code... You would need very good documentation and tutorials for that to be a reasonable option.
Jan 21
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 21/01/2022 10:34 PM, Ola Fosheim Grøstad wrote:
 It would be an interesting experiment to see if we added a -enablewb 
 flag and have it enable more GC options at runtime, if this allowed 
 some better fitting GC's.
I think many programmers would shoot themselves in the foot in @nogc code... You would need very good documentation and tutorials for that to be a reasonable option.
Yeah, in practice this may require UDAs and all sorts of things to guarantee good speeds, with very easy ways to segfault. Not necessarily a direction D as a whole wants to go in, hence opt-in rather than opt-out.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 09:48:48 UTC, rikki cattermole 
wrote:
 Not necessarily something D as a whole wants to go in, hence 
 opt-in rather than opt-out.
If you are looking for niche options then there are many approaches you can look at. Here is a Boehm-like collector for real time audio called *Rollendurchmesserzeitsammler*: http://users.notam02.no/~kjetism/rollendurchmesserzeitsammler/ It would work for D, I guess.
Jan 21
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 21/01/2022 11:58 PM, Ola Fosheim Grøstad wrote:
 It would work for D, I guess.
Not likely, it would need a lot of work to bring that up to today's requirements.

For context:

 A heap size of less than 1 megabyte may work very well. The cost of 
 duplicating a heap of 100.000 bytes only takes up about 2% extra CPU 
 time on my single-processor personal computer from 2002-2003 running 
 a blocksize of 128. (Note that a more common blocksize of 1028 would 
 yield a CPU usage of 1/8 of 2%). Furthermore, since today's computers 
 are many times faster than this, especially those with 4 or 8 cores 
 (which the garbage collector is able to fully utilize (well, at least 
 that's the plan)), the number of bytes in the heap can probably be 
 multiplied many times. As long as the heap is reasonably small, which 
 it probably should be while doing signal processing tasks, duplicating 
 the entire heap should not be a costly operation on current personal 
 computers. On a multiprocessor machine, using heaps of many megabytes 
 should not be a problem. I've also successfully tried using a 20MB 
 heap on a dual-core Intel machine.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 11:06:58 UTC, rikki cattermole 
wrote:
 On 21/01/2022 11:58 PM, Ola Fosheim Grøstad wrote:
 It would work for D, I guess.
Not likely, it would need a lot of work to bring that up to today's requirements.
What are the requirements? If it is "niche", then that is application specific. The GC is for audio-work. IIRC it aims to get close to hard real time without affecting the real time thread, so it is designed to use 20% of the time that it takes to fill one audio buffer by taking a copy of the pointer heap? Makes sense to me for that particular niche. Might also work for embedded with the caveat that you need extra memory.
Jan 21
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 22/01/2022 12:27 AM, Ola Fosheim Grøstad wrote:
 On Friday, 21 January 2022 at 11:06:58 UTC, rikki cattermole wrote:
 On 21/01/2022 11:58 PM, Ola Fosheim Grøstad wrote:
 It would work for D, I guess.
Not likely, it would need a lot of work to bring that up to today's requirements.
What are the requirements?
That project was a master's thesis in 2009 running on 2002-era hardware. The size of data has changed; hardware has changed. It'll need work to prove it can even handle the sorts of workloads D has, even in the audio space.
Jan 21
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 11:38:10 UTC, rikki cattermole 
wrote:
 On 22/01/2022 12:27 AM, Ola Fosheim Grøstad wrote:
 On Friday, 21 January 2022 at 11:06:58 UTC, rikki cattermole 
 wrote:
 On 21/01/2022 11:58 PM, Ola Fosheim Grøstad wrote:
 It would work for D, I guess.
Not likely, it would need a lot of work to bring that up to today's requirements.
What are the requirements?
That project was a master's thesis in 2009 running on 2002-era hardware.
Yes, but Notam and CCRMA tend to attract programmers who take audio seriously (electro-acoustic computer music). So, I would not dismiss it because it is a master's thesis.
 Size of data has changed, hardware has changed. It'll need work 
 to prove it can even do the sort of workloads D has even in 
 audio space.
Well, in his master's thesis he covers the Boehm copy-heap collector and a collector based on barriers (which seems to use a lot of memory, but he claims it is essentially wait-free, IIRC): https://www.duo.uio.no/handle/10852/8805

I haven't read the details, and maybe chapter 19 in the GC handbook is more interesting for you, but it is at least an example of a practical real-time-oriented approach to garbage collection. Which is in some sense better than the high-flying theories about what can be done in the GC department for D that tend to "poison" the forum discussions on the topic.
Jan 21
prev sibling next sibling parent reply user1234 <user1234 12.de> writes:
On Friday, 21 January 2022 at 11:27:51 UTC, Ola Fosheim Grøstad 
wrote:
 The GC is for audio-work. IIRC it aims to get close to hard 
 real time without affecting the real time thread, so it is 
 designed to use 20% of the time that it takes to fill one audio 
 buffer by taking a copy of the pointer heap?
No. The theory is that GC is not *that much a handicap* for real audio because it's *not really real-time*. It is not that "The GC is for audio-work". That theory is just totally wrong.
Jan 21
next sibling parent user1234 <user1234 12.de> writes:
On Friday, 21 January 2022 at 11:47:05 UTC, user1234 wrote:
 On Friday, 21 January 2022 at 11:27:51 UTC, Ola Fosheim Grøstad 
 wrote:
 The GC is for audio-work. IIRC it aims to get close to hard 
 real time without affecting the real time thread, so it is 
 designed to use 20% of the time that it takes to fill one 
 audio buffer by taking a copy of the pointer heap?
No. The theory is that GC is not *that much a handicap* for real audio
sorry, "real-time audio" was meant here. just in case of.
Jan 21
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 11:47:05 UTC, user1234 wrote:
 No. The theory is that GC is not *that much a handicap* for 
 real audio because it's *not really real-time*. It is not that 
 "The GC is for audio-work". That theory is just totally wrong.
It is possible to make it hard real time by imposing a max heap size, which is reasonable for audio. You need to use a garbage collector of some sort, either language-level or application-level, if your application allows the user to build a graph. Which some creative audio applications do (including audio programming languages). If you can get away with using a language-level collector, then that will be the cheaper option. To get there you need to take a niche approach like he does.
Jan 21
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Friday, 21 January 2022 at 11:27:51 UTC, Ola Fosheim Grøstad 
wrote:
 Makes sense to me for that particular niche. Might also work 
 for embedded with the caveat that you need extra memory.
While it must be interesting, another solution you have in D is to deregister audio threads so that a GC pause would let them run free. The audio threads rarely own GC roots anyway.

Despite popular conception, it's indeed common to allocate from the audio thread at the first buffer; it can avoid some pessimization depending on incoming parameters that are only known at play time. Amortized malloc is surprisingly cheap, you can merge allocation buffers, and you would do it only once at initialization. A lot of audio buffers are also scoped in nature, with a lifetime that is that of the processing treatment. So you can also leverage specialized allocators to mass-release those buffers rapidly.

In Dplug plugins we got a GC back :) by integrating Wren as a UI scripting language. It is expected to cause very few problems, as it has a max heap size and doesn't touch audio.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 13:00:31 UTC, Guillaume Piolat 
wrote:
 On Friday, 21 January 2022 at 11:27:51 UTC, Ola Fosheim Grøstad 
 wrote:
 Makes sense to me for that particular niche. Might also work 
 for embedded with the caveat that you need extra memory.
While it must be interesting, another solution you have in D is to deregister audio threads so that a GC pause would let them run free. The audio threads rarely own GC roots anyway.
Yes, for a plugin that is reasonable. I think an audio-friendly garbage collector would be more for a language-like application like Max, or maybe even a complex sound editor. But even if you do make your realtime tasks independent of the GC, it is still better to have a collector that is throttling the collection in the background so it doesn't wipe out the caches or saturate the data bus. Then you get more headroom for your realtime tasks (you get a more even load over time).
 Amortized malloc is surprisingly cheap, you can merge 
 allocation buffers, and you would do it only once at 
 initialization. A lot of the audio buffers are also scoped in 
 nature, with a lifetime that is the one of the processing 
 treatment. So you can also leverage specialized allocators to 
 mass release those buffers rapidly.
Hm, how would you use amortized algorithms in a real-time thread where you want an even load on every callback? I guess you could use "coroutines" and spread the load over many "interrupts"/"callbacks".

One strategy for dynamic allocation is to have "chunks" for same-sized allocations (avoiding fragmentation and merging), and then do a worst-case estimate of how many allocations can happen in one realtime callback. Then a non-realtime thread can make sure that there is enough free space in the chunks to support N callbacks.

Another strategy I have used, which may work if you do not write a plugin but create your own host, is to pass in the buffers needed in the events you send to the realtime thread. Then have a mailbox for returning the buffers and let a non-realtime thread do the cleanup.
 In Dplug plugins we got a GC back :) by integrating Wren as UI 
 scripting language. It is expected to cause very few problems, 
 as it has a max-heap size and doesn't touch audio.
Is that a dedicated GC for wren or is it a generic GC?
Jan 21
next sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Friday, 21 January 2022 at 18:53:03 UTC, Ola Fosheim Grøstad 
wrote:
 Is that a dedicated GC for wren or is it a generic GC?
Wren uses its own non-moving, non-incremental, precise, stop-the-world mark&sweep GC.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 20:17:16 UTC, Guillaume Piolat 
wrote:
 On Friday, 21 January 2022 at 18:53:03 UTC, Ola Fosheim Grøstad 
 wrote:
 Is that a dedicated GC for wren or is it a generic GC?
Wren uses its own non-moving, non-incremental, precise, stop-the-world mark&sweep GC.
But it only stops the Wren code, so it does not affect the real-time properties of the main program?

I think this is the better approach for system-level programming. Have dedicated memory management where you get to control the properties and priorities, and also have the ability to add new mechanisms (like purging cached objects). IIRC, the author of the audio-oriented GC suggested implementing the collector as a coroutine, so that you could execute it when the CPU is idle, in a controlled manner. There are many ways to be inventive here if you focus on adapting to the specifics of the application. There is no one-size-fits-all solution.

So, the main issue is not GC vs no-GC, but whether the language/libraries lock you into the specifics of a generic runtime.
Jan 22
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 23/01/2022 2:03 AM, Ola Fosheim Grøstad wrote:
 So, the main issue is not GC vs no-GC, but whether the 
 language/libraries lock you into the specifics of a generic runtime.
Having a decent allocator library at the runtime level available for all D code (not just for those with druntime) will be key to allowing this to happen.
Jan 22
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Saturday, 22 January 2022 at 13:19:57 UTC, rikki cattermole 
wrote:
 Having a decent allocator library at the runtime level 
 available for all D code (not just for those with druntime) 
 will be key to allowing this to happen.
I think some language adjustments are needed, but it could be a dialect. It doesn't have to apply to DMD, as that seems to be locked down at this point. You basically need some room to experiment to get it right, and that requires freedom (which will never happen if everyone in the forums is going to have their say).

So whether you want to focus on ARC, actors/fibers with local GC, or concurrent GC, you should be able to adjust the language so that your vision can be enabled in a clean fashion. I don't think having dialects is a big issue. I think it is the only way forward for automatic memory management + system-level programming. You can design your dialect such that most straightforward single-threaded code runs with next to no changes.

It is either that or another language will grab the user base that wants it.
Jan 22
prev sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Saturday, 22 January 2022 at 13:03:26 UTC, Ola Fosheim Grøstad 
wrote:
 Wren use its own non-moving, non-incremental, precise, 
 stop-the-world mark&sweep GC.
But it only stops the wren-code, so it does not affect real time properties of the main program?
No, because (roughly speaking) UI callbacks are in event loops / timers separated from the audio callbacks, and Wren is called from these UI callbacks. So you can take as much time as you want to process a click, or in WM_PAINT; it will never affect audio playback (well, it will if you spawn many demanding threads from there). To support that, windowing in desktop OSes separates invalidation from repaint.
Jan 22
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Friday, 21 January 2022 at 18:53:03 UTC, Ola Fosheim Grøstad 
wrote:
 Hm, how would you use amortized algorithms in a real time 
 thread where you want even load on every callback?
Well, I was using the term incorrectly; "amortized" in the sense that the large majority of callbacks will not allocate, so in other words, the opposite :)

pushack_back leading to realloc can make a lot more trouble because it moves large amounts of memory. Delay lines may need to be allocated with a max size / pessimized for that reason. In game-mixer I had to use chunked allocation to avoid that while decoding an audio file during playback, in the case where you want to keep it in memory.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 20:22:15 UTC, Guillaume Piolat 
wrote:
 Well I was using the term incorrectly, "amortized" in the sense 
 the large majority of callback will not allocate, so in other 
 words, the opposite :)
Yes, that is different, but actually using a stackless coroutine can be a solution to spread out the load when you need to. In D I guess you could do it as a state machine using a switch statement. I've never done this in practice though…
 pushack_back leading to realloc can make a lot more trouble 
 because it moves large amount of memory. Delay lines may need 
 to be allocated with a max size / pessimized for that reason.
What is pushack_back? You can also build a delay with a chain of smaller fixed size buffers, although a bit unusual perhaps. If you need a filter you can do that in a smaller buffer. So many implementation options!
Jan 21
parent Guillaume Piolat <first.last gmail.com> writes:
On Friday, 21 January 2022 at 21:29:06 UTC, Ola Fosheim Grøstad 
wrote:
 What is pushack_back?
push_back*, i.e. appending to a vector-like structure.
Jan 22
prev sibling next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 To make any progress towards a better GC it's going to have to 
 be demonstrated.  And the proof will justify the changes.  If 
 they're good and the complexities required to get there aren't 
 awful, then they'll make into the code base.  If they're not, 
 then they wont.  The status quo is easy to get stuck in and 
 help no one.

 Someone with a good background in GC's needs to step up and be 
 its champion.  I expect that several people would rally to help 
 once some basic momentum is established.  So, willing to be the 
 rallying cry and step up?
I agree. But: it is an involved task that will likely not bear fruit until it is nearly done; I do not expect an implementation would perform very well until it performed the optimizations I mentioned elsewhere. (Well, a first step would be to make the precise gc actually precise, which would benefit everybody; but after that...)

For my part, I think it would be a fun project, but it would not be a high enough priority for me to see it through unless somebody were paying me to do it full time (and who would pay for an 'unproven' idea? It is exactly the problem you identify). Maybe rikki will be the champion :)
Jan 20
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 21/01/2022 5:48 PM, Elronnd wrote:
 Maybe rikki will be the champion :)
I've considered it, but it wouldn't align with what I am doing with D atm. What I'd also like to see is fiber-aware GC hooks and a configurable memory allocator.
Jan 20
prev sibling next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 On 1/20/2022 5:46 PM, Elronnd via Digitalmars-d wrote:
 On Friday, 21 January 2022 at 00:55:43 UTC, H. S. Teoh wrote:
 Write barriers are required for an incremental GC to work. 
 Write barriers are unlikely to be accepted in D because they 
 introduce synchronization overhead per pointer write. This 
 would kill D's performance in low-level code, and I don't 
 think Walter will ever accept this.
I have heard this claim before.  I don't buy it.
This is clearly an area where anecdotes like "Walter will never accept this" aren't useful. There's ample evidence to show that he accepts when he's proven wrong. To make any progress towards a better GC, it's going to have to be demonstrated. And the proof will justify the changes. If they're good and the complexities required to get there aren't awful, then they'll make it into the code base. If they're not, then they won't. The status quo is easy to get stuck in and helps no one. Someone with a good background in GCs needs to step up and be its champion. I expect that several people would rally to help once some basic momentum is established. So, willing to be the rallying cry and step up?
To be honest, that is something that is more fun to do in the Go, .NET and Java communities, because not only does no one care about the mantra that a GC is supposed to be verboten in a systems programming language, there are hardware companies shipping real products with them. So in those communities the whole discussion isn't whether GC belongs in a systems programming language, rebooted every couple of months, but rather how to improve it to the point that e.g. writing firmware in Go is a reality that one can ship in a commercial product (USB Armory).
Jan 20
prev sibling next sibling parent reply IGotD- <nise nise.com> writes:
On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 This is clearly an area where anecdotals like "Walter will 
 never accept this" aren't useful.  There's ample evidence to 
 show that he accepts when he's proven wrong.

 To make any progress towards a better GC it's going to have to 
 be demonstrated.  And the proof will justify the changes.  If 
 they're good and the complexities required to get there aren't 
 awful, then they'll make into the code base.  If they're not, 
 then they wont.  The status quo is easy to get stuck in and 
 help no one.

 Someone with a good background in GC's needs to step up and be 
 it's champion.  I expect that several people would rally to 
 help once some basic momentum is established.  So, willing to 
 be the rallying cry and step up?
I've been nagging about this for ages, it feels. The problem is that D has no managed pointers. If it did, it would open up a myriad of GC possibilities for the D community to experiment with different GC algorithms, while at the same time keeping the possibility that a managed pointer is just a plain pointer, like today. Today it doesn't matter if we have world-class GC experts in the D community, as the language itself limits what you can implement.

D suffers from what can be seen in several other inventions: the inventor thinks the invention is perfect and does not want to develop it further even if the shortcomings are obvious. Some other inventor comes along, drastically improves the design, and the product really shoots off in popularity.
Jan 21
parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 10:00:17 UTC, IGotD- wrote:
 The problem is that D has no managed pointers.
Again, this can be solved conservatively at the implementation-level.
Jan 21
next sibling parent reply IGotD- <nise nise.com> writes:
On Friday, 21 January 2022 at 10:13:17 UTC, Elronnd wrote:
 Again, this can be solved conservatively at the 
 implementation-level.
Basically no. D has a lot of standard library types in druntime/Phobos, like arrays and other containers. Even if you implement your own GC at the implementation level, the standard library types will still use the default GC, with GC properties you probably don't want. Managed pointers would enable you to swap out the entire default GC altogether, including GC usage in the standard library.
Jan 21
parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 10:42:58 UTC, IGotD- wrote:
 Again, this can be solved conservatively at the 
 implementation-level.
Basically no. Managed pointers enable you to swap out the entire default GC altogether, including GC usage in the standard library.
I guess I was insufficiently clear. My bad. I am talking about a mechanism that works without needing to modify any existing code. Let's say we have this:

void f(int** x) {
	*x = new int;
}
void g() {
	int** x = new int*;
	f(x);
	int* y;
	f(&y);
}

and we want to add a generational GC, with barriers. So naively we rewrite f as follows:

void f(int** x) {
	store_barrier(x, new int);
}

This will involve overhead because we don't know if x is a GC pointer or not. So instead, generate multiple definitions and rewrite callers:

void f_gc(int** x) {
	store_barrier(x, new int);
}
void f_nogc(int** x) {
	*x = new int; //no barrier!
}
void g() {
	int** x = new int*;
	f_gc(x);
	int* y;
	f_nogc(&y); //no overhead from passing stack object!
}

Of course this transformation is conservative: you will sometimes call f_gc with a non-gc pointer. But I bet it works 99% of the time such that the runtime overhead is negligible in practice.
Jan 21
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 21, 2022 at 11:12:01AM +0000, Elronnd via Digitalmars-d wrote:
[...]
 void f(int** x) {
 	*x = new int;
 }
 void g() {
 	int** x = new int*;
 	f(x);
 	int* y;
 	f(&y);
 }
 
 and we want to add a generational gc, with barriers.  So naively we
 rewrite f as follows:
 
 void f(int** x) {
 	store_barrier(x, new int);
 }
 
 This will involve overhead because we don't know if x is a gc pointer or
 not.  So instead, generate multiple definitions and rewrite callers:
 
 void f_gc(int** x) {
 	store_barrier(x, new int);
 }
 void f_nogc(int** x) {
 	*x = new int; //no barrier!
 }
 void g() {
 	int** x = new int*;
 	f_gc(x);
 	int* y;
 	f_nogc(&y); //no overhead from passing stack object!
 }
So basically *every* function that receives pointers in some way (could be implicitly via built-in slices, for example) will essentially be implicitly templatized? That will lead to a LOT of template bloat. I'm not sure if this is a good idea.
 Of course this transformation is conservative: you will sometimes call
 f_gc with a non-gc pointer.  But I bet it works 99% of the time such
 that the runtime overhead is negligible in practice.
What we need is a working proof-of-concept implementation that we can actually benchmark and compare with the current situation, so that we can make decisions based on actual data. Otherwise, it's just your word against mine and my word against yours, which gets us nowhere. T -- If blunt statements had a point, they wouldn't be blunt...
Jan 21
prev sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 10:13:17 UTC, Elronnd wrote:
 On Friday, 21 January 2022 at 10:00:17 UTC, IGotD- wrote:
 The problem is that D has no managed pointers.
Again, this can be solved conservatively at the implementation level.
It could, in theory, if you change the semantics of @nogc and annotate everything with @nogc that the collector must leave alone. I suspect that some of the existing @nogc code would have to be rewritten. But this approach is very close to introducing a new pointer type.

You also need to prevent scannable pointers from being available through unions, so a breaking language change is practically unavoidable if you want to make a good GC collector for the language.

There is also no way for D to force barriers on linked code, hence the need for a breaking change at the language level. (I assume many approaches are possible, but all require a breaking change.)
Jan 21
parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 12:26:42 UTC, Ola Fosheim Grøstad 
wrote:
 It could, in theory, if you change the semantics of @nogc and 
 annotate everything with @nogc that the collector must leave 
 alone.
You do not. See my other post where I clarified what I meant with this approach.
 You also need to prevent scannable pointers to be available 
 through unions
You do not. You do need to do _something_ about unions, but you do not need to disallow pointers in them. The spec says to pin objects which are in unions, which is a perfectly acceptable solution. There are also cleverer solutions which violate that clause of the spec but do not break any actual code.
 There is also no way for D to force barriers on linked code, 
 hence the need for a breaking language level change at some 
 level.
You can do it with name mangling. D is not binary compatible between releases.

------------------------------------

Bottom line: no source-level changes are necessary.
Jan 21
next sibling parent Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 12:52:02 UTC, Elronnd wrote:
 You do need to do _something_ about unions, but you do need to 
 disallow pointers in them.
Err, this should read: 'you do _not_ need to disallow...'
Jan 21
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 12:52:02 UTC, Elronnd wrote:
 You do not.  You do need to do _something_ about unions, but 
 you do not need to disallow pointers in them.  The spec says to pin 
 objects which are in unions, which is a perfectly acceptable 
 solution.  There are also cleverer solutions which violate that 
 clause of the spec but do not break any actual code.
No, pinning is not a solution; that is a band-aid for a broken design. You'll end up with memory leaks all over the place.

Regarding your suggestion: this is not as trivial as you imply. There is no way today for the type system to know where a pointer points, so there is also no way to know whether you need to apply a barrier or not. Since non-D libraries can mutate pointers, there is also no way to be sure that they only modify pointers that do not require barriers. As a result you risk having spurious bugs that are impossible to track down.

You need some kind of language change to make the type system work properly with barriers.
Jan 21
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
  On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 To make any progress towards a better GC it's going to have to 
 be demonstrated.
The GC is not as slow and annoying as it used to be. When I went to DConf 2019 I was surprised to learn that it had been sped up during the year (with both the fork-based GC and speed-up efforts by Rainer, IIRC), but this wasn't advertised anywhere.

It's easy to be under the impression that the GC is the same as 10 years ago when it's not written anywhere what changes happened; you'd have to peruse all the release notes. Who knows the real timeline of GC improvement in D? What happened?
Jan 21
next sibling parent reply user1234 <user1234 12.de> writes:
On Friday, 21 January 2022 at 12:45:42 UTC, Guillaume Piolat 
wrote:
  On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 To make any progress towards a better GC it's going to have to 
 be demonstrated.
 The GC is not as slow and annoying as it used to be. When I went to DConf 2019 I was surprised to learn that it had been sped up during the year (with both the fork-based GC and speed-up efforts by Rainer, IIRC), but this wasn't advertised anywhere.
I agree. The GC run-time options are poorly documented. It's written nowhere that druntime can take specific arguments that change the GC behavior.
Jan 21
parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Friday, 21 January 2022 at 12:51:11 UTC, user1234 wrote:
 I agree. The GC run-time options are poorly documented. It's 
 written nowhere that druntime can take specific arguments that 
 change the GC behavior.
the page dlang.org/garbage has a header called "nowhere": https://dlang.org/spec/garbage.html#gc_config
Jan 21
parent user1234 <user1234 12.de> writes:
On Friday, 21 January 2022 at 13:10:46 UTC, Adam D Ruppe wrote:
 On Friday, 21 January 2022 at 12:51:11 UTC, user1234 wrote:
 I agree. The GC run-time options are poorly documented. It's 
 written nowhere that druntime can take specific arguments that 
 change the GC behavior.
the page dlang.org/garbage has a header called "nowhere": https://dlang.org/spec/garbage.html#gc_config
Okay, it's better nowadays; I've updated my point of view.
Jan 21
prev sibling parent Mike Parker <aldacron gmail.com> writes:
On Friday, 21 January 2022 at 12:45:42 UTC, Guillaume Piolat 
wrote:
 The GC is not as slow and annoying as it used to be. When I 
 went to DConf 2019 I was surprised to learn that it had been 
 sped up during the year (with both the fork-based GC and 
 speed-up efforts by Rainer, IIRC), but this wasn't advertised 
 anywhere. It's easy to be under the impression that the GC is 
 the same as 10 years ago when it's not written anywhere what 
 changes happened; you'd have to peruse all the release notes. 
 Who knows the real timeline of GC improvement in D? What 
 happened?
IIRC, Rainer's work on the GC was discussed in the forums at the time. The fork-based GC was a SAOC 2018 project, but it wasn't actually merged until last fall. [I announced it on the blog](https://dlang.org/blog/2021/10/29/dlang-news-september-october-2021-d-2-098-0-openbsd-saoc-dconf-online-swag/) and [in a YouTube video](https://youtu.be/jX9grHMTGAU?t=229).
Jan 21
prev sibling next sibling parent Era Scarecrow <rtcvb32 yahoo.com> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I just found out that Unity has incremental garbage collection. 
 I didn't know that was a possibility for GCs.

 https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html

 I'm just curious what the D language community's thoughts are 
 on it. The tradeoff is: For a longer total time / lower 
 throughput, you reduce stress on individual frames which 
 prevents hiccups. That's pretty darn important for games and 
 soft/hard real-time systems.
I would say it really depends on a handful of things: how much allocating/deallocating you're actually doing, whether fragmentation would be an issue, and whether collecting more often would help. I try to write code using the stack rather than allocating for small temporary items, which frees itself when it leaves the function. Though, having the GC do its work while waiting on the OS to reply would be a great time for it.

As for handling fragmentation, I'd almost wish to make a different type of allocator which gives you an id, and you use the id+offset to get the actual address (*though to make it work it would need a new type of slice that handles those details transparently*). Then periodically (*say every 50ms or something*) it would do a quick check/scan for a hole, and then move data and adjust the values of said ids to remove empty holes and make it as tight as possible. I'm also not sure how well it would work in multi-threaded applications, though having the GC lock the ids while it moves them and then unlock *should* handle those details. This of course would only be needed if you have limited/preallocated memory and fragmentation (*due to some parts being too small*) could kill the process. Otherwise it would probably be a lot of busywork.

To note, while I love contemplating this stuff, I'm not quite sure how to implement it all myself. I don't have the same mental fortitude I had when I was 14.
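The id+offset idea above can be sketched as a handle-based compacting arena. This is a toy C++ illustration, not anything from druntime: the class name, sizes, and single-threaded design are all made up, and a real version would need the locking and slice-wrapper machinery the post mentions.

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>
#include <vector>
#include <cassert>

// Toy compacting arena: callers hold stable ids (plus an offset),
// never raw addresses. Compaction slides live blocks left and fixes
// up one table instead of every pointer in the program.
class HandleArena {
    std::vector<uint8_t> heap;
    struct Block { size_t off; size_t len; bool live; };
    std::vector<Block> blocks;           // index == handle id
    size_t top = 0;                      // bump-allocation watermark
public:
    explicit HandleArena(size_t bytes) : heap(bytes) {}

    size_t alloc(size_t len) {           // returns a stable id
        blocks.push_back({top, len, true});
        top += len;
        return blocks.size() - 1;
    }
    void free(size_t id) { blocks[id].live = false; }

    // Translate id+offset to a real (temporary!) address; the address
    // is only valid until the next compact().
    uint8_t* at(size_t id, size_t off) { return &heap[blocks[id].off + off]; }

    // Remove holes: slide live blocks left and patch the table.
    // Ids held by callers stay valid across the move.
    void compact() {
        size_t dst = 0;
        for (auto& b : blocks) {
            if (!b.live) continue;
            std::memmove(&heap[dst], &heap[b.off], b.len);
            b.off = dst;
            dst += b.len;
        }
        top = dst;
    }
    size_t used() const { return top; }
};
```

The design choice here is the classic one from handle-based systems (e.g. old Mac OS memory management): one extra indirection on every access buys the collector the freedom to move memory without scanning for pointers.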
Jan 21
prev sibling next sibling parent Bienlein <jeti789 web.de> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I just found out that Unity has incremental garbage collection. 
 I didn't know that was a possibility for GCs.

 https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html
Are you kidding? The incremental GC based on the train algorithm (see https://wiki.c2.com/?TheTrainAlgorithm) that was earlier part of the JVM has meanwhile been removed from it, since there are newer, better ones.
Jan 26
prev sibling parent Gregor =?UTF-8?B?TcO8Y2ts?= <gregormueckl gmx.de> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I just found out that Unity has incremental garbage collection. 
 I didn't know that was a possibility for GCs.

 https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html

 I'm just curious what the D language community's thoughts are 
 on it. The tradeoff is: For
 a longer total time / lower throughput, you reduce stress on 
 individual frames which prevents hiccups. That's pretty darn 
 important for games and soft/hard real-time systems.

 Is that an option on a hypothetical level, for D? Or does D's 
 language design preclude that. I recall some sort of issue with 

 have to insert memory barriers into D to work or something like 
 that.

 Also, it's hard to find one specific place for "news" regarding 
 D's garbage collector development and "omg we deleted 
 everything GC from the standard library" projects. Are there 
 any major updates on those fronts?
Just to add to this for completeness' sake: a while ago I spent a few hours digging through Unreal Engine source code to understand their GC algorithm. It's a peculiar and interesting take on iterative, precise, conservative garbage collection:

- First of all, UE4 has precise runtime reflection information for all data types that are handled via the GC (everything derived from UObject, which is mostly game logic code). Heap scans are optimized aggressively based on this data. Pointers are regular naked C++ pointers without any additional magic. They need to be annotated for the custom preprocessor that generates the reflection data, but that's it. The C++ code touching these pointers is plain old code using naked pointers. No hidden wrappers there.

- The algorithm is basically mark and sweep with a couple of awful situational hacks that stem from weird engine subsystem interactions. I'll gloss over those. The GC performs a multistep GC process. Each frame, the engine prompts the GC to run one step in a stop-the-world fashion.

- Each step, the GC can either run an *entire* mark phase or a single step of the sweep phase. The mark step is always run in its entirety, never in multiple steps. The sweep phase is broken up into steps that free a limited number of unmarked objects in each step. The rest is deferred to the next step. When the sweep phase is done, the next GC step will be a new mark phase.

IMO, this is a very pragmatic approach. But for the runtime to be truly bounded, the heap size needs to be bounded, too. I assume that this works for UE4 because for most games using the engine, the number of objects involved in the game logic at any point is fairly small (on the order of a few hundred). Stopping the world for the mark phase is also a major simplification. This avoids the overhead typically incurred by GC algorithms that do incremental marking. Also, GC references can be treated just like regular pointers.
I don't know how the D garbage collector runtime is split between mark and sweep phases. If the sweep phase is indeed the slower phase, an option to have that run iteratively might be enticing. I would naively assume that this would be lower hanging fruit compared to a generational and/or concurrent GC. But then again, I don't have a need for any of that at the moment.
Jan 29