
digitalmars.D - Incremental garbage collection

reply Chris Katko <ckatko gmail.com> writes:
I just found out that Unity has incremental garbage collection. I 
didn't know that was a possibility for GCs.

https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html

I'm just curious what the D language community's thoughts are on 
it. The tradeoff is: for a longer total time / lower throughput, 
you reduce stress on individual frames, which prevents hiccups. 
That's pretty darn important for games and soft/hard real-time 
systems.

Is that an option on a hypothetical level, for D? Or does D's 
language design preclude that? I recall some sort of issue with 
needing to insert memory barriers into D to work, or something 
like that.

Also, it's hard to find one specific place for "news" regarding 
D's garbage collector development and "omg we deleted everything 
GC from the standard library" projects. Are there any major 
updates on those fronts?
Jan 20
next sibling parent Chris Katko <ckatko gmail.com> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I just found out that Unity has incremental garbage collection. 
 I didn't know that was a possibility for GCs.

 [...]
Oh, also, here's a blog post from when they released it at the end of 2018: https://blog.unity.com/technology/feature-preview-incremental-garbage-collection
Jan 20
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 20, 2022 at 11:56:42PM +0000, Chris Katko via Digitalmars-d wrote:
 I just found out that Unity has incremental garbage collection. I
 didn't know that was a possibility for GCs.
[...] AFAIK, Ruby has had it ages ago. It's not a new idea.
 Is that an option on a hypothetical level, for D? Or does D's language
 design preclude that? I recall some sort of issue with needing to insert
 memory barriers into D to work or something like that.
Write barriers are required for an incremental GC to work. Write barriers are unlikely to be accepted in D because they introduce synchronization overhead per pointer write. This would kill D's performance in low-level code, and I don't think Walter will ever accept this.
 Also, it's hard to find one specific place for "news" regarding D's
 garbage collector development and "omg we deleted everything GC from
 the standard library" projects. Are there any major updates on those
 fronts?
https://dlang.org/blog/the-gc-series/

The GC isn't going anywhere. AFAIK there's some work to eliminate *unnecessary* GC usage in Phobos, et al, but there is no such project to remove *all* GC use from Phobos.

T

-- 
Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel
Jan 20
prev sibling next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I'm just curious what the D language community's thoughts are 
 on it. The tradeoff is: For
 a longer total time / lower throughput, you reduce stress on 
 individual frames which prevents hiccups. That's pretty darn 
 important for games and soft/hard real-time systems.
You can already avoid pauses by simply not allocating. I would prefer to see a generational GC (much higher throughput). Note also that incremental is not a panacea; the mutator can still get ahead of the collector; Go has a somewhat laughable solution to this.

On Friday, 21 January 2022 at 00:55:43 UTC, H. S. Teoh wrote:
 Write barriers are required for an incremental GC to work. 
 Write barriers are unlikely to be accepted in D because they 
 introduce synchonization overhead per pointer write. This would 
 kill D's performance in low-level code, and I don't think 
 Walter will ever accept this.
I have heard this claim before. I don't buy it.

1. Barriers are not that slow. It is not a 'synchronization'; it is a cheap compare-and-branch that will be correctly predicted. (Nobody complains about bounds checking...)
2. You do not need a barrier on _every_ pointer write.
3. It can be a compile-time option to disable it.
4. Lack of typed pointers (gc/non-gc) can be solved conservatively with monomorphization.
Jan 20
parent reply Brad Roberts <braddr puremagic.com> writes:
On 1/20/2022 5:46 PM, Elronnd via Digitalmars-d wrote:
 On Friday, 21 January 2022 at 00:55:43 UTC, H. S. Teoh wrote:
 Write barriers are required for an incremental GC to work. Write 
 barriers are unlikely to be accepted in D because they introduce 
 synchronization overhead per pointer write. This would kill D's 
 performance in low-level code, and I don't think Walter will ever 
 accept this.
I have heard this claim before.  I don't buy it.
This is clearly an area where anecdotes like "Walter will never accept this" aren't useful. There's ample evidence to show that he accepts when he's proven wrong.

To make any progress towards a better GC, it's going to have to be demonstrated. And the proof will justify the changes. If they're good and the complexities required to get there aren't awful, then they'll make it into the code base. If they're not, then they won't. The status quo is easy to get stuck in and helps no one.

Someone with a good background in GCs needs to step up and be its champion. I expect that several people would rally to help once some basic momentum is established. So, willing to be the rallying cry and step up?
Jan 20
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 21/01/2022 3:17 PM, Brad Roberts wrote:
 To make any progress towards a better GC it's going to have to be 
 demonstrated.  And the proof will justify the changes.  If they're good 
 and the complexities required to get there aren't awful, then they'll 
 make into the code base.  If they're not, then they wont.  The status 
 quo is easy to get stuck in and help no one.
 
 Someone with a good background in GC's needs to step up and be it's 
 champion.  I expect that several people would rally to help once some 
 basic momentum is established.  So, willing to be the rallying cry and 
 step up?
I've recently bought (but am waiting on delivery of) The Garbage Collection Handbook. It covers generational GCs, memory allocators (which is what I'm interested in), and some 13 pages on write barriers.

It would be an interesting experiment to add a -enablewb flag and have it enable more GC options at runtime, to see if this allowed some better-fitting GCs.
Jan 20
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 04:17:53 UTC, rikki cattermole 
wrote:
 I've recently bought (but waiting on delivery) The Garbage 
 Collection Handbook. It covers generational GC's, memory 
 allocators (which is what I'm interested in) and like 13 pages 
 on write barriers.
I've browsed through it. The writing style is very accessible, I think. It also has a chapter on real time garbage collection ("wait-free" collection), but I guess that requires a different take on the language-runtime. Some also require OS-support, I guess.
 It would be an interesting experiment to see if we added a 
 -enablewb flag and have it enable more GC options at runtime, 
 if this allowed some better fitting GC's.
I think many programmers would shoot themselves in the foot in @nogc code... You would need very good documentation and tutorials for that to be a reasonable option.
Jan 21
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 21/01/2022 10:34 PM, Ola Fosheim Grøstad wrote:
 It would be an interesting experiment to see if we added a -enablewb 
 flag and have it enable more GC options at runtime, if this allowed 
 some better fitting GC's.
I think many programmers would shoot themselves in the foot in @nogc code... You would need very good documentation and tutorials for that to be a reasonable option.
Yeah, in practice this may require UDAs and all sorts of things to guarantee good speeds, with very easy ways to segfault. Not necessarily a direction D as a whole wants to go in, hence opt-in rather than opt-out.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 09:48:48 UTC, rikki cattermole 
wrote:
 Not necessarily something D as a whole wants to go in, hence 
 opt-in rather than opt-out.
If you are looking for niche options then there are many approaches you can look at. Here is a Boehm-like collector for real time audio called *Rollendurchmesserzeitsammler*: http://users.notam02.no/~kjetism/rollendurchmesserzeitsammler/ It would work for D, I guess.
Jan 21
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 21/01/2022 11:58 PM, Ola Fosheim Grøstad wrote:
 It would work for D, I guess.
Not likely, it would need a lot of work to bring that up to today's requirements.

For context:

 A heap size of less than 1 megabyte may work very well. The cost of 
 duplicating a heap of 100.000 bytes only takes up about 2% extra CPU 
 time on my single-processor personal computer from 2002-2003 running 
 a blocksize of 128. (Note that a more common blocksize of 1028 would 
 yield a CPU usage of 1/8 of 2%). Furthermore, since today's computers 
 are many times faster than this, especially those with 4 or 8 cores 
 (which the garbage collector is able to fully utilize (well, at least 
 that's the plan)), the number of bytes in the heap can probably be 
 multiplied many times. As long as the heap is reasonably small, which 
 it probably should be while doing signal processing tasks, duplicating 
 the entire heap should not be a costly operation on current personal 
 computers. On a multiprocessor machine, using heaps of many megabytes 
 should not be a problem. I've also successfully tried using a 20MB 
 heap on a dual-core Intel machine.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 11:06:58 UTC, rikki cattermole 
wrote:
 On 21/01/2022 11:58 PM, Ola Fosheim Grøstad wrote:
 It would work for D, I guess.
Not likely, it would need a lot of work to bring that up to today's requirements.
What are the requirements? If it is "niche", then that is application specific. The GC is for audio-work. IIRC it aims to get close to hard real time without affecting the real time thread, so it is designed to use 20% of the time that it takes to fill one audio buffer by taking a copy of the pointer heap? Makes sense to me for that particular niche. Might also work for embedded with the caveat that you need extra memory.
Jan 21
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 22/01/2022 12:27 AM, Ola Fosheim Grøstad wrote:
 On Friday, 21 January 2022 at 11:06:58 UTC, rikki cattermole wrote:
 On 21/01/2022 11:58 PM, Ola Fosheim Grøstad wrote:
 It would work for D, I guess.
Not likely, it would need a lot of work to bring that up to today's requirements.
What are the requirements?
That project was a master's thesis in 2009 running on 2002-era hardware. The size of data has changed; hardware has changed. It'll need work to prove it can even handle the sorts of workloads D has, even in the audio space.
Jan 21
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 11:38:10 UTC, rikki cattermole 
wrote:
 On 22/01/2022 12:27 AM, Ola Fosheim Grøstad wrote:
 On Friday, 21 January 2022 at 11:06:58 UTC, rikki cattermole 
 wrote:
 On 21/01/2022 11:58 PM, Ola Fosheim Grøstad wrote:
 It would work for D, I guess.
Not likely, it would need a lot of work to bring that up to today's requirements.
What are the requirements?
That project was a master's thesis in 2009 running on 2002-era hardware.
Yes, but Notam and CCRMA tend to attract programmers who take audio seriously (electro-acoustic computer music). So, I would not dismiss it because it is a master's thesis.
 Size of data has changed, hardware has changed. It'll need work 
 to prove it can even do the sort of workloads D has even in 
 audio space.
Well, in his master's thesis he covers the Boehm copy-heap collector and a collector based on barriers (which seems to use a lot of memory, but he claims it is essentially wait-free, IIRC): https://www.duo.uio.no/handle/10852/8805

I haven't read the details, and maybe chapter 19 in the GC handbook is more interesting for you, but it is at least an example of a practical real-time-oriented approach to garbage collection. Which is in some sense better than the high-flying theories about what can be done in the GC department for D that tend to "poison" the forum discussions on the topic.
Jan 21
prev sibling next sibling parent reply user1234 <user1234 12.de> writes:
On Friday, 21 January 2022 at 11:27:51 UTC, Ola Fosheim Grøstad 
wrote:
 The GC is for audio-work. IIRC it aims to get close to hard 
 real time without affecting the real time thread, so it is 
 designed to use 20% of the time that it takes to fill one audio 
 buffer by taking a copy of the pointer heap?
No. The theory is that GC is not *that much a handicap* for real audio because it's *not really real-time*. It is not that "The GC is for audio-work". That theory is just totally wrong.
Jan 21
next sibling parent user1234 <user1234 12.de> writes:
On Friday, 21 January 2022 at 11:47:05 UTC, user1234 wrote:
 On Friday, 21 January 2022 at 11:27:51 UTC, Ola Fosheim Grøstad 
 wrote:
 The GC is for audio-work. IIRC it aims to get close to hard 
 real time without affecting the real time thread, so it is 
 designed to use 20% of the time that it takes to fill one 
 audio buffer by taking a copy of the pointer heap?
No. The theory is that GC is not *that much a handicap* for real audio
sorry, "real-time audio" was meant here. just in case of.
Jan 21
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 11:47:05 UTC, user1234 wrote:
 No. The theory is that GC is not *that much a handicap* for 
 real audio because it's *not really real-time*. It is not that 
 "The GC is for audio-work". That theory is just totally wrong.
It is possible to make it hard real time by imposing a max heap size, which is reasonable for audio. You need to use a garbage collector of some sort, either language-level or application-level, if your application allows the user to build a graph. Which some creative audio applications do (including audio programming languages). If you can get away with using a language-level collector, then that will be the cheaper option. To get there you need to take a niche approach like he does.
Jan 21
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Friday, 21 January 2022 at 11:27:51 UTC, Ola Fosheim Grøstad 
wrote:
 Makes sense to me for that particular niche. Might also work 
 for embedded with the caveat that you need extra memory.
While it must be interesting, another solution you have in D is to deregister audio threads so that a GC pause would let them run free. The audio threads rarely own GC roots anyway.

Despite popular conception, it's indeed common to allocate from the audio thread at the first buffer; it can avoid some pessimization depending on incoming parameters that are only known at play time. Amortized malloc is surprisingly cheap, you can merge allocation buffers, and you would do it only once at initialization. A lot of audio buffers are also scoped in nature, with a lifetime that is that of the processing treatment. So you can also leverage specialized allocators to mass-release those buffers rapidly.

In Dplug plugins we got a GC back :) by integrating Wren as a UI scripting language. It is expected to cause very few problems, as it has a max heap size and doesn't touch audio.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 13:00:31 UTC, Guillaume Piolat 
wrote:
 On Friday, 21 January 2022 at 11:27:51 UTC, Ola Fosheim Grøstad 
 wrote:
 Makes sense to me for that particular niche. Might also work 
 for embedded with the caveat that you need extra memory.
While it must be interesting, another solution you have in D is to deregister audio threads so that a GC pause would let them run free. The audio threads rarely own GC roots anyway.
Yes, for a plugin that is reasonable. I think an audio-friendly garbage collector would be more for a language-like application like Max, or maybe even a complex sound editor. But even if you do make your realtime tasks independent of the GC, it is still better to have a collector that is throttling the collection in the background so it doesn't wipe out the caches or saturate the data bus. Then you get more headroom for your realtime tasks (you get a more even load over time).
 Amortized malloc is surprisingly cheap, you can merge 
 allocation buffers, and you would do it only once at 
 initialization. A lot of the audio buffers are also scoped in 
 nature, with a lifetime that is the one of the processing 
 treatment. So you can also leverage specialized allocators to 
 mass release those buffers rapidly.
Hm, how would you use amortized algorithms in a real-time thread where you want an even load on every callback? I guess you could use "coroutines" and spread the load over many "interrupts"/"callbacks".

One strategy for dynamic allocation is to have "chunks" for same-sized allocations (avoiding fragmentation and merging), and then do a worst-case estimate of how many allocations can happen in one realtime callback. Then a non-realtime thread can make sure that there is enough free space in the chunks to support N callbacks.

Another strategy I have used, which may work if you do not write a plugin but create your own host, is to pass in the buffers needed in the events you send to the realtime thread. Then have a mailbox for returning the buffers and let a non-realtime thread do the cleanup.
 In Dplug plugins we got a GC back :) by integrating Wren as UI 
 scripting language. It is expected to cause very few problems, 
 as it has a max-heap size and doesn't touch audio.
Is that a dedicated GC for wren or is it a generic GC?
Jan 21
next sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Friday, 21 January 2022 at 18:53:03 UTC, Ola Fosheim Grøstad 
wrote:
 Is that a dedicated GC for wren or is it a generic GC?
Wren uses its own non-moving, non-incremental, precise, stop-the-world mark&sweep GC.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 20:17:16 UTC, Guillaume Piolat 
wrote:
 On Friday, 21 January 2022 at 18:53:03 UTC, Ola Fosheim Grøstad 
 wrote:
 Is that a dedicated GC for wren or is it a generic GC?
Wren uses its own non-moving, non-incremental, precise, stop-the-world mark&sweep GC.
But it only stops the Wren code, so it does not affect the real-time properties of the main program?

I think this is the better approach for system-level programming. Have dedicated memory management where you get to control the properties and priorities, and also have the ability to add new mechanisms (like purging cached objects). IIRC, the author of the audio-oriented GC suggested implementing the collector as a coroutine, so that you could execute it when the CPU is idle, in a controlled manner. There are many ways to be inventive here if you focus on adapting to the specifics of the application. There is no one-size-fits-all solution.

So, the main issue is not GC vs no-GC, but whether the language/libraries lock you into the specifics of a generic runtime.
Jan 22
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 23/01/2022 2:03 AM, Ola Fosheim Grøstad wrote:
 So, the main issue is not GC vs no-GC, but whether the 
 language/libraries lock you into the specifics of a generic runtime.
Having a decent allocator library at the runtime level available for all D code (not just for those with druntime) will be key to allowing this to happen.
Jan 22
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Saturday, 22 January 2022 at 13:19:57 UTC, rikki cattermole 
wrote:
 Having a decent allocator library at the runtime level 
 available for all D code (not just for those with druntime) 
 will be key to allowing this to happen.
I think some language adjustments are needed, but it could be a dialect. It doesn't have to apply to DMD, as that seems to be locked down at this point. You basically need some room to experiment to get it right, and that requires freedom (which will never happen if everyone in the forums is going to have their say).

So whether you want to focus on ARC, actors/fibers with local GC, or concurrent GC, you should be able to adjust the language so that your vision can be enabled in a clean fashion. I don't think having dialects is a big issue. I think it is the only way forward for automatic memory management + system-level programming. You can design your dialect such that most straightforward single-threaded code runs with next to no changes.

It is either that or another language will grab the user base that wants it.
Jan 22
prev sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Saturday, 22 January 2022 at 13:03:26 UTC, Ola Fosheim Grøstad 
wrote:
 Wren use its own non-moving, non-incremental, precise, 
 stop-the-world mark&sweep GC.
But it only stops the wren-code, so it does not affect real time properties of the main program?
No, because (roughly speaking) UI callbacks are in event loops / timers separated from the audio callbacks, and Wren is called from these UI callbacks. So you can take as much time as you want to process a click, or in WM_PAINT; it will never affect audio playback (well, it will if you spawn many demanding threads from there). To support that, windowing in desktop OSes separates invalidation from repaint.
Jan 22
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Friday, 21 January 2022 at 18:53:03 UTC, Ola Fosheim Grøstad 
wrote:
 Hm, how would you use amortized algorithms in a real time 
 thread where you want even load on every callback?
Well, I was using the term incorrectly; "amortized" in the sense that the large majority of callbacks will not allocate, so in other words, the opposite :)

pushack_back leading to realloc can make a lot more trouble because it moves large amounts of memory. Delay lines may need to be allocated with a max size / pessimized for that reason. In game-mixer I had to use chunked allocation to avoid that while decoding an audio file during playback, in the case where you want to keep it in memory.
Jan 21
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 20:22:15 UTC, Guillaume Piolat 
wrote:
 Well I was using the term incorrectly, "amortized" in the sense 
 the large majority of callback will not allocate, so in other 
 words, the opposite :)
Yes, that is different, but actually using a stackless coroutine can be a solution to spread out the load when you need to. In D I guess you could do it as a state machine using a switch statement. I've never done this in practice though…
 pushack_back leading to realloc can make a lot more trouble 
 because it moves large amount of memory. Delay lines may need 
 to be allocated with a max size / pessimized for that reason.
What is pushack_back? You can also build a delay with a chain of smaller fixed size buffers, although a bit unusual perhaps. If you need a filter you can do that in a smaller buffer. So many implementation options!
Jan 21
parent Guillaume Piolat <first.last gmail.com> writes:
On Friday, 21 January 2022 at 21:29:06 UTC, Ola Fosheim Grøstad 
wrote:
 What is pushack_back?
push_back*, i.e. appending to a vector-like structure.
Jan 22
prev sibling next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 To make any progress towards a better GC it's going to have to 
 be demonstrated.  And the proof will justify the changes.  If 
 they're good and the complexities required to get there aren't 
 awful, then they'll make into the code base.  If they're not, 
 then they wont.  The status quo is easy to get stuck in and 
 help no one.

 Someone with a good background in GC's needs to step up and be 
 its champion.  I expect that several people would rally to help 
 once some basic momentum is established.  So, willing to be the 
 rallying cry and step up?
I agree. But: it is an involved task that will likely not bear fruit until it is nearly done; I do not expect an implementation would perform very well until it performed the optimizations I mentioned elsewhere. (Well, a first step would be to make the precise gc actually precise, which would benefit everybody; but after that...)

For my part, I think it would be a fun project, but it would not be a high enough priority for me to see it through unless somebody were paying me to do it full time (and who would pay for an 'unproven' idea? It is exactly the problem you identify). Maybe rikki will be the champion :)
Jan 20
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 21/01/2022 5:48 PM, Elronnd wrote:
 Maybe rikki will be the champion :)
I've considered it, but it wouldn't align with what I am doing with D atm. What I'd also like to see is fiber-aware GC hooks and a configurable memory allocator.
Jan 20
prev sibling next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 On 1/20/2022 5:46 PM, Elronnd via Digitalmars-d wrote:
 On Friday, 21 January 2022 at 00:55:43 UTC, H. S. Teoh wrote:
 Write barriers are required for an incremental GC to work. 
 Write barriers are unlikely to be accepted in D because they 
 introduce synchronization overhead per pointer write. This 
 would kill D's performance in low-level code, and I don't 
 think Walter will ever accept this.
I have heard this claim before.  I don't buy it.
This is clearly an area where anecdotes like "Walter will never accept this" aren't useful. There's ample evidence to show that he accepts when he's proven wrong. To make any progress towards a better GC, it's going to have to be demonstrated. And the proof will justify the changes. If they're good and the complexities required to get there aren't awful, then they'll make it into the code base. If they're not, then they won't. The status quo is easy to get stuck in and helps no one. Someone with a good background in GCs needs to step up and be its champion. I expect that several people would rally to help once some basic momentum is established. So, willing to be the rallying cry and step up?
To be honest, that is something that is more fun to do in the Go, .NET and Java communities, because not only does no one care about the mantra that a GC is supposed to be verboten in a systems programming language, there are hardware companies shipping real products with them. So in those communities the whole discussion isn't whether GC belongs in a systems programming language, rebooted every couple of months, but rather how to improve it to the point that e.g. writing firmware in Go is a reality that one can ship in a commercial product (USB Armory).
Jan 20
prev sibling next sibling parent reply IGotD- <nise nise.com> writes:
On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 This is clearly an area where anecdotals like "Walter will 
 never accept this" aren't useful.  There's ample evidence to 
 show that he accepts when he's proven wrong.

 To make any progress towards a better GC it's going to have to 
 be demonstrated.  And the proof will justify the changes.  If 
 they're good and the complexities required to get there aren't 
 awful, then they'll make into the code base.  If they're not, 
 then they wont.  The status quo is easy to get stuck in and 
 help no one.

 Someone with a good background in GC's needs to step up and be 
 it's champion.  I expect that several people would rally to 
 help once some basic momentum is established.  So, willing to 
 be the rallying cry and step up?
I've been nagging about this for ages, it feels. The problem is that D has no managed pointers. If it did, it would open up a myriad of GC possibilities for the D community to experiment with different GC algorithms, while at the same time keeping the possibility that a managed pointer is just a plain pointer, like today. Today it doesn't matter if we have world-class GC experts in the D community, as the language itself limits what you can implement.

D suffers from what can be seen in several other inventions: the inventor thinks the invention is perfect and does not want to develop it further even if the shortcomings are obvious. Some other inventor comes along, drastically improves the design, and the product really shoots off in popularity.
Jan 21
parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 10:00:17 UTC, IGotD- wrote:
 The problem is that D has no managed pointers.
Again, this can be solved conservatively at the implementation-level.
Jan 21
next sibling parent reply IGotD- <nise nise.com> writes:
On Friday, 21 January 2022 at 10:13:17 UTC, Elronnd wrote:
 Again, this can be solved conservatively at the 
 implementation-level.
Basically no. D has a lot of standard library types in druntime/Phobos, like arrays and other containers. Even if you implement your own GC at the implementation level, the standard library types will still use the default GC, with GC properties you probably don't want. Managed pointers would enable you to swap out the entire default GC altogether, including GC usage in the standard library.
Jan 21
parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 10:42:58 UTC, IGotD- wrote:
 Again, this can be solved conservatively at the 
 implementation-level.
Basically no. Managed pointers enable you to swap out the entire default GC altogether, including GC usage in the standard library.
I guess I was insufficiently clear. My bad. I am talking about a mechanism that works without needing to modify any existing code. Let's say we have this:

void f(int** x) {
	*x = new int;
}
void g() {
	int** x = new int*;
	f(x);
	int* y;
	f(&y);
}

and we want to add a generational GC, with barriers. So naively we rewrite f as follows:

void f(int** x) {
	store_barrier(x, new int);
}

This will involve overhead because we don't know if x is a GC pointer or not. So instead, generate multiple definitions and rewrite callers:

void f_gc(int** x) {
	store_barrier(x, new int);
}
void f_nogc(int** x) {
	*x = new int; //no barrier!
}
void g() {
	int** x = new int*;
	f_gc(x);
	int* y;
	f_nogc(&y); //no overhead from passing stack object!
}

Of course this transformation is conservative: you will sometimes call f_gc with a non-gc pointer. But I bet it works 99% of the time such that the runtime overhead is negligible in practice.
Jan 21
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 21, 2022 at 11:12:01AM +0000, Elronnd via Digitalmars-d wrote:
[...]
 void f(int** x) {
 	*x = new int;
 }
 void g() {
 	int** x = new int*;
 	f(x);
 	int* y;
 	f(&y);
 }
 
 and we want to add a generational gc, with barriers.  So naively we
 rewrite f as follows:
 
 void f(int** x) {
 	store_barrier(x, new int);
 }
 
 This will involve overhead because we don't know if x is a gc pointer or
 not.  So instead, generate multiple definitions and rewrite callers:
 
 void f_gc(int** x) {
 	store_barrier(x, new int);
 }
 void f_nogc(int** x) {
 	*x = new int; //no barrier!
 }
 void g() {
 	int** x = new int*;
 	f_gc(x);
 	int* y;
 	f_nogc(&y); //no overhead from passing stack object!
 }
So basically *every* function that receives pointers in some way (could be implicitly via built-in slices, for example) will essentially be implicitly templatized? That will lead to a LOT of template bloat. I'm not sure if this is a good idea.
 Of course this transformation is conservative: you will sometimes call
 f_gc with a non-gc pointer.  But I bet it works 99% of the time such
 that the runtime overhead is negligible in practice.
What we need is a working proof-of-concept implementation that we can actually benchmark and compare with the current situation, so that we can make decisions based on actual data. Otherwise, it's just your word against mine and my word against yours, which gets us nowhere. T -- If blunt statements had a point, they wouldn't be blunt...
Jan 21
prev sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 10:13:17 UTC, Elronnd wrote:
 On Friday, 21 January 2022 at 10:00:17 UTC, IGotD- wrote:
 The problem is that D has no managed pointers.
Again, this can be solved conservatively at the implementation level.
It could, in theory, if you change the semantics of @nogc and annotate everything with @nogc that the collector must leave alone. I suspect that some of the existing @nogc code would have to be rewritten. But this approach is very close to introducing a new pointer type.

You also need to prevent scannable pointers from being available through unions, so a breaking language change is practically unavoidable if you want to make a good GC collector for the language.

There is also no way for D to force barriers on linked code, hence the need for a breaking change at the language level. (I assume many approaches are possible, but all require a breaking change.)
Jan 21
parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 12:26:42 UTC, Ola Fosheim Grøstad 
wrote:
 It could, in theory, if you change the semantics of @nogc and 
 annotate everything with @nogc that the collector must leave 
 alone.
You do not. See my other post where I clarified what I meant with this approach.
 You also need to prevent scannable pointers to be available 
 through unions
You do not. You do need to do _something_ about unions, but you do not need to disallow pointers in them. The spec says to pin objects which are in unions, which is a perfectly acceptable solution. There are also cleverer solutions which violate that clause of the spec but do not break any actual code.
 There is also no way for D to force barriers on linked code, 
 hence the need for a breaking language level change at some 
 level.
You can do it with name mangling. D is not binary compatible between releases.

------------------------------------

Bottom line: no source-level changes are necessary.
Jan 21
next sibling parent Elronnd <elronnd elronnd.net> writes:
On Friday, 21 January 2022 at 12:52:02 UTC, Elronnd wrote:
 You do need to do _something_ about unions, but you do need to 
 disallow pointers in them.
Err, this should read: 'you do _not_ need to disallow...'
Jan 21
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 21 January 2022 at 12:52:02 UTC, Elronnd wrote:
 You do not.  You do need to do _something_ about unions, but 
 you do not need to disallow pointers in them.  The spec says to pin 
 objects which are in unions, which is a perfectly acceptable 
 solution.  There are also cleverer solutions which violate that 
 clause of the spec but do not break any actual code.
No, pinning is not a solution; that is a band-aid for a broken design. You'll end up with memory leaks all over the place.

Regarding your suggestion: this is not as trivial as you imply. There is no way today for the type system to know where a pointer points, so there is also no way to know whether you need to apply a barrier or not. Since non-D libraries can mutate pointers, there is also no way to be sure that they only modify pointers that do not require barriers. As a result you risk having spurious bugs that are impossible to track down.

You need some kind of language change to make the type system work properly with barriers.
Jan 21
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
  On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 To make any progress towards a better GC it's going to have to 
 be demonstrated.
The GC is not as slow and annoying as it used to be. When I went to DConf 2019 I was surprised to learn that it had been sped up during the year (with both the fork-based GC and speed-up efforts by Rainer, IIRC), but this wasn't advertised anywhere.

It's easy to be under the impression that the GC is the same as 10 years ago when it's not written anywhere what changes happened; you'd have to peruse all the release notes. Who knows the real timeline of GC improvement in D? What happened?
Jan 21
next sibling parent reply user1234 <user1234 12.de> writes:
On Friday, 21 January 2022 at 12:45:42 UTC, Guillaume Piolat 
wrote:
  On Friday, 21 January 2022 at 02:17:34 UTC, Brad Roberts wrote:
 To make any progress towards a better GC it's going to have to 
 be demonstrated.
 The GC is not as slow and annoying as it used to be. When I went to DConf 2019 I was surprised to learn that it had been sped up during the year (with both the fork-based GC and speed-up efforts by Rainer, IIRC), but this wasn't advertised anywhere.
I agree. The GC run-time options are poorly documented. It's written nowhere that druntime can take specific arguments that change the GC behavior.
Jan 21
parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Friday, 21 January 2022 at 12:51:11 UTC, user1234 wrote:
 I agree. The GC run-time options are poorly documented. It's 
 written nowhere that druntime can take specific arguments that 
 change the GC behavior.
the page dlang.org/garbage has a header called "nowhere": https://dlang.org/spec/garbage.html#gc_config
Jan 21
parent user1234 <user1234 12.de> writes:
On Friday, 21 January 2022 at 13:10:46 UTC, Adam D Ruppe wrote:
 On Friday, 21 January 2022 at 12:51:11 UTC, user1234 wrote:
 I agree. The GC run-time options are poorly documented. It's 
 written nowhere that druntime can take specific arguments that 
 change the GC behavior.
the page dlang.org/garbage has a header called "nowhere": https://dlang.org/spec/garbage.html#gc_config
Okay, it's better nowadays; I've updated my point of view.
Jan 21
prev sibling parent Mike Parker <aldacron gmail.com> writes:
On Friday, 21 January 2022 at 12:45:42 UTC, Guillaume Piolat 
wrote:
 The GC is not as slow and annoying as it used to be. When I 
 went to DConf 2019 I was surprised to learn that it had been 
 sped up during the year (with both the fork-based GC and 
 speed-up efforts by Rainer, IIRC), but this wasn't advertised 
 anywhere. It's easy to be under the impression that the GC is 
 the same as 10 years ago when it's not written anywhere what 
 changes happened; you'd have to peruse all the release notes. 
 Who knows the real timeline of GC improvement in D? What 
 happened?
IIRC, Rainer's work on the GC was discussed in the forums at the time. The fork-based GC was a SAOC 2018 project, but it wasn't actually merged until last fall. [I announced it on the blog](https://dlang.org/blog/2021/10/29/dlang-news-september-october-2021-d-2-098-0-openbsd-saoc-dconf-online-swag/) and [in a YouTube video](https://youtu.be/jX9grHMTGAU?t=229).
Jan 21
prev sibling next sibling parent Era Scarecrow <rtcvb32 yahoo.com> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I just found out that Unity has incremental garbage collection. 
 I didn't know that was a possibility for GCs.

 https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html

 I'm just curious what the D language community's thoughts are 
 on it. The tradeoff is: For a longer total time / lower 
 throughput, you reduce stress on individual frames which 
 prevents hiccups. That's pretty darn important for games and 
 soft/hard real-time systems.
I would say it really depends on a handful of things: how much allocating/deallocating you're actually doing, whether fragmentation would be an issue, and whether collecting more often would help. I try to write code using the stack rather than allocating for small temporary items, which frees itself when it leaves the function. Though, having the GC do its work while waiting on the OS to reply would be a great time for it.

As for handling fragmentation, I'd almost wish to make a different type of allocator which gives you an id, and you use the id+offset to get the actual address (*though to make it work it would need a new type of slice that handles those details transparently*). Then periodically (*say every 50ms or something*) it would do a quick check/scan for a hole, and then move data and adjust the values of said ids to remove empty holes and make it as tight as possible. I'm also not sure how well it would work in multi-threaded applications, though having the GC lock the ids while it moves them and then unlock *should* handle those details. This of course would only be needed if you have limited/preallocated memory and fragmentation (*due to some parts being too small*) could kill the process. Otherwise it would probably be a lot of busywork.

To note, while I love contemplating this stuff, I'm not quite sure how to implement it all myself. I don't have the same mental fortitude I had when I was 14.
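The id+offset idea above can be sketched as a handle-based compacting arena. This is a toy C++ illustration, not anything from druntime: the class name, sizes, and single-threaded design are all made up, and a real version would need the locking and slice-wrapper machinery the post mentions.

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>
#include <vector>
#include <cassert>

// Toy compacting arena: callers hold stable ids (plus an offset),
// never raw addresses. Compaction slides live blocks left and fixes
// up one table instead of every pointer in the program.
class HandleArena {
    std::vector<uint8_t> heap;
    struct Block { size_t off; size_t len; bool live; };
    std::vector<Block> blocks;           // index == handle id
    size_t top = 0;                      // bump-allocation watermark
public:
    explicit HandleArena(size_t bytes) : heap(bytes) {}

    size_t alloc(size_t len) {           // returns a stable id
        blocks.push_back({top, len, true});
        top += len;
        return blocks.size() - 1;
    }
    void free(size_t id) { blocks[id].live = false; }

    // Translate id+offset to a real (temporary!) address; the address
    // is only valid until the next compact().
    uint8_t* at(size_t id, size_t off) { return &heap[blocks[id].off + off]; }

    // Remove holes: slide live blocks left and patch the table.
    // Ids held by callers stay valid across the move.
    void compact() {
        size_t dst = 0;
        for (auto& b : blocks) {
            if (!b.live) continue;
            std::memmove(&heap[dst], &heap[b.off], b.len);
            b.off = dst;
            dst += b.len;
        }
        top = dst;
    }
    size_t used() const { return top; }
};
```

The design choice here is the classic one from handle-based systems (e.g. old Mac OS memory management): one extra indirection on every access buys the collector the freedom to move memory without scanning for pointers.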
Jan 21
prev sibling next sibling parent Bienlein <jeti789 web.de> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I just found out that Unity has incremental garbage collection. 
 I didn't know that was a possibility for GCs.

 https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html
Are you kidding? The incremental GC based on the train algorithm (see https://wiki.c2.com/?TheTrainAlgorithm) that was earlier part of the JVM has meanwhile been removed from it, since there are newer, better ones.
Jan 26
prev sibling parent Gregor =?UTF-8?B?TcO8Y2ts?= <gregormueckl gmx.de> writes:
On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:
 I just found out that Unity has incremental garbage collection. 
 I didn't know that was a possibility for GCs.

 https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html

 I'm just curious what the D language community's thoughts are 
 on it. The tradeoff is: For
 a longer total time / lower throughput, you reduce stress on 
 individual frames which prevents hiccups. That's pretty darn 
 important for games and soft/hard real-time systems.

 Is that an option on a hypothetical level, for D? Or does D's 
 language design preclude that. I recall some sort of issue with 

 have to insert memory barriers into D to work or something like 
 that.

 Also, it's hard to find one specific place for "news" regarding 
 D's garbage collector development and "omg we deleted 
 everything GC from the standard library" projects. Are there 
 any major updates on those fronts?
Just to add to this for completeness' sake: a while ago I spent a few hours digging through Unreal Engine source code to understand their GC algorithm. It's a peculiar and interesting take on iterative, precise, conservative garbage collection:

- First of all, UE4 has precise runtime reflection information for all data types that are handled via the GC (everything derived from UObject, which is mostly game logic code). Heap scans are optimized aggressively based on this data. Pointers are regular naked C++ pointers without any additional magic. They need to be annotated for the custom preprocessor that generates the reflection data, but that's it. The C++ code touching these pointers is plain old code using naked pointers. No hidden wrappers there.

- The algorithm is basically mark and sweep with a couple of awful situational hacks that stem from weird engine subsystem interactions. I'll gloss over those. The GC performs a multistep GC process. Each frame, the engine prompts the GC to run one step in a stop-the-world fashion.

- Each step, the GC can either run an *entire* mark phase or a single step of the sweep phase. The mark step is always run in its entirety, never in multiple steps. The sweep phase is broken up into steps that free a limited number of unmarked objects in each step. The rest is deferred to the next step. When the sweep phase is done, the next GC step will be a new mark phase.

IMO, this is a very pragmatic approach. But for the runtime to be truly bounded, the heap size needs to be bounded, too. I assume that this works for UE4 because for most games using the engine, the number of objects involved in the game logic at any point is fairly small (on the order of a few hundred). Stopping the world for the mark phase is also a major simplification. This avoids the overhead typically incurred by GC algorithms that do incremental marking. Also, GC references can be treated just like regular pointers.
I don't know how the D garbage collector runtime is split between mark and sweep phases. If the sweep phase is indeed the slower phase, an option to have that run iteratively might be enticing. I would naively assume that this would be lower hanging fruit compared to a generational and/or concurrent GC. But then again, I don't have a need for any of that at the moment.
Jan 29