digitalmars.D - Per thread heap, GC, etc.

Markk <markus.kuehni triviso.ch> writes:
Hi,

D has this nice default per-thread static memory model, i.e. if I 
understand all this correctly, this allows for better, more 
natural thread safety, while it makes it generally unsafe to use 
this memory from other threads (without locking). I guess the 
same is implicitly true for stack memory.
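
As a minimal sketch of the default described above (the names `tlsCounter`, `sharedSeen` and `bumpInOtherThread` are made up for illustration), a module-level variable in D gets one copy per thread unless it is marked `shared` or `__gshared`:

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;

int tlsCounter;          // thread-local by default: one copy per thread
shared int sharedSeen;   // a single copy, visible to all threads

// Hypothetical helper: bump the counter in a new thread and
// report the value that thread ended up with.
int bumpInOtherThread()
{
    auto worker = new Thread({
        tlsCounter += 10;                    // modifies the worker's own copy
        atomicStore(sharedSeen, tlsCounter); // publish via shared memory
    });
    worker.start();
    worker.join();
    return atomicLoad(sharedSeen);
}

void main()
{
    tlsCounter = 1;
    // The worker starts from its own zero-initialized copy.
    assert(bumpInOtherThread() == 10);
    // The main thread's copy was never touched.
    assert(tlsCounter == 1);
}
```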

Now could it equally make sense to use per-thread heaps?

I.e. all allocations would need to be per thread, and it would be 
illegal to reference memory of one thread's heap, static memory, 
or stack from another thread's memory.

Some RAII locking could pin message-passed (etc.) references 
temporarily down for them to be used legally by another thread. 
This would make the sharing of the pointer known to the original 
thread for a strictly scoped time. Perhaps the `synchronized` 
keyword could be used for these stack references (just a spur of 
the moment proposal for the purpose of this discussion). Pinning 
is coupled with message-passing (etc.), i.e. no additional 
locking required.

For permanent change of ownership i.e. storing a reference in 
static or heap memory of the other thread, the referenced memory 
would have to be copied. I.e. there are no `synchronized` 
references from inside static or heap (i.e. non-stack, 
non-scoped) memory.
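
For comparison, here is roughly how today's D handles the two cases discussed above with `std.concurrency`: immutable data can cross threads by reference, while thread-local mutable data is rejected by `send` and has to be copied first (here with `.idup`). This is only a sketch of existing library behaviour, not of the proposed `synchronized` pinning; `summer` and `sumInWorker` are hypothetical names.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;

// Worker: receives an immutable array and sends its sum back to the owner.
void summer()
{
    auto data = receiveOnly!(immutable(int)[])();
    int total = 0;
    foreach (x; data)
        total += x;
    send(ownerTid, total);
}

// Hypothetical helper: hand a copy of `values` to a worker thread.
int sumInWorker(const(int)[] values)
{
    auto tid = spawn(&summer);
    // .idup makes an immutable copy, which may be shared freely.
    send(tid, values.idup);
    return receiveOnly!int();
}

void main()
{
    int[] local = [1, 2, 3, 4];
    assert(sumInWorker(local) == 10);
}
```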

I guess `synchronized` class objects could get their own, 
non-thread specific, a.k.a. shared heap (similar in a way to the 
`shared` static memory). All stack references to `synchronized` 
class objects would have to be marked with `synchronized` too (or 
this might be inferred). `synchronized` class references stored 
in other (non-stack, non-scoped) thread memory would still be 
illegal.

Given the above, the GC could be run per thread. The world would 
not have to be stopped! Which means that some threads could 
entirely run without GC while others could still benefit from 
what I personally think is the only universal and scalable 
solution to memory safety. As a middle ground, some threads might 
only use a controlled amount of allocations, therefore GC runs 
would be super-fast, perhaps still acceptable under (near) 
real-time performance constraints.
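
D's current runtime does not offer per-thread collection, but `core.memory.GC` does allow a controlled amount of allocation without automatic collection runs, process-wide. A rough sketch of that existing mechanism (the function name `buildTable` is an assumption for illustration):

```d
import core.memory : GC;

// Hypothetical helper: allocate from the GC heap while automatic
// collections are switched off for the whole process.
int[] buildTable(size_t n)
{
    GC.disable();             // the runtime may still collect if it runs out of memory
    scope (exit) GC.enable(); // always re-enable on the way out
    int[] table;
    foreach (i; 0 .. n)
        table ~= cast(int) i * 2;
    return table;
}

void main()
{
    auto table = buildTable(1000);
    assert(table.length == 1000);
    assert(table[999] == 1998);
    GC.collect(); // run a collection at a point of our own choosing
}
```

Unlike the per-thread idea, this affects every thread registered with the runtime, which is exactly the limitation the post is trying to remove.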

The model would force developers towards a more modularized, per 
thread (service?) oriented architecture where message passing and 
lock free programming would be king... (said from a "schoolbook" 
understanding of these matters ;-)).

Also, I guess the performance of the resulting lock free heap 
allocs/frees, of the now (by language guarantee) lock free thread 
safe memory accesses, of the now per thread, smaller and (per 
definition) lock free GC runs etc. would improve.

Being per-thread, i.e. non-preemptive, this could also simplify 
the GC and allow for more compiler optimizations, I guess. There 
is no danger of register aliasing and whatnot, which I can only 
guess makes it hard to keep a preemptively interrupting GC 
correct under heavy compiler optimization.

Just some thoughts after reading a handful of Rust and D books... 
and after having seen so many wrinkle their noses at GC ... and 
therefore, unfortunately D.

_Mark
May 14 2021
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 14 May 2021 at 13:48:12 UTC, Markk wrote:
 Just some thoughts after reading a handful Rust and D books... 
 and after having seen so many wrinkle their noses at GC ... and 
 therefore, unfortunately D.
What do you think of per-task GC? https://forum.dlang.org/post/yqdwgbzkmutjzfdhotst forum.dlang.org
May 14 2021
Markk <markus.kuehni triviso.ch> writes:
On Friday, 14 May 2021 at 14:14:32 UTC, Ola Fosheim Grøstad wrote:
 On Friday, 14 May 2021 at 13:48:12 UTC, Markk wrote:
 Just some thoughts after reading a handful Rust and D books... 
 and after having seen so many wrinkle their noses at GC ... 
 and therefore, unfortunately D.
 What do you think of per-task GC? https://forum.dlang.org/post/yqdwgbzkmutjzfdhotst forum.dlang.org
I think the per-thread model could be one very powerful implementation method for your proposal. So the two don't contradict each other at all, at least as far as I understand.

However, the per-thread association does more than just introduce a new Allocator variant. It introduces language guarantees, mostly through the fact that there is a stack clearly associated per thread and therefore clear scoping is granted (this extends to fibers with a bit more complexity). The other rules for that guarantee are described above.

The proposal would be very "D", i.e. analogous to the default thread-local static data. In the same spirit as D's thread-local static data, it addresses concurrency issues along with memory issues and therefore makes it simpler to code, simpler to implement (non-preemptive), plus more performant at the same time.

Personally, I think introducing explicit Allocators, i.e. chaining them through everything as (template) parameters or (worse) booby trapping with sneaky overloads, makes code much more complex, IMHO unbearably so.

By contrast, making memory management more of a per-thread thing could solve this concern too. You can easily set an Allocator as a per-thread setting and be done with it. The discussed language guarantees would make sure your allocations are not mixed or illegally referenced across threads. The exact same code can be run under different Allocators without template code bloat or parametrizing overhead.

When the thread terminates (or drops out of the defining scope), the whole memory can be freed as a whole. The discussed language guarantees would make sure you have nothing mixed and dangling. If GC was your set Allocator, no final collection is needed, as you equally proposed ;-).

_Mark
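
Something close to a per-thread allocator setting already exists as a library feature: `theAllocator` in `std.experimental.allocator` is a thread-local property. The sketch below swaps it for a malloc-based allocator within one scope. It carries none of the language guarantees proposed in this post, and `useThreadAllocator` is a hypothetical helper.

```d
import std.experimental.allocator : allocatorObject, dispose, make, theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical helper: run some code with a different allocator
// installed for the current thread only.
int useThreadAllocator()
{
    auto previous = theAllocator;          // remember this thread's allocator
    scope (exit) theAllocator = previous;  // restore it when leaving the scope

    theAllocator = allocatorObject(Mallocator.instance);

    // Code below does not name the allocator type; it just uses
    // whatever is installed for this thread.
    int* p = theAllocator.make!int(42);
    scope (exit) theAllocator.dispose(p);  // runs before the restore above
    return *p;
}

void main()
{
    assert(useThreadAllocator() == 42);
}
```

Other threads keep their own `theAllocator` value, so the swap is not visible to them.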
May 14 2021
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 14 May 2021 at 15:00:20 UTC, Markk wrote:
 When the thread terminates (or drops out of the defining 
 scope), the whole memory can be freed as a whole. The discussed 
 language guarantees would make sure you have nothing mixed and 
 dangling. If GC was your set Allocator, no final collection is 
 needed, as you equally proposed ;-).
Yes, the problem is that spinning up a new thread is costly, and in order to get the benefits of "no final collection" the thread should be short-lived. So, that is where tasks come to the rescue. If you can split the workload into many short-lived tasks, then it can execute on many threads at the same time and still not cause any collection cycle.

Of course, if you allow suspension of execution, then you need to deal with saving the stack somehow or implement stackless coroutines (or something similar).

Anyway, I am happy to see that we are on the same page in general, let's keep the ideas on this flowing :-). Then maybe we can come up with something nice over time.

Cheers, Ola.
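
The short-lived tasks described here map fairly directly onto `std.parallelism`, which runs tasks on a pool of already started worker threads. A small sketch, with `sumRange` and `parallelSum` as made-up names; note that with the current runtime these tasks still share the one global GC heap, so this shows only the scheduling side of the idea:

```d
import std.parallelism : task, taskPool;

// Work item: sum the integers in [lo, hi).
int sumRange(int lo, int hi)
{
    int total = 0;
    foreach (i; lo .. hi)
        total += i;
    return total;
}

// Hypothetical helper: split the range into two short-lived tasks
// and let the default thread pool execute them.
int parallelSum(int n)
{
    int mid = n / 2;
    auto first = task!sumRange(0, mid);
    auto second = task!sumRange(mid, n);
    taskPool.put(first);
    taskPool.put(second);
    // yieldForce waits for the task to finish and returns its result.
    return first.yieldForce + second.yieldForce;
}

void main()
{
    assert(parallelSum(100) == 4950);
}
```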
May 14 2021
Markk <markus.kuehni triviso.ch> writes:
On Friday, 14 May 2021 at 15:21:30 UTC, Ola Fosheim Grøstad wrote:
 On Friday, 14 May 2021 at 15:00:20 UTC, Markk wrote:
 Yes, the problem is that spinning up a new thread is costly, 
 and in order to get the benefits of "no final collection" the 
 thread should be short lived.
I think thread pooling (along with the scoped release I described) and/or D's fibers address all of these concerns in very elegant ways. Again, D is already very, very close.

_Mark
May 14 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 14 May 2021 at 15:30:53 UTC, Markk wrote:
 On Friday, 14 May 2021 at 15:21:30 UTC, Ola Fosheim Grøstad 
 wrote:
 On Friday, 14 May 2021 at 15:00:20 UTC, Markk wrote:
 Yes, the problem is that spinning up a new thread is costly, 
 and in order to get the benefits of "no final collection" the 
 thread should be short lived.
 I think thread pooling (along with the scoped release I described) and/or D's fibers address all of these concerns in very elegant ways. Again, D is already very, very close. _Mark
D rox ☀️ Let's improve it so it becomes perfect 😁
May 14 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 14 May 2021 at 13:48:12 UTC, Markk wrote:
 Hi,

 Just some thoughts after reading a handful Rust and D books... 
 and after having seen so many wrinkle their noses at GC ... and 
 therefore, unfortunately D.

 _Mark
Interesting thoughts. Just a general question I've been trying to understand: Why do you think ppl "wrinkle their noses at GC"? 🤔 I don't get it. GC 4 life!! 🎶☀️ (yes I know in what circumstances you can't use it)
May 14 2021
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 14 May 2021 at 15:10:37 UTC, Imperatorn wrote:
 On Friday, 14 May 2021 at 13:48:12 UTC, Markk wrote:
 Hi,

 Just some thoughts after reading a handful Rust and D books... 
 and after having seen so many wrinkle their noses at GC ... 
 and therefore, unfortunately D.

 _Mark
 Interesting thoughts. Just a general question I've been trying to understand: Why do you think ppl "wrinkle their noses at GC"? 🤔
For low-level programming in general:

1. real-time issues/performance
2. unpredictable cleanup (finalization/RAII)
3. higher memory consumption
4. more challenging interop with other languages
5. cannot be used in some execution contexts

For D specifically:

1. freezing all GC threads
2. no tracking of ownership-type in the type system
May 14 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 14 May 2021 at 15:16:55 UTC, Ola Fosheim Grøstad wrote:
 On Friday, 14 May 2021 at 15:10:37 UTC, Imperatorn wrote:
 On Friday, 14 May 2021 at 13:48:12 UTC, Markk wrote:
 Hi,

 Just some thoughts after reading a handful Rust and D 
 books... and after having seen so many wrinkle their noses at 
 GC ... and therefore, unfortunately D.

 _Mark
 Interesting thoughts. Just a general question I've been trying to understand: Why do you think ppl "wrinkle their noses at GC"? 🤔
 For low level programming in general:
 1. real time issues/performance
 2. unpredictable cleanup (finalization/RAII)
 3. higher memory consumption
 4. more challenging interop with other languages
 5. cannot be used in some execution contexts
 For D specifically:
 1. freezing all GC threads
 2. no tracking of ownership-type in the type system
For low level I wouldn't even think of trying to use GC.

1. Yeah, the implementation could be improved
May 14 2021
Markk <markus.kuehni triviso.ch> writes:
On Friday, 14 May 2021 at 15:10:37 UTC, Imperatorn wrote:
 On Friday, 14 May 2021 at 13:48:12 UTC, Markk wrote:
 Hi,

 Just some thoughts after reading a handful Rust and D books... 
 and after having seen so many wrinkle their noses at GC ... 
 and therefore, unfortunately D.

 _Mark
 Interesting thoughts. Just a general question I've been trying to understand: Why do you think ppl "wrinkle their noses at GC"? 🤔 I don't get it. GC 4 life!! 🎶☀️ (yes I know in what circumstances you can't use it)
First, I love the proposition of a GC. Most concerns are probably unfounded and ill-informed.

But look at even this forum. It seems to be one of the biggest no-gos for D. Some of that is justified, such as for near-realtime performance or embedded. Some of it is just driven by fashion-victim hypes around Rust (a terrible language) and others.

Why is it so important? Mostly because it is an all-or-nothing proposition. Even one dependency (lib) can lock you in. AFAIK, you can't break out of it, even for parts of your application that - in themselves - don't use it.

The thread proposition would alleviate all these concerns. You could finally, truly have it both ways at the same time. D is the only language that I know that comes this close!

_Mark
May 14 2021
Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 14 May 2021 at 15:23:13 UTC, Markk wrote:
 But look at even this forum. It seems to be one of the biggest 
 no-gos for D.
There's two types of programmer: the ones busy doing actual productive work and the ones with the spare time to complain about GC on the forum.
May 14 2021
Markk <markus.kuehni triviso.ch> writes:
On Friday, 14 May 2021 at 15:30:46 UTC, Adam D. Ruppe wrote:
 On Friday, 14 May 2021 at 15:23:13 UTC, Markk wrote:
 There's two types of programmer: the ones busy doing actual 
 productive work and the ones with the spare time to complain 
 about GC on the forum.
I agree with that statement, but then I also believe that D should address the GC concern. Given how close D already is, it would be a shame not to. :-D

_Mark
May 14 2021
russhy <russhy gmail.com> writes:
On Friday, 14 May 2021 at 15:30:46 UTC, Adam D. Ruppe wrote:
 On Friday, 14 May 2021 at 15:23:13 UTC, Markk wrote:
 But look at even this forum. It seems to be one of the biggest 
 no-gos for D.
 There's two types of programmer: the ones busy doing actual productive work and the ones with the spare time to complain about GC on the forum.
nobody complains about the GC

people complain about the fact that everything is modeled around the idea of a poor man's GC, implemented and served to everyone by force
May 14 2021
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, May 14, 2021 at 04:25:47PM +0000, russhy via Digitalmars-d wrote:
 On Friday, 14 May 2021 at 15:30:46 UTC, Adam D. Ruppe wrote:
 On Friday, 14 May 2021 at 15:23:13 UTC, Markk wrote:
 But look at even this forum. It seems to be one of the biggest
 no-gos for D.
IMO that impression is misleading, because those who are happy with the GC are silent and you don't hear from them, and the ones complaining about it are the vocal minority.
 There's two types of programmer: the ones busy doing actual
 productive work and the ones with the spare time to complain about
 GC on the forum.
 nobody complain about the GC
Oh the irony.
 people complain about the fact everything is modeled around the idea
 of a poors man GC, implemented and served to everyone by force
Nobody is forcing you to do anything. If you don't like D because of the GC, there's plenty of alternatives, like Rust, that people here seem to love talking about. Nobody's twisting your arm that you must use D, and nobody's holding a gun to your head that you must use the GC. :-D

T

--
Why do conspiracy theories always come from the same people??
May 14 2021
Markk <markus.kuehni triviso.ch> writes:
On Friday, 14 May 2021 at 16:59:58 UTC, H. S. Teoh wrote:

 IMO that impression is misleading, because those who are happy 
 with the GC are silent and you don't hear from them, and the 
 ones complaining about it are the vocal minority.
Yes, that could well explain a good proportion of it. But then I do find the argument valid for many application scenarios.

And I would probably think less of it if I hadn't watched many of the DConf and other sessions, where the topic of non-GC memory safety seems very dominant. There is this constant "the grass is greener over there" vibe coming across, with nods to Rust et al.

All the proposals I encountered so far (`@live` etc.) always want to ditch the GC entirely (as a global application choice), and that's something that I think will be very damaging to D's power and ecosystem. It will effectively ban all the existing D code base from these applications and make life so much harder for those that want to support both worlds. I guess there will be two stdlibs and two of everything, or if unified it will be crippled.

The proposal I made here might fix this. If it works, it is the best of both worlds, combined.

_Mark
May 14 2021
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 14 May 2021 at 16:59:58 UTC, H. S. Teoh wrote:
 Nobody is forcing you to do anything. If you don't like D 
 because of the GC, there's plenty of alternatives, like Rust, 
 that people here seem to love talking about.  Nobody's twisting 
 your arm that you must use D, and nobody's holding a gun to 
 your head that you must use the GC.
Everybody understands that, but you also have to look at where computing is heading and where people are moving. The trend now is that many people create new languages (thanks to LLVM) for system-like programming. As a result you get many small ecosystems that cannot sustain themselves well. The big winners... C++/Rust and other languages that have momentum.
May 14 2021
russhy <russhy gmail.com> writes:
On Friday, 14 May 2021 at 16:59:58 UTC, H. S. Teoh wrote:
 On Fri, May 14, 2021 at 04:25:47PM +0000, russhy via 
 Digitalmars-d wrote:
 On Friday, 14 May 2021 at 15:30:46 UTC, Adam D. Ruppe wrote:
 On Friday, 14 May 2021 at 15:23:13 UTC, Markk wrote:
 But look at even this forum. It seems to be one of the 
 biggest no-gos for D.
 IMO that impression is misleading, because those who are happy with the GC are silent and you don't hear from them, and the ones complaining about it are the vocal minority.
 There's two types of programmer: the ones busy doing actual 
 productive work and the ones with the spare time to complain 
 about GC on the forum.
 nobody complain about the GC
 Oh the irony.
 people complain about the fact everything is modeled around 
 the idea of a poors man GC, implemented and served to everyone 
 by force
 Nobody is forcing you to do anything. If you don't like D because of the GC, there's plenty of alternatives, like Rust, that people here seem to love talking about. Nobody's twisting your arm that you must use D, and nobody's holding a gun to your head that you must use the GC. :-D T
because of phobos and its people i should stop writing D? i like D with core.stdc, i don't touch anything from std.
May 14 2021
sighoya <sighoya gmail.com> writes:
On Friday, 14 May 2021 at 23:13:38 UTC, russhy wrote:
 because of phobos and its people i should stop write D? i like 
 D with core.stdc, i don't touch anything from std.
No, please don't. But changing the stdlib to support other forms of GC may break existing functionality, which would require rewriting existing code that uses the GC.
 people complain about the fact everything is modeled around the 
 idea of a poors man GC
Could you elaborate more on the poor man's GC? I think this discussion touches on that a bit. What are your ideas to make it less of a poor man's GC?
May 15 2021
IGotD- <nise nise.com> writes:
On Friday, 14 May 2021 at 13:48:12 UTC, Markk wrote:
 Hi,

 [...]

 _Mark
I'm very much against binding any dynamic memory to any thread. It collides with a lot of programming models. For example, threads created outside D, in C++ or any other language, have no knowledge of D GC memory. This means that FFI is much more complicated.

Also threads are like prostitutes, they do the work of the client and then another client comes along doing some other work. Typical examples are thread pools where any thread can do any work. Also, bringing fibers into the equation makes this even more unfitting.

Memory bound to a thread is a bad idea, and as time moves on it becomes more clear that a program should not assume which thread it is running on (should only operate on self) and also not which CPU it is running on.
May 14 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 14 May 2021 at 15:37:22 UTC, IGotD- wrote:
 On Friday, 14 May 2021 at 13:48:12 UTC, Markk wrote:
 [...]
 I'm very much against binding any dynamic memory to any thread. It collides with a lot of programming models. For example threads created outside D, in C++ or any other language has no knowledge of D GC memory. This means that FFI is much more complicated. [...]
Kinda agree on this. A thread is a virtualization of the cpu and a process is a virtualization of the memory.
May 14 2021
Markk <markus.kuehni triviso.ch> writes:
On Friday, 14 May 2021 at 15:37:22 UTC, IGotD- wrote:

 I'm very much against binding any dynamic memory to any thread. 
 It collides with a lot of programming models. For example 
 threads created outside D, in C++ or any other language has no 
 knowledge of D GC memory. This means that FFI is much more 
 complicated.
I disagree by 180°. If the memory management is associated with the thread, such "foreign" threads would be completely left alone by D or the GC, and that's exactly as it should be. Passing memory from/to that thread to a D thread is already a difficult thing to do right, this proposal would make that safer and more formal.
 Also threads are like prostitutes, they do the work of the 
 client and then another client comes along doing some other 
 work. Typically example are thread pools where any thread can 
 do any work.
Again, I disagree. Please read my earlier post about how the Allocator assignment could be scoped (e.g. by the pool bootstrapper): https://forum.dlang.org/post/mqfuxbuuhpvqeyvxoang forum.dlang.org

Pool usage could be supported in a very natural way and be very fast, because for short tasks, the GC would never run and all the memory could be jettisoned en bloc when the thread task goes out of scope.
 Also bring fibers into the equation makes this even more 
 unfitting.
Obviously fibers need to share the same thread heap/Allocator. Other than that, scoping/RAII (of the pinned references) is still valid and this is the most important thing. The limitations/language guarantees would sometimes be overly strict between fibers of the same thread, but they are still valid. Fibers are scheduled cooperatively, i.e. non-preemptively, so the premise of making the GC simpler/faster holds. So what is the problem, exactly?
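
To illustrate the cooperative scheduling mentioned here: a `core.thread.Fiber` only gives up control at an explicit `Fiber.yield()`, so two fibers on one thread can append to the same array without any locking. A minimal sketch (`runInterleaved` is a hypothetical name):

```d
import core.thread : Fiber;

// Hypothetical helper: interleave two fibers on the calling thread
// and record the order in which their steps ran.
int[] runInterleaved()
{
    int[] log;
    auto a = new Fiber({ log ~= 1; Fiber.yield(); log ~= 3; });
    auto b = new Fiber({ log ~= 2; Fiber.yield(); log ~= 4; });

    // Each call() runs the fiber until its next yield or until it returns.
    a.call();
    b.call();
    a.call();
    b.call();

    assert(a.state == Fiber.State.TERM);
    assert(b.state == Fiber.State.TERM);
    return log;
}

void main()
{
    assert(runInterleaved() == [1, 2, 3, 4]);
}
```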
 Memory bounded to a thread is a bad idea and as time moves on 
 it becomes more clear that a program should not assume which 
 thread they are running (should only operate on self) and also 
 not which CPU they are running on.
This contradicts everything I read about locality becoming more and more important with modern multi-core processors. The following is simply the best article I ever, ever read about the issue. I recommend reading it: https://www.informit.com/articles/article.aspx?p=1609144

_Mark
May 14 2021
IGotD- <nise nise.com> writes:
On Friday, 14 May 2021 at 17:02:00 UTC, Markk wrote:
 I disagree by 180°. If the memory management is associated with 
 the thread, such "foreign" threads would be completely left 
 alone by D or the GC, and that's exactly as it should be. 
 Passing memory from/to that thread to a D thread is already a 
 difficult thing to do right, this proposal would make that 
 safer and more formal.
If, for example, C++ calls a D function and the D function does something temporary with arrays, then those arrays will not be cleaned up if the array memory is thread-local. More likely, since the thread has no metadata in D, some error will happen. The programmer will surely notice, but this is very inconvenient, as the D function will not work if called from outside D.
May 14 2021
Markk <markus.kuehni triviso.ch> writes:
On Friday, 14 May 2021 at 17:13:00 UTC, IGotD- wrote:
 On Friday, 14 May 2021 at 17:02:00 UTC, Markk wrote:

 If for example C++ calls a D function, the D function does 
 something temporary with arrays then those arrays will not be 
 cleaned up if the array memory is thread local.
First of all, if the D function lives in the C++ thread (i.e. normal callback) then it inherits the memory management of the C++ thread (e.g. non-GC) and would have to behave accordingly.

The situation is much better than today, where the C++ thread punches into the D memory-managed world, and it is solely the developers' responsibility to make sure not to return GC'd memory back to the C++ thread. The language guarantees (I described in the initial post) would make sure that nothing illegal can leak back into C++ by disallowing memory references from other (GC'd) threads.

If, however, the C++ call wanted to pass memory to/from other D threads, it could do so via message passing. Everything is properly managed and accounted for by the thread separation.

_Mark
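
One way a D function can "behave accordingly" when called from a foreign thread today is to avoid the GC entirely, which `@nogc` lets the compiler check. The sketch below works only on caller-owned buffers; `d_double_all` is a hypothetical entry point, not an existing API. (If such a function did need the GC, the calling thread would first have to be registered with the runtime, for which `core.thread` provides `thread_attachThis`.)

```d
// Callable from C or C++; @nogc makes the compiler reject any GC
// allocation in the body, so no D-managed memory can leak to the caller.
extern (C) @nogc nothrow
size_t d_double_all(const(int)* input, size_t count, int* output)
{
    if (input is null || output is null)
        return 0;
    foreach (i; 0 .. count)
        output[i] = input[i] * 2; // writes only into the caller's buffer
    return count;
}

void main()
{
    int[3] input = [1, 2, 3];
    int[3] output;
    assert(d_double_all(input.ptr, input.length, output.ptr) == 3);
    assert(output == [2, 4, 6]);
}
```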
May 14 2021
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, May 14, 2021 at 01:48:12PM +0000, Markk via Digitalmars-d wrote:
[...]
 D has this nice default per-thread static memory model, i.e. if I
 understand all this correctly, this allows for better, more natural
 thread safety, while it makes it generally unsafe to use this memory
 from other threads (without locking). I guess the same is implicitly
 true for stack memory.
 
 Now could it equally make sense to use per-thread heaps?
It would be nice, because it would allow per-thread GC, which could address some of the problems people complain about with the GC.

However, there's a big caveat: sharing data between threads would be essentially extremely broken. Today, immutable can be safely shared across threads, because well, it's immutable. But once allocations are bound to a thread, this sharing would be impossible without major problems.
 I.e. all allocations would need to be per thread, and it would be
 illegal to reference memory of one thread's heap, static memory, or
 stack from another thread's memory.
[...]
Yeah, this would be a major bugbear for implementing it in D.

T

--
If creativity is stifled by rigid discipline, then it is not true creativity.
May 14 2021
Markk <markus.kuehni triviso.ch> writes:
On Friday, 14 May 2021 at 16:11:26 UTC, H. S. Teoh wrote:

 However, there's a big caveat: sharing data between threads 
 would be essentially extremely broken.  Today, immutable can be 
 safely shared across threads, because well, it's immutable.  
 But once allocations are bound to a thread, this sharing would 
 be impossible without major problems.
No! I did not say this explicitly but of course `immutable` and `shared` remain the same. I was talking about heap memory.
 I.e. all allocations would need to be per thread, and it would 
 be illegal to reference memory of one thread's heap, static 
 memory, or stack from another thread's memory.
 [...] Yeah, this would be a major bugbear for implementing it in D.
Compared to what, for instance, `@live` has to analyze, it is super easy, I think.

_Mark
May 14 2021
sighoya <sighoya gmail.com> writes:
On Friday, 14 May 2021 at 13:48:12 UTC, Markk wrote:
 Just some thoughts after reading a handful Rust and D books... 
 and after having seen so many wrinkle their noses at GC ... and 
 therefore, unfortunately D.

 _Mark
Isn't that what Nim already has, thread-local garbage collection? I thought there was a problem with adopting that in D; I think it relates to traced vs. non-traced pointers, though I'm no expert on this. I wanted to know more about why we can't do this in D, because I like the idea in general.

However, I favor a task-based solution, as already mentioned by Ola, i.e. a green-thread-based local GC, as you could do concurrency without threads.
May 14 2021
Markk <markus.kuehni triviso.ch> writes:
On Friday, 14 May 2021 at 16:14:00 UTC, sighoya wrote:

 Isn't that what Nim already has, thread local garbage 
 collection?
Oh, I must look at Nim.

_Mark
May 14 2021