
digitalmars.D - Leave GC collection to the user of the D library?

reply Ali Çehreli <acehreli yahoo.com> writes:
tl;dr I am scared of non-D programs calling my D library functions from 
foreign threads. So, I am planning on asking the user to trigger 
collection themselves by calling a collection function of my library. Crazy?

I've had serious issues bringing up a D library in a foreign 
environment: Python modules loading a C++ library, which in turn uses 
our D library. There were segmentation failures when loading this .so.

One workaround was to start the GC in disabled state with the following 
global variable defined in the library.

   extern(C) __gshared string[] rt_options = [ "gcopt=disable:1" ];

After that, the GC is enabled inside the library's initialization 
function with the following command.

     GC.enable();
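
Put together, the two pieces above might look like the following minimal sketch. The function name mylib_init is hypothetical (the post does not name the initialization function), and calling Runtime.initialize() here is an assumption about how the runtime gets started in the host process.

```d
import core.memory : GC;
import core.runtime : Runtime;

// Ask druntime to start with automatic collections disabled.
extern(C) __gshared string[] rt_options = [ "gcopt=disable:1" ];

// Hypothetical entry point (name assumed) that the host program calls
// once, explicitly, before any other library function.
extern(C) int mylib_init()
{
    // Bring up druntime if the host has not done so already.
    if (!Runtime.initialize())
        return 1;

    // Undo the "disable:1" start-up option now that loading has finished.
    GC.enable();
    return 0;
}
```

GC.enable() is meant to balance exactly one earlier disable, so this sketch assumes mylib_init is called only once.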

That workaround seemed to be sufficient to load the library 
successfully. Unfortunately, that was not enough to weed out all issues 
related to libraries because this library itself loads other D 
libraries. All of this caused sporadic issues. (My brain is too fried to 
even remember what was a cause, what was a usable workaround, etc. 
Sometimes I wasted days chasing a solution while using a test, which had 
nothing to do with the solution. I would change the code, test, no go; 
repeat, no go. It turns out, my test was unrelated. Argh!)

So, we came up with a drastic solution: Since all this code works just 
fine in a pure D environment, make the library as thin as possible; the 
library starts a daemon that is written in D with all the functionality. 
The library merely dispatches the requests to that daemon.

The library starts the daemon with pipeProcess(); pipes are used for 
dispatching requests and shared memory is used for large data. This idea 
"worked like a charm." Phew!

However, dispatching of the requests to the daemon is performed by a 
single library thread in a blocked manner: When a request is written to 
the pipe, the response is read back (blocked) and the result is returned 
to the user of the library function.
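
A minimal sketch of that blocking round trip, assuming a line-based protocol (the post does not describe the wire format). The names Channel and startEcho are hypothetical, and "cat" merely stands in for the real daemon so the example is self-contained.

```d
import std.process : pipeProcess, ProcessPipes, Redirect, wait;
import std.string : strip;

// Hypothetical wrapper around the pipes to the daemon.
struct Channel
{
    ProcessPipes daemon;

    // Write one request line, then block until one response line arrives.
    string call(string request)
    {
        daemon.stdin.writeln(request);
        daemon.stdin.flush();
        return daemon.stdout.readln().strip;
    }

    // Close the request pipe and wait for the child to exit.
    void shutdown()
    {
        daemon.stdin.close();
        wait(daemon.pid);
    }
}

// "cat" echoes every line back; it is only a placeholder for the daemon.
Channel startEcho()
{
    return Channel(pipeProcess(["cat"], Redirect.stdin | Redirect.stdout));
}
```

Because call() writes and then reads on the same pair of pipes, two threads using one Channel at the same time would interleave their requests and responses, which is why a single dispatching thread is the simple arrangement.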

So now we want to use this functionality from multiple threads. Yikes! I 
had so much trouble with foreign threads calling D libraries in the past 
that I get scared. (In one case it was Java threads.) There are so many 
dimensions to play with, hypothesizing a correct solution has been 
exhausting. I was never sure whether the issues were e.g. with 
thread_attachThis() or my misuse of it.

Ok... How about this idea that would allow this library to be used from 
multiple threads: Leave the GC disabled with that 'rt_options' variable 
above and don't enable it in the library initialization function (this 
is not init(); rather, a function that the user calls explicitly). 
Instead, add yet another library API function for collecting garbage. I 
can document that no other thread is allowed to call any other function 
of the library when this collection function is called. They can do this 
either at strategic points that they know no other thread is using the 
library or they can use a mutex.

Another trivial function that I add can relay GC stats to the user so 
that they can decide to call the GC if the allocations have been high 
enough.
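
The two extra API functions could be as small as the sketch below. Both names are hypothetical, and the sketch assumes the caller honors the documented rule that no other thread is inside the library during the collection. It does not settle whether the calling thread must be attached to the runtime.

```d
import core.memory : GC;

// Hypothetical: report how many bytes the GC heap currently has in use,
// so the caller can decide whether a collection is worthwhile.
extern(C) size_t mylib_gc_used_bytes()
{
    return GC.stats().usedSize;
}

// Hypothetical: run a collection on request. The caller must guarantee
// that no other thread is executing library code at this point.
extern(C) void mylib_gc_collect()
{
    GC.collect();
}
```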

This would allow the user to start as many foreign threads as possible. 
Right? Is this sane? Is collection the only issue here? Do foreign 
threads still need to call thread_attachThis()? What happens if they don't?

I feel so hopeless that in the past, I even thought about and 
experimented with banning the user from starting threads on their own. 
Rather, they would call my library on a posix compatible thread API and 
create their threads through me, which happens to be a D thread, so no 
thread would be a "foreign thread" and everything would work just fine. 
I haven't deployed this crazy idea (yet).

Ali
May 08 2021
next sibling parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 9 May 2021 at 03:25:06 UTC, Ali Çehreli wrote:
 The library starts the daemon with pipeProcess(); pipes are 
 used for dispatching requests and shared memory is used for 
 large data. This idea "worked like a charm." Phew!
Why don't you do this in a manner that works with multiple threads? Block calling threads with semaphores, wake them up when results are ready. (Sidenote, if gc was task limited then this would not have been an issue...)
May 09 2021
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 5/9/21 12:59 AM, Ola Fosheim Grostad wrote:

 On Sunday, 9 May 2021 at 03:25:06 UTC, Ali Çehreli wrote:
 The library starts the daemon with pipeProcess(); pipes are used for
 dispatching requests and shared memory is used for large data. This
 idea "worked like a charm." Phew!
Why don't you do this in a manner that works with multiple threads?
That's what I want to do but those threads are created by the user, unknown to the D GC. Although there are ways of dealing with that case[1], I am questioning whether the library can disable the GC and let the user manage GC collections explicitly (cooperatively?).

Ali

[1] https://dlang.org/phobos/core_memory.html
May 09 2021
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 9 May 2021 at 10:27:04 UTC, Ali Çehreli wrote:
 That's what I want to do but those threads are created by the 
 user, unknown to the D GC. Although there are ways of dealing 
 with that case[1], I am questioning whether the library can 
 disable GC and lets the user manage GC collections explicitly 
 (cooperatively?).
Wouldn't this be annoying for the user of the library? (Or maybe you only have a handful of users?) I have to admit I haven't read about IPC in a decade or so, but it seems to me that there are many options? Like, if you have N cores, create N worker threads in the daemon and create N pipes? Then set the count of the semaphore to N, so when the semaphore hits 0 all pipes are in use, and the calling thread will wait in the API stub (wrapper function) until a worker thread is available? Kinda clunky, but fits your model... I guess. But maybe you want to get rid of the daemon? In that case I probably would just use mailboxes and wrap the API in futures if the work isn't fine-grained, but I don't know what kind of library you are making...
May 09 2021
next sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 9 May 2021 at 10:42:19 UTC, Ola Fosheim Grøstad wrote:
 semaphore hits 0 all pipes are in use, and the calling thread 
 will wait in the API stub (wrapper function) until a worker 
 thread is available? Kinda clunky, but fits your model... I 
 guess.
So I imagine you could structure your API wrapper something like this:

   semaphore.init(N)

   wrapped_api_call(){
     semaphore.down(1)
     dostuff()
     api_call()
     semaphore.up(1)
     if (memory pressure){
       semaphore.down(N)
       gc_collect()
       semaphore.up(N)
     }
   }

I dunno.
May 09 2021
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 5/9/21 3:42 AM, Ola Fosheim Grøstad wrote:

 On Sunday, 9 May 2021 at 10:27:04 UTC, Ali Çehreli wrote:
 whether the library can disable GC and lets
 the user manage GC collections explicitly (cooperatively?).
Wouldn't this be annoying for the user of the library? (Or maybe you only have a handful users?).
Yes, would be annoying; that's why I am asking what others would think about this. (And yes, there are a handful of users.)
 if you have N cores, create N
 worker threads in the daemon and create N pipes?
That's my idea. The trouble is with the thin library layer: It cannot allocate memory while another thread is doing collection. This is handled by the D runtime for D threads but foreign threads (ones created by e.g. the C++ program) cannot be known to the D runtime by default.
 Then set the count of
 the semaphore to N, so when the semaphore hits 0 all pipes are in use,
 and the calling thread will wait in the API stub (wrapper function)
 until a worker thread is available?
That's an interesting idea! But it should be a little different: "I need to collect garbage; I will wait till all others are quiet (and they should know to wait until the collection is over); I collect garbage. Everybody resume." Still, this would require keeping the D GC disabled and my library handles collections at opportune moments that it creates.
 But maybe you want to get rid of the daemon?
I want the daemon because a full-fledged D library that loaded its own libraries as needed caused issues. Perhaps they were all my mistakes but now that it works this way, the daemon stays. :)

Ali
May 09 2021
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 9 May 2021 at 20:36:39 UTC, Ali Çehreli wrote:
 That's an interesting idea! But it should be a little 
 different: "I need to collect garbage; I will wait till all 
 others are quiet (and they should know to wait until the 
 collection is over); I collect garbage. Everybody resume." 
 Still, this would require keeping the D GC disabled and my 
 library handles collections at opportune moments that it 
 creates.
Yes, one semaphore may use a nonoptimal order when it wakes up threads... Getting this right takes some time.
May 09 2021
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 5/9/21 6:58 PM, Ola Fosheim Grostad wrote:

 Yes, one semaphore may use a nonoptimal order when it wakes up threads...
 Getting this right takes some time.
ReadWriteMutex is promising: https://dlang.org/library/core/sync/rwmutex/read_write_mutex.html The writer would be the garbage collecting thread in this case. Ali
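
A rough sketch of that arrangement, assuming every API call takes the reader side and the collecting call takes the writer side. The names gcGate, mylib_call and mylib_collect_exclusive are hypothetical, and the arithmetic in mylib_call only stands in for real work. Whether these primitives behave correctly on threads the D runtime does not know about is not established here.

```d
import core.memory : GC;
import core.sync.rwmutex : ReadWriteMutex;

// Hypothetical gate shared by all library entry points.
__gshared ReadWriteMutex gcGate;

shared static this()
{
    // Prefer writers so a pending collection is not starved by callers.
    gcGate = new ReadWriteMutex(ReadWriteMutex.Policy.PREFER_WRITERS);
}

// Ordinary API call: many of these may run concurrently as readers.
extern(C) int mylib_call(int x)
{
    gcGate.reader.lock();
    scope(exit) gcGate.reader.unlock();
    return x + 1; // stand-in for dispatching a request
}

// Collection: runs alone as the single writer.
extern(C) void mylib_collect_exclusive()
{
    gcGate.writer.lock();
    scope(exit) gcGate.writer.unlock();
    GC.collect();
}
```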
May 09 2021
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Monday, 10 May 2021 at 03:45:23 UTC, Ali Çehreli wrote:
 https://dlang.org/library/core/sync/rwmutex/read_write_mutex.html

 The writer would be the garbage collecting thread in this case.
The implementation seems to use a class monitor and synchronized, does that work with non-D threads?
May 10 2021
prev sibling next sibling parent reply Daniel N <no public.email> writes:
On Sunday, 9 May 2021 at 03:25:06 UTC, Ali Çehreli wrote:
 tl;dr I am scared of non-D programs calling my D library 
 functions from foreign threads. So, I am planning on asking the 
 user to trigger collection themselves by calling a collection 
 function of my library. Crazy?
Since this is a "thin" library, it feels like the sane solution is to make it 100% nogc and keep the GC only in the server. But that might not be possible if the design of those other libraries is all-in on GC: "this library itself loads other D libraries"
May 09 2021
parent Ali Çehreli <acehreli yahoo.com> writes:
On 5/9/21 1:09 AM, Daniel N wrote:

 On Sunday, 9 May 2021 at 03:25:06 UTC, Ali Çehreli wrote:
 tl;dr I am scared of non-D programs calling my D library functions
 from foreign threads. So, I am planning on asking the user to trigger
 collection themselves by calling a collection function of my library.
 Crazy?
 Since this is a "thin" library, feels like the sane solution is to make
 it 100% nogc and keep the GC only in the server.
May not be possible because the library creates MmFile objects as needed (the data length to be placed on shared memory is not known in advance). Being a class, MmFile objects are naturally created with 'new' but perhaps placement new could work. I haven't investigated that.
 But might not be possible if the design of those other libraries are
 all-in on GC
 "this library itself loads other D libraries"
I wasn't clear on that part: Now that the library is thin, loading other D libraries has already been pushed to the backend daemon.

So, my worry is based on my lack of confidence in managing foreign threads with thread_attachThis and thread_detachThis. Should I use a thread-local Boolean to keep track? Should I call those inside 'static this' and 'static ~this' blocks? What if the thread dies? I see that even the name of the functions gained a "_tpl":

  https://dlang.org/phobos/core_thread_threadbase.html#.thread_attachThis_tpl

I wonder what "_tpl" means. It mentions rt_moduleTlsCtor() there.

Failing to find my way through all of that is the reason why I am hoping that I can push a GC collection cycle to the caller. Although embarrassing, it seems to be a more reliable design because the user is in a better position to know they are not executing any of the D library functions when they call the collection function.

Ali
May 09 2021
prev sibling next sibling parent reply IGotD- <nise nise.com> writes:
On Sunday, 9 May 2021 at 03:25:06 UTC, Ali Çehreli wrote:
 That workaround seemed to be sufficient to load the library 
 successfully. Unfortunately, that was not enough to weed out 
 all issues related to libraries because this library itself 
 loads other D libraries.
Can you statically link the library so that no other D library is needed? It might duplicate code but whatever. However, disabling the GC in D is difficult: as soon as some library function uses an array, for example, you need the GC.
 This would allow the user start as many foreign threads as 
 possible. Right? Is this sane? Is collection the only issue 
 here? Do foreign threads still need to call thread_attachThis()? 
 What happens if they don't?

 I feel so hopeless that in the past, I even thought about and 
 experimented with banning the user from starting threads on 
 their own. Rather, they would call my library on a posix 
 compatible thread API and create their threads through me, 
 which happens to be a D thread, so no thread would be a 
 "foreign thread" and everything would work just fine. I haven't 
 deployed this crazy idea (yet).

 Ali
This is why I'm very opposed to any thread-specific GC optimization. Memory allocated on the GC heap or with malloc/free must always be global and usable from any thread. Also, TLS must be removed from phobos/druntime so that there is basically no need for tracking the threads from a memory point of view; at least it shouldn't crash if D doesn't know about the thread unless using thread primitives. Highest priority should be removing TLS totally. Second, make sure there is no connection between memory management and threads.
May 09 2021
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 9 May 2021 at 13:42:48 UTC, IGotD- wrote:
 Also, TLS must be removed from phobos/druntime so that there is 
 basically no need for tracking the threads from a memory point 
 of view, at least it shouldn't crash if D doesn't know about 
 the thread unless using thread primitives.
I don't think that would be enough. The garbage collector needs to know of all threads running D code, so that it can scan their stack and registers, so that it can know about objects the pointer to which exists only in that thread's stack or registers.
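
For reference, registering a foreign thread uses thread_attachThis and thread_detachThis from core.thread. The sketch below wraps them in two hypothetical entry points (mylib_thread_enter and mylib_thread_leave are assumed names) that a foreign thread would call around its use of the library. It assumes the thread always reaches the leave call before exiting, which is exactly the part reported as fragile elsewhere in this thread.

```d
import core.thread : thread_attachThis, thread_detachThis;

// Hypothetical: a foreign thread calls this before its first library
// call so the GC can find and scan its stack and registers.
extern(C) void mylib_thread_enter()
{
    thread_attachThis();
}

// Hypothetical: the same thread calls this just before it exits so the
// runtime does not try to suspend a thread that no longer exists.
extern(C) void mylib_thread_leave()
{
    thread_detachThis();
}
```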
May 09 2021
parent reply IGotD- <nise nise.com> writes:
On Sunday, 9 May 2021 at 13:53:30 UTC, Vladimir Panteleev wrote:
 I don't think that would be enough. The garbage collector needs 
 to know of all threads running D code, so that it can scan 
 their stack and registers, so that it can know about objects 
 the pointer to which exists only in that thread's stack or 
 registers.
That's the unfortunate situation with D, then. The otherwise good FFI interoperability of D is hampered by this. Then we are back to whether GC pointers should be their own type or not. Time for the D3 fork.
May 09 2021
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 9 May 2021 at 14:10:48 UTC, IGotD- wrote:
 That's the unfortunate case with D in that case. The otherwise 
 good FF interoperability of D is hampered by this. Then we are 
 back to if GC pointers should be an own type or not. Time for 
 the D3 fork.
I wonder what you think about task-bound GC? https://forum.dlang.org/post/yqdwgbzkmutjzfdhotst@forum.dlang.org
May 09 2021
parent reply IGotD- <nise nise.com> writes:
On Sunday, 9 May 2021 at 16:08:07 UTC, Ola Fosheim Grøstad wrote:
 I wonder what you think about task-bound GC?

 https://forum.dlang.org/post/yqdwgbzkmutjzfdhotst@forum.dlang.org
I see it as a special case where memory management is bound to a certain primitive/way of programming. Also, a pool for each possible primitive will increase the metadata. Then you also mention that when the actor is destroyed no scan is needed because the pool is explicitly destroyed as well. This is basically a form of deterministic cleanup that you would expect in C++ when using unique_ptr, for example. Some in that thread mention that D must head towards RC and that's my opinion as well. As Walter does not want to make a special fat pointer type, at least making Phobos/druntime use RC internally would be a step in the right direction.
May 09 2021
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 9 May 2021 at 16:29:12 UTC, IGotD- wrote:
 I see it as a special case where memory management is bounded 
 to a certain primitive/ways of programming. Also a pool for 
 each possible primitive will increase the meta data.
What do you mean by meta data in this context? I think it should resolve at compile time? Yes, it does imply a programming model if you want GC, but it should allow the programmer to write his own scheduler for flexibility. You would still need RC between tasks...
 Some in that thread mention that D must head towards RC and 
 that's what my opinion is as well. As Walter do not want to 
 make a special fat pointer type, at least making 
 Phobos/druntime using RC internally would be a step in the 
 right direction.
The problem of not having a fat pointer type is when you have arrays of objects. Then you cannot have a RC counter on a negative offset, so you will be forced to embed it in all objects (whether RC-managed or not). It can be mitigated by requiring that all owning pointers to arrays must be a RC-slice and not a RC-pointer. Then you have a RC-slice of static length 1. (shrug)
May 09 2021
prev sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 9 May 2021 at 03:25:06 UTC, Ali Çehreli wrote:
 That workaround seemed to be sufficient to load the library 
 successfully. Unfortunately, that was not enough to weed out 
 all issues related to libraries because this library itself 
 loads other D libraries. All of this caused sporadic issues. 
 (My brain is too fried to even remember what was a cause, what 
 was a usable workaround, etc. Sometimes I wasted days chasing a 
 solution while using a test, which had nothing to do with the 
 solution. I would change the code, test, no go; repeat, no go. 
 It turns out, my test was unrelated. Argh!)
In my experience, calling D from C/C++ works fine as long as 1) the D runtime is allowed to initialize, and 2) all threads which execute D code are registered with the D runtime. If C/C++ code is allowed to hold the only reference to an object in the D GC heap, then the second rule needs to be extended to all threads which may hold a reference to said objects, but it may be practical to copy D objects at the C/D barrier either to caller-owned memory, or malloc-allocated memory that the caller can free by calling the standard C `free` function.
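
The copy at the C/D barrier can be as simple as the sketch below, which duplicates a D string into malloc-allocated memory that the caller later releases with the standard C free. The helper name toMallocedCString is hypothetical.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;

// Hypothetical helper: copy a D slice into a NUL-terminated buffer that
// the GC does not own. Returns null if malloc fails.
char* toMallocedCString(const(char)[] s) nothrow @nogc
{
    auto p = cast(char*) malloc(s.length + 1);
    if (p is null)
        return null;
    if (s.length)
        memcpy(p, s.ptr, s.length);
    p[s.length] = '\0';
    return p;
}
```

Since the returned buffer is never a GC allocation, the foreign side can hold it for as long as it likes without the collector needing to know about that thread's stack.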
 So, we came up with a drastic solution: Since all this code 
 works just fine in a pure D environment, make the library as 
 thin as possible; the library starts a daemon that is written 
 in D with all the functionality. The library merely dispatches 
 the requests to that daemon.

 The library starts the daemon with pipeProcess(); pipes are 
 used for dispatching requests and shared memory is used for 
 large data. This idea "worked like a charm." Phew!

 However, dispatching of the requests to the daemon is performed 
 by a single library thread in a blocked manner: When a request 
 is written to the pipe, the response is read back (blocked) and 
 the result is returned to the user of the library function.
If the threads don't need to share state, you could just as well spawn one subprocess per thread, and let it do its own data processing. Another approach would be to listen on a UNIX socket instead of using a pipe, which allows using `accept` to open new communication channels on-demand.
 I feel so hopeless that in the past, I even thought about and 
 experimented with banning the user from starting threads on 
 their own. Rather, they would call my library on a posix 
 compatible thread API and create their threads through me, 
 which happens to be a D thread, so no thread would be a 
 "foreign thread" and everything would work just fine. I haven't 
 deployed this crazy idea (yet).
Perhaps it would be simpler to just write the library part in C / C++ / -betterC D. std.mmfile has many lines, but the work it has to do is actually quite simple. The same is true about std.socket. This will completely avoid your headache with getting the D runtime / GC to play well with the host process's threading model.
May 09 2021
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 5/9/21 6:51 AM, Vladimir Panteleev wrote:

 In my experience, calling D from C/C++ works fine as long as 1) the D
 runtime is allowed to initialize, and 2) all threads which execute D
 code are registered with the D runtime.
I have the following old branch that tries to fix bugs with attaching and detaching: https://github.com/dlang/druntime/pull/1989 I also have this worry that if a foreign thread goes without the knowledge of D runtime, the program will crash because the runtime would try to stop a non-existing thread.
 If the threads don't need to share state, you could just as well spawn
 one subprocess per thread, and let it do its own data processing.
That can be achieved by some changes but the issue is not with the backend (the daemon); rather, the foreign threads that call the thin library layer. This layer does allocate (see below), which may trigger collection.
 Another approach would be to listen on a UNIX socket instead of using a
 pipe, which allows using `accept` to open new communication channels
 on-demand.
Aside: I've been under the impression that there couldn't be new pipes opened but I learned that a file descriptor can be passed to another process over unix domain sockets (only Linux is interesting to me here) and the file descriptor and its "passed copy" can be used like a pipe: https://stackoverflow.com/questions/2358684/can-i-share-a-file-descriptor-to-another-process-on-linux-or-are-they-local-to-t I haven't experimented with it yet but I think I will use it because the existing channel I wrote uses File objects of PipeProcess, which places large data on shared memory which is re-opened as more space is needed. (And thank you for your work in std.process and more! :) )
 Perhaps it would be simpler to just write the library part in C / C++ /
 -betterC D.
That's a great idea as well.
 std.mmfile has many lines, but the work it has to do is
 actually quite simple.
I know because I've already written the equivalent of mmfile myself by accident. :D When things didn't work about 2 years ago I suspected bugs in mmfile so I wrote my own wrapper. :)
 The same is true about std.socket. This will
 completely avoid your headache with getting the D runtime / GC to play
 well with the host process's threading model.
Makes sense. I've already turned into someone who uses only D's standard library or writes his own. :) The only third party dependency we have is cmake-d. Ali
May 09 2021
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Sunday, 9 May 2021 at 20:53:28 UTC, Ali Çehreli wrote:
 On 5/9/21 6:51 AM, Vladimir Panteleev wrote:

 In my experience, calling D from C/C++ works fine as long as 1) the D
 runtime is allowed to initialize, and 2) all threads which execute D
 code are registered with the D runtime.
I have the following old branch that tries to fix bugs with attaching and detaching: https://github.com/dlang/druntime/pull/1989 I also have this worry that if a foreign thread goes without the knowledge of D runtime, the program will crash because the runtime would try to stop a non-existing thread.
That sounds like a bug? 🤔
May 09 2021
prev sibling parent Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 9 May 2021 at 20:53:28 UTC, Ali Çehreli wrote:
 That can be achieved by some changes but the issue is not with 
 the backend (the daemon); rather, the foreign threads that call 
 the thin library layer.

 [...]

 Aside: I've been under the impression that there couldn't be 
 new pipes opened but I learned that a file descriptor can be 
 passed to another process over unix domain sockets (only Linux 
 is interesting to me here) and the file descriptor and its 
 "passed copy" can be used like a pipe:


 https://stackoverflow.com/questions/2358684/can-i-share-a-file-descriptor-to-another-process-on-linux-or-are-they-local-to-t
Yes. But, all of this requires shared state, which requires synchronization and/or initialization. Alternatively, each thread would be entirely self-contained (and own its personal worker process), which avoids those issues entirely. (Perhaps with TLS, or better, by making the C++ caller own the state object.)
May 10 2021