www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - More fun with toStringz and the GC

reply Don Allen <donaldcallen gmail.com> writes:
Remember all the fun we had last year when I failed to heed the 
warning in the toStringz documentation about retaining a 
reference to a char * passed into C? It took a long time to find 
that one, with a lot of help from Steve Schveighoffer and others.

Well, I've got another one. Consider this:

````
         // Get number of children of the parent account
         auto gc_protect = bind_text(n_children_stmt, 1, 
parent.guid);
         parent.children.length = one_row!(int)(n_children_stmt, 
&get_int);

         auto gc_protect2 = bind_text(account_child_stmt, 1, 
parent.guid);
         for (int i = 0; next_row_available_p(account_child_stmt, 
&sqlite3_reset); i++) {
             parent.children[i] = new Account;
             parent.children[i].name = 
fromStringz(sqlite3_column_text(account_child_stmt, 0)).idup;
             parent.children[i].guid = 
fromStringz(sqlite3_column_text(account_child_stmt, 1)).idup;
             parent.children[i].flags = 
sqlite3_column_int(account_child_stmt, 2);
             parent.children[i].value = 
get_account_value(parent.children[i]);
         }
````
bind_text takes a D string, turns it into a C string with 
toStringz, uses that to call sqlite3_bind_text and returns the C 
string, which I store as you can see with the intention of 
protecting it from the gc. The code as written above does not 
work. At some point, I get an index-out-of-bounds error, because 
the loop is seeing too many children. If I turn off the GC, the 
code works correctly and the application completes normally.

With the GC on, if I put a debugging writeln inside the loop, 
right after the 'for', that prints, among other things, the value 
of gc_protect2 (I wanted to convince myself that the GC wasn't 
moving what it points to; yes, I know the documentation says the 
current GC won't do that), the problem goes away. A Heisenbug!

My theory: because gc_protect2 is never referenced, I'm guessing 
that the compiler is optimizing away the storage of the returned 
pointer, the supporting evidence being what I said in the 
previous paragraph. Anyone have a better idea?

By the way, I get the same error compiling this with dmd or ldc.

/Don Allen
Aug 05 2022
parent reply jfondren <julian.fondren gmail.com> writes:
On Friday, 5 August 2022 at 22:51:07 UTC, Don Allen wrote:
 My theory: because gc_protect2 is never referenced, I'm 
 guessing that the compiler is optimizing away the storage of 
 the returned pointer, the supporting evidence being what I said 
 in the previous paragraph. Anyone have a better idea?
A local variable definitely isn't enough: https://forum.dlang.org/thread/xchnfzvpmxgytqprbosz forum.dlang.org This package came of it: https://code.dlang.org/packages/keepalive
Aug 05 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/5/22 7:13 PM, jfondren wrote:
 On Friday, 5 August 2022 at 22:51:07 UTC, Don Allen wrote:
 My theory: because gc_protect2 is never referenced, I'm guessing that 
 the compiler is optimizing away the storage of the returned pointer, 
 the supporting evidence being what I said in the previous paragraph. 
 Anyone have a better idea?
A local variable definitely isn't enough: https://forum.dlang.org/thread/xchnfzvpmxgytqprbosz forum.dlang.org This package came of it: https://code.dlang.org/packages/keepalive
Yes, but I will warn you, the compilers are smart buggers. I think someone came up with a case where this still doesn't keep it alive (been a while since I made that). The only true solution is to use `GC.addRoot` on the string and `GC.removeRoot` when you are done. -Steve
Aug 05 2022
parent reply Don Allen <donaldcallen gmail.com> writes:
On Friday, 5 August 2022 at 23:38:22 UTC, Steven Schveighoffer 
wrote:
 On 8/5/22 7:13 PM, jfondren wrote:
 On Friday, 5 August 2022 at 22:51:07 UTC, Don Allen wrote:
 My theory: because gc_protect2 is never referenced, I'm 
 guessing that the compiler is optimizing away the storage of 
 the returned pointer, the supporting evidence being what I 
 said in the previous paragraph. Anyone have a better idea?
A local variable definitely isn't enough: https://forum.dlang.org/thread/xchnfzvpmxgytqprbosz forum.dlang.org This package came of it: https://code.dlang.org/packages/keepalive
Yes, but I will warn you, the compilers are smart buggers. I think someone came up with a case where this still doesn't keep it alive (been a while since I made that). The only true solution is to use `GC.addRoot` on the string and `GC.removeRoot` when you are done.
Steve -- Thanks for this. But this time I *did* read the documentation, specifically this: ```` Interfacing Garbage Collected Objects With Foreign Code The garbage collector looks for roots in: the static data segment the stacks and register contents of each thread the TLS (thread-local storage) areas of each thread any roots added by core.memory.GC.addRoot() or core.memory.GC.addRange() If the only pointer to an object is held outside of these areas, then the collector will miss it and free the memory. To avoid this from happening, either maintain a pointer to the object in an area the collector does scan for pointers; add a root where a pointer to the object is stored using core.memory.GC.addRoot() or core.memory.GC.addRange(). reallocate and copy the object using the foreign code's storage allocator or using the C runtime library's malloc/free. ```` And this, from Section 32.2 of the Language Reference Manual: ```` If pointers to D garbage collector allocated memory are passed to C functions, it's critical to ensure that the memory will not be collected by the garbage collector before the C function is done with it. This is accomplished by: Making a copy of the data using core.stdc.stdlib.malloc() and passing the copy instead. -->Leaving a pointer to it on the stack (as a parameter or automatic variable), as the garbage collector will scan the stack.<-- Leaving a pointer to it in the static data segment, as the garbage collector will scan the static data segment. Registering the pointer with the garbage collector with the std.gc.addRoot() or std.gc.addRange() calls. ```` I did what the documentation says and it does not work. Having a better version of C and C++ with a gc and the ability to directly call useful C/C++ libraries is a big D selling point, as far as I am concerned. It was a major motivation for the creation of Go. But getting the interaction between the GC and foreign functions properly documented is essential. Right now, there are bits and pieces of advice in the Language Reference, the Feature Overview, and the toStringz documentation and none of it tells you what you need to know. In fact, it does the opposite, telling you to do something (stick a pointer on the stack) that does not work, which leads to the "nasty bug" spoken of in the toStringz doc. When you waste a lot of a user's time with poor and inaccurate documentation, as this did mine, you are not making friends. I would advise fixing this asap. /Don
Aug 05 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/5/22 8:51 PM, Don Allen wrote:

 And this, from Section 32.2 of the Language Reference Manual:
 ````
 If pointers to D garbage collector allocated memory are passed to C 
 functions, it's critical to ensure that the memory will not be collected 
 by the garbage collector before the C function is done with it. This is 
 accomplished by:
 
      Making a copy of the data using core.stdc.stdlib.malloc() and 
 passing the copy instead.
      -->Leaving a pointer to it on the stack (as a parameter or 
 automatic variable), as the garbage collector will scan the stack.<--
      Leaving a pointer to it in the static data segment, as the garbage 
 collector will scan the static data segment.
      Registering the pointer with the garbage collector with the 
 std.gc.addRoot() or std.gc.addRange() calls.
 ````
 I did what the documentation says and it does not work.
I know, I felt exactly the same way in my post on it: https://forum.dlang.org/post/sial38$7v0$1 digitalmars.com I even issued a PR to remove the problematic recommendation: https://github.com/dlang/dlang.org/pull/3102 But there was pushback to the point where it wasn't worth it. So I closed it. -Steve
Aug 05 2022
next sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Fri, Aug 05, 2022 at 10:14:24PM -0400, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 On 8/5/22 8:51 PM, Don Allen wrote:
 
 And this, from Section 32.2 of the Language Reference Manual:
 ````
 If pointers to D garbage collector allocated memory are passed to C
 functions, it's critical to ensure that the memory will not be
 collected by the garbage collector before the C function is done
 with it. This is accomplished by:
 
      Making a copy of the data using core.stdc.stdlib.malloc() and
  passing the copy instead.
      -->Leaving a pointer to it on the stack (as a parameter or
 automatic variable), as the garbage collector will scan the stack.<--
      Leaving a pointer to it in the static data segment, as the garbage
 collector will scan the static data segment.
      Registering the pointer with the garbage collector with the
 std.gc.addRoot() or std.gc.addRange() calls.
 ````
 I did what the documentation says and it does not work.
I know, I felt exactly the same way in my post on it: https://forum.dlang.org/post/sial38$7v0$1 digitalmars.com I even issued a PR to remove the problematic recommendation: https://github.com/dlang/dlang.org/pull/3102 But there was pushback to the point where it wasn't worth it. So I closed it.
[...] IMO this PR should be revived. The one thing worse than no documentation is misleading documentation. This state of things should not be allowed to continue. T -- Study gravitation, it's a field with a lot of potential.
Aug 06 2022
prev sibling parent reply Don Allen <donaldcallen gmail.com> writes:
On Saturday, 6 August 2022 at 02:14:24 UTC, Steven Schveighoffer 
wrote:
 On 8/5/22 8:51 PM, Don Allen wrote:

 And this, from Section 32.2 of the Language Reference Manual:
 ````
 If pointers to D garbage collector allocated memory are passed 
 to C functions, it's critical to ensure that the memory will 
 not be collected by the garbage collector before the C 
 function is done with it. This is accomplished by:
 
      Making a copy of the data using core.stdc.stdlib.malloc() 
 and passing the copy instead.
      -->Leaving a pointer to it on the stack (as a parameter 
 or automatic variable), as the garbage collector will scan the 
 stack.<--
      Leaving a pointer to it in the static data segment, as 
 the garbage collector will scan the static data segment.
      Registering the pointer with the garbage collector with 
 the std.gc.addRoot() or std.gc.addRange() calls.
 ````
 I did what the documentation says and it does not work.
I know, I felt exactly the same way in my post on it: https://forum.dlang.org/post/sial38$7v0$1 digitalmars.com I even issued a PR to remove the problematic recommendation: https://github.com/dlang/dlang.org/pull/3102 But there was pushback to the point where it wasn't worth it. So I closed it.
As I said in my previous post, the documentation issue really needs to be addressed. I do realize now that I *assumed* that what I did was going to result in a stack reference to the c-string I was trying to keep alive. Bad assumption, obviously. But I think the point is that there is a simple, reliable mechanism -- addRoot, removeRoot -- that works and the documentation should say that and only that. Walter said this in his 9/25/21 post: "Use GC.addRoot() to keep a reference alive. That's what it's for. ". That's all that's needed. All the rest leads people like me who don't think like a compiler to make the mistake I made.
Aug 06 2022
parent Don Allen <donaldcallen gmail.com> writes:
On Saturday, 6 August 2022 at 13:40:12 UTC, Don Allen wrote:
 On Saturday, 6 August 2022 at 02:14:24 UTC, Steven 
 Schveighoffer wrote:
 On 8/5/22 8:51 PM, Don Allen wrote:

 And this, from Section 32.2 of the Language Reference Manual:
 ````
 If pointers to D garbage collector allocated memory are 
 passed to C functions, it's critical to ensure that the 
 memory will not be collected by the garbage collector before 
 the C function is done with it. This is accomplished by:
 
      Making a copy of the data using 
 core.stdc.stdlib.malloc() and passing the copy instead.
      -->Leaving a pointer to it on the stack (as a parameter 
 or automatic variable), as the garbage collector will scan 
 the stack.<--
      Leaving a pointer to it in the static data segment, as 
 the garbage collector will scan the static data segment.
      Registering the pointer with the garbage collector with 
 the std.gc.addRoot() or std.gc.addRange() calls.
 ````
 I did what the documentation says and it does not work.
I know, I felt exactly the same way in my post on it: https://forum.dlang.org/post/sial38$7v0$1 digitalmars.com I even issued a PR to remove the problematic recommendation: https://github.com/dlang/dlang.org/pull/3102 But there was pushback to the point where it wasn't worth it. So I closed it.
As I said in my previous post, the documentation issue really needs to be addressed. I do realize now that I *assumed* that what I did was going to result in a stack reference to the c-string I was trying to keep alive.
At the risk of over-doing this, one more thing I want to say in the interest of clarity: the incorrect documentation led me right into this error: "This is accomplished by ..... Leaving a pointer to it on the stack (as a parameter or automatic variable), as the garbage collector will scan the stack." I've fixed my code using addRoot/removeRoot and so far it seems to work.
Aug 06 2022