
digitalmars.D.learn - returning D string from C++?

bitwise <bitwise.pvt gmail.com> writes:
I have a Windows native window class in C++, and I need a 
function to return the window title.

So in D, I have this:

// isn't D's ABI stable enough to just return this from C++
// and call it a string in the extern(C++) interface? anyways..
struct DString
{
     size_t length;
     immutable(char)* ptr;
     string toString() { return ptr[0..length]; }
     alias toString this;
}

extern(C++) interface NativeWindow {
     DString getTitle() const;
}

and in C++, this:

class NativeWindow
{
public:
     struct DString {
         size_t length;
         const char* ptr;
     };

     virtual DString getTitle() const {
         DString ret;
         ret.length = GetWindowTextLength(_hwnd) + 1;
         ret.ptr = (const char*)gc_malloc(ret.length, 0xA, NULL);
         GetWindowText(_hwnd, (char*)ret.ptr, ret.length);
         return ret;
     }
};
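A minimal C++ sketch of the layout assumption both structs rely on: per the D ABI spec, a dynamic array such as `string` is a `size_t` length at offset 0 followed by a data pointer at offset `size_t.sizeof`. The `static_assert`s below make the C++ build fail if that assumption does not hold on the target; `makeDString` is a hypothetical helper, and the sketch does not involve the GC at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// C++ mirror of a D `string` slice: length first, then pointer,
// matching the dynamic-array layout described in the D ABI spec.
struct DString {
    std::size_t length;
    const char* ptr;
};

// Fail the build if the struct is laid out differently from the D side.
static_assert(offsetof(DString, length) == 0, "length must come first");
static_assert(offsetof(DString, ptr) == sizeof(std::size_t), "ptr must follow length");
static_assert(sizeof(DString) == 2 * sizeof(void*), "DString must be two words");

// Hypothetical helper: view an existing NUL-terminated buffer as a D slice.
// The length excludes the terminator, since D strings carry no trailing '\0'.
inline DString makeDString(const char* cstr) {
    assert(cstr != nullptr);
    return DString{std::strlen(cstr), cstr};
}
```

Whether D and C++ also agree on *how* such a struct is returned by value (registers vs. hidden pointer) is a separate, platform-specific ABI question that these checks do not cover.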

So while it's not generally safe to _store_ pointers to D's GC 
allocated memory exclusively in C++, I've read that D's GC scans 
the stack, and getTitle() is being called from D (and so, is on
that stack..right?). So is the string I'm returning safe from GC 
collection?

   Thanks
Aug 05
Jeremy DeHaan <dehaan.jeremiah gmail.com> writes:
On Saturday, 5 August 2017 at 20:17:23 UTC, bitwise wrote:
 I have a Windows native window class in C++, and I need a 
 function to return the window title.

 [...]
As long as you have a reachable reference to the GC memory SOMEWHERE, the GC won't reclaim it. It doesn't have to be on the stack as long as it is reachable through the stack.
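To make "reachable through the stack" concrete, here is a toy mark phase in C++. The `Obj` type and `markFrom` function are hypothetical, and druntime's real collector is conservative and considerably more involved; the sketch only shows the rule: whatever is transitively reachable from the roots (stack slots, registers, registered roots) survives, everything else is a candidate for collection.

```cpp
#include <cassert>
#include <vector>

// Toy heap object: a mark bit plus outgoing references.
struct Obj {
    bool marked = false;
    std::vector<Obj*> refs;
};

// Mark every object reachable from the given roots (e.g. stack slots).
// An explicit worklist is used so long chains cannot overflow the C++ stack.
void markFrom(const std::vector<Obj*>& roots) {
    std::vector<Obj*> work(roots.begin(), roots.end());
    while (!work.empty()) {
        Obj* o = work.back();
        work.pop_back();
        if (o == nullptr || o->marked) continue;
        o->marked = true;
        for (Obj* r : o->refs) work.push_back(r);
    }
}
```

Here only `a` would be on the "stack", yet `b` survives because it is reachable through `a`.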
Aug 05
bitwise <bitwise.pvt gmail.com> writes:
On Saturday, 5 August 2017 at 21:18:29 UTC, Jeremy DeHaan wrote:
 On Saturday, 5 August 2017 at 20:17:23 UTC, bitwise wrote:
 I have a Windows native window class in C++, and I need a 
 function to return the window title.

 [...]
 As long as you have a reachable reference to the GC memory
 SOMEWHERE, the GC won't reclaim it. It doesn't have to be on the
 stack as long as it is reachable through the stack.
I'm basically worried about this happening:

     virtual DString getTitle() const {
         DString ret;
         ret.length = GetWindowTextLength(_hwnd) + 1;
         ret.ptr = (const char*)gc_malloc(ret.length, 0xA, NULL);
         ----gc collection on another thread----
         GetWindowText(_hwnd, (char*)ret.ptr, ret.length); // BOOM
         return ret;
     }

So I guess you're saying I'm covered then? I guess there's no
reason I can think of for the GC to stop scanning at the
language boundary, let alone any way to actually do that
efficiently.

   Thanks
Aug 06
Mike Parker <aldacron gmail.com> writes:
On Sunday, 6 August 2017 at 16:23:01 UTC, bitwise wrote:

 So I guess you're saying I'm covered then? I guess there's no 
 reason I can think of for the GC to stop scanning at the 
 language boundary, let alone any way to actually do that 
 efficiently.
It's not something you can rely on. If the pointer is stored in memory allocated from the C heap, then the GC will never see it and can pull the rug out from under you. Best to make sure it's never collected. If you don't want to keep a reference to it on the D side, then call GC.addRoot on the pointer. That way, no matter where you hand it off, the GC will consider it as being live. When you're done with it, call GC.removeRoot.
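The addRoot/removeRoot advice can be wrapped so the root is always released, even on early returns. A minimal C++ sketch, assuming the C++ side calls druntime's `extern(C)` functions `gc_addRoot`/`gc_removeRoot` (as far as I know, these are what `core.memory`'s `GC.addRoot`/`GC.removeRoot` forward to, and they can be declared from C++ the same way the original post uses `gc_malloc`). The guard takes the two functions as parameters so it can be exercised without linking druntime; `GcRootGuard` and `RootFn` are hypothetical names.

```cpp
#include <cassert>

// Signature of a root-registration function such as druntime's
// extern(C) gc_addRoot / gc_removeRoot (assumed: void f(const void*)).
using RootFn = void (*)(const void*);

// RAII guard: registers `p` as a GC root on construction and
// unregisters it on destruction, so no exit path can leak the root.
class GcRootGuard {
public:
    GcRootGuard(const void* p, RootFn add, RootFn remove)
        : p_(p), remove_(remove) {
        assert(p != nullptr && add != nullptr && remove != nullptr);
        add(p_);
    }
    ~GcRootGuard() { remove_(p_); }

    // Copying would unregister the same root twice.
    GcRootGuard(const GcRootGuard&) = delete;
    GcRootGuard& operator=(const GcRootGuard&) = delete;

private:
    const void* p_;
    RootFn remove_;
};
```

Against druntime one would pass the declared `gc_addRoot` and `gc_removeRoot` as the two function arguments. A scoped guard only fits when the pointer's lifetime on the C++ side is scoped too; if ownership is handed off elsewhere, the matching remove call has to travel with it.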
Aug 06
bitwise <bitwise.pvt gmail.com> writes:
On Sunday, 6 August 2017 at 16:46:40 UTC, Mike Parker wrote:
 On Sunday, 6 August 2017 at 16:23:01 UTC, bitwise wrote:

 So I guess you're saying I'm covered then? I guess there's no 
 reason I can think of for the GC to stop scanning at the 
 language boundary, let alone any way to actually do that 
 efficiently.
 It's not something you can rely on. If the pointer is stored in
 memory allocated from the C heap, then the GC will never see it
 and can pull the rug out from under you. Best to make sure it's
 never collected. If you don't want to keep a reference to it on
 the D side, then call GC.addRoot on the pointer. That way, no
 matter where you hand it off, the GC will consider it as being
 live. When you're done with it, call GC.removeRoot.
I was referring specifically to storing gc_malloc'ed pointers
on the stack, meaning that I'm calling a C++ function on a D
call stack, and storing the pointer as a local var in the C++
function before returning it to D.

The more I think about it, the more I think it has to be ok to
do. Unless D stores [ESP] to some variable at each extern(*)
function call, then the GC would have no choice but
indifference as to what side of the language boundary it was
scanning on. If it did, I imagine it would say so here:

https://dlang.org/spec/cpp_interface.html#memory-allocation
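A toy illustration of this reasoning: a conservative collector scans a stack as a plain range of machine words and treats any word that points into a heap block (including into its interior) as a reference, so it never asks which language pushed the frame. `Block` and `scanRange` are hypothetical. As far as I know, druntime does the equivalent for every thread registered with the runtime when it stops the world; stacks of threads it does not know about (e.g. created on the C++ side without `thread_attachThis`) are not scanned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A toy heap block covering [start, start + size).
struct Block {
    std::uintptr_t start;
    std::size_t size;
    bool marked;
};

// Conservatively scan a range of words (e.g. a thread's stack between its
// current top and its base). Any word whose value falls inside a block keeps
// that block alive; the scanner has no notion of D frames vs. C++ frames.
void scanRange(const std::uintptr_t* begin, const std::uintptr_t* end,
               std::vector<Block>& heap) {
    for (const std::uintptr_t* w = begin; w != end; ++w) {
        for (Block& b : heap) {
            if (*w >= b.start && *w < b.start + b.size) b.marked = true;
        }
    }
}
```

In the simulated stack below, the word `0x1010` might sit in a C++ frame; it still keeps the block at `0x1000` alive.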
Aug 06
Mike Parker <aldacron gmail.com> writes:
On Sunday, 6 August 2017 at 17:16:05 UTC, bitwise wrote:

 I was referring specifically to storing gc_malloc'ed pointers 
 on the stack, meaning that I'm calling a C++ function on a D 
 call stack, and storing the pointer as a local var in the C++ 
 function before returning it to D.

 The more I think about it, the more I think it has to be ok to 
 do. Unless D stores [ESP] to some variable at each extern(*) 
 function call, then the GC would have no choice but 
 indifference as to what side of the language boundary it was 
 scanning on. If it did, I imagine it would say so here:

 https://dlang.org/spec/cpp_interface.html#memory-allocation
Yes, as long as you can guarantee it stays on the stack you should be good to go.
Aug 06
Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 05 Aug 2017 20:17:23 +0000,
bitwise <bitwise.pvt gmail.com> wrote:

      virtual DString getTitle() const {
          DString ret;
          ret.length = GetWindowTextLength(_hwnd) + 1;
          ret.ptr = (const char*)gc_malloc(ret.length, 0xA, NULL);
          GetWindowText(_hwnd, (char*)ret.ptr, ret.length);
          return ret;
      }
In due diligence, you are casting an ANSI string into a UTF-8
string which will result in broken Unicode for non-ASCII window
titles. In any case it is better to use the wide-character
versions of Windows-API functions nowadays. (Those ending in
'W' instead of 'A'). Starting with Windows 2000, the core was
upgraded to UTF-16[1], which means you don't have to implement
the lossy conversion to ANSI code pages and end up like this ...

                   [information loss]
  UTF-8 <-> Windows codepage <-> UTF-16
        |                    |
   in your code        inside Windows

... but instead directly pass and get Unicode strings like
this ...

  UTF-8 <-> UTF-16
        |
   in your code

string to zero terminated UTF-16:
http://dlang.org/phobos/std_utf.html#toUTF16z
zero terminated UTF-16 to string:
ptr.to!string() or just ptr[0..len] if known

Second I'd like to mention that you should have set

  ret.length = GetWindowText(_hwnd, (char*)ret.ptr, ret.length);

Currently your length is anything from 1 to N bytes longer
than the actual string[2], which is not obvious because any
debug printing or display of the string stops at the embedded
\0 terminator.

[1] https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows
[2] https://msdn.microsoft.com/de-de/library/windows/desktop/ms633521(v=vs.85).aspx

-- 
Marco
Aug 05
bitwise <bitwise.pvt gmail.com> writes:
On Sunday, 6 August 2017 at 05:31:51 UTC, Marco Leise wrote:
 On Sat, 05 Aug 2017 20:17:23 +0000,
 bitwise <bitwise.pvt gmail.com> wrote:

 [...]

 In due diligence, you are casting an ANSI string into a UTF-8
 string which will result in broken Unicode for non-ASCII window
 titles. In any case it is better to use the wide-character
 versions of Windows-API functions nowadays.
 [...]
Good point. (pun not originally intended ;) All serious projects I have done for Windows thus far have actually been in C# (default UTF-16), so I guess I've been spoiled.
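The "UTF-8 <-> UTF-16 in your code" step could look like this on the C++ side: fetch the title with the wide API (`GetWindowTextW`) and transcode to UTF-8 before handing it to D. On Windows one would normally call `WideCharToMultiByte(CP_UTF8, ...)` for this; the portable, hand-written `utf16ToUtf8` below (a hypothetical helper) just shows what that conversion does, including surrogate pairs. Replacing unpaired surrogates with U+FFFD is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
static void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 (what the W functions return) to UTF-8 (what a D
// `string` holds). Unpaired surrogates become U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine them.
            c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;  // lone surrogate
        }
        appendUtf8(out, c);
    }
    return out;
}
```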
 Second I'd like to mention that you should have set ret.length 
 = GetWindowText(_hwnd, (char*)ret.ptr, ret.length); Currently 
 your length is anything from 1 to N bytes longer than the 
 actual string[2], which is not obvious because any debug 
 printing or display of the string stops at the embedded \0 
 terminator.
 [...]
Totally right! I looked right at this info in the docs..not sure how I still got it wrong ;) Thanks
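A portable sketch of the length fix. `fakeGetWindowText` is a hypothetical stand-in that mimics the documented contract of `GetWindowText` (copy at most `n - 1` characters, NUL-terminate, return the number of characters copied, not counting the terminator), and a `std::vector` replaces the GC buffer only to keep the example runnable. The point is that the slice length comes from the copy's return value, not from the buffer size, which includes the terminator and, per the `GetWindowTextLength` documentation, can even exceed the real text length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct DString {
    std::size_t length;
    const char* ptr;
};

// Hypothetical stand-in for GetWindowText: copies at most n - 1 chars of
// `title` into buf, NUL-terminates, returns the count copied (no '\0').
int fakeGetWindowText(const char* title, char* buf, int n) {
    if (n <= 0) return 0;
    std::size_t count = std::min(std::strlen(title), static_cast<std::size_t>(n - 1));
    std::memcpy(buf, title, count);
    buf[count] = '\0';
    return static_cast<int>(count);
}

// Build a D-style slice over `buf`. `capacity` plays the role of
// GetWindowTextLength(...) + 1; the slice length is the copy result.
DString titleSlice(const char* title, std::vector<char>& buf, int capacity) {
    assert(capacity >= 0);
    buf.resize(static_cast<std::size_t>(capacity));
    DString ret;
    ret.ptr = buf.data();
    ret.length = static_cast<std::size_t>(fakeGetWindowText(title, buf.data(), capacity));
    return ret;
}
```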
Aug 06