digitalmars.D.learn - Convert wchar* to wstring?

Thalamus <norobots foo.com> writes:
I'm sorry for this total newbie question, but for some reason 
this is eluding me. I must be overlooking something obvious, but 
I haven't been able to figure this out and haven't found anything 
helpful.


and one of the parameters is a string. This works just fine for 
ANSI, but I'm having trouble with the Unicode equivalent.

For ANSI, the message parameter is char*, and string info = 
to!string(message) produces the correct string.

For Unicode, I assumed this would be wchar_t*, as it is in C++. 
(In C++ you can just pass the wchar_t* value to the wstring 
constructor.) So I tried wchar_t*, wchar* and dchar* as well. 
When the message parameter is wchar*, wstring info = 
to!wstring(message) populates the string with the _address_ of 
the wchar*. So when message was in the debugger as 
0x00000000035370e8 L"Writing Exhaustive unit tests is 
exhausting.", the wstring info variable ended up as {length=7 
ptr=0x000000001c174a20 L"35370E8" }. The dstring*/wchar_t* 
version had equivalent results.

Again, I'm sure I'm missing something obvious, but I poked at 
this problem with various types, casts, Phobos library string 
conversions, and I'm just stumped! :)

thanks,
Thalamus
Apr 04 2016
tcak <1ltkrs+3wyh1ow7kzn1k sharklasers.com> writes:
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
 I'm sorry for this total newbie question, but for some reason 
 this is eluding me. I must be overlooking something obvious, 
 but I haven't been able to figure this out and haven't found 
 anything helpful.


 (C)), and one of the parameters is a string. This works just 
 fine for ANSI, but I'm having trouble with the Unicode 
 equivalent.

 For ANSI, the message parameter is char*, and string info = 
 to!string(message) produces the correct string.

 For Unicode, I assumed this would be wchar_t*, as it is in C++. 
 (In C++ you can just pass the wchar_t* value to the wstring 
 constructor.) So I tried wchar_t*, wchar* and dchar* as well. 
 When the message parameter is wchar*, wstring info = 
 to!wstring(message) populates the string with the _address_ of 
 the wchar*. So when message was in the debugger as 
 0x00000000035370e8 L"Writing Exhaustive unit tests is 
 exhausting.", the wstring info variable ended up as {length=7 
 ptr=0x000000001c174a20 L"35370E8" }. The dstring*/wchar_t* 
 version had equivalent results.

 Again, I'm sure I'm missing something obvious, but I poked at 
 this problem with various types, casts, Phobos library string 
 conversions, and I'm just stumped! :)

 thanks,
 Thalamus
I can't give you a code example, but can you try this:

1. Using a loop, count the wchar elements until you find 0 (zero). (This only works if the string is null-terminated; otherwise you need to know the length already.)
2. Then allocate an array of that length: auto mystring = new wchar[calculated_length];
3. Copy the content from the wchar* into your array: mystring[0 .. calculated_length] = wcharptr[0 .. calculated_length];
4. If you want, you can cast mystring to convert it to a wstring.
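The steps above can be sketched roughly as follows. This is only an illustration: `copyFromPtr` is a made-up helper name, not a Phobos function, and it assumes the pointer really does refer to zero-terminated UTF-16 data.

```d
// Sketch only: copyFromPtr is a hypothetical helper, not part of Phobos.
wstring copyFromPtr(const(wchar)* p)
{
    if (p is null) return null;

    // Step 1: count code units (not bytes) up to the terminating zero.
    size_t len = 0;
    while (p[len] != 0) ++len;

    // Steps 2 and 3: allocate a mutable array and copy the code units into it.
    auto buf = new wchar[len];
    buf[] = p[0 .. len];

    // Step 4: nothing else refers to buf, so casting it to immutable is safe here.
    return cast(wstring) buf;
}
```

D string literals are zero-terminated, so something like `copyFromPtr("hello"w.ptr)` is an easy way to try it out.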
Apr 04 2016
tsbockman <thomas.bockman gmail.com> writes:
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
 When the message parameter is wchar*, wstring info = 
 to!wstring(message) populates the string with the _address_ of 
 the wchar*. So when message was in the debugger as 
 0x00000000035370e8 L"Writing Exhaustive unit tests is 
 exhausting.", the wstring info variable ended up as {length=7 
 ptr=0x000000001c174a20 L"35370E8" }.
`wchar*` is a raw pointer. D APIs generally expect a dynamic array, also known as a "slice", which packs the pointer together with an explicit `length` field. You can easily get a slice from a pointer using D's convenient slicing syntax:

https://dlang.org/spec/arrays.html#slicing

    wchar* cw;
    size_t cw_len; // be sure to use the right length, or you'll suffer buffer overruns.

    wchar[] dw = cw[0 .. cw_len];

Slicing is extremely fast, because it does not allocate any new heap memory: `dw` is still pointing to the same chunk of memory as `cw`. D APIs that work with text will often accept a mutable character array like `dw` without issue.

However, `wstring` in D is an alias for `immutable(wchar)[]`. In the example above, `dw` cannot be immutable because it is reusing the same mutable memory chunk as `cw`. If the D code you want to interface with requires a real `wstring`, you'll need to copy the text into a new immutable memory chunk:

    wstring wstr = dw.idup; // idup is short for "immutable duplicate"

`idup` will allocate heap memory, so if you care about performance and memory usage, don't use it unless you actually need it.

You can also combine both steps into a one-liner:

    wstring wstr = cw[0 .. cw_len].idup;
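To make the difference between a slice and an `idup` copy concrete, here is a small sketch. The function name `sliceVersusCopy` and the sample text are invented for illustration; the local buffer just stands in for memory that C code would hand over.

```d
// Sketch only: sliceVersusCopy is a hypothetical name used for illustration.
void sliceVersusCopy()
{
    // Stand-in for a buffer owned by C code.
    wchar[] buffer = "hello"w.dup;
    wchar* cw = buffer.ptr;
    size_t cw_len = buffer.length;

    wchar[] dw = cw[0 .. cw_len];   // no allocation: a view of the same memory
    wstring wstr = dw.idup;         // allocation: an independent immutable copy

    cw[0] = 'j';                    // mutate through the original pointer

    assert(dw == "jello"w);         // the slice sees the change
    assert(wstr == "hello"w);       // the copy does not
}
```

The same reasoning applies when the C side frees or reuses the buffer: the slice becomes invalid, while the copy stays usable.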
Apr 05 2016
Mike Parker <aldacron gmail.com> writes:
On Tuesday, 5 April 2016 at 07:10:50 UTC, tsbockman wrote:

 You can also combine both steps into a one-liner:

     wstring wstr = cw[0 .. cw_len].idup;
This should do the trick, too:

    import std.conv : to;
    auto wstr = to!wstring(cw);
Apr 05 2016
Basile B. <b2.temp gmx.com> writes:
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
 I'm sorry for this total newbie question, but for some reason 
 this is eluding me. [...]
You've been given the right answer by the other participants, but I'd like to share this simple helper range from my user lib:

    import std.traits : isPointer, isSomeChar, PointerTarget;

    auto nullTerminated(C)(C c)
    if (isPointer!C && isSomeChar!(PointerTarget!(C)))
    {
        struct NullTerminated(C)
        {
            private C _front;
            ///
            this(C c) { _front = c; }
            ///
            @property bool empty() { return *_front == 0; }
            ///
            auto front() { return *_front; }
            ///
            void popFront() { ++_front; }
            ///
            C save() { return _front; }
        }
        return NullTerminated!C(c);
    }

The idea is to skip the conversion and process the pointer directly in any Phobos function that accepts a range.
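As a rough usage sketch, a range like this can be handed to `std.range.primitives.walkLength` to find the length without writing the loop by hand. The block below uses its own compact, module-level variant of the range so that it is self-contained; `fromPtr` is a hypothetical helper name, and the pointer is assumed to be zero-terminated.

```d
import std.range.primitives : walkLength;
import std.traits : isPointer, isSomeChar, PointerTarget;

// Compact variant of the helper range, rewritten here for illustration.
struct NullTerminated(C)
if (isPointer!C && isSomeChar!(PointerTarget!C))
{
    private C _ptr;
    @property bool empty() { return *_ptr == 0; }
    @property auto front() { return *_ptr; }
    void popFront() { ++_ptr; }
    @property auto save() { return this; }
}

auto nullTerminated(C)(C c) { return NullTerminated!C(c); }

// Hypothetical helper: measure with the range, then slice and copy.
wstring fromPtr(const(wchar)* p)
{
    if (p is null) return null;
    return p[0 .. p.nullTerminated.walkLength].idup;
}
```

Here `walkLength` simply pops elements until `empty` is true, so the cost is the same linear scan as a hand-written loop.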
Apr 05 2016
Rene Zwanenburg <renezwanenburg gmail.com> writes:
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:
 I'm sorry for this total newbie question, but for some reason 
 this is eluding me. I must be overlooking something obvious, 
 but I haven't been able to figure this out and haven't found 
 anything helpful.
In case you haven't done so already, you'll also have to use CharSet = CharSet.Unicode in the DllImport attribute.
Apr 05 2016
Kagamin <spam here.lot> writes:
On Tuesday, 5 April 2016 at 01:21:55 UTC, Thalamus wrote:

 (C)), and one of the parameters is a string. This works just 
 fine for ANSI, but I'm having trouble with the Unicode 
 equivalent.

 When the message parameter is wchar*, wstring info = 
 to!wstring(message) populates the string with the _address_ of 
 the wchar*. So when message was in the debugger as 
 0x00000000035370e8 L"Writing Exhaustive unit tests is 
 exhausting.", the wstring info variable ended up as {length=7 
 ptr=0x000000001c174a20 L"35370E8" }. The dstring*/wchar_t* 
 version had equivalent results.
want to receive them as immutable (StringBuilder is for mutable string buffers), it's also easier to just pass the string length:

    [DllImport(...)]
    static extern void dfunc(string s, int len);

    dfunc(s, s.Length);

D:

    extern(C) void dfunc(immutable(wchar)* s, int len)
    {
        wstring ws = s[0..len];
    }

Since the string is temporary, you'll have to idup it if you want to retain it after the call finishes.
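A minimal sketch of the "idup it if you want to retain it" part on the D side might look like this. The `retained` array is a hypothetical place to keep messages, and the sketch assumes the D runtime has already been initialized in the process, since `idup` allocates from the GC.

```d
// Hypothetical storage for messages that must outlive the call.
wstring[] retained;

extern(C) void dfunc(immutable(wchar)* s, int len)
{
    if (s is null || len <= 0) return;

    // A view of the caller's temporary buffer; only valid during this call.
    const(wchar)[] view = s[0 .. len];

    // Copy into GC-owned memory before storing it anywhere.
    retained ~= view.idup;
}
```

Storing `view` itself instead of the copy would leave a dangling slice once the caller releases the marshalled string.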
Apr 05 2016
Thalamus <norobots foo.com> writes:
Thanks everyone! You've all been very helpful.
Apr 05 2016
Thalamus <norobots foo.com> writes:
On Tuesday, 5 April 2016 at 11:26:44 UTC, Thalamus wrote:
 Thanks everyone! You've all been very helpful.
For anyone who has the same question and happens on this thread, I wanted to post what I finally came up with. I combined the information everyone in this thread gave me with what I saw in the Phobos source for the to!string() implementation, closely following the latter.

The important to!string() code is in the toImpl implementation in conv.d at line 880. The existing code uses strlen, but that's an ANSI function. Fortunately, D has wcslen available, too.

    import core.stdc.stddef; // For wchar_t. This is defined differently for Windows vs POSIX.
    import core.stdc.wchar_; // For wcslen.

    wstring toWstring(wchar_t* value)
    {
        return value ? cast(wstring) value[0..wcslen(value)].dup : null;
    }

The Phobos code notes that this operation is unsafe, because there's no guarantee the string is null-terminated as it should be. That's definitely true. The only outcome you can be really sure is accurate is an access violation. :)

thanks!
Thalamus
Apr 05 2016
ag0aep6g <anonymous example.com> writes:
On 05.04.2016 20:44, Thalamus wrote:
 import core.stdc.stddef; // For wchar_t. This is defined differently for Windows vs POSIX.
 import core.stdc.wchar_; // For wcslen.
Aside: D has syntax for "// For wchar_t.": `import core.stdc.stddef: wchar_t;`.
 wstring toWstring(wchar_t* value)
 {
      return value ? cast(wstring) value[0..wcslen(value)].dup : null;
 }
wchar_t is not wchar. wstring is not (portably) compatible with a wchar_t array.

If you actually have a wchar_t* and you want a wstring as opposed to a wchar_t[], then you will potentially have to do some converting.

If you have a wchar*, then don't use wcslen, as that's defined in terms of wchar_t. There may be some function for finding the first null wchar from a wchar*, but I don't know it, and writing out a loop isn't exactly hard:

----
wstring toWstring(const(wchar)* value)
{
    if (value is null) return null;
    auto cursor = value;
    while (*cursor != 0) ++cursor;
    return value[0 .. cursor - value].dup;
}
----
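For the case where the pointer really is a wchar_t*, the "some converting" could look something like the sketch below. `fromWcharT` is a hypothetical name; the sketch relies on druntime defining wchar_t as wchar on Windows and as dchar on POSIX, and on `std.conv.to` transcoding between D string types.

```d
import core.stdc.stddef : wchar_t;
import core.stdc.wchar_ : wcslen;
import std.conv : to;

// Sketch only: fromWcharT is a hypothetical helper, not part of Phobos.
wstring fromWcharT(const(wchar_t)* value)
{
    if (value is null) return null;

    // wcslen is the right tool here because the data really is wchar_t.
    const(wchar_t)[] units = value[0 .. wcslen(value)];

    static if (is(wchar_t == wchar))
        return units.idup;          // already UTF-16: just copy
    else
        return units.to!wstring;    // wchar_t is dchar: transcode UTF-32 to UTF-16
}
```

The `static if` is resolved at compile time, so each platform only compiles the branch that applies to it.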
Apr 05 2016
Thalamus <norobots foo.com> writes:
On Tuesday, 5 April 2016 at 19:19:10 UTC, ag0aep6g wrote:
 On 05.04.2016 20:44, Thalamus wrote:
  [...]
 Aside: D has syntax for "// For wchar_t.": `import core.stdc.stddef: wchar_t;`.
  [...]
 wchar_t is not wchar. wstring is not (portably) compatible with a wchar_t array.

 If you actually have a wchar_t* and you want a wstring as opposed to a wchar_t[], then you will potentially have to do some converting.

 If you have a wchar*, then don't use wcslen, as that's defined in terms of wchar_t. There may be some function for finding the first null wchar from a wchar*, but I don't know it, and writing out a loop isn't exactly hard:

 ----
 wstring toWstring(const(wchar)* value)
 {
     if (value is null) return null;
     auto cursor = value;
     while (*cursor != 0) ++cursor;
     return value[0 .. cursor - value].dup;
 }
 ----
Thank you for the feedback. You are correct.
Apr 05 2016