
digitalmars.D.learn - A weird example of .toUTF16z concatenation side-effects in wcsncat

BoQsc <vaidas.boqsc gmail.com> writes:
Here I try to concatenate three character strings using 
`wcsncat()`.

`clang_string`         AAAAAAAAAA
`dlang_string`         BBBBBBBBBBB
`winpointer_to_string` CCCCCCCCCC


```
import std.stdio;

@system void main(){

	import std.utf                : toUTF16z, toUTF16;
	import core.stdc.wchar_       : wcsncat, wcslen, wprintf;
	import core.stdc.stdlib       : wchar_t;
	import core.sys.windows.winnt : LPCWSTR;

	wchar_t* clang_string         = cast(wchar_t *)"AAAAAAAAAA";
	string   dlang_string         = "BBBBBBBBBBB";
	LPCWSTR  winpointer_to_string = "CCCCCCCCCC";
	
	wcsncat(clang_string, dlang_string.toUTF16z, wcslen(dlang_string.toUTF16z));
	//   String output: AAAAAAAAAABBBBBBBBBBB
	
	wcsncat(clang_string, winpointer_to_string, wcslen(winpointer_to_string));
	//   String output: AAAAAAAAAABBBBBBBBBBBBBBBBBBBB
	// Expected string: AAAAAAAAAABBBBBBBBBBBCCCCCCCCCC

	wprintf(clang_string);
	//   String output: AAAAAAAAAABBBBBBBBBBBBBBBBBBBB
	// Expected string: AAAAAAAAAABBBBBBBBBBBCCCCCCCCCC

}


```

**Problem:**
Any string concatenated *after* the `wcsncat()` call that appends 
`dlang_string.toUTF16z` is not printed; it appears to get 
overwritten.

**The Expected output:**
I was expecting the `wprintf()` **result** to be 
`AAAAAAAAAABBBBBBBBBBBCCCCCCCCCC`
The `wprintf()` **result** I actually received is this:  
`AAAAAAAAAABBBBBBBBBBBBBBBBBBBB`
Apr 07 2022
Tejas <notrealemail gmail.com> writes:
On Thursday, 7 April 2022 at 10:50:35 UTC, BoQsc wrote:
 Here I try to concatenate three character strings using 
 `wcsncat()`.

 [...]
Maybe try using `wstring` instead of string? Also use the `w` postfix:

```d
wstring dlang_string = "BBBBBBBBBBB"w;
```

I can't test because I'm not on my PC and I don't use Windows.
Apr 07 2022
BoQsc <vaidas.boqsc gmail.com> writes:
On Thursday, 7 April 2022 at 11:03:39 UTC, Tejas wrote:
 On Thursday, 7 April 2022 at 10:50:35 UTC, BoQsc wrote:
 Here I try to concatenate three character strings using 
 `wcsncat()`.

 [...]
Maybe try using `wstring` instead of string? Also use the `w` postfix:

```d
wstring dlang_string = "BBBBBBBBBBB"w;
```

I can't test because I'm not on my PC and I don't use Windows.
Exactly the same results: `AAAAAAAAAABBBBBBBBBBBBBBBBBBBB`

```
import std.stdio;

@system void main(){

	import std.utf                : toUTF16z, toUTF16;
	import core.stdc.wchar_       : wcsncat, wcslen, wprintf;
	import core.stdc.stdlib       : wchar_t;
	import core.sys.windows.winnt : LPCWSTR;

	wchar_t* clang_string         = cast(wchar_t *)"AAAAAAAAAA";
	wstring  dlang_string         = "BBBBBBBBBBB"w; //<---- NEW, same results
	LPCWSTR  winpointer_to_string = "CCCCCCCCCC";

	wcsncat(clang_string, dlang_string.toUTF16z, wcslen(dlang_string.toUTF16z));
	//   String output: AAAAAAAAAABBBBBBBBBBB

	wcsncat(clang_string, winpointer_to_string, wcslen(winpointer_to_string));
	//   String output: AAAAAAAAAABBBBBBBBBBBBBBBBBBBB
	// Expected string: AAAAAAAAAABBBBBBBBBBBCCCCCCCCCC

	wprintf(clang_string);
	//   String output: AAAAAAAAAABBBBBBBBBBBBBBBBBBBB
	// Expected string: AAAAAAAAAABBBBBBBBBBBCCCCCCCCCC

}
```
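
A possible reason the switch to `wstring` makes no difference: `toUTF16z` accepts either string type and hands back a null-terminated UTF-16 pointer in both cases, so the source side of the call is the same either way. A small sketch to check this, where `units` is a hypothetical helper written only for this comparison (it is not part of Phobos):

```d
import std.utf : toUTF16z;

// Hypothetical helper (not a library function): view the null-terminated
// UTF-16 data behind `p` as a D slice, excluding the terminator.
const(wchar)[] units(const(wchar)* p)
{
    size_t n = 0;
    while (p[n] != 0) ++n;
    return p[0 .. n];
}

void main()
{
    string  narrow = "BBBBBBBBBBB";
    wstring wide   = "BBBBBBBBBBB"w;

    // Both conversions produce the same sequence of UTF-16 code units.
    assert(units(narrow.toUTF16z) == units(wide.toUTF16z));
    assert(units(narrow.toUTF16z).length == 11);
}
```

If the assertions hold, the source argument is not where the output goes wrong, which leaves the destination pointer as the thing to look at.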
Apr 07 2022
Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Thursday, 7 April 2022 at 10:50:35 UTC, BoQsc wrote:

 	wchar_t* clang_string         = cast(wchar_t *)"AAAAAAAAAA";
You're witnessing undefined behavior. "AAAAAAAAAA" is a string literal and is stored in the data segment. Mere cast to wchar_t* does not make writing through that pointer legal. Moreover, even if it was legal to write through it, that alone wouldn't be sufficient. From documentation of `wcsncat`:
 The behavior is undefined if the destination array is not large 
 enough for the contents of both str and dest and the 
 terminating null wide character.
`wcsncat` does not allocate memory, it expects you to provide a sufficiently large mutable buffer. For example, like this:

```d
// ...
auto cls = new wchar_t[256];
cls[] = 0;
cls[0..10] = 'A';
wchar_t* clang_string = cls.ptr;
// ...
```
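
The capacity requirement quoted above can also be made explicit in code. Note that the third argument of `wcsncat` limits how many characters are taken from the source, not how much room the destination has, so passing `wcslen(src)` bounds nothing. Below is a minimal sketch of a bounds-checked append written in plain D on slices instead of calling `wcsncat`; `appendBounded` is a hypothetical name, not a Phobos or druntime function, and the buffer size of 32 is chosen only for illustration:

```d
// Hypothetical helper (not a library function): append `src` to the
// null-terminated text already stored in `buf`. Returns false instead of
// writing past the end of the buffer.
bool appendBounded(wchar[] buf, const(wchar)[] src)
{
    // Find the current terminator inside the buffer.
    size_t len = 0;
    while (len < buf.length && buf[len] != 0) ++len;
    if (len == buf.length) return false; // buffer is not null-terminated

    // There must be room for `src` plus the terminating null.
    if (src.length >= buf.length - len) return false;

    buf[len .. len + src.length] = src[];
    buf[len + src.length] = 0;
    return true;
}

void main()
{
    auto buf = new wchar[32]; // room for 31 characters plus the terminator
    buf[] = 0;
    buf[0 .. 10] = 'A';

    assert(appendBounded(buf, "BBBBBBBBBBB"w));
    assert(appendBounded(buf, "CCCCCCCCCC"w));
    assert(buf[0 .. 31] == "AAAAAAAAAABBBBBBBBBBBCCCCCCCCCC"w);

    // The buffer is now full, so a further append is refused.
    assert(!appendBounded(buf, "D"w));
}
```

After a successful append, `buf.ptr` points at null-terminated text and could be handed to a C function that only reads it.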
Apr 07 2022
BoQsc <vaidas.boqsc gmail.com> writes:
On Thursday, 7 April 2022 at 12:51:26 UTC, Stanislav Blinov wrote:
 On Thursday, 7 April 2022 at 10:50:35 UTC, BoQsc wrote:

 	wchar_t* clang_string         = cast(wchar_t *)"AAAAAAAAAA";
You're witnessing undefined behavior. "AAAAAAAAAA" is a string literal and is stored in the data segment. Mere cast to wchar_t* does not make writing through that pointer legal. Moreover, even if it was legal to write through it, that alone wouldn't be sufficient. From documentation of `wcsncat`:
 The behavior is undefined if the destination array is not 
 large enough for the contents of both str and dest and the 
 terminating null wide character.
`wcsncat` does not allocate memory, it expects you to provide a sufficiently large mutable buffer. For example, like this:

```d
// ...
auto cls = new wchar_t[256];
cls[] = 0;
cls[0..10] = 'A';
wchar_t* clang_string = cls.ptr;
// ...
```
That is correct, the results are satisfying. I believe this thread is resolved.

```
import std.stdio;

@system void main(){

	import std.utf                : toUTF16z, toUTF16;
	import core.stdc.wchar_       : wcsncat, wcslen, wprintf;
	import core.stdc.stdlib       : wchar_t;
	import core.sys.windows.winnt : LPCWSTR;

	auto cls = new wchar_t[256];
	cls[] = 0;
	cls[0..10] = 'A';
	wchar_t* clang_string         = cls.ptr;
	//wchar_t* clang_string       = cast(wchar_t *)"AAAAAAAAAA";
	wstring  dlang_string         = "BBBBBBBBBB"w; //<---- NEW, same results
	LPCWSTR  winpointer_to_string = "CCCCCCCCCC";

	wcsncat(clang_string, dlang_string.toUTF16z, wcslen(dlang_string.toUTF16z));
	//   String output: AAAAAAAAAABBBBBBBBBB

	wcsncat(clang_string, winpointer_to_string, wcslen(winpointer_to_string));
	//   String output: AAAAAAAAAABBBBBBBBBBCCCCCCCCCC
	// Expected string: AAAAAAAAAABBBBBBBBBBCCCCCCCCCC

	wprintf(clang_string);
	//   String output: AAAAAAAAAABBBBBBBBBBCCCCCCCCCC
	// Expected string: AAAAAAAAAABBBBBBBBBBCCCCCCCCCC

}
```
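
For comparison, here is a sketch of an alternative that avoids the manual buffer altogether: do the concatenation with D's `~` operator and convert to a null-terminated pointer only at the point where a C API needs one. `joinWide` is a hypothetical helper name used only for this example:

```d
import std.conv : to;
import std.utf  : toUTF16z;

// Hypothetical helper (not a library function): concatenate on the D side,
// where the runtime allocates the result and tracks its length.
wstring joinWide(wstring a, string b, wstring c)
{
    return a ~ b.to!wstring ~ c;
}

void main()
{
    wstring joined = joinWide("AAAAAAAAAA"w, "BBBBBBBBBB", "CCCCCCCCCC"w);
    assert(joined == "AAAAAAAAAABBBBBBBBBBCCCCCCCCCC"w);

    // Convert once, at the C boundary; the result is null-terminated.
    const(wchar)* z = joined.toUTF16z;
    assert(z[joined.length] == 0);
}
```

On Windows, `z` could then be passed to `wprintf` or to a function taking `LPCWSTR`, since nothing writes through the pointer.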
Apr 07 2022