
digitalmars.D.announce - From the D Blog -- Interfacing D with C: Strings Part One

reply Mike Parker <aldacron gmail.com> writes:
The latest post in the D and C series dives into the weeds of D 
and C strings: how they're implemented, when you need to 
NUL-terminate your D strings and when you don't, and how the 
storage of literals in memory allows you to avoid NUL termination 
in one case you might not have considered and another case that 
you shouldn't rely on but can in practice with the current 
compilers.
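
As a rough illustration of the literal versus run-time distinction described above, here is a minimal sketch. It assumes only `puts` and `strlen` from the C standard library bindings and `toStringz` from `std.string`; the helper name `cLength` is made up for this example and does not come from the blog post.

```d
import core.stdc.stdio : puts;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the length C sees after NUL-terminating `s`.
size_t cLength(string s)
{
    return strlen(s.toStringz);
}

void main()
{
    // A string literal is stored with a trailing NUL and converts
    // implicitly to const(char)*, so it can be handed to C directly.
    puts("a literal, passed as-is");

    // A string assembled at run time carries no such guarantee, so
    // obtain a NUL-terminated pointer before calling into C.
    string part = "run";
    string built = part ~ "time";
    puts(built.toStringz);

    assert(cLength(built) == built.length);
}
```

The terminator is not part of the D string's `length`, which is why the C-side length and the D-side length agree.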

There are at least two more posts worth of information to go into 
on this topic, but everything in this post is enough to cover 
many use cases of D to C string interop.

The blog:
https://dlang.org/blog/2021/05/24/interfacing-d-with-c-strings-part-one/

Reddit:
https://www.reddit.com/r/programming/comments/njyf76/interfacing_d_with_c_strings_part_one/
May 24
prev sibling next sibling parent reply zjh <fqbqrr 163.com> writes:
I have long suspected that something is wrong with the JS of the 
`D blog site`.
I can't open it in Chrome.
May 24
parent reply Mike Parker <aldacron gmail.com> writes:
On Monday, 24 May 2021 at 15:03:20 UTC, zjh wrote:
 I have long suspected that something is wrong with the JS of the 
 `D blog site`.
 I can't open it in Chrome.
I'm looking at it in Chrome right now. Are you saying it doesn't open at all for you or there are rendering issues, or...?
May 24
parent reply zjh <fqbqrr 163.com> writes:
It keeps loading for a long time, but only part of the page ever 
loads and the spinner keeps turning.
It's as if there is a BIG `js` file.
May 24
parent reply zjh <fqbqrr 163.com> writes:
But IE can open it.
It's strange.
May 24
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
Use the dev tools. The network and performance tabs could give very 
useful information for debugging this.
May 24
parent reply zjh <fqbqrr 163.com> writes:
In Chrome I cannot open the dev tools.
Pressing `F12` does nothing.
`Alt+U` does nothing.
The small circle keeps spinning. You can't do anything.
May 24
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 25/05/2021 3:39 AM, zjh wrote:
 In Chrome I cannot open the dev tools.
 Pressing `F12` does nothing.
 `Alt+U` does nothing.
 The small circle keeps spinning. You can't do anything.
There is something seriously wrong with your install then. It isn't related to the website itself.
May 24
parent reply zjh <fqbqrr 163.com> writes:
I switched to IE.
Other websites are no problem.
I don't know why.
May 24
parent reply Vinod K Chandran <kcvinu82 gmail.com> writes:
On Tuesday, 25 May 2021 at 00:41:50 UTC, zjh wrote:
 I switched to IE.
 Other websites are no problem.
 I don't know why.
I ran into this problem too, in Chrome on my laptop. I started reading the article in the middle of my work, in Chrome on my PC. After reading half of the page I bookmarked it, and a few hours later, around midnight, I tried to open the bookmarked page in Chrome on my laptop. The page opened only partially and none of the article text was visible. Only the links in the sidebar were visible, and they were not clickable. The progress indicator kept spinning, so I gave up and did some other tasks. The next morning I opened the page in Chrome on my PC, saved it with the "Save As" option, and sent the files to my laptop via Dropbox. Then I tried to open the saved page in Chrome on the laptop, but it did the same thing: no text was visible. Sometimes it showed an alert saying the page was taking too long to load and asking whether to wait or kill it. Then I tested the saved page in Internet Explorer and it opened without any problem.
May 25
parent reply zjh <fqbqrr 163.com> writes:
On Tuesday, 25 May 2021 at 21:14:24 UTC, Vinod K Chandran wrote:
 On Tuesday, 25 May 2021 at 00:41:50 UTC, zjh wrote:
 I switched to IE.
 Other websites are no problem.
 I don't know why.
I too faced this problem in my laptop's chrome.
The `js` of this website must have problems. I have been facing this problem for a long time. Other websites have no problem.
May 25
parent reply dangbinghoo <dangbinghoo gmail.com> writes:
On Tuesday, 25 May 2021 at 23:57:24 UTC, zjh wrote:
 On Tuesday, 25 May 2021 at 21:14:24 UTC, Vinod K Chandran wrote:
 On Tuesday, 25 May 2021 at 00:41:50 UTC, zjh wrote:
 I switched to IE.
 Other websites are no problem.
 I don't know why.
I too faced this problem in my laptop's chrome.
The `js` of this website must have problems. I have been facing this problem for a long time. Other websites have no problem.
The website is OK. It's because you and I are in mainland China, behind that wall. You just need a VPN proxy! Or, as the Chinese saying goes: the website is fine, you need a "ladder"!
May 25
parent zjh <fqbqrr 163.com> writes:
Oh, so that's why. Thanks.
May 25
prev sibling next sibling parent reply sighoya <sighoya gmail.com> writes:
On Monday, 24 May 2021 at 14:02:14 UTC, Mike Parker wrote:
 The latest post in the D and C series dives into the weeds of D 
 and C strings:
Thanks, I wasn't even aware of this. However, I wish the behavior were the same for strings held in variables and for string literals. I think this would be a good use case for multiple `alias this`, where we would map D types to C types and vice versa. For strings, going D->C it would just add the \0 at the end, and going C->D it would strip it from the string.
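
For comparison, the manual form of that D->C and C->D mapping can be sketched with the existing `std.string` functions `toStringz` and `fromStringz`. This is not the `alias this` mechanism proposed above, only an illustration of adding and stripping the terminator; the wrapper names `toC` and `toD` are hypothetical.

```d
import core.stdc.string : strlen;
import std.string : fromStringz, toStringz;

// D -> C: produce a NUL-terminated pointer for the C side.
const(char)* toC(string s)
{
    return s.toStringz;
}

// C -> D: slice up to, but not including, the terminating NUL.
const(char)[] toD(const(char)* z)
{
    return z.fromStringz;
}

void main()
{
    string d = "hello";
    const(char)* c = toC(d);
    assert(strlen(c) == d.length);   // the terminator is not counted

    const(char)[] back = toD(c);
    assert(back == d);
    assert(back.length == 5);
}
```

Note that `fromStringz` returns a slice of the C memory rather than a copy, so the result is only valid while that memory is.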
 There are at least two more posts worth of information to go 
 into on this topic, but everything in this post is enough to 
 cover many use cases of D to C string interop.
Which posts did you think of?
May 24
parent Mike Parker <aldacron gmail.com> writes:
On Monday, 24 May 2021 at 15:10:46 UTC, sighoya wrote:

 There are at least two more posts worth of information to go 
 into on this topic, but everything in this post is enough to 
 cover many use cases of D to C string interop.
Which posts did you think of?
I describe them in the Conclusion:
 In Part Two, we’ll look at how mutability, immutability, and 
 constness come into the picture, how to avoid a potential 
 problem spot that can arise when passing GC-allocated D strings 
 to C, and how to get D strings from C strings. We’ll save 
 encoding for Part Three.
May 24
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/24/21 10:02 AM, Mike Parker wrote:
 The latest post in the D and C series dives into the weeds of D and C 
 strings: how they're implemented, when you need to NUL-terminate your D 
 strings and when you don't, and how the storage of literals in memory 
 allows you to avoid NUL termination in one case you might not have 
 considered and another case that you shouldn't rely on but can in 
 practice with the current compilers.
 
 There are at least two more posts worth of information to go into on 
 this topic, but everything in this post is enough to cover many use 
 cases of D to C string interop.
 
 The blog:
 https://dlang.org/blog/2021/05/24/interfacing-d-with-c-strings-part-one/
 
 Reddit:
 https://www.reddit.com/r/programming/comments/njyf76/interfacing_d_with_c_strings_part_one/
 
Nice article!

Note that there is a huge pitfall awaiting you if you use `toStringz`: garbage collection. You may want to amend the article to identify this pitfall.

And I'm not talking about requiring `@nogc`, I'm talking about the GC collecting the data while C is still using it.

In your example:

```d
puts(s1.toStringz());
```

This leaves a GC-collectible allocation in C land. For `puts`, it's fine, as the data is not used past the call, but in something else that might keep it somewhere not accessible to the GC, you'll want to assign that to a variable that lasts as long as the resource is used.

-Steve
May 24
next sibling parent John Colvin <john.loughran.colvin gmail.com> writes:
On Monday, 24 May 2021 at 16:16:53 UTC, Steven Schveighoffer 
wrote:
 On 5/24/21 10:02 AM, Mike Parker wrote:
 The latest post in the D and C series dives into the weeds of 
 D and C strings: how they're implemented, when you need to 
 NUL-terminate your D strings and when you don't, and how the 
 storage of literals in memory allows you to avoid NUL 
 termination in one case you might not have considered and 
 another case that you shouldn't rely on but can in practice 
 with the current compilers.
 
 There are at least two more posts worth of information to go 
 into on this topic, but everything in this post is enough to 
 cover many use cases of D to C string interop.
 
 The blog:
 https://dlang.org/blog/2021/05/24/interfacing-d-with-c-strings-part-one/
 
 Reddit:
 https://www.reddit.com/r/programming/comments/njyf76/interfacing_d_with_c_strings_part_one/
 
 Nice article!

 Note that there is a huge pitfall awaiting you if you use `toStringz`: garbage collection. You may want to amend the article to identify this pitfall.

 And I'm not talking about requiring `@nogc`, I'm talking about the GC collecting the data while C is still using it.

 In your example:

 ```d
 puts(s1.toStringz());
 ```

 This leaves a GC-collectible allocation in C land. For `puts`, it's fine, as the data is not used past the call, but in something else that might keep it somewhere not accessible to the GC, you'll want to assign that to a variable that lasts as long as the resource is used.

 -Steve
It’s worse than that, no? If the only reference to GC data isn’t on the stack of a tracked thread, in GC allocated memory or in a tracked root then it can be freed. Even in D:

    void foo(int* a)
    {
        int** b = cast(int**) malloc((int*).sizeof);
        *b = a;
        a = null;
        GC.collect();
        **b = 4; // whoops!!
    }

    foo(new int);

Right? Obviously that collection could be from calling another function (e.g. a callback from C to D code) or from another thread. Or am I missing something?
May 24
prev sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Monday, 24 May 2021 at 16:16:53 UTC, Steven Schveighoffer 
wrote:

 Nice article!
Thanks!
 Note that there is a huge pitfall awaiting you if you use 
 `toStringz`: garbage collection. You may want to amend the 
 article to identify this pitfall.

 And I'm not talking about requiring `@nogc`, I'm talking about 
 the GC collecting the data while C is still using it.

 In your example:

 ```d
 puts(s1.toStringz());
 ```

 This leaves a GC-collectible allocation in C land. For `puts`, 
 it's fine, as the data is not used past the call, but in 
 something else that might keep it somewhere not accessible to 
 the GC, you'll want to assign that to a variable that lasts as 
 long as the resource is used.
That's what I'm referring to in the conclusion where I say this about what's going to be in Part Two:
 how to avoid a potential problem spot that can arise when 
 passing GC-allocated D strings to C
I'll cover approaches to maintaining a reference, like `GC.addRoot`, and emphasize that it applies to any GC-allocated memory, not just strings.
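
A minimal sketch of the `GC.addRoot` approach mentioned here, under stated assumptions: `CHolder` is a hypothetical stand-in for state owned by a C library, and `lengthAfterCollect` is an invented helper name. Only `GC.addRoot`, `GC.removeRoot`, `GC.collect`, `malloc`, `free`, `strlen` and `toStringz` are real library calls.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical stand-in for C-side state that stores a pointer it was
// given. malloc'd memory is not scanned by the GC.
struct CHolder
{
    const(char)* name;
}

// Stores a C string in unscanned memory, forces a collection, and
// returns the length C would see afterwards.
size_t lengthAfterCollect(string s)
{
    auto holder = cast(CHolder*) malloc(CHolder.sizeof);
    assert(holder !is null);
    scope (exit) free(holder);

    const(char)* z = s.toStringz;
    GC.addRoot(z);        // keep the allocation alive for the C side
    holder.name = z;
    z = null;             // drop the local pointer; only holder.name remains

    GC.collect();
    immutable len = strlen(holder.name);

    GC.removeRoot(holder.name);   // C is done with it; allow reclamation
    return len;
}

void main()
{
    string part = "kept";
    assert(lengthAfterCollect(part ~ " alive") == 10);
}
```

The root has to be removed once the C side no longer needs the pointer, otherwise the allocation is never reclaimed.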
May 24
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/24/21 8:38 PM, Mike Parker wrote:
 On Monday, 24 May 2021 at 16:16:53 UTC, Steven Schveighoffer wrote:
 This leaves a GC-collectible allocation in C land. For `puts`, it's 
 fine, as the data is not used past the call, but in something else 
 that might keep it somewhere not accessible to the GC, you'll want to 
 assign that to a variable that lasts as long as the resource is used.
That's what I'm referring to in the conclusion where I say this about what's going to be in Part Two:
 how to avoid a potential problem spot that can arise when passing 
 GC-allocated D strings to C
I'll cover approaches to maintaining a reference, like `GC.addRoot`, and emphasize that it applies to any GC-allocated memory, not just strings.
OK, I'm just concerned people will see the pattern:

```d
somecfunc(str.toStringz);
```

and think that's the end of it.

-Steve
May 24
next sibling parent reply surlymoor <surlymoor cock.li> writes:
On Tuesday, 25 May 2021 at 00:58:31 UTC, Steven Schveighoffer 
wrote:
 OK, I'm just concerned people will see the pattern:

 ```d
 somecfunc(str.toStringz);
 ```

 and think that's the end of it.

 -Steve
Pretty sure its documentation has a conspicuous warning regarding that.
May 24
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/24/21 11:30 PM, surlymoor wrote:
 On Tuesday, 25 May 2021 at 00:58:31 UTC, Steven Schveighoffer wrote:
 OK, I'm just concerned people will see the pattern:

 ```d
 somecfunc(str.toStringz);
 ```

 and think that's the end of it.
Pretty sure its documentation has a conspicuous warning regarding that.
It does, and that's [sometimes ignored](https://forum.dlang.org/post/bogwusbqfewqifjlfmjz@forum.dlang.org), even by [seasoned veterans](https://forum.dlang.org/post/rorsrk$1tdu$1@digitalmars.com).

Anyway, it doesn't hurt to identify the possible memory problems that a function can have when recommending it.

-Steve
May 25
prev sibling parent Mike Parker <aldacron gmail.com> writes:
On Tuesday, 25 May 2021 at 00:58:31 UTC, Steven Schveighoffer 
wrote:
 On 5/24/21 8:38 PM, Mike Parker wrote:
 OK, I'm just concerned people will see the pattern:

 ```d
 somecfunc(str.toStringz);
 ```

 and think that's the end of it.
Yeah. Good point. I've updated the post.
May 25
prev sibling parent reply Виталий Фадеев writes:
On Monday, 24 May 2021 at 14:02:14 UTC, Mike Parker wrote:
 The blog:
 https://dlang.org/blog/2021/05/24/interfacing-d-with-c-strings-part-one/
Good!

About toStringz(): technically we can use `reserve()` to reserve memory. Possibly the memory is already reserved.

    s.reserve( s.length + 1 );

Then we can set the trailing zero.

    s[ $ ] = '\0';

And return the pointer.

    return s.ptr;

In this case we avoid a memory allocation, so the operation will be faster.

Otherwise we can do:

    auto copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = 0;
    return &assumeUnique(copy)[0];

Example:

    immutable(char)* toStringz( ref string s )
    {
        if ( s.capacity <= s.length )
            s.reserve( s.length + 1 );

        char* cptr = cast( char* ) s.ptr; // C ptr
        char* zptr = cptr + s.length;     // zero ptr
        *zptr = '\0';

        return cast( immutable(char)* ) cptr;
    }

Test code: https://run.dlang.io/is/xZwwtw
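
For reference, the copying fallback quoted in this post can be written as a small standalone function. This is only a sketch; the name `toStringzCopy` is hypothetical and is not the Phobos function. It always allocates, so it never writes outside the caller's string.

```d
import core.stdc.string : strlen;
import std.exception : assumeUnique;

// Hypothetical name. Always copies, so the caller's string is untouched.
immutable(char)* toStringzCopy(const(char)[] s)
{
    auto copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = '\0';
    return assumeUnique(copy).ptr;
}

void main()
{
    string s = "D string";
    auto z = toStringzCopy(s);
    string other = s ~ 'X';   // appending elsewhere cannot disturb z

    assert(strlen(z) == s.length);
    assert(z[s.length] == '\0');
    assert(other == "D stringX");
}
```

The price is one allocation per call, which is the cost the in-place `reserve` variant was trying to avoid.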
May 25
next sibling parent reply Виталий Фадеев writes:
On Wednesday, 26 May 2021 at 04:00:17 UTC, Виталий Фадеев wrote:
 On Monday, 24 May 2021 at 14:02:14 UTC, Mike Parker wrote:
 The blog:
 https://dlang.org/blog/2021/05/24/interfacing-d-with-c-strings-part-one/
Test code: https://run.dlang.io/is/xZwwtw
Example for using reserve. Test code 2: https://run.dlang.io/is/aQsr8n
May 25
parent Виталий Фадеев writes:
On Wednesday, 26 May 2021 at 04:27:16 UTC, Виталий Фадеев wrote:
 On Wednesday, 26 May 2021 at 04:00:17 UTC, Виталий Фадеев wrote:
 On Monday, 24 May 2021 at 14:02:14 UTC, Mike Parker wrote:
 The blog:
 https://dlang.org/blog/2021/05/24/interfacing-d-with-c-strings-part-one/
Test code: https://run.dlang.io/is/xZwwtw
Example for using reserve. Test code 2: https://run.dlang.io/is/aQsr8n
Pull request to std.string: https://github.com/dlang/phobos/pull/8111 Review code, please.
May 26
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 5/25/21 9:00 PM, Виталий Фадеев wrote:

      immutable(char)* toStringz( ref string s )
      {
          if ( s.capacity <= s.length )
              s.reserve( s.length + 1 );

          char* cptr = cast( char* ) s.ptr; // C ptr
          char* zptr = cptr + s.length;     // zero ptr
          *zptr = '\0';
That's undefined behavior because that location does not belong to the string. Here is an example that defeats the proposed toStringz:

    void main() {
        string s;
        s = "D string";
        auto c_string = toStringz( s );
        auto other = s;
        other ~= 'X'; // <-- Seemingly unrelated operation
        // ...
    }

puts accesses that unrelated 'X' and more bytes after that:

    C string: D stringX1^

Ali
May 26
parent reply Виталий Фадеев writes:
On Wednesday, 26 May 2021 at 16:35:36 UTC, Ali Çehreli wrote:
 On 5/25/21 9:00 PM, Виталий Фадеев wrote:

      immutable(char)* toStringz( ref string s )
      {
          if ( s.capacity <= s.length )
              s.reserve( s.length + 1 );

          char* cptr = cast( char* ) s.ptr; // C ptr
          char* zptr = cptr + s.length;     // zero ptr
          *zptr = '\0';
 That's undefined behavior because that location does not belong to the string. Here is an example that defeats the proposed toStringz:

      void main() {
          string s;
          s = "D string";
          auto c_string = toStringz( s );
          auto other = s;
          other ~= 'X'; // <-- Seemingly unrelated operation
          // ...
      }

 puts accesses that unrelated 'X' and more bytes after that:

      C string: D stringX1^

 Ali
Yes, true. The reserve/capacity approach does not work in all cases.
May 26
parent reply Виталий Фадеев writes:
On Thursday, 27 May 2021 at 03:40:02 UTC, Виталий Фадеев wrote:
 On Wednesday, 26 May 2021 at 16:35:36 UTC, Ali Çehreli wrote:
 On 5/25/21 9:00 PM, Виталий Фадеев wrote:

      immutable(char)* toStringz( ref string s )
      {
          if ( s.capacity <= s.length )
              s.reserve( s.length + 1 );

          char* cptr = cast( char* ) s.ptr; // C ptr
          char* zptr = cptr + s.length;     // zero ptr
          *zptr = '\0';
 That's undefined behavior because that location does not belong to the string. Here is an example that defeats the proposed toStringz:

      void main() {
          string s;
          s = "D string";
          auto c_string = toStringz( s );
          auto other = s;
          other ~= 'X'; // <-- Seemingly unrelated operation
          // ...
      }

 puts accesses that unrelated 'X' and more bytes after that:

      C string: D stringX1^

 Ali
 Yes, true. The reserve/capacity approach does not work in all cases.
The zero terminator is not kept after concatenating the source string with another string:

    auto dString = "D string" ~ 2.to!string;
    auto cString = dString.toStringz();
    dString = dString ~ "new tail";
    // cString[ $-1 ] != '\0';
May 26
parent Виталий Фадеев writes:
On Thursday, 27 May 2021 at 03:52:32 UTC, Виталий Фадеев wrote:
 On Thursday, 27 May 2021 at 03:40:02 UTC, Виталий Фадеев wrote:
 On Wednesday, 26 May 2021 at 16:35:36 UTC, Ali Çehreli wrote:
 On 5/25/21 9:00 PM, Виталий Фадеев wrote:
 // cString[ $-1 ] != '\0';
// cString[ $ ] != '\0';
May 26