digitalmars.D.learn - what exactly is string length?

reply mw <mingwu gmail.com> writes:
https://run.dlang.io/is/B4jcno

---
import std;
import std.conv : text;


void main()
{
     char[6] s;
     s = "abc";
     writeln(s, s.length);  // abc6, ok it's the static array's length

     string t = text("head-", s, "-tail");
     writeln(t, t.length);  // head-abc-tail16, why?
}
---

Why is the last output 16 instead of 13? t's type is string here.
Apr 01 2021
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 02/04/2021 5:32 PM, mw wrote:
 ---
 import std;
 import std.conv : text;
 
 
 void main()
 {
     char[6] s;
     s = "abc";
     writeln(s, s.length);  // abc6, ok it's the static array's length
 
     string t = text("head-", s, "-tail");
     writeln(t, t.length);  // head-abc-tail16, why?
assert(t[9] == '\0');
 }
 ---
Apr 01 2021
parent reply mw <mingwu gmail.com> writes:
On Friday, 2 April 2021 at 04:36:01 UTC, rikki cattermole wrote:
 [...]
 assert(t[9] == '\0');
I don't get it. What do you mean by the assertion:

    assert(t[9] == '\0');

As far as I can tell, t == "head-abc-tail".
Apr 01 2021
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 02/04/2021 5:38 PM, mw wrote:
 I don't get it, what do you mean by the assertion:

     assert(t[9] == '\0');

 t == "head-abc-tail"
Not all characters are printable; NUL (`\0`) is one that is not. These are the bytes of `t`:

    [104, 101, 97, 100, 45, 97, 98, 99, 0, 0, 0, 45, 116, 97, 105, 108]
Apr 01 2021
parent reply mw <mingwu gmail.com> writes:
On Friday, 2 April 2021 at 04:43:48 UTC, rikki cattermole wrote:
 [...]
 Not all characters can be printed such as NULL.
 [104, 101, 97, 100, 45, 97, 98, 99, 0, 0, 0, 45, 116, 97, 105, 108]
So you mean that inside the writeln() call, the 0s are skipped?

Well, if I use `string t` as a filename, will it try to look for a file called "head-abc\0\0\0-tail" instead of just "head-abc-tail"?

Or is it platform dependent?
Apr 01 2021
next sibling parent reply mw <mingwu gmail.com> writes:
On Friday, 2 April 2021 at 04:49:22 UTC, mw wrote:
 [...]
 So you mean inside the writeln() call, the 0s are skipped?
 [...]
Then how can I construct `t` to make this assertion true?

    assert(t == "head-abc-tail");  // failed!
Apr 01 2021
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 02/04/2021 5:51 PM, mw wrote:
 Then how can I construct `t`? to make this assertion true:
 
     assert(t == "head-abc-tail");  // failed!
Slice it:

    string t = text("head-", s[0 .. 3], "-tail");

http://ddili.org/ders/d.en/slices.html
Apr 01 2021
parent reply mw <mingwu gmail.com> writes:
On Friday, 2 April 2021 at 05:01:27 UTC, rikki cattermole wrote:
 [...]
 Slice it. string t = text("head-", s[0 .. 3], "-tail");
This is just an example. What if the exact length is not known statically? Is there a function to trim the `\0`s?
Apr 01 2021
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 02, 2021 at 05:05:21AM +0000, mw via Digitalmars-d-learn wrote:
[...]
 This is just an example, what if the exact length is not known
 statically, is there a functions to trim the `\0`s?
What about `s.until('\0')`? Example:

    auto s = "abc\0\0\0def";
    auto t = "blah" ~ s.until('\0').array ~ "boo";

T

--
What do you call optometrist jokes? Vitreous humor.
Apr 01 2021
parent reply mw <mingwu gmail.com> writes:
On Friday, 2 April 2021 at 05:18:49 UTC, H. S. Teoh wrote:
 [...]
 What about `s.until('\0')`?
Finally, I'm using:

https://run.dlang.io/is/651lT6

    string t = text("head-", s[].until('\0').array, "-tail");

It works both for s = "abc" (padded with \0) and for "abcdef" (all 6 chars used, where indexOf would return -1, which is not a valid slice bound). Thanks to everyone who helped.
Apr 01 2021
parent Paul Backus <snarwin gmail.com> writes:
On Friday, 2 April 2021 at 05:39:26 UTC, mw wrote:
 Finally, I'm using:

 https://run.dlang.io/is/651lT6

     string t = text("head-", s[].until('\0').array, "-tail");
FYI, you don't need the call to `.array` there--`text` accepts input ranges.
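
A minimal sketch of that suggestion, assuming a reasonably recent Phobos in which `text` consumes character ranges directly. The helper name `headTail` is hypothetical and only serves the illustration; the buffers are filled explicitly so the padding is unambiguous.

```d
import std.algorithm.searching : until;
import std.conv : text;

// Hypothetical helper: place the NUL-trimmed contents of `buf`
// between two fixed parts.
string headTail(const(char)[] buf)
{
    // `until` is lazy and stops before the first NUL; `text` reads the
    // range as it is, so no intermediate array is needed.
    return text("head-", buf.until('\0'), "-tail");
}

void main()
{
    char[6] s = '\0';       // fill every element with NUL
    s[0 .. 3] = "abc";      // then overwrite the first three
    assert(headTail(s[]) == "head-abc-tail");

    char[6] full = "abcdef"; // no NUL anywhere in the buffer
    assert(headTail(full[]) == "head-abcdef-tail");
}
```

Because `until` simply runs to the end when no sentinel is found, the full-buffer case needs no special handling.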
Apr 02 2021
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 02, 2021 at 05:05:21AM +0000, mw via Digitalmars-d-learn wrote:
[...]
 This is just an example, what if the exact length is not known
 statically, is there a functions to trim the `\0`s?
Another way, if you want to avoid the extra allocation, is to slice the static array with .indexOf:

    s[0 .. s.indexOf('\0')]

This should give you the initial segment up to the first null, provided the array contains one (indexOf returns -1 otherwise).

T

--
Questions are the beginning of intelligence, but the fear of God is the beginning of wisdom.
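
The slicing idea can be sketched with a guard for the case where the buffer holds no null at all. `beforeNul` is a hypothetical helper name, not something from Phobos; it only wraps `std.string.indexOf`.

```d
import std.string : indexOf;

// Hypothetical helper: the part of `buf` before the first NUL,
// or all of `buf` when it contains no NUL.
const(char)[] beforeNul(const(char)[] buf)
{
    const idx = buf.indexOf('\0');          // -1 means "not found"
    return idx < 0 ? buf : buf[0 .. idx];   // a slice, so nothing is copied
}

void main()
{
    char[6] s = '\0';       // fill every element with NUL
    s[0 .. 3] = "abc";      // then overwrite the first three
    assert(beforeNul(s[]) == "abc");

    char[6] full = "abcdef";
    assert(beforeNul(full[]) == "abcdef");  // indexOf found nothing here
}
```

The result is a view into the original buffer, so it is only valid for as long as that buffer is.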
Apr 01 2021
prev sibling parent reply Computermatronic <computermatronic gmail.com> writes:
On Friday, 2 April 2021 at 04:49:22 UTC, mw wrote:
 So you mean inside the writeln() call, the 0s are skipped?

 Well, if I use `string t` as filename, it will try to looking 
 for a file called:

 "head-abc\0\0\0-tail" instead of just "head-abc-tail" ?

 or it's platform dependent?
I would imagine that it's platform dependent, but given that most platforms adhere to the C ABI, and C strings are null terminated, you'd end up looking for a file called "head-abc".
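
A small sketch of why a C API would stop at "head-abc". It only shows what C's `strlen` sees; it does not claim anything about how a particular D file function behaves. `lengthSeenByC` is a hypothetical helper name.

```d
import core.stdc.string : strlen;

// Hypothetical helper: how many chars a C function would read from `s`.
size_t lengthSeenByC(string s)
{
    const(char)[] z = s ~ '\0';   // explicit terminator, as a C call needs
    return strlen(z.ptr);         // C stops at the first NUL it meets
}

void main()
{
    string t = "head-abc\0\0\0-tail";
    assert(t.length == 16);          // D tracks the length separately
    assert(lengthSeenByC(t) == 8);   // C only sees "head-abc"
}
```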
Apr 01 2021
parent reply mw <mingwu gmail.com> writes:
On Friday, 2 April 2021 at 04:54:07 UTC, Computermatronic wrote:
 [...]
 I would imagine that it's platform dependent, but given that most platforms adhere to the C ABI, and C strings are null terminated, you'd end up looking for a file called "head-abc".
Ahh, I get it: what I see (from writeln) is not what the string actually contains ;-)

And I just tried:

    string t = text("head-", strip(s), "-tail");

It's the same behavior.

So how can I trim the leading & trailing `\0` from the static char array?
Apr 01 2021
next sibling parent reply Computermatronic <computermatronic gmail.com> writes:
On Friday, 2 April 2021 at 05:02:52 UTC, mw wrote:
 Ahh, I got what I see (from writeln) is not what get string 
 here ;-)

 And I just tried:

 string t = text("head-", strip(s), "-tail");

 It's the same behavior.

 So how can I trim the leading & trailing `\0` from the static 
 char array?
strip only removes whitespace, not null characters. You'd have to do something like

```d
string t = text("head-", s, "-tail").filter!`a != '\0'`().array.to!string;
```

I would assume there would be a better way, but I haven't been able to find a dedicated function for stripping null chars in std.
Apr 01 2021
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 02/04/2021 6:10 PM, Computermatronic wrote:
 [...]
 strip only removes whitespace, not null characters.
 [...]
If you know it has null terminators, you can use fromStringz. But this is a case where you should store the length.
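
Both suggestions can be sketched together. The `Name` struct below is purely hypothetical and illustrates "store the length"; `fromStringz` is the `std.string` function taking a pointer, and it is only safe when a terminator really exists inside the buffer.

```d
import std.string : fromStringz;

// Hypothetical type: a fixed buffer plus the number of chars in use.
struct Name
{
    char[6] buf = '\0';
    size_t len;

    void set(const(char)[] value)
    {
        assert(value.length <= buf.length, "value does not fit");
        buf[0 .. value.length] = value[];
        len = value.length;
    }

    // The stored length makes trimming unnecessary.
    const(char)[] get() const { return buf[0 .. len]; }
}

void main()
{
    Name n;
    n.set("abc");
    assert(n.get() == "abc");

    // fromStringz scans for the terminator, which exists here
    // because only three of the six chars are in use.
    assert(fromStringz(n.buf.ptr) == "abc");
}
```

With a completely full buffer, `fromStringz` would read past the end, whereas the stored length keeps working.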
Apr 01 2021
prev sibling parent reply mw <mingwu gmail.com> writes:
On Friday, 2 April 2021 at 05:02:52 UTC, mw wrote:
 [...]
 Ahh, I got what I see (from writeln) is not what get string here ;-)
BTW, should I file a writeln() enhancement request?

It's really confusing, e.g. in debug prints or logs. Output like one of these would be clearer:

    "head-abc\0\0\0-tail"
    "head-abc...-tail"
    "head-abc???-tail"
Apr 02 2021
parent Berni44 <someone somemail.com> writes:
On Friday, 2 April 2021 at 15:01:07 UTC, mw wrote:
 BTW, shall I log a writeln() improvement bug ?

 It's really confusing, e.g as debug print or logs.
In my opinion this isn't a bug. The nulls are actually printed:

```
$> rdmd test.d | hd
00000000  61 62 63 00 00 00 36 0a  68 65 61 64 2d 61 62 63  |abc...6.head-abc|
00000010  00 00 00 2d 74 61 69 6c  31 36 0a                 |...-tail16.|
0000001b
```

It's just that you can't see them, because it's an invisible character. If you want to get `\0` for nulls, you could write

```
writefln!"%(%s%)"(only(t));
```

With this, `t` is printed as a string literal:

```
"head-abc\0\0\0-tail"
```
Apr 02 2021
prev sibling parent mw <mingwu gmail.com> writes:
On Friday, 2 April 2021 at 04:38:37 UTC, mw wrote:
 On Friday, 2 April 2021 at 04:36:01 UTC, rikki cattermole wrote:
 I don't get it, what do you mean by the assertion:


 assert(t[9] == '\0');


 t == "head-abc-tail"
Just tried this:

https://run.dlang.io/is/SFU5p4

```
import std;
import std.conv : text;

void main()
{
    char[6] s;
    s = "abc";
    writeln(s, s.length);  // abc6, ok it's the static array's length

    string t = text("head-", s, "-tail");
    writeln(t, t.length);  // head-abc-tail16, why 16 instead of 13, t's type is string here

    assert(t[9] == '\0');          // ok
    assert(t == "head-abc-tail");  // failed!
}
```

I'm even more puzzled by the behavior of the last 2 assertions: t is printed as "head-abc-tail".
Apr 01 2021
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 02, 2021 at 04:32:53AM +0000, mw via Digitalmars-d-learn wrote:
[...]
 ---
 import std;
 import std.conv : text;
 
 
 void main()
 {
     char[6] s;
     s = "abc";
     writeln(s, s.length);  // abc6, ok it's the static array's length
 
     string t = text("head-", s, "-tail");
     writeln(t, t.length);  // head-abc-tail16, why?
 }
 ---
 
 Why the last output is 16 instead of 13, t's type is string here.
Because `s` contains 6 chars, and you only assigned 3 of them, so there are 3 trailing null bytes that are inserted before "-tail". Null bytes don't print anything, so you don't see them when printed as a string, but if you cast it to ubyte[], you will see them:

    writefln("%(%02X %)", cast(const(ubyte)[]) t);
    // Prints: 68 65 61 64 2D 61 62 63 00 00 00 2D 74 61 69 6C

Remember, this is D, not C. Strings are not terminated by nulls, so appending the static array will append all 6 chars, including the nulls.

T

--
Holding a grudge is like drinking poison and hoping the other person dies. -- seen on the 'Net
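
The explanation above can be checked with a short sketch. The buffer is filled explicitly here so that the padding bytes are unambiguous, and `padded` is a hypothetical helper name used only for this illustration.

```d
import std.conv : text;

// Hypothetical helper: a 6-char buffer holding "abc" plus three NULs.
char[6] padded()
{
    char[6] s = '\0';   // fill every element with NUL
    s[0 .. 3] = "abc";  // then overwrite the first three
    return s;
}

void main()
{
    auto s = padded();
    string t = text("head-", s, "-tail");

    assert(s.length == 6);     // a static array's length is fixed
    assert(t.length == 16);    // 5 + 6 + 5: all six chars were appended
    assert(t[8 .. 11] == "\0\0\0");        // the invisible bytes
    assert(t == "head-abc\0\0\0-tail");
    assert(t != "head-abc-tail");
}
```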
Apr 01 2021
prev sibling parent Виталий Фадеев writes:
On Friday, 2 April 2021 at 04:32:53 UTC, mw wrote:
 https://run.dlang.io/is/B4jcno

 ---
 import std;
 import std.conv : text;


 void main()
 {
     char[6] s;
     s = "abc";
      writeln(s, s.length);  // abc6, ok it's the static array's length

     string t = text("head-", s, "-tail");
     writeln(t, t.length);  // head-abc-tail16, why?
 }
 ---

 Why the last output is 16 instead of 13, t's type is string 
 here.
Test this: https://run.dlang.io/is/Cq4vjP
Apr 01 2021