
digitalmars.D.learn - to delete the '\0' characters

Salih Dincer <salihdb hotmail.com> writes:
Is there a more accurate way to delete the '\0' characters at the 
end of the string? I tried functions in this module: 
https://dlang.org/phobos/std_string.html

```d
auto foo(string s)
{
   string r;
   foreach(c; s)
   {
     if(c > 0)
     {
       r ~= c;
     }
   }
   return r;
}
```

SDB 79
Sep 22 2022
ag0aep6g <anonymous example.com> writes:
On 22.09.22 12:53, Salih Dincer wrote:
 Is there a more accurate way to delete the '\0' characters at the end of 
 the string? I tried functions in this module: 
 https://dlang.org/phobos/std_string.html
 
 ```d
 auto foo(string s)
 {
    string r;
    foreach(c; s)
    {
      if(c > 0)
      {
        r ~= c;
      }
    }
    return r;
 }
 ```
I don't understand what you mean by "more accurate".

Here's a snippet that's a bit shorter than yours and doesn't copy the data:

    while (s.length > 0 && s[$ - 1] == '\0')
    {
        s = s[0 .. $ - 1];
    }
    return s;

But do you really want to allow embedded '\0's? I.e., should foo("foo\0bar\0") really resolve to "foo\0bar" and not "foo"?

Usually, it's the first '\0' that signals the end of a string. In that case you'd better start the search at the front and stop at the first hit.
Sep 22 2022
ag0aep6g <anonymous example.com> writes:
On 22.09.22 13:14, ag0aep6g wrote:
 On 22.09.22 12:53, Salih Dincer wrote:
[...]
 ```d
 auto foo(string s)
 {
    string r;
    foreach(c; s)
    {
      if(c > 0)
      {
        r ~= c;
      }
    }
    return r;
 }
 ```
[...]
 Here's a snippet that's a bit shorter than yours and doesn't copy the data:
 
      while (s.length > 0 && s[$ - 1] == '\0')
      {
          s = s[0 .. $ - 1];
      }
      return s;
 
 But do you really want to allow embedded '\0's? I.e., should 
 foo("foo\0bar\0") really resolve to "foo\0bar" and not "foo"?
Whoops. Your code actually turns "foo\0bar" into "foobar", removing the embedded '\0'. So my supposed alternative is wrong. Still, you usually want to stop at the first '\0'.
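To make the difference concrete, here is a minimal sketch contrasting the two behaviours. The helper names `truncateAtNul` and `stripTrailingNul` are made up for this example; the second one is just the trailing-only loop from above wrapped in a function. Both return a slice of the input, so neither copies the data.

```d
import std.string : indexOf;

/// Hypothetical helper: keep everything before the first '\0', C style.
string truncateAtNul(string s)
{
    const idx = s.indexOf('\0'); // -1 when there is no '\0'
    return idx < 0 ? s : s[0 .. idx];
}

/// Hypothetical helper: drop only the '\0's at the very end.
string stripTrailingNul(string s)
{
    while (s.length > 0 && s[$ - 1] == '\0')
        s = s[0 .. $ - 1];
    return s;
}

void main()
{
    // An embedded '\0' is where the two behaviours differ.
    assert(truncateAtNul("foo\0bar\0") == "foo");
    assert(stripTrailingNul("foo\0bar\0") == "foo\0bar");

    // With padding only at the end, they agree.
    assert(truncateAtNul("foo\0\0") == "foo");
    assert(stripTrailingNul("foo\0\0") == "foo");
}
```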
Sep 22 2022
user1234 <user1234 12.de> writes:
On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
 Is there a more accurate way to delete the '\0' characters at 
 the end of the string? I tried functions in this module: 
 https://dlang.org/phobos/std_string.html

 ```d
 auto foo(string s)
 {
   string r;
   foreach(c; s)
   {
     if(c > 0)
     {
       r ~= c;
     }
   }
   return r;
 }
 ```

 SDB 79
Two remarks:

1. The zero implicitly added to literals is not part of the string. For example, s[$-1] will not give 0 unless you added it explicitly to a literal.

2. Your code removes all the 0s, not only the ones at the end. If removing only the trailing ones is what you want to achieve, maybe try [`stripRight()`](https://dlang.org/phobos/std_string.html#.stripRight). The second overload allows you to specify the characters to remove.
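A small sketch of both remarks: the terminator the compiler appends to a literal sits just past the end of the slice and is only observable through `.ptr`, while zeros written into the literal are ordinary elements that the `stripRight(str, chars)` overload removes from the end. The variable names are made up for illustration.

```d
import std.string : stripRight;

void main()
{
    string lit = "abc";

    // The '\0' the compiler appends to a literal is not counted in .length ...
    assert(lit.length == 3);
    assert(lit[$ - 1] == 'c');
    // ... it only sits in memory just past the slice (handy for C functions).
    assert(lit.ptr[lit.length] == '\0');

    // Zeros written into the literal are ordinary elements of the string,
    string padded = "abc\0\0";
    assert(padded.length == 5);
    // and the overload taking a set of characters strips them from the end.
    assert(padded.stripRight("\0") == "abc");
}
```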
Sep 22 2022
Salih Dincer <salihdb hotmail.com> writes:
On Thursday, 22 September 2022 at 13:29:43 UTC, user1234 wrote:
 Two remarks:

 1. The zero implicitly added to literals is not part of the 
 string. For example, s[$-1] will not give 0 unless you added it 
 explicitly to a literal.

 2. Your code removes all the 0s, not only the ones at the end. If 
 removing only the trailing ones is what you want to achieve, maybe try 
 [`stripRight()`](https://dlang.org/phobos/std_string.html#.stripRight). The
 second overload allows you to specify the characters to remove.
As I mentioned earlier, stripRight() and the others don't work. I'm not talking about the terminating character. Actually, I'm the one who added the \0 characters, and there are several of them. For example:
    4B 6F 72 6B 6D 61 20 73 F6 6E 6D 65 7A 20 62 75 20 15F 61 66 61 6B 6C 61 72 64 61 20 79 FC 7A 65 6E 20 61 6C 20 73 61 6E 63 61 6B 0 0
    53 F6 6E 6D 65 64 65 6E 20 79 75 72 64 75 6D 75 6E 20 FC 73 74 FC 6E 64 65 20 74 FC 74 65 6E 20 65 6E 20 73 6F 6E 20 6F 63 61 6B 0
    4F 20 62 65 6E 69 6D 20 6D 69 6C 6C 65 74 69 6D 69 6E 20 79 131 6C 64 131 7A 131 64 131 72 20 70 61 72 6C 61 79 61 63 61 6B 0 0 0 0
    4F 20 62 65 6E 69 6D 64 69 72 20 6F 20 62 65 6E 69 6D 20 6D 69 6C 6C 65 74 69 6D 69 6E 64 69 72 20 61 6E 63 61 6B 0 0
    C7 61 74 6D 61 20 6B 75 72 62 61 6E 20 6F 6C 61 79 131 6D 20 E7 65 68 72 65 6E 69 20 65 79 20 6E 61 7A 6C 131 20 68 69 6C 61 6C 0 0
    4B 61 68 72 61 6D 61 6E 20 131 72 6B 131 6D 61 20 62 69 72 20 67 FC 6C 20 6E 65 20 62 75 20 15F 69 64 64 65 74 20 62 75 20 63 65 6C E2 6C 0 0 0 0 0 0

Thanks,
SDB 79
Sep 22 2022
Paul Backus <snarwin gmail.com> writes:
On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
 Is there a more accurate way to delete the '\0' characters at 
 the end of the string? I tried functions in this module: 
 https://dlang.org/phobos/std_string.html

 ```d
 auto foo(string s)
 {
   string r;
   foreach(c; s)
   {
     if(c > 0)
     {
       r ~= c;
     }
   }
   return r;
 }
 ```
```d
import std.algorithm : filter;
import std.utf : byCodeUnit;
import std.array : array;

string removeZeroes(string s)
{
    return s.byCodeUnit
        .filter!(c => c != '\0')
        .array;
}
```
Sep 22 2022
Ali Çehreli <acehreli yahoo.com> writes:
On 9/22/22 03:53, Salih Dincer wrote:
 Is there a more accurate way to delete the '\0' characters at the end of
 the string? I tried functions in this module:
 https://dlang.org/phobos/std_string.html
Just a reminder: the following modules are always relevant as well, because strings are arrays, which are ranges:

* std.range
* std.algorithm
* std.array
        r ~= c;
Stefan Koch once said the ~ operator should be called "the slow operator". Meaning, if you want to make your code slow, then use that operator. :)

The reason is that the operation may need to allocate memory from the heap and copy the existing elements there. And any memory allocation may trigger a garbage collection cycle.

Of course, none of that matters if we are talking about a short string. However, it may become the dominating reason why a program is slow.

I was going to suggest Paul Backus' solution as well, but I may leave the array part out in my own code until I really need it:

    string noZeroes(string s)
    {
        return s.byCodeUnit.filter!(c => c != '\0');
    }

Now, a caller may be happy without an array:

    auto a = s.noZeroes.take(10);

And another can easily add a .array when really needed:

    auto b = s.noZeroes.array;

That may be seen as premature optimization, but I see it as avoiding a premature pessimization, because I did not put in any extra work there. But again, this all depends on each program.

If we were talking about mutable elements and the order of elements did not matter, then the fastest option would be to remove with SwapStrategy.unstable:

    import std;

    void main()
    {
        auto arr = [ 1, 0, 2, 0, 0, 3, 4, 5 ];
        arr = remove!(i => i == 0, SwapStrategy.unstable)(arr);
        writeln(arr);
    }

unstable works by swapping the first 0 that it finds with the last non-zero element, and continues in that way. No memory is allocated. As a result, the order of elements will not be preserved, but unstable can be very fast compared to .stable (which is the default), because .stable must move elements to the left (multiple times in some cases), which can be expensive, especially for some types.

The result of the program above is the following:

    [1, 5, 2, 4, 3]

Zeros are removed but the order is not preserved. And very important: Don't forget to assign remove's return value back to 'arr'. ;)

I know this will not work for a string, but it's something to keep in mind...

Ali
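If the text lives in a mutable `char[]` buffer that you own, the same in-place idea can be applied with the default stable strategy, so the characters stay in order. A sketch under that assumption: to sidestep auto-decoding, it works on the `ubyte[]` view returned by `std.string.representation`. Removing 0x00 bytes cannot break UTF-8, because 0x00 never occurs inside a multi-byte sequence. The helper name `removeNulsInPlace` is made up for illustration.

```d
import std.algorithm.mutation : remove;
import std.string : representation;

/// Hypothetical helper: removes every '\0' from a mutable buffer in place,
/// keeping the remaining characters in order; nothing is allocated.
char[] removeNulsInPlace(char[] buf)
{
    // representation() returns a ubyte[] view of the same memory, so
    // remove() shifts raw bytes instead of auto-decoding to dchar.
    ubyte[] bytes = buf.representation;
    bytes = bytes.remove!(b => b == 0); // default SwapStrategy.stable
    return cast(char[]) bytes;          // a shorter slice of the same buffer
}

void main()
{
    char[] buf = "ab\0c\0\0".dup;
    // As with the int[] example, use the returned slice.
    buf = removeNulsInPlace(buf);
    assert(buf == "abc");
}
```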
Sep 22 2022
Ali Çehreli <acehreli yahoo.com> writes:
On 9/22/22 08:19, Ali Çehreli wrote:

 string noZeroes(string s)
 {
      return s.byCodeUnit.filter!(c => c != '\0');
 }
That won't compile; the return type must be 'auto'.

Ali
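For completeness, a self-contained version of `noZeroes` with the return type changed to `auto`, together with the two call styles from the previous message: lazy use with `take`, and an eager copy with `.array`. The sample string is made up.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.array : array;
import std.range : take;
import std.utf : byCodeUnit;

// Returns a lazy filter range over the code units, not a string.
auto noZeroes(string s)
{
    return s.byCodeUnit.filter!(c => c != '\0');
}

void main()
{
    string s = "ab\0cd\0\0ef";

    // Lazy: nothing is allocated; only as many elements as needed are visited.
    auto a = s.noZeroes.take(3);
    assert(a.equal("abc"));

    // Eager copy only when an array is really needed.
    string b = s.noZeroes.array;
    assert(b == "abcdef");
}
```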
Sep 22 2022
Salih Dincer <salihdb hotmail.com> writes:
On Thursday, 22 September 2022 at 15:22:06 UTC, Ali Çehreli wrote:
 On 9/22/22 08:19, Ali Çehreli wrote:
 ```d
 string noZeroes(string s)
 {
      return s.byCodeUnit.filter!(c => c != '\0');
 }
 ```
 That won't compile; the return type must be 'auto'.
 Ali
Thank you for all the valuable information you wrote. I chose to split because the '\0's are at the end of the string:

```d
string splitz(string s)
{
  import std.string : indexOf;
  size_t seekPos = s.indexOf('\0');
  return s[0..seekPos];
}
```

SDB 79
Sep 22 2022
Salih Dincer <salihdb hotmail.com> writes:
On Thursday, 22 September 2022 at 20:53:28 UTC, Salih Dincer 
wrote:
 ```d
 string splitz(string s)
 {
   import std.string : indexOf;
   size_t seekPos = s.indexOf('\0');
   return s[0..seekPos];
 }
 ```
I ignored the possibility of not finding '\0'. I'm fixing it now:

```d
string splitz(string s)
{
  import std.string : indexOf;
  auto seekPos = s.indexOf('\0');
  return seekPos > 0 ? s[0..seekPos] : s;
}
```

But I also wish it could be like this:

```d
string splitz(string s)
{
  import std.string : indexOf;
  if(auto seekPos = s.indexOf('\0') > 0)
  {
    return s[0..seekPos];
  }
  return s;
}
```

SDB 79
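Two details in the code above are easy to miss. `indexOf` returns -1 when nothing is found, so a '\0' at index 0 is a valid hit that `seekPos > 0` treats as "not found". And in the wished-for version the `>` is part of the initializer, so `seekPos` is a `bool`, not an index. A sketch of one way to handle the first point (test for -1 explicitly), keeping the `splitz` name from the thread:

```d
import std.string : indexOf;

// Treat only -1 as "not found", so a '\0' at index 0 yields "".
string splitz(string s)
{
    const seekPos = s.indexOf('\0');
    return seekPos >= 0 ? s[0 .. seekPos] : s;
}

void main()
{
    // `auto x = s.indexOf('\0') > 0` declares a bool, not an index.
    string s = "abc\0";
    auto wrong = s.indexOf('\0') > 0;
    static assert(is(typeof(wrong) == bool));

    assert(splitz("abc\0\0") == "abc");
    assert(splitz("\0abc") == ""); // `seekPos > 0` would return "\0abc" here
    assert(splitz("abc") == "abc");
}
```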
Sep 22 2022
Ali Çehreli <acehreli yahoo.com> writes:
On 9/22/22 14:31, Salih Dincer wrote:

 string splitz(string s)
 {
    import std.string : indexOf;
    auto seekPos = s.indexOf('\0');
    return seekPos > 0 ? s[0..seekPos] : s;
 }
If you have multiple '\0' chars that you will continue looking for, how about the following?

    import std;

    auto splitz(string s)
    {
        return s.splitter('\0');
    }

    unittest
    {
        auto data = [ "hello", "and", "goodbye", "world" ];
        auto hasZeros = data.joiner("\0").text;
        assert(hasZeros.count('\0') == 3);
        assert(hasZeros.splitz.equal(data));
    }

    void main()
    {
    }

Ali
Sep 22 2022
Salih Dincer <salihdb hotmail.com> writes:
On Thursday, 22 September 2022 at 21:49:36 UTC, Ali Çehreli wrote:
 On 9/22/22 14:31, Salih Dincer wrote:

 If you have multiple '\0' chars that you will continue looking 
 for, how about the following?
It can be preferable when working with ranges. But it isn't helpful when there is more than one '\0' in a row, and it moves us away from strings. For example:

```d
auto data = [ "hello", "and", "goodbye", "world" ];
auto hasZeros = data.joiner("\0\0").text; // "hello\0\0and\0\0goodbye\0\0world"
assert(hasZeros.count('\0') == 6);
assert(hasZeros.splitz.walkLength == data.length * 2 - 1);
auto range = hasZeros.splitz; // ("hello", "", "and", "", "goodbye", "", "world")
```

SDB 79
Sep 23 2022
Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Friday, 23 September 2022 at 08:50:42 UTC, Salih Dincer wrote:
 On Thursday, 22 September 2022 at 21:49:36 UTC, Ali Çehreli 
 wrote:
 On 9/22/22 14:31, Salih Dincer wrote:

 If you have multiple '\0' chars that you will continue looking 
 for, how about the following?
It can be preferable when working with ranges. But it isn't helpful when there is more than one '\0' in a row, and it moves us away from strings. For example:

```d
auto data = [ "hello", "and", "goodbye", "world" ];
auto hasZeros = data.joiner("\0\0").text; // "hello\0\0and\0\0goodbye\0\0world"
assert(hasZeros.count('\0') == 6);
assert(hasZeros.splitz.walkLength == data.length * 2 - 1);
auto range = hasZeros.splitz; // ("hello", "", "and", "", "goodbye", "", "world")
```

SDB 79
You should be explicit with requirements. It was hard to tell if your original code was correct.

```d
auto splitz(string s) {
    return s.splitter('\0')
   .filter!(x => !x.empty);
}
```
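A quick check of this version with doubled separators and some trailing padding; the input is made up for illustration, and explicit imports replace `import std;` so the snippet stands alone.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter, joiner, splitter;
import std.conv : text;
import std.range.primitives : empty;

auto splitz(string s)
{
    return s.splitter('\0')
        .filter!(x => !x.empty);
}

void main()
{
    auto data = [ "hello", "and", "goodbye", "world" ];
    // Two '\0's between the words, plus three more at the end.
    string hasZeros = data.joiner("\0\0").text ~ "\0\0\0";

    // splitter alone would also yield empty strings between consecutive
    // '\0's; the filter drops them, so only the words remain.
    assert(hasZeros.splitz.equal(data));
}
```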
Sep 23 2022
Salih Dincer <salihdb hotmail.com> writes:
On Friday, 23 September 2022 at 14:38:35 UTC, Jesse Phillips 
wrote:
 
 You should be explicit with requirements.
Sorry, my native language is Turkish; I speak English as a foreign language, but I think what I wrote is clear. What do you think when you look at the text I've highlighted below?

On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer wrote:
 Is there a more accurate way to delete **the '\0' characters at 
 the end of the string?**
* character**S**
* at the **END**
* of the **STRING**
 ```d
 auto splitz(string s) {
     return s.splitter('\0')
    .filter!(x => !x.empty);
 }
 ```
By the way, if we're going to filter, why are we splitting at all?

Anyway! For this implementation, indexOf() is a powerful enough tool. In fact, it's pretty fast here: there are at most 8 '\0' characters and they are always at the end of the string, so the search can start 8 characters before the end. For example:

```d
void main()
{
  string[] samples = ["the one\0", "the two\0\0", "the three\0\0\0",
                      "the four\0\0\0\0", "the five\0\0\0\0\0",
                      "the six\0\0\0\0\0\0", "the seven\0\0\0\0\0\0\0",
                      "the eight\0\0\0\0\0\0\0\0"];

  import std.stdio : writefln;
  foreach(s; samples)
  {
    auto start = s.length - 8;
    string res = s.splitZeros!false(start);
    writefln("%(%02X%)", cast(ubyte[])res);
  }
}

string splitZeros(bool keepSep)(string s, size_t start = 0)
{
  auto keep = keepSep ? 0 : 1;

  import std.string : indexOf;
  if(auto seekPos = s.indexOf('\0', start) + 1)
  {
    return s[0..seekPos - keep];
  }
  return s;
}
```

SDB 79
Sep 23 2022
Ali Çehreli <acehreli yahoo.com> writes:
On 9/23/22 11:37, Salih Dincer wrote:

 * character**S**
 * at the **END**
 * of the **STRING**
I think the misunderstanding is due to the following data you posted earlier (I am abbreviating):

    53 F6 6E 6D 65 64 65 6E 20 79 75 72 64 75 6D 75 6E 20 FC 73 74 FC 6E 64 65 20 74 FC 74 65 6E 20 65 6E 20 73 6F 6E 20 6F 63 61 6B 0 4F 20 62 65 6E 69 6D 20 6D 69 6C 6C 65 74 69 6D 69 6E 20 79 131 6C 64 131 7A 131 64 131 72 20 70 61 72 6C 61 79 61 63 61 6B 0 0 0 0

You must have meant there were multiple strings there (apparently on separate lines), but I assumed you were showing a single string with 0 bytes inside it. (Word wrap must have contributed to the misunderstanding.)

Ali

P.S. With that understanding, I now think searching from the end for the first non-zero byte may be faster than searching from the beginning for the first zero; but again, it depends on the data.
Sep 23 2022
Paul Backus <snarwin gmail.com> writes:
On Friday, 23 September 2022 at 18:37:59 UTC, Salih Dincer wrote:
 On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
 wrote:
 Is there a more accurate way to delete **the '\0' characters 
 at the end of the string?**
 * character**S**
 * at the **END**
 * of the **STRING**
Apologies for the confusion. You can use [`stripRight`][1] for this:

```d
import std.string: stripRight;
import std.stdio: writeln;

void main()
{
    string[] samples = [
        "the one\0", "the two\0\0", "the three\0\0\0", "the four\0\0\0\0",
        "the five\0\0\0\0\0", "the six\0\0\0\0\0\0",
        "the seven\0\0\0\0\0\0\0", "the eight\0\0\0\0\0\0\0\0"
    ];

    foreach (s; samples)
    {
        writeln(s.stripRight("\0"));
    }
}
```

[1]: https://phobos.dpldocs.info/std.string.stripRight.2.html
Sep 23 2022
Salih Dincer <salihdb hotmail.com> writes:
On Friday, 23 September 2022 at 22:17:51 UTC, Paul Backus wrote:
 Apologies for the confusion. You can use 
 [stripRight](https://phobos.dpldocs.info/std.string.stripRight.2.html)
We have a saying: Estaghfirullah! (Roughly: "please, no need to apologize.") Thank you all so much; this has been very useful for me. I learned two things:

* First, we can use the strip() functions with parameters: https://dlang.org/phobos/std_algorithm_mutation.html#.strip (the examples are very nice)
* Second, we could walk through the string in reverse and use indexOf(): https://github.com/dlang/phobos/blob/master/std/string.d#L3418

**Source Code:**

```d
//import std.string : stripRight;/*
string stripRight(string str, const(char)[] chars)
{
  import std.range.primitives : back, empty, popBack;
  import std.string : indexOf;

  for (; !str.empty; str.popBack())
  {
    if (chars.indexOf(str.back) == -1)
      break;
  }
  return str;
}//*/
```

Delicious...

SDB 79
Sep 23 2022
Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
 Is there a more accurate way to delete the '\0' characters at 
 the end of the string?
Accurate? No. Your code works, and correct is correct, regardless of efficiency or style.
 I tried functions in this module: 
 https://dlang.org/phobos/std_string.html

 [code]
You won’t do it any shorter than this if returning a range of `dchar` is fine:

```d
auto removez(const(char)[] string, char ch = '\0')
{
    import std.algorithm.iteration;
    return string.splitter(ch).joiner;
}
```

If `dchar` is a problem and a range is not what you want:

```d
inout(char)[] removez(inout(char)[] chars) @safe pure nothrow
{
    import std.array, std.algorithm.iteration;
    auto data = cast(const(ubyte)[])chars;
    auto result = data.splitter(0).joiner.array;
    return (() inout @trusted => cast(inout(char)[])result)();
}
```

Bonus: Works with any kind of array of qualified char. As `string` is simply `immutable(char)[]`, `removez` returns a `string` given a `string`, but returns a `char[]` given a `char[]`, etc.

Warning: I do not know if the `@trusted` expression is really okay. The cast is not `@safe` because of type qualifiers: If `inout` becomes nothing (i.e. mutable), the cast removes `const`. I suspect that it is still okay because the result of `array` is unique. Maybe others know better?
Sep 23 2022