digitalmars.D.learn - to delete the '\0' characters

Salih Dincer <salihdb hotmail.com> writes:

Is there a more accurate way to delete the '\0' characters at the 
end of the string? I tried functions in this module: 
https://dlang.org/phobos/std_string.html

```d
auto foo(string s)
{
   string r;
   foreach(c; s)
   {
     if(c > 0)
     {
       r ~= c;
     }
   }
   return r;
}
```

SDB 79

Sep 22 2022

ag0aep6g <anonymous example.com> writes:

On 22.09.22 12:53, Salih Dincer wrote:
 Is there a more accurate way to delete the '\0' characters at the end of 
 the string? I tried functions in this module: 
 https://dlang.org/phobos/std_string.html
 
 ```d
 auto foo(string s)
 {
    string r;
    foreach(c; s)
    {
      if(c > 0)
      {
        r ~= c;
      }
    }
    return r;
 }
 ```

I don't understand what you mean by "more accurate".

Here's a snippet that's a bit shorter than yours and doesn't copy the data:

     while (s.length > 0 && s[$ - 1] == '\0')
     {
         s = s[0 .. $ - 1];
     }
     return s;

But do you really want to allow embedded '\0's? I.e., should 
foo("foo\0bar\0") really resolve to "foo\0bar" and not "foo"?

Usually, it's the first '\0' that signals the end of a string. In that 
case you'd better start the search at the front and stop at the first hit.
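
A minimal sketch of that front-to-back search, assuming `indexOf` from `std.string` (the name `untilFirstNul` is made up for illustration):

```d
import std.string : indexOf;

// Hypothetical helper: keep everything before the first '\0'.
string untilFirstNul(string s)
{
    // indexOf returns -1 when the character is not found.
    immutable pos = s.indexOf('\0');
    return pos < 0 ? s : s[0 .. pos];
}

unittest
{
    assert(untilFirstNul("foo\0bar\0") == "foo");
    assert(untilFirstNul("foo") == "foo");
}
```

The result is a slice of the input, so no data is copied.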

Sep 22 2022

ag0aep6g <anonymous example.com> writes:

On 22.09.22 13:14, ag0aep6g wrote:
 On 22.09.22 12:53, Salih Dincer wrote:

[...]
 ```d
 auto foo(string s)
 {
    string r;
    foreach(c; s)
    {
      if(c > 0)
      {
        r ~= c;
      }
    }
    return r;
 }
 ```


[...]
 Here's a snippet that's a bit shorter than yours and doesn't copy the data:
 
      while (s.length > 0 && s[$ - 1] == '\0')
      {
          s = s[0 .. $ - 1];
      }
      return s;
 
 But do you really want to allow embedded '\0's? I.e., should 
 foo("foo\0bar\0") really resolve to "foo\0bar" and not "foo"?

Whoops. Your code actually turns "foo\0bar" into "foobar", removing the 
embedded '\0'. So my supposed alternative is wrong.

Still, you usually want to stop at the first '\0'.

Sep 22 2022

user1234 <user1234 12.de> writes:

On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
 Is there a more accurate way to delete the '\0' characters at 
 the end of the string? I tried functions in this module: 
 https://dlang.org/phobos/std_string.html

 ```d
 auto foo(string s)
 {
   string r;
   foreach(c; s)
   {
     if(c > 0)
     {
       r ~= c;
     }
   }
   return r;
 }
 ```

 SDB 79

Two remarks:

1. The zero implicitly added to literals is not part of the 
string. For example, s[$-1] will not give 0 unless you added it 
explicitly to the literal.

2. Your code removes all the 0s, not just the ones at the end. As that still 
achieves what you want, maybe try 
[`stripRight()`](https://dlang.org/phobos/std_string.html#.stripRight). The
second overload lets you specify the characters to remove.
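
Both remarks can be checked with a few assertions. This is only a sketch; it assumes a Phobos version that has the two-argument `stripRight` overload:

```d
import std.string : stripRight;

unittest
{
    // Remark 1: the implicit terminator is not counted in the length.
    string lit = "abc";
    assert(lit.length == 3);
    assert(lit[$ - 1] == 'c');

    // Remark 2: stripRight with a character set only touches the end.
    assert("abc\0\0".stripRight("\0") == "abc");
    assert("a\0bc\0".stripRight("\0") == "a\0bc");
}
```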

Sep 22 2022

Salih Dincer <salihdb hotmail.com> writes:

On Thursday, 22 September 2022 at 13:29:43 UTC, user1234 wrote:
 Two remarks:

 1. The zero implicitly added to literals is not part of the 
 string. For example, s[$-1] will not give 0 unless you added it 
 explicitly to the literal.

 2. Your code removes all the 0s, not just the ones at the end. As that 
 still achieves what you want, maybe try 
 [`stripRight()`](https://dlang.org/phobos/std_string.html#.stripRight). The
 second overload lets you specify the characters to remove.

As I mentioned earlier stripRight() and others don't work. What 
I'm talking about is not the terminating character. Actually, I'm 
the one who added the \0 character, and they are multiple. For 
example:

4B 6F 72 6B 6D 61 20 73 F6 6E 6D 65 7A 20 62 75 20 15F 61 66 61 
6B 6C 61 72 64 61 20 79 FC 7A 65 6E 20 61 6C 20 73 61 6E 63 61 
6B 0 0

53 F6 6E 6D 65 64 65 6E 20 79 75 72 64 75 6D 75 6E 20 FC 73 74 FC 
6E 64 65 20 74 FC 74 65 6E 20 65 6E 20 73 6F 6E 20 6F 63 61 6B 0
4F 20 62 65 6E 69 6D 20 6D 69 6C 6C 65 74 69 6D 69 6E 20 79 131 
6C 64 131 7A 131 64 131 72 20 70 61 72 6C 61 79 61 63 61 6B 0 0 0 
0
4F 20 62 65 6E 69 6D 64 69 72 20 6F 20 62 65 6E 69 6D 20 6D 69 6C 
6C 65 74 69 6D 69 6E 64 69 72 20 61 6E 63 61 6B 0 0
C7 61 74 6D 61 20 6B 75 72 62 61 6E 20 6F 6C 61 79 131 6D 20 E7 
65 68 72 65 6E 69 20 65 79 20 6E 61 7A 6C 131 20 68 69 6C 61 6C 0 
0
4B 61 68 72 61 6D 61 6E 20 131 72 6B 131 6D 61 20 62 69 72 20 67 
FC 6C 20 6E 65 20 62 75 20 15F 69 64 64 65 74 20 62 75 20 63 65 
6C E2 6C 0 0 0 0 0 0

Thanks, SDB 79

Sep 22 2022

Paul Backus <snarwin gmail.com> writes:

On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
 Is there a more accurate way to delete the '\0' characters at 
 the end of the string? I tried functions in this module: 
 https://dlang.org/phobos/std_string.html

 ```d
 auto foo(string s)
 {
   string r;
   foreach(c; s)
   {
     if(c > 0)
     {
       r ~= c;
     }
   }
   return r;
 }
 ```

```d
import std.algorithm : filter;
import std.utf : byCodeUnit;
import std.array : array;

string removeZeroes(string s)
{
     return s.byCodeUnit
         .filter!(c => c != '\0')
         .array;
}
```

Sep 22 2022

Ali Çehreli <acehreli yahoo.com> writes:

On 9/22/22 03:53, Salih Dincer wrote:
 Is there a more accurate way to delete the '\0' characters at the end of
 the string? I tried functions in this module:
 https://dlang.org/phobos/std_string.html

Just a reminder: the following modules are always relevant as well because 
strings are arrays, which are ranges:

   std.range
   std.algorithm
   std.array

        r ~= c;

Stefan Koch once said the ~ operator should be called "the slow 
operator". Meaning, if you want to make your code slow, then use that 
operator. :)

The reason is, that operation may need to allocate memory from the heap 
and copy existing elements there. And any memory allocation may trigger 
a garbage collection cycle.

Of course, none of that matters if we are talking about a short string. 
However, it may become a dominating reason why a program may be slow.
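
One common way to cut down on repeated reallocation is `std.array.appender` with a single up-front `reserve`. A minimal sketch (the function name is invented here, and whether it is actually faster depends on the data):

```d
import std.array : appender;

// Hypothetical variant of the original loop that reserves capacity once.
string removeZeroesAppender(string s)
{
    auto buf = appender!string();
    buf.reserve(s.length);      // the result can never be longer than s
    foreach (char c; s)         // iterate code units, no decoding
    {
        if (c != '\0')
            buf.put(c);
    }
    return buf.data;
}

unittest
{
    assert(removeZeroesAppender("a\0b\0\0") == "ab");
}
```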

I was going to suggest Paul Backus' solution as well but I may leave the 
array part out in my own code until I really need it:

string noZeroes(string s)
{
     return s.byCodeUnit.filter!(c => c != '\0');
}

Now, a caller may be happy without an array:

     auto a = s.noZeroes.take(10);

And another can easily add a .array when really needed:

     auto b = s.noZeroes.array;

That may be seen as premature optimization but I see it as avoiding a 
premature pessimization because I did not put in any extra work there. 
But again, this all depends on each program.

If we were talking about mutable elements and the order of elements did 
not matter, then the fastest option would be to remove with 
SwapStrategy.unstable:

import std;

void main() {
     auto arr = [ 1, 0, 2, 0, 0, 3, 4, 5 ];
     arr = remove!(i => i == 0, SwapStrategy.unstable)(arr);
     writeln(arr);
}

unstable works by swapping the first 0 that it finds with the last 
non-zero that it finds and continues in that way. No memory is 
allocated. As a result, the order of elements will not be preserved but 
unstable can be very fast compared to .stable (which is the default) 
because .stable must move elements to the left (multiple times in some 
cases) and can be expensive especially for some types.

The result of the program above is the following:

[1, 5, 2, 4, 3]

Zeros are removed but the order is not preserved.

And very important: Don't forget to assign remove's return value back to 
'arr'. ;)

I know this will not work for a string but something to keep in mind...
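
For comparison, a small sketch of the default (stable) strategy on the same data, which keeps the original order:

```d
import std.algorithm : remove;

unittest
{
    auto arr = [ 1, 0, 2, 0, 0, 3, 4, 5 ];
    // SwapStrategy.stable is the default; elements are shifted left.
    arr = arr.remove!(i => i == 0);
    assert(arr == [ 1, 2, 3, 4, 5 ]);
}
```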

Ali

Sep 22 2022

Ali Çehreli <acehreli yahoo.com> writes:

On 9/22/22 08:19, Ali Çehreli wrote:

 string noZeroes(string s)
 {
      return s.byCodeUnit.filter!(c => c != '\0');
 }

That won't compile; the return type must be 'auto'.
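
With that correction and the needed imports, a self-contained sketch of the lazy version looks like this:

```d
import std.algorithm : equal, filter;
import std.array : array;
import std.range : take;
import std.utf : byCodeUnit;

// 'auto' because filter returns a lazy range type, not a string.
auto noZeroes(string s)
{
    return s.byCodeUnit.filter!(c => c != '\0');
}

unittest
{
    // Callers decide whether to stay lazy or to allocate an array.
    assert("a\0b\0c".noZeroes.take(2).equal("ab"));
    assert("a\0b\0c".noZeroes.array == "abc");
}
```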

Ali

Sep 22 2022

Salih Dincer <salihdb hotmail.com> writes:

On Thursday, 22 September 2022 at 15:22:06 UTC, Ali Çehreli wrote:
 On 9/22/22 08:19, Ali Çehreli wrote:
 ```d
 string noZeroes(string s)
 {
      return s.byCodeUnit.filter!(c => c != '\0');
 }
 ```

 That won't compile; the return type must be 'auto'.

 Ali

Thank you for all the valuable information you wrote. I chose to 
split because the '\0' are at the end of the string:

```d
string splitz(string s)
{
   import std.string : indexOf;
   size_t seekPos = s.indexOf('\0');
   return s[0..seekPos];
}
```

SDB 79

Sep 22 2022

Salih Dincer <salihdb hotmail.com> writes:

On Thursday, 22 September 2022 at 20:53:28 UTC, Salih Dincer 
wrote:
 ```d
 string splitz(string s)
 {
   import std.string : indexOf;
   size_t seekPos = s.indexOf('\0');
   return s[0..seekPos];
 }
 ```

I ignored the possibility of not finding '\0'. I'm fixing it now:

```d
string splitz(string s)
{
   import std.string : indexOf;
   auto seekPos = s.indexOf('\0');
   return seekPos >= 0 ? s[0..seekPos] : s; // -1 means "not found"
}
```

But I also wish it could be like this:

```d
string splitz(string s)
{
   import std.string : indexOf;
   if(auto seekPos = s.indexOf('\0') > 0)
   {
     return s[0..seekPos];
   }
   return s;
}
```
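
A form close to that wish can be written today, since a variable declared in an `if` condition is tested for being non-zero. The sketch below assumes the usual `indexOf` convention of returning -1 on failure and shifts the result by one:

```d
import std.string : indexOf;

string splitz(string s)
{
    // indexOf gives -1 when nothing is found, so adding 1 yields 0 (false).
    if (auto next = s.indexOf('\0') + 1)
    {
        return s[0 .. next - 1];
    }
    return s;
}

unittest
{
    assert(splitz("abc\0\0") == "abc");
    assert(splitz("abc") == "abc");
}
```

Note that `auto seekPos = s.indexOf('\0') > 0` declares a `bool`, because the comparison is evaluated before the initialization.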

SDB 79

Sep 22 2022

Ali Çehreli <acehreli yahoo.com> writes:

On 9/22/22 14:31, Salih Dincer wrote:

 string splitz(string s)
 {
    import std.string : indexOf;
    auto seekPos = s.indexOf('\0');
    return seekPos >= 0 ? s[0..seekPos] : s; // -1 means "not found"
 }

If you have multiple '\0' chars that you will continue looking for, how 
about the following?

import std;

auto splitz(string s) {
     return s.splitter('\0');
}

unittest {
     auto data = [ "hello", "and", "goodbye", "world" ];
     auto hasZeros = data.joiner("\0").text;
     assert(hasZeros.count('\0') == 3);
     assert(hasZeros.splitz.equal(data));
}

void main() {
}

Ali

Sep 22 2022

Salih Dincer <salihdb hotmail.com> writes:

On Thursday, 22 September 2022 at 21:49:36 UTC, Ali Çehreli wrote:
 On 9/22/22 14:31, Salih Dincer wrote:

 If you have multiple '\0' chars that you will continue looking 
 for, how about the following?

It may be preferable when working with ranges. But it isn't 
useful when there is more than one '\0' in a row, and it moves away 
from strings. For example:

```d
     auto data = [ "hello", "and", "goodbye", "world" ];
     // joiner puts the separator only between elements:
     // "hello\0\0and\0\0goodbye\0\0world"
     auto hasZeros = data.joiner("\0\0").text;

     assert(hasZeros.count('\0') == 6); // 3 separators of 2 chars each
     assert(hasZeros.splitz.walkLength == data.length * 2 - 1);

     // ("hello", "", "and", "", "goodbye", "", "world")
     auto range = hasZeros.splitz;
```
SDB 79

Sep 23 2022

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Friday, 23 September 2022 at 08:50:42 UTC, Salih Dincer wrote:
 On Thursday, 22 September 2022 at 21:49:36 UTC, Ali Çehreli 
 wrote:
 On 9/22/22 14:31, Salih Dincer wrote:

 If you have multiple '\0' chars that you will continue looking 
 for, how about the following?

 It may be preferable when working with ranges. But it isn't 
 useful when there is more than one '\0' in a row, and it moves away 
 from strings. For example:

 ```d
     auto data = [ "hello", "and", "goodbye", "world" ];
     // joiner puts the separator only between elements:
     // "hello\0\0and\0\0goodbye\0\0world"
     auto hasZeros = data.joiner("\0\0").text;

     assert(hasZeros.count('\0') == 6); // 3 separators of 2 chars each
     assert(hasZeros.splitz.walkLength == data.length * 2 - 1);

     // ("hello", "", "and", "", "goodbye", "", "world")
     auto range = hasZeros.splitz;
 ```
 SDB 79


You should be explicit with requirements. It was hard to tell if 
your original code was correct.

```d
auto splitz(string s) {
     return s.splitter('\0')
    .filter!(x => !x.empty);
}
```
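
As a self-contained sketch with the imports spelled out, this is how that version behaves on doubled and trailing zeros:

```d
import std.algorithm : equal, filter, splitter;
import std.range : empty;

auto splitz(string s)
{
    // Drop the empty pieces that appear between adjacent '\0' characters.
    return s.splitter('\0').filter!(x => !x.empty);
}

unittest
{
    auto joined = "hello\0\0and\0\0goodbye\0\0world";
    assert(joined.splitz.equal([ "hello", "and", "goodbye", "world" ]));
    assert("the one\0\0".splitz.equal([ "the one" ]));
}
```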

Sep 23 2022

Salih Dincer <salihdb hotmail.com> writes:

On Friday, 23 September 2022 at 14:38:35 UTC, Jesse Phillips 
wrote:
 
 You should be explicit with requirements.

Sorry, my native language is Turkish, so English is a foreign 
language for me, but I think what I wrote is clear. What do you 
think when you look at the text I've highlighted below?

On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
 Is there a more accurate way to delete **the '\0' characters at 
 the end of the string?**

* character**S**
* at the **END**
* of the **STRING**

 ```d
 auto splitz(string s) {
     return s.splitter('\0')
    .filter!(x => !x.empty);
 }
 ```

By the way, if we're going to filter, why are we splitting? 
Anyway! For this implementation, indexOf() is a powerful enough 
tool. In fact, it's pretty fast, since there are at most 8 '\0' 
characters and they are all at the end of the 
string! For example:

```d
void main()
{
   string[] samples = ["the one\0", "the two\0\0", "the three\0\0\0",
                       "the four\0\0\0\0", "the five\0\0\0\0\0",
                       "the six\0\0\0\0\0\0", "the seven\0\0\0\0\0\0\0",
                       "the eight\0\0\0\0\0\0\0\0"];

   import std.stdio : writefln;
   foreach(s; samples)
   {
     auto start = s.length - 8;
     string res = s.splitZeros!false(start);
     writefln("%(%02X%)", cast(ubyte[])res);
   }
}

string splitZeros(bool keepSep)(string s, size_t start = 0)
{
   auto keep = keepSep ? 0 : 1;

   import std.string : indexOf;
   if(auto seekPos = s.indexOf('\0', start) + 1)
   {
     return s[0..seekPos - keep];
   }
   return s;
}
```
SDB 79

Sep 23 2022

Ali Çehreli <acehreli yahoo.com> writes:

On 9/23/22 11:37, Salih Dincer wrote:

 * character**S**
 * at the **END**
 * of the **STRING**

I think the misunderstanding is due to the following data you've posted 
earlier (I am abbreviating):

53 F6 6E 6D 65 64 65 6E 20 79 75 72 64 75 6D 75 6E 20 FC 73 74 FC 6E 64 
65 20 74 FC 74 65 6E 20 65 6E 20 73 6F 6E 20 6F 63 61 6B 0
4F 20 62 65 6E 69 6D 20 6D 69 6C 6C 65 74 69 6D 69 6E 20 79 131 6C 64 
131 7A 131 64 131 72 20 70 61 72 6C 61 79 61 63 61 6B 0 0 0 0

You must have meant there were multiple strings there (apparently on 
separate lines) but I assumed you were showing a single string with 0 
bytes inside the string. (Word wrap must have contributed to the 
misunderstanding.)

Ali

P.S. With that understanding, now I think searching from the end for the 
first non-zero byte may be faster than searching from the beginning for 
the first zero; but again, it depends on the data.
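
A minimal sketch of that backward search, written as a plain loop (the function name is invented for illustration; no timing claims are made here):

```d
// Hypothetical helper: walk back from the end to the last non-zero byte.
string trimTrailingNuls(string s)
{
    size_t end = s.length;
    while (end > 0 && s[end - 1] == '\0')
        --end;
    return s[0 .. end];     // a slice, no copy
}

unittest
{
    assert(trimTrailingNuls("the four\0\0\0\0") == "the four");
    assert(trimTrailingNuls("a\0b\0") == "a\0b");   // embedded zero is kept
}
```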

Sep 23 2022

Paul Backus <snarwin gmail.com> writes:

On Friday, 23 September 2022 at 18:37:59 UTC, Salih Dincer wrote:
 On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
 wrote:
 Is there a more accurate way to delete **the '\0' characters 
 at the end of the string?**

 * character**S**
 * at the **END**
 * of the **STRING**

Apologies for the confusion. You can use [`stripRight`][1] for 
this:

```d
import std.string: stripRight;
import std.stdio: writeln;

void main()
{
     string[] samples = [
         "the one\0", "the two\0\0", "the three\0\0\0", "the four\0\0\0\0",
         "the five\0\0\0\0\0", "the six\0\0\0\0\0\0",
         "the seven\0\0\0\0\0\0\0", "the eight\0\0\0\0\0\0\0\0"
     ];

     foreach (s; samples) {
         writeln(s.stripRight("\0"));
     }
}
```

[1]: https://phobos.dpldocs.info/std.string.stripRight.2.html

Sep 23 2022

Salih Dincer <salihdb hotmail.com> writes:

On Friday, 23 September 2022 at 22:17:51 UTC, Paul Backus wrote:
 Apologies for the confusion. You can use 
 [stripRight](https://phobos.dpldocs.info/std.string.stripRight.2.html)

We have a saying: Estaghfirullah!

Thank you all so much because it has been very useful for me.

I learned two things:

* First, we can use strip() functions with parameters:
https://dlang.org/phobos/std_algorithm_mutation.html#.strip

(examples are very nice)

* Second, we can walk through the string in reverse, using 
indexOf():
https://github.com/dlang/phobos/blob/master/std/string.d#L3418

**Source Code:**
```d
//import std.string : stripRight;/*
string stripRight(string str, const(char)[] chars)
{
   import std.range.primitives : back, empty, popBack; // range API for strings
   import std.string : indexOf;
   for (; !str.empty; str.popBack())
   {
     if (chars.indexOf(str.back) == -1)
       break;
   }
   return str;
}//*/
```

Delicious...

SDB 79

Sep 23 2022

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
 Is there a more accurate way to delete the '\0' characters at 
 the end of the string?

Accurate? No. Your code works. Correct is correct, no matter 
efficiency or style.

 I tried functions in this module: 
 https://dlang.org/phobos/std_string.html

 [code]

You won’t do it any shorter than this if returning a range of 
`dchar` is fine:
```d
auto removez(const(char)[] string, char ch = '\0')
{
     import std.algorithm.iteration;
     return string.splitter(ch).joiner;
}
```
If `dchar` is a problem and a range is not what you want,
```d
inout(char)[] removez(inout(char)[] chars) @safe pure nothrow
{
     import std.array, std.algorithm.iteration;
     auto data = cast(const(ubyte)[])chars;
     auto result = data.splitter(0).joiner.array;
     return (() inout @trusted => cast(inout(char)[])result)();
}
```
Bonus: Works with any kind of array of qualified char. As 
`string` is simply `immutable(char)[]`, `removez` returns a 
`string` given a `string`, but returns a `char[]` given a 
`char[]`, etc.

Warning: I do not know if the `@trusted` expression is really 
okay. The cast is not `@safe` because of type qualifiers: If 
`inout` becomes nothing (i.e. mutable), the cast removes `const`. 
I suspect that it is still okay because the result of `array` is 
unique. Maybe others know better?

Sep 23 2022
