digitalmars.D.learn - How to loop through characters of a string in D language?

BoQsc (14/14) Dec 08 2021 Let's say I want to skip characters and build a new string.

Biotronic (10/24) Dec 08 2021 import std.stdio : writeln;

BoQsc (27/52) Dec 08 2021 I somehow have universal cross language hate for this kind of

kdevel (14/38) Dec 09 2021 It depends on what you expect when you read source code. I don't

forkit (15/23) Dec 09 2021 more PROs:

Adam D Ruppe (7/8) Dec 08 2021 foreach(ch; a) {

BoQsc (21/29) Dec 08 2021 Thanks Adam.

bauss (2/16) Dec 08 2021 string b = a.replace(";", "");

BoQsc (13/35) Dec 08 2021 Thanks, that's what I used to do few years ago.

forkit (8/44) Dec 08 2021 It's also worth noting the differences in compiler output, as

Stanislav Blinov (4/11) Dec 08 2021 You're passing a literal. Try passing a runtime value (e.g. a

forkit (6/9) Dec 08 2021 but this will change nothing.

forkit (3/13) Dec 08 2021 well... maybe not that apparent afterall ;-)

kdevel (3/4) Dec 09 2021 👍

Salih Dincer (18/32) Dec 08 2021 I always use split() and joiner pair. You can customize it as you

Rumbu (6/12) Dec 09 2021 Since it seems there is a contest here:

IGotD- (2/7) Dec 10 2021 Would that become two for loops or not?

Rumbu (20/30) Dec 10 2021 I thought it's a beauty contest.

forkit (4/5) Dec 10 2021 Well, if it's a beauty contest, then i got a beauty..

Luís Ferreira (8/18) Dec 10 2021 Yes it will. You can use lazy templates instead, like splitter and joiner

Arjan (4/17) Dec 10 2021 ```d

forkit (4/5) Dec 10 2021 I don't think we have enough ways of doing the same thing yet...

Ola Fosheim Grøstad (14/21) Dec 11 2021 Using libraries can trigger hidden allocations.

forkit (17/18) Dec 11 2021 ok. fine. no unnecessary, hidden allocations then.

Ola Fosheim Grøstad (18/37) Dec 11 2021 ```putchar(…)``` is too slow!

forkit (5/6) Dec 12 2021 On planet Mars maybe, but here on earth, my computer can do about

bauss (2/11) Dec 12 2021 Can I borrow a couple of your ticks?

Ola Fosheim Grøstad (4/5) Dec 11 2021 Shouldn't be there. Residual leftovers… (I don't want to confuse

Stanislav Blinov (4/7) Dec 11 2021 A function with that name, and calling alloca to boot, cannot be

Ola Fosheim Grøstad (4/11) Dec 11 2021 :-)

russhy (181/181) Dec 11 2021 Here is mine

Rumbu (5/10) Dec 11 2021 You know that this is already in phobos?

russhy (4/19) Dec 11 2021 you need to import a 8k lines of code module that itself imports

Ola Fosheim Grøstad (33/35) Dec 12 2021 I agree.

Ola Fosheim Grøstad (12/17) Dec 12 2021 Bug, it fails if the string ends or starts with ';'.

Ola Fosheim Grøstad (31/31) Dec 12 2021 Of course, since it is easy to mess up and use ranges in the

Matheus (20/22) Dec 10 2021 My C way of thinking while using D:

Stanislav Blinov (14/26) Dec 10 2021 Oooh, finally someone suggested to preallocate storage for all

Rumbu (3/4) Dec 10 2021 http://lemire.me/blog/2017/01/20/how-quickly-can-you-remove-spaces-from-...

Ola Fosheim Grøstad (18/20) Dec 10 2021 ```

Stanislav Blinov (15/38) Dec 11 2021 That is about 500% not what I meant. At all. Original code in

Ola Fosheim Grøstad (8/13) Dec 11 2021 You worry too much, just have fun with differing ways of

Ola Fosheim Grøstad (19/23) Dec 11 2021 Scanning short strings twice is not all that expensive as they

Ola Fosheim Grøstad (28/32) Dec 13 2021 Like this?

Salih Dincer (45/71) Dec 22 2021 It seems faster than algorithms in Phobos. We would love to see

Stanislav Blinov (10/21) Dec 23 2021 You're comparing apples and oranges. When benchmarking, at least

Salih Dincer (6/9) Dec 23 2021 I looked now and you're right. Insomuch that it should be

rumbu (23/31) Dec 23 2021 It seems because MallocReplace is cheating a lot:

BoQsc <vaidas.boqsc gmail.com> writes:

Let's say I want to skip characters and build a new string.

The string example to loop/iterate:

```
import std.stdio;

void main()
{
     string a="abc;def;ab";

}
```

The character I want to skip: `;`

Expected result:
```
abcdefab
```

Dec 08 2021

Biotronic <simen.kjaras gmail.com> writes:

On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 Let's say I want to skip characters and build a new string.

 The string example to loop/iterate:

 ```
 import std.stdio;

 void main()
 {
     string a="abc;def;ab";

 }
 ```

 The character I want to skip: `;`

 Expected result:
 ```
 abcdefab
 ```

     import std.stdio : writeln;
     import std.algorithm.iteration : filter;
     import std.conv : to;

     void main()
     {
         string a = "abc;def;ab";
         string b = a.filter!(c => c != ';').to!string;
         writeln(b);
     }
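
Note that `filter` builds nothing by itself: it returns a lazy view, and the allocation happens in `to!string`. Iterating a `string` as a range also auto-decodes UTF-8, so the lambda sees `dchar` code points rather than `char` bytes. A minimal sketch showing both points; the helper name `skipChar` is hypothetical:

```d
import std.algorithm.iteration : filter;
import std.conv : to;
import std.range.primitives : ElementType;
import std.stdio : writeln;

// Hypothetical helper wrapping the filter + to!string approach.
string skipChar(string s, dchar skip)
{
    // Lazy: this only builds a view over `s`; nothing is copied yet.
    auto view = s.filter!(c => c != skip);

    // Ranges over strings auto-decode, so each element is a dchar.
    static assert(is(ElementType!(typeof(view)) == dchar));

    // to!string walks the view and allocates the resulting string.
    return view.to!string;
}

void main()
{
    writeln(skipChar("abc;def;ab", ';'));
}
```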

Dec 08 2021

BoQsc <vaidas.boqsc gmail.com> writes:

On Wednesday, 8 December 2021 at 11:35:39 UTC, Biotronic wrote:
 On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 Let's say I want to skip characters and build a new string.

 The string example to loop/iterate:

 ```
 import std.stdio;

 void main()
 {
     string a="abc;def;ab";

 }
 ```

 The character I want to skip: `;`

 Expected result:
 ```
 abcdefab
 ```

 [..]
         string b = a.filter!(c => c != ';').to!string;
         writeln(b);
     }

I somehow have universal cross language hate for this kind of 
algorithm.
I'm not getting used to the syntax and that leads to poor 
readability.
But that might be just me.

Anyways,
Here is what I've come up with.

```
import std.stdio;

void main()
{
     string a = "abc;def;ab";
	string b;
	
	for(int i=0; i<a.length; i++){
		write(i);
		writeln(a[i]);
		if (a[i] != ';'){
			b ~= a[i];
		}
		
	}
	
     writeln(b);
}
```

Dec 08 2021

kdevel <kdevel vogtner.de> writes:

On Wednesday, 8 December 2021 at 13:01:32 UTC, BoQsc wrote:
[...]
 I'm not getting used to the syntax and that leads to poor 
 readability.

It depends on what you expect when you read source code. I don't 
want to read how seats in the memory are assigned to bits and 
bytes. Instead I want to read what is done.

 But that might be just me.

Unfortunately not.

 Anyways,
 Here is what I've come up with.

 ```
 import std.stdio;

 void main()
 {
     string a = "abc;def;ab";
 	string b;
 	
 	for(int i=0; i<a.length; i++){
 		write(i);
 		writeln(a[i]);
 		if (a[i] != ';'){
 			b ~= a[i];
 		}
 		
 	}
 	
     writeln(b);
 }
 ```

PRO:

- saves two lines of boilerplate code

CONS:

- raw loop
- postinc ++ is only permitted in ++C
- inconsistent spacing around "="
- mixing tabs and spaces for indentation
- arrow code

Dec 09 2021

forkit <forkit gmail.com> writes:

On Thursday, 9 December 2021 at 18:00:42 UTC, kdevel wrote:
 PRO:

 - saves two lines of boilerplate code

 CONS:

 - raw loop
 - postinc ++ is only permitted in ++C
 - inconsistent spacing around "="
 - mixing tabs and spaces for indentation
 - arrow code

more PROs:

  - You become less dependent on someone else's library.
  - You learn how to do some things yourself.

;-)

of course, I would prefer a less verbose, and safer version, 
which D enables, such as:

foreach(val; a)
     {
         writeln(val);
         if (val != ';')
         {
             b ~= val;
         }
     }

Dec 09 2021

Adam D Ruppe <destructionator gmail.com> writes:

On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 The string example to loop/iterate:

foreach(ch; a) {

}

does the individual chars of the string. You can also

foreach(dchar ch; a) {

}

to decode the utf 8
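
The difference only shows with non-ASCII text: without a type (or with `char`) the loop visits UTF-8 code units, while `dchar` decodes whole code points. Skipping an ASCII character such as `;` byte by byte is still safe, because every byte of a multi-byte UTF-8 sequence is 0x80 or above. A small sketch; the helper names are hypothetical:

```d
import std.stdio : writeln;

// Loop iterations over raw UTF-8 code units (bytes).
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Loop iterations when foreach decodes each code point to dchar.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

// Byte-wise skipping of an ASCII character never splits a code point.
string skipAscii(string s, char skip)
{
    string result;
    foreach (char c; s)
        if (c != skip)
            result ~= c;
    return result;
}

void main()
{
    string s = "na\u00efve;text"; // U+00EF takes two bytes in UTF-8
    writeln(countCodeUnits(s), " ", countCodePoints(s));
    writeln(skipAscii(s, ';'));
}
```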

Dec 08 2021

BoQsc <vaidas.boqsc gmail.com> writes:

On Wednesday, 8 December 2021 at 12:49:39 UTC, Adam D Ruppe wrote:
 On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 The string example to loop/iterate:

 foreach(ch; a) {

 }

 does the individual chars of the string. You can also

 foreach(dchar ch; a) {

 }

 to decode the utf 8

Thanks Adam.

This is how it would look implemented.

```
import std.stdio;

void main()
{
     string a = "abc;def;ab";
	string b;
	
	foreach(ch; a) {
		if (ch != ';'){
			b ~= ch;
		}
	
		writeln(ch);
	}
	
     writeln(b);
}
```
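
`b ~= ch` works, but `b` may be reallocated several times as it grows. Since the result can never be longer than the input, one option is `std.array.Appender` with the capacity reserved up front. A sketch with a hypothetical helper name, `withoutChar`:

```d
import std.array : appender;
import std.stdio : writeln;

// Hypothetical helper: copies `s` without any `skip` characters.
string withoutChar(string s, char skip)
{
    auto app = appender!string();
    app.reserve(s.length);   // result is at most as long as the input
    foreach (char c; s)      // char: iterate bytes, no UTF decoding
        if (c != skip)
            app.put(c);
    return app.data;
}

void main()
{
    writeln(withoutChar("abc;def;ab", ';'));
}
```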

Dec 08 2021

bauss <jj_1337 live.dk> writes:

On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 Let's say I want to skip characters and build a new string.

 The string example to loop/iterate:

 ```
 import std.stdio;

 void main()
 {
     string a="abc;def;ab";

 }
 ```

 The character I want to skip: `;`

 Expected result:
 ```
 abcdefab
 ```

string b = a.replace(";", "");

Dec 08 2021

BoQsc <vaidas.boqsc gmail.com> writes:

On Wednesday, 8 December 2021 at 14:16:16 UTC, bauss wrote:
 On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 Let's say I want to skip characters and build a new string.

 The string example to loop/iterate:

 ```
 import std.stdio;

 void main()
 {
     string a="abc;def;ab";

 }
 ```

 The character I want to skip: `;`

 Expected result:
 ```
 abcdefab
 ```

 string b = a.replace(";", "");

Thanks, that's what I used to do a few years ago.
It's a great solution that I had forgotten about, and it works.

```
import std.stdio;
import std.array;

void main()
{
     string a="abc;def;ab";
	string b = a.replace(";", "");
	writeln(b);
}
```

Dec 08 2021

forkit <forkit gmail.com> writes:

On Wednesday, 8 December 2021 at 14:27:22 UTC, BoQsc wrote:
 On Wednesday, 8 December 2021 at 14:16:16 UTC, bauss wrote:
 On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 Let's say I want to skip characters and build a new string.

 The string example to loop/iterate:

 ```
 import std.stdio;

 void main()
 {
     string a="abc;def;ab";

 }
 ```

 The character I want to skip: `;`

 Expected result:
 ```
 abcdefab
 ```

 string b = a.replace(";", "");

 Thanks, that's what I used to do few years ago.
 It's a great solution I forget about and it works.

 ```
 import std.stdio;
 import std.array;

 void main()
 {
     string a="abc;def;ab";
 	string b = a.replace(";", "");
 	writeln(b);
 }
 ```

It's also worth noting the differences in compiler output, as 
well as the time taken to compile, these two approaches:

(1)
string str = "abc;def;ab".filter!(c => c != ';').to!string;

(2)
string str = "abc;def;ab".replace(";", "");

see: https://d.godbolt.org/z/3dWYsEGsr

Dec 08 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Wednesday, 8 December 2021 at 22:18:23 UTC, forkit wrote:

 It's also worth noting the differences in compiler output, as 
 well as the time taken to compile, these two approaches:

 (1)
 string str = "abc;def;ab".filter!(c => c != ';').to!string;

 (2)
 string str = "abc;def;ab".replace(";", "");

 see: https://d.godbolt.org/z/3dWYsEGsr

You're passing a literal. Try passing a runtime value (e.g. a 
command line argument). Also, -O2 -release :) Unless, of course, 
your goal is to look at debug code.

Dec 08 2021

forkit <forkit gmail.com> writes:

On Wednesday, 8 December 2021 at 22:35:35 UTC, Stanislav Blinov 
wrote:
 You're passing a literal. Try passing a runtime value (e.g. a 
 command line argument). Also, -O2 -release :) Unless, of course, 
 your goal is to look at debug code.

but this will change nothing.

the compilation cost of using .replace, will always be apparent 
(compared to the presented alternative), both in less time taken 
to compile, and smaller size of executable.

Dec 08 2021

forkit <forkit gmail.com> writes:

On Wednesday, 8 December 2021 at 22:55:02 UTC, forkit wrote:
 On Wednesday, 8 December 2021 at 22:35:35 UTC, Stanislav Blinov 
 wrote:
 You're passing a literal. Try passing a runtime value (e.g. a 
 command line argument). Also, -O2 -release :) Unless, of 
 course, your goal is to look at debug code.

 but this will change nothing.

 the compilation cost of using .replace, will always be apparent 
 (compared to the presented alternative), both in less time 
 taken to compile, and smaller size of executable.

well... maybe not that apparent afterall ;-)

.. the mysteries of compiler optimisation ....

Dec 08 2021

kdevel <kdevel vogtner.de> writes:

On Wednesday, 8 December 2021 at 14:16:16 UTC, bauss wrote:
[...]
 string b = a.replace(";", "");

👍

Dec 09 2021

Salih Dincer <salihdb hotmail.com> writes:

On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 Let's say I want to skip characters and build a new string.

 The string example to loop/iterate:

 ```
 import std.stdio;

 void main()
 {
     string a="abc;def;ab";

 }
 ```

 The character I want to skip: `;`

 Expected result:
 ```
 abcdefab
 ```

I always use split() and joiner pair. You can customize it as you 
want:
```d
import std.stdio : writeln;
import std.algorithm : joiner;
import std.array : split;

bool isWhite(dchar c) @safe pure nothrow @nogc
{
   return c == ' ' || c == ';' ||
         (c >= 0x09 && c <= 0x0D);
}

void main()
{
     string str = "a\nb   c\t;d e f;a  b ";
     str.split!isWhite.joiner.writeln(); //abcdefab
}
```

Dec 08 2021

Rumbu <rumbu rumbu.ro> writes:

On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 Let's say I want to skip characters and build a new string.
 The character I want to skip: `;`

 Expected result:
 ```
 abcdefab
 ```

Since it seems there is a contest here:

```d
"abc;def;ghi".split(';').join();
```

:)

Dec 09 2021

IGotD- <nise nise.com> writes:

On Friday, 10 December 2021 at 06:24:27 UTC, Rumbu wrote:

 Since it seems there is a contest here:

 ```d
 "abc;def;ghi".split(';').join();
 ```

 :)

Would that become two for loops or not?

Dec 10 2021

Rumbu <rumbu rumbu.ro> writes:

On Friday, 10 December 2021 at 11:06:21 UTC, IGotD- wrote:
 On Friday, 10 December 2021 at 06:24:27 UTC, Rumbu wrote:

 Since it seems there is a contest here:

 ```d
 "abc;def;ghi".split(';').join();
 ```

 :)

 Would that become two for loops or not?

I thought it's a beauty contest.

```d
string stripsemicolons(string s)
{
   string result;
   // reserve capacity up front to prevent reallocations
   result.reserve(s.length);

   //append to string only when needed
   size_t i = 0;
   while (i < s.length)
   {
     size_t j = i;
     while (i < s.length && s[i] != ';')
       ++i;
     result ~= s[j..i];
     ++i; // step over the ';' (or past the end)
   }
   return result;
}
```

Dec 10 2021

forkit <forkit gmail.com> writes:

On Friday, 10 December 2021 at 12:15:18 UTC, Rumbu wrote:
 I thought it's a beauty contest.

Well, if it's a beauty contest, then i got a beauty..

char[("abc;def;ab".length - count("abc;def;ab", ";"))] b = 
"abc;def;ab".replace(";", "");

Dec 10 2021

Luís Ferreira <lsferreira riseup.net> writes:

Yes it will. You can use lazy templates instead, like splitter and joiner, which split and join lazily, respectively. LDC can optimize those templates fairly well, avoiding too many lazy calls, and pretty much constructs the logical equivalent of a for loop.

On 10 December 2021 11:06:21 WET, IGotD- via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
On Friday, 10 December 2021 at 06:24:27 UTC, Rumbu wrote:

 Since it seems there is a contest here:

 ```d
 "abc;def;ghi".split(';').join();
 ```

 :)

Would that become two for loops or not?

Dec 10 2021

Arjan <arjan ask.me.to> writes:

On Friday, 10 December 2021 at 06:24:27 UTC, Rumbu wrote:
 On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 Let's say I want to skip characters and build a new string.
 The character I want to skip: `;`

 Expected result:
 ```
 abcdefab
 ```

 Since it seems there is a contest here:

 ```d
 "abc;def;ghi".split(';').join();
 ```

 :)

```d
"abc;def;ghi".tr(";", "", "d" );
```
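
`std.string.tr` behaves like the Unix `tr` utility: with the `"d"` modifier, characters listed in `from` that have no counterpart in `to` are deleted, so several different characters can be removed in one pass. A sketch:

```d
import std.stdio : writeln;
import std.string : tr;

void main()
{
    // "d": delete the characters in `from` that have no replacement in `to`
    writeln("abc;def;ghi".tr(";", "", "d"));
    // several characters at once: ';', ',' and ' '
    writeln("a;b,c d".tr(";, ", "", "d"));
}
```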

Dec 10 2021

forkit <forkit gmail.com> writes:

On Friday, 10 December 2021 at 22:35:58 UTC, Arjan wrote:
 "abc;def;ghi".tr(";", "", "d" );

I don't think we have enough ways of doing the same thing yet...

so here's one more..

"abc;def;ghi".substitute(";", "");

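Unlike `replace`, `std.algorithm.iteration.substitute` is lazy: it returns a range, and nothing is allocated until the result is consumed, for example by converting it with `to!string`. A sketch; the helper name is hypothetical:

```d
import std.algorithm.iteration : substitute;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: substitute builds a lazy range,
// and to!string is where the new string gets allocated.
string stripSemicolonsLazily(string s)
{
    return s.substitute(";", "").to!string;
}

void main()
{
    writeln(stripSemicolonsLazily("abc;def;ghi"));
}
```
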
Dec 10 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Saturday, 11 December 2021 at 00:39:15 UTC, forkit wrote:
 On Friday, 10 December 2021 at 22:35:58 UTC, Arjan wrote:
 "abc;def;ghi".tr(";", "", "d" );

 I don't think we have enough ways of doing the same thing yet...

 so here's one more..

 "abc;def;ghi".substitute(";", "");

Using libraries can trigger hidden allocations.

```
import std.stdio;

string garbagefountain(string s){
     // <= 1 also ends the recursion for an empty string
     if (s.length <= 1) return s == ";" ? "" : s;
     return garbagefountain(s[0..$/2]) ~ garbagefountain(s[$/2..$]);
}

int main() {
     writeln(garbagefountain("abc;def;ab"));
     return 0;
}

```

Dec 11 2021

forkit <forkit gmail.com> writes:

On Saturday, 11 December 2021 at 08:05:01 UTC, Ola Fosheim 
Grøstad wrote:
 Using libraries can trigger hidden allocations.

ok. fine. no unnecessary, hidden allocations then.

// ------------------

module test;

import core.stdc.stdio : putchar;

nothrow @nogc void main()
{
     string str = "abc;def;ab";

     ulong len = str.length;

     for (ulong i = 0; i < len; i++)
     {
         if (cast(int) str[i] != ';')
             putchar(cast(int) str[i]);
     }
}

// ------------------
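
Another way to stay `@nogc` without one `putchar` call per character is to let the caller provide the storage and return the filled part of it. A minimal sketch, assuming the buffer is at least as long as the input; `stripInto` is a hypothetical name:

```d
import core.stdc.stdio : printf;

// Copies `s` into `buf`, skipping `skip`, and returns the filled slice of `buf`.
// Assumes buf.length >= s.length (checked by the assert).
char[] stripInto(char[] buf, const(char)[] s, char skip) @safe pure nothrow @nogc
{
    assert(buf.length >= s.length);
    size_t n = 0;
    foreach (char c; s)
        if (c != skip)
            buf[n++] = c;
    return buf[0 .. n];
}

void main() @nogc nothrow
{
    char[64] storage;   // stack buffer, no GC allocation
    auto result = stripInto(storage[], "abc;def;ab", ';');
    // one output call instead of one per character
    printf("%.*s\n", cast(int) result.length, result.ptr);
}
```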

Dec 11 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Saturday, 11 December 2021 at 08:46:32 UTC, forkit wrote:
 On Saturday, 11 December 2021 at 08:05:01 UTC, Ola Fosheim 
 Grøstad wrote:
 Using libraries can trigger hidden allocations.

 ok. fine. no unnecessary, hidden allocations then.

 // ------------------

 module test;

 import core.stdc.stdio : putchar;

 nothrow @nogc void main()
 {
     string str = "abc;def;ab";

     ulong len = str.length;

     for (ulong i = 0; i < len; i++)
     {
         if (cast(int) str[i] != ';')
             putchar(cast(int) str[i]);
     }
 }

 // ------------------

```putchar(…)``` is too slow!


```

@safe:

extern (C) long write(long, const void *, long);


void donttrythisathome(string s, char stripchar) @trusted {
	import core.stdc.stdlib;
     char* begin = cast(char*)alloca(s.length);
     char* end = begin;
     foreach(c; s) if (c != stripchar) *(end++) = c;
     write(1, begin, end - begin); // fd 1 is stdout (fd 0 is stdin)
}


@system
void main() {
     string str = "abc;def;ab";
     donttrythisathome(str, ';');
}
```

Dec 11 2021

forkit <forkit gmail.com> writes:

On Saturday, 11 December 2021 at 09:25:37 UTC, Ola Fosheim 
Grøstad wrote:
 ```putchar(…)``` is too slow!

On planet Mars maybe, but here on earth, my computer can do about 
4 billion ticks per second, and my entire program (using putchar) 
takes only 3084 ticks.

Dec 12 2021

bauss <jj_1337 live.dk> writes:

On Monday, 13 December 2021 at 05:46:06 UTC, forkit wrote:
 On Saturday, 11 December 2021 at 09:25:37 UTC, Ola Fosheim 
 Grøstad wrote:
 ```putchar(…)``` is too slow!

 On planet Mars maybe, but here on earth, my computer can do 
 about 4 billion ticks per second, and my entire program (using 
 putchar) takes only 3084 ticks.

Can I borrow a couple of your ticks?

Dec 12 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Saturday, 11 December 2021 at 09:34:17 UTC, Ola Fosheim 
Grøstad wrote:
  @system

Shouldn't be there. Residual leftovers… (I don't want to confuse 
newbies!)

Dec 11 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Saturday, 11 December 2021 at 09:34:17 UTC, Ola Fosheim 
Grøstad wrote:

 void donttrythisathome(string s, char stripchar) @trusted {
 	import core.stdc.stdlib;
     char* begin = cast(char*)alloca(s.length);

A function with that name, and calling alloca to boot, cannot be 
@trusted ;)

Dec 11 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Saturday, 11 December 2021 at 09:40:47 UTC, Stanislav Blinov 
wrote:
 On Saturday, 11 December 2021 at 09:34:17 UTC, Ola Fosheim 
 Grøstad wrote:

 void donttrythisathome(string s, char stripchar) @trusted {
 	import core.stdc.stdlib;
     char* begin = cast(char*)alloca(s.length);

 A function with that name, and calling alloca to boot, cannot 
 be @trusted ;)

:-)

But I am a very trustworthy person! PROMISE!!!

Dec 11 2021

russhy <russhy_s gmail.com> writes:

Here is mine

- 0 allocations

- configurable

- lets you use it however you wish

- fast


```D
import std;
void main()
{
     string a = "abc;def;ab";
     writeln("a => ", a);

     foreach(item; split(a, ';'))
         writeln("\t", item);


     string b = "abc;    def   ;ab";
     writeln("b => ", b);

     foreach(item; split(b, ';', SplitOption.TRIM))
         writeln("\t", item);


     string c = "abc;    ;       ;def   ;ab";
     writeln("c => ", c);

     foreach(item; split(c, ';', SplitOption.TRIM | SplitOption.REMOVE_EMPTY))
         writeln("\t", item);
}

SplitIterator!T split(T)(const(T)[] buffer, const(T) delimiter, SplitOption option = SplitOption.NONE)
{
     return SplitIterator!T(buffer, delimiter, option);
}

struct SplitIterator(T)
{
     const(T)[] buffer;
     const(T) delimiter;
     SplitOption option;
     int index = 0;

	int count()
	{
		int c = 0;
		foreach(line; this)
		{
			c++;
		}
		index = 0;
		return c;
	}

	const(T) get(int index)
	{
		return buffer[index];
	}
	
     int opApply(scope int delegate(const(T)[]) dg)
     {
         auto length = buffer.length;
         for (int i = 0; i < length; i++)
         {
             if (buffer[i] == '\0')
             {
                 length = i;
                 break;
             }
         }

         int result = 0;
         for (int i = index; i < length; i++)
         {
             int entry(int start, int end)
             {
                 // trim only if we got something
                 if ((end - start > 0) && (option & SplitOption.TRIM))
                 {
                     for (int j = start; j < end; j++)
                         if (buffer[j] == ' ')
                             start += 1;
                         else
                             break;
                     for (int k = end; k >= start; k--)
                         if (buffer[k - 1] == ' ')
                             end -= 1;
                         else
                             break;
					
					// nothing left
					if(start >= end) return 0;
                 }

				//printf("%i to %i :: %i :: total: %lu\n", start, end, index, buffer.length);
                 return dg(buffer[start .. end]) != 0;
             }

             auto c = buffer[i];
             if (c == delimiter)
             {
                 if (i == index && (option & SplitOption.REMOVE_EMPTY))
                 {
                     // skip if we keep finding the delimiter
                     index = i + 1;
                     continue;
                 }

                 if ((result = entry(index, i)) != 0)
                     break;

                 // skip delimiter for next result
                 index = i + 1;
             }

             // handle what's left
             if ((i + 1) == length)
             {
                 result = entry(index, i + 1);
             }
         }
         return result;
     }

	// copy from above, only replace if above has changed
     int opApply(scope int delegate(int, const(T)[]) dg)
     {
         auto length = buffer.length;
         for (int i = 0; i < length; i++)
         {
             if (buffer[i] == '\0')
             {
                 length = i;
                 break;
             }
         }

		int n = 0;
         int result = 0;
         for (int i = index; i < length; i++)
         {
             int entry(int start, int end)
             {
                 // trim only if we got something
                 if ((end - start > 0) && (option & SplitOption.TRIM))
                 {
                     for (int j = start; j < end; j++)
                         if (buffer[j] == ' ')
                             start += 1;
                         else
                             break;
                     for (int k = end; k >= start; k--)
                         if (buffer[k - 1] == ' ')
                             end -= 1;
                         else
                             break;
					
					// nothing left
					if(start >= end) return 0;
                 }

				//printf("%i to %i :: %i :: total: %lu\n", start, end, index, buffer.length);
                 return dg(n++, buffer[start .. end]) != 0;
             }

             auto c = buffer[i];
             if (c == delimiter)
             {
                 if (i == index && (option & SplitOption.REMOVE_EMPTY))
                 {
                     // skip if we keep finding the delimiter
                     index = i + 1;
                     continue;
                 }

                 if ((result = entry(index, i)) != 0)
                     break;

                 // skip delimiter for next result
                 index = i + 1;
             }

             // handle what's left
             if ((i + 1) == length)
             {
                 result = entry(index, i + 1);
             }
         }
         return result;
     }
}


enum SplitOption
{
     NONE = 0,
     REMOVE_EMPTY = 1,
     TRIM = 2
}

```
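
For comparison, the `TRIM | REMOVE_EMPTY` behaviour can be approximated with lazy Phobos ranges. Note the differences in this sketch: `strip` removes all leading and trailing whitespace rather than only `' '`, and there is no `'\0'` handling. The name `trimmedPieces` is hypothetical:

```d
import std.algorithm.iteration : filter, map, splitter;
import std.stdio : writeln;
import std.string : strip;

// Lazily splits `s` on `sep`, trims each piece and drops empty pieces.
// Each piece is a slice of `s`; no new strings are built while iterating.
auto trimmedPieces(string s, char sep)
{
    return s.splitter(sep)
            .map!(piece => piece.strip)           // roughly SplitOption.TRIM
            .filter!(piece => piece.length > 0);  // roughly SplitOption.REMOVE_EMPTY
}

void main()
{
    foreach (item; trimmedPieces("abc;    ;       ;def   ;ab", ';'))
        writeln("\t", item);
}
```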

Dec 11 2021

Rumbu <rumbu rumbu.ro> writes:

On Saturday, 11 December 2021 at 14:42:53 UTC, russhy wrote:
 Here is mine

 - 0 allocations

 - configurable

 - lets you use it however you wish

 - fast


You know that this is already in Phobos?


```
"abc;def;ghi".splitter(';').joiner
```
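
`splitter` yields slices of the input between separators and `joiner` chains them back together, both lazily, so no intermediate array of pieces is built; converting with `to!string` should be the only allocation. A sketch with a hypothetical helper name:

```d
import std.algorithm.iteration : joiner, splitter;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: lazy split + join, materialized by to!string.
string stripBySplitting(string s, char sep)
{
    return s.splitter(sep).joiner.to!string;
}

void main()
{
    // printed straight from the lazy range, no string is built
    writeln("abc;def;ghi".splitter(';').joiner);
    writeln(stripBySplitting("abc;def;ghi", ';'));
}
```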

Dec 11 2021

russhy <russhy_s gmail.com> writes:

On Saturday, 11 December 2021 at 18:51:12 UTC, Rumbu wrote:
 On Saturday, 11 December 2021 at 14:42:53 UTC, russhy wrote:
 Here is mine

 - 0 allocations

 - configurable

 - lets you use it however you wish

 - fast


 You know that this is already in Phobos?


 ```
 "abc;def;ghi".splitter(';').joiner
 ```

you need to import an 8k-line module that itself imports 
other modules, and then the code is hard to read

https://github.com/dlang/phobos/blob/v2.098.0/std/algorithm/iteration.d#L2917

Dec 11 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Saturday, 11 December 2021 at 19:50:55 UTC, russhy wrote:
 you need to import an 8k-line module that itself 
 imports other modules, and then the code is hard to read

I agree.

```
@safe:

auto deatheater(char stripchar)(string str) {
    struct voldemort {
        immutable(char)* begin, end;
        bool empty(){ return begin == end; }
        char front(){ return *begin; }
        char back() @trusted { return *(end-1); }
        void popFront() @trusted {
            // check begin == end before dereferencing, so we never read past the data
            while(begin != end){begin++; if (begin == end || *begin != stripchar) break; }
        }
        void popBack() @trusted {
            // check begin == end before reading *(end-1), so we never read before the data
            while(begin != end){end--; if (begin == end || *(end-1) != stripchar) break; }
        }
        this(string s) @trusted {
            begin = s.ptr;
            end = s.ptr + s.length;
        }
    }
    return voldemort(str);
}


void main() {
     import std.stdio;
     string str = "abc;def;ab";
     foreach(c; deatheater!';'(str)) write(c);
     writeln();
     foreach_reverse(c; deatheater!';'(str)) write(c);
}

```

Dec 12 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 12 December 2021 at 08:58:29 UTC, Ola Fosheim Grøstad 
wrote:
         this(string s) @trusted {
             begin = s.ptr;
             end = s.ptr + s.length;
         }
 	}

Bug, it fails if the string ends or starts with ';'.

Fix:

```
         this(string s) @trusted {
             begin = s.ptr;
             end = s.ptr + s.length;
             while(begin!=end && *begin==stripchar) begin++;
             while(begin!=end && *(end-1)==stripchar) end--;
         }
```

Dec 12 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

Of course, since it is easy to mess up and use ranges in the 
wrong way, you might want to add ```assert```s. That is most 
likely *helpful* to newbies that might want to use your kickass 
library function:

```
auto helpfuldeatheater(char stripchar)(string str) {
    struct voldemort {
        immutable(char)* begin, end;
        bool empty(){ return begin == end; }
        char front(){ assert(!empty); return *begin; }
        char back() @trusted { assert(!empty); return *(end-1); }
        void popFront() @trusted {
            assert(!empty);
            // check begin == end before dereferencing, so we never read past the data
            while(begin != end){begin++; if (begin == end || *begin != stripchar) break; }
        }
        void popBack() @trusted {
            assert(!empty);
            // check begin == end before reading *(end-1), so we never read before the data
            while(begin != end){end--; if (begin == end || *(end-1) != stripchar) break; }
        }
        this(string s) @trusted {
            begin = s.ptr;
            end = s.ptr + s.length;
            while(begin!=end && *begin==stripchar) begin++;
            while(begin!=end && *(end-1)==stripchar) end--;
        }
    }
     return voldemort(str);
}
```
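
The same bidirectional range can be written over a slice instead of raw pointers, which lets the compiler check it as `@safe` with no `@trusted` parts: bounds checks take the place of the manual pointer reasoning. A sketch under that design choice; `SkipChar` is a hypothetical name, and it skips a single byte value, which is intended for ASCII separators:

```d
import std.range.primitives : isBidirectionalRange;
import std.stdio : write, writeln;

// Bidirectional range over `s` that skips every `skip` byte.
// Invariant: `s` never starts or ends with `skip`, so front/back are valid.
struct SkipChar
{
    private string s;
    private char skip;

    this(string s, char skip) @safe pure nothrow @nogc
    {
        this.s = s;
        this.skip = skip;
        trimFront();
        trimBack();
    }

    bool empty() const @safe pure nothrow @nogc { return s.length == 0; }
    char front() const @safe pure nothrow @nogc { return s[0]; }
    char back() const @safe pure nothrow @nogc { return s[$ - 1]; }
    void popFront() @safe pure nothrow @nogc { s = s[1 .. $]; trimFront(); }
    void popBack() @safe pure nothrow @nogc { s = s[0 .. $ - 1]; trimBack(); }
    SkipChar save() @safe pure nothrow @nogc { return this; }

    private void trimFront() @safe pure nothrow @nogc
    {
        while (s.length && s[0] == skip) s = s[1 .. $];
    }

    private void trimBack() @safe pure nothrow @nogc
    {
        while (s.length && s[$ - 1] == skip) s = s[0 .. $ - 1];
    }
}

static assert(isBidirectionalRange!SkipChar);

void main()
{
    foreach (c; SkipChar("abc;def;ab", ';')) write(c);
    writeln();
    foreach_reverse (c; SkipChar("abc;def;ab", ';')) write(c);
    writeln();
}
```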

Dec 12 2021

Matheus <matheus gmail.com> writes:

On Wednesday, 8 December 2021 at 11:23:45 UTC, BoQsc wrote:
 ...
 The character I want to skip: `;`

My C way of thinking while using D:

import std;

string stripsemicolons(string input){
     char[] s = input.dup;
     int j=0;
     for(int i=0;i<input.length;++i){
         if(s[i] == ';'){ continue; }
         s[j++] = s[i];
     }
     s.length = j;
     return s.idup;
}

void main(){
     string s = ";testing;this;thing!;";
     writeln(s);
     writeln(s.stripsemicolons);
     return;
}

Matheus.

Dec 10 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Friday, 10 December 2021 at 13:22:58 UTC, Matheus wrote:

 My C way of thinking while using D:

 import std;

 string stripsemicolons(string input){
     char[] s = input.dup;
     int j=0;
     for(int i=0;i<input.length;++i){
         if(s[i] == ';'){ continue; }
         s[j++] = s[i];
     }
     s.length = j;
     return s.idup;
 }


Oooh, finally someone suggested to preallocate storage for all 
these reinventions of the wheel :D

Instead of the final idup, I would suggest checking the length and 
only duplicating if a certain waste threshold is exceeded; otherwise, 
just use 
https://dlang.org/phobos/std_exception.html#assumeUnique (or a 
cast to string). The result is unique either way.

Threshold could be relative for short strings and absolute for 
long ones. Makes little sense reallocating if you only waste a 
couple bytes, but makes perfect sense if you've just removed 
pages and pages of semicolons ;)
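
A sketch of that conditional copy applied to Matheus's compaction loop. The one-quarter threshold and the name `stripSemicolons` are assumptions for illustration; a real cut-off would combine a relative and an absolute limit as described:

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

// Hypothetical policy: reallocate only when more than a quarter of the
// buffer would be wasted; the 1/4 ratio is an arbitrary example value.
string stripSemicolons(string input)
{
    char[] s = input.dup;          // the one mutable copy, compacted in place
    size_t j = 0;
    foreach (i; 0 .. s.length)     // j never overtakes i, so unread bytes survive
        if (s[i] != ';')
            s[j++] = s[i];
    const wasted = s.length - j;
    s = s[0 .. j];
    if (wasted > input.length / 4)
        return s.idup;             // worth a right-sized copy
    return assumeUnique(s);        // nothing else references `s`, so no copy
}

void main()
{
    writeln(stripSemicolons(";testing;this;thing!;"));
}
```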

Be interesting to see if this thread does evolve into a SIMD 
search...

Dec 10 2021

Rumbu <rumbu rumbu.ro> writes:

On Friday, 10 December 2021 at 18:47:53 UTC, Stanislav Blinov 
wrote:
 Be interesting to see if this thread does evolve into a SIMD


http://lemire.me/blog/2017/01/20/how-quickly-can-you-remove-spaces-from-a-string/

Dec 10 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Friday, 10 December 2021 at 18:47:53 UTC, Stanislav Blinov 
wrote:
 Oooh, finally someone suggested to preallocate storage for all 
 these reinventions of the wheel :D

```d
import std.stdio;

char[] dontdothis(string s, int i=0, int skip=0){
     if (s.length == i) return new char[](i - skip);
     if (s[i] == ';') return dontdothis(s, i+1, skip+1);
     auto r = dontdothis(s, i+1, skip);
     r[i-skip] = s[i];
     return r;
}

int main() {
     string s = "abc;def;ab";
     string s_new = cast(string)dontdothis(s);
     writeln(s_new);
     return 0;
}
```

Dec 10 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Friday, 10 December 2021 at 23:53:47 UTC, Ola Fosheim Grøstad 
wrote:

```d
 char[] dontdothis(string s, int i=0, int skip=0){
     if (s.length == i) return new char[](i - skip);
     if (s[i] == ';') return dontdothis(s, i+1, skip+1);
     auto r = dontdothis(s, i+1, skip);
     r[i-skip] = s[i];
     return r;
 }
```

That is about 500% not what I meant. At all. Original code in 
question:

- duplicates string unconditionally as mutable storage
- uses said mutable storage to gather all non-semicolons
- duplicates said mutable storage (again) as immutable

I suggested making the second duplicate conditional, based on 
the amount of space freed by skipping semicolons.

What you're showing is... indeed, don't do this, but I fail to 
see what that has to do with my suggestion, or the original code.

 Scanning short strings twice is not all that expensive as they 
 will stay in the CPU cache when you run over them a second 
 time.

```d
 import std.stdio;

 @safe:
 string stripsemicolons(string s) @trusted {
     int i,n;
     foreach(c; s) n += c != ';'; // premature optimization
     auto r = new char[](n);
     foreach(c; s) if (c != ';') r[i++] = c;
     return cast(string)r;
 }
```

Again, that is a different algorithm than what I was responding 
to. But sure, short strings - might as well. So long as you do 
track the distinction somewhere up in the code and don't simply 
call this on all strings.
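One way to keep that distinction explicit is to dispatch on length at the call site. This is a sketch under assumptions: the names `stripTwoPass`, `stripOnePass` and `stripDispatch`, and the 256-byte cut-off `shortString`, are all hypothetical; where the real break-even point lies depends on the machine and should be measured.

```d
import std.exception : assumeUnique;

// Hypothetical cut-off: inputs up to this size are assumed to stay in
// cache, so scanning them twice is cheap.
enum size_t shortString = 256;

// Two passes: count the survivors, then allocate exactly once.
string stripTwoPass(string s, char ch)
{
    size_t n;
    foreach (c; s)
        n += (c != ch);            // bool converts to 0 or 1
    auto r = new char[](n);
    size_t i;
    foreach (c; s)
        if (c != ch)
            r[i++] = c;
    return assumeUnique(r);
}

// One pass into a worst-case-sized buffer, then one trimming copy.
string stripOnePass(string s, char ch)
{
    auto r = new char[](s.length);
    size_t i;
    foreach (c; s)
        if (c != ch)
            r[i++] = c;
    return r[0 .. i].idup;
}

// The distinction is tracked here, not inside each helper.
string stripDispatch(string s, char ch)
{
    return s.length <= shortString ? stripTwoPass(s, ch) : stripOnePass(s, ch);
}
```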

Dec 11 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Saturday, 11 December 2021 at 09:26:06 UTC, Stanislav Blinov 
wrote:
 What you're showing is... indeed, don't do this, but I fail to 
 see what that has to do with my suggestion, or the original 
 code.

You worry too much, just have fun with differing ways of 
expressing the same thing.

(Recursion can be completely fine if the compiler supports it 
well. Tail recursion that is, not my example.)

 Again, that is a different algorithm than what I was responding 
 to.

Slightly different, but same idea. Isn't the point of this thread 
to present N different ways of doing the same thing? :-)
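For contrast with `dontdothis`, here is a sketch of a tail-recursive variant (the name `stripTail` is hypothetical). The recursive call is the last thing the function does, so an optimizing compiler may turn it into a loop. D does not guarantee tail-call elimination, though, so long inputs can still exhaust the stack, especially in debug builds.

```d
// Tail-recursive sketch: `acc` carries the kept characters forward,
// and the recursive call is the final action of the function.
string stripTail(string s, char ch, char[] acc = null)
{
    if (s.length == 0)
        return acc.idup;                   // done: copy the accumulator out
    if (s[0] != ch)
        acc ~= s[0];                       // keep this character
    return stripTail(s[1 .. $], ch, acc);  // tail call on the remainder
}
```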

Dec 11 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Friday, 10 December 2021 at 18:47:53 UTC, Stanislav Blinov 
wrote:
 Threshold could be relative for short strings and absolute for 
 long ones. Makes little sense reallocating if you only waste a 
 couple bytes, but makes perfect sense if you've just removed 
 pages and pages of semicolons ;)

Scanning short strings twice is not all that expensive as they 
will stay in the CPU cache when you run over them a second time.

```d
import std.stdio;

@safe:

string stripsemicolons(string s) @trusted {
     int i,n;
     foreach(c; s) n += c != ';'; // premature optimization
     auto r = new char[](n);
     foreach(c; s) if (c != ';') r[i++] = c;
     return cast(string)r;
}

int main() {
     writeln(stripsemicolons("abc;def;ab"));
     return 0;
}
```

Dec 11 2021

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Friday, 10 December 2021 at 18:47:53 UTC, Stanislav Blinov 
wrote:
 Threshold could be relative for short strings and absolute for 
 long ones. Makes little sense reallocating if you only waste a 
 couple bytes, but makes perfect sense if you've just removed 
 pages and pages of semicolons ;)

Like this?

```d
@safe:

string prematureoptimizations(string s, char stripchar) @trusted {
     import core.memory;
     immutable uint flags = GC.BlkAttr.NO_SCAN|GC.BlkAttr.APPENDABLE;
     char* begin = cast(char*)GC.malloc(s.length+1, flags);
     char* end = begin + 1;
     foreach(c; s) {
         immutable size_t notsemicolon = c != stripchar;
         // hack: avoid conditional by writing semicolon outside buffer
         *(end - notsemicolon) = c;
         end += notsemicolon;
     }
     immutable size_t len = end - begin - 1;
     begin = cast(char*)GC.realloc(begin, len, flags);
     return cast(string)begin[0..len];
}

void main() {
     import std.stdio;
     string str = "abc;def;ab";
     writeln(prematureoptimizations(str, ';'));
}

```
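The branch-free trick does not require raw pointers. In the sketch below (hypothetical name `stripBranchless`), every character is written to the next free slot and the index advances only when the character is kept, so a dropped character is simply overwritten by the next kept one. Unlike the version above it skips the final `GC.realloc`, so the unused tail of the buffer stays allocated. Whether either variant beats a plain branching loop depends on the compiler and the data; measure before relying on it.

```d
import std.exception : assumeUnique;

// Branch-free filtering with an index instead of pointer arithmetic.
string stripBranchless(string s, char ch)
{
    auto buf = new char[](s.length);
    size_t n;                  // number of characters kept so far
    foreach (c; s)
    {
        // n never exceeds the index of c, so this write is in bounds.
        buf[n] = c;
        n += (c != ch);        // advance only if c is kept
    }
    return assumeUnique(buf[0 .. n]);
}
```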

Dec 13 2021

Salih Dincer <salihdb hotmail.com> writes:

On Monday, 13 December 2021 at 09:36:57 UTC, Ola Fosheim Grøstad 
wrote:
 ```d
 @safe:

 string prematureoptimizations(string s, char stripchar) @trusted {
     import core.memory;
     immutable uint flags = GC.BlkAttr.NO_SCAN|GC.BlkAttr.APPENDABLE;
     char* begin = cast(char*)GC.malloc(s.length+1, flags);
     char* end = begin + 1;
     foreach(c; s) {
         immutable size_t notsemicolon = c != stripchar;
         // hack: avoid conditional by writing semicolon outside buffer
         *(end - notsemicolon) = c;
         end += notsemicolon;
     }
     immutable size_t len = end - begin - 1;
     begin = cast(char*)GC.realloc(begin, len, flags);
     return cast(string)begin[0..len];
 }

 void main() {
     import std.stdio;
     string str = "abc;def;ab";
     writeln(prematureoptimizations(str, ';'));
 }
 ```

It seems faster than algorithms in Phobos. We would love to see 
this in our new Phobos.

```d
enum str = "abc;def;gh";
enum res = "abcdefgh";

void main()
{
   void mallocReplace()
   {
     import core.memory;

     immutable uint flags =
       GC.BlkAttr.NO_SCAN|
       GC.BlkAttr.APPENDABLE;

     char* begin = cast(char*)GC.malloc(str.length+1, flags);
     char* end = begin + 1;

     foreach(c; str)
     {
       immutable size_t f = c != ';';
       *(end - f) = c;
       end += f;
     }
     immutable size_t len = end - begin - 1;
     begin = cast(char*)GC.realloc(begin, len, flags);

     assert(begin[0..len] == res);
   }

   void normalReplace()
   {
     import std.string;

     string result = str.replace(';',"");
     assert(result == res);
   }

   void delegate() t1 = &normalReplace;
   void delegate() t2 = &mallocReplace;

   import std.stdio : writefln;
   import std.datetime.stopwatch : benchmark;

   auto bm = benchmark!(t1, t2)(1_000_000);

   writefln("Replace: %s msecs", bm[0].total!"msecs");
   writefln("Malloc : %s msecs", bm[1].total!"msecs");
}/* Console Out:
Replace: 436 msecs
Malloc : 259 msecs
*/
```

Dec 22 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Thursday, 23 December 2021 at 07:14:35 UTC, Salih Dincer wrote:

 It seems faster than algorithms in Phobos. We would love to see 
 this in our new Phobos.

 ```d
   void mallocReplace()

   void normalReplace()
     string result = str.replace(';',"");

 }/* Console Out:
 Replace: 436 msecs
 Malloc : 259 msecs
 */
 ```

You're comparing apples and oranges. When benchmarking, at least 
look at the generated assembly first.

replace is not in Phobos, it's a D runtime vestige. It's not 
getting inlined even in release builds with LTO, whereas the 
manual version would be. Also, benchmark with runtime strings, not 
literals; otherwise the compiler might swallow the whole thing.

What you're benchmarking is, basically, inlined optimized search 
in a literal versus a function call.
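A fairer comparison along these lines could look like the sketch below. Assumptions: `manualStrip` and `compare` are hypothetical names; both candidates take the string as a parameter and return a new string; the input arrives at run time; and every result feeds a checksum so no call can be discarded as dead code. It says nothing about inlining by itself, so looking at the generated assembly is still worthwhile.

```d
import std.array : replace;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hand-written candidate with the same shape as the library call:
// input as a parameter, a fresh string as the result.
string manualStrip(string s, char ch)
{
    auto buf = new char[](s.length);
    size_t n;
    foreach (c; s)
        if (c != ch)
            buf[n++] = c;
    return buf[0 .. n].idup;
}

// `input` should come from run time (command line, file, ...),
// so the compiler cannot fold the work away.
void compare(string input, uint runs)
{
    size_t sink; // consume every result so neither call is dead code

    void viaReplace() { sink += input.replace(";", "").length; }
    void viaManual()  { sink += manualStrip(input, ';').length; }

    void delegate() t1 = &viaReplace;
    void delegate() t2 = &viaManual;
    auto bm = benchmark!(t1, t2)(runs);

    writefln("replace: %s msecs", bm[0].total!"msecs");
    writefln("manual : %s msecs", bm[1].total!"msecs");
    writefln("(checksum %s)", sink);
}
```

For example, `void main(string[] args) { compare(args.length > 1 ? args[1] : "abc;def;gh", 1_000_000); }` chooses the input based on the command line, which the compiler cannot know in advance.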

Dec 23 2021

Salih Dincer <salihdb hotmail.com> writes:

On Thursday, 23 December 2021 at 16:13:49 UTC, Stanislav Blinov 
wrote:
 You're comparing apples and oranges.
 When benchmarking, at least look at
 the generated assembly first.

I looked now and you're right. So much so that it should be 
eggplants, not apples, and bananas, not oranges... :)

It was an irrelevant benchmark!

Thank you all...

Dec 23 2021

rumbu <rumbu rumbu.ro> writes:

On Thursday, 23 December 2021 at 07:14:35 UTC, Salih Dincer wrote:

 It seems faster than algorithms in Phobos. We would love to see 
 this in our new Phobos.

 Replace: 436 msecs
 Malloc : 259 msecs
 */

It only seems faster because mallocReplace is cheating a lot:
- it is not called through another function, as replace is;
- it accesses the constant str directly;
- it assumes that there is a single character to replace;
- it assumes that the character will be deleted, not replaced with something;
- it assumes that the character is always ';';
- it assumes that the replacement string is not bigger than the replaced one, so it knows exactly how much space to allocate;
- it does not take any parameters, which, at least on x86, means that no arguments are pushed when it's called;
- it does not return a string, it just compares its result with another constant.

Since we already know all this stuff, we can go further :)

```d
string superFast()
{
     enum r = str.replace(";", "");
     return r;
}

```
 Replace: 436 msecs
 Malloc : 259 msecs
 SuperFast: 0 msecs

Dec 23 2021
