digitalmars.D.learn - Utf8 to Utf32 cast cost

"Kadir Erdem Demir" <kerdemdemir hotmail.com> writes:

I want to use my char array with the awesome, cool std.algorithm 
functions. Since many of these algorithms require things like slicing, 
I prefer to create my strings with UTF-32 chars. But by 
default all string literals are UTF-8 for performance.

With my current knowledge I use to!dchar to convert UTF-8 (char[]) 
to UTF-32 (dchar[]):

dchar[] range = to!dchar("erdem".dup)

How costly is this?
Is there a way which I can have Utf32 string directly without a 
cast?

Jun 08 2015

"Ilya Yaroshenko" <ilyayaroshenko gmail.com> writes:

On Monday, 8 June 2015 at 10:42:00 UTC, Kadir Erdem Demir wrote:
 How costly is this?
 Is there a way which I can have Utf32 string directly without a 
 cast?

1. dstring range = to!dstring("erdem"); //without dup
2. dchar[] range = to!(dchar[])("erdem"); //mutable
3. dstring range = "erdem"d; //directly
4. dchar[] range = "erdem"d.dup; //mutable
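
A minimal sketch of how the four options above compare, assuming a reasonably recent Phobos. The sample string and the variable names are illustrative only and do not come from the thread. It also shows that a UTF-8 string can be walked by code point without any conversion, which may already be enough for many std.algorithm uses.

```d
import std.conv : to;
import std.range : walkLength;
import std.utf : toUTF32;

void main()
{
    // "résumé" written with escapes so the byte count does not depend
    // on how the source file is normalized.
    string s = "r\u00E9sum\u00E9";   // UTF-8: 8 code units, 6 code points
    assert(s.length == 8);
    assert(s.walkLength == 6);       // decoded on the fly, no new array

    dstring a = to!dstring(s);       // run-time conversion into a new array
    dstring b = toUTF32(s);          // same result via std.utf
    dstring c = "r\u00E9sum\u00E9"d; // UTF-32 literal, no run-time conversion
    dchar[] d = to!(dchar[])(s);     // mutable UTF-32 copy

    assert(a == b && b == c && c == d);
    assert(a.length == 6);
    assert(a[1] == '\u00E9');        // constant-time indexing by code point
}
```

The conversions in `a`, `b` and `d` each build a new array at run time, while the `d`-suffixed literal in `c` is already UTF-32 at compile time.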

Jun 08 2015

"weaselcat" <weaselcat gmail.com> writes:

On Monday, 8 June 2015 at 10:49:59 UTC, Ilya Yaroshenko wrote:
 1. dstring range = to!dstring("erdem"); //without dup
 2. dchar[] range = to!(dchar[])("erdem"); //mutable
 3. dstring range = "erdem"d; //directly
 4. dchar[] range = "erdem"d.dup; //mutable

what's wrong with http://dlang.org/phobos/std_utf.html#.toUTF32

Jun 08 2015

"Kadir Erdem Demir" <kerdemdemir hotmail.com> writes:

Thanks a lot, your answers are very useful to me.
Nothing wrong with toUTF32, I just didn't know about it.

Jun 08 2015

Daniel Kozák via Digitalmars-d-learn writes:

On Mon, 08 Jun 2015 10:51:53 +0000
weaselcat via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 what's wrong with http://dlang.org/phobos/std_utf.html#.toUTF32

Supersedes:
This function supersedes std.utf.toUTF8(), std.utf.toUTF16() and
std.utf.toUTF32() (but note that to!() supersedes it more conveniently).

Jun 08 2015

"Daniel Kozak" <kozzi11 gmail.com> writes:

On Monday, 8 June 2015 at 11:06:07 UTC, Daniel Kozák wrote:
 Supersedes:
 This function supersedes std.utf.toUTF8(), std.utf.toUTF16() and
 std.utf.toUTF32() (but note that to!() supersedes it more 
 conveniently).

BTW, on ldc (ldc -O3 -singleobj -release -boundscheck=off),
transcode is the fastest:

f0 time: 1 sec, 115 ms, 48 μs, and 7 hnsecs // to!dstring
f1 time: 449 ms and 329 μs // toUTF32
f2 time: 272 ms, 969 μs, and 1 hnsec // transcode
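
For reference, a minimal sketch of what the transcode call measured as f2 might look like. The thread does not show the f2 source, so this is an assumption based on the documented std.encoding.transcode interface, which writes its result into an out parameter instead of returning it.

```d
import std.encoding : transcode;

void main()
{
    string utf8 = "some not so long utf8 string";
    dstring utf32;

    // transcode fills the second argument, which is an `out` parameter.
    transcode(utf8, utf32);

    assert(utf32 == "some not so long utf8 string"d);
    assert(utf32.length == utf8.length); // true here only because the text is pure ASCII
}
```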

Jun 08 2015

Marco Leise <Marco.Leise gmx.de> writes:

On Mon, 08 Jun 2015 11:13:25 +0000,
"Daniel Kozak" <kozzi11 gmail.com> wrote:

 BTW, on ldc (ldc -O3 -singleobj -release -boundscheck=off),
 transcode is the fastest:

 f0 time: 1 sec, 115 ms, 48 μs, and 7 hnsecs // to!dstring
 f1 time: 449 ms and 329 μs // toUTF32
 f2 time: 272 ms, 969 μs, and 1 hnsec // transcode

Three functions, each twice as fast and twice as hidden as the
one before. :)

--
Marco

Jun 10 2015

Daniel Kozák via Digitalmars-d-learn writes:

On Mon, 08 Jun 2015 10:41:59 +0000
Kadir Erdem Demir via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 How costly is this?
 Is there a way which I can have Utf32 string directly without a 
 cast?

dstring str = "erdem"d;
dstring str2 = std.utf.toUTF32(someUtf8Or16Or32String);

Jun 08 2015

Daniel Kozák via Digitalmars-d-learn writes:

On Mon, 08 Jun 2015 10:41:59 +0000
Kadir Erdem Demir via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 How costly is this?

import std.conv;
import std.utf;
import std.datetime;
import std.stdio;

void f0() {
    string somestr = "some not so long utf8 string forbenchmarking";
    dstring str = to!dstring(somestr);
}


void f1() {
    string somestr = "some not so long utf8 string forbenchmarking";
    dstring str = toUTF32(somestr);
}

void main() {
    auto r = benchmark!(f0,f1)(1_000_000);
    auto f0Result = to!Duration(r[0]);
    auto f1Result = to!Duration(r[1]);
    writeln("f0 time: ",f0Result);
    writeln("f1 time: ",f1Result);
}


/// output ///
f0 time: 2 secs, 281 ms, 933 μs, and 8 hnsecs
f1 time: 600 ms, 979 μs, and 8 hnsecs

Jun 08 2015

"Kagamin" <spam here.lot> writes:

On Monday, 8 June 2015 at 10:59:45 UTC, Daniel Kozák wrote:
 /// output ///
 f0 time: 2 secs, 281 ms, 933 μs, and 8 hnsecs
 f1 time: 600 ms, 979 μs, and 8 hnsecs

Chances are you're benchmarking the GC. Try 
benchmark!(f0,f1,f0,f1,f0,f1);

Jun 08 2015

Daniel Kozák via Digitalmars-d-learn writes:

On Mon, 08 Jun 2015 11:32:07 +0000
Kagamin via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 Chances are you're benchmarking the GC. Try 
 benchmark!(f0,f1,f0,f1,f0,f1);

No difference; even with GC.disable() the results are the same.
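
A rough sketch of what such a GC.disable() run could look like. It assumes a newer Phobos where benchmark lives in std.datetime.stopwatch and returns Duration values; the thread itself used std.datetime. The iteration count is chosen arbitrarily. Note that GC.disable() only suppresses automatic collections. Every conversion still allocates, so an unchanged timing would be consistent with the cost being in allocation or decoding and not in collection cycles.

```d
import core.memory : GC;
import std.conv : to;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

void f0()
{
    string somestr = "some not so long utf8 string for benchmarking";
    dstring str = to!dstring(somestr); // still allocates with the GC disabled
}

void main()
{
    GC.disable();            // no automatic collections from here on
    scope (exit) GC.enable();

    auto r = benchmark!(f0)(100_000);
    writeln("f0 time with GC disabled: ", r[0]);
}
```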

Jun 08 2015

"Anonymouse" <herp derp.nl> writes:

On Monday, 8 June 2015 at 11:44:47 UTC, Daniel Kozák wrote:
 No difference; even with GC.disable() the results are the same.

Profile! Callgrind is your friend~

Jun 08 2015

Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On Mon, 08 Jun 2015 18:16:57 +0000
Anonymouse via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

 On Monday, 8 June 2015 at 11:44:47 UTC, Daniel Kozák wrote:
 No difference; even with GC.disable() the results are the same.

 
 Profile! Callgrind is your friend~

Yep, but I don't care; I am the one who makes transcode faster, so I am happy
with the results :P.

P.S. I care and probably when I have some spare time I will
improve to!dstring too

Jun 08 2015

"Anonymouse" <herp derp.nl> writes:

On Monday, 8 June 2015 at 18:48:17 UTC, Daniel Kozak wrote:
 Yep, but I don't care; I am the one who makes transcode faster, 
 so I am happy with the results :P.

 P.S. I care and probably when I have some spare time I will
 improve to!dstring too

Ah, so you are. I confused you with Kadir Erdem Demir.

Jun 08 2015

Marco Leise <Marco.Leise gmx.de> writes:

On Mon, 8 Jun 2015 12:59:31 +0200, Daniel Kozák via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 import std.conv;
 import std.utf;
 import std.datetime;
 import std.stdio;

 void f0() {
     string somestr = "some not so long utf8 string forbenchmarking";
     dstring str = to!dstring(somestr);
 }

 void f1() {
     string somestr = "some not so long utf8 string forbenchmarking";
     dstring str = toUTF32(somestr);
 }

 void main() {
     auto r = benchmark!(f0,f1)(1_000_000);
     auto f0Result = to!Duration(r[0]);
     auto f1Result = to!Duration(r[1]);
     writeln("f0 time: ",f0Result);
     writeln("f1 time: ",f1Result);
 }

 /// output ///
 f0 time: 2 secs, 281 ms, 933 μs, and 8 hnsecs
 f1 time: 600 ms, 979 μs, and 8 hnsecs

Please have the result of the transcode influence the program
output. E.g. Add the first character of the UTF32 string to
some global variable and print it out. At the moment - at
least in theory - you allow the compiler to deduce f0/f1 as
pure, return-nothing functions and you will benchmark anything
from your written code to an empty loop. I'm talking out of
experience here:
https://github.com/mleise/fast/blob/master/source/fast/internal.d#L99

--
Marco
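
One possible way to apply the suggestion above, as a sketch only. The global `sink` is a hypothetical name introduced here, and the code assumes a newer Phobos where benchmark is in std.datetime.stopwatch and returns Duration values directly. Each function adds the first code point of its result to the global, and the global is printed at the end, so the optimizer cannot treat the conversions as dead code.

```d
import std.conv : to;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.utf : toUTF32;

uint sink; // hypothetical accumulator; keeps the results observable

void f0()
{
    string somestr = "some not so long utf8 string for benchmarking";
    sink += to!dstring(somestr)[0]; // use the converted string
}

void f1()
{
    string somestr = "some not so long utf8 string for benchmarking";
    sink += toUTF32(somestr)[0];    // use the converted string
}

void main()
{
    auto r = benchmark!(f0, f1)(1_000_000);
    writeln("f0 time: ", r[0]);
    writeln("f1 time: ", r[1]);
    writeln("sink: ", sink); // printing the value ties the work to program output
}
```

Whether this changes the measured numbers depends on the compiler and flags; it only removes one way for the benchmark to end up measuring an empty loop.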

Jun 10 2015
