digitalmars.D.learn - Utf8 to Utf32 cast cost

reply "Kadir Erdem Demir" <kerdemdemir hotmail.com> writes:
I want to use my char arrays with the awesome, cool std.algorithm 
functions. Since many of these algorithms require things like slicing, 
I prefer to create my strings with UTF-32 chars. But by default all 
string literals are UTF-8, for performance.

With my current knowledge I use to!(dchar[]) to convert UTF-8 (char[]) 
to UTF-32 (dchar[]):

dchar[] range = to!(dchar[])("erdem".dup);

How costly is this?
Is there a way to get a UTF-32 string directly, without a cast?
Jun 08 2015
next sibling parent reply "Ilya Yaroshenko" <ilyayaroshenko gmail.com> writes:
1. dstring range = to!dstring("erdem"); // without dup
2. dchar[] range = to!(dchar[])("erdem"); // mutable
3. dstring range = "erdem"d; // directly
4. dchar[] range = "erdem"d.dup; // mutable
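For illustration, a minimal sketch that puts the four variants side by side. The helper names widen and widenMutable are made up for this example and are not part of Phobos. The last two asserts show why the UTF-32 form is attractive for slicing: one non-ASCII character takes two code units in UTF-8 but a single element in a dstring.

```d
import std.conv : to;

// Hypothetical helper: runtime conversion that decodes UTF-8 and
// allocates a new immutable UTF-32 array.
dstring widen(string utf8)
{
    return to!dstring(utf8);
}

// Hypothetical helper: same conversion, but the result is mutable.
dchar[] widenMutable(string utf8)
{
    return to!(dchar[])(utf8);
}

void main()
{
    dstring literal = "erdem"d;      // UTF-32 literal, no runtime conversion
    dchar[] copy    = "erdem"d.dup;  // mutable copy of the literal

    assert(widen("erdem") == literal);
    assert(widenMutable("erdem") == copy);

    copy[0] = 'E';                   // allowed: dchar[] is mutable
    assert(copy == "Erdem"d);

    assert("\u00F6".length == 2);    // two UTF-8 code units
    assert("\u00F6"d.length == 1);   // one UTF-32 code unit
}
```

Variants 3 and 4 avoid the decoding step entirely because the literal is already UTF-32. Variants 1 and 2 have to decode and allocate at run time.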
Jun 08 2015
parent reply "weaselcat" <weaselcat gmail.com> writes:
What's wrong with std.utf.toUTF32 (http://dlang.org/phobos/std_utf.html#.toUTF32)?
Jun 08 2015
next sibling parent "Kadir Erdem Demir" <kerdemdemir hotmail.com> writes:
Thanks a lot, your answers are very useful to me.
Nothing is wrong with toUTF32; I just didn't know about it.
Jun 08 2015
prev sibling parent reply Daniel Kozák via Digitalmars-d-learn writes:
From the transcode documentation (http://dlang.org/phobos/std_encoding.html#.transcode):

 Supersedes: This function supersedes std.utf.toUTF8(), std.utf.toUTF16() and std.utf.toUTF32() (but note that to!() supersedes it more conveniently).
Jun 08 2015
parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
BTW, on ldc (ldc -O3 -singleobj -release -boundscheck=off) transcode is the fastest:

f0 time: 1 sec, 115 ms, 48 μs, and 7 hnsecs // to!dstring
f1 time: 449 ms and 329 μs // toUTF32
f2 time: 272 ms, 969 μs, and 1 hnsec // transcode
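The source of f2 is not shown in this thread. As a sketch only, assuming f2 simply called std.encoding.transcode on the same input, the three conversions could look like this. The function names viaTo, viaToUTF32 and viaTranscode are made up for the example.

```d
import std.conv : to;
import std.encoding : transcode;
import std.utf : toUTF32;

// f0 in the timings: generic conversion through std.conv.
dstring viaTo(string s)
{
    return to!dstring(s);
}

// f1 in the timings: dedicated UTF conversion.
dstring viaToUTF32(string s)
{
    return toUTF32(s);
}

// Assumed shape of f2: transcode writes its result into an out parameter.
dstring viaTranscode(string s)
{
    dstring result;
    transcode(s, result);
    return result;
}

void main()
{
    string src = "some not so long utf8 string for benchmarking";

    // All three must produce the same UTF-32 text; only the cost differs.
    assert(viaTo(src) == viaToUTF32(src));
    assert(viaToUTF32(src) == viaTranscode(src));
}
```

The timings above come from one machine and one compiler version, so the ordering may differ elsewhere.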
Jun 08 2015
parent Marco Leise <Marco.Leise gmx.de> writes:
Three functions, each twice as fast and twice as hidden as the one before. :)

--
Marco
Jun 10 2015
prev sibling next sibling parent Daniel Kozák via Digitalmars-d-learn writes:
dstring str = "erdem"d;
dstring str2 = std.utf.toUTF32(someUtf8Or16Or32String);
Jun 08 2015
prev sibling parent reply Daniel Kozák via Digitalmars-d-learn writes:
import std.conv;
import std.utf;
import std.datetime;
import std.stdio;

void f0() {
    string somestr = "some not so long utf8 string forbenchmarking";
    dstring str = to!dstring(somestr);
}

void f1() {
    string somestr = "some not so long utf8 string forbenchmarking";
    dstring str = toUTF32(somestr);
}

void main() {
    auto r = benchmark!(f0,f1)(1_000_000);
    auto f0Result = to!Duration(r[0]);
    auto f1Result = to!Duration(r[1]);
    writeln("f0 time: ",f0Result);
    writeln("f1 time: ",f1Result);
}

/// output ///
f0 time: 2 secs, 281 ms, 933 μs, and 8 hnsecs
f1 time: 600 ms, 979 μs, and 8 hnsecs
Jun 08 2015
next sibling parent reply "Kagamin" <spam here.lot> writes:
Chances are you're benchmarking the GC. Try benchmark!(f0,f1,f0,f1,f0,f1);
Jun 08 2015
parent reply Daniel Kozák via Digitalmars-d-learn writes:
No difference. Even with GC.disable() the results are the same.
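For anyone who wants to repeat this check, here is a rough sketch of the interleaved run with the collector switched off. It assumes a recent Phobos, where benchmark lives in std.datetime.stopwatch and returns Duration values directly (the 2015 code above used std.datetime and TickDuration). The iteration count is kept small on purpose, because with the GC disabled every conversion result stays allocated.

```d
import core.memory : GC;
import std.conv : to;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.utf : toUTF32;

void f0()
{
    string somestr = "some not so long utf8 string for benchmarking";
    dstring str = to!dstring(somestr);
}

void f1()
{
    string somestr = "some not so long utf8 string for benchmarking";
    dstring str = toUTF32(somestr);
}

void main()
{
    // Suppress automatic collections for the duration of the measurement.
    GC.disable();
    scope (exit) GC.enable();

    // Interleave the two functions so that no single one absorbs
    // warm-up or allocator effects on its own.
    auto r = benchmark!(f0, f1, f0, f1, f0, f1)(10_000);

    foreach (i, d; r)
        writeln("f", i % 2, " pass ", i / 2 + 1, ": ", d);
}
```

If the three passes of each function agree with each other, the gap between f0 and f1 is unlikely to be a GC artifact.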
Jun 08 2015
parent reply "Anonymouse" <herp derp.nl> writes:
On Monday, 8 June 2015 at 11:44:47 UTC, Daniel Kozák wrote:
 No difference even with GC.disable() results are same.
Profile! Callgrind is your friend~
Jun 08 2015
parent reply Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Mon, 08 Jun 2015 18:16:57 +0000
Anonymouse via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

 Profile! Callgrind is your friend~
Yep, but I don't care; I am the one who makes transcode faster, so I am happy with the results :P.

P.S. I do care, and when I have some spare time I will probably improve to!dstring too.
Jun 08 2015
parent "Anonymouse" <herp derp.nl> writes:
On Monday, 8 June 2015 at 18:48:17 UTC, Daniel Kozak wrote:
 Yep, but I dont care, I am the one who makes transcode faster, 
 so I am happy
 with results :P.

 P.S. I care and probably when I have some spare time I will
 improve to!dstring too
Ah, so you are. I confused you with Kadir Erdem Demir.
Jun 08 2015
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Please have the result of the transcode influence the program output. E.g. add the first character of the UTF-32 string to some global variable and print it out. At the moment, at least in theory, you allow the compiler to deduce f0/f1 as pure, return-nothing functions, and you will benchmark anything from your written code to an empty loop.

I'm speaking from experience here: https://github.com/mleise/fast/blob/master/source/fast/internal.d#L99

--
Marco
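A minimal sketch of that suggestion, assuming a recent Phobos with std.datetime.stopwatch. The module-level variable sink is a name invented for this example. Whether a given compiler would really drop the original calls is not verified here; the point is only that the result now feeds into the printed output.

```d
import std.conv : to;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.utf : toUTF32;

// Hypothetical accumulator: each run adds the first code point of its
// result, so the conversions have an observable effect.
uint sink;

void f0()
{
    string somestr = "some not so long utf8 string for benchmarking";
    dstring str = to!dstring(somestr);
    sink += str[0];
}

void f1()
{
    string somestr = "some not so long utf8 string for benchmarking";
    dstring str = toUTF32(somestr);
    sink += str[0];
}

void main()
{
    auto r = benchmark!(f0, f1)(100_000);
    writeln("f0 time: ", r[0]);
    writeln("f1 time: ", r[1]);

    // Printing the accumulator makes the program output depend on the
    // converted strings.
    writeln("sink: ", sink);
}
```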
Jun 10 2015