digitalmars.D - UTF8 + SIMD = win


deadalnix <deadalnix gmail.com> writes:

http://woboq.com/blog/utf-8-processing-using-simd.html

It's all in the article. Since D includes Unicode as a language feature, I think 
it is worth mentioning here.

Jul 30 2012

"bearophile" <bearophileHUGS lycos.com> writes:

deadalnix:
 http://woboq.com/blog/utf-8-processing-using-simd.html

So many things to do, so little time to do them :-)

Bye,
bearophile

Jul 30 2012

Guillaume Chatelet <chatelet.guillaume gmail.com> writes:

On 07/30/12 21:13, deadalnix wrote:
 http://woboq.com/blog/utf-8-processing-using-simd.html
 
 All in the article. As D include Unicode as a language feature, I think
 it is interesting to mention here.

Very interesting, thanks for sharing. This newsgroup definitely is a horn of plenty :)

Jul 30 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 7/30/2012 12:13 PM, deadalnix wrote:
 http://woboq.com/blog/utf-8-processing-using-simd.html

 All in the article. As D include Unicode as a language feature, I think it is
 interesting to mention here.

If someone wants to fix std.utf

    http://dlang.org/phobos/std_utf.html

to use SIMD instructions, that would be cool!

Jul 31 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Walter Bright:

 to use SIMD instructions, that would be cool!

I think in D the most needed UTF operation is UTF8 -> UTF32.

Bye,
bearophile

Jul 31 2012

"Bernard Helyer" <b.helyer gmail.com> writes:

On Tuesday, 31 July 2012 at 10:57:23 UTC, bearophile wrote:
 Walter Bright:

 to use SIMD instructions, that would be cool!

 I think in D the most needed UTF operation is UTF8 -> UTF32.

 Bye,
 bearophile

Where is UTF-32 actually used?

Jul 31 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Bernard Helyer:

 Where is UTF-32 actually used?

I think all of std.algorithm and std.range yield UTF-32 dchars when 
you give them a string as input.

Bye,
bearophile
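
To make the point above concrete, here is a minimal sketch of walking a string through the range primitives. The function name countNonAscii is made up for illustration; front, popFront, empty and ElementType are the Phobos range primitives that std.algorithm builds on.

```d
import std.range; // brings in front, popFront, empty and ElementType

/// Counts the non-ASCII code points in a UTF-8 string by walking it as a range.
/// (Hypothetical helper, not part of Phobos.)
size_t countNonAscii(string s)
{
    size_t n;
    while (!s.empty)
    {
        dchar c = s.front; // front decodes one code point from the UTF-8 bytes
        if (c > 0x7F)
            ++n;
        s.popFront();      // skips the whole multi-byte sequence
    }
    return n;
}

// The element type of a string range is dchar, not char.
static assert(is(ElementType!string == dchar));
```

Every call to front here performs a UTF-8 to UTF-32 decode of a single code point, which is why this conversion shows up so often in range-based code.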

Jul 31 2012

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
 Bernard Helyer:

 Where is UTF-32 actually used?

 I think all std.algorithm and std.range yield UTF-32 dchars, 
 when you give them a string in input.

 Bye,
 bearophile

In addition, foreach over a string with a dchar loop variable 
does implicit UTF-8 decoding.
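
A small sketch of that behaviour, with the two helper names invented for the example: the same loop over the same string sees code points or code units depending only on the declared type of the loop variable.

```d
/// Collects the code points of a UTF-8 string.
/// (Hypothetical helper for illustration.)
dchar[] codePoints(string s)
{
    dchar[] result;
    foreach (dchar c; s) // the dchar loop variable triggers UTF-8 decoding
        result ~= c;
    return result;
}

/// Same loop with a char variable: no decoding, one iteration per code unit.
size_t codeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}
```

For a string holding 'h' followed by U+00E9, codePoints returns two elements while codeUnits counts three bytes.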

Jul 31 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
 On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
 Bernard Helyer:

 Where is UTF-32 actually used?

 I think all std.algorithm and std.range yield UTF-32 dchars, when you give
 them a string in input.

 Bye,
 bearophile

 In addition, foreach over a string with a dchar loop variable does implicit
 UTF-8 decoding.

SIMD isn't going to speed things up at all for decoding one character. It is
for transcoding a large array.
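
To illustrate the difference, here is a rough sketch of whole-array transcoding. It does not use real SIMD: as a portable stand-in it tests eight bytes at once with a single 64-bit mask, which is the same idea an SSE version would apply to sixteen bytes. The name bulkToUTF32 is an assumption made for this example; std.utf.decode is the existing Phobos function.

```d
import std.utf : decode;

/// Transcodes a whole UTF-8 string to UTF-32, widening runs of ASCII
/// eight bytes at a time and falling back to std.utf.decode otherwise.
/// (Hypothetical helper; a scalar stand-in for a SIMD fast path.)
dstring bulkToUTF32(string s)
{
    auto result = new dchar[s.length]; // upper bound: one code point per byte
    size_t i, n;
    while (i < s.length)
    {
        if (s.length - i >= 8)
        {
            // Gather 8 bytes into one word so a single mask tests them all.
            ulong word;
            foreach (k; 0 .. 8)
                word |= cast(ulong) s[i + k] << (8 * k);
            if ((word & 0x8080_8080_8080_8080UL) == 0)
            {
                foreach (k; 0 .. 8)
                    result[n + k] = s[i + k]; // ASCII widens directly
                n += 8;
                i += 8;
                continue;
            }
        }
        result[n++] = decode(s, i); // decode advances i past the sequence
    }
    return cast(dstring) result[0 .. n];
}
```

The fast path only pays off when there is a long run of input to process, which is why a single call to decode for one character has nothing to gain from it.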

Jul 31 2012

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Tuesday, 31 July 2012 at 19:28:03 UTC, Walter Bright wrote:
 On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
 On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
 Bernard Helyer:

 Where is UTF-32 actually used?

 I think all std.algorithm and std.range yield UTF-32 dchars, 
 when you give
 them a string in input.

 Bye,
 bearophile

 In addition, foreach over a string with a dchar loop variable 
 does implicit
 UTF-8 decoding.

 SIMD isn't going to speed things up at all for decoding one 
 character. It is for transcoding a large array.

Duh, good point, I totally forgot the context.

Jul 31 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

On Tuesday, 31 July 2012 at 19:28:03 UTC, Walter Bright wrote:
 On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
 On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
 Bernard Helyer:

 Where is UTF-32 actually used?

 I think all std.algorithm and std.range yield UTF-32 dchars, 
 when you give
 them a string in input.

 Bye,
 bearophile

 In addition, foreach over a string with a dchar loop variable 
 does implicit
 UTF-8 decoding.

 SIMD isn't going to speed things up at all for decoding one 
 character. It is for transcoding a large array.

You could decode them in advance.

Jul 31 2012

"jerro" <a a.com> writes:

On Tuesday, 31 July 2012 at 19:41:02 UTC, Tobias Pankrath wrote:
 On Tuesday, 31 July 2012 at 19:28:03 UTC, Walter Bright wrote:
 On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
 On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
 Bernard Helyer:

 Where is UTF-32 actually used?

 I think all std.algorithm and std.range yield UTF-32 dchars, 
 when you give
 them a string in input.

 Bye,
 bearophile

 In addition, foreach over a string with a dchar loop variable 
 does implicit
 UTF-8 decoding.

 SIMD isn't going to speed things up at all for decoding one 
 character. It is for transcoding a large array.

 You could decode them in advance.

The problem is you don't know how much you are going to need.
Decoding more than is actually consumed would hurt performance in some cases.

Jul 31 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Walter Bright:

 SIMD isn't going to speed things up at all for decoding one 
 character. It is for transcoding a large array.

Right.
Maybe you remember my two or three posts about vectorized 
laziness and related matters (which was later partly implemented 
in the half-eager map of std.parallelism). Introducing some 
vectorized laziness in std.algorithm when the iterable is a UTF-8 
(or, rarely, UTF-16) string would allow the use of SIMD and would 
probably lead to higher performance.

Bye,
bearophile
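
One possible shape for such vectorized laziness, as a sketch only: an input range that stays lazy from the consumer's point of view but decodes a small block of code points each time its buffer runs dry. ChunkedDecoder and its blockSize parameter are invented for this example and are not part of Phobos; the refill loop uses plain std.utf.decode where a SIMD transcoder would be plugged in.

```d
import std.utf : decode;

/// Hypothetical input range that decodes a UTF-8 string lazily,
/// but one block of code points at a time.
struct ChunkedDecoder(size_t blockSize = 16)
{
    private string src;
    private dchar[blockSize] buf;
    private size_t pos, len;

    this(string s)
    {
        src = s;
        refill();
    }

    /// Decodes up to blockSize code points into the buffer.
    private void refill()
    {
        pos = 0;
        len = 0;
        size_t i = 0;
        while (len < blockSize && i < src.length)
            buf[len++] = decode(src, i); // a SIMD transcoder would go here
        src = src[i .. $];
    }

    @property bool empty() const { return pos == len; }
    @property dchar front() const { return buf[pos]; }

    void popFront()
    {
        if (++pos == len)
            refill();
    }
}
```

The block size bounds the work wasted when the consumer stops early, which is the cost raised earlier in the thread about decoding in advance without knowing how much will be needed.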

Jul 31 2012
