digitalmars.D - UTF8 + SIMD = win
- deadalnix <deadalnix gmail.com> Jul 30 2012
- "bearophile" <bearophileHUGS lycos.com> Jul 30 2012
- Guillaume Chatelet <chatelet.guillaume gmail.com> Jul 30 2012
- Walter Bright <newshound2 digitalmars.com> Jul 31 2012
- "bearophile" <bearophileHUGS lycos.com> Jul 31 2012
- Walter Bright <newshound2 digitalmars.com> Jul 31 2012
- "Bernard Helyer" <b.helyer gmail.com> Jul 31 2012
- "bearophile" <bearophileHUGS lycos.com> Jul 31 2012
- "Jakob Ovrum" <jakobovrum gmail.com> Jul 31 2012
- "Jakob Ovrum" <jakobovrum gmail.com> Jul 31 2012
- "Tobias Pankrath" <tobias pankrath.net> Jul 31 2012
- "bearophile" <bearophileHUGS lycos.com> Jul 31 2012
- "jerro" <a a.com> Jul 31 2012
http://woboq.com/blog/utf-8-processing-using-simd.html
It's all in the article. Since D includes Unicode support as a language feature, I think it is worth mentioning here.
Jul 30 2012
deadalnix:
> http://woboq.com/blog/utf-8-processing-using-simd.html
So many things to do, so little time to do them :-) Bye, bearophile
Jul 30 2012
On 07/30/12 21:13, deadalnix wrote:
> http://woboq.com/blog/utf-8-processing-using-simd.html
> It's all in the article. Since D includes Unicode support as a language feature, I think it is worth mentioning here.
Very interesting, thx for sharing. This NG definitely is a horn of plenty :)
Jul 30 2012
On 7/30/2012 12:13 PM, deadalnix wrote:
> http://woboq.com/blog/utf-8-processing-using-simd.html
> It's all in the article. Since D includes Unicode support as a language feature, I think it is worth mentioning here.
If someone wants to update std.utf (http://dlang.org/phobos/std_utf.html) to use SIMD instructions, that would be cool!
Jul 31 2012
Walter Bright:
> to use SIMD instructions, that would be cool!
I think the most needed UTF operation in D is UTF-8 -> UTF-32.
Bye, bearophile
Jul 31 2012
On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
> On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
>> Bernard Helyer:
>>> Where is UTF-32 actually used?
>> I think all std.algorithm and std.range yield UTF-32 dchars when you give them a string as input. Bye, bearophile
> In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.
SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.
Jul 31 2012
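To illustrate the point about single characters versus large arrays, here is a minimal Python sketch (not D, and not real SIMD). It assumes well-formed UTF-8 and uses one 8-byte integer test as a stand-in for a vector instruction. The names `decode_one`, `transcode` and `HIGH_BITS` are made up for this example; they are not part of std.utf or of the linked article.

```python
# Stand-in for a SIMD "any byte >= 0x80?" test: the top bit of each of 8 bytes.
HIGH_BITS = 0x8080808080808080


def decode_one(data, i):
    """Scalar decode of the code point starting at data[i].

    Returns (code_point, next_index). Assumes well-formed UTF-8;
    no validation is performed.
    """
    b0 = data[i]
    if b0 < 0x80:  # 1 byte: 0xxxxxxx
        return b0, i + 1
    if b0 < 0xE0:  # 2 bytes: 110xxxxx 10xxxxxx
        return ((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F), i + 2
    if b0 < 0xF0:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return (((b0 & 0x0F) << 12)
                | ((data[i + 1] & 0x3F) << 6)
                | (data[i + 2] & 0x3F)), i + 3
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return (((b0 & 0x07) << 18)
            | ((data[i + 1] & 0x3F) << 12)
            | ((data[i + 2] & 0x3F) << 6)
            | (data[i + 3] & 0x3F)), i + 4


def transcode(data):
    """Transcode a whole UTF-8 byte string to a list of UTF-32 code points."""
    out = []
    i, n = 0, len(data)
    while i < n:
        if n - i >= 8:
            block = int.from_bytes(data[i:i + 8], "little")
            if (block & HIGH_BITS) == 0:
                # All 8 bytes are ASCII: widen the whole block in one step.
                out.extend(data[i:i + 8])
                i += 8
                continue
        # Block contains non-ASCII bytes (or fewer than 8 remain): go scalar.
        cp, i = decode_one(data, i)
        out.append(cp)
    return out
```

The block test only pays off when there are many bytes to amortize it over. A call to `decode_one` has a single character to work on, so there is nothing for a wide instruction to batch.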
On Tuesday, 31 July 2012 at 10:57:23 UTC, bearophile wrote:
> Walter Bright:
>> to use SIMD instructions, that would be cool!
> I think the most needed UTF operation in D is UTF-8 -> UTF-32. Bye, bearophile
Where is UTF-32 actually used?
Jul 31 2012
Bernard Helyer:
> Where is UTF-32 actually used?
I think all std.algorithm and std.range yield UTF-32 dchars when you give them a string as input.
Bye, bearophile
Jul 31 2012
On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
> Bernard Helyer:
>> Where is UTF-32 actually used?
> I think all std.algorithm and std.range yield UTF-32 dchars when you give them a string as input. Bye, bearophile
In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.
Jul 31 2012
On Tuesday, 31 July 2012 at 19:28:03 UTC, Walter Bright wrote:
> On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
>> On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
>>> Bernard Helyer:
>>>> Where is UTF-32 actually used?
>>> I think all std.algorithm and std.range yield UTF-32 dchars when you give them a string as input. Bye, bearophile
>> In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.
> SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.
Duh, good point, I totally forgot the context.
Jul 31 2012
On Tuesday, 31 July 2012 at 19:28:03 UTC, Walter Bright wrote:
> On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
>> On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
>>> Bernard Helyer:
>>>> Where is UTF-32 actually used?
>>> I think all std.algorithm and std.range yield UTF-32 dchars when you give them a string as input. Bye, bearophile
>> In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.
> SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.
You could decode them in advance.
Jul 31 2012
Walter Bright:
> SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.
Right. Maybe you remember my two or three posts about vectorized laziness and related matters (which was later partly implemented in the half-eager map of std.parallelism). Introducing some vectorized laziness in std.algorithm when the iterable is a UTF-8 (or, rarely, UTF-16) string would allow the use of SIMD and would probably lead to higher performance.
Bye, bearophile
Jul 31 2012
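One way to picture "vectorized laziness" is a range that decodes a small chunk of the string in bulk, then hands out code points one at a time from that chunk. The following Python sketch is only an illustration under assumptions: Python's built-in `bytes.decode` stands in for a SIMD transcoding kernel, and the names `lazy_code_points`, `take` and `chunk_size` are hypothetical, not anything in std.algorithm or std.parallelism.

```python
from itertools import islice


def lazy_code_points(data, chunk_size=64):
    """Yield code points from UTF-8 bytes, decoding one chunk at a time.

    chunk_size should be at least 4 so that a chunk can always hold one
    complete UTF-8 sequence.
    """
    i, n = 0, len(data)
    while i < n:
        end = min(i + chunk_size, n)
        # Do not cut a multi-byte sequence in half: if the byte just past the
        # chunk is a continuation byte (10xxxxxx), move the boundary back.
        while i < end < n and (data[end] & 0xC0) == 0x80:
            end -= 1
        if end == i:
            # Only reachable with malformed input; let decode() report it.
            end = min(i + chunk_size, n)
        # Bulk step (stand-in for a SIMD kernel): decode the whole chunk.
        for ch in data[i:end].decode("utf-8"):
            yield ord(ch)
        i = end


def take(data, k, chunk_size=64):
    """First k code points; only the chunks needed to produce them are decoded."""
    return list(islice(lazy_code_points(data, chunk_size), k))
```

A consumer that stops early, as in `take(data, 5)`, only pays for the chunks it actually touched. The chunk size is the trade-off: larger chunks give the bulk step more to work with, while smaller chunks waste less when only a few characters are needed.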
On Tuesday, 31 July 2012 at 19:41:02 UTC, Tobias Pankrath wrote:
> On Tuesday, 31 July 2012 at 19:28:03 UTC, Walter Bright wrote:
>> On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
>>> On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
>>>> Bernard Helyer:
>>>>> Where is UTF-32 actually used?
>>>> I think all std.algorithm and std.range yield UTF-32 dchars when you give them a string as input. Bye, bearophile
>>> In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.
>> SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.
> You could decode them in advance.
The problem is that you don't know how much you are going to need. This would actually hurt performance in some cases.
Jul 31 2012