www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Compiler benchmarks for an alternative to std.uni.asLowerCase.

reply Jon D <jond noreply.com> writes:
I did a performance study on speeding up case conversion in 
std.uni.asLowerCase. Specifics for asLowerCase have been added to 
issue https://issues.dlang.org/show_bug.cgi?id=11229. Publishing 
here as some of the more general observations may be of wider 
interest.

Background - Case conversion can generally be sped up by checking 
if a character is ascii before invoking a full unicode case 
conversion. The single character std.uni.toLower does this 
optimization, but std.uni.asLowerCase does not. asLowerCase does 
a lazy conversion of a range. For the test, I created a 
replacement for asLowerCase which uses map and toLower. In 
essence, `map!(x => x.toLower)` or `map!(x => x.byDchar.toLower)`.

Testing was with DMD (2.071) and LDC 1.0.0-beta1 (Phobos 2.070) 
on OSX. Compiler settings were `-release -O -boundscheck=off`. 
DMD was tested with and without `-inline`. LDC turns on inlining 
(-enable-inlining=1) by default with -O, but DMD does not. Texts 
tried were in Japanese, Chinese, Finnish, English, German, and 
Spanish. Timing was done both including and excluding decoding 
from utf-8 to dchar.

Performance delta including decoding to dchar:
   | Language group  | Pct Ascii | LDC gain   | DMD gain  | DMD no 
inline  |
   
|-----------------+-----------+------------+-----------+----------------|
   | Latin           |    95-99% | 64% (2.7x) | 93% (14x) | 48% 
(1.9x)     |
   | Asian (Jpn/Chn) |  2.4-3.7% | 36% (1.6x) | 80% (5x)  | -1%

Performance delta excluding decoding to dchar:
   | Language group  | Pct Ascii | LDC gain   | DMD gain  | DMD no 
inline |
   
|-----------------+-----------+------------+-----------+---------------|
   | Latin           |    95-99% | 60% (2.5x) | 95% (20x) | 60% 
(2.5x)    |
   | Asian (Jpn/Chn) |  2.4-3.7% | 50% (2x)   | 95% (20x) | -2%

Observations:
* mapAsLowerCase was faster than asLowerCase across the board. 
That it was better for Asian texts suggests the improvement 
involved more just the ascii check optimization.
* Performance varied widely between compilers, and for DMD, 
whether the -inline flag was included. The performance delta 
between asLowerCase and the mapAsLowerCase replacement was very 
dependent on these choices. Similarly, the delta between 
inclusion and exclusion of auto-decoding was highly dependent on 
these selections.
* DMD improvement by using -inline: 30% for asLowerCase (1.5x), 
90% for mapAsLowerCase (10x).
* DMD (-inline) vs LDC: For asLowerCase, LDC was 65-85% faster. 
For mapAsLowerCase, DMD was 10-40% faster. There were changes to 
the map implementation in 2.071, so these were not equivalent, 
but still, it's interesting that DMD beat LDC in this case.

Thoughts:
* The large variances between compiler settings imply extra 
diligence when performance tuning at the source code level, 
especially for code intended for multiple compilers.
* Perhaps DMD -O should also turn on -inline. This would present 
a better performance picture to new users. It's also helpful when 
the different compilers agree on rough meaning of compiler 
switches.
* Auto-decoding is an oft discussed concern. It doesn't show up 
in the table above, but the data I looked at suggests the 
cost/penalty may vary quite a bit depending on usage context and 
compiler/settings. I wasn't studying aspect explicitly. It may be 
worth its own analysis.

Other details:
* Code for mapAsLowerCase and the timing program is at: 
https://dpaste.dzfl.pl/a0e2fa1c71fd
* Texts used for timing were books in several languages from the 
Project Gutenberg site (http://www.gutenberg.org/), with 
boilerplate text removed.

--Jon
May 08 2016
next sibling parent reply Peter =?UTF-8?B?SMOkZ2dtYW4=?= <hagg.pet2 online.fi> writes:
On Sunday, 8 May 2016 at 23:38:31 UTC, Jon D wrote:
 I did a performance study on speeding up case conversion in 
 std.uni.asLowerCase. Specifics for asLowerCase have been added 
 to issue https://issues.dlang.org/show_bug.cgi?id=11229. 
 Publishing here as some of the more general observations may be 
 of wider interest.

 [...]
Nice, it seems that you would have enough material to advocate a pull request in phobos then ;)
May 08 2016
parent reply Jon D <jond noreply.com> writes:
On Monday, 9 May 2016 at 00:15:03 UTC, Peter Häggman wrote:
 On Sunday, 8 May 2016 at 23:38:31 UTC, Jon D wrote:
 I did a performance study on speeding up case conversion in 
 std.uni.asLowerCase. Specifics for asLowerCase have been added 
 to issue https://issues.dlang.org/show_bug.cgi?id=11229. 
 Publishing here as some of the more general observations may 
 be of wider interest.

 [...]
Nice, it seems that you would have enough material to advocate a pull request in phobos then ;)
Thanks! I haven't yet taken the time to go through the 'becoming a contributor' steps, when I have the time I'll do that. In this case, I'd want to start by validating with the library designers that the approach makes sense. It by-passes what appears to a basic primitive, std.uni.toCaser. There may be reasons this is not desirable.
May 08 2016
parent Lionello Lunesu <lionello lunesu.remove.com> writes:
On 9/5/2016 12:01, Jon D wrote:
 On Monday, 9 May 2016 at 00:15:03 UTC, Peter Häggman wrote:
 On Sunday, 8 May 2016 at 23:38:31 UTC, Jon D wrote:
 I did a performance study on speeding up case conversion in
 std.uni.asLowerCase. Specifics for asLowerCase have been added to
 issue https://issues.dlang.org/show_bug.cgi?id=11229. Publishing here
 as some of the more general observations may be of wider interest.

 [...]
Nice, it seems that you would have enough material to advocate a pull request in phobos then ;)
Thanks! I haven't yet taken the time to go through the 'becoming a contributor' steps, when I have the time I'll do that. In this case, I'd want to start by validating with the library designers that the approach makes sense. It by-passes what appears to a basic primitive, std.uni.toCaser. There may be reasons this is not desirable.
Less code, and faster, and using existing building blocks. Win, win, win, if you ask me! +1
May 09 2016
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 09-May-2016 02:38, Jon D wrote:
 I did a performance study on speeding up case conversion in
 std.uni.asLowerCase. Specifics for asLowerCase have been added to issue
 https://issues.dlang.org/show_bug.cgi?id=11229. Publishing here as some
 of the more general observations may be of wider interest.

 Background - Case conversion can generally be sped up by checking if a
 character is ascii before invoking a full unicode case conversion. The
 single character std.uni.toLower does this optimization, but
 std.uni.asLowerCase does not. asLowerCase does a lazy conversion of a
 range. For the test, I created a replacement for asLowerCase which uses
 map and toLower. In essence, `map!(x => x.toLower)` or `map!(x =>
 x.byDchar.toLower)`.
The only problem is that it should consider multi-codepoint replacements aka full-case folding in Unicode. Otherwise - go ahead and issue a pull request to add special case for < 0x80. -- Dmitry Olshansky
May 09 2016
parent reply Marc =?UTF-8?B?U2Now7x0eg==?= <schuetzm gmx.net> writes:
On Monday, 9 May 2016 at 08:44:53 UTC, Dmitry Olshansky wrote:
 On 09-May-2016 02:38, Jon D wrote:
 [...]
The only problem is that it should consider multi-codepoint replacements aka full-case folding in Unicode. Otherwise - go ahead and issue a pull request to add special case for < 0x80.
What about locale dependent case mappings (e.g. Turkish Iı İi)? Or is that currently not supported by std.uni?
May 11 2016
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Wed, 11 May 2016 11:37:03 +0000
schrieb Marc Sch=C3=BCtz <schuetzm gmx.net>:

 On Monday, 9 May 2016 at 08:44:53 UTC, Dmitry Olshansky wrote:
 On 09-May-2016 02:38, Jon D wrote: =20
 [...] =20
The only problem is that it should consider multi-codepoint=20 replacements aka full-case folding in Unicode. Otherwise - go ahead and issue a pull request to add special case for < 0x80.
In case someone wonders, an example for multi-codepoint replacements is when =C3=9F becomes SS in upper-case.
 What about locale dependent case mappings (e.g. Turkish I=C4=B1 =C4=B0i)?=
=20
 Or is that currently not supported by std.uni?
I second those thoughts. Important players* agree that you cannot do case conversion or string sorting without knowing the locale and sorting scheme, and Phobos that set out to be "Unicode first" can't even express locales to begin with. * C++: http://en.cppreference.com/w/cpp/locale/tolower POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/functions/tolower.html Java: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#toLowerCas= e%28java.util.Locale%29 Go: Recently added special handling of Turkish, but lack of full-case-folding is still a "BUG" comment in the code. The actual discussion on std.locale is here: http://forum.dlang.org/thread/gof4ft$2odv$1 digitalmars.com --=20 Marco
May 12 2016
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12-May-2016 17:23, Marco Leise wrote:
 Am Wed, 11 May 2016 11:37:03 +0000
 schrieb Marc Schütz <schuetzm gmx.net>:

 On Monday, 9 May 2016 at 08:44:53 UTC, Dmitry Olshansky wrote:
 On 09-May-2016 02:38, Jon D wrote:
 [...]
The only problem is that it should consider multi-codepoint replacements aka full-case folding in Unicode. Otherwise - go ahead and issue a pull request to add special case for < 0x80.
In case someone wonders, an example for multi-codepoint replacements is when ß becomes SS in upper-case.
 What about locale dependent case mappings (e.g. Turkish Iı İi)?
 Or is that currently not supported by std.uni?
I second those thoughts. Important players* agree that you cannot do case conversion or string sorting without knowing the locale and sorting scheme, and Phobos that set out to be "Unicode first" can't even express locales to begin with.
Proper handling of that is called tailoring in Unicode. Personally I don't think it would be terribly hard to go and do: Locale locale = fetchSomeTailoredLocale(...); locale.toLower(...); // etc. all functions as members By the way we do what is called default handling (tables) which is sensible "default" IMO. -- Dmitry Olshansky
May 12 2016
prev sibling parent qznc <qznc web.de> writes:
On Sunday, 8 May 2016 at 23:38:31 UTC, Jon D wrote:
 * Performance varied widely between compilers, and for DMD, 
 whether the -inline flag was included. The performance delta 
 between asLowerCase and the mapAsLowerCase replacement was very 
 dependent on these choices. Similarly, the delta between 
 inclusion and exclusion of auto-decoding was highly dependent 
 on these selections.
This makes me suspicious, since you did not measure standard deviation or variance. A quick local test with your code looks ok, though. One of my students recently made a benchmarking tool. If you really want to do this properly, you could try that: https://github.com/parttimenerd/temci I would not require better statistics here, since the speedups seem to be stable. :)
May 09 2016