www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - On the D Blog: Lomuto's Comeback

reply Mike Parker <aldacron gmail.com> writes:
After reading a paper that grabbed his curiosity and wouldn't let 
go, Andrei set out to determine if Lomuto partitioning should 
still be considered inferior to Hoare for quicksort on modern 
hardware. This blog post details his results.

Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/

Reddit:
https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/

HN:
https://news.ycombinator.com/item?id=23179160
May 14
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/14/20 9:26 AM, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't let go, 
 Andrei set out to determine if Lomuto partitioning should still be 
 considered inferior to Hoare for quicksort on modern hardware. This blog 
 post details his results.
 
 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/
 
 Reddit:
 https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_qui
ksort_partitioning/ 
 
 
 HN:
 https://news.ycombinator.com/item?id=23179160
Thanks, Mike. HN has possibly categorized it as spam already. One thing they do is they detect (by using the "Referrer" header) whether the post has been shared via a direct link. They do so to prevent manipulation. The right way to share something on hackernews is to send people to https://news.ycombinator.com/newest and mention the time of sharing.
May 14
next sibling parent Mike Parker <aldacron gmail.com> writes:
On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu 
wrote:
 On 5/14/20 9:26 AM, Mike Parker wrote:
 The right way to share something on hackernews is to send 
 people to https://news.ycombinator.com/newest and mention the 
 time of sharing.
Okay everyone, please use this link or search for "Lomuto's Comeback" in the search field. I've been hearing conflicting accounts of this for a while, with more people telling me it doesn't happen anymore. However, it seems posts were never flagged as spam for this. Instead, any upvotes from people coming through direct links *do not count*. Coupled with the fact that the FAQ still says posts are penalized for "asking for votes", I'm no longer going to share direct links to HN articles. Found multiple sources about it, but this 2015 post lays it all out and I assume it's still mostly relevant: https://wiredcraft.com/blog/how-to-post-on-hacker-news/ https://news.ycombinator.com/newsfaq.html
May 14
prev sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu 
wrote:
 [snip]
Really interesting. Thanks for sharing. I have recently been spending some spare time learning more about D's topN and pivotPartition implementation, which led me to your paper on fast deterministic selection. Would you consider changing the pivotPartition implementation based on this? Would the insights gleamed from this paper mean that a branchless version of topN could be faster?
May 14
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/14/20 11:57 AM, jmh530 wrote:
 On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu wrote:
 [snip]
Really interesting. Thanks for sharing. I have recently been spending some spare time learning more about D's topN and pivotPartition implementation, which led me to your paper on fast deterministic selection. Would you consider changing the pivotPartition implementation based on this?
Yes, and I encourage you to look into putting together a PR.
 Would the insights gleamed from this paper mean that a branchless 
 version of topN could be faster?
Yes. topN also uses partitioning.
May 17
prev sibling next sibling parent reply SashaGreat <s g.com> writes:
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
 ...
 Reddit:
 https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/
 ...
If possible could you please next time share link with "old" instead of "www"? Like: https://old.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ Thanks, SG.
May 14
parent JN <666total wp.pl> writes:
On Thursday, 14 May 2020 at 14:11:57 UTC, SashaGreat wrote:
 If possible could you please next time share link with "old" 
 instead of "www"? Like:

 https://old.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/
There is a Chrome extension that automatically redirects to the old version of Reddit
May 17
prev sibling next sibling parent Francesco Mecca <me francescomecca.eu> writes:
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't 
 let go, Andrei set out to determine if Lomuto partitioning 
 should still be considered inferior to Hoare for quicksort on 
 modern hardware. This blog post details his results.

 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/

 Reddit:
 https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/

 HN:
 https://news.ycombinator.com/item?id=23179160
Wow, very interesting article. Thanks for sharing.
May 15
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't 
 let go, Andrei set out to determine if Lomuto partitioning 
 should still be considered inferior to Hoare for quicksort on 
 modern hardware. This blog post details his results.
Fantastic work and great result. Having privately done a very heavy critique of the narrow niche the article had chosen to explore I still recognize and love the results. — Dmitry Olshansky
May 15
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't 
 let go, Andrei set out to determine if Lomuto partitioning 
 should still be considered inferior to Hoare for quicksort on 
 modern hardware. This blog post details his results.

 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/
Nice stuff! One curious question -- unless I've misread things horribly, it looks like the D benchmarks for Lomuto branch-free are consistently slower than for C++. Any idea why that is? I would expect gcc/gdc and clang/ldc to produce effectively identical results for code like this.
May 15
next sibling parent kinke <noone nowhere.com> writes:
On Friday, 15 May 2020 at 10:28:41 UTC, Joseph Rushton Wakeling 
wrote:
 One curious question -- unless I've misread things horribly, it 
 looks like the D benchmarks for Lomuto branch-free are 
 consistently slower than for C++.  Any idea why that is?  I 
 would expect gcc/gdc and clang/ldc to produce effectively 
 identical results for code like this.
Wrt. the LDC results, LDC v1.17 was shipped with LLVM 8.0.1, while the used clang is v10 based on LLVM 10, so that might account for some slight diffs.
May 15
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On 15/05/2020 12:28, Joseph Rushton Wakeling via Digitalmars-d-announce wrote:
 On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't let go, Andrei
set out to determine if Lomuto partitioning should still be considered inferior
to Hoare for quicksort on modern hardware. This blog post details his results.

 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/
Nice stuff! One curious question -- unless I've misread things horribly, it looks like the D benchmarks for Lomuto branch-free are consistently slower than for C++.  Any idea why that is?  I would expect gcc/gdc and clang/ldc to produce effectively identical results for code like this.
Sorry for the belated response, as far as I can see, gdc and g++ only differ on one line. auto delta = smaller & (read - first); This is lowered as: delta = smaller & (read - first) / 8; However, g++ uses an exact divide operator (semantically that ignores rounding), whereas gdc uses a truncating divide operator (semantically rounds the quotient towards zero). I'm willing to bet a beer on tweaking pointer subtraction will get gdc in lockstep with g++.
Aug 03
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On 03/08/2020 13:08, Iain Buclaw via Digitalmars-d-announce wrote:
 
 
 On 15/05/2020 12:28, Joseph Rushton Wakeling via Digitalmars-d-announce wrote:
 On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't let go, Andrei
set out to determine if Lomuto partitioning should still be considered inferior
to Hoare for quicksort on modern hardware. This blog post details his results.

 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/
Nice stuff! One curious question -- unless I've misread things horribly, it looks like the D benchmarks for Lomuto branch-free are consistently slower than for C++.  Any idea why that is?  I would expect gcc/gdc and clang/ldc to produce effectively identical results for code like this.
Sorry for the belated response, as far as I can see, gdc and g++ only differ on one line. auto delta = smaller & (read - first); This is lowered as: delta = smaller & (read - first) / 8; However, g++ uses an exact divide operator (semantically that ignores rounding), whereas gdc uses a truncating divide operator (semantically rounds the quotient towards zero). I'm willing to bet a beer on tweaking pointer subtraction will get gdc in lockstep with g++.
I doubt Andrei will re-run the benchmarks now, but here's the PR (problem reference) with patch attached. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96429
Aug 03
prev sibling parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On 04/08/2020 03:14, Andrei Alexandrescu wrote:
 Interesting, thanks!
 
Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}000000... gdc-baseline: min_milliseconds=53.2922 median_milliseconds=56.8761 min_milliseconds=111.2512 median_milliseconds=115.5812 min_milliseconds=168.8659 median_milliseconds=174.9421 min_milliseconds=228.9230 median_milliseconds=235.9950 min_milliseconds=291.4758 median_milliseconds=302.2652 min_milliseconds=354.6598 median_milliseconds=360.3230 min_milliseconds=420.6221 median_milliseconds=424.0275 min_milliseconds=490.9294 median_milliseconds=505.0082 min_milliseconds=535.9912 median_milliseconds=556.0680 min_milliseconds=619.8160 median_milliseconds=629.6446 gdc-pr96429: min_milliseconds=44.1008 median_milliseconds=46.2956 min_milliseconds=96.0864 median_milliseconds=99.4665 min_milliseconds=146.2498 median_milliseconds=151.5661 min_milliseconds=201.9479 median_milliseconds=207.0937 min_milliseconds=253.2555 median_milliseconds=265.6178 min_milliseconds=309.5941 median_milliseconds=317.8178 min_milliseconds=364.9312 median_milliseconds=381.9105 min_milliseconds=427.2581 median_milliseconds=437.9506 min_milliseconds=464.2838 median_milliseconds=482.9127 min_milliseconds=537.3167 median_milliseconds=550.9250 g++: min_milliseconds=44.0164 median_milliseconds=46.5057 min_milliseconds=91.2730 median_milliseconds=97.8507 min_milliseconds=142.8459 median_milliseconds=153.4782 min_milliseconds=196.4765 median_milliseconds=202.1877 min_milliseconds=251.5876 median_milliseconds=261.6350 min_milliseconds=299.8530 median_milliseconds=311.0719 min_milliseconds=351.9959 median_milliseconds=370.0437 min_milliseconds=412.4231 median_milliseconds=420.1336 min_milliseconds=462.4810 median_milliseconds=474.2444 min_milliseconds=539.3359 median_milliseconds=541.5321 Looks good, so committing patch. :-)
Aug 04
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 8/4/20 4:19 AM, Iain Buclaw wrote:
 On 04/08/2020 03:14, Andrei Alexandrescu wrote:
 Interesting, thanks!
Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}000000...
[snip]
 Looks good, so committing patch. :-)
Awesome, thanks! That does solve a puzzler I had while benchmarking. I'm thinking the story of discovering and fixing this would be a great follow-up in the blog. It doesn't quite mesh with Mike's current introductory series, but it could be done as an intermezzo a la "Now For Something Completely Different (And Much Lower Level)". Sketch of an intro: Upon reading "Lomuto's Comeback" in the D blog, I noticed the performance were consistently juuust a bit worse for the D version than for the C++ version for the same source code. My own measurements confirmed the same. That bothered me at two levels. First, people unfamiliar with the D language would form the opinion that D cannot reach the efficiency of C++. Second, as the gdc creator and maintainer, I knew for a fact the produced code must be literally identical. Any difference would pin point a bug somewhere in the code generation pipeline. So I set out to find it and fix it. This is the story of that investigation, which will take us through looking through disassembly, finding the culprit, devising a fix, confirming with measurements, and patching the open-source gdc compiler. ...
Aug 04
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 8/4/20 9:49 AM, Andrei Alexandrescu wrote:
 On 8/4/20 4:19 AM, Iain Buclaw wrote:
 On 04/08/2020 03:14, Andrei Alexandrescu wrote:
 Interesting, thanks!
Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}000000...
[snip]
 Looks good, so committing patch. :-)
Awesome, thanks! That does solve a puzzler I had while benchmarking. I'm thinking the story of discovering and fixing this would be a great follow-up in the blog. It doesn't quite mesh with Mike's current introductory series, but it could be done as an intermezzo a la "Now For Something Completely Different (And Much Lower Level)". Sketch of an intro: Upon reading "Lomuto's Comeback" in the D blog, I noticed the performance were consistently juuust a bit worse for the D version than for the C++ version for the same source code. My own measurements confirmed the same. That bothered me at two levels. First, people unfamiliar with the D language would form the opinion that D cannot reach the efficiency of C++. Second, as the gdc creator and maintainer, I knew for a fact the produced code must be literally identical. Any difference would pin point a bug somewhere in the code generation pipeline. So I set out to find it and fix it. This is the story of that investigation, which will take us through looking through disassembly, finding the culprit, devising a fix, confirming with measurements, and patching the open-source gdc compiler. ...
Oh, and there are a few comments to the original blog post I'd be glad to respond to in an appendix.
Aug 04
prev sibling next sibling parent Les De Ridder <les lesderid.net> writes:
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't 
 let go, Andrei set out to determine if Lomuto partitioning 
 should still be considered inferior to Hoare for quicksort on 
 modern hardware. This blog post details his results.

 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/
Great post, and nice to have another example for how bad branches can really be for performance! One note: The clang/ldc compiler explorer links for lomuto_partition_branchfree are wrong.
May 15
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/14/20 9:26 AM, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't let go, 
 Andrei set out to determine if Lomuto partitioning should still be 
 considered inferior to Hoare for quicksort on modern hardware. This blog 
 post details his results.
 
 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/
 
 Reddit:
 https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_qui
ksort_partitioning/ 
 
 
 HN:
 https://news.ycombinator.com/item?id=23179160
Fantastic article! -Steve
May 15
prev sibling next sibling parent Jon Degenhardt <jond noreply.com> writes:
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't 
 let go, Andrei set out to determine if Lomuto partitioning 
 should still be considered inferior to Hoare for quicksort on 
 modern hardware. This blog post details his results.

 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/

 Reddit:
 https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/

 HN:
 https://news.ycombinator.com/item?id=23179160
Got posted again to Hacker News earlier today. Currently at position 5.
May 17
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/14/20 9:26 AM, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't let go, 
 Andrei set out to determine if Lomuto partitioning should still be 
 considered inferior to Hoare for quicksort on modern hardware. This blog 
 post details his results.
 
 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/
 
 Reddit:
 https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_qui
ksort_partitioning/ 
 
 
 HN:
 https://news.ycombinator.com/item?id=23179160
Looks like the blog post is enjoying a second wind after being posted by soneone else on hackernews. It's in top 10 right now.
May 17
prev sibling parent Arjan <arjan ask.me.to> writes:
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
 After reading a paper that grabbed his curiosity and wouldn't 
 let go, Andrei set out to determine if Lomuto partitioning 
 should still be considered inferior to Hoare for quicksort on 
 modern hardware. This blog post details his results.

 Blog:
 https://dlang.org/blog/2020/05/14/lomutos-comeback/

 Reddit:
 https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/

 HN:
 https://news.ycombinator.com/item?id=23179160
A follow up article on this: https://news.ycombinator.com/item?id=23363165 https://blog.reverberate.org/2020/05/29/hoares-rebuttal-bubble-sorts-comeback.html
May 31