## digitalmars.D - Porting my Integer Sorting Algorithms to D

- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> Feb 23 2014
- Russel Winder <russel winder.org.uk> Feb 23 2014
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> Feb 23 2014
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> Feb 23 2014
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> Feb 23 2014
- Russel Winder <russel winder.org.uk> Feb 23 2014
- Russel Winder <russel winder.org.uk> Feb 23 2014
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> Feb 23 2014
- Russel Winder <russel winder.org.uk> Feb 23 2014
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> Feb 23 2014
- "francesco cattoglio" <francesco.cattoglio gmail.com> Feb 23 2014
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> Feb 24 2014

I have a couple of self-implemented C++ integer sort algorithms lying around in my codebase. I also have a parallel merge sort on top of them that uses Intel TBB to give some further speedups. I have tweaked them to also work for floats and doubles, through some interesting bit-fiddling tips I found on the net. Would anybody be interested in reviewing these to give suggestions on how to best port it to Phobos? /Per

Feb 23 2014

On Sun, 2014-02-23 at 14:09 +0000, "Nordlöw" wrote:I have a couple of self-implemented C++ integer sort algorithms lying around in my codebase.

What is the basic sort algorithm? Radix sort is generally seen as the best for sorting integer values currently. But that doesn't mean there is better, just that that is the one to beat. Which requires benchmarks. Which requires a framework for running benchmarks. As far as I am aware D hasn't got one of these as yet, but it needs one. Such things exists in Python (I am currently playing with benchmark) and , I am sure, other dynamic languages. It should be relatively easy to do something with D.I also have a parallel merge sort on top of them that uses Intel TBB to give some further speedups.

std.parallelism needs some work to compete with TBB. Though I am not sure "like" is a term I would use for the TBB API.I have tweaked them to also work for floats and doubles, through some interesting bit-fiddling tips I found on the net. Would anybody be interested in reviewing these to give suggestions on how to best port it to Phobos?

I guess I just volunteered. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder

Feb 23 2014

On Sunday, 23 February 2014 at 15:10:36 UTC, Russel Winder wrote:On Sun, 2014-02-23 at 14:09 +0000, "Nordlöw" wrote:I have a couple of self-implemented C++ integer sort algorithms lying around in my codebase.

What is the basic sort algorithm? Radix sort is generally seen as the best for sorting integer values currently. But that doesn't mean there is better, just that that is the one to beat. Which requires benchmarks. Which requires a framework for running benchmarks. As far as I am aware D hasn't got one of these as yet, but it needs one. Such things exists in Python (I am currently playing with benchmark) and , I am sure, other dynamic languages. It should be relatively easy to do something with D.I also have a parallel merge sort on top of them that uses Intel TBB to give some further speedups.

std.parallelism needs some work to compete with TBB. Though I am not sure "like" is a term I would use for the TBB API.I have tweaked them to also work for floats and doubles, through some interesting bit-fiddling tips I found on the net. Would anybody be interested in reviewing these to give suggestions on how to best port it to Phobos?

I guess I just volunteered.

The non-inplace radix sort algorithm is the most interesting. I have a test and benchmarking suite along with it. Here's an extract of it using GCC 4.8.2 with -O3 settings on my Intel Quad-Core with 8 Hyperthreads. Results are promising. These benchmarks have been verfied to produce correct results: Element Count: 1000000 Number of tries: 5 Do in place: 0 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 1947812us 16 4.27x 4.02x 4.87x 4.37x double 2022670us 16 2.30x 2.09x 3.25x 2.31x uint8_t 1673208us 8 10.4x 10.0x 9.32x 10.6x uint16_t 1800614us 16 10.1x 9.37x 9.52x 10.1x uint32_t 1936704us 16 5.45x 4.99x 5.77x 5.42x Should I place the code on a Github repo or send it to you by email? Thx, Per

Feb 23 2014

On Sunday, 23 February 2014 at 16:16:48 UTC, Nordlöw wrote:On Sunday, 23 February 2014 at 15:10:36 UTC, Russel Winder wrote:On Sun, 2014-02-23 at 14:09 +0000, "Nordlöw" wrote:I have a couple of self-implemented C++ integer sort algorithms lying around in my codebase.

What is the basic sort algorithm? Radix sort is generally seen as the best for sorting integer values currently. But that doesn't mean there is better, just that that is the one to beat. Which requires benchmarks. Which requires a framework for running benchmarks. As far as I am aware D hasn't got one of these as yet, but it needs one. Such things exists in Python (I am currently playing with benchmark) and , I am sure, other dynamic languages. It should be relatively easy to do something with D.I also have a parallel merge sort on top of them that uses Intel TBB to give some further speedups.

std.parallelism needs some work to compete with TBB. Though I am not sure "like" is a term I would use for the TBB API.I have tweaked them to also work for floats and doubles, through some interesting bit-fiddling tips I found on the net. Would anybody be interested in reviewing these to give suggestions on how to best port it to Phobos?

I guess I just volunteered.

The non-inplace radix sort algorithm is the most interesting. I have a test and benchmarking suite along with it. Here's an extract of it using GCC 4.8.2 with -O3 settings on my Intel Quad-Core with 8 Hyperthreads. Results are promising. These benchmarks have been verfied to produce correct results: Element Count: 1000000 Number of tries: 5 Do in place: 0 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 1947812us 16 4.27x 4.02x 4.87x 4.37x double 2022670us 16 2.30x 2.09x 3.25x 2.31x uint8_t 1673208us 8 10.4x 10.0x 9.32x 10.6x uint16_t 1800614us 16 10.1x 9.37x 9.52x 10.1x uint32_t 1936704us 16 5.45x 4.99x 5.77x 5.42x Should I place the code on a Github repo or send it to you by email? Thx, Per

After having take a look at algorithm.d I guess we should use CTFE-logic to autoconfigure to use radix sort when ElementType isInteger or float or double. string and real could probably be sorted aswell. To make this configuration optimal an optional pre-configuration benchmarking pass may be need in order to give optimal speeds. Such a pass should for each sort implementation figure out a suitable limit [low,high] for the size range being sorted. /Per

Feb 23 2014

I did some more benchmarking: These are all using non-inplace radix sort. This can be useful in some cases where in-place is not needed. Performance gain increase with size input array compared to std::sort as radix sort in "some regard" is O(n) and quick sort is O(n*log n). Break sizes (when radix sort wins) for different element types: uint8: <1024 uint16: <1024 uint32: >= 4096 float: >= 8192 double: >= 16384 Number of tries: 3 Do in place: 0 Element Count: 1024 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 2390us 16 0.483x 0.892x 0.386x 0.898x double 809us 16 0.152x 0.155x 0.059x 0.0822x uint8_t 1104us 8 10.9x 10.9x 1.65x 2.9x uint16_t 1813us 16 1.38x 1.4x 0.889x 1.44x uint32_t 707us 16 0.286x 0.281x 0.137x 0.231x Element Count: 2048 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 1511us 16 0.518x 0.513x 0.21x 0.28x double 1520us 16 0.268x 0.27x 0.0997x 0.144x uint8_t 2603us 8 11.8x 11.8x 3.81x 12.2x uint16_t 1451us 16 1.03x 1.1x 0.675x 1.03x uint32_t 1531us 16 0.592x 0.575x 0.44x 0.583x Element Count: 4096 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 3099us 16 0.939x 0.904x 0.785x 0.925x double 3411us 16 0.504x 0.483x 0.13x 0.392x uint8_t 2777us 8 6.69x 6.64x 1.7x 7.37x uint16_t 3133us 16 1.91x 1.9x 0.458x 1.46x uint32_t 3681us 16 1.15x 1.14x 0.229x 0.539x Element Count: 8192 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 6760us 16 1.49x 1.54x 0.422x 0.921x double 7285us 16 0.84x 0.825x 0.333x 0.897x uint8_t 5895us 8 7.19x 6.89x 5.59x 7.77x uint16_t 6730us 16 3.18x 3.23x 1.5x 3.52x uint32_t 7095us 16 1.75x 1.7x 0.4x 1.89x Element Count: 16384 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 14410us 16 2.28x 2.19x 0.732x 2.29x double 15066us 16 1.19x 1.08x 0.539x 1.23x uint8_t 12104us 8 7.8x 7.65x 7.22x 7.84x uint16_t 14570us 16 4.81x 4.69x 0.906x 4.96x uint32_t 15034us 16 2.61x 2.48x 0.537x 2.6x Element Count: 32768 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 30764us 16 2.82x 2.62x 2.61x 2.67x double 31569us 16 1.6x 1.46x 0.764x 1.58x uint8_t 26176us 8 8.23x 8.52x 2.8x 8.49x uint16_t 30947us 16 6.45x 6.13x 6.32x 6.91x uint32_t 32785us 16 3.53x 3.39x 1.09x 3.89x Element Count: 65536 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 63166us 16 3.24x 3.12x 2.12x 3.25x double 71154us 16 1.83x 1.81x 1.66x 2.02x uint8_t 56285us 8 8.52x 8.43x 4.22x 9.35x uint16_t 64761us 16 7.68x 8.02x 2.54x 8.32x uint32_t 68576us 16 4.16x 3.93x 2.45x 4.4x Element Count: 131072 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 137296us 16 3.42x 3.47x 3.13x 3.73x double 139881us 16 2.08x 1.85x 1.93x 2.08x uint8_t 111393us 8 9.04x 9.15x 5.99x 9.45x uint16_t 128848us 16 8.47x 8.12x 4.65x 8.53x uint32_t 134586us 16 4.64x 4.29x 2.37x 4.56x Element Count: 262144 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 275099us 16 3.87x 3.66x 3.02x 3.86x double 294763us 16 2.22x 1.93x 2.82x 2.18x uint8_t 233671us 8 9.53x 9.65x 5.06x 9.92x uint16_t 259553us 16 8.93x 8.43x 5.6x 9.33x uint32_t 282111us 16 5.16x 4.57x 4.37x 5.09x Element Count: 524288 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 573934us 16 3.96x 3.8x 3.25x 4.05x double 622996us 16 2.16x 2.05x 2.74x 2.29x uint8_t 482522us 8 10x 9.82x 5.71x 10.1x uint16_t 525446us 16 9.3x 8.76x 5.64x 9.49x uint32_t 595237us 16 5.42x 4.95x 4.86x 5.44x Element Count: 1048576 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 1202785us 16 4.22x 3.99x 4.9x 4.24x double 1322048us 16 2.39x 2.13x 3.51x 2.45x uint8_t 1009387us 8 10.3x 10.2x 8.83x 10.5x uint16_t 1070777us 16 9.6x 9x 9.45x 9.59x uint32_t 1242144us 16 5.37x 5.09x 6.07x 5.54x Element Count: 2097152 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 2539646us 16 4.42x 4.15x 5.77x 4.46x double 2686074us 16 2.46x 2.19x 3.64x 2.45x uint8_t 2136319us 8 10.8x 10.7x 9.93x 10.8x uint16_t 2199277us 16 9.7x 9x 11.1x 9.6x uint32_t 2619062us 16 5.78x 5.28x 7.01x 5.83x Element Count: 4194304 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 5388683us 16 4.64x 4.46x 6.76x 4.72x double 5617282us 16 2.54x 2.29x 4x 2.56x uint8_t 4449453us 8 11.1x 10.9x 12.7x 11.2x uint16_t 4546878us 16 9.96x 9.24x 12.8x 9.91x uint32_t 5466031us 16 5.87x 5.43x 7.94x 5.92x Element Count: 8388608 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 11029468us 16 4.73x 4.51x 6.91x 4.75x double 11801211us 16 2.65x 2.38x 4.21x 2.62x uint8_t 9363439us 8 11.4x 11.3x 13.5x 11.4x uint16_t 9569329us 16 10.2x 9.4x 14x 10x uint32_t 11413279us 16 6.05x 5.56x 8.58x 5.94x

Feb 23 2014

On Sun, 2014-02-23 at 16:16 +0000, "Nordlöw" wrote: […]Should I place the code on a Github repo or send it to you by email?

BitBucket or GitHub work for me: if we keep this public others may join in… -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder

Feb 23 2014

On Sun, 2014-02-23 at 16:59 +0000, "Nordlöw" wrote:I did some more benchmarking:

We definitely need a benchmarking framework so as to ensure we can have multiple runs of each datapoint calculating mean and standard deviation for each data point, and for grouping and ANOVA. I am guessing there isn't already one. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder

Feb 23 2014

I guess working on a branch of Phobos (std.algorithm) is preferrable when we get to the D implementation right?

Feb 23 2014

On Sun, 2014-02-23 at 17:33 +0000, "Nordlöw" wrote:I guess working on a branch of Phobos (std.algorithm) is preferrable when we get to the D implementation right?

I just asked effectively that question re std.parallelism on the DMD Concurrency list, only to find I wasn't a member so the email is pending moderation. I am hoping there is a way of working on a self-contained component of Phobos without having to clone Phobos and do a feature branch. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder

Feb 23 2014

I guess this component would be integrated into std.algorithm when its ready, right?

Feb 23 2014

On Sunday, 23 February 2014 at 19:17:29 UTC, Nordlöw wrote:I guess this component would be integrated into std.algorithm when its ready, right?

If you are happy with it being in there, if the algorithm has a nice behaviour, and it passes the existing test suite, I don't see why it would not!

Feb 23 2014

I guess I just volunteered.

You can find a prel version working for 8,16,32 and 64-bit signed and unsigned integers at https://github.com/nordlow/justd/blob/master/intsort.d I'll fix float and double support soon.

Feb 24 2014