www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Porting my Integer Sorting Algorithms to D

reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
I have a couple of self-implemented C++ integer sort algorithms 
lying around in my codebase.

I also have a parallel merge sort on top of them that uses Intel 
TBB to give some
further speedups.

I have tweaked them to also work for floats and doubles, through 
some interesting bit-fiddling tips I found on the net.

Would anybody be interested in reviewing these to give 
suggestions on how to best port it to Phobos?

/Per
Feb 23 2014
next sibling parent Russel Winder <russel winder.org.uk> writes:
On Sun, 2014-02-23 at 14:09 +0000, "Nordlöw" wrote:
 I have a couple of self-implemented C++ integer sort algorithms 
 lying around in my codebase.

What is the basic sort algorithm? Radix sort is generally seen as the best for sorting integer values currently. But that doesn't mean there is better, just that that is the one to beat. Which requires benchmarks. Which requires a framework for running benchmarks. As far as I am aware D hasn't got one of these as yet, but it needs one. Such things exists in Python (I am currently playing with benchmark) and , I am sure, other dynamic languages. It should be relatively easy to do something with D.
 I also have a parallel merge sort on top of them that uses Intel 
 TBB to give some
 further speedups.

std.parallelism needs some work to compete with TBB. Though I am not sure "like" is a term I would use for the TBB API.
 I have tweaked them to also work for floats and doubles, through 
 some interesting bit-fiddling tips I found on the net.
 
 Would anybody be interested in reviewing these to give 
 suggestions on how to best port it to Phobos?

I guess I just volunteered. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Feb 23 2014
prev sibling next sibling parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Sunday, 23 February 2014 at 15:10:36 UTC, Russel Winder wrote:
 On Sun, 2014-02-23 at 14:09 +0000, "Nordlöw" wrote:
 I have a couple of self-implemented C++ integer sort 
 algorithms lying around in my codebase.

What is the basic sort algorithm? Radix sort is generally seen as the best for sorting integer values currently. But that doesn't mean there is better, just that that is the one to beat. Which requires benchmarks. Which requires a framework for running benchmarks. As far as I am aware D hasn't got one of these as yet, but it needs one. Such things exists in Python (I am currently playing with benchmark) and , I am sure, other dynamic languages. It should be relatively easy to do something with D.
 I also have a parallel merge sort on top of them that uses 
 Intel TBB to give some
 further speedups.

std.parallelism needs some work to compete with TBB. Though I am not sure "like" is a term I would use for the TBB API.
 I have tweaked them to also work for floats and doubles, 
 through some interesting bit-fiddling tips I found on the net.
 
 Would anybody be interested in reviewing these to give 
 suggestions on how to best port it to Phobos?

I guess I just volunteered.

The non-inplace radix sort algorithm is the most interesting. I have a test and benchmarking suite along with it. Here's an extract of it using GCC 4.8.2 with -O3 settings on my Intel Quad-Core with 8 Hyperthreads. Results are promising. These benchmarks have been verfied to produce correct results: Element Count: 1000000 Number of tries: 5 Do in place: 0 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 1947812us 16 4.27x 4.02x 4.87x 4.37x double 2022670us 16 2.30x 2.09x 3.25x 2.31x uint8_t 1673208us 8 10.4x 10.0x 9.32x 10.6x uint16_t 1800614us 16 10.1x 9.37x 9.52x 10.1x uint32_t 1936704us 16 5.45x 4.99x 5.77x 5.42x Should I place the code on a Github repo or send it to you by email? Thx, Per
Feb 23 2014
prev sibling next sibling parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Sunday, 23 February 2014 at 16:16:48 UTC, Nordlöw wrote:
 On Sunday, 23 February 2014 at 15:10:36 UTC, Russel Winder 
 wrote:
 On Sun, 2014-02-23 at 14:09 +0000, "Nordlöw" wrote:
 I have a couple of self-implemented C++ integer sort 
 algorithms lying around in my codebase.

What is the basic sort algorithm? Radix sort is generally seen as the best for sorting integer values currently. But that doesn't mean there is better, just that that is the one to beat. Which requires benchmarks. Which requires a framework for running benchmarks. As far as I am aware D hasn't got one of these as yet, but it needs one. Such things exists in Python (I am currently playing with benchmark) and , I am sure, other dynamic languages. It should be relatively easy to do something with D.
 I also have a parallel merge sort on top of them that uses 
 Intel TBB to give some
 further speedups.

std.parallelism needs some work to compete with TBB. Though I am not sure "like" is a term I would use for the TBB API.
 I have tweaked them to also work for floats and doubles, 
 through some interesting bit-fiddling tips I found on the net.
 
 Would anybody be interested in reviewing these to give 
 suggestions on how to best port it to Phobos?

I guess I just volunteered.

The non-inplace radix sort algorithm is the most interesting. I have a test and benchmarking suite along with it. Here's an extract of it using GCC 4.8.2 with -O3 settings on my Intel Quad-Core with 8 Hyperthreads. Results are promising. These benchmarks have been verfied to produce correct results: Element Count: 1000000 Number of tries: 5 Do in place: 0 ElementType Reference(std::sort) Radix radix_sort descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort float 1947812us 16 4.27x 4.02x 4.87x 4.37x double 2022670us 16 2.30x 2.09x 3.25x 2.31x uint8_t 1673208us 8 10.4x 10.0x 9.32x 10.6x uint16_t 1800614us 16 10.1x 9.37x 9.52x 10.1x uint32_t 1936704us 16 5.45x 4.99x 5.77x 5.42x Should I place the code on a Github repo or send it to you by email? Thx, Per

After having take a look at algorithm.d I guess we should use CTFE-logic to autoconfigure to use radix sort when ElementType isInteger or float or double. string and real could probably be sorted aswell. To make this configuration optimal an optional pre-configuration benchmarking pass may be need in order to give optimal speeds. Such a pass should for each sort implementation figure out a suitable limit [low,high] for the size range being sorted. /Per
Feb 23 2014
prev sibling next sibling parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
I did some more benchmarking:

These are all using non-inplace radix sort.
This can be useful in some cases where in-place is not needed.

Performance gain increase with size input array compared to 
std::sort as radix sort in "some regard" is O(n) and quick sort 
is O(n*log n).

Break sizes (when radix sort wins) for different element types:
uint8: <1024
uint16: <1024
uint32: >= 4096
float: >= 8192
double: >= 16384

Number of tries: 3
Do in place: 0
Element Count: 1024
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 2390us 16 0.483x 0.892x 0.386x 0.898x
double 809us 16 0.152x 0.155x 0.059x 0.0822x
uint8_t 1104us 8 10.9x 10.9x 1.65x 2.9x
uint16_t 1813us 16 1.38x 1.4x 0.889x 1.44x
uint32_t 707us 16 0.286x 0.281x 0.137x 0.231x
Element Count: 2048
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 1511us 16 0.518x 0.513x 0.21x 0.28x
double 1520us 16 0.268x 0.27x 0.0997x 0.144x
uint8_t 2603us 8 11.8x 11.8x 3.81x 12.2x
uint16_t 1451us 16 1.03x 1.1x 0.675x 1.03x
uint32_t 1531us 16 0.592x 0.575x 0.44x 0.583x
Element Count: 4096
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 3099us 16 0.939x 0.904x 0.785x 0.925x
double 3411us 16 0.504x 0.483x 0.13x 0.392x
uint8_t 2777us 8 6.69x 6.64x 1.7x 7.37x
uint16_t 3133us 16 1.91x 1.9x 0.458x 1.46x
uint32_t 3681us 16 1.15x 1.14x 0.229x 0.539x
Element Count: 8192
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 6760us 16 1.49x 1.54x 0.422x 0.921x
double 7285us 16 0.84x 0.825x 0.333x 0.897x
uint8_t 5895us 8 7.19x 6.89x 5.59x 7.77x
uint16_t 6730us 16 3.18x 3.23x 1.5x 3.52x
uint32_t 7095us 16 1.75x 1.7x 0.4x 1.89x
Element Count: 16384
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 14410us 16 2.28x 2.19x 0.732x 2.29x
double 15066us 16 1.19x 1.08x 0.539x 1.23x
uint8_t 12104us 8 7.8x 7.65x 7.22x 7.84x
uint16_t 14570us 16 4.81x 4.69x 0.906x 4.96x
uint32_t 15034us 16 2.61x 2.48x 0.537x 2.6x
Element Count: 32768
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 30764us 16 2.82x 2.62x 2.61x 2.67x
double 31569us 16 1.6x 1.46x 0.764x 1.58x
uint8_t 26176us 8 8.23x 8.52x 2.8x 8.49x
uint16_t 30947us 16 6.45x 6.13x 6.32x 6.91x
uint32_t 32785us 16 3.53x 3.39x 1.09x 3.89x
Element Count: 65536
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 63166us 16 3.24x 3.12x 2.12x 3.25x
double 71154us 16 1.83x 1.81x 1.66x 2.02x
uint8_t 56285us 8 8.52x 8.43x 4.22x 9.35x
uint16_t 64761us 16 7.68x 8.02x 2.54x 8.32x
uint32_t 68576us 16 4.16x 3.93x 2.45x 4.4x
Element Count: 131072
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 137296us 16 3.42x 3.47x 3.13x 3.73x
double 139881us 16 2.08x 1.85x 1.93x 2.08x
uint8_t 111393us 8 9.04x 9.15x 5.99x 9.45x
uint16_t 128848us 16 8.47x 8.12x 4.65x 8.53x
uint32_t 134586us 16 4.64x 4.29x 2.37x 4.56x
Element Count: 262144
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 275099us 16 3.87x 3.66x 3.02x 3.86x
double 294763us 16 2.22x 1.93x 2.82x 2.18x
uint8_t 233671us 8 9.53x 9.65x 5.06x 9.92x
uint16_t 259553us 16 8.93x 8.43x 5.6x 9.33x
uint32_t 282111us 16 5.16x 4.57x 4.37x 5.09x
Element Count: 524288
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 573934us 16 3.96x 3.8x 3.25x 4.05x
double 622996us 16 2.16x 2.05x 2.74x 2.29x
uint8_t 482522us 8 10x 9.82x 5.71x 10.1x
uint16_t 525446us 16 9.3x 8.76x 5.64x 9.49x
uint32_t 595237us 16 5.42x 4.95x 4.86x 5.44x
Element Count: 1048576
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 1202785us 16 4.22x 3.99x 4.9x 4.24x
double 1322048us 16 2.39x 2.13x 3.51x 2.45x
uint8_t 1009387us 8 10.3x 10.2x 8.83x 10.5x
uint16_t 1070777us 16 9.6x 9x 9.45x 9.59x
uint32_t 1242144us 16 5.37x 5.09x 6.07x 5.54x
Element Count: 2097152
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 2539646us 16 4.42x 4.15x 5.77x 4.46x
double 2686074us 16 2.46x 2.19x 3.64x 2.45x
uint8_t 2136319us 8 10.8x 10.7x 9.93x 10.8x
uint16_t 2199277us 16 9.7x 9x 11.1x 9.6x
uint32_t 2619062us 16 5.78x 5.28x 7.01x 5.83x
Element Count: 4194304
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 5388683us 16 4.64x 4.46x 6.76x 4.72x
double 5617282us 16 2.54x 2.29x 4x 2.56x
uint8_t 4449453us 8 11.1x 10.9x 12.7x 11.2x
uint16_t 4546878us 16 9.96x 9.24x 12.8x 9.91x
uint32_t 5466031us 16 5.87x 5.43x 7.94x 5.92x
Element Count: 8388608
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 11029468us 16 4.73x 4.51x 6.91x 4.75x
double 11801211us 16 2.65x 2.38x 4.21x 2.62x
uint8_t 9363439us 8 11.4x 11.3x 13.5x 11.4x
uint16_t 9569329us 16 10.2x 9.4x 14x 10x
uint32_t 11413279us 16 6.05x 5.56x 8.58x 5.94x
Feb 23 2014
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
On Sun, 2014-02-23 at 16:16 +0000, "Nordlöw" wrote:
[…]
 
 Should I place the code on a Github repo or send it to you by 
 email?

BitBucket or GitHub work for me: if we keep this public others may join in… -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Feb 23 2014
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
On Sun, 2014-02-23 at 16:59 +0000, "Nordlöw" wrote:
 I did some more benchmarking:

We definitely need a benchmarking framework so as to ensure we can have multiple runs of each datapoint calculating mean and standard deviation for each data point, and for grouping and ANOVA. I am guessing there isn't already one. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Feb 23 2014
prev sibling next sibling parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
I guess working on a branch of Phobos (std.algorithm) is 
preferrable when we get to the D implementation right?
Feb 23 2014
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
On Sun, 2014-02-23 at 17:33 +0000, "Nordlöw" wrote:
 I guess working on a branch of Phobos (std.algorithm) is 
 preferrable when we get to the D implementation right?

I just asked effectively that question re std.parallelism on the DMD Concurrency list, only to find I wasn't a member so the email is pending moderation. I am hoping there is a way of working on a self-contained component of Phobos without having to clone Phobos and do a feature branch. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Feb 23 2014
prev sibling next sibling parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
I guess this component would be integrated into std.algorithm 
when its ready, right?
Feb 23 2014
prev sibling next sibling parent "francesco cattoglio" <francesco.cattoglio gmail.com> writes:
On Sunday, 23 February 2014 at 19:17:29 UTC, Nordlöw wrote:
 I guess this component would be integrated into std.algorithm 
 when its ready, right?

If you are happy with it being in there, if the algorithm has a nice behaviour, and it passes the existing test suite, I don't see why it would not!
Feb 23 2014
prev sibling parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
 I guess I just volunteered.

You can find a prel version working for 8,16,32 and 64-bit signed and unsigned integers at https://github.com/nordlow/justd/blob/master/intsort.d I'll fix float and double support soon.
Feb 24 2014