digitalmars.D - Porting my Integer Sorting Algorithms to D

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (10/10) Feb 23 2014 I have a couple of self-implemented C++ integer sort algorithms

Russel Winder (19/29) Feb 23 2014 What is the basic sort algorithm? Radix sort is generally seen as the

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (20/50) Feb 23 2014 The non-inplace radix sort algorithm is the most interesting. I

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (10/67) Feb 23 2014 After having take a look at algorithm.d I guess we should use

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (126/126) Feb 23 2014 I did some more benchmarking:

Russel Winder (11/12) Feb 23 2014 We definitely need a benchmarking framework so as to ensure we can have

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (2/2) Feb 23 2014 I guess working on a branch of Phobos (std.algorithm) is

Russel Winder (12/14) Feb 23 2014 I just asked effectively that question re std.parallelism on the DMD

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (2/2) Feb 23 2014 I guess this component would be integrated into std.algorithm

francesco cattoglio (4/6) Feb 23 2014 If you are happy with it being in there, if the algorithm has a

Russel Winder (10/13) Feb 23 2014 BitBucket or GitHub work for me: if we keep this public others may join

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (4/5) Feb 24 2014 You can find a prel version working for 8,16,32 and 64-bit signed

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

I have a couple of self-implemented C++ integer sort algorithms 
lying around in my codebase.

I also have a parallel merge sort on top of them that uses Intel 
TBB to give some
further speedups.

I have tweaked them to also work for floats and doubles, through 
some interesting bit-fiddling tips I found on the net.

Would anybody be interested in reviewing these to give 
suggestions on how to best port it to Phobos?

/Per

Feb 23 2014

Russel Winder <russel winder.org.uk> writes:

On Sun, 2014-02-23 at 14:09 +0000, "Nordlöw" wrote:
 I have a couple of self-implemented C++ integer sort algorithms 
 lying around in my codebase.

What is the basic sort algorithm? Radix sort is generally seen as the
best for sorting integer values currently. But that doesn't mean there
is better, just that that is the one to beat.

Which requires benchmarks. Which requires a framework for running
benchmarks. As far as I am aware D hasn't got one of these as yet, but
it needs one. Such things exists in Python (I am currently playing with
benchmark) and , I am sure, other dynamic languages. It should be
relatively easy to do something with D.
  
 I also have a parallel merge sort on top of them that uses Intel 
 TBB to give some
 further speedups.

std.parallelism needs some work to compete with TBB. Though I am not
sure "like" is a term I would use for the TBB API.

 I have tweaked them to also work for floats and doubles, through 
 some interesting bit-fiddling tips I found on the net.
 
 Would anybody be interested in reviewing these to give 
 suggestions on how to best port it to Phobos?

I guess I just volunteered.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

Feb 23 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

On Sunday, 23 February 2014 at 15:10:36 UTC, Russel Winder wrote:
 On Sun, 2014-02-23 at 14:09 +0000, "Nordlöw" wrote:
 I have a couple of self-implemented C++ integer sort 
 algorithms lying around in my codebase.

 What is the basic sort algorithm? Radix sort is generally seen 
 as the
 best for sorting integer values currently. But that doesn't 
 mean there
 is better, just that that is the one to beat.

 Which requires benchmarks. Which requires a framework for 
 running
 benchmarks. As far as I am aware D hasn't got one of these as 
 yet, but
 it needs one. Such things exists in Python (I am currently 
 playing with
 benchmark) and , I am sure, other dynamic languages. It should 
 be
 relatively easy to do something with D.
 
 I also have a parallel merge sort on top of them that uses 
 Intel TBB to give some
 further speedups.

 std.parallelism needs some work to compete with TBB. Though I 
 am not
 sure "like" is a term I would use for the TBB API.

 I have tweaked them to also work for floats and doubles, 
 through some interesting bit-fiddling tips I found on the net.
 
 Would anybody be interested in reviewing these to give 
 suggestions on how to best port it to Phobos?

 I guess I just volunteered.

The non-inplace radix sort algorithm is the most interesting. I 
have a test and benchmarking suite along with it. Here's an 
extract of it using GCC 4.8.2 with -O3 settings on my Intel 
Quad-Core with 8 Hyperthreads. Results are promising. These 
benchmarks have been verfied to produce correct results:

Element Count: 1000000
Number of tries: 5
Do in place: 0
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float    1947812us 16 4.27x 4.02x 4.87x 4.37x
double   2022670us 16 2.30x 2.09x 3.25x 2.31x
uint8_t  1673208us  8 10.4x 10.0x 9.32x 10.6x
uint16_t 1800614us 16 10.1x 9.37x 9.52x 10.1x
uint32_t 1936704us 16 5.45x 4.99x 5.77x 5.42x

Should I place the code on a Github repo or send it to you by 
email?

Thx,
Per

Feb 23 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

On Sunday, 23 February 2014 at 16:16:48 UTC, Nordlöw wrote:
 On Sunday, 23 February 2014 at 15:10:36 UTC, Russel Winder 
 wrote:
 On Sun, 2014-02-23 at 14:09 +0000, "Nordlöw" wrote:
 I have a couple of self-implemented C++ integer sort 
 algorithms lying around in my codebase.

 What is the basic sort algorithm? Radix sort is generally seen 
 as the
 best for sorting integer values currently. But that doesn't 
 mean there
 is better, just that that is the one to beat.

 Which requires benchmarks. Which requires a framework for 
 running
 benchmarks. As far as I am aware D hasn't got one of these as 
 yet, but
 it needs one. Such things exists in Python (I am currently 
 playing with
 benchmark) and , I am sure, other dynamic languages. It should 
 be
 relatively easy to do something with D.
 
 I also have a parallel merge sort on top of them that uses 
 Intel TBB to give some
 further speedups.

 std.parallelism needs some work to compete with TBB. Though I 
 am not
 sure "like" is a term I would use for the TBB API.

 I have tweaked them to also work for floats and doubles, 
 through some interesting bit-fiddling tips I found on the net.
 
 Would anybody be interested in reviewing these to give 
 suggestions on how to best port it to Phobos?

 I guess I just volunteered.

 The non-inplace radix sort algorithm is the most interesting. I 
 have a test and benchmarking suite along with it. Here's an 
 extract of it using GCC 4.8.2 with -O3 settings on my Intel 
 Quad-Core with 8 Hyperthreads. Results are promising. These 
 benchmarks have been verfied to produce correct results:

 Element Count: 1000000
 Number of tries: 5
 Do in place: 0
 ElementType Reference(std::sort) Radix radix_sort 
 descending_radix_sort parallel_radix_sort 
 tbb_parallel_radix_sort
 float    1947812us 16 4.27x 4.02x 4.87x 4.37x
 double   2022670us 16 2.30x 2.09x 3.25x 2.31x
 uint8_t  1673208us  8 10.4x 10.0x 9.32x 10.6x
 uint16_t 1800614us 16 10.1x 9.37x 9.52x 10.1x
 uint32_t 1936704us 16 5.45x 4.99x 5.77x 5.42x

 Should I place the code on a Github repo or send it to you by 
 email?

 Thx,
 Per

After having take a look at algorithm.d I guess we should use 
CTFE-logic to autoconfigure to use radix sort when ElementType 
isInteger or float or double. string and real could probably be 
sorted aswell.

To make this configuration optimal an optional pre-configuration 
benchmarking pass may be need in order to give optimal speeds. 
Such a pass should for each sort implementation figure out a 
suitable limit [low,high] for the size range being sorted.

/Per

Feb 23 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

I did some more benchmarking:

These are all using non-inplace radix sort.
This can be useful in some cases where in-place is not needed.

Performance gain increase with size input array compared to 
std::sort as radix sort in "some regard" is O(n) and quick sort 
is O(n*log n).

Break sizes (when radix sort wins) for different element types:
uint8: <1024
uint16: <1024
uint32: >= 4096
float: >= 8192
double: >= 16384

Number of tries: 3
Do in place: 0
Element Count: 1024
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 2390us 16 0.483x 0.892x 0.386x 0.898x
double 809us 16 0.152x 0.155x 0.059x 0.0822x
uint8_t 1104us 8 10.9x 10.9x 1.65x 2.9x
uint16_t 1813us 16 1.38x 1.4x 0.889x 1.44x
uint32_t 707us 16 0.286x 0.281x 0.137x 0.231x
Element Count: 2048
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 1511us 16 0.518x 0.513x 0.21x 0.28x
double 1520us 16 0.268x 0.27x 0.0997x 0.144x
uint8_t 2603us 8 11.8x 11.8x 3.81x 12.2x
uint16_t 1451us 16 1.03x 1.1x 0.675x 1.03x
uint32_t 1531us 16 0.592x 0.575x 0.44x 0.583x
Element Count: 4096
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 3099us 16 0.939x 0.904x 0.785x 0.925x
double 3411us 16 0.504x 0.483x 0.13x 0.392x
uint8_t 2777us 8 6.69x 6.64x 1.7x 7.37x
uint16_t 3133us 16 1.91x 1.9x 0.458x 1.46x
uint32_t 3681us 16 1.15x 1.14x 0.229x 0.539x
Element Count: 8192
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 6760us 16 1.49x 1.54x 0.422x 0.921x
double 7285us 16 0.84x 0.825x 0.333x 0.897x
uint8_t 5895us 8 7.19x 6.89x 5.59x 7.77x
uint16_t 6730us 16 3.18x 3.23x 1.5x 3.52x
uint32_t 7095us 16 1.75x 1.7x 0.4x 1.89x
Element Count: 16384
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 14410us 16 2.28x 2.19x 0.732x 2.29x
double 15066us 16 1.19x 1.08x 0.539x 1.23x
uint8_t 12104us 8 7.8x 7.65x 7.22x 7.84x
uint16_t 14570us 16 4.81x 4.69x 0.906x 4.96x
uint32_t 15034us 16 2.61x 2.48x 0.537x 2.6x
Element Count: 32768
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 30764us 16 2.82x 2.62x 2.61x 2.67x
double 31569us 16 1.6x 1.46x 0.764x 1.58x
uint8_t 26176us 8 8.23x 8.52x 2.8x 8.49x
uint16_t 30947us 16 6.45x 6.13x 6.32x 6.91x
uint32_t 32785us 16 3.53x 3.39x 1.09x 3.89x
Element Count: 65536
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 63166us 16 3.24x 3.12x 2.12x 3.25x
double 71154us 16 1.83x 1.81x 1.66x 2.02x
uint8_t 56285us 8 8.52x 8.43x 4.22x 9.35x
uint16_t 64761us 16 7.68x 8.02x 2.54x 8.32x
uint32_t 68576us 16 4.16x 3.93x 2.45x 4.4x
Element Count: 131072
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 137296us 16 3.42x 3.47x 3.13x 3.73x
double 139881us 16 2.08x 1.85x 1.93x 2.08x
uint8_t 111393us 8 9.04x 9.15x 5.99x 9.45x
uint16_t 128848us 16 8.47x 8.12x 4.65x 8.53x
uint32_t 134586us 16 4.64x 4.29x 2.37x 4.56x
Element Count: 262144
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 275099us 16 3.87x 3.66x 3.02x 3.86x
double 294763us 16 2.22x 1.93x 2.82x 2.18x
uint8_t 233671us 8 9.53x 9.65x 5.06x 9.92x
uint16_t 259553us 16 8.93x 8.43x 5.6x 9.33x
uint32_t 282111us 16 5.16x 4.57x 4.37x 5.09x
Element Count: 524288
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 573934us 16 3.96x 3.8x 3.25x 4.05x
double 622996us 16 2.16x 2.05x 2.74x 2.29x
uint8_t 482522us 8 10x 9.82x 5.71x 10.1x
uint16_t 525446us 16 9.3x 8.76x 5.64x 9.49x
uint32_t 595237us 16 5.42x 4.95x 4.86x 5.44x
Element Count: 1048576
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 1202785us 16 4.22x 3.99x 4.9x 4.24x
double 1322048us 16 2.39x 2.13x 3.51x 2.45x
uint8_t 1009387us 8 10.3x 10.2x 8.83x 10.5x
uint16_t 1070777us 16 9.6x 9x 9.45x 9.59x
uint32_t 1242144us 16 5.37x 5.09x 6.07x 5.54x
Element Count: 2097152
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 2539646us 16 4.42x 4.15x 5.77x 4.46x
double 2686074us 16 2.46x 2.19x 3.64x 2.45x
uint8_t 2136319us 8 10.8x 10.7x 9.93x 10.8x
uint16_t 2199277us 16 9.7x 9x 11.1x 9.6x
uint32_t 2619062us 16 5.78x 5.28x 7.01x 5.83x
Element Count: 4194304
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 5388683us 16 4.64x 4.46x 6.76x 4.72x
double 5617282us 16 2.54x 2.29x 4x 2.56x
uint8_t 4449453us 8 11.1x 10.9x 12.7x 11.2x
uint16_t 4546878us 16 9.96x 9.24x 12.8x 9.91x
uint32_t 5466031us 16 5.87x 5.43x 7.94x 5.92x
Element Count: 8388608
ElementType Reference(std::sort) Radix radix_sort 
descending_radix_sort parallel_radix_sort tbb_parallel_radix_sort
float 11029468us 16 4.73x 4.51x 6.91x 4.75x
double 11801211us 16 2.65x 2.38x 4.21x 2.62x
uint8_t 9363439us 8 11.4x 11.3x 13.5x 11.4x
uint16_t 9569329us 16 10.2x 9.4x 14x 10x
uint32_t 11413279us 16 6.05x 5.56x 8.58x 5.94x

Feb 23 2014

Russel Winder <russel winder.org.uk> writes:

On Sun, 2014-02-23 at 16:59 +0000, "Nordlöw" wrote:
 I did some more benchmarking:

We definitely need a benchmarking framework so as to ensure we can have
multiple runs of each datapoint calculating mean and standard deviation
for each data point, and for grouping and ANOVA. I am guessing there
isn't already one.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

Feb 23 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

I guess working on a branch of Phobos (std.algorithm) is 
preferrable when we get to the D implementation right?

Feb 23 2014

Russel Winder <russel winder.org.uk> writes:

On Sun, 2014-02-23 at 17:33 +0000, "Nordlöw" wrote:
 I guess working on a branch of Phobos (std.algorithm) is 
 preferrable when we get to the D implementation right?

I just asked effectively that question re std.parallelism on the DMD
Concurrency list, only to find I wasn't a member so the email is pending
moderation.

I am hoping there is a way of working on a self-contained component of
Phobos without having to clone Phobos and do a feature branch.
-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

Feb 23 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

I guess this component would be integrated into std.algorithm 
when its ready, right?

Feb 23 2014

"francesco cattoglio" <francesco.cattoglio gmail.com> writes:

On Sunday, 23 February 2014 at 19:17:29 UTC, Nordlöw wrote:
 I guess this component would be integrated into std.algorithm 
 when its ready, right?

If you are happy with it being in there, if the algorithm has a 
nice behaviour, and it passes the existing test suite, I don't 
see why it would not!

Feb 23 2014

Russel Winder <russel winder.org.uk> writes:

On Sun, 2014-02-23 at 16:16 +0000, "Nordlöw" wrote:
[…]
 
 Should I place the code on a Github repo or send it to you by 
 email?

BitBucket or GitHub work for me: if we keep this public others may join
in…

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

Feb 23 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

 I guess I just volunteered.

You can find a prel version working for 8,16,32 and 64-bit signed 
and unsigned integers at

https://github.com/nordlow/justd/blob/master/intsort.d

I'll fix float and double support soon.

Feb 24 2014

D Programming

C/C++ Programming

Other

digitalmars.D - Porting my Integer Sorting Algorithms to D