digitalmars.D - [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.
- Ilya Yaroshenko (49/49) Nov 26 2016 Hey,
- Andrei Alexandrescu (6/12) Nov 26 2016 Congrats! Also thanks for using the Boost license which would allow
- Ilya Yaroshenko (11/28) Nov 26 2016 Yep, I am using Phobos parts and we can do backports. In other
- Joseph Rushton Wakeling (6/8) Nov 26 2016 There's no strong reason for those checks to be done via
- Guillaume Piolat (8/14) Nov 27 2016 Without druntime, global ctor/dtor and TLS can't be used too.
- Andrei Alexandrescu (4/19) Nov 27 2016 That would be progress.
- Guillaume Piolat (8/15) Nov 27 2016 Same story for core.cpuid which is initialized by a shared static
- Ilya Yaroshenko (4/23) Nov 27 2016 They are in Mir CPUID, but methods are not pure.
- Steven Schveighoffer (9/23) Nov 28 2016 How we do a similar thing for core.time:
- Jacob Carlborg (4/5) Nov 28 2016 Isn't TLS initialized by the OS?
- Joseph Rushton Wakeling (53/56) Dec 13 2016 I've finally found a moment to look into this (I'm at home
- Joseph Rushton Wakeling (48/52) Dec 13 2016 Benchmark is here:
- Ilya Yaroshenko (7/64) Dec 13 2016 My own.
- Ilya Yaroshenko (2/4) Dec 13 2016 Using -> Uint
- Joseph Rushton Wakeling (24/40) Dec 13 2016 Congratulations, then -- I think this is a very interesting
- Ilya Yaroshenko (3/8) Dec 14 2016 Because memory access locality. z is always hot.
- Joseph Rushton Wakeling (20/29) Jan 07 2017 PR is now on Phobos, and in the process of looking at this
- Andrei Alexandrescu (4/12) Jan 07 2017 This indicates a compiler bug in dmd itself, so the best outcome for
- Joseph Rushton Wakeling (25/28) Jan 08 2017 Yes, that much is clear -- sorry if I wasn't clear enough myself.
- Joseph Rushton Wakeling (31/34) Jan 08 2017 I underestimated myself :-P Minimal example is as follows:
- jmh530 (3/18) Nov 26 2016 How does a mir-lapack fit in to the plan going forward?
- Ilya Yaroshenko (3/25) Nov 26 2016 LAPACK API will be a part of Mir GLAS
- jmh530 (3/5) Nov 27 2016 Great. I might have asked before, apologies if so.
- =?UTF-8?B?Tm9yZGzDtnc=?= (3/7) Nov 29 2016 Does Gb mean Gigabytes or Gigabits?
- Ilya Yaroshenko (2/9) Nov 29 2016 Gigabits
- Joseph Rushton Wakeling (18/21) Dec 14 2016 A note on this. The `opCall` (or, in the range version,
Hey, 32-bit Mt19937 random number Generator is default in Phobos. It is default in Mir too, except that 64-bit targets use 64-bit Mt19937 instead. The last Mir Random beta improves performance for Mt19937. The goal was to: 1. Improve RNG generation performance by making code more friendly for CPU pipelining. Tempering (finalization) operations was mixed with internal payload update operations. 2. Reduce number of required comparison operations using unification with add/sub operations. The order of payload iteration was reversed (the engine generates the same numbers as before). 3. Reduce memory access twice both for generation (required for 1) and initialization. This will make the algorithm more cache-friendly for real world applications. Ditto (1) Bench results: ---- mir.random 32-bit Mt19937: 6.80851 Gb/s mir.random 64-bit Mt19937: 12.5984 Gb/s std.random 32-bit Mt19937: 4.41989 Gb/s std.random 64-bit Mt19937: wrong initialization and tempering (finalization) ---- 54%: 1.54 = 6.80851 / 4.41989 185%: 2.85 = 12.5984 / 4.41989 The benchmark can be found here: https://github.com/libmir/mir-random/blob/master/bench/Mt19937_bench.d Ads: We are looking for new contributors and partners! Star and contribute! https://github.com/libmir : - mir - Generic numeric library - dcv - D Computer Vision library - mir-glas - Linear Algebra Subprograms (written in D) - mir-random - random numbers generators, includes non-uniform distributions. - mir-cpuid - CPU identification Future: - mir-runtime - lightweight always inlined `nothrow nogc` Dlang Runtime - mir-neural - neural network library [WIP] - mir-fft - Fast and multidimensional FFT - mir-svm - support vector machines Cheers, Ilya & Mir Team
Nov 26 2016
On 11/26/16 11:31 AM, Ilya Yaroshenko wrote:Hey, 32-bit Mt19937 random number Generator is default in Phobos. It is default in Mir too, except that 64-bit targets use 64-bit Mt19937 instead. The last Mir Random beta improves performance for Mt19937. The goal was to:Congrats! Also thanks for using the Boost license which would allow backporting the improvements to Phobos. Who'd be up for it? Also I'm thinking of removing std.random's dependency on druntime, e.g. by removing the uses of enforce. Thoughts? Andrei
Nov 26 2016
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei Alexandrescu wrote:On 11/26/16 11:31 AM, Ilya Yaroshenko wrote:Yep, I am using Phobos parts and we can do backports. In other hand it is requires efforts (I have experience with ndslice, and I will be happy to remove ndslice from Mir after LDC 1.2.0 release).Hey, 32-bit Mt19937 random number Generator is default in Phobos. It is default in Mir too, except that 64-bit targets use 64-bit Mt19937 instead. The last Mir Random beta improves performance for Mt19937. The goal was to:Congrats! Also thanks for using the Boost license which would allow backporting the improvements to Phobos. Who'd be up for it?Also I'm thinking of removing std.random's dependency on druntime, e.g. by removing the uses of enforce. Thoughts? AndreiAnother one DRuntime dependency is core time. BTW, I would be happy to have core.time + std.datetime as unified nothrow nogc library without DRuntime dependency (ahah, it is BetterC propaganda!) Ilya
Nov 26 2016
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei Alexandrescu wrote:Also I'm thinking of removing std.random's dependency on druntime, e.g. by removing the uses of enforce. Thoughts?There's no strong reason for those checks to be done via `enforce` except for a design decision that user input should be treated as external input. The user is in complete control of those parameters, there's no reason they can't be `assert` checks.
Nov 26 2016
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei Alexandrescu wrote:Congrats! Also thanks for using the Boost license which would allow backporting the improvements to Phobos. Who'd be up for it? Also I'm thinking of removing std.random's dependency on druntime, e.g. by removing the uses of enforce. Thoughts? AndreiWithout druntime, global ctor/dtor and TLS can't be used too. Besides nothrow nogc, std.random relies on a global ctor for creating the default RNG, which is used liberally. A useful intermediate step is to have these "[shared] static this" ctor call a function instead, so that programs without druntime can call them too.
Nov 27 2016
On 11/27/2016 08:28 AM, Guillaume Piolat wrote:On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei Alexandrescu wrote:I see, so TLS is a problem.Congrats! Also thanks for using the Boost license which would allow backporting the improvements to Phobos. Who'd be up for it? Also I'm thinking of removing std.random's dependency on druntime, e.g. by removing the uses of enforce. Thoughts? AndreiWithout druntime, global ctor/dtor and TLS can't be used too. Besides nothrow nogc, std.random relies on a global ctor for creating the default RNG, which is used liberally.A useful intermediate step is to have these "[shared] static this" ctor call a function instead, so that programs without druntime can call them too.That would be progress. Andrei
Nov 27 2016
On Sunday, 27 November 2016 at 13:35:48 UTC, Andrei Alexandrescu wrote:Same story for core.cpuid which is initialized by a shared static constructor. https://github.com/dlang/druntime/blob/master/src/core/cpuid.d It seems the "shared static this" is allowed to break immutable but a function wouldn't. So maybe the variables here should be private __gshared?.A useful intermediate step is to have these "[shared] static this" ctor call a function instead, so that programs without druntime can call them too.That would be progress. Andrei
Nov 27 2016
On Sunday, 27 November 2016 at 13:58:29 UTC, Guillaume Piolat wrote:On Sunday, 27 November 2016 at 13:35:48 UTC, Andrei Alexandrescu wrote:They are in Mir CPUID, but methods are not pure. https://github.com/libmir/mir-cpuid/blob/master/source/cpuid/x86_any.dSame story for core.cpuid which is initialized by a shared static constructor. https://github.com/dlang/druntime/blob/master/src/core/cpuid.d It seems the "shared static this" is allowed to break immutable but a function wouldn't. So maybe the variables here should be private __gshared?.A useful intermediate step is to have these "[shared] static this" ctor call a function instead, so that programs without druntime can call them too.That would be progress. Andrei
Nov 27 2016
On 11/27/16 8:58 AM, Guillaume Piolat wrote:On Sunday, 27 November 2016 at 13:35:48 UTC, Andrei Alexandrescu wrote:How we do a similar thing for core.time: https://github.com/dlang/druntime/blob/master/src/core/time.d#L2388 This function is called from the druntime startup (before any static ctors). Not sure if it helps or not. Note that we can cheat the type system here because we know that nothing else will access the data before it's initialized. There is also a check in the function to ensure it's only done once. -SteveSame story for core.cpuid which is initialized by a shared static constructor. https://github.com/dlang/druntime/blob/master/src/core/cpuid.d It seems the "shared static this" is allowed to break immutable but a function wouldn't. So maybe the variables here should be private __gshared?.A useful intermediate step is to have these "[shared] static this" ctor call a function instead, so that programs without druntime can call them too.That would be progress.
Nov 28 2016
On 2016-11-27 14:28, Guillaume Piolat wrote:Without druntime, global ctor/dtor and TLS can't be used too.Isn't TLS initialized by the OS? -- /Jacob Carlborg
Nov 28 2016
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei Alexandrescu wrote:Congrats! Also thanks for using the Boost license which would allow backporting the improvements to Phobos. Who'd be up for it?I've finally found a moment to look into this (I'm at home recovering from a seasonal virus, which ironically puts some time and mental space in my hands). It makes for an interesting comparison. The traditional Mersenne Twister algorithm generates N random words in one go, and then iterates over them, before generating another N random words, and so on. (Word == unsigned integer type.) By contrast Ilya's implementation just updates the single next entry in the static array that stores the state of the engine. This would seem to work out cheaper on average: perhaps because the update-a-variate code stays hotter, perhaps because it involves less branching and looping. It certainly makes for a simpler, easier to follow update. However, it might be a good idea to benchmark this in a context where there are a lot of random variates being generated, but where other things are also happening that might render the update code less "hot" than a straightforward `foreach` loop benchmark. (Maybe try it out with a Monte Carlo simulation ...?) The tradeoff between many cheap updates and one expensive one, versus all updates being a little more expensive but less on average, might not be viable in some practical use-cases. The Phobos implementation also includes a check on whether the generator is properly seeded, which is performed in every `popFront()`, which has a reasonably significant impact. Ilya's implementation can get away with not performing that check because it uses ` disable this();` to remove the possibility of initializing the generator without it being properly seeded. The impact of that check can possibly be lessened by making it a boolean comparison rather than the existing `size_t` equality comparison. -- action item here: could we deprecate `this()` in Phobos as a precursor to removing the opportunity to instantiate a generator without properly initializing it? I'm going to try to put together a range-based version to see if this also makes any difference. I'll post some benchmarks of my own once that's done, and if all looks good I'll try to put a Phobos PR together. A few questions -- not blockers, but nice to understand: * Ilya, is this implementation your own design, or do you have a reference for the rationale behind this revised implementation of MT? * Is there a particular reason why the `index` variable is a `Uint` (i.e. the word type of the generator) rather than a `size_t`? I presume there's a motivation, since the casting back and forth in `opCall` would otherwise seem inconvenient. * Is there a motivation for the reverse iteration through the RNG state array?
Dec 13 2016
On Tuesday, 13 December 2016 at 18:15:25 UTC, Joseph Rushton Wakeling wrote:I'm going to try to put together a range-based version to see if this also makes any difference. I'll post some benchmarks of my own once that's done, and if all looks good I'll try to put a Phobos PR together.Benchmark is here: https://github.com/WebDrake/mersenne-twister-range Overall these benchmarks seem to have reasonably wide variance. I note that the original benchmark's attempt to make the code "hot" had quite a strong impact in terms of the ordering of individual benchmarks; I've tried to work around that by doing the "make it hot" loop immediately before the benchmark. This is still just a plain old foreach, so it may be worth exploring some benchmarks where other stuff is done with the random variates. As a first step I've added some benchmarks using `uniform`. Sample output from 2 different runs of these benchmarks: ----- Running ./mir-mt-range opCall Mir 32-bit Mt19937: 6.68058 Gb/s check sum = 429472035921730457 range Mir 32-bit Mt19937: 6.639 Gb/s check sum = 429472031909629874 opCall Mir 64-bit Mt19937: 14.0351 Gb/s check sum = 5606740663277085587 range Mir 64-bit Mt19937: 13.8229 Gb/s check sum = 3636092631268073456 Phobos 32-bit Mt19937: 4.22164 Gb/s check sum = 429472035921730457 Phobos 64-bit Mt19937: wrong initialization and tempering range Mir 32-bit Mt19937: 95.4198 U(0.0, 1.0) variates/s range Mir 64-bit Mt19937: 87.184 U(0.0, 1.0) variates/s Phobos 32-bit Mt19937: 80.8407 U(0.0, 1.0) variates/s ---- ---- Running ./mir-mt-range opCall Mir 32-bit Mt19937: 6.36183 Gb/s check sum = 429472035921730457 range Mir 32-bit Mt19937: 6.54397 Gb/s check sum = 429472031909629874 opCall Mir 64-bit Mt19937: 13.7339 Gb/s check sum = 5606740663277085587 range Mir 64-bit Mt19937: 13.8229 Gb/s check sum = 3636092631268073456 Phobos 32-bit Mt19937: 4.18848 Gb/s check sum = 429472035921730457 Phobos 64-bit Mt19937: wrong initialization and tempering range Mir 32-bit Mt19937: 95.2381 U(0.0, 1.0) variates/s range Mir 64-bit Mt19937: 97.0874 U(0.0, 1.0) variates/s Phobos 32-bit Mt19937: 88.1057 U(0.0, 1.0) variates/s ----
Dec 13 2016
On Tuesday, 13 December 2016 at 18:15:25 UTC, Joseph Rushton Wakeling wrote:On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei Alexandrescu wrote:My own.Congrats! Also thanks for using the Boost license which would allow backporting the improvements to Phobos. Who'd be up for it?I've finally found a moment to look into this (I'm at home recovering from a seasonal virus, which ironically puts some time and mental space in my hands). It makes for an interesting comparison. The traditional Mersenne Twister algorithm generates N random words in one go, and then iterates over them, before generating another N random words, and so on. (Word == unsigned integer type.) By contrast Ilya's implementation just updates the single next entry in the static array that stores the state of the engine. This would seem to work out cheaper on average: perhaps because the update-a-variate code stays hotter, perhaps because it involves less branching and looping. It certainly makes for a simpler, easier to follow update. However, it might be a good idea to benchmark this in a context where there are a lot of random variates being generated, but where other things are also happening that might render the update code less "hot" than a straightforward `foreach` loop benchmark. (Maybe try it out with a Monte Carlo simulation ...?) The tradeoff between many cheap updates and one expensive one, versus all updates being a little more expensive but less on average, might not be viable in some practical use-cases. The Phobos implementation also includes a check on whether the generator is properly seeded, which is performed in every `popFront()`, which has a reasonably significant impact. Ilya's implementation can get away with not performing that check because it uses ` disable this();` to remove the possibility of initializing the generator without it being properly seeded. The impact of that check can possibly be lessened by making it a boolean comparison rather than the existing `size_t` equality comparison. -- action item here: could we deprecate `this()` in Phobos as a precursor to removing the opportunity to instantiate a generator without properly initializing it? I'm going to try to put together a range-based version to see if this also makes any difference. I'll post some benchmarks of my own once that's done, and if all looks good I'll try to put a Phobos PR together. A few questions -- not blockers, but nice to understand: * Ilya, is this implementation your own design, or do you have a reference for the rationale behind this revised implementation of MT?* Is there a particular reason why the `index` variable is a `Uint` (i.e. the word type of the generator) rather than a `size_t`? I presume there's a motivation, since the casting back and forth in `opCall` would otherwise seem inconvenient.If I remember correctly it is used with Using, so I use the same type.* Is there a motivation for the reverse iteration through the RNG state array?Yes, it composes add/sub with cmp operations. Ilya
Dec 13 2016
If I remember correctly it is used with Using, so I use the same type.Using -> Uint Sorry, it is phone keyboard
Dec 13 2016
On Tuesday, 13 December 2016 at 23:18:26 UTC, Ilya Yaroshenko wrote:Congratulations, then -- I think this is a very interesting rewrite of the algorithm and I would encourage you to write up a paper on your choice of optimizations, if that is of interest to you.* Ilya, is this implementation your own design, or do you have a reference for the rationale behind this revised implementation of MT?My own.Not anywhere that I can find. `n`, the template parameter from which `this.index` is initialized, is a `size_t`, while the value used inside `opCall` is a `sizediff_t`. TBH I assumed it was related to the bit-packing of the struct, rather than any question of usage.* Is there a particular reason why the `index` variable is a `Uint` (i.e. the word type of the generator) rather than a `size_t`? I presume there's a motivation, since the casting back and forth in `opCall` would otherwise seem inconvenient.If I remember correctly it is used with Uint, so I use the same type.Cool. Last question: IIUC you use the private `_z` parameter as a cache for the most recent `data[index]` value (and AFAICT that's the only use it has). Is there a good reason for doing this, rather than just setting `z = data[index]` inside `opCall` (where `z` is the local parameter inside the method)? I ask because it doesn't seem to make any practical difference to the benchmarking outcomes, and you have to look up `data[index]` anyway when you write it, so it's not obvious to me that it really saves anything in practice. (I confess I haven't looked at the assembly, so I might be missing something.) Thanks for the answers, and apologies for my slow follow-up on some of the things we discussed earlier related to mir.random -- it's been a bit of a busy patch ... :-\* Is there a motivation for the reverse iteration through the RNG state array?Yes, it composes add/sub with cmp operations.
Dec 13 2016
On Wednesday, 14 December 2016 at 00:18:06 UTC, Joseph Rushton Wakeling wrote:Cool. Last question: IIUC you use the private `_z` parameter as a cache for the most recent `data[index]` value (and AFAICT that's the only use it has). Is there a good reason for doing this, rather than just setting `z = data[index]` inside `opCall` (where `z` is the local parameter inside the method)?Because memory access locality. z is always hot.
Dec 14 2016
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei Alexandrescu wrote:On 11/26/16 11:31 AM, Ilya Yaroshenko wrote:PR is now on Phobos, and in the process of looking at this together, Ilya and I resolved a number of issues both in this code and in mir.random: https://github.com/dlang/phobos/pull/5011 There is unfortunately one serious blocker to this PR. In order to allow it to be a drop-in replacement for the previous Phobos implementation, I had to find a means to default-initialize its internal state at compile time so that it would match what would be there if it was explicitly seeded with the default seed. The method I developed works fine with LDC, but fails with DMD: the internal state of the generator winds up as all zeros, except for the `State.z` parameter which mysteriously ends up at the correct value. This would suggest that somehow the generated state is being overwritten _after_ being correctly generated, rather than not correctly generated in the first place. Any chance that someone with more CTFE-fu than myself could take a glance and advise what might be wrong (or how to work around it)?Hey, 32-bit Mt19937 random number Generator is default in Phobos. It is default in Mir too, except that 64-bit targets use 64-bit Mt19937 instead.Congrats! Also thanks for using the Boost license which would allow backporting the improvements to Phobos. Who'd be up for it?
Jan 07 2017
On 1/7/17 6:16 PM, Joseph Rushton Wakeling wrote:The method I developed works fine with LDC, but fails with DMD: the internal state of the generator winds up as all zeros, except for the `State.z` parameter which mysteriously ends up at the correct value. This would suggest that somehow the generated state is being overwritten _after_ being correctly generated, rather than not correctly generated in the first place. Any chance that someone with more CTFE-fu than myself could take a glance and advise what might be wrong (or how to work around it)?This indicates a compiler bug in dmd itself, so the best outcome for this library work would be a reduced compiler bug + a simple library workaround. -- Andrei
Jan 07 2017
On Sunday, 8 January 2017 at 02:51:51 UTC, Andrei Alexandrescu wrote:This indicates a compiler bug in dmd itself, so the best outcome for this library work would be a reduced compiler bug + a simple library workaround. -- AndreiYes, that much is clear -- sorry if I wasn't clear enough myself. I'm asking for eyes on the problem because reducing it to a minimal example appears non-trivial, while the bug itself looks serious beyond its effect on this PR. For what it's worth: * it appears to be a backend issue, not frontend (using dmd with the same frontend as the latest ldc doesn't make a difference); * creating an `enum` manifest constant from the CTFE function works fine, but assigning to the runtime struct field from that enum produces the same mostly-zeros result; * it definitely doesn't appear to be something as trivial as just default initializing a struct instance (or a struct instance including a static array of more than a certain size) being the problem. I will keep trying to reduce it to a minimal example, but CTFE-related backend bugs are not something I have a lot of experience with tracking down, hence the request for help. Would filing an issue still be helpful at this stage, even without a minimal example?
Jan 08 2017
On Sunday, 8 January 2017 at 13:16:29 UTC, Joseph Rushton Wakeling wrote:I'm asking for eyes on the problem because reducing it to a minimal example appears non-trivial, while the bug itself looks serious beyond its effect on this PR.I underestimated myself :-P Minimal example is as follows: ///////////////////////////////////////////////////////////////// struct Inner { uint value = void; // <=== removing this `void` initializer // removes the problem } struct Outer { Inner inner = defaultInner(); static Inner defaultInner() { if (!__ctfe) assert(false); Inner inn; inn.value = 23; return inn; } } void main() { import std.stdio : writeln; Outer outer; outer.inner.writeln; // outputs `Inner(0)` with dmd // but should be `Inner(23)` // ldc gets it right ;-) } ///////////////////////////////////////////////////////////////// So, the bug comes down to priority of default initialization. I'll follow an issue later today and fix the PR accordingly.
Jan 08 2017
On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko wrote:https://github.com/libmir : - mir - Generic numeric library - dcv - D Computer Vision library - mir-glas - Linear Algebra Subprograms (written in D) - mir-random - random numbers generators, includes non-uniform distributions. - mir-cpuid - CPU identification Future: - mir-runtime - lightweight always inlined `nothrow nogc` Dlang Runtime - mir-neural - neural network library [WIP] - mir-fft - Fast and multidimensional FFT - mir-svm - support vector machines Cheers, Ilya & Mir TeamHow does a mir-lapack fit in to the plan going forward?
Nov 26 2016
On Sunday, 27 November 2016 at 02:28:06 UTC, jmh530 wrote:On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko wrote:LAPACK API will be a part of Mir GLAS https://github.com/libmir/mir-glas --Ilyahttps://github.com/libmir : - mir - Generic numeric library - dcv - D Computer Vision library - mir-glas - Linear Algebra Subprograms (written in D) - mir-random - random numbers generators, includes non-uniform distributions. - mir-cpuid - CPU identification Future: - mir-runtime - lightweight always inlined `nothrow nogc` Dlang Runtime - mir-neural - neural network library [WIP] - mir-fft - Fast and multidimensional FFT - mir-svm - support vector machines Cheers, Ilya & Mir TeamHow does a mir-lapack fit in to the plan going forward?
Nov 26 2016
On Sunday, 27 November 2016 at 06:46:20 UTC, Ilya Yaroshenko wrote:LAPACK API will be a part of Mir GLAS https://github.com/libmir/mir-glas --IlyaGreat. I might have asked before, apologies if so.
Nov 27 2016
On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko wrote:Bench results: ---- mir.random 32-bit Mt19937: 6.80851 Gb/sDoes Gb mean Gigabytes or Gigabits?
Nov 29 2016
On Tuesday, 29 November 2016 at 16:54:55 UTC, Nordlöw wrote:On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko wrote:GigabitsBench results: ---- mir.random 32-bit Mt19937: 6.80851 Gb/sDoes Gb mean Gigabytes or Gigabits?
Nov 29 2016
On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko wrote:1. Improve RNG generation performance by making code more friendly for CPU pipelining. Tempering (finalization) operations was mixed with internal payload update operations.A note on this. The `opCall` (or, in the range version, `popFront`) of Ilya's implementation mixes together two superficially independent actions: (1) calculating the current random variate from the current index of the internal state array; (2) updating the current index of the internal state array, and moving to the next entry. It's straightforward to split out these two procedures into two separate methods (or at least two clearly separated sequences within the `opCall`), but doing so results in a notable performance hit (on my machine, something in the order of 1 GB/s less random bits). Intertwining these steps in this way is therefore a very smart optimization (although TBH it feels a little worrying that it's necessary).
Dec 14 2016