digitalmars.D - [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

Ilya Yaroshenko (49/49) Nov 26 2016 Hey,

Andrei Alexandrescu (6/12) Nov 26 2016 Congrats! Also thanks for using the Boost license which would allow

Ilya Yaroshenko (11/28) Nov 26 2016 Yep, I am using Phobos parts and we can do backports. In other
Joseph Rushton Wakeling (6/8) Nov 26 2016 There's no strong reason for those checks to be done via
Guillaume Piolat (8/14) Nov 27 2016 Without druntime, global ctor/dtor and TLS can't be used too.

Andrei Alexandrescu (4/19) Nov 27 2016 That would be progress.

Guillaume Piolat (8/15) Nov 27 2016 Same story for core.cpuid which is initialized by a shared static

Ilya Yaroshenko (4/23) Nov 27 2016 They are in Mir CPUID, but methods are not pure.
Steven Schveighoffer (9/23) Nov 28 2016 How we do a similar thing for core.time:

Jacob Carlborg (4/5) Nov 28 2016 Isn't TLS initialized by the OS?

Joseph Rushton Wakeling (53/56) Dec 13 2016 I've finally found a moment to look into this (I'm at home

Joseph Rushton Wakeling (48/52) Dec 13 2016 Benchmark is here:
Ilya Yaroshenko (7/64) Dec 13 2016 My own.

Ilya Yaroshenko (2/4) Dec 13 2016 Using -> Uint
Joseph Rushton Wakeling (24/40) Dec 13 2016 Congratulations, then -- I think this is a very interesting

Ilya Yaroshenko (3/8) Dec 14 2016 Because memory access locality. z is always hot.

Joseph Rushton Wakeling (20/29) Jan 07 2017 PR is now on Phobos, and in the process of looking at this

Andrei Alexandrescu (4/12) Jan 07 2017 This indicates a compiler bug in dmd itself, so the best outcome for

Joseph Rushton Wakeling (25/28) Jan 08 2017 Yes, that much is clear -- sorry if I wasn't clear enough myself.

Joseph Rushton Wakeling (31/34) Jan 08 2017 I underestimated myself :-P Minimal example is as follows:

jmh530 (3/18) Nov 26 2016 How does a mir-lapack fit in to the plan going forward?

Ilya Yaroshenko (3/25) Nov 26 2016 LAPACK API will be a part of Mir GLAS

jmh530 (3/5) Nov 27 2016 Great. I might have asked before, apologies if so.

=?UTF-8?B?Tm9yZGzDtnc=?= (3/7) Nov 29 2016 Does Gb mean Gigabytes or Gigabits?

Ilya Yaroshenko (2/9) Nov 29 2016 Gigabits

Joseph Rushton Wakeling (18/21) Dec 14 2016 A note on this. The `opCall` (or, in the range version,

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

Hey,

32-bit Mt19937 random number Generator is default in Phobos.
It is default in Mir too, except that 64-bit targets use 64-bit 
Mt19937 instead.

The last Mir Random beta improves performance for Mt19937.

The goal was to:

1. Improve RNG generation performance by making code more 
friendly for CPU pipelining. Tempering (finalization) operations 
was mixed with internal payload update operations.

2. Reduce number of required comparison operations using 
unification with add/sub operations. The order of payload 
iteration was reversed (the engine generates the same numbers as 
before).

3. Reduce memory access twice both for generation (required for 
1) and initialization. This will make the algorithm more 
cache-friendly for real world applications. Ditto (1)

Bench results:
----
mir.random 32-bit Mt19937:
     6.80851 Gb/s

mir.random 64-bit Mt19937:
    12.5984 Gb/s

std.random 32-bit Mt19937:
    4.41989 Gb/s

std.random 64-bit Mt19937:
     wrong initialization and tempering (finalization)
----
54%: 1.54 = 6.80851 / 4.41989
185%: 2.85 = 12.5984 / 4.41989

The benchmark can be found here:
https://github.com/libmir/mir-random/blob/master/bench/Mt19937_bench.d

Ads:
We are looking for new contributors and partners! Star and 
contribute!

https://github.com/libmir :

  - mir - Generic numeric library
  - dcv - D Computer Vision library
  - mir-glas - Linear Algebra Subprograms (written in D)
  - mir-random - random numbers generators, includes non-uniform 
distributions.
  - mir-cpuid - CPU identification

Future:
  - mir-runtime - lightweight always inlined `nothrow  nogc` Dlang 
Runtime
  - mir-neural - neural network library [WIP]
  - mir-fft - Fast and multidimensional FFT
  - mir-svm - support vector machines

Cheers,
Ilya & Mir Team

Nov 26 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/26/16 11:31 AM, Ilya Yaroshenko wrote:
 Hey,

 32-bit Mt19937 random number Generator is default in Phobos.
 It is default in Mir too, except that 64-bit targets use 64-bit Mt19937
 instead.

 The last Mir Random beta improves performance for Mt19937.

 The goal was to:

Congrats! Also thanks for using the Boost license which would allow 
backporting the improvements to Phobos. Who'd be up for it?

Also I'm thinking of removing std.random's dependency on druntime, e.g. 
by removing the uses of enforce. Thoughts?

Andrei

Nov 26 2016

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:
 On 11/26/16 11:31 AM, Ilya Yaroshenko wrote:
 Hey,

 32-bit Mt19937 random number Generator is default in Phobos.
 It is default in Mir too, except that 64-bit targets use 
 64-bit Mt19937
 instead.

 The last Mir Random beta improves performance for Mt19937.

 The goal was to:

 Congrats! Also thanks for using the Boost license which would 
 allow backporting the improvements to Phobos. Who'd be up for 
 it?

Yep, I am using Phobos parts and we can do backports. In other 
hand it is requires efforts (I have experience with ndslice, and 
I will be happy to remove ndslice from Mir after LDC 1.2.0 
release).

 Also I'm thinking of removing std.random's dependency on 
 druntime, e.g. by removing the uses of enforce. Thoughts?

 Andrei

Another one DRuntime dependency is core time.

BTW, I would be happy to have core.time + std.datetime as unified 
nothrow  nogc library without DRuntime dependency (ahah, it is 
BetterC propaganda!)

Ilya

Nov 26 2016

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:
 Also I'm thinking of removing std.random's dependency on 
 druntime, e.g. by removing the uses of enforce. Thoughts?

There's no strong reason for those checks to be done via 
`enforce` except for a design decision that user input should be 
treated as external input.  The user is in complete control of 
those parameters, there's no reason they can't be `assert` checks.

Nov 26 2016

Guillaume Piolat <first.last gmail.com> writes:

On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:
 Congrats! Also thanks for using the Boost license which would 
 allow backporting the improvements to Phobos. Who'd be up for 
 it?

 Also I'm thinking of removing std.random's dependency on 
 druntime, e.g. by removing the uses of enforce. Thoughts?

 Andrei

Without druntime, global ctor/dtor and TLS can't be used too.
Besides nothrow  nogc, std.random relies on a global ctor for 
creating the default RNG, which is used liberally.
A useful intermediate step is to have these "[shared] static 
this" ctor call a function instead, so that programs without 
druntime can call them too.

Nov 27 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/27/2016 08:28 AM, Guillaume Piolat wrote:
 On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei Alexandrescu wrote:
 Congrats! Also thanks for using the Boost license which would allow
 backporting the improvements to Phobos. Who'd be up for it?

 Also I'm thinking of removing std.random's dependency on druntime,
 e.g. by removing the uses of enforce. Thoughts?

 Andrei

 Without druntime, global ctor/dtor and TLS can't be used too.
 Besides nothrow  nogc, std.random relies on a global ctor for creating
 the default RNG, which is used liberally.

I see, so TLS is a problem.

 A useful intermediate step is to have these "[shared] static this" ctor
 call a function instead, so that programs without druntime can call them
 too.

That would be progress.


Andrei

Nov 27 2016

Guillaume Piolat <first.last gmail.com> writes:

On Sunday, 27 November 2016 at 13:35:48 UTC, Andrei Alexandrescu 
wrote:
 A useful intermediate step is to have these "[shared] static 
 this" ctor
 call a function instead, so that programs without druntime can 
 call them
 too.

 That would be progress.


 Andrei

Same story for core.cpuid which is initialized by a shared static 
constructor.
https://github.com/dlang/druntime/blob/master/src/core/cpuid.d

It seems the "shared static this" is allowed to break immutable 
but a function wouldn't. So maybe the variables here should be 
private __gshared?.

Nov 27 2016

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

On Sunday, 27 November 2016 at 13:58:29 UTC, Guillaume Piolat 
wrote:
 On Sunday, 27 November 2016 at 13:35:48 UTC, Andrei 
 Alexandrescu wrote:
 A useful intermediate step is to have these "[shared] static 
 this" ctor
 call a function instead, so that programs without druntime 
 can call them
 too.

 That would be progress.


 Andrei

 Same story for core.cpuid which is initialized by a shared 
 static constructor.
 https://github.com/dlang/druntime/blob/master/src/core/cpuid.d

 It seems the "shared static this" is allowed to break immutable 
 but a function wouldn't. So maybe the variables here should be 
 private __gshared?.

They are in Mir CPUID, but methods are not pure.

https://github.com/libmir/mir-cpuid/blob/master/source/cpuid/x86_any.d

Nov 27 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 11/27/16 8:58 AM, Guillaume Piolat wrote:
 On Sunday, 27 November 2016 at 13:35:48 UTC, Andrei Alexandrescu wrote:
 A useful intermediate step is to have these "[shared] static this" ctor
 call a function instead, so that programs without druntime can call them
 too.

 That would be progress.

 Same story for core.cpuid which is initialized by a shared static
 constructor.
 https://github.com/dlang/druntime/blob/master/src/core/cpuid.d

 It seems the "shared static this" is allowed to break immutable but a
 function wouldn't. So maybe the variables here should be private
 __gshared?.

How we do a similar thing for core.time:

https://github.com/dlang/druntime/blob/master/src/core/time.d#L2388

This function is called from the druntime startup (before any static ctors).

Not sure if it helps or not.

Note that we can cheat the type system here because we know that nothing 
else will access the data before it's initialized. There is also a check 
in the function to ensure it's only done once.

-Steve

Nov 28 2016

Jacob Carlborg <doob me.com> writes:

On 2016-11-27 14:28, Guillaume Piolat wrote:

 Without druntime, global ctor/dtor and TLS can't be used too.

Isn't TLS initialized by the OS?

-- 
/Jacob Carlborg

Nov 28 2016

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:
 Congrats! Also thanks for using the Boost license which would 
 allow backporting the improvements to Phobos. Who'd be up for 
 it?

I've finally found a moment to look into this (I'm at home 
recovering from a seasonal virus, which ironically puts some time 
and mental space in my hands).

It makes for an interesting comparison.  The traditional Mersenne 
Twister algorithm generates N random words in one go, and then 
iterates over them, before generating another N random words, and 
so on.  (Word == unsigned integer type.)

By contrast Ilya's implementation just updates the single next 
entry in the static array that stores the state of the engine.  
This would seem to work out cheaper on average: perhaps because 
the update-a-variate code stays hotter, perhaps because it 
involves less branching and looping.  It certainly makes for a 
simpler, easier to follow update.

However, it might be a good idea to benchmark this in a context 
where there are a lot of random variates being generated, but 
where other things are also happening that might render the 
update code less "hot" than a straightforward `foreach` loop 
benchmark.  (Maybe try it out with a Monte Carlo simulation ...?) 
  The tradeoff between many cheap updates and one expensive one, 
versus all updates being a little more expensive but less on 
average, might not be viable in some practical use-cases.

The Phobos implementation also includes a check on whether the 
generator is properly seeded, which is performed in every 
`popFront()`, which has a reasonably significant impact.  Ilya's 
implementation can get away with not performing that check 
because it uses ` disable this();` to remove the possibility of 
initializing the generator without it being properly seeded.  The 
impact of that check can possibly be lessened by making it a 
boolean comparison rather than the existing `size_t` equality 
comparison.

   -- action item here: could we deprecate `this()` in Phobos as a 
precursor to removing
      the opportunity to instantiate a generator without properly 
initializing it?

I'm going to try to put together a range-based version to see if 
this also makes any difference.  I'll post some benchmarks of my 
own once that's done, and if all looks good I'll try to put a 
Phobos PR together.

A few questions -- not blockers, but nice to understand:

   *  Ilya, is this implementation your own design, or do you have 
a reference
     for the rationale behind this revised implementation of MT?

   * Is there a particular reason why the `index` variable is a 
`Uint` (i.e.
     the word type of the generator) rather than a `size_t`?  I 
presume there's
     a motivation, since the casting back and forth in `opCall` 
would otherwise
     seem inconvenient.

   * Is there a motivation for the reverse iteration through the 
RNG state array?

Dec 13 2016

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Tuesday, 13 December 2016 at 18:15:25 UTC, Joseph Rushton 
Wakeling wrote:
 I'm going to try to put together a range-based version to see 
 if this also makes any difference.  I'll post some benchmarks 
 of my own once that's done, and if all looks good I'll try to 
 put a Phobos PR together.

Benchmark is here:
https://github.com/WebDrake/mersenne-twister-range

Overall these benchmarks seem to have reasonably wide variance.  
I note that the original benchmark's attempt to make the code 
"hot" had quite a strong impact in terms of the ordering of 
individual benchmarks; I've tried to work around that by doing 
the "make it hot" loop immediately before the benchmark.

This is still just a plain old foreach, so it may be worth 
exploring some benchmarks where other stuff is done with the 
random variates.  As a first step I've added some benchmarks 
using `uniform`.

Sample output from 2 different runs of these benchmarks:

-----
Running ./mir-mt-range
opCall Mir 32-bit Mt19937: 6.68058 Gb/s
check sum = 429472035921730457

range Mir 32-bit Mt19937: 6.639 Gb/s
check sum = 429472031909629874

opCall Mir 64-bit Mt19937: 14.0351 Gb/s
check sum = 5606740663277085587

range Mir 64-bit Mt19937: 13.8229 Gb/s
check sum = 3636092631268073456

Phobos 32-bit Mt19937: 4.22164 Gb/s
check sum = 429472035921730457

Phobos 64-bit Mt19937: wrong initialization and tempering

range Mir 32-bit Mt19937: 95.4198 U(0.0, 1.0) variates/s

range Mir 64-bit Mt19937: 87.184 U(0.0, 1.0) variates/s

Phobos 32-bit Mt19937: 80.8407 U(0.0, 1.0) variates/s
----

----
Running ./mir-mt-range
opCall Mir 32-bit Mt19937: 6.36183 Gb/s
check sum = 429472035921730457

range Mir 32-bit Mt19937: 6.54397 Gb/s
check sum = 429472031909629874

opCall Mir 64-bit Mt19937: 13.7339 Gb/s
check sum = 5606740663277085587

range Mir 64-bit Mt19937: 13.8229 Gb/s
check sum = 3636092631268073456

Phobos 32-bit Mt19937: 4.18848 Gb/s
check sum = 429472035921730457

Phobos 64-bit Mt19937: wrong initialization and tempering

range Mir 32-bit Mt19937: 95.2381 U(0.0, 1.0) variates/s

range Mir 64-bit Mt19937: 97.0874 U(0.0, 1.0) variates/s

Phobos 32-bit Mt19937: 88.1057 U(0.0, 1.0) variates/s
----

Dec 13 2016

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

On Tuesday, 13 December 2016 at 18:15:25 UTC, Joseph Rushton 
Wakeling wrote:
 On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
 Alexandrescu wrote:
 Congrats! Also thanks for using the Boost license which would 
 allow backporting the improvements to Phobos. Who'd be up for 
 it?

 I've finally found a moment to look into this (I'm at home 
 recovering from a seasonal virus, which ironically puts some 
 time and mental space in my hands).

 It makes for an interesting comparison.  The traditional 
 Mersenne Twister algorithm generates N random words in one go, 
 and then iterates over them, before generating another N random 
 words, and so on.  (Word == unsigned integer type.)

 By contrast Ilya's implementation just updates the single next 
 entry in the static array that stores the state of the engine.  
 This would seem to work out cheaper on average: perhaps because 
 the update-a-variate code stays hotter, perhaps because it 
 involves less branching and looping.  It certainly makes for a 
 simpler, easier to follow update.

 However, it might be a good idea to benchmark this in a context 
 where there are a lot of random variates being generated, but 
 where other things are also happening that might render the 
 update code less "hot" than a straightforward `foreach` loop 
 benchmark.  (Maybe try it out with a Monte Carlo simulation 
 ...?)
  The tradeoff between many cheap updates and one expensive one, 
 versus all updates being a little more expensive but less on 
 average, might not be viable in some practical use-cases.

 The Phobos implementation also includes a check on whether the 
 generator is properly seeded, which is performed in every 
 `popFront()`, which has a reasonably significant impact.  
 Ilya's implementation can get away with not performing that 
 check because it uses ` disable this();` to remove the 
 possibility of initializing the generator without it being 
 properly seeded.  The impact of that check can possibly be 
 lessened by making it a boolean comparison rather than the 
 existing `size_t` equality comparison.

   -- action item here: could we deprecate `this()` in Phobos as 
 a precursor to removing
      the opportunity to instantiate a generator without 
 properly initializing it?

 I'm going to try to put together a range-based version to see 
 if this also makes any difference.  I'll post some benchmarks 
 of my own once that's done, and if all looks good I'll try to 
 put a Phobos PR together.

 A few questions -- not blockers, but nice to understand:

   *  Ilya, is this implementation your own design, or do you 
 have a reference
     for the rationale behind this revised implementation of MT?

My own.

   * Is there a particular reason why the `index` variable is a 
 `Uint` (i.e.
     the word type of the generator) rather than a `size_t`?  I 
 presume there's
     a motivation, since the casting back and forth in `opCall` 
 would otherwise
     seem inconvenient.

If I remember correctly it is used with Using, so I use the same 
type.

   * Is there a motivation for the reverse iteration through the 
 RNG state array?

Yes, it composes add/sub with cmp operations.

Ilya

Dec 13 2016

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

 If I remember correctly it is used with Using, so I use the 
 same type.

Using -> Uint
Sorry, it is phone keyboard

Dec 13 2016

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Tuesday, 13 December 2016 at 23:18:26 UTC, Ilya Yaroshenko 
wrote:
   *  Ilya, is this implementation your own design, or do you 
 have a reference
     for the rationale behind this revised implementation of MT?

 My own.

Congratulations, then -- I think this is a very interesting 
rewrite of the algorithm and I would encourage you to write up a 
paper on your choice of optimizations, if that is of interest to 
you.

   * Is there a particular reason why the `index` variable is a 
 `Uint` (i.e.
     the word type of the generator) rather than a `size_t`?  I 
 presume there's
     a motivation, since the casting back and forth in `opCall` 
 would otherwise
     seem inconvenient.

 If I remember correctly it is used with Uint, so I use the same 
 type.

Not anywhere that I can find.  `n`, the template parameter from 
which `this.index` is initialized, is a `size_t`, while the value 
used inside `opCall` is a `sizediff_t`.

TBH I assumed it was related to the bit-packing of the struct, 
rather than any question of usage.

   * Is there a motivation for the reverse iteration through 
 the RNG state array?

 Yes, it composes add/sub with cmp operations.

Cool.  Last question: IIUC you use the private `_z` parameter as 
a cache for the most recent `data[index]` value (and AFAICT 
that's the only use it has).  Is there a good reason for doing 
this, rather than just setting `z = data[index]` inside `opCall` 
(where `z` is the local parameter inside the method)?

I ask because it doesn't seem to make any practical difference to 
the benchmarking outcomes, and you have to look up `data[index]` 
anyway when you write it, so it's not obvious to me that it 
really saves anything in practice.  (I confess I haven't looked 
at the assembly, so I might be missing something.)

Thanks for the answers, and apologies for my slow follow-up on 
some of the things we discussed earlier related to mir.random -- 
it's been a bit of a busy patch ... :-\

Dec 13 2016

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

On Wednesday, 14 December 2016 at 00:18:06 UTC, Joseph Rushton 
Wakeling wrote:
 Cool.  Last question: IIUC you use the private `_z` parameter 
 as a cache for the most recent `data[index]` value (and AFAICT 
 that's the only use it has).  Is there a good reason for doing 
 this, rather than just setting `z = data[index]` inside 
 `opCall` (where `z` is the local parameter inside the method)?

Because memory access locality. z is always hot.

Dec 14 2016

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:
 On 11/26/16 11:31 AM, Ilya Yaroshenko wrote:
 Hey,

 32-bit Mt19937 random number Generator is default in Phobos.
 It is default in Mir too, except that 64-bit targets use 
 64-bit Mt19937 instead.

 Congrats! Also thanks for using the Boost license which would 
 allow backporting the improvements to Phobos. Who'd be up for 
 it?

PR is now on Phobos, and in the process of looking at this 
together, Ilya and I resolved a number of issues both in this 
code and in mir.random:
https://github.com/dlang/phobos/pull/5011

There is unfortunately one serious blocker to this PR.  In order 
to allow it to be a drop-in replacement for the previous Phobos 
implementation, I had to find a means to default-initialize its 
internal state at compile time so that it would match what would 
be there if it was explicitly seeded with the default seed.

The method I developed works fine with LDC, but fails with DMD: 
the internal state of the generator winds up as all zeros, except 
for the `State.z` parameter which mysteriously ends up at the 
correct value.  This would suggest that somehow the generated 
state is being overwritten _after_ being correctly generated, 
rather than not correctly generated in the first place.

Any chance that someone with more CTFE-fu than myself could take 
a glance and advise what might be wrong (or how to work around 
it)?

Jan 07 2017

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/7/17 6:16 PM, Joseph Rushton Wakeling wrote:
 The method I developed works fine with LDC, but fails with DMD: the
 internal state of the generator winds up as all zeros, except for the
 `State.z` parameter which mysteriously ends up at the correct value.
 This would suggest that somehow the generated state is being overwritten
 _after_ being correctly generated, rather than not correctly generated
 in the first place.

 Any chance that someone with more CTFE-fu than myself could take a
 glance and advise what might be wrong (or how to work around it)?

This indicates a compiler bug in dmd itself, so the best outcome for 
this library work would be a reduced compiler bug + a simple library 
workaround. -- Andrei

Jan 07 2017

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Sunday, 8 January 2017 at 02:51:51 UTC, Andrei Alexandrescu 
wrote:
 This indicates a compiler bug in dmd itself, so the best 
 outcome for this library work would be a reduced compiler bug + 
 a simple library workaround. -- Andrei

Yes, that much is clear -- sorry if I wasn't clear enough myself. 
  I'm asking for eyes on the problem because reducing it to a 
minimal example appears non-trivial, while the bug itself looks 
serious beyond its effect on this PR.

For what it's worth:

   * it appears to be a backend issue, not frontend (using dmd 
with the same
     frontend as the latest ldc doesn't make a difference);

   * creating an `enum` manifest constant from the CTFE function 
works fine,
     but assigning to the runtime struct field from that enum 
produces the
     same mostly-zeros result;

   * it definitely doesn't appear to be something as trivial as 
just default
     initializing a struct instance (or a struct instance 
including a static
     array of more than a certain size) being the problem.

I will keep trying to reduce it to a minimal example, but 
CTFE-related backend bugs are not something I have a lot of 
experience with tracking down, hence the request for help.

Would filing an issue still be helpful at this stage, even 
without a minimal example?

Jan 08 2017

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Sunday, 8 January 2017 at 13:16:29 UTC, Joseph Rushton 
Wakeling wrote:
 I'm asking for eyes on the problem because reducing it to a 
 minimal example appears non-trivial, while the bug itself looks 
 serious beyond its effect on this PR.

I underestimated myself :-P  Minimal example is as follows:

/////////////////////////////////////////////////////////////////
struct Inner
{
     uint value = void;  // <=== removing this `void` initializer
                         // removes the problem
}

struct Outer
{
     Inner inner = defaultInner();

     static Inner defaultInner()
     {
         if (!__ctfe) assert(false);
         Inner inn;
         inn.value = 23;
         return inn;
     }
}

void main()
{
     import std.stdio : writeln;
     Outer outer;
     outer.inner.writeln;  // outputs `Inner(0)` with dmd
                           // but should be `Inner(23)`
                           // ldc gets it right ;-)
}
/////////////////////////////////////////////////////////////////

So, the bug comes down to priority of default initialization.  
I'll follow an issue later today and fix the PR accordingly.

Jan 08 2017

jmh530 <john.michael.hall gmail.com> writes:

On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
wrote:
 https://github.com/libmir :

  - mir - Generic numeric library
  - dcv - D Computer Vision library
  - mir-glas - Linear Algebra Subprograms (written in D)
  - mir-random - random numbers generators, includes non-uniform 
 distributions.
  - mir-cpuid - CPU identification

 Future:
  - mir-runtime - lightweight always inlined `nothrow  nogc` 
 Dlang Runtime
  - mir-neural - neural network library [WIP]
  - mir-fft - Fast and multidimensional FFT
  - mir-svm - support vector machines

 Cheers,
 Ilya & Mir Team

How does a mir-lapack fit in to the plan going forward?

Nov 26 2016

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

On Sunday, 27 November 2016 at 02:28:06 UTC, jmh530 wrote:
 On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
 wrote:
 https://github.com/libmir :

  - mir - Generic numeric library
  - dcv - D Computer Vision library
  - mir-glas - Linear Algebra Subprograms (written in D)
  - mir-random - random numbers generators, includes 
 non-uniform distributions.
  - mir-cpuid - CPU identification

 Future:
  - mir-runtime - lightweight always inlined `nothrow  nogc` 
 Dlang Runtime
  - mir-neural - neural network library [WIP]
  - mir-fft - Fast and multidimensional FFT
  - mir-svm - support vector machines

 Cheers,
 Ilya & Mir Team

 How does a mir-lapack fit in to the plan going forward?

LAPACK API will be a part of Mir GLAS
https://github.com/libmir/mir-glas --Ilya

Nov 26 2016

jmh530 <john.michael.hall gmail.com> writes:

On Sunday, 27 November 2016 at 06:46:20 UTC, Ilya Yaroshenko 
wrote:
 LAPACK API will be a part of Mir GLAS
 https://github.com/libmir/mir-glas --Ilya

Great. I might have asked before, apologies if so.

Nov 27 2016

=?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
wrote:
 Bench results:
 ----
 mir.random 32-bit Mt19937:
     6.80851 Gb/s

Does Gb mean Gigabytes or Gigabits?

Nov 29 2016

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

On Tuesday, 29 November 2016 at 16:54:55 UTC, Nordlöw wrote:
 On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
 wrote:
 Bench results:
 ----
 mir.random 32-bit Mt19937:
     6.80851 Gb/s

 Does Gb mean Gigabytes or Gigabits?

Gigabits

Nov 29 2016

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
wrote:
 1. Improve RNG generation performance by making code more 
 friendly for CPU pipelining. Tempering (finalization) 
 operations was mixed with internal payload update operations.

A note on this.  The `opCall` (or, in the range version, 
`popFront`) of Ilya's implementation mixes together two 
superficially independent actions:

   (1) calculating the current random variate from the current 
index
       of the internal state array;

   (2) updating the current index of the internal state array, and
       moving to the next entry.

It's straightforward to split out these two procedures into two 
separate methods (or at least two clearly separated sequences 
within the `opCall`), but doing so results in a notable 
performance hit (on my machine, something in the order of 1 GB/s 
less random bits).

Intertwining these steps in this way is therefore a very smart 
optimization (although TBH it feels a little worrying that it's 
necessary).

Dec 14 2016

D Programming

C/C++ Programming

Other

digitalmars.D - [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.