digitalmars.D - Normal/Gaussian random number generation for D

Joseph Rushton Wakeling (62/62) Oct 22 2012 Hello all,

bearophile (12/19) Oct 23 2012 Joseph Rushton Wakeling:

Joseph Rushton Wakeling (11/17) Oct 23 2012 Most of the existing std.random is based around the planned C++11 random...
jerro (13/18) Oct 23 2012 I have an implementation of the Ziggurat algorithm at

Joseph Rushton Wakeling (11/23) Oct 23 2012 The caveat here is that the Ziggurat algorithm is both fast and AFAIK op...
Joseph Rushton Wakeling (44/48) Oct 24 2012 OK, so I had a reasonable (not totally in-depth!) look through. Looks g...

jerro (22/27) Oct 24 2012 The first change I made to the algorithm is shifting layer
jerro (25/77) Oct 24 2012 I already have a NormalDist struct at

Joseph Rushton Wakeling (30/49) Oct 25 2012 I'm not expert on this, but there are various tests described in this pa...

jerro (37/70) Oct 26 2012 I don't see any downside to this.

Joseph Rushton Wakeling (41/67) Nov 06 2012 Sorry for delayed response, last week was an unfortunate mix of getting ...

jerro (67/136) Nov 06 2012 No, I think that if the engine defines initialize() function

Joseph Rushton Wakeling (5/16) Nov 06 2012 I guess I was concerned about the possibility of a sequence of normal ra...
Joseph Rushton Wakeling (2/3) Nov 06 2012 ... to initialize in the constructor, that is.
Joseph Rushton Wakeling (20/31) Nov 11 2012 What's wrong with such initialization being in the constructor of the re...

jerro (7/28) Nov 11 2012 I agree that those helper functions are not very important, but

Joseph Rushton Wakeling (6/11) Nov 11 2012 Aaack, I'd completely forgotten that. OK, so we can go with the initial...
Joseph Rushton Wakeling (6/6) Nov 20 2012 Just to update everyone, Jerro and I have been knocking around a few dis...

jerro (5/11) Nov 20 2012 I agree, it's a good idea to ask the community for feedback. I've

Joseph Rushton Wakeling (21/23) Nov 20 2012 I replied in the GitHub thread before reading this email! :-P

jerro (5/12) Nov 20 2012 By functionality, do you mean just the API or also the

Joseph Rushton Wakeling (4/7) Nov 21 2012 A bit of both, but mainly the API. I'm interested in how obvious it is ...

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

Hello all,

A request for feedback on the design of this code before I put it through some 
serious testing and submit a formal pull request to Phobos' std.random.

The goal here is to provide an interface that is close to the Boost/C++11
design 
while allowing for some extra interesting possibilities.  In particular the
goal 
is to allow the possibility of different underlying generation algorithms or 
"engines" that can serve different requirements regarding speed, precision, 
memory consumption etc.  See:
http://www.cse.cuhk.edu.hk/%7Ephwl/mt/public/archives/papers/grng_acmcs07.pdf

... for further details.

The present code contains both a function and struct interface for normal
random 
number generation; the latter stores the parameters of the distribution (mean 
and standard deviation) and an instance of the underlying engine, but not the 
random number generator.

(An alternative design choice for the struct would have been to design it as a 
range with front/popFront, but for this to work I'd have had to also have it 
store the RNG, and this would run into the same problems as RandomSample has
due 
to RNGs not being reference types.  Boost/C++11 also do not store the RNG 
internally but implement a variate_generator class which allows the coupling of 
a random-number distribution and an underlying generator; this is the approach 
I'd take for defining say a RandomVariate range.)

I've so far implemented one underlying engine, the Box-Muller method currently 
used in Boost.  This is not the best engine possible -- an implementation of
the 
Ziggurat algorithm is desirable -- but it represents what is typically used in 
other libraries.  The essence of Box-Muller is that it generates 
(uniformly-distributed) numbers a pair at a time and then uses those to
generate 
a corresponding pair of normally-distributed numbers.  The method is described
here:
https://en.wikipedia.org/wiki/Box-Muller_transform

I've stuck very closely to the implementation in Boost, to the point of 
reproducing a "trick" Boost uses to get round the fact that its uniform random 
number generator only supports the interval [a, b) whereas the canonical 
description of Box-Muller requires the random numbers to be distributed in the 
interval (0, 1].  Obviously in D this is avoidable, but I felt there was some 
advantage in reproducing identical results to Boost, at least within the limits 
of floating-point numerical rounding (and maybe also because D's PI variable 
goes to a greater number of decimal places).

 From a licensing point of view this close reproduction isn't an issue, as
Boost 
and Phobos both use the same licence, but there'll be a credit to the original 
Boost authors in any pull request.  However, if there's a desire to 
differentiate the code more strongly, that can be arranged :-)  I've attached a 
small C++ code example to demonstrate the common output for the D version and 
that of Boost.

Anyway, what I'd really appreciate feedback on is the interface design for
these 
functions and structs.  A few concrete questions:

     (1) Does the order of template arguments seem optimal?  My inclination is 
perhaps to tweak it so that the Uniform RNG type goes first, the underlying 
engine second, and the input/output type T goes last, as these are most likely 
the order in which people will want to change things.

     (2) Is there a desire to have a separate input and output type? 
Alternatively, should I perhaps treat all input and internal processing as
using 
the real type and just use T for the output?

     (3) Given the default template arguments provided, is there any way to 
ensure that an instance of the Normal struct can be implemented like this:

      auto normal01 = Normal(0, 1);

rather than having to do this:

      auto normal01 = Normal!()(0, 1);

Yes, I could create a function that returns an instance, but I'm not sure what 
to call it given that I want the normal() function to serve a similar purpose
as 
uniform() ... :-)

Thanks in advance for any and all feedback.

Best wishes,

      -- Joe

Oct 22 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Joseph Rushton Wakeling:

Thank you for the work on this. A normals generator is quite 
needed in Phobos.

 I've so far implemented one underlying engine, the Box-Muller 
 method currently
 used in Boost.  This is not the best engine possible -- an 
 implementation of the
 Ziggurat algorithm is desirable -- but it represents what is 
 typically used in
 other libraries.

Using the Boost API is useful. And using D-specific 
features/syntax is equally good.
Using the Ziggurat algorithm is desired, otherwise people will 
have to implement it outside Phobos, because it's better if you 
have to generate many normal distributed values (and sometimes I 
need many of them). is it possible to have both algorithms? Maybe 
with a template argument?

Bye,
bearophile

Oct 23 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 10/23/2012 03:50 PM, bearophile wrote:
 Using the Boost API is useful. And using D-specific features/syntax is equally
 good.

Most of the existing std.random is based around the planned C++11 random API, 
which in turn derives from boost.  There are several ways in which D seems able 
to do things more elegantly, however ... :-)

 Using the Ziggurat algorithm is desired, otherwise people will have to
implement
 it outside Phobos, because it's better if you have to generate many normal
 distributed values (and sometimes I need many of them). is it possible to have
 both algorithms? Maybe with a template argument?

Yes, that's exactly the plan.  You see how there is an "engine" template 
argument (NormalRandomNumberEngine) which is currently by default Box-Muller, 
but could be Ziggurat or others (Marsaglia etc.)  Other engines will be added 
later, Box-Muller is there as the first and easiest to implement.

The idea of a NormalRandomNumberEngine is that it is a struct which returns via 
opCall a number drawn from the distribution N(0, 1).  This is trivial to 
transform into N(mean, sigma).

Oct 23 2012

"jerro" <a a.com> writes:

 Using the Ziggurat algorithm is desired, otherwise people will 
 have to implement it outside Phobos, because it's better if you 
 have to generate many normal distributed values (and sometimes 
 I need many of them). is it possible to have both algorithms? 
 Maybe with a template argument?

I have an implementation of the Ziggurat algorithm at 
https://github.com/jerro/phobos/blob/master/std/random.d#L2035. 
It modified the Ziggurat algorithm a bit, so that it doesn't need 
as many layers to work well, which reduces the memory consumption 
and makes initialization faster. The cauchy, normal and 
exponential functions currently use global tables, but the 
zigguratAlgorithm function can also be used to implement a struct 
or class based API.

Joseph mentions having different engines for generating normally 
distributed random numbers, so maybe I could write an engine for 
his API based on what I already have. If we had multiple engines, 
I don't think Ziggurat algorithm should be the default, though, 
because it requires relatively expensive initialization.

Oct 23 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 10/23/2012 04:36 PM, jerro wrote:
 I have an implementation of the Ziggurat algorithm at
 https://github.com/jerro/phobos/blob/master/std/random.d#L2035. It modified the
 Ziggurat algorithm a bit, so that it doesn't need as many layers to work well,
 which reduces the memory consumption and makes initialization faster. The
 cauchy, normal and exponential functions currently use global tables, but the
 zigguratAlgorithm function can also be used to implement a struct or class
based
 API.

Oh, nice!  I will take a look at this in the next days.

 Joseph mentions having different engines for generating normally distributed
 random numbers, so maybe I could write an engine for his API based on what I
 already have. If we had multiple engines, I don't think Ziggurat algorithm
 should be the default, though, because it requires relatively expensive
 initialization.

The caveat here is that the Ziggurat algorithm is both fast and AFAIK optimal 
from the statistical point of view -- yes, it's costly space-wise, but it's
also 
safe.  The Polar method might be a nice alternative that's faster than 
Box-Muller, that has good statistical properties, and that requires much less 
space.  (Box-Muller actually isn't really that good statistically.)

The choice probably needs to depend on the typical number of normal variates we 
think people are going to want to be generating.  The way my code is set up you 
should only need to allocate one engine per thread, so it seems like a once-off 
hit that's trivial in the context of any moderately sized simulation.

Oct 23 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 10/23/2012 04:36 PM, jerro wrote:
 I have an implementation of the Ziggurat algorithm at
 https://github.com/jerro/phobos/blob/master/std/random.d#L2035.

OK, so I had a reasonable (not totally in-depth!) look through.  Looks good! 
And of course reminds of the other benefit of Ziggurat, that it can be used for 
multiple different random number distributions.

It looks like it should be readily possible to integrate the core Ziggurat 
functionality and to convert the normal() function in your code to be a 
NormalRandomNumberEngine struct -- so assuming you like my own architecture 
proposal, let's do this (I'm happy to hear suggestions for alternative designs, 
as I don't feel particularly confident in my API-designing abilities).

For the other distributions, my feeling is that in some cases there's a value
in 
also having this "engine" approach, e.g. for exponentially-distributed numbers 
one could use Ziggurat or one could use the approach

     T u = uniform!("[)", T, T, UniformRandomNumberGenerator)(0.0, 1.0, urng);
     return -log(1 - u)/lambda;

... which is not as fast but has a much lower memory footprint.

 It modified the Ziggurat algorithm a bit, so that it doesn't need as many
layers
 to work well, which reduces the memory consumption and makes initialization
faster.

Can you expand on this, and maybe provide a reference?  I don't doubt your 
code's effectiveness but I think where RNGs are concerned we really need to be 
able to justify our algorithmic choices.  There's too much literature out there 
showing how commonly-used algorithms actually carry statistical flaws.

Bigger picture on my approach to non-uniform random number distributions.  The 
goal is to have the following:

     * Where useful, it should be possible to define and use multiple different
       internal "engines" for generating random numbers from the given
       distribution

     * For each distribution, there should be a function interface and a struct
       interface.

     * The struct implementation should store the distribution parameters and an
       instance of the internal engine (if any).

     * The function implementation should have 2 versions: one which allows the
       user to pass an engine of choice as input, one which contains a static
       instance of the specified engine (hence, thread-safe, distinguished
       according to both engine type and underlying uniform RNG type).

     * ... unless there's no call for distinct underlying engines, in which case
       the function version just takes parameters and uniform RNG :-)

     * The struct version should be useful to couple with an RNG instance to
       create an arbitrary random-number range à la Boost's variate_generator
       class.

So, a nice way to expand on our respective approaches might be to incorporate 
your Ziggurat, adapt it to my Normal engine API, and for me to write a similar 
setup for exponentially-distributed random numbers that uses the simple
approach 
above and can also use your Ziggurat implementation.

What do you think?

Best wishes,

     -- Joe

Oct 24 2012

"jerro" <a a.com> writes:

 Can you expand on this, and maybe provide a reference?  I don't 
 doubt your code's effectiveness but I think where RNGs are 
 concerned we really need to be able to justify our algorithmic 
 choices.  There's too much literature out there showing how 
 commonly-used algorithms actually carry statistical flaws.

The first change I made to the algorithm is shifting layer 
boundaries. Basic Ziggurat algorithm uses n layers with area A. 
Because computing samples from the bottom layer and the top layer 
is typically more expensive than computing samples from other 
layers, I shifted the layer boundaries so that the top and the 
bottom layers have areas A / 2 and other n - 1 layers have area 
A. That way we only need to draw half as many samples from the 
top and the bottom layer.

To better describe the second change, I drew a picture and 
uploaded it to http://i.imgur.com/bDRpP.png . The basic Ziggurat 
algorithm first chooses the layer randomly and then chooses a 
random number x between 0 and xavg. If x is below xmin, it 
returns it. The algorithm I use does this too. When x is above 
xmin, the basic Ziggurat algorithm chooses a random point in the 
outer layer area (see the picture), checks if y < f(y), returns 
its x coordinate if it is, and chooses a new point otherwise. My 
algorithm uses precomputed low and high x offsets for each layer. 
It chooses a point below the high diagonal, and checks if it is 
below the low diagonal. If it is (and it is in most cases), it 
returns its x coordinate without computing f(x). If only checks 
if y < f(x) when y is above the low diagonal. That way it doesn't 
need to make as many calls to f(x).

Oct 24 2012

"jerro" <a a.com> writes:

 It looks like it should be readily possible to integrate the 
 core Ziggurat functionality and to convert the normal() 
 function in your code to be a NormalRandomNumberEngine struct

I already have a NormalDist struct at 
https://github.com/jerro/phobos/blob/new-api/std/random.d#L2363 - 
converting that should be trivial.

 For the other distributions, my feeling is that in some cases 
 there's a value in also having this "engine" approach, e.g. for 
 exponentially-distributed numbers one could use Ziggurat or one 
 could use the approach

     T u = uniform!("[)", T, T, 
 UniformRandomNumberGenerator)(0.0, 1.0, urng);
     return -log(1 - u)/lambda;

 ... which is not as fast but has a much lower memory footprint.

I agree that the it's a good thing to design the API so that we 
can use different engines

 Can you expand on this, and maybe provide a reference?  I don't 
 doubt your code's effectiveness but I think where RNGs are 
 concerned we really need to be able to justify our algorithmic 
 choices.  There's too much literature out there showing how 
 commonly-used algorithms actually carry statistical flaws.

I have only tested a distribution of the output samples, which 
seems to be correct. I agree that we should try to avoid 
statistical flaws as much as possible. Do you know of any good 
tests for nonuniform random number generators?

 Bigger picture on my approach to non-uniform random number 
 distributions.  The goal is to have the following:

     * Where useful, it should be possible to define and use 
 multiple different
       internal "engines" for generating random numbers from the 
 given
       distribution

     * For each distribution, there should be a function 
 interface and a struct
       interface.

     * The struct implementation should store the distribution 
 parameters and an
       instance of the internal engine (if any).

     * The function implementation should have 2 versions: one 
 which allows the
       user to pass an engine of choice as input, one which 
 contains a static
       instance of the specified engine (hence, thread-safe, 
 distinguished
       according to both engine type and underlying uniform RNG 
 type).

     * ... unless there's no call for distinct underlying 
 engines, in which case
       the function version just takes parameters and uniform 
 RNG :-)

     * The struct version should be useful to couple with an RNG 
 instance to
       create an arbitrary random-number range à la Boost's 
 variate_generator
       class.

Seems OK to me. Maybe the function that uses a static engine 
should have one version that takes Rng as a parameter and one 
that uses rndGen, similar to how uniform() works now?

How would the static engines be initialized? They could be 
initialized on first use, but this would slow sample generation a 
bit. Another options is to initialize them in the static 
constructor. I think it would be best to make the engine a static 
field of a struct template, and initialize it in the structs 
static constructor (similar to what my ZigguratTable struct 
does). That way you don't need to check if the engine is 
initialized every time you generate a sample and the engine isn't 
initialized unless the function template that uses it is 
instantiated.

 So, a nice way to expand on our respective approaches might be 
 to incorporate your Ziggurat, adapt it to my Normal engine API, 
 and for me to write a similar setup for 
 exponentially-distributed random numbers that uses the simple 
 approach above and can also use your Ziggurat implementation.

 What do you think?

I think we should do that. Do you already have some code 
somewhere where I can see it? I need to know what API the 
NormalRandomNumberEngine struct should have.

Oct 24 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 10/24/2012 11:23 PM, jerro wrote:
 I already have a NormalDist struct at
 https://github.com/jerro/phobos/blob/new-api/std/random.d#L2363 - converting
 that should be trivial.

I'll have a go at this some time in the next days. :-)

 I have only tested a distribution of the output samples, which seems to be
 correct. I agree that we should try to avoid statistical flaws as much as
 possible. Do you know of any good tests for nonuniform random number
generators?

I'm not expert on this, but there are various tests described in this paper 
which compares different normal random number generation techniques:
http://www.cse.cuhk.edu.hk/%7Ephwl/mt/public/archives/papers/grng_acmcs07.pdf

... and the topic is also discussed in this paper:
http://www.doornik.com/research/ziggurat.pdf

I don't have sufficient knowhow to really feel confident about how to test
these 
things though, and part of me feels it might be worth at some point approaching 
some stats experts and asking them to help define a test suite for D's random 
number functionality.

 Seems OK to me. Maybe the function that uses a static engine should have one
 version that takes Rng as a parameter and one that uses rndGen, similar to how
 uniform() works now?

Well, as I defined the function, the UniformRNG input has a default value of 
rndGen, so if you call it just as normal(mean, sigma); it should work as 
intended.  But if there's some factor which means it would be better to define
a 
separate function which doesn't receive the RNG as input, I can do that.

 How would the static engines be initialized? They could be initialized on first
 use, but this would slow sample generation a bit. Another options is to
 initialize them in the static constructor. I think it would be best to make the
 engine a static field of a struct template, and initialize it in the structs
 static constructor (similar to what my ZigguratTable struct does). That way you
 don't need to check if the engine is initialized every time you generate a
 sample and the engine isn't initialized unless the function template that uses
 it is instantiated.

I think it probably depends on the details of the technique.  E.g. with 
Box-Muller you cache 2 random values, and you need to re-generate them for
every 
pair of normal random variates the user requests, so there's no problem with 
having it initialized in opCall on the first call to the engine.

In general I favour that approach of initializing in opCall because it means 
that you don't have to ever pass anything to the constructor and also if I have 
a simulation which uses a lot of random numbers, its output won't be changed 
simply by creating an instance of a normal random number generator -- only when 
I start actually _using_ that generator in place of some other source of
randomness.

 I think we should do that. Do you already have some code somewhere where I can
 see it? I need to know what API the NormalRandomNumberEngine struct should
have.

Did you get the code I attached to my original post on this topic?  Is there 
something extra that you need?

I'm going to turn that into patches for Phobos which I'll put up on GitHub in 
the next days, so we can pull and push and test/write together as needed.

Best wishes,

     -- Joe

Oct 25 2012

"jerro" <a a.com> writes:

 I'm not expert on this, but there are various tests described 
 in this paper which compares different normal random number 
 generation techniques:
 http://www.cse.cuhk.edu.hk/%7Ephwl/mt/public/archives/papers/grng_acmcs07.pdf

I'll implement the tests described in this paper.

 ... and the topic is also discussed in this paper:
 http://www.doornik.com/research/ziggurat.pdf

 I don't have sufficient knowhow to really feel confident about 
 how to test these things though, and part of me feels it might 
 be worth at some point approaching some stats experts and 
 asking them to help define a test suite for D's random number 
 functionality.

Maybe there are testsuites already written for this.

 Well, as I defined the function, the UniformRNG input has a 
 default value of rndGen, so if you call it just as normal(mean, 
 sigma); it should work as intended.  But if there's some factor 
 which means it would be better to define a separate function 
 which doesn't receive the RNG as input, I can do that.

I don't see any downside to this.

 I think it probably depends on the details of the technique.  
 E.g. with Box-Muller you cache 2 random values, and you need to 
 re-generate them for every pair of normal random variates the 
 user requests, so there's no problem with having it initialized 
 in opCall on the first call to the engine.

 In general I favour that approach of initializing in opCall 
 because it means that you don't have to ever pass anything to 
 the constructor and also if I have a simulation which uses a 
 lot of random numbers, its output won't be changed simply by 
 creating an instance of a normal random number generator -- 
 only when I start actually _using_ that generator in place of 
 some other source of randomness.

I have only been thinking about the Ziggurat algorithm, but you 
are right, it does depend on the details of the technique. For 
Box-Muller (and other engines that cache samples) it only makes 
sense to compute the first samples in the opCall. But for the 
Ziggurat algorithm, tables that must be computed before you can 
start sampling aren't changed during sampling and computing the 
tables doesn't require any additional arguments. So it makes the 
most sense for those tables to be initialized in the struct's 
constructor in the struct based API.

In my previous post I was talking about initializing static 
instances of the engine used in the normal() function. The 
advantage of initializing in a static constructor is that you 
don't need an additional check every time the normal() function 
is called. But because we will also have a struct based API, that 
will not require such checks (at least not for all engines), this 
isn't really that important. So we can also initialize the global 
engine instance in a call to normal(), if this simplifies things.

 Did you get the code I attached to my original post on this 
 topic?  Is there something extra that you need?

I didn't notice the attachments, sorry (but I see them now). It 
seems it will be easy enough to write NormalZigguratEngine, 
there's just one thing that I think should be added to the Normal 
struct. Engines should have a way to do their initialization in 
Normal's constructor (where possible), so I think something like 
this:

static if(is(typeof(_engine.initialize())))
     _engine.initialize();

should be added to Normal's constructor. Or maybe

_engine = NormalRandomNumberEngine();

so that a static opCall can be used.

 I'm going to turn that into patches for Phobos which I'll put 
 up on GitHub in the next days, so we can pull and push and 
 test/write together as needed.

Maybe we should also have a separate file in a separate branch 
for tests. There will probably be a lot of code needed to test 
this well and the tests could take a long time to run, so I don't 
think they should go into unit test blocks (we can put some 
simpler tests in unit test blocks later). It may also be useful 
to use external libraries such as dstats for tests.

Oct 26 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

Sorry for delayed response, last week was an unfortunate mix of getting sick
and 
then having a bunch of work-related stuff ...

On 10/26/2012 11:03 PM, jerro wrote:
 Well, as I defined the function, the UniformRNG input has a default value of
 rndGen, so if you call it just as normal(mean, sigma); it should work as
 intended.  But if there's some factor which means it would be better to define
 a separate function which doesn't receive the RNG as input, I can do that.

 I don't see any downside to this.

Which "this" do you mean?  My current approach, or the adding of an extra 
separate function? :-)

 I have only been thinking about the Ziggurat algorithm, but you are right, it
 does depend on the details of the technique. For Box-Muller (and other engines
 that cache samples) it only makes sense to compute the first samples in the
 opCall. But for the Ziggurat algorithm, tables that must be computed before you
 can start sampling aren't changed during sampling and computing the tables
 doesn't require any additional arguments. So it makes the most sense for those
 tables to be initialized in the struct's constructor in the struct based API.

So we should assume by default then that the struct's constructor should take
an 
RNG as input, to enable it to calculate these first values if it needs to?

 In my previous post I was talking about initializing static instances of the
 engine used in the normal() function. The advantage of initializing in a static
 constructor is that you don't need an additional check every time the normal()
 function is called. But because we will also have a struct based API, that will
 not require such checks (at least not for all engines), this isn't really that
 important. So we can also initialize the global engine instance in a call to
 normal(), if this simplifies things.

I guess my feeling here is that the values generated by an RNG should depend on 
when it is called, and not at all on when it is instantiated.

i.e. if I do something like

     auto nrng = Normal!()(0, 1);
     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     writeln( nrng() );
     writeln( nrng() );

I should get the same output as if I do,

     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     auto nrng = Normal!()(0, 1);
     writeln( nrng() );
     writeln( nrng() );

You can also think that if I change from e.g.

     auto nrng = Normal!(real, Engine1)(0, 1);
     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     writeln( nrng() );
     writeln( nrng() );

to

     auto nrng = Normal!(real, Engine2)(0, 1);
     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     writeln( nrng() );
     writeln( nrng() );

... then I would expect to see different results from the normal RNG but 
identical results from uniform().  If the constructor of the normal engine
calls 
the RNG, the uniform() results will change, no?

 I'm going to turn that into patches for Phobos which I'll put up on GitHub in
 the next days, so we can pull and push and test/write together as needed.

 Maybe we should also have a separate file in a separate branch for tests. There
 will probably be a lot of code needed to test this well and the tests could
take
 a long time to run, so I don't think they should go into unit test blocks (we
 can put some simpler tests in unit test blocks later). It may also be useful to
 use external libraries such as dstats for tests.

Yes, a random.d test suite probably should be another project.

Regardless of tests, let's focus for now on getting the API right for this case 
of non-uniform-random-number-generator-with-internal-engine, with normal and 
exponential as the initial cases.  We can always label some engines as "use at 
own risk" in the short term!

Nov 06 2012

"jerro" <a a.com> writes:

 I don't see any downside to this.

 Which "this" do you mean?  My current approach, or the adding 
 of an extra separate function? :-)

Your current approach.

 I have only been thinking about the Ziggurat algorithm, but 
 you are right, it
 does depend on the details of the technique. For Box-Muller 
 (and other engines
 that cache samples) it only makes sense to compute the first 
 samples in the
 opCall. But for the Ziggurat algorithm, tables that must be 
 computed before you
 can start sampling aren't changed during sampling and 
 computing the tables
 doesn't require any additional arguments. So it makes the most 
 sense for those
 tables to be initialized in the struct's constructor in the 
 struct based API.


 So we should assume by default then that the struct's 
 constructor should take an RNG as input, to enable it to 
 calculate these first values if it needs to?

No, I think that if the engine defines initialize() function 
(with no parameters), it should be called in the constructor of 
Normal. I don't think the constructor of Normal should take an 
RNG as input. I think what you currently do in normal.d is fine, 
I would just add something like this:

static if(is(typeof(_engine.initialize())))
     _engine.initialize();

to Normal's constructor. Then ZigguratEngine can define 
initialize() and do its initialization there (it doesn't need a 
uniform RNG for initialization) and NormalBoxMullerEngine can 
remain unchanged.

 In my previous post I was talking about initializing static 
 instances of the
 engine used in the normal() function. The advantage of 
 initializing in a static
 constructor is that you don't need an additional check every 
 time the normal()
 function is called. But because we will also have a struct 
 based API, that will
 not require such checks (at least not for all engines), this 
 isn't really that
 important. So we can also initialize the global engine 
 instance in a call to
 normal(), if this simplifies things.


 I guess my feeling here is that the values generated by an RNG 
 should depend on when it is called, and not at all on when it 
 is instantiated.

 i.e. if I do something like

     auto nrng = Normal!()(0, 1);
     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     writeln( nrng() );
     writeln( nrng() );

 I should get the same output as if I do,

     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     auto nrng = Normal!()(0, 1);
     writeln( nrng() );
     writeln( nrng() );

 You can also think that if I change from e.g.

     auto nrng = Normal!(real, Engine1)(0, 1);
     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     writeln( nrng() );
     writeln( nrng() );

 to

     auto nrng = Normal!(real, Engine2)(0, 1);
     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     writeln( nrng() );
     writeln( nrng() );

 ... then I would expect to see different results from the 
 normal RNG but identical results from uniform().  If the 
 constructor of the normal engine calls the RNG, the uniform() 
 results will change, no?

I was only talking about the part of initialization that doesn't 
use a RNG. I agree that everything that uses a RNG should be done 
in opCall (or inside a normal() function in the function 
interface). For Box-Muller, I think the approach you currently 
use in NormalBoxMullerEngine is the most reasonable one.
But a Ziggurat engines needs to compute some tables before it can 
start generating samples. It doesn't need a RNG to do that and 
the tables do not change after initialization.

I think it's obvious that that initialization that doesn't need a 
RNG should be done in Normal's constructor for the struct 
interface. What is not so obvious is where the initialization of 
static data that doesn't require a RNG should be done for 
function interface. That initialization can be done in normal() 
function, or it can be done in a static constructor. If you do it 
in normal(), you need to do an extra check on each call to 
normal(). This isn't really a problem as long as the struct's 
opCall and the version of normal() that takes the engine as a 
parameter don't do such redundant checks. Then the users that 
care about the difference in performance can just use one of 
these interfaces.

 Yes, a random.d test suite probably should be another project.

 Regardless of tests, let's focus for now on getting the API 
 right for this case of 
 non-uniform-random-number-generator-with-internal-engine, with 
 normal and exponential as the initial cases.

In general I like the API in file normal.d attached to your 
original post. I think the engines should have an option to do 
some initialization in Normal's constructor, though. We could 
achieve that by calling _engine.initialize in Normal's 
constructors, if such method exists. This method would also need 
to be called on the static instance of normal engine used in the 
normal() function. We could add something like this to the first 
version of normal:

static if(is(typeof(engine.initialize())))
{
     static bool isInitialized;
     if(!isInitialized)
         engine.initialize();
}

Another option would be to do this:

struct GlobalEngine(Engine)
{
     Engine instance;

     static this()
     {
         instance.initialize();
     }
}

And then inside the version of normal that doesn't take an engine 
as a parameter:

alias GlobalEngine!NormalRandomNumberEngine E;
return normal(mean, sigma, E.instance, urng);

The users would need to construct their own instance of engine to 
use the function that takes engine as a parameter. So it would 
make sense to add helper functions for creating engine instances.

There's one change that I think would make the API more 
convenient. Normal struct and the engine don't store an instance 
of a RNG , so they don't need to take it as a template parameter. 
We could make opCall methods templates instead. That way the 
users would never need to explicitly specify the type of the RNG.

Nov 06 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 11/06/2012 07:14 PM, jerro wrote:
 I was only talking about the part of initialization that doesn't use a RNG. I
 agree that everything that uses a RNG should be done in opCall (or inside a
 normal() function in the function interface). For Box-Muller, I think the
 approach you currently use in NormalBoxMullerEngine is the most reasonable one.
 But a Ziggurat engines needs to compute some tables before it can start
 generating samples. It doesn't need a RNG to do that and the tables do not
 change after initialization.

Ahh, OK, clear.  Then obviously your suggested approach is correct.

 There's one change that I think would make the API more convenient. Normal
 struct and the engine don't store an instance of a RNG , so they don't need to
 take it as a template parameter. We could make opCall methods templates
instead.
 That way the users would never need to explicitly specify the type of the RNG.

I guess I was concerned about the possibility of a sequence of normal random 
numbers generated by a mix of different RNGs, but to be honest, if someone 
really, really wants to do that, I'm not sure I should stop them. :-)

Nov 06 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 11/06/2012 07:39 PM, Joseph Rushton Wakeling wrote:
 Ahh, OK, clear.  Then obviously your suggested approach is correct.

... to initialize in the constructor, that is.

Nov 06 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 11/06/2012 07:14 PM, jerro wrote:
 In general I like the API in file normal.d attached to your original post. I
 think the engines should have an option to do some initialization in Normal's
 constructor, though. We could achieve that by calling _engine.initialize in
 Normal's constructors, if such method exists.

What's wrong with such initialization being in the constructor of the relevant 
NormalEngine?  I think that was your original idea, and I derailed it because
of 
my misunderstanding of what you wanted to initialize.

 The users would need to construct their own instance of engine to use the
 function that takes engine as a parameter. So it would make sense to add helper
 functions for creating engine instances.

In general the idea is that the engine should be something hidden; where you 
need to use it, you just need to pass the name as a template parameter; it 
should be rare that you really need to manually instantiate your own engine. 
But we can add such a helper function if you like.

What I'm frustrated about is that as-is it's not possible to just have

     auto nrng = Normal(mean, sigma);

... but you have instead to write,

     auto nrng = Normal!()(mean, sigma);

despite the fact that Normal has default template parameters given.  So maybe a 
helper function normalRNG which returns an instance of Normal would also be
helpful.

 There's one change that I think would make the API more convenient. Normal
 struct and the engine don't store an instance of a RNG , so they don't need to
 take it as a template parameter. We could make opCall methods templates
instead.
 That way the users would never need to explicitly specify the type of the RNG.

Good call.

I've uploaded a tweaked version based on your comments here:
https://github.com/WebDrake/phobos/tree/normal

... so feel free to pull, further revise and add in your Ziggurat
implementation :-)

I haven't yet written any of the helper functions you suggest, but will do so 
shortly based on your response to my above remarks.

Nov 11 2012

"jerro" <a a.com> writes:

 What's wrong with such initialization being in the constructor 
 of the relevant NormalEngine?  I think that was your original 
 idea, and I derailed it because of my misunderstanding of what 
 you wanted to initialize.

The problem is that structs can't have constructors with no 
parameters.

 In general the idea is that the engine should be something 
 hidden; where you need to use it, you just need to pass the 
 name as a template parameter; it should be rare that you really 
 need to manually instantiate your own engine. But we can add 
 such a helper function if you like.

I agree that those helper functions are not very important, but 
on the other hand I don't think adding them costs us anything. 
But maybe we should take care of the other stuff first.

 What I'm frustrated about is that as-is it's not possible to 
 just have

     auto nrng = Normal(mean, sigma);

 ... but you have instead to write,

     auto nrng = Normal!()(mean, sigma);

 despite the fact that Normal has default template parameters 
 given.  So maybe a helper function normalRNG which returns an 
 instance of Normal would also be helpful.

Yes, I think it would be.

 I've uploaded a tweaked version based on your comments here:
 https://github.com/WebDrake/phobos/tree/normal

 ... so feel free to pull, further revise and add in your 
 Ziggurat implementation :-)

I'll add the Zigggurat implementation, probably tomorrow.

Nov 11 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 11/11/2012 05:30 PM, jerro wrote:
 The problem is that structs can't have constructors with no parameters.

Aaack, I'd completely forgotten that.  OK, so we can go with the initialize 
function as you suggested.  I suggest you add it in together with Ziggurat.

 I agree that those helper functions are not very important, but on the other
 hand I don't think adding them costs us anything. But maybe we should take care
 of the other stuff first.

Well, they should be trivial enough to write, let's get Ziggurat in and working 
first and then add them.

 I'll add the Zigggurat implementation, probably tomorrow.

Excellent!  Will look forward to seeing that.

Nov 11 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

Just to update everyone, Jerro and I have been knocking around a few
discussions 
and code tweaks on GitHub:
https://github.com/WebDrake/phobos/pull/1#issuecomment-10319570

If anyone wants to comment or give feedback, please feel free.  It's surely 
useful that we get as many concerns covered and addressed before submitting an 
official Phobos pull request!

Nov 20 2012

"jerro" <a a.com> writes:

On Tuesday, 20 November 2012 at 15:04:53 UTC, Joseph Rushton 
Wakeling wrote:
 Just to update everyone, Jerro and I have been knocking around 
 a few discussions and code tweaks on GitHub:
 https://github.com/WebDrake/phobos/pull/1#issuecomment-10319570

 If anyone wants to comment or give feedback, please feel free.  
 It's surely useful that we get as many concerns covered and 
 addressed before submitting an official Phobos pull request!

I agree, it's a good idea to ask the community for feedback. I've 
replied to you in the github thread, but maybe it would be better 
to continue the discussion here?

Nov 20 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 11/20/2012 07:00 PM, jerro wrote:
 I agree, it's a good idea to ask the community for feedback. I've replied to
you
 in the github thread, but maybe it would be better to continue the discussion
here?

I replied in the GitHub thread before reading this email! :-P

My feeling is that the core normal RNG code is probably fairly well defined,
and 
won't need too many more tweaks.  What I see lacking right now are unittests
(so 
far the ones I've written have just been to test that the type inference rules 
work correctly) and possibly also some additional necessary type constraints 
(e.g. it would be good to have something like isNormalRandomNumberEngine, and I 
don't know if you want to add some extra type checks to some of your 
Ziggurat-related functions).

I'd particularly appreciate it if community members could give the type
checking 
a once-over to see if it's adequate/appropriate, whether there are any
important 
checks missing and whether in some cases it's actually too strict (e.g. could
we 
reasonably replace the isNumeric checks with isScalar?).

As I wrote on GitHub, I also think it's worth considering if some of the helper 
functions/structs introduced in these patches could better be moved out to
other 
parts of Phobos for general-purpose use, e.g. your isPowerOfTwo.

Lastly, we need to make sure that this new functionality is well documented. 
Perhaps a good start here could be for people to take a look at the code and 
describe what isn't clear or easy to understand just from looking ... ?

Best wishes,

     -- Joe

Nov 20 2012

"jerro" <a a.com> writes:

 isNormalRandomNumberEngine, and I don't know if you want to add 
 some extra type checks to some of your Ziggurat-related 
 functions).

Yes, I need to add some checks. Currently, there aren't any :D

 Lastly, we need to make sure that this new functionality is 
 well documented. Perhaps a good start here could be for people 
 to take a look at the code and describe what isn't clear or 
 easy to understand just from looking ... ?

By functionality, do you mean just the API or also the 
implementation? I assume my Ziggurat algorithm implementation 
isn't easy to understand without some kind of description. I'll 
need to write a ton of comments in there.

Nov 20 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 11/21/2012 01:44 AM, jerro wrote:
 By functionality, do you mean just the API or also the implementation? I assume
 my Ziggurat algorithm implementation isn't easy to understand without some kind
 of description. I'll need to write a ton of comments in there.

A bit of both, but mainly the API.  I'm interested in how obvious it is how to 
_use_ it rather than how obvious the internals are (which is rare for any case, 
let alone something like Ziggurat).

Nov 21 2012

D Programming

C/C++ Programming

Other

digitalmars.D - Normal/Gaussian random number generation for D