www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - A Tausworthe random number generator

reply "terchestor" <terchestor gmail.com> writes:
I'm a newcomer to D language (after years of programming with ASM 
Motorola 68030-40 long time ago, as well as C, C++, Javascript, 
PHP, etc.), but not anymore a professional coder.
After reading some articles around there and that nice and cute 
TDPL (just after Coding standards, another world!), I began to 
write some to exercise my fresh new "skill".
I choose as a first experience, to write a simple class 
implementing a very interesting random number generator known as 
Tausworthe, according to the recipe found in this paper: 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.9723&rep=rep1&type=pdf.
The algorithm in C is quite simple and set in Lecuyer's paper as:

unsigned long s1, s2, s3, b;
double taus88 ()
{
  /* Generates numbers between 0 and 1. */
b = (((s1 << 13) ^ s1) >> 19);
s1 = (((s1 & 4294967294) << 12) ^ b);
b = (((s2 << 2) ^ s2) >> 25);
s2 = (((s2 & 4294967288) << 4) ^ b);
b = (((s3 << 3) ^ s3) >> 11);
s3 = (((s3 & 4294967280) << 17) ^ b);
return ((s1 ^ s2 ^ s3) * 2.3283064365e-10);
}

I made some "adjustments" to this algorithm, to yield either a 
(long) integer or (double) float result, in any range.

Benchmarks
----------
Tausworthe module (rdmd -main -unittest 
-debug=Benchmark_Tausworthe tausworthe.d)
============================================================
Call 1_000_000 times benchmark_uniform_ulong( ): 30 ms | Average 
: 3e-05 ms
============================================================
============================================================
Call 1_000_000 times benchmark_uniform_double( ): 42 ms | Average 
: 4.2e-05 ms
============================================================
============================================================
Call 1_000_000 times benchmark_uniform_std( ): 70 ms | Average : 
7e-05 ms
============================================================
Comparing ratio tausworthe / std.random:  0.609229

I'd greatly appreciate any reviewing and testing of my code. Any 
improvements you feel necessary are most welcome (as said before, 
I'm a newbie).

What's the best place to publish the code?
Jan 21 2014
next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 21 January 2014 at 10:05:42 UTC, terchestor wrote:
 I'm a newcomer to D language (after years of programming with 
 ASM Motorola 68030-40 long time ago, as well as C, C++, 
 Javascript, PHP, etc.), but not anymore a professional coder.
 After reading some articles around there and that nice and cute 
 TDPL (just after Coding standards, another world!), I began to 
 write some to exercise my fresh new "skill".
 I choose as a first experience, to write a simple class 
 implementing a very interesting random number generator known 
 as Tausworthe, according to the recipe found in this paper: 
 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.9723&rep=rep1&type=pdf.
 The algorithm in C is quite simple and set in Lecuyer's paper 
 as:

 unsigned long s1, s2, s3, b;
 double taus88 ()
 {
  /* Generates numbers between 0 and 1. */
 b = (((s1 << 13) ^ s1) >> 19);
 s1 = (((s1 & 4294967294) << 12) ^ b);
 b = (((s2 << 2) ^ s2) >> 25);
 s2 = (((s2 & 4294967288) << 4) ^ b);
 b = (((s3 << 3) ^ s3) >> 11);
 s3 = (((s3 & 4294967280) << 17) ^ b);
 return ((s1 ^ s2 ^ s3) * 2.3283064365e-10);
 }

 I made some "adjustments" to this algorithm, to yield either a 
 (long) integer or (double) float result, in any range.

 Benchmarks
 ----------
 Tausworthe module (rdmd -main -unittest 
 -debug=Benchmark_Tausworthe tausworthe.d)
 ============================================================
 Call 1_000_000 times benchmark_uniform_ulong( ): 30 ms | 
 Average : 3e-05 ms
 ============================================================
 ============================================================
 Call 1_000_000 times benchmark_uniform_double( ): 42 ms | 
 Average : 4.2e-05 ms
 ============================================================
 ============================================================
 Call 1_000_000 times benchmark_uniform_std( ): 70 ms | Average 
 : 7e-05 ms
 ============================================================
 Comparing ratio tausworthe / std.random:  0.609229

 I'd greatly appreciate any reviewing and testing of my code. 
 Any improvements you feel necessary are most welcome (as said 
 before, I'm a newbie).

 What's the best place to publish the code?

Github, bitbucket or equivalent. If you think it's ready for general use then write a dub.json and register it to code.dlang.org Take care with modifying random number generators, it's quite easy to ruin them by changing something that seems innocent.
Jan 21 2014
prev sibling next sibling parent "Rikki Cattermole" <alphaglosined gmail.com> writes:
On Tuesday, 21 January 2014 at 10:05:42 UTC, terchestor wrote:
 What's the best place to publish the code?

Something along the lines of github/gist/pastebin. Also this is more of a learn newsgroup post. In case you're not aware, in D we can utilise static variables inside functions. Meaning those global variables can be pushed into the function. If thats an approach you would be interested in.
Jan 21 2014
prev sibling next sibling parent "terchestor" <terchestor gmail.com> writes:
On Tuesday, 21 January 2014 at 10:10:32 UTC, John Colvin wrote:
 On Tuesday, 21 January 2014 at 10:05:42 UTC, terchestor wrote:
 I'm a newcomer to D language (after years of programming with 
 ASM Motorola 68030-40 long time ago, as well as C, C++, 
 Javascript, PHP, etc.), but not anymore a professional coder.
 After reading some articles around there and that nice and 
 cute TDPL (just after Coding standards, another world!), I 
 began to write some to exercise my fresh new "skill".
 I choose as a first experience, to write a simple class 
 implementing a very interesting random number generator known 
 as Tausworthe, according to the recipe found in this paper: 
 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.9723&rep=rep1&type=pdf.
 The algorithm in C is quite simple and set in Lecuyer's paper 
 as:

 unsigned long s1, s2, s3, b;
 double taus88 ()
 {
 /* Generates numbers between 0 and 1. */
 b = (((s1 << 13) ^ s1) >> 19);
 s1 = (((s1 & 4294967294) << 12) ^ b);
 b = (((s2 << 2) ^ s2) >> 25);
 s2 = (((s2 & 4294967288) << 4) ^ b);
 b = (((s3 << 3) ^ s3) >> 11);
 s3 = (((s3 & 4294967280) << 17) ^ b);
 return ((s1 ^ s2 ^ s3) * 2.3283064365e-10);
 }

 I made some "adjustments" to this algorithm, to yield either a 
 (long) integer or (double) float result, in any range.

 Benchmarks
 ----------
 Tausworthe module (rdmd -main -unittest 
 -debug=Benchmark_Tausworthe tausworthe.d)
 ============================================================
 Call 1_000_000 times benchmark_uniform_ulong( ): 30 ms | 
 Average : 3e-05 ms
 ============================================================
 ============================================================
 Call 1_000_000 times benchmark_uniform_double( ): 42 ms | 
 Average : 4.2e-05 ms
 ============================================================
 ============================================================
 Call 1_000_000 times benchmark_uniform_std( ): 70 ms | Average 
 : 7e-05 ms
 ============================================================
 Comparing ratio tausworthe / std.random:  0.609229

 I'd greatly appreciate any reviewing and testing of my code. 
 Any improvements you feel necessary are most welcome (as said 
 before, I'm a newbie).

 What's the best place to publish the code?

Github, bitbucket or equivalent. If you think it's ready for general use then write a dub.json and register it to code.dlang.org Take care with modifying random number generators, it's quite easy to ruin them by changing something that seems innocent.

Well, thanks for the advices. Here it is: https://github.com/Terchestor/dlang/tree/Tausworthe-class Please be indulgent ,-)
Jan 21 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
terchestor:

 https://github.com/Terchestor/dlang/tree/Tausworthe-class

 Please be indulgent ,-)

 /++
 §§§ Tausworthe

What's the purpose of those §? They add noise. If they aren't necessary I suggest to remove them.
 class Tausworthe ( I, F ) if ( __traits(isIntegral, I ) &&
                                __traits(isFloating, F ) )

I suggest to not put spaces around ( ). And if you want also remove the space after the class/functions name. Perhaps you want to use a struct? The decision is important. Are your methods all virtual?
 {
     alias I integral_t;
     alias F floating_t;

Today the syntax "alias Name = Foo;" is preferred. In D code the use of "_t" as suffix for types is less common. And perhaps it's better to use a starting upper case letter for their name.
     enum tauswortheConstFloat : floating_t
     {
         tau0 = 4294967296.0f

Class/struct/Enum/Union names should start with an upper case letter.
          tau1 = 4294967294,

Perhaps that literal can end with a U.
    ~this( )
     { }

No need for this.
     void seed( integral_t seed_=1 )
 ...
         auto s = seed_;

In general annotate with enum/const/immutable all variables that don't need to mutate.
         integral_t fseed( const integral_t s_ )
         {
             return ( tauswortheConst.lgc * s_ ) & 
 tauswortheConst.mask;
         }

That's probably better annotated as pure nothrow.
 }//§§==========END_CLASS::Tausworthe==========

I suggest to reduce the amount of === in this code.
 auto ?tausworthe0 = new Tausworthe!(ulong, double);

While D supports Unicode identifiers, they are a often a pain in the ass (this is a personal opinion), so perhaps you want to kill them all.
 //§§ __COPYRIGHT_NOTICE__

I think there's a ddoc field syntax for the copyright in the module ddoc. Bye, bearophile
Jan 21 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 21 January 2014 at 11:04:13 UTC, terchestor wrote:
 Well, thanks for the advices.
 Here it is: 
 https://github.com/Terchestor/dlang/tree/Tausworthe-class

 Please be indulgent ,-)

The code feels a bit "C-ish", but I'd argue those are just style issues. My major points would be: 1) Make it a "final class": By default, classes have all their methods virtual. I don't think you want that. 2) I'd adhere to the "Range" interface: Give your function the "front"/"popFront()" functions, as well as the public "enum empty = false". 3) I think the *instance* itself should carry the bounds on initialization, and provide a single "uniform" function. 4) Uniform initialization: provide a "tausworthe(I = int, R = double)(int seed, double low = 0.0, double high = 1.0 )" function. This function should simply return a new instance of a Tausworthe. The advantage of this are 2-fold: 4.1) Similar syntax for struct/class initialization 4.2) Default Type parameters for the type. The combination of 2/3/4 will allow you to write nifty things like: int[] myRandomNumbers = tausworthe(randomSeed).take(10).array(); And voila! An array of 10 random numbers! Easy-peasy.
Jan 21 2014
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jan 21, 2014 at 12:32:41PM +0000, bearophile wrote:
 terchestor:
 
https://github.com/Terchestor/dlang/tree/Tausworthe-class

Please be indulgent ,-)

/++
 Tausworthe

What's the purpose of those ? They add noise. If they aren't necessary I suggest to remove them.
class Tausworthe ( I, F ) if ( __traits(isIntegral, I ) &&
                               __traits(isFloating, F ) )

I suggest to not put spaces around ( ). And if you want also remove the space after the class/functions name. Perhaps you want to use a struct? The decision is important. Are your methods all virtual?

There are a lot of issues with RNG's being passed by value; it's better to use a reference type (i.e. class) and just make all methods final. T -- It said to install Windows 2000 or better, so I installed Linux instead.
Jan 21 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 There are a lot of issues with RNG's being passed by value; 
 it's better to use a reference type (i.e. class) and just
 make all methods final.

Yes, I remember. A final class is probably better. Bye, bearophile
Jan 21 2014
prev sibling next sibling parent "terchestor" <terchestor gmail.com> writes:
On Tuesday, 21 January 2014 at 13:44:09 UTC, monarch_dodra wrote:
 On Tuesday, 21 January 2014 at 11:04:13 UTC, terchestor wrote:

 My major points would be:
 1) Make it a "final class": By default, classes have all their 
 methods virtual. I don't think you want that.

You're right.
 2) I'd adhere to the "Range" interface: Give your function the 
 "front"/"popFront()" functions, as well as the public "enum 
 empty = false".

That's and excellent idea because your final example is what I had in mind.
 3) I think the *instance* itself should carry the bounds on 
 initialization, and provide a single "uniform" function.

There is a later version with only one uniform method. But I'd like to be able to change bounds with a single instance.
 4) Uniform initialization: provide a "tausworthe(I = int, R = 
 double)(int seed, double low = 0.0, double high = 1.0 )" 
 function. This function should simply return a new instance of 
 a Tausworthe. The advantage of this are 2-fold:
 4.1) Similar syntax for struct/class initialization
 4.2) Default Type parameters for the type.

Excellent, as I said.
 The combination of 2/3/4 will allow you to write nifty things 
 like:
 int[] myRandomNumbers = tausworthe(randomSeed).take(10).array();

 And voila! An array of 10 random numbers! Easy-peasy.

I adhere. Thanks for taking time to review first D try.
Jan 21 2014
prev sibling next sibling parent "terchestor" <terchestor gmail.com> writes:
On Tuesday, 21 January 2014 at 12:32:43 UTC, bearophile wrote:
 terchestor:

 https://github.com/Terchestor/dlang/tree/Tausworthe-class

 /++
 §§§ Tausworthe

What's the purpose of those §? They add noise. If they aren't necessary I suggest to remove them.

 class Tausworthe ( I, F ) if ( __traits(isIntegral, I ) &&
                               __traits(isFloating, F ) )

I suggest to not put spaces around ( ). And if you want also remove the space after the class/functions name.

That's an old usage of mine.
 Perhaps you want to use a struct? The decision is important. 
 Are your methods all virtual?

Not a struc, I need a class.
 {
    alias I integral_t;
    alias F floating_t;

Today the syntax "alias Name = Foo;" is preferred.

I knew not. Do ypu mean: alias integral_t = I;
 In D code the use of "_t" as suffix for types is less common. 
 And perhaps it's better to use a starting upper case letter for 
 their name.

I forgot the Capital letter, but I like the trailing _t, like size_t.
    enum tauswortheConstFloat : floating_t
    {
        tau0 = 4294967296.0f

Class/struct/Enum/Union names should start with an upper case letter.
         tau1 = 4294967294,

Perhaps that literal can end with a U.
   ~this( )
    { }

No need for this.

Of course!
    void seed( integral_t seed_=1 )
 ...
        auto s = seed_;

In general annotate with enum/const/immutable all variables that don't need to mutate.

I'd better put an assertion (or enforce?) to be sure seed won't be null, and give constantness to seed.
        integral_t fseed( const integral_t s_ )
        {
            return ( tauswortheConst.lgc * s_ ) & 
 tauswortheConst.mask;
        }

That's probably better annotated as pure nothrow.

OK.
 }//§§==========END_CLASS::Tausworthe==========

I suggest to reduce the amount of === in this code.

OK.
 auto ?tausworthe0 = new Tausworthe!(ulong, double);

While D supports Unicode identifiers, they are a often a pain in the ass (this is a personal opinion), so perhaps you want to kill them all.

Not sure. I'm experimenting a kind of tagging to easily distinguish nested functions (with greek phi) or temporaries. Perhaps not a so good idea, but i'll see if enhances the readability later.
 //§§ __COPYRIGHT_NOTICE__

I think there's a ddoc field syntax for the copyright in the module ddoc.

Yes, but it's a work on progress...

Thank you so much for your invaluable help.
Jan 21 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
terchestor:

but I like the trailing _t, like size_t.

I can't be sure, but I think the trailing _t is not very used in D. size_t is an exception, like hash_t and little more. Bye, bearophile
Jan 21 2014
prev sibling next sibling parent "terchestor" <terchestor gmail.com> writes:
Thanks everybody for reviewing my code.

I rewrote almost everything to condense the code and conform to 
stylistic D usage.
My problem is the cast to integral type (line 80), downgrading 
performances of circa 50%.
Is there any possibility to avoid it?

Next step will be to implement the range interface as suggested 
by monarch_dodra.
Jan 22 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 22 January 2014 at 09:33:39 UTC, terchestor wrote:
 My problem is the cast to integral type (line 80), downgrading 
 performances of circa 50%.
 Is there any possibility to avoid it?

I'd say it's your "Tconst_tau0" that is getting in your way: You have a uint sized integral, which you divide by a floating point, just to multiply it again and cast back to integral type (!). I'd suggest you get rid of "high/low" completly. There are prng *adaptors* that can create a uniform distribution *from* a natural prng (std.random.uniform). If you do this, you code becomes (starting line 73): auto uni = (s1 ^ s2 ^ s3); static if (isFloatingPoint!result_t) { return uni / Tconst_tau0; } else { return uni; } Heck, you could get rid of floating point support entirely, and generate only integrals. Then, you just let "uniform" do the tough work: auto rnd = tausworthe!(ulong, uint)(randomSeed); auto rndF = rnd.uniform!"[)"(0.0, 1.0); //0 - 1 double range auto rnd1024 = rnd.uniform!"[)"(0, 1024); //[0 .. 1024) integral range. Nitpick: /// get seed from current time auto se = Clock.currTime().stdTime; This is known to be a bad seed. If you need a "good enough" seed, use "std.random.randomSeed". Although arguably (IMO), it is better place the burden of seeding on the caller.
Jan 22 2014
prev sibling next sibling parent "terchestor" <terchestor gmail.com> writes:
On Wednesday, 22 January 2014 at 11:02:10 UTC, monarch_dodra 
wrote:
 On Wednesday, 22 January 2014 at 09:33:39 UTC, terchestor wrote:
 My problem is the cast to integral type (line 80), downgrading 
 performances of circa 50%.
 Is there any possibility to avoid it?

I'd suggest you get rid of "high/low" completly. There are prng *adaptors* that can create a uniform distribution *from* a natural prng (std.random.uniform). If you do this, you code becomes (starting line 73): auto uni = (s1 ^ s2 ^ s3); static if (isFloatingPoint!result_t) { return uni / Tconst_tau0; } else { return uni; } Heck, you could get rid of floating point support entirely, and generate only integrals. Then, you just let "uniform" do the tough work: auto rnd = tausworthe!(ulong, uint)(randomSeed); auto rndF = rnd.uniform!"[)"(0.0, 1.0); //0 - 1 double range auto rnd1024 = rnd.uniform!"[)"(0, 1024); //[0 .. 1024) integral range.

Actually, for the purpose I've in mind, a DSP library, which is time critical (and that's why I need the fastest possible PRNG), I'd better decoupling the range processing and create a class implementing the range interface to store the numbers generated by uniform to serve them on demand: a composite pattern is better than inheritance.
 Nitpick:
         /// get seed from current time
         auto se = Clock.currTime().stdTime;

 This is known to be a bad seed. If you need a "good enough" 
 seed, use "std.random.randomSeed". Although arguably (IMO), it 
 is better place the burden of seeding on the caller.

Fine remark worth adopting.
Jan 22 2014
prev sibling next sibling parent "terchestor" <terchestor gmail.com> writes:
I finally opted for both the simplest and fastest solution: a 
templated class (integral type) and a templated uniform method 
(integral or floating-point) generating variates either in the 
whole integral range or in the 0.0..1.0 range.
I let the burden of generating the whole floating-point range on 
the user side, choosing any divider of the integral generated 
variate.

Hence a similar speed for both types, roughly 40% faster than 
std.random.uniform.
Jan 22 2014
prev sibling parent "Chris Cain" <clcain uncg.edu> writes:
On Wednesday, 22 January 2014 at 11:02:10 UTC, monarch_dodra 
wrote:
 use "std.random.randomSeed".

I think you mean "std.random.unpredictableSeed" but 100% agree.
Jan 22 2014