
digitalmars.D - Re: std.string and ranges

reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Yes, std.string must be revamped too.

From what you say in several of your last posts it seems std will be almost fully rewritten. I think this is positive, and while sometimes I think the D2 language (and its std lib) is becoming too complex for my tastes (I have appreciated D1 a lot because it's simple enough and it gives me quite a lot more than what I give to it; D2 asks quite a lot more of me, but gives me only a bit more), I like and silently agree with most of the things you want to do to improve the std lib. It seems D2 is becoming a language really different from D1, so I think Tango for D2 will need very large changes. It may be positive to copy a few things from Tango, like BigInt, etc., to the future std of D2.

I think the current std.random needs many changes (I have expressed my ideas on this in a past email, but if you want I can state them again). I may want to work a little to help you develop such a std lib for D2, for example std.random, but sometimes the things you say seem a bit too sophisticated for me, so I may not be able to help much in practice...

What I have recently said regarding the lazy generators can be applied to other things: having two ways to do things is generally bad, but if one of those ways is really easy and short to write and the other is very flexible and fast, then two ways to do something may coexist. For example this idea can be applied to the foreach, the std.random, etc.

I also hope to see the printing function(s) of D2 improved in the ways I have explained to you recently. The bad thing is that besides you I think no one else has expressed a comment regarding those functions, so I don't know if others like those ideas.

Bye,
bearophile
Feb 08 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 Yes, std.string must be revamped too.

From what you say in several of your last posts it seems std will be almost fully rewritten.

Well at some point I hope to be able to rally a few volunteers to help.
 I think this is positive, and while sometimes I think the D2 language
 (and its std lib) is becoming too complex for my tastes (I have
 appreciated D1 a lot because it's simple enough and it gives me quite
 a lot more than what I give to it; D2 asks quite a lot more of me, but
 gives me only a bit more), I like and silently agree with most of the
 things you want to do to improve the std lib. It seems D2 is becoming
 a language really different from D1, so I think Tango for D2 will need
 very large changes.
 
 It may be positive to copy a few things from Tango, like BigInt, etc., to
 the future std of D2.

Yah, if only the political barriers would fall. (BigInt in particular is written by Don and will be in Phobos2 as well.)
 I think the current std.random needs many changes (I have expressed
 my ideas on this in a past email, but if you want I can state them
 again). I may want to work a little to help you develop such a std lib
 for D2, for example std.random, but sometimes the things you say
 seem a bit too sophisticated for me, so I may not be able to
 help much in practice...

I searched Google for "bearophile std.random site:digitalmars.com" and couldn't find the message, so I'd appreciate a link. In my view std.random needs some obvious changes, such as obeying the range interface (i.e., any random generator is an infinite range). I effected that, and also simplified uniform a bit so that you never need to pass explicit template parameters into it. There's plenty more work to do on random, particularly on generating non-uniform distributions, and probably other areas, so I'd appreciate your input.
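As a minimal illustration of the "generator as infinite range" idea, here is a sketch (only a sketch, not the actual Phobos code; the constants are the MINSTD linear congruential parameters, x -> 16807 * x mod (2^31 - 1)):

import std.stdio : writeln;

// A random generator modeled as an infinite input range.
struct MinstdSketch
{
    uint state;

    this(uint seed) { state = seed % 2147483647 ? seed % 2147483647 : 1; }

    enum bool empty = false;  // an infinite range is never empty
    @property uint front() const { return state; }
    void popFront() { state = cast(uint)((16807UL * state) % 2147483647UL); }
}

void main()
{
    auto rnd = MinstdSketch(42);
    foreach (i; 0 .. 3)
    {
        writeln(rnd.front);  // consume it like any other input range
        rnd.popFront();
    }
}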
 What I have recently said regarding the lazy generators can be
 applied to other things: having two ways to do things is generally
 bad, but if one of those ways is really easy and short to write and
 the other is very flexible and fast, then two ways to do something
 may coexist. For example this idea can be applied to the foreach, the
 std.random, etc.

I'm not sure I understand this.
 I also hope to see the printing function(s) of D2 improved in the
 ways I have explained to you recently. The bad thing is that besides
 you I think no one else has expressed a comment regarding those
 functions. So I don't know if others like those ideas.

I think your ideas on improving I/O are very sensible. (I wouldn't want to format tuples the same way though.)

Andrei
Feb 10 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

Well at some point I hope to be able to rally a few volunteers to help.<

I hope to be able to understand how to do these things :-)
I searched google for bearophile std.random site:digitalmars.com and couldn't
find the message, so I'd appreciate a link<

I can just repeat the things I have said, improving them.
There's plenty more work to do on random, particularly on generating
non-uniform distributions, and probably other areas, so I'd appreciate your input.<

In my dlibs there's a random module too; I have not finished it, but it already contains some useful stuff:
http://www.fantascienza.net/leonardo/so/dlibs/random.html

A module like that in a standard lib needs functions like:
- choice: given an iterable of items, picks one at random.
- randInt: gives a random integer in an interval; the lower bound is zero by default.
- shuffle: randomly shuffles a given array (or range with random access, I'd say) in place.
- shuffled: like shuffle, but creates a copy of the items. Returns an array.
- uniform: gives a random real/float/double in a given interval.
- sample: extracts n items from an iterable with uniform distribution, without replacement.
- weightedChoice: like choice, but the user can specify the relative probability of choosing each item.

I can assure you that such functions are hugely useful in tons of programs. Yet you don't need more than a few days to implement all those things (I already have implementations of most of them). A few basic and easy distributions may be useful too.

A random module has to be used by lots of different people in many different situations:
- Some people need very good random numbers, even if they are slow to produce. Other people may also need a thread-safe random generator.
- Some people just need to write a 15-line D program that looks like a script. Their code often doesn't need to be very fast, but it must be as simple as possible, and as fast as possible to write, so the syntax has to be short and very intuitive. The programmer doesn't want to read the docs to write such code.
- Some people need to produce tons of random numbers, as fast as possible, but they don't need to be high quality. In such situations thread safety may be less important.
- Some people just need to write safe and fast code, something normal.

So to fulfill all such purposes I think the random module needs three different random generators:
- A really fast one. I think R250/521 is the best for this purpose: it's faster than the usual generator used in C, and it's much better. It needs a certain amount of good random numbers to start, which can be given by the other, slower and better generators. Such code is designed to be as fast as possible.
- A multi-purpose, average random generator, very easy to use, so you don't need to instantiate a class if you are writing a single-threaded program; you just use simple functions. A generator like Tango's KISS is good here, but the API can be made simpler and very easy.
- A very good generator, thread safe. For this a variant of the Mersenne Twister is good.

Info about R250/521:
http://www.informs-cs.org/wsc97papers/0127.PDF

C implementation of R250/521:

/* Copyright (c) 2006 the authors listed at the following URL:
   http://literateprograms.org/R250/521_(C)?action=history&offset=20060711142344

   Permission is hereby granted, free of charge, to any person obtaining a copy
   of this software and associated documentation files (the "Software"), to
   deal in the Software without restriction, including without limitation the
   rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
   sell copies of the Software, and to permit persons to whom the Software is
   furnished to do so, subject to the following conditions:

   The above copyright notice and this permission notice shall be included in
   all copies or substantial portions of the Software.

   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
   FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
   DEALINGS IN THE SOFTWARE.

   Retrieved from: http://literateprograms.org/R250/521_(C)?oldid=6914
*/

#include <stdlib.h>

#define R250_LEN 250
#define R521_LEN 521
#define ULONG_BITS (8*sizeof(unsigned long))

unsigned long r250_buffer[R250_LEN];
unsigned long r521_buffer[R521_LEN];
int r250_index;
int r521_index;

/* Seed both shift-register buffers with rand(), then plant a diagonal of
   bit patterns in the first word-size entries to guarantee the bit columns
   are linearly independent. */
void r250_521_init() {
    unsigned long mask1 = 1;
    unsigned long mask2 = (unsigned long)(-1L);
    int i = R521_LEN;
    while (i-- > R250_LEN) {
        r521_buffer[i] = rand();
    }
    while (i-- > ULONG_BITS) {
        r250_buffer[i] = rand();
        r521_buffer[i] = rand();
    }
    while (i-- > 0) {
        r250_buffer[i] = (rand() | mask1) & mask2;
        r521_buffer[i] = (rand() | mask1) & mask2;
        mask2 ^= mask1;
        mask1 >>= 1;
    }
    r250_buffer[0] = mask1;
    r521_buffer[0] = mask2;
    r250_index = 0;
    r521_index = 0;
}

#define R250_IA (sizeof(unsigned long)*103)
#define R250_IB (sizeof(unsigned long)*R250_LEN - R250_IA)
#define R521_IA (sizeof(unsigned long)*168)
#define R521_IB (sizeof(unsigned long)*R521_LEN - R521_IA)

/* Advance both shift registers (taps 103/250 and 168/521) and combine
   their outputs; the indices are kept in byte units, hence the sizeof
   scaling above. */
unsigned long r250_521_random() {
    unsigned long r, s;
    unsigned char *b1 = (unsigned char*)r250_buffer;
    unsigned char *b2 = (unsigned char*)r521_buffer;
    unsigned long *tmp1, *tmp2;
    int i1 = r250_index;
    int i2 = r521_index;
    int j1, j2;

    j1 = i1 - R250_IB;
    if (j1 < 0)
        j1 = i1 + R250_IA;
    j2 = i2 - R521_IB;
    if (j2 < 0)
        j2 = i2 + R521_IA;

    tmp1 = (unsigned long *)(b1 + i1);
    r = (*(unsigned long *)(b1 + j1)) ^ (*tmp1);
    *tmp1 = r;
    tmp2 = (unsigned long *)(b2 + i2);
    s = (*(unsigned long *)(b2 + j2)) ^ (*tmp2);
    *tmp2 = s;

    i1 = (i1 != sizeof(unsigned long)*(R250_LEN-1)) ? (i1 + sizeof(unsigned long)) : 0;
    r250_index = i1;
    i2 = (i2 != sizeof(unsigned long)*(R521_LEN-1)) ? (i2 + sizeof(unsigned long)) : 0;
    r521_index = i2;

    return r ^ s;
}
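For comparison, here is a minimal D sketch of two of the helpers proposed above, layered on Phobos' std.random (choice and randInt are the proposed names, not an existing API; the uniform argument order is that of present-day std.random):

import std.random : Random, uniform, unpredictableSeed;

// choice: pick one element of an array uniformly at random (proposed name).
T choice(T)(T[] items, ref Random rnd)
{
    assert(items.length > 0, "choice needs a non-empty input");
    return items[uniform(0, items.length, rnd)];
}

// randInt: random integer in [lo, hi), with a zero lower bound by default
// (proposed name; just a thin wrapper over uniform).
int randInt(ref Random rnd, int hi, int lo = 0)
{
    return uniform(lo, hi, rnd);
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    int[] data = [10, 20, 30, 40];
    auto picked = choice(data, rnd);  // one of 10, 20, 30, 40
    auto n = randInt(rnd, 100);       // 0 .. 99
}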
 What I have recently said regarding the lazy generators can be applied to
other things: having two ways to do things is generally bad, but if one of
those ways is really easy and short to write and the other is very flexible and
fast, then two ways to do something may coexist. For example this idea can
be applied to the foreach, the std.random, etc.<<


I'm not sure I understand this.<

I have expressed that idea once in the past too, and I have just explained an example of it regarding std.random. I can try again to explain myself.

When you design a feature in a language you have to find compromises among different needs:
- The feature needs a short and easy syntax that is not error-prone.
- The feature has to be efficient in space and time, and very flexible.

Often those two purposes are almost opposite; it's not easy to have both at the same time. A language like C++ often chooses the second, and a language like Python or Ruby the first (but there are many exceptions). So I may be tempted to ask you to put in two different implementations of such a feature:
- One very easy to use, short and easy to remember, not very efficient.
- One more flexible, more complex, less easy to remember (you may have to look into the docs to use it), very efficient.

But generally, duplicating things in a language and having two ways to do the same thing is bad; it's extra complexity. Let's call C1 the complexity (for the programmer) of using the first way, and C2 the complexity of using the second way. In several situations I have seen that if C1 <<< C2 then C1+C2 ~= C2, so C1 doesn't add much complexity to the language, and the two ways can coexist.
I think your ideas on improving I/O are very sensible.<

I/O is used during debugging, logging, etc. What I'd like is a very nice, very readable, concise, unambiguous way to show any built-in type. I may also want two ways to represent such data: a readable one, and one for ASCII serialization (but still readable, if possible), as in Python. Note this may need another default method besides toString, something like toRepr or toSerialize, that returns a textual representation which, given back to the object (to a standard method, if you want, like fromSerialize or fromRepr), re-creates the original built-in data or object.

I also print things often, so I find the printing syntax used by Tango confusing, and too long. I think function names like put/putr/putf/putfr are the best, because they are clean, short to type, etc. put prints, putr prints with a newline, and the versions with f format too in some way (using % as in C or {} as in C#/Tango; C# may be better here).
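A minimal sketch of the round-trip property being asked for, for one built-in type (toRepr/fromRepr are the names proposed here, not Phobos functions):

import std.conv : to;
import std.format : format;

// toRepr for a floating-point value: print enough digits that parsing
// the text re-creates the exact same value (the round-trip property).
string toRepr(double x)
{
    return format("%.17g", x);  // 17 significant digits round-trip a double
}

// fromRepr is then just parsing the representation back.
double fromRepr(string s)
{
    return to!double(s);
}

unittest
{
    double d = 0.1;
    assert(fromRepr(toRepr(d)) == d);  // the original value is re-created
}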
(I wouldn't want to format tuples the same way though.)<

Well, nothing I have done is perfect; everything can be improved. Do you mean structs or tuples? If you give a tuple to one of my printing functions you just receive a string-ification of all the items, with no spaces between them. It's a bare-bones thing. My put/putr show a standard representation of structs if they lack a toString.

I'd also like every struct to have a built-in copy, toHash, opEquals (they already have this), a default toString, a default opCmp too, etc. My record()/Record!() structs already have such qualities, even recursively, even if you nest normal structs into them.

Bye, bear hugs,
bearophile
Feb 10 2009
next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
bearophile, on February 10 at 15:50 you wrote to me:
 Andrei Alexandrescu:
 
Well at some point I hope to be able to rally a few volunteers to help.<

I hope to be able to understand how to do the things :-)
I searched google for bearophile std.random site:digitalmars.com and couldn't
find the message, so I'd appreciate a link<

I can just repeat the things I have said, improving them.
There's plenty more work to do on random, particularly on generating
non-uniform distributions, and probably other areas, so I'd appreciate your input.<

In my dlibs there's a random module too; I have not finished it, but it already contains some useful stuff:
http://www.fantascienza.net/leonardo/so/dlibs/random.html
A module like that in a standard lib needs functions like:

http://docs.python.org/library/random.html#module-random ;)

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Skepticism is the beginning of failure
Feb 11 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Leandro Lucarella:
 A module like that in a standard lib needs functions like:

http://docs.python.org/library/random.html#module-random ;)

You can notice some differences: shuffle/shuffled and weightedChoice are different or missing. You can also see I have not included other things, like randRange. I have also discussed thread safety, having three different rnd generators in the std lib, etc.

Bye,
bearophile
Feb 11 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 There's plenty more work to do on random, particularly on generating
non-uniform distributions, and probably other areas, so I'd appreciate your input.<

In my dlibs there's a random module too; I have not finished it, but it already contains some useful stuff:
http://www.fantascienza.net/leonardo/so/dlibs/random.html

Are you sure we're looking at the same libraries? What I see is that the std.algorithm has most of what you ask below, and that dlibs random is rather thin.
 A module like that in a standard lib needs functions like:
 - choice: given an iterable of items, picks one at random

int array[100];
auto rnd = Random(unpredictableSeed);
auto rndElement = array[uniform(rnd, 0, 100)];
 - randInt: gives a random integer in an interval, the lower bound is zero by
default.

auto randInt = uniform(rnd, 0, 10);
 - shuffle: to shuffle randomly a given array (or range with random access, I'd
say) in place.

randomShuffle(array, rnd);
 - shuffled: like shuffle, but creates a copy of the items. Return an array.

auto another = array.dup;
randomShuffle(another, rnd);
 - uniform: gives a random real/float/double in a given interval

double d = uniform(rnd, 0, 1);
 - sample: extracts n items from an iterable with uniform distribution, without
replacement

I'll add this.
 - weightedChoice: like choice, but the user can specify the relative
probability of choosing each item.

auto x = dice(rnd, 70, 20, 10);
 I can assure you that such functions are hugely useful in tons of programs.
Yet, you don't need more than a few days to implement all those things (I
already have implementations of most of them).

Again, I'm looking at dlibs/random and std/random and am a bit confused. I should be saying what you're saying! Are you sure you're not talking about rand in D 1.0 and about a dlibs from the future?
 A few basic and easy distributions may be useful too.

That I'd definitely appreciate your help with. For example, things like Gaussian and Poisson distributions would be very useful for many machine learning applications.
 A random module has to be used by lots of different people in many different
situations:
 - Some people need very good random numbers, even if they are slow to produce.
Other people may also need a thread-safe random generator.
 - Some people just need to write a 15-line D program that looks like a
script. Their code often doesn't need to be very fast, but it must be as simple
as possible, and as fast as possible to write, so the syntax has to be short
and very intuitive. The programmer doesn't want to read the docs to write such
code.
 - Some people need to produce tons of random numbers, as fast as possible, but
they don't need to be high quality. In such situations thread safety may be
less important.
 - Some people just need to write safe and fast code, something normal.
 
 So to fulfill all such purposes I think the random module needs three
different random generators:
 - A really fast one. I think R250/521 is the best for this purpose, it's
faster than the usual one used in C and it's much better. It needs a certain
amount of good random numbers to start, that can be given by the other slower
and better random generators. Such code is designed to be as fast as possible.

std.random offers MinStdRand0 and MinStdRand, two linear congruential random generators that are fast. Also you can use LinearCongruentialEngine to create your own, with different parameters.
 - A multi-purpose average random generator, very easy to use, so it doesn't
need to instantiate a class if you are writing a single threaded program, you
just use simple functions. A generator like KISS of Tango is good here, but the
API can be made simpler and very easy.

Perhaps there is some merit in having some random generation that doesn't involve creating an object. I don't think it's paramount, but it's a marginal convenience.
 - A very good generator, thread safe. For this a variant of the Mersenne
Twister is good.

Like std.random.MersenneTwisterEngine I presume :o).
 Info about R250/521:
 http://www.informs-cs.org/wsc97papers/0127.PDF

How does R250/521 compare with the linear congruential and Mersenne Twister engines? A cursory look failed to answer that question, and I don't have time for a closer read. Anyhow, it would be great to include more generators in Phobos, so if you plan to contribute some please let me know. The planned interface is similar to the range interface.

Andrei
Feb 11 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

What I see is that the std.algorithm has most of what you ask below,<

"shuffle"/"shuffler" are of course algorithms, but I think mixing all kind of algorithms into a module isn't good, it's better to move the random-related algorithms into std.random. It helps the memory of the programmer to know where to look for, and avoids lumping too many mixed things into a single module.
 int array[100];
 auto rnd = Random(unpredictableSeed);
 auto rndElement = array[uniform(rnd, 0, 100)];

That's just a starting point; now you can wrap that into a nice single function with this interface: choice(iterable), working with any iterable, lazy ones too. I know that's slow code, but in many situations you don't care, and you want handy and safe functions. And where you need max speed, you don't use it.

I don't like this:
uniform(rnd, 0, 100)
This seems better:
rnd.uniform(0, 100)
Then 0 can be the default lower bound:
rnd.uniform(100)
And for simple usages you may just want the default struct:
uniform(100)
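A sketch of the proposed convenience overloads (the name uniformTo is hypothetical, chosen to avoid clashing with std.random.uniform; the rnd.uniformTo(100) form relies on D's uniform function call syntax):

import std.random : Random, uniform, unpredictableSeed;

// Hypothetical wrappers in the spirit of the proposal: a module-level
// default generator, plus a zero lower bound by default.
private Random defaultGen;

static this()
{
    defaultGen = Random(unpredictableSeed);
}

// uniform(100) style: 0 .. hi-1 from the default generator.
int uniformTo(int hi)
{
    return uniform(0, hi, defaultGen);
}

// rnd.uniformTo(100) style: explicit generator, default lower bound.
int uniformTo(ref Random rnd, int hi)
{
    return uniform(0, hi, rnd);
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    auto a = uniformTo(100);      // default generator, default lower bound
    auto b = rnd.uniformTo(100);  // reads like the proposed rnd.uniform(100)
}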
 - randInt: gives a random integer in an interval, the lower bound is zero by
default.

 - uniform: gives a random real/float/double in a given interval


How can uniform return ints or doubles according to the input? I think it may be better to have two different functions. (And you can optimize for efficiency if you know you need an integral result.)

A normal(avg, stddev) function can be good too, where avg may again be zero by default :-)

A fastNormal(avg, stddev) function can be useful too: less precise, but faster.
 - shuffled: like shuffle, but creates a copy of the items. Return an array.

randomShuffle(another, rnd);

shuffled() is meant to be used in a single line, so you can use it inside an expression. And dup works with arrays only, so you need to put into shuffled() a call to something like the array() function of my libs, to have something more general.
 - weightedChoice: like choice, but the user can specify the relative
probability of choosing each item.<


I don't understand that interface; it isn't good. I was talking about a function that, given an iterable of items and an iterable of frequencies (or probabilities), returns an item at random according to the given probabilities.
How does R250/521 compare with linear congruential and the Mersenne Twister
engine?<

R250/521:
- generates better rnd values than a linear congruential generator, and is a bit faster;
- is faster than KISS;
- is much, much faster than the Mersenne Twister;
- needs a few good rnd numbers to start, which can be given by a Mersenne Twister (they may even be hard-coded).
Anyhow, it would be great to include more generators in Phobos, so if you plan
to contribute some please let me know.<

I think it's better not to overdo the number of generators, and to put only a few of them there, ones the programmer can tell apart in a simple way. I think three generators like R250/521 (usable either through a set of free functions or as struct methods), KISS (two interfaces), and the Mersenne Twister (only the thread-safe API) may cover most basic usages and some non-basic ones. Then it's easy to help the programmer tell them apart; you can sort them just according to their speed:

fastRnd ==> R250/521
rnd ==> KISS
slowRnd ==> Mersenne Twister

(If you think people will not use the third one because "slow" is in its name, you can call it preciseRND, but that's a longer name.) Later, a way to generate huge random BigInts can be added too.

Bye,
bearophile
Feb 11 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 
 What I see is that the std.algorithm has most of what you ask below,<

"shuffle"/"shuffler" are of course algorithms, but I think mixing all kind of algorithms into a module isn't good, it's better to move the random-related algorithms into std.random. It helps the memory of the programmer to know where to look for, and avoids lumping too many mixed things into a single module.

It's there. Read The Fine Manual: http://www.digitalmars.com/d/2.0/phobos/std_random.html
 int array[100];
 auto rnd = Random(unpredictableSeed);
 auto rndElement = array[uniform(rnd, 0, 100)];

That's just a starting point, now you can wrap that into a nice single function with this interface: choice(iterable) that works with any iterable, lazy too. I know that's slow code, but in many situations you don't care and you want handy and safe functions. And where you need max speed, you don't use it.

That would only work with ranges that know their length, right? I don't know how to select a random element of an infinite range or one that has an unknown length.
 I don't like this:
 uniform(rnd, 0, 100)
 This seems better:
 rnd.uniform(0, 100)
 Then 0 can be the default lower bound:
 rnd.uniform(100)
 And for simple usages you may just want the default struct:
 uniform(100)

Well this becomes nitpicky.
 - randInt: gives a random integer in an interval, the lower bound is zero by
default.

 - uniform: gives a random real/float/double in a given interval


How can uniform return ints or doubles according to the input? I think it may be better to have two different functions. (And you can optimize efficiency if you know you need an integral value as result).

I meant:

double d = uniform(rnd, 0.0, 1.0);

The "uniform" function infers the type of its result from its arguments.
 A normal(avg, stddev) function can be good too, where avg may be by default
zero again :-)
 
 A fastNormal(avg, stddev) function too can be useful, less precise but faster.

Where's the underlying random generator?
 - shuffled: like shuffle, but creates a copy of the items. Return an array.

randomShuffle(another, rnd);

shuffled() is meant to be used in a single line, so you can use it inside an expression. And dup works with arrays only, so you need to put into shuffled() a call to something like the array() function of my libs to have something more general.

Right. So what you're saying is that the concern of copying is separated from the concern of shuffling. So the two shouldn't be collapsed together unless there's an obvious necessity or benefit. Saving a line of code doesn't seem convincing.
 - weightedChoice: like choice, but the user can specify the relative
probability of choosing each item.<


I don't understand that interface, this isn't good.

Well it would be good if you read the documentation, which seems increasingly clear to me you haven't.
 How does R250/521 compare with linear congruential and the Mersenne Twister
engine?<

R250/521:
- generates better rnd values than a linear congruential generator, and is a bit faster;
- is faster than KISS;
- is much, much faster than the Mersenne Twister;
- needs a few good rnd numbers to start, which can be given by a Mersenne Twister (they may even be hard-coded).
 Anyhow, it would be great to include more generators in Phobos, so if you plan
to contribute some please let me know.<

I think it's better not to overdo the number of generators, and to put only a few of them there, ones the programmer can tell apart in a simple way.

Well it looks like everyone thinks their favorite generator is the "best". As far as Phobos is concerned, the design of std.random is inspired by http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2461.pdf, which goes for defining several generators.

Andrei
Feb 11 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

I don't know how to select a random element of an infinite range or one that
has an unknown length.<

The programmer has to take care not to give it an infinite range, because that may lead to an unbounded wait. If the range is finite but its length isn't known, there are algorithms to extract an item randomly with uniform probability. (Do you remember your quiz about extracting a random line from a file, where at the beginning you don't know how many lines it has?)
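The unknown-length case is the classic reservoir-sampling trick behind that quiz: keep one candidate, and replace it with the i-th element seen with probability 1/i. A sketch (pickOne is a hypothetical helper name):

import std.random : Random, uniform, unpredictableSeed;
import std.range : ElementType;

// Pick one element uniformly from a finite range whose length is unknown
// in advance: reservoir sampling with a reservoir of size one.
ElementType!R pickOne(R)(R r, ref Random rnd)
{
    ElementType!R result;
    size_t seen = 0;
    foreach (e; r)
    {
        ++seen;
        if (uniform(0, seen, rnd) == 0)  // keep e with probability 1/seen
            result = e;
    }
    assert(seen > 0, "pickOne needs a non-empty input");
    return result;
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    auto x = pickOne([3, 1, 4, 1, 5, 9], rnd);
}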
Well this becomes nitpicky.<

You are right, I am sorry, I was getting a bit nervous. You are usually gentle with me, and this helps me think much better. Maybe this time you are nervous/angry because I have not read the docs of std.random of Phobos of D2.
 double d = uniform(rnd, 0.0, 1.0);
 The "uniform" function infers the type of its result from its arguments.

If you have forgotten to put a .0 there in a small example, then it looks a bit too bug-prone. I still think I prefer two functions; it's also useful as self-documentation of the program.
Where's the underlying random generator?<

If not specified, it uses the default non-thread-safe generator (something like KISS, the middle one in my suggestions).
Saving a line of code doesn't seem convincing.<

It's not about a single line of code: shuffled() is an expression, so you can put it inside other expressions, and it may save more than one line. One line here and one line there make the difference. Having only functionally atomic functions/things is useful because you can combine them, but when you program you may see that 80% of the time you want to combine them in a small number of specific ways. So it's positive to add such common idioms as handy functions (ones that can also be used as expressions).
Well it would be good if you read the documentation,<

I have taken a look at the docs for dice(); I don't like its name, because it isn't intuitive at all, but its usage is easy. The usage of the function I have suggested is a bit higher level.

A possible alternative design for such a function is to take as input an already sorted array of the weights (besides the iterable of the items); this may speed the function up a bit (it just needs to call the algorithm for bisect search, I presume).
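To make the cumulative-weights variant concrete, a sketch (WeightedChooser is a hypothetical type; the running sums are computed once, so each draw costs one uniform number plus a binary search):

import std.random : Random, uniform, unpredictableSeed;

// weightedChoice via precomputed cumulative weights plus binary search.
struct WeightedChooser(T)
{
    T[] items;
    double[] cumulative;  // running sums of the weights

    this(T[] items, double[] weights)
    {
        assert(items.length == weights.length && items.length > 0);
        this.items = items;
        cumulative = new double[weights.length];
        double total = 0;
        foreach (i, w; weights)
        {
            total += w;
            cumulative[i] = total;
        }
    }

    T choose(ref Random rnd)
    {
        // draw x in [0, total), then bisect for the first running sum > x
        double x = uniform(0.0, cumulative[$ - 1], rnd);
        size_t lo = 0, hi = cumulative.length;
        while (lo < hi)
        {
            immutable mid = (lo + hi) / 2;
            if (cumulative[mid] <= x)
                lo = mid + 1;
            else
                hi = mid;
        }
        return items[lo];
    }
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    auto c = WeightedChooser!string(["a", "b", "c"], [70.0, 20.0, 10.0]);
    auto s = c.choose(rnd);  // "a" about 70% of the time
}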
Well it looks like everyone thinks their favorite generator is the "best".<

I have suggested three generators because there's no "best": the numbers generated by R250/521 are worse than ones generated by KISS and much worse than ones generated by Mersenne.
 http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2461.pdf
, which goes for defining several generators.

I have read about R250/521, and I have used it in some simulations in C/D, with good/acceptable results. But I can't "compete" with the groups of people that define C++. Among them there are probably experts on random generators, so it may be wiser for you to follow them and to ignore me. Regarding C++, if not already present, the default sort of D arrays may become the stable sort of some good C++ STL.

Bye,
bearophile
Feb 11 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 You are right, I am sorry, I was getting a bit nervous. You are
 usually gentle with me, and this helps me think much better. Maybe
 this time you are nervous/angry because I have not read the docs of
 std.random of Phobos of D2.

There's no reason for you to let your conduct be influenced negatively by anyone. If you're right, then that is what matters. One joy of hanging out in this group is that it's not the field of turf wars it once was (and that I believe it could become again in the future). My perception is that this is largely a group in which respect is earned on technical grounds.

Andrei
Feb 11 2009
prev sibling parent Benji Smith <dlanguage benjismith.net> writes:
bearophile wrote:
 I have taken a look at the docs for dice(); I don't like its name, because it
isn't intuitive at all, but its usage is easy. The usage of the function I have
suggested is a bit higher level.
 
 A possible alternative design for such a function is to take as input an
already sorted array of the weights (besides the iterable of the items); this
may speed the function up a bit (it just needs to call the algorithm for
bisect search, I presume).

FWIW, I've implemented this sort of thing before. In my implementation, it was called ProbabilisticChooser(T), and I could instantiate it either with a pair of parallel arrays or with a HashMap(T, double). In my case, I didn't lump it in with my other random-number-related code, because I had a set of other classes implementing the Chooser(T) interface. Some of them were random and some were deterministic, but they all provided the same "choose" function on top of a "choice strategy" implementation.

--benji
Feb 11 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
[about shuffling ranges]
 I'm with bearophile here. Not because his version is shorter, but because
 1) some containers/ranges might not have a dup method
 2) it is potentially faster because it may avoid unnecessary data 
 copying (relevant for large arrays).
 
 That said, having both versions is preferred by me with names like 
 shuffle and shuffledCopy.
 The same could be applied to sort - sortedCopy might be useful, too.

Oh, I understand now. Incidentally topNCopy exists already, and it becomes a sortedCopy for n = input.length. But perhaps an explicit function would drive the point home better.

About shuffleCopy, you and Leonardo are both right that it could be significantly faster than copying and then shuffling the copy. The solution that I see most in the spirit of the new range design is to have a ShuffledRange that, given another range, iterates it in a random manner. An ordinary copy() call closes the deal if a copy is needed.

Thanks,

Andrei
Feb 11 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 [about shuffling ranges]
 I'm with bearophile here. Not because his version is shorter, but because
 1) some containers/ranges might not have a dup method
 2) it is potentially faster because it may avoid unnecessary data 
 copying (relevant for large arrays).

 That said, having both versions is preferred by me with names like 
 shuffle and shuffledCopy.
 The same could be applied to sort - sortedCopy might be useful, too.

Oh, I understand now. Incidentally topNCopy exists already, and it becomes a sortedCopy for n = input.length. But perhaps an explicit function would drive the point home better. About shuffleCopy, you and Leonardo are both right that it could be significantly faster than copying and then shuffling the copy. The solution that I see most in the spirit of the new range design is to have a ShuffledRange that, given another range, iterates it in a random manner. An ordinary copy() call closes the deal if a copy is needed.

Wait, I take that back. I don't know of solid ways to sort into a copy or shuffle into a copy. For shuffling I'd need auxiliary memory (in addition to the copy itself), and for sorting I'd be better off copying and then sorting in place. Could anyone illuminate me about better ways? Andrei
Feb 11 2009
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Denis Koroskin" wrote
 On Thu, 12 Feb 2009 15:14:34 +0300, Steve Schveighoffer
 On Thu, 12 Feb 2009 14:41:26 +0300, Denis Koroskin wrote:

 On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Wait, I take that back. I don't know of solid ways to sort into a copy
 or shuffle into a copy. For shuffling I'd need auxiliary memory (in
 addition to the copy itself), and for sorting I'd be better off copying
 and then sorting in place. Could anyone illuminate me about better
 ways?

 Andrei

That's why I said it /might/ be faster. It doesn't have to be; the default implementation may look as follows:

typeof(Range.first)[] shuffledCopy(Range)(Range range)
{
    return shuffle(createCopy(range));
}

However, it may be implemented quite differently if not in place. For this to work we need two primitives that I didn't find in std.range (I know the names are not good, feel free to rename them as you please):

struct RandomNumbers(Rnd = Random)
{
    // ...
}

A finite range of length 'max' that generates random numbers in the interval [0..max) without repeating. An example:

That's the problem. How do you do this without remembering all the numbers you have returned so far (i.e. generating an array of N integers)? I've done algorithms like this, and generally you have to mutate the source array (swapping used items with the item at the location you took from). You may save some time on swapping by building a temporary array of pointers or indexes, but you still have to build a temporary array, which is what Andrei was saying.

-Steve

Well, I've attached the source code, so take a look at it. I don't remember all the values returned so far but I still use O(N) temporary storage.

Yes, you are remembering. The O(N) temporary storage is the part that remembers what you have returned (or rather, what you have not yet returned). However, it is clever how you do not have to initialize the array to begin with. I'm still not quite sure how it works.
 Copying and modifying an array of ints is often preferred to copying and 
 modifying an array of structs (they can be big, have heavyweight 
 assignment operators etc).

Yes, but Andrei said "I don't know of solid ways to ... shuffle into a copy. For shuffling I'd need auxiliary memory (in addition to the copy itself)" the O(N) space requirement could be quite costly for large arrays. -Steve
Feb 12 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Denis Koroskin" wrote
 On Thu, 12 Feb 2009 15:14:34 +0300, Steve Schveighoffer
 On Thu, 12 Feb 2009 14:41:26 +0300, Denis Koroskin wrote:

 On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Wait, I take that back. I don't know of solid ways to sort into a copy
 or shuffle into a copy. For shuffling I'd need auxiliary memory (in
 addition to the copy itself), and for sorting I'd be better off copying
 and then sorting in place. Could anyone illuminate me about better
 ways?

 Andrei

implementation may look as follows:

typeof(Range.first)[] shuffledCopy(Range)(Range range)
{
    return shuffle(createCopy(range));
}

However, it may be implemented quite differently if not in place. For this to work we need two primitives that I didn't find in std.range (I know the names are not good, feel free to rename them as you please):

struct RandomNumbers(Rnd = Random)
{
    // ...
}

A finite range of length 'max' that generates random numbers in the interval [0..max) without repeating. An example:

numbers you have returned so far (i.e. generating an array of N integers)? I've done algorithms like this, and generally you have to mutate the source array (swapping used items with the item at the location you took from). You may save some time on swapping by building a temporary array of pointers or indexes, but you still have to build a temporary array, which is what Andrei was saying. -Steve

remember all the values returned so far but I still use O(N) temporary storage.

Yes, you are remembering. The O(N) temporary storage is the part that remembers what you have returned (or rather, what you have not yet returned). However, it is clever how you do not have to initialize the array to begin with. I'm still not quite sure how it works.

Yah, the implementation is neat. The thing is, it's not "RandomNumbers", it's "RandomPermutation", because what it really generates is a permutation of the numbers {0, 1, ..., n-1}. Maybe a convenient way to generate permutations in std.random would please everybody. Then you index an array by a random permutation of its indices and you obtain a random cover of the array.

Andrei
Feb 12 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Ok, following Leonardo's and Denis' ideas on covering an immutable array 
in random order, I thought about it and came up with one algorithm, in 
addition to the obvious one that stores one integer per element (see 
Denis' code).

Here's the algorithm. It stores one bit per element and takes O(n log n) 
time to cover a range of length n.

1. Given a random-access range r of length n, create a bitmap with one 
bit per element in r. The range is initially all zeros.

2. At each step, select a random index in the _original_ array, i.e. a 
random number in 0 .. r.length. Then walk upwards (with wrapping at the 
end of the range) until you find a bit = 0 in the bitmap. Mark that bit 
= 1 and return the found element.

3. The cover ends when all bits in the bitmap are 1.
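In code, the three steps might look like the sketch below (present-day std.random names; note that later in this thread the walk-upwards step is shown to bias the selection, so this is a transcription of the idea, not a recommendation):

import std.random : Random, uniform, unpredictableSeed;

// Sketch of the bitmap cover described above: pick a random start index,
// walk upward (wrapping at the end) to the first unvisited slot, mark it.
// As discussed later in the thread, this does NOT yield a uniformly
// distributed permutation.
size_t[] bitmapCover(size_t n, ref Random rnd)
{
    auto visited = new bool[n];  // the "bitmap": one flag per element
    auto order = new size_t[n];
    foreach (k; 0 .. n)
    {
        auto i = uniform(0, n, rnd);  // step 2: random index in 0 .. n
        while (visited[i])
            i = (i + 1) % n;          // walk upward with wrapping
        visited[i] = true;
        order[k] = i;
    }
    return order;  // indices of the range in "random" order
}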

I did a ballpark complexity estimate by approximating the average number 
of steps in (2) with n / (n - k), where k is the number of 1s in the 
bitmap. IIRC the average number of steps to hit a 1 in a binomial 
distribution with p = (n - k) / n is 1/p. Right? In that case, the total 
number of steps taken by the algo above is:

N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n 
- 1)) < O (n log n)

Makes sense?

If the range does not offer random access, I think this algorithm will 
have quadratic complexity because it needs to make a number of steps 
proportional to n at each iteration.


Andrei
Feb 12 2009
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Ok, following Leonardo's and Denis' ideas on covering an immutable array 
 in random order, I thought about it and came up with one algorithm, in 
 addition to the obvious one that stores one integer per element (see 
 Denis' code).

 Here's the algorithm. It stores one bit per element and takes O(n log n) 
 time to cover a range of length n.

 1. Given a random-access range r of length n, create a bitmap with one bit 
 per element in r. The range is initially all zeros.

 2. At each step, select a random index in the _original_ array, i.e. a 
 random number in 0 .. r.length. Then walk upwards (with wrapping at the 
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit = 
 1 and return the found element.

 3. The cover ends when all bits in the bitmap are 1.

 I did a ballpark complexity estimate by approximating the average number 
 of steps in (2) with n / (n - k), where k is the number of 1s in the 
 bitmap. IIRC the average number of steps to hit a 1 in a binomial 
 distribution with p = (n - k) / n is 1/p. Right? In that case, the total 
 number of steps taken by the algo above is:

 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n - 
 1)) < O (n log n)

 Makes sense?

OK, but you still have an O(n) space requirement (albeit with a much smaller constant). There are some other optimizations you can make. For instance, finding a bit can be optimized somewhat (i.e. if one integer's worth of bits is 0xffffffff, you can skip 32 bits, and when you find an int which contains some 0 bits, you can even use an 8-bit lookup table to tell you where the next 0 bit is, so you can go 8 bits at a time looking for your empty bit). This might reduce the "search time" to a more acceptable number. At most, the search time for a bit will be n / 32 + 3. I think the tradeoff might be significant between a larger space requirement and an O(n) vs. O(n lg n) algo, but it would be highly dependent on the dataset size.

There's one other "obvious" algorithm: store pointers to the original elements, or null if the element hasn't been swapped. The code would look something like:

T[] _origArray;
T*[] _tmparray;
T* _headElem;

private T* getElem(int idx)
{
    if (auto retval = _tmparray[idx])
        return retval;
    return &_origArray[idx];
}

void next()
{
    int idx = randomInt(0 .. _tmparray.length);
    _headElem = getElem(idx);
    // move the last slot's entry into the hole, then shrink the array
    _tmparray[idx] = getElem(_tmparray.length - 1);
    _tmparray.length = _tmparray.length - 1;
}

Same space requirements as the index-generating range, but it may be faster in terms of looking up values.

-Steve
Feb 12 2009
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Ok, following Leonardo's and Denis' ideas on covering an immutable array 
 in random order, I thought about it and came up with one algorithm, in 
 addition to the obvious one that stores one integer per element (see 
 Denis' code).
 
 Here's the algorithm. It stores one bit per element and takes O(n log n) 
 time to cover a range of length n.
 
 1. Given a random-access range r of length n, create a bitmap with one 
 bit per element in r. The range is initially all zeros.
 
 2. At each step, select a random index in the _original_ array, i.e. a 
 random number in 0 .. r.length. Then walk upwards (with wrapping at the 
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit 
 = 1 and return the found element.
 
 3. The cover ends when all bits in the bitmap are 1.

So far so good.
 I did a ballpark complexity estimate by approximating the average number 
 of steps in (2) with n / (n - k), where k is the number of 1s in the 
 bitmap. IIRC the average number of steps to hit a 1 in a binomial 
 distribution with p = (n - k) / n is 1/p. Right? In that case, the total 
 number of steps taken by the algo above is:
 
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n 
 - 1)) < O (n log n)
 
 Makes sense?

No. Your wording sounds like you're doing stuff that's way off, but the resulting math is correct. My calculation would be based on the average length of a sequence of 1's (k/(n-k)). That means the work is 1 + k/(n-k) = n/(n-k).

Given that O(n*log(n)) is the theoretical best you can do, having a result that is < O(n*log(n)) is highly suspect. The sum 1/1 + 1/2 + 1/3 + 1/4 + 1/5 + ... is in fact O(log(n)).
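Spelling out the harmonic bound both calculations land on:

\[
\sum_{k=0}^{n-1} \frac{n}{n-k} = n \sum_{j=1}^{n} \frac{1}{j} = n H_n = n \ln n + \gamma n + O(1) = \Theta(n \log n)
\]

where H_n is the n-th harmonic number and \gamma is the Euler-Mascheroni constant.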
 If the range does not offer random access, I think this algorithm will 
 have quadratic complexity because it needs to make a number of steps 
 proportional to n at each iteration.

That's when copying may be the best implementation.
Feb 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Andrei Alexandrescu Wrote: 

 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

Well my wording means this: in an array of length n with k "holes" randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).
 Given that O(n*log(n)) is the theoretical best you can do, having a
 result that is < O(n*log(n)) is highly suspect.  The sum 1/1 + 1/2 +
 1/3 + 1/4 + 1/5 + ... is in fact O(log(n)).

Ever so pedantic. :o) I meant "<=" because I wasn't aware of the best theoretical bound. Do you have a pointer? Thanks.

I have the feeling there is something clever to do after half of the array has been covered. After that, the probability of a random element being uncovered falls below 0.5. I also have the feeling that interesting things can be done if the length of the range has certain values, such as the period of certain generators with certain parameters. I don't have the time to look into that now. Anyone versed in e.g. linear congruential generators with a given period?

One way or another, I'll add RandomCover to std.random. Thanks Leonardo, Denis, Steve, and Jason.

Andrei
Feb 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

Except your holes aren't randomly distributed, because you are more likely to fill a hole next to an already filled slot.

Oops, that makes me more likely to choose some slots than others. My algorithm stinks!

Consider the array at the second iteration. There are n-1 zeros and one 1 in the bitmap. The random number selection selects a number in 0..n with probability 1/n each. That's already wrong, because each element was supposed to be selected with probability 1/(n-1). That extra mass goes to the element right after the one selected in the first round.

I suck.

Andrei
Feb 12 2009
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

Except your holes aren't randomly distributed, because you are more likely to fill a hole next to an already filled slot.

algorithm stinks! Consider the array at the second iteration. There are n-1 zeros and one 1 in the bitmap. The random number selection selects a number in 0..n with probability 1/n each. That's already wrong because each element was supposed to be selected with probability 1/(n-1). That extra mass goes to the element right after the one selected in the first round. I suck.

Skip ahead a random number of elements before performing the linear search. Sean
Feb 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

likely to fill a hole next to an already filled slot.

algorithm stinks! Consider the array at the second iteration. There are n-1 zeros and one 1 in the bitmap. The random number selection selects a number in 0..n with probability 1/n each. That's already wrong because each element was supposed to be selected with probability 1/(n-1). That extra mass goes to the element right after the one selected in the first round. I suck.

Skip ahead a random number of elements before performing the linear search.

Can you prove this will yield a uniform cover? Andrei
Feb 12 2009
parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

likely to fill a hole next to an already filled slot.

algorithm stinks! Consider the array at the second iteration. There are n-1 zeros and one 1 in the bitmap. The random number selection selects a number in 0..n with probability 1/n each. That's already wrong because each element was supposed to be selected with probability 1/(n-1). That extra mass goes to the element right after the one selected in the first round. I suck.

Skip ahead a random number of elements before performing the linear search.


Nope :-) But it solves the problem of favoring adjacent elements. Sean
Feb 12 2009
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

Except your holes aren't randomly distributed, because you are more likely to fill a hole next to an already filled slot.

 Oops, that makes me more likely to choose some slots than others. My 

On a similar note, we previously suggested using merge sort with a random comparer in order to shuffle a list outside memory. My experimentation showed a heavy bias toward certain permutations, though; can anyone determine why that should be?
Feb 13 2009
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Ok, following Leonardo's and Denis' ideas on covering an immutable array
 in random order, I thought about it and came up with one algorithm, in
 addition to the obvious one that stores one integer per element (see
 Denis' code).
 Here's the algorithm. It stores one bit per element and takes O(n log n)
 time to cover a range of length n.
 1. Given a random-access range r of length n, create a bitmap with one
 bit per element in r. The range is initially all zeros.
 2. At each step, select a random index in the _original_ array, i.e. a
 random number in 0 .. r.length. Then walk upwards (with wrapping at the
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit
 = 1 and return the found element.
 3. The cover ends when all bits in the bitmap are 1.
 I did a ballpark complexity estimate by approximating the average number
 of steps in (2) with n / (n - k), where k is the number of 1s in the
 bitmap. IIRC the average number of steps to hit a 1 in a binomial
 distribution with p = (n - k) / n is 1/p. Right? In that case, the total
 number of steps taken by the algo above is:
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n
 - 1)) < O (n log n)
 Makes sense?
 If the range does not offer random access, I think this algorithm will
 have quadratic complexity because it needs to make a number of steps
 proportional to n at each iteration.
 Andrei

The concern I have with this scheme is that iterating over an array with this cover would not produce an unbiased distribution of permutations. A good shuffling algorithm will produce a perfectly uniform distribution over all possible permutations of the original range, assuming the underlying random number generator is unbiased. In other words, given an array of length N, every permutation should have an equal probability 1/N!.

To get this unbiasedness, after selecting M values from a larger set N, the probabilities of selecting any of the remaining N - M elements as element M + 1 must be uniform. In this case, it is clearly not, since elements just after elements that have already been selected are "bigger targets". Similar problems exist for naive implementations of the Fisher-Yates algorithm (http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). In other words, unless one is very careful, one will not get a truly unbiased random permutation.

That said, for certain applications, what you propose may be "random enough." However, if you do include it, please at least document that it is biased and should not be used for things like Monte Carlo simulations or gambling, where people might care.
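For contrast, the unbiased Fisher-Yates loop (a sketch; std.random's randomShuffle does essentially this). The classic naive bug is drawing j from [0, n) at every step, which yields n^n equally likely outcomes that cannot map evenly onto n! permutations:

import std.algorithm : swap;
import std.random : Random, uniform, unpredictableSeed;

// Unbiased Fisher-Yates shuffle: element i is swapped with a uniformly
// chosen element of [i, n), so every one of the n! permutations is
// equally likely (assuming an unbiased generator).
void fisherYates(T)(T[] a, ref Random rnd)
{
    foreach (i; 0 .. a.length)
    {
        auto j = uniform(i, a.length, rnd);  // j in [i, n), NOT [0, n)
        swap(a[i], a[j]);
    }
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    auto a = [1, 2, 3, 4, 5];
    fisherYates(a, rnd);
}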
Feb 12 2009
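[For concreteness, here is the probing cover above rendered as a minimal D sketch; biasedCover is a name invented for this sketch, and a current std.random is assumed. The inner linear probe is exactly where the "bigger targets" effect enters: a dart landing anywhere in a run of 1's slides to the 0 just past the run.]

import std.random : uniform;

// The bitmap cover: dart throw, then walk upward (with wraparound)
// to the first unvisited slot. Biased, as discussed above.
int[] biasedCover(int n)
{
    auto taken = new bool[n];
    int[] order;
    foreach (step; 0 .. n)
    {
        auto i = uniform(0, n);
        while (taken[i])
            i = (i + 1) % n; // the probe that favors slots after long runs
        taken[i] = true;
        order ~= i;
    }
    return order;
}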
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Ok, following Leonardo's and Denis' ideas on covering an immutable array
 in random order, I thought about it and came up with one algorithm, in
 addition to the obvious one that stores one integer per element (see
 Denis' code).
 Here's the algorithm. It stores one bit per element and takes O(n log n)
 time to cover a range of length n.
 1. Given a random-access range r of length n, create a bitmap with one
 bit per element in r. The range is initially all zeros.
 2. At each step, select a random index in the _original_ array, i.e. a
 random number in 0 .. r.length. Then walk upwards (with wrapping at the
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit
 = 1 and return the found element.
 3. The cover ends when all bits in the bitmap are 1.
 I did a ballpark complexity estimate by approximating the average number
 of steps in (2) with n / (n - k), where k is the number of 1s in the
 bitmap. IIRC the average number of steps to hit a 0, when each slot is a 0
 with probability p = (n - k) / n, is 1/p (a geometric distribution). Right?
 In that case, the total
 number of steps taken by the algo above is:
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n
 - 1)) = O(n log n)
 Makes sense?
 If the range does not offer random access, I think this algorithm will
 have quadratic complexity because it needs to make a number of steps
 proportional to n at each iteration.
 Andrei

The concern I have with this scheme is that iterating over an array with this cover would not produce an unbiased distribution of permutations. A good shuffling algorithm will produce a perfectly uniform distribution over all possible permutations of the original range, assuming the underlying random number generator is unbiased. In other words, given an array of length N, every permutation should have an equal probability 1/N!. To get this unbiasedness, after selecting M values from a larger set of N, the probabilities of selecting any of the remaining N - M elements as element M + 1 must be uniform. In this case they are clearly not, since elements just after elements that have already been selected are "bigger targets". Similar problems exist for naive implementations of the Fisher-Yates algorithm (http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). In other words, unless one is very careful, one will not get a truly unbiased random permutation. That said, for certain applications, what you propose may be "random enough." However, if you do include it, please at least document that it is biased and should not be used for things like Monte Carlo simulations or gambling, where people might care.

I can't believe I fell for this. I knew Fisher-Yates and used it in randomShuffle! I won't put a subtly skewed distribution in the standard library. The quest is on for a random cover of an array that consumes as little extra memory as possible and has good complexity! I'd appreciate any ideas. Andrei
Feb 12 2009
parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Ok, following Leonardo's and Denis' ideas on covering an immutable array
 in random order, I thought about it and came up with one algorithm, in
 addition to the obvious one that stores one integer per element (see
 Denis' code).
 Here's the algorithm. It stores one bit per element and takes O(n log n)
 time to cover a range of length n.
 1. Given a random-access range r of length n, create a bitmap with one
 bit per element in r. The range is initially all zeros.
 2. At each step, select a random index in the _original_ array, i.e. a
 random number in 0 .. r.length. Then walk upwards (with wrapping at the
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit
 = 1 and return the found element.
 3. The cover ends when all bits in the bitmap are 1.
 I did a ballpark complexity estimate by approximating the average number
 of steps in (2) with n / (n - k), where k is the number of 1s in the
 bitmap. IIRC the average number of steps to hit a 0, when each slot is a 0
 with probability p = (n - k) / n, is 1/p (a geometric distribution). Right?
 In that case, the total
 number of steps taken by the algo above is:
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n
 - 1)) = O(n log n)
 Makes sense?
 If the range does not offer random access, I think this algorithm will
 have quadratic complexity because it needs to make a number of steps
 proportional to n at each iteration.
 Andrei

The concern I have with this scheme is that iterating over an array with this cover would not produce an unbiased distribution of permutations. A good shuffling algorithm will produce a perfectly uniform distribution over all possible permutations of the original range, assuming the underlying random number generator is unbiased. In other words, given an array of length N, every permutation should have an equal probability 1/N!. To get this unbiasedness, after selecting M values from a larger set of N, the probabilities of selecting any of the remaining N - M elements as element M + 1 must be uniform. In this case they are clearly not, since elements just after elements that have already been selected are "bigger targets". Similar problems exist for naive implementations of the Fisher-Yates algorithm (http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). In other words, unless one is very careful, one will not get a truly unbiased random permutation. That said, for certain applications, what you propose may be "random enough." However, if you do include it, please at least document that it is biased and should not be used for things like Monte Carlo simulations or gambling, where people might care.

I can't believe I fell for this. I knew Fisher-Yates and used it in randomShuffle! I won't put a subtly skewed distribution in the standard library. The quest is on for a random cover of an array that consumes as little extra memory as possible and has good complexity! I'd appreciate any ideas.

The obvious baseline is to make an array with 0..n. Start on the left and swap each element with a random element to its right. That's O(n) for both speed and memory. O(n^2) runtime algorithms are easy with a bit array. I bet a special random generator could be built for O(1) memory at the cost of less-random sequences. I think it should be possible to pick parameters for a generator that will cycle through all values in a set order. I'll think more to see if I can come up with something creative.
Feb 12 2009
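[For reference, the baseline described here is the Fisher-Yates shuffle applied to an index array; a minimal sketch assuming a current std.random and std.algorithm:]

import std.algorithm : swap;
import std.random : uniform;

// O(n) time and O(n) words of storage (O(n log n) bits, as corrected
// below): shuffle 0 .. n-1, then read the indices off as the cover order.
int[] coverIndices(int n)
{
    auto idx = new int[n];
    foreach (i; 0 .. n)
        idx[i] = i;
    foreach (i; 0 .. n - 1)
        swap(idx[i], idx[uniform(i, n)]); // unbiased: j drawn from [i, n)
    return idx;
}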
next sibling parent Jason House <jason.james.house gmail.com> writes:
Jason House Wrote:

 Andrei Alexandrescu Wrote:
 
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Ok, following Leonardo's and Denis' ideas on covering an immutable array
 in random order, I thought about it and came up with one algorithm, in
 addition to the obvious one that stores one integer per element (see
 Denis' code).
 Here's the algorithm. It stores one bit per element and takes O(n log n)
 time to cover a range of length n.
 1. Given a random-access range r of length n, create a bitmap with one
 bit per element in r. The range is initially all zeros.
 2. At each step, select a random index in the _original_ array, i.e. a
 random number in 0 .. r.length. Then walk upwards (with wrapping at the
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit
 = 1 and return the found element.
 3. The cover ends when all bits in the bitmap are 1.
 I did a ballpark complexity estimate by approximating the average number
 of steps in (2) with n / (n - k), where k is the number of 1s in the
 bitmap. IIRC the average number of steps to hit a 0, when each slot is a 0
 with probability p = (n - k) / n, is 1/p (a geometric distribution). Right?
 In that case, the total
 number of steps taken by the algo above is:
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n
 - 1)) = O(n log n)
 Makes sense?
 If the range does not offer random access, I think this algorithm will
 have quadratic complexity because it needs to make a number of steps
 proportional to n at each iteration.
 Andrei

The concern I have with this scheme is that iterating over an array with this cover would not produce an unbiased distribution of permutations. A good shuffling algorithm will produce a perfectly uniform distribution over all possible permutations of the original range, assuming the underlying random number generator is unbiased. In other words, given an array of length N, every permutation should have an equal probability 1/N!. To get this unbiasedness, after selecting M values from a larger set of N, the probabilities of selecting any of the remaining N - M elements as element M + 1 must be uniform. In this case they are clearly not, since elements just after elements that have already been selected are "bigger targets". Similar problems exist for naive implementations of the Fisher-Yates algorithm (http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). In other words, unless one is very careful, one will not get a truly unbiased random permutation. That said, for certain applications, what you propose may be "random enough." However, if you do include it, please at least document that it is biased and should not be used for things like Monte Carlo simulations or gambling, where people might care.

I can't believe I fell for this. I knew Fisher-Yates and used it in randomShuffle! I won't put a subtly skewed distribution in the standard library. The quest is on for a random cover of an array that consumes as little extra memory as possible and has good complexity! I'd appreciate any ideas.

The obvious baseline is to make an array with 0..n. Start on the left and swap with a random element to the right. That's O(n) for both speed and memory.

Oops, that's wrong. It's O(n log(n)) for memory.
 O(n^2) runtime algorithms are easy with a bit array.

It's also possible to use a bit array for remembering what was picked but pick f(n) elements at a time. The memory is then O(n + log(n)*f(n)) and the runtime >= O(n^2/f(n) + f(n)*log(f(n)))
 I bet a special random generator could be built for O(1) memory at the cost
 of less-random sequences. I think it should be possible to pick parameters
 for a generator that will cycle through all values in a set order.
 
 I'll think more to see if I can come up with something creative.

Feb 13 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 I bet a special random generator could be built for O(1) memory at
 the cost of less-random sequences. I think it should be possible
 to pick parameters for a generator that will cycle through all
 values in a set order.
 
 I'll think more to see if I can come up with something creative.

A linear congruential generator (http://en.wikipedia.org/wiki/Linear_congruential_generator) with the recurrence x[n] = (x[n-1] * a + c) % m has period exactly m if:

a) c and m are relatively prime;
b) a - 1 is divisible by all prime factors of m;
c) if m is divisible by 4, then a - 1 is divisible by 4.

Now most of the time people want the period to be as long as possible, so they choose m = 2^32 and work out some "good" a and c from there. In our situation m is given (the length of the input range) and we need to compute a and c. Here's a simple method:

1. Factor m into primes. Obtain arrays p[], the prime factors, and w[], their powers. For example, say m = 125, then p[] = [ 3, 5 ] and w[] = [ 1, 2 ].
2. Replace each element w[i] with a number between 1 and w[i] inclusive.
3. Construct a1 = a - 1 as the pointwise product of w[] and p[], multiplied through (i.e., a1 = w[0]*p[0] * w[1]*p[1] * ...). If m is divisible by 4 and a1 is not, multiply a1 by 2 or 4 as needed.
4. Construct a = a1 + 1.
5. Construct c as a product of random powers of prime numbers, making sure the powers of all the primes in p[] are 0 (i.e., c is coprime to m).

Now we have a linear congruential generator that covers the range 0 .. m exactly.

Andrei
Feb 13 2009
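[To make the construction concrete, here is a tiny self-check, a sketch assuming a current std.stdio, with hand-picked constants. Taking m = 75 = 3 * 5^2 (see the correction at the end of the thread), a - 1 = 15 covers both prime factors and c = 7 is coprime to m, so the conditions above promise period exactly m:]

import std.stdio : writeln;

void main()
{
    enum m = 75, a = 16, c = 7; // a - 1 = 15 = 3 * 5; gcd(c, m) = 1
    bool[m] seen;
    uint x = 11; // any seed in 0 .. m works
    foreach (i; 0 .. m)
    {
        x = (x * a + c) % m;
        assert(!seen[x], "period shorter than m");
        seen[x] = true;
    }
    writeln("covered 0 .. ", m, " exactly once with O(1) state");
}

[The price, as anticipated upthread, is a much less random sequence: there are at most m^2 choices of (a, c), nowhere near the m! possible permutations.]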
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, Feb 13, 2009 at 9:37 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Fri, Feb 13, 2009 at 9:23 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Sean Kelly wrote:

 Skip ahead a random number of elements before performing the linear
 search.

Can you prove this will yield a uniform cover?

I don't see how it could. If you have a linear search at all, then an empty slot at the end of a run of filled elements will always have a higher probability of getting chosen. Pure dart throwing would clearly work: just keep trying randomly until you hit an empty slot. It won't perform very well, though.

If you count the total number of 1's that precede the 0 you've chosen, you can tell how biased the choice was. Say it was k/n when it should have been just 1/m (n being the total number of slots, m the number of 0's remaining). If you know this, you can systematically discard the choice and try again with probability such that the combined probability of both choices comes out to 1/m. So that would be k/n * x = 1/m, thus x = n/(k*m). So you keep the value found by linear probing with probability n/(k*m), otherwise you try again. The nice thing is that as m gets smaller, typical k's approach n, and so towards the end you get to keep more and more of your choices. So it fixes the problem with pure dart throwing, where it takes a longer and longer time to find the last few empty slots. How's that sound? --bb
Feb 12 2009
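[One wrinkle in the formula above: x = n/(k*m) can exceed 1, since it needs k >= n/m and many empty slots sit right after short runs with k below that average. A variant that stays well-defined is to accept with probability 1/k, where k is the number of slots the probe inspected: a given empty slot is landed on with probability k/n, so accepting with probability 1/k flattens every empty slot to 1/n per attempt. A minimal D sketch of that variant; pickUniform is a made-up name, and a current std.random is assumed:]

import std.random : uniform;

// Dart throw + linear probe + acceptance test. Assumes at least one
// empty (false) slot, otherwise the loop never terminates.
size_t pickUniform(const bool[] taken)
{
    while (true)
    {
        auto i = uniform(size_t(0), taken.length);
        size_t k = 1; // slots inspected, including the empty one found
        while (taken[i])
        {
            i = (i + 1) % taken.length;
            ++k;
        }
        if (uniform(size_t(0), k) == 0) // keep with probability 1/k
            return i;
    }
}

[Each attempt is accepted with overall probability m/n, summing k_i/n * 1/k_i over the m empty slots, which is exactly what makes the result uniform.]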
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, Feb 13, 2009 at 9:23 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Sean Kelly wrote:

 Skip ahead a random number of elements before performing the linear
 search.

Can you prove this will yield a uniform cover?

I don't see how it could. If you have a linear search at all, then an empty slot at the end of a run of filled elements will always have a higher probability of getting chosen. Pure dart throwing would clearly work: just keep trying randomly until you hit an empty slot. It won't perform very well, though. --bb
Feb 12 2009
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:

 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

Well, my wording means this: in an array of length n with k "holes" randomly distributed, the probability that a given slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of rolls of a fair n-sided die until we obtain one of (n-k) marked faces. And the average number of rolls is IIRC 1/p = n/(n-k).

Except your holes aren't randomly distributed, because you are more likely to fill a hole next to an already filled slot. Not sure how badly that skews the big-oh though. Sounds like a problem you'd find the answer to in an advanced text covering hashing, because it sounds just like hashing with linear probing. --bb
Feb 12 2009
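[The hashing text in question would be Knuth (TAOCP Vol. 3, section 6.4): his classical analysis of linear probing at load factor \alpha = k/n puts the expected cost of an unsuccessful search, which is exactly one dart-plus-probe here, at roughly

    \frac{1}{2}\left(1 + \frac{1}{(1 - \alpha)^2}\right),

worse than the 1/(1 - \alpha) = n/(n-k) that the independence assumption yields, precisely because occupied slots clump into runs.]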
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 11 Feb 2009 19:14:38 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 bearophile wrote:
 Andrei Alexandrescu:

 What I see is that the std.algorithm has most of what you ask below,<

kind of algorithms into a module isn't good, it's better to move the random-related algorithms into std.random. It helps the programmer remember where to look, and avoids lumping too many mixed things into a single module.

It's there. Read The Fine Manual: http://www.digitalmars.com/d/2.0/phobos/std_random.html
 int array[100];
 auto rnd = Random(unpredictableSeed);
 auto rndElement = array[uniform(rnd, 0, 100)];

function with this interface: choice(iterable) that works with any iterable, lazy too. I know that's slow code, but in many situations you don't care and you want handy and safe functions. And where you need max speed, you don't use it.

That would only work with ranges that know their length, right? I don't know how to select a random element of an infinite range or one that has an unknown length.
 I don't like this:
 uniform(rnd, 0, 100)
 This seems better:
 rnd.uniform(0, 100)
 Then 0 can be the default lower bound:
 rnd.uniform(100)
 And for simple usages you may just want the default struct:
 uniform(100)

Well this becomes nitpicky.
 - randInt: gives a random integer in an interval, the lower bound is  
 zero by default.

 - uniform: gives a random real/float/double in a given interval


it may be better to have two different functions. (And you can optimize efficiency if you know you need an integral value as result).

I meant: double d = uniform(rnd, 0.0, 1.0); The "uniform" function infers the type of its result from its arguments.
 A normal(avg, stddev) function can be good too, where avg may be by  
 default zero again :-)
  A fastNormal(avg, stddev) function too can be useful, less precise but  
 faster.

Where's the underlying random generator?
 - shuffled: like shuffle, but creates a copy of the items. Return an  
 array.

randomShuffle(another, rnd);

into an expression. And dup works with arrays only, so you need to put into shuffled() a call to something like the array() function of my libs to have something more general.

Right. So what you're saying is that the concern of copying is separated from the concern of shuffling. So the two shouldn't be collapsed together unless there's an obvious necessity or benefit. Saving a line of code doesn't seem convincing.

I'm with bearophile here. Not because his version is shorter, but because 1) some containers/ranges might not have a dup method 2) it is potentially faster because it may avoid unnecessary data copying (relevant for large arrays). That said, I'd prefer having both versions, with names like shuffle and shuffledCopy. The same could be applied to sort - sortedCopy might be useful, too.
 - weightedChoice: like choice, but the user can specify the relative  
 probability of choosing each item.<



Well it would be good if you read the documentation, which seems increasingly clear to me you haven't.
 How does R250/521 compare with linear congruential and the Mersenne  
 Twister engine?<

- it generates better rnd values than linear congruential, and is a bit faster.
- it is faster than KISS.
- it is much, much faster than Mersenne Twister.
- it needs a few good rnd numbers to start, so they can be given by a Mersenne Twister (they may even be hard-coded).
 Anyhow, it would be great to include more generators in Phobos, so if  
 you plan to contribute some please let me know.<

there are only a few of them, that the programmer can tell apart in a simple way.

Well it looks like everyone thinks their favorite generator is the "best". As far as Phobos is concerned, the design of std.random is inspired by http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2461.pdf, which opts for defining several generators. Andrei

Feb 11 2009
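[On the "unknown length" question above: the standard one-pass answer is reservoir sampling, which keeps one candidate and replaces it with the i-th element with probability 1/i. A minimal sketch; choice here is a hypothetical name, not an existing Phobos function, and a current std.random and std.range are assumed:]

import std.random : uniform;
import std.range.primitives : ElementType, empty, front, popFront;

// Uniformly picks one element from any non-empty input range, even one
// whose length is not known up front, in a single pass.
auto choice(Range)(Range r)
{
    assert(!r.empty);
    ElementType!Range picked = r.front;
    size_t i = 1;
    for (r.popFront(); !r.empty; r.popFront())
    {
        ++i;
        if (uniform(size_t(0), i) == 0) // replace with probability 1/i
            picked = r.front;
    }
    return picked;
}

[After i elements, each of them is the current candidate with probability 1/i, so one pass suffices; a truly infinite range, of course, remains out of reach.]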
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:

On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 [about shuffling ranges]
 I'm with bearophile here. Not because his version is shorter, but  
 because
 1) some containers/ranges might not have a dup method
 2) it is potentially faster because it may avoid unnecessary data  
 copying (relevant for large arrays).

 That said, having both versions is preferred by me with names like  
 shuffle and shuffledCopy.
 The same could be applied to sort - sortedCopy might be useful, too.

becomes a sortedCopy for n = input.length. But perhaps an explicit function would drive the point home better. About shuffleCopy, you and Leonardo are both right that it could be significantly faster than copying and then shuffling the copy. The solution that I see most in the spirit of the new range design is to have a ShuffledRange that, given another range, iterates it in a random manner. An ordinary copy() call closes the deal if a copy is needed.

Wait, I take that back. I don't know of solid ways to sort into a copy or shuffle into a copy. For shuffling I'd need auxiliary memory (in addition to the copy itself), and for sorting I'd be better off copying and then sorting in place. Could anyone illuminate me about better ways? Andrei

That's why I said it /might/ be faster. It doesn't have to be; the default implementation may look as follows:

typeof(Range.first)[] shuffledCopy(Range)(Range range)
{
    return shuffle(createCopy(range));
}

However, it may be implemented quite differently if not in place. For this to work we need two primitives that I didn't find in std.range (I know the names are not good, feel free to rename them as you please):

struct RandomNumbers(Rnd = Random)
{
    // ...
}

A finite range of length 'max' that generates random numbers in the interval [0..max) without repeating. An example:

Random rnd;
auto gen = RandomNumbers(10, rnd);
foreach (n; gen) {
    writefln(n);
}

Sample output: 2 6 8 1 4 7 5 9 0 3

Sample implementation is attached. Next we need RandomItems, a finite range that accepts a random-access range and returns random elements one by one without repeating:

struct RandomItems(Range, Rnd = Random)
{
    // ...
}

It makes use of the RandomNumbers generator and is very simple (sample source attached, too). Now we can create shuffledCopy easily:

auto shuffledCopy(Range)(Range range)
{
    auto gen = RandomItems!(Range)(range);
    typeof(range[0])[] copy = new typeof(range[0])[range.length];
    foreach (ref elem; copy) {
        elem = gen.head;
        gen.next();
    }
    return copy;
}

[attachment: test.d]
Feb 12 2009
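[The trick in the attached test.d amounts to a lazy Fisher-Yates: instead of shuffling everything up front, do one swap per element produced. A simplified sketch of the same idea; this version swaps indices directly rather than storing offsets as the attachment does, and assumes a current std.random:]

import std.random : uniform;

// Yields 0 .. n-1 in uniformly random order, one swap per element.
// O(n) words of auxiliary memory, O(1) work per front/popFront.
struct RandomNumbers
{
    private size_t[] items;

    this(size_t n)
    {
        items = new size_t[n];
        foreach (i, ref v; items)
            v = i;
        pick();
    }

    @property bool empty() const { return items.length == 0; }
    @property size_t front() const { return items[0]; }

    void popFront()
    {
        items = items[1 .. $];
        pick();
    }

    private void pick()
    {
        if (items.length > 1)
        {
            auto j = uniform(size_t(0), items.length);
            auto tmp = items[0];
            items[0] = items[j];
            items[j] = tmp;
        }
    }
}

[Since each front is chosen uniformly among the values not yet produced, the whole sequence is an unbiased random permutation, exactly the property the probing cover lacks.]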
prev sibling next sibling parent Steve Schveighoffer <schveiguy yahoo.com> writes:
On Thu, 12 Feb 2009 14:41:26 +0300, Denis Koroskin wrote:

 On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Wait, I take that back. I don't know of solid ways to sort into a copy
 or shuffle into a copy. For shuffling I'd need auxiliary memory (in
 addition to the copy itself), and for sorting I'd be better off copying
 and then sorting in place. Could anyone illuminate me about better
 ways?

 Andrei

That's why I said it /might/ be faster. It doesn't have to, the default implementation may look as follows: typeof(Range.first)[] shuffledCopy(Range)(Range range) { return shuffle(createCopy(range)); } However, it may be implemented quite differently if not inplace. For this to work we need two primitives that I didn't find in std.range (I know names are not good, feel free to rename them as you please): struct RandomNumbers(Rnd = Random) { // ... } A finite range of length 'max' that generates random numbers in interval [0..max) without repeating. An example:

That's the problem. How do you do this without remembering all the numbers you have returned so far (i.e. generating an array of N integers)? I've done algorithms like this, and generally you have to mutate the source array (swapping used items with the item at the location you took from). You may save some time on swapping by building a temporary array of pointers or indexes, but you still have to build a temporary array, which is what Andrei was saying. -Steve
Feb 12 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 12 Feb 2009 15:14:34 +0300, Steve Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 12 Feb 2009 14:41:26 +0300, Denis Koroskin wrote:

 On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Wait, I take that back. I don't know of solid ways to sort into a copy
 or shuffle into a copy. For shuffling I'd need auxiliary memory (in
 addition to the copy itself), and for sorting I'd be better off copying
 and then sorting in place. Could anyone illuminate me about better
 ways?

 Andrei

That's why I said it /might/ be faster. It doesn't have to, the default implementation may look as follows: typeof(Range.first)[] shuffledCopy(Range)(Range range) { return shuffle(createCopy(range)); } However, it may be implemented quite differently if not inplace. For this to work we need two primitives that I didn't find in std.range (I know names are not good, feel free to rename them as you please): struct RandomNumbers(Rnd = Random) { // ... } A finite range of length 'max' that generates random numbers in interval [0..max) without repeating. An example:

That's the problem. How do you do this without remembering all the numbers you have returned so far (i.e. generating an array of N integers)? I've done algorithms like this, and generally you have to mutate the source array (swapping used items with the item at the location you took from). You may save some time on swapping by building a temporary array of pointers or indexes, but you still have to build a temporary array, which is what Andrei was saying. -Steve

Well, I've attached the source code, so take a look at it. I don't remember all the values returned so far, but I still use O(N) temporary storage. Copying and modifying an array of ints is often preferred to copying and modifying an array of structs (they can be big, have heavyweight assignment operators, etc). Even if shuffledCopy doesn't get into std.algorithm, RandomNumbers/RandomItems are still pretty useful ranges.
Feb 12 2009
prev sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Fri, 13 Feb 2009 15:23:07 +0100, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 1. Factor m into primes. Obtain arrays p[], the prime factors, and w[],  
 their powers. For example, say m = 125, then p[] = [ 3, 5 ] and w[] = [  
 1, 2 ].

You meant for m = 75, right? Otherwise, p[] = [ 5 ] and w[] = [ 3 ]. -- Simen
Feb 13 2009
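[For the record, a quick trial-division check of the two factorizations in question; a sketch assuming a current std.stdio:]

import std.stdio : writefln;

// Prints the prime factors p[] and their powers w[] of m.
void factor(int m)
{
    int[] p, w;
    for (int d = 2; d * d <= m; ++d)
    {
        if (m % d != 0)
            continue;
        int e = 0;
        while (m % d == 0)
        {
            m /= d;
            ++e;
        }
        p ~= d;
        w ~= e;
    }
    if (m > 1)
    {
        p ~= m;
        w ~= 1;
    }
    writefln("p = %s, w = %s", p, w);
}

void main()
{
    factor(75);  // p = [3, 5], w = [1, 2]
    factor(125); // p = [5], w = [3]
}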