
digitalmars.D - Re: std.string and ranges

reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Yes, std.string must be revamped too.

From what you say in several of your last posts it seems std will be almost fully rewritten. I think this is positive, and while sometimes I think the D2 language (and its std lib) is becoming too complex for my tastes (I have appreciated D1 a lot because it's simple enough and it gives me quite a lot more than what I give to it; D2 asks quite a lot more of me, but gives me only a bit more), I like and silently agree with most of the things you want to do to improve the std lib. It seems D2 is becoming a language really different from D1, so I think Tango for D2 will need very large changes. It may be positive to copy a few things from Tango, like BigInt, etc., to the future std of D2.

I think the current std.random needs many changes (I have expressed my ideas on this in a past email, but if you want I can state them again). I may want to work a little to help you develop such a std lib for D2, for example std.random, but sometimes the things you say seem a bit too sophisticated for me, so I may not be able to help much in practice...

What I have recently said regarding the lazy generators can be applied to other things: having two ways to do things is generally bad, but if one of those ways is really easy and short to write and the other is very flexible and fast, then two ways to do something may coexist. For example this idea can be applied to the foreach, the std.random, etc.

I also hope to see the printing function(s) of D2 improved in the ways I have explained to you recently. The bad thing is that besides you I think no one else has expressed a comment regarding those functions, so I don't know if others like those ideas.

Bye,
bearophile
Feb 08 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 Yes, std.string must be revamped too.

From what you say in several of your last posts it seems std will be almost fully rewritten.

Well at some point I hope to be able to rally a few volunteers to help.
 I think this is positive, and while sometimes I think the D2 language
 (and its std lib) is becoming too complex for my tastes (I have
 appreciated D1 a lot because it's simple enough and it gives me quite
 a lot more than what I give to it; D2 asks quite a lot more of me, but
 gives me only a bit more), I like and silently agree with most of the
 things you want to do to improve the std lib. It seems D2 is becoming
 a language really different from D1, so I think Tango for D2 will need
 very large changes.
 
 It may be positive to copy a few things from Tango, like BigInt, etc., to
 the future std of D2.

Yah, if only the political barriers would fall. (BigInt in particular is written by Don and will be in Phobos2 as well.)
 I think the current std.random needs many changes (I have expressed
 my ideas on this in a past email, but if you want I can state them
 again). I may want to work a little to help you develop such a std lib
 for D2, for example std.random, but sometimes the things you say
 seem a bit too sophisticated for me, so I may not be able to
 help much in practice...

I searched Google for "bearophile std.random site:digitalmars.com" and couldn't find the message, so I'd appreciate a link. In my view std.random needs some obvious changes, such as obeying the range interface (i.e., any random generator is an infinite range). I effected that, and also simplified uniform a bit so that you never need to pass explicit template parameters into it. There's plenty more work to do on random, particularly on generating non-uniform distributions, and probably other areas, so I'd appreciate your input.
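As a minimal illustration of the "generator as infinite range" idea, here is a sketch (only a sketch, not the actual Phobos code; the constants are the MINSTD linear congruential parameters, x -> 16807 * x mod (2^31 - 1)):

import std.stdio : writeln;

// A random generator modeled as an infinite input range.
struct MinstdSketch
{
    uint state;

    this(uint seed) { state = seed % 2147483647 ? seed % 2147483647 : 1; }

    enum bool empty = false;  // an infinite range is never empty
    @property uint front() const { return state; }
    void popFront() { state = cast(uint)((16807UL * state) % 2147483647UL); }
}

void main()
{
    auto rnd = MinstdSketch(42);
    foreach (i; 0 .. 3)
    {
        writeln(rnd.front);  // consume it like any other input range
        rnd.popFront();
    }
}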
 What I have recently said regarding the lazy generators can be
 applied to other things: having two ways to do things is generally
 bad, but if one of those ways is really easy and short to write and
 the other is very flexible and fast, then two ways to do something
 may coexist. For example this idea can be applied to the foreach, the
 std.random, etc.

I'm not sure I understand this.
 I also hope to see the printing function(s) of D2 improved in the
 ways I have explained to you recently. The bad thing is that besides
 you I think no one else has expressed a comment regarding those
 functions. So I don't know if others like those ideas.

I think your ideas on improving I/O are very sensible. (I wouldn't want to format tuples the same way though.)

Andrei
Feb 10 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

Well at some point I hope to be able to rally a few volunteers to help.<

I hope to be able to understand how to do these things :-)
I searched google for bearophile std.random site:digitalmars.com and couldn't
find the message, so I'd appreciate a link<

I can just repeat the things I have said, improving them.
There's plenty more work to do on random, particularly on generating
non-uniform distributions, and probably other areas, so I'd appreciate your input.<

In my dlibs there's a random module too; I have not finished it, but it already contains some useful stuff:
http://www.fantascienza.net/leonardo/so/dlibs/random.html

A module like that in a standard lib needs functions like:
- choice: given an iterable of items, picks one at random.
- randInt: gives a random integer in an interval; the lower bound is zero by default.
- shuffle: randomly shuffles a given array (or range with random access, I'd say) in place.
- shuffled: like shuffle, but creates a copy of the items. Returns an array.
- uniform: gives a random real/float/double in a given interval.
- sample: extracts n items from an iterable with uniform distribution, without replacement.
- weightedChoice: like choice, but the user can specify the relative probability of choosing each item.

I can assure you that such functions are hugely useful in tons of programs. Yet you don't need more than a few days to implement all those things (I already have implementations of most of them). A few basic and easy distributions may be useful too.

A random module has to be used by lots of different people in many different situations:
- Some people need very good random numbers, even if they are slow to produce. Other people may also need a thread-safe random generator.
- Some people just need to write a 15-line D program that looks like a script. Their code often doesn't need to be very fast, but it must be as simple as possible, and as fast as possible to write, so the syntax has to be short and very intuitive. The programmer doesn't want to read the docs to write such code.
- Some people need to produce tons of random numbers, as fast as possible, but they don't need to be high quality. In such situations thread safety may be less important.
- Some people just need to write safe and fast code, something normal.

So to fulfill all such purposes I think the random module needs three different random generators:
- A really fast one. I think R250/521 is the best for this purpose: it's faster than the usual generator used in C, and it's much better. It needs a certain amount of good random numbers to start, which can be given by the other, slower and better generators. Such code is designed to be as fast as possible.
- A multi-purpose, average random generator, very easy to use, so you don't need to instantiate a class if you are writing a single-threaded program; you just use simple functions. A generator like Tango's KISS is good here, but the API can be made simpler and very easy.
- A very good generator, thread safe. For this a variant of the Mersenne Twister is good.

Info about R250/521:
http://www.informs-cs.org/wsc97papers/0127.PDF

C implementation of R250/521:

/* Copyright (c) 2006 the authors listed at the following URL:
   http://literateprograms.org/R250/521_(C)?action=history&offset=20060711142344

   Permission is hereby granted, free of charge, to any person obtaining a copy
   of this software and associated documentation files (the "Software"), to
   deal in the Software without restriction, including without limitation the
   rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
   sell copies of the Software, and to permit persons to whom the Software is
   furnished to do so, subject to the following conditions:

   The above copyright notice and this permission notice shall be included in
   all copies or substantial portions of the Software.

   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
   FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
   DEALINGS IN THE SOFTWARE.

   Retrieved from: http://literateprograms.org/R250/521_(C)?oldid=6914
*/

#include <stdlib.h>

#define R250_LEN 250
#define R521_LEN 521
#define ULONG_BITS (8*sizeof(unsigned long))

unsigned long r250_buffer[R250_LEN];
unsigned long r521_buffer[R521_LEN];
int r250_index;
int r521_index;

/* Seed both shift-register buffers with rand(), then plant a diagonal of
   bit patterns in the first word-size entries to guarantee the bit columns
   are linearly independent. */
void r250_521_init() {
    unsigned long mask1 = 1;
    unsigned long mask2 = (unsigned long)(-1L);
    int i = R521_LEN;
    while (i-- > R250_LEN) {
        r521_buffer[i] = rand();
    }
    while (i-- > ULONG_BITS) {
        r250_buffer[i] = rand();
        r521_buffer[i] = rand();
    }
    while (i-- > 0) {
        r250_buffer[i] = (rand() | mask1) & mask2;
        r521_buffer[i] = (rand() | mask1) & mask2;
        mask2 ^= mask1;
        mask1 >>= 1;
    }
    r250_buffer[0] = mask1;
    r521_buffer[0] = mask2;
    r250_index = 0;
    r521_index = 0;
}

#define R250_IA (sizeof(unsigned long)*103)
#define R250_IB (sizeof(unsigned long)*R250_LEN - R250_IA)
#define R521_IA (sizeof(unsigned long)*168)
#define R521_IB (sizeof(unsigned long)*R521_LEN - R521_IA)

/* Advance both shift registers (taps 103/250 and 168/521) and combine
   their outputs; the indices are kept in byte units, hence the sizeof
   scaling above. */
unsigned long r250_521_random() {
    unsigned long r, s;
    unsigned char *b1 = (unsigned char*)r250_buffer;
    unsigned char *b2 = (unsigned char*)r521_buffer;
    unsigned long *tmp1, *tmp2;
    int i1 = r250_index;
    int i2 = r521_index;
    int j1, j2;

    j1 = i1 - R250_IB;
    if (j1 < 0)
        j1 = i1 + R250_IA;
    j2 = i2 - R521_IB;
    if (j2 < 0)
        j2 = i2 + R521_IA;

    tmp1 = (unsigned long *)(b1 + i1);
    r = (*(unsigned long *)(b1 + j1)) ^ (*tmp1);
    *tmp1 = r;
    tmp2 = (unsigned long *)(b2 + i2);
    s = (*(unsigned long *)(b2 + j2)) ^ (*tmp2);
    *tmp2 = s;

    i1 = (i1 != sizeof(unsigned long)*(R250_LEN-1)) ? (i1 + sizeof(unsigned long)) : 0;
    r250_index = i1;
    i2 = (i2 != sizeof(unsigned long)*(R521_LEN-1)) ? (i2 + sizeof(unsigned long)) : 0;
    r521_index = i2;

    return r ^ s;
}
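For comparison, here is a minimal D sketch of two of the helpers proposed above, layered on Phobos' std.random (choice and randInt are the proposed names, not an existing API; the uniform argument order is that of present-day std.random):

import std.random : Random, uniform, unpredictableSeed;

// choice: pick one element of an array uniformly at random (proposed name).
T choice(T)(T[] items, ref Random rnd)
{
    assert(items.length > 0, "choice needs a non-empty input");
    return items[uniform(0, items.length, rnd)];
}

// randInt: random integer in [lo, hi), with a zero lower bound by default
// (proposed name; just a thin wrapper over uniform).
int randInt(ref Random rnd, int hi, int lo = 0)
{
    return uniform(lo, hi, rnd);
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    int[] data = [10, 20, 30, 40];
    auto picked = choice(data, rnd);  // one of 10, 20, 30, 40
    auto n = randInt(rnd, 100);       // 0 .. 99
}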
 What I have recently said regarding the lazy generators can be applied to
other things: having two ways to do things is generally bad, but if one of
those ways is really easy and short to write and the other is very flexible and
fast, then two ways to do something may coexist. For example this idea can
be applied to the foreach, the std.random, etc.<<


I'm not sure I understand this.<

I have expressed that idea once in the past too, and I have just explained an example of it regarding std.random. I can try again to explain myself.

When you design a feature in a language you have to find compromises among different needs:
- The feature needs a short and easy syntax that is not error-prone.
- The feature has to be efficient in space and time, and very flexible.

Often those two purposes are almost opposite; it's not easy to have both at the same time. A language like C++ often chooses the second, and a language like Python or Ruby the first (but there are many exceptions). So I may be tempted to ask you to put in two different implementations of such a feature:
- One very easy to use, short and easy to remember, not very efficient.
- One more flexible, more complex, less easy to remember (you may have to look into the docs to use it), very efficient.

But generally, duplicating things in a language and having two ways to do the same thing is bad; it's extra complexity. Let's call C1 the complexity (for the programmer) of using the first way, and C2 the complexity of using the second way. In several situations I have seen that if C1 <<< C2 then C1+C2 ~= C2, so C1 doesn't add much complexity to the language, and the two ways can coexist.
I think your ideas on improving I/O are very sensible.<

I/O is used during debugging, logging, etc. What I'd like is a very nice, very readable, concise, unambiguous way to show any built-in type. I may also want two ways to represent such data: a readable one, and one for ASCII serialization (but still readable, if possible), as in Python. Note this may need another default method besides toString, something like toRepr or toSerialize, that returns a textual representation which, given back to the object (to a standard method, if you want, like fromSerialize or fromRepr), re-creates the original built-in data or object.

I also print things often, so I find the printing syntax used by Tango confusing, and too long. I think function names like put/putr/putf/putfr are the best, because they are clean, short to type, etc. put prints, putr prints with a newline, and the versions with f format too in some way (using % as in C or {} as in C#/Tango; C# may be better here).
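A minimal sketch of the round-trip property being asked for, for one built-in type (toRepr/fromRepr are the names proposed here, not Phobos functions):

import std.conv : to;
import std.format : format;

// toRepr for a floating-point value: print enough digits that parsing
// the text re-creates the exact same value (the round-trip property).
string toRepr(double x)
{
    return format("%.17g", x);  // 17 significant digits round-trip a double
}

// fromRepr is then just parsing the representation back.
double fromRepr(string s)
{
    return to!double(s);
}

unittest
{
    double d = 0.1;
    assert(fromRepr(toRepr(d)) == d);  // the original value is re-created
}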
(I wouldn't want to format tuples the same way though.)<

Well, nothing I have done is perfect; everything can be improved. Do you mean structs or tuples? If you give a tuple to one of my printing functions you just receive a string-ification of all the items, with no spaces between them. It's a bare-bones thing. My put/putr show a standard representation of structs if they lack a toString.

I'd also like every struct to have a built-in copy, toHash, opEquals (they already have this), a default toString, a default opCmp too, etc. My record()/Record!() structs already have such qualities, even recursively, even if you nest normal structs into them.

Bye, bear hugs,
bearophile
Feb 10 2009
next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
bearophile, on February 10 at 15:50 you wrote to me:
 Andrei Alexandrescu:
 
Well at some point I hope to be able to rally a few volunteers to help.<

I hope to be able to understand how to do the things :-)
I searched google for bearophile std.random site:digitalmars.com and couldn't
find the message, so I'd appreciate a link<

I can just repeat the things I have said, improving them.
There's plenty more work to do on random, particularly on generating
non-uniform distributions, and probably other areas, so I'd appreciate your input.<

In my dlibs there's a random module too; I have not finished it, but it already contains some useful stuff:
http://www.fantascienza.net/leonardo/so/dlibs/random.html
A module like that in a standard lib needs functions like:

http://docs.python.org/library/random.html#module-random ;)

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Skepticism is the beginning of failure
Feb 11 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Leandro Lucarella:
 A module like that in a standard lib needs functions like:

http://docs.python.org/library/random.html#module-random ;)

You can notice some differences: shuffle/shuffled and weightedChoice are different or missing. You can also see I have not included other things, like randRange. I have also discussed thread safety, having three different rnd generators in the std lib, etc.

Bye,
bearophile
Feb 11 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 There's plenty more work to do on random, particularly on generating
non-uniform distributions, and probably other areas, so I'd appreciate your input.<

In my dlibs there's a random module too; I have not finished it, but it already contains some useful stuff:
http://www.fantascienza.net/leonardo/so/dlibs/random.html

Are you sure we're looking at the same libraries? What I see is that the std.algorithm has most of what you ask below, and that dlibs random is rather thin.
 A module like that in a standard lib needs functions like:
 - choice: given an iterable of items, picks one at random

int array[100];
auto rnd = Random(unpredictableSeed);
auto rndElement = array[uniform(rnd, 0, 100)];
 - randInt: gives a random integer in an interval, the lower bound is zero by
default.

auto randInt = uniform(rnd, 0, 10);
 - shuffle: to shuffle randomly a given array (or range with random access, I'd
say) in place.

randomShuffle(array, rnd);
 - shuffled: like shuffle, but creates a copy of the items. Return an array.

auto another = array.dup;
randomShuffle(another, rnd);
 - uniform: gives a random real/float/double in a given interval

double d = uniform(rnd, 0, 1);
 - sample: extracts n items from an iterable with uniform distribution, without
replacement

I'll add this.
 - weightedChoice: like choice, but the user can specify the relative
probability of choosing each item.

auto x = dice(rnd, 70, 20, 10);
 I can assure you that such functions are hugely useful in tons of programs.
Yet, you don't need more than a few days to implement all those things (I
already have implementations of most of them).

Again, I'm looking at dlibs/random and std/random and am a bit confused. I should be saying what you're saying! Are you sure you're not talking about rand in D 1.0 and about a dlibs from the future?
 A few basic and easy distributions may be useful too.

That I'd definitely appreciate your help with. For example, things like Gaussian and Poisson distributions would be very useful for many machine learning applications.
 A random module has to be used by lots of different people in many different
situations:
 - Some people need very good random numbers, even if they are slow to produce.
Other people may also need a thread-safe random generator.
 - Some people just need to write a 15-line D program that looks like a
script. Their code often doesn't need to be very fast, but it must be as simple
as possible, and as fast as possible to write, so the syntax has to be short
and very intuitive. The programmer doesn't want to read the docs to write such
code.
 - Some people need to produce tons of random numbers, as fast as possible, but
they don't need to be high quality. In such situations thread safety may be
less important.
 - Some people just need to write safe and fast code, something normal.
 
 So to fulfill all such purposes I think the random module needs three
different random generators:
 - A really fast one. I think R250/521 is the best for this purpose, it's
faster than the usual one used in C and it's much better. It needs a certain
amount of good random numbers to start, that can be given by the other slower
and better random generators. Such code is designed to be as fast as possible.

std.random offers MinStdRand0 and MinStdRand, two linear congruential random generators that are fast. Also you can use LinearCongruentialEngine to create your own, with different parameters.
 - A multi-purpose average random generator, very easy to use, so it doesn't
need to instantiate a class if you are writing a single threaded program, you
just use simple functions. A generator like KISS of Tango is good here, but the
API can be made simpler and very easy.

Perhaps there is some merit in having some random generation that doesn't involve creating an object. I don't think it's paramount, but it's a marginal convenience.
 - A very good generator, thread safe. For this a variant of the Mersenne
Twister is good.

Like std.random.MersenneTwisterEngine I presume :o).
 Info about R250/521:
 http://www.informs-cs.org/wsc97papers/0127.PDF

How does R250/521 compare with the linear congruential and Mersenne Twister engines? A cursory look failed to answer that question, and I don't have time for a closer read. Anyhow, it would be great to include more generators in Phobos, so if you plan to contribute some please let me know. The planned interface is similar to the range interface.

Andrei
Feb 11 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

What I see is that the std.algorithm has most of what you ask below,<

"shuffle"/"shuffler" are of course algorithms, but I think mixing all kind of algorithms into a module isn't good, it's better to move the random-related algorithms into std.random. It helps the memory of the programmer to know where to look for, and avoids lumping too many mixed things into a single module.
 int array[100];
 auto rnd = Random(unpredictableSeed);
 auto rndElement = array[uniform(rnd, 0, 100)];

That's just a starting point; now you can wrap that into a nice single function with this interface: choice(iterable), working with any iterable, lazy ones too. I know that's slow code, but in many situations you don't care, and you want handy and safe functions. And where you need max speed, you don't use it.

I don't like this:
uniform(rnd, 0, 100)
This seems better:
rnd.uniform(0, 100)
Then 0 can be the default lower bound:
rnd.uniform(100)
And for simple usages you may just want the default struct:
uniform(100)
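A sketch of the proposed convenience overloads (the name uniformTo is hypothetical, chosen to avoid clashing with std.random.uniform; the rnd.uniformTo(100) form relies on D's uniform function call syntax):

import std.random : Random, uniform, unpredictableSeed;

// Hypothetical wrappers in the spirit of the proposal: a module-level
// default generator, plus a zero lower bound by default.
private Random defaultGen;

static this()
{
    defaultGen = Random(unpredictableSeed);
}

// uniform(100) style: 0 .. hi-1 from the default generator.
int uniformTo(int hi)
{
    return uniform(0, hi, defaultGen);
}

// rnd.uniformTo(100) style: explicit generator, default lower bound.
int uniformTo(ref Random rnd, int hi)
{
    return uniform(0, hi, rnd);
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    auto a = uniformTo(100);      // default generator, default lower bound
    auto b = rnd.uniformTo(100);  // reads like the proposed rnd.uniform(100)
}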
 - randInt: gives a random integer in an interval, the lower bound is zero by
default.

 - uniform: gives a random real/float/double in a given interval


How can uniform return ints or doubles according to the input? I think it may be better to have two different functions. (And you can optimize for efficiency if you know you need an integral result.)

A normal(avg, stddev) function can be good too, where avg may again be zero by default :-)

A fastNormal(avg, stddev) function can be useful too: less precise, but faster.
 - shuffled: like shuffle, but creates a copy of the items. Return an array.

randomShuffle(another, rnd);

shuffled() is meant to be used in a single line, so you can use it inside an expression. And dup works with arrays only, so you need to put into shuffled() a call to something like the array() function of my libs, to have something more general.
 - weightedChoice: like choice, but the user can specify the relative
probability of choosing each item.<


I don't understand that interface; it isn't good. I was talking about a function that, given an iterable of items and an iterable of frequencies (or probabilities), returns an item at random according to the given probabilities.
How does R250/521 compare with linear congruential and the Mersenne Twister
engine?<

R250/521:
- generates better rnd values than a linear congruential generator, and is a bit faster;
- is faster than KISS;
- is much, much faster than the Mersenne Twister;
- needs a few good rnd numbers to start, which can be given by a Mersenne Twister (they may even be hard-coded).
Anyhow, it would be great to include more generators in Phobos, so if you plan
to contribute some please let me know.<

I think it's better not to overdo the number of generators, and to put only a few of them there, ones the programmer can tell apart in a simple way. I think three generators like R250/521 (usable either through a set of free functions or as struct methods), KISS (two interfaces), and the Mersenne Twister (only the thread-safe API) may cover most basic usages and some non-basic ones. Then it's easy to help the programmer tell them apart; you can sort them just according to their speed:

fastRnd ==> R250/521
rnd ==> KISS
slowRnd ==> Mersenne Twister

(If you think people will not use the third one because "slow" is in its name, you can call it preciseRND, but that's a longer name.) Later, a way to generate huge random BigInts can be added too.

Bye,
bearophile
Feb 11 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 
 What I see is that the std.algorithm has most of what you ask below,<

"shuffle"/"shuffler" are of course algorithms, but I think mixing all kind of algorithms into a module isn't good, it's better to move the random-related algorithms into std.random. It helps the memory of the programmer to know where to look for, and avoids lumping too many mixed things into a single module.

It's there. Read The Fine Manual: http://www.digitalmars.com/d/2.0/phobos/std_random.html
 int array[100];
 auto rnd = Random(unpredictableSeed);
 auto rndElement = array[uniform(rnd, 0, 100)];

That's just a starting point, now you can wrap that into a nice single function with this interface: choice(iterable) that works with any iterable, lazy too. I know that's slow code, but in many situations you don't care and you want handy and safe functions. And where you need max speed, you don't use it.

That would only work with ranges that know their length, right? I don't know how to select a random element of an infinite range or one that has an unknown length.
 I don't like this:
 uniform(rnd, 0, 100)
 This seems better:
 rnd.uniform(0, 100)
 Then 0 can be the default lower bound:
 rnd.uniform(100)
 And for simple usages you may just want the default struct:
 uniform(100)

Well this becomes nitpicky.
 - randInt: gives a random integer in an interval, the lower bound is zero by
default.

 - uniform: gives a random real/float/double in a given interval


How can uniform return ints or doubles according to the input? I think it may be better to have two different functions. (And you can optimize efficiency if you know you need an integral value as result).

I meant:

double d = uniform(rnd, 0.0, 1.0);

The "uniform" function infers the type of its result from its arguments.
 A normal(avg, stddev) function can be good too, where avg may be by default
zero again :-)
 
 A fastNormal(avg, stddev) function too can be useful, less precise but faster.

Where's the underlying random generator?
 - shuffled: like shuffle, but creates a copy of the items. Return an array.

randomShuffle(another, rnd);

shuffled() is meant to be used in a single line, so you can use it inside an expression. And dup works with arrays only, so you need to put into shuffled() a call to something like the array() function of my libs to have something more general.

Right. So what you're saying is that the concern of copying is separated from the concern of shuffling. So the two shouldn't be collapsed together unless there's an obvious necessity or benefit. Saving a line of code doesn't seem convincing.
 - weightedChoice: like choice, but the user can specify the relative
probability of choosing each item.<


I don't understand that interface, this isn't good.

Well it would be good if you read the documentation, which seems increasingly clear to me you haven't.
 How does R250/521 compare with linear congruential and the Mersenne Twister
engine?<

R250/521:
- generates better rnd values than a linear congruential generator, and is a bit faster;
- is faster than KISS;
- is much, much faster than the Mersenne Twister;
- needs a few good rnd numbers to start, which can be given by a Mersenne Twister (they may even be hard-coded).
 Anyhow, it would be great to include more generators in Phobos, so if you plan
to contribute some please let me know.<

I think it's better not to overdo the number of generators, and to put only a few of them there, ones the programmer can tell apart in a simple way.

Well it looks like everyone thinks their favorite generator is the "best". As far as Phobos is concerned, the design of std.random is inspired by http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2461.pdf, which goes for defining several generators.

Andrei
Feb 11 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

I don't know how to select a random element of an infinite range or one that
has an unknown length.<

The programmer has to take care not to give it an infinite range, because that may lead to an unbounded wait. If the range is finite but its length isn't known, there are algorithms to extract an item randomly with uniform probability. (Do you remember your quiz about extracting a random line from a file, where at the beginning you don't know how many lines it has?)
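The unknown-length case is the classic reservoir-sampling trick behind that quiz: keep one candidate, and replace it with the i-th element seen with probability 1/i. A sketch (pickOne is a hypothetical helper name):

import std.random : Random, uniform, unpredictableSeed;
import std.range : ElementType;

// Pick one element uniformly from a finite range whose length is unknown
// in advance: reservoir sampling with a reservoir of size one.
ElementType!R pickOne(R)(R r, ref Random rnd)
{
    ElementType!R result;
    size_t seen = 0;
    foreach (e; r)
    {
        ++seen;
        if (uniform(0, seen, rnd) == 0)  // keep e with probability 1/seen
            result = e;
    }
    assert(seen > 0, "pickOne needs a non-empty input");
    return result;
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    auto x = pickOne([3, 1, 4, 1, 5, 9], rnd);
}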
Well this becomes nitpicky.<

You are right, I am sorry, I was getting a bit nervous. You are usually gentle with me, and this helps me think much better. Maybe this time you are nervous/angry because I have not read the docs of std.random of Phobos of D2.
 double d = uniform(rnd, 0.0, 1.0);
 The "uniform" function infers the type of its result from its arguments.

If you have forgotten to put a .0 there in a small example, then it looks a bit too bug-prone. I still think I prefer two functions; it's also useful as self-documentation of the program.
Where's the underlying random generator?<

If not specified, it uses the default non-thread-safe generator (something like KISS, the middle one in my suggestions).
Saving a line of code doesn't seem convincing.<

It's not about a single line of code: shuffled() is an expression, so you can put it inside other expressions, and it may save more than one line. One line here and one line there make the difference. Having only functionally atomic functions/things is useful because you can combine them, but when you program you may see that 80% of the time you want to combine them in a small number of specific ways. So it's positive to add such common idioms as handy functions (ones that can also be used as expressions).
Well it would be good if you read the documentation,<

I have taken a look at the docs for dice(); I don't like its name, because it isn't intuitive at all, but its usage is easy. The usage of the function I have suggested is a bit higher level.

A possible alternative design for such a function is to take as input an already sorted array of the weights (besides the iterable of the items); this may speed the function up a bit (it just needs to call the algorithm for bisect search, I presume).
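To make the cumulative-weights variant concrete, a sketch (WeightedChooser is a hypothetical type; the running sums are computed once, so each draw costs one uniform number plus a binary search):

import std.random : Random, uniform, unpredictableSeed;

// weightedChoice via precomputed cumulative weights plus binary search.
struct WeightedChooser(T)
{
    T[] items;
    double[] cumulative;  // running sums of the weights

    this(T[] items, double[] weights)
    {
        assert(items.length == weights.length && items.length > 0);
        this.items = items;
        cumulative = new double[weights.length];
        double total = 0;
        foreach (i, w; weights)
        {
            total += w;
            cumulative[i] = total;
        }
    }

    T choose(ref Random rnd)
    {
        // draw x in [0, total), then bisect for the first running sum > x
        double x = uniform(0.0, cumulative[$ - 1], rnd);
        size_t lo = 0, hi = cumulative.length;
        while (lo < hi)
        {
            immutable mid = (lo + hi) / 2;
            if (cumulative[mid] <= x)
                lo = mid + 1;
            else
                hi = mid;
        }
        return items[lo];
    }
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    auto c = WeightedChooser!string(["a", "b", "c"], [70.0, 20.0, 10.0]);
    auto s = c.choose(rnd);  // "a" about 70% of the time
}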
Well it looks like everyone thinks their favorite generator is the "best".<

I have suggested three generators because there's no "best": the numbers generated by R250/521 are worse than ones generated by KISS and much worse than ones generated by Mersenne.
 http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2461.pdf
, which goes for defining several generators.

I have read about R250/521, and I have used it in some simulations in C/D, with good/acceptable results. But I can't "compete" with the groups of people that define C++. Among them there are probably experts on random generators, so it may be wiser for you to follow them and to ignore me. Regarding C++, if not already present, the default sort of D arrays may become the stable sort of some good C++ STL.

Bye,
bearophile
Feb 11 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 You are right, I am sorry, I was getting a bit nervous. You are
 usually gentle with me, and this helps me think much better. Maybe
 this time you are nervous/angry because I have not read the docs of
 std.random of Phobos of D2.

There's no reason for you to let your conduct be influenced negatively by anyone. If you're right, then that is what matters. One joy of hanging out in this group is that it's not the field of turf wars it once was (and that I believe it could become again in the future). My perception is that this is largely a group in which respect is earned on technical grounds.

Andrei
Feb 11 2009
prev sibling parent Benji Smith <dlanguage benjismith.net> writes:
bearophile wrote:
 I have taken a look at the docs for dice(); I don't like its name, because it
isn't intuitive at all, but its usage is easy. The usage of the function I have
suggested is a bit higher level.
 
 A possible alternative design for such a function is to take as input an
already sorted array of the weights (besides the iterable of the items); this
may speed the function up a bit (it just needs to call the algorithm for
bisect search, I presume).

FWIW, I've implemented this sort of thing before. In my implementation, it was called ProbabilisticChooser(T), and I could instantiate it either with a pair of parallel arrays or with a HashMap(T, double). In my case, I didn't lump it in with my other random-number-related code, because I had a set of other classes implementing the Chooser(T) interface. Some of them were random and some were deterministic, but they all provided the same "choose" function on top of a "choice strategy" implementation.

--benji
Feb 11 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
[about shuffling ranges]
 I'm with bearophile here. Not because his version is shorter, but because
 1) some containers/ranges might not have a dup method
 2) it is potentially faster because it may avoid unnecessary data 
 copying (relevant for large arrays).
 
 That said, having both versions is preferred by me with names like 
 shuffle and shuffledCopy.
 The same could be applied to sort - sortedCopy might be useful, too.

Oh, I understand now. Incidentally topNCopy exists already, and it becomes a sortedCopy for n = input.length. But perhaps an explicit function would drive the point home better.

About shuffleCopy, you and Leonardo are both right that it could be significantly faster than copying and then shuffling the copy. The solution that I see most in the spirit of the new range design is to have a ShuffledRange that, given another range, iterates it in a random manner. An ordinary copy() call closes the deal if a copy is needed.

Thanks,

Andrei
Feb 11 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 [about shuffling ranges]
 I'm with bearophile here. Not because his version is shorter, but because
 1) some containers/ranges might not have a dup method
 2) it is potentially faster because it may avoid unnecessary data 
 copying (relevant for large arrays).

 That said, having both versions is preferred by me with names like 
 shuffle and shuffledCopy.
 The same could be applied to sort - sortedCopy might be useful, too.

Oh, I understand now. Incidentally topNCopy exists already, and it becomes a sortedCopy for n = input.length. But perhaps an explicit function would drive the point home better. About shuffleCopy, you and Leonardo are both right that it could be significantly faster than copying and then shuffling the copy. The solution that I see most in the spirit of the new range design is to have a ShuffledRange that, given another range, iterates it in a random manner. An ordinary copy() call closes the deal if a copy is needed.

Wait, I take that back. I don't know of solid ways to sort into a copy or shuffle into a copy. For shuffling I'd need auxiliary memory (in addition to the copy itself), and for sorting I'd be better off copying and then sorting in place. Could anyone illuminate me about better ways? Andrei
Feb 11 2009
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Denis Koroskin" wrote
 On Thu, 12 Feb 2009 15:14:34 +0300, Steve Schveighoffer
 On Thu, 12 Feb 2009 14:41:26 +0300, Denis Koroskin wrote:

 On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Wait, I take that back. I don't know of solid ways to sort into a copy
 or shuffle into a copy. For shuffling I'd need auxiliary memory (in
 addition to the copy itself), and for sorting I'd be better off copying
 and then sorting in place. Could anyone illuminate me about better
 ways?

 Andrei

That's why I said it /might/ be faster. It doesn't have to be; the default implementation may look as follows:

typeof(Range.first)[] shuffledCopy(Range)(Range range)
{
    return shuffle(createCopy(range));
}

However, it may be implemented quite differently if not in place. For this to work we need two primitives that I didn't find in std.range (I know the names are not good, feel free to rename them as you please):

struct RandomNumbers(Rnd = Random)
{
    // ...
}

A finite range of length 'max' that generates random numbers in the interval [0..max) without repeating. An example:

That's the problem. How do you do this without remembering all the numbers you have returned so far (i.e. generating an array of N integers)? I've done algorithms like this, and generally you have to mutate the source array (swapping used items with the item at the location you took from). You may save some time on swapping by building a temporary array of pointers or indexes, but you still have to build a temporary array, which is what Andrei was saying.

-Steve

Well, I've attached the source code, so take a look at it. I don't remember all the values returned so far but I still use O(N) temporary storage.

Yes, you are remembering. The O(N) temporary storage is the part that remembers what you have returned (or rather, what you have not yet returned). However, it is clever how you do not have to initialize the array to begin with. I'm still not quite sure how it works.
 Copying and modifying an array of ints is often preferred to copying and 
 modifying an array of structs (they can be big, have heavyweight 
 assignment operators etc).

Yes, but Andrei said "I don't know of solid ways to ... shuffle into a copy. For shuffling I'd need auxiliary memory (in addition to the copy itself)" the O(N) space requirement could be quite costly for large arrays. -Steve
Feb 12 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Denis Koroskin" wrote
 On Thu, 12 Feb 2009 15:14:34 +0300, Steve Schveighoffer
 On Thu, 12 Feb 2009 14:41:26 +0300, Denis Koroskin wrote:

 On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Wait, I take that back. I don't know of solid ways to sort into a copy
 or shuffle into a copy. For shuffling I'd need auxiliary memory (in
 addition to the copy itself), and for sorting I'd be better off copying
 and then sorting in place. Could anyone illuminate me about better
 ways?

 Andrei

implementation may look as follows:

typeof(Range.first)[] shuffledCopy(Range)(Range range)
{
    return shuffle(createCopy(range));
}

However, it may be implemented quite differently if not in place. For this to work we need two primitives that I didn't find in std.range (I know the names are not good, feel free to rename them as you please):

struct RandomNumbers(Rnd = Random)
{
    // ...
}

A finite range of length 'max' that generates random numbers in the interval [0..max) without repeating. An example:

numbers you have returned so far (i.e. generating an array of N integers)? I've done algorithms like this, and generally you have to mutate the source array (swapping used items with the item at the location you took from). You may save some time on swapping by building a temporary array of pointers or indexes, but you still have to build a temporary array, which is what Andrei was saying. -Steve

remember all the values returned so far but I still use O(N) temporary storage.

Yes, you are remembering. The O(N) temporary storage is the part that remembers what you have returned (or rather, what you have not yet returned). However, it is clever how you do not have to initialize the array to begin with. I'm still not quite sure how it works.

Yah, the implementation is neat. The thing is, it's not "RandomNumbers", it's "RandomPermutation", because what it really generates is a permutation of the numbers {0, 1, ..., n-1}. Maybe a convenient way to generate permutations in std.random would please everybody. Then you index an array by a random permutation of its indices and you obtain a random cover of the array.

Andrei
Feb 12 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Ok, following Leonardo's and Denis' ideas on covering an immutable array 
in random order, I thought about it and came up with one algorithm, in 
addition to the obvious one that stores one integer per element (see 
Denis' code).

Here's the algorithm. It stores one bit per element and takes O(n log n) 
time to cover a range of length n.

1. Given a random-access range r of length n, create a bitmap with one 
bit per element in r. The range is initially all zeros.

2. At each step, select a random index in the _original_ array, i.e. a 
random number in 0 .. r.length. Then walk upwards (with wrapping at the 
end of the range) until you find a bit = 0 in the bitmap. Mark that bit 
= 1 and return the found element.

3. The cover ends when all bits in the bitmap are 1.
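In code, the three steps might look like the sketch below (present-day std.random names; note that later in this thread the walk-upwards step is shown to bias the selection, so this is a transcription of the idea, not a recommendation):

import std.random : Random, uniform, unpredictableSeed;

// Sketch of the bitmap cover described above: pick a random start index,
// walk upward (wrapping at the end) to the first unvisited slot, mark it.
// As discussed later in the thread, this does NOT yield a uniformly
// distributed permutation.
size_t[] bitmapCover(size_t n, ref Random rnd)
{
    auto visited = new bool[n];  // the "bitmap": one flag per element
    auto order = new size_t[n];
    foreach (k; 0 .. n)
    {
        auto i = uniform(0, n, rnd);  // step 2: random index in 0 .. n
        while (visited[i])
            i = (i + 1) % n;          // walk upward with wrapping
        visited[i] = true;
        order[k] = i;
    }
    return order;  // indices of the range in "random" order
}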

I did a ballpark complexity estimate by approximating the average number 
of steps in (2) with n / (n - k), where k is the number of 1s in the 
bitmap. IIRC the average number of steps to hit a 1 in a binomial 
distribution with p = (n - k) / n is 1/p. Right? In that case, the total 
number of steps taken by the algo above is:

N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n 
- 1)) < O (n log n)

Makes sense?

If the range does not offer random access, I think this algorithm will 
have quadratic complexity because it needs to make a number of steps 
proportional to n at each iteration.


Andrei
Feb 12 2009
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Ok, following Leonardo's and Denis' ideas on covering an immutable array 
 in random order, I thought about it and came up with one algorithm, in 
 addition to the obvious one that stores one integer per element (see 
 Denis' code).

 Here's the algorithm. It stores one bit per element and takes O(n log n) 
 time to cover a range of length n.

 1. Given a random-access range r of length n, create a bitmap with one bit 
 per element in r. The range is initially all zeros.

 2. At each step, select a random index in the _original_ array, i.e. a 
 random number in 0 .. r.length. Then walk upwards (with wrapping at the 
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit = 
 1 and return the found element.

 3. The cover ends when all bits in the bitmap are 1.

 I did a ballpark complexity estimate by approximating the average number 
 of steps in (2) with n / (n - k), where k is the number of 1s in the 
 bitmap. IIRC the average number of steps to hit a 1 in a binomial 
 distribution with p = (n - k) / n is 1/p. Right? In that case, the total 
 number of steps taken by the algo above is:

 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n - 
 1)) < O (n log n)

 Makes sense?

OK, but you still have an O(n) space requirement (albeit with a much smaller constant). There are some other optimizations you can make. For instance, finding a bit can be optimized somewhat (i.e. if one integer's worth of bits is 0xffffffff, you can skip 32 bits, and when you find an int which contains some 0 bits, you can even use an 8-bit lookup table to tell you where the next 0 bit is, so you can go 8 bits at a time looking for your empty bit). This might reduce the "search time" to a more acceptable number. At most, the search time for a bit will be n / 32 + 3. I think the tradeoff might be significant between a larger space requirement and an O(n) vs. O(n lg n) algo, but it would be highly dependent on the dataset size.

There's one other "obvious" algorithm: store pointers to the original elements, or null if the element hasn't been swapped. The code would look something like:

T[] _origArray;
T*[] _tmparray;
T* _headElem;

private T* getElem(int idx)
{
    if (auto retval = _tmparray[idx])
        return retval;
    return &_origArray[idx];
}

void next()
{
    int idx = randomInt(0 .. _tmparray.length);
    _headElem = getElem(idx);
    // move the last slot's entry into the hole, then shrink the array
    _tmparray[idx] = getElem(_tmparray.length - 1);
    _tmparray.length = _tmparray.length - 1;
}

Same space requirements as the index-generating range, but it may be faster in terms of looking up values.

-Steve
Feb 12 2009
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Ok, following Leonardo's and Denis' ideas on covering an immutable array 
 in random order, I thought about it and came up with one algorithm, in 
 addition to the obvious one that stores one integer per element (see 
 Denis' code).
 
 Here's the algorithm. It stores one bit per element and takes O(n log n) 
 time to cover a range of length n.
 
 1. Given a random-access range r of length n, create a bitmap with one 
 bit per element in r. The range is initially all zeros.
 
 2. At each step, select a random index in the _original_ array, i.e. a 
 random number in 0 .. r.length. Then walk upwards (with wrapping at the 
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit 
 = 1 and return the found element.
 
 3. The cover ends when all bits in the bitmap are 1.

So far so good.
 I did a ballpark complexity estimate by approximating the average number 
 of steps in (2) with n / (n - k), where k is the number of 1s in the 
 bitmap. IIRC the average number of steps to hit a 1 in a binomial 
 distribution with p = (n - k) / n is 1/p. Right? In that case, the total 
 number of steps taken by the algo above is:
 
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n 
 - 1)) < O (n log n)
 
 Makes sense?

No. Your wording sounds like you're doing stuff that's way off, but the resulting math is correct. My calculation would be based on the average length of a sequence of 1's (k/(n-k)). That means the work is 1 + k/(n-k) = n/(n-k).

Given that O(n*log(n)) is the theoretical best you can do, having a result that is < O(n*log(n)) is highly suspect. The sum 1/1 + 1/2 + 1/3 + 1/4 + 1/5 + ... is in fact O(log(n)).
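Spelling out the harmonic bound both calculations land on:

\[
\sum_{k=0}^{n-1} \frac{n}{n-k} = n \sum_{j=1}^{n} \frac{1}{j} = n H_n = n \ln n + \gamma n + O(1) = \Theta(n \log n)
\]

where H_n is the n-th harmonic number and \gamma is the Euler-Mascheroni constant.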
 If the range does not offer random access, I think this algorithm will 
 have quadratic complexity because it needs to make a number of steps 
 proportional to n at each iteration.

That's when copying may be the best implementation.
Feb 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Andrei Alexandrescu Wrote: 

 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

Well my wording means this: in an array of length n with k "holes" randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).
 Given that O(n*log(n)) is the theoretical best you can do, having a
 result that is < O(n*log(n)) is highly suspect.  The sum 1/1 + 1/2 +
 1/3 + 1/4 + 1/5 + ... is in fact O(log(n)).

Ever so pedantic. :o) I meant "<=" because I wasn't aware of the best theoretical bound. Do you have a pointer? Thanks.

I have the feeling there is something clever to do after half of the array has been covered. After that, the probability of a random element being uncovered falls below 0.5. I also have the feeling that interesting things can be done if the length of the range has certain values, such as the period of certain generators with certain parameters. I don't have the time to look into that now. Anyone versed in e.g. linear congruential generators with a given period?

One way or another, I'll add RandomCover to std.random. Thanks Leonardo, Denis, Steve, and Jason.

Andrei
Feb 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

Except your holes aren't randomly distributed, because you are more likely to fill a hole next to an already filled slot.

Oops, that makes me more likely to choose some slots than others. My algorithm stinks!

Consider the array at the second iteration. There are n-1 zeros and one 1 in the bitmap. The random number selection selects a number in 0..n with probability 1/n each. That's already wrong, because each element was supposed to be selected with probability 1/(n-1). That extra mass goes to the element right after the one selected in the first round.

I suck.

Andrei
Feb 12 2009
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

Except your holes aren't randomly distributed, because you are more likely to fill a hole next to an already filled slot.

algorithm stinks! Consider the array at the second iteration. There are n-1 zeros and one 1 in the bitmap. The random number selection selects a number in 0..n with probability 1/n each. That's already wrong because each element was supposed to be selected with probability 1/(n-1). That extra mass goes to the element right after the one selected in the first round. I suck.

Skip ahead a random number of elements before performing the linear search. Sean
Feb 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

likely to fill a hole next to an already filled slot.

algorithm stinks! Consider the array at the second iteration. There are n-1 zeros and one 1 in the bitmap. The random number selection selects a number in 0..n with probability 1/n each. That's already wrong because each element was supposed to be selected with probability 1/(n-1). That extra mass goes to the element right after the one selected in the first round. I suck.

Skip ahead a random number of elements before performing the linear search.

Can you prove this will yield a uniform cover? Andrei
Feb 12 2009
parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

likely to fill a hole next to an already filled slot.

algorithm stinks! Consider the array at the second iteration. There are n-1 zeros and one 1 in the bitmap. The random number selection selects a number in 0..n with probability 1/n each. That's already wrong because each element was supposed to be selected with probability 1/(n-1). That extra mass goes to the element right after the one selected in the first round. I suck.

Skip ahead a random number of elements before performing the linear search.


Nope :-) But it solves the problem of favoring adjacent elements. Sean
Feb 12 2009
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:
 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

randomly distributed, the probability that one slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of steps of rolling a fair n-faced die until we obtain one of (n-k) particular faces. And the average number of steps is IIRC 1/p = n/(n-k).

Except your holes aren't randomly distributed, because you are more likely to fill a hole next to an already filled slot.

 Oops, that makes me more likely to choose some slots than others. My 

On a similar note, we previously suggested using merge sort with a random comparer in order to shuffle a list outside memory. My experimentation showed a heavy bias toward certain permutations, though; can anyone determine why that should be?
Feb 13 2009
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Ok, following Leonardo's and Denis' ideas on covering an immutable array
 in random order, I thought about it and came up with one algorithm, in
 addition to the obvious one that stores one integer per element (see
 Denis' code).
 Here's the algorithm. It stores one bit per element and takes O(n log n)
 time to cover a range of length n.
 1. Given a random-access range r of length n, create a bitmap with one
 bit per element in r. The range is initially all zeros.
 2. At each step, select a random index in the _original_ array, i.e. a
 random number in 0 .. r.length. Then walk upwards (with wrapping at the
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit
 = 1 and return the found element.
 3. The cover ends when all bits in the bitmap are 1.
 I did a ballpark complexity estimate by approximating the average number
 of steps in (2) with n / (n - k), where k is the number of 1s in the
 bitmap. IIRC the average number of steps to hit a 1 in a binomial
 distribution with p = (n - k) / n is 1/p. Right? In that case, the total
 number of steps taken by the algo above is:
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n
 - 1)) < O (n log n)
 Makes sense?
 If the range does not offer random access, I think this algorithm will
 have quadratic complexity because it needs to make a number of steps
 proportional to n at each iteration.
 Andrei

The concern I have with this scheme is that iterating over an array with this cover would not produce an unbiased distribution of permutations. A good shuffling algorithm will produce a perfectly uniform distribution over all possible permutations of the original range, assuming the underlying random number generator is unbiased. In other words, given an array of length N, every permutation should have an equal probability 1/N!.

To get this unbiasedness, after selecting M values from a larger set N, the probabilities of selecting any of the remaining N - M elements as element M + 1 must be uniform. In this case, it is clearly not, since elements just after elements that have already been selected are "bigger targets". Similar problems exist for naive implementations of the Fisher-Yates algorithm (http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). In other words, unless one is very careful, one will not get a truly unbiased random permutation.

That said, for certain applications, what you propose may be "random enough." However, if you do include it, please at least document that it is biased and should not be used for things like Monte Carlo simulations or gambling, where people might care.
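For contrast, the unbiased Fisher-Yates loop (a sketch; std.random's randomShuffle does essentially this). The classic naive bug is drawing j from [0, n) at every step, which yields n^n equally likely outcomes that cannot map evenly onto n! permutations:

import std.algorithm : swap;
import std.random : Random, uniform, unpredictableSeed;

// Unbiased Fisher-Yates shuffle: element i is swapped with a uniformly
// chosen element of [i, n), so every one of the n! permutations is
// equally likely (assuming an unbiased generator).
void fisherYates(T)(T[] a, ref Random rnd)
{
    foreach (i; 0 .. a.length)
    {
        auto j = uniform(i, a.length, rnd);  // j in [i, n), NOT [0, n)
        swap(a[i], a[j]);
    }
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    auto a = [1, 2, 3, 4, 5];
    fisherYates(a, rnd);
}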
Feb 12 2009
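[For concreteness, here is the probing cover above rendered as a minimal D sketch; biasedCover is a name invented for this sketch, and a current std.random is assumed. The inner linear probe is exactly where the "bigger targets" effect enters: a dart landing anywhere in a run of 1's slides to the 0 just past the run.]

import std.random : uniform;

// The bitmap cover: dart throw, then walk upward (with wraparound)
// to the first unvisited slot. Biased, as discussed above.
int[] biasedCover(int n)
{
    auto taken = new bool[n];
    int[] order;
    foreach (step; 0 .. n)
    {
        auto i = uniform(0, n);
        while (taken[i])
            i = (i + 1) % n; // the probe that favors slots after long runs
        taken[i] = true;
        order ~= i;
    }
    return order;
}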
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Ok, following Leonardo's and Denis' ideas on covering an immutable array
 in random order, I thought about it and came up with one algorithm, in
 addition to the obvious one that stores one integer per element (see
 Denis' code).
 Here's the algorithm. It stores one bit per element and takes O(n log n)
 time to cover a range of length n.
 1. Given a random-access range r of length n, create a bitmap with one
 bit per element in r. The range is initially all zeros.
 2. At each step, select a random index in the _original_ array, i.e. a
 random number in 0 .. r.length. Then walk upwards (with wrapping at the
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit
 = 1 and return the found element.
 3. The cover ends when all bits in the bitmap are 1.
 I did a ballpark complexity estimate by approximating the average number
 of steps in (2) with n / (n - k), where k is the number of 1s in the
 bitmap. IIRC the average number of steps to hit a 0, when each slot is a 0
 with probability p = (n - k) / n, is 1/p (a geometric distribution). Right?
 In that case, the total
 number of steps taken by the algo above is:
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n
 - 1)) = O(n log n)
 Makes sense?
 If the range does not offer random access, I think this algorithm will
 have quadratic complexity because it needs to make a number of steps
 proportional to n at each iteration.
 Andrei

The concern I have with this scheme is that iterating over an array with this cover would not produce an unbiased distribution of permutations. A good shuffling algorithm will produce a perfectly uniform distribution over all possible permutations of the original range, assuming the underlying random number generator is unbiased. In other words, given an array of length N, every permutation should have an equal probability 1/N!. To get this unbiasedness, after selecting M values from a larger set of N, the probabilities of selecting any of the remaining N - M elements as element M + 1 must be uniform. In this case they are clearly not, since elements just after elements that have already been selected are "bigger targets". Similar problems exist for naive implementations of the Fisher-Yates algorithm (http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). In other words, unless one is very careful, one will not get a truly unbiased random permutation. That said, for certain applications, what you propose may be "random enough." However, if you do include it, please at least document that it is biased and should not be used for things like Monte Carlo simulations or gambling, where people might care.

I can't believe I fell for this. I knew Fisher-Yates and used it in randomShuffle! I won't put a subtly skewed distribution in the standard library. The quest is on for a random cover of an array that consumes as little extra memory as possible and has good complexity! I'd appreciate any ideas. Andrei
Feb 12 2009
parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Ok, following Leonardo's and Denis' ideas on covering an immutable array
 in random order, I thought about it and came up with one algorithm, in
 addition to the obvious one that stores one integer per element (see
 Denis' code).
 Here's the algorithm. It stores one bit per element and takes O(n log n)
 time to cover a range of length n.
 1. Given a random-access range r of length n, create a bitmap with one
 bit per element in r. The range is initially all zeros.
 2. At each step, select a random index in the _original_ array, i.e. a
 random number in 0 .. r.length. Then walk upwards (with wrapping at the
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit
 = 1 and return the found element.
 3. The cover ends when all bits in the bitmap are 1.
 I did a ballpark complexity estimate by approximating the average number
 of steps in (2) with n / (n - k), where k is the number of 1s in the
 bitmap. IIRC the average number of steps to hit a 0, when each slot is a 0
 with probability p = (n - k) / n, is 1/p (a geometric distribution). Right?
 In that case, the total
 number of steps taken by the algo above is:
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n
 - 1)) = O(n log n)
 Makes sense?
 If the range does not offer random access, I think this algorithm will
 have quadratic complexity because it needs to make a number of steps
 proportional to n at each iteration.
 Andrei

The concern I have with this scheme is that iterating over an array with this cover would not produce an unbiased distribution of permutations. A good shuffling algorithm will produce a perfectly uniform distribution over all possible permutations of the original range, assuming the underlying random number generator is unbiased. In other words, given an array of length N, every permutation should have an equal probability 1/N!. To get this unbiasedness, after selecting M values from a larger set of N, the probabilities of selecting any of the remaining N - M elements as element M + 1 must be uniform. In this case they are clearly not, since elements just after elements that have already been selected are "bigger targets". Similar problems exist for naive implementations of the Fisher-Yates algorithm (http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). In other words, unless one is very careful, one will not get a truly unbiased random permutation. That said, for certain applications, what you propose may be "random enough." However, if you do include it, please at least document that it is biased and should not be used for things like Monte Carlo simulations or gambling, where people might care.

I can't believe I fell for this. I knew Fisher-Yates and used it in randomShuffle! I won't put a subtly skewed distribution in the standard library. The quest is on for a random cover of an array that consumes as little extra memory as possible and has good complexity! I'd appreciate any ideas.

The obvious baseline is to make an array with 0..n. Start on the left and swap each element with a random element to its right. That's O(n) for both speed and memory. O(n^2) runtime algorithms are easy with a bit array. I bet a special random generator could be built for O(1) memory at the cost of less-random sequences. I think it should be possible to pick parameters for a generator that will cycle through all values in a set order. I'll think more to see if I can come up with something creative.
Feb 12 2009
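[For reference, the baseline described here is the Fisher-Yates shuffle applied to an index array; a minimal sketch assuming a current std.random and std.algorithm:]

import std.algorithm : swap;
import std.random : uniform;

// O(n) time and O(n) words of storage (O(n log n) bits, as corrected
// below): shuffle 0 .. n-1, then read the indices off as the cover order.
int[] coverIndices(int n)
{
    auto idx = new int[n];
    foreach (i; 0 .. n)
        idx[i] = i;
    foreach (i; 0 .. n - 1)
        swap(idx[i], idx[uniform(i, n)]); // unbiased: j drawn from [i, n)
    return idx;
}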
next sibling parent Jason House <jason.james.house gmail.com> writes:
Jason House Wrote:

 Andrei Alexandrescu Wrote:
 
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Ok, following Leonardo's and Denis' ideas on covering an immutable array
 in random order, I thought about it and came up with one algorithm, in
 addition to the obvious one that stores one integer per element (see
 Denis' code).
 Here's the algorithm. It stores one bit per element and takes O(n log n)
 time to cover a range of length n.
 1. Given a random-access range r of length n, create a bitmap with one
 bit per element in r. The range is initially all zeros.
 2. At each step, select a random index in the _original_ array, i.e. a
 random number in 0 .. r.length. Then walk upwards (with wrapping at the
 end of the range) until you find a bit = 0 in the bitmap. Mark that bit
 = 1 and return the found element.
 3. The cover ends when all bits in the bitmap are 1.
 I did a ballpark complexity estimate by approximating the average number
 of steps in (2) with n / (n - k), where k is the number of 1s in the
 bitmap. IIRC the average number of steps to hit a 0, when each slot is a 0
 with probability p = (n - k) / n, is 1/p (a geometric distribution). Right?
 In that case, the total
 number of steps taken by the algo above is:
 N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n
 - 1)) = O(n log n)
 Makes sense?
 If the range does not offer random access, I think this algorithm will
 have quadratic complexity because it needs to make a number of steps
 proportional to n at each iteration.
 Andrei

The concern I have with this scheme is that iterating over an array with this cover would not produce an unbiased distribution of permutations. A good shuffling algorithm will produce a perfectly uniform distribution over all possible permutations of the original range, assuming the underlying random number generator is unbiased. In other words, given an array of length N, every permutation should have an equal probability 1/N!. To get this unbiasedness, after selecting M values from a larger set of N, the probabilities of selecting any of the remaining N - M elements as element M + 1 must be uniform. In this case they are clearly not, since elements just after elements that have already been selected are "bigger targets". Similar problems exist for naive implementations of the Fisher-Yates algorithm (http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). In other words, unless one is very careful, one will not get a truly unbiased random permutation. That said, for certain applications, what you propose may be "random enough." However, if you do include it, please at least document that it is biased and should not be used for things like Monte Carlo simulations or gambling, where people might care.

I can't believe I fell for this. I knew Fisher-Yates and used it in randomShuffle! I won't put a subtly skewed distribution in the standard library. The quest is on for a random cover of an array that consumes as little extra memory as possible and has good complexity! I'd appreciate any ideas.

The obvious baseline is to make an array with 0..n. Start on the left and swap with a random element to the right. That's O(n) for both speed and memory.

Oops, that's wrong. It's O(n log(n)) for memory.
 O(n^2) runtime algorithms are easy with a bit array.

It's also possible to use a bit array for remembering what was picked but pick f(n) elements at a time. The memory is then O(n + log(n)*f(n)) and the runtime >= O(n^2/f(n) + f(n)*log(f(n)))
 I bet a special random generator could be built for O(1) memory at the cost
 of less-random sequences. I think it should be possible to pick parameters
 for a generator that will cycle through all values in a set order.
 
 I'll think more to see if I can come up with something creative.

Feb 13 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 I bet a special random generator could be built for O(1) memory at
 the cost of less-random sequences. I think it should be possible
 to pick parameters for a generator that will cycle through all
 values in a set order.
 
 I'll think more to see if I can come up with something creative.

A linear congruential generator (http://en.wikipedia.org/wiki/Linear_congruential_generator) with the recurrence x[n] = (x[n-1] * a + c) % m has period exactly m if:

a) c and m are relatively prime;
b) a - 1 is divisible by all prime factors of m;
c) if m is divisible by 4, then a - 1 is divisible by 4.

Now most of the time people want the period to be as long as possible, so they choose m = 2^32 and work out some "good" a and c from there. In our situation m is given (the length of the input range) and we need to compute a and c. Here's a simple method:

1. Factor m into primes. Obtain arrays p[], the prime factors, and w[], their powers. For example, say m = 125, then p[] = [ 3, 5 ] and w[] = [ 1, 2 ].
2. Replace each element w[i] with a number between 1 and w[i] inclusive.
3. Construct a1 = a - 1 as the pointwise product of w[] and p[], multiplied through (i.e., a1 = w[0]*p[0] * w[1]*p[1] * ...). If m is divisible by 4 and a1 is not, multiply a1 by 2 or 4 as needed.
4. Construct a = a1 + 1.
5. Construct c as a product of random powers of prime numbers, making sure the powers of all the primes in p[] are 0 (i.e., c is coprime to m).

Now we have a linear congruential generator that covers the range 0 .. m exactly.

Andrei
Feb 13 2009
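[To make the construction concrete, here is a tiny self-check, a sketch assuming a current std.stdio, with hand-picked constants. Taking m = 75 = 3 * 5^2 (see the correction at the end of the thread), a - 1 = 15 covers both prime factors and c = 7 is coprime to m, so the conditions above promise period exactly m:]

import std.stdio : writeln;

void main()
{
    enum m = 75, a = 16, c = 7; // a - 1 = 15 = 3 * 5; gcd(c, m) = 1
    bool[m] seen;
    uint x = 11; // any seed in 0 .. m works
    foreach (i; 0 .. m)
    {
        x = (x * a + c) % m;
        assert(!seen[x], "period shorter than m");
        seen[x] = true;
    }
    writeln("covered 0 .. ", m, " exactly once with O(1) state");
}

[The price, as anticipated upthread, is a much less random sequence: there are at most m^2 choices of (a, c), nowhere near the m! possible permutations.]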
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, Feb 13, 2009 at 9:37 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Fri, Feb 13, 2009 at 9:23 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Sean Kelly wrote:

 Skip ahead a random number of elements before performing the linear
 search.

Can you prove this will yield a uniform cover?

I don't see how it could. If you have a linear search at all, then an empty slot at the end of a run of filled elements will always have a higher probability of getting chosen. Pure dart throwing would clearly work: just keep trying randomly until you hit an empty slot. It won't perform very well, though.

If you count the total number of 1's that precede the 0 you've chosen, you can tell how biased the choice was. Say it was k/n when it should have been just 1/m (n being the total number of slots, m the number of 0's remaining). If you know this, you can systematically discard the choice and try again with probability such that the combined probability of both choices comes out to 1/m. So that would be k/n * x = 1/m, thus x = n/(k*m). So you keep the value found by linear probing with probability n/(k*m), otherwise you try again. The nice thing is that as m gets smaller, typical k's approach n, and so towards the end you get to keep more and more of your choices. So it fixes the problem with pure dart throwing, where it takes a longer and longer time to find the last few empty slots. How's that sound? --bb
Feb 12 2009
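[One wrinkle in the formula above: x = n/(k*m) can exceed 1, since it needs k >= n/m and many empty slots sit right after short runs with k below that average. A variant that stays well-defined is to accept with probability 1/k, where k is the number of slots the probe inspected: a given empty slot is landed on with probability k/n, so accepting with probability 1/k flattens every empty slot to 1/n per attempt. A minimal D sketch of that variant; pickUniform is a made-up name, and a current std.random is assumed:]

import std.random : uniform;

// Dart throw + linear probe + acceptance test. Assumes at least one
// empty (false) slot, otherwise the loop never terminates.
size_t pickUniform(const bool[] taken)
{
    while (true)
    {
        auto i = uniform(size_t(0), taken.length);
        size_t k = 1; // slots inspected, including the empty one found
        while (taken[i])
        {
            i = (i + 1) % taken.length;
            ++k;
        }
        if (uniform(size_t(0), k) == 0) // keep with probability 1/k
            return i;
    }
}

[Each attempt is accepted with overall probability m/n, summing k_i/n * 1/k_i over the m empty slots, which is exactly what makes the result uniform.]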
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, Feb 13, 2009 at 9:23 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Sean Kelly wrote:

 Skip ahead a random number of elements before performing the linear
 search.

Can you prove this will yield a uniform cover?

I don't see how it could. If you have a linear search at all, then an empty slot at the end of a run of filled elements will always have a higher probability of getting chosen. Pure dart throwing would clearly work: just keep trying randomly until you hit an empty slot. It won't perform very well, though. --bb
Feb 12 2009
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, Feb 13, 2009 at 7:21 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 Andrei Alexandrescu Wrote:

 No.  Your wording sounds like you're doing
 stuff that's way off, but the resulting math is correct.  My
 calculation would be based on the average length of a sequence of 1's
 (k/(n-k)).  That means the work is 1+k/(n-k) = n/(n-k).

Well, my wording means this: in an array of length n with k "holes" randomly distributed, the probability that a given slot is a no-hole is (n-k)/n. What we want is to find the first no-hole starting from a random position in the array. How many steps do we take on average? That is the same as the average number of rolls of a fair n-sided die until we obtain one of (n-k) marked faces. And the average number of rolls is IIRC 1/p = n/(n-k).

Except your holes aren't randomly distributed, because you are more likely to fill a hole next to an already filled slot. Not sure how badly that skews the big-oh though. Sounds like a problem you'd find the answer to in an advanced text covering hashing, because it sounds just like hashing with linear probing. --bb
Feb 12 2009
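[The hashing text in question would be Knuth (TAOCP Vol. 3, section 6.4): his classical analysis of linear probing at load factor \alpha = k/n puts the expected cost of an unsuccessful search, which is exactly one dart-plus-probe here, at roughly

    \frac{1}{2}\left(1 + \frac{1}{(1 - \alpha)^2}\right),

worse than the 1/(1 - \alpha) = n/(n-k) that the independence assumption yields, precisely because occupied slots clump into runs.]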
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 11 Feb 2009 19:14:38 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 bearophile wrote:
 Andrei Alexandrescu:

 What I see is that the std.algorithm has most of what you ask below,<

kind of algorithms into a module isn't good, it's better to move the random-related algorithms into std.random. It helps the programmer remember where to look, and avoids lumping too many mixed things into a single module.

It's there. Read The Fine Manual: http://www.digitalmars.com/d/2.0/phobos/std_random.html
 int array[100];
 auto rnd = Random(unpredictableSeed);
 auto rndElement = array[uniform(rnd, 0, 100)];

function with this interface: choice(iterable) that works with any iterable, lazy too. I know that's slow code, but in many situations you don't care and you want handy and safe functions. And where you need max speed, you don't use it.

That would only work with ranges that know their length, right? I don't know how to select a random element of an infinite range or one that has an unknown length.
 I don't like this:
 uniform(rnd, 0, 100)
 This seems better:
 rnd.uniform(0, 100)
 Then 0 can be the default lower bound:
 rnd.uniform(100)
 And for simple usages you may just want the default struct:
 uniform(100)

Well this becomes nitpicky.
 - randInt: gives a random integer in an interval, the lower bound is  
 zero by default.

 - uniform: gives a random real/float/double in a given interval


it may be better to have two different functions. (And you can optimize efficiency if you know you need an integral value as result).

I meant: double d = uniform(rnd, 0.0, 1.0); The "uniform" function infers the type of its result from its arguments.
 A normal(avg, stddev) function can be good too, where avg may be by  
 default zero again :-)
  A fastNormal(avg, stddev) function too can be useful, less precise but  
 faster.

Where's the underlying random generator?
 - shuffled: like shuffle, but creates a copy of the items. Return an  
 array.

randomShuffle(another, rnd);

into an expression. And dup works with arrays only, so you need to put into shuffled() a call to something like the array() function of my libs to have something more general.

Right. So what you're saying is that the concern of copying is separated from the concern of shuffling. So the two shouldn't be collapsed together unless there's an obvious necessity or benefit. Saving a line of code doesn't seem convincing.

I'm with bearophile here. Not because his version is shorter, but because 1) some containers/ranges might not have a dup method 2) it is potentially faster because it may avoid unnecessary data copying (relevant for large arrays). That said, I'd prefer having both versions, with names like shuffle and shuffledCopy. The same could be applied to sort - sortedCopy might be useful, too.
 - weightedChoice: like choice, but the user can specify the relative  
 probability of choosing each item.<



Well it would be good if you read the documentation, which seems increasingly clear to me you haven't.
 How does R250/521 compare with linear congruential and the Mersenne  
 Twister engine?<

- it generates better rnd values than linear congruential, and is a bit faster.
- it is faster than KISS.
- it is much, much faster than Mersenne Twister.
- it needs a few good rnd numbers to start, so they can be given by a Mersenne Twister (they may even be hard-coded).
 Anyhow, it would be great to include more generators in Phobos, so if  
 you plan to contribute some please let me know.<

there are only a few of them, that the programmer can tell apart in a simple way.

Well it looks like everyone thinks their favorite generator is the "best". As far as Phobos is concerned, the design of std.random is inspired by http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2461.pdf, which opts for defining several generators. Andrei

Feb 11 2009
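[On the "unknown length" question above: the standard one-pass answer is reservoir sampling, which keeps one candidate and replaces it with the i-th element with probability 1/i. A minimal sketch; choice here is a hypothetical name, not an existing Phobos function, and a current std.random and std.range are assumed:]

import std.random : uniform;
import std.range.primitives : ElementType, empty, front, popFront;

// Uniformly picks one element from any non-empty input range, even one
// whose length is not known up front, in a single pass.
auto choice(Range)(Range r)
{
    assert(!r.empty);
    ElementType!Range picked = r.front;
    size_t i = 1;
    for (r.popFront(); !r.empty; r.popFront())
    {
        ++i;
        if (uniform(size_t(0), i) == 0) // replace with probability 1/i
            picked = r.front;
    }
    return picked;
}

[After i elements, each of them is the current candidate with probability 1/i, so one pass suffices; a truly infinite range, of course, remains out of reach.]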
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:

On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 [about shuffling ranges]
 I'm with bearophile here. Not because his version is shorter, but  
 because
 1) some containers/ranges might not have a dup method
 2) it is potentially faster because it may avoid unnecessary data  
 copying (relevant for large arrays).

 That said, having both versions is preferred by me with names like  
 shuffle and shuffledCopy.
 The same could be applied to sort - sortedCopy might be useful, too.

becomes a sortedCopy for n = input.length. But perhaps an explicit function would drive the point home better. About shuffleCopy, you and Leonardo are both right that it could be significantly faster than copying and then shuffling the copy. The solution that I see most in the spirit of the new range design is to have a ShuffledRange that, given another range, iterates it in a random manner. An ordinary copy() call closes the deal if a copy is needed.

Wait, I take that back. I don't know of solid ways to sort into a copy or shuffle into a copy. For shuffling I'd need auxiliary memory (in addition to the copy itself), and for sorting I'd be better off copying and then sorting in place. Could anyone illuminate me about better ways? Andrei

That's why I said it /might/ be faster. It doesn't have to be; the default implementation may look as follows:

typeof(Range.first)[] shuffledCopy(Range)(Range range)
{
    return shuffle(createCopy(range));
}

However, it may be implemented quite differently if not in place. For this to work we need two primitives that I didn't find in std.range (I know the names are not good, feel free to rename them as you please):

struct RandomNumbers(Rnd = Random)
{
    // ...
}

A finite range of length 'max' that generates random numbers in the interval [0..max) without repeating. An example:

Random rnd;
auto gen = RandomNumbers(10, rnd);
foreach (n; gen) {
    writefln(n);
}

Sample output: 2 6 8 1 4 7 5 9 0 3

Sample implementation is attached. Next we need RandomItems, a finite range that accepts a random-access range and returns random elements one by one without repeating:

struct RandomItems(Range, Rnd = Random)
{
    // ...
}

It makes use of the RandomNumbers generator and is very simple (sample source attached, too). Now we can create shuffledCopy easily:

auto shuffledCopy(Range)(Range range)
{
    auto gen = RandomItems!(Range)(range);
    typeof(range[0])[] copy = new typeof(range[0])[range.length];
    foreach (ref elem; copy) {
        elem = gen.head;
        gen.next();
    }
    return copy;
}

[attachment: test.d]
Feb 12 2009
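[The trick in the attached test.d amounts to a lazy Fisher-Yates: instead of shuffling everything up front, do one swap per element produced. A simplified sketch of the same idea; this version swaps indices directly rather than storing offsets as the attachment does, and assumes a current std.random:]

import std.random : uniform;

// Yields 0 .. n-1 in uniformly random order, one swap per element.
// O(n) words of auxiliary memory, O(1) work per front/popFront.
struct RandomNumbers
{
    private size_t[] items;

    this(size_t n)
    {
        items = new size_t[n];
        foreach (i, ref v; items)
            v = i;
        pick();
    }

    @property bool empty() const { return items.length == 0; }
    @property size_t front() const { return items[0]; }

    void popFront()
    {
        items = items[1 .. $];
        pick();
    }

    private void pick()
    {
        if (items.length > 1)
        {
            auto j = uniform(size_t(0), items.length);
            auto tmp = items[0];
            items[0] = items[j];
            items[j] = tmp;
        }
    }
}

[Since each front is chosen uniformly among the values not yet produced, the whole sequence is an unbiased random permutation, exactly the property the probing cover lacks.]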
prev sibling next sibling parent Steve Schveighoffer <schveiguy yahoo.com> writes:
On Thu, 12 Feb 2009 14:41:26 +0300, Denis Koroskin wrote:

 On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Wait, I take that back. I don't know of solid ways to sort into a copy
 or shuffle into a copy. For shuffling I'd need auxiliary memory (in
 addition to the copy itself), and for sorting I'd be better off copying
 and then sorting in place. Could anyone illuminate me about better
 ways?

 Andrei

That's why I said it /might/ be faster. It doesn't have to, the default implementation may look as follows: typeof(Range.first)[] shuffledCopy(Range)(Range range) { return shuffle(createCopy(range)); } However, it may be implemented quite differently if not inplace. For this to work we need two primitives that I didn't find in std.range (I know names are not good, feel free to rename them as you please): struct RandomNumbers(Rnd = Random) { // ... } A finite range of length 'max' that generates random numbers in interval [0..max) without repeating. An example:

That's the problem. How do you do this without remembering all the numbers you have returned so far (i.e. generating an array of N integers)? I've done algorithms like this, and generally you have to mutate the source array (swapping used items with the item at the location you took from). You may save some time on swapping by building a temporary array of pointers or indexes, but you still have to build a temporary array, which is what Andrei was saying. -Steve
Feb 12 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 12 Feb 2009 15:14:34 +0300, Steve Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 12 Feb 2009 14:41:26 +0300, Denis Koroskin wrote:

 On Thu, 12 Feb 2009 07:46:31 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Wait, I take that back. I don't know of solid ways to sort into a copy
 or shuffle into a copy. For shuffling I'd need auxiliary memory (in
 addition to the copy itself), and for sorting I'd be better off copying
 and then sorting in place. Could anyone illuminate me about better
 ways?

 Andrei

That's why I said it /might/ be faster. It doesn't have to, the default implementation may look as follows: typeof(Range.first)[] shuffledCopy(Range)(Range range) { return shuffle(createCopy(range)); } However, it may be implemented quite differently if not inplace. For this to work we need two primitives that I didn't find in std.range (I know names are not good, feel free to rename them as you please): struct RandomNumbers(Rnd = Random) { // ... } A finite range of length 'max' that generates random numbers in interval [0..max) without repeating. An example:

That's the problem. How do you do this without remembering all the numbers you have returned so far (i.e. generating an array of N integers)? I've done algorithms like this, and generally you have to mutate the source array (swapping used items with the item at the location you took from). You may save some time on swapping by building a temporary array of pointers or indexes, but you still have to build a temporary array, which is what Andrei was saying. -Steve

Well, I've attached the source code, so take a look at it. I don't remember all the values returned so far, but I still use O(N) temporary storage. Copying and modifying an array of ints is often preferred to copying and modifying an array of structs (they can be big, have heavyweight assignment operators, etc). Even if shuffledCopy doesn't get into std.algorithm, RandomNumbers/RandomItems are still pretty useful ranges.
Feb 12 2009
prev sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Fri, 13 Feb 2009 15:23:07 +0100, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 1. Factor m into primes. Obtain arrays p[], the prime factors, and w[],  
 their powers. For example, say m = 125, then p[] = [ 3, 5 ] and w[] = [  
 1, 2 ].

You meant for m = 75, right? Otherwise, p[] = [ 5 ] and w[] = [ 3 ]. -- Simen
Feb 13 2009
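[For the record, a quick trial-division check of the two factorizations in question; a sketch assuming a current std.stdio:]

import std.stdio : writefln;

// Prints the prime factors p[] and their powers w[] of m.
void factor(int m)
{
    int[] p, w;
    for (int d = 2; d * d <= m; ++d)
    {
        if (m % d != 0)
            continue;
        int e = 0;
        while (m % d == 0)
        {
            m /= d;
            ++e;
        }
        p ~= d;
        w ~= e;
    }
    if (m > 1)
    {
        p ~= m;
        w ~= 1;
    }
    writefln("p = %s, w = %s", p, w);
}

void main()
{
    factor(75);  // p = [3, 5], w = [1, 2]
    factor(125); // p = [5], w = [3]
}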