Welcome to Web-News
A Web-based News Reader
Subject Re: CORRECTION: random k-sample of a file
From Russell Lewis <webmaster@villagersonline.com>
Date Thu, 09 Oct 2008 16:11:41 -0700
Newsgroups digitalmars.D

Sorry, I overlooked the (obvious) fact that this biases things towards
the end of the file.  Change the while loop as follows:

BEGIN CODE
   int count = k;
   while(!feof(stdin))
   {
     if(random(0,count) >= k)
       ReadALine();   // this discards the line
     else
     {
       int i = random(0,k);
       bufs[i] = ReadALine();
     }
     count++;
   }
END CODE

Note that I'm assuming that random(a,b) is a function that randomly
returns an integer in the range [a,b).

This new version works because the most-recently-read line has a k/count
  chance of getting picked up into the current working set, and when we
do choose to discard some line, we give all of the old lines an equal
chance of survival.

I don't have to to formally work out the math right now, but I'm pretty
sure that that would give each line in the entire file a k/count chance
of being in the set at any given point in the process.

Russ

Russell Lewis wrote:
> BEGIN CODE
> char[][] choose(int k)
> {
>   char[][] bufs; bufs.length = k;
>
>   for(int i=0; i<k; i++)
>     bufs[i] = ReadALine();
>
>   while(!feof(stdin))
>   {
>     int i = random(0,k);
>     bufs[i] = ReadALine();
>   }
>
>   return bufs;
> }
>
> END CODE
>
> Andrei Alexandrescu wrote:
>> I just ran across a nice little problem that I wanted to share as an
>> exercise for the interested in futile pastimes.
>>
>> The challenge, should you accept it, is to write a program that given
>> a number k and a file, outputs k lines picked uniformly at random from
>> the file. (If the file has less than k lines, it should all be
>> output.) The file's size is unbounded so loading it in memory is not
>> an option. How'd you go about it?
>>
>>
>> Andrei

Recent messages in this thread
 
-# random k-sample of a file Andrei Alexandrescu 09-Oct-2008 02:56 pm
.-# Re: random k-sample of a file dsimcha 09-Oct-2008 03:09 pm
.|\# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 03:30 pm
.-# Re: random k-sample of a file Walter Bright 09-Oct-2008 03:11 pm
.|-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 03:14 pm
.||-# Re: random k-sample of a file Steven Schveighoffer 09-Oct-2008 03:57 pm
.|||\# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 04:04 pm
.||-# Re: random k-sample of a file Walter Bright 09-Oct-2008 07:29 pm
.||.-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 07:44 pm
.||..\# Re: random k-sample of a file Walter Bright 09-Oct-2008 07:51 pm
.|-# Re: random k-sample of a file Ary Borenszweig 09-Oct-2008 03:16 pm
.|.-# Re: random k-sample of a file Denis Koroskin 09-Oct-2008 03:19 pm
.|.||# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 03:21 pm
.|.|\# Re: random k-sample of a file Ary Borenszweig 09-Oct-2008 03:22 pm
.|.\# Re: random k-sample of a file KennyTM~ 09-Oct-2008 03:21 pm
.-# Re: random k-sample of a file bearophile 09-Oct-2008 03:19 pm
.|-# Re: random k-sample of a file bearophile 09-Oct-2008 03:29 pm
.|.-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 03:44 pm
.|.|\# Re: random k-sample of a file bearophile 09-Oct-2008 04:19 pm
.|.-# Re: random k-sample of a file bearophile 09-Oct-2008 03:56 pm
.|..-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 04:30 pm
.|...-# Re: random k-sample of a file Ary Borenszweig 09-Oct-2008 05:19 pm
.|....-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 05:59 pm
.|.....-# Re: random k-sample of a file Ary Borenszweig 09-Oct-2008 06:04 pm
.|......-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 06:23 pm
.|......|-# Re: random k-sample of a file Ary Borenszweig 10-Oct-2008 09:48 am
.|......|.-# Re: random k-sample of a file Andrei Alexandrescu 10-Oct-2008 09:49 am
.|......|..|# Re: random k-sample of a file Ary Borenszweig 10-Oct-2008 10:17 am
.|......|..-# Re: random k-sample of a file bearophile 10-Oct-2008 10:55 am
.|......|...|# Re: random k-sample of a file Andrei Alexandrescu 10-Oct-2008 11:09 am
.|......|...\# Re: random k-sample of a file Sergey Gromov 10-Oct-2008 11:25 am
.|......-# Re: random k-sample of a file Kirk McDonald 09-Oct-2008 06:29 pm
.|.......\# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 06:35 pm
.|# Re: random k-sample of a file Jason House 09-Oct-2008 03:55 pm
.-# Re: random k-sample of a file Kirk McDonald 09-Oct-2008 04:20 pm
.|-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 04:33 pm
.||-# Re: random k-sample of a file Kirk McDonald 09-Oct-2008 04:41 pm
.|||\# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 04:55 pm
.||-# Re: random k-sample of a file bearophile 09-Oct-2008 05:02 pm
.||.-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 05:07 pm
.||..-# Re: random k-sample of a file Christopher Wright 10-Oct-2008 08:12 am
.||...\# Re: random k-sample of a file Andrei Alexandrescu 10-Oct-2008 08:32 am
.|-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 04:34 pm
.|.-# Re: random k-sample of a file Kirk McDonald 09-Oct-2008 04:47 pm
.|..\# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 04:57 pm
.-# Re: random k-sample of a file BCS 09-Oct-2008 05:13 pm
.|\# Re: random k-sample of a file Jason House 09-Oct-2008 05:46 pm
.-# Re: random k-sample of a file Russell Lewis 09-Oct-2008 06:58 pm
.||# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 07:01 pm
.|-# Re: CORRECTION: random k-sample of a file (Current message) Russell Lewis 09-Oct-2008 07:11 pm
.|.\# Re: CORRECTION: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 07:16 pm
.-# Re: random k-sample of a file Christopher Wright 09-Oct-2008 08:33 pm
.|-# Re: random k-sample of a file Carlos 09-Oct-2008 09:12 pm
.|.-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 09:21 pm
.|..\# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 09:25 pm
.-# Re: random k-sample of a file Sergey Gromov 09-Oct-2008 09:58 pm
.|-# Re: random k-sample of a file Andrei Alexandrescu 09-Oct-2008 11:28 pm
.|.-# Re: random k-sample of a file Jarrett Billingsley 10-Oct-2008 12:27 am
.|.||# Re: random k-sample of a file Andrei Alexandrescu 10-Oct-2008 12:35 am
.|.|-# Re: random k-sample of a file Denis Koroskin 10-Oct-2008 05:55 am
.|.|.-# Re: random k-sample of a file Jarrett Billingsley 10-Oct-2008 08:34 am
.|.|..-# Re: random k-sample of a file Andrei Alexandrescu 10-Oct-2008 08:53 am
.|.|...-# Re: random k-sample of a file Denis Koroskin 10-Oct-2008 09:40 am
.|.|....\# Re: random k-sample of a file Andrei Alexandrescu 10-Oct-2008 10:06 am
.|.-# Re: random k-sample of a file Sergey Gromov 10-Oct-2008 08:11 am
.|..\# Re: random k-sample of a file bearophile 10-Oct-2008 08:59 am
.\# Re: random k-sample of a file Michel Fortin 10-Oct-2008 07:50 am