digitalmars.D.learn - automate tuple creation

forkit (46/46) Jan 19 2022 so I have this code below, that creates an array of tuples.

forkit (2/2) Jan 19 2022 On Wednesday, 19 January 2022 at 21:59:15 UTC, forkit wrote:
H. S. Teoh (23/34) Jan 19 2022 Why can't you just use a loop to initialize it?
Ali Çehreli (24/27) Jan 19 2022 That works but would be unnecessarily slow and be against the idea of

Ali Çehreli (6/10) Jan 19 2022 But that's a mistake: If rnd is thread-local like that, it should be

forkit (38/38) Jan 19 2022 On Wednesday, 19 January 2022 at 22:35:58 UTC, Ali Çehreli wrote:

forkit (5/5) Jan 19 2022 On Wednesday, 19 January 2022 at 23:22:17 UTC, forkit wrote:

H. S. Teoh (21/23) Jan 19 2022 Premature optimization. ;-) There's nothing wrong with allocating an

Ali Çehreli (7/14) Jan 19 2022 Not in this case because I am pointing at premature pessimization. :)

forkit (12/12) Jan 19 2022 On Wednesday, 19 January 2022 at 21:59:15 UTC, forkit wrote:

H. S. Teoh (12/18) Jan 19 2022 Do the id's have to be unique? If not, std.random.uniform() would do

forkit (76/77) Jan 19 2022 yep...

forkit (27/31) Jan 19 2022 arrg!

forkit (73/73) Jan 19 2022 On Thursday, 20 January 2022 at 04:38:39 UTC, forkit wrote:

bauss (5/9) Jan 20 2022 Don't make them random then, but use an incrementor.

forkit (104/108) Jan 20 2022 The 'uniqueness' of id would actually be created in the database.

Stanislav Blinov (29/53) Jan 20 2022 Allocating 4 megs to generate 10 numbers??? You can generate a

forkit (27/39) Jan 20 2022 Nice. Thanks. I had to compromise a little though, as assumUnique

forkit (15/15) Jan 20 2022 On Thursday, 20 January 2022 at 21:16:46 UTC, forkit wrote:

Steven Schveighoffer (13/20) Jan 20 2022 Because it would allow altering const data.

forkit (17/18) Jan 20 2022 I'm not sure I understand. At what point in this function is

Ali Çehreli (18/32) Jan 20 2022 If that were allowed, you could mutate elements of record and would

Ali Çehreli (7/9) Jan 20 2022 As H. S. Teoh would add at this point, that is not idiomatic but the

forkit (147/147) Jan 20 2022 On Thursday, 20 January 2022 at 23:49:59 UTC, Ali Çehreli wrote:

forkit (7/7) Jan 20 2022 On Friday, 21 January 2022 at 01:35:40 UTC, forkit wrote:
Ali Çehreli (36/51) Jan 20 2022 Does that make just the following definition @safe or the entire module

forkit (11/26) Jan 20 2022 Oh. this was intentional, as I wanted to write once, and only

forkit (6/12) Jan 20 2022 oops. looking back at that code, it seems I didn't write what i

H. S. Teoh (7/9) Jan 20 2022 [...]

forkit (27/29) Jan 20 2022 :-)

forkit (6/31) Jan 20 2022 actually something not right with Appender I think...

Stanislav Blinov (4/6) Jan 21 2022 You're using writeln, which goes through C I/O buffered writes.

forkit (32/32) Jan 21 2022 On Friday, 21 January 2022 at 08:53:26 UTC, Stanislav Blinov

forkit (137/137) Jan 21 2022 On Friday, 21 January 2022 at 09:10:56 UTC, forkit wrote:

H. S. Teoh (16/21) Jan 21 2022 Actually you don't even need to do this, unless you want precise control

Steven Schveighoffer (7/19) Jan 21 2022 Yeah, iota is a random-access range, so you can just pass it directly,

forkit (9/15) Jan 21 2022 thanks. that makes more sense actually ;-)

forkit (18/18) Jan 21 2022 On Friday, 21 January 2022 at 21:01:11 UTC, forkit wrote:

forkit (19/19) Jan 21 2022 On Friday, 21 January 2022 at 21:43:38 UTC, forkit wrote:
H. S. Teoh (8/27) Jan 21 2022 What's the point of calling .dup here? The only reference to records is

forkit (99/103) Jan 21 2022 good pickup. thanks ;-)

forkit (22/22) Jan 21 2022 On Friday, 21 January 2022 at 22:25:32 UTC, forkit wrote:

Steven Schveighoffer (16/43) Jan 21 2022 oof! use enums for compile-time strings ;)

forkit (77/77) Jan 21 2022 On Saturday, 22 January 2022 at 01:33:16 UTC, Steven
forkit (32/37) Jan 22 2022 It just worked, so i didn't think about it too much.. but it

forkit (8/15) Jan 21 2022 yes, I was thinking this over as I was waking up this morning,

H. S. Teoh (22/32) Jan 21 2022 [...]

Steven Schveighoffer (24/31) Jan 20 2022 The compiler rules aren't enforced based on what code you wrote, it

forkit <forkit gmail.com> writes:

so I have this code below, that creates an array of tuples.

but instead of hardcoding 5 tuples (or hardcoding any amount of 
tuples),
what I really want to do is automate the creation of 
how-ever-many tuples I ask for:

i.e.

instead of calling this: createBoolMatrix(mArrBool);
I would call something like this: createBoolMatrix(mArrBool,5); 
// create an array of 5 tuples.

Some ideas about direction would be welcome ;-)


// ---
module test;

import std.stdio;
import std.range;
import std.traits;
import std.random;

@safe:

void main()
{
     uint[][] mArrBool;
     createBoolMatrix(mArrBool);
     process(mArrBool);
}

void process(T)(const ref T t) if (isForwardRange!T && 
!isInfinite!T)
{
     // sample output -> [[0, 1], [1, 0], [1, 1], [1, 1], [1, 1]]
     t.writeln;
}

void createBoolMatrix(ref uint[][] m)
{
     auto rnd = Random(unpredictableSeed);

     // btw. below does register with -profile=gc
     m = [ [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd),
           [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd),
           [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd),
           [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd),
           [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd)
         ];
}
// --

Jan 19 2022

forkit <forkit gmail.com> writes:

On Wednesday, 19 January 2022 at 21:59:15 UTC, forkit wrote:

oh. that randomShuffle was unnecessary ;-)

Jan 19 2022

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jan 19, 2022 at 09:59:15PM +0000, forkit via Digitalmars-d-learn wrote:
 so I have this code below, that creates an array of tuples.
 
 but instead of hardcoding 5 tuples (or hardcoding any amount of
 tuples), what I really want to do is automate the creation of
 how-ever-many tuples I ask for:
 
 i.e.
 
 instead of calling this: createBoolMatrix(mArrBool);
 I would call something like this: createBoolMatrix(mArrBool,5); //
 create an array of 5 tuples.

Why can't you just use a loop to initialize it?

	uint[][] createBoolMatrix(size_t n) {
		auto result = new uint[][n]; // allocate outer array
		foreach (ref row; result) {
			row = new uint[n]; // allocate inner array
			foreach (ref cell; row) {
				cell = cast(uint) rnd.dice(0.6, 1.4);
			}
		}
		return result;
	}

Or, if you wanna use those new-fangled range-based idioms:

	uint[][] createBoolMatrix(size_t n) {
		return iota(n)
			.map!(i => iota(n)
				.map!(j => cast(uint) rnd.dice(0.6, 1.4))
				.array)
			.array;
	}


T

-- 
Verbing weirds language. -- Calvin (& Hobbes)

Jan 19 2022

Ali Çehreli <acehreli yahoo.com> writes:

On 1/19/22 13:59, forkit wrote:

 void createBoolMatrix(ref uint[][] m)
 {
      auto rnd = Random(unpredictableSeed);

That works but would be unnecessarily slow and be against the idea of 
random number generators. The usual approach is, once you have a 
randomized sequence, you just continue using it. For example, I move rnd 
to module scope and initialize it once.

Random rnd;

shared static this() {
   rnd = Random(unpredictableSeed);
}

auto randomValue() {
   return cast(uint)rnd.dice(0.6, 1.4);
}

// Returning a dynamically allocated array looks expensive
// here. Why not use a struct or std.typecons.Tuple instead?
auto randomTuple() {
   return [ randomValue(), randomValue() ];
}

void createBoolMatrix(ref uint[][] m, size_t count)
{
   import std.algorithm : map;
   import std.range : iota;
   m = count.iota.map!(i => randomTuple()).array;
}

Ali

Jan 19 2022

Ali Çehreli <acehreli yahoo.com> writes:

On 1/19/22 14:33, Ali Çehreli wrote:

 Random rnd;

 shared static this() {
    rnd = Random(unpredictableSeed);
 }

But that's a mistake: If rnd is thread-local like that, it should be 
initialized in a 'static this' (not 'shared static this'). Otherwise, 
only the main thread's 'rnd' would be randomized, which is the only 
thread that executes 'shared static this' blocks.
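
A minimal sketch of the corrected version, assuming the module-level `rnd` from the earlier post; `nextBit` is a hypothetical helper added only for illustration:

```d
import std.random : Random, dice, unpredictableSeed;

// Module-level variables are thread-local by default in D,
// so every thread gets its own copy of rnd.
Random rnd;

// 'static this' runs once per thread (unlike 'shared static this',
// which runs once per program), so each thread seeds its own copy.
static this()
{
    rnd = Random(unpredictableSeed);
}

// Hypothetical helper, not from the thread: returns 0 or 1.
uint nextBit()
{
    return cast(uint) rnd.dice(0.6, 1.4);
}

void main()
{
    import std.stdio : writeln;
    writeln(nextBit());
}
```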

Ali

Jan 19 2022

forkit <forkit gmail.com> writes:

On Wednesday, 19 January 2022 at 22:35:58 UTC, Ali Çehreli wrote:

so I combined ideas from all responses:

// --
module test;

import std.stdio : writeln;
import std.range : iota, isForwardRange, hasSlicing, hasLength, 
isInfinite, array;
import std.random : Random, unpredictableSeed, dice;
import std.algorithm : map;

@safe:

Random rnd;

static this()
{
   rnd = Random(unpredictableSeed);
}

void main()
{
     uint[][] mArrBool;

     // e.g: create a matrix consisting of 5 tuples, with each
     // tuple containing 3 random bools (0 or 1)
     createBoolMatrix(mArrBool,5, 2);

     process(mArrBool);
}

void createBoolMatrix(ref uint[][] m, size_t numberOfTuples, 
size_t numberOfBoolsInTuple)
{
     m = iota(numberOfTuples)
             .map!(i => iota(numberOfBoolsInTuple)
             .map!(numberOfBoolsInTuple => cast(uint) 
rnd.dice(0.6, 1.4))
			.array).array;
}

void process(T)(const ref T t) if (isForwardRange!T && 
hasSlicing!T && hasLength!T && !isInfinite!T)
{
     t.writeln;
}

//--

Jan 19 2022

forkit <forkit gmail.com> writes:

On Wednesday, 19 January 2022 at 23:22:17 UTC, forkit wrote:

oops

     // e.g: create a matrix consisting of 5 tuples, with each tuple
     // containing 3 random bools (0 or 1)
     createBoolMatrix(mArrBool,5, 3);

Jan 19 2022

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jan 19, 2022 at 02:33:02PM -0800, Ali Çehreli via Digitalmars-d-learn
wrote:
[...]
 // Returning a dynamically allocated array looks expensive
 // here. Why not use a struct or std.typecons.Tuple instead?

Premature optimization. ;-)  There's nothing wrong with allocating an
array.

If you're worried about memory efficiency, you could allocate the entire
matrix in a single block and just assemble slices of it in the outer
block, like this:

	uint[][] createBoolMatrix(size_t count) {
		auto buffer = new uint[count*count];
		return iota(count).map!(i => buffer[count*i .. count*(i+1)])
			.array;
	}

This lets you do only 2 GC allocations instead of (1+count) GC
allocations.  May help with memory fragmentation if `count` is large and
you create a lot of these things.  But I honestly wouldn't bother with
this unless your memory profiler is reporting a problem in this aspect
of your program.  It just adds complexity to your code (== poorer
long-term maintainability) for meager benefits.


T

-- 
What do you call optometrist jokes? Vitreous humor.

Jan 19 2022

Ali Çehreli <acehreli yahoo.com> writes:

On 1/19/22 15:21, H. S. Teoh wrote:
 On Wed, Jan 19, 2022 at 02:33:02PM -0800, Ali Çehreli via
 Digitalmars-d-learn wrote:
 [...]
 // Returning a dynamically allocated array looks expensive
 // here. Why not use a struct or std.typecons.Tuple instead?

 Premature optimization. ;-)

Not in this case because I am pointing at premature pessimization. :) 
There is no reason to use two-element dynamic arrays when uint[2], 
Tuple!(uint, uint), and structs are available.
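
A rough sketch of the fixed-size alternative, using `uint[2]` rows so that the whole matrix lives in one allocation. The function name `createPairs` and passing the generator by `ref` are assumptions made for this example:

```d
import std.random : Random, dice, unpredictableSeed;

// Hypothetical helper: each row is a uint[2] value stored inline in the
// outer array, so only the outer array is GC-allocated.
uint[2][] createPairs(size_t count, ref Random rnd)
{
    auto result = new uint[2][](count);
    foreach (ref pair; result)
        foreach (ref cell; pair)
            cell = cast(uint) rnd.dice(0.6, 1.4);
    return result;
}

void main()
{
    import std.stdio : writeln;
    auto rnd = Random(unpredictableSeed);
    writeln(createPairs(3, rnd));
}
```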

 There's nothing wrong with allocating an
 array.

Agreed.

Ali

Jan 19 2022

forkit <forkit gmail.com> writes:

On Wednesday, 19 January 2022 at 21:59:15 UTC, forkit wrote:

so at the moment i can get a set number of tuples, with a set 
number of bool values contained within each tuple.

e.g.
createBoolMatrix(mArrBool,3, 2);
[[1, 0], [1, 1], [1, 0]]

my next challenge (more for myself, but happy for input)..

is to enhance this to return an associative array:

e.g

createBoolAssociativeMatrix(mArrBool,3, 2);

[ [1000:[1, 0]], [1001:[1, 1]], [1001:[1, 0]]]


where 1000 is some random id...

Jan 19 2022

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Jan 20, 2022 at 12:12:56AM +0000, forkit via Digitalmars-d-learn wrote:
[...]
 createBoolAssociativeMatrix(mArrBool,3, 2);
 
 [ [1000:[1, 0]], [1001:[1, 1]], [1001:[1, 0]]]
 
 
 where 1000 is some random id...

Do the id's have to be unique?  If not, std.random.uniform() would do
the job.

If they have to be unique, you can either use a sequential global
counter (a 64-bit counter will suffice -- you won't exhaust it for at
least 60+ years of bumping the counter once per CPU tick at 8.4 GHz), or
use an AA of ids already generated and just call uniform() to generate a
new one until it doesn't collide anymore.
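
A minimal sketch of the second option (an AA of ids already generated, retrying `uniform()` on a collision). The function name, its parameters, and the use of a `bool[uint]` AA as a set are assumptions for illustration:

```d
import std.random : Random, uniform, unpredictableSeed;

// Hypothetical helper: returns `count` distinct ids drawn from [lo, hi).
uint[] makeUniqueIDs(size_t count, uint lo, uint hi, ref Random rnd)
{
    assert(count <= hi - lo, "range too small for the requested count");

    bool[uint] seen;   // AA used as a set of ids handed out so far
    uint[] ids;
    ids.reserve(count);

    while (ids.length < count)
    {
        immutable id = uniform(lo, hi, rnd);  // hi is exclusive
        if (id !in seen)                      // retry on a collision
        {
            seen[id] = true;
            ids ~= id;
        }
    }
    return ids;
}

void main()
{
    import std.stdio : writeln;
    auto rnd = Random(unpredictableSeed);
    writeln(makeUniqueIDs(5, 999_000_000, 1_000_000_000, rnd));
}
```

The AA lookup keeps each uniqueness check roughly constant-time, which matters only when many ids are requested.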


T

-- 
A mathematician learns more and more about less and less, until he knows
everything about nothing; whereas a philosopher learns less and less about more
and more, until he knows nothing about everything.

Jan 19 2022

forkit <forkit gmail.com> writes:

On Thursday, 20 January 2022 at 00:30:44 UTC, H. S. Teoh wrote:
 Do the id's have to be unique?

yep...

I'm almost there ;-)

// ---
module test;

import std.stdio : writeln;
import std.range : iota, isForwardRange, hasSlicing, hasLength, 
isInfinite;
import std.array : array, Appender;
import std.random : Random, unpredictableSeed, dice, choice;
import std.algorithm : map, uniq;

@safe:

Random rnd;

static this()
{
   rnd = Random(unpredictableSeed);
}

void main()
{
     int recordsNeeded = 5;

     uint[] uniqueIDs;
     makeUniqueIDs(uniqueIDs, recordsNeeded);
     writeln(uniqueIDs);

     uint[][] mArrBool;

     // e.g: create a matrix consisting of 5 tuples,
     // with each tuple containing 3 random bools (0 or 1)
     createBoolMatrix(mArrBool,recordsNeeded, 3);

     // process just writelns its argument at the moment
     // sample output: [[1, 1, 1], [0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 1, 0]]
     process(mArrBool);

     // to do (integrate a single value taken from uniqueIDs so
     // that each tuple looks like this: [999575454:[1, 1, 1]])
     // e.g.
     // processRecords(records);
     // output from above should look like this below:
     // [ [999575454:[1, 1, 1]], [999704246:[0, 0, 1]],
     //   [999969331:[1, 1, 1]], [999678591:[1, 1, 1]],
     //   [999691754:[1, 1, 0]] ]

}

void createBoolMatrix(ref uint[][] m, size_t numberOfTuples, 
size_t numberOfBoolsInTuple)
{
     m = iota(numberOfTuples)
             .map!(i => iota(numberOfBoolsInTuple)
             .map!(numberOfBoolsInTuple => cast(uint) 
rnd.dice(0.6, 1.4))
			.array).array;
}

void process(T)(const ref T t) if (isForwardRange!T && 
hasSlicing!T && hasLength!T && !isInfinite!T)
{
     t.writeln;
}

void processRecords(T)(const ref T t) if (isForwardRange!T && 
hasSlicing!T && hasLength!T && !isInfinite!T)
{
     t.writeln;
}


void makeUniqueIDs(ref uint[] arr, size_t sz)
{
     // id needs to be 9 digits, and needs to start with 999
     // can produce a max of 1_000_000 records.
     int[] a = iota(999_000_000, 1_000_000_000).array;

     Appender!(uint[]) appndr;
     // pre-allocate space to avoid costly reallocations
     appndr.reserve(sz+1);

     foreach(value; 1..(sz + 1))
         appndr ~= cast(uint)a.choice(rnd);

     // just interesting to see how often this asserts.
     //assert(appndr[].array == appndr[].uniq.array);

     arr = appndr[].uniq.array;

     // function should not return if this asserts (i.e. app will exit)
     assert(arr[].array == arr[].uniq.array);
}
// ---
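
One thing worth knowing when experimenting with this version: `uniq` only drops adjacent duplicates, so on unsorted input it can leave repeats in place. A small sketch with made-up values (`distinct` is a hypothetical helper, not from the thread):

```d
import std.algorithm : sort, uniq;
import std.array : array;

// Hypothetical helper: sorts a copy first so that uniq sees
// equal values side by side.
int[] distinct(const int[] values)
{
    return values.dup.sort().uniq.array;
}

void main()
{
    int[] ids = [3, 1, 3, 2, 1];

    // No two equal values are adjacent, so uniq removes nothing here.
    assert(ids.uniq.array == [3, 1, 3, 2, 1]);

    // After sorting, the duplicates are adjacent and get removed.
    assert(distinct(ids) == [1, 2, 3]);
}
```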

Jan 19 2022

forkit <forkit gmail.com> writes:

On Thursday, 20 January 2022 at 04:00:59 UTC, forkit wrote:
 void makeUniqueIDs(ref uint[] arr, size_t sz)
 {
   ...
 }

arrg!

what was i thinking! ;-)

// ---
void makeUniqueIDs(ref uint[] arr, size_t sz)
{
     arr.reserve(sz);

     // id needs to be 9 digits, and needs to start with 999
     int[] a = iota(999_000_000, 1_000_000_000).array;
     // above will contain 1_000_000 records that we can choose from.

     int i = 0;
     uint x;
     while(i != sz)
     {
        x = cast(uint)a.choice(rnd);

        // ensure every id added is unique; on a collision, just retry.
        if (!arr.canFind(x))
        {
            arr ~= x;
            i++;
        }
     }
}


//------

Jan 19 2022

forkit <forkit gmail.com> writes:

On Thursday, 20 January 2022 at 04:38:39 UTC, forkit wrote:

all done ;-)

// ---

module test;

import std.stdio : writeln;
import std.range : iota, isForwardRange, hasSlicing, hasLength, 
isInfinite;
import std.array : array, Appender;
import std.random : Random, unpredictableSeed, dice, choice;
import std.algorithm : map, uniq, canFind;

@safe:

Random rnd;

static this()
{
   rnd = Random(unpredictableSeed);
}

void main()
{
     int recordsNeeded = 2;
     int boolValuesNeeded = 3;

     uint[] uniqueIDs;
     makeUniqueIDs(uniqueIDs, recordsNeeded);

     uint[][] tuples;
     createBoolMatrix(tuples, recordsNeeded, boolValuesNeeded);

     uint[][uint][] records = CreateTupleDictionary(uniqueIDs, 
tuples);
     processRecords(records);

}

auto CreateTupleDictionary(ref uint[] ids, ref uint[][] tuples)
{
     uint[][uint][] records;

     foreach(i, id; ids)
         records ~= [ ids[i] : tuples[i] ];

     return records.dup;
}

void processRecords(T)(const ref T t) if (isForwardRange!T && 
hasSlicing!T && hasLength!T && !isInfinite!T)
{
     t.writeln;

     // output from above should look like this:
     // [[999583661:[1, 1, 0]], [999273256:[1, 1, 1]]]

     // hoping to explore parallel here too...
}

void createBoolMatrix(ref uint[][] m, size_t numberOfTuples, 
size_t numberOfBoolsInTuple)
{
     m = iota(numberOfTuples)
             .map!(i => iota(numberOfBoolsInTuple)
             .map!(numberOfBoolsInTuple => cast(uint) 
rnd.dice(0.6, 1.4))
             .array).array;
}


void makeUniqueIDs(ref uint[] arr, size_t sz)
{
     arr.reserve(sz);

     // id needs to be 9 digits, and needs to start with 999
     int[] a = iota(999_000_000, 1_000_000_000).array;
     // above will contain 1_000_000 records that we can choose from.

     int i = 0;
     uint x;
     while(i != sz)
     {
        x = cast(uint)a.choice(rnd);

        // ensure every id added is unique.
        if (!arr.canFind(x))
        {
            arr ~= x;
            i++;
        }
     }
}

// ---

Jan 19 2022

bauss <jj_1337 live.dk> writes:

On Thursday, 20 January 2022 at 04:00:59 UTC, forkit wrote:
 On Thursday, 20 January 2022 at 00:30:44 UTC, H. S. Teoh wrote:
 Do the id's have to be unique?

 yep...

Don't make them random then, but use an incrementor.

If you can have ids that aren't integers then you could use uuids 
too.

https://dlang.org/phobos/std_uuid.html
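
Both suggestions can be sketched briefly. The starting value 999_000_000 comes from the thread's "9 digits, starting with 999" requirement; `sequentialIDs` is a hypothetical helper name:

```d
import std.array : array;
import std.range : iota;
import std.uuid : randomUUID;

// Hypothetical helper: `count` consecutive ids starting at `first`.
// Consecutive values are unique by construction, so no check is needed.
int[] sequentialIDs(int first, int count)
{
    return iota(first, first + count).array;
}

void main()
{
    import std.stdio : writeln;

    writeln(sequentialIDs(999_000_000, 3)); // [999000000, 999000001, 999000002]

    // Non-integer alternative: a random UUID, different on every run.
    writeln(randomUUID().toString());
}
```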

Jan 20 2022

forkit <forkit gmail.com> writes:

On Thursday, 20 January 2022 at 10:11:10 UTC, bauss wrote:

 Don't make them random then, but use an incrementor.

 If you can have ids that aren't integers then you could use 
 uuids too.

 https://dlang.org/phobos/std_uuid.html

The 'uniqueness' of id would actually be created in the database.

I'm just creating a dataset to simulate an export.

I'm pretty much done, just wish -profile=gc was working in 
createUniqueIDArray(..)

// ---------------

module test;
@safe:

import std.stdio : write, writef, writeln, writefln;
import std.range : iota, isForwardRange, hasSlicing, hasLength, 
isInfinite;
import std.array : array, byPair;
import std.random : Random, unpredictableSeed, dice, choice;
import std.algorithm : map, uniq, canFind;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); }

void main()
{
     const int recordsNeeded = 10;
     const int valuesPerRecord = 8;

     int[] idArray;
     createUniqueIDArray(idArray, recordsNeeded);

     int[][] valuesArray;
     createValuesArray(valuesArray, recordsNeeded, 
valuesPerRecord);

     int[][int][] records = CreateDataSet(idArray, valuesArray, 
recordsNeeded);
     ProcessRecords(records);
}

void ProcessRecords(ref const(int[][int][]) recArray)
{
     void processRecord(ref int id, ref const(int)[] result)
     {
         writef("%s\t%s", id, result);
     }

     foreach(ref record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
         writeln;
     }
}

int[][int][] CreateDataSet(ref int[] idArray, ref int[][] 
valuesArray, int numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);
     debug { writefln("records.capacity is %s", records.capacity); 
}

     foreach(i, id; idArray)
         // NOTE: below does register with -profile=gc
         records ~= [ idArray[i] : valuesArray[i] ];

     return records.dup;
}

void createValuesArray(ref int[][] m, size_t recordsNeeded, 
size_t valuesPerRecord)
{
     m = iota(recordsNeeded)
             .map!(i => iota(valuesPerRecord)
             .map!(valuesPerRecord => cast(int)rnd.dice(0.6, 1.4))
             .array).array;  // NOTE: does register with -profile=gc
}


void createUniqueIDArray(ref int[] idArray, int recordsNeeded)
{
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", idArray.capacity); 
}

     // id needs to be 9 digits, and needs to start with 999
     // below will contain 1_000_000 records that we can choose from.
     // NOTE: does NOT register with -profile=gc
     int[] ids = iota(999_000_000, 1_000_000_000).array;

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        x = ids.choice(rnd);

        // ensure every id added is unique.
        if (!idArray.canFind(x))
        {
            idArray ~= x; // NOTE: does NOT register with -profile=gc
            i++;
        }
     }
}

/+
sample output:

999623777	[0, 0, 1, 1, 1, 0, 0, 0]
999017078	[1, 0, 1, 1, 1, 1, 1, 1]
999269073	[1, 1, 0, 0, 1, 1, 0, 1]
999408504	[0, 1, 1, 1, 1, 1, 0, 0]
999752314	[1, 0, 0, 1, 1, 1, 1, 0]
999660730	[0, 1, 0, 0, 1, 1, 1, 1]
999709822	[1, 1, 1, 0, 1, 1, 0, 0]
999642248	[1, 1, 1, 0, 0, 1, 1, 0]
999533069	[1, 1, 1, 0, 0, 0, 0, 0]
999661591	[1, 1, 1, 1, 1, 0, 1, 1]

+/

// ---------------

Jan 20 2022

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Thursday, 20 January 2022 at 12:15:56 UTC, forkit wrote:

 void createUniqueIDArray(ref int[] idArray, int recordsNeeded)
 {
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", 
 idArray.capacity); }

     // id needs to be 9 digits, and needs to start with 999
     // below will contain 1_000_000 records that we can choose 
 from.
     int[] ids = iota(999_000_000, 1_000_000_000).array; // 
 NOTE: does NOT register with -profile=gc

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        x = ids.choice(rnd);

        // ensure every id added is unique.
        if (!idArray.canFind(x))
        {
            idArray ~= x; // NOTE: does NOT register with 
 -profile=gc
            i++;
        }
     }
 }

Allocating 4 megs to generate 10 numbers??? You can generate a 
random number between 999000000 and 1000000000.

```
immutable(int)[] createUniqueIDArray(int recordsNeeded)
{
     import std.random;
     import std.algorithm.searching : canFind;
     int[] result = new int[recordsNeeded];

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
         // id needs to be 9 digits, and needs to start with 999
        x = uniform(999*10^^6, 10^^9);

        // ensure every id added is unique.
        if (!result[0 .. i].canFind(x))
            result[i++] = x;
     }
     import std.exception : assumeUnique;
     return result.assumeUnique;
}

void main()
{
     import std.stdio;
     createUniqueIDArray(10).writeln;
}
```

Only one allocation, and it would be tracked with -profile=gc...

Jan 20 2022

forkit <forkit gmail.com> writes:

On Thursday, 20 January 2022 at 12:40:09 UTC, Stanislav Blinov 
wrote:
 Allocating 4 megs to generate 10 numbers??? You can generate a 
 random number between 999000000 and 1000000000.

 ...
         // id needs to be 9 digits, and needs to start with 999
        x = uniform(999*10^^6, 10^^9);

        // ensure every id added is unique.
        if (!result[0 .. i].canFind(x))
            result[i++] = x;
     }
     import std.exception : assumeUnique;
     return result.assumeUnique;
 ...

Nice. Thanks. I had to compromise a little though, as assumeUnique 
is @system, and all my code is @safe (and I'm trying to avoid the 
need for an inline @system wrapper ;-)

//---

void createUniqueIDArray(ref int[] idArray, int recordsNeeded)
{
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", idArray.capacity); 
}

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        // generate a random 9 digit id that starts with 999
        x = uniform(999*10^^6, 10^^9); // thanks Stanislav!

        // ensure every id added is unique.
        if (!idArray.canFind(x))
        {
            idArray ~= x; // NOTE: does NOT register with -profile=gc
            i++;
        }
     }
}

//---
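
For comparison, the wrapper being avoided might look like the sketch below: an immediately-called `@trusted` lambda around `assumeUnique`. This is an assumption about how one could keep Stanislav's `immutable` return type in `@safe` code, not something posted in the thread; it is only sound because `result` is never used again after the call.

```d
@safe:

import std.algorithm.searching : canFind;
import std.exception : assumeUnique;
import std.random : uniform;

immutable(int)[] createUniqueIDArray(int recordsNeeded)
{
    int[] result = new int[recordsNeeded];

    int i = 0;
    while (i != recordsNeeded)
    {
        // id needs to be 9 digits, and needs to start with 999
        immutable x = uniform(999 * 10 ^^ 6, 10 ^^ 9);

        // ensure every id added is unique.
        if (!result[0 .. i].canFind(x))
            result[i++] = x;
    }

    // The only unchecked step, confined to a single expression.
    return (() @trusted => result.assumeUnique)();
}

void main()
{
    import std.stdio : writeln;
    createUniqueIDArray(10).writeln;
}
```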

Jan 20 2022

forkit <forkit gmail.com> writes:

On Thursday, 20 January 2022 at 21:16:46 UTC, forkit wrote:

Cannot work out why I cannot pass valuesArray in as ref const??

get error: Error: cannot append type `const(int[])[const(int)]` 
to type `int[][int][]`


// --

int[][int][] CreateDataSet(ref const int[] idArray, ref 
const(int[][]) valuesArray, const int numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);

     foreach(i, id; idArray)
         records ~= [ idArray[i] : valuesArray[i] ];

     return records.dup;
}

// ---

Jan 20 2022

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/20/22 5:07 PM, forkit wrote:
 On Thursday, 20 January 2022 at 21:16:46 UTC, forkit wrote:

 
 Cannot work out why I cannot pass valuesArray in as ref const??
 
 get error: Error: cannot append type `const(int[])[const(int)]` to type 
 `int[][int][]`

Because it would allow altering const data.

e.g.:

```d
const(int[])[const(int)] v = [1: [1, 2, 3]];
int[][int][] arr = [v]; // assume this works
arr[0][1][0] = 5; // oops, just set v[1][0]
```

General rule of thumb is that you can convert the HEAD of a structure to 
mutable from const, but not the TAIL (the stuff it points at).

An associative array is a pointer-to-implementation construct, so it's a 
reference.
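
A small sketch of that rule of thumb with plain slices (the variable names are made up for illustration):

```d
void main()
{
    const(int[]) whole = [1, 2, 3];

    // OK: only the head (the slice's own pointer and length) becomes
    // mutable; the elements it points at stay const.
    const(int)[] tail = whole;
    tail = tail[1 .. $];          // rebinding the head is allowed
    assert(tail == [2, 3]);

    // Not OK: these would make the pointed-at data mutable.
    static assert(!__traits(compiles, { int[] m = whole; }));
    static assert(!__traits(compiles, tail[0] = 5));
}
```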

-Steve

Jan 20 2022

forkit <forkit gmail.com> writes:

On Thursday, 20 January 2022 at 22:31:17 UTC, Steven 
Schveighoffer wrote:
 Because it would allow altering const data.

I'm not sure I understand. At what point in this function is 
valuesArray modified, and thus preventing it being passed in with 
const?

// ---

int[][int][] CreateDataSet
(ref const int[] idArray, ref int[][] valuesArray, const int 
numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);

     foreach(i, const id; idArray)
         records ~= [ idArray[i] : valuesArray[i] ];

     return records.dup;
}

// ----

Jan 20 2022

Ali Çehreli <acehreli yahoo.com> writes:

On 1/20/22 15:01, forkit wrote:
 On Thursday, 20 January 2022 at 22:31:17 UTC, Steven Schveighoffer wrote:
 Because it would allow altering const data.

 I'm not sure I understand. At what point in this function is valuesArray
 modified, and thus preventing it being passed in with const?

 // ---

 int[][int][] CreateDataSet
 (ref const int[] idArray, ref int[][] valuesArray, const int numRecords)
 {
      int[][int][] records;

Elements of records are mutable.

      records.reserve(numRecords);

      foreach(i, const id; idArray)
          records ~= [ idArray[i] : valuesArray[i] ];

If that were allowed, you could mutate elements of records and would 
break the promise to your caller.

Aside: There is no reason to pass arrays and associative arrays as 'ref 
const' in D as they are already reference types. Unlike C++, there is no 
copying of the elements. When you pass by value, just a couple of 
fundamental types are copied.

Furthermore and in theory, there may be a performance penalty when an 
array is passed by reference because elements would be accessed by 
dereferencing twice: Once for the parameter reference and once for the 
.ptr property of the array. (This is in theory.)

void foo(ref const int[]) {}  // Unnecessary
void foo(const int[]) {}      // Idiomatic
void foo(in int[]) {}         // Intentful :)

Passing arrays by reference makes sense when the function will mutate 
the argument.
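
A short sketch of the difference, with hypothetical function names: a slice passed by value still shares its elements with the caller, while `ref` is only needed when the caller must see a change to the slice itself, such as a new length.

```d
// By value: the pointer/length pair is copied, the elements are shared.
void fill(int[] a) { a[] = 7; }

// By value: appending changes only the local copy of the slice.
void grow(int[] a) { a ~= 1; }

// By ref: the caller's slice itself is updated.
void growRef(ref int[] a) { a ~= 1; }

void main()
{
    int[] arr = [1, 2, 3];

    fill(arr);
    assert(arr == [7, 7, 7]);    // element writes are visible

    grow(arr);
    assert(arr.length == 3);     // caller's length is unchanged

    growRef(arr);
    assert(arr.length == 4);     // caller sees the appended element
}
```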

Ali

Jan 20 2022

Ali Çehreli <acehreli yahoo.com> writes:

On 1/20/22 15:10, Ali Çehreli wrote:

 void foo(const int[]) {}      // Idiomatic

As H. S. Teoh would add at this point, that is not idiomatic but the 
following are (with different meanings):

void foo(const(int)[]) {}      // Idiomatic
void foo(const(int[])) {}      // Idiomatic

 void foo(in int[]) {}         // Intentful :)

I still like that one. :)

Ali

Jan 20 2022

forkit <forkit gmail.com> writes:

On Thursday, 20 January 2022 at 23:49:59 UTC, Ali Çehreli wrote:

so here is final code, in idiomatic D, as far as I can tell ;-)

curious output when using -profile=gc

.. a line referring to: 
std.array.Appender!(immutable(char)[]).Appender.Data 
std.array.Appender!string.Appender.this 
C:\D\dmd2\windows\bin\..\..\src\phobos\std\array.d:3330

That's not really helpful, as I'm not sure what line of my code it's 
referring to.

// ---------------

/+
   =====================================================================
    This program creates a sample dataset consisting of 'random' records,
    and then outputs that dataset to a file.

    Arguments can be passed on the command line,
    or otherwise default values are used instead.

    Example of that output can be seen at the end of this code.
   =====================================================================
+/

module test;
@safe

import std.stdio : write, writef, writeln, writefln;
import std.range : iota;
import std.array : array, byPair;
import std.random : Random, unpredictableSeed, dice, choice, 
uniform;
import std.algorithm : map, uniq, canFind;
import std.conv : to;
import std.stdio : File;
import std.format;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); } // thanks Ali

void main(string[] args)
{
     int recordsNeeded, valuesPerRecord;
     string fname;

     if(args.length < 4)
     {
         recordsNeeded = 10;
         valuesPerRecord= 8;
         fname = "D:/rnd_records.txt";
     }
     else
     {
         // assumes valid values being passed in ;-)
         recordsNeeded = to!int(args[1]);
         valuesPerRecord = to!int(args[2]);
         fname = args[3];
     }

     int[] idArray;
     createUniqueIDArray(idArray, recordsNeeded);

     int[][] valuesArray;
     createValuesArray(valuesArray, recordsNeeded, 
valuesPerRecord);

     int[][int][] records = CreateDataSet(idArray, valuesArray, 
recordsNeeded);
     ProcessRecords(records, fname);

     writefln("All done. Check if records written to %s", fname);
}

void createUniqueIDArray
(ref int[] idArray, const(int) recordsNeeded)
{
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", idArray.capacity); 
}

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        // id needs to be 9 digits, and needs to start with 999
        x = uniform(999*10^^6, 10^^9); // thanks Stanislav

        // ensure every id added is unique.
        if (!idArray.canFind(x))
        {
            // NOTE: does NOT appear to register with -profile=gc
            idArray ~= x;
            i++;
        }
     }
}

void createValuesArray
(ref int[][] valuesArray, const(int) recordsNeeded, const(int) 
valuesPerRecord)
{
     valuesArray = iota(recordsNeeded)
             .map!(i => iota(valuesPerRecord)
             .map!(valuesPerRecord => cast(int)rnd.dice(0.6, 1.4))
             .array).array;  // NOTE: does register with -profile=gc
}

int[][int][] CreateDataSet
(const(int)[] idArray, int[][] valuesArray, const(int) numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);
     debug { writefln("records.capacity is %s", records.capacity); 
}

     foreach(i, const id; idArray)
     {
         // NOTE: below does register with -profile=gc
         records ~= [ idArray[i] : valuesArray[i] ];
     }
     return records.dup;
}

void ProcessRecords
(in int[][int][] recArray, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     string[] formattedRecords;
     formattedRecords.reserve(recArray.length);
     debug { writefln("formattedRecords.capacity is %s", 
formattedRecords.capacity); }

     void processRecord(const(int) id, const(int)[] values)
     {
         // NOTE: below does register with -profile=gc
         formattedRecords ~= id.to!string ~ 
values.format!"%(%s,%)";
     }

     foreach(ref const record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     foreach(ref rec; formattedRecords)
         file.writeln(rec);
}

/+
sample file output:

9992511730,1,0,1,0,1,0,1
9995369731,1,1,1,1,1,1,1
9993136031,1,0,0,0,1,0,0
9998979051,1,1,1,1,0,1,1
9998438090,1,1,0,1,1,0,0
9995132750,0,0,1,0,1,1,1
9997123630,0,1,1,1,0,1,1
9998351590,1,0,0,1,1,1,1
9991454121,1,1,1,1,1,0,1
9997673520,1,1,1,1,1,1,1

+/

// ---------------

Jan 20 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 01:35:40 UTC, forkit wrote:

oops. nasty mistake to make ;-)


module test;
@safe

should be:

module test;
@safe:

Jan 20 2022

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 1/20/22 17:35, forkit wrote:

 module test;
 @safe

Does that make just the following definition @safe or the entire module 
@safe? Trying... Yes, I am right. To make the module safe, use the 
following syntax:

@safe:

      idArray.reserve(recordsNeeded);

[...]
             idArray ~= x; // NOTE: does NOT appear to register with
 -profile=gc

Because you've already reserved enough memory above. Good.

      int[][int][] records;
      records.reserve(numRecords);

That's good for the array part. However...

          // NOTE: below does register with -profile=gc
          records ~= [ idArray[i] : valuesArray[i] ];

The right hand side is a freshly generated associative array. For every 
element of 'records', there is a one-element AA created. AA will need to 
allocate memory for its element. So, GC allocation is expected there.
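
If the per-record AA allocation matters, one possible alternative is to store each record as a tuple instead of a one-element AA. This is only a sketch: `Record` and `createTupleDataSet` are hypothetical names, not from the code above, and it assumes the id never needs to be looked up by key.

```d
import std.typecons : Tuple;

// Hypothetical record type: an id plus its values, no AA involved.
alias Record = Tuple!(int, "id", int[], "values");

Record[] createTupleDataSet(const(int)[] ids, int[][] values)
{
    assert(ids.length == values.length);

    Record[] records;
    records.reserve(ids.length);   // one allocation for the outer array

    foreach (i, id; ids)
        records ~= Record(id, values[i]);   // no per-record AA allocation

    return records;
}
```

Each element still refers to the existing `values[i]` slice, so nothing is copied; the fields are then reachable as `rec.id` and `rec.values`.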

      string[] formattedRecords;
      formattedRecords.reserve(recArray.length);

[...]
          // NOTE: below does register with -profile=gc
          formattedRecords ~= id.to!string ~ values.format!"%(%s,%)";

Again, although 'formattedRecords' has reserved memory, the right hand 
side has dynamic memory allocations.

1) id.to!string allocates

2) format allocates memory for its 'string' result (I think the Appender 
report comes from format's internals.)

3) Operator ~ makes a new string from the previous two

(Somehow, I don't see three allocations though. Perhaps an NRVO is 
applied there. (?))

I like the following better, which reduces the allocations:

         formattedRecords ~= format!"%s%(%s,%)"(id.to!string, values);

      foreach(ref rec; formattedRecords)
          file.writeln(rec);

The bigger question is, why did 'formattedRecords' exist at all? You 
could have written the output directly to the file. But even *worse* and 
with apologies, ;) here is something crazy that achieves the same thing:

void ProcessRecords
(in int[][int][] recArray, const(string) fname)
{
     import std.algorithm : joiner;
     auto toWrite = recArray.map!(e => e.byPair);
     File(fname, "w").writefln!"%(%(%(%s,%(%s,%)%)%)\n%)"(toWrite);
}

I've done lots of trial and error for the required number of nested %( 
%) pairs. Phew...
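
For comparison, a less exotic way to write straight to the file, with no intermediate array of strings, might look like the sketch below. `writeRecordsDirect` is a hypothetical name, and the ids and values are assumed to arrive as two parallel arrays rather than as the array of AAs used above.

```d
import std.stdio : File;

// Hypothetical helper: one formatted line per record, written directly.
void writeRecordsDirect(string fname, const(int)[] ids, const int[][] values)
{
    auto file = File(fname, "w");   // closed automatically when it goes out of scope

    foreach (i, id; ids)
    {
        // "%(%s,%)" formats the values range with "," between elements
        file.writefln!"%s,%(%s,%)"(id, values[i][]);
    }
}
```

Since `File` output is buffered, calling `writefln` once per record does not mean one system call per record.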

Ali

Jan 20 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 02:30:35 UTC, Ali Çehreli wrote:
 The bigger question is, why did 'formattedRecords' exist at 
 all? You could have written the output directly to the file.

Oh. this was intentional, as I wanted to write once, and only 
once, to the file.

The consequence of that decision of course, is the extra memory 
allocations...

But in my example code I only create 10 records. In reality, my 
dataset will have 100,000's of records, so I don't want to write 
100,000s of time to the same file.

 But even *worse* and with apologies, ;) here is something crazy 
 that achieves the same thing:

 void ProcessRecords
 (in int[][int][] recArray, const(string) fname)
 {
     import std.algorithm : joiner;
     auto toWrite = recArray.map!(e => e.byPair);
     File("rnd_records.txt", 
 "w").writefln!"%(%(%(%s,%(%s,%)%)%)\n%)"(toWrite);
 }

 I've done lots of trial and error for the required number of 
 nested %( %) pairs. Phew...

 Ali

Yes, that does look worse ;-)

But I'm looking into that code to see if I can salvage something 
from it ;-)

Jan 20 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 03:45:08 UTC, forkit wrote:
 On Friday, 21 January 2022 at 02:30:35 UTC, Ali Çehreli wrote:
 The bigger question is, why did 'formattedRecords' exist at 
 all? You could have written the output directly to the file.

 Oh. this was intentional, as I wanted to write once, and only 
 once, to the file.

oops. looking back at that code, it seems I didn't write what i 
intended :-(

I might have to use a kindof stringbuilder instead, then write a 
massive string once to the file.

Jan 20 2022

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Jan 21, 2022 at 03:50:37AM +0000, forkit via Digitalmars-d-learn wrote:
[...]
 I might have to use a kindof stringbuilder instead, then write a
 massive string once to the file.

[...]

std.array.appender is your friend.
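
A minimal sketch of that idea, assuming parallel arrays of ids and values (`formatRecords` is a hypothetical name, and the size estimate is a rough guess for 9-digit ids and single-digit values):

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical helper: builds the whole file body as one string.
string formatRecords(const(int)[] ids, const int[][] values)
{
    auto buf = appender!string;

    // reserve() counts elements (chars here), not records.
    // Estimate: 9-digit id + newline, plus ",d" for each value.
    const perRecord = 10 + 2 * (values.length ? values[0].length : 0);
    buf.reserve(ids.length * perRecord);

    foreach (i, id; ids)
    {
        // formats straight into the appender, no temporary strings
        buf.formattedWrite!"%s,%(%s,%)\n"(id, values[i][]);
    }

    return buf.data;
}
```

The result can then be handed to a single `file.write(...)` call.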


T

-- 
Meat: euphemism for dead animal. -- Flora

Jan 20 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 03:57:01 UTC, H. S. Teoh wrote:
 std.array.appender is your friend.

 T

:-)

// --

void ProcessRecords
(in int[][int][] recArray, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recArray.length);
     debug { writefln("bigString.capacity is %s", 
bigString.capacity); }

     void processRecord(const(int) id, const(int)[] values)
     {
         bigString ~= id.to!string ~ values.format!"%(%s,%)" ~ 
"\n";
     }

     foreach(ref const record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     file.write(bigString[]);
}

// ---

Jan 20 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 04:08:33 UTC, forkit wrote:
 // --

 void ProcessRecords
 (in int[][int][] recArray, const(string) fname)
 {
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recArray.length);
     debug { writefln("bigString.capacity is %s", 
 bigString.capacity); }

     void processRecord(const(int) id, const(int)[] values)
     {
         bigString ~= id.to!string ~ values.format!"%(%s,%)" ~ 
 "\n";
     }

     foreach(ref const record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     file.write(bigString[]);
 }

 // ---

actually something not right with Appender I think...

100_000 records took 20sec (ok)


1_000_000 records never finished - after 1hr/45min I cancelled 
the process.

??

Jan 20 2022

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Friday, 21 January 2022 at 03:50:37 UTC, forkit wrote:

 I might have to use a kindof stringbuilder instead, then write 
 a massive string once to the file.

You're using writeln, which goes through C I/O buffered writes. 
Whether you make one call or several is of little consequence - 
you're limited by buffer size and options.

Jan 21 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 08:53:26 UTC, Stanislav Blinov 
wrote:

turns out the problem has nothing to do with appender...

It's actually this line:

if (!idArray.canFind(x)):

when i comment this out in the function below, the program does 
what I want in seconds.

only problem is, the ids are no longer unique (in the file)

// ---
void createUniqueIDArray
(ref int[] idArray, const(int) recordsNeeded)
{
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", idArray.capacity); 
}

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        // id needs to be 9 digits, and needs to start with 999
        x = uniform(999*10^^6, 10^^9); // thanks Stanislav

        // ensure every id added is unique.
        //if (!idArray.canFind(x))
        //{
            idArray ~= x; // NOTE: does NOT appear to register with -profile=gc
            i++;
        //}
     }

     debug { writefln("idArray.length = %s", idArray.length); }
}

// ----

Jan 21 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 09:10:56 UTC, forkit wrote:

ok... in the interest of correcting the code I posted previously...

... here is a version that actually works in secs (for a million 
records), as opposed to hours!


// ---------------

/+
   
=====================================================================
    This program creates a sample dataset consisting of 'random' records,
    and then outputs that dataset to a file.

    Arguments can be passed on the command line,
    or otherwise default values are used instead.

    Example of that output can be seen at the end of this code.
    
=====================================================================
+/

module test;
@safe:
import std.stdio : write, writef, writeln, writefln;
import std.range : iota, takeExactly;
import std.array : array, byPair, Appender, appender;
import std.random : Random, unpredictableSeed, dice, choice, 
uniform;
import std.algorithm : map, uniq, canFind, among;
import std.conv : to;
import std.format;
import std.stdio : File;
import std.file : exists;
import std.exception : enforce;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); } // thanks Ali

void main(string[] args)
{
     int recordsNeeded, valuesPerRecord;
     string fname;

     if(args.length < 4)
     {
         //recordsNeeded = 1_000_000;
         //recordsNeeded = 100_000;
         recordsNeeded = 10;

         valuesPerRecord= 8;

         //fname = "D:/rnd_records.txt";
         fname = "./rnd_records.txt";
     }
     else
     {
         // assumes valid values being passed in ;-)
         recordsNeeded = to!int(args[1]);
         valuesPerRecord = to!int(args[2]);
         fname = args[3];
     }

     debug
         { writefln("%s records, %s values for record, will be written to file: %s", recordsNeeded, valuesPerRecord, fname); }
     else
         { enforce(!exists(fname), "Oops! That file already exists!"); }

     // id needs to be 9 digits, and needs to start with 999
     int[] idArray = takeExactly(iota(999*10^^6, 10^^9), 
recordsNeeded).array;
     debug { writefln("idArray.length = %s", idArray.length); }

     int[][] valuesArray;
     createValuesArray(valuesArray, recordsNeeded, 
valuesPerRecord);

     int[][int][] records = CreateDataSet(idArray, valuesArray, 
recordsNeeded);

     ProcessRecords(records, fname);

     writefln("All done. Check if records written to %s", fname);
}

void createValuesArray
(ref int[][] valuesArray, const(int) recordsNeeded, const(int) 
valuesPerRecord)
{
     valuesArray = iota(recordsNeeded)
             .map!(i => iota(valuesPerRecord)
             .map!(valuesPerRecord => cast(int)rnd.dice(0.6, 1.4))
             .array).array;  // NOTE: does register with -profile=gc

     debug { writefln("valuesArray.length = %s", 
valuesArray.length); }

}

int[][int][] CreateDataSet
(const(int)[] idArray, int[][] valuesArray, const(int) numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);
     debug { writefln("records.capacity is %s", records.capacity); 
}

     foreach(i, const id; idArray)
     {
         // NOTE: below does register with -profile=gc
         records ~= [ idArray[i] : valuesArray[i] ];
     }

     debug { writefln("records.length = %s", records.length); }

     return records.dup;
}

void ProcessRecords
(in int[][int][] recArray, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recArray.length);
     debug { writefln("bigString.capacity is %s", 
bigString.capacity); }

     // NOTE: forward declaration required for this nested function
     void processRecord(const(int) id, const(int)[] values)
     {
         // NOTE: below does register with -profile=gc
         bigString ~= id.to!string ~ "," ~ values.format!"%(%s,%)" 
~ "\n";
     }

     foreach(ref const record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     file.write(bigString[]);
}

/+
sample file output:

9992511730,1,0,1,0,1,0,1
9995369731,1,1,1,1,1,1,1
9993136031,1,0,0,0,1,0,0
9998979051,1,1,1,1,0,1,1
9998438090,1,1,0,1,1,0,0
9995132750,0,0,1,0,1,1,1
9997123630,0,1,1,1,0,1,1
9998351590,1,0,0,1,1,1,1
9991454121,1,1,1,1,1,0,1
9997673520,1,1,1,1,1,1,1

+/

// ---------------

Jan 21 2022

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Jan 21, 2022 at 10:12:42AM +0000, forkit via Digitalmars-d-learn wrote:
[...]
 Random rnd;
 static this() {  rnd = Random(unpredictableSeed); } // thanks Ali

Actually you don't even need to do this, unless you want precise control
over the initialization of your RNG.  If you don't specify the RNG
parameter in the calls to std.random functions, they will use the
default RNG, which is already initialized with unpredictableSeed.
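
The "precise control" case is mostly about reproducibility. As a rough sketch (`sampleIds` and its `seed` parameter are hypothetical), passing an explicitly seeded generator makes every run produce the same sequence, while leaving out the last argument to `uniform` falls back to the default generator:

```d
import std.random : Random, uniform;

// Hypothetical helper: same seed, same ids, every time.
int[] sampleIds(size_t n, uint seed)
{
    auto rng = Random(seed);   // explicitly seeded generator
    auto ids = new int[n];

    foreach (ref id; ids)
    {
        // 9-digit ids starting with 999; drop `rng` to use the default RNG
        id = uniform(999 * 10^^6, 10^^9, rng);
    }

    return ids;
}
```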


[...]
     // id needs to be 9 digits, and needs to start with 999
     int[] idArray = takeExactly(iota(999*10^^6, 10^^9),
 recordsNeeded).array;

[...]

This is wasteful if you're not planning to use every ID in this
million-entry long array.  Much better to just use an AA to keep track
of which IDs have already been generated instead.  Of course, if you
plan to use most of the array, then the AA may wind up using more memory
than the array. So it depends on your use case.


T

-- 
Never wrestle a pig. You both get covered in mud, and the pig likes it.

Jan 21 2022

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/21/22 1:36 PM, H. S. Teoh wrote:
 On Fri, Jan 21, 2022 at 10:12:42AM +0000, forkit via Digitalmars-d-learn wrote:

 [...]
      // id needs to be 9 digits, and needs to start with 999
      int[] idArray = takeExactly(iota(999*10^^6, 10^^9),
 recordsNeeded).array;

 [...]
 
 This is wasteful if you're not planning to use every ID in this
 million-entry long array.  Much better to just use an AA to keep track
 of which IDs have already been generated instead.  Of course, if you
 plan to use most of the array, then the AA may wind up using more memory
 than the array. So it depends on your use case.

Yeah, iota is a random-access range, so you can just pass it directly, 
and not allocate anything.

Looking at the usage, it doesn't need to be an array at all. But 
modifying the code to properly accept the range might prove difficult 
for someone not used to it.
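
For what it's worth, accepting a range mostly comes down to making the function a template with a constraint. The sketch below is one way it could look; `createDataSetFrom` is a hypothetical name, not a function from the posted program.

```d
import std.range : iota, zip;
import std.range.primitives : ElementType, isInputRange;

// Hypothetical variant: takes any input range of ids instead of int[].
int[][int][] createDataSetFrom(R)(R ids, int[][] values)
if (isInputRange!R && is(ElementType!R : int))
{
    int[][int][] records;
    records.reserve(values.length);

    // zip stops at the shorter of the two ranges
    foreach (id, vals; zip(ids, values))
    {
        int key = id;
        records ~= [key : vals];
    }

    return records;
}
```

A call such as `createDataSetFrom(iota(100, 103), valuesArray)` then never materializes the ids as an array.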

-Steve

Jan 21 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 18:50:46 UTC, Steven Schveighoffer 
wrote:
 Yeah, iota is a random-access range, so you can just pass it 
 directly, and not allocate anything.

 Looking at the usage, it doesn't need to be an array at all. 
 But modifying the code to properly accept the range might prove 
 difficult for someone not used to it.

 -Steve

thanks. that makes more sense actually ;-)

now i can get rid of the idArray completely, and just do:

foreach(i, id; enumerate(iota(iotaStartNum, iotaStartNum + 
recordsNeeded)))
{
     records ~= [ id: valuesArray[i] ];
}

Jan 21 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 21:01:11 UTC, forkit wrote:

even better, I got rid of all those unnecessary arrays ;-)

// ---

int[][int][] CreateDataSet
(const(int) recordsNeeded, const(int)valuesPerRecord)
{
     int[][int][] records;
     records.reserve(recordsNeeded);

     foreach(i, id; iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate)
     {
         records ~= [ id: 
iota(valuesPerRecord).map!(valuesPerRecord => 
cast(int)rnd.dice(0.6, 1.4)).array ];
     }

     return records.dup;
}

// ---

Jan 21 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 21:43:38 UTC, forkit wrote:

oops... should be:

// ---

int[][int][] CreateDataSet
(const(int) recordsNeeded, const(int)valuesPerRecord)
{
     int[][int][] records;
     records.reserve(recordsNeeded);

     const int iotaStartNum = 100_000_001;

     foreach(i, id; iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate)
     {
         records ~= [ id: 
iota(valuesPerRecord).map!(valuesPerRecord => 
cast(int)rnd.dice(0.6, 1.4)).array ];
     }

     return records.dup;
}

// ---

Jan 21 2022

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Jan 21, 2022 at 09:43:38PM +0000, forkit via Digitalmars-d-learn wrote:
 On Friday, 21 January 2022 at 21:01:11 UTC, forkit wrote:

[...]
 even better, I got rid of all those unnecessary arrays ;-)
 
 // ---
 
 int[][int][] CreateDataSet
 (const(int) recordsNeeded, const(int)valuesPerRecord)
 {
     int[][int][] records;
     records.reserve(recordsNeeded);
 
     foreach(i, id; iota(iotaStartNum, iotaStartNum +
 recordsNeeded).enumerate)
     {
         records ~= [ id: iota(valuesPerRecord).map!(valuesPerRecord =>
 cast(int)rnd.dice(0.6, 1.4)).array ];
     }
 
     return records.dup;

What's the point of calling .dup here?  The only reference to records is
going out of scope, so why can't you just return it?  The .dup is just
creating extra work for nothing.


T

-- 
Unix was not designed to stop people from doing stupid things, because that
would also stop them from doing clever things. -- Doug Gwyn

Jan 21 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 21:56:33 UTC, H. S. Teoh wrote:
 What's the point of calling .dup here?  The only reference to 
 records is going out of scope, so why can't you just return it? 
  The .dup is just creating extra work for nothing.


 T

good pickup. thanks ;-)

// ----

module test;
@safe:

import std.stdio : write, writef, writeln, writefln;
import std.range : iota, enumerate;
import std.array : array, byPair, Appender, appender;
import std.random : Random, unpredictableSeed, dice, randomCover;
import std.algorithm : map;
import std.conv : to;
import std.format;
import std.stdio : File;
import std.file : exists;
import std.exception : enforce;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); }

void main(string[] args)
{
     int recordsNeeded, valuesPerRecord;
     string fname;

     if(args.length < 4)
     {
         recordsNeeded = 10; // default
         valuesPerRecord= 8; // default

         fname = "D:/rnd_records.txt"; // default
         //fname = "./rnd_records.txt"; // default
     }
     else
     {
         // assumes valid values being passed in ;-)
         recordsNeeded = to!int(args[1]);
         valuesPerRecord = to!int(args[2]);
         fname = args[3];
     }

     debug
         { writefln("%s records, %s values for record, will be written to file: %s", recordsNeeded, valuesPerRecord, fname); }
     else
         {
             enforce(!exists(fname), "Oops! That file already exists!");
             enforce(recordsNeeded <= 1_000_000_000, "C'mon! That's too many records!");
         }

     int[][int][] records = CreateDataSet(recordsNeeded, 
valuesPerRecord);

     ProcessDataSet(records, fname);

     writefln("All done. Check if records written to %s", fname);
}

int[][int][] CreateDataSet
(const(int) recordsNeeded, const(int) valuesPerRecord)
{
     const int iotaStartNum = 100_000_001;

     int[][int][] records;
     records.reserve(recordsNeeded);
     debug { writefln("records.capacity is %s", records.capacity); 
}

     foreach(i, id; iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate)
     {
         // NOTE: below does register with -profile=gc
         records ~= [ id: 
iota(valuesPerRecord).map!(valuesPerRecord => 
cast(int)rnd.dice(0.6, 1.4)).array ];
     }

     debug { writefln("records.length = %s", records.length); }
     return records;
}

// this creates a big string of 'formatted' records, and outputs that string to a file.
void ProcessDataSet
(in int[][int][] records, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(records.length);
     debug { writefln("bigString.capacity is %s", 
bigString.capacity); }

     // NOTE: forward declaration required for this nested function
     void processRecord(const(int) id, const(int)[] values)
     {
         bigString ~= id.to!string ~ "," ~ values.format!"%(%s,%)" 
~ "\n";
     }

     foreach(ref const record; records)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     debug { writeln; writeln(bigString[].until("\n")); writeln; } 
// display just one record

     file.write(bigString[]);
}
// ----

Jan 21 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 22:25:32 UTC, forkit wrote:

I really like how alias and mixin can simplify my code even 
further:

//---

int[][int][] CreateDataSet
(const(int) recordsNeeded, const(int) valuesPerRecord)
{
     int[][int][] records;
     records.reserve(recordsNeeded);

     const int iotaStartNum = 100_000_001;
     alias iotaValues = Alias!"iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate";
     alias recordValues = 
Alias!"iota(valuesPerRecord).map!(valuesPerRecord => 
cast(int)rnd.dice(0.6, 1.4)).array";

     foreach(i, id; mixin(iotaValues))
     {
         records ~= [ id: mixin(recordValues) ];
     }

     return records;
}

//---

Jan 21 2022

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/21/22 6:24 PM, forkit wrote:
 On Friday, 21 January 2022 at 22:25:32 UTC, forkit wrote:

 
 I really like how alias and mixin can simplify my code even further:
 
 //---
 
 int[][int][] CreateDataSet
 (const(int) recordsNeeded, const(int) valuesPerRecord)
 {
      int[][int][] records;
      records.reserve(recordsNeeded);
 
      const int iotaStartNum = 100_000_001;
      alias iotaValues = Alias!"iota(iotaStartNum, iotaStartNum + 
 recordsNeeded).enumerate";
      alias recordValues = 
 Alias!"iota(valuesPerRecord).map!(valuesPerRecord => 
 cast(int)rnd.dice(0.6, 1.4)).array";

oof! use enums for compile-time strings ;)

```d
enum iotaValues = "iota(...";
```

 
      foreach(i, id; mixin(iotaValues))
      {
          records ~= [ id: mixin(recordValues) ];
      }
 
      return records;
 }

Not sure I agree that the mixin looks better.

Also, I'm curious about this code:

```d
iota(valuesPerRecord).map!(valuesPerRecord => cast(int)rnd.dice(0.6, 
1.4)).array;
```

That second `valuesPerRecord` is not used in the lambda, and also it's 
not referring to the original element, it's the name of a parameter in 
the lambda.

Are you sure this is doing what you want?
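
If the element of the mapped range is never needed, one way to avoid the unused lambda parameter altogether is `std.range.generate`, which just calls a function repeatedly. A small sketch (`randomValues` is a hypothetical name; it uses the default RNG rather than the module-level `rnd`):

```d
import std.array : array;
import std.random : dice;
import std.range : generate, take;

// Hypothetical helper: n values, each 0 (weight 0.6) or 1 (weight 1.4).
int[] randomValues(size_t n)
{
    // the lambda takes no parameter, so nothing can shadow valuesPerRecord
    return generate!(() => cast(int) dice(0.6, 1.4))().take(n).array;
}
```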

-Steve

Jan 21 2022

forkit <forkit gmail.com> writes:

On Saturday, 22 January 2022 at 01:33:16 UTC, Steven 
Schveighoffer wrote:

so I was watching this video by Andrei:

https://www.youtube.com/watch?v=mCrVYYlFTrA

In it, he talked about writing the simplest design that could 
possibly work....

Which got me thinking....

// ----

module test;
@safe:

import std.stdio : write, writef, writeln, writefln;
import std.range : iota, enumerate;
import std.array : array, byPair, Appender, appender;
import std.random : Random, unpredictableSeed, dice, randomCover;
import std.algorithm : map;
import std.conv : to;
import std.format;
import std.stdio : File;
import std.file : exists;
import std.exception : enforce;
import std.meta : Alias;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); }

void main(string[] args)
{
     int recordsNeeded, valuesPerRecord;

     string fname;

     if(args.length < 4) // then set defaults
     {
         recordsNeeded = 10;
         valuesPerRecord= 8;

         version(Windows) { fname = "D:/rnd_records.txt"; }
         version(linux) { fname = "./rnd_records.txt"; }
     }
     else
     {
         // assumes valid values being passed in ;-)
         recordsNeeded = to!int(args[1]);
         valuesPerRecord = to!int(args[2]);
         fname = args[3];
     }

     debug
         { writefln("%s records (where a record is: id and %s 
values), will be written to file: %s", recordsNeeded, 
valuesPerRecord, fname); }
     else
         {
             enforce(!exists(fname), "Oops! That file already 
exists!");
             enforce(recordsNeeded <= 1_000_000_000, "C'mon! 
That's too many records!");
         }

     CreateDataFile(recordsNeeded, valuesPerRecord, fname);

     writefln("All done. Check if records written to %s", fname);
}

void CreateDataFile(const(int) recordsNeeded, const(int) 
valuesPerRecord, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recordsNeeded);

     const int iotaStartNum = 100_000_001;

     foreach(i, id; iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate)
     {
         bigString
             ~= id.to!string
             ~ ","
             ~ valuesPerRecord.iota.map!(valuesPerRecord => 
cast(int)rnd.dice(0.6, 1.4)).format!"%(%s,%)"
             ~ "\n";
     }

     file.write(bigString[]);
}

// ----

Jan 21 2022

forkit <forkit gmail.com> writes:

On Saturday, 22 January 2022 at 01:33:16 UTC, Steven 
Schveighoffer wrote:
 That second `valuesPerRecord` is not used in the lambda, and 
 also it's not referring to the original element, it's the name 
 of a parameter in the lambda.

 Are you sure this is doing what you want?

 -Steve

It just worked, so i didn't think about it too much.. but it 
seems to work either way.

And to be honest, the only part of it I understand, is the dice 
part ;-)

In any case I changed it:

from: valuesPerRecord =>
to:  i =>

// ----

void CreateDataFile(const(int) recordsNeeded, const(int) 
valuesPerRecord, const(string) fname)
{
     auto rnd = Random(unpredictableSeed);

     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recordsNeeded);

     const int iotaStartNum = 100_000_001;

     foreach(i, id; iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate)
     {
         bigString
             ~= id.to!string
             ~ ","
             ~ valuesPerRecord.iota.map!(i => rnd.dice(0.6, 
1.4)).format!"%(%s,%)"
             ~ "\n";
     }

     file.write(bigString[]);
}

// ---

Jan 22 2022

forkit <forkit gmail.com> writes:

On Friday, 21 January 2022 at 18:36:42 UTC, H. S. Teoh wrote:
 This is wasteful if you're not planning to use every ID in this 
 million-entry long array.  Much better to just use an AA to 
 keep track of which IDs have already been generated instead.  
 Of course, if you plan to use most of the array, then the AA 
 may wind up using more memory than the array. So it depends on 
 your use case.


 T

yes, I was thinking this over as I was waking up this morning, 
and thought... what the hell am I doing generating all those 
numbers that might never get used.

better to do:

const int iotaStartNum = 100_000_000;
int[] idArray = iota(iotaStartNum, iotaStartNum + 
recordsNeeded).array;

Jan 21 2022

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Jan 21, 2022 at 09:10:56AM +0000, forkit via Digitalmars-d-learn wrote:
[...]
 turns out the problem has nothing to do with appender...
 
 It's actually this line:
 
 if (!idArray.canFind(x)):
 
 when i comment this out in the function below, the program does what I
 want in seconds.
 
 only problem is, the ids are no longer unique (in the file)

[...]

Ah yes, the good ole O(N²) trap that new programmers often fall into.
:-)

Using .canFind on an array of generated IDs means scanning the entire
array every time you find a non-colliding ID. As the array grows, the
cost of doing this increases. The overall effect is O(N²) time
complexity, because you're continually scanning the array every time you
generate a new ID.

Use an AA instead, and performance should dramatically increase. I.e.,
instead of:

	size_t[] idArray;
	...
	if (!idArray.canFind(x)) ...	// O(N) cost to scan array

write:

	bool[size_t] idAA;
	...
	if (x in idAA) ...	// O(1) cost to look up an ID
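
Spelled out a bit more, the AA-based version of the unique-id loop could look roughly like this. It is a sketch under the thread's stated assumptions (9-digit ids starting with 999); `uniqueIds` and the `seen` set are hypothetical names.

```d
import std.exception : enforce;
import std.random : uniform;

// Hypothetical helper: n distinct random ids in [999_000_000, 1_000_000_000).
int[] uniqueIds(size_t n)
{
    // only 1_000_000 distinct ids exist in that range
    enforce(n <= 1_000_000, "not enough distinct ids in range");

    bool[int] seen;   // AA used as a set
    int[] ids;
    ids.reserve(n);

    while (ids.length < n)
    {
        immutable x = uniform(999 * 10^^6, 10^^9);
        if (x !in seen)   // O(1) average lookup instead of an O(N) scan
        {
            seen[x] = true;
            ids ~= x;
        }
    }

    return ids;
}
```

As n approaches the size of the id range, collisions become frequent and the loop slows down, so for nearly-full ranges a sequential or shuffled range of ids is probably the better fit.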


T

-- 
VI = Visual Irritation

Jan 21 2022

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/20/22 6:01 PM, forkit wrote:
 On Thursday, 20 January 2022 at 22:31:17 UTC, Steven Schveighoffer wrote:
 Because it would allow altering const data.

 
 I'm not sure I understand. At what point in this function is valuesArray 
 modified, and thus preventing it being passed in with const?

The compiler rules aren't enforced based on what code you wrote, it 
doesn't have the capability of proving that your code doesn't modify things.

Instead, it enforces simple rules that prove that const data 
cannot be modified.

I'll make it into a simpler example:

```d
const int[] arr = [1, 2, 3, 4, 5];
int[] arr2 = arr;
```

This code does not modify any data in arr. But that in itself isn't easy 
to prove. In order to ensure that arr is never modified, the compiler 
would have to analyze all the code, and every possible way that arr2 
might escape or be used somewhere at some point to modify the data. It 
doesn't have the capability or time to do that (if I understand 
correctly, this is NP-hard).

Instead, it just says, you can't convert references from const to 
mutable without a cast. That guarantees that you can't modify const 
data. However, it does rule out a certain class of code that might not 
modify the const data, even if it has the opportunity to.

It's like saying, "we don't let babies play with sharp knives" vs. "we 
will let babies play with sharp knives but stop them just before they 
stab themselves."
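
The rule can be checked at compile time, and the usual way out is an explicit copy. A short sketch (`mutableCopy` is a hypothetical helper name):

```d
// Mutable -> const converts implicitly; const -> mutable does not.
static assert( is(int[] : const(int)[]));
static assert(!is(const(int)[] : int[]));

// Hypothetical helper: the safe way to get something you may modify.
int[] mutableCopy(const int[] arr)
{
    return arr.dup;   // copies the elements, so the original stays untouched
}
```

Modifying the result of `mutableCopy` leaves the const original as it was, which is exactly the guarantee the conversion rule exists to protect.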

-Steve

Jan 20 2022
