
digitalmars.D.learn - automate tuple creation

reply forkit <forkit gmail.com> writes:
so I have this code below, that creates an array of tuples.

but instead of hardcoding 5 tuples (or hardcoding any amount of 
tuples),
what I really want to do is automate the creation of 
how-ever-many tuples I ask for:

i.e.

instead of calling this: createBoolMatrix(mArrBool);
I would call something like this: createBoolMatrix(mArrBool,5); 
// create an array of 5 tuples.

Some ideas about direction would be welcome ;-)


// ---
module test;

import std.stdio;
import std.range;
import std.traits;
import std.random;

@safe:

void main()
{
     uint[][] mArrBool;
     createBoolMatrix(mArrBool);
     process(mArrBool);
}

void process(T)(const ref T t) if (isForwardRange!T && 
!isInfinite!T)
{
     t.writeln; // sample output -> [[0, 1], [1, 0], [1, 1], [1, 1], [1, 1]]
}

void createBoolMatrix(ref uint[][] m)
{
     auto rnd = Random(unpredictableSeed);

     // btw. below does register with -profile=gc
     m = [ [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd),
           [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd),
           [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd),
           [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd),
           [cast(uint)rnd.dice(0.6, 1.4), cast(uint)rnd.dice(0.4, 
1.6)].randomShuffle(rnd)
         ];
}
// --
Jan 19 2022
next sibling parent forkit <forkit gmail.com> writes:
On Wednesday, 19 January 2022 at 21:59:15 UTC, forkit wrote:

oh. that randomShuffle was unnecessary ;-)
Jan 19 2022
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 19, 2022 at 09:59:15PM +0000, forkit via Digitalmars-d-learn wrote:
 so I have this code below, that creates an array of tuples.
 
 but instead of hardcoding 5 tuples (or hardcoding any amount of
 tuples), what I really want to do is automate the creation of
 how-ever-many tuples I ask for:
 
 i.e.
 
 instead of calling this: createBoolMatrix(mArrBool);
 I would call something like this: createBoolMatrix(mArrBool,5); //
 create an array of 5 tuples.
Why can't you just use a loop to initialize it?

    uint[][] createBoolMatrix(size_t n) {
        auto result = new uint[][n]; // allocate outer array
        foreach (ref row; result) {
            row = new uint[n]; // allocate inner array
            foreach (ref cell; row) {
                cell = cast(uint) rnd.dice(0.6, 1.4);
            }
        }
        return result;
    }

Or, if you wanna use those new-fangled range-based idioms:

    uint[][] createBoolMatrix(size_t n) {
        return iota(n)
            .map!(i => iota(n)
                .map!(j => cast(uint) rnd.dice(0.6, 1.4))
                .array)
            .array;
    }


T

-- 
Verbing weirds language. -- Calvin (& Hobbes)
Jan 19 2022
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 1/19/22 13:59, forkit wrote:

 void createBoolMatrix(ref uint[][] m)
 {
      auto rnd = Random(unpredictableSeed);
That works but would be unnecessarily slow and be against the idea of random number generators. The usual approach is, once you have a randomized sequence, you just continue using it. For example, I move rnd to module scope and initialize it once.

Random rnd;

shared static this() {
   rnd = Random(unpredictableSeed);
}

auto randomValue() {
   return cast(uint)rnd.dice(0.6, 1.4);
}

// Returning a dynamically allocated array looks expensive
// here. Why not use a struct or std.typecons.Tuple instead?
auto randomTuple() {
   return [ randomValue(), randomValue() ];
}

void createBoolMatrix(ref uint[][] m, size_t count) {
   import std.algorithm : map;
   import std.range : iota;
   m = count.iota.map!(i => randomTuple()).array;
}

Ali
Jan 19 2022
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 1/19/22 14:33, Ali Çehreli wrote:

 Random rnd;

 shared static this() {
    rnd = Random(unpredictableSeed);
 }
But that's a mistake: If rnd is thread-local like that, it should be initialized in a 'static this' (not 'shared static this'). Otherwise, only the main thread's 'rnd' would be randomized, which is the only thread that executes 'shared static this' blocks.

Ali
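
A minimal sketch of that difference, assuming nothing beyond druntime. The counter names and `worker` are made up for illustration and are not from the thread. It counts how often each kind of module constructor runs when one extra thread is started:

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

// Hypothetical counters, shared across threads so both can be inspected from main.
shared int sharedCtorRuns;   // bumped by `shared static this`
shared int threadCtorRuns;   // bumped by `static this`

// Runs once per process, on the main thread.
shared static this() { atomicOp!"+="(sharedCtorRuns, 1); }

// Runs once per thread, including threads started later.
static this() { atomicOp!"+="(threadCtorRuns, 1); }

void worker() {}

void main()
{
    auto t = new Thread(&worker);
    t.start();
    t.join();

    assert(atomicLoad(sharedCtorRuns) == 1); // main thread only
    assert(atomicLoad(threadCtorRuns) == 2); // main thread + worker
}
```

Because a module-level `Random rnd;` is thread-local by default, only the per-thread `static this` gets a chance to seed every thread's copy.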
Jan 19 2022
parent reply forkit <forkit gmail.com> writes:
On Wednesday, 19 January 2022 at 22:35:58 UTC, Ali Çehreli wrote:

so I combined ideas from all responses:

// --
module test;

import std.stdio : writeln;
import std.range : iota, isForwardRange, hasSlicing, hasLength, 
isInfinite, array;
import std.random : Random, unpredictableSeed, dice;
import std.algorithm : map;

@safe:

Random rnd;

static this()
{
   rnd = Random(unpredictableSeed);
}

void main()
{
     uint[][] mArrBool;

     // e.g: create a matrix consisting of 5 tuples, with each tuple containing 3 random bools (0 or 1)
     createBoolMatrix(mArrBool,5, 2);

     process(mArrBool);
}

void createBoolMatrix(ref uint[][] m, size_t numberOfTuples, 
size_t numberOfBoolsInTuple)
{
     m = iota(numberOfTuples)
             .map!(i => iota(numberOfBoolsInTuple)
             .map!(numberOfBoolsInTuple => cast(uint) 
rnd.dice(0.6, 1.4))
			.array).array;
}

void process(T)(const ref T t) if (isForwardRange!T && 
hasSlicing!T && hasLength!T && !isInfinite!T)
{
     t.writeln;
}

//--
Jan 19 2022
parent forkit <forkit gmail.com> writes:
On Wednesday, 19 January 2022 at 23:22:17 UTC, forkit wrote:

oops

     // e.g: create a matrix consisting of 5 tuples, with each tuple containing 3 random bools (0 or 1)
     createBoolMatrix(mArrBool,5, 3);
Jan 19 2022
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 19, 2022 at 02:33:02PM -0800, Ali Çehreli via Digitalmars-d-learn
wrote:
[...]
 // Returning a dynamically allocated array looks expensive
 // here. Why not use a struct or std.typecons.Tuple instead?
Premature optimization. ;-)  There's nothing wrong with allocating an array.

If you're worried about memory efficiency, you could allocate the entire matrix in a single block and just assemble slices of it in the outer block, like this:

    uint[][] createBoolMatrix(size_t count) {
        auto buffer = new uint[count*count];
        return iota(count).map!(i => buffer[count*i .. count*(i+1)])
            .array;
    }

This lets you do only 2 GC allocations instead of (1+count) GC allocations. May help with memory fragmentation if `count` is large and you create a lot of these things. But I honestly wouldn't bother with this unless your memory profiler is reporting a problem in this aspect of your program. It just adds complexity to your code (== poorer long-term maintainability) for meager benefits.


T

-- 
What do you call optometrist jokes? Vitreous humor.
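
For anyone who wants to try the single-buffer idea, here is a small runnable sketch. It is a variation, not the code above: `createMatrix` is a made-up name, it takes separate row and column counts (an assumption, to match the tuples-of-k-bools use case), and it fills the outer array with a plain loop instead of `map`.

```d
// Sketch: one allocation for all cells, one for the outer array of slices.
uint[][] createMatrix(size_t rows, size_t cols)
{
    auto buffer = new uint[rows * cols];   // all cells, zero-initialized
    auto result = new uint[][](rows);      // outer array of row slices

    foreach (i, ref row; result)
        row = buffer[cols * i .. cols * (i + 1)]; // slice, no copy

    return result;
}

void main()
{
    auto m = createMatrix(3, 2);
    assert(m.length == 3 && m[0].length == 2);

    // Rows sit back to back in the same block of memory.
    assert(m[0].ptr + 2 == m[1].ptr);

    m[1][0] = 7;
    assert(m == [[0u, 0u], [7u, 0u], [0u, 0u]]);
}
```

One thing to keep in mind: appending to a row (`m[0] ~= 1`) would reallocate that row away from the shared buffer, so this layout suits fixed-size rows best.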
Jan 19 2022
parent Ali Çehreli <acehreli yahoo.com> writes:
On 1/19/22 15:21, H. S. Teoh wrote:
 On Wed, Jan 19, 2022 at 02:33:02PM -0800, Ali Çehreli via 
Digitalmars-d-learn wrote:
 [...]
 // Returning a dynamically allocated array looks expensive
 // here. Why not use a struct or std.typecons.Tuple instead?
Premature optimization. ;-)
Not in this case because I am pointing at premature pessimization. :) There is no reason to use two-element dynamic arrays when uint[2], Tuple!(uint, uint), and structs are available.
 There's nothing wrong with allocating an
 array.
Agreed. Ali
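
A rough sketch of the fixed-size alternative mentioned above, assuming pairs of values are all that is needed. `createPairs` is a hypothetical name, and the fixed seed is only there to make the demo repeatable. With `uint[2]` elements, the pairs live inline in the outer array, so there is no per-pair allocation:

```d
import std.random : Random, dice;

// Each element is a static array uint[2], stored inline in the outer array.
uint[2][] createPairs(size_t count, ref Random rnd)
{
    auto pairs = new uint[2][](count);     // single allocation
    foreach (ref pair; pairs)
        foreach (ref cell; pair)
            cell = cast(uint) dice(rnd, 0.6, 1.4); // yields 0 or 1
    return pairs;
}

void main()
{
    auto rnd = Random(42);                 // fixed seed, for the demo only
    auto pairs = createPairs(5, rnd);

    static assert(is(typeof(pairs[0]) == uint[2]));
    assert(pairs.length == 5);
    foreach (pair; pairs)
        foreach (cell; pair)
            assert(cell <= 1);
}
```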
Jan 19 2022
prev sibling parent reply forkit <forkit gmail.com> writes:
On Wednesday, 19 January 2022 at 21:59:15 UTC, forkit wrote:

so at the moment i can get a set number of tuples, with a set 
number of bool values contained within each tuple.

e.g.
createBoolMatrix(mArrBool,3, 2);
[[1, 0], [1, 1], [1, 0]]

my next challenge (more for myself, but happy for input)..

is to enhance this to return an associative array:

e.g

createBoolAssociativeMatrix(mArrBool,3, 2);

[ [1000:[1, 0]], [1001:[1, 1]], [1001:[1, 0]]]


where 1000 is some random id...
Jan 19 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 20, 2022 at 12:12:56AM +0000, forkit via Digitalmars-d-learn wrote:
[...]
 createBoolAssociativeMatrix(mArrBool,3, 2);
 
 [ [1000:[1, 0]], [1001:[1, 1]], [1001:[1, 0]]]
 
 
 where 1000 is some random id...
Do the id's have to be unique? If not, std.random.uniform() would do the job.

If they have to be unique, you can either use a sequential global counter (a 64-bit counter will suffice -- you won't exhaust it for at least 60+ years of bumping the counter once per CPU tick at 8.4 GHz), or use an AA of ids already generated and just call uniform() to generate a new one until it doesn't collide anymore.


T

-- 
A mathematician learns more and more about less and less, until he knows everything about nothing; whereas a philosopher learns less and less about more and more, until he knows nothing about everything.
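
The second option (an AA of ids already generated) might look roughly like this. It is only a sketch: `uniqueIds` is a made-up name, and the 999_000_000 .. 1_000_000_000 range is borrowed from later posts in this thread.

```d
import std.algorithm : all, sort, uniq;
import std.array : array;
import std.random : Random, uniform, unpredictableSeed;

// Keep drawing until `count` distinct ids have been collected.
// Assumes count is well below the 1_000_000 ids available in the range.
int[] uniqueIds(size_t count, ref Random rnd)
{
    bool[int] seen;            // AA used as a set of ids handed out so far
    int[] ids;
    ids.reserve(count);

    while (ids.length < count)
    {
        immutable id = uniform(999_000_000, 1_000_000_000, rnd);
        if (id !in seen)       // average O(1) lookup instead of a linear scan
        {
            seen[id] = true;
            ids ~= id;
        }
    }
    return ids;
}

void main()
{
    auto rnd = Random(unpredictableSeed);
    auto ids = uniqueIds(10, rnd);

    assert(ids.length == 10);
    assert(ids.all!(id => id >= 999_000_000 && id < 1_000_000_000));
    assert(ids.dup.sort().uniq.array.length == 10); // no duplicates
}
```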
Jan 19 2022
parent reply forkit <forkit gmail.com> writes:
On Thursday, 20 January 2022 at 00:30:44 UTC, H. S. Teoh wrote:
 Do the id's have to be unique?
yep...

I'm almost there ;-)

// ---
module test;

import std.stdio : writeln;
import std.range : iota, isForwardRange, hasSlicing, hasLength, isInfinite;
import std.array : array, Appender;
import std.random : Random, unpredictableSeed, dice, choice;
import std.algorithm : map, uniq;

@safe:

Random rnd;

static this()
{
   rnd = Random(unpredictableSeed);
}

void main()
{
     int recordsNeeded = 5;

     uint[] uniqueIDs;
     makeUniqueIDs(uniqueIDs, recordsNeeded);
     writeln(uniqueIDs);

     uint[][] mArrBool;

     // e.g: create a matrix consisting of 5 tuples,
     // with each tuple containing 3 random bools (0 or 1)
     createBoolMatrix(mArrBool,recordsNeeded, 3);

     // process just writeln's its argument at the moment
     process(mArrBool); // [[1, 1, 1], [0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 1, 0]]

     // to do (integrate a single value taken from uniqueIDs so that each tuple looks like this: [999575454:[1, 1, 1]]
     // e.g.
     // processRecords(records);
     // output from above should look like this below:
     // [ [999575454:[1, 1, 1]], [999704246:[0, 0, 1]], [999969331:[1, 1, 1]], [999678591:[1, 1, 1]], [999691754:[1, 1, 0]] ]
}

void createBoolMatrix(ref uint[][] m, size_t numberOfTuples, size_t numberOfBoolsInTuple)
{
     m = iota(numberOfTuples)
             .map!(i => iota(numberOfBoolsInTuple)
             .map!(numberOfBoolsInTuple => cast(uint) rnd.dice(0.6, 1.4))
             .array).array;
}

void process(T)(const ref T t) if (isForwardRange!T && hasSlicing!T && hasLength!T && !isInfinite!T)
{
     t.writeln;
}

void processRecords(T)(const ref T t) if (isForwardRange!T && hasSlicing!T && hasLength!T && !isInfinite!T)
{
     t.writeln;
}

void makeUniqueIDs(ref uint[] arr, size_t sz)
{
     // id needs to be 9 digits, and needs to start with 999
     int[] a = iota(999_000_000, 1_000_000_000).array;
     // can produce a max of 1_000_000 records.

     Appender!(uint[]) appndr;
     // pre-allocate space to avoid costly reallocations
     appndr.reserve(sz+1);

     foreach(value; 1..(sz + 1))
         appndr ~= cast(uint)a.choice(rnd);

     // just interesting to see how often this asserts.
     //assert(appndr[].array == appndr[].uniq.array);

     arr = appndr[].uniq.array;

     // function should not return if this asserts (i.e. app will exit)
     assert(arr[].array == arr[].uniq.array);
}
// ---
Jan 19 2022
next sibling parent reply forkit <forkit gmail.com> writes:
On Thursday, 20 January 2022 at 04:00:59 UTC, forkit wrote:
 void makeUniqueIDs(ref uint[] arr, size_t sz)
 {
   ...
 }
arrg! what was i thinking! ;-)

// ---
void makeUniqueIDs(ref uint[] arr, size_t sz)
{
     arr.reserve(sz);

     // id needs to be 9 digits, and needs to start with 999
     int[] a = iota(999_000_000, 1_000_000_000).array;
     // above will contain 1_000_000 records that we can choose from.

     int i = 0;
     uint x;
     while(i != sz)
     {
        x = cast(uint)a.choice(rnd);

        // ensure every id added is unique.
        if (!arr.canFind(x))
        {
            arr ~= x;
            i++;
        }
     }
}
//------
Jan 19 2022
parent forkit <forkit gmail.com> writes:
On Thursday, 20 January 2022 at 04:38:39 UTC, forkit wrote:

all done ;-)

// ---

module test;

import std.stdio : writeln;
import std.range : iota, isForwardRange, hasSlicing, hasLength, 
isInfinite;
import std.array : array, Appender;
import std.random : Random, unpredictableSeed, dice, choice;
import std.algorithm : map, uniq, canFind;

@safe:

Random rnd;

static this()
{
   rnd = Random(unpredictableSeed);
}

void main()
{
     int recordsNeeded = 2;
     int boolValuesNeeded = 3;

     uint[] uniqueIDs;
     makeUniqueIDs(uniqueIDs, recordsNeeded);

     uint[][] tuples;
     createBoolMatrix(tuples, recordsNeeded, boolValuesNeeded);

     uint[][uint][] records = CreateTupleDictionary(uniqueIDs, 
tuples);
     processRecords(records);

}

auto CreateTupleDictionary(ref uint[] ids, ref uint[][] tuples)
{
     uint[][uint][] records;

     foreach(i, id; ids)
         records ~= [ ids[i] : tuples[i] ];

     return records.dup;
}

void processRecords(T)(const ref T t) if (isForwardRange!T && 
hasSlicing!T && hasLength!T && !isInfinite!T)
{
     t.writeln;

     // output from above should look like this:
     // [[999583661:[1, 1, 0]], [999273256:[1, 1, 1]]]

     // hoping to explore parallel here too...
}

void createBoolMatrix(ref uint[][] m, size_t numberOfTuples, 
size_t numberOfBoolsInTuple)
{
     m = iota(numberOfTuples)
             .map!(i => iota(numberOfBoolsInTuple)
             .map!(numberOfBoolsInTuple => cast(uint) 
rnd.dice(0.6, 1.4))
             .array).array;
}


void makeUniqueIDs(ref uint[] arr, size_t sz)
{
     arr.reserve(sz);

     // id needs to be 9 digits, and needs to start with 999
     int[] a = iota(999_000_000, 1_000_000_000).array;
     // above will contain 1_000_000 records that we can choose from.

     int i = 0;
     uint x;
     while(i != sz)
     {
        x = cast(uint)a.choice(rnd);

        // ensure every id added is unique.
        if (!arr.canFind(x))
        {
            arr ~= x;
            i++;
        }
     }
}

// ---
Jan 19 2022
prev sibling parent reply bauss <jj_1337 live.dk> writes:
On Thursday, 20 January 2022 at 04:00:59 UTC, forkit wrote:
 On Thursday, 20 January 2022 at 00:30:44 UTC, H. S. Teoh wrote:
 Do the id's have to be unique?
yep...
Don't make them random then, but use an incrementor.

If you can have ids that aren't integers then you could use uuids too.

https://dlang.org/phobos/std_uuid.html
Jan 20 2022
parent reply forkit <forkit gmail.com> writes:
On Thursday, 20 January 2022 at 10:11:10 UTC, bauss wrote:

 Don't make them random then, but use an incrementor.

 If you can have ids that aren't integers then you could use 
 uuids too.

 https://dlang.org/phobos/std_uuid.html
The 'uniqueness' of id would actually be created in the database.

I'm just creating a dataset to simulate an export.

I'm pretty much done, just wish -profile=gc was working in createUniqueIDArray(..)

// ---------------
module test;
@safe:

import std.stdio : write, writef, writeln, writefln;
import std.range : iota, isForwardRange, hasSlicing, hasLength, isInfinite;
import std.array : array, byPair;
import std.random : Random, unpredictableSeed, dice, choice;
import std.algorithm : map, uniq, canFind;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); }

void main()
{
     const int recordsNeeded = 10;
     const int valuesPerRecord = 8;

     int[] idArray;
     createUniqueIDArray(idArray, recordsNeeded);

     int[][] valuesArray;
     createValuesArray(valuesArray, recordsNeeded, valuesPerRecord);

     int[][int][] records = CreateDataSet(idArray, valuesArray, recordsNeeded);

     ProcessRecords(records);
}

void ProcessRecords(ref const(int[][int][]) recArray)
{
     void processRecord(ref int id, ref const(int)[] result)
     {
         writef("%s\t%s", id, result);
     }

     foreach(ref record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
         writeln;
     }
}

int[][int][] CreateDataSet(ref int[] idArray, ref int[][] valuesArray, int numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);
     debug { writefln("records.capacity is %s", records.capacity); }

     foreach(i, id; idArray)
         records ~= [ idArray[i] : valuesArray[i] ]; // NOTE: does register with -profile=gc

     return records.dup;
}

void createValuesArray(ref int[][] m, size_t recordsNeeded, size_t valuesPerRecord)
{
     m = iota(recordsNeeded)
             .map!(i => iota(valuesPerRecord)
             .map!(valuesPerRecord => cast(int)rnd.dice(0.6, 1.4))
             .array).array;  // NOTE: does register with -profile=gc
}

void createUniqueIDArray(ref int[] idArray, int recordsNeeded)
{
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", idArray.capacity); }

     // id needs to be 9 digits, and needs to start with 999
     // below will contain 1_000_000 records that we can choose from.
     int[] ids = iota(999_000_000, 1_000_000_000).array; // NOTE: does NOT register with -profile=gc

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        x = ids.choice(rnd);

        // ensure every id added is unique.
        if (!idArray.canFind(x))
        {
            idArray ~= x; // NOTE: does NOT register with -profile=gc
            i++;
        }
     }
}

/+
sample output:

999623777       [0, 0, 1, 1, 1, 0, 0, 0]
999017078       [1, 0, 1, 1, 1, 1, 1, 1]
999269073       [1, 1, 0, 0, 1, 1, 0, 1]
999408504       [0, 1, 1, 1, 1, 1, 0, 0]
999752314       [1, 0, 0, 1, 1, 1, 1, 0]
999660730       [0, 1, 0, 0, 1, 1, 1, 1]
999709822       [1, 1, 1, 0, 1, 1, 0, 0]
999642248       [1, 1, 1, 0, 0, 1, 1, 0]
999533069       [1, 1, 1, 0, 0, 0, 0, 0]
999661591       [1, 1, 1, 1, 1, 0, 1, 1]
+/

// ---------------
Jan 20 2022
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Thursday, 20 January 2022 at 12:15:56 UTC, forkit wrote:

 void createUniqueIDArray(ref int[] idArray, int recordsNeeded)
 {
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", 
 idArray.capacity); }

     // id needs to be 9 digits, and needs to start with 999
     // below will contain 1_000_000 records that we can choose 
 from.
     int[] ids = iota(999_000_000, 1_000_000_000).array; // 
 NOTE: does NOT register with -profile=gc

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        x = ids.choice(rnd);

        // ensure every id added is unique.
        if (!idArray.canFind(x))
        {
            idArray ~= x; // NOTE: does NOT register with 
 -profile=gc
            i++;
        }
     }
 }
Allocating 4 megs to generate 10 numbers??? You can generate a random number between 999000000 and 1000000000.

```
immutable(int)[] createUniqueIDArray(int recordsNeeded)
{
    import std.random;
    import std.algorithm.searching : canFind;
    int[] result = new int[recordsNeeded];

    int i = 0;
    int x;
    while(i != recordsNeeded)
    {
        // id needs to be 9 digits, and needs to start with 999
        x = uniform(999*10^^6, 10^^9);

        // ensure every id added is unique.
        if (!result[0 .. i].canFind(x))
            result[i++] = x;
    }
    import std.exception : assumeUnique;
    return result.assumeUnique;
}

void main()
{
    import std.stdio;
    createUniqueIDArray(10).writeln;
}
```

Only one allocation, and it would be tracked with -profile=gc...
Jan 20 2022
parent reply forkit <forkit gmail.com> writes:
On Thursday, 20 January 2022 at 12:40:09 UTC, Stanislav Blinov 
wrote:
 Allocating 4 megs to generate 10 numbers??? You can generate a 
 random number between 999000000 and 1000000000.

 ...
         // id needs to be 9 digits, and needs to start with 999
        x = uniform(999*10^^6, 10^^9);

        // ensure every id added is unique.
        if (!result[0 .. i].canFind(x))
            result[i++] = x;
     }
     import std.exception : assumeUnique;
     return result.assumeUnique;
 ...
Nice. Thanks.

I had to compromise a little though, as assumeUnique is @system, and all my code is @safe (and trying to avoid the need for inline system wrapper ;-)

//---

void createUniqueIDArray(ref int[] idArray, int recordsNeeded)
{
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", idArray.capacity); }

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        // generate a random 9 digit id that starts with 999
        x = uniform(999*10^^6, 10^^9); // thanks Stanislav!

        // ensure every id added is unique.
        if (!idArray.canFind(x))
        {
            idArray ~= x; // NOTE: does NOT register with -profile=gc
            i++;
        }
     }
}

//---
Jan 20 2022
parent reply forkit <forkit gmail.com> writes:
On Thursday, 20 January 2022 at 21:16:46 UTC, forkit wrote:

Cannot work out why I cannot pass valuesArray in as ref const??

get error: Error: cannot append type `const(int[])[const(int)]` 
to type `int[][int][]`


// --

int[][int][] CreateDataSet(ref const int[] idArray, ref 
const(int[][]) valuesArray, const int numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);

     foreach(i, id; idArray)
         records ~= [ idArray[i] : valuesArray[i] ];

     return records.dup;
}

// ---
Jan 20 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/20/22 5:07 PM, forkit wrote:
 On Thursday, 20 January 2022 at 21:16:46 UTC, forkit wrote:

 
 Cannot work out why I cannot pass valuesArray in as ref const??
 
 get error: Error: cannot append type `const(int[])[const(int)]` to type 
 `int[][int][]`
Because it would allow altering const data. e.g.:

```d
const(int[])[const(int)] v = [1: [1, 2, 3]];
int[][int][] arr = [v]; // assume this works
arr[0][1][0] = 5; // oops, just set v[1][0]
```

General rule of thumb is that you can convert the HEAD of a structure to mutable from const, but not the TAIL (the stuff it points at).

An associative array is a pointer-to-implementation construct, so it's a reference.

-Steve
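
To make the head/tail rule concrete, here is a small sketch. The `createDataSet` below is a hypothetical variant of the function from the question, not code from the thread: it accepts const input by copying each values array with `.dup`, so the returned records own mutable data of their own.

```d
// HEAD may become mutable, TAIL may not.
static assert( is(const(int[]) : const(int)[])); // OK: only the slice itself becomes mutable
static assert(!is(const(int[]) : int[]));        // rejected: elements would become mutable

// Hypothetical variant: takes const input and copies the values.
int[][int][] createDataSet(const int[] ids, const int[][] values)
{
    int[][int][] records;
    records.reserve(ids.length);

    foreach (i, id; ids)
    {
        int[][int] rec;
        rec[id] = values[i].dup;   // .dup yields a fresh, mutable int[]
        records ~= rec;
    }
    return records;
}

void main()
{
    const int[] ids = [1, 2];
    const int[][] values = [[1, 0], [0, 1]];

    auto records = createDataSet(ids, values);
    records[0][1][0] = 9;          // mutates the copy only

    assert(values[0][0] == 1);     // the caller's const data is untouched
    assert(records.length == 2);
}
```

The price is one extra allocation per record; whether that matters depends on the dataset size.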
Jan 20 2022
parent reply forkit <forkit gmail.com> writes:
On Thursday, 20 January 2022 at 22:31:17 UTC, Steven 
Schveighoffer wrote:
 Because it would allow altering const data.
I'm not sure I understand. At what point in this function is valuesArray modified, and thus preventing it being passed in with const?

// ---

int[][int][] CreateDataSet
(ref const int[] idArray, ref int[][] valuesArray, const int numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);

     foreach(i, const id; idArray)
         records ~= [ idArray[i] : valuesArray[i] ];

     return records.dup;
}

// ----
Jan 20 2022
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 1/20/22 15:01, forkit wrote:
 On Thursday, 20 January 2022 at 22:31:17 UTC, Steven Schveighoffer wrote:
 Because it would allow altering const data.
 I'm not sure I understand. At what point in this function is
 valuesArray modified, and thus preventing it being passed in with
 const?

 // ---

 int[][int][] CreateDataSet
 (ref const int[] idArray, ref int[][] valuesArray, const int numRecords)
 {
      int[][int][] records;
Elements of records are mutable.
      records.reserve(numRecords);

      foreach(i, const id; idArray)
          records ~= [ idArray[i] : valuesArray[i] ];
If that were allowed, you could mutate elements of record and would break the promise to your caller.

Aside: There is no reason to pass arrays and associative arrays as 'ref const' in D as they are already reference types. Unlike C++, there is no copying of the elements. When you pass by value, just a couple of fundamental types are copied.

Furthermore and in theory, there may be a performance penalty when an array is passed by reference because elements would be accessed by dereferencing twice: Once for the parameter reference and once for the .ptr property of the array. (This is in theory.)

void foo(ref const int[]) {}  // Unnecessary
void foo(const int[]) {}      // Idiomatic
void foo(in int[]) {}         // Intentful :)

Passing arrays by reference makes sense when the function will mutate the argument.

Ali
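
A short sketch of what "already reference types" means in practice; the three function names are made up for the demonstration. Element writes through a by-value slice are visible to the caller, while changes to the slice itself (such as its length) are visible only with `ref`:

```d
void setFirst(int[] a)        { a[0] = 99; } // elements are shared with the caller
void appendByValue(int[] a)   { a ~= 4; }    // only the local slice grows
void appendByRef(ref int[] a) { a ~= 4; }    // the caller's slice grows

void main()
{
    int[] arr = [1, 2, 3];

    setFirst(arr);
    assert(arr == [99, 2, 3]);     // element change is visible

    appendByValue(arr);
    assert(arr.length == 3);       // caller's length is unchanged

    appendByRef(arr);
    assert(arr == [99, 2, 3, 4]);  // ref lets the callee change the slice itself
}
```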
Jan 20 2022
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 1/20/22 15:10, Ali Çehreli wrote:

 void foo(const int[]) {}      // Idiomatic
As H. S. Teoh would add at this point, that is not idiomatic but the following are (with different meanings):

void foo(const(int)[]) {}     // Idiomatic
void foo(const(int[])) {}     // Idiomatic
 void foo(in int[]) {}         // Intentful :)
I still like that one. :) Ali
Jan 20 2022
parent reply forkit <forkit gmail.com> writes:
On Thursday, 20 January 2022 at 23:49:59 UTC, Ali Çehreli wrote:

so here is final code, in idiomatic D, as far as I can tell ;-)

curious output when using -profile=gc

.. a line referring to: 
std.array.Appender!(immutable(char)[]).Appender.Data 
std.array.Appender!string.Appender.this 
C:\D\dmd2\windows\bin\..\..\src\phobos\std\array.d:3330

That's not really helpful, as I'm not sure what line of my code it's 
referring to.

// ---------------

/+
   
=====================================================================
    This program creates a sample dataset consisting of 'random' 
records,
    and then outputs that dataset to a file.

    Arguments can be passed on the command line,
    or otherwise default values are used instead.

    Example of that output can be seen at the end of this code.
    
=====================================================================
+/

module test;
@safe

import std.stdio : write, writef, writeln, writefln;
import std.range : iota;
import std.array : array, byPair;
import std.random : Random, unpredictableSeed, dice, choice, 
uniform;
import std.algorithm : map, uniq, canFind;
import std.conv : to;
import std.stdio : File;
import std.format;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); } // thanks Ali

void main(string[] args)
{
     int recordsNeeded, valuesPerRecord;
     string fname;

     if(args.length < 4)
     {
         recordsNeeded = 10;
         valuesPerRecord= 8;
         fname = "D:/rnd_records.txt";
     }
     else
     {
         // assumes valid values being passed in ;-)
         recordsNeeded = to!int(args[1]);
         valuesPerRecord = to!int(args[2]);
         fname = args[3];
     }

     int[] idArray;
     createUniqueIDArray(idArray, recordsNeeded);

     int[][] valuesArray;
     createValuesArray(valuesArray, recordsNeeded, 
valuesPerRecord);

     int[][int][] records = CreateDataSet(idArray, valuesArray, 
recordsNeeded);
     ProcessRecords(records, fname);

     writefln("All done. Check if records written to %s", fname);
}

void createUniqueIDArray
(ref int[] idArray, const(int) recordsNeeded)
{
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", idArray.capacity); 
}

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        // id needs to be 9 digits, and needs to start with 999
        x = uniform(999*10^^6, 10^^9); // thanks Stanislav

        // ensure every id added is unique.
        if (!idArray.canFind(x))
        {
            idArray ~= x; // NOTE: does NOT appear to register with -profile=gc
            i++;
        }
     }
}

void createValuesArray
(ref int[][] valuesArray, const(int) recordsNeeded, const(int) 
valuesPerRecord)
{
     valuesArray = iota(recordsNeeded)
             .map!(i => iota(valuesPerRecord)
             .map!(valuesPerRecord => cast(int)rnd.dice(0.6, 1.4))
             .array).array;  // NOTE: does register with -profile=gc
}

int[][int][] CreateDataSet
(const(int)[] idArray, int[][] valuesArray, const(int) numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);
     debug { writefln("records.capacity is %s", records.capacity); 
}

     foreach(i, const id; idArray)
     {
         // NOTE: below does register with -profile=gc
         records ~= [ idArray[i] : valuesArray[i] ];
     }
     return records.dup;
}

void ProcessRecords
(in int[][int][] recArray, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     string[] formattedRecords;
     formattedRecords.reserve(recArray.length);
     debug { writefln("formattedRecords.capacity is %s", 
formattedRecords.capacity); }

     void processRecord(const(int) id, const(int)[] values)
     {
         // NOTE: below does register with -profile=gc
         formattedRecords ~= id.to!string ~ 
values.format!"%(%s,%)";
     }

     foreach(ref const record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     foreach(ref rec; formattedRecords)
         file.writeln(rec);
}

/+
sample file output:

9992511730,1,0,1,0,1,0,1
9995369731,1,1,1,1,1,1,1
9993136031,1,0,0,0,1,0,0
9998979051,1,1,1,1,0,1,1
9998438090,1,1,0,1,1,0,0
9995132750,0,0,1,0,1,1,1
9997123630,0,1,1,1,0,1,1
9998351590,1,0,0,1,1,1,1
9991454121,1,1,1,1,1,0,1
9997673520,1,1,1,1,1,1,1

+/

// ---------------
Jan 20 2022
next sibling parent forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 01:35:40 UTC, forkit wrote:

oops. nasty mistake to make ;-)


module test;
@safe

should be:

module test;
@safe:
Jan 20 2022
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 1/20/22 17:35, forkit wrote:

 module test;
 @safe
Does that make just the following definition @safe or the entire module @safe? Trying... Yes, I am right: it applies only to the following definition. To make the module @safe, use the following syntax:

@safe:
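
A quick way to check this at compile time, as a sketch. The function names are placeholders; `isSafe` comes from std.traits.

```d
import std.traits : isSafe;

@safe
void first() {}    // the attribute applies to this one declaration
void second() {}   // back to the default, which is @system

@safe:             // with the colon: everything below, to the end of the module
void third() {}
void fourth() {}

static assert( isSafe!first);
static assert(!isSafe!second);
static assert( isSafe!third);
static assert( isSafe!fourth);

void main() {}
```

In the posted program the bare `@safe` sat in front of an import, so none of the functions were actually checked as @safe.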
      idArray.reserve(recordsNeeded);
[...]
             idArray ~= x; // NOTE: does NOT appear to register with
 -profile=gc
Because you've already reserved enough memory above. Good.
      int[][int][] records;
      records.reserve(numRecords);
That's good for the array part. However...
          // NOTE: below does register with -profile=gc
          records ~= [ idArray[i] : valuesArray[i] ];
The right hand side is a freshly generated associative array. For every element of 'records', there is a one-element AA created. AA will need to allocate memory for its element. So, GC allocation is expected there.
      string[] formattedRecords;
      formattedRecords.reserve(recArray.length);
[...]
          // NOTE: below does register with -profile=gc
          formattedRecords ~= id.to!string ~ values.format!"%(%s,%)";
Again, although 'formattedRecords' has reserved memory, the right hand side has dynamic memory allocations.

1) id.to!string allocates

2) format allocates memory for its 'string' result (I think the Appender report comes from format's internals.)

3) Operator ~ makes a new string from the previous two

(Somehow, I don't see three allocations though. Perhaps an NRVO is applied there. (?))

I like the following better, which reduces the allocations:

         formattedRecords ~= format!"%s%(%s,%)"(id.to!string, values);
      foreach(ref rec; formattedRecords)
          file.writeln(rec);
The bigger question is, why did 'formattedRecords' exist at all? You could have written the output directly to the file. But even *worse* and with apologies, ;) here is something crazy that achieves the same thing:

void ProcessRecords
(in int[][int][] recArray, const(string) fname)
{
     import std.algorithm : joiner;
     auto toWrite = recArray.map!(e => e.byPair);
     File("rnd_records.txt", "w").writefln!"%(%(%(%s,%(%s,%)%)%)\n%)"(toWrite);
}

I've done lots of trial and error for the required number of nested %( %) pairs. Phew...

Ali
Jan 20 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 02:30:35 UTC, Ali Çehreli wrote:
 The bigger question is, why did 'formattedRecords' exist at 
 all? You could have written the output directly to the file.
Oh. this was intentional, as I wanted to write once, and only once, to the file. The consequence of that decision of course, is the extra memory allocations... But in my example code I only create 10 records. In reality, my dataset will have 100,000's of records, so I don't want to write 100,000s of time to the same file.
 But even *worse* and with apologies, ;) here is something crazy 
 that achieves the same thing:

 void ProcessRecords
 (in int[][int][] recArray, const(string) fname)
 {
     import std.algorithm : joiner;
     auto toWrite = recArray.map!(e => e.byPair);
     File("rnd_records.txt", 
 "w").writefln!"%(%(%(%s,%(%s,%)%)%)\n%)"(toWrite);
 }

 I've done lot's of trial and error for the required number of 
 nested %( %) pairs. Phew...

 Ali
Yes, that does look worse ;-) But I'm looking into that code to see if I can salvage something from it ;-)
Jan 20 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 03:45:08 UTC, forkit wrote:
 On Friday, 21 January 2022 at 02:30:35 UTC, Ali Çehreli wrote:
 The bigger question is, why did 'formattedRecords' exist at 
 all? You could have written the output directly to the file.
Oh. this was intentional, as I wanted to write once, and only once, to the file.
oops. looking back at that code, it seems I didn't write what I intended :-(

I might have to use a kind of stringbuilder instead, then write a massive string once to the file.
Jan 20 2022
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 21, 2022 at 03:50:37AM +0000, forkit via Digitalmars-d-learn wrote:
[...]
 I might have to use a kindof stringbuilder instead, then write a
 massive string once to the file.
[...]

std.array.appender is your friend.


T

--
Meat: euphemism for dead animal. -- Flora
Jan 20 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 03:57:01 UTC, H. S. Teoh wrote:
 std.array.appender is your friend.

 T
:-)

// --

void ProcessRecords
(in int[][int][] recArray, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recArray.length);
     debug { writefln("bigString.capacity is %s", bigString.capacity); }

     void processRecord(const(int) id, const(int)[] values)
     {
         bigString ~= id.to!string ~ values.format!"%(%s,%)" ~ "\n";
     }

     foreach(ref const record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     file.write(bigString[]);
}

// ---
Jan 20 2022
parent forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 04:08:33 UTC, forkit wrote:
 // --

 void ProcessRecords
 (in int[][int][] recArray, const(string) fname)
 {
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recArray.length);
     debug { writefln("bigString.capacity is %s", 
 bigString.capacity); }

     void processRecord(const(int) id, const(int)[] values)
     {
         bigString ~= id.to!string ~ values.format!"%(%s,%)" ~ 
 "\n";
     }

     foreach(ref const record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     file.write(bigString[]);
 }

 // ---
actually something not right with Appender I think...

100_000 records took 20sec (ok)

1_000_000 records never finished - after 1hr/45min I cancelled the process.

??
Jan 20 2022
prev sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Friday, 21 January 2022 at 03:50:37 UTC, forkit wrote:

 I might have to use a kindof stringbuilder instead, then write 
 a massive string once to the file.
You're using writeln, which goes through C I/O buffered writes. Whether you make one call or several is of little consequence - you're limited by buffer size and options.
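To illustrate the point about buffered writes, here is a minimal sketch that formats each record straight into the file and lets the C I/O buffer do the batching. The helper name `writeRecords` and the demo file name are assumptions made for this example, not code from the thread:

```d
import std.stdio : File;

// Hypothetical helper: writes one "id,v1,v2,..." line per record.
void writeRecords(string fname, const int[][int][] records)
{
    auto file = File(fname, "w");
    foreach (record; records)
        foreach (id, values; record)
            file.writefln("%s,%(%s,%)", id, values);
}   // the File is closed when its last reference goes out of scope

void main()
{
    import std.file : readText, remove, tempDir;
    import std.path : buildPath;
    import std.string : splitLines;

    auto fname = buildPath(tempDir, "rnd_records_demo.txt");
    scope (exit) remove(fname);

    int[][int][] records;
    records ~= [100_000_001 : [1, 0, 1]];
    records ~= [100_000_002 : [0, 0, 1]];

    writeRecords(fname, records);
    assert(readText(fname).splitLines == ["100000001,1,0,1", "100000002,0,0,1"]);
}
```

No intermediate string is built here; whether this is faster than one big write would need measuring on the real dataset.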
Jan 21 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 08:53:26 UTC, Stanislav Blinov 
wrote:

turns out the problem has nothing to do with appender...

It's actually this line:

if (!idArray.canFind(x)):

when i comment this out in the function below, the program does 
what I want in seconds.

only problem is, the ids are no longer unique (in the file)

// ---
void createUniqueIDArray
(ref int[] idArray, const(int) recordsNeeded)
{
     idArray.reserve(recordsNeeded);
     debug { writefln("idArray.capacity is %s", idArray.capacity); 
}

     int i = 0;
     int x;
     while(i != recordsNeeded)
     {
        // id needs to be 9 digits, and needs to start with 999
        x = uniform(999*10^^6, 10^^9); // thanks Stanislav

        // ensure every id added is unique.
        //if (!idArray.canFind(x))
        //{
            idArray ~= x; // NOTE: does NOT appear to register 
with -profile=gc
            i++;
        //}
     }

     debug { writefln("idArray.length = %s", idArray.length); }
}

// ----
Jan 21 2022
next sibling parent reply forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 09:10:56 UTC, forkit wrote:

ok... in the interest of correcting the code I posted previously...

... here is a version that actually works in secs (for a million 
records), as opposed to hours!


// ---------------

/+
   
=====================================================================
    This program creates a sample dataset consisting of 'random' 
records,
    and then outputs that dataset to a file.

    Arguments can be passed on the command line,
    or otherwise default values are used instead.

    Example of that output can be seen at the end of this code.
    
=====================================================================
+/

module test;
@safe:
import std.stdio : write, writef, writeln, writefln;
import std.range : iota, takeExactly;
import std.array : array, byPair, Appender, appender;
import std.random : Random, unpredictableSeed, dice, choice, 
uniform;
import std.algorithm : map, uniq, canFind, among;
import std.conv : to;
import std.format;
import std.stdio : File;
import std.file : exists;
import std.exception : enforce;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); } // thanks Ali

void main(string[] args)
{
     int recordsNeeded, valuesPerRecord;
     string fname;

     if(args.length < 4)
     {
         //recordsNeeded = 1_000_000;
         //recordsNeeded = 100_000;
         recordsNeeded = 10;

         valuesPerRecord= 8;

         //fname = "D:/rnd_records.txt";
         fname = "./rnd_records.txt";
     }
     else
     {
         // assumes valid values being passed in ;-)
         recordsNeeded = to!int(args[1]);
         valuesPerRecord = to!int(args[2]);
         fname = args[3];
     }

     debug
         { writefln("%s records, %s values for record, will be 
written to file: %s", recordsNeeded, valuesPerRecord, fname); }
     else
         { enforce(!exists(fname), "Oop! That file already 
exists!"); }

     // id needs to be 9 digits, and needs to start with 999
     int[] idArray = takeExactly(iota(999*10^^6, 10^^9), 
recordsNeeded).array;
     debug { writefln("idArray.length = %s", idArray.length); }

     int[][] valuesArray;
     createValuesArray(valuesArray, recordsNeeded, 
valuesPerRecord);

     int[][int][] records = CreateDataSet(idArray, valuesArray, 
recordsNeeded);

     ProcessRecords(records, fname);

     writefln("All done. Check if records written to %s", fname);
}

void createValuesArray
(ref int[][] valuesArray, const(int) recordsNeeded, const(int) 
valuesPerRecord)
{
     valuesArray = iota(recordsNeeded)
             .map!(i => iota(valuesPerRecord)
             .map!(valuesPerRecord => cast(int)rnd.dice(0.6, 1.4))
             .array).array;  // NOTE: does register with 
-profile=gc

     debug { writefln("valuesArray.length = %s", 
valuesArray.length); }

}

int[][int][] CreateDataSet
(const(int)[] idArray, int[][] valuesArray, const(int) numRecords)
{
     int[][int][] records;
     records.reserve(numRecords);
     debug { writefln("records.capacity is %s", records.capacity); 
}

     foreach(i, const id; idArray)
     {
         // NOTE: below does register with -profile=gc
         records ~= [ idArray[i] : valuesArray[i] ];
     }

     debug { writefln("records.length = %s", records.length); }

     return records.dup;
}

void ProcessRecords
(in int[][int][] recArray, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recArray.length);
     debug { writefln("bigString.capacity is %s", 
bigString.capacity); }

     // NOTE: forward declaration required for this nested function
     void processRecord(const(int) id, const(int)[] values)
     {
         // NOTE: below does register with -profile=gc
         bigString ~= id.to!string ~ "," ~ values.format!"%(%s,%)" 
~ "\n";
     }

     foreach(ref const record; recArray)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     file.write(bigString[]);
}

/+
sample file output:

9992511730,1,0,1,0,1,0,1
9995369731,1,1,1,1,1,1,1
9993136031,1,0,0,0,1,0,0
9998979051,1,1,1,1,0,1,1
9998438090,1,1,0,1,1,0,0
9995132750,0,0,1,0,1,1,1
9997123630,0,1,1,1,0,1,1
9998351590,1,0,0,1,1,1,1
9991454121,1,1,1,1,1,0,1
9997673520,1,1,1,1,1,1,1

+/

// ---------------
Jan 21 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 21, 2022 at 10:12:42AM +0000, forkit via Digitalmars-d-learn wrote:
[...]
 Random rnd;
 static this() {  rnd = Random(unpredictableSeed); } // thanks Ali
Actually you don't even need to do this, unless you want precise control over the initialization of your RNG. If you don't specify the RNG parameter in the calls to std.random functions, they will use the default RNG, which is already initialized with unpredictableSeed.


[...]
     // id needs to be 9 digits, and needs to start with 999
     int[] idArray = takeExactly(iota(999*10^^6, 10^^9),
 recordsNeeded).array;
[...]

This is wasteful if you're not planning to use every ID in this million-entry long array. Much better to just use an AA to keep track of which IDs have already been generated instead. Of course, if you plan to use most of the array, then the AA may wind up using more memory than the array. So it depends on your use case.


T

--
Never wrestle a pig. You both get covered in mud, and the pig likes it.
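As a small illustration of the remark about the default RNG: the std.random functions used in this thread have overloads that take no generator argument and draw from the thread-local default one. A minimal sketch (the ID range is taken from the earlier posts):

```d
import std.random : dice, uniform;

void main()
{
    // No RNG argument: both calls use the default generator (rndGen).
    auto bit = dice(0.6, 1.4);                      // index 0 or 1, weighted 30% / 70%
    auto id = uniform(999_000_000, 1_000_000_000);  // half-open range: upper bound excluded

    assert(bit == 0 || bit == 1);
    assert(id >= 999_000_000 && id < 1_000_000_000);
}
```

An explicit `Random` is still the way to go when a fixed seed is wanted for reproducible runs.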
Jan 21 2022
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/21/22 1:36 PM, H. S. Teoh wrote:
 On Fri, Jan 21, 2022 at 10:12:42AM +0000, forkit via Digitalmars-d-learn wrote:
 [...]
      // id needs to be 9 digits, and needs to start with 999
      int[] idArray = takeExactly(iota(999*10^^6, 10^^9),
 recordsNeeded).array;
[...] This is wasteful if you're not planning to use every ID in this million-entry long array. Much better to just use an AA to keep track of which IDs have already been generated instead. Of course, if you plan to use most of the array, then the AA may wind up using more memory than the array. So it depends on your use case.
Yeah, iota is a random-access range, so you can just pass it directly, and not allocate anything.

Looking at the usage, it doesn't need to be an array at all. But modifying the code to properly accept the range might prove difficult for someone not used to it.

-Steve
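One way "accepting the range" could look is sketched below. The function `createDataSet` here is a hypothetical variant written for this example (its name, signature and template constraint are assumptions), not the thread's actual code:

```d
import std.range : ElementType, iota, isInputRange;

// Hypothetical variant: takes any input range of ints instead of an int[].
int[][int][] createDataSet(R)(R ids, int[][] valuesArray)
if (isInputRange!R && is(ElementType!R : int))
{
    int[][int][] records;
    size_t i = 0;
    foreach (id; ids)
        records ~= [id : valuesArray[i++]];
    return records;
}

void main()
{
    int[][] values = [[1, 0], [0, 1], [1, 1]];

    // iota is lazy: no int[] of ids is ever allocated.
    auto records = createDataSet(iota(999_000_000, 999_000_003), values);

    assert(records.length == 3);
    assert(records[0][999_000_000] == [1, 0]);
    assert(records[2][999_000_002] == [1, 1]);
}
```

The sketch assumes `valuesArray` holds at least as many rows as the range yields ids. An `int[]` still works as an argument, since arrays are ranges too.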
Jan 21 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 18:50:46 UTC, Steven Schveighoffer 
wrote:
 Yeah, iota is a random-access range, so you can just pass it 
 directly, and not allocate anything.

 Looking at the usage, it doesn't need to be an array at all. 
 But modifying the code to properly accept the range might prove 
 difficult for someone not used to it.

 -Steve
thanks. that makes more sense actually ;-)

now i can get rid of the idArray completely, and just do:

foreach(i, id; enumerate(iota(iotaStartNum, iotaStartNum + recordsNeeded)))
{
    records ~= [ id: valuesArray[i] ];
}
Jan 21 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 21:01:11 UTC, forkit wrote:

even better, I got rid of all those unnecessary arrays ;-)

// ---

int[][int][] CreateDataSet
(const(int) recordsNeeded, const(int)valuesPerRecord)
{
     int[][int][] records;
     records.reserve(recordsNeeded);

     foreach(i, id; iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate)
     {
         records ~= [ id: 
iota(valuesPerRecord).map!(valuesPerRecord => 
cast(int)rnd.dice(0.6, 1.4)).array ];
     }

     return records.dup;
}

// ---
Jan 21 2022
next sibling parent forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 21:43:38 UTC, forkit wrote:

oops... should be:

// ---

int[][int][] CreateDataSet
(const(int) recordsNeeded, const(int)valuesPerRecord)
{
     int[][int][] records;
     records.reserve(recordsNeeded);

     const int iotaStartNum = 100_000_001;

     foreach(i, id; iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate)
     {
         records ~= [ id: 
iota(valuesPerRecord).map!(valuesPerRecord => 
cast(int)rnd.dice(0.6, 1.4)).array ];
     }

     return records.dup;
}

// ---
Jan 21 2022
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 21, 2022 at 09:43:38PM +0000, forkit via Digitalmars-d-learn wrote:
 On Friday, 21 January 2022 at 21:01:11 UTC, forkit wrote:
[...]
 even better, I got rid of all those unnecessary arrays ;-)
 
 // ---
 
 int[][int][] CreateDataSet
 (const(int) recordsNeeded, const(int)valuesPerRecord)
 {
     int[][int][] records;
     records.reserve(recordsNeeded);
 
     foreach(i, id; iota(iotaStartNum, iotaStartNum +
 recordsNeeded).enumerate)
     {
         records ~= [ id: iota(valuesPerRecord).map!(valuesPerRecord =>
 cast(int)rnd.dice(0.6, 1.4)).array ];
     }
 
     return records.dup;
What's the point of calling .dup here? The only reference to records is going out of scope, so why can't you just return it? The .dup is just creating extra work for nothing.


T

--
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
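A small sketch of the same point with a plain array; the function name `makeValues` is made up for the illustration:

```d
int[] makeValues(int n)
{
    int[] result;
    foreach (i; 0 .. n)
        result ~= i * i;
    return result; // the caller becomes the only owner; no copy is needed
}

void main()
{
    auto a = makeValues(4);
    assert(a == [0, 1, 4, 9]);

    // .dup is for when two independently mutable copies are wanted.
    auto b = a.dup;
    b[0] = 42;
    assert(a[0] == 0 && b[0] == 42);
    assert(a.ptr !is b.ptr);
}
```

Returning the slice hands over a reference to the GC-managed memory, so nothing dangles when the local variable goes out of scope.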
Jan 21 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 21:56:33 UTC, H. S. Teoh wrote:
 What's the point of calling .dup here?  The only reference to 
 records is going out of scope, so why can't you just return it? 
  The .dup is just creating extra work for nothing.


 T
good pickup. thanks ;-)

// ----

module test;
@safe:

import std.stdio : write, writef, writeln, writefln;
import std.range : iota, enumerate;
import std.array : array, byPair, Appender, appender;
import std.random : Random, unpredictableSeed, dice, randomCover;
import std.algorithm : map;
import std.conv : to;
import std.format;
import std.stdio : File;
import std.file : exists;
import std.exception : enforce;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); }

void main(string[] args)
{
     int recordsNeeded, valuesPerRecord;
     string fname;

     if(args.length < 4)
     {
         recordsNeeded = 10; // default
         valuesPerRecord= 8; // default

         fname = "D:/rnd_records.txt"; // default
         //fname = "./rnd_records.txt"; // default
     }
     else
     {
         // assumes valid values being passed in ;-)
         recordsNeeded = to!int(args[1]);
         valuesPerRecord = to!int(args[2]);
         fname = args[3];
     }

     debug
         { writefln("%s records, %s values for record, will be written to file: %s", recordsNeeded, valuesPerRecord, fname); }
     else
         {
             enforce(!exists(fname), "Oop! That file already exists!");
             enforce(recordsNeeded <= 1_000_000_000, "C'mon! That's too many records!");
         }

     int[][int][] records = CreateDataSet(recordsNeeded, valuesPerRecord);

     ProcessDataSet(records, fname);

     writefln("All done. Check if records written to %s", fname);
}

int[][int][] CreateDataSet
(const(int) recordsNeeded, const(int) valuesPerRecord)
{
     const int iotaStartNum = 100_000_001;

     int[][int][] records;
     records.reserve(recordsNeeded);
     debug { writefln("records.capacity is %s", records.capacity); }

     foreach(i, id; iota(iotaStartNum, iotaStartNum + recordsNeeded).enumerate)
     {
         // NOTE: below does register with -profile=gc
         records ~= [ id: iota(valuesPerRecord).map!(valuesPerRecord => cast(int)rnd.dice(0.6, 1.4)).array ];
     }

     debug { writefln("records.length = %s", records.length); }

     return records;
}

// this creates a big string of 'formatted' records, and outputs that string to a file.
void ProcessDataSet
(in int[][int][] records, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(records.length);
     debug { writefln("bigString.capacity is %s", bigString.capacity); }

     // NOTE: forward declaration required for this nested function
     void processRecord(const(int) id, const(int)[] values)
     {
         bigString ~= id.to!string ~ "," ~ values.format!"%(%s,%)" ~ "\n";
     }

     foreach(ref const record; records)
     {
         foreach (ref rp; record.byPair)
         {
             processRecord(rp.expand);
         }
     }

     debug { writeln; writeln(bigString[].until("\n")); writeln; } // display just one record

     file.write(bigString[]);
}

// ----
Jan 21 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 22:25:32 UTC, forkit wrote:

I really like how alias and mixin can simplify my code even 
further:

//---

int[][int][] CreateDataSet
(const(int) recordsNeeded, const(int) valuesPerRecord)
{
     int[][int][] records;
     records.reserve(recordsNeeded);

     const int iotaStartNum = 100_000_001;
     alias iotaValues = Alias!"iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate";
     alias recordValues = 
Alias!"iota(valuesPerRecord).map!(valuesPerRecord => 
cast(int)rnd.dice(0.6, 1.4)).array";

     foreach(i, id; mixin(iotaValues))
     {
         records ~= [ id: mixin(recordValues) ];
     }

     return records;
}

//---
Jan 21 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/21/22 6:24 PM, forkit wrote:
 On Friday, 21 January 2022 at 22:25:32 UTC, forkit wrote:

 
 I really like how alias and mixin can simplify my code even further:
 
 //---
 
 int[][int][] CreateDataSet
 (const(int) recordsNeeded, const(int) valuesPerRecord)
 {
      int[][int][] records;
      records.reserve(recordsNeeded);
 
      const int iotaStartNum = 100_000_001;
      alias iotaValues = Alias!"iota(iotaStartNum, iotaStartNum + 
 recordsNeeded).enumerate";
      alias recordValues = 
 Alias!"iota(valuesPerRecord).map!(valuesPerRecord => 
 cast(int)rnd.dice(0.6, 1.4)).array";
oof! use enums for compile-time strings ;)

```d
enum iotaValues = "iota(...";
```
 
      foreach(i, id; mixin(iotaValues))
      {
          records ~= [ id: mixin(recordValues) ];
      }
 
      return records;
 }
Not sure I agree that the mixin looks better.

Also, I'm curious about this code:

```d
iota(valuesPerRecord).map!(valuesPerRecord => cast(int)rnd.dice(0.6, 1.4)).array;
```

That second `valuesPerRecord` is not used in the lambda, and also it's not referring to the original element, it's the name of a parameter in the lambda.

Are you sure this is doing what you want?

-Steve
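Both remarks can be tried out in isolation. The sketch below uses made-up values (7 stands in for the random draw) to show what the lambda parameter actually receives, and how an enum string works with mixin:

```d
import std.algorithm : map;
import std.array : array;
import std.range : iota;

void main()
{
    int valuesPerRecord = 3;

    // The lambda parameter receives each element of iota: 0, 1, 2.
    assert(iota(valuesPerRecord).map!(i => i * 10).array == [0, 10, 20]);

    // An unused parameter just means "produce one value per element".
    assert(iota(valuesPerRecord).map!(_ => 7).array == [7, 7, 7]);

    // A compile-time string for mixin can simply be an enum.
    enum expr = "iota(valuesPerRecord).array";
    assert(mixin(expr) == [0, 1, 2]);
}
```

So the original code does yield `valuesPerRecord` values per record; the lambda parameter there only happens to share a name with the function parameter.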
Jan 21 2022
next sibling parent forkit <forkit gmail.com> writes:
On Saturday, 22 January 2022 at 01:33:16 UTC, Steven 
Schveighoffer wrote:

so I was watching this video by Andrei:

https://www.youtube.com/watch?v=mCrVYYlFTrA

In it, he talked about writing the simplest design that could 
possibly work....

Which got me thinking....

// ----

module test;
@safe:

import std.stdio : write, writef, writeln, writefln;
import std.range : iota, enumerate;
import std.array : array, byPair, Appender, appender;
import std.random : Random, unpredictableSeed, dice, randomCover;
import std.algorithm : map;
import std.conv : to;
import std.format;
import std.stdio : File;
import std.file : exists;
import std.exception : enforce;
import std.meta : Alias;

debug { import std; }

Random rnd;
static this() {  rnd = Random(unpredictableSeed); }

void main(string[] args)
{
     int recordsNeeded, valuesPerRecord;

     string fname;

     if(args.length < 4) // then set defaults
     {
         recordsNeeded = 10;
         valuesPerRecord= 8;

         version(Windows) { fname = "D:/rnd_records.txt"; }
         version(linux) { fname = "./rnd_records.txt"; }
     }
     else
     {
         // assumes valid values being passed in ;-)
         recordsNeeded = to!int(args[1]);
         valuesPerRecord = to!int(args[2]);
         fname = args[3];
     }

     debug
         { writefln("%s records (where a record is: id and %s 
values), will be written to file: %s", recordsNeeded, 
valuesPerRecord, fname); }
     else
         {
             enforce(!exists(fname), "Oops! That file already 
exists!");
             enforce(recordsNeeded <= 1_000_000_000, "C'mon! 
That's too many records!");
         }

     CreateDataFile(recordsNeeded, valuesPerRecord, fname);

     writefln("All done. Check if records written to %s", fname);
}

void CreateDataFile(const(int) recordsNeeded, const(int) 
valuesPerRecord, const(string) fname)
{
     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recordsNeeded);

     const int iotaStartNum = 100_000_001;

     foreach(i, id; iota(iotaStartNum, iotaStartNum + 
recordsNeeded).enumerate)
     {
         bigString
             ~= id.to!string
             ~ ","
             ~ valuesPerRecord.iota.map!(valuesPerRecord => 
cast(int)rnd.dice(0.6, 1.4)).format!"%(%s,%)"
             ~ "\n";
     }

     file.write(bigString[]);
}

// ----
Jan 21 2022
prev sibling parent forkit <forkit gmail.com> writes:
On Saturday, 22 January 2022 at 01:33:16 UTC, Steven 
Schveighoffer wrote:
 That second `valuesPerRecord` is not used in the lambda, and 
 also it's not referring to the original element, it's the name 
 of a parameter in the lambda.

 Are you sure this is doing what you want?

 -Steve
It just worked, so i didn't think about it too much.. but it seems to work either way.

And to be honest, the only part of it I understand, is the dice part ;-)

In any case I changed it:

from: valuesPerRecord =>
to: i =>

// ----

void CreateDataFile(const(int) recordsNeeded, const(int) valuesPerRecord, const(string) fname)
{
     auto rnd = Random(unpredictableSeed);

     auto file = File(fname, "w");
     scope(exit) file.close;

     Appender!string bigString = appender!string;
     bigString.reserve(recordsNeeded);

     const int iotaStartNum = 100_000_001;

     foreach(i, id; iota(iotaStartNum, iotaStartNum + recordsNeeded).enumerate)
     {
         bigString
             ~= id.to!string
             ~ ","
             ~ valuesPerRecord.iota.map!(i => rnd.dice(0.6, 1.4)).format!"%(%s,%)"
             ~ "\n";
     }

     file.write(bigString[]);
}

// ---
Jan 22 2022
prev sibling parent forkit <forkit gmail.com> writes:
On Friday, 21 January 2022 at 18:36:42 UTC, H. S. Teoh wrote:
 This is wasteful if you're not planning to use every ID in this 
 million-entry long array.  Much better to just use an AA to 
 keep track of which IDs have already been generated instead.  
 Of course, if you plan to use most of the array, then the AA 
 may wind up using more memory than the array. So it depends on 
 your use case.


 T
yes, I was thinking this over as I was waking up this morning, and thought... what the hell am I doing generating all those numbers that might never get used.

better to do:

const int iotaStartNum = 100_000_000;
int[] idArray = iota(iotaStartNum, iotaStartNum + recordsNeeded).array;
Jan 21 2022
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 21, 2022 at 09:10:56AM +0000, forkit via Digitalmars-d-learn wrote:
[...]
 turns out the problem has nothing to do with appender...
 
 It's actually this line:
 
 if (!idArray.canFind(x)):
 
 when i comment this out in the function below, the program does what I
 want in seconds.
 
 only problem is, the ids are no longer unique (in the file)
[...]

Ah yes, the good ole O(N²) trap that new programmers often fall into. :-)

Using .canFind on an array of generated IDs means scanning the entire array every time you find a non-colliding ID. As the array grows, the cost of doing this increases. The overall effect is O(N²) time complexity, because you're continually scanning the array every time you generate a new ID.

Use an AA instead, and performance should dramatically increase. I.e., instead of:

	size_t[] idArray;
	...
	if (!idArray.canFind(x)): // O(N) cost to scan array

write:

	bool[size_t] idAA;
	...
	if (x in idAA) ... // O(1) cost to look up an ID


T

--
VI = Visual Irritation
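Applied to the ID generator from earlier in the thread, the suggestion might look like the sketch below. The function name `createUniqueIds` and the local names are assumptions for this example; the ID range comes from the original post:

```d
import std.random : uniform;

// Hypothetical rewrite of createUniqueIDArray using an AA as a "seen" set.
int[] createUniqueIds(size_t recordsNeeded)
{
    bool[int] seen;   // expected O(1) membership test, instead of an O(N) scan
    int[] ids;
    ids.reserve(recordsNeeded);

    while (ids.length < recordsNeeded)
    {
        // id needs to be 9 digits, and needs to start with 999
        immutable x = uniform(999_000_000, 1_000_000_000);
        if (x !in seen)
        {
            seen[x] = true;
            ids ~= x;
        }
    }
    return ids;
}

void main()
{
    auto ids = createUniqueIds(1_000);
    assert(ids.length == 1_000);

    // Every id is distinct and inside the requested range.
    bool[int] check;
    foreach (id; ids)
    {
        assert(id >= 999_000_000 && id < 1_000_000_000);
        check[id] = true;
    }
    assert(check.length == ids.length);
}
```

Note that this ID range only contains 1,000,000 distinct values, so the loop cannot terminate if more are requested, and it slows down as the request approaches that limit because most random draws collide.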
Jan 21 2022
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/20/22 6:01 PM, forkit wrote:
 On Thursday, 20 January 2022 at 22:31:17 UTC, Steven Schveighoffer wrote:
 Because it would allow altering const data.
I'm not sure I understand. At what point in this function is valuesArray modified, and thus preventing it being passed in with const?
The compiler rules aren't enforced based on what code you wrote, it doesn't have the capability of proving that your code doesn't modify things.

Instead, it enforces simple rules that allow it to prove that const data cannot be modified.

I'll make it into a simpler example:

```d
const int[] arr = [1, 2, 3, 4, 5];
int[] arr2 = arr;
```

This code does not modify any data in arr. But that in itself isn't easy to prove. In order to ensure that arr is never modified, the compiler would have to analyze all the code, and every possible way that arr2 might escape or be used somewhere at some point to modify the data. It doesn't have the capability or time to do that (if I understand correctly, this is NP-hard).

Instead, it just says, you can't convert references from const to mutable without a cast. That guarantees that you can't modify const data. However, it does rule out a certain class of code that might not modify the const data, even if it has the opportunity to.

It's like saying, "we don't let babies play with sharp knives" vs. "we will let babies play with sharp knives but stop them just before they stab themselves."

-Steve
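The rule can be checked at compile time without triggering the error. A minimal sketch; the names `view` and `copy` are made up for the illustration:

```d
void main()
{
    const int[] arr = [1, 2, 3, 4, 5];

    // The implicit conversion is rejected by type alone, whatever the code would do next.
    static assert(!is(typeof(arr) : int[]));

    // Adding const is always allowed.
    const(int)[] view = arr;
    assert(view.length == 5);

    // Getting a mutable array means making a copy, so the const data stays untouched.
    int[] copy = arr.dup;
    copy[0] = 42;
    assert(arr[0] == 1 && copy[0] == 42);
}
```

This is the usual way out when a function really needs mutable data: take a copy with .dup rather than casting const away.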
Jan 20 2022