
digitalmars.D.learn - Behavior of joining mapresults

Christian Köstlin <christian.koestlin gmail.com> writes:
When working with JSON data files that were a little bigger than
convenient, I stumbled upon a strange behavior when joining map results
(I understand that this is more or less flatMap).
I mapped input files to JSONValues, from which I took out some arrays
whose contents I wanted to join.
Although the joiner is at the end of the functional pipeline, it led to
the parsing code being called twice.
I tried to reduce the problem:

#!/usr/bin/env rdmd -unittest
unittest {
    import std.stdio;
    import std.range;
    import std.algorithm;
    import std.string;

    auto parse(int i) {
        writeln("parsing %s".format(i));
        return [1, 2, 3];
    }

    writeln(iota(1, 5).map!(parse));
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(parse)).joiner);
}

void main() {}

As you can see if you run this code, parsing 1 through 4 is called two
times each. What am I doing wrong here?

Thanks in advance,
Christian
Dec 20 2017
Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 20 December 2017 at 15:28:00 UTC, Christian Köstlin wrote:
 When working with JSON data files that were a little bigger than
 convenient, I stumbled upon a strange behavior when joining map results
 (I understand that this is more or less flatMap).
 I mapped input files to JSONValues, from which I took out some arrays
 whose contents I wanted to join.
 Although the joiner is at the end of the functional pipeline, it led to
 the parsing code being called twice.
 I tried to reduce the problem:

 [...]
You need to memoize, I guess; map is lazy.
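
A minimal sketch of what "map is lazy" means here, assuming only std.algorithm.map and std.range.iota; the helper name callsAfterTwoFronts and the counter are hypothetical and only serve the illustration. Each call to front on a map result runs the mapped function again, because map does not store its results:

```d
import std.algorithm : map;
import std.range : iota;

// Hypothetical helper: reports how often the mapped function ran
// after front was read twice on the same map result.
int callsAfterTwoFronts()
{
    int calls;
    auto r = iota(1, 5).map!((int i) { ++calls; return [1, 2, 3]; });
    assert(calls == 0); // building the pipeline evaluates nothing
    auto a = r.front;   // runs the mapped function
    auto b = r.front;   // runs it again: map keeps no copy of the result
    assert(a == b);
    return calls;
}

void main()
{
    assert(callsAfterTwoFronts() == 2);
}
```

Any consumer that reads front more than once per element, as joiner appears to do in this thread, therefore repeats the work.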
Dec 20 2017
Christian Köstlin <christian.koestlin gmail.com> writes:
On 20.12.17 17:19, Stefan Koch wrote:
 On Wednesday, 20 December 2017 at 15:28:00 UTC, Christian Köstlin wrote:
  When working with JSON data files that were a little bigger than
  convenient, I stumbled upon a strange behavior when joining map results
  (I understand that this is more or less flatMap).
  I mapped input files to JSONValues, from which I took out some arrays
  whose contents I wanted to join.
  Although the joiner is at the end of the functional pipeline, it led to
  the parsing code being called twice.
  I tried to reduce the problem:

  [...]
 You need to memoize, I guess; map is lazy.
That's an idea, thanks a lot, will give it a try ...
Dec 20 2017
Christian Köstlin <christian.koestlin gmail.com> writes:
On 20.12.17 17:30, Christian Köstlin wrote:
 That's an idea, thanks a lot, will give it a try ...

#!/usr/bin/env rdmd -unittest
unittest {
    import std.stdio;
    import std.range;
    import std.algorithm;
    import std.string;
    import std.functional;

    auto parse(int i) {
        writeln("parsing %s".format(i));
        return [1, 2, 3];
    }

    writeln(iota(1, 5).map!(memoize!parse));
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(memoize!parse)).joiner);
}

void main() {}

works, but I fear for the data that is stored in the memoization. At the
moment it's not a big issue, as all the data fits comfortably into RAM,
but for bigger data another approach is needed (probably even my current
JSON parsing must be replaced).

I still wonder whether the joiner calls front more often than necessary.
It is certainly valid to call front as many times as one sees fit, but
with a lazy map in between, it might not be the best solution.
Dec 20 2017
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, December 21, 2017 07:46:03 Christian Köstlin via Digitalmars-d-learn wrote:
 On 20.12.17 17:30, Christian Köstlin wrote:
  That's an idea, thanks a lot, will give it a try ...

 #!/usr/bin/env rdmd -unittest
 unittest {
     import std.stdio;
     import std.range;
     import std.algorithm;
     import std.string;
     import std.functional;

     auto parse(int i) {
         writeln("parsing %s".format(i));
         return [1, 2, 3];
     }

     writeln(iota(1, 5).map!(memoize!parse));
     writeln("-------------------------------");
     writeln((iota(1, 5).map!(memoize!parse)).joiner);
 }

 void main() {}

 works, but I fear for the data that is stored in the memoization. At the
 moment it's not a big issue, as all the data fits comfortably into RAM,
 but for bigger data another approach is needed (probably even my current
 JSON parsing must be replaced).

 I still wonder whether the joiner calls front more often than necessary.
 It is certainly valid to call front as many times as one sees fit, but
 with a lazy map in between, it might not be the best solution.
I would think that it would make a lot more sense to simply put the whole
thing in an array than to use memoize, e.g.

    auto arr = iota(1, 5).map!parse().array();

- Jonathan M Davis
Dec 20 2017
Christian Köstlin <christian.koestlin gmail.com> writes:
On 21.12.17 08:41, Jonathan M Davis wrote:
 I would think that it would make a lot more sense to simply put the whole
 thing in an array than to use memoize. e.g.
 
 auto arr = iota(1, 5).map!parse().array();
That's also possible, but I wanted to make use of the laziness ... e.g. if
I then search over the flattened stuff, I do not have to parse the 10th
file.

I replaced joiner with a primitive flatten function like this:

#!/usr/bin/env rdmd -unittest
unittest {
    import std.stdio;
    import std.range;
    import std.algorithm;
    import std.string;
    import std.functional;

    auto parse(int i) {
        writeln("parsing %s".format(i));
        return [1, 2, 3];
    }

    writeln(iota(1, 5).map!(parse));
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(parse)).joiner);
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(memoize!parse)).joiner);
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(parse)).flatten);
}

auto flatten(T)(T input) {
    import std.range;
    struct Res {
        T input;
        ElementType!T current;
        this(T input) {
            this.input = input;
            this.current = this.input.front;
            advance();
        }
        private void advance() {
            while (current.empty) {
                if (input.empty) {
                    return;
                }
                input.popFront;
                if (input.empty) {
                    return;
                }
                current = input.front;
            }
        }
        bool empty() {
            return current.empty;
        }
        auto front() {
            return current.front;
        }
        void popFront() {
            current.popFront;
            advance();
        }
    }
    return Res(input);
}

void main() {}

With this implementation my program behaves as expected (parsing the
input data only once).
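
One alternative that nobody in the thread mentions, so take it as an assumption rather than as anyone's recommendation: std.algorithm.cache stores the current front of a range, so repeated front calls from a downstream consumer such as joiner should not re-run the mapped function, while the pipeline as a whole stays lazy. A minimal sketch, where parseCallsWithCache and the counting lambda are hypothetical stand-ins for the real parse step:

```d
import std.algorithm : cache, equal, joiner, map;
import std.range : iota;

// Hypothetical helper: flattens a cached map result and reports how
// often the parse stand-in was actually invoked.
int parseCallsWithCache()
{
    int calls;

    // Stand-in for the expensive parse step from the thread.
    auto parsed = iota(1, 5).map!((int i) { ++calls; return [i, i]; });

    // cache evaluates front once per element (on construction and on
    // each popFront) and hands out the stored value afterwards.
    auto flat = parsed.cache.joiner;

    assert(flat.equal([1, 1, 2, 2, 3, 3, 4, 4]));
    return calls;
}

void main()
{
    assert(parseCallsWithCache() == 4); // one evaluation per input element
}
```

Note that cache evaluates the first element as soon as it is constructed, so it is slightly more eager than the hand-written flatten; unlike memoize or array, though, it holds only the current element rather than every result.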
Dec 21 2017