
digitalmars.D.learn - Multidimensional dynamic array of strings initialized with split()

reply "Ludovit Lucenic" <llucenic gmail.com> writes:
Hello friends,

with the following code

import std.stdio;
import std.array;

auto file71 = File(argv[2], "r");

string[][] buffer;
foreach (line; file71.byLines) {
     buffer ~= split(line, "\t");
}

I am trying to cut the lines from the file with tab as delimiter 
to pre-fetch the content of a file before further processing.

Each split() call gives correct string[] values in and of itself.
But when I try to read buffer after the loop, I get corrupted
data, like this:

[ ["-", "_Unit226", "constructor", 
"sub_00BE896C\t1\t?:?\t\t//con", "t", "uc...

Obviously the concatenation is doing no good, since there are 
tabs in the values...

What am I missing here? Is it that split() allocates memory that
gets overwritten in the loop, and ~= copies only the subarrays
without copying the sub-subarrays? How can I overcome this?

Thank you very much,
Ludovit
Sep 04 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Sep 05, 2013 at 12:57:34AM +0200, Ludovit Lucenic wrote:
[...]

The problem is that File.byLine() reuses its buffer for efficiency, and
split is optimized to return slices into that buffer instead of copying
each substring. So after every iteration the buffer (and therefore the
slices into it) gets overwritten.

Replace the loop body with the following and it should work:

	buffer ~= split(line.dup, "\t");


T

-- 
Dogs have owners ... cats have staff. -- Krista Casada
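
To illustrate the buffer reuse described above, here is a minimal sketch that needs no input file. The local `buffer` array is only a stand-in (an assumption for illustration) for the internal buffer that File.byLine() reuses; the variable names are hypothetical.

```d
import std.array : split;
import std.stdio : writeln;

void main()
{
    // Stand-in for the buffer that File.byLine() reuses between lines
    // (an assumption for illustration; no file is read here).
    char[] buffer = "a\tb".dup;

    // split returns slices that point into `buffer`; nothing is copied.
    char[][] aliased = split(buffer, "\t");

    // idup makes an immutable copy first, so these slices own their data.
    string[] copied = split(buffer.idup, "\t");

    // Simulate the next line being read into the same buffer.
    buffer[] = "x\ty";

    writeln(aliased); // ["x", "y"]  (silently changed)
    writeln(copied);  // ["a", "b"]  (unaffected)
}
```

The slices stored in `aliased` follow whatever is written into the buffer later, which matches the corrupted fields seen in the original output.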
Sep 04 2013
parent reply "Ludovit Lucenic" <llucenic gmail.com> writes:
Thank you so much for your explanation.
It helped me a lot to understand things, and it actually works :-)

LL
Sep 04 2013
parent reply "Ludovit Lucenic" <llucenic gmail.com> writes:
I have created a wiki on this one.
http://wiki.dlang.org/Read_table_data_from_file
Sep 05 2013
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 09/05/2013 01:14 AM, Ludovit Lucenic wrote:
 I have created a wiki on this one.
 http://wiki.dlang.org/Read_table_data_from_file
Compiling with "DMD64 D Compiler v2.064-devel-52cc287" produces the
following errors:

* You had byLines in your original code as well. Shouldn't it be byLine?

* You are missing the closing brace of the foreach loop as well.

* "Error: cannot append type char[][] to type string[][]" I have to
replace .dup with .idup

The following version is lazy:

import std.stdio;
import std.array;
import std.algorithm;

auto readInData(File inputFile, string fieldSeparator)
{
    // byLine reuses its buffer, so copy each line (idup) before splitting
    return
        inputFile
        .byLine
        .map!(line => line
                      .idup
                      .split(fieldSeparator));
}

The caller can either use the result lazily:

import std.range;

void main()
{
    auto file = File("deneme.txt");
    writeln(readInData(file, "\t").take(2));
}

Or call .array on the result to consume the range eagerly:

    auto table = readInData(file, "\t").array;

Ali
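
As a possible alternative to calling .idup on every line, the sketch below uses File.byLineCopy, which allocates a fresh string per line. It assumes a Phobos release that provides byLineCopy (to my knowledge it postdates this thread). The function name `readTable` and the temporary file name are hypothetical, chosen only for this illustration.

```d
import std.algorithm : map;
import std.array : array, split;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

// Hypothetical eager variant of readInData: byLineCopy yields a newly
// allocated string per line, so the slices made by split stay valid.
string[][] readTable(string path, string fieldSeparator)
{
    return File(path)
        .byLineCopy
        .map!(line => line.split(fieldSeparator))
        .array;
}

void main()
{
    // Hypothetical sample file, created only for this demonstration.
    auto path = buildPath(tempDir, "table_sketch.tsv");
    write(path, "a\tb\tc\n1\t2\t3\n");
    scope (exit) remove(path);

    auto table = readTable(path, "\t");
    writeln(table); // [["a", "b", "c"], ["1", "2", "3"]]
}
```

The trade-off is the same as with .idup: one allocation per line in exchange for rows that remain valid after the loop has moved on.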
Sep 05 2013
parent Ludovit Lucenic <llucenic gmail.com> writes:
On Thursday, 5 September 2013 at 16:22:46 UTC, Ali Çehreli wrote:
 Compiling with "DMD64 D Compiler v2.064-devel-52cc287" produces 
 the following errors:

 * You had byLines in your original code as well. Shouldn't it 
 be byLine?

 * You are missing the closing brace of the foreach loop as well.

 * "Error: cannot append type char[][] to type string[][]" I 
 have to replace .dup with .idup
Thank you for pointing out the errors, Ali. I have updated the example.
Thank you for the alternative approaches. This thread is linked from the
Credits section of the wiki page, in case someone wants to find out more
on the topic.
Sep 22 2017