digitalmars.D.learn - Newbie: Error parsing csv file with very long lines
- salvari (33/33) Apr 23 2016 Hello all!
- rikki cattermole (4/32) Apr 23 2016 Its probably using a buffer.
- salvari (3/3) Apr 23 2016 Fixed!!!
- Nicholas Wilson (20/23) Apr 23 2016 stdin.byLine() reuses its buffer. so the old arrays in columns
- salvari (3/29) Apr 23 2016 Thanks for your clue on std.csv!
- rikki cattermole (8/11) Apr 23 2016 .dup duplicates memory.
- salvari (3/18) Apr 23 2016 Now I understand. Slices are still biting me every now and then.
- Ivan Kazmenko (7/11) Apr 23 2016 Another possibility yet not mentioned is to change
Hello all!

I'm trying to read a CSV file (';' as separator) with very long lines.

It seemed really simple: I read the column names with no problem. But as soon as the program parses the first line of data, the array containing the column names seems to be overwritten.

I'm using dmd: DMD64 D Compiler v2.071.0

My code:

    import std.stdio;
    import std.algorithm;
    import std.array;

    char[][] columns;

    void main()
    {
        LINE: foreach (line; stdin.byLine()) {
            if (line.startsWith("Interfaz")) {
                writeln("IN HERE");
                columns = line.split(";");
                writeln(columns);   // Everything seems to be OK
                continue;
            }
            else {
                auto linedata = line.split(";");
                writefln("My line: %s", line);        // Fine.
                writefln("LineData: %s", linedata);   // Fine. Line data is OK
                writefln("Columns: %s", columns);     // Wrong!!! The columns array
                                                      // contains garbage data
                                                      // from linedata
            }
        }
    }
Apr 23 2016
On 23/04/2016 10:40 PM, salvari wrote:
> But as soon as the program parses the first line of data, the array containing the column names seems to be overwritten.

It's probably using a buffer.

    columns = line.dup.split(";");

should fix it.
Apr 23 2016
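For readers following along, the suggested fix can be sketched in context. This is only an illustration, not the poster's final program: the helper name readHeader, the temporary file name, and the sample data are assumptions made for the demo, and the input comes from a file instead of stdin so the example is self-contained.

```d
import std.algorithm : startsWith;
import std.array : split;
import std.path : buildPath;
import std.stdio : File, writeln;
static import std.file;

/// Returns the header fields of a ';'-separated file.
/// `headerPrefix` marks the header line, as "Interfaz" does in the question.
/// (readHeader is a hypothetical helper, not part of Phobos.)
char[][] readHeader(File input, string headerPrefix)
{
    char[][] columns;
    foreach (line; input.byLine())
    {
        if (line.startsWith(headerPrefix))
        {
            // `line` is a view into byLine's reused buffer, so copy it
            // before keeping slices of it around.
            columns = line.dup.split(";");
            continue;
        }
        // Data lines would be processed here. Reading them may overwrite
        // the buffer, but `columns` now refers to its own memory.
    }
    return columns;
}

void main()
{
    // Hypothetical sample data written to a temporary file for the demo.
    auto path = buildPath(std.file.tempDir(), "byline_dup_demo.csv");
    std.file.write(path, "Interfaz;Name;Value\n1;foo;10\n2;bar;20\n");
    scope (exit) std.file.remove(path);

    auto columns = readHeader(File(path), "Interfaz");
    writeln(columns);
}
```

Only the header line is copied here; the data lines still use the shared buffer, which keeps allocations low on files with very long lines.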
Fixed!!! Thanks a lot. :-) But I have to think about this. I don't understand the failure.
Apr 23 2016
On Saturday, 23 April 2016 at 10:57:04 UTC, salvari wrote:
> Fixed!!! Thanks a lot. :-) But I have to think about this. I don't understand the failure.

stdin.byLine() reuses its buffer, so the old arrays in columns point to the data in byLine's buffer, and they get overwritten by subsequent calls.

Also, if you're trying to parse CSV, check out std.csv. From the docs:

    import std.csv;
    import std.stdio;

    string str = "Hello;65;63.63\nWorld;123;3673.562";

    struct Layout
    {
        string name;
        int value;
        double other;
    }

    // ';' is passed as the field delimiter
    auto records = csvReader!Layout(str, ';');

    foreach (record; records)
    {
        writeln(record.name);
        writeln(record.value);
        writeln(record.other);
    }
Apr 23 2016
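Since the file in the question starts with a header row, std.csv can also be asked to read the column names from the data itself. The following is a minimal sketch under that assumption: the helper name columnValues and the sample text are made up for illustration, and the csvReader call passes null as the header argument so that each record comes back as a string[string] keyed by column name.

```d
import std.csv : csvReader;
import std.stdio : writeln;

/// Collects one named column from ';'-separated text whose first row
/// is a header. (columnValues is a hypothetical helper for this sketch.)
string[] columnValues(string text, string columnName)
{
    string[] values;
    // Passing `null` as the header asks csvReader to take the column
    // names from the first row; each record is then a string[string].
    foreach (record; csvReader!(string[string])(text, null, ';'))
        values ~= record[columnName];
    return values;
}

void main()
{
    // Hypothetical sample data; "Interfaz" echoes the header in the question.
    string text = "Interfaz;Name;Value\neth0;foo;10\neth1;bar;20";
    writeln(columnValues(text, "Name"));
}
```

With this approach there is no separate columns array to keep alive, so the buffer reuse in byLine never comes into play.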
On Saturday, 23 April 2016 at 11:13:19 UTC, Nicholas Wilson wrote:
> Also if you're trying to parse csv check out std.csv

Thanks for your clue on std.csv! I think I will use it a lot. I totally missed it.
Apr 23 2016
On 23/04/2016 10:57 PM, salvari wrote:
> But I have to think about this. I don't understand the failure.

.dup duplicates memory: it allocates a new block of memory and copies the values across.

What byLine does is read up to \n and copy that data into a buffer. You then get access to that buffer as line. Each iteration reuses the memory that held the previous line, so there are no allocations beyond the first one, apart from growing the buffer when a longer line needs it.
Apr 23 2016
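The aliasing described above can be reproduced without any file I/O. In this sketch a plain char[] stands in for byLine's internal buffer, and overwriting it in place simulates the next line being read; the function names and the sample text are assumptions made for the illustration.

```d
import std.array : split;
import std.stdio : writeln;

/// Splits the buffer without copying, overwrites the buffer in place,
/// and returns the first field as the slices now see it.
string firstFieldWithoutDup()
{
    char[] buffer = "Name;Value".dup;        // stands in for byLine's buffer
    char[][] fields = buffer.split(";");     // slices pointing into `buffer`
    buffer[0 .. 4] = 'X';                    // simulate the next line arriving
    return fields[0].idup;
}

/// Same steps, but the fields are split from a private copy of the buffer.
string firstFieldWithDup()
{
    char[] buffer = "Name;Value".dup;
    char[][] fields = buffer.dup.split(";"); // slices into the copy
    buffer[0 .. 4] = 'X';                    // the copy is unaffected
    return fields[0].idup;
}

void main()
{
    writeln(firstFieldWithoutDup()); // prints XXXX
    writeln(firstFieldWithDup());    // prints Name
}
```

split does not copy character data; it returns slices of whatever array it was given, which is why the copy has to happen before the split.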
On Saturday, 23 April 2016 at 11:18:08 UTC, rikki cattermole wrote:
> .dup duplicates memory. [...]

Now I understand. Slices are still biting me every now and then.
Apr 23 2016
On Saturday, 23 April 2016 at 10:40:13 UTC, salvari wrote:
> It seems to be really simple, I read the columns name with no problem. But as soon as the program parses the first line of data, the array containing the columns names seems to be overwritten.

Another possibility not yet mentioned is to change

    foreach(line; stdin.byLine())

into

    foreach(line; stdin.byLineCopy())

to make the older lines' contents available after you read the next line.
Apr 23 2016
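A short sketch of the byLineCopy variant, assuming the goal is to keep every row around after the loop. The helper name readRows, the temporary file, and the sample data are illustrative assumptions; the input is a file instead of stdin so the example runs on its own.

```d
import std.array : split;
import std.path : buildPath;
import std.stdio : File, writeln;
static import std.file;

/// Reads every line of a ';'-separated file and keeps all rows.
/// (readRows is a hypothetical helper for this sketch.)
string[][] readRows(File input)
{
    string[][] rows;
    // byLineCopy hands out a freshly allocated string per line,
    // so earlier rows stay valid after later lines are read.
    foreach (line; input.byLineCopy())
        rows ~= line.split(";");
    return rows;
}

void main()
{
    // Hypothetical sample data written to a temporary file for the demo.
    auto path = buildPath(std.file.tempDir(), "bylinecopy_demo.csv");
    std.file.write(path, "Interfaz;Name;Value\n1;foo;10\n2;bar;20\n");
    scope (exit) std.file.remove(path);

    foreach (row; readRows(File(path)))
        writeln(row);
}
```

The trade-off is one allocation per line, whereas byLine with a selective .dup copies only the lines that need to outlive the loop iteration.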