digitalmars.D.learn - Newbie: Error parsing csv file with very long lines

salvari <salvari gmail.com> writes:
Hello all!

I'm trying to read a csv file (';' as separator) with very long 
lines.

It seems to be really simple: I read the column names with no 
problem. But as soon as the program parses the first line of 
data, the array containing the column names seems to be 
overwritten.

I'm using dmd: DMD64 D Compiler v2.071.0

My code:

import std.stdio;
import std.algorithm;
import std.array;

char[][] columns;


void main() {
  LINE:foreach(line; stdin.byLine()){
     if(line.startsWith("Interfaz")){
       writeln("IN HERE");
       columns = line.split(";");
       writeln(columns);               // Everything seems to be ok
       continue;
     } else{
       auto linedata = line.split(";");
       writefln("My line: %s", line);        // Fine.
       writefln("LineData: %s", linedata);   // Fine. Line data is ok
       writefln("Columns: %s", columns);     // Wrong!!! columns array
                                             // contains garbage data
                                             // from linedata
     }
   }
}
Apr 23 2016
rikki cattermole <rikki cattermole.co.nz> writes:
It's probably using a buffer.

columns = line.dup.split(";");

should fix it.
Apr 23 2016
salvari <salvari gmail.com> writes:
Fixed!!!

Thanks a lot. :-)


But I have to think about this. I don't understand the failure.
Apr 23 2016
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
stdin.byLine() reuses its buffer, so the old arrays in columns point to the data in byLine's buffer and they get overwritten by subsequent calls.

Also, if you're trying to parse csv, check out std.csv. From the docs:

string str = "Hello;65;63.63\nWorld;123;3673.562";
struct Layout
{
    string name;
    int value;
    double other;
}

auto records = csvReader!Layout(str,';');

foreach(record; records)
{
    writeln(record.name);
    writeln(record.value);
    writeln(record.other);
}
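
For a file whose first row holds the column names, as in the original question, std.csv can also read the header itself. The following is only a rough sketch under some assumptions: the sample data and the helper name columnValues are made up for illustration, and it relies on the csvReader overload that takes null as the header together with a string[string] record type.

```d
import std.csv : csvReader;
import std.stdio : writeln;

// Hypothetical helper: collects the value of one named column
// from every data row of a ';'-separated text.
string[] columnValues(string text, string column)
{
    string[] values;
    // Passing null as the header asks csvReader to take the column
    // names from the first row of the input.
    foreach (record; csvReader!(string[string])(text, null, ';'))
        values ~= record[column];
    return values;
}

void main()
{
    // Made-up sample data; only "Interfaz" comes from the original post.
    string text = "Interfaz;In;Out\neth0;10;20\neth1;30;40";
    writeln(columnValues(text, "Interfaz"));
    writeln(columnValues(text, "Out"));
}
```

Each record here is an associative array keyed by column name, so no separate columns array needs to be kept around.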
Apr 23 2016
salvari <salvari gmail.com> writes:
Thanks for your clue on std.csv! I think I will use it a lot. I totally missed it.
Apr 23 2016
rikki cattermole <rikki cattermole.co.nz> writes:
.dup duplicates memory: it allocates a new block of memory and copies the values across.

byLine reads up to \n and copies the data into a buffer, then gives you access to that buffer as line. It reuses the same memory for every line, so there are no allocations beyond the first one and any later growth of the buffer.
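
The difference between slicing the shared buffer and slicing a .dup copy can be sketched without stdin at all. This is a simplified illustration, not what std.stdio does internally: the buffer variable merely stands in for byLine's buffer, and the helper names splitAliased and splitCopied are invented for the example.

```d
import std.array : split;
import std.stdio : writeln;

// Hypothetical helpers, named only for this illustration.
// Returns slices that point into buf itself.
char[][] splitAliased(char[] buf) { return buf.split(";"); }
// Returns slices that point into a freshly allocated copy of buf.
char[][] splitCopied(const(char)[] buf) { return buf.dup.split(";"); }

void main()
{
    // Stand-in for byLine's internal buffer (an assumption; the real
    // buffer is managed inside std.stdio).
    char[] buffer = "a;b;c".dup;

    auto aliased = splitAliased(buffer);
    auto copied  = splitCopied(buffer);

    // Simulate byLine reading the next line into the same memory.
    buffer[] = "1;2;3";

    writeln(aliased); // the slices now show the new data
    writeln(copied);  // the copy still holds the original fields
    assert(aliased[0] == "1" && copied[0] == "a");
}
```

The aliased slices change because they never owned their data; the copied ones are unaffected because .dup gave them their own memory.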
Apr 23 2016
salvari <salvari gmail.com> writes:
Now I understand. Slices are still biting me every now and then.
Apr 23 2016
Ivan Kazmenko <gassa mail.ru> writes:
Another possibility not yet mentioned is to change foreach(line; stdin.byLine()) into foreach(line; stdin.byLineCopy()), which keeps the older lines' contents available after you read the next line.
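
A minimal sketch of the byLineCopy variant, reading from a temporary file instead of stdin so that it runs on its own. The file name, the sample data and the helper readRows are assumptions made for the example.

```d
import std.array : split;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

// Hypothetical helper: reads every line of the file at path and keeps
// the split fields of all lines, including the header row.
string[][] readRows(string path)
{
    string[][] rows;
    // byLineCopy hands out a fresh string for each line, so the
    // slices stored in rows stay valid after the next line is read.
    foreach (line; File(path).byLineCopy())
        rows ~= line.split(";");
    return rows;
}

void main()
{
    // Made-up sample file in the system's temporary directory.
    auto path = buildPath(tempDir(), "bylinecopy_demo.csv");
    write(path, "Interfaz;In;Out\neth0;10;20\neth1;30;40\n");
    scope (exit) remove(path);

    auto rows = readRows(path);
    assert(rows[0][0] == "Interfaz"); // header row is still intact
    writeln(rows);
}
```

The trade-off is one allocation per line, which byLine avoids by reusing its buffer.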
Apr 23 2016