digitalmars.D.learn - Newbie: Error parsing csv file with very long lines

salvari <salvari gmail.com> writes:

Hello all!

I'm trying to read a CSV file (';' as separator) with very long 
lines.

It seems really simple: I read the column names with no 
problem. But as soon as the program parses the first line of 
data, the array containing the column names seems to be 
overwritten.

I'm using dmd: DMD64 D Compiler v2.071.0

My code:

import std.stdio;
import std.algorithm;
import std.array;

char[][] columns;


void main() {
  LINE:foreach(line; stdin.byLine()){
     if(line.startsWith("Interfaz")){
       writeln("IN HERE");
       columns = line.split(";");
       writeln(columns);               // Everything seems to be ok
       continue;
     } else{
       auto linedata = line.split(";");
       writefln("My line: %s", line);        // Fine.
       writefln("LineData: %s", linedata);   // Fine. Line data is ok
       writefln("Columns: %s", columns);     // Wrong!!! columns array
                                             // contains garbage data
                                             // from linedata
     }
   }
}

Apr 23 2016

rikki cattermole <rikki cattermole.co.nz> writes:

On 23/04/2016 10:40 PM, salvari wrote:
 Hello all!

 I'm trying to read a csv file (';' as separator) with very long lines.

 It seems really simple: I read the column names with no problem.
 But as soon as the program parses the first line of data, the array
 containing the column names seems to be overwritten.

 I'm using dmd: DMD64 D Compiler v2.071.0

 My code:

 import std.stdio;
 import std.algorithm;
 import std.array;

 char[][] columns;


 void main() {
   LINE:foreach(line; stdin.byLine()){
      if(line.startsWith("Interfaz")){
        writeln("IN HERE");
        columns = line.split(";");
        writeln(columns);               // Everything seems to be ok
        continue;
      } else{
        auto linedata = line.split(";");
        writefln("My line: %s", line);        // Fine.
        writefln("LineData: %s", linedata);   // Fine. Line data is ok
        writefln("Columns: %s", columns);     // Wrong!!! columns array
                                              // contains garbage data
                                              // from linedata
      }
    }
 }

It's probably using a buffer.
columns = line.dup.split(";");
Should fix it.

Apr 23 2016

salvari <salvari gmail.com> writes:

Fixed!!!

Thanks a lot. :-)


But I have to think about this. I don't understand the failure.

Apr 23 2016

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Saturday, 23 April 2016 at 10:57:04 UTC, salvari wrote:
 Fixed!!!

 Thanks a lot. :-)


 But I have to think about this. I don't understand the failure.

stdin.byLine() reuses its buffer, so the old arrays in columns 
point to the data in byLine's buffer and get overwritten by 
subsequent calls.

Also, if you're trying to parse CSV, check out std.csv.

From the docs:

import std.csv;
import std.stdio;

struct Layout
{
     string name;
     int value;
     double other;
}

void main()
{
     string str = "Hello;65;63.63\nWorld;123;3673.562";

     // Each record is converted to a Layout; ';' is the field separator.
     auto records = csvReader!Layout(str,';');

     foreach(record; records)
     {
          writeln(record.name);
          writeln(record.value);
          writeln(record.other);
     }
}
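
For a file like the one in the question, where the column names sit on a header line, std.csv can also read the header itself. A minimal sketch, assuming the header is the very first line of the input; columnValues and the sample data are made up for illustration:

```d
import std.csv : csvReader;
import std.stdio : writeln;

// Hypothetical helper: returns every value found under the column `name`.
// Passing null as the header argument asks csvReader to take the column
// names from the first line; each record is then a string[string].
string[] columnValues(string text, string name)
{
    string[] values;
    foreach (record; csvReader!(string[string])(text, null, ';'))
        values ~= record[name];
    return values;
}

void main()
{
    // Made-up sample data with ';' as separator.
    string text = "Name;Age\nAnn;30\nBob;41";
    writeln(columnValues(text, "Age"));
}
```

If the header is not the first line (the code in the question looks for a line starting with "Interfaz"), the lines before it would have to be skipped first.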

Apr 23 2016

salvari <salvari gmail.com> writes:

On Saturday, 23 April 2016 at 11:13:19 UTC, Nicholas Wilson wrote:
 On Saturday, 23 April 2016 at 10:57:04 UTC, salvari wrote:
 Fixed!!!

 Thanks a lot. :-)


 But I have to think about this. I don't understand the failure.

 stdin.byLine() reuses its buffer. so the old arrays in columns 
 point to the data in byLine's buffer and they get overwritten 
 by subsequent calls.

 Also if you're trying to parse csv check out std.csv

 from the docs

 string str = "Hello;65;63.63\nWorld;123;3673.562";
 struct Layout
 {
     string name;
     int value;
     double other;
 }

 auto records = csvReader!Layout(str,';');

 foreach(record; records)
 {
     writeln(record.name);
     writeln(record.value);
     writeln(record.other);
 }

Thanks for your clue on std.csv!

I think I will use it a lot. I totally missed it.

Apr 23 2016

rikki cattermole <rikki cattermole.co.nz> writes:

On 23/04/2016 10:57 PM, salvari wrote:
 Fixed!!!

 Thanks a lot. :-)


 But I have to think about this. I don't understand the failure.

.dup duplicates memory.
That is, it allocates a new block of memory and copies the 
values across.

What byLine does is read up to \n and copy that into a memory buffer.
You then get access to that buffer as line.
It reuses the memory holding the line, so there are no allocations 
beyond the first, apart from growing the buffer for a longer line.
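
The effect can be reproduced without stdin. A small sketch, assuming a plain char[] stands in for byLine's internal buffer; splitAliased and splitCopied are names made up for this example:

```d
import std.array : split;

// Hypothetical helper: the returned slices point INTO buf, nothing is copied.
char[][] splitAliased(char[] buf)
{
    return buf.split(";");
}

// Hypothetical helper: .dup copies the characters first, so the returned
// slices refer to fresh memory that nobody else writes to.
char[][] splitCopied(char[] buf)
{
    return buf.dup.split(";");
}

void main()
{
    // Stand-in for the buffer that byLine reuses between lines.
    char[] buffer = "name;value".dup;

    auto aliased = splitAliased(buffer);
    auto copied  = splitCopied(buffer);

    // Simulate the next line being read into the same memory.
    buffer[] = 'X';

    assert(aliased[0] == "XXXX"); // the slices changed along with the buffer
    assert(copied[0] == "name");  // the copy is unaffected
}
```

This is the same thing that happens to columns in the original program: its slices keep pointing at memory that the next byLine iteration overwrites.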

Apr 23 2016

salvari <salvari gmail.com> writes:

On Saturday, 23 April 2016 at 11:18:08 UTC, rikki cattermole 
wrote:
 On 23/04/2016 10:57 PM, salvari wrote:
 Fixed!!!

 Thanks a lot. :-)


 But I have to think about this. I don't understand the failure.

 .dup duplicates memory.
 What this means is, it allocates a new block of memory and 
 copies the values across.

 What byLine does is, read up to \n and copies it into a buffer 
 of memory.
 Then you get access to said buffer aka line.
 So it reuses the memory containing said line, meaning no 
 allocations beyond the first and growth of it.

Now I understand. Slices are still biting me every now and then.

Apr 23 2016

Ivan Kazmenko <gassa mail.ru> writes:

On Saturday, 23 April 2016 at 10:40:13 UTC, salvari wrote:
 It seems really simple: I read the column names with no 
 problem. But as soon as the program parses the first line of 
 data, the array containing the column names seems to be 
 overwritten.

Another possibility not yet mentioned is to change
foreach(line; stdin.byLine())
into
foreach(line; stdin.byLineCopy())
so that the older lines' contents remain available after you read 
the next line.
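
A rough sketch of that approach, assuming the first line of the input holds the column names; readRows is a name made up for this example. byLineCopy yields a freshly allocated string per line, so the fields can be stored safely:

```d
import std.array : split;
import std.stdio : File, stdin, writefln;

// Hypothetical helper: reads every line of `input` and splits it on ';'.
// Each line from byLineCopy is its own string, so keeping the slices
// around after the next line is read is safe.
string[][] readRows(File input)
{
    string[][] rows;
    foreach (line; input.byLineCopy())
        rows ~= line.split(";");
    return rows;
}

void main()
{
    auto rows = readRows(stdin);
    if (rows.length == 0)
        return;

    auto columns = rows[0]; // assumption: the header is the first line
    foreach (row; rows[1 .. $])
        writefln("Columns: %s  LineData: %s", columns, row);
}
```

Note that this keeps the whole file in memory. For very large inputs it is probably better to store only the header and process each data line inside the loop.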

Apr 23 2016
