digitalmars.D - csvReader read file byLine()?

Jens Mueller <jens.k.mueller gmx.de> writes:
Hi,

I used std.csv for reading a CSV file.
Thanks a lot to Jesse for writing and adding std.csv to Phobos.
Using it is fairly straightforward but I miss one thing. Very commonly
you need to read a CSV file. With std.csv that boils down to

auto records = csvReader!(Record)(readText(filename));

But csvReader won't parse from File(filename, "r").byLine() even though
that is an InputRange, isn't it? That means I always have to use
readText. All IO happens that very moment.
Shouldn't csvReader support lazy reading from a file, like this?

auto file = File(filename, "r");
auto records = csvReader!(Record)(file.byLine());

Am I missing something? Was this left out for a reason or an oversight?

Jens
Jun 21 2012
Timon Gehr <timon.gehr gmx.ch> writes:
You might make use of std.algorithm.joiner.
Jun 21 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/22/12 10:44 AM, Jens Mueller wrote:
 Thanks. That works. But this should be either mentioned in the
 documentation or fixed. I would prefer a fix because the code above
 looks like a work around. Probably byLine or joiner then need some
 fixing. What do you think?

Yah, it's a bug in joiner that Walter also found.

Andrei
Jun 27 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Timon Gehr wrote:
 You might make use of std.algorithm.joiner.

You mean like

auto file = File(filename, "r");
auto records = csvReader!(Record)(joiner(file.byLine(KeepTerminator.yes)));

Then a CSVException is raised. Don't know why. Have to investigate.
Thanks for the pointer.
BTW std.stdio should publicly import std.string.KeepTerminator,
shouldn't it? Otherwise you have to import it yourself.

Jens
Jun 21 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Timon Gehr wrote:
 You might make use of std.algorithm.joiner.

The problem is that csvReader expects a range with elements of type
dchar. Any idea why that is required for CSV parsing?

Jens
Jun 21 2012
travert phare.normalesup.org (Christophe Travert) writes:
Jens Mueller, in message (digitalmars.D:170448), wrote:
 Jesse Phillips wrote:
 Yes, and it seems .joiner isn't as lazy as I'd have thought. byLine()
 reuses its buffer so it will overwrite previous lines in the file. This
 can be resolved by mapping a dup to it.

 import std.stdio;
 import std.algorithm;
 import std.csv;

 void main() {
     struct Record {
         double one;
     }
     auto filename = "file.csv";
     auto file = File(filename, "r");
     auto input = map!(a => a.idup)(file.byLine()).joiner("\n");
     auto records = csvReader!Record(input);
     foreach(r; records) {
         writeln(r);
     }
 }

 Thanks. That works. But this should be either mentioned in the
 documentation or fixed. I would prefer a fix because the code above
 looks like a workaround. Probably byLine or joiner then need some
 fixing. What do you think?

 Jens

Yes, and that increases GC usage a lot.

Looking at the implementation, joiner has a behavior that is incompatible
with ranges reusing some buffer:

joiner immediately calls the range of ranges' popFront after having
taken its front range. Instead, it should wait until it is necessary
before calling popFront (at least until all the data has been read by the
next tool of the chain).

Fixing this should not be very hard. Is there an issue preventing this
change?

--
Christophe
Jun 22 2012
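
The deferral Christophe describes can be sketched roughly as below. This is
only an illustration under stated assumptions, not the change that went into
std.algorithm.joiner: LazyJoiner and lazyJoiner are hypothetical names, and
the sketch takes no separator, so it assumes any line terminators are still
present in the inner ranges.

```d
import std.algorithm.comparison : equal;
import std.range.primitives;

// Hypothetical illustration, not Phobos code: a separator-less joiner that
// advances the outer range only after the current inner range is used up.
struct LazyJoiner(RoR)
    if (isInputRange!RoR && isInputRange!(ElementType!RoR))
{
    private RoR outer;
    private ElementType!RoR current;
    private bool loaded; // true while `current` views outer.front

    this(RoR outer) { this.outer = outer; }

    // Make `current` a non-empty inner range, or leave `loaded` false
    // when nothing is left.
    private void settle()
    {
        for (;;)
        {
            if (loaded)
            {
                if (!current.empty) return;
                outer.popFront(); // deferred until the inner range is consumed
                loaded = false;
            }
            if (outer.empty) return;
            current = outer.front;
            loaded = true;
        }
    }

    @property bool empty() { settle(); return !loaded; }
    @property auto front() { settle(); return current.front; }
    void popFront() { settle(); current.popFront(); }
}

auto lazyJoiner(RoR)(RoR r) { return LazyJoiner!RoR(r); }

void main()
{
    // Empty inner ranges are skipped; elements come out in order.
    assert(lazyJoiner(["ab", "", "cd"]).equal("abcd"));
}
```

Because outer.popFront() is only reached once the current inner range is
empty, a source that reuses its buffer is not asked for the next line while
the previous one is still being read.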
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/22/12 11:11 AM, Christophe Travert wrote:
 Looking at the implementation, joiner has a behavior that is incompatible
 with ranges reusing some buffer:

 joiner immediately calls the range of ranges' popFront after having
 taken its front range. Instead, it should wait until it is necessary
 before calling popFront (at least until all the data has been read by the
 next tool of the chain).

 Fixing this should not be very hard. Is there an issue preventing this
 change?

No, it should be easily fixable.

Andrei
Jun 27 2012
"Jesse Phillips" <jessekphillips+D gmail.com> writes:
On Thursday, 21 June 2012 at 20:30:07 UTC, Jens Mueller wrote:

auto file = File(filename, "r");
auto records = csvReader!(Record)(file.byLine());

Am I missing something? Was this left out for a reason or an 
oversight?

Jens

 You might make use of std.algorithm.joiner.

 The problem is that csvReader expects a range with elements of type
 dchar. Any idea why that is required for CSV parsing?

 Jens

It requires a dchar range so that Unicode support is enforced. It is the
same reason char[] is a range of dchar.

You'll have to give me some example code, my test has no issue using
joiner with byLine.

import std.stdio;
import std.algorithm;
import std.csv;

void main() {
    struct Record {
        string one, two, three;
    }
    auto filename = "file.csv";
    auto file = File(filename, "r");
    auto records = csvReader!Record(file.byLine().joiner("\n"));
    foreach(r; records) {
        writeln(r);
    }
}
Jun 21 2012
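
The remark that char[] is a range of dchar can be checked with a small
sketch like the following. It relies on the range primitives Phobos provides
for arrays; the sample string is made up for illustration.

```d
import std.range.primitives : ElementType, front, walkLength;

void main()
{
    // As ranges, narrow strings yield decoded code points, not code units.
    static assert(is(ElementType!string == dchar));
    static assert(is(ElementType!(char[]) == dchar));

    string field = "\u00E9,1";       // hypothetical CSV fragment
    assert(field.length == 4);       // UTF-8 code units (U+00E9 takes two)
    assert(field.walkLength == 3);   // elements seen by a range consumer
    assert(field.front == '\u00E9'); // first element is a whole code point
}
```

A parser written against dchar elements therefore never sees half of a
multi-byte character, which is the Unicode enforcement mentioned above.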
Jens Mueller <jens.k.mueller gmx.de> writes:
Jesse Phillips wrote:
 It requires a dchar range so that Unicode support is enforced. It is the
 same reason char[] is a range of dchar.

 You'll have to give me some example code, my test has no issue using
 joiner with byLine.

 import std.stdio;
 import std.algorithm;
 import std.csv;

 void main() {
     struct Record {
         string one, two, three;
     }
     auto filename = "file.csv";
     auto file = File(filename, "r");
     auto records = csvReader!Record(file.byLine().joiner("\n"));
     foreach(r; records) {
         writeln(r);
     }
 }

auto file = File("test.csv", "r");
auto records = csvReader!double(file.byLine().joiner("\n"));
writeln(records);

The last line throws a CSVException due to some conversion error
'Floating point conversion error for input "".' for the attached input.

If you change the input to
3.0
4.0
you get no exception but a wrong output of
[[4], [4]]
.

Using readText or
auto records = csvReader!Record(["3.00", "4.0"].joiner("\n"));
works as expected.
Can you reproduce the issue?
I'm running dmd2.059 on Linux.

Thanks.

Jens
Jun 22 2012
"Jesse Phillips" <jessekphillips+D gmail.com> writes:
On Friday, 22 June 2012 at 08:12:59 UTC, Jens Mueller wrote:
 The last line throws a CSVException due to some conversion error
 'Floating point conversion error for input "".' for the 
 attached input.

 If you change the input to
 3.0
 4.0
 you get no exception but a wrong output of
 [[4], [4]]
 .

Yes, and it seems .joiner isn't as lazy as I'd have thought. byLine()
reuses its buffer so it will overwrite previous lines in the file. This
can be resolved by mapping a dup to it.

import std.stdio;
import std.algorithm;
import std.csv;

void main() {
    struct Record {
        double one;
    }
    auto filename = "file.csv";
    auto file = File(filename, "r");
    auto input = map!(a => a.idup)(file.byLine()).joiner("\n");
    auto records = csvReader!Record(input);
    foreach(r; records) {
        writeln(r);
    }
}
Jun 22 2012
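
The buffer reuse described above can be seen in isolation, without std.csv
or joiner. The following is a minimal sketch; copiedLines, aliasedLines and
the scratch file name are hypothetical and exist only for this illustration.

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, write;
import std.stdio : File, writeln;

// Copies every line out of byLine()'s reused buffer before storing it.
string[] copiedLines(string name)
{
    return File(name).byLine().map!(a => a.idup).array;
}

// Stores the slices exactly as handed out; they may alias the same buffer,
// so earlier entries can be overwritten by later reads.
char[][] aliasedLines(string name)
{
    char[][] lines;
    foreach (line; File(name).byLine())
        lines ~= line;
    return lines;
}

void main()
{
    enum name = "byline_demo.csv"; // hypothetical scratch file
    write(name, "3.0\n4.0\n");
    scope (exit) remove(name);

    writeln(aliasedLines(name)); // contents not guaranteed
    writeln(copiedLines(name));
    assert(copiedLines(name) == ["3.0", "4.0"]);
}
```

The idup in copiedLines is the same copy the workaround maps over byLine();
it costs one allocation per line, which is the price of keeping each line
valid after the next read.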
Jens Mueller <jens.k.mueller gmx.de> writes:
Jesse Phillips wrote:
 Yes, and it seems .joiner isn't as lazy as I'd have thought. byLine()
 reuses its buffer so it will overwrite previous lines in the file. This
 can be resolved by mapping a dup to it.

 import std.stdio;
 import std.algorithm;
 import std.csv;

 void main() {
     struct Record {
         double one;
     }
     auto filename = "file.csv";
     auto file = File(filename, "r");
     auto input = map!(a => a.idup)(file.byLine()).joiner("\n");
     auto records = csvReader!Record(input);
     foreach(r; records) {
         writeln(r);
     }
 }

Thanks. That works. But this should be either mentioned in the
documentation or fixed. I would prefer a fix because the code above
looks like a workaround. Probably byLine or joiner then need some
fixing. What do you think?

Jens
Jun 22 2012
"Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Friday, 22 June 2012 at 15:11:14 UTC, 
travert phare.normalesup.org (Christophe Travert) wrote:

 Fixing this should not be very hard. Is there an issue preventing this
 change?

I'd say start by filing a bug that joiner does not work with File.byLine()
Jun 22 2012
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, June 22, 2012 19:33:39 Jesse Phillips wrote:
 On Friday, 22 June 2012 at 15:11:14 UTC,
 travert phare.normalesup.org (Christophe Travert) wrote:
 Fixing this should not be very hard. Is there an issue preventing this
 change?

I'd say start by filing a bug that joiner does not work with File.byLine()

http://d.puremagic.com/issues/show_bug.cgi?id=8085 - Jonathan M Davis
Jun 22 2012