www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Reminder: Two days left to review Jesse Phillips' CSV Parser

reply dsimcha <dsimcha yahoo.com> writes:
As a reminder, the review of Jesse Phillips' CSV parser ends at the end 
of Friday and will be followed by one week of voting.  Please speak up 
now about any remaining issues.
Nov 09 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 11/9/2011 8:02 PM, dsimcha wrote:
 As a reminder, the review of Jesse Phillips' CSV parser ends at the end
 of Friday and will be followed by one week of voting. Please speak up
 now about any remaining issues.
BTW, here's my second round of review now that some changes have been made: Design-level stuff: How do I use csvReader to process a CSV file as it's being read in (i.e. I don't want to ever have the whole file in memory)? I guess I could do something like: import std.algorithm, std.stdio, std.csv, std.conv; void main() { auto file = File("foo.csv"); auto chars = joiner(file.byLine(), '\n'); auto reader = csvReader(chars); } If this would work, though, then it should be documented. I'd also like to see csvReader tested with lowest-common-denominator input ranges instead of just strings, which are bidirectional ranges and can be treated as random access if they only have ASCII characters. Also, IMHO we really need a ByChar range in std.stdio instead of the joiner kludge I used, though I'm not sure about the details with regard to how Unicode support should work. Still no support for writing CSVs? I understand writing is complicated from an API design perspective. Maybe it would be easiest to just provide an enquote() function that adds quotes to a field and escapes any that are already inside the field and let the user handle the higher level stuff, since this is pretty simple. Minor documentation nitpicks: CSVException: IncompletCellException -> IncompleteCellException (typo) The first csvReader example: You declare the variable count but don't use it. Records, Record: front(), empty(), popFront() should not mention std.range.InputRange. These interfaces are virtual function-based interfaces in case someone needs virtual function-based access to a range. They are not used in most cases when ranges are used because templates are usually better.
Nov 09 2011
parent Jesse Phillips <jessekphillips+d gmail.com> writes:
On Wed, 09 Nov 2011 20:46:51 -0500, dsimcha wrote:

 On 11/9/2011 8:02 PM, dsimcha wrote:
 As a reminder, the review of Jesse Phillips' CSV parser ends at the end
 of Friday and will be followed by one week of voting. Please speak up
 now about any remaining issues.
BTW, here's my second round of review now that some changes have been made: Design-level stuff: How do I use csvReader to process a CSV file as it's being read in (i.e. I don't want to ever have the whole file in memory)? I guess I could do something like: import std.algorithm, std.stdio, std.csv, std.conv; void main() { auto file = File("foo.csv"); auto chars = joiner(file.byLine(), '\n'); auto reader = csvReader(chars); } If this would work, though, then it should be documented.
I do not see why the CSV documentation should be the place to document how to read a file in parts. As we don't have byChar and this is non- trivial workaround I could see wanted to have it somewhere and CSV parsing might be a place to apply it. What really don't like about it is that you don't get to see the input. It would be like std.algorithm showing how sorting worked by showing how to read in an array from a file and sorting it. This is something people need to know how to do so we should have a good tutorial on it, but I don't think this is the place to document it (stdio or std.file is more appropriate). If you strongly disagree I will find some place to throw in another example.
 I'd also like
 to see csvReader tested with lowest-common-denominator input ranges
 instead of just strings, which are bidirectional ranges and can be
 treated as random access if they only have ASCII characters.
Yeah, through in a unittest wrapping a wstring with only input range interface.
 Also, IMHO
 we really need a ByChar range in std.stdio instead of the joiner kludge
 I used, though I'm not sure about the details with regard to how Unicode
 support should work.
I thought there was an undocumented such function, but maybe not.
 Still no support for writing CSVs?  I understand writing is complicated
 from an API design perspective.  Maybe it would be easiest to just
 provide an enquote() function that adds quotes to a field and escapes
 any that are already inside the field and let the user handle the higher
 level stuff, since this is pretty simple.
I'm going to make something, try a couple ideas out. Thanks for the naming suggestion. I ended up deciding not to do anything on it as even if I did get it done, it wouldn't have appropriate review time. So I'll let the voting be the deciding factor on whether the library is delayed because it doesn't have csvFormatter. After more thought I think an enquote() type function must exist even with a csvFormatter, but even laying out documentation on that isn't trivial.
 Minor documentation nitpicks:
 
 CSVException:  IncompletCellException -> IncompleteCellException (typo)
 
 The first csvReader example:  You declare the variable count but don't
 use it.
 
 Records, Record:  front(), empty(), popFront() should not mention
 std.range.InputRange.  These interfaces are virtual function-based
 interfaces in case someone needs virtual function-based access to a
 range.  They are not used in most cases when ranges are used because
 templates are usually better.
Yes, my bad. I keep forgetting we have such interfaces there. I've instead pointed it to isInputRange which is what I meant to do. But then I still call it an interface... which it is, but I don't know what we want to call them. Concepts? :D I want this link in there so if you have better suggestions... Thank you.
Nov 09 2011
prev sibling next sibling parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 10/11/11 02.02, dsimcha wrote:
 As a reminder, the review of Jesse Phillips' CSV parser ends at the end
 of Friday and will be followed by one week of voting. Please speak up
 now about any remaining issues.
I've not have a look at the code itself yet. But a couple of comments about the API: auto csvReader(Contents = string, Range, Heading, Separator = char)(Range input, Heading heading, Separator delimiter = ',', Separator quote = '"'); 1, I was under the impression that declaring the return value as auto for std library is not good because it does not document what the return value actually is. Or maybe it is good enough that it is mentioned in the comments under "Returns: ..."? 2, The requirement for passing "cast(string[]) null" as the heading parameter to signal that heading is there but should be ignored seems a bit awkward. Maybe an overload for csvReader where you remove the constraints: isForwardRange!Heading && isSomeString!(ElementType!Heading) and set is(Heading:void*) instead. And in there just call the normal csvReader with cast(string[])null. Btw. if any of this has already been brought up then sorry about that. /Jonas
Nov 10 2011
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, November 10, 2011 06:04 Jonas Drewsen wrote:
 On 10/11/11 02.02, dsimcha wrote:
 As a reminder, the review of Jesse Phillips' CSV parser ends at the end
 of Friday and will be followed by one week of voting. Please speak up
 now about any remaining issues.
I've not have a look at the code itself yet. But a couple of comments about the API: auto csvReader(Contents = string, Range, Heading, Separator = char)(Range input, Heading heading, Separator delimiter = ',', Separator quote = '"'); 1, I was under the impression that declaring the return value as auto for std library is not good because it does not document what the return value actually is. Or maybe it is good enough that it is mentioned in the comments under "Returns: ..."?
Then go look at std.algorithm. It returns auto all over the place. That's usually what you do when returning a new range type. The documentation then says that it's returning a range. And that's essentially what the docs do in this case. It would be horrible to have to give the return type for many functions that return ranges. They're frequently hideous templates that would just scare people (which is how std.algorithm's documentation used to look). All you need to know is that it's a range, not what the exact type is. - Jonathan M Davis
Nov 10 2011
parent Jonas Drewsen <jdrewsen nospam.com> writes:
Den 10-11-2011 19:05, Jonathan M Davis skrev:
 On Thursday, November 10, 2011 06:04 Jonas Drewsen wrote:
 On 10/11/11 02.02, dsimcha wrote:
 As a reminder, the review of Jesse Phillips' CSV parser ends at the end
 of Friday and will be followed by one week of voting. Please speak up
 now about any remaining issues.
I've not have a look at the code itself yet. But a couple of comments about the API: auto csvReader(Contents = string, Range, Heading, Separator = char)(Range input, Heading heading, Separator delimiter = ',', Separator quote = '"'); 1, I was under the impression that declaring the return value as auto for std library is not good because it does not document what the return value actually is. Or maybe it is good enough that it is mentioned in the comments under "Returns: ..."?
Then go look at std.algorithm. It returns auto all over the place. That's usually what you do when returning a new range type. The documentation then says that it's returning a range. And that's essentially what the docs do in this case. It would be horrible to have to give the return type for many functions that return ranges. They're frequently hideous templates that would just scare people (which is how std.algorithm's documentation used to look). All you need to know is that it's a range, not what the exact type is. - Jonathan M Davis
Ok - better go change curl range functions back to auto then then :)
Nov 10 2011
prev sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Thu, 10 Nov 2011 15:04:47 +0100, Jonas Drewsen wrote:

 On 10/11/11 02.02, dsimcha wrote:
 As a reminder, the review of Jesse Phillips' CSV parser ends at the end
 of Friday and will be followed by one week of voting. Please speak up
 now about any remaining issues.
I've not have a look at the code itself yet. But a couple of comments about the API: auto csvReader(Contents = string, Range, Heading, Separator = char)(Range input, Heading heading, Separator delimiter = ',', Separator quote = '"'); 1, I was under the impression that declaring the return value as auto for std library is not good because it does not document what the return value actually is. Or maybe it is good enough that it is mentioned in the comments under "Returns: ..."?
Quite simply, I don't know what it returns. The discussion was on examples using auto, which I also do for basically the same reason. Everyone was in agreement that complicated template returns deserve the use of auto, but I didn't get an answer why _complicated_ types get auto but simple ones can't use it. Luckily mine is complicated.
 2, The requirement for passing "cast(string[]) null" as the heading
 parameter to signal that heading is there but should be ignored seems a
 bit awkward.
 Maybe an overload for csvReader where you remove the constraints:
 isForwardRange!Heading && isSomeString!(ElementType!Heading) and set
 is(Heading:void*) instead. And in there just call the normal csvReader
 with cast(string[])null.
You should be able to pass null, but there is a bug in dmd. You might have a good workaround for that though. Well, it seems I just have to assume that void* means they are passing null. Thank you.
 Btw. if any of this has already been brought up then sorry about that.
 
 /Jonas
Nov 10 2011
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 11/11/2011 06:49 AM, Jesse Phillips wrote:
 On Thu, 10 Nov 2011 15:04:47 +0100, Jonas Drewsen wrote:

 On 10/11/11 02.02, dsimcha wrote:
 As a reminder, the review of Jesse Phillips' CSV parser ends at the end
 of Friday and will be followed by one week of voting. Please speak up
 now about any remaining issues.
I've not have a look at the code itself yet. But a couple of comments about the API: auto csvReader(Contents = string, Range, Heading, Separator = char)(Range input, Heading heading, Separator delimiter = ',', Separator quote = '"'); 1, I was under the impression that declaring the return value as auto for std library is not good because it does not document what the return value actually is. Or maybe it is good enough that it is mentioned in the comments under "Returns: ..."?
Quite simply, I don't know what it returns. The discussion was on examples using auto, which I also do for basically the same reason. Everyone was in agreement that complicated template returns deserve the use of auto, but I didn't get an answer why _complicated_ types get auto but simple ones can't use it. Luckily mine is complicated.
 2, The requirement for passing "cast(string[]) null" as the heading
 parameter to signal that heading is there but should be ignored seems a
 bit awkward.
 Maybe an overload for csvReader where you remove the constraints:
 isForwardRange!Heading&&  isSomeString!(ElementType!Heading) and set
 is(Heading:void*) instead. And in there just call the normal csvReader
 with cast(string[])null.
You should be able to pass null, but there is a bug in dmd. You might have a good workaround for that though. Well, it seems I just have to assume that void* means they are passing null. Thank you.
Kenji has already written a pull request to give null it's own type. Therefore, make sure not to hardcode void*, use typeof(null).
Nov 11 2011
prev sibling parent Dejan Lekic <dejan.lekic gmail.com> writes:
dsimcha wrote:

 As a reminder, the review of Jesse Phillips' CSV parser ends at the end
 of Friday and will be followed by one week of voting.  Please speak up
 now about any remaining issues.
I humbly believe the package for CSV parser is wrong. It should either be std.file.format.csv, std.parser.csv or (my personal preference) std.data.storage.csv . Why I prefer the latest? - CSV is ultimately data storage format. OK, it is often used for *data* exchange, but even then it is a *data-storage*. The fact that MariaDB has it as a storage (CSV storage engine) does not add weight to my argument, but it supports what I am trying to say. :) std.csv is just wrong. I know, it is nice and short, less typing etc, but still wrong. Sure this really is irrelevant to this discussion as csv module can be moved around any time. :) Kind regards
Nov 10 2011