digitalmars.D.learn - csvReader & specifying separator problems...

Robert M. Münch <robert.muench saphirion.com> writes:

Just trying a very simple thing and it's pretty hard: "Read a CSV file 
(raw_data) that has a ; separator so that I can iterate over the lines 
and access the fields."

	csv_data = raw_data.byLine.joiner("\n")

From the docs, which I find extremely hard to understand:

auto csvReader(Contents = string, Malformed ErrorLevel = 
Malformed.throwException, Range, Separator = char)(Range input, 
Separator delimiter = ',', Separator quote = '"')

So, let's see if I can decipher this, step-by-step by trying out:

	csv_records = csv_data.csvReader();

Would split the CSV data into iterable CSV records using ',' char as 
separator using UFCS syntax. When running this I get:

	std.csv.CSVException /Library/D/dmd/src/phobos/std/csv.d(1283): Row 
1's length 0 does not match previous length of 1.

Which indicates some problem because not all fields are set in my CSV 
data. So let's ignore any error by specifying Malformed.ignore;

	csv_records = csv_data.csvReader(Malformed.ignore);

And now I'm lost (just showing the first candidate):

Error: template std.csv.csvReader cannot deduce function from argument 
types !()(Result, Malformed), candidates are:
/Library/D/dmd/src/phobos/std/csv.d(327):        csvReader(Contents = 
string, Malformed ErrorLevel = Malformed.throwException, Range, 
Separator = char)(Range input, Separator delimiter = ',', Separator 
quote = '"')
  with Contents = string,
       ErrorLevel = cast(Malformed)1,
       Range = Result,
       Separator = Malformed
  whose parameters have the following constraints:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    isInputRange!Range
    is(Unqual!(ElementType!Range) == dchar)
  > isSomeChar!Separator
  - !is(Contents T : T[U], U : string)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The docs state Malformed as 2nd parameter, since I use UFCS I assume 
that this becomes the first parameter. I don't understand what the 3rd 
parameter (Range) is about. 4th parameter is my separator, which I need 
to set to ';' somehow.

But from the error message, it looks like DMD tries to use 
Malformed.ignore as the 4th (!!) Parameter being the Separator.

I'm totally confused:

* What is used as the 3rd parameter by DMD? Where does it come from?
* How to specify a ';' separator?

This is all pretty confusing...

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster

Nov 14 2019

Mike Parker <aldacron gmail.com> writes:

On Thursday, 14 November 2019 at 12:25:30 UTC, Robert M. Münch 
wrote:

 From the docs, which I find extremely hard to understand:

 auto csvReader(Contents = string, Malformed ErrorLevel = 
 Malformed.throwException, Range, Separator = char)(Range input, 
 Separator delimiter = ',', Separator quote = '"')


Contents, ErrorLevel, Range, and Separator are template (i.e. 
compile-time) parameters. Input, delimiter, and quote are 
function (i.e. runtime) parameters.

 So, let's see if I can decipher this, step-by-step by trying 
 out:

 	csv_records = csv_data.csvReader();


Here, you aren't providing any template parameters and only the 
first function parameter, so it's equivalent to calling the 
function like so:

csvReader!(string, Malformed.throwException, typeof(csv_data), 
char)(csv_data, ',', '"');


 Which indicates some problem because not all fields are set in 
 my CSV data. So let's ignore any error by specifying 
 Malformed.ignore;

 	csv_records = csv_data.csvReader(Malformed.ignore);


csv_records = csv_data.csvReader!(string, Malformed.ignore)();



 The docs state Malformed as 2nd parameter, since I use UFCS I 
 assume that this becomes the first parameter. I don't


Malformed is the 2nd template parameter, your UFCS value is the 
first function parameter.

 understand what the 3rd parameter (Range) is about.

Range is the type of the first parameter. It's common outside of 
Phobos to use T and U for template types, but any valid symbol name 
can be used. This template has three type parameters which are 
named according to their purpose (Contents, Range, and 
Separator). Since Range is also the type of the first function 
parameter, the compiler will infer the type if you don't specify 
it.
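
To make the split between template and function parameters concrete, here is a minimal sketch. The function describe is hypothetical (it is not part of Phobos); it only mirrors the shape of csvReader's signature and reports which types the compiler ended up with. It also suggests why the error message above showed Separator = Malformed: whatever is passed as the second function argument determines the Separator type.

```d
import std.stdio;

// Hypothetical stand-in for csvReader (not part of Phobos): it has the same
// mix of defaulted and deduced template parameters and simply reports which
// types the compiler picked.
string describe(Contents = string, Range, Separator = char)
               (Range input, Separator delimiter = ',')
{
    return Contents.stringof ~ ", " ~ Range.stringof ~ ", " ~ Separator.stringof;
}

void main()
{
    // Nothing specified: defaults are used and Range is deduced from the argument.
    writeln(describe("a,b"));
    // The second function argument decides Separator, whatever its type is.
    writeln(describe("a,b", 42));
    // Template arguments go after '!', function arguments in the parentheses.
    writeln("a;b".describe!(int)(';'));
}
```

In the real csvReader, a constraint (isSomeChar!Separator) rejects the second kind of call, which is what the compiler error reports.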


 4th parameter is my separator, which I need to set to ';' 
 somehow.

The fourth _template_ parameter is the _type_ of your separator 
(and is set to default to char) not the actual separator. You 
want to set the delimiter, which is the second _function_ 
parameter.

csv_records = csv_data.csvReader!(string, Malformed.ignore)(';');
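
Put together as a small program, this might look like the sketch below. The helper name readRows and the sample data are made up for illustration, and a plain string stands in for the file contents; only csvReader, Malformed.ignore and the ';' delimiter come from the discussion above.

```d
import std.array : array;
import std.csv;
import std.stdio;

// Hypothetical helper: collects every record of semicolon-separated text
// into an array of string fields.
string[][] readRows(string text)
{
    string[][] rows;
    // Template arguments after '!', the runtime delimiter in the parentheses.
    foreach (record; text.csvReader!(string, Malformed.ignore)(';'))
        rows ~= record.array;
    return rows;
}

void main()
{
    // Made-up sample data with an empty field in the second row.
    foreach (row; readRows("name;city;age\nAnna;;31\nBen;Oslo;27"))
        writeln(row);
}
```

Each record is consumed completely (via array) before the loop advances to the next one, since the records are read lazily from the same underlying input.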

Nov 14 2019

Robert M. Münch <robert.muench saphirion.com> writes:

On 2019-11-14 13:08:10 +0000, Mike Parker said:

 Contents, ErrorLevel, Range, and Separator are template (i.e. 
 compile-time) parameters. Input, delimiter, and quote are function 
 (i.e. runtime) parameters.

Mike, thanks a lot... I feel like an idiot. As a casual D programmer, the 
template syntax is not so easy to get used to because it's not so 
distinguishable.

However, your explanation helps a lot to make things much more clear now.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster

Nov 14 2019

Jon Degenhardt <jond noreply.com> writes:

On Thursday, 14 November 2019 at 12:25:30 UTC, Robert M. Münch 
wrote:
 Just trying a very simple thing and it's pretty hard: "Read a 
 CSV file (raw_data) that has a ; separator so that I can 
 iterate over the lines and access the fields."

 	csv_data = raw_data.byLine.joiner("\n")

 From the docs, which I find extremely hard to understand:

 auto csvReader(Contents = string, Malformed ErrorLevel = 
 Malformed.throwException, Range, Separator = char)(Range input, 
 Separator delimiter = ',', Separator quote = '"')

 So, let's see if I can decipher this, step-by-step by trying 
 out:

 	csv_records = csv_data.csvReader();

 Would split the CSV data into iterable CSV records using ',' 
 char as separator using UFCS syntax. When running this I get:

 [...]

Side comment - This code looks like it was taken from the first 
example in the std.csv documentation. To me, the code in the 
std.csv example is doing something that might not be obvious at 
first glance and is potentially confusing.

In particular, 'byLine' is not reading individual CSV records. 
CSV can have embedded newlines; these are identified by CSV 
escape syntax. 'byLine' doesn't know the escape syntax. If there 
are embedded newlines, 'byLine' will read partial records, which 
may not be obvious at first glance. The .joiner("\n") step puts 
the newline back, stitching fields and records back together 
again in the process.

The effect is to create an input range of characters representing 
the entire file, using 'byLine' to do buffered reads. This input 
range is passed to csvReader.

This could also be done using 'byChunk' and 'joiner' (with no 
separator). This would use a fixed size buffer, no searching for 
newlines while reading, so it should be faster.

An example:

==== csv_by_chunk.d ====
import std.algorithm;
import std.csv;
import std.conv;
import std.stdio;
import std.typecons;
import std.utf;

void main()
{
     // Small buffer used to show it works. Normally would use a
     // larger buffer.
     ubyte[16] buffer;
     auto stdinBytes = stdin.byChunk(buffer).joiner;
     auto stdinDChars = stdinBytes.map!((ubyte b) => cast(char) b).byDchar;

     writefln("--------------");
     foreach (record; stdinDChars.csvReader!(Tuple!(string, string, string)))
     {
         writefln("Field 0: |%s|", record[0]);
         writefln("Field 1: |%s|", record[1]);
         writefln("Field 2: |%s|", record[2]);
         writefln("--------------");
     }
}

Pass it csv data without embedded newlines:

$ echo $'abc,def,ghi\njkl,mno,pqr' | ./csv_by_chunk
--------------
Field 0: |abc|
Field 1: |def|
Field 2: |ghi|
--------------
Field 0: |jkl|
Field 1: |mno|
Field 2: |pqr|
--------------

Pass it csv data with embedded newlines:

$ echo $'abc,"LINE 1\nLINE 2",ghi\njkl,mno,pqr' | ./csv_by_chunk
--------------
Field 0: |abc|
Field 1: |LINE 1
LINE 2|
Field 2: |ghi|
--------------
Field 0: |jkl|
Field 1: |mno|
Field 2: |pqr|
--------------

An example like this may avoid the confusion about newlines. 
Unfortunately, the need to do the odd looking conversion from 
ubyte to char/dchar is undesirable in a code example. I haven't 
found a cleaner way to write that. If there's a nicer way I'd 
appreciate hearing about it.

--Jon

Nov 14 2019
