
digitalmars.D.learn - CSV crash: "Quote located in unquoted token"

reply Dmitry <dmitry indiedev.ru> writes:
Hi there.
When I load a CSV file, it crashes ("Quote located in unquoted token") 
on lines with quotes, like this:

ResourceNode_RemoveFromView_Confirm,You are about to remove 
""{0}"" from view ""{1}"". Continue?,You are about to remove 
""{0}"" from view ""{1}"". Continue?,,Resource Tree - 
confirmation of removal from the current view

There are 5 fields:
ResourceNode_RemoveFromView_Confirm,
You are about to remove ""{0}"" from view ""{1}"". Continue?,
You are about to remove ""{0}"" from view ""{1}"". Continue?,
,
Resource Tree - confirmation of removal from the current view

If I add quotes at the beginning and end of the 2nd and 3rd fields, 
then it works. But I can't change the original file.

How can I avoid the error?

I read the file using this code:
foreach(record; file.byLine.joiner("\n")
        .csvReader!(Tuple!(string, string, string, string, string)))
{
    ...
}
Oct 13 2017
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
Write a purpose built csv parser using ranges. It should take only about 
half an hour.
I know, I know not very helpful. But a custom built parser will work 
better for you I think.

This should help get you started: 
https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L126
Oct 13 2017
parent reply Dmitry <dmitry indiedev.ru> writes:
On Friday, 13 October 2017 at 09:00:52 UTC, rikki cattermole 
wrote:
 Write a purpose built csv parser using ranges. It should take 
 only about half an hour.
 I know, I know not very helpful. But a custom built parser will 
 work better for you I think.
Yep, I can parse it myself, but I would like to avoid that (to reduce the amount of source code).
Maybe something like this is possible:

foreach(record; file.byLine.fixQuotes.joiner("\n").csvReader!...

? What types should the function (fixQuotes) take and return if I want to change each line after .byLine?

When the compiler says:

"candidates are: src\phobos\std\array.d(2534,5): std.array.replaceFirst(E, R1, R2)(E[] subject, R1 from, R2 to) if (isDynamicArray!(E[]) && isForwardRange!R1 && is(typeof(appender!(E[])().put(from[0..1]))) && isForwardRange!R2 && is(typeof(appender!(E[])().put(to[0..1]))))"

it scares me and I hide under the table.

I thought about something like

auto fixQuotes(string text)
{
    if (text.canFind("\"\""))
    {
        // some magic
    }
    return text;
}

but obviously, it won't compile.
Oct 13 2017
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
Something along the lines of:

.byLine.map!(a => a.fixQuotes).joiner("\n").csvReader!...

Either way, you're using ranges!
It's even the same amount of code, if not less.
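
The thread does not show what fixQuotes ended up containing, so the following is only a sketch of one way it could look. It assumes that no field contains a comma and that no field is already quoted. Under those assumptions, every field that contains a quote can simply be wrapped in an outer pair of quotes, which turns the existing "" pairs into valid escaped quotes. The names fixQuotes and readFixed are hypothetical helpers, not part of Phobos.

```d
import std.algorithm : canFind, joiner, map, splitter;
import std.array : appender;
import std.csv : csvReader;

/// Hypothetical helper: wraps each comma-separated field that contains a
/// quote in an outer pair of quotes.
/// Assumption: fields never contain commas and are never already quoted.
string fixQuotes(const(char)[] line)
{
    auto result = appender!string();
    bool first = true;
    foreach (field; line.splitter(','))
    {
        if (!first)
            result.put(',');
        first = false;

        if (field.canFind('"'))
        {
            // The inner "" pairs are already valid escapes once the
            // field itself is quoted.
            result.put('"');
            result.put(field);
            result.put('"');
        }
        else
        {
            result.put(field);
        }
    }
    return result.data;
}

/// Hypothetical helper: repairs each line, then parses the joined text.
string[][] readFixed(string[] lines)
{
    string[][] rows;
    foreach (record; lines.map!(a => a.fixQuotes).joiner("\n").csvReader!string)
    {
        string[] fields;
        foreach (field; record)
            fields ~= field;
        rows ~= fields;
    }
    return rows;
}
```

Because fixQuotes accepts const(char)[] and returns a fresh string, it can also be applied to the reused buffer that byLine hands out, as in file.byLine.map!(a => a.fixQuotes).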
Oct 13 2017
parent Dmitry <dmitry indiedev.ru> writes:
On Friday, 13 October 2017 at 10:11:39 UTC, rikki cattermole 
wrote:
 Something along the lines of:
 .byLine.map!(a => a.fixQuotes).joiner("\n").csvReader!...
Yep, it works.
 Either way, you're using ranges!
 It's even the same amount of code, if not less.
I see. Thank you!
Oct 13 2017
prev sibling parent reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Friday, 13 October 2017 at 08:53:12 UTC, Dmitry wrote:
 Hi there.
 When I load csv, it crashes ("Quote located in unquoted token") 
 on lines with quotes, like this:

 ResourceNode_RemoveFromView_Confirm,You are about to remove 
 ""{0}"" from view ""{1}"". Continue?,You are about to remove 
 ""{0}"" from view ""{1}"". Continue?,,Resource Tree - 
 confirmation of removal from the current view
The problem is that a generic CSV parser can't tell what the expected data should look like if the data doesn't follow defined rules. std.csv is not filled with custom parsing rules.

You can use Malformed.ignore[1], but your data will come out like:

You are about to remove ""{0}"" from view ""{1}""

This would leave you needing to modify the "".

1. https://dlang.org/phobos/std_csv.html#.Malformed
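
As a rough illustration of the Malformed.ignore route, the sketch below parses the text leniently and then collapses the leftover "" pairs in each field. The helper name parseLenient is hypothetical, and the replace step is an assumption about what "modify the """ would mean in practice. It would also alter a correctly quoted field whose decoded value legitimately contains two adjacent quotes, so it only suits files like the one in this thread.

```d
import std.array : replace;
import std.csv : csvReader, Malformed;

/// Hypothetical helper: parses CSV text without throwing on stray quotes,
/// then turns each leftover "" pair into a single ".
string[][] parseLenient(string text)
{
    string[][] rows;
    foreach (record; csvReader!(string, Malformed.ignore)(text))
    {
        string[] fields;
        foreach (field; record)
        {
            // Assumption: every "" in the parsed field is meant as one quote.
            fields ~= field.replace(`""`, `"`);
        }
        rows ~= fields;
    }
    return rows;
}
```

The same template arguments also work with a Tuple type in place of string if fixed-width records are preferred.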
Oct 13 2017
parent Dmitry <dmitry indiedev.ru> writes:
On Friday, 13 October 2017 at 18:53:37 UTC, Jesse Phillips wrote:
 You can use Malformed.ignore[1], but your data will come out 
 like:

 You are about to remove ""{0}"" from view ""{1}""

 This would leave you needing to modify the "".

 1. https://dlang.org/phobos/std_csv.html#.Malformed
I'll look into it. Thank you!
Oct 13 2017