digitalmars.D.learn - Schroedinger's Ranges

vacuum_tube <jxl.ppg gmail.com> writes:
I've been trying to make a struct for CSV parsing and 
manipulating.  The code was as follows:
```
struct CSVData(bool HeaderFromFirstLine)
{
	char[][] header = [];
	char[][][] rest = [];

	this(string filename)
	{
		auto tmp = File(filename).byLine();
		
		if(HeaderFromFirstLine)
		{
			this.header = CSVData.parseCSV(tmp.front()).array;
			tmp.popFront();
		}

		this.rest = tmp.map!(e => parseCSV(e)).array;
	}

	static char[][] parseCSV(char[] str)
	{
		char[][] tmp = split(str, ",");
		return tmp;
	}
	
	void print()
	{
		writeln(this.header);
		foreach(e; this.rest)
			writeln(e);
	}
}

void main()
{
	auto data = CSVData!true("testdata");
	data.print();
}
```
The "testdata" text file looked like this:
```
10,15,Hello world
stuff,,more stuff
```
And the output from running it looked like this:
```
["st", "ff", ",more stuff"]
["stuff", "", "more stuff"]
```
As you can see, the `header` field is not printing correctly.  In 
an attempt to debug, I added several `writeln`s to the 
constructor:
```
this(string filename)
{
	auto tmp = File(filename).byLine();
	
	if(HeaderFromFirstLine)
	{
		this.header = CSVData.parseCSV(tmp.front()).array;
		tmp.popFront();
		writeln(this.header);
	}

	this.rest = tmp.map!(e => parseCSV(e)).array;
	writeln(this.header);
}
```
This produced the following output:
```
["10", "15", "Hello world"]
["st", "ff", ",more stuff"]
["st", "ff", ",more stuff"]
["stuff", "", "more stuff"]
```
I then tried commenting out the offending line (the one with the 
`map`) and got the expected result:
```
["10", "15", "Hello world"]
["10", "15", "Hello world"]
["10", "15", "Hello world"]
```
Finally, I replaced the offending line and called a different 
function on `tmp`:
```
writeln(tmp.front);
```
And got the following result:
```
["10", "15", "Hello world"]
stuff,,more stuff
["st", "ff", ",more stuff"]
["st", "ff", ",more stuff"]
```
So it appears that observing or modifying `tmp` somehow modifies 
`header`, despite not interacting with it in any visible way.

What is the reason for this?  I'm guessing it either has to do 
with the internals of ranges, or that the arrays were messing up 
somehow, but I'm not sure.

Thanks in advance!
Jun 02
Paul Backus <snarwin gmail.com> writes:
On Thursday, 3 June 2021 at 00:39:04 UTC, vacuum_tube wrote:
 I've been trying to make a struct for CSV parsing and 
 manipulating.  The code was as follows:
 ```
 struct CSVData(bool HeaderFromFirstLine)
 {
 	char[][] header = [];
 	char[][][] rest = [];

 	this(string filename)
 	{
 		auto tmp = File(filename).byLine();
 		
 		if(HeaderFromFirstLine)
 		{
 			this.header = CSVData.parseCSV(tmp.front()).array;
 			tmp.popFront();
 		}

 		this.rest = tmp.map!(e => parseCSV(e)).array;
 	}
 ```
[...]
 The "testdata" text file looked like this:
 ```
 10,15,Hello world
 stuff,,more stuff
 ```
 And the output from running it looked like this:
 ```
 ["st", "ff", ",more stuff"]
 ["stuff", "", "more stuff"]
 ```
`File.byLine` overwrites the previous line's data every time it reads a new line. If you want to store each line's data for later use, you need to use [`byLineCopy`][1] instead.

[1]: https://phobos.dpldocs.info/std.stdio.File.byLineCopy.1.html
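
The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the original poster's code: the helper `readRows` and the scratch file name are hypothetical, and the rows are stored as `string` because that is what `byLineCopy` yields by default.

```d
import std.algorithm : map;
import std.array : array, split;
import std.file : remove, write;
import std.stdio : File, writeln;

// Hypothetical helper: reads every line of `filename` and splits it on commas.
// byLineCopy allocates a fresh string for each line, so the slices produced
// by split stay valid after the next line has been read.
string[][] readRows(string filename)
{
    return File(filename)
        .byLineCopy()
        .map!(line => split(line, ","))
        .array;
}

void main()
{
    enum name = "testdata_sketch.csv"; // hypothetical scratch file
    write(name, "10,15,Hello world\nstuff,,more stuff\n");
    scope (exit) remove(name);

    auto rows = readRows(name);
    writeln(rows[0]); // the header row, unaffected by later reads
    writeln(rows[1]);
}
```

With `byLine` in place of `byLineCopy`, every stored slice would point into the one buffer the range reuses, which is why the header in the original post appeared to change.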
Jun 02
kdevel <kdevel vogtner.de> writes:
On Thursday, 3 June 2021 at 01:22:14 UTC, Paul Backus wrote:
 		auto tmp = File(filename).byLine();
 `File.byLine` overwrites the previous line's data every time it 
 reads a new line. If you want to store each line's data for 
 later use, you need to use [`byLineCopy`][1] instead.
a) What is the rationale behind not making byLineCopy the default?

b) Does not compile:

    csv.d(17): Error: function csv.CSVData!true.CSVData.parseCSV(char[] str) is not callable using argument types (string)
    csv.d(17):        cannot pass argument tmp.front() of type string to parameter char[] str
    csv.d(21): Error: function csv.CSVData!true.CSVData.parseCSV(char[] str) is not callable using argument types (string)
    csv.d(21):        cannot pass argument e of type string to parameter char[] str
    [...]/../../src/phobos/std/algorithm/iteration.d(525):        instantiated from here: MapResult!(__lambda2, ByLineCopy!(immutable(char), char))
    csv.d(21):        instantiated from here: map!(ByLineCopy!(immutable(char), char))
    csv.d(40):        instantiated from here: CSVData!true

c) Reminds me of the necessity to add dups here and there. And reminds me of "helping the compiler" [1]?

[1] <https://wiki.c2.com/?HelpingTheCompilerIsEvil>
Jun 03
Mike Parker <aldacron gmail.com> writes:
On Thursday, 3 June 2021 at 10:18:25 UTC, kdevel wrote:
 a) What is the rationale behind not making byLineCopy the 
 default?
byLine was the original implementation. byLineCopy was added later after the need for it became apparent.
Jun 03
Mike Parker <aldacron gmail.com> writes:
On Thursday, 3 June 2021 at 10:30:24 UTC, Mike Parker wrote:
 On Thursday, 3 June 2021 at 10:18:25 UTC, kdevel wrote:
 a) What is the rationale behind not making byLineCopy the 
 default?
 byLine was the original implementation. byLineCopy was added later after the need for it became apparent.
See: https://forum.dlang.org/post/lg4l7s$11rl$1@digitalmars.com
Jun 03
kdevel <kdevel vogtner.de> writes:
 a) What is the rationale behind not making byLineCopy the 
 default?
 byLine was the original implementation. byLineCopy was added later after the need for it became apparent.
 See: https://forum.dlang.org/post/lg4l7s$11rl$1@digitalmars.com
THX. BTW byLineCopy defaults to immutable char. That's why one has to use

    auto tmp = File(filename).byLineCopy!(char, char);

or

    auto tmp = File(filename).byLine.map!dup;
Jun 03
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/3/21 9:00 AM, kdevel wrote:
 a) What is the rationale behind not making byLineCopy the default?
 byLine was the original implementation. byLineCopy was added later after the need for it became apparent.
 See: https://forum.dlang.org/post/lg4l7s$11rl$1@digitalmars.com
 THX. BTW byLineCopy defaults to immutable char. That's why one has to use
     auto tmp = File(filename).byLineCopy!(char, char);
 or
     auto tmp = File(filename).byLine.map!dup;
I was going to suggest using `byLineCopy!(char, char)`, because the second option with `map` makes a copy every time you call `front`.

And, my goodness, that is backwards for the template parameters. The terminator type should be determined by IFTI; it should never have been the first template parameter!

-Steve
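
For readers who want to keep mutable `char[]` fields, the `byLineCopy!(char, char)` variant discussed above might look like the sketch below. The helper name `readMutableRows` and the scratch file are made up for illustration; the template arguments are given in the order (terminator type, character type), as noted in the thread.

```d
import std.array : split;
import std.file : remove, write;
import std.stdio : File, writeln;

// Hypothetical helper: returns rows whose fields are mutable char[] slices.
char[][][] readMutableRows(string filename)
{
    char[][][] rows;
    // Template arguments are (Terminator, Char); Char = char makes each
    // line a freshly allocated char[] rather than a string.
    foreach (line; File(filename).byLineCopy!(char, char)())
        rows ~= split(line, ",");
    return rows;
}

void main()
{
    enum name = "testdata_mutable.csv"; // hypothetical scratch file
    write(name, "10,15,Hello world\nstuff,,more stuff\n");
    scope (exit) remove(name);

    auto rows = readMutableRows(name);
    rows[1][0][0] = 'S'; // allowed because the fields are char[], not string
    writeln(rows[0]);
    writeln(rows[1]);
}
```

Because each line is copied exactly once as it is read, this avoids the repeated copying that `byLine.map!dup` would perform on every call to `front`.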
Jun 03
WebFreak001 <d.forum webfreak.org> writes:
On Thursday, 3 June 2021 at 00:39:04 UTC, vacuum_tube wrote:
 I've been trying to make a struct for CSV parsing and 
 manipulating.  The code was as follows:
 ```
 struct CSVData(bool HeaderFromFirstLine)
 {
 	char[][] header = [];
 	char[][][] rest = [];

 ```
 [...]
In addition to the other comment, you probably want to use `string` (`immutable(char)[]`) instead of `char[]` here, as you want your data to stay the same and not be modified after assignment. If you replace them with `string` and mark your code `@safe`, the compiler will tell you where you try to assign `char[]` data that may be modified; in those cases you would want to call `.idup` to duplicate the data and make it persistent.
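
A small sketch of that idea, assuming a free function `parseCSV` that accepts any character slice (the signature is changed from the original post for illustration). The reused `byLine` buffer is simulated here with a plain `char[]` so the example needs no file.

```d
import std.algorithm : map;
import std.array : array, split;

// Copies each field into immutable storage, so later changes to the
// source buffer cannot affect the returned strings.
string[] parseCSV(const(char)[] line) @safe
{
    return line.split(",").map!(field => field.idup).array;
}

void main() @safe
{
    char[] buffer = "10,15,Hello world".dup; // stands in for byLine's buffer
    string[] header = parseCSV(buffer);

    buffer[] = 'x'; // simulate the buffer being overwritten by the next read

    // The header is unaffected because every field was duplicated with idup.
    assert(header == ["10", "15", "Hello world"]);
}
```

Assigning the `char[]` slices to a `string[]` field without `.idup` would be rejected at compile time, which is the safety net described above.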
Jun 03