digitalmars.D - Buffered Files & Associative Arrays

reply Michael <mcoupland gmail.com> writes:
Greetings all!

When I compile and run the program below with a sample input file, test.txt, I
get some very strange behavior. It looks as though BufferedFile is producing
strings that, for some reason, the associative array can't handle.

With test.txt containing three one-character lines:
	a
	b
	c

...I get the output:
	a 1 
	b 2 b 2 
	c 3 c 3 c 3 

...rather than the expected:
	a 1 
	a 1 b 2 
	a 1 b 2 c 3 

With test.txt containing longer strings:
	first
	second
	third

...the program crashes entirely with the following output:
	first 1
	Error: ArrayBoundsError TestArray(15)


However, if I replace the two relevant lines with the following:
	string[] file = ["first","second","third"]; // or ["a","b","c"]
	foreach( int n, string line; file )

...then the program runs as expected. But what's the difference?? Adding
newlines to the string constants above doesn't do any harm, which was what I
had first suspected as the culprit.

I don't think I'm missing anything obvious; can someone please confirm I'm not
crazy?

Thanks!
	Michael

--------------------------------------------------------------

import std.stdio;
import std.stream;

int main( char[][] args )
{
	int[string] Ar;
	
	Stream file = new BufferedFile("test.txt");
	
	foreach( ulong n, string line; file )
	{
		Ar[line] = n;
		
		foreach( string k; Ar.keys )
			writef("%s %d ", k, Ar[k] );

		writefln("");
	}

	return 0;
}
Jan 22 2008
next sibling parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
Well, this still happens for "File", so it's not as if it's a 
BufferedFile issue.

As it happens, the problem is the way you are abusing File's buffer. 
You're taking the line, and using it... where the stream is overwriting 
that space with new data.

Find:

Ar[line] = n;

Replace:

Ar[line.dup] = n;

That should solve your problems.
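
The effect can be reproduced without a file at all. What follows is only a
minimal sketch, assuming a current D2 compiler (where std.stream is no longer
part of Phobos). The hand-rolled buffer and the helper readLines are
hypothetical stand-ins for the stream's internal line buffer, not its real
implementation.

```d
import std.stdio;

// Hypothetical stand-in for a stream that reuses one internal buffer per line.
// With copy == false every returned slice aliases that same buffer.
char[][] readLines(string[] source, bool copy)
{
    char[] buffer = new char[16];   // assumed large enough for this demo
    char[][] result;

    foreach (s; source)
    {
        buffer[0 .. s.length] = s[];          // the "stream" overwrites its buffer
        char[] line = buffer[0 .. s.length];  // what the foreach body receives
        result ~= copy ? line.dup : line;     // .dup takes a private copy
    }
    return result;
}

void main()
{
    writeln(readLines(["a", "b", "c"], false)); // all three entries alias the buffer
    writeln(readLines(["a", "b", "c"], true));  // each entry keeps its own text
}
```

Without the copy, every stored slice ends up showing the last line read, which
matches the "c 3 c 3 c 3" symptom in the original post.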

-[Unknown]


Jan 22 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Unknown W. Brackets:
 As it happens, the problem is the way you are abusing File's buffer. 
 You're taking the line, and using it... where the stream is overwriting 
 that space with new data.

Yes, D is rather unsafe in that regard. To avoid this kind of bug I add a
"bool copy=true" template parameter (a compile-time constant) to all my classes
that return iterable objects and manage a lot of data. So by default they
perform the copy, and you avoid that whole class of bugs. When you know what you
are doing and want to go faster (sometimes 10 times faster), accepting slightly
less safe code, you set that copy flag to false, and it keeps using the same
buffer. I think Phobos could grow such an extra parameter in its iterable
objects to avoid this kind of bug.

Bye,
bearophile
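
One way such a compile-time copy flag might look is sketched below. This is
not bearophile's actual code: the names Lines and collect are hypothetical, the
"file" is just an array of strings, and a current D2 compiler is assumed.

```d
import std.stdio;

// Hypothetical line source (not a Phobos type) with a compile-time copy flag.
struct Lines(bool copy = true)
{
    string[] source;   // stands in for the contents of a file

    int opApply(scope int delegate(ref char[]) dg)
    {
        char[] buffer = new char[256];   // sketch: assumes lines of at most 256 chars
        foreach (s; source)
        {
            buffer[0 .. s.length] = s[];
            char[] line = buffer[0 .. s.length];
            static if (copy)
                line = line.dup;         // safe default: hand out a fresh copy
            if (auto stop = dg(line))
                return stop;
        }
        return 0;
    }
}

// Collects every line handed out by Lines!copy.
char[][] collect(bool copy)(string[] source)
{
    char[][] kept;
    auto lines = Lines!copy(source);
    foreach (line; lines)
        kept ~= line;
    return kept;
}

void main()
{
    writeln(collect!true(["a", "b", "c"]));   // independent copies
    writeln(collect!false(["a", "b", "c"]));  // faster, but every entry aliases the buffer
}
```

Because the flag is a template parameter, the copy is compiled in or out by
static if, so the fast variant pays nothing for the check at run time.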
Jan 22 2008
prev sibling next sibling parent reply Michael <mcoupland gmail.com> writes:
Wow, yeah I think that's pretty unfortunate. I haven't done much D coding, and
was only tangentially aware of the copy-on-write nature of D arrays (which I
think is the underlying cause of this bug/feature...?)

This seems to seriously violate the principle of least surprise: I strongly
suspect that most non-D programmers would make the same assumption I did. It's
one thing when you're passing around a bunch of char*'s; but this is a
full-featured string class!

Chalk it up to the pains of learning D if you want, but I'm not confident I
won't make this mistake numerous times (resulting in potentially strange and
hard-to-solve bugs) before getting it straight in my head, which is very
frustrating... :(

Jan 22 2008
parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
At the end of the day, you still need to have some tracking of memory 
management.  It's just not as complicated as with C/C++.

That is, someone still "owns" the data.  In this case, it's the stream.
The stream may change this data (since it owns it) which will screw
you up unless you copy it.

This is actually not copy on write.  But, copy on write would make the 
stream functions very slow since they would constantly be allocating 
memory while reading...
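
The point that this is sharing rather than copy on write can be illustrated
with plain arrays. This is a minimal sketch assuming a current D2 compiler; the
helper name mutateOwner is made up for the illustration.

```d
import std.stdio;

// Returns [view, copy] after the owner's first character has been changed.
char[][] mutateOwner(char[] owner)
{
    char[] view = owner;       // assignment copies the reference, not the characters
    char[] mine = owner.dup;   // explicit copy with its own memory

    owner[0] = 'F';            // a write through the owner: no copy is triggered

    return [view, mine];
}

void main()
{
    auto r = mutateOwner("first".dup);
    assert(r[0] == "First");   // the view sees the owner's change
    assert(r[1] == "first");   // the explicit copy does not
    writeln(r[0], " ", r[1]);
}
```

Nothing is copied unless the code asks for it with .dup, which is why the
stream can read quickly, and also why a kept slice changes under you.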

-[Unknown]


Jan 22 2008
parent Brad Roberts <braddr puremagic.com> writes:
In D 2.x you can probably make it safe by declaring the key as invariant.
I haven't actually tried it to see how well it works out, but in
concept that's how keys ought to behave.
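
As a rough illustration of that idea, the sketch below assumes a current D2
compiler, where "invariant" is spelled "immutable" and string is an alias for
immutable(char)[]. The helper name remember is hypothetical, and the
static assert encodes the expectation that a mutable char[] key is rejected on
assignment.

```d
import std.stdio;

// Stores a mutable line under an immutable copy of itself.
int[string] remember(char[] line, int value)
{
    int[string] table;

    // Expected to be rejected by a D2 compiler: the key type is not immutable.
    static assert(!__traits(compiles, table[line] = value));

    table[line.idup] = value;   // .idup yields an immutable copy, which is accepted
    return table;
}

void main()
{
    char[] line = "first".dup;
    auto table = remember(line, 1);
    line[0] = 'F';              // later changes to the buffer ...
    assert("first" in table);   // ... leave the stored key untouched
    writeln(table);
}
```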

Later,
Brad


Jan 22 2008
prev sibling parent Gide Nwawudu <gide btinternet.com> writes:
On Tue, 22 Jan 2008 03:35:01 -0500, Michael <mcoupland gmail.com>
wrote:

[snip]

Without D2's const/invariant enhancements it is very easy to introduce this bug.
FWIW your code does not compile on D2. The following code produces the correct
output.

import std.stdio;
import std.stream;

int main( char[][] args )
{
	int[string] Ar;
	
	Stream file = new BufferedFile("test.txt");
	
	foreach( ulong n, char[] line; file ) // mutable line variable
	{
		Ar[line.idup] = n; // idup needed
		
		foreach( string k; Ar.keys )
			writef("%s %d ", k, Ar[k] );

		writefln("");
	}

	return 0;
}

Gide
Jan 23 2008