www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Tricky semantics of ranges & potentially numerous Phobos bugs

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
In an effort to locate the suspected Phobos bug that I ran into in my
new Cartesian product implementation, I looked over the code for
std.algorithm joiner more carefully, and noticed the following:

Given a range of ranges ror, it assigns ror.front to a struct member and
then calls ror.popFront() immediately. Then as the user calls the joined
range's popFront, it successively pops its local copy of ror.front until
it's empty, whereupon it assigns the new ror.front to the local copy
again, and so on.

While this works for array of arrays, it *doesn't* work for ranges that
overwrite ror.front when ror.popFront() is called.  For example, it
wouldn't work for stdin.byLine(), because the underlying buffer it
reused. Here's the proof:

Code:
	// Filename: test2.d
	import std.algorithm;
	import std.stdio;
	void main() {
		auto lines = stdin.byLine();
		writeln(lines.joiner);
	}

Compile & test:

	# Compile
	dmd test2.d

	# Echo three lines, "abc", "def", and "ghi", and feed it to
	# the program.
	(echo abc; echo def; echo ghi) | ./test2

	# This is the output:
	defghighi

As you can see, the output is mangled, because byLine() has reused the
line buffer before joiner.Result got to it.

Long story short: saving a local copy of range.front and then calling
range.popFront() may _invalidate_ the copy. So basically, either you
need a forward range and use .save to keep track of old values of
range.front, or you have to refrain from calling popFront() until you're
well and truly done with the current value of .front. While not
respecting this will work with many common ranges, it will also fail in
subtle ways when given other ranges.

The scary thing is, I see similar code like this all over Phobos. Does
this mean that most of std.algorithm may need to be revised to address
this issue? At the very least, it would seem that a code audit is in
order to weed out this particular issue.

:-/


T

-- 
A linguistics professor was lecturing to his class one day. "In
English," he said, "A double negative forms a positive. In some
languages, though, such as Russian, a double negative is still a
negative. However, there is no language wherein a double positive can
form a negative."
A voice from the back of the room piped up, "Yeah, yeah."
Oct 15 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/16/12 1:09 AM, H. S. Teoh wrote:
 The scary thing is, I see similar code like this all over Phobos. Does
 this mean that most of std.algorithm may need to be revised to address
 this issue? At the very least, it would seem that a code audit is in
 order to weed out this particular issue.

Such issues do happen, are relatively rare, and are virtually untested because there's been no unittests with a deliberately "bad" input range. Although of course we do need to add the appropriate fixes and unittests, I'm not worried about it at all systemically. Andrei
Oct 17 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/17/12 12:57 PM, David Nadlinger wrote:
 On Wednesday, 17 October 2012 at 15:14:44 UTC, Andrei Alexandrescu wrote:
 Such issues do happen, are relatively rare, and are virtually untested
 because there's been no unittests with a deliberately "bad" input
 range. Although of course we do need to add the appropriate fixes and
 unittests, I'm not worried about it at all systemically.

If an abstraction encourages use in a way which leads to hard-to-detect logic bugs that do not occur with the most common test cases, this is _exactly_ the thing we should be worried about!

When I designed input ranges vs. forward ranges there were long discussion on how to distinguish them. If you have a better design it would probably not influence the state of the affair, but it would be a good discussion to have. FWIW I can already think of a couple but all rely on UFCS. Andrei
Oct 17 2012
prev sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Wednesday, 17 October 2012 at 15:14:44 UTC, Andrei 
Alexandrescu wrote:
 Such issues do happen, are relatively rare, and are virtually 
 untested because there's been no unittests with a deliberately 
 "bad" input range. Although of course we do need to add the 
 appropriate fixes and unittests, I'm not worried about it at 
 all systemically.

If an abstraction encourages use in a way which leads to hard-to-detect logic bugs that do not occur with the most common test cases, this is _exactly_ the thing we should be worried about! David
Oct 17 2012