www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - char[] annoyance...

reply "Regan Heath" <regan netwin.co.nz> writes:
Take this code:

void main()
{
	//..open a file, read line for line, on each line:

	for(int i = 0; i < line.length-2; i++) {
		if (line[i..i+2] != "||") continue;
		//..etc..
	}
}


There is a subtle bug. On all lines with a length of 0 or 1 it will give  
the following error:

Error: ArrayBoundsError line_length.d(6)


The problem is of course the statement "i < line.length-2". line.length is  
unsigned, and when you - 2 from an unsigned value.. well lets just say  
that it's bigger than the actual length of the line - 2.


Of course there are plently of other ways to code this, perhaps using  
foreach, but that's not the point. The point is that this code  _can_ be  
written and on the surface looks fine. Not even -w (warnings) spots the  
signed/unsigned problem. At the very least can we get a warning for this?

Regan
Apr 09 2006
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:

 Take this code:
 
 void main()
 {
 	//..open a file, read line for line, on each line:
 
 	for(int i = 0; i < line.length-2; i++) {
 		if (line[i..i+2] != "||") continue;
 		//..etc..
 	}
 }
 
 There is a subtle bug. On all lines with a length of 0 or 1 it will give  
 the following error:
 
 Error: ArrayBoundsError line_length.d(6)
 
 The problem is of course the statement "i < line.length-2". line.length is  
 unsigned, and when you - 2 from an unsigned value.. well lets just say  
 that it's bigger than the actual length of the line - 2.
 
 Of course there are plently of other ways to code this, perhaps using  
 foreach, but that's not the point. The point is that this code  _can_ be  
 written and on the surface looks fine. Not even -w (warnings) spots the  
 signed/unsigned problem. At the very least can we get a warning for this?

I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... } -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 10/04/2006 11:54:09 AM
Apr 09 2006
next sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:
 On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:

 Take this code:

 void main()
 {
 	//..open a file, read line for line, on each line:

 	for(int i = 0; i < line.length-2; i++) {
 		if (line[i..i+2] != "||") continue;
 		//..etc..
 	}
 }

 There is a subtle bug. On all lines with a length of 0 or 1 it will give
 the following error:

 Error: ArrayBoundsError line_length.d(6)

 The problem is of course the statement "i < line.length-2". line.length  
 is
 unsigned, and when you - 2 from an unsigned value.. well lets just say
 that it's bigger than the actual length of the line - 2.

 Of course there are plently of other ways to code this, perhaps using
 foreach, but that's not the point. The point is that this code  _can_ be
 written and on the surface looks fine. Not even -w (warnings) spots the
 signed/unsigned problem. At the very least can we get a warning for  
 this?

I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }

Thanks Derek, that's exactly what I did. The point of this post isn't to get help with one specific problem but rather to ask whether there is a solution to the underlying problem behaviour. I suspect this behaviour will be the source of many bugs to come and I wonder if there is a way to avoid them. Regan
Apr 09 2006
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 10 Apr 2006 14:04:35 +1200, Regan Heath wrote:

 On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:
 On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:

 Take this code:

 void main()
 {
 	//..open a file, read line for line, on each line:

 	for(int i = 0; i < line.length-2; i++) {
 		if (line[i..i+2] != "||") continue;
 		//..etc..
 	}
 }

 There is a subtle bug. On all lines with a length of 0 or 1 it will give
 the following error:

 Error: ArrayBoundsError line_length.d(6)

 The problem is of course the statement "i < line.length-2". line.length  
 is
 unsigned, and when you - 2 from an unsigned value.. well lets just say
 that it's bigger than the actual length of the line - 2.

 Of course there are plently of other ways to code this, perhaps using
 foreach, but that's not the point. The point is that this code  _can_ be
 written and on the surface looks fine. Not even -w (warnings) spots the
 signed/unsigned problem. At the very least can we get a warning for  
 this?

I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }

Thanks Derek, that's exactly what I did.

Of course; that was obvious. No put-down was implied ;-)
 The point of this post isn't to get help with one specific problem but  
 rather to ask whether there is a solution to the underlying problem  
 behaviour. I suspect this behaviour will be the source of many bugs to  
 come and I wonder if there is a way to avoid them.

The point of my reply was that the "solution to the underlying problem behaviour" is to properly express (and thus document) the algorithm rather than rely on side-effects of the implemented language. In this case, as an example of such algorithm documentation, if one explicitly states that the search only applies to lines of two or more characters, it helps the reader of the code understand your intentions. Without that, the code phrase (i < line.length-2) might not trigger the reader's thoughts about handling short lines. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 10/04/2006 12:10:09 PM
Apr 09 2006
next sibling parent "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 10 Apr 2006 12:17:30 +1000, Derek Parnell <derek psych.ward> wrote:
 On Mon, 10 Apr 2006 14:04:35 +1200, Regan Heath wrote:

 On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward>  
 wrote:
 On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:

 Take this code:

 void main()
 {
 	//..open a file, read line for line, on each line:

 	for(int i = 0; i < line.length-2; i++) {
 		if (line[i..i+2] != "||") continue;
 		//..etc..
 	}
 }

 There is a subtle bug. On all lines with a length of 0 or 1 it will  
 give
 the following error:

 Error: ArrayBoundsError line_length.d(6)

 The problem is of course the statement "i < line.length-2".  
 line.length
 is
 unsigned, and when you - 2 from an unsigned value.. well lets just say
 that it's bigger than the actual length of the line - 2.

 Of course there are plently of other ways to code this, perhaps using
 foreach, but that's not the point. The point is that this code  _can_  
 be
 written and on the surface looks fine. Not even -w (warnings) spots  
 the
 signed/unsigned problem. At the very least can we get a warning for
 this?

I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }

Thanks Derek, that's exactly what I did.

Of course; that was obvious. No put-down was implied ;-)

and none was taken. :)
 The point of this post isn't to get help with one specific problem but
 rather to ask whether there is a solution to the underlying problem
 behaviour. I suspect this behaviour will be the source of many bugs to
 come and I wonder if there is a way to avoid them.

The point of my reply was that the "solution to the underlying problem behaviour" is to properly express (and thus document) the algorithm rather than rely on side-effects of the implemented language.

You're right, in that this bug is caused by a side-effect of the language. I disagree that the algorithm is/was incompletely/incorrectly expressed (more on this below). WRT the side-effect; My question is, do we attempt to prevent future bugs of this sort or do we ignore it? Off the top of my head, we can: a - avoid the side-effect i.e. make length signed. b - attempt to catch it, and others like it. i.e. add a signed/unsigned warning. c - have a type with the range of an unsigned type and a lower bound of 0 (no underflow)
 In this case, as an example of such algorithm documentation, if one
 explicitly states that the search only applies to lines of two or more
 characters, it helps the reader of the code understand your intentions.
 Without that, the code phrase (i < line.length-2) might not trigger the
 reader's thoughts about handling short lines.

I agree with the general statement; you should express the algorithm in a clear fashion. In fact that it why I think we need to fix this. This side-effect results in a less clear/concise expression of the actual algorithm. It results in a superfluous check for a non-existant special case. Allow me to elaborate.. From a purely algorithmic perspective there is nothing special about short lines in this case, the same rule applies as to any other line, "the index 'i' starts at 0 and must always be less than the length of the line - 2". I believe this is a complete and correct expression of the algorithm itself. However, a side-effect of the representation (code/language) is creating a special case for short lines which does not logically exist. A special case which is not obvious at the time the code is written, resulting in a bug. A special case which complicates the representation of the algorithm (the code) by requiring we check for short lines for no gain and some loss in terms of performance (negligible) and readability. Of course you can argue that the limitations of the representation (code/language) are always going to affect how we can express any given algorithm, this is true. Ideally however we should be able to express an algorithm as precisely/exactly as possible without a risk of a side-effects causing them to fail. At the very least, in this case, we can make the programmer aware of the side effect where it applies. Regan
Apr 09 2006
prev sibling parent Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Derek Parnell wrote:
 On Mon, 10 Apr 2006 14:04:35 +1200, Regan Heath wrote:
 
 On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:
 On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:

 Take this code:

 void main()
 {
 	//..open a file, read line for line, on each line:

 	for(int i = 0; i < line.length-2; i++) {
 		if (line[i..i+2] != "||") continue;
 		//..etc..
 	}
 }

 There is a subtle bug. On all lines with a length of 0 or 1 it will give
 the following error:

 Error: ArrayBoundsError line_length.d(6)

 The problem is of course the statement "i < line.length-2". line.length  
 is
 unsigned, and when you - 2 from an unsigned value.. well lets just say
 that it's bigger than the actual length of the line - 2.

 Of course there are plently of other ways to code this, perhaps using
 foreach, but that's not the point. The point is that this code  _can_ be
 written and on the surface looks fine. Not even -w (warnings) spots the
 signed/unsigned problem. At the very least can we get a warning for  
 this?

surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }


Of course; that was obvious. No put-down was implied ;-)
 The point of this post isn't to get help with one specific problem but  
 rather to ask whether there is a solution to the underlying problem  
 behaviour. I suspect this behaviour will be the source of many bugs to  
 come and I wonder if there is a way to avoid them.

The point of my reply was that the "solution to the underlying problem behaviour" is to properly express (and thus document) the algorithm rather than rely on side-effects of the implemented language. In this case, as an example of such algorithm documentation, if one explicitly states that the search only applies to lines of two or more characters, it helps the reader of the code understand your intentions. Without that, the code phrase (i < line.length-2) might not trigger the reader's thoughts about handling short lines.

I agree with Regan in that the code is "conceptually" well-expressed, it is instead a nasty and unexpected language shortcoming that results in the bug. I, however, at the moment see no good way to solve this language shortcoming, and there may not even be one. :-( (Also: "char[] annoyance..." is an unfit thread title.) -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Apr 10 2006
prev sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Regan Heath wrote:
 On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward> wrote:
 
 On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:

 Take this code:

 void main()
 {
     //..open a file, read line for line, on each line:

     for(int i = 0; i < line.length-2; i++) {
         if (line[i..i+2] != "||") continue;
         //..etc..
     }
 }

 There is a subtle bug. On all lines with a length of 0 or 1 it will give
 the following error:

 Error: ArrayBoundsError line_length.d(6)

 The problem is of course the statement "i < line.length-2". 
 line.length  is
 unsigned, and when you - 2 from an unsigned value.. well lets just say
 that it's bigger than the actual length of the line - 2.

 Of course there are plently of other ways to code this, perhaps using
 foreach, but that's not the point. The point is that this code  _can_ be
 written and on the surface looks fine. Not even -w (warnings) spots the
 signed/unsigned problem. At the very least can we get a warning for  
 this?

I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }

Thanks Derek, that's exactly what I did. The point of this post isn't to get help with one specific problem but rather to ask whether there is a solution to the underlying problem behaviour. I suspect this behaviour will be the source of many bugs to come and I wonder if there is a way to avoid them.

First, how come you don't use this code? for (int i=0; i<line.length-2; i++) { if (line[i..i] == "|") if (line[i+1..i+1] == "|") continue; } It should be more than twice as fast. Second, isn't your code just an excellent example of why we have bounds checking in D in the first place? I think the obvious and clear-for-the-reader solution is to wrap that code in an if: if (line.length > 1) { // the above code here } I can't see what would be wrong with that.
Apr 10 2006
parent "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 10 Apr 2006 17:42:33 +0300, Georg Wrede <georg.wrede nospam.org>  
wrote:
 Regan Heath wrote:
 On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek psych.ward>  
 wrote:

 On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:

 Take this code:

 void main()
 {
     //..open a file, read line for line, on each line:

     for(int i = 0; i < line.length-2; i++) {
         if (line[i..i+2] != "||") continue;
         //..etc..
     }
 }

 There is a subtle bug. On all lines with a length of 0 or 1 it will  
 give
 the following error:

 Error: ArrayBoundsError line_length.d(6)

 The problem is of course the statement "i < line.length-2".  
 line.length  is
 unsigned, and when you - 2 from an unsigned value.. well lets just say
 that it's bigger than the actual length of the line - 2.

 Of course there are plently of other ways to code this, perhaps using
 foreach, but that's not the point. The point is that this code  _can_  
 be
 written and on the surface looks fine. Not even -w (warnings) spots  
 the
 signed/unsigned problem. At the very least can we get a warning for   
 this?

I too have tripped up on this 'bug' and it is very annoying and surprising. However your approach to the problem might need to change...as you state - "On all lines with a length of 0 or 1 it will give the following error..." - and this is because the test that you are performing is only applicable to lines with two or more characters in them ... so make that condition a part of the algorithm ... if (line.length >= 2) { for(int i = 0; i < line.length-2; i++) { if (line[i..i+2] != "||") continue; //..etc.. } } else { // do something for short lines... }

The point of this post isn't to get help with one specific problem but rather to ask whether there is a solution to the underlying problem behaviour. I suspect this behaviour will be the source of many bugs to come and I wonder if there is a way to avoid them.

First, how come you don't use this code? for (int i=0; i<line.length-2; i++) { if (line[i..i] == "|") if (line[i+1..i+1] == "|") continue; } It should be more than twice as fast.

Because speed wasn't important in this case (once-off console app)
 Second, isn't your code just an excellent example of why we have bounds  
 checking in D in the first place?

Bounds checking certainly helps.
 I think the obvious and clear-for-the-reader solution is to wrap that  
 code in an if:

 	if (line.length > 1)
 	{
 		// the above code here
 	}

 I can't see what would be wrong with that.

See my reply to Derek for why I think it's 'wrong'. Regan
Apr 10 2006
prev sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
   if (line.length >= 2) {
     for(int i = 0; i < line.length-2; i++) {
       if (line[i..i+2] != "||") continue;
       //..etc..
     }
   }
   else {
      // do something for short lines...
   }

It seems silly to add another 'if', knowing that the compiler could simply use the signed branch instructions instead of the unsigned ones. Does the spec define what happens when comparing a signed against an unsigned type? Perhaps one or the other should be manually cast in this case. Or, force the comparison as signed. That would have prevented the length-2 bug. L.
Apr 10 2006
prev sibling next sibling parent reply S. Chancellor <dnewsgr mephit.kicks-ass.org> writes:
On 2006-04-09 17:23:06 -0700, "Regan Heath" <regan netwin.co.nz> said:

 Take this code:
 
 void main()
 {
 	//..open a file, read line for line, on each line:
 
 	for(int i = 0; i < line.length-2; i++) {
 		if (line[i..i+2] != "||") continue;
 		//..etc..
 	}
 }
 
 
 There is a subtle bug. On all lines with a length of 0 or 1 it will 
 give  the following error:
 
 Error: ArrayBoundsError line_length.d(6)
 
 
 The problem is of course the statement "i < line.length-2". line.length 
 is  unsigned, and when you - 2 from an unsigned value.. well lets just 
 say  that it's bigger than the actual length of the line - 2.
 
 
 Of course there are plently of other ways to code this, perhaps using  
 foreach, but that's not the point. The point is that this code  _can_ 
 be  written and on the surface looks fine. Not even -w (warnings) spots 
 the  signed/unsigned problem. At the very least can we get a warning 
 for this?
 
 Regan

if( line[i.. (i+2 > $ ? $ : i +2 ) ] != "||") { ... } Is that not allowed in the [..] syntax? -S.
Apr 10 2006
parent reply S. Chancellor <dnewsgr mephit.kicks-ass.org> writes:
On 2006-04-10 00:50:24 -0700, S. Chancellor 
<dnewsgr mephit.kicks-ass.org> said:

 On 2006-04-09 17:23:06 -0700, "Regan Heath" <regan netwin.co.nz> said:
 
 Take this code:
 
 void main()
 {
 	//..open a file, read line for line, on each line:
 
 	for(int i = 0; i < line.length-2; i++) {
 		if (line[i..i+2] != "||") continue;
 		//..etc..
 	}
 }
 
 
 There is a subtle bug. On all lines with a length of 0 or 1 it will 
 give  the following error:
 
 Error: ArrayBoundsError line_length.d(6)
 
 
 The problem is of course the statement "i < line.length-2". line.length 
 is  unsigned, and when you - 2 from an unsigned value.. well lets just 
 say  that it's bigger than the actual length of the line - 2.
 
 
 Of course there are plently of other ways to code this, perhaps using  
 foreach, but that's not the point. The point is that this code  _can_ 
 be  written and on the surface looks fine. Not even -w (warnings) spots 
 the  signed/unsigned problem. At the very least can we get a warning 
 for this?
 
 Regan

if( line[i.. (i+2 > $ ? $ : i +2 ) ] != "||") { ... } Is that not allowed in the [..] syntax? -S.

Erps, I should skim your post better. Yeah that's a problem, you can fix it without the other if though. for(int i = 0; i + 2 < line.length; i++) { Or am I still missing something? Hehe. -S
Apr 10 2006
parent reply "Derek Parnell" <derek psych.ward> writes:
On Mon, 10 Apr 2006 17:54:25 +1000, S. Chancellor  
<dnewsgr mephit.kicks-ass.org> wrote:


 Erps, I should skim your post better.  Yeah that's a problem, you can  
 fix it without the other if though.

 for(int i = 0; i + 2 < line.length; i++) {


 Or am I still missing something?  Hehe.

No, that is fine and along the same lines as the original code. As is for(int i = 0; i < (cast(int)line.length)-2; i++) { however I still prefer telling the code reader what I'm doing etc... in clear, simple code. Although your code will work, it just isn't as clear at first glance as if (line.length >= 2) { ... } but of course an even better way (IMHO) ... int posn = std.string.find(line, "||"); if (posn >= 0) { // etc .... } -- Derek Parnell Melbourne, Australia
Apr 10 2006
parent "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 10 Apr 2006 18:51:00 +1000, Derek Parnell <derek psych.ward> wrote:
 On Mon, 10 Apr 2006 17:54:25 +1000, S. Chancellor  
 <dnewsgr mephit.kicks-ass.org> wrote:


 Erps, I should skim your post better.  Yeah that's a problem, you can  
 fix it without the other if though.

 for(int i = 0; i + 2 < line.length; i++) {


 Or am I still missing something?  Hehe.

No, that is fine and along the same lines as the original code. As is for(int i = 0; i < (cast(int)line.length)-2; i++) { however I still prefer telling the code reader what I'm doing etc... in clear, simple code. Although your code will work, it just isn't as clear at first glance as if (line.length >= 2) { ... } but of course an even better way (IMHO) ... int posn = std.string.find(line, "||"); if (posn >= 0) { // etc .... }

In my case there were several || on the lines, eg. ..etc..||<A>_desc||..etc..||<B>_desc||..etc.. and I'm actually rewriting them as: ..etc..||thing(<A>_desc)||..etc..||thing(<B>_desc)||..etc.. but, yeah, it's possible to use find. :) Regan
Apr 10 2006
prev sibling parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...
Take this code:

void main()
{
	//..open a file, read line for line, on each line:

	for(int i = 0; i < line.length-2; i++) {
		if (line[i..i+2] != "||") continue;
		//..etc..
	}
}


There is a subtle bug. On all lines with a length of 0 or 1 it will give  
the following error:

Error: ArrayBoundsError line_length.d(6)


The problem is of course the statement "i < line.length-2". line.length is  
unsigned, and when you - 2 from an unsigned value.. well lets just say  
that it's bigger than the actual length of the line - 2.


Of course there are plently of other ways to code this, perhaps using  
foreach, but that's not the point. The point is that this code  _can_ be  
written and on the surface looks fine. Not even -w (warnings) spots the  
signed/unsigned problem. At the very least can we get a warning for this?

Regan

I see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.
 for(int i = 0; i+2 < line.length; i++) {

Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range. More generally, always do the arithmetic with the signed variables in cases like this if there is a possibility of wraparound. But if the automatic conversions were specified to do uint->int that would also be fine with me. One can also imagine a language with the following inequalities: +> unsigned greater-than +< unsigned less than -> signed greater than -< signed less than +>= unsigned greater-or-equal etc This would not replace the current >, <, but would essentially just be shorthand for existing expressions: (A +> B) is the same as (cast(uint)A > cast(uint)B) .. and so on, except that uint, ulong, ushort, etc would be chosen as needed. Is this worth doing? Maybe - it's not a big deal to me, but the signed/unsigned question does represent a common gotcha in certain expressions. Kevin
Apr 10 2006
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 10 Apr 2006 16:28:00 +0000 (UTC), Kevin Bealer  
<Kevin_member pathlink.com> wrote:
 In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...
 Take this code:

 void main()
 {
 	//..open a file, read line for line, on each line:

 	for(int i = 0; i < line.length-2; i++) {
 		if (line[i..i+2] != "||") continue;
 		//..etc..
 	}
 }


 There is a subtle bug. On all lines with a length of 0 or 1 it will give
 the following error:

 Error: ArrayBoundsError line_length.d(6)


 The problem is of course the statement "i < line.length-2". line.length  
 is
 unsigned, and when you - 2 from an unsigned value.. well lets just say
 that it's bigger than the actual length of the line - 2.


 Of course there are plently of other ways to code this, perhaps using
 foreach, but that's not the point. The point is that this code  _can_ be
 written and on the surface looks fine. Not even -w (warnings) spots the
 signed/unsigned problem. At the very least can we get a warning for  
 this?

 Regan

I see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.
 for(int i = 0; i+2 < line.length; i++) {

Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range.

You're probably right.. now, if only I could train my brain to think of it that way round :)
 More generally, always do the arithmetic with the signed variables in  
 cases like this if there is a possibility of wraparound.

That's good advice, however you have to realise there is a possibility of wraparound, something I didn't do (or I wouldn't have had the problem in the first place) :(
 But if the automatic conversions were specified to do uint->int that  
 would also be fine with me.

But wouldn't that introduce a bug here: int a = int.max-1; uint b = int.max+1; assert(a < b); wouldn't b be promoted to a signed value of <some negative number>?
 One can also imagine a language with the following inequalities:

 +>  unsigned greater-than
 +<  unsigned less than
 ->  signed greater than
 -<  signed less than
 +>= unsigned greater-or-equal
 etc

 This would not replace the current >, <, but would essentially just be  
 shorthand for existing expressions:

 (A +> B) is the same as (cast(uint)A > cast(uint)B)

 .. and so on, except that uint, ulong, ushort, etc would be chosen as  
 needed.

 Is this worth doing?  Maybe - it's not a big deal to me, but the  
 signed/unsigned question does represent a common gotcha in certain  
 expressions.

It's a good idea, but, same problem as before; you have to realise length is signed and you have to realise there is a chance for wraparound. A warning about the signed/unsigned comparrison is the very least we should do. I would be tempted to even make it an outright error requiring a cast (or one of these new operators above) to handle. Regan
Apr 10 2006
parent reply xs0 <xs0 xs0.com> writes:
Regan Heath wrote:
 On Mon, 10 Apr 2006 16:28:00 +0000 (UTC), Kevin Bealer 
 <Kevin_member pathlink.com> wrote:
 In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...
 Take this code:

 void main()
 {
     //..open a file, read line for line, on each line:

     for(int i = 0; i < line.length-2; i++) {
         if (line[i..i+2] != "||") continue;
         //..etc..
     }
 }


 There is a subtle bug. On all lines with a length of 0 or 1 it will give
 the following error:

 Error: ArrayBoundsError line_length.d(6)


 The problem is of course the statement "i < line.length-2". 
 line.length is
 unsigned, and when you - 2 from an unsigned value.. well lets just say
 that it's bigger than the actual length of the line - 2.


 Of course there are plently of other ways to code this, perhaps using
 foreach, but that's not the point. The point is that this code  _can_ be
 written and on the surface looks fine. Not even -w (warnings) spots the
 signed/unsigned problem. At the very least can we get a warning for 
 this?

 Regan

I see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.
 for(int i = 0; i+2 < line.length; i++) {

Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range.

You're probably right.. now, if only I could train my brain to think of it that way round :)
 More generally, always do the arithmetic with the signed variables in 
 cases like this if there is a possibility of wraparound.

That's good advice, however you have to realise there is a possibility of wraparound, something I didn't do (or I wouldn't have had the problem in the first place) :(
 But if the automatic conversions were specified to do uint->int that 
 would also be fine with me.

But wouldn't that introduce a bug here: int a = int.max-1; uint b = int.max+1; assert(a < b); wouldn't b be promoted to a signed value of <some negative number>?
 One can also imagine a language with the following inequalities:

 +>  unsigned greater-than
 +<  unsigned less than
 ->  signed greater than
 -<  signed less than
 +>= unsigned greater-or-equal
 etc

 This would not replace the current >, <, but would essentially just be 
 shorthand for existing expressions:

 (A +> B) is the same as (cast(uint)A > cast(uint)B)

 .. and so on, except that uint, ulong, ushort, etc would be chosen as 
 needed.

 Is this worth doing?  Maybe - it's not a big deal to me, but the 
 signed/unsigned question does represent a common gotcha in certain 
 expressions.

It's a good idea, but, same problem as before; you have to realise length is signed and you have to realise there is a chance for wraparound. A warning about the signed/unsigned comparrison is the very least we should do. I would be tempted to even make it an outright error requiring a cast (or one of these new operators above) to handle. Regan

Wouldn't it be somewhat the best, if signed/unsigned comparison actually worked correctly in mathematical sense? int a; uint b; (a<b) becomes (a<0 || cast(uint)a < b) (a==b) becomes (a>=0 && cast(uint)a == b) (a>b) becomes (a>=0 && cast(uint)a > b) Unfortunately, that doesn't solve the original problem if (a < b.length-2) What would solve it is making .length signed. Even though it should puristically be unsigned, and losing one bit could theoretically be a problem (because you couldn't allocate new byte[more than 2GB], the reality is that a 32-bit app can't allocate more than 2GB of RAM anyway (in Windows at least), and for 64-bit apps, losing that one bit is totally insignificant, unless someone expects a computer with 9 exabytes of ram anytime soon :) And the only place really needing change (and a trivial one) would be the array resizing code.. xs0
Apr 11 2006
parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <e1ftsj$1spp$1 digitaldaemon.com>, xs0 says...
Regan Heath wrote:
 On Mon, 10 Apr 2006 16:28:00 +0000 (UTC), Kevin Bealer 
 <Kevin_member pathlink.com> wrote:
 In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...
 Take this code:

 void main()
 {
     //..open a file, read line for line, on each line:

     for(int i = 0; i < line.length-2; i++) {
         if (line[i..i+2] != "||") continue;
         //..etc..
     }
 }


 There is a subtle bug. On all lines with a length of 0 or 1 it will give
 the following error:

 Error: ArrayBoundsError line_length.d(6)


 The problem is of course the statement "i < line.length-2". 
 line.length is
 unsigned, and when you - 2 from an unsigned value.. well lets just say
 that it's bigger than the actual length of the line - 2.


 Of course there are plently of other ways to code this, perhaps using
 foreach, but that's not the point. The point is that this code  _can_ be
 written and on the surface looks fine. Not even -w (warnings) spots the
 signed/unsigned problem. At the very least can we get a warning for 
 this?

 Regan

I see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.
 for(int i = 0; i+2 < line.length; i++) {

Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range.

You're probably right.. now, if only I could train my brain to think of it that way round :)
 More generally, always do the arithmetic with the signed variables in 
 cases like this if there is a possibility of wraparound.

That's good advice, however you have to realise there is a possibility of wraparound, something I didn't do (or I wouldn't have had the problem in the first place) :(
 But if the automatic conversions were specified to do uint->int that 
 would also be fine with me.

But wouldn't that introduce a bug here: int a = int.max-1; uint b = int.max+1; assert(a < b); wouldn't b be promoted to a signed value of <some negative number>?


True, hadn't thought of that. I guess the correct thing (mathematically) is what is proposed by xs0 below (however, see comments).
 One can also imagine a language with the following inequalities:

 +>  unsigned greater-than
 +<  unsigned less than
 ->  signed greater than
 -<  signed less than
 +>= unsigned greater-or-equal
 etc

 This would not replace the current >, <, but would essentially just be 
 shorthand for existing expressions:

 (A +> B) is the same as (cast(uint)A > cast(uint)B)

 .. and so on, except that uint, ulong, ushort, etc would be chosen as 
 needed.

 Is this worth doing?  Maybe - it's not a big deal to me, but the 
 signed/unsigned question does represent a common gotcha in certain 
 expressions.

It's a good idea, but, same problem as before; you have to realise length is signed and you have to realise there is a chance for wraparound. A warning about the signed/unsigned comparrison is the very least we should do. I would be tempted to even make it an outright error requiring a cast (or one of these new operators above) to handle. Regan

Wouldn't it be somewhat the best, if signed/unsigned comparison actually worked correctly in mathematical sense? int a; uint b; (a<b) becomes (a<0 || cast(uint)a < b) (a==b) becomes (a>=0 && cast(uint)a == b) (a>b) becomes (a>=0 && cast(uint)a > b)

I like this conceptually - it seems like definitely the correct approach for a language like python, but it has the unfortunate effect of adding a conditional to what is otherwise a one-instruction subtraction (< is a subtract in asm).
Unfortunately, that doesn't solve the original problem

if (a < b.length-2)

What would solve it is making .length signed. Even though it should 
puristically be unsigned, and losing one bit could theoretically be a 
problem (because you couldn't allocate new byte[more than 2GB], the 
reality is that a 32-bit app can't allocate more than 2GB of RAM anyway 
(in Windows at least), and for 64-bit apps, losing that one bit is 
totally insignificant, unless someone expects a computer with 9 exabytes 
of ram anytime soon :)

And the only place really needing change (and a trivial one) would be 
the array resizing code..


xs0

The unsigned .length would fix this particular issue. But I think this might be one of those things where every solution seems to cause problems in some other area. For instance, in C, char is (usually) signed, which is good for small numbers, but when you use char as an array index, it walks off the front of the array for anything over 128. Lots and lots of C programs have bugs from this. It's actually up to the compiler writer whether char is signed in C... As a result, when writing crypto and hash function code in C, people usually have to cast the types to get reliable output. Such code is often peppered by casts, to force size and signedness to the right kind. Sometimes you see code where about half of these are completely unnecessary, but the author simply decided that *Let's never have to fix this thing again* and did a cast for every math operation. I realize this is all tangential, but I think that there are three or four ways to handle this and all of them probably have a few "gotchas". You'd almost have to write a big program with the signed version of .length to know if you got bit somehow. I checked a hunch - it looks like .length was changed to size_t in dmd 0.77. It's also mentioned in the portability section of the documentation: "The .length, .size, .sizeof, and .alignof properties will be of type size_t." Not sure what it was before .77. Kevin
Apr 11 2006
parent Georg Wrede <georg.wrede nospam.org> writes:
Kevin Bealer wrote:
 In article <e1ftsj$1spp$1 digitaldaemon.com>, xs0 says...
 
Regan Heath wrote:

On Mon, 10 Apr 2006 16:28:00 +0000 (UTC), Kevin Bealer 
<Kevin_member pathlink.com> wrote:

In article <ops7rncsol23k2f5 nrage.netwin.co.nz>, Regan Heath says...

Take this code:

void main()
{
    //..open a file, read line for line, on each line:

    for(int i = 0; i < line.length-2; i++) {
        if (line[i..i+2] != "||") continue;
        //..etc..
    }
}


There is a subtle bug. On all lines with a length of 0 or 1 it will give
the following error:

Error: ArrayBoundsError line_length.d(6)


The problem is of course the statement "i < line.length-2". 
line.length is
unsigned, and when you - 2 from an unsigned value.. well lets just say
that it's bigger than the actual length of the line - 2.


Of course there are plently of other ways to code this, perhaps using
foreach, but that's not the point. The point is that this code  _can_ be
written and on the surface looks fine. Not even -w (warnings) spots the
signed/unsigned problem. At the very least can we get a warning for 
this?

Regan

I see the "gotcha" here, and whether conversions should be done is an interesting question. But I wanted to propose a slightly different solution.
for(int i = 0; i+2 < line.length; i++) {

Since you are reference "i+2" in the expression, that is really the value what you need to be in the [0..length) range.

You're probably right.. now, if only I could train my brain to think of it that way round :)
More generally, always do the arithmetic with the signed variables in 
cases like this if there is a possibility of wraparound.

That's good advice, however you have to realise there is a possibility of wraparound, something I didn't do (or I wouldn't have had the problem in the first place) :(
But if the automatic conversions were specified to do uint->int that 
would also be fine with me.

But wouldn't that introduce a bug here: int a = int.max-1; uint b = int.max+1; assert(a < b); wouldn't b be promoted to a signed value of <some negative number>?


True, hadn't thought of that. I guess the correct thing (mathematically) is what is proposed by xs0 below (however, see comments).
One can also imagine a language with the following inequalities:

+>  unsigned greater-than
+<  unsigned less than
->  signed greater than
-<  signed less than
+>= unsigned greater-or-equal
etc

This would not replace the current >, <, but would essentially just be 
shorthand for existing expressions:

(A +> B) is the same as (cast(uint)A > cast(uint)B)

.. and so on, except that uint, ulong, ushort, etc would be chosen as 
needed.

Is this worth doing?  Maybe - it's not a big deal to me, but the 
signed/unsigned question does represent a common gotcha in certain 
expressions.

It's a good idea, but, same problem as before; you have to realise length is signed and you have to realise there is a chance for wraparound. A warning about the signed/unsigned comparrison is the very least we should do. I would be tempted to even make it an outright error requiring a cast (or one of these new operators above) to handle. Regan

Wouldn't it be somewhat the best, if signed/unsigned comparison actually worked correctly in mathematical sense? int a; uint b; (a<b) becomes (a<0 || cast(uint)a < b) (a==b) becomes (a>=0 && cast(uint)a == b) (a>b) becomes (a>=0 && cast(uint)a > b)

I like this conceptually - it seems like definitely the correct approach for a language like python, but it has the unfortunate effect of adding a conditional to what is otherwise a one-instruction subtraction (< is a subtract in asm).
Unfortunately, that doesn't solve the original problem

if (a < b.length-2)

What would solve it is making .length signed. Even though it should 
puristically be unsigned, and losing one bit could theoretically be a 
problem (because you couldn't allocate new byte[more than 2GB], the 
reality is that a 32-bit app can't allocate more than 2GB of RAM anyway 
(in Windows at least), and for 64-bit apps, losing that one bit is 
totally insignificant, unless someone expects a computer with 9 exabytes 
of ram anytime soon :)

And the only place really needing change (and a trivial one) would be 
the array resizing code..


xs0

The unsigned .length would fix this particular issue. But I think this might be one of those things where every solution seems to cause problems in some other area. I realize this is all tangential, but I think that there are three or four ways to handle this and all of them probably have a few "gotchas". You'd almost have to write a big program with the signed version of .length to know if you got bit somehow.

I agree. This is not my official opinion, but I sometimes feel like the guy who invented Cruise Control, and this customer comes back and complains that what's the use, he still has to use the steering wheel.
Apr 11 2006