www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - 0 < negative loop condition bug or misunderstanding on my part

reply "ixid" <nuaccount gmail.com> writes:
I'm writing my first basic algorithms, this one is merge sort. 
This version throws an exception when array.length - setSize is 
negative (which should be fine, the rest of my function would 
deal with it):

template mergeSort(T)
{
	void mergeSort(ref T[] array, const T setSize = 100)
	{
		T[][] merge;
		merge.length = array.length / setSize;
		T ii, jj;
		for(ii = 0, jj = 0;ii < array.length - setSize;ii += setSize, 
++jj)
			merge[jj] = array[ii..ii + setSize];

...

If I make the seemingly pointless change to this:

template mergeSort(T)
{
	void mergeSort(ref T[] array, const T setSize = 100)
	{
		T[][] merge;
		merge.length = array.length / setSize;
		T ii, jj;
		T temp2 = array.length - setSize;
		for(ii = 0, jj = 0;ii < temp2;ii += setSize, ++jj)
			merge[jj] = array[ii..ii + setSize];

Where it's a temporary variable then it works perfectly well.
Mar 06 2012
next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 03/06/2012 09:11 PM, ixid wrote:
 I'm writing my first basic algorithms, this one is merge sort. This
 version throws an exception when array.length - setSize is negative
 (which should be fine, the rest of my function would deal with it):

 template mergeSort(T)
 {
 void mergeSort(ref T[] array, const T setSize = 100)
 {
 T[][] merge;
 merge.length = array.length / setSize;
 T ii, jj;
 for(ii = 0, jj = 0;ii < array.length - setSize;ii += setSize, ++jj)

We don't know what T is, but I assume a signed type like int. array.length is size_t, i.e. an unsigned type. Unsigned types have this nasty habit of converting the entire expression to unsigned (that is a rule since C). So array.length - setSize above is size_t. In other words, it is never negative.
 merge[jj] = array[ii..ii + setSize];

 ...

 If I make the seemingly pointless change to this:

 template mergeSort(T)
 {
 void mergeSort(ref T[] array, const T setSize = 100)
 {
 T[][] merge;
 merge.length = array.length / setSize;
 T ii, jj;
 T temp2 = array.length - setSize;

There, you are setting temp2's size to be T (e.g. int), so temp2 can be negative.
 for(ii = 0, jj = 0;ii < temp2;ii += setSize, ++jj)
 merge[jj] = array[ii..ii + setSize];

 Where it's a temporary variable then it works perfectly well.

Ali
Mar 06 2012
next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 03/06/2012 10:05 PM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the
 benefit to size being unsigned rather than signed? This case would seem
 like one where allowing negatives is clearly better and more intuitive.

There are probably hundreds of discussions about that over the years on many different language newsgroups and forums. :) There is no clear winner: Both sides of the arguments seem to have good points. Ali
Mar 06 2012
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 03/07/2012 12:23 AM, Ali Çehreli wrote:
 There are probably hundreds of discussions about that over the years on
 many different language newsgroups and forums. :) There is no clear
 winner: Both sides of the arguments seem to have good points.

 Ali

know any good ones off the top of your head?
Mar 07 2012
parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 03/07/2012 07:51 PM, Ellery Newcomer wrote:
 On 03/07/2012 12:23 AM, Ali Çehreli wrote:
 There are probably hundreds of discussions about that over the years on
 many different language newsgroups and forums. :) There is no clear
 winner: Both sides of the arguments seem to have good points.

 Ali

know any good ones off the top of your head?

Really randomly: * Unsigned is better because some things are never negative: sizes, indexes, etc. * But we occasionally need to subtract one from the other. The following is not safe unless we are sure that the left-hand side is greater: size_t i = /* ... */; size_t j = /* ... */; foo(i - j); // Is this the difference or a very large number? Another bug because of an unsigned type: for (size_t i = /* ... */; i >= 0; --i) { (Although I think most compilers catch that one.) I searched for the following terms on comp.lang.c++.moderated and found many interesting discussions: size_t unsigned signed I don't have a strong opinion myself. Ali
Mar 07 2012
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the
 benefit to size being unsigned rather than signed? This case would seem
 like one where allowing negatives is clearly better and more intuitive.

The problem is not that length is unsigned. The issue is the implicit conversion from signed to unsigned. The right thing would be to disallow signed -> unsigned and unsigned -> signed implicit conversion unless value range propagation can prove it safe, and to make comparison between signed and unsigned actually work by translating it to more than one machine instruction.
Mar 07 2012
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 03/07/2012 11:01 AM, Timon Gehr wrote:
 On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the
 benefit to size being unsigned rather than signed? This case would seem
 like one where allowing negatives is clearly better and more intuitive.

The problem is not that length is unsigned. The issue is the implicit conversion from signed to unsigned. The right thing would be to disallow signed -> unsigned and unsigned -> signed implicit conversion unless value range propagation can prove it safe, and to make comparison between signed and unsigned actually work by translating it to more than one machine instruction.

Furthermore, bitwise boolean operators should still accept arguments of arbitrary signedness but the result should implicitly convert to both signed and unsigned.
Mar 07 2012
prev sibling parent Sean Cavanaugh <WorksOnMyMachine gmail.com> writes:
On 3/7/2012 12:57 PM, Jonathan M Davis wrote:
 On Wednesday, March 07, 2012 11:01:05 Timon Gehr wrote:
 On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the


conversions between signed and unsigned, a number of bugs would completely go away, but others would creep in as a result, and the overal situation wouldn't necessarily be any better, but I don't know. My initial reaction would be to agree with you, but there are definitely cases where such an approach would get annoying and bug-prone (due to the casting involved). But regardless, I really don't think that you're going to convince Walter on this one, given what he's said in the past. - Jonathan M Davis

After writing enough container libraries and whatnot in C++, I always end up going bed while thinking life would be so much easier if implicit signed/unsigned conversions were not allowed. Then I go to sleep, wake up, and realize how much more horrific the code would be in other places if this was actually true. The best compromise would probably have been to make size_t signed when migrating to 64 bit memory addressing, and leave the unsigned/signed problems specificaly with size_t values back in the past of 32 bit and older systems. On a related note I would love to see something in std.somewhere (conv?) to provide a complete set of narrowing and signed/unsigned conversion functions: The matrix is basically the following methods: 1) Reduce size (64->32->16->8 bits) but maintain signedness with three types: 2) Methods to flip signedness of the value (but mainting bitwidth) multiplied against three types of operations: a) truncating (i.e. mask off the lower bits) b) saturating (values outside of range are clamped to min or max of the narrower range) c) throwing (values outside of narrow range throw a range exception)
Mar 07 2012
prev sibling parent reply Ary Manzana <ary esperanto.org.ar> writes:
On 3/7/12 2:28 AM, Ali Çehreli wrote:
 On 03/06/2012 09:11 PM, ixid wrote:
  > I'm writing my first basic algorithms, this one is merge sort. This
  > version throws an exception when array.length - setSize is negative
  > (which should be fine, the rest of my function would deal with it):
  >
  > template mergeSort(T)
  > {
  > void mergeSort(ref T[] array, const T setSize = 100)
  > {
  > T[][] merge;
  > merge.length = array.length / setSize;
  > T ii, jj;
  > for(ii = 0, jj = 0;ii < array.length - setSize;ii += setSize, ++jj)

 We don't know what T is, but I assume a signed type like int.

 array.length is size_t, i.e. an unsigned type. Unsigned types have this
 nasty habit of converting the entire expression to unsigned (that is a
 rule since C). So array.length - setSize above is size_t.

Wouldn't it make more sense to convert the entire expression to signed if there's at list one signed integer? I really don't get it... It's like déjà-vu all the time in this newsgroup.
Mar 07 2012
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 03/08/2012 03:39 AM, Ary Manzana wrote:
 On 3/7/12 2:28 AM, Ali Çehreli wrote:
 On 03/06/2012 09:11 PM, ixid wrote:
 I'm writing my first basic algorithms, this one is merge sort. This
 version throws an exception when array.length - setSize is negative
 (which should be fine, the rest of my function would deal with it):

 template mergeSort(T)
 {
 void mergeSort(ref T[] array, const T setSize = 100)
 {
 T[][] merge;
 merge.length = array.length / setSize;
 T ii, jj;
 for(ii = 0, jj = 0;ii < array.length - setSize;ii += setSize, ++jj)

We don't know what T is, but I assume a signed type like int. array.length is size_t, i.e. an unsigned type. Unsigned types have this nasty habit of converting the entire expression to unsigned (that is a rule since C). So array.length - setSize above is size_t.

Wouldn't it make more sense to convert the entire expression to signed if there's at list one signed integer?

This is just as arbitrary as the rule that says the entire expression is converted to unsigned.
Mar 08 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Mar 07, 2012 at 06:11:18AM +0100, ixid wrote:
 I'm writing my first basic algorithms, this one is merge sort. This
 version throws an exception when array.length - setSize is negative
 (which should be fine, the rest of my function would deal with it):

array.length is of type size_t, which is an *unsigned* integer type. If setSize > array.length, then array.length - setSize will underflow and become a large positive number. T -- They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
Mar 06 2012
prev sibling next sibling parent "ixid" <nuaccount gmail.com> writes:
Ah, thank you, so it's wrapping. That seems like a bad idea, what 
is the benefit to size being unsigned rather than signed? This 
case would seem like one where allowing negatives is clearly 
better and more intuitive.
Mar 06 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
 On 03/06/2012 10:05 PM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is
 the benefit to size being unsigned rather than signed?


Because it doesn't make sense to have something with a negative size?
 This case would seem like one where allowing negatives is clearly
 better and more intuitive.


Not necessarily. For example, int.max+1 is a very large negative number, and -int.min is still a very large negative number. :-) T -- Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
Mar 06 2012
prev sibling next sibling parent James Miller <james aatch.net> writes:
On 7 March 2012 19:30, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
 On 03/06/2012 10:05 PM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is
 the benefit to size being unsigned rather than signed?
 This case would seem like one where allowing negatives is clearly
 better and more intuitive.



Because you don't describe things as -5 metres tall, so you don't describe things as -1024 bytes long. size_t makes sense unsigned because negatives make no sense for size. However, if you cast array.length to an int, it may work, haven't tested it. -- James Miller
Mar 07 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/7/12, Timon Gehr <timon.gehr gmx.ch> wrote:
 The problem is not that length is unsigned. The issue is the implicit
 conversion from signed to unsigned.

You bet. I've once had this hard to spot bug where I've used a call that was something like max(0, min(10, <expression>)), and this ended up returning a negative int because <expression> was combining an integer with a .length property of some array (I don't recall the exact call though). It was truly a WTF moment.
Mar 07 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, March 07, 2012 11:01:05 Timon Gehr wrote:
 On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the
 benefit to size being unsigned rather than signed? This case would seem
 like one where allowing negatives is clearly better and more intuitive.

The problem is not that length is unsigned. The issue is the implicit conversion from signed to unsigned. The right thing would be to disallow signed -> unsigned and unsigned -> signed implicit conversion unless value range propagation can prove it safe, and to make comparison between signed and unsigned actually work by translating it to more than one machine instruction.

Though that's one of those things that you're not going to convince Walter of - primarily, I believe, because it would require a lot more casting. The interesting part about _that_ is that if it's bad enough, it will actually make code _worse_, because the cast forces things. To really deal with it cleanly, you'd proabably need something similar to the const_cast nonsense in C++ except that it just converts signedness. I suspect that the reality of the matter is that if we disallowed implicit conversions between signed and unsigned, a number of bugs would completely go away, but others would creep in as a result, and the overal situation wouldn't necessarily be any better, but I don't know. My initial reaction would be to agree with you, but there are definitely cases where such an approach would get annoying and bug-prone (due to the casting involved). But regardless, I really don't think that you're going to convince Walter on this one, given what he's said in the past. - Jonathan M Davis
Mar 07 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, March 07, 2012 13:20:41 Sean Cavanaugh wrote:
 On 3/7/2012 12:57 PM, Jonathan M Davis wrote:
 On Wednesday, March 07, 2012 11:01:05 Timon Gehr wrote:
 On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the


I suspect that the reality of the matter is that if we disallowed implicit conversions between signed and unsigned, a number of bugs would completely go away, but others would creep in as a result, and the overal situation wouldn't necessarily be any better, but I don't know. My initial reaction would be to agree with you, but there are definitely cases where such an approach would get annoying and bug-prone (due to the casting involved). But regardless, I really don't think that you're going to convince Walter on this one, given what he's said in the past. - Jonathan M Davis

After writing enough container libraries and whatnot in C++, I always end up going bed while thinking life would be so much easier if implicit signed/unsigned conversions were not allowed. Then I go to sleep, wake up, and realize how much more horrific the code would be in other places if this was actually true. The best compromise would probably have been to make size_t signed when migrating to 64 bit memory addressing, and leave the unsigned/signed problems specificaly with size_t values back in the past of 32 bit and older systems. On a related note I would love to see something in std.somewhere (conv?) to provide a complete set of narrowing and signed/unsigned conversion functions: The matrix is basically the following methods: 1) Reduce size (64->32->16->8 bits) but maintain signedness with three types: 2) Methods to flip signedness of the value (but mainting bitwidth) multiplied against three types of operations: a) truncating (i.e. mask off the lower bits) b) saturating (values outside of range are clamped to min or max of the narrower range) c) throwing (values outside of narrow range throw a range exception)

std.conv.to will check non-implicit, narrowing conversions to make sure that the number will fit in the new type (throwing a ConvException if it won't), but if you're converting between types of the same size but which differ in signedness, then that's an implicit conversion, at it's the same as if you didn't use std.conv.to at all. - Jonathan M Davis
Mar 07 2012
prev sibling next sibling parent James Miller <james aatch.net> writes:
On 8 March 2012 15:39, Ary Manzana <ary esperanto.org.ar> wrote:
 On 3/7/12 2:28 AM, Ali =C3=87ehreli wrote:
 On 03/06/2012 09:11 PM, ixid wrote:
 =C2=A0> I'm writing my first basic algorithms, this one is merge sort. T=


 =C2=A0> version throws an exception when array.length - setSize is negat=


 =C2=A0> (which should be fine, the rest of my function would deal with i=


 =C2=A0>
 =C2=A0> template mergeSort(T)
 =C2=A0> {
 =C2=A0> void mergeSort(ref T[] array, const T setSize =3D 100)
 =C2=A0> {
 =C2=A0> T[][] merge;
 =C2=A0> merge.length =3D array.length / setSize;
 =C2=A0> T ii, jj;
 =C2=A0> for(ii =3D 0, jj =3D 0;ii < array.length - setSize;ii +=3D setSi=


 We don't know what T is, but I assume a signed type like int.

 array.length is size_t, i.e. an unsigned type. Unsigned types have this
 nasty habit of converting the entire expression to unsigned (that is a
 rule since C). So array.length - setSize above is size_t.

Wouldn't it make more sense to convert the entire expression to signed if there's at list one signed integer? I really don't get it... It's like d=C3=A9j=C3=A0-vu all the time in this newsgroup.

Its the semantics in C/C++ and D explicitly tries to have the same semantics as them. From what I remember its to aid people moving from those language to D. -- James Miller
Mar 07 2012
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, March 08, 2012 16:15:08 James Miller wrote:
 Its the semantics in C/C++ and D explicitly tries to have the same
 semantics as them. From what I remember its to aid people moving from
 those language to D.

More like it's to avoid code silently breaking when it's ported. In general, C/C++ code is either valid D code with identical semantics, or it won't compile as D code. There are a _few_ exceptions (e.g. static arrays are passed by value), but they're far and few between. Porting C/C++ code to D would be much nastier if the C/C++ compiled as D code but had different semantics. In addition to that, I believe that Walter's take on it is to generally leave stuff as it is in C/C++ unless there's a good reason to change it. Obviously, there's a lot which was worth changing, and there's tons of stuff in D which is different form C/C++. But there's a lot of little stuff that wasn't messed with, because there just wasn't a good reason to. - Jonathan M Davis
Mar 07 2012