digitalmars.D.learn - 0 < negative loop condition bug or misunderstanding on my part

ixid (27/27) Mar 06 2012 I'm writing my first basic algorithms, this one is merge sort.

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (9/34) Mar 06 2012 We don't know what T is, but I assume a signed type like int.

ixid (4/4) Mar 06 2012 Ah, thank you, so it's wrapping. That seems like a bad idea, what

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (5/8) Mar 06 2012 There are probably hundreds of discussions about that over the years on

H. S. Teoh (6/11) Mar 06 2012 Not necessarily. For example, int.max+1 is a very large negative number,
James Miller (7/12) Mar 07 2012 Because you don't describe things as -5 metres tall, so you don't
Ellery Newcomer (2/6) Mar 07 2012 know any good ones off the top of your head?

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (17/26) Mar 07 2012 Really randomly:

Timon Gehr (7/10) Mar 07 2012 The problem is not that length is unsigned. The issue is the implicit

Timon Gehr (4/14) Mar 07 2012 Furthermore, bitwise boolean operators should still accept arguments of
Andrej Mitrovic (6/8) Mar 07 2012 You bet. I've once had this hard to spot bug where I've used a call
Jonathan M Davis (16/27) Mar 07 2012 Though that's one of those things that you're not going to convince Walt...

Sean Cavanaugh (21/33) Mar 07 2012 After writing enough container libraries and whatnot in C++, I always

Jonathan M Davis (7/49) Mar 07 2012 std.conv.to will check non-implicit, narrowing conversions to make sure ...

Ary Manzana (5/22) Mar 07 2012 Wouldn't it make more sense to convert the entire expression to signed

James Miller (10/35) Mar 07 2012 ive
Jonathan M Davis (12/15) Mar 07 2012 More like it's to avoid code silently breaking when it's ported. In gene...
Timon Gehr (3/25) Mar 08 2012 This is just as arbitrary as the rule that says the entire expression is...

H. S. Teoh (10/13) Mar 06 2012 [...]

"ixid" <nuaccount gmail.com> writes:

I'm writing my first basic algorithms, this one is merge sort. 
This version throws an exception when array.length - setSize is 
negative (which should be fine, the rest of my function would 
deal with it):

template mergeSort(T)
{
	void mergeSort(ref T[] array, const T setSize = 100)
	{
		T[][] merge;
		merge.length = array.length / setSize;
		T ii, jj;
		for(ii = 0, jj = 0;ii < array.length - setSize;ii += setSize, 
++jj)
			merge[jj] = array[ii..ii + setSize];

...

If I make the seemingly pointless change to this:

template mergeSort(T)
{
	void mergeSort(ref T[] array, const T setSize = 100)
	{
		T[][] merge;
		merge.length = array.length / setSize;
		T ii, jj;
		T temp2 = array.length - setSize;
		for(ii = 0, jj = 0;ii < temp2;ii += setSize, ++jj)
			merge[jj] = array[ii..ii + setSize];

Where it's a temporary variable then it works perfectly well.

Mar 06 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 03/06/2012 09:11 PM, ixid wrote:
 I'm writing my first basic algorithms, this one is merge sort. This
 version throws an exception when array.length - setSize is negative
 (which should be fine, the rest of my function would deal with it):

 template mergeSort(T)
 {
 void mergeSort(ref T[] array, const T setSize = 100)
 {
 T[][] merge;
 merge.length = array.length / setSize;
 T ii, jj;
 for(ii = 0, jj = 0;ii < array.length - setSize;ii += setSize, ++jj)

We don't know what T is, but I assume a signed type like int.

array.length is size_t, i.e. an unsigned type. Unsigned types have this 
nasty habit of converting the entire expression to unsigned (that is a 
rule since C). So array.length - setSize above is size_t.

In other words, it is never negative.

 merge[jj] = array[ii..ii + setSize];

 ...

 If I make the seemingly pointless change to this:

 template mergeSort(T)
 {
 void mergeSort(ref T[] array, const T setSize = 100)
 {
 T[][] merge;
 merge.length = array.length / setSize;
 T ii, jj;
 T temp2 = array.length - setSize;

There, you are setting temp2's size to be T (e.g. int), so temp2 can be 
negative.

 for(ii = 0, jj = 0;ii < temp2;ii += setSize, ++jj)
 merge[jj] = array[ii..ii + setSize];

 Where it's a temporary variable then it works perfectly well.

Ali

Mar 06 2012

"ixid" <nuaccount gmail.com> writes:

Ah, thank you, so it's wrapping. That seems like a bad idea, what 
is the benefit to size being unsigned rather than signed? This 
case would seem like one where allowing negatives is clearly 
better and more intuitive.

Mar 06 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 03/06/2012 10:05 PM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the
 benefit to size being unsigned rather than signed? This case would seem
 like one where allowing negatives is clearly better and more intuitive.

There are probably hundreds of discussions about that over the years on 
many different language newsgroups and forums. :) There is no clear 
winner: Both sides of the arguments seem to have good points.

Ali

Mar 06 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

 On 03/06/2012 10:05 PM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is
 the benefit to size being unsigned rather than signed?


Because it doesn't make sense to have something with a negative size?


 This case would seem like one where allowing negatives is clearly
 better and more intuitive.


Not necessarily. For example, int.max+1 is a very large negative number,
and -int.min is still a very large negative number. :-)


T

-- 
Be in denial for long enough, and one day you'll deny yourself of things you
wish you hadn't.

Mar 06 2012

James Miller <james aatch.net> writes:

On 7 March 2012 19:30, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
 On 03/06/2012 10:05 PM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is
 the benefit to size being unsigned rather than signed?
 This case would seem like one where allowing negatives is clearly
 better and more intuitive.



Because you don't describe things as -5 metres tall, so you don't
describe things as -1024 bytes long. size_t makes sense unsigned
because negatives make no sense for size.

However, if you cast array.length to an int, it may work, haven't tested it.

--
James Miller

Mar 07 2012

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/07/2012 12:23 AM, Ali Çehreli wrote:
 There are probably hundreds of discussions about that over the years on
 many different language newsgroups and forums. :) There is no clear
 winner: Both sides of the arguments seem to have good points.

 Ali

know any good ones off the top of your head?

Mar 07 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 03/07/2012 07:51 PM, Ellery Newcomer wrote:
 On 03/07/2012 12:23 AM, Ali Çehreli wrote:
 There are probably hundreds of discussions about that over the years on
 many different language newsgroups and forums. :) There is no clear
 winner: Both sides of the arguments seem to have good points.

 Ali

 know any good ones off the top of your head?

Really randomly:

* Unsigned is better because some things are never negative: sizes, 
indexes, etc.

* But we occasionally need to subtract one from the other. The following 
is not safe unless we are sure that the left-hand side is greater:

    size_t i = /* ... */;
    size_t j = /* ... */;

    foo(i - j); // Is this the difference or a very large number?

Another bug because of an unsigned type:

    for (size_t i = /* ... */; i >= 0; --i) {

(Although I think most compilers catch that one.)

I searched for the following terms on comp.lang.c++.moderated and found 
many interesting discussions:

   size_t unsigned signed

I don't have a strong opinion myself.

Ali

Mar 07 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the
 benefit to size being unsigned rather than signed? This case would seem
 like one where allowing negatives is clearly better and more intuitive.

The problem is not that length is unsigned. The issue is the implicit 
conversion from signed to unsigned. The right thing would be to disallow 
signed -> unsigned and unsigned -> signed implicit conversion unless 
value range propagation can prove it safe, and to make comparison 
between signed and unsigned actually work by translating it to more than 
one machine instruction.

Mar 07 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/07/2012 11:01 AM, Timon Gehr wrote:
 On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the
 benefit to size being unsigned rather than signed? This case would seem
 like one where allowing negatives is clearly better and more intuitive.

 The problem is not that length is unsigned. The issue is the implicit
 conversion from signed to unsigned. The right thing would be to disallow
 signed -> unsigned and unsigned -> signed implicit conversion unless
 value range propagation can prove it safe, and to make comparison
 between signed and unsigned actually work by translating it to more than
 one machine instruction.

Furthermore, bitwise boolean operators should still accept arguments of 
arbitrary signedness but the result should implicitly convert to both 
signed and unsigned.

Mar 07 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 3/7/12, Timon Gehr <timon.gehr gmx.ch> wrote:
 The problem is not that length is unsigned. The issue is the implicit
 conversion from signed to unsigned.

You bet. I've once had this hard to spot bug where I've used a call
that was something like max(0, min(10, <expression>)), and this ended
up returning a negative int because <expression> was combining an
integer with a .length property of some array (I don't recall the
exact call though). It was truly a WTF moment.

Mar 07 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, March 07, 2012 11:01:05 Timon Gehr wrote:
 On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the
 benefit to size being unsigned rather than signed? This case would seem
 like one where allowing negatives is clearly better and more intuitive.

 
 The problem is not that length is unsigned. The issue is the implicit
 conversion from signed to unsigned. The right thing would be to disallow
 signed -> unsigned and unsigned -> signed implicit conversion unless
 value range propagation can prove it safe, and to make comparison
 between signed and unsigned actually work by translating it to more than
 one machine instruction.

Though that's one of those things that you're not going to convince Walter of 
- primarily, I believe, because it would require a lot more casting. The 
interesting part about _that_ is that if it's bad enough, it will actually 
make code _worse_, because the cast forces things. To really deal with it 
cleanly, you'd proabably need something similar to the const_cast nonsense in 
C++ except that it just converts signedness.

I suspect that the reality of the matter is that if we disallowed implicit 
conversions between signed and unsigned, a number of bugs would completely go 
away, but others would creep in as a result, and the overal situation wouldn't 
necessarily be any better, but I don't know. My initial reaction would be to 
agree with you, but there are definitely cases where such an approach would get 
annoying and bug-prone (due to the casting involved). But regardless, I really 
don't think that you're going to convince Walter on this one, given what he's 
said in the past.

- Jonathan M Davis

Mar 07 2012

Sean Cavanaugh <WorksOnMyMachine gmail.com> writes:

On 3/7/2012 12:57 PM, Jonathan M Davis wrote:
 On Wednesday, March 07, 2012 11:01:05 Timon Gehr wrote:
 On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the


 I suspect that the reality of the matter is that if we disallowed implicit
 conversions between signed and unsigned, a number of bugs would completely go
 away, but others would creep in as a result, and the overal situation wouldn't
 necessarily be any better, but I don't know. My initial reaction would be to
 agree with you, but there are definitely cases where such an approach would get
 annoying and bug-prone (due to the casting involved). But regardless, I really
 don't think that you're going to convince Walter on this one, given what he's
 said in the past.

 - Jonathan M Davis

After writing enough container libraries and whatnot in C++, I always 
end up going bed while thinking life would be so much easier if implicit 
signed/unsigned conversions were not allowed.

Then I go to sleep, wake up, and realize how much more horrific the code 
would be in other places if this was actually true.

The best compromise would probably have been to make size_t signed when 
migrating to 64 bit memory addressing, and leave the unsigned/signed 
problems specificaly with size_t values back in the past of 32 bit and 
older systems.


On a related note I would love to see something in std.somewhere (conv?) 
to provide a complete set of narrowing and signed/unsigned conversion 
functions: The matrix is basically the following methods:
1)  Reduce size (64->32->16->8 bits) but maintain signedness with three 
types:
2)  Methods to flip signedness of the value (but mainting bitwidth)


multiplied against three types of operations:
a)  truncating (i.e. mask off the lower bits)
b)  saturating (values outside of range are clamped to min or max of the 
narrower range)
c)  throwing (values outside of narrow range throw a range exception)

Mar 07 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, March 07, 2012 13:20:41 Sean Cavanaugh wrote:
 On 3/7/2012 12:57 PM, Jonathan M Davis wrote:
 On Wednesday, March 07, 2012 11:01:05 Timon Gehr wrote:
 On 03/07/2012 07:05 AM, ixid wrote:
 Ah, thank you, so it's wrapping. That seems like a bad idea, what is the


 
 I suspect that the reality of the matter is that if we disallowed implicit
 conversions between signed and unsigned, a number of bugs would completely
 go away, but others would creep in as a result, and the overal situation
 wouldn't necessarily be any better, but I don't know. My initial reaction
 would be to agree with you, but there are definitely cases where such an
 approach would get annoying and bug-prone (due to the casting involved).
 But regardless, I really don't think that you're going to convince Walter
 on this one, given what he's said in the past.
 
 - Jonathan M Davis

 
 After writing enough container libraries and whatnot in C++, I always
 end up going bed while thinking life would be so much easier if implicit
 signed/unsigned conversions were not allowed.
 
 Then I go to sleep, wake up, and realize how much more horrific the code
 would be in other places if this was actually true.
 
 The best compromise would probably have been to make size_t signed when
 migrating to 64 bit memory addressing, and leave the unsigned/signed
 problems specificaly with size_t values back in the past of 32 bit and
 older systems.
 
 
 On a related note I would love to see something in std.somewhere (conv?)
 to provide a complete set of narrowing and signed/unsigned conversion
 functions: The matrix is basically the following methods:
 1) Reduce size (64->32->16->8 bits) but maintain signedness with three
 types:
 2) Methods to flip signedness of the value (but mainting bitwidth)
 
 
 multiplied against three types of operations:
 a) truncating (i.e. mask off the lower bits)
 b) saturating (values outside of range are clamped to min or max of the
 narrower range)
 c) throwing (values outside of narrow range throw a range exception)

std.conv.to will check non-implicit, narrowing conversions to make sure that 
the number will fit in the new type (throwing a ConvException if it won't), but 
if you're converting between types of the same size but which differ in 
signedness, then that's an implicit conversion, at it's the same as if you 
didn't use std.conv.to at all.

- Jonathan M Davis

Mar 07 2012

Ary Manzana <ary esperanto.org.ar> writes:

On 3/7/12 2:28 AM, Ali Çehreli wrote:
 On 03/06/2012 09:11 PM, ixid wrote:
  > I'm writing my first basic algorithms, this one is merge sort. This
  > version throws an exception when array.length - setSize is negative
  > (which should be fine, the rest of my function would deal with it):
  >
  > template mergeSort(T)
  > {
  > void mergeSort(ref T[] array, const T setSize = 100)
  > {
  > T[][] merge;
  > merge.length = array.length / setSize;
  > T ii, jj;
  > for(ii = 0, jj = 0;ii < array.length - setSize;ii += setSize, ++jj)

 We don't know what T is, but I assume a signed type like int.

 array.length is size_t, i.e. an unsigned type. Unsigned types have this
 nasty habit of converting the entire expression to unsigned (that is a
 rule since C). So array.length - setSize above is size_t.

Wouldn't it make more sense to convert the entire expression to signed 
if there's at list one signed integer?

I really don't get it...

It's like déjà-vu all the time in this newsgroup.

Mar 07 2012

James Miller <james aatch.net> writes:

On 8 March 2012 15:39, Ary Manzana <ary esperanto.org.ar> wrote:
 On 3/7/12 2:28 AM, Ali =C3=87ehreli wrote:
 On 03/06/2012 09:11 PM, ixid wrote:
 =C2=A0> I'm writing my first basic algorithms, this one is merge sort. T=


his
 =C2=A0> version throws an exception when array.length - setSize is negat=


ive
 =C2=A0> (which should be fine, the rest of my function would deal with i=


t):
 =C2=A0>
 =C2=A0> template mergeSort(T)
 =C2=A0> {
 =C2=A0> void mergeSort(ref T[] array, const T setSize =3D 100)
 =C2=A0> {
 =C2=A0> T[][] merge;
 =C2=A0> merge.length =3D array.length / setSize;
 =C2=A0> T ii, jj;
 =C2=A0> for(ii =3D 0, jj =3D 0;ii < array.length - setSize;ii +=3D setSi=


ze, ++jj)
 We don't know what T is, but I assume a signed type like int.

 array.length is size_t, i.e. an unsigned type. Unsigned types have this
 nasty habit of converting the entire expression to unsigned (that is a
 rule since C). So array.length - setSize above is size_t.


 Wouldn't it make more sense to convert the entire expression to signed if
 there's at list one signed integer?

 I really don't get it...

 It's like d=C3=A9j=C3=A0-vu all the time in this newsgroup.

Its the semantics in C/C++ and D explicitly tries to have the same
semantics as them. From what I remember its to aid people moving from
those language to D.

--
James Miller

Mar 07 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, March 08, 2012 16:15:08 James Miller wrote:
 Its the semantics in C/C++ and D explicitly tries to have the same
 semantics as them. From what I remember its to aid people moving from
 those language to D.

More like it's to avoid code silently breaking when it's ported. In general, 
C/C++ code is either valid D code with identical semantics, or it won't 
compile as D code. There are a _few_ exceptions (e.g. static arrays are passed 
by value), but they're far and few between. Porting C/C++ code to D would be 
much nastier if the C/C++ compiled as D code but had different semantics.

In addition to that, I believe that Walter's take on it is to generally leave 
stuff as it is in C/C++ unless there's a good reason to change it. Obviously, 
there's a lot which was worth changing, and there's tons of stuff in D which is 
different form C/C++. But there's a lot of little stuff that wasn't messed
with, 
because there just wasn't a good reason to.

- Jonathan M Davis

Mar 07 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/08/2012 03:39 AM, Ary Manzana wrote:
 On 3/7/12 2:28 AM, Ali Çehreli wrote:
 On 03/06/2012 09:11 PM, ixid wrote:
 I'm writing my first basic algorithms, this one is merge sort. This
 version throws an exception when array.length - setSize is negative
 (which should be fine, the rest of my function would deal with it):

 template mergeSort(T)
 {
 void mergeSort(ref T[] array, const T setSize = 100)
 {
 T[][] merge;
 merge.length = array.length / setSize;
 T ii, jj;
 for(ii = 0, jj = 0;ii < array.length - setSize;ii += setSize, ++jj)

 We don't know what T is, but I assume a signed type like int.

 array.length is size_t, i.e. an unsigned type. Unsigned types have this
 nasty habit of converting the entire expression to unsigned (that is a
 rule since C). So array.length - setSize above is size_t.

 Wouldn't it make more sense to convert the entire expression to signed
 if there's at list one signed integer?

This is just as arbitrary as the rule that says the entire expression is 
converted to unsigned.

Mar 08 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Mar 07, 2012 at 06:11:18AM +0100, ixid wrote:
 I'm writing my first basic algorithms, this one is merge sort. This
 version throws an exception when array.length - setSize is negative
 (which should be fine, the rest of my function would deal with it):

[...]

array.length is of type size_t, which is an *unsigned* integer type. If
setSize > array.length, then array.length - setSize will underflow and
become a large positive number.


T

-- 
They say that "guns don't kill people, people kill people." Well I think
the gun helps. If you just stood there and yelled BANG, I don't think
you'd kill too many people. -- Eddie Izzard, Dressed to Kill

Mar 06 2012

D Programming

C/C++ Programming

Other

digitalmars.D.learn - 0 < negative loop condition bug or misunderstanding on my part