digitalmars.D - why implicitly allowing compare ubyte and byte sucks


davidl <davidl nospam.org> writes:

ubyte func()
{
	return 255;
}

const byte VAR = cast(byte)0xff;
void main()
{

	assert(func == VAR);
	assert(255 == VAR);
}

Even if you look at the ASM, you might still be fooled unless you read it 
carefully enough.

testcmp.d:10    assert(func == VAR);
0040201b: e8f0ffffff              call 0x402010 testcmp.func testcmp.d:1
00402020: 0fb6c0                  movzx eax, al
00402023: 83f8ff                  cmp eax, 0xff
00402026: 740a                    jz 0x402032   _Dmain testcmp.d:11
00402028: b80a000000              mov eax, 0xa
0040202d: e80e000000              call 0x402040 testcmp.__assert
testcmp.d:11    assert(255 == VAR);
00402032: b80b000000              mov eax, 0xb
00402037: e804000000              call 0x402040 testcmp.__assert
testcmp.d:12 }
0040203c: 5d                      pop ebp
0040203d: c3                      ret

It seems that comparing two operands of different sizes makes no sense. 
The compiler should issue an error for that.

Comparing ubyte to byte may lead one to think they are compared at the 
same size.

This behavior isn't consistent with int and uint:

	int j=-1;
	assert(j==uint.max); // this test passes

	byte k=-1;
	assert(k==ubyte.max); // this test fails

This inconsistent behavior is pretty nasty.
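A minimal D sketch of why the two asserts above behave differently, assuming the usual arithmetic conversions of current D (inherited from C). The helper names `byteVsUbyte` and `intVsUint` are hypothetical; they only spell out the conversions the compiler applies implicitly.

```d
/// byte == ubyte: both operands are promoted to int first.
bool byteVsUbyte(byte k, ubyte u)
{
    int lhs = k; // sign extension: cast(byte) 0xFF becomes -1
    int rhs = u; // zero extension: 0xFF becomes 255
    return lhs == rhs;
}

/// int == uint: no wider type is involved; the int is converted to uint.
bool intVsUint(int j, uint u)
{
    uint lhs = cast(uint) j; // -1 becomes 0xFFFF_FFFF, which is uint.max
    return lhs == u;
}

unittest
{
    assert(!byteVsUbyte(-1, ubyte.max)); // -1 != 255, the failing assert
    assert(intVsUint(-1, uint.max));     // the passing assert

    // The built-in comparison agrees with the explicit version.
    byte k = -1;
    assert((k == ubyte.max) == byteVsUbyte(k, ubyte.max));
}
```

The int/uint helper uses an explicit cast so the sketch does not depend on whether a particular compiler warns about mixed-sign comparisons.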


Jun 10 2009

Walter Bright <newshound1 digitalmars.com> writes:

davidl wrote:
 It seems that comparing two operands of different sizes makes no 
 sense. The compiler should issue an error for that.

Consider:

    byte b;
    if (b == 1)

here you're comparing two different sizes, a byte and an int. 
Disallowing such (in its various incarnations) is a heavy burden, as the 
user will have to insert lots of ugly casts.

There really isn't any escaping from the underlying representation of 
2's complement arithmetic with its overflows, wrap-arounds, sign 
extensions, etc.
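A small sketch of Walter's point, assuming current D semantics: the byte operand is promoted to int before the comparison, so `b == 1` needs no cast, and arithmetic stored back into a byte wraps in two's complement. `isOne` and `wrapsAround` are hypothetical helpers for illustration.

```d
/// `b == 1` compares at int width: b is promoted, 1 is already an int.
bool isOne(byte b)
{
    static assert(is(typeof(b + 1) == int)); // arithmetic on byte yields int
    return b == 1;
}

/// Two's complement wrap-around: incrementing byte.max gives byte.min.
byte wrapsAround()
{
    byte x = byte.max; // 127
    x++;               // result truncated to 8 bits: -128
    return x;
}
```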

Jun 11 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

2009/6/11 Walter Bright <newshound1 digitalmars.com>:
 davidl wrote:
 It seems that comparing two operands of different sizes makes no
 sense. The compiler should issue an error for that.

 Consider:

    byte b;
    if (b == 1)

 here you're comparing two different sizes, a byte and an int. Disallowing
 such (in its various incarnations) is a heavy burden, as the user will have
 to insert lots of ugly casts.

Weren't polysemous types supposed to avoid all that?

Jun 11 2009

Walter Bright <newshound1 digitalmars.com> writes:

Jarrett Billingsley wrote:
 Weren't polysemous types supposed to avoid all that?

It kept getting too complicated.

Jun 11 2009

Lionello Lunesu <lio lunesu.remove.com> writes:

Walter Bright wrote:
 davidl wrote:
 It seems that comparing two operands of different sizes makes no 
 sense. The compiler should issue an error for that.

 
 Consider:
 
    byte b;
    if (b == 1)
 
 here you're comparing two different sizes, a byte and an int. 
 Disallowing such (in its various incarnations) is a heavy burden, as the 
 user will have to insert lots of ugly casts.
 
 There really isn't any escaping from the underlying representation of 
 2's complement arithmetic with its overflows, wrap-arounds, sign 
 extensions, etc.

Why is "1" an int? Can't it be treated similar to the way string 
literals are treated: "a string literal" can be string, wstring and dstring:

dstring test = "Asdf";
int main()
{
  return test == "asdf";
}

L.
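For comparison, current D (not necessarily the 2009 compiler) has value range propagation: an int literal converts implicitly to a narrower type when its value fits. This covers initialization and assignment only; in `b == 1` the byte is still promoted to int. A sketch; `fitsInByte` is a hypothetical helper mirroring the range test.

```d
// String literals take the character width the context asks for.
static assert(is(typeof("asdf") == string));
enum dstring wide = "asdf";

// Integer literals convert to a narrower type only if the value fits.
static assert( __traits(compiles, { byte b = 1; }));
static assert(!__traits(compiles, { byte b = 255; }));

/// Hypothetical helper: the range test applied to the literal's value.
bool fitsInByte(long value)
{
    return value >= byte.min && value <= byte.max;
}
```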

Jun 11 2009

Don <nospam nospam.com> writes:

Walter Bright wrote:
 davidl wrote:
 It seems that comparing two operands of different sizes makes no 
 sense. The compiler should issue an error for that.

 
 Consider:
 
    byte b;
    if (b == 1)
 
 here you're comparing two different sizes, a byte and an int. 
 Disallowing such (in its various incarnations) is a heavy burden, as the 
 user will have to insert lots of ugly casts.
 
 There really isn't any escaping from the underlying representation of 
 2's complement arithmetic with its overflows, wrap-arounds, sign 
 extensions, etc.

The problem is a lot more specific than that.
The unexpected behaviour comes from the method used to promote two types 
to a common type, when both are smaller than int, but of different 
signedness. Intuitively, you expect the common type of {byte, ubyte} to 
be ubyte, by analogy to {int, uint}->uint, and {long, ulong}->ulong. But 
instead, the common type is int!
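Don's description of the common types can be checked at compile time. A sketch assuming current D; `Common` and `commonName` are defined here for illustration only. Equality comparisons convert their operands with the same usual arithmetic conversions that `+` uses.

```d
/// The type D computes for A + B, which is also the type both
/// operands are converted to for A == B.
alias Common(A, B) = typeof(A.init + B.init);

static assert(is(Common!(int, uint) == uint));     // {int, uint}   -> uint
static assert(is(Common!(long, ulong) == ulong));  // {long, ulong} -> ulong
static assert(is(Common!(byte, ubyte) == int));    // {byte, ubyte} -> int!
static assert(is(Common!(short, ushort) == int));  // same for 16-bit types

/// Hypothetical helper returning the common type's name.
string commonName(A, B)()
{
    return Common!(A, B).stringof;
}
```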

The involvement of 'int' in the promotion process is kind of bizarre, 
really. It's a consequence of the fact that in C, short and char are 
second-class citizens, only really intended for saving space. The 
semantics of operations on two different space-saving types are a bit 
problematic.

I think it's true that

byte  == ubyte, byte  == ushort,
short == ubyte, short == ushort

are almost always errors. Could we just make those four illegal?
BTW, it just occurred to me that these four (and only these four) are the 
cases where a "signed/unsigned mismatch" warning is actually helpful. A 
signed-unsigned warning involving 'int' is almost always spurious.
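Don's proposed rule, sketched as a compile-time predicate: flag an equality comparison when the operands differ in signedness and both are narrower than int. `isSuspiciousCompare` and `strictEquals` are hypothetical names; the compiler has no such check, and `strictEquals` is only an opt-in wrapper.

```d
import std.traits : isIntegral, isSigned;

/// True exactly for {byte, short} x {ubyte, ushort}, in either order.
enum bool isSuspiciousCompare(A, B) =
    isIntegral!A && isIntegral!B
    && A.sizeof < int.sizeof && B.sizeof < int.sizeof
    && (isSigned!A != isSigned!B);

static assert( isSuspiciousCompare!(byte, ubyte));
static assert( isSuspiciousCompare!(byte, ushort));
static assert( isSuspiciousCompare!(short, ubyte));
static assert( isSuspiciousCompare!(short, ushort));
static assert( isSuspiciousCompare!(ubyte, byte));   // order does not matter
static assert(!isSuspiciousCompare!(int, uint));     // int involved: not flagged
static assert(!isSuspiciousCompare!(byte, int));
static assert(!isSuspiciousCompare!(ubyte, ushort)); // same signedness

/// Refuses to compile for the four suspicious pairs.
bool strictEquals(A, B)(A a, B b)
    if (!isSuspiciousCompare!(A, B))
{
    return a == b;
}
```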

For bonus points:

Jun 11 2009

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Don wrote:
 For bonus points:

[end of message]

I guess nobody'll be getting those bonus points then... :P

Jun 11 2009

Don <nospam nospam.com> writes:

Frits van Bommel wrote:
 Don wrote:
 For bonus points:

 [end of message]
 
 I guess nobody'll be getting those bonus points then... :P

<g>

For bonus points:
Code like the following is also almost certainly a bug:
byte b = -1;
if (b == 255)  ... // FALSE!

When a variable of byte or short type is compared with a positive literal 
of value > byte.max or short.max respectively, or when a ubyte or 
ushort is compared with a negative literal, it's pretty much the same 
situation.
Flagging an error for this situation would typically reveal the root 
cause: b should have been 'ubyte', not 'byte'.
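The bonus-points check can be sketched the same way: a literal outside the range of a narrow type can never compare equal to a variable of that type. `literalOutOfRange` is a hypothetical helper, not a compiler diagnostic.

```d
/// True if `literal` cannot be represented in T, so `x == literal`
/// is false for every x of type T.
bool literalOutOfRange(T)(long literal)
    if (is(T == byte) || is(T == short) || is(T == ubyte) || is(T == ushort))
{
    return literal < T.min || literal > T.max;
}
```

For Don's example, `literalOutOfRange!byte(255)` is true, so `b == 255` with `b` a byte would be flagged; `literalOutOfRange!ubyte(-1)` covers the negative-literal case.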

Jun 11 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 Walter Bright wrote:
 davidl wrote:
 It seems that comparing two operands of different sizes makes no 
 sense. The compiler should issue an error for that.

 Consider:

    byte b;
    if (b == 1)

 here you're comparing two different sizes, a byte and an int. 
 Disallowing such (in its various incarnations) is a heavy burden, as 
 the user will have to insert lots of ugly casts.

 There really isn't any escaping from the underlying representation of 
 2's complement arithmetic with its overflows, wrap-arounds, sign 
 extensions, etc.

 
 The problem is a lot more specific than that.
 The unexpected behaviour comes from the method used to promote two types 
 to a common type, when both are smaller than int, but of different 
 signedness. Intuitively, you expect the common type of {byte, ubyte} to 
 be ubyte, by analogy to {int, uint}->uint, and {long, ulong}->ulong. But 
 instead, the common type is int!
 
 The involvement of 'int' in the promotion process is kind of bizarre, 
 really. It's a consequence of the fact that in C, short and char are 
 second-class citizens, only really intended for saving space. The 
 semantics of operations on two different space-saving types are a bit 
 problematic.
 
 I think it's true that
 
 byte  == ubyte, byte  == ushort,
 short == ubyte, short == ushort
 
 are almost always errors. Could we just make those four illegal?
 BTW, it just occurred to me that these four (and only these four) are the 
 cases where a "signed/unsigned mismatch" warning is actually helpful. A 
 signed-unsigned warning involving 'int' is almost always spurious.
 
 For bonus points:

Yeah, where are zose :o).

Hey, please bugzillize everything. Walter is almost done with revamping 

C/C++. I just found three bugs in phobos by using his alpha compiler.


Andrei

Jun 11 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 Hey, please bugzillize everything. Walter is almost done with revamping 

 C/C++. I just found three bugs in phobos by using his alpha compiler.

Walter is kind of magic, I see :-)) He brings toys.

The designer of Haskell once said: "Avoid success at any cost".
There are worse fates than not having success: maybe being a boring language? :-)

Bye,
bearophile

Jun 11 2009

Derek Parnell <derek psych.ward> writes:

On Fri, 12 Jun 2009 02:08:14 +0200, Don wrote:

 Walter Bright wrote:
 davidl wrote:
 It seems that comparing two operands of different sizes makes no 
 sense. The compiler should issue an error for that.

 
 Consider:
 
    byte b;
    if (b == 1)
 
 here you're comparing two different sizes, a byte and an int. 
 Disallowing such (in its various incarnations) is a heavy burden, as the 
 user will have to insert lots of ugly casts.
 
 There really isn't any escaping from the underlying representation of 
 2's complement arithmetic with its overflows, wrap-arounds, sign 
 extensions, etc.

 
 The problem is a lot more specific than that.
 The unexpected behaviour comes from the method used to promote two types 
 to a common type, when both are smaller than int, but of different 
 signedness. Intuitively, you expect the common type of {byte, ubyte} to 
 be ubyte, by analogy to {int, uint}->uint, and {long, ulong}->ulong. But 
 instead, the common type is int!

I think that the common type for byte and ubyte is short. Byte and ubyte
have overlapping ranges of values (-128 to 127) and (0 to 255), so a common
type would have to be able to hold both these ranges at least, and short
(16-bit signed integer) does that.
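Derek's range argument, checked at compile time, with the comparison done at short width. A sketch; `equalAsShort` is hypothetical. For byte and ubyte operands the result is the same as promotion to int gives today: -1 still does not equal 255.

```d
// [-128, 127] and [0, 255] both lie inside [-32768, 32767].
static assert(short.min <= byte.min && short.max >= ubyte.max);

/// Hypothetical helper: compare byte and ubyte after widening both to short.
bool equalAsShort(byte b, ubyte u)
{
    short sb = b; // sign extension: -1 stays -1
    short su = u; // zero extension: 255 stays 255
    return sb == su;
}
```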

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jun 11 2009

Don <nospam nospam.com> writes:

Derek Parnell wrote:
 On Fri, 12 Jun 2009 02:08:14 +0200, Don wrote:
 
 Walter Bright wrote:
 davidl wrote:
 It seems that comparing two operands of different sizes makes no 
 sense. The compiler should issue an error for that.

 Consider:

    byte b;
    if (b == 1)

 here you're comparing two different sizes, a byte and an int. 
 Disallowing such (in its various incarnations) is a heavy burden, as the 
 user will have to insert lots of ugly casts.

 There really isn't any escaping from the underlying representation of 
 2's complement arithmetic with its overflows, wrap-arounds, sign 
 extensions, etc.

 The problem is a lot more specific than that.
 The unexpected behaviour comes from the method used to promote two types 
 to a common type, when both are smaller than int, but of different 
 signedness. Intuitively, you expect the common type of {byte, ubyte} to 
 be ubyte, by analogy to {int, uint}->uint, and {long, ulong}->ulong. But 
 instead, the common type is int!

 
 I think that the common type for byte and ubyte is short. Byte and ubyte
 have overlapping ranges of values (-128 to 127) and (0 to 255), so a common
 type would have to be able to hold both these ranges at least, and short
 (16-bit signed integer) does that.

But then you still have the problem that the high half of the short was 
extended from the low half in two different ways, once by sign-extend, 
once by zero-extend. Mixing sign-extend and zero-extend in the same 
expression is asking for trouble.
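To illustrate Don's objection: the same 8-bit pattern 0xFF reaches 16 bits with a different high half depending on whether it is sign-extended or zero-extended. The function names are hypothetical.

```d
/// Widen the raw pattern as a signed value (sign extension).
ushort signExtendedBits(ubyte raw)
{
    short widened = cast(byte) raw; // reinterpret as byte, then widen
    return cast(ushort) widened;    // expose the 16-bit pattern
}

/// Widen the raw pattern as an unsigned value (zero extension).
ushort zeroExtendedBits(ubyte raw)
{
    short widened = raw;            // high half filled with zeros
    return cast(ushort) widened;
}
```

For 0xFF the first gives 0xFFFF and the second 0x00FF; for patterns below 0x80 the two agree.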

Jun 11 2009

Rainer Deyke <rainerd eldwood.com> writes:

Don wrote:
 But then you still have the problem that the high half of the short was
 extended from the low half in two different ways, once by sign-extend,
 once by zero-extend. Mixing sign-extend and zero-extend in the same
 expression is asking for trouble.

I disagree.  In fact, I don't think sign extension or conversion to a common
type should even be necessary.

Given value 's' of type 'sT' and unsigned value 'u' of type 'uT', where
'sT' and 'uT' have the same width, comparisons should be translated as
follows:
  's == u' --> 's >= 0 && cast(uT)(s) == u'
  's != u' --> 's < 0 || cast(uT)(s) != u'
  's < u' --> 's < 0 || cast(uT)(s) < u'
  's <= u' --> 's < 0 || cast(uT)(s) <= u'
  's > u' --> 's >= 0 && cast(uT)(s) > u'
  's >= u' --> 's >= 0 && cast(uT)(s) >= u'

This system would always work, even when no type exists that can hold
all possible values of both 'sT' and 'uT'.  And it would always be
*correct*, i.e. negative values would always be smaller than and
different from positive values, even when the positive value is outside
the range of any signed type.
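Rainer's translation table, sketched as D function templates restricted to a signed type S and an unsigned type U of the same width. The names are hypothetical; this is not what the compiler does. The `>=` row tests `s >= 0`, since with `s > 0` the comparison `0 >= 0u` would come out false.

```d
import std.traits : isIntegral, isSigned, isUnsigned;

/// S signed, U unsigned, same width (e.g. byte/ubyte, int/uint).
enum bool sameWidthMix(S, U) =
    isIntegral!S && isSigned!S && isIntegral!U && isUnsigned!U
    && S.sizeof == U.sizeof;

bool mixedEq(S, U)(S s, U u) if (sameWidthMix!(S, U)) { return s >= 0 && cast(U) s == u; }
bool mixedNe(S, U)(S s, U u) if (sameWidthMix!(S, U)) { return s <  0 || cast(U) s != u; }
bool mixedLt(S, U)(S s, U u) if (sameWidthMix!(S, U)) { return s <  0 || cast(U) s <  u; }
bool mixedLe(S, U)(S s, U u) if (sameWidthMix!(S, U)) { return s <  0 || cast(U) s <= u; }
bool mixedGt(S, U)(S s, U u) if (sameWidthMix!(S, U)) { return s >= 0 && cast(U) s >  u; }
bool mixedGe(S, U)(S s, U u) if (sameWidthMix!(S, U)) { return s >= 0 && cast(U) s >= u; }
```

Because the negative case is decided before any cast, `mixedLt(-1, uint.max)` is true even though no signed type of that width can hold `uint.max`.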

-- 
Rainer Deyke - rainerd eldwood.com

Jun 11 2009

Don <nospam nospam.com> writes:

Rainer Deyke wrote:
 Don wrote:
 But then you still have the problem that the high half of the short was
 extended from the low half in two different ways, once by sign-extend,
 once by zero-extend. Mixing sign-extend and zero-extend in the same
 expression is asking for trouble.

 
 I disagree.  In fact, I don't think sign extension or conversion to a common
 type should even be necessary.

Doing _no_ extension doesn't cause problems, of course.

 
 Given value 's' of type 'sT' and unsigned value 'u' of type 'uT', where
 'sT' and 'uT' have the same width, comparisons should be translated as
 follows:
   's == u' --> 's >= 0 && cast(uT)(s) == u'
   's != u' --> 's < 0 || cast(uT)(s) != u'
   's < u' --> 's < 0 || cast(uT)(s) < u'
   's <= u' --> 's < 0 || cast(uT)(s) <= u'
   's > u' --> 's >= 0 && cast(uT)(s) > u'
   's >= u' --> 's >= 0 && cast(uT)(s) >= u'
 
 This system would always work, even when no type exists that can hold
 all possible values of both 'sT' and 'uT'.  And it would always be
 *correct*, i.e. negative values would always be smaller than and
 different from positive values, even when the positive value is outside
 the range of any signed type.

That's true. What you are doing is removing the int/byte inconsistency, 
by making uint == int comparisons behave the same way that ubyte == 
byte comparisons do now.
Notice that your proposal
(1) preserves the existing behaviour of byte==ubyte (which the original 
poster was complaining about);
(2) silently changes the behaviour of existing D and C code (that 
involves int==uint); and
(3) assumes that the code as written is what the programmer intended. I 
suspect that this type of code is frequently an indicator of a bug. 
Consider:

const ubyte u = 0xFF;
byte b;
if (b == u) ...

After your transformation, this will be:

if (false) ...

But actually the code has a simple bug: b should have been ubyte. I 
think this is a pretty common bug (I've done it several times myself).

(2) is fatal, I think.
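Don's example can be checked exhaustively. A sketch: the hypothetical `countMatches` tries all 256 byte values against a given ubyte `u`. With `u = 0xFF` nothing matches, because `b` is promoted to int and only ranges over -128..127.

```d
/// Counts how many byte values b satisfy `b == u` under D's promotion rules.
int countMatches(ubyte u)
{
    int hits = 0;
    foreach (int i; byte.min .. byte.max + 1)
    {
        byte b = cast(byte) i;
        if (b == u)   // both operands promoted to int
            ++hits;
    }
    return hits;
}
```

`countMatches(0xFF)` is 0, so `if (b == u)` is dead code, while any `u` up to 0x7F has exactly one match.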

Jun 11 2009

Rainer Deyke <rainerd eldwood.com> writes:

Don wrote:
 That's true. What you are doing is removing the int/byte inconsistency,
 by making uint == int comparisons behave the same way that ubyte ==
 byte comparisons do now.
 Notice that your proposal
 (1) preserves the existing behaviour of byte==ubyte (which the original
 poster was complaining about);

Yes.

 (2) silently changes the behaviour of existing D and C code (that
 involves int==uint); and

True.  I don't consider C compatibility a major issue, but others do.
(If C compatibility was a major issue for me, I'd never even consider
moving from C++ to D.)

 (3) assumes that the code as written is what the programmer intended. I
 suspect that this type of code is frequently an indicator of a bug.

Yes, but the opposite behavior is just as likely to be a bug.  Between
two behaviors that mask possible bugs, I'd rather have the
mathematically correct behavior.  The alternative is to flat-out ban
comparison of mixed-sign types.


-- 
Rainer Deyke - rainerd eldwood.com

Jun 12 2009

Sean Kelly <sean invisibleduck.org> writes:

Walter Bright wrote:
 davidl wrote:
 It seems that comparing two operands of different sizes makes no 
 sense. The compiler should issue an error for that.

 
 Consider:
 
    byte b;
    if (b == 1)
 
 here you're comparing two different sizes, a byte and an int. 
 Disallowing such (in its various incarnations) is a heavy burden, as the 
 user will have to insert lots of ugly casts.

Until we get polysemous values, that is ;-)  Assuming that's still on 
the radar...

Jun 11 2009
