digitalmars.D.learn - Mixing operations with signed and unsigned types

Michal Minich (34/34) Jun 29 2010 I was surprised by the behavior of division. The resulting type of

Stewart Gordon (13/46) Jun 29 2010 Going by the spec

bearophile (4/5) Jun 29 2010 I have added my vote there a lot of time ago. I think Andrei says that f...

Michal Minich (3/14) Jun 29 2010 Why on the earth should array indexes and lengths be signed !!! My brain...

bearophile (6/9) Jun 29 2010 One other partial solution is to introduce optional runtime integral ove...

Michal Minich (4/15) Jun 29 2010 I voted for the bug, but IMO it should be fixed by other means as making...
Stewart Gordon (13/21) Jun 29 2010 That's probably because many people neglect the unsigned types, instead

bearophile (5/14) Jun 30 2010 Yes, of course that code needs to be fixed after the change I have sugge...

Stewart Gordon (6/13) Jun 30 2010 That code needs to be "fixed"? My point was that being forced to use

bearophile (12/15) Jul 01 2010 Yes, in my opinion it needs to be fixed. Using unsigned integers in D is...

Stewart Gordon (25/43) Jul 01 2010 If it's logical and the program works, it isn't objectively wrong. Some...

Michal Minich (4/14) Jun 29 2010 point 4.4 in docs - "The signed type is converted to the unsigned type."...

Michal Minich (3/3) Jun 29 2010 There is very long discussion on digitalmars.D ng "Is there ANY chance we...
bearophile (41/59) Jul 04 2010 Right. But bug-prone means that often enough people write code that does...

Stewart Gordon (30/69) Jul 05 2010 I didn't think you were asking for unsigned types to be removed. My

Ellery Newcomer (8/24) Jul 05 2010 Another important difference is the point of non-'continuity'

Stewart Gordon (6/24) Jul 06 2010 Just using uint, of course!

Ellery Newcomer (14/28) Jul 06 2010 For enforcing a non-negative constraint, that is brain damaged.

Stewart Gordon (18/40) Jul 08 2010 So effectively, the edit wars would be between people thinking at cross

Jérôme M. Berger (12/28) Jul 08 2010 n

bearophile (73/91) Jul 05 2010 This happens and it's one of the usage examples of unsigned values, but ...

Michal Minich <michal.minich gmail.com> writes:

I was surprised by the behavior of division. The resulting type of the 
division in the example below is uint, and the value is incorrect. I would 
expect that when one of the operands is signed, the result is of a signed 
type. 

int  a = -6;
uint b = 2;
auto c = a / b;          // c is type of uint, and has value 2147483645
int  d = a / b;          // int,  2147483645
auto e = a / cast(int)b; // e, -3 (ok)

I have had problems with mixing int and uint for a long time, so I tested 
some more expressions, and here are the results. 

auto f = a - b;          // uint, 4294967288
auto g = a + b;          // uint, 4294967292
auto h = a < b;          // bool, false
auto i = a > b;          // bool, true
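
For reference, the results listed above can be reproduced with a small self-contained program. This is only a sketch assuming a 32-bit int/uint and a D2 compiler that still accepts mixed signed/unsigned comparisons; the static asserts check the result types and the asserts check the values quoted in this post.

```d
import std.stdio : writeln;

void main()
{
    int  a = -6;
    uint b = 2;

    // In a mixed int/uint expression the int operand is converted to uint,
    // so -6 becomes 4294967290 before the operation is performed.
    auto c = a / b;
    auto f = a - b;
    auto g = a + b;

    static assert(is(typeof(c) == uint));
    static assert(is(typeof(f) == uint));
    static assert(is(typeof(g) == uint));

    assert(c == 2_147_483_645u);
    assert(f == 4_294_967_288u);
    assert(g == 4_294_967_292u);

    // The comparison is also carried out on the converted (unsigned) value.
    assert(!(a < b));
    assert(a > b);

    // Casting the unsigned operand keeps the arithmetic signed.
    assert(a / cast(int) b == -3);

    writeln(typeof(c).stringof, " ", c);
}
```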

Recently, while I was hunting a bug in templated code, I created a 
templated function for operator <, which requires both arguments to be 
either signed or unsigned. Fortunately, in D such a function was quite easy 
to write; if it hadn't been possible, I don't know if I would ever have 
found where the ints and uints come from...

import std.traits; // provides isSigned and isUnsigned

bool sameSign (A, B) () {
    return (isUnsigned!(A) && isUnsigned!(B)) || (isSigned!(A) && isSigned!(B));
}

bool lt (A, B) (A a, B b) {
    static assert (sameSign!(A, B) ());
    return a < b;
}
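
The same idea can also be expressed with a template constraint instead of a static assert in the function body. The following is a sketch of that variant, not the code used in the original post: sameSign is rewritten here as an enum template, and the usage in main is purely illustrative. With a constraint, a mixed call fails overload resolution, which can be observed with __traits(compiles).

```d
import std.traits : isSigned, isUnsigned;

// True when both types are unsigned or both are signed.
enum sameSign(A, B) =
    (isUnsigned!A && isUnsigned!B) || (isSigned!A && isSigned!B);

// Less-than that only accepts operands of matching signedness.
bool lt(A, B)(A a, B b)
    if (sameSign!(A, B))
{
    return a < b;
}

void main()
{
    int  a = -6;
    uint b = 2;

    assert(lt(a, cast(int) b));   // -6 < 2 when both operands are signed
    assert(lt(1u, b));            // both operands are unsigned

    // The mixed call is rejected at compile time.
    static assert(!__traits(compiles, lt(a, b)));
}
```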

Could somebody please tell me why this behavior, when mixing signed 
and unsigned, is preferred over one that computes the correct result? If 
this cannot be changed, is it possible to just make the compiler emit an 
error/warning when such an incorrect calculation could occur? If it is 
possible in D code to require same-signed types for a function, it is 
definitely possible for the compiler to require an explicit cast in such 
cases.

Jun 29 2010

Stewart Gordon <smjg_1998 yahoo.com> writes:

Michal Minich wrote:
 I was surprised by the behavior of division. The resulting type of 
 division in example below is uint and the value is incorrect. I would 
 expect that when one of operands is signed, then the result is signed 
 type. 

Going by the spec
http://www.digitalmars.com/d/1.0/type.html
"Usual Arithmetic Conversions"
the compiler is behaving correctly.  But see below....

<snip>
 auto f = a - b           // uint, 4294967288
 auto g = a + b           // uint, 4294967292
 auto h = a < b           // bool, false
 auto i = a > b           // bool, true
 
 Recently while I was hunting some bug in templated code, I created a 
 templated function for operator <, which requires both arguments to be 
 either signed or unsigned.

It is in fact a bug that DMD accepts it.
http://www.digitalmars.com/d/1.0/expression.html#RelExpression
http://d.puremagic.com/issues/show_bug.cgi?id=259

 Fortunately D such function was quite easy to 
 do, if it wasn't possible I don't know if I would ever find form where 
 the ints and uints come from...
 
 bool sameSign (A, B) () {
     return (isUnsigned!(A) && isUnsigned!(B)) || (isSigned!(A) && isSigned!(B));
 }
 
 bool lt (A, B) (A a, B b) {
     static assert (sameSign!(A, B) ());
     return a < b;
 }
 
 Could somebody please tell me why is this behavior, when mixing signed 
 and unsigned, preferred over one that computes correct result.

It would appear to be Walter's idea of C compatibility taking control 
again.

 If this cannot be changed, is it possible to just make compiler 
 error/warning when such incorrect calculation could occur. If it is 
 possible in D code to require same-signed types for function, it is 
 definitely possible for compiler to require explicit cast in such 
 cases.

I agree.  Either behave sensibly or generate an error.

Stewart.

Jun 29 2010

bearophile <bearophileHUGS lycos.com> writes:

Stewart Gordon:
 http://d.puremagic.com/issues/show_bug.cgi?id=259

I added my vote there a long time ago. I think Andrei says that fixing
this is unworkable, but I don't know why. If you make this an error and at the
same time turn array indexes/lengths into signed values, you don't have that
many unsigned values in normal D programs, so you need very few casts and it
becomes workable.

Bye,
bearophile

Jun 29 2010

Michal Minich <michal.minich gmail.com> writes:

On Tue, 29 Jun 2010 19:42:45 -0400, bearophile wrote:

 Stewart Gordon:
 http://d.puremagic.com/issues/show_bug.cgi?id=259

 
 I have added my vote there a lot of time ago. I think Andrei says that
 fixing this is unworkable, but I don't know why. If you make this an
 error and at the same time turn array indexes/lengths into signed
 values, you don't have that many unsigned values in normal D programs,
 so you need very few casts and it becomes workable.
 
 Bye,
 bearophile

Why on earth should array indexes and lengths be signed!!! My brain 
just explodes when I think of something like that.

Jun 29 2010

bearophile <bearophileHUGS lycos.com> writes:

Michal Minich:

Why on the earth should array indexes and lengths be signed !!!

I have explained why at length elsewhere. Short answer: signed fixnum integers
are a bad approximation of natural numbers, because they are limited in range,
they don't even tell you when you try to step out of their limits, and their
limits aren't even symmetrical (so you can't perform abs(int.min)). But
unsigned numbers are an even worse approximation: C signed-unsigned conversion
rules turn signed values into unsigned in silly situations, and a lot of
programmers are bad at using them (this means they sometimes write buggy code
when they use unsigned values). Yet the language forces every kind of
programmer to use unsigned integers often, even in normal simple programs,
because indexes and array lengths are everywhere. Unsigned values are unsafe;
they are good if you need an array of bits to implement a bit set, or if you
want to perform bitwise operations, otherwise I think they are often the wrong
choice in D (I don't want to remove them as in Java, because in some situations
they are very useful, especially in a near-system language like D).


 I voted for the bug, but IMO it should be fixed by other means

One other partial solution is to introduce optional runtime integral overflow
checks in D (probably two independent switches are needed, one for signed and
one for unsigned integral overflows).
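
As a library-level illustration of what such checks could look like (this is not the compiler switch being proposed here), the sketch below uses the helper functions in druntime's core.checkedint module, which as far as I know is more recent than this thread. Each helper performs the operation and sets a flag when the result does not fit; the flag is only ever set, never cleared, so it is reset by hand between the two checks.

```d
import core.checkedint : adds, subu;

void main()
{
    bool overflow = false;

    // Unsigned subtraction that would wrap around below zero.
    uint diff = subu(2u, 6u, overflow);
    assert(overflow);

    // Signed addition that would exceed int.max.
    overflow = false;
    int sum = adds(int.max, 1, overflow);
    assert(overflow);

    // An in-range operation leaves the flag untouched.
    overflow = false;
    assert(subu(6u, 2u, overflow) == 4);
    assert(!overflow);
}
```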


 and would probably affect lot of code.

Yes, but often for the better ;-)

Bye,
bearophile

Jun 29 2010

Michal Minich <michal.minich gmail.com> writes:

On Tue, 29 Jun 2010 19:42:45 -0400, bearophile wrote:

 Stewart Gordon:
 http://d.puremagic.com/issues/show_bug.cgi?id=259

 
 I have added my vote there a lot of time ago. I think Andrei says that
 fixing this is unworkable, but I don't know why. If you make this an
 error and at the same time turn array indexes/lengths into signed
 values, you don't have that many unsigned values in normal D programs,
 so you need very few casts and it becomes workable.
 
 Bye,
 bearophile

I voted for the bug, but IMO it should be fixed by means other than making 
array indexes and lengths signed. That makes no sense to me, and would 
probably affect a lot of code.

Jun 29 2010

Stewart Gordon <smjg_1998 yahoo.com> writes:

bearophile wrote:
 Stewart Gordon:
 http://d.puremagic.com/issues/show_bug.cgi?id=259

 
 I have added my vote there a lot of time ago. I think Andrei says 
 that fixing this is unworkable, but I don't know why. If you make 
 this an error and at the same time turn array indexes/lengths into 
 signed values, you don't have that many unsigned values in normal D 
 programs, so you need very few casts and it becomes workable.

That's probably because many people neglect the unsigned types, instead 
using the signed types for array indices and the like.

Array indices are actually of type size_t.  Effectively, what you seem 
to be suggesting is that size_t be the same as ptrdiff_t.

There is, however, another problem: signed types convert implicitly to 
unsigned types, though they do generate a warning if compiled with -w 
(except peculiarly for int/uint).  Removing this implicit conversion 
would break certain existing code that uses signed types where it should 
be using unsigned.  If we also change array indices to be signed, it 
would break that code that sensibly uses unsigned types, which is 
probably worse.

Stewart.

Jun 29 2010

bearophile <bearophileHUGS lycos.com> writes:

Stewart Gordon:
 what you seem to be suggesting is that size_t be the same as ptrdiff_t.

Yes, but an unsigned word type needs to be kept in the language.


 There is, however, another problem: signed types convert implicitly to 
 unsigned types, though they do generate a warning if compiled with -w 
 (except peculiarly for int/uint).  Removing this implicit conversion 
 would break certain existing code that uses signed types where it should 
 be using unsigned.

 If we also change array indices to be signed, it 
 would break that code that sensibly uses unsigned types, which is 
 probably worse.

Yes, of course that code needs to be fixed after the change I have suggested. A
"breaking change" means that some of the old code needs to be fixed.

Bye,
bearophile

Jun 30 2010

Stewart Gordon <smjg_1998 yahoo.com> writes:

bearophile wrote:
 Stewart Gordon:

<snip>
 If we also change array indices to be signed, it would break that
 code that sensibly uses unsigned types, which is probably worse.

 
 Yes, of course that code needs to be fixed after the change I have 
 suggested. A "breaking change" means that some of the old code needs to 
 be fixed.

That code needs to be "fixed"?  My point was that being forced to use 
signed types for values that cannot possibly be negative doesn't to me 
constitute fixing anything.

Stewart.

Jun 30 2010

bearophile <bearophileHUGS lycos.com> writes:

Stewart Gordon:
 That code needs to be "fixed"?  My point was that being forced to use 
 signed types for values that cannot possibly be negative doesn't to me 
 constitute fixing anything.

Yes, in my opinion it needs to be fixed. Using unsigned integers in D is a
hazard, so if you use them where they are not necessary (and representing
positive-only values is often not one of such cases) then you are doing
something wrong, or doing premature optimization. Using an unsigned value to
represent a positive-only value is not going to increase your program's safety
as it does, for example, in Delphi; in D it decreases your program's resilience.

Using size_t and uint in your code where you can use an int is something that
needs to be fixed, in my opinion. Normal D programmers writing very mundane
code must not be forced to face unsigned values every few lines of code.
Unsigned values in D are quite bug-prone, so the language has to avoid putting
them on your plate every time you want to write some code. You need to be free
to use them when you want, but it's better for you to use them only when
necessary.

Unsigned values have some purposes, like representing bit fields, representing
very large integers (over the signed range) when you are optimizing your
code and with your profiler you have found a hot spot and you want to reduce
the space used or increase performance, working with bitwise operators, and a
few more. But letting all programmers, even D newbies, mess
with unsigned values every time they want to use an array length is something
that will cause a very large number of bugs and wasted programming time in
future D programs. You will need hard evidence to convince me this is false.

If you want to make your D code a bit safer you have to write code like:
cast(int)somearray_.length - degree
because if you write more normal expressions like:
somearray_.length - degree
you can quickly put some bugs in your code :-) I have written something like
300_000 lines of D code so far, and I have found a good number of
unsigned-derived bugs in my code. Good luck with your code.
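
A minimal sketch of the difference between those two expressions, using a hypothetical three-element array and a degree of 5 (both values are made up for illustration):

```d
void main()
{
    int[] somearray_ = [1, 2, 3];
    int degree = 5;

    // length is a size_t, so degree is converted to size_t and the
    // subtraction wraps around instead of producing -2.
    auto wrapped = somearray_.length - degree;
    static assert(is(typeof(wrapped) == size_t));
    assert(wrapped == size_t.max - 1);

    // Casting the length first keeps the arithmetic signed.
    auto signedDiff = cast(int) somearray_.length - degree;
    static assert(is(typeof(signedDiff) == int));
    assert(signedDiff == -2);
}
```

The cast is only safe as long as the length fits in an int, which is an assumption of this sketch.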


lengths and indexes are signed. Maybe they know better than Walter and you
about this design detail.

Bye,
bearophile

Jul 01 2010

Stewart Gordon <smjg_1998 yahoo.com> writes:

bearophile wrote:
<snip>
 Yes, in my opinion it needs to be fixed.  Using unsigned integers 
 in D is a hazard, so if you use them where they are not necessary 
 (and representing positive-only values is often not one of such 
 cases) then you are doing something wrong,

If it's logical and the program works, it isn't objectively wrong.  Some 
of us prefer to use unsigned types where the value is semantically 
unsigned, and know what we're doing.  So any measures to stronghold 
programmers against using them are going to be a nuisance.

I can also imagine promoting your mindset leading to edit wars between 
developers declaring an int and then putting
     assert (qwert >= 0);
in the class invariant, and those who see this and think it's brain-damaged.

<snip>
 Using size_t and uint in your code where you can use an int is 
 something that needs to be fixed, in my opinion.  Normal D 
 programmers writing very mundane code must not be forced to face 
 unsigned values every few lines of code.

True, but that doesn't mean that we should force programmers to use 
signed values for nearly everything.

But it is all the more reason to fix unsigned op signed to be signed, if 
it is to be allowed at all.  The way it is at the moment, a single 
unsigned value in a formula can force the whole result to be unsigned, 
thereby leading to unexpected results.

 Unsigned values in D are quite bug-prone, so the language has to
 avoid putting them on your plate every time you want to write some
 code.  You need to be free to use them when you want, but it's better
 for you to use them only when necessary.

You could make a similar argument about integer types 
generally.  People coming from BASIC backgrounds, or new to programming 
generally, are sooner or later going to have some work to do when they 
find that 1/4 != 0.25.  Add to that the surprise that is silent overflow....

 Unsigned values have some purposes, like representing bit fields, 
 representing very large integers (over signed values range) when 
 you are optimizing your code and with your profiler you have found 
 a hot spot and you want to reduce space used or increase 
 performance, to work with bitwise operators, to work with bit 
 fields, and few more.

<snip>

Interfacing file formats.  Simplifying certain conditional expressions. 
  Making code self-documenting.  Maybe others....

Stewart.

Jul 01 2010

Michal Minich <michal.minich gmail.com> writes:

On Wed, 30 Jun 2010 00:30:19 +0100, Stewart Gordon wrote:

 Michal Minich wrote:
 I was surprised by the behavior of division. The resulting type of
 division in example below is uint and the value is incorrect. I would
 expect that when one of operands is signed, then the result is signed
 type.

 
 Going by the spec
 http://www.digitalmars.com/d/1.0/type.html "Usual Arithmetic
 Conversions"
 the compiler is behaving correctly.

Point 4.4 in the docs: "The signed type is converted to the unsigned type." 
This is just not good for the most common binary operators. It might be 
useful for &, | and maybe shifts, but those are much less common....

Jun 29 2010

Michal Minich <michal.minich gmail.com> writes:

There is a very long discussion on the digitalmars.D newsgroup, "Is there ANY 
chance we can fix the bitwise operator precedence rules?", which I should 
probably read first... but was there some conclusion?

Jun 29 2010

bearophile <bearophileHUGS lycos.com> writes:

Stewart Gordon:

Sorry for the late reply, I was quite busy. Thank you for your comments, even
if I don't agree with some of them :-)


If it's logical and the program works, it isn't objectively wrong.

Right. But bug-prone means that often enough people write code that doesn't
work.


Some of us prefer to use unsigned types where the value is semantically
unsigned, and know what we're doing.  So any measures to stronghold programmers
against using them are going to be a nuisance.

I have not asked to remove the unsigned types, so you can relax. And replacing
lengths/indexes with signed values isn't a way to forbid you from using unsigned
values in your programs; it's quite the opposite: it's a way to not force me
(and many other programmers who want to write simple D non-system programs) to
use unsigned values in my code.

D (and all other languages besides ASM) tries to push programmers toward safer
ways to write code. Even types can be seen as restrictions, but a wise
programmer knows they are there to help the creation of less buggy programs,
etc.


 I can also imagine promoting your mindset leading to edit wars between
 developers declaring an int and then putting
      assert (qwert >= 0);
 in the class invariant, and those who see this and think it's brain-damaged.

This is quite interesting. You think that using an unsigned type in D is today
the same thing as using a signed value + an assert of it not being negative?
In the beginning, when I was used to Delphi programming, I did the same,
but I soon found out that it was unsafe. Today D unsigned values don't give
you a nice overflow error (as I have asked Walter for many times) when you try
to assign them a number outside their range; they happily wrap around, and this
causes bugs in programs. So using an unsigned number to denote a value that
can't be negative is dangerous and it can be stupid too. In D you need to take
a signed value from outside and then assign it to an unsigned value only after
you have tested it to be nonnegative.


 True, but that doesn't mean that we should force programmers to use
 signed values for nearly everything.

D wants to be a system language, and I presume system programmers are able to

I presume most usages of D will be of this kind. And in my experience there is
a good number of 'application programmers' who have problems with unsigned
numbers. Lengths and array indexes are not something that is used by system
programmers only (unlike opBinary operator overloading); they are things used
often in any kind of program, even small ones, so making them unsigned will be
a trap for many programmers.

I don't care if you use unsigned values in your programs, and I don't want to
force you to use signed values in your programs, but I want to be able to avoid
unsigned values when I write small non-system D programs, because they
introduce complexities and bugs that I can live without.


 But it is all the more reason to fix unsigned op signed to be signed, if
 it is to be allowed at all.  The way it is at the moment, a single
 unsigned value in a formula can force the whole result to be unsigned,
 thereby leading to unexpected results.

I think Walter will not change this, because then D code that looks like C
code would do something different from C (there are a few exceptions to this D
rule, like fixed-sized arrays being passed by value in D and by pointer in C).

So given that this will not change, other solutions need to be found. I have
suggested two solutions, that can be used at the same time:

two separate switches can be useful, one for signed overflows and one for
unsigned overflows);
- and removing a very common source of unsigned values in simple D programs
(length/indexes).


 You could make a similar argument the same about integer types
 generally.  People coming from BASIC backgrounds, or new to programming
 generally, are sooner or later going to have some work to do when they
 find that 1/4 != 0.25.  

Some languages are indeed able to represent fractions natively, like Scheme. A
"good" high-level language, designed for humans and not for CPUs deserves to
act more correctly.

So I agree that's a possible source of problems for newbies. But having just
one possible source of "problems" is better than having two possible sources of
problems :-)

And in my experience, while somewhat more experienced programmers are quickly
able to cope with the lack of native fractions in a language (and I prefer to
have two operators to perform divisions, like / and div in Delphi and / and //
in Python3, to denote float or integer divisions), they keep having bugs caused
by unsigned values combined with C conversion rules. So I think unsigned values
cause worse troubles.


 Add to that the surprise that is silent overflow....

Adding optional runtime integral overflow checks in D is something that I really
want. My experience with Delphi has shown me many times they are able to catch

developers are right on this.


Interfacing file formats.  Simplifying certain conditional expressions. Making
code self-documenting.  Maybe others....

Simplifying certain conditional expressions with unsigned values is cool, but
you want to do it only in performance-critical spots of your programs, because
they can be tricky and in every other part of your program they are bug-prone
premature optimization :-)

Regarding the self-documenting of unsigned values, I have explained that this
is true in a language that actually enforces their unsigned nature, but in D
they are just traps :-) In a language like Ada you can actually do what you
mean, and denote their non-negative nature, this is an example:

http://ideone.com/ViiOB

with Ada.Integer_Text_Io, Ada.Text_Io;
use Ada.Integer_Text_Io, Ada.Text_Io;
procedure Test is
   subtype Small is Integer range 0..99;
   Input : Small;
begin
 
   loop
      Get(Input);
      if Input = 42 then
         exit;
      else
         Put (Input);  New_Line;
      end if;
   end loop;
 
end;



The Small type is user-defined and it can't be negative (or more than 99, in
Ada ranges are closed on the right), so if you try to assign 100 or a negative
value (as in that example), you receive a run-time error like:

raised CONSTRAINT_ERROR : prog.adb:9 range check failed

This is the right way to enforce a nonnegative number. In D I will try to create
a ranged integer (with run-time overflow errors), and if you don't like using
such ranged values, then it's better to add things like that assert(qwert >=
0); to your class invariant.
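
To make the idea concrete, here is one possible sketch of such a ranged integer in D. The names Ranged, Small and get are hypothetical, and this is not an existing library type; it only checks construction and assignment, not arithmetic, and it throws an ordinary Exception rather than something like Ada's CONSTRAINT_ERROR.

```d
import std.exception : assertThrown, enforce;

// Integer restricted to the closed range lo .. hi, checked at run time.
struct Ranged(int lo, int hi)
{
    private int value = lo;

    this(int v) { opAssign(v); }

    // Every assignment is validated before the value is stored.
    void opAssign(int v)
    {
        enforce(lo <= v && v <= hi, "range check failed");
        value = v;
    }

    int get() const { return value; }
    alias get this;   // lets a Ranged be read wherever an int is expected
}

alias Small = Ranged!(0, 99);

void main()
{
    Small s = 42;
    assert(s == 42);

    s = 7;
    assertThrown(s = 100);   // above the range
    assertThrown(s = -1);    // below the range
    assert(s == 7);          // rejected assignments leave the value unchanged
}
```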

Bye,
bearophile

Jul 04 2010

Stewart Gordon <smjg_1998 yahoo.com> writes:

bearophile wrote:
 Stewart Gordon:

<snip>
 Some of us prefer to use unsigned types where the value is 
 semantically unsigned, and know what we're doing. So any measures 
 to stronghold programmers against using them are going to be a 
 nuisance.

 
 I have not asked to remove the unsigned types, so you can relax. And 
 replacing lengths/indexes with signed values isn't a way to forbid 
 you to use unsigned values in your programs, it's right the opposite: 
 it's a way to not force me (and many other programmers that want to 
 write simple D non-system programs) to use unsigned values in my 
 code.

I didn't think you were asking for unsigned types to be removed.  My 
point was that having language features and APIs relying on signed types 
for semantically unsigned values is not just a way of not forcing you to 
use unsigned types - it's also potentially a way of forcing you not to 
use them, or to pepper your code with casts if you do.  I guess you just 
can't please everybody.

<snip>
 I can also imagine promoting your mindset leading to edit wars 
 between developers declaring an int and then putting
      assert (qwert >= 0);
 in the class invariant, and those who see this and think it's 
 brain-damaged.

 
 This is quite interesting. You think that using an unsigned type in D 
 is today the same thing than using a signed value + an assert of it 
 not being negative?

Not quite - an unsigned type has twice the range.  It's true that this 
extra range isn't always used, but in some apps/APIs there may be bits 
that use the extra range and bits that don't, and it is often simpler to 
use unsigned everywhere it's logical than to expect the user to remember 
which is which.

 In the beginning, when I was used to Delphi programming I have done 
 the same, but I have soon found out that was unsafe. Today D unsigned 
 values don't give you a nice overflow error (as I have asked Walter 
 many times) when you try to assign them a number outside their range, 
 they happily wrap around, this causes bugs in programs.

Trouble is that it would add a lot of code to every integer arithmetic 
operation.  Of course, it could be omitted in release builds, but 
arithmetic is so frequent an activity that the extent it would slow down 
and bloat development builds would be annoying.

 So using an unsigned number to denote a value that can't be negative 
 is dangerous and it can be stupid too. In D you need to take a signed 
 value from outside and then assign it to a unsigned value only after 
 you have tested it to be nonnegative.

Why can't I read a 32-bit unsigned integer from a binary file, or use 
such functions as std.conv.toUint (whereby '-' is just another illegal 
character)?

 But it is all the more reason to fix unsigned op signed to be 
 signed, if it is to be allowed at all.  The way it is at the 
 moment, a single unsigned value in a formula can force the whole 
 result to be unsigned, thereby leading to unexpected results.

 
 I think Walter will not change this, because this way D syntax equal 
 to C syntax does things different from C

But D isn't designed to be fully source-compatible with C, hence the 
suggestion of making it illegal.

 (there are few exceptions to this D rule, like fixed-sized arrays are 
 passed by value in D and by pointer in C).

<snip>

Indeed, the "looks like C, acts like C" principle isn't consistently 
applied.  For instance, in switch, we have:
- a case (no pun intended) of it being applied, even though there's no 
real reason for D to allow the code (fall through)
- a case of it being breached (SwitchDefault error).

Stewart.

Jul 05 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 07/05/2010 07:59 AM, Stewart Gordon wrote:
 bearophile wrote:
 Stewart Gordon:
 I can also imagine promoting your mindset leading to edit wars
 between developers declaring an int and then putting
 assert (qwert >= 0);
 in the class invariant, and those who see this and think it's
 brain-damaged.



As opposed to doing what?

 This is quite interesting. You think that using an unsigned type in D
 is today the same thing than using a signed value + an assert of it
 not being negative?

 Not quite - an unsigned type has twice the range. It's true that this
 extra range isn't always used, but in some apps/APIs there may be bits
 that use the extra range and bits that don't, and it is often simpler to
 use unsigned everywhere it's logical than to expect the user to remember
 which is which.

Another important difference is the point of non-'continuity'

with a signed integer, that point is *.max/min. Assuming typical usage 
of integers centers around zero, this point doesn't get hit frequently.

with an unsigned integer, that point is 0. Assuming the same, this point 
gets hit much more frequently, which has important implications for 
subtraction and comparison.

Jul 05 2010

Stewart Gordon <smjg_1998 yahoo.com> writes:

Ellery Newcomer wrote:
 On 07/05/2010 07:59 AM, Stewart Gordon wrote:
 bearophile wrote:
 Stewart Gordon:
 I can also imagine promoting your mindset leading to edit wars
 between developers declaring an int and then putting
 assert (qwert >= 0);
 in the class invariant, and those who see this and think it's
 brain-damaged.



 
 As opposed to doing what?

Just using uint, of course!

<snip>
 Another important difference is the point of non-'continuity'
 
 with a signed integer, that point is *.max/min. Assuming typical usage 
 of integers centers around zero, this point doesn't get hit frequently.
 
 with an unsigned integer, that point is 0. Assuming the same, this point 
 gets hit much more frequently, which has important implications for 
 subtraction and comparison.

Subtraction - yes, obviously.

Comparison - how do you mean?

Stewart.

Jul 06 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 07/06/2010 07:05 PM, Stewart Gordon wrote:
 Ellery Newcomer wrote:
 On 07/05/2010 07:59 AM, Stewart Gordon wrote:
 bearophile wrote:
 Stewart Gordon:
 I can also imagine promoting your mindset leading to edit wars
 between developers declaring an int and then putting
 assert (qwert >= 0);
 in the class invariant, and those who see this and think it's
 brain-damaged.



 As opposed to doing what?

 Just using uint, of course!

For enforcing a non-negative constraint, that is brain damaged. 
Semantically, the two are very different.

int i;
assert(i >= 0);

says i can cross the 0 boundary, but it's an error if it does, i.e. 
programmer doesn't need to be perfect because it *does get caught* 
(extreme instances notwithstanding).

uint i;

says i cannot cross the 0 boundary, but it isn't an error if it does. 
programmer needs to be perfect and error doesn't get caught (unless what 
you're using it for can do appropriate bounds checking).
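
A small sketch of the two cases, stepping each variable once below zero:

```d
void main()
{
    int  si = 0;
    uint ui = 0;

    --si;
    --ui;

    // The signed value really is negative, so a check such as
    // assert(si >= 0) in an invariant would catch the mistake.
    assert(si == -1);
    assert(!(si >= 0));

    // The unsigned value has silently wrapped to the top of its range,
    // and a non-negativity check on it can never fail.
    assert(ui == uint.max);
    assert(ui >= 0);
}
```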

 Comparison - how do you mean?

 Stewart.

Mmmph. Just signed/unsigned, I guess (I was thinking foggily that 
comparison intrinsically involves subtraction or something like that)

Jul 06 2010

Stewart Gordon <smjg_1998 yahoo.com> writes:

Ellery Newcomer wrote:
 On 07/06/2010 07:05 PM, Stewart Gordon wrote:

<snip>
 Just using uint, of course!

 
 For enforcing a non-negative constraint, that is brain damaged. 
 Semantically, the two are very different.

So effectively, the edit wars would be between people thinking at cross 
purposes.

I guess it would be interesting to see how many libraries are using 
unsigned types wherever the value is semantically unsigned, and how many 
are using signed types for such values (maybe with a few exceptions when 
there's a specific reason).

 int i;
 assert(i >= 0);
 
 says i can cross the 0 boundary, but it's an error if it does, i.e. 
 programmer doesn't need to be perfect because it *does get caught* 
 (extreme instances notwithstanding).

Or equivalently,

uint i;
assert (i <= cast(uint) int.max);

 uint i;
 
 says i cannot cross the 0 boundary, but it isn't an error if it does. 
 programmer needs to be perfect and error doesn't get caught (unless what 
 you're using it for can do appropriate bounds checking).

Or the wrapping round is an intended feature of what you're using it for.

 Comparison - how do you mean?

 Stewart.

 
 Mmmph. Just signed/unsigned, I guess (I was thinking foggily that 
 comparison intrinsically involves subtraction or something like that)

But whether subtraction for comparison works doesn't depend on whether 
the legal ranges of the source values are signed or unsigned, at least 
as long as they're both the same.

What it does depend on is whether the subtraction is performed in more 
bits than the number required to represent the legal range.
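
A small sketch of that point, assuming D's wrapping integer arithmetic. The function names are hypothetical. If the legal range were only 0 .. int.max, the 32-bit version would be safe, because every difference would fit in 32 bits; over the full int range it is not:

```d
// Hypothetical helpers, named for illustration only.

// Comparison by subtraction carried out in 32 bits: the difference of two
// full-range ints can need 33 bits, so it wraps for operands far apart.
int compareNarrow(int a, int b)
{
    return a - b;
}

// The same subtraction carried out in 64 bits: every difference of two
// ints fits, so the sign of the result is always right.
int compareWide(int a, int b)
{
    long d = cast(long) a - cast(long) b;
    return d < 0 ? -1 : (d > 0 ? 1 : 0);
}

void main()
{
    // Small operands: both versions agree.
    assert(compareNarrow(3, 5) < 0);
    assert(compareWide(3, 5) < 0);

    // int.max is greater than -1, but the 32-bit difference wraps negative.
    assert(compareNarrow(int.max, -1) < 0);   // wrong sign
    assert(compareWide(int.max, -1) > 0);     // correct sign
}
```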

Stewart.

Jul 08 2010

"Jérôme M. Berger" <jeberger free.fr> writes:

Stewart Gordon wrote:
 Ellery Newcomer wrote:
 On 07/06/2010 07:05 PM, Stewart Gordon wrote:

 <snip>
 Just using uint, of course!

 For enforcing a non-negative constraint, that is brain damaged.
 Semantically, the two are very different.


 So effectively, the edit wars would be between people thinking at cross
 purposes.

 I guess it would be interesting to see how many libraries are using
 unsigned types wherever the value is semantically unsigned, and how many
 are using signed types for such values (maybe with a few exceptions when
 there's a specific reason).


	I used to use unsigned types wherever the value is semantically
unsigned, but I am in the process of changing to signed everywhere
possible because of the brain dead way mixed operations are handled
(in C, but D would be the same).
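
For illustration, a minimal D sketch of the kind of mixed operation meant here, using division. The function names `mixedDivide` and `signedDivide` are hypothetical:

```d
import std.stdio : writeln;

// Hypothetical helpers, named for illustration only.

// int / uint: the signed operand is converted to uint first,
// so the result type is uint.
uint mixedDivide(int a, uint b)
{
    static assert(is(typeof(a / b) == uint));
    return a / b;
}

// Casting the unsigned operand keeps the whole division signed.
int signedDivide(int a, uint b)
{
    return a / cast(int) b;
}

void main()
{
    // -6 is reinterpreted as 4_294_967_290 before dividing by 2.
    writeln(mixedDivide(-6, 2));    // prints 2147483645
    writeln(signedDivide(-6, 2));   // prints -3
}
```

The cast in `signedDivide` is only safe when the unsigned value is known to fit in an int, which is the sort of extra care the mixed case demands.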

		Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Jul 08 2010

bearophile <bearophileHUGS lycos.com> writes:

Stewart Gordon:

 having language features and APIs relying on signed types
 for semantically unsigned values is not just a way of not forcing you to
 use unsigned types - it's also potentially a way of forcing you not to
 use them, or to pepper your code with casts if you do.  I guess you just
 can't please everybody.

It's better to limit the usage of unsigned values in APIs too. While you can't
change the API of existing C libs (that can use unsigned values too) we can
create an ecosystem of D modules that use unsigned values only when they are
necessary (this means only in uncommon cases). This allows you to use few



 Not quite - an unsigned type has twice the range.  It's true that this
 extra range isn't always used,

This happens and it's one of the use cases for unsigned values, but this
use case requires some conditions:
- The max value of the signed type (for example 127, 32767, 2147483647 or
9223372036854775807) is not big enough. This happens.
- The range of the unsigned type is surely big enough in your code. This
happens, but sometimes what overflows a signed value also overflows the
unsigned one, since it's just one bit more.
- You can't use a bigger type (and cent/ucent are not available yet in D).
This is less common. Some APIs give you a value of a certain size/type and you
can't change it, but in many other situations you can use for example a long
where an int isn't enough. There are other situations where this is bad (for
example you have a large array of such values in a performance-critical spot of
your program, so using long instead of uint doubles the array size and
increases the cache pressure), or you really have a compute-bound spot in your
program, where doing a lot of operations on unsigned values gives you better
performance compared to using longs, but this is not such a common situation.
There are many situations where using a uint instead of a long is premature
optimization :-)


 but in some apps/APIs there may be bits
 that use the extra range and bits that don't, and it is often simpler to
 use unsigned everywhere it's logical than to expect the user to remember
 which is which.

The D language has to give you the tools to use messy APIs too, but it's better
to teach D programmers to create less messy D APIs, allowing usage of only or
mostly signed values, etc. The array lengths and indexes are a good spot to
start improving the APIs of the language.


 Trouble is that it would add a lot of code to every integer arithmetic
 operation.

This is a quantitative discussion, in theory this feature can be implemented
and then we can measure how many bytes are added to the binaries of a certain

shows that for me this cost is tolerable (I have tried it in small and medium
size programs, for years), both in compile time, run time and binary size
(compile time is about the same. Run-time performance is worsened usually less
than the array bound tests done by D!).

I have filed some related bugs for LLVM:
http://llvm.org/bugs/show_bug.cgi?id=4916
http://llvm.org/bugs/show_bug.cgi?id=4917
http://llvm.org/bugs/show_bug.cgi?id=4918
Those are enhancement proposals that ask the LLVM backend to produce optimal
asm when it is fed C code that tests for specific overflows. They show
that such overflow tests can add several instructions for each overflow test if
they are done through normal C/D code.
But that overhead can be reduced to about 3 instructions (one of them is a jump
that usually is not taken, so the code execution goes forward straight, so on
modern CPUs this jump costs very little) if the compiler implements them more
directly, from some little 'templates' written by a human (and in several cases
the compiler can omit such tests, for example for loop variables, simple
operations with enums, etc, reducing both code size and performance loss). To
test overflows of bytes/shorts/ubytes/ushorts a little more code is needed,
because the CPU flags don't help you much with this.
And in my programs operations are often done on floating point values, which
have no overflow tests, so they incur no speed loss or code size increase.
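
To make the "normal C/D code" case concrete, here is a minimal sketch of an overflow test for int addition written in plain D. The name `addOverflows` is hypothetical, and this is not the code from the LLVM reports. It shows why such a test costs several instructions: it needs a branch on the sign of `b`, a subtraction and a comparison before the addition itself, whereas a compiler-generated test could simply branch on the CPU overflow flag after the add.

```d
// Hypothetical helper, named for illustration only.
// Returns true if a + b would not fit in an int. The test is done
// before the addition, without consulting any CPU overflow flag.
bool addOverflows(int a, int b)
{
    if (b > 0)
        return a > int.max - b;   // int.max - b cannot itself overflow
    else
        return a < int.min - b;   // int.min - b cannot itself overflow
}

void main()
{
    assert(!addOverflows(1, 2));
    assert(addOverflows(int.max, 1));
    assert(addOverflows(int.min, -1));
    assert(!addOverflows(int.max, int.min));
}
```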


 Of course, it could be omitted in release builds, but
 arithmetic is so frequent an activity that the extent it would slow down
 and bloat development builds would be annoying.

The overflow tests I am talking about are optional: if you don't want them you
can disable them even in development builds. If you don't want them you don't
have to pay for them and you can ignore them. They don't even slow down
compilation (unless by design they are always switched on during compilation to
watch for overflows among the values known at compile time).


 But D isn't designed to be fully source-compatible with C, hence the
 suggestion of making it illegal.

From what Andrei and Walter have said this will not happen (maybe because the
language and Phobos force you to use too many unsigned values, and we are back to
my original idea), so different solutions are needed.

-----------------

Some numbers, using GCC 4.5 and FreePascal 2.4 (fp), on Windows Vista.

Key:
  C = C code stripped and max optimized.
  fp = FreePascal code max optimized.
  fp+r = FreePascal code max optimized + range tests + overflow tests.

Benchmarks:

nbody:
   binary size, bytes:
      C:     11_776
      fp:   127_336
      fp+r: 127_464
   runtime, N=1_000_000, seconds:
      C:    0.56
      fp:   0.65
      fp+r: 0.66

old fannkuch:
   binary size, bytes:
      C:    12_288
      fp:   66_011
      fp+r: 66_651
   runtime, N=11, seconds:
      C:    4.97
      fp:   5.14
      fp+r: 9.54

old mandelbrot:
   binary size, bytes:
      C:    11_264
      fp:   66_661
      fp+r: 66_723
   runtime, size=3000, seconds:
      C:      1.88
      fp:    10.35
      fp+r:  11.89

fasta:
   binary size, bytes:
      C:    13_312
      fp:   73_748
      fp+r: 74_658
   runtime, N=5_000_000, seconds:
      C:     2.09
      fp:    2.06
      fp+r:  2.06

recursive:
   binary size, bytes:
      C:    18_944
      fp:   72_980
      fp+r: 73_042
   runtime, N=5_000_000, seconds:
      C:     4.04
      fp:   11.88
      fp+r: 11.90



nbody is heavy FP. fannkuch is mostly about small array of integers with some
integer operations. mandelbrot is FP-heavy but contains bit-twiddling too.
fasta contains arrays and integer operations. recursive contains both FP and
integer-based operations.

Those are small programs; for most of them neither the size nor the performance
changes a lot (the fannkuch run time is the exception).
But better benchmarks are needed.

Bye,
bearophile

Jul 05 2010
