
digitalmars.D - Integer overflow and underflow semantics

reply Alex Rønne Petersen <xtzgzorex gmail.com> writes:
Hi,

I don't think the language really makes it clear whether overflows and 
underflows are well-defined. Do we guarantee that for any integral type 
T, T.max + 1 == T.min and T.min - 1 == T.max?

This is relevant in particular for GDC and LDC since they target a lot 
of weird architectures.

-- 
- Alex
May 04 2012
next sibling parent reply "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 5 May 2012 at 04:57:48 UTC, Alex Rønne Petersen 
wrote:
 Hi,

 I don't think the language really makes it clear whether 
 overflows and underflows are well-defined. Do we guarantee that 
 for any integral type T, T.max + 1 == T.min and T.min - 1 == 
 T.max?

 This is relevant in particular for GDC and LDC since they 
 target a lot of weird architectures.
Any system that implements a carry flag likely works exactly like that. 
The carry flag is set or cleared after a math operation, allowing you to 
extend the size of your integer to any level you want, like on, oh, the 
6502? Might look something like this (been a while, but you should get 
the idea):

     clc              ;clear carry
 loop_here:
     mov ax, [bx]
     adc [dx], ax
     inc bx
     inc dx
     dec cx
     cmp cx, 0
     jne loop_here
     ;if carry is set after this point then...
     jc overflow_warning
May 04 2012
next sibling parent reply Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 05-05-2012 07:32, Era Scarecrow wrote:
 On Saturday, 5 May 2012 at 04:57:48 UTC, Alex Rønne Petersen wrote:
 Hi,

 I don't think the language really makes it clear whether overflows and
 underflows are well-defined. Do we guarantee that for any integral
 type T, T.max + 1 == T.min and T.min - 1 == T.max?

 This is relevant in particular for GDC and LDC since they target a lot
 of weird architectures.
 Any system that implements a carry flag likely works exactly like that. 
 The carry flag is set or cleared after a math operation, allowing you 
 to extend the size of your integer to any level you want, like on, oh, 
 the 6502? Might look something like this (been a while, but you should 
 get the idea):

      clc              ;clear carry
  loop_here:
      mov ax, [bx]
      adc [dx], ax
      inc bx
      inc dx
      dec cx
      cmp cx, 0
      jne loop_here
      ;if carry is set after this point then...
      jc overflow_warning
Right, but the question was whether the language guarantees what I 
described. C and C++ don't, and IMO, we should strive to fix that.

-- 
- Alex
May 05 2012
parent reply "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen 
wrote:

 Right, but the question was whether the language guarantees 
 what I described. C and C++ don't, and IMO, we should strive to 
 fix that.
I can't see why it wouldn't, unless the compiler adds in checks and 
changes its behavior, or the assembler does its own quirky magic. The 
bit patterns of how they end up are pretty fixed; it's just how we 
interpret them.

I know before, when I needed to know if it overflowed and simulated the 
carry flag, I ended up using a larger type of int. (This was for a 
custom multiple-precision unsigned int.)

Of course, now that I think about it, if anything ends up using MMX for 
its registers, the rules do change a little. MMX doesn't overflow if I 
remember right; it just caps/saturates at the max/min, since it's 
intended more for multimedia. If the compiler uses those, I can't tell 
you what would happen.
May 05 2012
parent reply Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 05-05-2012 10:23, Era Scarecrow wrote:
 On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote:

 Right, but the question was whether the language guarantees what I
 described. C and C++ don't, and IMO, we should strive to fix that.
 I can't see why it wouldn't, unless the compiler adds in checks and 
 changes its behavior, or the assembler does its own quirky magic. The 
 bit patterns of how they end up are pretty fixed; it's just how we 
 interpret them.
It all depends. GCC (and thus GDC) can target very exotic architectures 
where any assumptions may not, for whatever reason, hold true. This is a 
language design issue more than a "how does architecture X or compiler Y 
work" issue.

An interesting problem with undefined behavior for integer overflow and 
underflow in C/C++ is that optimizers are basically free to do anything 
with regard to them, and often do whatever is most convenient for them.
 I know before when i needed to know if it overflowed and simulated the
 carry flag, i ended up using a larger type of int. (This was for a
 custom multiple precision unsigned ints)



 Of course now that I think about it, if anything ends up using MMX for
 it's registers, the rules do change a little. MMX don't overflow if I
 remember right, they just cap/truncate to the Max/Min. Since it's
 intended more for multimedia... If the compiler uses those, I can't tell
 you what would happen.
Right, but it's not so much about what *would* happen as much as it is 
about what *should* happen. ;)

-- 
- Alex
May 05 2012
parent reply Manu <turkeyman gmail.com> writes:
On 5 May 2012 11:42, Alex Rønne Petersen <xtzgzorex gmail.com> wrote:

 On 05-05-2012 10:23, Era Scarecrow wrote:

 On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote:

  Right, but the question was whether the language guarantees what I
 described. C and C++ don't, and IMO, we should strive to fix that.
 I can't see why it wouldn't, unless the compiler adds in checks and 
 changes its behavior, or the assembler does its own quirky magic. The 
 bit patterns of how they end up are pretty fixed; it's just how we 
 interpret them.
 It all depends. GCC (and thus GDC) can target very exotic architectures
 where any assumptions may not, for whatever reason, hold true. This is a
 language design issue more than a "how does architecture X or compiler
 Y work" issue.

 An interesting problem with undefined behavior for integer overflow and
 underflow in C/C++ is that optimizers are basically free to do anything
 with regards to them, and often do whatever is more convenient for them.
With regard to code-gen on such colourful architectures, would stating a 
defined behaviour for overflow/underflow affect the common case where an 
over/underflow did not occur?

Short of an architecture that causes a hardware exception on 
over/underflow, I suspect that it would interfere with the common case 
(additional code generated around every add/sub/etc. to handle the 
overflow behaviour), and on such an architecture, every single numerical 
integer operation would become inefficient.

I believe this is why C doesn't define the behaviour: because C is still 
effectively a macro language, and shouldn't produce 'unexpected' 
inefficient code ('unexpected' from the perspective of the architecture 
you're targeting).

I would personally rather see it remain undefined like C, but with a 
convention approved by common/conventional architectures where 
cross-platform porting is likely to occur.
May 05 2012
parent Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 05-05-2012 12:25, Manu wrote:
 On 5 May 2012 11:42, Alex Rønne Petersen <xtzgzorex gmail.com> wrote:

     On 05-05-2012 10:23, Era Scarecrow wrote:

         On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote:

             Right, but the question was whether the language guarantees
             what I
             described. C and C++ don't, and IMO, we should strive to fix
             that.


         I can't see why it wouldn't, unless the compiler adds in checks and
         changes it's behavior or the assembler does it's own quirky
         magic. The
         bit patterns of how they end up are pretty fixed, it's just how we
         interpret them.


     It all depends. GCC (and thus GDC) can target very exotic
     architectures where any assumptions may not, for whatever reason,
     hold true. This is a language design issue more than it's a "how
     does architecture X or compiler Y work" issue.

     An interesting problem with undefined behavior for integer overflow
     and underflow in C/C++ is that optimizers are basically free to do
     anything with regards to them, and often do whatever is more
     convenient for them.


 With regard to code-gen on such colourful architectures, would stating a
 defined behaviour for overflow/underflow affect the common case where an
 over/underflow did not occur?
 Short of an architecture that causes hardware exception on
 over/underflow, I suspect that it would interfere with the common case
 (additional code generated around every add/sub/etc to handle the
 overflow behaviour), and on such an architecture, every single numerical
 integer operation would become inefficient.
I don't know of a single architecture (out of the ones on dlang.org/version.html and many others) that doesn't signal overflow/underflow somehow (or obey the rules I described in the OP).
 I believe this is why C doesn't define the behaviour, because C is still
 effectively a macro language, and shouldn't produce 'unexpected'
 inefficient code. ('unexpected' from the perspective of the architecture
 you're targeting)
Right, but C was designed many, many years ago. :)
 I would personally rather see it remain undefined like C, but with
 convention approved by common/conventional architectures where cross
 platform porting is likely to occur.
Do a grep for "\.min" and "\.max" in Phobos and you'll notice a lot of 
places actually depend on the current x86 behavior (wrapping on overflow 
and underflow). I don't think we can afford to make the same mistake C 
and C++ did.

While I did say that this is a language design issue, it's also a matter 
of pragmatism: does any real-world architecture that D could possibly 
target actually *not* obey the aforementioned rules? I don't know of any 
that doesn't.

-- 
- Alex
May 05 2012
prev sibling parent reply "Mehrdad" <wfunction hotmail.com> writes:
On Saturday, 5 May 2012 at 05:32:14 UTC, Era Scarecrow wrote:

  Any systems that implement a carry flag likely works exactly 
 like that.
Yes, but that misses the actual problem!

The problem isn't the _system_, it's the _compiler_.

If the language says overflow or underflow is undefined behavior, then 
what the system does _doesn't matter one bit_, because the compiler can 
optimize away the code entirely:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
May 06 2012
parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Sunday, 6 May 2012 at 16:39:02 UTC, Mehrdad wrote:
 Yes, but that misses the actual problem!

 The problem isn't the _system_, it's the _compiler_.

 If the language says overflow or underflow are undefined 
 behavior, then what the system does _doesn't matter one bit_, 
 because the compiler can optimize away the code entirely:
I know; as long as the number is within bounds it won't have any issues.

Hopefully a simple suggestion would be to add a compiler flag to check 
all results of add/subtract/multiply against overflow/underflow; should 
it happen, an exception is thrown. But asserts can likely do the same 
job if you have a critical section you need checked. In that case, if 
you needed specific behavior you could use asm{} blocks, but as a whole 
I don't know. :(

I know I brought something similar to this before Walter regarding the 
carry bit and overflow and whether you could have finer control over it; 
but that ended up being rejected under the same idea that if you really 
need that control you can use an asm{} block (I think; it's been a year 
or two).

It would be tedious to add checks after every instruction, so the 
compiler would need to do it if you wanted those checks. Either a flag 
would need to be set after a certain point in the module that says 'act 
this way with math operations involving over/underflow', or it is forced 
inside certain blocks as a new try/catch, perhaps:

  trysys {
    a = b + c;
  } catch (Carry c) {
    //fix or whatever
    //since we know it's a carry flag we don't need c,
    //so maybe catchsys(carry) instead??
  }

Which might take an exception-like approach. But that seems doubtful, 
since the cases where it's needed are very, very rare, and it refers 
back to asm or manual checks:

  a = b + c;
  if (a < b) { //overflow?
    //fails if MMX
  }

or:

  if (a - b != c) { //overflow?
  }

Mmmm... Unfortunately I only have a few ideas and nothing concrete.
May 06 2012
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Alex Rønne Petersen:

 I don't think the language really makes it clear whether 
 overflows and underflows are well-defined. Do we guarantee that 
 for any integral type T, T.max + 1 == T.min and T.min - 1 == 
 T.max?

 This is relevant in particular for GDC and LDC since they 
 target a lot of weird architectures.
In a good system language I'd like to see something better: give the 
programmer the choice of 3 or 4 different semantics for integral 
operations:

1) A shared standard semantics that overflows, as in Java;
2) A semantics that overflows, that adapts to the fastest available on 
the CPU, as in C;
3) Shared standard overflows with unsigned values, and run-time errors 
when a signed value overflows (or goes out of its range);
4) Run-time errors when any signed or unsigned value overflows (or goes 
out of its range), as in Ada.

Bye,
bearophile
May 05 2012
parent Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 05-05-2012 13:06, bearophile wrote:
 Alex Rønne Petersen:

 I don't think the language really makes it clear whether overflows and
 underflows are well-defined. Do we guarantee that for any integral
 type T, T.max + 1 == T.min and T.min - 1 == T.max?

 This is relevant in particular for GDC and LDC since they target a lot
 of weird architectures.
 In a good system language I'd like to see something better: give the 
 programmer the choice of 3 or 4 different semantics for integral 
 operations:

 1) A shared standard semantics that overflows, as in Java;
 2) A semantics that overflows, that adapts to the fastest available on 
 the CPU, as in C;
 3) Shared standard overflows with unsigned values, and run-time errors 
 when a signed value overflows (or goes out of its range);
 4) Run-time errors when any signed or unsigned value overflows (or goes 
 out of its range), as in Ada.

 Bye,
 bearophile
Good point. The wrapping rule described in my OP should be the default 
IMO. Then perhaps the other modes could be activated with an attribute 
of sorts?

-- 
- Alex
May 05 2012
prev sibling next sibling parent reply Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 05-05-2012 06:57, Alex Rønne Petersen wrote:
 Hi,

 I don't think the language really makes it clear whether overflows and
 underflows are well-defined. Do we guarantee that for any integral type
 T, T.max + 1 == T.min and T.min - 1 == T.max?

 This is relevant in particular for GDC and LDC since they target a lot
 of weird architectures.
Can anyone give a definitive answer to this, or at least confirm that it 
is an open issue?

-- 
- Alex
May 06 2012
parent reply "Paul D. Anderson" <paul.d.removethis.anderson comcast.andthis.net> writes:
On Sunday, 6 May 2012 at 16:27:34 UTC, Alex Rønne Petersen wrote:
 On 05-05-2012 06:57, Alex Rønne Petersen wrote:
 Hi,

 I don't think the language really makes it clear whether 
 overflows and
 underflows are well-defined. Do we guarantee that for any 
 integral type
 T, T.max + 1 == T.min and T.min - 1 == T.max?

 This is relevant in particular for GDC and LDC since they 
 target a lot
 of weird architectures.
Can anyone give a definitive answer to this or at least confirm that it is an open issue?
I don't have the reference at the moment, but the C99(?) standard 
requires wraparound behavior for UNSIGNED integer values. I don't know 
if there is an equivalent requirement for signed values.
May 18 2012
parent reply "Paul D. Anderson" <paul.d.removethis.anderson comcast.andthis.net> writes:
On Friday, 18 May 2012 at 19:59:01 UTC, Paul D. Anderson wrote:
 On Sunday, 6 May 2012 at 16:27:34 UTC, Alex Rønne Petersen 
 wrote:
 On 05-05-2012 06:57, Alex Rønne Petersen wrote:
 Hi,

 I don't think the language really makes it clear whether 
 overflows and
 underflows are well-defined. Do we guarantee that for any 
 integral type
 T, T.max + 1 == T.min and T.min - 1 == T.max?

 This is relevant in particular for GDC and LDC since they 
 target a lot
 of weird architectures.
Can anyone give a definitive answer to this or at least confirm that it is an open issue?
I don't have the reference at the moment but the C99(?) standard requires wraparound behavior by UNSIGNED integer values. I don't know if there is an equivalent requirement for signed values.
Sorry, it's C++ (3.9.1/4):

"Unsigned integers, declared unsigned, shall obey the laws of arithmetic 
modulo 2^n where n is the number of bits in the value representation of 
that particular size of integer."

Paul
May 18 2012
parent "renoX" <renozyx gmail.com> writes:
On Friday, 18 May 2012 at 20:01:15 UTC, Paul D. Anderson wrote:
 On Friday, 18 May 2012 at 19:59:01 UTC, Paul D. Anderson wrote:
 On Sunday, 6 May 2012 at 16:27:34 UTC, Alex Rønne Petersen 
 wrote:
 On 05-05-2012 06:57, Alex Rønne Petersen wrote:
 Hi,

 I don't think the language really makes it clear whether 
 overflows and
 underflows are well-defined. Do we guarantee that for any 
 integral type
 T, T.max + 1 == T.min and T.min - 1 == T.max?

 This is relevant in particular for GDC and LDC since they 
 target a lot
 of weird architectures.
Can anyone give a definitive answer to this or at least confirm that it is an open issue?
I don't have the reference at the moment but the C99(?) standard requires wraparound behavior by UNSIGNED integer values. I don't know if there is an equivalent requirement for signed values.
Sorry, it's C++:
It's both C and C++ for unsigned integers: signed overflow is undefined, 
unsigned is 'wraparound'. Both suck as the default behaviour IMHO: 
premature optimisation.

renoX
May 21 2012
prev sibling next sibling parent reply Don Clugston <dac nospam.com> writes:
On 05/05/12 06:57, Alex Rønne Petersen wrote:
 Hi,

 I don't think the language really makes it clear whether overflows and
 underflows are well-defined. Do we guarantee that for any integral type
 T, T.max + 1 == T.min and T.min - 1 == T.max?

 This is relevant in particular for GDC and LDC since they target a lot
 of weird architectures.
I think the reason that C makes no guarantees was because of 
ones'-complement machines (which had become very rare even by 1970). 
Surely we can assume two's-complement behaviour?
May 07 2012
next sibling parent Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 07-05-2012 10:05, Don Clugston wrote:
 On 05/05/12 06:57, Alex Rnne Petersen wrote:
 Hi,

 I don't think the language really makes it clear whether overflows and
 underflows are well-defined. Do we guarantee that for any integral type
 T, T.max + 1 == T.min and T.min - 1 == T.max?

 This is relevant in particular for GDC and LDC since they target a lot
 of weird architectures.
 I think the reason that C makes no guarantees was because of 
 ones'-complement machines (which had become very rare even by 1970). 
 Surely we can assume two's-complement behaviour?
That's what I'm thinking. I can't even name a ones'-complement machine 
still in use today.

-- 
- Alex
May 07 2012
prev sibling parent "renoX" <renozyx gmail.com> writes:
On Monday, 7 May 2012 at 08:06:01 UTC, Don Clugston wrote:
 I think the reason that C makes no guarantees, was because of 
 ones-complement machines
Maybe, but that's not the full reason; as Mehrdad says, the compiler can 
use the fact that overflows are undefined to optimise the generated 
code.

In my opinion, there are only two valid options for integer overflow by 
default: either the compiler should make sure that the overflow is 
noticed (Ada: exception; adding 'Not-a-Number' values would work too), 
or C's undefined behaviour for signed overflow. Modulo operations 
(except where really useful) are bad: slower than the 'undefined' 
semantics, yet they still produce obviously incorrect behaviour, such as 
a negative absolute value.
May 09 2012
prev sibling parent reply Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 05-05-2012 06:57, Alex Rønne Petersen wrote:
 Hi,

 I don't think the language really makes it clear whether overflows and
 underflows are well-defined. Do we guarantee that for any integral type
 T, T.max + 1 == T.min and T.min - 1 == T.max?

 This is relevant in particular for GDC and LDC since they target a lot
 of weird architectures.
Bumping this as we still need to make a decision about this. As recently 
as yesterday, someone on the GCC mailing list posted a complaint about 
an optimization pass that assumed undefined semantics for overflow. We 
need to have a stance about this, since GDC is going into mainline GCC 
soon.

-- 
- Alex
May 14 2012
parent reply "akaz" <nemo utopia.com> writes:
 Bumping this as we still need to make a decision about this. As 
 recently as yesterday, someone on the GCC mailing list posted a 
 complaint about an optimization pass that assumed undefined 
 semantics for overflow. We need to have a stance about this, 
 since GDC is going into mainline GCC soon.
Just jumping on the bandwagon with some info:

http://en.wikipedia.org/wiki/Therac

Therac-25 was a medical machine that injured several people because:

"When input parameters are unverified or inconsistent, the treatment 
monitor task periodically runs a procedure that increments a counter. 
This counter is used as a flag by the housekeeping task, indicating 
whether gun firing should be enabled or not. However, as the counter is 
only 8 bits, it will overflow every 256 ticks, and the 'flag' will 
temporarily indicate a zero condition! If the 'set' command is given at 
that instant, inconsistencies are not checked, and unshielded 
high-energy radiation may result."

The case is well known in real-time operating systems programming.

Does D throw an exception when an integral type (signed or unsigned) 
underflows or overflows? I am for defining this as the implicit 
behavior. Using a counter in the cyclical mode should rather be 
explicitly invoked.
May 18 2012
next sibling parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 18-05-2012 15:26, akaz wrote:
 Bumping this as we still need to make a decision about this. As
 recently as yesterday, someone on the GCC mailing list posted a
 complaint about an optimization pass that assumed undefined semantics
 for overflow. We need to have a stance about this, since GDC is going
 into mainline GCC soon.
 Just jumping on the bandwagon with some info:

 http://en.wikipedia.org/wiki/Therac

 Therac-25 was a medical machine that injured several people because:

 "When input parameters are unverified or inconsistent, the treatment 
 monitor task periodically runs a procedure that increments a counter. 
 This counter is used as a flag by the housekeeping task, indicating 
 whether gun firing should be enabled or not. However, as the counter is 
 only 8 bits, it will overflow every 256 ticks, and the 'flag' will 
 temporarily indicate a zero condition! If the 'set' command is given at 
 that instant, inconsistencies are not checked, and unshielded 
 high-energy radiation may result."

 The case is well known in real-time operating systems programming.

 Does D throw an exception when an integral type (signed or unsigned) 
 underflows or overflows? I am for defining this as the implicit 
 behavior. Using a counter in the cyclical mode should rather be 
 explicitly invoked.
I think this counts as sloppy programming if anything.

I agree that throwing an exception could be a good feature to have, but 
it should *not* be the default. I want my systems code to run at full 
speed when I know what I'm doing.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org
May 18 2012
parent reply "akaz" <nemo utopia.com> writes:
 I agree that throwing an exception could be a good feature to 
 have, but it should *not* be the default. I want my systems 
 code to run at full speed when I know what I'm doing.
In the release version, yes. In the release version, array bound checking is disabled, too. But in the debug version?
May 18 2012
parent Alex Rønne Petersen <alex lycus.org> writes:
On 18-05-2012 19:28, akaz wrote:
 I agree that throwing an exception could be a good feature to have,
 but it should *not* be the default. I want my systems code to run at
 full speed when I know what I'm doing.
In the release version, yes. In the release version, array bound checking is disabled, too. But in the debug version?
It should be controllable with a command line option, just like 
-noboundscheck. I'll revise what I said: it can be on in debug mode for 
all I care, as long as I can disable it independently of what mode I'm 
compiling in.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org
May 18 2012
prev sibling parent Andrew Wiley <wiley.andrew.j gmail.com> writes:
On Fri, May 18, 2012 at 6:26 AM, akaz <nemo utopia.com> wrote:

 Bumping this as we still need to make a decision about this. As recently
 as yesterday, someone on the GCC mailing list posted a complaint about an
 optimization pass that assumed undefined semantics for overflow. We need
 to have a stance about this, since GDC is going into mainline GCC soon.

 Just jumping on the bandwagon with some info:

 http://en.wikipedia.org/wiki/Therac

 Therac-25 was a medical machine that injured several people because:

 "When input parameters are unverified or inconsistent, the treatment
 monitor task periodically runs a procedure that increments a counter.
 This counter is used as a flag by the housekeeping task, indicating
 whether gun firing should be enabled or not. However, as the counter is
 only 8 bits, it will overflow every 256 ticks, and the 'flag' will
 temporarily indicate a zero condition! If the 'set' command is given at
 that instant, inconsistencies are not checked, and unshielded
 high-energy radiation may result."

 The case is well known in real-time operating systems programming.

 Does D throw an exception when an integral type (signed or unsigned)
 underflows or overflows? I am for defining this as the implicit
 behavior. Using a counter in the cyclical mode should rather be
 explicitly invoked.

Massive industrial systems run on code written in systems languages that
dismissed this behavior as unacceptably slow years ago. That one programmer
was incrementing a counter when he should have been storing a nonzero value
instead isn't really relevant to this discussion.
May 18 2012