digitalmars.D - Removing undefined behavior of bitshifts

Timon Gehr (14/14) Jun 06 2011 Currently, the behavior of a shift by more than the size in bytes of the

Vladimir Panteleev (7/10) Jun 06 2011 Can you think of any cases where this overflow behavior would be expecte...

Timon Gehr (23/33) Jun 07 2011 My point is not that this behavior is expected and useful. It is what ef...

Vladimir Panteleev (5/7) Jun 06 2011 Also see http://d.puremagic.com/issues/show_bug.cgi?id=4887
s_lange (24/27) Jun 07 2011 Well, you probably mean what x86 processors do.

Don (3/36) Jun 09 2011 Really? Itanium, PowerPC, ARM, 6502, Z80, PIC all have rotate

s_lange (5/12) Jun 11 2011 Sorry, my bad, I should have said that there are only a few RISC-like

Stewart Gordon (12/18) Jun 11 2011 Defining the behaviour to match that of one brand of processor would be ...

Timon Gehr (24/43) Jun 11 2011 Well, not too much. It is the easiest behavior to implement in hardware.

Paul D. Anderson (6/65) Jun 11 2011 FWIW, the General Decimal Arithmetic specification (which I'm getting cl...

Timon Gehr <timon.gehr gmx.ch> writes:

Currently, the behavior of a shift by more than the size in bytes of the
operand is undefined. (Well,  it's an 'error', but unchecked.)

int x=32;
x=1<<x;
int y=1<<32;
assert(x!=y); // yes, this actually passes! (DMD 2.053)

The result is different depending on whether or not it was computed during
CTFE/Constant folding.

I'd much prefer the behavior to be defined as 1<<x; being equivalent to
1<<(0x1f&x); (That's what D effectively does during runtime. It is also what
the machine code supports, at least in x87).

Are there any practical downsides to making the behavior defined? (Except that
the CTFE Code would have to be fixed). I think Java does it too.


Timon

Jun 06 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tue, 07 Jun 2011 02:20:17 +0300, Timon Gehr <timon.gehr gmx.ch> wrote:

 I'd much prefer the behavior to be defined as 1<<x; being equivalent to
 1<<(0x1f&x); (That's what D effectively does during runtime. It is also  
 what the machine code supports, at least in x87).

Can you think of any cases where this overflow behavior would be expected  
and useful? D can't (cheaply) catch runtime instance of this, but at  
compile-time it should definitely be an error.

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Jun 06 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Vladimir Panteleev wrote:

 On Tue, 07 Jun 2011 02:20:17 +0300, Timon Gehr <timon.gehr gmx.ch> wrote:

 I'd much prefer the behavior to be defined as 1<<x; being equivalent to
 1<<(0x1f&x); (That's what D effectively does during runtime. It is also
 what the machine code supports, at least in x87).

 Can you think of any cases where this overflow behavior would be expected
 and useful? D can't (cheaply) catch runtime instance of this, but at
 compile-time it should definitely be an error.

 --
 Best regards,
   Vladimir

My point is not that this behavior is expected and useful. It is what
effectively
happens. And I agree that it might be a good idea to disallow it during compile
time. But the behavior during runtime should be _defined_! Saying "it's an
error,
but it is not catched" is saying "the behavior is undefined." An optimizing
compiler could use this to ruin the memory safety of SafeD:

int[32] x;

 safe int foo(int shamt){
    int a=1<<shamt;
    x[shamt]++;
    return a;
}

What's the problem? Since the first line of foo is in error if shamt>31 or
shamt<0, the compiler can remove the bounds check on the array access. It cannot
screw anything up by doing that. If shamt is not in range, the whole program is
meaningless anyways. The compiler could as well make the program format your
hard
drive whenever an illegal shift amount occurs. ;)

This is a hole in the language specification and needs to be fixed. It is not
about adding useful features that people want to use. It is about removing
undefined behavior while keeping the full performance. (of the shift operation)

BTW: Java handles it the same:



Timon

Jun 07 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tue, 07 Jun 2011 02:20:17 +0300, Timon Gehr <timon.gehr gmx.ch> wrote:

 Currently, the behavior of a shift by more than the size in bytes of the
 operand is undefined. (Well,  it's an 'error', but unchecked.)

Also see http://d.puremagic.com/issues/show_bug.cgi?id=4887

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Jun 06 2011

s_lange <s_lange ira.uka.de> writes:

Am 07.06.2011 01:20, schrieb Timon Gehr:
 I'd much prefer the behavior to be defined as 1<<x; being equivalent to
 1<<(0x1f&x); (That's what D effectively does during runtime. It is also what
 the machine code supports, at least in x87).

Well, you probably mean what x86 processors do.

The behaviour of x86 processors in this regard is (at leas) well 
defined, but not uniform. It all depends whether you use general-purpose 
integer instructions or some 64, 128, or 256-bit SIMD instructions (MMX, 
SSE, AVX).

Int denotes 32-bit integers, operations on differently sized integer are 
analoguous.

For general-purpose integer instructions, the following identities hold 
true:
(l<<x == l<<(x & 0x1F)) and (l>>x == l>>(x & 0x1F)) for any (signed | 
unsigned) int l and x , regardless whether arithmetical or logical right 
shifts are used.

However, for SIMD integer instructions, things are a little different 
and the following identities hold true:
(l<<x == (unsigned int)x >= 0x20 ? 0 : l<<x) for any (signed | unsigned) 
int l and x ,
(l>>x == (unsigned int)x >= 0x20 ? 0 : l>>x) for any (signed | unsigned) 
int x and any unsigned int l (using logical right shifts)
(l>>x == (unsigned int)x >= 0x20 ? -1 : l>>x) for any (signed | 
unsigned) int x and any signed int l (using arithmetic right shifts)

As of yet, there are only general-pupose integer rotate instructions on 
x86 processors, and very few other CPUs and µCs actually implement 
rotate instructions.

Jun 07 2011

Don <nospam nospam.com> writes:

s_lange wrote:
 Am 07.06.2011 01:20, schrieb Timon Gehr:
 I'd much prefer the behavior to be defined as 1<<x; being equivalent to
 1<<(0x1f&x); (That's what D effectively does during runtime. It is 
 also what
 the machine code supports, at least in x87).

 Well, you probably mean what x86 processors do.
 
 The behaviour of x86 processors in this regard is (at leas) well 
 defined, but not uniform. It all depends whether you use general-purpose 
 integer instructions or some 64, 128, or 256-bit SIMD instructions (MMX, 
 SSE, AVX).
 
 Int denotes 32-bit integers, operations on differently sized integer are 
 analoguous.
 
 For general-purpose integer instructions, the following identities hold 
 true:
 (l<<x == l<<(x & 0x1F)) and (l>>x == l>>(x & 0x1F)) for any (signed | 
 unsigned) int l and x , regardless whether arithmetical or logical right 
 shifts are used.
 
 However, for SIMD integer instructions, things are a little different 
 and the following identities hold true:
 (l<<x == (unsigned int)x >= 0x20 ? 0 : l<<x) for any (signed | unsigned) 
 int l and x ,
 (l>>x == (unsigned int)x >= 0x20 ? 0 : l>>x) for any (signed | unsigned) 
 int x and any unsigned int l (using logical right shifts)
 (l>>x == (unsigned int)x >= 0x20 ? -1 : l>>x) for any (signed | 
 unsigned) int x and any signed int l (using arithmetic right shifts)
 
 As of yet, there are only general-pupose integer rotate instructions on 
 x86 processors, and very few other CPUs and µCs actually implement 
 rotate instructions.

Really? Itanium, PowerPC, ARM, 6502, Z80, PIC all have rotate 
instructions. I've never used a processor that didn't.

Jun 09 2011

s_lange <s_lange ira.uka.de> writes:

Am 09.06.2011 09:08, schrieb Don:
 s_lange wrote:
 As of yet, there are only general-pupose integer rotate instructions
 on x86 processors, and very few other CPUs and µCs actually implement
 rotate instructions.

 Really? Itanium, PowerPC, ARM, 6502, Z80, PIC all have rotate
 instructions. I've never used a processor that didn't.

Sorry, my bad, I should have said that there are only a few RISC-like 
processors that don't have rotate instructions: MIPS, SPARC...
Although you are right that many accumulator CPUs and µC do have rotate 
instructions: 8051, 6502, Z80 and PIC

Jun 11 2011

Stewart Gordon <smjg_1998 yahoo.com> writes:

On 07/06/2011 00:20, Timon Gehr wrote:
<snip>
 I'd much prefer the behavior to be defined as 1<<x; being equivalent to
 1<<(0x1f&x); (That's what D effectively does during runtime. It is also what
 the machine code supports, at least in x87).

Defining the behaviour to match that of one brand of processor would be
arbitrary and 
confusing.  Why not define it just to shift by the requested number of bits?

Any extra processor instructions to make it behave correctly for cases where
this number 
= 32 would be the part of the backend code generation.  And if the right
operand is a 

compile-time constant (as it probably is usually), these extra instructions can
be 
eliminated or at least optimised to the particular value.

 Are there any practical downsides to making the behavior defined? (Except that
 the CTFE Code would have to be fixed). I think Java does it too.

Apparently Java shifts are modulo the number of bits in the type of the left
operand.  Or 
something like that.  You'd think it was an oversight in the original
implementation that 
was kept for bug compatibility, but you could well ask how they dealt with
finding the 
behaviour to be machine dependent (contrary to the whole philosophy of Java).

Stewart.

Jun 11 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Timon Gehr wrote:
 On 07/06/2011 00:20, Timon Gehr wrote:
 <snip>
 I'd much prefer the behavior to be defined as 1<<x; being equivalent to
 1<<(0x1f&x); (That's what D effectively does during runtime. It is also what
 the machine code supports, at least in x87).

 Defining the behaviour to match that of one brand of processor would be

arbitrary and
 confusing.

Well, not too much. It is the easiest behavior to implement in hardware.
I thought that it is the most common behavior on different brands of processors,
but I might be mistaken.
Is there any processor that badly needs D support that handles it differently?

 Why not define it just to shift by the requested number of bits?

Because it would turn code that is currently

z = x << y;

to

z = y < (8*sizeof(x)) ? x << y : 0;

On many platforms. Given that you seldom want to shift by a custom amount, and
that it could eliminate some bugs, it might be a reasonable trade-off.


 Any extra processor instructions to make it behave correctly for cases where

this number
  >= 32 would be the part of the backend code generation.  And if the right

operand is a
 compile-time constant (as it probably is usually), these extra instructions
can be
 eliminated or at least optimised to the particular value.

 Are there any practical downsides to making the behavior defined? (Except that
 the CTFE Code would have to be fixed). I think Java does it too.

 Apparently Java shifts are modulo the number of bits in the type of the left

operand.  Or
 something like that.  You'd think it was an oversight in the original

implementation that
 was kept for bug compatibility, but you could well ask how they dealt with

finding the
 behaviour to be machine dependent (contrary to the whole philosophy of Java).

 Stewart.

I don't even care so much about what the result is, but I feel that saying "the
program is in error"/"the behavior is undefined", when actually you'd just get
back some number is not optimal. (it allows the compiler to do anything if that
case occurs)
I would prefer to make the behavior at least implementation-defined (just a
formal
change on the D website) or even defined with some runtime-overhead.


Timon

Jun 11 2011

Paul D. Anderson <paul.d.removethis.anderson comcast.andthis.net> writes:

Timon Gehr Wrote:

 Timon Gehr wrote:
 On 07/06/2011 00:20, Timon Gehr wrote:
 <snip>
 I'd much prefer the behavior to be defined as 1<<x; being equivalent to
 1<<(0x1f&x); (That's what D effectively does during runtime. It is also what
 the machine code supports, at least in x87).

 Defining the behaviour to match that of one brand of processor would be

 arbitrary and
 confusing.

 
 Well, not too much. It is the easiest behavior to implement in hardware.
 I thought that it is the most common behavior on different brands of
processors,
 but I might be mistaken.
 Is there any processor that badly needs D support that handles it differently?
 
 Why not define it just to shift by the requested number of bits?

 Because it would turn code that is currently
 
 z = x << y;
 
 to
 
 z = y < (8*sizeof(x)) ? x << y : 0;
 
 On many platforms. Given that you seldom want to shift by a custom amount, and
 that it could eliminate some bugs, it might be a reasonable trade-off.
 
 
 Any extra processor instructions to make it behave correctly for cases where

 this number
  >= 32 would be the part of the backend code generation.  And if the right

 operand is a
 compile-time constant (as it probably is usually), these extra instructions
can be
 eliminated or at least optimised to the particular value.

 Are there any practical downsides to making the behavior defined? (Except that
 the CTFE Code would have to be fixed). I think Java does it too.

 Apparently Java shifts are modulo the number of bits in the type of the left

 operand.  Or
 something like that.  You'd think it was an oversight in the original

 implementation that
 was kept for bug compatibility, but you could well ask how they dealt with

 finding the
 behaviour to be machine dependent (contrary to the whole philosophy of Java).

 Stewart.

 
 I don't even care so much about what the result is, but I feel that saying "the
 program is in error"/"the behavior is undefined", when actually you'd just get
 back some number is not optimal. (it allows the compiler to do anything if that
 case occurs)
 I would prefer to make the behavior at least implementation-defined (just a
formal
 change on the D website) or even defined with some runtime-overhead.
 
 
 Timon

FWIW, the General Decimal Arithmetic specification (which I'm getting close to
implementing in D: it's always just one week away!) specifies that a shift or
rotate by more than the current precision is disallowed (returns a NaN).

I suppose the clarity on this point is possible because the context, including
the precision, is part of the specification, and the means to indicate failure
is also defined.

Unfortunately no such specification exists for integers (at least none that
doesn't start religious wars).

The clarity is offset, however, by  the requirement to shift by decimal digits,
not bits. Just a heads up that this won't be in the first release.

Paul

Jun 11 2011

D Programming

C/C++ Programming

Other

digitalmars.D - Removing undefined behavior of bitshifts