
digitalmars.D - Portability bug in integral conversion

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
We've spent a lot of time trying to improve the behavior of integral 
types in D. For the most part, we succeeded, but the success was 
partial. There was some hope with the polysemy notion, but it ultimately 
was abandoned because it was deemed too difficult to implement for its 
benefits, which were considered solving a minor annoyance. I was sorry 
to see it go, and I'm glad that now its day of reckoning has come.

Some of the 32-64 portability bugs have come in the following form:

char * p;
uint a, b;
...
p += a - b;

On 32 bits, the code works even if a < b: the difference will become a 
large unsigned number, which is then converted to a size_t (which is a 
no-op since size_t is uint) and added to p. The pointer itself is a 
32-bit quantity. Due to two's complement properties, the addition has 
the same result regardless of the signedness of its operands.

On 64-bits, the same code has different behavior. The difference a - b 
becomes a large unsigned number (say e.g. 4 billion), which is then 
converted to a 64-bit size_t. After conversion the sign is not extended 
- so we end up with the number 4 billion on 64-bit. That is added to a 
64-bit pointer yielding an incorrect value. For the wraparound to work, 
the 32-bit uint should have been sign-extended to 64 bit.

To fix this problem, one possibility is to mark statically every result 
of one of uint-uint, uint+int, uint-int as "non-extensible", i.e. as 
impossible to implicitly extend to a 64-bit value. That would force the 
user to insert a cast appropriately.

Thoughts? Ideas?


Andrei
Jan 15 2011
next sibling parent reply Graham St Jack <Graham.StJack internode.on.net> writes:
On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0? -- Graham St Jack
Jan 16 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Graham St Jack:

 It seems to me that the real problem here is that it isn't meaningful to 
 perform (a-b) on unsigned integers when (a<b). Attempting to clean up 
 the resultant mess is really papering over the problem. How about a 
 runtime error instead, much like dividing by 0?
I've been asking for signed and unsigned overflow checks for years :-) Bye, bearophile
Jan 16 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/16/11 5:53 PM, bearophile wrote:
 Graham St Jack:

 [snip]
I've been asking for signed and unsigned overflow checks for years :-) Bye, bearophile
Nagonna happen. Andrei
Jan 16 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
Jan 16 2011
next sibling parent reply Graham St Jack <Graham.StJack internode.on.net> writes:
On 17/01/11 10:39, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
If that is the case, then a static check like you are suggesting seems like a good way to go. Sure it will be annoying, but it will pick up a lot of bugs.

This particular problem is one that bites me from time to time because I tend to use uints wherever it isn't meaningful to have negative values. It is great until I need to do a subtraction, when I sometimes forget to check which is greater.

Would the check you have in mind statically accept the following as ok, where a and b are uints and ptr is a pointer?

if (a > b) {
    ptr += (a-b);
}

-- 
Graham St Jack
Jan 16 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/16/11 7:51 PM, Graham St Jack wrote:
 On 17/01/11 10:39, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
If that is the case, then a static check like you are suggesting seems like a good way to go. Sure it will be annoying, but it will pick up a lot of bugs. This particular problem is one that bites me from time to time because I tend to use uints wherever it isn't meaningful to have negative values. It is great until I need to do a subtraction, when I sometimes forget to check which is greater. Would the check you have in mind statically check the following as ok? where a and b are uints and ptr is a pointer: if (a > b) { ptr += (a-b); }
That would require flow analysis. I'm not sure we want to embark on that ship. In certain situations value range propagation could take care of it. Andrei
Jan 16 2011
parent reply Graham St Jack <Graham.StJack internode.on.net> writes:
On 17/01/11 13:30, Andrei Alexandrescu wrote:
 On 1/16/11 7:51 PM, Graham St Jack wrote:
 On 17/01/11 10:39, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
If that is the case, then a static check like you are suggesting seems like a good way to go. Sure it will be annoying, but it will pick up a lot of bugs. This particular problem is one that bites me from time to time because I tend to use uints wherever it isn't meaningful to have negative values. It is great until I need to do a subtraction, when I sometimes forget to check which is greater. Would the check you have in mind statically check the following as ok? where a and b are uints and ptr is a pointer: if (a > b) { ptr += (a-b); }
That would require flow analysis. I'm not sure we want to embark on that ship. In certain situations value range propagation could take care of it. Andrei
My fear is that if a cast is always required, people will just put one in out of habit and we are no better off (just like exception-swallowing).

Is the cost of run-time checking really prohibitive? Correct code should have some checking anyway. Maybe providing Phobos functions to perform various correct-usage operations with run-time checks, like in my code fragment above, would be useful. They could do the cast, and most of the annoyance factor would be dealt with. A trivial example:

int difference(uint a, uint b) {
    if (a >= b) {
        return cast(int)(a - b);
    } else {
        return -cast(int)(b - a);
    }
}

-- 
Graham St Jack
Jan 16 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/16/11 9:32 PM, Graham St Jack wrote:
 On 17/01/11 13:30, Andrei Alexandrescu wrote:
 On 1/16/11 7:51 PM, Graham St Jack wrote:
 On 17/01/11 10:39, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
If that is the case, then a static check like you are suggesting seems like a good way to go. Sure it will be annoying, but it will pick up a lot of bugs. This particular problem is one that bites me from time to time because I tend to use uints wherever it isn't meaningful to have negative values. It is great until I need to do a subtraction, when I sometimes forget to check which is greater. Would the check you have in mind statically check the following as ok? where a and b are uints and ptr is a pointer: if (a > b) { ptr += (a-b); }
That would require flow analysis. I'm not sure we want to embark on that ship. In certain situations value range propagation could take care of it. Andrei
My fear is that if a cast is always required, people will just put one in out of habit and we are no better off (just like exception-swallowing).
I don't think it's the same. A cast's target will document the behavior. Right now we're simply doing silently the patently wrong thing. Walter stared at that code for hours. A cast would definitely be a good clue even if wrong.
 Is the cost of run-time checking really prohibitive?
Yes. There is no question about that. This is not negotiable.
 Correct code should
 have some checking anyway. Maybe providing Phobos functions to perform
 various correct-usage operations with run-time checks like in my code
 fragment above would be useful. They could do the cast, and most of the
 annoyance factor would be dealt with. A trivial example:

 int difference(uint a, uint b) {
     if (a >= b) {
         return cast(int)(a - b);
     } else {
         return -cast(int)(b - a);
     }
 }
The general approach is to define properly bounded types with policy-based checking. Andrei
Jan 16 2011
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday 16 January 2011 19:38:55 Andrei Alexandrescu wrote:
 On 1/16/11 9:32 PM, Graham St Jack wrote:
 Is the cost of run-time checking really prohibitive?
Yes. There is no question about that. This is not negotiable.
Well, since it would mean checking a condition every time that you did arithmetic, that would likely _at least_ double the cost of doing any arithmetic. And particularly since arithmetic is such a basic operation that _everything else_ relies on, that could get really expensive, really fast. Yeah. I don't think that that's negotiable.

Absolutely best case, I could see adding a compiler flag to enable it for debugging purposes, but it would definitely be expensive to do such checks and would be totally unacceptable in the release build of a systems programming language.

- Jonathan M Davis
Jan 16 2011
parent reply Graham St Jack <Graham.StJack internode.on.net> writes:
On 17/01/11 14:16, Jonathan M Davis wrote:
 On Sunday 16 January 2011 19:38:55 Andrei Alexandrescu wrote:
 On 1/16/11 9:32 PM, Graham St Jack wrote:
 Is the cost of run-time checking really prohibitive?
Yes. There is no question about that. This is not negotiable.
Well, since it would mean checking a condition every time that you did arithmetic, that would likely _at least_ double the cost of doing any arithmetic. And particularly since arithmetic is such a basic operation that _everything else_ relies on, that could get really expensive, really fast. Yeah. I don't think that that's negotiable. Absolutely best case, I could see adding a compiler flag to enable it for debugging purposes, but it would definitely be expensive to do such checks and would be totally unacceptable in the release build of a systems programming language. - Jonathan M Davis
Yes, I agree that checking all the time would be too expensive. What I meant was that we could provide functions that could do appropriate checking when it is needed. Andrei didn't like the functions idea, suggesting types that do policy-based checking, which I am happy with. -- Graham St Jack
Jan 16 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Graham St Jack:

 Yes, I agree that checking all the time would be too expensive.
I agree that other solutions have to be adopted first; runtime tests are the last thing to try. But I think Andrei doesn't know how expensive that would be.

---------------------
Walter:
 1. Yes it is meaningful - depending on what you're doing.
I am not sure.
 2. Such a runtime test is expensive in terms of performance and code bloat.
I have not seen even synthetic benchmarks about this. Bye, bearophile
Jan 17 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 1. Yes it is meaningful - depending on what you're doing.
I am not sure.
 2. Such a runtime test is expensive in terms of performance and code bloat.
 
I have not seen even synthetic benchmarks about this.
Look at the asm dump of a function. It's full of add's - not only ADD instructions, but addressing mode multiplies and add's. Subtraction is often expressed in terms of addition, relying on twos-complement wraparound. Trying to remove twos-complement arithmetic from a systems language is like trying to teach your cat to fetch.
Jan 17 2011
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday 17 January 2011 01:32:39 Walter Bright wrote:
 bearophile wrote:
 1. Yes it is meaningful - depending on what you're doing.
I am not sure.
 2. Such a runtime test is expensive in terms of performance and code
 bloat.
I have not seen even synthetic benchmarks about this.
Look at the asm dump of a function. It's full of add's - not only ADD instructions, but addressing mode multiplies and add's. Subtraction is often expressed in terms of addition, relying on twos-complement wraparound. Trying to remove twos-complement arithmetic from a systems language is like trying to teach your cat to fetch.
I think that you'd fare better with the cat. :) - Jonathan M Davis
Jan 17 2011
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter:

 Look at the asm dump of a function. It's full of add's - not only ADD
 instructions, but addressing mode multiplies and add's. Subtraction is often
 expressed in terms of addition, relying on twos-complement wraparound.
This answer is relevant only if the programmer is using inline asm, while the discussion was about unsigned differences in D code, which are uncommon in my D code. Sometimes I even assign lengths to signed-word variables, to avoid some signed/unsigned comparison bugs. Bye, bearophile
Jan 17 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 Walter:
 
 Look at the asm dump of a function. It's full of add's - not only ADD 
 instructions, but addressing mode multiplies and add's. Subtraction is
 often expressed in terms of addition, relying on twos-complement
 wraparound.
This answer is relevant only if the programmer is using inline asm, while the discussion was about unsigned differences in D code, which are uncommon in my D code. Sometimes I even assign lengths to signed-word variables, to avoid some signed/unsigned comparison bugs.
A lot of the addition is also carried out at link time, and even by the loader. Subtraction is done by relying on overflow.
Jan 17 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Walter:

 bearophile wrote:
 Walter:
 
 Look at the asm dump of a function. It's full of add's - not only ADD 
 instructions, but addressing mode multiplies and add's. Subtraction is
 often expressed in terms of addition, relying on twos-complement
 wraparound.
This answer is relevant only if the programmer is using inline asm, while the discussion was about unsigned differences in D code, which are uncommon in my D code. Sometimes I even assign lengths to signed-word variables, to avoid some signed/unsigned comparison bugs.
A lot of the addition is also carried out at link time, and even by the loader. Subtraction is done by relying on overflow.
The back-end carries out my D operations using unsigned differences on CPU registers, the linker has to use them, etc. But the discussion was about explicit operations done in the D code written by the programmer. Modular arithmetic done with unsigned fixed-width integers is mathematically sound, but it's a bit too bug-prone for normal Safe D modules :-) Bye, bearophile
Jan 17 2011
prev sibling parent reply so <so so.do> writes:
 int difference(uint a, uint b) {
    if (a >= b) {
      return cast(int) a-b;
    }
    else {
      return -(cast(int) b-a);
    }
 }
Wouldn't this just be pushing a design error one step further? uint has no mathematical basis whatsoever; it is there because we "can" have it. I have another solution: remove "uint-uint" from the language and provide explicit functions.
Jan 17 2011
parent so <so so.do> writes:
 Wouldn't this be just pushing a design error one step further?
 uint has no mathematical basis whatsoever, it is there because we "can"  
 have it.
 I have another solution, remove "uint-uint" from the language and  
 provide explicit functions.
Oh didn't see Don's reply.
Jan 17 2011
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 17/01/2011 00:09, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
Really? :/ Even if the runtime error can be optionally disabled on compilation, like array bounds checking?
-- 
Bruno Medeiros - Software Engineer
Feb 04 2011
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Graham St Jack wrote:
 It seems to me that the real problem here is that it isn't meaningful to 
 perform (a-b) on unsigned integers when (a<b). Attempting to clean up 
 the resultant mess is really papering over the problem. How about a 
 runtime error instead, much like dividing by 0?
1. Yes it is meaningful - depending on what you're doing. 2. Such a runtime test is expensive in terms of performance and code bloat.
Jan 17 2011
prev sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 [snip]
This is a new example of an old issue; it is in no way specific to 64 bits. Any expression which contains a size extension AND a signed<->unsigned implicit conversion is almost always a bug. (unsigned - unsigned leaves the carry flag unknown, so sign extension is impossible.) It happens a lot with ushort and ubyte; there are several examples of it in bugzilla.

short a = -1;
a = a >>> 1;

is a particularly horrific example. I think it should be forbidden in all cases. I think it can be done with a flag in the range propagation.
Jan 17 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/17/11 2:47 AM, Don wrote:
 Andrei Alexandrescu wrote:
[snip]
 This is a new example of an old issue; it is in no way specific to 64 bits.
 Any expression which contains a size-extension AND a signed<->unsigned
 implicit conversion is almost always a bug. (unsigned - unsigned leaves
 the carry flag unknown, so sign extension is impossible).

 It happens a lot with ushort, ubyte. There are several examples of it in
 bugzilla. short a=-1; a = a>>>1; is a particularly horrific example.
That doesn't compile. This does:

short a = -1;
a >>>= 1;

a becomes 32767, which didn't surprise me. Replacing >>>= with >>= keeps a unchanged, which I also didn't find surprising.
 I think it should be forbidden in all cases. I think it can be done with
 a flag in the range propagation.
Yes, that would be awesome! Andrei
Jan 17 2011
parent Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 On 1/17/11 2:47 AM, Don wrote:
 Andrei Alexandrescu wrote:
[snip]
 This is a new example of an old issue; it is in no way specific to 64 
 bits.
 Any expression which contains a size-extension AND a signed<->unsigned
 implicit conversion is almost always a bug. (unsigned - unsigned leaves
 the carry flag unknown, so sign extension is impossible).

 It happens a lot with ushort, ubyte. There are several examples of it in
 bugzilla. short a=-1; a = a>>>1; is a particularly horrific example.
That doesn't compile. This does: short a = -1; a >>>= 1; a becomes 32767, which didn't surprise me. Replacing >>>= with >>= keeps a unchanged, which I also didn't find surprising.
Aargh, that should have been:

short a = -1;
ushort b = -1;
assert( a == b );               // passes
assert( a >>> 1 == b >>> 1 );   // fails

Another example:

uint x = 3;
uint y = 8;
ulong z = 0;
ulong a = (z + x) - y;
ulong b = z + (x - y);
assert(a == b); // Thought addition was associative, did you?

'a' only involves size-extension, so it's OK. But 'b' has a subexpression which sets the carry bit. Actually it doesn't even need subtraction.

uint x = uint.max;
uint y = uint.max;
ulong z = 0;
ulong a = (z + x) + y;
ulong b = z + (x + y);
assert(a == b); // Still thought addition was associative?

It's the same deal: you shouldn't be able to size-extend when the state of the carry flag is unknown. Once you have performed an operation which can wrap around, you have discarded the carry bit. This means you have made a commitment to arithmetic modulo 2^^32. And then the next addition is arithmetic modulo 2^^64! Which is a fundamentally different, incompatible operation. It should be a type mismatch.

Note that because small types get promoted to int, the problem mostly shows up with uint -> ulong (for smaller types, the carry bit is retained inside the int).
 I think it should be forbidden in all cases. I think it can be done with
 a flag in the range propagation.
Yes, that would be awesome! Andrei
Jan 17 2011