
digitalmars.D - Portability bug in integral conversion

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
We've spent a lot of time trying to improve the behavior of integral 
types in D. For the most part, we succeeded, but the success was 
partial. There was some hope with the polysemy notion, but it ultimately 
was abandoned because it was deemed too difficult to implement for its 
benefits, which were considered solving a minor annoyance. I was sorry 
to see it go, and I'm glad that now its day of reckoning has come.

Some of the 32-64 portability bugs have come in the following form:

char * p;
uint a, b;
...
p += a - b;

On 32 bits, the code works even if a < b: the difference will become a 
large unsigned number, which is then converted to a size_t (which is a 
no-op since size_t is uint) and added to p. The pointer itself is a 
32-bit quantity. Due to two's complement properties, the addition has 
the same result regardless of the signedness of its operands.

On 64-bits, the same code has different behavior. The difference a - b 
becomes a large unsigned number (say e.g. 4 billion), which is then 
converted to a 64-bit size_t. After conversion the sign is not extended 
- so we end up with the number 4 billion on 64-bit. That is added to a 
64-bit pointer yielding an incorrect value. For the wraparound to work, 
the 32-bit uint should have been sign-extended to 64 bit.

To fix this problem, one possibility is to mark statically every result 
of one of uint-uint, uint+int, uint-int as "non-extensible", i.e. as 
impossible to implicitly extend to a 64-bit value. That would force the 
user to insert a cast appropriately.

Thoughts? Ideas?


Andrei
Jan 15 2011
next sibling parent reply Graham St Jack <Graham.StJack internode.on.net> writes:
On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0? -- Graham St Jack
Jan 16 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Graham St Jack:

 It seems to me that the real problem here is that it isn't meaningful to 
 perform (a-b) on unsigned integers when (a<b). Attempting to clean up 
 the resultant mess is really papering over the problem. How about a 
 runtime error instead, much like dividing by 0?
I've been asking for signed and unsigned overflow checks for years :-) Bye, bearophile
Jan 16 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/16/11 5:53 PM, bearophile wrote:
 Graham St Jack:

 [snip]
I've been asking for signed and unsigned overflow checks for years :-) Bye, bearophile
Nagonna happen. Andrei
Jan 16 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
Jan 16 2011
next sibling parent reply Graham St Jack <Graham.StJack internode.on.net> writes:
On 17/01/11 10:39, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
If that is the case, then a static check like you are suggesting seems like a good way to go. Sure it will be annoying, but it will pick up a lot of bugs.

This particular problem is one that bites me from time to time because I tend to use uints wherever it isn't meaningful to have negative values. It is great until I need to do a subtraction, when I sometimes forget to check which is greater.

Would the check you have in mind statically accept the following as ok, where a and b are uints and ptr is a pointer?

if (a > b) {
    ptr += (a-b);
}

-- 
Graham St Jack
Jan 16 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/16/11 7:51 PM, Graham St Jack wrote:
 On 17/01/11 10:39, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
If that is the case, then a static check like you are suggesting seems like a good way to go. Sure it will be annoying, but it will pick up a lot of bugs. This particular problem is one that bites me from time to time because I tend to use uints wherever it isn't meaningful to have negative values. It is great until I need to do a subtraction, when I sometimes forget to check which is greater. Would the check you have in mind statically check the following as ok? where a and b are uints and ptr is a pointer: if (a > b) { ptr += (a-b); }
That would require flow analysis. I'm not sure we want to embark on that ship. In certain situations value range propagation could take care of it. Andrei
Jan 16 2011
parent reply Graham St Jack <Graham.StJack internode.on.net> writes:
On 17/01/11 13:30, Andrei Alexandrescu wrote:
 On 1/16/11 7:51 PM, Graham St Jack wrote:
 On 17/01/11 10:39, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
If that is the case, then a static check like you are suggesting seems like a good way to go. Sure it will be annoying, but it will pick up a lot of bugs. This particular problem is one that bites me from time to time because I tend to use uints wherever it isn't meaningful to have negative values. It is great until I need to do a subtraction, when I sometimes forget to check which is greater. Would the check you have in mind statically check the following as ok? where a and b are uints and ptr is a pointer: if (a > b) { ptr += (a-b); }
That would require flow analysis. I'm not sure we want to embark on that ship. In certain situations value range propagation could take care of it. Andrei
My fear is that if a cast is always required, people will just put one in out of habit and we are no better off (just like exception-swallowing).

Is the cost of run-time checking really prohibitive? Correct code should have some checking anyway. Maybe providing Phobos functions to perform various correct-usage operations with run-time checks, like in my code fragment above, would be useful. They could do the cast, and most of the annoyance factor would be dealt with. A trivial example:

int difference(uint a, uint b) {
    if (a >= b) {
        return cast(int)(a - b);
    } else {
        return -cast(int)(b - a);
    }
}

-- 
Graham St Jack
Jan 16 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/16/11 9:32 PM, Graham St Jack wrote:
 On 17/01/11 13:30, Andrei Alexandrescu wrote:
 On 1/16/11 7:51 PM, Graham St Jack wrote:
 On 17/01/11 10:39, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
If that is the case, then a static check like you are suggesting seems like a good way to go. Sure it will be annoying, but it will pick up a lot of bugs. This particular problem is one that bites me from time to time because I tend to use uints wherever it isn't meaningful to have negative values. It is great until I need to do a subtraction, when I sometimes forget to check which is greater. Would the check you have in mind statically check the following as ok? where a and b are uints and ptr is a pointer: if (a > b) { ptr += (a-b); }
That would require flow analysis. I'm not sure we want to embark on that ship. In certain situations value range propagation could take care of it. Andrei
My fear is that if a cast is always required, people will just put one in out of habit and we are no better off (just like exception-swallowing).
I don't think it's the same. A cast's target will document the behavior. Right now we're simply doing silently the patently wrong thing. Walter stared at that code for hours. A cast would definitely be a good clue even if wrong.
 Is the cost of run-time checking really prohibitive?
Yes. There is no question about that. This is not negotiable.
 Correct code should
 have some checking anyway. Maybe providing Phobos functions to perform
 various correct-usage operations with run-time checks like in my code
 fragment above would be useful. They could do the cast, and most of the
 annoyance factor would be dealt with. A trivial example:

 int difference(uint a, uint b) {
     if (a >= b) {
         return cast(int)(a - b);
     } else {
         return -cast(int)(b - a);
     }
 }
The general approach is to define properly bounded types with policy-based checking. Andrei
Jan 16 2011
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday 16 January 2011 19:38:55 Andrei Alexandrescu wrote:
 On 1/16/11 9:32 PM, Graham St Jack wrote:
 Is the cost of run-time checking really prohibitive?
Yes. There is no question about that. This is not negotiable.
Well, since it would mean checking a condition every time that you did arithmetic, that would likely _at least_ double the cost of doing any arithmetic. And particularly since arithmetic is such a basic operation that _everything else_ relies on, that could get really expensive, really fast. Yeah. I don't think that that's negotiable.

Absolutely best case, I could see adding a compiler flag to enable it for debugging purposes, but it would definitely be expensive to do such checks and would be totally unacceptable in the release build of a systems programming language.

- Jonathan M Davis
Jan 16 2011
parent reply Graham St Jack <Graham.StJack internode.on.net> writes:
On 17/01/11 14:16, Jonathan M Davis wrote:
 On Sunday 16 January 2011 19:38:55 Andrei Alexandrescu wrote:
 On 1/16/11 9:32 PM, Graham St Jack wrote:
 Is the cost of run-time checking really prohibitive?
Yes. There is no question about that. This is not negotiable.
Well, since it would mean checking a condition every time that you did arithmetic, that would likely _at least_ double the cost of doing any arithmetic. And particularly since arithmetic is such a basic operation that _everything else_ relies on, that could get really expensive, really fast. Yeah. I don't think that that's negotiable. Absolutely best case, I could see adding a compiler flag to enable it for debugging purposes, but it would definitely be expensive to do such checks and would be totally unacceptable in the release build of a systems programming language. - Jonathan M Davis
Yes, I agree that checking all the time would be too expensive. What I meant was that we could provide functions that could do appropriate checking when it is needed. Andrei didn't like the functions idea, suggesting types that do policy-based checking, which I am happy with. -- Graham St Jack
Jan 16 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Graham St Jack:

 Yes, I agree that checking all the time would be too expensive.
I agree that other solutions have to be adopted first; runtime tests are the last thing to try. But I think Andrei doesn't know how expensive that would be.

---------------------
Walter:
 1. Yes it is meaningful - depending on what you're doing.
I am not sure.
 2. Such a runtime test is expensive in terms of performance and code bloat.
I have not seen even synthetic benchmarks about this. Bye, bearophile
Jan 17 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 1. Yes it is meaningful - depending on what you're doing.
I am not sure.
 2. Such a runtime test is expensive in terms of performance and code bloat.
 
I have not seen even synthetic benchmarks about this.
Look at the asm dump of a function. It's full of add's - not only ADD instructions, but addressing mode multiplies and add's. Subtraction is often expressed in terms of addition, relying on twos-complement wraparound. Trying to remove twos-complement arithmetic from a systems language is like trying to teach your cat to fetch.
Jan 17 2011
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday 17 January 2011 01:32:39 Walter Bright wrote:
 bearophile wrote:
 1. Yes it is meaningful - depending on what you're doing.
I am not sure.
 2. Such a runtime test is expensive in terms of performance and code
 bloat.
I have not seen even synthetic benchmarks about this.
Look at the asm dump of a function. It's full of add's - not only ADD instructions, but addressing mode multiplies and add's. Subtraction is often expressed in terms of addition, relying on twos-complement wraparound. Trying to remove twos-complement arithmetic from a systems language is like trying to teach your cat to fetch.
I think that you'd fare better with the cat. :) - Jonathan M Davis
Jan 17 2011
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter:

 Look at the asm dump of a function. It's full of add's - not only ADD
 instructions, but addressing mode multiplies and add's. Subtraction is often
 expressed in terms of addition, relying on twos-complement wraparound.
This answer is relevant only if the programmer is using inline asm, while the discussion was about unsigned differences in D code, which are uncommon in my D code. Sometimes I even assign lengths to signed-word variables, to avoid some signed/unsigned comparison bugs. Bye, bearophile
Jan 17 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 Walter:
 
 Look at the asm dump of a function. It's full of add's - not only ADD 
 instructions, but addressing mode multiplies and add's. Subtraction is
 often expressed in terms of addition, relying on twos-complement
 wraparound.
This answer is relevant only if the programmer is using inline asm, while the discussion was about unsigned differences in D code, which are uncommon in my D code. Sometimes I even assign lengths to signed-word variables, to avoid some signed/unsigned comparison bugs.
A lot of the addition is also carried out at link time, and even by the loader. Subtraction is done by relying on overflow.
Jan 17 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Walter:

 bearophile wrote:
 Walter:
 
 Look at the asm dump of a function. It's full of add's - not only ADD 
 instructions, but addressing mode multiplies and add's. Subtraction is
 often expressed in terms of addition, relying on twos-complement
 wraparound.
This answer is relevant only if the programmer is using inline asm, while the discussion was about unsigned differences in D code, which are uncommon in my D code. Sometimes I even assign lengths to signed-word variables, to avoid some signed/unsigned comparison bugs.
A lot of the addition is also carried out at link time, and even by the loader. Subtraction is done by relying on overflow.
The back-end carries out my D operations using unsigned differences on CPU registers, the linker has to use them, etc. But the discussion was about explicit operations done in the D code written by the programmer. Modular arithmetic done with unsigned fixed-width integers is mathematically sound, but it's a bit too bug-prone for normal Safe D modules :-) Bye, bearophile
Jan 17 2011
prev sibling parent reply so <so so.do> writes:
 int difference(uint a, uint b) {
    if (a >= b) {
      return cast(int) a-b;
    }
    else {
      return -(cast(int) b-a);
    }
 }
Wouldn't this just be pushing a design error one step further? uint has no mathematical basis whatsoever; it is there because we "can" have it. I have another solution: remove "uint-uint" from the language and provide explicit functions.
Jan 17 2011
parent so <so so.do> writes:
 Wouldn't this be just pushing a design error one step further?
 uint has no mathematical basis whatsoever, it is there because we "can"  
 have it.
 I have another solution, remove "uint-uint" from the language and  
 provide explicit functions.
Oh didn't see Don's reply.
Jan 17 2011
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 17/01/2011 00:09, Andrei Alexandrescu wrote:
 On 1/16/11 5:24 PM, Graham St Jack wrote:
 On 16/01/11 08:52, Andrei Alexandrescu wrote:
 [snip]
It seems to me that the real problem here is that it isn't meaningful to perform (a-b) on unsigned integers when (a<b). Attempting to clean up the resultant mess is really papering over the problem. How about a runtime error instead, much like dividing by 0?
That's too inefficient. Andrei
Really? :/ Even if the runtime error can be optionally disabled on compilation, like array bounds checking?
-- 
Bruno Medeiros - Software Engineer
Feb 04 2011
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Graham St Jack wrote:
 It seems to me that the real problem here is that it isn't meaningful to 
 perform (a-b) on unsigned integers when (a<b). Attempting to clean up 
 the resultant mess is really papering over the problem. How about a 
 runtime error instead, much like dividing by 0?
1. Yes it is meaningful - depending on what you're doing. 2. Such a runtime test is expensive in terms of performance and code bloat.
Jan 17 2011
prev sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 [snip]
This is a new example of an old issue; it is in no way specific to 64 bits. Any expression which contains a size extension AND a signed<->unsigned implicit conversion is almost always a bug. (unsigned - unsigned leaves the carry flag unknown, so sign extension is impossible.) It happens a lot with ushort and ubyte; there are several examples of it in bugzilla.

short a = -1;
a = a >>> 1;

is a particularly horrific example. I think it should be forbidden in all cases. I think it can be done with a flag in the range propagation.
Jan 17 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/17/11 2:47 AM, Don wrote:
 Andrei Alexandrescu wrote:
[snip]
 This is a new example of an old issue; it is in no way specific to 64 bits.
 Any expression which contains a size-extension AND a signed<->unsigned
 implicit conversion is almost always a bug. (unsigned - unsigned leaves
 the carry flag unknown, so sign extension is impossible).

 It happens a lot with ushort, ubyte. There are several examples of it in
 bugzilla. short a=-1; a = a>>>1; is a particularly horrific example.
That doesn't compile. This does:

short a = -1;
a >>>= 1;

a becomes 32767, which didn't surprise me. Replacing >>>= with >>= keeps a unchanged, which I also didn't find surprising.
 I think it should be forbidden in all cases. I think it can be done with
 a flag in the range propagation.
Yes, that would be awesome! Andrei
Jan 17 2011
parent Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 On 1/17/11 2:47 AM, Don wrote:
 Andrei Alexandrescu wrote:
[snip]
 This is a new example of an old issue; it is in no way specific to 64 
 bits.
 Any expression which contains a size-extension AND a signed<->unsigned
 implicit conversion is almost always a bug. (unsigned - unsigned leaves
 the carry flag unknown, so sign extension is impossible).

 It happens a lot with ushort, ubyte. There are several examples of it in
 bugzilla. short a=-1; a = a>>>1; is a particularly horrific example.
That doesn't compile. This does: short a = -1; a >>>= 1; a becomes 32767, which didn't surprise me. Replacing >>>= with >>= keeps a unchanged, which I also didn't find surprising.
Aargh, that should have been:

short a = -1;
ushort b = -1;
assert( a == b );               // passes
assert( a >>> 1 == b >>> 1 );   // fails

Another example:

uint x = 3;
uint y = 8;
ulong z = 0;
ulong a = (z + x) - y;
ulong b = z + (x - y);
assert(a == b); // Thought addition was associative, did you?

'a' only involves size-extension, so it's OK. But 'b' has a subexpression which sets the carry bit. Actually it doesn't even need subtraction.

uint x = uint.max;
uint y = uint.max;
ulong z = 0;
ulong a = (z + x) + y;
ulong b = z + (x + y);
assert(a == b); // Still thought addition was associative?

It's the same deal: you shouldn't be able to size-extend when the state of the carry flag is unknown. Once you have performed an operation which can wrap around, you have discarded the carry bit. This means you have made a commitment to arithmetic modulo 2^^32. And then the next addition is arithmetic modulo 2^^64! Which is a fundamentally different, incompatible operation. It should be a type mismatch.

Note that because small types get promoted to int, the problem mostly shows up with uint -> ulong (for smaller types, the carry bit is retained inside the int).
 I think it should be forbidden in all cases. I think it can be done with
 a flag in the range propagation.
Yes, that would be awesome! Andrei
Jan 17 2011