www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - core.checkedint

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
There are a few simple enhancements to add to core.checkedint:

1. No need for different names (addu/adds) etc., overloading should take 
care of it (simplifies caller code). Right now the API is an odd mix of 
overloading and distinct names.

2. The overflow flag should be an integral, not a bool, so you get to 
count how many overflows there were. This is minor but has no extra cost 
on the happy case.

3. There should be an exponentiation function.

So I'm thinking of the signatures:

pure nothrow  nogc  safe
{
   int add(int x, int y, ref uint overflows);
   uint add(uint x, uint y, ref uint overflows);
   long add(long x, int y, ref uint overflows);
   ulong add(ulong x, uint y, ref uint overflows);

   int mul(int x, int y, ref uint overflows);
   uint mul(uint x, uint y, ref uint overflows);
   long mul(long x, int y, ref uint overflows);
   ulong mul(ulong x, uint y, ref uint overflows);

   int sub(int x, int y, ref uint overflows);
   uint sub(uint x, uint y, ref uint overflows);
   long sub(long x, int y, ref uint overflows);
   ulong sub(ulong x, uint y, ref uint overflows);

   int neg(int x, ref uint overflows);
   long neg(long x, ref uint overflows);

   int pow(int x, int y, ref uint overflows);
   uint pow(uint x, uint y, ref uint overflows);
   long pow(long x, int y, ref uint overflows);
   ulong pow(ulong x, uint y, ref uint overflows);
}

Thoughts?


Andrei
Jun 24 2016
next sibling parent reply Dominikus Dittes Scherkl <Dominikus.Scherkl continental-corporation.com> writes:
On Friday, 24 June 2016 at 16:56:36 UTC, Andrei Alexandrescu 
wrote:
 There are a few simple enhancements to add to core.checkedint:

 1. No need for different names (addu/adds) etc., overloading 
 should take care of it (simplifies caller code). Right now the 
 API is an odd mix of overloading and distinct names.

 2. The overflow flag should be an integral, not a bool, so you 
 get to count how many overflows there were. This is minor but 
 has no extra cost on the happy case.

 3. There should be an exponentiation function.

 So I'm thinking of the signatures:

 pure nothrow  nogc  safe
 {
   int add(int x, int y, ref uint overflows);
   uint add(uint x, uint y, ref uint overflows);
   long add(long x, int y, ref uint overflows);
   ulong add(ulong x, uint y, ref uint overflows);

   int mul(int x, int y, ref uint overflows);
   uint mul(uint x, uint y, ref uint overflows);
   long mul(long x, int y, ref uint overflows);
   ulong mul(ulong x, uint y, ref uint overflows);

   int sub(int x, int y, ref uint overflows);
   uint sub(uint x, uint y, ref uint overflows);
   long sub(long x, int y, ref uint overflows);
   ulong sub(ulong x, uint y, ref uint overflows);

   int neg(int x, ref uint overflows);
   long neg(long x, ref uint overflows);

   int pow(int x, int y, ref uint overflows);
   uint pow(uint x, uint y, ref uint overflows);
   long pow(long x, int y, ref uint overflows);
   ulong pow(ulong x, uint y, ref uint overflows);
 }

 Thoughts?


 Andrei
I have a save exponentiation function (+ two helper functions that may also be useful on their own): /// add a property to numeric types that can be used as return value if a result is out of bounds template invalid(T) if(isNumeric!T) { static if(isFloatingPoint!T) property enum invalid = T.init; else static if(isSigned!T) property enum invalid = T.min; // 0x80..00 else // unsigned property enum invalid = T.max; // 0xFF..FF } /// calculate the maximal power of the value that would fit in an ucent /// for smaller types simply shift down the result /// (e.g. divide by 4 to calc the maxpower that would fit in an uint) ubyte maxpow(const ulong a) pure safe nogc nothrow { assert(a>1); // no useful maxpower exists for 0 and 1 static immutable ubyte[55] mp = [ 127, 80, 63, 55, 49, 45, 43, 40, 38, 37, 35, 34, 33, 32, 31, 31, 30, 30, 29, 29, 28, 28, 27, 27, 27, 26, 26, 26, 26, 25, 25, 25, 25, 24, 24, 24, 24, 24, 24, 23, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22 ]; return (a<139) ? (a<57) ? mp[cast(ubyte)a-2] : ((a<85) ? ((a<69) ? 21 : 20) : ((a<107) ? 19 : 18)) : ((a<7132) ? ((a<566) ? ((a<256) ? ((a<185) ? 17 : 16) : ((a<371) ? 15 : 14)) : ((a<1626) ? ((a<921) ? 13 : 12) : ((a<3184) ? 11 : 10))) : ((a<2642246) ? ((a<65536) ? ((a<19113) ? 9 : 8) : ((a<319558) ? 7 : 6)) : ((a<4294967296) ? ((a<50859009) ? 5 : 4) : ((a<6981463658332) ? 3 : 2)))); } /// exponentiation without overflow /// return invalid if overflow would occur. T safePow(T)(const(T) base, const ubyte exp) pure safe nogc nothrow if(isIntegral!T) { static if(isUnsigned!T) { if(!exp) return 1; // x^^0 is always 1 if(exp == 1 || base < 2) return base; // x^^1 = x, 1^^n = 1 and 0^^n = 0, so do nothing static if(T.sizeof > ulong.sizeof) { if(base > ulong.max) return invalid!T; } if(exp > (maxpow(base)>>(5-bitlen(T.sizeof)))) return invalid!T; return base ^^ exp; // calc only if no overflow for sure (very efficient) } else { auto r = safePow(abs(base), exp); // sometimes calc even if result won't fit if(r > T.max) return invalid!T; // but at least we can easily detect if it happened return (base < 0 && odd(exp)) ? -cast(T)r : cast(T)r; } } unittest { void test(T)() { T r; foreach(base; T.min..T.max+1) { foreach(exp; 0..256) { r = safePow(base, exp); if(r == invalid!T) { assert(base^^exp > T.max, "max wrong: "+T.stringOf()); break; } assert(r == base^^exp, "calc wrong"); } } } test!byte; test!ubyte; test!short; test!ushort; /+ test!int; test!uint; test!long; test!ulong; +/ }
Jun 24 2016
parent Dominikus Dittes Scherkl <Dominikus.Scherkl continental-corporation.com> writes:
On Friday, 24 June 2016 at 17:25:32 UTC, Dominikus Dittes Scherkl 
wrote:
 I have a save exponentiation function (+ two helper functions 
 that may also be useful on their own):
Ah, and I use a better abs() function: /// get the absolute value of x as unsigned type. always succeeds, even for T.min Unsigned!T abs(T)(const(T) x) pure safe nogc nothrow if(isIntegral!T) { static if(isSigned!T) if(x < 0) return -x; return x; } unittest { byte a = -128; auto b = abs(a); assert(is(typeof(b) == ubyte)); assert(b == 128); }
Jun 24 2016
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/24/2016 9:56 AM, Andrei Alexandrescu wrote:
 1. No need for different names (addu/adds) etc., overloading should take care
of
 it (simplifies caller code). Right now the API is an odd mix of overloading and
 distinct names.
The reason for the different names is: 1. Integral promotion rules will affect which overload is selected, when the argument type is not an exact match. Few programmers have an exact understanding of integral promotion rules. 2. Things become further obfuscated when type aliases are used. 3. Mixing signed and unsigned integer types calls which overload? 4. People who use core.checkedint want pedantically correct code. Being absolutely sure what operation is being done is part of that. Using different names makes it trivially inspectable. 5. It is a design mistake to make overloads with the same name have different implementations. Arguably, in this context, a signed overflow is different from an unsigned overflow, and so should properly be a different name.
 2. The overflow flag should be an integral, not a bool, so you get to count 
how many overflows there were. This is minor but has no extra cost on the happy case. I see no value in this beyond amusement value, and a potential bug: 1. One operation in a sequence that overflows will make subsequent operations meaningless, and will likely even cause more overflows. Sticky error flags are used in floating point, and I've never heard any desire for nor use case for counts. 2. The overflow count could conceivably itself overflow, thereby masking the failure in rare cases.
 3. There should be an exponentiation function.
Agreed.
Jun 24 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/24/2016 06:23 PM, Walter Bright wrote:
 On 6/24/2016 9:56 AM, Andrei Alexandrescu wrote:
 1. No need for different names (addu/adds) etc., overloading should
 take care of
 it (simplifies caller code). Right now the API is an odd mix of
 overloading and
 distinct names.
The reason for the different names is: 1. Integral promotion rules will affect which overload is selected, when the argument type is not an exact match. Few programmers have an exact understanding of integral promotion rules.
These functions are not supposed to be used naïvely.
 2. Things become further obfuscated when type aliases are used.
These functions are not supposed to be used naïvely.
 3. Mixing signed and unsigned integer types calls which overload?
These functions are not supposed to be used naïvely.
 4. People who use core.checkedint want pedantically correct code. Being
 absolutely sure what operation is being done is part of that. Using
 different names makes it trivially inspectable.
These functions are not supposed to be used naïvely.
 5. It is a design mistake to make overloads with the same name have
 different implementations. Arguably, in this context, a signed overflow
 is different from an unsigned overflow, and so should properly be a
 different name.
Wait, what? std.conv.to much?
  > 2. The overflow flag should be an integral, not a bool, so you get to
 count how many overflows there were. This is minor but has no extra cost
 on the happy case.

 I see no value in this beyond amusement value, and a potential bug:

 1. One operation in a sequence that overflows will make subsequent
 operations meaningless, and will likely even cause more overflows.
 Sticky error flags are used in floating point, and I've never heard any
 desire for nor use case for counts.
OK.
 2. The overflow count could conceivably itself overflow, thereby masking
 the failure in rare cases.
Touché.
  > 3. There should be an exponentiation function.

 Agreed.
Yay. Andrei
Jun 24 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/24/2016 3:29 PM, Andrei Alexandrescu wrote:
 These functions are not supposed to be used naïvely.
They are also meant to be inspectable. I want to look at it and trivially know what I am getting. Deep knowledge of subtle cases in the language (and integral promotions satisfy that) should not be necessary *especially* for this set of functions. People get into enough trouble over misunderstanding whether they are using signed or unsigned arithmetic as it is. I've used core.checkedint here and there, and have been well satisfied with the existing interface. int add(int x, int y, ref uint overflows); uint add(uint x, uint y, ref uint overflows); long add(long x, int y, ref uint overflows); ulong add(ulong x, uint y, ref uint overflows); ubyte i; uint u; uint overflow; add(i, u, overflow); Quickly, without trying it out, tell me which one it calls. No cheating!
Jun 24 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/24/2016 06:56 PM, Walter Bright wrote:
 On 6/24/2016 3:29 PM, Andrei Alexandrescu wrote:
 These functions are not supposed to be used naïvely.
They are also meant to be inspectable. I want to look at it and trivially know what I am getting.
Then the right API is add(x, y, overflow). I look at it and I trivially know what I am getting. -- Andrei
Jun 24 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
To bring a different angle, the best front-end for arithmetic operations 
that may overflow is:

typeof(lhs + rhs) add(L, R)(const L lhs, const R rhs, ref bool overflow)
if (isIntegral!L && isIntegral!R);

The overflow bit is set if and only if the mathematical result is not 
the same as the concrete result. Interestingly, things like add(5u, -1) 
should succeed without overflow (returning 4u) even though the negative 
value is conceptually converted to a large positive number and the 
operation overflows. (I've implemented this behavior in the DbI checkedint.)

One point that escaped me initially is that sometimes there's no need 
for any checks, e.g. adding any two types smaller than int doesn't need 
to be checked.

I'm not sure whether this function belongs in core.checkedint, but 
that's the right front end design for the primitives in there.


Andrei
Jun 24 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/24/2016 3:55 PM, Andrei Alexandrescu wrote:
 Interestingly, things like add(5u, -1) should succeed
 without overflow (returning 4u) even though the negative value is conceptually
 converted to a large positive number and the operation overflows. (I've
 implemented this behavior in the DbI checkedint.)
I believe adding such behavior is beyond the charter of checkedint. I understand the charter is to check for all overflow, undefined and implementation defined behaviors, and not go beyond that. Pedantically, -1 is an int. But (unsigned op signed) converts the latter to unsigned, becoming (unsigned op unsigned), and so an overflow occurs.
Jun 24 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/24/2016 07:56 PM, Walter Bright wrote:
 I believe adding such behavior is beyond the charter of checkedint.
Which checkedint? The one in core? -- Andrei
Jun 24 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/24/2016 4:59 PM, Andrei Alexandrescu wrote:
 On 06/24/2016 07:56 PM, Walter Bright wrote:
 I believe adding such behavior is beyond the charter of checkedint.
Which checkedint? The one in core? -- Andrei
Your Checked!int. Sorry for the confusion.
Jun 24 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/24/2016 08:11 PM, Walter Bright wrote:
 On 6/24/2016 4:59 PM, Andrei Alexandrescu wrote:
 On 06/24/2016 07:56 PM, Walter Bright wrote:
 I believe adding such behavior is beyond the charter of checkedint.
Which checkedint? The one in core? -- Andrei
Your Checked!int. Sorry for the confusion.
Though you should be able to implement that via a hook, I think at this point we need to agree to disagree. The relevant code is at https://gist.github.com/andralex/a0c0ad32704e6ba66e458ac48add4a99#file-checked-d-L88: static if (!isUnsigned!L) { if (lhs < 0) { return subu(Result(rhs), Result(negs(lhs, overflow)), overflow); } } With your suggestion, this would also be an overflow: long x = -1; auto y = array.length + x; I would be hard pressed to acknowledge that as an overflow that needs to be dynamically signaled. And the beauty of two's complement is that indeed it just works. Andrei
Jun 24 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/24/2016 09:42 PM, Andrei Alexandrescu wrote:
 long x = -1;
 auto y = array.length + x;

 I would be hard pressed to acknowledge that as an overflow that needs to
 be dynamically signaled. And the beauty of two's complement is that
 indeed it just works.
To clarify: if array.length is 0, then indeed that should be signaled as an error. But otherwise it should just go through, no overflow. -- Andrei
Jun 24 2016
parent Observer <here inter.net> writes:
On Saturday, 25 June 2016 at 01:49:22 UTC, Andrei Alexandrescu 
wrote:
 On 06/24/2016 09:42 PM, Andrei Alexandrescu wrote:
 long x = -1;
 auto y = array.length + x;

 I would be hard pressed to acknowledge that as an overflow 
 that needs to
 be dynamically signaled. And the beauty of two's complement is 
 that
 indeed it just works.
To clarify: if array.length is 0, then indeed that should be signaled as an error. But otherwise it should just go through, no overflow. -- Andrei
I'm curious how one attempts to handle overflow detection in a user-level implementation. It seems to me that such calculations are much better handled at the hardware level -- and that means, in the compiler. To wit, I'm looking at some old processor manuals I happen to have sitting around, since they tend to be quite descriptive about such details. My pdp-11 processor handbook says about the V (overflow) and C (carry) status bits in an ADD instruction: V: set if there was arithmetic overflow as a result of the operation; that is both operands were of the same sign and the result was of the opposite sign; cleared otherwise C: set if there was a carry from the most significant bit of the result; cleared otherwise I guess this is just assuming both operands are signed. It doesn't talk about a separate ADDU (add unsigned) instruction, so I guess you'd have to just use ADD and interpret the flags appropriately for that situation. The point is, there are two separate flags involved, and they need to be thought about separately and in detail in such contexts. In the MC68020 32-Bit Microprocessor User's Manual, Appendix A ("Condition Codes Computation") is really nice because it is precise and goes right down to the bottom-most detail, listing the logical equations used to set such flags. For instance, for the ADD instruction, the V and C flags are set according to these logical equations: V = (Sm and Dm and (not Rm)) or ((not Sm) and (not Dm) and Rm) C = (Sm and Dm) or ((not Rm) and Dm) or (Sm and (not Rm)) where: Sm = source (first operand) most significant bit Dm = destination (second operand) most significant bit Rm = result most significant bit (I've added parentheses to what is presented in the book, assuming that logical AND is of higher precedence than logical OR.) I'm assuming that once again, these flags are being set on the assumption that both operands are signed, as I see no separate ADDU instruction. My points are: * correct overflow detection involves an awful lot of bit-fiddling * the V (overflow) bit is *not* the same as the C (carry) bit that everyone first thinks about when considering whether or not a simple add calculation overflowed I had to wonder, how is this handled at the user level in a checkedint implementation? Do you perform a lot of data masking and testing in user-level code? Or do you depend on accessing the machine's hardware flags register and hoping it hasn't changed in between when you executed the compiled ADD instruction and when you attempt to access it from high-level code? I just checked the implementations of adds() and addu() here: https://github.com/dlang/druntime/blob/master/src/core/checkedint.d They don't seem to have quite the complexity of the logical equations I just showed. Not that they're necessarily incorrect as they stand; I'm just curious.
Jun 25 2016
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/24/2016 6:42 PM, Andrei Alexandrescu wrote:
 With your suggestion, this would also be an overflow:

 long x = -1;
 auto y = array.length + x;

 I would be hard pressed to acknowledge that as an overflow that needs to be
 dynamically signaled. And the beauty of two's complement is that indeed it just
 works.
I'm curious if that produces an overflow error on the newer clang that instruments such things.
Jun 24 2016
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/24/2016 6:42 PM, Andrei Alexandrescu wrote:
 With your suggestion, this would also be an overflow:

 long x = -1;
 auto y = array.length + x;

 I would be hard pressed to acknowledge that as an overflow that needs to be
 dynamically signaled. And the beauty of two's complement is that indeed it just
 works.
That's a seductive test case. But I worry that mixed signed/unsigned arithmetic is not so simple. What about: x + array.length commutativity in general associativity Does this become a morass of special cases?
Jun 24 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/24/2016 11:19 PM, Walter Bright wrote:
 On 6/24/2016 6:42 PM, Andrei Alexandrescu wrote:
 With your suggestion, this would also be an overflow:

 long x = -1;
 auto y = array.length + x;

 I would be hard pressed to acknowledge that as an overflow that needs
 to be
 dynamically signaled. And the beauty of two's complement is that
 indeed it just
 works.
That's a seductive test case. But I worry that mixed signed/unsigned arithmetic is not so simple. What about: x + array.length commutativity in general associativity Does this become a morass of special cases?
Doesn't seem that way (with some simplifying rules, associativity is left to right so not necessarily optimal), but commutativity works nicely, please take a close look at https://gist.github.com/andralex/a0c0ad32704e6ba66e458ac48add4a99 and destroy what you find unfit. And indeed UBSAN is a good baseline to keep an eye on. -- Andrei
Jun 24 2016