www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Contracts, Undefined Behavior, and Defensive,Programming

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
A short and sweet paper:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf

It echoes Walter's opinions on contracts vs. runtime checking, to a tee. 
He has been derided on occasion in this group for such views. The paper 
is a good rationale from an independent source.
Jun 12
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/12/2020 8:46 PM, Andrei Alexandrescu wrote:
 A short and sweet paper:
 
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf
 
 It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He
has 
 been derided on occasion in this group for such views. The paper is a good 
 rationale from an independent source.
The authors present the case better than I did :-)
Jun 12
prev sibling next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Fri, 12 Jun 2020 23:46:38 -0400 schrieb Andrei Alexandrescu:

 A short and sweet paper:
 
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf
 
 It echoes Walter's opinions on contracts vs. runtime checking, to a tee.
 He has been derided on occasion in this group for such views. The paper
 is a good rationale from an independent source.
A good article. Although there have sometimes been discussions about it, I think the difference between input validation and contract checking is now well understood by most people participating in discussions here. However, this article also gives some vocabulary which might allow us to discuss the assert/assume conflict, where there is no uniform position in the community, in a clearer and more precise way. Consider this example: ------------------------------------------------------------------------ /** * returns (1 + 2 + ... + n) / n */ double sumDiv(ubyte n) safe { double accum = 0; for (size_t i = 0; i <= n; i++) { accum += i; if (i == n) return accum / i; } // DMD requires this to compile the code, cause it does not detect // the above loop covers all possible inputs. // GDC and LDC optimizers remove this assert. assert(0); } ------------------------------------------------------------------------ For n = 0, this function causes soft (library) undefined behavior according to the article: 0.0/0 is defined to give NaN, but it's not a valid result for the algorithm above. Please note that the if block in the loop will return for all possible input values. The assert(0) is only needed because of compiler limitations, but never executed. So we apply defensive programming and add this contract: ---------------------------- double sumDiv(ubyte n) in (n > 0) ---------------------------- This now nicely detects the error in debug mode. However, the D spec says "The compiler is free to assume the assert expression is true and optimize subsequent code accordingly." and "Undefined Behavior: The subsequent execution of the program after an assert contract is false.": https:// dlang.org/spec/contracts.html I didn't remember that the spec was that upfront about this. Let's see how this affects our example: * If the compiler interprets in(n > 0) as assume(n > 0), this means that the first iteration of the for loop with i = 0 has no effect: The if condition is always false (n != 0) and the accum += 0 has no effect. * So it may rewrite the loop bounds to start at i = 1 * If this optimized function is called with n = 0, the loop body and the return is never executed * Remember that the assert(0) at the end could already be removed by the optimizer * I guess all bets are off what happens now So using definsive programming we turned soft UB into hard UB which will probably crash the program. It could also cause memory corruption, run into an infinite loop, .... And please note that all this even happens in a safe function. So because of this, I think it's a really bad idea to conflate assume and assert meanings. -- Johannes
Jun 13
next sibling parent reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:
 So using definsive programming we turned soft UB into hard UB 
 which will probably crash the program. It could also cause 
 memory corruption, run into an infinite loop, .... And please 
 note that all this even happens in a  safe function.

 So because of this, I think it's a really bad idea to conflate 
 assume and assert meanings.
The article talks about this. It states that soft undefined behavior can lead to hard UB. I don't think it tries to argue this is OK. What it talks about is the importance of validation of external input. Saying that if we have well guarded inputs will allow us to detect program bugs through assertion seems a little optimistic. However I do think emphasizing the input validation early is something important and severely lacking in the web development side. Though it is also hard to define where the system input vs external input boundaries are. The concept that a method should provide a form with input validation and one with defensive programming is interesting.
Jun 13
parent reply Johannes Pfau <nospam example.com> writes:
Am Sat, 13 Jun 2020 14:41:19 +0000 schrieb Jesse Phillips:

 On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:
 So using definsive programming we turned soft UB into hard UB which
 will probably crash the program. It could also cause memory corruption,
 run into an infinite loop, .... And please note that all this even
 happens in a  safe function.

 So because of this, I think it's a really bad idea to conflate assume
 and assert meanings.
The article talks about this. It states that soft undefined behavior can lead to hard UB. I don't think it tries to argue this is OK.
I think you're refering to the strlen example on page 4? It is true that soft UB **may** lead to hard UB in general. If this is caused by application code, as in that example, it is something safe has to (and can) take care of. In application code, soft UB may be limited to local effects. Consider an online banking application showing the current date somewhere: If some code manages to assign 25 to the hours part of the date, that certainly violates some semantic guarantees. But if the date is only shown to the user, the worst thing that'll happen is that the user may see an incorrect date. And if all code is safe, we're still guaranteed not to get memory corruption (hard UB). But now if I add an assert(day <= 24) somewhere with best defensive programming intentions, the compiler may turn that into hard UB now. And hard UB could even include memory corruption setting your account balance to a negative value. As assert "[...] checks (along with any action taken should a check fail) are, however,never a part of a function’s contract, cannot be relied upon by the clients..." they subvert safe checking when used as assumes: An example: void a(int n) safe { assert (n > 0); } void b() safe { a(-1); } The compiler can not prevent the a(-1) call in b, as it does not know the semantic invariant demanded by a(). However, in a(), the assert (n > 0) get's ransformed to assume(n > 0) in release mode, which will lead to safe code causing hard UB. Without assume, points where soft UB can turn into hard UB should all be covered by safe (at least for memory corruption). To summarize and rephrase this: the encapsulation unit of safe is the function boundary. The parameter definitions (including this pointer) and return type and their modifiers are the information the compiler uses to implement it's safety checks. For all possible combinations of these input values, the safe function may not cause memory corruption. asserts then narrow the function contract by describing semantic limitations and safe is not able to enforce this, as this information is not part of the function signature. This is fine, as safe will still prevent soft UB from escalating to hard UB. But if you now use assert() as assume(), you introduce hard UB for some possible input combinations, subverting safe guarantees.
 [...]
I agree with everything else you said ;-) -- Johannes
Jun 13
parent reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Saturday, 13 June 2020 at 15:38:03 UTC, Johannes Pfau wrote:
 
 To summarize and rephrase this: the encapsulation unit of  safe 
 is the function boundary. The parameter definitions (including 
 this pointer) and return type and their modifiers are the 
 information the compiler uses to implement it's safety checks. 
 For all possible combinations of these input values, the  safe 
 function may not cause memory corruption.

 asserts then narrow the function contract by describing 
 semantic limitations and  safe is not able to enforce this, as 
 this information is not part of the function signature. This is 
 fine, as  safe will still prevent soft UB from escalating to 
 hard UB. But if you now use assert() as assume(), you introduce 
 hard UB for some possible input combinations, subverting  safe 
 guarantees.
Let me try and state the position I think you are taking. Soft undefined behavior is unacceptable for safe code because it can lead to memory corruption due to the hard undefined behavior which may result from the soft undefined behavior.
Jun 14
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 14.06.20 19:38, Jesse Phillips wrote:
 On Saturday, 13 June 2020 at 15:38:03 UTC, Johannes Pfau wrote:
 To summarize and rephrase this: the encapsulation unit of  safe is the 
 function boundary. The parameter definitions (including this pointer) 
 and return type and their modifiers are the information the compiler 
 uses to implement it's safety checks. For all possible combinations of 
 these input values, the  safe function may not cause memory corruption.

 asserts then narrow the function contract by describing semantic 
 limitations and  safe is not able to enforce this, as this information 
 is not part of the function signature. This is fine, as  safe will 
 still prevent soft UB from escalating to hard UB. But if you now use 
 assert() as assume(), you introduce hard UB for some possible input 
 combinations, subverting  safe guarantees.
Let me try and state the position I think you are taking. Soft undefined behavior is unacceptable for safe code because it can lead to memory corruption due to the hard undefined behavior which may result from the soft undefined behavior.
No, soft undefined behavior is fine. The problem is that assert is explicitly specified to cause hard undefined behavior on failure, yet is allowed in safe code.
Jun 14
prev sibling next sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:

 ------------------------------------------------------------------------
 /**
  * returns (1 + 2 + ... + n) / n
  */
 double sumDiv(ubyte n)  safe
 {
     double accum = 0;

     for (size_t i = 0; i <= n; i++)
     {
         accum += i;
         if (i == n)
             return accum / i;
     }
     // DMD requires this to compile the code, cause it does not 
 detect
     // the above loop covers all possible inputs.
     // GDC and LDC optimizers remove this assert.
     assert(0);
 }
 For n = 0, this function causes soft (library) undefined 
 behavior according to the article: 0.0/0 is defined to give 
 NaN, but it's not a valid result for the algorithm above.
According to the algorithm, as presented, NaN is a valid result, and this so-called "soft" UB does not exist. It is the documentation of `sumDiv` that is incorrect, or rather, incomplete - it presupposes `n` being non-zero, but otherwise does not make any statement wrt the interface (i.e. usage) of the function, so we shall turn to the interface. The function is safe, so it promises to handle all safe inputs, which certainly include 0, without causing UB, and only return safe values, which NaN is. Optimizers, therefore, must not "optimize" this function into causing "hard" UB (such as removing the 0th iteration without inserting a `ret`). If we only saw the declaration, without definition, i.e.: /** * returns (1 + 2 + ... + n) / n */ double sumDiv(ubyte n) safe; Nowhere is it stated what happens if input is 0. But still, it has a safe interface, it is safe, and so must return a safe value, which NaN is. No UB, "soft" or "hard". If `sumDiv` expects *the callers* to *never* pass 0, it must either: - throw an Exception (which its interface allows) - throw an Error (and become nothrow) - profess so via its interface:
 double sumDiv(NonZero!ubyte n)  safe  nogc nothrow;
 //or, if it must never return a NaN:
 NeverNaN!double sumDiv(NonZero!ubyte n)  safe  nogc 
 /+/*perhaps*/nothrow+/;
...given hypothetical `NonZero` and `NeverNaN` types. Then the responsibility of checking for 0 is delegated away from the function body, and optimizers may indeed safe-ly remove the 0th iteration.
 So we apply defensive programming and add this contract:
 ----------------------------
 double sumDiv(ubyte n)
 in (n > 0)
Dropping safe? Then yes, there may be UB. But if safe is to be kept, there's a case to be made to error out on that contract as being an invalid contract. There's a "but" though, which concerns that stipulation on `assert` which you quoted from the spec. But that "but" has to do with semantics of `sumDiv` being available during compilation, which is not always possible because compilers cop out to "opaque" object files too early (i.e. in case if it's a library function, compiled into a separate object file, with only the declaration present during compilation). However, since, according to Walter, I "misunderstand" what linkers do (nevermind the fact that it's very much still about compilation and not linking), I'll say no more on that particular subject. Suffice it to say that in such a case, indeed, conflating `assert` and `assume` is just wrong.
Jun 13
parent Johannes Pfau <nospam example.com> writes:
Am Sat, 13 Jun 2020 15:04:45 +0000 schrieb Stanislav Blinov:

 On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:
 
 
------------------------------------------------------------------------
 /**
  * returns (1 + 2 + ... + n) / n */
 double sumDiv(ubyte n)  safe {
     double accum = 0;

     for (size_t i = 0; i <= n; i++)
     {
         accum += i;
         if (i == n)
             return accum / i;
     }
     // DMD requires this to compile the code, cause it does not
 detect
     // the above loop covers all possible inputs.
     // GDC and LDC optimizers remove this assert.
     assert(0);
 }
 For n = 0, this function causes soft (library) undefined behavior
 according to the article: 0.0/0 is defined to give NaN, but it's not a
 valid result for the algorithm above.
According to the algorithm, as presented, NaN is a valid result, and this so-called "soft" UB does not exist. It is the documentation of `sumDiv` that is incorrect, or rather
Well, then the documentation could just state that 0 is not a valid input. This is just for illustration, the point is that the algorithm does not want to make any assumption about the result for n=0, so it should not be called with n=0.
 If `sumDiv` expects *the callers* to *never* pass 0, it must either:
 
 - throw an Exception (which its interface allows)
 - throw an Error (and become nothrow)
 - profess so via its interface:
If you can encode such information in types, it has the benefit of the compiler being able to check it statically. But this misses the point, the article is about semantic constraints which are not encoded in types.
 So we apply defensive programming and add this contract:
 ----------------------------
 double sumDiv(ubyte n)
 in (n > 0)
Dropping safe? Then yes, there may be UB. But if safe is to be kept, there's a case to be made to error out on that contract as being an invalid contract.
Just a copy and paste error. Keep the safe, it compiles and can cause hard UB. What is your definition for an invalid contract? Even if you explicitly did if (n == 0) return; after the assert, the compiler would just remove it. The assert is perfectly fine if the algorithm is specified to not be defined for n=0. The problem is using assert() like assume() to optimize: assume(x) is unsafe when x is a function parameter with no additional restrictions. How could the compiler even detect or flag such an 'invalid' contract?
 
 There's a "but" though, which concerns that stipulation on `assert`
 which you quoted from the spec. But that "but" has to do with semantics
 of `sumDiv` being available during compilation, which is not always
 possible because compilers cop out to "opaque" object files too early
 (i.e. in case if it's a library function, compiled into a separate
 object file, with only the declaration present during compilation).
 However, since, according to Walter, I "misunderstand" what linkers do
 (nevermind the fact that it's very much still about compilation and not
 linking), I'll say no more on that particular subject. Suffice it to say
 that in such a case, indeed, conflating `assert` and `assume` is just
 wrong.
I'm not sure what you're arguing for? My whole point is that the spec ("the compiler is free to assume the assert expression is true and optimize subsequent code accordingly.") is unsound and subverts safe code. The example just shows how that can happen. It is true that this argument can be generalized to be explained by the fact that safe enforces it's guarantees based on the function interface (which is available in .di), whereas semantic contracts (asserts in function bodys) may not be available. See my reply to Jesse for some more thoughts about that. -- Johannes
Jun 13
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:
 [snip]
A very good point, has not occured to me before. But I think I can give a more specific example of what you're trying to convey: ``` safe auto readIndex(int[] arr, size_t i) in (i < arr.length) { return arr[i]; } ``` Because the compiler is free to assume that the contract holds, it can elide the array bounds check. If it does that, it results in memory violation from ` safe` code. This is something that should only be possible if `-boundscheck=off`. Not otherwise.
Jun 14
parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 15 June 2020 at 04:08:28 UTC, Dukc wrote:
 On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:
 [snip]
A very good point, has not occured to me before. But I think I can give a more specific example of what you're trying to convey: ``` safe auto readIndex(int[] arr, size_t i) in (i < arr.length) { return arr[i]; } ``` Because the compiler is free to assume that the contract holds, it can elide the array bounds check. If it does that, it results in memory violation from ` safe` code. This is something that should only be possible if `-boundscheck=off`. Not otherwise.
Well, the compiler has an option specifically for in contracts, `-checkaction=in=[on|off]`, so probably what should be done is to give that option a `safeonly` value like what `-boundscheck` has.
Jun 15
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Monday, 15 June 2020 at 12:45:56 UTC, Paul Backus wrote:
 Well, the compiler has an option specifically for in contracts, 
 `-checkaction=in=[on|off]`, so probably what should be done is 
 to give that option a `safeonly` value like what `-boundscheck` 
 has.
Typo: `-check=...`, not `-checkaction=...`.
Jun 15
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 15.06.20 14:45, Paul Backus wrote:
 On Monday, 15 June 2020 at 04:08:28 UTC, Dukc wrote:
 On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:
 [snip]
A very good point, has not occured to me before. But I think I can give a more specific example of what you're trying to convey: ``` safe auto readIndex(int[] arr, size_t i) in (i < arr.length) {    return arr[i]; } ``` Because the compiler is free to assume that the contract holds, it can elide the array bounds check. If it does that, it results in memory violation from ` safe` code. This is something that should only be possible if `-boundscheck=off`. Not otherwise.
Well, the compiler has an option specifically for in contracts, `-checkaction=in=[on|off]`, so probably what should be done is to give that option a `safeonly` value like what `-boundscheck` has.
One may want to disable assertion checking without potentially causing UB in safe code. In fact, I think this is more common.
Jun 15
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Monday, 15 June 2020 at 12:45:56 UTC, Paul Backus wrote:
 Well, the compiler has an option specifically for in contracts, 
 `-checkaction=in=[on|off]`, so probably what should be done is 
 to give that option a `safeonly` value like what `-boundscheck` 
 has.
This is not acceptable. The canonical way to use `assert`s is to assume they will not negatively affect the performance of the final build. Your proposal would break that assumption. I personally see two options: 1: Simply disallowing the compiler for assuming that asserts hold. 2: Letting the compiler to assume most things like now, but disallow eliding array bound checks. In this case, there should also be a way for the user to add code that may not be elided by the optimizer, unless bounds checking is off. Something like: ``` trusted auto readIndex(CustomIntArray arr, size_t i) pragma(integritycheck, true) in (i < arr.length) { return arr[i]; } ``` Code with such a pragma could not be elided by trusting a regular assert (nor by trusting `enum` to have a valid value for example). It could still be elided by trusting another integrity critical assert, though.
Jun 15
prev sibling parent German Diago <germandiago1983 gmail.com> writes:
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:

 A good article. Although there have sometimes been discussions 
 about it, I think the difference between input validation and 
 contract checking is now well understood by most people 
 participating in discussions here.

 [...]
Looks like a fair and well-reasoned analysis.
Jun 17
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 13.06.20 05:46, Andrei Alexandrescu wrote:
 A short and sweet paper:
 
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf
 
 It echoes Walter's opinions on contracts vs. runtime checking, to a tee.
I don't think so. For example, the paper makes a distinction between library and language UB.
 He has been derided on occasion in this group for such views. The paper 
 is a good rationale from an independent source.
I am not sure if that includes me, because there are points in this paper whose negation Walter has occasionally used as a straw man to dismiss other points I made.
Jun 13
prev sibling next sibling parent reply Mark <smarksc gmail.com> writes:
On Saturday, 13 June 2020 at 03:46:38 UTC, Andrei Alexandrescu 
wrote:
 A short and sweet paper:

 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf

 It echoes Walter's opinions on contracts vs. runtime checking, 
 to a tee. He has been derided on occasion in this group for 
 such views. The paper is a good rationale from an independent 
 source.
Interesting paper. They mention that some preconditions can't be checked, or aren't feasible/worthwhile to check. While this is true, I think in most circumstances it is possible (and quite useful) to limit the effects of out-of-contract function invocations (or "soft UB" as they call it) by providing "guaranteed postconditions", for lack of a better expression. To borrow their example, consider: void sort(std::vector<int> &vec, Comparator comp); // Precondition: $comp provides a strict weak ordering on integers. // Postcondition: $vec is sorted with respect to the order dictacted by $comp. // Guaranteed postconditions (even if preconditions are unfulfilled): No data is modified except that which is accessible through $vec, no exceptions are thrown, no memory corruption occurs and no language-level UB occurs. In other words, invoking sort with an invalid comparator might mess up the given vector, but at least no "catastrophes" occur. Luckily, in D we have attributes (pure, safe, nothrow) that basically impose such useful "guaranteed postconditions" and are mechanically checkable.
Jun 14
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 14.06.20 14:49, Mark wrote:
 
 "guaranteed postconditions", for lack of a better expression.
Or more generally, it would actually be useful to have multiple pairs of pre- and postconditions, where each postcondition is established iff the corresponding precondition is established. This would also give a more natural way to inherit contracts. Your "guaranteed postcondition" would be a pre-/postcondition pair where the precondition is `true`.
 In other words, invoking sort with an invalid comparator might mess up 
 the given vector, but at least no "catastrophes" occur. Luckily, in D we 
 have attributes (pure,  safe, nothrow) that basically impose such useful 
 "guaranteed postconditions" and are mechanically checkable.
I wish. Right now, a failing assertion causes _language_ UB, even though assertions are allowed in safe code. It's a hole in the mechanical checking.
Jun 14
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 14.06.20 17:14, Timon Gehr wrote:
 where each postcondition is established iff the corresponding 
 precondition is established
Typo: That should have been `if`, not `iff`. (Of course the postcondition is allowed to hold even if the corresponding precondition does not.)
Jun 14
prev sibling parent Kagamin <spam here.lot> writes:
On Saturday, 13 June 2020 at 03:46:38 UTC, Andrei Alexandrescu 
wrote:
 A short and sweet paper:

 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf

 It echoes Walter's opinions on contracts vs. runtime checking, 
 to a tee. He has been derided on occasion in this group for 
 such views. The paper is a good rationale from an independent 
 source.
The approach trades security for performance. It can make sense, just should be opt in. I personally don't care what happens in release compilation mode, because my code doesn't rely on it, but many people were taught to use release mode, their code may not be ready for silent change of behavior.
Jun 18