
digitalmars.D - Concept proposal: Safely catching error

reply Olivier FAURE <olivier.faure epitech.eu> writes:
I recently skimmed the "Bad array indexing is considered deadly" 
thread, which discusses the "array OOB throws Error, which throws 
the whole program away" problem.

The gist of the debate is:

- Array OOB is a programming problem; it means an invariant is 
broken, which means the code surrounding it probably makes 
invalid assumptions and shouldn't be trusted.

- Also, it can be caused by memory corruption.

- But then again, anything can be caused by memory corruption, so 
it's kind of an odd thing to worry about. We should worry about 
not causing it, not making memory corrupted programs safe, since 
it's extremely rare and there's not much we can do about it 
anyway.

- But memory corruption is super bad; if an error *might* 
be caused by memory corruption, then we must absolutely throw the 
potentially corrupted data away without using it.

- Besides, even without memory corruption, the same argument 
applies to broken invariants; if we have data that breaks 
invariants, we need to throw it away, and use it as little as 
possible.

- But sometimes we have very big applications with lots of data 
and lots of code. If my server deals with dozens of clients or 
more, I don't want to brutally disconnect them all because I need 
to throw away one user's data.

- This could be achieved with processes. Then again, using 
processes often isn't practical for performance or architecture 
reasons.

My proposal for solving these problems would be to explicitly 
allow catching Errors in @safe code IF the try block from which 
the Error is caught is perfectly pure.

In other words, @safe functions would be allowed to catch Error 
after try blocks if the block only mutates data declared inside 
of it; the code would look like:

     import vibe.d;

     // ...

     string handleRequestOrError(in HTTPServerRequest req) @safe {
         ServerData myData = createData();

         try {
             // both doSomethingWithData and mutateMyData are  pure

             doSomethingWithData(req, myData);
             mutateMyData(myData);

             return myData.toString;
         }
         catch (Error) {
             throw new SomeException("Oh no, a system error occurred");
         }
     }

     void handleRequest(HTTPServerRequest req,
                        HTTPServerResponse res) @safe
     {
         try {
             res.writeBody(handleRequestOrError(req), 
"text/plain");
         }
         catch (SomeException) {
             // Handle exception
         }
     }

The point is, this is safe even when doSomethingWithData breaks 
an invariant or mutateMyData corrupts myData, because the 
compiler guarantees that the only data affected WILL be thrown 
away or otherwise inaccessible by the time catch(Error) is 
reached.

This would make it possible to design applications that can fail 
gracefully when dealing with multiple independent clients or 
tasks, even when one of the tasks has to be thrown away because 
of a programmer error.

What do you think? Does the idea have merit? Should I make it 
into a DIP?
Jun 05 2017
next sibling parent reply ketmar <ketmar ketmar.no-ip.org> writes:
Olivier FAURE wrote:

 What do you think? Does the idea have merit? Should I make it into a DIP?
tbh, i think that it adds Yet Another Exception Rule to the language, and this does no good in the long run. "oh, you generally cannot do that, except if today is Friday, it is rainy, and you've seen pink unicorn at the morning." the more exceptions to general rules language has, the more it reminds Dragon Poker game from Robert Asprin books.

any exception will usually have a strong rationale behind it, of course, so there will be little reason to not accept it, especially if we had accepted some exceptions before. i think it is better to not follow that path, even if this one idea looks nice.
Jun 05 2017
parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Monday, 5 June 2017 at 10:09:30 UTC, ketmar wrote:
 tbh, i think that it adds Yet Another Exception Rule to the 
 language, and this does no good in the long run. "oh, you 
 generally cannot do that, except if today is Friday, it is 
 rainy, and you've seen pink unicorn at the morning." the more 
 exceptions to general rules language has, the more it reminds 
 Dragon Poker game from Robert Asprin books.
Fair enough. A few counterpoints:

- This one special case is pretty self-contained. It doesn't impact code 
that doesn't use it, and the users most likely to hear about it are the 
ones who need to recover from Errors in their code.

- It doesn't introduce elaborate under-the-hood tricks (unlike DIP 1008*). 
It uses already-existing concepts (@safe and pure), and is in fact closer 
to the intuitive logic behind Error recovery than the current model; 
instead of "You can't recover from Errors" you have "You can't recover 
from Errors unless you flush all data that might have been affected by it".

*Note that I am not making a statement for or against those DIPs. I'm 
only using them as examples to compare my proposal against.

So while this would add feature creep to the language, I'd argue that 
feature creep would be pretty minor and well-contained, and would probably 
be worth it for the problem it would solve.
Jun 05 2017
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
Olivier FAURE wrote:

 On Monday, 5 June 2017 at 10:09:30 UTC, ketmar wrote:
 tbh, i think that it adds Yet Another Exception Rule to the language, 
 and this does no good in the long run. "oh, you generally cannot do 
 that, except if today is Friday, it is rainy, and you've seen pink 
 unicorn at the morning." the more exceptions to general rules language 
 has, the more it reminds Dragon Poker game from Robert Asprin books.
 Fair enough. A few counterpoints:

 - This one special case is pretty self-contained. It doesn't impact code 
 that doesn't use it, and the users most likely to hear about it are the 
 ones who need to recover from Errors in their code.

 - It doesn't introduce elaborate under-the-hood tricks (unlike DIP 1008*). 
 It uses already-existing concepts (@safe and pure), and is in fact closer 
 to the intuitive logic behind Error recovery than the current model; 
 instead of "You can't recover from Errors" you have "You can't recover 
 from Errors unless you flush all data that might have been affected by it".

 *Note that I am not making a statement for or against those DIPs. I'm 
 only using them as examples to compare my proposal against.

 So while this would add feature creep to the language, I'd argue that 
 feature creep would be pretty minor and well-contained, and would probably 
 be worth it for the problem it would solve.
this still nullifies the sense of Error/Exception differences. not all errors are recoverable, even in @safe code. assuming that it is safe to catch any Error in @safe code immediately turns it unsafe. so... we will need to introduce a RecoverableInSafeCodeError class, and change the runtime to throw it instead of Error (sometimes). and even more issues follow (it's an avalanche of changes, and possible code breakage too).

so, in the original form your idea turns @safe code into unsafe, and with more changes it becomes a real pain to implement, and adds more complexity to the language (another Dragon Poker modifier).

using wrappers and carefully checking preconditions looks better to me. after all, if the programmer failed to check some preconditions, the worst thing to do is trying to hide that by masking errors. bombing out is *way* better, i believe, 'cause it forces the programmer to really fix the bugs instead of creating hackish workarounds.
Jun 05 2017
parent Olivier FAURE <olivier.faure epitech.eu> writes:
On Monday, 5 June 2017 at 13:13:01 UTC, ketmar wrote:
 this still nullifies the sense of Error/Exception differences. 
 not all errors are recoverable, even in @safe code.

 ...

 using wrappers and carefully checking preconditions looks 
 better to me. after all, if the programmer failed to check some 
 preconditions, the worst thing to do is trying to hide that by 
 masking errors. bombing out is *way* better, i believe, 'cause 
 it forces the programmer to really fix the bugs instead of 
 creating hackish workarounds.
I don't think this is a workaround, or that it goes against the purpose of Errors. The goal would still be to bomb out, cancel whatever you were doing, print a big red error message to the coder / user, and exit.

A program that catches an Error would not try to use the data that broke a contract; in fact, the program would not have access to the invalid data, since it would be thrown away. Its natural progression would be to log the error, and quit whatever it was doing.

The point is, if the program needs to free system resources before shutting down, it could do so; or if the program is a server or a multi-threaded app dealing with multiple clients at the same time, those clients would not be affected by a crash unrelated to their data.
Jun 07 2017
prev sibling next sibling parent reply Moritz Maxeiner <moritz ucworks.org> writes:
On Monday, 5 June 2017 at 09:50:15 UTC, Olivier FAURE wrote:
 My proposal for solving these problems would be to explicitly 
 allow catching Errors in @safe code IF the try block from which 
 the Error is caught is perfectly pure.

 This would make it possible to design applications that can fail 
 gracefully when dealing with multiple independent clients or 
 tasks, even when one of the tasks has to be thrown away because 
 of a programmer error.

 What do you think? Does the idea have merit? Should I make it 
 into a DIP?
Pragmatic question: How much work do you think this will require? Because writing a generic wrapper that you can customize the fault behaviour for using DbI requires very little[1].

[1] https://github.com/Calrama/libds/blob/fbceda333dbf76697050faeb6e25dbfcc9e3fbc0/src/ds/linear/array/dynamic.d
Jun 05 2017
parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Monday, 5 June 2017 at 10:59:28 UTC, Moritz Maxeiner wrote:
 Pragmatic question: How much work do you think this will 
 require?
Good question. I'm no compiler programmer, so I'm not sure what the answer is. I would say "probably a few days at most". The change is fairly self-contained, and built around existing concepts (mutability and safety); I think it would mostly be a matter of adding a function to the safety checks that tests whether a mutable reference to non-local data is used in any try block with catch(Error).

Another problem is that non-GC memory allocated in the try block would be irreversibly leaked when an Error is thrown (though now that I think about it, that would probably count as impure and be impossible anyway). Either way, it's not a safety risk and the programmer can decide whether leaking memory is worse than brutally shutting down for their purpose.
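To make the intended rule concrete, here is a rough sketch of what such a check might accept and reject. This is purely hypothetical: the rule is not implemented anywhere, and the function and variable names below are made up for illustration.

    // Hypothetical illustration of the proposed rule, not current D
    // semantics: catching Error in @safe code would be allowed only
    // when the try block mutates nothing declared outside of it.
    int globalCounter;

    void proposedRule() @safe
    {
        try {
            int[] local = [1, 2, 3];   // declared inside the try block
            local[5] = 0;              // out of bounds: throws RangeError
        } catch (Error e) {
            // would be ALLOWED: only block-local data could have been
            // corrupted, and it is unreachable once we get here
        }

        int[] outer = [1, 2, 3];
        try {
            outer[5] = 0;              // mutates data that outlives the block
            globalCounter += 1;
        } catch (Error e) {
            // would be REJECTED: `outer` and `globalCounter` survive the
            // block and might be left in a broken state
        }
    }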
 Because writing a generic wrapper that you can customize the 
 fault behaviour for using DbI requires very little.
Using an array wrapper only covers part of the problem. Users may want their server to keep going even if they fail an assertion, or want the performance of nothrow code, or use a library that throws RangeError in very rare and hard to pinpoint cases. Arrays aside, I think there's some use in being able to safely recover from (or safely shut down after) the kind of broken contracts that throw Errors.
Jun 05 2017
parent reply Moritz Maxeiner <moritz ucworks.org> writes:
On Monday, 5 June 2017 at 12:01:35 UTC, Olivier FAURE wrote:
 On Monday, 5 June 2017 at 10:59:28 UTC, Moritz Maxeiner wrote:
 Pragmatic question: How much work do you think this will 
 require?
Another problem is that non-gc memory allocated in the try block would be irreversibly leaked when an Error is thrown (though now that I think about it, that would probably count as impure and be impossible anyway).
D considers allocating memory as pure[1].
 Either way, it's not a safety risk and the programmer can 
 decide whether leaking memory is worse than brutally shutting 
 down for their purpose.
Sure, but with regards to long running processes that are supposed to handle tens of thousands of requests, leaking memory (and continuing to run) will likely eventually end up brutally shutting down the process on out of memory errors. But yes, that is something that would have to be evaluated on a case by case basis.
 Because writing a generic wrapper that you can customize the 
 fault behaviour for using DbI requires very little.
Using an array wrapper only covers part of the problem.
It *replaces* the hard-coded assert Errors with flexible attests that can throw whatever you want (or even kill the process immediately); you just have to disable the runtime's internal bounds checks via `-boundscheck=off`.
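For illustration, a minimal sketch of the general shape of such a wrapper is below. This is an assumption about what it could look like, not the API of the linked libds library; IndexException and CheckedArray are made-up names.

    // Hypothetical sketch of a bounds-checked wrapper; not the libds API.
    class IndexException : Exception
    {
        this(string msg) @safe pure nothrow { super(msg); }
    }

    struct CheckedArray(T)
    {
        private T[] data;

        this(T[] data) @safe pure nothrow { this.data = data; }

        ref T opIndex(size_t i) @safe pure
        {
            // report the violation as a recoverable Exception
            // instead of a RangeError
            if (i >= data.length)
                throw new IndexException("index out of bounds");
            return data[i];
        }

        size_t length() const @safe pure nothrow { return data.length; }
    }

    unittest
    {
        auto a = CheckedArray!int([1, 2, 3]);
        try
            a[10] = 0;
        catch (IndexException e)
        {
            // recover: the caller decides what an out-of-bounds access
            // means here, and the process keeps going
        }
    }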
 Users may want their server to keep going even if they fail an 
 assertion
Normal assertions (other than assert(false)) are not present in -release mode; they are purely for debug mode.
 or want the performance of  nothrow code
That's easily doable with the attest approach.
 or use a library that throws RangeError in very rare and hard 
 to pinpoint cases.
Fix the library (or get it fixed if you don't have the code).
 Arrays aside, I think there's some use in being able to safely 
 recover from (or safely shut down after) the kind of broken 
 contracts that throw Errors.
I consider there to be value in allowing users to say "this is not a contract, it is a valid use case" (-> wrapper), but a broken contract being recoverable violates the entire concept of DbC.

[1] https://dlang.org/spec/function.html#pure-functions
Jun 05 2017
parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Monday, 5 June 2017 at 12:59:11 UTC, Moritz Maxeiner wrote:
 On Monday, 5 June 2017 at 12:01:35 UTC, Olivier FAURE wrote:
 Another problem is that non-gc memory allocated in the try 
 block would be irreversibly leaked when an Error is thrown 
 (though now that I think about it, that would probably count 
 as impure and be impossible anyway).
D considers allocating memory as pure[1]. ... Sure, but with regards to long running processes that are supposed to handle tens of thousands of requests, leaking memory (and continuing to run) will likely eventually end up brutally shutting down the process on out of memory errors. But yes, that is something that would have to be evaluated on a case by case basis.
Note that in the case you describe, the alternative is either "Brutally shut down right now", or "Throw away some data, potentially some memory as well, and maybe brutally shut down later if that happens too often". (Although in the second case, there is also the trade-off that the leaking program "steals" memory from the other routines running on the same computer.)

Anyway, I don't think this would happen. Most forms of memory allocations are impure, and wouldn't be allowed in a try {} catch(Error) block; C's malloc() is pure, but C's free() isn't, so the thrown Error wouldn't be skipping over any calls to free(). Memory allocated by the GC would be reclaimed once the Error is caught and the data thrown away.
 Arrays aside, I think there's some use in being able to safely 
 recover from (or safely shut down after) the kind of broken 
 contracts that throw Errors.
I consider there to be value in allowing users to say "this is not a contract, it is a valid use case" (-> wrapper), but a broken contract being recoverable violates the entire concept of DbC.
I half-agree. There *should not* be a way to say "Okay, the contract is broken, but let's keep going anyway". There *should* be a way to say "okay, the contract is broken, let's get rid of all data associated with it, log an error message to explain what went wrong, then kill *the specific thread/process/task* and let the others keep going".

The goal isn't to ignore or bypass Errors, it's to compartmentalize the damage.
Jun 07 2017
parent Moritz Maxeiner <moritz ucworks.org> writes:
On Wednesday, 7 June 2017 at 15:35:56 UTC, Olivier FAURE wrote:
 On Monday, 5 June 2017 at 12:59:11 UTC, Moritz Maxeiner wrote:

 Anyway, I don't think this would happen. Most forms of memory 
 allocations are impure,
That's not how pure is currently defined in D; see the referenced spec: allocating memory is considered pure (even if it is impure under the theoretical definition of purity). This is something that would need to be changed in the spec.
 I consider there to be value in allowing users to say "this is 
 not a contract, it is a valid use case" (-> wrapper), but a 
 broken contract being recoverable violates the entire concept 
 of DbC.
There *should* be a way to say "okay, the contract is broken, let's get rid of all data associated with it, log an error message to explain what went wrong, then kill *the specific thread/process/task* and let the others keep going". The goal isn't to ignore or bypass Errors, it's to compartmentalize the damage.
The problem is that in current operating systems, the finest scope/context of computation you can (safely) kill / compartmentalize the damage in, in order to allow the rest of the system to proceed, is a process (-> process isolation). Anything finer than that (threads, fibers, etc.) may or may not work in a particular use case, but you can't guarantee/prove that it works in the majority of use cases (which is what the runtime would have to be able to do if we were to allow that behaviour as the default).

Compartmentalizing like this is your job as the programmer imho, not the job of the runtime.
Jun 07 2017
prev sibling next sibling parent reply ag0aep6g <anonymous example.com> writes:
On 06/05/2017 11:50 AM, Olivier FAURE wrote:
 - But memory corruption is super bad; if an error *might* be 
 caused by memory corruption, then we must absolutely throw the 
 potentially corrupted data away without using it.
 
 - Besides, even without memory corruption, the same argument applies to 
 broken invariants; if we have data that breaks invariants, we need to 
 throw it away, and use it as little as possible.
 
[...]
 
 My proposal for solving these problems would be to explicitly allow 
 catching Errors in @safe code IF the try block from which the Error is 
 caught is perfectly pure.
 
 In other words, @safe functions would be allowed to catch Error after 
 try blocks if the block only mutates data declared inside of it; the 
 code would look like:
 
      import vibe.d;
 
      // ...
 
      string handleRequestOrError(in HTTPServerRequest req) @safe {
          ServerData myData = createData();
 
          try {
              // both doSomethingWithData and mutateMyData are  pure
 
              doSomethingWithData(req, myData);
              mutateMyData(myData);
 
              return myData.toString;
          }
          catch (Error) {
              throw new SomeException("Oh no, a system error occurred");
          }
      }
 
      void handleRequest(HTTPServerRequest req,
                         HTTPServerResponse res) @safe
      {
          try {
              res.writeBody(handleRequestOrError(req), "text/plain");
          }
          catch (SomeException) {
              // Handle exception
          }
      }
 
 The point is, this is safe even when doSomethingWithData breaks an 
 invariant or mutateMyData corrupts myData, because the compiler 
 guarantees that the only data affected WILL be thrown away or otherwise 
 inaccessible by the time catch(Error) is reached.
But `myData` is still alive when `catch (Error)` is reached, isn't it? [...]
 
 What do you think? Does the idea have merit? Should I make it into a DIP?
How does `@trusted` fit into this? The premise is that there's a bug somewhere. You can't assume that the bug is in a `@system` function. It can just as well be in a `@trusted` one. And then `@safe` and `pure` mean nothing.
Jun 05 2017
parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Monday, 5 June 2017 at 12:51:16 UTC, ag0aep6g wrote:
 On 06/05/2017 11:50 AM, Olivier FAURE wrote:
 In other words, @safe functions would be allowed to catch 
 Error after try blocks if the block only mutates data declared 
 inside of it; the code would look like:
 
      import vibe.d;
 
      // ...
 
      string handleRequestOrError(in HTTPServerRequest req) 
  @safe {
          ServerData myData = createData();
 
          try {
             ...
          }
          catch (Error) {
              throw new SomeException("Oh no, a system error 
 occured");
          }
      }

      ...
But `myData` is still alive when `catch (Error)` is reached, isn't it?
Good catch; yes, this example would refuse to compile; myData needs to be declared in the try block.
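For reference, the corrected example would look like this (same placeholder vibe.d handler and helpers as in the original post):

     string handleRequestOrError(in HTTPServerRequest req) @safe {
         try {
             // myData now lives only inside the try block, so nothing
             // it refers to survives into the catch clause
             ServerData myData = createData();

             doSomethingWithData(req, myData);
             mutateMyData(myData);

             return myData.toString;
         }
         catch (Error) {
             throw new SomeException("Oh no, a system error occurred");
         }
     }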
 How does `@trusted` fit into this? The premise is that there's 
 a bug somewhere. You can't assume that the bug is in a 
 `@system` function. It can just as well be in a `@trusted` one. 
 And then `@safe` and `pure` mean nothing.
The point of this proposal is that catching Errors should be considered safe under certain conditions; code that catches Errors properly would be considered as safe as any other code, which is, "as safe as the @trusted code it calls".

I think the issue of @trusted is tangential to this. If you (or the writer of a library you use) are using @trusted to cast away pureness and then have side effects, you're already risking data corruption and undefined behavior, catching Errors or no catching Errors.
Jun 07 2017
parent reply ag0aep6g <anonymous example.com> writes:
On 06/07/2017 05:19 PM, Olivier FAURE wrote:
 How does `@trusted` fit into this? The premise is that there's a bug 
 somewhere. You can't assume that the bug is in a `@system` function. 
 It can just as well be in a `@trusted` one. And then `@safe` and 
 `pure` mean nothing.
I think I mistyped there. Makes more sense this way: "You can't assume that the bug is in a **`@safe`** function. It can just as well be in a `@trusted` one."
 The point of this proposal is that catching Errors should be considered 
 safe under certain conditions; code that catches Errors properly would be 
 considered as safe as any other code, which is, "as safe as the @trusted 
 code it calls".
When no @trusted code is involved, then catching an out-of-bounds error from a @safe function is safe. No additional rules are needed. Assuming no compiler bugs, a @safe function simply cannot corrupt memory without calling @trusted code.

You gave the argument against catching out-of-bounds errors as: "it means an invariant is broken, which means the code surrounding it probably makes invalid assumptions and shouldn't be trusted."

That line of reasoning applies to @trusted code. Only @trusted code can lose its trustworthiness. @safe code is guaranteed trustworthy (except for calls to @trusted code).

So the argument against catching out-of-bounds errors is that there might be misbehaving @trusted code. And for misbehaving @trusted code you can't tell the reach of the potential corruption by looking at the function signature.
 I think the issue of @trusted is tangential to this. If you (or the 
 writer of a library you use) are using @trusted to cast away pureness 
 and then have side effects, you're already risking data corruption and 
 undefined behavior, catching Errors or no catching Errors.
It's not about intentional misuse of the @trusted attribute. @trusted functions must be safe. The point is that an out-of-bounds error implies a bug somewhere. If the bug is in @safe code, it doesn't affect safety at all. There is no explosion. But if the bug is in @trusted code, you can't determine how large the explosion is by looking at the function signature.
Jun 07 2017
next sibling parent ag0aep6g <anonymous example.com> writes:
On 06/07/2017 09:45 PM, ag0aep6g wrote:
 When no @trusted code is involved, then catching an out-of-bounds error 
 from a @safe function is safe. No additional rules are needed. Assuming 
 no compiler bugs, a @safe function simply cannot corrupt memory without 
 calling @trusted code.
Thinking a bit more about this, I'm not sure if it's entirely correct. Can a @safe language feature throw an Error *after* corrupting memory? For example, could `a[i] = n;` write the value first and do the bounds check afterwards? There's probably a better example, if this kind of "shoot first, ask questions later" style ever makes sense.

If bounds checking could be implemented like that, you wouldn't be able to ever catch the resulting error safely. It wouldn't matter if it comes from @safe or @trusted code. Purity wouldn't matter either, because an arbitrary write like that doesn't care about purity.
Jun 07 2017
prev sibling parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Wednesday, 7 June 2017 at 19:45:05 UTC, ag0aep6g wrote:
 You gave the argument against catching out-of-bounds errors as: 
 "it means an invariant is broken, which means the code 
 surrounding it probably makes invalid assumptions and shouldn't 
 be trusted."

 That line of reasoning applies to @trusted code. Only @trusted 
 code can lose its trustworthiness. @safe code is guaranteed 
 trustworthy (except for calls to @trusted code).
To clarify, when I said "shouldn't be trusted", I meant in the general sense, not in the memory safety sense. I think Jonathan M Davis put it nicely:

On Wednesday, 31 May 2017 at 23:51:30 UTC, Jonathan M Davis wrote:
 Honestly, once a memory corruption has occurred, all bets are 
 off anyway. The core thing here is that the contract of 
 indexing arrays was violated, which is a bug. If we're going to 
 argue about whether it makes sense to change that contract, 
 then we have to discuss the consequences of doing so, and I 
 really don't see why whether a memory corruption has occurred 
 previously is relevant. [...] In either case, the runtime has 
 no way of determining the reason for the failure, and I don't 
 see why passing a bad value to index an array is any more 
 indicative of a memory corruption than passing an invalid day 
 of the month to std.datetime's Date when constructing it is 
 indicative of a memory corruption.
The sane way to protect against memory corruption is to write safe code, not code that *might* shut down brutally once memory corruption has already occurred. This is done by using @safe and proofreading all @trusted functions in your libs.

Contracts are made to preempt memory corruption, and to protect against *programming* errors; they're not recoverable because breaking a contract means that from now on the program is in a state that wasn't anticipated by the programmer.

Which means the only way to handle them gracefully is to cancel what you were doing and go back to the pre-contract-breaking state, then produce a big, detailed error message and then exit / remove the thread / etc.
 I think the issue of @trusted is tangential to this. If you 
 (or the writer of a library you use) are using @trusted to 
 cast away pureness and then have side effects, you're already 
 risking data corruption and undefined behavior, catching 
 Errors or no catching Errors.
 The point is that an out-of-bounds error implies a bug somewhere. If the bug is in @safe code, it doesn't affect safety at all. There is no explosion. But if the bug is in @trusted code, you can't determine how large the explosion is by looking at the function signature.
I don't think there is much overlap between the problems that can be caused by faulty @trusted code and the problems that can be caught by Errors.

Note that this is not a philosophical problem. I'm making an empirical claim: "Catching Errors would not open programs to memory safety attacks or accidental memory safety blunders that would not otherwise happen". For instance, if some poorly-written @trusted function causes the size of an int[10] slice to be registered as 20, then your program becomes vulnerable to buffer overflows when you iterate over it; the buffer overflow will not throw any Error.

I'm not sure what the official stance is on this. As far as I'm aware, contracts and OOB checks are supposed to prevent memory corruption, not detect it. Any security based on detecting potential memory corruption can ultimately be bypassed by a hacker.
Jun 08 2017
parent reply ag0aep6g <anonymous example.com> writes:
On 06/08/2017 11:27 AM, Olivier FAURE wrote:
 Contracts are made to preempt memory corruption, and to protect against 
 *programming* errors; they're not recoverable because breaking a 
 contract means that from now on the program is in a state that wasn't 
 anticipated by the programmer.
 
 Which means the only way to handle them gracefully is to cancel what you 
 were doing and go back to the pre-contract-breaking state, then produce 
 a big, detailed error message and then exit / remove the thread / etc.
I might get the idea now. The throwing code could be in the middle of some unsafe operation when it throws the out-of-bounds error. It would have cleaned up after itself, but it can't because of the (unexpected) error. Silly example:

----
void f(ref int* p) @trusted
{
    p = cast(int*) 13; /* corrupt stuff or partially initialize or whatever */
    int[] a;
    auto x = a[0]; /* trigger an out-of-bounds error */
    p = new int; /* would have cleaned up */
}
----

Catching the resulting error is @safe when you throw the int* away. So if f is `pure` and you make sure that the arguments don't survive the `try` block, you're good, because f supposedly cannot have reached anything else. This is your proposal, right?

I don't think that's sound. At least, it clashes with another relatively recent development:

https://dlang.org/phobos/core_memory.html#.pureMalloc

That's a wrapper around C's malloc. C's malloc might set the global errno, so it's impure. pureMalloc achieves purity by resetting errno to the value it had before the call.

So a `pure` function may mess with global state, as long as it cleans it up. But when it's interrupted (e.g. by an out-of-bounds error), it may leave globals in an invalid state. So you can't assume that a `pure` function upholds its purity when it throws an error.

In the end, an error indicates that something is wrong, and probably all guarantees may be compromised.
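For illustration, the errno trick described above boils down to something like the following sketch. This is a simplified illustration of the idea, not druntime's actual pureMalloc code (the real pureMalloc is additionally marked `pure`, which this sketch omits), and sketchPureMalloc is a made-up name.

    import core.stdc.errno : errno;
    import core.stdc.stdlib : malloc;

    // Simplified sketch: save the global errno and put it back afterwards,
    // so the global side effect is unobservable to the caller.
    void* sketchPureMalloc(size_t size) @system
    {
        immutable savedErrno = errno;   // remember the global state
        void* p = malloc(size);
        errno = savedErrno;             // reset it to its pre-call value
        return p;
    }

But if an error interrupts the function between the call and the reset, the cleanup never happens, which is exactly the point made above.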
Jun 08 2017
parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Thursday, 8 June 2017 at 13:02:38 UTC, ag0aep6g wrote:
 Catching the resulting error is @safe when you throw the int* 
 away. So if f is `pure` and you make sure that the arguments 
 don't survive the `try` block, you're good, because f 
 supposedly cannot have reached anything else. This is your 
 proposal, right?
Right.
 I don't think that's sound. At least, it clashes with another 
 relatively recent development:

 https://dlang.org/phobos/core_memory.html#.pureMalloc

 That's a wrapper around C's malloc. C's malloc might set the 
 global errno, so it's impure. pureMalloc achieves purity by 
 resetting errno to the value it had before the call.

 So a `pure` function may mess with global state, as long as it 
 cleans it up. But when it's interrupted (e.g. by an 
 out-of-bounds error), it may leave globals in an invalid state. 
 So you can't assume that a `pure` function upholds its purity 
 when it throws an error.
That's true. A "pure after cleanup" function is incompatible with catching Errors (unless we introduce a "scope(error)" keyword that also runs on errors, but that comes with other problems). Is pureMalloc supposed to be representative of pure functions, or more of a special case? That's not a rhetorical question, I genuinely don't know. The spec says a pure function "does not read or write any global or static mutable state", which seems incompatible with "save a global, then write it back like it was". In fact, doing so seems contrary to the assumption that you can run any two pure functions on immutable / independent data at the same time and you won't have race conditions. Actually, now I'm wondering whether pureMalloc & co handle potential race conditions at all, or just hope they don't happen.
Jun 08 2017
parent ag0aep6g <anonymous example.com> writes:
On 06/08/2017 04:02 PM, Olivier FAURE wrote:
 That's true. A "pure after cleanup" function is incompatible with 
 catching Errors (unless we introduce a "scope(error)" keyword that also 
 runs on errors, but that comes with other problems).
 
 Is pureMalloc supposed to be representative of pure functions, or more 
 of a special case? That's not a rhetorical question, I genuinely don't 
 know.
I think it's supposed to be just as pure as any other pure function. Here's the pull request that added it:

https://github.com/dlang/druntime/pull/1746

I don't see anything about it being special-cased in the compiler or such.
 The spec says a pure function "does not read or write any global or 
 static mutable state", which seems incompatible with "save a global, 
 then write it back like it was".
True. Something similar is going on with @safe. There's a list of things that are "not allowed in @safe functions" [1], but you can do all those things in @trusted code, of course. The list is about what the compiler rejects, not about what a @safe function can actually do. It might be the same with the things that pure functions can/cannot do.

I suppose the idea is that it cannot be observed that pureMalloc messes with global state, so it's ok. The assumption being that you don't catch errors.

By the way, with regards to purity and errors, `new` is the same as pureMalloc. When `new` throws an OutOfMemoryError and you catch it, you can see that errno has been set. Yet `new` is considered `pure`.
 In fact, doing so seems contrary to the 
 assumption that you can run any two pure functions on immutable / 
 independent data at the same time and you won't have race conditions.
 
 Actually, now I'm wondering whether pureMalloc & co handle potential 
 race conditions at all, or just hope they don't happen.
Apparently errno is thread-local.

[1] https://dlang.org/spec/function.html#safe-functions
Jun 08 2017
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/5/17 5:50 AM, Olivier FAURE wrote:
 I recently skimmed the "Bad array indexing is considered deadly" thread,
 which discusses the "array OOB throws Error, which throws the whole
 program away" problem.
[snip]
 My proposal for solving these problems would be to explicitly allow
 catching Errors in @safe code IF the try block from which the Error is
 caught is perfectly pure.
I don't think this will work. Only throwing Error makes a function nothrow. A nothrow function may not properly clean up the stack while unwinding. Not because the stack unwinding code skips over it, but because the compiler knows nothing can throw, and so doesn't include the cleanup code. So this means, regardless of whether you catch an Error or not, the program may be in a state that is not recoverable.

Not to mention that only doing this for pure code eliminates usages that sparked the original discussion, as my code communicates with a database, and that wouldn't be allowed in pure code.

The only possible language change I can think of here is to have a third kind of Throwable type. Call it SafeError. A SafeError would be catchable only in @system or @trusted code. This means that @safe code would have to terminate, but any wrapping code that is calling the @safe code (such as the vibe.d framework) could catch it and properly handle the error, knowing that everything was properly cleaned up, and knowing that because we are in @safe code, there hasn't been a memory corruption (right?). Throwing a SafeError prevents a function from being marked nothrow. I can't see a way around this, unless we came up with another attribute (shudder). Then we could change the compiler (runtime?) to throw SafeRangeError instead of RangeError inside @safe code.

All of this, I'm not proposing to do, because I don't see it being accepted. Creating a new array type which is used in my code will work, and avoids all the hassle of navigating the DIP system.

-Steve
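For concreteness, a rough sketch of how that third kind of Throwable might look. SafeError and SafeRangeError are the hypothetical names from the post above (nothing like them exists in druntime), and frameworkEntryPoint is made up.

    // Hypothetical sketch of the SafeError idea; none of these types exist.
    class SafeError : Throwable
    {
        this(string msg) @safe pure nothrow { super(msg); }
    }

    class SafeRangeError : SafeError
    {
        this(string msg = "range violation in @safe code") @safe pure nothrow
        {
            super(msg);
        }
    }

    // Framework-level code (what something like vibe.d could do). It is not
    // @safe, so it would be allowed to catch the SafeError; per the idea
    // above, the @safe handler below it has unwound and cleaned up by then.
    void frameworkEntryPoint(void delegate() @safe handler) @system
    {
        try
            handler();
        catch (SafeRangeError e)
        {
            // log the failure and keep serving the other clients
        }
    }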
Jun 05 2017
parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Monday, 5 June 2017 at 14:05:27 UTC, Steven Schveighoffer 
wrote:
 I don't think this will work. Only throwing Error makes a 
 function nothrow. A nothrow function may not properly clean up 
 the stack while unwinding. Not because the stack unwinding code 
 skips over it, but because the compiler knows nothing can 
 throw, and so doesn't include the cleanup code.
If the function is pure, then the only things it can set up will be stored on local or GC data, and it won't matter if they're not properly cleaned up, since they won't be accessible anymore.

I'm not 100% sure about that, though. Can a pure function do impure things in its scope(exit) / destructor code?
 Not to mention that only doing this for pure code eliminates 
 usages that sparked the original discussion, as my code 
 communicates with a database, and that wouldn't be allowed in 
 pure code.
It would work for sending to a database; but you would need to use the functional programming idiom of "do 99% of the work in pure functions, then send the data to the remaining 1% for impure tasks". A process's structure would be:

- Read the inputs from the socket (impure, no catching errors)
- Parse them and transform them into database requests (pure)
- Send the requests to the database (impure)
- Parse / analyse / whatever the results (pure)
- Send the results to the socket (impure)

And okay, yeah, that list isn't realistic. Using functional programming idioms in real life programs can be a pain in the ass, and lead to convoluted callback-based scaffolding and weird data structures that you need to pass around a bunch of functions that don't really need them.

The point is, you could isolate the pure data-manipulating parts of the program from the impure IO parts; and encapsulate the former in Error-catching blocks (which is convenient, since those parts are likely to be more convoluted and harder to foolproof than the IO parts, therefore likely to throw more Errors).

Then if an Error occurs, you can close the connection to the client (maybe send them an error packet beforehand), close the database file descriptor, log an error message, etc.
Jun 07 2017
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/7/17 12:20 PM, Olivier FAURE wrote:
 On Monday, 5 June 2017 at 14:05:27 UTC, Steven Schveighoffer wrote:
 I don't think this will work. Only throwing Error makes a function
 nothrow. A nothrow function may not properly clean up the stack while
 unwinding. Not because the stack unwinding code skips over it, but
 because the compiler knows nothing can throw, and so doesn't include
 the cleanup code.
If the function is pure, then the only things it can set up will be stored on local or GC data, and it won't matter if they're not properly cleaned up, since they won't be accessible anymore.
Hm... if you locked an object that was passed in on the stack, for instance, there is no guarantee the object gets unlocked.
 I'm not 100% sure about that, though. Can a pure function do impure
 things in its scope(exit) / destructor code?
Even if it does pure things, that can cause problems.
 Not to mention that only doing this for pure code eliminates usages
 that sparked the original discussion, as my code communicates with a
 database, and that wouldn't be allowed in pure code.
It would work for sending to a database; but you would need to use the functional programming idiom of "do 99% of the work in pure functions, then send the data to the remaining 1% for impure tasks".
Even this still pushes the handling of the error onto the user. I want vibe.d to handle the error, in case I create a bug. But vibe.d can't possibly know what database things I'm going to do. And really this isn't possible. 99% of the work is using the database.
 A process's structure would be:
 - Read the inputs from the socket (impure, no catching errors)
 - Parse them and transform them into database requests (pure)
 - Send the requests to the database (impure)
 - Parse / analyse / whatever the results (pure)
 - Send the results to the socket (impure)

 And okay, yeah, that list isn't realistic. Using functional programming
 idioms in real life programs can be a pain in the ass, and lead to
 convoluted callback-based scaffolding and weird data structures that you
 need to pass around a bunch of functions that don't really need them.

 The point is, you could isolate the pure data-manipulating parts of the
 program from the impure IO parts; and encapsulate the former in
 Error-catching blocks (which is convenient, since those parts are likely
 to be more convoluted and harder to foolproof than the IO parts,
 therefore likely to throw more Errors).
Aside from the point that this still doesn't solve the problem (pure functions do cleanup too), this means a lot of headache for people who just want to write code. I'd much rather just write an array type and be done. -Steve
Jun 08 2017
parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Thursday, 8 June 2017 at 12:20:19 UTC, Steven Schveighoffer 
wrote:
 Hm... if you locked an object that was passed in on the stack, 
 for instance, there is no guarantee the object gets unlocked.
This wouldn't be allowed unless the object was duplicated / created inside the try block.
 Aside from the point that this still doesn't solve the problem 
 (pure functions do cleanup too), this means a lot of headache 
 for people who just want to write code. I'd much rather just 
 write an array type and be done.

 -Steve
Fair enough. There are other advantages to writing with "create data with pure functions then process it" idioms (easier to do unit tests, better for parallelism, etc), though.
Jun 08 2017
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/8/17 9:42 AM, Olivier FAURE wrote:
 On Thursday, 8 June 2017 at 12:20:19 UTC, Steven Schveighoffer wrote:
 Hm... if you locked an object that was passed in on the stack, for
 instance, there is no guarantee the object gets unlocked.
This wouldn't be allowed unless the object was duplicated / created inside the try block.
void foo(Mutex m, Data d) pure
{
    synchronized(m)
    {
        // ... manipulate d
    } // no guarantee m gets unlocked
}

-Steve
Jun 08 2017
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Thursday, 8 June 2017 at 14:13:53 UTC, Steven Schveighoffer 
wrote:

 void foo(Mutex m, Data d) pure
 {
    synchronized(m)
    {
    	// ... manipulate d
    } // no guarantee m gets unlocked
 }

 -Steve
Isn't synchronized(m) not nothrow?
Jun 08 2017
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/8/17 11:19 AM, Stanislav Blinov wrote:
 On Thursday, 8 June 2017 at 14:13:53 UTC, Steven Schveighoffer wrote:

 void foo(Mutex m, Data d) pure
 {
    synchronized(m)
    {
        // ... manipulate d
    } // no guarantee m gets unlocked
 }
Isn't synchronized(m) not nothrow?
You're right, it isn't. I actually didn't know that. Also forgot to make my function nothrow. Fixed:

void foo(Mutex m, Data d) pure nothrow
{
    try
    {
        synchronized(m)
        {
            // .. manipulate d
        }
    }
    catch(Exception) {}
}

-Steve
Jun 08 2017
prev sibling parent Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
I want to start by stating that the discussion around being able 
to throw Error from nothrow functions and the compiler 
optimizations that follow is important to the thoughts below.

The other aspect of array bounds checking is that those 
particular checks will not be added in -release. There has been 
much discussion around this already and I do recall that the 
solution was that @safe code will retain the array bounds checks 
(I'm not sure if contracts were included in this). Thus if using 
-release and @safe you'd be able to rely on having an Error to 
catch.

Now it might make sense for @safe code to throw an 
ArrayOutOfBounds Exception, but that would mean the function 
couldn't be marked as nothrow if array indexing is used. This is 
probably a terrible idea, but @safe nothrow functions could throw 
ArrayIndexError while @safe could throw ArrayIndexException. It 
would really suck that adding nothrow would change the semantics 
silently.
Jun 08 2017