www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Assertions getting corrupted

reply Shachar Shemesh <shachar weka.io> writes:
I'm going to hate myself for suggesting this, but here goes.

There's a fundamental problem with scope(exit) and scope(failure). 
Consider the following code:

{
   a = allocate_something();
   scope(exit) a.cleanup();

   ...

   assert(a.nothingHorribleWentWrong);
}

Ideally, if that assert fails, you'd want the core dump and backtrace 
from the assertion point. That's the earliest point in which the problem 
is visible.

Except, all too often, what will happen is that the assert will throw an 
AssertionError. The scope(exit) will run, and then a.cleanup will 
segfault because, well, something horrible *did* go wrong. This makes it 
much more difficult to find out what actually went wrong.

Target: scope(failure) and scope(exit) should not run when the exception 
thrown is an AssertError.

Which leaves the floor open to two questions:
1. What other exceptions shouldn't run scope(exit) and scope(failure)?
2. Do we want scope(something) that *will* run on AssertError?

Obviously, the answer to 2 is linked to 1.

I think a reasonable approach is to say "scope(failxit) should on all 
Throwables except Errors". Note that this is not the same as saying 
"scope(failxit) runs only on Exceptions".

As for 2, that's the part I'm going to hate myself for. I will not 
object to adding "scope(fatal_error)", that do run on those cases 
(though I think just adding catch for those rare cases ought to be enough).

Shachar
Oct 25
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, October 26, 2017 09:04:56 Shachar Shemesh via Digitalmars-d 
wrote:
 I'm going to hate myself for suggesting this, but here goes.

 There's a fundamental problem with scope(exit) and scope(failure).
 Consider the following code:

 {
    a = allocate_something();
    scope(exit) a.cleanup();

    ...

    assert(a.nothingHorribleWentWrong);
 }

 Ideally, if that assert fails, you'd want the core dump and backtrace
 from the assertion point. That's the earliest point in which the problem
 is visible.

 Except, all too often, what will happen is that the assert will throw an
 AssertionError. The scope(exit) will run, and then a.cleanup will
 segfault because, well, something horrible *did* go wrong. This makes it
 much more difficult to find out what actually went wrong.

 Target: scope(failure) and scope(exit) should not run when the exception
 thrown is an AssertError.

 Which leaves the floor open to two questions:
 1. What other exceptions shouldn't run scope(exit) and scope(failure)?
 2. Do we want scope(something) that *will* run on AssertError?

 Obviously, the answer to 2 is linked to 1.

 I think a reasonable approach is to say "scope(failxit) should on all
 Throwables except Errors". Note that this is not the same as saying
 "scope(failxit) runs only on Exceptions".

 As for 2, that's the part I'm going to hate myself for. I will not
 object to adding "scope(fatal_error)", that do run on those cases
 (though I think just adding catch for those rare cases ought to be
 enough).
The language guarantees that scope(exit), scope(failure), and destructors will be run when an Exception is thrown. It does _not_ guarantee that any of those will be run when an Error is thrown (one case where they definitely aren't is when an Error is thrown from a function that's nothrow). Throwables that aren't Exceptions or Errors really aren't considered in these discussions normally, since almost no one ever derives from Throwable, and I don't think it's really intended that anyone do so much as it is possible (and I'm not sure why you would), but on the whole, the compiler cares about Exceptions vs other Throwables, and whatever is said for Errors normally applies to any Throwables other than Exceptions. Certainly, given the nature of nothrow, the language can't guarantee that cleanup is run for anything that is thrown that is not derived from Exception, regardless of whether it's derived from Error. Walter believes that it's worse to do cleanup when an Error is thrown than it is to not do cleanup, because the program is an unknown and invalid state, and the cleanup code could do more harm than good. Others have argued that it's better to run as much cleanup code as possible and that it's worse to skip cleanup, since usually, the program will be in a valid enough state that the cleanup will work and other problems will be avoided by doing that cleanup. Originally, cleanup did _not_ occur when anything other than an Exception was thrown, but several years ago, someone went and "fixed" druntime so that the cleanup did occur when it could, meaning that most any cleanup not involving nothrow will be run for Errors and other non-Exceptions. As I understand it, Walter disagrees with that change, but he's never mandated that it be undone or made the effort to undo it himself. Personally, overall, I agree with Walter and think that no cleanup should be done when any Error is thrown with the exception that having cleanup run for AssertErrors in unit tests is quite useful (scope(failure) in particular, since it makes it easy to print out information when a failure occurs and only when a failure occurs). So, I'd hate to see cleanup being skipped it unittest blocks when an assertion there fails. However, this topic has been hotly debated on several occasions, and IIRC, the majority of the folks who got involved in those discussions disagreed with Walter. It does sound like you at least partially agree with him though. If you want Errors and Throwables not derived from Exception to be treated differently, I think that you're going to need a really good argument for why, and the nature of nothrow means that it's impossible for cleanup to be guaranteed for anything other than Exceptions - though clearly, the current situation shows that it's possible to do clean-up for non-Exceptions at least some of the time, even if the language technically does not guarantee that cleanup is done for any Throwables which aren't Exceptions. - Jonathan M Davis
Oct 25
next sibling parent reply Shachar Shemesh <shachar weka.io> writes:
On 26/10/17 09:27, Jonathan M Davis wrote:
 since almost no one ever derives from Throwable,
 and I don't think it's really intended that anyone do so much as it is
 possible (and I'm not sure why you would)
Here's where we're doing it. Our product will often have several fibers essentially working for one common sub-component. Trouble is, with our product being distributed the way it is, it might be possible to simply kill that subcomponent. When that happens, we want to kill all the fibers that belong to that subcomponent. The solution is composed of several layers. First, we have a mechanism for injecting an exception into another fiber. This means that any function that may send a fiber to sleep, by definition, may also throw. We also have a FiberGroup, where we can assign fibers to the group. This way, when the component goes down, we just kill the group, and it sends an exception to all fibers that kills them. This leaves just one problem. What happens if someone catches that exception? We want all fibers to actually exit when we do that, and we rely on that happening. Obviously, this is not meant for malicious coders, so we /could/ mandate the following pattern: catch(FiberGroupExit ex) { throw ex; } catch(Exception ex) { ... } The problems with this pattern are huge. For one thing, you'd have to remember to put it anywhere you're catching an Exception. It's also quite verbose. Another solution is to have every other exception inherit not from Exception, but from some subclass of it. Problem is, this does not include exceptions defined in phobos, libraries, and pretty much anything else. It's also quite easy to forget. The solution we came up with was for FiberGroupExit to not inherit from Exception, but directly from Throwable. This means you can catch Exception as usual, but injecting a FiberGroupExit (or a ReactorExit) unconditionally terminates your fiber, while running all proper cleanups. Running the cleanups is important, as the process remains running. It makes no sense for scope(exit) not to clean up merely because a fiber is quitting. Hope this clears up a few things. Shachar
Oct 26
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, October 26, 2017 11:15:52 Shachar Shemesh via Digitalmars-d 
wrote:
 On 26/10/17 09:27, Jonathan M Davis wrote:
 since almost no one ever derives from Throwable,
 and I don't think it's really intended that anyone do so much as it is
 possible (and I'm not sure why you would)
Here's where we're doing it. Our product will often have several fibers essentially working for one common sub-component. Trouble is, with our product being distributed the way it is, it might be possible to simply kill that subcomponent. When that happens, we want to kill all the fibers that belong to that subcomponent. The solution is composed of several layers. First, we have a mechanism for injecting an exception into another fiber. This means that any function that may send a fiber to sleep, by definition, may also throw. We also have a FiberGroup, where we can assign fibers to the group. This way, when the component goes down, we just kill the group, and it sends an exception to all fibers that kills them. This leaves just one problem. What happens if someone catches that exception? We want all fibers to actually exit when we do that, and we rely on that happening. Obviously, this is not meant for malicious coders, so we /could/ mandate the following pattern: catch(FiberGroupExit ex) { throw ex; } catch(Exception ex) { ... } The problems with this pattern are huge. For one thing, you'd have to remember to put it anywhere you're catching an Exception. It's also quite verbose. Another solution is to have every other exception inherit not from Exception, but from some subclass of it. Problem is, this does not include exceptions defined in phobos, libraries, and pretty much anything else. It's also quite easy to forget. The solution we came up with was for FiberGroupExit to not inherit from Exception, but directly from Throwable. This means you can catch Exception as usual, but injecting a FiberGroupExit (or a ReactorExit) unconditionally terminates your fiber, while running all proper cleanups. Running the cleanups is important, as the process remains running. It makes no sense for scope(exit) not to clean up merely because a fiber is quitting. Hope this clears up a few things.
Hmmm. Well, that does make sense, but if cleanup is important, it's not a great idea, because not only does the language definition not guarantee that cleanup happens when anything other than an Exception is thrown, it _can't_ guarantee it, because nothrow specifically means that no Exception was thrown, not that no Throwable was thrown, and when a function is nothrow, at least some portion of the normal exception mechanism stuff is skipped for performance. With the current implementation, if you never use any nothrow functions, I _think_ that it's the case that you're always going to get cleanup, but if you ever get a nothrow function (and nothrow _could_ be inferred if templates are involved), then potentially some cleanup will be skipped (which is particularly bad if you rely on destructors for stuff like reference counting). And while I _think_ that the current implementation guarantees it when nothrow isn't involved, the spec most definitely does not, and Walter has been adamant about that. So, the runtime could be changed in the future so that no cleanup is done for anything that isn't derived from Exception (I'm honestly a bit surprised that Walter has never done it given how adamant he is that trying to do cleanup when an Error is thrown is just making things worse). And if it is, then you won't get any cleanup for your non-Exception Throwables. But even if the you never purposefully use nothrow, and the runtime never gets changed to skip cleanup for non-Exceptions, template inferrence could really bite you if it ever decides that a function is nothrow. So, that alone makes me think that your approach is risky. It may very well be that what you've done is your best choice given the options, but you're relying on behavior that the spec specifically does not guarantee (i.e. cleanup when a non-Exception is thrown), and the fact that it would be screwed up right now if nothrow ever gets involved means that it's not even just a question of what the spec guarantees vs what the runtime currently does. You could easily have a serious problem right now. Unless you know for sure that you never throw one of your Throwables from a nothrow function (which is hard to know unless you don't use templates in any of the code using these Throwables), you could very well be missing some cleanup in your code right now, causing subtle bugs that would not be there if you were using Exceptions. If you're very careful about templates and nothrow, you can certainly make it work with the current runtime implementation, but if I were you, I'd use an approach that didn't use Throwables that aren't derived from either Exception or Error. Even if any other solution has other issues, at least it's not going to involve cleanup code being skipped on you - though it sounds like you may be stuck between the risk of cleanup code being skipped (but fibers exiting when you want them to) and fibers not exiting when you want them to (but cleanup code not being skipped). Clearly, you have a bit of a thorny problem. - Jonathan M Davis
Oct 26
prev sibling parent reply bauss <jj_1337 live.dk> writes:
On Thursday, 26 October 2017 at 06:27:53 UTC, Jonathan M Davis 
wrote:
 On Thursday, October 26, 2017 09:04:56 Shachar Shemesh via 
 Digitalmars-d wrote:
 ...
Walter believes that it's worse to do cleanup when an Error is thrown than it is to not do cleanup, because the program is an unknown and invalid state, and the cleanup code could do more harm than good. Others have argued that it's better to run as much cleanup code as possible and that it's worse to skip cleanup, since usually, the program will be in a valid enough state that the cleanup will work and other problems will be avoided by doing that cleanup. - Jonathan M Davis
For web-development your program may not be an invalid state even if an error is thrown, a user may be in an invalid state, but you don't want your whole website to go down, because one user got into an invalid state. Example on an error thrown where the program is still in a valid state. void doStuffOnUser(User user) { if (!user.role.permissions) return; // ... } Let's say role gets set from the session of the user, which means it's tied to user's browser, so if the session fails to be retrieved and the code still continues to this function then role would be null. Of course it's a bug in the program and you __could__ argue that the program is in an invalid state, but for web applications you don't want it to crash over bugs, you just want it to log and then hopefully no more runs into that issue so you can fix it asap.You want to guarantee up-times no matter what. But anyway since role is null we'll get an access violation, which is not an exception. However the state is only invalid for that one user and possibly not every other user you have, but because it's not an exception the whole program will crash if you actually follow the D guide-lines for exceptions, which couldcause down-time for thousands of users if you have a big website. If it happens in the night-time the website could be down for hours, before you even know about it and that is __really__ bad, because depending on the website it could be a lot money that you're losing. Time is money, especially when it comes to web applications. Amazon did a test once by making their load times 100ms slower and they lost millions in revenue, so imagine if they had hour long down times. It most likely would be catastrophic. If D really wants to succeed with such things, then we cannot assume the program is in an invalid state. It must be up to the developer themselves to figure out if it's in an invalid state or not.
Oct 26
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, October 26, 2017 10:05:11 bauss via Digitalmars-d wrote:
 If D really wants to succeed with such things, then we cannot
 assume the program is in an invalid state. It must be up to the
 developer themselves to figure out if it's in an invalid state or
 not.
We've been over this before plenty of times in this newsgroup. If an Error is thrown, then by definition the program is in an invalid state, because Errors are thrown when bugs are encountered in the program or when the program encounters an unrecoverable condition. As such, it makes no sense whatsoever to recover an Error. Would you try to recover from a segfault? I should hope not, and an Error is pretty much the same thing except that it's done by the language and involves a stack trace rather being done by the hardware and resulting in a core dump. If you don't want your web application going down because of an Error, then write it in a way that you avoid Errors (e.g. check indices before indexing an array rather than relying on RangeError if you can't guarantee that the index is valid). No program should be encountering Errors, and there's nothing special about web servers that makes it okay for them to be hitting them. If your program encounters an Error, then it's buggy, and it needs to be fixed. Making it continue to run after an Error was thrown might work under some circumstances, but in the general case, you're at serious risk of making things worse. As with any program, avoiding the case where your web server terminates due to Errors is a combination of testing it to make sure that the code is correct and making sure that you test any conditions coming from user input or the environment which would ever result in an Error if they weren't checked. I really don't see what the problem with that is or why anyone would then think that it's magically okay to treat Errors differently just because your application is a web server. And if you're really that concerned about your code being so bad that you're going to hit Errors with any regularity, then just make it so that your server restarts if it goes down and load balance across multiple instances. Any service that's looking to be truly robust is going to need multiple instances running anyway. By definition, Errors are _fatal_ error conditions. It's _Exceptions_ which are used for recoverable error conditions, not Errors, and in the long run, you're just shooting yourself in the foot if you try to get your program to recover from an Error instead of terminating. - Jonathan M Davis
Oct 26
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Thursday, 26 October 2017 at 06:04:56 UTC, Shachar Shemesh 
wrote:
 There's a fundamental problem with scope(exit) and 
 scope(failure). Consider the following code:

 {
   a = allocate_something();
   scope(exit) a.cleanup();

   ...

   assert(a.nothingHorribleWentWrong);
 }

 Ideally, if that assert fails, you'd want the core dump and 
 backtrace from the assertion point. That's the earliest point 
 in which the problem is visible.

 Except, all too often, what will happen is that the assert will 
 throw an AssertionError. The scope(exit) will run, and then 
 a.cleanup will segfault because, well, something horrible *did* 
 go wrong. This makes it much more difficult to find out what 
 actually went wrong.
Destructors will have the same problem, right? It seems to me that the cause of the problem is that `assert` throws instead of aborting execution right there. -Johan
Oct 25
parent Shachar Shemesh <shachar weka.io> writes:
On 26/10/17 09:32, Johan Engelen wrote:
 Destructors will have the same problem, right?
 It seems to me that the cause of the problem is that `assert` throws 
 instead of aborting execution right there.
 
 -Johan
And, indeed, my solution in the specific case that triggered this post was to switch from "assert" to "ASSERT", our own internal implementation. Here's a snippet from it: version(unittest ){ throw new AssertError("Assertion failure", file, line); } else { DIE("Assertion failure", file, line); }
Oct 26