
digitalmars.D - Performance of exception handling

reply Alexander <aldem+dmars nk7.net> writes:
Hi,

I'm doing some benchmarks (DMD 2.052 on Linux), and noticed that exception
handling is terribly slow even on quite fast hardware.

There is a note on DM's site, saying:

"Because errors are unusual, execution of error handling code is not
performance critical."

Well, sure this is (somehow) true, but *so* slow? In my tests, a Xeon at
3.4 GHz is able to handle only ca. 1000 exceptions/s (!). 1 ms per exception
is a little bit too much, especially when the application needs to recover
fast, not to mention that on slower hardware (like an Atom) it will be
really slow.

And, since some exceptions are not so unusual in the normal flow (file not
found, connection reset, etc.), "not performance critical" is not really
applicable. A simple example: in the case of a web (or any other) server
under high load, socket exceptions of any kind are quite common. It is still
natural to use try/catch to handle them, but this will slow down everything
else (imagine > 1K connections/s and a 10% cut rate).

So, my question: is there something that can improve performance? Any clues
where to dig for this?

Thank you!

Best regards,
/Alexander
Apr 26 2011
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 26 Apr 2011 13:08:33 +0300, Alexander <aldem+dmars nk7.net> wrote:

 Hi,

 I'm doing some benchmarks (DMD 2.052 on Linux), and noticed that
 exception handling is terribly slow even on quite fast hardware.

On my Windows box with an i7 920, a simple try/throw/catch loop runs at about 130000 iterations per second.

Perhaps DMD doesn't use SEH on Linux, and instead uses setjmp/longjmp?

-- 
Best regards,
Vladimir
mailto:vladimir thecybershadow.net
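For reference, a loop of the kind being measured here might look like the following (a minimal sketch, assuming the std.datetime StopWatch API of the 2.052 era; in later releases it moved to std.datetime.stopwatch and peek() returns a Duration):

```d
import std.datetime : StopWatch;
import std.stdio;

void main()
{
    // Throw and immediately catch an exception N times; the loop body
    // does nothing else, so the measurement isolates the EH machinery.
    enum N = 100_000;
    int caught = 0;

    StopWatch sw;
    sw.start();
    foreach (i; 0 .. N)
    {
        try
            throw new Exception("bench");
        catch (Exception e)
            ++caught;
    }
    sw.stop();

    // iterations/second = N divided by the elapsed time
    writefln("%s exceptions in %s ms", caught, sw.peek().msecs);
}
```

Compile with -release -O so the numbers aren't dominated by debug overhead; the figure Vladimir quotes would be N divided by the elapsed seconds.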
Apr 26 2011
next sibling parent Alexander <aldem+dmars nk7.net> writes:
On 26.04.2011 12:57, Vladimir Panteleev wrote:

 On my Windows box with an i7 920, a simple try/throw/catch loop runs at about
130000 iterations per second.

Well, g++ with the same loop on the same Linux system gives ca. 160000 iter/s, which is quite OK for me.
 Perhaps DMD doesn't use SEH on Linux, and instead uses setjmp/longjmp?

I've not found any references to setjmp/longjmp, but what I have found is that disabling the trace handler with "Runtime.traceHandler = null" boosts performance significantly - in my case I got 1600000 iter/s (wow!), which is perfectly OK.

AFAIK, the traceHandler is what prints out the stack trace, providing valuable information only when there is no catch. If there is a catch, then, obviously, this slows down exception processing significantly without any need (it is called on every invocation of throw). Am I right?

/Alexander
Apr 26 2011
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-04-26 12:57, Vladimir Panteleev wrote:
 On Tue, 26 Apr 2011 13:08:33 +0300, Alexander <aldem+dmars nk7.net> wrote:

 Hi,

 I'm doing some benchmarks (DMD 2.052 on Linux), and noticed that
 exception handling is terribly slow even on quite fast hardware.

On my Windows box with an i7 920, a simple try/throw/catch loop runs at about 130000 iterations per second. Perhaps DMD doesn't use SEH on Linux, and instead uses setjmp/longjmp?

SEH is only available on Windows. On all other platforms DWARF exceptions (I think that's what they're called) are used.

-- 
/Jacob Carlborg
Apr 26 2011
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Alexander:

 I'm doing some benchmarks (DMD 2.052 on Linux), and noticed that exception
 handling is terribly slow even on quite fast hardware.

On Windows I have timed DMD exceptions to be about 12 times slower than (irreducible) Java ones. I have done only a limited number of different benchmarks on this.

Bye,
bearophile
Apr 26 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 26 Apr 2011 15:43:26 +0300, Alexander <aldem+dmars nk7.net> wrote:

   I am right?

Sounds spot-on to me :)

I recall there was a similar performance problem due to stack traces, with the "priority send" function in the message queue implementation. The "priority send" function constructed an exception object, which made it slower than the normal "send" function.

-- 
Best regards,
Vladimir
mailto:vladimir thecybershadow.net
Apr 26 2011
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
Alexander Wrote:

 I'm doing some benchmarks (DMD 2.052 on Linux), and noticed that exception
 handling is terribly slow even on quite fast hardware.

Did you compare with the C++ DWARF implementation?
Apr 26 2011
next sibling parent reply Alexander <aldem+dmars nk7.net> writes:
On 26.04.2011 15:13, Kagamin wrote:

 Did you compare with the C++ DWARF implementation?

Yes, C++ is significantly faster. But it seems that my problem is already solved.

/Alexander
Apr 26 2011
parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Am 26.04.2011 15:16, schrieb Alexander:
 On 26.04.2011 15:13, Kagamin wrote:
 
 Did you compare with the C++ DWARF implementation?

 Yes, C++ is significantly faster. But it seems that my problem is already solved.

 /Alexander

It'd be nice to have fast(er) exceptions and still be able to get backtraces for uncaught exceptions (or even for caught ones on demand). Cheers, - Daniel
Apr 26 2011
next sibling parent Alexander <aldem+dmars nk7.net> writes:
On 26.04.2011 15:20, Daniel Gibson wrote:

 It'd be nice to have fast(er) exceptions and still be able to get
 backtraces for uncaught exceptions (or even for caught ones on demand).

Well, a quick-and-dirty solution for this (uncaught only, though) follows. Not very nice, but "works for me" :)

---snip---
import core.runtime;

int main(string[] argv)
{
    auto oldTraceHandler = Runtime.traceHandler;
    Runtime.traceHandler = null;
    try
    {
        return try_main(argv);
    }
    catch (Throwable ex)
    {
        Runtime.traceHandler = oldTraceHandler;
        throw ex;
    }
}

int try_main(string[] argv)
{
    // Real main() code
}
---snip---
Apr 26 2011
prev sibling next sibling parent Alexander <aldem+dmars nk7.net> writes:
On 26.04.2011 18:14, Sean Kelly wrote:

 It would be nice to generate them lazily but I don't think that's possible. 

Why not? My brute-force approach works (temporarily disabling the traceHandler), so why couldn't it be done more nicely somewhere in deh2 (I couldn't figure out yet how exactly)?

/Alexander
Apr 26 2011
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/9/2011 8:09 PM, Andrej Mitrovic wrote:
 I'll have to agree that exceptions are quite slow. I was just testing
 out UTF's decode() function on this UTF-8 test file:
 http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt .

 It has a few dozen invalid UTF sequences and is a good stress-test for
 a decoder. The  std.utf.decode function throws an exception on invalid
 UTF sequences.

 On that test file (only 20 kilobytes large), loading of a file and
 skipping invalid sequences takes 210msecs.

 If I change decode() to use a bool to flag invalid sequences instead
 of using exceptions, the UTF test file is parsed in 1.4msecs. Now
 that's quite a difference.

Validating input data should not be done with exceptions. Exceptions should be thrown only for data that is *expected* to be correct.
May 09 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/9/2011 9:48 PM, Andrej Mitrovic wrote:
 Well I can understand throwing exceptions when using readln() or
 validate(), but decode() is used for one code point at a time.
 Throwing is overkill imo.

Perhaps decode() is badly designed.
May 09 2011
parent Alexander <aldem+dmars nk7.net> writes:
On 10.05.2011 16:53, Robert Jacques wrote:

 Well, you are supposed to use validate first on any untrusted input. Now the
fact that validate returns void and throws an exception and there is no
corresponding 'isValid' routine probably is bad design.

Not to mention that validating first and then running the same procedure again, thus double-checking the input, is a waste of time... ;)

/Alexander
May 10 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Right now, traces are generated on throw. It should be possible to generate them on catch instead. The performance would be the same either way however. It would be nice to generate them lazily but I don't think that's possible.

Sent from my iPhone

On Apr 26, 2011, at 6:20 AM, Daniel Gibson <metalcaedes gmail.com> wrote:

 Am 26.04.2011 15:16, schrieb Alexander:
 On 26.04.2011 15:13, Kagamin wrote:

 Did you compare with the C++ DWARF implementation?

 Yes, C++ is significantly faster. But it seems that my problem is already solved.

 /Alexander

 It'd be nice to have fast(er) exceptions and still be able to get backtraces for uncaught exceptions (or even for caught ones on demand).

 Cheers,
 - Daniel
Apr 26 2011
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 26 Apr 2011 20:14:05 +0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 Right now, traces are generated on throw. It should be possible to  
 generate them on catch instead. The performance would be the same either  
 way however. It would be nice to generate them lazily but I don't think  
 that's possible.

 Sent from my iPhone

Interesting enough, it is already done lazily (in toString(), which I believe should also cache the result when it's called the first time), but it can be improved a bit. Here is what it currently does:

class DefaultTraceInfo : Throwable.TraceInfo
{
    this()
    {
        static enum MAXFRAMES = 128;
        void*[MAXFRAMES] callstack;

        numframes = backtrace( callstack, MAXFRAMES );         // 1
        framelist = backtrace_symbols( callstack, numframes ); // 2
    }
    // ...
}

While the first line is definitely required (otherwise the stack information gets lost), I believe the second one is not, and it can be moved to opApply (or somewhere else).
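A minimal sketch of that split (hypothetical class and bindings, not the actual druntime code): capture the raw return addresses eagerly in the constructor, and defer the expensive backtrace_symbols() call until the trace is actually walked. Linux/glibc only.

```d
import core.stdc.stdlib : free;
import core.stdc.string : strlen;

// glibc execinfo.h bindings, declared by hand for the sketch
extern (C) int backtrace(void** buffer, int size);
extern (C) char** backtrace_symbols(void** buffer, int size);

class LazyTraceInfo
{
    enum MAXFRAMES = 128;
    void*[MAXFRAMES] callstack;
    int numframes;

    this()
    {
        // Cheap part, done at throw time: raw return addresses only.
        numframes = backtrace(callstack.ptr, MAXFRAMES);
    }

    // Expensive part: symbol lookup happens only if the trace is read.
    int opApply(scope int delegate(const(char)[]) dg)
    {
        auto framelist = backtrace_symbols(callstack.ptr, numframes);
        if (framelist is null)
            return 0;
        scope (exit) free(framelist);
        foreach (i; 0 .. numframes)
        {
            const(char)[] line = framelist[i][0 .. strlen(framelist[i])];
            if (auto r = dg(line))
                return r;
        }
        return 0;
    }
}
```

With this shape, a caught-and-discarded exception never pays for symbol resolution at all; only code that actually iterates the trace does.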
Apr 26 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Apr 26, 2011, at 9:29 AM, Denis Koroskin wrote:

 On Tue, 26 Apr 2011 20:14:05 +0400, Sean Kelly wrote:

 Right now, traces are generated on throw. It should be possible to generate them on catch instead. The performance would be the same either way however. It would be nice to generate them lazily but I don't think that's possible.

 Sent from my iPhone

 Interesting enough, it is already done lazily (in toString(), which I believe should also cache the result), but it can be improved a bit.

The readable version is generated in toString, but the actual trace occurs on throw. This pretty much requires a memory allocation to store the trace info, which is one cause for the performance hit.

Originally, I didn't have the tracing enabled by default, but no one seemed to like that. I could try enabling it only for non-release builds if that would be preferable.
Apr 26 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 26 Apr 2011 14:26:00 -0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 On Apr 26, 2011, at 9:29 AM, Denis Koroskin wrote:

 On Tue, 26 Apr 2011 20:14:05 +0400, Sean Kelly <sean invisibleduck.org>  
 wrote:

 Right now, traces are generated on throw. It should be possible to  
 generate them on catch instead. The performance would be the same  
 either way however. It would be nice to generate them lazily but I  
 don't think that's possible.



If it's possible to generate them on catch, couldn't you request trace generation while inside the catch block (or at the beginning of a catch block) if you intend to use it? That is:

try
{
    ...
}
catch(Exception e)
{
    e.generateTrace(); // only allowed at the start of the catch block
    printException(e); // if you haven't generated the trace by now, it's gone.
}

You'd need compiler help for this probably. But I agree with the OP, 99% of the time you are catching exceptions you do not need the trace.
 Interesting enough, it is already done lazily (in toString(), which I  
 believe should also cache result when it's called first time), but it  
 can be improved a bit.

The readable version is generated in toString, but the actual trace occurs on throw. This pretty much requires a memory allocation to store the trace info, which is one cause for the performance hit.

I can't help but think that using the GC may be overkill here. At the very least, you could use C malloc, which is usually much faster. The exception should encapsulate the memory allocation and deallocation completely.

-Steve
Apr 26 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Apr 26, 2011, at 11:39 AM, Steven Schveighoffer wrote:

 If it's possible to generate them on catch, couldn't you request trace generation while inside the catch block (or at the beginning of a catch block) if you intend to use it?

 That is:

 try
 {
   ...
 }
 catch(Exception e)
 {
   e.generateTrace(); // only allowed at the start of the catch block
   printException(e); // if you haven't generated the trace by now, it's gone.
 }

 You'd need compiler help for this probably. But I agree with the OP, 99% of the time you are catching exceptions you do not need the trace.

I suppose it's a matter of how easy this stuff should be to use. I generally like the default behavior to be the most foolproof.

 I can't help but think that using the GC may be overkill here. At the very least, you could use C malloc, which is usually much faster. The exception should encapsulate the memory allocation and deallocation completely.

The DefaultTraceInfo object is allocated via the GC, and the trace info itself is allocated via malloc() inside the backtrace() call. See core.runtime for the details--the code is pretty succinct.
Apr 26 2011
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 26 Apr 2011 22:26:00 +0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 On Apr 26, 2011, at 9:29 AM, Denis Koroskin wrote:

 On Tue, 26 Apr 2011 20:14:05 +0400, Sean Kelly <sean invisibleduck.org>  
 wrote:

 Right now, traces are generated on throw. It should be possible to  
 generate them on catch instead. The performance would be the same  
 either way however. It would be nice to generate them lazily but I  
 don't think that's possible.

 Sent from my iPhone

Interesting enough, it is already done lazily (in toString(), which I believe should also cache result when it's called first time), but it can be improved a bit.

The readable version is generated in toString, but the actual trace occurs on throw. This pretty much requires a memory allocation to store the trace info, which is one cause for the performance hit. Originally, I didn't have the tracing enabled by default, but no one seemed to like that. I could try enabling it only for non-release builds if that would be preferable.

You might have misunderstood me, or I might be wrong, but I believe you only need to store the backtrace at the point where the exception occurs, and resolve the symbols later (e.g. in toString()). Both are currently done in the DefaultTraceInfo ctor (see the code that you snipped). Collecting the backtrace is very fast, and doesn't allocate on its own:

static enum MAXFRAMES = 128;
void*[MAXFRAMES] callstack;

numframes = backtrace( callstack, MAXFRAMES );

It's "backtrace_symbols" that does all the hard work, I believe, and as such moving it out of the DefaultTraceInfo ctor should be enough.

On the other hand, DefaultTraceInfo could be made a struct and be moved into the Exception class to minimize allocations, if you feel that exceptions should allocate as little as possible (if anything at all, because they may end up throwing an OutOfMemory exception).

Side question: we don't create OutOfMemory instances, throwing predefined ones instead, right? I don't think that works with exception chaining, does it? I mean, what if we e.g. catch an OOM, try to add a log entry, try to allocate and end up with OOM again? Exception chaining implemented with an intrusive linked list (i.e. an embedded "Throwable next;") will not work for this case. Does druntime handle this case correctly?
Apr 26 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Apr 26, 2011, at 11:56 AM, Denis Koroskin wrote:

 You might have misunderstood me, or I might be wrong, but I believe you only need to store the backtrace at the point where the exception occurs, and resolve the symbols later (e.g. in toString()). Both are currently done in the DefaultTraceInfo ctor (see the code that you snipped).

 Collecting the backtrace is very fast, and doesn't allocate on its own:

 static enum MAXFRAMES = 128;
 void*[MAXFRAMES] callstack;

 numframes = backtrace( callstack, MAXFRAMES );

 It's "backtrace_symbols" that does all the hard work, I believe, and as such moving it out of the DefaultTraceInfo ctor should be enough.

It would make the DefaultTraceInfo class substantially larger, but that's a worthwhile tradeoff if performance is sufficiently improved. I'll try it out.

 On the other hand, DefaultTraceInfo could be made a struct and be moved into the Exception class to minimize allocations, if you feel that exceptions should allocate as little as possible (if anything at all, because they may end up throwing an OutOfMemory exception).

 Side question: we don't create OutOfMemory instances, throwing predefined ones instead, right? I don't think that works with exception chaining, does it? I mean, what if we e.g. catch an OOM, try to add a log entry, try to allocate and end up with OOM again? Exception chaining implemented with an intrusive linked list (i.e. an embedded "Throwable next;") will not work for this case. Does druntime handle this case correctly?

The general rule for this is that if a static instance is deemed to be collateral then it's discarded, and if it should replace the current in-flight exception then the chain is lost. This case used to be handled explicitly. It isn't any more, but the code still appears to work (I tested to make sure a few weeks ago). I want to look at this more carefully though, because I expected to see a segfault and was surprised when it didn't happen. I don't like unexplained things.
Apr 26 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I'll have to agree that exceptions are quite slow. I was just testing
out UTF's decode() function on this UTF-8 test file:
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt .

It has a few dozen invalid UTF sequences and is a good stress-test for
a decoder. The std.utf.decode function throws an exception on invalid
UTF sequences.

On that test file (only 20 kilobytes large), loading of a file and
skipping invalid sequences takes 210msecs.

If I change decode() to use a bool to flag invalid sequences instead
of using exceptions, the UTF test file is parsed in 1.4msecs. Now
that's quite a difference.

Also, loading and validating a 400 kilobyte text file is done in
35msecs, which I'm quite happy about. I'm not sure if the Scintilla
editing component applies syntax coloring on the entire file before
displaying any lines, but if not then I could easily beat its
performance. It takes about a second for it to load and display a
400kbyte source file (with syntax highlighting of course).
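A flag-based decoder of the kind described might be sketched as follows (a hypothetical stand-in, not the std.utf API; it skips overlong-form and surrogate checks for brevity):

```d
/// Hypothetical sketch: decode one UTF-8 code point without throwing.
/// On an invalid sequence it sets `invalid`, returns U+FFFD (the
/// replacement character), and advances past a single byte.
dchar decodeNoThrow(const(char)[] s, ref size_t i, out bool invalid)
{
    immutable ubyte b = s[i];
    if (b < 0x80) { ++i; return b; }                    // ASCII fast path

    // Sequence length and payload mask from the lead byte.
    int len;
    ubyte mask;
    if      ((b & 0xE0) == 0xC0) { len = 2; mask = 0x1F; }
    else if ((b & 0xF0) == 0xE0) { len = 3; mask = 0x0F; }
    else if ((b & 0xF8) == 0xF0) { len = 4; mask = 0x07; }
    else { invalid = true; ++i; return 0xFFFD; }        // bad lead byte

    if (i + len > s.length) { invalid = true; ++i; return 0xFFFD; }

    dchar c = b & mask;
    foreach (k; 1 .. len)
    {
        immutable ubyte t = s[i + k];
        if ((t & 0xC0) != 0x80) { invalid = true; ++i; return 0xFFFD; }
        c = (c << 6) | (t & 0x3F);
    }
    i += len;
    return c;
}
```

The caller loops over the buffer and checks the flag after each call, so an invalid sequence costs a branch rather than a full throw/unwind/trace cycle.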
May 09 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Well I can understand throwing exceptions when using readln() or
validate(), but decode() is used for one code point at a time.
Throwing is overkill imo.
May 09 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 5/10/11, Walter Bright <newshound2 digitalmars.com> wrote:
 On 5/9/2011 9:48 PM, Andrej Mitrovic wrote:
 Well I can understand throwing exceptions when using readln() or
 validate(), but decode() is used for one code point at a time.
 Throwing is overkill imo.

Perhaps decode() is badly designed.

It has goto's and exceptions. What's not to like? :p
May 09 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 5/10/11, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 5/10/11, Walter Bright <newshound2 digitalmars.com> wrote:
 On 5/9/2011 9:48 PM, Andrej Mitrovic wrote:
 Well I can understand throwing exceptions when using readln() or
 validate(), but decode() is used for one code point at a time.
 Throwing is overkill imo.

Perhaps decode() is badly designed.

It has goto's and exceptions. What's not to like? :p

Oh and an enforce at the top to kill any potential inlining. Hehehe.
May 09 2011
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 10 May 2011 01:15:04 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/9/2011 9:48 PM, Andrej Mitrovic wrote:
 Well I can understand throwing exceptions when using readln() or
 validate(), but decode() is used for one code point at a time.
 Throwing is overkill imo.

Perhaps decode() is badly designed.

Well, you are supposed to use validate first on any untrusted input. Now, the fact that validate returns void and throws an exception, and that there is no corresponding 'isValid' routine, is probably bad design.
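Such an 'isValid' could be layered on top of validate as a sketch (hypothetical name; in 2.052-era Phobos the thrown type was UtfException, so catching Exception covers it):

```d
import std.utf : validate;

/// Hypothetical 'isValid' wrapper: same check as std.utf.validate,
/// but reports the result as a bool instead of throwing.
bool isValidUTF(S)(S str)
{
    try
    {
        validate(str);
        return true;
    }
    catch (Exception) // UtfException in 2.052-era Phobos
    {
        return false;
    }
}
```

Note this wrapper still pays the full exception cost on invalid input; a genuinely cheap isValid would scan the string and return false without ever throwing, which is exactly the design gap being pointed out.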
May 10 2011
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 26 Apr 2011 16:43:26 +0400, Alexander <aldem+dmars nk7.net> wrote:

 On 26.04.2011 12:57, Vladimir Panteleev wrote:

 On my Windows box with an i7 920, a simple try/throw/catch loop runs at  
 about 130000 iterations per second.

Well, g++ with the same loop on the same Linux system gives ca. 160000 iter/s, which is quite OK for me.
 Perhaps DMD doesn't use SEH on Linux, and instead uses setjmp/longjmp?

I've not found any references to setjmp/longjmp, but what I have found is that disabling the trace handler with "Runtime.traceHandler = null" boosts performance significantly - in my case I got 1600000 iter/s (wow!), which is perfectly OK.

AFAIK, the traceHandler is what prints out the stack trace, providing valuable information only when there is no catch. If there is a catch, then, obviously, this slows down exception processing significantly without any need (it is called on every invocation of throw). Am I right?

/Alexander

Full stack trace resolving could be done lazily, but you need to save the call stack addresses at the throw site (which is very quick anyway). Could you please file a Bugzilla enhancement request for that?
Apr 26 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
 Hi,
 
 I'm doing some benchmarks (DMD 2.052 on Linux), and noticed that exception
 handling is terribly slow even on quite fast hardware.
 
 There is a note on DM's site, saying:
 
 "Because errors are unusual, execution of error handling code is not
 performance critical."
 
 Well, sure this is (somehow) true, but *so* slow? In my tests, Xeon 3.4 GHz
 is able to handle only ca. 1000 exceptions/s (!). 1ms for single exception
 is a little bit too much, especially when application needs to recover
 fast, not to mention that on slower hardware (like Atom) it will be really
 slow.
 
 And, since some exceptions are not so unusual in normal flow (file not
 found, connection reset etc), "not performance critical" is not really
 applicable. Simple example - in case of web (or any other) server with
 high load, socket exceptions of any kind are quite common. It is still
 natural to use try/catch to handle them, but it will slow down everything
 else (imagine > 1K connections/s and 10% cut rate).
 
 So, my question - is there something, that can improve performance? Any
 clues where to dig for this?

Yeah. Exceptions are _slow_. I ran into some problems with that when reworking the unit tests in std.datetime. I can barely use assertThrown, or it tanks the performance of the unit tests. It takes something like 450 times longer to run a line of code that ends up throwing an exception to be caught by assertThrown than it does for a similar call which doesn't throw and is wrapped by assertNotThrown.

So, as far as unit tests are concerned, improving exception performance would be a big boost. Exceptions are bound to be slower, but 450 times slower is an enormous slowdown. If that could be sped up, that would be great.

- Jonathan M Davis
Apr 26 2011
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Apr 26, 2011, at 9:18 AM, Alexander wrote:

 On 26.04.2011 18:14, Sean Kelly wrote:

 It would be nice to generate them lazily but I don't think that's possible.

 Why not? My brute-force approach works (temporarily disabling the traceHandler), so why couldn't it be done more nicely somewhere in deh2 (I couldn't figure out yet how exactly)?

Is your brute-force approach faster for entire app execution, or just try_main() execution? It looks to me like you're just offloading tracing to the catch point, which in the case of an unhandled exception is after main() exits. By lazily, I meant generating the trace in toString() or similar. It might work for the typical case where the trace is pulled in the catch block or not at all, but if the exception is held until later before being inspected, the lazy approach won't work. Thread does this, for example, when it rethrows an uncaught exception from a call to join().
Apr 26 2011