
digitalmars.D - List of Phobos functions that allocate memory?

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Would anyone be willing to take on the ingrate task of creating a 
comprehensive list with all Phobos functions (and more generally 
artifacts) that allocate memory? That would help a lot with focusing the 
discussion.

Andrei
Feb 06 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu 
wrote:
 Would anyone be willing to take on the ingrate task of creating 
 a comprehensive list with all Phobos functions (and more 
 generally artifacts) that allocate memory? That would help a 
 lot with focusing the discussion.

 Andrei
Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
Feb 06 2014
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/6/14, Dicebot <public dicebot.lv> wrote:
 Merging https://github.com/D-Programming-Language/dmd/pull/1886
 and running phobos unit tests should make it relatively simple,
 at least for a first pass.
Running the tests is overkill, all you have to do is iterate over each module and call "-o- -vgc" on it. We have so many allocations in Phobos that I couldn't even upload my text over to a paste site, most sites have a limit of 150Kb! So here it is on github: https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Feb 06 2014
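The per-module approach described above can be sketched as a small D script. This is only an illustration under assumptions: it presumes a `dmd` on the PATH that understands the `-vgc` switch from the pull request, and `vgcCommand` and `collectReports` are hypothetical helper names. Real Phobos modules may also need import paths (`-I...`) that are not shown here.

```d
import std.file : SpanMode, dirEntries;
import std.process : execute;

// Command line for one module: -o- suppresses object file generation,
// -vgc asks the compiler to list GC allocations.
string[] vgcCommand(string modulePath)
{
    return ["dmd", "-o-", "-vgc", modulePath];
}

// Runs the compiler once per .d file below sourceRoot and concatenates
// whatever each run printed (stdout and stderr are merged by execute).
string collectReports(string sourceRoot)
{
    string report;
    foreach (entry; dirEntries(sourceRoot, "*.d", SpanMode.depth))
    {
        auto result = execute(vgcCommand(entry.name));
        report ~= result.output;
    }
    return report;
}
```

Calling `collectReports` on a Phobos checkout would produce one large text blob, which matches the kind of report linked in the post.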
parent "Martin Cejp" <minexew gmail.com> writes:
On Thursday, 6 February 2014 at 17:18:59 UTC, Andrej Mitrovic 
wrote:
 On 2/6/14, Dicebot <public dicebot.lv> wrote:
 Merging https://github.com/D-Programming-Language/dmd/pull/1886
 and running phobos unit tests should make it relatively simple,
 at least for a first pass.
Running the tests is overkill, all you have to do is iterate over each module and call "-o- -vgc" on it. We have so many allocations in Phobos that I couldn't even upload my text over to a paste site, most sites have a limit of 150Kb! So here it is on github: https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Quite a few of those seem to be false positives. E.g. C:\dmd-git\dmd2\src\phobos\std\internal\digest\sha_SSSE3.d(512): Concatenation causes gc allocation "rol "~T2~",5", looks like something that only ever makes sense at compilation time
Feb 06 2014
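The false positive mentioned above can be illustrated with a minimal sketch. It assumes a compiler that supports the `@nogc` attribute, which was added to D after this thread; `rolInstruction`, `rolT2` and `rolT2Length` are hypothetical names, not the ones used in std.internal.digest.sha_SSSE3.

```d
// Builds an instruction string by concatenation. Called at run time, the
// `~` operator would allocate from the GC.
string rolInstruction(string reg)
{
    return "rol " ~ reg ~ ",5";
}

// Initializing an enum forces compile-time evaluation (CTFE), so this
// concatenation happens inside the compiler, not in the running program.
enum rolT2 = rolInstruction("T2");

// @nogc makes the compiler reject GC allocations in this function.
// Reading the length of a compile-time string constant needs none.
size_t rolT2Length() @nogc nothrow
{
    return rolT2.length;
}
```

A line-based report cannot tell these two uses of the same concatenation apart, which is why CTFE-only code shows up in the list.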
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 We have so many allocations in Phobos that I couldn't even upload my
 text over to a paste site, most sites have a limit of 150Kb! So here
 it is on github:

 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
Feb 06 2014
prev sibling next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
Updated to remove duplicate reports.
Feb 06 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
Updated to remove duplicate reports.
Thanks. I guess we'd need to cross-reference to function names from there. Andrei
Feb 06 2014
next sibling parent "grm" <gerhard.mueller gmsoft.at> writes:
On Thursday, 6 February 2014 at 17:57:45 UTC, Andrei Alexandrescu 
wrote:
 On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
Updated to remove duplicate reports.
Thanks. I guess we'd need to cross-reference to function names from there. Andrei
lots of them are throws though
Feb 06 2014
prev sibling next sibling parent reply "grm" <gerhard.mueller gmsoft.at> writes:
On Thursday, 6 February 2014 at 17:57:45 UTC, Andrei Alexandrescu 
wrote:
 On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
Updated to remove duplicate reports.
Thanks. I guess we'd need to cross-reference to function names from there. Andrei
and also new *XY*Exception doesn't indicate a problem necessarily
Feb 06 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/6/14, 10:05 AM, grm wrote:
 On Thursday, 6 February 2014 at 17:57:45 UTC, Andrei Alexandrescu wrote:
 On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
Updated to remove duplicate reports.
Thanks. I guess we'd need to cross-reference to function names from there. Andrei
and also new *XY*Exception doesn't indicate a problem necessarily
Good point. Seems to me code inspection would be a simpler way. Andrei
Feb 06 2014
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Thanks. I guess we'd need to cross-reference to function names from there.
Updated to include function names.
Feb 06 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/6/14, 10:15 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Thanks. I guess we'd need to cross-reference to function names from there.
Updated to include function names.
Noice. One

 less phobos_allocations.txt | grep 'In function' | sed "s/.*'\\(.*\\)':/\\1/" | sort | uniq >phobos_allocating_functions.txt

later, and... Andrei
Feb 06 2014
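For readers who prefer to stay in D, the shell pipeline above can be approximated as follows. This is a sketch that assumes the report marks functions with lines of the form `In function 'name':`, as the sed expression implies; `allocatingFunctions` is a hypothetical name. Unlike sed, it drops "In function" lines that have no quoted name instead of passing them through.

```d
import std.algorithm : canFind, filter, map, sort, uniq;
import std.array : array;
import std.regex : matchFirst, regex;
import std.string : lineSplitter;

// Returns the sorted, de-duplicated function names found in the report.
string[] allocatingFunctions(string report)
{
    // Same pattern as the sed expression: capture the text between
    // a quote and the closing "':".
    auto quotedName = regex(`.*'(.*)':`);

    auto names = report
        .lineSplitter
        .filter!(line => line.canFind("In function"))   // grep 'In function'
        .map!(line => matchFirst(line, quotedName))      // sed capture
        .filter!(m => !m.empty)
        .map!(m => m[1])
        .array;

    names.sort();               // sort
    return names.uniq.array;    // uniq
}
```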
next sibling parent "Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
On Thursday, 6 February 2014 at 18:25:34 UTC, Andrei Alexandrescu 
wrote:
 Noice. One

 less phobos_allocations.txt | grep 'In function'| sed
 "s/.*'\\(.*\\)':/\\1/"|sort|uniq
phobos_allocating_functions.txt
later, and...
Well I'm just hacking on the -vgc pull to output what I want, but I should read titles better :). Here's the functions: http://codepad.org/3TsPXryX
Feb 06 2014
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Noice. One

 less phobos_allocations.txt | grep 'In function'| sed
 "s/.*'\\(.*\\)':/\\1/"|sort|uniq >phobos_allocating_functions.txt

 later, and...
Ah you've attached a file, didn't notice it on the left since I usually skim the avatar part: http://forum.dlang.org/thread/ld0d79$2ife$1 digitalmars.com?page=2#post-ld0k2u:242ptu:241:40digitalmars.com
Feb 06 2014
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
06-Feb-2014 22:15, Andrej Mitrovic wrote:
 On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Thanks. I guess we'd need to cross-reference to function names from there.
Updated to include function names.
Hm. Somehow diffing this with coverage report may help filter out CTFE. Some bugs are features :) -- Dmitry Olshansky
Feb 06 2014
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Feb 06, 2014 at 11:39:30PM +0400, Dmitry Olshansky wrote:
 06-Feb-2014 22:15, Andrej Mitrovic wrote:
On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
Thanks. I guess we'd need to cross-reference to function names from
there.
Updated to include function names.
Hm. Somehow diffing this with coverage report may help filter out CTFE. Some bugs are features :)
[...] I thought *all* bugs are features... unintentional features. :-P T -- Bomb technician: If I'm running, try to keep up.
Feb 06 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
07-Feb-2014 00:15, H. S. Teoh wrote:
 On Thu, Feb 06, 2014 at 11:39:30PM +0400, Dmitry Olshansky wrote:
 06-Feb-2014 22:15, Andrej Mitrovic wrote:
 On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Thanks. I guess we'd need to cross-reference to function names from
 there.
Updated to include function names.
Hm. Somehow diffing this with coverage report may help filter out CTFE. Some bugs are features :)
[...] I thought *all* bugs are features... unintentional features. :-P
O.T. From a pragmatic point of view any specific property of a system that is useful to the enduser is a feature. Not all bugs are useful ;)
 T
-- Dmitry Olshansky
Feb 06 2014
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
06-Feb-2014 21:21, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
Updated to remove duplicate reports.
Needs to somehow cut down CTFE-only stuff. E.g. std.regex allocates a lot at CTFE (and in debug sections), it's a prominent example of CTFE but there is a _lot_ more in the same theme. -- Dmitry Olshansky
Feb 06 2014
prev sibling parent reply Johannes Pfau <nospam example.com> writes:
On Thu, 06 Feb 2014 16:32:08 +0000,
"Dicebot" <public dicebot.lv> wrote:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu 
 wrote:
 Would anyone be willing to take on the ingrate task of creating 
 a comprehensive list with all Phobos functions (and more 
 generally artifacts) that allocate memory? That would help a 
 lot with focusing the discussion.

 Andrei
Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-) One interesting point is that modules that were written with avoiding allocations in mind usually still allocate when throwing exceptions. Here's some example output for std.uuid/digest/path/range/algorithm/curl: http://dpaste.dzfl.pl/96d3725b06e2
Feb 06 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/6/14, 10:05 AM, Johannes Pfau wrote:
 On Thu, 06 Feb 2014 16:32:08 +0000,
 "Dicebot" <public dicebot.lv> wrote:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu
 wrote:
 Would anyone be willing to take on the ingrate task of creating
 a comprehensive list with all Phobos functions (and more
 generally artifacts) that allocate memory? That would help a
 lot with focusing the discussion.

 Andrei
Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-)
Please close if you plan to rewrite.
 One interesting point is that module that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.
Good point, we need to address that as well. Andrei
Feb 06 2014
next sibling parent "grm" <gerhard.mueller gmsoft.at> writes:
 That's only for implicit allocations though. And please, don't 
 merge
 yet, it'll get another rewrite this weekend ;-)
Please close if you plan to rewrite. Andrei
expecting the requested close, so some OTs (in random order):
- bought TDPL shortly after it's been released
- was very impressed by the concept
- following the NGs since, I guess, 2010
- great community and *very* smart people
- had nothing of value to add yet, though (since I'm stuck with C/C++/Java and some proprietary stuff)
- and today I submitted my first reply, which was incredibly easy. no annoyance! please make this more obvious for guys like me that do not want to register.

thx and good luck to you all
hope I can contribute my share some day

Kind Regards
Feb 06 2014
prev sibling next sibling parent reply "fra" <a b.it> writes:
On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu 
wrote:
 One interesting point is that module that were written with 
 avoiding
 allocations in mind usually still allocate when throwing 
 exceptions.
Good point, we need to address that as well. Andrei
Hey, wait a second. How do you throw without allocating?
Feb 06 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/6/14, 10:52 AM, fra wrote:
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
 One interesting point is that module that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.
Good point, we need to address that as well. Andrei
Hey, wait a second. How do you throw without allocating?
I don't know yet. That's what the "addressing the problem" will take care of! :o) Andrei
Feb 06 2014
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Feb 06, 2014 at 11:01:18AM -0800, Andrei Alexandrescu wrote:
 On 2/6/14, 10:52 AM, fra wrote:
On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
One interesting point is that module that were written with
avoiding allocations in mind usually still allocate when throwing
exceptions.
Good point, we need to address that as well. Andrei
Hey, wait a second. How do you throw without allocating?
I don't know yet. That's what the "addressing the problem" will take care of! :o)
[...] You can just pre-declare the Exception as a global variable and then throw that. Well, OK, it's cheating because you still have to allocate it then, but the point is that you get to control how it gets allocated at the top-level rather than having the 'new' buried deep down in the function call chain where you can't control whether the code uses 'new' or a custom allocator (it may not know about which allocator to use).

    Exception prealloc_exc;
    static this()
    {
        prealloc_exc = ... /* use whatever allocation method you want */
    }

    void main()
    {
        try {
            func();
        } catch(Exception e) {
            // you get prealloc_exc here
        }
    }

    void func()
    {
        if (error) {
            // init exception parameters
            prealloc_exc.msg = ...; /* presumably you preallocate the
                                     * message string too, with the
                                     * allocator of your choice */
            throw prealloc_exc; // N.B. no allocation
        }
    }

T

-- 
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
Feb 06 2014
prev sibling next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Thu, 06 Feb 2014 18:52:20 +0000,
"fra" <a b.it> wrote:

 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu 
 wrote:
 One interesting point is that module that were written with 
 avoiding
 allocations in mind usually still allocate when throwing 
 exceptions.
Good point, we need to address that as well. Andrei
Hey, wait a second. How do you throw without allocating?
You can store the exception as a global and that's done for the OutOfMemoryError IIRC, but what I meant was 'allocate with the GC'.
Feb 06 2014
next sibling parent Johannes Pfau <nospam example.com> writes:
On Thu, 6 Feb 2014 20:00:50 +0100,
Johannes Pfau <nospam example.com> wrote:

 On Thu, 06 Feb 2014 18:52:20 +0000,
 "fra" <a b.it> wrote:
 
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu 
 wrote:
 One interesting point is that module that were written with 
 avoiding
 allocations in mind usually still allocate when throwing 
 exceptions.
Good point, we need to address that as well. Andrei
Hey, wait a second. How do you throw without allocating?
You can store the exception as a global and that's done for the OutOfMemoryError IIRC, but what I meant was 'allocate with the GC'.
Oh and in other languages you can throw by value but I think that wouldn't work in D because of exception chaining.
Feb 06 2014
prev sibling parent reply "Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
On Thursday, 6 February 2014 at 19:01:33 UTC, Johannes Pfau wrote:
 You can store the exception as a global and that's done for the
 OutOfMemoryError IIRC.
Hmm.. is that even safe? I mean in some case of exception chaining the same object could be overwritten before being thrown again, thereby losing the original exception state. Thinking out loud here..
Feb 06 2014
parent "Namespace" <rswhite4 googlemail.com> writes:
On Thursday, 6 February 2014 at 19:05:49 UTC, Andrej Mitrovic 
wrote:
 On Thursday, 6 February 2014 at 19:01:33 UTC, Johannes Pfau 
 wrote:
 You can store the exception as a global and that's done for the
 OutOfMemoryError IIRC.
Hmm.. is that even safe? I mean in some case of exception chaining the same object could be overwritten before being thrown again, thereby losing the original exception state. Thinking out loud here..
You could use a circular buffer with appropriate length.
Feb 06 2014
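The circular buffer idea can be sketched like this. It is an illustration only: `ExceptionRing`, `initialize` and `acquire` are hypothetical names, and the sketch assumes a runtime in which assigning `msg` and `next` on a `Throwable` is allowed in `@nogc` code. It also makes the caveat raised above concrete: once more than `capacity` exceptions are in flight, the oldest slot is handed out again and its state is overwritten.

```d
// A fixed-size pool of pre-allocated exceptions, handed out round-robin.
struct ExceptionRing(size_t capacity) if (capacity > 0)
{
    private Exception[capacity] slots;
    private size_t cursor;

    // Allocate every slot up front, e.g. at program start. Here the GC is
    // used, but any allocation strategy could be substituted.
    void initialize()
    {
        foreach (ref slot; slots)
            slot = new Exception("");
    }

    // Hands out the next slot without allocating. The caller throws it.
    Exception acquire(string msg) @nogc nothrow
    {
        auto e = slots[cursor];
        cursor = (cursor + 1) % capacity;
        e.msg = msg;     // overwrite the previous message
        e.next = null;   // drop any stale chained exception
        return e;
    }
}
```

With a ring of two slots, the third `acquire` returns the same object as the first, so the buffer length has to cover the deepest chain the program can produce.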
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On 6 February 2014 18:52, fra <a b.it> wrote:
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
 One interesting point is that module that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.
Good point, we need to address that as well. Andrei
Hey, wait a second. How do you throw without allocating?
You can't. :o)
Feb 06 2014
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?
I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path. If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)
Feb 06 2014
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Thu, 06 Feb 2014 19:08:39 +0000,
"Adam D. Ruppe" <destructionator gmail.com> wrote:

 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?
I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path. If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)
That depends on your situation. For games and other applications on normal computers it's OK. For games on systems like embedded gaming systems (think like NintendoDS, 4MB ram) you might not have a GC but still want to use exception handling.
Feb 06 2014
parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 6 February 2014 at 19:32:11 UTC, Johannes Pfau wrote:
 For games on systems like embedded gaming systems (think like
 NintendoDS, 4MB ram) you might not have a GC but still want to 
 use exception handling.
Yeah, when I toyed with bare metal D, I did exceptions with manual memory management - malloc when throwing (well, I did malloc in _d_newclass so it was transparent to the throwing code), free when catching. But I think a program written for a special environment will have different coding standards from top to bottom, including the need to free in an exception handler and the option to hack druntime.
Feb 06 2014
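A rough sketch of the "malloc when throwing, free when catching" scheme, done in user code rather than by hooking `_d_newclass` as the post describes. All names (`ParseError`, `mallocNew`, `mallocFree`, `digitOrThrow`, `digitOrDefault`) are hypothetical, and the sketch ignores that the runtime may still attach stack-trace information of its own when the exception is thrown.

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

class ParseError : Exception
{
    this(string msg) @nogc nothrow pure @safe
    {
        super(msg);
    }
}

// Constructs a class instance in malloc'ed memory instead of the GC heap.
T mallocNew(T, Args...)(Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    void* raw = malloc(size);
    if (raw is null)
        assert(0, "out of memory");
    return emplace!T(raw[0 .. size], args);
}

// Runs the destructor and returns the memory to the C heap.
void mallocFree(Throwable t)
{
    destroy(t);
    free(cast(void*) t);
}

int digitOrThrow(char c)
{
    if (c < '0' || c > '9')
        throw mallocNew!ParseError("not a digit");
    return c - '0';
}

// The handler owns the exception and must free it, as the post notes.
int digitOrDefault(char c, int fallback)
{
    try
    {
        return digitOrThrow(c);
    }
    catch (ParseError e)
    {
        mallocFree(e);
        return fallback;
    }
}
```

The cost of this design is visible in `digitOrDefault`: every catch site has to know how the exception was allocated, which is the "different coding standards from top to bottom" point made above.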
prev sibling next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 6 February 2014 at 19:08:40 UTC, Adam D. Ruppe wrote:
 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?
I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path. If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)
Hardly so. Any exception allocation can trigger a GC collection cycle and Phobos does not provide any other way to handle data errors. Any application that operates on some external user input will be exposed to a DoS attack vector if it uses Phobos directly. It was a huge performance killer for vibe.d last time I checked, for example.
Feb 06 2014
next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 19:08:40 UTC, Adam D. Ruppe 
 wrote:
 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?
I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path. If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)
Hardly so. Any exception allocation can trigger GC collection cycle and Phobos does not provide any other way to handle data errors. Any application that operates on some external user input will be subject to DoS attack vector if it uses Phobos directly. It was huge performance killer for vibe.d last time I have checked, for example.
Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though.
Feb 06 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an 
 exceptional case because it's expected to happen and the 
 program is expected to handle it (and let the user know) when 
 it does. That's just a matter of taste though.
I agree. It kills the whole concept of "exceptions are rare so they don't need to be fast when thrown". But it is how quite a lot of Phobos is currently designed and, in my opinion, is the biggest design mistake of vibe.d too (it uses exceptions to propagate HTTP status codes)
Feb 06 2014
parent "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 6 February 2014 at 22:19:42 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson 
 wrote:
 Personally I don't think bad user input qualifies as an 
 exceptional case because it's expected to happen and the 
 program is expected to handle it (and let the user know) when 
 it does. That's just a matter of taste though.
I agree. It kills the whole concept of "exceptions are rare so they don't need to be fast when thrown". But it is how quite lot of Phobos is currently designed and, in my opinion, is biggest design mistake of vibe.d too (it uses exceptions to propagate HTTP status codes)
I must admit that I am guilty of sometimes using exceptions for routine control flow too. It's just so convenient compared to validation/consumption. Maybe we should make a list of Phobos functions that throw exceptions and ensure that (for the ones where this makes sense) they have non-throwing validators available. If we can stop gc allocating them that'd be even better but I don't think them being gc allocating should hold up nogc.
Feb 06 2014
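What a non-throwing validator could look like, as a minimal sketch: `tryParseUint` is a hypothetical function, not an existing Phobos API, and it assumes a compiler with `@nogc` support. It reports bad input through its return value, so rejecting malformed data never allocates an exception.

```d
// Parses an unsigned decimal number. Returns false instead of throwing on
// empty input, non-digit characters, or overflow; value is only meaningful
// when the function returns true.
bool tryParseUint(const(char)[] text, out uint value) @nogc nothrow pure @safe
{
    if (text.length == 0)
        return false;

    ulong acc = 0;
    foreach (char c; text)
    {
        if (c < '0' || c > '9')
            return false;
        acc = acc * 10 + (c - '0');
        if (acc > uint.max)
            return false;   // would not fit into a uint
    }

    value = cast(uint) acc;
    return true;
}
```

A caller handling untrusted input would check the result and answer with an error response, keeping the throwing conversion for cases where failure really is unexpected.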
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional case
because
 it's expected to happen and the program is expected to handle it (and let the
 user know) when it does. That's just a matter of taste though.
It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
Feb 06 2014
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 It's not a matter of taste. If your input is subject to a DoS 
 attack, don't put exceptions in the control flow.
Perhaps in today's world malicious attacks on the software you write should be assumed as the default situation, and then the language+library has to offer something less paranoid on request. That's why some languages have changed their sorting and hashing routines to make them a little slower but safer by default. Bye, bearophile
Feb 06 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2014 7:08 PM, bearophile wrote:
 Walter Bright:

 It's not a matter of taste. If your input is subject to a DoS attack, don't
 put exceptions in the control flow.
Perhaps the world of today malicious attacks on the software you write should be assumed as the default situation, and then the language+library has to offer something less paranoiac on request. That's why some languages have changed their sorting and hashing routines to make them a little slower but safer on default.
DoS attack resistance requires faster code, not slower code.
Feb 07 2014
next sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Friday, 7 February 2014 at 08:30:35 UTC, Walter Bright wrote:
 On 2/6/2014 7:08 PM, bearophile wrote:
 That's why some languages have changed their sorting and 
 hashing routines to
 make them a little slower but safer on default.
DoS attack resistance requires faster code, not slower code.
The specific problem was that it was possible to provoke hash collisions by sending carefully crafted input, causing the hash-tables to degrade to linked lists. The small performance penalty of using collision-resistant hashes is certainly worth it in this case.
Feb 07 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 6:50 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
 The specific problem was that it was possible to provoke hash collisions by
 sending carefully crafted input, causing the hash-tables to degrade to linked
 lists. The small performance penalty of using collision-resistant hashes is
 certainly worth it in this case.
That has nothing to do with needing exceptions in the control flow path (and the performance penalty for using exceptions in this manner is certainly not small).
Feb 08 2014
parent "Marc Schütz" <schuetzm gmx.net> writes:
On Saturday, 8 February 2014 at 21:59:24 UTC, Walter Bright wrote:
 On 2/7/2014 6:50 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
 The specific problem was that it was possible to provoke hash 
 collisions by
 sending carefully crafted input, causing the hash-tables to 
 degrade to linked
 lists. The small performance penalty of using 
 collision-resistant hashes is
 certainly worth it in this case.
That has nothing to do with needing exceptions in the control flow path (and the performance penalty for using exceptions in this manner is certainly not small).
Huh? I responded to this discussion:

On Friday, 7 February 2014 at 08:30:35 UTC, Walter Bright wrote:
 On 2/6/2014 7:08 PM, bearophile wrote:
 That's why some languages have changed their sorting and 
 hashing routines to
 make them a little slower but safer on default.
DoS attack resistance requires faster code, not slower code.
I was merely clarifying why in this specific case making the average code path slower _did_ help DoS attack resistance.
Feb 09 2014
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 7 February 2014 at 08:30:35 UTC, Walter Bright wrote:
 On 2/6/2014 7:08 PM, bearophile wrote:
 Walter Bright:

 It's not a matter of taste. If your input is subject to a DoS 
 attack, don't
 put exceptions in the control flow.
Perhaps the world of today malicious attacks on the software you write should be assumed as the default situation, and then the language+library has to offer something less paranoiac on request. That's why some languages have changed their sorting and hashing routines to make them a little slower but safer on default.
DoS attack resistance requires faster code, not slower code.
I think bearophile is referring to a practice of avoiding fast average-case, slow worst-case algorithms in favour of faster worst-cases. If an algorithm has best-case O(n*log(n)) and worst case O(n^2), it's often not practical to build for the worst case, but anything less than that can make you vulnerable to malicious input as part of a DoS attack. In comparison, an algorithm with O(n*log^2(n)) average and worst-case might be acceptable in the average case, but will hold up better in the face of attack. I'm not sure how relevant the point is to the general discussion.
Feb 07 2014
parent "bearophile" <bearophileHUGS lycos.com> writes:
John Colvin:

 I think bearophile is referring to
Yes, you have explained well my point. Thank you. Bye, bearophile
Feb 07 2014
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
07-Feb-2014 06:44, Walter Bright wrote:
 On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional
 case because
 it's expected to happen and the program is expected to handle it (and
 let the
 user know) when it does. That's just a matter of taste though.
It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
Meh. If exceptions are such a liability we'd better make them (much) faster. -- Dmitry Olshansky
Feb 07 2014
next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky 
wrote:
 Meh. If exceptions are such a liability we'd better make them 
 (much) faster.
It's not stack unwinding speed that's an issue here though, but rather that for client-facing services, throwing an exception when an invalid request is received gives malicious clients an opportunity to hurt service performance by flooding it with invalid requests. Improving the exception code specifically doesn't help here because the real issue is with GC collections. I'd say that the real fix is for such services to simply not throw in this case. But the exception could always be recycled as well (since in this case you know that throwing will abort the transaction and so will always be immediately discarded). I'm not convinced that there's any need for a language change here to support scoped exceptions. That seems a bit like killing the ant with a steamroller.
Feb 07 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
07-Feb-2014 20:49, Sean Kelly wrote:
 On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much)
 faster.
It's not stack unwinding speed that's an issue here though, but rather that for client-facing services, throwing an exception when an invalid request is received gives malicious clients an opportunity to hurt service performance by flooding it with invalid requests.
Why is throwing a single exception such a big problem? Surely even C's longjmp wasn't that expensive? *Maybe* we shouldn't re-construct the full stack trace on every throw?
 Improving the
 exception code specifically doesn't help here because the real issue is
 with GC collections.
Then the problem is that something so temporary as an exception is allocated on the GC heap in the first place? Let's go for something more sane and deprecate the current behavior, it's not like we are forever stuck with it.
I'd say that the real fix is for such services to
 simply not throw in this case.  But the exception could always be
 recycled as well (since in this case you know that throwing will abort
 the transaction and so will always be immediately discarded).
Exceptions are convenient and they make life that much easier combined with ctors/dtors and scoped lifetime. And then we say **ck it - for busy services, just use good ol':
...
if (check42(...) == -1){ call_cleanup42(); return -1; }
...
And up the callstack we march. The moment code gets non-trivial there come exceptions and RAII to save the day, I don't see how busy REST services are unlike anything else.
 I'm not
 convinced that there's any need for a language change here to support
 scoped exceptions.  That seems a bit like killing the ant with a
 steamroller.
Well I'm not convinced we should accept that exceptions are many times slower than error codes (with checks on every function that may fail + propagating up the stack). -- Dmitry Olshansky
Feb 07 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky 
wrote:
 I'm not
 convinced that there's any need for a language change here to 
 support
 scoped exceptions.  That seems a bit like killing the ant with 
 a
 steamroller.
Well I'm not convinced we should accept that exceptions are many times slower than error codes (with checks on every function that may fail + propagating up the stack).
As I have already mentioned, they don't necessarily need to be. But that may require tweaking the language so that pre-allocated exception usage becomes reliable, and I don't see tools right now that allow expressing the necessary semantics (can't store a reference to an instance without a deep copy).
Feb 07 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 10:10 AM, Dicebot wrote:
 As I have already mentioned, they don't necessarily need to be. But that may
 require tweaking the language so that pre-allocated exception usage becomes
 reliable, and I don't see tools right now that allow expressing the necessary
 semantics (can't store a reference to an instance without a deep copy).
It is NOT the allocation that's the issue. C++ code has the same issue. It's the exception handling table lookup.
Feb 08 2014
prev sibling next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky
wrote:
 07-Feb-2014 20:49, Sean Kelly writes:
 On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky 
 wrote:
 Meh. If exceptions are such a liability we'd better make them 
 (much)
 faster.
It's not stack unwinding speed that's an issue here though, but rather that for client-facing services, throwing an exception when an invalid request is received gives malicious clients an opportunity to hurt service performance by flooding it with invalid requests.
Why is throwing a single exception such a big problem? Surely even C's longjmp wasn't that expensive? *Maybe* we shouldn't reconstruct the full stack trace on every throw?
That can be turned off at run time by clearing the traceHandler. But yeah, it's the allocations that are a problem in this case, not the unwinding. And specifically, that flooding with bad requests effectively generates tons of garbage (an allocation for the exception plus another for the trace data) thus triggering frequent stop-the-world collections.
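A minimal sketch of what clearing the handler can look like, assuming the `Runtime.traceHandler` property from `core.runtime`. The helper `thrownExceptionHasTrace` is a made-up name used only for illustration, and this only removes the trace-data allocation; the exception object itself is still allocated with `new`.

```d
import core.runtime : Runtime;

/// Hypothetical helper: throws and catches one exception, then reports
/// whether any trace information was attached to it.
bool thrownExceptionHasTrace()
{
    try
    {
        throw new Exception("invalid request");
    }
    catch (Exception e)
    {
        return e.info !is null;
    }
}

void main()
{
    import std.stdio : writeln;

    // Ask the runtime not to build trace data for exceptions thrown from
    // here on.
    Runtime.traceHandler = null;
    writeln("trace attached: ", thrownExceptionHasTrace());
}
```

A service would presumably do this once at startup, and only in builds where losing stack traces in logs is acceptable.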
 Exceptions are convenient and they make life that much easier 
 combined with ctors/dtors and scoped lifetime. And then we say 
 **ck it - for busy services, just use good ol':
 ...
 if (check42(...) == -1){ call_cleanup42(); return -1; }
 ...

 And up the callstack we march. The moment code gets non-trivial 
 there come exceptions and RAII to save the day, I don't see how 
 busy REST services are unlike anything else.
I'm sure you can see how a service is different from a desktop application, right? In the latter case, there's only one user and he's interested in having his application perform well. Outside of a QA lab you won't find desktop app. users deliberately trying to break their app. Services are exactly the opposite. It's not an exaggeration when I say that the services I work on are under attack from botnets 24/7. This is a use case that must be considered as a first order of business or the entire service suffers.
 I'm not convinced that there's any need for a language change 
 here to support scoped exceptions.  That seems a bit like 
 killing the ant with a steamroller.
Well I'm not convinced we should accept that exceptions are many times slower than error codes (with checks on every function that may fail + propagating up the stack).
Exception-oriented code is typically faster for the success case because all that return code checking can be removed. But the tradeoff is that it's slower in the failure case because stack unwinding is simply slower than checking an error code. But again, the issue here isn't the cost of stack unwinding, it's that thousands of exceptions thrown per second generates a lot of garbage, and garbage collection in D is currently fairly slow compared to, say, Java. If we could get an incremental GC for D I probably wouldn't even care, but I think that's impossible.
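To make the two styles being compared concrete, here is a small sketch. All function names are invented for illustration, and it says nothing about actual timings; it only shows where the checks sit in each style.

```d
import std.exception : enforce;

/// Error-code style: failure is reported through the return value.
int parseDigitCode(char c, out int value)
{
    if (c < '0' || c > '9')
        return -1;
    value = c - '0';
    return 0;
}

/// Every caller has to test the code and forward it up the stack.
int sumDigitsCode(string s, out int total)
{
    foreach (char c; s)
    {
        int v;
        if (parseDigitCode(c, v) == -1)
            return -1;
        total += v;
    }
    return 0;
}

/// Exception style: failure is reported by throwing.
int parseDigit(char c)
{
    enforce(c >= '0' && c <= '9', "not a digit");
    return c - '0';
}

/// The caller's success path carries no checks; a failure unwinds past it.
int sumDigits(string s)
{
    int total;
    foreach (char c; s)
        total += parseDigit(c);
    return total;
}

void main()
{
    import std.stdio : writeln;

    int total;
    writeln(sumDigitsCode("123", total), " ", total);
    writeln(sumDigits("123"));
}
```

Note that `enforce` allocates a new exception on failure, which is exactly the garbage source discussed in this thread.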
Feb 07 2014
next sibling parent "Adam Wilson" <flyboynw gmail.com> writes:
On Fri, 07 Feb 2014 10:54:37 -0800, Sean Kelly <sean invisibleduck.org> wrote:

 On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky
 wrote:
 07-Feb-2014 20:49, Sean Kelly writes:
 On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much)
 faster.
It's not stack unwinding speed that's an issue here though, but rather
 that for client-facing services, throwing an exception when an invalid
 request is received gives malicious clients an opportunity to hurt
 service performance by flooding it with invalid requests.
Why is throwing a single exception such a big problem? Surely even C's
 longjmp wasn't that expensive? *Maybe* we shouldn't reconstruct the
 full stack trace on every throw?
That can be turned off at run time by clearing the traceHandler. But yeah, it's the allocations that are a problem in this case, not the unwinding. And specifically, that flooding with bad requests effectively generates tons of garbage (an allocation for the exception plus another for the trace data) thus triggering frequent stop-the-world collections.
 Exceptions are convenient and they make life that much easier combined
 with ctors/dtors and scoped lifetime. And then we say **ck it - for
 busy services, just use good ol':
 ...
 if (check42(...) == -1){ call_cleanup42(); return -1; }
 ...

 And up the callstack we march. The moment code gets non-trivial there
 come exceptions and RAII to save the day, I don't see how busy REST
 services are unlike anything else.
I'm sure you can see how a service is different from a desktop application, right? In the latter case, there's only one user and he's interested in having his application perform well. Outside of a QA lab you won't find desktop app. users deliberately trying to break their app. Services are exactly the opposite. It's not an exaggeration when I say that the services I work on are under attack from botnets 24/7. This is a use case that must be considered as a first order of business or the entire service suffers.
 I'm not convinced that there's any need for a language change here to
 support scoped exceptions.  That seems a bit like killing the ant with
 a steamroller.
Well I'm not convinced we should accept that exceptions are many times
 slower than error codes (with checks on every function that may fail +
 propagating up the stack).
Exception-oriented code is typically faster for the success case because all that return code checking can be removed. But the tradeoff is that it's slower in the failure case because stack unwinding is simply slower than checking an error code. But again, the issue here isn't the cost of stack unwinding, it's that thousands of exceptions thrown per second generates a lot of garbage, and garbage collection in D is currently fairly slow compared to, say, Java. If we could get an incremental GC for D I probably wouldn't even care, but I think that's impossible.
Technically, there is no reason that the current GC can't be made incremental, insofar as incremental means collecting only what is required to complete the allocation. -- Adam Wilson GitHub/IRC: LightBender Aurora Project Coordinator
Feb 07 2014
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
07-Feb-2014 22:54, Sean Kelly writes:
 On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky
 wrote:
 It's not stack unwinding speed that's an issue here though, but rather
 that for client-facing services, throwing an exception when an invalid
 request is received gives malicious clients an opportunity to hurt
 service performance by flooding it with invalid requests.
Why is throwing a single exception such a big problem? Surely even C's longjmp wasn't that expensive? *Maybe* we shouldn't reconstruct the full stack trace on every throw?
That can be turned off at run time by clearing the traceHandler.
Which should be somehow prominently advertised for release builds. Last time I checked not making it null made exceptions ridiculously slow.
 But yeah, it's the allocations that are a problem in this case,
 not the unwinding.  And specifically, that flooding with bad
 requests effectively generates tons of garbage (an allocation for
 the exception plus another for the trace data) thus triggering
 frequent stop-the-world collections.
So again - the problem is allocations on the GC heap. Then let's please not worry about the tiny gains of avoiding stack unwinding; that is well understood. And I see no reason for allocating exceptions on the GC heap (and none has been presented so far). The main use case of an exception is to consume it on catch or forward it down the line. Storing a reference to an exception elsewhere is a rare case. I could see the whole situation with exceptions in D as "we copied this shit from Java, no idea why". Java at least does go to great lengths to make them fast (by caching them behind the scenes and whatnot).
 Exceptions are convenient and they make life that much easier combined
 with ctors/dtors and scoped lifetime. And then we say **ck it - for
 busy services, just use good ol':
 ...
 if (check42(...) == -1){ call_cleanup42(); return -1; }
 ...

 And up the callstack we march. The moment code gets non-trivial there
 come exceptions and RAII to save the day, I don't see how busy REST
 services are unlike anything else.
I'm sure you can see how a service is different from a desktop application, right?
Aye, in fact I haven't written much in the way of desktop apps.
 In the latter case, there's only one user
 and he's interested in having his application perform well.
 Outside of a QA lab you won't find desktop app. users
 deliberately trying to break their app.  Services are exactly the
 opposite.  It's not an exaggeration when I say that the services
 I work on are under attack from botnets 24/7.  This is a use case
 that must be considered as a first order of business or the
 entire service suffers.
I bet some sanity checks on the level of protocol handling are more than enough. Yeah, these might be faster than unwinding due to the sheer volume of bad data, but it's a fraction of the code, albeit a critical fraction. I was thinking about the service logic on top of that.
 I'm not convinced that there's any need for a language change here to
 support scoped exceptions.  That seems a bit like killing the ant
 with a steamroller.
Well I'm not convinced we should accept that exceptions are many times slower than error codes (with checks on every function that may fail + propagating up the stack).
Exception-oriented code is typically faster for the success case because all that return code checking can be removed. But the tradeoff is that it's slower in the failure case because stack unwinding is simply slower than checking an error code.
Duly noted. Just stating the obvious - in the majority of cases we talk about 1 unwind vs 10s of checks. The difference isn't THAT big anyway, the only advantage of codes checking is being able to fail faster on some _early_ bad condition.
 But
 again, the issue here isn't the cost of stack unwinding, it's
 that thousands of exceptions thrown per second generates a lot of
 garbage, and garbage collection in D is currently fairly slow
 compared to, say, Java.
Let's stop bashing GC here. This part of design of exceptions in D is just backwards (penalizes usual case) - time to fix it? -- Dmitry Olshansky
Feb 07 2014
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 10:54 AM, Sean Kelly wrote:
 But yeah, it's the allocations that are a problem in this case,
Code can always pre-allocate the exception that is thrown. There's no reason whatsoever that allocation is required at the throw point, nor is there any reason the thrown exception has to be newly allocated each time. And, as such, this is entirely a coding issue, not a language or runtime one.
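A minimal sketch of the pre-allocation idea, assuming a made-up `BadRequestException` type and `validate` routine. One instance is created per thread at startup and the same object is thrown every time.

```d
/// Hypothetical exception type for rejected requests.
class BadRequestException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Module-level variables are thread-local in D, so each thread owns its
// own instance and no synchronization is needed.
private BadRequestException preallocated;

static this()
{
    // Allocated once per thread at startup rather than at each throw site.
    preallocated = new BadRequestException("bad request");
}

/// Hypothetical validation routine: rejects empty input without allocating.
void validate(string request)
{
    if (request.length == 0)
        throw preallocated;
}

/// Returns the exception caught for `request`, or null if it was accepted.
BadRequestException tryValidate(string request)
{
    try
    {
        validate(request);
        return null;
    }
    catch (BadRequestException e)
    {
        return e;
    }
}

void main()
{
    import std.stdio : writeln;

    // Both failures hand back the very same object.
    writeln(tryValidate("") is tryValidate(""));
}
```

The trade-off is that the message, file, and line are fixed when the instance is created, any attached trace data may not describe the later throw sites, and code that keeps a reference to a caught exception ends up sharing it with every later throw.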
Feb 08 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
 Why is throwing a single exception such a big problem?
Because in order to unwind the stack, you need to find the information about the stack layout. This lookup is rather slow. You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff.
Feb 08 2014
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Feb-2014 02:03, Walter Bright writes:
 On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
 Why is throwing a single exception such a big problem?
Because in order to unwind the stack, you need to find the information about the stack layout. This lookup is rather slow. You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff.
A special table lookup can't be slow compared to writing a dummy HTTP 500 response. Just saying. Yes, it's a tad slower than cmp + jz, I do understand that. Again I'm trying to say that framing stack unwinding as the culprit of vibe.d crawling under bad requests is plain wrong, and that was the focal point of the original argument. -- Dmitry Olshansky
Feb 08 2014
next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Saturday, 8 February 2014 at 22:11:13 UTC, Dmitry Olshansky 
wrote:
 Again I'm trying to say that framing stack unwinding as the 
 culprit of vibe.d crawling under bad requests is plain wrong, 
 and that was the focal point of the original argument.
Can you see if it is better with this little patch? https://github.com/D-Programming-Language/druntime/pull/717 On a simple test, I got a 20x speedup on most exceptions by lazily generating the stack trace upon request in toString (though if you are printing it anyway you won't see a difference).
Feb 08 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/8/2014 2:11 PM, Dmitry Olshansky wrote:
 09-Feb-2014 02:03, Walter Bright writes:
 On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
 Why is throwing a single exception such a big problem?
Because in order to unwind the stack, you need to find the information about the stack layout. This lookup is rather slow. You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff.
A special table lookup can't be slow compared to writing a dummy HTTP 500 response. Just saying. Yes, it's a tad slower than cmp + jz, I do understand that. Again I'm trying to say that framing stack unwinding as the culprit of vibe.d crawling under bad requests is plain wrong, and that was the focal point of the original argument.
I don't know how vibe.d works, but my point is using exception handling to implement normal control flow is bad design and it is going to be slow and the reason it is slow is because of the table lookup and unwinding cost, and that is not going to be fixed.
Feb 08 2014
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, February 08, 2014 21:21:40 Walter Bright wrote:
 On 2/8/2014 2:11 PM, Dmitry Olshansky wrote:
 09-Feb-2014 02:03, Walter Bright writes:
 On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
 Why is throwing a single exception such a big problem?
Because in order to unwind the stack, you need to find the information about the stack layout. This lookup is rather slow. You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff.
A special table lookup can't be slow compared to writing a dummy HTTP 500 response. Just saying. Yes, it's a tad slower than cmp + jz, I do understand that. Again I'm trying to say that framing stack unwinding as the culprit of vibe.d crawling under bad requests is plain wrong, and that was the focal point of the original argument.
I don't know how vibe.d works, but my point is using exception handling to implement normal control flow is bad design and it is going to be slow and the reason it is slow is because of the table lookup and unwinding cost, and that is not going to be fixed.
I wouldn't have considered throwing on an HTTP error to be "flow control." That's normal error handling, and throwing on HTTP errors is exactly what I would have done. It generally makes code a _lot_ cleaner that way, because you don't have to constantly check return codes for errors, and it's using exceptions for exactly what they're there for - reporting and handling errors. You don't want to use exceptions for stuff other than error reporting, and you don't want to use them in situations where the error case is the frequent case, but that shouldn't be the case for HTTP. Exceptions _will_ be slower than other code paths, and you don't want them to be the normal code path. Nothing is going to make exceptions as fast as the normal code paths either. However, D's exceptions are painfully slow - far slower than is reasonable - whether that's because of allocating the exception or unwinding the stack or creating the string for the stack trace or whatever is a matter for investigation, and I'm not about to claim that I know where the bottlenecks are. Fortunately, it looks like Adam Ruppe has found some ways to speed up exceptions: https://github.com/D-Programming-Language/druntime/pull/717 And there may be other improvements that we can implement as well. I agree that there's a limit to how much we can speed up exceptions, but right now, at minimum, we're getting creamed by Java in terms of speed: https://d.puremagic.com/issues/show_bug.cgi?id=9584 - Jonathan M Davis
Feb 08 2014
parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 9 February 2014 at 05:57:44 UTC, Jonathan M Davis 
wrote:
 Exceptions _will_ be slower than other code paths, and you 
 don't want them to
 be the normal code path. Nothing is going to make exceptions as 
 fast as the
 normal code paths either. However, D's exceptions are painfully
Just to be pedantic: this is not true. If you have frame based exception meta-info recording then a throw out of recursion (without try-blocks in the recursion) will be faster than normal returns. You unwind down to the try-block by loading a register and a single JMP. All you have to do is maintain a singly linked list of stack frames that can catch. AFAIK the overhead is negligible if you avoid try-blocks in light-weight function calls. You store one pointer per catching stack frame. That alone is a good enough reason to realize that the exception handling strategy should be a compiler switch, not a language policy, because performance depends on what kind of code patterns you have and on the architecture. On the current generation of x86 CPUs, the decoding of instructions into micro-ops and the pipelining ought to be heavy enough that simple BRA instructions "disappear". Thus the offset strategy ought to work well too (injecting data into the code stream near the return point and branching over it if necessary, but usually not).
Feb 09 2014
parent "Ola Fosheim Grøstad" writes:
And with profiling you get the call-frequency between functions, 
so a throw could be replaced with:

if (return_address == 0x1234556){...} // 60%
if (return_address == 0x7899324){...} // 30%
slow_unwinding()

That ought to be obvious.
Feb 09 2014
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Saturday, 8 February 2014 at 22:03:13 UTC, Walter Bright wrote:
 You can make the lookup faster by compromising the function 
 code generation, but this is considered an unacceptable 
 tradeoff.
"Compromising"? You mean they had to modify codegen, which they didn't want to. Clearly, if you know the return address you also could have stack info access close to it (at a fixed offset), at no runtime cost whatsoever.
Feb 08 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
But the C++ DWARF way of doing it was developed for Itanium, which 
was targeting HPC, for which you probably don't need exceptions 
all that often. So it made sense in that context.

For regular applications it makes no sense, and with whole 
program analysis (or language level linker) you probably often 
can get a good match at the throw site.
Feb 08 2014
parent "Ola Fosheim Grøstad" writes:
AND (this just has to be said) if D is really meant to be a SAFE 
programming language then the language should NOT encourage 
programmers to a coding style where you can fail to test for 
errors. The obvious solution is to ensure that you cannot ignore 
errors unless you are explicit about it. Exceptions ensure that.

Having 3 different ways of returning errors is not a good 
strategy for safe and bug free programming.

Ah, I just had to say it... ;)
Feb 08 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/8/2014 2:59 PM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Saturday, 8 February 2014 at 22:03:13 UTC, Walter Bright wrote:
 You can make the lookup faster by compromising the function code generation,
 but this is considered an unacceptable tradeoff.
"Compromising"? You mean they had to modify codegen, which they didn't want to. Clearly, if you know the return address you also could have stack info access close to it (at a fixed offset), at no runtime cost whatsoever.
Ola, I've done it both ways, I actually do know what I'm talking about. I've sometimes been proven wrong here, so you're welcome to do a pull request proving so.
Feb 08 2014
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 08 Feb 2014 21:29:27 -0800
Walter Bright <newshound2 digitalmars.com> wrote:

 On 2/8/2014 2:59 PM, "Ola Fosheim Grøstad"
 <ola.fosheim.grostad+dlang gmail.com> wrote:
 On Saturday, 8 February 2014 at 22:03:13 UTC, Walter Bright wrote:
 You can make the lookup faster by compromising the function code generation,
 but this is considered an unacceptable tradeoff.
"Compromising"? You mean they had to modify codegen, which they didn't want to.
 Clearly, if you know the return address you also could have stack info access
 close to it (at a fixed offset), at no runtime cost whatsoever.
 Ola, I've done it both ways, I actually do know what I'm talking about.
 I've sometimes been proven wrong here, so you're welcome to do a pull request
 proving so.
It is not the function code gen that needs to be improved on Linux, Walter. In fact that would be premature optimization considering that the *construction* of exceptions outweighs unwinding costs for functions with no local variables by multiple orders of magnitude. -- Marco
Feb 08 2014
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 9 February 2014 at 05:29:25 UTC, Walter Bright wrote:
 Ola, I've done it both ways, I actually do know what I'm 
 talking about.
Please note that "you" and "they" were meant as "one" or "the C++ community", not personal. It was not ad hominem. So no reason to be defensive about it. I am grateful if you can point out where my reasoning fails, then I learn something new. Maybe you could explain why a single occasional Branch Always over the unwind-pointer would be slow. Clearly the offset should be empirically based (so that you usually can avoid the goto), maybe even set to a separate cache line for some CPUs, and you could fill out the gaps with other data you need there. It's not like I have run VTune on an i7, so I could be wrong, but I don't see why… And I also think that if you have a CPU with a sufficient number of callee-save registers you can carry along a pointer to the last try-block stack frame with not much penalty. After all you only have to restore it if the function ruined it and before calling new functions that are not inlined and not nothrow, and you could stick it into a thread-local global too where it matters. On 32 bit x86 it probably is quite expensive though. In code where I write try blocks they tend to stay in the "main logic function"; this code is so heavy that adding the stack frame to a linked list (of stack frames) is a negligible cost. One really needs to be careful when doing performance tests of exception handling, because it is easy to construct "theoretical" code. Programmers should write exception handlers with the implementation in mind, so using existing programs as a baseline is not a good solution either.
 I've sometimes been proven wrong here, so you're welcome to do 
 a pull request proving so.
You know very well that I am not going to rewrite codegen for DMD. Adding this feature will complicate codegen and you need to understand the code generator well to do the modification. Besides, I am not sure if a system level language should have exceptions at all or that I would use them when doing the kind of stuff I like to use D for. :-P ;-) I like to use exception handling in application-level code, but not in code for audio/simulations/buffer-streaming/low-level-stuff.
Feb 09 2014
parent "Ola Fosheim Grøstad" writes:
This is a pretty nice description of the i7 pipeline by Hennessy 
and Patterson:

https://www.inkling.com/read/computer-architecture-hennessy-5th/chapter-3/section-3-13#0113e87a6dc141d7abda84b497128d61

Notice the 28 micro ops buffer before execution. I'd expect a 
short predicted branch to not cause a big bubble, but I don't 
know for sure.
Feb 09 2014
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, February 07, 2014 20:40:57 Dmitry Olshansky wrote:
 07-Feb-2014 06:44, Walter Bright writes:
 On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional
 case because
 it's expected to happen and the program is expected to handle it (and
 let the
 user know) when it does. That's just a matter of taste though.
It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
Meh. If exceptions are such a liability we'd better make them (much) faster.
Related: http://d.puremagic.com/issues/show_bug.cgi?id=9584 The DoS aspect of exceptions is not something that I've ever thought about or seen discussed before, but one area where I've found the slowness of D's exceptions to be a real pain is in unit tests. I like to test failure cases as well as successful ones, and if you do much of that, your unit tests start taking a long time due to how insanely slow exceptions are in D. So, while in some situations, the solution may be to not use exceptions (or to use them less), I think that we really need to look at doing what's necessary to make exceptions a lot faster - be it to more efficiently deal with stack traces or to avoid allocating them or whatever else we can come up with to make them fast. I think that the approach of assuming that exceptions don't need to be fast, because they're used for error conditions is a bad one. They're not as performance critical as normal code, but their speed still very much matters. - Jonathan M Davis
Feb 07 2014
next sibling parent "Ola Fosheim Grøstad" writes:
On Friday, 7 February 2014 at 19:54:14 UTC, Jonathan M Davis 
wrote:
 They're not as performance critical as normal code, but their 
 speed still very much matters.
Well, it is at least more difficult to write reliable code when you have to try to avoid them. Still, for a web service you should probably not have to deal with more than 1000 per second on average. Assume 1 GHz; then that is like 1,000,000 cycles of running code per stack unwinding. If you sacrifice 10% of that for exception handling, that means you have 100,000 cycles to unwind the stack. If the unwound stack is 5 frames deep, you have 20,000 cycles per stack frame. If that is not possible, something should be done with the release version of the runtime. For a web server you could of course tie the request handler directly to the request object and instantiate different ones for each request type, then have all "unwinding" in the object itself. Quirky, but workable.
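The back-of-the-envelope budget above can be written out and checked at compile time. The figures are the post's own assumptions (1 GHz, 1000 throws per second, 10% of the cycles, 5 frames); the constant names are made up for this sketch.

```d
// Assumptions from the post: one core at 1 GHz, 1000 exceptions per second.
enum long cyclesPerSecond = 1_000_000_000;
enum long throwsPerSecond = 1_000;

// Cycles available between two consecutive throws.
enum long cyclesPerThrow = cyclesPerSecond / throwsPerSecond;

// Spend 10% of that on exception handling, spread over 5 unwound frames.
enum long unwindBudget = cyclesPerThrow / 10;
enum long framesUnwound = 5;
enum long cyclesPerFrame = unwindBudget / framesUnwound;

static assert(cyclesPerThrow == 1_000_000);
static assert(unwindBudget == 100_000);
static assert(cyclesPerFrame == 20_000);

void main()
{
    import std.stdio : writeln;

    writeln("cycles per unwound frame: ", cyclesPerFrame);
}
```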
Feb 07 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 11:53 AM, Jonathan M Davis wrote:
 or to avoid allocating them
Grep for 'throw' in std.datetime shows that every throw is actually: throw new ... and an example: throw new DateTimeException("SYSTEMTIME cannot hold dates prior to the year 1601."); There is no requirement that the new is done there. You can preallocate the DateTimeException statically, and simply keep rethrowing the same exception instance. I.e. the allocation issue is a coding style issue, not a language problem.
Feb 08 2014
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, February 08, 2014 14:13:04 Walter Bright wrote:
 On 2/7/2014 11:53 AM, Jonathan M Davis wrote:
 or to avoid allocating them
Grep for 'throw' in std.datetime shows that every throw is actually: throw new ... and an example: throw new DateTimeException("SYSTEMTIME cannot hold dates prior to the year 1601."); There is no requirement that the new is done there. You can preallocate the DateTimeException statically, and simply keep rethrowing the same exception instance. I.e. the allocation issue is a coding style issue, not a language problem.
Of course allocation is not a language issue. The question is whether (and how) we can change our approach to allocating exceptions in order to reduce their cost. And that's a change in how we approach them, not a change in the language itself. It might require some changes in druntime to better deal with other allocation schemes (particularly with how that affects exception chaining), but it's not a language issue. And in general, I would expect that any speed-ups that we could attain with regards to actually throwing an exception would be in druntime's implementation rather than anything in the language itself. Any improvements there could then be combined with any improvements we could make to our approach to allocating exceptions (and for better or worse - probably worse - the normal approach at this point is to allocate a new exception when throwing). - Jonathan M Davis
Feb 08 2014
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/7/14, 8:40 AM, Dmitry Olshansky wrote:
 07-Feb-2014 06:44, Walter Bright wrote:
 On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional
 case because
 it's expected to happen and the program is expected to handle it (and
 let the
 user know) when it does. That's just a matter of taste though.
It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
Meh. If exceptions are such a liability we'd better make them (much) faster.
One simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception). Andrei
Feb 07 2014
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, February 07, 2014 16:49:45 Andrei Alexandrescu wrote:
 On 2/7/14, 8:40 AM, Dmitry Olshansky wrote:
 07-Feb-2014 06:44, Walter Bright wrote:
 On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional
 case because
 it's expected to happen and the program is expected to handle it (and
 let the
 user know) when it does. That's just a matter of taste though.
It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
Meh. If exceptions are such a liability we'd better make them (much) faster.
One simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception).
As long as exceptions are cloneable, and people are aware of the fact that they tend to be non-unique, then it can be common practice to clone/dup an exception when you need to keep it around. However, the two potential problems with this overall approach are

1. Do we just always allocate one of each exception type per thread (probably in a static constructor for that exception type)? That would result in a fair number of exceptions being allocated up front. The obvious alternative would be to allocate it the first time that it's thrown so that you only end up with exceptions that get used being allocated, but regardless, we need to take a close look at the allocation scheme.

2. This sort of thing has a definite impact on enforce and any idioms related to it. We'd need to either adjust enforce, enforceEx, etc. to avoid the allocation, or we'd need to introduce alternatives to them that expect something like a static opCall on the exception type which returns the common exception for that type or some other standard means of getting at the reusable exception.

Regardless, we need to agree upon a standard way to define exception types along with some set of standard idioms for handling them such that we can deal with exceptions generically (particularly with regards to stuff like enforce) rather than it being an ad-hoc per-exception type thing that you can't reasonably rely on.

- Jonathan M Davis
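
A minimal sketch of the "allocate on first throw, then reuse" scheme discussed above. ParseException and reusableParseException are hypothetical names made up for illustration; nothing like them exists in Phobos. The sketch assumes one cached instance per thread and does not attempt to refresh any stack-trace information attached to the instance.

```d
// Hypothetical exception type, used only for illustration.
class ParseException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Module-level variables are thread-local by default in D,
// so every thread gets its own cached instance.
private ParseException cachedParseException;

// Hypothetical accessor: allocates on the first call in a thread and
// afterwards only overwrites the fields of the cached instance.
ParseException reusableParseException(string msg,
    string file = __FILE__, size_t line = __LINE__)
{
    if (cachedParseException is null)
        cachedParseException = new ParseException(msg, file, line);

    cachedParseException.msg = msg;
    cachedParseException.file = file;
    cachedParseException.line = line;
    cachedParseException.next = null; // drop any chain left from an earlier throw
    return cachedParseException;
}

unittest
{
    ParseException first;
    try
    {
        throw reusableParseException("bad digit");
    }
    catch (ParseException e)
    {
        first = e; // saving the reference is exactly the risky part
    }

    try
    {
        throw reusableParseException("bad sign");
    }
    catch (ParseException e)
    {
        assert(e is first);              // same object, no second allocation
        assert(first.msg == "bad sign"); // the saved reference was overwritten
    }
}
```

The unittest also shows the surprise mentioned earlier in the thread: a reference saved from the first catch silently changes when the exception is thrown again.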
Feb 07 2014
prev sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu 
wrote:
 One simple idea is to statically allocate the same exception 
 and rethrow it over and over. After all there's no guarantee a 
 distinct exception is thrown every time, and the approach is 
 still memory safe (though it might surprise the programmer who 
 saves a reference to an old exception).

 Andrei
I don't think it's that simple. What happens if an XException causes another XException and they need to be chained together?
Feb 08 2014
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
08-Feb-2014 15:02, Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu wrote:
 One simple idea is to statically allocate the same exception and
 rethrow it over and over. After all there's no guarantee a distinct
 exception is thrown every time, and the approach is still memory safe
 (though it might surprise the programmer who saves a reference to an
 old exception).

 Andrei
I don't think it's that simple. What happens if an XException causes another XException and they need to be chained together?
If both are thread-local and cached I see no problem whatsoever. The thing is the current "default" of creating exception is AWFUL. And D stands for sane defaults and the simple path being good last time I checked. -- Dmitry Olshansky
Feb 08 2014
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky 
wrote:
 If both are thread-local and cached I see no problem whatsoever.
 The thing is the current "default" of creating exception is 
 AWFUL.
 And D stands for sane defaults and the simple path being good 
 last time I checked.
How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
Feb 08 2014
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, February 08, 2014 11:17:25 Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky
 wrote:
 
 If both are thread-local and cached I see no problem whatsoever.
 The thing is the current "default" of creating exception is
 AWFUL.
 And D stands for sane defaults and the simple path being good
 last time I checked.
How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
Then we have multiple of them, or we new up another one when a second one is needed. Even if it were only the first exception which avoided the allocation, it would be a big gain, and in most cases, you're only going to get a single exception, or the exceptions will be of different types. - Jonathan M Davis
Feb 08 2014
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 8 February 2014 at 11:27:27 UTC, Jonathan M Davis 
wrote:
 On Saturday, February 08, 2014 11:17:25 Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky
 wrote:
 
 If both are thread-local and cached I see no problem 
 whatsoever.
 The thing is the current "default" of creating exception is
 AWFUL.
 And D stands for sane defaults and the simple path being good
 last time I checked.
How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
Then we have multiple of them, or we new up another one when a second one is needed. Even if it were only the first exception which avoided the allocation, it would be a big gain, and in most cases, you're only going to get a single exception, or the exceptions will be of different types. - Jonathan M Davis
Yes, I'm sure there is a cool solution, I'm just pointing out that it's not as simple as statically allocating. I think it would be a nice exercise to compose such a solution with std.allocator.
Feb 08 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 08 Feb 2014 11:33:51 +0000,
"Jakob Ovrum" <jakobovrum gmail.com> wrote:

 On Saturday, 8 February 2014 at 11:27:27 UTC, Jonathan M Davis 
 wrote:
 On Saturday, February 08, 2014 11:17:25 Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky
 wrote:
 
 If both are thread-local and cached I see no problem 
 whatsoever.
 The thing is the current "default" of creating exception is
 AWFUL.
 And D stands for sane defaults and the simple path being good
 last time I checked.
How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
Then we have multiple of them, or we new up another one when a second one is needed. Even if it were only the first exception which avoided the allocation, it would be a big gain, and in most cases, you're only going to get a single exception, or the exceptions will be of different types. - Jonathan M Davis
Yes, I'm sure there is a cool solution, I'm just pointing out that it's not as simple as statically allocating. I think it would be a nice exercise to compose such a solution with std.allocator.
Yes, it doesn't seem feasible otherwise. Since you can call functions recursively you could potentially chain exceptions from the same line of code several times.

   catch (Exception e)
   {
       staticException.line = __LINE__;
       staticException.file = __FILE__;
       staticException.next = e;  // e.next is staticException
       throw staticException;
   }

You'd have to flag staticException as "in use" and spawn a new instance every time you need another one of the same type. Since there is no way to reset that flag automatically when the last user goes out of scope (i.e. ref counting), that's not even an option.

Preallocated exceptions only work if you are confident your exception won't be recursively thrown and thereby chained to itself. Granted, that covers the majority of code, but it is really too much cognitive load when writing exception handling code.

-- Marco
Feb 08 2014
parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Sunday, 9 February 2014 at 04:38:23 UTC, Marco Leise wrote:
 Yes, it doesn't seem feasible otherwise. Since you can call
 functions recursively you could potentially chain exceptions
 from the same line of code several times.

   catch (Exception e)
   {
       staticException.line = __LINE__;
       staticException.file = __FILE__;
       staticException.next = e;  // e.next is staticException
       throw staticException;
   }

 You'd have to flag staticException as "in use" and spawn a new
 instance every time you need another one of the same type.
 Since there is no way to reset that flag automatically when
 the last user goes out of scope (i.e. ref counting), that's
 not even an option.

 Preallocated exceptions only work if you are confident your
 exception wont be recursively thrown and thereby chained to
 itself. Granted, the majority of code, but really too much
 cognitive load when writing exception handling code.
While writes directly to line and file and such can't be prevented, `next` could be implemented as a property that does the conditional .dup when assigned to itself (or throw an Error).
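
A rough sketch of the "conditional .dup when chaining" idea, assuming a hypothetical XException with a hand-written dup method (druntime's Throwable has no built-in clone). The helper names inChain and chainSafely are also made up for illustration. The check walks the cause's chain and copies the outer exception first if linking would otherwise create a cycle.

```d
// Hypothetical exception type with a hand-written copy operation.
class XException : Exception
{
    this(string msg, Throwable cause = null,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line, cause);
    }

    // Shallow copy: same message and location, same tail of the chain.
    XException dup()
    {
        return new XException(msg, next, file, line);
    }
}

// True if `target` is reachable from `head` by following `next`.
bool inChain(Throwable head, Throwable target)
{
    for (Throwable t = head; t !is null; t = t.next)
        if (t is target)
            return true;
    return false;
}

// Links `cause` behind `outer`. If `outer` already appears in the chain of
// `cause`, a copy is linked instead so the chain cannot loop back on itself.
XException chainSafely(XException outer, Throwable cause)
{
    if (inChain(cause, outer))
        outer = outer.dup();
    outer.next = cause;
    return outer;
}

unittest
{
    auto cached = new XException("cached instance");

    // Chaining the instance to itself yields a fresh copy, not a cycle.
    auto chained = chainSafely(cached, cached);
    assert(chained !is cached);
    assert(chained.next is cached);
    assert(cached.next is null);

    // An unrelated cause needs no copy.
    auto other = new Exception("unrelated");
    assert(chainSafely(cached, other) is cached);
    assert(cached.next is other);
}
```

This only guards the link itself. As noted in the thread, it does not help if the cached instance's message or location has already been overwritten before the helper is called.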
Feb 09 2014
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Saturday, 8 February 2014 at 11:17:26 UTC, Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky 
 wrote:
 If both are thread-local and cached I see no problem 
 whatsoever.
 The thing is the current "default" of creating exception is 
 AWFUL.
 And D stands for sane defaults and the simple path being good 
 last time I checked.
How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
It's supposedly one exception instance per place where it can be thrown, not per exception type. Then the problem would be restricted to recursive calls, where in the exception handler for XException, another XException is thrown.
Feb 09 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/8/14, 3:02 AM, Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu wrote:
 One simple idea is to statically allocate the same exception and
 rethrow it over and over. After all there's no guarantee a distinct
 exception is thrown every time, and the approach is still memory safe
 (though it might surprise the programmer who saves a reference to an
 old exception).

 Andrei
I don't think it's that simple. What happens if an XException causes another XException and they need to be chained together?
The chaining method detects that and .dup's one of them. Andrei
Feb 08 2014
next sibling parent "Dicebot" <public dicebot.lv> writes:
On Saturday, 8 February 2014 at 16:50:53 UTC, Andrei Alexandrescu 
wrote:
 The chaining method detects that and .dup's one of them.

 Andrei
After some thinking I don't think it actually helps - the exception will be modified _before_ throwing in library code so cloning will be too late. But I don't see any reason why basic exception instances in Phobos can't be made immutable.
Feb 08 2014
prev sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 8 February 2014 at 16:50:53 UTC, Andrei Alexandrescu 
wrote:
 On 2/8/14, 3:02 AM, Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei 
 Alexandrescu wrote:
 One simple idea is to statically allocate the same exception 
 and
 rethrow it over and over. After all there's no guarantee a 
 distinct
 exception is thrown every time, and the approach is still 
 memory safe
 (though it might surprise the programmer who saves a 
 reference to an
 old exception).

 Andrei
I don't think it's that simple. What happens if an XException causes another XException and they need to be chained together?
The chaining method detects that and .dup's one of them. Andrei
What if the statically allocated XException is escaped to be inspected later, but before that is thrown again in a separate exception chain? I suppose it would be no different from the current situation, as it's legal to throw exceptions allocated in any fashion, so there is already no guarantee of uniqueness. It's probable that some code out there still takes exception uniqueness for granted, so changing the allocation scheme would be a (typically silent) breaking change, even if the code is arguably broken in the first place. I suppose we could make that breakage a compile error by making exceptions implicitly `scope` at the catch-site, but that would of course be a much more involved change... Personally I still like the idea, but if implemented, I think something should be done about the change in uniqueness at the same time, even if it's just an added note in the language documentation on exceptions.
Feb 08 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much) faster.
They can be made faster by slowing down non-exception code. This has been debated at length in the C++ community, and the generally accepted answer is that non-exception code performance is preferred and exception performance is thrown under the bus in order to achieve it. I think it's quite a reasonable conclusion.
Feb 08 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 08 Feb 2014 14:01:12 -0800,
Walter Bright <newshound2 digitalmars.com> wrote:

 On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much) faster.
They can be made faster by slowing down non-exception code. This has been debated at length in the C++ community, and the generally accepted answer is that non-exception code performance is preferred and exception performance is thrown under the bus in order to achieve it. I think it's quite a reasonable conclusion.
https://yourlogicalfallacyis.com/black-or-white

The reasons for slow exceptions in D could be the generation of stack trace strings or the garbage collector instead of inherent trade-offs to keep the successful code path fast.

And static allocation isn't exactly an appealing option...

   throw staticException ? staticException : (staticException =
   new SomethingException("Don't do this at home kids!"));

and practically out of the question when you need to chain exceptions and your call stack could contain this line of code more than once, resulting in infinite loops in exception chains as a new bug type in D, that is fixed by writing:

   catch (Exception e) {
       throw (staticException ? (e.linksTo(staticException) ?
           staticException.dupThenWrap(e) : staticException) :
           (staticException = new SomethingException("Don't do this at home kids!")));
   }

-- Marco
Feb 08 2014
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/8/2014 9:00 PM, Marco Leise wrote:
 The reasons for slow exceptions in D could be the generation
 of stack trace strings or the garbage collector instead of
 inherent trade offs to keep the successful code path fast.
Sigh, once again,

1. It is not the collector.
2. I've implemented it both ways, I know what I'm talking about. You can see the fast exception way in the Win32 code generation, and the slow way in the Linux code generation.
Feb 08 2014
parent Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 08 Feb 2014 14:01:12 -0800,
Walter Bright <newshound2 digitalmars.com> wrote:

 On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much) faster.
They can be made faster by slowing down non-exception code. This has been debated at length in the C++ community, and the generally accepted answer is that non-exception code performance is preferred and exception performance is thrown under the bus in order to achieve it. I think it's quite a reasonable conclusion.
On Sat, 08 Feb 2014 21:31:53 -0800, Walter Bright <newshound2 digitalmars.com> wrote:
 On 2/8/2014 9:00 PM, Marco Leise wrote:
 The reasons for slow exceptions in D could be the generation
 of stack trace strings or the garbage collector instead of
 inherent trade offs to keep the successful code path fast.
Sigh, once again, 1. It is not the collector 2. I've implemented it both ways, I know what I'm talking about. You can see the fast exception way in the Win32 code generation, and the slow way in the Linux code generation.
Ok, I'm on Linux which should be inherently slower at throwing exceptions as you say. So I've written a little test and it shows two things:

1. You are right about the collector. It is not the bottleneck.
2. It doesn't have anything to do with trading speed for the successful code path either.

I called two functions recursively until a nesting depth of 1000. The first version allocates a new exception, the second one reuses an existing exception. At the call site I caught the exception. I did this 10_000 times in a loop. [The code is attached.]

Even at this nesting depth the second version still outperformed the first one by a factor of ~200(!) and all the CPU time (>98%) was spent somewhere in libc. Using static exceptions (or similarly in C++: throwing literal strings) is VERY fast in D already and I see no reason to improve that at the moment.

So I repeat my point: The reasons for slow exceptions in D could be the generation of stack trace strings or anything else other than some inherent trade-offs to keep the successful code path fast.

-- Marco
Feb 08 2014
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Sunday, 9 February 2014 at 05:00:15 UTC, Marco Leise wrote:
 And static allocation isn't an exactly appealing option...

   throw staticException ? staticException : (staticException =
   new SomethingException("Don't do this at home kids!"));

 and practically out of question when you need to chain
 exceptions and your call stack could contain this line of code
 more than once, resulting in infinite loops in exception
 chains as a new bug type in D, that is fixed by writing:

   catch (Exception e) {
       throw (staticException ? (e.linksTo(staticException) ? 
 staticException.dupThenWrap(e) : staticException) : 
 (staticException = new SomethingException("Don't do this at 
 home kids!"));
   }
This doesn't seem like a valid concern. Nothing stops you from using a (standard) function to do that ugly boilerplate.
Feb 09 2014
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Sunday, 9 February 2014 at 05:00:15 UTC, Marco Leise wrote:
 https://yourlogicalfallacyis.com/black-or-white
Off topic, but that is a fantastic web site. I wish I had known about it before.
Feb 09 2014
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/8/14, 9:00 PM, Marco Leise wrote:
 On Sat, 08 Feb 2014 14:01:12 -0800,
 Walter Bright <newshound2 digitalmars.com> wrote:

 On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much) faster.
They can be made faster by slowing down non-exception code. This has been debated at length in the C++ community, and the generally accepted answer is that non-exception code performance is preferred and exception performance is thrown under the bus in order to achieve it. I think it's quite a reasonable conclusion.
https://yourlogicalfallacyis.com/black-or-white The reasons for slow exceptions in D could be the generation of stack trace strings or the garbage collector instead of inherent trade offs to keep the successful code path fast.
This thread is about memory allocation, not exceptions being slow.
 And static allocation isn't an exactly appealing option...

    throw staticException ? staticException : (staticException =
    new SomethingException("Don't do this at home kids!"));
Function calls could do that. Andrei
Feb 09 2014
prev sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an 
 exceptional case because it's expected to happen and the 
 program is expected to handle it (and let the user know) when 
 it does. That's just a matter of taste though.
Hmm... then what _does_ qualify as exceptional in your opinion? A logic error (i.e. a mistake on the programmer's side) doesn't, IMO; it should abort instead. On the other hand, there is the class of situations where e.g. a system call returns an error (say, "permission denied" when opening a file, or out of disk space). Or more generally, an external service, like a database or a remote server. However, I can't see how these are fundamentally different from invalid user input, and indeed, there's often not even a clear separation, e.g. when a user asked you to read a file they don't have access to. So, what's left then?
Feb 07 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 7 February 2014 at 14:26:48 UTC, Marc Schütz wrote:
 Hmm... then what _does_ qualify as exceptional in your opinion?

 A logic error (i.e. a mistake on the programmers side) doesn't, 
 IMO, it should abort instead. On the other hand, there is the 
 class of situations where e.g. a system call returns an error 
 (say, "permission denied" when opening a file, or out of disk 
 space). Or more generally, an external service, like a database 
 or a remote server. However, I can't see how these are 
 fundamentally different from invalid user input, and indeed, 
 there's often not even a clear separation, e.g. when a user 
 asked you to read a file they don't have access to.

 So, what's left then?
It is an exceptional situation if input is supposed to be valid but surprisingly is not. For example, calling `decodeGrapheme` on an external string without making sure it is valid first. Same goes for files - trying to open a missing file is exceptional, but checking for file presence is not.
Feb 07 2014
parent "Marc Schütz" <schuetzm gmx.net> writes:
On Friday, 7 February 2014 at 14:42:18 UTC, Dicebot wrote:
 On Friday, 7 February 2014 at 14:26:48 UTC, Marc Schütz wrote:
 Hmm... then what _does_ qualify as exceptional in your opinion?

 A logic error (i.e. a mistake on the programmers side) 
 doesn't, IMO, it should abort instead. On the other hand, 
 there is the class of situations where e.g. a system call 
 returns an error (say, "permission denied" when opening a 
 file, or out of disk space). Or more generally, an external 
 service, like a database or a remote server. However, I can't 
 see how these are fundamentally different from invalid user 
 input, and indeed, there's often not even a clear separation, 
 e.g. when a user asked you to read a file they don't have 
 access to.

 So, what's left then?
It is exceptional situation if input is supposed to be valid but surprisingly is not. For example, calling `decodeGrapheme` on external string without making sure it is valid first.
If the function expects it to be valid but you pass it an invalid value, you're breaking the contract, which is a logic error and thus should be checked for by assert, not by an exception. => Case number one: logic errors, no exceptions should be used here. If however the function doesn't require it to be valid (for `decodeGrapheme` the docs don't say anything, so I assume it doesn't), then it needs to be able to handle invalid input, for example by throwing an exception. => This is an example of case number two: user errors, exceptions are okay here. But Brad Anderson seems to disagree on case two (or maybe case one?). Or is there a third type of situation not covered by these two cases?
 Same goes for file - trying open a missing file is exceptional, 
 but checking for file presence is not.
I agree here, checking for presence is not exceptional.
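
The two cases distinguished above can be sketched as follows. This is only an illustration under assumptions: elementAt and parseAge are hypothetical functions, and the 0..150 range is an arbitrary choice. A contract violation by the caller is checked with assert in an `in` contract, while text coming from outside the program is rejected with an exception.

```d
import std.conv : to;
import std.exception : assertThrown, enforce;

// Case one: the caller promises a valid index. Breaking that promise is a
// bug in the program, so it is checked with assert in an `in` contract.
int elementAt(const int[] values, size_t index)
in
{
    assert(index < values.length, "index out of range is a caller bug");
}
do
{
    return values[index];
}

// Case two: the text comes from outside the program, so invalid input is an
// expected runtime condition and is reported with an exception.
int parseAge(string text)
{
    immutable age = text.to!int; // throws ConvException on malformed text
    enforce(age >= 0 && age <= 150, "age out of range");
    return age;
}

unittest
{
    assert(elementAt([10, 20, 30], 1) == 20);
    assert(parseAge("42") == 42);
    assertThrown(parseAge("forty-two")); // malformed input
    assertThrown(parseAge("-5"));        // well-formed but out of range
}
```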
Feb 07 2014
prev sibling next sibling parent "Ola Fosheim Grøstad" writes:
On Friday, 7 February 2014 at 14:26:48 UTC, Marc Schütz wrote:
 or a remote server. However, I can't see how these are 
 fundamentally different from invalid user input, and indeed, 
 there's often not even a clear separation, e.g. when a user 
 asked you to read a file they don't have access to.
I agree. Any situation where it makes sense to say: "Ouch, this is not going to work out, roll back, roll back, let's move out of this module! We need to try a different approach. We are not going to continue with anything productive down this lane, let's go back to the context and get into a new direction." is suitable for exceptions and it makes code reuse, evolution and modification to error reporting easy.

- validation and veracity checking
- authentication failures
- database failures
- transactional retries
- serious allocation issues
- timeouts

are all fiiine for exceptions. You get to write a request handler like this:

   {
       auto sid = request.authenticate();
       auto data = validator(request.getPost('label1','label2','label3'));
       auto key = model.create_and_put(sid,data);
       response.writeJson(key);
       response.status = 201;
       return;
   }

And you can change the error reporting at the request dispatcher level rather than sifting through 20 different spaghetti-like request handlers trying to figure out if you got it right:

   {
       auto sid = request.authenticate();
       if (sid<0){ ... return ...; }
       auto data = request.getPost('label1','label2','label3');
       if (data){
           data = validate(data);
           if (data){
               auto key = model.create_and_put(sid,data);
               if (key){
                   auto ok = response.writeJson(key);
                   if(ok){
                       response.status = 201;
                       return;
                   }
                   ....;
               } else {
                   .... ;
               }
           } else {
               .... ;
           }
       } else {
           ... ;
       }
   }
Feb 07 2014
prev sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, February 07, 2014 14:26:47 Marc Schütz <schuetzm gmx.net> wrote:
 On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an
 exceptional case because it's expected to happen and the
 program is expected to handle it (and let the user know) when
 it does. That's just a matter of taste though.
Hmm... then what _does_ qualify as exceptional in your opinion?
Honestly, I think that the typical approach of discussing exceptions as being for "exceptional" circumstances is bad. It inevitably leads to confusion and debate over what "exceptional" means. Some programmers would consider that to mean any bad input, whereas others would take it to the extreme that they should only happen when your program is in an invalid state (essentially what we use Errors for). I've found rather that when discussing exceptions it works much better to explain exactly why you'd use them, and I think that that comes primarily down to three types of circumstances.

1. Code which should succeed most of the time and which would be far cleaner if it's written to throw exceptions - particularly when the alternative would be to check error codes on every function call (which would be incredibly error-prone).

A prime example of this would be a parser. It's far cleaner to write a parser which assumes that each step succeeds than it is to constantly check that each one succeeded. It makes it so that only code that could actually encounter an error has to check for it and so that it can easily and cleanly propagate the error to the top. Doing that with error codes would generally be a mess, and unless failure is the norm, efficiency shouldn't be a problem.

2. Code which you can't actually guarantee will ever succeed. There are some cases where you can avoid errors by doing validation before proceeding (e.g. testing strings for Unicode correctness before doing a lot of string processing), but there are others where you either can't validate ahead of time or where you could still end up with an error in spite of your validation.

A prime example of this would be operating on files. For instance, std.file.isDir will tell you whether a particular file is a directory or not by returning bool. If that file does not actually exist, then what is isDir supposed to do? All it can do is throw an exception, unless you want to have a separate out parameter to report whether it succeeded or not or change it so that it returns an error code and returns the bool as an out parameter, both of which would make it much uglier to use. And isDir can't assert that the file exists, because that's a runtime condition that cannot be fully verified ahead of time. You can (and should) check whether the file exists first

   if(file.exists)
   {
       if(file.isDir) {}
       else if(file.isFile) {}
       else {}
   }

but the file system could actually delete that file right out from under you between the call to exists and the call to isDir (or between the calls to isDir and isFile), so validation reduces how often you hit the error case but cannot eliminate it. It should also be rare that isDir will fail (since you should be checking that the file exists first). So, throwing an exception makes perfect sense. You get clean code that's still able to handle error cases rather than them being ignored (as frequently happens with error codes).

3. Code which should succeed most of the time but where doing validation essentially requires doing what you're validating for anyway. Again, parsers are a good example of this. For instance, to validate that "2013-12-22T01:22:27z" is in the valid ISO extended string format for a timestamp, you have to do pretty much exactly the same work that you have to do to parse out all of the values to convert it to something other than a string (e.g. SysTime). So, if you validated it first, you'd be doing the work twice. As such, why validate first? Just have it throw an exception when the parsing fails. And if for some reason, you expect that there's a high chance that the parsing would fail, then you can have a function which returns an error code and passes out the result as an out parameter instead, but that makes the code much uglier and error-prone. So, in most cases, you'd want it to throw an exception on failure. But regardless, you wouldn't want to validate it first as that would just be expensive all the time rather than more expensive in the (hopefully) rare error case.

The areas where you normally want to avoid exceptions are when you're validating up front or when the error condition is likely. If you're validating, you're normally asking a question - is this data valid - in which case, returning bool is the correct thing to do, not throwing on failure (though if the result is false, the caller could choose to throw if appropriate). And trying to do something which has a good chance of failing should probably return whether it succeeded or not, because you don't want exceptions to be your normal code path.

Also, performance-critical stuff may need to go the error-code path rather than exceptions simply due to it being performance-critical, but in general, error conditions which aren't bugs in your program should be reported via exceptions (not error codes) with validation being used where appropriate to make it so that the error conditions are infrequent.

- Jonathan M Davis
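
The file example in point 2 can be sketched like this. The function name classify and the returned labels are made up for illustration; exists, isDir, isFile and FileException are the std.file names. The up-front exists check keeps the error path rare, and the catch covers the window in which the file can disappear between the checks.

```d
import std.file : exists, isDir, isFile, FileException;

// Hypothetical helper: validates first, but still handles the race in which
// the file vanishes between the call to exists and the later queries.
string classify(string path)
{
    try
    {
        if (!path.exists)
            return "missing";
        if (path.isDir)
            return "directory";
        if (path.isFile)
            return "file";
        return "other";
    }
    catch (FileException e)
    {
        // The path existed a moment ago but could no longer be queried.
        return "vanished";
    }
}

unittest
{
    assert(classify(".") == "directory");
    assert(classify("no/such/path/hopefully-f3a9") == "missing");
}
```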
Feb 07 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 3. Code which should succeed most of the time but where doing 
 validation
 essentially requires doing what you're validating for anyway. 
 Again, parsers
 are a good example of this. For instance, to validate that
 "2013-12-22T01:22:27z" is in the valid ISO extended string 
 format for a
 timestamp, you have to do pretty much exactly the same work 
 that you have to
 do to parse out all of the values to convert it to something 
 other than a
 string (e.g. SysTime). So, if you validated it first, you'd be 
 doing the work
 twice. As such, why validate first? Just have it throw an 
 exception when the
 parsing fails. And if for some reason, you expect that there's 
 a high chance
 that the parsing would fail, then you can have a function which 
 returns an
 error code and passes out the result as an out parameter 
 instead, but that
 makes the code much uglier and error-prone. So, in most cases, 
 you'd want it
 to throw an exception on failure.
Languages with a good type system solve this with Maybe / Nullable / Optional and similar things. It's both safe and efficient (if the result is equivalent to just a wrapping struct).

Bye, bearophile
Feb 07 2014
parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, February 07, 2014 21:27:04 bearophile wrote:
 Jonathan M Davis:
 3. Code which should succeed most of the time but where doing
 validation
 essentially requires doing what you're validating for anyway.
 Again, parsers
 are a good example of this. For instance, to validate that
 "2013-12-22T01:22:27z" is in the valid ISO extended string
 format for a
 timestamp, you have to do pretty much exactly the same work
 that you have to
 do to parse out all of the values to convert it to something
 other than a
 string (e.g. SysTime). So, if you validated it first, you'd be
 doing the work
 twice. As such, why validate first? Just have it throw an
 exception when the
 parsing fails. And if for some reason, you expect that there's
 a high chance
 that the parsing would fail, then you can have a function which
 returns an
 error code and passes out the result as an out parameter
 instead, but that
 makes the code much uglier and error-prone. So, in most cases,
 you'd want it
 to throw an exception on failure.
Languages with a good type system solve this with Maybe / Nullable / Optional and similar things. It's both safe and efficient (if the result is equivalent to just a wrapping struct).
That can be a good solution, but it also then requires checking the result. One of the big advantages of exceptions is that your code can not care except for the relatively few points that catch exceptions and handle them. Where you run into problems is when the failure case is likely. And if that's the case, then something like Maybe or Nullable is definitely better. - Jonathan M Davis
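For comparison, here is a minimal sketch of the Nullable style discussed above, assuming std.typecons.Nullable and std.conv.to. `tryParseInt` is a hypothetical name, not a Phobos function; it simply wraps the throwing conversion so that the caller gets a value it has to inspect.

```d
import std.conv : to, ConvException;
import std.typecons : Nullable;

/// Hypothetical helper: a null Nullable signals a failed parse.
Nullable!int tryParseInt(string s)
{
    try
    {
        return Nullable!int(s.to!int);
    }
    catch (ConvException)
    {
        // Swallow the conversion failure and report "no value" instead.
        return Nullable!int.init;
    }
}
```

A caller then writes something like `auto r = tryParseInt(input); if (!r.isNull) use(r.get);`, which is the per-call check the post above refers to. Note that this sketch still pays for the exception internally; it only changes what the caller sees.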
Feb 07 2014
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
 Any application that operates on some external user input will 
 be subject to DoS attack vector if it uses Phobos directly.
Hmm, I hadn't considered that. Maybe exceptions could be handled automatically though, due to the fact that there is rarely more than one in flight at any time and they typically don't live for long:

1) Prohibit escaping of exception objects from catch blocks (we could just say it is undefined behavior in the spec). The data pointed to by the throwable object should be normal though; if you want to keep the exception, you can thus just shallow copy it.

2) Set aside a static (thread local) buffer early on with a size of like 512 bytes.

3) Make "throw new" call a special function which favors the static buffer. It can do a simple bump-the-pointer allocation in the static region or call the regular GC if there isn't enough room (should be extremely rare). throw e; works the same way it does now. You can pre-allocate with some other method if you want.

4) Have the compiler automatically insert a call to _d_free_exception in a scope(success) block inside every catch block. It checks the given reference; if it is in the static buffer, just zero it all out. If all the chain is in there, zeroing it will free it all. If there's any GC chained exceptions, zeroing it will orphan them and they'll be freed on the next sweep. Otherwise ... well do nothing, let the GC clean up after it.

Proof of concept:

    bool isThrowable(const ClassInfo ci) {
        if(ci is null) return false;
        if(ci is typeid(Throwable)) return true;
        return isThrowable(ci.base);
    }

    byte[512] exceptionHolder = 0;
    size_t exceptionHolderPosition = 0;

    extern(C) Object _d_newclass(const ClassInfo ci) {
        if(!isThrowable(ci))
            return _d_newclass_original(ci);

        auto size = ci.init.length;
        if(exceptionHolderPosition + size > exceptionHolder.length)
            return _d_newclass_original(ci);

        byte[] slice = exceptionHolder[exceptionHolderPosition .. exceptionHolderPosition + size];
        exceptionHolderPosition += size;
        slice[] = ci.init[];

        import core.stdc.stdio;
        printf("Magic allocation to %d\n", exceptionHolderPosition);

        return cast(Object) slice.ptr;
    }

    extern(C) void _d_freeexception(Throwable t) {
        auto ptr = cast(void*) t;
        if(ptr >= exceptionHolder.ptr && ptr < exceptionHolder.ptr + exceptionHolder.length) {
            exceptionHolder[] = 0;
            exceptionHolderPosition = 0;
            import core.stdc.stdio;
            printf("Freeing\n");
        }
        // else do nothing, the GC will handle it
    }

    void main() {
        import std.stdio;
        try {
            writefln("%s"); // orphaned argument
        } catch(Exception e) {
            scope(success) _d_freeexception(e);
            writeln(e);
        }
    }

    // copy/paste from druntime as fallback
    extern (C) void onOutOfMemoryError();
    extern (C) void* gc_malloc( size_t sz, uint ba = 0 );

    extern (C) Object _d_newclass_original(const ClassInfo ci) {
        import core.stdc.stdlib;
        static import core.memory;
        alias BlkAttr = core.memory.GC.BlkAttr;

        void* p;
        if (ci.m_flags & TypeInfo_Class.ClassFlags.isCOMclass) {
            p = malloc(ci.init.length);
            if (!p) onOutOfMemoryError();
        } else {
            // TODO: should this be + 1 to avoid having pointers to the next block?
            BlkAttr attr = BlkAttr.FINALIZE;
            // extern(C++) classes don't have a classinfo pointer in their vtable so the GC can't finalize them
            if (ci.m_flags & TypeInfo_Class.ClassFlags.isCPPclass)
                attr &= ~BlkAttr.FINALIZE;
            if (ci.m_flags & TypeInfo_Class.ClassFlags.noPointers)
                attr |= BlkAttr.NO_SCAN;
            p = gc_malloc(ci.init.length, attr);
        }

        // initialize it
        (cast(byte*) p)[0 .. ci.init.length] = ci.init[];

        debug(PRINTF) printf("initialization done\n");
        return cast(Object) p;
    }

===

Just compile and run normally, the linker will prefer our _d_newclass to the one in phobos.lib automatically. And you'll see the throw from writefln went into our static buffer and was freed at the end.
I toyed with a few other things too:

    void main() {
        import std.stdio;
        try {
            try {
                writefln("%s"); // orphaned argument
            } catch(Exception e) {
                scope(success) _d_freeexception(e); // don't forget these
                throw new Exception("LOL", e);
            }
        } catch(Exception e) {
            scope(success) _d_freeexception(e);
            writeln(e);
            writeln(e.next);
        }
    }

still works. Am I missing a fatal flaw here? It seems to work and is kinda simple to do... exceptions don't really need a huge amount of dynamic memory.
Feb 06 2014
next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 6 February 2014 at 22:56:45 UTC, Adam D. Ruppe wrote:
 Proof of concept:
code in a link so the lines aren't broken http://arsdnet.net/dcode/except.d
Feb 06 2014
prev sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Thursday, 6 February 2014 at 22:56:45 UTC, Adam D. Ruppe wrote:
 On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
 Any application that operates on some external user input will 
 be subject to DoS attack vector if it uses Phobos directly.
Hmm, I hadn't considered that. Maybe exceptions could be handled automatically though due to the facts that there are rarely more than one in flight at any time and they typically don't live for long: [snipped lengthy example]
I really like vibe.d. A lot. But the way HTTP parse errors are handled is a disaster. Do you know what happened when I was testing vibe.d recently and I sent it a bad request? It sent a stack trace as a response. A stack trace! To a client! I was speechless. Needless to say, I don't support the idea of further enabling this design, regardless of whether it can be made a pinnacle of elegance.
Feb 06 2014
next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 03:19:32 UTC, Sean Kelly wrote:
 It sent a stack trace as a response.  A stack trace!  To a 
 client!  I was speechless.
lol, my cgi.d will do that too if you compile with -debug.... I find it convenient at times. (It also sends it to stderr but when doing cgi apps, that means digging into the apache log which is a pain compared to just looking at the browser)
Feb 06 2014
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-02-07 04:19, Sean Kelly wrote:

 I really like vibe.d.  A lot.  But the way HTTP parse errors are handled
 is a disaster.  Do you know what happened when I was testing vibe.d
 recently and I sent it a bad request?  It sent a stack trace as a
 response.  A stack trace!  To a client!  I was speechless.  Needless to
 say, I don't support the idea of further enabling this design,
 regardless of whether it can be made a pinnacle of elegance.
Ruby on Rails renders a page with a stack trace in development mode and a standard 500 page in production mode. I can't understand how anyone can do web development without that. There's even a plugin that renders the stack trace as links pointing back to your editor (if supported). It also allows you to navigate the stack trace with a code snippet and simple debugger for each stack frame. Very convenient. -- /Jacob Carlborg
Feb 07 2014
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 7 February 2014 at 20:31:00 UTC, Jacob Carlborg wrote:
 On 2014-02-07 04:19, Sean Kelly wrote:

 I really like vibe.d.  A lot.  But the way HTTP parse errors 
 are handled
 is a disaster.  Do you know what happened when I was testing 
 vibe.d
 recently and I sent it a bad request?  It sent a stack trace 
 as a
 response.  A stack trace!  To a client!  I was speechless.  
 Needless to
 say, I don't support the idea of further enabling this design,
 regardless of whether it can be made a pinnacle of elegance.
Ruby on Rails renders a page with a stack trace in development mode and a standard 500 page in production mode. I can't understand how anyone can do web development without that. There's even a plugin that renders the stack trace as links pointing back to your editor (if supported). It also allows you to navigate the stack trace with a code snippet and simple debugger for each stack frame. Very convenient.
I was mostly surprised that the stack trace was written back to the client. I'd expect something like that in a log on the server side. I do see how it would be convenient to have a stack trace included in a bug report, but if this feature is disabled in release mode then you can't rely on it anyway. I'd just always be checking the logs (where I'd hope the stack trace would always be written).
Feb 07 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-02-07 21:56, Sean Kelly wrote:

 I was mostly surprised that the stack trace was written back to
 the client.  I'd expect something like that in a log on the
 server side.  I do see how it would be convenient to have a stack
 trace included in a bug report, but if this feature is disabled
 in release mode then you can't rely on it anyway.  I'd just
 always be checking the logs (where I'd hope the stack trace would
 always be written).
Ruby on Rails always writes the stack trace to the log. In development mode it will also render it to the client. In production mode we use a plugin that sends an email when an exception occurs. The email will contain the full stack trace, environment variables and some other data about the request that failed. BTW, you can do a lot more with HTML than plain text (log files). -- /Jacob Carlborg
Feb 09 2014
prev sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
 Hardly so. Any exception allocation can trigger GC collection 
 cycle and Phobos does not provide any other way to handle data 
 errors. Any application that operates on some external user 
 input will be subject to DoS attack vector if it uses Phobos 
 directly.
Thinking about this more it'd probably be a good idea to use the type system to segregate non-validated user input from the rest of your program. UnvalidatedString or something. UnvalidatedString.validate() returns a string you can then use in the regular fashion. That way unvalidated data can't weasel its way into the trusted portion of your program without getting checked first. Anyway, that's just an idea (and getting further and further off topic).
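A rough sketch of what such a wrapper might look like, purely as an illustration of the idea. `UnvalidatedString` and `countBytes` are hypothetical names; the only Phobos pieces assumed are `std.utf.validate` (renamed on import to avoid clashing with the method) and `UTFException`. Note that `private` in D is module-level, so the segregation only holds for code outside the defining module.

```d
import std.utf : utfValidate = validate, UTFException;

/// Hypothetical wrapper: holds raw input and exposes it only via validate().
struct UnvalidatedString
{
    private string payload;

    this(string raw) { payload = raw; }

    /// Returns an ordinary string if the payload is well-formed UTF;
    /// throws UTFException otherwise.
    string validate()
    {
        utfValidate(payload);
        return payload;
    }
}

/// Stands in for the "trusted" part of a program: it accepts only string,
/// so an UnvalidatedString cannot be passed without calling validate().
size_t countBytes(string trusted)
{
    return trusted.length;
}
```

Because the struct has no `alias this` and no implicit conversion, `countBytes(UnvalidatedString("x"))` does not compile, while `countBytes(UnvalidatedString("x").validate())` does.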
Feb 06 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 7 February 2014 at 05:25:26 UTC, Brad Anderson wrote:
 Thinking about this more it'd probably be a good idea to use the
 type system to segregate non-validated user input from the rest
 of your program. UnvalidatedString or something.
 UnvalidatedString.validate() returns a string you can then use 
 in
 the regular fashion. That way unvalidated data can't weasel its
 way into the trusted portion of your program without getting
 checked first. Anyway, that's just an idea (and getting further
 and further off topic).
Yes, I even had some simple proof-of-concept drafts of such an approach for vibe.d but have never finished it. User input is not a problem if Phobos provides more strongly typed nothrow tools.
Feb 07 2014
parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 11:06:47 UTC, Dicebot wrote:
 Yes, I even had some simple proof-of-concept drafts of such 
 approach for vibe.d but have never finished it. User input is 
 not a problem if Phobos will provide more strongly typed 
  nothrow tools.
Yeah, I think using separate types for printing to users is often a good idea too, since then the type system can help with i18n.
Feb 07 2014
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Feb 2014 14:08:39 -0500, Adam D. Ruppe  
<destructionator gmail.com> wrote:

 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?
I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path. If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)
I think if reference counting is added, exceptions would be a prime candidate for using it. They are basically discarded immediately after being handled. -Steve
Feb 06 2014
prev sibling next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei 
 Alexandrescu wrote:
 One interesting point is that module that were written with 
 avoiding
 allocations in mind usually still allocate when throwing 
 exceptions.
Good point, we need to address that as well.
Hey, wait a second. How do you throw without allocating?
Does this case even matter? Exceptions are not a normal function of execution, and so should happen rarely to never. And it's a time when I'd expect a delay anyway.
Feb 06 2014
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/6/14, 11:54 AM, Sean Kelly wrote:
 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
 One interesting point is that module that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.
Good point, we need to address that as well.
Hey, wait a second. How do you throw without allocating?
Does this case even matter? Exceptions are not a normal function of execution, and so should happen rarely to never. And it's a time when I'd expect a delay anyway.
I think it's okay to put this on the backburner and revisit it later. Andrei
Feb 06 2014
prev sibling next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 6 February 2014 at 19:54:27 UTC, Sean Kelly wrote:
 Does this case even matter?  Exceptions are not a normal 
 function of execution, and so should happen rarely to never.  
 And it's a time when I'd expect a delay anyway.
Imagine intentionally crafted broken utf as user input in repeated requests. You don't have control over it. Now if Phobos would have only thrown exceptions in really _exceptional_ situations and handled broken input gracefully...
Feb 06 2014
next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 6 February 2014 at 21:48:13 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 19:54:27 UTC, Sean Kelly wrote:
 Does this case even matter?  Exceptions are not a normal 
 function of execution, and so should happen rarely to never.  
 And it's a time when I'd expect a delay anyway.
Imagine intentionally crafted broken utf as user input in repeated requests. You don't have control over it. Now if Phobos would have only thrown exceptions in really _exceptional_ situations and handled broken input gracefully...
You should probably validate utf from all foreign sources. Catch a problem with it as it comes in rather than in some arbitrary part of your program.
Feb 06 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson wrote:
 You should probably validate utf from all foreign sources. 
 Catch a problem with it as it comes in rather than in some 
 arbitrary part of your program.


pure @safe void validate(S)(in S str) if (isSomeString!S);
Throws: UTFException if str is not well-formed.

;)
Feb 06 2014
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson 
 wrote:
 You should probably validate utf from all foreign sources. 
 Catch a problem with it as it comes in rather than in some 
 arbitrary part of your program.


pure @safe void validate(S)(in S str) if (isSomeString!S);
Throws: UTFException if str is not well-formed. ;)
Heh, well then... let me just wipe this egg off my face. :P
Feb 06 2014
prev sibling next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson 
 wrote:
 You should probably validate utf from all foreign sources. 
 Catch a problem with it as it comes in rather than in some 
 arbitrary part of your program.


pure @safe void validate(S)(in S str) if (isSomeString!S);
Throws: UTFException if str is not well-formed.
And somewhere in the world, darkness fell forever on a bright and beautiful countryside. The monsters poured forth and devoured everything in sight, given strength by that unbelievable abomination of a function design.
Feb 06 2014
next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 03:14:45 UTC, Sean Kelly wrote:
 On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:
 UTFException if str is not well-formed.
unbelievable abomination of a function design.
Yeah, that is absurd. It is a bad, bad sign when almost every time you use a function, you write

    bool ok = true;
    try validate(s);
    catch(UTFException) ok = false;
    if(!ok) {}

yet that's how I use validate...

Fun fact, my little toy scripting language supports

    var a = try foo();; // if foo throws, a == the exception object

but it's a toy scripting language, ugly crap is allowed there :)
Feb 06 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/6/14, 7:27 PM, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 03:14:45 UTC, Sean Kelly wrote:
 On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:
 UTFException if str is not well-formed.
unbelievable abomination of a function design.
Yeah, that is absurd. It is a bad, bad sign when almost every time you use a function, you write

    bool ok = true;
    try validate(s);
    catch(UTFException) ok = false;
    if(!ok) {}

yet that's how I use validate...
Add a bugzilla and let's define isValid that returns bool! Andrei
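As a stopgap illustration of the proposed interface, a bool-returning check can be layered on top of the existing `std.utf.validate`. The name `isValidUtf` is an assumption made for this sketch, not an existing Phobos function, and because it still catches the exception internally it does not remove the allocation on malformed input. A real implementation would presumably scan the code units directly.

```d
import std.traits : isSomeString;
import std.utf : validate, UTFException;

/// Hypothetical wrapper: answers "is this well-formed?" without
/// letting a UTFException escape to the caller.
bool isValidUtf(S)(S str)
    if (isSomeString!S)
{
    try
    {
        validate(str);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

With this in place, the try/catch boilerplate quoted above collapses to `if (!isValidUtf(s)) { ... }`.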
Feb 07 2014
parent reply "Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu 
wrote:
 Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code but assigns the return value through another parameter.
Feb 07 2014
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
07-Feb-2014 20:29, Andrej Mitrovic пишет:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec. -- Dmitry Olshansky
Feb 07 2014
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/7/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Much simpler - it returns a special dchar to designate bad encoding. And
 there is one defined by Unicode spec.
A NaN for chars? Sounds great to me! :)
Feb 07 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
07-Feb-2014 21:07, Andrej Mitrovic пишет:
 On 2/7/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Much simpler - it returns a special dchar to designate bad encoding. And
 there is one defined by Unicode spec.
A NaN for chars? Sounds great to me! :)
It's called \uFFFD and is specifically for bad encodings. I wonder why nobody had perused the spec when writing std.utf.decode in the first place...

5.22 Best Practice for U+FFFD Substitution

When converting text from one character encoding to another, a conversion algorithm may encounter unconvertible code units. This is most commonly caused by some sort of corruption of the source data, so that it does not correctly follow the specification for that character encoding. Examples include dropping a byte in a multibyte encoding such as Shift-JIS, improper concatenation of strings, a mismatch between an encoding declaration and actual encoding of text, use of non-shortest form for UTF-8, and so on.
...
Whenever an unconvertible offset is reached during conversion of a code unit sequence:
1. The maximal subpart at that offset should be replaced by a single U+FFFD.
2. The conversion should proceed at the offset immediately after the maximal subpart.
---

Fast, simple and according to the standard. Best of all - no stinkin' exceptions! ;)

-- 
Dmitry Olshansky
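A small sketch of what substitution-based decoding looks like from the caller's side, assuming a Phobos version in which `std.utf.decode` accepts the `UseReplacementDchar` flag and `std.utf.replacementDchar` is defined (both postdate this thread, as far as I know). `decodeLossy` is a hypothetical helper name. The sketch makes no claim about how many code units are consumed per malformed sequence, only that decoding never throws and that bad input shows up as U+FFFD.

```d
import std.typecons : Yes;
import std.utf : decode, replacementDchar;

/// Hypothetical helper: decodes UTF-8 into UTF-32, substituting U+FFFD
/// for malformed sequences instead of throwing UTFException.
dstring decodeLossy(string s)
{
    dstring result;
    size_t i = 0;
    while (i < s.length)
    {
        // decode advances i past whatever it consumed.
        result ~= decode!(Yes.useReplacementDchar)(s, i);
    }
    return result;
}
```

A caller that cares about corruption can still check the result for `replacementDchar`; one that does not can just carry on.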
Feb 07 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 12:14 PM, Dmitry Olshansky wrote:
 Fast, simple and according to the standard. Best of all - no stinkin'
 exceptions! ;)
Nice find. Looks good to me.
Feb 08 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Feb-2014 02:16, Walter Bright пишет:
 On 2/7/2014 12:14 PM, Dmitry Olshansky wrote:
 Fast, simple and according to the standard. Best of all - no stinkin'
 exceptions! ;)
Nice find. Looks good to me.
https://d.puremagic.com/issues/show_bug.cgi?id=12113 -- Dmitry Olshansky
Feb 08 2014
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic пишет:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
Isn't that actually worse? Unless you're suggesting that we stop throwing on decode errors, then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors. validate has no business throwing, and we definitely should add isValidUnicode (or isValid or whatever you want to call it) for validation purposes. Code can then call that to validate that a string is valid and not worry about any UTFExceptions being thrown as long as it doesn't manipulate the string in a way that could result in its Unicode becoming invalid. However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO. You're essentially dealing with error codes at that point, and I think that experience has shown quite clearly that error codes are generally a bad way to go. Almost no one checks them unless they have to. I think that having decode throw on invalid Unicode is exactly what it should be doing. The problem is that validate shouldn't. - Jonathan M Davis
Feb 07 2014
next sibling parent reply "Meta" <jared771 gmail.com> writes:
On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis 
wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic пишет:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei 
 Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
Isn't that actually worse? Unless you're suggesting that we stop throwing on decode errors, then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors. validate has no business throwing, and we definitely should add isValidUnicode (or isValid or whatever you want to call it) for validation purposes. Code can then call that to validate that a string is valid and not worry about any UTFExceptions being thrown as long as it doesn't manipulate the string in a way that could result in its Unicode becoming invalid. However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO. You're essentially dealing with error codes at that point, and I think that experience has shown quite clearly that error codes are generally a bad way to go. Almost no one checks them unless they have to. I think that having decode throw on invalid Unicode is exactly what it should be doing. The problem is that validate shouldn't. - Jonathan M Davis
You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
Feb 07 2014
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, February 07, 2014 23:01:46 Meta wrote:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis
 
 wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic пишет:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei
 
 Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
Isn't that actually worse? Unless you're suggesting that we stop throwing on decode errors, then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors. validate has no business throwing, and we definitely should add isValidUnicode (or isValid or whatever you want to call it) for validation purposes. Code can then call that to validate that a string is valid and not worry about any UTFExceptions being thrown as long as it doesn't manipulate the string in a way that could result in its Unicode becoming invalid. However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO. You're essentially dealing with error codes at that point, and I think that experience has shown quite clearly that error codes are generally a bad way to go. Almost no one checks them unless they have to. I think that having decode throw on invalid Unicode is exactly what it should be doing. The problem is that validate shouldn't. - Jonathan M Davis
You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
How is that any better than returning an invalid dchar with a specific value? In either case, you have to check the value. With the exception, code doesn't have to care. If the string is invalid, it'll get a UTFException, and it can handle it appropriately, but having to check the return value just adds overhead (albeit minimal) and is error-prone, because it generally won't be checked (and if it is checked, it complicates the calling code, because it has to do the check).

Code that doesn't want to risk a UTFException being thrown can validate up front - and that validator function should return bool and _not_ throw. But having decode not throw is going to be error-prone. It also doesn't help performance-wise, because it still has to do all of the same validity checks as it decodes. It's just that instead of throwing, it returns an error value. I really think that having decode throw on invalid Unicode is the right decision, and I don't see what we gain by making it not throw.

- Jonathan M Davis
Feb 07 2014
parent reply "Meta" <jared771 gmail.com> writes:
On Friday, 7 February 2014 at 23:45:06 UTC, Jonathan M Davis 
wrote:
 On Friday, February 07, 2014 23:01:46 Meta wrote:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis
 
 wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic пишет:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei
 
 Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns 
 bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
Isn't that actually worse? Unless you're suggesting that we stop throwing on decode errors, then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors. validate has no business throwing, and we definitely should add isValidUnicode (or isValid or whatever you want to call it) for validation purposes. Code can then call that to validate that a string is valid and not worry about any UTFExceptions being thrown as long as it doesn't manipulate the string in a way that could result in its Unicode becoming invalid. However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO. You're essentially dealing with error codes at that point, and I think that experience has shown quite clearly that error codes are generally a bad way to go. Almost no one checks them unless they have to. I think that having decode throw on invalid Unicode is exactly what it should be doing. The problem is that validate shouldn't. - Jonathan M Davis
You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
How is that any better than returning an invalid dchar with a specific value? In either case, you have to check the value. With the exception, code doesn't have to care. If the string is invalid, it'll get a UTFException, and it can handle it appropriately, but having to check the return value just adds overhead (albeit minimal) and is error-prone, because it generally won't be checked (and if it is checked, it complicates the calling code, because it has to do the check).
We have had this discussion at least once before. A hypothetical Option type will not let you do anything with the wrapped value UNTIL you check it, as opposed to returning null, -1, some special Unicode value, etc. Trying to use it before this check is necessarily a compile-time error. This is both faster than exceptions and safer than special "error values" that are only special by convention. I recall that you've worked with Haskell before, so you must know how useful this pattern is.
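To make the idea concrete, here is a rough sketch of what such a hypothetical Option type could look like in D. Nothing here is Phobos API: Option, some, none, match and tryDecodeFront are names made up for illustration, and the sketch wraps dchar (what decode returns) rather than char. The payload can only be reached by supplying a branch for both outcomes.

```d
import std.stdio : writeln;
import std.utf : decode, UTFException;

/// Hypothetical Option type (not part of Phobos). The payload is private,
/// so code outside this module can only reach it through match().
struct Option(T)
{
    private T value;
    private bool present;

    static Option some(T v) { return Option(v, true); }
    static Option none() { return Option(T.init, false); }
}

/// Forces the caller to supply a branch for both outcomes.
auto match(alias onSome, alias onNone, T)(Option!T o)
{
    return o.present ? onSome(o.value) : onNone();
}

/// Hypothetical helper: decodes the first code point of s, if there is one.
Option!dchar tryDecodeFront(string s)
{
    if (s.length == 0)
        return Option!dchar.none();
    size_t i = 0;
    try
    {
        return Option!dchar.some(decode(s, i));
    }
    catch (UTFException)
    {
        return Option!dchar.none();
    }
}

void main()
{
    foreach (s; ["abc", "\xFFabc"])
    {
        // There is no way to get at the dchar without handling "none" too.
        writeln(tryDecodeFront(s).match!(
            c => "decoded a code point",
            () => "invalid UTF-8"));
    }
}
```

Note that D's private is module-level, so the guarantee only holds for code outside the module that defines Option.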
 Code that doesn't want to risk a UTFException being thrown can 
 validate up
 front - and that validator function return bool and _not_ 
 throw. But having
 decode not throw is going to be error-prone. It also doesn't 
 help performance-
 wise, because it still has to do all of the same validity 
 checks as it
 decodes. It's just that instead of throwing, it returns an 
 error value. I
 really think that having decode throw on invalid Unicode is the 
 right
 decision, and I don't see what we gain by making it not throw.

 - Jonathan M Davis
Feb 07 2014
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Saturday, February 08, 2014 01:26:10 Meta wrote:
 You could always return an Option!char. Nullable won't work
 because it lets you access the naked underlying value.
How is that any better than returning an invalid dchar with a specific value? In either case, you have to check the value. With the exception, code doesn't have to care. If the string is invalid, it'll get a UTFException, and it can handle it appropriately, but having to check the return value just adds overhead (albeit minimal) and is error-prone, because it generally won't be checked (and if it is checked, it complicates the calling code, because it has to do the check).
We have had this discussion at least once before. A hypothetical Option type will not let you do anything with the wrapped value UNTIL you check it, as opposed to returning null, -1, some special Unicode value, etc. Trying to use it before this check is necessarily a compile-time error. This is both faster than exceptions and safer than special "error values" that are only special by convention. I recall that you've worked with Haskell before, so you must know how useful this pattern is.
The problem is that you need to check it. This is _slower_ than exceptions in the normal case, as invalid Unicode should be the rare case. The great thing with exceptions is that you can write your code as if it will always work and don't need to put checks in it everywhere. Instead, you just put try-catch blocks in the (relatively) few places that you want to handle exceptions. Most of your code doesn't care. And if you validate the string before you start doing a bunch of operations on it, then you don't have to worry about a UTFException being thrown. Also, if code fails to validate a string for one reason or another, the error gets reported rather than an invalid return value being ignored. As for returning Optional/Nullable dchar vs an invalid dchar, I don't see much difference. In both cases, you have to check the return value, which is precisely what you don't want to have to do in most cases. And decode has to do the same work to check for valid Unicode whether it throws an exception or returns a value indicating decode-failure, so why have the extra overhead of having to check the result for decode-failure? Just let it throw an exception in that case and handle it in the appropriate part of your code. Returning a Nullable result or a specific bad value that you have to check rather than throwing an exception only makes sense when it's expected that failures are going to be frequent. If failures are infrequent, it's generally far better to use exceptions, because it will lead to much cleaner, less error-prone code. - Jonathan M Davis
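As a small illustration of that style (a sketch only; countCodePoints is a made-up helper), the decoding loop is written as if it always succeeds, and the single try-catch sits at the call site that actually cares:

```d
import std.stdio : writeln;
import std.utf : decode, UTFException;

/// Counts code points; written as if decoding always succeeds.
size_t countCodePoints(string s)
{
    size_t n = 0;
    for (size_t i = 0; i < s.length; ++n)
        decode(s, i); // advances i; throws UTFException on invalid UTF-8
    return n;
}

void main()
{
    try
    {
        writeln(countCodePoints("h\u00E4l"));
        writeln(countCodePoints("a\xFFb")); // contains an invalid byte
    }
    catch (UTFException e)
    {
        // The only place in this program that deals with bad input.
        writeln("invalid UTF-8: ", e.msg);
    }
}
```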
Feb 07 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 The problem is that you need to check it. This is _slower_ than 
 exceptions in the normal case,
Right, but verifying the correctness of the Unicode encoding of a string probably on average requires much more time than testing a single conditional. So I think this tiny added time is acceptable. Bye, bearophile
Feb 07 2014
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Saturday, February 08, 2014 02:41:54 bearophile wrote:
 Jonathan M Davis:
 The problem is that you need to check it. This is _slower_ than
 exceptions in the normal case,
Right, but verifying the correctness of the Unicode encoding of a string probably on average requires much more time than testing a single conditional. So I think this tiny added time is acceptable.
But why even do it in the first place then? The code is cleaner and less error-prone if it uses exceptions. The only argument I can see being made for not using exceptions with decode is efficiency, because it's more cumbersome to use if it's returning error values of some kind rather than just throwing in the rare case that there's a Unicode decoding error. It's also more error-prone than using exceptions, because most code will just skip checking the result. That's one of the big reasons that error codes are generally a bad idea. But since decode has to do the same validity checks whether it returns an invalid dchar or a Nullable!dchar or if it throws, I don't see why not having the exception buys us anything. It just makes the API worse. - Jonathan M Davis
Feb 07 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 07 Feb 2014 22:42:00 -0500
schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:

 On Saturday, February 08, 2014 02:41:54 bearophile wrote:
 Jonathan M Davis:
 The problem is that you need to check it. This is _slower_ than
 exceptions in the normal case,
Right, but verifying the correctness of the Unicode encoding of a string probably on average requires much more time than testing a single conditional. So I think this tiny added time is acceptable.
But why even do it in the first place then? The code is cleaner and less error-prone if it uses exceptions. The only argument I can see being made for not using exceptions with decode is efficiency, because it's more cumbersome to use if it's returning error values of some kind rather than just throwing in the rare case that there's a Unicode decoding error. It's also more error-prone than using exceptions, because most code will just skip checking the result. That's one of the big reasons that error codes are generally a bad idea. But since decode has to do the same validity checks whether it returns an invalid dchar or a Nullable!dchar or if it throws, I don't see why not having the exception buys us anything. It just makes the API worse. - Jonathan M Davis
I agree with both of you. The Unicode standard tells us that it is correct to replace invalid data with that special code point, so it should be used where applicable, e.g. when one sanitizes an invalid string. On the other hand exceptions are clearly superior to error returns. I guess we just have two use cases here. One where invalid encoding is not an error (e.g. for sanitizing purposes) and one where you don't want to lose information and have to enforce correct encoding. Name the first one "decodeSubst" maybe and have decode call that and check for 0xFFFD? -- Marco
Feb 07 2014
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 8 Feb 2014 05:29:35 +0100
schrieb Marco Leise <Marco.Leise gmx.de>:

 Name the first one "decodeSubst" maybe and have decode call
 that and check for 0xFFFD?
Err... the other way round. 0xFFFD would actually be valid from an encoding point of view, I guess. -- Marco
Feb 07 2014
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, February 08, 2014 05:29:35 Marco Leise wrote:
 I guess we just have two use cases here. One where invalid
 encoding is not an error (e.g. for sanitizing purposes) and
 one where you don't want to lose information and have to
 enforce correct encoding.
 Name the first one "decodeSubst" maybe and have decode call
 that and check for 0xFFFD?
I think that that would call for us to have 3 related but distinct functions:

1. decode, which throws on invalid Unicode. We already have this.

2. isValidUnicode, which returns whether the string is valid Unicode and does not throw. We don't yet have this. Rather, we have validate which does the same job and then throws instead of returning bool.

3. sanitizeUnicode (or whatever would be a good name for it), which replaces invalid Unicode with 0xFFFD (or whatever the appropriate character is) so that it can be operated on without causing decode to throw in spite of the fact that it was invalid Unicode. We don't have anything like this yet.

- Jonathan M Davis
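A minimal sketch of how the two missing pieces might look, assuming the names from the list above. isValidUnicode and sanitizeUnicode are hypothetical; the first is built on the existing std.utf.validate, and the second simply forwards to std.encoding.sanitize, which is an assumption about what a suitable implementation would do rather than a settled design.

```d
import std.stdio : writeln;

/// Hypothetical non-throwing validator, built on the existing std.utf.validate.
bool isValidUnicode(string s) nothrow
{
    import std.utf : validate;
    try
    {
        validate(s);
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}

/// Hypothetical sanitizer. Forwarding to std.encoding.sanitize is an
/// assumption made for this sketch; it replaces malformed sequences with
/// the encoding's replacement character.
string sanitizeUnicode(string s)
{
    import std.encoding : sanitize;
    return sanitize(s);
}

void main()
{
    string bad = "abc\xFF"; // 0xFF can never appear in well-formed UTF-8
    writeln(isValidUnicode("abc"));
    writeln(isValidUnicode(bad));
    writeln(isValidUnicode(sanitizeUnicode(bad)));
}
```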
Feb 07 2014
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 07 Feb 2014 21:04:08 -0800
schrieb Jonathan M Davis <jmdavisProg gmx.com>:

 On Saturday, February 08, 2014 05:29:35 Marco Leise wrote:
 I guess we just have two use cases here. One where invalid
 encoding is not an error (e.g. for sanitizing purposes) and
 one where you don't want to lose information and have to
 enforce correct encoding.
 Name the first one "decodeSubst" maybe and have decode call
 that and check for 0xFFFD?
I think that that would call for us to have 3 related but distinct functions: 1. decode, which throws on invalid Unicode. We already have this. 2. isValidUnicode, which returns whether the string is valid Unicode and does not throw. We don't yet have this. Rather, we have validate which does the same job and then throws instead of returning bool.
Yes, that's the one that needs to be added.
 3. sanitizeUnicode (or whatever would be a good name for it), which replaces 
 invalid Unicode with 0xFFFD (or whatever the appropriate character is) so that 
 it can be operated on without causing decode to throw in spite of the fact 
 that it was invalid Unicode. We don't have anything like this yet.
And oh wonder, we actually have that already! Problem solved: (Not that I knew that before hand *cough*) Or does someone have a need to also sanitize code point by code point?
 - Jonathan M Davis
-- Marco
Feb 07 2014
prev sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Saturday, 8 February 2014 at 05:04:35 UTC, Jonathan M Davis
wrote:
 I think that that would call for us to have 3 related but 
 distinct functions:

 1. decode, which throws on invalid Unicode. We already have 
 this.
I wonder if it'd be too reckless to just make decode for string nothrow (we want this function to be as fast as possible) and just require that string, by definition, must be valid unicode. to!string and company could validate strings as they come in from foreign sources. This way invalid unicode is caught early and decode gets a speedup. char[] is different because the mutability means it could be made invalid at any time so we can't rely on it staying valid after it's been checked but once a string has been confirmed valid there is no reason to check it for validity ever again.
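One thing worth checking against this idea is slicing, since D slices strings by code unit and not by code point. The sketch below (isWellFormed is a made-up helper around std.utf.validate) shows that a slice of a validated string is not necessarily valid itself:

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Helper for this example: true if s is well-formed UTF-8.
bool isWellFormed(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string face = "\u2639";     // U+2639, encoded as three UTF-8 code units
    string tail = face[1 .. $]; // slices by code unit, cutting into the sequence
    writeln(face.length);
    writeln(isWellFormed(face));
    writeln(isWellFormed(tail));
}
```

So immutability alone does not keep a string valid; any operation that slices at arbitrary code unit offsets can produce an invalid one.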
Feb 08 2014
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 02/08/2014 07:44 PM, Brad Anderson wrote:
 On Saturday, 8 February 2014 at 05:04:35 UTC, Jonathan M Davis
 wrote:
 I think that that would call for us to have 3 related but distinct
 functions:

 1. decode, which throws on invalid Unicode. We already have this.
I wonder if it'd be too reckless to just make decode for string nothrow (we want this function to be as fast as possible) and just require that string, by definition, must be valid unicode. to!string and company could validate strings as they come in from foreign sources. This way invalid unicode is caught early and decode gets a speedup. char[] is different because the mutability means it could be made invalid at any time so we can't rely on it staying valid after it's been checked but once a string has been confirmed valid there is no reason to check it for validity ever again.
"☹"[1..$]
Feb 08 2014
prev sibling parent "Dominikus Dittes Scherkl" writes:
On Saturday, 8 February 2014 at 18:44:38 UTC, Brad Anderson wrote:
 I wonder if it'd be too reckless to just make decode for string
 nothrow (we want this function to be as fast as possible) and
Yes. It shouldn't throw. Never.
 just require that string, by definition, must be valid unicode.
Why? Replacement of broken code is defined by Unicode - we should use it. No one prevents you from calling isValidUnicode beforehand and handling that separately if it returns "false" (I would recommend that only if security is relevant, e.g. if you check a signature or something like that), or from searching for 0xFFFD in the result string afterwards and throwing if you find some (but this is generally not a good idea, because the replacement characters may have been there even before and were intended). As a default, replacing broken characters is very good. And fast.
Feb 08 2014
prev sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, February 07, 2014 21:04:08 Jonathan M Davis wrote:
 On Saturday, February 08, 2014 05:29:35 Marco Leise wrote:
 I guess we just have two use cases here. One where invalid
 encoding is not an error (e.g. for sanitizing purposes) and
 one where you don't want to lose information and have to
 enforce correct encoding.
 Name the first one "decodeSubst" maybe and have decode call
 that and check for 0xFFFD?
I think that that would call for us to have 3 related but distinct functions: 1. decode, which throws on invalid Unicode. We already have this. 2. isValidUnicode, which returns whether the string is valid Unicode and does not throw. We don't yet have this. Rather, we have validate which does the same job and then throws instead of returning bool. 3. sanitizeUnicode (or whatever would be a good name for it), which replaces invalid Unicode with 0xFFFD (or whatever the appropriate character is) so that it can be operated on without causing decode to throw in spite of the fact that it was invalid Unicode. We don't have anything like this yet.
Actually, thinking this through some more, if we can replace invalid Unicode with 0xFFFD, and have all algorithms work with that and consider it valid Unicode (rather than getting weird bugs due to invalid Unicode), then if decode returned that on error rather than throwing, we wouldn't actually need to check the return value. It wouldn't matter that the Unicode was invalid. So, we wouldn't even need to _care_ that the Unicode was invalid. Anyone who _did_ care could call isValidUnicode to validate the Unicode first, and those who didn't wouldn't need to worry about UTFException being thrown, because everything would still work even if the string was invalid Unicode.

So, if that's indeed what 0xFFFD does, and that's what Dmitry meant by proposing that we return that rather than throwing, then I rescind my assessment that throwing was the best way to go and have to agree that returning 0xFFFD would be better. I was responding under the assumption that you had to check for 0xFFFD and respond to it in order to avoid having your code be buggy, in which case throwing would be far better. But if 0xFFFD is considered valid Unicode, then returning that would be a fantastic solution. And if that's the case, we only need two functions, not three:

1. decode, which returns 0xFFFD on decode failure

2. isValidUnicode, which returns whether the string is valid

And I actually really like the idea that we could just operate on invalid Unicode as valid Unicode this way, making it so that most code doesn't need to care, and code that _does_ need to care, can validate the strings first. Right now, pretty much all string code needs to care in order to avoid processing invalid Unicode, which is much messier.

- Jonathan M Davis
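For readers who want to see what a non-throwing decode looks like in practice: to the best of my knowledge, later Phobos releases (after this thread) gave std.utf.decode a UseReplacementDchar template flag with roughly this behaviour. The sketch below assumes that flag is available; decodeAll is a made-up helper.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;
import std.typecons : Yes;
import std.utf : decode, replacementDchar;

/// Decodes a whole string without ever throwing. Malformed input shows up
/// as U+FFFD in the result instead of as a UTFException.
dchar[] decodeAll(string s)
{
    dchar[] result;
    for (size_t i = 0; i < s.length; )
        result ~= decode!(Yes.useReplacementDchar)(s, i); // always advances i
    return result;
}

void main()
{
    writeln(decodeAll("abc"));
    // The caller is free to ignore the substitution or to look for it.
    writeln(decodeAll("abc\xFF").canFind(replacementDchar));
}
```

How many replacement characters a given malformed sequence produces, and how much of the input is consumed with it, is an implementation detail that this sketch does not rely on.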
Feb 07 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
08-Feb-2014 09:45, Jonathan M Davis пишет:
 On Friday, February 07, 2014 21:04:08 Jonathan M Davis wrote:
 Actually, thinking this through some more, if we can replace invalid Unicode
 with 0xFFFD, and have all algorithms work with that and consider it valid
 Unicode (rather than getting weird bugs due to invalid Unicode), then if
 decode returned that on error rather than throwing, we wouldn't actually need
 to check the return value. It wouldn't matter that the Unicode was invalid.
 So, we wouldn't even need to _care_ that the Unicode was invalid. Anyone who
 _did_ care could call isValidUnicode to validate the Unicode first, and those
 who didn't wouldn't need to worry about UTFException being thrown, because
 everything would still work even if the string was invalid Unicode.
Hm.. yes. I gotta read the whole thread next time :)
 So, if that's indeed what 0xFFFD does, and that's what Dmitry meant by
 proposing that we return that rather than throwing, then I rescind my
 assessment that throwing was the best way to go and have to agree that
 returning 0xFFFD would be better. I was responding under the assumption that
 you had to check for 0xFFFD and respond to it order to avoid having your code
 be buggy, in which case throwing would be far better. But if 0xFFFD is
 considered valid Unicode,
It is.
 then returning that would be a fantastic solution.
 And if that's the case, we only need two functions, not three:

 1. decode, which returns 0xFFFD on decode failure

 2. isValidUnicode, which returns whether the string is valid
Yay.
 And I actually really like the idea that we could just operate on invalid
 Unicode as valid Unicode this way, making it so that most code doesn't need to
 care, and code that _does_ need to care, can validate the strings first. Right
 now, pretty much all string code needs to care in order to avoid processing
 invalid Unicode, which is much messier.
Horray! The goodness is that for example I can run regex on partially broken text and have some sane results out of it.
 - Jonathan M Davis
-- Dmitry Olshansky
Feb 08 2014
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
08-Feb-2014 03:01, Meta пишет:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
This is a ridiculously distracting suggestion and simply has no merit whatsoever. To underline how impractical this suggestion is: currently all code out there expects dchar out of .front, not some magic animal called 'Option!char'. -- Dmitry Olshansky
Feb 08 2014
parent reply "Meta" <jared771 gmail.com> writes:
On Saturday, 8 February 2014 at 11:24:56 UTC, Dmitry Olshansky 
wrote:
 08-Feb-2014 03:01, Meta пишет:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis 
 wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
This is ridiculously distracting suggestion and simply has no merits whatsoever. To underline how impractical this suggestion is: currently every code out there expect dchar out of .front not some magic animal called 'Option!char'.
I'm not actually suggesting a replacement. Just wishful thinking on how the function could've been better designed.
Feb 08 2014
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, February 08, 2014 18:03:54 Meta wrote:
 On Saturday, 8 February 2014 at 11:24:56 UTC, Dmitry Olshansky
 
 wrote:
 08-Feb-2014 03:01, Meta пишет:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis
 
 wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
This is ridiculously distracting suggestion and simply has no merits whatsoever. To underline how impractical this suggestion is: currently every code out there expect dchar out of .front not some magic animal called 'Option!char'.
I'm not actually suggesting a replacement. Just wishful thinking on how the function could've been better designed.
I don't see how returning Nullable!dchar would improve the decode function at all. Currently, it throws on invalid UTF, so you don't have to check the return value, and your code can avoid caring about decode errors except for the points where you put your catches (which are generally in far fewer places than the number of places that decode gets called - be it directly or indirectly). On the other hand, with Nullable!dchar, you'd have to always check the result or risk hitting an assertion when you don't check the result (or ending up with dchar.init in -release). I don't see how that's better than the current situation at all. It just makes decode harder to use. And Dmitry's suggestion is better than both. We end up returning the Unicode character specifically intended to designate bad encodings (\uFFFD) such that you don't even have to care that there was a decode error. You just decode the string and use it. It will just be one more character in the string that doesn't match what you're looking for for find and the like, and pretty much nothing should choke on it. Anything which then cares about Unicode validity can use isValidUnicode (once we have it) to validate the string instead of relying on decode to throw. It will clean up string processing in the face of invalid Unicode quite nicely. So, I don't see how using Nullable!dchar as you suggest would ever have been a better design. - Jonathan M Davis
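To show the extra checking this would impose, here is a sketch of a hypothetical Nullable-returning variant (tryDecode is not a Phobos function; it just wraps the existing throwing decode):

```d
import std.stdio : writeln;
import std.typecons : Nullable, nullable;
import std.utf : decode, UTFException;

/// Hypothetical variant of decode that reports failure through Nullable.
/// Requires i < s.length, like decode itself.
Nullable!dchar tryDecode(string s, ref size_t i)
{
    try
    {
        return nullable(decode(s, i));
    }
    catch (UTFException)
    {
        ++i; // skip one code unit so that callers can make progress
        return Nullable!dchar.init;
    }
}

void main()
{
    string s = "a\xFF";
    size_t i = 0;
    while (i < s.length)
    {
        auto c = tryDecode(s, i);
        if (c.isNull) // nothing forces the caller to write this check
            writeln("bad code unit");
        else
            writeln(c.get);
    }
}
```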
Feb 08 2014
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
08-Feb-2014 02:57, Jonathan M Davis пишет:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic пишет:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
Isn't that actually worse?
No, it's better and more flexible for those who care to repair broken text in case it's broken. We currently have ZERO facilities to work with partly broken UTF and it's not that rare thing to have it.
 Unless you're suggesting that we stop throwing on
 decode errors,
That is exactly what I suggest.
 then functions like std.array.front will have to check the
 result on every call to see whether it was valid or not and thus whether they
 should throw, which would mean extra overhead over simply having decode throw
 on decode errors.
Why the heck? It will not throw either. In the very end bad encoding is handled by displaying the 'substituted' (typically '?') character in places where it broke not by throwing up hands in the air and spitting "UTF Exception: offset 4302 bad UTF sequence". This is not good enough (in case somebody thought that it is). Those who care about throwing add a trivial map!(x => x != '\uFFFD' || die()) over a string, where die function throws an exception.
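A runnable variation on that one-liner, as a sketch: it uses std.utf.byDchar for the substituting decode and a ternary instead of ||, so that the mapped range still yields the characters. die and strict are made-up names. It treats a genuine U+FFFD in the input the same as a decode failure.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.stdio : writeln;
import std.utf : byDchar, replacementDchar;

/// Hypothetical helper: always throws. It is declared to return dchar
/// only so that it fits into the ternary expression below.
dchar die()
{
    throw new Exception("invalid UTF sequence");
}

/// Lazily decodes s with substitution and throws once a U+FFFD is reached.
auto strict(string s)
{
    return s.byDchar.map!(c => c == replacementDchar ? die() : c);
}

void main()
{
    writeln(strict("abc").array);
    try
    {
        writeln(strict("abc\xFF").array);
    }
    catch (Exception e)
    {
        writeln("rejected: ", e.msg);
    }
}
```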
 validate has no business throwing, and we definitely should
 add isValidUnicode (or isValid or whatever you want to call it) for validation
 purposes. Code can then call that to validate that a string is valid and not
 worry about any UTFExceptions being thrown as long as it doesn't manipulate
 the string in a way that could result in its Unicode becoming invalid.
Yet later down the road decode will triple check that anyway. Just saying. BTW if the string was checked beforehand there is no difference between 2 approaches at all (don't have to check).
 However, I would argue that assuming that everyone is going to validate their
 strings and that pretty much all string-related functions shouldn't ever have
 to worry about invalid Unicode is just begging for subtle bugs all over the
 place IMHO. You're essentially dealing with error codes at that point, and I
 think that experience has shown quite clearly that error codes are generally a
 bad way to go. Almost no one checks them unless they have to. I think that
 having decode throw on invalid Unicode is exactly what it should be doing. The
 problem is that validate shouldn't.
Every single text editor out there seems to disagree with you: they do show you partially substituted text, not a dialog box "My bad, it's broken UTF-8, I'm giving up!". -- Dmitry Olshansky
Feb 08 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 08 Feb 2014 15:21:26 +0400
schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 08-Feb-2014 02:57, Jonathan M Davis пишет:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic пишет:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
Isn't that actually worse?
No, it's better and more flexible for those who care to repair broken text in case it's broken. We currently have ZERO facilities to work with partly broken UTF and it's not that rare thing to have it.
Your argument is unsubstantiated, since we have this already:
 Unless you're suggesting that we stop throwing on
 decode errors,
That is exactly what I suggest.
 then functions like std.array.front will have to check the
 result on every call to see whether it was valid or not and thus whether they
 should throw, which would mean extra overhead over simply having decode throw
 on decode errors.
Why the heck? It will not throw either. In the very end bad encoding is handled by displaying the 'substituted' (typically '?') character in places where it broke not by throwing up hands in the air and spitting "UTF Exception: offset 4302 bad UTF sequence". This is not good enough (in case somebody thought that it is). Those who care about throwing add a trivial map!(x => x != '\uFFFD' || die()) over a string, where die function throws an exception.
That's neither an improvement over calling "validate" nor does that deal with distinguishing between invalid UTF and \uFFFD in the input.
 validate has no business throwing, and we definitely should
 add isValidUnicode (or isValid or whatever you want to call it) for validation
 purposes. Code can then call that to validate that a string is valid and not
 worry about any UTFExceptions being thrown as long as it doesn't manipulate
 the string in a way that could result in its Unicode becoming invalid.
Yet later down the road decode will triple check that anyway. Just saying. BTW if the string was checked beforehand there is no difference between 2 approaches at all (don't have to check).
 However, I would argue that assuming that everyone is going to validate their
 strings and that pretty much all string-related functions shouldn't ever have
 to worry about invalid Unicode is just begging for subtle bugs all over the
 place IMHO. You're essentially dealing with error codes at that point, and I
 think that experience has shown quite clearly that error codes are generally a
 bad way to go. Almost no one checks them unless they have to. I think that
 having decode throw on invalid Unicode is exactly what it should be doing. The
 problem is that validate shouldn't.
Every single text editor out there seems to disagree with you: they do show you partially substituted text, not a dialog box "My bad, it's broken UTF-8, I'm giving up!".
Editors do different things. They often try to detect the encoding with a fallback to Latin1. If you open a file explicitly as UTF-8 they may display a substitution char or detect the error and use the fallback, as is the case with Geany, and gedit does in fact throw an error message at you saying "My bad, it's broken UTF-8, I'm giving up!". -- Marco
Feb 08 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Feb-2014 09:35, Marco Leise пишет:
 Am Sat, 08 Feb 2014 15:21:26 +0400
 schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 08-Feb-2014 02:57, Jonathan M Davis пишет:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic пишет:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
Isn't that actually worse?
No, it's better and more flexible for those who care to repair broken text in case it's broken. We currently have ZERO facilities to work with partly broken UTF and it's not that rare thing to have it.
Your argument is unsubstantiated, since we have this already:
Working with ranges of dchar? Nobody is taking eager validation from your hands anyway.
 Unless you're suggesting that we stop throwing on
 decode errors,
That is exactly what I suggest.
 then functions like std.array.front will have to check the
 result on every call to see whether it was valid or not and thus whether they
 should throw, which would mean extra overhead over simply having decode throw
 on decode errors.
Why the heck? It will not throw either. In the very end bad encoding is handled by displaying the 'substituted' (typically '?') character in places where it broke not by throwing up hands in the air and spitting "UTF Exception: offset 4302 bad UTF sequence". This is not good enough (in case somebody thought that it is). Those who care about throwing add a trivial map!(x => x != '\uFFFD' || die()) over a string, where die function throws an exception.
That's neither an improvement over calling "validate" nor does that deal with distinguishing between invalid UTF and
Means text is broken but wasn't ever read...
\uFFFD
 in the input.
...means text was broken sometime before. Hardly makes any difference to most applications. Normal text doesn't contain \uFFFD. And you can test a string with proper 'validate', it's just that while decoding the default is to substitute.
 validate has no business throwing, and we definitely should
 add isValidUnicode (or isValid or whatever you want to call it) for validation
 purposes. Code can then call that to validate that a string is valid and not
 worry about any UTFExceptions being thrown as long as it doesn't manipulate
 the string in a way that could result in its Unicode becoming invalid.
Yet later down the road decode will triple check that anyway. Just saying. BTW if the string was checked beforehand there is no difference between 2 approaches at all (don't have to check).
 However, I would argue that assuming that everyone is going to validate their
 strings and that pretty much all string-related functions shouldn't ever have
 to worry about invalid Unicode is just begging for subtle bugs all over the
 place IMHO. You're essentially dealing with error codes at that point, and I
 think that experience has shown quite clearly that error codes are generally a
 bad way to go. Almost no one checks them unless they have to. I think that
 having decode throw on invalid Unicode is exactly what it should be doing. The
 problem is that validate shouldn't.
Every single text editor out there seems to disagree with you: they do show you partially substituted text, not a dialog box "My bad, it's broken UTF-8, I'm giving up!".
Editors do different things. They often try to detect the encoding with a fallback to Latin1. If you open a file explicitly as UTF-8 they may display a substitution char or detect the error and use the fallback, as is the case with Geany and
Throwing an exception here is not something useful in 90% of cases. Requiring everybody to call sanitize on every string from the outside smells like a wrong default to me.
 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".
I know and it's a piece of junk :) Seriously, it doesn't even have regular expressions for search and replace! -- Dmitry Olshansky
Feb 09 2014
next sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Dmitry Olshansky"  wrote in message news:ld7dla$pdg$1 digitalmars.com... 

 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".
I know and it's piece of junk :) Seriously it doesn't even has regular expressions for search and replace!
That would be a luxury, gedit doesn't even have auto-indent.
Feb 09 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 9 Feb 2014 22:24:21 +1100,
"Daniel Murphy" <yebbliesnospam gmail.com> wrote:

 "Dmitry Olshansky"  wrote in message news:ld7dla$pdg$1 digitalmars.com... 
 
 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".
I know and it's piece of junk :) Seriously it doesn't even has regular expressions for search and replace!
That would be a luxury, gedit doesn't even have auto-indent.
You can talk about missing features in gedit all day, but from my point of view an editor is broken when it doesn't throw an error message at you. By silently replacing incorrect UTF-8, editors change the original text. 0xFFFD should probably be used only when error messages are out of the question, like when only displaying/printing text. -- Marco
Feb 16 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Marco Leise"  wrote in message 
news:20140217030525.67a21dfc org.homedns.org...

 0xFFFD should probably be used only when error messages are
 out of question like when displaying/printing text only.
What do you use for displaying text, if not a text editor?
Feb 17 2014
parent Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 18 Feb 2014 01:01:53 +1100,
"Daniel Murphy" <yebbliesnospam gmail.com> wrote:

 "Marco Leise"  wrote in message 
 news:20140217030525.67a21dfc org.homedns.org...
 
 0xFFFD should probably be used only when error messages are
 out of question like when displaying/printing text only.
What do you use for displaying text, if not a text editor?
That was directed at D development. Or programming with Unicode encodings in general. If you load a text file and replace broken UTF-8 with \uFFFD or ? as Sublime 3 does, you lose information. I think that smells and asks for a big red message box. gedit is an editor that works this way. What I meant by displaying text is static UI elements, since there is no risk of propagating the error. Everything else that can notify the user of the incorrect encoding or loss of information should do so. -- Marco
Feb 17 2014
prev sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 09 Feb 2014 12:18:41 +0400,
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 09-Feb-2014 09:35, Marco Leise wrote:
 That's neither an improvement over calling "validate" nor does
 that deal with distinguishing between invalid UTF and
Means text is broken but wasn't ever read...
\uFFFD
 in the input.
...means text was broken sometime before. Hardly makes any difference to most applications. Normal text doesn't contain \uFFFD.
Of course it does. It is a valid symbol and a lot of websites describing the "Specials" Unicode block make use of it, like the one on Wikipedia: http://en.wikipedia.org/wiki/Specials_(Unicode_block) With your definition, pulling such a document from the web and parsing it in D would mean playing on broken strings.
 [...]
 Every single text editor out there seems to disagree with you: they do
 show you partially substituted text, not a dialog box "My bad, it's
 broken UTF-8, I'm giving up!".
 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".
 I know and it's piece of junk :)
 Seriously it doesn't even has regular expressions for search and replace!
https://yourlogicalfallacyis.com/no-true-scotsman :p -- Marco
Feb 16 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
17-Feb-2014 06:19, Marco Leise wrote:
 On Sun, 09 Feb 2014 12:18:41 +0400,
 Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 09-Feb-2014 09:35, Marco Leise wrote:
 Thats neither an improvement over calling "validate" nor does
 that deal with distinguishing between invalid UTF and
Means text is broken but wasn't ever read...
 \uFFFD
 in the input.
...means text was broken sometime before. Hardly makes any difference to the most applications. Normal text doesn't contain \uFFFD.
Of course it does. It is a valid symbol and a lot of websites describing the "Specials" Unicode block make use of it, like the one on Wikipedia: http://en.wikipedia.org/wiki/Specials_(Unicode_block) With your definition, pulling such a document from the web and parsing it in D would mean playing on broken strings.
In a sense, \uFFFD means broken encoding. What about lone surrogates? Private use symbols that must not occur in transmission? They are all displayed in various Unicode listings. About 'playing on broken strings' - ignoring broken/partially broken strings, I specifically think that it's what most users/use cases want. A more useful and sensible default for decoding is to substitute on broken encoding. And it's a standard procedure. It's particularly better for displaying text. To remind: since it's only a decode, you are still in control of the original text - in fact you may re-test what bytes are there IF you want. The way of "throw on bad encoding" could be useful but I hardly see it as what you want for a default. I'm wary of breaking code that relies on throwing. For the moment I think the best course of action would be to introduce xdecode or some such that will do substitution on failure, see how it floats and then change ranges/foreach etc. to use xdecode.
 [...]
 Every single text editor out there seems to disagree with you: they do
 show you partially substituted text, not a dialog box "My bad, it's
 broken UTF-8, I'm giving up!".
 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".
 I know and it's piece of junk :)
 Seriously it doesn't even has regular expressions for search and replace!
https://yourlogicalfallacyis.com/no-true-scotsman :p
Well, gedit is a nice example of why just throwing an exception is not good enough for many apps (editors in particular). The fact that it's a piece of junk might be irrelevant ;) -- Dmitry Olshansky
Feb 18 2014
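The xdecode idea from the post above could be sketched roughly as follows. This is only an illustration of the interface: the name xdecode comes from the post, but the body and the decodeLossy helper are assumptions, and the sketch still relies on the throwing std.utf.decode internally, so it does not remove the exception cost being discussed.

```d
import std.utf : decode, UTFException;

/// Hypothetical helper (name from the post, body is an assumption):
/// decodes one code point starting at `index`. On malformed input it
/// returns U+FFFD and skips a single code unit instead of throwing.
dchar xdecode(const(char)[] str, ref size_t index)
{
    immutable start = index;
    try
    {
        return decode(str, index);
    }
    catch (UTFException)
    {
        index = start + 1;   // resynchronize one code unit further on
        return '\uFFFD';
    }
}

/// Hypothetical convenience wrapper: decodes a whole buffer with substitution.
dstring decodeLossy(const(char)[] str)
{
    dstring result;
    for (size_t i = 0; i < str.length; )
        result ~= xdecode(str, i);
    return result;
}
```

Callers who do want a hard failure can still scan the result for '\uFFFD' afterwards, with the caveat raised elsewhere in the thread that U+FFFD may also legitimately appear in the input.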
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/18/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Well, gedit is a nice example of why just throwing exception is not good
 enough for many apps (editors in particular). The fact that it's piece
 of junk might be irrelevant ;)
OT: Considering how many big-budget events (World Cup / Olympics) do such a poor job at displaying any kind of Unicode text (e.g. they frequently display č/ć/đ as c/c/dj), the only thing that could be worse is a big red dialog box, lol!
Feb 18 2014
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 18 Feb 2014 12:14:58 +0400,
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 In a sense, \uFFFD means broken encoding.
In a sense yes, in another no. It is a defined code point and it has a symbol: � a diamond with a question mark inside.
 What about lone surrogates?
Those are actual broken encoding.
 Private use symbols that must not occur in transmission?
Then that "transmission" seems to exclude private symbols. It may also exclude special characters like \uFFFD. That's part of the particular protocol and should be handled there.
 They all
 displayed in various Unicode listings. About 'playing on broken strings'
 - ignoring broken/partially broken strings, I specifically think that
 it's what most users/use cases want.
 
 A more useful and sensible default of decoding is to substitute on
 broken encoding. And it's a standard procedure. It's particularly better
 for displaying text.
Correct. I just don't agree that displaying text should be the one true use case, and I prefer exceptions over silent loss of information as the default.
 To remind: since it's only a decode you are still in the control of
 original text - in fact you may re-test what bytes are there IF you want.
 
 The way of "throw on bad encoding" could be useful but I hardly see it
 as what you want for default.
 
 I'm wary of breaking code that relies on throwing. For the moment I
 think the best course of action would be to introduce xdecode or some
 such that will do substitution on failure, see how it floats and then
 change ranges/foreach etc to use xdecode.
We won't convince each other. Let's just stop here. -- Marco
Feb 18 2014
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/7/14, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 However, I would argue that assuming that everyone is going to validate
 their
 strings and that pretty much all string-related functions shouldn't ever
 have
 to worry about invalid Unicode is just begging for subtle bugs all over the

 place IMHO.
I suggested we would introduce an overload, not replace the existing function, so this isn't an issue.
 The problem is that you need to check it. This is _slower_ than exceptions in
the normal case, as invalid Unicode should be the rare case. Do you have any benchmarks for this? I have a vague memory of complaining that the exception code is *de facto* slower, regardless of input. But I'll try to provide some test cases later and see where we're at.
Feb 08 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
08-Feb-2014 12:20, Andrej Mitrovic wrote:
 On 2/7/14, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 However, I would argue that assuming that everyone is going to validate
 their
 strings and that pretty much all string-related functions shouldn't ever
 have
 to worry about invalid Unicode is just begging for subtle bugs all over the

 place IMHO.
I suggested we would introduce an overload, not replace the existing function, so this isn't an issue.
 The problem is that you need to check it. This is _slower_ than exceptions in
the normal case, as invalid Unicode should be the rare case. Do you have any benchmarks for this? I have vague memory about complaining that the exception code is *de-facto* slower, regardless of input. But I'll try to provide some test-cases later and see where we're at.
Just be sure to test on LDC or GDC. DMD results are irrelevant to the performance-minded of our community. Also be sure to copy all the code involved into a single file rather than linking to Phobos. People tend to throw around figures like ~10% slower with exceptions turned on, but you never know what exactly they tested. -- Dmitry Olshansky
Feb 08 2014
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/7/14, 8:29 AM, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code but assigns the return value through another parameter.
.toBugzilla() Andrei
Feb 07 2014
prev sibling parent "Dicebot" <public dicebot.lv> writes:
On Friday, 7 February 2014 at 03:14:45 UTC, Sean Kelly wrote:
 pure  safe void validate(S)(in S str) if (isSomeString!S);

 Throws:
 UTFException if str is not well-formed.
And somewhere in the world, darkness fell forever on a bright and beautiful countryside. The monsters poured forth and devoured everything in sight, given strength by that unbelievable abomination of a function design.
True words indeed! To sum up this small thread : I am perfectly OK with exceptions not showing in -vgc if we also agree on cleaning up Phobos from control flow exceptions.
Feb 07 2014
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, February 06, 2014 22:20:37 Dicebot wrote:
 On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson wrote:
 You should probably validate utf from all foreign sources.
 Catch a problem with it as it comes in rather than in some
 arbitrary part of your program.
 

 pure @safe void validate(S)(in S str) if (isSomeString!S);

 Throws:
 UTFException if str is not well-formed.

 ;)
In general, I think that throwing on malformed Unicode is a good thing, because it results in code that's less error-prone (as the alternative is to not validate Unicode and try and continue somehow regardless of bad input when decoding Unicode, which would be very bad IMHO). That being said, validating strings when they enter the program is a good way to localize any failures - which is where validate would come in - and I have to agree that the fact that validate throws is horrific. It's a classic example of a function that should return a bool rather than throw. You're asking it whether the string is valid, not asking to report errors when your normal control flow encounters an error that prevents it from functioning normally (which is where exceptions should normally be used). As such, I think that it's clear that we need a new function to replace it (e.g. isValidUnicode). I'll have to take a look at it. If I'm lucky, it won't even take all that long to implement. - Jonathan M Davis
Feb 07 2014
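To make the proposed interface concrete, here is a minimal sketch of a bool-returning check. The name isValidUnicode is the one floated in the post; the body is an assumption that simply wraps the existing throwing std.utf.validate, so it shows the call-site shape only and would still pay for the exception on invalid input. A real replacement would presumably scan the code units directly.

```d
import std.traits : isSomeString;
import std.utf : validate, UTFException;

/// Hypothetical wrapper: answers "is this well-formed?" with a bool
/// instead of propagating UTFException to the caller.
bool isValidUnicode(S)(in S str) if (isSomeString!S)
{
    try
    {
        validate(str);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

With this shape, code at the program boundary can write `if (!isValidUnicode(input)) { /* reject or sanitize */ }` and keep exceptions out of its normal control flow.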
prev sibling next sibling parent "Sean Kelly" <sean invisibleduck.org> writes:
On Thursday, 6 February 2014 at 21:48:13 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 19:54:27 UTC, Sean Kelly wrote:
 Does this case even matter?  Exceptions are not a normal 
 function of execution, and so should happen rarely to never.  
 And it's a time when I'd expect a delay anyway.
Imagine intentionally crafted broken utf as user input in repeated requests. You don't have control over it. Now if Phobos would have only thrown exceptions in really _exceptional_ situations and handled broken input gracefully...
That's a tough one. Bad input typically shouldn't generate an exception, but sometimes doing so is handy from a flow control perspective (I know I know, exceptions aren't for flow control). In the few instances where I use an exception for flow control though (like core.demangle) I always use a static instance, so no allocation occurs, and it's entirely internal to the routine. I think it's fair to say that _an_API_ shouldn't allocate and throw an exception to indicate an expected error condition. For a parser, invalid input definitely applies. So then if the user wants to throw an exception in that case, they can do so themselves. Then the choice of allocation is left to the user, not imposed on them. It's generally really easy to let the user supply a delegate to execute on error too, so they don't even necessarily have to check a return code.
Feb 06 2014
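The "let the user supply a delegate to execute on error" idea can be sketched like this. Everything here is hypothetical (the sumFields function and its comma-separated input format are made up for illustration); the point is only that the routine reports bad input through a callback, and the caller decides whether to count it, skip it, or throw an exception of their own choosing.

```d
import std.algorithm.iteration : splitter;
import std.ascii : isDigit;

/// Hypothetical parser: sums comma-separated non-negative integers.
/// Malformed fields are handed to `onError` instead of triggering a
/// library-side allocation and throw. Overflow is ignored for brevity.
int sumFields(const(char)[] input,
              scope void delegate(const(char)[] field) onError)
{
    int total = 0;
    foreach (field; input.splitter(','))
    {
        bool ok = field.length > 0;
        int value = 0;
        foreach (char c; field)
        {
            if (!isDigit(c)) { ok = false; break; }
            value = value * 10 + (c - '0');
        }
        if (ok)
            total += value;
        else
            onError(field);   // caller's policy: ignore, log, or throw
    }
    return total;
}
```

A caller that wants exception semantics can pass a delegate that throws, for instance a preallocated instance, so the allocation policy stays on the user's side of the API.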
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Dicebot:

 Now if Phobos would have only thrown exceptions in really 
 _exceptional_ situations and handled broken input gracefully...
I wrote two small ideas to reduce throwing exceptions in Phobos: http://d.puremagic.com/issues/show_bug.cgi?id=6840 http://d.puremagic.com/issues/show_bug.cgi?id=11913 Bye, bearophile
Feb 07 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2014 11:54 AM, Sean Kelly wrote:
 Does this case even matter?  Exceptions are not a normal function of execution,
 and so should happen rarely to never.  And it's a time when I'd expect a delay
 anyway.
Right. If you're:

1. using throws as control flow logic
2. requiring a throw in a performance critical loop to be performance critical
3. doing so many throws that the garbage collector needs to run to clean them up

you're doing it wrong.

I'm tempted to say that the throw expression can call 'new' even if the function is marked as @nogc.
Feb 06 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic
[...]
 you're doing it wrong.
I disagree. REST based web services tend to use throws all the time. It is an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.
Feb 06 2014
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Friday, 7 February 2014 at 01:31:17 UTC, Ola Fosheim Grøstad
wrote:
 On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic
[...]
 you're doing it wrong.
I disagree. REST based web services tend to use throws all the time. It is a an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.
I think in the case of people using exceptions for control flow a GC.free in your exception handler would suffice for preventing the GC heap from growing to the point where collection times become a concern.
Feb 06 2014
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2014 5:31 PM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic
[...]
 you're doing it wrong.
I disagree. REST based web services tend to use throws all the time. It is a an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.
They're going to be slow when you do it that way.
Feb 06 2014
parent reply "Ola Fosheim Grøstad" writes:
On Friday, 7 February 2014 at 02:42:14 UTC, Walter Bright wrote:
 They're going to be slow when you do it that way.
How slow is slow? Is it slower than in Go and Python? Why would unwinding 8 stack frames be so slow? Is it a language mandated speed issue or just a runtime issue that could be fixed with a compiler switch? Most of the time is spent waiting for async request from memcaches/databases and other types of network traffic so you usually have some free cycles on a decent CPU. With native code and lightweight threads (coroutines) you should be able to handle 100+ concurrent requests per process.
Feb 07 2014
next sibling parent "Ola Fosheim Grøstad" writes:
On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
wrote:
 usually have some free cycles on a decent CPU. With native code 
 and lightweight threads (coroutines) you should be able to 
 handle 100+ concurrent requests per process.
When I think of it you could probably just push the RESTException throwing coroutine onto a "delayed request queue" since a timeout on a transaction might be no worse than aborting it (or carry along some kind of context object). That would make DoS less problematic too and you get better latency for good requests and complete the bad requests when you are idle.
Feb 07 2014
prev sibling next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
wrote:
 Is it a language mandated speed issue?
It is assumed by http://dlang.org/errors.html
Feb 07 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 7 February 2014 at 11:41:43 UTC, Dicebot wrote:
 On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
 wrote:
 Is it a language mandated speed issue?
It is assumed by http://dlang.org/errors.html
P.S. Throwing an exception is not that slow in D; it is allocating a new instance that makes a huge impact.
Feb 07 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 3:42 AM, Dicebot wrote:
 P.S. Throwing exception is not that slow in D, it is allocating new instance
 that makes a huge impact.
Throwing speed can vary greatly from platform to platform. The idea, as in C++, is when there's a speed tradeoff between throw/catch speed and compromising speed to handle the possibility of exceptions, the non-throw case gets priority.
Feb 07 2014
prev sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
wrote:
 How slow is slow? Is it slower than in Go and Python?
One problem with allocating the exception is the stop-the-world thing. My cgi.d's built-in httpd does some allocations in its constructor, which is run once per request. It can answer requests at a rate of about 6000/sec on my computer... until the allocations have piled up too much and the GC starts running. Then all the pending requests stop, killing the throughput. (BTW, interestingly, on Linux it uses separate process pools instead of threads. The GC does NOT stop the world since the other processes can keep going. But, if the requests are fairly uniform - as is typically the case with benchmarks - each process hits the GC threshold at about the same time.... ironically, it is the deterministic nature of the GC that leads to the performance killer there.)
Feb 07 2014
next sibling parent "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 7 February 2014 at 15:33:01 UTC, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
 wrote:
 How slow is slow? Is it slower than in Go and Python?
One problem with allocating the exception is the stop-the-world thing. My cgi.d's built in httpd does some allocations in its constructor, which is run once per request. It can answer requests at a rate of about 6000/sec on my computer... Until the allocation have gone too much and the GC starts running. Then all the pending requests stop, killing the throughput. (BTW, interestingly, on Linux it uses separate process pools instead of threads. The GC does NOT stop the world since the other processes can keep going. But, if the requests are fairly uniform - as is typically the case with benchmarks - each process hits the GC threshold at about the same time.... ironically, it is the deterministic nature of the GC that leads to the performance killer there.)
It's obviously not a solution, but you could change that by having each process call GC.reserve() with a different size.
Feb 07 2014
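For what it's worth, the GC.reserve suggestion might look something like the sketch below. The helper names, the worker index, and the sizes are all made up for illustration; only core.memory.GC.reserve is an existing call. Whether this really desynchronizes collections across processes is untested here, and as the post says, it is a workaround rather than a solution.

```d
import core.memory : GC;

/// Hypothetical helper: picks a different reservation size per worker
/// so that uniform workloads do not all start from the same heap size.
size_t staggeredReserve(size_t workerIndex,
                        size_t base = 16 * 1024 * 1024,
                        size_t step = 4 * 1024 * 1024)
{
    return base + workerIndex * step;
}

/// Hypothetical per-process setup, called once after the worker starts.
void setupWorker(size_t workerIndex)
{
    // GC.reserve asks the collector to obtain this much memory from the
    // OS up front; the actual amount reserved may differ.
    GC.reserve(staggeredReserve(workerIndex));
}
```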
prev sibling next sibling parent reply "Ola Fosheim Grøstad" writes:
On Friday, 7 February 2014 at 15:33:01 UTC, Adam D. Ruppe wrote:
 One problem with allocating the exception is the stop-the-world 
 thing.
Ok, well I guess that primarily is an issue for validation errors where you need to return detailed error reporting. "Not Found" etc can be preallocated as immutable, or?
 constructor, which is run once per request. It can answer 
 requests at a rate of about 6000/sec on my computer...
That sounds pretty good, was that as localhost, or over a network?
 (BTW, interestingly, on Linux it uses separate process pools 
 instead of threads. The GC does NOT stop the world since the 
 other processes can keep going. But, if the requests are fairly 
 uniform - as is typically the case with benchmarks - each 
 process hits the GC threshold at about the same time.... 
 ironically, it is the deterministic nature of the GC that leads 
 to the performance killer there.)
You could synchronize them by calling the GC explicitly N seconds after the other process's GC or, if you use a load balancer, maybe the GC could be scheduled by the load balancer or notify the load balancer (assuming all requests are short-lived). This won't work for a simulation type server though. (which is what I am most interested in)
Feb 07 2014
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 17:10:15 UTC, Ola Fosheim Grøstad 
wrote:
 Ok, well I guess that primarily is an issue for validation 
 errors where you need to return detailed error reporting. "Not 
 Found" etc can be preallocated as immutable, or?
yeah, preallocating exceptions might be a really good idea.
 That sounds pretty good, was that as localhost, or over a 
 network?
localhost, and it was just hello world. Performance of my thing degrades kinda quickly - it never gets /bad/, but it isn't great either once it starts doing more stuff than the basics (but it is soooo easy to use! for me anyway)
 You could synchronize them by calling the GC explicitly N 
 seconds after the other process GC or you if you use a load 
 balancer, maybe the GC could be scheduled by the load balancer 
 or notify the load balancer (assuming all requests are 
 short-lived).
yeah. I'm not even sure if it would be a big deal in practice because there's often a lull anyway where the gc can get caught up (certainly not a problem for the lower traffic sites I mostly work on)
Feb 07 2014
parent "Ola Fosheim Grøstad" writes:
On Friday, 7 February 2014 at 20:41:01 UTC, Adam D. Ruppe wrote:
 yeah, preallocating exceptions might be a really good idea.
I wonder if it would be possible to get better unwinding speed by only throwing a single type of exception class and only a single catch. Then do pattern matching on an embedded typefield. I.e.: if (e.id & MASK_5xx) {} if (e.id & MASK_409) {} etc. After looking at the code for stack unwinding it seems like keeping the loops short is essential.
Feb 07 2014
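The single-exception-class idea from the post above could be sketched as follows. The class name, the status field, and the respond function are hypothetical, and the sketch says nothing about whether unwinding actually gets faster this way; it only shows one catch clause dispatching on an embedded code instead of a chain of typed catch blocks.

```d
/// Hypothetical exception type carrying an HTTP-like status code.
class RestException : Exception
{
    int status;

    this(int status, string msg,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
        this.status = status;
    }
}

/// Hypothetical top-level dispatcher: a single catch, then plain
/// integer tests on the embedded status field.
string respond(void delegate() handler)
{
    try
    {
        handler();
        return "200 OK";
    }
    catch (RestException e)
    {
        if (e.status == 409) return "409 Conflict";
        if (e.status / 100 == 5) return "5xx Server Error";
        return "4xx Client Error";
    }
}
```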
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 7:33 AM, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
 How slow is slow? Is it slower than in Go and Python?
One problem with allocating the exception is the stop-the-world thing. My cgi.d's built in httpd does some allocations in its constructor, which is run once per request. It can answer requests at a rate of about 6000/sec on my computer...
The gc is not the real speed issue with exceptions, after all, one can preallocate the exception:

    throw new Exception();

vs.

    e = new Exception();
    ...
    throw e;

It's the unwinding speed. Just have a look at what deh2.d has to do.
Feb 07 2014
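A slightly fuller sketch of the preallocation pattern from the post above, assuming a made-up NotFound exception and lookup function. The instance is created once per thread in a module constructor, so the throw itself performs no `new`. One caveat worth keeping in mind: since the same object is reused, any state attached to it (message, trace information) is shared between throws.

```d
/// Hypothetical exception type for the example.
class NotFound : Exception
{
    this() { super("not found"); }
}

// One instance per thread (module-level variables are thread-local in D),
// created at thread start so that throwing it later does not allocate.
private NotFound notFound;

static this()
{
    notFound = new NotFound;
}

/// Hypothetical lookup that signals a miss with the cached exception.
int lookup(const(int)[] table, size_t index)
{
    if (index >= table.length)
        throw notFound;       // reuses the preallocated object
    return table[index];
}
```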
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
07-Feb-2014 23:45, Walter Bright wrote:
 On 2/7/2014 7:33 AM, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
 How slow is slow? Is it slower than in Go and Python?
One problem with allocating the exception is the stop-the-world thing. My cgi.d's built in httpd does some allocations in its constructor, which is run once per request. It can answer requests at a rate of about 6000/sec on my computer...
The gc is not the real speed issue with exceptions, after all, one can preallocate the exception: throw new Exception(); v.s. e = new Exception(); ... throw e;
And the standard library basically can't do this for every function.
 It's the unwinding speed. Just have a look at what deh2.d has to do.
It's deh.d, or rather deh_win32.d / deh_win64_posix.d, and it doesn't look like _all_ that much, especially if you have no finally blocks and the only catch is the top-most catch-all. After all, error codes would also have to propagate up the same call stack depth. -- Dmitry Olshansky
Feb 07 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2014 12:51 PM, Dmitry Olshansky wrote:
 It's deh.d or rather deh_win32./ deh_win64_posix.d and it doesn't look like
 _all_ that lot especially if you have no finally blocks and the only catch is
 the top-most catch-all.
It's a heluva lot slower than "jmp".
Feb 08 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Feb-2014 02:17, Walter Bright wrote:
 On 2/7/2014 12:51 PM, Dmitry Olshansky wrote:
 It's deh.d or rather deh_win32./ deh_win64_posix.d and it doesn't look
 like
 _all_ that lot especially if you have no finally blocks and the only
 catch is
 the top-most catch-all.
It's a heluva lot slower than "jmp".
If you can show me how a single unconditional jump propagates an error code 4 calls up the stack, I'm sold. I do understand it's slow, but it's not so slow as to make a difference in the discussed case. It's all about jumping to the wrong conclusions. To put it in one pitch: it should be possible to throw/catch in excess of 100k exceptions per second with no problem at all (assuming a single core of some run-of-the-mill modern CPU). Nobody is asking to optimize it better than the normal flow. -- Dmitry Olshansky
Feb 09 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/9/2014 2:17 AM, Dmitry Olshansky wrote:
 If you can show me how a single unconditional jump propagates error code 4
calls
 up the stack I'm sold.

 I do understand it's slow, it's not that slow to make difference in the
 discussed case. It's all about jumping to the wrong conclusions.

 To put it in one pitch: it should be possible to throw/catch in excess of 100k
 exceptions per second no problem at all (assuming a single core of some run of
 the mill modern CPU).

 Nobody is asking to optimize it better then the normal flow.
It's the table lookup that's inherently slow.
Feb 10 2014
prev sibling next sibling parent "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 7 February 2014 at 01:31:17 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic
[...]
 you're doing it wrong.
I disagree. REST based web services tend to use throws all the time. It is a an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.
But let this be up to the programmer working on the service, not imposed on them by the API. Then if they run into something like this DoS issue they can fix it. My experience with these services is that performance is critical and bad input is common, because people are always trying to hack your shit. Where I work, people are serious about performance, our daily volume is ridiculous, and our goal is five nine's of uptime across the board. At the same time, really good asynchronous programmers are about as rare as water on the moon. So something like vibe.d, where mid-level programmers could write correct code that still performs well thanks to the underlying event model, would be a godsend. But only if I really can get what I pay for. The thing I think a lot of people don't realize these days is that performance per watt is just about the most important thing there is. Data centers are expensive, slow to build, and rack space is limited. If you can find a way to increase the concurrent load per box by, say, an order of magnitude by choosing a different language or programming model or whatever, there's a real economic motivation to do so. Java gets by by having a really good GC and a low barrier of entry, but its scalability is really pretty poor all things considered. On the other hand, C/C++ scales tremendously but then you're stuck with the burden those languages impose in terms of semantic complexity, bug frequency, and so on. D seems really promising here but can't rely on having a fantastic incremental GC like Java, and so I think it's a mistake to use Java as a model for how to manage memory. And maybe Java just got it wrong anyway. I know some people who had to go to ridiculous lengths to avoid GC collection cycles in Java because a collection in the app took _20_seconds_ to complete. Now maybe the application was poorly designed or they should have been using an aftermarket GC, but even so. 
Finally, library programming is the one place where premature optimization really is a good idea, because you can never be sure how people will be using your code. That allocation may not be a big deal to you or 98% of your users, but for the one big client who calls that routine in a tight inner loop or operates at volumes you never conceived of it's a deal breaker. I really don't want Phobos to be the deal breaker :-)
Feb 06 2014
prev sibling parent "Dicebot" <public dicebot.lv> writes:
On Friday, 7 February 2014 at 01:31:17 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic
[...]
 you're doing it wrong.
I disagree. REST based web services tend to use throws all the time. It is an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.
And it is horrible. Exceptions were never designed for this. Try benchmarking a trivial vibe.d REST service that looks up an entry in an array and throws a 404 upon failure. The difference in performance between "all requests are 200" and "all requests are 404" will be an order of magnitude.
Feb 07 2014
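The benchmark Dicebot describes can be approximated without vibe.d. The following is only a rough sketch under stated assumptions: `NotFound`, `lookupThrowing`, `lookupChecked` and the two counting helpers are hypothetical names invented for illustration, and a plain array lookup stands in for the REST handler. It prints timings rather than asserting any particular ratio, since the gap depends on compiler, runtime and platform.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical exception type standing in for an HTTP 404.
class NotFound : Exception
{
    this(string msg) { super(msg); }
}

// Reports a missing entry by throwing (allocates a new exception each time).
int lookupThrowing(const int[] table, size_t index)
{
    if (index >= table.length)
        throw new NotFound("404");
    return table[index];
}

// Reports a missing entry through an ordinary return value.
bool lookupChecked(const int[] table, size_t index, out int value)
{
    if (index >= table.length)
        return false;
    value = table[index];
    return true;
}

// Performs `requests` lookups and counts the failures, exception style.
size_t countMissesThrowing(const int[] table, size_t index, size_t requests)
{
    size_t misses;
    foreach (_; 0 .. requests)
    {
        try
            lookupThrowing(table, index);
        catch (NotFound)
            ++misses;
    }
    return misses;
}

// Performs `requests` lookups and counts the failures, return-value style.
size_t countMissesChecked(const int[] table, size_t index, size_t requests)
{
    size_t misses;
    int value;
    foreach (_; 0 .. requests)
        if (!lookupChecked(table, index, value))
            ++misses;
    return misses;
}

void main()
{
    immutable table = [10, 20, 30];
    enum requests = 100_000;

    auto sw = StopWatch(AutoStart.yes);
    countMissesThrowing(table, 1, requests);   // every request succeeds
    immutable allHits = sw.peek;

    sw.reset();
    countMissesThrowing(table, 99, requests);  // every request throws
    immutable allMisses = sw.peek;

    sw.reset();
    countMissesChecked(table, 99, requests);   // every request fails, no throw
    immutable allMissesChecked = sw.peek;

    writefln("throwing, all hits:   %s", allHits);
    writefln("throwing, all misses: %s", allMisses);
    writefln("checked,  all misses: %s", allMissesChecked);
}
```

A real service would add network and framework overhead on top of this, so the numbers only indicate the relative cost of the failure path itself.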
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/6/14, 5:23 PM, Walter Bright wrote:
 I'm tempted to say that the throw expression can call 'new' even if the
 function is marked as  nogc.
That's extreme. A better possibility is to allocate exceptions from a different heap and proclaim that the heap is cleaned once all catch blocks are left. (I'm sure we can find something better, but now is not the time to worry about it.) Andrei
Feb 06 2014
next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 02:19:42 UTC, Andrei Alexandrescu 
wrote:
 A better possibility is to allocate exceptions from a different 
 heap and proclaim that the heap is cleaned once all catch 
 blocks are left.
I wrote a quick proof of concept of this that can be tested right now: http://arsdnet.net/dcode/except.d It hooks _d_newclass to allocate Throwables on a little static bump-the-pointer array. Each catch block has a scope(success) in it that zeroes the throwables area back out.
Feb 06 2014
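The idea in the post above can be illustrated without touching the runtime. This is not the linked proof of concept: it does not hook _d_newclass, and `arena`, `used`, `makeThrowable` and `resetThrowables` are hypothetical names made up for this sketch. It assumes the throwable is built explicitly with `std.conv.emplace` inside a thread-local static buffer and that every catch block resets the buffer on the way out.

```d
import std.conv : emplace;

// Thread-local arena (module-level variables are thread-local by default).
ubyte[4096] arena;
size_t used;

// Hypothetical helper: constructs a Throwable inside the arena instead of
// on the GC heap, bumping `used` past it.
T makeThrowable(T : Throwable, Args...)(auto ref Args args)
{
    enum size = __traits(classInstanceSize, T);
    // Round the start address up to a 16-byte boundary.
    immutable base = cast(size_t) arena.ptr;
    immutable start = ((base + used + 15) & ~cast(size_t) 15) - base;
    assert(start + size <= arena.length, "throwable arena exhausted");
    used = start + size;
    return emplace!T(arena[start .. start + size], args);
}

// Hypothetical helper: wipes the arena once no catch block needs it.
void resetThrowables()
{
    arena[0 .. used] = 0;
    used = 0;
}

void main()
{
    try
        throw makeThrowable!Exception("not found");
    catch (Exception e)
    {
        // Mirrors the scope(success) described in the post: clean up
        // when the catch block finishes normally.
        scope (success) resetThrowables();
        assert(e.msg == "not found");
    }
    assert(used == 0);
}
```

As the replies point out, nothing here stops code from keeping a reference to `e` after the reset, which is exactly the escaping problem under discussion.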
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2014 6:19 PM, Andrei Alexandrescu wrote:
 On 2/6/14, 5:23 PM, Walter Bright wrote:
 I'm tempted to say that the throw expression can call 'new' even if the
 function is marked as  nogc.
That's extreme. A better possibility is to allocate exceptions from a different heap and proclaim that the heap is cleaned once all catch blocks are left. (I'm sure we can find something better, but now is not the time to worry about it.)
That doesn't work, as nothing prevents code from squirreling away the caught exception object handle.
Feb 07 2014
next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 08:32:04 UTC, Walter Bright wrote:
 That doesn't work, as nothing prevents code from squirreling 
 away the caught exception object handle.
scope would. I'm just saying. We could also just document it as undefined behavior and leave matters in the user's hands, but this wouldn't jibe nicely with safe :(
Feb 07 2014
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 7 February 2014 at 15:41:59 UTC, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 08:32:04 UTC, Walter Bright wrote:
 That doesn't work, as nothing prevents code from squirreling 
 away the caught exception object handle.
scope would. I'm just saying. We could also just document it as undefined behavior and leave matters in the user's hands, but this wouldn't jibe nicely with safe :(
Thread stores an uncaught exception reference so it can be rethrown on join(). But I suppose a case could be made that an uncaught exception could either be discarded or abort the app.
Feb 07 2014
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 15:44:08 UTC, Sean Kelly wrote:
 But I suppose a case could be made that an uncaught exception 
 could either be discarded or abort the app.
It could also make a copy at that time on to the regular GC heap and store that (the members of the throwable class are still GC'd so all the store function has to do is a shallow copy, using the RTTI to get the correct size to copy, onto the gc heap). It'd surely be fewer exceptions to get through that than the thrown, caught, and subsequently discarded typical case.
Feb 07 2014
parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 15:48:56 UTC, Adam D. Ruppe wrote:
 It could also make a copy at that time on to the regular GC 
 heap and store that
lol just add in a quick call to .toGC when you want to store it:

T toGC(T)(T t) if(is(T==class)) {
    auto size = typeid(t).init.length;
    import core.memory;
    auto ptr = GC.malloc(size);
    ptr[0 .. size] = (cast(void*) t)[0 .. size];
    return cast(T) ptr;
}
Feb 07 2014
prev sibling parent reply Jerry <jlquinn optonline.net> writes:
Walter Bright <newshound2 digitalmars.com> writes:

 On 2/6/2014 6:19 PM, Andrei Alexandrescu wrote:
 On 2/6/14, 5:23 PM, Walter Bright wrote:
 I'm tempted to say that the throw expression can call 'new' even if the
 function is marked as  nogc.
That's extreme. A better possibility is to allocate exceptions from a different heap and proclaim that the heap is cleaned once all catch blocks are left. (I'm sure we can find something better, but now is not the time to worry about it.)
That doesn't work, as nothing prevents code from squirreling away the caught exception object handle.
Very naive question (that may have already been answered), but why can't throw use structs instead of classes? Then the exception would propagate by copy rather than passing the object up the stack?
Feb 07 2014
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 7 February 2014 at 18:28:24 UTC, Jerry wrote:
 throw use structs instead of classes?
I think that'd be more costly and would mess up the whole inheritance checks; catch(Exception) wouldn't catch the same children.
Feb 07 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Jerry:

 throw use structs instead of classes?
This thread discusses the (low) performance of D exceptions, and suggests some ideas: https://d.puremagic.com/issues/show_bug.cgi?id=9584 Another thread: https://d.puremagic.com/issues/show_bug.cgi?id=9581 The thread also discusses an old idea from Java: http://www.javaspecialists.eu/archive/Issue187.html Bye, bearophile
Feb 07 2014
parent "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 7 February 2014 at 18:45:24 UTC, bearophile wrote:
 Jerry:

 throw use structs instead of classes?
This thread discusses the (low) performance of D exceptions, and suggests some ideas: https://d.puremagic.com/issues/show_bug.cgi?id=9584 Another thread: https://d.puremagic.com/issues/show_bug.cgi?id=9581 The thread also discusses an old idea from Java: http://www.javaspecialists.eu/archive/Issue187.html
Okay, I'm going to look into generating traces lazily. I think it should be possible.
Feb 07 2014
prev sibling parent "Dicebot" <public dicebot.lv> writes:
On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?
Throw a pre-allocated thread-local exception. And make a deep copy of it if it is going to be put into an exception chain, to avoid modifying one already in the chain. I have been told in that PR that some language features assume exception instances are always unique and rely on it. It sounds like a major language design flaw that will block usage of Phobos in memory-caring code even if other issues are taken care of. Probably the language spec should be relaxed to fix this.
Feb 06 2014
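The pre-allocation approach from the post above might look roughly like this. It is a minimal sketch, not code from the PR: `LookupFailed`, `lookupFailed` and `lookup` are hypothetical names, and the `@nogc` attribute is assumed to behave as in later compilers that ship it (at the time of this thread it was still a proposal, so drop it on a compiler that lacks it).

```d
// Hypothetical exception type for this illustration.
class LookupFailed : Exception
{
    this(string msg) @nogc @safe pure nothrow { super(msg); }
}

// Module-level variables are thread-local by default in D, so each thread
// gets its own instance, created once by the thread-local static constructor.
LookupFailed lookupFailed;

static this()
{
    lookupFailed = new LookupFailed("entry not found");
}

// The failure path reuses the pre-allocated object, so it does not allocate.
int lookup(const int[] table, size_t index) @nogc
{
    if (index >= table.length)
        throw lookupFailed;
    return table[index];
}

void main()
{
    immutable table = [10, 20, 30];
    assert(lookup(table, 1) == 20);

    try
    {
        lookup(table, 99);
        assert(false, "expected LookupFailed");
    }
    catch (LookupFailed e)
    {
        // Every failure hands back the same object.
        assert(e is lookupFailed);
    }
}
```

Because the same instance is thrown every time, anything stored in it (message, trace information, chained exceptions) is shared between throws, which is where the uniqueness assumption mentioned in the post becomes a problem.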
prev sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu 
wrote:
 On 2/6/14, 10:05 AM, Johannes Pfau wrote:
 Am Thu, 06 Feb 2014 16:32:08 +0000
 schrieb "Dicebot" <public dicebot.lv>:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei 
 Alexandrescu
 wrote:
 Would anyone be willing to take on the ingrate task of 
 creating
 a comprehensive list with all Phobos functions (and more
 generally artifacts) that allocate memory? That would help a
 lot with focusing the discussion.

 Andrei
Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-)
Please close if you plan to rewrite.
 One interesting point is that modules that were written with 
 avoiding
 allocations in mind usually still allocate when throwing 
 exceptions.
Good point, we need to address that as well.
I'd think fixing that is probably above and beyond what is required to satisfy most people. If you are throwing so many exceptions that GC pauses are a problem you've got more serious problems than the GC. nothrow doesn't concern itself with Error exceptions, I think nogc should just ignore exceptions generally.
 Andrei
Feb 06 2014
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On 6 February 2014 18:05, Johannes Pfau <nospam example.com> wrote:
 Am Thu, 06 Feb 2014 16:32:08 +0000
 schrieb "Dicebot" <public dicebot.lv>:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu
 wrote:
 Would anyone be willing to take on the ingrate task of creating
 a comprehensive list with all Phobos functions (and more
 generally artifacts) that allocate memory? That would help a
 lot with focusing the discussion.

 Andrei
Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-) One interesting point is that modules that were written with avoiding allocations in mind usually still allocate when throwing exceptions. Here's some example output for std.uuid/digest/path/range/algorithm/curl: http://dpaste.dzfl.pl/96d3725b06e2
That message will look much better with vcolumns. ;) Albeit, it also depends on moving fprint(global.stdmsg, ...) => message(...) http://dpaste.dzfl.pl/5b1961918ed6
Feb 06 2014
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On 6 February 2014 19:03, Iain Buclaw <ibuclaw gdcproject.org> wrote:
 On 6 February 2014 18:05, Johannes Pfau <nospam example.com> wrote:
 Am Thu, 06 Feb 2014 16:32:08 +0000
 schrieb "Dicebot" <public dicebot.lv>:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu
 wrote:
 Would anyone be willing to take on the ingrate task of creating
 a comprehensive list with all Phobos functions (and more
 generally artifacts) that allocate memory? That would help a
 lot with focusing the discussion.

 Andrei
Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-) One interesting point is that modules that were written with avoiding allocations in mind usually still allocate when throwing exceptions. Here's some example output for std.uuid/digest/path/range/algorithm/curl: http://dpaste.dzfl.pl/96d3725b06e2
That message will look much better with vcolumns. ;) Albeit, it also depends on moving fprint(global.stdmsg, ...) => message(...) http://dpaste.dzfl.pl/5b1961918ed6
Saying that, it seems it doesn't show the column number correctly. http://dpaste.dzfl.pl/31c8800e223a
Feb 06 2014
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Johannes Pfau:

 Here's some example output for
 std.uuid/digest/path/range/algorithm/curl:
 http://dpaste.dzfl.pl/96d3725b06e2
 ./dmd -vgc ~/Dokumente/d/phobos/std/range.d -c -unittest
 /home/jpf/Dokumente/d/phobos/std/range.d(7307): vgc: Array 
 literals cause gc allocation
For some time now, dynamic array literals have not allocated in some cases. And there's also this: https://github.com/D-Programming-Language/dmd/pull/2952 where the [1, 2]s syntax guarantees no heap allocation. Bye, bearophile
Feb 06 2014
parent "Namespace" <rswhite4 googlemail.com> writes:
On Thursday, 6 February 2014 at 20:40:28 UTC, bearophile wrote:
 Johannes Pfau:

 Here's some example output for
 std.uuid/digest/path/range/algorithm/curl:
 http://dpaste.dzfl.pl/96d3725b06e2
 ./dmd -vgc ~/Dokumente/d/phobos/std/range.d -c -unittest
 /home/jpf/Dokumente/d/phobos/std/range.d(7307): vgc: Array 
 literals cause gc allocation
For some time now, dynamic array literals have not allocated in some cases. And there's also this: https://github.com/D-Programming-Language/dmd/pull/2952 where the [1, 2]s syntax guarantees no heap allocation. Bye, bearophile
My pull was not perfect. And I have no time to finish the type[$] and auto[$] pull. :/
Feb 06 2014
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, February 08, 2014 09:20:15 Andrej Mitrovic wrote:
 On 2/7/14, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 However, I would argue that assuming that everyone is going to validate
 their strings and that pretty much all string-related functions shouldn't
 ever have to worry about invalid Unicode is just begging for subtle bugs
 all over the place IMHO.
I suggested we would introduce an overload, not replace the existing function, so this isn't an issue.
 The problem is that you need to check it. This is _slower_ than exceptions
 in the normal case, as invalid Unicode should be the rare case.
Do you have any benchmarks for this? I have vague memory about complaining that the exception code is *de-facto* slower, regardless of input. But I'll try to provide some test-cases later and see where we're at.
The exception version has to do all of the same checks that the version which returns an error value would have to do, while the one returning an error value, which would then have to be checked for validity, would have an extra check. So, the only ways that the exception version would be slower are if the plumbing for being able to throw an exception from the function makes it slower (assuming that the other would be nothrow) or if the optimizer just does worse with the exception one for some reason, because the number of operations that the actual D code would be doing in the successful case would be greater for the non-throwing version. Code generation can do entertaining things to efficiency though, so benchmarking would be required to see what would actually happen.
However, as I stated in another post, I've reconsidered the situation. I think that I misunderstood what Dmitry was suggesting and that checking the error value is not actually necessary: http://forum.dlang.org/post/mailman.66.1391838333.21734.digitalmars-d@puremagic.com And if that's the case, then we can probably move towards having decode not throw and possibly getting rid of UTFException altogether (certainly, most code wouldn't throw it or have to worry about it, since decode and stride are the two main cases where that's a concern, and if they don't throw anymore, then UTFException would have very little use).
- Jonathan M Davis
Feb 08 2014
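The calling convention under discussion, where decoding yields a replacement character instead of throwing and the caller has nothing to check, could look like the sketch below. `decodeOrReplace` is a hypothetical helper, not a Phobos function. To stay short it simply wraps the throwing `std.utf.decode`, so it shows the shape of the interface only; a real non-throwing decoder would do the validity checks inline and avoid the exception machinery entirely.

```d
import std.utf : decode, UTFException;

// U+FFFD REPLACEMENT CHARACTER, the usual stand-in for invalid input.
enum dchar replacementChar = '\uFFFD';

// Hypothetical helper: decodes one code point starting at `index`, or
// returns U+FFFD and skips a single code unit if the sequence is invalid.
dchar decodeOrReplace(const(char)[] str, ref size_t index)
{
    immutable start = index;
    try
        return decode(str, index);
    catch (UTFException)
    {
        index = start + 1;
        return replacementChar;
    }
}

void main()
{
    // 0xFF can never appear in well-formed UTF-8.
    const(char)[] input = ['a', cast(char) 0xFF, 'b'];

    dchar[] decoded;
    for (size_t i = 0; i < input.length;)
        decoded ~= decodeOrReplace(input, i);

    // The caller never checks an error value; bad input just shows up
    // as a replacement character in the output.
    assert(decoded == "a\uFFFDb"d);
}
```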