digitalmars.D - List of Phobos functions that allocate memory?

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Would anyone be willing to take on the ingrate task of creating a 
comprehensive list with all Phobos functions (and more generally 
artifacts) that allocate memory? That would help a lot with focusing the 
discussion.

Andrei

Feb 06 2014

"Dicebot" <public dicebot.lv> writes:

On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu 
wrote:
 Would anyone be willing to take on the ingrate task of creating 
 a comprehensive list with all Phobos functions (and more 
 generally artifacts) that allocate memory? That would help a 
 lot with focusing the discussion.

 Andrei

Merging https://github.com/D-Programming-Language/dmd/pull/1886 
and running phobos unit tests should make it relatively simple, 
at least for a first pass.

Feb 06 2014

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 2/6/14, Dicebot <public dicebot.lv> wrote:
 Merging https://github.com/D-Programming-Language/dmd/pull/1886
 and running phobos unit tests should make it relatively simple,
 at least for a first pass.

Running the tests is overkill; all you have to do is iterate over each
module and compile it with "-o- -vgc".

We have so many allocations in Phobos that I couldn't even upload my
text over to a paste site, most sites have a limit of 150Kb! So here
it is on github:

https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt

Feb 06 2014
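
The per-module approach described in the post above could be scripted roughly as follows. This is only a sketch: the scan_modules helper, the RUN variable, and the demo directory are made up for illustration, and -vgc is assumed to be available in the compiler being used (at the time of this thread it existed only in the pull request). By default the loop only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: visit every .d file under a source tree and ask the
# compiler to report GC allocations without generating object files (-o-).
# Dry run by default: RUN is unset, so each command is echoed, not executed.
scan_modules() {
    srcdir=$1
    find "$srcdir" -name '*.d' | sort | while IFS= read -r module; do
        # ${RUN-echo} expands to "echo" when RUN is unset and to nothing
        # when RUN is set to the empty string (then dmd really runs).
        ${RUN-echo} dmd -o- -vgc "$module" < /dev/null
    done
}

# Demo on a throwaway tree with two empty modules (made-up layout).
demo=$(mktemp -d)
mkdir -p "$demo/std"
: > "$demo/std/array.d"
: > "$demo/std/conv.d"
scan_modules "$demo"
```

To invoke the compiler for real, something like `RUN= scan_modules /path/to/phobos > phobos_allocations.txt 2>&1` should work; an import path (-I) may also be needed depending on the setup.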

"Martin Cejp" <minexew gmail.com> writes:

On Thursday, 6 February 2014 at 17:18:59 UTC, Andrej Mitrovic 
wrote:
 On 2/6/14, Dicebot <public dicebot.lv> wrote:
 Merging https://github.com/D-Programming-Language/dmd/pull/1886
 and running phobos unit tests should make it relatively simple,
 at least for a first pass.

 Running the tests is overkill, all you have to do is iterate 
 over each
 module and call "-o- -vgc" on it.

 We have so many allocations in Phobos that I couldn't even 
 upload my
 text over to a paste site, most sites have a limit of 150Kb! So 
 here
 it is on github:

 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt

Quite a few of those seem to be false positives.
E.g.

C:\dmd-git\dmd2\src\phobos\std\internal\digest\sha_SSSE3.d(512): 
Concatenation causes gc allocation
                 "rol "~T2~",5",

This looks like something that only ever makes sense at compile time.

Feb 06 2014

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 We have so many allocations in Phobos that I couldn't even upload my
 text over to a paste site, most sites have a limit of 150Kb! So here
 it is on github:

 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt

Ah just realized there are duplicates in the report. I guess -vgc is
emitting dupes.

Feb 06 2014

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt

 Ah just realized there are duplicates in the report. I guess -vgc is
 emitting dupes.

Updated to remove duplicate reports.

Feb 06 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt

 Ah just realized there are duplicates in the report. I guess -vgc is
 emitting dupes.

 Updated to remove duplicate reports.

Thanks. I guess we'd need to cross-reference to function names from there.

Andrei

Feb 06 2014

"grm" <gerhard.mueller gmsoft.at> writes:

On Thursday, 6 February 2014 at 17:57:45 UTC, Andrei Alexandrescu 
wrote:
 On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt

 Ah just realized there are duplicates in the report. I guess 
 -vgc is
 emitting dupes.

 Updated to remove duplicate reports.

 Thanks. I guess we'd need to cross-reference to function names 
 from there.

 Andrei

Lots of them are throws, though.

Feb 06 2014

"grm" <gerhard.mueller gmsoft.at> writes:

On Thursday, 6 February 2014 at 17:57:45 UTC, Andrei Alexandrescu 
wrote:
 On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt

 Ah just realized there are duplicates in the report. I guess 
 -vgc is
 emitting dupes.

 Updated to remove duplicate reports.

 Thanks. I guess we'd need to cross-reference to function names 
 from there.

 Andrei

Also, new *XY*Exception doesn't necessarily indicate a problem.

Feb 06 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/14, 10:05 AM, grm wrote:
 On Thursday, 6 February 2014 at 17:57:45 UTC, Andrei Alexandrescu wrote:
 On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt

 Ah just realized there are duplicates in the report. I guess -vgc is
 emitting dupes.

 Updated to remove duplicate reports.

 Thanks. I guess we'd need to cross-reference to function names from
 there.

 Andrei

 and also new *XY*Exception doesn't indicate a problem necessarily

Good point. Seems to me code inspection would be a simpler way.

Andrei

Feb 06 2014

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Thanks. I guess we'd need to cross-reference to function names from there.

Updated to include function names.

Feb 06 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/14, 10:15 AM, Andrej Mitrovic wrote:
 On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Thanks. I guess we'd need to cross-reference to function names from there.

 Updated to include function names.

Noice. One

less phobos_allocations.txt | grep 'In function'| sed 
"s/.*'\\(.*\\)':/\\1/"|sort|uniq >phobos_allocating_functions.txt

later, and...


Andrei

Feb 06 2014
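
The pipeline in the post above can be tried on a small sample to see what it extracts. The sketch below assumes report lines of the form "In function 'name':", which is what the sed pattern implies; the sample lines themselves are invented, and the real report may look different.

```shell
#!/bin/sh
# Invented sample of the allocation report (format assumed, not verified).
workdir=$(mktemp -d)
cat > "$workdir/phobos_allocations.txt" <<'EOF'
std/array.d: In function 'std.array.join':
std/array.d(10): 'new' causes gc allocation
std/conv.d: In function 'std.conv.toImpl':
std/conv.d(20): Concatenation causes gc allocation
std/array.d: In function 'std.array.join':
std/array.d(30): Array literal causes gc allocation
EOF

# Keep only the "In function" lines, strip everything except the quoted
# function name, then sort and drop duplicates.
grep 'In function' "$workdir/phobos_allocations.txt" \
  | sed "s/.*'\(.*\)':/\1/" \
  | sort -u > "$workdir/phobos_allocating_functions.txt"

cat "$workdir/phobos_allocating_functions.txt"
```

Here grep reads the file directly and `sort -u` stands in for `sort|uniq`; the result should be the same list of unique function names.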

"Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:

On Thursday, 6 February 2014 at 18:25:34 UTC, Andrei Alexandrescu 
wrote:
 Noice. One

 less phobos_allocations.txt | grep 'In function'| sed
 "s/.*'\\(.*\\)':/\\1/"|sort|uniq >phobos_allocating_functions.txt

 later, and...

Well I'm just hacking on the -vgc pull to output what I want, but 
I should read titles better :). Here's the functions:

http://codepad.org/3TsPXryX

Feb 06 2014

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Noice. One

 less phobos_allocations.txt | grep 'In function'| sed
 "s/.*'\\(.*\\)':/\\1/"|sort|uniq >phobos_allocating_functions.txt

 later, and...

Ah you've attached a file, didn't notice it on the left since I
usually skim the avatar part:
http://forum.dlang.org/thread/ld0d79$2ife$1@digitalmars.com?page=2#post-ld0k2u:242ptu:241:40digitalmars.com

Feb 06 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

06-Feb-2014 22:15, Andrej Mitrovic wrote:
 On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Thanks. I guess we'd need to cross-reference to function names from there.

 Updated to include function names.

Hm.
Somehow diffing this with a coverage report may help filter out CTFE.
Some bugs are features :)

-- 
Dmitry Olshansky

Feb 06 2014

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Feb 06, 2014 at 11:39:30PM +0400, Dmitry Olshansky wrote:
 06-Feb-2014 22:15, Andrej Mitrovic wrote:
On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
Thanks. I guess we'd need to cross-reference to function names from
there.

Updated to include function names.

 Hm.
 Somehow diffing this with coverage report may help filter out CTFE.
 Some bugs are features :)

[...]

I thought *all* bugs are features... unintentional features. :-P


T

-- 
Bomb technician: If I'm running, try to keep up.

Feb 06 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

07-Feb-2014 00:15, H. S. Teoh wrote:
 On Thu, Feb 06, 2014 at 11:39:30PM +0400, Dmitry Olshansky wrote:
 06-Feb-2014 22:15, Andrej Mitrovic wrote:
 On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Thanks. I guess we'd need to cross-reference to function names from
 there.

 Updated to include function names.

 Hm.
 Somehow diffing this with coverage report may help filter out CTFE.
 Some bugs are features :)

 [...]

 I thought *all* bugs are features... unintentional features. :-P

O.T. From a pragmatic point of view any specific property of a system 
that is useful to the enduser is a feature. Not all bugs are useful ;)

 T


-- 
Dmitry Olshansky

Feb 06 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

06-Feb-2014 21:21, Andrej Mitrovic wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt

 Ah just realized there are duplicates in the report. I guess -vgc is
 emitting dupes.

 Updated to remove duplicate reports.

Needs to somehow cut down CTFE-only stuff.
E.g. std.regex allocates a lot at CTFE (and in debug sections); it's a 
prominent example of CTFE use, but there is a _lot_ more in the same vein.

-- 
Dmitry Olshansky

Feb 06 2014

Johannes Pfau <nospam example.com> writes:

On Thu, 06 Feb 2014 16:32:08 +0000,
"Dicebot" <public dicebot.lv> wrote:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu 
 wrote:
 Would anyone be willing to take on the ingrate task of creating 
 a comprehensive list with all Phobos functions (and more 
 generally artifacts) that allocate memory? That would help a 
 lot with focusing the discussion.

 Andrei

 
 Merging https://github.com/D-Programming-Language/dmd/pull/1886 
 and running phobos unit tests should make it relatively simple, 
 at least for a first pass.

That's only for implicit allocations though. And please, don't merge
yet, it'll get another rewrite this weekend ;-)

One interesting point is that modules that were written with avoiding
allocations in mind usually still allocate when throwing exceptions.

Here's some example output for
std.uuid/digest/path/range/algorithm/curl:
http://dpaste.dzfl.pl/96d3725b06e2

Feb 06 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/14, 10:05 AM, Johannes Pfau wrote:
 On Thu, 06 Feb 2014 16:32:08 +0000,
 "Dicebot" <public dicebot.lv> wrote:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu
 wrote:
 Would anyone be willing to take on the ingrate task of creating
 a comprehensive list with all Phobos functions (and more
 generally artifacts) that allocate memory? That would help a
 lot with focusing the discussion.

 Andrei

 Merging https://github.com/D-Programming-Language/dmd/pull/1886
 and running phobos unit tests should make it relatively simple,
 at least for a first pass.

 That's only for implicit allocations though. And please, don't merge
 yet, it'll get another rewrite this weekend ;-)

Please close if you plan to rewrite.

 One interesting point is that module that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.

Good point, we need to address that as well.


Andrei

Feb 06 2014

"grm" <gerhard.mueller gmsoft.at> writes:

 That's only for implicit allocations though. And please, don't 
 merge
 yet, it'll get another rewrite this weekend ;-)

 Please close if you plan to rewrite.



 Andrei


expecting the requested close, so some OTs (in random order):

- bought TDPL shortly after it was released
- was very impressed by the concept
- following the NGs since, I guess, 2010
- great community and *very* smart people
- had nothing of value to add yet, though (since I'm stuck with 
C/C++/Java and some proprietary stuff)

- and today I submitted my first reply, which was incredibly easy.
   no annoyance!
   please make this more obvious for guys like me that do not want 
to register.

thx and good luck to you all
hope I can contribute my share some day

Kind Regards

Feb 06 2014

"fra" <a b.it> writes:

On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu 
wrote:
 One interesting point is that module that were written with 
 avoiding
 allocations in mind usually still allocate when throwing 
 exceptions.

 Good point, we need to address that as well.


 Andrei

Hey, wait a second. How do you throw without allocating?

Feb 06 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/14, 10:52 AM, fra wrote:
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
 One interesting point is that module that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.

 Good point, we need to address that as well.


 Andrei

 Hey, wait a second. How do you throw without allocating?

I don't know yet. That's what the "addressing the problem" will take 
care of! :o)

Andrei

Feb 06 2014

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Feb 06, 2014 at 11:01:18AM -0800, Andrei Alexandrescu wrote:
 On 2/6/14, 10:52 AM, fra wrote:
On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
One interesting point is that module that were written with
avoiding allocations in mind usually still allocate when throwing
exceptions.

Good point, we need to address that as well.


Andrei

Hey, wait a second. How do you throw without allocating?

 
 I don't know yet. That's what the "addressing the problem" will take
 care of! :o)

[...]

You can just pre-declare the Exception as a global variable and then
throw that. Well, OK, it's cheating because you still have to allocate
it then, but the point is that you get to control how it gets allocated
at the top-level rather than having the 'new' buried deep down in the
function call chain where you can't control whether the code uses 'new'
or a custom allocator (it may not know about which allocator to use).

	Exception prealloc_exc;
	static this() {
		prealloc_exc = ... /* use whatever allocation method you want */
	}
	void main() {
		try {
			func();
		} catch(Exception e) {
			// you get prealloc_exc here
		}
	}
	void func() {
		if (error) {
			// init exception parameters
			prealloc_exc.msg = ...;
				/* presumably you preallocate the
				 * message string too, with the
				 * allocator of your choice */

			throw prealloc_exc; // N.B. no allocation
		}
	}
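
A runnable version of the idea above might look like the following. This is only a sketch: the `ParseError` class, the `func(bool)` signature and the message text are assumptions made for illustration, and the exception is still allocated once with `new` at startup (any other allocation method could be substituted there).

```d
import std.stdio : writeln;

// Hypothetical exception type, used only for this sketch.
class ParseError : Exception
{
    this(string msg) { super(msg); }
}

// Module-level variables are thread-local in D, so each thread
// would get its own instance.
ParseError preallocated;

static this()
{
    // Allocated once, up front, instead of at every throw site.
    preallocated = new ParseError("");
}

void func(bool error)
{
    if (error)
    {
        preallocated.msg = "bad input"; // string literal, no allocation
        throw preallocated;             // no `new` at the throw site
    }
}

void main()
{
    foreach (i; 0 .. 3)
    {
        try
            func(true);
        catch (ParseError e)
            writeln(i, ": ", e.msg, " (same object: ", e is preallocated, ")");
    }
}
```

Because the same object is reused, anything stored on it (message, trace information, chained exceptions) reflects whichever throw touched it last, so a handler should not keep a reference to it after returning.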


T

-- 
Doubtless it is a good thing to have an open mind, but a truly open mind should
be open at both ends, like the food-pipe, with the capacity for excretion as
well as absorption. -- Northrop Frye

Feb 06 2014

Johannes Pfau <nospam example.com> writes:

On Thu, 06 Feb 2014 18:52:20 +0000,
"fra" <a b.it> wrote:

 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu 
 wrote:
 One interesting point is that module that were written with 
 avoiding
 allocations in mind usually still allocate when throwing 
 exceptions.

 Good point, we need to address that as well.


 Andrei

 
 Hey, wait a second. How do you throw without allocating?
 

You can store the exception as a global and that's done for the
OutOfMemoryError IIRC, but what I meant was 'allocate with the GC'.

Feb 06 2014

Johannes Pfau <nospam example.com> writes:

On Thu, 6 Feb 2014 20:00:50 +0100,
Johannes Pfau <nospam example.com> wrote:

 On Thu, 06 Feb 2014 18:52:20 +0000,
 "fra" <a b.it> wrote:
 
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu 
 wrote:
 One interesting point is that module that were written with 
 avoiding
 allocations in mind usually still allocate when throwing 
 exceptions.

 Good point, we need to address that as well.


 Andrei

 
 Hey, wait a second. How do you throw without allocating?
 

 
 You can store the exception as a global and that's done for the
 OutOfMemoryError IIRC, but what I meant was 'allocate with the GC'.

Oh and in other languages you can throw by value but I think that
wouldn't work in D because of exception chaining.

Feb 06 2014

"Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:

On Thursday, 6 February 2014 at 19:01:33 UTC, Johannes Pfau wrote:
 You can store the exception as a global and that's done for the
 OutOfMemoryError IIRC.

Hmm.. is that even safe? I mean in some case of exception 
chaining the same object could be overwritten before being thrown 
again, thereby losing the original exception state. Thinking out 
loud here..

Feb 06 2014

"Namespace" <rswhite4 googlemail.com> writes:

On Thursday, 6 February 2014 at 19:05:49 UTC, Andrej Mitrovic 
wrote:
 On Thursday, 6 February 2014 at 19:01:33 UTC, Johannes Pfau 
 wrote:
 You can store the exception as a global and that's done for the
 OutOfMemoryError IIRC.

 Hmm.. is that even safe? I mean in some case of exception 
 chaining the same object could be overwritten before being 
 thrown again, thereby losing the original exception state. 
 Thinking out loud here..

You could use a circular buffer with appropriate length.
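
One way the circular buffer could be sketched, assuming a fixed number of exceptions can be in flight at once. The names `PooledError`, `ExceptionRing` and `fail` are hypothetical, and the ring size of 4 is arbitrary.

```d
// Hypothetical exception type for the pool.
class PooledError : Exception
{
    this() { super(""); }
}

// Fixed-size ring of pre-allocated exceptions. A slot is reused only
// after N further acquisitions, so up to N exceptions can coexist.
struct ExceptionRing(size_t N)
{
    private PooledError[N] slots;
    private size_t cursor;

    void initialize()
    {
        foreach (ref s; slots)
            s = new PooledError; // allocated once, up front
    }

    PooledError acquire(string msg)
    {
        auto e = slots[cursor];
        cursor = (cursor + 1) % N;
        e.msg = msg;
        e.next = null; // drop any stale chain from a previous use
        return e;
    }
}

ExceptionRing!4 ring; // thread-local, like any module-level variable

void fail(string why)
{
    throw ring.acquire(why);
}

void main()
{
    ring.initialize();
    try
    {
        fail("first");
    }
    catch (PooledError a)
    {
        try
        {
            fail("second");
        }
        catch (PooledError b)
        {
            // Two distinct objects: the first one was not overwritten.
            assert(a !is b);
            assert(a.msg == "first");
            assert(b.msg == "second");
        }
    }
}
```

The ring only postpones the overwrite problem: if more than N exceptions are alive at the same time, the oldest one is silently reused.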

Feb 06 2014

Iain Buclaw <ibuclaw gdcproject.org> writes:

On 6 February 2014 18:52, fra <a b.it> wrote:
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
 One interesting point is that module that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.


 Good point, we need to address that as well.


 Andrei


 Hey, wait a second. How do you throw without allocating?

You can't. :o)

Feb 06 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?

I think exceptions should be ok. You optimize the typical path, 
and exceptions are (by definition) an exceptional path. If they 
are also unacceptable, you could restrict yourself to nothrow 
functions. (Which can still throw Errors... but meh they are even 
*more* exceptional)

Feb 06 2014

Johannes Pfau <nospam example.com> writes:

On Thu, 06 Feb 2014 19:08:39 +0000,
"Adam D. Ruppe" <destructionator gmail.com> wrote:

 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?

 
 I think exceptions should be ok. You optimize the typical path, 
 and exceptions are (by definition) an exceptional path. If they 
 are also unacceptable, you could restrict yourself to nothrow 
 functions. (Which can still throw Errors... but meh they are even 
 *more* exceptional)

That depends on your situation. For games and other applications on
normal computers it's OK.

For games on systems like embedded gaming systems (think like
NintendoDS, 4MB ram) you might not have a GC but still want to use
exception handling.

Feb 06 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Thursday, 6 February 2014 at 19:32:11 UTC, Johannes Pfau wrote:
 For games on systems like embedded gaming systems (think like
 NintendoDS, 4MB ram) you might not have a GC but still want to 
 use exception handling.

Yeah, when I toyed with bare metal D, I did exceptions with 
manual memory management - malloc when throwing (well, I did 
malloc in _d_newclass so it was transparent to the throwing 
code), free when catching.

But I think a program written for a special environment will have 
different coding standards from top to bottom, including the need 
to free in an exception handler and the option to hack druntime.
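
A rough sketch of the malloc-when-throwing, free-when-catching approach in ordinary user code, without hooking `_d_newclass`. The helpers `mallocNew` and `mallocFree` and the `MallocError` class are hypothetical names, and this assumes every handler that catches such an exception knows it owns the object.

```d
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

// Hypothetical exception type for this sketch.
class MallocError : Exception
{
    this(string msg) { super(msg); }
}

// Construct a class instance in malloc'ed memory instead of the GC heap.
T mallocNew(T, Args...)(Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    void* p = malloc(size);
    assert(p !is null, "out of memory");
    return emplace!T(p[0 .. size], args);
}

// Run the destructor and hand the memory back to the C heap.
void mallocFree(T)(T obj) if (is(T == class))
{
    destroy(obj);
    free(cast(void*) obj);
}

void main()
{
    try
    {
        throw mallocNew!MallocError("boom");
    }
    catch (MallocError e)
    {
        assert(e.msg == "boom");
        // The handler owns the object and must release it.
        mallocFree(e);
    }
}
```

Note that the runtime may still attach GC-allocated trace information to the exception when it is thrown, and the GC does not scan malloc'ed memory, so this is only reasonable for short-lived exceptions or with trace collection disabled.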

Feb 06 2014

"Dicebot" <public dicebot.lv> writes:

On Thursday, 6 February 2014 at 19:08:40 UTC, Adam D. Ruppe wrote:
 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?

 I think exceptions should be ok. You optimize the typical path, 
 and exceptions are (by definition) an exceptional path. If they 
 are also unacceptable, you could restrict yourself to nothrow 
 functions. (Which can still throw Errors... but meh they are 
 even *more* exceptional)

Hardly so. Any exception allocation can trigger GC collection 
cycle and Phobos does not provide any other way to handle data 
errors. Any application that operates on some external user input 
will be subject to DoS attack vector if it uses Phobos directly.

It was huge performance killer for vibe.d last time I have 
checked, for example.

Feb 06 2014

"Brad Anderson" <eco gnuk.net> writes:

On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 19:08:40 UTC, Adam D. Ruppe 
 wrote:
 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?

 I think exceptions should be ok. You optimize the typical 
 path, and exceptions are (by definition) an exceptional path. 
 If they are also unacceptable, you could restrict yourself to 
 nothrow functions. (Which can still throw Errors... but meh 
 they are even *more* exceptional)

 Hardly so. Any exception allocation can trigger GC collection 
 cycle and Phobos does not provide any other way to handle data 
 errors. Any application that operates on some external user 
 input will be subject to DoS attack vector if it uses Phobos 
 directly.

 It was huge performance killer for vibe.d last time I have 
 checked, for example.

Personally I don't think bad user input qualifies as an 
exceptional case because it's expected to happen and the program 
is expected to handle it (and let the user know) when it does. 
That's just a matter of taste though.

Feb 06 2014

"Dicebot" <public dicebot.lv> writes:

On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an 
 exceptional case because it's expected to happen and the 
 program is expected to handle it (and let the user know) when 
 it does. That's just a matter of taste though.

I agree. It kills the whole concept of "exceptions are rare so 
they don't need to be fast when thrown". But it is how quite a lot 
of Phobos is currently designed and, in my opinion, is the biggest 
design mistake of vibe.d too (it uses exceptions to propagate 
HTTP status codes)

Feb 06 2014

"Brad Anderson" <eco gnuk.net> writes:

On Thursday, 6 February 2014 at 22:19:42 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson 
 wrote:
 Personally I don't think bad user input qualifies as an 
 exceptional case because it's expected to happen and the 
 program is expected to handle it (and let the user know) when 
 it does. That's just a matter of taste though.

 I agree. It kills the whole concept of "exceptions are rare so 
 they don't need to be fast when thrown". But it is how quite 
 lot of Phobos is currently designed and, in my opinion, is 
 biggest design mistake of vibe.d too (it uses exceptions to 
 propagate HTTP status codes)

I must admit that I am guilty of sometimes using exceptions for 
routine control flow too. It's just so convenient compared to 
validation/consumption.

Maybe we should make a list of Phobos functions that throw 
exceptions and ensure that (for the ones where this makes sense) 
they have non-throwing validators available. If we can stop gc 
allocating them that'd be even better but I don't think them 
being gc allocating should hold up @nogc.
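
To illustrate what a non-throwing validator could look like, here is a minimal sketch for integer parsing. `isValidInt` is a hypothetical helper, not an existing Phobos function; it accepts only an optional leading '-' followed by decimal digits that fit in an `int`, which is assumed to be a subset of what `std.conv.to!int` accepts.

```d
import std.conv : to;

// Hypothetical validator: returns true if `s` is a decimal integer
// (optional leading '-') that fits in an int. Never throws or allocates.
bool isValidInt(const(char)[] s) pure nothrow @nogc @safe
{
    if (s.length == 0)
        return false;

    size_t i = 0;
    immutable negative = s[0] == '-';
    if (negative)
        i = 1;
    if (i == s.length)
        return false; // a lone "-" is not a number

    long value = 0;
    for (; i < s.length; ++i)
    {
        if (s[i] < '0' || s[i] > '9')
            return false;
        value = value * 10 + (s[i] - '0');
        if (value > cast(long) int.max + 1)
            return false; // already out of range, stop early
    }
    return negative ? value <= cast(long) int.max + 1 : value <= int.max;
}

void main()
{
    assert(isValidInt("42"));
    assert(isValidInt("-2147483648"));
    assert(!isValidInt("2147483648"));
    assert(!isValidInt("4x2"));
    assert(!isValidInt(""));

    // Validate first, so the conversion is not expected to throw.
    auto input = "123";
    if (isValidInt(input))
    {
        int n = input.to!int;
        assert(n == 123);
    }
}
```

The cost is that well-formed input is scanned twice, once by the validator and once by the conversion.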

Feb 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional case
 because it's expected to happen and the program is expected to handle it
 (and let the user know) when it does. That's just a matter of taste though.

It's not a matter of taste. If your input is subject to a DoS attack, don't put 
exceptions in the control flow.

Feb 06 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Walter Bright:

 It's not a matter of taste. If your input is subject to a DoS 
 attack, don't put exceptions in the control flow.

Perhaps in today's world, malicious attacks on the software you 
write should be assumed as the default situation, and then the 
language+library has to offer something less paranoid on request.

That's why some languages have changed their sorting and hashing 
routines to make them a little slower but safer by default.

Bye,
bearophile

Feb 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/6/2014 7:08 PM, bearophile wrote:
 Walter Bright:

 It's not a matter of taste. If your input is subject to a DoS attack, don't
 put exceptions in the control flow.

 Perhaps the world of today malicious attacks on the software you write should
 be assumed as the default situation, and then the language+library has to offer
 something less paranoiac on request.

 That's why some languages have changed their sorting and hashing routines to
 make them a little slower but safer on default.

DoS attack resistance requires faster code, not slower code.

Feb 07 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Friday, 7 February 2014 at 08:30:35 UTC, Walter Bright wrote:
 On 2/6/2014 7:08 PM, bearophile wrote:
 That's why some languages have changed their sorting and 
 hashing routines to
 make them a little slower but safer on default.

 DoS attack resistance requires faster code, not slower code.

The specific problem was that it was possible to provoke hash 
collisions by sending carefully crafted input, causing the 
hash-tables to degrade to linked lists. The small performance 
penalty of using collision-resistant hashes is certainly worth it 
in this case.

Feb 07 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 6:50 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
 The specific problem was that it was possible to provoke hash collisions by
 sending carefully crafted input, causing the hash-tables to degrade to linked
 lists. The small performance penalty of using collision-resistant hashes is
 certainly worth it in this case.

That has nothing to do with needing exceptions in the control flow path (and
the performance penalty for using exceptions in this manner is certainly not
small).

Feb 08 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Saturday, 8 February 2014 at 21:59:24 UTC, Walter Bright wrote:
 On 2/7/2014 6:50 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
 The specific problem was that it was possible to provoke hash 
 collisions by
 sending carefully crafted input, causing the hash-tables to 
 degrade to linked
 lists. The small performance penalty of using 
 collision-resistant hashes is
 certainly worth it in this case.

 That has nothing to do with needing exceptions in the control 
 flow path (and the performance penalty for using exceptions in 
 this manner is certainly not small).

Huh? I responded to this discussion:

On Friday, 7 February 2014 at 08:30:35 UTC, Walter Bright wrote:
 On 2/6/2014 7:08 PM, bearophile wrote:
 That's why some languages have changed their sorting and 
 hashing routines to
 make them a little slower but safer on default.

 DoS attack resistance requires faster code, not slower code.

I was merely clarifying why in this specific case making the 
average code path slower _did_ help DoS attack resistance.

Feb 09 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Friday, 7 February 2014 at 08:30:35 UTC, Walter Bright wrote:
 On 2/6/2014 7:08 PM, bearophile wrote:
 Walter Bright:

 It's not a matter of taste. If your input is subject to a DoS 
 attack, don't
 put exceptions in the control flow.

 Perhaps the world of today malicious attacks on the software 
 you write should be
 assumed as the default situation, and then the 
 language+library has to offer
 something less paranoiac on request.

 That's why some languages have changed their sorting and 
 hashing routines to
 make them a little slower but safer on default.

 DoS attack resistance requires faster code, not slower code.

I think bearophile is referring to a practice of avoiding fast 
average-case, slow worst-case algorithms in favour of faster 
worst-cases.

If an algorithm has best-case O(n*log(n)) and worst case O(n^2), 
it's often not practical to build for the worst case, but 
anything less than that can make you vulnerable to malicious 
input as part of DOS.

In comparison, an algorithm with O(n*log^2(n)) average and 
worst-case might be acceptable in the average case, but will hold 
up better in the face of attack.


I'm not sure how relevant the point is to the general discussion.

Feb 07 2014

"bearophile" <bearophileHUGS lycos.com> writes:

John Colvin:

 I think bearophile is referring to

Yes, you have explained well my point. Thank you.

Bye,
bearophile

Feb 07 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

07-Feb-2014 06:44, Walter Bright wrote:
 On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional
 case because
 it's expected to happen and the program is expected to handle it (and
 let the
 user know) when it does. That's just a matter of taste though.

 It's not a matter of taste. If your input is subject to a DoS attack,
 don't put exceptions in the control flow.

Meh. If exceptions are such a liability we'd better make them (much) 
faster.

-- 
Dmitry Olshansky

Feb 07 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky 
wrote:
 Meh. If exceptions are such a liability we'd better make them 
 (much) faster.

It's not stack unwinding speed that's an issue here though, but 
rather that for client-facing services, throwing an exception 
when an invalid request is received gives malicious clients an 
opportunity to hurt service performance by flooding it with 
invalid requests.  Improving the exception code specifically 
doesn't help here because the real issue is with GC collections.  
I'd say that the real fix is for such services to simply not 
throw in this case.  But the exception could always be recycled 
as well (since in this case you know that throwing will abort the 
transaction and so will always be immediately discarded).  I'm 
not convinced that there's any need for a language change here to 
support scoped exceptions.  That seems a bit like killing the ant 
with a steamroller.

Feb 07 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

07-Feb-2014 20:49, Sean Kelly wrote:
 On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much)
 faster.

 It's not stack unwinding speed that's an issue here though, but rather
 that for client-facing services, throwing an exception when an invalid
 request is received gives malicious clients an opportunity to hurt
 service performance by flooding it with invalid requests.

Why is throwing a single exception such a big problem? Surely even C's 
longjmp wasn't that expensive? *Maybe* we shouldn't reconstruct the full 
stack trace on every throw?

 Improving the
 exception code specifically doesn't help here because the real issue is
 with GC collections.

Then the problem is that something so temporary as an exception is 
allocated on the GC heap in the first place? Let's go for something more 
sane and deprecate the current behavior, it's not like we are forever 
stuck with it.

 I'd say that the real fix is for such services to
 simply not throw in this case.  But the exception could always be
 recycled as well (since in this case you know that throwing will abort
 the transaction and so will always be immediately discarded).

Exceptions are convenient and they make life that much easier combined 
with ctors/dtors and scoped lifetime. And then we say **ck it - for busy 
services, just use good ol':
...
if (check42(...) == -1){ call_cleanup42(); return -1; }
...

And up the callstack we march. The moment code gets non-trivial there 
come exceptions and RAII to save the day, I don't see how busy REST 
services are unlike anything else.

 I'm not
 convinced that there's any need for a language change here to support
 scoped exceptions.  That seems a bit like killing the ant with a
 steamroller.

Well I'm not convinced we should accept that exceptions are many times 
slower than error codes (with checks on every function that may fail + 
propagating up the stack).

-- 
Dmitry Olshansky

Feb 07 2014

"Dicebot" <public dicebot.lv> writes:

On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky 
wrote:
 I'm not
 convinced that there's any need for a language change here to 
 support
 scoped exceptions.  That seems a bit like killing the ant with 
 a
 steamroller.

 Well I'm not convinced we should accept that exceptions are 
 many times slower then error codes (with checks on every 
 function that may fail + propagating up the stack).

As I have already mentioned, they don't necessarily need to be. 
But that may require tweaking the language so that pre-allocated 
exception usage becomes reliable and I don't see tools right now 
that allow expressing the necessary semantics (can't store a reference 
to an instance without a deep copy)

Feb 07 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 10:10 AM, Dicebot wrote:
 As I have already mentioned, they don't necessarily need to be. But that may
 require tweaking language so that pre-allocated exception usage becomes
 reliable and I don't see tools right now that allow to express necessary
 semantics (can't store reference to instance without deep copy)

It is NOT the allocation that's the issue. C++ code has the same issue. It's
the exception handling table lookup.

Feb 08 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky
wrote:
 07-Feb-2014 20:49, Sean Kelly wrote:
 On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky 
 wrote:
 Meh. If exceptions are such a liability we'd better make them 
 (much)
 faster.

 It's not stack unwinding speed that's an issue here though, 
 but rather
 that for client-facing services, throwing an exception when an 
 invalid
 request is received gives malicious clients an opportunity to 
 hurt
 service performance by flooding it with invalid requests.

 Why throwing a single exception is such a big problem? Surely 
 even C's long_jump wasn't that expensive? *Maybe* we shouldn't 
 re-construct  full stack trace on every throw?

That can be turned off at run time by clearing the traceHandler.
But yeah, it's the allocations that are a problem in this case,
not the unwinding.  And specifically, that flooding with bad
requests effectively generates tons of garbage (an allocation for
the exception plus another for the trace data) thus triggering
frequent stop-the-world collections.
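
For reference, clearing the trace handler can be sketched as below using `core.runtime.Runtime.traceHandler`. This is a minimal illustration; it is an assumption here that it runs early in `main`, before any exception is thrown, and the exact behavior may differ between druntime versions.

```d
import core.runtime : Runtime;
import std.stdio : writeln;

void main()
{
    // With no trace handler installed, the runtime has nothing to call
    // to collect a stack trace when an exception is thrown.
    Runtime.traceHandler = null;

    try
    {
        throw new Exception("bad request");
    }
    catch (Exception e)
    {
        // e.info holds the trace information, if any was collected.
        writeln(e.msg, " / trace captured: ", e.info !is null);
    }
}
```

This removes only the trace allocation; the exception object itself is still allocated with `new`.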


 Exceptions are convenient and they make life that much easier 
 combined with ctors/dtors and scoped lifetime. And then we say 
 **ck it - for busy services, just use good ol':
 ...
 if (check42(...) == -1){ call_cleanup42(); return -1; }
 ...

 And up the callstack we march. The moment code gets non-trivial 
 there come exceptions and RAII to save the day, I don't see how 
 busy REST services are unlike anything else.

I'm sure you can see how a service is different from a desktop
application, right?  In the latter case, there's only one user
and he's interested in having his application perform well.
Outside of a QA lab you won't find desktop app. users
deliberately trying to break their app.  Services are exactly the
opposite.  It's not an exaggeration when I say that the services
I work on are under attack from botnets 24/7.  This is a use case
that must be considered as a first order of business or the
entire service suffers.


 I'm not convinced that there's any need for a language change 
 here to support scoped exceptions.  That seems a bit like 
 killing the ant with a steamroller.

 Well I'm not convinced we should accept that exceptions are 
 many times slower then error codes (with checks on every 
 function that may fail + propagating up the stack).

Exception-oriented code is typically faster for the success case
because all that return code checking can be removed.  But the
tradeoff is that it's slower in the failure case because stack
unwinding is simply slower than checking an error code.  But
again, the issue here isn't the cost of stack unwinding, it's
that thousands of exceptions thrown per second generates a lot of
garbage, and garbage collection in D is currently fairly slow
compared to, say, Java.  If we could get an incremental GC for D
I probably wouldn't even care, but I think that's impossible.

Feb 07 2014

"Adam Wilson" <flyboynw gmail.com> writes:

On Fri, 07 Feb 2014 10:54:37 -0800, Sean Kelly <sean invisibleduck.org>
wrote:

 On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky
 wrote:
 07-Feb-2014 20:49, Sean Kelly wrote:
 On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much)
 faster.

 It's not stack unwinding speed that's an issue here though, but rather
 that for client-facing services, throwing an exception when an invalid
 request is received gives malicious clients an opportunity to hurt
 service performance by flooding it with invalid requests.

 Why throwing a single exception is such a big problem? Surely even C's
 long_jump wasn't that expensive? *Maybe* we shouldn't re-construct
 full stack trace on every throw?

 That can be turned off at run time by clearing the traceHandler.
 But yeah, it's the allocations that are a problem in this case,
 not the unwinding.  And specifically, that flooding with bad
 requests effectively generates tons of garbage (an allocation for
 the exception plus another for the trace data) thus triggering
 frequent stop-the-world collections.


 Exceptions are convenient and they make life that much easier combined
 with ctors/dtors and scoped lifetime. And then we say **ck it - for
 busy services, just use good ol':
 ...
 if (check42(...) == -1){ call_cleanup42(); return -1; }
 ...

 And up the callstack we march. The moment code gets non-trivial there
 come exceptions and RAII to save the day, I don't see how busy REST
 services are unlike anything else.

 I'm sure you can see how a service is different from a desktop
 application, right?  In the latter case, there's only one user
 and he's interested in having his application perform well.
 Outside of a QA lab you won't find desktop app. users
 deliberately trying to break their app.  Services are exactly the
 opposite.  It's not an exaggeration when I say that the services
 I work on are under attack from botnets 24/7.  This is a use case
 that must be considered as a first order of business or the
 entire service suffers.


 I'm not convinced that there's any need for a language change here to
 support scoped exceptions.  That seems a bit like killing the ant with
 a steamroller.

 Well I'm not convinced we should accept that exceptions are many times
 slower then error codes (with checks on every function that may fail +
 propagating up the stack).

 Exception-oriented code is typically faster for the success case
 because all that return code checking can be removed.  But the
 tradeoff is that it's slower in the failure case because stack
 unwinding is simply slower than checking an error code.  But
 again, the issue here isn't the cost of stack unwinding, it's
 that thousands of exceptions thrown per second generates a lot of
 garbage, and garbage collection in D is currently fairly slow
 compared to, say, Java.  If we could get an incremental GC for D
 I probably wouldn't even care, but I think that's impossible.

Technically, there is no reason that the current GC can't be made
incremental, insofar as incremental means collecting only what is required
to complete the allocation.

-- 
Adam Wilson
GitHub/IRC: LightBender
Aurora Project Coordinator

Feb 07 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

07-Feb-2014 22:54, Sean Kelly wrote:
 On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky
 wrote:
 It's not stack unwinding speed that's an issue here though, but rather
 that for client-facing services, throwing an exception when an invalid
 request is received gives malicious clients an opportunity to hurt
 service performance by flooding it with invalid requests.

 Why throwing a single exception is such a big problem? Surely even C's
 long_jump wasn't that expensive? *Maybe* we shouldn't re-construct
 full stack trace on every throw?

 That can be turned off at run time by clearing the traceHandler.

Which should be somehow prominently advertised for release builds. Last 
time I checked not making it null made exceptions ridiculously slow.

 But yeah, it's the allocations that are a problem in this case,
 not the unwinding.  And specifically, that flooding with bad
 requests effectively generates tons of garbage (an allocation for
 the exception plus another for the trace data) thus triggering
 frequent stop-the-world collections.

So again - the problem is allocations on GC heap. Then let's please not 
worry about tiny gains of avoiding stack unwind, that is well understood.

And I see no reason for allocating exceptions on GC (and none presented 
so far). The main use case of exception is to consume exception on catch 
or forward it down the line. Storing a reference to an exception 
elsewhere is rare case. I could see the whole situation with exceptions 
in D as
"we copied this shit from Java, no idea why"

Java at least does go to great lengths to make them fast
(by caching them behind the scenes and whatnot).

 Exceptions are convenient and they make life that much easier combined
 with ctors/dtors and scoped lifetime. And then we say **ck it - for
 busy services, just use good ol':
 ...
 if (check42(...) == -1){ call_cleanup42(); return -1; }
 ...

 And up the callstack we march. The moment code gets non-trivial there
 come exceptions and RAII to save the day, I don't see how busy REST
 services are unlike anything else.

 I'm sure you can see how a service is different from a desktop
 application, right?

Aye, in fact I haven't written much in the way of desktop apps.

 In the latter case, there's only one user
 and he's interested in having his application perform well.
 Outside of a QA lab you won't find desktop app. users
 deliberately trying to break their app.  Services are exactly the
 opposite.  It's not an exaggeration when I say that the services
 I work on are under attack from botnets 24/7.  This is a use case
 that must be considered as a first order of business or the
 entire service suffers.

I bet some sanity checks on the level of protocol handling are more than 
enough.
Yeah these might be faster than unwinding due to the sheer volume of bad 
data, but it's a fraction of code albeit a critical fraction.

I was thinking about the service logic on top of that.

 I'm not convinced that there's any need for a language change here to
 support scoped exceptions.  That seems a bit like killing the ant
 with a steamroller.

 Well I'm not convinced we should accept that exceptions are many times
 slower then error codes (with checks on every function that may fail +
 propagating up the stack).

 Exception-oriented code is typically faster for the success case
 because all that return code checking can be removed.  But the
 tradeoff is that it's slower in the failure case because stack
 unwinding is simply slower than checking an error code.

Duly noted. Just stating the obvious - in the majority of cases we talk 
about 1 unwind vs 10s of checks. The difference isn't THAT big anyway, 
the only advantage of codes checking is being able to fail faster on 
some _early_ bad condition.

 But
 again, the issue here isn't the cost of stack unwinding, it's
 that thousands of exceptions thrown per second generates a lot of
 garbage, and garbage collection in D is currently fairly slow
 compared to, say, Java.

Let's stop bashing GC here. This part of design of exceptions in D is 
just backwards (penalizes usual case) - time to fix it?



-- 
Dmitry Olshansky

Feb 07 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 10:54 AM, Sean Kelly wrote:
 But yeah, it's the allocations that are a problem in this case,

Code can always pre-allocate the exception that is thrown. There's no reason 
whatsoever that allocation is required at the throw point, nor is there any 
reason the thrown exception has to be newly allocated each time.

And, as such, this is entirely a coding issue, not a language or runtime one.
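
A minimal sketch of the pre-allocation idea, under stated assumptions: `BadRequestException`, `cachedBadRequest` and `checkRequest` are hypothetical names invented for illustration, and the sketch ignores chaining. Note that the message, file and line recorded in the shared object are those of the allocation site, not of each throw.

```d
class BadRequestException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Module-level variables are thread-local in D, so each thread
// gets its own instance (hypothetical name).
BadRequestException cachedBadRequest;

static this()
{
    // Allocated once per thread at startup, not at the throw site.
    cachedBadRequest = new BadRequestException("malformed request");
}

void checkRequest(string line)
{
    if (line.length == 0)
        throw cachedBadRequest; // throws the same object every time
}

unittest
{
    Exception first, second;
    try { checkRequest(""); } catch (BadRequestException e) { first = e; }
    try { checkRequest(""); } catch (BadRequestException e) { second = e; }
    assert(first !is null && first is second); // same instance both times
}
```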

Feb 08 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
 Why throwing a single exception is such a big problem?

Because in order to unwind the stack, you need to find the information about the
stack layout. This lookup is rather slow. You can make the lookup faster by
compromising the function code generation, but this is considered an
unacceptable tradeoff.

Feb 08 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

09-Feb-2014 02:03, Walter Bright wrote:
 On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
 Why throwing a single exception is such a big problem?

 Because in order to unwind the stack, you need to find the information
 about the stack layout. This lookup is rather slow. You can make the
 lookup faster by compromising the function code generation, but this is
 considered an unacceptable tradeoff.

A special table lookup can't be slow compared to writing a dummy HTTP 
500 response. Just saying. Yes, it's a tad slower than cmp + jz, I do
understand that.

Again I'm trying to say that framing stack unwinding as the culprit of 
vibe.d crawling under bad requests is plain wrong, and that was the 
focal point of the original argument.

-- 
Dmitry Olshansky

Feb 08 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Saturday, 8 February 2014 at 22:11:13 UTC, Dmitry Olshansky 
wrote:
 Again I'm trying to say that framing stack unwinding as the 
 culprit of vibe.d crawling under bad requests is plain wrong, 
 and that was the focal point of the original argument.

Can you see if it is better with this little patch?

https://github.com/D-Programming-Language/druntime/pull/717

On a simple test, I got a 20x speedup on most exceptions by lazily
generating the stack trace upon request in toString (though if
you are printing it anyway you won't see a difference).

Feb 08 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/8/2014 2:11 PM, Dmitry Olshansky wrote:
 09-Feb-2014 02:03, Walter Bright wrote:
 On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
 Why throwing a single exception is such a big problem?

 Because in order to unwind the stack, you need to find the information
 about the stack layout. This lookup is rather slow. You can make the
 lookup faster by compromising the function code generation, but this is
 considered an unacceptable tradeoff.

 A special table lookup can't be slow compared to writing a dummy HTTP 500
 response. Just saying. Yes, it's a tad slower than cmp + jz, I do understand
 that.

 Again I'm trying to say that framing stack unwinding as the culprit of vibe.d
 crawling under bad requests is plain wrong, and that was the focal point of the
 original argument.

I don't know how vibe.d works, but my point is using exception handling to
implement normal control flow is bad design and it is going to be slow and the
reason it is slow is because of the table lookup and unwinding cost, and that is
not going to be fixed.

Feb 08 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, February 08, 2014 21:21:40 Walter Bright wrote:
 On 2/8/2014 2:11 PM, Dmitry Olshansky wrote:
 09-Feb-2014 02:03, Walter Bright wrote:
 On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
 Why throwing a single exception is such a big problem?

 
 Because in order to unwind the stack, you need to find the information
 about the stack layout. This lookup is rather slow. You can make the
 lookup faster by compromising the function code generation, but this is
 considered an unacceptable tradeoff.

 
 A special table lookup can't be slow compared to writing a dummy HTTP 500
 response. Just saying. Yes, it's a tad slower than cmp + jz, I do
 understand that.
 
 Again I'm trying to say that framing stack unwinding as the culprit of
 vibe.d crawling under bad requests is plain wrong, and that was the focal
 point of the original argument.

 
 I don't know how vibe.d works, but my point is using exception handling to
 implement normal control flow is bad design and it is going to be slow and
 the reason it is slow is because of the table lookup and unwinding cost,
 and that is not going to be fixed.

I wouldn't have considered throwing on an HTTP error to be "flow control." 
That's normal error handling, and throwing on HTTP errors is exactly what I 
would have done. It generally makes code a _lot_ cleaner that way, because you 
don't have to constantly check return codes for errors, and it's using 
exceptions for exactly what they're there for - reporting and handling errors.

You don't want to use exceptions for stuff other than error reporting, and you 
don't want to use them in situations where the error case is the frequent 
case, but that shouldn't be the case for HTTP.

Exceptions _will_ be slower than other code paths, and you don't want them to 
be the normal code path. Nothing is going to make exceptions as fast as the 
normal code paths either. However, D's exceptions are painfully slow - far 
slower than is reasonable - whether that's because of allocating the exception 
or unwinding the stack or creating the string for the stack trace or whatever 
is a matter for investigation, and I'm not about to claim that I know where 
the bottlenecks are.

Fortunately, it looks like Adam Ruppe has found some ways to speed up 
exceptions:

https://github.com/D-Programming-Language/druntime/pull/717

And there may be other improvements that we can implement as well. I agree 
that there's a limit to how much we can speed up exceptions, but right now, at 
minimum, we're getting creamed by Java in terms of speed:

https://d.puremagic.com/issues/show_bug.cgi?id=9584

- Jonathan M Davis

Feb 08 2014

"Ola Fosheim Grøstad" writes:

On Sunday, 9 February 2014 at 05:57:44 UTC, Jonathan M Davis 
wrote:
 Exceptions _will_ be slower than other code paths, and you 
 don't want them to
 be the normal code path. Nothing is going to make exceptions as 
 fast as the
 normal code paths either. However, D's exceptions are painfully

Just to be pedantic: this is not true.

If you have frame based exception meta-info recording then a 
throw out of recursion (without try-blocks in the recursion) will 
be faster than normal returns. You unwind down to the try-block 
with loading a register and a single JMP. All you have to do is 
to maintain a single linked list of stack frames that can catch. 
AFAIK the overhead is negligible if you avoid doing try-blocks
in light-weight function calls. You store one pointer per 
catching stack-frame.

That alone is good enough reason to realize that exception 
handling strategy should be a compiler switch, not a language 
policy. Because performance depends on what kind of code patterns 
you have and the architecture.

On current gen of x86 CPUs the decode stage of instructions into 
micro ops and pipelining ought to be heavy enough that simple
BRA instructions "disappear". Thus the offset strategy ought to 
work well too (injecting data into the code stream near the 
return point and branch over it if necessary, but usually not).

Feb 09 2014

"Ola Fosheim Grøstad" writes:

And with profiling you get the call-frequency between functions, 
so a throw could be replaced with:

if (return_address == 0x1234556){...} // 60%
if (return_address == 0x7899324){...} // 30%
slow_unwinding()

That ought to be obvious.

Feb 09 2014

"Ola Fosheim Grøstad" writes:

On Saturday, 8 February 2014 at 22:03:13 UTC, Walter Bright wrote:
 You can make the lookup faster by compromising the function 
 code generation, but this is considered an unacceptable 
 tradeoff.

"Compromising"? You mean they had to modify codegen, which they 
didn't want to. Clearly, if you know the return address you also 
could have stack info access close to it (at a fixed offset), at 
no runtime cost whatsoever.

Feb 08 2014

"Ola Fosheim Grøstad" writes:

But the c++ Dwarf way of doing it was developed for Itanium which 
was targeting HPC, for which you probably don't need exceptions
all that often. So it made sense in that context.

For regular applications it makes no sense, and with whole 
program analysis (or language level linker) you probably often 
can get a good match at the throw site.

Feb 08 2014

"Ola Fosheim Grøstad" writes:

AND (this just has to be said) if D is really meant to be a SAFE 
programming language then the language should NOT encourage 
programmers to a coding style where you can fail to test for 
errors. The obvious solution is to ensure that you cannot ignore 
errors unless you are explicit about it. Exceptions ensure that.

Having 3 different ways of returning errors is not a good 
strategy for safe and bug free programming.

Ah, I just had to say it... ;)

Feb 08 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/8/2014 2:59 PM, "Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Saturday, 8 February 2014 at 22:03:13 UTC, Walter Bright wrote:
 You can make the lookup faster by compromising the function code generation,
 but this is considered an unacceptable tradeoff.

 "Compromising"? You mean they had to modify codegen, which they didn't want to.
 Clearly, if you know the return address you also could have stack info access
 close to it (at a fixed offset), at no runtime cost whatsoever.

Ola, I've done it both ways, I actually do know what I'm talking about.

I've sometimes been proven wrong here, so you're welcome to do a pull request 
proving so.

Feb 08 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Sat, 08 Feb 2014 21:29:27 -0800,
Walter Bright <newshound2 digitalmars.com> wrote:

 On 2/8/2014 2:59 PM, "Ola Fosheim Grøstad"
 <ola.fosheim.grostad+dlang gmail.com> wrote:
 On Saturday, 8 February 2014 at 22:03:13 UTC, Walter Bright wrote:
 You can make the lookup faster by compromising the function code generation,
 but this is considered an unacceptable tradeoff.

 "Compromising"? You mean they had to modify codegen, which they didn't want to.
 Clearly, if you know the return address you also could have stack info access
 close to it (at a fixed offset), at no runtime cost whatsoever.

 Ola, I've done it both ways, I actually do know what I'm talking about.

 I've sometimes been proven wrong here, so you're welcome to do a pull request
 proving so.

It is not the function code gen that needs to be improved on
Linux, Walter. In fact that would be premature optimization
considering that the *construction* of exceptions outweighs
unwinding costs for functions with no local variables by
multiple orders of magnitude.

-- 
Marco

Feb 08 2014

"Ola Fosheim Grøstad" writes:

On Sunday, 9 February 2014 at 05:29:25 UTC, Walter Bright wrote:
 Ola, I've done it both ways, I actually do know what I'm 
 talking about.

Please note that "you" and "they" was meant as "one" or "the c++ 
community" not personal. It was not ad hominem. So no reason to 
be defensive about it. I am grateful if you can point out where 
my reasoning fails, then I learn something new.

Maybe you could explain why a single occasional Branch Always 
over the unwind-pointer would be slow. Clearly the offset should 
be empirically based (so that you usually can avoid the goto), 
maybe even set to a separate cache line for some CPUs, and you 
could fill out the gaps with other data you need there. It's not 
like I have run i7 on Vtune, so I could be wrong, but I don't see 
why…

And I also think that if you have a CPU with sufficient number of 
callee save registers you can carry along a pointer to the last 
try-block stack frame with not much penalty. After all you only 
have to restore it if the function ruined it and before calling 
new functions that are not inlined and not nothrow, and you could 
stick it into a thread local global too where it matters. On 32 
bit x86 it probably is quite expensive though.

In code where I write try blocks, they tend to stay in the "main
logic function"; this code is so heavy that adding the stack
frame to a linked list (of stack frames) is a negligible cost.

One really needs to be careful when doing performance tests of
exception handling, because it is easy to construct "theoretical" 
code. Programmers should write exception handlers with the 
implementation in mind, so using existing programs as a base line 
is not a good solution either.

 I've sometimes been proven wrong here, so you're welcome to do 
 a pull request proving so.

You know very well that I am not going to rewrite codegen for 
DMD. Adding this feature will complicate codegen and you need to 
understand the code generator well to do the modification.

Besides, I am not sure if a system level language should have 
exceptions at all or that I would use them when doing the kind of 
stuff I like to use D for. :-P ;-) I like to use exception 
handling in application-level code, but not in code for 
audio/simulations/buffer-streaming/low-level-stuff.

Feb 09 2014

"Ola Fosheim Grøstad" writes:

This is a pretty nice description of the i7 pipeline by Hennessy
and Patterson:

https://www.inkling.com/read/computer-architecture-hennessy-5th/chapter-3/section-3-13#0113e87a6dc141d7abda84b497128d61

Notice the 28 micro ops buffer before execution. I'd expect a 
short predicted branch to not cause a big bubble, but I don't 
know for sure.

Feb 09 2014

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, February 07, 2014 20:40:57 Dmitry Olshansky wrote:
 07-Feb-2014 06:44, Walter Bright wrote:
 On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional
 case because
 it's expected to happen and the program is expected to handle it (and
 let the
 user know) when it does. That's just a matter of taste though.

 
 It's not a matter of taste. If your input is subject to a DoS attack,
 don't put exceptions in the control flow.

 
 Meh. If exceptions are such a liability we'd better make them (much)
 faster.

Related: http://d.puremagic.com/issues/show_bug.cgi?id=9584

The DoS aspect of exceptions is not something that I've ever thought about or
seen discussed before, but one area where I've found the slowness of D's 
exceptions to be a real pain is in unit tests. I like to test failure cases as 
well as successful ones, and if you do much of that, your unit tests start 
taking a long time due to how insanely slow exceptions are in D.

So, while in some situations, the solution may be to not use exceptions (or to 
use them less), I think that we really need to look at doing what's necessary 
to make exceptions a lot faster - be it to more efficiently deal with stack 
traces or to avoid allocating them or whatever else we can come up with to 
make them fast. I think that the approach of assuming that exceptions don't 
need to be fast, because they're used for error conditions is a bad one. 
They're not as performance critical as normal code, but their speed still very 
much matters.

- Jonathan M Davis

Feb 07 2014

"Ola Fosheim Grøstad" writes:

On Friday, 7 February 2014 at 19:54:14 UTC, Jonathan M Davis 
wrote:
 They're not as performance critical as normal code, but their 
 speed still very much matters.

Well, it is at least more difficult to write reliable code when
you have to try to avoid them. Still, for a web service you should
probably not have to deal with more than 1000 per second on
average; assuming 1 GHz, that is about 1.000.000 cycles of
running code per stack unwinding.

If you sacrifice 10% of that for exception handling that means 
you have 100.000 cycles to unwind the stack. If the unwound stack 
is 5 frames deep you have 20.000 cycles per stack frame. If that 
is not possible something should be done with the Release-version 
of the runtime.
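
The back-of-the-envelope budget above can be written out as compile-time constants. This is only a worked restatement of the figures in the post (1 GHz, 1000 throws per second, 10% of the time, 5 frames); the constant names are made up for illustration.

```d
enum ulong clockHz         = 1_000_000_000; // assumed 1 GHz core
enum ulong throwsPerSecond = 1_000;         // assumed average throw rate
enum ulong cyclesPerThrow  = clockHz / throwsPerSecond;
enum ulong unwindBudget    = cyclesPerThrow / 10; // 10% spent on unwinding
enum ulong framesUnwound   = 5;             // assumed stack depth
enum ulong cyclesPerFrame  = unwindBudget / framesUnwound;

// The compiler checks the arithmetic from the post.
static assert(cyclesPerThrow == 1_000_000);
static assert(unwindBudget   ==   100_000);
static assert(cyclesPerFrame ==    20_000);
```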

For a webserver you could of course tie the request handler 
directly to the request object and instantiate different ones for 
each request type then have all "unwinding" in the object itself. 
Quirky, but workable.

Feb 07 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 11:53 AM, Jonathan M Davis wrote:
 or to avoid allocating them

Grep for 'throw' in std.datetime shows that every throw is actually:

     throw new ...

and an example:

     throw new DateTimeException("SYSTEMTIME cannot hold dates prior to the year 1601.");

There is no requirement that the new is done there. You can preallocate the 
DateTimeException statically, and simply keep rethrowing the same exception 
instance.

I.e. the allocation issue is a coding style issue, not a language problem.

Feb 08 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, February 08, 2014 14:13:04 Walter Bright wrote:
 On 2/7/2014 11:53 AM, Jonathan M Davis wrote:
 or to avoid allocating them

 
 Grep for 'throw' in std.datetime shows that every throw is actually:
 
      throw new ...
 
 and an example:
 
      throw new DateTimeException("SYSTEMTIME cannot hold dates prior to the year 1601.");
 
 There is no requirement that the new is done there. You can preallocate the
 DateTimeException statically, and simply keep rethrowing the same exception
 instance.
 
 I.e. the allocation issue is a coding style issue, not a language problem.

Of course allocation is not a language issue. The question is whether (and 
how) we can change our approach to allocating exceptions in order to reduce 
their cost. And that's a change in how we approach them, not a change in the 
language itself. It might require some changes in druntime to better deal with 
other allocation schemes (particularly with how that affects exception 
chaining), but it's not a language issue.

And in general, I would expect that any speed-ups that we could attain with 
regards to actually throwing an exception would be in druntime's 
implementation rather than anything in the language itself. Any improvements 
there could then be combined with any improvements we could make to our 
approach to allocating exceptions (and for better or worse - probably worse - 
the normal approach at this point is to allocate a new exception when 
throwing).

- Jonathan M Davis

Feb 08 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/7/14, 8:40 AM, Dmitry Olshansky wrote:
 07-Feb-2014 06:44, Walter Bright wrote:
 On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional
 case because
 it's expected to happen and the program is expected to handle it (and
 let the
 user know) when it does. That's just a matter of taste though.

 It's not a matter of taste. If your input is subject to a DoS attack,
 don't put exceptions in the control flow.

 Meh. If exceptions are such a liability we'd better make them (much)
 faster.

One simple idea is to statically allocate the same exception and rethrow 
it over and over. After all there's no guarantee a distinct exception is 
thrown every time, and the approach is still memory safe (though it 
might surprise the programmer who saves a reference to an old exception).

Andrei

Feb 07 2014

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, February 07, 2014 16:49:45 Andrei Alexandrescu wrote:
 On 2/7/14, 8:40 AM, Dmitry Olshansky wrote:
 07-Feb-2014 06:44, Walter Bright wrote:
 On 2/6/2014 2:15 PM, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an exceptional
 case because
 it's expected to happen and the program is expected to handle it (and
 let the
 user know) when it does. That's just a matter of taste though.

 
 It's not a matter of taste. If your input is subject to a DoS attack,
 don't put exceptions in the control flow.

 
 Meh. If exceptions are such a liability we'd better make them (much)
 faster.

 
 One simple idea is to statically allocate the same exception and rethrow
 it over and over. After all there's no guarantee a distinct exception is
 thrown every time, and the approach is still memory safe (though it
 might surprise the programmer who saves a reference to an old exception).

As long as exceptions are cloneable, and people are aware of the fact that 
they tend to be non-unique, then it can be common practice to clone/dup an 
exception when you need to keep it around. However, the two potential problems 
with this overall approach are

1. Do we just always allocate one of each exception type per thread (probably 
in a static constructor for that exception type)? That would result in a fair 
number of exceptions being allocated up front. The obvious alternative would 
be to allocate it the first time that it's thrown so that you only end up with 
exceptions that get used being allocated, but regardless, we need to take a
close look at the allocation scheme.

2. This sort of thing has a definite impact on enforce and any idioms related 
to it. We'd need to either adjust enforce, enforceEx, etc. to avoid the 
allocation, or we'd need to introduce alternatives to them that expect 
something like a static opCall on the exception type which returns the common 
exception for that type or some other standard means of getting at the 
reusable exception.

Regardless, we need to agree upon a standard way to define exception types 
along with some set of standard idioms for handling them such that we can deal
with exceptions generically (particularly with regards to stuff like enforce) 
rather than it being an ad-hoc per-exception type thing that you can't 
reasonably rely on.

- Jonathan M Davis

Feb 07 2014

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu 
wrote:
 One simple idea is to statically allocate the same exception 
 and rethrow it over and over. After all there's no guarantee a 
 distinct exception is thrown every time, and the approach is 
 still memory safe (though it might surprise the programmer who 
 saves a reference to an old exception).

 Andrei

I don't think it's that simple. What happens if an XException 
causes another XException and they need to be chained together?

Feb 08 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

08-Feb-2014 15:02, Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu wrote:
 One simple idea is to statically allocate the same exception and
 rethrow it over and over. After all there's no guarantee a distinct
 exception is thrown every time, and the approach is still memory safe
 (though it might surprise the programmer who saves a reference to an
 old exception).

 Andrei

 I don't think it's that simple. What happens if an XException causes
 another XException and they need to be chained together?

If both are thread-local and cached I see no problem whatsoever.
The thing is the current "default" of creating exception is AWFUL.
And D stands for sane defaults and the simple path being good last time 
I checked.

-- 
Dmitry Olshansky

Feb 08 2014

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky 
wrote:
 If both are thread-local and cached I see no problem whatsoever.
 The thing is the current "default" of creating exception is 
 AWFUL.
 And D stands for sane defaults and the simple path being good 
 last time I checked.

How is it not a problem? XException's fields (message, location 
etc) would be overwritten by the latest throw site, and its 
`next` field would point to itself.

Feb 08 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, February 08, 2014 11:17:25 Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky
 wrote:
 
 If both are thread-local and cached I see no problem whatsoever.
 The thing is the current "default" of creating exception is
 AWFUL.
 And D stands for sane defaults and the simple path being good
 last time I checked.

 
 How is it not a problem? XException's fields (message, location
 etc) would be overwritten by the latest throw site, and its
 `next` field would point to itself.

Then we have multiple of them, or we new up another one when a second one is 
needed. Even if it were only the first exception which avoided the allocation, 
it would be a big gain, and in most cases, you're only going to get a single 
exception, or the exceptions will be of different types.

- Jonathan M Davis

Feb 08 2014

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Saturday, 8 February 2014 at 11:27:27 UTC, Jonathan M Davis 
wrote:
 On Saturday, February 08, 2014 11:17:25 Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky
 wrote:
 
 If both are thread-local and cached I see no problem 
 whatsoever.
 The thing is the current "default" of creating exception is
 AWFUL.
 And D stands for sane defaults and the simple path being good
 last time I checked.

 
 How is it not a problem? XException's fields (message, location
 etc) would be overwritten by the latest throw site, and its
 `next` field would point to itself.

 Then we have multiple of them, or we new up another one when a 
 second one is
 needed. Even if it were only the first exception which avoided 
 the allocation,
 it would be a big gain, and in most cases, you're only going to 
 get a single
 exception, or the exceptions will be of different types.

 - Jonathan M Davis

Yes, I'm sure there is a cool solution, I'm just pointing out 
that it's not as simple as statically allocating.

I think it would be a nice exercise to compose such a solution 
with std.allocator.

Feb 08 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Sat, 08 Feb 2014 11:33:51 +0000,
"Jakob Ovrum" <jakobovrum gmail.com> wrote:

 On Saturday, 8 February 2014 at 11:27:27 UTC, Jonathan M Davis 
 wrote:
 On Saturday, February 08, 2014 11:17:25 Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky
 wrote:
 
 If both are thread-local and cached I see no problem 
 whatsoever.
 The thing is the current "default" of creating exception is
 AWFUL.
 And D stands for sane defaults and the simple path being good
 last time I checked.

 
 How is it not a problem? XException's fields (message, location
 etc) would be overwritten by the latest throw site, and its
 `next` field would point to itself.

 Then we have multiple of them, or we new up another one when a 
 second one is
 needed. Even if it were only the first exception which avoided 
 the allocation,
 it would be a big gain, and in most cases, you're only going to 
 get a single
 exception, or the exceptions will be of different types.

 - Jonathan M Davis

 
 Yes, I'm sure there is a cool solution, I'm just pointing out 
 that it's not as simple as statically allocating.
 
 I think it would be a nice exercise to compose such a solution 
 with std.allocator.

Yes, it doesn't seem feasible otherwise. Since you can call
functions recursively you could potentially chain exceptions
from the same line of code several times.

  catch (Exception e)
  {
      staticException.line = __LINE__;
      staticException.file = __FILE__;
      staticException.next = e;  // e.next is staticException
      throw staticException;
  }

You'd have to flag staticException as "in use" and spawn a new
instance every time you need another one of the same type.
Since there is no way to reset that flag automatically when
the last user goes out of scope (i.e. ref counting), that's
not even an option.

Preallocated exceptions only work if you are confident your
exception won't be recursively thrown and thereby chained to
itself. Granted, the majority of code, but really too much
cognitive load when writing exception handling code.

-- 
Marco

Feb 08 2014

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Sunday, 9 February 2014 at 04:38:23 UTC, Marco Leise wrote:
 Yes, it doesn't seem feasible otherwise. Since you can call
 functions recursively you could potentially chain exceptions
 from the same line of code several times.

   catch (Exception e)
   {
       staticException.line = __LINE__;
       staticException.file = __FILE__;
       staticException.next = e;  // e.next is staticException
       throw staticException;
   }

 You'd have to flag staticException as "in use" and spawn a new
 instance every time you need another one of the same type.
 Since there is no way to reset that flag automatically when
 the last user goes out of scope (i.e. ref counting), that's
 not even an option.

 Preallocated exceptions only work if you are confident your
 exception won't be recursively thrown and thereby chained to
 itself. Granted, the majority of code, but really too much
 cognitive load when writing exception handling code.

While writes directly to line and file and such can't be 
prevented, `next` could be implemented as a property that does 
the conditional .dup when assigned to itself (or throw an Error).
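
A rough sketch of that conditional-dup idea, with loud assumptions: it is written as a free helper function rather than as a `next` property, it takes the `.dup` branch rather than throwing an Error, and `XException`, its `dup` method and `chainTo` are all hypothetical names invented here. Nothing in this sketch is an existing druntime API beyond `Exception` and its `msg`, `file`, `line` and `next` members.

```d
class XException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__,
         Throwable next = null)
    {
        super(msg, file, line, next);
    }

    // Hypothetical copy helper: a fresh object with the same fields.
    XException dup()
    {
        return new XException(msg, file, line, next);
    }
}

// Chains `cause` behind `head`. If `head` already occurs somewhere in
// the cause chain (the self-chaining case), a copy of `head` is used
// instead, so the resulting chain cannot loop back on itself.
XException chainTo(XException head, Throwable cause)
{
    for (Throwable t = cause; t !is null; t = t.next)
    {
        if (t is head)
        {
            head = head.dup(); // allocate only in the rare recursive case
            break;
        }
    }
    head.next = cause;
    return head;
}

unittest
{
    auto cached = new XException("X failed");
    auto chained = chainTo(cached, cached); // would have been a cycle
    assert(chained !is cached);
    assert(chained.next is cached);
    assert(cached.next is null);
}
```

The common, non-recursive path stays allocation-free; only the self-chaining case pays for a copy.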

Feb 09 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Saturday, 8 February 2014 at 11:17:26 UTC, Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky 
 wrote:
 If both are thread-local and cached I see no problem 
 whatsoever.
 The thing is the current "default" of creating exception is 
 AWFUL.
 And D stands for sane defaults and the simple path being good 
 last time I checked.

 How is it not a problem? XException's fields (message, location 
 etc) would be overwritten by the latest throw site, and its 
 `next` field would point to itself.

It's supposedly one exception instance per place where it can be 
thrown, not per exception type. Then the problem would be 
restricted to recursive calls, where in the exception handler for 
XException, another XException is thrown.

Feb 09 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/8/14, 3:02 AM, Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu wrote:
 One simple idea is to statically allocate the same exception and
 rethrow it over and over. After all there's no guarantee a distinct
 exception is thrown every time, and the approach is still memory safe
 (though it might surprise the programmer who saves a reference to an
 old exception).

 Andrei

 I don't think it's that simple. What happens if an XException causes
 another XException and they need to be chained together?

The chaining method detects that and .dup's one of them.

Andrei

Feb 08 2014

"Dicebot" <public dicebot.lv> writes:

On Saturday, 8 February 2014 at 16:50:53 UTC, Andrei Alexandrescu 
wrote:
 The chaining method detects that and .dup's one of them.

 Andrei

After some thinking I don't think it actually helps - exception 
will be modified _before_ throwing in library code so cloning 
will be too late.

But I don't see any reason why basic exception instances in 
Phobos can't be made immutable.

Feb 08 2014

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Saturday, 8 February 2014 at 16:50:53 UTC, Andrei Alexandrescu 
wrote:
 On 2/8/14, 3:02 AM, Jakob Ovrum wrote:
 On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei 
 Alexandrescu wrote:
 One simple idea is to statically allocate the same exception 
 and
 rethrow it over and over. After all there's no guarantee a 
 distinct
 exception is thrown every time, and the approach is still 
 memory safe
 (though it might surprise the programmer who saves a 
 reference to an
 old exception).

 Andrei

 I don't think it's that simple. What happens if an XException 
 causes
 another XException and they need to be chained together?

 The chaining method detects that and .dup's one of them.

 Andrei

What if the statically allocated XException is escaped to be 
inspected later, but before that is thrown again in a separate 
exception chain?

I suppose it would be no different from the current situation, as 
it's legal to throw exceptions allocated in any fashion, so there 
is already no guarantee of uniqueness. It's probable that some 
code out there still takes exception uniqueness for granted, so 
changing the allocation scheme would be a (typically silent) 
breaking change, even if the code is arguably broken in the first 
place. I suppose we could make that breakage a compile error by 
making exceptions implicitly `scope` at the catch-site, but that 
would of course be a much more involved change...

Personally I still like the idea, but if implemented, I think 
something should be done about the change in uniqueness at the 
same time, even if it's just an added note in the language 
documentation on exceptions.

Feb 08 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much) faster.

They can be made faster by slowing down non-exception code.

This has been debated at length in the C++ community, and the generally accepted 
answer is that non-exception code performance is preferred and exception 
performance is thrown under the bus in order to achieve it.

I think it's quite a reasonable conclusion.

Feb 08 2014

Marco Leise <Marco.Leise gmx.de> writes:

Am Sat, 08 Feb 2014 14:01:12 -0800
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much) faster.

 
 They can be made faster by slowing down non-exception code.
 
 This has been debated at length in the C++ community, and the generally accepted 
 answer is that non-exception code performance is preferred and exception 
 performance is thrown under the bus in order to achieve it.
 
 I think it's quite a reasonable conclusion.

https://yourlogicalfallacyis.com/black-or-white

The reasons for slow exceptions in D could be the generation
of stack trace strings or the garbage collector instead of
inherent trade offs to keep the successful code path fast.

And static allocation isn't an exactly appealing option...

  throw staticException ? staticException : (staticException =
  new SomethingException("Don't do this at home kids!"));

and practically out of question when you need to chain
exceptions and your call stack could contain this line of code
more than once, resulting in infinite loops in exception
chains as a new bug type in D, that is fixed by writing:

  catch (Exception e) {
      throw (staticException
          ? (e.linksTo(staticException) ? staticException.dupThenWrap(e) : staticException)
          : (staticException = new SomethingException("Don't do this at home kids!")));
  }

-- 
Marco

Feb 08 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/8/2014 9:00 PM, Marco Leise wrote:
 The reasons for slow exceptions in D could be the generation
 of stack trace strings or the garbage collector instead of
 inherent trade offs to keep the successful code path fast.

Sigh, once again,

1. It is not the collector

2. I've implemented it both ways, I know what I'm talking about. You can see the 
fast exception way in the Win32 code generation, and the slow way in the Linux 
code generation.

Feb 08 2014

Marco Leise <Marco.Leise gmx.de> writes:

Am Sat, 08 Feb 2014 14:01:12 -0800
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much) faster.

 
 They can be made faster by slowing down non-exception code.
 
 This has been debated at length in the C++ community, and the generally accepted 
 answer is that non-exception code performance is preferred and exception 
 performance is thrown under the bus in order to achieve it.
 
 I think it's quite a reasonable conclusion.

Am Sat, 08 Feb 2014 21:31:53 -0800
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 2/8/2014 9:00 PM, Marco Leise wrote:
 The reasons for slow exceptions in D could be the generation
 of stack trace strings or the garbage collector instead of
 inherent trade offs to keep the successful code path fast.

 
 Sigh, once again,
 
 1. It is not the collector

 2. I've implemented it both ways, I know what I'm talking about. You can see the 
 fast exception way in the Win32 code generation, and the slow way in the Linux 
 code generation.

Ok, I'm on Linux which should be inherently slower at
throwing exceptions as you say. So I've written a little test
and it shows two things:
1. You are right, about the collector. It is not the
bottleneck.
2. It doesn't have anything to do with trading speed for the
   successful code path either.

I called two functions recursively until a nesting depth of
1000. The first version allocates a new exception, the second
one reuses an existing exception. At the call site I caught
the exception. I did this 10_000 times in a loop.
[The code is attached.]
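
The attachment is not reproduced in this archive. A minimal sketch of a
comparable measurement might look like the following. The names
`throwNew`, `throwReused` and `timeThrows` are made up for illustration,
and `std.datetime.stopwatch` is assumed to be available (it is where
`StopWatch` lives in current Phobos). No timings are claimed here.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Thread-local instance that the second variant throws again and again.
Exception reused;

// Variant 1: allocate a fresh exception at the bottom of the recursion.
void throwNew(int depth)
{
    if (depth == 0)
        throw new Exception("fresh");
    throwNew(depth - 1);
}

// Variant 2: allocate once, then rethrow the same object on every call.
void throwReused(int depth)
{
    if (depth == 0)
    {
        if (reused is null)
            reused = new Exception("reused");
        throw reused;
    }
    throwReused(depth - 1);
}

// Calls `thrower` `iterations` times, catching at the call site,
// and returns the elapsed wall-clock time.
Duration timeThrows(void function(int) thrower, int depth, int iterations)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. iterations)
    {
        try
        {
            thrower(depth);
        }
        catch (Exception)
        {
            // swallowed on purpose: only the throw/unwind cost matters
        }
    }
    return sw.peek();
}
```

Calling `timeThrows(&throwNew, 1000, 10_000)` and
`timeThrows(&throwReused, 1000, 10_000)` would correspond to the setup
described above.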

Even at this nesting depth the second version still
outperformed the first one by a factor of ~200(!) and all
the CPU time (>98%) was spent somewhere in libc.

Using static exceptions (or similarly in C++: throwing
literal strings) is VERY fast in D already and I see no reason
to improve that at the moment.

So I repeat my point:

The reasons for slow exceptions in D could be the generation
of stack trace strings or anything else other than some
inherent trade offs to keep the successful code path fast.

-- 
Marco

Feb 08 2014

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Sunday, 9 February 2014 at 05:00:15 UTC, Marco Leise wrote:
 And static allocation isn't an exactly appealing option...

   throw staticException ? staticException : (staticException =
   new SomethingException("Don't do this at home kids!"));

 and practically out of question when you need to chain
 exceptions and your call stack could contain this line of code
 more than once, resulting in infinite loops in exception
 chains as a new bug type in D, that is fixed by writing:

   catch (Exception e) {
       throw (staticException
           ? (e.linksTo(staticException) ? staticException.dupThenWrap(e) : staticException)
           : (staticException = new SomethingException("Don't do this at home kids!")));
   }

This doesn't seem like a valid concern. Nothing stops you from
using a (standard) function to do that ugly boilerplate.
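
A minimal sketch of what such a helper could look like, assuming a
hypothetical function named `reusedException` (nothing like it exists in
Phobos under that name). It keeps one thread-local instance per exception
type and resets its fields on each use. It does not try to solve the
chaining problem discussed above.

```d
// Hypothetical helper, not part of Phobos: returns the same thread-local
// exception object on every call, so only the first call allocates.
E reusedException(E : Throwable = Exception)(string msg,
        string file = __FILE__, size_t line = __LINE__)
{
    static E instance;              // one per thread and per type E

    if (instance is null)
    {
        instance = new E(msg, file, line);  // the only allocation
    }
    else
    {
        // Overwrite the state left behind by the previous throw.
        instance.msg = msg;
        instance.file = file;
        instance.line = line;
        instance.next = null;       // drop any stale chain
    }
    return instance;
}
```

A call site would then read `throw reusedException("bad input");`. Anyone
who keeps a reference to a caught exception will see it change on the
next throw, which is the uniqueness issue raised earlier in the thread.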

Feb 09 2014

"Lars T. Kyllingstad" <public kyllingen.net> writes:

On Sunday, 9 February 2014 at 05:00:15 UTC, Marco Leise wrote:
 https://yourlogicalfallacyis.com/black-or-white

Off topic, but that is a fantastic web site.  I wish I had known
about it before.

Feb 09 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/8/14, 9:00 PM, Marco Leise wrote:
 Am Sat, 08 Feb 2014 14:01:12 -0800
 schrieb Walter Bright <newshound2 digitalmars.com>:

 On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:
 Meh. If exceptions are such a liability we'd better make them (much) faster.

 They can be made faster by slowing down non-exception code.

 This has been debated at length in the C++ community, and the generally accepted
 answer is that non-exception code performance is preferred and exception
 performance is thrown under the bus in order to achieve it.

 I think it's quite a reasonable conclusion.

 https://yourlogicalfallacyis.com/black-or-white

 The reasons for slow exceptions in D could be the generation
 of stack trace strings or the garbage collector instead of
 inherent trade offs to keep the successful code path fast.

This thread is about memory allocation, not exceptions being slow.

 And static allocation isn't an exactly appealing option...

    throw staticException ? staticException : (staticException =
    new SomethingException("Don't do this at home kids!"));

Function calls could do that.


Andrei

Feb 09 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an 
 exceptional case because it's expected to happen and the 
 program is expected to handle it (and let the user know) when 
 it does. That's just a matter of taste though.

Hmm... then what _does_ qualify as exceptional in your opinion?

A logic error (i.e. a mistake on the programmer's side) doesn't, 
IMO, it should abort instead. On the other hand, there is the 
class of situations where e.g. a system call returns an error 
(say, "permission denied" when opening a file, or out of disk 
space). Or more generally, an external service, like a database 
or a remote server. However, I can't see how these are 
fundamentally different from invalid user input, and indeed, 
there's often not even a clear separation, e.g. when a user asked 
you to read a file they don't have access to.

So, what's left then?

Feb 07 2014

"Dicebot" <public dicebot.lv> writes:

On Friday, 7 February 2014 at 14:26:48 UTC, Marc Schütz wrote:
 Hmm... then what _does_ qualify as exceptional in your opinion?

 A logic error (i.e. a mistake on the programmer's side) doesn't, 
 IMO, it should abort instead. On the other hand, there is the 
 class of situations where e.g. a system call returns an error 
 (say, "permission denied" when opening a file, or out of disk 
 space). Or more generally, an external service, like a database 
 or a remote server. However, I can't see how these are 
 fundamentally different from invalid user input, and indeed, 
 there's often not even a clear separation, e.g. when a user 
 asked you to read a file they don't have access to.

 So, what's left then?

It is an exceptional situation if input is supposed to be valid but 
surprisingly is not. For example, calling `decodeGrapheme` on an 
external string without making sure it is valid first. The same goes 
for files - trying to open a missing file is exceptional, but 
checking for file presence is not.

Feb 07 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Friday, 7 February 2014 at 14:42:18 UTC, Dicebot wrote:
 On Friday, 7 February 2014 at 14:26:48 UTC, Marc Schütz wrote:
 Hmm... then what _does_ qualify as exceptional in your opinion?

 A logic error (i.e. a mistake on the programmer's side) 
 doesn't, IMO, it should abort instead. On the other hand, 
 there is the class of situations where e.g. a system call 
 returns an error (say, "permission denied" when opening a 
 file, or out of disk space). Or more generally, an external 
 service, like a database or a remote server. However, I can't 
 see how these are fundamentally different from invalid user 
 input, and indeed, there's often not even a clear separation, 
 e.g. when a user asked you to read a file they don't have 
 access to.

 So, what's left then?

 It is an exceptional situation if input is supposed to be valid 
 but surprisingly is not. For example, calling `decodeGrapheme` 
 on an external string without making sure it is valid first.

If the function expects it to be valid but you pass it an invalid 
value, you're breaking the contract, which is a logic error and 
thus should be checked for by assert, not by an exception.
=> Case number one: logic errors, no exceptions should be used 
here.

If however the function doesn't require it to be valid (for 
`decodeGrapheme` the docs don't say anything, so I assume it 
doesn't), then it needs to be able to handle invalid input, for 
example by throwing an exception.
=> This is an example of case number two: user errors, exceptions 
are okay here.
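
The two cases can be sketched side by side. The function names `charAt`
and `parseDigit` are invented for this illustration; only `assert` and
`std.exception.enforce` are real.

```d
import std.exception : enforce;

// Case one: the caller promises a valid index. Breaking that promise is a
// bug in the program, so it is checked with assert (removed in release builds).
char charAt(string s, size_t i)
{
    assert(i < s.length, "caller passed an out-of-range index");
    return s[i];
}

// Case two: the character comes from outside the program and may be
// anything, so bad input is reported with an exception.
int parseDigit(char c)
{
    enforce(c >= '0' && c <= '9', "not a decimal digit");
    return c - '0';
}
```

`parseDigit('x')` throws an `Exception` that a caller can catch and
report, whereas `charAt("abc", 10)` is simply a bug to be fixed.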

But Brad Anderson seems to disagree on case two (or maybe case 
one?). Or is there a third type of situation not covered by these 
two cases?

 The same goes for files - trying to open a missing file is exceptional, 
 but checking for file presence is not.

I agree here, checking for presence is not exceptional.

Feb 07 2014

"Ola Fosheim Grøstad" writes:

On Friday, 7 February 2014 at 14:26:48 UTC, Marc Schütz wrote:
 or a remote server. However, I can't see how these are 
 fundamentally different from invalid user input, and indeed, 
 there's often not even a clear separation, e.g. when a user 
 asked you to read a file they don't have access to.

I agree. Any situation where it makes sense to say:

"Ouch, this is not going to work out, roll back, roll back, let's 
move out of this module! We need to try a different approach. We 
are not going to continue with anything productive down this 
lane, let's go back to the context and get into a new direction."

is suitable for exceptions and it makes code reuse, evolution and 
modification to error reporting easy.

- validation and veracity checking
- authentication failures
- database failures
- transactional retries
- serious allocation issues
- timeouts

are all fiiine for exceptions.

You get to write a request handler like this:

{
   auto sid = request.authenticate();
   auto data = validator(request.getPost("label1", "label2", "label3"));
   auto key = model.create_and_put(sid,data);
   response.writeJson(key);
   response.status = 201;
   return;
}

And you can change the error reporting at the request dispatcher 
level rather than sifting through 20 different spaghetti-like 
request handlers trying to figure out if you got it right:

{
   auto sid = request.authenticate();
   if (sid<0){
       ... return ...;
   }
   auto data = request.getPost("label1", "label2", "label3");
   if (data){
      data = validate(data);
      if (data){
         auto key = model.create_and_put(sid,data);
         if (key){
            auto ok = response.writeJson(key);
            if(ok){
               response.status = 201;
               return;
            }
            ....;
         } else {
            .... ;
         }
      } else {
         .... ;
      }
   } else {
     ... ;
   }
}
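
A minimal sketch of the dispatcher side, under stated assumptions: the
types `Response`, `AuthException` and `ValidationException` and the
functions `dispatch` and `createItem` are all hypothetical and only serve
to show where the error-to-status mapping would live.

```d
// Hypothetical exception types a handler might throw.
class AuthException : Exception
{
    this(string msg) { super(msg); }
}

class ValidationException : Exception
{
    this(string msg) { super(msg); }
}

// Hypothetical, stripped-down response type.
struct Response
{
    int status;
    string text;
}

// All error reporting policy sits here, not in the individual handlers.
Response dispatch(Response delegate() handler)
{
    try
    {
        return handler();
    }
    catch (AuthException e)
    {
        return Response(401, e.msg);
    }
    catch (ValidationException e)
    {
        return Response(400, e.msg);
    }
    catch (Exception e)
    {
        // Keep details (such as stack traces) out of the client response.
        return Response(500, "internal error");
    }
}

// Example handler written in the straight-line style shown above.
Response createItem(bool authenticated, string label)
{
    if (!authenticated)
        throw new AuthException("not logged in");
    if (label.length == 0)
        throw new ValidationException("label must not be empty");
    return Response(201, label);
}
```

Changing how errors are reported then means editing `dispatch` only, for
example `dispatch(() => createItem(true, "x"))`.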

Feb 07 2014

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, February 07, 2014 14:26:47 "Marc Schütz" <schuetzm gmx.net> wrote:
 On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote:
 Personally I don't think bad user input qualifies as an
 exceptional case because it's expected to happen and the
 program is expected to handle it (and let the user know) when
 it does. That's just a matter of taste though.

 
 Hmm... then what _does_ qualify as exceptional in your opinion?

Honestly, I think that the typical approach of discussing exceptions as being 
for "exceptional" circumstances is bad. It inevitably leads to confusion and 
debate over what "exceptional" means. Some programmers would consider that to 
mean any bad input, whereas others would take it to the extreme that they 
should only happen when your program is in an invalid state (essentially what 
we use Errors for). I've found rather that when discussing exceptions it works 
much better to explain exactly why you'd use them, and I think that that comes 
primarily down to three types of circumstances.

1. Code which should succeed most of the time and which would be far 
cleaner if it's written to throw exceptions - particularly when the 
alternative would be to check error codes on every function call (which would 
be incredibly error-prone). A prime example of this would be a parser. It's 
far cleaner to write a parser which assumes that each step succeeds than it is 
to constantly check that each one succeeded. It makes it so that only code 
that could actually encounter an error has to check for it and so that it can 
easily and cleanly propagate the error to the top. Doing that with error codes 
would generally be a mess, and unless failure is the norm, efficiency 
shouldn't be a problem.

2. Code which you can't actually guarantee will ever succeed. There are some 
cases where you can avoid errors by doing validation before proceeding (e.g. 
testing strings for Unicode correctness before doing a lot of string 
processing), but there are others where you either can't validate ahead of 
time or where you could still end up with an error in spite of your 
validation. A prime example of this would be operating on files. For 
instance, std.file.isDir will tell you whether a particular file is a directory 
or not by returning bool. If that file does not actually exist, then what is 
isDir supposed to do? All it can do is throw an exception, unless you want to 
have a separate out parameter to report whether it succeeded or not or change 
it so that it returns an error code and returns the bool as an out parameter, 
both of which would make it much uglier to use. And isDir can't assert that 
the file exists, because that's a runtime condition that cannot be fully 
verified ahead of time. You can (and should) check whether the file exists 
first

if(file.exists)
{
 if(file.isDir)
 {}
 else if(file.isFile)
 {}
 else
 {}
}

but the file system could actually delete that file right out from under you 
between the call to exists and the call to isDir (or between the calls to 
isDir and isFile), so validation reduces how often you hit the error case but 
cannot eliminate it. It should also be rare that isDir will fail (since you 
should be checking that the file exists first). So, throwing an exception 
makes perfect sense. You get clean code that's still able to handle error 
cases rather than them being ignored (as frequently happens with error codes).
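
A minimal sketch of that pattern, assuming a hypothetical helper named
`classify`: validate first to keep the error case rare, but still catch
`FileException` for the window in which the file can disappear.

```d
import std.file : exists, isDir, FileException;

// Hypothetical helper: returns "dir", "other" or "missing" for a path.
string classify(string path)
{
    try
    {
        // Up-front validation makes the exception rare...
        if (!path.exists)
            return "missing";
        // ...but isDir can still throw if the entry vanished in between.
        return path.isDir ? "dir" : "other";
    }
    catch (FileException e)
    {
        return "missing";
    }
}
```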

3. Code which should succeed most of the time but where doing validation 
essentially requires doing what you're validating for anyway. Again, parsers 
are a good example of this. For instance, to validate that 
"2013-12-22T01:22:27z" is in the valid ISO extended string format for a 
timestamp, you have to do pretty much exactly the same work that you have to 
do to parse out all of the values to convert it to something other than a 
string (e.g. SysTime). So, if you validated it first, you'd be doing the work 
twice. As such, why validate first? Just have it throw an exception when the 
parsing fails. And if for some reason, you expect that there's a high chance 
that the parsing would fail, then you can have a function which returns an 
error code and passes out the result as an out parameter instead, but that 
makes the code much uglier and error-prone. So, in most cases, you'd want it 
to throw an exception on failure. But regardless, you wouldn't want to 
validate it first as that would just be expensive all the time rather than 
more expensive in the (hopefully) rare error case.


The areas that you want to normally avoid exceptions are when you're 
validating up front or when the error condition is likely. If you're 
validating, you're normally asking a question - is this data valid - in which 
case, returning bool is the correct thing to do, not throwing on failure 
(though if the result is false, the caller could choose to throw if 
appropriate). And trying to do something which has a good chance of failing 
should probably return whether it succeeded or not, because you don't want 
exceptions to be your normal code path.

Also, performance-critical stuff may need to go the error-code path rather 
than exceptions simply due to it being performance-critical, but in general, 
error conditions which aren't bugs in your program should be reported via 
exceptions (not error codes) with validation being used where appropriate to 
make it so that the error conditions are infrequent.

- Jonathan M Davis

Feb 07 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Jonathan M Davis:

 3. Code which should succeed most of the time but where doing 
 validation
 essentially requires doing what you're validating for anyway. 
 Again, parsers
 are a good example of this. For instance, to validate that
 "2013-12-22T01:22:27z" is in the valid ISO extended string 
 format for a
 timestamp, you have to do pretty much exactly the same work 
 that you have to
 do to parse out all of the values to convert it to something 
 other than a
 string (e.g. SysTime). So, if you validated it first, you'd be 
 doing the work
 twice. As such, why validate first? Just have it throw an 
 exception when the
 parsing fails. And if for some reason, you expect that there's 
 a high chance
 that the parsing would fail, then you can have a function which 
 returns an
 error code and passes out the result as an out parameter 
 instead, but that
 makes the code much uglier and error-prone. So, in most cases, 
 you'd want it
 to throw an exception on failure.

Languages with a good type system solve this with Maybe / 
Nullable / Optional and similar things. It's both safe and 
efficient (if the result is equivalent to just a wrapping struct).

Bye,
bearophile

Feb 07 2014

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, February 07, 2014 21:27:04 bearophile wrote:
 Jonathan M Davis:
 3. Code which should succeed most of the time but where doing
 validation
 essentially requires doing what you're validating for anyway.
 Again, parsers
 are a good example of this. For instance, to validate that
 "2013-12-22T01:22:27z" is in the valid ISO extended string
 format for a
 timestamp, you have to do pretty much exactly the same work
 that you have to
 do to parse out all of the values to convert it to something
 other than a
 string (e.g. SysTime). So, if you validated it first, you'd be
 doing the work
 twice. As such, why validate first? Just have it throw an
 exception when the
 parsing fails. And if for some reason, you expect that there's
 a high chance
 that the parsing would fail, then you can have a function which
 returns an
 error code and passes out the result as an out parameter
 instead, but that
 makes the code much uglier and error-prone. So, in most cases,
 you'd want it
 to throw an exception on failure.

 
 Languages with a good type system solve this with Maybe /
 Nullable / Optional and similar things. It's both safe and
 efficient (if the result is equivalent to just a wrapping struct).

That can be a good solution, but it also then requires checking the result. 
One of the big advantages of exceptions is that your code can not care except 
for the relatively few points that catch exceptions and handle them. Where you 
run into problems is when the failure case is likely. And if that's the case, 
then something like Maybe or Nullable is definitely better.
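
For the likely-failure case, a minimal sketch with `std.typecons.Nullable`
could look like this. The function name `tryParseInt` is hypothetical; it
parses by hand so that no exception is involved at all, and the caller
has to check the result.

```d
import std.typecons : Nullable;

// Hypothetical helper: parses an optionally signed decimal integer.
// Returns a null Nullable on empty input, stray characters or overflow.
Nullable!int tryParseInt(const(char)[] s) nothrow
{
    alias Result = Nullable!int;

    size_t i = 0;
    bool negative = false;
    if (s.length > 0 && s[0] == '-')
    {
        negative = true;
        i = 1;
    }
    if (i == s.length)
        return Result.init;             // "" or "-" alone

    long acc = 0;
    for (; i < s.length; ++i)
    {
        immutable c = s[i];
        if (c < '0' || c > '9')
            return Result.init;         // not a digit
        acc = acc * 10 + (c - '0');
        if (acc > cast(long) int.max + 1)
            return Result.init;         // cannot fit in int any more
    }

    if (negative)
        acc = -acc;
    else if (acc > int.max)
        return Result.init;

    return Result(cast(int) acc);
}
```

The call site then reads `auto r = tryParseInt(input); if (!r.isNull) use(r.get);`,
which is the extra check mentioned above.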

- Jonathan M Davis

Feb 07 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
 Any application that operates on some external user input will 
 be subject to DoS attack vector if it uses Phobos directly.

Hmm, I hadn't considered that. Maybe exceptions could be handled 
automatically though, due to the fact that there is rarely more 
than one in flight at any time and they typically don't live for 
long:

1) prohibit escaping of exception objects from catch blocks (we 
could just say it is undefined behavior in the spec). The data 
pointed to by the throwable object should be normal though, if 
you want to keep the exception, you can thus just shallow copy it.

2) Set aside a static (thread local) buffer early on with a size 
of like 512 bytes.

3) Make "throw new" call a special function which favors the 
static buffer. It can do a simple bump-the-pointer allocation in 
the static region or call the regular GC if there isn't enough 
room (should be extremely rare).

throw e; works the same way it does now. You can pre-allocate 
with some other method if you want.

4) Have the compiler automatically insert a call to 
_d_free_exception in a scope(success) block inside every catch 
block. It checks the given reference, if it is in the static 
buffer, just zero it all out. If all the chain is in there, 
zeroing it will free it all. If there's any GC chained 
exceptions, zeroing it will orphan them and they'll be freed on 
the next sweep. Otherwise ... well do nothing, let the GC clean 
up after it.



Proof of concept:


bool isThrowable(const ClassInfo ci) {
     if(ci is null) return false;
     if(ci is typeid(Throwable)) return true;
     return isThrowable(ci.base);
}

byte[512] exceptionHolder = 0;
size_t exceptionHolderPosition = 0;

extern(C)
     Object _d_newclass(const ClassInfo ci) {
         if(!isThrowable(ci))
             return _d_newclass_original(ci);

         auto size = ci.init.length;
         if(exceptionHolderPosition + size > exceptionHolder.length)
             return _d_newclass_original(ci);

         byte[] slice = exceptionHolder[exceptionHolderPosition ..
             exceptionHolderPosition + size];
         exceptionHolderPosition += size;

         slice[] = ci.init[];
         import core.stdc.stdio;
         // %zu because exceptionHolderPosition is a size_t
         printf("Magic allocation to %zu\n", exceptionHolderPosition);
         return cast(Object) slice.ptr;
     }

extern(C)
     void _d_freeexception(Throwable t) {
         auto ptr = cast(void*) t;
         if(ptr >= exceptionHolder.ptr &&
            ptr < exceptionHolder.ptr + exceptionHolder.length) {
             exceptionHolder[] = 0;
             exceptionHolderPosition = 0;
             import core.stdc.stdio;
             printf("Freeing\n");
         }
         // else do nothing, the GC will handle it
     }

void main() {
     import std.stdio;
     try {
         writefln("%s"); // orphaned argument
     } catch(Exception e) {
         scope(success) _d_freeexception(e);
         writeln(e);
     }
}

// copy/paste from druntime as fallback
extern (C) void onOutOfMemoryError();
extern (C) void*  gc_malloc( size_t sz, uint ba = 0 );
extern (C) Object _d_newclass_original(const ClassInfo ci)
{
     import core.stdc.stdlib;
     static import core.memory;
     alias BlkAttr = core.memory.GC.BlkAttr;


     void* p;

     if (ci.m_flags & TypeInfo_Class.ClassFlags.isCOMclass)
     {
         p = malloc(ci.init.length);
         if (!p)
             onOutOfMemoryError();
     }
     else
     {
         // TODO: should this be + 1 to avoid having pointers to
         // the next block?
         BlkAttr attr = BlkAttr.FINALIZE;
         // extern(C++) classes don't have a classinfo pointer in
         // their vtable so the GC can't finalize them
         if (ci.m_flags & TypeInfo_Class.ClassFlags.isCPPclass)
             attr &= ~BlkAttr.FINALIZE;
         if (ci.m_flags & TypeInfo_Class.ClassFlags.noPointers)
             attr |= BlkAttr.NO_SCAN;
         p = gc_malloc(ci.init.length, attr);
     }

     // initialize it
     (cast(byte*) p)[0 .. ci.init.length] = ci.init[];

     debug(PRINTF) printf("initialization done\n");
     return cast(Object) p;
}

===

Just compile and run normally, the linker will prefer our 
_d_newclass to the one in phobos.lib automatically.

And you'll see the throw from writefln went into our static buffer 
and was freed at the end.

I toyed with a few other things too:

void main() {
     import std.stdio;
     try {
         try {
             writefln("%s"); // orphaned argument
         } catch(Exception e) {
             scope(success) _d_freeexception(e); // don't forget these
             throw new Exception("LOL", e);
         }
     } catch(Exception e) {
         scope(success) _d_freeexception(e);
         writeln(e);
         writeln(e.next);
     }
}

still works.



Am I missing a fatal flaw here? It seems to work and is kinda 
simple to do... exceptions don't really need a huge amount of 
dynamic memory.

Feb 06 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Thursday, 6 February 2014 at 22:56:45 UTC, Adam D. Ruppe wrote:
 Proof of concept:

code in a link so the lines aren't broken
http://arsdnet.net/dcode/except.d

Feb 06 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Thursday, 6 February 2014 at 22:56:45 UTC, Adam D. Ruppe wrote:
 On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
 Any application that operates on some external user input will 
 be subject to DoS attack vector if it uses Phobos directly.

 Hmm, I hadn't considered that. Maybe exceptions could be 
 handled automatically though due to the facts that there are 
 rarely more than one in flight at any time and they typically 
 don't live for long:
 [snipped lengthy example]

I really like vibe.d.  A lot.  But the way HTTP parse errors are 
handled is a disaster.  Do you know what happened when I was 
testing vibe.d recently and I sent it a bad request?  It sent a 
stack trace as a response.  A stack trace!  To a client!  I was 
speechless.  Needless to say, I don't support the idea of further 
enabling this design, regardless of whether it can be made a 
pinnacle of elegance.

Feb 06 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 03:19:32 UTC, Sean Kelly wrote:
 It sent a stack trace as a response.  A stack trace!  To a 
 client!  I was speechless.

lol, my cgi.d will do that too if you compile with -debug.... I 
find it convenient at times. (It also sends it to stderr but when 
doing cgi apps, that means digging into the apache log which is a 
pain compared to just looking at the browser)

Feb 06 2014

Jacob Carlborg <doob me.com> writes:

On 2014-02-07 04:19, Sean Kelly wrote:

 I really like vibe.d.  A lot.  But the way HTTP parse errors are handled
 is a disaster.  Do you know what happened when I was testing vibe.d
 recently and I sent it a bad request?  It sent a stack trace as a
 response.  A stack trace!  To a client!  I was speechless.  Needless to
 say, I don't support the idea of further enabling this design,
 regardless of whether it can be made a pinnacle of elegance.

Ruby on Rails renders a page with a stack trace in development mode and 
a standard 500 page in production mode. I can't understand how anyone 
can do web development without that. There's even a plugin that renders 
the stack trace as links pointing back to your editor (if supported). 
It also allows you to navigate the stack trace with a code snippet and 
simple debugger for each stack frame. Very convenient.

-- 
/Jacob Carlborg

Feb 07 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Friday, 7 February 2014 at 20:31:00 UTC, Jacob Carlborg wrote:
 On 2014-02-07 04:19, Sean Kelly wrote:

 I really like vibe.d.  A lot.  But the way HTTP parse errors 
 are handled
 is a disaster.  Do you know what happened when I was testing 
 vibe.d
 recently and I sent it a bad request?  It sent a stack trace 
 as a
 response.  A stack trace!  To a client!  I was speechless.  
 Needless to
 say, I don't support the idea of further enabling this design,
 regardless of whether it can be made a pinnacle of elegance.

 Ruby on Rails renders a page with a stack trace in development 
 mode and a standard 500 page in production mode. I can't 
 understand how anyone can do web development without that. 
 There's even a plugin that renders the stack trace as links 
 pointing back to your editor (if supported). It also allows you 
 to navigate the stack trace with a code snippet and simple 
 debugger for each stack frame. Very convenient.

I was mostly surprised that the stack trace was written back to
the client.  I'd expect something like that in a log on the
server side.  I do see how it would be convenient to have a stack
trace included in a bug report, but if this feature is disabled
in release mode then you can't rely on it anyway.  I'd just
always be checking the logs (where I'd hope the stack trace would
always be written).

Feb 07 2014

Jacob Carlborg <doob me.com> writes:

On 2014-02-07 21:56, Sean Kelly wrote:

 I was mostly surprised that the stack trace was written back to
 the client.  I'd expect something like that in a log on the
 server side.  I do see how it would be convenient to have a stack
 trace included in a bug report, but if this feature is disabled
 in release mode then you can't rely on it anyway.  I'd just
 always be checking the logs (where I'd hope the stack trace would
 always be written).

Ruby on Rails always writes the stack trace to the log. In development 
mode it will also render it to the client. In production mode we use a 
plugin that sends an email when an exception occurs. The email will 
contain the full stack trace, environment variables and some other data 
about the request that failed.

BTW, you can do a lot more with HTML than plain text (log files).

-- 
/Jacob Carlborg

Feb 09 2014

"Brad Anderson" <eco gnuk.net> writes:

On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
 Hardly so. Any exception allocation can trigger GC collection 
 cycle and Phobos does not provide any other way to handle data 
 errors. Any application that operates on some external user 
 input will be subject to DoS attack vector if it uses Phobos 
 directly.

Thinking about this more it'd probably be a good idea to use the
type system to segregate non-validated user input from the rest
of your program. UnvalidatedString or something.
UnvalidatedString.validate() returns a string you can then use in
the regular fashion. That way unvalidated data can't weasel its
way into the trusted portion of your program without getting
checked first. Anyway, that's just an idea (and getting further
and further off topic).
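
A minimal sketch of what such a wrapper might look like, assuming the untrusted data arrives as raw bytes. The UnvalidatedString type and its validate() method are hypothetical names taken from the idea above, not existing Phobos API; only std.utf.validate and UTFException are real. Note that D's private is module-level, so the raw field is hidden only from code in other modules.

```d
static import std.utf;

/// Hypothetical wrapper around bytes received from an untrusted source.
struct UnvalidatedString
{
    private immutable(ubyte)[] raw;

    this(immutable(ubyte)[] data)
    {
        raw = data;
    }

    /// Returns the data as an ordinary string if it is well-formed UTF-8;
    /// throws std.utf.UTFException otherwise.
    string validate() const
    {
        auto text = cast(string) raw;
        std.utf.validate(text);
        return text;
    }
}

void main()
{
    immutable(ubyte)[] good = [0x68, 0x69];   // the bytes of "hi"
    immutable(ubyte)[] bad = [0xE2, 0x98];    // a truncated three-byte sequence

    string checked = UnvalidatedString(good).validate();
    assert(checked == "hi");

    bool rejected = false;
    try
    {
        UnvalidatedString(bad).validate();
    }
    catch (std.utf.UTFException)
    {
        rejected = true;
    }
    assert(rejected);
}
```

Functions in the trusted part of the program would then accept only string, so raw input cannot reach them without passing through validate() first.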

Feb 06 2014

"Dicebot" <public dicebot.lv> writes:

On Friday, 7 February 2014 at 05:25:26 UTC, Brad Anderson wrote:
 Thinking about this more it'd probably be a good idea to use the
 type system to segregate non-validated user input from the rest
 of your program. UnvalidatedString or something.
 UnvalidatedString.validate() returns a string you can then use 
 in
 the regular fashion. That way unvalidated data can't weasel its
 way into the trusted portion of your program without getting
 checked first. Anyway, that's just an idea (and getting further
 and further off topic).

Yes, I even had some simple proof-of-concept drafts of such an 
approach for vibe.d but never finished them. User input is not 
a problem if Phobos provides more strongly typed nothrow 
tools.

Feb 07 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 11:06:47 UTC, Dicebot wrote:
 Yes, I even had some simple proof-of-concept drafts of such 
 approach for vibe.d but have never finished it. User input is 
 not a problem if Phobos will provide more strongly typed 
  nothrow tools.

Yeah, I think using separate types for printing to users is often
a good idea too, since then the type system can help with i18n.

Feb 07 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 06 Feb 2014 14:08:39 -0500, Adam D. Ruppe  
<destructionator gmail.com> wrote:

 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?

 I think exceptions should be ok. You optimize the typical path, and  
 exceptions are (by definition) an exceptional path. If they are also  
 unacceptable, you could restrict yourself to nothrow functions. (Which  
 can still throw Errors... but meh they are even *more* exceptional)

I think if reference counting is added, exceptions would be a prime  
candidate for using it. They are basically discarded immediately after  
being handled.

-Steve

Feb 06 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei 
 Alexandrescu wrote:
 One interesting point is that modules that were written with 
 avoiding
 allocations in mind usually still allocate when throwing 
 exceptions.

 Good point, we need to address that as well.

 Hey, wait a second. How do you throw without allocating?

Does this case even matter?  Exceptions are not a normal function 
of execution, and so should happen rarely to never.  And it's a 
time when I'd expect a delay anyway.

Feb 06 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/14, 11:54 AM, Sean Kelly wrote:
 On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
 One interesting point is that modules that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.

 Good point, we need to address that as well.

 Hey, wait a second. How do you throw without allocating?

 Does this case even matter?  Exceptions are not a normal function of
 execution, and so should happen rarely to never.  And it's a time when
 I'd expect a delay anyway.

I think it's okay to put this on the backburner and revisit it later.

Andrei

Feb 06 2014

"Dicebot" <public dicebot.lv> writes:

On Thursday, 6 February 2014 at 19:54:27 UTC, Sean Kelly wrote:
 Does this case even matter?  Exceptions are not a normal 
 function of execution, and so should happen rarely to never.  
 And it's a time when I'd expect a delay anyway.

Imagine intentionally crafted broken utf as user input in 
repeated requests. You don't have control over it.

Now if Phobos would have only thrown exceptions in really 
_exceptional_ situations and handled broken input gracefully...

Feb 06 2014

"Brad Anderson" <eco gnuk.net> writes:

On Thursday, 6 February 2014 at 21:48:13 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 19:54:27 UTC, Sean Kelly wrote:
 Does this case even matter?  Exceptions are not a normal 
 function of execution, and so should happen rarely to never.  
 And it's a time when I'd expect a delay anyway.

 Imagine intentionally crafted broken utf as user input in 
 repeated requests. You don't have control over it.

 Now if Phobos would have only thrown exceptions in really 
 _exceptional_ situations and handled broken input gracefully...

You should probably validate utf from all foreign sources. Catch 
a problem with it as it comes in rather than in some arbitrary 
part of your program.

Feb 06 2014

"Dicebot" <public dicebot.lv> writes:

On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson wrote:
 You should probably validate utf from all foreign sources. 
 Catch a problem with it as it comes in rather than in some 
 arbitrary part of your program.



pure @safe void validate(S)(in S str) if (isSomeString!S);

Throws:
UTFException if str is not well-formed.

;)

Feb 06 2014

"Brad Anderson" <eco gnuk.net> writes:

On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson 
 wrote:
 You should probably validate utf from all foreign sources. 
 Catch a problem with it as it comes in rather than in some 
 arbitrary part of your program.



 pure @safe void validate(S)(in S str) if (isSomeString!S);

 Throws:
 UTFException if str is not well-formed.

 ;)

Heh, well then... let me just wipe this egg off my face. :P

Feb 06 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson 
 wrote:
 You should probably validate utf from all foreign sources. 
 Catch a problem with it as it comes in rather than in some 
 arbitrary part of your program.



 pure @safe void validate(S)(in S str) if (isSomeString!S);

 Throws:
 UTFException if str is not well-formed.

And somewhere in the world, darkness fell forever on a bright and 
beautiful countryside.  The monsters poured forth and devoured 
everything in sight, given strength by that unbelievable 
abomination of a function design.

Feb 06 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 03:14:45 UTC, Sean Kelly wrote:
 On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:
 UTFException if str is not well-formed.

 unbelievable abomination of a function design.

Yeah, that is absurd. It is a bad, bad sign when almost every 
time you use a function, you write

bool ok = true;
try validate(s); catch(UTFException) ok = false;
if(!ok) {}

yet that's how i use validate...

fun fact, my little toy scripting language supports
var a = try foo();; // if foo throws, a == the exception object

but it's a toy scripting language, ugly crap is allowed there :)

Feb 06 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/14, 7:27 PM, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 03:14:45 UTC, Sean Kelly wrote:
 On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:
 UTFException if str is not well-formed.

 unbelievable abomination of a function design.

 Yeah, that is absurd. It is a bad, bad sign when almost every time you
 use a function, you write

 bool ok = true;
 try validate(s); catch(UTFException) ok = false;
 if(!ok) {}

 yet that's how i use validate...

Add a bugzilla and let's define isValid that returns bool!
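
A rough sketch of the bool-returning interface, written as a thin wrapper over the existing std.utf.validate. The name isValidUTF is an assumption for illustration, not the function that was eventually filed or added. Because it still relies on validate internally, an exception is presumably still allocated on bad input; the sketch only shows the calling convention.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper, not a Phobos function: reports well-formedness
/// as a bool instead of letting UTFException escape.
bool isValidUTF(const(char)[] text)
{
    try
    {
        validate(text);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string frown = "\u2639";              // U+2639, three UTF-8 code units

    assert(isValidUTF("plain ASCII"));
    assert(isValidUTF(frown));
    assert(!isValidUTF(frown[0 .. 2]));   // last code unit cut off
}
```

A real implementation would walk the code units directly and return false on the first ill-formed sequence, so that no exception is created at all.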

Andrei

Feb 07 2014

"Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:

On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu 
wrote:
 Add a bugzilla and let's define isValid that returns bool!

Add std.utf.decode() to that as well. IOW, it should have an 
overload which returns a status code but assigns the return value 
through another parameter.
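
One way such an overload could look, sketched on top of the existing throwing std.utf.decode. The name tryDecode and its parameter order are assumptions made for this example only.

```d
import std.utf : decode, UTFException;

/// Hypothetical overload: returns true and stores the code point in
/// `result` on success; returns false and leaves `index` where it was
/// on failure or at the end of the input.
bool tryDecode(const(char)[] str, ref size_t index, out dchar result)
{
    if (index >= str.length)
        return false;

    immutable start = index;
    try
    {
        result = decode(str, index);
        return true;
    }
    catch (UTFException)
    {
        index = start;   // make sure a failed call does not move the index
        return false;
    }
}

void main()
{
    string text = "a\u2639";   // 'a' followed by a three-code-unit character
    size_t i = 0;
    dchar c;

    bool ok = tryDecode(text, i, c);
    assert(ok && c == 'a' && i == 1);

    ok = tryDecode(text, i, c);
    assert(ok && c == '\u2639' && i == 4);

    ok = tryDecode(text, i, c);
    assert(!ok);               // end of input

    string broken = text[2 .. $];   // starts in the middle of the sequence
    size_t j = 0;
    ok = tryDecode(broken, j, c);
    assert(!ok && j == 0);
}
```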

Feb 07 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

07-Feb-2014 20:29, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!

 Add std.utf.decode() to that as well. IOW, it should have an overload
 which returns a status code

Much simpler - it returns a special dchar to designate bad encoding. And 
there is one defined by Unicode spec.


-- 
Dmitry Olshansky

Feb 07 2014

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 2/7/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Much simpler - it returns a special dchar to designate bad encoding. And
 there is one defined by Unicode spec.

A NaN for chars? Sounds great to me! :)

Feb 07 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

07-Feb-2014 21:07, Andrej Mitrovic wrote:
 On 2/7/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Much simpler - it returns a special dchar to designate bad encoding. And
 there is one defined by Unicode spec.

 A NaN for chars? Sounds great to me! :)

It's called \uFFFD and is specifically for bad encodings. I wonder why 
nobody had perused the spec when writing std.utf.decode in the first 
place...

5.22 Best Practice for U+FFFD Substitution

When converting text from one character encoding to another, a 
conversion algorithm may
encounter unconvertible code units. This is most commonly caused by some 
sort of corruption
of the source data, so that it does not correctly follow the 
specification for that
character encoding. Examples include dropping a byte in a multibyte 
encoding such as
Shift-JIS, improper concatenation of strings, a mismatch between an 
encoding declaration
and actual encoding of text, use of non-shortest form for UTF-8, and so on.

...

Whenever an unconvertible offset is reached during conversion of a code
unit sequence:
1. The maximal subpart at that offset should be replaced by a single
U+FFFD.
2. The conversion should proceed at the offset immediately after the maximal
subpart.
---

Fast, simple and according to the standard. Best of all - no stinkin' 
exceptions! ;)
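
A simplified sketch of a decode that substitutes U+FFFD instead of reporting an error to the caller. The helper name decodeOrReplacement is hypothetical, and the sketch is built on the throwing std.utf.decode only to stay short, so it does not avoid the exception internally. It also skips a single code unit per failure, which is cruder than the "maximal subpart" practice quoted above.

```d
import std.utf : decode, UTFException;

/// Hypothetical helper. Precondition: index < str.length.
/// Returns the decoded code point, or U+FFFD if the sequence starting
/// at `index` is ill-formed. On failure the index moves forward by one
/// code unit (a simplification of the Unicode recommendation).
dchar decodeOrReplacement(const(char)[] str, ref size_t index)
{
    immutable start = index;
    try
    {
        return decode(str, index);
    }
    catch (UTFException)
    {
        index = start + 1;
        return '\uFFFD';
    }
}

void main()
{
    string text = "a\u2639";
    size_t i = 0;
    assert(decodeOrReplacement(text, i) == 'a');
    assert(decodeOrReplacement(text, i) == '\u2639');
    assert(i == text.length);

    string broken = text[2 .. $];   // two stray continuation code units
    size_t j = 0;
    assert(decodeOrReplacement(broken, j) == '\uFFFD');
    assert(j == 1);
}
```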

-- 
Dmitry Olshansky

Feb 07 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 12:14 PM, Dmitry Olshansky wrote:
 Fast, simple and according to the standard. Best of all - no stinkin'
 exceptions! ;)

Nice find. Looks good to me.

Feb 08 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

09-Feb-2014 02:16, Walter Bright wrote:
 On 2/7/2014 12:14 PM, Dmitry Olshansky wrote:
 Fast, simple and according to the standard. Best of all - no stinkin'
 exceptions! ;)

 Nice find. Looks good to me.

https://d.puremagic.com/issues/show_bug.cgi?id=12113

-- 
Dmitry Olshansky

Feb 08 2014

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!

 
 Add std.utf.decode() to that as well. IOW, it should have an overload
 which returns a status code

 
 Much simpler - it returns a special dchar to designate bad encoding. And
 there is one defined by Unicode spec.

Isn't that actually worse? Unless you're suggesting that we stop throwing on 
decode errors, then functions like std.array.front will have to check the 
result on every call to see whether it was valid or not and thus whether they 
should throw, which would mean extra overhead over simply having decode throw 
on decode errors. validate has no business throwing, and we definitely should 
add isValidUnicode (or isValid or whatever you want to call it) for validation 
purposes. Code can then call that to validate that a string is valid and not 
worry about any UTFExceptions being thrown as long as it doesn't manipulate 
the string in a way that could result in its Unicode becoming invalid. 
However, I would argue that assuming that everyone is going to validate their 
strings and that pretty much all string-related functions shouldn't ever have 
to worry about invalid Unicode is just begging for subtle bugs all over the 
place IMHO. You're essentially dealing with error codes at that point, and I 
think that experience has shown quite clearly that error codes are generally a 
bad way to go. Almost no one checks them unless they have to. I think that 
having decode throw on invalid Unicode is exactly what it should be doing. The 
problem is that validate shouldn't.

- Jonathan M Davis

Feb 07 2014

"Meta" <jared771 gmail.com> writes:

On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis 
wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei 
 Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!

 
 Add std.utf.decode() to that as well. IOW, it should have an 
 overload
 which returns a status code

 
 Much simpler - it returns a special dchar to designate bad 
 encoding. And
 there is one defined by Unicode spec.

 Isn't that actually worse? Unless you're suggesting that we 
 stop throwing on
 decode errors, then functions like std.array.front will have to 
 check the
 result on every call to see whether it was valid or not and 
 thus whether they
 should throw, which would mean extra overhead over simply 
 having decode throw
 on decode errors. validate has no business throwing, and we 
 definitely should
 add isValidUnicode (or isValid or whatever you want to call it) 
 for validation
 purposes. Code can then call that to validate that a string is 
 valid and not
 worry about any UTFExceptions being thrown as long as it 
 doesn't manipulate
 the string in a way that could result in its Unicode becoming 
 invalid.
 However, I would argue that assuming that everyone is going to 
 validate their
 strings and that pretty much all string-related functions 
 shouldn't ever have
 to worry about invalid Unicode is just begging for subtle bugs 
 all over the
 place IMHO. You're essentially dealing with error codes at that 
 point, and I
 think that experience has shown quite clearly that error codes 
 are generally a
 bad way to go. Almost no one checks them unless they have to. I 
 think that
 having decode throw on invalid Unicode is exactly what it 
 should be doing. The
 problem is that validate shouldn't.

 - Jonathan M Davis

You could always return an Option!char. Nullable won't work 
because it lets you access the naked underlying value.

Feb 07 2014

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, February 07, 2014 23:01:46 Meta wrote:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis
 
 wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei
 
 Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!

 
 Add std.utf.decode() to that as well. IOW, it should have an
 overload
 which returns a status code

 
 Much simpler - it returns a special dchar to designate bad
 encoding. And
 there is one defined by Unicode spec.

 
 Isn't that actually worse? Unless you're suggesting that we
 stop throwing on
 decode errors, then functions like std.array.front will have to
 check the
 result on every call to see whether it was valid or not and
 thus whether they
 should throw, which would mean extra overhead over simply
 having decode throw
 on decode errors. validate has no business throwing, and we
 definitely should
 add isValidUnicode (or isValid or whatever you want to call it)
 for validation
 purposes. Code can then call that to validate that a string is
 valid and not
 worry about any UTFExceptions being thrown as long as it
 doesn't manipulate
 the string in a way that could result in its Unicode becoming
 invalid.
 However, I would argue that assuming that everyone is going to
 validate their
 strings and that pretty much all string-related functions
 shouldn't ever have
 to worry about invalid Unicode is just begging for subtle bugs
 all over the
 place IMHO. You're essentially dealing with error codes at that
 point, and I
 think that experience has shown quite clearly that error codes
 are generally a
 bad way to go. Almost no one checks them unless they have to. I
 think that
 having decode throw on invalid Unicode is exactly what it
 should be doing. The
 problem is that validate shouldn't.
 
 - Jonathan M Davis

 
 You could always return an Option!char. Nullable won't work
 because it lets you access the naked underlying value.

How is that any better than returning an invalid dchar with a specific value? 
In either case, you have to check the value. With the exception, code doesn't 
have to care. If the string is invalid, it'll get a UTFException, and it can 
handle it appropriately, but having to check the return value just adds 
overhead (albeit minimal) and is error-prone, because it generally won't be 
checked (and if it is checked, it complicates the calling code, because it has 
to do the check).

Code that doesn't want to risk a UTFException being thrown can validate up 
front - and that validator function should return bool and _not_ throw. But having 
decode not throw is going to be error-prone. It also doesn't help performance-
wise, because it still has to do all of the same validity checks as it 
decodes. It's just that instead of throwing, it returns an error value. I 
really think that having decode throw on invalid Unicode is the right 
decision, and I don't see what we gain by making it not throw.

- Jonathan M Davis

Feb 07 2014

"Meta" <jared771 gmail.com> writes:

On Friday, 7 February 2014 at 23:45:06 UTC, Jonathan M Davis 
wrote:
 On Friday, February 07, 2014 23:01:46 Meta wrote:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis
 
 wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei
 
 Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns 
 bool!

 
 Add std.utf.decode() to that as well. IOW, it should have 
 an
 overload
 which returns a status code

 
 Much simpler - it returns a special dchar to designate bad
 encoding. And
 there is one defined by Unicode spec.

 
 Isn't that actually worse? Unless you're suggesting that we
 stop throwing on
 decode errors, then functions like std.array.front will have 
 to
 check the
 result on every call to see whether it was valid or not and
 thus whether they
 should throw, which would mean extra overhead over simply
 having decode throw
 on decode errors. validate has no business throwing, and we
 definitely should
 add isValidUnicode (or isValid or whatever you want to call 
 it)
 for validation
 purposes. Code can then call that to validate that a string 
 is
 valid and not
 worry about any UTFExceptions being thrown as long as it
 doesn't manipulate
 the string in a way that could result in its Unicode becoming
 invalid.
 However, I would argue that assuming that everyone is going 
 to
 validate their
 strings and that pretty much all string-related functions
 shouldn't ever have
 to worry about invalid Unicode is just begging for subtle 
 bugs
 all over the
 place IMHO. You're essentially dealing with error codes at 
 that
 point, and I
 think that experience has shown quite clearly that error 
 codes
 are generally a
 bad way to go. Almost no one checks them unless they have 
 to. I
 think that
 having decode throw on invalid Unicode is exactly what it
 should be doing. The
 problem is that validate shouldn't.
 
 - Jonathan M Davis

 
 You could always return an Option!char. Nullable won't work
 because it lets you access the naked underlying value.

 How is that any better than returning an invalid dchar with a 
 specific value?
 In either case, you have to check the value. With the 
 exception, code doesn't
 have to care. If the string is invalid, it'll get a 
 UTFException, and it can
 handle it appropriately, but having to check the return value 
 just adds
 overhead (albeit minimal) and is error-prone, because it 
 generally won't be
 checked (and if it is checked, it complicates the calling code, 
 because it has
 to do the check).

We have had this discussion at least once before. A hypothetical 
Option type will not let you do anything with the wrapped value 
UNTIL you check it, as opposed to returning null, -1, some 
special Unicode value, etc. Trying to use it before this check is 
necessarily a compile-time error. This is both faster than 
exceptions and safer than special "error values" that are only 
special by convention. I recall that you've worked with Haskell 
before, so you must know how useful this pattern is.
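
To make the idea concrete, here is a small sketch of such a type in D. Everything in it (Option, some, none, match, decodeOpt) is hypothetical and written for this example; nothing here is Phobos API apart from std.utf.decode and UTFException. It uses dchar rather than char because that is what decode returns. The payload can only be reached through match, which requires a handler for the empty case; since D's private is module-level, that restriction holds only for code outside the defining module.

```d
import std.utf : decode, UTFException;

/// Hypothetical Option type: the payload is not exposed directly.
struct Option(T)
{
    private T payload;
    private bool hasValue;

    static Option some(T value) { return Option(value, true); }
    static Option none() { return Option(T.init, false); }
}

/// The only way to get at the payload: the caller must say what to do
/// in both the "some" and the "none" case.
auto match(alias onSome, alias onNone, T)(Option!T opt)
{
    return opt.hasValue ? onSome(opt.payload) : onNone();
}

/// Hypothetical decode variant returning an Option instead of throwing.
/// Precondition: index < str.length.
Option!dchar decodeOpt(const(char)[] str, ref size_t index)
{
    immutable start = index;
    try
    {
        return Option!dchar.some(decode(str, index));
    }
    catch (UTFException)
    {
        index = start + 1;   // skip the offending code unit
        return Option!dchar.none();
    }
}

void main()
{
    string text = "a\u2639";
    size_t i = 0;
    dchar first = decodeOpt(text, i).match!(ch => ch, () => cast(dchar) '?');
    assert(first == 'a');

    string broken = text[2 .. $];
    size_t j = 0;
    dchar bad = decodeOpt(broken, j).match!(ch => ch, () => cast(dchar) '?');
    assert(bad == '?');
}
```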

 Code that doesn't want to risk a UTFException being thrown can 
 validate up
 front - and that validator function should return bool and _not_ 
 throw. But having
 decode not throw is going to be error-prone. It also doesn't 
 help performance-
 wise, because it still has to do all of the same validity 
 checks as it
 decodes. It's just that instead of throwing, it returns an 
 error value. I
 really think that having decode throw on invalid Unicode is the 
 right
 decision, and I don't see what we gain by making it not throw.

 - Jonathan M Davis

Feb 07 2014

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Saturday, February 08, 2014 01:26:10 Meta wrote:
 You could always return an Option!char. Nullable won't work
 because it lets you access the naked underlying value.

 
 How is that any better than returning an invalid dchar with a
 specific value?
 In either case, you have to check the value. With the
 exception, code doesn't
 have to care. If the string is invalid, it'll get a
 UTFException, and it can
 handle it appropriately, but having to check the return value
 just adds
 overhead (albeit minimal) and is error-prone, because it
 generally won't be
 checked (and if it is checked, it complicates the calling code,
 because it has
 to do the check).

 
 We have had this discussion at least once before. A hypothetical
 Option type will not let you do anything with the wrapped value
 UNTIL you check it, as opposed to returning null, -1, some
 special Unicode value, etc. Trying to use it before this check is
 necessarily a compile-time error. This is both faster than
 exceptions and safer than special "error values" that are only
 special by convention. I recall that you've worked with Haskell
 before, so you must know how useful this pattern is.

The problem is that you need to check it. This is _slower_ than exceptions in 
the normal case, as invalid Unicode should be the rare case. The great thing 
with exceptions is that you can write your code as if it will always work and 
don't need to put checks in it everywhere. Instead, you just put try-catch 
blocks in the (relatively) few places that you want to handle exceptions. Most 
of your code doesn't care. And if you validate the string before you start 
doing a bunch of operations on it, then you don't have to worry about a 
UTFException being thrown. Also, if code fails to validate a string for one 
reason or another, the error gets reported rather than an invalid return value 
being ignored.

As for returning Optional/Nullable dchar vs an invalid dchar, I don't see much 
difference. In both cases, you have to check the return value, which is 
precisely what you don't want to have to do in most cases. And decode has to 
do the same work to check for valid Unicode whether it throws an exception or 
returns a value indicating decode-failure, so why have the extra overhead of 
having to check the result for decode-failure? Just let it throw an exception 
in that case and handle it in the appropriate part of your code. Returning a 
Nullable result or a specific bad value that you have to check rather than 
throwing an exception only makes sense when it's expected that failures are 
going to be frequent. If failures are infrequent, it's generally far better to 
use exceptions, because it will lead to much cleaner, less error-prone code.

- Jonathan M Davis

Feb 07 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Jonathan M Davis:

 The problem is that you need to check it. This is _slower_ than 
 exceptions in the normal case,

Right, but verifying the correctness of the Unicode encoding of a 
string probably requires, on average, much more time than 
testing a single conditional. So I think this tiny added time is 
acceptable.

Bye,
bearophile

Feb 07 2014

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Saturday, February 08, 2014 02:41:54 bearophile wrote:
 Jonathan M Davis:
 The problem is that you need to check it. This is _slower_ than
 exceptions in the normal case,

 
 Right, but verifying the correctness of the Unicode encoding of a
 string probably requires, on average, much more time than
 testing a single conditional. So I think this tiny added time is
 acceptable.

But why even do it in the first place then? The code is cleaner and less 
error-prone if it uses exceptions. The only argument I can see being made for 
not using exceptions with decode is efficiency, because it's more cumbersome 
to use if it's returning error values of some kind rather than just throwing 
in the rare case that there's a Unicode decoding error. It's also more error-
prone than using exceptions, because most code will just skip checking the 
result. That's one of the big reasons that error codes are generally a bad 
idea.

But since decode has to do the same validity checks whether it returns an 
invalid dchar or a Nullable!dchar or if it throws, I don't see why not having 
the exception buys us anything. It just makes the API worse.

- Jonathan M Davis

Feb 07 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Fri, 07 Feb 2014 22:42:00 -0500,
"Jonathan M Davis" <jmdavisProg gmx.com> wrote:

 On Saturday, February 08, 2014 02:41:54 bearophile wrote:
 Jonathan M Davis:
 The problem is that you need to check it. This is _slower_ than
 exceptions in the normal case,

 
 Right, but verifying the correctness of the Unicode encoding of a
 string probably requires, on average, much more time than
 testing a single conditional. So I think this tiny added time is
 acceptable.

 
 But why even do it in the first place then? The code is cleaner and less 
 error-prone if it uses exceptions. The only argument I can see being made for 
 not using exceptions with decode is efficiency, because it's more cumbersome 
 to use if it's returning error values of some kind rather than just throwing 
 in the rare case that there's a Unicode decoding error. It's also more error-
 prone than using exceptions, because most code will just skip checking the 
 result. That's one of the big reasons that error codes are generally a bad 
 idea.
 
 But since decode has to do the same validity checks whether it returns an 
 invalid dchar or a Nullable!dchar or if it throws, I don't see why not having 
 the exception buys us anything. It just makes the API worse.
 
 - Jonathan M Davis

I agree with both of you. The Unicode standard tells us that
it is correct to replace invalid data with that special code
point, so it should be used where applicable, e.g. when one
sanitizes an invalid string.
On the other hand exceptions are clearly superior to error
returns.

I guess we just have two use cases here. One where invalid
encoding is not an error (e.g. for sanitizing purposes) and
one where you don't want to lose information and have to
enforce correct encoding.
Name the first one "decodeSubst" maybe and have decode call
that and check for 0xFFFD?

-- 
Marco

Feb 07 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Sat, 8 Feb 2014 05:29:35 +0100,
Marco Leise <Marco.Leise gmx.de> wrote:

 Name the first one "decodeSubst" maybe and have decode call
 that and check for 0xFFFD?

 
Err... the other way round. 0xFFFD would actually be valid
from an encoding point of view, I guess.

-- 
Marco

Feb 07 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, February 08, 2014 05:29:35 Marco Leise wrote:
 I guess we just have two use cases here. One where invalid
 encoding is not an error (e.g. for sanitizing purposes) and
 one where you don't want to lose information and have to
 enforce correct encoding.
 Name the first one "decodeSubst" maybe and have decode call
 that and check for 0xFFFD?

I think that that would call for us to have 3 related but distinct functions:

1. decode, which throws on invalid Unicode. We already have this.

2. isValidUnicode, which returns whether the string is valid Unicode and does 
not throw. We don't yet have this. Rather, we have validate which does the 
same job and then throws instead of returning bool.

3. sanitizeUnicode (or whatever would be a good name for it), which replaces 
invalid Unicode with 0xFFFD (or whatever the appropriate character is) so that 
it can be operated on without causing decode to throw in spite of the fact 
that it was invalid Unicode. We don't have anything like this yet.
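
The third function in this list could be sketched roughly as follows. The name sanitizeUTF is an assumption, and the sketch leans on the throwing decode for brevity. It emits one U+FFFD per offending code unit, which is simpler than the substitution practice recommended by the Unicode standard.

```d
import std.array : appender;
import std.utf : decode, UTFException;

/// Hypothetical helper: returns a copy of `input` in which every code
/// unit that cannot start a well-formed sequence is replaced by U+FFFD.
string sanitizeUTF(const(char)[] input)
{
    enum dchar replacement = '\uFFFD';
    auto result = appender!string();
    size_t i = 0;

    while (i < input.length)
    {
        immutable start = i;
        try
        {
            result.put(decode(input, i));   // well-formed: copy it over
        }
        catch (UTFException)
        {
            result.put(replacement);        // ill-formed: substitute
            i = start + 1;                  // and resume one unit later
        }
    }
    return result.data;
}

void main()
{
    string frown = "\u2639";
    string broken = "x" ~ frown[1 .. $];    // 'x' plus two stray code units

    assert(sanitizeUTF("abc") == "abc");
    assert(sanitizeUTF(frown) == frown);
    assert(sanitizeUTF(broken) == "x\uFFFD\uFFFD");
}
```

After sanitizing, the result is well-formed, so a later decode over it has nothing to throw about.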

- Jonathan M Davis

Feb 07 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Fri, 07 Feb 2014 21:04:08 -0800,
Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Saturday, February 08, 2014 05:29:35 Marco Leise wrote:
 I guess we just have two use cases here. One where invalid
 encoding is not an error (e.g. for sanitizing purposes) and
 one where you don't want to lose information and have to
 enforce correct encoding.
 Name the first one "decodeSubst" maybe and have decode call
 that and check for 0xFFFD?

 
 I think that that would call for us to have 3 related but distinct functions:
 
 1. decode, which throws on invalid Unicode. We already have this.

 2. isValidUnicode, which returns whether the string is valid Unicode and does 
 not throw. We don't yet have this. Rather, we have validate which does the 
 same job and then throws instead of returning bool.

Yes, that's the one that needs to be added.

 3. sanitizeUnicode (or whatever would be a good name for it), which replaces 
 invalid Unicode with 0xFFFD (or whatever the appropriate character is) so that 
 it can be operated on without causing decode to throw in spite of the fact 
 that it was invalid Unicode. We don't have anything like this yet.

And oh wonder, we actually have that already! Problem solved:

(Not that I knew that beforehand *cough*)

Or does someone have a need to also sanitize code point by code
point?

 - Jonathan M Davis

-- 
Marco

Feb 07 2014

"Brad Anderson" <eco gnuk.net> writes:

On Saturday, 8 February 2014 at 05:04:35 UTC, Jonathan M Davis
wrote:
 I think that that would call for us to have 3 related but 
 distinct functions:

 1. decode, which throws on invalid Unicode. We already have 
 this.

I wonder if it'd be too reckless to just make decode for string
nothrow (we want this function to be as fast as possible) and
just require that string, by definition, must be valid unicode.
to!string and company could validate strings as they come in from
foreign sources. This way invalid unicode is caught early and
decode gets a speedup.

char[] is different because the mutability means it could be made
invalid at any time so we can't rely on it staying valid after
it's been checked but once a string has been confirmed valid
there is no reason to check it for validity ever again.
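
A minimal sketch of the validate-at-the-boundary idea described above. The helper name fromForeign is hypothetical (it is not a Phobos function); only std.utf.validate is an existing API here, and the sketch assumes the foreign data arrives as raw bytes.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Hypothetical boundary helper: checks foreign bytes once, then hands
/// them out as a string that the rest of the program treats as valid.
string fromForeign(immutable(ubyte)[] raw)
{
    auto s = cast(string) raw;
    validate(s); // throws UTFException if the bytes are not well-formed UTF-8
    return s;
}

void main()
{
    immutable(ubyte)[] good = [0x68, 0x69];       // "hi"
    immutable(ubyte)[] bad  = [0x68, 0xFF, 0x69]; // stray 0xFF byte

    assert(fromForeign(good) == "hi");
    assertThrown!UTFException(fromForeign(bad));
}
```

Whether decode could then skip its own checks depends on nothing downstream being able to produce an invalid string again.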

Feb 08 2014

Timon Gehr <timon.gehr gmx.ch> writes:

On 02/08/2014 07:44 PM, Brad Anderson wrote:
 On Saturday, 8 February 2014 at 05:04:35 UTC, Jonathan M Davis
 wrote:
 I think that that would call for us to have 3 related but distinct
 functions:

 1. decode, which throws on invalid Unicode. We already have this.

 I wonder if it'd be too reckless to just make decode for string
 nothrow (we want this function to be as fast as possible) and
 just require that string, by definition, must be valid unicode.
 to!string and company could validate strings as they come in from
 foreign sources. This way invalid unicode is caught early and
 decode gets a speedup.

 char[] is different because the mutability means it could be made
 invalid at any time so we can't rely on it staying valid after
 it's been checked but once a string has been confirmed valid
 there is no reason to check it for validity ever again.

"☹"[1..$]

Feb 08 2014
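
The one-line reply above reads as a counterexample: slicing works on code units, so a validated immutable string can still yield an invalid one. A small sketch of that, using only std.utf.validate and assuming the source file is saved as UTF-8:

```d
import std.exception : assertNotThrown, assertThrown;
import std.utf : UTFException, validate;

void main()
{
    string face = "☹"; // U+2639 occupies three UTF-8 code units
    assert(face.length == 3);
    assertNotThrown!UTFException(validate(face));

    // Slicing counts code units, not code points: this drops the lead
    // byte and keeps two orphaned continuation bytes.
    string tail = face[1 .. $];
    assert(tail.length == 2);
    assertThrown!UTFException(validate(tail));
}
```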

"Dominikus Dittes Scherkl" writes:

On Saturday, 8 February 2014 at 18:44:38 UTC, Brad Anderson wrote:
 I wonder if it'd be too reckless to just make decode for string
 nothrow (we want this function to be as fast as possible) and

Yes. It shouldn't throw. Never.

 just require that string, by definition, must be valid unicode.

Why?
Replacement of broken code is defined by Unicode - we should use 
it.
No one prevents you from calling isValidUnicode beforehand and handling 
that separately if it returns "false" (I would recommend that 
only if security is relevant, e.g. if you check a signature or 
something like that) or from searching for 0xFFFD in the result string 
afterwards and throwing if you find some (but this is generally not a 
good idea because the replacement characters may have been there 
even before and were intended).
As a default, replacing broken characters is very good. And fast.

Feb 08 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, February 07, 2014 21:04:08 Jonathan M Davis wrote:
 On Saturday, February 08, 2014 05:29:35 Marco Leise wrote:
 I guess we just have two use cases here. One where invalid
 encoding is not an error (e.g. for sanitizing purposes) and
 one where you don't want to lose information and have to
 enforce correct encoding.
 Name the first one "decodeSubst" maybe and have decode call
 that and check for 0xFFFD?

 
 I think that that would call for us to have 3 related but distinct
 functions:
 
 1. decode, which throws on invalid Unicode. We already have this.
 
 2. isValidUnicode, which returns whether the string is valid Unicode and
 does not throw. We don't yet have this. Rather, we have validate which does
 the same job and then throws instead of returning bool.
 
 3. sanitizeUnicode (or whatever would be a good name for it), which replaces
 invalid Unicode with 0xFFFD (or whatever the appropriate character is) so
 that it can be operated on without causing decode to throw in spite of the
 fact that it was invalid Unicode. We don't have anything like this yet.

Actually, thinking this through some more, if we can replace invalid Unicode 
with 0xFFFD, and have all algorithms work with that and consider it valid 
Unicode (rather than getting weird bugs due to invalid Unicode), then if 
decode returned that on error rather than throwing, we wouldn't actually need 
to check the return value. It wouldn't matter that the Unicode was invalid. 
So, we wouldn't even need to _care_ that the Unicode was invalid. Anyone who 
_did_ care could call isValidUnicode to validate the Unicode first, and those 
who didn't wouldn't need to worry about UTFException being thrown, because 
everything would still work even if the string was invalid Unicode.

So, if that's indeed what 0xFFFD does, and that's what Dmitry meant by 
proposing that we return that rather than throwing, then I rescind my 
assessment that throwing was the best way to go and have to agree that 
returning 0xFFFD would be better. I was responding under the assumption that 
you had to check for 0xFFFD and respond to it in order to avoid having your code 
be buggy, in which case throwing would be far better. But if 0xFFFD is 
considered valid Unicode, then returning that would be a fantastic solution. 
And if that's the case, we only need two functions, not three:

1. decode, which returns 0xFFFD on decode failure

2. isValidUnicode, which returns whether the string is valid

And I actually really like the idea that we could just operate on invalid 
Unicode as valid Unicode this way, making it so that most code doesn't need to 
care, and code that _does_ need to care, can validate the strings first. Right 
now, pretty much all string code needs to care in order to avoid processing 
invalid Unicode, which is much messier.
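
A rough sketch of the two-function design outlined above. Both isValidUnicode and decodeOrReplace are hypothetical names; the try/catch around std.utf.decode only emulates the proposed semantics and is not how a real non-throwing decode would be implemented.

```d
import std.utf : UTFException, decode, validate;

enum dchar replacement = '\uFFFD'; // U+FFFD REPLACEMENT CHARACTER

/// Hypothetical non-throwing check, named as in the thread.
bool isValidUnicode(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Hypothetical decode variant: yields U+FFFD instead of throwing.
dchar decodeOrReplace(const(char)[] s, ref size_t index)
{
    immutable start = index;
    try
    {
        return decode(s, index);
    }
    catch (UTFException)
    {
        index = start + 1; // consume one code unit so callers always advance
        return replacement;
    }
}

/// Decodes a whole string, substituting for every malformed code unit.
dchar[] decodeAll(const(char)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decodeOrReplace(s, i);
    return result;
}

void main()
{
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62]; // 'a', stray byte, 'b'
    auto text = cast(string) raw;

    assert(!isValidUnicode(text));          // callers that care can still ask
    assert(decodeAll(text) == "a\uFFFDb"d); // everyone else just keeps going
}
```

How many code units a real implementation consumes per substitution is a design choice; this sketch skips exactly one.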

- Jonathan M Davis

Feb 07 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

08-Feb-2014 09:45, Jonathan M Davis wrote:
 On Friday, February 07, 2014 21:04:08 Jonathan M Davis wrote:
 Actually, thinking this through some more, if we can replace invalid Unicode
 with 0xFFFD, and have all algorithms work with that and consider it valid
 Unicode (rather than getting weird bugs due to invalid Unicode), then if
 decode returned that on error rather than throwing, we wouldn't actually need
 to check the return value. It wouldn't matter that the Unicode was invalid.
 So, we wouldn't even need to _care_ that the Unicode was invalid. Anyone who
 _did_ care could call isValidUnicode to validate the Unicode first, and those
 who didn't wouldn't need to worry about UTFException being thrown, because
 everything would still work even if the string was invalid Unicode.

Hm.. yes. I gotta read the whole thread next time :)


 So, if that's indeed what 0xFFFD does, and that's what Dmitry meant by
 proposing that we return that rather than throwing, then I rescind my
 assessment that throwing was the best way to go and have to agree that
 returning 0xFFFD would be better. I was responding under the assumption that
 you had to check for 0xFFFD and respond to it in order to avoid having your code
 be buggy, in which case throwing would be far better. But if 0xFFFD is
 considered valid Unicode,

It is.

 then returning that would be a fantastic solution.
 And if that's the case, we only need two functions, not three:

 1. decode, which returns 0xFFFD on decode failure

 2. isValidUnicode, which returns whether the string is valid

Yay.

 And I actually really like the idea that we could just operate on invalid
 Unicode as valid Unicode this way, making it so that most code doesn't need to
 care, and code that _does_ need to care, can validate the strings first. Right
 now, pretty much all string code needs to care in order to avoid processing
 invalid Unicode, which is much messier.

Hooray! The good part is that, for example, I can run a regex on partially 
broken text and get some sane results out of it.

 - Jonathan M Davis


-- 
Dmitry Olshansky

Feb 08 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

08-Feb-2014 03:01, Meta wrote:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:

 You could always return an Option!char. Nullable won't work because it
 lets you access the naked underlying value.

This is a ridiculously distracting suggestion and simply has no merit 
whatsoever.

To underline how impractical this suggestion is: currently all code 
out there expects dchar out of .front, not some magic animal called 
'Option!char'.

-- 
Dmitry Olshansky

Feb 08 2014

"Meta" <jared771 gmail.com> writes:

On Saturday, 8 February 2014 at 11:24:56 UTC, Dmitry Olshansky 
wrote:
 08-Feb-2014 03:01, Meta wrote:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis 
 wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:

 You could always return an Option!char. Nullable won't work 
 because it
 lets you access the naked underlying value.

 This is a ridiculously distracting suggestion and simply has no 
 merit whatsoever.

 To underline how impractical this suggestion is: currently 
 all code out there expects dchar out of .front, not some magic 
 animal called 'Option!char'.

I'm not actually suggesting a replacement. Just wishful thinking 
on how the function could've been better designed.

Feb 08 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, February 08, 2014 18:03:54 Meta wrote:
 On Saturday, 8 February 2014 at 11:24:56 UTC, Dmitry Olshansky
 
 wrote:
 08-Feb-2014 03:01, Meta wrote:
 On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis
 
 wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:

 You could always return an Option!char. Nullable won't work
 because it
 lets you access the naked underlying value.

 
 This is a ridiculously distracting suggestion and simply has no
 merit whatsoever.
 
 To underline how impractical this suggestion is: currently
 all code out there expects dchar out of .front, not some magic
 animal called 'Option!char'.

 
 I'm not actually suggesting a replacement. Just wishful thinking
 on how the function could've been better designed.

I don't see how returning Nullable!dchar would improve the decode function at all. 
Currently, it throws on invalid UTF, so you don't have to check the return 
value, and your code can avoid caring about decode errors except for the 
points where you put your catches (which are generally in far fewer places 
than the number of places that decode gets called - be it directly or 
indirectly). On the other hand, with Nullable!dchar, you'd have to always 
check the result or risk hitting an assertion when you don't check the 
result (or ending up with dchar.init in -release). I don't see how that's 
better than the current situation at all. It just makes decode harder to use.

And Dmitry's suggestion is better than both. We end up returning the Unicode 
character specifically intended to designate bad encodings (\uFFFD) such that 
you don't even have to care that there was a decode error. You just decode the 
string and use it. It will just be one more character in the string that 
doesn't match what you're looking for in find and the like, and pretty much 
nothing should choke on it. Anything which then cares about Unicode validity 
can use isValidUnicode (once we have it) to validate the string instead of 
relying on decode to throw. It will clean up string processing in the face of 
invalid Unicode quite nicely.

So, I don't see how using Nullable!dchar as you suggest would ever have been a 
better design.

- Jonathan M Davis

Feb 08 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

08-Feb-2014 02:57, Jonathan M Davis wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!

 Add std.utf.decode() to that as well. IOW, it should have an overload
 which returns a status code

 Much simpler - it returns a special dchar to designate bad encoding. And
 there is one defined by Unicode spec.

 Isn't that actually worse?

No, it's better and more flexible for those who care to repair broken 
text in case it's broken. We currently have ZERO facilities to work with 
partly broken UTF and it's not that rare a thing to have.

 Unless you're suggesting that we stop throwing on
 decode errors,

That is exactly what I suggest.

 then functions like std.array.front will have to check the
 result on every call to see whether it was valid or not and thus whether they
 should throw, which would mean extra overhead over simply having decode throw
 on decode errors.

Why the heck? It will not throw either. In the very end bad encoding is 
handled by displaying the 'substituted' (typically '?') character in 
places where it broke, not by throwing up hands in the air and spitting 
"UTF Exception: offset 4302 bad UTF sequence". This is not good enough 
(in case somebody thought that it is).

Those who care about throwing add a trivial map!(x => x != '\uFFFD' || 
die()) over a string, where the die function throws an exception.
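
One way to read the map/die idea is the sketch below. It assumes std.utf.byDchar as the substituting decoder (it yields U+FFFD for malformed input by default) and uses a hypothetical wrapper named requireNoReplacement in place of die.

```d
import std.exception : assertNotThrown, assertThrown, enforce;
import std.utf : byDchar;

/// Hypothetical strict wrapper: decodes with substitution, then throws
/// if U+FFFD shows up anywhere in the decoded text.
void requireNoReplacement(string s)
{
    foreach (dchar c; s.byDchar)
        enforce(c != '\uFFFD', "U+FFFD found in decoded text");
}

void main()
{
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62]; // 'a', stray byte, 'b'

    assertNotThrown(requireNoReplacement("plain text"));
    assertThrown(requireNoReplacement(cast(string) raw));

    // A well-formed string that already contains U+FFFD is rejected too,
    // so this check cannot tell "was broken" from "is broken".
    assertThrown(requireNoReplacement("already \uFFFD here"));
}
```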

 validate has no business throwing, and we definitely should
 add isValidUnicode (or isValid or whatever you want to call it) for validation
 purposes. Code can then call that to validate that a string is valid and not
 worry about any UTFExceptions being thrown as long as it doesn't manipulate
 the string in a way that could result in its Unicode becoming invalid.

Yet later down the road decode will triple check that anyway. Just 
saying. BTW if the string was checked beforehand there is no difference 
between the 2 approaches at all (don't have to check).

 However, I would argue that assuming that everyone is going to validate their
 strings and that pretty much all string-related functions shouldn't ever have
 to worry about invalid Unicode is just begging for subtle bugs all over the
 place IMHO. You're essentially dealing with error codes at that point, and I
 think that experience has shown quite clearly that error codes are generally a
 bad way to go. Almost no one checks them unless they have to. I think that
 having decode throw on invalid Unicode is exactly what it should be doing. The
 problem is that validate shouldn't.

Every single text editor out there seems to disagree with you: they do 
show you partially substituted text, not a dialog box "My bad, it's 
broken UTF-8, I'm giving up!".

-- 
Dmitry Olshansky

Feb 08 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Sat, 08 Feb 2014 15:21:26 +0400
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 08-Feb-2014 02:57, Jonathan M Davis wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!

 Add std.utf.decode() to that as well. IOW, it should have an overload
 which returns a status code

 Much simpler - it returns a special dchar to designate bad encoding. And
 there is one defined by Unicode spec.

 Isn't that actually worse?

 No, it's better and more flexible for those who care to repair broken
 text in case it's broken. We currently have ZERO facilities to work with
 partly broken UTF and it's not that rare thing to have it.

Your argument is unsubstantiated, since we have this already:


 Unless you're suggesting that we stop throwing on
 decode errors,

 That is exactly what I suggest.

 then functions like std.array.front will have to check the
 result on every call to see whether it was valid or not and thus whether they
 should throw, which would mean extra overhead over simply having decode throw
 on decode errors.

 Why the heck? It will not throw either. In the very end bad encoding is
 handled by displaying the 'substituted' (typically '?') character in
 places where it broke not by throwing up hands in the air and spitting
 "UTF Exception: offset 4302 bad UTF sequence". This is not good enough
 (in case somebody thought that it is).

 Those who care about throwing add a trivial map!(x => x != '\uFFFD' ||
 die()) over a string, where die function throws an exception.

That's neither an improvement over calling "validate" nor does
that deal with distinguishing between invalid UTF and \uFFFD
in the input.

 validate has no business throwing, and we definitely should
 add isValidUnicode (or isValid or whatever you want to call it) for validation
 purposes. Code can then call that to validate that a string is valid and not
 worry about any UTFExceptions being thrown as long as it doesn't manipulate
 the string in a way that could result in its Unicode becoming invalid.

 Yet later down the road decode will triple check that anyway. Just
 saying. BTW if the string was checked beforehand there is no difference
 between 2 approaches at all (don't have to check).

 However, I would argue that assuming that everyone is going to validate their
 strings and that pretty much all string-related functions shouldn't ever have
 to worry about invalid Unicode is just begging for subtle bugs all over the
 place IMHO. You're essentially dealing with error codes at that point, and I
 think that experience has shown quite clearly that error codes are generally a
 bad way to go. Almost no one checks them unless they have to. I think that
 having decode throw on invalid Unicode is exactly what it should be doing. The
 problem is that validate shouldn't.

 Every single text editor out there seems to disagree with you: they do
 show you partially substituted text, not a dialog box "My bad, it's
 broken UTF-8, I'm giving up!".

Editors do different things. They often try to detect the
encoding with a fallback to Latin1. If you open a file
explicitly as UTF-8 they may display a substitution char or
detect the error and use the fallback, as is the case with
Geany, and gedit does in fact throw an error message at you
saying "My bad, it's broken UTF-8, I'm giving up!".


-- 
Marco

Feb 08 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

09-Feb-2014 09:35, Marco Leise wrote:
 On Sat, 08 Feb 2014 15:21:26 +0400
 Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 08-Feb-2014 02:57, Jonathan M Davis wrote:
 On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
 07-Feb-2014 20:29, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!

 Add std.utf.decode() to that as well. IOW, it should have an overload
 which returns a status code

 Much simpler - it returns a special dchar to designate bad encoding. And
 there is one defined by Unicode spec.

 Isn't that actually worse?

 No, it's better and more flexible for those who care to repair broken
 text in case it's broken. We currently have ZERO facilities to work with
 partly broken UTF and it's not that rare thing to have it.

 Your argument is unsubstantiated, since we have this already:


Working with ranges of dchar? Nobody is taking eager validation from 
your hands anyway.

 Unless you're suggesting that we stop throwing on
 decode errors,

 That is exactly what I suggest.

 then functions like std.array.front will have to check the
 result on every call to see whether it was valid or not and thus whether they
 should throw, which would mean extra overhead over simply having decode throw
 on decode errors.

 Why the heck? It will not throw either. In the very end bad encoding is
 handled by displaying the 'substituted' (typically '?') character in
 places where it broke not by throwing up hands in the air and spitting
 "UTF Exception: offset 4302 bad UTF sequence". This is not good enough
 (in case somebody though that it is).

 Those who care about throwing add a trivial map!(x => x != '\uFFFD' ||
 die()) over a string, where die function throws an exception.

 That's neither an improvement over calling "validate" nor does
 that deal with distinguishing between invalid UTF and

Means text is broken but wasn't ever read...
 \uFFFD
 in the input.

...means text was broken sometime before.

Hardly makes any difference to most applications.
Normal text doesn't contain \uFFFD.

And you can test a string with proper 'validate', it's just that while 
decoding the default is to substitute.

 validate has no business throwing, and we definitely should
 add isValidUnicode (or isValid or whatever you want to call it) for validation
 purposes. Code can then call that to validate that a string is valid and not
 worry about any UTFExceptions being thrown as long as it doesn't manipulate
 the string in a way that could result in its Unicode becoming invalid.

 Yet later down the road decode will triple check that anyway. Just
 saying. BTW if the string was checked beforehand there is no difference
 between 2 approaches at all (don't have to check).

 However, I would argue that assuming that everyone is going to validate their
 strings and that pretty much all string-related functions shouldn't ever have
 to worry about invalid Unicode is just begging for subtle bugs all over the
 place IMHO. You're essentially dealing with error codes at that point, and I
 think that experience has shown quite clearly that error codes are generally a
 bad way to go. Almost no one checks them unless they have to. I think that
 having decode throw on invalid Unicode is exactly what it should be doing. The
 problem is that validate shouldn't.

 Every single text editor out there seems to disagree with you: they do
 show you partially substituted text, not a dialog box "My bad, it's
 broken UTF-8, I'm giving up!".

 Editor do different things. They often try to detect the
 encoding with a fall back to Latin1. If you open a file
 explicitly as UTF-8 they may display a substitution char or
 detect the error and use the fall back, as is the case with
 Geany and

Throwing an exception here is not something useful in 90% of cases. 
Requiring everybody to call sanitize on every string from the outside 
smells like a wrong default to me.
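
For reference, the eager approach being argued against here might look like the sketch below. It assumes the functions in question are std.encoding.sanitize and std.encoding.isValid (the thread does not name the module), and it deliberately checks only validity, not the exact replacement text.

```d
import std.encoding : isValid, sanitize;

void main()
{
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62]; // 'a', stray byte, 'b'
    string dirty = cast(string) raw;

    assert(!isValid(dirty));

    // sanitize returns a string in which malformed sequences have been
    // replaced, so later decoding no longer throws.
    string clean = sanitize(dirty);
    assert(isValid(clean));

    // Input that is already valid comes back unchanged.
    assert(sanitize("abc") == "abc");
}
```

Every string from the outside would have to pass through a call like this, which is the cost the paragraph above objects to.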

 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".

I know and it's a piece of junk :)
Seriously, it doesn't even have regular expressions for search and replace!

-- 
Dmitry Olshansky

Feb 09 2014

"Daniel Murphy" <yebbliesnospam gmail.com> writes:

"Dmitry Olshansky"  wrote in message news:ld7dla$pdg$1 digitalmars.com... 

 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".

 I know and it's piece of junk :)
 Seriously it doesn't even has regular expressions for search and replace!

That would be a luxury, gedit doesn't even have auto-indent.

Feb 09 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Sun, 9 Feb 2014 22:24:21 +1100
"Daniel Murphy" <yebbliesnospam gmail.com> wrote:

 "Dmitry Olshansky"  wrote in message news:ld7dla$pdg$1 digitalmars.com... 
 
 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".

 I know and it's piece of junk :)
 Seriously it doesn't even has regular expressions for search and replace!

 
 That would be a luxury, gedit doesn't even have auto-indent.

You can talk about missing features in gedit all day, but from
my point of view an editor is broken when it doesn't throw an
error message at you. By silently replacing incorrect UTF-8
they change the original text.
0xFFFD should probably be used only when error messages are
out of the question, like when displaying/printing text only.

-- 
Marco

Feb 16 2014

"Daniel Murphy" <yebbliesnospam gmail.com> writes:

"Marco Leise"  wrote in message 
news:20140217030525.67a21dfc org.homedns.org...

 0xFFFD should probably be used only when error messages are
 out of question like when displaying/printing text only.

What do you use for displaying text, if not a text editor?

Feb 17 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Tue, 18 Feb 2014 01:01:53 +1100
"Daniel Murphy" <yebbliesnospam gmail.com> wrote:

 "Marco Leise"  wrote in message 
 news:20140217030525.67a21dfc org.homedns.org...
 
 0xFFFD should probably be used only when error messages are
 out of question like when displaying/printing text only.

 
 What do you use for displaying text, if not a text editor? 

That was directed at D development. Or programming with
Unicode encodings in general. If you load a text file and
replace broken UTF-8 with \uFFFD or ? as Sublime 3 does, you
lose information. I think that smells and asks for a big
red message box. gedit is an editor that works this way.

What I meant by displaying text is static UI elements, since
there is no risk of propagating the error. Everything else that
can notify the user of the incorrect encoding or loss of
information should do so.

-- 
Marco

Feb 17 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Sun, 09 Feb 2014 12:18:41 +0400
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 09-Feb-2014 09:35, Marco Leise wrote:
 That's neither an improvement over calling "validate" nor does
 that deal with distinguishing between invalid UTF and

 Means text is broken but wasn't ever read...
 \uFFFD
 in the input.

 ...means text was broken sometime before.

 Hardly makes any difference to most applications.
 Normal text doesn't contain \uFFFD.

Of course it does. It is a valid symbol and a lot of websites
describing the "Specials" Unicode block make use of it, like
the one on Wikipedia:
http://en.wikipedia.org/wiki/Specials_(Unicode_block)

With your definition, pulling such a document from the web and
parsing it in D would mean playing on broken strings.

 [...]
 Every single text editor out there seems to disagree with you: they do
 show you partially substituted text, not a dialog box "My bad, it's
 broken UTF-8, I'm giving up!".



 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".


 I know and it's piece of junk :)
 Seriously it doesn't even has regular expressions for search and replace!

https://yourlogicalfallacyis.com/no-true-scotsman :p

-- 
Marco

Feb 16 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

17-Feb-2014 06:19, Marco Leise wrote:
 On Sun, 09 Feb 2014 12:18:41 +0400
 Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 09-Feb-2014 09:35, Marco Leise wrote:
 Thats neither an improvement over calling "validate" nor does
 that deal with distinguishing between invalid UTF and

 Means text is broken but wasn't ever read...
 \uFFFD
 in the input.

 ...means text was broken sometime before.

 Hardly makes any difference to the most applications.
 Normal text doesn't contain \uFFFD.

 Of course it does. It is a valid symbol and a lot of websites
 describing the "Specials" Unicode block make use of it, like
 the one on Wikipedia:
 http://en.wikipedia.org/wiki/Specials_(Unicode_block)

 With your definition, pulling such a document from the web and
 parsing it in D would mean playing on broken strings.

In a sense, \uFFFD means broken encoding. What about lone surrogates? 
Private use symbols that must not occur in transmission? They are all 
displayed in various Unicode listings. About 'playing on broken strings' 
- ignoring broken/partially broken strings, I specifically think that 
it's what most users/use cases want.

A more useful and sensible default for decoding is to substitute on 
broken encoding. And it's a standard procedure. It's particularly better 
for displaying text.

As a reminder: since it's only a decode, you are still in control of the 
original text - in fact you may re-test what bytes are there IF you want.

The "throw on bad encoding" way could be useful but I hardly see it 
as what you want for the default.

I'm wary of breaking code that relies on throwing. For the moment I 
think the best course of action would be to introduce xdecode or some 
such that will do substitution on failure, see how it floats and then 
change ranges/foreach etc. to use xdecode.

 [...]
 Every single text editor out there seems to disagree with you: they do
 show you partially substituted text, not a dialog box "My bad, it's
 broken UTF-8, I'm giving up!".



 gedit does in fact throw an error message at you
 saying "My bad, it's broken UTF-8, I'm giving up!".


 I know and it's piece of junk :)
 Seriously it doesn't even has regular expressions for search and replace!

 https://yourlogicalfallacyis.com/no-true-scotsman :p

Well, gedit is a nice example of why just throwing an exception is not good 
enough for many apps (editors in particular). The fact that it's a piece 
of junk might be irrelevant ;)

-- 
Dmitry Olshansky

Feb 18 2014

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 2/18/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Well, gedit is a nice example of why just throwing exception is not good
 enough for many apps (editors in particular). The fact that it's piece
 of junk might be irrelevant ;)

OT: Considering how many big-budget events (World Cup / Olympics) do
such a poor job at displaying any kind of Unicode text (e.g. they
frequently display č/ć/đ as c/c/dj), the only thing that could be
worse is a big red dialog box, lol!

Feb 18 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Tue, 18 Feb 2014 12:14:58 +0400
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 In a sense, \uFFFD means broken encoding.

In a sense yes, in another no. It is a defined code point and
it has a symbol: � a diamond with a question mark inside.

 What about lone surrogates?

Those are actual broken encoding.

 Private use symbols that must not occur in transmission?

Then that "transmission" seems to exclude private symbols. It
may also exclude special characters like \uFFFD. That's part
of the particular protocol and should be handled there.

 They all
 displayed in various Unicode listings. About 'playing on broken strings'
 - ignoring broken/partially broken strings, I specifically think that
 it's what most users/use cases want.

 A more useful and sensible default of decoding is to substitute on
 broken encoding. And it's a standard procedure. It's particularly better
 for displaying text.

Correct. I just don't agree that displaying text should be
the one true use case, and I prefer exceptions over
silent loss of information as the default.

 To remind: since it's only a decode you are still in the control of
 original text - in fact you may re-test what bytes are there IF you want.

 The way of "throw on bad encoding" could be useful but I hardly see it
 as what you want for default.

 I'm wary of breaking code that relies on throwing. For the moment I
 think the best course of action would be to introduce xdecode or some
 such that will do substitution on failure, see how it floats and then
 change ranges/foreach etc to use xdecode.

We won't convince each other. Let's just stop here.

-- 
Marco

Feb 18 2014

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 2/7/14, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 However, I would argue that assuming that everyone is going to validate
 their strings and that pretty much all string-related functions shouldn't
 ever have to worry about invalid Unicode is just begging for subtle bugs
 all over the place IMHO.

I suggested we would introduce an overload, not replace the existing
function, so this isn't an issue.

 The problem is that you need to check it. This is _slower_ than exceptions
 in the normal case, as invalid Unicode should be the rare case.

Do you have any benchmarks for this? I have a vague memory of
complaining that the exception code is *de facto* slower, regardless
of input. But I'll try to provide some test cases later and see where
we're at.

Feb 08 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

08-Feb-2014 12:20, Andrej Mitrovic wrote:
 On 2/7/14, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 However, I would argue that assuming that everyone is going to validate
 their strings and that pretty much all string-related functions shouldn't
 ever have to worry about invalid Unicode is just begging for subtle bugs
 all over the place IMHO.

 I suggested we would introduce an overload, not replace the existing
 function, so this isn't an issue.

 The problem is that you need to check it. This is _slower_ than exceptions
 in the normal case, as invalid Unicode should be the rare case.

 Do you have any benchmarks for this? I have vague memory about
 complaining that the exception code is *de-facto* slower, regardless
 of input. But I'll try to provide some test-cases later and see where
 we're at.

Just be sure to test on LDC or GDC. DMD results are irrelevant to the 
performance-minded of our community. Also be sure to copy all the code 
involved into a single file rather than linking to Phobos.

People tend to throw around figures like ~10% slower with exceptions turned on 
but you never know exactly what they tested.

-- 
Dmitry Olshansky

Feb 08 2014
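
A minimal sketch of the kind of single-file test case being asked for here, comparing a throwing check with a status-returning one over the same input. All names (`ParseError`, `checkThrowing`, `checkStatus`, `timeThrowing`, `timeStatus`) are hypothetical, and unlike the advice above it still imports `StopWatch` from Phobos for timing. Numbers from a harness like this say nothing until it is built with an optimizing compiler and fed realistic data.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Hypothetical exception type used only by this sketch.
class ParseError : Exception
{
    this() { super("bad input"); }
}

// Variant 1: reports failure by throwing a freshly allocated exception.
int checkThrowing(int x)
{
    if (x < 0)
        throw new ParseError;
    return x;
}

// Variant 2: reports failure through the return value.
bool checkStatus(int x, out int result) nothrow @nogc
{
    if (x < 0)
        return false;
    result = x;
    return true;
}

// Runs the throwing variant over data, counting failures, and returns
// the elapsed time.
Duration timeThrowing(const int[] data, out size_t failures)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (x; data)
    {
        try
        {
            checkThrowing(x);
        }
        catch (ParseError)
        {
            ++failures;
        }
    }
    return sw.peek;
}

// Same loop for the status-returning variant.
Duration timeStatus(const int[] data, out size_t failures)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (x; data)
    {
        int r;
        if (!checkStatus(x, r))
            ++failures;
    }
    return sw.peek;
}
```

Varying the share of negative values in `data` is what separates the "rare failure" case from the "hostile input" case argued about in this thread.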

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/7/14, 8:29 AM, Andrej Mitrovic wrote:
 On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
 Add a bugzilla and let's define isValid that returns bool!

 Add std.utf.decode() to that as well. IOW, it should have an overload
 which returns a status code but assigns the return value through another
 parameter.

.toBugzilla()

Andrei

Feb 07 2014
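
One possible shape for the status-returning overload suggested above, as a sketch only: `tryDecode` is a hypothetical name and signature, not a Phobos function, and it handles UTF-8 input only. It returns `false` on malformed input instead of throwing and hands the decoded code point back through an `out` parameter.

```d
// Hypothetical helper; the name and signature are not Phobos API.
// Decodes one UTF-8 code point starting at s[index]. On success it stores
// the code point in result, advances index and returns true. On malformed
// input it returns false and leaves index untouched.
bool tryDecode(const(char)[] s, ref size_t index, out dchar result)
    pure nothrow @nogc @safe
{
    if (index >= s.length)
        return false;

    immutable uint lead = s[index];
    if (lead < 0x80)               // plain ASCII
    {
        result = cast(dchar) lead;
        ++index;
        return true;
    }

    size_t extra;                  // continuation bytes expected
    uint cp;                       // code point being assembled
    uint smallest;                 // used to reject overlong encodings
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; smallest = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; smallest = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; smallest = 0x10000; }
    else
        return false;              // stray continuation or invalid lead byte

    if (s.length - index <= extra)
        return false;              // sequence is cut short

    foreach (k; 1 .. extra + 1)
    {
        immutable uint c = s[index + k];
        if ((c & 0xC0) != 0x80)
            return false;          // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }

    if (cp < smallest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;              // overlong, out of range, or surrogate

    result = cast(dchar) cp;
    index += extra + 1;
    return true;
}
```

Because the function is `nothrow @nogc`, the caller decides whether a `false` result becomes an exception, a substitution character, or a dropped request.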

"Dicebot" <public dicebot.lv> writes:

On Friday, 7 February 2014 at 03:14:45 UTC, Sean Kelly wrote:
 pure @safe void validate(S)(in S str) if (isSomeString!S);

 Throws:
 UTFException if str is not well-formed.

 And somewhere in the world, darkness fell forever on a bright 
 and beautiful countryside.  The monsters poured forth and 
 devoured everything in sight, given strength by that 
 unbelievable abomination of a function design.

True words indeed!

To sum up this small thread: I am perfectly OK with exceptions 
not showing in -vgc if we also agree on cleaning control-flow 
exceptions out of Phobos.

Feb 07 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, February 06, 2014 22:20:37 Dicebot wrote:
 On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson wrote:
 You should probably validate utf from all foreign sources.
 Catch a problem with it as it comes in rather than in some
 arbitrary part of your program.
 


 
 pure @safe void validate(S)(in S str) if (isSomeString!S);
 
 Throws:
 UTFException if str is not well-formed.
 
 ;)

In general, I think that throwing on malformed Unicode is a good thing, 
because it results in code that's less error-prone (as the alternative is to 
not validate Unicode and try and continue somehow regardless of bad input when 
decoding Unicode, which would be very bad IMHO). That being said, validating 
strings when they enter the program is a good way to localize any failures - 
which is where validate would come in - and I have to agree that the fact that 
validate throws is horrific. It's a classic example of a function that should 
return a bool rather than throw. You're asking it whether the string is valid, 
not asking it to report errors when your normal control flow encounters an error 
that prevents it from functioning normally (which is where exceptions should 
normally be used).

As such, I think that it's clear that we need a new function to replace it 
(e.g. isValidUnicode). I'll have to take a look at it. If I'm lucky, it won't 
even take all that long to implement.

- Jonathan M Davis

Feb 07 2014
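
To make the interface change concrete, here is a sketch of a bool-returning check layered on the existing `std.utf.validate`. The name `isValidUnicode` is taken from the post above but the body is only an assumption about how a first cut might look. It still pays for a throw internally on bad input, so it shows the intended call shape rather than the performance a real implementation would aim for.

```d
import std.traits : isSomeString;
import std.utf : UTFException, validate;

// Hypothetical wrapper, not part of Phobos: turns the throwing
// validate() into a yes/no question.
bool isValidUnicode(S)(in S str)
if (isSomeString!S)
{
    try
    {
        validate(str);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

A caller can then write `if (!isValidUnicode(input)) return reject();` at the point where data enters the program, with `reject` standing for whatever the application does with bad input.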

"Sean Kelly" <sean invisibleduck.org> writes:

On Thursday, 6 February 2014 at 21:48:13 UTC, Dicebot wrote:
 On Thursday, 6 February 2014 at 19:54:27 UTC, Sean Kelly wrote:
 Does this case even matter?  Exceptions are not a normal 
 function of execution, and so should happen rarely to never.  
 And it's a time when I'd expect a delay anyway.

 Imagine intentionally crafted broken utf as user input in 
 repeated requests. You don't have control over it.

 Now if Phobos would have only thrown exceptions in really 
 _exceptional_ situations and handled broken input gracefully...

That's a tough one.  Bad input typically shouldn't generate an 
exception, but sometimes doing so is handy from a flow control 
perspective (I know I know, exceptions aren't for flow control).  
In the few instances where I use an exception for flow control 
though (like core.demangle) I always use a static instance, so no 
allocation occurs, and it's entirely internal to the routine.

I think it's fair to say that _an_API_ shouldn't allocate and 
throw an exception to indicate an expected error condition.  For 
a parser, invalid input definitely applies.  So then if the user 
wants to throw an exception in that case, they can do so 
themselves.  Then the choice of allocation is left to the user, 
not imposed on them.  It's generally really easy to let the user 
supply a delegate to execute on error too, so they don't even 
necessarily have to check a return code.

Feb 06 2014
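
The delegate-on-error idea from the post above could look roughly like this. `parseDigits` and its parameters are hypothetical names used for illustration, and overflow handling is left out to keep the sketch short. The routine itself never allocates or throws, and the caller chooses what an error means.

```d
// Hypothetical parser: converts a run of ASCII digits to a number.
// On the first non-digit it calls onError with the offending position
// and returns 0. Overflow is deliberately not handled in this sketch.
uint parseDigits(const(char)[] text, scope void delegate(size_t pos) onError)
{
    uint value = 0;
    foreach (i, c; text)
    {
        if (c < '0' || c > '9')
        {
            onError(i);   // caller decides: count, log, or throw
            return 0;
        }
        value = value * 10 + (c - '0');
    }
    return value;
}
```

A caller that prefers exceptions can pass a delegate that throws its own, possibly preallocated, exception. A caller on a hot path can pass one that only bumps a counter.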

"bearophile" <bearophileHUGS lycos.com> writes:

Dicebot:

 Now if Phobos would have only thrown exceptions in really 
 _exceptional_ situations and handled broken input gracefully...

I wrote two small ideas to reduce throwing exceptions in Phobos:

http://d.puremagic.com/issues/show_bug.cgi?id=6840
http://d.puremagic.com/issues/show_bug.cgi?id=11913

Bye,
bearophile

Feb 07 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/6/2014 11:54 AM, Sean Kelly wrote:
 Does this case even matter?  Exceptions are not a normal function of execution,
 and so should happen rarely to never.  And it's a time when I'd expect a delay
 anyway.

Right. If you're:

1. using throws as control flow logic

2. requiring a throw in a performance critical loop to be performance critical

3. doing so many throws that the garbage collector needs to run to clean them up

you're doing it wrong.

I'm tempted to say that the throw expression can call 'new' even if the function
is marked as @nogc.

Feb 06 2014

"Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:

On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic

[...]
 you're doing it wrong.

I disagree.

REST based web services tend to use throws all the time. It is 
an effective and clean way to break all transactions that are in 
progress throughout the call chain when you cannot carry through 
a request, or if the request returns nothing.

Feb 06 2014

"Brad Anderson" <eco gnuk.net> writes:

On Friday, 7 February 2014 at 01:31:17 UTC, Ola Fosheim Grøstad
wrote:
 On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic

 [...]
 you're doing it wrong.

 I disagree.

 REST based web services tend to use throws all the time. It is 
 an effective and clean way to break all transactions that are 
 in progress throughout the call chain when you cannot carry 
 through a request, or if the request returns nothing.

I think in the case of people using exceptions for control flow a 
GC.free in your exception handler would suffice for preventing 
the GC heap from growing to
the point where collection times become a concern.

Feb 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/6/2014 5:31 PM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic

 [...]
 you're doing it wrong.

 I disagree.

 REST based web services tend to use throws all the time. It is an effective
 and clean way to break all transactions that are in progress throughout the
 call chain when you cannot carry through a request, or if the request returns
 nothing.

They're going to be slow when you do it that way.

Feb 06 2014

"Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:

On Friday, 7 February 2014 at 02:42:14 UTC, Walter Bright wrote:
 They're going to be slow when you do it that way.

How slow is slow? Is it slower than in Go and Python? Why would 
unwinding 8 stack frames be so slow? Is it a language mandated 
speed issue or just a runtime issue that could be fixed with a 
compiler switch?

Most of the time is spent waiting for async request from 
memcaches/databases and other types of network traffic so you 
usually have some free cycles on a decent CPU. With native code 
and lightweight threads (coroutines) you should be able to handle 
100+ concurrent requests per process.

Feb 07 2014

"Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:

On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
wrote:
 usually have some free cycles on a decent CPU. With native code 
 and lightweight threads (coroutines) you should be able to 
 handle 100+ concurrent requests per process.

When I think of it you could probably just push the RESTException 
throwing coroutine onto a "delayed request queue" since a timeout 
on a transaction might be no worse than aborting it (or carry 
along some kind of context object). That would make DoS less 
problematic too and you get better latency for good requests and 
complete the bad requests when you are idle.

Feb 07 2014

"Dicebot" <public dicebot.lv> writes:

On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
wrote:
 Is it a language mandated speed issue?

It is assumed by http://dlang.org/errors.html

Feb 07 2014

"Dicebot" <public dicebot.lv> writes:

On Friday, 7 February 2014 at 11:41:43 UTC, Dicebot wrote:
 On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
 wrote:
 Is it a language mandated speed issue?

 It is assumed by http://dlang.org/errors.html

P.S. Throwing an exception is not that slow in D; it is allocating 
a new instance that makes a huge impact.

Feb 07 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 3:42 AM, Dicebot wrote:
 P.S. Throwing an exception is not that slow in D; it is allocating a new
 instance that makes a huge impact.

Throwing speed can vary greatly from platform to platform.

The idea, as in C++, is when there's a speed tradeoff between throw/catch speed 
and compromising speed to handle the possibility of exceptions, the non-throw 
case gets priority.

Feb 07 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
wrote:
 How slow is slow? Is it slower than in Go and Python?

One problem with allocating the exception is the stop-the-world 
thing. My cgi.d's built in httpd does some allocations in its 
constructor, which is run once per request. It can answer 
requests at a rate of about 6000/sec on my computer...

Until the allocations have piled up and the GC starts 
running. Then all the pending requests stop, killing the 
throughput.

(BTW, interestingly, on Linux it uses separate process pools 
instead of threads. The GC does NOT stop the world since the 
other processes can keep going. But, if the requests are fairly 
uniform - as is typically the case with benchmarks - each process 
hits the GC threshold at about the same time.... ironically, it 
is the deterministic nature of the GC that leads to the 
performance killer there.)

Feb 07 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Friday, 7 February 2014 at 15:33:01 UTC, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad 
 wrote:
 How slow is slow? Is it slower than in Go and Python?

 One problem with allocating the exception is the stop-the-world 
 thing. My cgi.d's built in httpd does some allocations in its 
 constructor, which is run once per request. It can answer 
 requests at a rate of about 6000/sec on my computer...

 Until the allocations have piled up and the GC starts 
 running. Then all the pending requests stop, killing the 
 throughput.

 (BTW, interestingly, on Linux it uses separate process pools 
 instead of threads. The GC does NOT stop the world since the 
 other processes can keep going. But, if the requests are fairly 
 uniform - as is typically the case with benchmarks - each 
 process hits the GC threshold at about the same time.... 
 ironically, it is the deterministic nature of the GC that leads 
 to the performance killer there.)

It's obviously not a solution, but you could change that by 
having each process call GC.reserve() with a different size.

Feb 07 2014
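
A sketch of the staggering trick from the post above, under the assumption that each worker process knows its own index. `reserveSizeFor` and `staggerGc` are hypothetical helpers and the sizes are arbitrary. Whether a different reservation per process really spreads out the first collections would need measuring.

```d
import core.memory : GC;

// Hypothetical sizing policy: every worker gets a different head start.
size_t reserveSizeFor(uint workerIndex,
                      size_t baseBytes = 16 * 1024 * 1024,
                      size_t stepBytes = 1024 * 1024) pure nothrow @nogc @safe
{
    return baseBytes + workerIndex * stepBytes;
}

// Call once near the start of each worker process.
size_t staggerGc(uint workerIndex)
{
    // GC.reserve asks the collector to obtain memory from the OS up front
    // and returns the number of bytes reserved, or zero on failure.
    return GC.reserve(reserveSizeFor(workerIndex));
}
```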

"Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:

On Friday, 7 February 2014 at 15:33:01 UTC, Adam D. Ruppe wrote:
 One problem with allocating the exception is the stop-the-world 
 thing.

Ok, well I guess that primarily is an issue for validation errors 
where you need to return detailed error reporting. "Not Found" 
etc can be preallocated as immutable, or?

 constructor, which is run once per request. It can answer 
 requests at a rate of about 6000/sec on my computer...

That sounds pretty good, was that as localhost, or over a network?

 (BTW, interestingly, on Linux it uses separate process pools 
 instead of threads. The GC does NOT stop the world since the 
 other processes can keep going. But, if the requests are fairly 
 uniform - as is typically the case with benchmarks - each 
 process hits the GC threshold at about the same time.... 
 ironically, it is the deterministic nature of the GC that leads 
 to the performance killer there.)

You could synchronize them by calling the GC explicitly N seconds 
after the other process's GC or, if you use a load balancer, 
maybe the GC could be scheduled by the load balancer or notify 
the load balancer (assuming all requests are short-lived).

This won't work for a simulation type server though. (which is 
what I am most interested in)

Feb 07 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 17:10:15 UTC, Ola Fosheim Grøstad 
wrote:
 Ok, well I guess that primarily is an issue for validation 
 errors where you need to return detailed error reporting. "Not 
 Found" etc can be preallocated as immutable, or?

yeah, preallocating exceptions might be a really good idea.

 That sounds pretty good, was that as localhost, or over a 
 network?

localhost, and it was just hello world, performance of my thing 
degrades kinda quickly - it never gets /bad/, but it isn't great 
either once it starts doing more stuff than the basics (but it is 
soooo easy to use! for me anyway)

 You could synchronize them by calling the GC explicitly N 
 seconds after the other process's GC or, if you use a load 
 balancer, maybe the GC could be scheduled by the load balancer 
 or notify the load balancer (assuming all requests are 
 short-lived).

yeah. I'm not even sure if it would be a big deal in practice 
because there's often a lull anyway where the gc can get caught 
up (certainly not a problem for the lower traffic sites I mostly 
work on)

Feb 07 2014

"Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:

On Friday, 7 February 2014 at 20:41:01 UTC, Adam D. Ruppe wrote:
 yeah, preallocating exceptions might be a really good idea.

I wonder if it would be possible to get better unwinding speed by 
only throwing a single type of exception class and only a single 
catch. Then do pattern matching on an embedded typefield.

I.e.:

if (e.id & MASK_5xx) {}
if (e.id & MASK_409) {}

etc.

After looking at the code for stack unwinding it seems like 
keeping the loops short is essential.

Feb 07 2014
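
The single-class, single-catch idea above might be sketched like this. `RestException`, its `status` field and `respond` are hypothetical, and plain comparisons on the status code stand in for the bit masks in the post. Whether this unwinds any faster than several typed catch blocks is exactly the open question, so treat it as the shape of the code rather than a measured win.

```d
// Hypothetical exception type: one class for every HTTP-style failure.
class RestException : Exception
{
    int status;   // status code carried with the exception

    this(int status, string msg)
    {
        super(msg);
        this.status = status;
    }
}

// One catch clause at the top of the request; dispatch on the field
// instead of on the exception's dynamic type.
string respond(scope void delegate() handler)
{
    try
    {
        handler();
        return "200 OK";
    }
    catch (RestException e)
    {
        if (e.status >= 500) return "5xx server error";
        if (e.status == 409) return "409 conflict";
        return "4xx client error";
    }
}
```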

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 7:33 AM, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
 How slow is slow? Is it slower than in Go and Python?

 One problem with allocating the exception is the stop-the-world thing. My
 cgi.d's built in httpd does some allocations in its constructor, which is run
 once per request. It can answer requests at a rate of about 6000/sec on my
 computer...

The gc is not the real speed issue with exceptions, after all, one can 
preallocate the exception:

     throw new Exception();

           v.s.

     e = new Exception();
     ...
     throw e;

It's the unwinding speed. Just have a look at what deh2.d has to do.

Feb 07 2014
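
A fuller sketch of the preallocation pattern shown above, assuming one instance per thread is acceptable. `NotFound`, `notFound` and `lookup` are hypothetical names. The instance is created once in a module constructor, so the throw path itself performs no allocation. The catch side must not assume each caught object is unique, and details such as the message or any recorded trace are shared between throws.

```d
// Hypothetical exception for a missing entry.
class NotFound : Exception
{
    this() { super("404 Not Found"); }
}

// Module-level variables are thread-local in D, so every thread
// gets its own instance.
NotFound notFound;

static this()
{
    notFound = new NotFound;   // one allocation per thread, at startup
}

int lookup(const int[] table, size_t index)
{
    if (index >= table.length)
        throw notFound;        // reuses the preallocated object
    return table[index];
}
```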

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

07-Feb-2014 23:45, Walter Bright wrote:
 On 2/7/2014 7:33 AM, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
 How slow is slow? Is it slower than in Go and Python?

 One problem with allocating the exception is the stop-the-world thing. My
 cgi.d's built in httpd does some allocations in its constructor, which is run
 once per request. It can answer requests at a rate of about 6000/sec on my
 computer...

 The gc is not the real speed issue with exceptions, after all, one can
 preallocate the exception:

      throw new Exception();

            v.s.

      e = new Exception();
      ...
      throw e;

And the standard library basically can't do this for every function.

 It's the unwinding speed. Just have a look at what deh2.d has to do.

It's deh.d or rather deh_win32.d / deh_win64_posix.d and it doesn't look 
like _all_ that much especially if you have no finally blocks and the 
only catch is the top-most catch-all.

After all error codes would also have to propagate up the same call 
stack depth.

-- 
Dmitry Olshansky

Feb 07 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/7/2014 12:51 PM, Dmitry Olshansky wrote:
 It's deh.d or rather deh_win32.d / deh_win64_posix.d and it doesn't look like
 _all_ that much especially if you have no finally blocks and the only catch is
 the top-most catch-all.

It's a heluva lot slower than "jmp".

Feb 08 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

09-Feb-2014 02:17, Walter Bright wrote:
 On 2/7/2014 12:51 PM, Dmitry Olshansky wrote:
 It's deh.d or rather deh_win32.d / deh_win64_posix.d and it doesn't look like
 _all_ that much especially if you have no finally blocks and the only catch is
 the top-most catch-all.

 It's a heluva lot slower than "jmp".

If you can show me how a single unconditional jump propagates error code 
4 calls up the stack I'm sold.

I do understand it's slow, but it's not slow enough to make a difference in the 
discussed case. It's all about jumping to the wrong conclusions.

To put it in one pitch: it should be possible to throw/catch in excess 
of 100k exceptions per second no problem at all (assuming a single core 
of some run of the mill modern CPU).

Nobody is asking to optimize it better than the normal flow.

-- 
Dmitry Olshansky

Feb 09 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/9/2014 2:17 AM, Dmitry Olshansky wrote:
 If you can show me how a single unconditional jump propagates error code 4
 calls up the stack I'm sold.

 I do understand it's slow, but it's not slow enough to make a difference in
 the discussed case. It's all about jumping to the wrong conclusions.

 To put it in one pitch: it should be possible to throw/catch in excess of 100k
 exceptions per second no problem at all (assuming a single core of some run of
 the mill modern CPU).

 Nobody is asking to optimize it better than the normal flow.

It's the table lookup that's inherently slow.

Feb 10 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Friday, 7 February 2014 at 01:31:17 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic

 [...]
 you're doing it wrong.

 I disagree.

 REST based web services tend to use throws all the time. It is 
 an effective and clean way to break all transactions that are 
 in progress throughout the call chain when you cannot carry 
 through a request, or if the request returns nothing.

But let this be up to the programmer working on the service, not 
imposed on them by the API.  Then if they run into something like 
this DoS issue they can fix it.  My experience with these 
services is that performance is critical and bad input is common, 
because people are always trying to hack your shit.

Where I work, people are serious about performance, our daily 
volume is ridiculous, and our goal is five nine's of uptime 
across the board.  At the same time, really good asynchronous 
programmers are about as rare as water on the moon.  So something 
like vibe.d, where mid-level programmers could write correct code 
that still performs well thanks to the underlying event model, 
would be a godsend.  But only if I really can get what I pay for.

The thing I think a lot of people don't realize these days is 
that performance per watt is just about the most important thing 
there is.  Data centers are expensive, slow to build, and rack 
space is limited.  If you can find a way to increase the 
concurrent load per box by, say, an order of magnitude by 
choosing a different language or programming model or whatever, 
there's a real economic motivation to do so.

Java gets by by having a really good GC and a low barrier of 
entry, but its scalability is really pretty poor all things 
considered.  On the other hand, C/C++ scales tremendously but 
then you're stuck with the burden those languages impose in terms 
of semantic complexity, bug frequency, and so on.  D seems really 
promising here but can't rely on having a fantastic incremental 
GC like Java, and so I think it's a mistake to use Java as a 
model for how to manage memory.  And maybe Java just got it wrong 
anyway.  I know some people who had to go to ridiculous lengths 
to avoid GC collection cycles in Java because a collection in the 
app took _20_seconds_ to complete.  Now maybe the application was 
poorly designed or they should have been using an aftermarket GC, 
but even so.

Finally, library programming is the one place where premature 
optimization really is a good idea, because you can never be sure 
how people will be using your code.  That allocation may not be a 
big deal to you or 98% of your users, but for the one big client 
who calls that routine in a tight inner loop or operates at 
volumes you never conceived of it's a deal breaker.  I really 
don't want Phobos to be the deal breaker :-)

Feb 06 2014

"Dicebot" <public dicebot.lv> writes:

On Friday, 7 February 2014 at 01:31:17 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
 Right. If you're:

 1. using throws as control flow logic

 [...]
 you're doing it wrong.

 I disagree.

 REST based web services tend to use throws all the time. It is 
 an effective and clean way to break all transactions that are 
 in progress throughout the call chain when you cannot carry 
 through a request, or if the request returns nothing.

And it is horrible. Exceptions were never designed for this. Try 
benchmarking a trivial vibe.d REST service that looks up an entry in 
an array and throws 404 upon failure. The difference in performance 
between "all requests are 200" and "all requests are 404" will be 
an order of magnitude.

Feb 07 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/14, 5:23 PM, Walter Bright wrote:
 I'm tempted to say that the throw expression can call 'new' even if the
 function is marked as @nogc.

That's extreme. A better possibility is to allocate exceptions from a 
different heap and proclaim that the heap is cleaned once all catch 
blocks are left. (I'm sure we can find something better, but now is not 
the time to worry about it.)

Andrei

Feb 06 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 02:19:42 UTC, Andrei Alexandrescu 
wrote:
 A better possibility is to allocate exceptions from a different 
 heap and proclaim that the heap is cleaned once all catch 
 blocks are left.

I wrote a quick proof of concept of this that can be tested right 
now:
http://arsdnet.net/dcode/except.d

It hooks _d_newclass to allocate Throwables on a little static 
bump-the-pointer array. Each catch block has a scope(success) in 
it that zeroes the throwables area back out to zero.

Feb 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 2/6/2014 6:19 PM, Andrei Alexandrescu wrote:
 On 2/6/14, 5:23 PM, Walter Bright wrote:
 I'm tempted to say that the throw expression can call 'new' even if the
 function is marked as @nogc.

 That's extreme. A better possibility is to allocate exceptions from a different
 heap and proclaim that the heap is cleaned once all catch blocks are left. (I'm
 sure we can find something better, but now is not the time to worry about it.)

That doesn't work, as nothing prevents code from squirreling away the caught 
exception object handle.

Feb 07 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 08:32:04 UTC, Walter Bright wrote:
 That doesn't work, as nothing prevents code from squirreling 
 away the caught exception object handle.

scope would. I'm just saying.

We could also just document it as undefined behavior and leave 
matters in the user's hands, but this wouldn't jive nicely with 
@safe :(

Feb 07 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Friday, 7 February 2014 at 15:41:59 UTC, Adam D. Ruppe wrote:
 On Friday, 7 February 2014 at 08:32:04 UTC, Walter Bright wrote:
 That doesn't work, as nothing prevents code from squirreling 
 away the caught exception object handle.

 scope would. I'm just saying.

 We could also just document it as undefined behavior and leave 
 matters in the user's hands, but this wouldn't jive nicely with 
 @safe :(

Thread stores an uncaught exception reference so it can be 
rethrown on join().  But I suppose a case could be made that an 
uncaught exception could either be discarded or abort the app.

Feb 07 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 15:44:08 UTC, Sean Kelly wrote:
 But I suppose a case could be made that an uncaught exception 
 could either be discarded or abort the app.

It could also make a copy at that time on to the regular GC heap 
and store that (the members of the throwable class are still GC'd 
so all the store function has to do is a shallow copy, using the 
RTTI to get the correct size to copy, onto the gc heap). It'd 
surely be fewer exceptions to get through that than the thrown, 
caught, and subsequently discarded typical case.

Feb 07 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 15:48:56 UTC, Adam D. Ruppe wrote:
 It could also make a copy at that time on to the regular GC 
 heap and store that

lol just add in a quick call to .toGC when you want to store it:

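T toGC(T)(T t) if(is(T==class)) {
     import core.memory;
     // instance size in bytes, taken from the class's RTTI
     // (TypeInfo.init was later renamed to initializer)
     auto size = typeid(t).initializer.length;
     auto ptr = GC.malloc(size);
     // shallow copy: members keep pointing at the same GC data
     ptr[0 .. size] = (cast(void*) t)[0 .. size];
     return cast(T) ptr;
}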

Feb 07 2014

Jerry <jlquinn optonline.net> writes:

Walter Bright <newshound2 digitalmars.com> writes:

 On 2/6/2014 6:19 PM, Andrei Alexandrescu wrote:
 On 2/6/14, 5:23 PM, Walter Bright wrote:
 I'm tempted to say that the throw expression can call 'new' even if the
 function is marked as @nogc.

 That's extreme. A better possibility is to allocate exceptions from a different
 heap and proclaim that the heap is cleaned once all catch blocks are left. (I'm
 sure we can find something better, but now is not the time to worry about it.)

 That doesn't work, as nothing prevents code from squirreling away the caught
 exception object handle.

Very naive question (that may have already been answered), but why can't
throw use structs instead of classes?  Then the exception would
propagate by copy rather than passing the object up the stack?

Feb 07 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 February 2014 at 18:28:24 UTC, Jerry wrote:
 throw use structs instead of classes?

I think that'd be more costly and would mess up the whole 
inheritance checks; catch(Exception) wouldn't catch the same 
children.

Feb 07 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Jerry:

 throw use structs instead of classes?


This thread discusses the (low) performance of D exceptions, and 
suggests some ideas:
https://d.puremagic.com/issues/show_bug.cgi?id=9584

Another thread:
https://d.puremagic.com/issues/show_bug.cgi?id=9581

The thread also discusses an old idea from Java:
http://www.javaspecialists.eu/archive/Issue187.html

Bye,
bearophile

Feb 07 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Friday, 7 February 2014 at 18:45:24 UTC, bearophile wrote:
 Jerry:

 throw use structs instead of classes?


 This thread discusses the (low) performance of D exceptions, 
 and suggests some ideas:
 https://d.puremagic.com/issues/show_bug.cgi?id=9584

 Another thread:
 https://d.puremagic.com/issues/show_bug.cgi?id=9581

 The thread also discusses an old idea from Java:
 http://www.javaspecialists.eu/archive/Issue187.html

Okay, I'm going to look into generating traces lazily.  I think
it should be possible.

Feb 07 2014

"Dicebot" <public dicebot.lv> writes:

On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
 Hey, wait a second. How do you throw without allocating?

Throw a pre-allocated thread-local exception, and make a deep copy 
of it if it is going to be put into an exception chain, to avoid 
modifying one already in the chain.

I have been told in that PR that some language features assume 
exception instances are always unique and rely on it. It sounds 
like a major language design flaw that will block usage of Phobos 
in memory-caring code even if other issues are taken care of. 
Probably the language spec should be relaxed to fix this.

Feb 06 2014

"Brad Anderson" <eco gnuk.net> writes:

On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu 
wrote:
 On 2/6/14, 10:05 AM, Johannes Pfau wrote:
 On Thu, 06 Feb 2014 16:32:08 +0000,
 "Dicebot" <public dicebot.lv> wrote:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu
 wrote:
 Would anyone be willing to take on the ingrate task of creating
 a comprehensive list with all Phobos functions (and more
 generally artifacts) that allocate memory? That would help a
 lot with focusing the discussion.

 Andrei

 Merging https://github.com/D-Programming-Language/dmd/pull/1886
 and running phobos unit tests should make it relatively simple,
 at least for a first pass.

 That's only for implicit allocations though. And please, don't merge
 yet, it'll get another rewrite this weekend ;-)

 Please close if you plan to rewrite.

 One interesting point is that module that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.

 Good point, we need to address that as well.

I'd think fixing that is probably above and beyond what is 
required to satisfy most people. If you are throwing so many 
exceptions that GC pauses are a problem you've got more serious 
problems than the GC.

nothrow doesn't concern itself with Error exceptions, I think 
nogc should just ignore exceptions generally.

 Andrei

Feb 06 2014

Iain Buclaw <ibuclaw gdcproject.org> writes:

On 6 February 2014 18:05, Johannes Pfau <nospam example.com> wrote:
 On Thu, 06 Feb 2014 16:32:08 +0000,
 "Dicebot" <public dicebot.lv> wrote:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu
 wrote:
 Would anyone be willing to take on the ingrate task of creating
 a comprehensive list with all Phobos functions (and more
 generally artifacts) that allocate memory? That would help a
 lot with focusing the discussion.

 Andrei

 Merging https://github.com/D-Programming-Language/dmd/pull/1886
 and running phobos unit tests should make it relatively simple,
 at least for a first pass.

 That's only for implicit allocations though. And please, don't merge
 yet, it'll get another rewrite this weekend ;-)

 One interesting point is that module that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.

 Here's some example output for
 std.uuid/digest/path/range/algorithm/curl:
 http://dpaste.dzfl.pl/96d3725b06e2

That message will look much better with vcolumns.  ;)

Albeit, it also depends on moving fprint(global.stdmsg, ...)  =>  message(...)

http://dpaste.dzfl.pl/5b1961918ed6

Feb 06 2014

Iain Buclaw <ibuclaw gdcproject.org> writes:

On 6 February 2014 19:03, Iain Buclaw <ibuclaw gdcproject.org> wrote:
 On 6 February 2014 18:05, Johannes Pfau <nospam example.com> wrote:
 On Thu, 06 Feb 2014 16:32:08 +0000,
 "Dicebot" <public dicebot.lv> wrote:

 On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu
 wrote:
 Would anyone be willing to take on the ingrate task of creating
 a comprehensive list with all Phobos functions (and more
 generally artifacts) that allocate memory? That would help a
 lot with focusing the discussion.

 Andrei

 Merging https://github.com/D-Programming-Language/dmd/pull/1886
 and running phobos unit tests should make it relatively simple,
 at least for a first pass.

 That's only for implicit allocations though. And please, don't merge
 yet, it'll get another rewrite this weekend ;-)

 One interesting point is that modules that were written with avoiding
 allocations in mind usually still allocate when throwing exceptions.

 Here's some example output for
 std.uuid/digest/path/range/algorithm/curl:
 http://dpaste.dzfl.pl/96d3725b06e2

 That message will look much better with vcolumns.  ;)

 Though it also depends on changing fprint(global.stdmsg, ...) to message(...)

 http://dpaste.dzfl.pl/5b1961918ed6

That said, it doesn't seem to show the column number correctly.

http://dpaste.dzfl.pl/31c8800e223a

Feb 06 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Johannes Pfau:

 Here's some example output for
 std.uuid/digest/path/range/algorithm/curl:
 http://dpaste.dzfl.pl/96d3725b06e2

 ./dmd -vgc ~/Dokumente/d/phobos/std/range.d -c -unittest
 /home/jpf/Dokumente/d/phobos/std/range.d(7307): vgc: Array 
 literals cause gc allocation

For some time now, dynamic array literals haven't allocated in
some cases.

And there's also this:
https://github.com/D-Programming-Language/dmd/pull/2952
The [1, 2]s syntax guarantees no heap allocation.
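
To make the distinction concrete, here is a small D sketch of an array literal that needs no GC allocation next to one that does. It assumes a compiler that supports the `@nogc` attribute and does not use the `[1, 2]s` syntax from the pull request; `sumStatic` and `sumDynamic` are hypothetical names for this example.

```d
// The literal initializes a fixed-size (static) array that lives on the
// stack, so no GC allocation is needed and @nogc is accepted.
int sumStatic() @nogc
{
    int[3] a = [1, 2, 3];
    int total = 0;
    foreach (x; a)
        total += x;
    return total;
}

// The literal initializes a dynamic array, which is allocated on the GC
// heap. Marking this function @nogc should be rejected by the compiler.
int sumDynamic()
{
    int[] a = [1, 2, 3];
    int total = 0;
    foreach (x; a)
        total += x;
    return total;
}
```

Both functions compute the same result; only the declared type of `a` decides whether the literal allocates.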

Bye,
bearophile

Feb 06 2014

"Namespace" <rswhite4 googlemail.com> writes:

On Thursday, 6 February 2014 at 20:40:28 UTC, bearophile wrote:
 Johannes Pfau:

 Here's some example output for
 std.uuid/digest/path/range/algorithm/curl:
 http://dpaste.dzfl.pl/96d3725b06e2

 ./dmd -vgc ~/Dokumente/d/phobos/std/range.d -c -unittest
 /home/jpf/Dokumente/d/phobos/std/range.d(7307): vgc: Array 
 literals cause gc allocation

 For some time now, dynamic array literals haven't allocated in
 some cases.

 And there's also this:
 https://github.com/D-Programming-Language/dmd/pull/2952
 the [1, 2]s syntax guarantees no heap allocation.

 Bye,
 bearophile

My pull was not perfect. And I have no time to finish the type[$] 
and auto[$] pull. :/

Feb 06 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, February 08, 2014 09:20:15 Andrej Mitrovic wrote:
 On 2/7/14, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 However, I would argue that assuming that everyone is going to validate
 their strings and that pretty much all string-related functions shouldn't
 ever have to worry about invalid Unicode is just begging for subtle bugs
 all over the place IMHO.

 I suggested we would introduce an overload, not replace the existing
 function, so this isn't an issue.

 The problem is that you need to check it. This is _slower_ than exceptions
 in the normal case, as invalid Unicode should be the rare case.
 
 Do you have any benchmarks for this? I have vague memory about
 complaining that the exception code is *de-facto* slower, regardless
 of input. But I'll try to provide some test-cases later and see where
 we're at.

The exception version has to do all of the same checks that the version which 
returns an error value would have to do, while the one returning an error 
value which had to be checked for validity would have an extra check. So, the 
only ways that the exception version would be slower are if the plumbing for 
being able to throw an exception from the function makes it slower (assuming 
that the other would be nothrow) or if the optimizer just does worse with the 
exception one for some reason. After all, the number of operations that the 
actual D code would be doing in the successful case would be greater for the 
non-throwing version. Code generation can do entertaining things to efficiency 
though, so benchmarking would be required to see what would actually happen.

However, as I stated in another post, I've reconsidered the situation. I think 
that I misunderstood what Dmitry was suggesting and that checking the error 
value is not actually necessary:

http://forum.dlang.org/post/mailman.66.1391838333.21734.digitalmars-d puremagic.com

And if that's the case, then we can probably move towards having decode not 
throw and possibly getting rid of UTFException altogether (certainly, most 
code wouldn't throw it or have to worry about it, since decode and stride are 
the two main cases where that's a concern, and if they don't throw anymore, 
then UTFException would have very little use).
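
One way a non-throwing decode could make the validity check optional is to return a sentinel code point for malformed input. The sketch below is a simplified, hypothetical `decodeNoThrow` (not the Phobos `std.utf.decode`) that returns U+FFFD, the Unicode replacement character, instead of throwing. Whether this matches the design in the linked post is an assumption.

```d
// Sentinel returned for malformed input (Unicode replacement character).
enum dchar replacementChar = '\uFFFD';

// Hypothetical helper: decodes one code point from UTF-8 starting at s[i]
// and advances i past the bytes it consumed. Requires i < s.length.
// Simplified: it does not reject overlong forms, surrogates, or values
// above U+10FFFF, which a real decoder would also have to handle.
dchar decodeNoThrow(const(char)[] s, ref size_t i) @nogc nothrow pure
{
    immutable uint b0 = s[i++];
    if (b0 < 0x80)
        return cast(dchar) b0;      // ASCII fast path

    uint cp;
    size_t extra;                   // continuation bytes expected
    if ((b0 & 0xE0) == 0xC0)      { cp = b0 & 0x1F; extra = 1; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
    else
        return replacementChar;     // stray continuation or invalid lead byte

    foreach (_; 0 .. extra)
    {
        if (i >= s.length || (s[i] & 0xC0) != 0x80)
            return replacementChar; // truncated or malformed sequence
        cp = (cp << 6) | (s[i++] & 0x3F);
    }
    return cast(dchar) cp;
}
```

Callers that care about invalid input can compare the result against `replacementChar`; callers that don't can simply carry on, which is why no extra check is forced on the common path.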

- Jonathan M Davis

Feb 08 2014
