digitalmars.D - List of Phobos functions that allocate memory?
- Andrei Alexandrescu (5/5) Feb 06 2014 Would anyone be willing to take on the ingrate task of creating a
- Dicebot (5/10) Feb 06 2014 Merging https://github.com/D-Programming-Language/dmd/pull/1886
- Andrej Mitrovic (7/10) Feb 06 2014 Running the tests is overkill, all you have to do is iterate over each
- Martin Cejp (9/22) Feb 06 2014 Quite a few of those seem to be false positives.
- Andrej Mitrovic (3/7) Feb 06 2014 Ah just realized there are duplicates in the report. I guess -vgc is
- Andrej Mitrovic (2/6) Feb 06 2014 Updated to remove duplicate reports.
- Andrei Alexandrescu (3/10) Feb 06 2014 Thanks. I guess we'd need to cross-reference to function names from ther...
- grm (3/16) Feb 06 2014 lots of them are throws though
- grm (3/16) Feb 06 2014 and also new *XY*Exception doesn't indicate a problem necessarily
- Andrei Alexandrescu (3/20) Feb 06 2014 Good point. Seems to me code inspection would be a simpler way.
- Andrej Mitrovic (2/3) Feb 06 2014 Updated to include function names.
- Andrei Alexandrescu (6/9) Feb 06 2014 Noice. One
- Andrej Mitrovic (5/10) Feb 06 2014 Well I'm just hacking on the -vgc pull to output what I want, but
- Andrej Mitrovic (4/8) Feb 06 2014 Ah you've attached a file, didn't notice it on the left since I
- Dmitry Olshansky (6/9) Feb 06 2014 Hm.
- H. S. Teoh (6/16) Feb 06 2014 [...]
- Dmitry Olshansky (5/19) Feb 06 2014 O.T. From a pragmatic point of view any specific property of a system
- Dmitry Olshansky (6/13) Feb 06 2014 Needs to somehow cut down CTFE-only stuff.
- Johannes Pfau (9/21) Feb 06 2014 That's only for implicit allocations though. And please, don't merge
- Andrei Alexandrescu (4/22) Feb 06 2014 Good point, we need to address that as well.
- grm (14/19) Feb 06 2014 expecting the requested close, so some OTs (in random order):
- fra (3/9) Feb 06 2014 Hey, wait a second. How do you throw without allocating?
- Andrei Alexandrescu (4/13) Feb 06 2014 I don't know yet. That's what the "addressing the problem" will take
- H. S. Teoh (32/47) Feb 06 2014 [...]
- Johannes Pfau (4/18) Feb 06 2014 You can store the exception as a global and that's done for the
- Johannes Pfau (4/24) Feb 06 2014 Oh and in other languages you can throw by value but I think that
- Andrej Mitrovic (5/7) Feb 06 2014 Hmm.. is that even safe? I mean in some case of exception
- Namespace (3/11) Feb 06 2014 You could use a circular buffer with appropriate length.
- Iain Buclaw (2/13) Feb 06 2014 You can't. :o)
- Adam D. Ruppe (6/7) Feb 06 2014 I think exceptions should be ok. You optimize the typical path,
- Johannes Pfau (7/15) Feb 06 2014 That depends on your situation. For games and other applications on
- Adam D. Ruppe (8/11) Feb 06 2014 Yeah, when I toyed with bare metal D, I did exceptions with
- Dicebot (7/14) Feb 06 2014 Hardly so. Any exception allocation can trigger GC collection
- Brad Anderson (5/22) Feb 06 2014 Personally I don't think bad user input qualifies as an
- Dicebot (6/10) Feb 06 2014 I agree. It kills the whole concept of "exceptions are rare so
- Brad Anderson (9/20) Feb 06 2014 I must admit that I am guilty of sometimes using exceptions for
- Walter Bright (3/6) Feb 06 2014 It's not a matter of taste. If your input is subject to a DoS attack, do...
- bearophile (8/10) Feb 06 2014 Perhaps the world of today malicious attacks on the software you
- Walter Bright (2/10) Feb 07 2014 DoS attack resistance requires faster code, not slower code.
- Marc Schütz <schuetzm gmx.net> (6/11) Feb 07 2014 The specific problem was that it was possible to provoke hash
- Walter Bright (3/7) Feb 08 2014 That has nothing to do with needing exceptions in the control flow path ...
- Marc Schütz <schuetzm gmx.net> (5/21) Feb 09 2014 Huh? I responded to this discussion:
- John Colvin (12/29) Feb 07 2014 I think bearophile is referring to a practice of avoiding fast
- bearophile (4/5) Feb 07 2014 Yes, you have explained well my point. Thank you.
- Dmitry Olshansky (5/13) Feb 07 2014 Meh. If exceptions are such a liability we'd better make them (much)
- Sean Kelly (15/17) Feb 07 2014 It's not stack unwinding speed that's an issue here though, but
- Dmitry Olshansky (22/41) Feb 07 2014 Why throwing a single exception is such a big problem? Surely even C's
- Dicebot (7/16) Feb 07 2014 As I have already mentioned, they don't necessarily need to be.
- Walter Bright (3/7) Feb 08 2014 It is NOT the allocation that's the issue. C++ code has the same issue. ...
- Sean Kelly (26/59) Feb 07 2014 That can be turned off at run time by clearing the traceHandler.
- Adam Wilson (21/78) Feb 07 2014 On Fri, 07 Feb 2014 10:54:37 -0800, Sean Kelly ...
- Dmitry Olshansky (27/79) Feb 07 2014 Which should be somehow prominently advertised for release builds. Last
- Walter Bright (5/6) Feb 08 2014 Code can always pre-allocate the exception that is thrown. There's no re...
- Walter Bright (5/6) Feb 08 2014 Because in order to unwind the stack, you need to find the information a...
- Dmitry Olshansky (9/15) Feb 08 2014 A special table lookup can't be slow compared to writing a dummy HTTP
- Adam D. Ruppe (7/10) Feb 08 2014 Can you see if it is better with this little patch?
- Walter Bright (5/18) Feb 08 2014 I don't know how vibe.d works, but my point is using exception handling ...
- Jonathan M Davis (24/46) Feb 08 2014 I wouldn't have considered throwing on an HTTP error to be "flow control...
- Ola Fosheim Grøstad (20/25) Feb 09 2014 Just to be pedantic: this is not true.
- Ola Fosheim Grøstad (6/6) Feb 09 2014 And with profiling you get the call-frequency between functions,
- Ola Fosheim Grøstad (5/8) Feb 08 2014 "Compromising"? You mean they had to modify codegen, which they
- Ola Fosheim Grøstad (6/6) Feb 08 2014 But the c++ Dwarf way of doing it was developed for Itanium which
- Ola Fosheim Grøstad (8/8) Feb 08 2014 AND (this just has to be said) if D is really meant to be a SAFE
- Walter Bright (5/11) Feb 08 2014 Ola, I've done it both ways, I actually do know what I'm talking about.
- Marco Leise (13/27) Feb 08 2014 ation,
- Ola Fosheim Grøstad (35/39) Feb 09 2014 Please note that "you" and "they" was meant as "one" or "the c++
- Ola Fosheim Grøstad (6/6) Feb 09 2014 This is a pretty nice description of the i7 pipeline by Hennesey
- Jonathan M Davis (16/29) Feb 07 2014 Related: http://d.puremagic.com/issues/show_bug.cgi?id=9584
- Ola Fosheim Grøstad (16/18) Feb 07 2014 Well, it is at least more difficult to write reliable code when
- Walter Bright (10/11) Feb 08 2014 Grep for 'throw' in std.datetime shows that every throw is actually:
- Jonathan M Davis (15/32) Feb 08 2014 Of course allocation is not a language issue. The question is whether (a...
- Andrei Alexandrescu (6/18) Feb 07 2014 One simple idea is to statically allocate the same exception and rethrow...
- Jonathan M Davis (23/42) Feb 07 2014 As long as exceptions are cloneable, and people are aware of the fact th...
- Jakob Ovrum (4/10) Feb 08 2014 I don't think it's that simple. What happens if an XException
- Dmitry Olshansky (7/17) Feb 08 2014 If both are thread-local and cached I see no problem whatsoever.
- Jakob Ovrum (5/10) Feb 08 2014 How is it not a problem? XException's fields (message, location
- Jonathan M Davis (6/18) Feb 08 2014 Then we have multiple of them, or we new up another one when a second on...
- Jakob Ovrum (6/28) Feb 08 2014 Yes, I'm sure there is a cool solution, I'm just pointing out
- Marco Leise (23/55) Feb 08 2014 Yes, it doesn't seem feasible otherwise. Since you can call
- Jakob Ovrum (4/23) Feb 09 2014 While writes directly to line and file and such can't be
- Marc Schütz <schuetzm gmx.net> (5/16) Feb 09 2014 It's supposedly one exception instance per place where it can be
- Andrei Alexandrescu (3/13) Feb 08 2014 The chaining method detects that and .dup's one of them.
- Dicebot (7/9) Feb 08 2014 After some thinking I don't think it actually helps - exception
- Jakob Ovrum (18/38) Feb 08 2014 What if the statically allocated XException is escaped to be
- Walter Bright (6/7) Feb 08 2014 They can be made faster by slowing down non-exception code.
- Marco Leise (18/28) Feb 08 2014 https://yourlogicalfallacyis.com/black-or-white
- Walter Bright (6/9) Feb 08 2014 Sigh, once again,
- Marco Leise (29/50) Feb 08 2014 Content-Disposition: inline
- Jakob Ovrum (3/16) Feb 09 2014 This doesn't seem like a valid concern. Nothing stops you from
- Lars T. Kyllingstad (3/4) Feb 09 2014 Off topic, but that is a fantastic web site. I wish I had known
- Andrei Alexandrescu (4/23) Feb 09 2014 Function calls could do that.
- Marc Schütz <schuetzm gmx.net> (12/16) Feb 07 2014 Hmm... then what _does_ qualify as exceptional in your opinion?
- Dicebot (6/17) Feb 07 2014 It is exceptional situation if input is supposed to be valid but
- Marc Schütz <schuetzm gmx.net> (16/36) Feb 07 2014 If the function expects it to be valid but you pass it an invalid
- Ola Fosheim Grøstad (55/59) Feb 07 2014 I agree. Any situation where it makes sense to say:
- Jonathan M Davis (80/87) Feb 07 2014 Honestly, I think that the typical approach of discussing exceptions as ...
- bearophile (6/30) Feb 07 2014 Languages with a good type system solve this with Maybe /
- Jonathan M Davis (7/36) Feb 07 2014 That can be a good solution, but it also then requires checking the resu...
- Adam D. Ruppe (130/132) Feb 06 2014 Hmm, I hadn't considered that. Maybe exceptions could be handled
- Adam D. Ruppe (3/4) Feb 06 2014 code in a link so the lines aren't broken
- Sean Kelly (8/16) Feb 06 2014 I really like vibe.d. A lot. But the way HTTP parse errors are
- Adam D. Ruppe (5/7) Feb 06 2014 lol, my cgi.d will do that too if you compile with -debug.... I
- Jacob Carlborg (9/15) Feb 07 2014 Ruby on Rails renders a page with a stack trace in development mode and
- Sean Kelly (8/26) Feb 07 2014 I was mostly surprised that the stack trace was written back to
- Jacob Carlborg (9/16) Feb 09 2014 Ruby on Rails always writes the stack trace to the log. In development
- Brad Anderson (9/14) Feb 06 2014 Thinking about this more it'd probably be a good idea to use the
- Dicebot (5/14) Feb 07 2014 Yes, I even had some simple proof-of-concept drafts of such
- Adam D. Ruppe (3/7) Feb 07 2014 Yeah, I think using separate types for printing to users is often
- Steven Schveighoffer (6/12) Feb 06 2014 I think if reference counting is added, exceptions would be a prime
- Sean Kelly (4/13) Feb 06 2014 Does this case even matter? Exceptions are not a normal function
- Andrei Alexandrescu (3/14) Feb 06 2014 I think it's okay to put this on the backburner and revisit it later.
- Dicebot (5/8) Feb 06 2014 Imagine intentionally crafted broken utf as user input in
- Brad Anderson (5/13) Feb 06 2014 You should probably validate utf from all foreign sources. Catch
- Dicebot (5/9) Feb 06 2014 pure @safe void validate(S)(in S str) if (isSomeString!S);
- Brad Anderson (2/13) Feb 06 2014 Heh, well then... let me just wipe this egg off my face. :P
- Sean Kelly (5/15) Feb 06 2014 And somewhere in the world, darkness fell forever on a bright and
- Adam D. Ruppe (10/13) Feb 06 2014 Yeah, that is absurd. It is a bad, bad sign when almost every
- Andrei Alexandrescu (3/14) Feb 07 2014 Add a bugzilla and let's define isValid that returns bool!
- Andrej Mitrovic (5/6) Feb 07 2014 Add std.utf.decode() to that as well. IOW, it should have an
- Dmitry Olshansky (5/9) Feb 07 2014 Much simpler - it returns a special dchar to designate bad encoding. And...
- Andrej Mitrovic (2/4) Feb 07 2014 A NaN for chars? Sounds great to me! :)
- Dmitry Olshansky (28/32) Feb 07 2014 It's called \uFFFD and is specifically for bad encodings. I wonder why
- Walter Bright (2/4) Feb 08 2014 Nice find. Looks good to me.
- Dmitry Olshansky (4/8) Feb 08 2014 https://d.puremagic.com/issues/show_bug.cgi?id=12113
- Jonathan M Davis (19/28) Feb 07 2014 Isn't that actually worse? Unless you're suggesting that we stop throwin...
- Meta (4/51) Feb 07 2014 You could always return an Option!char. Nullable won't work
- Jonathan M Davis (16/72) Feb 07 2014 How is that any better than returning an invalid dchar with a specific v...
- Meta (10/104) Feb 07 2014 We have had this discussion at least once before. A hypothetical
- Jonathan M Davis (23/48) Feb 07 2014 The problem is that you need to check it. This is _slower_ than exceptio...
- bearophile (7/9) Feb 07 2014 Right, but verifying the correctness of the Unicode encoding of a
- Jonathan M Davis (13/21) Feb 07 2014 But why even do it in the first place then? The code is cleaner and less...
- Marco Leise (16/40) Feb 07 2014 I agree with both of you. The Unicode standard tells us that
- Marco Leise (7/9) Feb 07 2014
- Jonathan M Davis (11/17) Feb 07 2014 I think that that would call for us to have 3 related but distinct funct...
- Marco Leise (10/29) Feb 07 2014 Yes, that's the one that needs to be added.
- Brad Anderson (12/16) Feb 08 2014 I wonder if it'd be too reckless to just make decode for string
- Timon Gehr (2/18) Feb 08 2014 "☹"[1..$]
- Dominikus Dittes Scherkl (13/16) Feb 08 2014 Why?
- Jonathan M Davis (26/47) Feb 07 2014 Actually, thinking this through some more, if we can replace invalid Uni...
- Dmitry Olshansky (8/35) Feb 08 2014 It is.
- Dmitry Olshansky (8/12) Feb 08 2014 This is ridiculously distracting suggestion and simply has no merits
- Meta (4/16) Feb 08 2014 I'm not actually suggesting a replacement. Just wishful thinking
- Jonathan M Davis (22/43) Feb 08 2014 I don't see how returning Nullable!dchar would improve decode function a...
- Dmitry Olshansky (21/50) Feb 08 2014 No, it's better and more flexible for those who care to repair broken
- Marco Leise (30/87) Feb 08 2014 nd
- Dmitry Olshansky (16/88) Feb 09 2014 Working with ranges of dchar? Nobody is taking eager validation from
- Daniel Murphy (2/6) Feb 09 2014 That would be a luxury, gedit doesn't even have auto-indent.
- Marco Leise (10/19) Feb 16 2014 You can talk about missing features in gedit all day, but from
- Daniel Murphy (3/5) Feb 17 2014 What do you use for displaying text, if not a text editor?
- Marco Leise (13/20) Feb 17 2014 That was directed at D development. Or programming with
- Marco Leise (11/30) Feb 16 2014 Of course it does. It is a valid symbol and a lot of websites
- Dmitry Olshansky (22/50) Feb 18 2014 In a sense, \uFFFD means broken encoding. What about lone surrogates?
- Andrej Mitrovic (5/8) Feb 18 2014 OT: Considering how many big-budget events (World Cup / Olympics) do
- Marco Leise (16/37) Feb 18 2014 In a sense yes, in another no. It is a defined code point and
- Andrej Mitrovic (8/15) Feb 08 2014 I suggested we would introduce an overload, not replace the existing
- Dmitry Olshansky (8/24) Feb 08 2014 Just be sure to test on LDC or GDC. DMD results are irrelevant to the
- Andrei Alexandrescu (3/8) Feb 07 2014 .toBugzilla()
- Dicebot (5/14) Feb 07 2014 True words indeed!
- Jonathan M Davis (16/29) Feb 07 2014 In general, I think that throwing on malformed Unicode is a good thing,
- Sean Kelly (15/23) Feb 06 2014 That's a tough one. Bad input typically shouldn't generate an
- bearophile (6/8) Feb 07 2014 I wrote two small ideas to reduce throwing exceptions in Phobos:
- Walter Bright (8/11) Feb 06 2014 Right. If you're:
- Ola Fosheim Grøstad (7/10) Feb 06 2014 I disagree.
- Brad Anderson (6/17) Feb 06 2014 I think in the case of people using exceptions for control flow a
- Walter Bright (3/13) Feb 06 2014 They're going to be slow when you do it that way.
- Ola Fosheim Grøstad (10/11) Feb 07 2014 How slow is slow? Is it slower than in Go and Python? Why would
- Ola Fosheim Grøstad (8/11) Feb 07 2014 When I think of it you could probably just push the RESTException
- Dicebot (3/4) Feb 07 2014 It is assumed by http://dlang.org/errors.html
- Dicebot (3/7) Feb 07 2014 P.S. Throwing exception is not that slow in D, it is allocating
- Walter Bright (5/7) Feb 07 2014 Throwing speed can vary greatly from platform to platform.
- Adam D. Ruppe (16/17) Feb 07 2014 One problem with allocating the exception is the stop-the-world
- Sean Kelly (3/20) Feb 07 2014 It's obviously not a solution, but you could change that by
- Ola Fosheim Grøstad (11/22) Feb 07 2014 Ok, well I guess that primarily is an issue for validation errors
- Adam D. Ruppe (11/21) Feb 07 2014 yeah, preallocating exceptions might be a really good idea.
- Ola Fosheim Grøstad (10/11) Feb 07 2014 I wonder if it would be possible to get better unwinding speed by
- Walter Bright (9/15) Feb 07 2014 The gc is not the real speed issue with exceptions, after all, one can
- Dmitry Olshansky (9/27) Feb 07 2014 It's deh.d or rather deh_win32./ deh_win64_posix.d and it doesn't look
- Walter Bright (2/5) Feb 08 2014 It's a heluva lot slower than "jmp".
- Dmitry Olshansky (11/18) Feb 09 2014 If you can show me how a single unconditional jump propagates error code...
- Walter Bright (2/10) Feb 10 2014 It's the table lookup that's inherently slow.
- Sean Kelly (41/52) Feb 06 2014 But let this be up to the programmer working on the service, not
- Dicebot (7/18) Feb 07 2014 And it is horrible. Exceptions were never designed for this. Try
- Andrei Alexandrescu (6/8) Feb 06 2014 That's extreme. A better possibility is to allocate exceptions from a
- Adam D. Ruppe (8/11) Feb 06 2014 I wrote a quick proof of concept of this that can be tested right
- Walter Bright (3/9) Feb 07 2014 That doesn't work, as nothing prevents code from squirreling away the ca...
- Adam D. Ruppe (5/7) Feb 07 2014 scope would. I'm just saying.
- Sean Kelly (4/11) Feb 07 2014 Thread stores an uncaught exception reference so it can be
- Adam D. Ruppe (7/9) Feb 07 2014 It could also make a copy at that time on to the regular GC heap
- Adam D. Ruppe (9/11) Feb 07 2014 lol just add in a quick call to .toGC when you want to store it:
- Jerry (4/14) Feb 07 2014 Very naive question (that may have already been answered), but why can't
- Adam D. Ruppe (4/5) Feb 07 2014 I think that'd be more costly and would mess up the whole
- bearophile (10/11) Feb 07 2014 This thread discusses the (low) performance of D exceptions, and
- Sean Kelly (3/12) Feb 07 2014 Okay, I'm going to look into generating traces lazily. I think
- Dicebot (9/10) Feb 06 2014 Throw pre-allocated thread-local exception. And make a deep copy
- Brad Anderson (8/39) Feb 06 2014 I'd think fixing that is probably above and beyond what is
- Iain Buclaw (4/25) Feb 06 2014 That message will look much better with vcolumns. ;)
- Iain Buclaw (3/32) Feb 06 2014 Saying that, it seems it doesn't show the column number correctly.
- bearophile (8/14) Feb 06 2014 Since some time in some cases dynamic array literals don't
- Namespace (3/17) Feb 06 2014 My pull was not perfect. And I have no time to finish the type[$]
- Jonathan M Davis (21/42) Feb 08 2014 The exception version has to all of the same checks that the version whi...
Would anyone be willing to take on the ingrate task of creating a comprehensive list with all Phobos functions (and more generally artifacts) that allocate memory? That would help a lot with focusing the discussion.
Andrei
Feb 06 2014
On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu wrote:
> Would anyone be willing to take on the ingrate task of creating a comprehensive list with all Phobos functions (and more generally artifacts) that allocate memory? That would help a lot with focusing the discussion.
> Andrei
Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
Feb 06 2014
On 2/6/14, Dicebot <public dicebot.lv> wrote:
> Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
Running the tests is overkill; all you have to do is iterate over each module and compile it with "-o- -vgc". We have so many allocations in Phobos that I couldn't even upload my text to a paste site; most sites have a limit of 150Kb! So here it is on github: https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Feb 06 2014
On Thursday, 6 February 2014 at 17:18:59 UTC, Andrej Mitrovic wrote:
> On 2/6/14, Dicebot <public dicebot.lv> wrote:
>> Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
> Running the tests is overkill, all you have to do is iterate over each module and call "-o- -vgc" on it. We have so many allocations in Phobos that I couldn't even upload my text over to a paste site, most sites have a limit of 150Kb! So here it is on github: https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Quite a few of those seem to be false positives. E.g.
C:\dmd-git\dmd2\src\phobos\std\internal\digest\sha_SSSE3.d(512): Concatenation causes gc allocation "rol "~T2~",5"
looks like something that only ever makes sense at compilation time.
Feb 06 2014
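The false positive described above concerns a string concatenation that is only ever evaluated during compilation. A minimal sketch of how that can arise, assuming a hypothetical helper `rotateLeft` (it is not the actual Phobos code): the concatenation looks like a GC allocation to a per-expression check, yet the result is computed by CTFE and stored as a literal.

```d
import std.stdio : writeln;

// Hypothetical helper, loosely modelled on the reported expression:
// it builds an instruction string by concatenation.
string rotateLeft(string reg)
{
    return "rol " ~ reg ~ ",5";
}

// Using the call as an enum initializer forces compile-time evaluation
// (CTFE); the binary only contains the resulting string literal.
enum string instr = rotateLeft("eax");

static assert(instr == "rol eax,5");

void main()
{
    writeln(instr); // no concatenation is performed at run time here
}
```

A tool that flags the `~` inside `rotateLeft` cannot tell from that expression alone whether the function is ever called at run time, which is why such entries would need separate filtering.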
On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
> We have so many allocations in Phobos that I couldn't even upload my text over to a paste site, most sites have a limit of 150Kb! So here it is on github: https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
Ah, just realized there are duplicates in the report. I guess -vgc is emitting dupes.
Feb 06 2014
On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>> https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
> Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
Updated to remove duplicate reports.
Feb 06 2014
On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>>> https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
>> Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
> Updated to remove duplicate reports.
Thanks. I guess we'd need to cross-reference to function names from there.
Andrei
Feb 06 2014
On Thursday, 6 February 2014 at 17:57:45 UTC, Andrei Alexandrescu wrote:
> On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
>> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>>> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>>>> https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
>>> Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
>> Updated to remove duplicate reports.
> Thanks. I guess we'd need to cross-reference to function names from there. Andrei
lots of them are throws though
Feb 06 2014
On Thursday, 6 February 2014 at 17:57:45 UTC, Andrei Alexandrescu wrote:
> On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
>> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>>> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>>>> https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
>>> Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
>> Updated to remove duplicate reports.
> Thanks. I guess we'd need to cross-reference to function names from there. Andrei
and also, new *XY*Exception doesn't necessarily indicate a problem
Feb 06 2014
On 2/6/14, 10:05 AM, grm wrote:
> On Thursday, 6 February 2014 at 17:57:45 UTC, Andrei Alexandrescu wrote:
>> On 2/6/14, 9:21 AM, Andrej Mitrovic wrote:
>>> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>>>> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>>>>> https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
>>>> Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
>>> Updated to remove duplicate reports.
>> Thanks. I guess we'd need to cross-reference to function names from there. Andrei
> and also new *XY*Exception doesn't indicate a problem necessarily
Good point. Seems to me code inspection would be a simpler way.
Andrei
Feb 06 2014
On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Thanks. I guess we'd need to cross-reference to function names from there.
Updated to include function names.
Feb 06 2014
On 2/6/14, 10:15 AM, Andrej Mitrovic wrote:
> On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> Thanks. I guess we'd need to cross-reference to function names from there.
> Updated to include function names.
Noice. One
less phobos_allocations.txt | grep 'In function'| sed "s/.*'\\(.*\\)':/\\1/"|sort|uniq >phobos_allocating_functions.txt
later, and...
Andrei
Feb 06 2014
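For readers without a Unix shell, the grep/sed/sort/uniq pipeline above could be approximated in D itself. This is only a sketch: the function name `allocatingFunctions` and the sample report lines are hypothetical, and it assumes (as the sed expression does) that the report contains lines of the form `... In function 'name':`.

```d
import std.algorithm : canFind, filter, map, sort, uniq;
import std.array : array;
import std.regex : regex, replaceFirst;
import std.stdio : writeln;

// Keep lines mentioning "In function", reduce each to the quoted name,
// then sort and drop duplicates (the grep | sed | sort | uniq steps).
string[] allocatingFunctions(R)(R lines)
{
    auto quotedName = regex(`.*'(.*)':`);
    auto names = lines
        .filter!(l => l.canFind("In function"))
        .map!(l => replaceFirst(l, quotedName, "$1"))
        .array;
    sort(names);
    return names.uniq.array;
}

void main()
{
    // Hypothetical report lines; the real file format may differ.
    auto report = [
        "a.d: In function 'foo':",
        "a.d(3): 'new' causes gc allocation",
        "b.d: In function 'bar':",
        "c.d: In function 'foo':",
    ];
    writeln(allocatingFunctions(report)); // prints ["bar", "foo"]
}
```

To process the real report, the array literal could be replaced by something like `File("phobos_allocations.txt").byLineCopy`, assuming the file is in the working directory.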
On Thursday, 6 February 2014 at 18:25:34 UTC, Andrei Alexandrescu wrote:
> Noice. One less phobos_allocations.txt | grep 'In function'| sed "s/.*'\\(.*\\)':/\\1/"|sort|uniq >phobos_allocating_functions.txt later, and...
Well I'm just hacking on the -vgc pull to output what I want, but I should read titles better :). Here's the functions: http://codepad.org/3TsPXryX
Feb 06 2014
On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Noice. One less phobos_allocations.txt | grep 'In function'| sed "s/.*'\\(.*\\)':/\\1/"|sort|uniq >phobos_allocating_functions.txt later, and...
Ah you've attached a file, didn't notice it on the left since I usually skim the avatar part: http://forum.dlang.org/thread/ld0d79$2ife$1 digitalmars.com?page=2#post-ld0k2u:242ptu:241:40digitalmars.com
Feb 06 2014
06-Feb-2014 22:15, Andrej Mitrovic wrote:
> On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> Thanks. I guess we'd need to cross-reference to function names from there.
> Updated to include function names.
Hm. Somehow diffing this with a coverage report may help filter out CTFE. Some bugs are features :)
--
Dmitry Olshansky
Feb 06 2014
On Thu, Feb 06, 2014 at 11:39:30PM +0400, Dmitry Olshansky wrote:
> 06-Feb-2014 22:15, Andrej Mitrovic wrote:
>> On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>>> Thanks. I guess we'd need to cross-reference to function names from there.
>> Updated to include function names.
> Hm. Somehow diffing this with coverage report may help filter out CTFE. Some bugs are features :)
[...]
I thought *all* bugs are features... unintentional features. :-P
T
--
Bomb technician: If I'm running, try to keep up.
Feb 06 2014
07-Feb-2014 00:15, H. S. Teoh wrote:
> On Thu, Feb 06, 2014 at 11:39:30PM +0400, Dmitry Olshansky wrote:
>> 06-Feb-2014 22:15, Andrej Mitrovic wrote:
>>> On 2/6/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>>>> Thanks. I guess we'd need to cross-reference to function names from there.
>>> Updated to include function names.
>> Hm. Somehow diffing this with coverage report may help filter out CTFE. Some bugs are features :)
> [...]
> I thought *all* bugs are features... unintentional features. :-P
> T
O.T. From a pragmatic point of view, any specific property of a system that is useful to the end user is a feature. Not all bugs are useful ;)
--
Dmitry Olshansky
Feb 06 2014
06-Feb-2014 21:21, Andrej Mitrovic wrote:
> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>> On 2/6/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
>>> https://raw.github.com/AndrejMitrovic/phobos_allocations/master/phobos_allocations.txt
>> Ah just realized there are duplicates in the report. I guess -vgc is emitting dupes.
> Updated to remove duplicate reports.
Needs to somehow cut down CTFE-only stuff. E.g. std.regex allocates a lot at CTFE (and in debug sections); it's a prominent example of CTFE, but there is a _lot_ more in the same theme.
--
Dmitry Olshansky
Feb 06 2014
On Thu, 06 Feb 2014 16:32:08 +0000, "Dicebot" <public dicebot.lv> wrote:
> On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu wrote:
>> Would anyone be willing to take on the ingrate task of creating a comprehensive list with all Phobos functions (and more generally artifacts) that allocate memory? That would help a lot with focusing the discussion.
>> Andrei
> Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-)
One interesting point is that modules that were written with avoiding allocations in mind usually still allocate when throwing exceptions. Here's some example output for std.uuid/digest/path/range/algorithm/curl: http://dpaste.dzfl.pl/96d3725b06e2
Feb 06 2014
On 2/6/14, 10:05 AM, Johannes Pfau wrote:
> On Thu, 06 Feb 2014 16:32:08 +0000, "Dicebot" <public dicebot.lv> wrote:
>> On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu wrote:
>>> Would anyone be willing to take on the ingrate task of creating a comprehensive list with all Phobos functions (and more generally artifacts) that allocate memory? That would help a lot with focusing the discussion.
>>> Andrei
>> Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
> That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-)
Please close if you plan to rewrite.
> One interesting point is that module that were written with avoiding allocations in mind usually still allocate when throwing exceptions.
Good point, we need to address that as well.
Andrei
Feb 06 2014
>> That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-)
> Please close if you plan to rewrite. Andrei
expecting the requested close, so some OTs (in random order):
- bought TDPL shortly after it's been released
- was very impressed by the concept
- following the NGs since, I guess, 2010
- great community and *very* smart people
- had nothing of value to add yet, though (since I'm stuck with C/C++/Java and some proprietary stuff)
- and today I submitted my first reply, which was incredibly easy. no annoyance! please make this more obvious for guys like me that do not want to register.
thx and good luck to you all
hope I can contribute my share some day
Kind Regards
Feb 06 2014
On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
>> One interesting point is that module that were written with avoiding allocations in mind usually still allocate when throwing exceptions.
> Good point, we need to address that as well. Andrei
Hey, wait a second. How do you throw without allocating?
Feb 06 2014
On 2/6/14, 10:52 AM, fra wrote:
> On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
>>> One interesting point is that module that were written with avoiding allocations in mind usually still allocate when throwing exceptions.
>> Good point, we need to address that as well. Andrei
> Hey, wait a second. How do you throw without allocating?
I don't know yet. That's what the "addressing the problem" will take care of! :o)
Andrei
Feb 06 2014
On Thu, Feb 06, 2014 at 11:01:18AM -0800, Andrei Alexandrescu wrote:
> On 2/6/14, 10:52 AM, fra wrote:
>> On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
>>>> One interesting point is that module that were written with avoiding allocations in mind usually still allocate when throwing exceptions.
>>> Good point, we need to address that as well. Andrei
>> Hey, wait a second. How do you throw without allocating?
> I don't know yet. That's what the "addressing the problem" will take care of! :o)
[...]
You can just pre-declare the Exception as a global variable and then throw that. Well, OK, it's cheating because you still have to allocate it then, but the point is that you get to control how it gets allocated at the top-level rather than having the 'new' buried deep down in the function call chain where you can't control whether the code uses 'new' or a custom allocator (it may not know about which allocator to use).
    Exception prealloc_exc;
    static this() {
        prealloc_exc = ... /* use whatever allocation method you want */
    }
    void main() {
        try {
            func();
        } catch(Exception e) {
            // you get prealloc_exc here
        }
    }
    void func() {
        if (error) {
            // init exception parameters
            prealloc_exc.msg = ...; /* presumably you preallocate the
                                     * message string too, with the
                                     * allocator of your choice */
            throw prealloc_exc; // N.B. no allocation
        }
    }
T
--
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
Feb 06 2014
On Thu, 06 Feb 2014 18:52:20 +0000, "fra" <a b.it> wrote:
> On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
>>> One interesting point is that module that were written with avoiding allocations in mind usually still allocate when throwing exceptions.
>>
>> Good point, we need to address that as well.
>>
>> Andrei
>
> Hey, wait a second. How do you throw without allocating?

You can store the exception as a global and that's done for the OutOfMemoryError IIRC, but what I meant was 'allocate with the GC'.
Feb 06 2014
On Thu, 6 Feb 2014 20:00:50 +0100, Johannes Pfau <nospam example.com> wrote:
> On Thu, 06 Feb 2014 18:52:20 +0000, "fra" <a b.it> wrote:
>> On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
>>>> One interesting point is that module that were written with avoiding allocations in mind usually still allocate when throwing exceptions.
>>>
>>> Good point, we need to address that as well.
>>>
>>> Andrei
>>
>> Hey, wait a second. How do you throw without allocating?
>
> You can store the exception as a global and that's done for the OutOfMemoryError IIRC, but what I meant was 'allocate with the GC'.

Oh and in other languages you can throw by value but I think that wouldn't work in D because of exception chaining.
Feb 06 2014
On Thursday, 6 February 2014 at 19:01:33 UTC, Johannes Pfau wrote:
> You can store the exception as a global and that's done for the OutOfMemoryError IIRC.

Hmm.. is that even safe? I mean in some case of exception chaining the same object could be overwritten before being thrown again, thereby losing the original exception state. Thinking out loud here..
Feb 06 2014
On Thursday, 6 February 2014 at 19:05:49 UTC, Andrej Mitrovic wrote:
> On Thursday, 6 February 2014 at 19:01:33 UTC, Johannes Pfau wrote:
>> You can store the exception as a global and that's done for the OutOfMemoryError IIRC.
>
> Hmm.. is that even safe? I mean in some case of exception chaining the same object could be overwritten before being thrown again, thereby losing the original exception state. Thinking out loud here..

You could use a circular buffer with appropriate length.
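
To make the circular buffer idea concrete, here is a minimal sketch in D. The names `pool`, `poolSize` and `recycled` are made up for illustration and are not part of druntime or Phobos, and the buffer length of 4 is an arbitrary assumption about how deep exception chains get in a given program. It also does not deal with the stack trace the runtime may attach to a thrown object.

```d
import std.stdio : writeln;

// Hypothetical pool: a fixed ring of exceptions allocated once per thread.
enum poolSize = 4;            // assumed maximum chain depth, tune per program
Exception[poolSize] pool;     // module-level variables are thread-local in D
size_t nextSlot;

static this()
{
    // All allocation happens here, not at the throw site.
    foreach (ref e; pool)
        e = new Exception("");
}

// Hand out the next slot; after poolSize calls the oldest one is reused.
Exception recycled(string msg, string file = __FILE__, size_t line = __LINE__)
{
    auto e = pool[nextSlot];
    nextSlot = (nextSlot + 1) % poolSize;
    e.msg = msg;
    e.file = file;
    e.line = line;
    e.next = null;   // drop a stale chain link left over from an earlier use
    return e;
}

void main()
{
    try
    {
        throw recycled("bad request");
    }
    catch (Exception e)
    {
        writeln(e.msg);
    }
}
```

The overwrite hazard raised above does not go away, it is only pushed out: once more than `poolSize` exceptions are in flight or stored somewhere, the oldest object gets clobbered.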
Feb 06 2014
On 6 February 2014 18:52, fra <a b.it> wrote:
> On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
>>> One interesting point is that module that were written with avoiding allocations in mind usually still allocate when throwing exceptions.
>>
>> Good point, we need to address that as well.
>>
>> Andrei
>
> Hey, wait a second. How do you throw without allocating?

You can't. :o)
Feb 06 2014
On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
> Hey, wait a second. How do you throw without allocating?

I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path.

If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)
Feb 06 2014
On Thu, 06 Feb 2014 19:08:39 +0000, "Adam D. Ruppe" <destructionator gmail.com> wrote:
> On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
>> Hey, wait a second. How do you throw without allocating?
>
> I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path. If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)

That depends on your situation. For games and other applications on normal computers it's OK. For games on systems like embedded gaming systems (think like NintendoDS, 4MB ram) you might not have a GC but still want to use exception handling.
Feb 06 2014
On Thursday, 6 February 2014 at 19:32:11 UTC, Johannes Pfau wrote:
> For games on systems like embedded gaming systems (think like NintendoDS, 4MB ram) you might not have a GC but still want to use exception handling.

Yeah, when I toyed with bare metal D, I did exceptions with manual memory management - malloc when throwing (well, I did malloc in _d_newclass so it was transparent to the throwing code), free when catching.

But I think a program written for a special environment will have different coding standards from top to bottom, including the need to free in an exception handler and the option to hack druntime.
Feb 06 2014
On Thursday, 6 February 2014 at 19:08:40 UTC, Adam D. Ruppe wrote:
> On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
>> Hey, wait a second. How do you throw without allocating?
>
> I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path. If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)

Hardly so. Any exception allocation can trigger GC collection cycle and Phobos does not provide any other way to handle data errors. Any application that operates on some external user input will be subject to DoS attack vector if it uses Phobos directly. It was huge performance killer for vibe.d last time I have checked, for example.
Feb 06 2014
On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
> On Thursday, 6 February 2014 at 19:08:40 UTC, Adam D. Ruppe wrote:
>> On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
>>> Hey, wait a second. How do you throw without allocating?
>>
>> I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path. If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)
>
> Hardly so. Any exception allocation can trigger GC collection cycle and Phobos does not provide any other way to handle data errors. Any application that operates on some external user input will be subject to DoS attack vector if it uses Phobos directly. It was huge performance killer for vibe.d last time I have checked, for example.

Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though.
Feb 06 2014
On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote:
> Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though.

I agree. It kills the whole concept of "exceptions are rare so they don't need to be fast when thrown". But it is how quite lot of Phobos is currently designed and, in my opinion, is biggest design mistake of vibe.d too (it uses exceptions to propagate HTTP status codes)
Feb 06 2014
On Thursday, 6 February 2014 at 22:19:42 UTC, Dicebot wrote:
> On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote:
>> Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though.
>
> I agree. It kills the whole concept of "exceptions are rare so they don't need to be fast when thrown". But it is how quite lot of Phobos is currently designed and, in my opinion, is biggest design mistake of vibe.d too (it uses exceptions to propagate HTTP status codes)

I must admit that I am guilty of sometimes using exceptions for routine control flow too. It's just so convenient compared to validation/consumption.

Maybe we should make a list of Phobos functions that throw exceptions and ensure that (for the ones where this makes sense) they have non-throwing validators available.

If we can stop gc allocating them that'd be even better but I don't think them being gc allocating should hold up nogc.
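
As a rough illustration of what a non-throwing validator could look like, here is a sketch in D. `tryParseInt` is a hypothetical helper written for this example, not an existing Phobos function, and the `@nogc` attribute assumes a compiler that already supports it. It reports failure through its return value, so rejecting bad input costs neither a throw nor a GC allocation.

```d
import std.stdio : writeln;

// Hypothetical helper (not in Phobos): parse a decimal int without throwing.
// Returns false on empty input, stray characters, or overflow.
bool tryParseInt(const(char)[] s, out int result) nothrow @nogc pure @safe
{
    if (s.length == 0)
        return false;

    size_t i = 0;
    bool negative = false;
    if (s[0] == '-' || s[0] == '+')
    {
        negative = s[0] == '-';
        i = 1;
        if (s.length == 1)
            return false;          // a lone sign is not a number
    }

    long value = 0;
    for (; i < s.length; ++i)
    {
        immutable c = s[i];
        if (c < '0' || c > '9')
            return false;
        value = value * 10 + (c - '0');
        if (value > cast(long) int.max + 1)
            return false;          // already out of range, stop early
    }

    if (negative)
        value = -value;
    if (value > int.max)
        return false;

    result = cast(int) value;
    return true;
}

void main()
{
    int n;
    if (tryParseInt("8080", n))
        writeln("parsed ", n);
    if (!tryParseInt("80x0", n))
        writeln("rejected without throwing");
}
```

A throwing front end can still be layered on top for callers who prefer exceptions; the point is only that the cheap check exists for code that sees hostile input.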
Feb 06 2014
On 2/6/2014 2:15 PM, Brad Anderson wrote:
> Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though.

It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
Feb 06 2014
Walter Bright:

> It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.

Perhaps in the world of today malicious attacks on the software you write should be assumed as the default situation, and then the language+library has to offer something less paranoiac on request.

That's why some languages have changed their sorting and hashing routines to make them a little slower but safer on default.

Bye,
bearophile
Feb 06 2014
On 2/6/2014 7:08 PM, bearophile wrote:
> Walter Bright:
>> It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
>
> Perhaps the world of today malicious attacks on the software you write should be assumed as the default situation, and then the language+library has to offer something less paranoiac on request. That's why some languages have changed their sorting and hashing routines to make them a little slower but safer on default.

DoS attack resistance requires faster code, not slower code.
Feb 07 2014
On Friday, 7 February 2014 at 08:30:35 UTC, Walter Bright wrote:
> On 2/6/2014 7:08 PM, bearophile wrote:
>> That's why some languages have changed their sorting and hashing routines to make them a little slower but safer on default.
>
> DoS attack resistance requires faster code, not slower code.

The specific problem was that it was possible to provoke hash collisions by sending carefully crafted input, causing the hash-tables to degrade to linked lists. The small performance penalty of using collision-resistant hashes is certainly worth it in this case.
Feb 07 2014
On 2/7/2014 6:50 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
> The specific problem was that it was possible to provoke hash collisions by sending carefully crafted input, causing the hash-tables to degrade to linked lists. The small performance penalty of using collision-resistant hashes is certainly worth it in this case.

That has nothing to do with needing exceptions in the control flow path (and the performance penalty for using exceptions in this manner is certainly not small).
Feb 08 2014
On Saturday, 8 February 2014 at 21:59:24 UTC, Walter Bright wrote:
> On 2/7/2014 6:50 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
>> The specific problem was that it was possible to provoke hash collisions by sending carefully crafted input, causing the hash-tables to degrade to linked lists. The small performance penalty of using collision-resistant hashes is certainly worth it in this case.
>
> That has nothing to do with needing exceptions in the control flow path (and the performance penalty for using exceptions in this manner is certainly not small).

Huh? I responded to this discussion:

On Friday, 7 February 2014 at 08:30:35 UTC, Walter Bright wrote:
> On 2/6/2014 7:08 PM, bearophile wrote:
>> That's why some languages have changed their sorting and hashing routines to make them a little slower but safer on default.
>
> DoS attack resistance requires faster code, not slower code.

I was merely clarifying why in this specific case making the average code path slower _did_ help DoS attack resistance.
Feb 09 2014
On Friday, 7 February 2014 at 08:30:35 UTC, Walter Bright wrote:
> On 2/6/2014 7:08 PM, bearophile wrote:
>> Walter Bright:
>>> It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
>>
>> Perhaps the world of today malicious attacks on the software you write should be assumed as the default situation, and then the language+library has to offer something less paranoiac on request. That's why some languages have changed their sorting and hashing routines to make them a little slower but safer on default.
>
> DoS attack resistance requires faster code, not slower code.

I think bearophile is referring to a practice of avoiding fast average-case, slow worst-case algorithms in favour of faster worst-cases.

If an algorithm has best-case O(n*log(n)) and worst case O(n^2), it's often not practical to build for the worst case, but anything less than that can make you vulnerable to malicious input as part of DOS. In comparison, an algorithm with O(n*log^2(n)) average and worst-case might be acceptable in the average case, but will hold up better in the face of attack.

I'm not sure how relevant the point is to the general discussion.
Feb 07 2014
John Colvin:

> I think bearophile is referring to

Yes, you have explained well my point. Thank you.

Bye,
bearophile
Feb 07 2014
07-Feb-2014 06:44, Walter Bright wrote:
> On 2/6/2014 2:15 PM, Brad Anderson wrote:
>> Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though.
>
> It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.

Meh. If exceptions are such a liability we'd better make them (much) faster.

--
Dmitry Olshansky
Feb 07 2014
On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky wrote:
> Meh. If exceptions are such a liability we'd better make them (much) faster.

It's not stack unwinding speed that's an issue here though, but rather that for client-facing services, throwing an exception when an invalid request is received gives malicious clients an opportunity to hurt service performance by flooding it with invalid requests. Improving the exception code specifically doesn't help here because the real issue is with GC collections.

I'd say that the real fix is for such services to simply not throw in this case. But the exception could always be recycled as well (since in this case you know that throwing will abort the transaction and so will always be immediately discarded).

I'm not convinced that there's any need for a language change here to support scoped exceptions. That seems a bit like killing the ant with a steamroller.
Feb 07 2014
07-Feb-2014 20:49, Sean Kelly wrote:
> On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky wrote:
>> Meh. If exceptions are such a liability we'd better make them (much) faster.
>
> It's not stack unwinding speed that's an issue here though, but rather that for client-facing services, throwing an exception when an invalid request is received gives malicious clients an opportunity to hurt service performance by flooding it with invalid requests.

Why throwing a single exception is such a big problem? Surely even C's long_jump wasn't that expensive? *Maybe* we shouldn't re-construct full stack trace on every throw?

> Improving the exception code specifically doesn't help here because the real issue is with GC collections.

Then the problem is that something so temporary as an exception is allocated on the GC heap in the first place? Let's go for something more sane and deprecate the current behavior, it's not like we are forever stuck with it.

> I'd say that the real fix is for such services to simply not throw in this case. But the exception could always be recycled as well (since in this case you know that throwing will abort the transaction and so will always be immediately discarded).

Exceptions are convenient and they make life that much easier combined with ctors/dtors and scoped lifetime. And then we say **ck it - for busy services, just use good ol':

    ...
    if (check42(...) == -1){
        call_cleanup42();
        return -1;
    }
    ...

And up the callstack we march. The moment code gets non-trivial there come exceptions and RAII to save the day, I don't see how busy REST services are unlike anything else.

> I'm not convinced that there's any need for a language change here to support scoped exceptions. That seems a bit like killing the ant with a steamroller.

Well I'm not convinced we should accept that exceptions are many times slower then error codes (with checks on every function that may fail + propagating up the stack).

--
Dmitry Olshansky
Feb 07 2014
On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky wrote:
>> I'm not convinced that there's any need for a language change here to support scoped exceptions. That seems a bit like killing the ant with a steamroller.
>
> Well I'm not convinced we should accept that exceptions are many times slower then error codes (with checks on every function that may fail + propagating up the stack).

As I have already mentioned, they don't necessarily need to be. But that may require tweaking language so that pre-allocated exception usage becomes reliable and I don't see tools right now that allow to express necessary semantics (can't store reference to instance without deep copy)
Feb 07 2014
On 2/7/2014 10:10 AM, Dicebot wrote:
> As I have already mentioned, they don't necessarily need to be. But that may require tweaking language so that pre-allocated exception usage becomes reliable and I don't see tools right now that allow to express neseccary semantics (can't store reference to instance without deep copy)

It is NOT the allocation that's the issue. C++ code has the same issue. It's the exception handling table lookup.
Feb 08 2014
On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky wrote:
> 07-Feb-2014 20:49, Sean Kelly wrote:
>> On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky wrote:
>>> Meh. If exceptions are such a liability we'd better make them (much) faster.
>>
>> It's not stack unwinding speed that's an issue here though, but rather that for client-facing services, throwing an exception when an invalid request is received gives malicious clients an opportunity to hurt service performance by flooding it with invalid requests.
>
> Why throwing a single exception is such a big problem? Surely even C's long_jump wasn't that expensive? *Maybe* we shouldn't re-construct full stack trace on every throw?

That can be turned off at run time by clearing the traceHandler. But yeah, it's the allocations that are a problem in this case, not the unwinding. And specifically, that flooding with bad requests effectively generates tons of garbage (an allocation for the exception plus another for the trace data) thus triggering frequent stop-the-world collections.

> Exceptions are convenient and they make life that much easier combined with ctors/dtors and scoped lifetime. And then we say **ck it - for busy services, just use good ol':
>
>     ...
>     if (check42(...) == -1){
>         call_cleanup42();
>         return -1;
>     }
>     ...
>
> And up the callstack we march. The moment code gets non-trivial there come exceptions and RAII to save the day, I don't see how busy REST services are unlike anything else.

I'm sure you can see how a service is different from a desktop application, right? In the latter case, there's only one user and he's interested in having his application perform well. Outside of a QA lab you won't find desktop app. users deliberately trying to break their app. Services are exactly the opposite. It's not an exaggeration when I say that the services I work on are under attack from botnets 24/7. This is a use case that must be considered as a first order of business or the entire service suffers.

>> I'm not convinced that there's any need for a language change here to support scoped exceptions. That seems a bit like killing the ant with a steamroller.
>
> Well I'm not convinced we should accept that exceptions are many times slower then error codes (with checks on every function that may fail + propagating up the stack).

Exception-oriented code is typically faster for the success case because all that return code checking can be removed. But the tradeoff is that it's slower in the failure case because stack unwinding is simply slower than checking an error code. But again, the issue here isn't the cost of stack unwinding, it's that thousands of exceptions thrown per second generates a lot of garbage, and garbage collection in D is currently fairly slow compared to, say, Java. If we could get an incremental GC for D I probably wouldn't even care, but I think that's impossible.
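
For readers wondering what "clearing the traceHandler" looks like, here is a small sketch using `core.runtime.Runtime.traceHandler`. The helper `traceCollected` is a made-up name for this example, and the sketch assumes nothing else in the process relies on `Throwable.info` being populated. Note that it only removes the trace allocation; the `new Exception` itself still goes through the GC.

```d
import core.runtime : Runtime;
import std.stdio : writeln;

// Hypothetical helper: throw once and report whether a trace was attached.
bool traceCollected()
{
    bool collected;
    try
    {
        throw new Exception("invalid request");
    }
    catch (Exception e)
    {
        collected = e.info !is null;
    }
    return collected;
}

void main()
{
    // Assumption: losing stack traces is acceptable for this process.
    Runtime.traceHandler = null;
    writeln(traceCollected() ? "trace collected" : "no trace collected");
}
```

The obvious cost is that uncaught exceptions no longer print a backtrace, so this is probably something to switch on per deployment rather than hard-code.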
Feb 07 2014
On Fri, 07 Feb 2014 10:54:37 -0800, Sean Kelly <sean invisibleduck.org> wrote:
> On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky wrote:
>> 07-Feb-2014 20:49, Sean Kelly wrote:
>>> On Friday, 7 February 2014 at 16:41:00 UTC, Dmitry Olshansky wrote:
>>>> Meh. If exceptions are such a liability we'd better make them (much) faster.
>>>
>>> It's not stack unwinding speed that's an issue here though, but rather that for client-facing services, throwing an exception when an invalid request is received gives malicious clients an opportunity to hurt service performance by flooding it with invalid requests.
>>
>> Why throwing a single exception is such a big problem? Surely even C's long_jump wasn't that expensive? *Maybe* we shouldn't re-construct full stack trace on every throw?
>
> That can be turned off at run time by clearing the traceHandler. But yeah, it's the allocations that are a problem in this case, not the unwinding. And specifically, that flooding with bad requests effectively generates tons of garbage (an allocation for the exception plus another for the trace data) thus triggering frequent stop-the-world collections.
>
>> Exceptions are convenient and they make life that much easier combined with ctors/dtors and scoped lifetime. And then we say **ck it - for busy services, just use good ol':
>>
>>     ...
>>     if (check42(...) == -1){
>>         call_cleanup42();
>>         return -1;
>>     }
>>     ...
>>
>> And up the callstack we march. The moment code gets non-trivial there come exceptions and RAII to save the day, I don't see how busy REST services are unlike anything else.
>
> I'm sure you can see how a service is different from a desktop application, right? In the latter case, there's only one user and he's interested in having his application perform well. Outside of a QA lab you won't find desktop app. users deliberately trying to break their app. Services are exactly the opposite. It's not an exaggeration when I say that the services I work on are under attack from botnets 24/7. This is a use case that must be considered as a first order of business or the entire service suffers.
>
>>> I'm not convinced that there's any need for a language change here to support scoped exceptions. That seems a bit like killing the ant with a steamroller.
>>
>> Well I'm not convinced we should accept that exceptions are many times slower then error codes (with checks on every function that may fail + propagating up the stack).
>
> Exception-oriented code is typically faster for the success case because all that return code checking can be removed. But the tradeoff is that it's slower in the failure case because stack unwinding is simply slower than checking an error code. But again, the issue here isn't the cost of stack unwinding, it's that thousands of exceptions thrown per second generates a lot of garbage, and garbage collection in D is currently fairly slow compared to, say, Java. If we could get an incremental GC for D I probably wouldn't even care, but I think that's impossible.

Technically, there is no reason that the current GC can't be made incremental, insofar as incremental means collecting only what is required to complete the allocation.

--
Adam Wilson
GitHub/IRC: LightBender
Aurora Project Coordinator
Feb 07 2014
07-Feb-2014 22:54, Sean Kelly wrote:
> On Friday, 7 February 2014 at 17:06:36 UTC, Dmitry Olshansky wrote:
>>> It's not stack unwinding speed that's an issue here though, but rather that for client-facing services, throwing an exception when an invalid request is received gives malicious clients an opportunity to hurt service performance by flooding it with invalid requests.
>>
>> Why throwing a single exception is such a big problem? Surely even C's long_jump wasn't that expensive? *Maybe* we shouldn't re-construct full stack trace on every throw?
>
> That can be turned off at run time by clearing the traceHandler.

Which should be somehow prominently advertised for release builds. Last time I checked not making it null made exceptions ridiculously slow.

> But yeah, it's the allocations that are a problem in this case, not the unwinding. And specifically, that flooding with bad requests effectively generates tons of garbage (an allocation for the exception plus another for the trace data) thus triggering frequent stop-the-world collections.

So again - the problem is allocations on GC heap. Then let's please not worry about tiny gains of avoiding stack unwind, that is well understood.

And I see no reason for allocating exceptions on GC (and none presented so far). The main use case of exception is to consume exception on catch or forward it down the line. Storing a reference to an exception elsewhere is rare case.

I could see the whole situation with exceptions in D as "we copied this shit from Java, no idea why". Java at least does go to great lengths to make them fast (by caching them behind the scenes and whatnot).

>> Exceptions are convenient and they make life that much easier combined with ctors/dtors and scoped lifetime. And then we say **ck it - for busy services, just use good ol':
>>
>>     ...
>>     if (check42(...) == -1){
>>         call_cleanup42();
>>         return -1;
>>     }
>>     ...
>>
>> And up the callstack we march. The moment code gets non-trivial there come exceptions and RAII to save the day, I don't see how busy REST services are unlike anything else.
>
> I'm sure you can see how a service is different from a desktop application, right?

Aye, in fact I haven't written much in the way of desktop apps.

> In the latter case, there's only one user and he's interested in having his application perform well. Outside of a QA lab you won't find desktop app. users deliberately trying to break their app. Services are exactly the opposite. It's not an exaggeration when I say that the services I work on are under attack from botnets 24/7. This is a use case that must be considered as a first order of business or the entire service suffers.

I bet some sanity checks on the level of protocol handling is more then enough. Yeah these might be faster then unwinding due to sheer volume of bad data, but it's a fraction of code albeit a critical fraction. I was thinking about the service logic on top of that.

>>> I'm not convinced that there's any need for a language change here to support scoped exceptions. That seems a bit like killing the ant with a steamroller.
>>
>> Well I'm not convinced we should accept that exceptions are many times slower then error codes (with checks on every function that may fail + propagating up the stack).
>
> Exception-oriented code is typically faster for the success case because all that return code checking can be removed. But the tradeoff is that it's slower in the failure case because stack unwinding is simply slower than checking an error code.

Duly noted. Just stating the obvious - in the majority of cases we talk about 1 unwind vs 10s of checks. The difference isn't THAT big anyway, the only advantage of codes checking is being able to fail faster on some _early_ bad condition.

> But again, the issue here isn't the cost of stack unwinding, it's that thousands of exceptions thrown per second generates a lot of garbage, and garbage collection in D is currently fairly slow compared to, say, Java.

Let's stop bashing GC here. This part of design of exceptions in D is just backwards (penalizes usual case) - time to fix it?

--
Dmitry Olshansky
Feb 07 2014
On 2/7/2014 10:54 AM, Sean Kelly wrote:
> But yeah, it's the allocations that are a problem in this case,

Code can always pre-allocate the exception that is thrown. There's no reason whatsoever that allocation is required at the throw point, nor is there any reason the thrown exception has to be newly allocated each time.

And, as such, this is entirely a coding issue, not a language or runtime one.
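
A minimal runnable sketch of the pre-allocation approach, for reference. `BadRequest`, `badRequest` and `handle` are hypothetical names invented for this example; the single instance is created per thread in a module constructor and the same object is thrown every time. Whether the runtime attaches a stack trace on the first throw is a separate matter that this sketch does not address.

```d
import std.stdio : writeln;

// Hypothetical exception type for rejected input.
class BadRequest : Exception
{
    this() { super("bad request"); }
}

BadRequest badRequest;   // module-level, so thread-local by default

static this()
{
    badRequest = new BadRequest;   // the only 'new', done once per thread
}

// Hypothetical request handler: rejects empty input.
void handle(const(char)[] request)
{
    if (request.length == 0)
        throw badRequest;          // no 'new' at the throw point
}

void main()
{
    foreach (i; 0 .. 3)
    {
        try
        {
            handle("");
        }
        catch (BadRequest e)
        {
            writeln(e.msg);
        }
    }
}
```

This only works cleanly when the handler consumes the exception right away; storing the reference or chaining it runs into the overwrite problem raised earlier in the thread.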
Feb 08 2014
On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
> Why throwing a single exception is such a big problem?

Because in order to unwind the stack, you need to find the information about the stack layout. This lookup is rather slow. You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff.
Feb 08 2014
09-Feb-2014 02:03, Walter Bright wrote:
> On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
>> Why throwing a single exception is such a big problem?
>
> Because in order to unwind the stack, you need to find the information about the stack layout. This lookup is rather slow. You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff.

A special table lookup can't be slow compared to writing a dummy HTTP 500 response. Just saying. Yes, it's a tad slower then cmp + jz, I do understand that.

Again I'm trying to say that framing stack unwinding as the culprit of vibe.d crawling under bad requests is plain wrong, and that was the focal point of the original argument.

--
Dmitry Olshansky
Feb 08 2014
On Saturday, 8 February 2014 at 22:11:13 UTC, Dmitry Olshansky wrote:
> Again I'm trying to say that framing stack unwinding as the culprit of vibe.d crawling under bad requests is plain wrong, and that was the focal point of the original argument.

Can you see if it is better with this little patch?

https://github.com/D-Programming-Language/druntime/pull/717

on a simple test, I got a 20x speedup on most exceptions by lazy generating the stack trace upon request in toString (though if you are printing it anyway you won't see a difference)
Feb 08 2014
On 2/8/2014 2:11 PM, Dmitry Olshansky wrote:
> 09-Feb-2014 02:03, Walter Bright wrote:
>> On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
>>> Why throwing a single exception is such a big problem?
>>
>> Because in order to unwind the stack, you need to find the information about the stack layout. This lookup is rather slow. You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff.
>
> A special table lookup can't be slow compared to writing a dummy HTTP 500 response. Just saying. Yes, it's a tad slower then cmp + jz, I do understand that. Again I'm trying to say that framing stack unwinding as the culprit of vibe.d crawling under bad requests is plain wrong, and that was the focal point of the original argument.

I don't know how vibe.d works, but my point is using exception handling to implement normal control flow is bad design and it is going to be slow and the reason it is slow is because of the table lookup and unwinding cost, and that is not going to be fixed.
Feb 08 2014
On Saturday, February 08, 2014 21:21:40 Walter Bright wrote:
> On 2/8/2014 2:11 PM, Dmitry Olshansky wrote:
>> 09-Feb-2014 02:03, Walter Bright wrote:
>>> On 2/7/2014 9:06 AM, Dmitry Olshansky wrote:
>>>> Why throwing a single exception is such a big problem?
>>>
>>> Because in order to unwind the stack, you need to find the information about the stack layout. This lookup is rather slow. You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff.
>>
>> A special table lookup can't be slow compared to writing a dummy HTTP 500 response. Just saying. Yes, it's a tad slower then cmp + jz, I do understand that. Again I'm trying to say that framing stack unwinding as the culprit of vibe.d crawling under bad requests is plain wrong, and that was the focal point of the original argument.
>
> I don't know how vibe.d works, but my point is using exception handling to implement normal control flow is bad design and it is going to be slow and the reason it is slow is because of the table lookup and unwinding cost, and that is not going to be fixed.

I wouldn't have considered throwing on an HTTP error to be "flow control." That's normal error handling, and throwing on HTTP errors is exactly what I would have done. It generally makes code a _lot_ cleaner that way, because you don't have to constantly check return codes for errors, and it's using exceptions for exactly what they're there for - reporting and handling errors. You don't want to use exceptions for stuff other than error reporting, and you don't want to use them in situations where the error case is the frequent case, but that shouldn't be the case for HTTP.

Exceptions _will_ be slower than other code paths, and you don't want them to be the normal code path. Nothing is going to make exceptions as fast as the normal code paths either. However, D's exceptions are painfully slow - far slower than is reasonable - whether that's because of allocating the exception or unwinding the stack or creating the string for the stack trace or whatever is a matter for investigation, and I'm not about to claim that I know where the bottlenecks are. Fortunately, it looks like Adam Ruppe has found some ways to speed up exceptions:

https://github.com/D-Programming-Language/druntime/pull/717

And there may be other improvements that we can implement as well. I agree that there's a limit to how much we can speed up exceptions, but right now, at minimum, we're getting creamed by Java in terms of speed:

https://d.puremagic.com/issues/show_bug.cgi?id=9584

- Jonathan M Davis
Feb 08 2014
On Sunday, 9 February 2014 at 05:57:44 UTC, Jonathan M Davis wrote: Exceptions _will_ be slower than other code paths, and you don't want them to be the normal code path. Nothing is going to make exceptions as fast as the normal code paths either. However, D's exceptions are painfully Just to be pedantic: this is not true. If you have frame-based exception meta-info recording, then a throw out of recursion (without try-blocks in the recursion) will be faster than normal returns. You unwind down to the try-block by loading a register and doing a single JMP. All you have to do is maintain a singly linked list of stack frames that can catch. AFAIK the overhead is negligible if you avoid try-blocks in light-weight function calls. You store one pointer per catching stack frame. That alone is a good enough reason to realize that the exception handling strategy should be a compiler switch, not a language policy, because performance depends on what kind of code patterns you have and on the architecture. On the current generation of x86 CPUs, the decoding of instructions into micro-ops and the pipelining ought to be heavy enough that simple BRA instructions "disappear". Thus the offset strategy ought to work well too (injecting data into the code stream near the return point and branching over it if necessary, but usually not).
Feb 09 2014
And with profiling you get the call frequency between functions, so a throw could be replaced with: if (return_address == 0x1234556){...} // 60% if (return_address == 0x7899324){...} // 30% slow_unwinding() That ought to be obvious.
Feb 09 2014
On Saturday, 8 February 2014 at 22:03:13 UTC, Walter Bright wrote:You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff."Compromising"? You mean they had to modify codegen, which they didn't want to. Clearly, if you know the return address you also could have stack info access close to it (at a fixed offset), at no runtime cost whatsoever.
Feb 08 2014
But the C++ DWARF way of doing it was developed for Itanium, which was targeting HPC, for which you probably don't need exceptions all that often. So it made sense in that context. For regular applications it makes no sense, and with whole-program analysis (or a language-level linker) you can probably often get a good match at the throw site.
Feb 08 2014
AND (this just has to be said) if D is really meant to be a SAFE programming language then the language should NOT encourage programmers to a coding style where you can fail to test for errors. The obvious solution is to ensure that you cannot ignore errors unless you are explicit about it. Exceptions ensure that. Having 3 different ways of returning errors is not a good strategy for safe and bug free programming. Ah, I just had to say it... ;)
Feb 08 2014
On 2/8/2014 2:59 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>" wrote:On Saturday, 8 February 2014 at 22:03:13 UTC, Walter Bright wrote:Ola, I've done it both ways, I actually do know what I'm talking about. I've sometimes been proven wrong here, so you're welcome to do a pull request proving so.You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff."Compromising"? You mean they had to modify codegen, which they didn't want to. Clearly, if you know the return address you also could have stack info access close to it (at a fixed offset), at no runtime cost whatsoever.
Feb 08 2014
On Sat, 08 Feb 2014 21:29:27 -0800, Walter Bright <newshound2 digitalmars.com> wrote: On 2/8/2014 2:59 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote: On Saturday, 8 February 2014 at 22:03:13 UTC, Walter Bright wrote: You can make the lookup faster by compromising the function code generation, but this is considered an unacceptable tradeoff. "Compromising"? You mean they had to modify codegen, which they didn't want to. Clearly, if you know the return address you also could have stack info access close to it (at a fixed offset), at no runtime cost whatsoever. Ola, I've done it both ways, I actually do know what I'm talking about. I've sometimes been proven wrong here, so you're welcome to do a pull request proving so. It is not the function code gen that needs to be improved on Linux, Walter. In fact that would be premature optimization considering that the *construction* of exceptions outweighs unwinding costs for functions with no local variables by multiple orders of magnitude. -- Marco
Feb 08 2014
On Sunday, 9 February 2014 at 05:29:25 UTC, Walter Bright wrote: Ola, I've done it both ways, I actually do know what I'm talking about. Please note that "you" and "they" were meant as "one" or "the C++ community", not personally. It was not ad hominem. So no reason to be defensive about it. I am grateful if you can point out where my reasoning fails, then I learn something new. Maybe you could explain why a single occasional Branch Always over the unwind pointer would be slow. Clearly the offset should be empirically based (so that you usually can avoid the goto), maybe even set to a separate cache line for some CPUs, and you could fill out the gaps with other data you need there. It's not like I have run an i7 on VTune, so I could be wrong, but I don't see why… And I also think that if you have a CPU with a sufficient number of callee-save registers you can carry along a pointer to the last try-block stack frame with not much penalty. After all you only have to restore it if the function ruined it and before calling new functions that are not inlined and not nothrow, and you could stick it into a thread-local global too where it matters. On 32-bit x86 it probably is quite expensive though. In code where I write try blocks they tend to stay in the "main logic function", and this code is so heavy that adding the stack frame to a linked list (of stack frames) is a negligible cost. One really needs to be careful when doing performance tests of exception handling, because it is easy to construct "theoretical" code. Programmers should write exception handlers with the implementation in mind, so using existing programs as a baseline is not a good solution either. I've sometimes been proven wrong here, so you're welcome to do a pull request proving so. You know very well that I am not going to rewrite codegen for DMD. Adding this feature will complicate codegen and you need to understand the code generator well to do the modification.
Besides, I am not sure if a system level language should have exceptions at all or that I would use them when doing the kind of stuff I like to use D for. :-P ;-) I like to use exception handling in application-level code, but not in code for audio/simulations/buffer-streaming/low-level-stuff.
Feb 09 2014
This is a pretty nice description of the i7 pipeline by Hennessy and Patterson: https://www.inkling.com/read/computer-architecture-hennessy-5th/chapter-3/section-3-13#0113e87a6dc141d7abda84b497128d61 Notice the 28 micro-op buffer before execution. I'd expect a short predicted branch to not cause a big bubble, but I don't know for sure.
Feb 09 2014
On Friday, February 07, 2014 20:40:57 Dmitry Olshansky wrote: 07-Feb-2014 06:44, Walter Bright wrote: Related: http://d.puremagic.com/issues/show_bug.cgi?id=9584 The DoS aspect of exceptions is not something that I've ever thought about or seen discussed before, but one area where I've found the slowness of D's exceptions to be a real pain is in unit tests. I like to test failure cases as well as successful ones, and if you do much of that, your unit tests start taking a long time due to how insanely slow exceptions are in D. So, while in some situations, the solution may be to not use exceptions (or to use them less), I think that we really need to look at doing what's necessary to make exceptions a lot faster - be it to more efficiently deal with stack traces or to avoid allocating them or whatever else we can come up with to make them fast. I think that the approach of assuming that exceptions don't need to be fast, because they're used for error conditions, is a bad one. They're not as performance critical as normal code, but their speed still very much matters. - Jonathan M Davis On 2/6/2014 2:15 PM, Brad Anderson wrote: Meh. If exceptions are such a liability we'd better make them (much) faster. Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though. It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
Feb 07 2014
On Friday, 7 February 2014 at 19:54:14 UTC, Jonathan M Davis wrote: They're not as performance critical as normal code, but their speed still very much matters. Well, it is at least more difficult to write reliable code when you have to try to avoid them. Still, for a web service you should probably not have to deal with more than 1000 exceptions per second on average. Assume 1 GHz; then that is like 1,000,000 cycles of running code per stack unwinding. If you sacrifice 10% of that for exception handling, that means you have 100,000 cycles to unwind the stack. If the unwound stack is 5 frames deep, you have 20,000 cycles per stack frame. If that is not possible, something should be done with the release version of the runtime. For a web server you could of course tie the request handler directly to the request object and instantiate different ones for each request type, then have all "unwinding" in the object itself. Quirky, but workable.
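The back-of-the-envelope budget in the post above can be checked mechanically. The sketch below only restates the post's own assumptions (1 GHz, 1000 throws per second, a 10% share, 5 frames); none of the numbers are measurements, and the constant names are made up for illustration.

```d
import std.stdio : writefln;

// All inputs are the post's assumptions, not measurements.
enum ulong clockHz         = 1_000_000_000; // assumed 1 GHz core
enum ulong throwsPerSecond = 1_000;         // assumed upper bound on throws per second
enum ulong ehSharePercent  = 10;            // share of the cycles granted to unwinding
enum ulong framesUnwound   = 5;             // assumed depth of the unwound stack

enum ulong cyclesPerThrow = clockHz / throwsPerSecond;
enum ulong unwindBudget   = cyclesPerThrow * ehSharePercent / 100;
enum ulong cyclesPerFrame = unwindBudget / framesUnwound;

// Compile-time checks against the figures quoted in the post.
static assert(cyclesPerThrow == 1_000_000);
static assert(unwindBudget == 100_000);
static assert(cyclesPerFrame == 20_000);

void main()
{
    writefln("%s cycles per throw, %s for unwinding, %s per frame",
             cyclesPerThrow, unwindBudget, cyclesPerFrame);
}
```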
Feb 07 2014
On 2/7/2014 11:53 AM, Jonathan M Davis wrote:or to avoid allocating themGrep for 'throw' in std.datetime shows that every throw is actually: throw new ... and an example: throw new DateTimeException("SYSTEMTIME cannot hold dates prior to the year 1601."); There is no requirement that the new is done there. You can preallocate the DateTimeException statically, and simply keep rethrowing the same exception instance. I.e. the allocation issue is a coding style issue, not a language problem.
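A minimal sketch of the preallocation idea described above, assuming a recent D compiler. `RangeException`, `preallocated` and `checkYear` are hypothetical names used for illustration; this is not how std.datetime is written. Each thread allocates one instance up front, and the throw site reuses it instead of calling `new`.

```d
import std.stdio : writeln;

// Hypothetical exception type standing in for DateTimeException.
class RangeException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Module-level variables are thread-local in D, so every thread gets its own instance.
RangeException preallocated;

// Thread-local module constructor: allocates once per thread, not once per throw.
static this()
{
    preallocated = new RangeException(
        "SYSTEMTIME cannot hold dates prior to the year 1601.");
}

void checkYear(int year)
{
    if (year < 1601)
        throw preallocated; // no allocation at the throw site
}

void main()
{
    Throwable first, second;
    try { checkYear(1500); } catch (RangeException e) { first = e; }
    try { checkYear(1400); } catch (RangeException e) { second = e; }

    // Both catches observed the very same object.
    assert(first !is null && first is second);
    writeln(first.msg);
}
```

The trade-off is the one raised elsewhere in the thread: any code that keeps a reference to a caught exception now sees it change when it is thrown again, and the file, line and stack trace no longer identify the individual throw site.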
Feb 08 2014
On Saturday, February 08, 2014 14:13:04 Walter Bright wrote:On 2/7/2014 11:53 AM, Jonathan M Davis wrote:Of course allocation is not a language issue. The question is whether (and how) we can change our approach to allocating exceptions in order to reduce their cost. And that's a change in how we approach them, not a change in the language itself. It might require some changes in druntime to better deal with other allocation schemes (particularly with how that affects exception chaining), but it's not a language issue. And in general, I would expect that any speed-ups that we could attain with regards to actually throwing an exception would be in druntime's implementation rather than anything in the language itself. Any improvements there could then be combined with any improvements we could make to our approach to allocating exceptions (and for better or worse - probably worse - the normal approach at this point is to allocate a new exception when throwing). - Jonathan M Davisor to avoid allocating themGrep for 'throw' in std.datetime shows that every throw is actually: throw new ... and an example: throw new DateTimeException("SYSTEMTIME cannot hold dates prior to the year 1601."); There is no requirement that the new is done there. You can preallocate the DateTimeException statically, and simply keep rethrowing the same exception instance. I.e. the allocation issue is a coding style issue, not a language problem.
Feb 08 2014
On 2/7/14, 8:40 AM, Dmitry Olshansky wrote: 07-Feb-2014 06:44, Walter Bright wrote: One simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception). Andrei On 2/6/2014 2:15 PM, Brad Anderson wrote: Meh. If exceptions are such a liability we'd better make them (much) faster. Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though. It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
Feb 07 2014
On Friday, February 07, 2014 16:49:45 Andrei Alexandrescu wrote: On 2/7/14, 8:40 AM, Dmitry Olshansky wrote: As long as exceptions are cloneable, and people are aware of the fact that they tend to be non-unique, then it can be common practice to clone/dup an exception when you need to keep it around. However, the two potential problems with this overall approach are: 1. Do we just always allocate one of each exception type per thread (probably in a static constructor for that exception type)? That would result in a fair number of exceptions being allocated up front. The obvious alternative would be to allocate it the first time that it's thrown so that you only end up with exceptions that get used being allocated, but regardless, we need to take a close look at the allocation scheme. 2. This sort of thing has a definite impact on enforce and any idioms related to it. We'd need to either adjust enforce, enforceEx, etc. to avoid the allocation, or we'd need to introduce alternatives to them that expect something like a static opCall on the exception type which returns the common exception for that type or some other standard means of getting at the reusable exception. Regardless, we need to agree upon a standard way to define exception types along with some set of standard idioms for handling them such that we can deal with exceptions generically (particularly with regards to stuff like enforce) rather than it being an ad-hoc per-exception-type thing that you can't reasonably rely on. - Jonathan M Davis 07-Feb-2014 06:44, Walter Bright wrote: One simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception). On 2/6/2014 2:15 PM, Brad Anderson wrote: Meh.
If exceptions are such a liability we'd better make them (much) faster.Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though.It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
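The "static opCall" idea from the post above could look roughly like the following. This is a sketch under stated assumptions, not an agreed Phobos convention: `ParseException`, its `instance` field and `parse` are hypothetical, and the instance is allocated lazily on first use, per thread. It relies on the existing `enforce(value, lazy Throwable)` overload in `std.exception`, so the exception expression is only evaluated on failure.

```d
import std.exception : enforce;

// Hypothetical exception type with a reusable per-thread instance.
class ParseException : Exception
{
    private static ParseException instance; // thread-local by default in D

    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }

    // `ParseException(msg)` without `new` returns the shared instance,
    // allocating it only the first time it is needed on this thread.
    static ParseException opCall(string msg,
                                 string file = __FILE__, size_t line = __LINE__)
    {
        if (instance is null)
        {
            instance = new ParseException(msg, file, line);
        }
        else
        {
            // Refresh the fields so the instance describes the current throw site.
            instance.msg = msg;
            instance.file = file;
            instance.line = line;
            instance.next = null;
        }
        return instance;
    }
}

void parse(string input)
{
    // The second argument is lazy: nothing is evaluated when the check passes.
    enforce(input.length > 0, ParseException("empty input"));
}

void main()
{
    Exception a, b;
    try { parse(""); } catch (ParseException e) { a = e; }
    try { parse(""); } catch (ParseException e) { b = e; }
    assert(a !is null && a is b); // same object both times
    parse("ok");                  // does not throw
}
```

A caller that needs to keep a caught exception around would have to copy it first, which is the clone/dup practice the post mentions.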
Feb 07 2014
On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu wrote:One simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception). AndreiI don't think it's that simple. What happens if an XException causes another XException and they need to be chained together?
Feb 08 2014
08-Feb-2014 15:02, Jakob Ovrum wrote: On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu wrote: If both are thread-local and cached I see no problem whatsoever. The thing is the current "default" of creating exceptions is AWFUL. And D stands for sane defaults and the simple path being good last time I checked. -- Dmitry Olshansky One simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception). Andrei I don't think it's that simple. What happens if an XException causes another XException and they need to be chained together?
Feb 08 2014
On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky wrote:>If both are thread-local and cached I see no problem whatsoever. The thing is the current "default" of creating exception is AWFUL. And D stands for sane defaults and the simple path being good last time I checked.How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
Feb 08 2014
On Saturday, February 08, 2014 11:17:25 Jakob Ovrum wrote:On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky wrote:>Then we have multiple of them, or we new up another one when a second one is needed. Even if it were only the first exception which avoided the allocation, it would be a big gain, and in most cases, you're only going to get a single exception, or the exceptions will be of different types. - Jonathan M DavisIf both are thread-local and cached I see no problem whatsoever. The thing is the current "default" of creating exception is AWFUL. And D stands for sane defaults and the simple path being good last time I checked.How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
Feb 08 2014
On Saturday, 8 February 2014 at 11:27:27 UTC, Jonathan M Davis wrote:On Saturday, February 08, 2014 11:17:25 Jakob Ovrum wrote:Yes, I'm sure there is a cool solution, I'm just pointing out that it's not as simple as statically allocating. I think it would be a nice exercise to compose such a solution with std.allocator.On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky wrote:>Then we have multiple of them, or we new up another one when a second one is needed. Even if it were only the first exception which avoided the allocation, it would be a big gain, and in most cases, you're only going to get a single exception, or the exceptions will be of different types. - Jonathan M DavisIf both are thread-local and cached I see no problem whatsoever. The thing is the current "default" of creating exception is AWFUL. And D stands for sane defaults and the simple path being good last time I checked.How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
Feb 08 2014
On Sat, 08 Feb 2014 11:33:51 +0000, "Jakob Ovrum" <jakobovrum gmail.com> wrote: On Saturday, 8 February 2014 at 11:27:27 UTC, Jonathan M Davis wrote: Yes, it doesn't seem feasible otherwise. Since you can call functions recursively you could potentially chain exceptions from the same line of code several times. catch (Exception e) { staticException.line = __LINE__; staticException.file = __FILE__; staticException.next = e; // e.next is staticException throw staticException; } You'd have to flag staticException as "in use" and spawn a new instance every time you need another one of the same type. Since there is no way to reset that flag automatically when the last user goes out of scope (i.e. ref counting), that's not even an option. Preallocated exceptions only work if you are confident your exception won't be recursively thrown and thereby chained to itself. Granted, that covers the majority of code, but it is really too much cognitive load when writing exception handling code. -- Marco On Saturday, February 08, 2014 11:17:25 Jakob Ovrum wrote: Yes, I'm sure there is a cool solution, I'm just pointing out that it's not as simple as statically allocating. I think it would be a nice exercise to compose such a solution with std.allocator. On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky wrote: Then we have multiple of them, or we new up another one when a second one is needed. Even if it were only the first exception which avoided the allocation, it would be a big gain, and in most cases, you're only going to get a single exception, or the exceptions will be of different types. - Jonathan M Davis If both are thread-local and cached I see no problem whatsoever. The thing is the current "default" of creating exception is AWFUL. And D stands for sane defaults and the simple path being good last time I checked. How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
Feb 08 2014
On Sunday, 9 February 2014 at 04:38:23 UTC, Marco Leise wrote:Yes, it doesn't seem feasible otherwise. Since you can call functions recursively you could potentially chain exceptions from the same line of code several times. catch (Exception e) { staticException.line = __LINE__; staticException.file = __FILE__; staticException.next = e; // e.next is staticException throw staticException; } You'd have to flag staticException as "in use" and spawn a new instance every time you need another one of the same type. Since there is no way to reset that flag automatically when the last user goes out of scope (i.e. ref counting), that's not even an option. Preallocated exceptions only work if you are confident your exception wont be recursively thrown and thereby chained to itself. Granted, the majority of code, but really too much cognitive load when writing exception handling code.While writes directly to line and file and such can't be prevented, `next` could be implemented as a property that does the conditional .dup when assigned to itself (or throw an Error).
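The self-chaining hazard and the "dup when it would link to itself" remedy discussed above can be sketched as a small helper. Everything here is illustrative: `XException`, `cached`, `inChain` and `wrapInX` are hypothetical names, and the helper simply allocates a fresh instance rather than implementing a real `.dup`. It assumes a D runtime where `Throwable.next` can be read and assigned.

```d
// Hypothetical exception type with one cached instance per thread.
class XException : Exception
{
    this(string msg) { super(msg); }
}

XException cached; // thread-local

// True if `needle` already appears somewhere in the chain starting at `head`.
bool inChain(Throwable head, Throwable needle)
{
    for (Throwable t = head; t !is null; t = t.next)
        if (t is needle)
            return true;
    return false;
}

// Wraps `cause` in an XException. The cached instance is reused unless it is
// already part of the chain, in which case reusing it would create a cycle.
XException wrapInX(Throwable cause, string msg)
{
    if (cached is null)
        cached = new XException(msg);
    XException ex = inChain(cause, cached) ? new XException(msg) : cached;
    ex.msg = msg;
    ex.next = cause;
    return ex;
}

size_t chainLength(Throwable head)
{
    size_t n = 0;
    for (Throwable t = head; t !is null; t = t.next)
        ++n;
    return n;
}

void main()
{
    auto root   = new Exception("root cause");
    auto first  = wrapInX(root, "first wrap");   // reuses the cached instance
    auto second = wrapInX(first, "second wrap"); // cached is in the chain: fresh one

    assert(first is cached);
    assert(second !is cached);
    assert(chainLength(second) == 3); // terminates, so the chain has no cycle
}
```

The chain walk costs time proportional to the chain length on every wrap, which is usually short; it does not address the separate concern that a previously escaped reference to `cached` is silently overwritten.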
Feb 09 2014
On Saturday, 8 February 2014 at 11:17:26 UTC, Jakob Ovrum wrote:On Saturday, 8 February 2014 at 11:05:38 UTC, Dmitry Olshansky wrote:>It's supposedly one exception instance per place where it can be thrown, not per exception type. Then the problem would be restricted to recursive calls, where in the exception handler for XException, another XException is thrown.If both are thread-local and cached I see no problem whatsoever. The thing is the current "default" of creating exception is AWFUL. And D stands for sane defaults and the simple path being good last time I checked.How is it not a problem? XException's fields (message, location etc) would be overwritten by the latest throw site, and its `next` field would point to itself.
Feb 09 2014
On 2/8/14, 3:02 AM, Jakob Ovrum wrote:On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu wrote:The chaining method detects that and .dup's one of them. AndreiOne simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception). AndreiI don't think it's that simple. What happens if an XException causes another XException and they need to be chained together?
Feb 08 2014
On Saturday, 8 February 2014 at 16:50:53 UTC, Andrei Alexandrescu wrote: The chaining method detects that and .dup's one of them. Andrei After some thinking I don't think it actually helps - the exception will be modified _before_ throwing in library code, so cloning will be too late. But I don't see any reason why basic exception instances in Phobos can't be made immutable.
Feb 08 2014
On Saturday, 8 February 2014 at 16:50:53 UTC, Andrei Alexandrescu wrote:On 2/8/14, 3:02 AM, Jakob Ovrum wrote:What if the statically allocated XException is escaped to be inspected later, but before that is thrown again in a separate exception chain? I suppose it would be no different from the current situation, as it's legal to throw exceptions allocated in any fashion, so there is already no guarantee of uniqueness. It's probable that some code out there still takes exception uniqueness for granted, so changing the allocation scheme would be a (typically silent) breaking change, even if the code is arguably broken in the first place. I suppose we could make that breakage a compile error by making exceptions implicitly `scope` at the catch-site, but that would of course be a much more involved change... Personally I still like the idea, but if implemented, I think something should be done about the change in uniqueness at the same time, even if it's just an added note in the language documentation on exceptions.On Saturday, 8 February 2014 at 00:49:46 UTC, Andrei Alexandrescu wrote:The chaining method detects that and .dup's one of them. AndreiOne simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception). AndreiI don't think it's that simple. What happens if an XException causes another XException and they need to be chained together?
Feb 08 2014
On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:Meh. If exceptions are such a liability we'd better make them (much) faster.They can be made faster by slowing down non-exception code. This has been debated at length in the C++ community, and the generally accepted answer is that non-exception code performance is preferred and exception performance is thrown under the bus in order to achieve it. I think it's quite a reasonable conclusion.
Feb 08 2014
On Sat, 08 Feb 2014 14:01:12 -0800, Walter Bright <newshound2 digitalmars.com> wrote: On 2/7/2014 8:40 AM, Dmitry Olshansky wrote: https://yourlogicalfallacyis.com/black-or-white The reasons for slow exceptions in D could be the generation of stack trace strings or the garbage collector instead of inherent trade-offs to keep the successful code path fast. And static allocation isn't exactly an appealing option... throw staticException ? staticException : (staticException = new SomethingException("Don't do this at home kids!")); and practically out of the question when you need to chain exceptions and your call stack could contain this line of code more than once, resulting in infinite loops in exception chains as a new bug type in D, that is fixed by writing: catch (Exception e) { throw (staticException ? (e.linksTo(staticException) ? staticException.dupThenWrap(e) : staticException) : (staticException = new SomethingException("Don't do this at home kids!"))); } -- Marco Meh. If exceptions are such a liability we'd better make them (much) faster. They can be made faster by slowing down non-exception code. This has been debated at length in the C++ community, and the generally accepted answer is that non-exception code performance is preferred and exception performance is thrown under the bus in order to achieve it. I think it's quite a reasonable conclusion.
Feb 08 2014
On 2/8/2014 9:00 PM, Marco Leise wrote:The reasons for slow exceptions in D could be the generation of stack trace strings or the garbage collector instead of inherent trade offs to keep the successful code path fast.Sigh, once again, 1. It is not the collector 2. I've implemented it both ways, I know what I'm talking about. You can see the fast exception way in the Win32 code generation, and the slow way in the Linux code generation.
Feb 08 2014
On Sat, 08 Feb 2014 14:01:12 -0800, Walter Bright <newshound2 digitalmars.com> wrote: On 2/7/2014 8:40 AM, Dmitry Olshansky wrote: On Sat, 08 Feb 2014 21:31:53 -0800, Walter Bright <newshound2 digitalmars.com> wrote: Meh. If exceptions are such a liability we'd better make them (much) faster. They can be made faster by slowing down non-exception code. This has been debated at length in the C++ community, and the generally accepted answer is that non-exception code performance is preferred and exception performance is thrown under the bus in order to achieve it. I think it's quite a reasonable conclusion. On 2/8/2014 9:00 PM, Marco Leise wrote: Ok, I'm on Linux, which should be inherently slower at throwing exceptions as you say. So I've written a little test and it shows two things: 1. You are right about the collector. It is not the bottleneck. 2. It doesn't have anything to do with trading speed for the successful code path either. I called two functions recursively until a nesting depth of 1000. The first version allocates a new exception, the second one reuses an existing exception. At the call site I caught the exception. I did this 10_000 times in a loop. [The code is attached.] Even at this nesting depth the second version still outperformed the first one by a factor of ~200(!) and all the CPU time (>98%) was spent somewhere in libc. Using static exceptions (or similarly in C++: throwing literal strings) is VERY fast in D already and I see no reason to improve that at the moment. So I repeat my point: The reasons for slow exceptions in D could be the generation of stack trace strings or anything else other than some inherent trade-offs to keep the successful code path fast. -- Marco The reasons for slow exceptions in D could be the generation of stack trace strings or the garbage collector instead of inherent trade offs to keep the successful code path fast. Sigh, once again, 1. It is not the collector 2. 
I've implemented it both ways, I know what I'm talking about. You can see the fast exception way in the Win32 code generation, and the slow way in the Linux code generation.
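The attachment mentioned in the post above is not part of this archive. The following is an independent sketch of the kind of test it describes (recursion to depth 1000, one variant allocating a new exception and one reusing an existing instance, 10_000 rounds each); it is not Marco's code, the function names are made up, and it uses `std.datetime.stopwatch` from current Phobos rather than the 2014 API. Timings depend heavily on platform, compiler flags and runtime version, so no expected numbers are given.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

enum depth = 1_000; // nesting depth used in the post

Exception reusable; // thread-local instance for the "reuse" variant

// Variant 1: allocate a fresh exception at the bottom of the recursion.
void throwNew(int n)
{
    if (n == 0)
        throw new Exception("fresh");
    throwNew(n - 1);
}

// Variant 2: throw the same preallocated instance every time.
void throwReused(int n)
{
    if (n == 0)
        throw reusable;
    throwReused(n - 1);
}

// Runs `f` for the given number of rounds, catching at the call site,
// and returns the elapsed time in milliseconds.
long timeIt(void function(int) f, int rounds)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. rounds)
    {
        try { f(depth); }
        catch (Exception e) { /* caught at the call site, as in the post */ }
    }
    return sw.peek.total!"msecs";
}

void main()
{
    reusable = new Exception("reused");
    enum rounds = 10_000; // number of rounds used in the post

    writefln("new exception each time: %s ms", timeIt(&throwNew, rounds));
    writefln("reused exception:        %s ms", timeIt(&throwReused, rounds));
}
```

An optimizing build may turn the tail recursion into a loop, which would remove the frames being unwound; building without optimization, or adding work after the recursive call, keeps the test closer to what the post describes.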
Feb 08 2014
On Sunday, 9 February 2014 at 05:00:15 UTC, Marco Leise wrote: And static allocation isn't exactly an appealing option... throw staticException ? staticException : (staticException = new SomethingException("Don't do this at home kids!")); and practically out of the question when you need to chain exceptions and your call stack could contain this line of code more than once, resulting in infinite loops in exception chains as a new bug type in D, that is fixed by writing: catch (Exception e) { throw (staticException ? (e.linksTo(staticException) ? staticException.dupThenWrap(e) : staticException) : (staticException = new SomethingException("Don't do this at home kids!"))); } This doesn't seem like a valid concern. Nothing stops you from using a (standard) function to do that ugly boilerplate.
Feb 09 2014
On Sunday, 9 February 2014 at 05:00:15 UTC, Marco Leise wrote:https://yourlogicalfallacyis.com/black-or-whiteOff topic, but that is a fantastic web site. I wish I had known about it before.
Feb 09 2014
On 2/8/14, 9:00 PM, Marco Leise wrote: On Sat, 08 Feb 2014 14:01:12 -0800, Walter Bright <newshound2 digitalmars.com> wrote: This thread is about memory allocation, not exceptions being slow. On 2/7/2014 8:40 AM, Dmitry Olshansky wrote: https://yourlogicalfallacyis.com/black-or-white The reasons for slow exceptions in D could be the generation of stack trace strings or the garbage collector instead of inherent trade offs to keep the successful code path fast. Meh. If exceptions are such a liability we'd better make them (much) faster. They can be made faster by slowing down non-exception code. This has been debated at length in the C++ community, and the generally accepted answer is that non-exception code performance is preferred and exception performance is thrown under the bus in order to achieve it. I think it's quite a reasonable conclusion. And static allocation isn't an exactly appealing option... throw staticException ? staticException : (staticException = new SomethingException("Don't do this at home kids!")); Function calls could do that. Andrei
Feb 09 2014
On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote: Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though. Hmm... then what _does_ qualify as exceptional in your opinion? A logic error (i.e. a mistake on the programmer's side) doesn't, IMO; it should abort instead. On the other hand, there is the class of situations where e.g. a system call returns an error (say, "permission denied" when opening a file, or out of disk space). Or more generally, an external service, like a database or a remote server. However, I can't see how these are fundamentally different from invalid user input, and indeed, there's often not even a clear separation, e.g. when a user asked you to read a file they don't have access to. So, what's left then?
Feb 07 2014
On Friday, 7 February 2014 at 14:26:48 UTC, Marc Schütz wrote:
Hmm... then what _does_ qualify as exceptional in your opinion? A logic error (i.e. a mistake on the programmers side) doesn't, IMO, it should abort instead. On the other hand, there is the class of situations where e.g. a system call returns an error (say, "permission denied" when opening a file, or out of disk space). Or more generally, an external service, like a database or a remote server. However, I can't see how these are fundamentally different from invalid user input, and indeed, there's often not even a clear separation, e.g. when a user asked you to read a file they don't have access to. So, what's left then?

It is an exceptional situation if input is supposed to be valid but surprisingly is not. For example, calling `decodeGrapheme` on an external string without making sure it is valid first. The same goes for files: trying to open a missing file is exceptional, but checking for file presence is not.
Feb 07 2014
On Friday, 7 February 2014 at 14:42:18 UTC, Dicebot wrote:On Friday, 7 February 2014 at 14:26:48 UTC, Marc Schütz wrote:If the function expects it to be valid but you pass it an invalid value, you're breaking the contract, which is a logic error and thus should be checked for by assert, not by an exception. => Case number one: logic errors, no exceptions should be used here. If however the function doesn't require it to be valid (for `decodeGrapheme` the docs don't say anything, so I assume it doesn't), then it needs to be able to handle invalid input, for example by throwing an exception. => This is an example of case number two: user errors, exceptions are okay here. But Brad Anderson seems to disagree on case two (or maybe case one?). Or is there a third type of situation not covered by these two cases?Hmm... then what _does_ qualify as exceptional in your opinion? A logic error (i.e. a mistake on the programmers side) doesn't, IMO, it should abort instead. On the other hand, there is the class of situations where e.g. a system call returns an error (say, "permission denied" when opening a file, or out of disk space). Or more generally, an external service, like a database or a remote server. However, I can't see how these are fundamentally different from invalid user input, and indeed, there's often not even a clear separation, e.g. when a user asked you to read a file they don't have access to. So, what's left then?It is exceptional situation if input is supposed to be valid but surprisingly is not. For example, calling `decodeGrapheme` on external string without making sure it is valid first.Same goes for file - trying open a missing file is exceptional, but checking for file presence is not.I agree here, checking for presence is not exceptional.
Feb 07 2014
On Friday, 7 February 2014 at 14:26:48 UTC, Marc Schütz wrote:
or a remote server. However, I can't see how these are fundamentally different from invalid user input, and indeed, there's often not even a clear separation, e.g. when a user asked you to read a file they don't have access to.

I agree. Any situation where it makes sense to say: "Ouch, this is not going to work out, roll back, roll back, let's move out of this module! We need to try a different approach. We are not going to continue with anything productive down this lane, let's go back to the context and get into a new direction." is suitable for exceptions, and it makes code reuse, evolution and modification to error reporting easy.

- validation and veracity checking
- authentication failures
- database failures
- transactional retries
- serious allocation issues
- timeouts

are all fiiine for exceptions. You get to write a request handler like this:

    {
        auto sid = request.authenticate();
        auto data = validator(request.getPost('label1','label2','label3'));
        auto key = model.create_and_put(sid,data);
        response.writeJson(key);
        response.status = 201;
        return;
    }

And you can change the error reporting at the request dispatcher level rather than sifting through 20 different spaghetti-like request handlers trying to figure out if you got it right:

    {
        auto sid = request.authenticate();
        if (sid<0){ ... return ...; }
        auto data = request.getPost('label1','label2','label3');
        if (data){
            data = validate(data);
            if (data){
                auto key = model.create_and_put(sid,data);
                if (key){
                    auto ok = response.writeJson(key);
                    if(ok){
                        response.status = 201;
                        return;
                    }
                    ....;
                } else { .... ; }
            } else { .... ; }
        } else { ... ; }
    }
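To make the "change the error reporting at the request dispatcher level" point concrete, here is a minimal sketch in D. Everything in it is hypothetical: `AuthException`, `ValidationException`, `Response` and `dispatch` are illustrative names, not part of vibe.d, Phobos or any framework mentioned in the thread.

```d
// Hypothetical exception types thrown by handlers; names are illustrative only.
class AuthException : Exception
{
    this(string msg) { super(msg); }
}

class ValidationException : Exception
{
    this(string msg) { super(msg); }
}

// Hypothetical response type, just enough to show the idea.
struct Response
{
    int status;
    string text;
}

// The dispatcher is the single place that maps failures to status codes,
// so individual handlers can be written as straight-line code.
Response dispatch(Response delegate() handler)
{
    try
        return handler();
    catch (AuthException e)
        return Response(401, e.msg);
    catch (ValidationException e)
        return Response(400, e.msg);
    catch (Exception e)
        return Response(500, "internal error");
}
```

With this shape, changing how authentication failures are reported means editing one `catch` clause instead of every handler.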
Feb 07 2014
On Friday, February 07, 2014 14:26:47 Marc Schütz wrote:
On Thursday, 6 February 2014 at 22:15:11 UTC, Brad Anderson wrote:

Honestly, I think that the typical approach of discussing exceptions as being for "exceptional" circumstances is bad. It inevitably leads to confusion and debate over what "exceptional" means. Some programmers would consider that to mean any bad input, whereas others would take it to the extreme that they should only happen when your program is in an invalid state (essentially what we use Errors for).

I've found rather that when discussing exceptions it works much better to explain exactly why you'd use them, and I think that that comes primarily down to three types of circumstances.

1. Code which should succeed most of the time and which would be far cleaner if it's written to throw exceptions - particularly when the alternative would be to check error codes on every function call (which would be incredibly error-prone).

A prime example of this would be a parser. It's far cleaner to write a parser which assumes that each step succeeds than it is to constantly check that each one succeeded. It makes it so that only code that could actually encounter an error has to check for it and so that it can easily and cleanly propagate the error to the top. Doing that with error codes would generally be a mess, and unless failure is the norm, efficiency shouldn't be a problem.

2. Code which you can't actually guarantee will ever succeed.

There are some cases where you can avoid errors by doing validation before proceeding (e.g. testing strings for Unicode correctness before doing a lot of string processing), but there are others where you either can't validate ahead of time or where you could still end up with an error in spite of your validation.

A prime example of this would be operating on files. For instance, std.file.isDir will tell you whether a particular file is a directory or not by returning bool. If that file does not actually exist, then what is isDir supposed to do? All it can do is throw an exception, unless you want to have a separate out parameter to report whether it succeeded or not or change it so that it returns an error code and returns the bool as an out parameter, both of which would make it much uglier to use. And isDir can't assert that the file exists, because that's a runtime condition that cannot be fully verified ahead of time. You can (and should) check whether the file exists first

    if(file.exists)
    {
        if(file.isDir) {}
        else if(file.isFile) {}
        else {}
    }

but the file system could actually delete that file right out from under you between the call to exists and the call to isDir (or between the calls to isDir and isFile), so validation reduces how often you hit the error case but cannot eliminate it. It should also be rare that isDir will fail (since you should be checking that the file exists first). So, throwing an exception makes perfect sense. You get clean code that's still able to handle error cases rather than them being ignored (as frequently happens with error codes).

3. Code which should succeed most of the time but where doing validation essentially requires doing what you're validating for anyway.

Again, parsers are a good example of this. For instance, to validate that "2013-12-22T01:22:27z" is in the valid ISO extended string format for a timestamp, you have to do pretty much exactly the same work that you have to do to parse out all of the values to convert it to something other than a string (e.g. SysTime). So, if you validated it first, you'd be doing the work twice. As such, why validate first? Just have it throw an exception when the parsing fails. And if for some reason, you expect that there's a high chance that the parsing would fail, then you can have a function which returns an error code and passes out the result as an out parameter instead, but that makes the code much uglier and error-prone. So, in most cases, you'd want it to throw an exception on failure. But regardless, you wouldn't want to validate it first as that would just be expensive all the time rather than more expensive in the (hopefully) rare error case.

The areas where you normally want to avoid exceptions are when you're validating up front or when the error condition is likely. If you're validating, you're normally asking a question - is this data valid - in which case, returning bool is the correct thing to do, not throwing on failure (though if the result is false, the caller could choose to throw if appropriate). And trying to do something which has a good chance of failing should probably return whether it succeeded or not, because you don't want exceptions to be your normal code path.

Also, performance-critical stuff may need to go the error-code path rather than exceptions simply due to it being performance-critical, but in general, error conditions which aren't bugs in your program should be reported via exceptions (not error codes) with validation being used where appropriate to make it so that the error conditions are infrequent.

- Jonathan M Davis

Personally I don't think bad user input qualifies as an exceptional case because it's expected to happen and the program is expected to handle it (and let the user know) when it does. That's just a matter of taste though.
Hmm... then what _does_ qualify as exceptional in your opinion?
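The third case (validation costs as much as parsing) can be sketched with the timestamp example. The snippet below is only an illustration: `parseTimestamp` and `tryParseTimestamp` are hypothetical helper names, and the only library call assumed is `SysTime.fromISOExtString` from `std.datetime`. The broad `catch (Exception)` is a deliberate simplification, since the exact exception type is not something the thread specifies.

```d
import std.datetime : SysTime;

/// Throwing variant: no separate validation pass, the parser reports failure itself.
SysTime parseTimestamp(string text)
{
    return SysTime.fromISOExtString(text);
}

/// Variant for callers who expect failure to be common: bool result plus out parameter.
bool tryParseTimestamp(string text, out SysTime result)
{
    try
    {
        result = SysTime.fromISOExtString(text);
        return true;
    }
    catch (Exception)
    {
        // Deliberately broad: this sketch does not depend on the exact exception type.
        return false;
    }
}
```

Note that the second variant is still built on the throwing parser, so it only tidies the call site; a version that avoids the exception cost would need a parser written to report failure directly.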
Feb 07 2014
Jonathan M Davis:
3. Code which should succeed most of the time but where doing validation essentially requires doing what you're validating for anyway. Again, parsers are a good example of this. For instance, to validate that "2013-12-22T01:22:27z" is in the valid ISO extended string format for a timestamp, you have to do pretty much exactly the same work that you have to do to parse out all of the values to convert it to something other than a string (e.g. SysTime). So, if you validated it first, you'd be doing the work twice. As such, why validate first? Just have it throw an exception when the parsing fails. And if for some reason, you expect that there's a high chance that the parsing would fail, then you can have a function which returns an error code and passes out the result as an out parameter instead, but that makes the code much uglier and error-prone. So, in most cases, you'd want it to throw an exception on failure.

Languages with a good type system solve this with Maybe / Nullable / Optional and similar things. It's both safe (and efficient if the result is equivalent to just a wrapping struct).

Bye,
bearophile
Feb 07 2014
On Friday, February 07, 2014 21:27:04 bearophile wrote:Jonathan M Davis:That can be a good solution, but it also then requires checking the result. One of the big advantages of exceptions is that your code can not care except for the relatively few points that catch exceptions and handle them. Where you run into problems is when the failure case is likely. And if that's the case, then something like Maybe or Nullable is definitely better. - Jonathan M Davis3. Code which should succeed most of the time but where doing validation essentially requires doing what you're validating for anyway. Again, parsers are a good example of this. For instance, to validate that "2013-12-22T01:22:27z" is in the valid ISO extended string format for a timestamp, you have to do pretty much exactly the same work that you have to do to parse out all of the values to convert it to something other than a string (e.g. SysTime). So, if you validated it first, you'd be doing the work twice. As such, why validate first? Just have it throw an exception when the parsing fails. And if for some reason, you expect that there's a high chance that the parsing would fail, then you can have a function which returns an error code and passed out the result as an out parameter instead, but that makes the code much uglier and error-prone. So, in most cases, you'd want it to throw an exception on failure.Languages with a good type system solve this with Maybe / Nullable / Optional and similar things. It's both safe (and efficient if the result is equivalent to just a wapping struct).
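For comparison, here is a small sketch of the Maybe/Nullable style in D, using `std.typecons.Nullable` and `std.conv.to`. The function name `tryParseInt` is made up for this example, and wrapping a throwing conversion is only a convenient way to show the calling convention, not an efficient implementation.

```d
import std.conv : to, ConvException;
import std.typecons : Nullable;

/// Returns the parsed value, or a null Nullable when the text is not an int.
Nullable!int tryParseInt(string text)
{
    try
        return Nullable!int(text.to!int);
    catch (ConvException)
        return Nullable!int.init; // "no value" instead of an exception
}
```

The trade-off discussed above shows up at the call site: every caller has to test `isNull` before using the result, whereas with the throwing version only the code that handles the failure needs to mention it.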
Feb 07 2014
On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:
Any application that operates on some external user input will be subject to DoS attack vector if it uses Phobos directly.

Hmm, I hadn't considered that. Maybe exceptions could be handled automatically though, due to the fact that there is rarely more than one in flight at any time and that they typically don't live for long:

1) prohibit escaping of exception objects from catch blocks (we could just say it is undefined behavior in the spec). The data pointed to by the throwable object should be normal though; if you want to keep the exception, you can thus just shallow copy it.

2) Set aside a static (thread local) buffer early on with a size of like 512 bytes.

3) Make "throw new" call a special function which favors the static buffer. It can do a simple bump-the-pointer allocation in the static region or call the regular GC if there isn't enough room (should be extremely rare). throw e; works the same way it does now. You can pre-allocate with some other method if you want.

4) Have the compiler automatically insert a call to _d_freeexception in a scope(success) block inside every catch block. It checks the given reference; if it is in the static buffer, just zero it all out. If all the chain is in there, zeroing it will free it all. If there are any GC chained exceptions, zeroing it will orphan them and they'll be freed on the next sweep. Otherwise ... well, do nothing, let the GC clean up after it.

Proof of concept:

    bool isThrowable(const ClassInfo ci) {
        if(ci is null) return false;
        if(ci is typeid(Throwable)) return true;
        return isThrowable(ci.base);
    }

    byte[512] exceptionHolder = 0;
    size_t exceptionHolderPosition = 0;

    extern(C) Object _d_newclass(const ClassInfo ci) {
        if(!isThrowable(ci))
            return _d_newclass_original(ci);

        auto size = ci.init.length;
        if(exceptionHolderPosition + size > exceptionHolder.length)
            return _d_newclass_original(ci);

        byte[] slice = exceptionHolder[exceptionHolderPosition .. exceptionHolderPosition + size];
        exceptionHolderPosition += size;
        slice[] = ci.init[];

        import core.stdc.stdio;
        printf("Magic allocation to %d\n", exceptionHolderPosition);
        return cast(Object) slice.ptr;
    }

    extern(C) void _d_freeexception(Throwable t) {
        auto ptr = cast(void*) t;
        if(ptr >= exceptionHolder.ptr && ptr < exceptionHolder.ptr + exceptionHolder.length) {
            exceptionHolder[] = 0;
            exceptionHolderPosition = 0;
            import core.stdc.stdio;
            printf("Freeing\n");
        }
        // else do nothing, the GC will handle it
    }

    void main() {
        import std.stdio;
        try {
            writefln("%s"); // orphaned argument
        } catch(Exception e) {
            scope(success) _d_freeexception(e);
            writeln(e);
        }
    }

    // copy/paste from druntime as fallback
    extern (C) void onOutOfMemoryError();
    extern (C) void* gc_malloc( size_t sz, uint ba = 0 );

    extern (C) Object _d_newclass_original(const ClassInfo ci) {
        import core.stdc.stdlib;
        static import core.memory;
        alias BlkAttr = core.memory.GC.BlkAttr;

        void* p;
        if (ci.m_flags & TypeInfo_Class.ClassFlags.isCOMclass) {
            p = malloc(ci.init.length);
            if (!p)
                onOutOfMemoryError();
        } else {
            // TODO: should this be + 1 to avoid having pointers to the next block?
            BlkAttr attr = BlkAttr.FINALIZE;
            // extern(C++) classes don't have a classinfo pointer in their vtable so the GC can't finalize them
            if (ci.m_flags & TypeInfo_Class.ClassFlags.isCPPclass)
                attr &= ~BlkAttr.FINALIZE;
            if (ci.m_flags & TypeInfo_Class.ClassFlags.noPointers)
                attr |= BlkAttr.NO_SCAN;
            p = gc_malloc(ci.init.length, attr);
        }

        // initialize it
        (cast(byte*) p)[0 .. ci.init.length] = ci.init[];
        debug(PRINTF) printf("initialization done\n");
        return cast(Object) p;
    }

===

Just compile and run normally; the linker will prefer our _d_newclass to the one in phobos.lib automatically. And you'll see the throw from writefln went into our static buffer and was freed at the end.

I toyed with a few other things too:

    void main() {
        import std.stdio;
        try {
            try {
                writefln("%s"); // orphaned argument
            } catch(Exception e) {
                scope(success) _d_freeexception(e); // don't forget these
                throw new Exception("LOL", e);
            }
        } catch(Exception e) {
            scope(success) _d_freeexception(e);
            writeln(e);
            writeln(e.next);
        }
    }

still works. Am I missing a fatal flaw here? It seems to work and is kinda simple to do... exceptions don't really need a huge amount of dynamic memory.
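A much smaller relative of this idea, which needs no runtime hooks, is to reuse one preallocated exception object per thread. The sketch below is not what the proof of concept above does; `InputException` and `throwInputError` are hypothetical names, and the approach assumes that no caller keeps a reference to the exception after its catch block ends.

```d
// Hypothetical exception type for this sketch.
class InputException : Exception
{
    this(string msg) { super(msg); }
}

// Module-level variables are thread-local by default in D,
// so each thread gets its own cached instance.
private InputException cachedException;

/// Throws the cached exception, allocating it only on first use in this thread.
void throwInputError(string msg)
{
    if (cachedException is null)
        cachedException = new InputException(msg); // the only allocation
    cachedException.msg = msg; // overwrite the message of the reused object
    throw cachedException;
}
```

The cost is the same restriction as point 1) above: the object is overwritten by the next throw, so it must not escape the catch block, and any other state it carries is shared between throws.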
Feb 06 2014
On Thursday, 6 February 2014 at 22:56:45 UTC, Adam D. Ruppe wrote:Proof of concept:code in a link so the lines aren't broken http://arsdnet.net/dcode/except.d
Feb 06 2014
On Thursday, 6 February 2014 at 22:56:45 UTC, Adam D. Ruppe wrote:
On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:

I really like vibe.d. A lot. But the way HTTP parse errors are handled is a disaster. Do you know what happened when I was testing vibe.d recently and I sent it a bad request? It sent a stack trace as a response. A stack trace! To a client! I was speechless. Needless to say, I don't support the idea of further enabling this design, regardless of whether it can be made a pinnacle of elegance.

Any application that operates on some external user input will be subject to DoS attack vector if it uses Phobos directly.
Hmm, I hadn't considered that. Maybe exceptions could be handled automatically though due to the facts that there are rarely more than one in flight at any time and they typically don't live for long: [snipped lengthy example]
Feb 06 2014
On Friday, 7 February 2014 at 03:19:32 UTC, Sean Kelly wrote:It sent a stack trace as a responses. A stack trace! To a client! I was speechless.lol, my cgi.d will do that too if you compile with -debug.... I find it convenient at times. (It also sends it to stderr but when doing cgi apps, that means digging into the apache log which is a pain compared to just looking at the browser)
Feb 06 2014
On 2014-02-07 04:19, Sean Kelly wrote:
I really like vibe.d. A lot. But the way HTTP parse errors are handled is a disaster. Do you know what happened when I was testing vibe.d recently and I sent it a bad request? It sent a stack trace as a response. A stack trace! To a client! I was speechless. Needless to say, I don't support the idea of further enabling this design, regardless of whether it can be made a pinnacle of elegance.

Ruby on Rails renders a page with a stack trace in development mode and a standard 500 page in production mode. I can't understand how anyone can do web development without that. There's even a plugin that renders the stack trace as links pointing back to your editor (if supported). It also allows you to navigate the stack trace with a code snippet and simple debugger for each stack frame. Very convenient.

-- /Jacob Carlborg
Feb 07 2014
On Friday, 7 February 2014 at 20:31:00 UTC, Jacob Carlborg wrote:On 2014-02-07 04:19, Sean Kelly wrote:I was mostly surprised that the stack trace was written back to the client. I'd expect something like that in a log on the server side. I do see how it would be convenient to have a stack trace included in a bug report, but if this feature is disabled in release mode then you can't rely on it anyway. I'd just always be checking the logs (where I'd hope the stack trace would always be written).I really like vibe.d. A lot. But the way HTTP parse errors are handled is a disaster. Do you know what happened when I was testing vibe.d recently and I sent it a bad request? It sent a stack trace as a responses. A stack trace! To a client! I was speechless. Needless to say, I don't support the idea of further enabling this design, regardless of whether it can be made a pinnacle of elegance.Ruby on Rails renders a page with a stack trace in development mode and a standard 500 page in production mode. I can't understand how anyone can do web development without that. There's even a plugin that renders a the stack trace as links pointing back to your editor (if supported). It also allows you to navigate the stack trace with a code snippet and simple debugger for each stack frame. Very convenient.
Feb 07 2014
On 2014-02-07 21:56, Sean Kelly wrote:I was mostly surprised that the stack trace was written back to the client. I'd expect something like that in a log on the server side. I do see how it would be convenient to have a stack trace included in a bug report, but if this feature is disabled in release mode then you can't rely on it anyway. I'd just always be checking the logs (where I'd hope the stack trace would always be written).Ruby on Rails always writes the stack trace to the log. In development mode it will also render it to the client. In production mode we use a plugin that sends an email when an exception occurs. The email will contain the full stack trace, environment variables and some other data about the request that failed. BTW, you can do a lot more with HTML than plain text (log files). -- /Jacob Carlborg
Feb 09 2014
On Thursday, 6 February 2014 at 21:38:03 UTC, Dicebot wrote:Hardly so. Any exception allocation can trigger GC collection cycle and Phobos does not provide any other way to handle data errors. Any application that operates on some external user input will be subject to DoS attack vector if it uses Phobos directly.Thinking about this more it'd probably be a good idea to use the type system to segregate non-validated user input from the rest of your program. UnvalidatedString or something. UnvalidatedString.validate() returns a string you can then use in the regular fashion. That way unvalidated data can't weasel its way into the trusted portion of your program without getting checked first. Anyway, that's just an idea (and getting further and further off topic).
Feb 06 2014
On Friday, 7 February 2014 at 05:25:26 UTC, Brad Anderson wrote:Thinking about this more it'd probably be a good idea to use the type system to segregate non-validated user input from the rest of your program. UnvalidatedString or something. UnvalidatedString.validate() returns a string you can then use in the regular fashion. That way unvalidated data can't weasel its way into the trusted portion of your program without getting checked first. Anyway, that's just an idea (and getting further and further off topic).Yes, I even had some simple proof-of-concept drafts of such approach for vibe.d but have never finished it. User input is not a problem if Phobos will provide more strongly typed nothrow tools.
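A rough sketch of what such an `UnvalidatedString` wrapper might look like follows. The type and its `validated` method are hypothetical (the thread only names the idea), and it relies on `std.utf.validate`, which throws `UTFException` on malformed input, so this version is not nothrow.

```d
import std.utf : validate;

/// Holds raw bytes from an untrusted source; the text is only reachable
/// through `validated`, so unchecked data cannot leak out by accident.
struct UnvalidatedString
{
    private immutable(ubyte)[] raw;

    this(const(ubyte)[] bytes)
    {
        raw = bytes.idup; // private copy, so later changes by the caller cannot affect it
    }

    /// Returns the text if it is well-formed UTF-8; throws UTFException otherwise.
    string validated() const
    {
        auto text = cast(string) raw;
        validate(text);
        return text;
    }
}
```

The point of the design is that functions in the trusted part of the program take `string`, so the compiler rejects any attempt to pass an `UnvalidatedString` without going through the check.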
Feb 07 2014
On Friday, 7 February 2014 at 11:06:47 UTC, Dicebot wrote:Yes, I even had some simple proof-of-concept drafts of such approach for vibe.d but have never finished it. User input is not a problem if Phobos will provide more strongly typed nothrow tools.Yeah, I think using separate types for printing to users is often a good idea too, since then the type system can help with i18n.
Feb 07 2014
On Thu, 06 Feb 2014 14:08:39 -0500, Adam D. Ruppe <destructionator gmail.com> wrote:On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:I think if reference counting is added, exceptions would be a prime candidate for using it. They are basically discarded immediately after being handled. -SteveHey, wait a second. How do you throw without allocating?I think exceptions should be ok. You optimize the typical path, and exceptions are (by definition) an exceptional path. If they are also unacceptable, you could restrict yourself to nothrow functions. (Which can still throw Errors... but meh they are even *more* exceptional)
Feb 06 2014
On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:Does this case even matter? Exceptions are not a normal function of execution, and so should happen rarely to never. And it's a time when I'd expect a delay anyway.Hey, wait a second. How do you throw without allocating?One interesting point is that module that were written with avoiding allocations in mind usually still allocate when throwing exceptions.Good point, we need to address that as well.
Feb 06 2014
On 2/6/14, 11:54 AM, Sean Kelly wrote:On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:I think it's okay to put this on the backburner and revisit it later. AndreiOn Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:Does this case even matter? Exceptions are not a normal function of execution, and so should happen rarely to never. And it's a time when I'd expect a delay anyway.Hey, wait a second. How do you throw without allocating?One interesting point is that module that were written with avoiding allocations in mind usually still allocate when throwing exceptions.Good point, we need to address that as well.
Feb 06 2014
On Thursday, 6 February 2014 at 19:54:27 UTC, Sean Kelly wrote:Does this case even matter? Exceptions are not a normal function of execution, and so should happen rarely to never. And it's a time when I'd expect a delay anyway.Imagine intentionally crafted broken utf as user input in repeated requests. You don't have control over it. Now if Phobos would have only thrown exceptions in really _exceptional_ situations and handled broken input gracefully...
Feb 06 2014
On Thursday, 6 February 2014 at 21:48:13 UTC, Dicebot wrote:On Thursday, 6 February 2014 at 19:54:27 UTC, Sean Kelly wrote:You should probably validate utf from all foreign sources. Catch a problem with it as it comes in rather than in some arbitrary part of your program.Does this case even matter? Exceptions are not a normal function of execution, and so should happen rarely to never. And it's a time when I'd expect a delay anyway.Imagine intentionally crafted broken utf as user input in repeated requests. You don't have control over it. Now if Phobos would have only thrown exceptions in really _exceptional_ situations and handled broken input gracefully...
Feb 06 2014
On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson wrote:You should probably validate utf from all foreign sources. Catch a problem with it as it comes in rather than in some arbitrary part of your program.pure safe void validate(S)(in S str) if (isSomeString!S); Throws: UTFException if str is not well-formed. ;)
Feb 06 2014
On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson wrote:Heh, well then... let me just wipe this egg off my face. :PYou should probably validate utf from all foreign sources. Catch a problem with it as it comes in rather than in some arbitrary part of your program.pure safe void validate(S)(in S str) if (isSomeString!S); Throws: UTFException if str is not well-formed. ;)
Feb 06 2014
On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson wrote:And somewhere in the world, darkness fell forever on a bright and beautiful countryside. The monsters poured forth and devoured everything in sight, given strength by that unbelievable abomination of a function design.You should probably validate utf from all foreign sources. Catch a problem with it as it comes in rather than in some arbitrary part of your program.pure safe void validate(S)(in S str) if (isSomeString!S); Throws: UTFException if str is not well-formed.
Feb 06 2014
On Friday, 7 February 2014 at 03:14:45 UTC, Sean Kelly wrote:
On Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:

Yeah, that is absurd. It is a bad, bad sign when almost every time you use a function, you write

    bool ok = true;
    try validate(s); catch(UTFException) ok = false;
    if(!ok) {}

yet that's how I use validate...

fun fact, my little toy scripting language supports

    var a = try foo();; // if foo throws, a == the exception object

but it's a toy scripting language, ugly crap is allowed there :)

UTFException if str is not well-formed.
unbelievable abomination of a function design.
Feb 06 2014
On 2/6/14, 7:27 PM, Adam D. Ruppe wrote:On Friday, 7 February 2014 at 03:14:45 UTC, Sean Kelly wrote:Add a bugzilla and let's define isValid that returns bool! AndreiOn Thursday, 6 February 2014 at 22:20:38 UTC, Dicebot wrote:Yeah, that is absurd. It is a bad, bad sign when almost every time you use a function, you write bool ok = true; try validate(s); catch(UTFException) ok = false; if(!ok) {} yet that's how i use validate...UTFException if str is not well-formed.unbelievable abomination of a function design.
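As a stopgap, the bool-returning function can be sketched as a thin wrapper over the existing `std.utf.validate`. The name `isValidUtf8` is hypothetical, and this is not how a proper Phobos implementation would be written: it still pays for the exception (and its allocation) on invalid input, so it only fixes the interface.

```d
import std.utf : validate, UTFException;

/// Answers the question "is this well-formed UTF-8?" without letting the
/// exception escape. Interface sketch only: the throw still happens internally.
bool isValidUtf8(const(char)[] text)
{
    try
    {
        validate(text);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

A real implementation would walk the code units and return false directly, which is what makes it usable on untrusted input without triggering GC allocations.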
Feb 07 2014
On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:Add a bugzilla and let's define isValid that returns bool!Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code but assigns the return value through another parameter.
Feb 07 2014
07-Feb-2014 20:29, Andrej Mitrovic wrote:
On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:

Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by the Unicode spec.

-- Dmitry Olshansky

Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Feb 07 2014
On 2/7/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.A NaN for chars? Sounds great to me! :)
Feb 07 2014
07-Feb-2014 21:07, Andrej Mitrovic wrote:
On 2/7/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

It's called \uFFFD and is specifically for bad encodings. I wonder why nobody had perused the spec when writing std.utf.decode in the first place...

5.22 Best Practice for U+FFFD Substitution

When converting text from one character encoding to another, a conversion algorithm may encounter unconvertible code units. This is most commonly caused by some sort of corruption of the source data, so that it does not correctly follow the specification for that character encoding. Examples include dropping a byte in a multibyte encoding such as Shift-JIS, improper concatenation of strings, a mismatch between an encoding declaration and actual encoding of text, use of non-shortest form for UTF-8, and so on.

...

Whenever an unconvertible offset is reached during conversion of a code unit sequence:

1. The maximal subpart at that offset should be replaced by a single U+FFFD.
2. The conversion should proceed at the offset immediately after the maximal subpart.

---

Fast, simple and according to the standard. Best of all - no stinkin' exceptions! ;)

-- Dmitry Olshansky

Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
A NaN for chars? Sounds great to me! :)
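The substitution approach can be sketched as a decode loop that never throws. This example assumes the `Yes.useReplacementDchar` template flag on `std.utf.decode`, which, as far as I know, exists only in Phobos releases later than this thread; `decodeLossy` is a hypothetical helper name. It makes no claim about following the "maximal subpart" rule exactly, only that invalid input yields U+FFFD instead of an exception.

```d
import std.typecons : Yes;
import std.utf : decode;

/// Decodes UTF-8 into UTF-32, substituting U+FFFD for malformed sequences.
dstring decodeLossy(const(char)[] text)
{
    dchar[] result;
    size_t index = 0;
    while (index < text.length)
    {
        // decode advances `index` itself; with the replacement flag it
        // returns U+FFFD on bad input rather than throwing UTFException.
        result ~= decode!(Yes.useReplacementDchar)(text, index);
    }
    return result.idup;
}
```

Callers that care about corruption can still detect it by checking the result for `'\uFFFD'`, which is the "NaN for chars" behaviour described above.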
Feb 07 2014
On 2/7/2014 12:14 PM, Dmitry Olshansky wrote:Fast, simple and according to the standard. Best of all - no stinkin' exceptions! ;)Nice find. Looks good to me.
Feb 08 2014
09-Feb-2014 02:16, Walter Bright wrote:
On 2/7/2014 12:14 PM, Dmitry Olshansky wrote:

https://d.puremagic.com/issues/show_bug.cgi?id=12113

-- Dmitry Olshansky

Fast, simple and according to the standard. Best of all - no stinkin' exceptions! ;)
Nice find. Looks good to me.
Feb 08 2014
On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
07-Feb-2014 20:29, Andrej Mitrovic wrote:

Isn't that actually worse? Unless you're suggesting that we stop throwing on decode errors, then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors.

validate has no business throwing, and we definitely should add isValidUnicode (or isValid or whatever you want to call it) for validation purposes. Code can then call that to validate that a string is valid and not worry about any UTFExceptions being thrown as long as it doesn't manipulate the string in a way that could result in its Unicode becoming invalid.

However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO. You're essentially dealing with error codes at that point, and I think that experience has shown quite clearly that error codes are generally a bad way to go. Almost no one checks them unless they have to.

I think that having decode throw on invalid Unicode is exactly what it should be doing. The problem is that validate shouldn't.

- Jonathan M Davis

On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
Add a bugzilla and let's define isValid that returns bool!
Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
Feb 07 2014
On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis wrote:
> Isn't that actually worse? Unless you're suggesting that we stop throwing on decode errors, then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors.
> [...]

You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
Feb 07 2014
On Friday, February 07, 2014 23:01:46 Meta wrote:
> You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.

How is that any better than returning an invalid dchar with a specific value? In either case, you have to check the value. With the exception, code doesn't have to care. If the string is invalid, it'll get a UTFException, and it can handle it appropriately, but having to check the return value just adds overhead (albeit minimal) and is error-prone, because it generally won't be checked (and if it is checked, it complicates the calling code, because it has to do the check).

Code that doesn't want to risk a UTFException being thrown can validate up front - and that validator function should return bool and _not_ throw. But having decode not throw is going to be error-prone. It also doesn't help performance-wise, because it still has to do all of the same validity checks as it decodes. It's just that instead of throwing, it returns an error value.

I really think that having decode throw on invalid Unicode is the right decision, and I don't see what we gain by making it not throw.

- Jonathan M Davis
Feb 07 2014
On Friday, 7 February 2014 at 23:45:06 UTC, Jonathan M Davis wrote:
> How is that any better than returning an invalid dchar with a specific value? In either case, you have to check the value. With the exception, code doesn't have to care.
> [...]

We have had this discussion at least once before. A hypothetical Option type will not let you do anything with the wrapped value UNTIL you check it, as opposed to returning null, -1, some special Unicode value, etc. Trying to use it before this check is necessarily a compile-time error. This is both faster than exceptions and safer than special "error values" that are only special by convention. I recall that you've worked with Haskell before, so you must know how useful this pattern is.
Feb 07 2014
On Saturday, February 08, 2014 01:26:10 Meta wrote:
> We have had this discussion at least once before. A hypothetical Option type will not let you do anything with the wrapped value UNTIL you check it, as opposed to returning null, -1, some special Unicode value, etc. Trying to use it before this check is necessarily a compile-time error. This is both faster than exceptions and safer than special "error values" that are only special by convention.
> [...]

The problem is that you need to check it. This is _slower_ than exceptions in the normal case, as invalid Unicode should be the rare case. The great thing with exceptions is that you can write your code as if it will always work and don't need to put checks in it everywhere. Instead, you just put try-catch blocks in the (relatively) few places that you want to handle exceptions. Most of your code doesn't care. And if you validate the string before you start doing a bunch of operations on it, then you don't have to worry about a UTFException being thrown. Also, if code fails to validate a string for one reason or another, the error gets reported rather than an invalid return value being ignored.

As for returning Optional/Nullable dchar vs an invalid dchar, I don't see much difference. In both cases, you have to check the return value, which is precisely what you don't want to have to do in most cases. And decode has to do the same work to check for valid Unicode whether it throws an exception or returns a value indicating decode-failure, so why have the extra overhead of having to check the result for decode-failure? Just let it throw an exception in that case and handle it in the appropriate part of your code.

Returning a Nullable result or a specific bad value that you have to check rather than throwing an exception only makes sense when it's expected that failures are going to be frequent. If failures are infrequent, it's generally far better to use exceptions, because it will lead to much cleaner, less error-prone code.

- Jonathan M Davis
Feb 07 2014
Jonathan M Davis:
> The problem is that you need to check it. This is _slower_ than exceptions in the normal case,

Right, but verifying the correctness of the Unicode encoding of a string probably on average requires much more time than testing a single conditional. So I think this tiny added time is acceptable.

Bye,
bearophile
Feb 07 2014
On Saturday, February 08, 2014 02:41:54 bearophile wrote:
> Jonathan M Davis:
>> The problem is that you need to check it. This is _slower_ than exceptions in the normal case,
> Right, but verifying the correctness of the Unicode encoding of a string probably on average requires much more than time than testing a single conditional. So I think this tiny added time is acceptable.

But why even do it in the first place then? The code is cleaner and less error-prone if it uses exceptions. The only argument I can see being made for not using exceptions with decode is efficiency, because it's more cumbersome to use if it's returning error values of some kind rather than just throwing in the rare case that there's a Unicode decoding error. It's also more error-prone than using exceptions, because most code will just skip checking the result. That's one of the big reasons that error codes are generally a bad idea.

But since decode has to do the same validity checks whether it returns an invalid dchar or a Nullable!dchar or if it throws, I don't see why not having the exception buys us anything. It just makes the API worse.

- Jonathan M Davis
Feb 07 2014
On Fri, 07 Feb 2014 22:42:00 -0500, "Jonathan M Davis" <jmdavisProg gmx.com> wrote:
> But why even do it in the first place then? The code is cleaner and less error-prone if it uses exceptions.
> [...]
> But since decode has to do the same validity checks whether it returns an invalid dchar or a Nullable!dchar or if it throws, I don't see why not having the exception buys us anything. It just makes the API worse.

I agree with both of you. The Unicode standard tells us that it is correct to replace invalid data with that special code point, so it should be used where applicable, e.g. when one sanitizes an invalid string. On the other hand exceptions are clearly superior to error returns.

I guess we just have two use cases here. One where invalid encoding is not an error (e.g. for sanitizing purposes) and one where you don't want to lose information and have to enforce correct encoding. Name the first one "decodeSubst" maybe and have decode call that and check for 0xFFFD?

-- Marco
Feb 07 2014
On Sat, 8 Feb 2014 05:29:35 +0100, Marco Leise <Marco.Leise gmx.de> wrote:
> Name the first one "decodeSubst" maybe and have decode call that and check for 0xFFFD?

Err... the other way round. 0xFFFD would actually be valid from an encoding point of view, I guess.

-- Marco
Feb 07 2014
On Saturday, February 08, 2014 05:29:35 Marco Leise wrote:
> I guess we just have two use cases here. One where invalid encoding is not an error (e.g. for sanitizing purposes) and one where you don't want to lose information and have to enforce correct encoding. Name the first one "decodeSubst" maybe and have decode call that and check for 0xFFFD?

I think that that would call for us to have 3 related but distinct functions:

1. decode, which throws on invalid Unicode. We already have this.

2. isValidUnicode, which returns whether the string is valid Unicode and does not throw. We don't yet have this. Rather, we have validate which does the same job and then throws instead of returning bool.

3. sanitizeUnicode (or whatever would be a good name for it), which replaces invalid Unicode with 0xFFFD (or whatever the appropriate character is) so that it can be operated on without causing decode to throw in spite of the fact that it was invalid Unicode. We don't have anything like this yet.

- Jonathan M Davis
Feb 07 2014
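A non-throwing validator of the kind listed as item 2 can be sketched as a thin wrapper over the existing std.utf.validate. The name isValidUnicode is only the one floated in this thread, not an existing Phobos function, and the wrapper below is an assumption about how it might look. Note that it still relies on an exception internally, so it shows the interface only and does not avoid the allocation the thread is concerned about.

```d
import std.utf : validate, UTFException;

// Hypothetical helper named after the proposal above; not part of Phobos.
// It turns the throwing std.utf.validate into a bool-returning check.
bool isValidUnicode(const(char)[] s)
{
    try
    {
        validate(s);      // throws UTFException on malformed UTF-8
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    assert(isValidUnicode("plain ASCII"));
    assert(isValidUnicode("na\u00EFve"));

    // The byte 0xFF never occurs in well-formed UTF-8.
    char[] bad = ['a', cast(char) 0xFF, 'b'];
    assert(!isValidUnicode(bad));
}
```

A real implementation would presumably run the checks directly and return false at the first malformed sequence, rather than catching an exception.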
On Fri, 07 Feb 2014 21:04:08 -0800, Jonathan M Davis <jmdavisProg gmx.com> wrote:
> I think that that would call for us to have 3 related but distinct functions:
> 1. decode, which throws on invalid Unicode. We already have this.
> 2. isValidUnicode, which returns whether the string is valid Unicode and does not throw. We don't yet have this. Rather, we have validate which does the same job and then throws instead of returning bool.

Yes, that's the one that needs to be added.

> 3. sanitizeUnicode (or whatever would be a good name for it), which replaces invalid Unicode with 0xFFFD (or whatever the appropriate character is) so that it can be operated on without causing decode to throw in spite of the fact that it was invalid Unicode. We don't have anything like this yet.

And oh wonder, we actually have that already! Problem solved:

(Not that I knew that beforehand *cough*)

Or does someone have a need to also sanitize code point by code point?

-- Marco
Feb 07 2014
On Saturday, 8 February 2014 at 05:04:35 UTC, Jonathan M Davis wrote:
> I think that that would call for us to have 3 related but distinct functions:
> 1. decode, which throws on invalid Unicode. We already have this.

I wonder if it'd be too reckless to just make decode for string nothrow (we want this function to be as fast as possible) and just require that string, by definition, must be valid unicode. to!string and company could validate strings as they come in from foreign sources. This way invalid unicode is caught early and decode gets a speedup.

char[] is different because the mutability means it could be made invalid at any time so we can't rely on it staying valid after it's been checked but once a string has been confirmed valid there is no reason to check it for validity ever again.
Feb 08 2014
On 02/08/2014 07:44 PM, Brad Anderson wrote:
> I wonder if it'd be too reckless to just make decode for string nothrow (we want this function to be as fast as possible) and just require that string, by definition, must be valid unicode.
> [...]
> once a string has been confirmed valid there is no reason to check it for validity ever again.

"☹"[1..$]
Feb 08 2014
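The one-line counterexample above can be spelled out as follows. This is an illustrative sketch, not code from the thread: slicing an immutable, valid string by code unit can start in the middle of a multi-byte sequence, so the result is still of type string but no longer valid UTF-8.

```d
import std.utf : validate, UTFException;

void main()
{
    string s = "☹";              // U+2639 is encoded as three UTF-8 code units
    assert(s.length == 3);
    validate(s);                 // the full string is valid, so this does not throw

    string tail = s[1 .. $];     // starts on a continuation byte

    bool threw = false;
    try
    {
        validate(tail);
    }
    catch (UTFException)
    {
        threw = true;
    }
    assert(threw);               // the slice is a string, yet invalid UTF-8
}
```

So immutability alone does not guarantee that a value of type string stays valid once it has been checked.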
On Saturday, 8 February 2014 at 18:44:38 UTC, Brad Anderson wrote:
> I wonder if it'd be too reckless to just make decode for string nothrow (we want this function to be as fast as possible) and

Yes. It shouldn't throw. Never.

> just require that string, by definition, must be valid unicode.

Why? Replacement of broken code is defined by Unicode - we should use it. No one prevents you from calling isValidUnicode beforehand and handling that separately if it returns "false" (I would recommend that only if security is relevant, e.g. if you check a signature or something like that), or from searching for 0xFFFD in the result string afterwards and throwing if you find some (but this is generally not a good idea, because the replacement characters may have been there before and were intended).

As a default, replacing broken characters is very good. And fast.
Feb 08 2014
On Friday, February 07, 2014 21:04:08 Jonathan M Davis wrote:
> I think that that would call for us to have 3 related but distinct functions:
> [...]

Actually, thinking this through some more, if we can replace invalid Unicode with 0xFFFD, and have all algorithms work with that and consider it valid Unicode (rather than getting weird bugs due to invalid Unicode), then if decode returned that on error rather than throwing, we wouldn't actually need to check the return value. It wouldn't matter that the Unicode was invalid. So, we wouldn't even need to _care_ that the Unicode was invalid. Anyone who _did_ care could call isValidUnicode to validate the Unicode first, and those who didn't wouldn't need to worry about UTFException being thrown, because everything would still work even if the string was invalid Unicode.

So, if that's indeed what 0xFFFD does, and that's what Dmitry meant by proposing that we return that rather than throwing, then I rescind my assessment that throwing was the best way to go and have to agree that returning 0xFFFD would be better. I was responding under the assumption that you had to check for 0xFFFD and respond to it in order to avoid having your code be buggy, in which case throwing would be far better. But if 0xFFFD is considered valid Unicode, then returning that would be a fantastic solution. And if that's the case, we only need two functions, not three:

1. decode, which returns 0xFFFD on decode failure

2. isValidUnicode, which returns whether the string is valid

And I actually really like the idea that we could just operate on invalid Unicode as valid Unicode this way, making it so that most code doesn't need to care, and code that _does_ need to care, can validate the strings first. Right now, pretty much all string code needs to care in order to avoid processing invalid Unicode, which is much messier.

- Jonathan M Davis
Feb 07 2014
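The behaviour proposed above, where decode yields U+FFFD instead of throwing, can be approximated today with a wrapper. The function name decodeOrReplace and the policy of skipping exactly one code unit after a failure are assumptions made for this sketch. Because it catches UTFException internally, it shows the calling convention only, not the allocation-free implementation being asked for.

```d
import std.utf : decode, UTFException;

// Hypothetical wrapper; the name and the one-code-unit skip are illustrative.
dchar decodeOrReplace(const(char)[] s, ref size_t index)
{
    immutable start = index;
    try
    {
        return decode(s, index);   // advances index past the code point
    }
    catch (UTFException)
    {
        index = start + 1;         // step over the offending code unit
        return '\uFFFD';           // Unicode replacement character
    }
}

void main()
{
    char[] text = ['a', cast(char) 0xFF, 'b'];
    dchar[] result;
    size_t i = 0;
    while (i < text.length)
        result ~= decodeOrReplace(text, i);

    // The caller never has to check for an error value.
    assert(result == "a\uFFFDb"d);
}
```

Calling code simply keeps iterating, and the broken byte shows up as one ordinary code point in the output.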
08-Feb-2014 09:45, Jonathan M Davis wrote:
> Actually, thinking this through some more, if we can replace invalid Unicode with 0xFFFD, and have all algorithms work with that and consider it valid Unicode (rather than getting weird bugs due to invalid Unicode), then if decode returned that on error rather than throwing, we wouldn't actually need to check the return value.
> [...]

Hm.. yes. I gotta read the whole thread next time :)

> [...] But if 0xFFFD is considered valid Unicode,

It is.

> then returning that would be a fantastic solution. And if that's the case, we only need two functions, not three:
> 1. decode, which returns 0xFFFD on decode failure
> 2. isValidUnicode, which returns whether the string is valid

Yay.

> And I actually really like the idea that we could just operate on invalid Unicode as valid Unicode this way, making it so that most code doesn't need to care, and code that _does_ need to care, can validate the strings first. Right now, pretty much all string code needs to care in order to avoid processing invalid Unicode, which is much messier.

Hooray! The goodness is that for example I can run regex on partially broken text and have some sane results out of it.

-- Dmitry Olshansky
Feb 08 2014
08-Feb-2014 03:01, Meta wrote:
> You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.

This is a ridiculously distracting suggestion and simply has no merit whatsoever.

To underline how impractical this suggestion is: currently all code out there expects dchar out of .front, not some magic animal called 'Option!char'.

-- Dmitry Olshansky
Feb 08 2014
On Saturday, 8 February 2014 at 11:24:56 UTC, Dmitry Olshansky wrote:
> This is ridiculously distracting suggestion and simply has no merits whatsoever. To underline how impractical this suggestion is: currently every code out there expect dchar out of .front not some magic animal called 'Option!char'.

I'm not actually suggesting a replacement. Just wishful thinking on how the function could've been better designed.
Feb 08 2014
On Saturday, February 08, 2014 18:03:54 Meta wrote:
> I'm not actually suggesting a replacement. Just wishful thinking on how the function could've been better designed.

I don't see how returning Nullable!dchar would improve the decode function at all. Currently, it throws on invalid UTF, so you don't have to check the return value, and your code can avoid caring about decode errors except for the points where you put your catches (which are generally in far fewer places than the number of places that decode gets called - be it directly or indirectly). On the other hand, with Nullable!dchar, you'd have to always check the result or risk hitting an assertion when you don't check the result (or ending up with dchar.init in -release). I don't see how that's better than the current situation at all. It just makes decode harder to use.

And Dmitry's suggestion is better than both. We end up returning the Unicode character specifically intended to designate bad encodings (\uFFFD) such that you don't even have to care that there was a decode error. You just decode the string and use it. It will just be one more character in the string that doesn't match what you're looking for in find and the like, and pretty much nothing should choke on it. Anything which then cares about Unicode validity can use isValidUnicode (once we have it) to validate the string instead of relying on decode to throw. It will clean up string processing in the face of invalid Unicode quite nicely.

So, I don't see how using Nullable!dchar as you suggest would ever have been a better design.

- Jonathan M Davis
Feb 08 2014
08-Feb-2014 02:57, Jonathan M Davis wrote:
> On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
>> Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
> Isn't that actually worse?

No, it's better and more flexible for those who care to repair broken text in case it's broken. We currently have ZERO facilities to work with partly broken UTF and it's not that rare a thing to have it.

> Unless you're suggesting that we stop throwing on decode errors,

That is exactly what I suggest.

> then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors.

Why the heck? It will not throw either. In the very end bad encoding is handled by displaying the 'substituted' (typically '?') character in places where it broke, not by throwing up hands in the air and spitting "UTF Exception: offset 4302 bad UTF sequence". This is not good enough (in case somebody thought that it is).

Those who care about throwing add a trivial map!(x => x != '\uFFFD' || die()) over a string, where the die function throws an exception.

> validate has no business throwing, and we definitely should add isValidUnicode (or isValid or whatever you want to call it) for validation purposes. Code can then call that to validate that a string is valid and not worry about any UTFExceptions being thrown as long as it doesn't manipulate the string in a way that could result in its Unicode becoming invalid.

Yet later down the road decode will triple check that anyway. Just saying. BTW if the string was checked beforehand there is no difference between the 2 approaches at all (don't have to check).

> [...]
> I think that having decode throw on invalid Unicode is exactly what it should be doing. The problem is that validate shouldn't.

Every single text editor out there seems to disagree with you: they do show you partially substituted text, not a dialog box "My bad, it's broken UTF-8, I'm giving up!".

-- Dmitry Olshansky
Feb 08 2014
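The opt-in check described in the post above (throw only if a caller cares about substituted characters) might look roughly like this. It is written as a plain loop over already-decoded text rather than the map!(...) one-liner, and the function name rejectReplacementChars is made up for the example.

```d
import std.exception : enforce;

// Hypothetical opt-in check over already-decoded text.
// Throws if any code point is the replacement character U+FFFD.
void rejectReplacementChars(const(dchar)[] decoded)
{
    foreach (c; decoded)
        enforce(c != '\uFFFD', "replacement character found in text");
}

void main()
{
    rejectReplacementChars("clean text"d);   // passes silently

    bool threw = false;
    try
    {
        rejectReplacementChars("a\uFFFDb"d);
    }
    catch (Exception)
    {
        threw = true;
    }
    assert(threw);
}
```

As with any check on the decoded output, this cannot tell a U+FFFD produced by substitution from one that was already present in the input.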
On Sat, 08 Feb 2014 15:21:26 +0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> No, it's better and more flexible for those who care to repair broken text in case it's broken. We currently have ZERO facilities to work with partly broken UTF and it's not that rare thing to have it.

Your argument is unsubstantiated, since we have this already:

> Those who care about throwing add a trivial map!(x => x != '\uFFFD' || die()) over a string, where die function throws an exception.

That's neither an improvement over calling "validate" nor does that deal with distinguishing between invalid UTF and \uFFFD in the input.

> Every single text editor out there seems to disagree with you: they do show you partially substituted text, not a dialog box "My bad, it's broken UTF-8, I'm giving up!".

Editors do different things. They often try to detect the encoding with a fallback to Latin1. If you open a file explicitly as UTF-8 they may display a substitution char or detect the error and use the fallback, as is the case with Geany, and gedit does in fact throw an error message at you saying "My bad, it's broken UTF-8, I'm giving up!".

-- Marco
Feb 08 2014
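Editorial note: the disagreement above is about what a substituting decoder can and cannot tell its caller. The following is only a minimal sketch, assuming a Phobos release recent enough to ship `std.utf.byDchar` and `std.utf.replacementDchar` (both postdate this thread, as far as I know); `containsReplacement` is a hypothetical helper name, not something proposed in the discussion.

```d
import std.algorithm.searching : canFind;
import std.utf : byDchar, replacementDchar;

/// Hypothetical helper: true if lazy, non-throwing decoding yields U+FFFD anywhere.
bool containsReplacement(const(char)[] text)
{
    // byDchar substitutes replacementDchar for ill-formed sequences instead of throwing.
    return text.byDchar.canFind(replacementDchar);
}

// Try it with: dmd -main -unittest -run example.d
unittest
{
    immutable char[] broken = ['a', cast(char) 0xFF, 'b']; // 0xFF never occurs in UTF-8
    assert(containsReplacement(broken));      // substituted while decoding
    assert(containsReplacement("a\uFFFDb"));  // a genuine U+FFFD looks exactly the same
    assert(!containsReplacement("plain ASCII"));
}
```

The second assertion is the objection raised above: after substitution, a caller can no longer tell broken input from text that legitimately contains U+FFFD, so anyone who needs that distinction still has to validate the raw bytes.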
09-Feb-2014 09:35, Marco Leise wrote:
> On Sat, 08 Feb 2014 15:21:26 +0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
>> 08-Feb-2014 02:57, Jonathan M Davis wrote:
>>> On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
>>>> 07-Feb-2014 20:29, Andrej Mitrovic wrote:
>>>>> On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
>>>>>> Add a bugzilla and let's define isValid that returns bool!
>>>>> Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
>>>> Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
>>> Isn't that actually worse?
>> No, it's better and more flexible for those who care to repair broken text in case it's broken. We currently have ZERO facilities to work with partly broken UTF and it's not that rare thing to have it.
> Your argument is unsubstantiated, since we have this already:

Working with ranges of dchar? Nobody is taking eager validation from your hands anyway.

>>> Unless you're suggesting that we stop throwing on decode errors,
>> That is exactly what I suggest.
>>> then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors.
>> Why the heck? It will not throw either. In the very end bad encoding is handled by displaying the 'substituted' (typically '?') character in places where it broke not by throwing up hands in the air and spitting "UTF Exception: offset 4302 bad UTF sequence". This is not good enough (in case somebody though that it is).
>>
>> Those who care about throwing add a trivial map!(x => x != '\uFFFD' || die()) over a string, where die function throws an exception.
> Thats neither an improvement over calling "validate" nor does that deal with distinguishing between invalid UTF and

Means text is broken but wasn't ever read...

> \uFFFD in the input.

...means text was broken sometime before.

Hardly makes any difference to the most applications. Normal text doesn't contain \uFFFD. And you can test a string with proper 'validate', it's just that while decoding the default is to substitute.

>>> validate has no business throwing, and we definitely should add isValidUnicode (or isValid or whatever you want to call it) for validation purposes. Code can then call that to validate that a string is valid and not worry about any UTFExceptions being thrown as long as it doesn't manipulate the string in a way that could result in its Unicode becoming invalid.
>> Yet later down the road decode will triple check that anyway. Just saying. BTW if the string was checked beforehand there is no difference between 2 approaches at all (don't have to check).
>>> However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO. You're essentially dealing with error codes at that point, and I think that experience has shown quite clearly that error codes are generally a bad way to go. Almost no one checks them unless they have to. I think that having decode throw on invalid Unicode is exactly what it should be doing. The problem is that validate shouldn't.
>> Every single text editor out there seems to disagree with you: they do show you partially substituted text, not a dialog box "My bad, it's broken UTF-8, I'm giving up!".
> Editor do different things. They often try to detect the encoding with a fall back to Latin1. If you open a file explicitly as UTF-8 they may display a substitution char or detect the error and use the fall back, as is the case with Geany and

Throwing exception here is not something useful in 90% of cases. Requiring everybody to call sanitize on every string from the outside smells like a wrong default to me.

> gedit does in fact throw an error message at you saying "My bad, it's broken UTF-8, I'm giving up!".

I know and it's piece of junk :) Seriously it doesn't even has regular expressions for search and replace!

--
Dmitry Olshansky
Feb 09 2014
"Dmitry Olshansky" wrote in message news:ld7dla$pdg$1 digitalmars.com...
>> gedit does in fact throw an error message at you saying "My bad, it's broken UTF-8, I'm giving up!".
> I know and it's piece of junk :) Seriously it doesn't even has regular expressions for search and replace!

That would be a luxury, gedit doesn't even have auto-indent.
Feb 09 2014
On Sun, 9 Feb 2014 22:24:21 +1100, "Daniel Murphy" <yebbliesnospam gmail.com> wrote:
> "Dmitry Olshansky" wrote in message news:ld7dla$pdg$1 digitalmars.com...
>>> gedit does in fact throw an error message at you saying "My bad, it's broken UTF-8, I'm giving up!".
>> I know and it's piece of junk :) Seriously it doesn't even has regular expressions for search and replace!
> That would be a luxury, gedit doesn't even have auto-indent.

You can talk about missing features in gedit all day, but from my point of view an editor is broken when it doesn't throw an error message at you. By silently replacing incorrect UTF-8 they change the original text. 0xFFFD should probably be used only when error messages are out of question like when displaying/printing text only.

--
Marco
Feb 16 2014
"Marco Leise" wrote in message news:20140217030525.67a21dfc org.homedns.org...
> 0xFFFD should probably be used only when error messages are out of question like when displaying/printing text only.

What do you use for displaying text, if not a text editor?
Feb 17 2014
On Tue, 18 Feb 2014 01:01:53 +1100, "Daniel Murphy" <yebbliesnospam gmail.com> wrote:
> "Marco Leise" wrote in message news:20140217030525.67a21dfc org.homedns.org...
>> 0xFFFD should probably be used only when error messages are out of question like when displaying/printing text only.
> What do you use for displaying text, if not a text editor?

That was directed at D development. Or programming with Unicode encodings in general. If you load a text file and replace broken UTF-8 with \uFFFD or ? as Sublime 3 does, you lose information. I think that smells and asks for a big red message box. gedit is an editor that works this way.

What I meant by displaying text is static UI elements, since there is no risk of propagating the error. Everything else that can notify the user of the incorrect encoding or loss of information should do so.

--
Marco
Feb 17 2014
On Sun, 09 Feb 2014 12:18:41 +0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> 09-Feb-2014 09:35, Marco Leise wrote:
>> Thats neither an improvement over calling "validate" nor does that deal with distinguishing between invalid UTF and
> Means text is broken but wasn't ever read...
>> \uFFFD in the input.
> ...means text was broken sometime before.
>
> Hardly makes any difference to the most applications. Normal text doesn't contain \uFFFD.

Of course it does. It is a valid symbol and a lot of websites describing the "Specials" Unicode block make use of it, like the one on Wikipedia: http://en.wikipedia.org/wiki/Specials_(Unicode_block)
With your definition, pulling such a document from the web and parsing it in D would mean playing on broken strings.

>>> [...] Every single text editor out there seems to disagree with you: they do show you partially substituted text, not a dialog box "My bad, it's broken UTF-8, I'm giving up!".
>> gedit does in fact throw an error message at you saying "My bad, it's broken UTF-8, I'm giving up!".
> I know and it's piece of junk :) Seriously it doesn't even has regular expressions for search and replace!

https://yourlogicalfallacyis.com/no-true-scotsman :p

--
Marco
Feb 16 2014
17-Feb-2014 06:19, Marco Leise wrote:
> On Sun, 09 Feb 2014 12:18:41 +0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
>> 09-Feb-2014 09:35, Marco Leise wrote:
>>> Thats neither an improvement over calling "validate" nor does that deal with distinguishing between invalid UTF and
>> Means text is broken but wasn't ever read...
>>> \uFFFD in the input.
>> ...means text was broken sometime before.
>>
>> Hardly makes any difference to the most applications. Normal text doesn't contain \uFFFD.
> Of course it does. It is a valid symbol and a lot of websites describing the "Specials" Unicode block make use of it, like the one on Wikipedia: http://en.wikipedia.org/wiki/Specials_(Unicode_block)
> With your definition, pulling such a document from the web and parsing it in D would mean playing on broken strings.

In a sense, \uFFFD means broken encoding. What about lone surrogates? Private use symbols that must not occur in transmission? They all displayed in various Unicode listings. About 'playing on broken strings' - ignoring broken/partially broken strings, I specifically think that it's what most users/use cases want.

A more useful and sensible default of decoding is to substitute on broken encoding. And it's a standard procedure. It's particularly better for displaying text.

To remind: since it's only a decode you are still in the control of original text - in fact you may re-test what bytes are there IF you want.

The way of "throw on bad encoding" could be useful but I hardly see it as what you want for default.

I'm wary of breaking code that relies on throwing. For the moment I think the best course of action would be to introduce xdecode or some such that will do substitution on failure, see how it floats and then change ranges/foreach etc to use xdecode.

>>>> [...] Every single text editor out there seems to disagree with you: they do show you partially substituted text, not a dialog box "My bad, it's broken UTF-8, I'm giving up!".
>>> gedit does in fact throw an error message at you saying "My bad, it's broken UTF-8, I'm giving up!".
>> I know and it's piece of junk :) Seriously it doesn't even has regular expressions for search and replace!
> https://yourlogicalfallacyis.com/no-true-scotsman :p

Well, gedit is a nice example of why just throwing exception is not good enough for many apps (editors in particular). The fact that it's piece of junk might be irrelevant ;)

--
Dmitry Olshansky
Feb 18 2014
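Editorial note: the `xdecode` idea floated above can be sketched in a few lines. This is a rough illustration only, not the eventual Phobos design: `xdecode` and `decodeLossy` are hypothetical names, the "skip one code unit and carry on" resynchronisation policy is an assumption, and the sketch simply wraps the throwing `std.utf.decode`, so it still pays for an exception on the bad path. A real implementation would presumably decode without throwing at all.

```d
import std.utf : decode, UTFException;

/// Hypothetical `xdecode`: like std.utf.decode, but yields U+FFFD instead of throwing.
dchar xdecode(const(char)[] str, ref size_t index)
{
    immutable start = index;
    try
    {
        return decode(str, index);
    }
    catch (UTFException)
    {
        // Assumed policy: resume one code unit past the offending byte.
        index = start + 1;
        return '\uFFFD';
    }
}

/// Decodes a whole string with xdecode; never throws on ill-formed input.
dstring decodeLossy(const(char)[] str)
{
    dstring result;
    for (size_t i = 0; i < str.length;)
        result ~= xdecode(str, i);
    return result;
}

// Try it with: dmd -main -unittest -run example.d
unittest
{
    immutable char[] broken = ['a', cast(char) 0xFF, 'b'];
    assert(decodeLossy(broken) == "a\uFFFDb"d); // bad byte replaced, rest kept
    assert(decodeLossy("abc") == "abc"d);       // valid text is untouched
}
```

Because only the decoded view is altered, the original bytes in `str` stay available for re-inspection, which is the point made above about remaining in control of the original text.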
On 2/18/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> Well, gedit is a nice example of why just throwing exception is not good enough for many apps (editors in particular). The fact that it's piece of junk might be irrelevant ;)

OT: Considering how many big-budget events (World Cup / Olympics) do such a poor job at displaying any kind of unicode text (e.g. they frequently display č/ć/đ as c/c/dj), the only thing that could be worse is a big red dialog box, lol!
Feb 18 2014
On Tue, 18 Feb 2014 12:14:58 +0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> In a sense, \uFFFD means broken encoding.

In a sense yes, in another no. It is a defined code point and it has a symbol: � a diamond with a question mark inside.

> What about lone surrogates?

Those are actual broken encoding.

> Private use symbols that must not occur in transmission?

Then that "transmission" seems to exclude private symbols. It may also exclude special characters like \uFFFD. That's part of the particular protocol and should be handled there.

> They all displayed in various Unicode listings. About 'playing on broken strings' - ignoring broken/partially broken strings, I specifically think that it's what most users/use cases want.
>
> A more useful and sensible default of decoding is to substitute on broken encoding. And it's a standard procedure. It's particularly better for displaying text.

Correct. I just don't agree that displaying text should be the one true use case and instead prefer exceptions instead of silent loss of information as the default.

> To remind: since it's only a decode you are still in the control of original text - in fact you may re-test what bytes are there IF you want.
>
> The way of "throw on bad encoding" could be useful but I hardly see it as what you want for default.
>
> I'm wary of breaking code that relies on throwing. For the moment I think the best course of action would be to introduce xdecode or some such that will do substitution on failure, see how it floats and then change ranges/foreach etc to use xdecode.

We won't convince each other. Let's just stop here.

--
Marco
Feb 18 2014
On 2/7/14, Jonathan M Davis <jmdavisProg gmx.com> wrote:
> However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO.

I suggested we would introduce an overload, not replace the existing function, so this isn't an issue.

> The problem is that you need to check it. This is _slower_ than exceptions in the normal case, as invalid Unicode should be the rare case.

Do you have any benchmarks for this? I have vague memory about complaining that the exception code is *de-facto* slower, regardless of input. But I'll try to provide some test-cases later and see where we're at.
Feb 08 2014
08-Feb-2014 12:20, Andrej Mitrovic wrote:
> On 2/7/14, Jonathan M Davis <jmdavisProg gmx.com> wrote:
>> However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO.
> I suggested we would introduce an overload, not replace the existing function, so this isn't an issue.
>> The problem is that you need to check it. This is _slower_ than exceptions in the normal case, as invalid Unicode should be the rare case.
> Do you have any benchmarks for this? I have vague memory about complaining that the exception code is *de-facto* slower, regardless of input. But I'll try to provide some test-cases later and see where we're at.

Just be sure to test on LDC or GDC. DMD results are irrelevant to the performance-minded of our community. Also be sure to copy the whole code involved in a single file, not link to Phobos.

People tend to throw figures like ~10% slower with exceptions turned on but you'll never know what exactly they test.

--
Dmitry Olshansky
Feb 08 2014
On 2/7/14, 8:29 AM, Andrej Mitrovic wrote:
> On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
>> Add a bugzilla and let's define isValid that returns bool!
> Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code but assigns the return value through another parameter.

.toBugzilla()

Andrei
Feb 07 2014
On Friday, 7 February 2014 at 03:14:45 UTC, Sean Kelly wrote:
>> pure @safe void validate(S)(in S str) if (isSomeString!S);
>> Throws: UTFException if str is not well-formed.
> And somewhere in the world, darkness fell forever on a bright and beautiful countryside. The monsters poured forth and devoured everything in sight, given strength by that unbelievable abomination of a function design.

True words indeed! To sum up this small thread: I am perfectly OK with exceptions not showing in -vgc if we also agree on cleaning up Phobos from control flow exceptions.
Feb 07 2014
On Thursday, February 06, 2014 22:20:37 Dicebot wrote:
> On Thursday, 6 February 2014 at 22:18:10 UTC, Brad Anderson wrote:
>> You should probably validate utf from all foreign sources. Catch a problem with it as it comes in rather than in some arbitrary part of your program.
> pure @safe void validate(S)(in S str) if (isSomeString!S);
> Throws: UTFException if str is not well-formed.
>
> ;)

In general, I think that throwing on malformed Unicode is a good thing, because it results in code that's less error-prone (as the alternative is to not validate Unicode and try and continue somehow regardless of bad input when decoding Unicode, which would be very bad IMHO).

That being said, validating strings when they enter the program is a good way to localize any failures - which is where validate would come in - and I have to agree that the fact that validate throws is horrific. It's a classic example of a function that should return a bool rather than throw. You're asking it whether the string is valid, not asking to report errors when your normal control flow encounters an error that prevents it from functioning normally (which is where exceptions should normally be used).

As such, I think that it's clear that we need a new function to replace it (e.g. isValidUnicode). I'll have to take a look at it. If I'm lucky, it won't even take all that long to implement.

- Jonathan M Davis
Feb 07 2014
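Editorial note: to make "a function that should return a bool rather than throw" concrete, here is a hand-rolled sketch of a non-throwing, non-allocating UTF-8 check. It is an illustration under assumptions, not the function that was later added to Phobos: the name `isValidUtf8` is made up, and only `char` strings are handled (no `wchar`/`dchar` variants).

```d
/// Hypothetical non-throwing validator: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s) pure nothrow @nogc @safe
{
    size_t i = 0;
    while (i < s.length)
    {
        immutable uint lead = s[i];
        size_t extra;   // continuation bytes expected after the lead byte
        uint cp;        // code point being assembled
        uint smallest;  // smallest code point allowed for this length

        if (lead < 0x80) { ++i; continue; }   // ASCII
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; smallest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; smallest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; smallest = 0x1_0000; }
        else return false;                    // stray continuation or invalid lead byte

        if (s.length - i <= extra) return false; // sequence is truncated
        foreach (k; 1 .. extra + 1)
        {
            immutable uint c = s[i + k];
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, values past U+10FFFF, and surrogates.
        if (cp < smallest || cp > 0x10_FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}

// Try it with: dmd -main -unittest -run example.d
unittest
{
    immutable char[] truncated = [cast(char) 0xC3];
    immutable char[] overlong  = [cast(char) 0xC0, cast(char) 0x80];
    immutable char[] surrogate = [cast(char) 0xED, cast(char) 0xA0, cast(char) 0x80];
    assert(isValidUtf8(""));
    assert(isValidUtf8("č/ć/đ"));
    assert(!isValidUtf8(truncated));
    assert(!isValidUtf8(overlong));
    assert(!isValidUtf8(surrogate));
}
```

A caller would check once where the data enters the program (`if (!isValidUtf8(input)) { /* reject or repair */ }`) and decide there whether to throw, substitute, or report, instead of having that choice made by the library.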
On Thursday, 6 February 2014 at 21:48:13 UTC, Dicebot wrote:
> On Thursday, 6 February 2014 at 19:54:27 UTC, Sean Kelly wrote:
>> Does this case even matter? Exceptions are not a normal function of execution, and so should happen rarely to never. And it's a time when I'd expect a delay anyway.
> Imagine intentionally crafted broken utf as user input in repeated requests. You don't have control over it. Now if Phobos would have only thrown exceptions in really _exceptional_ situations and handled broken input gracefully...

That's a tough one. Bad input typically shouldn't generate an exception, but sometimes doing so is handy from a flow control perspective (I know I know, exceptions aren't for flow control). In the few instances where I use an exception for flow control though (like core.demangle) I always use a static instance, so no allocation occurs, and it's entirely internal to the routine.

I think it's fair to say that _an_API_ shouldn't allocate and throw an exception to indicate an expected error condition. For a parser, invalid input definitely applies. So then if the user wants to throw an exception in that case, they can do so themselves. Then the choice of allocation is left to the user, not imposed on them. It's generally really easy to let the user supply a delegate to execute on error too, so they don't even necessarily have to check a return code.
Feb 06 2014
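Editorial note: the "let the user supply a delegate to execute on error" suggestion might look roughly like the following. `parseUint` and its signature are invented for illustration and do not correspond to any Phobos function; overflow handling is left out to keep the sketch short.

```d
/// Hypothetical parser: reports bad input through a caller-supplied delegate
/// and returns false, so the library itself neither allocates nor throws.
bool parseUint(const(char)[] text, out uint value,
               scope void delegate(size_t offset) onError)
{
    if (text.length == 0) { onError(0); return false; }
    foreach (i, c; text)
    {
        if (c < '0' || c > '9') { onError(i); return false; }
        value = value * 10 + (c - '0'); // overflow checking omitted for brevity
    }
    return true;
}

// Try it with: dmd -main -unittest -run example.d
unittest
{
    uint n;
    size_t badAt = size_t.max;

    assert(parseUint("1234", n, (size_t at) { badAt = at; }));
    assert(n == 1234 && badAt == size_t.max);   // delegate was never called

    assert(!parseUint("12x4", n, (size_t at) { badAt = at; }));
    assert(badAt == 2);                         // offset of the offending character
}
```

A caller who prefers exceptions can still throw from inside the delegate; the difference is that the allocation policy is then the caller's choice rather than the API's.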
Dicebot:
> Now if Phobos would have only thrown exceptions in really _exceptional_ situations and handled broken input gracefully...

I wrote two small ideas to reduce throwing exceptions in Phobos:
http://d.puremagic.com/issues/show_bug.cgi?id=6840
http://d.puremagic.com/issues/show_bug.cgi?id=11913

Bye,
bearophile
Feb 07 2014
On 2/6/2014 11:54 AM, Sean Kelly wrote:
> Does this case even matter? Exceptions are not a normal function of execution, and so should happen rarely to never. And it's a time when I'd expect a delay anyway.

Right. If you're:

1. using throws as control flow logic
2. requiring a throw in a performance critical loop to be performance critical
3. doing so many throws that the garbage collector needs to run to clean them up

you're doing it wrong.

I'm tempted to say that the throw expression can call 'new' even if the function is marked as @nogc.
Feb 06 2014
On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
> Right. If you're:
> 1. using throws as control flow logic
> [...]
> you're doing it wrong.

I disagree.

REST based web services tend to use throws all the time. It is an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.
Feb 06 2014
On Friday, 7 February 2014 at 01:31:17 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
>> Right. If you're:
>> 1. using throws as control flow logic
>> [...]
>> you're doing it wrong.
> I disagree.
>
> REST based web services tend to use throws all the time. It is a an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.

I think in the case of people using exceptions for control flow a GC.free in your exception handler would suffice for preventing the GC heap from growing to the point where collection times become a concern.
Feb 06 2014
On 2/6/2014 5:31 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:
> On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
>> Right. If you're:
>> 1. using throws as control flow logic
>> [...]
>> you're doing it wrong.
> I disagree.
>
> REST based web services tend to use throws all the time. It is a an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.

They're going to be slow when you do it that way.
Feb 06 2014
On Friday, 7 February 2014 at 02:42:14 UTC, Walter Bright wrote:
> They're going to be slow when you do it that way.

How slow is slow? Is it slower than in Go and Python?

Why would unwinding 8 stack frames be so slow? Is it a language mandated speed issue or just a runtime issue that could be fixed with a compiler switch?

Most of the time is spent waiting for async request from memcaches/databases and other types of network traffic so you usually have some free cycles on a decent CPU. With native code and lightweight threads (coroutines) you should be able to handle 100+ concurrent requests per process.
Feb 07 2014
On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
> usually have some free cycles on a decent CPU. With native code and lightweight threads (coroutines) you should be able to handle 100+ concurrent requests per process.

When I think of it you could probably just push the RESTException throwing coroutine onto a "delayed request queue" since a timeout on a transaction might be no worse than aborting it (or carry along some kind of context object). That would make DoS less problematic too and you get better latency for good requests and complete the bad requests when you are idle.
Feb 07 2014
On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
> Is it a language mandated speed issue?

It is assumed by http://dlang.org/errors.html
Feb 07 2014
On Friday, 7 February 2014 at 11:41:43 UTC, Dicebot wrote:
> On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
>> Is it a language mandated speed issue?
> It is assumed by http://dlang.org/errors.html

P.S. Throwing exception is not that slow in D, it is allocating new instance that makes a huge impact.
Feb 07 2014
On 2/7/2014 3:42 AM, Dicebot wrote:
> P.S. Throwing exception is not that slow in D, it is allocating new instance that makes a huge impact.

Throwing speed can vary greatly from platform to platform. The idea, as in C++, is when there's a speed tradeoff between throw/catch speed and compromising speed to handle the possibility of exceptions, the non-throw case gets priority.
Feb 07 2014
On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
> How slow is slow? Is it slower than in Go and Python?

One problem with allocating the exception is the stop-the-world thing. My cgi.d's built in httpd does some allocations in its constructor, which is run once per request. It can answer requests at a rate of about 6000/sec on my computer...

Until the allocation have gone too much and the GC starts running. Then all the pending requests stop, killing the throughput.

(BTW, interestingly, on Linux it uses separate process pools instead of threads. The GC does NOT stop the world since the other processes can keep going. But, if the requests are fairly uniform - as is typically the case with benchmarks - each process hits the GC threshold at about the same time.... ironically, it is the deterministic nature of the GC that leads to the performance killer there.)
Feb 07 2014
On Friday, 7 February 2014 at 15:33:01 UTC, Adam D. Ruppe wrote:
> On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
>> How slow is slow? Is it slower than in Go and Python?
> One problem with allocating the exception is the stop-the-world thing. My cgi.d's built in httpd does some allocations in its constructor, which is run once per request. It can answer requests at a rate of about 6000/sec on my computer...
>
> Until the allocation have gone too much and the GC starts running. Then all the pending requests stop, killing the throughput.
>
> (BTW, interestingly, on Linux it uses separate process pools instead of threads. The GC does NOT stop the world since the other processes can keep going. But, if the requests are fairly uniform - as is typically the case with benchmarks - each process hits the GC threshold at about the same time.... ironically, it is the deterministic nature of the GC that leads to the performance killer there.)

It's obviously not a solution, but you could change that by having each process call GC.reserve() with a different size.
Feb 07 2014
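Editorial note: the `GC.reserve()` suggestion could be tried along these lines. `core.memory.GC.reserve` is a real druntime call; everything else here (the function names, the worker index, the 32 MiB base and 8 MiB step) is an arbitrary assumption for illustration. Whether different reservations really desynchronise the collections depends on the collector's heuristics, and nothing below has been measured.

```d
import core.memory : GC;

/// Hypothetical policy: worker N asks for `base + N * step` bytes up front.
size_t reserveSizeFor(size_t workerIndex,
                      size_t base = 32 * 1024 * 1024,
                      size_t step = 8 * 1024 * 1024) pure nothrow @nogc @safe
{
    return base + workerIndex * step;
}

/// Call once at worker start-up, e.g. right after the process pool forks.
void staggerGC(size_t workerIndex)
{
    // GC.reserve asks the collector to obtain memory from the OS in advance;
    // it returns the number of bytes reserved, or zero if the request failed.
    GC.reserve(reserveSizeFor(workerIndex));
}

// Try it with: dmd -main -unittest -run example.d
unittest
{
    assert(reserveSizeFor(0) == 32 * 1024 * 1024);
    assert(reserveSizeFor(3) == reserveSizeFor(0) + 3 * 8 * 1024 * 1024);
}
```

As the post itself says, this only shifts when each process hits its threshold; it does not remove the pauses.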
On Friday, 7 February 2014 at 15:33:01 UTC, Adam D. Ruppe wrote:
> One problem with allocating the exception is the stop-the-world thing.

Ok, well I guess that primarily is an issue for validation errors where you need to return detailed error reporting. "Not Found" etc can be preallocated as immutable, or?

> constructor, which is run once per request. It can answer requests at a rate of about 6000/sec on my computer...

That sounds pretty good, was that as localhost, or over a network?

> (BTW, interestingly, on Linux it uses separate process pools instead of threads. The GC does NOT stop the world since the other processes can keep going. But, if the requests are fairly uniform - as is typically the case with benchmarks - each process hits the GC threshold at about the same time.... ironically, it is the deterministic nature of the GC that leads to the performance killer there.)

You could synchronize them by calling the GC explicitly N seconds after the other process GC or, if you use a load balancer, maybe the GC could be scheduled by the load balancer or notify the load balancer (assuming all requests are short-lived). This won't work for a simulation type server though. (which is what I am most interested in)
Feb 07 2014
On Friday, 7 February 2014 at 17:10:15 UTC, Ola Fosheim Grøstad wrote:
> Ok, well I guess that primarily is an issue for validation errors where you need to return detailed error reporting. "Not Found" etc can be preallocated as immutable, or?

yeah, preallocating exceptions might be a really good idea.

> That sounds pretty good, was that as localhost, or over a network?

localhost, and it was just hello world, performance of my thing degrades kinda quickly - it never gets /bad/, but it isn't great either once it starts doing more stuff than the basics (but it is soooo easy to use! for me anyway)

> You could synchronize them by calling the GC explicitly N seconds after the other process GC or you if you use a load balancer, maybe the GC could be scheduled by the load balancer or notify the load balancer (assuming all requests are short-lived).

yeah. I'm not even sure if it would be a big deal in practice because there's often a lull anyway where the gc can get caught up (certainly not a problem for the lower traffic sites I mostly work on)
Feb 07 2014
On Friday, 7 February 2014 at 20:41:01 UTC, Adam D. Ruppe wrote:
> yeah, preallocating exceptions might be a really good idea.

I wonder if it would be possible to get better unwinding speed by only throwing a single type of exception class and only a single catch. Then do pattern matching on an embedded typefield. I.e.:

    if (e.id & MASK_5xx) {}
    if (e.id & MASK_409) {}

etc. After looking at the code for stack unwinding it seems like keeping the loops short is essential.
Feb 07 2014
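Editorial note: the "single exception class, single catch, dispatch on an embedded field" idea might be sketched as below. `RestException`, its `status` field, and `outcome` are hypothetical names, and plain comparisons on the HTTP status stand in for the bit masks mentioned above. Whether this actually unwinds faster than catching several distinct classes is exactly what the post wonders about; the sketch makes no such claim.

```d
/// Hypothetical single exception type carrying an HTTP status code.
class RestException : Exception
{
    int status;

    this(int status, string msg) pure nothrow @safe
    {
        super(msg);
        this.status = status;
    }
}

/// Runs a handler and maps any RestException to a coarse category by
/// inspecting the embedded field rather than the exception's class.
string outcome(scope void delegate() handler)
{
    try
    {
        handler();
        return "ok";
    }
    catch (RestException e) // the only catch clause
    {
        if (e.status == 409) return "conflict";
        return e.status >= 500 ? "server error" : "client error";
    }
}

// Try it with: dmd -main -unittest -run example.d
unittest
{
    assert(outcome(() {}) == "ok");
    assert(outcome(() { throw new RestException(409, "edit conflict"); }) == "conflict");
    assert(outcome(() { throw new RestException(503, "backend down"); }) == "server error");
    assert(outcome(() { throw new RestException(404, "no such item"); }) == "client error");
}
```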
On 2/7/2014 7:33 AM, Adam D. Ruppe wrote:
> On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
>> How slow is slow? Is it slower than in Go and Python?
> One problem with allocating the exception is the stop-the-world thing. My cgi.d's built in httpd does some allocations in its constructor, which is run once per request. It can answer requests at a rate of about 6000/sec on my computer...

The gc is not the real speed issue with exceptions, after all, one can preallocate the exception:

    throw new Exception();

v.s.

    e = new Exception();
    ...
    throw e;

It's the unwinding speed. Just have a look at what deh2.d has to do.
Feb 07 2014
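Editorial note: the preallocation pattern shown above in shorthand can be written out as a complete example. `BadInput`, `cachedBadInput`, and `reject` are hypothetical names. The sketch only demonstrates that there is no `new` at the throw site; it says nothing about unwinding cost, and note that the message and any attached trace information are shared by every throw of the cached object.

```d
/// Hypothetical exception type used for one expected failure.
class BadInput : Exception
{
    this(string msg) pure nothrow @safe { super(msg); }
}

// Module-level variables are thread-local in D, so each thread gets its own instance.
private BadInput cachedBadInput;

static this()
{
    cachedBadInput = new BadInput("bad input"); // the only allocation, at thread start
}

void reject()
{
    throw cachedBadInput; // no `new` at the throw site
}

// Try it with: dmd -main -unittest -run example.d
unittest
{
    Throwable first, second;
    try { reject(); } catch (BadInput e) { first = e; }
    try { reject(); } catch (BadInput e) { second = e; }
    assert(first !is null && first is second); // the same object both times
}
```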
07-Feb-2014 23:45, Walter Bright wrote:
> On 2/7/2014 7:33 AM, Adam D. Ruppe wrote:
>> On Friday, 7 February 2014 at 11:37:16 UTC, Ola Fosheim Grøstad wrote:
>>> How slow is slow? Is it slower than in Go and Python?
>> One problem with allocating the exception is the stop-the-world thing. My cgi.d's built in httpd does some allocations in its constructor, which is run once per request. It can answer requests at a rate of about 6000/sec on my computer...
> The gc is not the real speed issue with exceptions, after all, one can preallocate the exception: throw new Exception(); v.s. e = new Exception(); ... throw e;

And the standard library basically can't do this for every function.

> It's the unwinding speed. Just have a look at what deh2.d has to do.

It's deh.d or rather deh_win32./ deh_win64_posix.d and it doesn't look like _all_ that lot especially if you have no finally blocks and the only catch is the top-most catch-all. After all error codes would also have to propagate up the same call stack depth.

--
Dmitry Olshansky
Feb 07 2014
On 2/7/2014 12:51 PM, Dmitry Olshansky wrote:
> It's deh.d or rather deh_win32./ deh_win64_posix.d and it doesn't look like _all_ that lot especially if you have no finally blocks and the only catch is the top-most catch-all.

It's a heluva lot slower than "jmp".
Feb 08 2014
09-Feb-2014 02:17, Walter Bright wrote:
> On 2/7/2014 12:51 PM, Dmitry Olshansky wrote:
>> It's deh.d or rather deh_win32./ deh_win64_posix.d and it doesn't look like _all_ that lot especially if you have no finally blocks and the only catch is the top-most catch-all.
> It's a heluva lot slower than "jmp".

If you can show me how a single unconditional jump propagates error code 4 calls up the stack I'm sold. I do understand it's slow, it's not that slow to make difference in the discussed case. It's all about jumping to the wrong conclusions.

To put it in one pitch: it should be possible to throw/catch in excess of 100k exceptions per second no problem at all (assuming a single core of some run of the mill modern CPU). Nobody is asking to optimize it better than the normal flow.

--
Dmitry Olshansky
Feb 09 2014
On 2/9/2014 2:17 AM, Dmitry Olshansky wrote:
> If you can show me how a single unconditional jump propagates error code 4 calls up the stack I'm sold. I do understand it's slow, it's not that slow to make difference in the discussed case. It's all about jumping to the wrong conclusions.
>
> To put it in one pitch: it should be possible to throw/catch in excess of 100k exceptions per second no problem at all (assuming a single core of some run of the mill modern CPU). Nobody is asking to optimize it better then the normal flow.

It's the table lookup that's inherently slow.
Feb 10 2014
On Friday, 7 February 2014 at 01:31:17 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
>> Right. If you're: 1. using throws as control flow logic
> [...]
>> you're doing it wrong.
> I disagree. REST based web services tend to use throws all the time. It is an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.

But let this be up to the programmer working on the service, not imposed on them by the API. Then if they run into something like this DoS issue they can fix it. My experience with these services is that performance is critical and bad input is common, because people are always trying to hack your shit.

Where I work, people are serious about performance, our daily volume is ridiculous, and our goal is five nines of uptime across the board. At the same time, really good asynchronous programmers are about as rare as water on the moon. So something like vibe.d, where mid-level programmers could write correct code that still performs well thanks to the underlying event model, would be a godsend. But only if I really can get what I pay for.

The thing I think a lot of people don't realize these days is that performance per watt is just about the most important thing there is. Data centers are expensive, slow to build, and rack space is limited. If you can find a way to increase the concurrent load per box by, say, an order of magnitude by choosing a different language or programming model or whatever, there's a real economic motivation to do so. Java gets by by having a really good GC and a low barrier of entry, but its scalability is really pretty poor all things considered. On the other hand, C/C++ scales tremendously but then you're stuck with the burden those languages impose in terms of semantic complexity, bug frequency, and so on.

D seems really promising here but can't rely on having a fantastic incremental GC like Java, and so I think it's a mistake to use Java as a model for how to manage memory. And maybe Java just got it wrong anyway. I know some people who had to go to ridiculous lengths to avoid GC collection cycles in Java because a collection in the app took _20_seconds_ to complete. Now maybe the application was poorly designed or they should have been using an aftermarket GC, but even so.

Finally, library programming is the one place where premature optimization really is a good idea, because you can never be sure how people will be using your code. That allocation may not be a big deal to you or 98% of your users, but for the one big client who calls that routine in a tight inner loop or operates at volumes you never conceived of it's a deal breaker. I really don't want Phobos to be the deal breaker :-)
Feb 06 2014
On Friday, 7 February 2014 at 01:31:17 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 7 February 2014 at 01:23:44 UTC, Walter Bright wrote:
>> Right. If you're: 1. using throws as control flow logic
> [...]
>> you're doing it wrong.
> I disagree. REST based web services tend to use throws all the time. It is an effective and clean way to break all transactions that are in progress throughout the call chain when you cannot carry through a request, or if the request returns nothing.

And it is horrible. Exceptions were never designed for this. Try benchmarking a trivial vibe.d REST service that looks up an entry in an array and throws a 404 upon failure. The difference in performance between "all requests are 200" and "all requests are 404" will be an order of magnitude.
Feb 07 2014
On 2/6/14, 5:23 PM, Walter Bright wrote:
> I'm tempted to say that the throw expression can call 'new' even if the function is marked as @nogc.

That's extreme. A better possibility is to allocate exceptions from a different heap and proclaim that the heap is cleaned once all catch blocks are left. (I'm sure we can find something better, but now is not the time to worry about it.)
Andrei
Feb 06 2014
On Friday, 7 February 2014 at 02:19:42 UTC, Andrei Alexandrescu wrote:
> A better possibility is to allocate exceptions from a different heap and proclaim that the heap is cleaned once all catch blocks are left.

I wrote a quick proof of concept of this that can be tested right now: http://arsdnet.net/dcode/except.d
It hooks _d_newclass to allocate Throwables on a little static bump-the-pointer array. Each catch block has a scope(success) in it that zeroes the throwables area back out.
Feb 06 2014
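The idea in the post above (a fixed region for Throwables that is wiped when the catch block exits) can be sketched without touching the runtime. This is not the code at the linked URL and it does not hook _d_newclass: `arena`, `used`, `make` and `resetArena` are hypothetical names, and the exception is placed explicitly with `std.conv.emplace`. Treat it as an illustration of the bump-the-pointer scheme only.

```d
import std.conv : emplace;
import std.stdio : writeln;

// Hypothetical thread-local arena handed out bump-the-pointer style.
align(16) ubyte[4096] arena;
size_t used;

// Constructs a Throwable of type T inside the arena instead of on the GC heap.
T make(T, Args...)(Args args) if (is(T : Throwable))
{
    enum size = __traits(classInstanceSize, T);
    enum aligned = (size + 15) & ~cast(size_t) 15; // keep every chunk 16-byte aligned
    assert(used + aligned <= arena.length, "exception arena exhausted");
    auto chunk = arena[used .. used + aligned];
    used += aligned;
    return emplace!T(chunk, args);
}

// Zeroes the used part of the arena and rewinds the bump pointer.
void resetArena()
{
    arena[0 .. used] = 0;
    used = 0;
}

void main()
{
    try
    {
        throw make!Exception("not found");
    }
    catch (Exception e)
    {
        // Wipe the arena once the catch block is left normally.
        scope (success) resetArena();
        writeln(e.msg);
    }
    assert(used == 0);
}
```

Any reference to `e` that outlives the catch block dangles after `resetArena()`, which is exactly the objection raised in the reply that follows.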
On 2/6/2014 6:19 PM, Andrei Alexandrescu wrote:
> On 2/6/14, 5:23 PM, Walter Bright wrote:
>> I'm tempted to say that the throw expression can call 'new' even if the function is marked as @nogc.
> That's extreme. A better possibility is to allocate exceptions from a different heap and proclaim that the heap is cleaned once all catch blocks are left. (I'm sure we can find something better, but now is not the time to worry about it.)

That doesn't work, as nothing prevents code from squirreling away the caught exception object handle.
Feb 07 2014
On Friday, 7 February 2014 at 08:32:04 UTC, Walter Bright wrote:
> That doesn't work, as nothing prevents code from squirreling away the caught exception object handle.

scope would. I'm just saying. We could also just document it as undefined behavior and leave matters in the user's hands, but this wouldn't jibe nicely with @safe :(
Feb 07 2014
On Friday, 7 February 2014 at 15:41:59 UTC, Adam D. Ruppe wrote:
> On Friday, 7 February 2014 at 08:32:04 UTC, Walter Bright wrote:
>> That doesn't work, as nothing prevents code from squirreling away the caught exception object handle.
> scope would. I'm just saying. We could also just document it as undefined behavior and leave matters in the user's hands, but this wouldn't jibe nicely with @safe :(

Thread stores an uncaught exception reference so it can be rethrown on join(). But I suppose a case could be made that an uncaught exception could either be discarded or abort the app.
Feb 07 2014
On Friday, 7 February 2014 at 15:44:08 UTC, Sean Kelly wrote:
> But I suppose a case could be made that an uncaught exception could either be discarded or abort the app.

It could also make a copy at that time onto the regular GC heap and store that (the members of the throwable class are still GC'd, so all the store function has to do is a shallow copy, using the RTTI to get the correct size to copy, onto the GC heap). It'd surely be fewer exceptions to get through that than the thrown, caught, and subsequently discarded typical case.
Feb 07 2014
On Friday, 7 February 2014 at 15:48:56 UTC, Adam D. Ruppe wrote:
> It could also make a copy at that time on to the regular GC heap and store that

lol just add in a quick call to .toGC when you want to store it:

    // Shallow-copies a class instance onto the regular GC heap.
    T toGC(T)(T t) if (is(T == class))
    {
        import core.memory;
        // Instance size taken from RTTI (.init was later renamed .initializer).
        auto size = typeid(t).init.length;
        auto ptr = GC.malloc(size);
        ptr[0 .. size] = (cast(void*) t)[0 .. size];
        return cast(T) ptr;
    }
Feb 07 2014
Walter Bright <newshound2 digitalmars.com> writes:
> On 2/6/2014 6:19 PM, Andrei Alexandrescu wrote:
>> On 2/6/14, 5:23 PM, Walter Bright wrote:
>>> I'm tempted to say that the throw expression can call 'new' even if the function is marked as @nogc.
>> That's extreme. A better possibility is to allocate exceptions from a different heap and proclaim that the heap is cleaned once all catch blocks are left. (I'm sure we can find something better, but now is not the time to worry about it.)
> That doesn't work, as nothing prevents code from squirreling away the caught exception object handle.

Very naive question (that may have already been answered), but why can't throw use structs instead of classes? Then the exception would propagate by copy rather than passing the object up the stack?
Feb 07 2014
On Friday, 7 February 2014 at 18:28:24 UTC, Jerry wrote:
> throw use structs instead of classes?

I think that'd be more costly and would mess up the whole inheritance checks; catch(Exception) wouldn't catch the same children.
Feb 07 2014
Jerry:
> throw use structs instead of classes?

This thread discusses the (low) performance of D exceptions, and suggests some ideas: https://d.puremagic.com/issues/show_bug.cgi?id=9584
Another thread: https://d.puremagic.com/issues/show_bug.cgi?id=9581
The thread also discusses an old idea from Java: http://www.javaspecialists.eu/archive/Issue187.html
Bye, bearophile
Feb 07 2014
On Friday, 7 February 2014 at 18:45:24 UTC, bearophile wrote:
> Jerry:
>> throw use structs instead of classes?
> This thread discusses the (low) performance of D exceptions, and suggests some ideas: https://d.puremagic.com/issues/show_bug.cgi?id=9584
> Another thread: https://d.puremagic.com/issues/show_bug.cgi?id=9581
> The thread also discusses an old idea from Java: http://www.javaspecialists.eu/archive/Issue187.html

Okay, I'm going to look into generating traces lazily. I think it should be possible.
Feb 07 2014
On Thursday, 6 February 2014 at 18:52:21 UTC, fra wrote:
> Hey, wait a second. How do you throw without allocating?

Throw a pre-allocated thread-local exception. And make a deep copy of it if it is going to be put into an exception chain, to avoid modifying one already in the chain. I have been told in that PR that some language features assume exception instances are always unique and rely on it. It sounds like a major language design flaw that will block usage of Phobos in memory-caring code even if other issues are taken care of. Probably the language spec should be relaxed to fix this.
Feb 06 2014
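The pre-allocation suggested above could look roughly like this. It is a minimal sketch under stated assumptions: `cached` and `lookup` are hypothetical names, module-level variables in D are thread-local by default, and the deep copy for exception chaining mentioned in the post is not covered. The runtime may still do work (and possibly allocate) when it captures a stack trace at the first throw.

```d
import std.stdio : writeln;

// Hypothetical thread-local exception instance, reused on every failure.
Exception cached;

static this()
{
    // Runs once per thread; this is the only `new` in the program.
    cached = new Exception("lookup failed");
}

int lookup(const int[] table, size_t index)
{
    if (index >= table.length)
        throw cached; // no `new` on the failure path
    return table[index];
}

void main()
{
    auto table = [10, 20, 30];
    foreach (i; 0 .. 5)
    {
        try
        {
            writeln(lookup(table, i));
        }
        catch (Exception e)
        {
            // Every failure hands back the very same object.
            assert(e is cached);
            writeln("caught: ", e.msg);
        }
    }
}
```

Because every failure shares one object, any code that stores or chains the caught exception sees it mutated by later throws, which is the uniqueness assumption the post refers to.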
On Thursday, 6 February 2014 at 18:20:56 UTC, Andrei Alexandrescu wrote:
> On 2/6/14, 10:05 AM, Johannes Pfau wrote:
>> On Thu, 06 Feb 2014 16:32:08 +0000, "Dicebot" <public dicebot.lv> wrote:
>>> On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu wrote:
>>>> Would anyone be willing to take on the ingrate task of creating a comprehensive list with all Phobos functions (and more generally artifacts) that allocate memory? That would help a lot with focusing the discussion. Andrei
>>> Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
>> That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-)
> Please close if you plan to rewrite.
>> One interesting point is that modules that were written with avoiding allocations in mind usually still allocate when throwing exceptions.
> Good point, we need to address that as well.
> Andrei

I'd think fixing that is probably above and beyond what is required to satisfy most people. If you are throwing so many exceptions that GC pauses are a problem you've got more serious problems than the GC. nothrow doesn't concern itself with Error exceptions, I think @nogc should just ignore exceptions generally.
Feb 06 2014
On 6 February 2014 18:05, Johannes Pfau <nospam example.com> wrote:
> On Thu, 06 Feb 2014 16:32:08 +0000, "Dicebot" <public dicebot.lv> wrote:
>> On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu wrote:
>>> Would anyone be willing to take on the ingrate task of creating a comprehensive list with all Phobos functions (and more generally artifacts) that allocate memory? That would help a lot with focusing the discussion. Andrei
>> Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
> That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-) One interesting point is that modules that were written with avoiding allocations in mind usually still allocate when throwing exceptions. Here's some example output for std.uuid/digest/path/range/algorithm/curl: http://dpaste.dzfl.pl/96d3725b06e2

That message will look much better with vcolumns. ;) Albeit, it also depends on moving fprint(global.stdmsg, ...) => message(...) http://dpaste.dzfl.pl/5b1961918ed6
Feb 06 2014
On 6 February 2014 19:03, Iain Buclaw <ibuclaw gdcproject.org> wrote:
> On 6 February 2014 18:05, Johannes Pfau <nospam example.com> wrote:
>> On Thu, 06 Feb 2014 16:32:08 +0000, "Dicebot" <public dicebot.lv> wrote:
>>> On Thursday, 6 February 2014 at 16:28:25 UTC, Andrei Alexandrescu wrote:
>>>> Would anyone be willing to take on the ingrate task of creating a comprehensive list with all Phobos functions (and more generally artifacts) that allocate memory? That would help a lot with focusing the discussion. Andrei
>>> Merging https://github.com/D-Programming-Language/dmd/pull/1886 and running phobos unit tests should make it relatively simple, at least for a first pass.
>> That's only for implicit allocations though. And please, don't merge yet, it'll get another rewrite this weekend ;-) One interesting point is that modules that were written with avoiding allocations in mind usually still allocate when throwing exceptions. Here's some example output for std.uuid/digest/path/range/algorithm/curl: http://dpaste.dzfl.pl/96d3725b06e2
> That message will look much better with vcolumns. ;) Albeit, it also depends on moving fprint(global.stdmsg, ...) => message(...) http://dpaste.dzfl.pl/5b1961918ed6

Saying that, it seems it doesn't show the column number correctly. http://dpaste.dzfl.pl/31c8800e223a
Feb 06 2014
Johannes Pfau:
> Here's some example output for std.uuid/digest/path/range/algorithm/curl: http://dpaste.dzfl.pl/96d3725b06e2
> ./dmd -vgc ~/Dokumente/d/phobos/std/range.d -c -unittest
> /home/jpf/Dokumente/d/phobos/std/range.d(7307): vgc: Array literals cause gc allocation

For some time now, dynamic array literals have not allocated in some cases. And there's also this: https://github.com/D-Programming-Language/dmd/pull/2952 where the [1, 2]s syntax guarantees no heap allocation.
Bye, bearophile
Feb 06 2014
On Thursday, 6 February 2014 at 20:40:28 UTC, bearophile wrote:
> Johannes Pfau:
>> Here's some example output for std.uuid/digest/path/range/algorithm/curl: http://dpaste.dzfl.pl/96d3725b06e2
>> ./dmd -vgc ~/Dokumente/d/phobos/std/range.d -c -unittest
>> /home/jpf/Dokumente/d/phobos/std/range.d(7307): vgc: Array literals cause gc allocation
> For some time now, dynamic array literals have not allocated in some cases. And there's also this: https://github.com/D-Programming-Language/dmd/pull/2952 where the [1, 2]s syntax guarantees no heap allocation.
> Bye, bearophile

My pull was not perfect. And I have no time to finish the type[$] and auto[$] pull. :/
Feb 06 2014
On Saturday, February 08, 2014 09:20:15 Andrej Mitrovic wrote:
> On 2/7/14, Jonathan M Davis <jmdavisProg gmx.com> wrote:
>> However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO.
> I suggested we would introduce an overload, not replace the existing function, so this isn't an issue.
>> The problem is that you need to check it. This is _slower_ than exceptions in the normal case, as invalid Unicode should be the rare case.
> Do you have any benchmarks for this? I have vague memory about complaining that the exception code is *de-facto* slower, regardless of input. But I'll try to provide some test-cases later and see where we're at.

The exception version has to do all of the same checks that the version which returns an error value would have to do, while the one returning an error value which had to be checked for validity would have an extra check. So, the only ways that the exception version would be slower are if the plumbing for being able to throw an exception from the function makes it slower (assuming that the other would be nothrow) or if the optimizer just does worse with the exception one for some reason, because the number of operations that the actual D code would be doing in the successful case would be greater for the non-throwing version. Code generation can do entertaining things to efficiency though, so benchmarking would be required to see what would actually happen.

However, as I stated in another post, I've reconsidered the situation. I think that I misunderstood what Dmitry was suggesting and that checking the error value is not actually necessary: http://forum.dlang.org/post/mailman.66.1391838333.21734.digitalmars-d puremagic.com

And if that's the case, then we can probably move towards having decode not throw and possibly getting rid of UTFException altogether (certainly, most code wouldn't throw it or have to worry about it, since decode and stride are the two main cases where that's a concern, and if they don't throw anymore, then UTFException would have very little use).
- Jonathan M Davis
Feb 08 2014
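The two decoding styles being compared can be put side by side. The sketch below assumes a recent Phobos in which `std.utf.decode` accepts a `useReplacementDchar` flag; that flag is not something the posts above describe, and `decodesCleanly` and `countReplaced` are hypothetical helper names. With the flag set, invalid input yields U+FFFD instead of a `UTFException`, so the caller has no error value it is forced to check.

```d
import std.typecons : Yes;
import std.utf : decode, UTFException;

// Throwing style: invalid UTF-8 surfaces as a UTFException.
bool decodesCleanly(string s)
{
    size_t i = 0;
    try
    {
        while (i < s.length)
            decode(s, i);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

// Non-throwing style: invalid sequences decode to U+FFFD and decoding continues.
size_t countReplaced(string s)
{
    size_t i = 0;
    size_t replaced = 0;
    while (i < s.length)
    {
        if (decode!(Yes.useReplacementDchar)(s, i) == '\uFFFD')
            ++replaced;
    }
    return replaced;
}

void main()
{
    assert(decodesCleanly("abc"));
    assert(!decodesCleanly("a\xFFb")); // 0xFF never appears in valid UTF-8
    assert(countReplaced("abc") == 0);
    assert(countReplaced("a\xFFb") >= 1);
}
```

Timing both functions over mostly valid input would be one way to produce the benchmark asked for in the quoted exchange.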