digitalmars.D.bugs - [Issue 8725] New: segmentation fault with negative-lookahead in module-level regex
- d-bugmail puremagic.com (38/38) Sep 25 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8725
- d-bugmail puremagic.com (12/12) Sep 25 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8725
- d-bugmail puremagic.com (51/51) Sep 26 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8725
- d-bugmail puremagic.com (19/19) Sep 26 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8725
- d-bugmail puremagic.com (13/13) Nov 30 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8725
- d-bugmail puremagic.com (10/10) Dec 01 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8725
http://d.puremagic.com/issues/show_bug.cgi?id=8725

           Summary: segmentation fault with negative-lookahead in module-level regex
           Product: D
           Version: D2
          Platform: x86_64
        OS/Version: Mac OS X
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: val markovic.io

--- Comment #0 from Val Markovic <val markovic.io> 2012-09-25 22:31:39 PDT ---
The following program crashes with a segmentation fault:

-------------
#!/usr/bin/env rdmd
import std.stdio;
import std.regex;

auto italic = regex( r"\* (?!\s+) (.*?) (?!\s+) \*", "gx" );

void main() {
    string input = "this * is* interesting, *very* interesting";
    writeln( replace( input, italic, "<i>$1</i>" ) );
}
--------------

If one removes the first line with (?!\s+), then the program doesn't crash.

I was under the impression that this snippet of code operates under the SafeD
subset and therefore shouldn't cause a segmentation fault. A thrown exception on
problems or something, that I can understand. But a segfault?

In other sad news, these are the first lines of D I've ever written :( ... so
much for experimentation...

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 25 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8725

--- Comment #1 from Val Markovic <val markovic.io> 2012-09-25 22:33:03 PDT ---
Oh, and the segfault goes away if I put the regex creation directly in the call,
like so:

writeln( replace( input, regex( r"\* (?!\s+) (.*?) (?!\s+) \*", "gx" ), "<i>$1</i>" ) );
Sep 25 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8725

Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com

--- Comment #2 from Dmitry Olshansky <dmitry.olsh gmail.com> 2012-09-26 06:46:49 PDT ---
I suspect this is the long-standing bug in compile-time evaluation: the compiler
parses the regex pattern incorrectly at compile time (unlike at run time).
See also:
http://d.puremagic.com/issues/show_bug.cgi?id=7810

The problem is that once the D compiler sees an initialized global variable, it
has to const-fold it:

int fact10 = factorial(10); // will compute and hardcode the value of factorial(10)

Then with regex:

auto italic = regex( ... ); // *parses* the pattern and *generates* the compiled
                            // regex object, with all the data structures for matching it

All of this happens *at compile time* via CTFE; see the description near the
bottom of:
http://dlang.org/function.html

Previously, though, this only caused unexpectedly long compilation times (CTFE is
slow), and in a few select cases it failed with an assert *during compilation*;
it never segfaulted. Probably the internal structure has a subtle corruption that
the self-test failed to catch.

E.g. this one also works, because the italic regex is created at run time:

import std.stdio;
import std.regex;

void main() {
    auto italic = regex( r"\* (?!\s+) (.*?) (?!\s+) \*", "gx" );
    string input = "this * is* interesting, *very* interesting";
    writeln( replace( input, italic, "<i>$1</i>" ) );
}

Also a tip: the second lookahead should be a lookbehind! As it is, it will test
that \* is not a space, which is always true... Also, both can be just \s, because
\s+ matches whenever \s matches. And since you don't capture the contents of the
lookahead/lookbehind, it'll be faster/simpler to use a single \s.

About SafeD: it shouldn't segfault, but the program listed is @system (as this is
the default) :).
Otherwise, since regex is @trusted, it's my responsibility to verify that it is
memory safe, so blame me (or rather the compiler). To be actually in SafeD, try
putting @safe: at the top of your code, or just tag main and all functions with
@safe. AFAIK writeln in SafeD wouldn't work, as it's still @system (obviously it
should be @safe/@trusted). To be honest, SafeD hasn't been addressed properly in
the standard library yet.
Sep 26 2012
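To make the const-folding point in comment #2 concrete, here is a minimal sketch of the difference between a module-level initializer (evaluated through CTFE) and the same call made inside a function (evaluated at run time). The name factorial comes from the comment; its body here is an assumption written only for illustration.

```d
import std.stdio;

// Hypothetical implementation of the factorial named in comment #2.
ulong factorial(uint n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// Module-level initializer: the value must be known at compile time,
// so the compiler evaluates factorial(10) via CTFE and stores the result.
ulong fact10 = factorial(10);

// static assert is also evaluated at compile time, which shows that the
// function is usable in a CTFE context.
static assert(factorial(5) == 120);

void main()
{
    // Ordinary call inside a function: evaluated when the program runs.
    auto fact6 = factorial(6);
    writeln(fact10, " ", fact6);
}
```

A module-level regex( ... ) initializer goes through the same compile-time path, which is why moving the call into main avoids the problem reported here.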
http://d.puremagic.com/issues/show_bug.cgi?id=8725

--- Comment #3 from Val Markovic <val markovic.io> 2012-09-26 09:39:30 PDT ---
Thanks for the explanation! WRT the regex string being faulty, I was aware of
that; I was just experimenting when I encountered a segfault.

Thanks for the pointer about adding @safe: at the top; too bad writeln is still
@system. That kinda kills the usefulness of SafeD, doesn't it? I mean, if I
literally can't write a Hello World program in SafeD, then SafeD is quite far
from ready. :)

I read TDPL last week and this is my first encounter with writing real D code;
all in all, the language is freaking awesome (goodbye C++) and I'm even willing
to live with esoteric bugs in the compiler/libs if I can work around them. I
understand that D is still a work-in-progress language.

I intend to write a substantial (multi-KLOC) D program as a learning experience;
I will report any bugs I find as I find them.

Anyway, good luck fixing this. :)
Sep 26 2012
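The pattern tip from comment #2 (a single \s, and a lookbehind before the closing \*) can be combined with run-time construction of a module-level regex. The sketch below is not from the thread: it assumes a Phobos version that provides replaceAll (the thread itself uses the older replace), and italicize is a hypothetical helper name. It initializes the variable in a module constructor, so no CTFE is involved.

```d
import std.regex;
import std.stdio;

// Module-level variable with no initializer, so nothing is evaluated via CTFE.
Regex!char italic;

// Module constructor: runs at start-up, before main (once per thread for a
// thread-local variable like this one), so the pattern is compiled at run time.
static this()
{
    // (?!\s)  : the opening * must not be followed by whitespace
    // (?<!\s) : the closing * must not be preceded by whitespace
    // "x" lets the pattern contain insignificant spaces; "g" matches globally.
    italic = regex(r"\* (?!\s) (.*?) (?<!\s) \*", "gx");
}

// Hypothetical helper: wraps *text* spans in <i> tags.
string italicize(string input)
{
    return replaceAll(input, italic, "<i>$1</i>");
}

void main()
{
    writeln(italicize("this * is* interesting, *very* interesting"));
}
```

With this pattern only *very* should be treated as an italic span, since the two earlier asterisks are each followed by a space.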
http://d.puremagic.com/issues/show_bug.cgi?id=8725

Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

--- Comment #4 from Dmitry Olshansky <dmitry.olsh gmail.com> 2012-11-30 12:49:42 PST ---
Works with current git master.
Must have been fixed along with the compiler bug in 7810.

*** This issue has been marked as a duplicate of issue 7810 ***
Nov 30 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8725

--- Comment #5 from github-bugzilla puremagic.com 2012-12-01 00:12:43 PST ---
Commit pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/0f2947d4d1360f0a0f797279e6f13f95695e45ec
bugfixes for compile-time regex
fix issue 8725
fix issue 8349
Dec 01 2012