digitalmars.D.announce - Fuzzed - a program to find DMDFE parser crash
- Basile B. (14/17) Dec 15 2018 Fuzzed [1] is a simple fuzzer for the D programming language. It
- Johan Engelen (12/14) Dec 15 2018 Are you familiar with libFuzzer and LDC's integration?
- Basile B. (40/54) Dec 15 2018 No, but i'm not that surprised to see that a fuzzer already
- Sebastiaan Koppe (4/9) Dec 15 2018 As is common with fuzzing, you'll need to ensure the program
- Neia Neutuladh (9/20) Dec 15 2018 I think the point is that DMD tries to recover from parsing failures in
- Walter Bright (3/7) Dec 15 2018 DMD tries to continue parsing after a syntax error, but it does not atte...
- Basile B. (6/16) Dec 15 2018 The problem i underlined is more that, like in the code that
- Basile B. (19/23) Dec 15 2018 You can still continue parsing after an error but right now many
- Basile B. (15/24) Dec 15 2018 Yes this is done by piping dmd with the random code (i dont use
- Jacob Carlborg (4/6) Dec 16 2018 Does that matter as long as the bug is found?
- Stefan Koch (6/10) Dec 17 2018 Well it's hard to tell if it's begin.
- Stefan Koch (2/11) Dec 17 2018 meant to say benign.
- Sebastiaan Koppe (6/13) Dec 15 2018 Nice. In my experience fuzzing parses works very well. I have
- Walter Bright (2/3) Dec 15 2018 Great! Please post them to bugzilla.
- Jacob Carlborg (5/6) Dec 16 2018 I've used it to make a tool, DLP [1].
Fuzzed [1] is a simple fuzzer for the D programming language. It allows to detect sequences of tokens that crash the parser. While the D front end is not yet used to make tools, if this ever happens the parser will have to accept invalid code. As experienced with dparse, invalid code tend to crash more a parser because of a cognitive bias that lead us, "hoomans", to prove that things work rather than the opposite. You can run it on one your core, report the crasher programs to the project issue tracker or fix them yourself:gdb dmd run <the_crasher> btAnd then try to see what happens in the parser at the location pointed on top of the back trace. Note that you'll need to build dmd debug version. The time to write this announce, already 5 "crashers" found. [1] https://github.com/BBasile/fuzzed
Dec 15 2018
On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:Fuzzed [1] is a simple fuzzer for the D programming language.Are you familiar with libFuzzer and LDC's integration? https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html You can feed libFuzzer with a dictionary of keywords to speed up the initial fuzzing phase, where the keywords are the tokens strings that you use. Besides finding crashes, it's also good to enable ASan to find memory-related bugs that by luck didn't crash the program.The time to write this announce, already 5 "crashers" found.Great :) The other day I was reminded of OSS Fuzz and that it'd be nice if we would setup fuzzing for the frontend and phobos there... -Johan
Dec 15 2018
On Saturday, 15 December 2018 at 14:22:48 UTC, Johan Engelen wrote:On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:No, but i'm not that surprised to see that a fuzzer already exists. I may have even seen this article but completely forgot it.Fuzzed [1] is a simple fuzzer for the D programming language.Are you familiar with libFuzzer and LDC's integration? https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.htmlYou can feed libFuzzer with a dictionary of keywords to speed up the initial fuzzing phase, where the keywords are the tokens strings that you use. Besides finding crashes, it's also good to enable ASan to find memory-related bugs that by luck didn't crash the program.I have about 40 nowThe time to write this announce, already 5 "crashers" found.Great :)The other day I was reminded of OSS Fuzz and that it'd be nice if we would setup fuzzing for the frontend and phobos there... -JohanI started looking at a crasher: typeof function function in which crashes in hdrgen. Actually i realize that i don't like the D parser. In many cases it checks for errors but continues parsing unconditionally. In the example, "in" leads to an null contract that the pretty formatter dereferences at some point, but parsing should have stopped after "typeof" since there is no left paren. Now take a look at typeof sub parser AST.TypeQualified parseTypeof() { AST.TypeQualified t; const loc = token.loc; nextToken(); check(TOK.leftParentheses); // <-- why continuing if the check fails? if (token.value == TOK.return_) { nextToken(); t = new AST.TypeReturn(loc); } else { AST.Expression exp = parseExpression(); t = new AST.TypeTypeof(loc, exp); } check(TOK.rightParentheses); return t; } I think this is what Walter calls "AST poisoning" (never understood how it worked before today). And the whole parser is like this. This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.
Dec 15 2018
On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:I think this is what Walter calls "AST poisoning" (never understood how it worked before today). And the whole parser is like this. This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.As is common with fuzzing, you'll need to ensure the program crashes. Sometimes that requires some tweaking. Regardless, you still have the input to investigate.
Dec 15 2018
On Sat, 15 Dec 2018 21:09:12 +0000, Sebastiaan Koppe wrote:On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:I think the point is that DMD tries to recover from parsing failures in order to provide additional error messages. But those parsing failures leave the parser in an invalid state, and invalid states are fertile ground for crashes. The way to fix this is to replace the entire parser and get rid of the idea of AST poisoning; at the first error, you give up on parsing the entire file. From there, you can try recovering from specific errors with proper testing.I think this is what Walter calls "AST poisoning" (never understood how it worked before today). And the whole parser is like this. This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.As is common with fuzzing, you'll need to ensure the program crashes. Sometimes that requires some tweaking. Regardless, you still have the input to investigate.
Dec 15 2018
On 12/15/2018 2:48 PM, Neia Neutuladh wrote:The way to fix this is to replace the entire parser and get rid of the idea of AST poisoning; at the first error, you give up on parsing the entire file. From there, you can try recovering from specific errors with proper testing.DMD tries to continue parsing after a syntax error, but it does not attempt semantic analysis if there were any errors.
Dec 15 2018
On Sunday, 16 December 2018 at 01:57:17 UTC, Walter Bright wrote:On 12/15/2018 2:48 PM, Neia Neutuladh wrote:The problem i underlined is more that, like in the code that parses typeof, a non null node is returned even if some expectations are not verified when parsing. I'm not sure of what is the right fix. fixing the ast pretty printer or the parser ?The way to fix this is to replace the entire parser and get rid of the idea of AST poisoning; at the first error, you give up on parsing the entire file. From there, you can try recovering from specific errors with proper testing.DMD tries to continue parsing after a syntax error, but it does not attempt semantic analysis if there were any errors.
Dec 15 2018
On Saturday, 15 December 2018 at 22:48:01 UTC, Neia Neutuladh wrote:The way to fix this is to replace the entire parser and get rid of the idea of AST poisoning; at the first error, you give up on parsing the entire file. From there, you can try recovering from specific errors with proper testing.You can still continue parsing after an error but right now many sub-parsers always return an AstNode instead of null. The parser on null sub parser result could go to the end of the scope or to the next statement, depending on what it expected, and continue from there. That being said this wouldn't always work, e.g when a semi colon or a curly brace misses. Simple example: struct Foo { int a, b string c; // error because a type identifier part wasn't expected ... } // ... we're in a aggr body so consume toks past the curly brace struct Bar { }
Dec 15 2018
On Saturday, 15 December 2018 at 21:09:12 UTC, Sebastiaan Koppe wrote:On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:Yes this is done by piping dmd with the random code (i dont use dmd as a library for now). If the process returns something different of 0 (ok) and 1 (normal compiler error) than the random code is saved in a file: ... ProcessPipes pp = pipeProcess([Options.dc, "-"]); pp.stdin.writeln(src); pp.stdin.close; if (!pp.pid.wait.among(0, 1)) fileName.write(src); ... Actually it would be less convenient to do that with the front end as a library, since SEGFAULTs are supposed to kill the program...I think this is what Walter calls "AST poisoning" (never understood how it worked before today). And the whole parser is like this. This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.As is common with fuzzing, you'll need to ensure the program crashes.
Dec 15 2018
On 2018-12-15 16:37, Basile B. wrote:This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.Does that matter as long as the bug is found? -- /Jacob Carlborg
Dec 16 2018
On Sunday, 16 December 2018 at 14:24:54 UTC, Jacob Carlborg wrote:On 2018-12-15 16:37, Basile B. wrote:Well it's hard to tell if it's begin. Generally in a compiler which is focused on speed, you accept crashing an really bogus input if it makes the parser run faster, because there is no chance of accepting corrupted code, which is what you need to be worried about.This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.Does that matter as long as the bug is found?
Dec 17 2018
On Monday, 17 December 2018 at 10:12:44 UTC, Stefan Koch wrote:On Sunday, 16 December 2018 at 14:24:54 UTC, Jacob Carlborg wrote:meant to say benign.On 2018-12-15 16:37, Basile B. wrote:Well it's hard to tell if it's begin.This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.Does that matter as long as the bug is found?
Dec 17 2018
On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:Fuzzed [1] is a simple fuzzer for the D programming language. It allows to detect sequences of tokens that crash the parser. While the D front end is not yet used to make tools, if this ever happens the parser will have to accept invalid code. As experienced with dparse, invalid code tend to crash more a parser because of a cognitive bias that lead us, "hoomans", to prove that things work rather than the opposite.Nice. In my experience fuzzing parses works very well. I have good memories with afl. So much so that I once wrote a wrapper around it to handle running it distributed. See https://github.com/skoppe/afl-dist Could use a readme and a how-to though.
Dec 15 2018
On 12/15/2018 3:29 AM, Basile B. wrote:The time to write this announce, already 5 "crashers" found.Great! Please post them to bugzilla.
Dec 15 2018
On 2018-12-15 12:29, Basile B. wrote:While the D front end is not yet used to make toolsI've used it to make a tool, DLP [1]. [1] http://github.com/jacob-carlborg/dlp -- /Jacob Carlborg
Dec 16 2018