digitalmars.D.announce - Fuzzed - a program to find DMDFE parser crash

Basile B. <b2.temp gmx.com> writes:

Fuzzed [1] is a simple fuzzer for the D programming language. It
detects sequences of tokens that crash the parser. While
the D front end is not yet used to make tools, if this ever
happens the parser will have to accept invalid code. As
experienced with dparse, invalid code tends to crash a parser more
often, because of a cognitive bias that leads us, "hoomans", to prove
that things work rather than the opposite.

You can run it on one of your cores, then report the crashing programs to
the project issue tracker or fix them yourself:

 gdb dmd
 run <the_crasher>
 bt

Then look at what happens in the parser at the location
shown at the top of the backtrace. Note that you'll need a
debug build of dmd.

In the time it took to write this announcement, 5 "crashers" were already found.

[1] https://github.com/BBasile/fuzzed

Dec 15 2018

Johan Engelen <j j.nl> writes:

On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:
 Fuzzed [1] is a simple fuzzer for the D programming language.

Are you familiar with libFuzzer and LDC's integration?
https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html
You can feed libFuzzer a dictionary of keywords to speed up
the initial fuzzing phase, where the keywords are the token
strings that you use.
Besides finding crashes, it's also good to enable ASan to find 
memory-related bugs that by luck didn't crash the program.
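
A minimal sketch of what a libFuzzer entry point could look like in D. `LLVMFuzzerTestOneInput` is the standard libFuzzer entry point; `parseSource` is a hypothetical toy stand-in for the code under test and is not part of DMD or LDC. How to build with LDC's fuzzer and ASan instrumentation is described in the linked article and is not reproduced here.

```d
// Hypothetical stand-in for the code under test: accepts input only if
// its parentheses are balanced. A real harness would call the parser here.
bool parseSource(const(char)[] src)
{
    int depth = 0;
    foreach (c; src)
    {
        if (c == '(')
            ++depth;
        else if (c == ')' && --depth < 0)
            return false; // closing parenthesis without a matching opener
    }
    return depth == 0;
}

// Entry point that libFuzzer calls repeatedly with mutated inputs.
// A crash (or an ASan report) inside this call is what the fuzzer records.
extern (C) int LLVMFuzzerTestOneInput(const(ubyte)* data, size_t size)
{
    parseSource(cast(const(char)[]) data[0 .. size]);
    return 0; // libFuzzer expects 0 from the entry point
}
```

A dictionary file for libFuzzer's -dict= option holds one quoted token per line, for example kw1="typeof" and kw2="function".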

 The time to write this announce, already 5 "crashers" found.

Great :)

The other day I was reminded of OSS Fuzz and that it'd be nice if
we set up fuzzing for the frontend and phobos there...

-Johan

Dec 15 2018

Basile B. <b2.temp gmx.com> writes:

On Saturday, 15 December 2018 at 14:22:48 UTC, Johan Engelen 
wrote:
 On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:
 Fuzzed [1] is a simple fuzzer for the D programming language.

 Are you familiar with libFuzzer and LDC's integration?
 https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html

No, but I'm not that surprised to see that a fuzzer already
exists.
I may even have seen this article and completely forgotten it.

 You can feed libFuzzer with a dictionary of keywords to speed 
 up the initial fuzzing phase, where the keywords are the tokens 
 strings that you use.
 Besides finding crashes, it's also good to enable ASan to find 
 memory-related bugs that by luck didn't crash the program.

 The time to write this announce, already 5 "crashers" found.

 Great :)

I have about 40 now

 The other day I was reminded of OSS Fuzz and that it'd be nice 
 if we would setup fuzzing for the frontend and phobos there...

 -Johan

I started looking at a crasher:

    typeof function function in

which crashes in hdrgen. Actually I realize that I don't like the
D parser. In many cases it checks for errors but continues
parsing unconditionally.

In the example, "in" leads to a null contract that the pretty
formatter dereferences at some point, but parsing should have
stopped after "typeof" since there is no left parenthesis. Now take a
look at the typeof sub-parser:


     AST.TypeQualified parseTypeof()
     {
         AST.TypeQualified t;
         const loc = token.loc;

         nextToken();
         check(TOK.leftParentheses); // <-- why continue if the check fails?
         if (token.value == TOK.return_)
         {
             nextToken();
             t = new AST.TypeReturn(loc);
         }
         else
         {
             AST.Expression exp = parseExpression();
             t = new AST.TypeTypeof(loc, exp);
         }
         check(TOK.rightParentheses);
         return t;
     }

I think this is what Walter calls "AST poisoning" (I never
understood how it worked before today). And the whole parser is
like this.

This poisoning makes a fuzzer much less interesting to use: 99% of the
crashes will be in hdrgen.

Dec 15 2018

Sebastiaan Koppe <mail skoppe.eu> writes:

On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:
 I think this is what Walter calls "AST poisoning" (never 
 understood how it worked before today). And the whole parser is 
 like this.

 This poisoning kills the interest of using a fuzzer. 99% of the 
 crashes will be in hdrgen.

As is common with fuzzing, you'll need to ensure the program 
crashes. Sometimes that requires some tweaking.

Regardless, you still have the input to investigate.

Dec 15 2018

Neia Neutuladh <neia ikeran.org> writes:

On Sat, 15 Dec 2018 21:09:12 +0000, Sebastiaan Koppe wrote:
 On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:
 I think this is what Walter calls "AST poisoning" (never understood how
 it worked before today). And the whole parser is like this.

 This poisoning kills the interest of using a fuzzer. 99% of the crashes
 will be in hdrgen.

 
 As is common with fuzzing, you'll need to ensure the program crashes.
 Sometimes that requires some tweaking.
 
 Regardless, you still have the input to investigate.

I think the point is that DMD tries to recover from parsing failures in 
order to provide additional error messages. But those parsing failures 
leave the parser in an invalid state, and invalid states are fertile ground 
for crashes.

The way to fix this is to replace the entire parser and get rid of the 
idea of AST poisoning; at the first error, you give up on parsing the 
entire file. From there, you can try recovering from specific errors with 
proper testing.

Dec 15 2018

Walter Bright <newshound2 digitalmars.com> writes:

On 12/15/2018 2:48 PM, Neia Neutuladh wrote:
 The way to fix this is to replace the entire parser and get rid of the
 idea of AST poisoning; at the first error, you give up on parsing the
 entire file. From there, you can try recovering from specific errors with
 proper testing.

DMD tries to continue parsing after a syntax error, but it does not attempt 
semantic analysis if there were any errors.

Dec 15 2018

Basile B. <b2.temp gmx.com> writes:

On Sunday, 16 December 2018 at 01:57:17 UTC, Walter Bright wrote:
 On 12/15/2018 2:48 PM, Neia Neutuladh wrote:
 The way to fix this is to replace the entire parser and get 
 rid of the
 idea of AST poisoning; at the first error, you give up on 
 parsing the
 entire file. From there, you can try recovering from specific 
 errors with
 proper testing.

 DMD tries to continue parsing after a syntax error, but it does 
 not attempt semantic analysis if there were any errors.

The problem I pointed out is rather that, as in the code that
parses typeof, a non-null node is returned even when some
expectations are not met during parsing.

I'm not sure what the right fix is: fixing the AST pretty
printer or the parser?

Dec 15 2018

Basile B. <b2.temp gmx.com> writes:

On Saturday, 15 December 2018 at 22:48:01 UTC, Neia Neutuladh 
wrote:
 The way to fix this is to replace the entire parser and get rid 
 of the idea of AST poisoning; at the first error, you give up 
 on parsing the entire file. From there, you can try recovering 
 from specific errors with proper testing.

You can still continue parsing after an error, but right now many
sub-parsers always return an AST node instead of null. When a
sub-parser returns null, the parser could skip to the end of the scope or to
the next statement, depending on what it expected, and continue
from there. That being said, this wouldn't always work, e.g. when a
semicolon or a curly brace is missing.

Simple example:

     struct Foo
     {
         int a, b
         string c; // error because a type identifier part wasn't expected ...
     }             // ... we're in an aggr body so consume toks past the curly brace

     struct Bar
     {
     }
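
The recovery idea above can be sketched with a toy parser. Everything here (`StructDecl`, `Parser`, `expect`, `skipPastClosingBrace`, the one-token "type" grammar) is hypothetical and far simpler than DMD's parser; it only illustrates returning null on a failed expectation and resynchronizing past the closing curly brace.

```d
// Toy AST node, not DMD's.
class StructDecl
{
    string name;
    string[] fields;
    this(string name) { this.name = name; }
}

struct Parser
{
    string[] toks; // pre-lexed tokens
    size_t pos;
    int errors;

    string peek() { return pos < toks.length ? toks[pos] : ""; }

    // Consumes `t` if it is the current token, otherwise records an error.
    bool expect(string t)
    {
        if (peek() == t) { ++pos; return true; }
        ++errors;
        return false;
    }

    // Skips tokens until the brace that closes the current body is consumed.
    void skipPastClosingBrace()
    {
        int depth = 1;
        while (pos < toks.length && depth > 0)
        {
            if (toks[pos] == "{") ++depth;
            else if (toks[pos] == "}") --depth;
            ++pos;
        }
    }

    // Parses `struct Name { type a, b; ... }`.
    // Returns null instead of a half-built node when an expectation fails.
    StructDecl parseStruct()
    {
        if (!expect("struct")) return null;
        auto decl = new StructDecl(peek());
        ++pos;
        if (!expect("{")) return null;
        while (peek() != "}" && peek() != "")
        {
            ++pos;                        // type (any single token in this toy grammar)
            decl.fields ~= peek(); ++pos; // first field name
            while (peek() == ",") { ++pos; decl.fields ~= peek(); ++pos; }
            if (!expect(";"))
            {
                skipPastClosingBrace();   // resynchronize after the aggregate body
                return null;
            }
        }
        if (!expect("}")) return null;    // input ended inside the body
        return decl;
    }

    // Parses every declaration, keeping only the ones that parsed cleanly.
    StructDecl[] parseAll()
    {
        StructDecl[] result;
        while (pos < toks.length)
        {
            const start = pos;
            auto d = parseStruct();
            if (d !is null) result ~= d;
            else if (pos == start) ++pos; // guarantee progress on garbage input
        }
        return result;
    }
}
```

With the tokens of the example above, `Foo` is dropped with one error and `Bar` is still parsed, so no partially built node reaches later stages.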

Dec 15 2018

Basile B. <b2.temp gmx.com> writes:

On Saturday, 15 December 2018 at 21:09:12 UTC, Sebastiaan Koppe 
wrote:
 On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:
 I think this is what Walter calls "AST poisoning" (never 
 understood how it worked before today). And the whole parser 
 is like this.

 This poisoning kills the interest of using a fuzzer. 99% of 
 the crashes will be in hdrgen.

 As is common with fuzzing, you'll need to ensure the program 
 crashes.

Yes, this is done by piping the random code to dmd (I don't use
dmd as a library for now). If the process returns something
other than 0 (ok) or 1 (normal compiler error), then the random
code is saved in a file:

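         ...
         // feed the generated source to the compiler on stdin ("-")
         ProcessPipes pp = pipeProcess([Options.dc, "-"]);
         pp.stdin.writeln(src);
         pp.stdin.close;
         // keep the input only if the exit status is neither 0 nor 1
         if (!pp.pid.wait.among(0, 1)) fileName.write(src);
         ...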

Actually it would be less convenient to do that with the front 
end as a library, since SEGFAULTs are supposed to kill the 
program...
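
A self-contained sketch of that loop body, under stated assumptions: the `tokens` table, `randomSource`, `isCrash` and `runOnce` are hypothetical names and are not taken from Fuzzed. Only the exit-status rule (0 and 1 are expected, anything else is saved) comes from the description above.

```d
import std.algorithm.comparison : among;
import std.array : join;
import std.file : write;
import std.process : pipeProcess, Redirect, wait;
import std.random : Random, uniform;

// Hypothetical token table; a real fuzzer would list every D token.
immutable string[] tokens =
    ["typeof", "function", "in", "(", ")", "{", "}", ";", "int", "a"];

// Builds a random, space-separated sequence of `count` tokens.
string randomSource(ref Random rng, size_t count)
{
    string[] parts;
    foreach (_; 0 .. count)
        parts ~= tokens[uniform(0, tokens.length, rng)];
    return parts.join(" ");
}

// Exit status 0 (ok) and 1 (normal compiler error) are expected;
// anything else, including a negative signal number on POSIX, counts as a crash.
bool isCrash(int status)
{
    return !status.among(0, 1);
}

// Pipes `src` to `compiler -` and saves it to `crasherPath` if the compiler crashed.
// Only stdin is redirected here, so compiler output goes to the console.
void runOnce(string compiler, string src, string crasherPath)
{
    auto pp = pipeProcess([compiler, "-"], Redirect.stdin);
    pp.stdin.writeln(src);
    pp.stdin.close();
    if (isCrash(wait(pp.pid)))
        write(crasherPath, src);
}
```

Keeping the compiler in a separate process is what lets the fuzzer survive a segfault in the parser and record the input that caused it.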

Dec 15 2018

Jacob Carlborg <doob me.com> writes:

On 2018-12-15 16:37, Basile B. wrote:

 This poisoning kills the interest of using a fuzzer. 99% of the crashes 
 will be in hdrgen.

Does that matter as long as the bug is found?

-- 
/Jacob Carlborg

Dec 16 2018

Stefan Koch <uplink.coder googlemail.com> writes:

On Sunday, 16 December 2018 at 14:24:54 UTC, Jacob Carlborg wrote:
 On 2018-12-15 16:37, Basile B. wrote:

 This poisoning kills the interest of using a fuzzer. 99% of 
 the crashes will be in hdrgen.

 Does that matter as long as the bug is found?

Well it's hard to tell if it's begin.
Generally, in a compiler which is focused on speed, you accept
crashing on really bogus input if it makes the parser run faster,
because there is no chance of accepting corrupted code, which is
what you need to be worried about.

Dec 17 2018

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 17 December 2018 at 10:12:44 UTC, Stefan Koch wrote:
 On Sunday, 16 December 2018 at 14:24:54 UTC, Jacob Carlborg 
 wrote:
 On 2018-12-15 16:37, Basile B. wrote:

 This poisoning kills the interest of using a fuzzer. 99% of 
 the crashes will be in hdrgen.

 Does that matter as long as the bug is found?

 Well it's hard to tell if it's begin.

meant to say benign.

Dec 17 2018

Sebastiaan Koppe <mail skoppe.eu> writes:

On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:
 Fuzzed [1] is a simple fuzzer for the D programming language. 
 It allows to detect sequences of tokens that crash the parser. 
 While the D front end is not yet used to make tools, if this 
 ever happens the parser will have to accept invalid code. As 
 experienced with dparse, invalid code tend to crash more a 
 parser because of a cognitive bias that lead us, "hoomans", to 
 prove that things work rather than the opposite.

Nice. In my experience fuzzing parsers works very well. I have
good memories of afl. So much so that I once wrote a wrapper
around it to run it distributed.

See https://github.com/skoppe/afl-dist
Could use a readme and a how-to though.

Dec 15 2018

Walter Bright <newshound2 digitalmars.com> writes:

On 12/15/2018 3:29 AM, Basile B. wrote:
 The time to write this announce, already 5 "crashers" found.

Great! Please post them to bugzilla.

Dec 15 2018

Jacob Carlborg <doob me.com> writes:

On 2018-12-15 12:29, Basile B. wrote:
 While the D front end is not yet used to make tools

I've used it to make a tool, DLP [1].

[1] http://github.com/jacob-carlborg/dlp

-- 
/Jacob Carlborg

Dec 16 2018
