digitalmars.D.announce - Article: Finding memory bugs in D code with AddressSanitizer

Johan Engelen <j j.nl> writes:

I've been writing this article since August, and finally found 
some time to finish it:

http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html

"LDC comes with improved support for Address Sanitizer since the 
1.4.0 release. Address Sanitizer (ASan) is a runtime memory 
write/read checker that helps discover and locate memory access 
bugs. ASan is part of the official LDC release binaries; to use 
it you must build with -fsanitize=address. In this article, I’ll 
explain how to use ASan, what kind of bugs it can find, and what 
bugs it will be able to find in the (hopefully near) future."

Thanks for your proof-reading.

cheers,
   Johan

Dec 25 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 12/25/2017 9:03 AM, Johan Engelen wrote:
 I've been writing this article since August, and finally found some time to
 finish it:

 http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html

 "LDC comes with improved support for Address Sanitizer since the 1.4.0 release.
 Address Sanitizer (ASan) is a runtime memory write/read checker that helps
 discover and locate memory access bugs. ASan is part of the official LDC release
 binaries; to use it you must build with -fsanitize=address. In this article,
 I’ll explain how to use ASan, what kind of bugs it can find, and what bugs it
 will be able to find in the (hopefully near) future."

Thanks for the great article! Some suggestions:

1. The gray-on-white text is not very legible.

2. "Although D tries to be a more safe language, the safety measures still 
require developer effort and discipline. And so D code suffers from similar 
memory bugs that plague people in C++."

This comes across as unduly negative. D is a *lot* better than C++ in this 
regard. It doesn't just try to be more safe, it *is* more safe.

3. "A simple example"

This is a contrived example, and implies that normal D code is written like C++ 
code. It isn't, the parameter to foo() would be int[], not int*. The code would 
also be rejected by the compiler when annotated with @safe.

If you want to keep the example, a note of explanation about this would work. 
Because of D's array and ref types, very very little D code needs to manipulate 
pointers.

It would be nice to add a paragraph mentioning things about D that make it a 
more memory safe language.

4. "Future work: detecting stack use after return"

This code should be rejected by the compiler if using -dip1000. It is not, so I 
filed a bug report:

https://issues.dlang.org/show_bug.cgi?id=18128

I don't know if the fault lies with the compiler or with std.algorithm.move, but
I'd rather use examples that didn't rely on compiler/library bugs.

Dec 25 2017

Johan Engelen <j j.nl> writes:

On Monday, 25 December 2017 at 20:31:18 UTC, Walter Bright wrote:
 Thanks for the great article! Some suggestions:

Thanks for your comments, I've incorporated them (to my liking).

 1. The gray-on-white text is not very legible.

Looks great here, I like it, sorry. (made it completely black 
now, can't see the difference here though)

Snips:
 This comes across as unduly negative.

 This is a contrived example, and implies that normal D code is 
 written like C++ code.

 It would be nice to add a paragraph mentioning things about D 
 that make it a more memory safe language.

 This code should be rejected by the compiler if using -dip1000. 
 It is not, so I filed a bug report

 I'd rather use examples that didn't rely on compiler/library 
 bugs.

You're right, the examples are (of course) contrived. However, I 
didn't want to write a marketing article, and I also want to show 
examples found in the wild. I think one of the use cases of ASan 
is exactly that it can help discover bugs wherever they are, 
even in the compiler / standard library.
I've added bits and pieces to indicate some facilities of D to 
mitigate these kinds of bugs, but the reality is that a lot of D 
code is not idiomatic and does not use the safety features (for 
diverse reasons).
The article is not meant as a marketing piece (only for ASan), 
but also shouldn't be overly critical of D. Hope that the 
balance is a bit better now with the modifications.

-Johan

Dec 25 2017

Ali Çehreli <acehreli yahoo.com> writes:

On 12/25/2017 03:17 PM, Johan Engelen wrote:

 1. The gray-on-white text is not very legible.

 Looks great here, I like it, sorry. (made it completely black now, can't
 see the difference here though)

Yes, browsers report it to be black but it looks very gray :) on Linux 
Mint with both Firefox and Google Chrome.

I think it's about how that specific font is rendered; I think the font 
is too thin for my environment, so the way it gets softened (can't 
remember the technical term) by gray colors at the edges makes it look 
completely gray. (It's clearly black when I zoom in.)

Ali

Dec 25 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 12/25/2017 3:17 PM, Johan Engelen wrote:
 On Monday, 25 December 2017 at 20:31:18 UTC, Walter Bright wrote:
 Thanks for the great article! Some suggestions:

 Thanks for your comments, I've incorporated them (to my liking).

 1. The gray-on-white text is not very legible.

 Looks great here, I like it, sorry. (made it completely black now, can't see the
 difference here though)

It's still significantly harder to read than text in another font of the same 
size. I have to move close to the screen to read it. Perhaps it's the line width 
being too narrow. Perhaps the issue is just with my screen, which has a high 
pixel density. (The boldface text is very readable, for comparison.)



 The article is not meant as a marketing piece (only for ASan), but also 
 shouldn't be  overly critical of D. Hope that the balance is a bit better now 
 with the modifications.

It is better, thank you.

For better or worse, it is always about marketing (or as I prefer it, 
"framing"). We're at a critical time with D, and framing D as being just as bad 
as C++ is going to turn people away.

C++ has many memory safe features, and you can write memory safe code in C++ 
with some discipline. The trouble is, however, that those features are library 
features, and the compiler cannot check them.

This is fundamentally different from D's approach, which is a language approach 
where unsafe operations can be detected at compile time. ASan is still useful
with D, however, because D allows one to escape into unsafe systems programming,
and for detecting implementation bugs.

Dec 25 2017

Temtaime <temtaime gmail.com> writes:

The main font is very ugly.
Code font looks ok tw.

Dec 26 2017

Mengu <mengukagan gmail.com> writes:

On Tuesday, 26 December 2017 at 08:03:44 UTC, Temtaime wrote:
 The main font is very ugly.
 Code font looks ok tw.

on the contrary, post font is very readable (might use some 
letter spacing), clear and beautiful. that is on a retina macbook 
pro.

code blocks are very readable too.

Dec 26 2017

Walter Bright <newshound2 digitalmars.com> writes:

I posted this on another thread. It succinctly points out what is the 
fundamental difference between C++ and D on memory safety:


C++:

     int foo(int* p) { return p[1]; }
     int bar(int i) { return foo(&i); }

     clang++ -c test.cpp -Wall


D:

     @safe:
     int foo(int* p) { return p[1]; }
     int bar(int i) { return foo(&i); }

     dmd -c test.d
     test.d(3): Error: safe function 'test.foo' cannot index pointer 'p'
     test.d(4): Error: cannot take address of parameter i in @safe function bar


I.e. in C++, writing memory safe code means using the right library functions. 
It is not checkable by the compiler. In D, it is checkable by the compiler.
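
For comparison, here is a minimal sketch of the slice-based form suggested earlier in the thread (`int[]` instead of `int*`). It is an illustration only, not code from the article; the `main` function and the sample values are made up for the example. A slice carries its length, so the compiler accepts the indexing in @safe code and inserts a run-time bounds check.

```d
import core.exception : RangeError;
import std.stdio : writeln;

// Slice-based variant of foo. A slice carries its length, so p[1] is
// bounds-checked at run time and the function is accepted as @safe.
@safe int foo(int[] p) { return p[1]; }

void main()
{
    int[] two = [10, 20];
    writeln(foo(two)); // index 1 is in range

    int[] one = [10];
    try
    {
        writeln(foo(one)); // index 1 is out of range for a one-element slice
    }
    catch (RangeError e)
    {
        // Catching an Error is done here only to make the check visible;
        // normally the program would simply terminate.
        writeln("bounds check fired");
    }
}
```

The out-of-bounds access is stopped by the language's own check, so this particular bug never reaches the point where ASan would be needed to find it.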

Dec 26 2017

Ali Çehreli <acehreli yahoo.com> writes:

On 12/25/2017 09:03 AM, Johan Engelen wrote:

 Thanks for your proof-reading.

- (or ASan for short)
That came a little late in the article because ASan already appeared in 
the introduction.

-
peak your interest ->
pique your interest

-
Cppcon ->
CppCon

-
an ulong ->
a ulong
(That's assuming that ulong is pronounced starting with 'y'.)

-
small code fragment ->
code fragment

-
The ASan output is harder to correlate ->
In this case the ASan output is harder to correlate

-
running the Phobos and ->
running Phobos and

-
a blacklist and functions that ->
a blacklist so that functions that

-
some function that match this ->
a function that matches this
OR
some functions that match this

-
standard library with ASan enabled ->
standard library ASan-enabled

-
asan library for the ->
ASan library for the

-
prevent code from doing ->
prevents code from doing

-
is the contrived example ->
is a contrived example

-
Work in progress…!
Work in progress…

-
as in C++, for example because of ->
as in C++ partly because of

-
the guys at Weka.io ->
the folks at Weka.io
(just because "guys" may not come as gender-neutral, whether rightly or 
not :/ )

Ali

Dec 25 2017

Jon Degenhardt <jond noreply.com> writes:

On Monday, 25 December 2017 at 17:03:37 UTC, Johan Engelen wrote:
 I've been writing this article since August, and finally found 
 some time to finish it:

 http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html

 "LDC comes with improved support for Address Sanitizer since 
 the 1.4.0 release. Address Sanitizer (ASan) is a runtime memory 
 write/read checker that helps discover and locate memory access 
 bugs. ASan is part of the official LDC release binaries; to use 
 it you must build with -fsanitize=address. In this article, 
 I’ll explain how to use ASan, what kind of bugs it can find, 
 and what bugs it will be able to find in the (hopefully near) 
 future."

Nice article. Main question / comment is about the need for 
blacklisting D standard libraries (druntime/phobos). If someone 
wants to try ASan out on their own code, can they start by 
ignoring the D standard libraries? And, for programs that use 
druntime/phobos, will this be effective? If I understand the 
post, the answer is "yes", but I think it could be more explicit.

Second comment is related - If the reader was to try 
instrumenting druntime/phobos along with their own code, how much 
effort should be expected to correctly blacklist druntime/phobos 
code? Would many programs have smooth sailing if they took the 
blacklist published in the post? Or is this early stage enough 
that some real effort should be expected?

Also, if the blacklist file in the post represents a meaningful 
starting point, perhaps it makes sense to check it in and 
distribute it. This would provide a place for contributors to 
start making improvements.

Dec 26 2017

Johan Engelen <j j.nl> writes:

On Tuesday, 26 December 2017 at 22:11:18 UTC, Jon Degenhardt 
wrote:
 On Monday, 25 December 2017 at 17:03:37 UTC, Johan Engelen 
 wrote:
 I've been writing this article since August, and finally found 
 some time to finish it:

 http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html

 Nice article. Main question / comment is about the need for 
 blacklisting D standard libraries (druntime/phobos). If someone 
 wants to try ASan out on their own code, can they start by 
 ignoring the D standard libraries? And, for programs that use 
 druntime/phobos, will this be effective? If I understand the 
 post, the answer is "yes", but I think it could be more 
 explicit.

Indeed, yes. I've used ASan successfully on the ddmd lexer. 
"successfully" = I found and fixed an actual bug with it.
Without ASan-enabled standard libs, ASan testing will cover your 
code and (most) std lib _templated_ code.
A blacklist may be needed for templated std lib code that doesn't 
work with ASan (yet), either because of a bug in the std lib (not 
very likely I think) or something else. We need much more testing 
of LDC+ASan.

 Second comment is related - If the reader was to try 
 instrumenting druntime/phobos along with their own code, how 
 much effort should be expected to correctly blacklist 
 druntime/phobos code? Would many programs have smooth sailing 
 if they took the blacklist published in the post? Or is this 
 early stage enough that some real effort should be expected?

Very early stage. I myself have not worked on ASan-enabled 
druntime/phobos for more than 30 minutes. Already found some 
trouble with cpuid functions (inline asm): `fun:_D4core5cpuid*` 
must be added to the blacklist.
I think the first goal should be to make a blacklist such that 

section. Then afterwards, we can reduce the blacklist bit-by-bit 
by figuring out exactly why ASan triggers: either a bug, expected 
behavior, or an ASan bug.
A counterpart to the blacklist file is an 
`@no_sanitize("address")` magic UDA; to disable ASan and document 
it inside the code. This should be done in such a way that it is 
upstreamable. (e.g. version(LDC) static import ldc.attributes, 
alias no_sanitize = ...)

 Also, if the blacklist file in the post represents a meaningful 
 starting point,

it does

 perhaps it makes sense to check it in and distribute it. This 
 would provide a place for contributors to start making 
 improvements.

Definitely makes sense. I think this should be inside the runtime 
libraries' repos, right? (So one blacklist for druntime, and 
another for Phobos).
(I'm even thinking about adding `-fsanitize-blacklist=<...>` for 
the shipped blacklist to `ldc.conf`.)

I'll figure out how to incorporate your comments into the 
article, thanks.

cheers,
   Johan

Dec 26 2017

Johan Engelen <j j.nl> writes:

On Monday, 25 December 2017 at 17:03:37 UTC, Johan Engelen wrote:
 I've been writing this article since August, and finally found 
 some time to finish it:

 http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html

Is it a good fit with /r/programming ?

-Johan

Dec 28 2017

Atila Neves <atila.neves gmail.com> writes:

On Thursday, 28 December 2017 at 16:29:49 UTC, Johan Engelen 
wrote:
 On Monday, 25 December 2017 at 17:03:37 UTC, Johan Engelen 
 wrote:
 I've been writing this article since August, and finally found 
 some time to finish it:

 http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html

 Is it a good fit with /r/programming ?

 -Johan

I'd definitely say so.

Atila

Dec 29 2017

Martin Nowak <code+news.digitalmars dawg.eu> writes:

On 12/25/2017 06:03 PM, Johan Engelen wrote:
 I've been writing this article since August, and finally found some time
 to finish it:
 
 http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html

Just built dmd with AddressSanitizer and ran dmd's, druntime's, and
phobos' test-suite.

https://issues.dlang.org/show_bug.cgi?id=18189
https://issues.dlang.org/show_bug.cgi?id=18190

Nothing in the D part, not too surprising given dmd's approach to memory
management though ;).

-Martin

Jan 03 2018

Walter Bright <newshound2 digitalmars.com> writes:

On 1/3/2018 3:16 PM, Martin Nowak wrote:
 https://issues.dlang.org/show_bug.cgi?id=18190

This is a stack overflow caused by having 4096 expression statements. The 
compiler joins them with a comma expression, and then recursively traverses it.

 Nothing in the D part, not too surprising given dmd's approach to memory
 management though ;).

Stack overflow has nothing to do with memory management.
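
To illustrate why a long chain of joined expressions can exhaust the stack, here is a rough sketch. The `Node` class and the helper functions are hypothetical stand-ins, not dmd's actual AST types, and the left-leaning shape of the chain is an assumption made for the example. The point is only that a recursive traversal needs one stack frame per nesting level, while an iterative walk does not.

```d
import std.stdio : writeln;

// Hypothetical stand-in for an expression node; not dmd's real AST types.
final class Node
{
    Node left, right; // both null for a leaf
    this(Node l = null, Node r = null) { left = l; right = r; }
}

// Joins n leaf expressions into a left-leaning chain: (((e1, e2), e3), ...).
Node chain(size_t n)
{
    Node acc = new Node;
    foreach (i; 1 .. n)
        acc = new Node(acc, new Node);
    return acc;
}

// Recursive traversal: one stack frame per nesting level.
size_t depth(Node n)
{
    if (n is null) return 0;
    immutable l = depth(n.left);
    immutable r = depth(n.right);
    return 1 + (l > r ? l : r);
}

// Iterative walk down the left spine: constant stack use.
size_t countLeaves(Node n)
{
    size_t leaves = 1; // the innermost leaf
    while (n.left !is null)
    {
        ++leaves;      // the right operand of this comma node
        n = n.left;
    }
    return leaves;
}

void main()
{
    auto e = chain(4096);
    assert(depth(e) == 4096);       // recursion depth grows with the chain
    assert(countLeaves(e) == 4096); // same chain, no recursion needed
    writeln("ok");
}
```

In this toy version each frame is tiny, so 4096 levels fit comfortably. A real compiler pass keeps far more state per frame, and instrumentation such as ASan typically enlarges frames further.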

Jan 03 2018

Walter Bright <newshound2 digitalmars.com> writes:

On 1/3/2018 4:46 PM, Walter Bright wrote:
 On 1/3/2018 3:16 PM, Martin Nowak wrote:
 https://issues.dlang.org/show_bug.cgi?id=18190

 
 This is a stack overflow caused by having 4096 expression statements. The 
 compiler joins them with a commaexpression, and then recursively traverses it.
 
  > Nothing in the D part, not too surprising given dmd's approach to memory
 management though ;).
 
 Stack overflow has nothing to do with memory management.

I'm a little curious about the stack overflow. I thought Linux would 
automatically extend the stack if it overflowed?

Jan 04 2018

codephantom <me noyb.com> writes:

On Friday, 5 January 2018 at 01:32:50 UTC, Walter Bright wrote:
 On 1/3/2018 4:46 PM, Walter Bright wrote:
 On 1/3/2018 3:16 PM, Martin Nowak wrote:
 https://issues.dlang.org/show_bug.cgi?id=18190

 
 This is a stack overflow caused by having 4096 expression 
 statements. The compiler joins them with a commaexpression, 
 and then recursively traverses it.
 
  > Nothing in the D part, not too surprising given dmd's 
 approach to memory
 management though ;).
 
 Stack overflow has nothing to do with memory management.

 I'm a little curious about the stack overflow. I thought Linux 
 would automatically extend the stack if it overflowed?

it will, but only up to the rlimit. then it will SIGSEGV.

http://man7.org/linux/man-pages/man2/getrlimit.2.html
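
A small sketch of how a program could query that limit from D, assuming a POSIX system and druntime's `core.sys.posix.sys.resource` bindings. The printed labels are made up for the example.

```d
import core.sys.posix.sys.resource : getrlimit, rlimit, RLIMIT_STACK;
import std.stdio : writeln;

void main()
{
    rlimit lim;
    // getrlimit returns 0 on success and fills in the soft and hard limits.
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
    {
        writeln("getrlimit failed");
        return;
    }
    // The stack may grow up to the soft limit; beyond it the process
    // receives SIGSEGV. An unlimited setting shows up as a very large value.
    writeln("soft stack limit (bytes): ", lim.rlim_cur);
    writeln("hard stack limit (bytes): ", lim.rlim_max);
}
```

From a shell, `ulimit -s` reports the same soft limit, there in kilobytes.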

Jan 04 2018

Johan Engelen <j j.nl> writes:

On Wednesday, 3 January 2018 at 23:16:45 UTC, Martin Nowak wrote:
 On 12/25/2017 06:03 PM, Johan Engelen wrote:
 I've been writing this article since August, and finally found 
 some time to finish it:
 
 http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html

 Just built dmd with AddressSanitizer and ran dmd's, druntime's, 
 and phobos' test-suite.

Nice.
Plans to make it part of CI ?

-Johan

Jan 04 2018
