www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - AddressSanitizer and the GC

reply Johan Engelen <j j.nl> writes:
Hi all,
   I've added AddressSanitizer [1] support to LDC, and it is able 
to nicely catch bugs like this one:
```
void foo(int* arr) {
     arr[10] = 1;
}

void main() {
     int[10] a;
     foo(&a[0]);
}
```

It works for malloc'ed memory, but not yet for GC'd memory. Our 
GC pool memory is malloc'ed, so it is "safe" to access from 
ASan's current point of view. To make it work with our GC, we 
have to mark all GC pool memory (malloc'ed) as poisoned, such 
that access to that memory will trigger an ASan error. Then when 
a user asks memory from the pool (e.g. `new int`), we should 
allocate a little more GC'd memory than what the user requests 
(red zone), and then mark only the user size as unpoisoned. By 
allocating a little more, we make sure that user allocated memory 
blocks are separated by at least an amount of red zone that helps 
detect read/write memory overflows.
Pieces of the solution:
a. Call __asan_poison_memory_region on all memory internally 
allocated by the GC
b. ptr = GC.alloc(user_request_size+redzone)
c. Call __asan_unpoison_memory_region on ptr[0..user_request_size]

I've already been toying with several options but would like your 
opinion: what is the best way for me to add this to the GC?

1. What are the best places to modify the conservative GC with 
the above pieces?

2. Should I create a new GC type "asan"? We now have 
"conservative" (default) and "manual". Adding an option to the 
default GC is made hard because of linking problems (need to link 
with asan library, but I can define  weak function stubs), but I 
am worried about slow down in non-asan mode (which is the 
99.9999999% use case).

Thanks,
   Johan


[1] https://github.com/google/sanitizers/wiki/AddressSanitizer
Jul 21 2017
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/21/2017 12:06 PM, Johan Engelen wrote:
< [...]

Thanks for your work on this. It is important to support such valuable tools.

One way making such things pretty much cost-free in the olden days was to trick 
the linker into writing NOPs over the function calls by using specially crafted 
fixup records. This worked with dumb linkers, but today's linkers do much more 
semantic inference into what's going on, I'm afraid this may be too brittle.

Unless there is a linker feature to support it? Are some linkers able to NOP
out 
a function call if it sees that the function body consists solely of a RET?

Another option is to use a pointer to the asan functions, like:

     if (asan_fp)
         (*asan_fp)(args...)

The asan_fp can then be set to NULL and there's only the overhead of a 
compare/branch rather than a function call. Well, also the load of a global 
variable.
Jul 21 2017
parent reply Johan Engelen <j j.nl> writes:
On Friday, 21 July 2017 at 20:53:18 UTC, Walter Bright wrote:
 On 7/21/2017 12:06 PM, Johan Engelen wrote:
 < [...]

 Thanks for your work on this. It is important to support such 
 valuable tools.

 One way making such things pretty much cost-free in the olden 
 days was to trick the linker into writing NOPs over the 
 function calls by using specially crafted fixup records. This 
 worked with dumb linkers, but today's linkers do much more 
 semantic inference into what's going on, I'm afraid this may be 
 too brittle.

 Unless there is a linker feature to support it? Are some 
 linkers able to NOP out a function call if it sees that the 
 function body consists solely of a RET?
ARM's toolchain has something that does this: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka14171.html Can't find something like it for other toolchains.
 Another option is to use a pointer to the asan functions, like:

     if (asan_fp)
         (*asan_fp)(args...)

 The asan_fp can then be set to NULL and there's only the 
 overhead of a compare/branch rather than a function call. Well, 
 also the load of a global variable.
I can mark the if-clause as unlikely, so the performance difference would be small (allocation is slow already, of course). Let me try that for now. We can decide later, let's first find an implementation that works and test that the whole idea is valid and I didn't miss anything important. Can you advise me on where best to add the asan calls? Thanks, Johan PS. The interface is easy, I don't need much info at the call sites: // Poisons memory region [addr, addr+size) void __asan_poison_memory_region(const(void*) addr, size_t size); // Unpoisons memory region [addr, addr+size) void __asan_unpoison_memory_region(const(void*) addr, size_t size);
Jul 21 2017
next sibling parent reply Johan Engelen <j j.nl> writes:
On Friday, 21 July 2017 at 21:37:39 UTC, Johan Engelen wrote:
 On Friday, 21 July 2017 at 20:53:18 UTC, Walter Bright wrote:
 On 7/21/2017 12:06 PM, Johan Engelen wrote:
 < [...]

 Another option is to use a pointer to the asan functions, like:

     if (asan_fp)
         (*asan_fp)(args...)

 The asan_fp can then be set to NULL and there's only the 
 overhead of a compare/branch rather than a function call. 
 Well, also the load of a global variable.
I can mark the if-clause as unlikely, so the performance difference would be small (allocation is slow already, of course).
Quick extra note: the function pointer solution is not friendly to LTO, whereas a weak-linking solution would easily be optimized-out fully with LTO. So that's something to consider too.
Jul 21 2017
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/21/2017 2:39 PM, Johan Engelen wrote:
 Quick extra note: the function pointer solution is not friendly to LTO,
whereas 
 a weak-linking solution would easily be optimized-out fully with LTO. So
that's 
 something to consider too.
This code should be version'd with: version (AddressSanitizer) so that: 1. it will be easy to try different methods 2. all the different places it is used will be easy to find 3. if it does make things too slow, it can be easily 'removed' at the option of whoever built druntime
Jul 21 2017
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/21/2017 2:37 PM, Johan Engelen wrote:
 Can you advise me on where best to add the asan calls?
Offhand, I don't know. It's been maybe 15 years since I worked on it, and a lot of people have moved things about since then. Dmitry Olshansky has worked on it very recently, in fact he is working on it at the moment. He might be a good person to ask and can likely give you a better answer than I can.
Jul 21 2017
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/21/2017 12:06 PM, Johan Engelen wrote:
    I've added AddressSanitizer [1] support to LDC, and it is able to nicely 
 catch bugs like this one:
If anyone is game, it would be cool to do an LLVM compile of DMD so AddressSanitizer can look for bugs in DMD itself.
Jul 21 2017
parent reply Johan Engelen <j j.nl> writes:
On Friday, 21 July 2017 at 20:56:02 UTC, Walter Bright wrote:
 On 7/21/2017 12:06 PM, Johan Engelen wrote:
    I've added AddressSanitizer [1] support to LDC, and it is 
 able to nicely catch bugs like this one:
If anyone is game, it would be cool to do an LLVM compile of DMD so AddressSanitizer can look for bugs in DMD itself.
Yep, I've got fuzzing to work too, so... expect nice blogs "soon". ;-) (btw, the work is because of wanting fuzz testing at Weka, so that's my stress test for this work)
Jul 21 2017
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/21/2017 2:14 PM, Johan Engelen wrote:
 On Friday, 21 July 2017 at 20:56:02 UTC, Walter Bright wrote:
 If anyone is game, it would be cool to do an LLVM compile of DMD so 
 AddressSanitizer can look for bugs in DMD itself.
Yep, I've got fuzzing to work too, so... expect nice blogs "soon". ;-) (btw, the work is because of wanting fuzz testing at Weka, so that's my stress test for this work)
Pretty dazz! Please continue!
Jul 21 2017