www.digitalmars.com

digitalmars.D - D code obfuscator

DigitalDesigns <DigitalDesigns gmail.com> writes:
Is there an obfuscator for D that at least renames identifiers?
I ask because identifiers sometimes leak from various processes,
and they could be potential sources of attack.

It would be a tool that probably just replaces their names with,
say, their hash plus something else, run as a pre-release build
step. Ideally it would work with dmd entirely in memory, or use
temp storage without file issues. It can't modify the code
directly, because then the change would be permanent.
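A minimal sketch of the renaming rule described above, assuming a per-build salt and SHA-1 from Phobos' `std.digest.sha`. `obfuscatedName` and the 12-digit truncation are illustrative choices, not an existing tool. A real tool would also need a D lexer to find identifiers, and would have to leave alone names used through introspection, string mixins, or `extern` linkage.

```d
import std.digest : toHexString;
import std.digest.sha : sha1Of;

/// Hypothetical helper (not an existing tool): derive a stable,
/// meaningless replacement for a source identifier from a per-build
/// salt plus the original name.
string obfuscatedName(string salt, string identifier)
{
    // Same salt + same identifier always gives the same name, so every
    // occurrence of an identifier would be renamed consistently.
    auto digest = sha1Of(salt ~ identifier); // ubyte[20]
    auto hex = toHexString(digest);          // char[40], uppercase hex
    // Keep 12 hex digits and prefix '_' so the result is always a valid
    // D identifier (identifiers may not start with a digit).
    return "_" ~ hex[0 .. 12].idup;
}
```

Changing the salt per release build changes every generated name, so names from one build cannot be correlated with another.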
Jun 13 2018
Shachar Shemesh <shachar weka.io> writes:
On 14/06/18 03:01, DigitalDesigns wrote:
> Is there an obfuscator for D that at least renames identifiers? I ask
> because identifiers sometimes leak from various processes, and they
> could be potential sources of attack.
>
> It would be a tool that probably just replaces their names with, say,
> their hash plus something else, run as a pre-release build step.
> Ideally it would work with dmd entirely in memory, or use temp storage
> without file issues. It can't modify the code directly, because then
> the change would be permanent.
I highly doubt it. You see, with introspection and run-time execution, writing such a tool is equivalent to solving the halting problem. You simply do not know what you're affecting. There are some cases where you might know with x% certainty that it's okay to rename. Someone might write a best-effort tool. I'm not aware of one.

With that said, what you're trying to achieve is probably not a good idea anyway. With very few exceptions (1), reverse-engineering code to figure out what it does is not considerably more difficult than using the source, even when none of the identifiers leak at all. Certain aspects of creating attacks are even easier with good rev-eng tools than in source form.

Shachar

1- One notable exception is complex algorithmic code. I will point out that such code is difficult to figure out from source too, and it usually takes very good documentation to be able to do so, so even there I'm not sure my original statement doesn't hold.
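To illustrate the introspection problem: in D, field names are ordinary compile-time strings that code can look up and compare, so a renamer would have to find every string that might name a symbol. The `Config` type and both functions below are hypothetical examples.

```d
import std.traits : FieldNameTuple;

// Hypothetical example type.
struct Config
{
    int port;
    int timeout;
}

// Reads a field selected by a *string*. Renaming the declaration `port`
// without also rewriting the string "port" makes this fail to compile.
int getField(string name)(Config c)
{
    return __traits(getMember, c, name);
}

// Field names are available as plain strings; code that compares them
// against run-time data (e.g. config-file keys) would silently change
// behavior if the fields were renamed.
string[] fieldNames()
{
    return [FieldNameTuple!Config];
}
```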
Jun 13 2018
DigitalDesigns <DigitalDesigns gmail.com> writes:
On Thursday, 14 June 2018 at 02:13:58 UTC, Shachar Shemesh wrote:
> On 14/06/18 03:01, DigitalDesigns wrote:
>> Is there an obfuscator for D that at least renames
>> identifiers? I ask because identifiers sometimes leak from
>> various processes, and they could be potential sources of attack.
>>
>> It would be a tool that probably just replaces their names
>> with, say, their hash plus something else, run as a pre-release
>> build step. Ideally it would work with dmd entirely in memory,
>> or use temp storage without file issues. It can't modify the
>> code directly, because then the change would be permanent.
>
> I highly doubt it. You see, with introspection and run-time execution, writing such a tool is equivalent to solving the halting problem. You simply do not know what you're affecting. There are some cases where you might know with x% certainty that it's okay to rename. Someone might write a best-effort tool. I'm not aware of one.
>
> With that said, what you're trying to achieve is probably not a good idea anyway. With very few exceptions (1), reverse-engineering code to figure out what it does is not considerably more difficult than using the source, even when none of the identifiers leak at all. Certain aspects of creating attacks are even easier with good rev-eng tools than in source form.
>
> Shachar
>
> 1- One notable exception is complex algorithmic code. I will point out that such code is difficult to figure out from source too, and it usually takes very good documentation to be able to do so, so even there I'm not sure my original statement doesn't hold.
Just one question! Are you kidding me?
Jun 13 2018
Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 14 June 2018 at 05:21:03 UTC, DigitalDesigns wrote:
> Just one question! Are you kidding me?
He most certainly is not. In fact, I sometimes prefer size-optimized machine code over source, because it is a trustworthy representation of what the program does, rather than a half-truth about what it should do.
Jun 13 2018
Shachar Shemesh <shachar weka.io> writes:
On 14/06/18 08:21, DigitalDesigns wrote:
> On Thursday, 14 June 2018 at 02:13:58 UTC, Shachar Shemesh wrote:
>> With that said, what you're trying to achieve is probably not a good
>> idea anyway. With very few exceptions (1), reverse-engineering code to
>> figure out what it does is not considerably more difficult than using
>> the source, even when none of the identifiers leak at all. Certain
>> aspects of creating attacks are even easier with good rev-eng tools
>> than in source form.
>>
>> Shachar
> Just one question! Are you kidding me?
First of all, run your program under strace. For a surprising percentage of programs, that alone should give you a fairly good idea of what the program is doing. ltrace goes further, but it can easily be defeated by linking statically, so it is probably irrelevant for our current discussion.

Next, try loading your program in IDA Pro (https://www.hex-rays.com/products/ida/index.shtml). You will notice that the program flow practically jumps out at you with no further work on your part.

Other tricks require a little more knowledge, but are still exceedingly effective. In a demonstration I saw in 2002, Halvar Flake showed how he used IDA to graph the branches, and then used a tool he built to place breakpoints on the branch points. Next, he started feeding inputs to the program and colored the graph wherever the input sent the code. He used that to find the input that would bring the code path to the line he thought might be vulnerable.

If I had to do this trick today for *my own* programs, I'd still use IDA and the compiled code.

So, no, I was not kidding. Not even close.

Shachar
Jun 14 2018
DigitalDesigns <DigitalDesigns gmail.com> writes:
On Thursday, 14 June 2018 at 08:54:16 UTC, Shachar Shemesh wrote:
> On 14/06/18 08:21, DigitalDesigns wrote:
>> On Thursday, 14 June 2018 at 02:13:58 UTC, Shachar Shemesh wrote:
>>> With that said, what you're trying to achieve is probably not
>>> a good idea anyway. With very few exceptions (1),
>>> reverse-engineering code to figure out what it does is not
>>> considerably more difficult than using the source, even when
>>> none of the identifiers leak at all. Certain aspects of
>>> creating attacks are even easier with good rev-eng tools than
>>> in source form.
>>>
>>> Shachar
>> Just one question! Are you kidding me?
> First of all, run your program under strace. For a surprising percentage of programs, that alone should give you a fairly good idea of what the program is doing. ltrace goes further, but it can easily be defeated by linking statically, so it is probably irrelevant for our current discussion.
>
> Next, try loading your program in IDA Pro (https://www.hex-rays.com/products/ida/index.shtml). You will notice that the program flow practically jumps out at you with no further work on your part.
>
> Other tricks require a little more knowledge, but are still exceedingly effective. In a demonstration I saw in 2002, Halvar Flake showed how he used IDA to graph the branches, and then used a tool he built to place breakpoints on the branch points. Next, he started feeding inputs to the program and colored the graph wherever the input sent the code. He used that to find the input that would bring the code path to the line he thought might be vulnerable.
>
> If I had to do this trick today for *my own* programs, I'd still use IDA and the compiled code.
>
> So, no, I was not kidding. Not even close.
>
> Shachar
Wait? Are you sure you are not kidding? Do you want another shot?
Jun 14 2018
Shachar Shemesh <shachar weka.io> writes:
On 14/06/18 13:39, DigitalDesigns wrote:
 
> Wait? Are you sure you are not kidding? Do you want another shot?
No, I'm fine. Thank you. I am not out here to convert anyone. If you want to believe in the magic of obfuscation, go right ahead.

You can probably even leverage D's CTFE to do it inside the compiler while not making your program too much uglier. Something like replacing definitions with:

    mixin Obfuscate!(int, "variableName");

and using them with:

    Deobfuscate!"variableName";

Shouldn't be too difficult to create.

Shachar
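The CTFE suggestion can be fleshed out roughly as follows. This is a minimal sketch, not a tested library: `SALT`, `fnv1a`, and `toHex` are hypothetical names introduced here, FNV-1a is just one CTFE-friendly hash chosen for brevity, and, unlike the one-liner above, `Deobfuscate!"name"` here yields a string that is used through `mixin(...)`.

```d
// Hypothetical per-build salt; its value here is arbitrary.
enum string SALT = "release-salt";

// 64-bit FNV-1a written with plain loops so it runs under CTFE.
ulong fnv1a(string s)
{
    ulong h = 0xcbf29ce484222325UL;
    foreach (char c; s)
    {
        h ^= c;
        h *= 0x100000001b3UL;
    }
    return h;
}

// Lowercase hex, 16 digits, also CTFE-friendly.
string toHex(ulong v)
{
    enum digits = "0123456789abcdef";
    char[16] buf;
    foreach_reverse (i; 0 .. 16)
    {
        buf[i] = digits[v & 0xF];
        v >>= 4;
    }
    return buf[].idup;
}

// The identifier actually declared for `name`, computed at compile time.
enum string Deobfuscate(string name) = "_" ~ toHex(fnv1a(SALT ~ name));

// Declares a variable of type T under the hashed name.
mixin template Obfuscate(T, string name)
{
    mixin("T " ~ Deobfuscate!name ~ ";");
}

// Usage: the source says "counter", but the variable is declared as
// '_' followed by 16 hex digits.
mixin Obfuscate!(int, "counter");

int bump()
{
    return ++mixin(Deobfuscate!"counter");
}
```

Every declaration and use site still has to be written in this form by hand; the sketch only shows that the renaming itself can happen inside the compiler. Whether the original string survives elsewhere in the binary (for example in template-instance mangling) would need to be checked on real output.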
Jun 14 2018
DigitalDesigns <DigitalDesigns gmail.com> writes:
On Thursday, 14 June 2018 at 11:07:17 UTC, Shachar Shemesh wrote:
> On 14/06/18 13:39, DigitalDesigns wrote:
>> Wait? Are you sure you are not kidding? Do you want another
>> shot?
> No, I'm fine. Thank you. I am not out here to convert anyone. If you want to believe in the magic of obfuscation, go right ahead.
Dude, don't be an idiot! Please! Of course, here we go...
> You can probably even leverage D's CTFE to do it inside the
> compiler while not making your program too much uglier.
> Something like replacing definitions with:
>
>     mixin Obfuscate!(int, "variableName");
>
> and using them with:
>
>     Deobfuscate!"variableName";
>
> Shouldn't be too difficult to create.
That's the best you can do? Do you really expect me to go and manually obfuscate an entire program? Do you want to try again? 3 strikes and you're out!
Jun 14 2018
Ali Çehreli <acehreli yahoo.com> writes:
On 06/14/2018 04:33 AM, DigitalDesigns wrote:

> 3 strikes and you're out!
That was your third. :/

Ali
Jun 14 2018
Cym13 <cpicard openmailbox.org> writes:
On Thursday, 14 June 2018 at 10:39:19 UTC, DigitalDesigns wrote:
> Wait? Are you sure you are not kidding? Do you want another
> shot?
I won't say that obfuscation is entirely useless. If I have to choose, I'll of course take the version with symbols for reverse engineering, and there are specific cases where symbols carry way too much information for you to want it disclosed (the most common being names of customers or projects).

But, as someone whose job is to find security issues in software (and other stuff), with or without source, I can say with professional certainty that things like changing all identifiers to single-letter ids don't slow me down in the slightest in my assignments. That's just the state of things: reversers deal with stripped binaries all the time, and identifiers are just nice to have.

So instead, here's what would slow a reverse engineer:

- Remove strings. Make sure to remove as many as you can, especially debug statements. Hide the rest by encrypting them in memory. Even if it is possible to decrypt them or read them at runtime, it'll be way harder to correlate things together.
- Pack. Have your software decipher itself in memory at runtime, not all at once but only some sections at a time, dynamically. Use random keys automatically generated at compile time for that; that'll mess up binary diffs.
- Include binary tricks to mess with disassemblers. There are many constructs that common disassemblers interpret badly.
- Mess with the structure. If you can, remove all conditions and loops. A reverser can often just look at a function's logical graph and know what kind of work it is doing. The movfuscator is a good example.
- Add runtime checks based on time deltas between two points of the code in different functions. Generate other output based on that.
- Be sure to encrypt all communications, of course.

In short, do what good malware does.
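The "remove strings, hide the rest" advice can be sketched with CTFE: scramble a literal at compile time and unscramble it only when it is needed. This is an assumption-laden illustration: `KEY`, `scramble`, `unscramble`, and the sample string are hypothetical, and XOR with a position-dependent key is obfuscation, not encryption; it keeps the text out of a plain `strings` dump and little more. Whether the plaintext literal is really absent from the output should be checked on the actual binary.

```d
// Hypothetical key. XOR with a position-dependent byte is trivial to
// undo for anyone who finds `unscramble`; this is not encryption.
enum ubyte KEY = 0x5A;

// Evaluated by CTFE when used to initialize `secret` below.
immutable(ubyte)[] scramble(string s, ubyte key)
{
    auto result = new ubyte[s.length];
    foreach (i, char c; s)
        result[i] = cast(ubyte)(c ^ cast(ubyte)(key + i));
    return result.idup;
}

// Runs at run time, only when the text is actually needed.
string unscramble(const(ubyte)[] data, ubyte key)
{
    auto result = new char[data.length];
    foreach (i, b; data)
        result[i] = cast(char)(b ^ cast(ubyte)(key + i));
    return result.idup;
}

// A module-level immutable initializer must be computed at compile
// time, so the stored bytes are the scrambled ones.
immutable ubyte[] secret = scramble("license-server.example", KEY);
```

Calling `unscramble(secret, KEY)` right before use, and discarding the result afterwards, keeps the plaintext out of the binary and shortens the time it sits in memory.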
Jun 14 2018
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Thursday, 14 June 2018 at 08:54:16 UTC, Shachar Shemesh wrote:
> So, no, I was not kidding. Not even close.
I've had some experience on both sides of this... so, I think I can say with some certainty that debugging symbols make reverse-engineering MUCH easier (many hunts to find the relevant code can be reduced to a keyword search), so I think it's a valid concern. That D leaks identifiers and other bits from the source code is a real issue preventing some real-world use cases. E.g., there might be legal obligations in place where leaking source code identifiers could be considered a breach of NDA etc. In one case, we needed to write an RTTI patcher for C++ (MSVC) after updating/reconfiguring the build toolchain, as the compiler would otherwise place the class names of some classes in the binary.
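One concrete example of D leaking identifiers: in a default (non-`-betterC`) build, every D class gets a `TypeInfo_Class` whose `name` is the fully qualified class name, and `Object.toString()` returns that name by default. The class below is hypothetical.

```d
// Hypothetical class whose name is itself sensitive.
class AcmeCorpLicenseCheck
{
}

// The class name is reachable at run time through its TypeInfo, so it
// has to be stored in the binary even though nothing here prints it.
string leakedName()
{
    Object o = new AcmeCorpLicenseCheck;
    return typeid(o).name; // "<module>.AcmeCorpLicenseCheck"
}
```

Running `strings` on the compiled program should show the class name, much like the MSVC RTTI case described above.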
Jun 14 2018
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Thursday, 14 June 2018 at 00:01:31 UTC, DigitalDesigns wrote:
> Is there an obfuscator for D that at least renames identifiers?
> I ask because identifiers sometimes leak from various processes,
> and they could be potential sources of attack.
Yes, DustMite has an obfuscation mode. You will need to give it a test command which checks if the file is still a working D program. Building the program and running its unit tests is generally sufficient for this purpose.
Jun 13 2018
Norm <norm.rowtree gmail.com> writes:
On Thursday, 14 June 2018 at 00:01:31 UTC, DigitalDesigns wrote:
> Is there an obfuscator for D that at least renames identifiers?
> I ask because identifiers sometimes leak from various processes,
> and they could be potential sources of attack.
>
> It would be a tool that probably just replaces their names
> with, say, their hash plus something else, run as a pre-release
> build step. Ideally it would work with dmd entirely in memory,
> or use temp storage without file issues. It can't modify the
> code directly, because then the change would be permanent.
I don't know of any specifically for D, but these C/C++ tools might help as a starting point:

https://github.com/obfuscator-llvm/obfuscator/wiki
https://github.com/obfuscator-llvm/obfuscator/tree/llvm-4.0
https://sourceforge.net/projects/cshroud/
Jun 13 2018