digitalmars.D - D code obfuscator

DigitalDesigns (8/8) Jun 13 2018 Is there an obfuscator for D that at least renames identifiers?

Shachar Shemesh (18/28) Jun 13 2018 I highly doubt it.

DigitalDesigns (2/33) Jun 13 2018 Just one question! Are you kidding me?

Stefan Koch (6/7) Jun 13 2018 He most certainly is not.
Shachar Shemesh (21/33) Jun 14 2018 First of all, run your program under strace. For a surprising percentage...

DigitalDesigns (2/39) Jun 14 2018 Wait? Are you sure you are not kidding? Do you want another shot?

Shachar Shemesh (11/13) Jun 14 2018 No, I'm fine. Thank you. I am not out here to convert anyone. If you

DigitalDesigns (5/18) Jun 14 2018 That's the best you can do? Do you really expect me to go and

Ali Çehreli (3/5) Jun 14 2018 That was your third. :/

Cym13 (33/35) Jun 14 2018 I won't say that obfuscation is entirely useless, if I have to

Vladimir Panteleev (13/14) Jun 14 2018 I've had some experience on both sides of this... so, I think I

Vladimir Panteleev (5/8) Jun 13 2018 Yes, DustMite has an obfuscation mode.
Norm (6/14) Jun 13 2018 I don't know any specifically for D but these C/C++ tools might

DigitalDesigns <DigitalDesigns gmail.com> writes:

Is there an obfuscator for D that at least renames identifiers? 
I ask because identifiers sometimes leak from various processes 
and could be potential sources of attack.

It would be a tool that probably just replaces each identifier 
with, say, its hash plus something else, run as a pre-release 
build step. Ideally it would work with dmd and do everything in 
memory, or use temp storage without file issues. It can't modify 
the source directly, because then the change would be permanent.

Jun 13 2018

Shachar Shemesh <shachar weka.io> writes:

On 14/06/18 03:01, DigitalDesigns wrote:
 Is there an obfuscator for D that at least renames identifiers? This is 
 because sometimes they leak from various processes and could be 
 potential sources of attack.
 
 It would be a tool that probably just replaces their values with, say 
 their hash + something else and done pre release build. Ideally it would 
 be able to compile with dmd and all in memory or use temp storage 
 without file issues. It can't modify the code directly because then that 
 would be permanent.
 

I highly doubt it.

You see, with introspection and run-time execution, writing such a tool 
is equivalent to solving the halting problem. You simply do not know 
what you're affecting.
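
A minimal sketch of the problem, assuming a hypothetical `Config` struct and `toKeyValue` helper (neither comes from any real code base): once code introspects its own declarations or builds member names in a string mixin, the identifiers are part of the program's behavior, and a renamer cannot tell which uses are safe to touch.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical settings struct; the field names are the point here.
struct Config
{
    int port = 8080;
    string host = "localhost";
}

// Walks the fields at compile time and bakes each field NAME into the
// output, so the identifiers become part of the program's behavior.
string toKeyValue(T)(T value)
{
    string result;
    foreach (i, field; value.tupleof)
        result ~= __traits(identifier, T.tupleof[i]) ~ "=" ~ field.to!string ~ "\n";
    return result;
}

void main()
{
    Config cfg;

    // Emits "port=8080" and "host=localhost"; rename the fields and the
    // output format silently changes with them.
    writeln(toKeyValue(cfg));

    // The member name is assembled from pieces at compile time, so a
    // textual renamer never sees the token `port` on this line.
    enum fieldName = "po" ~ "rt";
    int p = mixin("cfg." ~ fieldName);
    assert(p == 8080);
}
```

A tool that renames `port` would have to evaluate arbitrary compile-time code to discover the second use, which is where the best-effort caveat comes from.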

There are some cases where you might know at x% certainty that it's okay 
to rename. Someone might do a best-effort based tool. I'm not aware of one.


With that said, what you're trying to achieve is probably not a good 
idea anyways. With very few exceptions(1), reverse-engineering code to 
figure out what it does is not considerably more difficult than using 
the source, even when none of the identifiers leak at all. Certain 
aspects of creating attacks are even easier with good rev-eng tools than 
in source form.

Shachar

1- One notable exception is complex algorithmic code. I will point out 
that those are difficult to figure out from source code too, and it 
usually takes very good documentation to be able to do so, so even there 
I'm not sure my original statement doesn't hold.

Jun 13 2018

DigitalDesigns <DigitalDesigns gmail.com> writes:

On Thursday, 14 June 2018 at 02:13:58 UTC, Shachar Shemesh wrote:
 On 14/06/18 03:01, DigitalDesigns wrote:
 Is there an obfuscator for D that at least renames 
 identifiers? This is because sometimes they leak from various 
 processes and could be potential sources of attack.
 
 It would be a tool that probably just replaces their values 
 with, say their hash + something else and done pre release 
 build. Ideally it would be able to compile with dmd and all in 
 memory or use temp storage without file issues. It can't 
 modify the code directly because then that would be permanent.
 

 I highly doubt it.

 You see, with introspection and run-time execution, writing 
 such a tool is equivalent to solving the halting problem. You 
 simply do not know what you're affecting.

 There are some cases where you might know at x% certainty that 
 it's okay to rename. Someone might do a best-effort based tool. 
 I'm not aware of one.


 With that said, what you're trying to achieve is probably not a 
 good idea anyways. With very few exceptions(1), 
 reverse-engineering code to figure out what it does is not 
 considerably more difficult than using the source, even when 
 none of the identifiers leak at all. Certain aspects of 
 creating attacks are even easier with good rev-eng tools than 
 in source form.

 Shachar

 1- One notable exception is complex algorithmic code. I will 
 point out that those are difficult to figure out from source 
 code too, and it usually takes very good documentation to be 
 able to do so, so even there I'm not sure my original statement 
 doesn't hold.


Just one question! Are you kidding me?

Jun 13 2018

Stefan Koch <uplink.coder googlemail.com> writes:

On Thursday, 14 June 2018 at 05:21:03 UTC, DigitalDesigns wrote:
 Just one question! Are you kidding me?

He most certainly is not.

In fact, I sometimes prefer size-optimized machine code over source, 
because it is a trustworthy representation of what the program does, 
rather than a half-truth about what it should do.

Jun 13 2018

Shachar Shemesh <shachar weka.io> writes:

On 14/06/18 08:21, DigitalDesigns wrote:
 On Thursday, 14 June 2018 at 02:13:58 UTC, Shachar Shemesh wrote:
 With that said, what you're trying to achieve is probably not a good 
 idea anyways. With very few exceptions(1), reverse-engineering code to 
 figure out what it does is not considerably more difficult than using 
 the source, even when none of the identifiers leak at all. Certain 
 aspects of creating attacks are even easier with good rev-eng tools 
 than in source form.

 Shachar

 
 
 Just one question! Are you kidding me?

First of all, run your program under strace. For a surprising percentage 
of programs, that alone should give you a fairly good idea of what the 
program is doing. ltrace goes further, but it can be easily defeated by 
linking statically, so it is probably irrelevant to our current discussion.

Next, try loading your program in Ida Pro 
(https://www.hex-rays.com/products/ida/index.shtml). You will notice 
that program flow practically jumps out at you with no further work on 
your part.

Other tricks require a little more knowledge, but are still exceedingly 
effective.

In a demonstration I saw in 2002, Halvar Flake showed how he uses Ida to 
graph the branches, and then uses a tool he built to place breakpoints on 
the branch points. Next he started feeding inputs to the program, and 
colored the graph where each input sent the code. He used that to find 
the input that would bring the code path to the line he thought 
might be vulnerable.

If I had to do this trick today for *my own* programs, I'd still use Ida 
and the compiled code.

So, no, I was not kidding. Not even close.

Shachar

Jun 14 2018

DigitalDesigns <DigitalDesigns gmail.com> writes:

On Thursday, 14 June 2018 at 08:54:16 UTC, Shachar Shemesh wrote:
 On 14/06/18 08:21, DigitalDesigns wrote:
 On Thursday, 14 June 2018 at 02:13:58 UTC, Shachar Shemesh 
 wrote:
 With that said, what you're trying to achieve is probably not 
 a good idea anyways. With very few exceptions(1), 
 reverse-engineering code to figure out what it does is not 
 considerably more difficult than using the source, even when 
 none of the identifiers leak at all. Certain aspects of 
 creating attacks are even easier with good rev-eng tools than 
 in source form.

 Shachar

 
 
 Just one question! Are you kidding me?

 First of all, run your program under strace. For a surprising 
 percentage of the programs that should give you a fairly good 
 idea of what the program is doing. ltrace goes further, but it 
 can be easily defeated by statically linking, so probably 
 irrelevant for our current discussion.

 Next, try loading your program in Ida Pro 
 (https://www.hex-rays.com/products/ida/index.shtml). You will 
 notice that program flow practically jumps out at you with no 
 further work on your part.

 Other tricks require a little more knowledge, but are still 
 exceedingly effective.

 In a demonstration I saw in 2002, Halvar Flake showed how he 
 uses Ida to graph the branches, and then use a tool he built to 
 place breakpoints on the branch points. Next he started feeding 
 inputs to the program, and colored the graph where the input 
 sent the code. He used that to find the correct input that 
 would bring the code path to the line he thought might be 
 vulnerable.

 If I had to do this trick today for *my own* programs, I'd 
 still use Ida and the compiled code.

 So, no, I was not kidding. Not even close.

 Shachar

Wait? Are you sure you are not kidding? Do you want another shot?

Jun 14 2018

Shachar Shemesh <shachar weka.io> writes:

On 14/06/18 13:39, DigitalDesigns wrote:
 
 Wait? Are you sure you are not kidding? Do you want another shot?

No, I'm fine. Thank you. I am not out here to convert anyone. If you 
want to believe the magic of obfuscation, go right ahead.

You can probably even leverage D's CTFE to do it inside the compiler 
while not making your program too much uglier. Something like replacing 
definitions with:

mixin Obfuscate!(int, "variableName");

and use with:

Deobfuscate!"variableName";

Shouldn't be too difficult to create.
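
A rough sketch of what that could look like, under several assumptions: `obfuscatedName` is a hypothetical helper (an FNV-1a hash chosen only for illustration), and this version only handles module-level variables, because `Deobfuscate` looks the generated name up in module scope.

```d
import std.stdio : writeln;

// Hypothetical helper: maps a readable name to an opaque identifier.
// It is evaluated by CTFE whenever it is used inside a mixin.
string obfuscatedName(string name) pure
{
    uint h = 2166136261u;          // FNV-1a offset basis
    foreach (char c; name)
    {
        h ^= c;
        h *= 16777619u;            // FNV-1a prime
    }
    string result = "_";           // identifiers may not start with a digit
    foreach (i; 0 .. 8)
        result ~= "0123456789abcdef"[(h >> (28 - 4 * i)) & 0xF];
    return result;
}

// Declares a variable of type T under the hashed name.
mixin template Obfuscate(T, string name)
{
    mixin("T " ~ obfuscatedName(name) ~ ";");
}

// Resolves the readable name back to the hashed variable.
template Deobfuscate(string name)
{
    mixin("alias Deobfuscate = " ~ obfuscatedName(name) ~ ";");
}

mixin Obfuscate!(int, "variableName");

void main()
{
    Deobfuscate!"variableName" = 42;
    writeln(Deobfuscate!"variableName");
}
```

Whether the readable string still ends up in the binary (for example through template instance mangling) is not something this sketch settles; that would have to be checked on the actual build output.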

Shachar

Jun 14 2018

DigitalDesigns <DigitalDesigns gmail.com> writes:

On Thursday, 14 June 2018 at 11:07:17 UTC, Shachar Shemesh wrote:
 On 14/06/18 13:39, DigitalDesigns wrote:
 
 Wait? Are you sure you are not kidding? Do you want another 
 shot?

 No, I'm fine. Thank you. I am not out here to convert anyone. 
 If you want to believe the magic of obfuscation, go right ahead.

Dude, don't be an idiot! Please! Of course, here we go...

 You can probably even leverage D's CTFE to do it inside the 
 compiler while not making your program too much uglier. 
 Something like replacing definitions with:

 mixin Obfuscate!(int, "variableName");

 and use with:

 Deobfuscate!"variableName";

 Shouldn't be too difficult to create.

That's the best you can do? Do you really expect me to go and 
manually obfuscate an entire program? Do you want to try again? 3 
strikes and you're out!

Jun 14 2018

Ali Çehreli <acehreli yahoo.com> writes:

On 06/14/2018 04:33 AM, DigitalDesigns wrote:

 3 strikes and
 you're out!

That was your third. :/

Ali

Jun 14 2018

Cym13 <cpicard openmailbox.org> writes:

On Thursday, 14 June 2018 at 10:39:19 UTC, DigitalDesigns wrote:
 Wait? Are you sure you are not kidding? Do you want another 
 shot?

I won't say that obfuscation is entirely useless: if I have to 
choose, I'll of course take the version with symbols for reverse 
engineering, and there are specific cases where symbols carry way 
too much information for you to want it disclosed (the most common 
being names of customers, projects, etc.).

But, as someone whose job is to find security issues in software 
(and other stuff), be it with or without source, I can say with 
professional certainty that things like changing all identifiers 
to single-letter IDs don't slow me down in the slightest in my 
assignments. That's just the state of things: reversers deal with 
stripped stuff all the time; identifiers are just nice to have.

So instead, here's what would slow a reverse engineer:

- Remove strings. Make sure to remove as many as you can, 
especially debug statements. Hide the rest by encrypting in 
memory. Even if it is possible to decrypt it or read it at 
runtime it'll be way harder to correlate things together.

- Pack. Have your software decipher itself in memory at runtime, 
not all at once but one section at a time, dynamically. Use random 
keys automatically generated at compile time for that; that'll 
mess up binary diffs.

- Include binary tricks to mess with disassemblers. There are 
many constructs that common disassemblers interpret badly.

- Mess with the structure. If you can, remove all conditions and 
loops. A reverser can often just look at a function's logical 
graph and know what kind of work it is doing. The movfuscator is 
a good example.

- Add runtime checks based on time deltas between two points of 
the code in different functions. Generate other output based on 
that.

- Be sure to encrypt all communications of course.
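
As an illustration of the first point only, here is a minimal sketch of hiding a string with a key derived at compile time. The names `keyFrom`, `xorString`, `unxor`, and `licenseMsgEnc` are made up for the example, and a single-byte XOR is a stand-in for real encryption, not a recommendation.

```d
import std.stdio : writeln;

// Hypothetical helper: derives a one-byte key from a seed string.
ubyte keyFrom(string seed) pure
{
    uint h = 2166136261u;              // FNV-1a, used only to mix the seed
    foreach (char c; seed)
    {
        h ^= c;
        h *= 16777619u;
    }
    return cast(ubyte)((h & 0xFF) | 0x80);   // never zero
}

// Encodes a string into XOR-ed bytes; usable in CTFE.
ubyte[] xorString(string s, ubyte key) pure
{
    auto result = new ubyte[s.length];
    foreach (i, char c; s)
        result[i] = cast(ubyte)(c ^ key);
    return result;
}

// Decodes the bytes back into a string at run time.
string unxor(const(ubyte)[] data, ubyte key) pure
{
    auto result = new char[data.length];
    foreach (i, b; data)
        result[i] = cast(char)(b ^ key);
    return result.idup;
}

// The key changes with every build because __TIMESTAMP__ does.
enum ubyte buildKey = keyFrom(__TIMESTAMP__);

// A module-level initializer is evaluated by CTFE, so the intent is that
// only the encoded bytes are stored in the binary.
immutable ubyte[] licenseMsgEnc = xorString("license check failed", buildKey);

void main()
{
    writeln(unxor(licenseMsgEnc, buildKey));
}
```

It is worth running `strings` on the resulting binary to confirm the plain text is really gone, since that depends on the compiler and build flags. The decoded string also exists in memory once `unxor` has run, which matches the caveat above that it can still be read at runtime.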

In short, do what good malware does.

Jun 14 2018

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Thursday, 14 June 2018 at 08:54:16 UTC, Shachar Shemesh wrote:
 So, no, I was not kidding. Not even close.

I've had some experience on both sides of this... so, I think I 
can say with some certainty that debugging symbols make 
reverse-engineering MUCH easier (many hunts to find the relevant 
code can be reduced to a keyword search), so I think it's a valid 
concern.

That D leaks identifiers and other bits from the source code is a 
real issue preventing some real-world use cases. E.g., there 
might be legal obligations in place where leaking source code 
identifiers could be considered a breach of NDA etc. In one case, 
we needed to write an RTTI patcher for C++ (MSVC) after 
updating/reconfiguring the build toolchain, as the compiler would 
otherwise place the class names of some classes in the binary.
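
A small sketch of the kind of leak in question, using a made-up class name: D's runtime type information keeps the fully qualified name of every class, so the name is available at run time and therefore has to be stored somewhere in the binary.

```d
import std.stdio : writeln;

// Hypothetical class whose name reveals a customer and a feature.
class AcmeCorpInvoiceImporter
{
}

void main()
{
    Object obj = new AcmeCorpInvoiceImporter;

    // typeid on a class instance yields its TypeInfo_Class, whose `name`
    // field holds the module-qualified class name as a run-time string.
    writeln(typeid(obj).name);
}
```

Renaming the class in the source is the only way to change that string without patching the binary afterwards, which is essentially what the RTTI patcher mentioned above did for C++.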

Jun 14 2018

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Thursday, 14 June 2018 at 00:01:31 UTC, DigitalDesigns wrote:
 Is there an obfuscator for D that at least renames identifiers? 
 This is because sometimes they leak from various processes and 
 could be potential sources of attack.

Yes, DustMite has an obfuscation mode.

You will need to give it a test command which checks if the file 
is still a working D program. Building the program and running 
its unit tests is generally sufficient for this purpose.

Jun 13 2018

Norm <norm.rowtree gmail.com> writes:

On Thursday, 14 June 2018 at 00:01:31 UTC, DigitalDesigns wrote:
 Is there an obfuscator for D that at least renames identifiers? 
 This is because sometimes they leak from various processes and 
 could be potential sources of attack.

 It would be a tool that probably just replaces their values 
 with, say their hash + something else and done pre release 
 build. Ideally it would be able to compile with dmd and all in 
 memory or use temp storage without file issues. It can't modify 
 the code directly because then that would be permanent.

I don't know any specifically for D but these C/C++ tools might 
help as a starting point.

https://github.com/obfuscator-llvm/obfuscator/wiki
https://github.com/obfuscator-llvm/obfuscator/tree/llvm-4.0

https://sourceforge.net/projects/cshroud/

Jun 13 2018
