digitalmars.D.learn - How to map machine instructions in memory and execute them? (Aka, how to

rempas <rempas tutanota.com> writes:
I tried to find anything that shows code, but I wasn't able to
find anything except for an answer on Stack Overflow. I found
a lot of theory but no practical code that works. What I want to
do is allocate memory (with an executable mapping), add the machine
instructions, then allocate another memory block for the data
and, finally, execute the block of memory that contains the code.
So, something like what the OS loader does when reading an
executable. I have come up with the following code:

```d
import core.stdc.stdio;
import core.stdc.string;
import core.stdc.stdlib;
import core.sys.linux.sys.mman;

extern (C) void main() {
   char* data = cast(char*)mmap(null, cast(ulong)15, 
PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
   memset(data, 0x0, 15); // Default value

   *data = 'H';
   data[1] = 'e';
   data[2] = 'l';
   data[3] = 'l';
   data[4] = 'o';
   data[5] = ' ';

   data[6] = 'w';
   data[7] = 'o';
   data[8] = 'r';
   data[9] = 'l';
   data[10] = 'd';
   data[11] = '!';

   void* code = mmap(null, cast(ulong)500, PROT_READ | PROT_WRITE 
| PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);
   memset(code, 0xc3, 500); // Default value

   /* Call the "write" and "exit" system calls*/
   // mov rax, 0x04
   *cast(char*)code = 0x48;
   *cast(char*)(code + 1) = 0xC7;
   *cast(char*)(code + 2) = 0xC0;
   *cast(char*)(code + 3) = 0x04;
   *cast(char*)(code + 4) = 0x00;
   *cast(char*)(code + 5) = 0x00;
   *cast(char*)(code + 6) = 0x00;

   // mov rbx, 0x01
   *cast(char*)(code + 7)  = 0x48;
   *cast(char*)(code + 8)  = 0xC7;
   *cast(char*)(code + 9)  = 0xC3;
   *cast(char*)(code + 10) = 0x01;
   *cast(char*)(code + 11) = 0x00;
   *cast(char*)(code + 12) = 0x00;
   *cast(char*)(code + 13) = 0x00;

   // mov rdx, <wordLen>
   *cast(char*)(code + 14) = 0x48;
   *cast(char*)(code + 15) = 0xC7;
   *cast(char*)(code + 16) = 0xC2;
   *cast(char*)(code + 17) = 12;
   *cast(char*)(code + 18) = 0x00;
   *cast(char*)(code + 19) = 0x00;
   *cast(char*)(code + 20) = 0x00;

   // mov rdx, <location where data are allocated>
   *cast(char*)(code + 21) = 0x48;
   *cast(char*)(code + 22) = 0xC7;
   *cast(char*)(code + 23) = 0xC1;
   *cast(long*)(code + 24) = cast(long)data;
   *cast(char*)(code + 32) = 0x00;

   // int 0x80
   *cast(char*)(code + 33) = 0xcd;
   *cast(char*)(code + 34) = 0x80;

   /* Execute the code */
   (cast(void* function())&code)();
}
```

I'm 100% sure that the instructions work, as I have tested them
with another example that creates an ELF executable file, and it
executed correctly. So unless I copy-pasted them wrong, the
instructions are not the problem. The only thing that may be
wrong is how I'm getting the location of the "data" "segment".
In my eyes, this uses 8 bytes for the memory address (I'm on a
64-bit machine) and it takes the memory address the "data"
variable holds, so I would expect it to work...

Any ideas?
Jun 06 2022
Alain De Vos <devosalain ymail.com> writes:
Note, it is also possible to do inline assembly with asm{...}
or __asm(T) {..}.
Jun 06 2022
rempas <rempas tutanota.com> writes:
On Monday, 6 June 2022 at 15:27:12 UTC, Alain De Vos wrote:
 Note, it is also possible to do inline assembly with asm{...}
 or __asm(T) {..}.
Thank you for the info! I am aware of that; I don't want to do this in practice, I just want to learn how it works. It will be useful when I build my own OS.
Jun 06 2022
Adam D Ruppe <destructionator gmail.com> writes:
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
   void* code = mmap(null, cast(ulong)500, PROT_READ | 
 PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);
On a lot of systems, it can't be executable and writable at the same time; it is a security measure (see https://en.wikipedia.org/wiki/W%5EX ).

So you might have to mprotect it to remove the write permission before trying to execute it.

idk though
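
A minimal sketch of that suggestion, assuming x86-64 Linux: map the block read/write, copy the bytes in, then use `mprotect` to swap write permission for execute permission before calling it. The six-byte routine (`mov eax, 42; ret`), the 4096-byte length and the name `runWithoutWritableExec` are illustrative assumptions, not something from the thread.

```d
import core.stdc.stdio : perror, printf;
import core.stdc.string : memcpy;
import core.sys.posix.sys.mman;

// x86-64 machine code for: mov eax, 42 ; ret  (illustrative payload)
immutable ubyte[6] machineCode = [0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3];

// The generated routine follows the C calling convention and returns an int.
alias Fn = extern (C) int function();

/// Maps memory read/write, copies the code in, flips the mapping to
/// read/execute, calls it, and returns its result (or -1 on failure).
int runWithoutWritableExec()
{
    enum size_t len = 4096; // assumed size; mappings are page-granular anyway

    void* mem = mmap(null, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
    {
        perror("mmap");
        return -1;
    }

    memcpy(mem, machineCode.ptr, machineCode.length);

    // Drop write permission and add execute permission in one step.
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0)
    {
        perror("mprotect");
        munmap(mem, len);
        return -1;
    }

    auto fn = cast(Fn) mem; // cast the pointer value itself, not its address
    int result = fn();
    munmap(mem, len);
    return result;
}

void main()
{
    printf("%d\n", runWithoutWritableExec());
}
```

Checking the return values of `mmap` and `mprotect` also shows which step a hardened system rejects, rather than leaving that to a crash at the call.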
Jun 06 2022
rempas <rempas tutanota.com> writes:
On Monday, 6 June 2022 at 16:08:28 UTC, Adam D Ruppe wrote:
 On a lot of systems, it can't be executable and writable at the 
 same time, it is a security measure.

 see https://en.wikipedia.org/wiki/W%5EX


 so you might have to mprotect it to remove the write permission 
 before trying to execute it.

 idk though
Thank you! This was very helpful, and I can see why it is a clever idea not to allow it (and I love that OpenBSD was the first to introduce it!!), and I love security stuff ;) However, even with "mprotect", or if I just use "PROT_READ" and "PROT_EXEC", it still doesn't work, so there must be something else I'm doing wrong...
Jun 06 2022
Guillaume Piolat <first.last gmail.com> writes:
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
 Any ideas?
See:
https://github.com/GhostRain0/xbyak
https://github.com/MrSmith33/vox/blob/master/source/vox/utils/mem.d
Jun 06 2022
rempas <rempas tutanota.com> writes:
On Monday, 6 June 2022 at 16:24:58 UTC, Guillaume Piolat wrote:
 See:
 https://github.com/GhostRain0/xbyak
 https://github.com/MrSmith33/vox/blob/master/source/vox/utils/mem.d
Thank you! And I just noticed that the second source is from Vox!!!!
Jun 06 2022
Johan <j j.nl> writes:
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
 ```
   // mov rdx, <wordLen>
   *cast(char*)(code + 14) = 0x48;
   *cast(char*)(code + 15) = 0xC7;
   *cast(char*)(code + 16) = 0xC2;
   *cast(char*)(code + 17) = 12;
   *cast(char*)(code + 18) = 0x00;
   *cast(char*)(code + 19) = 0x00;
   *cast(char*)(code + 20) = 0x00;

   // mov rdx, <location where data are allocated>
   *cast(char*)(code + 21) = 0x48;
   *cast(char*)(code + 22) = 0xC7;
   *cast(char*)(code + 23) = 0xC1;
   *cast(long*)(code + 24) = cast(long)data;
   *cast(char*)(code + 32) = 0x00;
   ```
This instruction is wrong. Note that you are writing twice to RDX, but also that you are using `mov sign_extend imm32, reg64` instead of `mov imm64, reg64` (`0x48 0xBA`?). Third, why append an extra zero (`*cast(char*)(code + 32) = 0x00;`)? That must be a bug too.

cheers,
  Johan
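
To make the difference between the two encodings concrete, here is a small sketch that builds both byte sequences. The helper names `movImm32SignExtended`, `movImm64` and `dump` are hypothetical, and the sample address is made up. `REX.W C7 /0` carries only a 4-byte immediate that the CPU sign-extends, while `REX.W B8+r` carries the full 8 bytes, which is what an address returned by `mmap` typically needs on x86-64 Linux. With register number 2 (rdx) the second form starts with `48 BA`; for rcx it is `48 B9`.

```d
import core.stdc.stdio : printf;

/// Encodes `mov r64, imm32` (REX.W C7 /0); the immediate is sign-extended
/// to 64 bits by the CPU. `reg` is 0..7 (rax=0, rcx=1, rdx=2, rbx=3, ...).
ubyte[7] movImm32SignExtended(ubyte reg, int imm)
{
    ubyte[7] b;
    b[0] = 0x48;                        // REX.W prefix
    b[1] = 0xC7;                        // opcode
    b[2] = cast(ubyte)(0xC0 | reg);     // ModRM: register-direct, /0
    foreach (i; 0 .. 4)
        b[3 + i] = cast(ubyte)(imm >> (8 * i)); // little-endian immediate
    return b;
}

/// Encodes `mov r64, imm64` (REX.W B8+r), often printed as `movabs`.
/// `reg` is 0..7; r8..r15 would need a different REX prefix (not handled).
ubyte[10] movImm64(ubyte reg, ulong imm)
{
    ubyte[10] b;
    b[0] = 0x48;                        // REX.W prefix
    b[1] = cast(ubyte)(0xB8 + reg);     // opcode with register folded in
    foreach (i; 0 .. 8)
        b[2 + i] = cast(ubyte)(imm >> (8 * i)); // little-endian immediate
    return b;
}

/// Prints a byte sequence as space-separated hex.
void dump(const(ubyte)[] bytes)
{
    foreach (x; bytes)
        printf("%02X ", cast(uint) x);
    printf("\n");
}

void main()
{
    auto len  = movImm32SignExtended(2, 12);   // mov rdx, 12
    auto addr = movImm64(1, 0x7F00_1122_3344); // mov rcx, <made-up address>
    dump(len[]);   // 48 C7 C2 0C 00 00 00
    dump(addr[]);  // 48 B9 44 33 22 11 00 7F 00 00
}
```

Note that the 64-bit form is 10 bytes long, so every offset after it shifts compared with a 7-byte instruction.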
Jun 06 2022
rempas <rempas tutanota.com> writes:
On Monday, 6 June 2022 at 18:05:23 UTC, Johan wrote:
 This instruction is wrong. Note that you are writing twice to 
 RDX, but also that you are using `mov sign_extend imm32, reg64` 
 instead of `mov imm64, reg64` (`0x48 0xBA`?). Third, why append 
 an extra zero (`*cast(char*)(code + 32) = 0x00;`)? That must be 
 a bug too.

 cheers,
   Johan
Thanks! It seems that there is probably a "typo" in the original [source](https://github.com/vishen/go-x64-executable) that I got the code from. The hex values are different, however, so the mistake is only in the comment; the code works in the example repository (and I made a D version that works too). The padding at the end seems to be necessary, or else the example doesn't compile (I don't know why; I'm a SUPER n00b when it comes to machine language, I know almost nothing!). I'm also not sure what the encoding for `mov imm64, reg64` would be, as I tried typing what you put in the parentheses and it doesn't seem to work.
Jun 06 2022
rempas <rempas tutanota.com> writes:
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:

In case someone is wondering, I found an answer in another
forum. The code is the following:

```d
import core.stdc.stdio;
import core.stdc.string;
import core.stdc.stdlib;
import core.sys.posix.sys.mman;

void putbytes(char **code, const char *bytes) {
   uint bt;
   for (int i = 0, n; sscanf(bytes + i, "%x%n", &bt, &n) == 1; i 
+= n)
     { *(*code)++ = cast(char)bt; }
}

void putdata(char **code, char** data) {
   memcpy(*code, data, (*data).sizeof);
   *code += (*data).sizeof;
}

extern (C) void main() {
   char *data = cast(char*)mmap(null, cast(ulong)15, PROT_READ | 
PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
   strcpy(data, "Hello world!\n");

   char *code = cast(char*)mmap(null, cast(ulong)500, PROT_READ | 
PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);
   char *pos = code;

   // Call the "write" and "exit" system calls
   putbytes(&pos, "48 C7 C0 1 0 0 0");   // mov rax, 0x01     (write syscall)
   putbytes(&pos, "48 C7 C7 1 0 0 0");   // mov rdi, 0x01     (stdout)
   putbytes(&pos, "48 C7 C2 D 0 0 0");   // mov rdx, 13       (string length)
   putbytes(&pos, "48 BE");              // movabs rsi, data  (string address)
   putdata(&pos, &data);
   putbytes(&pos, "0F 05");              // syscall
   putbytes(&pos, "48 C7 C0 3C 0 0 0");  // mov rax, 0x3C     (exit syscall)
   putbytes(&pos, "0F 05");              // syscall

   // Execute the code
   (cast(void* function())code)();
}
```
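
One difference from the first attempt is easy to miss: the call site here casts `code` (the address returned by `mmap`), whereas the original cast `&code` (the address of the local pointer variable, which lives on the stack). The sketch below only compares the two resulting pointers and never calls either; the `stub` array is a stand-in for the mapped block and the alias name `Fn` is an assumption.

```d
import core.stdc.stdio : printf;

alias Fn = extern (C) void function();

void main()
{
    ubyte[1] stub = [0xC3];  // stand-in for the mmap'd block; never executed here
    void* code = stub.ptr;

    auto viaValue   = cast(Fn) code;   // target: the block that holds the bytes
    auto viaAddress = cast(Fn) &code;  // target: the variable `code` itself

    printf("block:      %p\n", code);
    printf("viaValue:   %p\n", cast(void*) viaValue);
    printf("viaAddress: %p\n", cast(void*) viaAddress);

    assert(cast(void*) viaValue is code);    // jumps to the machine code
    assert(cast(void*) viaAddress !is code); // would jump to the pointer's storage
}
```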
Jun 08 2022
max haughton <maxhaton gmail.com> writes:
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
 I tried to find anything that shows code, but I wasn't able
 to find anything except for an answer on Stack Overflow. I found
 a lot of theory but no practical code that works. What I
 want to do is allocate memory (with an executable mapping), add the
 machine instructions, then allocate another memory block for
 the data and, finally, execute the block of memory that contains
 the code. So, something like what the OS loader does when
 reading an executable. I have come up with the following code:

 [...]
If you know the instructions ahead of time, LDC and GDC will both let you put a function in its own section, and you can then use some linker magic to get pointers to the beginning and end of that section.
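
A rough sketch of what that could look like, assuming LDC on Linux with GNU ld or lld (it will not build with DMD, and the GDC attribute is not shown). It relies on `ldc.attributes.section` and on the linker's convention of defining `__start_<name>` and `__stop_<name>` for sections whose name is a valid C identifier. The section name `jitcode` and the function `answer` are made up for illustration.

```d
// Assumption: LDC + GNU ld/lld on Linux.
import ldc.attributes : section;
import core.stdc.stdio : printf;

// Place this function in its own section. The name must be a valid
// C identifier for the linker to define the boundary symbols below.
@section("jitcode") extern (C) int answer()
{
    return 42;
}

// Boundary symbols synthesized by the linker; only their addresses matter.
extern (C) extern __gshared ubyte __start_jitcode;
extern (C) extern __gshared ubyte __stop_jitcode;

void main()
{
    const(ubyte)* begin = &__start_jitcode;
    const(ubyte)* end   = &__stop_jitcode;

    // The bytes in [begin, end) are the compiled body of answer() plus any
    // padding, ready to be copied or inspected.
    printf("section holds %zu bytes; answer() = %d\n",
           cast(size_t)(end - begin), answer());
}
```

Calling `answer()` directly keeps the section referenced, so the linker's section garbage collection should not discard it.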
Jun 08 2022