digitalmars.D.learn - How to map machine instructions in memory and execute them? (Aka, how to
- rempas (79/79) Jun 06 2022 I tried to find anything that will show code but I wasn't able to
- Alain De Vos (2/2) Jun 06 2022 Note , it is also possible to do inline assembly with asm{...}
- rempas (4/6) Jun 06 2022 Thank you for the info! I am aware of that, I don't want to
- Adam D Ruppe (7/9) Jun 06 2022 On a lot of systems, it can't be executable and writable at the
- rempas (7/13) Jun 06 2022 Thank you! This was very helpful and I can see why it is a clever
- Guillaume Piolat (4/5) Jun 06 2022 See:
- rempas (3/6) Jun 06 2022 Thank you! And I just noticed that the second source is from
- Johan (8/24) Jun 06 2022 This instruction is wrong. Note that you are writing twice to
- rempas (12/19) Jun 06 2022 Thanks! It seems that there is probably a "typo" from the
- rempas (43/43) Jun 08 2022 On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
- max haughton (5/14) Jun 08 2022 If you know the instructions ahead of time LDC and GDC will both
I tried to find anything that shows code, but I wasn't able to find anything except for an answer on Stack Overflow. I found a lot of theory but no practical code that works. What I want to do is allocate memory (with execution mapping), add the machine instructions, then allocate another memory block for the data, and finally execute the block of memory that contains the code. So something like what the OS loader does when reading an executable. I have come up with the following code:

```d
import core.stdc.stdio;
import core.stdc.string;
import core.stdc.stdlib;
import core.sys.linux.sys.mman;

extern (C) void main() {
    char* data = cast(char*)mmap(null, cast(ulong)15, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    memset(data, 0x0, 15); // Default value

    *data = 'H';
    data[1] = 'e';
    data[2] = 'l';
    data[3] = 'l';
    data[4] = 'o';
    data[5] = ' ';

    data[6] = 'w';
    data[7] = 'o';
    data[8] = 'r';
    data[9] = 'l';
    data[10] = 'd';
    data[11] = '!';

    void* code = mmap(null, cast(ulong)500, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);
    memset(code, 0xc3, 500); // Default value

    /* Call the "write" and "exit" system calls*/
    // mov rax, 0x04
    *cast(char*)code = 0x48;
    *cast(char*)(code + 1) = 0xC7;
    *cast(char*)(code + 2) = 0xC0;
    *cast(char*)(code + 3) = 0x04;
    *cast(char*)(code + 4) = 0x00;
    *cast(char*)(code + 5) = 0x00;
    *cast(char*)(code + 6) = 0x00;

    // mov rbx, 0x01
    *cast(char*)(code + 7) = 0x48;
    *cast(char*)(code + 8) = 0xC7;
    *cast(char*)(code + 9) = 0xC3;
    *cast(char*)(code + 10) = 0x01;
    *cast(char*)(code + 11) = 0x00;
    *cast(char*)(code + 12) = 0x00;
    *cast(char*)(code + 13) = 0x00;

    // mov rdx, <wordLen>
    *cast(char*)(code + 14) = 0x48;
    *cast(char*)(code + 15) = 0xC7;
    *cast(char*)(code + 16) = 0xC2;
    *cast(char*)(code + 17) = 12;
    *cast(char*)(code + 18) = 0x00;
    *cast(char*)(code + 19) = 0x00;
    *cast(char*)(code + 20) = 0x00;

    // mov rdx, <location where data are allocated>
    *cast(char*)(code + 21) = 0x48;
    *cast(char*)(code + 22) = 0xC7;
    *cast(char*)(code + 23) = 0xC1;
    *cast(long*)(code + 24) = cast(long)data;
    *cast(char*)(code + 32) = 0x00;

    // int 0x80
    *cast(char*)(code + 33) = 0xcd;
    *cast(char*)(code + 34) = 0x80;

    /* Execute the code */
    (cast(void* function())&code)();
}
```

I'm 100% sure that the instructions work, as I have tested them with another example that creates an ELF executable file, and it executed correctly. So unless I copy-pasted them wrong, the instructions are not the problem. The only thing that may be wrong is how I'm getting the location of the "data" "segment". In my eyes, this uses 8 bytes for the memory address (I'm on a 64-bit machine) and it takes the memory address the "data" variable holds, so I would expect it to work... Any ideas?
Jun 06 2022
Note, it is also possible to do inline assembly with asm{...} or __asm(T) {..}.
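For reference, a minimal sketch of the `asm { ... }` form mentioned above, assuming an x86-64 target and a compiler that accepts DMD-style inline assembly (DMD or LDC; GDC expects GCC-style extended asm instead). The function name `addAsm` is illustrative only.

```d
/// Adds two ints using DMD-style inline assembly (x86-64, DMD or LDC).
int addAsm(int a, int b)
{
    int result;
    asm
    {
        mov EAX, a;      // load the first parameter into EAX
        add EAX, b;      // add the second parameter
        mov result, EAX; // store EAX into the local variable
    }
    return result;
}

unittest
{
    assert(addAsm(2, 3) == 5);
}
```

Locals and parameters can be referenced by name inside the block, which is what makes this form convenient compared with hand-writing bytes.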
Jun 06 2022
On Monday, 6 June 2022 at 15:27:12 UTC, Alain De Vos wrote:
> Note, it is also possible to do inline assembly with asm{...} or __asm(T) {..}.

Thank you for the info! I am aware of that, but I don't want to do this in practice; I just want to learn how it works. It will be useful when I build my own OS.
Jun 06 2022
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
> void* code = mmap(null, cast(ulong)500, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);

On a lot of systems, it can't be executable and writable at the same time; it is a security measure. See https://en.wikipedia.org/wiki/W%5EX

So you might have to mprotect it to remove the write permission before trying to execute it. idk though
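A minimal sketch of the approach suggested above, assuming x86-64 Linux and druntime's `core.sys.posix.sys.mman` bindings: the pages are mapped read/write, the machine code is copied in, and `mprotect` switches them to read/execute before the call, so they are never writable and executable at the same time. The names `mapExecutable` and `JitFn` are hypothetical, and the six bytes encode `mov eax, 42; ret`. Note that the function pointer is made from the mapping's address itself (`mem`), not from the address of the variable that holds it.

```d
import core.sys.posix.sys.mman;

/// Signature of the generated code: no arguments, result in EAX.
alias JitFn = extern (C) int function();

/// Copies `code` into a fresh mapping, then flips the pages from RW to RX
/// before anything executes (W^X friendly). Returns null on failure.
void* mapExecutable(const(ubyte)[] code)
{
    void* mem = mmap(null, code.length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return null;

    // Copy the machine code while the pages are still writable.
    (cast(ubyte*) mem)[0 .. code.length] = code[];

    // Drop write permission and add execute permission.
    if (mprotect(mem, code.length, PROT_READ | PROT_EXEC) != 0)
    {
        munmap(mem, code.length);
        return null;
    }
    return mem;
}

unittest
{
    // x86-64: mov eax, 42 ; ret
    static immutable ubyte[] code = [0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3];

    void* mem = mapExecutable(code);
    assert(mem !is null);
    scope (exit) munmap(mem, code.length);

    auto fn = cast(JitFn) mem; // call `mem` itself, not `&mem`
    assert(fn() == 42);
}
```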
Jun 06 2022
On Monday, 6 June 2022 at 16:08:28 UTC, Adam D Ruppe wrote:
> On a lot of systems, it can't be executable and writable at the same time, it is a security measure. see https://en.wikipedia.org/wiki/W%5EX so you might have to mprotect it to remove the write permission before trying to execute it. idk though

Thank you! This was very helpful, and I can see why it is a clever idea not to allow it (and I love that OpenBSD was the first to introduce it!!). I love security stuff ;) However, even with "mprotect", or if I just use "PROT_READ" and "PROT_EXEC", it still doesn't work, so there must be something else I'm doing wrong...
Jun 06 2022
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
> Any ideas?

See:
- https://github.com/GhostRain0/xbyak
- https://github.com/MrSmith33/vox/blob/master/source/vox/utils/mem.d
Jun 06 2022
On Monday, 6 June 2022 at 16:24:58 UTC, Guillaume Piolat wrote:
> See:
> - https://github.com/GhostRain0/xbyak
> - https://github.com/MrSmith33/vox/blob/master/source/vox/utils/mem.d

Thank you! And I just noticed that the second source is from Vox!!!!
Jun 06 2022
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
> ```
> // mov rdx, <wordLen>
> *cast(char*)(code + 14) = 0x48;
> *cast(char*)(code + 15) = 0xC7;
> *cast(char*)(code + 16) = 0xC2;
> *cast(char*)(code + 17) = 12;
> *cast(char*)(code + 18) = 0x00;
> *cast(char*)(code + 19) = 0x00;
> *cast(char*)(code + 20) = 0x00;
>
> // mov rdx, <location where data are allocated>
> *cast(char*)(code + 21) = 0x48;
> *cast(char*)(code + 22) = 0xC7;
> *cast(char*)(code + 23) = 0xC1;
> *cast(long*)(code + 24) = cast(long)data;
> *cast(char*)(code + 32) = 0x00;
> ```

This instruction is wrong. Note that you are writing twice to RDX, but also that you are using `mov sign_extend imm32, reg64` instead of `mov imm64, reg64` (`0x48 0xBA`?). Third, why append an extra zero (`*cast(char*)(code + 32) = 0x00;`)? That must be a bug too.

cheers,
Johan
Jun 06 2022
On Monday, 6 June 2022 at 18:05:23 UTC, Johan wrote:
> This instruction is wrong. Note that you are writing twice to RDX, but also that you are using `mov sign_extend imm32, reg64` instead of `mov imm64, reg64` (`0x48 0xBA`?). Third, why append an extra zero (`*cast(char*)(code + 32) = 0x00;`)? That must be a bug too.
>
> cheers,
> Johan

Thanks! It seems that there is probably a "typo" in the original [source](https://github.com/vishen/go-x64-executable) where I got the code. The hex values are different, however, so the mistake is only in the comment; the code works in the example repository (and I made a D version that works too). The padding at the end seems to be necessary, otherwise the example doesn't compile (I don't know why; I'm a SUPER n00b when it comes to machine language and know almost nothing!). I'm also not sure what the encoding for `mov imm64, reg64` would be: I tried typing what you wrote in the parentheses, and it doesn't seem to work.
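The `mov imm64, reg64` form Johan refers to (often written `movabs`) is encoded as the REX.W prefix `0x48`, then the opcode `0xB8` plus the register number, then the full 8-byte immediate in little-endian order. `0x48 0xBA` is therefore the RDX variant and `0x48 0xBE` the RSI variant, whereas `0x48 0xC7 0xC1` is `mov rcx, imm32`, which only takes a sign-extended 4-byte immediate. A small sketch of that encoding; `Reg` and `encodeMovImm64` are hypothetical helpers, and only RAX through RDI are handled, since R8 to R15 additionally need the REX.B bit.

```d
/// Register numbers for the low three bits of the B8+r opcode.
/// Only RAX..RDI are covered; R8..R15 would also need REX.B (prefix 0x49).
enum Reg : ubyte { rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi }

/// Encodes `mov reg, imm64`: REX.W prefix, opcode 0xB8 + register number,
/// then the 8-byte immediate, least significant byte first.
ubyte[10] encodeMovImm64(Reg reg, ulong imm)
{
    ubyte[10] bytes;
    bytes[0] = 0x48;                    // REX.W: 64-bit operand size
    bytes[1] = cast(ubyte)(0xB8 + reg); // B8+r: mov r64, imm64
    foreach (i; 0 .. 8)                 // imm64 in little-endian order
        bytes[2 + i] = cast(ubyte)(imm >> (8 * i));
    return bytes;
}

unittest
{
    auto enc = encodeMovImm64(Reg.rdx, 0x1122334455667788);
    assert(enc[0] == 0x48 && enc[1] == 0xBA); // the RDX form
    assert(enc[2] == 0x88 && enc[9] == 0x11); // immediate, little-endian
}
```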
Jun 06 2022
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:

In case someone is wondering, I found an answer in another forum. The code is the following:

```d
import core.stdc.stdio;
import core.stdc.string;
import core.stdc.stdlib;
import core.sys.posix.sys.mman;

void putbytes(char **code, const char *bytes) {
    uint bt;
    for (int i = 0, n; sscanf(bytes + i, "%x%n", &bt, &n) == 1; i += n) {
        *(*code)++ = cast(char)bt;
    }
}

void putdata(char **code, char** data) {
    memcpy(*code, data, (*data).sizeof);
    *code += (*data).sizeof;
}

extern (C) void main() {
    char *data = cast(char*)mmap(null, cast(ulong)15, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    strcpy(data, "Hello world!\n");

    char *code = cast(char*)mmap(null, cast(ulong)500, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);
    char *pos = code;

    // Call the "write" and "exit" system calls
    putbytes(&pos, "48 C7 C0 1 0 0 0");  // mov rax, 0x01 (write syscall)
    putbytes(&pos, "48 C7 C7 1 0 0 0");  // mov rdi, 0x01 (stdout)
    putbytes(&pos, "48 C7 C2 D 0 0 0");  // mov rdx, 13 (string length)
    putbytes(&pos, "48 BE");             // movabs rsi, data (string address)
    putdata(&pos, &data);
    putbytes(&pos, "0F 05");             // syscall
    putbytes(&pos, "48 C7 C0 3C 0 0 0"); // mov rax, 0x3C (exit syscall)
    putbytes(&pos, "0F 05");             // syscall

    // Execute the code
    (cast(void* function())code)();
}
```
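Compared with the first attempt, this version calls `code` itself rather than `&code` (which is the address of the local variable on the stack), and it uses the 64-bit `syscall` convention (RAX = 1 for write, arguments in RDI, RSI and RDX) instead of `int 0x80`, which follows the 32-bit convention and would only see the low 32 bits of the buffer pointer. The sketch below is a variation on the same idea rather than the answer's exact code: it assumes x86-64 Linux, builds the bytes into a `ubyte[]` instead of parsing hex strings, applies the `mprotect` step suggested earlier in the thread, and ends with `ret` instead of the exit syscall so the generated code can be called like an ordinary function that returns write's result. `jitWrite`, `put`, `putLE` and `JitFn` are hypothetical names.

```d
import core.sys.posix.sys.mman;

/// The generated code takes no arguments and returns RAX (write's result).
alias JitFn = extern (C) long function();

/// Emits write(1, msg.ptr, msg.length) followed by `ret`, runs it, and
/// returns the syscall result (bytes written, or -errno).
/// Returns -1 if the mapping cannot be set up.
long jitWrite(const(char)[] msg)
{
    assert(msg.length <= int.max); // length is encoded as a sign-extended imm32

    ubyte[] buf;
    void put(const(ubyte)[] bytes...) { buf ~= bytes; }
    void putLE(ulong v, size_t n) // append the low n bytes of v, little-endian
    {
        foreach (i; 0 .. n)
            buf ~= cast(ubyte)(v >> (8 * i));
    }

    put(0x48, 0xC7, 0xC0); putLE(1, 4);             // mov rax, 1 (write)
    put(0x48, 0xC7, 0xC7); putLE(1, 4);             // mov rdi, 1 (stdout)
    put(0x48, 0xC7, 0xC2); putLE(msg.length, 4);    // mov rdx, len
    put(0x48, 0xBE); putLE(cast(ulong) msg.ptr, 8); // movabs rsi, msg.ptr
    put(0x0F, 0x05);                                // syscall
    put(0xC3);                                      // ret: result is in RAX

    void* mem = mmap(null, buf.length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    scope (exit) munmap(mem, buf.length);

    (cast(ubyte*) mem)[0 .. buf.length] = buf[];
    if (mprotect(mem, buf.length, PROT_READ | PROT_EXEC) != 0)
        return -1;

    return (cast(JitFn) mem)();
}

unittest
{
    assert(jitWrite("Hello world!\n") == 13);
}
```

Returning instead of exiting works here because `syscall` only clobbers RAX, RCX and R11, all of which a C-convention caller already treats as scratch registers.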
Jun 08 2022
On Monday, 6 June 2022 at 15:13:45 UTC, rempas wrote:
> I tried to find anything that will show code but I wasn't able to find anything except for an answer on stackoverflow. I would find a lot of theory but no practical code that works. What I want to do is allocate memory (with execution mapping), add the machine instructions and then allocate another memory block for the data and finally, execute the block of memory that contains the code. So something like what the OS loader does when reading an executable. I have come up with the following code:
>
> [...]

If you know the instructions ahead of time, LDC and GDC will both let you put a function in its own section, and you can then use some linker magic to get pointers to the beginning and end of that section.
Jun 08 2022