written by Walter Bright
December 27, 2011
A while back, I targeted the D programming language compiler to generate 32 bit code for OS X, and 64 bit code for Linux. Since all Mac OS X machines for the last several years have included 64 bit CPUs, the obvious next step is to target 64 bit code for OS X.
Having a debugged and working 32 bit port to OS X, and a debugged and working 64 bit code generator, this should be straightforward. (Hah!)
The object file format for OS X is the Mach-O format that is unique to OS X (the Linux universe uses ELF format). The first step is to convince my dumpobj utility to recognize and dump the Mach-O 64 format.
Yes, I know there are existing off-the-shelf object file dumpers, but by writing my own I learn how the file format really works. This was a quick and straightforward job, as the Apple documentation on it is good. The next job was to retarget the obj2asm disassembler. obj2asm already was doing 64 bit instructions, so it just had to learn the Mach-O 64 format. Again, this was simple, and soon I had the tools to examine the output of gcc.
The D compiler can generate library (.a) files directly, so the next job was to figure out that format and adjust the compiler as needed. This turned out to be trivial: the .a format was the same as for 32 bit object files, and it just had to deal with Mach-O 64 files. With the knowledge I gained from dumpobj and obj2asm, this was quick work.
Tackling the compiler output involves:
- Adjusting the object file generator to output Mach-O 64 format. This was easy, now that I'd learned the format by adjusting dumpobj.
- Conforming to the 64 bit ABI. Fortunately, OS X follows the same C ABI as 64 bit Linux does. This meant that all the agony I went through figuring out how to make variadic functions work paid off: they worked out of the box on OS X. I didn't have to change a thing. Phew!
- Fixups. When a reference is made in a source file to a symbol, such as printf, the compiler doesn't know what address to use for the symbol. Instead, it outputs a “fixup record” that consists of a symbol, and a location in the object file that must be “fixed up” with the real address of that symbol when it becomes known. The address becomes known in two stages: the linker resolves symbols like printf when it combines object files, and the loader later adjusts those addresses to where the program actually winds up in memory.
Fixup records and schemes are different for every system. They're usually defined by the object file format, which is called Mach-O for OS X.
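To make the idea of a fixup record concrete, here is a minimal sketch in C of a record and the patching step a linker might perform. The names `Fixup`, `Symbol`, `resolve` and `apply_fixup` are hypothetical and are not Mach-O's actual layout; the sketch assumes the simplest case, an 8 byte absolute address written in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixup record: a symbol plus the location to patch. */
typedef struct {
    const char *symbol;   /* symbol whose address is not yet known */
    size_t      offset;   /* byte offset in the section to be patched */
} Fixup;

/* Hypothetical symbol table entry, as a linker might build it. */
typedef struct {
    const char *name;
    uint64_t    address;
} Symbol;

/* Look up a symbol's address; returns 0 if found, -1 otherwise. */
static int resolve(const Symbol *syms, size_t nsyms,
                   const char *name, uint64_t *out)
{
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            *out = syms[i].address;
            return 0;
        }
    }
    return -1;
}

/* Write the resolved 8 byte address into the section at f->offset. */
static int apply_fixup(uint8_t *section, size_t size, const Fixup *f,
                       const Symbol *syms, size_t nsyms)
{
    uint64_t addr;
    assert(f->offset + sizeof addr <= size);  /* patch must fit */
    if (resolve(syms, nsyms, f->symbol, &addr) != 0)
        return -1;                            /* undefined symbol */
    memcpy(section + f->offset, &addr, sizeof addr);
    return 0;
}
```

A real object format layers relocation types, addends and section numbering on top of this, which is where the trouble starts.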
Fixups used to be simple. Back in the early MS-DOS days, there weren't any fixups at all for COM programs. Just copy the bytes into memory, and jump to it. (Any relocation was done by the hardware through the segment registers.) Those glory days didn't last long. Now we've got multiple addressing modes, multiple sections, shared libraries, position independent code, offsets known and unknown by the compiler, etc.
Just for starters, there's:
- data referencing other data
- data referencing code
- code referencing code
- code referencing data
and throw in the other cases, and there's quite a smorgasbord of quirky and obscure detail.
Let's check in on the Apple documentation in the document entitled “Mac OS X ABI Mach-O File Format”:
For the x86-64 environment, the r_type field may contain any of these values:
- X86_64_RELOC_BRANCH: CALL/JMP instruction with 32-bit displacement.
- X86_64_RELOC_GOT_LOAD: MOVQ load of a GOT entry.
- X86_64_RELOC_GOT: Other GOT references.
- X86_64_RELOC_SIGNED: Signed 32-bit displacement.
- X86_64_RELOC_UNSIGNED: Absolute address.
- X86_64_RELOC_SUBTRACTOR: Must be followed by a X86_64_RELOC_UNSIGNED relocation.
Saying it's a “little sparse” is kind, and it still leaves off some other types found in the header files:
- X86_64_RELOC_SIGNED_1
- X86_64_RELOC_SIGNED_2
- X86_64_RELOC_SIGNED_4
But figuring this stuff out is why I get paid the big bucks (!), so tally-ho. The first stop is to write little bits of code in C like:
int x;
int *px = &x;
compile them with gcc, and then dump the output with dumpobj. Trying various combinations gives a good starting point. For example, I learned that a fixup within the same object file is one thing, a fixup to an address in another module is entirely different, and if the location to be patched is in code, an extra level of indirection is needed. (This is what the GOT is — a Global Offset Table — the code gets patched with an index into the GOT which then provides the physical address.)
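A slightly larger probe file along the same lines might cover each of the four reference categories plus one call into another module. This is only a sketch: the names `f`, `g`, `h`, `pf` and `probe_selfcheck` are made up for illustration, and which relocation type each line produces is exactly what the dump is supposed to reveal, so none is claimed here. Compiling without optimization is probably wise so that the calls are not inlined away.

```c
#include <assert.h>
#include <stdlib.h>

int x = 7;                       /* plain data */
int *px = &x;                    /* data referencing data */

int f(void) { return 1; }        /* plain code */
int (*pf)(void) = f;             /* data referencing code */

int g(void)
{
    return f()                   /* code referencing code, same file */
         + x;                    /* code referencing data */
}

int h(const char *s)
{
    return atoi(s);              /* code referencing code elsewhere */
}

/* Sanity check so the file can be run as well as dumped. */
int probe_selfcheck(void)
{
    assert(*px == x);
    assert(pf() == f());
    return g() + h("34");        /* 8 + 34 */
}
```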
gcc is of limited value, though, as it only emits a rather small subset of the possible fixups. For example, it always uses symbolic offsets rather than section offsets, and it does not have COMDAT sections.
The addressing modes can be tricky. The x86_64 can address with various signed offsets to the program counter for code, a 32 bit signed offset from the program counter for data, and an absolute 64 bit address.
Getting all this right simply requires trial and error, along with a few swags (Scientific Wild-Ass Guesses). The compiler is adjusted to put out a fixup for a certain case, and the program is run. If it crashes at that point, then the procedure is to try different fixup types and different offsets. (The offset portion may seem obvious, but it takes a bit of trial and error to determine if it is looking for the offset from the start of the section, or the vm offset, or the offset from the patch location to the referred location. And, is the offset counted from the beginning of the instruction, the end, or the location within the instruction?)
I finally figured out what those X86_64_RELOC_SIGNED_ fixups were for. They were for a 32 bit signed program counter offset from a code instruction, and the location of the offset was 1, 2 or 4 bytes back from the end of the instruction! (With ELF, one just subtracted 1, 2 or 4 from the offset. It took me a lot of hair pulling to figure out why that didn't work on Mach-O.)
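The arithmetic behind this can be sketched in C. On x86_64, a program counter relative displacement is measured from the address of the next instruction, so any bytes that follow the 4 byte displacement field (an immediate operand, say) shift the result. The helper names `trailing_bytes` and `rip_disp32` are hypothetical, and mapping SIGNED_1, SIGNED_2 and SIGNED_4 to 1, 2 and 4 trailing bytes is my reading of the description above. How the linker wants the addend stored in the object file is not modeled; this only computes the final displacement.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed to match <mach-o/x86_64/reloc.h>. */
enum {
    X86_64_RELOC_SIGNED   = 1,
    X86_64_RELOC_SIGNED_1 = 6,
    X86_64_RELOC_SIGNED_2 = 7,
    X86_64_RELOC_SIGNED_4 = 8
};

/* Bytes between the end of the displacement field and the end of
   the instruction, as implied by the relocation type. */
static unsigned trailing_bytes(unsigned r_type)
{
    switch (r_type) {
    case X86_64_RELOC_SIGNED_1: return 1;
    case X86_64_RELOC_SIGNED_2: return 2;
    case X86_64_RELOC_SIGNED_4: return 4;
    default:                    return 0;
    }
}

/* Displacement that makes an instruction refer to `target`, given the
   address of its 4 byte displacement field and the trailing byte count. */
static int32_t rip_disp32(uint64_t target, uint64_t field_addr,
                          unsigned trailing)
{
    uint64_t next_insn = field_addr + 4 + trailing;
    int64_t disp = (int64_t)(target - next_insn);
    assert(disp >= INT32_MIN && disp <= INT32_MAX);  /* must fit */
    return (int32_t)disp;
}
```

For a displacement field at 0x1002 and a target at 0x2000, the plain case gives 0xFFA, while one trailing byte gives 0xFF9. That one byte difference is the kind of thing that separates a working program from a crash.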
Once I figured this out, the rest was finding all the places in the compiler where fixups were dealt with. It isn't quite as simple as it sounds, as (for example) the GOT_LOAD fixup only worked with some instructions, like:
If you tried it with:
it failed horribly. So the latter had to be written as:
MOV r,mem
CMP reg,r
Even byte MOVs didn't work; they had to be redone as full size MOVs.
The great thing, though, is that D has built up a rather large test suite over time, so if that goes through without error, it's pretty well nailed down.
There has been an explosion of new programming languages being created lately. But interestingly, very few generate native code — they target a bytecode virtual machine of one sort or another. The reason is straightforward. Writing a native code generator is a lot of work. Not many people seem inclined to do that, but it's one thing I enjoy doing. Even using an existing back end (like gcc or llvm) is a lot more work than targeting a virtual machine.
The OS X 64 bit target is a perfect example. Complex code, incomprehensible documentation, what's not to like? Who needs to buy puzzles at the store?
Thanks to Andrei Alexandrescu for his helpful comments on this.