What's in This Chapter
- How linkers work.
- How to design programs in a modular fashion to best exploit linker operation.
- Library searching.
- How to best exploit OPTLINK's features.
Linker Operation
The purpose of a linker is to convert object files into an executable form by resolving address references that span object module boundaries. This makes it possible to combine a number of distinct object modules into a single executable program. In the absence of such a capability, it would not be possible to create today's complex systems.
The file format accepted by the linker is rigidly defined; each file contains all information required to resolve all address references contained within it, although this may be simply a reference to some "external" symbol. As the linker processes successive object modules, such external references may be resolved. If any remain unresolved after the last module has been processed, the link operation has failed.
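The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not OPTLINK's actual data structures: the module dictionaries, the file names, and the `unresolved_externals` helper are all hypothetical.

```python
# Hypothetical model: each module lists the symbols it defines (publics)
# and the symbols it references but does not define (externals).
def unresolved_externals(modules):
    """Return the set of external references that no module defines."""
    defined = set()
    referenced = set()
    for module in modules:
        defined.update(module["publics"])
        referenced.update(module["externals"])
    return referenced - defined


modules = [
    {"name": "main.obj", "publics": {"main"}, "externals": {"get_key", "put_char"}},
    {"name": "keys.obj", "publics": {"get_key"}, "externals": set()},
]

missing = unresolved_externals(modules)
print(sorted(missing))  # prints ['put_char'], so this link would fail
```

Adding a module that defines `put_char` would empty the set, which corresponds to a successful link.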
Modularly Designed Programs
Most programs are modularly organized into relatively small sections that perform a single function or procedure. Each of these smaller sections may, in turn, call upon others, so that the final program becomes hierarchical.
Organizing programs in this manner maximizes the opportunity to re-use code. For example, once you (or someone else) have created a function to input a single keystroke from the keyboard, you never need to re-create it. All you have to do is to refer to the function whenever you need to get keyboard input.
For this strategy to work, it's necessary that organization and formats follow certain rules. These rules were set forth, for "object modules", by Intel in their specification for Object Module Format (OMF), and have been extended by other firms (notably Microsoft and Borland).
In addition to making it easy to re-use common functions and procedures, modular design has another significant advantage: when you change a program, only those functions that were actually changed need to be re-compiled. The entire collection of object modules that make up your program is then re-linked.
Source File Design
To organize a program into a series of object modules, the place to start is with the program's source file(s). Most high-level language compilers create a single object file from a single source file, no matter how many different functions or procedures that file contains.
Originally, if you placed all your functions and procedures into the same source file, the compiler would put them all into a single object module. Then, when you needed only one of those functions later, you would find that all the other functions in the module were loaded along with the one you wanted. To overcome this, it was necessary to break the source file into a number of smaller source files, each containing only one function (or a few related ones). Organizing source files in this manner tended to complicate maintenance of the program, but simplified re-use of the code.
The preceding paragraph was true until the introduction of "smart linking". It is now possible to generate encapsulated functions within a single object module, yet have only those functions which actually are used by a specific program linked into the executable file. OPTLINK recognizes the special records that make this possible and treats them correctly. However, not all language processors yet generate the special COMDAT records that make smart linking work.
A compromise that addresses both problems is to combine the smaller source files into a single larger file for storage and while editing, but then split it into smaller files before compiling or assembling it. Details of doing this are outside the scope of this manual; the thing to keep in mind is the trade-off between ease of maintenance and ease of re-use, so that you can plan your projects for maximum effectiveness in both areas.
When you run source files through a compiler or assembler, the normal output is one or more object modules per file. These individual modules must still be linked; that is, they must all be combined into a single executable program, with memory locations assigned to each symbol in each module, and all references within each module to symbols which are defined in other modules must be resolved. This task is a function of OPTLINK.
Object Module Library Design
When a complex program is properly divided into its component sections, the number of object modules involved can easily become huge. Just keeping track of all these modules can become a serious problem. To solve this problem, the concept of an "object module library" or "library" was invented. A library is a collection of object modules combined into a single file, together with an index that makes it possible to quickly locate any module contained in the library.
Most high-level languages include one or more "run time library" files as a part of their package, and make extensive use of the modules they contain. In addition, general software products such as screen display utilities may be sold as add-on libraries. You can also combine your own object modules into libraries. Using libraries is thus the way to simplify tracking the numerous modules involved in a complex program.
Library Searching
While collecting object modules into library files makes it simpler to keep track of the many modules involved in a typical program, the use of library files imposes some constraints on the linker program's operation. Only those modules that are required by the program being linked need actually be included in the final executable file. A library contains some form of indexing so that the linker can locate requested modules readily, without having to search through every object module stored in the library.
Typically, an object module's original source contains references to symbols defined in the source of other object modules compiled at approximately the same time, and also references to symbols defined in modules within the libraries being used. These are known as "external" symbol references. When an actual memory location is assigned to one of these symbols, the reference is said to be "resolved".
If any external symbol references remain unresolved after all object modules have been processed, the linker searches for their definitions within any libraries that have been specified. Each object module specified to the linker for inclusion (whether specified explicitly, or by being located during a library search) may resolve previous external symbol references, but it may also introduce new ones that will require resolution.
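The way one library module can pull in another is easiest to see as a worklist loop. The following Python sketch is a simplified model under stated assumptions: the `link` function, the tuple layout, and the module names are hypothetical, and a single library with a symbol index stands in for whatever libraries were specified.

```python
def link(object_modules, library):
    """object_modules: list of (publics, externals) pairs of sets.
    library: dict mapping module name -> (publics, externals).
    Returns (library modules pulled in, symbols still unresolved)."""
    defined, pending = set(), set()
    for publics, externals in object_modules:
        defined |= publics
        pending |= externals
    pending -= defined

    # Library index: symbol -> name of the first module that defines it.
    index = {}
    for name, (publics, _) in library.items():
        for symbol in publics:
            index.setdefault(symbol, name)

    pulled, unresolved = [], set()
    while pending:
        symbol = min(pending)  # fixed order keeps the sketch deterministic
        pending.discard(symbol)
        name = index.get(symbol)
        if name is None:
            unresolved.add(symbol)
            continue
        publics, externals = library[name]
        pulled.append(name)
        defined |= publics
        # A pulled-in module may introduce new external references.
        pending |= externals - defined
        pending -= defined
    return pulled, unresolved


objs = [({"main"}, {"format_line"})]
lib = {
    "fmt": ({"format_line"}, {"put_char"}),
    "io": ({"put_char"}, set()),
    "unused": ({"beep"}, set()),
}
print(link(objs, lib))  # prints (['fmt', 'io'], set())
```

Note that the `unused` module is never included: only modules that satisfy an outstanding reference are taken from the library.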
OPTLINK will search any number of libraries in order to resolve external references that remain after all supplied object modules have been processed. The library files to be searched may be specified either by means of commands embedded in the object modules, or by explicit commands to OPTLINK. If any references remain unresolved after all libraries have been searched, OPTLINK reports an error, but can still create the executable file (see /ONERROR and /ERRORFLAG options).
A significant difference between OPTLINK and other linkers is that OPTLINK always resolves external references from the first library (in the supplied or default list of library files) that contains a definition, even if the reference itself occurs in a subsequent library. Microsoft LINK and many other linkers resolve such a reference by using the first definition found after the reference.
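The difference between the two strategies can be illustrated with a small Python model. Both functions and the library contents are hypothetical, and the second function is only a simplified reading of "first definition found after the reference", assuming the search starts at the library in which the reference occurs.

```python
def resolve_first_library(libraries, symbol):
    """First library in list order that defines the symbol wins."""
    for position, publics in enumerate(libraries):
        if symbol in publics:
            return position
    return None


def resolve_after_reference(libraries, symbol, reference_position):
    """Simplified model: search begins where the reference occurs."""
    for position in range(reference_position, len(libraries)):
        if symbol in libraries[position]:
            return position
    return None


# Each set holds the public symbols of one library, in search order.
libraries = [{"open_file", "log"}, {"parse"}, {"log"}]

# Suppose a module in library 1 (the one defining 'parse') references 'log'.
print(resolve_first_library(libraries, "log"))       # prints 0
print(resolve_after_reference(libraries, "log", 1))  # prints 2
```

When two libraries define the same symbol, the two strategies can therefore bind a reference to different code, which is why library order deserves attention when moving a project between linkers.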
OPTLINK Functionality
This section reviews the functional processes OPTLINK performs when attempting to create a program. It performs a number of sequential actions to accomplish a successful link.
Reading Object Modules
The first action taken by OPTLINK (after setting all applicable switches for the current run) is to read all .obj files, in the sequence in which they are specified in the FILE command. You control the sequence in which files are read by the sequence in which you provide them to OPTLINK. In some cases, this is significant. As OPTLINK reads the files, it collects information about sizes, segments, and symbols, for use in the later stages of linking.
OPTLINK searches for each file first in the current working (default) directory, then in the directories named by the OBJ environment variable, and finally in the directories named by the LIB environment variable.
If a requested object module cannot be found, operation terminates with a fatal error.
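The search order for object files can be pictured as follows. This Python sketch is an illustration only: the function names are hypothetical, and it assumes that each environment variable holds a semicolon-separated list of directories.

```python
import os


def search_directories(env, names=("OBJ", "LIB"), separator=";"):
    """Current directory first, then each directory listed in the
    named environment variables, in the order the variables are given."""
    directories = ["."]
    for name in names:
        value = env.get(name, "")
        directories += [d for d in value.split(separator) if d]
    return directories


def find_file(filename, directories, exists=os.path.isfile):
    """Return the first existing candidate path, or None if not found."""
    for directory in directories:
        candidate = os.path.join(directory, filename)
        if exists(candidate):
            return candidate
    return None


env = {"OBJ": "C:\\OBJ;D:\\BUILD", "LIB": "C:\\LIB"}
print(search_directories(env))
```

A result of `None` from `find_file` corresponds to the fatal error case for a missing object module.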
Search library link
After all object modules named in the input have been read, OPTLINK then searches through all applicable library (.lib) modules while any EXTERN symbols remain undefined.
The modules first searched are those named in the command line or supplied interactively, followed by those named in the input data (searched in the order in which they were named). The first PUBLIC symbol encountered that matches an undefined EXTERN is used and any subsequent occurrences of that symbol as a PUBLIC are ignored, so the sequence in which libraries are searched may have significant impact on a program's operation.
After all named libraries have been searched, or if no libraries are named in the input, the libraries called for by internal records of the object modules (i.e., requested by the translator which generated the object modules) are searched unless this capability has been turned off by the use of the /NODEFAULTLIBRARYSEARCH command.
In library searches, when no path is specified with the library name, the current default directory is searched first when looking for any specific .lib file. If none is found there, the path(s) listed in the LIB environment variable are used. If a requested library module cannot be found, a warning message is issued but operation continues.
Assigning Segment Addresses
Physically, every segment must begin on a paragraph boundary (an address of which the low four bits are all zero). Every segment referenced in a program is identified by two things: a segment name and a class name.
The segment name is assigned by you, if you write in assembly language, when you use the SEGMENT/ENDS declarations. If the object module was generated by a high level language, the segment name is assigned by the translator. The class name is also supplied by you, by means of a modifier you may add to the SEGMENT declaration. If given, the class name is enclosed in single quotes, as in:
    CODESEG SEGMENT PUBLIC BYTE 'CODE'
In this example, the segment name is CODESEG and the class name is CODE.
Unless the /PACKCODE or /PACKDATA option switches are used, OPTLINK combines segments having matching segment and class names based on their combine type, which may be PUBLIC, COMMON, PRIVATE, or STACK. PUBLIC segments combine into a single bigger segment; COMMON segments are assigned the same address (that is, all use the same memory, at the same time); and PRIVATE segments are not combined at all. The final attribute of a segment is its alignment, which may be BYTE, WORD, DWORD, PARA, or PAGE (corresponding to boundaries at multiples of 1, 2, 4, 16, or 256, respectively).
Within each program, all segments of the same class are loaded in memory adjacent to each other. If no alignment is specified, PARA is used.
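The effect of alignment and combine type on layout can be worked through with a little arithmetic. The Python sketch below is a simplified model, not OPTLINK's algorithm: the function names are hypothetical, and it assumes that each module's contribution to a PUBLIC segment is placed at the next offset satisfying that contribution's own alignment.

```python
# Alignment names and their boundary sizes, as listed in the text.
ALIGNMENT = {"BYTE": 1, "WORD": 2, "DWORD": 4, "PARA": 16, "PAGE": 256}


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def combine_public(contributions):
    """contributions: list of (size, alignment name), one per module.
    Returns (offset of each contribution, total combined size)."""
    offsets, cursor = [], 0
    for size, align_name in contributions:
        cursor = align_up(cursor, ALIGNMENT[align_name])
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


def combine_common(sizes):
    """COMMON contributions overlay each other, so the largest one wins."""
    return max(sizes)


print(combine_public([(10, "BYTE"), (7, "WORD"), (3, "PARA")]))
# prints ([0, 10, 32], 35)
print(combine_common([10, 40, 25]))  # prints 40
```

The third contribution starts at 32 rather than 17 because PARA alignment forces it to the next multiple of 16, leaving 15 bytes of padding.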
Segment re-ordering
Segments may be, and often are, collected into groups by language translators. The difference between segments in a group and those which are not is that, in a group, the applicable segment register is not changed when moving from one segment to another within the group. If groups are not used, code to change the segment register is required whenever the segment changes. Like segments themselves, the size of any group may not exceed 65,536 bytes.
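The 65,536-byte limit follows from 16-bit offsets. The Python sketch below assumes 16-bit real-mode addressing, where a physical address is the segment register value times 16 plus the offset; the function names are hypothetical, and the group check ignores any alignment padding between segments.

```python
PARAGRAPH = 16          # bytes per paragraph
SEGMENT_LIMIT = 65536   # largest span reachable with a 16-bit offset


def physical_address(segment, offset):
    """Real-mode address: segment value times 16, plus a 16-bit offset."""
    if not 0 <= offset < SEGMENT_LIMIT:
        raise ValueError("offset does not fit in 16 bits")
    return segment * PARAGRAPH + offset


def group_fits(segment_sizes):
    """True if the segments can share one segment register value."""
    return sum(segment_sizes) <= SEGMENT_LIMIT


print(hex(physical_address(0x1234, 0x0010)))  # prints 0x12350
print(group_fits([30000, 30000]))             # prints True
print(group_fits([40000, 30000]))             # prints False
```

Because the segment value is multiplied by 16, any address with offset zero automatically has its low four bits clear, which is the paragraph boundary requirement.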
Collecting relocation information
When all the segments, groups, combine types, and classes have been sorted out and processed appropriately, addresses are assigned to all the segments. With all addresses known, relocation tables can be generated (internal to OPTLINK) which are then used to reconcile address references.
Assigning public addresses
All public symbols within any object module are identified as such, and only such public symbols can be addressed from outside the module. OPTLINK keeps track of the segment and offset values for each public symbol encountered, both while reading the object modules and when a library search locates a module in which a needed EXTERN symbol is declared PUBLIC.
Reconciling address references
Every standard .obj module normally contains several records devoted to relocation information. These records, identified in the OMF documentation as type FIXUPP, are universally called "fix-ups."
Fix-ups are processed after all segment addresses and symbol references are known, to perform final reconciliation of cross-module references.
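In essence, a fix-up tells the linker "patch the value at this location once the target's address is known." The Python sketch below is a deliberately simplified model: real FIXUPP records carry more information (frame, target method, location type), and the `apply_fixups` function, the tuple layout, and the symbol name are hypothetical. It assumes a 16-bit offset fix-up whose final value is added to whatever is already stored at the location.

```python
def apply_fixups(image, fixups, symbol_offsets):
    """Patch 16-bit little-endian slots in image.
    fixups: list of (location in image, target symbol name).
    symbol_offsets: final offset assigned to each symbol."""
    for location, symbol in fixups:
        stored = int.from_bytes(image[location:location + 2], "little")
        final = (stored + symbol_offsets[symbol]) & 0xFFFF
        image[location:location + 2] = final.to_bytes(2, "little")
    return image


# MOV AX, imm16 (opcode B8) with a zero placeholder operand, then RET.
image = bytearray(b"\xB8\x00\x00\xC3")
apply_fixups(image, [(1, "buffer")], {"buffer": 0x0120})
print(image.hex())  # prints b82001c3
```

The placeholder bytes at offsets 1 and 2 now hold 0x0120 in little-endian order, which is why this step must wait until every segment and symbol address is final.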