C++ Language Implementation

The C++ language standard (C++98) is:

"Information Technology - Programming Languages - C++", ISO/IEC 14882:1998

A more readable definition and tutorial on C++ is:

The C++ Programming Language, Bjarne Stroustrup, Addison-Wesley

Preprocessor
Implementation-defined Behavior
Language Conformance
Templates
The __with Statement
Unimplemented Features
C++ Extensions

Preprocessor

The preprocessor used is the same one as for C.

Implementation-defined Behavior

This section documents the items listed as "implementation-defined" in C++98. The section headings are the paragraph numbers in the standard.

4.7-3

When an integer is converted to a smaller, signed integer, the value is unchanged if it can be represented in the smaller type. Otherwise, the compiler copies the corresponding low order bits of the larger type to the smaller type. The high order bit of the smaller integer becomes the sign bit, with whatever value it received from the corresponding bit of the larger integer.
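
The narrowing rule above can be sketched in standard C++ as follows. This is an illustration only: the function name narrow_to_i8 is hypothetical, and the results assume an 8-bit signed char and two's complement representation, which matches the behavior described here but was implementation-defined in C++98.

```cpp
#include <cassert>

// Narrow an int to an 8-bit signed type (hypothetical helper name).
// If the value does not fit, the low-order 8 bits are kept and
// bit 7 of the result becomes the sign bit.
signed char narrow_to_i8(int wide)
{
    return static_cast<signed char>(wide);
}

void narrow_demo()
{
    assert(narrow_to_i8(100) == 100);     // fits: value unchanged
    assert(narrow_to_i8(0x1234) == 0x34); // low byte 0x34, sign bit clear
    assert(narrow_to_i8(0x12F0) == -16);  // low byte 0xF0, sign bit set
}
```

Calling narrow_demo() exercises all three cases: a value that fits, a truncated value with the sign bit clear, and a truncated value with the sign bit set.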

4.8-1

When a less precise float type is converted to a more precise float, the value is unchanged. On the other hand, when a more precise float is converted to a less precise float, and if the value is within the representable range, then the result may be either the next higher or the next lower representable value, depending on the setting of the rounding mode. The default rounding mode is to round to the nearest representable value.

4.9-2

If a conversion is from an integral type to a floating type, and if the value is in the range that the compiler can represent, but not represent exactly, then the compiler rounds the result according to the current rounding mode. The default rounding mode is to round to the nearest representable value.

5.2.10-3

reinterpret_cast simply 'paints' a new type on to an existing bit pattern. No change in the bit pattern is performed.

5.2.10-4

The bits are simply transferred to the integral type, zero-extended if the integral type is larger than the pointer type.

5.2.10-5

Mappings between pointers and integral types are simply a one to one transfer of bits. No changes to the bits are made.

5.6-4

When dividing two integers with the / operator, where the result is inexact and one and only one of the operands is negative, the result is the smallest integer greater than the algebraic quotient (such as -23/4 = -5). The result of the % operator, when the division is inexact and only one of the operands is negative, yields a negative result (such as -23%4 = -3). If the right operand of the % or / operator is 0, the result for integers is a compile time error or a runtime exception. For floating point divide by 0, the result is an overflow.
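
The two worked values above can be checked with a short sketch. The helper names quotient and remainder_of are hypothetical. Truncation toward zero is what this section describes, and it is also what later C++ standards (C++11 onward) require.

```cpp
#include <cassert>

// Hypothetical wrappers so the operands are not compile-time constants.
int quotient(int a, int b)     { return a / b; }
int remainder_of(int a, int b) { return a % b; }

void division_demo()
{
    assert(quotient(-23, 4) == -5);      // -5.75 truncated toward zero
    assert(remainder_of(-23, 4) == -3);  // remainder is negative here
    // The quotient and remainder always recombine into the dividend.
    assert(quotient(-23, 4) * 4 + remainder_of(-23, 4) == -23);
}
```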

5.8-3

When the left operand of the right-shift operator (>>) is a signed type and negative, the implementation performs a signed right shift.
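
A signed (arithmetic) right shift copies the sign bit into the vacated high-order bits, so a negative value stays negative. The sketch below assumes that behavior; the helper name shift_right is hypothetical.

```cpp
#include <cassert>

// Hypothetical wrapper around the built-in right-shift operator.
int shift_right(int value, int count)
{
    return value >> count;
}

void shift_demo()
{
    assert(shift_right(16, 2) == 4);    // non-negative: plain division by 4
    assert(shift_right(-16, 2) == -4);  // sign bit is replicated
    assert(shift_right(-1, 4) == -1);   // all ones stays all ones
}
```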

9.6-1 Bit Fields

Declare a bit-field as any integral type. The size of the declared type determines the word size for that bit-field. Consequently, a word may be 8, 16, or 32 bits wide.

A sequence of bit-fields with the same word size can be packed into a structure. No bit-field may be wider than its word size. If a bit-field straddles a word boundary, it is placed in the next word. For example, the bit-field declaration

struct bits
{
  int b1: 24;
  int b2: 16;
  int b3: 16;
  int b4: 24;
};

is represented in memory as:

31.......0
unused..b1
b3......b2
unused..b4

The compiler allocates bit-fields beginning with the low-order bit of a word. When packing bit-fields within a structure, an unnamed field with a width of 0 closes out the current word, so that the next bit-field starts in a new word. A bit-field with a different word size from the preceding bit-field causes this closing out to happen automatically, just as a non-bit-field member of the structure does.
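
The layout described above can be observed indirectly through sizeof. The sketch below is an illustration under two assumptions: int is 32 bits, and the compiler does not let a bit-field straddle a word boundary (as described here; other compilers commonly behave the same way, but the standard does not require it). The structure named closed and the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct bits
{
    int b1 : 24;  // word 0, low 24 bits
    int b2 : 16;  // does not fit in the 8 bits left: starts word 1
    int b3 : 16;  // shares word 1 with b2
    int b4 : 24;  // does not fit after b3: starts word 2
};

struct closed
{
    unsigned a : 4;
    unsigned   : 0;  // zero-width field closes out the current word
    unsigned b : 4;  // starts in the next word
};

// Number of words each structure occupies.
std::size_t words_used_by_bits()   { return sizeof(bits) / sizeof(int); }
std::size_t words_used_by_closed() { return sizeof(closed) / sizeof(unsigned); }
```

Under the stated assumptions, bits occupies three words and closed occupies two, even though a and b together need only eight bits.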

9.6-3

Plain (neither explicitly signed nor unsigned) bit fields are signed.
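
The practical effect of signed plain bit-fields is that the top bit of the field is a sign bit. A minimal sketch, assuming two's complement and a compiler that treats plain int bit-fields as signed (the structure and function names are hypothetical):

```cpp
#include <cassert>

struct flags
{
    int      plain : 3;  // plain bit-field: signed, range -4..3
    unsigned uns   : 3;  // explicitly unsigned, range 0..7
};

// Store v in each kind of field and read it back.
int read_plain(int v)
{
    flags f = flags();
    f.plain = v;
    return f.plain;
}

unsigned read_unsigned(unsigned v)
{
    flags f = flags();
    f.uns = v;
    return f.uns;
}
```

Storing 7 (binary 111) in the plain field reads back as -1, while the unsigned field reads back as 7. Declare a bit-field unsigned when it holds a mask or a count.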

Language Conformance

DMC++ is not yet fully compliant with C++98. The more significant differences are listed.

Note

To enforce C++98 compatibility use the -A compiler option. For a discussion of the changes this produces in the compiler's actions, see Compiling Code.

Alignment

For 16-bit applications, types and structure members are aligned on word (16-bit) boundaries; for 32-bit applications, they are aligned on double-word (32-bit) boundaries. Win32 applications align on 8-byte (64-bit) boundaries.

ARM p. 22 (cf. p. 7; 3.2.1c), Gray pp. 486-7

Within structures, you can set alignment on byte, word (two bytes, the default), or long word (four bytes) boundaries. To suppress the default alignment within structures, pass the -a[1|2|4|8] option to the compiler. This ensures that structure members are aligned on the specified boundary. Such structure realignment is useful for defining structures that map onto particular hardware devices or predefined data elements. Alignment control operates only within structures; everything else is aligned on word boundaries.

The compiler does not generate structures of size 0 if there are no nonstatic data members; the minimum size of a structure is 1 byte. This minimum size prevents new() from returning zero when it allocates an instance of a structure.

Warning: Compile each source file referencing a given structure with the same type of alignment. If two files that reference the same structure are compiled with different alignments, the compiler does not detect it, but you will get unpredictable error messages from the linker or at run time.

Anonymous unions

C++-style anonymous unions are supported in C-compiled code.

C++ style inline functions

Digital Mars's C compiler supports C++-style inline functions in C code. To declare a C function as an inline function, use the extended keyword __inline instead of inline.

How Digital Mars C++ Implements C++

These sections discuss implementation-specific features of C++ that are not features of C.

The _new_handler variable

The _new_handler variable lets your program call a special function, which you provide, when a call to new fails due to lack of memory. The special function can free up memory in a way that preserves the run-time integrity of your program. If you use _new_handler, you do not need to check the return value of new for failure.

The variable _new_handler is declared to be a pointer to a function. It is declared in the C++ Standard Library, and is NULL by default. Its declaration is:

void (*_new_handler)(void);

When new fails, it tests whether _new_handler points to a function or if _new_handler is NULL. If _new_handler contains a value, the function it points to is called. If properly written, the function will reclaim memory until there is enough to satisfy the original request to new. If _new_handler is NULL, new returns a NULL pointer.

There are two ways to set _new_handler: directly, as in:

void newfailed_handler(void); // prototype of handler
_new_handler = newfailed_handler; // set _new_handler

or through the set_new_handler() library function as in:

set_new_handler(newfailed_handler);
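
A handler that "reclaims memory" is often written around a reserve block set aside at startup. The following is a minimal sketch of that idea, not the library's implementation. It uses the standard std::set_new_handler and std::get_new_handler from <new> (the latter is C++11) rather than the _new_handler variable, and the names g_reserve, install_handler, and reserve_available are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

static char* g_reserve = nullptr;  // emergency memory (hypothetical)

bool reserve_available() { return g_reserve != nullptr; }

// Called by operator new when an allocation fails.
void newfailed_handler()
{
    if (g_reserve)
    {
        delete[] g_reserve;        // give memory back; new then retries
        g_reserve = nullptr;
        return;
    }
    // Nothing left to free: remove the handler so that new reports
    // failure instead of calling this function forever.
    std::set_new_handler(nullptr);
}

// Set aside reserve_bytes and register the handler.
bool install_handler(std::size_t reserve_bytes)
{
    g_reserve = new char[reserve_bytes];
    std::set_new_handler(newfailed_handler);
    return reserve_available();
}
```

The handler must either make memory available or stop being the handler. A handler that returns without doing either causes new to loop indefinitely.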

Static constructors and destructors

Static constructors/destructors are implemented in a manner compatible with Microsoft Visual C++. For each module that has a static constructor or destructor, a pointer to the constructor or destructor function is placed in a special segment (XIFU, XIFL, or XIFM). The startup code cinit.obj then looks at the constructor segment and calls the constructors. The exit function calls the destructors by looking at the destructor segment XOF (for near data models) or XO (for far data models).

Static constructors are called in the reverse of the order in which they were linked. Consequently, constructors in the standard library are called first. Static destructors are called in the order they were linked. Destructors in the standard library are called last.

When the run-time library prints a run-time error message, the program is aborted without calling any static destructors. A serious error may have occurred (for example, the heap may have been corrupted), so stopping the program immediately limits the damage.

Exception handling

Exception handling is not enabled by default: many C++ programs do not use it, and support for it adds significantly to the size of the resulting program. To enable exception handling, compile with the -Ae compiler option.

Run time type identification

Run time type identification (RTTI) is not enabled by default: many C++ programs do not use RTTI, and support for it adds significantly to the size of the resulting program. To enable RTTI, compile with the -Ar compiler option.

RTTI adds a new member to the virtual function table, a pointer to the type information stored in the table. Thus, classes cannot share vtbl[]s (an optimization the compiler performs by default) when compiling with RTTI. The pointer to the type information is located at a negative offset relative to the start of the vtbl[]; this preserves compatibility with the Microsoft Object Model, which does not support RTTI.
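
The type information reached through the vtbl[] is what typeid and dynamic_cast consult at run time. A short sketch of both, using hypothetical classes Shape and Circle; it needs RTTI enabled in whatever compiler builds it.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape  { virtual ~Shape() {} };  // polymorphic: has a vtbl[]
struct Circle : Shape {};

// A Circle viewed through a Shape reference is still identified as Circle.
bool circle_is_identified()
{
    Circle c;
    const Shape& as_base = c;
    return typeid(as_base) == typeid(Circle)
        && dynamic_cast<const Circle*>(&as_base) == &c;
}

// A plain Shape is not a Circle: dynamic_cast yields a null pointer.
bool plain_shape_is_not_circle()
{
    Shape s;
    const Shape* p = &s;
    return typeid(*p) != typeid(Circle)
        && dynamic_cast<const Circle*>(p) == 0;
}
```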

Internal Limits

The following limits apply:

Limits
Max length of an identifier	254
Max length of an external identifier	254
Number of arguments to a macro	127
Depth of nested #include directives	number of file handles

The compiler sets no limits on the following code elements, but operating system or hardware requirements may impose practical limits:

Complexity of a declaration
Length of macro replacement text
Number of arguments to a function
Number of cases in a switch
Number of characters in an argument to a macro
Number of characters in a line
Number of characters in a string
Number of command line arguments
Number of #if directives that can be nested
Number of #include paths
Number of subscripts in an array

The header limits.h specifies the largest and smallest values of the integral types.

There are 3 basic floating point types.

Floating Point Types
Type	Default Size	Format
float	4 bytes	IEEE single precision
double	8 bytes	IEEE double precision
long double	8(10) bytes	IEEE double (extended) precision

The header float.h defines the implementation-defined characteristics of the floating types.

Hexadecimal floating-point constants

C99 hexadecimal floating point constants are allowed in C++.

Character constants

The mapping of characters in the source character set to the execution character set is one-to-one. The basic execution character set consists of 256 extended-ASCII characters: the 128 U.S. ASCII characters plus system-defined extensions. All integer character constants or escape sequences can be represented with the basic execution character set.

ARM p. 10, Gray pp. 480-1

Multi-character constants have type int and can contain between one and four characters from the execution character set. If the constant has more than four characters, then the compiler generates an error. If a multi-character constant of three or four characters is assigned to a short, then the last two characters are used in the assignment. For example:

short foo = 'ABCD';

will assign CD (0x4344) to foo.

If the character following the backslash character is not one of the defined escape sequences, then the compiler generates an "undefined escape sequence" error.

String literals

If a string literal begins with the sequence \p or \P, the compiler treats it as a Pascal string. The compiler replaces the \p or \P with the length of the string. Null, '\0', is not appended to Pascal strings, as it is with C strings.

String literals are distinct. They do not overlap in memory, but it is good practice to not modify them at runtime.

Integer Variables

The size of an integer is memory-model dependent, but it is always at least two bytes. An integer is signed by default. For example:

int x, y, z;
int u = 3000;
signed int v = -56; // signed int == int
unsigned int r = 0xf000;

For 16-bit memory models, an int is two bytes; for 32-bit memory models, an int is four bytes.

Short integers

Short integers are two bytes in Digital Mars C++. For example:

short a, b;
short int c = -45;
signed short d = 2145;
unsigned short int e = 0x123f;

Long integers

Long integers are four bytes. For example:

long a, b, c = 109;
long f = -1L;
signed long g = 67;
unsigned long int h = 0x0045123f;

Long long integers

long long and unsigned long long integers in 32-bit compilations are eight bytes. Neither is supported in 16-bit compilations. For compatibility with other compilers, __int64 and unsigned __int64 are synonyms for long long and unsigned long long.

Floating-Point Variables

As with integers, the size of float types is system-dependent. All floating-point numbers require at least four bytes.

Single precision variables

Single precision floating-point numbers are stored in the data type float. A float uses four bytes. For example:

float fl1, fl2;
float fnum = 1.56;
float gnum = 1.23E3;

Double precision variables

Double precision floating-point numbers are stored in the data type double. A double may be larger or the same size as a float, but never smaller. For example:

double dp1, ext;
double dnum = 11.435;
double fnum = 121.23E4;

In Digital Mars C++, a double is an eight byte unit.

Long double precision variables

The long specifier may be added to double to make a type long double. A long double may be larger or the same size as a double, but never smaller. For example:

long double ld = 1.678E33;

In Digital Mars C++, a long double is the same size as a double, eight bytes, for all 16 bit memory models and for all 32 bit DOS memory models. For 32 bit Windows, a long double is 10 bytes.

Character Variables

Character variables are used to store single ASCII characters, such as the letter 'c'. They are declared using the keyword char. Each character is one byte in size. The range of char type values depends on whether the character is signed or unsigned. In Digital Mars C++ the char data type is signed and can store values between -128 and +127. Use of char types above 127 requires declaring the variable as unsigned char. The -J compiler option makes char types unsigned.

There are several ways to denote char values. One way uses the character flanked by single quotes, such as 'A'. The second way uses the character's ASCII code, written in octal or hexadecimal, preceded by a backslash and flanked by quotes, such as '\101' or '\x41'. Alternatively, within an assignment statement, you can simply use the integer value of the ASCII code, such as 65. Character variables can also be expressed in the same formats as character constants, as discussed earlier. Examples of each of these ways of denoting char values are included here:

char a_character = 'A';
char b_character = '\101'; // octal escape for 65
char c_character = 65;
signed char d_character = 0x41;
unsigned char e_character;

Character strings

Strings are initialized using the same syntax as character arrays. Alternatively, the following syntax can be used:

char s[6] = "hello";

Digital Mars C++ supports this syntax with automatic arrays as well as with global and static arrays. In this example, the 6 is optional. When the size of the array is not given, the compiler allocates enough space to hold the string and its terminating null character.
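
The space reserved for the terminating null character can be seen by comparing sizeof with strlen. In this sketch the function names are hypothetical; the arrays are automatic (local) arrays, as the paragraph above permits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Explicit size: five letters plus the terminating null.
std::size_t sized_array_bytes()
{
    char s[6] = "hello";
    return sizeof(s);
}

// Size omitted: the compiler counts the letters and the null.
std::size_t unsized_array_bytes()
{
    char s[] = "hello";
    return sizeof(s);
}

// strlen stops at the null and does not count it.
std::size_t visible_length()
{
    char s[] = "hello";
    return std::strlen(s);
}
```

Both arrays occupy six bytes, while strlen reports five.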

Digital Mars C++ also supports the wide-character string type, wchar_t, for example:

wchar_t s[] = L"hello";

Wide char types are used to refer to wide-character strings; they are equivalent to unsigned shorts (two bytes). They are used to hold character sets where individual characters do not fit into a single byte. Wide char types are needed, for example, when attribute information, such as color or font, is encoded along with the character's ASCII value. The type of wchar_t is defined in stddef.h.

Standard Conversions

In a variety of circumstances, the compiler must convert one numeric type into another. This section discusses the standard type conversions: integral promotions, integral conversions, floating-point conversions, conversions between floating-point and integral types, and arithmetic conversions.

Integral promotions

Digital Mars C++ follows ANSI C in that integral promotion is "value-preserving." See ARM p. 32 for an in-depth discussion of "value-preserving" vs. "unsigned-preserving" integral promotions, and their relationship to earlier C++ and C implementations.

ARM pp. 31-2, 322, Gray p. 489

Explicit Type Conversion

In general, the compiler promotes and sign-extends smaller integral types to larger integral types without losing information. In other words, the low order bits are copied to the corresponding positions of the larger integral type, and the sign bit of the small type is copied to the sign bit of the larger type.

In C++, a pointer or reference to an object of a const type can be cast into a pointer or a reference to a non-const type. The resulting reference still applies to the original object. In Digital Mars C++, it is possible to modify the value of the constant object through the resulting pointer or reference. This works only if the original pointer or reference contained a valid address. In this way, you can cast away the "constness" of an object.

ARM p. 71, 37, Gray pp. 500-2
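
A sketch of casting away constness with const_cast follows. Note one assumption that differs from the text above: here the underlying object is not itself declared const, only viewed through a pointer to const. That is the case standard C++ defines. Writing through such a cast to an object that was declared const may work as described for this compiler, but it is not portable. The function name is hypothetical.

```cpp
#include <cassert>

int write_through_const_view()
{
    int value = 10;                          // the object itself is not const
    const int* view = &value;                // read-only view of it
    int* writable = const_cast<int*>(view);  // cast away the constness
    *writable = 20;                          // modifies the original object
    return value;
}
```

The pointer obtained from the cast still refers to the original object, so the function returns 20.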

Overflows in numeric expressions

Overflows in integral expressions are ignored. Overflows in floating point expressions produce infinity according to the rules of IEEE arithmetic.
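
Both behaviors can be demonstrated in a few lines. This sketch assumes IEEE arithmetic for double. For the integral case it uses an unsigned type, because unsigned wraparound is defined by the language, whereas signed overflow is undefined in standard C++ even if a given compiler ignores it. The function names are hypothetical.

```cpp
#include <cassert>
#include <cfloat>
#include <climits>
#include <cmath>

// Doubling the largest finite double overflows to +infinity.
bool double_overflow_is_infinity()
{
    volatile double big = DBL_MAX;  // volatile: keep the multiply at run time
    double result = big * 2.0;
    return std::isinf(result) && result > 0.0;
}

// The carry out of the top bit is discarded, leaving 0.
unsigned wrapped_unsigned_sum()
{
    unsigned u = UINT_MAX;
    return u + 1u;
}
```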

Additive operators

The compiler treats memory as linear address space. You can reference outside the bounds of an array without the compiler detecting it, for example:

int a[10];
void f()
{
  int* p = &a[10];
  *p = 0xdeadbeef;
}

You can subtract two pointers that point to objects in the same array in order to find the number of elements separating the operands. The result is of type ptrdiff_t, which is defined as long in <stddef.h>. It is an error to subtract pointers which point to objects of differing types. However, explicit casting allows the operations and circumvents the error.

ARM p. 73, Gray p. 503
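
Pointer subtraction counts elements, not bytes. A short sketch using the standard std::ptrdiff_t type from <cstddef>; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements from first up to (not including) last.
// Both pointers must point into, or one past the end of, the same array.
std::ptrdiff_t elements_between(const int* first, const int* last)
{
    return last - first;
}

std::ptrdiff_t distance_demo()
{
    int a[10] = {0};
    return elements_between(&a[2], &a[7]);
}

std::ptrdiff_t whole_array_demo()
{
    int a[10] = {0};
    return elements_between(a, a + 10);  // one past the end is allowed
}
```

distance_demo() yields 5 and whole_array_demo() yields 10, regardless of sizeof(int).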

Class member access

If a member of a union is accessed after a value has been stored in a different member of the union, the compiler does not convert the type of the member stored into the type of the member doing the access. For example:

union u_tag
{
  int ival;
  float fval;
} u_obj;
int i;
u_obj.fval = 4.0;
i = u_obj.ival;

assigns 0x40800000 to i, because the bit pattern stored in u_obj.fval is assigned to variable i with no conversion.

ARM p. 53
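
The same bit pattern can be observed without a union by copying the bytes with memcpy, which is well defined in standard C++. This sketch swaps in that technique; it assumes a 4-byte IEEE single precision float, and the function name float_bits is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the raw bit pattern of a float, with no numeric conversion.
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 4-byte float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

float_bits(4.0f) is 0x40800000: sign bit 0, biased exponent 129 (4.0 is 1.0 times 2 squared), and a fraction field of zero.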

Sizeof

In Digital Mars C++, the type, size_t, is defined as an unsigned int in stddef.h.

ARM p. 56, Gray p. 497

C++ Extensions

The __typeinfo expression

Digital Mars C++ implements the expression type:

__typeinfo (expression)

The syntax for __typeinfo is identical to the syntax for sizeof expressions. __typeinfo returns an int, whose bit settings specify the type of expression:

__typeinfo
1	expression is a class/struct/union
2	expression has a destructor
4	expression has a virtual destructor

Other bits are reserved, and should not be assumed to contain a meaningful value.

__typeinfo, like sizeof, yields information about the static type, not the dynamic type, so:

class A {. . .};
class B : A {. . .};
class A *p = new B;
int i = __typeinfo (*p);
// returns information on A, not B
int s = sizeof (* p);
// returns information on A, not B

Since the compiler generates different code for operations like array new/delete depending on the presence of a destructor, __typeinfo can be useful in code that must manipulate storage allocation in a robust manner.
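
__typeinfo is specific to Digital Mars C++. For comparison, similar information is available in standard C++11 through <type_traits>. The sketch below is not __typeinfo itself: it applies to a type rather than an expression, and it interprets "has a destructor" as "has a non-trivial destructor", which is an assumption. The name type_flags and the three sample classes are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Build a bit mask in the same layout as the table above:
// 1 = class/struct/union, 2 = has a destructor, 4 = virtual destructor.
template <class T>
int type_flags()
{
    const bool aggregate =
        std::is_class<T>::value || std::is_union<T>::value;
    int flags = 0;
    if (aggregate)
        flags |= 1;
    if (aggregate && !std::is_trivially_destructible<T>::value)
        flags |= 2;
    if (std::has_virtual_destructor<T>::value)
        flags |= 4;
    return flags;
}

struct Plain {};
struct WithDtor        { ~WithDtor() {} };
struct WithVirtualDtor { virtual ~WithVirtualDtor() {} };
```

Like __typeinfo, these traits describe the static type only.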

New

If a class of objects has a constructor, an object of that class can be created using the new operator if suitable arguments are provided or if there is a default constructor for the class. In either case, the new operator allocates memory for the object.

ARM p. 61, Gray p. 499

Miscellaneous Declarations

Enumeration declaration

In 16-bit compilations, the size of an enumeration is always the same size as an int. In 32-bit compilations, the size of an enumeration is always 32 bits.

ARM pp. 114-5, Gray p. 523

Class Members

The compiler allocates nonstatic data members of a class in order of appearance in the source file, regardless of intervening access specifiers.

ARM p. 173,241, Gray p. 545

Pointers to Functions

In Digital Mars C++, function pointers can be declared as pointing to functions with FORTRAN, Pascal, or C linkages by using the __fortran, __pascal, or __cdecl keywords, respectively. Attempting to assign a pointer, currently pointing to a C++, C, FORTRAN, or Pascal function, to a pointer to a different function type generates a compiler error. In particular, a pointer to a C++ function cannot be assigned to a pointer to a C function.

The declaration of a pointer to a function must specify both the return type and the parameter list. For example:

int memcmp( void *, void *, unsigned);

requires a compatible function pointer to be declared as:

int (*fp)(void *, void *, unsigned);

The portable way to create a pointer to a C function from within C++ is to use:

extern "C"
{
  int (*fp)(int);
}

The alternative, nonportable, syntax is:

int (__cdecl *fp)(int);

Similarly, the way to declare a C++ function with a parameter that is a C function pointer is:

extern "C" { typedef int (*FP)(int); }
int foo(FP fp);
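
A complete sketch of this pattern, with a C-linkage function to pass in. The function c_double and the body of foo are hypothetical, added only so the example can be called.

```cpp
#include <cassert>

extern "C"
{
    typedef int (*FP)(int);                // pointer to a C-linkage function
    int c_double(int x) { return 2 * x; }  // hypothetical C-linkage function
}

// A C++ function whose parameter is a C function pointer.
int foo(FP fp)
{
    return fp(21);
}
```

foo(c_double) calls the C-linkage function through the pointer and returns 42.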

Functions with Variable Numbers of Arguments

Digital Mars C++ supports functions with variable numbers of arguments in a manner compatible with Microsoft's C++ compiler for Windows NT.

Digital Mars C++ generates code with C linkage for any function with a variable number of arguments, even if the function is explicitly declared to have Pascal linkage.
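
A function with a variable number of arguments is written with the macros from <cstdarg>. The following is a minimal sketch; sum_ints is a hypothetical name, and the convention that the first argument gives the count is a choice made for this example.

```cpp
#include <cassert>
#include <cstdarg>

// Add up 'count' int arguments that follow the count itself.
int sum_ints(int count, ...)
{
    va_list args;
    va_start(args, count);           // start after the last named parameter
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);  // fetch the next argument as an int
    va_end(args);
    return total;
}
```

The callee cannot discover how many arguments were passed or what their types are, so the caller must supply that information, as the count does here.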

Function Prototyping

The Digital Mars C++ compiler promotes arguments in function definitions. When the promoted types of the arguments to a function definition conflict with the types of the parameters in the corresponding prototype, errors can occur.

For example, if you write a prototype for a function that takes a float argument, and define the same function using an "old style" function definition, an ANSI compiler will promote the float to a double when it compiles the definition. This will generate an error because the definition no longer matches the prototype. In other words, the function is now defined to take an argument of type double, but prototyped to take a float.

When you compile with the -r (strict prototyping) option, Digital Mars C++ does not generate an error in cases where an old style function definition and a prototype exist for the same function. In other words, no error is generated as long as the types for the parameters in the prototype are the same as the types of the arguments to the function before promotion.

If you do not compile with -r, Digital Mars C++ considers the differing function definitions to be invalid and generates an error. The same results occur when you compile with -A (enforce the ANSI standard).

Templates

One of the more interesting features of the C++ language is the template. Templates let you define container classes and generic functions without giving up type checking. Digital Mars C++ provides for the definition, declaration, and use of both class templates and function templates, as described in ARM, Chapter 14, and Gray, Chapter 8.

Template definitions and examples

Briefly, a function template declaration states the existence of a function template definition. For example:

template <class T> T square(T v);

declares the existence of a square function whose argument, v, is of type T, and whose return value is also of type T.

A function template definition provides the information needed by the compiler to generate instantiations. For example:

template <class T> T square(T v)
{
  return v * v;
}

defines the square function to produce a return value by multiplying the argument, v, by itself. Observe that this is still not enough information for the compiler to generate code for a specific square operation. The needed information is provided by the context in which the square function is called. For example:

void byUse()
{
  int i = 5;
  float f = 3.14;
  i = square(i);
  f = square(f);
}

instructs the compiler to produce two specializations of the square function: one for ints and one for floats. In Digital Mars C++, specialization is the use of a template that generates code.
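
Put together, the declaration, definition, and uses form the following compilable sketch. The return type T is written explicitly, as the description of square requires.

```cpp
#include <cassert>

// Definition: the compiler generates code only when square is used.
template <class T>
T square(T v)
{
    return v * v;
}

void byUse()
{
    int i = 5;
    float f = 1.5f;
    i = square(i);  // generates square<int>
    f = square(f);  // generates square<float>
    assert(i == 25);
    assert(f == 2.25f);
}
```

T is deduced from the argument at each call, so no template argument needs to be written.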

A class template declaration states the existence of a class template definition. For example:

template <class T> class List;

declares the existence of a List class template.

A class template definition specifies a list of members for the class template. For example:

template <class T> class List
{
public:
  T *GetFirst();
  List *GetRest();

private:
  T *pFirst;
  List *pRest;
};

where template member function definitions, which provide the information needed by the compiler to generate member definitions, are needed for GetFirst and GetRest. For example:

template <class T> T *List<T>::GetFirst()
{
  return pFirst;
}

simply returns the private data member, pFirst.

As with template functions, a call in a specific context is needed for the compiler to produce a specialization of a template member function.
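
A compilable sketch of the List template with both member functions defined and used. The constructor and the functions sum_list and list_demo are hypothetical additions so that a list can be built and walked.

```cpp
#include <cassert>

template <class T> class List
{
public:
    List(T *first, List *rest) : pFirst(first), pRest(rest) {}
    T *GetFirst();
    List *GetRest();

private:
    T *pFirst;
    List *pRest;
};

template <class T> T *List<T>::GetFirst()
{
    return pFirst;
}

template <class T> List<T> *List<T>::GetRest()
{
    return pRest;
}

// Using List<int> here is what causes code to be generated for it.
int sum_list(List<int> *node)
{
    int total = 0;
    for (; node; node = node->GetRest())
        total += *node->GetFirst();
    return total;
}

int list_demo()
{
    int a = 1, b = 2;
    List<int> tail(&b, 0);      // last node: no rest
    List<int> head(&a, &tail);
    return sum_list(&head);
}
```

Outside the class body the return type must be spelled List<T> *, because the short name List is only available inside the class scope.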

Generating code for templates

When using templates, it is important to remember that the compiler can only generate code when the compiler encounters a specialization of a template. It cannot generate code from the template definition alone, because the compiler would then have to generate code for all possible specializations. Because the compiler generates code when it encounters a specialization of a template definition, it is possible that it will generate identical code in more than one object file.

The linker, rather than the compiler, eliminates the redundant code from multiple occurrences of the same specialization. By default, the compiler marks the code generated by each specialization as a COMDAT. The linker processes COMDATs specially, removing duplicate definitions. For more information on COMDATs, see the section on the -NC option in Compiling Code.

For the compiler to be able to generate code for a template member function, it must have access to the class template and the template member function definition, as well as the relevant specialization. Consequently, the template member function definition must be included in the compilation unit (for example, the source file and its included files) that contains the specializations of the class template.

A simple and effective way to use templates is to include template declarations with the template definitions. When this is done, the compiler will be able to generate code when it needs to and the linker will remove duplicate instance definitions using the COMDAT mechanism.

ARM p. 378, Gray p. 613

The __with Statement

For ease in porting Modula 2 code to C++, Digital Mars C++ includes a __with construct. The form __with is used instead of with to avoid name-space conflicts. The syntax of the __with construct is:

__with (expression)
  statement

where expression must evaluate to an instance of an object of a class. It is evaluated only once. Within the statement, a scope is introduced for this class. This scope is searched before all other scopes. When a member of the class is parsed, it is semantically equivalent to expression.member.

__with statements do not affect access rules. __with clauses may be nested; the rules are the same as for braces {}. Refer to class members of a previously nested __with clause without specifying the class; if an identifier is not a member of the innermost __with clause, outer __with clauses are searched automatically.

The following example shows how to resolve member references using nested __with statements.

class A
{
  int x;
public:
  int y;
} a;


int func(int i, A *p)
{
  A *q = p;
  p[0].y = 2;
  p[1].y = 8;
  __with (*p)
  {
    y = 4; // p[0].y = 4
    p++;   // does not affect __with's expression
    __with (*q)
    {
      if (i == 3)
        return y; // returns q->y (4)
    }
    if (i == 5)
      return x; // illegal; p->x is private
    else if (i == 6)
      return p->y; // returns 8
    else
      return y; // p->y is returned (4)
  }
}
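
Because __with is a Digital Mars extension, other compilers reject it. The sketch below shows roughly equivalent standard C++ in which a reference takes the place of the __with expression. Like that expression, the reference is bound once, so advancing p afterward does not change what it refers to. The reference w, the omission of the private member x, and the helper run are choices made for this example.

```cpp
#include <cassert>

class A
{
public:
    int y;
    A() : y(0) {}
};

int func(int i, A *p)
{
    A *q = p;
    p[0].y = 2;
    p[1].y = 8;
    A &w = *p;        // stands in for __with (*p): bound exactly once
    w.y = 4;          // p[0].y = 4
    p++;              // does not rebind w
    if (i == 3)
        return q->y;  // q still points at p[0]: 4
    else if (i == 6)
        return p->y;  // p now points at p[1]: 8
    else
        return w.y;   // still p[0].y: 4
}

// Hypothetical driver: supplies the two-element array func expects.
int run(int i)
{
    A pair[2];
    return func(i, pair);
}
```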

Unimplemented Features

While Digital Mars C++ strives to be fully compatible with C++98, some features remain unimplemented, for example:

export keyword.
Enums larger than an int.
