digitalmars.D - D programming practices: object construction order

It is a long and boring post so you may want to go right to the conclusion.

Consider the following class hierarchy:

class A
{
// fields and virtual methods
};

class B : public A
{
// fields and virtual methods
};

class C : public B
{
// fields and virtual methods
};

Here is what happens when an object of type C is constructed in C++:

1) Allocate memory to store C
2) Call A.ctor on the object. This initializes the vtbl (with pointers to
methods of A) and passes control to user code
3) Call B.ctor on the object. This initializes the vtbl (with pointers to
methods of B) and passes control to user code
4) Call C.ctor on the object. This initializes the vtbl (with pointers to
methods of C) and passes control to user code

Let's call it bottom to top object construction.
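
The sequence above can be observed with a small sketch. The names trace, name() and construct_and_trace() are made up for illustration. Each ctor records which override of name() it sees, which reflects the state of the vtbl at that step:

```cpp
#include <string>
#include <vector>

// Hypothetical trace buffer: records what each constructor observes.
static std::vector<std::string> trace;

struct A {
    // While A's ctor runs, the vtbl points to A's methods.
    A() { trace.push_back("A.ctor sees " + name()); }
    virtual ~A() = default;
    virtual std::string name() const { return "A"; }
};

struct B : A {
    // A's ctor has already finished; the vtbl now points to B's methods.
    B() { trace.push_back("B.ctor sees " + name()); }
    std::string name() const override { return "B"; }
};

struct C : B {
    // A's and B's ctors have already finished; the vtbl now points to C's methods.
    C() { trace.push_back("C.ctor sees " + name()); }
    std::string name() const override { return "C"; }
};

// Constructs a C and returns a copy of the recorded trace.
std::vector<std::string> construct_and_trace() {
    trace.clear();
    C c;
    return trace;
}
```

Calling construct_and_trace() returns three entries, one per ctor, in the order A, B, C. Each ctor sees only its own version of name().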

We are studying good practices here, so assume that the A, B and C ctors are
well written and initialize all of their variables (or *intentionally* leave
them uninitialized).

Since the base class ctor is always run before the derived class's user code,
it is impossible (given well-written ctors) to access an uninitialized member
at any time in C++:

B::B(/*...*/) : /*...*/
{
    // The compiler enforces that all of the A members are already
    // initialized by now.
    // We also cannot access any of the C members, directly or indirectly
    // (via virtual functions).
}
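
To make this concrete, here is a minimal sketch. The members id, seen_in_ctor and tag and the method describe() are invented for illustration. B's ctor can safely read A::id, and the virtual call it makes stops at B::describe(), so C::tag is never read before C's ctor has set it:

```cpp
#include <string>

struct A {
    int id;
    A() : id(42) {}
    virtual ~A() = default;
    virtual std::string describe() const {
        return "A(" + std::to_string(id) + ")";
    }
};

struct B : A {
    // What describe() returned while B was still being constructed.
    std::string seen_in_ctor;

    // A's ctor has already run, so id is 42 here. The call dispatches to
    // B::describe(), not C::describe(), even when a C is being built.
    B() { seen_in_ctor = describe(); }

    std::string describe() const override { return A::describe() + ", B"; }
};

struct C : B {
    std::string tag;
    C() : tag("c-tag") {}

    // Reads tag; only reachable through virtual dispatch once C's ctor runs.
    std::string describe() const override {
        return B::describe() + ", C: " + tag;
    }
};
```

After `C c;`, c.seen_in_ctor holds the text produced by B::describe(), while c.describe() on the finished object also includes the C part.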

Now let's take a look at D.

In D, we have a different object construction order:

1) Allocate memory to store C
2) Call C.ctor on the object and initialize it with C.init. This includes vtbl
initialization with pointers to methods of C.
3) Give control to user code, so that the user decides when the parent classes'
ctors are run.

So the question is: when do we run the parent class ctor, at the beginning or at
the end?

Since we are all talking about non-nullable types, we must be sure that they
are indeed fully initialized before we access them:

class B
{
    this() { foo = new Foo(); }

    Foo foo;
}

class C : B
{
    // example 1:
    this()
    {
        writefln(foo.toString()); // error, foo is not initialized yet
        super();
    }
    
    // example 2:
    this()
    {
        super();
        writefln(foo.toString()); // fine, foo is initialized
    }
}

So here is recommendation №1:
- don't access base class members before base class ctor is run

What about virtual functions? Consider the following example:

class B
{
    string toString() { return super.toString() ~ ", B: " ~ foo.toString(); }
    Foo foo;
}

class C : B
{
    string toString()
    {
        return super.toString() ~ ", C: " ~ bar.toString();
    }

    Bar bar;

    // example 1:
    this()
    {
        // Dang! foo and bar are accessed but not initialized yet
        writefln(toString());
        bar = new Bar();
        super();
    }

    // example 2:
    this()
    {
        bar = new Bar();
        super();
        writefln(toString()); // fine, foo and bar are initialized
    }
}

So here is recommendation №2:
- initialize all your variables and call super() before you call any member
function

A consequence of recommendation №2:
- don't pass 'this' to any function or store it globally before you initialize
all your variables and call super(). Static methods are ok, because they don't
have access to 'this' (unless it is passed as one of the parameters, of course).

What about virtual functions called inside base class ctor? Here is an example:

class B : A
{
    Foo foo;
    string toString() { return super.toString() ~ ", B: " ~ foo.toString(); }

    this()
    {
        foo = new Foo();
        writefln(toString()); // Dang! C.bar is not initialized yet. See below
    }
}

class C : B
{
    Bar bar;
    string toString() { return super.toString() ~ ", C: " ~ bar.toString(); }

    this()
    {
        super();
        bar = new Bar();
    }
}

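And here is a gotcha: since the vtbl already points to the methods of C while
B's ctor is running, we have no pure virtual function call errors in D. But the
toString() call in B's ctor lands in C.toString(), which reads bar before C's
ctor has assigned it. So we are able to access variables that are not
initialized yet.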

Here comes recommendation №3:
- initialize all your variables *before* you call the base class ctor.

Now this is something that is different from C++, different from what we are
used to. But this is the way we need to follow to make sure our fields are not
accessed before they are initialized.

Conclusion
----------

Since D's object construction order differs from that of C++, here is the
recommended way to write a ctor:

class Foo : Bar
{
    this()
    {
        // Initialize all your variables.
        // This includes leaving some of them default-initialized on purpose
        // (unless they are non-nullable).
        // Don't read any member fields or call any member functions yet.

        super();

        // now do something useful (object registration etc)!

        // Your object is *fully* and *correctly* constructed by now.
        // You may call any functions without any risk of accessing
        // uninitialized members.
    }
}

I think this is the only correct way to follow. I even believe that it should
be statically enforced by the compiler. It certainly should be if we want to
see non-nullable types in D one day.

What do you think?
Mar 06 2009