digitalmars.D - Interesting Research Paper on Constructors in OO Languages

reply "Meta" <jared771 gmail.com> writes:
I saw an interesting post on Hacker News about constructors in OO 
languages. Apparently they are a real stumbling block for some 
programmers, which was quite a surprise to me. I think this might 
be relevant to a discussion about named parameters and whether we 
should ditch constructors for another kind of construct.

Link to the newsgroup post, the link to the paper is near the top:
http://erlang.org/pipermail/erlang-questions/2012-March/065519.html
Jul 15 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jul 15, 2013 at 09:06:38PM +0200, Meta wrote:
 I saw an interesting post on Hacker News about constructors in OO
 languages. Apparently they are a real stumbling block for some
 programmers, which was quite a surprise to me. I think this might be
 relevant to a discussion about named parameters and whether we
 should ditch constructors for another kind of construct.
 
 Link to the newsgroup post, the link to the paper is near the top:
 http://erlang.org/pipermail/erlang-questions/2012-March/065519.html

Thanks for the link; this touches on one of my pet peeves about OO libraries: constructors.

I consider myself to be a "systematic" programmer (according to the definition in the paper); I can work equally well with ctors with arguments vs. create-set-call objects. But I find that mandatory ctors with arguments are a pain to work with, *both* to write and to use.

On the usability side, there's the mental workload of having to remember which order the arguments appear in (or look it up in the IDE, or whatever -- the point is that I can't just type the ctor call straight from my head). Then there's the problem of needing to create objects required by the ctor before you can call the ctor. In some cases, this can be inconvenient -- I always have to remember to setup and create other objects before I can create this one, because its ctor requires said objects as arguments. Then there's the lack of flexibility: no matter what you do, it seems that anything that requires more than a single ctor argument inevitably becomes either (1) too complex, requiring too many arguments, and therefore very difficult to use, or (2) too simplistic, and therefore unable to do some things that I may want to do (e.g. some fields are default-initialized with no way to specify the initial values of the fields, 'cos otherwise the ctor would have too many arguments). No matter what you do, it seems almost impossible to come up with an ideal ctor except in trivial cases where it requires only 1 argument or is a default ctor.

On the writability side, one of my pet peeves is base class ctors that require multiple arguments. Every level of inheritance inevitably adds more arguments each time, and by the time you're 5-6 levels down the class hierarchy, your ctor calls just have an unmanageable number of parameters. Not to mention the violation of DRY by requiring much redundant typing just to pass arguments from the inherited class' ctor up the class hierarchy.
Tons of bugs to be had everywhere, given the amount of repeated typing needed.

In the simplest cases, of course, these aren't big issues, but this kind of ctor design is clearly not scalable.

OTOH, the create-set-call pattern isn't a panacea either. One of the biggest problems with this pattern is that you can't guarantee your objects are in a consistent state at all times. This is very bad, because all your methods will have to check if some value has been set yet, before they use it. This adds a lot of complexity that could've been avoided had everything been set at ctor-time. This also makes class invariants needlessly complex. Moreover, I've seen many classes in this category exhibit undefined behaviour if you call a value-setting method after you start using the object. Too many classes falsely assume that you will always call set methods and then "use" methods in that order. If you call a set method after calling a "use" method, you're quite likely to run into bugs in the class, e.g. part of the object's state doesn't reflect the new value you set, because the "use" methods were written with the assumption that when they were called the first time, the values you set earlier won't change thereafter.

I've always found Perl's approach a more balanced way to tackle this problem (even though Perl's OO system as a whole suffers from other, shall we say, idiosyncrasies). In Perl, objects start out as arbitrary key-value pairs, and nothing differentiates them from a regular AA until you call the 'bless' built-in function on them, at which point they become "officially" a member of some particular class. This neatly sidesteps the whole ctor mess: you can initialize the initial AA with whatever values you want, in whatever order you want. When you finally "kicked it into shape", as the cited paper puts it, you "promote" that set of key-value pairs into an "official" member of the class, and thereafter, you can't simply modify fields anymore except through class methods.
This means you now have the possibility of enforcing invariants on the object without crippling the flexibility of constructing it. (Well, OK, in Perl, this last bit isn't necessarily true, but in an ideal implementation of this initialize-bless-use approach, the object's fields would become non-public after being blessed and can only be updated by "official" object methods.)

In the spirit of this approach, I've written some C++ code in the past that looked something like this:

	class BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args {
			int baseparm1, baseparm2;
		};
		BaseClass(Args args) {
			// initialize object based on fields in
			// BaseClass::Args.
		}
	};

	class MyClass : public BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args : BaseClass::Args {
			int parm1, parm2;
		};

		MyClass(Args args) : BaseClass(args) {
			// initialize object based on fields in args
		}
	};

Basically, the Args structs let the user set up whatever values they want to, in whatever order they wish, then they are "blessed" into real class instances by the ctor. Encapsulating ctor arguments in these structs alleviates the problem of proliferating ctor arguments as the class hierarchy grows: each derived class simply hands off the Args struct (which is itself in a hierarchy that parallels that of the classes) to the base class ctor. All ctors in the class hierarchy need only a single (polymorphic) argument.

This approach also localizes the changes required when you modify base class arguments -- in the old way of having multiple ctor arguments, adding or changing arguments to the base class ctor requires you to update every single derived class ctor accordingly -- very bad. But here, adding a new field to BaseClass::Args requires zero changes to all derived classes, which is a Good Thing(tm).
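
For anyone who wants the whole pattern in one compilable piece, here is a minimal C++ sketch. The names Shape, Circle, layer and demoArgsHierarchy are hypothetical and chosen purely for illustration, and the default member initializers assume C++11 or later. It is only one way to spell the idea, not a definitive implementation:

```cpp
#include <cassert>

// Hypothetical base class; only the nested-Args shape follows the post.
class Shape {
public:
    // Ctor arguments bundled into one struct, each with a sane default.
    struct Args {
        int x = 0;
        int y = 0;
        int layer = 0;  // imagine this field was added at a later date
    };

    explicit Shape(const Args& args)
        : x_(args.x), y_(args.y), layer_(args.layer) {}
    virtual ~Shape() = default;

    int x() const { return x_; }
    int y() const { return y_; }
    int layer() const { return layer_; }

private:
    int x_, y_, layer_;
};

// Hypothetical derived class whose Args "inherits" the base Args.
class Circle : public Shape {
public:
    struct Args : Shape::Args {
        int radius = 1;
    };

    // The derived Args binds to const Shape::Args&, so the whole bundle
    // is handed up the hierarchy as a single argument.
    explicit Circle(const Args& args) : Shape(args), radius_(args.radius) {}

    int radius() const { return radius_; }

private:
    int radius_;
};

// Fields can be set in any order; untouched fields keep their defaults.
void demoArgsHierarchy() {
    Circle::Args args;
    args.radius = 5;  // derived field first
    args.layer = 2;   // then a base field
    Circle c(args);
    assert(c.radius() == 5);
    assert(c.layer() == 2);
    assert(c.x() == 0 && c.y() == 0);
}
```

If Shape::Args later gains another defaulted field, neither Circle::Args nor the Circle ctor has to change; only code that cares about the new field needs to touch it.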
In some cases, if the class is relatively simple, the private members of the class can simply be themselves an instance of the Args struct, so the ctor could be nothing more than just:

	MyClass(Args args) : BaseClass(args), myArgs(args) {}

which gets rid of that silly baroque dance of naming ctor arguments as _a, _b, _c, then writing in the ctor body a=_a, b=_b, c=_c (which can be rather error prone if you mistype a _ somewhere or forget to assign one of the members). Since the private copy of Args is not accessible from outside, class methods can use the values freely without having to worry about inconsistent states -- the ctor can check class invariants before creating the class object, ensuring that the internal copy of Args is in a consistent state.

The Args structs themselves, of course, can have ctors that set up sane default values for each field, so that lazy users can simply call:

	MyClass *obj = new MyClass(MyClass::Args());

and get a working, consistent class object with default settings. This way of setting default values also lets the user only change fields that they don't want to use default values for, rather than be constricted by the order of ctor default arguments: if you're unlucky enough to need a non-default value in a later parameter, you're forced to repeat the default values for everything that comes before it.

In D, this approach isn't quite as nice, because D structs don't have inheritance, so you can't simply pass Args from derived class to base class. You'd have to explicitly do something like:

	class BaseClass {
	public:
		struct Args { ... }
		this(Args args) { ... }
	}

	class MyClass : BaseClass {
	public:
		struct Args {
			BaseClass.Args base;	// <-- explicit inclusion of BaseClass.Args
			...
		}
		this(Args args) {
			super(args.base);	// <-- more verbose than just super(args);
			...
		}
	}

Initializing the args also isn't as nice, since user code will have to know exactly which fields are in .base and which aren't.
You can't just write, like in C++:

	// C++
	MyClass::Args args;
	args.basefield1 = 123;
	args.field2 = 321;

you'd have to write, in D:

	// D
	MyClass.Args args;
	args.base.basefield1 = 123;
	args.field2 = 321;

which isn't as nice in terms of encapsulation, since ideally user code shouldn't need to care about the exact boundaries between base class and derived class.

I haven't really thought about how this might be made nicer in D, though.


T

--
I am Ohm of Borg. Resistance is voltage over current.
Jul 15 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-07-16 00:27, H. S. Teoh wrote:

 In the spirit of this approach, I've written some C++ code in the past
 that looked something like this:

 	class BaseClass {
 	public:
 		// Encapsulate ctor arguments
 		struct Args {
 			int baseparm1, baseparm2;
 		};
 		BaseClass(Args args) {
 			// initialize object based on fields in
 			// BaseClass::Args.
 		}
 	};

 	class MyClass : public BaseClass {
 	public:
 		// Encapsulate ctor arguments
 		struct Args : BaseClass::Args {
 			int parm1, parm2;
 		};

 		MyClass(Args args) : BaseClass(args) {
 			// initialize object based on fields in args
 		}
 	};

 Basically, the Args structs let the user set up whatever values they
 want to, in whatever order they wish, then they are "blessed" into real
 class instances by the ctor. Encapsulating ctor arguments in these
 structs alleviates the problem of proliferating ctor arguments as the
 class hierarchy grows: each derived class simply hands off the Args
 struct (which is itself in a hierarchy that parallels that of the
 classes) to the base class ctor. All ctors in the class hierarchy need
 only a single (polymorphic) argument.

That's actually quite clever.
 In D, this approach isn't quite as nice, because D structs don't have
 inheritance, so you can't simply pass Args from derived class to base
 class. You'd have to explicitly do something like:

 	class BaseClass {
 	public:
 		struct Args { ...  }
 		this(Args args) { ... }
 	}

 	class MyClass : BaseClass {
 	public:
 		struct Args {
 			BaseClass.Args base;	// <-- explicit inclusion of BaseClass.Args
 			...
 		}
 		this(Args args) {
 			super(args.base);	// <-- more verbose than just super(args);
 			...
 		}
 	}

 Initializing the args also isn't as nice, since user code will have to
 know exactly which fields are in .base and which aren't. You can't just
 write, like in C++:

 	// C++
 	MyClass::Args args;
 	args.basefield1 = 123;
 	args.field2 = 321;

 you'd have to write, in D:

 	// D
 	MyClass.Args args;
 	args.base.basefield1 = 123;
 	args.field2 = 321;

 which isn't as nice in terms of encapsulation, since ideally user code
 shouldn't need to care about the exact boundaries between base class and
 derived class.

 I haven't really thought about how this might be made nicer in D,
 though.

On the other hand D supports the following syntax:

	MyClass.Args args = { field1: 1, field2: 2 };

Unfortunately that syntax doesn't work for function calls.

--
/Jacob Carlborg
Jul 16 2013
prev sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 07/16/2013 01:30 AM, deadalnix wrote:
 My policy is to require the bare minimum to construct a valid object, in
 order to avoid initialization hell.

+0.33
 Not knowing what/when to initialize things is really painful as well. It
 also introduces sequential coupling, and wrongly initialized objects tend
 to explode far away from their construction point.

+0.33
 What goes in this category? Any state that can't have a default value
 that makes sense, as well as any state that is expensive to initialize.

+0.33

And to complete:

+0.01 :p

Ali
Jul 17 2013
prev sibling next sibling parent reply "Meta" <jared771 gmail.com> writes:
On Monday, 15 July 2013 at 22:29:14 UTC, H. S. Teoh wrote:
 I consider myself to be a "systematic" programmer (according to 
 the
 definition in the paper); I can work equally well with ctors 
 with
 arguments vs. create-set-call objects. But I find that 
 mandatory ctors
 with arguments are a pain to work with, *both* to write and to 
 use.

I also find constructors with multiple arguments a pain to use. They get difficult to maintain as your project grows. One of my pet projects has a very shallow class hierarchy, but the constructors of each object down the tree have many arguments, with descendants adding on even more. It gets to be a real headache when you have more than 3 constructors per class to deal with base class overloads, multiple arguments, etc.
 On the usability side, there's the mental workload of having to 
 remember
 which order the arguments appear in (or look it up in the IDE, 
 or
 whatever -- the point is that I can't just type the ctor call 
 straight
 from my head). Then there's the problem of needing to create 
 objects
 required by the ctor before you can call the ctor. In some 
 cases, this
 can be inconvenient -- I always have to remember to setup and 
 create
 other objects before I can create this one, because its ctor 
 requires
 said objects as arguments. Then there's the lack of 
 flexibility: no
 matter what you do, it seems that anything that requires more 
 than a
 single ctor argument inevitably becomes either (1) too complex,
 requiring too many arguments, and therefore very difficult to 
 use, or
 (2) too simplistic, and therefore unable to do some things that 
 I may
 want to do (e.g. some fields are default-initialized with no 
 way to
 specify the initial values of the fields, 'cos otherwise the 
 ctor would
 have too many arguments). No matter what you do, it seems almost
 impossible to come up with an ideal ctor except in trivial 
 cases where
 it requires only 1 argument or is a default ctor.

Having to create other objects to pass to a constructor is particularly painful. You'd better pray that they have trivial constructors, or else things can get hairy really fast. Multiple nested constructors can also create a large amount of code bloat. Once the constructor grows large enough, I generally put each argument on its own line to ensure that it's clear what I'm calling it with. This has the unfortunate side effect of making the call span multiple lines. In my opinion, a constructor requiring more than 10 lines is an unsightly abomination.
 On the writability side, one of my pet peeves is base class 
 ctors that
 require multiple arguments. Every level of inheritance 
 inevitably adds
 more arguments each time, and by the time you're 5-6 levels 
 down the
 class hierarchy, your ctor calls just have an unmanageable 
 number of
 parameters. Not to mention the violation of DRY by requiring 
 much
 redundant typing just to pass arguments from the inherited 
 class' ctor
 up the class hierarchy. Tons of bugs to be had everywhere, 
 given the
 amount of repeated typing needed.

 In the simplest cases, of course, these aren't big issues, but 
 this kind
 of ctor design is clearly not scalable.

 OTOH, the create-set-call pattern isn't a panacea either. One of 
 the
 biggest problems with this pattern is that you can't guarantee 
 your
 objects are in a consistent state at all times. This is very 
 bad,
 because all your methods will have to check if some value has 
 been set
 yet, before it uses it. This adds a lot of complexity that 
 could've been
 avoided had everything been set at ctor-time. This also makes 
 class
 invariants needlessly complex. Moreover, I've seen many classes 
 in this
 category exhibit undefined behaviour if you call a 
 value-setting method
 after you start using the object. Too many classes falsely 
 assume that
 you will always call set methods and then "use" methods in that 
 order.
 If you call a set method after calling a "use" method, you're 
 quite
 likely to run into bugs in the class, e.g. part of the object's 
 state
 doesn't reflect the new value you set, because the "use" 
 methods were
 written with the assumption that when they were called the 
 first time,
 the values you set earlier won't change thereafter.
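
To make the set-after-use hazard concrete, here is a minimal C++ sketch. The Greeter class and demoStaleState are hypothetical, written only to illustrate the failure mode, and are not taken from any real library. The "use" method caches its result on first call, and the setter forgets to invalidate that cache:

```cpp
#include <cassert>
#include <string>

// Hypothetical create-set-call class with a deliberate bug.
class Greeter {
public:
    // Bug: changes name_ but leaves the cached greeting untouched.
    void setName(const std::string& name) { name_ = name; }

    // "Use" method: builds the greeting once and assumes name_ never
    // changes after the first call.
    const std::string& greeting() {
        if (cache_.empty()) {
            cache_ = "Hello, " + name_;
        }
        return cache_;
    }

private:
    std::string name_;
    std::string cache_;
};

// Calling a setter after a "use" method silently yields stale state.
void demoStaleState() {
    Greeter g;
    g.setName("Ada");
    assert(g.greeting() == "Hello, Ada");

    g.setName("Bob");                      // set after use
    assert(g.greeting() == "Hello, Ada");  // still the old value
}
```

Had the name been fixed at construction time, the cache could never go stale, and the class would not need any "has this been set yet?" checks.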

I've found that a good way to keep constructors manageable is to use the builder pattern. Create a builder object that has its fields set by the programmer, which is then passed to the 'real' object for construction. You can provide default arguments, optional arguments, etc. Combine this with a fluent interface and I think it looks a lot better. Of course, this has the disadvantage of requiring a *lot* of boilerplate, but I think this could be okay in D, as a builder class is exactly the kind of thing that can be automatically generated.
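
A rough C++ sketch of what such a builder with chained setters might look like. Window, Builder, the default values and demoBuilder are all hypothetical names invented for this example; it is one possible shape, not the canonical form of the pattern:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class that can only be created through its builder.
class Window {
public:
    class Builder {
    public:
        // Each setter returns *this so that calls can be chained.
        Builder& title(std::string t) { title_ = std::move(t); return *this; }
        Builder& width(int w) { width_ = w; return *this; }
        Builder& height(int h) { height_ = h; return *this; }

        // Hands the collected values to the real ctor.
        Window build() const;

    private:
        friend class Window;  // lets the Window ctor read the fields
        std::string title_ = "untitled";
        int width_ = 640;
        int height_ = 480;
    };

    const std::string& title() const { return title_; }
    int width() const { return width_; }
    int height() const { return height_; }

private:
    // The invariant is checked once, at construction time.
    explicit Window(const Builder& b)
        : title_(b.title_), width_(b.width_), height_(b.height_) {
        if (width_ <= 0 || height_ <= 0) {
            throw std::invalid_argument("size must be positive");
        }
    }

    std::string title_;
    int width_;
    int height_;
};

inline Window Window::Builder::build() const { return Window(*this); }

// Setters in any order; anything not mentioned keeps its default.
void demoBuilder() {
    Window w = Window::Builder().width(800).title("log").build();
    assert(w.title() == "log");
    assert(w.width() == 800);
    assert(w.height() == 480);
}
```

The boilerplate is plain to see: every field appears in the builder, in a setter and again in the real class, which is the part that would be a good candidate for automatic generation.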
 I've always found Perl's approach a more balanced way to tackle 
 this
 problem (even though Perl's OO system as a whole suffers from 
 other,
 shall we say, idiosyncrasies). In Perl, objects start out as 
 arbitrary
 key-value pairs, and nothing differentiates them from a regular 
 AA until
 you call the 'bless' built-in function on them, at which point 
 they
 become "officially" a member of some particular class. This 
 neatly
 sidesteps the whole ctor mess: you can initialize the initial 
 AA with
 whatever values you want, in whatever order you want. When you 
 finally
 "kicked it into shape", as the cited paper puts it, you 
 "promote" that
 set of key-value pairs into an "official" member of the class, 
 and
 thereafter, you can't simply modify fields anymore except 
 through class
 methods. This means you now have the possibility of enforcing 
 invariants
 on the object without crippling the flexibility of constructing 
 it.
 (Well, OK, in Perl, this last bit isn't necessarily true, but 
 in an
 ideal implementation of this initialize-bless-use approach, the 
 object's
 fields would become non-public after being blessed and can only 
 be
 updated by "official" object methods.)

 In the spirit of this approach, I've written some C++ code in 
 the past
 that looked something like this:

 	class BaseClass {
 	public:
 		// Encapsulate ctor arguments
 		struct Args {
 			int baseparm1, baseparm2;
 		};
 		BaseClass(Args args) {
 			// initialize object based on fields in
 			// BaseClass::Args.
 		}
 	};

 	class MyClass : public BaseClass {
 	public:
 		// Encapsulate ctor arguments
 		struct Args : BaseClass::Args {
 			int parm1, parm2;
 		};

 		MyClass(Args args) : BaseClass(args) {
 			// initialize object based on fields in args
 		}
 	};

 Basically, the Args structs let the user set up whatever values 
 they
 want to, in whatever order they wish, then they are "blessed" 
 into real
 class instances by the ctor. Encapsulating ctor arguments in 
 these
 structs alleviates the problem of proliferating ctor arguments 
 as the
 class hierarchy grows: each derived class simply hands off the 
 Args
 struct (which is itself in a hierarchy that parallels that of 
 the
 classes) to the base class ctor. All ctors in the class 
 hierarchy need
 only a single (polymorphic) argument.

 This approach also localizes the changes required when you 
 modify base
 class arguments -- in the old way of having multiple ctor 
 arguments,
 adding or changing arguments to the base class ctor requires 
 you to
 update every single derived class ctor accordingly -- very bad. 
 But
 here, adding a new field to BaseClass::Args requires zero 
 changes to all
 derived classes, which is a Good Thing(tm).

 In some cases, if the class is relatively simple, the private 
 members of
 the class can simply be themselves an instance of the Args 
 struct, so
 the ctor could be nothing more than just:

 	MyClass(Args args) : BaseClass(args), myArgs(args) {}

 which gets rid of that silly baroque dance of naming ctor 
 arguments as
 _a, _b, _c, then writing in the ctor body a=_a, b=_b, c=_c 
 (which can be
 rather error prone if you mistype a _ somewhere or forget to 
 assign one
 of the members). Since the private copy of Args is not 
 accessible from
 outside, class methods can use the values freely without having 
 to worry
 about inconsistent states -- the ctor can check class 
 invariants before
 creating the class object, ensuring that the internal copy of 
 Args is in
 a consistent state.

 The Args structs themselves, of course, can have ctors that 
 setup sane
 default values for each field, so that lazy users can simply 
 call:

 	MyClass *obj = new MyClass(MyClass::Args());

 and get a working, consistent class object with default 
 settings. This
 way of setting default values also lets the user only change 
 fields that
 they don't want to use default values for, rather than be 
 constricted by
 the order of ctor default arguments: if you're unlucky enough 
 to need a
 non-default value in a later parameter, you're forced to repeat 
 the
 default values for everything that comes before it.

 In D, this approach isn't quite as nice, because D structs 
 don't have
 inheritance, so you can't simply pass Args from derived class 
 to base
 class. You'd have to explicitly do something like:

 	class BaseClass {
 	public:
 		struct Args { ...  }
 		this(Args args) { ... }
 	}

 	class MyClass : BaseClass {
 	public:
 		struct Args {
 			BaseClass.Args base;	// <-- explicit inclusion of 
 BaseClass.Args
 			...
 		}
 		this(Args args) {
 			super(args.base);	// <-- more verbose than just super(args);
 			...
 		}
 	}

 Initializing the args also isn't as nice, since user code will 
 have to
 know exactly which fields are in .base and which aren't. You 
 can't just
 write, like in C++:

 	// C++
 	MyClass::Args args;
 	args.basefield1 = 123;
 	args.field2 = 321;

 you'd have to write, in D:

 	// D
 	MyClass.Args args;
 	args.base.basefield1 = 123;
 	args.field2 = 321;

 which isn't as nice in terms of encapsulation, since ideally 
 user code
 shouldn't need to care about the exact boundaries between base 
 class and
 derived class.

 I haven't really thought about how this might be made nicer in 
 D,
 though.


 T

See above, this is basically the builder pattern. It's a neat trick, giving your args objects a class hierarchy of their own. I think that one drawback of that, however, is that now you have to maintain *two* class hierarchies. Have you found this to be a problem in practice? As an aside, you could probably simulate the inheritance of the args objects in D either with alias this or even opDispatch. Still, this means that you need to nest the structs within each other, and this could get silly after 2-3 "generations" of args objects.
Jul 15 2013
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 16, 2013 at 01:17:30AM -0700, H. S. Teoh wrote:
[...]
 	mixin template BuilderArgs(string fields) {
 		struct Args {
 			typeof(super).Args base;
 			alias base this;
 			mixin(fields);
 		}
 	};
 
 	class MyClass : BaseClass {
 	public:
 		// Hmm, doesn't look too bad!
 		mixin BuilderArgs!(q{
 			int parm1 = 1;
 			int parm2 = 2;
 		});
 		this(Args args) {
 			super(args);
 			...
 		}
 	}
 
 	class AnotherClass : BaseClass {
 	public:
 		// N.B. Looks exactly the same as MyClass.Args except
 		// for the fields! The template automatically picks up
 		// the right base class Args to "inherit" from.
 		mixin BuilderArgs!(q{
 			string anotherparm1 = "abc";
 			string anotherparm2 = "def";
 		});
 		this(Args args) {
 			super(args);
 			...
 		}
 	}
 
 Not bad at all!  Though, I haven't actually tested any of this code, so
 I've no idea if it will actually work yet. But it certainly looks
 promising! I'll give it a spin tomorrow morning (way past my bedtime
 now).

Yep, confirmed that this code actually works! Here's the actual test code that I wrote:

	import std.stdio;

	mixin template CtorArgs(string fields) {
		struct Args {
			static if (!is(typeof(super) == Object)) {
				typeof(super).Args base;
				alias base this;
			}
			mixin(fields);
		}
	}

	class Base {
	private:
		int sum;
	public:
		mixin CtorArgs!(q{
			int basefield1 = 1;
			int basefield2 = 2;
		});
		this(Args args) {
			sum = args.basefield1 + args.basefield2;
		}
		int getResult() { return sum; }
	}

	class Derived : Base {
		int derivedSum;
	public:
		mixin CtorArgs!(q{
			int parm1 = 3;
			int parm2 = 4;
		});
		this(Args args) {
			super(args);
			derivedSum = args.parm1 + args.parm2;
		}
		override int getResult() {
			return super.getResult() + derivedSum;
		}
	}

	class AnotherDerived : Base {
	private:
		int anotherSum;
	public:
		mixin CtorArgs!(q{
			int another1 = 5;
			int another2 = 6;
		});
		this(Args args) {
			super(args);
			anotherSum = args.another1 + args.another2;
		}
		override int getResult() {
			return super.getResult() + anotherSum;
		}
	}

	// Test usage in a deeper hierarchy
	class VeryDerived : AnotherDerived {
		int divisor;
	public:
		mixin CtorArgs!(q{
			int divisor = 5;
		});
		this(Args args) {
			super(args);
			this.divisor = args.divisor;
		}
		override int getResult() {
			return super.getResult() / divisor;
		}
	}

	void main() {
		Derived.Args args1;
		args1.basefield1 = 10;
		args1.parm1 = 20;
		auto obj1 = new Derived(args1);
		assert(obj1.getResult() == 10 + 2 + 20 + 4);

		AnotherDerived.Args args2;
		args2.basefield2 = 20;
		args2.another1 = 30;
		auto obj2 = new AnotherDerived(args2);
		assert(obj2.getResult() == 1 + 20 + 30 + 6);

		VeryDerived.Args args3;
		args3.divisor = 7;
		auto obj3 = new VeryDerived(args3);
		assert(obj3.getResult() == 2);
	}

Note the nice thing about this: you can construct the ctor arguments (har har) in any order you like, and it Just Works. Referencing ctor parameters of base class ctors is just as easy; no need for ugliness like "args.base.base.base.baseparm1" thanks to alias this.

The ctors themselves just hand Args over to the base class: alias this makes the struct inheritance pretty transparent. The mixin line itself is identical across the board, thanks to the static if in the mixin template, so you can actually re-root the class hierarchy or otherwise move classes around the hierarchy without having to re-wire any of the Args handling, and things will Just Work.

Wow. So not only does this technique work in D, it's working much *better* than my original C++ code! I think I shall add this to my personal D library. :) (Unless people think this is Phobos material.)


T

--
People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird. -- D. Knuth
Jul 16 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 16, 2013 at 03:54:27AM +0200, Meta wrote:
 On Monday, 15 July 2013 at 22:29:14 UTC, H. S. Teoh wrote:
I consider myself to be a "systematic" programmer (according to the
definition in the paper); I can work equally well with ctors with
arguments vs. create-set-call objects. But I find that mandatory
ctors with arguments are a pain to work with, *both* to write and to
use.

I also find constructors with multiple arguments a pain to use. They get difficult to maintain as your project grows. One of my pet projects has a very shallow class hierarchy, but the constructors of each object down the tree have many arguments, with descendants adding on even more. It gets to be a real headache when you have more than 3 constructors per class to deal with base class overloads, multiple arguments, etc.

Yeah, when every level of the hierarchy introduces 2-3 new overloads of the ctor, you get an exponential explosion of derived class ctors if you want to account for all possibilities. Most of the time, you just end up oversimplifying 'cos anything else is simply unmanageable. [...]
 Having to create other objects to pass to a constructor is
 particularly painful. You'd better pray that they have trivial
 constructors, or else things can get hairy really fast. Multiple
 nested constructors can also create a large amount of code bloat.
 Once the constructor grows large enough, I generally put each
 argument on its own line to ensure that it's clear what I'm calling
 it with. This has the unfortunate side effect of making the call
 span multiple lines. In my opinion, a constructor requiring more
 than 10 lines is an unsightly abomination.

I usually bail out way before then. :) A 10-line ctor call is just unpalatable. [...]
 I've found that a good way to keep constructors manageable is to use
 the builder pattern. Create a builder object that has its fields set
 by the programmer, which is then passed to the 'real' object for
 construction. You can provide default arguments, optional arguments,
 etc. Combine this with a fluent interface and I think it looks a lot
 better. Of course, this has the disadvantage of requiring a *lot* of
 boilerplate, but I think this could be okay in D, as a builder class
 is exactly the kind of thing that can be automatically generated.

In my C++ version of this, you could even just reuse the builder object directly, since it's just a struct containing ctor arguments. But yeah, there's some amount of boilerplate necessary. [...]
In the spirit of this approach, I've written some C++ code in the
past that looked something like this:

	class BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args {
			int baseparm1, baseparm2;
		};
		BaseClass(Args args) {
			// initialize object based on fields in
			// BaseClass::Args.
		}
	};

	class MyClass : public BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args : BaseClass::Args {
			int parm1, parm2;
		};

		MyClass(Args args) : BaseClass(args) {
			// initialize object based on fields in args
		}
	};


 See above, this is basically the builder pattern. It's a neat trick,
 giving your args objects a class hierarchy of their own. I think that
 one drawback of that, however, is that now you have to maintain *two*
 class hierarchies. Have you found this to be a problem in practice?

Well, there *is* a certain amount of boilerplate, to be sure, so it isn't a perfect solution. But nesting the structs inside the class they correspond with helps to prevent mismatches between the two hierarchies. It also allows reusing the name "Args" so that you don't have to invent a whole new set of names just for these builders. Minimizing these differences makes it less likely to make a mistake and inherit Args from the wrong base class, for example.

In fact, now that I think of this, in D this could actually work out even better, since you could just write:

	class MyClass : BaseClass {
	public:
		class Args : typeof(super).Args {
			int parm1 = 1;
			int parm2 = 2;
		}
		this(Args args) {
			super(args);
			...
		}
	}

The compile-time introspection allows you to just write "class Args : typeof(super).Args" consistently for all such builders, so you never have to worry about inventing new names or mismatches in the two hierarchies. The "typeof(super).Args" will automatically pick up the correct base class Args to inherit from, even if you shuffle the classes around the hierarchy.

Furthermore, since the declaration is exactly identical across the board (except for the actual fields), you could just factor this into a mixin and thereby minimize the boilerplate.

The only major disadvantage in the D version is that you can't use structs, but you have to allocate the Args objects on the GC heap, so you may end up generating lots of GC garbage. If only D structs had inheritance, this would've been a much cleaner solution.

 As an aside, you could probably simulate the inheritance of the args
 objects in D either with alias this or even opDispatch. Still, this
 means that you need to nest the structs within each-other, and this
 could get silly after 2-3 "generations" of args objects.

Hmm. This is a good idea! And with a mixin, this may not turn out so bad after all. Maybe start with something like this:

	class BaseClass {
	public:
		struct Args {
			int baseparm1 = 1;
			int baseparm2 = 2;
			...
		}
	}

	class MyClass : BaseClass {
	public:
		struct Args {
			typeof(super).Args base;
			alias base this;

			int parm1 = 1;
			int parm2 = 2;
			...
		}
		this(Args args) {
			super(args);	// works 'cos of alias this
		}
	}

	void main() {
		MyClass.Args args;
		args.baseparm1 = 2;	// works 'cos of alias this
		args.parm1 = 3;
		auto obj = new MyClass(args);
	}

Using alias this, we have the nice effect that user code no longer needs to refer to the .base member of the structs, and indeed, doesn't need to know about it. So this is effectively like struct inheritance... heh, cool. Just discovered a new trick in D: struct inheritance using alias this. :)

The boilerplate can be put into a mixin, say something like this:

	mixin template BuilderArgs(string fields) {
		struct Args {
			typeof(super).Args base;
			alias base this;
			mixin(fields);
		}
	};

	class MyClass : BaseClass {
	public:
		// Hmm, doesn't look too bad!
		mixin BuilderArgs!(q{
			int parm1 = 1;
			int parm2 = 2;
		});

		this(Args args) {
			super(args);
			...
		}
	}

	class AnotherClass : BaseClass {
	public:
		// N.B. Looks exactly the same like MyClass.args except
		// for the fields! The template automatically picks up
		// the right base class Args to "inherit" from.
		mixin BuilderArgs!(q{
			string anotherparm1 = "abc";
			string anotherparm2 = "def";
		});

		this(Args args) {
			super(args);
			...
		}
	}

Not bad at all! Though, I haven't actually tested any of this code, so I've no idea if it will actually work yet. But it certainly looks promising! I'll give it a spin tomorrow morning (way past my bedtime now).

T

--
Meat: euphemism for dead animal. -- Flora
Jul 16 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
My policy is to require the bare minimum to construct a valid 
object, in order to avoid initialization hell.

Not knowing what/when to initialize things is really painful as 
well. It also introduces sequential coupling, and wrongly 
initialized objects tend to explode far away from their 
construction point.

What goes in this category? Any state that can't have any 
default value that makes sense, as well as any state that is 
expensive to initialize.
Jul 16 2013
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Tuesday, 16 July 2013 at 08:19:10 UTC, H. S. Teoh wrote:
 Just discovered a new trick in D: struct inheritance using alias
 this. :)

Wasn't this stated in TDPL as one of the primary design rationales behind "alias this"? :)
Jul 16 2013
prev sibling next sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Mon, 15 Jul 2013 20:06:38 +0100, Meta <jared771 gmail.com> wrote:

 I saw an interesting post on Hacker News about constructors in OO  
 languages. Apparently they are a real stumbling block for some  
 programmers, which was quite a surprise to me. I think this might be  
 relevant to a discussion about named parameters and whether we should  
 ditch constructors for another kind of construct.

 Link to the newsgroup post, the link to the paper is near the top:
 http://erlang.org/pipermail/erlang-questions/2012-March/065519.html

First thought; constructors with positional arguments aren't any different to methods or functions with positional arguments WRT remembering the arguments. The difficulties with one are the same as with another - you need to remember them, or look them up, or get help from intellisense.

I think the point about constructed objects being in valid states is the important one. If the object requires N arguments which cannot be sensibly defaulted, then IMO they /have/ to be specified at construction, and should not be delayed as in the create-set-call style mentioned.

Granted, a create-set-call style object could throw detailed/useful messages when used before initialisation, but that's a runtime check so IMO not a great solution to the issue.

Also, I find compelling the issue that a create-set-call style object with N required set calls could be initialised N! ways, and that each of these different orderings have effectively the same semantic meaning.. so it becomes a lot harder to see what is really happening. Add to that, that someone could interleave the initialisation of another object into the first and .. well .. shudder.

So, given the desire to have objects constructed in a valid state, and given the restriction that this may require N arguments which cannot be defaulted how do you alleviate the problem of having to remember the parameters required and the ordering of those parameters?

Named parameters only help up to a point. Like ordered parameters you need to remember which parameters are required, all that has changed is that instead of remembering their order you have to remember their names. So, IMO this doesn't really solve the problem at all.

A lot can be done with sufficiently clever intellisense in either case (ordered/named parameters), but is there anything which can be done without it using just a text editor and compiler?

Or, perhaps another way to ask a similar Q is.. can the compiler statically verify that a create-set-call style object has been initialised, or rather that an attempt has at least been made to initialise all the required parts.

We have class invariants.. these define the things which must be initialised to reach a valid state. If we had compiler recognisable properties as well, then we could have an initialise construct like..

class Foo
{
  string name;
  int age;

  invariant
  {
    assert(name != null);
    assert(age > 0);
  }

  property string Name...
  property int Age...
}

void main()
{
  Foo f = new Foo() {
    Name = "test",    // calls property Name setter
    Age = 12          // calls property Age setter
  };
}

The compiler could statically verify that the variables tested in the invariant (name, age) were set (by setter properties) inside the initialise construct {} following the new Foo().

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 16 2013
parent "Jérôme M. Berger" <jeberger free.fr> writes:

Regan Heath wrote:
 Or, perhaps another way to ask a similar Q is.. can the compiler
 statically verify that a create-set-call style object has been
 initialised, or rather that an attempt has at least been made to
 initialise all the required parts.

http://blog.rafaelferreira.net/2008/07/type-safe-builder-pattern-in-scala.html

Basically, the builder object is a generic that has a boolean parameter for each mandatory parameter. Setting a parameter casts the builder object to the same generic with the corresponding boolean set to true. And the "build" method is only available when the type system recognizes that all the booleans are true.

Note however that this will not work if you try to mutate the builder instance. IOW, this will work (assuming you only need to specify foo and bar):

	auto instance = builder().withFoo (1).withBar ("abc").build();

but this won't work:

	auto b = builder();
	b.withFoo (1);
	b.withBar ("abc");
	auto instance = b.build();

Something similar should be doable in D (although I'm a bit afraid of the template bloat it might create…)

Jerome

--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
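
For readers who want to see roughly what the type-safe builder described above might look like in D, here is a minimal sketch. It is not taken from the linked Scala article; the names Widget, Builder, builder, withFoo and withBar are made up for illustration, and the two mandatory fields are assumed to be an int and a string. Each setter returns a new builder type with the matching boolean flipped to true, and build() is only declared when both booleans are true.

```d
import std.stdio : writeln;

/// The object we ultimately want; both fields are treated as mandatory.
struct Widget
{
    int foo;
    string bar;
}

/// Hypothetical builder: each bool records, in the type itself, whether
/// the corresponding mandatory field has been supplied yet.
struct Builder(bool hasFoo, bool hasBar)
{
    private int foo;
    private string bar;

    /// Returns a new builder whose type has hasFoo set to true.
    Builder!(true, hasBar) withFoo(int value)
    {
        return Builder!(true, hasBar)(value, bar);
    }

    /// Returns a new builder whose type has hasBar set to true.
    Builder!(hasFoo, true) withBar(string value)
    {
        return Builder!(hasFoo, true)(foo, value);
    }

    // build() only exists once every mandatory field has been set.
    static if (hasFoo && hasBar)
    {
        Widget build()
        {
            return Widget(foo, bar);
        }
    }
}

/// Entry point: nothing has been set yet.
Builder!(false, false) builder()
{
    return Builder!(false, false).init;
}

void main()
{
    auto w = builder().withFoo(1).withBar("abc").build();
    writeln(w.foo, " ", w.bar);

    // Leaving out a mandatory field is rejected at compile time.
    static assert(!__traits(compiles, builder().withFoo(1).build()));
}
```

With N mandatory fields this instantiates up to 2^N builder types, which is presumably the template bloat being worried about. It also shows why the non-chained form fails: b.withFoo(1) returns a value of a different type and leaves b itself unchanged.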
Jul 16 2013
prev sibling next sibling parent "Craig Dillabaugh" <cdillaba cg.scs.careton.ca> writes:
On Tuesday, 16 July 2013 at 09:47:35 UTC, Regan Heath wrote:

clip

 We have class invariants.. these define the things which must 
 be initialised to reach a valid state.  If we had compiler 
 recognisable properties as well, then we could have an 
 initialise construct like..

 class Foo
 {
   string name;
   int age;

   invariant
   {
     assert(name != null);
     assert(age > 0);
   }

   property string Name...
   property int Age...
 }

 void main()
 {
   Foo f = new Foo() {
     Name = "test",    // calls property Name setter
     Age = 12          // calls property Age setter
   };
 }

 The compiler could statically verify that the variables tested 
 in the invariant (name, age) were set (by setter properties) 
 inside the initialise construct {} following the new Foo().

 R

How do you envision this working where Name or Age must be set to a value not known at compile time?
Jul 16 2013
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Tuesday, 16 July 2013 at 13:35:00 UTC, Craig Dillabaugh wrote:
 How do you envision this working where Name or Age must be set 
 to
 a value not known at compile time?

Contracts are run-time entities (omitted in release AFAIR).
Jul 16 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 16, 2013 at 11:18:31AM +0200, Dicebot wrote:
 On Tuesday, 16 July 2013 at 08:19:10 UTC, H. S. Teoh wrote:
Just discovered a new trick in D: struct inheritance using alias
this. :)

Wasn't this stated in TDPL as one of primary design rationales behind "alias this"? :)

Haha, you're right. I read it before but apparently the only thing that stuck in my mind is that alias this is to allow a type to masquerade as another type. But looking at the relevant sections again, Andrei did describe it as "subtyping", both w.r.t. classes and structs. Touché. :)

T

--
Mediocrity has been pushed to extremes.
Jul 16 2013
prev sibling next sibling parent "Wyatt" <wyatt.epp gmail.com> writes:
On Tuesday, 16 July 2013 at 13:35:00 UTC, Craig Dillabaugh wrote:
 How do you envision this working where Name or Age must be set 
 to
 a value not known at compile time?

I'm not sure if it's practical or covers all the bases, but it sounds like you would need to keep track of member initialisation during compilation, and abort if the code attempts to use the object or one of its members as an AssignExpression without initialising the whole thing. Setting aside the fact that there's compiler work mentioned at all, have I missed some nuance of this pattern?

I guess there's the situation where you conditionally may or may not assign, or pass it around and accrete mutations, so it might be best to only do it for some properly-annotated (how?) subset of the whole? Not sure.

-Wyatt
Jul 16 2013
prev sibling next sibling parent "Craig Dillabaugh" <cdillaba cg.scs.careton.ca> writes:
On Tuesday, 16 July 2013 at 16:07:30 UTC, Wyatt wrote:
 On Tuesday, 16 July 2013 at 13:35:00 UTC, Craig Dillabaugh 
 wrote:
 How do you envision this working where Name or Age must be set 
 to
 a value not known at compile time?

I'm not sure if it's practical or covers all the bases, but it sounds like you would need to keep track of member initialisation during compilation, and abort if the code attempts to use the object or one of its members as an AssignExpression without initialising the whole thing. Setting aside the fact that there's compiler work mentioned at all, have I missed some nuance of this pattern? I guess there's the situation where you conditionally may or may not assign, or pass it around and accrete mutations, so it might be best to only do it for some properly-annotated (how?) subset of the whole? Not sure. -Wyatt

Jul 16 2013
prev sibling next sibling parent "Craig Dillabaugh" <cdillaba cg.scs.careton.ca> writes:
On Tuesday, 16 July 2013 at 16:07:30 UTC, Wyatt wrote:
 On Tuesday, 16 July 2013 at 13:35:00 UTC, Craig Dillabaugh 
 wrote:
 How do you envision this working where Name or Age must be set 
 to
 a value not known at compile time?

I'm not sure if it's practical or covers all the bases, but it sounds like you would need to keep track of member initialisation during compilation, and abort if the code attempts to use the object or one of its members as an AssignExpression without initialising the whole thing. Setting aside the fact that there's compiler work mentioned at all, have I missed some nuance of this pattern? I guess there's the situation where you conditionally may or may not assign, or pass it around and accrete mutations, so it might be best to only do it for some properly-annotated (how?) subset of the whole? Not sure. -Wyatt

Sorry for the empty post (previous). In general, I think the proposed idea is quite nice, and as Dicebot pointed out, my initial concern was misguided because the invariant is evaluated at runtime, not compile time (and Dicebot, I checked the docs, and you are correct about it getting stripped for release builds).
Jul 16 2013
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 16 Jul 2013 14:34:59 +0100, Craig Dillabaugh  
<cdillaba cg.scs.careton.ca> wrote:

 On Tuesday, 16 July 2013 at 09:47:35 UTC, Regan Heath wrote:

 clip

 We have class invariants.. these define the things which must be  
 initialised to reach a valid state.  If we had compiler recognisable  
 properties as well, then we could have an initialise construct like..

 class Foo
 {
   string name;
   int age;

   invariant
   {
     assert(name != null);
     assert(age > 0);
   }

   property string Name...
   property int Age...
 }

 void main()
 {
   Foo f = new Foo() {
     Name = "test",    // calls property Name setter
     Age = 12          // calls property Age setter
   };
 }

 The compiler could statically verify that the variables tested in the  
 invariant (name, age) were set (by setter properties) inside the  
 initialise construct {} following the new Foo().

 R

How do you envision this working where Name or Age must be set to a value not known at compile time?

The idea isn't to run the invariant itself at compile time - as you say, a runtime only value may be used. In fact, in the example above the compiler would have to hold off running the invariant until the closing } of that initialise statement or it may fail.

The idea was to /use/ the code in the invariant to determine which member fields should be set during the initialisation statement and then statically verify that a call was made to some member function to set them. The actual values set aren't important, just that some attempt has been made to set them. That's about the limit of what I think you could do statically, in the general case.

In some specific cases we could extend this to say that if all the values set were evaluable at compile time, then we could actually run the invariant using CTFE, perhaps.

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 16 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 16, 2013 at 06:17:48PM +0100, Regan Heath wrote:
 On Tue, 16 Jul 2013 14:34:59 +0100, Craig Dillabaugh
 <cdillaba cg.scs.careton.ca> wrote:
 
On Tuesday, 16 July 2013 at 09:47:35 UTC, Regan Heath wrote:

clip

We have class invariants.. these define the things which must be
initialised to reach a valid state.  If we had compiler
recognisable properties as well, then we could have an
initialise construct like..

class Foo
{
  string name;
  int age;

  invariant
  {
    assert(name != null);
    assert(age > 0);
  }

  property string Name...
  property int Age...
}

void main()
{
  Foo f = new Foo() {
    Name = "test",    // calls property Name setter
    Age = 12          // calls property Age setter
  };
}



Maybe I'm missing something obvious, but isn't this essentially the same thing as having named ctor parameters?

[...]
 The idea was to /use/ the code in the invariant to determine which
 member fields should be set during the initialisation statement and
 then statically verify that a call was made to some member function
 to set them.  The actual values set aren't important, just that some
 attempt has been made to set them.  That's about the limit of what I
 think you could do statically, in the general case.

This seems to be the same thing as using named parameters: assuming the compiler actually supported such a thing, it would be able to tell at compile-time whether all required named parameters have been specified, and abort if not. There would be no need for any invariant-based guessing of what fields are required and what aren't, and no need for adding any property feature to the language -- the function signature of the ctor itself indicates what is required, and the compiler can check this at compile-time. (Of course, actual verification of the ctor parameters can only happen at runtime -- which is OK.)

This still doesn't address the issue of ctor argument proliferation, though: if each level of the class hierarchy adds 1-2 additional parameters, you still need to write tons of boilerplate in your derived classes to percolate those additional parameters up the inheritance tree. If a base class ctor requires parameters parmA, parmB, parmC, then any derived class ctor must declare at least parmA, parmB, parmC in their function signature (or provide default values for them), and you must still write super(parmA, parmB, parmC) in order to percolate these parameters to the base class. If the derived class requires additional parameters, say parmD, then that's added on top of all of the base class ctor arguments. And any further derived class will now have to declare at least parmA, parmB, parmC, parmD, and then tack on any additional parameters they may need. This is not scalable -- deeply derived classes will have ctors with ridiculous numbers of arguments.

Now imagine if at some point you need to change some base class ctor parameters. Now instead of making a single change to the base class, you have to update every single derived class to make the same change to every ctor, so that the new version of the parameter (or new parameter) is properly percolated up the inheritance tree. This defeats the goal in OOP of restricting the scope of changes to only localized changes. This is especially bad when you need to add an *optional* parameter to the base class: you have to do all that work of updating every single derived class yet most of the code that uses those derived classes don't even care about this new parameter! That's a lot of work for almost no benefit. (And you can't get away without doing it either, since a user of a derived class may at some point want to customize that optional base class parameter, so *all* derived class ctors must also declare it as an optional parameter.)

I think my approach of using builder structs with a parallel inheritance tree is still better: adding/removing/changing parameters to a base class's builder struct automatically propagates to all derived classes with no further code change. With the help of mixin templates, the amount of boilerplate is greatly reduced. And thanks to the use of typeof(super), you can even shuffle classes around your class hierarchy without needing to change anything more than the base class name in the class declaration -- the mixin automatically picks up the right base class builder struct to inherit from, thus guaranteeing that the parallel hierarchy is consistent at all times.

The only weakness I can see is that mandatory arguments with no reasonable default values can't be easily handled. In the simple cases, you can expand the mixin to allow you to specify builder struct ctors that have required arguments; but then this suffers from the same scalability problems that we were trying to solve in the first place, since all derived classes' builder structs will now require mandatory arguments to be propagated through their ctors. But I think this shouldn't be a big problem in practice: we can use Nullable fields in the builder struct and have the class ctor verify that all mandatory arguments are present, and throw an error if any arguments are not set properly.
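
The Nullable idea in the last paragraph could be sketched roughly as follows. This is only an illustration under stated assumptions: the class Shape and its fields are invented for the example, std.typecons.Nullable marks the mandatory field, and std.exception.enforce does the runtime check in the ctor (so the failure surfaces as an Exception at construction time, not at compile time).

```d
import std.exception : enforce;
import std.stdio : writeln;
import std.typecons : Nullable;

// Hypothetical class, used only to illustrate the idea.
class Shape
{
    struct Args
    {
        Nullable!string name;   // mandatory: no sensible default exists
        int sides = 4;          // optional: has a default value
    }

    string name;
    int sides;

    this(Args args)
    {
        // Runtime check standing in for a mandatory ctor parameter.
        enforce(!args.name.isNull, "Shape.Args.name must be set");
        name = args.name.get;
        sides = args.sides;
    }
}

void main()
{
    Shape.Args args;
    args.name = "triangle";     // Nullable accepts plain assignment
    args.sides = 3;
    auto s = new Shape(args);
    writeln(s.name, " ", s.sides);

    Shape.Args incomplete;      // name deliberately left unset
    try
    {
        auto bad = new Shape(incomplete);
    }
    catch (Exception e)
    {
        writeln("rejected: ", e.msg);
    }
}
```

The trade-off is the one already noted in this thread: a forgotten mandatory field is only caught when the ctor runs.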
T -- ASCII stupid question, getty stupid ANSI.
Jul 16 2013
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 16 Jul 2013 18:54:06 +0100, Jérôme M. Berger <jeberger free.fr> wrote:

 Regan Heath wrote:
 Or, perhaps another way to ask a similar Q is.. can the compiler
 statically verify that a create-set-call style object has been
 initialised, or rather that an attempt has at least been made to
 initialise all the required parts.

http://blog.rafaelferreira.net/2008/07/type-safe-builder-pattern-in-sc=

I saw the builder pattern mentioned in the original thread..
 	Basically, the builder object is a generic that has a boolean
 parameter for each mandatory parameter. Setting a parameter casts
 the builder object to the same generic with the corresponding
 boolean set to true. And the "build" method is only available when
 the type system recognizes that all the booleans are true.

But I hadn't realised it could enforce things statically, this is a cool idea.
 	Note however that this will not work if you try to mutate the
 builder instance. IOW, this will work (assuming you only need to
 specify foo and bar):

 auto instance = builder().withFoo (1).withBar ("abc").build();


This looks like good D style, to me, in keeping with the UFCS chains etc.
 but this won't work:

 auto b = builder();
 b.withFoo (1);
 b.withBar ("abc");
 auto instance = b.build();


But, you could create a separate variable for each with, couldn't you - very inefficient, but possible. I don't think this syntax/style is a requirement, and I prefer the chain style above it.
 	Something similar should be doable in D (although I'm a bit afraid
 of the template bloat it might create…)

Indeed. The issue I have with the builder is the requirement for more classes/templates/etc in addition to the original objects. D could likely define them in the standard library, but as you say there would be template bloat.

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 17 2013
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 16 Jul 2013 23:01:57 +0100, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:
 On Tue, Jul 16, 2013 at 06:17:48PM +0100, Regan Heath wrote:
 On Tue, 16 Jul 2013 14:34:59 +0100, Craig Dillabaugh
 <cdillaba cg.scs.careton.ca> wrote:

On Tuesday, 16 July 2013 at 09:47:35 UTC, Regan Heath wrote:

clip

We have class invariants.. these define the things which must be
initialised to reach a valid state.  If we had compiler
recognisable properties as well, then we could have an
initialise construct like..

class Foo
{
  string name;
  int age;

  invariant
  {
    assert(name != null);
    assert(age > 0);
  }

  property string Name...
  property int Age...
}

void main()
{
  Foo f = new Foo() {
    Name = "test",    // calls property Name setter
    Age = 12          // calls property Age setter
  };
}



Maybe I'm missing something obvious, but isn't this essentially the same thing as having named ctor parameters?

Yes, if we're comparing this to ctors with named parameters. I wasn't doing that however, I was asking this Q:

"Or, perhaps another way to ask a similar Q is.. can the compiler statically verify that a create-set-call style object has been initialised, or rather that an attempt has at least been made to initialise all the required parts."

Emphasis on "create-set-call" :) The weakness to create-set-call style is the desire for a valid object as soon as an attempt can be made to use it. Which implies the need for some sort of enforcement of initialisation and as I mentioned in my first post the issue of preventing this initialisation being spread out, or intermingled with others and thus making the semantics of it harder to see.

My idea here attempted to solve those issues with create-set-call only.
 [...]
 The idea was to /use/ the code in the invariant to determine which
 member fields should be set during the initialisation statement and
 then statically verify that a call was made to some member function
 to set them.  The actual values set aren't important, just that some
 attempt has been made to set them.  That's about the limit of what I
 think you could do statically, in the general case.

This still doesn't address the issue of ctor argument proliferation, though

It wasn't supposed to :) create-set-call ctors have no arguments.
 if each level of the class hierarchy adds 1-2 additional
 parameters, you still need to write tons of boilerplate in your derived
 classes to percolate those additional parameters up the inheritance
 tree.

In the create-set-call style additional required 'arguments' would appear as setter member functions whose underlying data member is verified in the invariant and would therefore be enforced by the syntax I detailed.
 Now imagine if at some point you need to change some base class ctor
 parameters. Now instead of making a single change to the base class, you
 have to update every single derived class to make the same change to
 every ctor, so that the new version of the parameter (or new parameter)
 is properly percolated up the inheritance tree.

This is one reason why create-set-call might be desirable, no ctor arguments, no problem.

So, to take my idea a little further - WRT class inheritance. The compiler, for a derived class, would need to inspect the invariants of all classes involved (these are and-ed already), inspect the constructors of the derived classes (for calls to initialise members), and the initialisation block I described and verify statically that an attempt was made to initialise all the members which appear in all the invariants.
 I think my approach of using builder structs with a parallel inheritance
 tree is still better

It may be, it certainly looked quite neat but I haven't had a detailed look at it TBH. I think you've misunderstood my idea however, or rather, the issues it was intended to solve :) Perhaps my idea is too limiting for you? I could certainly understand that point of view.

I think another interesting idea is using the builder pattern with create-set-call objects.

For example, a builder template class could inspect the object for UDAs indicating a data member which is required during initialisation. It would contain a bool[] to flag each member as not/initialised and expose a setMember() method which would call the underlying object setMember() and return a reference to itself.

At some point, these setMember() methods would want to return another template class which contained just a build() member. I'm not sure how/if this is possible in D.

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 17 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 17, 2013 at 11:00:38AM +0100, Regan Heath wrote:
 On Tue, 16 Jul 2013 23:01:57 +0100, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:
On Tue, Jul 16, 2013 at 06:17:48PM +0100, Regan Heath wrote:


class Foo
{
  string name;
  int age;

  invariant
  {
    assert(name != null);
    assert(age > 0);
  }

  property string Name...
  property int Age...
}

void main()
{
  Foo f = new Foo() {
    Name = "test",    // calls property Name setter
    Age = 12          // calls property Age setter
  };
}



Maybe I'm missing something obvious, but isn't this essentially the same thing as having named ctor parameters?

Yes, if we're comparing this to ctors with named parameters. I wasn't doing that however, I was asking this Q:

"Or, perhaps another way to ask a similar Q is.. can the compiler statically verify that a create-set-call style object has been initialised, or rather that an attempt has at least been made to initialise all the required parts."

Emphasis on "create-set-call" :) The weakness to create-set-call style is the desire for a valid object as soon as an attempt can be made to use it. Which implies the need for some sort of enforcement of initialisation and as I mentioned in my first post the issue of preventing this initialisation being spread out, or intermingled with others and thus making the semantics of it harder to see.

Ah, I see. So basically, you need some kind of enforcement of a two-state object, pre-initialization and post-initialization. Basically, the ctor is empty, so you allocate the object first, then set some values into it, then it "officially" becomes a full-fledged instance of the class. To prevent problems with consistency, a sharp transition between setting values and using the object is enforced. Am I right?

I guess my point was that if we boil this down to the essentials, it's basically the same idea as a builder pattern, just implemented slightly differently. In the builder pattern, a separate object (or struct, or whatever) is used to encapsulate the state of the object that we'd like it to be in, which we then pass to the ctor to create the object in that state. The idea is the same, though: set up a bunch of values representing the desired initial state of the object, then, to borrow Perl's terminology, "bless" it into a full-fledged class instance.
 My idea here attempted to solve those issues with create-set-call only.

Fair enough. I guess my approach was from the angle of trying to address the problem from the confines of the current language. So, same idea, different implementation. :)

[...]
The idea was to /use/ the code in the invariant to determine which
member fields should be set during the initialisation statement and
then statically verify that a call was made to some member function
to set them.  The actual values set aren't important, just that some
attempt has been made to set them.  That's about the limit of what I
think you could do statically, in the general case.

This still doesn't address the issue of ctor argument proliferation, though

It wasn't supposed to :) create-set-call ctors have no arguments.

True. But if the ctor call requires a code block that initializes mandatory initial values, then isn't it essentially the same thing as ctors that have arguments? If the class hierarchy is deep, and base classes have mandatory fields to be set, then you still have the same problem, just in a different manifestation.
if each level of the class hierarchy adds 1-2 additional parameters,
you still need to write tons of boilerplate in your derived classes
to percolate those additional parameters up the inheritance tree.

In the create-set-call style additional required 'arguments' would appear as setter member functions whose underlying data member is verified in the invariant and would therefore be enforced by the syntax I detailed.

What happens when base classes also have required setter member functions that you must call?
Now imagine if at some point you need to change some base class ctor
parameters. Now instead of making a single change to the base class,
you have to update every single derived class to make the same change
to every ctor, so that the new version of the parameter (or new
parameter) is properly percolated up the inheritance tree.

This is one reason why create-set-call might be desirable, no ctor arguments, no problem.

Right.
 So, to take my idea a little further - WRT class inheritance.  The
 compiler, for a derived class, would need to inspect the invariants
 of all classes involved (these are and-ed already), inspect the
 constructors of the derived classes (for calls to initialise
 members), and the initialisation block I described and verify
 statically that an attempt was made to initialise all the members
 which appear in all the invariants.

I see. So basically the user still has to set up all required values before you can use the object, the advantage being that you don't have to manually percolate these values up the inheritance tree in the ctors.

It seems to be essentially the same thing as my approach, just implemented differently. :) In my approach, ctor arguments are encapsulated inside a struct, currently called Args by convention. So if you have, say, a class hierarchy where class B inherits from class A, and A.this() has 5 parameters and B.this() adds another 5 parameters, then B.Args would have 10 fields. To create an instance of B, the user would do this:

	B.Args args;
	args.field1 = 10;
	args.field2 = 20;
	...
	auto obj = new B(args);

So in a sense, this isn't that much different from your approach, in that the user sets a bunch of values desired for the initial state of the object, then gets a full-fledged object out of it at the end.

In my case, all ctors in the class hierarchy would take a single struct argument encapsulating all ctor arguments for that class (including arguments to its respective base class ctors, etc.). So ctors would look like this:

	class B : A {
		struct Args { ... }
		this(Args args) {
			super(...);
			... // set up object based on values in args
		}
	}

The trick here, then, is that call to super(...). The naïve way of doing this is to (manually) include base class ctor arguments as part of B.Args, then in B's ctor, we collect those arguments together in A.Args, and hand that over to A's ctor. But we can do better. Since A.Args is already defined, there's no need to duplicate all those fields in B.Args; we can simply do this:

	class B : A {
		struct Args {
			A.Args baseClassArgs;
			... // fields specific to B
		}
		this(Args args) {
			super(args.baseClassArgs);
			...
		}
	}

This is ugly, though, 'cos now user code has to know about B.Args.baseClassArgs:

	B.Args args;
	args.baseClassArgs.baseClassParm1 = 123;
	args.derivedClassParm1 = 234;
	...
	auto obj = new B(args);

So the next step is to use alias this to make .baseClassArgs transparent to user code:

	class B : A {
		struct Args {
			A.Args baseClassArgs;
			alias baseClassArgs this;	// <--- N.B.
			... // fields specific to B
		}
		this(Args args) {
			// Nice side-effect of alias this: we can pass
			// args to super without needing to explicitly
			// name .baseClassArgs.
			super(args);
			...
		}
	}

	// Now user code doesn't need to know about .baseClassArgs:
	B.Args args;
	args.baseClassParm1 = 123;
	args.derivedClassParm1 = 234;
	...
	auto obj = new B(args);

This is starting to look pretty good. Now the next step is, having to type A.Args baseClassArgs each time is a lot of boilerplate, and could be error-prone. For example, if we accidentally wrote C.Args instead of A.Args:

	class B : A {
		struct Args {
			C.Args baseClassArgs;	// <--- oops!
			alias baseClassArgs this;
			...
		}
		...
	}

So the next step is to make the type of baseClassArgs automatically inferred, so that no matter how we move B around in the class hierarchy, it will always be correct:

	class B : A {
		struct Args {
			typeof(super).Args baseClassArgs;	// ah, much better!
			alias baseClassArgs this;
			...
		}
		this(Args args) {
			super(args);
			...
		}
	}

This is good, because now, the declaration of B.Args is independent of whatever base class B has. Similarly, thanks to the alias this introduced earlier, the call to super(...) is always written super(args), without any explicit reference to the specific base class. DRY is good.

Of course, this is still a lot of boilerplate: you have to keep typing out the first 3 lines of the declaration of Args, in every derived class. But now that we've made this declaration independent of an explicit base class name, we can factor it into a mixin:

	mixin template CtorArgs(string fields) {
		struct Args {
			typeof(super).Args baseClassArgs;
			alias baseClassArgs this;
			mixin(fields);
		}
	}

	class B : A {
		mixin CtorArgs!(q{
			int derivedParm1;
			int derivedParm2;
			...
		});
		this(Args args) {
			super(args);
			...
		}
	}

Now we can simply use CtorArgs!(...) in each derived class to automatically declare the Args struct correctly. The boilerplate is now minimal. Things continue to work even if we move B around in the class hierarchy. Say we want to derive B from C instead of A; then we'd simply write:

	class B : C {	// <-- this is the only line that's different!
		mixin CtorArgs!(q{
			int derivedParm1;
			int derivedParm2;
			...
		});
		this(Args args) {
			super(args);
			...
		}
	}

Finally, we add a little detail to our mixin so that we can use it for the root of the class hierarchy as well. Right now, we still have to explicitly declare A.Args (assuming A is the root of our hierarchy), which is bad, because you may accidentally call it something that doesn't match what CtorArgs expects. We'd like to be able to consistently use CtorArgs even in the root base class, so that if we ever need to re-root the hierarchy, things will continue to Just Work. So we revise CtorArgs thus:

	mixin template CtorArgs(string fields) {
		struct Args {
			static if (!is(typeof(super)==Object)) {
				typeof(super).Args baseClassArgs;
				alias baseClassArgs this;
			}
			mixin(fields);
		}
	}

Basically, the static if just omits the whole baseClassArgs and alias this deal ('cos the root of the hierarchy has no superclass that also has an Args struct). So now we can write:

	class A {
		mixin CtorArgs!(q{ /* ctor fields here */ });
		...
	}

And if we ever re-root the hierarchy, we can simply write:

	class A : B {	// <--- this is the only line that changes
		mixin CtorArgs!(q{ /* ctor fields here */ });
		...
	}
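For reference, the pieces of the CtorArgs walkthrough can be assembled into one small module. This is only a sketch: Shape, Circle, label and radius are made-up names for illustration, and it assumes the compiler resolves typeof(super) inside the nested Args struct to the base of the enclosing class, which is what the mixin relies on.

```d
// Sketch of the CtorArgs mixin applied to a two-level hierarchy.
// Shape/Circle and their fields are hypothetical example names.
mixin template CtorArgs(string fields)
{
    struct Args
    {
        // Root classes derive directly from Object and have no base Args.
        static if (!is(typeof(super) == Object))
        {
            typeof(super).Args baseClassArgs;
            alias baseClassArgs this;
        }
        mixin(fields);
    }
}

class Shape
{
    mixin CtorArgs!(q{ string label; });

    string label;

    this(Args args)
    {
        label = args.label;
    }
}

class Circle : Shape
{
    mixin CtorArgs!(q{ double radius = 1.0; });

    double radius;

    this(Args args)
    {
        super(args);          // alias this passes args.baseClassArgs along
        radius = args.radius;
    }
}

unittest
{
    Circle.Args args;
    args.label = "unit";      // reaches Shape.Args through alias this
    args.radius = 2.5;

    auto c = new Circle(args);
    assert(c.label == "unit");
    assert(c.radius == 2.5);
}
```

Compiling with -unittest -main should run the check. Re-rooting the hierarchy would only touch the class declaration lines, as described above.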
I think my approach of using builder structs with a parallel
inheritance tree is still better

It may be, it certainly looked quite neat but I haven't had a detailed look at it TBH. I think you've misunderstood my idea however, or rather, the issues it was intended to solve :) Perhaps my idea is too limiting for you? I could certainly understand that point of view.

Well, I think our approaches are essentially the same thing, just implemented differently. :)

One thing about your implementation that I found limiting was that you *have* to declare all required fields on-the-spot before the compiler will let your 'new' call pass, so if you have to create 5 similar instances of the class, you have to copy-n-paste most of the set-method calls:

	auto obj1 = new C() {
		name = "test1",
		age = 12,
		school = "D Burg High School"
	};
	auto obj2 = new C() {
		name = "test2",
		age = 12,
		school = "D Burg High School"
	};
	auto obj3 = new C() {
		name = "test3",
		age = 12,
		school = "D Burg High School"
	};
	auto obj4 = new C() {
		name = "test4",
		age = 12,
		school = "D Burg High School"
	};
	auto obj5 = new C() {
		name = "test5",
		age = 12,
		school = "D Burg High School"
	};

Whereas using my approach, you can simply reuse the Args struct several times:

	C.Args args;
	args.name = "test1";
	args.age = 12;
	args.school = "D Burg High School";
	auto obj1 = new C(args);

	args.name = "test2";
	auto obj2 = new C(args);

	args.name = "test3";
	auto obj3 = new C(args);

	... // etc.

You can also have different functions set up different parts of C.Args:

	C createObject(C.Args args) {
		// N.B. only need to set a subset of fields
		args.school = "D Burg High School";
		return new C(args);
	}

	void main() {
		C.Args args;
		args.name = "test1";
		args.age = 12;		// partially setup Args
		auto obj = createObject(args); // createObject fills out rest of the fields.
		...

		args.name = "test2";	// modify a few parameters
		auto obj2 = createObject(args); // createObject doesn't need to know about this change
	}

This is nice if there are a lot of parameters and you don't want to collect the setting up of all of them in one place.
 I think another interesting idea is using the builder pattern with
 create-set-call objects.
 
 For example, a builder template class could inspect the object for
 UDA's indicating a data member which is required during
 initialisation.  It would contain a bool[] to flag each member as
 not/initialised and expose a setMember() method which would call the
 underlying object setMember() and return a reference to itself.
 
 At some point, these setMember() method would want to return another
 template class which contained just a build() member.  I'm not sure
 how/if this is possible in D.

Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language. It would solve the problem of mandatory fields, which is currently a main weakness of my approach (the user can neglect to set up a field in Args, and there's no way to enforce that those fields *must* be set -- you could provide sane defaults in the declaration of Args, but if some fields have no sane default value, then you're out of luck).

One approach is to use Nullable for mandatory fields (or equivalently, use bool[] as you suggest), then the ctors will throw an exception if a required field hasn't been set yet. Which isn't a bad solution, since ctors in theory *should* vet their input values before creating an instance of the class anyway. But it does require some amount of boilerplate.

Maybe we can make use of UDAs to indicate which fields are mandatory, then have a template (or mixin template) that uses compile-time reflection to generate the code that verifies that these fields have indeed been set. Maybe something like:

	struct RequiredAttr {}

	// Warning: have not tried to compile this yet
	mixin template checkCtorArgs(alias args) {
		alias Args = typeof(args);
		foreach (field; __traits(allMembers, Args)) {
			// (Ugh, __traits syntax is so ugly)
			static if (is(__traits(getAttributes,
				__traits(getMember, args, field)[0])==RequiredAttr))
			{
				if (__traits(getMember, args, field) is null)
					throw new Exception("...");
			}
		}
	}

	class B : A {
		mixin CtorArgs!(q{
			int myfield1;	// this one is optional
			@(RequiredAttr) Nullable!int myfield2;	// this one is mandatory
		});
		this(Args args) {
			mixin checkCtorArgs!(args);	// throws if any mandatory fields aren't set
			...
		}
	}

Just a rough idea, haven't actually tried to compile this code yet.

On second thoughts, maybe we could just check for an instantiation of Nullable instead of using a UDA, since if you forget to use a nullable value (like int instead of Nullable!int), this code wouldn't work. Or maybe enhance the CtorArgs template to automatically substitute Nullable!T when it sees a field of type T that's marked with @(RequiredAttr). Or maybe your bool[] idea is better, since it avoids the dependency on Nullable.

In any case, this is an interesting direction to look into.

T

--
Тише едешь, дальше будешь.
Jul 17 2013
prev sibling next sibling parent "w0rp" <devw0rp gmail.com> writes:
I always just avoided confusion by limiting myself to a maximum
of 5 arguments for any function or constructor, maybe with a soft
limit of 3. Preferring composition over inheritance helps too.
Jul 17 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 17, 2013 at 11:19:21PM +0200, w0rp wrote:
 I always just avoided confusion by limiting myself to a maximum
 of 5 arguments for any function or constructor, maybe with a soft
 limit of 3. Preferring composition over inheritance helps too.

My original motivation for trying to tackle this problem was when I was experimenting with maze generation algorithms. I had a base class representing all maze generators, and various derived classes representing specific algorithms. Some of these algorithms have quite a large number of configurable parameters, and the algorithms themselves have different flavors, so some classes that already have many parameters would have derived classes that introduce a few more. Encapsulating all of these parameters inside structs was the only sane way I could think of to manage the large sets of parameters involved.

Also, I agree that 3-5 parameters per function/ctor is about the max for a clean interface -- any more than that and it's a sign that you aren't organizing your code properly. But in the case of ctors, it's not so much the 3-5 parameters required for the class itself that's the problem, but the fact that these parameters *accumulate* in all derived classes. If you have a 4-level class hierarchy and each level adds 5 more parameters, that's 20 parameters in total, which is clearly unmanageable.

T

--
Designer clothes: how to cover less by paying more.
Jul 17 2013
prev sibling next sibling parent "eles" <eles eles.com> writes:
On Wednesday, 17 July 2013 at 21:42:16 UTC, H. S. Teoh wrote:
 On Wed, Jul 17, 2013 at 11:19:21PM +0200, w0rp wrote:

This is how it is done in Ecere SDK and the eC language:

"However, constructors particularly do not play a role a important as in C++, for example. Neither constructors nor destructors can take in any parameters, and only a single one of each can be defined within a class."

"Instead, members can be directly assigned a value through the instantiation syntax initializers (either through the data members, or the properties which we will describe in next chapter)."

"They cannot be specified a return type either. A constructor should never fail, but returning false(they have an implicit bool return type) will result in a the object instantiated to be null."
Jul 17 2013
prev sibling next sibling parent "eles" <eles eles.com> writes:
On Wednesday, 17 July 2013 at 21:59:14 UTC, eles wrote:
 On Wednesday, 17 July 2013 at 21:42:16 UTC, H. S. Teoh wrote:
 On Wed, Jul 17, 2013 at 11:19:21PM +0200, w0rp wrote:

This is how it is done in Ecere SDK and the eC language:

Example:

	import "ecere"

	class Form1 : Window
	{
		text = "Form1";
		background = activeBorder;
		borderStyle = sizable;
		hasMaximize = true;
		hasMinimize = true;
		hasClose = true;
		clientSize = { 400, 300 };
	}

	Form1 form1 {};

Basically, you assign the needed fields first, then call a unique constructor on that skeleton.
Jul 17 2013
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:
 On Wed, Jul 17, 2013 at 11:00:38AM +0100, Regan Heath wrote:
 Emphasis on "create-set-call" :)  The weakness to create-set-call
 style is the desire for a valid object as soon as an attempt can be
 made to use it.  Which implies the need for some sort of enforcement
 of initialisation and as I mentioned in my first post the issue of
 preventing this intialisation being spread out, or intermingled with
 others and thus making the semantics of it harder to see.

Ah, I see. So basically, you need some kind of enforcement of a two-state object, pre-initialization and post-initialization. Basically, the ctor is empty, so you allocate the object first, then set some values into it, then it "officially" becomes a full-fledged instance of the class. To prevent problems with consistency, a sharp transition between setting values and using the object is enforced. Am I right?

Yes, that's basically it.
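To make the two-state idea concrete without any compiler support, here is a minimal library-level sketch in D. It is not the initialisation-block proposal itself: it uses a separate nested struct for the pre-initialization state, and Student, Setup and finish() are hypothetical names chosen for the example. The only point is that the two states are different types with one checked transition between them.

```d
import std.exception : enforce;
import std.format : format;

/// Fully initialized object: only "use" methods live here.
class Student
{
    private string name;
    private int age;

    private this(string name, int age)
    {
        this.name = name;
        this.age = age;
    }

    /// Pre-initialization state: only setters plus the transition.
    static struct Setup
    {
        private string name;
        private int age = -1;   // -1 means "not set yet"

        void setName(string n) { name = n; }
        void setAge(int a) { age = a; }

        /// The single transition point: validate, then create the object.
        Student finish()
        {
            enforce(name.length > 0, "Student: name was not set");
            enforce(age >= 0, "Student: age was not set");
            return new Student(name, age);
        }
    }

    string describe() const
    {
        return format("%s (%d)", name, age);
    }
}

unittest
{
    Student.Setup s;
    s.setName("test1");
    s.setAge(12);
    Student obj = s.finish();
    assert(obj.describe() == "test1 (12)");

    // The two states are distinct types, so mixing them up does not compile.
    static assert(!__traits(compiles, s.describe()));
    static assert(!__traits(compiles, obj.setName("x")));
}
```

The validation here still happens at run time inside finish(); only the separation between "setting" and "using" is enforced statically.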
 I guess my point was that if we boil this down to the essentials, it's
 basically the same idea as a builder pattern, just implemented slightly
 differently. In the builder pattern, a separate object (or struct, or
 whatever) is used to encapsulate the state of the object that we'd like
 it to be in, which we then pass to the ctor to create the object in that
 state. The idea is the same, though: set up a bunch of values
 representing the desired initial state of the object, then, to borrow
 Perl's terminology, "bless" it into a full-fledged class instance.

It achieves the same ends, but does it differently. My idea requires compiler support (which makes it unlikely to happen) and doesn't require separate objects (which I think is a big plus).
 So, to take my idea a little further - WRT class inheritance.  The
 compiler, for a derived class, would need to inspect the invariants
 of all classes involved (these are and-ed already), inspect the
 constructors of the derived classes (for calls to initialise
 members), and the initialisation block I described and verify
 statically that an attempt was made to initialise all the members
 which appear in all the invariants.

I see. So basically the user still has to set up all required values before you can use the object, the advantage being that you don't have to manually percolate these values up the inheritance tree in the ctors.

Exactly.
 It seems to be essentially the same thing as my approach, just
 implemented differently. :)[...]

Thanks for the description of your idea.

As I understand it, in your approach all the mandatory parameters for all classes in the hierarchy are /always/ passed to the final child constructor. In my idea a constructor in the hierarchy could choose to set some of the mandatory members of its parents, and the compiler would detect that and would not require the initialisation block to contain those members.

Also, in your approach there isn't currently any enforcement that the user sets all the mandatory parameters of Args, and this is kinda the main issue my idea solves.
 One thing about your implementation that I found limiting was that you
 *have* to declare all required fields on-the-spot before the compiler
 will let your 'new' call pass, so if you have to create 5 similar
 instances of the class, you have to copy-n-paste most of the set-method
 calls:

 	auto obj1 = new C() {
 		name = "test1",
 		age = 12,
 		school = "D Burg High School"
 	};

 [...]

 Whereas using my approach, you can simply reuse the Args struct several
 times:

 	C.Args args;
 	args.name = "test1";
 	args.age = 12;
 	args.school = "D Burg High School";
 	auto obj1 = new C(args);

 	args.name = "test2";
 	auto obj2 = new C(args);

 	args.name = "test3";
 	auto obj3 = new C(args);

 	... // etc.

Or.. you use a mixin, or better still you add a copy-constructor or .dup method to your class to duplicate it :)
 You can also have different functions setup different parts of C.Args:

 	C createObject(C.Args args) {
 		// N.B. only need to set a subset of fields
 		args.school = "D Burg High School";
 		return new C(args);
 	}

 	void main() {
 		C.Args args;
 		args.name = "test1";
 		args.age = 12;		// partially setup Args
 		auto obj = createObject(args); // createObject fills out rest of the fields.
 		...

 		args.name = "test2";	// modify a few parameters
 		auto obj2 = createObject(args); // createObject doesn't need to know about this change
 	}

 This is nice if there are a lot of parameters and you don't want to
 collect the setting up of all of them in one place.

In my case you can call different functions in the initialisation block, e.g.

	void defineObject(C c)
	{
		c.school = "...";
	}

	C c = new C() {
		defineObject()
	}

:)
 I think another interesting idea is using the builder pattern with
 create-set-call objects.

 For example, a builder template class could inspect the object for
 UDA's indicating a data member which is required during
 initialisation.  It would contain a bool[] to flag each member as
 not/initialised and expose a setMember() method which would call the
 underlying object setMember() and return a reference to itself.

 At some point, these setMember() method would want to return another
 template class which contained just a build() member.  I'm not sure
 how/if this is possible in D.

Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language.

The issue I think is the step where you want to mutate the return type from the type with setX members to the type with build().
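A rough sketch of what "mutating the return type" could look like in current D, under some assumptions: instead of a runtime bool[], the "has this member been set" flags are template parameters, so each setter returns a differently typed builder and build() only exists in the instantiation where every flag is true. Person and PersonBuilder are hypothetical names, and the UDA inspection step is left out to keep the example short.

```d
class Person
{
    string name;
    int age;

    this(string name, int age)
    {
        this.name = name;
        this.age = age;
    }
}

/// Builder whose type records which required members have been set.
struct PersonBuilder(bool hasName = false, bool hasAge = false)
{
    string name_;
    int age_;

    /// Each setter returns a *different* instantiation of the builder.
    auto setName(string n)
    {
        return PersonBuilder!(true, hasAge)(n, age_);
    }

    auto setAge(int a)
    {
        return PersonBuilder!(hasName, true)(name_, a);
    }

    // build() is only declared once every required flag is true.
    static if (hasName && hasAge)
    {
        Person build()
        {
            return new Person(name_, age_);
        }
    }
}

unittest
{
    auto p = PersonBuilder!()().setName("test1").setAge(12).build();
    assert(p.name == "test1" && p.age == 12);

    // Forgetting a required member is rejected at compile time.
    static assert(!__traits(compiles,
        PersonBuilder!()().setName("x").build()));
}
```

The cost of this design is one template instantiation per combination of flags, which grows quickly with the number of required members.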
 Maybe we can make use of UDAs to indicate which fields are mandatory

That was what I was thinking.
 [...]
 Just a rough idea, haven't actually tried to compile this code yet.

Worth a go, it doesn't require compiler support like my idea so it's far more likely you'll get something at the end of it.. I can just sit on my hands and/or try to promote my idea.

I still prefer my idea :P. I think it's cleaner and simpler, this is in part because it requires compiler support and that hides the gory details, but also because create-set-call is a simpler style in itself. Provided the weaknesses of create-set-call can be addressed I might be tempted to use that style.

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 18 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jul 18, 2013 at 10:13:58AM +0100, Regan Heath wrote:
 On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:

I guess my point was that if we boil this down to the essentials,
it's basically the same idea as a builder pattern, just implemented
slightly differently. In the builder pattern, a separate object (or
struct, or whatever) is used to encapsulate the state of the object
that we'd like it to be in, which we then pass to the ctor to create
the object in that state. The idea is the same, though: set up a
bunch of values representing the desired initial state of the object,
then, to borrow Perl's terminology, "bless" it into a full-fledged
class instance.

It achieves the same ends, but does it differently. My idea requires compiler support (which makes it unlikely to happen) and doesn't require separate objects (which I think is a big plus).

Why would requiring separate objects be a problem? [...]
 Thanks for the description of your idea.
 
 As I understand it, in your approach all the mandatory parameters
 for all classes in the hierarchy are /always/ passed to the final
 child constructor.  In my idea a constructor in the hierarchy could
 chose to set some of the mandatory members of it's parents, and the
 compiler would detect that and would not require the initialisation
 block to contain those members.

In my case, the derived class ctor could manually set some of the fields in Args before handing to the superclass. Of course, it's not as ideal, since if user code already sets said fields, then they get silently overridden.
 Also, in your approach there isn't currently any enforcement that
 the user sets all the mandatory parameters of Args, and this is
 kinda the main issue my idea solves.

True. One workaround is to use Nullable and check that in the ctor. But I suppose it's not as great as a compile-time check.
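As a rough illustration of that runtime workaround, the check could be written along these lines. This is a sketch under assumptions, not the CtorArgs code from earlier in the thread: Required, checkRequired and Server are hypothetical names, the Args struct is declared by hand, and the check does not descend into a nested baseClassArgs member.

```d
import std.traits : hasUDA;
import std.typecons : Nullable;

struct Required {}   // hypothetical marker attribute

/// Throws if any field of `args` tagged @Required is still null.
void checkRequired(Args)(const ref Args args)
{
    foreach (i, field; args.tupleof)
    {
        static if (hasUDA!(Args.tupleof[i], Required))
        {
            if (field.isNull)
                throw new Exception("required field '"
                    ~ __traits(identifier, Args.tupleof[i])
                    ~ "' was not set");
        }
    }
}

class Server
{
    static struct Args
    {
        string motd = "hello";            // optional, has a sane default
        @Required Nullable!string host;   // mandatory
        @Required Nullable!int port;      // mandatory
    }

    string host;
    int port;

    this(Args args)
    {
        checkRequired(args);   // runtime check, not a compile-time one
        host = args.host.get;
        port = args.port.get;
    }
}

unittest
{
    import std.exception : assertThrown;

    Server.Args args;
    args.host = "localhost";
    assertThrown(new Server(args));   // port was never set

    args.port = 8080;
    auto s = new Server(args);
    assert(s.host == "localhost" && s.port == 8080);
}
```

A missing field is only reported when the ctor actually runs, which is exactly the limitation relative to a static check.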
One thing about your implementation that I found limiting was that
you *have* to declare all required fields on-the-spot before the
compiler will let your 'new' call pass, so if you have to create 5
similar instances of the class, you have to copy-n-paste most of the
set-method calls:

	auto obj1 = new C() {
		name = "test1",
		age = 12,
		school = "D Burg High School"
	};

[...]

Whereas using my approach, you can simply reuse the Args struct
several times:

	C.Args args;
	args.name = "test1";
	args.age = 12;
	args.school = "D Burg High School";
	auto obj1 = new C(args);

	args.name = "test2";
	auto obj2 = new C(args);

	args.name = "test3";
	auto obj3 = new C(args);

	... // etc.

Or.. you use a mixin, or better still you add a copy-constructor or .dup method to your class to duplicate it :)

But then you end up with the problem of needing to call set methods after the .dup, which may complicate things if the set methods need to do non-trivial initialization of internal structures (caches or internal representations, etc.). Whereas if you hadn't needed to .dup, you could have gotten by without writing any set methods for your class, but now you have to. [...]
 In my case you can call different functions in the initialisation
 block, e.g.
 
 void defineObject(C c)
 {
   c.school = "...";
 }
 
 C c = new C() {
   defineObject()
 }
 
 :)

So the compiler has to recursively traverse function calls in the initialization block in order to check that all required fields are set? That could entail some implementation issues, if said function calls can be arbitrarily complex. (If you have complex control logic in said functions, the compiler can't in general determine whether or not some paths will/will not be taken that may contain assignment statements to the object's fields, since that would be equivalent to the halting problem. Worse, the compiler would have to track aliases of the object being set, in order to know which assignment statements are setting fields in the object, and which are just computations on the side.)

Furthermore, what if defineObject tries to do something with C other than setting up fields? The object would be in an illegal state since it hasn't been fully constructed yet.
I think another interesting idea is using the builder pattern with
create-set-call objects.

For example, a builder template class could inspect the object for
UDA's indicating a data member which is required during
initialisation.  It would contain a bool[] to flag each member as
not/initialised and expose a setMember() method which would call the
underlying object setMember() and return a reference to itself.

At some point, these setMember() method would want to return another
template class which contained just a build() member.  I'm not sure
how/if this is possible in D.

Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language.

The issue I think is the step where you want to mutate the return type from the type with setX members to the type with build().

I'm not sure I understand that sentence. Could you rephrase it?
Maybe we can make use of UDAs to indicate which fields are mandatory

That was what I was thinking.
[...]
Just a rough idea, haven't actually tried to compile this code yet.

Worth a go, it doesn't require compiler support like my idea so it's far more likely you'll get something at the end of it.. I can just sit on my hands and/or try to promote my idea. I still prefer my idea :P. I think it's cleaner and simpler, this is in part because it requires compiler support and that hides the gory details, but also because create-set-call is a simpler style in itself. Provided the weaknesses of create-set-call can be addressed I might be tempted to use that style.

One thing I like about your idea is that you can reuse the same chunk of memory that the eventual object is going to sit in. With my approach, the ctors still have to copy the struct fields into the object fields, so there is some overhead there. (Having said that though, that overhead shouldn't be anything worse than the ctor-with-arguments calls it replaces; you're basically just abstracting away the ctor parameters on the stack into a struct. In machine code it's pretty much equivalent.)

Requiring compiler support, though, as you said, makes your idea less likely to actually happen. I still see it as essentially equivalent to my approach; the syntax is different and the usage pattern differs, but at the end of the day, it amounts to the same thing: basically your objects have two phases, a post-creation, pre-usage stage where you set things up, and a post-setup stage where you actually start using it.

Anyway, now that I'm thinking about this problem again, I'd like to take a step back and consider if any other good approaches may exist to tackle this issue. I'm thinking of the general case where the initialization of an object may be arbitrarily complex, such that neither a struct of ctor arguments nor an initialization block may be sufficient.

The problem with the struct approach is, what if you need a complex setup process, say constructing a graph with complex interconnections between nodes? In order to express such a thing, you have to essentially already create the object before you can pass the struct to the ctor, which kinda defeats the purpose. Similarly, your approach of an initialization block suffers from the limitation that the initialization is confined to that block, and you can't allow arbitrary code in that block (otherwise you could end up using an object that hasn't been fully constructed yet -- like the defineObject problem I pointed out above).

Keeping in mind the create-set-call pattern and Perl's approach of "blessing" an object into a full-fledged class instance, I wonder if a more radical approach might be to have the language acknowledge that objects have two phases, a preinitialized state, and a fully-initialized state. These two would have distinct types *in the type system*, such that you cannot, for example, call post-init methods on a pre-initialization object, and you can't call an init method on a post-initialization object. The ctor would be the unique transition point which takes a preinitialized object, verifies compliance with class invariants, and returns a post-initialization object.

In pseudo-code, this might look something like this:

	class MyClass {
	public:
		@preinit void setName(string name);
		@preinit void setAge(int age);

		this() {
			if (!validateFields())
				throw new Exception(...);
		}

		// The following are "normal" methods that cannot be
		// called in a preinit state.
		void computeStatistics();
		void dotDotDotMagic();
	}

	void main() {
		auto obj = new MyClass();
		assert(typeof(obj) == MyClass.preinit);
		/* MyClass.preinit is a special type indicating that the
		 * object isn't fully initialized yet */

		// Compile error: cannot call non-@preinit method on
		// preinit object.
		//obj.computeStatistics();

		obj.setName(...);	// OK
		obj.setAge(...);	// OK

		// Transition object to full-fledged state
		obj.this();	// not sure about this syntax yet
		assert(typeof(obj) == MyClass);
		/* Now obj is a full-fledged member of the class */

		// Compile error: can't call preinit method on
		// non-preinit object
		//obj.setName(...);

		obj.computeStatistics(); // OK
	}

MyClass.preinit would be a separate type in the type system, so that you can pass it around without any risk that someone will try to perform illegal operations on it before it's fully initialized:

	void doSetup(MyClass.preinit obj) {
		obj.setName(...);	// OK
		//obj.computeStatistics(); // compile error
	}

	void main() {
		auto obj = new MyClass();
		doSetup(obj);	// OK
		obj.this();	// "promote" to full-fledged object

		// Illegal: can't implicitly convert MyClass into
		// MyClass.preinit.
		//doSetup(obj);

		obj.computeStatistics(); // OK
	}

Maybe "obj.this()" is not a good syntax, perhaps "obj.promote()"? In any case, this is a rather radical idea which requires language support; I'm not sure how practical it is. :)

T

--
"Uhh, I'm still not here." -- KD, while "away" on ICQ.
Jul 18 2013
prev sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 18 Jul 2013 19:00:44 +0100, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 On Thu, Jul 18, 2013 at 10:13:58AM +0100, Regan Heath wrote:
 On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:

I guess my point was that if we boil this down to the essentials,
it's basically the same idea as a builder pattern, just implemented
slightly differently. In the builder pattern, a separate object (or
struct, or whatever) is used to encapsulate the state of the object
that we'd like it to be in, which we then pass to the ctor to create
the object in that state. The idea is the same, though: set up a
bunch of values representing the desired initial state of the object,
then, to borrow Perl's terminology, "bless" it into a full-fledged
class instance.

It achieves the same ends, but does it differently. My idea requires compiler support (which makes it unlikely to happen) and doesn't require separate objects (which I think is a big plus).

Why would requiring separate objects be a problem?

It's not a problem, it's just better not to, if at all possible. K.I.S.S. :)
 In my case, the derived class ctor could manually set some of the fields
 in Args before handing to the superclass. Of course, it's not as ideal,
 since if user code already sets said fields, then they get silently
 overridden.

That's the problem I was imagining.
 Also, in your approach there isn't currently any enforcement that
 the user sets all the mandatory parameters of Args, and this is
 kinda the main issue my idea solves.

True. One workaround is to use Nullable and check that in the ctor. But I suppose it's not as great as a compile-time check.

Yeah, I was angling for a static/compile time check, if at all possible.
Whereas using my approach, you can simply reuse the Args struct
several times:

	C.Args args;
	args.name = "test1";
	args.age = 12;
	args.school = "D Burg High School";
	auto obj1 = new C(args);

	args.name = "test2";
	auto obj2 = new C(args);

	args.name = "test3";
	auto obj3 = new C(args);

	... // etc.

Or.. you use a mixin, or better still you add a copy-constructor or .dup method to your class to duplicate it :)

But then you end up with the problem of needing to call set methods after the .dup

Which is no different to setting args.name beforehand, the same number of assignments. In the example above it's N+1 assignments, N args or dup'ed members and 1 more for 'name' before or after the construction.
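A minimal sketch of the .dup route in D, to show the assignment count. It assumes plain public fields rather than setter properties, and a shallow copy is enough only because every member here is a value or an immutable string.

```d
class C
{
    string name;
    int age;
    string school;

    /// Shallow copy of all members.
    C dup()
    {
        auto c = new C;
        c.name = name;
        c.age = age;
        c.school = school;
        return c;
    }
}

unittest
{
    auto obj1 = new C;
    obj1.name = "test1";
    obj1.age = 12;
    obj1.school = "D Burg High School";

    auto obj2 = obj1.dup();   // copies the N common members...
    obj2.name = "test2";      // ...then one more assignment for the difference

    assert(obj2.school == obj1.school);
    assert(obj1.name == "test1" && obj2.name == "test2");
}
```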
 which may complicate things if the set methods need to
 do non-trivial initialization of internal structures (caches or internal
 representations, etc.).

Ahh, yes, and in this case you'd want to use the idea below, where you call a method to set the common parts and manually set the differences.
 Whereas if you hadn't needed to .dup, you could
 have gotten by without writing any set methods for your class, but now
 you have to.

create-set-call <- 'set' is kinda an integral part of the whole thing :P
 [...]
 In my case you can call different functions in the initialisation
 block, e.g.

 void defineObject(C c)
 {
   c.school = "...";
 }

 C c = new C() {
   defineObject()
 }

 :)

So the compiler has to recursively traverse function calls in the initialization block in order to check that all required fields are set?

Yes. This was an off the cuff idea, but it /is/ a natural extension of the idea for the compiler to traverse the setters called inside the initialisation block, and ctors in the hierarchy, etc.
 That could entail some implementation issues, if said function
 calls can be arbitrarily complex. (If you have complex control logic in
 said functions, the compiler can't in general determine whether or not
 some paths will/will not be taken that may contain assignment statements
 to the object's fields, since that would be equivalent to the halting problem.

All true. The compiler has a couple of options to (re)solve these issues:

1. It could simply baulk at the complexity and error.
2. It could take the safe route and assume those member assignments it cannot verify are uninitialised, forcing manual init.

In fact, erroring at complexity might make for better code in many ways. You would have to perform your complex initialisation beforehand, store the result in a variable, and then construct/initblock your object. It does limit your choice of style, but create-set-call already does that .. and I'm not immediately against style limitations assuming they actually result in better code.
 Worse, the compiler would have to track aliases of the object being set,
 in order to know which assignment statements are setting fields in the
 object, and which are just computations on the side.)

No, aliasing would simply be ignored. In fact, calling a setter on another object in an initblock should probably be an error. Part of the whole "don't mix initialisation" goal I started with. It does require strict properties.
 Furthermore, what if defineObject tries to do something with C other
 than setting up fields? The object would be in an illegal state since it
 hasn't been fully constructed yet.

That's an error. This is why in my initial post I stated that we'd need explicit/well defined properties. All you would be allowed to call in an initialisation block, on the object being initialised, are setter properties.. and possibly methods or free functions which only call setter properties.
I think another interesting idea is using the builder pattern with
create-set-call objects.

For example, a builder template class could inspect the object for
UDA's indicating a data member which is required during
initialisation.  It would contain a bool[] to flag each member as
not/initialised and expose a setMember() method which would call the
underlying object setMember() and return a reference to itself.

At some point, these setMember() methods would want to return another
template class which contained just a build() member.  I'm not sure
how/if this is possible in D.
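
A minimal sketch of the runtime-checked half of this idea, under several assumptions: the `required` marker UDA, the generic `set!"member"` setter (used here in place of per-member setMember() methods), and the static `create()` helper are all hypothetical names. It uses a fixed-size bool array with one flag per field, and build() throws if a required member was never set.

```d
import std.traits : hasUDA;

// Hypothetical marker UDA for members that must be set before build().
struct required {}

class Foo
{
    @required string name;
    @required int age;
    string nickname; // optional, never checked
}

struct Builder(T)
{
    private T obj;
    private bool[T.tupleof.length] isSet; // one flag per field of T

    static Builder create()
    {
        Builder b;
        b.obj = new T;
        return b;
    }

    // Generic setter: set!"name"("Regan") assigns obj.name, flags it as
    // initialised, and returns the builder so calls can be chained.
    Builder set(string member, V)(V value)
    {
        __traits(getMember, obj, member) = value;
        static foreach (i; 0 .. T.tupleof.length)
        {
            static if (__traits(identifier, T.tupleof[i]) == member)
                isSet[i] = true;
        }
        return this;
    }

    // Runtime check: every @required field must have been set.
    T build()
    {
        static foreach (i; 0 .. T.tupleof.length)
        {
            static if (hasUDA!(T.tupleof[i], required))
            {
                if (!isSet[i])
                    throw new Exception("required member not set: "
                        ~ __traits(identifier, T.tupleof[i]));
            }
        }
        return obj;
    }
}

void main()
{
    auto f = Builder!Foo.create().set!"name"("Regan").set!"age"(33).build();
    assert(f.name == "Regan" && f.age == 33);

    // Leaving out a required member is only detected when build() runs.
    bool threw = false;
    try
    {
        Builder!Foo.create().set!"name"("Regan").build();
    }
    catch (Exception e)
    {
        threw = true;
    }
    assert(threw);
}
```

This version only catches a missing member at runtime; it does not attempt the second step of returning a different type once everything is set.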

Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language.

The issue I think is the step where you want to mutate the return type from the type with setX members to the type with build().

I'm not sure I understand that sentence. Could you rephrase it?

I am imagining using a template to create a type which wraps the original object. The created type would expose setter properties for all the mandatory members, and nothing else. The user would call these setters, using UFCS/chain style, however, only after setting all the mandatory properties do we want to expose an additional member called build() which returns the constructed/initialised object.

So, an example:

	class Foo {...}
	auto f = Builder!(Foo)().setName("Regan").setAge(33).build();

The type of the object returned from the Builder!(Foo) is our first created type, which exposes setName() and setAge(), however the type returned from setAge (or whichever member assignment is done last) is the second created type, which either has all the set.. members plus build() or only build(). The build() method returns a Foo. So, the type of 'f' above is Foo.

The goal here is to make build() statically available when Foo is completely initialised and not before. Of course we could simplify all this by making it available immediately and throwing if some members are uninitialised - but that is a runtime check and I was angling for a compile time one.

If you wanted to enforce a specific init ordering you could even produce a separate type containing only the next member to init, and from each setter return the next type in sequence - like a type state machine :p The template bloat however..
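
One way the compile-time variant might be sketched in current D is to carry the "which members are set" state in template parameters, so that each setter returns a differently parameterised builder type. Everything here is an assumption for illustration: Foo's fields, the two bool parameters, and the hard-coded setName()/setAge() (a real version would presumably generate these from the wrapped type).

```d
class Foo
{
    string name;
    int age;
}

// Hypothetical typestate builder: the two bool parameters record, in the
// type itself, which mandatory members have been set so far.
struct Builder(T, bool hasName = false, bool hasAge = false)
{
    private T obj;

    auto setName(string name)
    {
        if (obj is null) obj = new T;
        obj.name = name;
        // The returned type remembers that name is now set.
        return .Builder!(T, true, hasAge)(obj);
    }

    auto setAge(int age)
    {
        if (obj is null) obj = new T;
        obj.age = age;
        // The returned type remembers that age is now set.
        return .Builder!(T, hasName, true)(obj);
    }

    // build() only exists on the fully-initialised builder type.
    static if (hasName && hasAge)
    {
        T build() { return obj; }
    }
}

void main()
{
    auto f = Builder!(Foo)().setName("Regan").setAge(33).build();
    assert(f.name == "Regan" && f.age == 33);

    // The setters can be called in either order.
    auto g = Builder!(Foo)().setAge(33).setName("Regan").build();
    assert(g.age == 33);

    // Calling build() before every mandatory member is set does not compile.
    static assert(!__traits(compiles, Builder!(Foo)().setName("Regan").build()));
}
```

With two mandatory members this instantiates up to four builder types per wrapped class, which gives a feel for the template bloat concern: the count doubles with each extra mandatory member.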
 The problem with the struct approach is, what if you need a complex
 setup process, say constructing a graph with complex interconnections
 between nodes? In order to express such a thing, you have to essentially
 already create the object before you can pass the struct to the ctor,
 which kinda defeats the purpose. Similarly, your approach of an
 initialization block suffers from the limitation that the initialization
 is confined to that block, and you can't allow arbitrary code in that
 block (otherwise you could end up using an object that hasn't been fully
 constructed yet -- like the defineObject problem I pointed out above).

Yes, neither idea works for all possible use-cases. Yours is naturally broader and less limiting because I was starting from a limited create-set-call style and imposing further limitation on how it can be used.
 Keeping in mind the create-set-call pattern and Perl's approach of
 "blessing" an object into a full-fledged class instance, I wonder if a
 more radical approach might be to have the language acknowledge that
 objects have two phases, a preinitialized state, and a fully-initialized
 state. These two would have distinct types *in the type system*, such
 that you cannot, for example, call post-init methods on a
 pre-initialization object, and you can't call an init method on a
 post-initialization object.

That is essentially the same idea as the builder template solution I talk about above :)
 The ctor would be the unique transition
 point which takes a preinitialized object, verifies compliance with
 class invariants, and returns a post-initialization object.

AKA build() above :)

R
Jul 19 2013