digitalmars.D - Struct-typing in Phobos

Chris Williams (212/212) Feb 03 2014 I've been working on adding dcrypt to Phobos, which meant

Jakob Ovrum (13/34) Feb 03 2014 Having to know what a particular concept does is pretty much the
Dicebot (27/27) Feb 04 2014 I find structs completely superior to classes as design base and

Chris Williams (16/19) Feb 04 2014 Really, my only intent was to write the article about how to use

Dicebot (15/28) Feb 04 2014 "alias this" for the rescue again. You can a struct private

Chris Williams (203/205) Feb 05 2014 So it seems. I have a vague recollection of using alias this a

Arjan (4/4) Feb 05 2014 On Wednesday, 5 February 2014 at 23:47:03 UTC, Chris Williams

Craig Dillabaugh (3/7) Feb 06 2014 I wanted to say the same think. I found your write up very

rumbu (18/18) Feb 06 2014 I think the most elegant way to satisfy both crowds (OOP and

Jesse Phillips (18/22) Feb 04 2014 I think this touches on why Manu is interested in final by

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

I've been working on adding dcrypt to Phobos, which meant 
considering what to do in terms of taking it easy and simply 
pushing it in as-is, or converting all of the classes and 
interfaces into structs with type checkers (I'll be doing the 
latter). But it made me consider the pros and cons of the two 
options.

Basically, it seemed to come down to:

Pro-structs
1. Smaller, faster
2. Adds unique D-ish flavor to the standard library

Anti-structs
1. Difficult to read. E.g. a beginning programmer needs to be 
able to look at this and understand it before he can use the 
library:

size_t insertBack(Stuff)(Stuff stuff) if 
(isImplicitlyConvertible!(Stuff, T) || isInputRange!Stuff && 
isImplicitlyConvertible!(ElementType!Stuff, T));

And of course, even being able to read it, one would still need 
to track down isInputRange() and isImplicitlyConvertible() to 
know what they do.

2. When using classes and interfaces, type checking is as simple 
as writing the type to be passed in as a parameter. With 
struct-typing, people need to manually append a series of checks 
all over the place, which results in things like the following:

auto uniform(T, UniformRandomNumberGenerator)(ref 
UniformRandomNumberGenerator urng) if (!is(T == enum) && 
(isIntegral!T || isSomeChar!T));

Even though isUniformRNG() exists and is used on other 
implementations of uniform(), it isn't used here, because whoever 
developed the function forgot to include it on this particular 
function.

3. Non-extensible. To "extend" an SList or whatever, I would have 
to write a wrapper class with methods that forward to the SList 
methods, if I wanted my object to be able to interoperate with 
Phobos, or I would need to modify Phobos so that the body of 
SList was a template that could be mixed in (assuming I didn't 
want to override any methods, just add new ones).

4. Unclear how great an advantage the smaller/faster aspect 
actually gives us relative to the demerits of this style of 
coding. For example, using code from the below writeup (see 
below), I tested the performance of inserting 100,000 items then 
removing them all 100 times with both the interface/class and 
struct versions, for a total time of 1905±4ms / 1930±10ms (with a 
slightly smaller test the struct won, suggesting that there's no 
real difference). My suspicion is that the compiler/linker was 
detecting that there was only a single implemntation with no 
subclasses, hence it compiled out to something roughly equivalent 
with no vtable, so I split class implementation of HashSet into 
two, with an abstract base class that contained the "data" 
variable and nothing else, with all the methods declared in the 
subclass and that bumped the runtime to 1980±10ms. But still 
that's only a 2.5% difference in speed and only if one goes 
gung-ho with layering classes.

And for many objects - like random generators or HashSets - you 
aren't going to have masses of instances of the same type, just 
one top-level instance that internally contains basic data types 
(like structs), so there likely won't be much of a memory 
difference for most applications either.


Personally, I think that a better method of type-specialization 
needs to be added to the language (e.g. allowing isX!T statements 
to be used as types where we write types rather than preprocessor 
functions or allowing structs to implement interfaces) so that 
post-fixed "if" statements on function declarations are more of a 
last-ditch effort, rather than the norm. Though as pointed out, 
it might not be worth it unless someone can point out truly 
significant performance advantages.

But certainly for the moment, at least having an article about 
these things on the top site seems advisable, given how 
frequently we see them in the standard library. Plus, users and 
Phobos developers might want to have a quick tutorial to make 
their own.

So here's a quick write-up:

----

The core library of a language should be small, fast, powerful, 
and generic. This last item, however, causes a problem for 
library architects. Traditionally, the more generic something is, 
the larger and slower it is - with several layers of abstraction 
and indirection to allow many disparate implementations of a 
generic idea to all serve under the same interface. D supports 
all of the features that would allow one to write a library like 
this, but it also supports features that allow one to write a 
library which does not sacrifice performance for generality.

For example, here would be a standard declaration of a Set 
interface and a simple implementation as a class:

interface Set(T) {
     size_t length();
     size_t insert(T);
     T front();
     T back();
     void removeFront();
     void removeBack();
}

class HashSet(T) : Set!(T) {
private:
     void[0][T] data;

public:
     size_t length() {
         return data.length;
     }

     size_t insert(T value) {
         data[value] = [];
         return data.length;
     }

     T front() {
         foreach (k, v; data) {
             return k;
         }
         return T.init;
     }

     T back() {
         foreach_reverse (k, v; data) {
             return k;
         }
         return T.init;
     }

     void removeFront() {
         if (data.length == 0) return;

         T key;
         foreach (k, v; data) {
             key = k;
             break;
         }
         data.remove(key);
     }

     void removeBack() {
         if (data.length == 0) return;

         T key;
         foreach_reverse (k, v; data) {
             key = k;
             break;
         }
         data.remove(key);
     }
}

When we write a function which accepts a Set, we can write:

void foo(Set someSet) {
     someSet.insert(a);
     writeln(someSet.front);
}

This will accept and operate on any class which implements the 
Set interface (as you would expect).

To reduce overhead, in the D standard library, it is standard to 
try and implement this as a struct instead of as a class. But if 
we implement HashSet as a struct (which cannot implement an 
interface), then we would traditionally have no way to write 
functions which can accept generic types, like Sets.

This is solved firstly by use of the templating system. If I have 
a function which accepts a templated type, then so long as the 
code compiles once the type has been resolved, any arbitrary 
class or struct which implements the correct functions will 
compile and operate correctly.

void foo(Set)(Set someSet) { // "Set" is just a templated type 
name
     someSet.insert(a); // Only compiles if Set resolves to a type 
with insert and front defined
     writeln(someSet.front);
}

In a sense, this is a method for duck-typing in D, though not a 
very good one since if we look at the function declaration, it is 
not clear what types are or are not valid as parameters to foo(). 
We must look through the implementation of foo() to find useages 
of someSet to determine what is required. Likewise, there is 
nothing like "interface Set" which is mandating a standardized 
set of methods that all of our functions can expect and which 
developers of new implementations of Set types can know to 
implement.

This is corrected by use of compile-time duck-type checking 
templates. First we declare a template which implements a static 
test of compilability of the type for our chosen interface.

template isSet(Set) {
     enum bool isSet = is(typeof(
     (inout int = 0)
     {
         Set!int s = void; // can define an instance
         if (s.length == 0) {} // can test size
         s.insert(1); // can insert
         auto h = s.front; // can access front
         h = s.back; // can access back
         s.removeFront; // can remove front
         s.removeBack; // can remove back
     }));
}

Following this, we first want to validate that our implementation 
is compliant.

struct HashSet(T) {
     static assert( isSet!(HashSet!T) );
     ...
}

We can then upgrade foo() to also require only types which 
conform:

void foo(Set)(Set someSet) if (isSet!Set) {
     ...
}

While more verbose, this sort of duck-typing offers advantages 
beyond just speed. As it tests only 'compilability', options open 
up to allow flexibility in your definitions. One such example, 
the random number generators in std.random are defined so that 
they can be treated like InputRange types. Any InputRange is 
expected to return whether it is empty or not, but as a random 
number generator never becomes empty, in std.random the "empty" 
value is a constant (enum) attribute rather than a method.

Another similar benefit is the ability to allow undefined return 
types. For example, asymmetric cryptography algorithms should 
always generate Public and Private keys, but the values that 
comprise those keys is dependent on the algorithm used. 
RSAPublicKey and ElGamalPublicKey are basic data structures, not 
actionable classes. There is no reason to have a shared base 
class or common interface. With D's template-based duck-typing, 
one can validate the existence of public/private key accessors 
(using "auto" to hold the return values), without caring whether 
the return type for different implementations have the same or 
even interchangeable types.

Feb 03 2014

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Tuesday, 4 February 2014 at 02:04:55 UTC, Chris Williams wrote:
 And of course, even being able to read it, one would still need 
 to track down isInputRange() and isImplicitlyConvertible() to 
 know what they do.

Having to know what a particular concept does is pretty much the 
same as having to know what an interface does. Even so, 
`isImplicitlyConvertible` should be intuitive to programmers with 
some experience in any statically typed language.

 2. When using classes and interfaces, type checking is as 
 simple as writing the type to be passed in as a parameter. With 
 struct-typing, people need to manually append a series of 
 checks all over the place, which results in things like the 
 following:

 auto uniform(T, UniformRandomNumberGenerator)(ref 
 UniformRandomNumberGenerator urng) if (!is(T == enum) && 
 (isIntegral!T || isSomeChar!T));

 Even though isUniformRNG() exists and is used on other 
 implementations of uniform(), it isn't used here, because 
 whoever developed the function forgot to include it on this 
 particular function.

Repeatedly needing to check for something nameless yet specific 
like what the above does, is probably cause for introducing a new 
concept.

I'll admit that getting constraints right is difficult, because 
it's impossible to unit test whether an error happened because of 
an immediate resolution failure or something else. We need some 
improvement here if we want to keep aiming for tight constraints.

 3. Non-extensible. To "extend" an SList or whatever, I would 
 have to write a wrapper class with methods that forward to the 
 SList methods, if I wanted my object to be able to interoperate 
 with Phobos, or I would need to modify Phobos so that the body 
 of SList was a template that could be mixed in (assuming I 
 didn't want to override any methods, just add new ones).

Use UFCS to add new methods.

Feb 03 2014

"Dicebot" <public dicebot.lv> writes:

I find structs completely superior to classes as design base and 
only resort to latter if find myself in need of polymorphic 
behavior. There is an issue of bad constraint error messages but 
that is exactly that - implementation flaw that needs to be 
fixed, not a conceptual flaw.

Mostly agree with Jakob have said, few extra tips:

1) you can use `foo(T : Stuff)(T t)` instead of `foo(T)(T t) if 
(isImplicitlyConvertible!(T, Stuff))`

2) You can extend structs no only with UFCS but also via `alias 
this`:

struct One
{
     void foo() {}
}

struct Two
{
     private One one;
     alias one this;

     void bar() {}
}

void main()
{
     Two two;
     two.foo(); two.bar();
}

Aliasing members as this can be considered a form of single 
inheritance in this context.

Feb 04 2014

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

Really, my only intent was to write the article about how to use 
these, not to argue for/against classes, but my preface got 
longer than intended. I'll make sure to note the possibilities 
offered by alias this (but will wait to see if anyone has any 
further updates to suggest before posting an updated version). I 
don't think that UFCS counts as extensibility. You can't add 
variables to your instances, just add helper functions.


On Tuesday, 4 February 2014 at 13:01:28 UTC, Dicebot wrote:
 I find structs completely superior to classes as design base 
 and only resort to latter if find myself in need of polymorphic 
 behavior.

A person who is using Phobos has no option to switch to classes 
when they find that they need polymorphic behavior, except by 
copy-pasting into a new source file and revising the Phobos code 
outside of Phobos.

If there aren't really any speed or size advantages of 
significance, then what are you considering to make structs 
superior? (I ask since I would rather make an argument for 
struct-typing in the article, so people understand the decision.)

Feb 04 2014

"Dicebot" <public dicebot.lv> writes:

On Tuesday, 4 February 2014 at 20:11:05 UTC, Chris Williams wrote:
 On Tuesday, 4 February 2014 at 13:01:28 UTC, Dicebot wrote:
 I find structs completely superior to classes as design base 
 and only resort to latter if find myself in need of 
 polymorphic behavior.

 A person who is using Phobos has no option to switch to classes 
 when they find that they need polymorphic behavior, except by 
 copy-pasting into a new source file and revising the Phobos 
 code outside of Phobos.

"alias this" for the rescue again. You can a struct private 
member of a class and alias it. If you want to override its 
methods in class hierarchy you can generated wrapper methods in 
class via D reflection capabilities and override those.

But opposite is much harder, and issues with std.typecons.scoped 
show it.

 If there aren't really any speed or size advantages of 
 significance, then what are you considering to make structs 
 superior? (I ask since I would rather make an argument for 
 struct-typing in the article, so people understand the 
 decision.)

Speed advantage can be huge on certain types of programs. Each 
class is a reference type - this is first extra indirection. On 
top of that every non-final class call requires virtual table 
lookup - this is second extra indirection. And indirections can 
cost a lot.

I have also mentioned another important concern - you can 
easility turn struct into reference type by declaring pointer to 
struct. You can't easily turn class into value type.

Feb 04 2014

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

On Tuesday, 4 February 2014 at 21:07:01 UTC, Dicebot wrote:
 "alias this" for the rescue again. You can a struct private 
 member of a class and alias it.

So it seems. I have a vague recollection of using alias this a 
long time ago, so I presume that I knew all of its rules back 
then, but had since forgotten.

Here's an updated version of the writeup. Again, I'd like to 
recommend it as one of the How-tos or Articles on the top page, 
to help people out with using Phobos. Is that something I can do 
via GitHub and a pull request, or would the dlang.org admin need 
to do that (if he felt it worthwhile)?

----
The core library of a language should be small, fast, powerful,
and generic. This last item, however, causes a problem for
library architects. Traditionally, the more generic something is,
the larger and slower it is - with several layers of abstraction
and indirection to allow many disparate implementations of a
generic idea to all serve under the same interface. D supports
all of the features that would allow one to write a library like
this, but it also supports features that allow one to write a
library which does not sacrifice performance for generality.

For example, here would be a standard declaration of a Set
interface and a simple implementation as a class:

interface Set(T) {
     size_t length();
     size_t insert(T);
     T front();
     T back();
     void removeFront();
     void removeBack();
}

class HashSet(T) : Set!(T) {
private:
     void[0][T] data;

public:
     size_t length() {
         return data.length;
     }

     size_t insert(T value) {
         data[value] = [];
         return data.length;
     }

     T front() {
         foreach (k, v; data) {
             return k;
         }
         return T.init;
     }

     T back() {
         foreach_reverse (k, v; data) {
             return k;
         }
         return T.init;
     }

     void removeFront() {
         if (data.length == 0) return;

         T key;
         foreach (k, v; data) {
             key = k;
             break;
         }
         data.remove(key);
     }

     void removeBack() {
         if (data.length == 0) return;

         T key;
         foreach_reverse (k, v; data) {
             key = k;
             break;
         }
         data.remove(key);
     }
}

When we write a function which accepts a Set, we can write:

void foo(Set someSet) {
     someSet.insert(a);
     writeln(someSet.front);
}

This will accept and operate on any class which implements the
Set interface (as you would expect).

To reduce overhead, in the D standard library, it is standard to
try and implement this as a struct instead of as a class. But if
we implement HashSet as a struct (which cannot implement an
interface), then we would traditionally have no way to write
functions which can accept generic types, like Sets.

This is solved firstly by use of the templating system. If I have
a function which accepts a templated type, then so long as the
code compiles once the type has been resolved, any arbitrary
class or struct which implements the correct functions will
compile and operate correctly.

void foo(Set)(Set someSet) { // "Set" is just a templated type
name
     someSet.insert(a); // Only compiles if Set resolves to a type
with insert and front defined
     writeln(someSet.front);
}

In a sense, this is a method for duck-typing in D, though not a
very good one since if we look at the function declaration, it is
not clear what types are or are not valid as parameters to foo().
We must look through the implementation of foo() to find useages
of someSet to determine what is required. Likewise, there is
nothing like "interface Set" which is mandating a standardized
set of methods that all of our functions can expect and which
developers of new implementations of Set types can know to
implement.

This is corrected by use of compile-time duck-type checking
templates. First we declare a template which implements a static
test of compilability of the type for our chosen interface.

template isSet(Set) {
     enum bool isSet = is(typeof(
     (inout int = 0)
     {
         Set!int s = void; // can define an instance
         if (s.length == 0) {} // can test size
         s.insert(1); // can insert
         auto h = s.front; // can access front
         h = s.back; // can access back
         s.removeFront; // can remove front
         s.removeBack; // can remove back
     }));
}

Following this, we first want to validate that our implementation
is compliant.

struct HashSet(T) {
     static assert( isSet!(HashSet!T) );
     ...
}

We can then upgrade foo() to also require only types which
conform:

void foo(Set)(Set someSet) if (isSet!Set) {
     ...
}

While more verbose, this sort of duck-typing offers advantages
that allow for further optimization beyond just using structs 
instead of classes. Since the isSet() validation only checks 
'compilability', anyone wishing to implement the Set type can 
write any code that is visually equivalent to what is written in 
the checker.

For example, one type that is already defined in the Phobos 
standard library is InputRange (defined via the isInputRange() 
validator in std.range). Any sort of stream of data should be 
able to be defined as an InputRange. You can grab the current 
value, move to the next, and test whether it is empty.

Pseudo-random number generators can be considered to be 
InputRanges. You can grab the current value, ask for the next, 
and test whether there are more values to be retrieved. Since it 
will infinitely generate more values, any implementation of the 
"empty" check will return false. If InputRange was defined as an 
interface, empty would need to be defined as a method. But since 
isInputRange tests for the compilability of the code, "if 
(r.empty) {}", the random number generators in std.random are 
able to define "empty" as a constant enum with the value of 
"false", rather than a method, improving the performance time for 
any empty tests.

Another similar benefit is the ability to allow undefined return 
types. For example, asymmetric cryptography algorithms should 
always generate Public and Private keys, but there is nothing 
shared between an RSA public key and an El-Gamal public key in 
terms of properties nor interface. In both cases, the values are 
black boxes of use only to the algorithm in question. A checker 
for isAsymmetricCrypt can verify that an object contains 
properties "publicKey" and "privateKey" without having to check 
that they are of any particular type:

template isAsymmetricCrypt(Crypt) {
     enum bool isAsymmetricCrypt = is(typeof(
     (inout int = 0)
     {
         Crypt c = void; // can define an instance
         c.generate; // Can generate a key pair
         auto pub = c.publicKey; // Has a public key
         auto prv = c.privateKey; // Has a private key
         c.publicKey = pub; // Keys can be set
         c.privateKey = prv;
     }));
}

Using interfaces, all public and private keys would need to have 
a shared type, even if it was just an empty interface.

Theoretically, because we are using structs instead of classes or 
interfaces, inheritence would be impossible. D's "alias this" 
feature allows one to bypass this problem, so that any 
struct-based type can be inserted into a second struct or class 
and gain all of the features of the aliased type.

For example, if we wanted to convert our HashSet struct into a 
class and add methods which allow us to track insertions and 
deletions, we can do the following:

class LogSet(T) {
     private HashSet!T base;
     alias base this;

public:
     size_t insert(T value) {
         writefln("Inserting %s", value);
         return base.insert(value);
     }

     void removeFront() {
         writeln("Removing front");
         base.removeFront();
     }

     void removeBack() {
         writeln("Removing back");
         base.removeBack();
     }
}

We can override the methods that we wish to override while 
allowing calls to LogSet(T).front, LogSet(T).back, and 
LogSet(T).length all go direct to HashSet(T).

Feb 05 2014

"Arjan" <arjan ask.me.to> writes:

On Wednesday, 5 February 2014 at 23:47:03 UTC, Chris Williams 
wrote:

Nice and clear write up.
I would love to see more of these kind of articles.

Feb 05 2014

"Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:

On Thursday, 6 February 2014 at 06:46:06 UTC, Arjan wrote:
 On Wednesday, 5 February 2014 at 23:47:03 UTC, Chris Williams 
 wrote:

 Nice and clear write up.
 I would love to see more of these kind of articles.

I wanted to say the same think.  I found your write up very 
readable and informative.

Feb 06 2014

"rumbu" <rumbu rumbu.ro> writes:

I think the most elegant way to satisfy both crowds (OOP and 
anti-OOP) is to use something like this:

https://gist.github.com/rjmcguire/6431542

void foo(T)(T someSet) if (Implements!(T, Set))
{
}

foo will accept any class implementing interface "Set", any 
struct having the same methods as the interface "Set", and any 
other type having equivalent UFCS for "Set" members.

So, the OOP crowd will write classes implementing Set, the 
anti-OOP crowd will create structs and finally, the hardcore 
programming crowd will be happy using UFCS. Everybody is happy.

Of course, it would be nice if the compiler is doing this behind 
the scenes by accepting interface implementation for structs so I 
can write void foo(Set someSet) for any class, interface or 

struct/value type, but I think the D compiler could be smarter 
than that.

Feb 06 2014

"Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:

On Tuesday, 4 February 2014 at 20:11:05 UTC, Chris Williams wrote:
 A person who is using Phobos has no option to switch to classes 
 when they find that they need polymorphic behavior, except by 
 copy-pasting into a new source file and revising the Phobos 
 code outside of Phobos.

I think this touches on why Manu is interested in final by 
default.

A type should be designed to be polymorphic. So if these were 
classes they'd be final anyway.

I don't have a good opinion on my own if that is the correct 
thing, but I certainly don't go around extending random classes.

You paint a fairly bad picture for using structs, but you haven't 
said anything that wasn't true. It seems to me that the reality 
is, most of the time you don't need those benefits. With the 
template constraints one doesn't need to inherit from your base 
class, they just need to implement the same behaviors.

As for the new programming not being familiar with template 
constraints. They are heavily used in D, both Phobos and third 
party libraries, they won't be able to avoid them. I'd say that 
even if using classes it is still better to write functions using 
template constraints (but I don't have any examples to back that 
up).

Feb 04 2014

D Programming

C/C++ Programming

Other

digitalmars.D - Struct-typing in Phobos