
digitalmars.dip.ideas - Inline sumtype

Richard (Rikki) Andrew Cattermole <richard cattermole.co.nz> writes:
I've gone back to the drawing board on sumtypes, and I had some 
ideas yesterday based upon feedback from the last couple of years.

Unlike the other designs that have been proposed, this one is 
written inline where a type is expected instead of requiring its 
own declaration. It also gives enum declarations without a value 
(currently an error) a type that is not unique to the declaration.

Matching is not added here, but my previous DIP for them could be 
made to work for it.



A sumtype contains zero or more elements.

Each element may have a name.

An element type + name pair must be unique in the element list, 
and a name may only appear once.

Valid: ``alias S = sumtype (int, int i);``

Invalid: ``alias S = sumtype(int i, float i);``

Any element whose type is an enum type, will also have the 
element's name set to that of its identifier.



Two sumtypes may be merged together using the ``+`` operator.

```d
alias A = sumtype(int);
alias B = sumtype(float);
alias C = A + B;
```

And subtracted from with ``-``:

```d
alias C = sumtype(int, float);
alias B = sumtype(float);
alias A = C - B;
```

The normal restrictions on sumtype elements apply before and 
after a set operation has occurred.

Combine with alias assignment for fine-grained control:

```d
alias Result = sumtype();

static foreach(Type; Input) {
	Result += sumtype(Type);
}
```

Duplicates are ignored during merging.
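
For comparison, the merge behaviour described above can be roughly approximated at the library level with today's ``std.sumtype``. This is only a sketch: ``Merge`` is a hypothetical helper (it is neither in Phobos nor part of this proposal), and it works on the existing ``SumType`` template rather than the proposed ``sumtype`` syntax.

```d
import std.meta : NoDuplicates;
import std.sumtype : SumType;

// Hypothetical helper: merges the element lists of two SumType
// instances and drops duplicates, mimicking the proposed `A + B`.
template Merge(A, B)
{
    alias Merge = SumType!(NoDuplicates!(A.Types, B.Types));
}

alias A = SumType!int;
alias B = SumType!(int, float);
alias C = Merge!(A, B);

// `int` appears in both inputs but only once in the result.
static assert(is(C == SumType!(int, float)));
static assert(C.Types.length == 2);

void main()
{
    C c = 1.5f; // holds the float member
    c = 3;      // now holds the int member
}
```

The proposal differs in that the operators would live on the sumtype itself and would also have to respect element names, which this sketch ignores.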



A sumtype may be constructed in one of four situations:

- Sumtype initialization syntax: ``Type(Expression)`` or 
``Type(name: Expression)``.
- Variable declaration: ``Type var = Expression;``
- Return: ``return Expression;``
- Function call: ``func(1);`` where ``void func(sumtype(int) 
param)``

For function calls, the argument to parameter matching will use 
conversion, and will be considered less of a match than an exact 
one.



A sumtype may be assigned to another that has a comparable 
element list.

```d
alias A = sumtype(int);
alias B = sumtype(int | string);

B b = A(5);
```

Assigning a sumtype to another will have preference over 
initialization.

```d
alias A = sumtype(int);
alias B = sumtype(int, A);

B initialization = 8;
B assignment = A(9);
```



An enum without a value is given a non-unique type based upon 
its identifier.

```d
enum None;
pragma(msg, typeof(None)); // __enumtype("None")
static assert(is(typeof(None) == __enumtype));
```

An enum type may be used as its type, when the grammar requires a 
type:

```d
None none = None;
```

As its size is zero, any variables that are of these types are 
dummy and can replace the existing practice of ``void[0] 
storage;``. They do not contribute to field layouts.

An assignment of ``true`` will succeed, although it will be a no-op.

```d
None none = true;
```

The enum type ``__enumtype("None")`` will have an instance in 
object.d.

The mangling of an enum type does not include the module it is in.



To check the type of a sumtype against a known type, use an is 
expression.

``assert(sumtype(int).init is int);``

Other comparisons, e.g. ``==``, are done by a compiler hook that 
matches and compares the values if the tags match.



Casting a sumtype results in a read barrier to check the tag 
matches the requested type on read. There is no read barrier on 
write.

The result of a cast cannot be passed around by-ref or have a 
pointer taken to it.



A sumtype holds the properties:

- ``tag`` that of the tagged union
- ``storage`` the block of storage (``@system``), typed as 
``void[X]``
- ``types`` holds a sequence of types, which are the types for 
the elements.
- ``names`` holds a sequence of strings, which are the names for 
the elements.
- ``copyconstructor``, the function pointer for the copy 
constructor (``@system``).
- ``destructor``, the function pointer for the destructor 
(``@system``).

All properties are assignable in non-``@safe`` code except 
``names`` and ``types``.



The layout of a sumtype is variable length. It is as follows:

- ``size_t`` tag
- ``void function(ref new_, ref old_)`` Copy constructor
- ``void function(ref old_)`` Destructor
- ``void[X]`` Storage

  The tag is a hash of the fully qualified name of the type + name.

The copy constructor and destructor will work, as long as their 
arguments are pointing at storage. In practice the compiler will 
need to inject a null check before calling. The calling 
convention of the functions matches that of methods.

Attributes used on the copy constructor and destructor will be 
the common denominator between all the elements that have copy 
constructors and destructors respectively.

These are optional if none of the element types use a copy 
constructor or destructor.
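
As a rough illustration of the layout described above, it could be modelled as an ordinary struct. This is a sketch under assumptions: the untyped ``ref`` parameters are modelled as ``void*``, and both ``SumTypeLayout`` and ``tagOf`` are hypothetical names that only show how a hash of type name + element name might be produced with the existing ``hashOf``.

```d
import std.traits : fullyQualifiedName;

// Sketch of the described layout for a storage size of X bytes.
// Assumption: the untyped `ref` parameters are modelled as `void*`.
struct SumTypeLayout(size_t X)
{
    size_t tag;                                            // hash of type + name
    void function(void* new_, void* old_) copyConstructor; // may be null
    void function(void* old_) destructor;                  // may be null
    void[X] storage;                                       // raw element storage
}

// Hypothetical tag function: hashes the fully qualified type name,
// then mixes in the optional element name.
size_t tagOf(T)(string name = "")
{
    return hashOf(name, hashOf(fullyQualifiedName!T));
}

void main()
{
    SumTypeLayout!(int.sizeof) value;
    value.tag = tagOf!int("i");

    // The compiler would inject a null check like this before calling.
    if (value.destructor !is null)
        value.destructor(value.storage.ptr);

    assert(value.tag == tagOf!int("i"));
}
```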



```diff
BasicType:
+    SumType

TypeSpecialization:
+    sumtype
+    __enumtype

+ SumType:
+    sumtype ( SumTypeElements|opt )

+ SumTypeElements:
+    SumTypeElements '|' SumTypeElement
+    SumTypeElement

+ SumTypeElement:
+    Type Identifier
+    Type
```



```d
enum None;
alias Animal = sumtype(None | Cat | Dog dog);

struct Cat {
}

struct Dog {
}

void main() {
	Animal animal = Animal(Cat());
	
	animal.dog = Dog();
	cast(Cat)animal = Cat();

	writeln(cast(Cat)animal);

	animal.None = true;
	assert(animal is None);
}

void someFunc(Animal animal) {
	import std.stdio;
	writeln(animal.tag); // some hash number
}
```
Jun 20
monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 ```d
 alias A = sumtype(int);
 alias B = sumtype(float);
 alias C = A + B;
 ```
It seems silly to me to consume alias operators on an untested and unimplemented concept when it's not like it couldn't be used elsewhere. Why not:

```d
alias seq(T...)=T;
alias a=seq!int;
alias b=seq!float;
alias c=a+b;
```

```d
alias a=seq!();
a+=int;
a+=float;
```
Jun 20
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 21/06/2025 6:27 AM, monkyyy wrote:
 On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 ```d
 alias A = sumtype(int);
 alias B = sumtype(float);
 alias C = A + B;
 ```
 It seems silly to me to consume alias operators on an untested and 
 unimplemented concept when it's not like it couldn't be used elsewhere. 
 Why not:
 ```d
 alias seq(T...)=T;
 alias a=seq!int;
 alias b=seq!float;
 alias c=a+b;
 ```
 ```d
 alias a=seq!();
 a+=int;
 a+=float;
 ```

The operators are defined on the sumtype, nothing else.

Alias is not a typedef, it does not have its own unique type in the type system, so there is nothing for the operators to attach to in the language. The alias itself disappears from the type system. The type system can only see what it has been aliased to, it's a direct replacement, and yes there is at least one bug due to this.

Have I explained this well enough? I get the feeling that you should have come across this by now, and therefore this explanation won't be enough.
Jun 20
monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 - ``size_t`` tag
Nonsense, why would someone need 64 bits of types? That won't ever compile.

Enums are at least only ints: https://dlang.org/spec/enum.html#enum_properties (though they should also just be bytes that upgrade to shorts if you ever go longer than 255 elements)
Jun 20
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 21/06/2025 6:43 AM, monkyyy wrote:
 On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 - ``size_t`` tag
 Nonsense, why would someone need 64 bits of types? That won't ever compile.
 
 Enums are at least only ints: https://dlang.org/spec/enum.html#enum_properties 
 (though they should also just be bytes that upgrade to shorts if you ever 
 go longer than 255 elements)

Hashes in D are typically size_t or ulong.

The use of size_t makes sense as it takes up a full register, and we may as well use all of it to get more accuracy. The rest of the layout won't merge into it.

https://github.com/dlang/dmd/blob/f0541d65ba777e6f03499bcb5c0c59da8ce94050/druntime/src/object.d#L137

https://github.com/dlang/dmd/blob/f0541d65ba777e6f03499bcb5c0c59da8ce94050/druntime/src/core/internal/hash.d#L132
Jun 20
monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 20 June 2025 at 18:46:38 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 
 Hashes in D are typically size_t or ulong.
why would it be a hash?

snar's being overly fancy here but this will be a ubyte most of the time: https://github.com/dlang/phobos/blob/832cc465998b1ea77051cd3fd014b544442a4f8c/std/sumtype.d#L288

in my versions I think I used enum as is, meaning it would've been an int

in an ideal world:
```d
struct sumtype(T...){
    enum Tag=enum{ static foreach(A;T){...
    union Union{ static foreach(A;T){...
    Tag tag;
    Union myunion;
    ...
}
```
It's not a hash, it's an enum paired with a union; then you can iterate T while having tag with `static foreach(I,A;T){ if(I==tag){...` ; simple and sane

Ideally enum with less than 255 elements would be ubytes so you get to make simplicity vs idealness tradeoffs; but 64 bits is insane; break web apis if a client is 32 bit, also insane. Size_t is awful in general. The d compiler breaks down with deeply nested template hell, how do you plan on generating 64^2 types when there's a 100 depth recursion limit and they all have to be in memory.
Jun 20
monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 20 June 2025 at 19:17:33 UTC, monkyyy wrote:
 The d compiler breaks down with deeply nested template hell, 
 how do you plan on generating 64^2 types when there's a 100 
 depth recursion limit and they all have to be in memory.
Also isn't uniqueness filtering an O(n^2) algorithm? At n of 2^64, you're not finishing that filter in the runtime of the universe
Jun 20
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 21/06/2025 7:17 AM, monkyyy wrote:
 On Friday, 20 June 2025 at 18:46:38 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 Hashes in D are typically size_t or ulong.
 why would it be a hash?

When you assign one sumtype to another, where the second has a different element set, you have to match and then translate the previous tag to the new one. If you are doing this often, that is a lot of unnecessary work.

A hash will work in both sumtypes and is therefore a direct copy.

This was suggested to me by Jacob Carlborg in the context of value type exceptions and is a brilliant way to minimize the cost.
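
To make the difference concrete, here is a small sketch contrasting the two tag styles. Everything in it is illustrative: ``AElems``/``BElems`` merely stand in for the element lists of two sumtypes, and ``hashTag`` is a hypothetical helper built on the existing ``hashOf``.

```d
import std.meta : AliasSeq, staticIndexOf;
import std.traits : fullyQualifiedName;

alias AElems = AliasSeq!(int);         // models sumtype(int)
alias BElems = AliasSeq!(string, int); // models sumtype(string | int)

// Offset-style tags: the same type sits at a different index in each
// list, so assigning an A to a B needs a translation step.
static assert(staticIndexOf!(int, AElems) == 0);
static assert(staticIndexOf!(int, BElems) == 1);

// Hash-style tags (hypothetical helper): the value depends only on the
// type's name, so it is valid in every sumtype that contains the type.
size_t hashTag(T)()
{
    return hashOf(fullyQualifiedName!T);
}

void main()
{
    size_t tagInA = hashTag!int();
    size_t tagInB = tagInA; // direct copy, no lookup needed
    assert(tagInB == hashTag!int());
}
```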
 snar's being overly fancy here but this will be a ubyte most of the time: 
 https://github.com/dlang/phobos/blob/832cc465998b1ea77051cd3fd014b544442a4f8c/std/sumtype.d#L288
 
 in my versions I think I used enum as is, meaning it would've been an int
 
 in an ideal world:
 ```d
 struct sumtype(T...){
    enum Tag=enum{ static foreach(A;T){...
    union Union{ static foreach(A;T){...
    Tag tag;
    Union myunion;
    ...
 }
 ```
 It's not a hash, it's an enum paired with a union; then you can iterate T 
 while having tag with `static foreach(I,A;T){ if(I==tag){...` ; simple 
 and sane
 
 Ideally enum with less than 255 elements would be ubytes so you get to 
 make simplicity vs idealness tradeoffs; but 64 bits is insane; break web 
 apis if a client is 32 bit, also insane. Size_t is awful in general. The 
 d compiler breaks down with deeply nested template hell, how do you plan 
 on generating 64^2 types when there's a 100 depth recursion limit and 
 they all have to be in memory.
I'm not.

You are presupposing that the tag is an offset, not a hash. If it was an offset I would indeed make it variable sized, so only the amount needed would be used.

The benefit of using an offset is branch tables will have a better chance to work. But I'm not convinced that it is worth it over getting the cheaper copies.
Jun 20
MrSmith33 <mrsmith33 yandex.ru> writes:
On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 The tag is a hash of the fully qualified name of the type + 
 name.
* What happens in the case of collision?
* Can collision be detected at compile-time?
Jun 21
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 22/06/2025 12:58 AM, MrSmith33 wrote:
 On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 The tag is a hash of the fully qualified name of the type + name.
 * What happens in the case of collision?
 * Can collision be detected at compile-time?
I see no reason it can't be detected. However in practice the hash is so large compared to the elements in the set, that the chances of it hitting something when AA's don't is pretty low.
Jun 21
Lance Bachmeier <no spam.net> writes:
On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 I've gone back to the drawing board on sumtypes, and I had some 
 ideas yesterday based upon feedback from the last couple of 
 years.
It would be useful to include a comparison with std.sumtype. Explicitly state the new functionality this enables and the improvement in cases it overlaps.
 ```d
 void main() {
 	Animal animal = Animal(Cat());
 	
 	animal.dog = Dog();
 	cast(Cat)animal = Cat();

 	writeln(cast(Cat)animal);

 	animal.None = true;
 	assert(animal is None);
 }

 void someFunc(Animal animal) {
 	import std.stdio;
 	writeln(animal.tag); // some hash number
 }
 ```
Could it do this?

```
double fun(Dog d) {
    // Operations specific to Dog
}

double fun(Cat c) {
    // Operations specific to Cat
}

auto a1 = Animal(Cat());
fun(a1);
auto a2 = Animal(Dog());
fun(a2);
```

Just one example where this is useful is with dates. You might have an int, two ints, or a string. Handling those cases with templates or structs is less than an optimal experience in terms of verbosity, ugliness, and complexity.
Jun 21
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 22/06/2025 3:06 AM, Lance Bachmeier wrote:
 On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 I've gone back to the drawing board on sumtypes, and I had some ideas 
 yesterday based upon feedback from the last couple of years.
 It would be useful to include a comparison with std.sumtype. Explicitly 
 state the new functionality this enables and the improvement in cases 
 it overlaps.

std.sumtype has to work around a lot of problems in the type system, that will continue to exist.

It can't introduce new concepts like naming, although I'm wondering if that should go (which I hate having to do).

Making it first class is also a win for usability.

I've thought about it, however this isn't a full DIP, just enough to evaluate it, and I've already found a couple things that I want to explore.
 ```d
 void main() {
     Animal animal = Animal(Cat());

     animal.dog = Dog();
     cast(Cat)animal = Cat();

     writeln(cast(Cat)animal);

     animal.None = true;
     assert(animal is None);
 }

 void someFunc(Animal animal) {
     import std.stdio;
     writeln(animal.tag); // some hash number
 }
 ```
 Could it do this?
 
 ```
 double fun(Dog d) {
     // Operations specific to Dog
 }
 
 double fun(Cat c) {
     // Operations specific to Cat
 }
 
 auto a1 = Animal(Cat());
 fun(a1);
 auto a2 = Animal(Dog());
 fun(a2);
 ```
 
 Just one example where this is useful is with dates. You might have an 
 int, two ints, or a string. Handling those cases with templates or 
 structs is less than an optimal experience in terms of verbosity, 
 ugliness, and complexity.
No.

Modifying overload and symbol resolution to understand matching like this would make Walter a very unhappy person. He already complains about how complex it is.

I for one do not wish to poke that nest of problems that is symbol resolution.
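
For reference, with the existing ``std.sumtype`` the dispatch has to be spelled out by hand, which is roughly what the question is trying to avoid. A minimal sketch, assuming empty ``Cat``/``Dog`` structs and placeholder return values that are not from the thread:

```d
import std.sumtype : SumType, match;

struct Cat {}
struct Dog {}
alias Animal = SumType!(Cat, Dog);

double fun(Dog d) { return 1.0; } // placeholder for Dog-specific work
double fun(Cat c) { return 2.0; } // placeholder for Cat-specific work

// Explicit forwarding overload: unpacks the Animal and calls the
// matching `fun`. Overload resolution itself is left untouched.
double fun(Animal a)
{
    return a.match!(
        (Dog d) => fun(d),
        (Cat c) => fun(c)
    );
}

void main()
{
    assert(fun(Animal(Cat())) == 2.0);
    assert(fun(Animal(Dog())) == 1.0);
}
```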
Jun 21
Paul Backus <snarwin gmail.com> writes:
On Saturday, 21 June 2025 at 20:01:13 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 std.sumtype has to work around a lot of problems in the type 
 system, that will continue to exist.

 It can't introduce new concepts like naming, although I'm 
 wondering if that should go (which I hate having to do).
The only reason std.sumtype can't introduce naming is due to backward compatibility requirements. In principle, there is nothing stopping a library sum type from having named members.
Jun 22