digitalmars.dip.ideas - Inline sumtype

Richard (Rikki) Andrew Cattermole <richard cattermole.co.nz> writes:

I've gone back to the drawing board on sumtypes, and I had some 
ideas yesterday based upon feedback from the last couple of years.

Unlike the other designs that have been proposed, this one is 
inline to the type definition instead of requiring a declaration. It 
gives enum declarations without a value (currently an error) a type 
that is not unique to the declaration.

Matching is not added here, but my previous DIP for them could be 
made to work for it.



A sumtype contains zero or more elements.

Each element may have a name.

An element type + name pair must be unique in the element list, 
and a name may only appear once.

Valid: ``alias S = sumtype (int, int i);``

Invalid: ``alias S = sumtype(int i, float i);``

Any element whose type is an enum type, will also have the 
element's name set to that of its identifier.



Two sumtypes may be merged together using the ``+`` operator.

```d
alias A = sumtype(int);
alias B = sumtype(float);
alias C = A + B;
```

And subtracted from with ``-``:

```d
alias C = sumtype(int, float);
alias B = sumtype(float);
alias A = C - B;
```

The normal restrictions on a sumtype's elements apply before 
and after a set operation has occurred.

Combine with alias assignment for fine grained control:

```d
alias Result = sumtype();

static foreach(Type; Input) {
	Result += sumtype(Type);
}
```

Duplicates are ignored during merging.



A sumtype may be constructed in one of four situations:

- Sumtype initialization syntax: ``Type(Expression)`` or 
``Type(name: Expression)``.
- Variable declaration: ``Type var = Expression;``
- Return: ``return Expression;``
- Function call: ``func(1);`` where ``void func(sumtype(int) 
param)``

For function calls, the argument to parameter matching will use 
conversion, and will be considered less of a match than an exact 
one.



A sumtype may be assigned to another that has a comparable 
element list.

```d
alias A = sumtype(int);
alias B = sumtype(int | string);

B b = A(5);
```

Assigning a sumtype to another will have preference over 
initialization.

```d
alias A = sumtype(int);
alias B = sumtype(int, A);

B initialization = 8;
B assignment = A(9);
```



An enum without a value, is given a non-unique type based upon 
its identifier.

```d
enum None;
pragma(msg, typeof(None)); // __enumtype("None")
static assert(is(typeof(None) == __enumtype));
```

An enum type may be used as its type, when the grammar requires a 
type:

```d
None none = None;
```

As its size is zero, any variables that are of these types are 
dummy and can replace the existing practice of ``void[0] 
storage;``. They do not contribute to field layouts.

An assignment of ``true`` will succeed, although it will be a no-op.

```d
None none = true;
```

The enum type ``__enumtype("None")`` will have an instance in 
object.d.

The mangling of an enum type does not include the module it is in.



To check the type of a sumtype against a known type, use an is 
expression.

``assert(sumtype(int).init is int);``

Other comparisons, e.g. ``==``, are done by a compiler hook that 
matches and compares the values if the tags match.



Casting a sumtype results in a read barrier to check the tag 
matches the requested type on read. There is no read barrier on 
write.

The result of a cast cannot be passed around by ref, nor can a 
pointer to it be taken.



A sumtype holds the properties:

- ``tag`` that of the tagged union
- ``storage`` the block of storage (``@system``), typed as 
``void[X]``
- ``types`` holds a sequence of types, which are the types for 
the elements.
- ``names`` holds a sequence of strings, which are the names for 
the elements.
- ``copyconstructor``, the function pointer for the copy 
constructor (``@system``).
- ``destructor``, the function pointer for the destructor 
(``@system``).

All properties are assignable in non-``@safe`` code except 
``names`` and ``types``.



The layout of a sumtype is variable length; it is as follows:

- ``size_t`` tag
- ``void function(ref new_, ref old_)`` Copy constructor
- ``void function(ref old_)`` Destructor
- ``void[X]`` Storage

The tag is a hash of the fully qualified name of the type + name.

The copy constructor and destructor will work, as long as their 
arguments are pointing at storage. In practice the compiler will 
need to inject a null check before calling. The calling 
convention of the functions matches that of methods.

Attributes used on the copy constructor and destructor will be 
the common denominator among all the elements that have copy 
constructors and destructors, respectively.

These are optional if none of the element types use a copy 
constructor or destructor.
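
As a rough illustration of the layout above, written in today's D rather than the proposed syntax, the fields could be laid out like this. The struct name ``SumTypeLayout``, the field names, and the use of ``void*`` parameters in place of ``ref`` are assumptions made for this sketch only; the proposal leaves those details open.

```d
// Hypothetical stand-in for the proposed layout; not part of the proposal.
struct SumTypeLayout(size_t X)
{
    size_t tag;                                     // hash of type + element name
    void function(void* new_, void* old_) copyCtor; // null if no element needs one
    void function(void* old_) dtor;                 // null if no element needs one
    void[X] storage;                                // room for the largest element
}

void main()
{
    alias L = SumTypeLayout!16;

    static assert(L.tag.offsetof == 0);
    static assert(L.storage.sizeof == 16);

    L value;
    // The injected null check mentioned above would look roughly like this.
    if (value.dtor !is null)
        value.dtor(value.storage.ptr);
    assert(value.dtor is null);
}
```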



```diff
BasicType:
+    SumType

TypeSpecialization:
+    sumtype
+    __enumtype

+ SumType:
+    sumtype ( SumTypeElements|opt )

+ SumTypeElements:
+    SumTypeElements '|' SumTypeElement
+    SumTypeElement

+ SumTypeElement:
+    Type Identifier
+    Type
```



```d
import std.stdio : writeln; // main below calls writeln

enum None;
alias Animal = sumtype(None | Cat | Dog dog);

struct Cat {
}

struct Dog {
}

void main() {
	Animal animal = Animal(Cat());
	
	animal.dog = Dog();
	cast(Cat)animal = Cat();

	writeln(cast(Cat)animal);

	animal.None = true;
	assert(animal is None);
}

void someFunc(Animal animal) {
	import std.stdio;
	writeln(animal.tag); // some hash number
}
```

Jun 20

monkyyy <crazymonkyyy gmail.com> writes:

On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 ```d
 alias A = sumtype(int);
 alias B = sumtype(float);
 alias C = A + B;
 ```

It seems silly to me to consume alias operators on an untested 
and unimplemented concept when it's not like they couldn't be used 
elsewhere.

Why not this?

```d
alias seq(T...)=T;
alias a=seq!int;
alias b=seq!float;
alias c=a+b;
```

```d
alias a=seq!();
a+=int;
a+=float;
```

Jun 20

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 21/06/2025 6:27 AM, monkyyy wrote:
 On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 ```d
 alias A = sumtype(int);
 alias B = sumtype(float);
 alias C = A + B;
 ```

 
 It seems silly to me to consume alias operators on an untested and 
 unimplemented concept when it's not like they couldn't be used elsewhere.
 
 Why not this?
 
 ```d
 alias seq(T...)=T;
 alias a=seq!int;
 alias b=seq!float;
 alias c=a+b;
 ```
 
 ```d
 alias a=seq!();
 a+=int;
 a+=float;
 ```

The operators are defined on the sumtype, nothing else.

Alias is not a typedef, it does not have its own unique type in the type 
system, so there is nothing for the operators to attach to in the language.

The alias itself disappears from the type system. The type system can 
only see what it has been aliased to; it's a direct replacement, and yes, 
there is at least one bug due to this.

Have I explained this well enough?
I get the feeling that you should have come across this by now, and 
therefore this explanation won't be enough.

Jun 20

monkyyy <crazymonkyyy gmail.com> writes:

On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 - ``size_t`` tag

Nonsense. Why would someone need 64 bits of types? That won't ever 
compile.
Enums are at least only ints: 
https://dlang.org/spec/enum.html#enum_properties
(though they should also just be bytes that upgrade to shorts if you 
ever go beyond 255 elements)

Jun 20

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 21/06/2025 6:43 AM, monkyyy wrote:
 On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 - ``size_t`` tag

 
 Nonsense. Why would someone need 64 bits of types? That won't ever compile.
 Enums are at least only ints: 
 https://dlang.org/spec/enum.html#enum_properties
 (though they should also just be bytes that upgrade to shorts if you ever 
 go beyond 255 elements)

Hashes in D are typically size_t or ulong.

The use of size_t makes sense as it takes up a full register, and we may 
as well use all of it to get more accuracy. The rest of the layout won't 
merge into it.

https://github.com/dlang/dmd/blob/f0541d65ba777e6f03499bcb5c0c59da8ce94050/druntime/src/object.d#L137

https://github.com/dlang/dmd/blob/f0541d65ba777e6f03499bcb5c0c59da8ce94050/druntime/src/core/internal/hash.d#L132

Jun 20

monkyyy <crazymonkyyy gmail.com> writes:

On Friday, 20 June 2025 at 18:46:38 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 
 Hashes in D are typically size_t or ulong.

why would it be a hash?

snar's being overly fancy here, but this will be a ubyte most of 
the time: 
https://github.com/dlang/phobos/blob/832cc465998b1ea77051cd3fd014b544442a4f8c/std/sumtype.d#L288

In my versions I think I used an enum as is, meaning it would've 
been an int.

in an ideal world:
```d
struct sumtype(T...){
   enum Tag=enum{ static foreach(A;T){...
   union Union{ static foreach(A;T){...
   Tag tag;
   Union myunion;
   ...
}
```
It's not a hash, it's an enum paired with a union; then you can 
iterate T while having a tag with `static foreach(I,A;T){ 
if(I==tag){...`; simple and sane.

Ideally an enum with fewer than 255 elements would be a ubyte, so you 
get to make simplicity vs. idealness tradeoffs; but 64 bits is 
insane, and breaking web APIs if a client is 32-bit is also insane. 
size_t is awful in general. The D compiler breaks down with deeply 
nested template hell; how do you plan on generating 2^64 types 
when there's a 100-depth recursion limit and they all have to be 
in memory?

Jun 20

monkyyy <crazymonkyyy gmail.com> writes:

On Friday, 20 June 2025 at 19:17:33 UTC, monkyyy wrote:
  The D compiler breaks down with deeply nested template hell; 
 how do you plan on generating 2^64 types when there's a 100-depth 
 recursion limit and they all have to be in memory?

Also, isn't uniqueness filtering an O(n^2) algorithm? At n of 2^64, 
you're not finishing that filter in the runtime of the universe.

Jun 20

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 21/06/2025 7:17 AM, monkyyy wrote:
 On Friday, 20 June 2025 at 18:46:38 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 Hashes in D are typically size_t or ulong.

 why would it be a hash?

When you assign one sumtype to another, where the second has a different 
element set, you have to match and then translate the previous tag to 
the new one.

If you are doing this often, that is a lot of unnecessary work. A hash 
will work in both sumtypes and is therefore a direct copy.

This was suggested to me by Jacob Carlborg in the context of value type 
exceptions and is a brilliant way to minimize the cost.
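
A minimal sketch of why a hash tag makes the copy direct, written in today's D with hand-rolled structs standing in for ``sumtype(int)`` and ``sumtype(int, string)``. The helper ``tagOf``, the choice of FNV-1a, and the struct names ``A``, ``B`` and ``widen`` are all assumptions for illustration; the proposal does not name a hash function.

```d
// Hypothetical helper: 64-bit FNV-1a over "type name + element name".
// The proposal does not say which hash function would be used.
size_t tagOf(string typeName, string elementName = "")
{
    ulong h = 14695981039346656037UL;
    foreach (char c; typeName ~ "\0" ~ elementName)
    {
        h ^= c;
        h *= 1099511628211UL;
    }
    return cast(size_t) h;
}

enum size_t intTag = tagOf("int"); // evaluated at compile time

// Hand-written stand-ins for sumtype(int) and sumtype(int, string).
struct A { size_t tag; int value; }
struct B { size_t tag; union { int i; string s; } }

B widen(A a)
{
    B b;
    b.tag = a.tag; // the same hash is valid in both types: no translation step
    b.i = a.value;
    return b;
}

void main()
{
    auto a = A(intTag, 5);
    auto b = widen(a);
    assert(b.tag == intTag);
    assert(b.i == 5);
}
```

With an index-based tag, ``widen`` would instead need a switch mapping each index in ``A`` to the corresponding index in ``B``.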

 snar's being overly fancy here, but this will be a ubyte most of the time: 
 https://github.com/dlang/phobos/blob/832cc465998b1ea77051cd3fd014b544442a4f8c/std/sumtype.d#L288
 
 In my versions I think I used an enum as is, meaning it would've been an int.
 
 in an ideal world:
 ```d
 struct sumtype(T...){
    enum Tag=enum{ static foreach(A;T){...
    union Union{ static foreach(A;T){...
    Tag tag;
    Union myunion;
    ...
 }
 ```
 It's not a hash, it's an enum paired with a union; then you can iterate T 
 while having a tag with `static foreach(I,A;T){ if(I==tag){...`; simple 
 and sane.
 
 Ideally an enum with fewer than 255 elements would be a ubyte, so you get to 
 make simplicity vs. idealness tradeoffs; but 64 bits is insane, and breaking 
 web APIs if a client is 32-bit is also insane. size_t is awful in general. 
 The D compiler breaks down with deeply nested template hell; how do you plan 
 on generating 2^64 types when there's a 100-depth recursion limit and 
 they all have to be in memory?

I'm not.

You are presupposing that the tag is an offset, not a hash.

If it was an offset I would indeed make it variable sized, so only the 
amount needed would be used.

The benefit of using an offset is branch tables will have a better 
chance to work. But I'm not convinced that it is worth it over getting 
the cheaper copies.

Jun 20

MrSmith33 <mrsmith33 yandex.ru> writes:

On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 The tag is a hash of the fully qualified name of the type + 
 name.

* What happens in the case of collision?
* Can collision be detected at compile-time?

Jun 21

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 22/06/2025 12:58 AM, MrSmith33 wrote:
 On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 The tag is a hash of the fully qualified name of the type + name.

 
 * What happens in the case of collision?
 * Can collision be detected at compile-time?

I see no reason it can't be detected.

However, in practice the hash is so large compared to the number of elements 
in the set that the chances of it hitting something when AAs don't is pretty low.
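
A sketch of what compile-time detection could look like, using CTFE and ``static assert`` in today's D. The names ``tagOf`` and ``hasCollision``, the FNV-1a hash, and the example element names are assumptions for illustration, not part of the proposal.

```d
// Hypothetical tag function: 64-bit FNV-1a over the element's name.
size_t tagOf(string s)
{
    ulong h = 14695981039346656037UL;
    foreach (char c; s)
    {
        h ^= c;
        h *= 1099511628211UL;
    }
    return cast(size_t) h;
}

// True if two different names produce the same tag under `hash`.
// Pairwise comparison is quadratic, which is fine for the handful of
// elements a single sumtype would have.
bool hasCollision(alias hash)(const string[] names)
{
    foreach (i, a; names)
        foreach (b; names[i + 1 .. $])
            if (a != b && hash(a) == hash(b))
                return true;
    return false;
}

// Checked while compiling: a collision would be a compile error.
static assert(!hasCollision!tagOf(["int", "float", "app.Cat", "app.Dog dog"]));

// A deliberately weak "hash" (the string length) shows the check firing.
static assert(hasCollision!(s => s.length)(["int", "cat"]));

void main() {}
```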

Jun 21

Lance Bachmeier <no spam.net> writes:

On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 I've gone back to the drawing board on sumtypes, and I had some 
 ideas yesterday based upon feedback from the last couple of 
 years.

It would be useful to include a comparison with std.sumtype. 
Explicitly state the new functionality this enables and the 
improvement in cases it overlaps.

 ```d
 void main() {
 	Animal animal = Animal(Cat());
 	
 	animal.dog = Dog();
 	cast(Cat)animal = Cat();

 	writeln(cast(Cat)animal);

 	animal.None = true;
 	assert(animal is None);
 }

 void someFunc(Animal animal) {
 	import std.stdio;
 	writeln(animal.tag); // some hash number
 }
 ```

Could it do this?

```
double fun(Dog d) {
// Operations specific to Dog
}

double fun(Cat c) {
// Operations specific to Cat
}


auto a1 = Animal(Cat());
fun(a1);

auto a2 = Animal(Dog());
fun(a2);
```

Just one example where this is useful is with dates. You might 
have an int, two ints, or a string. Handling those cases with 
templates or structs is less than an optimal experience in terms 
of verbosity, ugliness, and complexity.

Jun 21

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 22/06/2025 3:06 AM, Lance Bachmeier wrote:
 On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 I've gone back to the drawing board on sumtypes, and I had some ideas 
 yesterday based upon feedback from the last couple of years.

 
 It would be useful to include a comparison with std.sumtype. Explicitly 
 state the new functionality this enables and the improvement in cases it 
 overlaps.

std.sumtype has to work around a lot of problems in the type system, 
that will continue to exist.

It can't introduce new concepts like naming, although I'm wondering if 
that should go (which I hate having to do).

Making it first class is also a win for usability.

I've thought about it; however, this isn't a full DIP, just enough to 
evaluate it, and I've already found a couple of things that I want to explore.

 ```d
 void main() {
     Animal animal = Animal(Cat());

     animal.dog = Dog();
     cast(Cat)animal = Cat();

     writeln(cast(Cat)animal);

     animal.None = true;
     assert(animal is None);
 }

 void someFunc(Animal animal) {
     import std.stdio;
     writeln(animal.tag); // some hash number
 }
 ```

 
 Could it do this?
 
 ```
 double fun(Dog d) {
 // Operations specific to Dog
 }
 
 double fun(Cat c) {
 // Operations specific to Cat
 }
 
 
 auto a1 = Animal(Cat());
 fun(a1);
 
 auto a2 = Animal(Dog());
 fun(a2);
 ```
 
 Just one example where this is useful is with dates. You might have an 
 int, two ints, or a string. Handling those cases with templates or 
 structs is less than an optimal experience in terms of verbosity, 
 ugliness, and complexity.

No.

Modifying overload and symbol resolution to understand matching like 
this would make Walter a very unhappy person. He already complains about 
how complex it is.

I for one do not wish to poke that nest of problems that is symbol 
resolution.
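
For comparison, something close to this is possible with today's std.sumtype by handing the overload set to ``match`` explicitly rather than relying on overload resolution. A minimal sketch, assuming ``Animal`` is redefined as a library ``SumType`` and using placeholder return values:

```d
import std.stdio : writeln;
import std.sumtype : SumType, match;

struct Cat {}
struct Dog {}

// Library sum type standing in for the proposed sumtype(Cat | Dog).
alias Animal = SumType!(Cat, Dog);

// Module-level overloads; the return values are placeholders.
double fun(Cat c) { return 1.0; }
double fun(Dog d) { return 2.0; }

void main()
{
    Animal a1 = Cat();
    Animal a2 = Dog();

    // match considers every overload of fun as a candidate handler.
    writeln(a1.match!fun);
    writeln(a2.match!fun);
}
```

The call site is ``a1.match!fun`` instead of ``fun(a1)``, so symbol resolution itself is untouched.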

Jun 21

Paul Backus <snarwin gmail.com> writes:

On Saturday, 21 June 2025 at 20:01:13 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 std.sumtype has to work around a lot of problems in the type 
 system, that will continue to exist.

 It can't introduce new concepts like naming, although I'm 
 wondering if that should go (which I hate having to do).

The only reason std.sumtype can't introduce naming is due to 
backward compatibility requirements. In principle, there is 
nothing stopping a library sum type from having named members.

Jun 22
