digitalmars.dip.ideas - Inline sumtype
- Richard (Rikki) Andrew Cattermole (171/171) Jun 20 I've gone back to the drawing board on sumtypes, and I had some
- monkyyy (17/22) Jun 20 It seems silly to me to consume alias operators on an untested
- Richard (Rikki) Andrew Cattermole (10/35) Jun 20 The operators are defined on the sumtype, nothing else.
- monkyyy (8/9) Jun 20 Nonsense why would someone need 64 bits of types, that wont ever
- Richard (Rikki) Andrew Cattermole (7/16) Jun 20 Hashes in D are typically size_t or ulong.
- monkyyy (28/30) Jun 20 why would it be a hash?
- monkyyy (3/6) Jun 20 Also isnt uniqueness filtering a O(n^2) algorithm? at n pf 2^64,
- Richard (Rikki) Andrew Cattermole (15/47) Jun 20 When you assign one sumtype to another, where the second has a different...
- MrSmith33 (4/6) Jun 21 * What happens in the case of collision?
- Richard (Rikki) Andrew Cattermole (4/10) Jun 21 I see no reason it can't be detected.
- Lance Bachmeier (22/39) Jun 21 It would be useful to include a comparison with std.sumtype.
- Richard (Rikki) Andrew Cattermole (14/63) Jun 21 std.sumtype has to work around a lot of problems in the type system,
- Paul Backus (5/9) Jun 22 The only reason std.sumtype can't introduce naming is due to
I've gone back to the drawing board on sumtypes, and I had some ideas yesterday based upon feedback from the last couple of years.

Unlike the other designs that have been proposed, this one is inline in the type definition instead of having a declaration. It gives enum declarations without a value (currently an error) a type that is not unique to the declaration.

Matching is not added here, but my previous DIP for matching could be made to work with it.

A sumtype contains zero or more elements. Each element may have a name. An element's type + name pair must be unique in the element list, and a name may only appear once.

Valid: ``alias S = sumtype(int, int i);``

Invalid: ``alias S = sumtype(int i, float i);``

Any element whose type is an enum type will also have the element's name set to that of its identifier.

Two sumtypes may be merged together using the ``+`` operator.

```d
alias A = sumtype(int);
alias B = sumtype(float);
alias C = A + B;
```

And subtracted from with ``-``:

```d
alias C = sumtype(int, float);
alias B = sumtype(float);
alias A = C - B;
```

The normal restrictions on sumtype elements apply before and after a set operation has occurred.

Combine with alias assignment for fine-grained control:

```d
alias Result = sumtype();

static foreach(Type; Input) {
    Result += sumtype(Type);
}
```

Duplicates are ignored during merging.

A sumtype may be constructed in one of four situations:

- Sumtype initialization syntax: ``Type(Expression)`` or ``Type(name: Expression)``.
- Variable declaration: ``Type var = Expression;``
- Return: ``return Expression;``
- Function call: ``func(1);`` where ``void func(sumtype(int) param)``

For function calls, argument-to-parameter matching will use conversion, and will be considered less of a match than an exact one.

A sumtype may be assigned to another that has a compatible element list.
```d
alias A = sumtype(int);
alias B = sumtype(int | string);

B b = A(5);
```

Assigning a sumtype to another will have preference over initialization.

```d
alias A = sumtype(int);
alias B = sumtype(int, A);

B initialization = 8;
B assignment = A(9);
```

An enum without a value is given a non-unique type based upon its identifier.

```d
enum None;
pragma(msg, typeof(None)); // __enumtype("None")
static assert(is(typeof(None) == __enumtype));
```

An enum type may be used as its type when the grammar requires a type:

```d
None none = None;
```

As its size is zero, any variables of these types are dummies and can replace the existing practice of ``void[0] storage;``. They do not contribute to field layouts.

An assignment of ``true`` will succeed, although it will be a no-op.

```d
None none = true;
```

The enum type ``__enumtype("None")`` will have an instance in object.d. The mangling of an enum type does not include the module it is in.

To check the type of a sumtype against a known type, use an is expression: ``assert(sumtype(int).init is int);``

Other comparisons, e.g. ``==``, are done by a compiler hook that matches and compares the values if the tags match.

Casting a sumtype results in a read barrier that checks, on read, that the tag matches the requested type. There is no read barrier on write. The result of a cast cannot be passed by ref or have a pointer taken to it.

A sumtype holds the properties:

- ``tag``, that of the tagged union
- ``storage``, the block of storage (``@system``), typed as ``void[X]``
- ``types``, a sequence of types, which are the types of the elements
- ``names``, a sequence of strings, which are the names of the elements
- ``copyconstructor``, the function pointer for the copy constructor (``@system``)
- ``destructor``, the function pointer for the destructor (``@system``)

All properties are assignable in non-``@safe`` code except ``names`` and ``types``.
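The ``void[0]`` practice mentioned above can be checked with today's D: a zero-length static array occupies no space in a struct, while an empty struct still takes one byte (plus padding). A minimal sketch; the struct names are made up for illustration and are not part of the proposal.

```d
// Current practice: a zero-sized marker field.
struct WithVoidArray
{
    int x;
    void[0] marker; // zero-length static array: no space is allocated
}

// An empty struct is not a substitute: it has size 1, so it changes the layout.
struct Empty { }

struct WithEmptyStruct
{
    int x;
    Empty marker; // 1 byte, then padded up to int alignment
}

static assert(WithVoidArray.sizeof == int.sizeof);
static assert(Empty.sizeof == 1);
static assert(WithEmptyStruct.sizeof == 2 * int.sizeof);

void main() { }
```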
The layout of a sumtype is variable-length, and is as follows:

- ``size_t`` tag
- ``void function(ref new_, ref old_)`` copy constructor
- ``void function(ref old_)`` destructor
- ``void[X]`` storage

The tag is a hash of the fully qualified name of the type + name.

The copy constructor and destructor will work as long as their arguments point at storage. In practice the compiler will need to inject a null check before calling them. The calling convention of the functions matches that of methods.

The attributes on the copy constructor and destructor will be the common denominator among all the elements that have copy constructors and destructors, respectively. These are optional if none of the element types use a copy constructor or destructor.

```diff
 BasicType:
+    SumType

 TypeSpecialization:
+    sumtype
+    __enumtype

+SumType:
+    sumtype ( SumTypeElements|opt )

+SumTypeElements:
+    SumTypeElements '|' SumTypeElement
+    SumTypeElement

+SumTypeElement:
+    Type Identifier
+    Type
```

```d
import std.stdio;

enum None;
alias Animal = sumtype(None | Cat | Dog dog);

struct Cat { }
struct Dog { }

void main() {
    Animal animal = Animal(Cat());
    animal.dog = Dog();

    cast(Cat)animal = Cat();
    writeln(cast(Cat)animal);

    animal.None = true;
    assert(animal is None);
}

void someFunc(Animal animal) {
    writeln(animal.tag); // some hash number
}
```
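The layout and tag described above can be modelled roughly in today's D. This is a minimal sketch under stated assumptions: the proposal does not name a hash function, so FNV-1a (64-bit) stands in because it is short and works in CTFE; the ``":"`` separator between type name and element name is also an assumption; and ``void*`` replaces the untyped ``ref`` parameters, which are not valid D today. ``fnv1a``, ``tagOf`` and ``SumTypeLayout`` are hypothetical names.

```d
import std.traits : fullyQualifiedName;

// FNV-1a (64-bit): stand-in hash, not specified by the proposal.
ulong fnv1a(string s)
{
    ulong h = 0xcbf29ce484222325UL; // FNV offset basis
    foreach (char c; s)
    {
        h ^= c;
        h *= 0x100000001b3UL; // FNV prime
    }
    return h;
}

// Tag = hash of (fully qualified type name + element name). The ":" separator
// is an assumption so that type "a.B" + name "c" differs from type "a.Bc".
enum size_t tagOf(T, string name = "") =
    cast(size_t) fnv1a(fullyQualifiedName!T ~ ":" ~ name);

// Rough model of the proposed layout for X bytes of storage.
struct SumTypeLayout(size_t X)
{
    size_t tag;
    void function(void* new_, void* old_) copyConstructor; // null if unused
    void function(void* old_) destructor;                  // null if unused
    void[X] storage;
}

struct Dog { }

void main()
{
    import std.stdio : writeln;

    SumTypeLayout!(Dog.sizeof) animal;
    animal.tag = tagOf!(Dog, "dog");
    writeln(animal.tag); // some hash number
}
```

Because the tag depends only on the element's type and name, the same element gets the same tag in every sumtype that contains it; this is the property the hash-versus-offset discussion later in the thread turns on.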
Jun 20
On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:
> ```d
> alias A = sumtype(int);
> alias B = sumtype(float);
> alias C = A + B;
> ```

It seems silly to me to consume alias operators on an untested and unimplemented concept when it's not like it couldn't be used elsewhere. Why not this?

```d
alias seq(T...) = T;
alias a = seq!int;
alias b = seq!float;
alias c = a + b;
```

```d
alias a = seq!();
a += int;
a += float;
```
Jun 20
On 21/06/2025 6:27 AM, monkyyy wrote:
> It seems silly to me to consume alias operators on an untested and unimplemented concept when it's not like it couldn't be used elsewhere

The operators are defined on the sumtype, nothing else.

Alias is not a typedef; it does not have its own unique type in the type system, so there is nothing for the operators to attach to in the language. The alias itself disappears from the type system. The type system can only see what it has been aliased to; it's a direct replacement, and yes, there is at least one bug due to this.

Have I explained this well enough? I get the feeling that you should have come across this by now, and therefore this explanation won't be enough.
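For comparison with monkyyy's suggestion, the merge and subtract behaviour can already be approximated on plain type lists with ``std.meta``, as templates rather than alias operators. A minimal sketch; ``TypeList``, ``Merge`` and ``Subtract`` are hypothetical names, and the wrapper struct is needed because a bare ``AliasSeq`` auto-expands and cannot be passed around as a single template argument.

```d
import std.meta : Filter, NoDuplicates, staticIndexOf;

// Wrapper so an element list can be passed as one template argument.
struct TypeList(T...)
{
    alias Types = T;
}

// Mirrors the proposed `A + B`: concatenate, then drop duplicates.
alias Merge(A, B) = TypeList!(NoDuplicates!(A.Types, B.Types));

// Mirrors the proposed `A - B`: keep the elements of A that are absent from B.
template Subtract(A, B)
{
    enum absentFromB(T) = staticIndexOf!(T, B.Types) == -1;
    alias Subtract = TypeList!(Filter!(absentFromB, A.Types));
}

alias C = Merge!(TypeList!int, TypeList!float);
static assert(is(C == TypeList!(int, float)));
static assert(is(Subtract!(C, TypeList!float) == TypeList!int));

void main() { }
```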
Jun 20
On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:
> - ``size_t`` tag

Nonsense, why would someone need 64 bits of types? That won't ever compile. Enums are at least only ints: https://dlang.org/spec/enum.html#enum_properties (though they should also just be bytes that upgrade to shorts if you ever go longer than 255 elements).
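The "bytes that upgrade to shorts" idea for an index-style tag can be expressed at compile time. A minimal sketch; ``TagType`` is a hypothetical name, and it sizes an offset (index) tag, not the hash tag the proposal actually uses.

```d
// Smallest unsigned integer type able to index `count` element types.
template TagType(size_t count)
{
    static if (count <= ubyte.max + 1)
        alias TagType = ubyte;
    else static if (count <= ushort.max + 1)
        alias TagType = ushort;
    else
        alias TagType = uint;
}

static assert(is(TagType!3 == ubyte));
static assert(is(TagType!300 == ushort));

void main() { }
```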
Jun 20
On 21/06/2025 6:43 AM, monkyyy wrote:
> Nonsense why would someone need 64 bits of types

Hashes in D are typically size_t or ulong. The use of size_t makes sense as it takes up a full register, and we may as well use all of it to get more accuracy. The rest of the layout won't merge into it.

https://github.com/dlang/dmd/blob/f0541d65ba777e6f03499bcb5c0c59da8ce94050/druntime/src/object.d#L137
https://github.com/dlang/dmd/blob/f0541d65ba777e6f03499bcb5c0c59da8ce94050/druntime/src/core/internal/hash.d#L132
Jun 20
On Friday, 20 June 2025 at 18:46:38 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Hashes in D are typically size_t or ulong.

Why would it be a hash? snar's being overly fancy here, but this will be a ubyte most of the time: https://github.com/dlang/phobos/blob/832cc465998b1ea77051cd3fd014b544442a4f8c/std/sumtype.d#L288

In my versions I think I used an enum as-is, meaning it would've been an int. In an ideal world:

```d
struct sumtype(T...){
    enum Tag=enum{ static foreach(A;T){...
    union Union{ static foreach(A;T){...
    Tag tag;
    Union myunion;
    ...
}
```

It's not a hash; it's an enum paired with a union. Then you can iterate T while having the tag with `static foreach(I,A;T){ if(I==tag){...`; simple and sane.

Ideally an enum with fewer than 255 elements would be a ubyte, so you get to make simplicity vs. idealness tradeoffs; but 64 bits is insane. Breaking web APIs if a client is 32-bit is also insane. size_t is awful in general.

The D compiler breaks down with deeply nested template hell; how do you plan on generating 64^2 types when there's a 100-depth recursion limit and they all have to be in memory?
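The enum-plus-union approach described above can be made runnable in a small space. This is a minimal sketch, not std.sumtype's implementation: the tag is the element's index in ``T`` stored in a ``ubyte`` (assuming at most 256 element types and no duplicates), and ``Tagged`` and ``visit`` are hypothetical names.

```d
import std.conv : to;
import std.meta : staticIndexOf;
import std.stdio : writeln;

// Index tag paired with a union; assumes T has no duplicate types.
struct Tagged(T...)
{
    static assert(T.length <= 256, "too many element types for a ubyte tag");

    union Storage
    {
        // One member per element type: m0, m1, ...
        static foreach (I, A; T)
            mixin("A m" ~ I.to!string ~ ";");
    }

    ubyte tag;
    Storage storage;

    this(A)(A value)
    {
        enum I = staticIndexOf!(A, T);
        static assert(I != -1, A.stringof ~ " is not an element type");
        tag = I;
        mixin("storage.m" ~ I.to!string ~ " = value;");
    }
}

// Dispatch by iterating T and comparing each index against the runtime tag.
void visit(alias handler, T...)(ref Tagged!T t)
{
    static foreach (I, A; T)
        if (I == t.tag)
            handler(mixin("t.storage.m" ~ I.to!string));
}

void main()
{
    auto t = Tagged!(int, string)("hello");
    visit!((x) => writeln(x))(t); // prints hello
}
```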
Jun 20
On Friday, 20 June 2025 at 19:17:33 UTC, monkyyy wrote:
> The d compiler breaks down with deeply nested template hell, how do you plan on generating 64^2 types when theres a 100 depth recursion limit and they all have to be in memory.

Also, isn't uniqueness filtering an O(n^2) algorithm? At n of 2^64, you're not finishing that filter in the runtime of the universe.
Jun 20
On 21/06/2025 7:17 AM, monkyyy wrote:
> why would it be a hash?

When you assign one sumtype to another, where the second has a different element set, you have to match and then translate the previous tag to the new one. If you are doing this often, that is a lot of unnecessary work. A hash will work in both sumtypes and is therefore a direct copy. This was suggested to me by Jacob Carlborg in the context of value type exceptions and is a brilliant way to minimize the cost.

> how do you plan on generating 64^2 types when theres a 100 depth recursion limit and they all have to be in memory.

I'm not. You are presupposing that the tag is an offset, not a hash. If it were an offset, I would indeed make it variable-sized, so only the amount needed would be used. The benefit of using an offset is that branch tables will have a better chance to work. But I'm not convinced that it is worth it over getting the cheaper copies.
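The translation cost described above can be made concrete. With offset (index) tags, converting between sumtypes whose element lists differ needs a remapping table; with hash tags, an element's tag is the same in every sumtype, so the tag is copied unchanged. A minimal sketch of the offset side only; ``TypeList`` and ``offsetTranslation`` are hypothetical names.

```d
import std.meta : staticIndexOf;

// Wrapper so each element list can be passed as one template argument.
struct TypeList(T...)
{
    alias Types = T;
}

// table[srcTag] is the tag the same element has in Dst.
int[] offsetTranslation(Src, Dst)() pure
{
    int[] table;
    static foreach (A; Src.Types)
    {
        static assert(staticIndexOf!(A, Dst.Types) != -1,
            A.stringof ~ " is missing from the destination sumtype");
        table ~= staticIndexOf!(A, Dst.Types);
    }
    return table;
}

alias Small = TypeList!(int, string);
alias Big = TypeList!(string, int, float);

// Built at compile time, but every conversion still pays a lookup.
immutable int[] smallToBig = offsetTranslation!(Small, Big)();

void main()
{
    int srcTag = 0;                  // Small holds an int
    int dstTag = smallToBig[srcTag]; // int is index 1 in Big
    assert(dstTag == 1);
}
```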
Jun 20
On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:
> The tag is a hash of the fully qualified name of the type + name.

* What happens in the case of collision?
* Can collision be detected at compile-time?
Jun 21
On 22/06/2025 12:58 AM, MrSmith33 wrote:
> * What happens in the case of collision?
> * Can collision be detected at compile-time?

I see no reason it can't be detected. However, in practice the hash is so large compared to the number of elements in the set that the chance of it colliding when AAs don't is pretty low.
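Compile-time detection is straightforward because all element types and names are known during compilation. A minimal sketch under stated assumptions: FNV-1a (64-bit) stands in for the unspecified hash, the ``":"`` separator is an assumption, and ``Element`` and ``hasTagCollision`` are hypothetical names. The check compares every pair of element tags in CTFE, so a collision becomes a ``static assert`` failure.

```d
import std.traits : fullyQualifiedName;

// FNV-1a (64-bit) as a stand-in hash.
ulong fnv1a(string s)
{
    ulong h = 0xcbf29ce484222325UL;
    foreach (char c; s)
    {
        h ^= c;
        h *= 0x100000001b3UL;
    }
    return h;
}

// Element descriptor: a type plus an optional name.
struct Element(T, string n = "")
{
    alias Type = T;
    enum name = n;
}

// Pairwise comparison of all element tags; usable in CTFE.
bool hasTagCollision(Elems...)()
{
    ulong[] tags;
    static foreach (E; Elems)
        tags ~= fnv1a(fullyQualifiedName!(E.Type) ~ ":" ~ E.name);

    foreach (i; 0 .. tags.length)
        foreach (j; i + 1 .. tags.length)
            if (tags[i] == tags[j])
                return true;
    return false;
}

static assert(!hasTagCollision!(Element!int, Element!(int, "i"), Element!float)(),
    "tag collision between sumtype elements");

void main() { }
```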
Jun 21
On Friday, 20 June 2025 at 18:15:17 UTC, Richard (Rikki) Andrew Cattermole wrote:
> I've gone back to the drawing board on sumtypes, and I had some ideas yesterday based upon feedback from the last couple of years.

It would be useful to include a comparison with std.sumtype. Explicitly state the new functionality this enables and the improvement in cases where it overlaps.

Could it do this?

```
double fun(Dog d) {
    // Operations specific to Dog
}

double fun(Cat c) {
    // Operations specific to Cat
}

auto a1 = Animal(Cat());
fun(a1);
auto a2 = Animal(Dog());
fun(a2);
```

Just one example where this is useful is with dates. You might have an int, two ints, or a string. Handling those cases with templates or structs is a less than optimal experience in terms of verbosity, ugliness, and complexity.
Jun 21
On 22/06/2025 3:06 AM, Lance Bachmeier wrote:
> It would be useful to include a comparison with std.sumtype. Explicitly state the new functionality this enables and the improvement in cases it overlaps.

std.sumtype has to work around a lot of problems in the type system that will continue to exist. It can't introduce new concepts like naming, although I'm wondering if that should go (which I hate having to do). Making it first class is also a win for usability.

I've thought about it; however, this isn't a full DIP, just enough to evaluate it, and I've already found a couple of things that I want to explore.

> Could it do this?

No. Modifying overload and symbol resolution to understand matching like this would make Walter a very unhappy person. He already complains about how complex it is. I for one do not wish to poke that nest of problems that is symbol resolution.
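For the comparison Lance asks for: with the existing library std.sumtype, the calls in his example are written by dispatching explicitly with ``match``, rather than by overload resolution on the sum type itself. A minimal sketch; ``fun`` returns ``string`` instead of Lance's ``double`` only so the result is easy to check.

```d
import std.sumtype : SumType, match;

struct Cat { }
struct Dog { }

alias Animal = SumType!(Cat, Dog);

string fun(Dog d) { return "dog"; }
string fun(Cat c) { return "cat"; }

void main()
{
    auto a1 = Animal(Cat());
    auto a2 = Animal(Dog());

    // Overload resolution does not look through the sum type;
    // each case is routed to the right overload explicitly.
    assert(a1.match!((Cat c) => fun(c), (Dog d) => fun(d)) == "cat");
    assert(a2.match!((Cat c) => fun(c), (Dog d) => fun(d)) == "dog");
}
```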
Jun 21
On Saturday, 21 June 2025 at 20:01:13 UTC, Richard (Rikki) Andrew Cattermole wrote:
> std.sumtype has to work around a lot of problems in the type system, that will continue to exist. It can't introduce new concepts like naming, although I'm wondering if that should go (which I hate having to do).

The only reason std.sumtype can't introduce naming is due to backward compatibility requirements. In principle, there is nothing stopping a library sum type from having named members.
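Paul's point can be illustrated with std.sumtype as it exists. Wrapping each member in a small name-carrying struct gives named members, including two members of the same underlying type (compare ``sumtype(int, int i)`` in the proposal). A minimal sketch; ``Named``, ``Metres``, ``Feet`` and ``toMetres`` are hypothetical names, not part of Phobos.

```d
import std.math : isClose;
import std.sumtype : SumType, match;

// Attaches a name to an element so that equal underlying types stay distinct.
struct Named(string name, T)
{
    T value;
}

alias Metres = Named!("metres", double);
alias Feet = Named!("feet", double);
alias Length = SumType!(Metres, Feet);

double toMetres(Length len)
{
    return len.match!(
        (Metres m) => m.value,
        (Feet f) => f.value * 0.3048
    );
}

void main()
{
    assert(toMetres(Length(Metres(2.0))) == 2.0);
    assert(isClose(toMetres(Length(Feet(10.0))), 3.048));
}
```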
Jun 22