digitalmars.dip.ideas - Literal types
- Quirin Schroll (133/133) Jun 25 D’s array literals (seem to) have an unofficial type and an
- Richard (Rikki) Andrew Cattermole (8/8) Jun 28 I have a hunch that what you are seeing is actually just different
- Paul Backus (4/9) Jun 28 Obviously it doesn't have a type during parsing--types are
- IchorDev (33/38) Jul 04 There’s a really clever idea in here somewhere but I think you
- Nick Treleaven (16/38) Jul 05 How can that work? `dup` is a template function that uses the
- Nick Treleaven (2/10) Jul 05
- IchorDev (22/26) Jul 06 It doesn’t work with static arrays so it doesn’t even solve one
- Nick Treleaven (15/30) Jul 06 You said integer literal type suffixes could be useful:
- IchorDev (21/35) Jul 06 The problem is that in practice VRP only really cares about type.
- Nick Treleaven (14/54) Jul 06 I assumed you were showing that line as something that could be
- Quirin Schroll (42/83) Jul 07 While I’d like to have those for symmetry and consistency,
D’s array literals (seem to) have an unofficial type and an official type. What I mean by that is, `[1, 2]` has the official type `int[]` (if you do `typeof([1, 2])` it says `int[]`), but unofficially, it can do a bunch of things an `int[]` generally doesn’t support. You can assign the literal to `immutable int[]` because it’s unique. You can assign it to `ubyte[]` because VRP can prove each value entry is within the bounds of `ubyte`. And you can assign it to `int[2]` because the length is right and it won’t heap allocate. The last thing is important because it’s not a mere optimization that’s optional: it works in `@nogc`, which means this is specified.

I don’t know if I’m getting this 100% correct, but saying `[1, 2]` is an `int[]` isn’t the full story since it’s more like an `int[2]` that “decays” to an `int[]` (which usually ends up on the heap) unless you “catch” it early enough. Catching such a literal isn’t too difficult, the `staticArray` function does it. In a similar fashion, numeric literals decay: `typeof(1)` is `int`, but knowing something is `1` is so much more concrete than knowing it’s some `int`.

Add `__typeof` that returns a non-decayed type, one that has all the information the compiler officially retains, and add the respective results for this operator. That means, with `x` some run-time `int` variable, `__typeof([1, 2, x])` isn’t `int[]`. It’s something like `__array!(__integer!(int, 1), __integer!(int, 2), int)`.

Lastly, add a parameter storage class `__nodecay` or `@nodecay` that only matters when the type of the parameter is a template type parameter and is inferred; then, inference does not decay the type before binding the type parameter.

This would enable low-level functionality where one can special-case certain values, e.g. giving a function a specialized overload with a parameter type of `__typeof(0)` that’s only matched by the constant `0` or `int(0)` or an enum of type `int` with value `0`, but not `0u`, `0L`, or an enum of type `ubyte` with value 0, and of course not by anything that might have a value distinct from `0` such as `1` or a run-time value. Because those types would be templates (or behave as such), their arguments could be matched:

```d
void f0(T)(__integer!(T, 0) x) { }
void fi(int x) { }
alias f = f0;
alias f = fi;
f(0); // calls f0!(int)
f(1); // calls fi
```

(No need to change overload resolution: Partial ordering determines that `fi` can be called with `f0`’s (synthesized) parameter type `__integer!(int, 0)`, but `f0` can’t be called with `fi`’s parameter type `int`, so it’s more specific.)

This is akin to how `staticArray` can infer type and size from an array literal, just that it’s much finer grained. The `staticArray` function makes the argument decay insofar as the types of the entries must be unified. With this addition, that becomes optional: `__array!(int, string)` would be totally valid, it just can’t decay into a static or dynamic array type, so if it has to because it’s not caught early enough, that’s an error.

A tuple type of `int` and `string` could support an `opAssign` that takes an `__array!(int, string)` parameter and that allows `t = [1, "hi"]` even though `typeof([1, "hi"])` fails because it requires an array literal to decay into some `T[]` which this one can’t. Maybe a future edition could make `typeof([1, "hi"])` not be an error, but as of now, this can be used to test if two values have compatible type and we can’t just take that use case away.

This subsumes `enum` parameters and tuples to some degree. Recap: In another DIP idea, I suggested an enum parameter would be a compile-time constant that’s passed the same way a run-time parameter is passed to a function call. Essentially what Zig calls `comptime` parameters.

I also suggested in the past that static arrays are basically homogeneous tuples and that they could be generalized to tuples. That would mean the syntax would be brackets, not parentheses, but that neatly solves the 1-tuple case, since `(x)` must stay `x`, but `[x]` isn’t the same as `x`. The idea there rests on the same decay observation: there’s no inherent need to decay array literals to static arrays immediately and to dynamic arrays further.

Of course, this idea doesn’t solve `auto enum` from the `enum` parameter idea as neatly. It also doesn’t add any tuple decomposition support and syntax sugars one might want to have. In that context, `__array` is a bad name and it should be `__tuple` instead.

It does solve e.g. compile-time format strings and indexing into a tuple:

```d
int format(Fmt, Ts...)(__nodecay Fmt fmt, in Ts args)
if (__traits(compiles, { enum string s = fmt; }))
{
    // If fmt is a string literal,
    // its type Fmt is a unit type,
    // i.e. enough to recreate the value
    // without even considering fmt.
    enum string s = fmt;
    // s can be analyzed like any compile-time constant
}

// string literals are zero-terminated
// and must be distinct from other array literals in undecayed form
// (hex strings are even more special)
format("%d", 10); // okay: format!(__string!(char, "%d"))

// actual array literal
format(['%', 'd'], 10); // okay: format!(__array!(__integer!(char, '%'), __integer!(char, 'd')))

string fmt = "%s";
// calls some other format function
// that must do run-time checking
format(fmt, 10);
```

As for static indexing:

```d
struct Tuple(Ts...)
{
    Ts expand;

    static foreach (i; 0 .. Ts.length)
        ref Ts[i] opIndex(T)(__integer!(T, i)) return => expand[i];
    // or
    static foreach (i; 0 .. Ts.length)
        ref Ts[i] opIndex(__integer!(size_t, i)) return => expand[i];
}

Tuple!(int, string) t;
auto x = t[0]; // calls t.opIndex!int(__integer!(int, 0));
auto y = t[1u]; // calls t.opIndex!uint(__integer!(uint, 1));
auto z = t[2L]; // error, none of the overloads match
// or
auto x = t[0]; // calls t.opIndex(__integer!(size_t, 0));
auto y = t[1u]; // calls t.opIndex(__integer!(size_t, 1));
auto z = t[2L]; // error, none of the overloads match
```

The second alternative only works if the types `__integer!(T, x)` have semantics that they convert implicitly to each other if their values are equal.
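
For readers who want to see the existing behaviour the post starts from, here is a minimal sketch in current D (no proposed syntax). It only checks the claims made above: the literal’s reported type is `int[]`, yet it converts to `immutable int[]`, `ubyte[]` and `int[2]`, while an ordinary `int[]` variable does not. The function name `noGc` and the variable names are made up for illustration.

```d
import std.array : staticArray;

// The "official" (decayed) type reported for the literal.
static assert(is(typeof([1, 2]) == int[]));

// Initializing a static array from a literal does not use the GC,
// so it is accepted inside a @nogc function.
void noGc() @nogc
{
    int[2] a = [1, 2];
    auto s = [1, 2].staticArray; // "catches" the literal as int[2]
    static assert(is(typeof(s) == int[2]));
    assert(a == s);
}

void main()
{
    // The literal itself converts where an int[] would not.
    immutable int[] i = [1, 2]; // unique, so it may become immutable
    ubyte[] u = [1, 2];         // every element provably fits in ubyte

    // Once the literal has decayed into an int[] variable, both conversions fail.
    static assert(!__traits(compiles, { int[] d = [1, 2]; ubyte[] v = d; }));
    static assert(!__traits(compiles, { int[] d = [1, 2]; immutable int[] j = d; }));

    noGc();
    assert(u.length == 2 && i[1] == 2);
}
```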
Jun 25
I have a hunch that what you are seeing is actually just different stages of semantic analysis.

Initially an array literal does not have a type:

https://github.com/dlang/dmd/blob/c87ec3505e6675c9cd14697a4a18a0212d6a3b46/compiler/src/dmd/parse.d#L8715

``e = new AST.ArrayLiteralExp(loc, null, values);``

That is what the null means there.

https://github.com/dlang/dmd/blob/c87ec3505e6675c9cd14697a4a18a0212d6a3b46/compiler/src/dmd/expression.d#L1909

I don't think that there is a type to expose.
Jun 28
On Saturday, 28 June 2025 at 07:58:03 UTC, Richard (Rikki) Andrew Cattermole wrote:
> I have a hunch that what you are seeing is actually just different stages of semantic analysis.
>
> Initially an array literal does not have a type:
>
> https://github.com/dlang/dmd/blob/c87ec3505e6675c9cd14697a4a18a0212d6a3b46/compiler/src/dmd/parse.d#L8715
>
> ``e = new AST.ArrayLiteralExp(loc, null, values);``

Obviously it doesn't have a type during parsing--types are determined in semantic analysis.
Jun 28
On Wednesday, 25 June 2025 at 17:56:09 UTC, Quirin Schroll wrote:
> I don’t know if I’m getting this 100% correct, but saying `[1, 2]` is an `int[]` isn’t the full story since it’s more like an `int[2]` that “decays” to an `int[]` (which usually ends up on the heap) unless you “catch” it early enough. Catching such a literal isn’t too difficult, the `staticArray` function does it.

There’s a really clever idea in here somewhere but I think you didn’t quite hit the nail on the head. The one-way implicit casting of literals often gets in my way…

```d
auto x = 1;
ushort y = x; //ERR: `1` is an `int`… ugh
ushort z = 1; //OK: `1` is actually `ushort` now
auto a = [1,2];
ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can totally be a `ubyte[]`…
```

One thing that could mitigate this is having explicit suffixes for char, byte, and short.

But what would be really nice is if the language could keep track of when a variable’s type was inferred from nothing but a literal, and then allow the usual type conversion restrictions to be bent a little based on how the literal could’ve been interpreted as a different type. It’s a similar idea to https://dlang.org/spec/type.html#vrp which never accounts for variables that have just been unconditionally assigned a literal value.

Oddly, the idea that I described already exists for enums:

```d
enum int[] x = [1,2];
ushort[] y = x; //no error?!
ushort[2] z = x; //still no error!!?!
```

Hopefully that makes sense.

TL;DR: Let variables that are initialised/unconditionally assigned literals follow the same implicit cast rules as the literal, meaning that `int x=1; byte y=x;` compiles since `1` can be inferred as a byte.
Jul 04
On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
> auto a = [1,2]; ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh

How can that work? `dup` is a template function that uses the type of `a`. `dup` doesn't know anything about `b` being `ubyte[]`. That would need backwards type inference.

> ```
> ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can totally be a `ubyte[]`…
> ```
> One thing that could mitigate this is having explicit suffixes for char, byte, and short.

You can write `cast(ubyte) [1, 2]`: https://dlang.org/spec/expression.html#cast_array_literal

> But what would be really nice is if the language could keep track of when a variable’s type was inferred from nothing but a literal, and then allow the usual type conversion restrictions to be bent a little based on how the literal could’ve been interpreted as a different type. It’s a similar idea to https://dlang.org/spec/type.html#vrp which never accounts for variables that have just been unconditionally assigned a literal value.

VRP works for expressions, it doesn't work across statements (except for const variable declarations). To do that in the general case requires a lot of analysis, slowing compilation.

> Oddly, the idea that I described already exists for enums:
> ```d
> enum int[] x = [1,2];
> ushort[] y = x; //no error?!
> ushort[2] z = x; //still no error!!?!
> ```

There's no memory allocated for `x`, it is a symbol representing an array literal. The type of the enum is not a problem because the element values are still known, similar to:

```d
enum int i = 1;
ushort s = i;
```
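
The expression-versus-statement distinction can be checked with a small sketch. This is only an illustration of the behaviour described in this reply, not a statement about how dmd implements VRP; the variable names are invented.

```d
void main()
{
    int x = 1000;

    // Within one expression, VRP can bound the result: x & 0xFF lies in 0 .. 255.
    ubyte low = x & 0xFF;
    assert(low == 232); // 1000 == 0x3E8, so the low byte is 0xE8

    // Across statements the known value is not tracked; only the type `int` remains.
    static assert(!__traits(compiles, { int y = 1; ubyte b = y; }));

    // An enum keeps its value, so it converts the way the literal would.
    enum int i = 1;
    ushort s = i;
    assert(s == 1);
}
```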
Jul 05
On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
> On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
>> ```
>> ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can totally be a `ubyte[]`…
>> ```
>> One thing that could mitigate this is having explicit suffixes for char, byte, and short.
>
> You can write `cast(ubyte) [1, 2]`: https://dlang.org/spec/expression.html#cast_array_literal

I meant `cast(ubyte[]) [1, 2]`.
Jul 05
On Saturday, 5 July 2025 at 15:48:43 UTC, Nick Treleaven wrote:
> On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
>> You can write `cast(ubyte) [1, 2]`: https://dlang.org/spec/expression.html#cast_array_literal
>
> I meant `cast(ubyte[]) [1, 2]`.

It doesn’t work with static arrays so it doesn’t even solve one of Quirin’s first examples.

```d
auto x = [1,2];
ushort[2] y = cast(ushort[2])x;
```

The unspoken problem here is integer arithmetic: putting explicit casts on expressions SUCKS. There’s no nicer way to say it. The code ugliness is over 100%. If you haven’t had to add `cast(ushort)(exp)` to hundreds of expressions before, then count yourself lucky because it defaces pretty code into a hideous mess.

I still fail to see Walter’s reasoning as to how compatibility with C’s integer promotion is more important than compatibility with its function pointer syntax, truncating implicit casts, etc. Integer promotion doesn’t even prevent overflows for int/long types, similar to how C’s `const` is completely useless. So, a better system would just ask the programmer to be specific about how they want/expect each integer to handle overflow (see std.checkedint… which doesn’t even work properly with bytes/shorts because of integer promotion). Or: just add a flag to disable integer promotion. ;)
Jul 06
On Sunday, 6 July 2025 at 14:22:23 UTC, IchorDev wrote:
> On Saturday, 5 July 2025 at 15:48:43 UTC, Nick Treleaven wrote:
>> On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
>>> You can write `cast(ubyte) [1, 2]`: https://dlang.org/spec/expression.html#cast_array_literal
>>
>> I meant `cast(ubyte[]) [1, 2]`.
>
> It doesn’t work with static arrays so it doesn’t even solve one of Quirin’s first examples.

I didn't say it did.

> ```d
> auto x = [1,2];
> ushort[2] y = cast(ushort[2])x;
> ```

You said integer literal type suffixes could be useful:

> One thing that could mitigate this is having explicit suffixes for char, byte, and short.

Instead I pointed you to array literal casts, which solve that problem without repeating a suffix for every literal element of the array literal, and it's a feature that already exists.

Your code is casting a non-literal, which the link I gave you explicitly says means something different. That was not what I suggested.

For the record, you can cast an array literal to a static array type:

```d
ushort[2] y = cast(ushort[2]) [1,2];
assert(y == [1, 2]);
```
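
A short sketch of the difference this reply points at, using only current D: a cast applied directly to an array literal converts each element, while the same cast applied to an array variable reinterprets the existing memory. The variable names are illustrative.

```d
void main()
{
    // Cast applied directly to a literal: each element is converted.
    auto converted = cast(ubyte[]) [1, 2];
    static assert(is(typeof(converted) == ubyte[]));
    assert(converted.length == 2 && converted[0] == 1 && converted[1] == 2);

    // Same cast applied to a variable: the bytes of the int[] are reinterpreted,
    // so the length grows by a factor of int.sizeof.
    int[] source = [1, 2];
    auto reinterpreted = cast(ubyte[]) source;
    assert(reinterpreted.length == 2 * int.sizeof);

    // Literal cast to a static array type, as in the post above.
    ushort[2] y = cast(ushort[2]) [1, 2];
    assert(y[0] == 1 && y[1] == 2);
}
```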
Jul 06
On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
> On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
>> auto a = [1,2]; ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
>
> How can that work?

It doesn’t, that’s the point.

> VRP works for expressions, it doesn't work across statements (except for const variable declarations). To do that in the general case requires a lot of analysis, slowing compilation.

The problem is that in practice VRP only really cares about type. I don’t see why it would be non-trivial to add a piece of metadata that specifies the literal that a variable was initialised with, but I haven’t worked on dmd much so maybe it’s not flexible like that. A simpler solution is to just remove integer promotion. Yes, ha-ha very funny. More realistically, a future edition could make it so that the compiler never assumes that a number that could fit into a `byte`/`short` is an `int`, but this would break a lot of declarations like `auto n=1;`, so I can see it being unpopular. I think this idea is in the same category as wanting `float` arithmetic to use `float`s: it would be so useful but it’s unlikely to happen.

> There's no memory allocated for `x`, it is a symbol representing an array literal. The type of the enum is not a problem because the element values are still known, similar to:
> ```d
> enum int i = 1;
> ushort s = i;
> ```

Uhh, yeah obviously? I pointed this out because it illustrates how D already breaks the usual implicit cast rules in the same way as in my idea. The only difference between enums and variables here is that variables are less statically predictable. Nothing requires the compiler to actually allocate a variable either if that variable can be optimised away.
Jul 06
On Sunday, 6 July 2025 at 13:37:41 UTC, IchorDev wrote:
> On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
>> On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
>>> auto a = [1,2]; ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
>>
>> How can that work?
>
> It doesn’t, that’s the point.

I assumed you were showing that line as something that could be made to work. I was asking how you think the language could make that work.

Because how do you know the variable wasn't mutated after initialization but before the value of the variable is needed. Detecting that needs complex analysis. You could come up with simple cases, but then people will get confused and complain when they have a slightly more complex case and it doesn't work.

>> VRP works for expressions, it doesn't work across statements (except for const variable declarations). To do that in the general case requires a lot of analysis, slowing compilation.
>
> The problem is that in practice VRP only really cares about type. I don’t see why it would be non-trivial to add a piece of metadata that specifies the literal that a variable was initialised with, but I haven’t worked on dmd much so maybe it’s not flexible like that. A simpler solution is to just remove integer promotion. Yes, ha-ha very funny. More realistically, a future edition could make it so that the compiler never assumes that a number that could fit into a `byte`/`short` is an `int`, but this would break a lot of declarations like `auto n=1;`, so I can see it being unpopular.

Not only that, but it would also encourage people to use integer types that are slower than optimal for the machine, thus producing slow code, as well as making under/overflow more likely.

> I think this idea is in the same category as wanting `float` arithmetic to use `float`s: it would be so useful but it’s unlikely to happen.
>
>> There's no memory allocated for `x`, it is a symbol representing an array literal. The type of the enum is not a problem because the element values are still known, similar to:
>> ```d
>> enum int i = 1;
>> ushort s = i;
>> ```
>
> Uhh, yeah obviously? I pointed this out because it illustrates how D already breaks the usual implicit cast rules in the same way as in my idea. The only difference between enums and variables here is that variables are less statically predictable.

An enum is const, so VRP works exactly the same as a const variable.

> Nothing requires the compiler to actually allocate a variable either if that variable can be optimised away.
Jul 06
On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
> On Wednesday, 25 June 2025 at 17:56:09 UTC, Quirin Schroll wrote:
>> I don’t know if I’m getting this 100% correct, but saying `[1, 2]` is an `int[]` isn’t the full story since it’s more like an `int[2]` that “decays” to an `int[]` (which usually ends up on the heap) unless you “catch” it early enough. Catching such a literal isn’t too difficult, the `staticArray` function does it.
>
> There’s a really clever idea in here somewhere but I think you didn’t quite hit the nail on the head. The one-way implicit casting of literals often gets in my way…
> ```d
> auto x = 1;
> ushort y = x; //ERR: `1` is an `int`… ugh
> ushort z = 1; //OK: `1` is actually `ushort` now
> auto a = [1,2];
> ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
> ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can totally be a `ubyte[]`…
> ```
> One thing that could mitigate this is having explicit suffixes for char, byte, and short.

While I’d like to have those for symmetry and consistency, `short(1)` actually works. But that’s not your actual issue here. You can’t initialize a smaller integer type from a _run-time_ value of a bigger integral type. Plain literals have a default decay type which isn’t maximally narrow, but `int`. D inherited that from C, so that what looks like C and compiles, acts like C, for the most part. The importance of this principle has declined over the years, but for very basic stuff, it’s still relevant. There’s virtually no difference between `auto x = short(1)` and `short x = 1`.

> But what would be really nice is if the language could keep track of when a variable’s type was inferred from nothing but a literal, and then allow the usual type conversion restrictions to be bent a little based on how the literal could’ve been interpreted as a different type.

You can’t easily do that with run-time values.

> It’s a similar idea to https://dlang.org/spec/type.html#vrp which never accounts for variables that have just been unconditionally assigned a literal value.

That’s because VRP is an expression feature. It doesn’t interact with statements. I don’t know how VRP is implemented, but it treats integral types as (unions of) intervals. That’s probably a lot easier to do within expressions than over statements where you’d have to store the intervals with variables indefinitely.

> Oddly, the idea that I described already exists for enums:
> ```d
> enum int[] x = [1,2];
> ushort[] y = x; //no error?!
> ushort[2] z = x; //still no error!!?!
> ```
> Hopefully that makes sense.

It makes sense because an `enum` isn’t a run-time value, but quite close to a named literal with explicit decay type. (By the latter, I mean that `enum short h = 10` has type `short` not `int`, but behaves like the literal `10`; the same is true for `enum short[] hs = [1,2]` which has type `short[]` but you can infer its length and assign it to a `ubyte[]`.) It also works with `immutable` variables that are initialized with compile-time values:

```d
immutable int v = 10;
immutable int[] vs = [v,v];

void main()
{
    byte w = v; // okay
    byte[] ws = vs; // error
}
```

The slice thing doesn’t work with `immutable` because slices are somewhat reference types. It works with `enum` because `x` in your example is a literal. `ws = vs` wouldn’t ever allocate, but `y = x` actually will (unless optimized out) in the sense that it’s not `@nogc`.

> TL;DR: Let variables that are initialised/unconditionally assigned literals follow the same implicit cast rules as the literal, meaning that `int x=1; byte y=x;` compiles since `1` can be inferred as a byte.

It won’t happen because it’s hard to implement, hard to reason through in general, and brings little value in return. In many cases, restructuring the code works.
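
A small sketch of the “named literal with explicit decay type” description above, in current D. It only restates the claims from this reply (that `short(1)` works, and that an `enum` keeps its declared type while still converting like the literal); the names `u`, `us` and `fixed` are invented for illustration.

```d
enum short h = 10;        // declared type short, value known like the literal 10
enum short[] hs = [1, 2]; // declared type short[], elements known like the literal [1, 2]

void main()
{
    // Construction syntax gives a typed value without a literal suffix.
    auto x = short(1);
    static assert(is(typeof(x) == short));

    // The enums report their explicit decay type...
    static assert(is(typeof(h) == short));
    static assert(is(typeof(hs) == short[]));

    // ...but still convert the way the underlying literals would.
    ubyte u = h;
    ubyte[] us = hs;
    ubyte[2] fixed = hs; // the length is known as well
    assert(u == 10 && us.length == 2 && fixed[1] == 2);
}
```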
Jul 07