digitalmars.dip.ideas - Literal types

reply Quirin Schroll <qs.il.paperinik gmail.com> writes:
D’s array literals (seem to) have an unofficial type and an 
official type. What I mean by that is, `[1, 2]` has the official 
type `int[]` (if you do `typeof([1, 2])` it says `int[]`), but 
unofficially, it can do a bunch of things an `int[]` generally 
doesn’t support. You can assign the literal to `immutable int[]` 
because it’s unique. You can assign it to `ubyte[]` because VRP 
can prove each value entry is within the bounds of `ubyte`. And 
you can assign it to `int[2]` because the length is right and it 
won’t heap allocate. The last point is important because it’s not 
a mere optional optimization: it works in `@nogc`, which 
means this is specified.
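
The three conversions can be checked against the current compiler. The following is a minimal sketch (the function name `stackOnly` is purely illustrative) that contrasts the literal with a run-time `int[]` holding the same values:

```d
// The literal initialises a static array without touching the GC heap,
// so this function is accepted under @nogc.
@nogc void stackOnly()
{
    int[2] s = [1, 2];
    assert(s[0] + s[1] == 3);
}

void main()
{
    static assert(is(typeof([1, 2]) == int[])); // the "official" type

    immutable int[] a = [1, 2]; // fine: the literal is unique
    ubyte[] b = [1, 2];         // fine: every element fits in ubyte
    int[2] c = [1, 2];          // fine: the length matches

    int[] d = [1, 2];           // a run-time int[] with the same contents
    static assert(!__traits(compiles, { immutable int[] e = d; }));
    static assert(!__traits(compiles, { ubyte[] f = d; }));

    stackOnly();
}
```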

I don’t know if I’m getting this 100% correct, but saying `[1, 
2]` is an `int[]` isn’t the full story since it’s more like an 
`int[2]` that “decays” to an `int[]` (which usually ends up on the 
heap) unless you “catch” it early enough. Catching such a literal 
isn’t too difficult; the `staticArray` function does it.
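
For reference, this is roughly how `std.array.staticArray` catches a literal today. A small sketch, assuming a reasonably recent Phobos:

```d
import std.array : staticArray;

void main() @nogc
{
    // Length and element type are both inferred from the literal.
    auto a = [1, 2].staticArray;
    static assert(is(typeof(a) == int[2]));

    // Element type given explicitly, length still inferred.
    auto b = [1, 2].staticArray!long;
    static assert(is(typeof(b) == long[2]));

    assert(a[1] == 2 && b[1] == 2);
}
```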

In a similar fashion, numeric literals decay: `typeof(1)` is 
`int`, but knowing something is `1` is so much more concrete than 
knowing it’s some `int`.
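
A short sketch of the difference between the decayed type and what the compiler still knows about a literal:

```d
void main()
{
    static assert(is(typeof(1) == int)); // the decayed type

    ubyte a = 1;       // fine: the literal's value is known to fit
    enum int one = 1;
    ubyte b = one;     // fine: an enum behaves like the literal
    static assert(!__traits(compiles, { ubyte c = 256; })); // out of range

    int r = 1;         // run-time variable: only "some int" is known
    static assert(!__traits(compiles, { ubyte d = r; }));
}
```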

Add `__typeof` that returns a non-decayed type, one that has all 
the information the compiler officially retains, and add the 
respective results for this operator. That means, with `x` some 
run-time `int` variable, `__typeof([1, 2, x])` isn’t `int[]`. 
It’s something like `__array!(__integer!(int, 1), __integer!(int, 
2), int)`. Lastly, add a parameter storage class `__nodecay` or 
`@nodecay` that only matters when the type of the parameter is a 
template type parameter and is inferred; then, inference does not 
decay the type before binding the type parameter.

This would enable low-level functionality where one can 
special-case certain values, e.g. giving a function a specialized 
overload with a parameter type of `__typeof(0)` that’s only 
matched by the constant `0` or `int(0)` or an enum of type `int` 
with value `0`, but not `0u`, `0L`, or an enum of type `ubyte` 
with value 0, and of course not by anything that might have a 
value distinct from `0` such as `1` or a run-time value.

Because those types would be templates (or behave as such), their 
arguments could be matched:
```d
void f0(T)(__integer!(T, 0) x) { }
void fi(int x) { }
alias f = f0;
alias f = fi;
f(0); // calls f0!(int)
f(1); // calls fi
```

(No need to change overload resolution: Partial ordering 
determines that `fi` can be called with `f0`’s (synthesized) 
parameter type `__integer!(int, 0)`, but `f0` can’t be called 
with `fi`’s parameter type `int`, so it’s more specific.)
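
Nothing like `__integer` exists today, but the intended dispatch can be approximated in current D if the caller wraps the constant by hand. The sketch below is only an emulation: the `Integer` wrapper and the overload set `f` are hypothetical names, and ordinary overload resolution (exact match beats an `alias this` conversion) stands in for the partial ordering described above.

```d
// Hypothetical stand-in for __integer!(T, value): a unit type that
// carries its value in the type and converts back to T.
struct Integer(T, T value)
{
    @property T get() const { return value; }
    alias get this;
}

string f(Integer!(int, 0) x) { return "f0"; } // only the constant 0
string f(int x) { return "fi"; }              // any other int

void main()
{
    assert(f(Integer!(int, 0)()) == "f0"); // exact match wins
    assert(f(1) == "fi");
    int zero = 0;
    assert(f(zero) == "fi"); // a run-time 0 is still just some int
}
```

The proposal would remove the explicit wrapping at the call site: a plain `f(0)` would bind to the unit type by itself.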

This is akin to how `staticArray` can infer type and size from an 
array literal, just that it’s much finer grained. The 
`staticArray` function makes the argument decay insofar as the 
types of the entries must be unified. With this addition, that 
becomes optional: `__array!(int, string)` would be totally valid, 
it just can’t decay into a static or dynamic array type, so if it 
has to because it’s not caught early enough, that’s an error. A 
tuple type of `int` and `string` could support an `opAssign` that 
takes an `__array!(int, string)` parameter and that allows `t = 
[1, "hi"]` even though `typeof([1, "hi"])` fails because it 
requires an array literal to decay into some `T[]` which this one 
can’t. Maybe a future edition could make `typeof([1, "hi"])` not 
be an error, but as of now, this can be used to test if two 
values have compatible type and we can’t just take that use case 
away.
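
The existing use case mentioned here can be sketched as follows; `haveCommonType` is a hypothetical helper name, not a Phobos symbol:

```d
// True if an array literal holding an A and a B can decay to some T[].
enum bool haveCommonType(A, B) = is(typeof([A.init, B.init]));

static assert(haveCommonType!(int, long));
static assert(!haveCommonType!(int, string));
static assert(!is(typeof([1, "hi"]))); // the literal has no type today

void main()
{
    auto xs = [1, 2L]; // elements are unified to long
    static assert(is(typeof(xs) == long[]));
}
```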

This subsumes `enum` parameters and tuples to some degree.

Recap: In another DIP idea, I suggested an enum parameter would 
be a compile-time constant that’s passed the same way a run-time 
parameter is passed to a function call. Essentially what Zig 
calls `comptime` parameters.

I also suggested in the past that static arrays are basically 
homogeneous tuples and that they could be generalized to tuples. 
That would mean the syntax would be brackets, not parentheses, 
but that neatly solves the 1-tuple case, since `(x)` must stay 
`x`, but `[x]` isn’t the same as `x`. The idea there rests on the same 
decay observation: there’s no inherent need to decay 
array literals to static arrays immediately and to dynamic arrays 
further.

Of course, this idea doesn’t solve `auto enum` from the `enum` 
parameter idea as neatly. It also doesn’t add any tuple 
decomposition support and syntax sugars one might want to have. 
In that context, `__array` is a bad name and it should be 
`__tuple` instead.

It does solve e.g. compile-time format strings and indexing into 
a tuple:
```d
int format(Fmt, Ts...)(__nodecay Fmt fmt, in Ts args)
if (__traits(compiles, { enum string s = fmt; }))
{
     // If fmt is a string literal,
     // its type Fmt is a unit type,
     // i.e. enough to recreate the value
     // without even considering fmt.
     enum string s = fmt;
     // s can be analyzed like any compile-time constant
}

// string literals are zero-terminated
// and must be distinct from other array literals in undecayed form
// (hex strings are even more special)
format("%d", 10); // okay: Fmt = __string!(char, "%d")

// actual array literal
format(['%', 'd'], 10); // okay: Fmt = __array!(__integer!(char, '%'), __integer!(char, 'd'))

string fmt = "%s";
// calls some other format function
// that must do run-time checking
format(fmt, 10);
```
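
For comparison, current Phobos gets compile-time format checking by passing the format string as a template argument instead of a run-time argument. A minimal sketch using `std.format.format`:

```d
import std.format : format;

void main()
{
    // Format string as a template argument: checked during compilation.
    assert(format!"%d"(10) == "10");

    // A mismatch is rejected at compile time: 3.14 cannot be formatted with %d.
    static assert(!__traits(compiles, format!"%s is %d"("Pi", 3.14)));

    // Run-time format string: checked only when the call executes.
    string fmt = "%s";
    assert(format(fmt, 10) == "10");
}
```

The proposal would let both forms share the ordinary call syntax `format("%d", 10)`.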

As for static indexing:
```d
struct Tuple(Ts...)
{
     Ts expand;

     static foreach (i; 0 .. Ts.length)
         ref Ts[i] opIndex(T)(__integer!(T, i)) return => 
expand[i];

     // or

     static foreach (i; 0 .. Ts.length)
         ref Ts[i] opIndex(__integer!(size_t, i)) return => 
expand[i];
}

Tuple!(int, string) t;

auto x = t[0]; // calls t.opIndex!int(__integer!(int, 0));
auto y = t[1u]; // calls t.opIndex!uint(__integer!(uint, 1));
auto z = t[2L]; // error, none of the overloads match

// or

auto x = t[0]; // calls t.opIndex(__integer!(size_t, 0));
auto y = t[1u]; // calls t.opIndex(__integer!(size_t, 1));
auto z = t[2L]; // error, none of the overloads match
```

The second alternative only works if the types `__integer!(T, x)` 
have semantics that they convert implicitly to each other if 
their values are equal.
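
As a point of comparison, `std.typecons.Tuple` already supports static indexing, but through `alias this` to its field sequence instead of `opIndex` overloads. A small sketch of the current behaviour:

```d
import std.typecons : tuple;

void main()
{
    auto t = tuple(1, "hi");

    auto x = t[0];  // index must be a compile-time constant
    auto y = t[1u]; // any integral constant works
    static assert(is(typeof(x) == int) && is(typeof(y) == string));

    static assert(!__traits(compiles, t[2])); // out of range

    size_t i = 0;
    static assert(!__traits(compiles, t[i])); // run-time index is rejected
}
```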
Jun 25
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
I have a hunch that what you are seeing is actually just different 
stages of semantic analysis.

Initially an array literal does not have a type: 
https://github.com/dlang/dmd/blob/c87ec3505e6675c9cd14697a4a18a0212d6a3b46/compiler/src/dmd/parse.d#L8715

``e = new AST.ArrayLiteralExp(loc, null, values);``

That is what the null means there.

https://github.com/dlang/dmd/blob/c87ec3505e6675c9cd14697a4a18a0212d6a3b46/compiler/src/dmd/expression.d#L1909


I don't think that there is a type to expose.
Jun 28
parent Paul Backus <snarwin gmail.com> writes:
On Saturday, 28 June 2025 at 07:58:03 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 I have a hunch that what you are seeing is actually just 
 different stages of semantic analysis.

 Initially an array literal does not have a type: 
 https://github.com/dlang/dmd/blob/c87ec3505e6675c9cd14697a4a18a0212d6a3b46/compiler/src/dmd/parse.d#L8715

 ``e = new AST.ArrayLiteralExp(loc, null, values);``
Obviously it doesn't have a type during parsing--types are determined in semantic analysis.
Jun 28
prev sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Wednesday, 25 June 2025 at 17:56:09 UTC, Quirin Schroll wrote:
 I don’t know if I’m getting this 100% correct, but saying `[1, 
 2]` is an `int[]` isn’t the full story since it’s more like an 
 `int[2]` that “decays” to a `int[]` (which usually ends up on 
 the heap) unless you “catch” it early enough. Catching such a 
 literal isn’t too difficult, the `staticArray` function does it.
There’s a really clever idea in here somewhere but I think you didn’t quite hit the nail on the head. The one-way implicit casting of literals often gets in my way…

```
auto x = 1;
ushort y = x; //ERR: `1` is an `int`… ugh
ushort z = 1; //OK: `1` is actually `ushort` now

auto a = [1,2];
ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can totally be a `ubyte[]`…
```

One thing that could mitigate this is having explicit suffixes for char, byte, and short.

But what would be really nice is if the language could keep track of when a variable’s type was inferred from nothing but a literal, and then allow the usual type conversion restrictions to be bent a little based on how the literal could’ve been interpreted as a different type.

It’s a similar idea to https://dlang.org/spec/type.html#vrp which never accounts for variables that have just been unconditionally assigned a literal value.

Oddly, the idea that I described already exists for enums:

```d
enum int[] x = [1,2];
ushort[] y = x; //no error?!
ushort[2] z = x; //still no error!!?!
```

Hopefully that makes sense.

TL;DR: Let variables that are initialised/unconditionally assigned literals follow the same implicit cast rules as the literal, meaning that `int x=1; byte y=x;` compiles since `1` can be inferred as a byte.
Jul 04
next sibling parent reply Nick Treleaven <nick geany.org> writes:
On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
 auto a = [1,2];
 ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
How can that work? `dup` is a template function that uses the type of `a`. `dup` doesn't know anything about `b` being `ubyte[]`. That would need backwards type inference.
 ```
 ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can totally 
 be a `ubyte[]`…
 ```
 One thing that could mitigate this is having explicit suffixes 
 for char, byte, and short.
You can write `cast(ubyte) [1, 2]`: https://dlang.org/spec/expression.html#cast_array_literal
 But what would be really nice is if the language could keep 
 track of when a variable’s type was inferred from nothing but a 
 literal, and then allow the usual type conversion restrictions 
 to be bent a little based on how the literal could’ve been 
 interpreted as a different type.
 It’s a similar idea to https://dlang.org/spec/type.html#vrp 
 which never accounts for variables that have just been 
 unconditionally assigned a literal value.
VRP works for expressions, it doesn't work across statements (except for const variable declarations). To do that in the general case requires a lot of analysis, slowing compilation.
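
A small sketch of where value range propagation applies today and where it stops:

```d
void main()
{
    int r = 300;

    // Within one expression the compiler tracks the possible range.
    ubyte low = r & 0xFF; // fine: the result is always 0 .. 255
    static assert(!__traits(compiles, { ubyte n = r + 1; }));

    // Across statements the range of a mutable variable is forgotten.
    int x = 1;
    static assert(!__traits(compiles, { ubyte y = x; }));

    // A const variable with a constant initialiser is the exception.
    const int c = 1;
    ubyte z = c;

    assert(low == 44 && z == 1); // 300 & 0xFF == 44
}
```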
 Oddly, the idea that I described already exists for enums:
 ```d
 enum int[] x = [1,2];
 ushort[] y = x; //no error?!
 ushort[2] z = x; //still no error!!?!
 ```
There's no memory allocated for `x`, it is a symbol representing an array literal. The type of the enum is not a problem because the element values are still known, similar to:

```d
enum int i = 1;
ushort s = i;
```
Jul 05
next sibling parent reply Nick Treleaven <nick geany.org> writes:
On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
 On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
 ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can 
 totally be a `ubyte[]`…
 ```
 One thing that could mitigate this is having explicit suffixes 
 for char, byte, and short.
You can write `cast(ubyte) [1, 2]`:
I meant `cast(ubyte[]) [1, 2]`.
 https://dlang.org/spec/expression.html#cast_array_literal
Jul 05
parent reply IchorDev <zxinsworld gmail.com> writes:
On Saturday, 5 July 2025 at 15:48:43 UTC, Nick Treleaven wrote:
 On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
 You can write `cast(ubyte) [1, 2]`:
I meant `cast(ubyte[]) [1, 2]`.
 https://dlang.org/spec/expression.html#cast_array_literal
It doesn’t work with static arrays so it doesn’t even solve one of Quinn’s first examples.

```d
auto x = [1,2];
ushort[2] y = cast(ushort[2])x;
```

The unspoken problem here is integer arithmetic: putting explicit casts on expressions SUCKS. There’s no nicer way to say it. The code ugliness is over 100%. If you haven’t had to add `cast(ushort)(exp)` to hundreds of expressions before, then count yourself lucky because it defaces pretty code into a hideous mess.

I still fail to see Walter’s reasoning as to how compatibility with C’s integer promotion is more important than compatibility with its function pointer syntax, truncating implicit casts, etc. Integer promotion doesn’t even prevent overflows for int/long types, similar to how C’s `const` is completely useless. So, a better system would just ask the programmer to be specific about how they want/expect each integer to handle overflow (see std.checkedint… which doesn’t even work properly with bytes/shorts because of integer promotion).

Or: just add a flag to disable integer promotion. ;)
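
The complaint about casts can be illustrated with a minimal sketch of integer promotion on `ushort` operands:

```d
void main()
{
    ushort a = 1, b = 2;

    // Arithmetic on small integer types is carried out in int.
    static assert(is(typeof(a + b) == int));
    static assert(!__traits(compiles, { ushort c = a + b; }));

    ushort c = cast(ushort)(a + b); // the cast being complained about
    assert(c == 3);

    // Compound assignment narrows implicitly, so it needs no cast.
    a += b;
    assert(a == 3);
}
```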
Jul 06
parent Nick Treleaven <nick geany.org> writes:
On Sunday, 6 July 2025 at 14:22:23 UTC, IchorDev wrote:
 On Saturday, 5 July 2025 at 15:48:43 UTC, Nick Treleaven wrote:
 On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
 You can write `cast(ubyte) [1, 2]`:
I meant `cast(ubyte[]) [1, 2]`.
 https://dlang.org/spec/expression.html#cast_array_literal
It doesn’t work with static arrays so it doesn’t even solve one of Quinn’s first examples.
I didn't say it did.
 ```d
 auto x = [1,2];
 ushort[2] y = cast(ushort[2])x;
 ```
You said integer literal type suffixes could be useful:
 One thing that could mitigate this is having explicit suffixes 
 for char, byte, and short.
Instead I pointed you to array literal casts, which solves that problem without repeating a suffix for every literal element of the array literal, and it's a feature that already exists. Your code is casting a non-literal, which the link I gave you explicitly says means something different. That was not what I suggested.

For the record, you can cast an array literal to a static array type:

```d
ushort[2] y = cast(ushort[2]) [1,2];
assert(y == [1, 2]);
```
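
The difference between casting a literal and casting a non-literal, as described in the linked spec section, can be sketched like this:

```d
void main()
{
    // Casting the literal converts each element to ubyte.
    auto lit = cast(ubyte[]) [1, 2];
    assert(lit.length == 2 && lit[0] == 1 && lit[1] == 2);

    // Casting a run-time array reinterprets the same memory as bytes.
    int[] a = [1, 2];
    auto raw = cast(ubyte[]) a;
    assert(raw.length == 2 * int.sizeof);
}
```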
Jul 06
prev sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
 On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
 auto a = [1,2];
 ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
How can that work?
It doesn’t, that’s the point.
 VRP works for expressions, it doesn't work across statements 
 (except for const variable declarations). To do that in the 
 general case requires a lot of analysis, slowing compilation.
The problem is that in practice VRP only really cares about type. I don’t see why it would be non-trivial to add a piece of metadata that specifies the literal that a variable was initialised with, but I haven’t worked on dmd much so maybe it’s not flexible like that. A simpler solution is just remove integer promotion. Yes, ha-ha very funny. More realistically, a future edition could make it so that the compiler never assumes that a number that could fit into a `byte`/`short` is an `int`, but this would break a lot of declarations like `auto n=1;`, so I can see it being unpopular. I think this idea is in the same category as wanting `float` arithmetic to use `float`s: it would be so useful but it’s unlikely to happen.
 There's no memory allocated for `x`, it is a symbol 
 representing an array literal. The type of the enum is not a 
 problem because the element values are still known, similar to:

 ```d
 enum int i = 1;
 ushort s = i;
 ```
Uhh, yeah obviously? I pointed this out because it illustrates how D already breaks the usual implicit cast rules in the same way as in my idea. The only difference between enums and variables here is that variables are less statically predictable. Nothing requires the compiler to actually allocate a variable either if that variable can be optimised away.
Jul 06
parent Nick Treleaven <nick geany.org> writes:
On Sunday, 6 July 2025 at 13:37:41 UTC, IchorDev wrote:
 On Saturday, 5 July 2025 at 10:50:13 UTC, Nick Treleaven wrote:
 On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
 auto a = [1,2];
 ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
How can that work?
It doesn’t, that’s the point.
I assumed you were showing that line as something that could be made to work. I was asking how you think the language could make that work.
 VRP works for expressions, it doesn't work across statements 
 (except for const variable declarations). To do that in the 
 general case requires a lot of analysis, slowing compilation.
The problem is that in practice VRP only really cares about type. I don’t see why it would be non-trivial to add a piece of metadata that specifies the literal that a variable was initialised with
Because how do you know the variable wasn't mutated after initialization but before the value of the variable is needed? Detecting that needs complex analysis. You could come up with simple cases, but then people will get confused and complain when they have a slightly more complex case and it doesn't work.
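
A minimal sketch of the kind of case meant here; `bump` is a hypothetical function standing in for any code that may modify the variable:

```d
void bump(ref int v) { v = 1000; }

void main()
{
    int x = 1; // initialised from a literal that would fit in a byte
    bump(x);   // ...but mutated before it is used

    // Remembering "x was initialised with 1" would be wrong by now.
    static assert(!__traits(compiles, { byte y = x; }));

    byte y = cast(byte) x; // explicit truncation: 1000 becomes -24
    assert(x == 1000 && y == -24);
}
```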
, but I haven’t worked on dmd much so maybe
 it’s not flexible like that.
 A simpler solution is just remove integer promotion. Yes, ha-ha 
 very funny. More realistically, a future edition could make it 
 so that the compiler never assumes that a number that could fit 
 into a `byte`/`short` is an `int`, but this would break a lot 
 of declarations like `auto n=1;`, so I can see it being 
 unpopular.
Not only that, but it would also encourage people to use integer types that are slower than optimal for the machine, thus producing slow code, as well as making under/overflow more likely.
 I think this idea is in the same category as wanting `float` 
 arithmetic to use `float`s: it would be so useful but it’s 
 unlikely to happen.

 There's no memory allocated for `x`, it is a symbol 
 representing an array literal. The type of the enum is not a 
 problem because the element values are still known, similar to:

 ```d
 enum int i = 1;
 ushort s = i;
 ```
Uhh, yeah obviously? I pointed this out because it illustrates how D already breaks the usual implicit cast rules in the same way as in my idea. The only difference between enums and variables here is that variables are less statically predictable.
An enum is const, so VRP works exactly the same as a const variable.
 Nothing requires the compiler to actually allocate a variable 
 either if that variable can be optimised away.
Jul 06
prev sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Friday, 4 July 2025 at 16:16:05 UTC, IchorDev wrote:
 On Wednesday, 25 June 2025 at 17:56:09 UTC, Quirin Schroll 
 wrote:
 I don’t know if I’m getting this 100% correct, but saying `[1, 
 2]` is an `int[]` isn’t the full story since it’s more like an 
 `int[2]` that “decays” to a `int[]` (which usually ends up on 
 the heap) unless you “catch” it early enough. Catching such a 
 literal isn’t too difficult, the `staticArray` function does 
 it.
 There’s a really clever idea in here somewhere but I think you didn’t quite hit the nail on the head. The one-way implicit casting of literals often gets in my way…
 ```
 auto x = 1;
 ushort y = x; //ERR: `1` is an `int`… ugh
 ushort z = 1; //OK: `1` is actually `ushort` now

 auto a = [1,2];
 ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
 ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can totally be a `ubyte[]`…
 ```
 One thing that could mitigate this is having explicit suffixes for char, byte, and short.
While I’d like to have those for symmetry and consistency, `short(1)` actually works. But that’s not your actual issue here. You can’t initialize a smaller integer type from a _run-time_ value of a bigger integral type. Plain literals have a default decay type which isn’t maximally narrow, but `int`. D inherited that from C, so that what looks like C and compiles, acts like C, for the most part. The importance of this principle has declined over the years, but for very basic stuff, it’s still relevant. There’s virtually no difference between `auto x = short(1)` and `short x = 1`.
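
A short sketch of the construction syntax mentioned above:

```d
void main()
{
    auto x = short(1); // explicit type on the right-hand side
    short y = 1;       // explicit type on the left-hand side
    static assert(is(typeof(x) == short) && is(typeof(y) == short));

    auto z = 1;        // a plain literal decays to int
    static assert(is(typeof(z) == int));
    static assert(!__traits(compiles, { short w = z; }));
}
```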
 But what would be really nice is if the language could keep 
 track of when a variable’s type was inferred from nothing but a 
 literal, and then allow the usual type conversion restrictions 
 to be bent a little based on how the literal could’ve been 
 interpreted as a different type.
You can’t easily do that with run-time values.
 It’s a similar idea to https://dlang.org/spec/type.html#vrp 
 which never accounts for variables that have just been 
 unconditionally assigned a literal value.
That’s because VRP is an expression feature. It doesn’t interact with statements. I don’t know how VRP is implemented, but it treats integral types as (unions of) intervals. That’s probably a lot easier to do within expressions than over statements where you’d have to store the intervals with variables indefinitely.
 Oddly, the idea that I described already exists for enums:
 ```d
 enum int[] x = [1,2];
 ushort[] y = x; //no error?!
 ushort[2] z = x; //still no error!!?!
 ```
 Hopefully that makes sense.
It makes sense because an `enum` isn’t a run-time value, but quite close to a named literal with explicit decay type. (By the latter, I mean that `enum short h = 10` has type `short` not `int`, but behaves like the literal `10`; the same is true for `enum short[] hs = [1,2]` which has type `short[]` but you can infer its length and assign it to a `ubyte[]`.)

It also works with `immutable` values with compile-time values:

```d
immutable int v = 10;
immutable int[] vs = [v,v];

void main()
{
    byte w = v; // okay
    byte[] ws = vs; // error
}
```

The slice thing doesn’t work with `immutable` because slices are somewhat reference types. It works with `enum` because `x` in your example is a literal. `ws = vs` wouldn’t ever allocate, but `y = x` actually will (unless optimized out) in the sense that it’s not `@nogc`.
 TL;DR: Let variables that are initialised/unconditionally 
 assigned literals follow the same implicit cast rules as the 
 literal, meaning that `int x=1; byte y=x;` compiles since `1` 
 can be inferred as a byte.
It won’t happen because it’s hard to implement, hard to reason through in general, and brings little value in return. In many cases, restructuring the code works.
Jul 07