digitalmars.dip.ideas - Literal types

Quirin Schroll <qs.il.paperinik gmail.com> writes:
D’s array literals (seem to) have an unofficial type and an 
official type. What I mean by that is, `[1, 2]` has the official 
type `int[]` (if you do `typeof([1, 2])` it says `int[]`), but 
unofficially, it can do a bunch of things an `int[]` generally 
doesn’t support. You can assign the literal to `immutable int[]` 
because it’s unique. You can assign it to `ubyte[]` because VRP 
can prove each value entry is within the bounds of `ubyte`. And 
you can assign it to `int[2]` because the length is right and it 
won’t heap allocate. The last thing is important because it’s not 
a mere optimization that’s optional; it works in `@nogc`, which 
means this is specified.
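
For reference, the three conversions listed above can be checked in current D. This is only a minimal sketch; the function names `fillFixed` and `convertLiteral` are made up for illustration.

```d
// Current D only; `fillFixed` and `convertLiteral` are illustrative names.
static assert(is(typeof([1, 2]) == int[])); // the "official" type

int fillFixed() @nogc
{
    // The length matches, so the literal initializes the static array
    // directly; no GC allocation happens, hence this compiles in @nogc.
    int[2] fixed = [1, 2];
    return fixed[0] + fixed[1];
}

size_t convertLiteral()
{
    immutable int[] imm = [1, 2]; // the literal is unique, so immutable is fine
    ubyte[] small = [1, 2];       // every element provably fits in ubyte

    int[] decayed = [1, 2];       // once it is an int[] variable ...
    // ... the element-wise conversion is no longer available.
    static assert(!__traits(compiles, { ubyte[] u = decayed; }));
    return imm.length + small.length + decayed.length;
}
```

The last `static assert` is the point of the post: the same values, once bound to an `int[]` variable, have lost what the literal could do.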

I don’t know if I’m getting this 100% correct, but saying `[1, 
2]` is an `int[]` isn’t the full story since it’s more like an 
`int[2]` that “decays” to an `int[]` (which usually ends up on the 
heap) unless you “catch” it early enough. Catching such a literal 
isn’t too difficult: the `staticArray` function does it.
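
As a short illustration of “catching” the literal, here is a sketch using `std.array.staticArray` from Phobos. The wrapper function name `sumOfCaughtLiteral` is hypothetical.

```d
import std.array : staticArray;

// `sumOfCaughtLiteral` is an illustrative name, not part of any library.
int sumOfCaughtLiteral() @nogc
{
    // staticArray infers element type and length from the literal,
    // so the result is an int[3] and nothing is allocated on the GC heap.
    auto caught = [1, 2, 3].staticArray;
    static assert(is(typeof(caught) == int[3]));

    int sum = 0;
    foreach (v; caught)
        sum += v;
    return sum;
}
```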

In a similar fashion, numeric literals decay: `typeof(1)` is 
`int`, but knowing something is `1` is so much more concrete than 
knowing it’s some `int`.
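
The same effect can be observed with numeric literals in current D. The sketch below assumes nothing beyond the language itself; `fromLiteral` is an illustrative name.

```d
static assert(is(typeof(1) == int)); // the decayed type of the literal

// `fromLiteral` is a made-up name for this sketch.
ubyte fromLiteral()
{
    ubyte a = 1;       // fine: the literal's value is known to fit
    enum int one = 1;
    ubyte b = one;     // fine: a manifest constant keeps its value
    int x = 1;         // once stored in a variable, only `int` remains
    static assert(!__traits(compiles, { ubyte c = x; }));
    return cast(ubyte)(a + b);
}
```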

Add `__typeof` that returns a non-decayed type, one that has all 
the information the compiler officially retains, and add the 
respective results for this operator. That means, with `x` some 
run-time `int` variable, `__typeof([1, 2, x])` isn’t `int[]`. 
It’s something like `__array!(__integer!(int, 1), __integer!(int, 
2), int)`. Lastly, add a parameter storage class `__nodecay` or 
`@nodecay` that only matters when the type of the parameter is a 
template type parameter and is inferred; then, inference does not 
decay the type before binding the type parameter.

This would enable low-level functionality where one can 
special-case certain values, e.g. giving a function a specialized 
overload with a parameter type of `__typeof(0)` that’s only 
matched by the constant `0` or `int(0)` or an enum of type `int` 
with value `0`, but not `0u`, `0L`, or an enum of type `ubyte` 
with value 0, and of course not by anything that might have a 
value distinct from `0` such as `1` or a run-time value.

Because those types would be templates (or behave as such), their 
arguments could be matched:
```d
void f0(T)(__integer!(T, 0) x) { }
void fi(int x) { }
alias f = f0;
alias f = fi;
f(0); // calls f0!(int)
f(1); // calls fi
```

(No need to change overload resolution: Partial ordering 
determines that `fi` can be called with `f0`’s (synthesized) 
parameter type `__integer!(int, 0)`, but `f0` can’t be called 
with `fi`’s parameter type `int`, so `f0` is more specific.)
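
For comparison, the closest thing in current D is value specialization on a template value parameter, which forces the value to be passed as a template argument instead of an ordinary one. A minimal sketch, with `pick` as a hypothetical name:

```d
// Current D: value specialization on a template value parameter.
// `pick` is a hypothetical name used only for this comparison.
string pick(int n : 0)() { return "zero"; }  // matched only by the value 0
string pick(int n)()     { return "other"; } // general fallback
```

Here `pick!0()` selects the specialized overload and `pick!1()` the general one, by the same "more specialized wins" reasoning, but the call syntax differs from a normal function call, which is what the proposal would avoid.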

This is akin to how `staticArray` can infer type and size from an 
array literal, just that it’s much finer grained. The 
`staticArray` function makes the argument decay insofar as the 
types of the entries must be unified. With this addition, that 
becomes optional: `__array!(int, string)` would be totally valid, 
it just can’t decay into a static or dynamic array type, so if it 
has to because it’s not caught early enough, that’s an error. A 
tuple type of `int` and `string` could support an `opAssign` that 
takes an `__array!(int, string)` parameter and that allows `t = 
[1, "hi"]` even though `typeof([1, "hi"])` fails because it 
requires an array literal to decay into some `T[]` which this one 
can’t. Maybe a future edition could make `typeof([1, "hi"])` not 
be an error, but as of now, this can be used to test if two 
values have compatible type and we can’t just take that use case 
away.

This subsumes `enum` parameters and tuples to some degree.

Recap: In another DIP idea, I suggested an enum parameter would 
be a compile-time constant that’s passed the same way a run-time 
parameter is passed to a function call. Essentially what Zig 
calls `comptime` parameters.

I also suggested in the past that static arrays are basically 
homogeneous tuples and that they could be generalized to tuples. 
That would mean the syntax would be brackets, not parentheses, 
but that neatly solves the 1-tuple case, since `(x)` must stay 
`x`, but `[x]` isn’t the same as `x`. The idea there rests on the 
same decay observation: there’s no inherent need to decay array 
literals to static arrays immediately and to dynamic arrays 
further.

Of course, this idea doesn’t solve `auto enum` from the `enum` 
parameter idea as neatly. It also doesn’t add any tuple 
decomposition support and syntax sugars one might want to have. 
In that context, `__array` is a bad name and it should be 
`__tuple` instead.

It does solve e.g. compile-time format strings and indexing into 
a tuple:
```d
int format(Fmt, Ts...)(__nodecay Fmt fmt, in Ts args)
if (__traits(compiles, { enum string s = fmt; }))
{
     // If fmt is a string literal,
     // its type Fmt is a unit type,
     // i.e. enough to recreate the value
     // without even considering fmt.
     enum string s = fmt;
     // s can be analyzed like any compile-time constant
}

// string literals are zero-terminated
// and must be distinct from other array literals in undecayed form
// (hex strings are even more special)
format("%d", 10); // okay: Fmt = __string!(char, "%d")

// actual array literal
// okay: Fmt = __array!(__integer!(char, '%'), __integer!(char, 'd'))
format(['%', 'd'], 10);

string fmt = "%s";
// calls some other format function
// that must do run-time checking
format(fmt, 10);
```
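
For comparison, this is roughly how the compile-time/run-time split looks with Phobos today, where the format string has to be passed as a template argument to be checked early. The two wrapper function names are made up for this sketch.

```d
import std.format : format;

// Illustrative wrappers; only `format` itself comes from Phobos.
string checkedAtCompileTime()
{
    // The format string is a template argument, so std.format can
    // validate it against the argument types during compilation.
    return format!"%d"(10);
}

string checkedAtRunTime(string fmt)
{
    // A run-time format string can only be validated when called.
    return format(fmt, 10);
}
```

With the proposed `__nodecay` parameter, both cases could share the ordinary call syntax `format(fmt, 10)`.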

As for static indexing:
```d
struct Tuple(Ts...)
{
     Ts expand;

     static foreach (i; 0 .. Ts.length)
         ref Ts[i] opIndex(T)(__integer!(T, i)) return => expand[i];

     // or

     static foreach (i; 0 .. Ts.length)
         ref Ts[i] opIndex(__integer!(size_t, i)) return => expand[i];
}

Tuple!(int, string) t;

auto x = t[0]; // calls t.opIndex!int(__integer!(int, 0));
auto y = t[1u]; // calls t.opIndex!uint(__integer!(uint, 1));
auto z = t[2L]; // error, none of the overloads match

// or

auto x = t[0]; // calls t.opIndex(__integer!(size_t, 0));
auto y = t[1u]; // calls t.opIndex(__integer!(size_t, 1));
auto z = t[2L]; // error, none of the overloads match
```

The second alternative only works if the types `__integer!(T, x)` 
have semantics that they convert implicitly to each other if 
their values are equal.
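
For comparison, `std.typecons.Tuple` already supports static indexing in current D, but through `alias this` to its field sequence and not through `opIndex`, so only compile-time constant indices are accepted. A small sketch, with `firstPlusLength` as an illustrative name:

```d
import std.typecons : tuple;

// `firstPlusLength` is a made-up name for this sketch.
int firstPlusLength()
{
    auto t = tuple(1, "hi");
    // Indexing with a constant yields the field's own type.
    static assert(is(typeof(t[0]) == int));
    static assert(is(typeof(t[1]) == string));

    // A run-time index cannot select a field of a heterogeneous tuple.
    size_t i = 0;
    static assert(!__traits(compiles, t[i]));

    return t[0] + cast(int) t[1].length;
}
```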
Jun 25
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
I have a hunch that what you are seeing is actually just different 
stages of semantic analysis.

Initially an array literal does not have a type: 
https://github.com/dlang/dmd/blob/c87ec3505e6675c9cd14697a4a18a0212d6a3b46/compiler/src/dmd/parse.d#L8715

``e = new AST.ArrayLiteralExp(loc, null, values);``

That is what the null means there.

https://github.com/dlang/dmd/blob/c87ec3505e6675c9cd14697a4a18a0212d6a3b46/compiler/src/dmd/expression.d#L1909


I don't think that there is a type to expose.
Jun 28
Paul Backus <snarwin gmail.com> writes:
On Saturday, 28 June 2025 at 07:58:03 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 I have a hunch that what you are seeing is actually just 
 different stages of semantic analysis.

 Initially an array literal does not have a type: 
 https://github.com/dlang/dmd/blob/c87ec3505e6675c9cd14697a4a18a0212d6a3b46/compiler/src/dmd/parse.d#L8715

 ``e = new AST.ArrayLiteralExp(loc, null, values);``

Obviously it doesn't have a type during parsing; types are determined in semantic analysis.
Jun 28
IchorDev <zxinsworld gmail.com> writes:
On Wednesday, 25 June 2025 at 17:56:09 UTC, Quirin Schroll wrote:
 I don’t know if I’m getting this 100% correct, but saying `[1, 
 2]` is an `int[]` isn’t the full story since it’s more like an 
 `int[2]` that “decays” to a `int[]` (which usually ends up on 
 the heap) unless you “catch” it early enough. Catching such a 
 literal isn’t too difficult, the `staticArray` function does it.

There’s a really clever idea in here somewhere but I think you didn’t quite hit the nail on the head. The one-way implicit casting of literals often gets in my way…
```d
auto x = 1;
ushort y = x; //ERR: `1` is an `int`… ugh
ushort z = 1; //OK: `1` is actually `ushort` now
auto a = [1,2];
ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can totally be a `ubyte[]`…
```
One thing that could mitigate this is having explicit suffixes for char, byte, and short. But what would be really nice is if the language could keep track of when a variable’s type was inferred from nothing but a literal, and then allow the usual type conversion restrictions to be bent a little based on how the literal could’ve been interpreted as a different type. It’s a similar idea to https://dlang.org/spec/type.html#vrp which never accounts for variables that have just been unconditionally assigned a literal value.

Oddly, the idea that I described already exists for enums:
```d
enum int[] x = [1,2];
ushort[] y = x; //no error?!
ushort[2] z = x; //still no error!!?!
```
Hopefully that makes sense.

TL;DR: Let variables that are initialised/unconditionally assigned literals follow the same implicit cast rules as the literal, meaning that `int x=1; byte y=x;` compiles since `1` can be inferred as a byte.
Jul 04