
digitalmars.D - [just talk] what if implicitly typed literals were disallowed

"Adam D. Ruppe" <destructionator gmail.com> writes:
I'm saying [just talk] because this isn't a serious proposal to 
change the D language, I just want to discuss what if.


The thread about string and rep had me bring up my library string 
implementation again, and it was shot down because with 
'auto a = "foo";' a would not have the library type.

It made me wonder: what if we got rid of literals? Well, can't do 
that, we need some kind of building block. But, what if we 
required their use to have a clear type?
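
A minimal sketch of that objection in current D, assuming a hypothetical library type MyString (not part of druntime or Phobos): without a named type the literal keeps its built-in type, while naming the type routes it through the constructor.

```d
// MyString is a hypothetical library string type, used only for illustration.
struct MyString
{
    string payload;
    this(string s) { payload = s; }
}

void main()
{
    auto a = "foo";      // no type named: inferred as the built-in string
    MyString b = "foo";  // type named: the literal goes through the constructor

    static assert(is(typeof(a) == string));
    static assert(is(typeof(b) == MyString));
    assert(b.payload == a);
}
```

Under the rule discussed here, the first declaration would be rejected rather than silently producing the built-in type.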

I'm tempted to say literals should only be used when there is an 
explicit type given somewhere:


int a = 10; // allowed, explicitly int
auto a = 10; // disallowed, literal 10 has no declared type

int foo(int a) { return 0; /* allowed, explicit type in function signature */ }

foo(10); // allowed, explicit type in function signature

auto a = 1 + 1; // illegal, the type is still not explicit

int a = 10;
auto b = 10 + a; // legal, a has an explicit type, so b does too

void foo(T)(T a)  {}

foo(10); // illegal, T is inferred and the argument is a literal, so no go
foo!int(10); // legal

auto a = cast(int) 10; // legal, the cast makes it explicit



Why does this matter? Well it gives you a lot of user defined 
capabilities here. Let's combine with the following:

* make the internal type of the literals a special word. _int 
instead of int for example. Now int is not a keyword - it is 
available to library redefinitions (surely in object.d, but you 
can import your own int if you wanted to)

* have the library give them the prettier names. this might be 
"alias int = _int;" or it might be struct int { this(_int i) { } 
/* custom functionality */}; in other words you can give complete 
user defined behavior without name conflicts. This can be 
replaced at any time by hacking up your druntime.


* the requirement of being explicit ensures the internal types 
don't leak out somewhere unintentionally in templates or other 
auto types, so your behavior is always predictable, thus solving 
the problem of the library replacement for string



Now using the existing richness for library types, we can 
customize everything and put it in without the compiler even 
caring.


Of course having to explicitly name literals *somewhere* could be 
a major hassle... the code breakage means that part is definitely 
out for D (though renaming the compiler types is something we 
could do, since the object.d aliases will just work except maybe 
for error messages)

But how annoying? Ignoring the reality of the situation, is this 
a good idea in theory?

destroy
Oct 24 2012
Timon Gehr <timon.gehr gmx.ch> writes:
On 10/24/2012 06:38 PM, Adam D. Ruppe wrote:
 ...
 But how annoying? Ignoring the reality of the situation, is this a good
 idea in theory?

 destroy

I'd miss IFTI (implicit function template instantiation) on literals.

If your goal is to be able to customize the type of literals from druntime, it is also possible to make the compiler invoke some predefined templates/functions in order to transform literals to an arbitrary library-defined type.
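
For reference, a small sketch of what IFTI on literals does in current D. The helper inferredAs is hypothetical and exists only for this illustration; it reports which type the compiler deduced for the literal.

```d
// inferredAs is a hypothetical helper, not a Phobos function.
// Expected is given explicitly; T is deduced from the argument (IFTI).
bool inferredAs(Expected, T)(T value)
{
    return is(T == Expected);
}

void main()
{
    assert(inferredAs!int(10));        // plain integer literal: T is int
    assert(inferredAs!long(10L));      // the L suffix makes T long
    assert(inferredAs!double(1.5));    // floating-point literal: T is double
    assert(inferredAs!string("foo"));  // string literal: T is string
    assert(!inferredAs!long(10));      // no context can widen the literal here
}
```

Requiring an explicit type on every literal would turn each of these calls into an error unless T were spelled out.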
Oct 24 2012
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Oct 24, 2012 at 06:38:34PM +0200, Adam D. Ruppe wrote:
[...]
 The thread about string and rep had me bring up my library string
 implementation again, and it was shot down because of "auto a =
 "foo";" meaning a would not have the library type.
 
 It made me wonder: what if we got rid of literals? Well, can't do
 that, we need some kind of building block. But, what if we required
 their use to have a clear type?
 
 I'm tempted to say literals should only be used when there is an
 explicit type given somewhere

Well, completely getting rid of type inference for literals is a bit extreme, I think, but I definitely agree with the spirit of your proposal: literals should NOT be assigned a type beforehand; the compiler should always look for the best fit in the given context first, and THEN if nothing else says anything more about the type, fall back to the default type.

So for example, "f(10)" will correctly infer the type based on f's signature, and so will f([1,2,3]) where f's parameter may be typed, say, ushort[] or even real[]. This is particularly important when f is a template function: the compiler should try to find all possible matches BEFORE falling back to int[]. Currently, it doesn't always do this, so sometimes you have to explicitly type the literal in order to get it to match the desired template.

Only when nothing else works (say you wrote "auto x = [1,2,3]") should the compiler fall back to int[], for example.

[...]
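
A short sketch of the context-dependent typing described above, as far as current D already does it. The function names takesUshorts, takesReals and elementIs are made up for this example.

```d
// All three function names are hypothetical, for illustration only.
size_t takesUshorts(ushort[] a) { return a.length; }
size_t takesReals(real[] a) { return a.length; }

// Reports the element type deduced for a templated array parameter.
bool elementIs(Expected, T)(T[] a)
{
    return is(T == Expected);
}

void main()
{
    // The parameter type gives the literal its element type.
    assert(takesUshorts([1, 2, 3]) == 3);
    assert(takesReals([1, 2, 3]) == 3);

    // With no context, the literal falls back to int[].
    auto x = [1, 2, 3];
    static assert(is(typeof(x) == int[]));

    // In a template, the literal is typed before matching, so T is int.
    assert(elementIs!int([1, 2, 3]));

    // Typing the data explicitly is the workaround mentioned above.
    ushort[] typed = [1, 2, 3];
    assert(elementIs!ushort(typed));
}
```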
 Why does this matter? Well it gives you a lot of user defined
 capabilities here. Let's combine with the following:
 
 * make the internal type of the literals a special word. _int
 instead of int for example. Now int is not a keyword - it is
 available to library redefinitions (surely in object.d, but you can
 import your own int if you wanted to)
 
 * have the library give them the prettier names. this might be
 "alias int = _int;" or it might be struct int { this(_int i) { } /*
 custom functionality */}; in other words you can give complete user
 defined behavior without name conflicts. This can be replaced at any
 time by hacking up your druntime.
 
 
 * the requirement of being explicit ensures the internal types don't
 leak out somewhere unintentionally in templates or other auto types,
 so your behavior is always predictable, thus solving the problem of
 the library replacement for string

I think this is orthogonal to assigning a type to a literal. You could, in theory, modify the compiler so that a literal like [1,2,3] is typed Integer[], where Integer is aliased to int in druntime, and can be overridden by a user-defined Integer. This does not require that literals have no type at all.

Anyway, on a related topic: another thing that irks me about D literals is that there is sometimes unexpected implicit copying. Something like:

real[] x = [1,2,3];

contains an implicit runtime copy of a compile-time built array [1,2,3] into x. Ideally, since literals are by definition used only in a single place, the array being initialized should be constructed in-place with the specified values. The current compiler emits a call to a literal-construction function at runtime, which is unnecessary overhead when the literal is small (and literals are usually small, since otherwise you wouldn't write things that way!); it could've been just a handful of MOVs instead of an entire function call.

T

-- 
Little children, little troubles.
Oct 24 2012
"Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 24 October 2012 at 19:00:03 UTC, Timon Gehr wrote:
 I'd miss ifti on literals.

Yeah, that'd be a total suck.

 If your goal is to be able to customize the type of literals from
 druntime, it is also possible to make the compiler invoke some
 predefined templates/functions in order to transform literals to an
 arbitrary library-defined type.

OOooh, I like it. Then kill off the keywords while you're at it and boom.

alias _int int;
/* repeat for the rest */

template __d_literal(T, T value) {
    alias value __d_literal;
}

and you have the current behavior, but complete customization potential. That's potentially awesome. You could even wrap literals just to identify them and overload functions on them.

whoa i kinda want.
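
A hedged sketch of how such a hook could look if spelled out by hand. No D compiler calls a literal hook, so dLiteral, wrappedLiteral, Literal and describe are all hypothetical names, and the "compiler rewrite" is done manually at the call sites. An enum is used for the result instead of the alias form above.

```d
// Hypothetical identity hook: the literal keeps its built-in type.
template dLiteral(T, T value)
{
    enum dLiteral = value;
}

// Hypothetical wrapper that marks a value as having come from a literal.
struct Literal(T)
{
    T value;
    alias value this; // still usable wherever a T is expected
}

// Hypothetical customized hook: literals arrive wrapped.
template wrappedLiteral(T, T value)
{
    enum wrappedLiteral = Literal!T(value);
}

// Overloads that tell literals apart from ordinary values.
string describe(int x) { return "plain int"; }
string describe(Literal!int x) { return "literal"; }

void main()
{
    enum a = dLiteral!(int, 10); // what "10" would lower to by default
    static assert(is(typeof(a) == int));
    assert(describe(a) == "plain int");

    // With the wrapping hook, the exact-match overload is chosen.
    assert(describe(wrappedLiteral!(int, 10)) == "literal");

    // alias this keeps the wrapped literal usable as an int.
    int sum = wrappedLiteral!(int, 10) + 1;
    assert(sum == 11);
}
```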
Oct 24 2012
"Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 24 October 2012 at 19:04:59 UTC, H. S. Teoh wrote:
 literals should NOT be assigned a type beforehand; the
 compiler should always look for the best fit in the given 
 context first, and THEN if nothing else says anything more
 about the type, fall back to the default type.

Yup. I actually thought D was already doing some of this tbh. Heck maybe it is. I was just focusing on complete customization there.

But another thing I'd like is for literals to be infinitely sized until context forces it down.

So if you were to write some enormous literal like 2^70 that should work, then range checks say "hey too big" if you try to assign it to a long or otherwise work on it.

But if we had the illusion of infinite size at compile time we could do magic like bigint literals by breaking up a huge number in CTFE (which should prolly have well defined endianness so your ctfe bitops can make sense).
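
For comparison, a sketch of where current D stands: range checks on literals already reject values that do not fit the declared type, but a value as large as 2 to the power 70 has to go through a string and std.bigint.BigInt rather than an integer literal.

```d
import std.bigint : BigInt;

void main()
{
    // Range checks on literals: a value that does not fit is a compile error.
    static assert(!__traits(compiles, { ubyte b = 300; }));
    static assert(!__traits(compiles, { int i = 3_000_000_000; }));
    static assert(__traits(compiles, { ubyte b = 255; }));

    // 2 to the power 70 does not fit in a long, so today it is written
    // as a string and parsed by BigInt.
    auto big = BigInt("1180591620717411303424");
    assert(big == BigInt(1) << 70);
    assert(big > long.max);
}
```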
 Anyway, on a related topic: another thing that irks me about D 
 literals is that there is sometimes unexpected implicit copying.

Aye. Don has talked about this before. I disagree that [] should be disallowed on non-static data (something IIRC he wants, to make it a clear literal, removing the need for any runtime stuff at all), but if it is static, it certainly should act that way too.

I guess one benefit of [] being defined to be compile time is it could be an implicit CTFE force:

auto a = [foo(), 10]; // eval foo as if this was an enum, failing to compile if it can't do it

But I really like the convenient syntax of [] on runtime variables. We could say array() instead or whatever but meh, it'd still be annoying when the code breaks.
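
A sketch of the difference in current D, where the CTFE force has to be requested with enum. The function foo here is a stand-in for any CTFE-able function.

```d
// foo is a placeholder for any function that can run at compile time.
int foo() { return 42; }

void main()
{
    // enum forces compile-time evaluation; this fails to compile if
    // foo cannot be run in CTFE.
    enum a = [foo(), 10];
    static assert(a.length == 2);
    static assert(a[0] == 42);

    // With auto, foo is called at runtime and the array is built then.
    auto b = [foo(), 10];
    assert(b == a);
}
```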
Oct 24 2012
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Oct 25, 2012 at 12:55:36AM +0200, Adam D. Ruppe wrote:
 On Wednesday, 24 October 2012 at 19:00:03 UTC, Timon Gehr wrote:

  If your goal is to be able to customize the type of literals from
  druntime, it is also possible to make the compiler invoke some
  predefined templates/functions in order to transform literals to an
  arbitrary library-defined type.

 OOooh, I like it. Then kill off the keywords while you're at it and boom.

But there might be instances where you want to refer to the built-in types. Having the keywords around (perhaps suitably renamed) seems to be needed still.
 alias _int int;
 /* repeat for the rest */
 
 template __d_literal(T, T value) {
     alias value __d_literal;
 }
 
 and you have the current behavior, but complete customization
 potential. That's potentially awesome. You could even wrap literals
 just to identify them and overload functions on them.
 
 whoa i kinda want.

I like this idea too. Coupled with CTFE, it can potentially let you rewire the language in new and novel ways. I mean, think about this one: initialize a complicated multi-dimensional array with a custom string notation, expressed as a token string, and have your custom class/struct transform that into compile-time initialization of said array.

And what about _transparent substitution_ of AA literals for a custom hash implementation? You could even make std.container take AA literals as initializers for hashes, and have CTFE transform that into the appropriate ctor/init calls.

You could use AA literals for all kinds of different user-defined types. Use AA literals to implement a *library* solution for Python's named-parameter function calling, by turning them into the appropriate tuples.

You might even be able to make BigNum literals without using strings (or am I dreaming too much about this one?). Library implementation of quad-precision floats. Or library implementation of int.isPrime, which computes an int literal's primality at compile-time?

(P.S. My perl script doth mock me with the random signature it selected below. :-P)

T

-- 
Just because you can, doesn't mean you should.
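
The primality idea can be approximated in current D without any language change. This is only a sketch: isPrime is a hypothetical free function, not an existing int property, and UFCS is what allows the 97.isPrime spelling.

```d
// isPrime is a hypothetical free function; no int.isPrime property exists.
// Trial division, written so the loop condition cannot overflow.
bool isPrime(int n) pure nothrow @safe @nogc
{
    if (n < 2)
        return false;
    for (int i = 2; i <= n / i; ++i)
    {
        if (n % i == 0)
            return false;
    }
    return true;
}

// Evaluated entirely at compile time through CTFE.
static assert(97.isPrime);
static assert(!91.isPrime); // 91 = 7 * 13
enum bool sevenIsPrime = 7.isPrime;

void main()
{
    assert(sevenIsPrime);
    assert(!isPrime(1));
}
```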
Oct 24 2012
"bearophile" <bearophileHUGS lycos.com> writes:
Adam D. Ruppe:

 I was just focusing on complete customization there. But 
 another thing I'd like is for literals to be infinitely sized 
 until context forces it down.

 So if you were to write some enormous literal like 2^70 that 
 should work, then range checks say "hey too big" if you try to 
 assign it to a long or otherwise work on it.

This is how Go is designed, I think. It also helps against this kind of bug, which is *not* acceptable in a language that defines itself as "safer than C":

http://d.puremagic.com/issues/show_bug.cgi?id=4835

Bye,
bearophile
Oct 24 2012
"Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 24 October 2012 at 23:16:15 UTC, bearophile wrote:
 It also helps against this kind of bugs that are *not* 
 acceptable in a language that defines itself as "safer than C":

Yes, I agree this should be caught with literals at compile time. It's just kinda silly not to. (if you explicitly want overflow, you can always cast it or something)
Oct 24 2012
"Adam D. Ruppe" <destructionator gmail.com> writes:
BTW, bearophile, we've talked about runtime overflow checks 
before. I'm generally against them, but if we had user defined 
basic types, we could implement them in the library (already 
possible, though some important optimization opportunities are 
missed on it right now which is a potential practical problem).

But then we could make all ints use the special overflow checked 
type too by changing a line in druntime.
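
A rough sketch of such a library type, assuming druntime's core.checkedint helpers adds and muls are available. CheckedInt is a made-up name for this example and covers only addition and multiplication.

```d
import core.checkedint : adds, muls;
import std.exception : assertThrown;

// CheckedInt is a hypothetical library type, for illustration only.
struct CheckedInt
{
    int value;

    // Addition and multiplication that throw instead of wrapping around.
    CheckedInt opBinary(string op)(CheckedInt rhs) const
        if (op == "+" || op == "*")
    {
        bool overflow;
        static if (op == "+")
            immutable result = adds(value, rhs.value, overflow);
        else
            immutable result = muls(value, rhs.value, overflow);
        if (overflow)
            throw new Exception("integer overflow in '" ~ op ~ "'");
        return CheckedInt(result);
    }
}

void main()
{
    assert((CheckedInt(2) + CheckedInt(3)).value == 5);
    assert((CheckedInt(6) * CheckedInt(7)).value == 42);
    assertThrown(CheckedInt(int.max) + CheckedInt(1));
    assertThrown(CheckedInt(int.max) * CheckedInt(2));
}
```

With user-definable basic types, the one-line druntime change mentioned above would amount to pointing the int name at a type like this.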
Oct 24 2012
"Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 24 October 2012 at 23:15:35 UTC, H. S. Teoh wrote:
 there might be instances where you want to refer to the 
 built-in types. Having the keywords around (perhaps suitably 
 renamed) seems to be needed still.

Oh yes, definitely. I'm thinking of making them _int or __int rather than int, just freeing up the common words. Heck I'd like to have int and char available as variable names anyway so hey.
 And what about _transparent substitution_ of AA literals for a 
 custom hash implementation?

yessss
 You could even make std.container take AA literals
 as initializers for hashes, and have CTFE transform that into 
 the

I think this would work today actually... just define a CTFE constructor, similar to std.conv.octal, though it'd still have to be consistent types; T[V] where T is always the same, so not quite suitable for named param calling or json.
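
A sketch of that approach, assuming AA literals can be passed to and iterated by a function under CTFE. Table and makeTable are hypothetical names; the container is just two parallel arrays, and every value must share one type, which is the limitation noted above.

```d
// Table and makeTable are hypothetical, for illustration only.
struct Table
{
    string[] keys;
    int[] values;

    // Linear lookup; enough to show the data survived the conversion.
    int lookup(string key) const
    {
        foreach (i, k; keys)
        {
            if (k == key)
                return values[i];
        }
        throw new Exception("missing key: " ~ key);
    }
}

// Converts a built-in AA into the custom container; usable in CTFE.
Table makeTable(int[string] aa)
{
    Table t;
    foreach (key, value; aa)
    {
        t.keys ~= key;
        t.values ~= value;
    }
    return t;
}

void main()
{
    // enum forces the conversion to happen at compile time.
    enum table = makeTable(["one": 1, "two": 2]);
    static assert(table.keys.length == 2);
    assert(table.lookup("one") == 1);
    assert(table.lookup("two") == 2);
}
```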
 You might even be able to make BigNum literals without using 
 strings (or am I dreaming too much about this one?).

this should definitely be possible if the literals didn't overflow.
Oct 24 2012
"BLM768" <blm768 gmail.com> writes:
 And what about _transparent substitution_ of AA literals for a 
 custom hash
 implementation?

That would also be an excellent way to sidestep the current issues with AAs. The AA code would have to be heavily refactored, which could clean up the mess and probably get rid of a lot of bugs. It could also make performance tuning easier. Hopefully, this would make other aspects of the compiler and runtime simpler as well. I'm not sure how the optimizer would like the new system, though. It's likely that it would have to be beefed up, especially in the area of function inlining between compilation units.
Oct 25 2012
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Oct 25, 2012 at 01:40:19AM +0200, Adam D. Ruppe wrote:
 BTW, bearophile, we've talked about runtime overflow checks before.
 I'm generally against them, but if we had user defined basic types,
 we could implement them in the library (already possible, though
 some important optimization opportunities are missed on it right now
 which is a potential practical problem).

Didn't Andrei give a working example of that in TDPL?
 But then we could make all ints use the special overflow checked
 type too by changing a line in druntime.

Yeah that would be neat.

T

-- 
Some ideas are so stupid that only intellectuals could believe them. -- George Orwell
Oct 25 2012