digitalmars.D - Value Preservation and Polysemy

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I've had a talk with Walter today, and two interesting things transpired.

First off, Walter pointed out that I was wrong about one conversion rule 
(google value preservation vs. type preservation). It turns out that 
every unsigned type narrower than int is actually converted to int, NOT 
unsigned int as I thought. This is the case in C, C++, and D.
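
The promotion rule described above can be checked directly in D. The following is a minimal sketch, not something taken from the thread: the helper name sumPromoted is hypothetical, and the static asserts only inspect the static type of each expression.

```d
// Operands narrower than int are promoted to int before arithmetic,
// even when both are unsigned (value-preserving promotion).
static assert(is(typeof(ubyte.init + ubyte.init) == int));
static assert(is(typeof(ushort.init + ushort.init) == int));

// Mixing with a full-width unsigned operand still yields uint.
static assert(is(typeof(ushort.init + uint.init) == uint));

/// Hypothetical helper: adds two ushort values. The sum is computed in
/// int, so it does not wrap around at 16 bits.
int sumPromoted(ushort a, ushort b)
{
    return a + b;
}
```

Because the sum is formed in int, two maximal ushort values add up to a number that no longer fits in ushort, which is exactly why a narrowing check is needed when the result is stored back.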

Second, as of today Walter devised a very crude heuristic for 
typechecking narrowing conversions:

(a) a straight assignment x = y fails if y is wider than x.

(b) however, x = e compiles for more complex expressions EVEN if there 
is potential for loss of precision.

Now enter polysemy. With that, we can get the right rules in place and 
minimize false positives. An expression will yield a polysemous value 
with the as-C-does-it type as its principal type. The secondary type is 
a carefully computed narrower type that is the tightest actual type.

If you just say auto or use the value with overloaded functions etc., 
it's just like in C - the as-in-C type will be in force. But if you 
specifically ask for a narrower type, the secondary type takes effect.

Examples:

uint u1 = ...;
ushort us1 = ...;
auto a = u1 & us1; // fine, a is uint
ushort b = u1 & us1; // fine too, secondary type kicks in

long l1 = ...;
auto c = u1 / l1; // c is long
int d = u1 / l1; // fine too, secondary type kicks in

We need to think this through for complex expressions etc. Walter and I 
are quite excited that this will take care of a significant portion of 
the integral conversions mess (in addition to handling literals, 
constants, and variables within a unified framework).

The plan is to deploy polysemous integrals first without changing the 
rest of the conversion rules. At that point, if the technique turns out 
to enjoy considerable success, Walter agreed to review and possibly 
stage in the change I suggested to drop the implicit signed -> unsigned 
conversion. With that, I guess we can claim victory in the war between 
spurious vs. too lax conversions.

I'm very excited about polysemy. It's entirely original to D, covers a 
class of problems that can't be addressed with any of the known 
techniques (subtyping, coercion...) and has a kick-ass name to boot. 
Walter today mentioned he's still not sure I hadn't made "polysemy" up. 
Indeed, Thunderbird and Firefox are suggesting it's a typo - please "add 
to dictionary" :o).


Andrei
Nov 28 2008
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-28 19:08:06 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Walter today mentioned he's still not sure I hadn't made "polysemy" up. 
 Indeed, Thunderbird and Firefox are suggesting it's a typo - please 
 "add to dictionary" :o).

Does this help?

polysemy
noun Linguistics
the coexistence of many possible meanings for a word or phrase.
-- New Oxford American Dictionary, 2nd Edition

polysemy
n : the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings [syn: lexical ambiguity] [ant: monosemy]
-- WordNet <http://wordnet.princeton.edu/perl/webwn?s=polysemy>

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 28 2008
prev sibling next sibling parent Max Samukha <samukha voliacable.com.removethis> writes:
On Fri, 28 Nov 2008 16:08:06 -0800, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 I'm very excited about polysemy. It's entirely original to D, covers a 
 class of problems that can't be addressed with any of the known 
 techniques (subtyping, coercion...) and has a kick-ass name to boot. 
 Walter today mentioned he's still not sure I hadn't made "polysemy" up. 
 Indeed, Thunderbird and Firefox are suggesting it's a typo - please "add 
 to dictionary" :o).

I think it's a very relevant term. Given that the type of a value attaches a meaning to that value, values that get different types (meanings) depending on the context are polysemous. Very cool.
Nov 29 2008
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 I've had a talk with Walter today, and two interesting things transpired.
 
 First off, Walter pointed out that I was wrong about one conversion rule 
 (google value preservation vs. type preservation). It turns out that 
 everything that's unsigned and less than int is actually converted to 
 int, NOT unsigned int as I thought. This is the case in C, C++, and D.

That has some interesting consequences.

ushort x = 0xFFFF;
short y = x;
printf("%d %d %d\n", x>>1, y>>1, y>>>1);
// prints: 32767 -1 2147483647

What a curious beast the >>> operator is!

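
The three printed values follow from promotion to int happening before the shift. A small sketch of that reasoning, with hypothetical function names (arithmeticShift, logicalShift, shiftUnsigned) that are not part of the thread:

```d
// The shifted expression has type int, not short: the operand is
// promoted (and sign-extended) before the shift is applied.
static assert(is(typeof(short.init >>> 1) == int));

/// >> on a promoted short replicates the sign bit of the 32-bit int.
int arithmeticShift(short v) { return v >> 1; }

/// >>> shifts a zero into bit 31 of the promoted, sign-extended int.
int logicalShift(short v) { return v >>> 1; }

/// A ushort is promoted to a non-negative int, so >> behaves as expected.
int shiftUnsigned(ushort v) { return v >> 1; }
```

With a short holding -1, the promoted int is all ones, so the logical shift clears only the top bit of the 32-bit value rather than the top bit of the original 16-bit value.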
 I'm very excited about polysemy. It's entirely original to D, covers a 
 class of problems that can't be addressed with any of the known 
 techniques (subtyping, coercion...) and has a kick-ass name to boot. 

I agree. By making the type system looser in the one place where you actually need it to be loose, you can tighten it everywhere else. Fantastic.
Dec 01 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
I agree. By making the type system looser in the one place where you actually need it to be loose, you can tighten it everywhere else. Fantastic.

My enthusiasm about polysemy got quite a bit lower when I realized that the promises of polysemy for integral operations can be provided (and actually outdone) by range analysis, a well-known method.

The way it's done: for each integral expression in the program assign two numbers: the smallest possible value, and the largest possible value. Literals will therefore have a salami-slice-thin range associated with them. Whenever code asks for a lossy implicit cast, check the range and if it fits within the target type, let the code go through.

Each operation computes its ranges from the range of its operands. The computation is operation-specific. For example, the range of a & b is max(a.range.min, b.range.min) to min(a.range.max, b.range.max). Sign considerations complicate this a bit, but not quite much.

The precision of range analysis can be quite impressive, for example:

uint b = ...;
ubyte a = ((b & 2) << 6) | (b >> 24);

typechecks no problem because it can prove no loss of information for all values of b.

Andrei
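
The range computation described above can be sketched for unsigned operands as follows. This is an illustration under stated assumptions, not the compiler's implementation: Range, literal, full, maskCovering, bitAnd, bitOr, shl, shr and fits are all hypothetical names, shifts are assumed not to overflow, and signed values are not handled. One deliberate difference from the post: the lower bound used for a & b is 0 rather than the larger of the two minimums, because 1 & 2 == 0 shows that the larger minimum is not a safe lower bound.

```d
import std.algorithm.comparison : max, min;

/// Closed interval [lo, hi] of the values an unsigned expression can take.
struct Range
{
    ulong lo;
    ulong hi;
}

/// A literal has a single-point range.
Range literal(ulong v) { return Range(v, v); }

/// Every value of the unsigned type T.
Range full(T)() if (__traits(isUnsigned, T)) { return Range(T.min, T.max); }

/// Smallest all-ones bit mask that is >= v.
ulong maskCovering(ulong v)
{
    ulong m = 0;
    while (m < v) m = (m << 1) | 1;
    return m;
}

/// AND only clears bits: the result never exceeds either operand,
/// and it can reach 0 (1 & 2 == 0), so the lower bound is 0.
Range bitAnd(Range a, Range b) { return Range(0, min(a.hi, b.hi)); }

/// OR only sets bits: the result is at least as large as either operand
/// and cannot use a bit position above the highest one in the inputs.
Range bitOr(Range a, Range b)
{
    return Range(max(a.lo, b.lo), maskCovering(max(a.hi, b.hi)));
}

/// Shifts by a known constant n (assumes no bits are shifted out of ulong).
Range shl(Range a, uint n) { return Range(a.lo << n, a.hi << n); }
Range shr(Range a, uint n) { return Range(a.lo >> n, a.hi >> n); }

/// A narrowing conversion to T is lossless when the whole range fits.
bool fits(T)(Range r) if (__traits(isUnsigned, T)) { return r.hi <= T.max; }

// The expression from the post, with the operand ranging over all of uint.
enum anyUint = full!uint();
static assert(fits!ubyte(
    bitOr(shl(bitAnd(anyUint, literal(2)), 6), shr(anyUint, 24))));
```

Tracing the example: b & 2 is within [0, 2], shifting left by 6 gives [0, 128], b >> 24 is within [0, 255], and the OR of the two stays within [0, 255], which fits ubyte.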
Dec 01 2008
parent Brad Roberts <braddr bellevue.puremagic.com> writes:
On Mon, 1 Dec 2008, Andrei Alexandrescu wrote:

 My enthusiasm about polysemy got quite a bit lower when I realized that the
 promises of polysemy for integral operations can be provided (and actually
 outdone) by range analysis, a well-known method.
 

The term I see associated with this technique, indeed well known, is Value Range Propagation. Combine this sort of accumulated knowledge with loop and if-condition analysis as well as inlining, and often a significant amount of dead code elimination can occur.

Later,
Brad
Dec 01 2008
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 I'm very excited about polysemy. It's entirely original to D,

I accused Andrei of making up the word 'polysemy', but it turns out it is a real word! <g>
Dec 01 2008
next sibling parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-01 21:16:58 +0100, Walter Bright <newshound1 digitalmars.com> said:

 Andrei Alexandrescu wrote:
 I'm very excited about polysemy. It's entirely original to D,

I accused Andrei of making up the word 'polysemy', but it turns out it is a real word! <g>

Is this the beginning of discriminating overloads also based on the return values?

Fawzi
Dec 01 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Fawzi Mohamed wrote:
Is this the beginning of discriminating overloads also based on the return values?

No. I think return type overloading looks good in trivial cases, but as things get more complex it gets inscrutable.
Dec 01 2008
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-01 22:30:54 +0100, Walter Bright <newshound1 digitalmars.com> said:

No. I think return type overloading looks good in trivial cases, but as things get more complex it gets inscrutable.

I agree that return type overloading can go very bad, but a little bit can be very nice.

Polysemy makes more expressions typecheck, but I am not sure that I want that. For example, with size_t & co I would almost always want stronger typechecking, as if size_t were a typedef, but with the usual rules with respect to ptr_diff, size_t,... (i.e. no cast between them). This is because mixing size_t with int or long is almost always suspicious, but you might see it only on the other platform (32/64 bit), and not on your own.

Something that I would find nice, on the other hand, is to have a kind of integer literal that automatically casts to the type that makes the most sense. I saw this in Aldor, which discriminated upon return type: there an integer like 23 would be seen as fromInteger(23), and would select the optimal overloaded fromInteger depending on the context. Sometimes you would need a cast, but most of the time things just work. This allowed 1 to be used also as the unit matrix, for example. I don't need that much, but +1/-1,... with something that might be long, short, real,... needs more care than it should, and normally it is obvious which type one expects.

Now such a change should be checked in detail, and one would probably also want a simple way to tell the compiler that an integer is really a 32 bit int. To be more compatible with C one could make a different choice: for example, these "adapting" integer literals could have a special extension, like "a", so that the normal integer literals keep exactly the same semantics as in C, and 0a, 1a, 12a would be these new integer types.

To choose the type of these "adapting" integers one would proceed as follows:

if it has an operation op(a,x) then take the type of x as the type of a (I would restrict op to +-*/% to keep it simple); if x is also adaptive, recurse.

If the whole expression is done and it is an assignment, look at the type of the variable. If the variable has no type (auto) -> error [one could default to long or int, but it can be dangerous]

if this is part of a function call f(a,...), try the values in the following order: long, int [one could try more, but again it can be expensive, one could also fail as before, but I think that this kind of use is widespread enough that it is good to try to guess, but I am not totally convinced about this]

Basically something like polysemy, but *only* for a kind of integer literal, and without introducing new types that can be used externally.

One could also try to make the normal 0,1,2,... behave like that, and have a special extension for the ones that are only 32 bits, but then, to minimize the surprises, one cannot easily decide "not to guess", and the default decision should be int, and not long, something that I am not sure is the best choice.

Fawzi

Implementation details: these adaptive numbers need at least to be represented temporarily within the compiler. Using longs for them, if one also wants to allow conversion to unsigned longs of maximum size, can be problematic. The compiler should use arbitrary precision numbers to represent them until the type is decided, or find the exact type before the conversion.
Dec 04 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Fawzi Mohamed wrote:
 I agree that return type overloading can go very bad, but a little bit
 can be very nice. Polysemy makes more expressions typecheck, but I am not
 sure that I want that. For example, with size_t & co I would almost
 always want stronger typechecking, as if size_t were a typedef, but with
 the usual rules with respect to ptr_diff, size_t,... (i.e. no cast
 between them). This is because mixing size_t with int or long is almost
 always suspicious, but you might see it only on the other platform
 (32/64 bit), and not on your own. Something that I would find nice, on
 the other hand, is to have a kind of integer literal that automatically
 casts to the type that makes the most sense.

Wouldn't value range propagation take care of that (and actually more)? A literal such as 5 will have a support range [5, 5] which provides enough information to compute the best type down the road.

Andrei
Dec 04 2008
next sibling parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-04 18:54:32 +0100, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

Wouldn't value range propagation take care of that (and actually more)? A literal such as 5 will have a support range [5, 5] which provides enough information to compute the best type down the road. Andrei

Exactly. My point was to apply this only to integer literals; if I understood correctly, you intended to apply it to everything. As I said, with size_t & co I would actually like tighter control, and range propagation gives me laxer control. With integer literals, on the other hand, I think range propagation or something similar is a good idea (because there I am sure that preserving the value is the correct choice).

Fawzi
Dec 04 2008
prev sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Thu, 04 Dec 2008 09:54:32 -0800, Andrei Alexandrescu wrote:

Wouldn't value range propagation take care of that (and actually more)? A literal such as 5 will have a support range [5, 5] which provides enough information to compute the best type down the road.

It sounds very nice and right, except it's incompatible with Cee.

Well, you can safely reduce the bit count, so that assigning "1025 & 15" to "byte" would go through without either a cast or a warning/error. But you cannot grow the bit count beyond the C limits; that is, you cannot return long for "1024 << 30." You should probably report an error, and you should provide some way to tell the compiler, "I mean it."

In the worst case, any shift, multiplication or addition will result in a compiler error. Am I missing something?
Dec 04 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
It sounds very nice and right, except it's incompatible with Cee. Well, you can safely reduce bit count so that assigning "1025 & 15" to "byte" would go without both a cast and a warning/error. But you cannot grow bitcount beyond the C limits, that is, you cannot return long for "1024 << 30." You should probably report an error, and you should provide some way to tell the compiler, "i mean it." In the worst case, any shift, multiplication or addition will result in a compiler error. Do I miss something?

Well any integral value carries:

a) type as per the C rule

b) minimum value possible

c) maximum value possible

The type stays the type as per the C rule, so there's no change there. If (and only if) a *narrower* type is asked as a conversion target for the value, the range is consulted. If the range is too large, the conversion fails.

Andrei
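
The split between "type as per the C rule" and "range consulted only on narrowing" can be observed with compile-time checks. The sketch below assumes a present-day D compiler that implements value range propagation for implicit narrowing; that is an assumption about the toolchain, since this thread only proposes the scheme. The helper name lowByte is hypothetical.

```d
// The static type follows the C-style rule, whatever the value range is.
static assert(is(typeof(1025 & 15) == int));
static assert(is(typeof(1024 << 30) == int)); // not widened to long

// Narrowing is accepted only when the range provably fits the target.
static assert( __traits(compiles, { byte b = 1025 & 15; }));
static assert( __traits(compiles, { int i; ubyte b = i & 0xFF; }));
static assert(!__traits(compiles, { int i; ubyte b = i; }));

/// Hypothetical helper: the mask bounds the result to [0, 255], so the
/// implicit conversion from int to ubyte needs no cast.
ubyte lowByte(int i)
{
    return i & 0xFF;
}
```

Note that nothing here ever makes an expression wider than its C type; the range is only an extra piece of information used when a narrower target is requested.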
Dec 04 2008
next sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Well any integral value carries:
 
 a) type as per the C rule
 b) minimum value possible
 c) maximum value possible
 
 The type stays the type as per the C rule, so there's no change there. If
 (and only if) a *narrower* type is asked as a conversion target for the
 value, the range is consulted. If the range is too large, the conversion
 fails.
 
 Andrei

Any idea how hard this would be to implement?

Also we've got an interesting case in D that other languages don't have: CTFE functions. I presume that range propagation would not apply during evaluation of the CTFE function, but when evaluation is complete, it would then become a known literal, which can have precise range propagation.

But there's still some funny issues:

uint foo(int x) { return 5; }

int bar(int y)
{
    ubyte w = foo(7); // this is a narrowing conversion, generates compiler warning (foo is not called as CTFE).
    return 6;
}

enum ubyte z = foo(7); // this is range propagated, so narrowing is OK.
enum int q = bar(3); // still gets a warning, because bar() didn't compile.

int gar(T)(int y)
{
    ubyte w = foo(7);
    return 6;
}

enum int v = gar!(int)(3); // is this OK???
Dec 05 2008
parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-05 09:40:03 +0100, Don <nospam nospam.com> said:

 Any idea how hard this would be to implement?
 
 Also we've got an interesting case in D that other languages don't have:
 CTFE functions. I presume that range propagation would not apply during
 evaluation of the CTFE function, but when evaluation is complete, it
 would then become a known literal, which can have precise range
 propagation.
 
 But there's still some funny issues:
 
 uint foo(int x) { return 5; }
 
 int bar(int y)
 {
     ubyte w = foo(7); // this is a narrowing conversion, generates compiler warning (foo is not called as CTFE).
     return 6;
 }
 
 enum ubyte z = foo(7); // this is range propagated, so narrowing is OK.
 enum int q = bar(3); // still gets a warning, because bar() didn't compile.
 
 int gar(T)(int y)
 {
     ubyte w = foo(7);
     return 6;
 }
 
 enum int v = gar!(int)(3); // is this OK???

What I would like is that one type of integer literal (optimally the one without annotation) has *no* fixed C type, but is effectively treated as an arbitrary-precision integer. Conversions from this arbitrary-precision integer to any other type are implicit as long as the *value* can be represented in the end type; otherwise they fail.

ubyte ub=4; // ok
byte ib=4; // ok
ubyte ub=-4; // failure
ubyte ub=cast(ubyte)cast(byte)-4; // ok (one could see if the removal of cast(byte) should be accepted)
byte ib=-4; // ok
byte ib=130; // failure
float f=1234567890; // ok even if there could be precision loss
int i=123455; // ok
long i= 2147483647*2; // ok

Note that as the value is known at compile time this can always be checked, and one would get rid of annotations (L, UL, s...) most of the time. Annotations should stay for compatibility with C and as a shorter alternative to, for example, cast(uint)1234.

This thing has one problem, and that is overloaded function calls... In that case a rule has to be chosen. A possible rule: find the smallest signed and unsigned types that can represent the number; if both are ok, fail, otherwise choose the one that is ok. Anyway, that should be discussed to make the compiler behave reasonably.

So this is what I would like. I do not know how this matches the polysemy proposal, because from Andrei's comments I am not sure I have understood it correctly.

So to answer Don: within my proposal your code would not be correct, because
      ubyte w = foo(7);

needs a cast, even when performed at compile time. You have no new types; special rules apply only to integer literals, and as soon as they assume a fixed C type the normal rules are valid.
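
The "implicit as long as the value can be represented" rule for literals amounts to a compile-time bounds check. A minimal sketch follows, with a hypothetical helper name (representable) and the simplifying assumption that the literal is carried in a long rather than in an arbitrary-precision integer, which is why targets as wide as long are excluded:

```d
/// Hypothetical helper: true when `value` is representable in the
/// integral type T. Restricted to types narrower than long so the
/// comparisons below never mix long with ulong.
bool representable(T)(long value)
    if (__traits(isIntegral, T) && T.sizeof < long.sizeof)
{
    return value >= T.min && value <= T.max;
}

// The literal cases from the list above, checked at compile time.
static assert( representable!ubyte(4));
static assert( representable!byte(4));
static assert(!representable!ubyte(-4));
static assert( representable!byte(-4));
static assert(!representable!byte(130));
static assert( representable!int(123_455));
static assert(!representable!int(2_147_483_647L * 2)); // needs long
```

The float case from the list is outside this sketch, since it concerns precision rather than range.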
Dec 05 2008
prev sibling parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-05 07:02:37 +0100, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

Well any integral value carries:

a) type as per the C rule

b) minimum value possible

c) maximum value possible

The type stays the type as per the C rule, so there's no change there. If (and only if) a *narrower* type is asked as a conversion target for the value, the range is consulted. If the range is too large, the conversion fails.

Andrei

basically the implicit conversion rules of C disallowing automatic unsigned/signed conversions to unsigned? Fawzi
Dec 05 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Fawzi Mohamed wrote:
 On 2008-12-05 07:02:37 +0100, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 [...]

Well any integral value carries:

a) type as per the C rule

b) minimum value possible

c) maximum value possible

The type stays the type as per the C rule, so there's no change there. If (and only if) a *narrower* type is asked as a conversion target for the value, the range is consulted. If the range is too large, the conversion fails.

Andrei

basically the implicit conversion rules of C disallowing automatic unsigned/signed conversions to unsigned? Fawzi

Where's the predicate? I don't understand the question. Andrei
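
The three-part description quoted above (C type, minimum, maximum) can be sketched as a value range that is consulted only when a narrower target is requested. This is a rough model, not the compiler's implementation: `ValueRange`, `andRange` and `convertsTo` are hypothetical names, and the `&` rule shown covers non-negative operands only.

```d
import std.stdio;

// Hypothetical model of the range carried by an integral expression.
struct ValueRange
{
    long min;
    long max;
}

// Range of `a & b` for non-negative operands: the result is never
// negative and never exceeds the smaller of the two maxima.
ValueRange andRange(ValueRange a, ValueRange b)
{
    assert(a.min >= 0 && b.min >= 0, "sketch handles non-negative operands only");
    return ValueRange(0, a.max < b.max ? a.max : b.max);
}

// A narrowing conversion is accepted only if the whole range fits in T.
bool convertsTo(T)(ValueRange r)
    if (T.sizeof < long.sizeof)
{
    return r.min >= T.min && r.max <= T.max;
}

void main()
{
    auto u1  = ValueRange(0, uint.max);    // any uint
    auto us1 = ValueRange(0, ushort.max);  // any ushort

    writeln(convertsTo!ushort(andRange(u1, us1)));  // models: ushort b = u1 & us1;
    writeln(convertsTo!ushort(u1));                 // models: ushort b = u1;
    writeln(convertsTo!byte(ValueRange(5, 5)));     // models: byte b = 5;
}
```

The first and third conversions are accepted and the second is rejected, which matches the `u1 & us1` example from the opening post and the `[5, 5]` support range mentioned for the literal 5.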
Dec 05 2008
parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-05 16:27:01 +0100, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 On 2008-12-05 07:02:37 +0100, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 [...]
 Well any integral value carries:
 
 a) type as per the C rule
 
 b) minimum value possible
 
 c) maximum value possible
 
 The type stays the type as per the C rule, so there's no change there. 
 If (and only if) a *narrower* type is asked as a conversion target for 
 the value, the range is consulted. If the range is too large, the 
 conversion fails.
 
 Andrei

basically the implicit conversion rules of C disallowing automatic unsigned/signed conversions to unsigned? Fawzi

Where's the predicate? I don't understand the question. Andrei

The implicit conversion rules in C, when performing arithmetic operations, allow up-conversion of types; basically the largest type present is used, which almost already respects a, b, c (using C names):

1) if long double is present everything is converted to it
2) otherwise if double is present everything is converted to it
3) otherwise if float is present everything is converted to it

If only signed or only unsigned integers are present, they are ranked in the following sequence: char, short, int, long, long long, and everything is converted to the largest type (largest rank).

If the range of the signed integer includes the range of the unsigned integer, everything is converted to the signed type.

These rules respect a, b, c. For example

      ushort us=1;
      printf("%g\n",1.0*(-34+us));

prints -33, as one would expect.

Now the two rules that break this, and that you want to abolish (or at least have problems with), if I have understood correctly, are:

* if the signed number has rank <= the unsigned, convert to the unsigned
* otherwise the unsigned version of the signed type is used.

Is this correct? Did I understand what you mean correctly? Is this what polysemy does?

I agree that in general these last two rules can bring a little bit of confusion:

      printf("%g\n",1.0*(-34+1u));
      printf("%g\n",1.0*(-34+1UL));
      printf("%g\n",1.0*(-34-1u));

prints

      4.29497e+09
      1.84467e+19
      4.29497e+09

but normally it does not matter, because the bit pattern is what one expects, and casting to the correct type gives the correct result:

      printf("%g\n",1.0*cast(int)(-34+1u));
      printf("%g\n",1.0*cast(long)(-34+1UL));
      printf("%g\n",1.0*cast(int)(-34-1u));

prints

      -33
      -33
      -35

and the advantage of freely combining signed and unsigned without casts (and it happens often), I think, outweighs its problems.

The only problem that I have seen connected to this is people thinking

      opCmp = cast(signed)(unsigned-unsigned);

which is wrong.
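
The mixed signed/unsigned cases above can be reproduced in D, where `int` combined with `uint` is likewise converted to `uint`. The sketch below restricts itself to 32-bit operands so the values do not depend on the platform's `long` size; `cmpBySubtraction` is a hypothetical function written only to show the comparison pitfall mentioned at the end.

```d
import std.stdio;

// Mixing int and uint converts the int operand to uint.
static assert(is(typeof(-34 + 1u) == uint));

enum uint mixedSum  = -34 + 1u;  // bit pattern of -33 read as uint
enum uint mixedDiff = -34 - 1u;  // bit pattern of -35 read as uint

// Hypothetical comparator: difference of two unsigned values,
// reinterpreted as signed. This is the pattern called wrong above.
int cmpBySubtraction(uint a, uint b)
{
    return cast(int)(a - b);
}

void main()
{
    writeln(mixedSum);              // 4294967263
    writeln(cast(int) mixedSum);    // -33
    writeln(cast(int) mixedDiff);   // -35

    // 0 is smaller than uint.max, yet the result is positive,
    // i.e. the comparator reports "greater".
    writeln(cmpBySubtraction(0, uint.max));  // 1
}
```

Casting back recovers the expected value because only the interpretation of the bits changed; the comparator fails because the difference of two unsigned values can exceed the range of the signed type.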
What I would like to have is

1) adaptive integer literals

For one kind of integer literal, optimally the decimal literals without prefix (introducing a prefix for int integer literals; yes, in my first message I proposed the opposite, a prefix for the new kind of literals, but I had already changed my mind by the second one), to have a very lax regime, based on value preservation:

- all calculations between these integer literals are done with arbitrary precision
- such a literal can implicitly cast to any type as long as the type can represent the value (which is obviously known at compile time)
- for matching overloaded functions one has to find a rule; this is something I am not too sure about. int if the value fits in it, long if it doesn't, and ulong if it does not fit in either could be a possibility.

2) a different integer type for size_t, ptr_diff_t (and similar integer types that have the size of a pointer)

no cast needed between size_t and ptr_diff_t
cast needed between them and both long and int

Fawzi
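
The overload-matching possibility floated in point 1 (int if the value fits, otherwise long, otherwise ulong) can be written down as a small selection function. This is a sketch of that one candidate rule only; `Pick` and `pickOverload` are hypothetical names, and the literal is modelled as a sign plus a `ulong` magnitude in place of true arbitrary precision.

```d
import std.stdio;

// Hypothetical result of the candidate rule.
enum Pick { asInt, asLong, asULong, none }

// Choose the type an adaptive literal would take for overload
// matching: int if it fits, else long, else ulong.
Pick pickOverload(bool negative, ulong magnitude)
{
    if (negative)
    {
        if (magnitude <= 2147483648UL)          return Pick.asInt;   // down to int.min
        if (magnitude <= 9223372036854775808UL) return Pick.asLong;  // down to long.min
        return Pick.none;                       // no built-in type can hold it
    }
    if (magnitude <= int.max)  return Pick.asInt;
    if (magnitude <= long.max) return Pick.asLong;
    return Pick.asULong;
}

void main()
{
    writeln(pickOverload(false, 5));              // asInt
    writeln(pickOverload(false, 4294967294UL));   // asLong (2147483647*2)
    writeln(pickOverload(false, ulong.max));      // asULong
    writeln(pickOverload(true, 4));               // asInt
}
```

Under this rule a literal never selects an unsigned overload unless it is too large for `long`, which differs from the "smallest signed and unsigned type" rule suggested in the earlier message.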
Dec 05 2008
prev sibling parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-05 02:53:11 +0100, Sergey Gromov <snake.scaly gmail.com> said:

 Thu, 04 Dec 2008 09:54:32 -0800, Andrei Alexandrescu wrote:
 
 Fawzi Mohamed wrote:
 On 2008-12-01 22:30:54 +0100, Walter Bright <newshound1 digitalmars.com>
 said:
 
 Fawzi Mohamed wrote:
 On 2008-12-01 21:16:58 +0100, Walter Bright
 <newshound1 digitalmars.com> said:
 
 Andrei Alexandrescu wrote:
 I'm very excited about polysemy. It's entirely original to D,

I accused Andrei of making up the word 'polysemy', but it turns out it is a real word! <g>

Is this the beginning of discriminating overloads also based on the return values?

No. I think return type overloading looks good in trivial cases, but as things get more complex it gets inscrutable.

I agree that return type overloading can go very bad, but a little bit can be very nice. Polysemy makes more expressions typecheck, but I am not sure that I want that. For example with size_t & co I would almost always want a stronger typechecking, as if size_t would be a typedef, but with the usual rules wrt. ptr_diff, size_t,... (i.e. no cast between them). This is because mixing size_t with int or long is almost always suspicious, but you might see it only on the other platform (32/64 bit), and not on your own. Something that I would find nice, on the other hand, is to have a kind of integer literal that automatically casts to the type that makes more sense.

Wouldn't value range propagation take care of that (and actually more)? A literal such as 5 will have a support range [5, 5] which provides enough information to compute the best type down the road.

It sounds very nice and right, except it's incompatible with Cee. Well, you can safely reduce bit count so that assigning "1025 & 15" to "byte" would go without both a cast and a warning/error. But you cannot grow bitcount beyond the C limits, that is, you cannot return long for "1024 << 30." You should probably report an error, and you should provide some way to tell the compiler, "i mean it." In the worst case, any shift, multiplication or addition will result in a compiler error. Do I miss something?

well what I would like to have is 1024 << 30 to be acceptable as long as it is then stored in a long. With Polysemy I am not sure about what the result should be. Fawzi
Dec 05 2008
parent Sergey Gromov <snake.scaly gmail.com> writes:
Fri, 5 Dec 2008 12:24:27 +0100, Fawzi Mohamed wrote:

 [...]

well what I would like to have is 1024 << 30 to be acceptable as long as it is then stored in a long. With Polysemy I am not sure about what the result should be.

The result should be either 0 or a compile-time error because C requires this to evaluate to 0.
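
The disagreement over `1024 << 30` comes down to where the widening happens. The following sketch shows how D evaluates the expression today, assuming its wrapping 32-bit `int` arithmetic; it makes no claim about what the polysemy proposal would eventually do.

```d
import std.stdio;

// Both operands are int, so the shift is carried out in 32 bits
// and the high bits are lost.
enum int narrow = 1024 << 30;

// Making one operand long widens the computation to 64 bits.
enum long wide = 1024L << 30;

void main()
{
    // Storing into a long does not help: the value is widened
    // only after the int shift has already been evaluated.
    long stored = 1024 << 30;

    writeln(narrow);  // 0
    writeln(stored);  // 0
    writeln(wide);    // 1099511627776
}
```

Accepting `1024 << 30` as 2^40 when the destination is a `long` would therefore mean letting the target type influence how the expression itself is evaluated, which is the incompatibility raised in the quoted reply.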
Dec 08 2008
prev sibling parent Paul D. Anderson <paul.d.removethis.anderson comcast.andthis.net> writes:
Walter Bright Wrote:

 Andrei Alexandrescu wrote:
 I'm very excited about polysemy. It's entirely original to D,

I accused Andrei of making up the word 'polysemy', but it turns out it is a real word! <g>

Did Andrei also make up this word? http://www.worldwidewords.org/weirdwords/ww-pop3.htm
Dec 03 2008