digitalmars.D - Value Preservation and Polysemy

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I've had a talk with Walter today, and two interesting things transpired.

First off, Walter pointed out that I was wrong about one conversion rule 
(google value preservation vs. type preservation). It turns out that 
everything that's unsigned and less than int is actually converted to 
int, NOT unsigned int as I thought. This is the case in C, C++, and D.
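
The rule can be checked directly in D. What follows is only a minimal sketch, assuming a current D compiler with a 32-bit int; the variable names are made up for illustration.

```d
import std.stdio;

void main()
{
    ubyte  ub = 200;
    ushort us = 60_000;

    // Operands narrower than int are promoted to int before arithmetic,
    // whether they are signed or unsigned (value preservation).
    static assert(is(typeof(ub + ub) == int));
    static assert(is(typeof(us + us) == int));
    static assert(is(typeof(us >> 1) == int));

    // uint is not narrower than int, so it stays unsigned.
    uint ui = 1;
    static assert(is(typeof(ui + us) == uint));

    // The sum is computed in int, so it does not wrap at 16 bits.
    assert(us + us == 120_000);
    writeln(us + us);
}
```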

Second, as of today Walter devised a very crude heuristic for 
typechecking narrowing conversions:

(a) a straight assignment x = y fails if y is wider than x.

(b) however, x = e compiles for more complex expressions EVEN if there 
is potential for loss of precision.

Now enter polysemy. With that, we can get the right rules in place and 
minimize false positives. An expression will yield a polysemous value 
with the as-C-does-it type as its principal type. The secondary type is 
a carefully computed narrower type that is the tightest actual type.

If you just say auto or use the value with overloaded functions etc., 
it's just like in C - the as-in-C type will be in vigor. But if you 
specifically ask for a narrower type, the secondary type enters in effect.

Examples:

uint u1 = ...;
ushort us1 = ...;
auto a = u1 & us1; // fine, a is uint
ushort b = u1 & us1; // fine too, secondary type kicks in

long l1 = ...;
auto c = u1 / l1; // c is long
int d = u1 / l1; // fine too, secondary type kicks in

We need to think this through for complex expressions etc. Walter and I 
are quite excited that this will take care of a significant portion of 
the integral conversions mess (in addition to handling literals, 
constants, and variables within a unified framework).

The plan is to deploy polysemous integrals first without changing the 
rest of the conversion rules. At that point, if the technique turns out 
to enjoy considerable success, Walter agreed to review and possibly 
stage in the change I suggested to drop the implicit signed -> unsigned 
conversion. With that, I guess we can claim victory in the war between 
spurious vs. too lax conversions.

I'm very excited about polysemy. It's entirely original to D, covers a 
class of problems that can't be addressed with any of the known 
techniques (subtyping, coercion...) and has a kick-ass name to boot. 
Walter today mentioned he's still not sure I hadn't made "polysemy" up. 
Indeed, Thunderbird and Firefox are suggesting it's a typo - please "add 
to dictionary" :o).


Andrei

Nov 28 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-28 19:08:06 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Walter today mentioned he's still not sure I hadn't made "polysemy" up. 
 Indeed, Thunderbird and Firefox are suggesting it's a typo - please 
 "add to dictionary" :o).

Does this help?

polysemy
noun Linguistics
the coexistence of many possible meanings for a word or phrase.

-- New Oxford American Dictionary, 2nd Edition

polysemy
n : the ambiguity of an individual word or phrase that can be used (in 
different contexts) to express two or more different meanings [syn: 
lexical ambiguity] [ant: monosemy]

-- WordNet <http://wordnet.princeton.edu/perl/webwn?s=polysemy>

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 28 2008

Max Samukha <samukha voliacable.com.removethis> writes:

On Fri, 28 Nov 2008 16:08:06 -0800, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Walter today mentioned he's still not sure I hadn't made "polysemy" up. 
 Indeed, Thunderbird and Firefox are suggesting it's a typo - please 
 "add to dictionary" :o).

I think it's a very relevant term. Given that the type of a value attaches
a meaning to that value, values that get different types (meanings)
depending on the context are polysemous. Very cool.

Nov 29 2008

Don <nospam nospam.com> writes:

Andrei Alexandrescu wrote:
 I've had a talk with Walter today, and two interesting things transpired.
 
 First off, Walter pointed out that I was wrong about one conversion rule 
 (google value preservation vs. type preservation). It turns out that 
 everything that's unsigned and less than int is actually converted to 
 int, NOT unsigned int as I thought. This is the case in C, C++, and D.

That has some interesting consequences.

   ushort x = 0xFFFF;
   short y = x;
   printf("%d %d %d\n", x>>1, y>>1, y>>>1);

// prints: 32767 -1 2147483647

What a curious beast the >>> operator is!
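
A self-contained version of the same experiment, as a sketch only: it assumes a current D compiler, where the ushort-to-short assignment needs an explicit cast, and it relies on both shift operands being promoted to int first.

```d
import std.stdio;

void main()
{
    ushort x = 0xFFFF;
    short y = cast(short) x;   // explicit cast; the bit pattern 0xFFFF reads as -1

    // x and y are promoted to int before the shift is applied.
    assert((x >> 1) == 32767);      // 65535 >> 1
    assert((y >> 1) == -1);         // arithmetic shift keeps the sign
    assert((y >>> 1) == int.max);   // logical shift of the promoted int -1

    writefln("%d %d %d", x >> 1, y >> 1, y >>> 1);
}
```

The surprise is in the last column: the zero bit is shifted into a 32-bit int, not into the 16-bit short.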

 I'm very excited about polysemy. It's entirely original to D, covers a 
 class of problems that can't be addressed with any of the known 
 techniques (subtyping, coercion...) and has a kick-ass name to boot. 

I agree. By making the type system looser in the one place where you 
actually need it to be loose, you can tighten it everywhere else. Fantastic.

Dec 01 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 I agree. By making the type system looser in the one place where you 
 actually need it to be loose, you can tighten it everywhere else. 
 Fantastic.

My enthusiasm about polysemy got quite a bit lower when I realized that 
the promises of polysemy for integral operations can be provided (and 
actually outdone) by range analysis, a well-known method.

The way it's done: for each integral expression in the program assign 
two numbers: the smallest possible value, and the largest possible 
value. Literals will therefore have a salami-slice-thin range associated 
with them. Whenever code asks for a lossy implicit cast, check the range 
and if it fits within the target type, let the code go through.

Each operation computes its ranges from the range of its operands. The 
computation is operation-specific. For example, for non-negative operands 
the range of a & b is 0 to min(a.range.max, b.range.max). Sign 
considerations complicate this a bit, but not by much.

The precision of range analysis can be quite impressive, for example:

uint b = ...;
ubyte a = ((b & 2) << 6) | (b >> 24);

typechecks no problem because it can prove no loss of information for 
all values of b.
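
To make the bookkeeping concrete, here is a small model of the idea in D. It is a sketch under stated assumptions, not how any compiler implements it: the names Range, literal, bitAnd, bitOr, shl, shr and fits are invented for illustration, only non-negative values are handled, and the bounds are conservative rather than the tightest possible.

```d
import std.algorithm : max, min;

/// Inclusive interval of possible values (non-negative only in this sketch).
struct Range
{
    ulong lo, hi;
}

/// A literal has a one-point range.
Range literal(ulong v) { return Range(v, v); }

/// Smallest value of the form 2^k - 1 that is >= v.
ulong allOnes(ulong v)
{
    ulong m = 0;
    while (m < v) m = (m << 1) | 1;
    return m;
}

/// a & b can clear any bit, so the lower bound is 0.
Range bitAnd(Range a, Range b) { return Range(0, min(a.hi, b.hi)); }

/// a | b is at least max(a, b) and fits in the bits of the larger operand.
Range bitOr(Range a, Range b)
{
    return Range(max(a.lo, b.lo), allOnes(max(a.hi, b.hi)));
}

/// Shifts by a constant; assumes the left shift does not overflow ulong.
Range shl(Range a, uint n) { return Range(a.lo << n, a.hi << n); }
Range shr(Range a, uint n) { return Range(a.lo >> n, a.hi >> n); }

/// True if every value in r is representable in T.
bool fits(T)(Range r) { return r.hi <= cast(ulong) T.max; }

void main()
{
    auto b = Range(0, uint.max);   // an arbitrary uint

    // ((b & 2) << 6) | (b >> 24)
    auto e = bitOr(shl(bitAnd(b, literal(2)), 6), shr(b, 24));

    assert(e == Range(0, 255));
    assert(fits!ubyte(e));               // the narrowing is provably safe
    assert(!fits!ubyte(shr(b, 23)));     // b >> 23 can reach 511
}
```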


Andrei

Dec 01 2008

Brad Roberts <braddr bellevue.puremagic.com> writes:

On Mon, 1 Dec 2008, Andrei Alexandrescu wrote:

 My enthusiasm about polysemy got quite a bit lower when I realized that the
 promises of polysemy for integral operations can be provided (and actually
 outdone) by range analysis, a well-known method.
 
 The way it's done: for each integral expression in the program assign two
 numbers: the smallest possible value, and the largest possible value. Literals
 will therefore have a salami-slice-thin range associated with them. Whenever
 code asks for a lossy implicit cast, check the range and if it fits within the
 target type, let the code go through.
 
 Each operation computes its ranges from the range of its operands. The
 computation is operation-specific. For example, for non-negative operands
 the range of a & b is 0 to min(a.range.max, b.range.max). Sign
 considerations complicate this a bit, but not by much.
 
 The precision of range analysis can be quite impressive, for example:
 
 uint b = ...;
 ubyte a = ((b & 2) << 6) | (b >> 24);
 
 typechecks no problem because it can prove no loss of information for all
 values of b.
 
 
 Andrei

The term I see associated with this technique, indeed well known, is Value 
Range Propagation.  Combine this sort of accumulated knowledge with analysis 
of loop and if conditions as well as inlining, and often a significant 
amount of dead code elimination can occur.

Later,
Brad

Dec 01 2008

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 I'm very excited about polysemy. It's entirely original to D,

I accused Andrei of making up the word 'polysemy', but it turns out it 
is a real word! <g>

Dec 01 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-01 21:16:58 +0100, Walter Bright <newshound1 digitalmars.com> said:

 Andrei Alexandrescu wrote:
 I'm very excited about polysemy. It's entirely original to D,

 
 I accused Andrei of making up the word 'polysemy', but it turns out it 
 is a real word! <g>

Is this the beginning of discriminating overloads also based on the 
return values?

Fawzi

Dec 01 2008

Walter Bright <newshound1 digitalmars.com> writes:

Fawzi Mohamed wrote:
 Is this the beginning of discriminating overloads also based on the 
 return values?

No. I think return type overloading looks good in trivial cases, but as 
things get more complex it gets inscrutable.

Dec 01 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-01 22:30:54 +0100, Walter Bright <newshound1 digitalmars.com> said:

 No. I think return type overloading looks good in trivial cases, but as 
 things get more complex it gets inscrutable.

I agree that return type overloading can go very bad, but a little bit 
can be very nice.

Polysemy makes more expressions typecheck, but I am not sure that I want that.
For example with size_t & co I would almost always want stronger 
typechecking, as if size_t were a typedef, but with the usual rules 
with respect to ptrdiff_t, size_t,... (i.e. no cast between them).
This is because mixing size_t with int or long is almost always 
suspicious, but you might see the problem only on the other platform 
(32/64 bit), and not on your own.

Something that I would find nice on the other hand is to have a kind of 
integer literal that automatically casts to the type that makes the most 
sense.
I saw this in Aldor, which discriminates on return type: there an 
integer like 23 would be seen as fromInteger(23), and would select the 
optimal overloaded fromInteger depending on the context.
Sometimes you would need a cast, but most of the time things just work. 
This allowed using 1 also as the unit matrix, for example.
I don't need that much, but +1/-1,... with something that might be 
long, short, real,... needs more care than it should, and normally 
it is obvious which type one expects.

Now such a change should be checked in detail, and one would probably 
also want a simple way to tell the compiler that an integer is really a 
32-bit int. To be more compatible with C, one could make a different 
choice: these "adapting" integer literals get a special suffix, like 
"a", so that the normal integer literals keep exactly the same semantics 
as in C, and 0a, 1a, 12a would be these new integer types.

To choose the type of these "adapting" integers one would proceed as follows:
if the literal a appears in an operation op(a,x), take the type of x as 
the type of a (I would restrict op to +-*/% to keep it simple); if x is 
also adaptive, recurse.
If the whole expression is done and it is an assignment, look at the 
type of the variable.
If the variable has no type (auto) -> error [one could default to long 
or int, but it can be dangerous].
If this is part of a function call f(a,...), try the types in the 
following order: long, int [one could try more, but again it can be 
expensive; one could also fail as before, but I think that this kind of 
use is widespread enough that it is good to try to guess, though I am 
not totally convinced about this].

Basically something like polysemy, but *only* for a kind of integer 
literals, and without introducing new types that can be used externally.

One could also try to make the normal 0,1,2,... behave like that, and 
have a special suffix for the ones that are only 32 bits. But then, to 
minimize surprises, one cannot easily decide "not to guess", and the 
default would have to be int rather than long, which I am not sure is 
the best choice.

Fawzi

Implementation details: these adaptive numbers need at least to be 
represented temporarily within the compiler. Using longs for them can be 
problematic if one also wants to allow conversion to unsigned longs of 
maximum size. The compiler should use arbitrary-precision numbers to 
represent them until the type is decided, or find the exact type 
before the conversion.

Dec 04 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Fawzi Mohamed wrote:
 Something that I would find nice on the other hand is to have a kind of 
 integer literals that automatically cast to the type that makes more sense.

Wouldn't value range propagation take care of that (and actually more)? 
A literal such as 5 will have a support range [5, 5] which provides 
enough information to compute the best type down the road.

Andrei

Dec 04 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-04 18:54:32 +0100, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Wouldn't value range propagation take care of that (and actually more)? 
 A literal such as 5 will have a support range [5, 5] which provides 
 enough information to compute the best type down the road.
 
 Andrei

Exactly; my point was to apply this only to integer literals. If I 
understood correctly, you intended to apply it to everything.
As I said, with size_t & co I would actually like tighter control, and 
range propagation gives me laxer control.
With integer literals, on the other hand, I think range propagation or 
something similar is a good idea
(because there I am sure that preserving the value is the correct choice).
Fawzi

Dec 04 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Thu, 04 Dec 2008 09:54:32 -0800, Andrei Alexandrescu wrote:

 Wouldn't value range propagation take care of that (and actually more)? 
 A literal such as 5 will have a support range [5, 5] which provides 
 enough information to compute the best type down the road.

It sounds very nice and right, except it's incompatible with Cee.

Well, you can safely reduce bit count so that assigning "1025 & 15" to
"byte" would go without both a cast and a warning/error.  But you cannot
grow bitcount beyond the C limits, that is, you cannot return long for
"1024 << 30."  You should probably report an error, and you should
provide some way to tell the compiler, "i mean it."

In the worst case, any shift, multiplication or addition will result in
a compiler error.  Do I miss something?

Dec 04 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 It sounds very nice and right, except it's incompatible with Cee.
 
 Well, you can safely reduce bit count so that assigning "1025 & 15" to
 "byte" would go without both a cast and a warning/error.  But you cannot
 grow bitcount beyond the C limits, that is, you cannot return long for
 "1024 << 30."  You should probably report an error, and you should
 provide some way to tell the compiler, "i mean it."
 
 In the worst case, any shift, multiplication or addition will result in
 a compiler error.  Do I miss something?

Well any integral value carries:

a) type as per the C rule

b) minimum value possible

c) maximum value possible

The type stays the type as per the C rule, so there's no change there. 
If (and only if) a *narrower* type is asked as a conversion target for 
the value, the range is consulted. If the range is too large, the 
conversion fails.
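
For constants this split between type and range can already be observed in D. A small sketch, assuming a current D compiler; the variable names are illustrative only.

```d
void main()
{
    // The type follows the C rule: int operands give an int result.
    static assert(is(typeof(1025 & 15) == int));
    static assert(is(typeof(1024 << 30) == int));

    // A narrower target is accepted when the value provably fits.
    byte small = 1025 & 15;
    assert(small == 1);

    // A value outside the target's range is rejected at compile time.
    static assert(!__traits(compiles, { byte b = 1000; }));

    // The type never grows on its own: to get a 64-bit shift,
    // one operand has to be made long explicitly.
    long big = 1024L << 30;
    assert(big == 1L << 40);
}
```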

Andrei

Dec 04 2008

Don <nospam nospam.com> writes:

Andrei Alexandrescu wrote:
 Well any integral value carries:
 
 a) type as per the C rule
 
 b) minimum value possible
 
 c) maximum value possible
 
 The type stays the type as per the C rule, so there's no change there. 
 If (and only if) a *narrower* type is asked as a conversion target for 
 the value, the range is consulted. If the range is too large, the 
 conversion fails.
 
 Andrei

Any idea how hard this would be to implement?

Also we've got an interesting case in D that other languages don't have: 
CTFE functions.
I presume that range propagation would not apply during evaluation of 
the CTFE function, but when evaluation is complete, it would then become 
a known literal, which can have precise range propagation. But there's 
still some funny issues:

uint foo(int x) { return 5; }

int bar(int y)
{
     ubyte w = foo(7); // this is a narrowing conversion, generates a
                       // compiler warning (foo is not called as CTFE).
     return 6;
}

enum ubyte z = foo(7); // this is range propagated, so narrowing is OK.
enum int q = bar(3); // still gets a warning, because bar() didn't compile.

int gar(T)(int y)
{
     ubyte w = foo(7);
     return 6;
}

enum int v = gar!(int)(3); // is this OK???

Dec 05 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-05 09:40:03 +0100, Don <nospam nospam.com> said:

 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Thu, 04 Dec 2008 09:54:32 -0800, Andrei Alexandrescu wrote:
 
 Fawzi Mohamed wrote:
 On 2008-12-01 22:30:54 +0100, Walter Bright <newshound1 digitalmars.com> said:
 
 Fawzi Mohamed wrote:
 On 2008-12-01 21:16:58 +0100, Walter Bright <newshound1 digitalmars.com> said:
 
 Andrei Alexandrescu wrote:
 I'm very excited about polysemy. It's entirely original to D,

 I accused Andrei of making up the word 'polysemy', but it turns out it 
 is a real word! <g>

 Is this the beginning of discriminating overloads also based on the 
 return values?

 No. I think return type overloading looks good in trivial cases, but as 
 things get more complex it gets inscrutable.

 I agreee that return type overloading can go very bad, but a little bit 
 can be very nice.
 
 Polysemy make more expressions typecheck, but I am not sure that I want that.
 For example with size_t & co I would amost always want a stronger 
 typechecking, as if size_t would be a typedef, but with the usual rules 
 wrt to ptr_diff, size_t,... (i.e. not cast between them).
 This because mixing size_t with int, or long is almost always 
 suspicious, but you might see it only on the other platform (32/64 
 bit), and not on you own.
 
 Something that I would find nice on the other hand is to have a kind of 
 integer literals that automatically cast to the type that makes more 
 sense.

 Wouldn't value range propagation take care of that (and actually more)? 
 A literal such as 5 will have a support range [5, 5] which provides 
 enough information to compute the best type down the road.

 
 It sounds very nice and right, except it's incompatible with Cee.
 
 Well, you can safely reduce bit count so that assigning "1025 & 15" to
 "byte" would go without both a cast and a warning/error.  But you cannot
 grow bitcount beyond the C limits, that is, you cannot return long for
 "1024 << 30."  You should probably report an error, and you should
 provide some way to tell the compiler, "i mean it."
 
 In the worst case, any shift, multiplication or addition will result in
 a compiler error.  Do I miss something?

 
 Well any integral value carries:
 
 a) type as per the C rule
 
 b) minimum value possible
 
 c) maximum value possible
 
 The type stays the type as per the C rule, so there's no change there. 
 If (and only if) a *narrower* type is asked as a conversion target for 
 the value, the range is consulted. If the range is too large, the 
 conversion fails.
 
 Andrei
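
 A minimal sketch of the rule described above, written in D. The names `Range`, `literal`, `bitAnd` and `canNarrowTo` are hypothetical and only model the idea; they are not compiler internals, and only the `&` operator is covered.

```d
// Hypothetical model of the value range carried by an integral expression.
struct Range
{
    long min;
    long max;
}

// Range of a literal: a single point, e.g. 5 has range [5, 5].
Range literal(long v)
{
    return Range(v, v);
}

// Conservative range of `a & b`.
Range bitAnd(Range a, Range b)
{
    // Two known constants: the result is known exactly.
    if (a.min == a.max && b.min == b.max)
        return literal(a.min & b.min);
    // Both non-negative: the result cannot exceed the smaller maximum.
    if (a.min >= 0 && b.min >= 0)
        return Range(0, a.max < b.max ? a.max : b.max);
    // One non-negative operand bounds the result: AND can only clear bits.
    if (a.min >= 0) return Range(0, a.max);
    if (b.min >= 0) return Range(0, b.max);
    // Both may be negative: give up and assume anything.
    return Range(long.min, long.max);
}

// A narrowing conversion to T is allowed only if the whole range fits in T.
bool canNarrowTo(T)(Range r) if (T.sizeof < long.sizeof)
{
    return r.min >= T.min && r.max <= T.max;
}

unittest
{
    auto anyInt = Range(int.min, int.max); // an int whose value is unknown
    assert(canNarrowTo!ubyte(literal(5)));                        // [5, 5]
    assert(canNarrowTo!byte(bitAnd(literal(1025), literal(15)))); // [1, 1]
    assert(canNarrowTo!ubyte(bitAnd(anyInt, literal(15))));       // [0, 15]
    assert(!canNarrowTo!ubyte(anyInt));                           // too wide
}
```

 The checks can be run with `dmd -unittest -main`. The type of the expression is never changed here; the range is only consulted when a narrower target is requested.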

 
 Any idea how hard this would be to implement?
 
 Also we've got an interesting case in D that other languages don't 
 have: CTFE functions.
 I presume that range propagation would not apply during evaluation of 
 the CTFE function, but when evaluation is complete, the result would 
 become a known literal, which can have precise range propagation. But 
 there are still some funny issues:
 
 uint foo(int x) { return 5; }
 
 int bar(int y)
 {
      // this is a narrowing conversion, generates a compiler warning
      // (foo is not called as CTFE).
      ubyte w = foo(7);
      return 6;
 }
 
 enum ubyte z = foo(7); // this is range propagated, so narrowing is OK.
 enum int q = bar(3); // still gets a warning, because bar() didn't compile.
 
 int gar(T)(int y)
 {
      ubyte w = foo(7);
      return 6;
 }
 
 enum int v = gar!(int)(3); // is this OK???

What I would like is that one type of integer literal (optimally the 
one without annotation) has *no* fixed C type, but is effectively 
treated as an arbitrary-precision integer.
Conversions from this arbitrary-precision integer to any other type are 
implicit as long as the *value* can be represented in the target type; 
otherwise they fail.

ubyte ub=4; // ok
byte ib=4; // ok
ubyte ub=-4; // failure
ubyte ub=cast(ubyte)cast(byte)-4; // ok (one could discuss whether dropping cast(byte) should be accepted)
byte ib=-4; // ok
byte ib=130; // failure
float f=1234567890; // ok even if there could be precision loss
int i=123455; // ok
long i= 2147483647*2; // ok

Note that as the value is known at compile time this can always be 
checked, and one would get rid of the annotations (L UL s...) most of 
the time.
Annotations should stay for compatibility with C and as a shorthand 
for, for example, cast(uint)1234.
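
The integer cases listed above can be checked with a small sketch. `fitsIn` is a hypothetical helper, not part of any proposal text, and it takes a `long` rather than a true arbitrary-precision value, so it only approximates the idea.

```d
// Hypothetical helper: can `value` be represented exactly in integral type T?
// (Values are limited to long here; the proposal assumes arbitrary precision.)
bool fitsIn(T)(long value)
{
    static if (is(T == ulong))
        return value >= 0;   // avoids comparing a long against ulong.max
    else
        return value >= T.min && value <= T.max;
}

unittest
{
    assert(fitsIn!ubyte(4));               // ubyte ub = 4;    ok
    assert(fitsIn!byte(4));                // byte ib = 4;     ok
    assert(!fitsIn!ubyte(-4));             // ubyte ub = -4;   failure
    assert(fitsIn!byte(-4));               // byte ib = -4;    ok
    assert(!fitsIn!byte(130));             // byte ib = 130;   failure
    assert(fitsIn!int(123455));            // int i = 123455;  ok
    assert(fitsIn!long(2147483647L * 2));  // long i = 2147483647*2; ok
    assert(!fitsIn!int(2147483647L * 2));  // the same value does not fit in int
}
```

The checks can be run with `dmd -unittest -main`. The `float` case is left out because it is about precision loss rather than range.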

This has one problem, and that is overloaded function calls. In 
that case a rule has to be chosen. A possible rule: find the smallest 
signed and the smallest unsigned type that can represent the number; if 
both are ok, fail, otherwise choose the one that is ok. Anyway, 
this should be discussed so that the compiler behaves reasonably.

So this is what I would like. I do not know how this matches the 
polysemy proposal, because from Andrei's comments I am not sure I have 
understood it correctly.

So to answer Don within my proposal your code would not be correct because

      ubyte w = foo(7);

needs a cast, even when performed at compile time. You have no new 
types; special rules only apply to integer literals, and as soon as they 
assume a fixed C type the normal rules are valid.

Dec 05 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-05 07:02:37 +0100, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 [...]
 
 Well any integral value carries:
 
 a) type as per the C rule
 
 b) minimum value possible
 
 c) maximum value possible
 
 The type stays the type as per the C rule, so there's no change there. 
 If (and only if) a *narrower* type is asked as a conversion target for 
 the value, the range is consulted. If the range is too large, the 
 conversion fails.
 
 Andrei

basically the implicit conversion rules of C disallowing automatic 
unsigned/signed conversions to unsigned?
Fawzi

Dec 05 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Fawzi Mohamed wrote:
 On 2008-12-05 07:02:37 +0100, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 [...]
 Well any integral value carries:

 a) type as per the C rule

 b) minimum value possible

 c) maximum value possible

 The type stays the type as per the C rule, so there's no change there. 
 If (and only if) a *narrower* type is asked as a conversion target for 
 the value, the range is consulted. If the range is too large, the 
 conversion fails.

 Andrei

 
 basically the implicit conversion rules of C disallowing automatic 
 unsigned/signed conversions to unsigned?
 Fawzi
 

Where's the predicate? I don't understand the question.

Andrei

Dec 05 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-05 16:27:01 +0100, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 On 2008-12-05 07:02:37 +0100, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 [...]
 Well any integral value carries:
 
 a) type as per the C rule
 
 b) minimum value possible
 
 c) maximum value possible
 
 The type stays the type as per the C rule, so there's no change there. 
 If (and only if) a *narrower* type is asked as a conversion target for 
 the value, the range is consulted. If the range is too large, the 
 conversion fails.
 
 Andrei

 
 basically the implicit conversion rules of C disallowing automatic 
 unsigned/signed conversions to unsigned?
 Fawzi
 

 
 Where's the predicate? I don't understand the question.
 
 Andrei

The implicit conversion rules in C for arithmetic operations allow 
up-conversion of types: basically the largest type present is used, 
which already almost respects a, b, c (using C names):

1) if long double is present everything is converted to it
2) otherwise if double is present everything is converted to it
3) otherwise if float is present everything is converted to it

If only signed or only unsigned integers are present, they are ranked in 
the following sequence
	char, short, int, long, long long
and everything is converted to the largest type (largest rank).

If the range of the signed integer includes the range of the unsigned 
integer, everything is converted to the signed type.
These rules respect a, b, c.

for example
    ushort us=1;
    printf("%g\n",1.0*(-34+us));
prints -33, as one would expect.

Now the two rules that break this and that you want to abolish (or at 
least have problems with), if I have understood correctly, are
* if the signed number has rank <= the unsigned, convert to the unsigned
* otherwise the unsigned version of the signed type is used.
Is this correct? Did I understand what you mean correctly? Is this what 
polysemy does?

I agree that in general these two last rules can bring a little bit of 
confusion
    printf("%g\n",1.0*(-34+1u));
    printf("%g\n",1.0*(-34+1UL));
    printf("%g\n",1.0*(-34-1u));
prints
    4.29497e+09
    1.84467e+19
    4.29497e+09
but normally it does not matter, because the bit pattern is what one 
expects, and casting to the correct type gives the correct result
    printf("%g\n",1.0*cast(int)(-34+1u));
    printf("%g\n",1.0*cast(long)(-34+1UL));
    printf("%g\n",1.0*cast(int)(-34-1u));
prints
 -33
 -33
 -35
and the advantage of freely combining signed and unsigned without a cast 
(and it happens often) I think outweighs its problems.

The only problem that I have seen connected to this is people thinking
	opCmp =	cast(signed)(unsigned-unsigned);
which is wrong.
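
A small D sketch of why the subtraction trick fails. The function names `cmpBySubtraction` and `cmpByComparison` are made up for illustration:

```d
// The tempting shortcut: the sign of the wrapped difference.
int cmpBySubtraction(uint a, uint b)
{
    return cast(int)(a - b);
}

// Explicit comparisons, which are correct for every pair of values.
int cmpByComparison(uint a, uint b)
{
    return a < b ? -1 : (a > b ? 1 : 0);
}

unittest
{
    // Small differences happen to work with both approaches.
    assert(cmpBySubtraction(7u, 5u) > 0);
    assert(cmpByComparison(7u, 5u) > 0);

    // A difference above int.max wraps to a negative int,
    // so the shortcut claims that 3_000_000_000 < 0.
    assert(cmpBySubtraction(3_000_000_000u, 0u) < 0);  // wrong answer
    assert(cmpByComparison(3_000_000_000u, 0u) > 0);   // right answer
}
```

The shortcut is only safe when the two values are known to differ by at most int.max.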

What I would like to have is
1) adaptive integer literals

For one kind of integer literal, optimally the decimal literals 
without prefix (introducing a prefix for int literals; yes, 
in my first message I proposed the opposite, a prefix for the new kind 
of literals, but I changed my mind already in the second one), I would 
like a very lax regime, based on value preservation:
- all calculations between these integer literals are 
done with arbitrary precision
- such a literal can be implicitly cast to any type as long as the type can 
represent the value (which is obviously known at compile time)
- for matching overloaded functions one has to find a rule; this is 
something I am not too sure about: int if the value fits in it, long if 
it doesn't, and ulong if it does not fit in either could be a 
possibility.

2) a different integer type for size_t and ptrdiff_t (and similar integer 
types that have the size of a pointer)

no cast needed between size_t and ptrdiff_t
cast needed between them and both long and int
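
The overload rule floated in point 1 (int, then long, then ulong) can be sketched as follows. `overloadTypeFor` is a hypothetical name, and the literal is modelled as a sign plus a 64-bit magnitude instead of a real arbitrary-precision number:

```d
// Hypothetical rule: int if the value fits, otherwise long, otherwise ulong.
// Returns the chosen type name, or null if no built-in integer type fits.
string overloadTypeFor(bool negative, ulong magnitude)
{
    enum ulong intMax = int.max;
    enum ulong longMax = long.max;
    enum ulong intMinMagnitude = intMax + 1;    // |int.min|
    enum ulong longMinMagnitude = longMax + 1;  // |long.min|

    if (negative ? magnitude <= intMinMagnitude : magnitude <= intMax)
        return "int";
    if (negative ? magnitude <= longMinMagnitude : magnitude <= longMax)
        return "long";
    if (!negative)
        return "ulong";
    return null;
}

unittest
{
    assert(overloadTypeFor(false, 5) == "int");
    assert(overloadTypeFor(true, 2_147_483_648UL) == "int");    // int.min
    assert(overloadTypeFor(false, 2_147_483_648UL) == "long");  // int.max + 1
    assert(overloadTypeFor(false, 9_223_372_036_854_775_808UL) == "ulong");
    assert(overloadTypeFor(true, 9_223_372_036_854_775_809UL) is null);
}
```

With this rule the choice depends only on the value, so the same literal always selects the same overload.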

Fawzi

Dec 05 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-05 02:53:11 +0100, Sergey Gromov <snake.scaly gmail.com> said:

 Thu, 04 Dec 2008 09:54:32 -0800, Andrei Alexandrescu wrote:
 
 [...]
 
 Wouldn't value range propagation take care of that (and actually more)?
 A literal such as 5 will have a support range [5, 5] which provides
 enough information to compute the best type down the road.

 
 It sounds very nice and right, except it's incompatible with Cee.
 
 Well, you can safely reduce bit count so that assigning "1025 & 15" to
 "byte" would go without both a cast and a warning/error.  But you cannot
 grow bitcount beyond the C limits, that is, you cannot return long for
 "1024 << 30."  You should probably report an error, and you should
 provide some way to tell the compiler, "i mean it."
 
 In the worst case, any shift, multiplication or addition will result in
 a compiler error.  Do I miss something?

well what I would like to have is 1024 << 30 to be acceptable as long 
as it is then stored in a long.
With Polysemy I am not sure about what the result should be.

Fawzi

Dec 05 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Fri, 5 Dec 2008 12:24:27 +0100, Fawzi Mohamed wrote:

 On 2008-12-05 02:53:11 +0100, Sergey Gromov <snake.scaly gmail.com> said:
 
 Thu, 04 Dec 2008 09:54:32 -0800, Andrei Alexandrescu wrote:
 
 [...]
 
 Wouldn't value range propagation take care of that (and actually more)?
 A literal such as 5 will have a support range [5, 5] which provides
 enough information to compute the best type down the road.

 
 It sounds very nice and right, except it's incompatible with Cee.
 
 Well, you can safely reduce bit count so that assigning "1025 & 15" to
 "byte" would go without both a cast and a warning/error.  But you cannot
 grow bitcount beyond the C limits, that is, you cannot return long for
 "1024 << 30."  You should probably report an error, and you should
 provide some way to tell the compiler, "i mean it."
 
 In the worst case, any shift, multiplication or addition will result in
 a compiler error.  Do I miss something?

 
 well what I would like to have is 1024 << 30 to be acceptable as long 
 as it is then stored in a long.
 With Polysemy I am not sure about what the result should be.

The result should be either 0 or a compile-time error because C requires
this to evaluate to 0.
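
For reference, a sketch of the two readings in D, assuming a 32-bit int. The function names are made up. Strictly speaking, the C standard leaves a signed left shift that overflows undefined rather than requiring 0; a wrapping 32-bit shift, as in D, does give 0.

```d
// The shift carried out in 32-bit int: 2^40 does not fit, the result wraps.
int shiftAsInt(int value, int count)
{
    return value << count;
}

// The value-preserving reading: widen first, then shift in 64 bits.
long shiftAsLong(int value, int count)
{
    return cast(long) value << count;
}

unittest
{
    assert(shiftAsInt(1024, 30) == 0);                       // low 32 bits of 2^40
    assert(shiftAsLong(1024, 30) == 1_099_511_627_776L);     // 2^40
}
```

The checks can be run with `dmd -unittest -main`.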

Dec 08 2008

Paul D. Anderson <paul.d.removethis.anderson comcast.andthis.net> writes:

Walter Bright Wrote:

 Andrei Alexandrescu wrote:
 I'm very excited about polysemy. It's entirely original to D,

 
 I accused Andrei of making up the word 'polysemy', but it turns out it 
 is a real word! <g>

Did Andrei also make up this word? 
http://www.worldwidewords.org/weirdwords/ww-pop3.htm

Dec 03 2008
