www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Numeric limits tracking

reply Manu <turkeyman gmail.com> writes:
So, here's an issue that constantly drives me nuts, and an elegant solution
seems so do-able.

void func(int x)
{
  x &= 0xFFFF;
  short s = x; // Error! (but we know x is 0 .. 65535)

  if(x < 256)
  {
    byte b = x; // Error! (we also know x is 0 .. 255)
  }
}

It would be really nice if the compiler would carry around the known
possible range of values, and refer to that information when performing
down-casting assignments.
It would also be very useful information for the back end, it can choose
more efficient types if it knows the range, or produce jump tables rather
than branch sequences.

I think the compiler would need a min, max, and mask for any integers, and
update them as it works.

There are many sources of this information.
Comparisons effectively limit the range:
if(x > 0)
{
  // x = 0 .. int.max
}
else
{
  // x = int.min .. -1
}

Masks, obviously:
x &= 15;

Also contracts are a really great source of seeding this information on
function entry.

Has this been discussed? It seems simple, and it's invisible to the
language. It would just reduce some boilerplate (tedious casts), and also
offer some great optimisation opportunities.
May 12 2013
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
 So, here's an issue that constantly drives me nuts, and an 
 elegant solution
 seems so do-able.

 void func(int x)
 {
   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)

   if(x < 256)
   {
     byte b = x; // Error! (we also know x is 0 .. 255)
   }
 }

 It would be really nice if the compiler would carry around the 
 known
 possible range of values, and refer to that information when 
 performing
 down-casting assignments.
 It would also be very useful information for the back end, it 
 can choose
 more efficient types if it knows the range, or produce jump 
 tables rather
 than branch sequences.

 I think the compiler would need a min, max, and mask for any 
 integers, and
 update them as it works.

 There are many sources of this information.
 Comparisons effectively limit the range:
 if(x > 0)
 {
   // x = 0 .. int.max
 }
 else
 {
   // x = int.min .. -1
 }

 Masks, obviously:
 x &= 15;

 Also contracts are a really great source of seeding this 
 information on
 function entry.

 Has this been discussed? It seems simple, and it's invisible to 
 the
 language. It would just reduce some boilerplate (tedious 
 casts), and also
 offer some great optimisation opportunities.
I've been thinking about this for a while from an optimisation point of view. Is this not a common practice for an optimising compiler?
May 12 2013
parent Manu <turkeyman gmail.com> writes:
It is, but there are limits to what it can do.
D's contracts potentially offer a lot of information that the optimiser
wouldn't naturally have (any function receiving blind arguments can't make
any assumptions without a contract present), but more from a
usability/safety point of view, I really hate casting when the compiler
should know the assignment is safe.

It's not only an inconvenience, it's also a safety thing.
Consider:
I have an int that I assign to a byte, I know the value fits in a byte, but
I'm forced to type the manual cast anyway.
Once that cast is written, if the valid range of my int happens to change
in outer code, I will not receive an error informing me that the new range
doesn't fit within a byte anymore, it'll just truncate it anyway, since a
blind cast is already there in the code.
It would be much better for the compiler to start complaining that it can
no longer assure that the int fits into the byte. With the cast present, my
bug has been hidden until I eventually crash.


On 13 May 2013 10:18, John Colvin <john.loughran.colvin gmail.com> wrote:

 On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:

 So, here's an issue that constantly drives me nuts, and an elegant
 solution
 seems so do-able.

 void func(int x)
 {
   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)

   if(x < 256)
   {
     byte b = x; // Error! (we also know x is 0 .. 255)
   }
 }

 It would be really nice if the compiler would carry around the known
 possible range of values, and refer to that information when performing
 down-casting assignments.
 It would also be very useful information for the back end, it can choose
 more efficient types if it knows the range, or produce jump tables rather
 than branch sequences.

 I think the compiler would need a min, max, and mask for any integers, and
 update them as it works.

 There are many sources of this information.
 Comparisons effectively limit the range:
 if(x > 0)
 {
   // x = 0 .. int.max
 }
 else
 {
   // x = int.min .. -1
 }

 Masks, obviously:
 x &= 15;

 Also contracts are a really great source of seeding this information on
 function entry.

 Has this been discussed? It seems simple, and it's invisible to the
 language. It would just reduce some boilerplate (tedious casts), and also
 offer some great optimisation opportunities.
I've been thinking about this for a while from an optimisation point of view. Is this not a common practice for an optimising compiler?
May 12 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/12/2013 5:00 PM, Manu wrote:
 So, here's an issue that constantly drives me nuts, and an elegant solution
 seems so do-able.
It's a form of "data flow analysis". This is done by the optimizer, although it doesn't track ranges. It isn't quite as simple as you suggest, there are issues like dealing with arbitrary control flow. x &= 0xFFFF; short s = x; // Error! (but we know x is 0 .. 65535) Ok, but what about: x &= 0xFFFF; while (expr) { short s = x; x += 1; } There's a forward reference going on. We cannot semantically evaluate s=x until we semantically evaluate the rest of the function in order to know that x is or is not reassigned.
May 12 2013
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 There's a forward reference going on. We cannot semantically 
 evaluate s=x until we semantically evaluate the rest of the 
 function in order to know that x is or is not reassigned.
A first step to improve the situation is to propagate the range of const/immutable values inside a function: void main() { uint x = 1000; immutable y = x % 256; ubyte z = y; // currently error. } Bye, bearophile
May 12 2013
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 13 May 2013 11:03, Walter Bright <newshound2 digitalmars.com> wrote:

 On 5/12/2013 5:00 PM, Manu wrote:

 So, here's an issue that constantly drives me nuts, and an elegant
 solution
 seems so do-able.
It's a form of "data flow analysis". This is done by the optimizer, although it doesn't track ranges. It isn't quite as simple as you suggest, there are issues like dealing with arbitrary control flow. x &= 0xFFFF; short s = x; // Error! (but we know x is 0 .. 65535) Ok, but what about: x &= 0xFFFF; while (expr) { short s = x; x += 1; } There's a forward reference going on. We cannot semantically evaluate s=x until we semantically evaluate the rest of the function in order to know that x is or is not reassigned.
It is done by the optimiser, but I'm suggesting that it doesn't only have value in terms of optimisation. It could improve language usability if performed in the front end. In that context, unless the analysis is very powerful(/mature) and can statically determine 'expr', then x would need to be relaxed to unlimited at that point, since it obviously can't know the limits within (or after) that loop. In this case, the compiler would complain that it can't determine x and you would need an explicit cast as usual. I can see the effective forward reference issue here, which could be addressed in time, but even just reasonably simple limits tracking (ie, within sequential code) would make a massive difference. I think programmers would intuitively understand this situation and wouldn't expect magic from the compiler. The places where it matters the most are places where it's dead obvious what the numeric limits are, and it should be equally obvious to the compiler. It's the sort of thing that could be improved with time, but even a very simple start would be a big help in many cases.
May 12 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/12/2013 6:23 PM, Manu wrote:
 It is done by the optimiser, but I'm suggesting that it doesn't only have value
 in terms of optimisation. It could improve language usability if performed in
 the front end.
I understand.
 In that context, unless the analysis is very powerful(/mature) and can
 statically determine 'expr', then x would need to be relaxed to unlimited at
 that point, since it obviously can't know the limits within (or after) that
loop.
 In this case, the compiler would complain that it can't determine x and you
 would need an explicit cast as usual.

 I can see the effective forward reference issue here, which could be addressed
 in time, but even just reasonably simple limits tracking (ie, within sequential
 code) would make a massive difference. I think programmers would intuitively
 understand this situation and wouldn't expect magic from the compiler.
 The places where it matters the most are places where it's dead obvious what
the
 numeric limits are, and it should be equally obvious to the compiler.

 It's the sort of thing that could be improved with time, but even a very simple
 start would be a big help in many cases.
It's true you could do a sequential only, very conservative form of data flow analysis. But be aware there are other problems, such as: int *p = ...; ... x &= 0xFFFF; ++(*p); // hmm, is x affected? short s = x;
May 12 2013
parent Manu <turkeyman gmail.com> writes:
On 13 May 2013 13:46, Walter Bright <newshound2 digitalmars.com> wrote:

 On 5/12/2013 6:23 PM, Manu wrote:

 It is done by the optimiser, but I'm suggesting that it doesn't only have
 value
 in terms of optimisation. It could improve language usability if
 performed in
 the front end.
I understand. In that context, unless the analysis is very powerful(/mature) and can
 statically determine 'expr', then x would need to be relaxed to unlimited
 at
 that point, since it obviously can't know the limits within (or after)
 that loop.
 In this case, the compiler would complain that it can't determine x and
 you
 would need an explicit cast as usual.

 I can see the effective forward reference issue here, which could be
 addressed
 in time, but even just reasonably simple limits tracking (ie, within
 sequential
 code) would make a massive difference. I think programmers would
 intuitively
 understand this situation and wouldn't expect magic from the compiler.
 The places where it matters the most are places where it's dead obvious
 what the
 numeric limits are, and it should be equally obvious to the compiler.

 It's the sort of thing that could be improved with time, but even a very
 simple
 start would be a big help in many cases.
It's true you could do a sequential only, very conservative form of data flow analysis. But be aware there are other problems, such as: int *p = ...; ... x &= 0xFFFF; ++(*p); // hmm, is x affected? short s = x;
Yes indeed, there are certainly many ways to interfere with it, but I certainly hope that most lines of code you deal with on a daily basis aren't like this ;) Obviously it would grow in maturity over time. In that example, just revert to unlimited assumptions when it encounters the pointer operation. Limits may then be refined again in following code. I can imagine things like proper escaping analysis as planned could eventually help in cases like this. Anyway, it's just something to think about. I think it would offer additional safety/correctness to the language; anywhere you type a blind cast, you are potentially hiding a future bug. I'd rather never cast when I don't absolutely need/intend to.
May 12 2013
prev sibling next sibling parent reply "David Nadlinger" <see klickverbot.at> writes:
On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
 Has this been discussed? It seems simple, and it's invisible to 
 the language.
There is no such thing as a language feature invisible to the language – or at least there should not be. Even a feature, such as this, which only makes the language more permissive needs exact specifications, for all D compilers have to implement it in the same way. David
May 13 2013
parent "deadalnix" <deadalnix gmail.com> writes:
On Monday, 13 May 2013 at 12:37:34 UTC, David Nadlinger wrote:
 On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
 Has this been discussed? It seems simple, and it's invisible 
 to the language.
There is no such thing as a language feature invisible to the language – or at least there should not be. Even a feature, such as this, which only makes the language more permissive needs exact specifications, for all D compilers have to implement it in the same way. David
is(typeof()) say that any permissive change can also break code in D. This is our little specificity ;)
May 13 2013
prev sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Mon, 13 May 2013 10:00:40 +1000
schrieb Manu <turkeyman gmail.com>:

 So, here's an issue that constantly drives me nuts, and an elegant solution
 seems so do-able.
 
 void func(int x)
 {
   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)
 
   if(x < 256)
   {
     byte b = x; // Error! (we also know x is 0 .. 255)
   }
 }
Yes, I came across that one every so often myself. But as David pointed out, such a feature would have to make it into the D specification and implemented in all D compilers. In any case I wouldn't call a half-baked "I will solve the halting problem" flow-analysis elegant. :p Let's think different. You could write short s = x & 0xFFFF; on your first line, as you may already know DMD does this bit of range analysis (same for modulo). Also what was your intention? func() practically takes a short parameter. Maybe it's signature should be void func(short x) {...} If that's a problem on the calling site, maybe that can be fixed? bearophile's suggestion sounds like something that's pretty solid and naturally extends the likes of "short s = x & 0xFFF;" So if your "int x" was immutable it would really be trivial to have your "if" change the range of immutable x during that scope and have the assignment only fail because x may be -1000. ;) -- Marco
May 15 2013
parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Thu, 16 May 2013 07:34:18 +0200, Marco Leise <Marco.Leise gmx.de> wrote:

 Am Mon, 13 May 2013 10:00:40 +1000
 schrieb Manu <turkeyman gmail.com>:

 void func(int x)
 {
   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)

   if(x < 256)
   {
     byte b = x; // Error! (we also know x is 0 .. 255)
   }
 }
[snip]
 Also what was your intention? func() practically takes a short
 parameter. Maybe it's signature should be
     void func(short x) {...}
 If that's a problem on the calling site, maybe that can be
 fixed?
I believe the intention was to show an issue with the language, not to make a function that assigns to a short and a byte only to throw them away. Perhaps he needed to do some intermediate calculations at higher precision, perhaps he needs it to be an int for interfacing with a C function, etc. -- Simen
May 16 2013