digitalmars.D - Numeric limits tracking

Manu (36/36) May 12 2013 So, here's an issue that constantly drives me nuts, and an elegant solut...

John Colvin (4/49) May 12 2013 I've been thinking about this for a while from an optimisation

Manu (18/67) May 12 2013 It is, but there are limits to what it can do.

Walter Bright (16/18) May 12 2013 It's a form of "data flow analysis". This is done by the optimizer, alth...

bearophile (10/13) May 12 2013 A first step to improve the situation is to propagate the range
Manu (20/40) May 12 2013 It is done by the optimiser, but I'm suggesting that it doesn't only hav...

Walter Bright (10/26) May 12 2013 It's true you could do a sequential only, very conservative form of data...

Manu (13/53) May 12 2013 Yes indeed, there are certainly many ways to interfere with it, but I

David Nadlinger (7/9) May 13 2013 There is no such thing as a language feature invisible to the

deadalnix (3/12) May 13 2013 is(typeof()) say that any permissive change can also break code

Marco Leise (24/37) May 15 2013 Yes, I came across that one every so often myself. But as

Simen Kjaeraas (9/26) May 16 2013 I believe the intention was to show an issue with the language, not to

Manu <turkeyman gmail.com> writes:

So, here's an issue that constantly drives me nuts, and an elegant solution
seems so do-able.

void func(int x)
{
  x &= 0xFFFF;
  short s = x; // Error! (but we know x is 0 .. 65535)

  if(x < 256)
  {
    byte b = x; // Error! (we also know x is 0 .. 255)
  }
}

It would be really nice if the compiler would carry around the known
possible range of values, and refer to that information when performing
down-casting assignments.
It would also be very useful information for the back end, it can choose
more efficient types if it knows the range, or produce jump tables rather
than branch sequences.

I think the compiler would need a min, max, and mask for any integers, and
update them as it works.

There are many sources of this information.
Comparisons effectively limit the range:
if(x > 0)
{
  // x = 0 .. int.max
}
else
{
  // x = int.min .. -1
}

Masks, obviously:
x &= 15;

Also contracts are a really great source of seeding this information on
function entry.

Has this been discussed? It seems simple, and it's invisible to the
language. It would just reduce some boilerplate (tedious casts), and also
offer some great optimisation opportunities.

May 12 2013

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
 So, here's an issue that constantly drives me nuts, and an 
 elegant solution
 seems so do-able.

 void func(int x)
 {
   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)

   if(x < 256)
   {
     byte b = x; // Error! (we also know x is 0 .. 255)
   }
 }

 It would be really nice if the compiler would carry around the 
 known
 possible range of values, and refer to that information when 
 performing
 down-casting assignments.
 It would also be very useful information for the back end, it 
 can choose
 more efficient types if it knows the range, or produce jump 
 tables rather
 than branch sequences.

 I think the compiler would need a min, max, and mask for any 
 integers, and
 update them as it works.

 There are many sources of this information.
 Comparisons effectively limit the range:
 if(x > 0)
 {
   // x = 0 .. int.max
 }
 else
 {
   // x = int.min .. -1
 }

 Masks, obviously:
 x &= 15;

 Also contracts are a really great source of seeding this 
 information on
 function entry.

 Has this been discussed? It seems simple, and it's invisible to 
 the
 language. It would just reduce some boilerplate (tedious 
 casts), and also
 offer some great optimisation opportunities.

I've been thinking about this for a while from an optimisation 
point of view. Is this not a common practice for an optimising 
compiler?

May 12 2013

Manu <turkeyman gmail.com> writes:

It is, but there are limits to what it can do.
D's contracts potentially offer a lot of information that the optimiser
wouldn't naturally have (any function receiving blind arguments can't make
any assumptions without a contract present), but more from a
usability/safety point of view, I really hate casting when the compiler
should know the assignment is safe.

It's not only an inconvenience, it's also a safety thing.
Consider:
I have an int that I assign to a byte, I know the value fits in a byte, but
I'm forced to type the manual cast anyway.
Once that cast is written, if the valid range of my int happens to change
in outer code, I will not receive an error informing me that the new range
doesn't fit within a byte anymore, it'll just truncate it anyway, since a
blind cast is already there in the code.
It would be much better for the compiler to start complaining that it can
no longer assure that the int fits into the byte. With the cast present, my
bug has been hidden until I eventually crash.


On 13 May 2013 10:18, John Colvin <john.loughran.colvin gmail.com> wrote:

 On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:

 So, here's an issue that constantly drives me nuts, and an elegant
 solution
 seems so do-able.

 void func(int x)
 {
   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)

   if(x < 256)
   {
     byte b = x; // Error! (we also know x is 0 .. 255)
   }
 }

 It would be really nice if the compiler would carry around the known
 possible range of values, and refer to that information when performing
 down-casting assignments.
 It would also be very useful information for the back end, it can choose
 more efficient types if it knows the range, or produce jump tables rather
 than branch sequences.

 I think the compiler would need a min, max, and mask for any integers, and
 update them as it works.

 There are many sources of this information.
 Comparisons effectively limit the range:
 if(x > 0)
 {
   // x = 0 .. int.max
 }
 else
 {
   // x = int.min .. -1
 }

 Masks, obviously:
 x &= 15;

 Also contracts are a really great source of seeding this information on
 function entry.

 Has this been discussed? It seems simple, and it's invisible to the
 language. It would just reduce some boilerplate (tedious casts), and also
 offer some great optimisation opportunities.

 I've been thinking about this for a while from an optimisation point of
 view. Is this not a common practice for an optimising compiler?

May 12 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 5/12/2013 5:00 PM, Manu wrote:
 So, here's an issue that constantly drives me nuts, and an elegant solution
 seems so do-able.

It's a form of "data flow analysis". This is done by the optimizer, although it 
doesn't track ranges.

It isn't quite as simple as you suggest, there are issues like dealing with 
arbitrary control flow.

   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)

Ok, but what about:

   x &= 0xFFFF;
   while (expr) {
       short s = x;
       x += 1;
   }

There's a forward reference going on. We cannot semantically evaluate s=x until 
we semantically evaluate the rest of the function in order to know that x is or 
is not reassigned.

May 12 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Walter Bright:

 There's a forward reference going on. We cannot semantically 
 evaluate s=x until we semantically evaluate the rest of the 
 function in order to know that x is or is not reassigned.

A first step to improve the situation is to propagate the range 
of const/immutable values inside a function:

void main() {
     uint x = 1000;
     immutable y = x % 256;
     ubyte z = y; // currently error.
}

Bye,
bearophile

May 12 2013

Manu <turkeyman gmail.com> writes:

On 13 May 2013 11:03, Walter Bright <newshound2 digitalmars.com> wrote:

 On 5/12/2013 5:00 PM, Manu wrote:

 So, here's an issue that constantly drives me nuts, and an elegant
 solution
 seems so do-able.

 It's a form of "data flow analysis". This is done by the optimizer,
 although it doesn't track ranges.

 It isn't quite as simple as you suggest, there are issues like dealing
 with arbitrary control flow.


   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)

 Ok, but what about:

   x &= 0xFFFF;
   while (expr) {
       short s = x;
       x += 1;
   }

 There's a forward reference going on. We cannot semantically evaluate s=x
 until we semantically evaluate the rest of the function in order to know
 that x is or is not reassigned.

It is done by the optimiser, but I'm suggesting that it doesn't only have
value in terms of optimisation. It could improve language usability if
performed in the front end.

In that context, unless the analysis is very powerful(/mature) and can
statically determine 'expr', then x would need to be relaxed to unlimited
at that point, since it obviously can't know the limits within (or after)
that loop.
In this case, the compiler would complain that it can't determine x and you
would need an explicit cast as usual.

I can see the effective forward reference issue here, which could be
addressed in time, but even just reasonably simple limits tracking (ie,
within sequential code) would make a massive difference. I think
programmers would intuitively understand this situation and wouldn't expect
magic from the compiler.
The places where it matters the most are places where it's dead obvious
what the numeric limits are, and it should be equally obvious to the
compiler.

It's the sort of thing that could be improved with time, but even a very
simple start would be a big help in many cases.

May 12 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 5/12/2013 6:23 PM, Manu wrote:
 It is done by the optimiser, but I'm suggesting that it doesn't only have value
 in terms of optimisation. It could improve language usability if performed in
 the front end.

I understand.

 In that context, unless the analysis is very powerful(/mature) and can
 statically determine 'expr', then x would need to be relaxed to unlimited at
 that point, since it obviously can't know the limits within (or after) that
loop.
 In this case, the compiler would complain that it can't determine x and you
 would need an explicit cast as usual.

 I can see the effective forward reference issue here, which could be addressed
 in time, but even just reasonably simple limits tracking (ie, within sequential
 code) would make a massive difference. I think programmers would intuitively
 understand this situation and wouldn't expect magic from the compiler.
 The places where it matters the most are places where it's dead obvious what
the
 numeric limits are, and it should be equally obvious to the compiler.

 It's the sort of thing that could be improved with time, but even a very simple
 start would be a big help in many cases.

It's true you could do a sequential only, very conservative form of data flow 
analysis.

But be aware there are other problems, such as:

   int *p = ...;
   ...
   x &= 0xFFFF;
   ++(*p);      // hmm, is x affected?
   short s = x;

May 12 2013

Manu <turkeyman gmail.com> writes:

On 13 May 2013 13:46, Walter Bright <newshound2 digitalmars.com> wrote:

 On 5/12/2013 6:23 PM, Manu wrote:

 It is done by the optimiser, but I'm suggesting that it doesn't only have
 value
 in terms of optimisation. It could improve language usability if
 performed in
 the front end.

 I understand.


  In that context, unless the analysis is very powerful(/mature) and can
 statically determine 'expr', then x would need to be relaxed to unlimited
 at
 that point, since it obviously can't know the limits within (or after)
 that loop.
 In this case, the compiler would complain that it can't determine x and
 you
 would need an explicit cast as usual.

 I can see the effective forward reference issue here, which could be
 addressed
 in time, but even just reasonably simple limits tracking (ie, within
 sequential
 code) would make a massive difference. I think programmers would
 intuitively
 understand this situation and wouldn't expect magic from the compiler.
 The places where it matters the most are places where it's dead obvious
 what the
 numeric limits are, and it should be equally obvious to the compiler.

 It's the sort of thing that could be improved with time, but even a very
 simple
 start would be a big help in many cases.

 It's true you could do a sequential only, very conservative form of data
 flow analysis.

 But be aware there are other problems, such as:

   int *p = ...;
   ...
   x &= 0xFFFF;
   ++(*p);      // hmm, is x affected?
   short s = x;

Yes indeed, there are certainly many ways to interfere with it, but I
certainly hope that most lines of code you deal with on a daily basis
aren't like this ;)
Obviously it would grow in maturity over time. In that example, just revert
to unlimited assumptions when it encounters the pointer operation. Limits
may then be refined again in following code.
I can imagine things like proper escaping analysis as planned could
eventually help in cases like this.

Anyway, it's just something to think about. I think it would offer
additional safety/correctness to the language; anywhere you type a blind
cast, you are potentially hiding a future bug. I'd rather never cast when I
don't absolutely need/intend to.

May 12 2013

"David Nadlinger" <see klickverbot.at> writes:

On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
 Has this been discussed? It seems simple, and it's invisible to 
 the language.

There is no such thing as a language feature invisible to the 
language – or at least there should not be. Even a feature, such 
as this, which only makes the language more permissive needs 
exact specifications, for all D compilers have to implement it in 
the same way.

David

May 13 2013

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 13 May 2013 at 12:37:34 UTC, David Nadlinger wrote:
 On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
 Has this been discussed? It seems simple, and it's invisible 
 to the language.

 There is no such thing as a language feature invisible to the 
 language – or at least there should not be. Even a feature, 
 such as this, which only makes the language more permissive 
 needs exact specifications, for all D compilers have to 
 implement it in the same way.

 David

is(typeof()) say that any permissive change can also break code 
in D. This is our little specificity ;)

May 13 2013

Marco Leise <Marco.Leise gmx.de> writes:

Am Mon, 13 May 2013 10:00:40 +1000
schrieb Manu <turkeyman gmail.com>:

 So, here's an issue that constantly drives me nuts, and an elegant solution
 seems so do-able.
 
 void func(int x)
 {
   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)
 
   if(x < 256)
   {
     byte b = x; // Error! (we also know x is 0 .. 255)
   }
 }

Yes, I came across that one every so often myself. But as
David pointed out, such a feature would have to make it into
the D specification and implemented in all D compilers.
In any case I wouldn't call a half-baked "I will solve the
halting problem" flow-analysis elegant. :p

Let's think different. You could write
    short s = x & 0xFFFF;
on your first line, as you may already know DMD does this bit
of range analysis (same for modulo).

Also what was your intention? func() practically takes a short
parameter. Maybe it's signature should be
    void func(short x) {...}
If that's a problem on the calling site, maybe that can be
fixed?

bearophile's suggestion sounds like something that's pretty
solid and naturally extends the likes of "short s = x & 0xFFF;"
So if your "int x" was immutable it would really be trivial to
have your "if" change the range of immutable x during that
scope and have the assignment only fail because x may be
-1000. ;)

-- 
Marco

May 15 2013

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On Thu, 16 May 2013 07:34:18 +0200, Marco Leise <Marco.Leise gmx.de> wrote:

 Am Mon, 13 May 2013 10:00:40 +1000
 schrieb Manu <turkeyman gmail.com>:

 void func(int x)
 {
   x &= 0xFFFF;
   short s = x; // Error! (but we know x is 0 .. 65535)

   if(x < 256)
   {
     byte b = x; // Error! (we also know x is 0 .. 255)
   }
 }


[snip]
 Also what was your intention? func() practically takes a short
 parameter. Maybe it's signature should be
     void func(short x) {...}
 If that's a problem on the calling site, maybe that can be
 fixed?

I believe the intention was to show an issue with the language, not to  
make a
function that assigns to a short and a byte only to throw them away.

Perhaps he needed to do some intermediate calculations at higher precision,
perhaps he needs it to be an int for interfacing with a C function, etc.

-- 
Simen

May 16 2013

D Programming

C/C++ Programming

Other

digitalmars.D - Numeric limits tracking