## digitalmars.D - value range propagation for logical OR

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```Consider:

byte c = a | b;

Say you already know min_a, max_a, min_b, and max_b. How do you compute
min_c and max_c? I thought of it for a bit and it seems quite tricky.

Thanks,

Andrei
```
Apr 10 2010
Justin Spahr-Summers <Justin.SpahrSummers gmail.com> writes:
```On Sat, 10 Apr 2010 12:01:45 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Consider:

byte c = a | b;

Say you already know min_a, max_a, min_b, and max_b. How do you compute
min_c and max_c? I thought of it for a bit and it seems quite tricky.

Thanks,

Andrei

It seems like it would use the lowest minimum and the highest maximum of
the two... that's what I personally would find most natural, given the
semantics of bitwise OR, but maybe I'm just way off base.
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 12:22 PM, Justin Spahr-Summers wrote:
On Sat, 10 Apr 2010 12:01:45 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org>  wrote:
Consider:

byte c = a | b;

Say you already know min_a, max_a, min_b, and max_b. How do you compute
min_c and max_c? I thought of it for a bit and it seems quite tricky.

Thanks,

Andrei

It seems like it would use the lowest minimum and the highest maximum of
the two... that's what I personally would find most natural, given the
semantics of bitwise OR, but maybe I'm just way off base.

I thought the same, but consider:

a = 4;
b = 3;

Then the maximum value is 7, larger than both.

Andrei
```
Apr 10 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 12:01:45PM -0500, Andrei Alexandrescu wrote:
byte c = a | b;

Say you already know min_a, max_a, min_b, and max_b. How do you compute
min_c and max_c? I thought of it for a bit and it seems quite tricky.

min_c seems easy enough:
min_c = max(min_a, min_b);

When you or two values together, the result must be >= both of them, and this
does the trick there.

max_c is a bit harder. The obvious solution of:

max_c = max(max_a, max_b);

Fails because of 0b1100 | 0b0011... looking at this example, let's try:

max_c = max_a + max_b;

It covers that case, but is too big for 0b1111 | 0b1111.

This might work:

int numbits(number) { return the number of bits required to hold that number; }

if ( numbits(max_a + max_b) > max(numbits(max_a), numbits(max_b)) )
max_c = 2 ^^ (max(numbits(max_a), numbits(max_b))) - 1;
else
max_c = max_a + max_b;

The biggest we can possibly get is the sum of the two numbers. But, if it carries
over to more bits than either option, we truncate it - OR never gives more bits
than the two arguments. So we assume the max is the worst case of all ones or'd
with all ones.

I think this would work, but might not be the best we can possibly do. It is the
best I can think of though.

--
http://arsdnet.net
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```(By the way I meant _bitwise_ in the title.)

On 04/10/2010 12:38 PM, Adam D. Ruppe wrote:
On Sat, Apr 10, 2010 at 12:01:45PM -0500, Andrei Alexandrescu wrote:
byte c = a | b;

Say you already know min_a, max_a, min_b, and max_b. How do you compute
min_c and max_c? I thought of it for a bit and it seems quite tricky.

min_c seems easy enough:
min_c = max(min_a, min_b);

When you or two values together, the result must be >= both of them, and this
does the trick there.

Yah, that works for unsigned types. For signed types, it's tricky (as
tricky as max).

max_c is a bit harder. The obvious solution of:

max_c = max(max_a, max_b);

Fails because of 0b1100 | 0b0011... looking at this example, let's try:

max_c = max_a + max_b;

It covers that case, but is too big for 0b1111 | 0b1111.

Yah, sum is a bound for max, but not the exact max. This is because
bitwise OR is like a sum without transport.

This might work:

int numbits(number) { return the number of bits required to hold that number; }

if ( numbits(max_a + max_b) > max(numbits(max_a), numbits(max_b)) )
max_c = 2 ^^ (max(numbits(max_a), numbits(max_b))) - 1;
else
max_c = max_a + max_b;

The biggest we can possibly get is the sum of the two numbers. But, if it carries
over to more bits than either option, we truncate it - OR never gives more bits
than the two arguments. So we assume the max is the worst case of all ones or'd
with all ones.

I think this would work, but might not be the best we can possibly do. It is the
best I can think of though.

Even on the branch that doesn't use the sum, that's still just a bound.
Consider:

a = 8;
b = 4;

Then max(a|b) is 12, but computed with your method is 15.

Andrei
```
Apr 10 2010
bearophile <bearophileHUGS lycos.com> writes:
```Adam D. Ruppe:

My feeling is to go ahead and cast to unsigned, then do the calculation. For
me at least, when doing bitwise ops, I assume it is unsigned anyway.

Time ago I have even suggested to disallow bitwise ops when one or both
operands are signed...

Bye,
bearophile
```
Apr 10 2010
bearophile <bearophileHUGS lycos.com> writes:
```Adam D. Ruppe:
I understand the position, but I don't advocate it just because I think it would
be annoying and not give much of a benefit.

doesn't give enough benefit.

bye,
bearophile
```
Apr 10 2010
bearophile <bearophileHUGS lycos.com> writes:
```Adam D. Ruppe:

Oh no! Eyeballing again showed a problem.. I should have put parens in my
asserts:
a | b < f(a,b)   !=  (a|b) < f(a,b)

Bugs are gold nuggets! Yes, the precedence of bitwise operators is low, it's an
error-prone thing, it's a part of C/C++/D that causes frequent bugs in programs
written by everybody. I add extra parentheses when I use bitwise operators.

Unfortunately I don't see a simple way to remove this significant source of
bugs from the D2 language.

When you switch on full warnings GCC warns you about few possible similar
errors, suggesting to add parentheses to remove some ambiguity (for the eyes of
the programmers).

This is a small example in C:

#include "stdio.h"
#include "stdlib.h"
int main() {
int a = atoi("10");
int b = atoi("20");
int c = atoi("30");
printf("%u\n", a|b <= c);
return 0;
}

If you compile it with GCC (I am using gcc 4.4.1):

...>gcc -Wall test.c -o test
test.c: In function 'main':
test.c:9: warning: suggest parentheses around comparison in operand of '|'

You always use -Wall (and other warnings) when you write C code. So in this
case the C language+compiler is able to catch a bug like yours :-)

Bye,
bearophile
```
Apr 10 2010
bearophile <bearophileHUGS lycos.com> writes:
```I have filed a "report":
http://d.puremagic.com/issues/show_bug.cgi?id=4077

Bye,
bearophile
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 09:55 PM, bearophile wrote:
I have filed a "report":
http://d.puremagic.com/issues/show_bug.cgi?id=4077

Perfect, thanks. Your decision to contribute with reports is very
appreciated.

Andrei
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 06:58 PM, Adam D. Ruppe wrote:
On Sat, Apr 10, 2010 at 07:38:07PM -0400, Adam D. Ruppe wrote:
I *might* have it, but I'm not 100% confident in my test program.

I think my test program is telling me something: a non-zero min_c subtracts
from
max_c. If I take my brute-force empirically calculated max_c and compare it to
what my max_c function returns, it doesn't always match if the minimums change.

This might be a bug; it is unintuitive that it would do this, but it actually
makes sense. Consider if min == max, you only have one value to work with.

let max_a = 4, min_a = 0;

max_b = 4

You can get 7 out of it, since 3 < 4, so it is a possible number and:
100 | 011 == 7

But, if min_b == 4, the only possible value for b is 4. And: 100 | 100 == 4.

Increasing min_b does indeed decrease the max you can get.

Nice fact, but I've spent too much time on this already, so I'll call it done with
this:

My max_c included some unnecessary code: my reusable mask obsoletes all of
my special case code! So, we can bring the program down to three lines:

import std.intrinsic;
uint numbits(uint a) { return a == 0 ? a : (bsr(a) + 1); }
uint max_c(uint a, uint b) { return a | ((1 << numbits(a&b)) - 1) | b; }

It passes my test, though since I bugged it up the first time, I might have done
it again.

What results does it yield with my main() test harness?

Andrei
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 08:58 PM, Adam D. Ruppe wrote:
On Sat, Apr 10, 2010 at 08:42:54PM -0500, Andrei Alexandrescu wrote:
What results does it yield with my main() test harness?

total=100000000; equal=14585400 (14.5854%)

I think that's a perfect result. Here's why: the equality comparison checks
two specific values, (a, b), but the maxOr functions return the max for the
interval ([0, a], [0, b]). It wouldn't always be equal, since there are other
values in there that can reach the max.

Yah, it's clear that we're not looking anywhere near 100%. I'm a bit sad
we don't have an adequate baseline, but the fact that two distinct
methods obtained the same result is encouraging.

For example, a = 4, b = 4. 100|100 == 100, but that range also include 4|3,
100 | 011 == 111, which is the max.

Your maxOR function gives the same percentage, for probably the same reason.
Though, mine runs ~7x faster on my box.

Then you haven't wasted your day!

On to the next task: compute minOR and maxOR for _signed_ values. It
turns out minOR is also nontrivial...

When we're done, we can pass the functions to Walter so he can integrate
them within his compiler. Right now the compiler uses a wrong algorithm.

Andrei
```
Apr 10 2010
Ali Çehreli <acehreli yahoo.com> writes:
```Adam D. Ruppe wrote:

I am pretty convinced that to get the
perfect solution, the max_c function needs to know min_a and min_b too.

Andrei Alexandrescu wrote:
On to the next task: compute minOR and maxOR for _signed_ values. It
turns out minOR is also nontrivial...

How about computing minOR for unsigned values first? ;)

The proposed solution of max(min_a, min_b) is correct only for a couple
of corner cases.

Ali
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 12:01 PM, Andrei Alexandrescu wrote:
Consider:

byte c = a | b;

Say you already know min_a, max_a, min_b, and max_b. How do you compute
min_c and max_c? I thought of it for a bit and it seems quite tricky.

Thanks,

Andrei

I think this would work:

uint maxOR(uint maxa, uint maxb) {
if (maxa < maxb) return maxOR(maxb, maxa);
uint candidate = 0;
foreach (i, bit; byBitDescending(maxa)) {
if (bit) continue;
auto t = candidate | (1 << (31 - i));
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

The idea is to find the candidate that would fill as many zeros in maxa
as possible.

But this is inefficient. I'm wondering whether a simple bitwise
manipulation expression could do the trick.

Andrei
```
Apr 10 2010
Rainer Deyke <rainerd eldwood.com> writes:
```On 4/10/2010 11:52, Andrei Alexandrescu wrote:
I think this would work:

uint maxOR(uint maxa, uint maxb) {
if (maxa < maxb) return maxOR(maxb, maxa);
uint candidate = 0;
foreach (i, bit; byBitDescending(maxa)) {
if (bit) continue;
auto t = candidate | (1 << (31 - i));
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

This looks wrong.  Your function, if I understand it correctly, flips
all the bits in 'maxb' (excluding the leading zeros).  If 'maxb' is
exactly 1 less than a power of 2, then 'candidate' will be 0.  Now
consider a in [0, 16], b in [0, 15].  Your function will produce 16, but
the real maximum is 31.

For maximum accuracy, you have to consider the minima as well as the
maxima when calculating the new maximum.  With 'a' in [16, 16] and 'b'
in [16, 16], the new range can only be [16, 16].  With 'a' in [0, 16]
and 'b' in [0, 16], the correct new range is [0, 31].

--
Rainer Deyke - rainerd eldwood.com
```
Apr 10 2010
Don <nospam nospam.com> writes:
```Rainer Deyke wrote:
On 4/10/2010 11:52, Andrei Alexandrescu wrote:
I think this would work:

uint maxOR(uint maxa, uint maxb) {
if (maxa < maxb) return maxOR(maxb, maxa);
uint candidate = 0;
foreach (i, bit; byBitDescending(maxa)) {
if (bit) continue;
auto t = candidate | (1 << (31 - i));
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

This looks wrong.  Your function, if I understand it correctly, flips
all the bits in 'maxb' (excluding the leading zeros).  If 'maxb' is
exactly 1 less than a power of 2, then 'candidate' will be 0.  Now
consider a in [0, 16], b in [0, 15].  Your function will produce 16, but
the real maximum is 31.

For maximum accuracy, you have to consider the minima as well as the
maxima when calculating the new maximum.  With 'a' in [16, 16] and 'b'
in [16, 16], the new range can only be [16, 16].  With 'a' in [0, 16]
and 'b' in [0, 16], the correct new range is [0, 31].

It's a very interesting puzzle. There are some pretty complex cases.

Consider (all numbers binary) a in [1010, 1100] and b in [101,101]. The
maximum value is when a=1011, b=101, so max is 1111.
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 01:55 PM, Rainer Deyke wrote:
On 4/10/2010 11:52, Andrei Alexandrescu wrote:
I think this would work:

uint maxOR(uint maxa, uint maxb) {
if (maxa < maxb) return maxOR(maxb, maxa);
uint candidate = 0;
foreach (i, bit; byBitDescending(maxa)) {
if (bit) continue;
auto t = candidate | (1 << (31 - i));
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

This looks wrong.  Your function, if I understand it correctly, flips
all the bits in 'maxb' (excluding the leading zeros).

It tries to set all bits in which maxa is zero, starting from the most
significant, and keeps them set if the resulting number is less than maxb.

If 'maxb' is
exactly 1 less than a power of 2, then 'candidate' will be 0.  Now
consider a in [0, 16], b in [0, 15].  Your function will produce 16, but
the real maximum is 31.

If maxb is 2^^N-1, then it will fill all nonzero bits of maxa.

For maximum accuracy, you have to consider the minima as well as the
maxima when calculating the new maximum.  With 'a' in [16, 16] and 'b'
in [16, 16], the new range can only be [16, 16].  With 'a' in [0, 16]
and 'b' in [0, 16], the correct new range is [0, 31].

The minimum for unsigned numbers is very simple: min(mina, minb).

Andrei
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 04:21 PM, Adam D. Ruppe wrote:
On Sat, Apr 10, 2010 at 04:05:06PM -0500, Andrei Alexandrescu wrote:
The minimum for unsigned numbers is very simple: min(mina, minb).

mina = 1
minb = 0

min(1, 0) == 0

But, 1|0 == 1

max(mina, minb) looks to work though. Consider that ORing can only set bits -
it can never take a one and spit out a zero. Therefore, x|y >= x && x|y >= y,
which only works on the max(mina, minb) function.

Sorry, I meant max of the two minima.

Andrei
```
Apr 10 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Andrei Alexandrescu wrote:
On 04/10/2010 01:55 PM, Rainer Deyke wrote:
On 4/10/2010 11:52, Andrei Alexandrescu wrote:
I think this would work:

uint maxOR(uint maxa, uint maxb) {
if (maxa < maxb) return maxOR(maxb, maxa);
uint candidate = 0;
foreach (i, bit; byBitDescending(maxa)) {
if (bit) continue;
auto t = candidate | (1 << (31 - i));
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

This looks wrong.  Your function, if I understand it correctly, flips
all the bits in 'maxb' (excluding the leading zeros).

It tries to set all bits in which maxa is zero, starting from the most
significant, and keeps them set if the resulting number is less than maxb.

int highbit (uint n) {
auto result = 0;
if (n & 0xFFFF0000) {
result += 16;
n = n >> 16;
}
if (n & 0xFF00) {
result += 8;
n = n >> 8;
}
if (n & 0xF0) {
result += 4;
n = n >> 4;
}
if (n & 0xC) {
result += 2;
n = n >> 2;
}
if (n & 0x2) {
result += 1;
n = n >> 1;
}
if (n & 0x1) {
result += 1;
}
return result;
}

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 04:50 PM, "Jérôme M. Berger" wrote:
Andrei Alexandrescu wrote:
On 04/10/2010 01:55 PM, Rainer Deyke wrote:
On 4/10/2010 11:52, Andrei Alexandrescu wrote:
I think this would work:

uint maxOR(uint maxa, uint maxb) {
if (maxa < maxb) return maxOR(maxb, maxa);
uint candidate = 0;
foreach (i, bit; byBitDescending(maxa)) {
if (bit) continue;
auto t = candidate | (1 << (31 - i));
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

This looks wrong.  Your function, if I understand it correctly, flips
all the bits in 'maxb' (excluding the leading zeros).

It tries to set all bits in which maxa is zero starting and keeps them
set if the resulting number is less than maxb.

int highbit (uint n) {
auto result = 0;
if (n & 0xFFFF0000) {
result += 16;
n = n >> 16;
}
if (n & 0xFF00) {
result += 8;
n = n >> 8;
}
if (n & 0xF0) {
result += 4;
n = n >> 4;
}
if (n & 0xC) {
result += 2;
n = n >> 2;
}
if (n & 0x2) {
result += 1;
n = n >> 1;
}
if (n & 0x1) {
result += 1;
}
return result;
}

Jerome

Interesting. I don't get how your version is equivalent to mine, and I
don't get how it actually works, but it seems to. Could you please explain?

The program below compiles, runs, and prints

total=100000000; equal=3850865 (3.85087%)

Here it is:

import std.contracts, std.stdio;

uint maxOR(uint maxa, uint maxb) {
return maxa | ((1 << highbit (maxb))-1);
}

uint highbit(uint n) {
auto result = 0;
if (n & 0xFFFF0000) {
result += 16;
n = n >> 16;
}
if (n & 0xFF00) {
result += 8;
n = n >> 8;
}
if (n & 0xF0) {
result += 4;
n = n >> 4;
}
if (n & 0xC) {
result += 2;
n = n >> 2;
}
if (n & 0x2) {
result += 1;
n = n >> 1;
}
if (n & 0x1) {
result += 1;
}
return result;
}

void main() {
ulong total, equal;
uint N = 10000;
foreach (a; 0 .. N) {
foreach (b; 0 .. N) {
auto c = maxOR(a, b);
enforce((a|b) <= c);
if ((a|b) == c) equal++;
total++;
}
}
writeln("total=", total, "; equal=", equal,
" (", equal * 100. / total, "%)");
}

However, your method is not the tightest; seems like mine is tighter.
When I replaced maxOR with this:

uint maxOR2(uint maxa, uint maxb) {
if (maxa < maxb) return maxOR(maxb, maxa);
uint candidate = 0;
uint mask = 1u << 31;
for (uint i = 0; i < 32; ++i, mask >>= 1) {
auto t = candidate | (1 << (31 - i));
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

I got:

total=100000000; equal=9822516 (9.82252%)

Besides, I verified that my method returns numbers <= yours.

Andrei
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 05:33 PM, Andrei Alexandrescu wrote:
[snip]

One more interesting detail. I simplified the routine to:

uint maxOR(uint maxa, uint maxb) {
uint candidate = 0;
uint mask = 1u << 31;
for (uint i = 0; i < 32; ++i, mask >>= 1) {
auto t = candidate | mask;
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

Among other things I remove the test whether maxa >= maxb at the
beginning of the function. I now obtain:

total=100000000; equal=14585400 (14.5854%)

so this function is better than all others so far. But I don't
understand why it beats this one:

uint maxOR(uint maxa, uint maxb) {
if (maxa < maxb) return maxOR(maxb, maxa); // added
uint candidate = 0;
uint mask = 1u << 31;
for (uint i = 0; i < 32; ++i, mask >>= 1) {
auto t = candidate | mask;
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

Andrei
```
Apr 10 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/10/2010 05:43 PM, Andrei Alexandrescu wrote:
On 04/10/2010 05:33 PM, Andrei Alexandrescu wrote:
[snip]

One more interesting detail.

Never mind that. I mistakenly used maxOR instead of maxOR2.

To summarize: best result is

total=100000000; equal=14585400 (14.5854%)

with the function:

uint maxOR(uint maxa, uint maxb) {
uint candidate = 0;
uint mask = 1u << 31;
for (uint i = 0; i < 32; ++i, mask >>= 1) {
auto t = candidate | mask;
if (t <= maxb) candidate = t;
}
return maxa | candidate;
}

Inserting a test and a recursive call at the top keeps the result but
actually slows down execution a bit.

Andrei
```
Apr 10 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Andrei Alexandrescu wrote:
Interesting. I don't get how your version is equivalent to mine, and I
don't get how it actually works, but it seems to. Could you please explain?

tired. Moreover it doesn't work: it should be:
maxa | ((1 << (highbit (maxb)+1))-1)

Which still isn't as tight as yours but at least works :) Between 0
and 64, it is optimal for 92% cases whereas yours is optimal for
over 98% cases.

void main() {
ulong total, equal;
uint N = 10000;
foreach (a; 0 .. N) {
foreach (b; 0 .. N) {
auto c = maxOR(a, b);
enforce((a|b) <= c);
if ((a|b) == c) equal++;
total++;
}
}
writeln("total=", total, "; equal=", equal,
" (", equal * 100. / total, "%)");
}

Note that it is easy to get 100%
accuracy with this test: just define maxOR to be maxA|maxB :) I used
the following C++ code for my tests:

int main (int, char**)
{
int  count      = 0;
int  countTight = 0;
for (uint32_t minA = 0; minA < 64; ++minA)
for (uint32_t maxA = minA; maxA < 64; ++maxA)
for (uint32_t minB = 0; minB < 64; ++minB)
for (uint32_t maxB = minB; maxB < 64; ++maxB)
{
bool reached = false;
uint32_t max = maxOr (minA, minB, maxA, maxB);
for (uint32_t a = minA; a <= maxA; ++a)
for (uint32_t b = minB; b <= maxB; ++b) {
if ((a|b) > max) {
printf ("minA=%x\n"
"maxA=%x\n"
"minB=%x\n"
"maxB=%x\n"
"a=%x\n"
"b=%x\n"
"a|b=%x\n"
"maxOr=%x\n",
minA, maxA, minB, maxB,
a, b, a|b, max);
exit (1);
}
if ((a|b) == max) reached = true;
}
if (reached)
++countTight;
++count;
}
printf ("Total: %d\n", count);
printf ("Tight: %d (%g%%)\n",
countTight, countTight * 100. / count);
printf ("Loose:  %d (%g%%)\n",
(count - countTight),
(count - countTight) * 100. / count);
return 0;
}

OTOH, my solution was slightly faster (4.98s to your 5.34s)

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 11 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

OK, I found a solution that is optimal in over 99% of the cases
(99.195% to be precise), while still being fast (because it avoids
looping over the bits):

====================8<------------------------------
uint32_t maxOr (uint32_t    minA, uint32_t minB,
uint32_t    maxA, uint32_t maxB)
{
if (maxA == 0) return maxB;
if (maxB == 0) return maxA;

uint32_t a = maxA ^ minA;
if (a != 0)
a = ((1 << highbit (a)) - 1);

uint32_t b = maxB ^ minB;
if (b != 0)
b = ((1 << highbit (b)) - 1);

uint32_t  t = maxA & maxB;
if (t != 0)
t = ((1 << highbit (t)) - 1);

return min (maxA|maxB|a|b, maxA|maxB|t);
}
------------------------------>8====================

Mostly it's black magic ;) and it's **much** simpler than the first
version that reached those performances. The explanation is in the
rest of the message.

I will only talk about the ``a`` side of the problem here (i.e.
assume ``maxB==minB``). The ``b`` side is symmetrical.

The idea is to build a value that is between minA and maxA and will
set as many bits as possible when or'ed with maxB:

- The first thing we do is to notice that minA and maxA have the
following structure (in binary): ``minA=A0x`` and ``maxA=A1y``
where the ``A`` part is identical. Any number between minA and
maxA will therefore be of the form ``a=Az``. A very conservative
estimate tells us that if we set ``z`` to all ones, then
``maxB|maxA|z`` will be greater than ``maxB|a`` for all ``a``.
This is computed by ``(1 << highbit (maxA ^ minA)) - 1``;

- Another way to look at the problem is to say that a ``0`` bit in
maxA cannot be set unless some higher ``1`` bit is unset. For
example if maxA is ``0b10`` then if we want to set bit 0, we need
to unset bit 1 (which will give us ``0b01``). So by doing this we
get a lower value for ``a|maxB`` unless this bit was already set
in maxB. The expression ``maxA & maxB`` gives us the bits that we
can unset. Therefore only bits that are less significant than the
high bit of ``maxA & maxB`` may be set. This is stored into the
variable ``t``;

- Either method works, but neither is perfect. By taking the
minimum of the two results, we get the best of both worlds
(although it still isn't perfect).

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 11 2010
Ali Çehreli <acehreli yahoo.com> writes:
```Jérôme M. Berger wrote:

The idea is to build a value that is  between minA and maxA and will
set as many bits as possible when or'ed with maxB:

The assumption that maxB would be the value that produces the maximum
a|b is not correct. A lower valued b may fill more gaps in the bit
representation of what is calculated from min_a and max_a.

Your function failed for me with the following values:

min_a 00000000000000000000000011001000 000000c8        200
max_a 00000000000000000000001100001111 0000030f        783
min_b 00000000000000000000000001000101 00000045         69
max_b 00000000000000000000001001100001 00000261        609
calculated 00000000000000000000001001100101 00000265        613
WRONG! empirical 00000000000000000000001111111111 000003ff       1023
emp_max_a 00000000000000000000000110011110 0000019e        414
emp_max_b 00000000000000000000001001100001 00000261        609

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108851

Ali
```
Apr 11 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Ali Çehreli wrote:
Jérôme M. Berger wrote:
The idea is to build a value that is  between minA and maxA and will
set as many bits as possible when or'ed with maxB:

The assumption that maxB would be the value that produces the maximum
a|b is not correct. A lower valued b may fill more gaps in the bit
representation of what is calculated from min_a and max_a.
Your function failed for me with the following values:
=20
min_a 00000000000000000000000011001000 000000c8        200
max_a 00000000000000000000001100001111 0000030f        783
min_b 00000000000000000000000001000101 00000045         69
max_b 00000000000000000000001001100001 00000261        609
calculated 00000000000000000000001001100101 00000265        613
WRONG! empirical 00000000000000000000001111111111 000003ff       1023
emp_max_a 00000000000000000000000110011110 0000019e        414
emp_max_b 00000000000000000000001001100001 00000261        609
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108851

That cannot have come from my
function! Since the return value from my function includes maxA and
maxB, at least all bits that are set in either of those should be
set in the output.

I've run my code with those inputs and the result is 3ff as
expected... (See attached source file).

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

[attachment: maxor.cc]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int highbit (uint32_t n)
{
assert (n != 0);

int  result = 0;
if (n & 0xFFFF0000) {
result  += 16;
n      >>= 16;
}
if (n & 0xFF00) {
result  += 8;
n      >>= 8;
}
if (n & 0xF0) {
result  += 4;
n      >>= 4;
}
if (n & 0xC) {
result  += 2;
n      >>= 2;
}
if (n & 0x2) {
result  += 1;
n      >>= 1;
}
return result;
}

inline uint32_t min (uint32_t a, uint32_t b)
{
return (a<b) ? a : b;
}

uint32_t maxOr (uint32_t    minA, uint32_t minB,
uint32_t    maxA, uint32_t maxB)
{
assert (minA <= maxA);
assert (minB <= maxB);

if (maxA == 0) return maxB;
if (maxB == 0) return maxA;

uint32_t a = maxA ^ minA;
if (a != 0)
a = ((1 << highbit (a)) - 1);

uint32_t b = maxB ^ minB;
if (b != 0)
b = ((1 << highbit (b)) - 1);

uint32_t  t = maxA & maxB;
if (t != 0)
t = ((1 << highbit (t)) - 1);

return min (maxA|maxB|a|b, maxA|maxB|t);
}

int main (int, char**)
{
uint32_t minA = 200;
uint32_t maxA = 783;
uint32_t minB = 69;
uint32_t maxB = 609;
printf ("minA=%x\n"
"maxA=%x\n"
"minB=%x\n"
"maxB=%x\n"
"maxOr=%x\n",
minA, maxA, minB, maxB,
maxOr (minA, minB, maxA, maxB));
return 0;
}
```
Apr 11 2010
Ali Çehreli <acehreli yahoo.com> writes:
```Jérôme M. Berger wrote:
Ali Çehreli wrote:
Jérôme M. Berger wrote:

The idea is to build a value that is  between minA and maxA and will
set as many bits as possible when or'ed with maxB:

a|b is not correct. A lower valued b may fill more gaps in the bit
representation of what is calculated from min_a and max_a.

Your function failed for me with the following values:

min_a 00000000000000000000000011001000 000000c8        200
max_a 00000000000000000000001100001111 0000030f        783
min_b 00000000000000000000000001000101 00000045         69
max_b 00000000000000000000001001100001 00000261        609
calculated 00000000000000000000001001100101 00000265        613
WRONG! empirical 00000000000000000000001111111111 000003ff       1023
emp_max_a 00000000000000000000000110011110 0000019e        414
emp_max_b 00000000000000000000001001100001 00000261        609

That cannot have come from my function!

My mistake: It was your function but the highbit() that I used was not
correct. The one I used was returning the *value* of the highest bit.

Since the return value from my function includes maxA and
maxB, at least all bits that are set in either of those should be
set in the output.

I've run my code with those input and the result is 3ff as
expected... (See attached source file).

Perhaps my test code is wrong; but I can't find my error. :/

Your function reports 0b_111 for this set of values:

min_a = 5;
max_a = 6;
min_b = 4;
max_b = 4;

But the maximum value of a|b is 6.

Ali
```
Apr 11 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Ali Çehreli wrote:
Jérôme M. Berger wrote:
Ali Çehreli wrote:
Jérôme M. Berger wrote:

The idea is to build a value that is between minA and maxA and will
set as many bits as possible when or'ed with maxB:

a|b is not correct. A lower valued b may fill more gaps in the bit
representation of what is calculated from min_a and max_a.

Your function failed for me with the following values:

min_a 00000000000000000000000011001000 000000c8        200
max_a 00000000000000000000001100001111 0000030f        783
min_b 00000000000000000000000001000101 00000045         69
max_b 00000000000000000000001001100001 00000261        609
calculated 00000000000000000000001001100101 00000265        613
WRONG! empirical 00000000000000000000001111111111 000003ff       1023
emp_max_a 00000000000000000000000110011110 0000019e        414
emp_max_b 00000000000000000000001001100001 00000261        609

That cannot have come from my function!

My mistake: It was your function but the highbit() that I used was not
correct. The one I used was returning the *value* of the highest bit.
Since the return value from my function includes maxA and
maxB, at least all bits that are set in either of those should be
set in the output.

I've run my code with those input and the result is 3ff as
expected... (See attached source file).

Perhaps my test code is wrong; but I can't find my error. :/

Your function reports 0b_111 for this set of values:

min_a = 5;
max_a = 6;
min_b = 4;
max_b = 4;

But the maximum value of a|b is 6.

optimal result for over 99% of the cases but give a slightly higher
value for the rest. I'll post a new version that's 100% precise in a
few minutes.

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 12 2010
Ali Çehreli <acehreli yahoo.com> writes:
```Jérôme M. Berger wrote:

Your function reports 0b_111 for this set of values:

min_a = 5;
max_a = 6;
min_b = 4;
max_b = 4;

But the maximum value of a|b is 6.

optimal result for over 99% of the cases but give a slightly higher
value for the rest.

Now I see my problem: I took Andrei's puzzle literally. The implication
of _compiler_implementation_ was too subtle for me.

That's the reason why none of my "correctly inaccurate" solutions has
been posted to this thread. :)

Ali
```
Apr 12 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Jérôme M. Berger wrote:
OK, I found a solution that is optimal in over 99% of the cases
(99.195% to be precise), while still being fast (because it avoids
looping over the bits):

I timed both functions on all min/max pairs in 0..299 without
checking, just for the timing. On my computer (Athlon X2 64 2GHz), my
function took 50s and Andrei's took 266s. The difference should be
even bigger for larger numbers.

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

[Attachment: maxor.cc]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N   300
// Comment the following line to get Andrei's implementation
#define MINE
// Comment the following line for exhaustive checking (slow! don't
// forget to reduce N)
#define BENCH

int highbit (uint32_t n)
{
    assert (n != 0);

    int  result = 0;
    if (n & 0xFFFF0000) {
        result  += 16;
        n      >>= 16;
    }
    if (n & 0xFF00) {
        result  += 8;
        n      >>= 8;
    }
    if (n & 0xF0) {
        result  += 4;
        n      >>= 4;
    }
    if (n & 0xC) {
        result  += 2;
        n      >>= 2;
    }
    if (n & 0x2) {
        result  += 1;
        n      >>= 1;
    }
    return result;
}

inline uint32_t min (uint32_t a, uint32_t b)
{
    return (a < b) ? a : b;
}

#ifdef MINE
uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
    assert (minA <= maxA);
    assert (minB <= maxB);

    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;

    uint32_t a = maxA ^ minA;
    if (a != 0)
        a = ((1 << highbit (a)) - 1);

    uint32_t b = maxB ^ minB;
    if (b != 0)
        b = ((1 << highbit (b)) - 1);

    uint32_t t = maxA & maxB;
    if (t != 0)
        t = ((1 << highbit (t)) - 1);

    return min (maxA|maxB|a|b, maxA|maxB|t);
}
#else                           // Andrei's
uint32_t maxOr (uint32_t, uint32_t, uint32_t maxa, uint32_t maxb)
{
    uint32_t candidate = 0;
    /* The loop header was lost in the archive; reconstructed here as a
       scan over the bits from most to least significant, greedily
       growing a candidate (kept <= maxb) from the bits maxa leaves
       clear. */
    for (uint32_t mask = 1u << 31; mask != 0; mask >>= 1) {
        if (maxa & mask) continue;
        uint32_t t = candidate | mask;
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}
#endif

int main (int, char**)
{
    int  count      = 0;
    int  countTight = 0;
    for (uint32_t minA = 0; minA < N; ++minA)
    for (uint32_t maxA = minA; maxA < N; ++maxA)
    for (uint32_t minB = 0; minB < N; ++minB)
    for (uint32_t maxB = minB; maxB < N; ++maxB)
    {
        bool reached = false;
        uint32_t max = maxOr (minA, minB, maxA, maxB);
#ifdef BENCH
        count += max;
#else  // Check
        for (uint32_t a = minA; a <= maxA; ++a)
        for (uint32_t b = minB; b <= maxB; ++b) {
            if ((a|b) > max) {
                printf ("minA=%x\n"
                        "maxA=%x\n"
                        "minB=%x\n"
                        "maxB=%x\n"
                        "a=%x\n"
                        "b=%x\n"
                        "a|b=%x\n"
                        "maxOr=%x\n",
                        minA, maxA, minB, maxB,
                        a, b, a|b, max);
                exit (1);
            }
            if ((a|b) == max) reached = true;
        }
        if (reached)
            ++countTight;
        ++count;
#endif
    }
#ifdef BENCH
    printf ("Result: %d (this number doesn't mean anything, it is only here\n"
            "to make sure the compiler doesn't optimize everything away...\n",
            count);
#else
    printf ("Total: %d\n", count);
    printf ("Tight: %d (%g%%)\n",
            countTight, countTight * 100. / count);
    printf ("Loose:  %d (%g%%)\n",
            (count - countTight),
            (count - countTight) * 100. / count);
#endif
    return 0;
}

```
Apr 11 2010
Steven Schveighoffer <schveiguy yahoo.com> writes:
```Jérôme M. Berger Wrote:

Jérôme M. Berger wrote:
OK, I found a solution that is optimal in over 99% of the cases
(99.195% to be precise), while still being fast (because it avoids
looping over the bits):

0..299 without checking, just for the timing. On my computer (Athlon
X2 64  2GHz), my function took 50s and Andrei's took 266s. The
difference should be even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution (unless you are
just doing this for fun)?

We are talking about range propagation, a function of the compiler, not a
function of the compiled program.  Therefore, slower but more exact functions
should be preferred, since they only affect the compile time.  Note that you
will only need to do this range propagation on lines that "or" two values
together, and something that reduces the runtime of the compiler by 216
seconds, but only when compiling enough code to have over 8 billion 'or'
operations in it (300^4), I don't think is really that important.  Give me the
exact solution that prevents me from having to cast when the compiler can prove
it, even if the runtime of the compiler is slightly slower.

Just my $.02

-Steve
```
Apr 11 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/11/2010 09:18 PM, Adam D. Ruppe wrote:
On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
We are talking about range propagation, a function of the compiler, not a
function of the compiled program.  Therefore, slower but more exact functions
should be preferred, since they only affect the compile time.

I agree with this. While my solution has a certain beauty to it and is
fast, your solution accurately covers all the bases, whereas mine fails with
min > 1. I say we go with yours.

Agreed. Steve's solution is the best we have for unsigned numbers.

Andrei
```
Apr 12 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Andrei Alexandrescu wrote:
On 04/11/2010 09:18 PM, Adam D. Ruppe wrote:
On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
We are talking about range propagation, a function of the compiler,
not a function of the compiled program.  Therefore, slower but more
exact functions should be preferred, since they only affect the
compile time.

I agree with this. While my solution has a certain beauty to it and is
fast, your solution accurately covers all the bases, whereas mine
fails with min > 1. I say we go with yours.

Agreed. Steve's solution is the best we have for unsigned numbers.

Andrei

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 12 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 04/12/2010 12:57 PM, "Jérôme M. Berger" wrote:
Andrei Alexandrescu wrote:
On 04/11/2010 09:18 PM, Adam D. Ruppe wrote:
On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
We are talking about range propagation, a function of the compiler,
not a function of the compiled program.  Therefore, slower but more
exact functions should be preferred, since they only affect the
compile time.

I agree with this. While my solution has a certain beauty to it and is
fast, your solution accurately covers all the bases, whereas mine
fails with
min > 1. I say we go with yours.

Agreed. Steve's solution is the best we have for unsigned numbers.

Andrei

Jerome

Looks great, thanks Jerome!

Now, how about that case in which some or all of the ranges involved
include negative values? A simple approach is to define them in terms of
ranges for unsigned numbers. Here are the cases I identified:

a) If both ranges cross zero, then:

minOR == setmsb(min(
minOR(nomsb(min_a), nomsb(-1), nomsb(min_b), nomsb(max_b)),
minOR(nomsb(min_a), nomsb(max_a), nomsb(min_b), nomsb(-1))))
maxOR == maxOR(0, max_a, 0, max_b)

b) If both ranges are negative, then:

minOR == setmsb(minOR(nomsb(max_a), nomsb(min_a), nomsb(max_b),
nomsb(min_b)))
maxOR == setmsb(maxOR(nomsb(max_a), nomsb(min_a), nomsb(max_b),
nomsb(min_b)))

c) If a crosses zero and b doesn't:

minOR == setmsb(minOR(nomsb(min_a), nomsb(-1), min_b, max_b))
maxOR == maxOR(0, max_a, min_b, max_b)

The primitive nomsb clears off the sign bit in a number, and setmsb sets it.

Makes sense?

Andrei
```
Apr 16 2010
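A brute-force oracle is handy for checking a case analysis like Andrei's against small signed ranges. This checker is an editor's illustration, not code from the thread; it simply enumerates the ranges, which is only feasible when they are small:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Ground-truth maximum of (a | b) over two signed ranges, by enumeration.
int32_t bruteMaxOr(int32_t minA, int32_t maxA, int32_t minB, int32_t maxB)
{
    int32_t best = minA | minB;
    for (int32_t a = minA; a <= maxA; ++a)
        for (int32_t b = minB; b <= maxB; ++b)
            best = std::max(best, a | b);
    return best;
}

// Ground-truth minimum of (a | b), same idea.
int32_t bruteMinOr(int32_t minA, int32_t maxA, int32_t minB, int32_t maxB)
{
    int32_t best = minA | minB;
    for (int32_t a = minA; a <= maxA; ++a)
        for (int32_t b = minB; b <= maxB; ++b)
            best = std::min(best, a | b);
    return best;
}
```

For example, with both ranges crossing zero, [-2, 1] | [-1, 2], the maximum (3) comes from two non-negative operands and the minimum (-2) from a negative one, matching the shape of case (a) above.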
Don <nospam nospam.com> writes:
```Andrei Alexandrescu wrote:
On 04/12/2010 12:57 PM, "Jérôme M. Berger" wrote:
Andrei Alexandrescu wrote:
On 04/11/2010 09:18 PM, Adam D. Ruppe wrote:
On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
We are talking about range propagation, a function of the compiler,
not a function of the compiled program.  Therefore, slower but more
exact functions should be preferred, since they only affect the
compile time.

I agree with this. While my solution has a certain beauty to it and is
fast, your solution accurately covers all the bases, whereas mine
fails with
min > 1. I say we go with yours.

Agreed. Steve's solution is the best we have for unsigned numbers.

Andrei

Jerome

Looks great, thanks Jerome!

Now, how about that case in which some or all of the ranges involved
include negative values? A simple approach is to define them in terms of
ranges for unsigned numbers. Here are the cases I identified:

a) If both ranges cross zero, then:

minOR == setmsb(min(
minOR(nomsb(min_a), nomsb(-1), nomsb(min_b), nomsb(max_b)),
minOR(nomsb(min_a), nomsb(max_a), nomsb(min_b), nomsb(-1))))
maxOR == maxOR(0, max_a, 0, max_b)

b) If both ranges are negative, then:

minOR == setmsb(minOR(nomsb(max_a), nomsb(min_a), nomsb(max_b),
nomsb(min_b)))
maxOR == setmsb(maxOR(nomsb(max_a), nomsb(min_a), nomsb(max_b),
nomsb(min_b)))

c) If a crosses zero and b doesn't:

minOR == setmsb(minOR(nomsb(min_a), nomsb(-1), min_b, max_b))
maxOR == maxOR(0, max_a, min_b, max_b)

The primitive nomsb clears off the sign bit in a number, and setmsb sets
it.

Makes sense?

Andrei

A comment: The way DMD currently does range propagation (storing range
as union{long, ulong}) is wrong. Currently when mixed signed/unsigned
operations happen, it gives very pessimistic results.
The sign bit should be stored separately.

sign value
1    1xxxxx    Negative, long.min..-1
1    0xxxxx    (Never occurs)
0    0xxxxx    0..long.max
0    1xxxxx    long.max+1..ulong.max

I'm not sure if that makes this case a little more, or a little less
complicated. But this defines the problem a bit better.
```
Apr 20 2010
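Don's sign/value table can be read as a three-way classification. The sketch below is a hypothetical illustration of that encoding (names invented here, not DMD's actual code):

```cpp
#include <cstdint>

// Don's proposal: store the sign bit separately from the 64-bit value.
// A (sign, value) pair then falls into one of three valid states; the
// fourth combination (sign=1 with a clear top bit) never occurs.
enum RangeClass {
    NegativeLong,    // sign=1, value=1xxxxx: long.min .. -1
    NonNegativeLong, // sign=0, value=0xxxxx: 0 .. long.max
    AboveLongMax     // sign=0, value=1xxxxx: long.max+1 .. ulong.max
};

RangeClass classify(bool sign, uint64_t value)
{
    if (sign)
        return NegativeLong;
    return (value >> 63) ? AboveLongMax : NonNegativeLong;
}
```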
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Steven Schveighoffer wrote:
Jérôme M. Berger Wrote:

Jérôme M. Berger wrote:
OK, I found a solution that is optimal in over 99% of the cases
(99.195% to be precise), while still being fast (because it avoids
looping over the bits):

0..299 without checking, just for the timing. On my computer (Athlon
X2 64  2GHz), my function took 50s and Andrei's took 266s. The
difference should be even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution (unless you
are just doing this for fun)?

We are talking about range propagation, a function of the compiler, not
a function of the compiled program.  Therefore, slower but more exact
functions should be preferred, since they only affect the compile time.
Note that you will only need to do this range propagation on lines that
"or" two values together, and something that reduces the runtime of the
compiler by 216 seconds, but only when compiling enough code to have
over 8 billion 'or' operations in it (300^4), I don't think is really
that important.  Give me the exact solution that prevents me from having
to cast when the compiler can prove it, even if the runtime of the
compiler is slightly slower.

speed of the D compiler to that of the Go compiler...

We are talking about range propagation, a function of the compiler,
not a function of the compiled program. Since we can't get a 100%
accurate representation of the possible values anyway (even yours
might leave holes in the middle after all), then it might be better
to choose a faster, slightly less precise algorithm if the
difference is not too great. That's the difference between a
compiler and a full-fledged static code analysis and program prover.

Anyway, the point is moot, I have a new version of my algorithm
with 100% precision and high speed.

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 12 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Steven Schveighoffer wrote:
On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger <jeberger free.fr>
wrote:

We are talking about range propagation, a function of the compiler,
not a function of the compiled program. Since we can't get a 100%
accurate representation of the possible values anyway (even yours
might leave holes in the middle after all), then it might be better
to choose a faster, slightly less precise algorithm if the
difference is not too great. That's the difference between a
compiler and a full-fledged static code analysis and program prover.

When we're talking about the difference between O(1) and O(lgn), I'll
take accuracy over speed in my compiler any day.  The solution should be
100% accurate for the problem statement.  If the problem statement is
not what we need, then we need a new problem statement :)  Solving the
problem statement for 99% of values is not good enough.

a minimal loss of accuracy, which will you take? Especially if the
accuracy loss is less than is lost elsewhere (due to holes in the
ranges).

Anyway, the point is moot, I have a new version of my algorithm
with 100% precision and high speed.

Yes, I'm still trying to understand how it works :)

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 12 2010
Steven Schveighoffer <schveiguy yahoo.com> writes:
```Jérôme M. Berger Wrote:

Steven Schveighoffer wrote:
When we're talking about the difference between O(1) and O(lgn), I'll
take accuracy over speed in my compiler any day.

a minimal loss of accuracy, which will you take? Especially if the
accuracy loss is less than is lost elsewhere (due to holes in the
ranges).

Really?  You rebuilt the compiler with your range propagation algorithm and
verified that it adds 10 seconds versus an accurate one that adds 55s?  How
much time did the compiler spend to compile?  I'd hazard to guess that a code
base that adds 10s worth of your algorithm takes at least a few hours to
compile.  Is 55s that bad at that point?

Again, if it takes the compiler an extra insignificant amount of time to be
more accurate, I'm all for accuracy over speed when you get down to that level
of insignificance.  I'd say the point of pain has to be at least 10% of the
compile time before it makes any sizable difference.

-Steve
```
Apr 12 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Steven Schveighoffer wrote:
Jérôme M. Berger Wrote:

Steven Schveighoffer wrote:
When we're talking about the difference between O(1) and O(lgn), I'll
take accuracy over speed in my compiler any day.

a minimal loss of accuracy, which will you take? Especially if the
accuracy loss is less than is lost elsewhere (due to holes in the
ranges).

Really?  You rebuilt the compiler with your range propagation algorithm
and verified that it adds 10 seconds versus an accurate one that adds
55s?  How much time did the compiler spend to compile?  I'd hazard to
guess that a code base that adds 10s worth of your algorithm takes at
least a few hours to compile.  Is 55s that bad at that point?

Again, if it takes the compiler an extra insignificant amount of time to
be more accurate, I'm all for accuracy over speed when you get down to
that level of insignificance.  I'd say the point of pain has to be at
least 10% of the compile time before it makes any sizable difference.

times slower just because it brings extra precision which you may
not really need, then you will wind up with a compiler that is 5 to
6 times slower than it needs to be. Sure the difference on *one*
function is not great in absolute terms, but if you make the same
choice for *all* functions, then where do you go?

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 13 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Steven Schveighoffer wrote:
In my opinion?  Yes, slower compilers that make code easier to write are
better.  I don't spend lots of time compiling, I spend it writing code.
And I don't need to babysit the compiler, it goes off and does its thing.

Performance is only important in the end result.  I'm not saying I want
my compiler to be slow, but I want it to be accurate and useful more
than I want it to be quick.  In this case, there is a quick and fully
accurate solution, so it doesn't matter.  But, for instance, if the
compiler could do a full analysis to check if variables escape their
scope, and that makes the compiler 5x slower, then I'd rather have the
compiler verify my work instead of quickly producing memory-corrupting
code.

of this kind of tools, but I don't want to have to put up with their
slowness every time I compile something. Having a quick
write-compile-test cycle is very important IMO. Then when I have a
rough outline of the program, I can run another tool (or adding a
command line option) to enable extra checking (and then it doesn't
matter if the tool takes the whole night to run and I get the
results the next morning when I get to work).

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 13 2010
bearophile <bearophileHUGS lycos.com> writes:
```Fawzi Mohamed:
I guess that I am missing something obvious, so I don't see the reason
for range propagation, but maybe I am not the only one, so thanks for
an explanation...

Range propagation can improve the situation a little, so it's not bad. But it
can't replace proper integral overflow checks. You need integral overflow
checks at runtime.

Bye,
bearophile
```
Apr 13 2010
Don <nospam nospam.com> writes:
```Fawzi Mohamed wrote:

On 12-apr-10, at 21:40, Steven Schveighoffer wrote:

On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger
<jeberger free.fr> wrote:

Steven Schveighoffer wrote:
Jérôme M. Berger Wrote:

Jérôme M. Berger wrote:
OK, I found a solution that is optimal in over 99% of the cases
(99.195% to be precise), while still being fast (because it avoids
looping over the bits):

pairs in
0..299 without checking, just for the timing. On my computer (Athlon
X2 64  2GHz), my function took 50s and Andrei's took 266s. The
difference should be even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution (unless
you are just doing this for fun)?

We are talking about range propagation, a function of the compiler,
not a function of the compiled program.  Therefore, slower but more
exact functions should be preferred, since they only affect the
compile time.  Note that you will only need to do this range
propagation on lines that "or" two values together, and something
that reduces the runtime of the compiler by 216 seconds, but only
when compiling enough code to have over 8 billion 'or' operations in
it (300^4), I don't think is really that important.  Give me the
exact solution that prevents me from having to cast when the
compiler can prove it, even if the runtime of the compiler is
slightly slower.

speed of the D compiler to that of the Go compiler...

My point was simply that the amount of time saved is relative to the
size of the program being compiled.  If we assume conservatively that
every other line in a program has bitwise or in it, then you are
talking about a 16 billion line program, meaning the 216 seconds you
save is insignificant compared to the total compile time.  My point
was that your fast solution that is inaccurate is not preferable
because nobody notices the speed, they just notice the inaccuracy.

We are talking about range propagation, a function of the compiler,
not a function of the compiled program. Since we can't get a 100%
accurate representation of the possible values anyway (even yours
might leave holes in the middle after all), then it might be better
to choose a faster, slightly less precise algorithm if the
difference is not too great. That's the difference between a
compiler and a full-fledged static code analysis and program prover.

When we're talking about the difference between O(1) and O(lgn), I'll
take accuracy over speed in my compiler any day.  The solution should
be 100% accurate for the problem statement.  If the problem statement
is not what we need, then we need a new problem statement :)  Solving
the problem statement for 99% of values is not good enough.

Anyway, the point is moot, I have a new version of my algorithm
with 100% precision and high speed.

Yes, I'm still trying to understand how it works :)

Sorry for the probably stupid question, but I don't understand the need
for all this range propagation: in the compiler either you have a
constant (and then you have the value at compile time, and you don't
need any range propagation, you can compare with the value), or you have
a runtime value.

It's been part of DMD2 for a while now. It allows you to do things like:

ubyte lowbits(int x)
{
return x & 7;
}

without an explicit cast. The compiler knows that x&7 can safely fit
inside a single byte.  Whereas   ((x&7) << 12) | (x&3);
does not fit, and requires an explicit cast.
```
Apr 13 2010
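The narrowing Don describes follows from simple interval rules. As a sketch (an illustration of the idea, not DMD's implementation), the range of `x & m` for a constant non-negative mask `m` is [0, m] regardless of `x`, which is exactly what lets `x & 7` convert to ubyte while `((x&7) << 12) | (x&3)` does not:

```cpp
#include <cstdint>

// Toy compiler-side range tracking (hypothetical, not DMD code):
// each expression carries an unsigned [min, max] interval.
struct Range { uint32_t min, max; };

// x & m with m a known constant: every result bit is a subset of m,
// so the result lies in [0, m] whatever x is.
Range andConst(uint32_t m) { return Range{0, m}; }

// Left shift scales both bounds: (x & 7) << 12 lies in [0, 7 << 12].
Range shl(Range r, int s) { return Range{r.min << s, r.max << s}; }

// The implicit conversion to ubyte is allowed iff the whole interval fits.
bool fitsInUbyte(Range r) { return r.max <= 0xFF; }
```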
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Fawzi Mohamed wrote:

On 13-apr-10, at 12:02, Don wrote:
It's been part of DMD2 for a while now. It allows you to do things like:

ubyte lowbits(int x)
{
return x & 7;
}

without an explicit cast. The compiler knows that x&7 can safely fit
inside a single byte.  Whereas   ((x&7) << 12) | (x&3);
does not fit, and requires an explicit cast.

ah ok I understand, I thought that was treated like x & cast(ubyte)7,
and so as comparison of the compile time value with the ranges of ubyte
(no range propagation needed).
But I can understand why it is treated as cast(ubyte)((cast(int)x) & 7)
as it is probably easier for the compiler, as it upcasts by default.

Thanks for the explanation.

don't need the cast to "int" in your second example, but you still
need range propagation in the first...

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 13 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Here's a fast 100% precise solution:

==============================8<------------------------------
uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
    assert (minA <= maxA);
    assert (minB <= maxB);

    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;

    uint32_t a = maxA ^ minA;
    if (a != 0) {
        a = ((1 << (highbit (a)+1)) - 1) & maxA & maxB;
        if (a != 0)
            a = (1 << highbit (a)) - 1;
    }

    uint32_t b = maxB ^ minB;
    if (b != 0) {
        b = ((1 << (highbit (b)+1)) - 1) & maxA & maxB;
        if (b != 0)
            b = (1 << highbit (b)) - 1;
    }

    return maxA|maxB|a|b;
}
------------------------------>8==============================

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 12 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Steven Schveighoffer wrote:
Fails for test case:

minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result
is 6).

combinations in the [0, 63] range, so if there are mistakes they
have to be outside that range...

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 12 2010
Ali Çehreli <acehreli yahoo.com> writes:
```Jérôme M. Berger wrote:
Steven Schveighoffer wrote:
Fails for test case:

minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).

combinations in the [0, 63] range, so if there are mistakes they
have to be outside that range...

Jerome

I confirm. Jerome's code passes my randomized test.

Ali
```
Apr 12 2010
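Ali's randomized test can be reproduced along these lines. This harness is a reconstruction, not his attached file: it pairs Jérôme's final maxOr (lightly adapted: a plain-loop highbit instead of the binary-search one, and the asserts dropped) with a brute-force oracle over small random ranges:

```cpp
#include <cstdint>
#include <cstdlib>

// 0-based index of the highest set bit; returns 0 for n == 0, which the
// guarded calls below never hit anyway.
static int highbit(uint32_t n) { int r = 0; while (n >>= 1) ++r; return r; }

// Jérôme's final maxOr, same argument order as in his post.
static uint32_t maxOr(uint32_t minA, uint32_t minB,
                      uint32_t maxA, uint32_t maxB)
{
    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;
    uint32_t a = maxA ^ minA;
    if (a != 0) {
        a = ((1u << (highbit(a) + 1)) - 1) & maxA & maxB;
        if (a != 0) a = (1u << highbit(a)) - 1;
    }
    uint32_t b = maxB ^ minB;
    if (b != 0) {
        b = ((1u << (highbit(b) + 1)) - 1) & maxA & maxB;
        if (b != 0) b = (1u << highbit(b)) - 1;
    }
    return maxA | maxB | a | b;
}

// Exhaustive oracle, usable only for small ranges.
static uint32_t bruteMaxOr(uint32_t minA, uint32_t minB,
                           uint32_t maxA, uint32_t maxB)
{
    uint32_t best = 0;
    for (uint32_t a = minA; a <= maxA; ++a)
        for (uint32_t b = minB; b <= maxB; ++b)
            if ((a | b) > best) best = a | b;
    return best;
}

// Random ranges below `limit`; returns true if no mismatch was found.
bool randomizedCheck(int trials, uint32_t limit)
{
    for (int i = 0; i < trials; ++i) {
        uint32_t minA = rand() % limit, maxA = minA + rand() % (limit - minA);
        uint32_t minB = rand() % limit, maxB = minB + rand() % (limit - minB);
        if (maxOr(minA, minB, maxA, maxB) != bruteMaxOr(minA, minB, maxA, maxB))
            return false;
    }
    return true;
}
```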
Steven Schveighoffer <schveiguy yahoo.com> writes:
```Jérôme M. Berger Wrote:

Steven Schveighoffer wrote:
Fails for test case:

minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).

combinations in the [0, 63] range, so if there are mistakes they
have to be outside that range...

I'll have to double check, I thought I copied your code verbatim (I did switch
around the arguments to be minA, maxA, minB, maxB to fit my test harness, and I
changed the uint32_t to uint).  I'll check tomorrow at work where the computer
I used to test is.

-Steve
```
Apr 12 2010
Steven Schveighoffer <schveiguy yahoo.com> writes:
```Steven Schveighoffer Wrote:

I'll have to double check, I thought I copied your code verbatim (I did switch
around the arguments to be minA, maxA, minB, maxB to fit my test harness, and I
changed the uint32_t to uint).  I'll check tomorrow at work where the computer
I used to test is.

I think I see the issue, I was using an older version of your highbit, at some
point you changed it, but didn't repeat it in this solution, so I searched for
it on the newsgroup and found your first version.  That differs from later
versions you posted, but I can't find where you identified there was a bug.

I'll test again tomorrow, but it looks like that probably accounts for the
problem.

The version I was using:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108788

The later version: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108855

It looks like in that second post that Ali had the same problem I did.

Next time, you should include your whole code each time, especially parts you
changed.

-Steve
```
Apr 12 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```

Steven Schveighoffer wrote:
Steven Schveighoffer Wrote:

I'll have to double check, I thought I copied your code verbatim (I did
switch around the arguments to be minA, maxA, minB, maxB to fit my test
harness, and I changed the uint32_t to uint).  I'll check tomorrow at
work where the computer I used to test is.

I think I see the issue, I was using an older version of your highbit,
at some point you changed it, but didn't repeat it in this solution, so
I searched for it on the newsgroup and found your first version.  That
differs from later versions you posted, but I can't find where you
identified there was a bug.

I'll test again tomorrow, but it looks like that probably accounts for
the problem.

The version I was using:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108788

The later version:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108855

It looks like in that second post that Ali had the same problem I did.

Next time, you should include your whole code each time, especially
parts you changed.

for the C<->D translation). Sorry about that.

Note that there is no bug: both versions are correct. The first one
uses 1-based indexes, while the second uses 0-based indexes and
processes 0 differently.

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 13 2010
Don <nospam nospam.com> writes:
```Steven Schveighoffer wrote:
On Mon, 12 Apr 2010 22:37:00 -0400, Steven Schveighoffer
<schveiguy yahoo.com> wrote:

Steven Schveighoffer wrote:
Fails for test case:

minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result
is 6).

combinations in the [0, 63] range, so if there are mistakes they
have to be outside that range...

I'll have to double check, I thought I copied your code verbatim (I
did switch around the arguments to be minA, maxA, minB, maxB to fit my
test harness, and I changed the uint_32_t to uint).  I'll check
tomorrow at work where the computer I used to test is.

I confirm, with the updated highbit, your solution works.

(using your highbit function, which is very neat BTW) I came up with a
one-liner.  Enjoy :)

uint maxor(uint mina, uint maxa, uint minb, uint maxb)
{
return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^
maxa), highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
}

Anyone want to do the equivalent minor?  I've spent way too much time on
this :)

-Steve

Awesome stuff, guys.
I think that's worthy of inclusion as an exercise in Knuth's Volume 4,
fascile 1. Suggest it to someone at ACCU?
(Knuth has highbit(x) but calls it lambda(x). He notes the result that
highbit(x)==highbit(y) iff x^y <= x&y).
```
Apr 13 2010
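Steven's one-liner compiles directly once highbit is pinned down. Below is an editor's sketch, assuming a 0-based highbit with highbit(0) == 0 (so the zero case contributes nothing) and a 64-bit intermediate to dodge an undefined 32-bit shift by 32 when a top bit differs. The minor counterpart Steven asks for was never posted in this part of the thread; the version here is adapted from the well-known Hacker's Delight minOR algorithm, not from anything in the discussion:

```cpp
#include <algorithm>
#include <cstdint>

// 0-based index of the highest set bit; 0 for n == 0 by convention.
static int highbit(uint32_t n) { int r = 0; while (n >>= 1) ++r; return r; }

// Steven's one-liner, expanded for readability. The uint64_t keeps the
// "+ 1" shift well-defined even when bit 31 differs between min and max.
uint32_t maxor(uint32_t mina, uint32_t maxa, uint32_t minb, uint32_t maxb)
{
    uint64_t mask = (uint64_t(1) << (std::max(highbit(mina ^ maxa),
                                              highbit(minb ^ maxb)) + 1)) - 1;
    return maxa | maxb | ((1u << highbit(uint32_t(mask) & maxa & maxb)) - 1);
}

// "minor": scan from the top bit; where one minimum has a 1 and the other
// a 0, try rounding the 0 side up to that bit with the low bits cleared.
uint32_t minor(uint32_t mina, uint32_t maxa, uint32_t minb, uint32_t maxb)
{
    for (uint32_t m = 1u << 31; m != 0; m >>= 1) {
        if (~mina & minb & m) {
            uint32_t t = (mina | m) & ~(m - 1);
            if (t <= maxa) { mina = t; break; }
        } else if (mina & ~minb & m) {
            uint32_t t = (minb | m) & ~(m - 1);
            if (t <= maxb) { minb = t; break; }
        }
    }
    return mina | minb;
}
```

On the thread's own test values, maxor(200, 783, 69, 609) reproduces the empirical 1023, and maxor(4, 4, 4, 6) gives the exact 6.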
bearophile <bearophileHUGS lycos.com> writes:
```Steven Schveighoffer:
I never would have thought of doing a binary search for the high bit, it
is definitely a novel idea to me :)

You can learn something from this page (it can also be useful for the
translation to C++):
http://graphics.stanford.edu/~seander/bithacks.html

Bye,
bearophile
```
Apr 13 2010
Clemens <eriatarka84 gmail.com> writes:
```Adam Ruppe Wrote:

Jerome's highbit function is the same as std.intrinsic.bsr. I wonder
which is faster?

I ran a test, and for 100 million iterations (1..10000000), the
intrinsic beat out his function by a mere 300 milliseconds on my box
- highbit ran in an average of 1897 ms and bsr did the same in an
average of 1534.

Recompiling with -inline -O -release cuts the raw numbers about in
half, so loop overhead accounts for a fair amount of the percentage
instead of actual implementation. The new averages are 1134 and 853.

That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd
sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I
guess is the reason it's an intrinsic in the first place). I would have
expected this to be much, much faster than a user function. Anyone care enough
to check the generated assembly?

[1] http://www.itis.mn.it/linux/quarta/x86/bsr.htm

-- Clemens
```
Apr 13 2010
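For comparison, GCC and Clang expose the same machinery through a builtin; this is an editor's sketch (the timings above are the thread's, not reproduced here), and it is compiler-specific rather than portable C++:

```cpp
#include <cstdint>

// Equivalent of the thread's highbit() (0-based index of the highest set
// bit) on GCC/Clang. __builtin_clz counts leading zeros and, like the BSR
// opcode, is undefined for an input of 0.
int highbit_builtin(uint32_t n)
{
    return 31 - __builtin_clz(n);
}
```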
Don <nospam nospam.com> writes:
```Adam D. Ruppe wrote:
On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd
sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I
guess is the reason it's an intrinsic in the first place). I would have
expected this to be much, much faster than a user function. Anyone care enough
to check the generated assembly?

The opcode is fairly slow anyway (as far as opcodes go) - odds are the
implementation inside the processor is similar to Jerome's method, and
the main savings come from it loading fewer bytes into the pipeline.

I remember a line from a blog, IIRC it was the author of the C++ FQA
writing it, saying hardware and software are pretty much the same thing -
moving an instruction to hardware doesn't mean it will be any faster,
since it is the same algorithm, just done in processor microcode instead of
user opcodes.

Most likely the difference comes from inlining max().
```
Apr 13 2010
Don <nospam nospam.com> writes:
```Don wrote:
On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
That's strange. Looking at src/backend/cod4.c, function cdbscan, in
the dmd sources, bsr seems to be implemented in terms of the bsr
opcode [1] (which I guess is the reason it's an intrinsic in the
first place). I would have expected this to be much, much faster than
a user function. Anyone care enough to check the generated assembly?

The opcode is fairly slow anyway (as far as opcodes go) - odds are the
implementation inside the processor is similar to Jerome's method, and
the main savings come from it loading fewer bytes into the pipeline.

I remember a line from a blog, IIRC it was the author of the C++ FQA
writing it, saying hardware and software are pretty much the same thing -
moving an instruction to hardware doesn't mean it will be any faster,
since it is the same algorithm, just done in processor microcode instead of
user opcodes.

Most likely the difference comes from inlining max().

Specifically, bsr is 7 uops on AMD, 1 uop on Intel since the original
Pentium. AMD's performance is shameful.

And bsr() is supported in the compiler; in fact DMC uses it extensively,
which is why it's included in DMD!
```
Apr 13 2010
"Jérôme M. Berger" <jeberger free.fr> writes:
```Steven Schveighoffer wrote:
using your highbit function, which is very neat BTW

This is a classic problem which comes up regularly in programming
discussions and newsgroups...

uint maxor(uint mina, uint maxa, uint minb, uint maxb)
{
    return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^
        maxa), highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
}

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
```
Apr 13 2010
Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
```is it just me, or does this fail for

minA = 0
minB = 0
maxA = 0x80_00_00_00
maxB = 0x80_00_00_00

I'm pretty sure you need to handle 1 << 32 specially
```
Dec 04 2010
Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
```On 12/04/2010 01:21 PM, Ellery Newcomer wrote:
is it just me, or does this fail for

minA = 0
minB = 0
maxA = 0x80_00_00_00
maxB = 0x80_00_00_00

I'm pretty sure you need to handle 1<<  32 specially

i'm thinking this is better:

uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
    assert (minA <= maxA);
    assert (minB <= maxB);

    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;

    uint32_t a = maxA ^ minA;
    if (a != 0) {
        a |= (1 << highbit (a)) - 1;
        a = a & maxA & maxB;
        if (a != 0)
            a = (1 << highbit (a)) - 1;
    }

    uint32_t b = maxB ^ minB;
    if (b != 0) {
        b |= (1 << highbit (b)) - 1;
        b = b & maxA & maxB;
        if (b != 0)
            b = (1 << highbit (b)) - 1;
    }

    return maxA|maxB|a|b;
}
```
Dec 04 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Tue, Apr 13, 2010 at 10:52:14AM -0400, Steven Schveighoffer wrote:
Just a note, this code should be runnable in C++, because the compiler is
written in C++.  Is bsr available there?

I just assumed since it was in D that it was probably
in DMC too, but I can't find it in the docs, so maybe not.
Well, there's always inline asm :D

But 300 milliseconds over 100 million iterations is
negligible anyway.

--
http://arsdnet.net
```
Apr 13 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd
sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I
guess is the reason it's an intrinsic in the first place). I would have
expected this to be much, much faster than a user function. Anyone care enough
to check the generated assembly?

The opcode is fairly slow anyway (as far as opcodes go) - odds are the
implementation inside the processor is similar to Jerome's method, and
the main savings come from it loading fewer bytes into the pipeline.

I remember a line from a blog, IIRC it was the author of the C++ FQA
writing it, saying hardware and software are pretty much the same thing -
moving an instruction to hardware doesn't mean it will be any faster,
since it is the same algorithm, just done in processor microcode instead of
user opcodes.

--
http://arsdnet.net
```
Apr 13 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```Jerome's highbit function is the same as std.intrinsic.bsr. I wonder
which is faster?

I ran a test, and for 100 million iterations (1..10000000), the
intrinsic beat out his function by a mere 300 milliseconds on my box.
- highbit ran in an average of 1897 ms and bsr did the same in an
average of 1534.

Recompiling with -inline -O -release cuts the raw numbers about in
half, so call overhead accounts for a fair amount of the percentage
difference rather than the actual implementation. The new averages are 1134 and 853.

I've gotta say, your implementation of the function is better than I
would have done though. Without bsr, I probably would have looped,
shifted, and tested each individual bit...

On 4/13/10, Steven Schveighoffer <schveiguy yahoo.com> wrote:
On Mon, 12 Apr 2010 22:37:00 -0400, Steven Schveighoffer
<schveiguy yahoo.com> wrote:

Jérôme M. Berger Wrote:

Steven Schveighoffer wrote:
Fails for test case:

minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).

I tested all combinations in the [0, 63] range, so if there are mistakes they
have to be outside that range...

I'll have to double check, I thought I copied your code verbatim (I did
switch around the arguments to be minA, maxA, minB, maxB to fit my test
harness, and I changed the uint_32_t to uint).  I'll check tomorrow at
work where the computer I used to test is.

I confirm, with the updated highbit, your solution works.

Also, inspired by your solution (and using your highbit function, which is
very neat BTW) I came up with a one-liner.  Enjoy :)

uint maxor(uint mina, uint maxa, uint minb, uint maxb)
{
    return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^ maxa),
        highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
}

Anyone want to do the equivalent minor?  I've spent way too much time on
this :)

-Steve

```
Apr 13 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
```On Tue, 13 Apr 2010 09:56:34 -0400, Adam Ruppe <destructionator gmail.com>
wrote:

Jerome's highbit function is the same as std.intrinsic.bsr. I wonder
which is faster?

Just a note, this code should be runnable in C++, because the compiler is
written in C++.  Is bsr available there?

Recompiling with -inline -O -release cuts the raw numbers about in
half, so call overhead accounts for a fair amount of the percentage
difference rather than the actual implementation. The new averages are 1134 and 853.

Probably the inlining accounts for the savings here.  Calling a function
is rather expensive.

I've gotta say, your implementation of the function is better than I
would have done though. Without bsr, I probably would have looped,
shifted, and tested each individual bit...

I never would have thought of doing a binary search for the high bit, it
is definitely a novel idea to me :)

-Steve
```
Apr 13 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 04:05:06PM -0500, Andrei Alexandrescu wrote:
The minimum for unsigned numbers is very simple: min(mina, minb).

mina = 1
minb = 0

min(1, 0) == 0

But, 1|0 == 1

max(mina, minb) looks to work though. Consider that ORing can only set bits -
it can never take a one and spit out a zero. Therefore, x|y >= x && x|y >= y,
which means the minimum of a|b can't be below max(mina, minb).

--
http://arsdnet.net
```
Apr 10 2010
PBa <spam goto-hell.com> writes:
```On 10.04.2010 23:17, Don <nospam nospam.com> wrote:

Rainer Deyke wrote:
On 4/10/2010 11:52, Andrei Alexandrescu wrote:
I think this would work:

uint maxOR(uint maxa, uint maxb) {
    if (maxa < maxb) return maxOR(maxb, maxa);
    uint candidate = 0;
    foreach (i, bit; byBitDescending(maxa)) {
        if (bit) continue;
        auto t = candidate | (1 << (31 - i));
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}

all the bits in 'maxb' (excluding the leading zeros).  If 'maxb' is
exactly 1 less than a power of 2, then 'candidate' will be 0.  Now
consider a in [0, 16], b in [0, 15].  Your function will produce 16, but
the real maximum is 31.
For maximum accuracy, you have to consider the minima as well as the
maxima when calculating the new maximum.  With 'a' in [16, 16] and 'b'
in [16, 16], the new range can only be [16, 16].  With 'a' in [0, 16]
and 'b' in [0, 16], the correct new range is [0, 31].

It's a very interesting puzzle. There are some pretty complex cases.

Consider (all numbers binary) a in [1010, 1100] and b in [101,101]. The
maximum value is when a=1011, b=101, so max is 1111.

import std.stdio, std.conv, std.intrinsic;

void main(string[] args)
{
    int total = 0, match = 0, from = to!(int)(args[1]), to = to!(int)(args[2]);
    uint res, resBF;
    for(int i = from; i < to; ++i)
    {
        for(int j = i; j < to; ++j)
        {
            for(int k = from; k < to; ++k)
            {
                for(int m = k; m < to; ++m)
                {
                    ++total;
                    res = maxOr(i, j, k, m);
                    resBF = maxOrBF(i, j, k, m);
                    if(res == resBF) ++match;
                    //else writefln("mismatch: minA %s, maxA %s, minB %s, maxB %s,
                    //    maxOr %s, maxOrBF %s", i, j, k, m, res, resBF);
                }
            }
        }
    }
    writefln("total %s, match %s", total, match);
}

uint max(uint a, uint b)
{
    return (a > b) ? a : b;
}

uint maxOr(uint minA, uint maxA, uint minB, uint maxB)
{
    uint result = minA | maxA | minB | maxB,
         diff = max(maxA - minA, maxB - minB);

    if(diff)
    {
        result |= (1 << (bsr(diff) + 1)) - 1;
    }
    return result;
}

/* compute result using brute force */
uint maxOrBF(uint minA, uint maxA, uint minB, uint maxB)
{
    return bitsSetInIntervallBF(minA, maxA) | bitsSetInIntervallBF(minB, maxB);
}

uint bitsSetInIntervallBF(uint min, uint max)
{
    uint result = 0;

    for(; min <= max; ++min)
    {
        result |= min;
    }
    return result;
}
```
Apr 09 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
We are talking about range propagation, a function of the compiler, not a
function of the compiled program.  Therefore, slower but more exact functions
should be preferred, since they only affect the compile time.

I agree with this. While my solution has a certain beauty to it and is
fast, your solution accurately covers all the bases, whereas mine fails with
min > 1. I say we go with your's.

--
http://arsdnet.net
```
Apr 11 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
```On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger <jeberger free.fr>
wrote:

Steven Schveighoffer wrote:
Jérôme M. Berger Wrote:

Jérôme M. Berger wrote:
OK, I found a solution that is optimal in over 99% of the cases
(99.195% to be precise), while still being fast (because it avoids
looping over the bits):

I ran both functions on all combinations of the bounds in
0..299 without checking, just for the timing. On my computer (Athlon
X2 64  2GHz), my function took 50s and Andrei's took 266s. The
difference should be even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution (unless you
are just doing this for fun)?

We are talking about range propagation, a function of the compiler, not
a function of the compiled program.  Therefore, slower but more exact
functions should be preferred, since they only affect the compile
time.  Note that you will only need to do this range propagation on
lines that "or" two values together, and something that reduces the
runtime of the compiler by 216 seconds, but only when compiling enough
code to have over 8 billion 'or' operations in it (300^4), I don't
think is really that important.  Give me the exact solution that
prevents me from having to cast when the compiler can prove it, even if
the runtime of the compiler is slightly slower.

speed of the D compiler to that of the Go compiler...

My point was simply that the amount of time saved is relative to the size
of the program being compiled.  If we assume conservatively that every
other line in a program has bitwise or in it, then you are talking about a
16 billion line program, meaning the 216 seconds you save is insignificant
compared to the total compile time.  My point was that your fast solution
that is inaccurate is not preferable because nobody notices the speed,
they just notice the inaccuracy.

We are talking about range propagation, a function of the compiler,
not a function of the compiled program. Since we can't get a 100%
accurate representation of the possible values anyway (even yours
might leave holes in the middle after all), then it might be better
to choose a faster, slightly less precise algorithm if the
difference is not too great. That's the difference between a
compiler and a full-fledged static code analysis and program prover.

When we're talking about the difference between O(1) and O(lgn), I'll take
accuracy over speed in my compiler any day.  The solution should be 100%
accurate for the problem statement.  If the problem statement is not what
we need, then we need a new problem statement :)  Solving the problem
statement for 99% of values is not good enough.

Anyway, the point is moot, I have a new version of my algorithm
with 100% precision and high speed.

Yes, I'm still trying to understand how it works :)

-Steve
```
Apr 12 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
```On Mon, 12 Apr 2010 13:56:23 -0400, Jérôme M. Berger <jeberger free.fr>
wrote:

Here's a fast 100% precise solution:

==============================8<------------------------------
uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
    assert (minA <= maxA);
    assert (minB <= maxB);

    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;

    uint32_t a = maxA ^ minA;
    if (a != 0) {
        a = ((1 << (highbit (a)+1)) - 1) & maxA & maxB;
        if (a != 0)
            a = (1 << highbit (a)) - 1;
    }

    uint32_t b = maxB ^ minB;
    if (b != 0) {
        b = ((1 << (highbit (b)+1)) - 1) & maxA & maxB;
        if (b != 0)
            b = (1 << highbit (b)) - 1;
    }

    return maxA|maxB|a|b;
}
------------------------------>8==============================

Fails for test case:

minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).

I'm not quite sure what you are trying to do, but I think you are on the
right path to getting a faster solution.

-Steve
```
Apr 12 2010
Fawzi Mohamed <fawzi gmx.ch> writes:
```On 12-apr-10, at 21:40, Steven Schveighoffer wrote:

On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger
<jeberger free.fr> wrote:

Steven Schveighoffer wrote:
Jérôme M. Berger Wrote:

Jérôme M. Berger wrote:
OK, I found a solution that is optimal in over 99% of the cases
(99.195% to be precise), while still being fast (because it avoids
looping over the bits):

I ran both functions on all combinations of the bounds in
0..299 without checking, just for the timing. On my computer (Athlon
X2 64  2GHz), my function took 50s and Andrei's took 266s. The
difference should be even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution
(unless you are just doing this for fun)?

We are talking about range propagation, a function of the
compiler, not a function of the compiled program.  Therefore,
slower but more exact functions should be preferred, since they
only affect the compile time.  Note that you will only need to do
this range propagation on lines that "or" two values together, and
something that reduces the runtime of the compiler by 216 seconds,
but only when compiling enough code to have over 8 billion 'or'
operations in it (300^4), I don't think is really that important.
Give me the exact solution that prevents me from having to cast
when the compiler can prove it, even if the runtime of the
compiler is slightly slower.

speed of the D compiler to that of the Go compiler...

My point was simply that the amount of time saved is relative to the
size of the program being compiled.  If we assume conservatively
that every other line in a program has bitwise or in it, then you
are talking about a 16 billion line program, meaning the 216 seconds
you save is insignificant compared to the total compile time.  My
point was that your fast solution that is inaccurate is not
preferable because nobody notices the speed, they just notice the
inaccuracy.

We are talking about range propagation, a function of the compiler,
not a function of the compiled program. Since we can't get a 100%
accurate representation of the possible values anyway (even yours
might leave holes in the middle after all), then it might be better
to choose a faster, slightly less precise algorithm if the
difference is not too great. That's the difference between a
compiler and a full-fledged static code analysis and program prover.

When we're talking about the difference between O(1) and O(lgn),
I'll take accuracy over speed in my compiler any day.  The solution
should be 100% accurate for the problem statement.  If the problem
statement is not what we need, then we need a new problem
statement :)  Solving the problem statement for 99% of values is not
good enough.

Anyway, the point is moot, I have a new version of my algorithm
with 100% precision and high speed.

Yes, I'm still trying to understand how it works :)

Sorry for the probably stupid question, but I don't understand much
the need of all this range propagation: in the compiler either you
have a constant (and then you have the value at compile time, and
you don't need any range propagation, you can compare with the
value), or you have a runtime value.
Do you want to explicitly add in the compiler the support for more
limited runtime values?
Otherwise the range of a runtime value is a priori the whole possible
range, and thus any rule based on range propagation might be
expressed as a static type based rule (as done in C).
You can gain something, for example you can know that summing 4
shorts you will never overflow an int; is this where you want to go?
What kind of bugs are you trying to avoid? Or is it simply having the
max and min properties defined?

Nobody commented on my proposal, but there I see the potential for
bugs: (1u-2)+1UL is 4294967296, and this happens also at runtime if
you have, for example, a function returning size_t and another
returning uint, and you combine them without thinking much and then
you switch to 64 bits...

There I see a problem, but this you see without any special range
propagation, just thinking that subtraction or negation of unsigned
types is modulo 2^bit size, and thus cannot be then changed to
another size without explicit cast.
Maybe in some cases using enums range propagation might spare a
cast, but is it really an improvement?
I guess that I am missing something obvious, so I don't see the
reason for range propagation, but maybe I am not the only one, so
thanks for an explanation...

Fawzi
```
Apr 13 2010
Fawzi Mohamed <fawzi gmx.ch> writes:
```
On 13-apr-10, at 12:01, bearophile wrote:

Fawzi Mohamed:
I guess that I am missing something obvious, so I don't see the
reason
for range propagation, but maybe I am not the only one, so thanks for
an explanation...

Range propagation can improve the situation a little, so it's not
bad. But it can't replace proper integral overflows. You need
integral overflows at runtime.

Integral overflows are helpful only if you have automatic conversion to
a larger type, but that breaks the compile time knowledge of the size
of such an integer, so that you have to assume that it might need to
be pushed to the heap.
Yes you might use some tricks (like using size_t.sizeof*8-1 bits, or a
pointer to spare some place), but I don't think that D wants to go
that way for the basic integers...

Bye,
bearophile

```
Apr 13 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
```On Mon, 12 Apr 2010 22:37:00 -0400, Steven Schveighoffer
<schveiguy yahoo.com> wrote:

Steven Schveighoffer wrote:
Fails for test case:

minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).

I tested all combinations in the [0, 63] range, so if there are mistakes they
have to be outside that range...

I'll have to double check, I thought I copied your code verbatim (I did
switch around the arguments to be minA, maxA, minB, maxB to fit my test
harness, and I changed the uint_32_t to uint).  I'll check tomorrow at
work where the computer I used to test is.

I confirm, with the updated highbit, your solution works.

Also, inspired by your solution (and using your highbit function, which is
very neat BTW) I came up with a one-liner.  Enjoy :)

uint maxor(uint mina, uint maxa, uint minb, uint maxb)
{
return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^ maxa),
highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
}

Anyone want to do the equivalent minor?  I've spent way too much time on
this :)

-Steve
```
Apr 13 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
```On Tue, 13 Apr 2010 13:37:13 -0400, Jérôme M. Berger <jeberger free.fr>
wrote:

Steven Schveighoffer wrote:

Steven Schveighoffer wrote:
When we're talking about the difference between O(1) and O(lgn), I'll
take accuracy over speed in my compiler any day.

And if a much faster algorithm comes with only
a minimal loss of accuracy, which will you take? Especially if the
accuracy loss is less than is lost elsewhere (due to holes in the
ranges).

Really?  You rebuilt the compiler with your range propagation algorithm
and verified that it adds 10 seconds versus an accurate one that adds
55s?  How much time did the compiler spend to compile?  I'd hazard to
guess that a code base that adds 10s worth of your algorithm takes at
least a few hours to compile.  Is 55s that bad at that point?

Again, if it takes the compiler an extra insignificant amount of time
to be more accurate, I'm all for accuracy over speed when you get down
to that level of insignificance.  I'd say the point of pain has to be
at least 10% of the compile time before it makes any sizable difference.

If for each function you choose an algorithm that is 5 to 6
times slower just because it brings extra precision which you may
not really need, then you will wind up with a compiler that is 5 to
6 times slower than it needs to be. Sure the difference on *one*
function is not great in absolute terms, but if you make the same
choice for *all* functions, then where do you go?

In my opinion?  Yes, slower compilers that make code easier to write are
better.  I don't spend lots of time compiling, I spend it writing code.
And I don't need to babysit the compiler, it goes off and does its thing.

Performance is only important in the end result.  I'm not saying I want my
compiler to be slow, but I want it to be accurate and useful more than I
want it to be quick.  In this case, there is a quick and fully accurate
solution, so it doesn't matter.  But, for instance, if the compiler could
do a full analysis to check if variables escape their scope, and that
makes the compiler 5x slower, then I'd rather have the compiler verify my
work instead of quickly producing memory-corrupting code.

-Steve
```
Apr 13 2010
bearophile <bearophileHUGS lycos.com> writes:
```Andrei Alexandrescu:
byte c = a | b;
Say you already know min_a, max_a, min_b, and max_b. How do you compute
min_c and max_c? I thought of it for a bit and it seems quite tricky.

This is Interval arithmetic. If you are able to find a book about it... (or a
library for some programming language that implements it).

Bye,
bearophile
```
Apr 10 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 12:39:22PM -0500, Andrei Alexandrescu wrote:
Yah, that works for unsigned types. For signed types, it's tricky (as
tricky as max).

My feeling is to go ahead and cast to unsigned, then do the calculation. For
me at least, when doing bitwise ops, I assume it is unsigned anyway.

Even on the branch that doesn't use the sum, that's still just a bound.
Consider:

a = 8;
b = 4;

Then max(a|b) is 12, but computed with your method is 15.

Yeah, you're right. Bah. Maybe:

max_a = max(a, b) | (2 ^^ numbits(min(a, b)) - 1);

8 | 7 == 15;

Same deal, still wrong. Maybe we can throw in one more thing:

let f(a, b) = the above;
max_c = min(f(a, b), a | b);

Let me test it... it seems to work. Here's the D program I used to brute-force
the test:

===
import std.intrinsic; // for bsr()

uint max(uint a, uint b) { return (a >= b) ? a : b; }
uint min(uint a, uint b) { return (a >= b) ? b : a; }

uint numbits(uint a) { return a == 0 ? a : bsr(a) + 1; }

unittest {
    assert(numbits(0) == 0);
    assert(numbits(1) == 1);
    assert(numbits(2) == 2);
    assert(numbits(3) == 2);
    assert(numbits(4) == 3);
    assert(numbits(5) == 3);
}

uint f(uint a, uint b) {
    return max(a, b) | (1 << numbits(min(a, b)) - 1);
}

uint max_a(uint a, uint b) {
    return min(f(a, b), a | b);
}

void main() {
    for(uint a = 0; a <= 256; a++)
        for(uint b = 0; b <= 256; b++)
            assert(a|b <= max_a(a, b));
}
=======

I compiled with the unittests, and it closed without tripping any asserts.
Eyeballing the output looks right too.

--
http://arsdnet.net
```
Apr 10 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 02:57:59PM -0400, Adam D. Ruppe wrote:
Let me test it... it seems to work. Here's the D program I used to brute-force
the test:

The main() function there isn't an ideal test. Here's a better one:

void main() {
    for(uint max_a = 0; max_a < 100; max_a++)
    for(uint max_b = 0; max_b < 100; max_b++)
    for(uint a = 0; a <= max_a; a++)
    for(uint b = 0; b <= max_b; b++)
        assert(a|b <= max_c(max_a, max_b));
}

// note that I renamed uint max_a to max_c.

We try a whole load of max_a and max_b and test every possible value. It
still runs without throwing.

My original function for min_c works too:

uint min_c(uint a, uint b) {
    return max(a, b);
}

And you can brute force test it all:

void main() {
    for(uint min_a = 0; min_a < 50; min_a++)
    for(uint min_b = 0; min_b < 50; min_b++)
    for(uint max_a = min_a; max_a < 100; max_a++)
    for(uint max_b = min_b; max_b < 100; max_b++)
    for(uint a = min_a; a <= max_a; a++)
    for(uint b = min_b; b <= max_b; b++) {
        assert(a|b <= max_c(max_a, max_b));
        assert(a|b >= min_c(min_a, min_b));
    }
}

This, as you can expect, takes a while to run, but does so without throwing. I'm
pretty confident in the functions now.

--
http://arsdnet.net
```
Apr 10 2010
BCS <none anon.com> writes:
```Hello Andrei,

Consider:

byte c = a | b;

Say you already know min_a, max_a, min_b, and max_b. How do you
compute min_c and max_c? I thought of it for a bit and it seems quite
tricky.

x_bit_max == "the largest bit in x_max ^ x_min"
x_oth_max == x_bit_max - 1;

c_max = max(a_bit_max | b_oth_max, b_bit_max | a_oth_max)
        | (a_max & ~(a_bit_max | a_oth_max));

--
... <IXOYE><
```
Apr 10 2010
BCS <none anon.com> writes:
```Hello BCS,

Hello Andrei,

Consider:

byte c = a | b;

Say you already know min_a, max_a, min_b, and max_b. How do you
compute min_c and max_c? I thought of it for a bit and it seems quite
tricky.

x_oth_max == x_bit_max - 1;

// or make that

c_min = a_max & ~(a_bit_max | a_oth_max);
c_max = max(a_bit_max | b_oth_max, b_bit_max | a_oth_max) | c_min;

c_max = max(a_bit_max | b_oth_max, b_bit_max | a_oth_max)
        | (a_max & ~(a_bit_max | a_oth_max));

--
... <IXOYE><
```
Apr 10 2010
BCS <none anon.com> writes:
```Hello BCS,

Scratch all that, I'm clueless. :b

--
... <IXOYE><
```
Apr 10 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 02:55:36PM -0400, bearophile wrote:
Time ago I have even suggested to disallow bitwise ops when one or both
operands are signed...

I understand the position, but I don't advocate it just because I think it would
be annoying and not give much of a benefit.

Consider the following:

for(int a = -10; a < 10; a++)
writeln(a&1 ? "odd" : "even");

I'd just be annoyed if I had to cast a to uint just to do that. Note that it
works on negative numbers too, so it isn't strictly wrong to do it on signed ints.

I use this when doing stripes on outputted tables and stuff like that.

a % 2 does the same thing, so this specific case isn't a big deal, but I still
tend to type &1 rather than %2 out of habit (this kind of thing used to make a
real speed difference back in the DOS days when I learned it all!)

--
http://arsdnet.net
```
Apr 10 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 02:57:59PM -0400, Adam D. Ruppe wrote:
Let me test it... it seems to work. Here's the D program I used to brute-force
the test:

Oh no! Eyeballing again showed a problem.. I should have put parens in my
asserts:

a | b < f(a,b)   !=  (a|b) < f(a,b)

That breaks it. Back to the drawing board.

--
http://arsdnet.net
```
Apr 10 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 04:02:05PM -0400, Adam D. Ruppe wrote:
That breaks it. Back to the drawing board.

I *might* have it, but I'm not 100% confident in my test program. Here's my
implementation:

====
import std.intrinsic;

uint max(uint a, uint b) { return (a >= b) ? a : b; }
uint min(uint a, uint b) { return (a >= b) ? b : a; }
uint numbits(uint a) { return a == 0 ? a : (bsr(a) + 1); }

uint max_c(uint a, uint b) {
    uint mn = min(a, b);
    uint mx = max(a, b);
    if(mn == 0)
        return mx;
    // if there is a bit we can reuse, it grants us everything before it too
    uint reusable = mn & mx;
    if(reusable)
        return (mx | ((1 << numbits(reusable)) - 1)) | mn;
    else
        return mx | mn;
}

uint min_c(uint min_a, uint min_b) {
    return max(min_a, min_b);
}
=====

I can't believe I spent all day on this...

--
http://arsdnet.net
```
Apr 10 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 07:38:07PM -0400, Adam D. Ruppe wrote:
I *might* have it, but I'm not 100% confident in my test program.

I think my test program is telling me something: a non-zero min_c subtracts from
max_c. If I take my brute-force empirically calculated max_c and compare it to
what my max_c function returns, it doesn't always match if the minimums change.

This might be a bug; it is unintuitive that it would do this, but it actually
makes sense. Consider if min == max, you only have one value to work with.

let max_a = 4, min_a = 0;

max_b = 4

You can get 7 out of it, since 3 < 4, so it is a possible number and: 100 | 011
== 7

But, if min_b == 4, the only possible value for b is 4. And: 100 | 100 == 4.

Increasing min_b does indeed decrease the max you can get.

Nice fact, but I've spent too much time on this already, so I'll call it done with this:

My max_c included some unnecessary code: my reusable mask obsoletes all of
my special case code! So, we can bring the program down to three lines:

import std.intrinsic;
uint numbits(uint a) { return a == 0 ? a : (bsr(a) + 1); }
uint max_c(uint a, uint b) { return a | ((1 << numbits(a&b)) - 1) | b; }

It passes my test, though since I bugged it up the first time, I might have done
it again.

--
http://arsdnet.net
```
Apr 10 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 08:42:54PM -0500, Andrei Alexandrescu wrote:
What results does it yield with my main() test harness?

total=100000000; equal=14585400 (14.5854%)

I think that's a perfect result. Here's why: the equality comparison checks
two specific values, (a, b), but the maxOr functions return the max for the
interval ([0, a], [0, b]). It wouldn't always be equal, since there are other
values in there that can reach the max.

For example, a = 4, b = 4. 100|100 == 100, but that range also includes 4|3, and
100 | 011 == 111, which is the max.

Your maxOR function gives the same percentage, for probably the same reason.
Though, mine runs ~7x faster on my box.

--
http://arsdnet.net
```
Apr 10 2010
"Adam D. Ruppe" <destructionator gmail.com> writes:
```On Sat, Apr 10, 2010 at 09:21:07PM -0500, Andrei Alexandrescu wrote:
Yah, it's clear that we're not looking anywhere near 100%. I'm a bit sad
we don't have an adequate baseline, but the fact that two distinct
methods obtained the same result is encouraging.

My brute-force program was buggy, but I think I fixed it. Here's the main():

========
uint min_a = 0;
uint min_b = 0;
for(uint max_a = min_a; max_a <= 256; max_a++)
for(uint max_b = min_b; max_b <= 256; max_b++) {
    uint real_max = 0;
    for(uint a = min_a; a <= max_a; a++)
    for(uint b = min_b; b <= max_b; b++) {
        assert((a|b) <= max_c(max_a, max_b),
            format("%b | %b !<= %b (from %b %b)",
                a, b, max_c(max_a, max_b), max_a, max_b));
        assert((a|b) >= min_c(min_a, min_b));
        if((a | b) > real_max)
            real_max = (a|b);
    }
    assert(max_c(max_a, max_b) == real_max,
        format("[%b, %b]. expected: %d, got %d (min == %d)",
            max_a, max_b, max_c(max_a, max_b), real_max,
            min_c(min_a, min_b)));
}

writeln("success");
========

With so many nested loops, it is slow as dirt, but what it does is try every
value in the given range and record the maximum number encountered.

If it doesn't match at any step, it throws. But, it works for me. I spent
a chunk of the day eyeballing the output too to ensure it is right (and to
figure out the pattern that led to the final function).

Now, the thing is if you change min_a or min_b, it breaks the real_max; like
I said in the other message, if min == max, for example, you get the case
where 4|4 isn't the max, since 4|3 isn't available.

I'm not sure how to fix that, but I am pretty convinced that to get the
perfect solution, the max_c function needs to know min_a and min_b too.
Still, this solution works very well in the min_a = min_b = 0 case, so
it is at least a decent bound.

On to the next task: compute minOR and maxOR for _signed_ values. It
turns out minOR is also nontrivial...

Nontrivial is an understatement. I started wondering if it can even be done
without assuming a size... I do think you can, by assuming the sizes
are equal, whatever they are, but it still is not easy.

I still like the idea of just casting it. How often would it cause problems
in real code anyway?

The only starting point I have is:
If either max_a or max_b is negative, the result of the OR is going
to be negative, since that sign bit is one, and an OR can never change a one to
a zero.

So, the max value will be positive iff both values are possibly positive.

But taking that one statement to working code is too hard for my blood.

When we're done, we can pass the functions to Walter so he can integrate
them within his compiler. Right now the compiler uses a wrong algorithm.

Indeed. We're getting there, but still a ways to go.

--
http://arsdnet.net
```
Apr 10 2010
Fawzi Mohamed <fawzi gmx.ch> writes:
```On 13-apr-10, at 12:02, Don wrote:

Fawzi Mohamed wrote:
On 12-apr-10, at 21:40, Steven Schveighoffer wrote:
On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger
fr> wrote:

Steven Schveighoffer wrote:
Jérôme M. Berger Wrote:

Jérôme M. Berger wrote:
OK, I found a solution that is optimal in over 99% of the cases
(99.195% to be precise), while still being fast (because it avoids
looping over the bits):

I timed both functions on all pairs in 0..299 without checking the
results, just for the timing. On my computer (Athlon X2 64 2GHz), my
function took 50s and Andrei's took 266s. The difference should be
even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution (unless
you are just doing this for fun)?

We are talking about range propagation, a function of the compiler,
not a function of the compiled program. Therefore, slower but more
exact functions should be preferred, since they only affect the
compile time. Note that you will only need to do this range
propagation on lines that "or" two values together, and something
that reduces the runtime of the compiler by 216 seconds, but only
when compiling enough code to have over 8 billion 'or' operations in
it (300^4), I don't think is really that important. Give me the exact
solution that prevents me from having to cast when the compiler can
prove it, even if the runtime of the compiler is slightly slower.

speed of the D compiler to that of the Go compiler...

My point was simply that the amount of time saved is relative to the
size of the program being compiled. If we assume conservatively that
every other line in a program has a bitwise or in it, then you are
talking about a 16 billion line program, meaning the 216 seconds you
save is insignificant compared to the total compile time. My point
was that your fast solution that is inaccurate is not preferable
because nobody notices the speed, they just notice the inaccuracy.

We are talking about range propagation, a function of the compiler,
not a function of the compiled program. Since we can't get a 100%
accurate representation of the possible values anyway (even yours
might leave holes in the middle after all), then it might be better
to choose a faster, slightly less precise algorithm if the
difference is not too great. That's the difference between a
compiler and a full-fledged static code analysis and program prover.

When we're talking about the difference between O(1) and O(lg n),
I'll take accuracy over speed in my compiler any day. The solution
should be 100% accurate for the problem statement. If the problem
statement is not what we need, then we need a new problem
statement :) Solving the problem statement for 99% of values is not
good enough.

Anyway, the point is moot, I have a new version of my algorithm
with 100% precision and high speed.

Yes, I'm still trying to understand how it works :)

I don't see the need of all this range propagation: in the compiler
either you have a constant (and then you have the value at compile
time, and you don't need any range propagation, you can compare with
the value), or you have a runtime value.

It's been part of DMD2 for a while now. It allows you to do things
like:

ubyte lowbits(int x)
{
    return x & 7;
}

without an explicit cast. The compiler knows that x&7 can safely fit
inside a single byte. Whereas ((x&7) << 12) | (x&3); does not fit,
and requires an explicit cast.

ah ok, I understand. I thought that was treated like x & cast(ubyte)7,
and so as a comparison of the compile-time value with the ranges of
ubyte (no range propagation needed).
But I can understand why it is treated as cast(ubyte)((cast(int)x) & 7),
as it is probably easier for the compiler, as it upcasts by default.

Thanks for the explanation.
```
Apr 13 2010