
digitalmars.D - value range propagation for logical OR

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Consider:

byte c = a | b;

Say you already know min_a, max_a, min_b, and max_b. How do you compute 
min_c and max_c? I thought of it for a bit and it seems quite tricky.


Thanks,

Andrei
Apr 10 2010
next sibling parent reply Justin Spahr-Summers <Justin.SpahrSummers gmail.com> writes:
On Sat, 10 Apr 2010 12:01:45 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> wrote:
 
 Consider:
 
 byte c = a | b;
 
 Say you already know min_a, max_a, min_b, and max_b. How do you compute 
 min_c and max_c? I thought of it for a bit and it seems quite tricky.
 
 
 Thanks,
 
 Andrei

It seems like it would use the lowest minimum and the highest maximum of the two... that's what I personally would find most natural, given the semantics of bitwise OR, but maybe I'm just way off base.
Apr 10 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 12:22 PM, Justin Spahr-Summers wrote:
 On Sat, 10 Apr 2010 12:01:45 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>  wrote:
 Consider:

 byte c = a | b;

 Say you already know min_a, max_a, min_b, and max_b. How do you compute
 min_c and max_c? I thought of it for a bit and it seems quite tricky.


 Thanks,

 Andrei

It seems like it would use the lowest minimum and the highest maximum of the two... that's what I personally would find most natural, given the semantics of bitwise OR, but maybe I'm just way off base.

I thought the same, but consider: a = 4; b = 3. Then the maximum value is 7, larger than both.

Andrei
Apr 10 2010
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 12:01:45PM -0500, Andrei Alexandrescu wrote:
 byte c = a | b;
 
 Say you already know min_a, max_a, min_b, and max_b. How do you compute 
 min_c and max_c? I thought of it for a bit and it seems quite tricky.

min_c seems easy enough:

    min_c = max(min_a, min_b);

When you or two values together, the result must be >= both of them, and this does the trick there.

max_c is a bit harder. The obvious solution of:

    max_c = max(max_a, max_b);

fails because of 0b1100 | 0b0011... Looking at this example, let's try:

    max_c = max_a + max_b;

It covers that case, but is too big for 0b1111 | 0b1111. This might work:

    int numbits(number) { return the number of bits required to hold that number; }

    if ( numbits(max_a + max_b) > max(numbits(max_a), numbits(max_b)) )
        max_c = 2 ^^ (max(numbits(max_a), numbits(max_b))) - 1;
    else
        max_c = max_a + max_b;

The biggest we can possibly get is the sum of the two numbers. But, if it carries over to more bits than either option, we truncate it - OR never gives more bits than the two arguments. So we assume the max is the worst case of all ones or'd with all ones.

I think this would work, but might not be the best we can possibly do. It is the best I can think of though.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(By the way I meant _bitwise_ in the title.)

On 04/10/2010 12:38 PM, Adam D. Ruppe wrote:
 On Sat, Apr 10, 2010 at 12:01:45PM -0500, Andrei Alexandrescu wrote:
 byte c = a | b;

 Say you already know min_a, max_a, min_b, and max_b. How do you compute
 min_c and max_c? I thought of it for a bit and it seems quite tricky.

min_c seems easy enough:

     min_c = max(min_a, min_b);

 When you or two values together, the result must be >= both of them, and this does the trick there.

Yah, that works for unsigned types. For signed types, it's tricky (as tricky as max).
 max_c is a bit harder. The obvious solution of:

 max_c = max(max_a, max_b);

 Fails because of 0b1100 | 0b0011... looking at this example, let's try:

 max_c = max_a + max_b;

 It covers that case, but is too big for 0b1111 | 0b1111.

Yah, sum is a bound for max, but not the exact max. This is because bitwise OR is like a sum without carry.
 This might work:

 int numbits(number) { return the number of bits required to hold that number; }

 if ( numbits(max_a + max_b) > max(numbits(max_a), numbits(max_b)) )
      max_c = 2 ^^ (max(numbits(max_a), numbits(max_b))) - 1;
 else
      max_c = max_a + max_b;


 The biggest we can possibly get is the sum of the two numbers. But, if it
 carries over to more bits than either option, we truncate it - OR never gives
 more bits than the two arguments. So we assume the max is the worst case of
 all ones or'd with all ones.

 I think this would work, but might not be the best we can possibly do. It is
 the best I can think of though.

Even on the branch that doesn't use the sum, that's still just a bound. Consider: a = 8; b = 4. Then max(a|b) is 12, but computed with your method is 15.

Andrei
Apr 10 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Adam D. Ruppe:

 My feeling is to go ahead and cast to unsigned, then do the calculation. For
 me at least, when doing bitwise ops, I assume it is unsigned anyway.

Some time ago I even suggested disallowing bitwise ops when one or both operands are signed...

Bye,
bearophile
Apr 10 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Adam D. Ruppe:
 I understand the position, but I don't advocate it just because I think it
 would be annoying and not give much of a benefit.

I have never filed a bug report in Bugzilla about this because I think it wouldn't give enough benefit.

Thank you for your answer,
bye,
bearophile
Apr 10 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Adam D. Ruppe:

 Oh no! Eyeballing again showed a problem.. I should have put parens in my
 asserts:

 a | b < f(a,b)   !=   (a|b) < f(a,b)

Bugs are gold nuggets! Yes, the precedence of bitwise operators is low; it's an error-prone part of C/C++/D that causes frequent bugs in programs written by everybody. I add extra parentheses when I use bitwise operators. Unfortunately I don't see a simple way to remove this significant source of bugs from the D2 language.

When you switch on full warnings, GCC warns you about a few possible similar errors, suggesting you add parentheses to remove some ambiguity (for the eyes of the programmers). This is a small example in C:

#include "stdio.h"
#include "stdlib.h"
int main() {
    int a = atoi("10");
    int b = atoi("20");
    int c = atoi("30");
    printf("%u\n", a|b <= c);
    return 0;
}

If you compile it with GCC (I am using gcc 4.4.1):

...>gcc -Wall test.c -o test
test.c: In function 'main':
test.c:9: warning: suggest parentheses around comparison in operand of '|'

You always use -Wall (and other warnings) when you write C code. So in this case the C language+compiler is able to catch a bug like yours :-)

Bye,
bearophile
Apr 10 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
I have filed a "report":
http://d.puremagic.com/issues/show_bug.cgi?id=4077

Bye,
bearophile
Apr 10 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 09:55 PM, bearophile wrote:
 I have filed a "report":
 http://d.puremagic.com/issues/show_bug.cgi?id=4077

Perfect, thanks. Your contribution of bug reports is much appreciated.

Andrei
Apr 10 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 06:58 PM, Adam D. Ruppe wrote:
 On Sat, Apr 10, 2010 at 07:38:07PM -0400, Adam D. Ruppe wrote:
 I *might* have it, but I'm not 100% confident in my test program.

I think my test program is telling me something: a non-zero min_c subtracts from max_c. If I take my brute-force, empirically calculated max_c and compare it to what my max_c function returns, it doesn't always match if the minimums change.

This might be a bug; it is unintuitive that it would do this, but it actually makes sense. Consider if min == max: you only have one value to work with.

Let max_a = 4, min_a = 0, max_b = 4. You can get 7 out of it, since 3 < 4, so it is a possible number and:

    100 | 011 == 7

But, if min_b == 4, the only possible value for b is 4. And:

    100 | 100 == 4.

Increasing min_b does indeed decrease the max you can get. Nice fact, but I've spent too much time on this already, so I'll call it done with this:

My max_c included some unnecessary code: my reusable mask obsoletes all of my special case code! So, we can bring the program down to three lines:

    import std.intrinsic;

    uint numbits(uint a) { return a == 0 ? a : (bsr(a) + 1); }

    uint max_c(uint a, uint b) {
        return a | ((1 << numbits(a&b)) - 1) | b;
    }

It passes my test, though since I bugged it up the first time, I might have done it again.

What results does it yield with my main() test harness?

Andrei
Apr 10 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 08:58 PM, Adam D. Ruppe wrote:
 On Sat, Apr 10, 2010 at 08:42:54PM -0500, Andrei Alexandrescu wrote:
 What results does it yield with my main() test harness?

total=100000000; equal=14585400 (14.5854%)

I think that's a perfect result. Here's why: the equality comparison checks two specific values, (a, b), but the maxOr functions return the max for the interval ([0, a], [0, b]). It wouldn't always be equal, since there are other values in there that can reach the max.

Yah, it's clear that we're not looking anywhere near 100%. I'm a bit sad we don't have an adequate baseline, but the fact that two distinct methods obtained the same result is encouraging.
 For example, a = 4, b = 4. 100|100 == 100, but that range also include 4|3,
 100 | 011 == 111, which is the max.


 Your maxOR function gives the same percentage, for probably the same reason.
 Though, mine runs ~7x faster on my box.

Then you haven't wasted your day! On to the next task: compute minOR and maxOR for _signed_ values. It turns out minOR is also nontrivial... When we're done, we can pass the functions to Walter so he can integrate them within his compiler. Right now the compiler uses a wrong algorithm.

Andrei
Apr 10 2010
parent Ali Çehreli <acehreli yahoo.com> writes:
Adam D. Ruppe wrote:

 I am pretty convinced that to get the
 perfect solution, the max_c function needs to know min_a and min_b too.

Absolutely! :) I had just started writing an e-mail about it.

Andrei Alexandrescu wrote:
 On to the next task: compute minOR and maxOR for _signed_ values. It
 turns out minOR is also nontrivial...


How about computing minOR for unsigned values first? ;) The proposed solution of max(min_a, min_b) is correct only for a couple of corner cases. Ali
Apr 10 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 12:01 PM, Andrei Alexandrescu wrote:
 Consider:

 byte c = a | b;

 Say you already know min_a, max_a, min_b, and max_b. How do you compute
 min_c and max_c? I thought of it for a bit and it seems quite tricky.


 Thanks,

 Andrei

I think this would work:

uint maxOR(uint maxa, uint maxb) {
    if (maxa < maxb) return maxOR(maxb, maxa);
    uint candidate = 0;
    foreach (i, bit; byBitDescending(maxa)) {
        if (bit) continue;
        auto t = candidate | (1 << (31 - i));
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}

The idea is to find the candidate that would fill as many zeros in maxa as possible. But this is inefficient. I'm wondering whether a simple bitwise manipulation expression could do the trick.

Andrei
Apr 10 2010
next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
On 4/10/2010 11:52, Andrei Alexandrescu wrote:
 I think this would work:
 
 uint maxOR(uint maxa, uint maxb) {
     if (maxa < maxb) return maxOR(maxb, maxa);
     uint candidate = 0;
     foreach (i, bit; byBitDescending(maxa)) {
         if (bit) continue;
         auto t = candidate | (1 << (31 - i));
         if (t <= maxb) candidate = t;
     }
     return maxa | candidate;
 }

This looks wrong. Your function, if I understand it correctly, flips all the bits in 'maxb' (excluding the leading zeros). If 'maxb' is exactly 1 less than a power of 2, then 'candidate' will be 0. Now consider a in [0, 16], b in [0, 15]. Your function will produce 16, but the real maximum is 31.

For maximum accuracy, you have to consider the minima as well as the maxima when calculating the new maximum. With 'a' in [16, 16] and 'b' in [16, 16], the new range can only be [16, 16]. With 'a' in [0, 16] and 'b' in [0, 16], the correct new range is [0, 31].

-- 
Rainer Deyke - rainerd eldwood.com
Apr 10 2010
next sibling parent Don <nospam nospam.com> writes:
Rainer Deyke wrote:
 On 4/10/2010 11:52, Andrei Alexandrescu wrote:
 I think this would work:

 uint maxOR(uint maxa, uint maxb) {
     if (maxa < maxb) return maxOR(maxb, maxa);
     uint candidate = 0;
     foreach (i, bit; byBitDescending(maxa)) {
         if (bit) continue;
         auto t = candidate | (1 << (31 - i));
         if (t <= maxb) candidate = t;
     }
     return maxa | candidate;
 }

 This looks wrong. Your function, if I understand it correctly, flips all
 the bits in 'maxb' (excluding the leading zeros). If 'maxb' is exactly 1
 less than a power of 2, then 'candidate' will be 0. Now consider a in
 [0, 16], b in [0, 15]. Your function will produce 16, but the real
 maximum is 31.

 For maximum accuracy, you have to consider the minima as well as the
 maxima when calculating the new maximum. With 'a' in [16, 16] and 'b'
 in [16, 16], the new range can only be [16, 16]. With 'a' in [0, 16]
 and 'b' in [0, 16], the correct new range is [0, 31].

It's a very interesting puzzle. There are some pretty complex cases. Consider (all numbers binary) a in [1010, 1100] and b in [101,101]. The maximum value is when a=1011, b=101, so max is 1111.
Apr 10 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 01:55 PM, Rainer Deyke wrote:
 On 4/10/2010 11:52, Andrei Alexandrescu wrote:
 I think this would work:

 uint maxOR(uint maxa, uint maxb) {
      if (maxa<  maxb) return maxOR(maxb, maxa);
      uint candidate = 0;
      foreach (i, bit; byBitDescending(maxa)) {
          if (bit) continue;
          auto t = candidate | (1<<  (31 - i));
          if (t<= maxb) candidate = t;
      }
      return maxa | candidate;
 }

This looks wrong. Your function, if I understand it correctly, flips all the bits in 'maxb' (excluding the leading zeros).

It tries to set, from the most significant position down, the bits in which maxa is zero, and keeps each one set if the resulting number doesn't exceed maxb.
 If 'maxb' is
 exactly 1 less than a power of 2, then 'candidate' will be 0.  Now
 consider a in [0, 16], b in [0, 15].  Your function will produce 16, but
 the real maximum is 31.

If maxb is 2^^N-1, then it will fill all the zero bits of maxa.
 For maximum accuracy, you have to consider the minima as well as the
 maxima when calculating the new maximum.  With 'a' in [16, 16] and 'b'
 in [16, 16], the new range can only be [16, 16].  With 'a' in [0, 16]
 and 'b' in [0, 16], the correct new range is [0, 31].

The minimum for unsigned numbers is very simple: min(mina, minb). Andrei
Apr 10 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 04:21 PM, Adam D. Ruppe wrote:
 On Sat, Apr 10, 2010 at 04:05:06PM -0500, Andrei Alexandrescu wrote:
 The minimum for unsigned numbers is very simple: min(mina, minb).

mina = 1
minb = 0

min(1, 0) == 0

But, 1|0 == 1.

max(mina, minb) looks to work though. Consider that ORing can only set bits - it can never take a one and spit out a zero. Therefore, x|y >= x && x|y >= y, which only works with the max(mina, minb) function.

Sorry, I meant max of the two minima. Andrei
Apr 10 2010
prev sibling parent reply "Jérôme M. Berger" <jeberger free.fr> writes:

Andrei Alexandrescu wrote:
 On 04/10/2010 01:55 PM, Rainer Deyke wrote:
 On 4/10/2010 11:52, Andrei Alexandrescu wrote:
 I think this would work:

 uint maxOR(uint maxa, uint maxb) {
      if (maxa<  maxb) return maxOR(maxb, maxa);
      uint candidate =3D 0;
      foreach (i, bit; byBitDescending(maxa)) {
          if (bit) continue;
          auto t =3D candidate | (1<<  (31 - i));
          if (t<=3D maxb) candidate =3D t;
      }
      return maxa | candidate;
 }

This looks wrong. Your function, if I understand it correctly, flips all the bits in 'maxb' (excluding the leading zeros).

 It tries to set, from the most significant position down, the bits in
 which maxa is zero, and keeps each one set if the resulting number
 doesn't exceed maxb.

int highbit (uint n)
{
   auto result = 0;
   if (n & 0xFFFF0000) { result += 16; n = n >> 16; }
   if (n & 0xFF00)     { result += 8;  n = n >> 8; }
   if (n & 0xF0)       { result += 4;  n = n >> 4; }
   if (n & 0xC)        { result += 2;  n = n >> 2; }
   if (n & 0x2)        { result += 1;  n = n >> 1; }
   if (n & 0x1)        { result += 1; }
   return result;
}

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 10 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 04:50 PM, "Jérôme M. Berger" wrote:
 Andrei Alexandrescu wrote:
 On 04/10/2010 01:55 PM, Rainer Deyke wrote:
 On 4/10/2010 11:52, Andrei Alexandrescu wrote:
 I think this would work:

 uint maxOR(uint maxa, uint maxb) {
       if (maxa<   maxb) return maxOR(maxb, maxa);
       uint candidate = 0;
       foreach (i, bit; byBitDescending(maxa)) {
           if (bit) continue;
           auto t = candidate | (1<<   (31 - i));
           if (t<= maxb) candidate = t;
       }
       return maxa | candidate;
 }

This looks wrong. Your function, if I understand it correctly, flips all the bits in 'maxb' (excluding the leading zeros).

 It tries to set, from the most significant position down, the bits in
 which maxa is zero, and keeps each one set if the resulting number
 doesn't exceed maxb.

 int highbit (uint n)
 {
    auto result = 0;
    if (n & 0xFFFF0000) { result += 16; n = n >> 16; }
    if (n & 0xFF00)     { result += 8;  n = n >> 8; }
    if (n & 0xF0)       { result += 4;  n = n >> 4; }
    if (n & 0xC)        { result += 2;  n = n >> 2; }
    if (n & 0x2)        { result += 1;  n = n >> 1; }
    if (n & 0x1)        { result += 1; }
    return result;
 }

 		Jerome

Interesting. I don't get how your version is equivalent to mine, and I don't get how it actually works, but it seems to. Could you please explain?

The program below compiles, runs, and prints

total=100000000; equal=3850865 (3.85087%)

Here it is:

import std.contracts, std.stdio;

uint maxOR(uint maxa, uint maxb) {
    return maxa | ((1 << highbit(maxb)) - 1);
}

uint highbit(uint n) {
    auto result = 0;
    if (n & 0xFFFF0000) { result += 16; n = n >> 16; }
    if (n & 0xFF00)     { result += 8;  n = n >> 8; }
    if (n & 0xF0)       { result += 4;  n = n >> 4; }
    if (n & 0xC)        { result += 2;  n = n >> 2; }
    if (n & 0x2)        { result += 1;  n = n >> 1; }
    if (n & 0x1)        { result += 1; }
    return result;
}

void main() {
    ulong total, equal;
    uint N = 10000;
    foreach (a; 0 .. N) {
        foreach (b; 0 .. N) {
            auto c = maxOR(a, b);
            enforce((a|b) <= c);
            if ((a|b) == c) equal++;
            total++;
        }
    }
    writeln("total=", total, "; equal=", equal,
            " (", equal * 100. / total, "%)");
}

However, your method is not the tightest; mine seems tighter. When I replaced maxOR with this:

uint maxOR2(uint maxa, uint maxb) {
    if (maxa < maxb) return maxOR(maxb, maxa);
    uint candidate = 0;
    uint mask = 1u << 31;
    for (uint i = 0; i < 32; ++i, mask >>= 1) {
        if (maxa & mask) continue;
        auto t = candidate | (1 << (31 - i));
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}

I got:

total=100000000; equal=9822516 (9.82252%)

Besides, I verified that my method returns numbers <= yours.

Andrei
Apr 10 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 05:33 PM, Andrei Alexandrescu wrote:
[snip]

One more interesting detail. I simplified the routine to:

uint maxOR(uint maxa, uint maxb) {
     uint candidate = 0;
     uint mask = 1u << 31;
     for (; mask; mask >>= 1) {
         if (maxa & mask) continue;
         auto t = candidate | mask;
         if (t <= maxb) candidate = t;
     }
     return maxa | candidate;
}

Among other things I remove the test whether maxa >= maxb at the 
beginning of the function. I now obtain:

total=100000000; equal=14585400 (14.5854%)

so this function is better than all others so far. But I don't 
understand why it beats this one:

uint maxOR(uint maxa, uint maxb) {
     if (maxa < maxb) return maxOR(maxb, maxa); // added
     uint candidate = 0;
     uint mask = 1u << 31;
     for (; mask; mask >>= 1) {
         if (maxa & mask) continue;
         auto t = candidate | mask;
         if (t <= maxb) candidate = t;
     }
     return maxa | candidate;
}


Andrei
Apr 10 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/10/2010 05:43 PM, Andrei Alexandrescu wrote:
 On 04/10/2010 05:33 PM, Andrei Alexandrescu wrote:
 [snip]

 One more interesting detail.

Never mind that. I mistakenly used maxOR instead of maxOR2. To summarize, the best result is

total=100000000; equal=14585400 (14.5854%)

with the function:

uint maxOR(uint maxa, uint maxb) {
    uint candidate = 0;
    uint mask = 1u << 31;
    for (; mask; mask >>= 1) {
        if (maxa & mask) continue;
        auto t = candidate | mask;
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}

Inserting a test and a recursive call at the top keeps the result but actually slows down execution a bit.

Andrei
Apr 10 2010
prev sibling parent reply "Jérôme M. Berger" <jeberger free.fr> writes:

Andrei Alexandrescu wrote:
 Interesting. I don't get how your version is equivalent to mine, and I
 don't get how it actually works, but it seems to. Could you please explain?

[...] tired. Moreover it doesn't work: it should be:

   maxa | ((1 << (highbit(maxb) + 1)) - 1)

which still isn't as tight as yours but at least works :) Between 0 and 64, it is optimal for 92% of cases whereas yours is optimal for over 98% of cases.
 void main() {
     ulong total, equal;
     uint N = 10000;
     foreach (a; 0 .. N) {
         foreach (b; 0 .. N) {
             auto c = maxOR(a, b);
             enforce((a|b) <= c);
             if ((a|b) == c) equal++;
             total++;
         }
     }
     writeln("total=", total, "; equal=", equal,
             " (", equal * 100. / total, "%)");
 }

Note that it is easy to get 100% accuracy with this test: just define maxOR to be maxA|maxB :) I used the following C++ code for my tests:

int main (int, char**)
{
    int count = 0;
    int countTight = 0;

    for (uint32_t minA = 0; minA < 64; ++minA)
    for (uint32_t maxA = minA; maxA < 64; ++maxA)
    for (uint32_t minB = 0; minB < 64; ++minB)
    for (uint32_t maxB = minB; maxB < 64; ++maxB)
    {
        bool reached = false;
        uint32_t max = maxOr (minA, minB, maxA, maxB);
        for (uint32_t a = minA; a <= maxA; ++a)
        for (uint32_t b = minB; b <= maxB; ++b)
        {
            if ((a|b) > max) {
                printf ("minA=%x\n" "maxA=%x\n" "minB=%x\n" "maxB=%x\n"
                        "a=%x\n" "b=%x\n" "a|b=%x\n" "maxOr=%x\n",
                        minA, maxA, minB, maxB, a, b, a|b, max);
                exit (1);
            }
            if ((a|b) == max)
                reached = true;
        }
        if (reached)
            ++countTight;
        ++count;
    }

    printf ("Total: %d\n", count);
    printf ("Tight: %d (%g%%)\n", countTight, countTight * 100. / count);
    printf ("Loose: %d (%g%%)\n", (count - countTight),
            (count - countTight) * 100. / count);
    return 0;
}

OTOH, my solution was slightly faster (4.98s to your 5.34s).

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 11 2010
next sibling parent reply "Jérôme M. Berger" <jeberger free.fr> writes:

	OK, I found a solution that is optimal in over 99% of the cases
(99.195% to be precise), while still being fast (because it avoids
looping over the bits):

=====================8<------------------------------
uint32_t maxOr (uint32_t    minA, uint32_t minB,
                uint32_t    maxA, uint32_t maxB)
{
   if (maxA == 0) return maxB;
   if (maxB == 0) return maxA;

   uint32_t a = maxA ^ minA;
   if (a != 0)
      a = ((1 << highbit (a)) - 1);

   uint32_t b = maxB ^ minB;
   if (b != 0)
      b = ((1 << highbit (b)) - 1);

   uint32_t t = maxA & maxB;
   if (t != 0)
      t = ((1 << highbit (t)) - 1);

   return min (maxA|maxB|a|b, maxA|maxB|t);
}
------------------------------>8=====================

Mostly it's black magic ;) and it's **much** simpler than the first
version that reached those performances. The explanation is in the
rest of the message.

I will only talk about the ``a`` side of the problem here (i.e.
assume ``maxB==minB``). The ``b`` side is symmetrical.

The idea is to build a value that is between minA and maxA and will
set as many bits as possible when or'ed with maxB:

- The first thing we do is to notice that minA and maxA have the
  following structure (in binary): ``minA=A0x`` and ``maxA=A1y``
  where the ``A`` part is identical. Any number between minA and
  maxA will therefore be of the form ``a=Az``. A very conservative
  estimate tells us that if we set ``z`` to all ones, then
  ``maxB|maxA|z`` will be greater than ``maxB|a`` for all ``a``.
  This is computed by ``(1 << highbit (maxA ^ minA)) - 1``;

- Another way to look at the problem is to say that a ``0`` bit in
  maxA cannot be set unless some higher ``1`` bit is unset. For
  example if maxA is ``0b10`` then if we want to set bit 0, we need
  to unset bit 1 (which will give us ``0b01``). So by doing this we
  get a lower value for ``a|maxB`` unless this bit was already set
  in maxB. The expression ``maxA & maxB`` gives us the bits that we
  can unset. Therefore only bits that are less significant than the
  high bit of ``maxA & maxB`` may be set. This is stored into the
  variable ``t``;

- Either method works, but neither is perfect. By taking the
  minimum of the two results, we get the best of both worlds
  (although it still isn't perfect).

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 11 2010
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
Jérôme M. Berger wrote:

 The idea is to build a value that is  between minA and maxA and will
 set as many bits as possible when or'ed with maxB:

The assumption that maxB would be the value that produces the maximum a|b is not correct. A lower valued b may fill more gaps in the bit representation of what is calculated from min_a and max_a.

Your function failed for me with the following values:

           min_a 00000000000000000000000011001000 000000c8        200
           max_a 00000000000000000000001100001111 0000030f        783
           min_b 00000000000000000000000001000101 00000045         69
           max_b 00000000000000000000001001100001 00000261        609
      calculated 00000000000000000000001001100101 00000265        613
WRONG! empirical 00000000000000000000001111111111 000003ff       1023
       emp_max_a 00000000000000000000000110011110 0000019e        414
       emp_max_b 00000000000000000000001001100001 00000261        609

Please see my test code elsewhere in the same thread: :)

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108851

Ali
Apr 11 2010
parent reply "Jérôme M. Berger" <jeberger free.fr> writes:

Ali Çehreli wrote:
 Jérôme M. Berger wrote:
  The idea is to build a value that is between minA and maxA and will
  set as many bits as possible when or'ed with maxB:

 The assumption that maxB would be the value that produces the maximum
 a|b is not correct. A lower valued b may fill more gaps in the bit
 representation of what is calculated from min_a and max_a.

 Your function failed for me with the following values:

            min_a 00000000000000000000000011001000 000000c8        200
            max_a 00000000000000000000001100001111 0000030f        783
            min_b 00000000000000000000000001000101 00000045         69
            max_b 00000000000000000000001001100001 00000261        609
       calculated 00000000000000000000001001100101 00000265        613
 WRONG! empirical 00000000000000000000001111111111 000003ff       1023
        emp_max_a 00000000000000000000000110011110 0000019e        414
        emp_max_b 00000000000000000000001001100001 00000261        609

 Please see my test code elsewhere in the same thread: :)

That can't be the output of my function! Since the return value from my
function includes maxA and maxB, at least all bits that are set in
either of those should be set in the output.

	I've run my code with those input and the result is 3ff as
expected... (See attached source file).

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Attached: maxor.cc

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int highbit (uint32_t n)
{
   assert (n != 0);
   int result = 0;
   if (n & 0xFFFF0000) { result += 16; n >>= 16; }
   if (n & 0xFF00)     { result += 8;  n >>= 8; }
   if (n & 0xF0)       { result += 4;  n >>= 4; }
   if (n & 0xC)        { result += 2;  n >>= 2; }
   if (n & 0x2)        { result += 1;  n >>= 1; }
   return result;
}

inline uint32_t min (uint32_t a, uint32_t b)
{
   return (a < b) ? a : b;
}

uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
   assert (minA <= maxA);
   assert (minB <= maxB);

   if (maxA == 0) return maxB;
   if (maxB == 0) return maxA;

   uint32_t a = maxA ^ minA;
   if (a != 0)
      a = ((1 << highbit (a)) - 1);

   uint32_t b = maxB ^ minB;
   if (b != 0)
      b = ((1 << highbit (b)) - 1);

   uint32_t t = maxA & maxB;
   if (t != 0)
      t = ((1 << highbit (t)) - 1);

   return min (maxA|maxB|a|b, maxA|maxB|t);
}

int main (int, char**)
{
   uint32_t minA = 200;
   uint32_t maxA = 783;
   uint32_t minB = 69;
   uint32_t maxB = 609;

   printf ("minA=%x\n" "maxA=%x\n" "minB=%x\n" "maxB=%x\n" "maxOr=%x\n",
           minA, maxA, minB, maxB,
           maxOr (minA, minB, maxA, maxB));
   return 0;
}
Apr 11 2010
parent reply Ali Çehreli <acehreli yahoo.com> writes:
Jérôme M. Berger wrote:
 Ali Çehreli wrote:
  Jérôme M. Berger wrote:

 The idea is to build a value that is  between minA and maxA and will
 set as many bits as possible when or'ed with maxB:

  a|b is not correct. A lower valued b may fill more gaps in the bit
  representation of what is calculated from min_a and max_a.

  Your function failed for me with the following values:

             min_a 00000000000000000000000011001000 000000c8        200
             max_a 00000000000000000000001100001111 0000030f        783
             min_b 00000000000000000000000001000101 00000045         69
             max_b 00000000000000000000001001100001 00000261        609
        calculated 00000000000000000000001001100101 00000265        613
  WRONG! empirical 00000000000000000000001111111111 000003ff       1023
         emp_max_a 00000000000000000000000110011110 0000019e        414
         emp_max_b 00000000000000000000001001100001 00000261        609

  Please see my test code elsewhere in the same thread: :)

 That can't be the output of my function!

My mistake: It was your function but the highbit() that I used was not correct. The one I used was returning the *value* of the highest bit.
 Since the return value from my function includes maxA and
 maxB, at least all bits that are set in either of those should be
 set in the output.

 	I've run my code with those input and the result is 3ff as
 expected... (See attached source file).

Perhaps my test code is wrong, but I can't find my error. :/

Your function reports 0b_111 for this set of values:

min_a = 5;
max_a = 6;
min_b = 4;
max_b = 4;

But the maximum value of a|b is 6.

Ali
Apr 11 2010
parent reply =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Ali Çehreli wrote:
 Jérôme M. Berger wrote:
 Ali Çehreli wrote:
 � wrote:

 The idea is to build a value that is between minA and maxA and will
 set as many bits as possible when or'ed with maxB:

 a|b is not correct. A lower valued b may fill more gaps in the bit
 representation of what is calculated from min_a and max_a.

 Your function failed for me with the following values:

            min_a 00000000000000000000000011001000 000000c8        200
            max_a 00000000000000000000001100001111 0000030f        783
            min_b 00000000000000000000000001000101 00000045         69
            max_b 00000000000000000000001001100001 00000261        609
       calculated 00000000000000000000001001100101 00000265        613
 WRONG! empirical 00000000000000000000001111111111 000003ff       1023
        emp_max_a 00000000000000000000000110011110 0000019e        414
        emp_max_b 00000000000000000000001001100001 00000261        609

 Please see my test code elsewhere in the same thread: :)

function!

My mistake: It was your function but the highbit() that I used was not
correct. The one I used was returning the *value* of the highest bit.
 Since the return value from my function includes maxA and
 maxB, at least all bits that are set in either of those should be
 set in the output.

     I've run my code with those input and the result is 3ff as
 expected... (See attached source file).

Perhaps my test code is wrong; but I can't find my error. :/

 Your function reports 0b_111 for these set of values:

         min_a = 5;
         max_a = 6;
         min_b = 4;
         max_b = 4;

 But the maximum value of a|b is 6.

optimal result for over 99% of the cases but give a slightly higher
value for the rest. I'll post a new version that's 100% precise in a
few minutes.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 12 2010
parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
Jérôme M. Berger wrote:

 Your function reports 0b_111 for these set of values:

         min_a = 5;
         max_a = 6;
         min_b = 4;
         max_b = 4;

 But the maximum value of a|b is 6.

optimal result for over 99% of the cases but give a slightly higher value for the rest.

Now I see my problem: I took Andrei's puzzle literally. The implication of _compiler_implementation_ was too subtle for me. That's the reason why none of my "correctly inaccurate" solutions has been posted to this thread. :) Ali
Apr 12 2010
prev sibling next sibling parent reply =?ISO-8859-1?Q?=22J=E9r=F4me_M=2E_Berger=22?= <jeberger free.fr> writes:
Jérôme M. Berger wrote:
 	OK, I found a solution that is optimal in over 99% of the cases
 (99.195% to be precise), while still being fast (because it avoids
 looping over the bits):

0..299 without checking, just for the timing. On my computer (Athlon
X2 64 2GHz), my function took 50s and Andrei's took 266s. The
difference should be even bigger for larger numbers.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

[attachment: maxor.cc]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N 300

// Comment the following line to get Andrei's implementation
#define MINE

// Comment the following line for exhaustive checking (slow! don't
// forget to reduce N)
#define BENCH

int highbit (uint32_t n)
{
   assert (n != 0);
   int result = 0;
   if (n & 0xFFFF0000) { result += 16; n >>= 16; }
   if (n & 0xFF00)     { result +=  8; n >>=  8; }
   if (n & 0xF0)       { result +=  4; n >>=  4; }
   if (n & 0xC)        { result +=  2; n >>=  2; }
   if (n & 0x2)        { result +=  1; n >>=  1; }
   return result;
}

inline uint32_t min (uint32_t a, uint32_t b) { return (a<b) ? a : b; }

#ifdef MINE
uint32_t maxOr (uint32_t minA, uint32_t minB, uint32_t maxA, uint32_t maxB)
{
   assert (minA <= maxA);
   assert (minB <= maxB);

   if (maxA == 0) return maxB;
   if (maxB == 0) return maxA;

   uint32_t a = maxA ^ minA;
   if (a != 0) a = ((1 << highbit (a)) - 1);
   uint32_t b = maxB ^ minB;
   if (b != 0) b = ((1 << highbit (b)) - 1);
   uint32_t t = maxA & maxB;
   if (t != 0) t = ((1 << highbit (t)) - 1);

   return min (maxA|maxB|a|b, maxA|maxB|t);
}
#else // Andrei's
uint32_t maxOr (uint32_t, uint32_t, uint32_t maxa, uint32_t maxb)
{
   uint32_t candidate = 0;
   uint32_t mask = 1u << 31;
   for (; mask; mask >>= 1) {
      if (maxa & mask) continue;
      uint32_t t = candidate | mask;
      if (t <= maxb) candidate = t;
   }
   return maxa | candidate;
}
#endif

int main (int, char**)
{
   int count = 0;
   int countTight = 0;
   for (uint32_t minA = 0; minA < N; ++minA)
      for (uint32_t maxA = minA; maxA < N; ++maxA)
         for (uint32_t minB = 0; minB < N; ++minB)
            for (uint32_t maxB = minB; maxB < N; ++maxB) {
               bool reached = false;
               uint32_t max = maxOr (minA, minB, maxA, maxB);
#ifdef BENCH
               count += max;
#else // Check
               for (uint32_t a = minA; a <= maxA; ++a)
                  for (uint32_t b = minB; b <= maxB; ++b) {
                     if ((a|b) > max) {
                        printf ("minA=%x\nmaxA=%x\nminB=%x\nmaxB=%x\n"
                                "a=%x\nb=%x\na|b=%x\nmaxOr=%x\n",
                                minA, maxA, minB, maxB, a, b, a|b, max);
                        exit (1);
                     }
                     if ((a|b) == max) reached = true;
                  }
               if (reached) ++countTight;
               ++count;
#endif
            }
#ifdef BENCH
   printf ("Result: %d (this number doesn't mean anything, it is only here\n"
           "to make sure the compiler doesn't optimize everything away...)\n",
           count);
#else
   printf ("Total: %d\n", count);
   printf ("Tight: %d (%g%%)\n", countTight, countTight * 100. / count);
   printf ("Loose: %d (%g%%)\n", (count - countTight),
           (count - countTight) * 100. / count);
#endif
   return 0;
}
Apr 11 2010
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
Jérôme M. Berger Wrote:

 Jérôme M. Berger wrote:
 	OK, I found a solution that is optimal in over 99% of the cases
 (99.195% to be precise), while still being fast (because it avoids
 looping over the bits):
 

0..299 without checking, just for the timing. On my computer (Athlon X2 64 2GHz), my function took 50s and Andrei's took 266s. The difference should be even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution (unless you are just doing this for fun)? We are talking about range propagation, a function of the compiler, not a function of the compiled program. Therefore, slower but more exact functions should be preferred, since they only affect the compile time. Note that you will only need to do this range propagation on lines that "or" two values together, and something that reduces the runtime of the compiler by 216 seconds, but only when compiling enough code to have over 8 billion 'or' operations in it (300^4), I don't think is really that important. Give me the exact solution that prevents me from having to cast when the compiler can prove it, even if the runtime of the compiler is slightly slower. Just my $.02 -Steve
Apr 11 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/11/2010 09:18 PM, Adam D. Ruppe wrote:
 On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
 We are talking about range propagation, a function of the compiler, not a
function of the compiled program.  Therefore, slower but more exact functions
should be preferred, since they only affect the compile time.

I agree with this. While my solution has a certain beauty to it and is fast, your solution accurately covers all the bases, whereas mine fails with min> 1. I say we go with your's.

Agreed. Steve's solution is the best we have for unsigned numbers. Andrei
Apr 12 2010
parent reply =?ISO-8859-1?Q?=22J=E9r=F4me_M=2E_Berger=22?= <jeberger free.fr> writes:
Andrei Alexandrescu wrote:
 On 04/11/2010 09:18 PM, Adam D. Ruppe wrote:
 On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
 We are talking about range propagation, a function of the compiler,
 not a function of the compiled program.  Therefore, slower but more
 exact functions should be preferred, since they only affect the
 compile time.

I agree with this. While my solution has a certain beauty to it and is
fast, your solution accurately covers all the bases, whereas mine
fails with min > 1. I say we go with your's.

Agreed. Steve's solution is the best we have for unsigned numbers.

Andrei

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 12 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/12/2010 12:57 PM, "Jérôme M. Berger" wrote:
 Andrei Alexandrescu wrote:
 On 04/11/2010 09:18 PM, Adam D. Ruppe wrote:
 On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
 We are talking about range propagation, a function of the compiler,
 not a function of the compiled program.  Therefore, slower but more
 exact functions should be preferred, since they only affect the
 compile time.

I agree with this. While my solution has a certain beauty to it and is fast, your solution accurately covers all the bases, whereas mine fails with min> 1. I say we go with your's.

Agreed. Steve's solution is the best we have for unsigned numbers. Andrei

Jerome

Looks great, thanks Jerome! Now, how about that case in which some or all of the ranges involved include negative values? A simple approach is to define them in terms of ranges for unsigned numbers. Here are the cases I identified: a) If both ranges cross zero, then: minOR == setmsb(min( minOR(nomsb(min_a), nomsb(-1), nomsb(min_b), nomsb(max_b)), minOR(nomsb(min_a), nomsb(max_a), nomsb(min_b), nomsb(-1)))) maxOR == maxOR(0, max_a, 0, max_b) b) If both ranges are negative, then: minOR == setmsb(minOR(nomsb(max_a), nomsb(min_a), nomsb(max_b), nomsb(min_b))) maxOR == setmsb(maxOR(nomsb(max_a), nomsb(min_a), nomsb(max_b), nomsb(min_b))) c) If a crosses zero and b doesn't: minOR == setmsb(minOR(nomsb(min_a), nomsb(-1), min_b, max_b)) maxOR == maxOR(0, max_a, min_b, max_b) The primitive nomsb clears off the sign bit in a number, and setmsb sets it. Makes sense? Andrei
Apr 16 2010
parent Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 On 04/12/2010 12:57 PM, "Jérôme M. Berger" wrote:
 Andrei Alexandrescu wrote:
 On 04/11/2010 09:18 PM, Adam D. Ruppe wrote:
 On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
 We are talking about range propagation, a function of the compiler,
 not a function of the compiled program.  Therefore, slower but more
 exact functions should be preferred, since they only affect the
 compile time.

I agree with this. While my solution has a certain beauty to it and is fast, your solution accurately covers all the bases, whereas mine fails with min> 1. I say we go with your's.

Agreed. Steve's solution is the best we have for unsigned numbers. Andrei

Jerome

Looks great, thanks Jerome! Now, how about that case in which some or all of the ranges involved include negative values? A simple approach is to define them in terms of ranges for unsigned numbers. Here are the cases I identified: a) If both ranges cross zero, then: minOR == setmsb(min( minOR(nomsb(min_a), nomsb(-1), nomsb(min_b), nomsb(max_b)), minOR(nomsb(min_a), nomsb(max_a), nomsb(min_b), nomsb(-1)))) maxOR == maxOR(0, max_a, 0, max_b) b) If both ranges are negative, then: minOR == setmsb(minOR(nomsb(max_a), nomsb(min_a), nomsb(max_b), nomsb(min_b))) maxOR == setmsb(maxOR(nomsb(max_a), nomsb(min_a), nomsb(max_b), nomsb(min_b))) c) If a crosses zero and b doesn't: minOR == setmsb(minOR(nomsb(min_a), nomsb(-1), min_b, max_b)) maxOR == maxOR(0, max_a, min_b, max_b) The primitive nomsb clears off the sign bit in a number, and setmsb sets it. Makes sense? Andrei

A comment: The way DMD currently does range propagation (storing the
range as union{long, ulong}) is wrong. Currently, when mixed
signed/unsigned operations happen, it gives very pessimistic results.
The sign bit should be stored separately:

sign  value
  1   1xxxxx   Negative, long.min..-1
  1   0xxxxx   (Never occurs)
  0   0xxxxx   0..long.max
  0   1xxxxx   long.max+1..ulong.max

I'm not sure if that makes this case a little more, or a little less
complicated. But this defines the problem a bit better.
Apr 20 2010
prev sibling parent reply =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Steven Schveighoffer wrote:
 Jérôme M. Berger Wrote:

 Jérôme M. Berger wrote:
 	OK, I found a solution that is optimal in over 99% of the cases
 (99.195% to be precise), while still being fast (because it avoids
 looping over the bits):

0..299 without checking, just for the timing. On my computer (Athlon X2 64 2GHz), my function took 50s and Andrei's took 266s. The difference should be even bigger for larger numbers.

 Can I ask, what is the point of having a non-exact solution (unless
 you are just doing this for fun)?

 We are talking about range propagation, a function of the compiler,
 not a function of the compiled program. Therefore, slower but more
 exact functions should be preferred, since they only affect the
 compile time. Note that you will only need to do this range
 propagation on lines that "or" two values together, and something
 that reduces the runtime of the compiler by 216 seconds, but only
 when compiling enough code to have over 8 billion 'or' operations in
 it (300^4), I don't think is really that important. Give me the exact
 solution that prevents me from having to cast when the compiler can
 prove it, even if the runtime of the compiler is slightly slower.

speed of the D compiler to that of the Go compiler...

	We are talking about range propagation, a function of the
compiler, not a function of the compiled program. Since we can't get
a 100% accurate representation of the possible values anyway (even
yours might leave holes in the middle after all), then it might be
better to choose a faster, slightly less precise algorithm if the
difference is not too great. That's the difference between a compiler
and a full-fledged static code analysis and program prover.

	Anyway, the point is moot, I have a new version of my algorithm
with 100% precision and high speed.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 12 2010
next sibling parent reply =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Steven Schveighoffer wrote:
 On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger
 <jeberger free.fr> wrote:

     We are talking about range propagation, a function of the compiler,
 not a function of the compiled program. Since we can't get a 100%
 accurate representation of the possible values anyway (even yours
 might leave holes in the middle after all), then it might be better
 to choose a faster, slightly less precise algorithm if the
 difference is not too great. That's the difference between a
 compiler and a full-fledged static code analysis and program prover.

 When we're talking about the difference between O(1) and O(lgn), I'll
 take accuracy over speed in my compiler any day. The solution should be
 100% accurate for the problem statement.  If the problem statement is
 not what we need, then we need a new problem statement :)  Solving the
 problem statement for 99% of values is not good enough.

a minimal loss of accuracy, which will you take? Especially if the accuracy loss is less than is lost elsewhere (due to holes in the ranges).
     Anyway, the point is moot, I have a new version of my algorithm
 with 100% precision and high speed.

Yes, I'm still trying to understand how it works :)

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 12 2010
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
Jérôme M. Berger Wrote:

 Steven Schveighoffer wrote:
 When we're talking about the difference between O(1) and O(lgn), I'll
 take accuracy over speed in my compiler any day.

a minimal loss of accuracy, which will you take? Especially if the accuracy loss is less than is lost elsewhere (due to holes in the ranges).

Really? You rebuilt the compiler with your range propagation algorithm and verified that it adds 10 seconds versus an accurate one that adds 55s? How much time did the compiler spend to compile? I'd hazard to guess that a code base that adds 10s worth of your algorithm takes at least a few hours to compile. Is 55s that bad at that point? Again, if it takes the compiler an extra insignificant amount of time to be more accurate, I'm all for accuracy over speed when you get down to that level of insignificance. I'd say the point of pain has to be at least 10% of the compile time before it makes any sizable difference. -Steve
Apr 12 2010
parent reply =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Steven Schveighoffer wrote:
 Jérôme M. Berger Wrote:

 Steven Schveighoffer wrote:
 When we're talking about the difference between O(1) and O(lgn), I'll
 take accuracy over speed in my compiler any day.

a minimal loss of accuracy, which will you take? Especially if the accuracy loss is less than is lost elsewhere (due to holes in the ranges).

 Really? You rebuilt the compiler with your range propagation
 algorithm and verified that it adds 10 seconds versus an accurate one
 that adds 55s? How much time did the compiler spend to compile? I'd
 hazard to guess that a code base that adds 10s worth of your
 algorithm takes at least a few hours to compile. Is 55s that bad at
 that point?

 Again, if it takes the compiler an extra insignificant amount of time
 to be more accurate, I'm all for accuracy over speed when you get
 down to that level of insignificance. I'd say the point of pain has
 to be at least 10% of the compile time before it makes any sizable
 difference.

times slower just because it brings extra precision which you may not
really need, then you will wind up with a compiler that is 5 to 6
times slower than it needs to be. Sure the difference on *one*
function is not great in absolute terms, but if you make the same
choice for *all* functions, then where do you go?

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 13 2010
parent =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Steven Schveighoffer wrote:
 In my opinion?  Yes, slower compilers that make code easier to write
 are better.  I don't spend lots of time compiling, I spend it writing
 code.  And I don't need to babysit the compiler, it goes off and does
 its thing.

 Performance is only important in the end result.  I'm not saying I
 want my compiler to be slow, but I want it to be accurate and useful
 more than I want it to be quick.  In this case, there is a quick and
 fully accurate solution, so it doesn't matter.  But, for instance, if
 the compiler could do a full analysis to check if variables escape
 their scope, and that makes the compiler 5x slower, then I'd rather
 have the compiler verify my work instead of quickly producing
 memory-corrupting code.

of this kind of tools, but I don't want to have to put up with their
slowness every time I compile something. Having a quick
write-compile-test cycle is very important IMO. Then when I have a
rough outline of the program, I can run another tool (or add a
command line option) to enable extra checking (and then it doesn't
matter if the tool takes the whole night to run and I get the results
the next morning when I get to work).

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 13 2010
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Fawzi Mohamed:
 I guess that I am missing something obvious, so I don't see the reason  
 for range propagation, but maybe I am not the only one, so thanks for  
 an explanation...

Range propagation can improve the situation a little, so it's not bad. But it can't replace proper integral overflows. You need integral overflows at runtime. Bye, bearophile
Apr 13 2010
prev sibling parent reply Don <nospam nospam.com> writes:
Fawzi Mohamed wrote:
 
 On 12-apr-10, at 21:40, Steven Schveighoffer wrote:
 
 On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger 
 <jeberger free.fr> wrote:

 Steven Schveighoffer wrote:
 Jérôme M. Berger Wrote:

 Jérôme M. Berger wrote:
     OK, I found a solution that is optimal in over 99% of the cases
 (99.195% to be precise), while still being fast (because it avoids
 looping over the bits):

pairs in 0..299 without checking, just for the timing. On my computer (Athlon X2 64 2GHz), my function took 50s and Andrei's took 266s. The difference should be even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution (unless you are just doing this for fun)? We are talking about range propagation, a function of the compiler, not a function of the compiled program. Therefore, slower but more exact functions should be preferred, since they only affect the compile time. Note that you will only need to do this range propagation on lines that "or" two values together, and something that reduces the runtime of the compiler by 216 seconds, but only when compiling enough code to have over 8 billion 'or' operations in it (300^4), I don't think is really that important. Give me the exact solution that prevents me from having to cast when the compiler can prove it, even if the runtime of the compiler is slightly slower.

speed of the D compiler to that of the Go compiler...

My point was simply that the amount of time saved is relative to the size of the program being compiled. If we assume conservatively that every other line in a program has bitwise or in it, then you are talking about a 16 billion line program, meaning the 216 seconds you save is insignificant compared to the total compile time. My point was that your fast solution that is inaccurate is not preferable because nobody notices the speed, they just notice the inaccuracy.
     We are talking about range propagation, a function of the compiler,
 not a function of the compiled program. Since we can't get a 100%
 accurate representation of the possible values anyway (even yours
 might leave holes in the middle after all), then it might be better
 to choose a faster, slightly less precise algorithm if the
 difference is not too great. That's the difference between a
 compiler and a full-fledged static code analysis an program prover.

When we're talking about the difference between O(1) and O(lgn), I'll take accuracy over speed in my compiler any day. The solution should be 100% accurate for the problem statement. If the problem statement is not what we need, then we need a new problem statement :) Solving the problem statement for 99% of values is not good enough.
     Anyway, the point is moot, I have a new version of my algorithm
 with 100% precision and high speed.

Yes, I'm still trying to understand how it works :)

Sorry for the probably stupid question, but I don't understand much
the need of all this range propagation: in the compiler, either you
have a constant (and then you have the value at compile time, you
don't need any range propagation, and you can compare with the
value), or you have a runtime value.

It's been part of DMD2 for a while now. It allows you to do things like: ubyte lowbits(int x) { return x & 7; } without an explicit cast. The compiler knows that x&7 can safely fit inside a single byte. Whereas ((x&7) << 12) | (x&3); does not fit, and requires an explicit cast.
Apr 13 2010
parent =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Fawzi Mohamed wrote:

 On 13-apr-10, at 12:02, Don wrote:
 It's been part of DMD2 for a while now. It allows you to do things like:

 ubyte lowbits(int x)
 {
    return x & 7;
 }

 without an explicit cast. The compiler knows that x&7 can safely fit
 inside a single byte.  Whereas   ((x&7) << 12) | (x&3);
 does not fit, and requires an explicit cast.

 ah ok I understand, I thought that was treated like x & cast(ubyte)7,
 and so as comparison of the compile time value with the ranges of ubyte
 (no range propagation needed).
 But I can understand why it is treated as cast(ubyte)((cast(int)x) & 7)
 as it is probably easier for the compiler, as it upcasts by default.

 Thanks for the explanation.

don't need the cast to "int" in your second example, but you still
need range propagation in the first...

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 13 2010
prev sibling parent reply =?ISO-8859-1?Q?=22J=E9r=F4me_M=2E_Berger=22?= <jeberger free.fr> writes:
	Here's a fast 100% precise solution:

===============================8<------------------------------
uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
   assert (minA <= maxA);
   assert (minB <= maxB);

   if (maxA == 0) return maxB;
   if (maxB == 0) return maxA;

   uint32_t a = maxA ^ minA;
   if (a != 0) {
      a = ((1 << (highbit (a)+1)) - 1) & maxA & maxB;
      if (a != 0)
         a = (1 << highbit (a)) - 1;
   }

   uint32_t b = maxB ^ minB;
   if (b != 0) {
      b = ((1 << (highbit (b)+1)) - 1) & maxA & maxB;
      if (b != 0)
         b = (1 << highbit (b)) - 1;
   }

   return maxA|maxB|a|b;
}
------------------------------>8===============================

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 12 2010
next sibling parent reply =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Steven Schveighoffer wrote:
 Fails for test case:

 minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).

combinations in the [0, 63] range, so if there are mistakes they have
to be outside that range...

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 12 2010
next sibling parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
Jérôme M. Berger wrote:
 Steven Schveighoffer wrote:
 Fails for test case:

 minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).

combinations in the [0, 63] range, so if there are mistakes they have to be outside that range... Jerome

I confirm. Jerome's code passes my randomized test. Ali
Apr 12 2010
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
Jérôme M. Berger Wrote:

 Steven Schveighoffer wrote:
 Fails for test case:
 
 minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).
 

combinations in the [0, 63] range, so if there are mistakes they have to be outside that range...

I'll have to double check, I thought I copied your code verbatim (I did switch around the arguments to be minA, maxA, minB, maxB to fit my test harness, and I changed the uint_32_t to uint). I'll check tomorrow at work where the computer I used to test is. -Steve
Apr 12 2010
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
Steven Schveighoffer Wrote:

 I'll have to double check, I thought I copied your code verbatim (I did switch
around the arguments to be minA, maxA, minB, maxB to fit my test harness, and I
changed the uint_32_t to uint).  I'll check tomorrow at work where the computer
I used to test is.
 

I think I see the issue, I was using an older version of your highbit, at some point you changed it, but didn't repeat it in this solution, so I searched for it on the newsgroup and found your first version. That differs from later versions you posted, but I can't find where you identified there was a bug. I'll test again tomorrow, but it looks like that probably accounts for the problem. The version I was using: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108788 The later version: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108855 It looks like in that second post that Ali had the same problem I did. Next time, you should include your whole code each time, especially parts you changed. -Steve
Apr 12 2010
parent =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Steven Schveighoffer wrote:
 Steven Schveighoffer Wrote:

 I'll have to double check, I thought I copied your code verbatim (I
 did switch around the arguments to be minA, maxA, minB, maxB to fit
 my test harness, and I changed the uint_32_t to uint). I'll check
 tomorrow at work where the computer I used to test is.

 I think I see the issue, I was using an older version of your
 highbit, at some point you changed it, but didn't repeat it in this
 solution, so I searched for it on the newsgroup and found your first
 version. That differs from later versions you posted, but I can't
 find where you identified there was a bug.

 I'll test again tomorrow, but it looks like that probably accounts
 for the problem.

 The version I was using:
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108788

 The later version:
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108855

 It looks like in that second post that Ali had the same problem I did.

 Next time, you should include your whole code each time, especially
 parts you changed.

for the C<->D translation). Sorry about that.

	Note that there is no bug: both versions are correct. The first
one uses 1-based indexes, while the second uses 0-based indexes and
processes 0 differently.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 13 2010
prev sibling next sibling parent Don <nospam nospam.com> writes:
Steven Schveighoffer wrote:
 On Mon, 12 Apr 2010 22:37:00 -0400, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 
 Jérôme M. Berger Wrote:

 Steven Schveighoffer wrote:
 Fails for test case:

 minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result
 is 6).

combinations in the [0, 63] range, so if there are mistakes they have
to be outside that range...

I'll have to double check, I thought I copied your code verbatim (I did switch around the arguments to be minA, maxA, minB, maxB to fit my test harness, and I changed the uint_32_t to uint). I'll check tomorrow at work where the computer I used to test is.

I confirm, with the updated highbit, your solution works.

Also, inspired by your solution (and using your highbit function,
which is very neat BTW) I came up with a one-liner. Enjoy :)

uint maxor(uint mina, uint maxa, uint minb, uint maxb)
{
   return maxa | maxb | ((1 << highbit(((1 <<
      (max(highbit(mina ^ maxa), highbit(minb ^ maxb)) + 1)) - 1)
      & maxa & maxb)) - 1);
}

Anyone want to do the equivalent minor?  I've spent way too much time
on this :)

-Steve

Awesome stuff, guys. I think that's worthy of inclusion as an exercise in Knuth's Volume 4, fascile 1. Suggest it to someone at ACCU? (Knuth has highbit(x) but calls it lambda(x). He notes the result that highbit(x)==highbit(y) iff x^y <= x&y).
Apr 13 2010
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:
 I never would have thought of doing a binary search for the high bit, it  
 is definitely a novel idea to me :)

You can learn something from this page (it can also be useful for the
translation to C++):
http://graphics.stanford.edu/~seander/bithacks.html

Bye,
bearophile
Apr 13 2010
prev sibling next sibling parent reply Clemens <eriatarka84 gmail.com> writes:
Adam Ruppe Wrote:

 Jerome's highbit function is the same as std.intrinsic.bsr. I wonder
 which is faster?
 
 I ran a test, and for 100 million iterations (1..10000000), the
 intrinsic beat out his function by a mere 300 milliseconds on my box.
 - highbit ran in an average of 1897 ms and bsr did the same in an
 average of 1534.
 
 Recompiling with -inline -O -release cuts the raw numbers about in
 half, but keeps about the same difference, leading me to think
 overhead amounts for a fair amount of the percentage instead of actual
 implementation. The new averages are 1134 and 853.

That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I guess is the reason it's an intrinsic in the first place). I would have expected this to be much, much faster than a user function. Anyone care enough to check the generated assembly? [1] http://www.itis.mn.it/linux/quarta/x86/bsr.htm -- Clemens
Apr 13 2010
parent reply Don <nospam nospam.com> writes:
Adam D. Ruppe wrote:
 On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
 That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd
sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I
guess is the reason it's an intrinsic in the first place). I would have
expected this to be much, much faster than a user function. Anyone care enough
to check the generated assembly?

The opcode is fairly slow anyway (as far as opcodes go) - odds are the implementation inside the processor is similar to Jerome's method, and the main savings come from it loading fewer bytes into the pipeline. I remember a line from a blog, IIRC it was the author of the C++ FQA writing it, saying hardware and software are pretty much the same thing - moving an instruction to hardware doesn't mean it will be any faster, since it is the same algorithm, just done in processor microcode instead of user opcodes.

inlining max().
Apr 13 2010
parent Don <nospam nospam.com> writes:
Don wrote:
 Adam D. Ruppe wrote:
 On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
 That's strange. Looking at src/backend/cod4.c, function cdbscan, in 
 the dmd sources, bsr seems to be implemented in terms of the bsr 
 opcode [1] (which I guess is the reason it's an intrinsic in the 
 first place). I would have expected this to be much, much faster than 
 a user function. Anyone care enough to check the generated assembly?

The opcode is fairly slow anyway (as far as opcodes go) - odds are the implementation inside the processor is similar to Jerome's method, and the main savings come from it loading fewer bytes into the pipeline. I remember a line from a blog, IIRC it was the author of the C++ FQA writing it, saying hardware and software are pretty much the same thing - moving an instruction to hardware doesn't mean it will be any faster, since it is the same algorithm, just done in processor microcode instead of user opcodes.

inlining max().

Specifically, bsr is 7 uops on AMD, 1 uop on Intel since the original Pentium. AMD's performance is shameful. And bsr() is supported in the compiler; in fact DMC uses it extensively, which is why it's included in DMD!
Apr 13 2010
prev sibling parent "Jérôme M. Berger" <jeberger free.fr> writes:

Steven Schveighoffer wrote:
 using your highbit function, which is very neat BTW

which comes up regularly in programming discussions and newsgroups...

 uint maxor(uint mina, uint maxa, uint minb, uint maxb)
 {
     return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^
 maxa), highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
 }

	Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 13 2010
prev sibling parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
is it just me, or does this fail for

minA = 0
minB = 0
maxA = 0x80_00_00_00
maxB = 0x80_00_00_00

I'm pretty sure you need to handle 1 << 32 specially
Dec 04 2010
parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 12/04/2010 01:21 PM, Ellery Newcomer wrote:
 is it just me, or does this fail for

 minA = 0
 minB = 0
 maxA = 0x80_00_00_00
 maxB = 0x80_00_00_00

 I'm pretty sure you need to handle 1<<  32 specially

i'm thinking this is better:

uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
    assert (minA <= maxA);
    assert (minB <= maxB);

    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;

    uint32_t a = maxA ^ minA;
    if (a != 0) {
        a |= (1 << highbit (a)) - 1;
        a = a & maxA & maxB;
        if (a != 0)
            a = (1 << highbit (a)) - 1;
    }

    uint32_t b = maxB ^ minB;
    if (b != 0) {
        b |= (1 << highbit (b)) - 1;
        b = b & maxA & maxB;
        if (b != 0)
            b = (1 << highbit (b)) - 1;
    }

    return maxA|maxB|a|b;
}
Dec 04 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Tue, Apr 13, 2010 at 10:52:14AM -0400, Steven Schveighoffer wrote:
 Just a note, this code should be runnable in C++, because the compiler is  
 written in C++.  Is bsr available there?

I just assumed since it was in D that it was probably in DMC too, but I
can't find it in the docs, so maybe not. Well, there's always inline asm :D

But 300 milliseconds over 100 million iterations is negligible anyway.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 13 2010
prev sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
 That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd
sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I
guess is the reason it's an intrinsic in the first place). I would have
expected this to be much, much faster than a user function. Anyone care enough
to check the generated assembly?

The opcode is fairly slow anyway (as far as opcodes go) - odds are the
implementation inside the processor is similar to Jerome's method, and
the main savings come from it loading fewer bytes into the pipeline.

I remember a line from a blog, IIRC it was the author of the C++ FQA
writing it, saying hardware and software are pretty much the same thing -
moving an instruction to hardware doesn't mean it will be any faster,
since it is the same algorithm, just done in processor microcode instead
of user opcodes.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 13 2010
prev sibling next sibling parent Adam Ruppe <destructionator gmail.com> writes:
Jerome's highbit function is the same as std.intrinsic.bsr. I wonder
which is faster?

I ran a test, and for 100 million iterations (1..10000000), the
intrinsic beat out his function by a mere 300 milliseconds on my box.
- highbit ran in an average of 1897 ms and bsr did the same in an
average of 1534.

Recompiling with -inline -O -release cuts the raw numbers about in
half, but keeps about the same difference, leading me to think
overhead amounts for a fair amount of the percentage instead of actual
implementation. The new averages are 1134 and 853.

I've gotta say, your implementation of the function is better than I
would have done though. Without bsr, I probably would have looped,
shifted, and tested each individual bit...

On 4/13/10, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 On Mon, 12 Apr 2010 22:37:00 -0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Jérôme M. Berger Wrote:

 Steven Schveighoffer wrote:
 Fails for test case:

 minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is
 6).

combinations in the [0, 63] range, so if there are mistakes they
have to be outside that range...

I'll have to double check, I thought I copied your code verbatim (I did
switch around the arguments to be minA, maxA, minB, maxB to fit my test
harness, and I changed the uint_32_t to uint). I'll check tomorrow at
work where the computer I used to test is.

I confirm, with the updated highbit, your solution works.
Also, inspired by your solution (and using your highbit function, which is
 very neat BTW) I came up with a one-liner.  Enjoy :)

 uint maxor(uint mina, uint maxa, uint minb, uint maxb)
 {
      return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^ maxa),
 highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
 }

 Anyone want to do the equivalent minor?  I've spent way too much time on
 this :)

 -Steve

Apr 13 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 13 Apr 2010 09:56:34 -0400, Adam Ruppe <destructionator gmail.com>  
wrote:

 Jerome's highbit function is the same as std.intrinsic.bsr. I wonder
 which is faster?

Just a note, this code should be runnable in C++, because the compiler is written in C++. Is bsr available there?
 Recompiling with -inline -O -release cuts the raw numbers about in
 half, but keeps about the same difference, leading me to think
 overhead amounts for a fair amount of the percentage instead of actual
 implementation. The new averages are 1134 and 853.

Probably the inlining accounts for the savings here. Calling a function is rather expensive.
 I've gotta say, your implementation of the function is better than I
 would have done though. Without bsr, I probably would have looped,
 shifted, and tested each individual bit...

I never would have thought of doing a binary search for the high bit, it
is definitely a novel idea to me :)

-Steve
Apr 13 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 04:05:06PM -0500, Andrei Alexandrescu wrote:
 The minimum for unsigned numbers is very simple: min(mina, minb).

mina = 1
minb = 0

min(1, 0) == 0

But, 1|0 == 1

max(mina, minb) looks to work though. Consider that ORing can only set
bits - it can never take a one and spit out a zero. Therefore,
x|y >= x && x|y >= y, which only holds for the max(mina, minb) bound.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
prev sibling next sibling parent PBa <spam goto-hell.com> writes:
Am 10.04.2010, 23:17 Uhr, schrieb Don <nospam nospam.com>:

 Rainer Deyke wrote:
 On 4/10/2010 11:52, Andrei Alexandrescu wrote:
 I think this would work:

 uint maxOR(uint maxa, uint maxb) {
     if (maxa < maxb) return maxOR(maxb, maxa);
     uint candidate = 0;
     foreach (i, bit; byBitDescending(maxa)) {
         if (bit) continue;
         auto t = candidate | (1 << (31 - i));
         if (t <= maxb) candidate = t;
     }
     return maxa | candidate;
 }

all the bits in 'maxb' (excluding the leading zeros). If 'maxb' is
exactly 1 less than a power of 2, then 'candidate' will be 0.

Now consider a in [0, 16], b in [0, 15]. Your function will produce 16,
but the real maximum is 31.

For maximum accuracy, you have to consider the minima as well as the
maxima when calculating the new maximum. With 'a' in [16, 16] and 'b' in
[16, 16], the new range can only be [16, 16]. With 'a' in [0, 16] and
'b' in [0, 16], the correct new range is [0, 31].

It's a very interesting puzzle. There are some pretty complex cases. Consider (all numbers binary) a in [1010, 1100] and b in [101,101]. The maximum value is when a=1011, b=101, so max is 1111.

How about this?

import std.stdio, std.conv, std.intrinsic;

void main(string[] args)
{
    int total = 0, match = 0,
        from = to!(int)(args[1]), to = to!(int)(args[2]);
    uint res, resBF;

    for(int i = from; i < to; ++i) {
        for(int j = i; j < to; ++j) {
            for(int k = from; k < to; ++k) {
                for(int m = k; m < to; ++m) {
                    ++total;
                    res = maxOr(i, j, k, m);
                    resBF = maxOrBF(i, j, k, m);
                    if(res == resBF)
                        ++match;
                    //else writefln("mismatch: minA %s, maxA %s, minB %s, maxB %s, maxOr %s, maxOrBF %s", i, j, k, m, res, resBF);
                }
            }
        }
    }
    writefln("total %s, match %s", total, match);
}

uint max(uint a, uint b) { return (a > b) ? a : b; }

uint maxOr(uint minA, uint maxA, uint minB, uint maxB)
{
    uint result = minA | maxA | minB | maxB,
         diff = max(maxA - minA, maxB - minB);
    if(diff) {
        result |= (1 << (bsr(diff) + 1)) - 1;
    }
    return result;
}

/* compute result using brute force */
uint maxOrBF(uint minA, uint maxA, uint minB, uint maxB)
{
    return bitsSetInIntervallBF(minA, maxA) | bitsSetInIntervallBF(minB, maxB);
}

uint bitsSetInIntervallBF(uint min, uint max)
{
    uint result = 0;
    for(; min <= max; ++min) {
        result |= min;
    }
    return result;
}
Apr 09 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:
 We are talking about range propagation, a function of the compiler, not a
function of the compiled program.  Therefore, slower but more exact functions
should be preferred, since they only affect the compile time.

I agree with this. While my solution has a certain beauty to it and is
fast, your solution accurately covers all the bases, whereas mine fails
with min > 1. I say we go with yours.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 11 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger <jeberger free.fr>  
wrote:

 Steven Schveighoffer wrote:
 Jérôme M. Berger Wrote:

 Jérôme M. Berger wrote:
 	OK, I found a solution that is optimal in over 99% of the cases
 (99.195% to be precise), while still being fast (because it avoids
 looping over the bits):

0..299 without checking, just for the timing. On my computer (Athlon X2 64 2GHz), my function took 50s and Andrei's took 266s. The difference should be even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution (unless you
are just doing this for fun)?

We are talking about range propagation, a function of the compiler, not a
function of the compiled program.  Therefore, slower but more exact
functions should be preferred, since they only affect the compile time.

Note that you will only need to do this range propagation on lines that
"or" two values together, and something that reduces the runtime of the
compiler by 216 seconds, but only when compiling enough code to have over
8 billion 'or' operations in it (300^4), I don't think is really that
important.  Give me the exact solution that prevents me from having to
cast when the compiler can prove it, even if the runtime of the compiler
is slightly slower.

speed of the D compiler to that of the Go compiler...

My point was simply that the amount of time saved is relative to the size of the program being compiled. If we assume conservatively that every other line in a program has bitwise or in it, then you are talking about a 16 billion line program, meaning the 216 seconds you save is insignificant compared to the total compile time. My point was that your fast solution that is inaccurate is not preferable because nobody notices the speed, they just notice the inaccuracy.
 	We are talking about range propagation, a function of the compiler,
 not a function of the compiled program. Since we can't get a 100%
 accurate representation of the possible values anyway (even yours
 might leave holes in the middle after all), then it might be better
 to choose a faster, slightly less precise algorithm if the
 difference is not too great. That's the difference between a
 compiler and a full-fledged static code analysis and program prover.

When we're talking about the difference between O(1) and O(lgn), I'll take accuracy over speed in my compiler any day. The solution should be 100% accurate for the problem statement. If the problem statement is not what we need, then we need a new problem statement :) Solving the problem statement for 99% of values is not good enough.
 	Anyway, the point is moot, I have a new version of my algorithm
 with 100% precision and high speed.

Yes, I'm still trying to understand how it works :) -Steve
Apr 12 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 12 Apr 2010 13:56:23 -0400, Jérôme M. Berger <jeberger free.fr>  
wrote:

 	Here's a fast 100% precise solution:

 ==============================8<------------------------------
 uint32_t maxOr (uint32_t minA, uint32_t minB,
                 uint32_t maxA, uint32_t maxB)
 {
    assert (minA <= maxA);
    assert (minB <= maxB);

    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;

    uint32_t a = maxA ^ minA;
    if (a != 0) {
       a = ((1 << (highbit (a)+1)) - 1) & maxA & maxB;
       if (a != 0)
          a = (1 << highbit (a)) - 1;
    }

    uint32_t b = maxB ^ minB;
    if (b != 0) {
       b = ((1 << (highbit (b)+1)) - 1) & maxA & maxB;
       if (b != 0)
          b = (1 << highbit (b)) - 1;
    }

    return maxA|maxB|a|b;
 }
 ------------------------------>8==============================

Fails for test case:

minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).

I'm not quite sure what you are trying to do, but I think you are on the
right path to getting a faster solution.

-Steve
Apr 12 2010
prev sibling next sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 12-apr-10, at 21:40, Steven Schveighoffer wrote:

 On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger
 <jeberger free.fr> wrote:

 Steven Schveighoffer wrote:
 Jérôme M. Berger Wrote:

 Jérôme M. Berger wrote:
 	OK, I found a solution that is optimal in over 99% of the cases
 (99.195% to be precise), while still being fast (because it avoids
 looping over the bits):

 in 0..299 without checking, just for the timing. On my computer (Athlon
 X2 64 2GHz), my function took 50s and Andrei's took 266s. The
 difference should be even bigger for larger numbers.

 Can I ask, what is the point of having a non-exact solution (unless
 you are just doing this for fun)?

 We are talking about range propagation, a function of the compiler,
 not a function of the compiled program. Therefore, slower but more
 exact functions should be preferred, since they only affect the
 compile time. Note that you will only need to do this range
 propagation on lines that "or" two values together, and something
 that reduces the runtime of the compiler by 216 seconds, but only
 when compiling enough code to have over 8 billion 'or' operations in
 it (300^4), I don't think is really that important. Give me the
 exact solution that prevents me from having to cast when the
 compiler can prove it, even if the runtime of the compiler is
 slightly slower.

 speed of the D compiler to that of the Go compiler...

 My point was simply that the amount of time saved is relative to the
 size of the program being compiled. If we assume conservatively
 that every other line in a program has bitwise or in it, then you
 are talking about a 16 billion line program, meaning the 216 seconds
 you save is insignificant compared to the total compile time. My
 point was that your fast solution that is inaccurate is not
 preferable because nobody notices the speed, they just notice the
 inaccuracy.

 	We are talking about range propagation, a function of the compiler,
 not a function of the compiled program. Since we can't get a 100%
 accurate representation of the possible values anyway (even yours
 might leave holes in the middle after all), then it might be better
 to choose a faster, slightly less precise algorithm if the
 difference is not too great. That's the difference between a
 compiler and a full-fledged static code analysis and program prover.

 When we're talking about the difference between O(1) and O(lgn),
 I'll take accuracy over speed in my compiler any day. The solution
 should be 100% accurate for the problem statement. If the problem
 statement is not what we need, then we need a new problem
 statement :) Solving the problem statement for 99% of values is not
 good enough.

 	Anyway, the point is moot, I have a new version of my algorithm
 with 100% precision and high speed.

 Yes, I'm still trying to understand how it works :)

Sorry for the probably stupid question, but I don't understand much
the need of all this range propagation: in the compiler either you
have a constant (and then you have the value at compile time, and you
don't need any range propagation, you can compare with the value), or
you have a runtime value.

Do you want to explicitly add in the compiler the support for more
limited runtime values?  Otherwise the range of a runtime value is a
priori the whole possible range, and thus any rule based on range
propagation might be expressed as a static type based rule (as done in
C).

You can gain something: for example you can know that summing 4 shorts
you will never overflow an int, is this where you want to go?  What
kind of bugs are you trying to avoid?  Or is it simply having the max
and min properties defined?

Nobody commented on my proposal, but there I see the potential for
bugs: (1u-2)+1UL is 4294967296, and this happens also at runtime if
you have, for example, a function returning size_t and another
returning uint, and you combine them without thinking much and then
you switch to 64 bits...

There I see a problem, but this you see without any special range
propagation, just thinking that subtraction or negation of unsigned
types is modulo 2^bit size, and thus cannot be then changed to another
size without explicit cast.

Maybe in some cases using enums range propagation might spare a cast,
but is it really an improvement?

I guess that I am missing something obvious, so I don't see the reason
for range propagation, but maybe I am not the only one, so thanks for
an explanation...

Fawzi
Apr 13 2010
prev sibling next sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:


On 13-apr-10, at 12:01, bearophile wrote:

 Fawzi Mohamed:
 I guess that I am missing something obvious, so I don't see the  
 reason
 for range propagation, but maybe I am not the only one, so thanks for
 an explanation...

Range propagation can improve the situation a little, so it's not bad. But it can't replace proper integral overflows. You need integral overflows at runtime.

integral overflows are helpful only if you have automatic conversion to
a larger type, but that breaks the compile time knowledge of the size of
such an integer, so that you have to assume that it might need to be
pushed to the heap.
Yes you might use some tricks (like using size_t.sizeof*8-1 bits, or a
pointer to spare some place), but I don't think that D wants to go that
way for the basic integers...
 Bye,
 bearophile

Apr 13 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 12 Apr 2010 22:37:00 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 Jérôme M. Berger Wrote:

 Steven Schveighoffer wrote:
 Fails for test case:

 minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is
 6).

combinations in the [0, 63] range, so if there are mistakes they have to
be outside that range...

I'll have to double check, I thought I copied your code verbatim (I did switch around the arguments to be minA, maxA, minB, maxB to fit my test harness, and I changed the uint_32_t to uint). I'll check tomorrow at work where the computer I used to test is.

I confirm, with the updated highbit, your solution works.

Also, inspired by your solution (and using your highbit function, which
is very neat BTW) I came up with a one-liner.  Enjoy :)

uint maxor(uint mina, uint maxa, uint minb, uint maxb)
{
     return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^ maxa),
highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
}

Anyone want to do the equivalent minor?  I've spent way too much time on
this :)

-Steve
Apr 13 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 13 Apr 2010 13:37:13 -0400, Jérôme M. Berger <jeberger free.fr>  
wrote:

 Steven Schveighoffer wrote:
 Jérôme M. Berger Wrote:

 Steven Schveighoffer wrote:
 When we're talking about the difference between O(1) and O(lgn), I'll
 take accuracy over speed in my compiler any day.

a minimal loss of accuracy, which will you take? Especially if the accuracy loss is less than is lost elsewhere (due to holes in the ranges).

Really?  You rebuilt the compiler with your range propagation algorithm
and verified that it adds 10 seconds versus an accurate one that adds
55s?  How much time did the compiler spend to compile?  I'd hazard to
guess that a code base that adds 10s worth of your algorithm takes at
least a few hours to compile.  Is 55s that bad at that point?

Again, if it takes the compiler an extra insignificant amount of time to
be more accurate, I'm all for accuracy over speed when you get down to
that level of insignificance.  I'd say the point of pain has to be at
least 10% of the compile time before it makes any sizable difference.

times slower just because it brings extra precision which you may not really need, then you will wind up with a compiler that is 5 to 6 times slower than it needs to be. Sure the difference on *one* function is not great in absolute terms, but if you make the same choice for *all* functions, then where do you go?

In my opinion?  Yes, slower compilers that make code easier to write are
better.  I don't spend lots of time compiling, I spend it writing code.
And I don't need to babysit the compiler, it goes off and does its thing.
Performance is only important in the end result.

I'm not saying I want my compiler to be slow, but I want it to be
accurate and useful more than I want it to be quick.  In this case,
there is a quick and fully accurate solution, so it doesn't matter.
But, for instance, if the compiler could do a full analysis to check if
variables escape their scope, and that makes the compiler 5x slower,
then I'd rather have the compiler verify my work instead of quickly
producing memory-corrupting code.

-Steve
Apr 13 2010
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 byte c = a | b;
 Say you already know min_a, max_a, min_b, and max_b. How do you compute 
 min_c and max_c? I thought of it for a bit and it seems quite tricky.

This is Interval arithmetic. If you are able to find a book about it...
(or a library for some programming language that implements it).

Bye,
bearophile
Apr 10 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 12:39:22PM -0500, Andrei Alexandrescu wrote:
 Yah, that works for unsigned types. For signed types, it's tricky (as 
 tricky as max).

My feeling is to go ahead and cast to unsigned, then do the calculation. For me at least, when doing bitwise ops, I assume it is unsigned anyway.
 Even on the branch that doesn't use the sum, that's still just a bound. 
 Consider:
 
 a = 8;
 b = 4;
 
 Then max(a|b) is 12, but computed with your method is 15.

Yeah, you're right. Bah. Maybe:

max_a = max(a, b) | (2 ^^ numbits(min(a, b)) - 1);

8 | 7 == 15;

Same deal, still wrong. Maybe we can throw in one more thing:

let f(a, b) = the above;
max_c = min(f(a, b), a | b);

Let me test it... it seems to work. Here's the D program I used to
brute-force the test:

===
import std.intrinsic; // for bsr()

uint max(uint a, uint b) { return (a >= b) ? a : b; }
uint min(uint a, uint b) { return (a >= b) ? b : a; }

uint numbits(uint a) { return a == 0 ? a : bsr(a) + 1; }

unittest {
	assert(numbits(0) == 0);
	assert(numbits(1) == 1);
	assert(numbits(2) == 2);
	assert(numbits(3) == 2);
	assert(numbits(4) == 3);
	assert(numbits(5) == 3);
}

uint f(uint a, uint b) {
	return max(a, b) | (1 << numbits(min(a, b)) - 1);
}

uint max_a(uint a, uint b) {
	return min(f(a, b), a | b);
}

void main() {
	for(uint a = 0; a <= 256; a++)
	for(uint b = 0; b <= 256; b++)
		assert(a|b <= max_a(a, b));
}
=======

I compiled with the unittests, and it closed without tripping any
asserts. Eyeballing the output looks right too.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 02:57:59PM -0400, Adam D. Ruppe wrote:
 Let me test it... it seems to work. Here's the D program I used to brute-force
 the test:

The main() function there isn't an ideal test. Here's a better one:

void main() {
	for(uint max_a = 0; max_a < 100; max_a++)
	for(uint max_b = 0; max_b < 100; max_b++)
	for(uint a = 0; a <= max_a; a++)
	for(uint b = 0; b <= max_b; b++)
		assert(a|b <= max_c(max_a, max_b));
}
// note that I renamed uint max_a to max_c.

We try a whole load of max_a and max_b and test every possible value. It
still runs without throwing.

My original function for min_c works too:

uint min_c(uint a, uint b) { return max(a, b); }

And you can brute force test it all:

void main() {
	for(uint min_a = 0; min_a < 50; min_a++)
	for(uint min_b = 0; min_b < 50; min_b++)
	for(uint max_a = min_a; max_a < 100; max_a++)
	for(uint max_b = min_b; max_b < 100; max_b++)
	for(uint a = min_a; a <= max_a; a++)
	for(uint b = min_b; b <= max_b; b++) {
		assert(a|b <= max_c(max_a, max_b));
		assert(a|b >= min_c(min_a, min_b));
	}
}

This, as you can expect, takes a while to run, but does so without
throwing. I'm pretty confident in the functions now.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
prev sibling next sibling parent reply BCS <none anon.com> writes:
Hello Andrei,

 Consider:
 
 byte c = a | b;
 
 Say you already know min_a, max_a, min_b, and max_b. How do you
 compute min_c and max_c? I thought of it for a bit and it seems quite
 tricky.
 

x_bit_max == "the largest bit in x_max ^ x_min"
x_oth_max == x_bit_max - 1;

c_max = max(a_bit_max | b_oth_max, b_bit_max | a_oth_max)
      | (a_max & ~(a_bit_max | a_oth_max));

-- 
... <IXOYE><
Apr 10 2010
parent reply BCS <none anon.com> writes:
Hello BCS,

 Hello Andrei,
 
 Consider:
 
 byte c = a | b;
 
 Say you already know min_a, max_a, min_b, and max_b. How do you
 compute min_c and max_c? I thought of it for a bit and it seems quite
 tricky.
 

x_oth_max == x_bit_max - 1;

// or make that
c_min = a_max & ~(a_bit_max | a_oth_max);
c_max = max(a_bit_max | b_oth_max, b_bit_max | a_oth_max) | c_min;
 c_max = max(a_bit_max | b_oth_max, b_bit_max | a_oth_max) | (a_max &
 ~(a_bit_max | a_oth_max));
 

-- ... <IXOYE><
Apr 10 2010
parent BCS <none anon.com> writes:
Hello BCS,

Scratch all that, I'm clueless. :b
 
-- 
... <IXOYE><
Apr 10 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 02:55:36PM -0400, bearophile wrote:
 Time ago I have even suggested to disallow bitwise ops when one or both
operands are signed...

I understand the position, but I don't advocate it just because I think it
would be annoying and not give much of a benefit. Consider the following:

for(int a = -10; a < 10; a++)
    writeln(a&1 ? "odd" : "even");

I'd just be annoyed if I had to cast a to uint just to do that. Note that it
works on negative numbers too, so it isn't strictly wrong to do it on signed
ints. I use this when doing stripes on outputted tables and stuff like that.

a % 2 does the same thing, so this specific case isn't a big deal, but I
still tend to type &1 rather than %2 out of habit (this kind of thing used to
make a real speed difference back in the DOS days when I learned it all!)

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 02:57:59PM -0400, Adam D. Ruppe wrote:
 Let me test it... it seems to work. Here's the D program I used to brute-force
 the test:

Oh no! Eyeballing again showed a problem... I should have put parens in my
asserts:

a | b < f(a,b)  !=  (a|b) < f(a,b)

That breaks it. Back to the drawing board.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 04:02:05PM -0400, Adam D. Ruppe wrote:
 That breaks it. Back to the drawing board.

I *might* have it, but I'm not 100% confident in my test program. Here's my
implementation:

====
import std.intrinsic;

uint max(uint a, uint b) { return (a >= b) ? a : b; }
uint min(uint a, uint b) { return (a >= b) ? b : a; }
uint numbits(uint a) { return a == 0 ? a : (bsr(a) + 1); }

uint max_c(uint a, uint b) {
    uint mn = min(a, b);
    uint mx = max(a, b);

    if(mn == 0)
        return mx;

    // if there is a bit we can reuse, it grants us everything before it too
    uint reusable = mn & mx;
    if(reusable)
        return (mx | ((1 << numbits(reusable)) - 1)) | mn;
    else
        return mx | mn;
}

uint min_c(uint min_a, uint min_b) { return max(min_a, min_b); }
=====

I can't believe I spent all day on this...

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 07:38:07PM -0400, Adam D. Ruppe wrote:
 I *might* have it, but I'm not 100% confident in my test program.

I think my test program is telling me something: a non-zero min_c subtracts
from max_c. If I take my brute-force empirically calculated max_c and compare
it to what my max_c function returns, it doesn't always match if the minimums
change.

This might be a bug; it is unintuitive that it would do this, but it actually
makes sense. Consider if min == max: you only have one value to work with.

let max_a = 4, min_a = 0; max_b = 4

You can get 7 out of it, since 3 < 4, so it is a possible number and:

100 | 011 == 7

But, if min_b == 4, the only possible value for b is 4. And:

100 | 100 == 4.

Increasing min_b does indeed decrease the max you can get. Nice fact, but
I've spent too much time on this already, so I'll call it done with this:

My max_c included some unnecessary code: my reusable mask obsoletes all of my
special case code! So, we can bring the program down to three lines:

import std.intrinsic;
uint numbits(uint a) { return a == 0 ? a : (bsr(a) + 1); }
uint max_c(uint a, uint b) { return a | ((1 << numbits(a&b)) - 1) | b; }

It passes my test, though since I bugged it up the first time, I might have
done it again.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 08:42:54PM -0500, Andrei Alexandrescu wrote:
 What results does it yield with my main() test harness?

total=100000000; equal=14585400 (14.5854%)

I think that's a perfect result. Here's why: the equality comparison checks
two specific values, (a, b), but the maxOr functions return the max for the
interval ([0, a], [0, b]). It wouldn't always be equal, since there are other
values in there that can reach the max.

For example, a = 4, b = 4. 100|100 == 100, but that range also includes 4|3,
100 | 011 == 111, which is the max.

Your maxOR function gives the same percentage, for probably the same reason.
Though, mine runs ~7x faster on my box.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sat, Apr 10, 2010 at 09:21:07PM -0500, Andrei Alexandrescu wrote:
 Yah, it's clear that we're not looking anywhere near 100%. I'm a bit sad 
 we don't have an adequate baseline, but the fact that two distinct 
 methods obtained the same result is encouraging.

My brute-force program was buggy, but I think I fixed it. Here's the main():

========
uint min_a = 0;
uint min_b = 0;

for(uint max_a = min_a; max_a <= 256; max_a++)
for(uint max_b = min_b; max_b <= 256; max_b++) {
    uint real_max = 0;
    for(uint a = min_a; a <= max_a; a++)
    for(uint b = min_b; b <= max_b; b++) {
        assert((a|b) <= max_c(max_a, max_b),
            format("%b | %b !<= %b (from %b %b)",
                a, b, max_c(max_a, max_b), max_a, max_b));
        assert((a|b) >= min_c(min_a, min_b));
        if((a | b) > real_max)
            real_max = (a|b);
    }

    assert(max_c(max_a, max_b) == real_max,
        format("[%b, %b]. expected: %d, got %d (min == %d)",
            max_a, max_b, max_c(max_a, max_b), real_max,
            min_c(min_a, min_b)));
}
writeln("success");
=======

With so many nested loops, it is slow as dirt, but what it does is try every
value in the given range and record the maximum number encountered. If it
doesn't match at any step, it throws. But, it works for me. I spent a chunk
of the day eyeballing the output too, to ensure it is right (and to figure
out the pattern that led to the final function).

Now, the thing is, if you change min_a or min_b, it breaks the real_max;
like I said in the other message, if min == max, for example, you get the
case where 4|4 isn't the max, since 4|3 isn't available.

I'm not sure how to fix that, but I am pretty convinced that to get the
perfect solution, the max_c function needs to know min_a and min_b too.
Still, this solution works very well in the min_a = min_b = 0 case, so it is
at least a decent bound.
 On to the next task: compute minOR and maxOR for _signed_ values. It 
 turns out minOR its also nontrivial...

Nontrivial is an understatement. I started asking if it can even be done
without assuming a size... I do think you can, by assuming the sizes are
equal, whatever they are, but it still is not easy. I still like the idea of
just casting it. How often would it cause problems in real code anyway?

The only starting point I have is: if either max_a or max_b is negative, the
result of the OR is going to be negative, since that sign bit is one, and we
can't change a one to a zero in any OR operation. So, the max value will be
positive iff both values are possibly positive.

But taking that one statement to working code is too hard for my blood.
 When we're done, we can pass the functions to Walter so he can integrate 
 them within his compiler. Right now the compiler uses a wrong algorithm.

Indeed. We're getting there, but still a ways to go.

-- 
Adam D. Ruppe
http://arsdnet.net
Apr 10 2010
prev sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 13-apr-10, at 12:02, Don wrote:

 Fawzi Mohamed wrote:
 On 12-apr-10, at 21:40, Steven Schveighoffer wrote:
 On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger wrote:

 Steven Schveighoffer wrote:
 Jérôme M. Berger Wrote:

 Jérôme M. Berger wrote:
    OK, I found a solution that is optimal in over 99% of the cases
 (99.195% to be precise), while still being fast (because it avoids
 looping over the bits):

pairs in 0..299 without checking, just for the timing. On my computer
(Athlon X2 64 2GHz), my function took 50s and Andrei's took 266s. The
difference should be even bigger for larger numbers.

 Can I ask, what is the point of having a non-exact solution (unless you
 are just doing this for fun)? We are talking about range propagation, a
 function of the compiler, not a function of the compiled program.
 Therefore, slower but more exact functions should be preferred, since
 they only affect the compile time. Note that you will only need to do
 this range propagation on lines that "or" two values together, and
 something that reduces the runtime of the compiler by 216 seconds, but
 only when compiling enough code to have over 8 billion 'or' operations
 in it (300^4), I don't think is really that important. Give me the
 exact solution that prevents me from having to cast when the compiler
 can prove it, even if the runtime of the compiler is slightly slower.

speed of the D compiler to that of the Go compiler...

 My point was simply that the amount of time saved is relative to the
 size of the program being compiled. If we assume conservatively that
 every other line in a program has bitwise or in it, then you are
 talking about a 16 billion line program, meaning the 216 seconds you
 save is insignificant compared to the total compile time. My point was
 that your fast solution that is inaccurate is not preferable because
 nobody notices the speed, they just notice the inaccuracy.

    We are talking about range propagation, a function of the compiler,
 not a function of the compiled program. Since we can't get a 100%
 accurate representation of the possible values anyway (even yours might
 leave holes in the middle after all), then it might be better to choose
 a faster, slightly less precise algorithm if the difference is not too
 great. That's the difference between a compiler and a full-fledged
 static code analysis and program prover.

 When we're talking about the difference between O(1) and O(lg n), I'll
 take accuracy over speed in my compiler any day. The solution should be
 100% accurate for the problem statement. If the problem statement is
 not what we need, then we need a new problem statement :) Solving the
 problem statement for 99% of values is not good enough.

    Anyway, the point is moot, I have a new version of my algorithm
 with 100% precision and high speed.

 Yes, I'm still trying to understand how it works :)

 the need of all this range propagation, in the compiler either you have
 a constant (and then you have the value at compile time, and you don't
 need any range propagation, you can compare with the value), or you
 have a runtime value.

 It's been part of DMD2 for a while now. It allows you to do things
 like:

 ubyte lowbits(int x) { return x & 7; }

 without an explicit cast. The compiler knows that x&7 can safely fit
 inside a single byte. Whereas ((x&7) << 12) | (x&3); does not fit, and
 requires an explicit cast.

ah ok, I understand, I thought that was treated like x & cast(ubyte)7, and
so as comparison of the compile-time value with the ranges of ubyte (no
range propagation needed).
But I can understand why it is treated as cast(ubyte)((cast(int)x) & 7),
as it is probably easier for the compiler, as it upcasts by default.
Thanks for the explanation.
Apr 13 2010