## digitalmars.D - value range propagation for logical OR

- Andrei Alexandrescu (6/6) Apr 10 2010 Consider:
- Justin Spahr-Summers (5/17) Apr 10 2010 It seems like it would use the lowest minimum and the highest maximum of...
- Andrei Alexandrescu (6/23) Apr 10 2010 I thought the same, but consider:
- Adam D. Ruppe (25/29) Apr 10 2010 min_c seems easy enough:
- Andrei Alexandrescu (12/38) Apr 10 2010 Yah, that works for unsigned types. For signed types, it's tricky (as
- Adam D. Ruppe (41/50) Apr 10 2010 My feeling is to go ahead and cast to unsigned, then do the calculation....
- bearophile (4/6) Apr 10 2010 Time ago I have even suggested to disallow bitwise ops when one or both ...
- Adam D. Ruppe (15/16) Apr 10 2010 I understand the position, but I don't advocate it just because I think ...
- bearophile (5/7) Apr 10 2010 I have never added a bug report in Bugzilla about this because I think i...
- Adam D. Ruppe (33/35) Apr 10 2010 The main() function there isn't an ideal test. Here's a better one:
- Adam D. Ruppe (7/9) Apr 10 2010 Oh no! Eyeballing again showed a problem.. I should have put parens in m...
- bearophile (21/23) Apr 10 2010 Bugs are gold nuggets! Yes, the precedence of bitwise operators is low, ...
- bearophile (4/4) Apr 10 2010 I have filed a "report":
- Andrei Alexandrescu (4/6) Apr 10 2010 Perfect, thanks. Your decision to contribute with reports is very
- Adam D. Ruppe (28/29) Apr 10 2010 I *might* have it, but I'm not 100% confident in my test program. Here's...
- Adam D. Ruppe (23/24) Apr 10 2010 I think my test program is telling me something: a non-zero min_c subtra...
- Andrei Alexandrescu (3/24) Apr 10 2010 What results does it yield with my main() test harness?
- Adam D. Ruppe (13/14) Apr 10 2010 total=100000000; equal=14585400 (14.5854%)
- Andrei Alexandrescu (10/21) Apr 10 2010 Yah, it's clear that we're not looking anywhere near 100%. I'm a bit sad...
- Adam D. Ruppe (46/53) Apr 10 2010 My brute-force program was buggy, but I think I fixed it. Here's the mai...
- Ali Çehreli (7/11) Apr 10 2010 Absolutely! :) I had just started writing an e-mail about it.
- Andrei Alexandrescu (17/23) Apr 10 2010 I think this would work:
- Rainer Deyke (12/24) Apr 10 2010 This looks wrong. Your function, if I understand it correctly, flips
- Don (4/30) Apr 10 2010 It's a very interesting puzzle. There are some pretty complex cases.
- PBa (57/84) Apr 09 2010 How about this?
- Andrei Alexandrescu (6/29) Apr 10 2010 It tries to set all bits in which maxa is zero starting and keeps them
- Adam D. Ruppe (11/12) Apr 10 2010 mina = 1
- Andrei Alexandrescu (3/12) Apr 10 2010 Sorry, I meant max of the two minima.
- Jérôme M. Berger (34/55) Apr 10 2010 In other words: maxa | ((1 << highbit (maxb))-1) where:
- Andrei Alexandrescu (68/119) Apr 10 2010 Interesting. I don't get how your version is equivalent to mine, and I
- Andrei Alexandrescu (30/30) Apr 10 2010 On 04/10/2010 05:33 PM, Andrei Alexandrescu wrote:
- Andrei Alexandrescu (18/21) Apr 10 2010 Never mind that. I mistakenly used maxOR instead of maxOR2.
- Jérôme M. Berger (57/75) Apr 11 2010 It's not equivalent to yours, because it was a bit late and I was
- Jérôme M. Berger (54/54) Apr 11 2010 OK, I found a solution that is optimal in over 99% of the cases
- Ali Çehreli (17/19) Apr 11 2010 The assumption that maxB would be the value that produces the maximum
- Jérôme M. Berger (92/116) Apr 11 2010
- Ali Çehreli (12/43) Apr 11 2010 My mistake: It was your function but the highbit() that I used was not
- Jérôme M. Berger (11/62) Apr 12 2010 Yes, like I said with my code, it is conservative. It will give the
- Ali Çehreli (6/18) Apr 12 2010 Now I see my problem: I took Andrei's puzzle literally. The implication
- Jérôme M. Berger (146/150) Apr 11 2010
- Steven Schveighoffer (5/14) Apr 11 2010 Can I ask, what is the point of having a non-exact solution (unless you ...
- Adam D. Ruppe (7/8) Apr 11 2010 I agree with this. While my solution has a certain beauty to it and is
- Andrei Alexandrescu (3/8) Apr 12 2010 Agreed. Steve's solution is the best we have for unsigned numbers.
- Jérôme M. Berger (7/23) Apr 12 2010 Nope, not anymore...
- Andrei Alexandrescu (21/40) Apr 16 2010 Looks great, thanks Jerome!
- Don (12/67) Apr 20 2010 A comment: The way DMD currently does range propagation (storing range
- Jérôme M. Berger (27/43) Apr 12 2010 a function of the compiled program. Therefore, slower but more exact f...
- Steven Schveighoffer (16/54) Apr 12 2010 My point was simply that the amount of time saved is relative to the siz...
- Jérôme M. Berger (14/37) Apr 12 2010
- Steven Schveighoffer (4/11) Apr 12 2010 Really? You rebuilt the compiler with your range propagation algorithm ...
- Jérôme M. Berger (19/33) Apr 13 2010 and verified that it adds 10 seconds versus an accurate one that adds 5...
- Steven Schveighoffer (13/41) Apr 13 2010 In my opinion? Yes, slower compilers that make code easier to write are...
- Jérôme M. Berger (17/30) Apr 13 2010
- Fawzi Mohamed (32/91) Apr 13 2010 compiler,
- bearophile (4/7) Apr 13 2010 Range propagation can improve the situation a little, so it's not bad. B...
- Fawzi Mohamed (12/22) Apr 13 2010
- Don (9/80) Apr 13 2010 It's been part of DMD2 for a while now. It allows you to do things like:
- Fawzi Mohamed (8/92) Apr 13 2010 ah ok I understand, I thought that was treated like x & cast(ubyte)7 , =...
- Jérôme M. Berger (11/32) Apr 13 2010
- Jérôme M. Berger (31/31) Apr 12 2010 Here's a fast 100% precise solution:
- Steven Schveighoffer (7/31) Apr 12 2010 Fails for test case:
- Jérôme M. Berger (10/14) Apr 12 2010 Nope, outputs 6. Note that I've run an exhaustive search for all
- Ali Çehreli (3/13) Apr 12 2010 I confirm. Jerome's code passes my randomized test.
- Steven Schveighoffer (3/12) Apr 12 2010 I'll have to double check, I thought I copied your code verbatim (I did ...
- Steven Schveighoffer (8/10) Apr 12 2010 I think I see the issue, I was using an older version of your highbit, a...
- Jérôme M. Berger (22/39) Apr 13 2010 d switch around the arguments to be minA, maxA, minB, maxB to fit my tes...
- Steven Schveighoffer (13/28) Apr 13 2010 I confirm, with the updated highbit, your solution works.
- Don (6/42) Apr 13 2010 Awesome stuff, guys.
- Adam Ruppe (17/47) Apr 13 2010 Jerome's highbit function is the same as std.intrinsic.bsr. I wonder
- Steven Schveighoffer (9/18) Apr 13 2010 Just a note, this code should be runnable in C++, because the compiler i...
- bearophile (5/7) Apr 13 2010 You can learn something from this page (it can also be useful for the tr...
- Adam D. Ruppe (9/11) Apr 13 2010 I just assumed since it was in D that it was probably
- Clemens (4/16) Apr 13 2010 That's strange. Looking at src/backend/cod4.c, function cdbscan, in the ...
- Adam D. Ruppe (12/13) Apr 13 2010 The opcode is fairly slow anyway (as far as opcodes go) - odds are the
- Jérôme M. Berger (9/17) Apr 13 2010 Thanks, but I can't claim credit for it. It's a classical algorithm
- Ellery Newcomer (6/6) Dec 04 2010 is it just me, or does this fail for
- Ellery Newcomer (25/31) Dec 04 2010 i'm thinking this is better:
- bearophile (4/7) Apr 10 2010 This is Interval arithmetic. If you are able to find a book about it... ...
- BCS (7/15) Apr 10 2010 x_bit_max == "the largest bit in x_max ^ x_min"

Consider: byte c = a | b; Say you already know min_a, max_a, min_b, and max_b. How do you compute min_c and max_c? I thought of it for a bit and it seems quite tricky. Thanks, Andrei

Apr 10 2010

On Sat, 10 Apr 2010 12:01:45 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Consider: byte c = a | b; Say you already know min_a, max_a, min_b, and max_b. How do you compute min_c and max_c? I thought of it for a bit and it seems quite tricky. Thanks, AndreiIt seems like it would use the lowest minimum and the highest maximum of the two... that's what I personally would find most natural, given the semantics of bitwise OR, but maybe I'm just way off base.

Apr 10 2010

On 04/10/2010 12:22 PM, Justin Spahr-Summers wrote:On Sat, 10 Apr 2010 12:01:45 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I thought the same, but consider: a = 4; b = 3; Then the maximum value is 7, larger than both. AndreiConsider: byte c = a | b; Say you already know min_a, max_a, min_b, and max_b. How do you compute min_c and max_c? I thought of it for a bit and it seems quite tricky. Thanks, AndreiIt seems like it would use the lowest minimum and the highest maximum of the two... that's what I personally would find most natural, given the semantics of bitwise OR, but maybe I'm just way off base.

Apr 10 2010

On Sat, Apr 10, 2010 at 12:01:45PM -0500, Andrei Alexandrescu wrote:byte c = a | b; Say you already know min_a, max_a, min_b, and max_b. How do you compute min_c and max_c? I thought of it for a bit and it seems quite tricky.min_c seems easy enough: min_c = max(min_a, min_b); When you or two values together, the result must be >= both of them, and this does the trick there. max_c is a bit harder. The obvious solution of: max_c = max(max_a, max_b); Fails because of 0b1100 | 0b0011... looking at this example, let's try: max_c = max_a + max_b; It covers that case, but is too big for 0b1111 | 0b1111. This might work: int numbits(number) { return the number of bits required to hold that number; } if ( numbits(max_a + max_b) > max(numbits(max_a), numbits(max_b)) ) max_c = 2 ^^ (max(numbits(max_a), numbits(max_b))) - 1; else max_c = max_a + max_b; The biggest we can possibly get is the sum of the two numbers. But, if it carries over to more bits than either option, we truncate it - OR never gives more bits than the two arguments. So we assume the max is the worst case of all ones or'd with all ones. I think this would work, but might not be the best we can possibly do. It is the best I can think of though. -- Adam D. Ruppe http://arsdnet.net

Apr 10 2010

(By the way I meant _bitwise_ in the title.) On 04/10/2010 12:38 PM, Adam D. Ruppe wrote:On Sat, Apr 10, 2010 at 12:01:45PM -0500, Andrei Alexandrescu wrote:Yah, that works for unsigned types. For signed types, it's tricky (as tricky as max).byte c = a | b; Say you already know min_a, max_a, min_b, and max_b. How do you compute min_c and max_c? I thought of it for a bit and it seems quite tricky.min_c seems easy enough: min_c = max(min_a, min_b); When you or two values together, the result must be>= both of them, and this does the trick there.max_c is a bit harder. The obvious solution of: max_c = max(max_a, max_b); Fails because of 0b1100 | 0b0011... looking at this example, let's try: max_c = max_a + max_b; It covers that case, but is too big for 0b1111 | 0b1111.Yah, sum is a bound for max, but not the exact max. This is because bitwise OR is like a sum without transport.This might work: int numbits(number) { return the number of bits required to hold that number; } if ( numbits(max_a + max_b)> max(numbits(max_a), numbits(max_b)) ) max_c = 2 ^^ (max(numbits(max_a), numbits(max_b))) - 1; else max_c = max_a + max_b; The biggest we can possibly get is the sum of the two numbers. But, if it carries over to more bits than either option, we truncate it - OR never gives more bits than the two arguments. So we assume the max is the worst case of all ones or'd with all ones. I think this would work, but might not be the best we can possibly do. It is the best I can think of though.Even on the branch that doesn't use the sum, that's still just a bound. Consider: a = 8; b = 4; Then max(a|b) is 12, but computed with your method is 15. Andrei

Apr 10 2010

On Sat, Apr 10, 2010 at 12:39:22PM -0500, Andrei Alexandrescu wrote:Yah, that works for unsigned types. For signed types, it's tricky (as tricky as max).My feeling is to go ahead and cast to unsigned, then do the calculation. For me at least, when doing bitwise ops, I assume it is unsigned anyway.Even on the branch that doesn't use the sum, that's still just a bound. Consider: a = 8; b = 4; Then max(a|b) is 12, but computed with your method is 15.Yeah, you're right. Bah. Maybe: max_a = max(a, b) | (2 ^^ numbits(min(a, b)) - 1); 8 | 7 == 15; Same deal, still wrong. Maybe we can throw in one more thing: let f(a, b) = the above; max_c = min(f(a, b), a | b); Let me test it... it seems to work. Here's the D program I used to brute-force the test: === import std.intrinsic; // for bsr() uint max(uint a, uint b) { return (a >= b) ? a : b; } uint min(uint a, uint b) { return (a >= b) ? b : a; } uint numbits(uint a) { return a == 0 ? a : bsr(a) + 1; } unittest { assert(numbits(0) == 0); assert(numbits(1) == 1); assert(numbits(2) == 2); assert(numbits(3) == 2); assert(numbits(4) == 3); assert(numbits(5) == 3); } uint f(uint a, uint b) { return max(a, b) | (1 << numbits(min(a, b)) - 1); } uint max_a(uint a, uint b) { return min(f(a, b), a | b); } void main() { for(uint a = 0; a <= 256; a++) for(uint b = 0; b <= 256; b++) assert(a|b <= max_a(a, b)); } ======= I compiled with the unittests, and it closed without tripping any asserts. Eyeballing the output looks right too. -- Adam D. Ruppe http://arsdnet.net

Apr 10 2010

Adam D. Ruppe:My feeling is to go ahead and cast to unsigned, then do the calculation. For me at least, when doing bitwise ops, I assume it is unsigned anyway.Some time ago I even suggested disallowing bitwise ops when one or both operands are signed... Bye, bearophile

Apr 10 2010

On Sat, Apr 10, 2010 at 02:55:36PM -0400, bearophile wrote:Time ago I have even suggested to disallow bitwise ops when one or both operands are signed...I understand the position, but I don't advocate it just because I think it would be annoying and not give much of a benefit. Consider the following: for(int a = -10; a < 10; a++) writeln(a&1 ? "odd" : "even"); I'd just be annoyed if I had to cast a to uint just to do that. Note that it works on negative numbers too, so it isn't strictly wrong to do it on signed ints. I use this when doing stripes on outputted tables and stuff like that. a % 2 does the same thing, so this specific case isn't a big deal, but I still tend to type &1 rather than %2 out of habit (this kind of thing used to make a real speed difference back in the DOS days when I learned it all!) -- Adam D. Ruppe http://arsdnet.net

Apr 10 2010

Adam D. Ruppe:I understand the position, but I don't advocate it just because I think it would be annoying and not give much of a benefit.I have never added a bug report in Bugzilla about this because I think it doesn't give enough benefit. Thank you for your answer, bye, bearophile

Apr 10 2010

On Sat, Apr 10, 2010 at 02:57:59PM -0400, Adam D. Ruppe wrote:Let me test it... it seems to work. Here's the D program I used to brute-force the test:The main() function there isn't an ideal test. Here's a better one: void main() { for(uint max_a = 0; max_a < 100; max_a++) for(uint max_b = 0; max_b < 100; max_b++) for(uint a = 0; a <= max_a; a++) for(uint b = 0; b <= max_b; b++) assert(a|b <= max_c(max_a, max_b)); } // note that I renamed uint max_a to max_c. We try a whole load of max_a and max_b and test every possible value. It still runs without throwing. My original function for min_c works too: uint min_c(uint a, uint b) { return max(a, b); } And you can brute force test it all: void main() { for(uint min_a = 0; min_a < 50; min_a++) for(uint min_b = 0; min_b < 50; min_b++) for(uint max_a = min_a; max_a < 100; max_a++) for(uint max_b = min_b; max_b < 100; max_b++) for(uint a = min_a; a <= max_a; a++) for(uint b = min_b; b <= max_b; b++) { assert(a|b <= max_c(max_a, max_b)); assert(a|b >= min_c(min_a, min_b)); } } This, as you can expect, takes a while to run, but does so without throwing. I'm pretty confident in the functions now. -- Adam D. Ruppe http://arsdnet.net

Apr 10 2010

On Sat, Apr 10, 2010 at 02:57:59PM -0400, Adam D. Ruppe wrote:Let me test it... it seems to work. Here's the D program I used to brute-force the test:Oh no! Eyeballing again showed a problem.. I should have put parens in my asserts: a | b < f(a,b) != (a|b) < f(a,b) That breaks it. Back to the drawing board. -- Adam D. Ruppe http://arsdnet.net

Apr 10 2010

Adam D. Ruppe:Oh no! Eyeballing again showed a problem.. I should have put parens in my asserts: a | b < f(a,b) != (a|b) < f(a,b)Bugs are gold nuggets! Yes, the precedence of bitwise operators is low, it's an error-prone thing, it's a part of C/C++/D that causes frequent bugs in programs written by everybody. I add extra parentheses when I use bitwise operators. Unfortunately I don't see a simple way to remove this significant source of bugs from the D2 language. When you switch on full warnings GCC warns you about few possible similar errors, suggesting to add parentheses to remove some ambiguity (for the eyes of the programmers). This is a small example in C: #include "stdio.h" #include "stdlib.h" int main() { int a = atoi("10"); int b = atoi("20"); int c = atoi("30"); printf("%u\n", a|b <= c); return 0; } If you compile it with GCC (I am using gcc 4.4.1): ...>gcc -Wall test.c -o test test.c: In function 'main': test.c:9: warning: suggest parentheses around comparison in operand of '|' You always use -Wall (and other warnings) when you write C code. So in this case the C language+compiler is able to catch a bug like yours :-) Bye, bearophile

Apr 10 2010

I have filed a "report": http://d.puremagic.com/issues/show_bug.cgi?id=4077 Bye, bearophile

Apr 10 2010

On 04/10/2010 09:55 PM, bearophile wrote:I have filed a "report": http://d.puremagic.com/issues/show_bug.cgi?id=4077Perfect, thanks. Your decision to contribute with reports is very appreciated. Andrei

Apr 10 2010

On Sat, Apr 10, 2010 at 04:02:05PM -0400, Adam D. Ruppe wrote:That breaks it. Back to the drawing board.I *might* have it, but I'm not 100% confident in my test program. Here's my implementation: ==== import std.intrinsic; uint max(uint a, uint b) { return (a >= b) ? a : b; } uint min(uint a, uint b) { return (a >= b) ? b : a; } uint numbits(uint a) { return a == 0 ? a : (bsr(a) + 1); } uint max_c(uint a, uint b) { uint mn = min(a, b); uint mx = max(a, b); if(mn == 0) return mx; // if there is a bit we can reuse, it grants us everything before it too uint reusable = mn & mx; if(reusable) return (mx | ((1 << numbits(reusable)) - 1)) | mn; else return mx | mn; } uint min_c(uint min_a, uint min_b) { return max(min_a, min_b); } ===== I can't believe I spent all day on this... -- Adam D. Ruppe http://arsdnet.net

Apr 10 2010

On Sat, Apr 10, 2010 at 07:38:07PM -0400, Adam D. Ruppe wrote:I *might* have it, but I'm not 100% confident in my test program.I think my test program is telling me something: a non-zero min_c subtracts from max_c. If I take my brute-force empirically calculated max_c and compare it to what my max_c function returns, it doesn't always match if the minimums change. This might be a bug; it is unintuitive that it would do this, but it actually makes sense. Consider if min == max, you only have one value to work with. let max_a = 4, min_a = 0; max_b = 4 You can get 7 out of it, since 3 < 4, so it is a possible number and: 100 | 011 == 7 But, if min_b == 4, the only possible value for b is 4. And: 100 | 100 == 4. Increasing min_b does indeed decrease the max you can get. Nice fact, but I've spent too much time on this already, so I'll call it done with this: My max_c included some unnecessary code: my reusable mask obsoletes all of my special case code! So, we can bring the program down to three lines: import std.intrinsic; uint numbits(uint a) { return a == 0 ? a : (bsr(a) + 1); } uint max_c(uint a, uint b) { return a | ((1 << numbits(a&b)) - 1) | b; } It passes my test, though since I bugged it up the first time, I might have done it again. -- Adam D. Ruppe http://arsdnet.net

Apr 10 2010

On 04/10/2010 06:58 PM, Adam D. Ruppe wrote:On Sat, Apr 10, 2010 at 07:38:07PM -0400, Adam D. Ruppe wrote:What results does it yield with my main() test harness? AndreiI *might* have it, but I'm not 100% confident in my test program.I think my test program is telling me something: a non-zero min_c subtracts from max_c. If I take my brute-force empirically calculated max_c and compare it to what my max_c function returns, it doesn't always match if the minimums change. This might be a bug; it is unintuitive that it would do this, but it actually makes sense. Consider if min == max, you only have one value to work with. let max_a = 4, min_a = 0; max_b = 4 You can get 7 out of it, since 3< 4, so it is a possible number and: 100 | 011 == 7 But, if min_b == 4, the only possible value for b is 4. And: 100 | 100 == 4. Increasing min_b does indeed decrease the max you can get. Nice fact, but I've spent too much time on this already, so I'll call it done with this: My max_c included some unnecessary code: my reusable mask obsoletes all of my special case code! So, we can bring the program down to three lines: import std.intrinsic; uint numbits(uint a) { return a == 0 ? a : (bsr(a) + 1); } uint max_c(uint a, uint b) { return a | ((1<< numbits(a&b)) - 1) | b; } It passes my test, though since I bugged it up the first time, I might have done it again.

Apr 10 2010

On Sat, Apr 10, 2010 at 08:42:54PM -0500, Andrei Alexandrescu wrote:What results does it yield with my main() test harness?total=100000000; equal=14585400 (14.5854%) I think that's a perfect result. Here's why: the equality comparison checks two specific values, (a, b), but the maxOr functions return the max for the interval ([0, a], [0, b]). It wouldn't always be equal, since there are other values in there that can reach the max. For example, a = 4, b = 4. 100|100 == 100, but that range also includes 4|3, 100 | 011 == 111, which is the max. Your maxOR function gives the same percentage, for probably the same reason. Though, mine runs ~7x faster on my box. -- Adam D. Ruppe http://arsdnet.net

Apr 10 2010

On 04/10/2010 08:58 PM, Adam D. Ruppe wrote:On Sat, Apr 10, 2010 at 08:42:54PM -0500, Andrei Alexandrescu wrote:Yah, it's clear that we're not looking anywhere near 100%. I'm a bit sad we don't have an adequate baseline, but the fact that two distinct methods obtained the same result is encouraging.What results does it yield with my main() test harness?total=100000000; equal=14585400 (14.5854%) I think that's a perfect result. Here's why: the equality comparison checks two specific values, (a, b), but the maxOr functions return the max for the interval ([0, a], [0, b]). It wouldn't always be equal, since there are other values in there that can reach the max.For example, a = 4, b = 4. 100|100 == 100, but that range also includes 4|3, 100 | 011 == 111, which is the max. Your maxOR function gives the same percentage, for probably the same reason. Though, mine runs ~7x faster on my box.Then you haven't wasted your day! On to the next task: compute minOR and maxOR for _signed_ values. It turns out minOR is also nontrivial... When we're done, we can pass the functions to Walter so he can integrate them within his compiler. Right now the compiler uses a wrong algorithm. Andrei

Apr 10 2010

On Sat, Apr 10, 2010 at 09:21:07PM -0500, Andrei Alexandrescu wrote:Yah, it's clear that we're not looking anywhere near 100%. I'm a bit sad we don't have an adequate baseline, but the fact that two distinct methods obtained the same result is encouraging.My brute-force program was buggy, but I think I fixed it. Here's the main(): ======== uint min_a = 0; uint min_b = 0; for(uint max_a = min_a; max_a <= 256; max_a++) for(uint max_b = min_b; max_b <= 256; max_b++) { uint real_max = 0; for(uint a = min_a; a <= max_a; a++) for(uint b = min_b; b <= max_b; b++) { assert((a|b) <= max_c(max_a, max_b), format("%b | %b !<= %b (from %b %b)", a, b, max_c(max_a, max_b), max_a, max_b)); assert((a|b) >= min_c(min_a, min_b)); if((a | b) > real_max) real_max = (a|b); } assert(max_c(max_a, max_b) == real_max, format("[%b, %b]. expected: %d, got %d (min == %d)", max_a, max_b, max_c(max_a, max_b), real_max, min_c(min_a, min_b))); } writeln("success"); ======= With so many nested loops, it is slow as dirt, but what it does is try every value in the given range and record the maximum number encountered. If it doesn't match at any step, it throws. But, it works for me. I spent a chunk of the day eyeballing the output too to ensure it is right (and to figure out the pattern that led to the final function). Now, the thing is if you change min_a or min_b, it breaks the real_max; like I said in the other message, if min == max, for example, you get the case where 4|4 isn't the max, since 4|3 isn't available. I'm not sure how to fix that, but I am pretty convinced that to get the perfect solution, the max_c function needs to know min_a and min_b too. Still, this solution works very well in the min_a = min_b = 0 case, so it is at least a decent bound.On to the next task: compute minOR and maxOR for _signed_ values. It turns out minOR its also nontrivial...Nontrivial is an understatement. I started asking if it can even be done without assuming a size.. 
I do think you can, by assuming the sizes are equal, whatever they are, but it still is not easy. I still like the idea of just casting it. How often would it cause problems in real code anyway? The only starting point I have is: If either max_a or max_b is negative, the result of the OR is going to be negative, since that sign bit is one, and we can't change a one to a zero on any OR operation. So, the max value will be positive iff both values are possibly positive. But taking that one statement to working code is too hard for my blood.When we're done, we can pass the functions to Walter so he can integrate them within his compiler. Right now the compiler uses a wrong algorithm.Indeed. We're getting there, but still a ways to go. -- Adam D. Ruppe http://arsdnet.net

Apr 10 2010

Adam D. Ruppe wrote:I am pretty convinced that to get the perfect solution, the max_c function needs to know min_a and min_b too.Absolutely! :) I had just started writing an e-mail about it. Andrei Alexandrescu wrote:How about computing minOR for unsigned values first? ;) The proposed solution of max(min_a, min_b) is correct only for a couple of corner cases. AliOn to the next task: compute minOR and maxOR for _signed_ values. It turns out minOR its also nontrivial...

Apr 10 2010

On 04/10/2010 12:01 PM, Andrei Alexandrescu wrote:

Apr 10 2010

On 4/10/2010 11:52, Andrei Alexandrescu wrote:I think this would work: uint maxOR(uint maxa, uint maxb) { if (maxa < maxb) return maxOR(maxb, maxa); uint candidate = 0; foreach (i, bit; byBitDescending(maxa)) { if (bit) continue; auto t = candidate | (1 << (31 - i)); if (t <= maxb) candidate = t; } return maxa | candidate; }This looks wrong. Your function, if I understand it correctly, flips all the bits in 'maxb' (excluding the leading zeros). If 'maxb' is exactly 1 less than a power of 2, then 'candidate' will be 0. Now consider a in [0, 16], b in [0, 15]. Your function will produce 16, but the real maximum is 31. For maximum accuracy, you have to consider the minima as well as the maxima when calculating the new maximum. With 'a' in [16, 16] and 'b' in [16, 16], the new range can only be [16, 16]. With 'a' in [0, 16] and 'b' in [0, 16], the correct new range is [0, 31]. -- Rainer Deyke - rainerd eldwood.com

Apr 10 2010

Rainer Deyke wrote:On 4/10/2010 11:52, Andrei Alexandrescu wrote:It's a very interesting puzzle. There are some pretty complex cases. Consider (all numbers binary) a in [1010, 1100] and b in [101,101]. The maximum value is when a=1011, b=101, so max is 1111.I think this would work: uint maxOR(uint maxa, uint maxb) { if (maxa < maxb) return maxOR(maxb, maxa); uint candidate = 0; foreach (i, bit; byBitDescending(maxa)) { if (bit) continue; auto t = candidate | (1 << (31 - i)); if (t <= maxb) candidate = t; } return maxa | candidate; }This looks wrong. Your function, if I understand it correctly, flips all the bits in 'maxb' (excluding the leading zeros). If 'maxb' is exactly 1 less than a power of 2, then 'candidate' will be 0. Now consider a in [0, 16], b in [0, 15]. Your function will produce 16, but the real maximum is 31. For maximum accuracy, you have to consider the minima as well as the maxima when calculating the new maximum. With 'a' in [16, 16] and 'b' in [16, 16], the new range can only be [16, 16]. With 'a' in [0, 16] and 'b' in [0, 16], the correct new range is [0, 31].

Apr 10 2010

Am 10.04.2010, 23:17 Uhr, schrieb Don <nospam nospam.com>:Rainer Deyke wrote:How about this? import std.stdio, std.conv, std.intrinsic; void main(string[] args) { int total = 0, match = 0, from = to!(int)(args[1]), to = to!(int)(args[2]); uint res, resBF; for(int i = from; i < to; ++i) { for(int j = i; j < to; ++j) { for(int k = from; k < to; ++k) { for(int m = k; m < to; ++m) { ++total; res = maxOr(i, j, k, m); resBF = maxOrBF(i, j, k, m); if(res == resBF) ++match; //else writefln("mismatch: minA %s, maxA %s, minB %s, maxB %s, maxOr %s, maxOrBF %s", i, j, k, m, res, resBF); } } } } writefln("total %s, match %s", total, match); } uint max(uint a, uint b) { return (a > b) ? a : b; } uint maxOr(uint minA, uint maxA, uint minB, uint maxB) { uint result = minA | maxA | minB | maxB, diff = max(maxA - minA, maxB - minB); if(diff) { result |= (1 << (bsr(diff) + 1)) - 1; } return result; } /* compute result using brute force */ uint maxOrBF(uint minA, uint maxA, uint minB, uint maxB) { return bitsSetInIntervallBF(minA, maxA) | bitsSetInIntervallBF(minB, maxB); } uint bitsSetInIntervallBF(uint min, uint max) { uint result = 0; for(; min <= max; ++min) { result |= min; } return result; }On 4/10/2010 11:52, Andrei Alexandrescu wrote:It's a very interesting puzzle. There are some pretty complex cases. Consider (all numbers binary) a in [1010, 1100] and b in [101,101]. The maximum value is when a=1011, b=101, so max is 1111.I think this would work: uint maxOR(uint maxa, uint maxb) { if (maxa < maxb) return maxOR(maxb, maxa); uint candidate = 0; foreach (i, bit; byBitDescending(maxa)) { if (bit) continue; auto t = candidate | (1 << (31 - i)); if (t <= maxb) candidate = t; } return maxa | candidate; }This looks wrong. Your function, if I understand it correctly, flips all the bits in 'maxb' (excluding the leading zeros). If 'maxb' is exactly 1 less than a power of 2, then 'candidate' will be 0. Now consider a in [0, 16], b in [0, 15]. 
Your function will produce 16, but the real maximum is 31. For maximum accuracy, you have to consider the minima as well as the maxima when calculating the new maximum. With 'a' in [16, 16] and 'b' in [16, 16], the new range can only be [16, 16]. With 'a' in [0, 16] and 'b' in [0, 16], the correct new range is [0, 31].

Apr 09 2010

On 04/10/2010 01:55 PM, Rainer Deyke wrote:On 4/10/2010 11:52, Andrei Alexandrescu wrote:It tries to set all bits in which maxa is zero, starting from the most significant one, and keeps them set if the resulting number is less than maxb.I think this would work: uint maxOR(uint maxa, uint maxb) { if (maxa < maxb) return maxOR(maxb, maxa); uint candidate = 0; foreach (i, bit; byBitDescending(maxa)) { if (bit) continue; auto t = candidate | (1 << (31 - i)); if (t <= maxb) candidate = t; } return maxa | candidate; }This looks wrong. Your function, if I understand it correctly, flips all the bits in 'maxb' (excluding the leading zeros).If 'maxb' is exactly 1 less than a power of 2, then 'candidate' will be 0. Now consider a in [0, 16], b in [0, 15]. Your function will produce 16, but the real maximum is 31.If maxb is 2^^N-1, then it will fill all nonzero bits of maxa.For maximum accuracy, you have to consider the minima as well as the maxima when calculating the new maximum. With 'a' in [16, 16] and 'b' in [16, 16], the new range can only be [16, 16]. With 'a' in [0, 16] and 'b' in [0, 16], the correct new range is [0, 31].The minimum for unsigned numbers is very simple: min(mina, minb). Andrei

Apr 10 2010

On Sat, Apr 10, 2010 at 04:05:06PM -0500, Andrei Alexandrescu wrote:

The minimum for unsigned numbers is very simple: min(mina, minb).

mina = 1, minb = 0: min(1, 0) == 0, but 1|0 == 1.

max(mina, minb) looks to work, though. Consider that ORing can only set bits - it can never take a one and spit out a zero. Therefore, x|y >= x && x|y >= y, which is consistent only with max(mina, minb).

-- 
Adam D. Ruppe
http://arsdnet.net

Apr 10 2010

On 04/10/2010 04:21 PM, Adam D. Ruppe wrote:

mina = 1, minb = 0: min(1, 0) == 0, but 1|0 == 1. max(mina, minb) looks to work though. Consider that ORing can only set bits - it can never take a one and spit out a zero. Therefore, x|y >= x && x|y >= y, which only works with max(mina, minb).

Sorry, I meant max of the two minima.

Andrei
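Adam's monotonicity argument (ORing can only set bits) makes max(mina, minb) a safe lower bound for the OR of two unsigned ranges, but not always a tight one. A brute-force sketch in Python (the helper names are mine, not from the thread) checks both halves of that claim:

```python
def min_or_bound(min_a, min_b):
    # conservative lower bound: a|b >= a and a|b >= b,
    # so the minimum over both ranges is at least max(min_a, min_b)
    return max(min_a, min_b)

def min_or_bf(min_a, max_a, min_b, max_b):
    # exact minimum of a|b by brute force
    return min(a | b for a in range(min_a, max_a + 1)
                     for b in range(min_b, max_b + 1))

# the bound never exceeds the true minimum...
assert all(min_or_bound(ia, ib) <= min_or_bf(ia, ja, ib, jb)
           for ia in range(8) for ja in range(ia, 8)
           for ib in range(8) for jb in range(ib, 8))

# ...but it is not always attained: a in [1,1], b in [2,2] forces a|b == 3,
# while the bound only promises 2
assert min_or_bf(1, 1, 2, 2) == 3
assert min_or_bound(1, 2) == 2
```

So "max of the two minima" is sound for value range propagation (it never claims a value below the real minimum), just not exact.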

Apr 10 2010

Andrei Alexandrescu wrote:

It tries to set all bits in which maxa is zero and keeps them set if the resulting number is less than maxb.

In other words:

maxa | ((1 << highbit (maxb))-1)

where:

int highbit (uint n) {
    auto result = 0;
    if (n & 0xFFFF0000) { result += 16; n = n >> 16; }
    if (n & 0xFF00)     { result += 8;  n = n >> 8; }
    if (n & 0xF0)       { result += 4;  n = n >> 4; }
    if (n & 0xC)        { result += 2;  n = n >> 2; }
    if (n & 0x2)        { result += 1;  n = n >> 1; }
    if (n & 0x1)        { result += 1; }
    return result;
}

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 10 2010

On 04/10/2010 04:50 PM, "Jérôme M. Berger" wrote:

In other words: maxa | ((1 << highbit (maxb))-1) [definition of highbit snipped]

Interesting. I don't get how your version is equivalent to mine, and I don't get how it actually works, but it seems to. Could you please explain?

The program below compiles, runs, and prints

total=100000000; equal=3850865 (3.85087%)

Here it is:

import std.contracts, std.stdio;

uint maxOR(uint maxa, uint maxb) {
    return maxa | ((1 << highbit(maxb)) - 1);
}

uint highbit(uint n) {
    auto result = 0;
    if (n & 0xFFFF0000) { result += 16; n = n >> 16; }
    if (n & 0xFF00)     { result += 8;  n = n >> 8; }
    if (n & 0xF0)       { result += 4;  n = n >> 4; }
    if (n & 0xC)        { result += 2;  n = n >> 2; }
    if (n & 0x2)        { result += 1;  n = n >> 1; }
    if (n & 0x1)        { result += 1; }
    return result;
}

void main() {
    ulong total, equal;
    uint N = 10000;
    foreach (a; 0 .. N) {
        foreach (b; 0 .. N) {
            auto c = maxOR(a, b);
            enforce((a|b) <= c);
            if ((a|b) == c) equal++;
            total++;
        }
    }
    writeln("total=", total, "; equal=", equal,
            " (", equal * 100. / total, "%)");
}

However, your method is not the tightest; seems like mine is tighter. When I replaced maxOR with this:

uint maxOR2(uint maxa, uint maxb) {
    if (maxa < maxb) return maxOR(maxb, maxa);
    uint candidate = 0;
    uint mask = 1u << 31;
    for (uint i = 0; i < 32; ++i, mask >>= 1) {
        if (maxa & mask) continue;
        auto t = candidate | (1 << (31 - i));
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}

I got:

total=100000000; equal=9822516 (9.82252%)

Besides, I verified that my method returns numbers <= yours.

Andrei

Apr 10 2010

On 04/10/2010 05:33 PM, Andrei Alexandrescu wrote:
[snip]

One more interesting detail. I simplified the routine to:

uint maxOR(uint maxa, uint maxb) {
    uint candidate = 0;
    uint mask = 1u << 31;
    for (; mask; mask >>= 1) {
        if (maxa & mask) continue;
        auto t = candidate | mask;
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}

Among other things I removed the test whether maxa >= maxb at the beginning of the function. I now obtain:

total=100000000; equal=14585400 (14.5854%)

so this function is better than all others so far. But I don't understand why it beats this one:

uint maxOR(uint maxa, uint maxb) {
    if (maxa < maxb) return maxOR(maxb, maxa); // added
    uint candidate = 0;
    uint mask = 1u << 31;
    for (; mask; mask >>= 1) {
        if (maxa & mask) continue;
        auto t = candidate | mask;
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}

Andrei

Apr 10 2010

On 04/10/2010 05:43 PM, Andrei Alexandrescu wrote:

One more interesting detail.

Never mind that. I mistakenly used maxOR instead of maxOR2. To summarize: the best result is

total=100000000; equal=14585400 (14.5854%)

with the function:

uint maxOR(uint maxa, uint maxb) {
    uint candidate = 0;
    uint mask = 1u << 31;
    for (; mask; mask >>= 1) {
        if (maxa & mask) continue;
        auto t = candidate | mask;
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}

Inserting a test and a recursive call at the top keeps the result but actually slows down execution a bit.

Andrei
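For readers following along, here is a Python transcription of that final greedy loop (a sketch, not code from the thread), with the 32-bit mask made explicit, checked against brute force. Note it only looks at the maxima; for ranges whose minima are both zero the greedy result is actually attained by the pair a = maxa, b = candidate, so it is exact in that special case:

```python
def max_or(max_a, max_b):
    # greedy: for each zero bit of max_a from the top down, try to
    # claim it via b, keeping the partial candidate <= max_b
    candidate = 0
    mask = 1 << 31
    while mask:
        if not (max_a & mask):
            t = candidate | mask
            if t <= max_b:
                candidate = t
        mask >>= 1
    return max_a | candidate

def max_or_bf(max_a, max_b):
    # true maximum of a|b over [0, max_a] x [0, max_b], by brute force
    return max(a | b for a in range(max_a + 1) for b in range(max_b + 1))

# with both minima at zero the greedy bound appears exact
# (verified here exhaustively for small ranges)
assert all(max_or(a, b) == max_or_bf(a, b)
           for a in range(32) for b in range(32))
```

When the minima are nonzero the function is still a sound upper bound but may be loose, which is exactly the gap the rest of the thread chips away at.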

Apr 10 2010

Andrei Alexandrescu wrote:

Interesting. I don't get how your version is equivalent to mine, and I don't get how it actually works, but it seems to. Could you please explain?

It's not equivalent to yours, because it was a bit late and I was tired. Moreover it doesn't work: it should be:

maxa | ((1 << (highbit (maxb)+1))-1)

Which still isn't as tight as yours but at least works :) Between 0 and 64, it is optimal for 92% of cases whereas yours is optimal for over 98% of cases.

[Andrei's test harness snipped]

Note that this is incomplete. Besides, it is very easy to get 100% accuracy with this test: just define maxOR to be maxA|maxB :) I used the following C++ code for my tests:

int main (int, char**)
{
    int count = 0;
    int countTight = 0;
    for (uint32_t minA = 0; minA < 64; ++minA)
    for (uint32_t maxA = minA; maxA < 64; ++maxA)
    for (uint32_t minB = 0; minB < 64; ++minB)
    for (uint32_t maxB = minB; maxB < 64; ++maxB) {
        bool reached = false;
        uint32_t max = maxOr (minA, minB, maxA, maxB);
        for (uint32_t a = minA; a <= maxA; ++a)
        for (uint32_t b = minB; b <= maxB; ++b) {
            if ((a|b) > max) {
                printf ("minA=%x\n" "maxA=%x\n" "minB=%x\n" "maxB=%x\n"
                        "a=%x\n" "b=%x\n" "a|b=%x\n" "maxOr=%x\n",
                        minA, maxA, minB, maxB, a, b, a|b, max);
                exit (1);
            }
            if ((a|b) == max) reached = true;
        }
        if (reached) ++countTight;
        ++count;
    }
    printf ("Total: %d\n", count);
    printf ("Tight: %d (%g%%)\n", countTight, countTight * 100. / count);
    printf ("Loose: %d (%g%%)\n", (count - countTight),
            (count - countTight) * 100. / count);
    return 0;
}

OTOH, my solution was slightly faster (4.98s to your 5.34s)

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 11 2010

OK, I found a solution that is optimal in over 99% of the cases (99.195% to be precise), while still being fast (because it avoids looping over the bits):

==============================8<------------------------------
uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;
    uint32_t a = maxA ^ minA;
    if (a != 0) a = ((1 << highbit (a)) - 1);
    uint32_t b = maxB ^ minB;
    if (b != 0) b = ((1 << highbit (b)) - 1);
    uint32_t t = maxA & maxB;
    if (t != 0) t = ((1 << highbit (t)) - 1);
    return min (maxA|maxB|a|b, maxA|maxB|t);
}
------------------------------>8==============================

Mostly it's black magic ;) and it's **much** simpler than the first version that reached those performances. The explanation is in the rest of the message.

I will only talk about the ``a`` side of the problem here (i.e. assume ``maxB==minB``). The ``b`` side is symmetrical. The idea is to build a value that is between minA and maxA and will set as many bits as possible when or'ed with maxB:

- The first thing we do is to notice that minA and maxA have the following structure (in binary): ``minA=A0x`` and ``maxA=A1y`` where the ``A`` part is identical. Any number between minA and maxA will therefore be of the form ``a=Az``. A very conservative estimate tells us that if we set ``z`` to all ones, then ``maxB|maxA|z`` will be greater than ``maxB|a`` for all ``a``. This is computed by ``(1 << highbit (maxA ^ minA)) - 1``;

- Another way to look at the problem is to say that a ``0`` bit in maxA cannot be set unless some higher ``1`` bit is unset. For example if maxA is ``0b10`` then if we want to set bit 0, we need to unset bit 1 (which will give us ``0b01``). So by doing this we get a lower value for ``a|maxB`` unless this bit was already set in maxB. The expression ``maxA & maxB`` gives us the bits that we can unset. Therefore only bits that are less significant than the high bit of ``maxA & maxB`` may be set. This is stored into the variable ``t``;

- Either method works, but neither is perfect. By taking the minimum of the two results, we get the best of both worlds (although it still isn't perfect).

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 11 2010

Jérôme M. Berger wrote:

The idea is to build a value that is between minA and maxA and will set as many bits as possible when or'ed with maxB:

The assumption that maxB would be the value that produces the maximum a|b is not correct. A lower valued b may fill more gaps in the bit representation of what is calculated from min_a and max_a.

Your function failed for me with the following values:

min_a      00000000000000000000000011001000 000000c8 200
max_a      00000000000000000000001100001111 0000030f 783
min_b      00000000000000000000000001000101 00000045 69
max_b      00000000000000000000001001100001 00000261 609
calculated 00000000000000000000001001100101 00000265 613 WRONG!
empirical  00000000000000000000001111111111 000003ff 1023
emp_max_a  00000000000000000000000110011110 0000019e 414
emp_max_b  00000000000000000000001001100001 00000261 609

Please see my test code elsewhere in the same thread: :)

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108851

Ali
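Ali's empirical row is easy to reproduce. A small Python brute-force sweep (a sketch, not his actual test program) over the quoted ranges finds the same maximum, confirming that a b below max_b (here b = 609 with a = 414, among others) can fill every bit position:

```python
def empirical_max_or(min_a, max_a, min_b, max_b):
    # exhaustively search for the largest a|b and one witness pair
    best, best_pair = -1, None
    for a in range(min_a, max_a + 1):
        for b in range(min_b, max_b + 1):
            if a | b > best:
                best, best_pair = a | b, (a, b)
    return best, best_pair

best, (wa, wb) = empirical_max_or(200, 783, 69, 609)
assert best == 1023        # matches the "empirical" row: 0x3ff
assert wa | wb == 1023     # the witness pair really reaches the maximum
```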

Apr 11 2010

Ali Çehreli wrote:

The assumption that maxB would be the value that produces the maximum a|b is not correct. A lower valued b may fill more gaps in the bit representation of what is calculated from min_a and max_a. Your function failed for me with the following values: [snip]

The "calculated" value above obviously was not computed with my function! Since the return value from my function includes maxA and maxB, at least all bits that are set in either of those should be set in the output. I've run my code with those input and the result is 3ff as expected... (See attached source file).

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Attachment: maxor.cc

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int highbit (uint32_t n)
{
    assert (n != 0);
    int result = 0;
    if (n & 0xFFFF0000) { result += 16; n >>= 16; }
    if (n & 0xFF00)     { result += 8;  n >>= 8; }
    if (n & 0xF0)       { result += 4;  n >>= 4; }
    if (n & 0xC)        { result += 2;  n >>= 2; }
    if (n & 0x2)        { result += 1;  n >>= 1; }
    return result;
}

inline uint32_t min (uint32_t a, uint32_t b)
{
    return (a < b) ? a : b;
}

uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
    assert (minA <= maxA);
    assert (minB <= maxB);
    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;
    uint32_t a = maxA ^ minA;
    if (a != 0) a = ((1 << highbit (a)) - 1);
    uint32_t b = maxB ^ minB;
    if (b != 0) b = ((1 << highbit (b)) - 1);
    uint32_t t = maxA & maxB;
    if (t != 0) t = ((1 << highbit (t)) - 1);
    return min (maxA|maxB|a|b, maxA|maxB|t);
}

int main (int, char**)
{
    uint32_t minA = 200;
    uint32_t maxA = 783;
    uint32_t minB = 69;
    uint32_t maxB = 609;
    printf ("minA=%x\n" "maxA=%x\n" "minB=%x\n" "maxB=%x\n" "maxOr=%x\n",
            minA, maxA, minB, maxB, maxOr (minA, minB, maxA, maxB));
    return 0;
}

Apr 11 2010

Jérôme M. Berger wrote:

The "calculated" value above obviously was not computed with my function!

My mistake: It was your function, but the highbit() that I used was not correct. The one I used was returning the *value* of the highest bit.

Since the return value from my function includes maxA and maxB, at least all bits that are set in either of those should be set in the output. I've run my code with those input and the result is 3ff as expected... (See attached source file).

Perhaps my test code is wrong; but I can't find my error. :/

Your function reports 0b_111 for these set of values:

min_a = 5;
max_a = 6;
min_b = 4;
max_b = 4;

But the maximum value of a|b is 6.

Ali

Apr 11 2010

Ali Çehreli wrote:

My mistake: It was your function but the highbit() that I used was not correct. The one I used was returning the *value* of the highest bit.

Perhaps my test code is wrong; but I can't find my error. :/

Your function reports 0b_111 for these set of values:

min_a = 5; max_a = 6; min_b = 4; max_b = 4;

But the maximum value of a|b is 6.

Yes, like I said with my code, it is conservative. It will give the optimal result for over 99% of the cases but give a slightly higher value for the rest. I'll post a new version that's 100% precise in a few minutes.

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 12 2010

Jérôme M. Berger wrote:

Your function reports 0b_111 for these set of values: min_a = 5; max_a = 6; min_b = 4; max_b = 4; But the maximum value of a|b is 6.

Yes, like I said with my code, it is conservative. It will give the optimal result for over 99% of the cases but give a slightly higher value for the rest.

Now I see my problem: I took Andrei's puzzle literally. The implication of _compiler_implementation_ was too subtle for me. That's the reason why none of my "correctly inaccurate" solutions has been posted to this thread. :)

Ali

Apr 12 2010

Jérôme M. Berger wrote:

OK, I found a solution that is optimal in over 99% of the cases (99.195% to be precise), while still being fast (because it avoids looping over the bits):

I've run my function and Andrei's on all possible min, max pairs in 0..299 without checking, just for the timing. On my computer (Athlon X2 64 2GHz), my function took 50s and Andrei's took 266s. The difference should be even bigger for larger numbers.

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Attachment: maxor.cc

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define N 300

// Comment the following line to get Andrei's implementation
#define MINE
// Comment the following line for exhaustive checking (slow! don't
// forget to reduce N)
#define BENCH

int highbit (uint32_t n)
{
    assert (n != 0);
    int result = 0;
    if (n & 0xFFFF0000) { result += 16; n >>= 16; }
    if (n & 0xFF00)     { result += 8;  n >>= 8; }
    if (n & 0xF0)       { result += 4;  n >>= 4; }
    if (n & 0xC)        { result += 2;  n >>= 2; }
    if (n & 0x2)        { result += 1;  n >>= 1; }
    return result;
}

inline uint32_t min (uint32_t a, uint32_t b)
{
    return (a < b) ? a : b;
}

#ifdef MINE
uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
    assert (minA <= maxA);
    assert (minB <= maxB);
    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;
    uint32_t a = maxA ^ minA;
    if (a != 0) a = ((1 << highbit (a)) - 1);
    uint32_t b = maxB ^ minB;
    if (b != 0) b = ((1 << highbit (b)) - 1);
    uint32_t t = maxA & maxB;
    if (t != 0) t = ((1 << highbit (t)) - 1);
    return min (maxA|maxB|a|b, maxA|maxB|t);
}
#else // Andrei's
uint32_t maxOr (uint32_t, uint32_t, uint32_t maxa, uint32_t maxb)
{
    uint32_t candidate = 0;
    uint32_t mask = 1u << 31;
    for (; mask; mask >>= 1) {
        if (maxa & mask) continue;
        uint32_t t = candidate | mask;
        if (t <= maxb) candidate = t;
    }
    return maxa | candidate;
}
#endif

int main (int, char**)
{
    int count = 0;
    int countTight = 0;
    for (uint32_t minA = 0; minA < N; ++minA)
    for (uint32_t maxA = minA; maxA < N; ++maxA)
    for (uint32_t minB = 0; minB < N; ++minB)
    for (uint32_t maxB = minB; maxB < N; ++maxB) {
        bool reached = false;
        uint32_t max = maxOr (minA, minB, maxA, maxB);
#ifdef BENCH
        count += max;
#else // Check
        for (uint32_t a = minA; a <= maxA; ++a)
        for (uint32_t b = minB; b <= maxB; ++b) {
            if ((a|b) > max) {
                printf ("minA=%x\n" "maxA=%x\n" "minB=%x\n" "maxB=%x\n"
                        "a=%x\n" "b=%x\n" "a|b=%x\n" "maxOr=%x\n",
                        minA, maxA, minB, maxB, a, b, a|b, max);
                exit (1);
            }
            if ((a|b) == max) reached = true;
        }
        if (reached) ++countTight;
        ++count;
#endif
    }
#ifdef BENCH
    printf ("Result: %d (this number doesn't mean anything, it is only here\n"
            "to make sure the compiler doesn't optimize everything away...\n",
            count);
#else
    printf ("Total: %d\n", count);
    printf ("Tight: %d (%g%%)\n", countTight, countTight * 100. / count);
    printf ("Loose: %d (%g%%)\n", (count - countTight),
            (count - countTight) * 100. / count);
#endif
    return 0;
}

Apr 11 2010

Jérôme M. Berger wrote:

OK, I found a solution that is optimal in over 99% of the cases (99.195% to be precise), while still being fast (because it avoids looping over the bits):

I've run my function and Andrei's on all possible min, max pairs in 0..299 without checking, just for the timing. On my computer (Athlon X2 64 2GHz), my function took 50s and Andrei's took 266s. The difference should be even bigger for larger numbers.

Can I ask, what is the point of having a non-exact solution (unless you are just doing this for fun)?

We are talking about range propagation, a function of the compiler, not a function of the compiled program. Therefore, slower but more exact functions should be preferred, since they only affect the compile time. Note that you will only need to do this range propagation on lines that "or" two values together, and something that reduces the runtime of the compiler by 216 seconds, but only when compiling enough code to have over 8 billion 'or' operations in it (300^4), I don't think is really that important. Give me the exact solution that prevents me from having to cast when the compiler can prove it, even if the runtime of the compiler is slightly slower.

Just my $.02

-Steve

Apr 11 2010

On Sun, Apr 11, 2010 at 10:00:41PM -0400, Steven Schveighoffer wrote:

We are talking about range propagation, a function of the compiler, not a function of the compiled program. Therefore, slower but more exact functions should be preferred, since they only affect the compile time.

I agree with this. While my solution has a certain beauty to it and is fast, your solution accurately covers all the bases, whereas mine fails with min > 1. I say we go with yours.

-- 
Adam D. Ruppe
http://arsdnet.net

Apr 11 2010

On 04/11/2010 09:18 PM, Adam D. Ruppe wrote:

I agree with this. While my solution has a certain beauty to it and is fast, your solution accurately covers all the bases, whereas mine fails with min > 1. I say we go with yours.

Agreed. Steve's solution is the best we have for unsigned numbers.

Andrei

Apr 12 2010

Andrei Alexandrescu wrote:

Agreed. Steve's solution is the best we have for unsigned numbers.

Nope, not anymore...

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 12 2010

On 04/12/2010 12:57 PM, "Jérôme M. Berger" wrote:

Nope, not anymore...

Looks great, thanks Jerome!

Now, how about that case in which some or all of the ranges involved include negative values? A simple approach is to define them in terms of ranges for unsigned numbers. Here are the cases I identified:

a) If both ranges cross zero, then:

minOR == setmsb(min(
    minOR(nomsb(min_a), nomsb(-1), nomsb(min_b), nomsb(max_b)),
    minOR(nomsb(min_a), nomsb(max_a), nomsb(min_b), nomsb(-1))))
maxOR == maxOR(0, max_a, 0, max_b)

b) If both ranges are negative, then:

minOR == setmsb(minOR(nomsb(max_a), nomsb(min_a), nomsb(max_b), nomsb(min_b)))
maxOR == setmsb(maxOR(nomsb(max_a), nomsb(min_a), nomsb(max_b), nomsb(min_b)))

c) If a crosses zero and b doesn't:

minOR == setmsb(minOR(nomsb(min_a), nomsb(-1), min_b, max_b))
maxOR == maxOR(0, max_a, min_b, max_b)

The primitive nomsb clears off the sign bit in a number, and setmsb sets it. Makes sense?

Andrei
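The two primitives are simple bit operations on the fixed-width representation. Here is a Python sketch for 32-bit values (the names follow the post; the wrap-around arithmetic is my assumption about the intended two's-complement reinterpretation), along with a check of the fact case (b) leans on: ORing two negative numbers keeps the sign bit set, so the magnitude bits reduce to an unsigned subproblem:

```python
MSB = 1 << 31

def to_u32(x):
    # two's-complement bit pattern of a 32-bit signed value
    return x & 0xFFFFFFFF

def nomsb(x):
    # clear the sign bit, leaving the 31 magnitude bits
    return to_u32(x) & ~MSB

def setmsb(x):
    # set the sign bit and reinterpret the pattern as signed
    u = to_u32(x) | MSB
    return u - (1 << 32)

# for two negative operands, a|b is negative and its magnitude bits
# are exactly the OR of the operands' magnitude bits
assert all(setmsb(nomsb(a) | nomsb(b)) == (a | b)
           for a in range(-16, 0) for b in range(-16, 0))
```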

Apr 16 2010

Andrei Alexandrescu wrote:

[case analysis for negative ranges snipped]

A comment: The way DMD currently does range propagation (storing the range as union{long, ulong}) is wrong. Currently when mixed signed/unsigned operations happen, it gives very pessimistic results. The sign bit should be stored separately:

sign  value
1     1xxxxx   Negative, long.min..-1
1     0xxxxx   (Never occurs)
0     0xxxxx   0..long.max
0     1xxxxx   long.max+1..ulong.max

I'm not sure if that makes this case a little more, or a little less complicated. But this defines the problem a bit better.
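Don's table decodes mechanically; a Python sketch of my reading of it (arbitrary-precision ints stand in for the 64-bit value field):

```python
LONG_MAX = (1 << 63) - 1

def decode(sign, value):
    # 'value' is a 64-bit pattern; 'sign' is the separately stored sign bit
    msb_set = bool(value >> 63)
    if sign:
        if not msb_set:
            raise ValueError("sign=1 with clear msb never occurs")
        return value - (1 << 64)   # Negative, long.min..-1
    return value                   # 0..long.max, or long.max+1..ulong.max

assert decode(1, (1 << 64) - 1) == -1          # all-ones pattern, sign set
assert decode(0, LONG_MAX) == LONG_MAX         # fits in long
assert decode(0, 1 << 63) == 1 << 63           # > long.max, valid as ulong
```

The point of the extra sign bit is that the same 64-bit pattern can mean either a negative long or a large ulong, and the propagation code can tell which without guessing.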

Apr 20 2010

Steven Schveighoffer wrote:

Can I ask, what is the point of having a non-exact solution (unless you are just doing this for fun)? We are talking about range propagation, a function of the compiler, not a function of the compiled program. Therefore, slower but more exact functions should be preferred, since they only affect the compile time. [snip] Give me the exact solution that prevents me from having to cast when the compiler can prove it, even if the runtime of the compiler is slightly slower.

Funny you should say that given the current thread comparing the speed of the D compiler to that of the Go compiler...

We are talking about range propagation, a function of the compiler, not a function of the compiled program. Since we can't get a 100% accurate representation of the possible values anyway (even yours might leave holes in the middle after all), then it might be better to choose a faster, slightly less precise algorithm if the difference is not too great. That's the difference between a compiler and a full-fledged static code analysis and program prover.

Anyway, the point is moot, I have a new version of my algorithm with 100% precision and high speed.

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 12 2010

On Mon, 12 Apr 2010 13:45:14 -0400, Jérôme M. Berger <jeberger free.fr> wrote:

Funny you should say that given the current thread comparing the speed of the D compiler to that of the Go compiler...

My point was simply that the amount of time saved is relative to the size of the program being compiled. If we assume conservatively that every other line in a program has a bitwise or in it, then you are talking about a 16 billion line program, meaning the 216 seconds you save is insignificant compared to the total compile time. My point was that your fast solution that is inaccurate is not preferable, because nobody notices the speed; they just notice the inaccuracy.

We are talking about range propagation, a function of the compiler, not a function of the compiled program. Since we can't get a 100% accurate representation of the possible values anyway (even yours might leave holes in the middle after all), then it might be better to choose a faster, slightly less precise algorithm if the difference is not too great. That's the difference between a compiler and a full-fledged static code analysis and program prover.

When we're talking about the difference between O(1) and O(lg n), I'll take accuracy over speed in my compiler any day. The solution should be 100% accurate for the problem statement. If the problem statement is not what we need, then we need a new problem statement :) Solving the problem statement for 99% of values is not good enough.

Anyway, the point is moot, I have a new version of my algorithm with 100% precision and high speed.

Yes, I'm still trying to understand how it works :)

-Steve

Apr 12 2010

Steven Schveighoffer wrote:
> When we're talking about the difference between O(1) and O(lg n), I'll
> take accuracy over speed in my compiler any day. The solution should be
> 100% accurate for the problem statement. If the problem statement is
> not what we need, then we need a new problem statement :) Solving the
> problem statement for 99% of values is not good enough.

And when we're talking about the difference between 10s and 55s for a
minimal loss of accuracy, which will you take? Especially if the accuracy
loss is less than is lost elsewhere (due to holes in the ranges) :)

> Yes, I'm still trying to understand how it works :)

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 12 2010

Jérôme M. Berger Wrote:
> Steven Schveighoffer wrote:
>> When we're talking about the difference between O(1) and O(lg n), I'll
>> take accuracy over speed in my compiler any day.
> And when we're talking about the difference between 10s and 55s for a
> minimal loss of accuracy, which will you take? Especially if the
> accuracy loss is less than is lost elsewhere (due to holes in the
> ranges).

Really? You rebuilt the compiler with your range propagation algorithm
and verified that it adds 10 seconds versus an accurate one that adds
55s? How much time did the compiler spend to compile? I'd hazard to guess
that a code base that adds 10s worth of your algorithm takes at least a
few hours to compile. Is 55s that bad at that point?

Again, if it takes the compiler an extra insignificant amount of time to
be more accurate, I'm all for accuracy over speed when you get down to
that level of insignificance. I'd say the point of pain has to be at
least 10% of the compile time before it makes any sizable difference.

-Steve

Apr 12 2010

Steven Schveighoffer wrote:
> Really? You rebuilt the compiler with your range propagation algorithm
> and verified that it adds 10 seconds versus an accurate one that adds
> 55s? How much time did the compiler spend to compile? I'd hazard to
> guess that a code base that adds 10s worth of your algorithm takes at
> least a few hours to compile. Is 55s that bad at that point?
>
> Again, if it takes the compiler an extra insignificant amount of time
> to be more accurate, I'm all for accuracy over speed when you get down
> to that level of insignificance. I'd say the point of pain has to be at
> least 10% of the compile time before it makes any sizable difference.

My point is that if you always choose an algorithm that is 5 to 6 times
slower just because it brings extra precision which you may not really
need, then you will wind up with a compiler that is 5 to 6 times slower
than it needs to be. Sure, the difference on *one* function is not great
in absolute terms, but if you make the same choice for *all* functions,
then where do you go?

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 13 2010

On Tue, 13 Apr 2010 13:37:13 -0400, Jérôme M. Berger <jeberger free.fr>
wrote:
> My point is that if you always choose an algorithm that is 5 to 6 times
> slower just because it brings extra precision which you may not really
> need, then you will wind up with a compiler that is 5 to 6 times slower
> than it needs to be. Sure the difference on *one* function is not great
> in absolute terms, but if you make the same choice for *all* functions,
> then where do you go?

In my opinion? Yes, slower compilers that make code easier to write are
better. I don't spend lots of time compiling, I spend it writing code.
And I don't need to babysit the compiler, it goes off and does its thing.

Performance is only important in the end result. I'm not saying I want my
compiler to be slow, but I want it to be accurate and useful more than I
want it to be quick. In this case, there is a quick and fully accurate
solution, so it doesn't matter. But, for instance, if the compiler could
do a full analysis to check if variables escape their scope, and that
makes the compiler 5x slower, then I'd rather have the compiler verify my
work instead of quickly producing memory-corrupting code.

-Steve

Apr 13 2010

Steven Schveighoffer wrote:
> In my opinion? Yes, slower compilers that make code easier to write are
> better. I don't spend lots of time compiling, I spend it writing code.
> And I don't need to babysit the compiler, it goes off and does its
> thing. Performance is only important in the end result. I'm not saying
> I want my compiler to be slow, but I want it to be accurate and useful
> more than I want it to be quick. In this case, there is a quick and
> fully accurate solution, so it doesn't matter. But, for instance, if
> the compiler could do a full analysis to check if variables escape
> their scope, and that makes the compiler 5x slower, then I'd rather
> have the compiler verify my work instead of quickly producing
> memory-corrupting code.

This is why there are static verification tools. I'm all in favor of this
kind of tool, but I don't want to have to put up with their slowness
every time I compile something. Having a quick write-compile-test cycle
is very important IMO. Then when I have a rough outline of the program, I
can run another tool (or add a command line option) to enable extra
checking (and then it doesn't matter if the tool takes the whole night to
run and I get the results the next morning when I get to work).

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 13 2010

On 12-apr-10, at 21:40, Steven Schveighoffer wrote:
> When we're talking about the difference between O(1) and O(lg n), I'll
> take accuracy over speed in my compiler any day. The solution should be
> 100% accurate for the problem statement. If the problem statement is
> not what we need, then we need a new problem statement :) Solving the
> problem statement for 99% of values is not good enough.

Sorry for the probably stupid question, but I don't understand much the
need of all this range propagation. In the compiler, either you have a
constant (and then you have the value at compile time, and you don't need
any range propagation, you can compare with the value), or you have a
runtime value.

Do you want to explicitly add in the compiler the support for more
limited runtime values? Otherwise the range of a runtime value is a
priori the whole possible range, and thus any rule based on range
propagation might be expressed as a static type based rule (as done in
C). You can gain something, for example you can know that summing 4
shorts you will never overflow an int; is this where you want to go?
What kind of bugs are you trying to avoid? Or is it simply having the
max and min properties defined?

Nobody commented on my proposal, but there I see the potential for bugs:
(1u-2)+1UL is 4294967296, and this happens also at runtime if you have,
for example, a function returning size_t and another returning uint, and
you combine them without thinking much and then you switch to 64 bits...

There I see a problem, but this you see without any special range
propagation, just thinking that subtraction or negation of unsigned
types is modulo 2^bitsize, and thus cannot then be changed to another
size without an explicit cast. Maybe in some cases using enums range
propagation might spare a cast, but is it really an improvement?

I guess that I am missing something obvious, so I don't see the reason
for range propagation, but maybe I am not the only one, so thanks for an
explanation...

Fawzi

Apr 13 2010

Fawzi Mohamed:I guess that I am missing something obvious, so I don't see the reason for range propagation, but maybe I am not the only one, so thanks for an explanation...Range propagation can improve the situation a little, so it's not bad. But it can't replace proper integral overflows. You need integral overflows at runtime. Bye, bearophile

Apr 13 2010

On 13-apr-10, at 12:01, bearophile wrote:
> Range propagation can improve the situation a little, so it's not bad.
> But it can't replace proper integral overflows. You need integral
> overflows at runtime.

Integral overflow checks are helpful only if you have automatic
conversion to a larger type, but that breaks the compile-time knowledge
of the size of such an integer, so that you have to assume it might need
to be pushed to the heap. Yes, you might use some tricks (like using
size_t.sizeof*8-1 bits, or a pointer to spare some place), but I don't
think that D wants to go that way for the basic integers...

Apr 13 2010

Fawzi Mohamed wrote:
> Sorry for the probably stupid question, but I don't understand much the
> need of all this range propagation, in the compiler either you have a
> constant (and then you have the value at compile time, and you don't
> need any range propagation, you can compare with the value), or you
> have a runtime value.

It's been part of DMD2 for a while now. It allows you to do things like:

ubyte lowbits(int x)
{
    return x & 7;
}

without an explicit cast. The compiler knows that x&7 can safely fit
inside a single byte. Whereas

((x&7) << 12) | (x&3);

does not fit, and requires an explicit cast.

Apr 13 2010

On 13-apr-10, at 12:02, Don wrote:
> It's been part of DMD2 for a while now. It allows you to do things
> like:
>
> ubyte lowbits(int x) { return x & 7; }
>
> without an explicit cast. The compiler knows that x&7 can safely fit
> inside a single byte. Whereas ((x&7) << 12) | (x&3); does not fit, and
> requires an explicit cast.

Ah, OK, I understand. I thought that was treated like x & cast(ubyte)7,
and so as a comparison of the compile-time value with the range of ubyte
(no range propagation needed). But I can understand why it is treated as
cast(ubyte)((cast(int)x) & 7), as it is probably easier for the compiler,
since it upcasts by default.

Thanks for the explanation.

Apr 13 2010

Fawzi Mohamed wrote:
> ah ok I understand, I thought that was treated like x & cast(ubyte)7,
> and so as comparison of the compile time value with the range of ubyte
> (no range propagation needed). But I can understand why it is treated
> as cast(ubyte)((cast(int)x) & 7), as it is probably easier for the
> compiler, as it upcasts by default.

Note that in Don's example, "x" is an int, not a ubyte, so you don't need
the cast to "int" in your second example, but you still need range
propagation in the first...

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 13 2010

Here's a fast 100% precise solution:

==============================8<------------------------------
uint32_t maxOr (uint32_t minA, uint32_t minB,
                uint32_t maxA, uint32_t maxB)
{
   assert (minA <= maxA);
   assert (minB <= maxB);
   if (maxA == 0) return maxB;
   if (maxB == 0) return maxA;

   uint32_t a = maxA ^ minA;
   if (a != 0) {
      a = ((1 << (highbit (a)+1)) - 1) & maxA & maxB;
      if (a != 0)
         a = (1 << highbit (a)) - 1;
   }
   uint32_t b = maxB ^ minB;
   if (b != 0) {
      b = ((1 << (highbit (b)+1)) - 1) & maxA & maxB;
      if (b != 0)
         b = (1 << highbit (b)) - 1;
   }
   return maxA|maxB|a|b;
}
------------------------------>8==============================

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 12 2010

On Mon, 12 Apr 2010 13:56:23 -0400, Jérôme M. Berger <jeberger free.fr>
wrote:
> Here's a fast 100% precise solution:
>
> uint32_t maxOr (uint32_t minA, uint32_t minB,
>                 uint32_t maxA, uint32_t maxB)
> {
>    assert (minA <= maxA);
>    assert (minB <= maxB);
>    if (maxA == 0) return maxB;
>    if (maxB == 0) return maxA;
>    uint32_t a = maxA ^ minA;
>    if (a != 0) {
>       a = ((1 << (highbit (a)+1)) - 1) & maxA & maxB;
>       if (a != 0)
>          a = (1 << highbit (a)) - 1;
>    }
>    uint32_t b = maxB ^ minB;
>    if (b != 0) {
>       b = ((1 << (highbit (b)+1)) - 1) & maxA & maxB;
>       if (b != 0)
>          b = (1 << highbit (b)) - 1;
>    }
>    return maxA|maxB|a|b;
> }

Fails for test case:

minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is 6).

I'm not quite sure what you are trying to do, but I think you are on the
right path to getting a faster solution.

-Steve

Apr 12 2010

Steven Schveighoffer wrote:
> Fails for test case:
>
> minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs 7, accurate result is
> 6).

Nope, outputs 6. Note that I've run an exhaustive search for all
combinations in the [0, 63] range, so if there are mistakes they have to
be outside that range...

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 12 2010

Jérôme M. Berger wrote:
> Steven Schveighoffer wrote:
>> Fails for test case: minA = 4, maxA = 4, minB = 4, maxB = 6 (outputs
>> 7, accurate result is 6).
> Nope, outputs 6. Note that I've run an exhaustive search for all
> combinations in the [0, 63] range, so if there are mistakes they have
> to be outside that range...

I confirm. Jerome's code passes my randomized test.

Ali

Apr 12 2010

Jérôme M. Berger Wrote:
> Nope, outputs 6. Note that I've run an exhaustive search for all
> combinations in the [0, 63] range, so if there are mistakes they have
> to be outside that range...

I'll have to double check, I thought I copied your code verbatim (I did
switch around the arguments to be minA, maxA, minB, maxB to fit my test
harness, and I changed the uint32_t to uint). I'll check tomorrow at work
where the computer I used to test is.

-Steve

Apr 12 2010

Steven Schveighoffer Wrote:
> I'll have to double check, I thought I copied your code verbatim (I did
> switch around the arguments to be minA, maxA, minB, maxB to fit my test
> harness, and I changed the uint32_t to uint). I'll check tomorrow at
> work where the computer I used to test is.

I think I see the issue: I was using an older version of your highbit. At
some point you changed it, but didn't repeat it in this solution, so I
searched for it on the newsgroup and found your first version. That
differs from later versions you posted, but I can't find where you
identified there was a bug. I'll test again tomorrow, but it looks like
that probably accounts for the problem.

The version I was using:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108788

The later version:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=108855

It looks like in that second post that Ali had the same problem I did.

Next time, you should include your whole code each time, especially parts
you changed.

-Steve

Apr 12 2010

Steven Schveighoffer wrote:
> I think I see the issue, I was using an older version of your highbit;
> at some point you changed it, but didn't repeat it in this solution, so
> I searched for it on the newsgroup and found your first version. That
> differs from later versions you posted, but I can't find where you
> identified there was a bug. I'll test again tomorrow, but it looks like
> that probably accounts for the problem.
>
> Next time, you should include your whole code each time, especially
> parts you changed.

Oops, I hadn't noticed that I sent two different versions (except for the
C<->D translation). Sorry about that. Note that there is no bug: both
versions are correct. The first one uses 1-based indexes, while the
second uses 0-based indexes and processes 0 differently.

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 13 2010

On Mon, 12 Apr 2010 22:37:00 -0400, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
> I'll have to double check, I thought I copied your code verbatim (I did
> switch around the arguments to be minA, maxA, minB, maxB to fit my test
> harness, and I changed the uint32_t to uint). I'll check tomorrow at
> work where the computer I used to test is.

I confirm, with the updated highbit, your solution works.

Also, inspired by your solution (and using your highbit function, which
is very neat BTW) I came up with a one-liner. Enjoy :)

uint maxor(uint mina, uint maxa, uint minb, uint maxb)
{
    return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^ maxa),
        highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
}

Anyone want to do the equivalent minor? I've spent way too much time on
this :)

-Steve

Apr 13 2010

Steven Schveighoffer wrote:
> I confirm, with the updated highbit, your solution works.
>
> Also, inspired by your solution (and using your highbit function, which
> is very neat BTW) I came up with a one-liner. Enjoy :)
>
> uint maxor(uint mina, uint maxa, uint minb, uint maxb)
> {
>     return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^
>         maxa), highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
> }
>
> Anyone want to do the equivalent minor? I've spent way too much time on
> this :)

Awesome stuff, guys. I think that's worthy of inclusion as an exercise in
Knuth's Volume 4, fascicle 1. Suggest it to someone at ACCU?

(Knuth has highbit(x) but calls it lambda(x). He notes the result that
highbit(x)==highbit(y) iff x^y <= x&y).

Apr 13 2010

Jerome's highbit function is the same as std.intrinsic.bsr. I wonder
which is faster?

I ran a test, and for 100 million iterations (1..10000000), the intrinsic
beat out his function by a mere 300 milliseconds on my box: highbit ran
in an average of 1897 ms and bsr did the same in an average of 1534.

Recompiling with -inline -O -release cuts the raw numbers about in half,
but keeps about the same difference, leading me to think call overhead
accounts for a fair amount of the percentage rather than the actual
implementation. The new averages are 1134 and 853.

I've gotta say, your implementation of the function is better than I
would have done though. Without bsr, I probably would have looped,
shifted, and tested each individual bit...

On 4/13/10, Steven Schveighoffer <schveiguy yahoo.com> wrote:
> I confirm, with the updated highbit, your solution works. Also,
> inspired by your solution (and using your highbit function, which is
> very neat BTW) I came up with a one-liner. Enjoy :)

Apr 13 2010

On Tue, 13 Apr 2010 09:56:34 -0400, Adam Ruppe <destructionator gmail.com> wrote:
> Jerome's highbit function is the same as std.intrinsic.bsr. I wonder which is faster?

Just a note, this code should be runnable in C++, because the compiler is written in C++. Is bsr available there?

> Recompiling with -inline -O -release cuts the raw numbers about in half, but keeps about the same difference, leading me to think overhead accounts for a fair amount of the percentage instead of actual implementation. The new averages are 1134 and 853.

Probably the inlining accounts for the savings here. Calling a function is rather expensive.

> I've gotta say, your implementation of the function is better than I would have done though. Without bsr, I probably would have looped, shifted, and tested each individual bit...

I never would have thought of doing a binary search for the high bit; it is definitely a novel idea to me :)

-Steve

Apr 13 2010
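For the record, the "loop, shift, and test each bit" approach and the binary-search approach can be compared directly. This is a sketch in C rather than the thread's D code, so details may differ from the actual highbit that was posted:

```c
#include <stdint.h>

/* Naive version: shift and test until the value is exhausted,
   up to 31 iterations for a 32-bit input. */
static int highbit_naive(uint32_t x)
{
    int n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* Binary-search version: each test halves the remaining search space,
   so it always takes exactly five comparisons. */
static int highbit(uint32_t x)
{
    int n = 0;
    if (x & 0xFFFF0000u) { n += 16; x >>= 16; }
    if (x & 0x0000FF00u) { n += 8;  x >>= 8;  }
    if (x & 0x000000F0u) { n += 4;  x >>= 4;  }
    if (x & 0x0000000Cu) { n += 2;  x >>= 2;  }
    if (x & 0x00000002u) { n += 1; }
    return n;
}
```

Both return the index of the most significant set bit (and 0 for an input of 0), matching the semantics of std.intrinsic.bsr for nonzero inputs.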

Steven Schveighoffer:
> I never would have thought of doing a binary search for the high bit, it is definitely a novel idea to me :)

You can learn something from this page (it can also be useful for the translation to C++): http://graphics.stanford.edu/~seander/bithacks.html

Bye,
bearophile

Apr 13 2010

On Tue, Apr 13, 2010 at 10:52:14AM -0400, Steven Schveighoffer wrote:
> Just a note, this code should be runnable in C++, because the compiler is written in C++. Is bsr available there?

I just assumed since it was in D that it was probably in DMC too, but I can't find it in the docs, so maybe not. Well, there's always inline asm :D

But 300 milliseconds over 100 million iterations is negligible anyway.

-- Adam D. Ruppe http://arsdnet.net

Apr 13 2010
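For a C/C++ port where std.intrinsic.bsr is unavailable, GCC and Clang expose `__builtin_clz` (count leading zeros), which can stand in for bsr. This is an assumption about the target compiler: DMC itself may well need the inline-asm route mentioned above.

```c
#include <stdint.h>

/* bsr(x) == 31 - clz(x) for 32-bit x != 0. Like the bsr instruction,
   __builtin_clz(0) is undefined, so callers must guard against 0. */
static int bsr32(uint32_t x)
{
    return 31 - __builtin_clz(x);
}
```

On x86 this typically compiles down to a single bsr (or lzcnt-based) instruction, avoiding the call overhead measured in the benchmark.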

Adam Ruppe Wrote:
> Jerome's highbit function is the same as std.intrinsic.bsr. I wonder which is faster? [...] The new averages are 1134 and 853.

That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I guess is the reason it's an intrinsic in the first place). I would have expected this to be much, much faster than a user function. Anyone care enough to check the generated assembly?

[1] http://www.itis.mn.it/linux/quarta/x86/bsr.htm

-- Clemens

Apr 13 2010

On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
> That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd sources, bsr seems to be implemented in terms of the bsr opcode (which I guess is the reason it's an intrinsic in the first place). I would have expected this to be much, much faster than a user function. Anyone care enough to check the generated assembly?

The opcode is fairly slow anyway (as far as opcodes go) - odds are the implementation inside the processor is similar to Jerome's method, and the main savings come from it loading fewer bytes into the pipeline.

I remember a line from a blog, IIRC it was the author of the C++ FQA writing it, saying hardware and software are pretty much the same thing - moving an instruction to hardware doesn't mean it will be any faster, since it is the same algorithm, just done in processor microcode instead of user opcodes.

-- Adam D. Ruppe http://arsdnet.net

Apr 13 2010

Adam D. Ruppe wrote:
> The opcode is fairly slow anyway (as far as opcodes go) - odds are the implementation inside the processor is similar to Jerome's method, and the main savings come from it loading fewer bytes into the pipeline.

It's fast on Intel, slow on AMD. I bet the speed difference comes from inlining max().

Apr 13 2010

Don wrote:
> It's fast on Intel, slow on AMD. I bet the speed difference comes from inlining max().

Specifically, bsr is 7 uops on AMD, 1 uop on Intel since the original Pentium. AMD's performance is shameful. And bsr() is supported in the compiler; in fact DMC uses it extensively, which is why it's included in DMD!

Apr 13 2010

Steven Schveighoffer wrote:
> using your highbit function, which is very neat BTW

Thanks, but I can't claim credit for it. It's a classical algorithm which comes up regularly in programming discussions and newsgroups...

> uint maxor(uint mina, uint maxa, uint minb, uint maxb) {
>     return maxa | maxb | ((1 << highbit(((1 << (max(highbit(mina ^ maxa),
>         highbit(minb ^ maxb)) + 1)) - 1) & maxa & maxb)) - 1);
> }

Neat!

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 13 2010

Is it just me, or does this fail for minA = 0, minB = 0, maxA = 0x80_00_00_00, maxB = 0x80_00_00_00? I'm pretty sure you need to handle 1 << 32 specially.

Dec 04 2010

On 12/04/2010 01:21 PM, Ellery Newcomer wrote:
> Is it just me, or does this fail for minA = 0, minB = 0, maxA = 0x80_00_00_00, maxB = 0x80_00_00_00? I'm pretty sure you need to handle 1 << 32 specially.

I'm thinking this is better:

uint32_t maxOr (uint32_t minA, uint32_t minB, uint32_t maxA, uint32_t maxB)
{
    assert (minA <= maxA);
    assert (minB <= maxB);
    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;
    uint32_t a = maxA ^ minA;
    if (a != 0) {
        a |= (1 << highbit (a)) - 1;
        a = a & maxA & maxB;
        if (a != 0) a = (1 << highbit (a)) - 1;
    }
    uint32_t b = maxB ^ minB;
    if (b != 0) {
        b |= (1 << highbit (b)) - 1;
        b = b & maxA & maxB;
        if (b != 0) b = (1 << highbit (b)) - 1;
    }
    return maxA | maxB | a | b;
}

Dec 04 2010
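The revised maxOr can be checked against the 0x80_00_00_00 corner case directly. Below is a hypothetical C transcription; note the `1u` shifts, which keep the highbit == 31 case well defined in C (a plain `1 << 31` would overflow a signed int). The `highbit` helper is the binary-search version discussed earlier in the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Binary-search highbit: index of the most significant set bit. */
static int highbit(uint32_t x)
{
    int n = 0;
    if (x & 0xFFFF0000u) { n += 16; x >>= 16; }
    if (x & 0x0000FF00u) { n += 8;  x >>= 8;  }
    if (x & 0x000000F0u) { n += 4;  x >>= 4;  }
    if (x & 0x0000000Cu) { n += 2;  x >>= 2;  }
    if (x & 0x00000002u) { n += 1; }
    return n;
}

/* Ellery's revision, transcribed with unsigned shift constants. Note the
   argument order (mins first, then maxes), as in the original post. */
static uint32_t maxOr(uint32_t minA, uint32_t minB,
                      uint32_t maxA, uint32_t maxB)
{
    assert(minA <= maxA);
    assert(minB <= maxB);
    if (maxA == 0) return maxB;
    if (maxB == 0) return maxA;
    uint32_t a = maxA ^ minA;
    if (a != 0) {
        a |= (1u << highbit(a)) - 1;   /* smear below the top varying bit */
        a = a & maxA & maxB;
        if (a != 0) a = (1u << highbit(a)) - 1;
    }
    uint32_t b = maxB ^ minB;
    if (b != 0) {
        b |= (1u << highbit(b)) - 1;
        b = b & maxA & maxB;
        if (b != 0) b = (1u << highbit(b)) - 1;
    }
    return maxA | maxB | a | b;
}
```

For minA = minB = 0 and maxA = maxB = 0x80000000, this yields 0xFFFFFFFF, which is achievable (a = 0x80000000, b = 0x7FFFFFFF), so the edge case no longer needs a 1 << 32 shift.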

Andrei Alexandrescu:
> byte c = a | b;
>
> Say you already know min_a, max_a, min_b, and max_b. How do you compute min_c and max_c? I thought of it for a bit and it seems quite tricky.

This is interval arithmetic. If you are able to find a book about it... (or a library for some programming language that implements it).

Bye,
bearophile

Apr 10 2010

Hello Andrei,

> Consider:
>
> byte c = a | b;
>
> Say you already know min_a, max_a, min_b, and max_b. How do you compute min_c and max_c? I thought of it for a bit and it seems quite tricky.

x_bit_max == "the largest bit in x_max ^ x_min"
x_oth_max == x_bit_max - 1;

c_max = max(a_bit_max | b_oth_max, b_bit_max | a_oth_max)
      | (a_max & ~(a_bit_max | a_oth_max));

-- ... <IXOYE><

Apr 10 2010

Hello BCS,

> x_bit_max == "the largest bit in x_max ^ x_min"
> x_oth_max == x_bit_max - 1;
>
> c_max = max(a_bit_max | b_oth_max, b_bit_max | a_oth_max)
>       | (a_max & ~(a_bit_max | a_oth_max));

// or make that
c_min = (a_max & ~(a_bit_max | a_oth_max));
c_max = max(a_bit_max | b_oth_max, b_bit_max | a_oth_max) | c_min;

-- ... <IXOYE><

Apr 10 2010

Hello BCS,

Scratch all that, I'm clueless. :b

-- ... <IXOYE><

Apr 10 2010