www.digitalmars.com         C & C++   DMDScript  

D - Objection: bsf() return value

reply Georg Wrede <Georg_member pathlink.com> writes:
The spec says:


  int bsf(uint v)

  Scans the bits in v starting with bit 0, looking for the first set bit. 

  int bsr(uint v)

  Scans the bits in v from the most significant bit to the least 
  significant bit, looking for the first set bit.

Both return the bit number of the first set bit. The return value is undefined if v is zero. Am I the only one who finds this unnerving? Would it not be more appropriate if the return value were -1 if v is zero? This way no zero-check in programmer code is needed before bsr. And since immediately after bsr there usually is a branch anyhow, this would be the logical place to handle the zero case too. Maybe it is for performance reasons? Yes, bsr is faster this way, but the test itself has to be done somewhere anyhow. Besides, if the return value is undefined, then there is all the reason to fear that bsr could return a reasonable looking value, and the programmer who forgot to check in advance for zero value would not be the wiser. Bonus: this change would not break existing code. And it works the same with any practical size v.
May 15 2003
next sibling parent Georg Wrede <Georg_member pathlink.com> writes:
It sounds like I'm talking about bsr only, but of course this goes for both.

In article <ba0fr5$1d2d$1 digitaldaemon.com>, Georg Wrede says...
The spec says:


  int bsf(uint v)

  Scans the bits in v starting with bit 0, looking for the first set bit. 

  int bsr(uint v)

  Scans the bits in v from the most significant bit to the least 
  significant bit, looking for the first set bit.

Both return the bit number of the first set bit. The return value is undefined if v is zero. Am I the only one who finds this unnerving? Would it not be more appropriate if the return value were -1 if v is zero? This way no zero-check in programmer code is needed before bsr. And since immediately after bsr there usually is a branch anyhow, this would be the logical place to handle the zero case too. Maybe it is for performance reasons? Yes, bsr is faster this way, but the test itself has to be done somewhere anyhow. Besides, if the return value is undefined, then there is all the reason to fear that bsr could return a reasonable looking value, and the programmer who forgot to check in advance for zero value would not be the wiser. Bonus: this change would not break existing code. And it works the same with any practical size v.

May 16 2003
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
bsr and bsf are intended as the simplest possible shell around the x86 bsr
and bsf opcodes. The compiler recognizes those functions and they're well
integrated into the optimizer and code generator. This makes them great as
building blocks for more advanced operations.

"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:ba0fr5$1d2d$1 digitaldaemon.com...
 The spec says:


  int bsf(uint v)

  Scans the bits in v starting with bit 0, looking for the first set bit.

  int bsr(uint v)

  Scans the bits in v from the most significant bit to the least
  significant bit, looking for the first set bit.

Both return the bit number of the first set bit. The return value is

 if v is zero.


 Am I the only one who finds this unnerving? Would it not be more

 the return value were -1 if v is zero?

 This way no zero-check in programmer code is needed before bsr. And since
 immediately after bsr there usually is a branch anyhow, this would be the
 logical place to handle the zero case too.

 Maybe it is for performance reasons? Yes, bsr is faster this way, but the

 itself has to be done somewhere anyhow. Besides, if the return value is
 undefined, then there is all the reason to fear that bsr could return a
 reasonable looking value, and the programmer who forgot to check in

 zero value would not be the wiser.

 Bonus: this change would not break existing code. And it works the same

 practical size v.

May 18 2003
parent reply Georg Wrede <Georg_member pathlink.com> writes:
I've thought about this, and I've given up.  The original thought went something
like:


If speed here is really, really important for the programmer, then he'd probably
code the parts of his routine in assembler anyhow. On the other hand, usually
bsr would be used when examining data where you are not sure of which bits are
set. In normal circumstances this might include the uint being entirely zero,
for all he knows. In most cases where he has at least some idea of the bits,
he'd probably test for individual bits instead.

Using bsr then is restricted to cases where you can guarantee that the uint is
not entirely zero. This can be achieved either by testing for zero in D code, or
by making the algorithm robust enough to never, ever encounter the zero case.
The former case is bound to be way slower than having bsr return -1 and testing
in D for -1 only after testing for the "valid" cases.

The latter case _requires_ the D code to be solid, which might be perfectly
acceptable in a language like C or C++ (offence intended against them). D,
however seems to avoid unnecessary "catches". (For which we all love Walter,
right?)

Since I'm no Intel hacker, (I did only 6502 in the early '80s) I have no strong
opinion whether the added clocks for testing for zero and returning -1 would be
inappropriate here.

Also, since these are "Intrinsics", the average user is implicitly discouraged
from using them? 

Of course, if we leave bsr as-is, the compiler might be made to recogize and
optimise idioms like

if (v) {
a = bsr(v);
if (a < b) {
// do something
else if (a > c) {
// do something
}
} else {
throw new Error("Who stole all our bits?");
}


In hindsight, it seems to me the above code already compiles to something just
as fast as what could be achieved by changig the bsr function. So even making a
non-intrinsic bsr that does the -1 thing would be of no value.



In article <ba9vj3$ee0$1 digitaldaemon.com>, Walter says...
bsr and bsf are intended as the simplest possible shell around the x86 bsr
and bsf opcodes. The compiler recognizes those functions and they're well
integrated into the optimizer and code generator. This makes them great as
building blocks for more advanced operations.

"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:ba0fr5$1d2d$1 digitaldaemon.com...
 The spec says:


  int bsf(uint v)

  Scans the bits in v starting with bit 0, looking for the first set bit.

  int bsr(uint v)

  Scans the bits in v from the most significant bit to the least
  significant bit, looking for the first set bit.

Both return the bit number of the first set bit. The return value is

 if v is zero.


 Am I the only one who finds this unnerving? Would it not be more

 the return value were -1 if v is zero?

 This way no zero-check in programmer code is needed before bsr. And since
 immediately after bsr there usually is a branch anyhow, this would be the
 logical place to handle the zero case too.

 Maybe it is for performance reasons? Yes, bsr is faster this way, but the

 itself has to be done somewhere anyhow. Besides, if the return value is
 undefined, then there is all the reason to fear that bsr could return a
 reasonable looking value, and the programmer who forgot to check in

 zero value would not be the wiser.

 Bonus: this change would not break existing code. And it works the same

 practical size v.


May 19 2003
next sibling parent reply "Walter" <walter digitalmars.com> writes:
In general, you are right. However, the bsr and bsf intrinsics came about
because I use bit vectors extensively in the compiler's optimizer
implementation. The functions in it were large and complex, and so were not
good candidates for recoding in assembler. Having direct access to bsr and
bsf got about a 10% improvement in compiler optimization speed.

"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:baae7r$smk$1 digitaldaemon.com...
 I've thought about this, and I've given up.  The original thought went

 like:


 If speed here is really, really important for the programmer, then he'd

 code the parts of his routine in assembler anyhow. On the other hand,

 bsr would be used when examining data where you are not sure of which bits

 set. In normal circumstances this might include the uint being entirely

 for all he knows. In most cases where he has at least some idea of the

 he'd probably test for individual bits instead.

 Using bsr then is restricted to cases where you can guarantee that the

 not entirely zero. This can be achieved either by testing for zero in D

 by making the algorithm robust enough to never, ever encounter the zero

 The former case is bound to be way slower than having bsr return -1 and

 in D for -1 only after testing for the "valid" cases.

 The latter case _requires_ the D code to be solid, which might be

 acceptable in a language like C or C++ (offence intended against them). D,
 however seems to avoid unnecessary "catches". (For which we all love

 right?)

 Since I'm no Intel hacker, (I did only 6502 in the early '80s) I have no

 opinion whether the added clocks for testing for zero and returning -1

 inappropriate here.

 Also, since these are "Intrinsics", the average user is implicitly

 from using them?

 Of course, if we leave bsr as-is, the compiler might be made to recogize

 optimise idioms like

 if (v) {
 a = bsr(v);
 if (a < b) {
 // do something
 else if (a > c) {
 // do something
 }
 } else {
 throw new Error("Who stole all our bits?");
 }


 In hindsight, it seems to me the above code already compiles to something

 as fast as what could be achieved by changig the bsr function. So even

 non-intrinsic bsr that does the -1 thing would be of no value.

May 19 2003
parent reply Georg Wrede <Georg_member pathlink.com> writes:
In article <bab492$1k6u$1 digitaldaemon.com>, Walter says...
In general, you are right. However, the bsr and bsf intrinsics came about
because I use bit vectors extensively in the compiler's optimizer
implementation. The functions in it were large and complex, and so were not
good candidates for recoding in assembler. Having direct access to bsr and
bsf got about a 10% improvement in compiler optimization speed.

Ahah. Thanks, I seem to learn a lot here! This means that when one has a bitfield representing boolean values, say for whether to execute some from a set of subroutines, one gets to test for several falses at one go with bsf -- if several of them are zero you can skip testing those individually. Knowing (or, rather suspecting) that one day they might all be false, all one has to do is reserve one bit that always stays true, i.e. is never actually used. This lets one skip the zero test, which saves a lot of time in a tight loop. Careful desicions about which bit represents which action seems key here. Say one uses the msb for always-true, then the most likely falses should be at the other end, near lsb. But even this is not enough, since in real life the things the bits represent are not genuinely mutually independent. This means that if bit X is true, then the probabilities for the other bits being true are different than if X is false. This affects ordering decisions, especially if bsf is going to be used more than once on the bit set. This might be useful when a specific bit being true implies lowered probability for several of the following to be false. A rigorous theoretical analysis would of course be required here, but in real life one seldom has this option. Then the answer is "profiling, profiling, profiling!" But before that, dumping the bitset in a log from several test runs could give invaluable statistical data about probabilities and correlations, which will make the order decisions made between profiling runs more profitable. At this time it is also possible to change the meaning of the bits. Sure, true should always mean yes, but here it would be worthwhile to use false to mean yes for some bits to further increase the number of bits skipped by bsf. The yes-no meaning and the ordering of the bits can even be changed at runtime. All one needs is a reshuffle (maybe with a lookup table), and an xor for the changing of yes-no for some bits. A matching inline copy of the branch logic is then used instead of the original. I wonder if these optimizations are the very reason bsr and bsf exist in CPUs? --- Why am I littering this important mailgroup with these ruminations? Well, I can assume some readers have not thought about this, maybe since they have a non-C(++) or non-asm bacground. D is especially attractive to those who have avoided C(++), but still have always wanted to use a Real programming language. Sharing is gold. D makes it easy and straightforward to use this technique to gain execution speed.
May 20 2003
parent "Walter" <walter digitalmars.com> writes:
"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:badevg$2mb0$1 digitaldaemon.com...
 In article <bab492$1k6u$1 digitaldaemon.com>, Walter says...
In general, you are right. However, the bsr and bsf intrinsics came about
because I use bit vectors extensively in the compiler's optimizer
implementation. The functions in it were large and complex, and so were


good candidates for recoding in assembler. Having direct access to bsr


bsf got about a 10% improvement in compiler optimization speed.

Ahah. Thanks, I seem to learn a lot here! This means that when one has a bitfield representing boolean values, say

 whether to execute some from a set of subroutines, one gets to test for

 falses at one go with bsf -- if several of them are zero you can skip

 those individually.

What I was doing was looking for any bits set in a very large bit vector. There were two loops, the first did a scasd to find a non-zero 32 bit value, the inner was replaced with a bsf. So the bsf was only run on non-zero values.
May 20 2003
prev sibling parent "Serge K" <skarebo programmer.net> writes:
 Of course, if we leave bsr as-is, the compiler might be made to recogize

 optimise idioms like

 if (v) {
 a = bsr(v);
 if (a < b) {
 // do something
 else if (a > c) {
 // do something
 }
 } else {
 throw new Error("Who stole all our bits?");
 }

 In hindsight, it seems to me the above code already compiles to something

 as fast as what could be achieved by changig the bsr function. So even

 non-intrinsic bsr that does the -1 thing would be of no value.

Actually, both BSR & BSF are conditional operations, in case of a zero input they do nothing. It's quite easy to make them return some predefined values in such case. Here is how I do that (in MSVC): inline int bsf( int x ) { __asm mov eax, 32 // if(!x) return 32; __asm bsf eax, x } inline int bsr( int x ) { __asm mov eax, -1 // if(!x) return -1; __asm bsr eax, x }
May 21 2003