D - Objection: bsf() return value

Georg Wrede (15/20) May 15 2003 Both return the bit number of the first set bit. The return value is und...

Georg Wrede (2/22) May 16 2003
Walter (11/31) May 18 2003 bsr and bsf are intended as the simplest possible shell around the x86 b...

Georg Wrede (38/82) May 19 2003 I've thought about this, and I've given up. The original thought went s...

Walter (25/62) May 19 2003 In general, you are right. However, the bsr and bsf intrinsics came abou...

Georg Wrede (40/45) May 20 2003 Ahah. Thanks, I seem to learn a lot here!

Walter (11/23) May 20 2003 not

Serge K (17/32) May 21 2003 making a

Georg Wrede <Georg_member pathlink.com> writes:

The spec says:


  int bsf(uint v)

  Scans the bits in v starting with bit 0, looking for the first set bit. 

  int bsr(uint v)

  Scans the bits in v from the most significant bit to the least 
  significant bit, looking for the first set bit.

Both return the bit number of the first set bit. The return value is undefined
if v is zero.


Am I the only one who finds this unnerving? Would it not be more appropriate if
the return value were -1 if v is zero?

This way no zero-check in programmer code is needed before bsr. And since
immediately after bsr there usually is a branch anyhow, this would be the
logical place to handle the zero case too.

Maybe it is for performance reasons? Yes, bsr is faster this way, but the test
itself has to be done somewhere anyhow. Besides, if the return value is
undefined, then there is all the reason to fear that bsr could return a
reasonable looking value, and the programmer who forgot to check in advance for
zero value would not be the wiser.

Bonus: this change would not break existing code. And it works the same with any
practical size v.

May 15 2003

Georg Wrede <Georg_member pathlink.com> writes:

It sounds like I'm talking about bsr only, but of course this goes for both.

In article <ba0fr5$1d2d$1 digitaldaemon.com>, Georg Wrede says...
The spec says:


  int bsf(uint v)

  Scans the bits in v starting with bit 0, looking for the first set bit. 

  int bsr(uint v)

  Scans the bits in v from the most significant bit to the least 
  significant bit, looking for the first set bit.

Both return the bit number of the first set bit. The return value is undefined
if v is zero.


Am I the only one who finds this unnerving? Would it not be more appropriate if
the return value were -1 if v is zero?

This way no zero-check in programmer code is needed before bsr. And since
immediately after bsr there usually is a branch anyhow, this would be the
logical place to handle the zero case too.

Maybe it is for performance reasons? Yes, bsr is faster this way, but the test
itself has to be done somewhere anyhow. Besides, if the return value is
undefined, then there is all the reason to fear that bsr could return a
reasonable looking value, and the programmer who forgot to check in advance for
zero value would not be the wiser.

Bonus: this change would not break existing code. And it works the same with any
practical size v.

May 16 2003

"Walter" <walter digitalmars.com> writes:

bsr and bsf are intended as the simplest possible shell around the x86 bsr
and bsf opcodes. The compiler recognizes those functions and they're well
integrated into the optimizer and code generator. This makes them great as
building blocks for more advanced operations.

"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:ba0fr5$1d2d$1 digitaldaemon.com...
 The spec says:


  int bsf(uint v)

  Scans the bits in v starting with bit 0, looking for the first set bit.

  int bsr(uint v)

  Scans the bits in v from the most significant bit to the least
  significant bit, looking for the first set bit.

 Both return the bit number of the first set bit. The return value is

undefined
 if v is zero.


 Am I the only one who finds this unnerving? Would it not be more

appropriate if
 the return value were -1 if v is zero?

 This way no zero-check in programmer code is needed before bsr. And since
 immediately after bsr there usually is a branch anyhow, this would be the
 logical place to handle the zero case too.

 Maybe it is for performance reasons? Yes, bsr is faster this way, but the

test
 itself has to be done somewhere anyhow. Besides, if the return value is
 undefined, then there is all the reason to fear that bsr could return a
 reasonable looking value, and the programmer who forgot to check in

advance for
 zero value would not be the wiser.

 Bonus: this change would not break existing code. And it works the same

with any
 practical size v.

May 18 2003

Georg Wrede <Georg_member pathlink.com> writes:

I've thought about this, and I've given up.  The original thought went something
like:


If speed here is really, really important for the programmer, then he'd probably
code the parts of his routine in assembler anyhow. On the other hand, usually
bsr would be used when examining data where you are not sure of which bits are
set. In normal circumstances this might include the uint being entirely zero,
for all he knows. In most cases where he has at least some idea of the bits,
he'd probably test for individual bits instead.

Using bsr then is restricted to cases where you can guarantee that the uint is
not entirely zero. This can be achieved either by testing for zero in D code, or
by making the algorithm robust enough to never, ever encounter the zero case.
The former case is bound to be way slower than having bsr return -1 and testing
in D for -1 only after testing for the "valid" cases.

The latter case _requires_ the D code to be solid, which might be perfectly
acceptable in a language like C or C++ (offence intended against them). D,
however seems to avoid unnecessary "catches". (For which we all love Walter,
right?)

Since I'm no Intel hacker, (I did only 6502 in the early '80s) I have no strong
opinion whether the added clocks for testing for zero and returning -1 would be
inappropriate here.

Also, since these are "Intrinsics", the average user is implicitly discouraged
from using them? 

Of course, if we leave bsr as-is, the compiler might be made to recogize and
optimise idioms like

if (v) {
a = bsr(v);
if (a < b) {
// do something
else if (a > c) {
// do something
}
} else {
throw new Error("Who stole all our bits?");
}


In hindsight, it seems to me the above code already compiles to something just
as fast as what could be achieved by changig the bsr function. So even making a
non-intrinsic bsr that does the -1 thing would be of no value.



In article <ba9vj3$ee0$1 digitaldaemon.com>, Walter says...
bsr and bsf are intended as the simplest possible shell around the x86 bsr
and bsf opcodes. The compiler recognizes those functions and they're well
integrated into the optimizer and code generator. This makes them great as
building blocks for more advanced operations.

"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:ba0fr5$1d2d$1 digitaldaemon.com...
 The spec says:


  int bsf(uint v)

  Scans the bits in v starting with bit 0, looking for the first set bit.

  int bsr(uint v)

  Scans the bits in v from the most significant bit to the least
  significant bit, looking for the first set bit.

 Both return the bit number of the first set bit. The return value is

undefined
 if v is zero.


 Am I the only one who finds this unnerving? Would it not be more

appropriate if
 the return value were -1 if v is zero?

 This way no zero-check in programmer code is needed before bsr. And since
 immediately after bsr there usually is a branch anyhow, this would be the
 logical place to handle the zero case too.

 Maybe it is for performance reasons? Yes, bsr is faster this way, but the

test
 itself has to be done somewhere anyhow. Besides, if the return value is
 undefined, then there is all the reason to fear that bsr could return a
 reasonable looking value, and the programmer who forgot to check in

advance for
 zero value would not be the wiser.

 Bonus: this change would not break existing code. And it works the same

with any
 practical size v.

May 19 2003

"Walter" <walter digitalmars.com> writes:

In general, you are right. However, the bsr and bsf intrinsics came about
because I use bit vectors extensively in the compiler's optimizer
implementation. The functions in it were large and complex, and so were not
good candidates for recoding in assembler. Having direct access to bsr and
bsf got about a 10% improvement in compiler optimization speed.

"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:baae7r$smk$1 digitaldaemon.com...
 I've thought about this, and I've given up.  The original thought went

something
 like:


 If speed here is really, really important for the programmer, then he'd

probably
 code the parts of his routine in assembler anyhow. On the other hand,

usually
 bsr would be used when examining data where you are not sure of which bits

are
 set. In normal circumstances this might include the uint being entirely

zero,
 for all he knows. In most cases where he has at least some idea of the

bits,
 he'd probably test for individual bits instead.

 Using bsr then is restricted to cases where you can guarantee that the

uint is
 not entirely zero. This can be achieved either by testing for zero in D

code, or
 by making the algorithm robust enough to never, ever encounter the zero

case.
 The former case is bound to be way slower than having bsr return -1 and

testing
 in D for -1 only after testing for the "valid" cases.

 The latter case _requires_ the D code to be solid, which might be

perfectly
 acceptable in a language like C or C++ (offence intended against them). D,
 however seems to avoid unnecessary "catches". (For which we all love

Walter,
 right?)

 Since I'm no Intel hacker, (I did only 6502 in the early '80s) I have no

strong
 opinion whether the added clocks for testing for zero and returning -1

would be
 inappropriate here.

 Also, since these are "Intrinsics", the average user is implicitly

discouraged
 from using them?

 Of course, if we leave bsr as-is, the compiler might be made to recogize

and
 optimise idioms like

 if (v) {
 a = bsr(v);
 if (a < b) {
 // do something
 else if (a > c) {
 // do something
 }
 } else {
 throw new Error("Who stole all our bits?");
 }


 In hindsight, it seems to me the above code already compiles to something

just
 as fast as what could be achieved by changig the bsr function. So even

making a
 non-intrinsic bsr that does the -1 thing would be of no value.

May 19 2003

Georg Wrede <Georg_member pathlink.com> writes:

In article <bab492$1k6u$1 digitaldaemon.com>, Walter says...
In general, you are right. However, the bsr and bsf intrinsics came about
because I use bit vectors extensively in the compiler's optimizer
implementation. The functions in it were large and complex, and so were not
good candidates for recoding in assembler. Having direct access to bsr and
bsf got about a 10% improvement in compiler optimization speed.

Ahah. Thanks, I seem to learn a lot here!

This means that when one has a bitfield representing boolean values, say for
whether to execute some from a set of subroutines, one gets to test for several
falses at one go with bsf -- if several of them are zero you can skip testing
those individually.

Knowing (or, rather suspecting) that one day they might all be false, all one
has to do is reserve one bit that always stays true, i.e. is never actually
used. This lets one skip the zero test, which saves a lot of time in a tight
loop.

Careful desicions about which bit represents which action seems key here. Say
one uses the msb for always-true, then the most likely falses should be at the
other end, near lsb. But even this is not enough, since in real life the things
the bits represent are not genuinely mutually independent. This means that if
bit X is true, then the probabilities for the other bits being true are
different than if X is false. This affects ordering decisions, especially if bsf
is going to be used more than once on the bit set. This might be useful when a
specific bit being true implies lowered probability for several of the following
to be false.

A rigorous theoretical analysis would of course be required here, but in real
life one seldom has this option. Then the answer is "profiling, profiling,
profiling!" But before that, dumping the bitset in a log from several test runs
could give invaluable statistical data about probabilities and correlations,
which will make the order decisions made between profiling runs more profitable.

At this time it is also possible to change the meaning of the bits. Sure, true
should always mean yes, but here it would be worthwhile to use false to mean yes
for some bits to further increase the number of bits skipped by bsf.

The yes-no meaning and the ordering of the bits can even be changed at runtime.
All one needs is a reshuffle (maybe with a lookup table), and an xor for the
changing of yes-no for some bits. A matching inline copy of the branch logic is
then used instead of the original.

I wonder if these optimizations are the very reason bsr and bsf exist in CPUs?

---

Why am I littering this important mailgroup with these ruminations? Well, I can
assume some readers have not thought about this, maybe since they have a
non-C(++) or non-asm bacground. D is especially attractive to those who have
avoided C(++), but still have always wanted to use a Real programming language.
Sharing is gold.

D makes it easy and straightforward to use this technique to gain execution
speed.

May 20 2003

"Walter" <walter digitalmars.com> writes:

"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:badevg$2mb0$1 digitaldaemon.com...
 In article <bab492$1k6u$1 digitaldaemon.com>, Walter says...
In general, you are right. However, the bsr and bsf intrinsics came about
because I use bit vectors extensively in the compiler's optimizer
implementation. The functions in it were large and complex, and so were


not
good candidates for recoding in assembler. Having direct access to bsr


and
bsf got about a 10% improvement in compiler optimization speed.

 Ahah. Thanks, I seem to learn a lot here!

 This means that when one has a bitfield representing boolean values, say

for
 whether to execute some from a set of subroutines, one gets to test for

several
 falses at one go with bsf -- if several of them are zero you can skip

testing
 those individually.

What I was doing was looking for any bits set in a very large bit vector.
There were two loops, the first  did a scasd to find a non-zero 32 bit
value, the inner was replaced with a bsf. So the bsf was only run on
non-zero values.

May 20 2003

"Serge K" <skarebo programmer.net> writes:

 Of course, if we leave bsr as-is, the compiler might be made to recogize

and
 optimise idioms like

 if (v) {
 a = bsr(v);
 if (a < b) {
 // do something
 else if (a > c) {
 // do something
 }
 } else {
 throw new Error("Who stole all our bits?");
 }

 In hindsight, it seems to me the above code already compiles to something

just
 as fast as what could be achieved by changig the bsr function. So even

making a
 non-intrinsic bsr that does the -1 thing would be of no value.

Actually, both BSR & BSF are conditional operations, in case of a zero input
they do nothing.
It's quite easy to make them return some predefined values in such case.

Here is how I do that (in MSVC):

inline int bsf( int x )
{
  __asm  mov  eax, 32 // if(!x) return 32;
  __asm  bsf  eax, x
}

inline int bsr( int x )
{
  __asm  mov  eax, -1 // if(!x) return -1;
  __asm  bsr  eax, x
}

May 21 2003

D Programming

C/C++ Programming

Other

D - Objection: bsf() return value