www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - testing bits in sequence

reply spir <denis.spir gmail.com> writes:
Hello,

I need to test in sequence the bits of an unsigned int (see below more prec=
ision), and move in a tree accordingly. Since there are 2 possible branches=
 at every step, they are encoded in a [2] array, indexed by bit. I am looki=
ng for the fastest way to get that bit.

To run backwards (MSB <-- LSB), one can simply mask and shift right, like:
	uint MASK =3D 1;
	auto node =3D root;
	auto code =3D something;
	foreach (_ ; 0..BIT_SIZE) {
	    bit =3D code & MASK;
	    node =3D node.nodes[bit];
	    code >>=3D 1;
	}
But this looks like slow (to my surprise). I don't know what the limiting i=
nstruction is.

Also, My actual need would rather be to move forward. The reason is this al=
lows important optimisations for common cases. In fact, the codes are unico=
de code points: if the 5 first bits are 0 (tested with a mask), I can jump =
forward to a sub-tree corresponding to a code < 0x10000 (BMP, 99.999% cases=
 in practice). If the 14 first bits are 0 (ditto), I can jump to the ASCII =
case, very common as well.
But this seems more difficult; or rather I have not found a direct way to e=
xtract a 0/1 bit. I had first used an array of masks [2^n,2^n-1,...,4,2,1]:
	foreach (i ; 0..BIT_SIZE) {
	    mask =3D MASKS[i];
	    bit =3D !!(code & mask);
	    node =3D node.nodes[bit];
	}
But as you see masking that way lets a value of 2^i, not 1, in the 'true' c=
ase, which needs to be converted, either with "cast(bool)bit" or "!!bit". A=
nd this seems _very_ slow: runs about 3 times slower than backward traversa=
l.

Note: I cannot use a plain array, because there is a small proportion of ac=
tual values in the whole range (sparse array of ~ 1 per-thousand 'busy' loc=
ations). I'm looking for a possible alternative to using AAs (costly becaus=
e of division). Backward traversal currently runs sensibly slower that usin=
g an AA, but with optimisation for common cases there could be a huge gain =
in practice.
The data structure I'm trying to implement is concretely a kind "bit-trie" =
(dunno whether that exists already).

Any advice welcome. Thank you,

Denis
-- -- -- -- -- -- --
vit esse estrany =E2=98=A3

spir.wikidot.com
Dec 26 2010
next sibling parent Simon <s.d.hammett gmail.com> writes:
On 26/12/2010 12:18, spir wrote:
 Hello,

 I need to test in sequence

 Denis
 -- -- -- -- -- -- --
 vit esse estrany ☣

 spir.wikidot.com

http://digitalmars.com/d/2.0/phobos/std_intrinsic.html -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk
Dec 26 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
spir <denis.spir gmail.com> wrote:

 Also, My actual need would rather be to move forward. The reason is this  
 allows important optimisations for common cases. In fact, the codes are  
 unicode code points: if the 5 first bits are 0 (tested with a mask), I  
 can jump forward to a sub-tree corresponding to a code < 0x10000 (BMP,  
 99.999% cases in practice). If the 14 first bits are 0 (ditto), I can  
 jump to the ASCII case, very common as well.

std.intrinsic's[1] bsr returns the highest non-zero bit. I'd think the masking'd be as effective when you know which bits you care about.
 But this seems more difficult; or rather I have not found a direct way  
 to extract a 0/1 bit.

 [2^n,2^n-1,...,4,2,1]:
 	foreach (i ; 0..BIT_SIZE) {
 	    mask = MASKS[i];
 	    bit = !!(code & mask);
 	    node = node.nodes[bit];
 	}
 But as you see masking that way lets a value of 2^i, not 1, in the  
 'true' case, which needs to be converted, either with "cast(bool)bit" or  
 "!!bit". And this seems _very_ slow: runs about 3 times slower than  
 backward traversal.

This seems better implemented like this: foreach ( i; 0..BIT_SIZE ) { bit = ( code >> i ) & 1; node = node.nodes[bit]; }

[1]: http://www.digitalmars.com/d/2.0/phobos/std_intrinsic.html -- Simen
Dec 26 2010
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On Sun, 26 Dec 2010 13:40:22 +0100
"Simen kjaeraas" <simen.kjaras gmail.com> wrote:

 	foreach (i ; 0..BIT_SIZE) {
 	    mask =3D MASKS[i];
 	    bit =3D !!(code & mask);
 	    node =3D node.nodes[bit];
 	}
 But as you see masking that way lets a value of 2^i, not 1, in the =20
 'true' case, which needs to be converted, either with "cast(bool)bit" o=


 "!!bit". And this seems _very_ slow: runs about 3 times slower than =20
 backward traversal. =20

This seems better implemented like this: =20 foreach ( i; 0..BIT_SIZE ) { bit =3D ( code >> i ) & 1; node =3D node.nodes[bit]; }

You have not read my post carefully enough ;-) That's ~ how I coded travers= ing the bit seq backwards (right-to-left, LSB -> MSB); but I was trying to = find a way to traverse it forwards. Anyway, than you for the pointer to std.intrinsic, seems helpful. Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.com
Dec 26 2010
prev sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
spir <denis.spir gmail.com> wrote:

 foreach ( i; 0..BIT_SIZE ) {
      bit = ( code >> i ) & 1;
      node = node.nodes[bit];
 }

You have not read my post carefully enough ;-) That's ~ how I coded traversing the bit seq backwards (right-to-left, LSB -> MSB); but I was trying to find a way to traverse it forwards.

Sorry. Indeed I messed up: foreach_reverse ( i; 0..BIT_SIZE ) { bit = ( code >> i ) & 1; node = node.nodes[bit]; } or foreach ( i; 0..BIT_SIZE ) { bit = ( code >> ( BIT_SIZE + 1 - i ) ) & 1; node = node.nodes[bit]; } -- Simen
Dec 26 2010