www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Fast 2D matrix of bits

reply bearophile <bearophileHUGS lycos.com> writes:
In Phobos (beside fast 2D/3D/4D vectors) I'd like a fast 2D matrix of bits. I
have added an implementation in attach in this enhancement request:

http://d.puremagic.com/issues/show_bug.cgi?id=6697

It's a common enough data structure, and despite being simple it's not obvious,
especially if you want it to be fast. A naive implementation that uses a
BitArray is not fast enough.

Bye,
bearophile
Sep 19 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/20/2011 12:19 AM, bearophile wrote:
 In Phobos (beside fast 2D/3D/4D vectors) I'd like a fast 2D matrix of bits. I
have added an implementation in attach in this enhancement request:

 http://d.puremagic.com/issues/show_bug.cgi?id=6697

 It's a common enough data structure, and despite being simple it's not
obvious, especially if you want it to be fast. A naive implementation that uses
a BitArray is not fast enough.

 Bye,
 bearophile

The implementation uses int and uint everywhere. I think it is not 64 bit aware?
 // sizex_, sizey_ are signed to catch negative arguments bugs.

There would be no negative arguments if they were unsigned.
 // opIndex not used because it's slower on LDC

Why is it slower? The structure should imho certainly use opIndex(Assign). Since you want it to be really fast, this occurs multiple times:
 immutable int index = x + (y << this.pow2); // 1D bit index
 this._bits[index / nbits_item] |= 1 << (index % nbits_item);

nbits_item is a power of two. So the divisions/modulo have the potential to be really fast. But the signedness of index makes some straightforward optimizations very hard to carry out for the compiler. (it will probably use bit shifts and logical and, but it has to emit something more to cope with (im)possible negative values) Try making index unsigned in those cases, it'll probably save some machine instructions. (size_t is always your best bet for indexes)
Sep 19 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Timon Gehr:

 The implementation uses int and uint everywhere. I think it is not 64 
 bit aware?

I have not even tried it on a 64 bit system.
  > // sizex_, sizey_ are signed to catch negative arguments bugs.
 
 There would be no negative arguments if they were unsigned.

For me a huge unsigned value that I can't catch in the ctor precondition is worse than a negative value that I am able to catch. I don't need to create a 2D matrix with sides bigger than int.max. I agree that all the other methods are probably better with size_t arguments. I'll fix it.
 Why is it slower?

Ask it to LDC1 developers :-) Adding that method is easy, if you want it. And I agree it's handy, with syntax mat[x,y] = true/false. But note the preferred methods to use this matrix are set/reset, because they are faster than the assign. I'll think about it.
 But the signedness of index makes some 
 straightforward optimizations very hard to carry out for the compiler. 
 [...] but it has to emit something more to cope with (im)possible negative
values)

Right, index needs to be uint or size_t. I'll fix it. Thank you for your suggestions and notes, bye, bearophile
Sep 19 2011
prev sibling next sibling parent reply Josh Simmons <simmons.44 gmail.com> writes:
On Tue, Sep 20, 2011 at 10:24 AM, bearophile <bearophileHUGS lycos.com> wro=
te:
 Timon Gehr:

 The implementation uses int and uint everywhere. I think it is not 64
 bit aware?

I have not even tried it on a 64 bit system.
 =C2=A0> // sizex_, sizey_ are signed to catch negative arguments bugs.

 There would be no negative arguments if they were unsigned.

For me a huge unsigned value that I can't catch in the ctor precondition =

ate a 2D matrix with sides bigger than int.max.
 I agree that all the other methods are probably better with size_t argume=

 Why is it slower?

Ask it to LDC1 developers :-) Adding that method is easy, if you want it. And I agree it's handy, with =

 But note the preferred methods to use this matrix are set/reset, because =

 But the signedness of index makes some
 straightforward optimizations very hard to carry out for the compiler.
 [...] but it has to emit something more to cope with (im)possible negati=


 Right, index needs to be uint or size_t. I'll fix it.

 Thank you for your suggestions and notes,
 bye,
 bearophile

Zero is a power of two too :( You could also just... uint32_t next_pow_2(uint32_t n) { n -=3D 1; n |=3D n >> 1; n |=3D n >> 2; n |=3D n >> 4; n |=3D n >> 8; n |=3D n >> 16; return n + 1; }
Sep 19 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Josh Simmons:

 You could also just...
 
 uint32_t next_pow_2(uint32_t n)
 {
     n -= 1;
     n |= n >> 1;
     n |= n >> 2;
     n |= n >> 4;
     n |= n >> 8;
     n |= n >> 16;
     return n + 1;
 }

My version with bsr is faster. Bye, bearophile
Sep 20 2011
parent reply Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
20.09.2011 10:22, bearophile пишет:
 Josh Simmons:

 You could also just...

 uint32_t next_pow_2(uint32_t n)
 {
      n -= 1;
      n |= n>>  1;
      n |= n>>  2;
      n |= n>>  4;
      n |= n>>  8;
      n |= n>>  16;
      return n + 1;
 }

My version with bsr is faster. Bye, bearophile

Is it? Every conditional branch can break CPU command queue and slow down execution. Anyway, I saw same problem not far-away (2^n sized OpenGL textures). IMHO (too lazy to make tests) it is the fastest: --- bool isPow2(int n) in { assert(n > 0); } body { return !((n - 1) & n); } unittest { assert(!isPow2(0)); assert(isPow2(1)); assert(isPow2(2)); assert(!isPow2(3)); assert(isPow2(4)); assert(!isPow2(5)); } int toPow2(int n) in { assert(n > 0); } body { return 1 << (bsr(n) + !isPow2(n)); } unittest { assert(toPow2(1) == 1); assert(toPow2(2) == 2); assert(toPow2(3) == 4); assert(toPow2(4) == 4); assert(toPow2(5) == 8); } --- A short version (AFAIK, !! is one lexem for dmd): --- int toPow2(int n) { return 1 << (bsr(n) + !!((n - 1) & n)); } ---
Sep 20 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Denis Shelomovskij:

 Is it? Every conditional branch can break CPU command queue and slow 
 down execution. Anyway, I saw same problem not far-away (2^n sized 
 OpenGL textures).

Hours ago Don has written a comment in one of my enhancement requests that is related to this topic: http://d.puremagic.com/issues/show_bug.cgi?id=4046 I will do better benchmarks, and if necessary I'll update that helper function in the bit matrix code in Bugzilla (I have already fixed most of the other suggestions given by Timon Gehr). If Don is right, this asks for some deprecation in core.bitops. Bye, bearophile
Sep 22 2011
prev sibling next sibling parent Josh Simmons <simmons.44 gmail.com> writes:
On Tue, Sep 20, 2011 at 12:03 PM, Josh Simmons <simmons.44 gmail.com> wrote:
 Zero is a power of two too :(

Which one exactly? Sometimes I wonder...
Sep 19 2011
prev sibling next sibling parent Josh Simmons <simmons.44 gmail.com> writes:
On Tue, Sep 20, 2011 at 5:22 PM, bearophile <bearophileHUGS lycos.com> wrote:
 My version with bsr is faster.

 Bye,
 bearophile

Is that science or guessing? My horribly unscientific test shows the opposite to be true, I'm looking over the assembly output to see if there's an extraneous factor.
Sep 20 2011
prev sibling next sibling parent Josh Simmons <simmons.44 gmail.com> writes:
On Tue, Sep 20, 2011 at 6:36 PM, Josh Simmons <simmons.44 gmail.com> wrote:
 On Tue, Sep 20, 2011 at 5:22 PM, bearophile <bearophileHUGS lycos.com> wrote:
 My version with bsr is faster.

 Bye,
 bearophile

Is that science or guessing? My horribly unscientific test shows the opposite to be true, I'm looking over the assembly output to see if there's an extraneous factor.

Ah, when the one I gave was slower it wasn't being unrolled by gcc, when yours was slower my trivial loop was being vectorised. When they're both handled the same the results are the same not favoring either. Cool. I do like that the propagate-right version can be vectorised though.
Sep 20 2011
prev sibling parent Paul D. Anderson <paul.d.removethis.anderson comcast.andthis.net> writes:
Timon Gehr Wrote:

 
 The implementation uses int and uint everywhere. I think it is not 64 
 bit aware?
 

There is another problem with bitmanip and 64-bit integers. See Bug #6686. Paul
Sep 20 2011