digitalmars.D.learn - BitArray

digitalmars.D.learn - BitArray - Is there one?

Era Scarecrow (33/33) May 26 2012 Curiously I'm making a huffman compression algo for fun, however

=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (6/37) May 26 2012 std.bitmanip.BitArray.

Era Scarecrow (5/6) May 26 2012 And I was certain I checked std.bitmanip. Oh well, I'll throw my

Era Scarecrow (4/10) May 27 2012 Wow, it passed all my basic unittests with flying colors, even
=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (16/21) May 27 2012 It could definitely use some improvement. In particular:

Era Scarecrow (11/22) May 27 2012 Mmm probably the first change I'd do is

bearophile (5/6) May 27 2012 Take also a look in Bugzilla, there are few small enhancement

Era Scarecrow (53/63) May 30 2012 Got some improvements, although the bulk work needs to be tested

Dmitry Olshansky (7/35) May 30 2012 It all went nice and well...

bearophile (7/8) May 30 2012 That sounds a bit too much. I think 2 - 3 words should be enough.
Era Scarecrow (22/24) May 30 2012 Well for the whole normal size it was 32bytes, or 24 if I drop

Dmitry Olshansky (20/44) May 30 2012 That overhead is for concatenation ? Did it ever occured to you that

Era Scarecrow (59/83) May 30 2012 Yes, a ~ b of course creates a new one. a ~= true or a ~= b is

Suliman (6/6) Aug 22 2014 bool[4] x = [1, 1, 0, 0];

bearophile (5/11) Aug 22 2014 It still lacks some features like a constructor that accepts a

Chris Cain (4/5) May 27 2012 And also std.container.Array ... it has a specialization that

Era Scarecrow (3/5) May 27 2012 Unfortunately I'm not following, either in the documentation or

Era Scarecrow (3/8) May 27 2012 Nevermind, was looking in std.array and not std.container.

Jonathan M Davis (5/17) May 27 2012 AFAIK, there are no plans to get rid of it due to the bool packing in

Era Scarecrow (15/18) May 28 2012 Well so far the biggest problem I have is trying to make

Dmitry Olshansky (24/40) May 28 2012 Check your math. Xor != sub.

Era Scarecrow (29/52) May 28 2012 Hmm well subs would all happen at the 1 bit level. So let's

jerro (8/24) May 28 2012 I think you misunderstood. If you view a boolean as a one bit
Dmitry Olshansky (19/65) May 28 2012 You surely haven't looked at the source code did you? :)

Dmitry Olshansky (3/7) May 28 2012 ... in-place bit array. So it's cheap tag for your union.
Era Scarecrow (15/27) May 28 2012 I recall looking at it, but to me that just didn't make sense. I

Dmitry Olshansky (11/30) May 28 2012 No you don't. Since pointer is already a pointer to word-sized object.

Era Scarecrow (5/16) May 29 2012 Don't you mean the first 2 bits? (being least significant). Even

Dmitry Olshansky (15/30) May 29 2012 Yes, least significant. no you memory is intact.

Era Scarecrow (5/8) May 28 2012 Annoying, twice I've tried to branch/fork to post updates in

Jonathan M Davis (4/13) May 28 2012 Well, I'm afraid that that's completely out of our hands. Overall, git a...
Dmitry Olshansky (4/11) May 28 2012 hm... http://git-scm.com/book

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

  Curiously I'm making a huffman compression algo for fun, however 
I didn't see anything in std.array or anywhere that was to 
support bools specifically (in a packed form anyways). So I'm 
making one. So far I've got it saving the data as uint, 0 refers 
to the most significant bit and 7 the least significant, so A 
will come out as 0100001 and look right

  It supports a range, however foreach doesn't like using 
front/popFront/empty if I make opApply. It supports slices as 
well.

  For the BitArray, using it as follows I get an error:

struct BitArray {
   alias length opDollar;
   alias length __dollar;

   int length();
   Range opSlice();
   Range opSlice(int s, int e);
   struct Range {
      int opDollar();
   }
}

  BitArray ba = BitArray(someArrayOfUbyte);
  auto range = ba[0 .. $]; //needs to access 'this' for length? 
shouldn't it rewrite to ba[0 .. ba.opDollar()] ?
  auto range2 = ba[];      //works fine

  It also seems the BitArray (but not the range) requires __dollar 
rather than opDollar, however an alias seems to fix that. (Why is 
this?? the Range doesn't need a __dollar alias to work right)

  I also have it so you can ref the individual bools (by using an 
intermediate one in opApply). So..

//perfectly fine in unittest
foreach(ref r_b; range) {
   r_b = !r_b;
}

May 26 2012

=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:

On 27-05-2012 08:50, Era Scarecrow wrote:
 Curiously I'm making a huffman compression algo for fun, however I
 didn't see anything in std.array or anywhere that was to support bools
 specifically (in a packed form anyways). So I'm making one. So far I've
 got it saving the data as uint, 0 refers to the most significant bit and
 7 the least significant, so A will come out as 0100001 and look right

 It supports a range, however foreach doesn't like using
 front/popFront/empty if I make opApply. It supports slices as well.

 For the BitArray, using it as follows I get an error:

 struct BitArray {
 alias length opDollar;
 alias length __dollar;

 int length();
 Range opSlice();
 Range opSlice(int s, int e);
 struct Range {
 int opDollar();
 }
 }

 BitArray ba = BitArray(someArrayOfUbyte);
 auto range = ba[0 .. $]; //needs to access 'this' for length? shouldn't
 it rewrite to ba[0 .. ba.opDollar()] ?
 auto range2 = ba[]; //works fine

 It also seems the BitArray (but not the range) requires __dollar rather
 than opDollar, however an alias seems to fix that. (Why is this?? the
 Range doesn't need a __dollar alias to work right)

 I also have it so you can ref the individual bools (by using an
 intermediate one in opApply). So..

 //perfectly fine in unittest
 foreach(ref r_b; range) {
 r_b = !r_b;
 }

std.bitmanip.BitArray.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

May 26 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Sunday, 27 May 2012 at 06:53:33 UTC, Alex Rønne Petersen 
wrote:
 std.bitmanip.BitArray.

  And I was certain I checked std.bitmanip. Oh well, I'll throw my 
current unittests at it and perhaps improve what's already 
avaliable if it needs it.

May 26 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Sunday, 27 May 2012 at 06:58:21 UTC, Era Scarecrow wrote:
 On Sunday, 27 May 2012 at 06:53:33 UTC, Alex Rønne Petersen 
 wrote:
 std.bitmanip.BitArray.

  And I was certain I checked std.bitmanip. Oh well, I'll throw 
 my current unittests at it and perhaps improve what's already 
 avaliable if it needs it.

  Wow, it passed all my basic unittests with flying colors, even 
working with $ where mine failed. Most curious :) Guess my 
partial code can be put away and likely never touched again.

May 27 2012

=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:

On 27-05-2012 08:58, Era Scarecrow wrote:
 On Sunday, 27 May 2012 at 06:53:33 UTC, Alex Rønne Petersen wrote:
 std.bitmanip.BitArray.

 And I was certain I checked std.bitmanip. Oh well, I'll throw my current
 unittests at it and perhaps improve what's already avaliable if it needs
 it.

It could definitely use some improvement. In particular:

* It still uses the deprecated operator overload methods, rather than 
opBinary(Right) and opUnary.
* It's not quite const/immutable-friendly.
* It forces the GC on you. Probably not easy to solve until Andrei comes 
up with an allocators design.
* Needs more pure and nothrow.
* Needs more  safe/ trusted.
* The init() methods are very unintuitive at first glance.
* Should probably deprecate the reverse and sort properties.

... if you want to have a poke at it. ;)

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

May 27 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Sunday, 27 May 2012 at 07:04:36 UTC, Alex Rønne Petersen 
wrote:
 It could definitely use some improvement. In particular:

 * It still uses the deprecated operator overload methods, 
 rather than opBinary(Right) and opUnary.
 * It's not quite const/immutable-friendly.
 * It forces the GC on you. Probably not easy to solve until 
 Andrei comes up with an allocators design.
 * Needs more pure and nothrow.
 * Needs more  safe/ trusted.
 * The init() methods are very unintuitive at first glance.
 * Should probably deprecate the reverse and sort properties.

 ... if you want to have a poke at it. ;)

Mmm probably the first change I'd do is

struct BitArray(INT = size_t)

  It should more precise size control letting you make larger than 
2^32bits in an array. If there's an objection to this I'd need to 
know before I began (Not that it changes any functionality).

  I'll go over it and see what I can do. Strange, as I'm looking 
it over it looks remarkably similar to my incomplete one. Not 
seeing any support for slices, I'll probably add that 
functionality.

May 27 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Era Scarecrow:

  I'll go over it and see what I can do.

Take also a look in Bugzilla, there are few small enhancement 
requests about BitArray (like a fast population count method).

Bye,
bearophile

May 27 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

  Got some improvements, although the bulk work needs to be tested 
more otherwise it all seems to work quite well. Once I figure out 
how to post to GitHub I'll upload a version for everyone to play 
with and review. Code's not beautiful, but other than the length 
function, most of it is easy to follow.

  I'll see if I can post it tomorrow, depending on how much my 
brain wants to be my friend.

On Sunday, 27 May 2012 at 07:04:36 UTC, Alex Rønne Petersen 
wrote:
 It could definitely use some improvement. In particular:

 * It still uses the deprecated operator overload methods, 
 rather than opBinary(Right) and opUnary.

  Fixed, except I can't seem to get around using opCat, beyond 
that it uses only the newer methods. and less code in those spots.

 * It's not quite const/immutable-friendly.

  It's const friendly now; which includes slices; That doesn't 
include immutable though as I'll probably have to make a whole 
bunch of new functions just to handle immutable but otherwise it 
doesn't seem like it's out of reach.

 * It forces the GC on you. Probably not easy to solve until 
 Andrei comes up with an allocators design.

  With the compact version (up to 256 bits) the GC is not needed.

 * Needs more pure and nothrow.

  Since bitmanips templates (The version I have anyways) aren't 
pure/nothrow, neither can the rest of it be. If/when those get 
fixed I'll tag it as best I can.

 * Needs more  safe/ trusted.

  Tried to tag as much as I can. Half of it's  safe, the other 
half  trusted.

 * The init() methods are very unintuitive at first glance.

  Still not very intuitive; however constructors are also 
available that do as good if not better than init.

 * Should probably deprecate the reverse and sort properties.

  Depreciated.


  Other features: Slice operations available, so you can slice 
specific sets of bits. Since opDollar doesn't work right I can't 
rely on them just yet. uses and accepts ulong for the slices and 
opIndex. The BitArray footprint is about 36 bytes for 32bit 
systems, and likely 40bytes for 64bit systems.

  bool[4] x = [1, 1, 0, 0];
  BitArray ba = BitArray(x);

  ba = ba[2 .. ba.length]; //or BitArray(x, 2, x.length)
  assert(ba == x[2 .. $]);


// const

  ba = BitArray(x);
  const BitArray cba = ba;
  assert(cba == ba);

  const BitArray slice = ba[2 .. ba.length]; //slices work too!
  assert(ba[2 .. cba.length] == cba);
  ba[0 .. 2] = cba[];


// compact / GC-less.

  assert(ba.isCompact()); //if it's true, no GC involved.
  BitArray ca = ba;
  ca[0] = 0;
  assert(ca != ba);

  ba.length = 1000;
  assert(!ba.isCompact()); //GC'd
  ba = ba[0 .. 4];
  assert(!ba.isCompact()); //slice doesn't automatically convert 
to compact

  ca = ba;
  ca[0] = 0;
  assert(ba == ca); //like a regular slice, shares memory

May 30 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 30.05.2012 15:04, Era Scarecrow wrote:
[snip]
 Other features: Slice operations available, so you can slice specific
 sets of bits. Since opDollar doesn't work right I can't rely on them
 just yet. uses and accepts ulong for the slices and opIndex. The
 BitArray footprint is about 36 bytes for 32bit systems, and likely
 40bytes for 64bit systems.

It all went nice and well...
Ouch why 36 bytes?

 bool[4] x = [1, 1, 0, 0];
 BitArray ba = BitArray(x);

 ba = ba[2 .. ba.length]; //or BitArray(x, 2, x.length)
 assert(ba == x[2 .. $]);


 // const

 ba = BitArray(x);
 const BitArray cba = ba;
 assert(cba == ba);

 const BitArray slice = ba[2 .. ba.length]; //slices work too!
 assert(ba[2 .. cba.length] == cba);
 ba[0 .. 2] = cba[];


 // compact / GC-less.

 assert(ba.isCompact()); //if it's true, no GC involved.
 BitArray ca = ba;
 ca[0] = 0;
 assert(ca != ba);

 ba.length = 1000;
 assert(!ba.isCompact()); //GC'd
 ba = ba[0 .. 4];
 assert(!ba.isCompact()); //slice doesn't automatically convert to compact

 ca = ba;
 ca[0] = 0;
 assert(ba == ca); //like a regular slice, shares memory

Cool.

-- 
Dmitry Olshansky

May 30 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Dmitry Olshansky:

 Ouch why 36 bytes?

That sounds a bit too much. I think 2 - 3 words should be enough.

I also suggest the OP to take a look at Bugzilla:
http://d.puremagic.com/issues/buglist.cgi?quicksearch=BitArray

If you have questions feel free to ask :)

Bye,
bearophile

May 30 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Wednesday, 30 May 2012 at 11:19:43 UTC, Dmitry Olshansky wrote:
 It all went nice and well...
 Ouch why 36 bytes?

  Well for the whole normal size it was 32bytes, or 24 if I drop 
the ulong 'reserved' which does a minor optimization so it can 
expand rather than duping every time it expands (but slices know 
not to expand and dup). Otherwise...

  auto x = BitArray(1000);
  auto y = x[256 .. 700];

  x ~= true; //expands without resizing (Leftover space)
  y ~= true; //overwrite bit 701 in x? Dups instead

  There's also much faster prepending of bits (Thanks to slice 
support). So...

  x = true ~ x; //dups & reallocates once, but not O(n) slow
  x = false ~ x; //uses reserved space at beginning used, so O(1)

Let's see, 32bit systems if we have 24 bytes, and 4 bytes for the 
overhead, 20*8, 160 bits would be available, or 32bytes with 28 
bytes extra at 228 bits

On 64bit systems, you'll have 32bytes, 24 avaliable, so 192 bits. 
or 40bytes with 32, giving 256 bits. I will the current to at 
least match the regular/normal one equally so some space isn't 
lost, but it won't be by much.

  If you drop it down to a simple pointer (like the original), 
than slices and extra features and optimizations go away.

May 30 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 30.05.2012 23:25, Era Scarecrow wrote:
 On Wednesday, 30 May 2012 at 11:19:43 UTC, Dmitry Olshansky wrote:
 It all went nice and well...
 Ouch why 36 bytes?

 Well for the whole normal size it was 32bytes, or 24 if I drop the ulong
 'reserved' which does a minor optimization so it can expand rather than
 duping every time it expands (but slices know not to expand and dup).
 Otherwise...

 auto x = BitArray(1000);
 auto y = x[256 .. 700];

 x ~= true; //expands without resizing (Leftover space)
 y ~= true; //overwrite bit 701 in x? Dups instead

 There's also much faster prepending of bits (Thanks to slice support).
 So...

 x = true ~ x; //dups & reallocates once, but not O(n) slow
 x = false ~ x; //uses reserved space at beginning used, so O(1)

 Let's see, 32bit systems if we have 24 bytes, and 4 bytes for the
 overhead, 20*8, 160 bits would be available, or 32bytes with 28 bytes
 extra at 228 bits

That overhead is for concatenation ? Did it ever occured to you that 
arrays do sometimes reallocate on append. And of course a ~ b should 
produce new copy (at least semantically).
Why bit array shouldn't follow the same design?

I fail to see how the whole 4 byte word has to be wasted.
Obviously you can denote 1 byte for small length. Or maximum 2bytes in 
both cases.

 On 64bit systems, you'll have 32bytes, 24 avaliable, so 192 bits. or
 40bytes with 32, giving 256 bits. I will the current to at least match
 the regular/normal one equally so some space isn't lost, but it won't be
 by much.

Same thought where these 8 bytes go ? Why would I pay 8 bytes for 
possible concatenation if I happen to use fixed sizes or resize them 
rarely? Pay as you go is the most sane principle.
(built-in arrays do have append cache if you didn't know so they already 
reserve space in advance(!))

 If you drop it down to a simple pointer (like the original),

than slices
 and extra features and optimizations go away.

Slice could be another type BTW. That type should be implicitly 
convertible to BitArray on assignment. Thus slice would have start/end 
pair of ulongs while array itself wouldn't.
Sounds good?

-- 
Dmitry Olshansky

May 30 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Wednesday, 30 May 2012 at 19:43:32 UTC, Dmitry Olshansky wrote:
 On 30.05.2012 23:25, Era Scarecrow wrote:

 That overhead is for concatenation ? Did it ever occurred to 
 you that arrays do sometimes reallocate on append. And of 
 course a ~ b should produce new copy (at least semantically).
 Why bit array shouldn't follow the same design?

  Yes, a ~ b of course creates a new one. a ~= true or a ~= b is 
different. To do slices I did a simple start/end position pair 
and it keeps the current array (Not too unlike a regular array). 
When you slice it just adjusts the starting/ending location, and 
'reserved' gets truncated based on the slice. Course I might be 
able to drop that to a single bit meaning it was a slice or the 
original, but that still has to come from somewhere, due to 
alignment it would likely still be a 32/64bit size_t, unless I 
have other stuff to go with it.

 I fail to see how the whole 4 byte word has to be wasted. 
 Obviously you can denote 1 byte for small length. Or maximum 
 2bytes in both cases.

Truthfully only 2 bytes are wasted for the compact version (As a 
flag), and 2 bytes are used for the start/ending locations.

 On 64bit systems, you'll have 32bytes, 24 avaliable, so 192 
 bits. or 40bytes with 32, giving 256 bits. I will the current 
 to at least match the regular/normal one equally so some space 
 isn't lost, but it won't be by much.

 Same thought where these 8 bytes go ? Why would I pay 8 bytes 
 for possible concatenation if I happen to use fixed sizes or 
 resize them rarely? Pay as you go is the most sane principle.
 (built-in arrays do have append cache if you didn't know so 
 they already reserve space in advance(!))

  But unfortunately bit arrays aren't normal arrays. If you did 
exactly the amount to fill the size_t types, then you won't have 
extra space and it would have to be resized to concatenate 
(assuming it could be). I can also only do what makes sense to 
me. If I don't have a 'reserved' somehow, every concatenation 
could require a resize/copy to ensure the slices don't overlap. 
So a ~= true; a ~= true; would do 2 forced resizes/copies, which 
is silly but would be required.

 If you drop it down to a simple pointer (like the original), 
 than slices
 and extra features and optimizations go away.

 Slice could be another type BTW. That type should be implicitly 
 convertible to BitArray on assignment. Thus slice would have 
 start/end pair of ulongs while array itself wouldn't.
 Sounds good?

  Sounds workable... Although it increases the complexity a bit, 
and even more functions to make just to cover the compatibility 
and differences. Since the BitArray is a struct rather than a 
class, if it were a class, then it would be easier to do some of 
this, but the hidden overhead then becomes greater than the 
struct overhead. Then there's also if you decide you want a 
uneven array not divisible by size_t bits, you'd get stuck with a 
slice anyways or the length would be wrong all the time.

  a.length = 10;
  assert(a == 10); //breaks! Length is 32/64
  a.length = 32;
  assert(a == 32); //okay on 32bit machines
  a ~= true;
  assert(a == 33); //Breaks! length is now 64/128

  Choices choices.... Simpler seems

  Having it implicitly convertible means any time you would want 
to do a bitarray and a slice, the slice needs to reallocate so 
it's beginning offset is 0 and not possible 0 - size_t*8, or 
making a slice actually makes a unique copy each time, which is 
it's own overhead but makes the structure smaller, or making a 
slice version of a BitArray, but once again doubles complexity 
and requires helper versions for all of them. The simple bitarray 
suddenly is becoming very large and complex just to save a few 
bytes.

  When I did C programming before I was always anal about space 
which I find now is rather silly. I would do something like

struct {
  unsigned int a : 5;
  unsigned int b : 5;
  unsigned int c : 10;
  unsigned int d : 2;
}

  etc etc. Any time my requirements changed although I needed to 
change a few bits, having the size difference between 4 bytes and 
32 bytes depending on how many times the object is made seems 
small and silly. I have gigs of memory free and only only a 
handful of the objects made at any given time makes it seem silly.

  Maybe I have it all wrong, or maybe there's much more to worry 
about than originally thought.

May 30 2012

"Suliman" <evermind live.ru> writes:

  bool[4] x = [1, 1, 0, 0];
  BitArray ba = BitArray(x);

When I try to compile this I am getting error:
source\app.d(13): Error: cannot implicitly convert expression (x) 
of type bool[4] to uint

What I am doing wrong?

Aug 22 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Suliman:

  bool[4] x = [1, 1, 0, 0];
  BitArray ba = BitArray(x);

 When I try to compile this I am getting error:
 source\app.d(13): Error: cannot implicitly convert expression 
 (x) of type bool[4] to uint

 What I am doing wrong?

It still lacks some features like a constructor that accepts a 
(lazy or eager) Range of bits/bools.

Bye,
bearophile

Aug 22 2014

"Chris Cain" <clcain uncg.edu> writes:

On Sunday, 27 May 2012 at 06:53:33 UTC, Alex Rønne Petersen 
wrote:
 std.bitmanip.BitArray.

And also std.container.Array ... it has a specialization that 
packs bools. It also appears to be more modern.

May 27 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Sunday, 27 May 2012 at 17:26:33 UTC, Chris Cain wrote:
 And also std.container.Array ... it has a specialization that 
 packs bools. It also appears to be more modern.

  Unfortunately I'm not following, either in the documentation or 
glancing over the sources.

May 27 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Sunday, 27 May 2012 at 18:18:13 UTC, Era Scarecrow wrote:
 On Sunday, 27 May 2012 at 17:26:33 UTC, Chris Cain wrote:
 And also std.container.Array ... it has a specialization that 
 packs bools. It also appears to be more modern.

  Unfortunately I'm not following, either in the documentation 
 or glancing over the sources.


  Nevermind, was looking in std.array and not std.container.

  So should I work on std.bitmanip.BitArray?

May 27 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, May 27, 2012 20:21:23 Era Scarecrow wrote:
 On Sunday, 27 May 2012 at 18:18:13 UTC, Era Scarecrow wrote:
 On Sunday, 27 May 2012 at 17:26:33 UTC, Chris Cain wrote:
 And also std.container.Array ... it has a specialization that
 packs bools. It also appears to be more modern.
 

  Unfortunately I'm not following, either in the documentation
 
 or glancing over the sources.

 
   Nevermind, was looking in std.array and not std.container.
 
   So should I work on std.bitmanip.BitArray?

AFAIK, there are no plans to get rid of it due to the bool packing in 
std.container.Array, so if there's anything that you can do to improve it, go 
right ahead. Help is welcome.

- Jonathan M Davis

May 27 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Sunday, 27 May 2012 at 18:25:38 UTC, Jonathan M Davis wrote:
 AFAIK, there are no plans to get rid of it due to the bool 
 packing in std.container.Array, so if there's anything that you 
 can do to improve it, go right ahead. Help is welcome.

  Well so far the biggest problem I have is trying to make 
const-friendly slice(s). Also minor issues with the built-in 
core.bitop functions as they accept size_t and not long, so 
beyond having to include my own versions everything else seems to 
be happy and passes. I wouldn't say it's cleaner, but should be 
an improvement.

  One functionality I fully removed was the subtract operator. 
Honestly it makes no sense; you may as well use xor. Originally I 
updated it to use ubyte rather than size_t, however seeing as 
there would be a big speed drop I ended up reverting it 
afterwards.

  I'm not sure about merging it into GitHub, I'd prefer to have 
someone glance it over and give it a try before adding it back 
into the standard library.

May 28 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 29.05.2012 1:39, Era Scarecrow wrote:
 On Sunday, 27 May 2012 at 18:25:38 UTC, Jonathan M Davis wrote:
 AFAIK, there are no plans to get rid of it due to the bool packing in
 std.container.Array, so if there's anything that you can do to improve
 it, go right ahead. Help is welcome.

 Well so far the biggest problem I have is trying to make const-friendly
 slice(s). Also minor issues with the built-in core.bitop functions as
 they accept size_t and not long, so beyond having to include my own
 versions everything else seems to be happy and passes. I wouldn't say
 it's cleaner, but should be an improvement.

 One functionality I fully removed was the subtract operator. Honestly it
 makes no sense; you may as well use xor. Originally I updated it to use
 ubyte rather than size_t, however seeing as there would be a big speed
 drop I ended up reverting it afterwards.

Check your math. Xor != sub.
  1 ^ 0 == 1, 0 ^ 1 == 1;
Compare with
  1 <sub> 0 == 1, 0 <sub> 1 == 0.

That is, for one thing, sub is asymmetric :)
 I'm not sure about merging it into GitHub, I'd prefer to have someone
 glance it over and give it a try before adding it back into the standard
 library.

I think it would be nice to rewrite operators to new style. Cut down of 
the sheer repetition is good enough reason to merge it.

Alas the other primary thing it lacks aside from set operators 
overloading is Small string optimization. Namely you have some 8bytes 
here, that's more then enough to store say 7*8=56 bits, and have 7 bits 
for length. Presumably you check if it's a small set by using lower bit 
of a pointer.
e.g. 0 == normal pointer, big set. if set to 1 == small set, everything 
else in next 3 bits of "pointer" are small-length, all other 7 bytes of 
the whole struct contain bit-array. You still can operate on it as 2 words.

Then it can have very interesting use cases. Like say completely replace 
usual nitty-gritty C-ish bit-flags. They sadly are too cost effective so 
that no suitable safe alternative came about yet. Time to make a game 
changer? ;)

(and of course this optimization would be very welcome, I'll happily 
bash any walls in front of such contribution making into Phobos)

-- 
Dmitry Olshansky

May 28 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Monday, 28 May 2012 at 21:59:36 UTC, Dmitry Olshansky wrote:
 Check your math. Xor != sub.
  1 ^ 0 == 1, 0 ^ 1 == 1;
 Compare with
  1 <sub> 0 == 1, 0 <sub> 1 == 0.

 That is, for one thing, sub is asymmetric :)

  Hmm well subs would all happen at the 1 bit level. So let's 
compare.
xor
0 ^ 0 = 0
0 ^ 1 = 1
1 ^ 0 = 1
1 ^ 1 = 0

sub
0 - 0 = 0
0 - 1 = -1 (or non-zero/true, truncates to 1)
1 - 0 = 1
1 - 1 = 0

  Sorry, seems the same unless we are going with carry rules, (at 
which point it is no longer bit arrays subs), it seems to follow 
the xor rules.

 I think it would be nice to rewrite operators to new style. Cut 
 down of the sheer repetition is good enough reason to merge it.

 Alas the other primary thing it lacks aside from set operators 
 overloading is Small string optimization. Namely you have some 
 8bytes here, that's more then enough to store say 7*8=56 bits, 
 and have 7 bits for length. Presumably you check if it's a 
 small set by using lower bit of a pointer. e.g. 0 == normal 
 pointer, big set. if set to 1 == small set, everything else in 
 next 3 bits of "pointer" are small-length, all other 7 bytes of 
 the whole struct contain bit-array. You still can operate on it 
 as 2 words.

  Union work. Could do it, but it then requires special rules. To 
do slices I have 2 longs representing the offset and ending bits 
within the size_t array, so we're looking at 192-256 bits. minus 
16 will still leave us with 128 workable bits.

  Dealing with the pointer seems like a bad idea, but my starting 
offset we can say the first two bytes are 255, dropping the 
usable range of the starting offset to about 2^64 - 2^32, That 
could work and keep almost all the functionality.

 Then it can have very interesting use cases. Like say 
 completely replace usual nitty-gritty C-ish bit-flags. They 
 sadly are too cost effective so that no suitable safe 
 alternative came about yet. Time to make a game changer? ;)

  I don't see BitArray as a good replacement for the C flags. 
However I already made (and suggested) a flags struct that 
accepts and uses enums for it's flags.

 (and of course this optimization would be very welcome, I'll 
 happily bash any walls in front of such contribution making 
 into Phobos)

  Then I'll import it and Hope I don't get too much criticism from 
the community.

May 28 2012

"jerro" <a a.com> writes:

 That is, for one thing, sub is asymmetric :)

  Hmm well subs would all happen at the 1 bit level. So let's 
 compare.
 xor
 0 ^ 0 = 0
 0 ^ 1 = 1
 1 ^ 0 = 1
 1 ^ 1 = 0

 sub
 0 - 0 = 0
 0 - 1 = -1 (or non-zero/true, truncates to 1)
 1 - 0 = 1
 1 - 1 = 0

  Sorry, seems the same unless we are going with carry rules, 
 (at which point it is no longer bit arrays subs), it seems to 
 follow the xor rules.

I think you misunderstood. If you view a boolean as a one bit 
integer, then yes, xor is the same as subtraction. But what 
Dmitry probably meant is subtracting sets 
(http://en.wikipedia.org/wiki/Set_(mathematics)#Complements). So 
for bit arrays a and b the result of a - b would be a bit array 
where bit at index i is set if it is set in a and not set in b. 
That's also what documentation for std.bitmanip.BitArray 
currently says.

May 28 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 29.05.2012 4:13, Era Scarecrow wrote:
 On Monday, 28 May 2012 at 21:59:36 UTC, Dmitry Olshansky wrote:
 Check your math. Xor != sub.
 1 ^ 0 == 1, 0 ^ 1 == 1;
 Compare with
 1 <sub> 0 == 1, 0 <sub> 1 == 0.

 That is, for one thing, sub is asymmetric :)

 Hmm well subs would all happen at the 1 bit level. So let's compare.
 xor
 0 ^ 0 = 0
 0 ^ 1 = 1
 1 ^ 0 = 1
 1 ^ 1 = 0

 sub
 0 - 0 = 0
 0 - 1 = -1 (or non-zero/true, truncates to 1)
 1 - 0 = 1
 1 - 1 = 0

You surely haven't looked at the source code did you? :)
BitArray opSub(BitArray e2)
{
	BitArray result;
	
         result.length = len;
         for (size_t i = 0; i < dim; i++)
             result.ptr[i] = this.ptr[i] & ~e2.ptr[i];
         return result;
}
It's conceptualy non per bit '-', it's a set difference...

 Alas the other primary thing it lacks aside from set operators
 overloading is Small string optimization. Namely you have some 8bytes
 here, that's more then enough to store say 7*8=56 bits, and have 7
 bits for length. Presumably you check if it's a small set by using
 lower bit of a pointer. e.g. 0 == normal pointer, big set. if set to 1
 == small set, everything else in next 3 bits of "pointer" are
 small-length, all other 7 bytes of the whole struct contain bit-array.
 You still can operate on it as 2 words.

 Union work. Could do it, but it then requires special rules. To do
 slices I have 2 longs representing the offset and ending bits within the
 size_t array, so we're looking at 192-256 bits. minus 16 will still
 leave us with 128 workable bits.

 Dealing with the pointer seems like a bad idea, but my starting offset
 we can say the first two bytes are 255, dropping the usable range of the
 starting offset to about 2^64 - 2^32, That could work and keep almost
 all the functionality.

Not at all. Once you established that it's not a pointer namely since 
every pointer to size_t is word aligned (unless constructed by hand).
You could use it's lowest bit as marker then. It's 0 state won't disturb 
pointer usual semantics, when it's set to 1 it's obviously

 Then it can have very interesting use cases. Like say completely
 replace usual nitty-gritty C-ish bit-flags. They sadly are too cost
 effective so that no suitable safe alternative came about yet. Time to
 make a game changer? ;)

 I don't see BitArray as a good replacement for the C flags. However I
 already made (and suggested) a flags struct that accepts and uses enums
 for it's flags.

Cool. Eager to see that on the way to Phobos too.

 (and of course this optimization would be very welcome, I'll happily
 bash any walls in front of such contribution making into Phobos)

 Then I'll import it and Hope I don't get too much criticism from the
 community.


-- 
Dmitry Olshansky

May 28 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

 Not at all. Once you established that it's not a pointer namely since
 every pointer to size_t is word aligned (unless constructed by hand).
 You could use it's lowest bit as marker then. It's 0 state won't disturb
 pointer usual semantics, when it's set to 1 it's obviously

... in-place bit array. So it's cheap tag for your union.

-- 
Dmitry Olshansky

May 28 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Tuesday, 29 May 2012 at 06:30:37 UTC, Dmitry Olshansky wrote:
 You surely haven't looked at the source code did you? :)
 It's conceptualy non per bit '-', it's a set difference...

  I recall looking at it, but to me that just didn't make sense. I 
could add subtract back and update it (Not many changes needed to 
keep it).

 Not at all. Once you established that it's not a pointer namely 
 since every pointer to size_t is word aligned (unless 
 constructed by hand).

 You could use it's lowest bit as marker then. It's 0 state 
 won't disturb pointer usual semantics, when it's set to 1 it's 
 obviously.

  I considered that, but then you actually limit your address 
space to 2^63, true that seems silly up until in the future when 
we use all 64bits for memory referencing and suddenly it's seg 
faulting for no understandable reason (Yes it's a long ways off, 
but this will long be forgotten about by that time). However 
referring to the internal offsets it only effects slices; and 
those are easy to fix.

  This will likely take a little time to think over and get 
working, I'd hate to have to make alternate versions of 
everything.

 I don't see BitArray as a good replacement for the C flags. 
 However I already made (and suggested) a flags struct that 
 accepts and uses enums for it's flags.


 Cool. Eager to see that on the way to Phobos too.

  We'll see, once I sorta figure this all out.

May 28 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 29.05.2012 10:52, Era Scarecrow wrote:
 On Tuesday, 29 May 2012 at 06:30:37 UTC, Dmitry Olshansky wrote:
 You surely haven't looked at the source code did you? :)
 It's conceptualy non per bit '-', it's a set difference...

 I recall looking at it, but to me that just didn't make sense. I could
 add subtract back and update it (Not many changes needed to keep it).

 Not at all. Once you established that it's not a pointer namely since
 every pointer to size_t is word aligned (unless constructed by hand).

 You could use it's lowest bit as marker then. It's 0 state won't
 disturb pointer usual semantics, when it's set to 1 it's obviously.

 I considered that, but then you actually limit your address space to
 2^63,

No you don't. Since pointer is already a pointer to word-sized object. 
It has 2 last bits == 0. Always. There is no escaping of this fact. And 
no your address space is intact. All it has to do is assuming proper 
alignment, and you sure have it since you _allocate_ it.
To be more specific most allocator go even farther and provide 8bytes 
aligned pointer.

true that seems silly up until in the future when we use all
 64bits for memory referencing and suddenly it's seg faulting for no
 understandable reason (Yes it's a long ways off, but this will long be
 forgotten about by that time). However referring to the internal offsets
 it only effects slices; and those are easy to fix.

See the above.

 This will likely take a little time to think over and get working, I'd
 hate to have to make alternate versions of everything.


 Cool. Eager to see that on the way to Phobos too.

 We'll see, once I sorta figure this all out.


-- 
Dmitry Olshansky

May 28 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Tuesday, 29 May 2012 at 06:56:40 UTC, Dmitry Olshansky wrote:
 On 29.05.2012 10:52, Era Scarecrow wrote:
 I considered that, but then you actually limit your address 
 space to
 2^63,

 No you don't. Since pointer is already a pointer to word-sized 
 object. It has 2 last bits == 0. Always. There is no escaping 
 of this fact. And no your address space is intact. All it has 
 to do is assuming proper alignment, and you sure have it since 
 you _allocate_ it.

 To be more specific most allocator go even farther and provide 
 8bytes aligned pointer.

  Don't you mean the first 2 bits? (being least significant). Even 
so, that seems more like its working out of coincidence, that or 
you can never use more than 25% of your memory in a single 
address space in your program (ever).

May 29 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 29.05.2012 11:25, Era Scarecrow wrote:
 On Tuesday, 29 May 2012 at 06:56:40 UTC, Dmitry Olshansky wrote:
 On 29.05.2012 10:52, Era Scarecrow wrote:
 I considered that, but then you actually limit your address space to
 2^63,

 No you don't. Since pointer is already a pointer to word-sized object.
 It has 2 last bits == 0. Always. There is no escaping of this fact.
 And no your address space is intact. All it has to do is assuming
 proper alignment, and you sure have it since you _allocate_ it.

 To be more specific most allocator go even farther and provide 8bytes
 aligned pointer.

 Don't you mean the first 2 bits? (being least significant). Even so,
 that seems more like its working out of coincidence, that or you can
 never use more than 25% of your memory in a single address space in your
 program (ever).

Yes, least significant. no you memory is intact.
I think it's a classical case of you not getting how it works.
No problem, bear with me :)

Suppose you have array of size_t, let size_t.sizeof == 4.
Here it is:
data: 	a0 a1 a2 a3 a4 ...
addr:	20 24 28 32 36 ...

Noticed the pattern? Once first address is multiple of 4 (that is 
properly aligned) it can't become not a multiple of 4.
Meaning that 2 least significant bits are always == 0.

The memory is not wasted since address is incremented by 4 bytes at a 
time, and is read by a unit of 4 bytes.

-- 
Dmitry Olshansky

May 29 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Sunday, 27 May 2012 at 18:25:38 UTC, Jonathan M Davis wrote:

 AFAIK, there are no plans to get rid of it due to the bool 
 packing in std.container.Array, so if there's anything that you 
 can do to improve it, go right ahead. Help is welcome.

  Annoying, twice I've tried to branch/fork to post updates in 
GitHub and twice it says 'no changes'. It's annoying things like 
this that make me not want to contribute. Doesn't help I'm not in 
a patient mood today :(

May 28 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, May 29, 2012 04:09:48 Era Scarecrow wrote:
 On Sunday, 27 May 2012 at 18:25:38 UTC, Jonathan M Davis wrote:
 AFAIK, there are no plans to get rid of it due to the bool
 packing in std.container.Array, so if there's anything that you
 can do to improve it, go right ahead. Help is welcome.

 
   Annoying, twice I've tried to branch/fork to post updates in
 GitHub and twice it says 'no changes'. It's annoying things like
 this that make me not want to contribute. Doesn't help I'm not in
 a patient mood today :(

Well, I'm afraid that that's completely out of our hands. Overall, git and 
github work quite well, but they're not perfect.

- Jonathan M Davis

May 28 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 29.05.2012 6:09, Era Scarecrow wrote:
 On Sunday, 27 May 2012 at 18:25:38 UTC, Jonathan M Davis wrote:

 AFAIK, there are no plans to get rid of it due to the bool packing in
 std.container.Array, so if there's anything that you can do to improve
 it, go right ahead. Help is welcome.

 Annoying, twice I've tried to branch/fork to post updates in GitHub and
 twice it says 'no changes'. It's annoying things like this that make me
 not want to contribute. Doesn't help I'm not in a patient mood today :(

hm... http://git-scm.com/book


-- 
Dmitry Olshansky

May 28 2012

D Programming

C/C++ Programming

Other

digitalmars.D.learn - BitArray - Is there one?