digitalmars.D.announce - cashew.utils.array 0.1a.2

Chris Nicholson-Sauls (11/11) Sep 12 2006 I've been pushing for some array utilities to get into Phobos, yes, but ...

Chris Nicholson-Sauls (4/4) Sep 12 2006 And just a few minutes later, I issue another release... I'd neglected ...
Lionello Lunesu (7/14) Sep 13 2006 Nice.. I hope something like this will make it into Phobos at some

Chris Nicholson-Sauls (7/27) Sep 13 2006 True... and fixed. :) Functions indexOf, rindexOf, indexOfSub, and rin...

clayasaurus (2/631) Sep 13 2006 Phobos does need something like this.

J Duncan (5/6) Sep 13 2006 very badly, everyone rolls their own so it would be nice to have stuff

Mikola Lysenko (33/77) Sep 14 2006 These algorithms can be implemented more efficiently. The running time

Chris Nicholson-Sauls (5/93) Sep 14 2006 True, and not a bad idea. I'm not sure how it would be with arrays of c...

Chris Nicholson-Sauls (10/107) Sep 14 2006 Well I just tried it and... woo, intersect is notably faster indeed. I ...

Nikita Kalaganov (3/8) Sep 13 2006 Ahh, thanks!
Reiner Pope (76/93) Sep 13 2006 It looks good. Thanks for doing this.

Reiner Pope (1/1) Sep 13 2006 Oh, and I forgot: a reverse iterator, like mintl.array has.
Chris Nicholson-Sauls (9/118) Sep 14 2006 Because I'm mildly paranoid and wanted to avoid memory allocation where ...

Reiner Pope (8/54) Sep 14 2006 Passing by 'inout' is probably less efficient and certainly doesn't save...

Chris Nicholson-Sauls (8/70) Sep 16 2006 So it does... although I could swear that it didn't behave right before....

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

I've been pushing for some array utilities to get into Phobos, yes, but in the
meantime I 
updated the array module in my personal Cashew lib.  Included in the updates
are the 
rotl/rotr that I recall someone asking about.  In the process I've discovered
two bugs, as 
well: the behavior of 'is' versus '==' is incompatable if the operands are
arrays, and 
DDoc for abbreviated function templates is borked.  For an example of the
latter, just 
look at Cashew's own docs.

That said: if people think this is a decent collection, and Walter would take
it, I would 
be willing to release it to public domain for Phobos' sake.

The array module is attached, and the docs are at:
http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html

-- Christopher Nicholson-Sauls

Sep 12 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

And just a few minutes later, I issue another release...  I'd neglected a
little something 
from rotl() and rotr(), namely I hadn't accounted for iterations greater than
the array's 
length.  Heh.  Fixed now.  New version attached.

-- Chris Nicholson-Sauls

Sep 12 2006

Lionello Lunesu <lio lunesu.remove.com> writes:

Chris Nicholson-Sauls wrote:
 I've been pushing for some array utilities to get into Phobos, yes, but 
 in the meantime I updated the array module in my personal Cashew lib.  
 Included in the updates are the rotl/rotr that I recall someone asking 
 about.  In the process I've discovered two bugs, as well: the behavior 
 of 'is' versus '==' is incompatable if the operands are arrays, and DDoc 
 for abbreviated function templates is borked.  For an example of the 
 latter, just look at Cashew's own docs.


Nice.. I hope something like this will make it into Phobos at some 
point, since everybody seems to have implemented these functions anyway..

One remark though: your removeAll is pretty slow. It should be enough to 
iterate the array only once. A indexOf that takes a starting_index would 
be a simple fix.

L.

Sep 13 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Lionello Lunesu wrote:
 Chris Nicholson-Sauls wrote:
 
 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall someone 
 asking about.  In the process I've discovered two bugs, as well: the 
 behavior of 'is' versus '==' is incompatable if the operands are 
 arrays, and DDoc for abbreviated function templates is borked.  For an 
 example of the latter, just look at Cashew's own docs.

 
 
 
 Nice.. I hope something like this will make it into Phobos at some 
 point, since everybody seems to have implemented these functions anyway..
 
 One remark though: your removeAll is pretty slow. It should be enough to 
 iterate the array only once. A indexOf that takes a starting_index would 
 be a simple fix.
 
 L.

True... and fixed.  :)  Functions indexOf, rindexOf, indexOfSub, and
rindexOfSub all take 
an optional 'start' parameter now.  This effected the design of functions
removeAll and 
unique.

I did, however, have to do something weird with removeAll.  It just didn't want
to work 
right any other way... must play with it some more later.

-- Chris Nicholson-Sauls

Sep 13 2006

clayasaurus <clayasaurus gmail.com> writes:

Chris Nicholson-Sauls wrote:
 Lionello Lunesu wrote:
 Chris Nicholson-Sauls wrote:

 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall someone 
 asking about.  In the process I've discovered two bugs, as well: the 
 behavior of 'is' versus '==' is incompatable if the operands are 
 arrays, and DDoc for abbreviated function templates is borked.  For 
 an example of the latter, just look at Cashew's own docs.



 Nice.. I hope something like this will make it into Phobos at some 
 point, since everybody seems to have implemented these functions anyway..

 One remark though: your removeAll is pretty slow. It should be enough 
 to iterate the array only once. A indexOf that takes a starting_index 
 would be a simple fix.

 L.

 
 True... and fixed.  :)  Functions indexOf, rindexOf, indexOfSub, and 
 rindexOfSub all take an optional 'start' parameter now.  This effected 
 the design of functions removeAll and unique.
 
 I did, however, have to do something weird with removeAll.  It just 
 didn't want to work right any other way... must play with it some more 
 later.
 
 -- Chris Nicholson-Sauls
 
 
 ------------------------------------------------------------------------
 
 /*****************************************************************************************
  * Utility template functions for arrays.  Example:
  *
  * $(CASHEW_HEAD)
  *----------------------------------------------------------------------------------------
  * import cashew .utils .array ;
  * // ...
  * int[] foo = ...;
  * foo.remove(4);
  * foo.eat(foo.indexOf(2));
  *----------------------------------------------------------------------------------------
  */
 module cashew.utils.array ;
 
 /***********************************************************************************
  * Unit tests support.
  */
 version (Unittest) {
   import std .stdio ;
 }
 
 unittest {
   writefln("Unittest: cashew.utils.array: begin");
 }
 
 /***********************************************************************************
  * Create an array.
  */
 T[] array (T) (T[] arr ...)
 body {
   return arr.dup;
 }
 unittest {
   writef("\t array(...) --------> ");
   auto foo = array!(int)(1, 2, 3);
   assert(foo.length == 3U);
   assert(typeid(typeof(foo[0])) == typeid(typeof(1)));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Verify that a given value is present in an array.
  */
 bool contains (T) (inout T[] haystack, T needle)
 body {
   return haystack.indexOf(needle) != size_t.max;
 }
 unittest {
   writef("\t.contains(T) -------> ");
   auto foo = array!(int)(1, 2, 3);
   assert(  foo.contains(2));
   assert(! foo.contains(4));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Compose a new array of the values in the first parameter not present in the
second.
  */
 T[] diff (T) (inout T[] alpha, inout T[] beta)
 body {
   T[] result = alpha.dup;
 
   foreach (x; alpha) {
     if (beta.contains(x)) {
       while (result.contains(x)) {
         result.remove(x);
       }
     }
   }
   return result;
 }
 unittest {
   writef("\t.diff(T[]) ---------> ");
   auto foo = array!(int)(1, 2, 3, 4, 5);
   auto bar = array!(int)(1,    3,    5);
   assert(foo.diff(bar) == array!(int)(2, 4));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Remove some of the beginning of an array.
  */
 void eat (T) (inout T[] haystack, size_t count)
 body {
   if (count > haystack.length) {
     haystack.length = 0;
   }
   else {
     haystack = haystack[count .. haystack.length];
   }
 }
 unittest {
   writef("\t.eat(N) ------------> ");
   auto foo = array!(int)(1, 2, 3, 4, 5);
   foo.eat(3U);
   assert(foo == array!(int)(4, 5));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Search an array for a given item, and return its index, or size_t.max if
not found.
  */
 size_t indexOf (T) (inout T[] haystack, T needle, size_t start = 0U)
 body {
   size_t result = size_t.max ;
 
   foreach (index, item; haystack[start .. haystack.length]) {
     if (item is needle) {
       result = index;
       break;
     }
   }
   if (result != size_t.max) {
     result += start;
   }
   return result;
 }
 unittest {
   writef("\t.indexOf(T) --------> ");
   auto foo = array!(int)(1, 2, 3);
   assert(foo.indexOf(2) == 1U);
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Search an array for a given sub-array, and return its index, or size_t.max
if not found.
  */
 size_t indexOfSub (T) (inout T[] haystack, inout T[] bale, size_t start = 0U)
 body {
   size_t result = size_t.max                    ,
          wall   = haystack.length - bale.length ;
 
   for (size_t i = start; i < wall; i++) {
     if (haystack[i .. i + bale.length] == bale) {
       result = i;
       break;
     }
   }
   return result;
 }
 unittest {
   writef("\t.indexOfSub(T[]) ---> ");
   auto foo = array!(int)(1, 2, 3, 4, 5);
   auto sub = array!(int)(      3, 4   );
   assert(foo.indexOfSub(sub) == 2U);
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Same as indexOf but work backwards from the end.
  */
 size_t rindexOf (T) (inout T[] haystack, T needle, size_t start = size_t.max)
 body {
   size_t result = size_t.max ;
 
   if (start == size_t.max) {
     start = haystack.length - 1U;
   }
   for (size_t i = start; i >= 0U; i--) {
     if (haystack[i] is needle) {
       result = i;
       break;
     }
   }
   return result;
 }
 unittest {
   writef("\t.rindexOf(T) -------> ");
   auto foo = array!(int)(1, 9, 2, 9, 3);
   assert(foo.rindexOf(9) == 3U);
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Same as indexOf but work backwards from the end.
  */
 size_t rindexOfSub (T) (inout T[] haystack, inout T[] bale, size_t start =
size_t.max)
 body {
   size_t result = size_t.max ;
 
   if (start == size_t.max) {
     start = haystack.length;
   }
   for (size_t i = start - bale.length; i >= 0; i--) {
     if (haystack[i .. i + bale.length] == bale) {
       result = i;
       break;
     }
   }
   return result;
 }
 unittest {
   writef("\t.rindexOfSub(T[]) --> ");
   auto foo = array!(int)(1, 2, 9, 8, 1, 2, 9, 8);
   auto sub = array!(int)(1, 2                  );
   assert(foo.rindexOfSub(sub) == 4U);
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Return an array composed of common elements of two arrays.
  */
 T[] intersect (T) (inout T[] alpha, inout T[] beta)
 body {
   T[] result;
 
   foreach (x; alpha) {
     if (beta.contains(x)) {
       result ~= x;
     }
   }
   return result;
 }
 unittest {
   writef("\t.intersect(T[]) ----> ");
   auto foo = array!(int)(1, 2, 3, 4, 5      );
   auto bar = array!(int)(      3, 4, 5, 6, 7);
   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Remove an item from an array.
  */
 void remove (T) (inout T[] haystack, T needle)
 body {
   size_t index = haystack.indexOf(needle) ;
 
   if (index != size_t.max) {
     haystack.removeIndex(index);
   }
 }
 unittest {
   writef("\t.remove(T) ---------> ");
   auto foo = array!(int)(1, 2, 3);
   foo.remove(2);
   assert(foo == array!(int)(1, 3));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Remove all occurances of an item from an array.
  */
 void removeAll (T) (inout T[] haystack, T needle)
 body {
   size_t   index   = 0U ;
   size_t[] indices      ;
 
   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
     indices ~= index;
     index++;
   }
   foreach (i, x; indices) {
     haystack.removeIndex(x - i);
   }
 }
 unittest {
   writef("\t.removeAll(T) ------> ");
   auto foo = array!(int)(1, 2, 1, 3, 1, 4, 1, 5);
   foo.removeAll(1);
   assert(foo == array!(int)(2, 3, 4, 5));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Remove the item at a given index.
  */
 void removeIndex (T) (inout T[] haystack, size_t index)
 body {
   haystack = haystack[0 .. index] ~ haystack[index + 1 .. haystack.length];
 }
 unittest {
   writef("\t.removeIndex(N) ----> ");
   auto foo = array!(int)(1, 2, 3, 4);
   foo.removeIndex(2U);
   assert(foo == array!(int)(1, 2, 4));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Build an array by repeating an item.
  */
 void repeat (T) (inout T[] haystack, T needle, size_t len = size_t.max)
 body {
   if (len == size_t.max) {
     len = haystack.length;
   }
 
   haystack.length = len;
   haystack[] = needle;
 }
 unittest {
   writef("\t.repeat(T [, N]) ---> ");
   int[] foo ;
   foo.repeat(3, 3U);
   assert(foo == array!(int)(3, 3, 3));
 
   auto bar = array!(int)(0, 0, 0, 0, 0, 0, 0);
   bar.repeat(7);
   assert(bar == array!(int)(7, 7, 7, 7, 7, 7, 7));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Build an array by repeating a smaller array.
  */
 void repeatSub (T) (inout T[] haystack, inout T[] bale, size_t count)
 body {
   haystack.length = 0;
   for (int i; i < count; i++) {
     haystack ~= bale;
   }
 }
 unittest {
   writef("\t.repeatSub(T[], N) -> ");
   int[] foo                     ,
         sub = array!(int)(4, 2) ;
   foo.repeatSub(sub, 3U);
   assert(foo == array!(int)(4, 2, 4, 2, 4, 2));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Fill an array with a given smaller array.
  */
 void fill (T) (inout T[] haystack, inout T[] bale, size_t len = size_t.max)
 body {
   if (len == size_t.max) {
     len = haystack.length;
   }
   haystack.length = 0;
   while (haystack.length < len) {
     haystack ~= bale;
   }
   haystack.length = len;
 }
 unittest {
   writef("\t.fill (T[] [, N]) --> ");
   auto foo = array!(int)(1, 2, 3, 4, 5);
   auto sub = array!(int)(3, 2, 1      );
   foo.fill(sub);
   assert(foo == array!(int)(3, 2, 1, 3, 2));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Remove duplicate values from an array.
  */
 void unique (T) (inout T[] haystack)
 body {
   size_t ridx ;
 
   for (int i = 0; i < haystack.length; i++) {
     ridx = size_t.max;
     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
       haystack.removeIndex(ridx);
     }
   }
 }
 unittest {
   writef("\t.unique() ----------> ");
   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
   foo.unique();
   assert(foo == array!(int)(1, 2, 3, 4, 5));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Pull a number of items from one array into another.
  */
 T[] pull (T) (inout T[] haystack, size_t idx)
 body {
   T[] result ;
 
   if (idx >= haystack.length) {
     idx = haystack.length;
   }
 
   result   = haystack[0   .. idx            ];
   haystack = haystack[idx .. haystack.length];
   return result;
 }
 unittest {
   writef("\t.pull(N) -----------> ");
   auto foo = array!(int)(1, 2, 3, 4, 5);
   auto bar = foo.pull(3U);
   assert(foo == array!(int)(         4, 5));
   assert(bar == array!(int)(1, 2, 3      ));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Same as pull, but indexed from the end.
  */
 T[] rpull (T) (inout T[] haystack, size_t idx)
 body {
   T[] result ;
 
   if (idx >= haystack.length) {
     result = haystack;
     haystack.length = 0;
   }
   else {
     idx--;
 
     result   = haystack[idx .. haystack.length];
     haystack = haystack[0   .. idx            ];
   }
   return result;
 }
 unittest {
   writef("\t.rpull(N) ----------> ");
   auto foo = array!(int)(1, 2, 3, 4, 5);
   auto bar = foo.rpull(3U);
   assert(foo == array!(int)(1, 2         ));
   assert(bar == array!(int)(      3, 4, 5));
 
   auto alpha = array!(int)(1, 2, 3);
   auto beta  = alpha.rpull(4U);
   assert(alpha == array!(int)(       ));
   assert(beta  == array!(int)(1, 2, 3));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Rotate an array's contents toward the left.
  */
 void rotl (T) (inout T[] haystack, size_t iter)
 body {
   iter %= haystack.length;
   haystack = haystack[iter .. haystack.length] ~ haystack[0 .. iter];
 }
 unittest {
   writef("\t.rotl(N) -----------> ");
   auto foo = array!(int)(1, 2, 3, 4, 5, 6);
   foo.rotl(2U);
   assert(foo == array!(int)(3, 4, 5, 6 ,1, 2));
 
   auto bar = array!(int)(1, 2, 3);
   bar.rotl(4U);
   assert(bar == array!(int)(2, 3, 1));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Rotate an array's contents toward the right.
  */
 void rotr (T) (inout T[] haystack, size_t iter)
 body {
   size_t idx  ;
 
   iter %= haystack.length        ;
   idx   = haystack.length - iter ;
   haystack = haystack[idx .. haystack.length] ~ haystack[0 .. idx];
 }
 unittest {
   writef("\t.rotr(N) -----------> ");
   auto foo = array!(int)(1, 2, 3, 4, 5, 6);
   foo.rotr(2U);
   assert(foo == array!(int)(5, 6, 1, 2, 3, 4));
 
   auto bar = array!(int)(1, 2, 3);
   bar.rotr(4U);
   assert(bar == array!(int)(3, 1, 2));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Append to an array.
  */
 void push (T) (inout T[] haystack, T[] bale ...)
 body {
   haystack ~= bale;
 }
 unittest {
   writef("\t.push(...) ---------> ");
   auto foo = array!(int)(1, 2, 3);
   foo.push(4, 5);
   assert(foo == array!(int)(1, 2, 3, 4, 5));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Retrieve and remove the last value of an array.
  */
 T pop (T) (inout T[] haystack)
 body {
   T result = haystack[haystack.length - 1];
 
   haystack.length = haystack.length - 1;
   return result;
 }
 unittest {
   writef("\t.pop() -------------> ");
   auto foo = array!(int)(1, 2, 3);
   auto elm = foo.pop();
   assert(foo == array!(int)(1, 2));
   assert(elm == 3);
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Prepend to an array.
  */
 void backpush (T) (inout T[] haystack, T[] bale ...)
 body {
   haystack ~= bale;
   haystack.rotr(bale.length);
 }
 unittest {
   writef("\t.backpush(...) -----> ");
   auto foo = array!(int)(3, 4);
   foo.backpush(1, 2);
   assert(foo == array!(int)(1, 2, 3, 4));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Retrieve and remove the first value of an array.
  */
 T backpop (T) (inout T[] haystack)
 body {
   T result = haystack[0];
 
   haystack = haystack[1 .. haystack.length];
   return result;
 }
 unittest {
   writef("\t.backpop() ---------> ");
   auto foo = array!(int)(1, 2, 3);
   auto elm = foo.backpop();
   assert(foo == array!(int)(2, 3));
   assert(elm == 1);
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Drop values from the beginning of an array.
  */
 T[] shift (T) (inout T[] haystack, size_t count)
 body {
   T[] result ;
 
   if (count >= haystack.length) {
     result = haystack;
     haystack.length = 0;
   }
   else {
     result = haystack[0 .. count];
     haystack = haystack[count .. haystack.length];
   }
   return result;
 }
 unittest {
   writef("\t.shift(N) ----------> ");
   auto foo = array!(int)(1, 2, 3, 4, 5);
   auto sub = foo.shift(3U);
   assert(foo == array!(int)(         4, 5));
   assert(sub == array!(int)(1, 2, 3      ));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Drop values from the end of an array.
  */
 T[] rshift (T) (inout T[] haystack, size_t count)
 body {
   T[] result ;
 
   if (count >= haystack.length) {
     result = haystack;
     haystack.length = 0;
   }
   else {
     result = haystack[haystack.length - count .. haystack.length];
     haystack.length = haystack.length - count;
   }
   return result;
 }
 unittest {
   writef("\t.rshift(N) ---------> ");
   auto foo = array!(int)(1, 2, 3, 4, 5);
   auto sub = foo.rshift(3U);
   assert(foo == array!(int)(1, 2         ));
   assert(sub == array!(int)(      3, 4, 5));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Unit tests support.
  */
 unittest {
   writefln("Unittest: cashew.utils.array: end\n");
 }

Phobos does need something like this.

Sep 13 2006

J Duncan <jtd514 nospam.ameritech.net> writes:

clayasaurus wrote:
 Phobos does need something like this.

very badly, everyone rolls their own so it would be nice to have stuff 
like that consolidated in the community....

I also think this would give a more 'complete' feeling to the built-in 
arrays

Sep 13 2006

Mikola Lysenko <mclysenk mtu.edu> writes:

Chris Nicholson-Sauls wrote:
 /***********************************************************************************
  * Return an array composed of common elements of two arrays.
  */
 T[] intersect (T) (inout T[] alpha, inout T[] beta)
 body {
   T[] result;
 
   foreach (x; alpha) {
     if (beta.contains(x)) {
       result ~= x;
     }
   }
   return result;
 }
 unittest {
   writef("\t.intersect(T[]) ----> ");
   auto foo = array!(int)(1, 2, 3, 4, 5      );
   auto bar = array!(int)(      3, 4, 5, 6, 7);
   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Remove duplicate values from an array.
  */
 void unique (T) (inout T[] haystack)
 body {
   size_t ridx ;
 
   for (int i = 0; i < haystack.length; i++) {
     ridx = size_t.max;
     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
       haystack.removeIndex(ridx);
     }
   }
 }
 unittest {
   writef("\t.unique() ----------> ");
   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
   foo.unique();
   assert(foo == array!(int)(1, 2, 3, 4, 5));
   writefln("Pass");
 }
 

These algorithms can be implemented more efficiently.  The running time 
for the intersect is O(n * m) (where n,m are the length of the two 
arrays).  However, what about the case where alpha and beta are ordered? 
  Here is a simple algorithm for getting the intersect:

T[] intersect (T) (inout T[] alpha, inout T[] beta)
body
{

/* 	Assume alpha, beta are ordered such that alpha[i] < alpha[i+1]
  *	and beta[i] < beta[i+1] for all i.
  */

         T[] result;
         size_t a_pos = 0;
         size_t b_pos = 0;

         while(a_pos < alpha.length && b_pos < beta.length)
         {
                 if(alpha[a_pos] == beta[b_pos])
                 {
                         result ~= alpha[a_pos];
                         ++a_pos;
                         ++b_pos;
                 }
                 else if(alpha[a_pos] < beta[b_pos])
                         ++a_pos;
                 else if(beta[b_pos] < alpha[a_pos])
                         ++b_pos;
         }

         return result;
}


The cost for this algorithm is only O(n + m).  However, ordering the two 
arrays is effectively an O(nlogn + mlogm) operation.  Therefore the 
running time for this modified algorithm is overall O(nlogn + mlogm).  A 
similar argument can be made for the unique function.

Sep 14 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Mikola Lysenko wrote:
 Chris Nicholson-Sauls wrote:
 
 /***************************************************************
******************* 

  * Return an array composed of common elements of two arrays.
  */
 T[] intersect (T) (inout T[] alpha, inout T[] beta)
 body {
   T[] result;

   foreach (x; alpha) {
     if (beta.contains(x)) {
       result ~= x;
     }
   }
   return result;
 }
 unittest {
   writef("\t.intersect(T[]) ----> ");
   auto foo = array!(int)(1, 2, 3, 4, 5      );
   auto bar = array!(int)(      3, 4, 5, 6, 7);
   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
   writefln("Pass");
 }

 /***************************************************************
******************* 

  * Remove duplicate values from an array.
  */
 void unique (T) (inout T[] haystack)
 body {
   size_t ridx ;

   for (int i = 0; i < haystack.length; i++) {
     ridx = size_t.max;
     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
       haystack.removeIndex(ridx);
     }
   }
 }
 unittest {
   writef("\t.unique() ----------> ");
   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
   foo.unique();
   assert(foo == array!(int)(1, 2, 3, 4, 5));
   writefln("Pass");
 }

 
 These algorithms can be implemented more efficiently.  The running time 
 for the intersect is O(n * m) (where n,m are the length of the two 
 arrays).  However, what about the case where alpha and beta are ordered? 
  Here is a simple algorithm for getting the intersect:
 
 T[] intersect (T) (inout T[] alpha, inout T[] beta)
 body
 {
 
 /*     Assume alpha, beta are ordered such that alpha[i] < alpha[i+1]
  *    and beta[i] < beta[i+1] for all i.
  */
 
         T[] result;
         size_t a_pos = 0;
         size_t b_pos = 0;
 
         while(a_pos < alpha.length && b_pos < beta.length)
         {
                 if(alpha[a_pos] == beta[b_pos])
                 {
                         result ~= alpha[a_pos];
                         ++a_pos;
                         ++b_pos;
                 }
                 else if(alpha[a_pos] < beta[b_pos])
                         ++a_pos;
                 else if(beta[b_pos] < alpha[a_pos])
                         ++b_pos;
         }
 
         return result;
 }
 
 
 The cost for this algorithm is only O(n + m).  However, ordering the two 
 arrays is effectively an O(nlogn + mlogm) operation.  Therefore the 
 running time for this modified algorithm is overall O(nlogn + mlogm).  A 
 similar argument can be made for the unique function.

True, and not a bad idea.  I'm not sure how it would be with arrays of
classes/structs 
which have no defined order.  I'll try a rewrite and post it later.  :)  Maybe
I should 
just ask for a DSource entry for Cashew.

-- Chris Nicholson-Sauls

Sep 14 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Chris Nicholson-Sauls wrote:
 Mikola Lysenko wrote:
 
 Chris Nicholson-Sauls wrote:

 /***************************************************************
******************* 

  * Return an array composed of common elements of two arrays.
  */
 T[] intersect (T) (inout T[] alpha, inout T[] beta)
 body {
   T[] result;

   foreach (x; alpha) {
     if (beta.contains(x)) {
       result ~= x;
     }
   }
   return result;
 }
 unittest {
   writef("\t.intersect(T[]) ----> ");
   auto foo = array!(int)(1, 2, 3, 4, 5      );
   auto bar = array!(int)(      3, 4, 5, 6, 7);
   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
   writefln("Pass");
 }

 /***************************************************************
******************* 

  * Remove duplicate values from an array.
  */
 void unique (T) (inout T[] haystack)
 body {
   size_t ridx ;

   for (int i = 0; i < haystack.length; i++) {
     ridx = size_t.max;
     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
       haystack.removeIndex(ridx);
     }
   }
 }
 unittest {
   writef("\t.unique() ----------> ");
   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
   foo.unique();
   assert(foo == array!(int)(1, 2, 3, 4, 5));
   writefln("Pass");
 }

 These algorithms can be implemented more efficiently.  The running 
 time for the intersect is O(n * m) (where n,m are the length of the 
 two arrays).  However, what about the case where alpha and beta are 
 ordered?  Here is a simple algorithm for getting the intersect:

 T[] intersect (T) (inout T[] alpha, inout T[] beta)
 body
 {

 /*     Assume alpha, beta are ordered such that alpha[i] < alpha[i+1]
  *    and beta[i] < beta[i+1] for all i.
  */

         T[] result;
         size_t a_pos = 0;
         size_t b_pos = 0;

         while(a_pos < alpha.length && b_pos < beta.length)
         {
                 if(alpha[a_pos] == beta[b_pos])
                 {
                         result ~= alpha[a_pos];
                         ++a_pos;
                         ++b_pos;
                 }
                 else if(alpha[a_pos] < beta[b_pos])
                         ++a_pos;
                 else if(beta[b_pos] < alpha[a_pos])
                         ++b_pos;
         }

         return result;
 }


 The cost for this algorithm is only O(n + m).  However, ordering the 
 two arrays is effectively an O(nlogn + mlogm) operation.  Therefore 
 the running time for this modified algorithm is overall O(nlogn + 
 mlogm).  A similar argument can be made for the unique function.

 
 
 True, and not a bad idea.  I'm not sure how it would be with arrays of 
 classes/structs which have no defined order.  I'll try a rewrite and 
 post it later.  :)  Maybe I should just ask for a DSource entry for Cashew.
 
 -- Chris Nicholson-Sauls

Well I just tried it and... woo, intersect is notably faster indeed.  I did a
benchmark 
with arrays of 1000 elements (random values), reset and intersected 1000 times,
and the 
numbers are striking:

<Benchmark old> 19.330000
<Benchmark new> 3.080000

I'll rewrite unique later, for now here's an attachment of the module with the
rewritten 
intersect, which is actually almost identical to yours, so I'll be sure and
include credit 
for it.  :)

-- Chris Nicholson-Sauls

Sep 14 2006

"Nikita Kalaganov" <sapfirnik rambler.ru> writes:

On Wed, 13 Sep 2006 07:56:36 +0400, Chris Nicholson-Sauls  
<ibisbasenji gmail.com> wrote:

 I've been pushing for some array utilities to get into Phobos, yes, but  
 in the meantime I
 updated the array module in my personal Cashew lib.  Included in the  
 updates are the
 rotl/rotr that I recall someone asking about.

Ahh, thanks!

Sep 13 2006

Reiner Pope <reiner.pope REMOVE.THIS.gmail.com> writes:

Chris Nicholson-Sauls wrote:
 I've been pushing for some array utilities to get into Phobos, yes, but 
 in the meantime I updated the array module in my personal Cashew lib.  
 Included in the updates are the rotl/rotr that I recall someone asking 
 about.  In the process I've discovered two bugs, as well: the behavior 
 of 'is' versus '==' is incompatable if the operands are arrays, and DDoc 
 for abbreviated function templates is borked.  For an example of the 
 latter, just look at Cashew's own docs.
 
 That said: if people think this is a decent collection, and Walter would 
 take it, I would be willing to release it to public domain for Phobos' 
 sake.
 
 The array module is attached, and the docs are at:
 http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html
 
 -- Christopher Nicholson-Sauls
 

It looks good. Thanks for doing this.

Some comments/questions:

Why does indexOfSub have bale as 'inout', not simply 'in'?
Same question for fill

Couldn't indexOfSub be named indexOf?
Same question for repeatSub and repeat.

Also, thinking about its possible use as the starting point for a 
standard library, I've got some other thoughts for functions which are 
implemented in other array processing libraries (std.string, Oskar 
Linde's templated array functions, mintl.array...). Here is what I was 
thinking could be useful:

splitting/joining:
   split
   join

string processing:
   stripl/stripr/strip/chomp

string matching/replacing:
   replace
   insert
   count
   squeeze // from std.string
   startsWith
   endsWith

searching:
   findmax
   findmin
   find(delegate matches)
   findAll(delegate matches)

I've written some code:
T[][] split (T) (T[] arr, T[] delim ...)
in { assert (delim.length > 0, "Splitting with no delimiters is useless"); }
body
{
   T[][] results;
   T[] token;

   foreach (i, x; arr)
   {
     if (delim.contains(x))
	{
	  if (token.length > 0)
	  {
	     results ~= token;
		 token.length = 0;
	  }
	}
	else
	{
	  token ~= x;
	}
   }
   if (token.length > 0) results ~= token;
   return results;
}

unittest {
   writef("\t.split(...) --------> ");
/+  auto foo = "a b cde   fg";
   auto result = foo.split(' ');
   assert(result == array!(char[])("a", "b", "cde", "fg")); +/

   auto bar = "cashew casehew";
   auto result2 = bar.split('s', 'h');
   assert(result2 == array!(char[])("ca", "ew ca", "e", "ew"));
   writefln("Pass");
}

Also, couldn't you avoid the temporary in removeAll by doing this:
void removeAll (T) (inout T[] haystack, T needle)
{
   size_t index = 0U;
   size_t indices;

   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
     haystack.removeIndex(index);
   }
}

If you're interested, I'll write some more code.

Cheers,

Reiner

Sep 13 2006

Reiner Pope <reiner.pope REMOVE.THIS.gmail.com> writes:

Oh, and I forgot: a reverse iterator, like mintl.array has.

Sep 13 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Reiner Pope wrote:
 Chris Nicholson-Sauls wrote:
 
 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall someone 
 asking about.  In the process I've discovered two bugs, as well: the 
 behavior of 'is' versus '==' is incompatable if the operands are 
 arrays, and DDoc for abbreviated function templates is borked.  For an 
 example of the latter, just look at Cashew's own docs.

 That said: if people think this is a decent collection, and Walter 
 would take it, I would be willing to release it to public domain for 
 Phobos' sake.

 The array module is attached, and the docs are at:
 http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html

 -- Christopher Nicholson-Sauls

 It looks good. Thanks for doing this.
 
 Some comments/questions:
 
 Why does indexOfSub have bale as 'inout', not simply 'in'?
 Same question for fill

Because I'm mildly paranoid and wanted to avoid memory allocation where I
could.  :)  I do 
suppose it could safely be 'in' though.

 Couldn't indexOfSub be named indexOf?
 Same question for repeatSub and repeat.

That's a D limitation.  At the moment, function templates cannot be overloaded.
 *snort 
stomp*  Originally I /did/ have them named the same, but alas.

 Also, thinking about its possible use as the starting point for a 
 standard library, I've got some other thoughts for functions which are 
 implemented in other array processing libraries (std.string, Oskar 
 Linde's templated array functions, mintl.array...). Here is what I was 
 thinking could be useful:
 
 splitting/joining:
   split
   join
 
 string processing:
   stripl/stripr/strip/chomp
 
 string matching/replacing:
   replace
   insert
   count
   squeeze // from std.string
   startsWith
   endsWith
 
 searching:
   findmax
   findmin
   find(delegate matches)
   findAll(delegate matches)
 
 I've written some code:
 T[][] split (T) (T[] arr, T[] delim ...)
 in { assert (delim.length > 0, "Splitting with no delimiters is 
 useless"); }
 body
 {
   T[][] results;
   T[] token;
 
   foreach (i, x; arr)
   {
     if (delim.contains(x))
     {
       if (token.length > 0)
       {
          results ~= token;
          token.length = 0;
       }
     }
     else
     {
       token ~= x;
     }
   }
   if (token.length > 0) results ~= token;
   return results;
 }
 
 unittest {
   writef("\t.split(...) --------> ");
 /+  auto foo = "a b cde   fg";
   auto result = foo.split(' ');
   assert(result == array!(char[])("a", "b", "cde", "fg")); +/
 
   auto bar = "cashew casehew";
   auto result2 = bar.split('s', 'h');
   assert(result2 == array!(char[])("ca", "ew ca", "e", "ew"));
   writefln("Pass");
 }
 
 Also, couldn't you avoid the temporary in removeAll by doing this:
 void removeAll (T) (inout T[] haystack, T needle)
 {
   size_t index = 0U;
   size_t indices;
 
   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
     haystack.removeIndex(index);
   }
 }

That's exactly how it was written (once I'd modified .indexOf) but it doesn't
act quite 
right for some reason.  Until I figure out what the bug is, I figured I could
slap 
together a less pretty but working version.

 If you're interested, I'll write some more code.
 
 Cheers,
 
 Reiner

-- Chris Nicholson-Sauls

Sep 14 2006

Reiner Pope <reiner.pope REMOVE.THIS.gmail.com> writes:

Chris Nicholson-Sauls wrote:
 
 
 Reiner Pope wrote:
 Chris Nicholson-Sauls wrote:

 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall someone 
 asking about.  In the process I've discovered two bugs, as well: the 
 behavior of 'is' versus '==' is incompatable if the operands are 
 arrays, and DDoc for abbreviated function templates is borked.  For 
 an example of the latter, just look at Cashew's own docs.

 That said: if people think this is a decent collection, and Walter 
 would take it, I would be willing to release it to public domain for 
 Phobos' sake.

 The array module is attached, and the docs are at:
 http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html

 -- Christopher Nicholson-Sauls

 It looks good. Thanks for doing this.

 Some comments/questions:

 Why does indexOfSub have bale as 'inout', not simply 'in'?
 Same question for fill

 
 Because I'm mildly paranoid and wanted to avoid memory allocation where 
 I could.  :)  I do suppose it could safely be 'in' though.
 

Passing by 'inout' is probably less efficient and certainly doesn't save 
any memory allocation. 'inout' is equivalent to passing a pointer to the 
array reference, which itself contains a pointer to the data. The extra 
level of indirection can only add computation time.
 Also, couldn't you avoid the temporary in removeAll by doing this:
 void removeAll (T) (inout T[] haystack, T needle)
 {
   size_t index = 0U;
   size_t indices;

   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
     haystack.removeIndex(index);
   }
 }

 
 That's exactly how it was written (once I'd modified .indexOf) but it 
 doesn't act quite right for some reason.  Until I figure out what the 
 bug is, I figured I could slap together a less pretty but working version.

This passes the unit tests, though.

Cheers,

Reiner

Sep 14 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Reiner Pope wrote:
 Chris Nicholson-Sauls wrote:
 
 Reiner Pope wrote:

 Chris Nicholson-Sauls wrote:

 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall 
 someone asking about.  In the process I've discovered two bugs, as 
 well: the behavior of 'is' versus '==' is incompatable if the 
 operands are arrays, and DDoc for abbreviated function templates is 
 borked.  For an example of the latter, just look at Cashew's own docs.

 That said: if people think this is a decent collection, and Walter 
 would take it, I would be willing to release it to public domain for 
 Phobos' sake.

 The array module is attached, and the docs are at:
 http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html

 -- Christopher Nicholson-Sauls

 It looks good. Thanks for doing this.

 Some comments/questions:

 Why does indexOfSub have bale as 'inout', not simply 'in'?
 Same question for fill


 Because I'm mildly paranoid and wanted to avoid memory allocation 
 where I could.  :)  I do suppose it could safely be 'in' though.

 Passing by 'inout' is probably less efficient and certainly doesn't save 
 any memory allocation. 'inout' is equivalent to passing a pointer to the 
 array reference, which itself contains a pointer to the data. The extra 
 level of indirection can only add computation time.

Possibly true, so for now I took out some of the 'inout's.

 Also, couldn't you avoid the temporary in removeAll by doing this:
 void removeAll (T) (inout T[] haystack, T needle)
 {
   size_t index = 0U;
   size_t indices;

   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
     haystack.removeIndex(index);
   }
 }


 That's exactly how it was written (once I'd modified .indexOf) but it 
 doesn't act quite right for some reason.  Until I figure out what the 
 bug is, I figured I could slap together a less pretty but working 
 version.

 
 This passes the unit tests, though.
 
 Cheers,
 
 Reiner

So it does... although I could swear that it didn't behave right before.  Ah
well, it 
works fine now.

Also I've written a new version of .unique(), in the process of which I've
added a 
.removeRange() function.  A quick note on .removeRange() though: it mimics the
slicing 
logic of D arrays.  So, foo.removeRange(1, 4) actually removes [1, 2, 3], not
[1, 2, 3, 4].

-- Chris Nicholson-Sauls

Sep 16 2006

D Programming

C/C++ Programming

Other

digitalmars.D.announce - cashew.utils.array 0.1a.2