www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Ranges usages

reply "bearophile" <bearophileHUGS lycos.com> writes:
(This is a partial repost from D.learn.)

Some time ago after Andrei Alexandrescu talk "Iterators must Go" 
someone asked about implementing the Dutch national flag sort 
algorithm with D ranges. This is a version of mine (but 
unfortunately currently I can't try it on the Dlist of Phobos), 
it looks short but for me it wasn't immediate to write:


void dutchNationalFlagSort(Range, T)(Range secondLast,
                                      in T lowVal, in T highVal)
pure nothrow if (isBidirectionalRange!Range &&
                  hasSwappableElements!Range &&
                  is(ElementType!Range == T)) {
     for (auto first = secondLast; !secondLast.empty; )
         if (secondLast.front == lowVal) {
             swap(first.front, secondLast.front);
             first.popFront();
             secondLast.popFront();
         } else if (secondLast.front == highVal) {
             swap(secondLast.front, secondLast.back);
             secondLast.popBack();
         } else
             secondLast.popFront();
}


In the following program there are two versions of the compress 
function, that implement the same bare-bones version of the LZW 
(http://en.wikipedia.org/wiki/Lempel-Ziv-Welch ) compression 
algorithm.

compress1 is the simpler version, while compress2 avoids some 
heap allocations using a Slice that is able to grow (in some of 
my benchmarks compress2 is about twice faster than compress1).

Is it possible to write similar code nicely & efficiently with 
ranges?


import std.stdio, std.array;

alias T = ubyte;
alias Ta = immutable(T)[];

int[] compress1(immutable T[] original) {
     int[Ta] dictionary;
     foreach (i; 0 .. 256)
         dictionary[cast(Ta)[i]] = i;

     Ta w;
     int[] result;
     foreach (ch; original) {
         auto wc = w ~ ch;
         if (wc in dictionary) {
             w = wc;
         } else {
             result ~= dictionary[w];
             dictionary[wc] = dictionary.length - 1;
             w = [ch];
         }
     }

     return w.empty ? result : (result ~ dictionary[w]);
}

int[] compress2(immutable T[] original) {
     int[Ta] dictionary;
     foreach (i; 0 .. 256)
         dictionary[cast(Ta)[i]] = i;

     struct Slice {
         size_t start, end;
          property opSlice() {
             return original[start .. end];
         }
         alias this = opSlice;
     }

     Slice w;
     int[] result;
     foreach (i; 0 .. original.length) {
         auto wc = Slice(w.start, w.end + 1);
         if (wc in dictionary) {
             w = wc;
         } else {
             result ~= dictionary[w];
             dictionary[wc] = dictionary.length - 1;
             w = Slice(i, i + 1);
         }
     }

     return w.empty ? result : (result ~ dictionary[w]);
}

void main() {
     auto txt = cast(Ta)"TOBEORNOTTOBEORTOBEORNOT";
     writeln(compress1(txt));
     writeln(compress2(txt));
}


Thank you,
bye,
bearophile
Nov 21 2012
next sibling parent reply "Rob T" <rob ucora.com> writes:
FYI

I'm getting these compile errors, must be the compiler flags I'm 
using?

source/main.d(19): Error: cannot implicitly convert expression 
(dictionary.length() - 1LU) of type ulong to int
source/main.d(48): Error: cannot implicitly convert expression 
(dictionary.length() - 1LU) of type ulong to int


17        } else {
18            result ~= dictionary[w];
19**          dictionary[wc] = dictionary.length - 1;
20            w = [ch];
21        }


46        } else {
47            result ~= dictionary[w];
48**          dictionary[wc] = dictionary.length - 1;
49            w = Slice(i, i + 1);
50        }

--rt
Nov 21 2012
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 22 November 2012 at 06:52:55 UTC, Rob T wrote:
 FYI

 I'm getting these compile errors, must be the compiler flags 
 I'm using?
Must be that bearophile and I are on windows (ergo 32), so "length" is of type "size_t", which is a "uint". This matches the key of the dictionary. If you are linux, then you are probably 64, and the type of "dictionary.length - 1" gets promoted to ulong, and can't be placed back into the dictionary. I don't think this has anything to do with flags. Placing a cast here "cast(uint)" would be the correct workaround I think.
Nov 22 2012
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/22/2012 11:25 AM, monarch_dodra wrote:
 On Thursday, 22 November 2012 at 06:52:55 UTC, Rob T wrote:
 FYI

 I'm getting these compile errors, must be the compiler flags I'm using?
Must be that bearophile and I are on windows (ergo 32), so "length" is of type "size_t", which is a "uint". This matches the key of the dictionary. If you are linux, then you are probably 64, and the type of "dictionary.length - 1" gets promoted to ulong, and can't be placed back into the dictionary. I don't think this has anything to do with flags. Placing a cast here "cast(uint)" would be the correct workaround I think.
The flag is -m32.
Nov 22 2012
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, November 23, 2012 01:52:51 Timon Gehr wrote:
 On 11/22/2012 11:25 AM, monarch_dodra wrote:
 On Thursday, 22 November 2012 at 06:52:55 UTC, Rob T wrote:
 FYI
 
 I'm getting these compile errors, must be the compiler flags I'm using?
Must be that bearophile and I are on windows (ergo 32), so "length" is of type "size_t", which is a "uint". This matches the key of the dictionary. If you are linux, then you are probably 64, and the type of "dictionary.length - 1" gets promoted to ulong, and can't be placed back into the dictionary. I don't think this has anything to do with flags. Placing a cast here "cast(uint)" would be the correct workaround I think.
The flag is -m32.
That would be a very poor solution if the problem is that you're trying to assign a size_t to a uint. Sure, -m32 will make it compile a 32-bit binary which would then work, but well-written code doesn't generally care whether it's on a 32-bit or 64-bit machine, so if a variable of type size_t is being assigned to anything smaller than long, a cast or std.conv.to would be the correct way to handle it. Unless you really have to target a specific architecture, writing code which assumes what architecture it's in is generally bad practice. It should be as cross-platform as is reasonably possible, and D and Phobos are well set up to do that. - Jonathan M Davis
Nov 22 2012
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 11/23/2012 03:01 AM, Jonathan M Davis wrote:
 On Friday, November 23, 2012 01:52:51 Timon Gehr wrote:
 On 11/22/2012 11:25 AM, monarch_dodra wrote:
 On Thursday, 22 November 2012 at 06:52:55 UTC, Rob T wrote:
 FYI

 I'm getting these compile errors, must be the compiler flags I'm using?
Must be that bearophile and I are on windows (ergo 32), so "length" is of type "size_t", which is a "uint". This matches the key of the dictionary. If you are linux, then you are probably 64, and the type of "dictionary.length - 1" gets promoted to ulong, and can't be placed back into the dictionary. I don't think this has anything to do with flags. Placing a cast here "cast(uint)" would be the correct workaround I think.
The flag is -m32.
That would be a very poor solution if the problem is that you're trying to assign a size_t to a uint. Sure, -m32 will make it compile a 32-bit binary which would then work, but well-written code doesn't generally care whether it's on a 32-bit or 64-bit machine, so if a variable of type size_t is being assigned to anything smaller than long, a cast or std.conv.to would be the correct way to handle it. Unless you really have to target a specific architecture, writing code which assumes what architecture it's in is generally bad practice. It should be as cross-platform as is reasonably possible, and D and Phobos are well set up to do that. - Jonathan M Davis
Sure, but that is up to the author.
Nov 23 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 22 November 2012 at 01:14:23 UTC, bearophile wrote:
 [SNIP]

 Thank you,
 bye,
 bearophile
I'd argue that your first "compress1" isn't range correct either anyways, because "w ~ ch;" (which will allocate) is not part of the range interface. Neither is "w = [ch];" (which will also allocate) Given the allocations in the first example, I'd hardly consider the performance comparison fair. How about this? int[] compress3(immutable T[] original) { int[Ta] dictionary; foreach (i; 0 .. 256) dictionary[cast(Ta)[i]] = i; int[] result; size_t start, end; Ta w; foreach (i, ch; original) { auto wc = original[start .. end + 1]; if (wc in dictionary) { w = wc; ++end; } else { w = original[start .. end]; result ~= dictionary[w]; dictionary[wc] = dictionary.length - 1; start = i; end = i + 1; } } return w.empty ? result : (result ~ dictionary[w]); } This is "range correct". Being range correct, it doesn't allocate either. I would have preferred writing "wc = wc.ptr[0 .. wc.length + 1]"... but that's not range correct... I'd be thrilled if you (quickly) benched my compress3. ---- Also: I'd suggest you avoid code such as: int[] result; for (...) result ~= xxx ; Unless you take the time to manually call assumeUnique, this will reallocate on *each*and*every* append. You are better off using an actual appender: auto result = appender!int(); for (...) result.put(x); return result.data; My benching experiences have shown performance gains with appender as early as the 2nd insertion. -------- Back to topic though: My limited range experience has shown me that: 1. They are *incredibly* easy to chain and adapt. 2. RA ranges are (mostly) just as powerful as iterators or raw arrays. 3. If all you want to do is iterate forward, than any flavor of range will rock. The "HUGE" flaw I see with ranges though, is in regards to bidirectional ranges, and their ability to "only shrink, never grow". In particular, it is not possible to "slice" an Bidir right through the middle. "iterate until you find x, and then return the slice *up-to* x". Iterators can do it just fine. The (IMO) tell-tale symptom are all the algorithms that return a "take" for bidirectional ranges. "take" is nice and all, but in that case, I think it is subverting your range's type. In particular, own of the "great power" of a bidirectional range, is to define a sequence by "all elements between a start and end point". What's cool about this definition is that it means that if you insert new element between the bounds, they will be taken into account by the range. "Take" subverts that in that the sequence becomes a "start + index", so inserting new items in the underlying _container_ will just bump the last items out of your rake range. -------- Well, that's my rant anyways: Range are an incredibly powerful abstract notion, but when you want to get down and dirty with them, they tend to create a bit of friction compared to some lower level abstractions... But overall: Worth it.
Nov 22 2012
prev sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 11/22/12 02:14, bearophile wrote:
 In the following program there are two versions of the compress function, that
implement the same bare-bones version of the LZW
(http://en.wikipedia.org/wiki/Lempel-Ziv-Welch ) compression algorithm.
 
 compress1 is the simpler version, while compress2 avoids some heap allocations
using a Slice that is able to grow (in some of my benchmarks compress2 is about
twice faster than compress1).
 import std.stdio, std.array;
 
 alias T = ubyte;
 alias Ta = immutable(T)[];
 
 int[] compress1(immutable T[] original) {
     int[Ta] dictionary;
     foreach (i; 0 .. 256)
         dictionary[cast(Ta)[i]] = i;
 
     Ta w;
     int[] result;
     foreach (ch; original) {
         auto wc = w ~ ch;
         if (wc in dictionary) {
             w = wc;
         } else {
             result ~= dictionary[w];
             dictionary[wc] = dictionary.length - 1;
             w = [ch];
         }
     }
 
     return w.empty ? result : (result ~ dictionary[w]);
 }
 
 int[] compress2(immutable T[] original) {
     int[Ta] dictionary;
     foreach (i; 0 .. 256)
         dictionary[cast(Ta)[i]] = i;
 
     struct Slice {
         size_t start, end;
          property opSlice() {
             return original[start .. end];
         }
         alias this = opSlice;
     }
 
     Slice w;
     int[] result;
     foreach (i; 0 .. original.length) {
         auto wc = Slice(w.start, w.end + 1);
         if (wc in dictionary) {
             w = wc;
         } else {
             result ~= dictionary[w];
             dictionary[wc] = dictionary.length - 1;
             w = Slice(i, i + 1);
         }
     }
 
     return w.empty ? result : (result ~ dictionary[w]);
 }
 
 void main() {
     auto txt = cast(Ta)"TOBEORNOTTOBEORTOBEORNOT";
     writeln(compress1(txt));
     writeln(compress2(txt));
 }
 Is it possible to write similar code nicely & efficiently with ranges?
First, the code above is practically a textbook example on how /not/ to write readable and efficient code. So lets do a saner implementation: import std.stdio, std.array, std.range; OUT[] compress(OUT=int, R)(const R original) { alias typeof(original[0..1]) Elems; alias typeof(cast()Elems[0]) Elem; IDAA!(OUT, const Elems) dictionary; OUT[] result; size_t start, end = 1; while (end<=original.length) { if (original[start..end] in dictionary) { ++end; continue; } result ~= dictionary[original[start..end-1]]; dictionary[original[start..end]] = cast(OUT)dictionary.length; start = end-1; } return original[start..end-1].empty ? result : (result ~ dictionary[original[start..end-1]]); } struct IDAA(VAL, IDX) { VAL[IDX] map; enum M = IDX[0].max; static immutable VAL[M+1] idmap = { VAL[M+1] a; foreach (VAL c; 0..M) a[c] = c; return a; }(); auto opIndex(const IDX idx) inout { return idx.length==1 ? idmap[idx[0]] : map[idx]; } auto opIndexAssign(const VAL val, const IDX idx) { return idx.length==1 ? idx[0] : (map[idx] = val); } const opBinaryRight(string op:"in")(const IDX idx) /*inout*/ { return idx.length==1 ? &idmap[idx[0]] : idx in map; } property length() /*inout*/ { return map.length + idmap.length; } } This code is about an order of magnitude faster than both of your versions (which performed ~ equally here w/ gdc), as the initialization, allocations, copies and AA ops were completely dominating everything else. Now, when we have something to compare against, rewriting it to use ranges is trivial: OUT[] compress(OUT=int, R)(R original) { IDAA!(OUT, const typeof(cast()original.front)[]) dictionary; OUT[] result; size_t l = 1; while (original.length>=l) { if (original.takeExactly(l) in dictionary) { ++l; continue; } result ~= dictionary[original.takeExactly(l-1)]; dictionary[original.takeExactly(l)] = cast(OUT)dictionary.length; original.popFrontN(l-1); l = 1; } return original.empty ? result : (result ~ dictionary[original]); } And the (surprising, for me) result is that this code performs just as well as the "saner" version above. So i guess the answer is that it is possible. For some definition of "nicely", as I don't like takeExactly and popFrontN. ;) artur
Nov 22 2012