digitalmars.D - Segmented Ranges?

bearophile (80/108) Jun 09 2012 Often enough you have to perform operations on some non-flat

=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (6/107) Jun 09 2012 --
bearophile (4/8) Jun 09 2012 Sorry:

Denis Shelomovskij (10/14) Jun 09 2012 Wow, didn't know there is an article about this. Looks like we all think...

Andrei Alexandrescu (3/4) Jun 09 2012 Would joiner help?

bearophile (45/46) Jun 17 2012 I don't know. I think the point of those segmented ranges is to

travert phare.normalesup.org (Christophe Travert) (17/30) Jun 18 2012 I would only use this axiom:

"bearophile" <bearophileHUGS lycos.com> writes:

Often enough you have to perform operations on some non-flat 
sequence, like on a 2D matrix:

import std.stdio;
void main() {
     auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
     foreach (row; m1)
         foreach (x; row)
             writeln(x);
}


Such multi-level ranges are common. But a Range (below an Input 
Range) that works on a generic multi level Range is slower than 
the two nested for loops because it needs to keep attention to 
the exhaustion of the sub-ranges at every iteration (see 
popFront):


import std.stdio, std.traits, std.array, std.range, std.algorithm;

struct BilevelScan(Range) {
     private Range range;
     typeof(range.front) currentSegment;

     this(Range range_) {
         this.range = range_;
         _nextItem();
     }

      property bool empty() {
         return range.empty && currentSegment.empty;
     }

      property auto front() {
         return currentSegment.front;
     }

     void popFront() {
         currentSegment.popFront();
         _nextItem();
     }

     private void _nextItem() {
         while (!range.empty && currentSegment.empty) {
             currentSegment = range.front;
             range.popFront();
         }
     }
}

void main() { // some test code
     auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];

     foreach (x; BilevelScan!(typeof(m1))(m1))
         writeln(x);
     writeln(m1, "\n");

     auto m2 = [[10.5]];
     foreach (x; BilevelScan!(typeof(m2))(m2))
         writeln(x);
     writeln(m2, "\n");

     auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 
3))();
     foreach (x; BilevelScan!(typeof(m3))(m3))
         writeln(x);
     writeln(m3, "\n");
}



In the paper "Segmented Iterators and Hierarchical Algorithms" 
Matthew H. Austern offers a solution to regain the lost 
efficiency:
http://


Some quotations from the paper:

segmented data structures are common. Within the STL, for 
example, the deque container is typically implemented as a 
segmented array, and hash tables [6] are typically implemented 
as arrays of buckets. Other examples of segmentation include 
two-dimensional matrices, graphs, files in record-based file 
systems, and, on modern NUMA (non-uniform memory architecture) 
multiprocessor machines, even ordinary memory.<

A segmented iterator has the same associated types as any other 
iterator (a value type, for example), and, additionally, it has 
an associated segment iterator type and local iterator type. 
Writing a generic algorithm that uses segmented iterators 
requires the ability to name those types. Additionally, a fully 
generic algorithm ought to be usable with both segmented and 
nonsegmented iterators.<

The distinction between segmented and nonsegmented iterators is 
orthogonal to the distinction between Forward Iterators, 
Bidirectional Iterators, and Random Access Iterators.<

Multiple levels of segmentation can be used with no extra 
effort. Nothing in this discussion, or this implementation of 
fill, assumes that the local iterator is nonsegmented.<

Segmented iterators, an extension of ordinary STL iterators, 
make it possible for generic algorithms to treat a segmented 
data structure as something other than a uniform range of 
elements. This allows existing algorithms to operate more 
efficiently on segmented data structures, and provides a natural 
decomposition for parallel computation. Segmented iterators 
enable new kinds of generic algorithms that explicitly depend on 
segmentation.<


In D the concept of Input Range is defined as a type that 
statically supports (there is also some necessary semantics that 
can't be expressed in the D code of such concepts):

R r;              // can define a range object
if (r.empty) {}   // can test for empty
r.popFront();     // can invoke popFront()
auto h = r.front; // can get the front of the range of non-void 
type



So an idea is to introduce in D the multi-level Ranges:
SegmentedInputRange
SegmentedOutputRange
SegmentedForwardRange
SegmentedBidirectionalRange
SegmentedRandomAccessRange

I think a Segmented Input Range is more complex than an Input 
Range. Inventing such ranges is like finding the miminal set of 
axioms that allow to write down a theorem, that is to efficiently 
implement the normal algorithms. This is not an easy for me, I 
don't know much about this topic.

Bye,
bearophile

Jun 09 2012

=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:

On 09-06-2012 12:45, bearophile wrote:
 Often enough you have to perform operations on some non-flat sequence,
 like on a 2D matrix:

 import std.stdio;
 void main() {
 auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
 foreach (row; m1)
 foreach (x; row)
 writeln(x);
 }


 Such multi-level ranges are common. But a Range (below an Input Range)
 that works on a generic multi level Range is slower than the two nested
 for loops because it needs to keep attention to the exhaustion of the
 sub-ranges at every iteration (see popFront):


 import std.stdio, std.traits, std.array, std.range, std.algorithm;

 struct BilevelScan(Range) {
 private Range range;
 typeof(range.front) currentSegment;

 this(Range range_) {
 this.range = range_;
 _nextItem();
 }

  property bool empty() {
 return range.empty && currentSegment.empty;
 }

  property auto front() {
 return currentSegment.front;
 }

 void popFront() {
 currentSegment.popFront();
 _nextItem();
 }

 private void _nextItem() {
 while (!range.empty && currentSegment.empty) {
 currentSegment = range.front;
 range.popFront();
 }
 }
 }

 void main() { // some test code
 auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];

 foreach (x; BilevelScan!(typeof(m1))(m1))
 writeln(x);
 writeln(m1, "\n");

 auto m2 = [[10.5]];
 foreach (x; BilevelScan!(typeof(m2))(m2))
 writeln(x);
 writeln(m2, "\n");

 auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 3))();
 foreach (x; BilevelScan!(typeof(m3))(m3))
 writeln(x);
 writeln(m3, "\n");
 }



 In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
 H. Austern offers a solution to regain the lost efficiency:
 http://

I think your NNTP client ate a link. ;)

 Some quotations from the paper:

 segmented data structures are common. Within the STL, for example, the
 deque container is typically implemented as a segmented array, and
 hash tables [6] are typically implemented as arrays of buckets. Other
 examples of segmentation include two-dimensional matrices, graphs,
 files in record-based file systems, and, on modern NUMA (non-uniform
 memory architecture) multiprocessor machines, even ordinary memory.<

 A segmented iterator has the same associated types as any other
 iterator (a value type, for example), and, additionally, it has an
 associated segment iterator type and local iterator type. Writing a
 generic algorithm that uses segmented iterators requires the ability
 to name those types. Additionally, a fully generic algorithm ought to
 be usable with both segmented and nonsegmented iterators.<

 The distinction between segmented and nonsegmented iterators is
 orthogonal to the distinction between Forward Iterators, Bidirectional
 Iterators, and Random Access Iterators.<

 Multiple levels of segmentation can be used with no extra effort.
 Nothing in this discussion, or this implementation of fill, assumes
 that the local iterator is nonsegmented.<

 Segmented iterators, an extension of ordinary STL iterators, make it
 possible for generic algorithms to treat a segmented data structure as
 something other than a uniform range of elements. This allows existing
 algorithms to operate more efficiently on segmented data structures,
 and provides a natural decomposition for parallel computation.
 Segmented iterators enable new kinds of generic algorithms that
 explicitly depend on segmentation.<


 In D the concept of Input Range is defined as a type that statically
 supports (there is also some necessary semantics that can't be expressed
 in the D code of such concepts):

 R r; // can define a range object
 if (r.empty) {} // can test for empty
 r.popFront(); // can invoke popFront()
 auto h = r.front; // can get the front of the range of non-void type



 So an idea is to introduce in D the multi-level Ranges:
 SegmentedInputRange
 SegmentedOutputRange
 SegmentedForwardRange
 SegmentedBidirectionalRange
 SegmentedRandomAccessRange

 I think a Segmented Input Range is more complex than an Input Range.
 Inventing such ranges is like finding the miminal set of axioms that
 allow to write down a theorem, that is to efficiently implement the
 normal algorithms. This is not an easy for me, I don't know much about
 this topic.

 Bye,
 bearophile

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Jun 09 2012

"bearophile" <bearophileHUGS lycos.com> writes:

 In the paper "Segmented Iterators and Hierarchical Algorithms" 
 Matthew H. Austern offers a solution to regain the lost 
 efficiency:
 http://

Sorry:
http://lafstern.org/matt/segmented.pdf

Bye,
bearophile

Jun 09 2012

Denis Shelomovskij <verylonglogin.reg gmail.com> writes:

09.06.2012 19:06, bearophile пишет:
 In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
 H. Austern offers a solution to regain the lost efficiency:

...
 http://lafstern.org/matt/segmented.pdf

Wow, didn't know there is an article about this. Looks like we all think 
about same things. Personally I was too lazy to propose in this NG 
similar but less advanced idea. So my small (but valuable, IMHO) addition:
Add explicit vectorisation for some common predicates (`a == b` e.g.) 
for sequential memory blocks (arrays) and ranges with elements of such 
blocks (called "segmented" in the article).

-- 
Денис В. Шеломовский
Denis V. Shelomovskij

Jun 09 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 6/9/12 5:45 AM, bearophile wrote:
 Often enough you have to perform operations on some non-flat sequence,

Would joiner help?

Andrei

Jun 09 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 Would joiner help?

I don't know. I think the point of those segmented ranges is to 
not hide their segmented nature (as joiner does).

- - - - - - - - - - - - - -

This paper shows a variant of the stream fusion, applied on 
chunked arrays that represent strings:

"Rewriting Haskell Strings" (2007) by Duncan Coutts, Don Stewart, 
Roman Leshchinskiy:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.3166


In D it's easy to create a range that segments a given lazy range 
into a lazy iterable of arrays:


import std.range: ElementType;
import std.array: empty, front, popFront;

struct Segmenter(size_t chunkLength, Range) {
     alias ElementType!Range elementT;
     private elementT[chunkLength] chunk;
     private Range r;
     private size_t len;

     this(Range rin) {
         this.r = rin;
         popFront();
     }

      property bool empty() const {
         return len == 0 && r.empty;
     }

      property elementT[] front() pure nothrow {
         return chunk[0 .. len];
     }

     void popFront() {
         len = 0;
         while (len < chunkLength && !r.empty) {
             chunk[len] = r.front;
             len++;
             r.popFront();
         }
     }
}


But Phobos ranges don't manage segments automatically (so you 
can't just apply a normal map on a Segmenter, because this map 
will work on the chunks), and there aren't the optimizations 
described in that paper. On the other hand some range methods 
become inlined, because D ranges are quite different from Haskell 
lazy lists, so maybe some of those optimizations are not needed.

Bye,
bearophile

Jun 17 2012

travert phare.normalesup.org (Christophe Travert) writes:

"bearophile" , dans le message (digitalmars.D:169611), a �crit�:
 struct BilevelScan(Range) {

This is basically std.algorithm.joiner

 So an idea is to introduce in D the multi-level Ranges:
 SegmentedInputRange
 SegmentedOutputRange
 SegmentedForwardRange
 SegmentedBidirectionalRange
 SegmentedRandomAccessRange
 
 I think a Segmented Input Range is more complex than an Input 
 Range. Inventing such ranges is like finding the miminal set of 
 axioms that allow to write down a theorem, that is to efficiently 
 implement the normal algorithms. This is not an easy for me, I 
 don't know much about this topic.

I would only use this axiom:
- a SegmentedRange defines a bySegment property, returning a Range of 
Range bearing the same element type as SegmentedRange.

and define BySegment!Range as the type returned by range.bySegment, and 
SegmentType!Range as the type returned by range.bySegment.front.

Then it is up to the implementer of the algorithm to test the type of 
a SegmentedRange (InputRange, OutputRange, etc.), the type of 
BySegment!Range, and the type of SegmentType!Range when needed. There 
are too many combinaisons to make specific range types for all of them.
Moreover SegmentedRandomAccessRange would be ambiguous: is it a 
RandomAccessRange, a Range whose BySegment method returns a 
RandomAccessRange, a Range whose SegmentType is a RandomAccessRange, 
or all of them ?

-- 
Christophe

Jun 18 2012

D Programming

C/C++ Programming

Other

digitalmars.D - Segmented Ranges?