www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Segmented Ranges?

reply "bearophile" <bearophileHUGS lycos.com> writes:
Often enough you have to perform operations on some non-flat 
sequence, like on a 2D matrix:

import std.stdio;
void main() {
     auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
     foreach (row; m1)
         foreach (x; row)
             writeln(x);
}


Such multi-level ranges are common. But a Range (below an Input 
Range) that works on a generic multi level Range is slower than 
the two nested for loops because it needs to keep attention to 
the exhaustion of the sub-ranges at every iteration (see 
popFront):


import std.stdio, std.traits, std.array, std.range, std.algorithm;

struct BilevelScan(Range) {
     private Range range;
     typeof(range.front) currentSegment;

     this(Range range_) {
         this.range = range_;
         _nextItem();
     }

      property bool empty() {
         return range.empty && currentSegment.empty;
     }

      property auto front() {
         return currentSegment.front;
     }

     void popFront() {
         currentSegment.popFront();
         _nextItem();
     }

     private void _nextItem() {
         while (!range.empty && currentSegment.empty) {
             currentSegment = range.front;
             range.popFront();
         }
     }
}

void main() { // some test code
     auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];

     foreach (x; BilevelScan!(typeof(m1))(m1))
         writeln(x);
     writeln(m1, "\n");

     auto m2 = [[10.5]];
     foreach (x; BilevelScan!(typeof(m2))(m2))
         writeln(x);
     writeln(m2, "\n");

     auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 
3))();
     foreach (x; BilevelScan!(typeof(m3))(m3))
         writeln(x);
     writeln(m3, "\n");
}



In the paper "Segmented Iterators and Hierarchical Algorithms" 
Matthew H. Austern offers a solution to regain the lost 
efficiency:
http://


Some quotations from the paper:

segmented data structures are common. Within the STL, for 
example, the deque container is typically implemented as a 
segmented array, and hash tables [6] are typically implemented 
as arrays of buckets. Other examples of segmentation include 
two-dimensional matrices, graphs, files in record-based file 
systems, and, on modern NUMA (non-uniform memory architecture) 
multiprocessor machines, even ordinary memory.<

A segmented iterator has the same associated types as any other 
iterator (a value type, for example), and, additionally, it has 
an associated segment iterator type and local iterator type. 
Writing a generic algorithm that uses segmented iterators 
requires the ability to name those types. Additionally, a fully 
generic algorithm ought to be usable with both segmented and 
nonsegmented iterators.<

The distinction between segmented and nonsegmented iterators is 
orthogonal to the distinction between Forward Iterators, 
Bidirectional Iterators, and Random Access Iterators.<

Multiple levels of segmentation can be used with no extra 
effort. Nothing in this discussion, or this implementation of 
fill, assumes that the local iterator is nonsegmented.<

Segmented iterators, an extension of ordinary STL iterators, 
make it possible for generic algorithms to treat a segmented 
data structure as something other than a uniform range of 
elements. This allows existing algorithms to operate more 
efficiently on segmented data structures, and provides a natural 
decomposition for parallel computation. Segmented iterators 
enable new kinds of generic algorithms that explicitly depend on 
segmentation.<

In D the concept of Input Range is defined as a type that statically supports (there is also some necessary semantics that can't be expressed in the D code of such concepts): R r; // can define a range object if (r.empty) {} // can test for empty r.popFront(); // can invoke popFront() auto h = r.front; // can get the front of the range of non-void type So an idea is to introduce in D the multi-level Ranges: SegmentedInputRange SegmentedOutputRange SegmentedForwardRange SegmentedBidirectionalRange SegmentedRandomAccessRange I think a Segmented Input Range is more complex than an Input Range. Inventing such ranges is like finding the miminal set of axioms that allow to write down a theorem, that is to efficiently implement the normal algorithms. This is not an easy for me, I don't know much about this topic. Bye, bearophile
Jun 09 2012
next sibling parent =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:
On 09-06-2012 12:45, bearophile wrote:
 Often enough you have to perform operations on some non-flat sequence,
 like on a 2D matrix:

 import std.stdio;
 void main() {
 auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
 foreach (row; m1)
 foreach (x; row)
 writeln(x);
 }


 Such multi-level ranges are common. But a Range (below an Input Range)
 that works on a generic multi level Range is slower than the two nested
 for loops because it needs to keep attention to the exhaustion of the
 sub-ranges at every iteration (see popFront):


 import std.stdio, std.traits, std.array, std.range, std.algorithm;

 struct BilevelScan(Range) {
 private Range range;
 typeof(range.front) currentSegment;

 this(Range range_) {
 this.range = range_;
 _nextItem();
 }

  property bool empty() {
 return range.empty && currentSegment.empty;
 }

  property auto front() {
 return currentSegment.front;
 }

 void popFront() {
 currentSegment.popFront();
 _nextItem();
 }

 private void _nextItem() {
 while (!range.empty && currentSegment.empty) {
 currentSegment = range.front;
 range.popFront();
 }
 }
 }

 void main() { // some test code
 auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];

 foreach (x; BilevelScan!(typeof(m1))(m1))
 writeln(x);
 writeln(m1, "\n");

 auto m2 = [[10.5]];
 foreach (x; BilevelScan!(typeof(m2))(m2))
 writeln(x);
 writeln(m2, "\n");

 auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 3))();
 foreach (x; BilevelScan!(typeof(m3))(m3))
 writeln(x);
 writeln(m3, "\n");
 }



 In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
 H. Austern offers a solution to regain the lost efficiency:
 http://

I think your NNTP client ate a link. ;)
 Some quotations from the paper:

 segmented data structures are common. Within the STL, for example, the
 deque container is typically implemented as a segmented array, and
 hash tables [6] are typically implemented as arrays of buckets. Other
 examples of segmentation include two-dimensional matrices, graphs,
 files in record-based file systems, and, on modern NUMA (non-uniform
 memory architecture) multiprocessor machines, even ordinary memory.<

 A segmented iterator has the same associated types as any other
 iterator (a value type, for example), and, additionally, it has an
 associated segment iterator type and local iterator type. Writing a
 generic algorithm that uses segmented iterators requires the ability
 to name those types. Additionally, a fully generic algorithm ought to
 be usable with both segmented and nonsegmented iterators.<

 The distinction between segmented and nonsegmented iterators is
 orthogonal to the distinction between Forward Iterators, Bidirectional
 Iterators, and Random Access Iterators.<

 Multiple levels of segmentation can be used with no extra effort.
 Nothing in this discussion, or this implementation of fill, assumes
 that the local iterator is nonsegmented.<

 Segmented iterators, an extension of ordinary STL iterators, make it
 possible for generic algorithms to treat a segmented data structure as
 something other than a uniform range of elements. This allows existing
 algorithms to operate more efficiently on segmented data structures,
 and provides a natural decomposition for parallel computation.
 Segmented iterators enable new kinds of generic algorithms that
 explicitly depend on segmentation.<

In D the concept of Input Range is defined as a type that statically supports (there is also some necessary semantics that can't be expressed in the D code of such concepts): R r; // can define a range object if (r.empty) {} // can test for empty r.popFront(); // can invoke popFront() auto h = r.front; // can get the front of the range of non-void type So an idea is to introduce in D the multi-level Ranges: SegmentedInputRange SegmentedOutputRange SegmentedForwardRange SegmentedBidirectionalRange SegmentedRandomAccessRange I think a Segmented Input Range is more complex than an Input Range. Inventing such ranges is like finding the miminal set of axioms that allow to write down a theorem, that is to efficiently implement the normal algorithms. This is not an easy for me, I don't know much about this topic. Bye, bearophile

-- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 09 2012
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
 In the paper "Segmented Iterators and Hierarchical Algorithms" 
 Matthew H. Austern offers a solution to regain the lost 
 efficiency:
 http://

Sorry: http://lafstern.org/matt/segmented.pdf Bye, bearophile
Jun 09 2012
parent Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
09.06.2012 19:06, bearophile пишет:
 In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
 H. Austern offers a solution to regain the lost efficiency:

http://lafstern.org/matt/segmented.pdf

Wow, didn't know there is an article about this. Looks like we all think about same things. Personally I was too lazy to propose in this NG similar but less advanced idea. So my small (but valuable, IMHO) addition: Add explicit vectorisation for some common predicates (`a == b` e.g.) for sequential memory blocks (arrays) and ranges with elements of such blocks (called "segmented" in the article). -- Денис В. Шеломовский Denis V. Shelomovskij
Jun 09 2012
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/9/12 5:45 AM, bearophile wrote:
 Often enough you have to perform operations on some non-flat sequence,

Would joiner help? Andrei
Jun 09 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 Would joiner help?

I don't know. I think the point of those segmented ranges is to not hide their segmented nature (as joiner does). - - - - - - - - - - - - - - This paper shows a variant of the stream fusion, applied on chunked arrays that represent strings: "Rewriting Haskell Strings" (2007) by Duncan Coutts, Don Stewart, Roman Leshchinskiy: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.3166 In D it's easy to create a range that segments a given lazy range into a lazy iterable of arrays: import std.range: ElementType; import std.array: empty, front, popFront; struct Segmenter(size_t chunkLength, Range) { alias ElementType!Range elementT; private elementT[chunkLength] chunk; private Range r; private size_t len; this(Range rin) { this.r = rin; popFront(); } property bool empty() const { return len == 0 && r.empty; } property elementT[] front() pure nothrow { return chunk[0 .. len]; } void popFront() { len = 0; while (len < chunkLength && !r.empty) { chunk[len] = r.front; len++; r.popFront(); } } } But Phobos ranges don't manage segments automatically (so you can't just apply a normal map on a Segmenter, because this map will work on the chunks), and there aren't the optimizations described in that paper. On the other hand some range methods become inlined, because D ranges are quite different from Haskell lazy lists, so maybe some of those optimizations are not needed. Bye, bearophile
Jun 17 2012
prev sibling parent travert phare.normalesup.org (Christophe Travert) writes:
"bearophile" , dans le message (digitalmars.D:169611), a crit:
 struct BilevelScan(Range) {

This is basically std.algorithm.joiner
 So an idea is to introduce in D the multi-level Ranges:
 SegmentedInputRange
 SegmentedOutputRange
 SegmentedForwardRange
 SegmentedBidirectionalRange
 SegmentedRandomAccessRange
 
 I think a Segmented Input Range is more complex than an Input 
 Range. Inventing such ranges is like finding the miminal set of 
 axioms that allow to write down a theorem, that is to efficiently 
 implement the normal algorithms. This is not an easy for me, I 
 don't know much about this topic.

I would only use this axiom: - a SegmentedRange defines a bySegment property, returning a Range of Range bearing the same element type as SegmentedRange. and define BySegment!Range as the type returned by range.bySegment, and SegmentType!Range as the type returned by range.bySegment.front. Then it is up to the implementer of the algorithm to test the type of a SegmentedRange (InputRange, OutputRange, etc.), the type of BySegment!Range, and the type of SegmentType!Range when needed. There are too many combinaisons to make specific range types for all of them. Moreover SegmentedRandomAccessRange would be ambiguous: is it a RandomAccessRange, a Range whose BySegment method returns a RandomAccessRange, a Range whose SegmentType is a RandomAccessRange, or all of them ? -- Christophe
Jun 18 2012