
digitalmars.D - C++ / Why Iterators Got It All Wrong

Robert M. Münch <robert.muench saphirion.com> writes:
Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1

I haven't read everything, so I'm not sure whether it's worth a look.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster
Aug 29
bitwise <bitwise.pvt gmail.com> writes:
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
 Maybe of interest: 
 https://www.think-cell.com/en/career/talks/iterators/#1

 I haven't read everything, so not sure if it worth to take a 
 look.
"superseded" is the wrong word, as ranges cannot do everything iterators can.
Aug 29
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/29/17 8:50 AM, Robert M. Münch wrote:
 Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1
 
 
 I haven't read everything, so not sure if it worth to take a look.
 
 
Interesting. It reminds me a bit of cursors for dcollections. In there, a cursor is a 0 or 1 element range. The one-element range points at an element, the 0-element range points at a border. You can use cursors to compose ranges.

https://github.com/schveiguy/dcollections

Not sure how useful it is to have them be separate types. It can be nice, I suppose. But one thing I love about iterators/cursors is that something like find doesn't assume what data you are interested in.

In Phobos, find gives you a range where the first element is the one you searched for, and the last element is the end of the original range. But what if you wanted all the data *up to* the element instead? What if you just wanted to look at that specific element? So we need several functions that do this, and it's not always clear how to do it correctly, and it's difficult to compose one function from the other. With iterators, it's simple.

-Steve
Aug 29
bitwise <bitwise.pvt gmail.com> writes:
On Tuesday, 29 August 2017 at 13:23:50 UTC, Steven Schveighoffer 
wrote:
 In Phobos, find gives you a range where the first element is 
 the one you searched for, and the last element is the end of 
 the original range. But what if you wanted all the data *up to* 
 the element instead? What if you just wanted to look at that 
 specific element? So we need several functions that do this, 
 and it's not always clear how to do it correctly, and it's 
 difficult to compose one function from the other. With 
 iterators, it's simple.

 -Steve
I was about to whine about exactly this, but thought it would be off topic ;)

I see "trim_left" in the slides, which I'm guessing refers to what D calls "find". IMO, "trim_left" is a much better name. "find" should not return elements you didn't ask for; it should return a range of length 1 or 0.
Aug 29
jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 29 August 2017 at 13:23:50 UTC, Steven Schveighoffer 
wrote:
 Interesting. It reminds me a bit of cursors for dcollections. 
 In there, a cursor is a 0 or 1 element range. The one element 
 range points at an element, the 0 element range points at a 
 border. You can use cursors to compose ranges.

 https://github.com/schveiguy/dcollections
I was thinking this exact same thing when I got to the Element part of it. The mach library also makes use of cursors.

https://github.com/pineapplemachine/mach.d

I'm not entirely sure I grok how the Border thing they're talking about works.
Aug 29
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/29/17 9:34 AM, jmh530 wrote:
 On Tuesday, 29 August 2017 at 13:23:50 UTC, Steven Schveighoffer wrote:
 Interesting. It reminds me a bit of cursors for dcollections. In 
 there, a cursor is a 0 or 1 element range. The one element range 
 points at an element, the 0 element range points at a border. You can 
 use cursors to compose ranges.

 https://github.com/schveiguy/dcollections
 I was thinking this exact same thing when I got to the Element part of it. The mach library also makes use of cursors. https://github.com/pineapplemachine/mach.d I'm not entirely sure I grok how the Border thing they're talking about works.
A border points between elements. It's like the end iterator in a standard STL container: it's not really pointing at an element, but one past the last element. It can be handy when specifying ranges, i.e. exclusive or inclusive ranges.

My gut feeling is that the splitting of types between elements and borders is too much machinery, but I haven't seen it in practice to make a fair judgment.

-Steve
Aug 29
Robert M. Münch <robert.muench saphirion.com> writes:
On 2017-08-29 13:23:50 +0000, Steven Schveighoffer said:

 ...
 In Phobos, find gives you a range where the first element is the one 
 you searched for, and the last element is the end of the original 
 range. But what if you wanted all the data *up to* the element instead? 
 What if you just wanted to look at that specific element? So we need 
 several functions that do this, and it's not always clear how to do it 
 correctly, and it's difficult to compose one function from the other.
I'm a big fan of Rebol (yes, it's far from mainstream and a dead-end these days, but it has a lot of very nice ideas) and it uses the idea of a series datatype. Further information here: http://www.rebol.com/docs/core23/rebolcore-6.html

There you do something like:

1. Return the first hit and the rest of the series

 my-series: "abcdefg"
== "abcdefg"
 find my-series "b"
== "bcdefg"

2. Return only the hit

 first find my-series "b"
== #"b"

3. Return everything before the first hit

 copy/part my-series find my-series "d"
== "abc"

If you ever got used to such thinking, writing & using less functional approaches really hurts.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster
Aug 30
drug <drug2004 bk.ru> writes:
On 31.08.2017 09:50, Robert M. Münch wrote:
 On 2017-08-29 13:23:50 +0000, Steven Schveighoffer said:

 ...
 In Phobos, find gives you a range where the first element is the one
 you searched for, and the last element is the end of the original
 range. But what if you wanted all the data *up to* the element
 instead? What if you just wanted to look at that specific element? So
 we need several functions that do this, and it's not always clear how
 to do it correctly, and it's difficult to compose one function from
 the other.
 I'm a big fan of Rebol (yes, it's far from mainstream and a dead-end these days, but it has a lot of very nice ideas) and it uses the idea of a series datatype. Further information here: http://www.rebol.com/docs/core23/rebolcore-6.html

 There you do something like:

 1. Return the first hit and the rest of the series

  my-series: "abcdefg"
 == "abcdefg"
  find my-series "b"
 == "bcdefg"

 2. Return only the hit

  first find my-series "b"
 == #"b"

 3. Return everything before the first hit

  copy/part my-series find my-series "d"
 == "abc"

 If you ever got used to such thinking, writing & using less functional approaches really hurts.
Interesting. How does it compare with the iterator and range ideas?
Aug 31
Robert M. Münch <robert.muench saphirion.com> writes:
On 2017-08-31 07:13:55 +0000, drug said:

 Interesting. How is it comparable with iterators and ranges ideas?
Well, see: http://www.rebol.com/docs/core23/rebolcore-6.html#section-6

Just download the interpreter and play around with it to get a feeling. Overall it's way more functional. Like getting the last element:

 copy back tail my-series

What I don't like is that indexing starts at 1, because this makes working with +/- offsets from a specific position cumbersome.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster
Sep 02
Moritz Maxeiner <moritz ucworks.org> writes:
On Saturday, 2 September 2017 at 20:17:40 UTC, Robert M. Münch 
wrote:
 On 2017-08-31 07:13:55 +0000, drug said:

 Interesting. How is it comparable with iterators and ranges 
 ideas?
Well, see: http://www.rebol.com/docs/core23/rebolcore-6.html#section-6
Thanks for your post about Rebol, I didn't know it before. After reading through the series chapter, though, AFAICT Rebol series *are* iterators (begin+end), just with really nice, functional (read: LISP) syntax?
Sep 02
Robert M. Münch <robert.muench saphirion.com> writes:
On 2017-09-02 21:27:58 +0000, Moritz Maxeiner said:

 Thanks for your post about Rebol, I didn't know it before.
As said, the official Rebol-2 version is a dead-end. Even our main product is still based on it :-) 15-year-old technology, but still working, and we know all the problems. So very few surprises. And it's very productive.

There exists a half-done official Rebol-3 version as open source. It was picked up by some and continued. And then there is a thing called Red, which uses a lot of the ideas but compiles. Worth a look too. It's really cool, because it compiles native apps for Android without any SDK, for example.

Overall, it's worth spending some time with Rebol. I'm sure you won't regret the time and can learn a lot. Things to look at: VIEW (GUI), PARSE (for parsing) and, after this, using PARSE to create DSLs with Rebol. Very cool feature.
 After reading through the series chaper, though, AFAICT Rebol series 
 *are* iterators (begin+end), just with really nice, functional (read: 
 LISP) syntax?
There is no difference between code & data in Rebol. And it has a very rich set of datatypes, IIRC about 35 native ones. And many of them are series, which can be operated on in the same way.

From my experience, traversing data structures with a functional syntax and concept is really natural to work with. It's mostly what you would tell someone to do in English.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster
Sep 03
Moritz Maxeiner <moritz ucworks.org> writes:
On Sunday, 3 September 2017 at 08:37:36 UTC, Robert M. Münch 
wrote:
 On 2017-09-02 21:27:58 +0000, Moritz Maxeiner said:

 Thanks for your post about Rebol, I didn't know it before.
 As said, the official Rebol-2 version is a dead-end. Even our main product is still based on it :-) 15-year-old technology, but still working, and we know all the problems. So very few surprises. And it's very productive. There exists a half-done official Rebol-3 version as open source. It was picked up by some and continued. And then there is a thing called Red, which uses a lot of the ideas but compiles. Worth a look too. It's really cool, because it compiles native apps for Android without any SDK, for example. Overall, it's worth spending some time with Rebol. I'm sure you won't regret the time and can learn a lot. Things to look at: VIEW (GUI), PARSE (for parsing) and, after this, using PARSE to create DSLs with Rebol. Very cool feature.
I'll put it on my ever-growing list of things to check out in depth, thanks :p
 After reading through the series chaper, though, AFAICT Rebol 
 series *are* iterators (begin+end), just with really nice, 
 functional (read: LISP) syntax?
 There is no difference between code & data in Rebol. And it has a very rich set of datatypes, IIRC about 35 native ones. And many of them are series, which can be operated on in the same way.
Sounds like LISP :)
 From my experience, traversing datastructures with a functional 
 syntax and concept is really natural to work with. It's mostly 
 what you would tell someone to do using english.
I agree, though I was talking about what the abstract data type of a "series" is, i.e. what operations it exposes. From my observation:

A D input range exposes via empty/front/popFront.
A classic iterator exposes via begin/end.
A Rebol series seems to be a safer form of iterator, as it doesn't expose begin/end directly, but exposes restricted operations that are defined as manipulating begin/end.
Sep 03
Robert M. Münch <robert.muench saphirion.com> writes:
On 2017-09-03 12:46:05 +0000, Moritz Maxeiner said:

 I'll put in on my ever growing list of things to check out in depth, thanks :p
You're welcome. Don't wait too long ;-)
 There is no difference between code & data in Rebol. And it has a very 
 rich set of datatpye, IIRC about 35 native ones. And many of them are 
 series, which can be operated in the same way.
Sounds like LISP :)
Yes, some ideas are the same. But the syntax is way more beautiful and usable.
 I agree, though I was talking about what the abstract data type of a 
 "series" is, i.e. what operations is exposes. From my observation:
 A D input range exposes via empty/front/popFront.
 A classic iterator exposes via begin/end.
 A Rebol series seems to be a safer form of iterator, as it doesn't 
 expose begin/end directly, but exposes restricted operations that are 
 defined as manipulating begin/end.
The series has elements and a "current pointer" which can be manipulated. Begin & end are just that, the begin and end of the series. You don't manipulate them directly. Of course you can change a series by adding/removing, cutting etc., which then changes begin/end accordingly.
 a: [a b c d e f g]
== [a b c d e f g]
 a/5
== e
 skip a 5
== [f g]
 a
== [a b c d e f g]
 b: a/5
== e
 type? b
== word!
 type? a
== block!
 b: skip a 5
== [f g]
 type? b
== block!
 index? b
== 6
 head b
== [a b c d e f g]

You can even treat such a block as a fixed-size record and operate on this. Very handy.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster
Sep 04
Mark <smarksc gmail.com> writes:
On Sunday, 3 September 2017 at 12:46:05 UTC, Moritz Maxeiner 
wrote:
 I agree, though I was talking about what the abstract data type 
 of a "series" is, i.e. what operations is exposes. From my 
 observation:
 A D input range exposes via empty/front/popFront.
 A classic iterator exposes via begin/end.
 A Rebol series seems to be a safer form of iterator, as it 
 doesn't expose begin/end directly, but exposes restricted 
 operations that are defined as manipulating begin/end.
They are all pretty low-level abstractions. C++ has posited that iterators should be the bridge connecting generic data structures and generic algorithms. But they are pretty awful at that. They make it incredibly easy to destroy one of your structure's invariants, or to use it in a way that ignores some of its invariants (leading to inferior performance and sometimes unnecessarily bulky code).

Ironically, iterators are frequently used in OO code even though they clearly violate the "Tell, don't ask" principle, as do D ranges (and presumably also Rebol series, though I only skimmed over the documentation).
Sep 04
Mark <smarksc gmail.com> writes:
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
 Maybe of interest: 
 https://www.think-cell.com/en/career/talks/iterators/#1

 I haven't read everything, so not sure if it worth to take a 
 look.
Iterators have many problems. Andrei's talk some years ago, titled "Iterators Must Go", points out many of them in a clear fashion. I think most of the problems stem from the fact that they are inherently a leaky abstraction. Iterators basically treat all data structures as a singly/doubly linked list or as arrays (if they allow random access). There is no natural or intuitive way of describing, say, a tree or an arbitrary graph as a list/array. Ranges and cursors try to solve some of the issues but they still have this fundamental problem.

When I think of a good iteration interface, the only thing that comes to mind is SQL queries (= relational algebra, more or less). You have a few basic operators (Select, From, etc.) for querying that are intuitive, simple and safe. Furthermore, they compose elegantly, allowing you to express fairly complicated queries using these few basic operators. It's a true abstraction - I don't know and I don't care if the operators are implemented underneath using iterators, ranges, cursors or whatever (of course, sometimes I do care, when performance is a priority...).

It may be unrealistic to expect such a nice abstraction of iteration for other data structures, not to mention a universal one which will work for all interesting data structures. Still, if and when I can dispense with iteration altogether, I'm happy to do so.
Sep 01
Robert M. Münch <robert.muench saphirion.com> writes:
On 2017-09-01 20:34:32 +0000, Mark said:

On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
 Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1
 
 I haven't read everything, so not sure if it worth to take a look.
Iterators have many problems. Andrei's talk some years ago, titled "Iterators Must Go", points out many of them in a clear fashion. I think most of the problems stem from the fact that they are inherently a leaky abstraction. Iterators basically treat all data structures as a singly/doubly linked list or as arrays (if they allow random access). There is no natural or intuitive way of describing, say, a tree or an arbitrary graph as a list/array. Ranges and cursors try to solve some of the issues but they still have this fundamental problem.
Iterators are not a silver bullet. But IIRC you can specify whether you want to iterate over a graph breadth-first or depth-first. If you just need to "iterate" over the elements, things work pretty well IMO. If you want to select a sub-set and then iterate, things get much more complicated anyway.
 It may be unrealisic to expect such a nice abstraction of iteration for 
 other data structures, not to mention a universal one which will work 
 for all interesting data structures.
That's true, hence I never expected this at all. Some pretty simple & basic building blocks, and the rest is done on demand, fitting the use case.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster
Sep 02
Mark <smarksc gmail.com> writes:
On Saturday, 2 September 2017 at 20:22:44 UTC, Robert M. Münch 
wrote:
 Iterators are not the silver bullet. But IIRC you can specify 
 if you want to iterate over a graph BF or DF. If you just need 
 to "iterate" over the elements things work pretty good IMO. If 
 you want to select a sub-set and then iterate things get much 
 complicater anyway.
There isn't really some finite (or small, anyway) choice here. I may want to iterate over a binary tree in preorder/inorder/postorder DF order. I may want to iterate over it in BF order. I may want to handle the left son before the right one, or vice versa. This is already 8 iterators. In C++ land I also need to provide const and non-const versions of each iterator, which brings it up to 16 iterators (I'm not sure what the situation is in D - can you have const ranges and still be able to call popFront on them?)

From the implementor's side, this is a headache. However, it's not that bad in a language like D, since "static if" saves you a lot of code in cases like this. The bigger problem is on the user's side: which iterator should she use? For most applications more than one type of iterator will be adequate, but the "right" choice will often be implementation-dependent. For instance, if the tree in question is a complete binary tree, then it is often implemented as an array that contains the tree's elements in BF order. Therefore, if some operation on the tree can be done by traversing it BF, then the user should probably use a BF iterator, for performance reasons. But the user doesn't know how the tree is implemented, so she can't make that choice...

If the implementor informs the user about the implementation in the documentation, then his data structure is a cost-free abstraction (assuming that the user always chooses wisely...) but also a leaky one. If he doesn't, then the abstraction doesn't leak but it is no longer cost-free. He can't have both.

A simpler example is a class interface for a (two-dimensional) matrix. You would expect the interface to provide a row-major order iterator and a column-major order one. Again, from the user's POV, the right choice (performance-wise) will be dependent on the implementation.

Asymptotic complexity is sometimes considered an essential detail of a "true" abstraction, but this doesn't help here since the difference in performance is only observed in practice. This is why I find the presentation in the OP - "Why iterators got it all wrong" - a bit iffy. Iterators did get a lot of things wrong, but the presentation focuses on relatively mundane issues - ugly syntax and somewhat annoying semantics.
Sep 05
Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
 Maybe of interest: 
 https://www.think-cell.com/en/career/talks/iterators/#1

 I haven't read everything, so not sure if it worth to take a 
 look.
Iterators have no proper alternative when one needs to implement a generic tensor library like Mir Algorithm [1]. The user-level API is ndslice [4] (a D n-dimensional random access range), which can be combined with Phobos. At the same time, the library uses a low-level unbounded random access iterator [3] representation to implement 90% of the topology logic. This approach allows maintaining code that is a few times smaller, simpler and less error-prone compared with std.range. The best example is the `bitwise` functions' implementation in Phobos vs. Mir Algorithm.

Iterators are very good for type composition when one needs one's own D range that cannot be constructed using the existing mir.ndslice.topology / std.range API.

[1] https://github.com/libmir/mir-algorithm
[2] http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html
[3] http://docs.algorithm.dlang.io/latest/mir_ndslice_iterator.html
[4] http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#Slice

Best Regards,
Ilya
Sep 02
Moritz Maxeiner <moritz ucworks.org> writes:
On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
wrote:
 On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
 wrote:
 Maybe of interest: 
 https://www.think-cell.com/en/career/talks/iterators/#1

 I haven't read everything, so not sure if it worth to take a 
 look.
Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?
Sep 02
Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner 
wrote:
 On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
 wrote:
 On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
 wrote:
 Maybe of interest: 
 https://www.think-cell.com/en/career/talks/iterators/#1

 I haven't read everything, so not sure if it worth to take a 
 look.
Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?
There are three general kinds of dynamic dense tensors. Mir implements all of them:

1. Contiguous tensors. Their data is located contiguously in memory: a single dense memory chunk. All strides between sub-tensors can be computed from the lengths.

2. Canonical tensors. Only the data for one dimension is dense; the other dimensions have strides that cannot be computed from the lengths. BLAS matrices are canonical tensors: they have two lengths and one stride.

3. Universal tensors. Each dimension has a stride. NumPy ndarrays are universal tensors.

Finite random access range issues:

1. The main API issue is that the full-featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=`) is much larger compared with an unbounded random access iterator (the same API as a pointer).

2. A random access range holds its length. Yes, Phobos has infinite ranges (Mir hasn't), but in practice almost any RAR constructed using Phobos has a length, and you cannot strip it once the type is constructed. Anyway, an infinite RAR is just a pointer/iterator that can move forward but cannot move backward. Tensors hold their own lengths for each dimension, so the range's length is just a useless payload.

3. Ranges cannot move backward (auto b = a[-4 .. $];). This means one cannot implement the dynamic `reversed` (flip) [1, 2] operation for dimensions that have strides (universal or canonical tensors). _Dynamic_ means that instead of constructing a new type with reversed order for a dimension, we just negate the corresponding stride and move the cursor.

[1] http://docs.algorithm.dlang.io/latest/mir_ndslice_dynamic.html#.reversed
[2] https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.flip.html

Best regards,
Ilya
Sep 03
Moritz Maxeiner <moritz ucworks.org> writes:
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
wrote:
 On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner 
 wrote:
 On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
 wrote:
 On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
 wrote:
 Maybe of interest: 
 https://www.think-cell.com/en/career/talks/iterators/#1

 I haven't read everything, so not sure if it worth to take a 
 look.
Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?
There are three general kinds of dynamic dense tensors. Mir implements all of them: 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths. 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride. 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors. Finite random access range Issues: 1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer).
Won't you have to implement the opIndex* operators either way in order to use the `a[]` syntax? The main difference I can see is that you'd also have to implement the InputRange (front, popFront, empty), ForwardRange (save), and BidirectionalRange (back, popBack) members, which, if you don't need them, is indeed understandably too much?
 2. Random access range holds its length. Yes, Phobos has 
 infinity ranges (mir hasn't), but the practice is that almost 
 any RAR constructed using Phobos has length and you can not 
 strip when type is constructed. Anyway, infinity RAR is just a 
 pointer/iterator that can move forward but can not move 
 backward. Tensors hold their own lengths for each dimensions, 
 so range's length is just a useless payload.
I'm not sure what you mean here. Is this still about accessing the elements of one tensor? If so, what do Phobos' ranges have to do with your tensor type's API?
 3. Ranges can not move backward (auto b = a[-4 .. $].). This 
 means one can not implement dynamic `reversed`(flip) [1, 2] 
 operation for dimensions that have strides (universal or 
 canonical tensors). _Dynamic_ means that instead of construct a 
 new type with reversed order for a dimension, we just negate 
 the corresponding stride and move the cursor.
Right, that is indeed a limitation of the range abstraction (though a type could expose both functionality to move backwards _and_ the range API) and if you need that, it makes sense not to use ranges. Thanks for the summary of issues :)
Sep 03
Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Sunday, 3 September 2017 at 12:35:16 UTC, Moritz Maxeiner 
wrote:
 On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
 wrote:
 On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner 
 wrote:
 On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
 wrote:
 On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
 wrote:
 Maybe of interest: 
 https://www.think-cell.com/en/career/talks/iterators/#1

 I haven't read everything, so not sure if it worth to take 
 a look.
Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?
There are three general kinds of dynamic dense tensors. Mir implements all of them: 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths. 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride. 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors. Finite random access range Issues: 1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer).
Won't you have to implement opIndex* operators either way in order to use the `a[]` syntax? The main difference I can see should be that you'd also have to implement the InputRange (front,popFront,empty), ForwardRange (save), and BidirectionalRange (back,popBack) members, which if you don't need them, is indeed understandably too much?
front, popFront, empty, save, back, popBack, opIndexOpAssign,
2 x opIndex (.opSlice) for rhs scalars and ranges,
2 x opIndexAssign (.opSlice) for rhs scalars and ranges,
2 x opIndexOpAssign (.opSlice) for rhs scalars and ranges.

Plus, because slicing over the range topology may be slower, it's good to add these:
popFrontN, popFrontExactly, popBackN, popBackExactly,
and maybe I forgot something ...
 2. Random access range holds its length. Yes, Phobos has 
 infinity ranges (mir hasn't), but the practice is that almost 
 any RAR constructed using Phobos has length and you can not 
 strip when type is constructed. Anyway, infinity RAR is just a 
 pointer/iterator that can move forward but can not move 
 backward. Tensors hold their own lengths for each dimensions, 
 so range's length is just a useless payload.
I'm not sure what you mean here. Is this still about accessing the elements of one tensor? If so, what do Phobos' ranges have to do with your tensor type's API?
ndslice is:

1. The Slice structure, with a huge amount of features from Mir Algorithm.
2. A multidimensional RAR: .front!0, .front!1, .front!(N - 1) and all other generalized RAR primitives, including multidimensional indexing [1, 2, 3, ...] and slicing [1 .. 4, 3 .. $].
3. A common one-dimensional RAR on top of the outermost dimension, so one can use Phobos functions on the Slice structure. For example, a matrix will be iterated row by row.

But this paragraph was about another issue:

struct BLASMatrixTypeBasedOnRanges
{
    double[] _data; // arrays are the simplest RAR
    size_t rows, cols;
    size_t stride;
}

sizeof(BLASMatrixTypeBasedOnRanges) == size_t.sizeof * 5;

on the other hand:

sizeof(Slice!(Canonical, [2], double*)) == size_t.sizeof * 4;

Ranges require 25% more space for canonical matrices than iterators.
 3. Ranges can not move backward (auto b = a[-4 .. $].). This 
 means one can not implement dynamic `reversed`(flip) [1, 2] 
 operation for dimensions that have strides (universal or 
 canonical tensors). _Dynamic_ means that instead of construct 
 a new type with reversed order for a dimension, we just negate 
 the corresponding stride and move the cursor.
Right, that is indeed a limitation of the range abstraction (though a type could expose both functionality to move backwards _and_ the range API) and if you need that, it makes sense not to use ranges. Thanks for the summary of issues :)
Sep 03
parent reply Moritz Maxeiner <moritz ucworks.org> writes:
On Sunday, 3 September 2017 at 14:19:19 UTC, Ilya Yaroshenko 
wrote:
 On Sunday, 3 September 2017 at 12:35:16 UTC, Moritz Maxeiner 
 wrote:
 On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
 wrote:
 On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner 
 wrote:
 On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko 
 wrote:
 On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch 
 wrote:
 Maybe of interest: 
 https://www.think-cell.com/en/career/talks/iterators/#1

 I haven't read everything, so not sure if it worth to take 
 a look.
Iterators have no proper alternative when one needs to implement a generic tensor library like Mir Algorithm [1].
Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?
There are three general kinds of dynamic dense tensors. Mir implements all of them:

1. Contiguous tensors. Their data is located contiguously in memory: a single dense memory chunk. All strides between sub-tensors can be computed from lengths.

2. Canonical tensors. Only the data for one dimension is dense; the other dimensions have strides that cannot be computed from lengths. BLAS matrices are canonical tensors: they have two lengths and one stride.

3. Universal tensors. Each dimension has a stride. NumPy ndarrays are universal tensors.

Finite random access range issues:

1. The main API issue is that the full-featured random access range (RAR) API (with slicing and all opIndex* operators like `[]*=`) is much larger compared with an unbounded random access iterator (which has the same API as a pointer).
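The stride bookkeeping behind the three kinds can be sketched as follows (Python for neutrality; this is an illustration of the idea, not Mir's implementation):

```python
def full_strides(lengths):
    """Row-major strides computed purely from lengths (the contiguous case)."""
    strides = [1] * len(lengths)
    for i in range(len(lengths) - 2, -1, -1):
        strides[i] = strides[i + 1] * lengths[i + 1]
    return strides

lengths = [3, 4]  # a 3x4 matrix

# 1. Contiguous: stores no strides; all of them are derivable from lengths.
assert full_strides(lengths) == [4, 1]

# 2. Canonical: stores strides for all but the innermost dimension, which
#    stays dense (stride 1). A padded BLAS matrix might store row stride 6:
canonical = [6] + [1]   # only the 6 must be stored; the trailing 1 is implied

# 3. Universal: stores a stride per dimension; even the innermost one may be
#    non-unit (e.g. -1 for a flipped view), as in NumPy ndarrays.
universal = [6, -1]
```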
Won't you have to implement opIndex* operators either way in order to use the `a[]` syntax? The main difference I can see should be that you'd also have to implement the InputRange (front,popFront,empty), ForwardRange (save), and BidirectionalRange (back,popBack) members, which if you don't need them, is indeed understandably too much?
front, popFront, empty, save, back, popBack, opIndexOpAssign,
2 x opIndex(.opSlice) for rhs scalars and ranges,
2 x opIndexAssign(.opSlice) for rhs scalars and ranges,
2 x opIndexOpAssign(.opSlice) for rhs scalars and ranges.
Plus, because of range topology, slicing may be slower. It would also be good to add these: popFrontN, popFrontExactly, popBackN, popBackExactly, and maybe I forgot something ...
Didn't know about the *N *Exactly ones, but yeah, it's a lot. Though unless you aren't interested in the `a[]` syntax, you can't avoid opIndex*.
 2. Random access range holds its length. Yes, Phobos has 
 infinite ranges (Mir hasn't), but in practice almost any 
 RAR constructed using Phobos has a length, and you cannot 
 strip it off once the type is constructed. Anyway, an infinite 
 RAR is just a pointer/iterator that can move forward but can 
 not move backward. Tensors hold their own lengths for each 
 dimension, so the range's length is just a useless payload.
I'm not sure what you mean here. Is this still about accessing the elements of one tensor? If so, what do Phobos' ranges have to do with your tensor type's API?
ndslice is:

1. A Slice structure with a huge amount of features from mir-algorithm.
2. A multidimensional RAR: .front!0, .front!1, ... .front!(N - 1) and all other generalized RAR primitives, including multidimensional indexing [1, 2, 3, ...] and slicing [1 .. 4, 3 .. $].
3. A common one-dimensional RAR on top of the outermost dimension, so one can use Phobos functions on the Slice structure. For example, a matrix will be iterated row by row.

But this paragraph was about another issue:

struct BLASMatrixTypeBasedOnRanges
{
    double[] _data; // arrays are the simplest RAR
    size_t rows, cols;
    size_t stride;
}
Now I get what you mean: you were talking about not using the dynamic arrays built into the language (which indeed carry their length for safety purposes), not about exposing a range API here. You're right, you don't need to use a dynamic array here, as you already have the length encoded another way (it would be wasteful). But strictly speaking, D's built-in dynamic arrays are not ranges: they are neither aggregate types nor do they have the range functions baked into the language. They only become ranges when you import the appropriate free functions from Phobos (or define some yourself).
 sizeof(YourBLASMatrixTypeBasedOnRanges) == size_t.sizeof * 5;

 on the other hand:

 sizeof(Slice!(Canonical, [2], double*)) == size_t.sizeof * 4;

 Ranges require 25% more space for canonical matrices than 
 iterators.
We do have to differentiate between a container and the API with which to iterate over its elements. D's dynamic arrays (a pointer + length) require more space than using only a raw pointer, of course. Whether that's more than an iterator depends on the type of iterator you use: most common iterator implementations I know consist of a begin and an end pointer, essentially requiring the same space as D's dynamic arrays. In the case you describe, though, we aren't talking about the iteration API; we are talking about what's used internally for storage.
Sep 03
parent reply Ilya <ilyayaroshenko gmail.com> writes:
On Sunday, 3 September 2017 at 16:33:04 UTC, Moritz Maxeiner 
wrote:
 On Sunday, 3 September 2017 at 14:19:19 UTC, Ilya Yaroshenko 
 wrote:
 Ranges require 25% more space for canonical matrices than 
 iterators.
We do have to differentiate between a container and the API with which to iterate over its elements. D's dynamic arrays (a pointer + length) require more space than using only a raw pointer, of course. Whether that's more than an iterator depends on the type of iterator you use: most common iterator implementations I know consist of a begin and an end pointer, essentially requiring the same space as D's dynamic arrays. In the case you describe, though, we aren't talking about the iteration API; we are talking about what's used internally for storage.
Maybe I should call it cursors or generic pointers instead of iterators.
Sep 03
next sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Monday, 4 September 2017 at 04:29:36 UTC, Ilya wrote:
 Maybe I should call it cursors or generic pointers instead of 
 iterators.
dcollections uses a cursor approach that seems different from yours:

https://github.com/schveiguy/dcollections/blob/master/concepts.txt

Maybe generic pointers make more sense? Unless I'm mistaken, mir iterators are more like pointers with specific operations defined for how they are incremented, assigned, etc. I had been a little confused by the name iterators because I don't see any functions in mir that use begin/end. But I don't know if that's more about C++'s implementation of iterators or more core to the concept.
Sep 04
prev sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Monday, 4 September 2017 at 04:29:36 UTC, Ilya wrote:
 Maybe I should call it cursors or generic pointers instead of 
 iterators.
Well, «C++ iterators» are table pointer abstractions, so you need a pair.

An «iterator» would be a possibly heavy object that is used for traversing a possibly complex and heterogeneous data-structure without exposing any notion of size or end a-priori (e.g. before it arrives at the end, if there is an end).
Sep 05
parent jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 6 September 2017 at 06:57:25 UTC, Ola Fosheim 
Grøstad wrote:
 Well, «C++ iterators» are table pointer abstractions, so you 
 need a pair.

 An «iterator» would be a possibly heavy object that is used for 
 traversing a possibly complex and heterogeneous data-structure 
 without exposing any notion of size or end a-priori (e.g. 
 before it arrives at the end, if there is an end).
mir-ndslices are like a generalization of D's slices. Instead of a pointer and a length, they are an iterator and lengths/strides. So the size is exposed.
Sep 06
prev sibling parent reply Enamex <enamex+d outlook.com> writes:
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
wrote:
 1. Contiguous tensors. Their data is located contiguously in 
 memory: a single dense memory chunk. All strides between 
 sub-tensors can be computed from lengths.

 2. Canonical tensors. Only the data for one dimension is dense; 
 the other dimensions have strides that cannot be computed from 
 lengths. BLAS matrices are canonical tensors: they have two 
 lengths and one stride.

 3. Universal tensors. Each dimension has a stride. NumPy 
 ndarrays are universal tensors.
Can you elaborate?
Sep 06
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 6 September 2017 at 20:24:05 UTC, Enamex wrote:
 On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko 
 wrote:
 1. Contiguous tensors. Their data is located contiguously in 
 memory: a single dense memory chunk. All strides between 
 sub-tensors can be computed from lengths.

 2. Canonical tensors. Only the data for one dimension is dense; 
 the other dimensions have strides that cannot be computed from 
 lengths. BLAS matrices are canonical tensors: they have two 
 lengths and one stride.

 3. Universal tensors. Each dimension has a stride. NumPy 
 ndarrays are universal tensors.
Can you elaborate?
IMO, it's something that still needs to get explained better in the documentation. I haven't tried to because I'm not 100% on it. Below is as best as I have figured things out.

All Slices in mir can have an iterator, lengths, and strides. The lengths are always N-dimensional and contain information on the shape of the Slice. For instance, if the lengths are [3, 4], then N=2 and it is a 2-dimensional slice with 3 rows and 4 columns.

I left out packs, which are an added complication. Packs can be used to make slices of slices. For the above Slice, the default would simply be that the packs are [1], which means that there is no slice-of-slices going on. If the packs were instead [1, 1] (the sum of packs must equal N), then that is like saying you now have a slice of slices. In this case, slice[0] would be a row instead of a scalar. This is what allows you to iterate by row or by column.

So when you're thinking about contiguous, canonical, and universal: the way that lengths and packs work is the same for all of them. The difference is in the strides. Contiguous slices don't have a strides vector. Canonical slices have a strides vector with a length of N-1. Universal slices have a strides vector of length N.

So how are the strides used and why do they matter? I'm not sure I grok this part 100%, so I'll do my best. Strides tell you how much difference there is between the units of each array. For instance, if my slice (call it a) has lengths [2, 3, 4] with strides [12, 4, 1], then a[0] is a [3, 4] matrix, which is where the 12 comes from. Moving the pointer to the start of the next [3, 4] matrix (a[1]) requires moving 12 of whatever the type is. This would be a universal slice because it has N=3 lengths and strides. So if you call a._strides, it would give you [12, 4, 1]. If a were canonical, calling _strides would give you [12, 4], because _strides for canonical slices have length N-1.

Now if a were contiguous instead of universal and you called _strides on it, it would give you [], because contiguous slices have no strides. The memory footprint of the slice appears different for these, with a and a[0] of universal being larger than canonical and contiguous. This largely reflects the storage of the strides data. Similarly, a[0] has _strides [4, 1] for universal, [4] for canonical, and [] for contiguous.

Mir is written in such a way that a[0] is the same regardless of the SliceKind. For the most part, this means that it isn't really obvious that there is a difference between them. It matters in some underlying functions, but I haven't needed to do much other than sometimes convert a contiguous slice to universal (though it's not always clear to me why, I just do it).
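The stride arithmetic described above boils down to a dot product of the index with the strides; a minimal sketch (Python, mirroring the [2, 3, 4] example above, not Mir's actual code):

```python
# Offset of an element = dot product of its index with the strides.
# Same numbers as the example above: lengths [2, 3, 4], strides [12, 4, 1].
strides = [12, 4, 1]

def offset(index, strides):
    """Distance (in elements) from the cursor to index [i, j, k]."""
    return sum(i * s for i, s in zip(index, strides))

assert offset([1, 0, 0], strides) == 12  # a[1] starts 12 elements past a[0]
assert offset([0, 2, 3], strides) == 11  # a[0, 2, 3] sits at 2*4 + 3*1
```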
Sep 06
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 6 September 2017 at 21:44:01 UTC, jmh530 wrote:
 On Wednesday, 6 September 2017 at 20:24:05 UTC, Enamex wrote:
 Similarly, a[0] has _strides [4, 1] for universal, [4] for 
 canonical, and [] for contiguous. Mir is written in such a way 
 that a[0] is the same regardless of the SliceKind. For the most 
 part, this means that it isn't really obvious that there is a 
 difference between them. It matters in some underlying 
 functions, but I haven't needed to do much other than sometimes 
 convert a contiguous slice to universal (though it's not always 
 clear to me why, I just do it).
For example, let's take the `transposed` function. It does not transpose the data; instead, it swaps dimensions. Assume you have a canonical matrix with _lengths = [3, 4], so its strides are [4]. Now we want to swap dimensions, but to do that we need to swap both lengths and strides. So first we need to convert the slice to universal, so that it has both strides we want to swap: [4, 1]. The transposed slice will have _lengths = [4, 3] and _strides = [1, 4].

Best Regards,
Ilya
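The zero-copy transpose described here can be sketched as follows (Python, illustrative; the `at` helper is hypothetical, not Mir's API):

```python
data = list(range(12))               # the flat memory chunk, never moved
lengths, strides = [3, 4], [4, 1]    # universal 3x4 row-major view

def at(i, j, strides):
    """Hypothetical 2-D indexing helper: offset = i*strides[0] + j*strides[1]."""
    return data[i * strides[0] + j * strides[1]]

# Transposing swaps lengths and strides; no element is copied.
t_lengths, t_strides = lengths[::-1], strides[::-1]   # [4, 3], [1, 4]

assert at(1, 2, strides) == 6        # original [1, 2]
assert at(2, 1, t_strides) == 6      # transposed [2, 1]: the same element
```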
Sep 07
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 7 September 2017 at 09:40:40 UTC, Ilya Yaroshenko 
wrote:
 For example, let's take the `transposed` function. It does not 
 transpose the data. Instead, it swaps dimensions.
 Assume you have a canonical matrix with _lengths = [3, 4]. So 
 its strides are [4]. Now we want to swap dimensions, but to do 
 it we need to swap both lengths and strides. So first we need 
 to convert a slice to universal, so it will have both strides 
 we want to swap: [4, 1]. Transposed slice will have _lengths = 
 [4, 3] and _strides = [1, 4].

 Best Regards,
 Ilya
I think what's missing from the documentation is a clear explanation of how the strides determine how the iterator moves. Even something like below (assuming I have the math right) would be an improvement, though I'm sure there is a clearer way to express the concept.

auto x = iota(3, 4).universal;
assert(x.strides == [4, 1]);
assert(x[2, 3] == 6); // (2 - 1) * 4 + (3 - 1) * 1

auto y = x.transposed;
assert(y.strides == [1, 4]);
assert(y[2, 3] == 9); // (2 - 1) * 1 + (3 - 1) * 4
Sep 07
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 7 September 2017 at 12:04:12 UTC, jmh530 wrote:
 I think what's missing from the documentation is a clear 
 explanation of how the strides determine how the iterator 
 moves. Even something like below (assuming I have the math 
 right) would be an improvement, though I'm sure there is a 
 clearer way to express the concept.

 auto x = iota(3, 4).universal;
 assert(x.strides == [4, 1]);
 assert(x[2, 3] == 6); //(2 - 1) * 4 + (3 - 1) * 1

 auto y = x.transposed;
 assert(y.strides == [1, 4]);
 assert(y[2, 3] == 9); //(2 - 1) * 1 + (3 - 1) * 4
Hmm, maybe I can express this better as something like

auto data = iota(12);
auto x = data.sliced(2, 3).universal;
assert(x.strides == [4, 1]);
assert(x[2, 3] == data[(2 - 1) * 4 + (3 - 1) * 1]);

auto y = x.transposed;
assert(y.strides == [1, 4]);
assert(y[2, 3] == data[(2 - 1) * 1 + (3 - 1) * 4]);
Sep 07
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 7 September 2017 at 12:27:19 UTC, jmh530 wrote:
 auto x = data.sliced(2, 3).universal;
Err, (3, 4) not (2, 3)
Sep 07
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 7 September 2017 at 12:28:00 UTC, jmh530 wrote:
 On Thursday, 7 September 2017 at 12:27:19 UTC, jmh530 wrote:
 auto x = data.sliced(2, 3).universal;
Err, (3, 4) not (2, 3)
All kinds of screwed up. This is what I get for not testing things before I post them.

unittest
{
    auto data = iota(12);
    auto x = data.sliced(3, 4).universal;
    assert(x.strides == [4, 1]);
    assert(x[1, 2] == data[1 * 4 + 2 * 1]);

    auto y = x.transposed;
    assert(y.strides == [1, 4]);
    assert(y[1, 2] == data[1 * 1 + 2 * 4]);
}
Sep 07
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Thursday, 7 September 2017 at 20:46:22 UTC, jmh530 wrote:
 On Thursday, 7 September 2017 at 12:28:00 UTC, jmh530 wrote:
 On Thursday, 7 September 2017 at 12:27:19 UTC, jmh530 wrote:
 auto x = data.sliced(2, 3).universal;
Err, (3, 4) not (2, 3)
All kinds of screwed up. This is what I get for not testing things before I post them.

unittest
{
    auto data = iota(12);
    auto x = data.sliced(3, 4).universal;
    assert(x.strides == [4, 1]);
    assert(x[1, 2] == data[1 * 4 + 2 * 1]);

    auto y = x.transposed;
    assert(y.strides == [1, 4]);
    assert(y[1, 2] == data[1 * 1 + 2 * 4]);
}
Another small difference is slicing. For example, for a contiguous matrix m:

1. m[a .. b]         is contiguous
2. m[i]              is contiguous
3. m[a .. b, i]      is universal (because there are no 1D canonical slices)
4. m[a .. b, c .. d] is canonical

BTW, could you please update the docs or maybe write a small article for the Mir blog? I will be happy to answer your questions if any.

Best Regards,
Ilya
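That rule can be restated as a small model (Python, illustrative only; it covers just this 2-D contiguous case, not Mir's real kind inference):

```python
# Model the four subscript forms on a contiguous 2-D matrix m.
def result_kind(outer_is_range, inner_subscript):
    """inner_subscript: None (dimension kept whole), 'range', or 'index'."""
    if inner_subscript is None:
        return "contiguous"   # m[a .. b] and m[i]: trailing dim kept whole
    if inner_subscript == "range":
        return "canonical"    # m[a .. b, c .. d]: inner dim still dense
    return "universal"        # m[a .. b, i]: 1-D result must carry a stride

assert result_kind(True, None) == "contiguous"     # m[a .. b]
assert result_kind(False, None) == "contiguous"    # m[i]
assert result_kind(True, "index") == "universal"   # m[a .. b, i]
assert result_kind(True, "range") == "canonical"   # m[a .. b, c .. d]
```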
Sep 10
parent jmh530 <john.michael.hall gmail.com> writes:
On Monday, 11 September 2017 at 05:41:37 UTC, Ilya Yaroshenko 
wrote:
 Another small difference is slicing:
 For example, for contiguous matrix m:
 1. m[a .. b]         is contiguous
 2. m[i]              is contiguous
 3. m[a .. b, i]      is universal (because there are no 1D 
 canonical slices)
 4. m[a .. b, c .. d] is canonical
Was not aware of this.
 BTW, could you please update the docs or maybe write a small 
 article for the Mir blog?
I'm happy to do some additional work on the docs, but I was waiting until I had a better understanding of things. I've also been working on some other things and there are only so many hours in the day. I want to add cholesky to mir-lapack/lubeck.
 I will be happy to answer your questions if any.
I think I had asked a question on gitter, but I don't think I had heard back. Not sure how often you check that.
Sep 11