www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - D Ranges in C#

reply "David Piepgrass" <qwertie256 gmail.com> writes:
I'm adding D-style ranges to my new C# collections library. In 
case anyone would like to comment, please see here:

http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html
May 31 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
David Piepgrass:

 http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

From the article:
Ranges are an improvement on the C++ concept of iterators. I 
don't exactly know how ranges were invented in D, but perhaps 
someone noticed that most of the C++ STL algorithms require 
pairs of iterators:<

Ranges are an improvement over iterators about as much as oxygen molecules are an "improvement" over oxygen atoms. You breath mostly O2, and it allows you to live, but they are made of atomic oxygen. Atomic oxygen is more fundamental, it is here to stay (as Alexander Stepanov says), and it has "capabilities" that molecules don't have. In your normal life you can think to use only molecular oxygen, but sometimes you can even desire some ozone :-) And in chemistry reactions that happen inside you all the time, like oxidation, the oxygen molecules usually break apart to react.
In fact, most STL algorithms require exactly two iterators--a 
range--and none require only a single iterator<

I think there are some C++ data structures that store many single iterators. If you instead store ranges you double the data amount. Bye, bearophile
Jun 01 2013
parent Paulo Pinto <pjmlp progtools.org> writes:
Am 01.06.2013 20:23, schrieb Mehrdad:
 On Saturday, 1 June 2013 at 16:30:05 UTC, David Piepgrass wrote:
 David Piepgrass:
 In fact, most STL algorithms require exactly two iterators--a
 range--and none require only a single iterator<

I think there are some C++ data structures that store many single iterators. If you instead store ranges you double the data amount.

Hashmaps would be the most common example. Usually implemented as a linked list of key-value pairs along with a vector of list iterators.

In theory. But the .NET hashtables are implemented with an *array* of key-value pairs and an array of *indexes*. The former forms a virtual linked list that is more efficient than a traditional linked list, and the latter is more efficient than a vector of iterators (especially on x64, as the indexes can be 32-bit.)

Iterators are usually (but not always) faster, as they don't have an extra level of indirection involved -- the iterators tell you directly where to look in memory, whereas indices are useless without containers (or iterators) backing them. You shouldn't be using 32-bit indices on x64, that defeats the whole point of x64.

As of .NET 4.5, 64bit array indexes are supported as well. http://msdn.microsoft.com/en-us/library/hh285054.aspx
Jun 01 2013
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 1 June 2013 at 08:12:00 UTC, bearophile wrote:
 David Piepgrass:
In fact, most STL algorithms require exactly two iterators--a 
range--and none require only a single iterator<

I think there are some C++ data structures that store many single iterators. If you instead store ranges you double the data amount.

Hashmaps would be the most common example. Usually implemented as a linked list of key-value pairs along with a vector of list iterators. Iterators are also useful for constructing sub-ranges, which proves useful in the implementation of some algorithms. Writing std::next_permutation in D with ranges is quiet frustrating compared to C++. https://github.com/D-Programming-Language/phobos/blob/master/std/algorithm.d#L10901 http://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a01045_source.html#l03619 Notice that in D you need to maintain a count of the elements popped off the back and then use takeExactly to retrieve that portion again. In C++ you just move the iterators around and create "ranges" from pairs of iterators as you need them. When I implemented nextPermutation in D, I constantly felt as if I was fighting with ranges instead of working with them - I knew exactly what I wanted to do, but ranges only provide a roundabout way of doing it.
Jun 01 2013
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/1/13 2:17 AM, David Piepgrass wrote:
 I'm adding D-style ranges to my new C# collections library. In case
 anyone would like to comment, please see here:

 http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

http://www.reddit.com/r/programming/comments/1fgnyl/dstyle_ranges_in_c_net/ Andrei
Jun 01 2013
prev sibling next sibling parent "Szymon Gatner" <noemail gmail.com> writes:
On Saturday, 1 June 2013 at 06:17:30 UTC, David Piepgrass wrote:
 I'm adding D-style ranges to my new C# collections library. In 
 case anyone would like to comment, please see here:

 http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

Lol I am just preparing a blog post on D ranges in C++, a piece of code I did after Andrei's "Iterators must Go" presentation. It was not super usable before C++11 and auto but now is pretty cool thing to have.
Jun 01 2013
prev sibling next sibling parent "David Piepgrass" <qwertie256 gmail.com> writes:
 David Piepgrass:
In fact, most STL algorithms require exactly two iterators--a 
range--and none require only a single iterator<

I think there are some C++ data structures that store many single iterators. If you instead store ranges you double the data amount.

Hashmaps would be the most common example. Usually implemented as a linked list of key-value pairs along with a vector of list iterators.

In theory. But the .NET hashtables are implemented with an *array* of key-value pairs and an array of *indexes*. The former forms a virtual linked list that is more efficient than a traditional linked list, and the latter is more efficient than a vector of iterators (especially on x64, as the indexes can be 32-bit.)
 Iterators are also useful for constructing sub-ranges, which 
 proves useful in the implementation of some algorithms. Writing 
 std::next_permutation in D with ranges is quiet frustrating 
 compared to C++.

 https://github.com/D-Programming-Language/phobos/blob/master/std/algorithm.d#L10901
 http://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a01045_source.html#l03619

 Notice that in D you need to maintain a count of the elements 
 popped off the back and then use takeExactly to retrieve that 
 portion again. In C++ you just move the iterators around and 
 create "ranges" from pairs of iterators as you need them.

 When I implemented nextPermutation in D, I constantly felt as 
 if I was fighting with ranges instead of working with them - I 
 knew exactly what I wanted to do, but ranges only provide a 
 roundabout way of doing it.

Hmm, very interesting. I suppose the problem with C++ iterators is that they are useless without external context: you can't increment or decrement one without also comparing it to begin() or end() in its container, which implies that the caller must manually keep track of which container it came from. Thus, an iterator is hardly an improvement over a plain-old integer index, the only advantages being 1. You can dereference it (*if* you can be sure it doesn't point to end()) 2. Unlike an index, it's compatible with non-random-access data structures But perhaps the iterator concept could be improved by being made self-aware: if the iterator "knew" and could tell you when it was at the beginning or end of its container. This would increase the storage requirement in some circumstances but not others. For example, an iterator to a doubly-linked list node can tell when it is at the beginning or end, but an iterator to a singly-linked list node can only tell when it is at the end. A pointer inside an array may or may not be able to tell if it is at the end depending on how arrays work, e.g. perhaps the way D heap arrays work would allow an array iterator, implemented as a simple pointer, to still know when it is safe to increment and decrement. The simplest possible .NET array iterator is an array reference plus an index, and again the iterator can know when it is at the beginning or end of the array--except that if the iterator came from inside a range, it would be unaware of the range's boundaries, which may be smaller than the array's boundaries.
Jun 01 2013
prev sibling next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Saturday, 1 June 2013 at 16:30:05 UTC, David Piepgrass wrote:
 David Piepgrass:
In fact, most STL algorithms require exactly two iterators--a 
range--and none require only a single iterator<

I think there are some C++ data structures that store many single iterators. If you instead store ranges you double the data amount.

Hashmaps would be the most common example. Usually implemented as a linked list of key-value pairs along with a vector of list iterators.

In theory. But the .NET hashtables are implemented with an *array* of key-value pairs and an array of *indexes*. The former forms a virtual linked list that is more efficient than a traditional linked list, and the latter is more efficient than a vector of iterators (especially on x64, as the indexes can be 32-bit.)

Iterators are usually (but not always) faster, as they don't have an extra level of indirection involved -- the iterators tell you directly where to look in memory, whereas indices are useless without containers (or iterators) backing them. You shouldn't be using 32-bit indices on x64, that defeats the whole point of x64.
 Hmm, very interesting. I suppose the problem with C++ iterators 
 is that they are useless without external context: you can't 
 increment or decrement one without also comparing it to begin() 
 or end() in its container, which implies that the caller must 
 manually keep track of which container it came from.

No, that's exactly the opposite of what happens. The caller just accepts two iterators (like std::unique), so that it has the end iterator as well as the begin iterator.
Jun 01 2013
prev sibling next sibling parent "David Piepgrass" <qwertie256 gmail.com> writes:
 You shouldn't be using 32-bit indices on x64, that defeats the 
 whole
 point of x64.

As of .NET 4.5, 64bit array indexes are supported as well. http://msdn.microsoft.com/en-us/library/hh285054.aspx

Don't forget that we're talking about a *hashtable* here. If a .NET hashtable used 64-bit indexes (or pointers) it would require 8-12 bytes more memory per entry, specifically 32 bytes total, including overhead, if the key and value are 4 bytes each. An in-memory hashtable that requires 64-bit indexes rather than 32 bits would have to contain over 4 billion entries which would take at least 128 GB of RAM, assuming 8 bytes for each key-value pair!!! In fact it's worse than that, as the dictionary grows by size-doubling and contains a certain amount of unused entries at the end. No thanks, I'd rather save those 8 bytes and accept the 4 billion limit, if you don't mind.
Jun 01 2013
prev sibling next sibling parent "renoX" <renozyx gmail.com> writes:
On Saturday, 1 June 2013 at 18:23:31 UTC, Mehrdad wrote:
[cut]
 You shouldn't be using 32-bit indices on x64, that defeats the 
 whole point of x64.

Some would disagree: have a look at x32: http://en.wikipedia.org/wiki/X32_ABI Being able to use efficiently 64 bit doesn't mean that you *have* to use 64bit when 32bit is enough.. renoX
Jun 01 2013
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 01 Jun 2013 23:20:19 +0200
schrieb "renoX" <renozyx gmail.com>:

 On Saturday, 1 June 2013 at 18:23:31 UTC, Mehrdad wrote:
 [cut]
 You shouldn't be using 32-bit indices on x64, that defeats the 
 whole point of x64.

Some would disagree: have a look at x32: http://en.wikipedia.org/wiki/X32_ABI Being able to use efficiently 64 bit doesn't mean that you *have* to use 64bit when 32bit is enough.. renoX

No wait, the whole 'efficiency' point was that on x64 you don't have to waste memory with the native word size for indexes, but can go with 32-bits. So it saves memory. But Mehrdad argued that we have x64 to be able to address beyond 2^32 (bytes, items, objects). And 32-bit indexes in generic code are thus asking for overflow bugs on x64 one day. x32 adds nothing to the discussion as you'd always use 32-bit indexes there. There is no point in storing more than 2^32 items when the address space is 32-bit. (Exceptions: bit arrays, file offsets, etc.) -- Marco
Jun 01 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 01 Jun 2013 04:11:59 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 David Piepgrass:

 In fact, most STL algorithms require exactly two iterators--a  
 range--and none require only a single iterator<

I think there are some C++ data structures that store many single iterators. If you instead store ranges you double the data amount.

This is true. In dcollections, I have the concept of cursors. These are 1 or 0 element ranges (they are 0 after the first popFront). For almost all cases, they require an extra boolean to designate the "empty" state. So while not consuming 2x, it's more like 1.5x-ish. Because of alignment, it pretty much requires 2x. There have been suggestions on utilizing perhaps unused bits in the pointer to designate the empty flag, but I'm a bit aprehensive on that, especially if the node is GC-managed. But besides the space requirements the *concept* of a single-element pointer is very much needed. And cursors fill that role. As an example, std.algorithm.find consumes a range until it finds the specific value. It then returns a range of the remaining data. But what if you wanted the range of the data UNTIL the first value? In std.container, we have three functions to deal with this, upperBound, lowerBound, and equalRange. But even with these, it can be difficult to construct intersections of these two ranges. In dcollections, we have one function, find, which returns a cursor. Then using the cursor, you can construct the range you need: auto r = mymap[mymap.find(5)..$]; // opDollar not supported yet, but will be. auto rbefore = mymap[mymap.begin()..mymap.find(5)]; All of this is safe and correct, and I think it reads rather well. There are other good places where single-element ranges are useful. It is important to note that a cursor is NOT equivalent to a range of one element. Such a range in a node-based container has two node pointers. A cursor MUST only point at exactly one element. This is important when considering cursor invalidation -- removing an element unreferenced by a cursor should not invalidate that cursor (depending on how the container is structured). For example, adding a new value to a hash set, or removing an unrelated value from a hash set may invalidate all ranges, but not cursors. -Steve
Jun 03 2013
prev sibling parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 05/31/2013 11:17 PM, David Piepgrass wrote:
 I'm adding D-style ranges to my new C# collections library. In case
 anyone would like to comment, please see here:

 http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

(I wrote the following comment on the blog page but it is not visible at this time.) Thank you for writing this. I think the range article that you mentioned is Andrei Alexandrescu's "On Iteration": http://www.informit.com/articles/article.aspx?p=1407357 Ali
Jun 08 2013