digitalmars.D - D Ranges in C#

David Piepgrass (3/3) May 31 2013 I'm adding D-style ranges to my new C# collections library. In

bearophile (16/23) Jun 01 2013 Ranges are an improvement over iterators about as much as oxygen

Peter Alexander (18/24) Jun 01 2013 Hashmaps would be the most common example. Usually implemented as

David Piepgrass (33/57) Jun 01 2013 In theory. But the .NET hashtables are implemented with an

Mehrdad (10/32) Jun 01 2013 Iterators are usually (but not always) faster, as they don't have

Paulo Pinto (3/25) Jun 01 2013 As of .NET 4.5, 64bit array indexes are supported as well.

David Piepgrass (12/17) Jun 01 2013 Don't forget that we're talking about a *hashtable* here. If a

renoX (7/9) Jun 01 2013 Some would disagree: have a look at x32:

Marco Leise (14/25) Jun 01 2013 No wait, the whole 'efficiency' point was that on x64 you

Steven Schveighoffer (33/38) Jun 03 2013 This is true. In dcollections, I have the concept of cursors. These ar...

Andrei Alexandrescu (3/6) Jun 01 2013 http://www.reddit.com/r/programming/comments/1fgnyl/dstyle_ranges_in_c_n...
Szymon Gatner (5/8) Jun 01 2013 Lol I am just preparing a blog post on D ranges in C++, a piece
=?UTF-8?B?QWxpIMOHZWhyZWxp?= (8/11) Jun 08 2013 (I wrote the following comment on the blog page but it is not visible at...

"David Piepgrass" <qwertie256 gmail.com> writes:


case anyone would like to comment, please see here:

http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

May 31 2013

"bearophile" <bearophileHUGS lycos.com> writes:

David Piepgrass:

 http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

 From the article:

Ranges are an improvement on the C++ concept of iterators. I 
don't exactly know how ranges were invented in D, but perhaps 
someone noticed that most of the C++ STL algorithms require 
pairs of iterators:<

Ranges are an improvement over iterators about as much as oxygen 
molecules are an "improvement" over oxygen atoms. You breath 
mostly O2, and it allows you to live, but they are made of atomic 
oxygen. Atomic oxygen is more fundamental, it is here to stay (as 
Alexander Stepanov says), and it has "capabilities" that 
molecules don't have. In your normal life you can think to use 
only molecular oxygen, but sometimes you can even desire some 
ozone :-) And in chemistry reactions that happen inside you all 
the time, like oxidation, the oxygen molecules usually break 
apart to react.


In fact, most STL algorithms require exactly two iterators--a 
range--and none require only a single iterator<

I think there are some C++ data structures that store many single 
iterators. If you instead store ranges you double the data amount.

Bye,
bearophile

Jun 01 2013

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Saturday, 1 June 2013 at 08:12:00 UTC, bearophile wrote:
 David Piepgrass:
In fact, most STL algorithms require exactly two iterators--a 
range--and none require only a single iterator<

 I think there are some C++ data structures that store many 
 single iterators. If you instead store ranges you double the 
 data amount.

Hashmaps would be the most common example. Usually implemented as 
a linked list of key-value pairs along with a vector of list 
iterators.

Iterators are also useful for constructing sub-ranges, which 
proves useful in the implementation of some algorithms. Writing 
std::next_permutation in D with ranges is quiet frustrating 
compared to C++.

https://github.com/D-Programming-Language/phobos/blob/master/std/algorithm.d#L10901
http://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a01045_source.html#l03619

Notice that in D you need to maintain a count of the elements 
popped off the back and then use takeExactly to retrieve that 
portion again. In C++ you just move the iterators around and 
create "ranges" from pairs of iterators as you need them.

When I implemented nextPermutation in D, I constantly felt as if 
I was fighting with ranges instead of working with them - I knew 
exactly what I wanted to do, but ranges only provide a roundabout 
way of doing it.

Jun 01 2013

"David Piepgrass" <qwertie256 gmail.com> writes:

 David Piepgrass:
In fact, most STL algorithms require exactly two iterators--a 
range--and none require only a single iterator<

 I think there are some C++ data structures that store many 
 single iterators. If you instead store ranges you double the 
 data amount.

 Hashmaps would be the most common example. Usually implemented 
 as a linked list of key-value pairs along with a vector of list 
 iterators.

In theory. But the .NET hashtables are implemented with an 
*array* of key-value pairs and an array of *indexes*. The former 
forms a virtual linked list that is more efficient than a 
traditional linked list, and the latter is more efficient than a 
vector of iterators (especially on x64, as the indexes can be 
32-bit.)

 Iterators are also useful for constructing sub-ranges, which 
 proves useful in the implementation of some algorithms. Writing 
 std::next_permutation in D with ranges is quiet frustrating 
 compared to C++.

 https://github.com/D-Programming-Language/phobos/blob/master/std/algorithm.d#L10901
 http://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a01045_source.html#l03619

 Notice that in D you need to maintain a count of the elements 
 popped off the back and then use takeExactly to retrieve that 
 portion again. In C++ you just move the iterators around and 
 create "ranges" from pairs of iterators as you need them.

 When I implemented nextPermutation in D, I constantly felt as 
 if I was fighting with ranges instead of working with them - I 
 knew exactly what I wanted to do, but ranges only provide a 
 roundabout way of doing it.

Hmm, very interesting. I suppose the problem with C++ iterators 
is that they are useless without external context: you can't 
increment or decrement one without also comparing it to begin() 
or end() in its container, which implies that the caller must 
manually keep track of which container it came from. Thus, an 
iterator is hardly an improvement over a plain-old integer index, 
the only advantages being

1. You can dereference it (*if* you can be sure it doesn't point 
to end())
2. Unlike an index, it's compatible with non-random-access data 
structures

But perhaps the iterator concept could be improved by being made 
self-aware: if the iterator "knew" and could tell you when it was 
at the beginning or end of its container. This would increase the 
storage requirement in some circumstances but not others. For 
example, an iterator to a doubly-linked list node can tell when 
it is at the beginning or end, but an iterator to a singly-linked 
list node can only tell when it is at the end. A pointer inside 
an array may or may not be able to tell if it is at the end 
depending on how arrays work, e.g. perhaps the way D heap arrays 
work would allow an array iterator, implemented as a simple 
pointer, to still know when it is safe to increment and decrement.

The simplest possible .NET array iterator is an array reference 
plus an index, and again the iterator can know when it is at the 
beginning or end of the array--except that if the iterator came 
from inside a range, it would be unaware of the range's 
boundaries, which may be smaller than the array's boundaries.

Jun 01 2013

"Mehrdad" <wfunction hotmail.com> writes:

On Saturday, 1 June 2013 at 16:30:05 UTC, David Piepgrass wrote:
 David Piepgrass:
In fact, most STL algorithms require exactly two iterators--a 
range--and none require only a single iterator<

 I think there are some C++ data structures that store many 
 single iterators. If you instead store ranges you double the 
 data amount.

 Hashmaps would be the most common example. Usually implemented 
 as a linked list of key-value pairs along with a vector of 
 list iterators.

 In theory. But the .NET hashtables are implemented with an 
 *array* of key-value pairs and an array of *indexes*. The 
 former forms a virtual linked list that is more efficient than 
 a traditional linked list, and the latter is more efficient 
 than a vector of iterators (especially on x64, as the indexes 
 can be 32-bit.)



Iterators are usually (but not always) faster, as they don't have 
an extra level of indirection involved -- the iterators tell you 
directly where to look in memory, whereas indices are useless 
without containers (or iterators) backing them.


You shouldn't be using 32-bit indices on x64, that defeats the 
whole point of x64.


 Hmm, very interesting. I suppose the problem with C++ iterators 
 is that they are useless without external context: you can't 
 increment or decrement one without also comparing it to begin() 
 or end() in its container, which implies that the caller must 
 manually keep track of which container it came from.


No, that's exactly the opposite of what happens.

The caller just accepts two iterators (like std::unique), so that 
it has the end iterator as well as the begin iterator.

Jun 01 2013

Paulo Pinto <pjmlp progtools.org> writes:

Am 01.06.2013 20:23, schrieb Mehrdad:
 On Saturday, 1 June 2013 at 16:30:05 UTC, David Piepgrass wrote:
 David Piepgrass:
 In fact, most STL algorithms require exactly two iterators--a
 range--and none require only a single iterator<

 I think there are some C++ data structures that store many single
 iterators. If you instead store ranges you double the data amount.

 Hashmaps would be the most common example. Usually implemented as a
 linked list of key-value pairs along with a vector of list iterators.

 In theory. But the .NET hashtables are implemented with an *array* of
 key-value pairs and an array of *indexes*. The former forms a virtual
 linked list that is more efficient than a traditional linked list, and
 the latter is more efficient than a vector of iterators (especially on
 x64, as the indexes can be 32-bit.)



 Iterators are usually (but not always) faster, as they don't have an
 extra level of indirection involved -- the iterators tell you directly
 where to look in memory, whereas indices are useless without containers
 (or iterators) backing them.


 You shouldn't be using 32-bit indices on x64, that defeats the whole
 point of x64.

As of .NET 4.5, 64bit array indexes are supported as well.

http://msdn.microsoft.com/en-us/library/hh285054.aspx

Jun 01 2013

"David Piepgrass" <qwertie256 gmail.com> writes:

 You shouldn't be using 32-bit indices on x64, that defeats the 
 whole
 point of x64.

 As of .NET 4.5, 64bit array indexes are supported as well.

 http://msdn.microsoft.com/en-us/library/hh285054.aspx

Don't forget that we're talking about a *hashtable* here. If a 
.NET hashtable used 64-bit indexes (or pointers) it would require 
8-12 bytes more memory per entry, specifically 32 bytes total, 
including overhead, if the key and value are 4 bytes each.

An in-memory hashtable that requires 64-bit indexes rather than 
32 bits would have to contain over 4 billion entries which would 
take at least 128 GB of RAM, assuming 8 bytes for each key-value 
pair!!! In fact it's worse than that, as the dictionary grows by 
size-doubling and contains a certain amount of unused entries at 
the end.

No thanks, I'd rather save those 8 bytes and accept the 4 billion 
limit, if you don't mind.

Jun 01 2013

"renoX" <renozyx gmail.com> writes:

On Saturday, 1 June 2013 at 18:23:31 UTC, Mehrdad wrote:
[cut]
 You shouldn't be using 32-bit indices on x64, that defeats the 
 whole point of x64.

Some would disagree: have a look at x32: 
http://en.wikipedia.org/wiki/X32_ABI
Being able to use efficiently 64 bit doesn't mean that you *have* 
to use 64bit when 32bit is enough..
renoX

Jun 01 2013

Marco Leise <Marco.Leise gmx.de> writes:

Am Sat, 01 Jun 2013 23:20:19 +0200
schrieb "renoX" <renozyx gmail.com>:

 On Saturday, 1 June 2013 at 18:23:31 UTC, Mehrdad wrote:
 [cut]
 You shouldn't be using 32-bit indices on x64, that defeats the 
 whole point of x64.

 
 Some would disagree: have a look at x32: 
 http://en.wikipedia.org/wiki/X32_ABI
 Being able to use efficiently 64 bit doesn't mean that you *have* 
 to use 64bit when 32bit is enough..
 renoX

No wait, the whole 'efficiency' point was that on x64 you
don't have to waste memory with the native word size for
indexes, but can go with 32-bits. So it saves memory.

But Mehrdad argued that we have x64 to be able to address
beyond 2^32 (bytes, items, objects). And 32-bit indexes in
generic code are thus asking for overflow bugs on x64 one day.

x32 adds nothing to the discussion as you'd always use 32-bit
indexes there. There is no point in storing more than 2^32
items when the address space is 32-bit. (Exceptions: bit
arrays, file offsets, etc.)

-- 
Marco

Jun 01 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 01 Jun 2013 04:11:59 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 David Piepgrass:

 In fact, most STL algorithms require exactly two iterators--a  
 range--and none require only a single iterator<

 I think there are some C++ data structures that store many single  
 iterators. If you instead store ranges you double the data amount.

This is true.  In dcollections, I have the concept of cursors.  These are  
1 or 0 element ranges (they are 0 after the first popFront).  For almost  
all cases, they require an extra boolean to designate the "empty" state.

So while not consuming 2x, it's more like 1.5x-ish.  Because of alignment,  
it pretty much requires 2x.  There have been suggestions on utilizing  
perhaps unused bits in the pointer to designate the empty flag, but I'm a  
bit aprehensive on that, especially if the node is GC-managed.

But besides the space requirements the *concept* of a single-element  
pointer is very much needed.  And cursors fill that role.

As an example, std.algorithm.find consumes a range until it finds the  
specific value.  It then returns a range of the remaining data.  But what  
if you wanted the range of the data UNTIL the first value?

In std.container, we have three functions to deal with this, upperBound,  
lowerBound, and equalRange.  But even with these, it can be difficult to  
construct intersections of these two ranges.

In dcollections, we have one function, find, which returns a cursor.  Then  
using the cursor, you can construct the range you need:

auto r = mymap[mymap.find(5)..$]; // opDollar not supported yet, but will  
be.

auto rbefore = mymap[mymap.begin()..mymap.find(5)];

All of this is safe and correct, and I think it reads rather well.

There are other good places where single-element ranges are useful.  It is  
important to note that a cursor is NOT equivalent to a range of one  
element.  Such a range in a node-based container has two node pointers.  A  
cursor MUST only point at exactly one element.  This is important when  
considering cursor invalidation -- removing an element unreferenced by a  
cursor should not invalidate that cursor (depending on how the container  
is structured).  For example, adding a new value to a hash set, or  
removing an unrelated value from a hash set may invalidate all ranges, but  
not cursors.

-Steve

Jun 03 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 6/1/13 2:17 AM, David Piepgrass wrote:

 anyone would like to comment, please see here:

 http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

http://www.reddit.com/r/programming/comments/1fgnyl/dstyle_ranges_in_c_net/

Andrei

Jun 01 2013

"Szymon Gatner" <noemail gmail.com> writes:

On Saturday, 1 June 2013 at 06:17:30 UTC, David Piepgrass wrote:

 case anyone would like to comment, please see here:

 http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

Lol I am just preparing a blog post on D ranges in C++, a piece 
of code I did after Andrei's "Iterators must Go" presentation. It 
was not super usable before C++11 and auto but now is pretty cool 
thing to have.

Jun 01 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 05/31/2013 11:17 PM, David Piepgrass wrote:

 anyone would like to comment, please see here:

 http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

(I wrote the following comment on the blog page but it is not visible at 
this time.)

Thank you for writing this.

I think the range article that you mentioned is Andrei Alexandrescu's 
"On Iteration":

   http://www.informit.com/articles/article.aspx?p=1407357

Ali

Jun 08 2013

D Programming

C/C++ Programming

Other

digitalmars.D - D Ranges in C#