www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - RangeExtra

reply dsimcha <dsimcha yahoo.com> writes:
I'm loving the new Phobos, and have been playing around with ranges, alias
this, etc.  I've compiled a list of a few range types that I have at least
partial implementations of for personal use that were overlooked by Andrei, et
al. in the initial release of the new Phobos.  Please tell me which of these
are generally useful and should be cleaned up and released, which ones are too
niche to belong in Phobos, which ones std.range can already do by some
slightly non-obvious idiom, and which ones, if any, are already in the works.

CapacityArray:  A range that covers a builtin array and gives it a capacity
field.  All operations that don't have to do with appending are forwarded to
the underlying array via alias this (modulo a few compiler bugs), meaning that
a CapacityArray has an identical compile time interface to a builtin array,
and is implicitly convertible to one.

Reindex:  Takes a zero-indexed random access range and converts it to a range
with an arbitrary base index.  Useful for one-indexed arrays to fit
mathematical notation conventions, etc.

Comb:  Given a random-access range, iterate over all unordered pairs, triples,
etc. of elements in the underlying range.  front(), opIndex(), etc. would
return tuples or structs containing static arrays.  Note that to do this
efficiently, the number of elements (pair, triplet, etc.) would have to be
known at compile time.

Joint:  Kind of like std.range.Zip, but returns a tuple of values of the
elements in the underlying ranges instead of a proxy of pointers to the
elements of the underlying range.  Zip is great for what it's designed for,
but sometimes the pointer/reference semantics are problematic.

Rotated:  Gives a rotated view of a random access range without modifying the
underlying range.  Also has a put() method.  Given an underlying range of
length L, this provides an efficient way to remember the last L elements seen
by the range:

auto underlying = [1,2,3];
auto r = rotated(underlying, 1);

foreach(elem; r) {  // Prints "2  3  1".
    write(elem, "  ");
}

r.put(5);  // O(1).
r.put(4);
r.put(3);
r.put(2);
r.put(1);
foreach(elem; r) {  // Prints "3  2  1".
    write(elem, "  ");
}
Apr 23 2009
next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 23 Apr 2009 22:12:03 +0400, dsimcha <dsimcha yahoo.com> wrote:

 I'm loving the new Phobos, and have been playing around with ranges,  
 alias
 this, etc.  I've compiled a list of a few range types that I have at  
 least
 partial implementations of for personal use that were overlooked by  
 Andrei, et
 al. in the initial release of the new Phobos.  Please tell me which of  
 these
 are generally useful and should be cleaned up and released, which ones  
 are too
 niche to belong in Phobos, which ones std.range can already do by some
 slightly non-obvious idiom, and which ones, if any, are already in the  
 works.

 CapacityArray:  A range that covers a builtin array and gives it a  
 capacity
 field.  All operations that don't have to do with appending are  
 forwarded to
 the underlying array via alias this (modulo a few compiler bugs),  
 meaning that
 a CapacityArray has an identical compile time interface to a builtin  
 array,
 and is implicitly convertible to one.

 Reindex:  Takes a zero-indexed random access range and converts it to a  
 range
 with an arbitrary base index.  Useful for one-indexed arrays to fit
 mathematical notation conventions, etc.

 Comb:  Given a random-access range, iterate over all unordered pairs,  
 triples,
 etc. of elements in the underlying range.  front(), opIndex(), etc. would
 return tuples or structs containing static arrays.  Note that to do this
 efficiently, the number of elements (pair, triplet, etc.) would have to  
 be
 known at compile time.

 Joint:  Kind of like std.range.Zip, but returns a tuple of values of the
 elements in the underlying ranges instead of a proxy of pointers to the
 elements of the underlying range.  Zip is great for what it's designed  
 for,
 but sometimes the pointer/reference semantics are problematic.

 Rotated:  Gives a rotated view of a random access range without  
 modifying the
 underlying range.  Also has a put() method.  Given an underlying range of
 length L, this provides an efficient way to remember the last L elements  
 seen
 by the range:

 auto underlying = [1,2,3];
 auto r = rotated(underlying, 1);

 foreach(elem; r) {  // Prints "2  3  1".
     write(elem, "  ");
 }

 r.put(5);  // O(1).
 r.put(4);
 r.put(3);
 r.put(2);
 r.put(1);
 foreach(elem; r) {  // Prints "3  2  1".
     write(elem, "  ");
 }

Great! Personally, I'd love to use CapacityArray, others are pretty cool, too.
Apr 23 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 I'm loving the new Phobos, and have been playing around with ranges, alias
 this, etc.  I've compiled a list of a few range types that I have at least
 partial implementations of for personal use that were overlooked by Andrei, et
 al. in the initial release of the new Phobos.  Please tell me which of these
 are generally useful and should be cleaned up and released, which ones are too
 niche to belong in Phobos, which ones std.range can already do by some
 slightly non-obvious idiom, and which ones, if any, are already in the works.

I think these are fantastic. Some comments within.
 CapacityArray:  A range that covers a builtin array and gives it a capacity
 field.  All operations that don't have to do with appending are forwarded to
 the underlying array via alias this (modulo a few compiler bugs), meaning that
 a CapacityArray has an identical compile time interface to a builtin array,
 and is implicitly convertible to one.

Upon more thinking, I think we'll have to bit the bullet and define Array!T and Slice!T. Then dmd rewrites: [ a, b, c ] -> .Array!(typeof(a))(a, b, c) T[new] -> .Array!(T) new T[n] -> .Array!(T)(n) T[] -> .Slice!(T) It seems increasingly clear that slices alone can't quite cut the mustard. So your CapacityArray will essentially be the basis of Array.
 Reindex:  Takes a zero-indexed random access range and converts it to a range
 with an arbitrary base index.  Useful for one-indexed arrays to fit
 mathematical notation conventions, etc.

Hm. I'd take the opportunity to actually reindex the range in an arbitrary way, e.g. by computing a separate index for it and using it transparently. I see little utility in just changing base (I tried such based_vector code for C++ and it created more confusion than it removed.)
 Comb:  Given a random-access range, iterate over all unordered pairs, triples,
 etc. of elements in the underlying range.  front(), opIndex(), etc. would
 return tuples or structs containing static arrays.  Note that to do this
 efficiently, the number of elements (pair, triplet, etc.) would have to be
 known at compile time.

Sounds cool.
 Joint:  Kind of like std.range.Zip, but returns a tuple of values of the
 elements in the underlying ranges instead of a proxy of pointers to the
 elements of the underlying range.  Zip is great for what it's designed for,
 but sometimes the pointer/reference semantics are problematic.

Joint is a risque name, but the idea is good. My preference would be to look into using alias this with Zip for a seamless experience though. After all, once you have a seamless Zip, creating copies out of it is the easy part. Besides, Joint can only make for an input range as people can't save references returned by front(). Zip can be as strong a range as possible.
 Rotated:  Gives a rotated view of a random access range without modifying the
 underlying range.  Also has a put() method.  Given an underlying range of
 length L, this provides an efficient way to remember the last L elements seen
 by the range:

I think that's very related to Cycle. http://www.digitalmars.com/d/2.0/phobos/std_range.html#Cycle Keep the great ideas coming! Andrei
Apr 23 2009
next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 23 Apr 2009 22:37:37 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 dsimcha wrote:
 I'm loving the new Phobos, and have been playing around with ranges,  
 alias
 this, etc.  I've compiled a list of a few range types that I have at  
 least
 partial implementations of for personal use that were overlooked by  
 Andrei, et
 al. in the initial release of the new Phobos.  Please tell me which of  
 these
 are generally useful and should be cleaned up and released, which ones  
 are too
 niche to belong in Phobos, which ones std.range can already do by some
 slightly non-obvious idiom, and which ones, if any, are already in the  
 works.

I think these are fantastic. Some comments within.
 CapacityArray:  A range that covers a builtin array and gives it a  
 capacity
 field.  All operations that don't have to do with appending are  
 forwarded to
 the underlying array via alias this (modulo a few compiler bugs),  
 meaning that
 a CapacityArray has an identical compile time interface to a builtin  
 array,
 and is implicitly convertible to one.

Upon more thinking, I think we'll have to bit the bullet and define Array!T and Slice!T. Then dmd rewrites: [ a, b, c ] -> .Array!(typeof(a))(a, b, c) T[new] -> .Array!(T) new T[n] -> .Array!(T)(n) T[] -> .Slice!(T) It seems increasingly clear that slices alone can't quite cut the mustard. So your CapacityArray will essentially be the basis of Array.

Now that there is a consideration about putting built-in types into a library... ... how about introducing struct MaybeNull(T) { ... } among them (and NonNull-by-default)? Oh, and Object.d deserves a better name, IMHO.
Apr 23 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Thu, 23 Apr 2009 22:37:37 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Now that there is a consideration about putting built-in types into a
library...
 
 ... how about introducing struct MaybeNull(T) { ... } among them (and
NonNull-by-default)?

Maybe that could be done by manipulating the implicit constructor of Ref. Andrei
Apr 23 2009
prev sibling next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Reindex:  Takes a zero-indexed random access range and converts it to a range
 with an arbitrary base index.  Useful for one-indexed arrays to fit
 mathematical notation conventions, etc.

arbitrary way, e.g. by computing a separate index for it and using it transparently. I see little utility in just changing base (I tried such based_vector code for C++ and it created more confusion than it removed.)

So it would take something like a lambda function as a template alias parameter to map an index in whatever space you want onto normal zero-indexing? This sounds like a very good idea: Array with only even indices: uint[] foo = [0,2,4,6,8,10]; auto bar = reindex!("assert(a & 1 == 0); a / 2")(foo); assert(bar[2] == 2); assert(bar[10] == 10); One-indexing: foo = [1,2,3,4,5]; auto baz = reindex!("a - 1")(foo); assert(foo[1] == 1); assert(foo[5] == 5); Array indexed by powers of two: foo = [1,2,4,8,16,32,64]; auto pow2 = reindex!((a) { return cast(size_t) log2(a); })(foo); assert(foo[32] == 32); assert(foo[4] == 4);
 Joint:  Kind of like std.range.Zip, but returns a tuple of values of the
 elements in the underlying ranges instead of a proxy of pointers to the
 elements of the underlying range.  Zip is great for what it's designed for,
 but sometimes the pointer/reference semantics are problematic.


This idea grew out of dstats.infotheory and joint probability distributions.
 Rotated:  Gives a rotated view of a random access range without modifying the
 underlying range.  Also has a put() method.  Given an underlying range of
 length L, this provides an efficient way to remember the last L elements seen
 by the range:

http://www.digitalmars.com/d/2.0/phobos/std_range.html#Cycle Keep the great ideas coming!

True, I didn't see the index parameter in cycle when I initially looked at it. This one could go either way. On the one hand, avoiding bloat and having a few very general primitives is usually a good thing. On the other hand, if you use cycle and take, you're implementing someting fairly simple on top of something much more complex, and you can't have the put() function that makes the range track the last L items written to it in order with O(1) put() calls, only one memory allocation over the object's lifetime and minimal space overhead, which was my original motivation for Rotate.
Apr 23 2009
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 23 Apr 2009 14:37:37 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 dsimcha wrote:
 CapacityArray:  A range that covers a builtin array and gives it a  
 capacity
 field.  All operations that don't have to do with appending are  
 forwarded to
 the underlying array via alias this (modulo a few compiler bugs),  
 meaning that
 a CapacityArray has an identical compile time interface to a builtin  
 array,
 and is implicitly convertible to one.

Upon more thinking, I think we'll have to bit the bullet and define Array!T and Slice!T. Then dmd rewrites: [ a, b, c ] -> .Array!(typeof(a))(a, b, c) T[new] -> .Array!(T) new T[n] -> .Array!(T)(n) T[] -> .Slice!(T) It seems increasingly clear that slices alone can't quite cut the mustard. So your CapacityArray will essentially be the basis of Array.

No. The capacity field hack only works with the current free-list based mark-sweep GC. In every other type of GC, the capacity is dynamic and can not be cached in the array itself. Introducing a language feature that is specific to an implementation seems like a bad idea.
Apr 23 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Thu, 23 Apr 2009 14:37:37 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 dsimcha wrote:
 CapacityArray:  A range that covers a builtin array and gives it a 
 capacity
 field.  All operations that don't have to do with appending are 
 forwarded to
 the underlying array via alias this (modulo a few compiler bugs), 
 meaning that
 a CapacityArray has an identical compile time interface to a builtin 
 array,
 and is implicitly convertible to one.

Upon more thinking, I think we'll have to bit the bullet and define Array!T and Slice!T. Then dmd rewrites: [ a, b, c ] -> .Array!(typeof(a))(a, b, c) T[new] -> .Array!(T) new T[n] -> .Array!(T)(n) T[] -> .Slice!(T) It seems increasingly clear that slices alone can't quite cut the mustard. So your CapacityArray will essentially be the basis of Array.

No. The capacity field hack only works with the current free-list based mark-sweep GC. In every other type of GC, the capacity is dynamic and can not be cached in the array itself. Introducing a language feature that is specific to an implementation seems like a bad idea.

1. STL does the same thing so it's likely a more general issue than one might think. 2. This is not a language feature. Different implementations can take different routes. Andrei
Apr 23 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 23 Apr 2009 19:00:10 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Thu, 23 Apr 2009 14:37:37 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 dsimcha wrote:
 CapacityArray:  A range that covers a builtin array and gives it a  
 capacity
 field.  All operations that don't have to do with appending are  
 forwarded to
 the underlying array via alias this (modulo a few compiler bugs),  
 meaning that
 a CapacityArray has an identical compile time interface to a builtin  
 array,
 and is implicitly convertible to one.

Upon more thinking, I think we'll have to bit the bullet and define Array!T and Slice!T. Then dmd rewrites: [ a, b, c ] -> .Array!(typeof(a))(a, b, c) T[new] -> .Array!(T) new T[n] -> .Array!(T)(n) T[] -> .Slice!(T) It seems increasingly clear that slices alone can't quite cut the mustard. So your CapacityArray will essentially be the basis of Array.

based mark-sweep GC. In every other type of GC, the capacity is dynamic and can not be cached in the array itself. Introducing a language feature that is specific to an implementation seems like a bad idea.

1. STL does the same thing so it's likely a more general issue than one might think.

1) Malloc generally uses a free-list based allocation scheme, so no, its not more general. 2) The STL is a library, not a language feature.
 2. This is not a language feature. Different implementations can take  
 different routes.

Creating seperate types for slices and arrays is a language feature. P.S. I think having array builders in Phobos is a good idea, I just don't think that they warrant a language change.
Apr 23 2009