www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - On "A New Collections Framework for the Standard Library"

reply Jack Stouffer <jack jackstouffer.com> writes:
I just got around to watching Eduard Staniloiu's talk at DConf 
[1] about the collections library he was working on. One thing 
seemed odd, in that Eduard seems to be saying that the container 
and the range over the container's elements are one in the same 
and the range primitives are on the container itself.

Is this accurate?

If so, then this is completely different than the currently 
accepted design of ranges and containers. Currently, a container 
allows the removal and insertion of elements, and to get a range 
of the elements you have to call a function or slice the 
container. This is how std.containers and Brain's EMSI container 
library. In this new library, what happens when I iterate the 
elements of one of these containers using foreach? Do the range 
primitives empty the container? How do I ask for a new, full 
range of my data in the collection to use in a range algorithm 
after I've already iterated over the elements?

As Andrei originally laid out in this talk introducing ranges 
[2], ranges only shrink and never grow. Put another way, Output 
Ranges and Input Ranges are two separate things. Andrei states 
durning that talk (at 1:20:20)

"Can ranges ever grow? ... No, never, because it would be 
fundamentally unsafe."

I'm not sure what Andrei meant by this, but he believes that 
growing ranges is also a bad idea.

[1] https://www.youtube.com/watch?v=k88QSIC-Na0
[2] 
https://archive.org/details/AndreiAlexandrescuKeynoteBoostcon2009
May 18 2017
next sibling parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, May 18, 2017 15:18:00 Jack Stouffer via Digitalmars-d wrote:
 I just got around to watching Eduard Staniloiu's talk at DConf
 [1] about the collections library he was working on. One thing
 seemed odd, in that Eduard seems to be saying that the container
 and the range over the container's elements are one in the same
 and the range primitives are on the container itself.

 Is this accurate?

 If so, then this is completely different than the currently
 accepted design of ranges and containers. Currently, a container
 allows the removal and insertion of elements, and to get a range
 of the elements you have to call a function or slice the
 container. This is how std.containers and Brain's EMSI container
 library. In this new library, what happens when I iterate the
 elements of one of these containers using foreach? Do the range
 primitives empty the container? How do I ask for a new, full
 range of my data in the collection to use in a range algorithm
 after I've already iterated over the elements?

 As Andrei originally laid out in this talk introducing ranges
 [2], ranges only shrink and never grow. Put another way, Output
 Ranges and Input Ranges are two separate things. Andrei states
 durning that talk (at 1:20:20)

 "Can ranges ever grow? ... No, never, because it would be
 fundamentally unsafe."

 I'm not sure what Andrei meant by this, but he believes that
 growing ranges is also a bad idea.

 [1] https://www.youtube.com/watch?v=k88QSIC-Na0
 [2]
 https://archive.org/details/AndreiAlexandrescuKeynoteBoostcon2009
That point concerned me as well. Dynamic arrays in D are very strange beasts indeed, and while it works for them to function as both ranges and (sort of) containers, it's also created a fair bit of confusion, and it really a fair bit of what is done with dynamic arrays are _not_ range-based functions (e.g. appending), making the whole situation that much more confusing when range-based functions and array-specific functions are mixed (which is definitely going to happen in stuff like string code). And a container as a range really makes no sense to me. It makes slightly more sense than an iterator being a container (since it is a range of elements not just one), but it still goes against how containers normally work. A big part of what's different with dynamic arrays is that they don't actually own or manage their own memory - rather they refer to memory from elsewhere, and how that memory is managed depends on where that memory comes from. Yes, after appending, it can be guaranteed to be the GC, but even then, it's the GC that manages what goes on with the elements. No one dynamic array owns them, since you can have multiple of them referring to the same memory. So, they're quite schizophrenic. Now, I could see it working to treat a container as a range if you're talking about _immutable_ containers - especially if their memory is owned by the GC (and immutable containers are definitely part of what's being worked on) - since you don't really add to those (at most, you create a new container with the same elements plus more), but for mutable containers, that just doesn't seem right. So, yes, that bit about modeling containers after dynamic arrays greatly concerns me, but I figured that I'd wait and see the actual API before complaining about it. And it may very well turn out that what they're doing actually makes a lot of sense once you actually see the details. I don't know. - Jonathan M Davis
May 18 2017
parent Jack Stouffer <jack jackstouffer.com> writes:
On Thursday, 18 May 2017 at 15:38:39 UTC, Jonathan M Davis wrote:
 That point concerned me as well. Dynamic arrays in D are very 
 strange beasts indeed, and while it works for them to function 
 as both ranges and (sort of) containers, it's also created a 
 fair bit of confusion, and it really a fair bit of what is done 
 with dynamic arrays are _not_ range-based functions (e.g. 
 appending), making the whole situation that much more confusing 
 when range-based functions and array-specific functions are 
 mixed (which is definitely going to happen in stuff like string 
 code).
Yes, adding in the free function versions of the range primitives in std.range, while convenient initially, seems to have had large negative effects down the line.
May 18 2017
prev sibling next sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Thursday, 18 May 2017 at 15:18:00 UTC, Jack Stouffer wrote:
 ...
Also, shared allocators raise another problem entirely. Let's say for the sake of clarity that these future containers will use a separate range type. Let's say you have a container with five elements, and then you give a range of those elements to a function that caches the length somewhere because it's available (very common as ranges have never been expected to change before). What happens when you're iterating over the range and then another thread calls removeBack before the loop is completed? All sorts of functions in Phobos will blow up in interesting and different ways. I think you'd have to make all ranges created from mutable containers input ranges with no extra primitives in order to avoid this problem. Plus, you also have a problem with things like SList where removeFront can be called after a different thread creates a range. All of a sudden your range `front` is pointing to invalid data. So for that, you need to either 1. Make a thread safe way to invalidate all old ranges (possibly by marking them empty immediately) 2. Allow the program to blow-up and mark all remove functions as system
May 18 2017
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/18/2017 02:17 PM, Jack Stouffer wrote:
 On Thursday, 18 May 2017 at 15:18:00 UTC, Jack Stouffer wrote:
 ...
Also, shared allocators raise another problem entirely. Let's say for the sake of clarity that these future containers will use a separate range type. Let's say you have a container with five elements, and then you give a range of those elements to a function that caches the length somewhere because it's available (very common as ranges have never been expected to change before). What happens when you're iterating over the range and then another thread calls removeBack before the loop is completed? All sorts of functions in Phobos will blow up in interesting and different ways.
If two or more threads have access to a container, that would be shared. Many traditional containers aren't usable gainfully as shared, so we plan to implement a few specialized ones. Anyhow, for now you won't be able to use containers that are shared. -- Andrei
May 18 2017
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/18/2017 11:18 AM, Jack Stouffer wrote:
 I just got around to watching Eduard Staniloiu's talk at DConf [1]
 about the collections library he was working on. One thing seemed
 odd, in that Eduard seems to be saying that the container and the
 range over the container's elements are one in the same and the
 range primitives are on the container itself.

 Is this accurate?
That is correct. The fundamental equation is: Ranges + Optional Primitives = Containers
 If so, then this is completely different than the currently accepted
  design of ranges and containers. Currently, a container allows the
 removal and insertion of elements, and to get a range of the elements
 you have to call a function or slice the container. This is how
 std.containers and Brain's EMSI container library. In this new
 library, what happens when I iterate the elements of one of these
 containers using foreach?
Iterating over a container using e.g. foreach won't consume the container same as iterating over int[] won't consume the slice.
 Do the range primitives empty the container? How do I ask for a new,
 full range of my data in the collection to use in a range algorithm
 after I've already iterated over the elements?
There is no way to reclaim the original container. If you create a container, you get a range positioned to "see" the entire container. Once you popFront from that range, you lose all access to the first element in the container, unless you have a separate copy of the range. This is not new and not different from: auto r = new int[10]; r.popFront; What happens here is an array of 10 elements gets created. But you don't get a type "array of 10 integers". You get "a slice of integers, incidentally initialized to refer to the entire array of 10 integers that was just created". Next, you decide you don't care about the first element in the array. Once you call r.popFront, access to that element is lost forever. As an aside, we briefly experimented with a type called "new T[]" which meant to refer to that initial array. I tried to redo a bunch of code with it. It was a failure - figuring out what's an array and what's a slice in the array was a continuous hassle offering no redeeming insight. It took me years to figure actually there's no need for containers and ranges - all you need is ranges.
 As Andrei originally laid out in this talk introducing ranges [2],
 ranges only shrink and never grow. Put another way, Output Ranges and
 Input Ranges are two separate things. Andrei states durning that talk
 (at 1:20:20)

 "Can ranges ever grow? ... No, never, because it would be
 fundamentally unsafe."

 I'm not sure what Andrei meant by this, but he believes that growing
  ranges is also a bad idea.
Yes, you're not supposed to grow a range on its existing support without knowing whether you have room. That's a distinct matter. You can always grow a range by means of safe primitives that may duplicate it if needed, such as ~ and ~=. Andrei
May 18 2017
next sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Thursday, 18 May 2017 at 18:27:22 UTC, Andrei Alexandrescu 
wrote:
 Iterating over a container using e.g. foreach won't consume the
 container same as iterating over int[] won't consume the slice.
I don't understand why you're mapping the behavior of ranges/slices, which theoretically are shrinking views of data, onto containers of data which can grow or shrink. They seem antithetical to me. The only reason foreach over a dynamic array doesn't consume the slice is because foreach uses index based iteration on static and dynamic arrays and not the std.range.primitives functions. It's not evident to me that the non-consuming behavior makes sense to map onto something completely different like containers that won't even necessarily have index based access. It seems odd to me that slices would be the design basis when slices 1. Are intentionally limited in scope design wise to be a shrinking view of data 2. Only make sense behavior wise for array containers 3. Their ability to grow is a GC oddity which may end up copying a whole lot of data depending on the origin of the slice
 There is no way to reclaim the original container. If you 
 create a container, you get a range positioned to "see" the 
 entire container. Once you popFront from that range, you lose 
 all access to the first element in the container, unless you 
 have a separate copy of the range. This is not new and not 
 different from:

 auto r = new int[10];
 r.popFront;

 What happens here is an array of 10 elements gets created. But 
 you don't get a type "array of 10 integers". You get "a slice 
 of integers, incidentally initialized to refer to the entire 
 array of 10 integers that was just created". Next, you decide 
 you don't care about the first element in the array. Once you 
 call r.popFront, access to that element is lost forever.
First off, how are you going to do something like a map over a immutable container then, as map uses the range primitives and not foreach? There's no reason in principal that that should cause an issue. But with that design popFront is mutating the underlying data, and therefore should not be allowed. But this works with the std.container design because popFront is mutating the range view which has no reason to be immutable. Secondly, if I want to say, perform a map over the elements of a container for some output, and I want to do things with the elements later in the code, then I have to create a copy of the container and pass the original to map? If that's the case then why not just go with the std.container behavior of getting a range from a primitive if you end up having to create range copies? I mean it's fundamentally the same, but that way you don't have to worry about the copies changing the data of the container out from under you. What happens to the copy of a container when the original modifies the underlying data? For example, what should this code print? auto container = Array!int(MAllocator.instance); container.put(iota(10)); auto container_copy = container; container.popBackN(5); container.insertBack(1); container_copy.each!(a => write(a, ", ")); In my estimation, a properly designed containers library should print 0, 1, 2, 3, 4, 1, 5, 6, 7, 8, 9 But from what you're describing, that the containers act like slices, the code with print 0, 1, 2, 3, 4, 1, 6, 7, 8, 9
May 18 2017
parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Friday, May 19, 2017 1:22:50 AM PDT Jack Stouffer via Digitalmars-d 
wrote:
 First off, how are you going to do something like a map over a
 immutable container then, as map uses the range primitives and
 not foreach? There's no reason in principal that that should
 cause an issue. But with that design popFront is mutating the
 underlying data, and therefore should not be allowed. But this
 works with the std.container design because popFront is mutating
 the range view which has no reason to be immutable.
Andrei was talking at dconf about this and said that we'd be adding a new range primitive called tail, and we'd basically end up with car and cdr for ranges. This makes sense for immutable ranges, and it's simple to wrap existing input ranges to makes them work with such primitives, but I'm not sure that it plays at all nicely once you start considering higher order ranges - and it has the serious problem that it would require revamping all range-based code to support it. That _might_ make sense for Phobos, since it's the standard library, but I expect that its DOA pretty much everywhere else (especially when it's for immutable containers which can be very useful but are usually very niche), and even for Phobos, it's going to be a lot of churn. Personally, I'm not convinced that immutable containers are worth enough to merit such a change, but Andrei clearly thinks that they are. Interesingly, he doesn't seem to think that it's worth solving the tail-const issue with ranges, and without a solution for that, const and ranges basically so not work together at all. And _that_ is a problem that keeps coming up, whereas most folks don't seem to be clammering for immutable containers. Now, if you're willing to add a layer of indirection, then it's trivial to make existing ranges over immutable containers work, because you can have a mutable range refer to an immutable object, but that is either going to be unsafe or require that the range be allocated on the heap - which is presumably why Andrei didn't go with that solution, but I don't know. But overall, it sounds like once again Andrei is getting very experimental with what he's doing (or getting his mentee to do). So if it works out well, we could end up with something pretty interesting and innovative, but it also risks being a bit of a disaster, and since it's experimental, there's really no way to know how it's going to go (even less so when all we know about it is what was in the talk and whatever tidbits Andrei has discussed in the newsgroup or at dconf). So, I guess that we'll have to wait and see - particularly since we haven't even seen the actual API yet. - Jonathan M Davis
May 18 2017
prev sibling parent reply Mark <smarksc gmail.com> writes:
On Thursday, 18 May 2017 at 18:27:22 UTC, Andrei Alexandrescu 
wrote:
 On 05/18/2017 11:18 AM, Jack Stouffer wrote:
 I just got around to watching Eduard Staniloiu's talk at DConf 
 [1]
 about the collections library he was working on. One thing 
 seemed
 odd, in that Eduard seems to be saying that the container and 
 the
 range over the container's elements are one in the same and the
 range primitives are on the container itself.

 Is this accurate?
That is correct. The fundamental equation is: Ranges + Optional Primitives = Containers Andrei
Is the code for the new collections available on github or somewhere else? Even if it's a work in progress, I'm curious to see what it looks like.
Nov 13 2017
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/13/2017 05:19 AM, Mark wrote:
 On Thursday, 18 May 2017 at 18:27:22 UTC, Andrei Alexandrescu wrote:
 On 05/18/2017 11:18 AM, Jack Stouffer wrote:
 I just got around to watching Eduard Staniloiu's talk at DConf [1]
 about the collections library he was working on. One thing seemed
 odd, in that Eduard seems to be saying that the container and the
 range over the container's elements are one in the same and the
 range primitives are on the container itself.

 Is this accurate?
That is correct. The fundamental equation is: Ranges + Optional Primitives = Containers Andrei
Is the code for the new collections available on github or somewhere else? Even if it's a work in progress, I'm curious to see what it looks like.
+Eduard
Nov 13 2017
prev sibling parent Eugene Wissner <belka caraus.de> writes:
On Thursday, 18 May 2017 at 15:18:00 UTC, Jack Stouffer wrote:
 I just got around to watching Eduard Staniloiu's talk at DConf 
 [1] about the collections library he was working on. One thing 
 seemed odd, in that Eduard seems to be saying that the 
 container and the range over the container's elements are one 
 in the same and the range primitives are on the container 
 itself.

 Is this accurate?

 If so, then this is completely different than the currently 
 accepted design of ranges and containers. Currently, a 
 container allows the removal and insertion of elements, and to 
 get a range of the elements you have to call a function or 
 slice the container. This is how std.containers and Brain's 
 EMSI container library. In this new library, what happens when 
 I iterate the elements of one of these containers using 
 foreach? Do the range primitives empty the container? How do I 
 ask for a new, full range of my data in the collection to use 
 in a range algorithm after I've already iterated over the 
 elements?

 As Andrei originally laid out in this talk introducing ranges 
 [2], ranges only shrink and never grow. Put another way, Output 
 Ranges and Input Ranges are two separate things. Andrei states 
 durning that talk (at 1:20:20)

 "Can ranges ever grow? ... No, never, because it would be 
 fundamentally unsafe."

 I'm not sure what Andrei meant by this, but he believes that 
 growing ranges is also a bad idea.

 [1] https://www.youtube.com/watch?v=k88QSIC-Na0
 [2] 
 https://archive.org/details/AndreiAlexandrescuKeynoteBoostcon2009
I didn't look the talk yet, but about a month ago I tried to implement a similar concept and wanted to apply it to all containers as well. My idea was to make the containers/slices reference counted and copy on write (or to be more precise, on size changing, similar how D built-in slices behave). 1) All containers have a pointer to the allocated memory and its length. 2) All containers have a pointer to the beginning and the end of the range they refer to (so popFront, popBack can be implemented without changing the real size of the allocated memory) 3) If you do something like insertBack or insertFront and the reference counter is greater than 1, then the counter decreases and the container is copied. But at the end I couldn't deal with the constness. Slice!(ubyte) and Slice!(const ubyte) are different types - that isn't nice. But for "const Slice!ubyte" you can't do popFront and popBack. It could be solved with some casting or other hacks but I don't know how it would work in a multithreaded envrionment. So I stopped the experiment for now.
May 18 2017