www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Iterators and Ranges: Comparing C++ to D to Rust

reply Friendly Neighborhood Weirdo <frineiwei1001001 gmail.com> writes:
This is a talk from the 2021 c++now conference: 
https://www.youtube.com/watch?v=d3qY4dZ2r4w
Jun 13
next sibling parent reply IGotD- <nise nise.com> writes:
On Monday, 14 June 2021 at 01:48:18 UTC, Friendly Neighborhood 
Weirdo wrote:
 This is a talk from the 2021 c++now conference: 
 https://www.youtube.com/watch?v=d3qY4dZ2r4w
Interesting video. There is something that I'm kind of confused about. The narrator in the video claims that in D popFront doesn't consume the element. Is this true? When I do a popFront, doesn't it consume the first element in an array for example. Can someone clarify this? Also in Rust, next does consume the element. However, in Rust you can iterate over references (which also automatically borrows). So iterating over borrowed references does not consume the underlying data structure.
Jun 14
next sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 6/14/21 5:33 AM, IGotD- wrote:

 The narrator in the video claims that in D popFront doesn't consume the
 element. Is this true? When I do a popFront, doesn't it consume the
 first element in an array for example. Can someone clarify this?
popFront() consumes the range, not the container: import std.range; void main() { auto arr = [ 1, 2, 3, 4 ]; auto half = arr[0..$/2]; half.popFront(); assert(arr[0] == 1); // Nothing happened to the array (which 'arr' // is just another slice of it) } Ali
Jun 14
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 13:30:52 UTC, Ali Çehreli wrote:
 popFront() consumes the range, not the container:
I glossed over the presentation, and I don't think he really covered things as well as he should. For instance there is a difference between generators, nonconsuming-iterators/table–pointers, consuming-iterators, but sometimes the distinction is blurry. What if you create a range that consumes a queue. What if you create a range that interpolates and doubles the length of the source. What if you want to modify the underlying container while iterating over it. It would be nice with an ontology for table-pointers, iterators and generators, along the lines of the original Gang of Four book on design patterns, but I didn't really feel this presentation brought anything to the table. His take on this was very "syntactical". Also, his assumptions about performance were not really well founded. You have to look at how iterators are used in actual alogrithms and how that could block optimizations to say anything sensible about that. E.g. his take on StopIteration in Python is not really correct either, you can write a generator (coroutine) with no explicit StopIteration (although the semantics implies a raised exception, but that does not mean the implementation has to).
Jun 14
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 13:45:14 UTC, Ola Fosheim Grøstad 
wrote:
 Also, his assumptions about performance were not really well 
 founded. You have to look at how iterators are used in actual 
 alogrithms and how that could block optimizations to say 
 anything sensible about that.
Think of it this way: If it is possible write an O(1) wrapper around an iterator/range implementation and emulate any of the other interfaces, then they have the same performance (in terms of the ability to optimize). I suspect that (except for table pointers) they are equivalent in O(1). So basically all the same...
Jun 14
prev sibling next sibling parent reply jfondren <julian.fondren gmail.com> writes:
On Monday, 14 June 2021 at 13:45:14 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 14 June 2021 at 13:30:52 UTC, Ali Çehreli wrote:
 popFront() consumes the range, not the container:
I glossed over the presentation, and I don't think he really covered things as well as he should.
Rust iterators have been discussed here before, e.g. in https://forum.dlang.org/thread/arufovtvoewhyfuesvco forum.dlang.org?page=1 and there are tones like "Rust has a better design and D does things differently only for historical reasons.", and maybe "it'd be too hard for D to emulate Rust without a borrow checker". Similarly, I think people could agree that D's ranges are simpler than than C++'s iterators, and that C++ just has iterators because they hadn't though of ranges first. What this presentation does well is to add nuance to these "X is just better than Y" relationships. Some tasks are trivial for C++ but a hassle for D, and surprisingly many tasks are excluded by Rust's iterators. The only clear loser is Java, where there really are historical reasons that hampered its design. I'd also never noticed std.algorithm.cache until I went searching for a library solution that I was sure must exist for the "filter calls map's function twice" problem that D has, but Rust doesn't have.
Jun 14
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 14:10:48 UTC, jfondren wrote:
 What this presentation does well is to add nuance to these "X 
 is just
 better than Y" relationships. Some tasks are trivial for C++ 
 but a
 hassle for D, and surprisingly many tasks are excluded by Rust's
 iterators.
But then he should have focused on concrete complex algorithms and how one approach makes the algorithm much harder to read or some other usability measure. But C++ didn't have iterators until C++20, they had table pointers/cursors (which they wrongly call "iterators").
 I'd also never noticed std.algorithm.cache until I went 
 searching for
 a library solution that I was sure must exist for the "filter 
 calls
 map's function twice" problem that D has, but Rust doesn't have.
I think C++ may gain an edge with concepts. D will eventually need something similar and make it easier to write composable template libraries. By composable I mean that you compose new types applying templates to templates to templates. Right now, this is not really possible in D because of the type unification issue. If C++ library authors succeed in making good use of such features, they might gain an edge. BUT the D community is much smaller, so it should in theory be much easier for the D community to develop "standards" for creating composable template libraries. So even though C++ may have an edge right now, they also have a bigger hurdle to overcome for establishing eco system "standards". D needs to capitalize on being small and improve template composability between libraries, IMO.
Jun 14
prev sibling parent reply IGotD- <nise nise.com> writes:
On Monday, 14 June 2021 at 13:45:14 UTC, Ola Fosheim Grøstad 
wrote:
 E.g. his take on StopIteration in Python is not really correct 
 either, you can write a generator (coroutine) with no explicit 
 StopIteration (although the semantics implies a raised 
 exception, but that does not mean the implementation has to).
How do we classify the iterators of Nim? https://nim-lang.org/docs/tut1.html#iterators It looks similar to Python and Rust. Also notice how it can use yield like a coroutine.
Jun 14
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 22:56:29 UTC, IGotD- wrote:
 How do we classify the iterators of Nim?

 https://nim-lang.org/docs/tut1.html#iterators

 It looks similar to Python and Rust. Also notice how it can use 
 yield like a coroutine.
Yes, but the page says that they have to be inlined and used in for-loops only? But if they are made more general then they would be generator coroutines like Python and C++.
Jun 14
parent reply Araq <rumpf_a web.de> writes:
On Monday, 14 June 2021 at 23:08:40 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 14 June 2021 at 22:56:29 UTC, IGotD- wrote:
 How do we classify the iterators of Nim?

 https://nim-lang.org/docs/tut1.html#iterators

 It looks similar to Python and Rust. Also notice how it can 
 use yield like a coroutine.
Yes, but the page says that they have to be inlined and used in for-loops only? But if they are made more general then they would be generator coroutines like Python and C++.
There is also a variant called "closure" iterators which is Nim's foundation for async programming. Nim's inline iterators don't compose particularly well, but they get the job done and there are all sorts of workarounds and 3rd party packages. We couldn't copy C++ or D's designs because they are fundamentally unsafe (iterator invalidation is complex and expensive). Rust can do it easily thanks to its borrow checker. We're slowly catching up with our support for "view types".
Jun 15
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 15 June 2021 at 07:22:24 UTC, Araq wrote:
 We couldn't copy C++ or D's designs because they are 
 fundamentally unsafe (iterator invalidation is complex and 
 expensive).
Yes, but C++20 coroutines are completely different though. My understanding is that the compiler analyse the coroutine function and breaks it up into many functions so that the stack is empty when it hits ```co_yield```, then the state is synthesised into an object (I guess you could call it a closure). Is this what Nim closure-iterators do?
Jun 15
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 15 June 2021 at 08:04:13 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 15 June 2021 at 07:22:24 UTC, Araq wrote:
 We couldn't copy C++ or D's designs because they are 
 fundamentally unsafe (iterator invalidation is complex and 
 expensive).
Yes, but C++20 coroutines are completely different though. My understanding is that the compiler analyse the coroutine function and breaks it up into many functions so that the stack is empty when it hits ```co_yield```, then the state is synthesised into an object (I guess you could call it a closure).
It is a bit more elaborated than that. I suggest a carefull reading of all these posts to get a good overview of all use cases. https://devblogs.microsoft.com/oldnewthing/20210504-01/?p=105178
Jun 15
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 15 June 2021 at 08:36:16 UTC, Paulo Pinto wrote:
 It is a bit more elaborated than that.

 I suggest a carefull reading of all these posts to get a good 
 overview of all use cases.

 https://devblogs.microsoft.com/oldnewthing/20210504-01/?p=105178
That was longwinded... about async/await like behaviour? A short summary of C++20 coroutines: https://en.cppreference.com/w/cpp/language/coroutines
Jun 15
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 15 June 2021 at 08:58:17 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 15 June 2021 at 08:36:16 UTC, Paulo Pinto wrote:
 It is a bit more elaborated than that.

 I suggest a carefull reading of all these posts to get a good 
 overview of all use cases.

 https://devblogs.microsoft.com/oldnewthing/20210504-01/?p=105178
That was longwinded... about async/await like behaviour? A short summary of C++20 coroutines: https://en.cppreference.com/w/cpp/language/coroutines
That short summary doesn't cover the implementation details for all use cases, hence the longwinded articles about their uses in Windows, and C++/WinRT runtime. Outside Windows, cppcoro is probably the best option, https://www.modernescpp.com/index.php/c-20-coroutines-with-cppcoro Couroutines will only be fully done when C++23 gets out of the door, if everything goes as planned. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0592r3.html
Jun 15
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 15 June 2021 at 09:42:10 UTC, Paulo Pinto wrote:
 That short summary doesn't cover the implementation details for 
 all use cases, hence the longwinded articles about their uses 
 in Windows, and C++/WinRT runtime.
This is all interesting, but not really necessary to discuss iterators and for-loop like usage. What is interesting though, is an implementation strategy of tearing up a function into many smaller functions so that the backend doesn't have to change in order to support it. So basically: ``` coroutine(){ int n = 0; while(true){ n++; yield n; if (n > 100) { yield 0; } } } ``` can be distilled into something like this (unoptimized): ``` struct { int n; int result; delegate nextstate = state1; state1() { n++; result = n; nextstate = state2; } state2() { if (n > 100) { result = 0; nextstate = state1; } else state1(); } } ```
Jun 15
parent reply sighoya <sighoya gmail.com> writes:
On Tuesday, 15 June 2021 at 10:04:03 UTC, Ola Fosheim Grøstad 
wrote:

 So basically:

 ```
     coroutine(){
        int n = 0;
        while(true){
           n++;
           yield n;
           if (n > 100) {
             yield 0;
           }
        }
     }

 ```


 can be distilled into something like this (unoptimized):


 ```
     struct {
         int n;
         int result;
         delegate nextstate = state1;
         state1() {
            n++;
            result = n;
            nextstate = state2;
         }
         state2() {
            if (n > 100) {
                result = 0;
                nextstate = state1;
            }
            else state1();
         }
     }

 ```
Reminds me a bit on CPS: https://github.com/zevv/cpsdoc
Jun 15
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 15 June 2021 at 14:52:53 UTC, sighoya wrote:
 Reminds me a bit on CPS: https://github.com/zevv/cpsdoc
Yes, continuation passing is an implementation strategy for high level languages. C++ coroutines may be a bit different from what I presented though, the memory for saving state is reused like a stack frame. So if you have a variable "x" that isn't used after a certain point, then it can be reused for a variable "y"... Makes sense.
Jun 15
prev sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 15 June 2021 at 07:22:24 UTC, Araq wrote:
 On Monday, 14 June 2021 at 23:08:40 UTC, Ola Fosheim Grøstad 
 wrote:
 On Monday, 14 June 2021 at 22:56:29 UTC, IGotD- wrote:
 How do we classify the iterators of Nim?

 https://nim-lang.org/docs/tut1.html#iterators

 It looks similar to Python and Rust. Also notice how it can 
 use yield like a coroutine.
Yes, but the page says that they have to be inlined and used in for-loops only? But if they are made more general then they would be generator coroutines like Python and C++.
There is also a variant called "closure" iterators which is Nim's foundation for async programming. Nim's inline iterators don't compose particularly well, but they get the job done and there are all sorts of workarounds and 3rd party packages.
In addition to Nim's inline and closure iterators, which if I understand correctly, model input iteration (not sure about forward), is there an at least somewhat standardized interface for implementing iterators manually that does support backward iteration and random access? What do the most prominent Nim libraries do? One of the things that I don't like about languages that I use that pretty much their whole ecosystems only cares about input iteration and things that in D are trivially `O(1)` like `array.map!foo[$/2 .. $].retro.map!bar.length` end up being unexpected performance pitfalls in practice in these languages.
 We couldn't copy C++ or D's designs because they are 
 fundamentally unsafe (iterator invalidation is complex and 
 expensive). Rust can do it easily thanks to its borrow checker.
Heh, when I guess many people having read the the infamous "fearless concurrency" blog post believe that iterator invalidation is a big deal and that languages really ought to prevent it statically and now demand it from language maintainers (which is a good thing™). That said, no matter how easy is to show that innocent looking code can end up with an iterator invalidation bug, in practice I don't remember ever experiencing it in my D code. Resisting the urge to mutate a container while you're iterating is really not that hard :D Perhaps iterator invalidation problems are more common with imperative-style raw loops code, rather that FP-style, which I feel is much better supported by D's standard library, than the standard libraries of That is not to say that I wouldn't want my libraries to be fool-proof against that kind of problems, but at same time, at least in my experience with D it hasn't been a big deal, so I'd much rather have a powerful algorithmic API foundation than prevent very useful and beautiful designs, just because they can be misused.
 We're slowly catching up with our support for "view types".
Interesting, can you elaborate more?
Jun 15
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 15 June 2021 at 08:22:07 UTC, Petar Kirov 
[ZombineDev] wrote:
 That said, no matter how easy is to show that innocent looking 
 code can end up with an iterator invalidation bug, in practice 
 I don't remember ever experiencing it in my D code. Resisting 
 the urge to mutate a container while you're iterating is really 
 not that hard :D
I agree. If D wants to be a system level programming language it should take some of the disadvantages that comes with making it easy to write performant code. It would be better to focus on making it easier to write static analysis linters for D, by making it possible to emit a high level IR after templates have been expanded. There are tools that take a "generic programming language" as input and makes it possible to write various custom checks over that language. So what is needed is a way to translate template-expanded D code to such "verification languages". Limited resources do suggest that one should leverage existing generic tools. Lots of unused opportunities there, I think.
Jun 15
prev sibling parent reply Araq <rumpf_a web.de> writes:
On Tuesday, 15 June 2021 at 08:22:07 UTC, Petar Kirov 
[ZombineDev] wrote:
 One of the things that I don't like about languages that I use 

 is that pretty much their whole ecosystems only cares about 
 input iteration and things that in D are trivially `O(1)` like 
 `array.map!foo[$/2 .. $].retro.map!bar.length` end up being 
 unexpected performance pitfalls in practice in these languages.
Er, performance pitfalls can be found with a profiler. Crashes are much harder to track down. If I have to pick between them, I take the performance pitfalls, as they are much easier to fix. At least for your outlined `.retro.map.length` snippets.
 That is not to say that I wouldn't want my libraries to be 
 fool-proof against that kind of problems, but at same time, at 
 least in my experience with D it hasn't been a big deal, so I'd 
 much rather have a powerful algorithmic API foundation than 
 prevent very useful and beautiful designs, just because they 
 can be misused.
Beauty is subjective, safety isn't.
 We're slowly catching up with our support for "view types".
Interesting, can you elaborate more?
Wrong forum for that.
Jun 15
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 15 June 2021 at 10:13:07 UTC, Araq wrote:
 On Tuesday, 15 June 2021 at 08:22:07 UTC, Petar Kirov 
 [ZombineDev] wrote:
[...]
Er, performance pitfalls can be found with a profiler. Crashes are much harder to track down. If I have to pick between them, I take the performance pitfalls, as they are much easier to fix. At least for your outlined `.retro.map.length` snippets.
 [...]
Beauty is subjective, safety isn't.
 [...]
Wrong forum for that.
+1 We already have enough languages where performance trumps safety.
Jun 15
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 15 June 2021 at 10:53:20 UTC, Paulo Pinto wrote:
 +1 We already have enough languages where performance trumps 
 safety.
This is close to impossible to achieve because of aliasing. Then you need to prove that there is no aliasing on a GLOBAL level. That is totally out of scope for a language like D.
Jun 15
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 15 June 2021 at 11:12:09 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 15 June 2021 at 10:53:20 UTC, Paulo Pinto wrote:
 +1 We already have enough languages where performance trumps 
 safety.
This is close to impossible to achieve because of aliasing.
Keep in mind that anything with references in it that can be used to build up chains can be a source for aliasing. How to do you prove that a graph/linked-list is not included in another graph/linked-list? Damn hard.
Jun 15
prev sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 15 June 2021 at 10:53:20 UTC, Paulo Pinto wrote:
 On Tuesday, 15 June 2021 at 10:13:07 UTC, Araq wrote:
 On Tuesday, 15 June 2021 at 08:22:07 UTC, Petar Kirov 
 [ZombineDev] wrote:
[...]
Er, performance pitfalls can be found with a profiler. Crashes are much harder to track down. If I have to pick between them, I take the performance pitfalls, as they are much easier to fix. At least for your outlined `.retro.map.length` snippets.
 [...]
Beauty is subjective, safety isn't.
 [...]
Wrong forum for that.
+1 We already have enough languages where performance trumps safety.
The real problem is when the safe code is not fast enough and people rewrite it in an unsafe language.
Jun 15
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 15 June 2021 at 12:36:07 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Tuesday, 15 June 2021 at 10:53:20 UTC, Paulo Pinto wrote:
 On Tuesday, 15 June 2021 at 10:13:07 UTC, Araq wrote:
 On Tuesday, 15 June 2021 at 08:22:07 UTC, Petar Kirov 
 [ZombineDev] wrote:
[...]
Er, performance pitfalls can be found with a profiler. Crashes are much harder to track down. If I have to pick between them, I take the performance pitfalls, as they are much easier to fix. At least for your outlined `.retro.map.length` snippets.
 [...]
Beauty is subjective, safety isn't.
 [...]
Wrong forum for that.
+1 We already have enough languages where performance trumps safety.
The real problem is when the safe code is not fast enough and people rewrite it in an unsafe language.
Actually the real problem is that people think it is not fast enough, based on urban myths without touching any kind of profiling tools, or measuring it against the project hard deadlines. More often than not, it is already fast enough, unless we are talking about winning micro-benchmarks games.
Jun 15
next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Tuesday, 15 June 2021 at 13:25:23 UTC, Paulo Pinto wrote:
 Actually the real problem is that people think it is not fast 
 enough, based on urban myths without touching any kind of 
 profiling tools, or measuring it against the project hard 
 deadlines.

 More often than not, it is already fast enough, unless we are 
 talking about winning micro-benchmarks games.
This is true, but what if the benchmarks show you do need to optimize? You need to avoid systemically slow designs in advance or you run the risk of having to rewrite the whole program. Now, you absolutely can design your program in [insert non-system language] so that it can be optimized as needed. But I believe Petar was arguing that certain kinds of inefficient iterator APIs discourage doing so.
Jun 15
parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 15 June 2021 at 14:31:31 UTC, Dukc wrote:
 On Tuesday, 15 June 2021 at 13:25:23 UTC, Paulo Pinto wrote:
 Actually the real problem is that people think it is not fast 
 enough, based on urban myths without touching any kind of 
 profiling tools, or measuring it against the project hard 
 deadlines.

 More often than not, it is already fast enough, unless we are 
 talking about winning micro-benchmarks games.
This is true, but what if the benchmarks show you do need to optimize? You need to avoid systemically slow designs in advance or you run the risk of having to rewrite the whole program. Now, you absolutely can design your program in [insert non-system language] so that it can be optimized as needed. But I believe Petar was arguing that certain kinds of inefficient iterator APIs discourage doing so.
Yes, that's a good summary.
Jun 15
prev sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 15 June 2021 at 13:25:23 UTC, Paulo Pinto wrote:
 The real problem is when the safe code is not fast enough and 
 people rewrite it in an unsafe language.
Actually the real problem is that people think it is not fast enough, based on urban myths without touching any kind of profiling tools, or measuring it against the project hard deadlines.
I agree it's a factor. However in one case you can prove the myths wrong with real data, while in the other case the data is given as a reason to the low-level/unsafe route.
 More often than not, it is already fast enough, unless we are 
 talking about winning micro-benchmarks games.
Perhaps you feel the need to push back on some prevailing Reddit/Hackernews misconceptions, but I'm referring to real-world cases. For example, how is it possible, that e.g. on the same computer switching between two Slack channels takes 3-4 seconds, but at the same time runs demanding AAA games from from 2-3 years ago just fine? Unless we're living in different universes, you must have noticed the increasing quantity of bloatware apps are more slow and janky than ever, without any corresponding increase of functionality. I'm not saying that the use of a tracing GC is a problem or anything of the sorts. Often times, there's many small inefficiencies (each of which small enough that it's lost in the noise in profiler trace) that when taken as whole accumulate and make perceived user experience bad. Also, I don't about you, but since you often talk about .NET and UWP, in the past I've worked full-time on WPF/SL/UWP control libraries and MVVM apps. For a purely app developer, the profiler often is really not that helpful when most of the CPU time time is spent in the framework after they've async-ified their code and moved all heavy computation out of the UI thread. Back then (before .NET Core was even a thing) I used to spend a lot of time using decompilers and later browsing [referencesource.microsoft.com](https://referencesource.microsoft.com/#Pr sentationFramework) to understand where the inefficiencies lie. In the end, the solutions (as confirmed by both benchmarks and perceived application performance) were often to rewrite the code to avoid high-level constructs like `DependencyProperty` and even sometimes reach for big hammers like `ILEmitter`, to speed-up code that was forced to rely on runtime reflection. In summary, when the framework dictates an inefficient API design (and also not really type-safe - everything is relying on dynamic casts and runtime reflection), the whole ecosystem (from third-party library developers to user-facing app writers) suffers. In the past several years MS has put a ton of effort into optimizing .NET Core under the hood, but often times the highest gains come from a more efficient APIs (e.g. otherwise why would they invest all this effort into value types, Spans, ref-returns, etc.).
Jun 15
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 15 June 2021 at 16:21:07 UTC, Petar Kirov 
[ZombineDev] wrote:
 real-world cases. For example, how is it possible, that e.g. on 
 the same computer switching between two Slack channels takes 
 3-4 seconds, but at the same time runs demanding AAA games from 
 from 2-3 years ago just fine? Unless we're living in different
This trend has been true since the 1980s where people wrote key routines in assembly. Meaning, most projects aim for usable at lowest price. More capable computers means less efficient programming... I think a better argument is that system level programming requires predictable latency and high effiency to a greater extent. For instance, if I create a solar powered monitoring system then I want to use a low powered cpu. Walter says D is for systems programming. Well, then it follows that everything in the standard library should be designed for that purpose. Meaning: predictable latency and highest efficiency, either by being fast OR consume minimal resources (energy or RAM). If D does not follow through on that then it cannot be considered to be a dedicated system level language. But then D needs to define what it is for.
Jun 16
next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 16 June 2021 at 07:47:18 UTC, Ola Fosheim Grostad 
wrote:
 On Tuesday, 15 June 2021 at 16:21:07 UTC, Petar Kirov 
 [ZombineDev] wrote:
 real-world cases. For example, how is it possible, that e.g. 
 on the same computer switching between two Slack channels 
 takes 3-4 seconds, but at the same time runs demanding AAA 
 games from from 2-3 years ago just fine? Unless we're living 
 in different
This trend has been true since the 1980s where people wrote key routines in assembly. Meaning, most projects aim for usable at lowest price. More capable computers means less efficient programming...
That would explain it if the programs were as slow as before, but Petar also complained about programs that are even slower. There has to me more. It could be survivourmanship bias. Perhaps we just don't remember the slower programs from the old days. I think that computers (and internet connections) differ in power more than they used to. The alternative explanation might be that developers with powerful computers don't notice as easily how slow stuff they are making. Then again, it also might be more competition as industry gets bigger => tighter deadlines and less slack => software getting done worse and worse.
 Walter says D is for systems programming. Well, then it follows 
 that everything in the standard library should be designed for 
 that purpose.
Not everything. You're certainly familiar with the D principle for being BOTH a systems programming and application programming language. It follows that the standard library should cater for both cases, so large parts of the standard library should be handy for systems programming but not everything, because that would drive application programmers away. And Phobos is on the right track. There are shortcomings, but nothing is fundamentally wrong from systems programming perspective. D ranges are totally usable at system level. I've used them myself when compiling to WebAssembly, where the D ecosystem is (currently) just as primitive as on some microcontroller.
Jun 16
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 16 June 2021 at 10:20:05 UTC, Dukc wrote:
 That would explain it if the programs were as slow as before, 
 but Petar also complained about programs that are even slower. 
 There has to me more.

 It could be survivourmanship bias. Perhaps we just don't 
 remember the slower programs from the old days.
Yes, maybe. There might be other factors too: 1. Business people are more present as leaders now (whereas it used to be engineers advancing to leadership positions). As a result engineering quality is not appreciated to the same level. My opinion. You see this with Apple products too, the surface stuff is being focused on in their presentations. Fancy surface, boring interior. 2. People have become used to using web applications, so they have become used to latency (lag) when interacting with a system. Thus, the end users may not think about why an application is sluggish (as they don't understand the difference between slow code and network lag). So users expect less? This is especially true when you go to the 90s where Amiga users were predominantly geeks, and thus more demanding. 3. Programmers have become less proficient and fail to think of how the code they write maps down through layers of libraries all the way down to CPU/GPU. Of course, more layers also makes this more difficult. I think many programmers now have the framework they use as their mental model, and not really the actual computer hardware.
 I think that computers (and internet connections) differ in 
 power more than they used to. The alternative explanation might 
 be that developers with powerful computers don't notice as 
 easily how slow stuff they are making.
Yep, developers should use the low end computers the application targets for executing their code.
 Then again, it also might be more competition as industry gets 
 bigger => tighter deadlines and less slack => software getting 
 done worse and worse.
But more competition should lead to better quality... The problem with this (also for other sectors) is that for capitalism to work well for engineered products the consumer has to be knowledgable, demanding and be willing to shop around. But human beings tend to go with the flow (fashion, marketing, social peers) when they are making choices on things they have limited knowledge and interest in. Audio software is still pretty good, for instance. I assume this is because consumers in that market have a fairly good understanding of what to expect.
 Not everything. You're certainly familiar with the D principle 
 for being BOTH a systems programming and application 
 programming language. It follows that the standard library 
 should cater for both cases, so large parts of the standard 
 library should be handy for systems programming but not 
 everything, because that would drive application programmers 
 away.

 And Phobos is on the right track. There are shortcomings, but 
 nothing is fundamentally wrong from systems programming 
 perspective. D ranges are totally usable at system level. I've 
 used them myself when compiling to WebAssembly, where the D 
 ecosystem is (currently) just as primitive as on some 
 microcontroller.
I understand where you are coming from, and I agree that D/C++ ranges can be used for system level programming (at least as a starting point). I still think everything in the standard lib should be useful for system level programming, so you don't have to reconsider your options because of standard lib deficiencies half way through. What is needed in addition to that is a standardized application level library built on top of the standard library. This could ship with the compiler and be defined in terms of APIs (so different compilers can optimize the implementation for their backend). Or maybe even an application framework (with keyboard, mouse, graphics apis). I think it is better with a more layered eco-system of standardized library APIs than one big monolithic standard library.
Jun 16
prev sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Wednesday, 16 June 2021 at 07:47:18 UTC, Ola Fosheim Grostad 
wrote:
 On Tuesday, 15 June 2021 at 16:21:07 UTC, Petar Kirov 
 [ZombineDev] wrote:
 [...]
This trend has been true since the 1980s where people wrote key routines in assembly. Meaning, most projects aim for usable at lowest price. More capable computers means less efficient programming... I think a better argument is that system level programming requires predictable latency and high effiency to a greater extent. For instance, if I create a solar powered monitoring system then I want to use a low powered cpu. Walter says D is for systems programming. Well, then it follows that everything in the standard library should be designed for that purpose. Meaning: predictable latency and highest efficiency, either by being fast OR consume minimal resources (energy or RAM). If D does not follow through on that then it cannot be considered to be a dedicated system level language. But then D needs to define what it is for.
Ironically Oberon, TinyGo, microEJ and .NET Nanoframework, Meadow fullfil that use case, while everyone keeps arguing what is D's killer use case. Assuming you are happy with an ARM Cortex M0 class or Arduino like device.
Jun 16
prev sibling parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 15 June 2021 at 10:13:07 UTC, Araq wrote:
 On Tuesday, 15 June 2021 at 08:22:07 UTC, Petar Kirov 
 [ZombineDev] wrote:
 One of the things that I don't like about languages that I use 

 is that pretty much their whole ecosystems only cares about 
 input iteration and things that in D are trivially `O(1)` like 
 `array.map!foo[$/2 .. $].retro.map!bar.length` end up being 
 unexpected performance pitfalls in practice in these languages.
Er, performance pitfalls can be found with a profiler.
Yes, but there's a big difference between "can" and "will". There's an increasing quantity of JS (not only JS, just it's the most prominent example) web, desktop(electron), and mobile applications written with complete disregard for performance that perform tasks slower than similar programs did way faster 20 years ago on the hardware from that time. Defaults matter. If an inefficient way is also the easiest way to implement a feature it will likely be first choice for many and very rarely someone will got out of their way to optimize it later. And it's not just JS. I've seen this in large C++ codebases as well, where the development friction severely limits the approaches that people will consider when implementing features. As a random example (which you've probably seen): https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70 Also, from experience, profiling modern async UI applications is really not straightforward as opposed to CLI apps with no event loop. Often there are very few obvious bottlenecks. Instead you have "death by a thousand cuts" - small inefficiencies spread evenly across the application accumulate to produce an absolute bloatware.
 Crashes are much harder to track down. If I have to pick 
 between them, I take the performance pitfalls, as they are much 
 easier to fix. At least for your outlined `.retro.map.length` 
 snippets.
While, obviously correct programs are better than fast but incorrect programs, taking such a simplistic view would be dangerous mistake. When people who care (or are forced to care) about performance track-down a performance bug due to the a use of a high level language or library features, often their first instinct would be rewrite the code with more primitive constructs, which are easier to reason about wr.t. performance. One may think that say SQL injection is a solved problem, but they would be surprised by the number of cases where teams have replaced ORM frameworks in parts of their applications with raw SQL queries, just because either they didn't understand well the ORM framework, or the framework itself was at fault. In the case of iterator / range libraries, would you rather people use raw loops rather than said libraries? Because, realistically if people find iterator / range libraries slow, or their API too constraining for them to express they're ideas, raw loops would be the "solution".
 That is not to say that I wouldn't want my libraries to be 
 fool-proof against that kind of problems, but at same time, at 
 least in my experience with D it hasn't been a big deal, so 
 I'd much rather have a powerful algorithmic API foundation 
 than prevent very useful and beautiful designs, just because 
 they can be misused.
Beauty is subjective, safety isn't.
From a math point of view, safety/correctness may be a binary property, however in practice every type of bug has a different probability and impact. Using D ranges compared to raw loops greatly increases my ability to reason about the code, diminishes amount of bugs it has and also makes it easier to optimize, because it is easy to experiment with alternative approaches. If the language or ecosystem forced a less expressive iterator abstraction on me, most likely I would have to resort to raw loops more often and in turn have a greater probability bugs in my code.
 We're slowly catching up with our support for "view types".
Interesting, can you elaborate more?
Wrong forum for that.
Why? The topic is comparing iterator abstractions in different languages. Where do you draw line?
Jun 15
prev sibling parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Monday, 14 June 2021 at 12:33:37 UTC, IGotD- wrote:
 On Monday, 14 June 2021 at 01:48:18 UTC, Friendly Neighborhood 
 Weirdo wrote:
 This is a talk from the 2021 c++now conference: 
 https://www.youtube.com/watch?v=d3qY4dZ2r4w
Interesting video. There is something that I'm kind of confused about. The narrator in the video claims that in D popFront doesn't consume the element. Is this true? When I do a popFront, doesn't it consume the first element in an array for example. Can someone clarify this?
In D, `.popFront()` advances the range to the next element, just like `++it` advances a C++ iterator. I think the presenter makes a point that D's `popFront` doesn't consume, because in C++'s STL the convention is to have `pop_{front,back}` as names of container member functions that indeed "consume" (remove, erase) the first/last element, i.e. the destructor of that element will be called. So it's not an insight w.r.t D's design, but rather a clarification for the C++ audience which presumably expects `popFront` to be method on a container and not a range (view) of the container. That said, the whether `poprFront` "consumes" is dependent on type of range. Most Phobos algorithms like [`map`](https://dlang.org/phobos/std_algorithm_iteration#map) and [`filter`](https://dlang.org/phobos/std_algorithm_iteration#filter) are range adaptors which won't change the structure of the container they're iterating (e.g. if the underlying range is a linked list, they can't really unlink elements from it). However, some range types are actually "containers" themselves, and calling `popFront` on them would indeed consume their current element. For example: * [`iota`](https://dlang.org/phobos/std_range#iota) * [`generate`](https://dlang.org/phobos/std_range#generate) * [`only`](https://dlang.org/phobos/std_range#only) * [`recurrence`](https://dlang.org/phobos/std_range#.Recurrence) * [`sequence`](https://dlang.org/phobos/std_range#.Sequence) * RNG engines in [`std.random`](https://dlang.org/phobos/std_random)
 Also in Rust, next does consume the element. However, in Rust 
 you can iterate over references (which also automatically 
 borrows). So iterating over borrowed references does not 
 consume the underlying data structure.
I don't have much Rust experience, but my understanding is that Rust's `std::iter::Iterator.next(&mut self) -> Option<Self::Item>` design leans into its move-by-default, single mutable reference nature. I don't know if D's range design would be ergonomic or even admissible in a purely non-`unsafe` library in Rust. Seems like Rust's advantage is that is solves the iterator invalidation problem at the cost of more limited algorithmic expressiveness and generic power. Perhaps their "efficient, yet safe at all costs" design guideline basically paints them into a corner by essentially dictating this design. (I'm just speculating, I'm interested to to hear from people more experienced with Rust!)
Jun 14
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/13/21 9:48 PM, Friendly Neighborhood Weirdo wrote:
 This is a talk from the 2021 c++now conference: 
 https://www.youtube.com/watch?v=d3qY4dZ2r4w
The talk was really interesting and engaging. Thanks for sharing! One thing I'll note, he said for C++ `ranges::search`, there are 6 pairs of iterators that can be used: first .. last first .. mid.begin() first .. mid.end() mid.begin() .. mid.end() mid.begin() .. last mid.end() .. last But in reality, there are 6 more: last .. first last .. mid.begin() ... // etc Of course those are invalid, but the C++ compiler will happily use them, and graciously allow memory corruption. And of course, that is the point of D using ranges instead of iterators (maybe he glossed over it because C++ developers take unsafety as a given). It got me thinking again about iterators in D and how that might look like. A lifetime ago I had a collection library (dcollections) that had both ranges and "iterators" that I called cursors. The benefit of a cursor is it could refer to exactly one element, and you could use them to compose ranges from the original. In dcollections, when you did `container.find` you got a cursor, and you can use the collection to construct a range using 2 cursors, or remove a specific element using the cursor. I find this functionality extremely useful, and it sucks that D doesn't have something formal for this situation. He is right that composing ranges from the result of algorithms is needlessly complex in D. I wonder if there is room for hobbled iterators (cursors) to be in D in some capacity. -Steve
Jun 14
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 14:29:57 UTC, Steven Schveighoffer 
wrote:
 Of course those are invalid, but the C++ compiler will happily 
 use them, and graciously allow memory corruption. And of 
 course, that is the point of D using ranges instead of 
 iterators (maybe he glossed over it because C++ developers take 
 unsafety as a given).
Apples and oranges. Gang of Four defined _iterator_ to be what D and C++ call "ranges", basically a "heavy" object that has builtin knowledge of how to traverse a data-structure. Of course, the term "range" is a misnomer since you can traverse a graph in numerous ways, so you can have a depth-first-iterator, breadth-first-iterator, and indeed have iterators that to randomized walks... so iterator is a better name. C++ STL "iterators" are *light-weight* pointer-abstractions. You can create heavy C++ "iterators" that keeps track of where "the end" is and have it throw an exception. C++ does support safe iterators. STL does not.
Jun 14
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/14/21 10:46 AM, Ola Fosheim Grøstad wrote:
 On Monday, 14 June 2021 at 14:29:57 UTC, Steven Schveighoffer wrote:
 Of course those are invalid, but the C++ compiler will happily use 
 them, and graciously allow memory corruption. And of course, that is 
 the point of D using ranges instead of iterators (maybe he glossed 
 over it because C++ developers take unsafety as a given).
Apples and oranges. Gang of Four defined _iterator_ to be what D and C++ call "ranges", basically a "heavy" object that has builtin knowledge of how to traverse a data-structure. Of course, the term "range" is a misnomer since you can traverse a graph in numerous ways, so you can have a depth-first-iterator, breadth-first-iterator, and indeed have iterators that to randomized walks... so iterator is a better name. C++ STL "iterators" are *light-weight* pointer-abstractions. You can create heavy C++ "iterators" that keeps track of where "the end" is and have it throw an exception. C++ does support safe iterators. STL does not.
I'm referring to the talk, where you can use one function (search) to compose any possible "range" from the result that makes sense, but also ones that don't make sense. And D doesn't allow that specifically due to the safety implications. A C++ "safe" iterator that is essentially a range under the hood doesn't compose any better than D ranges. -Steve
Jun 14
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 15:12:56 UTC, Steven Schveighoffer 
wrote:
 I'm referring to the talk, where you can use one function 
 (search) to compose any possible "range" from the result that 
 makes sense, but also ones that don't make sense.
Ok, but some complicated algorithms are sometimes easier to express if you allow the ones that don't make sense (but does not happen).
 A C++ "safe" iterator that is essentially a range under the 
 hood doesn't compose any better than D ranges.
I don't think "C++ iterators" with safety checks essentially are ranges, they are essentially checked pointers. How would you quickly search an annotated suffix array with D ranges as cursors?
Jun 14
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 15:22:29 UTC, Ola Fosheim Grøstad 
wrote:
 How would you quickly search an annotated suffix array with D 
 ranges as cursors?
Let me expand on that: What if you want to do a binary search, but each node in the table has an optimization entry that says how many nodes you should move up or down when you hit that node? (For speeding up the search or some other reason).
Jun 14
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/14/21 11:22 AM, Ola Fosheim Grøstad wrote:
 On Monday, 14 June 2021 at 15:12:56 UTC, Steven Schveighoffer wrote:
 I'm referring to the talk, where you can use one function (search) to 
 compose any possible "range" from the result that makes sense, but 
 also ones that don't make sense.
Ok, but some complicated algorithms are sometimes easier to express if you allow the ones that don't make sense (but does not happen).
True, and matching iterators from separate subranges might be reasonable when they all come from an original container/range. It's still *mostly* possible with cursors, as long as you have a definitive parent range/container to use for range composition.
 
 A C++ "safe" iterator that is essentially a range under the hood 
 doesn't compose any better than D ranges.
I don't think "C++ iterators" with safety checks essentially are ranges, they are essentially checked pointers.
You'd have to restrict them to forward-only. Which may not be a huge issue, if you don't ever go backwards. Or else, also you'd have to store the beginning of the original as well as the end.
 
 How would you quickly search an annotated suffix array with D ranges as 
 cursors?
dcollections cursors are not iterators, they store an optional reference to one element, and also can tell you whether they belong to a container or not (they are actually ranges of zero or one element, but that's mostly only because it was easy to support). So you can compose ranges using the original container and the cursors. In terms of std::search (which is not the style that dcollections supported), you would use it just like C++, but of course, you have to compose the range before passing to a D algorithm which requires ranges. Like: ```d auto allEqualRange = container.search(someValue); auto allBeforeRange = container[container.begin .. mid.begin]; // I wish container.begin could be `^` auto allAfterRange = container[mid.end .. $]; auto beforeAndEqual = container[container.begin .. mid.end]; auto equalAndAfter = container[mid.begin .. $]; ``` Dcollections supported a `find` method, which returned a cursor. Then you can compose your ranges with that. Possibly only redblacktree-based containers with allowed duplicates would match the requirements for `search`. -Steve
Jun 14
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 15:43:47 UTC, Steven Schveighoffer 
wrote:
 On 6/14/21 11:22 AM, Ola Fosheim Grøstad wrote:
 I don't think "C++ iterators" with safety checks essentially 
 are ranges, they are essentially checked pointers.
You'd have to restrict them to forward-only. Which may not be a huge issue, if you don't ever go backwards. Or else, also you'd have to store the beginning of the original as well as the end.
You only need to store a pointer to the container, but it doesn't matter if the datastructure you are looking at only provide offsets that stays within the legal range. Like if you do a binary search on a sorted dictionary and each element/node has offsets that expands to the range of items that starts with the same prefiks. If we think of random access ranges (e.g. C++) as slices, you can implement binary search with elegance, but what to do if you want move out of the range? Then you end up writing cluttered code. At this point, table pointers ("C++ iterators") make the code less cluttered. So, there is an advantage to "C++ ranges" that can be decomposed into table pointers.
 dcollections cursors are not iterators, they store an optional 
 reference to one element, and also can tell you whether they 
 belong to a container or not (they are actually ranges of zero 
 or one element, but that's mostly only because it was easy to 
 support).

 So you can compose ranges using the original container and the 
 cursors.
Yes, that makes it more flexible. So a zero-length range is a pointer to the space between two elements, and a one-length range is a pointer to that one element? That is kinda like "C++ iterators". C++ begin() could be seen as pointing to the space before the first element and end() could be seen as pointing the space after the last element. So in a sense, "C++ iterators" are zero-length "ranges". Of course, usually begin() is described as pointing to the first element, but that is a matter of wording. It actually makes more sense sometimes to think of "C++ iterators" as being pointers to the space between elements.
Jun 14
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 16:02:18 UTC, Ola Fosheim Grøstad 
wrote:
 cluttered code. At this point, table pointers ("C++ iterators") 
 make the code less cluttered. So, there is an advantage to "C++ 
 ranges" that can be decomposed into table pointers.
Of course, the _big disadvantage_ with "C++ iterators" as specc'ed is that it is time-consuming to implement all the requirements. Can easily be 400 lines or more. Compare this to a Python Generator based on coroutines with ```yield``` or C++ stack-less coroutines with ```co_yield```...
Jun 14
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/14/21 12:02 PM, Ola Fosheim Grøstad wrote:
 On Monday, 14 June 2021 at 15:43:47 UTC, Steven Schveighoffer wrote:
 On 6/14/21 11:22 AM, Ola Fosheim Grøstad wrote:
 dcollections cursors are not iterators, they store an optional 
 reference to one element, and also can tell you whether they belong to 
 a container or not (they are actually ranges of zero or one element, 
 but that's mostly only because it was easy to support).

 So you can compose ranges using the original container and the cursors.
Yes, that makes it more flexible. So a zero-length range is a pointer to the space between two elements, and a one-length range is a pointer to that one element? That is kinda like "C++ iterators". C++ begin() could be seen as pointing to the space before the first element and end() could be seen as pointing the space after the last element. So in a sense, "C++ iterators" are zero-length "ranges".
Yes, this is how dcollections worked. Except dcollections cursors were slightly more tied to an element than C++ iterators. In fact, re-examining the implementation, I made some mistakes which I never addressed. My concept was that a cursor points either an element or *one past* the element. So popFront'ing a cursor doesn't move past the element to the next element, it moves to the space beyond the element. The awesome thing this enables is that you have a choice of both before and after the element, without having to refer to the next element at all (and things like rearranging or inserting elements do not affect the cursor's viability). However, in particular the red-black-tree based sets/maps just used the Node pointer as-is for slicing, regardless of whether it was empty or not. Which is very much not correct according to the concept. But I haven't touched the code or used it in over a decade, so I'm not sure if I'll ever fix that.
 
 Of course, usually begin() is described as pointing to the first 
 element, but that is a matter of wording. It actually makes more sense 
 sometimes to think of "C++ iterators" as being pointers to the space 
 between elements.
 
Yes, it's pointing at the boundaries. That's what makes them composable. -Steve
Jun 14
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 16:29:09 UTC, Steven Schveighoffer 
wrote:
 However, in particular the red-black-tree based sets/maps just 
 used the Node pointer as-is for slicing, regardless of whether 
 it was empty or not. Which is very much not correct according 
 to the concept. But I haven't touched the code or used it in 
 over a decade, so I'm not sure if I'll ever fix that.
Was that because it was harder to do for red-black-trees? I have been toying with chunked containers that allows fast transfers of ranges of objects from one container type to another. The basic idea being that you could have a queue and then quickly transfer the whole queue to a B-tree or maybe the first 100 objects of the queue to the middle of a deque. But it is challenging to come up with a chunk type that works equally well for many container types. So I limit it to sequential container types. Then have links between the chunks (so basically a linked list of small arrays). That way I can have one iterator type that works for all container types. I would also like to make it possible to have table pointers that doesn't become invalidated when moving a chunk, but how to do this if they point to the space between elements and not the element itself? Like if it points to the space between two chunks and you move the chunk after to a new container then the table pointer would move to the new container as well (or the implementation would be inefficient). Seems like the semantics of pointing to elements instead of the space between elements is more robust for this kind of container semantics. So, when you mutate then the distinction between pointing to or between elements become significant. Also, it then becomes important to figure out whether table pointers point to one specific container or the content and how the table pointers can be implemented efficiently (time and space).
Jun 14
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/14/21 1:00 PM, Ola Fosheim Grøstad wrote:
 On Monday, 14 June 2021 at 16:29:09 UTC, Steven Schveighoffer wrote:
 However, in particular the red-black-tree based sets/maps just used 
 the Node pointer as-is for slicing, regardless of whether it was empty 
 or not. Which is very much not correct according to the concept. But I 
 haven't touched the code or used it in over a decade, so I'm not sure 
 if I'll ever fix that.
Was that because it was harder to do for red-black-trees?
No, I think it was just a mistake of design. The end node in a RBT is always the root (everything is based on a left child), which makes it O(1) always to fetch it. But it's not a true element, there is no data stored there. So the easiest thing to do was to return a cursor with `end` as the node, and `false` for whether the cursor had data. Which probably lead to just ignoring the `hasData` boolean. But it should have been basically `cursor(_end.prev, false)` for the result of `end()`, and the slice operator would have to take into account which "space" a cursor was pointing at.
 
 I have been toying with chunked containers that allows fast transfers of 
 ranges of objects from one container type to another.
 
 The basic idea being that you could have a queue and then quickly 
 transfer the whole queue to a B-tree or maybe the first 100 objects of 
 the queue to the middle of a deque.
 
 But it is challenging to come up with a chunk type that works equally 
 well for many container types. So I limit it to sequential container 
 types. Then have links between the chunks (so basically a linked list of 
 small arrays). That way I can have one iterator type that works for all 
 container types.
Oh for sure. Generalizing the relationships that nodes have would be pretty difficult. Trees need 2 pointers at least (possibly a parent pointer as well), whereas a linked list only needs one (if it's not bidirectional).
 I would also like to make it possible to have table pointers that 
 doesn't become invalidated when moving a chunk, but how to do this if 
 they point to the space between elements and not the element itself? 
 Like if it points to the space between two chunks and you move the chunk 
 after to a new container then the table pointer would move to the new 
 container as well (or the implementation would be inefficient).
 
 Seems like the semantics of pointing to elements instead of the space 
 between elements is more robust for this kind of container semantics. 
 So, when you mutate then the distinction between pointing to or between 
 elements become significant.
Unless you allocate nodes for the actual spaces, you have to deal with this kind of stuff. I felt like pointing at an element, and then storing whether that data is valid or not was a pretty reasonable way to have element pointers without safety problems. Thinking about it some more, it may have been a conscious decision not to care about the "valid" boolean at first, since I wasn't focused on implementing the cursors as ranges at all. But if I did it again today, this is exactly what I would do. My original use case for making dcollections was to have a fast-lookup map (using RBTree), and be able to remove items as I iterate it the tree. Using ranges for this just doesn't make sense in the general case, but using cursors works well. -Steve
Jun 14
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 17:19:35 UTC, Steven Schveighoffer 
wrote:
 Oh for sure. Generalizing the relationships that nodes have 
 would be pretty difficult. Trees need 2 pointers at least 
 (possibly a parent pointer as well), whereas a linked list only 
 needs one (if it's not bidirectional).
Yes, my idea is to keep the chunk-header within two cache lines (or one), but there should be auxillary space there for the container as well. So when you insert a chunk-list into a new container there is meta-info about the nature of the data (are all chunks filled up, data sorted etc), so the container can then decide whether to run over the chunks and "fix" the structure/invariants. So if you take a singled linked list and insert it into a double linked one with a hash index, then the double linked list will run over it and update backpointers and stuff in key-hash-values. However, in my next attempt at this I will try to make it SIMD friendly, I think. So using a struct-of-arrays with maybe a bitmask that says whether a slot is occupied. So each chunk would have a 32, 64 or 256 element capacity or something like that. Then maybe a table pointer would be a pointer to the chunk + a bitmask? But then it makes more sense to use a ranges approach. So you point to first chunk with bitmask and the last chunk with bitmask. So pointing to the end would be pointing to the last chunk with a zero-bitmask? Not sure… :-D It is funny how implementation choices for containers changes how one thinks about table pointers/iterators/ranges.
 Unless you allocate nodes for the actual spaces, you have to 
 deal with this kind of stuff. I felt like pointing at an 
 element, and then storing whether that data is valid or not was 
 a pretty reasonable way to have element pointers without safety 
 problems.
Yes, but how does it affect efficiency? It really depends on how they are used, for instance, if you provide all the algorithms yourself, then inefficent table-pointers/cursors can be turned into internal raw pointers and then it does not matter as much (as they only are begin/end markers that are mutated infrequently).
 My original use case for making dcollections was to have a 
 fast-lookup map (using RBTree), and be able to remove items as 
 I iterate it the tree. Using ranges for this just doesn't make 
 sense in the general case, but using cursors works well.
Yes. I am concerned about how a struct-of-arrays datastructure approach affects these kind of considerations. Of course, there is also the question of whether the LDC/GDC backend are able to compile to bitmasked SIMD instructions (bitmasks that says whether an element is present or not)?? I wonder if that is at all possible or if I will have to write explicit SIMD code?
Jun 14
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 14 June 2021 at 17:49:46 UTC, Ola Fosheim Grøstad 
wrote:
 Then maybe a table pointer would be a pointer to the chunk + a 
 bitmask? But then it makes more sense to use a ranges approach. 
 So you point to first chunk with bitmask and the last chunk 
 with bitmask. So pointing to the end would be pointing to the 
 last chunk with a zero-bitmask? Not sure… :-D
No. I think I cannot have empty ranges to signify the beginning or end. Since there might be only one chunk in the container. So then there would be an ambiguity as to whether I point to the empty space at the beginning or the end. So probably better to have two types: bitmasked ranges, and explicitly indexed table pointers. Perhaps?
Jun 14
prev sibling next sibling parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 6/14/21 7:29 AM, Steven Schveighoffer wrote:

 But in reality, there are 6 more:

 last .. first
 last .. mid.begin()
 .... // etc

 Of course those are invalid, but the C++ compiler will happily use them
Add to that pairs of iterators made from different containers. :) last_of_apples .. first_of_oranges Ali
Jun 14
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/14/2021 7:29 AM, Steven Schveighoffer wrote:
 I wonder if there is room for hobbled iterators (cursors) to be in D in some 
 capacity.
Have them be an index, rather than a pointer.
Jun 15
next sibling parent Dukc <ajieskola gmail.com> writes:
On Tuesday, 15 June 2021 at 18:35:04 UTC, Walter Bright wrote:
 On 6/14/2021 7:29 AM, Steven Schveighoffer wrote:
 I wonder if there is room for hobbled iterators (cursors) to 
 be in D in some capacity.
Have them be an index, rather than a pointer.
This is a good generic solution for random access ranges. But iterators are about any forward range. For a forward range, I think something like this does it albeit verbosely: ``` auto midAndTail = haystack.find(/*...*/); auto head = haystack.recurrence!((r,n)=>r[n-1].dropOne).until!"a is b"(midAndTail).map!"a.front"; ``` But there is currently no way for a bidirectional range to iterate forward and then turn back. We could agree on a generic range primitive, say `turned`, that returns a range that iterates back to start that is optional. But then the bidirectional range that supports it will always be bigger because it needs to remember it's start. Hard question.
Jun 16
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/15/21 2:35 PM, Walter Bright wrote:
 On 6/14/2021 7:29 AM, Steven Schveighoffer wrote:
 I wonder if there is room for hobbled iterators (cursors) to be in D 
 in some capacity.
Have them be an index, rather than a pointer.
How does this work on a linked list? -Steve
Jun 16
parent reply Lorenso <lorensbrawns gmail.com> writes:
I also have the same problems in C++
Jun 19
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 19 June 2021 at 12:38:42 UTC, Lorenso wrote:
 I also have the same problems in C++
Why?
Jun 20
prev sibling parent reply Guillaume Piolat <first.name domain.tld> writes:
On Monday, 14 June 2021 at 01:48:18 UTC, Friendly Neighborhood 
Weirdo wrote:
 This is a talk from the 2021 c++now conference: 
 https://www.youtube.com/watch?v=d3qY4dZ2r4w
This is an interesting talk that is more a comparison of iteration abstractions than a language comparison, across 6 different abstractions. It's a healthy reminder of just how complicated everything is in C++, often for no good reason. -
Jun 16
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 16 June 2021 at 10:53:26 UTC, Guillaume Piolat 
wrote:
 It's a healthy reminder of just how complicated everything is 
 in C++, often for no good reason.
That statement is true for some things, but not really in this case. He focused on STL style table-pointers. Which is a different concept. In modern C++ you have ranges and generators. C++ generators are easier to write than D ranges, not much more difficult than generators in Python.
Jun 16
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 16 June 2021 at 11:08:52 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 16 June 2021 at 10:53:26 UTC, Guillaume Piolat 
 wrote:
 It's a healthy reminder of just how complicated everything is 
 in C++, often for no good reason.
That statement is true for some things, but not really in this case. He focused on STL style table-pointers. Which is a different concept.
Keep in mind that you don't have to implement the full "c++ iterator" in order to support ranged for-loops in modern C++. You only need to implement: ``` auto b = range.begin() // obtain progress state object auto e = range.end() // obtain end-marker b != e // comparable to D empty() ++b // comparable to D popFront() *b // comparable to D front() ``` So it is basically not much more work to support ranged for in C++ than D, but most C++ library authors would implement full table-pointers and then it gets more time consuming. For an application then there is no need to implement more than you need, obviously.
Jun 16
prev sibling parent Guillaume Piolat <first.name domain.tld> writes:
On Wednesday, 16 June 2021 at 10:53:26 UTC, Guillaume Piolat 
wrote:
 It's a healthy reminder of just how complicated everything is 
 in C++, often for no good reason.
Size not intended.
Jun 16