digitalmars.D - Transient ranges
- Seb (11/39) May 27 2016 A couple of weeks ago on the next/shift convenience wrapper
- Jonathan M Davis via Digitalmars-d (19/22) May 27 2016 Honestly, I don't think that supporting transient ranges is worth it. Ev...
- cym13 (6/31) May 28 2016 Agreed, any time I see « you're code is wrong, byLine reuses its
- Joseph Rushton Wakeling (6/12) May 28 2016 I have personally wondered if there was a case for a
- Stefan Koch (3/17) May 28 2016 That is a rather sound idea.
- Seb (5/24) May 28 2016 I like this idea too!
- Brad Roberts via Digitalmars-d (5/15) May 28 2016 Um. that's wrong, just look at byLine as evidence. It is popFront that
- Joseph Rushton Wakeling (5/31) May 28 2016 I'm not talking about how byLine currently behaves. I'm talking
- Dicebot (10/22) May 29 2016 I would prefer such ranges to not have `front` and return new item from
- ZombineDev (10/37) May 29 2016 +1
- ZombineDev (7/46) May 29 2016 This won't work safely, because the compiler would need to
- default0 (16/64) May 29 2016 Isnt the idea to communicate "hey! You may not possibly cache
- Seb (33/69) May 29 2016 I don't think that should be a huge problem, but after having
- Joseph Rushton Wakeling (7/25) May 29 2016 I don't follow your reasoning here. In the proposal I put
- Stefan Koch (4/5) May 29 2016 There is none :)
- Seb (12/38) May 29 2016 Nothing. Just that it could lead to a lot of surprising mistakes,
- ZombineDev (14/40) May 29 2016 Yes it can be introspected in library, but it breaks the
- Steven Schveighoffer (10/40) May 29 2016 This doesn't help at all. I can still make a "transient" range with all
- Joseph Rushton Wakeling (13/23) May 30 2016 Ack, yea, I think see the issue. It's
- Andrei Alexandrescu (5/22) May 30 2016 That's right. The one approach that works here is to acknowledge the
- Steven Schveighoffer (8/35) May 30 2016 Yes, this is how e.g. Linux getline works. But this leaks even more
- Steven Schveighoffer (7/27) May 29 2016 What problems are solvable only by not caching the front element? I
- Dicebot (11/16) May 30 2016 As far as I know, currently it is done mostly for performance
- Steven Schveighoffer (9/23) May 30 2016 Maybe my understanding of your post is incorrect. You said "It is
- Alex Parrill (18/39) May 30 2016 One case is wrapping a network stream: a TCP input range that
- Steven Schveighoffer (18/35) May 31 2016 This is not the same issue. Note that populating on call to front
- Dicebot (26/55) May 31 2016 There are two separate points here:
- Steven Schveighoffer (11/60) May 31 2016 Regardless of property-like or not, this should be the case. Otherwise,
- Dicebot (15/40) May 31 2016 Except it isn't in many cases you call "bugs" :(
- Timon Gehr (2/6) May 31 2016 map often allows random access. Do you suggest it should cache opIndex t...
- Dicebot (3/15) May 31 2016 Random access map must store all already evaluated items in
- Steven Schveighoffer (15/51) May 31 2016 If you want to use such "ranges", the compiler will not stop you. Just
- Dicebot (17/26) Jun 03 2016 It only strengthens my opinion that Phobos is not a standard
- Andrei Alexandrescu (7/9) Jun 03 2016 FYI that was on the table, along with many other possible designs.
- Dicebot (7/17) Jun 03 2016 Has it actually been measured as inefficient? Ranges are very
- Steven Schveighoffer (22/44) Jun 03 2016 This doesn't solve the problem. empty may not be knowable until you try
- Dicebot (21/55) Jun 12 2016 Not necessarily, one can simply define that range processing
- Steven Schveighoffer (34/69) Jun 13 2016 So what it seems like you want is somerange.map(...).front to evaluate
- John Colvin (6/31) Jun 13 2016 I see three levels of function used with map:
- Andrei Alexandrescu (3/3) Jul 06 2016 Related: just made a few comments to
- Dicebot (12/18) Jun 14 2016 I think you are completely right here, though it is not very
- Andrei Alexandrescu (11/20) Jun 13 2016 C++'s input iterator model also predicates multiple accesses to the
- Dicebot (31/48) Jun 14 2016 Yeah, I don't say C++ iterator model is any better for that kind
- Steven Schveighoffer (16/39) May 29 2016 Wholly disagree. If we didn't cache the element, D would be a
- Jonathan M Davis via Digitalmars-d (51/93) May 29 2016 Having byLine not copy its buffer is fine. Having it be a range is not.
- Steven Schveighoffer (23/103) May 30 2016 Here is how I think about it: the front element is valid and stable
- Andrei Alexandrescu (3/5) May 30 2016 Yah, we should have formal language in the library reference about this....
- H. S. Teoh via Digitalmars-d (52/75) May 30 2016 Yes, some years ago I submitted a series of PRs to fix poorly-written
- Jonathan M Davis via Digitalmars-d (21/27) May 31 2016 Technically, the range API doesn't even require that front return the sa...
- Steven Schveighoffer (29/57) May 31 2016 The API doesn't require it mechanically, but the API does require it
- Joseph Rushton Wakeling (5/28) Jun 01 2016 The `Generator` range is an eager violator of this requirement:
- Steven Schveighoffer (3/28) Jun 01 2016 Yeah, that seems like a bug.
- Steven Schveighoffer (7/15) Jun 02 2016 I guess this controversy was hashed out when this function was added. So...
- Jack Stouffer (10/12) May 29 2016 byLine already is a laughingstock performance wise:
- Steven Schveighoffer (8/19) May 30 2016 But that's because we base our I/O on C :)
- Andrei Alexandrescu (2/3) May 30 2016 Cool! What is the trick to the speedup? -- Andrei
- Steven Schveighoffer (3/6) May 30 2016 Direct buffer access and not using FILE * as a base.
- Andrei Alexandrescu (10/16) May 30 2016 So what primitives do you use? The lower-level descriptor-based
- Steven Schveighoffer (41/58) May 31 2016 Yes. That is not the focus of my library, though. I have a very crude
- Patrick Schluter (7/12) Jun 01 2016 You can cheat by using setvbuf() and imposing your own buffer to
- Steven Schveighoffer (5/17) Jun 01 2016 But there is no mechanism to determine where the current file pointer is...
- Andrei Alexandrescu (6/13) May 30 2016 That has been fixed a long time ago by
- Steven Schveighoffer (8/10) May 29 2016 enum isTransient(R) = is(typeof(() {
- Steven Schveighoffer (5/14) May 29 2016 obviously, this is better as a simple && statement between the three
- default0 (3/21) May 29 2016 Would that make a range of polymorphic objects transient?
- Jonathan M Davis via Digitalmars-d (22/46) May 29 2016 It would make pretty much anything that isn't a value type - including a
- Steven Schveighoffer (4/23) May 30 2016 Of course! If it's mutable (or marked const), it can change from call to...
- Alex Parrill (6/17) May 29 2016 allIndrectionsImmutable could probably just be is(T : immutable)
- Steven Schveighoffer (3/20) May 30 2016 Yes, I think that is correct. Thanks!
- =?UTF-8?B?Tm9yZGzDtnc=?= (15/18) May 30 2016 An alternative solution is to extend data-flow/escape-analysis to
A couple of weeks ago on the next/shift convenience wrapper discussion [1], there was a nice discussion about transient ranges. It didn't come to a conclusion, so let's revie it. I just cite the best three quotes from the thread as a summary: jmdavis:The reality of the matter is that ranges with a transient front tend to break if you do much more than use foreach on them. They don't even work with std.array.arrayschveiguy:There is an implicit expectation that if you assign the front of a range to something, it's going to last forever, because many ranges do actually behave that way. But the range contract does not guarantee that, it just happens to work.quickfur:isTransient can only be implemented if the range implementor defines its value. There is no way to know whether a range is transient or not just by looking at its type and the signatures of front, popFront, etc.. Having said that, though, since transient ranges are generally the exception rather than the norm, we could adopt a convention that transient ranges declare themselves thus by defining a .transient enum member or something like that, and have isTransient return false if a range doesn't have such a member.On a related note, there are a good number of Phobos algorithms that do not work on transient ranges at all, because internally they (blindly) assume that assigning the value of .front to a local variable will retain its original value after popFront is called. I've fixed a couple of places where this happens, but I'm almost certain many other places haven't been fixed yet, and in some cases, cannot be fixed without restricting the algorithms to forward ranges only rather than input ranges. Some things simply cannot be implemented without copying .front somehow, and the only way to do this reliably with the current API is to require forward ranges and use save to keep track of the previous position of the range.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change? [1] https://github.com/dlang/phobos/pull/4010
May 27 2016
On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it. Every single range-based function would have to either test that the "transient" enum wasn't there or take transient ranges into account, and realistically, that isn't going to happen. For better or worse, we do have byLine in std.stdio, which has a transient front, but aside from the performance benefits, it's been a disaster. It's way too error-prone. We now have byLineCopy to combat that, but of course, byLine is the more obvious function and thus more likely to be used (plus it's been around longer), so a _lot_ of code is going to end up using it, and a good chunk of that code really should be using byLineCopy. I'm of the opinion that if you want a transient front, you should just use opApply and skip ranges entirely. Allowing for front to be transient - whether you can check for it or not - simply is not worth the extra complications. I'd love it if we deprecated byLine's range functions, and made it use opApply instead and just declare transient ranges to be completely unsupported. If you want to write your code to have a transient front, you can obviously take that risk, but you're on your own. - Jonathan M Davis
May 27 2016
On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:Agreed, any time I see « you're code is wrong, byLine reuses its buffer you should use byLineCopy instead » I cringe as it means that people are supposed to know implementation details. It's obviously a leaky abstraction. While its speed makes it useful nonetheless I don't think such code should be encouraged.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it. Every single range-based function would have to either test that the "transient" enum wasn't there or take transient ranges into account, and realistically, that isn't going to happen. For better or worse, we do have byLine in std.stdio, which has a transient front, but aside from the performance benefits, it's been a disaster. It's way too error-prone. We now have byLineCopy to combat that, but of course, byLine is the more obvious function and thus more likely to be used (plus it's been around longer), so a _lot_ of code is going to end up using it, and a good chunk of that code really should be using byLineCopy. I'm of the opinion that if you want a transient front, you should just use opApply and skip ranges entirely. Allowing for front to be transient - whether you can check for it or not - simply is not worth the extra complications. I'd love it if we deprecated byLine's range functions, and made it use opApply instead and just declare transient ranges to be completely unsupported. If you want to write your code to have a transient front, you can obviously take that risk, but you're on your own. - Jonathan M Davis
May 28 2016
On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 28 2016
On Saturday, 28 May 2016 at 17:27:17 UTC, Joseph Rushton Wakeling wrote:On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:That is a rather sound idea.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 28 2016
On Saturday, 28 May 2016 at 19:09:09 UTC, Stefan Koch wrote:On Saturday, 28 May 2016 at 17:27:17 UTC, Joseph Rushton Wakeling wrote:I like this idea too! `opApply` implies passing in a delegate which (if I am not mistaken) implies the use of GC at the moment and isn't as easy to optimize as `while (!file.empty) { writeln(file.front) }`On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:That is a rather sound idea.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 28 2016
On 5/28/2016 10:27 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:Um. that's wrong, just look at byLine as evidence. It is popFront that changes the value to the 'next' one in the iteration. Front called twice in a row does NOT change the value, and shouldn't. I think you're misunderstanding something.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 28 2016
On Saturday, 28 May 2016 at 21:32:15 UTC, Brad Roberts wrote:On 5/28/2016 10:27 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:I'm not talking about how byLine currently behaves. I'm talking about a meaningful API that could be used for a range that (by design) has a transient front, that would be unambiguous in its behaviour.On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:Um. that's wrong, just look at byLine as evidence. It is popFront that changes the value to the 'next' one in the iteration. Front called twice in a row does NOT change the value, and shouldn't. I think you're misunderstanding something.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 28 2016
On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once. I believe this is biggest issue in D ranges design right now, by large margin.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 29 2016
On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:+1 I think making popFront return a value for transient ranges is a sound idea. It would allow to easily distinguish between InputRange and TransientRange with very simple CT introspection. The biggest blocker is to teach the compiler to recognize TransientRange types in foreach. Another option is to make popFront return a new range, ala slice[1..$] (like std.range.dropOne) which would have the benefit of allowing const/immutable ranges to work.On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once. I believe this is biggest issue in D ranges design right now, by large margin.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 29 2016
On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:Scratch that:On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:+1 I think making popFront return a value for transient ranges is a sound idea. It would allow to easily distinguish between InputRange and TransientRange with very simple CT introspection. The biggest blocker is to teach the compiler to recognize TransientRange types in foreach.On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once. I believe this is biggest issue in D ranges design right now, by large margin.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.Another option is to make popFront return a new range, ala slice[1..$] (like std.range.dropOne) which would have the benefit of allowing const/immutable ranges to work.This won't work safely, because the compiler would need to disallow access to the previous instance of the range (sort of Rust moved-from objects), but it's currently no possible. I proposed that idea because I have other uses for immutable ranges, unrelated to this discussion.
May 29 2016
On Sunday, 29 May 2016 at 11:36:37 UTC, ZombineDev wrote:On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:Isnt the idea to communicate "hey! You may not possibly cache whatever value you get in an iteration"? If yes, just removing .front seems odd as, well, I can still obviously just cache the result of .popFront if I'm inexperienced with ranges. Maybe instead simply rename .front to .buffer for transient ranges? That name would more accurately convey that this is just a buffer the range uses to dump the current front value in, but that if you want to cache it you will need to duplicate it (because, hey, this is the buffer of the range, not yours). Not sure this is a great idea or a great name, but simply removing .front and returning the current iteration value from .popFront imho does not convey "you should not expect to be able to simply cache this by assignment" to me - arguably this is not a problem since how transient ranges work would be a documented thing, but thats just my 2 cents.On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:Scratch that:On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:+1 I think making popFront return a value for transient ranges is a sound idea. It would allow to easily distinguish between InputRange and TransientRange with very simple CT introspection. The biggest blocker is to teach the compiler to recognize TransientRange types in foreach.On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once. I believe this is biggest issue in D ranges design right now, by large margin.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.Another option is to make popFront return a new range, ala slice[1..$] (like std.range.dropOne) which would have the benefit of allowing const/immutable ranges to work.This won't work safely, because the compiler would need to disallow access to the previous instance of the range (sort of Rust moved-from objects), but it's currently no possible. I proposed that idea because I have other uses for immutable ranges, unrelated to this discussion.
May 29 2016
On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:I don't think that should be a huge problem, but after having looked at the compiler code [1]: we should name it neither front nor popFront, because how would the compiler know that it is supposed to be transient and not a normal InputRange without front or popFront for which it should throw an error? Idea 1: New name that will make it easier to distinguish that transient ranges are something completly different to normal ranges. How about next? Problem 1: One can't use algorithms that work on transient ranges (map, reduce) anymore Idea 2: Help the compiler with Transient or `enum transient = true` Problem 2: How would the "transientivity" be automatically forwarded to ranges that work on it. Btw thinking longer about it - transient ranges aren't bad per se. They objey the InputRange contract and e.g. the following works just fine. It's just impossible to distinguish between a transient and a non-transient InputRange. ``` // input: 1\n2\n\3\n4\n... void main() { import std.stdio, std.conv, std.algorithm; stdin .byLine .map!((a) => a.to!int) .sum .writeln; } ``` https://github.com/dlang/dmd/blob/master/src/statement.d#L2596On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:+1 I think making popFront return a value for transient ranges is a sound idea. It would allow to easily distinguish between InputRange and TransientRange with very simple CT introspection. The biggest blocker is to teach the compiler to recognize TransientRange types in foreach.On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once. I believe this is biggest issue in D ranges design right now, by large margin.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 29 2016
On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:I don't follow your reasoning here. In the proposal I put forward, if a range doesn't define `popFront()`, it's not an InputRange, it's a TransientRange. Conversely, if it _does_ define `popFront()`, it _is_ an InputRange. What's the problem with introspecting that?I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once. I believe this is biggest issue in D ranges design right now, by large margin.+1 I think making popFront return a value for transient ranges is a sound idea. It would allow to easily distinguish between InputRange and TransientRange with very simple CT introspection. The biggest blocker is to teach the compiler to recognize TransientRange types in foreach.
May 29 2016
On Sunday, 29 May 2016 at 15:45:14 UTC, Joseph Rushton Wakeling wrote:What's the problem with introspecting that?There is none :) it could be implemented today.
May 29 2016
On Sunday, 29 May 2016 at 15:45:14 UTC, Joseph Rushton Wakeling wrote:On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:Nothing. Just that it could lead to a lot of surprising mistakes, currently the following yields an error: struct A { auto front(){ ...} enum empty = false; } A as; foreach (a; as) {} // ERROR: no method popFront found with the proposed change it would compile.On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:I don't follow your reasoning here. In the proposal I put forward, if a range doesn't define `popFront()`, it's not an InputRange, it's a TransientRange. Conversely, if it _does_ define `popFront()`, it _is_ an InputRange. What's the problem with introspecting that?I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once. I believe this is biggest issue in D ranges design right now, by large margin.+1 I think making popFront return a value for transient ranges is a sound idea. It would allow to easily distinguish between InputRange and TransientRange with very simple CT introspection. The biggest blocker is to teach the compiler to recognize TransientRange types in foreach.
May 29 2016
On Sunday, 29 May 2016 at 15:45:14 UTC, Joseph Rushton Wakeling wrote:On Sunday, 29 May 2016 at 11:28:11 UTC, ZombineDev wrote:Yes it can be introspected in library, but it breaks the expectations of the unsuspecting user. For example: auto r = getSomeRange(); assert (r.front == r.front); So, under your suggestion, the above could fail just because an implementation detail was changed like modifying some input range to become a transient range and somehow the code still compiled. Now I'm leaning more towards the buffer + popFront + empty trio, because it's harder to misuse. An alternative would be tagging such ranges with an enum or UDA, but we would need to verify/change too many places in phobos for this to work.On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:I don't follow your reasoning here. In the proposal I put forward, if a range doesn't define `popFront()`, it's not an InputRange, it's a TransientRange. Conversely, if it _does_ define `popFront()`, it _is_ an InputRange. What's the problem with introspecting that?I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once. I believe this is biggest issue in D ranges design right now, by large margin.+1 I think making popFront return a value for transient ranges is a sound idea. It would allow to easily distinguish between InputRange and TransientRange with very simple CT introspection. The biggest blocker is to teach the compiler to recognize TransientRange types in foreach.
May 29 2016
On 5/29/16 7:28 AM, ZombineDev wrote:On Sunday, 29 May 2016 at 11:15:19 UTC, Dicebot wrote:This doesn't help at all. I can still make a "transient" range with all three range primitives. There seems to be a misunderstanding about what a transient range is. byLine is a transient range that requires the front element be cacheable (I have to build the line somewhere, and reusing that buffer provides performance). Shoehorning into a popFront-only style "range" does not solve the problem. Not only that, but now I would have to cache BOTH the front element and the next one. -SteveOn 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:+1 I think making popFront return a value for transient ranges is a sound idea. It would allow to easily distinguish between InputRange and TransientRange with very simple CT introspection. The biggest blocker is to teach the compiler to recognize TransientRange types in foreach.On Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once. I believe this is biggest issue in D ranges design right now, by large margin.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 29 2016
On Sunday, 29 May 2016 at 17:29:35 UTC, Steven Schveighoffer wrote:This doesn't help at all. I can still make a "transient" range with all three range primitives. There seems to be a misunderstanding about what a transient range is. byLine is a transient range that requires the front element be cacheable (I have to build the line somewhere, and reusing that buffer provides performance). Shoehorning into a popFront-only style "range" does not solve the problem. Not only that, but now I would have to cache BOTH the front element and the next one.Ack, yea, I think see the issue. It's auto a = transientRange.front; transientRange.popFront() // <=== now a has changed Even if you go with the only-2-primitives one, you'd have the same problem: auto a = transientRange.front; auto b = transientRange.front; // <=== now a has changed My "empty + front only" idea was more inspired by the case e.g. where the range is wrapping some truly-random signal (like the current observed value of atmospheric noise, for example), it clearly doesn't solve this case.
May 30 2016
On 05/30/2016 08:21 AM, Joseph Rushton Wakeling wrote:On Sunday, 29 May 2016 at 17:29:35 UTC, Steven Schveighoffer wrote:That's right. The one approach that works here is to acknowledge the range has no buffering at all and require the user to provide the buffer as an argument. However, that would make for a different primitive that does not fit within the range hierarchy. -- AndreiThis doesn't help at all. I can still make a "transient" range with all three range primitives. There seems to be a misunderstanding about what a transient range is. byLine is a transient range that requires the front element be cacheable (I have to build the line somewhere, and reusing that buffer provides performance). Shoehorning into a popFront-only style "range" does not solve the problem. Not only that, but now I would have to cache BOTH the front element and the next one.Ack, yea, I think see the issue. It's auto a = transientRange.front; transientRange.popFront() // <=== now a has changed Even if you go with the only-2-primitives one, you'd have the same problem: auto a = transientRange.front; auto b = transientRange.front; // <=== now a has changed
May 30 2016
On 5/30/16 8:44 AM, Andrei Alexandrescu wrote:On 05/30/2016 08:21 AM, Joseph Rushton Wakeling wrote:Yes, this is how e.g. Linux getline works. But this leaks even more implementation details to the user, and still doesn't solve the problem, it just pushes it onto the user. It also would make for some awkward pipelining. Another solution is to accept a "buffer manager" that is responsible for determining how to handle the buffering. -SteveOn Sunday, 29 May 2016 at 17:29:35 UTC, Steven Schveighoffer wrote:That's right. The one approach that works here is to acknowledge the range has no buffering at all and require the user to provide the buffer as an argument. However, that would make for a different primitive that does not fit within the range hierarchy. -- AndreiThis doesn't help at all. I can still make a "transient" range with all three range primitives. There seems to be a misunderstanding about what a transient range is. byLine is a transient range that requires the front element be cacheable (I have to build the line somewhere, and reusing that buffer provides performance). Shoehorning into a popFront-only style "range" does not solve the problem. Not only that, but now I would have to cache BOTH the front element and the next one.Ack, yea, I think see the issue. It's auto a = transientRange.front; transientRange.popFront() // <=== now a has changed Even if you go with the only-2-primitives one, you'd have the same problem: auto a = transientRange.front; auto b = transientRange.front; // <=== now a has changed
May 30 2016
On 5/29/16 7:15 AM, Dicebot wrote:On 05/28/2016 08:27 PM, Joseph Rushton Wakeling wrote:What problems are solvable only by not caching the front element? I can't think of any. And there is no way to define "transient" ranges in a way other than explicitly declaring they are transient. There isn't anything inherent or introspectable about such ranges. -SteveOn Saturday, 28 May 2016 at 01:48:08 UTC, Jonathan M Davis wrote:I would prefer such ranges to not have `front` and return new item from `popFront` instead but yes, I would much prefer it to existing form, transient or not. It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact. Because of that, a lot of Phobos ranges compromise `front` consistency in favor of speed up - which only seems to work because most algorithms need to access `front` once.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:I have personally wondered if there was a case for a TransientRange concept where the only primitives defined are `empty` and `front`. `popFront()` is not defined because the whole point is that every single call to `front` will produce a different value.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it.
May 29 2016
On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:What problems are solvable only by not caching the front element? I can't think of any.As far as I know, currently it is done mostly for performance reasons - if result is fitting in the register there is no need to allocate stack space for the cache, or something like that. One of most annoying examples is map which calls lambda on each `front` call : https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590And there is no way to define "transient" ranges in a way other than explicitly declaring they are transient. There isn't anything inherent or introspectable about such ranges.I don't really care about concept of transient ranges, it is the fact there is no guarantee of front stability for plain input ranges which worries me.
May 30 2016
On 5/30/16 5:35 AM, Dicebot wrote:On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:Maybe my understanding of your post is incorrect. You said "It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact." I'm trying to figure out which cases caching makes the solution impossible.What problems are solvable only by not caching the front element? I can't think of any.As far as I know, currently it is done mostly for performance reasons - if result is fitting in the register there is no need to allocate stack space for the cache, or something like that. One of most annoying examples is map which calls lambda on each `front` call : https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590But this is inherent in languages which support mutable data. If you want data that doesn't change, require copied/unique data, or immutable data. -SteveAnd there is no way to define "transient" ranges in a way other than explicitly declaring they are transient. There isn't anything inherent or introspectable about such ranges.I don't really care about concept of transient ranges, it is the fact there is no guarantee of front stability for plain input ranges which worries me.
May 30 2016
On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:On 5/30/16 5:35 AM, Dicebot wrote:One case is wrapping a network stream: a TCP input range that yields ubyte[] chunks of data as they are read from the socket, a joiner to join the blocks together, a map that converts the bytes to characters, and take to consume each message. The issue is that, in a naive implementation, creating the TCP range requires reading a chunk from the socket to populate front. Since I usually set up my stream objects, before using them, the receiver range will try to receive a chunk, but I haven't sent a request to the server yet, so it would block indefinitely. Same issue with popFront: I need to pop the bytes I've already used for the previous request, but calling popFront would mean waiting for the response for the next request which I haven't sent out yet. The solution I used was to delay actually receiving the chunk until front was called, which complicates the implementation a bit.On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:Maybe my understanding of your post is incorrect. You said "It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact." I'm trying to figure out which cases caching makes the solution impossible.What problems are solvable only by not caching the front element? I can't think of any.As far as I know, currently it is done mostly for performance reasons - if result is fitting in the register there is no need to allocate stack space for the cache, or something like that. One of most annoying examples is map which calls lambda on each `front` call : https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590
May 30 2016
On 5/30/16 2:32 PM, Alex Parrill wrote:On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:This is not the same issue. Note that populating on call to front instead of on popFront is still valid within the definition of a range (though I think the most straightforward way to implement a range is to populate only on popFront). What makes this awkward is that you need to check to see if front has already been populated, which also kind of requires a separate flag (though you may be able to do this without an extra flag). However, you still need a place to put the data, and using a buffer for this is a good fit. I would suggest that custom i/o patterns like this (where calling popFront is coupled to some other operation) do not fit ranges very well. In the iopipe library I wrote, I have two separate operations for "fetch more data" and "forget the oldest data", which are based on the requirements for i/o. These can be cajoled into a range interface, of course, but those operations suit i/o requirements much better than front/popFront/empty. -SteveI'm trying to figure out which cases caching makes the solution impossible.One case is wrapping a network stream: a TCP input range that yields ubyte[] chunks of data as they are read from the socket, a joiner to join the blocks together, a map that converts the bytes to characters, and take to consume each message. The issue is that, in a naive implementation, creating the TCP range requires reading a chunk from the socket to populate front. Since I usually set up my stream objects, before using them, the receiver range will try to receive a chunk, but I haven't sent a request to the server yet, so it would block indefinitely. Same issue with popFront: I need to pop the bytes I've already used for the previous request, but calling popFront would mean waiting for the response for the next request which I haven't sent out yet. The solution I used was to delay actually receiving the chunk until front was called, which complicates the implementation a bit.
May 31 2016
On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:On 5/30/16 5:35 AM, Dicebot wrote:There are two separate points here: 1) Current definition of input range (most importantly, the fact `front` has to be property-like) implies `front` to always return the same result until `popFront` is called. 2) For ranges that call predicates on elements to evaluate next element this can only be achieved by caching - predicates are never required to be pure. 3) But caching is sub-optimal performance wise and thus bunch of Phobos algorithms violate `front` consistency / cheapness expectation evaluating predicates each time it is called (liked map).On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:Maybe my understanding of your post is incorrect. You said "It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact." I'm trying to figure out which cases caching makes the solution impossible.What problems are solvable only by not caching the front element? I can't think of any.As far as I know, currently it is done mostly for performance reasons - if result is fitting in the register there is no need to allocate stack space for the cache, or something like that. One of most annoying examples is map which calls lambda on each `front` call : https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590This has nothing to do with mutability, look at this totally broken example: import std.random; import std.algorithm; import std.stdio; import std.range; void main ( ) { auto range = only(0).map!(x => uniform(0, 10)); writeln(range.front); writeln(range.front); writeln(range.front); }I don't really care about concept of transient ranges, it is the fact there is no guarantee of front stability for plain input ranges which worries me.But this is inherent in languages which support mutable data. If you want data that doesn't change, require copied/unique data, or immutable data.
May 31 2016
On 5/31/16 10:46 AM, Dicebot wrote:On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:Regardless of property-like or not, this should be the case. Otherwise, popFront makes no sense.On 5/30/16 5:35 AM, Dicebot wrote:There are two separate points here: 1) Current definition of input range (most importantly, the fact `front` has to be property-like) implies `front` to always return the same result until `popFront` is called.On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:Maybe my understanding of your post is incorrect. You said "It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact." I'm trying to figure out which cases caching makes the solution impossible.What problems are solvable only by not caching the front element? I can't think of any.As far as I know, currently it is done mostly for performance reasons - if result is fitting in the register there is no need to allocate stack space for the cache, or something like that. One of most annoying examples is map which calls lambda on each `front` call : https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L5902) For ranges that call predicates on elements to evaluate next element this can only be achieved by caching - predicates are never required to be pure.Or, by not returning different things from your predicate. This is like saying RedBlackTree is broken when I give it a predicate of "a == b".3) But caching is sub-optimal performance wise and thus bunch of Phobos algorithms violate `front` consistency / cheapness expectation evaluating predicates each time it is called (liked map).I don't think anything defensively caches front in case the next call to front is different, unless that's specifically the reason for the range.I think we should be aware that the range API doesn't prevent bugs of all kinds. There's only so much analysis the compiler can do. -SteveThis has nothing to do with mutability, look at this totally broken example: import std.random; import std.algorithm; import std.stdio; import std.range; void main ( ) { auto range = only(0).map!(x => uniform(0, 10)); writeln(range.front); writeln(range.front); writeln(range.front); }I don't really care about concept of transient ranges, it is the fact there is no guarantee of front stability for plain input ranges which worries me.But this is inherent in languages which support mutable data. If you want data that doesn't change, require copied/unique data, or immutable data.
May 31 2016
On Tuesday, 31 May 2016 at 18:11:34 UTC, Steven Schveighoffer wrote:Except it isn't in many cases you call "bugs" :(1) Current definition of input range (most importantly, the fact `front` has to be property-like) implies `front` to always return the same result until `popFront` is called.Regardless of property-like or not, this should be the case. Otherwise, popFront makes no sense.It is perfectly legal for predicate to be non-pure and that would be hugely annoying if anyone decided to prohibit it. Also even pure predicates may be simply very expensive to evaluate which can make `front` a silent pessimization.2) For ranges that call predicates on elements to evaluate next element this can only be achieved by caching - predicates are never required to be pure.Or, by not returning different things from your predicate.This is like saying RedBlackTree is broken when I give it a predicate of "a == b".RBL at least makes certain demands about valid predicate can be. This is not case for ranges in general.stability) casually to the point it can't be relied at all and one has to always make sure it is only evaluated once (make stack local copy or something like that).3) But caching is sub-optimal performance wise and thus bunch of Phobos algorithms violate `front` consistency / cheapness expectation evaluating predicates each time it is called (liked map).I don't think anything defensively caches front in case the next call to front is different, unless that's specifically the reason for the range.I think we should be aware that the range API doesn't prevent bugs of all kinds. There's only so much analysis the compiler can do.This is a totally valid code I want to actually work and not be discarded as "bug".
May 31 2016
On 31.05.2016 22:59, Dicebot wrote:map often allows random access. Do you suggest it should cache opIndex too?I think we should be aware that the range API doesn't prevent bugs of all kinds. There's only so much analysis the compiler can do.This is a totally valid code I want to actually work and not be discarded as "bug".
May 31 2016
On Tuesday, 31 May 2016 at 21:25:12 UTC, Timon Gehr wrote:On 31.05.2016 22:59, Dicebot wrote:Random access map must store all already evaluated items in memory in mu opinion.map often allows random access. Do you suggest it should cache opIndex too?I think we should be aware that the range API doesn't prevent bugs of all kinds. There's only so much analysis the compiler can do.This is a totally valid code I want to actually work and not be discarded as "bug".
May 31 2016
On 5/31/16 4:59 PM, Dicebot wrote:On Tuesday, 31 May 2016 at 18:11:34 UTC, Steven Schveighoffer wrote:If you want to use such "ranges", the compiler will not stop you. Just don't expect any help from Phobos.Except it isn't in many cases you call "bugs" :(1) Current definition of input range (most importantly, the fact `front` has to be property-like) implies `front` to always return the same result until `popFront` is called.Regardless of property-like or not, this should be the case. Otherwise, popFront makes no sense.There's no requirement or need to prevent it. Just a) don't do it, or b) deal with the consequences.It is perfectly legal for predicate to be non-pure and that would be hugely annoying if anyone decided to prohibit it. Also even pure predicates may be simply very expensive to evaluate which can make `front` a silent pessimization.2) For ranges that call predicates on elements to evaluate next element this can only be achieved by caching - predicates are never required to be pure.Or, by not returning different things from your predicate.RedBlackTree with "a == b" will compile and operate. It just won't do much red-black-tree-like things.This is like saying RedBlackTree is broken when I give it a predicate of "a == b".RBL at least makes certain demands about valid predicate can be. This is not case for ranges in general.That's a little much. If you expect such things, you can construct them. There's no way for the functions to ascertain what your lambda is going to do (halting problem) and determine to cache or not based on that.casually to the point it can't be relied at all and one has to always make sure it is only evaluated once (make stack local copy or something like that).3) But caching is sub-optimal performance wise and thus bunch of Phobos algorithms violate `front` consistency / cheapness expectation evaluating predicates each time it is called (liked map).I don't think anything defensively caches front in case the next call to front is different, unless that's specifically the reason for the range.Then it's not a bug? It's going to work just fine how you specified it. I just don't consider it a valid "range" for general purposes. You can do this if you want caching: only(0).map!(x => uniform(0, 10)).cache -SteveI think we should be aware that the range API doesn't prevent bugs of all kinds. There's only so much analysis the compiler can do.This is a totally valid code I want to actually work and not be discarded as "bug".
May 31 2016
On Wednesday, 1 June 2016 at 01:31:53 UTC, Steven Schveighoffer wrote:If you want to use such "ranges", the compiler will not stop you. Just don't expect any help from Phobos.It only strengthens my opinion that Phobos is not a standard library I want. Really, many of those issue would have been solved if basic input range was defined as `empty` + `ElementType popFront()` instead. Some more - if algorithms didn't try to preserve original range kind unless they can do it with no overhead (i.e. arrray.map should not be a random access range out of the box).Good advice. Don't want bugs with non-stable results and accidental double I/O in your long idiomatic range pipeline? Just put "cache" calls everywhere just to be safe, defensive programming for the win! Existing situation is bug prone exactly because you have no idea if you need to cache or not unless you try out specific combination of algorithms / predicates / input and carefully check what it does.This is a totally valid code I want to actually work and not be discarded as "bug".Then it's not a bug? It's going to work just fine how you specified it. I just don't consider it a valid "range" for general purposes. You can do this if you want caching: only(0).map!(x => uniform(0, 10)).cache
Jun 03 2016
On 6/3/16 4:37 AM, Dicebot wrote:Really, many of those issue would have been solved if basic input range was defined as `empty` + `ElementType popFront()` instead.FYI that was on the table, along with many other possible designs. (popFront() should be allowed to return by reference, too.) The problem with this one is that it is inefficient with arrays when the same element is accessed multiple times - the user is forced to create a copy or to use other contortions. Arrays are a prime use case for the range interface. -- Andrei
Jun 03 2016
On 06/03/2016 01:49 PM, Andrei Alexandrescu wrote:On 6/3/16 4:37 AM, Dicebot wrote:Has it actually been measured as inefficient? Ranges are very inline/optimization friendly because of the templated nature of algorithms - I would expect at least GDC/LDC to take advantage of that for simple case and for more complicated ones mutiple evaluations of `front` predicates will dominate performance gain from direct array access. I guess I need to benchmark it myself and see :)Really, many of those issue would have been solved if basic input range was defined as `empty` + `ElementType popFront()` instead.FYI that was on the table, along with many other possible designs. (popFront() should be allowed to return by reference, too.) The problem with this one is that it is inefficient with arrays when the same element is accessed multiple times - the user is forced to create a copy or to use other contortions. Arrays are a prime use case for the range interface. -- Andrei
Jun 03 2016
On 6/3/16 4:37 AM, Dicebot wrote:On Wednesday, 1 June 2016 at 01:31:53 UTC, Steven Schveighoffer wrote:This doesn't solve the problem. empty may not be knowable until you try to fetch the next element (for instance, i/o). A better interface would be bool getNext(ref ElementType), or Nullable!ElementType getNext(), or tuple!(bool, "eof", ElementType, "value") getNext(). I've said many times, i/o is not conducive to ranges. You have to shoehorn it, which is fine if you need compatibility, but it shouldn't be a mechanism of low-level implementation. And you should expect some cruft here because of that.If you want to use such "ranges", the compiler will not stop you. Just don't expect any help from Phobos.It only strengthens my opinion that Phobos is not a standard library I want. Really, many of those issue would have been solved if basic input range was defined as `empty` + `ElementType popFront()` instead.Some more - if algorithms didn't try to preserve original range kind unless they can do it with no overhead (i.e. arrray.map should not be a random access range out of the box).Yes, I can see good reason why you would want this. Hm... a set of adapters which reduce a range to its lesser API would be useful here: array.asInput.map But an expectation for map should be that you want it to be exactly the same as the original, but with a transformation applied to each fetch of an element. I think it needs to provide random access if random access is provided to it.Strawman, not what I said.Then it's not a bug? It's going to work just fine how you specified it. I just don't consider it a valid "range" for general purposes. You can do this if you want caching: only(0).map!(x => uniform(0, 10)).cacheGood advice. Don't want bugs with non-stable results and accidental double I/O in your long idiomatic range pipeline? Just put "cache" calls everywhere just to be safe, defensive programming for the win!Existing situation is bug prone exactly because you have no idea if you need to cache or not unless you try out specific combination of algorithms / predicates / input and carefully check what it does.Again, yes, it's still possible to write code with bugs. The compiler or range library can only hold your hand so much. I don't know of a library that's impossible to use incorrectly. Especially ones that let you execute arbitrary code internally. -Steve
Jun 03 2016
On Friday, 3 June 2016 at 12:03:26 UTC, Steven Schveighoffer wrote:Not necessarily, one can simply define that range processing requires checking `empty` both before and after getting next element (and that returned value is undefined if empty == true). Though I do agree that returning `Optional!ElementType` would be even better.It only strengthens my opinion that Phobos is not a standard library I want. Really, many of those issue would have been solved if basic input range was defined as `empty` + `ElementType popFront()` instead.This doesn't solve the problem. empty may not be knowable until you try to fetch the next element (for instance, i/o). A better interface would be bool getNext(ref ElementType), or Nullable!ElementType getNext(), or tuple!(bool, "eof", ElementType, "value") getNext().Yes, I can see good reason why you would want this. Hm... a set of adapters which reduce a range to its lesser API would be useful here: array.asInput.map But an expectation for map should be that you want it to be exactly the same as the original, but with a transformation applied to each fetch of an element. I think it needs to provide random access if random access is provided to it.That is matter of design philosophy. For me such basic library primitives warrant C++ attitude of "don't pay for what you don't ask for" - and everything else, including actual feature completeness, is of negligible importance compared to that. If keeping random access for map requires either caching or multiple evaluations of predicate, I want it to be banned by default and enabled explicitly (not sure what could be good API for that though). Phobos indeed doesn't seem to make such priorities right now and that is one of reasons I am growing increasingly unhappy with it.But that is what your answer meant in context of my original question :) My complaint was about existing semantics are being error-prone and hard to spot - you proposed it by adding more _manual_ fixup, which kills the whole point.Strawman, not what I said.Then it's not a bug? It's going to work just fine how you specified it. I just don't consider it a valid "range" for general purposes. You can do this if you want caching: only(0).map!(x => uniform(0, 10)).cacheGood advice. Don't want bugs with non-stable results and accidental double I/O in your long idiomatic range pipeline? Just put "cache" calls everywhere just to be safe, defensive programming for the win!
Jun 12 2016
On 6/12/16 4:46 AM, Dicebot wrote:On Friday, 3 June 2016 at 12:03:26 UTC, Steven Schveighoffer wrote:So what it seems like you want is somerange.map(...).front to evaluate the lambda only once, without caching. This doesn't fit into the definition of ranges, which require the ability to access the same front more than once. What you really want is a new kind of construct. It's not a range. Given that front must be accessible more than once, map has to make a choice, caching or reevaluating. Not doing one of those two things is not an option for ranges. Note that this has nothing to do with random access.Yes, I can see good reason why you would want this. Hm... a set of adapters which reduce a range to its lesser API would be useful here: array.asInput.map But an expectation for map should be that you want it to be exactly the same as the original, but with a transformation applied to each fetch of an element. I think it needs to provide random access if random access is provided to it.That is matter of design philosophy. For me such basic library primitives warrant C++ attitude of "don't pay for what you don't ask for" - and everything else, including actual feature completeness, is of negligible importance compared to that. If keeping random access for map requires either caching or multiple evaluations of predicate, I want it to be banned by default and enabled explicitly (not sure what could be good API for that though).Phobos indeed doesn't seem to make such priorities right now and that is one of reasons I am growing increasingly unhappy with it.It's not so much that Phobos attempts to give you kitchen sink support at all costs, it's that if the sink being passed in has all the faucets, it'll forward them for free if possible.No, it's not. My advice is to understand the limitations and expectations of the range wrappers you are using (i.e. read the docs [1]). If you need caching for your purposes, then do that by plopping cache at the end of your use case. The user must have to understand that providing a mapping function that does random things (or even varying things on each call) is going to result in unexpected behavior. The compiler cannot foresee the purpose of your lambda code. You just said only pay for what you ask for. I don't want to pay for caching for map if I don't need it. I'd be pissed if map cached for something like map!(x => x * 2). If you want caching, then ask for it. There was a talk by Herb Sutter on atomic operations on various platforms. His general rule was, as long as you don't write race conditions, and use only provided atomics, your code will do what you think. We can apply a similar rule here: as long as you write a lambda which for a given input will provide the same result, map will do what you expect. When you try odd things like your random call, we don't guarantee things will work like you expect. You may utilize the knowledge of how map works, and how the things that are using the map work to write something clever that breaks the rules. But no guarantees. -Steve [1] I admit, the map docs aren't explicit about this, and should be.But that is what your answer meant in context of my original question :) My complaint was about existing semantics are being error-prone and hard to spot - you proposed it by adding more _manual_ fixup, which kills the whole point.Strawman, not what I said.Then it's not a bug? It's going to work just fine how you specified it. I just don't consider it a valid "range" for general purposes. You can do this if you want caching: only(0).map!(x => uniform(0, 10)).cacheGood advice. Don't want bugs with non-stable results and accidental double I/O in your long idiomatic range pipeline? Just put "cache" calls everywhere just to be safe, defensive programming for the win!
Jun 13 2016
On Monday, 13 June 2016 at 13:59:06 UTC, Steven Schveighoffer wrote:No, it's not. My advice is to understand the limitations and expectations of the range wrappers you are using (i.e. read the docs [1]). If you need caching for your purposes, then do that by plopping cache at the end of your use case. The user must have to understand that providing a mapping function that does random things (or even varying things on each call) is going to result in unexpected behavior. The compiler cannot foresee the purpose of your lambda code. You just said only pay for what you ask for. I don't want to pay for caching for map if I don't need it. I'd be pissed if map cached for something like map!(x => x * 2). If you want caching, then ask for it. There was a talk by Herb Sutter on atomic operations on various platforms. His general rule was, as long as you don't write race conditions, and use only provided atomics, your code will do what you think. We can apply a similar rule here: as long as you write a lambda which for a given input will provide the same result, map will do what you expect. When you try odd things like your random call, we don't guarantee things will work like you expect. You may utilize the knowledge of how map works, and how the things that are using the map work to write something clever that breaks the rules. But no guarantees. -Steve [1] I admit, the map docs aren't explicit about this, and should be.I see three levels of function used with map: pure: everything is as expected, consider caching if expensive. idempotent side-effects: remember map is lazy and you'll be ok. non-idempotent side-effects: probably not what you want.
Jun 13 2016
Related: just made a few comments to https://github.com/dlang/phobos/pull/4511. More work on range formalization is welcome. Thanks! -- Andrei
Jul 06 2016
On Monday, 13 June 2016 at 13:59:06 UTC, Steven Schveighoffer wrote:So what it seems like you want is somerange.map(...).front to evaluate the lambda only once, without caching. This doesn't fit into the definition of ranges, which require the ability to access the same front more than once. What you really want is a new kind of construct. It's not a range.I think you are completely right here, though it is not very comforting :) Indeed, looking now through all uses of ranges in my code, it has always been same specific subset of their functionality - input range based pipeline. Chunk input, build up filters/transformations, write output. Never random access or even `.save`, and each original input chunk is supposed to be processed only once. Every of my complaints is simply a natural outcome of this usage pattern - I'd really prefer to have something more stupid but also more fool-proof.
Jun 14 2016
On 06/12/2016 04:46 AM, Dicebot wrote:That is matter of design philosophy. For me such basic library primitives warrant C++ attitude of "don't pay for what you don't ask for" - and everything else, including actual feature completeness, is of negligible importance compared to that.C++'s input iterator model also predicates multiple accesses to the current value by means of *it.If keeping random access for map requires either caching or multiple evaluations of predicate, I want it to be banned by default and enabled explicitly (not sure what could be good API for that though).In the initial implementation, map did cache the current element. That made some folks unhappy, so it was removed. In lieu, a function "cache" was introduced. Apparently this setup makes other folks unhappy. It seems to me, sometimes if I went by what this forum agrees upon I'd write one thousand lines of code one day and remove it the next.Phobos indeed doesn't seem to make such priorities right now and that is one of reasons I am growing increasingly unhappy with it.What steps do you think we could take with Phobos to make it better without breaking backward compatibility? Andrei
Jun 13 2016
On Tuesday, 14 June 2016 at 00:19:54 UTC, Andrei Alexandrescu wrote:On 06/12/2016 04:46 AM, Dicebot wrote:Yeah, I don't say C++ iterator model is any better for that kind of design - sorry if my wording implied that. I wonder how they would address it if pipeline approach would ever become popular in C++. See my response to Steven - probably my main issue is wanting some very different concept, dedicated stream/pipeline abstraction that would still be usable with all the same std.algorithmThat is matter of design philosophy. For me such basic library primitives warrant C++ attitude of "don't pay for what you don't ask for" - and everything else, including actual feature completeness, is of negligible importance compared to that.C++'s input iterator model also predicates multiple accesses to the current value by means of *it.It seems to me, sometimes if I went by what this forum agrees upon I'd write one thousand lines of code one day and remove it the next.Please don't take my rants in the NG as a serious call to action. I am simply sharing my concerns in a relevant topic - if there was anything pragmatical I could propose, I'd take it to the mail list instead or even right a DIP. To be honest, I'd even prefer if you ignored _any_ request/proposal which is not submitted as such. NG can't be taken as a base for reasoning.Introducing `std2` package is technically not breaking backwards compatibility but I presume this is not what you have meant ;) I don't know - as I have already said, otherwise I'd actually taken some more constructive actions instead of throwing complaints. If I'd have a say in designing brand new Phobos, it would be very different - async by default, with a strict separation of lightweight nogc algorithm and API core (fitting even for embedded env) and all the feature rich additional modules built on top. This is not kind of stuff one simply "fixes" in existing mature library. At the same time, I am not even sure if it is even important to fix. It is good enough for any kind of general purpose development, good for productivity. And bigger projects with hard performance concerns and maintenance costs tend to grow their own "standard" libraries anyway.Phobos indeed doesn't seem to make such priorities right now and that is one of reasons I am growing increasingly unhappy with it.What steps do you think we could take with Phobos to make it better without breaking backward compatibility?
Jun 14 2016
On 5/27/16 9:48 PM, Jonathan M Davis via Digitalmars-d wrote:On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:Wholly disagree. If we didn't cache the element, D would be a laughingstock of performance-minded tests.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it. Every single range-based function would have to either test that the "transient" enum wasn't there or take transient ranges into account, and realistically, that isn't going to happen. For better or worse, we do have byLine in std.stdio, which has a transient front, but aside from the performance benefits, it's been a disaster.It's way too error-prone. We now have byLineCopy to combat that, but of course, byLine is the more obvious function and thus more likely to be used (plus it's been around longer), so a _lot_ of code is going to end up using it, and a good chunk of that code really should be using byLineCopy.There's nothing actually wrong with using byLine, and copying on demand. Why such a negative connotation?I'm of the opinion that if you want a transient front, you should just use opApply and skip ranges entirely.So you want to make this code invalid? Why? foreach(i; map!(a => a.to!int)(stdin.byLine)) { // process each integer ... } You want to make me copy each line to a heap-allocated string so I can parse it?!!Allowing for front to be transient - whether you can check for it or not - simply is not worth the extra complications. I'd love it if we deprecated byLine's range functions, and made it use opApply instead and just declare transient ranges to be completely unsupported. If you want to write your code to have a transient front, you can obviously take that risk, but you're on your own.There is no way to disallow front from being transient. In fact, it should be assumed that it is the default unless it's wholly a value-type. -Steve
May 29 2016
On Sunday, May 29, 2016 13:36:24 Steven Schveighoffer via Digitalmars-d wrote:On 5/27/16 9:48 PM, Jonathan M Davis via Digitalmars-d wrote:Having byLine not copy its buffer is fine. Having it be a range is not. Algorithms in general just do not play well with that behavior, and I don't think that it's reasonable to expect them to.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:Wholly disagree. If we didn't cache the element, D would be a laughingstock of performance-minded tests.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it. Every single range-based function would have to either test that the "transient" enum wasn't there or take transient ranges into account, and realistically, that isn't going to happen. For better or worse, we do have byLine in std.stdio, which has a transient front, but aside from the performance benefits, it's been a disaster.Because it does not play nicely with ranges, and aside from a few rare ranges like byLine that have to deal directly with I/O, transience isn't even useful. Having an efficient solution that plays nicely with I/O is definitely important, but it doesn't need to be a range, especially when it complicates ranges in general. byLine doesn't even work with std.array.array, and if even that doesn't work, I don't see how a range could be considered well-behaved.It's way too error-prone. We now have byLineCopy to combat that, but of course, byLine is the more obvious function and thus more likely to be used (plus it's been around longer), so a _lot_ of code is going to end up using it, and a good chunk of that code really should be using byLineCopy.There's nothing actually wrong with using byLine, and copying on demand. Why such a negative connotation?If it's a range, then it can be passed around to other algorithms with impunity, and almost nothing is written with the idea that a range's front is transient. There's no way to check for transience, and I don't think that it's even vaguely worth adding yet another range primitive that has to be checked for everywhere just for this case. Transience does _not_ play nicely with algorithms in general. Using opApply doesn't completely solve the problem (since the buffer could still escape - we'd need some kind of scope attribute or wrapper to fix that problem), but it makes it so that you can't pass such a a range around and run into problems with all of the algorithms that don't play nicely with it. So, instead, you end up with code that looks something like foreach(line; stdin.byLine()) { auto i = line.to!int(); ... } And yes, it's slightly longer, but it prevents a whole class of bugs by not having it be a range with a transient front.I'm of the opinion that if you want a transient front, you should just use opApply and skip ranges entirely.So you want to make this code invalid? Why? foreach(i; map!(a => a.to!int)(stdin.byLine)) { // process each integer ... } You want to make me copy each line to a heap-allocated string so I can parse it?!!Pretty much no range-based code is written with the idea that front is transient. It's pretty much the opposite. Unfortunately, we can't check for all of the proper range semantics at compile time (be it having to do with transience, the fact that front needs to be the same every time until popFront is called, that save has to actually result in a range that will have exactly the same elements, or whatever other runtime behavior that ranges are supposed to adhere to), but just because something can't be checked for doesn't mean that it should be considered reasonable or valid. IMHO, a range with a transient front should be considered as valid as a range that returns a different value every time that front is called without popFront having been called. Neither can be tested for, but both cause problems. If we're going to support transience, then we _need_ to have some sort of flag/enum in the type to indicate that the range is transient, but that complicates everything, because then all range implementations have to check for it and pass it on when they wrap that type, and many algorithms will have to expclicitly check for it in their template constraints to make it invalid. You end up with a whole lot of extra machinery in range-based code to support a very small number of ranges. The number of things that range-based code has to check for is already arguably way too high without adding yet more into the mix. - Jonathan M DavisAllowing for front to be transient - whether you can check for it or not - simply is not worth the extra complications. I'd love it if we deprecated byLine's range functions, and made it use opApply instead and just declare transient ranges to be completely unsupported. If you want to write your code to have a transient front, you can obviously take that risk, but you're on your own.There is no way to disallow front from being transient. In fact, it should be assumed that it is the default unless it's wholly a value-type.
May 29 2016
On 5/30/16 12:17 AM, Jonathan M Davis via Digitalmars-d wrote:On Sunday, May 29, 2016 13:36:24 Steven Schveighoffer via Digitalmars-d wrote:I disagree. Most algorithms in std.algorithm are fine with transient ranges.On 5/27/16 9:48 PM, Jonathan M Davis via Digitalmars-d wrote:Having byLine not copy its buffer is fine. Having it be a range is not. Algorithms in general just do not play well with that behavior, and I don't think that it's reasonable to expect them to.On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:Wholly disagree. If we didn't cache the element, D would be a laughingstock of performance-minded tests.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?Honestly, I don't think that supporting transient ranges is worth it. Every single range-based function would have to either test that the "transient" enum wasn't there or take transient ranges into account, and realistically, that isn't going to happen. For better or worse, we do have byLine in std.stdio, which has a transient front, but aside from the performance benefits, it's been a disaster.Here is how I think about it: the front element is valid and stable until you call popFront. After that, anything goes for the old front. This is entirely reasonable, and fits into many many algorithms. This isn't a functional-only language, mutation is valid in D.Because it does not play nicely with ranges, and aside from a few rare ranges like byLine that have to deal directly with I/O, transience isn't even useful. Having an efficient solution that plays nicely with I/O is definitely important, but it doesn't need to be a range, especially when it complicates ranges in general. byLine doesn't even work with std.array.array, and if even that doesn't work, I don't see how a range could be considered well-behaved.It's way too error-prone. We now have byLineCopy to combat that, but of course, byLine is the more obvious function and thus more likely to be used (plus it's been around longer), so a _lot_ of code is going to end up using it, and a good chunk of that code really should be using byLineCopy.There's nothing actually wrong with using byLine, and copying on demand. Why such a negative connotation?Algorithms which are fine with byLine from std.algorithm.searching (not going to go through all of the submodules): all, any, balancedParens, count, countUntil, find, canFind, findAmong (with first range being transient), skipOver (second parameter transient), startsWith, until. algorithms which would would compile with transient ranges, but not work correctly: minCount, maxCount Now, if minCount and maxCount could introspect transience, it could .dup the elements to make sure they were returned properly.If it's a range, then it can be passed around to other algorithms with impunity, and almost nothing is written with the idea that a range's front is transient.I'm of the opinion that if you want a transient front, you should just use opApply and skip ranges entirely.So you want to make this code invalid? Why? foreach(i; map!(a => a.to!int)(stdin.byLine)) { // process each integer ... } You want to make me copy each line to a heap-allocated string so I can parse it?!!There's no way to check for transience, and I don't think that it's even vaguely worth adding yet another range primitive that has to be checked for everywhere just for this case. Transience does _not_ play nicely with algorithms in general.I think your understanding of this is incorrect. A range is transient by default if the type of the element allows modification via reference. It doesn't have to be checked everywhere, because most algorithms are fine with or without transience.Using opApply doesn't completely solve the problem (since the buffer could still escape - we'd need some kind of scope attribute or wrapper to fix that problem), but it makes it so that you can't pass such a a range around and run into problems with all of the algorithms that don't play nicely with it. So, instead, you end up with code that looks something like foreach(line; stdin.byLine()) { auto i = line.to!int(); ... } And yes, it's slightly longer, but it prevents a whole class of bugs by not having it be a range with a transient front.Sure, as long as we're adding new newsgroups, let's at one that's titled "why can't I use byLine as a range", as this will be a popular topic.Provably false, see above. -StevePretty much no range-based code is written with the idea that front is transient.Allowing for front to be transient - whether you can check for it or not - simply is not worth the extra complications. I'd love it if we deprecated byLine's range functions, and made it use opApply instead and just declare transient ranges to be completely unsupported. If you want to write your code to have a transient front, you can obviously take that risk, but you're on your own.There is no way to disallow front from being transient. In fact, it should be assumed that it is the default unless it's wholly a value-type.
May 30 2016
On 05/30/2016 09:26 AM, Steven Schveighoffer wrote:Here is how I think about it: the front element is valid and stable until you call popFront. After that, anything goes for the old front.Yah, we should have formal language in the library reference about this. -- Andrei
May 30 2016
On Mon, May 30, 2016 at 09:26:35AM -0400, Steven Schveighoffer via Digitalmars-d wrote:On 5/30/16 12:17 AM, Jonathan M Davis via Digitalmars-d wrote:[...]Yes, some years ago I submitted a series of PRs to fix poorly-written code in Phobos that unnecessarily depended on .front not changing when popFront is called. I had found that in the majority of cases it was actually unnecessary to require non-transience; most algorithms in std.algorithm, properly-written, work just fine with transient ranges. For the few algorithms that don't work with transient ranges, arguably they ought to require forward ranges and use .save instead. The only exception I can think of right now is array(). [...]Having byLine not copy its buffer is fine. Having it be a range is not. Algorithms in general just do not play well with that behavior, and I don't think that it's reasonable to expect them to.I disagree. Most algorithms in std.algorithm are fine with transient ranges.Here is how I think about it: the front element is valid and stable until you call popFront. After that, anything goes for the old front. This is entirely reasonable, and fits into many many algorithms. This isn't a functional-only language, mutation is valid in D.I agree, most range algorithms work just fine with transient ranges. I consider it a reasonable consequence of defensive programming: don't assume anything about the API beyond what it actually guarantees. The current range API says that the current range element is accessible via .front; from this one may conclude that the value returned by .front should be valid immediately afterwards. However, the API says nothing about this value lasting past the next call to .popFront, and in fact it specifies that .front will return something different afterwards, so defensively-written code should assume the worst and regard the previous value of .front as invalidated. Understandably, writing code this way is more constrained and less convenient, but in a standard library I'd expect that code should be at least of this calibre, if not better. And as far as I have found, the majority of range algorithms in Phobos *can* be written this way and work perfectly fine with transient ranges. As I've already said, most of the remaining algorithms can be implemented by requiring forward ranges and using .save instead of assuming that .front is cacheable -- .save is something guaranteed by the API, and therefore defensively written generic code should use it rather than making unfounded assumptions about .front. [...]And nothing about the range API guarantees that .front is *not* transient either. Generic code should not assume either way. It should be written with the minimal assumptions necessary to work. [...]If it's a range, then it can be passed around to other algorithms with impunity, and almost nothing is written with the idea that a range's front is transient.I disagree. Of the algorithms I surveyed in Phobos at the time, I found that the majority of them can be written in such a way that they are transience-agnostic. Only a small set of algorithms actually require non-transience. That's hardly "not play nicely with algorithms in general". [...]There's no way to check for transience, and I don't think that it's even vaguely worth adding yet another range primitive that has to be checked for everywhere just for this case. Transience does _not_ play nicely with algorithms in general.[...] I'd argue that range-based generic code that assumes non-transience is inherently buggy, because generic code ought not to make any assumptions beyond what the range API guarantees. Currently, the range API does not guarantee non-transience, therefore code that assumes so is broken by definition. Just because they happen to work most of the time does not change the fact that they're written wrongly. T -- Just because you can, doesn't mean you should.Pretty much no range-based code is written with the idea that front is transient.Provably false, see above.
May 30 2016
On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:I'd argue that range-based generic code that assumes non-transience is inherently buggy, because generic code ought not to make any assumptions beyond what the range API guarantees. Currently, the range API does not guarantee non-transience, therefore code that assumes so is broken by definition. Just because they happen to work most of the time does not change the fact that they're written wrongly.Technically, the range API doesn't even require that front return the same value every time that it's called, because isInputRange can't possibly test for it. That doesn't mean that we don't expect that ranges behave that way. It's never been the case that we agreed that transient ranges were acceptable. In fact, it's pretty much the opposite. There was some discussion that maybe we'd consider it legitimate for basic input ranges given that we already had byLine doing it, and we didn't want to require that it copy its buffer, but previous discussions on it made it quite clear that most of us (Andrei included) thought that it should not be allowed for forward ranges. It's been the case for ages now that ranges with a transient front are not explicitly supported. They happen to work with some algorithms (and you made changes to Phobos to make them work with more algorithms there), but they're not checked for, and there are algorithms that they don't and can't work with. The simple fact that std.array.array doesn't work with them is pretty damning IMHO. We already have too many problems with unspecified behavior in ranges without adding transient front into the mix. It should be killed with fire IMHO. - Jonathan M Davis
May 31 2016
On 5/31/16 11:45 AM, Jonathan M Davis via Digitalmars-d wrote:On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:The API doesn't require it mechanically, but the API does require it semantically (what does popFront mean if front changes automatically?). If front returns different things, I'd say that's a bug in your range construction.I'd argue that range-based generic code that assumes non-transience is inherently buggy, because generic code ought not to make any assumptions beyond what the range API guarantees. Currently, the range API does not guarantee non-transience, therefore code that assumes so is broken by definition. Just because they happen to work most of the time does not change the fact that they're written wrongly.Technically, the range API doesn't even require that front return the same value every time that it's called, because isInputRange can't possibly test for it.That doesn't mean that we don't expect that ranges behave that way. It's never been the case that we agreed that transient ranges were acceptable.I want to note here that transient ranges are different from what you identified above (front returning different values each time). I know that you know that, but the way you said this implies that's what transient ranges are.In fact, it's pretty much the opposite. There was some discussion that maybe we'd consider it legitimate for basic input ranges given that we already had byLine doing it, and we didn't want to require that it copy its buffer, but previous discussions on it made it quite clear that most of us (Andrei included) thought that it should not be allowed for forward ranges.I can see why forward ranges make it easier to avoid transience, since by definition the data should be available even if another range has already popFront'd by. But I don't think this needs to be a requirement. What I'd say is that as long as you haven't called popFront on the range, front should be stable. What happens elsewhere is implementation defined.It's been the case for ages now that ranges with a transient front are not explicitly supported. They happen to work with some algorithms (and you made changes to Phobos to make them work with more algorithms there), but they're not checked for, and there are algorithms that they don't and can't work with. The simple fact that std.array.array doesn't work with them is pretty damning IMHO.array can work with them, you just have to copy each element as you iterate it (or otherwise transform the data into something that's persistent when copied). But there is no requirement I know of that says a range has to always provide the expected outcome with std.array.array.We already have too many problems with unspecified behavior in ranges without adding transient front into the mix. It should be killed with fire IMHO.It can't be killed, because it's ALREADY supported. Way too much code would break, and I think you would piss off a lot of D developers (myself included). But I don't think transient ranges are that horrific. For instance, stdin.byLine.array probably doesn't do what you want. But it does actually work and do something that's defined and reproducible and will not corrupt memory. It's a bug needing to be fixed because you didn't understand what it was doing. -Steve
May 31 2016
On Tuesday, 31 May 2016 at 18:31:05 UTC, Steven Schveighoffer wrote:On 5/31/16 11:45 AM, Jonathan M Davis via Digitalmars-d wrote:The `Generator` range is an eager violator of this requirement: https://github.com/dlang/phobos/blob/ca292ff78cd825f642eb58d586e2723ba14ae448/std/range/package.d#L3075-L3080 ... although I'd agree that's an implementation error.On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:The API doesn't require it mechanically, but the API does require it semantically (what does popFront mean if front changes automatically?). If front returns different things, I'd say that's a bug in your range construction.I'd argue that range-based generic code that assumes non-transience is inherently buggy, because generic code ought not to make any assumptions beyond what the range API guarantees. Currently, the range API does not guarantee non-transience, therefore code that assumes so is broken by definition. Just because they happen to work most of the time does not change the fact that they're written wrongly.Technically, the range API doesn't even require that front return the same value every time that it's called, because isInputRange can't possibly test for it.
Jun 01 2016
On 6/1/16 8:49 AM, Joseph Rushton Wakeling wrote:On Tuesday, 31 May 2016 at 18:31:05 UTC, Steven Schveighoffer wrote:Yeah, that seems like a bug. -SteveOn 5/31/16 11:45 AM, Jonathan M Davis via Digitalmars-d wrote:The `Generator` range is an eager violator of this requirement: https://github.com/dlang/phobos/blob/ca292ff78cd825f642eb58d586e2723ba14ae448/std/range/package.d#L3075-L3080 .... although I'd agree that's an implementation error.On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:The API doesn't require it mechanically, but the API does require it semantically (what does popFront mean if front changes automatically?). If front returns different things, I'd say that's a bug in your range construction.I'd argue that range-based generic code that assumes non-transience is inherently buggy, because generic code ought not to make any assumptions beyond what the range API guarantees. Currently, the range API does not guarantee non-transience, therefore code that assumes so is broken by definition. Just because they happen to work most of the time does not change the fact that they're written wrongly.Technically, the range API doesn't even require that front return the same value every time that it's called, because isInputRange can't possibly test for it.
Jun 01 2016
On 6/1/16 8:37 PM, Steven Schveighoffer wrote:On 6/1/16 8:49 AM, Joseph Rushton Wakeling wrote:I guess this controversy was hashed out when this function was added. So this is very much intentional: https://github.com/dlang/phobos/pull/2606 https://github.com/dlang/phobos/pull/3002 I'll start a new thread to discuss it. I think it needs to be changed. -SteveThe `Generator` range is an eager violator of this requirement: https://github.com/dlang/phobos/blob/ca292ff78cd825f642eb58d586e2723ba14ae448/std/range/package.d#L3075-L3080 .... although I'd agree that's an implementation error.Yeah, that seems like a bug.
Jun 02 2016
On Sunday, 29 May 2016 at 17:36:24 UTC, Steven Schveighoffer wrote:Wholly disagree. If we didn't cache the element, D would be a laughingstock of performance-minded tests.byLine already is a laughingstock performance wise: https://issues.dlang.org/show_bug.cgi?id=11810 It's way faster to read the entire file into a buffer and iterate by line over that. I have to agree with Jonathan, I see a lot of proposals in this thread but I have yet to see a cost/benefit analysis that's pro transient support. The amount of changes needed to support them is not commensurate to any possible benefits.
May 29 2016
On 5/30/16 12:22 AM, Jack Stouffer wrote:On Sunday, 29 May 2016 at 17:36:24 UTC, Steven Schveighoffer wrote:But that's because we base our I/O on C :) My iopipe library is 2x as fast.Wholly disagree. If we didn't cache the element, D would be a laughingstock of performance-minded tests.byLine already is a laughingstock performance wise: https://issues.dlang.org/show_bug.cgi?id=11810It's way faster to read the entire file into a buffer and iterate by line over that.Depends on how big the entire file is.I have to agree with Jonathan, I see a lot of proposals in this thread but I have yet to see a cost/benefit analysis that's pro transient support. The amount of changes needed to support them is not commensurate to any possible benefits.Transient ranges are already supported. Removing the ability to have transient ranges would be an unmitigated disaster. e.g. removing range support for byLine. -Steve
May 30 2016
On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:My iopipe library is 2x as fast.Cool! What is the trick to the speedup? -- Andrei
May 30 2016
On Monday, 30 May 2016 at 15:12:41 UTC, Andrei Alexandrescu wrote:On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:Direct buffer access and not using FILE * as a base. -SteveMy iopipe library is 2x as fast.Cool! What is the trick to the speedup? -- Andrei
May 30 2016
On 05/30/2016 11:22 AM, Steven Schveighoffer wrote:On Monday, 30 May 2016 at 15:12:41 UTC, Andrei Alexandrescu wrote:So what primitives do you use? The lower-level descriptor-based primitives open() and friends? http://linux.die.net/man/2/open What do you mean by direct buffer access? What is the relative contributions of these (and possibly other) design decisions to the overall speed? How can we integrate some of these in std.stdio without breaking backward compatibility, or offer additional artifacts that integrate seamlessly and don't need to be compatible? AndreiOn 05/30/2016 09:30 AM, Steven Schveighoffer wrote:Direct buffer access and not using FILE * as a base.My iopipe library is 2x as fast.Cool! What is the trick to the speedup? -- Andrei
May 30 2016
On 5/30/16 11:46 AM, Andrei Alexandrescu wrote:On 05/30/2016 11:22 AM, Steven Schveighoffer wrote:Yes. That is not the focus of my library, though. I have a very crude and basic I/O type that just wraps the system calls into a nice API.On Monday, 30 May 2016 at 15:12:41 UTC, Andrei Alexandrescu wrote:So what primitives do you use? The lower-level descriptor-based primitives open() and friends? http://linux.die.net/man/2/openOn 05/30/2016 09:30 AM, Steven Schveighoffer wrote:Direct buffer access and not using FILE * as a base.My iopipe library is 2x as fast.Cool! What is the trick to the speedup? -- AndreiWhat do you mean by direct buffer access?I mean instead of pulling data out one character at a time from another buffer into your buffer, you have direct access, using all the speedups that CPUs provide when accessing arrays. I'm also allowing the compiler to inline as much as possible because all the code is available.What is the relative contributions of these (and possibly other) design decisions to the overall speed?I don't know. I started with a vastly different design and it ended up being twice as fast. So it's difficult to say for certain what are the major factor that is slowing down std.stdio. At DConf, Adrian Matoga presented his very similar flod library, and I was surprised when it was faster than mine at processing lines, even though he was still using std.stdio.File. So I found a few reasons why mine was slower and fixed them. The things that made it much faster were: 1. Switched to direct pointer access (i.e. doing while(*ptr++ != lineTerminator) {} ) when the buffer type was an array. 2. I was stupidly accessing the member of the struct which held the terminator instead of allocating a stack variable for it. After doing that (and making it immutable), the compiler was much better informed on how to optimize. There may even be more opportunities for speedup, but I'm pretty satisfied with it at the moment.How can we integrate some of these in std.stdio without breaking backward compatibility,There are 2 main issues with FILE *: 1) it does not provide buffer access, so you must rely on things like getline if they exist. But these have their own problems (i.e. do not support unicode, require C-malloc'd buffer) 2) All calls are opaque, so no inlining can occur. This is especially painful if you have to call fgetc in a tight loop. This goes back to the buffer access issue -- FILE * only lets you look ahead one character. So I think the only real way to provide a decent performance improvement without breaking C compatibility is to write code that is aware of the implementation details of the opaque FILE * type. Just like getline.or offer additional artifacts that integrate seamlessly and don't need to be compatible?I had a grand plan to replace stdio with something that "evolves" to faster D based i/o when you did D-like things. But it's very complex, and each addition of code to stdio makes things more complex. In reality, I think we only need to be C compatible for stdout (possibly only via write[f][ln]), and possibly stdin. There's no reason we need to use C i/o for any other uses. What makes things difficult is that everything in Phobos uses std.stdio.File for all i/o and that uses C i/o. -Steve
May 31 2016
On Tuesday, 31 May 2016 at 12:42:23 UTC, Steven Schveighoffer wrote:There are 2 main issues with FILE *: 1) it does not provide buffer access, so you must rely on things like getline if they exist. But these have their own problems (i.e. do not support unicode, require C-malloc'd buffer)You can cheat by using setvbuf() and imposing your own buffer to the FILE* routine. What and how the underlying implementation put in that buffer is of course not documented but not very difficult to guess (for example fseek()/fread() will always fill the buffer from 0 to the end (of buffer or file depending what come first).
Jun 01 2016
On 6/1/16 10:05 AM, Patrick Schluter wrote:On Tuesday, 31 May 2016 at 12:42:23 UTC, Steven Schveighoffer wrote:But there is no mechanism to determine where the current file pointer is inside the buffer. One could use ftell vs. tell on the file descriptor, but that is going to perform quite poorly. -SteveThere are 2 main issues with FILE *: 1) it does not provide buffer access, so you must rely on things like getline if they exist. But these have their own problems (i.e. do not support unicode, require C-malloc'd buffer)You can cheat by using setvbuf() and imposing your own buffer to the FILE* routine. What and how the underlying implementation put in that buffer is of course not documented but not very difficult to guess (for example fseek()/fread() will always fill the buffer from 0 to the end (of buffer or file depending what come first).
Jun 01 2016
On 05/30/2016 12:22 AM, Jack Stouffer wrote:On Sunday, 29 May 2016 at 17:36:24 UTC, Steven Schveighoffer wrote:That has been fixed a long time ago by https://github.com/dlang/phobos/pull/3089, I just marked it as fixed.Wholly disagree. If we didn't cache the element, D would be a laughingstock of performance-minded tests.byLine already is a laughingstock performance wise: https://issues.dlang.org/show_bug.cgi?id=11810It's way faster to read the entire file into a buffer and iterate by line over that.Sometimes yes, but that's a change of approach with its own tradeoffs (e.g. what if you want to stop early, the file is very large etc. etc). Andrei
May 30 2016
On 5/27/16 7:42 PM, Seb wrote:So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?enum isTransient(R) = is(typeof(() { static assert(isInputRange!R); static assert(hasIndirections(ElementType!R)); static assert(!allIndrectionsImmutable!(ElementType!R)); // need to write this })); -Steve
May 29 2016
On 5/29/16 1:45 PM, Steven Schveighoffer wrote:On 5/27/16 7:42 PM, Seb wrote:obviously, this is better as a simple && statement between the three requirements :) When I started writing, I thought I'd have to write some runtime code. -SteveSo what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?enum isTransient(R) = is(typeof(() { static assert(isInputRange!R); static assert(hasIndirections(ElementType!R)); static assert(!allIndrectionsImmutable!(ElementType!R)); // need to write this }));
May 29 2016
On Sunday, 29 May 2016 at 18:09:29 UTC, Steven Schveighoffer wrote:On 5/29/16 1:45 PM, Steven Schveighoffer wrote:Would that make a range of polymorphic objects transient?On 5/27/16 7:42 PM, Seb wrote:obviously, this is better as a simple && statement between the three requirements :) When I started writing, I thought I'd have to write some runtime code. -SteveSo what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?enum isTransient(R) = is(typeof(() { static assert(isInputRange!R); static assert(hasIndirections(ElementType!R)); static assert(!allIndrectionsImmutable!(ElementType!R)); // need to write this }));
May 29 2016
On Sunday, May 29, 2016 18:27:53 default0 via Digitalmars-d wrote:On Sunday, 29 May 2016 at 18:09:29 UTC, Steven Schveighoffer wrote:It would make pretty much anything that isn't a value type - including a type that's actually a value but uses postblit to do it - be treated as transient, with the one exception being that if the reference types involved are immutable (be they the element type or members in the elmenet type), then it's not treated as transient. This means a very large number of ranges will be treated as being transient, which is completely unacceptable IMHO. Having a transient front is _not_ the norm, and code is usually written with the assumption that front is not transient. In almost all cases, if a range-based function happens to work with a transient front, it's by luck and not because it was designed that way. You can't statically check for transience, because it depends on runtime behavior. At best, you can statically eliminate a fairly small portion of the ranges as not being having transient fronts. If we want to actually support transient fronts, it really needs to be explicit IMHO. Regardless, I don't think that we want to need to be checking for transience in range-based functions in general. It's too much extra complication for too little benefit. A very small number of ranges actually have or benefit from having a transient front, and I don't think that it's worth supporting them as ranges given how much that affects everything else. Otherwise, you end up with the 1% case causing problems for all range-based code. - Jonathan M DavisOn 5/29/16 1:45 PM, Steven Schveighoffer wrote:Would that make a range of polymorphic objects transient?On 5/27/16 7:42 PM, Seb wrote:obviously, this is better as a simple && statement between the three requirements :) When I started writing, I thought I'd have to write some runtime code. -SteveSo what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?enum isTransient(R) = is(typeof(() { static assert(isInputRange!R); static assert(hasIndirections(ElementType!R)); static assert(!allIndrectionsImmutable!(ElementType!R)); // need to write this }));
May 29 2016
On 5/29/16 2:27 PM, default0 wrote:On Sunday, 29 May 2016 at 18:09:29 UTC, Steven Schveighoffer wrote:Of course! If it's mutable (or marked const), it can change from call to call. -SteveOn 5/29/16 1:45 PM, Steven Schveighoffer wrote:Would that make a range of polymorphic objects transient?On 5/27/16 7:42 PM, Seb wrote:obviously, this is better as a simple && statement between the three requirements :) When I started writing, I thought I'd have to write some runtime code.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?enum isTransient(R) = is(typeof(() { static assert(isInputRange!R); static assert(hasIndirections(ElementType!R)); static assert(!allIndrectionsImmutable!(ElementType!R)); // need to write this }));
May 30 2016
On Sunday, 29 May 2016 at 17:45:00 UTC, Steven Schveighoffer wrote:On 5/27/16 7:42 PM, Seb wrote:allIndrectionsImmutable could probably just be is(T : immutable) (ie implicitly convertible to immutable). Value types without (or with immutable only) indirections should be convertible to immutable, since the value is being copied.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?enum isTransient(R) = is(typeof(() { static assert(isInputRange!R); static assert(hasIndirections(ElementType!R)); static assert(!allIndrectionsImmutable!(ElementType!R)); // need to write this })); -Steve
May 29 2016
On 5/29/16 11:46 PM, Alex Parrill wrote:On Sunday, 29 May 2016 at 17:45:00 UTC, Steven Schveighoffer wrote:Yes, I think that is correct. Thanks! -SteveOn 5/27/16 7:42 PM, Seb wrote:allIndrectionsImmutable could probably just be is(T : immutable) (ie implicitly convertible to immutable). Value types without (or with immutable only) indirections should be convertible to immutable, since the value is being copied.So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?enum isTransient(R) = is(typeof(() { static assert(isInputRange!R); static assert(hasIndirections(ElementType!R)); static assert(!allIndrectionsImmutable!(ElementType!R)); // need to write this }));
May 30 2016
On Friday, 27 May 2016 at 23:42:24 UTC, Seb wrote:So what about the convention to explicitely declare a `.transient` enum member on a range, if the front element value can change?An alternative solution is to extend data-flow/escape-analysis to forbit references to `scope`d-variables from leaking outside of the scope where it's declared. If the element variable in the `foreach`-statement is then qualifed with `scope` the developer can safely use the front-reference inside the foreach-scope without worrying about it leaking into the enclosing scopes. However, this solution, of course, requires the developer to remember to use the `scope` keyword every time he iterates over a transient range, which might not be what want in terms of simplicity. For a very technical plan on how to implement this in D see http://wiki.dlang.org/User:Schuetzm/scope Could this big undertaking be split up into smaller more managable parts?
May 30 2016