digitalmars.D.learn - Recommendations on avoiding range pipeline type hell

reply Chris Piker <chris hoopjump.com> writes:
Hi D

Since the example of piping the output of one range to another 
looked pretty cool, I've tried my own hand at it for my current 
program, and the results have been... sub optimal.

Basically the issue is that if one attempts to make a range based 
pipeline aka:

```d
auto mega_range = range1.range2!(lambda2).range3!(lambda3);
```
Then the type definition of mega_range is something in the order 
of:

```d
   TYPE_range3!( TYPE_range2!( TYPE_range1, TYPE_lambda2 ), TYPE_lambda3 );
```
So the type tree builds to infinity and the type of `range3` is 
very much determined by the lambda I gave to `range2`.  To me 
this seems kinda crazy.

To cut through all the clutter, I need something more like a unix 
command line:
```bash
prog1 | prog2 some_args | prog3 some_args
```
Here prog2 doesn't care what prog1 *is* just what it produces.

So pipelines that are more like:

```d
ET2 front2(ET1, FT)(ET1 element, FT lambda){ /* stuff */ }
ET3 front3(ET2, FT)(ET2 element, FT lambda){ /* stuff */ }

void main(){

   for(; !range1.empty; range1.popFront() )
   {
     ET3 el3 = front3( front2(range1.front, lambda2), lambda3 );
     writeln(el3);
   }
}
```

But, loops are bad.  On the D blog I've seen knowledgeable people 
say all loops are bugs.  But how do you get rid of them without 
descending into Type Hell(tm)?  Is there any way to get some type 
erasure on the stack?

The only thing I can think of is to use Interfaces and Classes 
like Java, but we don't have the automagical JVM reordering the 
heap at runtime, so that means living life on a scattered heap, 
just like python.

Is there some obvious trick or way of looking at the problem that 
I'm missing?

Thanks for your patience with a potentially dumb question.  I've 
been working on the code for well over 12 hours so I'm probably 
not thinking straight at this point.

Cheers all,
May 15 2021
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 15 May 2021 at 11:25:10 UTC, Chris Piker wrote:
 Then the type definition of mega_range is something in the 
 order of:
The idea is you aren't supposed to care what the type is, just what attributes it has, e.g., can be indexed, or can be assigned, etc.

You'd want to do it all in one big statement, with a consumer at the end (and pray there's no errors cuz while you're supposed to hide from the type, it won't hide from you if there's a problem, and as you know the errors might be usable if they were formatted better and got to the point but they're not and sometimes the compiler withholds vital information from you!)

Anyway, you put it all in one big thing and this is kinda important: avoid assigning it to anything. You'd ideally do all the work, from creation to conclusion, all in the big pipeline.

So say you want to write it

    auto mega_range = range1.range2!(lambda2).range3!(lambda3);
    writeln(mega_range);

that'd prolly work, writeln is itself flexible enough, but you'd prolly be better off doing like

```
range1
   .range2!lambda2
   .range3!lambda3
   .each!writeln; // tell it to write out each element
```

Or since writeln is itself a range consumer you could avoid that .each call. But it is really useful for getting the result out of a big mess for a normal function that isn't a full range consumer. (It is basically foreach written as a function call instead of as a loop)

This way the concrete type never enters into things, it is all just a detail the compiler tracks to ensure the next consumer doesn't try to do things the previous step does not support.
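A minimal, self-contained sketch of that single-statement style, using only Phobos pieces (`iota`, `filter`, `map`, `each`). The `evenSquares` helper and its numbers are made up for illustration; they stand in for the `range1`/`range2`/`range3` of the question.

```d
import std.algorithm : each, filter, map;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical stand-in for range1.range2!(...).range3!(...).
// The return type is `auto`, so the concrete type is never spelled out.
auto evenSquares(int limit)
{
    return iota(1, limit + 1)
        .filter!(n => n % 2 == 0)   // keep the even numbers
        .map!(n => n * n);          // square them
}

void main()
{
    // The consumer sits at the end of the pipeline; nothing is
    // assigned to a named variable along the way.
    evenSquares(10).each!writeln;
}
```

The only place a type shows up is the `int` parameter; everything after that is whatever the compiler deduces.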
 But, loops are bad.  On the D blog I've seen knowledgeable 
 people say all loops are bugs.
Meh, don't listen to that nonsense, just write what works for you. D's strength is that it adapts to different styles and meets you where you are. Listening to dogmatic sermons about idiomatic one true ways is throwing that strength away and likely to kill your personal productivity as you're fighting your instincts instead of making it work.
May 15 2021
next sibling parent reply Chris Piker <chris hoopjump.com> writes:
On Saturday, 15 May 2021 at 11:51:11 UTC, Adam D. Ruppe wrote:
 On Saturday, 15 May 2021 at 11:25:10 UTC, Chris Piker wrote:
 The idea is you aren't supposed to care what the type is, just 
 what attributes it has, e.g., can be indexed, or can be 
 assigned, etc.
(Warning, new user rant ahead. Eye rolling warranted and encouraged)

I'm trying to do that, but range3 and range2 are written by me not a Phobos wizard, and there's a whole library of template functions a person needs to learn to make their own pipelines. For example:

```d
// From std/range/package.d
CommonType!(staticMap!(ElementType, staticMap!(Unqual, Ranges))

alias RvalueElementType = CommonType!(staticMap!(.ElementType, R));
// ... what's with the . before the ElementType statement?  Line 921 says
// .ElementType depends on RvalueElementType.  How can they depend on
// each other?  Is this a recursive template thing?
```

and all the other automagic stuff that phobos pulls off to make ranges work. If that's what's needed to make a custom range type, then D ranges should come with the warning **don't try this at home**. (Ali's book made it look so easy that I got suckered in)

Every time I slightly change the inputs to range2, then a function that operates on *range3* output types blows up with a helpful message similar to:

```
template das2.range.PrioritySelect!(PriorityRange!(DasRange!(Tuple!(int, int)[],
int function(Tuple!(int, int)) pure nothrow @nogc @safe, int function(Tuple!(int, int))
pure nothrow @nogc @safe, Tuple!(int, int), int), int function() pure nothrow @nogc @safe),
PriorityRange!(DasRange!(Tuple!(int, int)[], int function(Tuple!(int, int)) pure nothrow
@nogc @safe, int function(Tuple!(int, int)) pure nothrow @nogc @safe, Tuple!(int, int),
int), int function() pure nothrow @nogc @safe)).PrioritySelect.getReady.filter!((rng) =>
!rng.empty).filter cannot deduce function from argument types
!()(PriorityRange!(DasRange!(Tuple!(int, int)[], int function(Tuple!(int, int)) pure
nothrow @nogc @safe, int function(Tuple!(int, int)) pure nothrow @nogc @safe,
Tuple!(int, int), int), int function() pure nothrow @nogc @safe),
PriorityRange!(DasRange!(Tuple!(int, int)[], int function(Tuple!(int, int)) pure nothrow
@nogc @safe, int function(Tuple!(int, int)) pure nothrow @nogc @safe,
Tuple!(int, int), int), int function() pure nothrow @nogc @safe))
```

What the heck is that?
 Anyway, you put it all in one big thing and this is kinda 
 important: avoid assigning it to anything. You'd ideally do all 
 the work, from creation to conclusion, all in the big pipeline.
I fell back to using assignments just to make sure range2 values were saved in a concrete variable so that range3 didn't break when I changed the lambda that was run by range2 to mutate its output elements. What went into getting the element to range3's doorstep is a detail that I shouldn't have to care about inside range3 code, but I am forced to care about it, because changing range2's type changes range3's type and triggers really obscure error messages. (Using interfaces or *gasp* casts, would break the TMI situation.)
 So say you want to write it

 auto mega_range = range1.range2!(lambda2).range3!(lambda3);
 writeln(mega_range);

 that'd prolly work, writeln is itself flexible enough, but 
 you'd prolly be better off doing like
Sure it will work, because writeln isn't some function written by a new user, it's got all the meta magic.
 This way the concrete type never enters into things, it is all 
 just a detail the compiler tracks to ensure the next consumer 
 doesn't try to do things the previous step does not support.
It's all just a detail the compiler tracks, until you're not sending to writeln, but to your own data consumer. Then, you'd better know all of std.traits and std.meta cause you're going to need them to implement a range-of-ranges consumer.

And by the way you have to use a range of ranges instead of an array of ranges because two ranges that look to be identical types, actually are not identical types and so can't go into the same array.

Here's an actual (though formatted by me) error message I got stating that two things were different and thus couldn't share an array. Can you see the difference? I can't. Please point it out if you do.

```d
das2/range.d(570,39): Error: incompatible types for (dr_fine) : (dr_coarse):

das2.range.PriorityRange!(
  DasRange!(
    Take!(
      ZipShortest!(
        cast(Flag)false, Result, Generator!(function () @safe => uniform(0, 128))
      )
    ),
    int function(Tuple!(int, int)) pure nothrow @nogc @safe,
    int function(Tuple!(int, int)) pure nothrow @nogc @safe,
    Tuple!(int, int),
    int
  ),
  int function() pure nothrow @nogc @safe
)

and

das2.range.PriorityRange!(
  DasRange!(
    Take!(
      ZipShortest!(
        cast(Flag)false, Result, Generator!(function () @safe => uniform(0, 128))
      )
    ),
    int function(Tuple!(int, int)) pure nothrow @nogc @safe,
    int function(Tuple!(int, int)) pure nothrow @nogc @safe,
    Tuple!(int, int),
    int
  ),
  int function() pure nothrow @nogc @safe
)
```
 But, loops are bad.  On the D blog I've seen knowledgeable 
 people say all loops are bugs.
Meh, don't listen to that nonsense, just write what works for you. D's strength is that it adapts to different styles and meets you where you are. Listening to dogmatic sermons about idiomatic one true ways is throwing that strength away and likely to kill your personal productivity as you're fighting your instincts instead of making it work.
Insightful. Anyway, if you made it this far, you're a saint. Thanks for your time :)
May 15 2021
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Saturday, 15 May 2021 at 13:46:57 UTC, Chris Piker wrote:
 Every time I slightly change the inputs to range2, then a 
 function that operates on *range3* output types blows up with a 
 helpful message similar to:
 [snip]
If you post your code (or at least a self-contained subset of it) someone can probably help you figure out where you're running into trouble. The error messages by themselves do not provide enough information--all I can say from them is, "you must be doing something wrong."
May 15 2021
next sibling parent Chris Piker <chris hoopjump.com> writes:
On Saturday, 15 May 2021 at 14:05:34 UTC, Paul Backus wrote:
 If you post your code (or at least a self-contained subset of 
 it) someone can probably help you figure out where you're 
 running into trouble.
Smart idea. It's all on github. I'll fix a few items and send a link soon as I get a little shut eye.
 all I can say from them is, "you must be doing something wrong."
I bet you're right :) Take Care,
May 15 2021
prev sibling parent reply Chris Piker <chris hoopjump.com> writes:
On Saturday, 15 May 2021 at 14:05:34 UTC, Paul Backus wrote:

 If you post your code (or at least a self-contained subset of 
 it) someone can probably help you figure out where you're 
 running into trouble. The error messages by themselves do not 
 provide enough information--all I can say from them is, "you 
 must be doing something wrong."
I just tacked on `.array` in the unittest and moved on for now, but for those who may be interested in the "equivalent but not equivalent" dmd error message mentioned above, the code is up on github.

To trigger the error message:

```bash
git clone git@github.com:das-developers/das2D.git
cd das2D
```

In file `das2/range.d`, comment out lines 550 & 553 and uncomment lines 557 & 558 to get alternate definitions of `coarse_recs` and `fine_recs` then run rdmd again:

```bash
```

In addition to the issue mentioned above, comments on any style issues, best practices or design choices are invited.

By the way the writeln calls in the unittests are just temporary.
May 16 2021
parent reply Jordan Wilson <wilsonjord gmail.com> writes:
Essentially, `dr_fine` and `dr_coarse` are different types. For example:

```bash
echo 'import std; void main() { auto a = [1,"test"]; }' | dmd -run -
```

Another example:

```d
auto r = [iota(1,10).map!(a => a.to!int),iota(1,10).map!(a => a.to!int)];
```

Using `.array` on both of the elements of r will compile.

Thanks,
Jordan
May 16 2021
parent reply Chris Piker <chris hoopjump.com> writes:
On Sunday, 16 May 2021 at 09:17:47 UTC, Jordan Wilson wrote:

 Another example:
 ```d
 auto r = [iota(1,10).map!(a => a.to!int),iota(1,10).map!(a => 
 a.to!int)];

 ```
Hi Jordan Nice succinct example. Thanks for looking at the code :) So, honest question. Does it strike you as odd that the exact same range definition is considered to be two different types? Maybe that's eminently reasonable to those with deep knowledge, but it seems crazy to a new D programmer. It breaks a general assumption about programming when copying and pasting a definition yields two things that aren't the same type. (except in rare cases like SQL where null != null.) On a side note, I appreciate that `.array` solves the problem, but I'm writing pipelines that are supposed to work on arbitrarily long data sets (> 1.4 TB is not uncommon).
May 16 2021
next sibling parent reply SealabJaster <sealabjaster gmail.com> writes:
On Sunday, 16 May 2021 at 09:55:31 UTC, Chris Piker wrote:
 Maybe that's eminently reasonable to those with deep knowledge, 
 but it seems crazy to a new D programmer.  It breaks a general 
 assumption about programming when copying and pasting a 
 definition yields two things that aren't the same type. (except 
 in rare cases like SQL where null != null.)
It's due to a quirk with passing lambdas as template arguments. Each lambda is actually separated into its own function.

It's kind of hard to explain, but examine this code:

```d
// runnable version: https://run.dlang.io/is/NbU3iT
struct S(alias Func)
{
    pragma(msg, __traits(identifier, Func));
}

int func(int a)
{
    return a*2;
}

void main()
{
    auto a = S!(a => a*2)();
    auto b = S!(a => a*2)();

    // Comment above. Then uncomment below for a working version.
    /*
    auto a = S!func();
    auto b = S!func();
    */
    pragma(msg, typeof(a));
    pragma(msg, typeof(b));
    a = b;
}
```

In its given state, this is the following output:

```
__lambda1
__lambda3
S!((a) => a * 2)
S!((a) => a * 2)
onlineapp.d(24): Error: cannot implicitly convert expression `b` of type `onlineapp.main.S!((a) => a * 2)` to `onlineapp.main.S!((a) => a * 2)`
```

So while `typeof(a)` and `typeof(b)` **look** like they are the same, in actuality you can see that `auto a` uses `__lambda1`, whereas `auto b` uses `__lambda3`.

This means that, even though visually they should be equal, they are in fact two entirely separate types.

So if you had a nested `Result` struct, it'd look more like `S!__lambda1.Result` and `S!__lambda3.Result`, instead of just `S!IDENTICAL_LAMBDA.Result`.

Confusing, I know...

So if we change this to using a non-lambda function (doing the commenting/uncommenting as mentioned in the code) then we get successful output:

```
func
S!(func)
S!(func)
```

p.s. I love that you can debug D within D.
May 16 2021
parent reply Chris Piker <chris hoopjump.com> writes:
On Sunday, 16 May 2021 at 10:10:54 UTC, SealabJaster wrote:

 It's due to a quirk with passing lambdas as template arguments. 
 Each lambda is actually separated into its own function.
Hey that was a very well laid out example. Okay, I think the light is starting to dawn. So if I use lambdas as real arguments (not type arguments) then I'm just storing function pointers and I'm back to C land where I understand what's going on. Though I may lose some safety, I gain some sanity. For example:

```d
alias EXPLICIT_TYPE = int function (int);

struct Tplt(FT)
{
   FT f;
}

void main()
{
   auto a = Tplt!(EXPLICIT_TYPE)( a => a+3);
   auto b = Tplt!(EXPLICIT_TYPE)( a => a*2);

   a = b; // Lambdas as arguments instead of types works
}
```

Here's the non-lambda version of your example that helped me to understand what's going on, and how the function called gets mashed into the type (even though `typeid` doesn't tell us that's what happens):

```d
struct S(alias Func)
{
   pragma(msg, __traits(identifier, Func));
}

int func1(int a){ return a*2; }

int func2(int a){ return a*2; }

void main()
{
   auto a = S!func1();
   auto b = S!func2();

   pragma(msg, typeof(a));
   pragma(msg, typeof(b));
   a = b;
}
```

I'm going to go above my station and call this a bug in typeof/typeid. If the function name is part of the type, that should be stated explicitly to make the error messages more clear. We depend on those type names in compiler messages to understand what's going on.

Cheers,
May 16 2021
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 16 May 2021 at 12:54:19 UTC, Chris Piker wrote:
    a = b; // Lambdas as arguments instead of types works
Wait a sec, when you do the

```d
auto a = S!(a => a*2)();
```

That's not actually passing a type. That's passing the (hidden) name of an on-the-spot-created function template as a compile time parameter.

If it was just a type, you'd probably be ok! It is that on-the-spot-created bit that trips up. The compiler doesn't bother looking at the content of the lambda to see if it is repeated from something earlier. It just sees that shorthand syntax and blindly expands it to:

```
static int __some_internal_name_created_on_line_5(int a) {
    return a*2;
}
```

(even that isn't entirely true, because you didn't specify the type of `a` in your example, meaning the compiler actually passes

```
template __some_internal_name_created_on_line_5(typeof_a) {
    auto __some_internal_name_created_on_line_5(typeof_a a) {
        return a*2;
    }
}
```

and that *template* is instantiated with the type the range passes to the lambda - inside the range's implementation - to create the *actual* function that it ends up calling. but that's not really important to the point since you'd get the same thing even if you did specify the types in this situation.)

Anyway, when you repeat it later, it makes *another*:

```
static int __some_internal_name_created_on_line_8(int a) {
    return a*2;
}
```

And passes that. Wanna know what's really nuts? If it is made in the context of another template, even being on the same line won't save you from duplicates. It creates a new copy of the lambda for each and every distinct context it sees.

Same thing in a different object? Another function. Different line? Another function. Different template argument in the surrounding function? Yet another function.

In my day job thing at one point one little `(a,b) => a < b` sorting lambda exploded to *two gigabytes* of generated identical functions in the compiler's memory, and over 100 MB in the generated object files. simply moving that out to a top-level function eliminated all that bloat... most of us could barely believe such a little thing had such a profound impact.

It would be nice if the compiler could collapse those duplicates by itself, wouldn't it? But...

    void main() {
        auto a = (int arg) => arg + 1;
        auto b = (int arg) => arg + 1;

        assert(a is b);
    }

Should that assert pass? are those two actually the same function? Right now it does NOT, but should it? That's a question for philosophers and theologians, way above my pay grade.

Then a practical worry, how does the compiler tell if two lambdas are actually identical? There's a surprising number of cases that look obvious to us, but aren't actually. Suppose it refers to a different local variable. Or ends up with a different type of argument. Or what if they came from separate compilation units? It is legitimately more complex than it seems at first glance.

I digress again... where was I? Oh yeah, since it is passing an alias to the function to the range instead of the type, the fact that they're considered distinct entities - even if just because the implementation is lazy and considers that one was created on line 5 and one was created on line 8 to be an irreconcilable difference - means the range based on that alias now has its own distinct type.

Indeed, passing the lambda as a runtime arg fixes this to some extent since at least then the types match up. But there's still a bit of generated code bloat (not NEARLY as much! but still a bit).

For best results, declare your own function as close to top-level as you can with as many discrete types as you can, and give it a name. Don't expect the compiler to automatically factor things out for you. (for now, i still kinda hope the implementation can improve someday.)

This is obviously more of a hassle. Even with a runtime param you have to specify more than just `a => a*2`... minimally like `(int a) => a*2`.
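A small sketch of the "give it a name" advice applied to the `iota(...).map!(...)` example from earlier in the thread. The names `twice` and `bothPipelines` are made up for illustration. Because both pipelines pass the same symbol as the alias argument, they should come out as one type and can share an array without `.array`:

```d
import std.algorithm : map;
import std.range : iota;
import std.stdio : writeln;

// A named, module-level function instead of an on-the-spot lambda.
int twice(int a) { return a * 2; }

auto bothPipelines()
{
    auto r1 = iota(1, 10).map!twice;
    auto r2 = iota(1, 10).map!twice;

    // Both instantiations use the same alias argument (`twice`),
    // so they are the same type.
    static assert(is(typeof(r1) == typeof(r2)));

    // Same type, so an array of the (still lazy) ranges is fine.
    return [r1, r2];
}

void main()
{
    foreach (r; bothPipelines())
        writeln(r);
}
```

Nothing is evaluated until `writeln` walks each range, so this keeps the pipelines lazy.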
 ```d
 struct S(alias Func)
 {
    pragma(msg, __traits(identifier, Func));
 }

 int func1(int a){ return a*2; }

 int func2(int a){ return a*2; }

 void main()
 {
    auto a = S!func1();
    auto b = S!func2();

    pragma(msg, typeof(a));
    pragma(msg, typeof(b));
    a = b;
 }

 ```
 I'm going to go above my station and call this a bug in 
 typeof/typeid.
Wait, what's the bug there? The typeof DOES tell you they are separate.

    Error: cannot implicitly convert expression `b` of type `S!(func2)` to `S!(func1)`

Just remember it isn't the function name per se, it is the symbol `alias` that S is taking. Which means it is different for each symbol passed...

The alias in the parameter list tells you it is making a new type for each param. Same as if you did

    struct S(int a) {}

S!1 would be a distinct type from S!2.
May 16 2021
parent reply Chris Piker <chris hoopjump.com> writes:
On Sunday, 16 May 2021 at 13:35:02 UTC, Adam D. Ruppe wrote:
 Wait, what's the bug there? The typeof DOES tell you they are 
 separate.

 Error: cannot implicitly convert expression `b` of type 
 `S!(func2)` to `S!(func1)`
Sorry, it's a forum post, so I really should have been more explicit. It seems there's a broken symmetry in compiler error reporting for the following, ostensibly identical, cases:

```d
struct S(alias Func){ }

int func1(int a){ return a*2; }

int func2(int a){ return a*2; }

void main()
{
   auto a = S!func1();
   auto b = S!func2();

   a = b;             // Error message is understandable

   auto c = S!((int a) => a*2)();
   auto d = S!((int a) => a*2)();

   c = d;             // Error message says a thing != to a same thing
}
```

Error messages formatted below for readability:

```
test.d(12): Error: cannot implicitly convert expression b of type
  S!(func2) to S!(func1)

test.d(17): Error: cannot implicitly convert expression d of type
  test.S!(function (int a) pure nothrow @nogc @safe => a * 2)
to
  test.S!(function (int a) pure nothrow @nogc @safe => a * 2)
```

As a new D programmer, this really threw me off the debugging trail. For the second case, had the compiler reported something on the order of:

```
test.d(17): Error: cannot implicitly convert expression d of type
  test.S!(__lambda_temp_main_1)
to
  test.S!(__lambda_temp_main_2)
Hint: All lambda function instances have unique auto-generated names
```

It would have saved a lot of head scratching.

Oh well, I learned quite a bit from this exchange so that's productive. Thanks all
May 16 2021
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 16 May 2021 at 22:17:16 UTC, Chris Piker wrote:
 It seems there's a broken symmetry in compiler error reporting 
 for the following, ostensibly identical, cases:
Oh yes, I completely agree with you. Sometimes error messages even use the name but it is from a different module so it is like "cannot assign Color to Color" but one of them is from arsd.color and one is from std.experimental.color... it just wouldn't tell you that. (I think this has been fixed btw) so yeah same thing here, it should at least check if str1 == str2, show more info.
May 16 2021
parent reply SealabJaster <sealabjaster gmail.com> writes:
On Sunday, 16 May 2021 at 23:52:06 UTC, Adam D. Ruppe wrote:
 ...
I've opened a PR (https://github.com/dlang/dmd/pull/12526) with a super hacked together proof-of-concept. As I say in the PR I don't know if I'm actually capable of pushing it forward if the idea gets accepted, but I've decided to at least try to see if this kind of thing is even palatable to the compiler devs.
May 16 2021
next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 17 May 2021 at 00:27:01 UTC, SealabJaster wrote:
 I've opened a PR (https://github.com/dlang/dmd/pull/12526) with 
 a super hacked together proof-of-concept.
oh very good! I was going to add something similar to my own todo list but who knows when I'd get around to it. This kind of thing would be a big help.
May 16 2021
prev sibling parent Chris Piker <chris hoopjump.com> writes:
On Monday, 17 May 2021 at 00:27:01 UTC, SealabJaster wrote:
 On Sunday, 16 May 2021 at 23:52:06 UTC, Adam D. Ruppe wrote:
 ...
I've opened a PR (https://github.com/dlang/dmd/pull/12526) with a super hacked together proof-of-concept. As I say in the PR I don't know if I'm actually capable of pushing it forward if the idea gets accepted, but I've decided to at least try to see if this kind of thing is even palatable to the compiler devs.
Wow! That's good news. Thanks! Here's hoping that future versions of dmd do a bit more whitespace formatting of error messages. The current undifferentiated text walls are an early low quality user experience that can drive people away before they are invested in the language.

---

A final note on the initial problem that started this thread: Overall my program works now. (None too soon, management meeting is tomorrow) To solve my previous Type Hell(tm) problems I've found that `std.range.interfaces` is my new best friend.
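For readers landing here later, a rough sketch of what that can look like. It is not taken from the das2D code; `makeRanges` and `total` are hypothetical names. `inputRangeObject` wraps any input range in a class object, and `InputRange!int` is the interface type that hides the pipeline behind it:

```d
import std.algorithm : filter, map;
import std.range : iota;
import std.range.interfaces : InputRange, inputRangeObject;
import std.stdio : writeln;

// Hypothetical helper: two pipelines with different concrete types,
// both erased to the same interface type.
InputRange!int[] makeRanges()
{
    InputRange!int doubled = inputRangeObject(iota(1, 4).map!(a => a * 2));
    InputRange!int threes  = inputRangeObject(iota(1, 10).filter!(a => a % 3 == 0));
    return [doubled, threes];
}

// A consumer that only knows the interface, not the pipeline behind it.
int total(InputRange!int r)
{
    int sum = 0;
    for (; !r.empty; r.popFront())
        sum += r.front;
    return sum;
}

void main()
{
    foreach (r; makeRanges())
        writeln(total(r));
}
```

The trade-off is the one raised at the top of the thread: each wrapper is a heap-allocated class object and every `front`/`popFront` goes through a virtual call, but the data itself is still pulled lazily, so nothing forces the whole data set into memory.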
May 16 2021
prev sibling parent reply SealabJaster <sealabjaster gmail.com> writes:
On Sunday, 16 May 2021 at 12:54:19 UTC, Chris Piker wrote:
 ...
If all you need is a single type for the parameter(s) and return type, then it can be simplified a bit to save you some typing:

```d
struct S(T)
{
    alias FT = T function(T);

    FT func;
}

void main()
{
    auto a = S!int(a => a*2);
    auto b = S!int(a => a+2);
    a = b;
}
```
May 16 2021
parent SealabJaster <sealabjaster gmail.com> writes:
On Sunday, 16 May 2021 at 20:32:08 UTC, SealabJaster wrote:
 ...
You could even make a helper function that lets the compiler infer the lambda's type for you:

```d
struct S(T)
{
    alias FT = T function(T);

    FT func;
}

S!T s(T)(T function(T) func)
{
    return S!T(func);
}

void main()
{
    auto a = S!int(a => a*2);
    auto b = s((int a) => a+2);
    a = b;
}
```

But that can give you awful looking error messages if you forget to add the param typing.
May 16 2021
prev sibling next sibling parent Jordan Wilson <wilsonjord gmail.com> writes:
On Sunday, 16 May 2021 at 09:55:31 UTC, Chris Piker wrote:
 On Sunday, 16 May 2021 at 09:17:47 UTC, Jordan Wilson wrote:

 Another example:
 ```d
 auto r = [iota(1,10).map!(a => a.to!int),iota(1,10).map!(a => 
 a.to!int)];

 ```
Hi Jordan Nice succinct example. Thanks for looking at the code :) So, honest question. Does it strike you as odd that the exact same range definition is considered to be two different types? Maybe that's eminently reasonable to those with deep knowledge, but it seems crazy to a new D programmer. It breaks a general assumption about programming when copying and pasting a definition yields two things that aren't the same type. (except in rare cases like SQL where null != null.) On a side note, I appreciate that `.array` solves the problem, but I'm writing pipelines that are supposed to work on arbitrarily long data sets (> 1.4 TB is not uncommon).
There are those far more learned than me that could help explain. But in short, yes, it did take a little getting used to it - I would recommend looking at Voldemort types for D. Ironically, use of Voldemort types and range-based programming is what helps me perform large data processing. Jordan
May 16 2021
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Sunday, 16 May 2021 at 09:55:31 UTC, Chris Piker wrote:
 On Sunday, 16 May 2021 at 09:17:47 UTC, Jordan Wilson wrote:

 Another example:
 ```d
 auto r = [iota(1,10).map!(a => a.to!int),iota(1,10).map!(a => 
 a.to!int)];

 ```
Hi Jordan Nice succinct example. Thanks for looking at the code :) So, honest question. Does it strike you as odd that the exact same range definition is considered to be two different types?
Even in C

```
typedef struct {
    int a;
} type1;
```

and

```
typedef struct {
    int a;
} type2;
```

are two different types. The compiler will give an error if you pass one to a function waiting for the other.

```
void fun(type1 v)
{
}

type2 x;

fun(x);  // gives error
```

See https://godbolt.org/z/eWenEW6q1
 Maybe that's eminently reasonable to those with deep knowledge, 
 but it seems crazy to a new D programmer.  It breaks a general 
 assumption about programming when copying and pasting a 
 definition yields two things that aren't the same type. (except 
 in rare cases like SQL where null != null.)
 On a side note, I appreciate that `.array` solves the problem, 
 but I'm writing pipelines that are supposed to work on 
 arbitrarily long data sets (> 1.4 TB is not uncommon).
May 16 2021
prev sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 15 May 2021 at 13:46:57 UTC, Chris Piker wrote:
 I'm trying to do that, but range3 and range2 are written by me 
 not a Phobos wizard, and there's a whole library of template 
 functions a person needs to learn to make their own pipelines.  
 For example:
Phobos has plenty of design flaws, you don't want to copy that.

Generally you should just accept the range with a simple foreach in your handler.

```
void processRange(R)(R r) {
    foreach(item; r) {
        // use item
    }
}
```

If you want it forwarded in a pipeline, make a predicate that works on an individual item and pass it to `map` instead of trying to forward everything.

If you're creating a range, only worry about the basic three functions: empty, front, and popFront. That's the minimum then it works with most phobos things too. That's where I balance ease of use with compatibility - those three basics let the phobos ones iterate through your generated data. Can't jump around but you can do a lot with just that.

(personally btw I don't even use most of this stuff at all)
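To make the "basic three" concrete, here is a minimal sketch of a hand-written input range. `Countdown` is a made-up example type, not anything from Phobos or das2D; the only assumption is that an input range needs `empty`, `front` and `popFront`, which is all this struct provides:

```d
import std.algorithm : equal, map;
import std.stdio : writeln;

// Hypothetical example range: yields n, n-1, ..., 1.
struct Countdown
{
    int n;

    @property bool empty() const { return n <= 0; }  // anything left?
    @property int front() const { return n; }        // current element
    void popFront() { --n; }                         // advance to the next one
}

void main()
{
    // Plain foreach works on it...
    foreach (x; Countdown(3))
        writeln(x);

    // ...and so do the lazy Phobos algorithms that only need iteration.
    writeln(Countdown(3).map!(x => x * 10));
}
```

No traits, no constraints and no forwarding logic are needed for this much; those only come into play when a range wants to offer more (length, indexing, save and so on).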
 // ... what's with the . before the ElementType statement?
Now that is good to know: that is a D language thing meaning "look this up at top level".

So like let's say you are writing a module with

```
void func();

class Foo {
    void func();

    void stuff() {
        func();
    }
}
```

The func inside stuff would normally refer to the local method; it is shorthand for `this.func();`.

But what if you want that `func` from outside the class? That's where the . comes in:

```
void func();

class Foo {
    void func();

    void stuff() {
        .func(); // now refers to top-level, no more `this`
    }
}
```

In fact, it might help to think of it as specifically NOT wanting `this.func`, so you leave the this out.
 What the heck is that?
idk i can't read that either, the awful error message are one reason why i don't even use this style myself (and the other is im just not on the functional bandwagon...)

Most the time std.algorithm vomits though it is because some future function required a capability that got dropped in the middle.

For example:

    some_array.filter.sort

would vomit because sort needs random access, but filter drops that. So like sort says "give me the second element" but filter doesn't know what the second element is until it actually processes the sequence - it might filter out ALL the elements and it has no way of knowing if anything is left until it actually performs the filter.

And since all these algorithms are lazy, it puts off actually performing anything until it has some idea what the end result is supposed to be.

The frequent advice here is to stick ".array" in the middle, which performs the operation up to that point and puts the result in a freshly-created array. This works, but it also kinda obscures why it is there and sacrifices the high performance the lazy pipeline is supposed to offer, making it process intermediate data it might just discard at the next step anyway. Rearranging the pipeline so the relatively destructive items are last can sometimes give better results.

(But on the other hand, sorting 100,000 items when you know 99,000 are going to be filtered out is itself wasted time... so there's no one right answer.)

anyway idk what's going on in your case. it could even just be a compile error in a predicate, like a typo'd name. it won't tell you, it just vomits up so much spam it could fill a monty python sketch.
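A small sketch of the filter-then-sort case described above, with a hypothetical `sortedEvens` helper. The commented-out line is the version that fails, since the result of `filter` is not random access; the `.array` call is the "perform everything up to here" step:

```d
import std.algorithm : filter, sort;
import std.array : array;
import std.stdio : writeln;

// Hypothetical helper: keep the even numbers, then sort them.
int[] sortedEvens(int[] data)
{
    // data.filter!(x => x % 2 == 0).sort;  // does not compile:
    //                                      // filter's result is not random access

    // .array runs the lazy filter and stores the survivors in a new
    // array, which *is* random access, so sort can work on it.
    auto evens = data.filter!(x => x % 2 == 0).array;
    evens.sort();
    return evens;
}

void main()
{
    writeln(sortedEvens([5, 4, 1, 2, 8]));
}
```

Here the filter runs first, so only the surviving elements get copied and sorted, which is the cheaper order when most items are going to be dropped.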
 messages. (Using interfaces or *gasp* casts, would break the 
 TMI situation.)
i <3 interfaces it is a pity to me cuz D's java-style OOP is actually pretty excellent. a few little things I'd fix if I could, a few nice additions I could dream up, but I'm overall pretty happy with it and its error messages are much better. but too many people in the core team are allergic to classes. and i get it, classes do cost you some theoretical performance, and a lot of people's class hierarchies are hideous af, but hey they work and give pretty helpful errors. Most the time.
 better know all of std.traits and std.meta cause you're going 
 to need them to implement a range-of-ranges consumer.
Write your function like it is Python or javascript - use the properties you want on an unconstrained template function.

    void foo(T)(T thing) {
        // use thing.whatever
        // or thing[whatever]
        // or whatever you need
    }

Even if that's a range of ranges:

    void foo(T)(T thing) {
        foreach(range; thing)
            foreach(item; range) {
                // use item.
            }
    }

It will work if you actually get a range of ranges and if not, you get an error anyway. It isn't like the constraint ones are readable, so just let this fail where it may. (In fact, I find the non-constraint messages to be a little better! I'd rather see like "cannot foreach over range" than "no match for <spam>")

I don't even think phobos benefits from its traits signatures. If you do it wrong it won't compile the same as if you do all the explicit checks.

But again, if you're doing some intermediate processing... try to use map, filter, fold, and friends... since doing the forwarding they do is legitimately complicated and my little foo consumers here don't even touch it.
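A runnable version of that idea, as a sketch only: `sumAll` is a made-up consumer, and the inputs are toy data. It takes anything that can be iterated two levels deep, with no constraints and no std.traits.

```d
import std.range : chunks, iota;
import std.stdio : writeln;

// Unconstrained template: no std.traits, no std.meta, just use the thing.
int sumAll(T)(T thing)
{
    int total = 0;
    foreach (range; thing)
        foreach (item; range)
            total += item;
    return total;
}

void main()
{
    // An array of arrays is a range of ranges.
    writeln(sumAll([[1, 2, 3], [4, 5], [6]])); // 21

    // So is a lazy one: 1..6 split into chunks of two.
    writeln(sumAll(iota(1, 7).chunks(2))); // 21

    // sumAll(42); // does not compile: an int cannot be iterated
}
```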
 Here's an actual (though formatted by me) error message I got 
 stating that two things were different and thus couldn't share 
 an array.  Can you see the difference?  I can't.  Please point 
 it out if you do.
idk.... maybe with the full code i could guess and check my way to something but i too lazy rn tbh.
May 15 2021
parent Chris Piker <chris hoopjump.com> writes:
Thanks to everyone who has replied.  You've given me a lot to 
think about, and since I'm not yet fluent in D it will take a bit 
to digest it all, though one thing is clear.

This community is one of the strong features of D.

I will mention it to others as a selling point.

Best,
May 15 2021
prev sibling parent Jerry <labuurii gmail.com> writes:
On Saturday, 15 May 2021 at 11:51:11 UTC, Adam D. Ruppe wrote:
 Meh, don't listen to that nonsense, just write what works for 
 you. D's strength is that it adapts to different styles and 
 meets you where you are. Listening to dogmatic sermons about 
 idiomatic one true ways is throwing that strength away and 
 likely to kill your personal productivity as you're fighting 
 your instincts instead of making it work.
+1
May 19 2021
prev sibling next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Saturday, 15 May 2021 at 11:25:10 UTC, Chris Piker wrote:

 Is there some obvious trick or way of looking at the problem 
 that I'm missing?
In addition to what Adam said, if you do need to store the result for use in a friendlier form, just import `std.array` and append `.array` to the end of the pipeline. This will eagerly allocate space for and copy the range elements to an array, i.e., convert the range to a container:

```d
auto mega_range = range1.range2!(lambda2).range3!(lambda3).array;
```

Sometimes you may want to set up a range and save it for later consumption, but not necessarily as a container. In that case, just store the range itself as you already do, and pass it to a consumer when you're ready. That might be `.array` or it could be `foreach` or something else.

```d
auto mega_range = range1.range2!(lambda2).range3!(lambda3);

// later
foreach(elem; mega_range) {
   doStuff(elem);
}
```
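For a version that compiles on its own, here is a small sketch with `iota`, `filter` and `map` standing in for `range1`, `range2` and `range3`. The function name `evenSquares` is invented for the example.

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical pipeline: squares of the even numbers below `limit`.
// Returning `auto` means the long range type never has to be written out.
auto evenSquares(int limit)
{
    return iota(1, limit).filter!(n => n % 2 == 0).map!(n => n * n);
}

void main()
{
    // Store the lazy range; nothing has been computed yet.
    auto mega_range = evenSquares(11);

    // Eager: .array walks the pipeline once and copies the results.
    int[] stored = mega_range.array;
    writeln(stored); // [4, 16, 36, 64, 100]

    // Lazy: consume the stored range later, one element at a time.
    foreach (elem; mega_range)
        writeln(elem);
}
```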
May 15 2021
parent Chris Piker <chris hoopjump.com> writes:
On Saturday, 15 May 2021 at 13:43:29 UTC, Mike Parker wrote:
 On Saturday, 15 May 2021 at 11:25:10 UTC, Chris Piker wrote:
 In addition to what Adam said, if you do need to store the 
 result for use in a friendlier form, just import `std.array` 
 and append `.array` to the end of the pipeline. This will 
 eagerly allocate space for and copy the range elements to an 
 array, i.e., convert the range to a container:
Thanks for the suggestion. Unfortunately the range is going to be 40+ years of Voyager magnetometer data processed in a pipeline. I am trying to do everything in functional form, but the deep type dependencies (and my lack of knowledge) are crushing my productivity. I might have to stop trying to write idiomatic D and start writing Java-in-D just to move this project along. Fortunately, D supports that too.
May 15 2021
prev sibling next sibling parent Mike Parker <aldacron gmail.com> writes:
On Saturday, 15 May 2021 at 11:25:10 UTC, Chris Piker wrote:

 Thanks for your patience with a potentially dumb question.  
 I've been working on the code for well over 12 hours so I'm 
 probably not thinking straight at this point.
BTW, I can send you a couple of documents regarding ranges that you may or may not find useful. Please email me at aldacron gmail.com if you're interested.
May 15 2021
prev sibling next sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 5/15/21 4:25 AM, Chris Piker wrote:

 But, loops are bad.
I agree with Adam here. Although most of my recent code gravitates towards long range expressions, I use 'foreach' (even 'for') when I think it makes code more readable.
 Is there some obvious trick or way of looking at the problem that I'm
 missing?
The following are idioms that I use:

* The range is part of the type:

    struct MyType(R) {
      R myRange;
    }

* If the type is too complicated as in your examples:

    struct MyType(R) {
      R myRange;
    }

    auto makeMyType(X, Y)(/* ... */) {
      auto myArg = foo!X.bar!Y.etc;
      return MyType!(typeof(myArg))(myArg);
    }

* If my type can't be templated:

    struct MyType {
      alias MyRange = typeof(makeMyArg());
      MyRange myRange;
    }

    // For the alias to work above, all parameters of this
    // function must have default values so that the typeof
    // expression is as convenient as above.
    auto makeMyArg(X, Y)(X x = X.init, Y y = Y.init) {
      // Then, you can put some condition checks here if
      // X.init and Y.init are invalid values for your
      // program.
      return foo!X.bar!Y.etc;
    }

I think that's all really.

And yes, sometimes there are confusing error messages but the compiler is always right. :)

Ali
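The second and third idioms can be sketched as compilable code like this. The names `Holder`, `makeHolder` and `Plain` and the `iota`/`filter`/`map` pipeline are placeholders standing in for `foo!X.bar!Y.etc`. The sketch also assumes a non-templated `makeMyArg`, so that `typeof(makeMyArg())` resolves without explicit template arguments.

```d
import std.algorithm : equal, filter, map;
import std.range : iota;
import std.stdio : writeln;

// Idiom 2: templated struct plus a factory that names the type via typeof.
struct Holder(R)
{
    R myRange;
}

auto makeHolder(int limit)
{
    auto myArg = iota(0, limit).filter!(n => n % 3 == 0).map!(n => n * 10);
    return Holder!(typeof(myArg))(myArg);
}

// Idiom 3: a struct that cannot be a template. The parameter has a
// default value so typeof(makeMyArg()) works with no arguments.
auto makeMyArg(int limit = 0)
{
    return iota(0, limit).filter!(n => n % 3 == 0).map!(n => n * 10);
}

struct Plain
{
    alias MyRange = typeof(makeMyArg());
    MyRange myRange;
}

void main()
{
    auto h = makeHolder(10);
    assert(h.myRange.equal([0, 30, 60, 90]));

    auto p = Plain(makeMyArg(10));
    writeln(p.myRange); // [0, 30, 60, 90]
}
```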
May 15 2021
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, May 15, 2021 at 11:25:10AM +0000, Chris Piker via Digitalmars-d-learn
wrote:
[...]
 Basically the issue is that if one attempts to make a range based
 pipeline aka:
 
 ```d
 auto mega_range = range1.range2!(lambda2).range3!(lambda3);
 ```
 Then the type definition of mega_range is something in the order of:
 
 ```d
   TYPE_range3!( TYPE_range2!( TYPE_range1, TYPE_lambda2 ), TYPE_lambda3 );
 ```
 So the type tree builds to infinity and the type of `range3` is very
 much determined by the lambda I gave to `range2`.  To me this seems
 kinda crazy.
Perhaps it's crazy, but as others have mentioned, these are Voldemort types; you're not *meant* to know what the concrete type is, merely that it satisfies the range API. It's sorta kinda like the compile-time functional analogue of a Java-style interface: you're not meant to know what the concrete derived type is, just that it implements that interface. [...]
 But, loops are bad.  On the D blog I've seen knowledgeable people say
 all loops are bugs.
I wouldn't say all loops are bugs. If they were, why does D still have looping constructs? :D But it's true that most loops should be refactored into functional-style components instead. Nested loops are especially evil if written carelessly or uncontrollably.
 But how do you get rid of them without descending into Type Hell(tm).
Generally, when using ranges you just let the compiler infer the type for you, usually with `auto`:

    auto mySuperLongPipeline = inputData
        .map!(...)
        .filter!(...)
        .splitter!(...)
        .joiner!(...)
        .whateverElseYouGot();
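As a concrete sketch of that pattern, the following compiles as written. The function name `tokens` and the input string are made up; the point is only that the pipeline's type is never spelled out, and `typeof` recovers it on the rare occasion a name is needed.

```d
import std.algorithm : filter, joiner, map, splitter;
import std.array : array;
import std.stdio : writeln;

// Hypothetical pipeline: split on commas, drop empty fields,
// split each field on spaces, and flatten the result.
auto tokens(string inputData)
{
    return inputData
        .splitter(',')
        .filter!(field => field.length > 0)
        .map!(field => field.splitter(' '))
        .joiner;
}

void main()
{
    auto mySuperLongPipeline = tokens("voyager 1,,voyager 2,pioneer 10");

    // If a name is needed, ask the compiler for it instead of typing it.
    alias Pipeline = typeof(mySuperLongPipeline);
    Pipeline copy = mySuperLongPipeline;

    writeln(copy.array); // ["voyager", "1", "voyager", "2", "pioneer", "10"]
}
```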
 Is there anyway to get some type erasure on the stack?
You can wrap a range in a heap-allocated OO object using the helpers in std.range.interfaces, e.g., .inputRangeObject. Then you can use the interface as a handle to refer to the range.

Once I wrote a program almost entirely in a single pipeline. It started from a math function, piped into an abstract 2D array (a generalization of ranges), filtered, transformed, mapped into a color scheme, superimposed on top of some rendered text, then piped to a pipeline-based implementation of PNG-generation code that produced a range of bytes in a PNG file that's then piped into std.stdio.File.bufferedWrite.

The resulting type of the main pipeline was so hilariously huge, that in an older version of dmd it produced a mangled symbol several *megabytes* long (by that I mean the *name* of the symbol was measured in MB), not to mention tickled several O(N^2) algorithms in dmd that caused it to explode in memory consumption and slow down to an unusable crawl.

The mangled symbol problem was shortly fixed, probably partly due to my complaint about it :-P -- kudos to Rainer for the fix!

Eventually I inserted this line into my code:

    .arrayObject	// Behold, type erasure!

(which is my equivalent of .inputRangeObject) and immediately observed a significant speedup in compilation time and reduction in executable size. :-D

The pipeline-based PNG emitter also leaves a lot to be desired in terms of runtime speed... if I were to do this again, I'd go for a traditional imperative-style PNG generator with hand-coded loops instead of the fancy pipeline-based one I wrote.

Pipelines are all good and everything, but sometimes you *really* just need a good ole traditional OO-style heap allocation and hand-written loop. Don't pick a tool just because of the idealism behind it, I say, pick the tool best suited for the job.

T

-- 
Computers aren't intelligent; they only think they are.
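For readers who want to try the type-erasure route, here is a minimal sketch using the Phobos helper `inputRangeObject` and the `InputRange` interface from std.range.interfaces (not the custom `.arrayObject` mentioned above, which is not shown in this thread). The function `makePipeline` is hypothetical. Both branches build ranges of different concrete types, yet the caller only ever sees `InputRange!int`.

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : iota;
import std.range.interfaces : InputRange, inputRangeObject;
import std.stdio : writeln;

// Hypothetical factory: the two branches have different concrete range
// types, so a plain `auto` return would not compile. Wrapping each in a
// heap-allocated object erases the type down to the interface.
InputRange!int makePipeline(bool squared)
{
    auto evens = iota(1, 11).filter!(n => n % 2 == 0);
    if (squared)
        return inputRangeObject(evens.map!(n => n * n));
    return inputRangeObject(evens);
}

void main()
{
    InputRange!int r = makePipeline(true);
    foreach (x; r)
        writeln(x); // 4, 16, 36, 64, 100, one per line

    writeln(makePipeline(false).array); // [2, 4, 6, 8, 10]
}
```

The trade-off is the one discussed in this thread: each wrapped range costs a heap allocation and virtual calls for `front`, `popFront` and `empty`, in exchange for a short, stable type name.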
May 16 2021