digitalmars.D - Lots of low hanging fruit in Phobos

Walter Bright (12/12) Mar 06 2014 A major goal for D in the short term is to reduce reliance in Phobos on ...

Vladimir Panteleev (7/12) Mar 06 2014 Would be pretty neat if std.string and std.regex would work with

Steven Schveighoffer (5/9) Mar 06 2014 Perhaps he means many of the functions only accept const(Char)[], where ...
Dmitry Olshansky (9/20) Mar 06 2014 Exactly. I've been toying with idea of having generic notion of
Peter Alexander (3/8) Mar 06 2014 The same way std.algorithm does: through equality predicates.
Walter Bright (4/8) Mar 06 2014 Use 'static if' to detect the element type.

Meta (7/22) Mar 06 2014 Seems like a good idea to reduce memory usage wherever possible,

Walter Bright (11/16) Mar 06 2014 1. By using template specializations, algorithms can be custom optimized...

H. S. Teoh (6/16) Mar 06 2014 [...]

Adam D. Ruppe (11/12) Mar 06 2014 I think most the string functions should be transformative, like
Walter Bright (23/36) Mar 06 2014 A great question. I tend to regard output ranges as suitable for a conta...

H. S. Teoh (31/73) Mar 06 2014 After some thought, and also some recent experiences, I've come to the

Walter Bright (3/4) Mar 06 2014 I know. But we need to make the effort, at least for Phobos. The payoff ...
bearophile (6/8) Mar 06 2014 Some persons have quickly shot down this idea that I have
Nick Sabalausky (22/34) Mar 06 2014 Yes, this has been #1 on my wishlist for ranges for some time. We really...

Walter Bright (1/1) Mar 06 2014 These are good ideas, but for now we need to just bite the bullet and fi...

Nick Sabalausky (2/4) Mar 06 2014 Agreed.

Timon Gehr (2/6) Mar 07 2014 I think that the separation of front and popFront causes much of this.

H. S. Teoh (12/20) Mar 07 2014 I'm on the fence about that one. On the one hand, some ranges that do

Walter Bright (3/5) Mar 07 2014 I usually start writing an InputRange by pasting in a boilerplate versio...
Adam D. Ruppe (40/42) Mar 07 2014 Eh, I don't think it is a big deal and would be fairly limited

bearophile (12/33) Mar 07 2014 In generally this is rather bad, unless you are a C programmer

bearophile (5/7) Mar 08 2014 The optimization for Python yield is present in F# too:

bearophile (4/5) Mar 08 2014 I have not yet found the paper, but I think this was the topic:

bearophile (7/8) Mar 08 2014 Here it is:

Adam D. Ruppe (35/38) Mar 07 2014 I slapped this together to see if we can do it with a mixin

bearophile (9/12) Mar 07 2014 In general a typed syntax could be:

H. S. Teoh (34/52) Mar 07 2014 [...]

Timon Gehr (5/31) Mar 08 2014 This does not do the same thing. It computes the first row even if it is...

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (3/7) Mar 09 2014 Please have a look at this pull request:

Nick Sabalausky (35/74) Mar 08 2014 Oh it would definitely be for nothing more advanced than a forward

Nick Sabalausky (5/11) Mar 08 2014 The troublesome thing about that is anyone who writes a generator in D

Timon Gehr (4/23) Mar 08 2014 The drawback is that without the compiler being really clever, ranges

H. S. Teoh (5/33) Mar 08 2014 We can achieve at least forward ranges with coroutines, if they're pure.

Piotr Szturmaj (2/4) Mar 08 2014 https://github.com/pszturmaj/dgenerators

Nick Sabalausky (19/23) Mar 08 2014 Yea, there is that approach, which is nice in certain ways. I did the

w0rp (9/10) Mar 08 2014 I think that's what I would go for. yield with co-routines could

bearophile (8/10) Mar 09 2014 Yet there's no need for that. You can have your pie and eat it

Peter Alexander (4/14) Mar 09 2014 How does this handle recursive generators, e.g. like you would

Tobias Pankrath (3/19) Mar 09 2014 The generators mentioned are stack-less. You'd need to maintain

Peter Alexander (3/24) Mar 09 2014 That's unfortunate. I find that non-recursive generators are

bearophile (106/109) Mar 09 2014 A Python 2.x test program:

Jakob Ovrum (16/29) Mar 06 2014 While I prefer this approach most of the time, I fear it has a

Walter Bright (11/26) Mar 06 2014 std.buffer.scopebuffer was developed for that purpose, and as the discus...

Dmitry Olshansky (5/10) Mar 07 2014 Fun is most of std.algorithm is special-cased on strings. So we have

Walter Bright <newshound2 digitalmars.com> writes:

A major goal for D in the short term is to reduce reliance in Phobos on the GC. 
I was looking at std.string last night, and I noticed a couple things:

1. The inputs are constrained to being strings. This is overly restrictive, the 
inputs should be InputRanges.

2. The outputs should be a range, too. In fact, the string functions should 
become algorithms. Then they won't need to allocate any memory at all.

The existing functions should not be removed, but perhaps rewritten as wrappers 
around the algorithm versions.

I've found that rewriting traditional code, which is what std.string is now, in 
terms of algorithms is a bit mind-bending. But it's well worth it, and fun.

So who wants to step up? Don't have to do the whole thing in one go, just pick
a 
function and do that one.

Mar 06 2014

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:
 The existing functions should not be removed, but perhaps 
 rewritten as wrappers around the algorithm versions.

How does one handle case sensitivity for ranges of abstract types?

 I've found that rewriting traditional code, which is what 
 std.string is now, in terms of algorithms is a bit 
 mind-bending. But it's well worth it, and fun.

Would be pretty neat if std.string and std.regex would work with 
char-like types which actually carry more data per character. 
That way, it'd be possible to do string/regex transforms (search 
& replace, etc.) but keep track where exactly each character came 
from.

Mar 06 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 06 Mar 2014 16:30:46 -0500, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:
 The existing functions should not be removed, but perhaps rewritten as  
 wrappers around the algorithm versions.

 How does one handle case sensitivity for ranges of abstract types?

Perhaps he means many of the functions only accept const(Char)[], where  
Char can be templatized, but it really should accept any range.

-Steve

Mar 06 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

07-Mar-2014 01:30, Vladimir Panteleev пишет:
 On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:
 The existing functions should not be removed, but perhaps rewritten as
 wrappers around the algorithm versions.

 How does one handle case sensitivity for ranges of abstract types?

+1

 I've found that rewriting traditional code, which is what std.string
 is now, in terms of algorithms is a bit mind-bending. But it's well
 worth it, and fun.

 Would be pretty neat if std.string and std.regex would work with
 char-like types which actually carry more data per character. That way,
 it'd be possible to do string/regex transforms (search & replace, etc.)
 but keep track where exactly each character came from.

Exactly. I've been toying with idea of having generic notion of 
Alphabets for more then a year now. That would also be generalizing code 
unit / code point stuff of Unicode, into legacy encodings and beyond.

One case I had in mind is the very limited (A, C, T, G) alphabet in 
bio-informatics.

-- 
Dmitry Olshansky

Mar 06 2014

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Thursday, 6 March 2014 at 21:30:47 UTC, Vladimir Panteleev 
wrote:
 On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:
 The existing functions should not be removed, but perhaps 
 rewritten as wrappers around the algorithm versions.

 How does one handle case sensitivity for ranges of abstract 
 types?

The same way std.algorithm does: through equality predicates.

Mar 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/6/2014 1:30 PM, Vladimir Panteleev wrote:
 On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:
 The existing functions should not be removed, but perhaps rewritten as
 wrappers around the algorithm versions.

 How does one handle case sensitivity for ranges of abstract types?

Use 'static if' to detect the element type.

And besides, I had meant that an algorithm should work on an InputRange of 
char's, and not be restricted to char[].

Mar 06 2014

"Meta" <jared771 gmail.com> writes:

On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:
 A major goal for D in the short term is to reduce reliance in 
 Phobos on the GC. I was looking at std.string last night, and I 
 noticed a couple things:

 1. The inputs are constrained to being strings. This is overly 
 restrictive, the inputs should be InputRanges.

 2. The outputs should be a range, too. In fact, the string 
 functions should become algorithms. Then they won't need to 
 allocate any memory at all.

 The existing functions should not be removed, but perhaps 
 rewritten as wrappers around the algorithm versions.

 I've found that rewriting traditional code, which is what 
 std.string is now, in terms of algorithms is a bit 
 mind-bending. But it's well worth it, and fun.

 So who wants to step up? Don't have to do the whole thing in 
 one go, just pick a function and do that one.

Seems like a good idea to reduce memory usage wherever possible, 
but I thought that the reason std.string exists (and duplicates 
some functionality that exists elsewhere) is to provide 
string-specific versions that were either optimized specifically 
for strings, or have completely different functionality due to 
working with strings.

Mar 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/6/2014 1:40 PM, Meta wrote:
 Seems like a good idea to reduce memory usage wherever possible, but I thought
 that the reason std.string exists (and duplicates some functionality that
exists
 elsewhere) is to provide string-specific versions that were either optimized
 specifically for strings, or have completely different functionality due to
 working with strings.

1. By using template specializations, algorithms can be custom optimized for 
certain inputs.

2. I expect the developer of these algorithms to do performance profiling of 
them vs the originals, and address any problems.

3. I've done some similar algorithms, and was able to achieve performance 
parity, and sometimes even do better (because no memory needed to be allocated).


std.string was one of the first Phobos modules written, was written by myself, 
and long predates notions of ranges and algorithms. It has evolved since then, 
but its fundamental nature has not changed. It's time for that to change to a 
modern, D style.

Mar 06 2014

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Mar 06, 2014 at 01:26:46PM -0800, Walter Bright wrote:
 A major goal for D in the short term is to reduce reliance in Phobos
 on the GC. I was looking at std.string last night, and I noticed a
 couple things:
 
 1. The inputs are constrained to being strings. This is overly
 restrictive, the inputs should be InputRanges.
 
 2. The outputs should be a range, too. In fact, the string functions
 should become algorithms. Then they won't need to allocate any
 memory at all.

[...]

What about using output ranges?


T

-- 
What do you call optometrist jokes? Vitreous humor.

Mar 06 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 7 March 2014 at 00:44:45 UTC, H. S. Teoh wrote:
 What about using output ranges?

I think most the string functions should be transformative, like 
std.algorithm.map, so they take an input range and return an 
input range.

This lets them chain most easily, letting the user sink them into 
a particular range at the end.


Though we could do a bit of magic to both take an output range 
and return an input range for it (which can also be 
backward-compatible, as we talked about in a thread a month or so 
ago), the most straightforward way is surely to treat it all like 
map.

Mar 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/6/2014 4:43 PM, H. S. Teoh wrote:
 On Thu, Mar 06, 2014 at 01:26:46PM -0800, Walter Bright wrote:
 A major goal for D in the short term is to reduce reliance in Phobos
 on the GC. I was looking at std.string last night, and I noticed a
 couple things:

 1. The inputs are constrained to being strings. This is overly
 restrictive, the inputs should be InputRanges.

 2. The outputs should be a range, too. In fact, the string functions
 should become algorithms. Then they won't need to allocate any
 memory at all.

 [...]

 What about using output ranges?

A great question. I tend to regard output ranges as suitable for a container to 
expose. An algorithm reads an InputRange, and presents its output as another 
InputRange. This is so that algorithms can be easily chained together by the 
user, and the last algorithm in the chain would be
std.algorithm.copy(OutputRange).

For example, we could implement toStringz() as an algorithm that looks like:

     auto toStringz(S)(S s) {
         return chain(s, repeat(0, 1))
     }

and use it like this:

     string s = ...;
     char[] buffer = ...;
     s.toStringz.copy(buffer);

Note how the algorithm version of toStringz does not allocate any memory.
buffer 
would be the output range, but note how what it actually is is, and how its 
memory is allocated, is selected by the user.

(std.buffer.scopebuffer is ideal for this sort of usage.)

std.file is loaded with calls to toStringz(), each of which leaks memory quite 
unnecessarily. Using an algorithm version of toStringz(), along with 
scopebuffer, will neatly eliminated these leaks.

You might be tempted to think that these are just little leaks, who cares. But
I 
recently wrote a D program that did tens of thousands of file operations, and 
those little leaks turned into a raging river.

Mar 06 2014

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Mar 06, 2014 at 05:15:45PM -0800, Walter Bright wrote:
 On 3/6/2014 4:43 PM, H. S. Teoh wrote:
On Thu, Mar 06, 2014 at 01:26:46PM -0800, Walter Bright wrote:
A major goal for D in the short term is to reduce reliance in Phobos
on the GC. I was looking at std.string last night, and I noticed a
couple things:

1. The inputs are constrained to being strings. This is overly
restrictive, the inputs should be InputRanges.

2. The outputs should be a range, too. In fact, the string functions
should become algorithms. Then they won't need to allocate any
memory at all.

[...]

What about using output ranges?

 
 A great question. I tend to regard output ranges as suitable for a
 container to expose. An algorithm reads an InputRange, and presents
 its output as another InputRange. This is so that algorithms can be
 easily chained together by the user, and the last algorithm in the
 chain would be std.algorithm.copy(OutputRange).

After some thought, and also some recent experiences, I've come to the
same conclusion.  Output ranges are basically sinks, which means they
are no good for further chaining (attempting to do so is an exercise in
frustration). So pretty much everything that generates or transforms
data should return an input range, and output ranges should only be used
when you're going to stop processing the data at that point, e.g., save
it into a buffer, write to a file, etc..

Unfortunately, input ranges are somewhat tedious to write -- nice
foreach loops have to be broken up into .empty, .front, .popFront, which
is a lot of boilerplate code and not so nice in inner loops (though I
suppose the idea is to improve compiler inlining to handle these cases).
I wonder if a mid- to long-term goal for D would be to ease writing
input ranges by moving some of the boilerplate into the language, or
providing range builder facilities in Phobos. The most nagging part of
writing input ranges in the current language is the inability to use
foreach to generate the returned elements. (Well, you *can* do that to a
buffer array and then return the array, but that kinda defeats the
purpose of GC avoidance.) Some kind of built-in coroutine syntax with
'yield' would help things a LOT.


 For example, we could implement toStringz() as an algorithm that looks
 like:
 
     auto toStringz(S)(S s) {
         return chain(s, repeat(0, 1))
     }
 
 and use it like this:
 
     string s = ...;
     char[] buffer = ...;
     s.toStringz.copy(buffer);
 
 Note how the algorithm version of toStringz does not allocate any
 memory. buffer would be the output range, but note how what it
 actually is is, and how its memory is allocated, is selected by the
 user.

Yes, I like this. This is the right way to go.


[...]
 You might be tempted to think that these are just little leaks, who
 cares. But I recently wrote a D program that did tens of thousands
 of file operations, and those little leaks turned into a raging
 river.

Yeah, I'm currently working on a program that generates potentially huge
numbers of n-dimensional coordinates, and currently the code does this
using ranges that return array literals (ouch). This is pretty bad for
performance and also for GC pressure, so I'm thinking to restructure the
code some time in the near future to get rid of obligatory allocations,
much like how you describe it above.


T

-- 
Customer support: the art of getting your clients to pay for your own
incompetence.

Mar 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/6/2014 5:37 PM, H. S. Teoh wrote:
 Unfortunately, input ranges are somewhat tedious to write

I know. But we need to make the effort, at least for Phobos. The payoff is it 
makes user code much less effort.

Mar 06 2014

"bearophile" <bearophileHUGS lycos.com> writes:

H. S. Teoh:

 Some kind of built-in coroutine syntax with
 'yield' would help things a LOT.

Some persons have quickly shot down this idea that I have 
suggested:
https://d.puremagic.com/issues/show_bug.cgi?id=11880

Bye,
bearophile

Mar 06 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 3/6/2014 8:37 PM, H. S. Teoh wrote:
 Unfortunately, input ranges are somewhat tedious to write -- nice
 foreach loops have to be broken up into .empty, .front, .popFront, which
 is a lot of boilerplate code and not so nice in inner loops (though I
 suppose the idea is to improve compiler inlining to handle these cases).
 I wonder if a mid- to long-term goal for D would be to ease writing
 input ranges by moving some of the boilerplate into the language, or
 providing range builder facilities in Phobos. The most nagging part of
 writing input ranges in the current language is the inability to use
 foreach to generate the returned elements. (Well, you *can* do that to a
 buffer array and then return the array, but that kinda defeats the
 purpose of GC avoidance.) Some kind of built-in coroutine syntax with
 'yield' would help things a LOT.


need a way to make input ranges (maybe even forward ranges if we're 

automatically converted behind-the-scenes into stackless fibers (ie, 
into state machines).

A while back, I tried doing a library solution for this via regular 
fibers, but there turned out to be a lot of overhead due to the fiber's 
context-switching [1].

We need to take inspiration from two things: C's "protothreads" 

IMO. In both cases, the body of a coroutine is automagically transformed 
into a big switch statement. Yield statements then become roughly 
"return tuple([yielded value], [resume point]); case [resume 
point]:...". And then invoking the coroutine again automatically passes 
in the last [resume point] returned so the big switch() can jump 
straight to where it left off. The main difference in our case would be 
that we'd also auto-generate all the boilerplate rigging for this to be 
an input range.

[1] 
http://semitwist.com/articles/article/view/combine-coroutines-and-input-ranges-for-dead-simple-d-iteration

[2] C's "protothreads" library: http://dunkels.com/adam/pt/expansion.html

Mar 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

These are good ideas, but for now we need to just bite the bullet and fix
Phobos.

Mar 06 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 3/6/2014 10:59 PM, Walter Bright wrote:
 These are good ideas, but for now we need to just bite the bullet and
 fix Phobos.

Agreed.

Mar 06 2014

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/07/2014 02:37 AM, H. S. Teoh wrote:
 Unfortunately, input ranges are somewhat tedious to write -- nice
 foreach loops have to be broken up into .empty, .front, .popFront, which
 is a lot of boilerplate code and not so nice in inner loops (though I
 suppose the idea is to improve compiler inlining to handle these cases).

I think that the separation of front and popFront causes much of this.

Mar 07 2014

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sat, Mar 08, 2014 at 01:14:29AM +0100, Timon Gehr wrote:
 On 03/07/2014 02:37 AM, H. S. Teoh wrote:
Unfortunately, input ranges are somewhat tedious to write -- nice
foreach loops have to be broken up into .empty, .front, .popFront,
which is a lot of boilerplate code and not so nice in inner loops
(though I suppose the idea is to improve compiler inlining to handle
these cases).

 
 I think that the separation of front and popFront causes much of this.

I'm on the fence about that one. On the one hand, some ranges that do
complex calculations in .front are better off with .popAndReturnFront.
But on the other hand, there are also algorithms for which you don't
want to advance the range just to pick off its front element. Since the
separated semantics can't be achieved with a non-separated range API, I
lean slightly towards favoring the current split approach.

Having a way to auto-generate input range boilerplate, though, would be
really, *really* nice. Coroutine-style code would be ideal.


T

-- 
Bomb technician: If I'm running, try to keep up.

Mar 07 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/7/2014 5:09 PM, H. S. Teoh wrote:
 Having a way to auto-generate input range boilerplate, though, would be
 really, *really* nice. Coroutine-style code would be ideal.

I usually start writing an InputRange by pasting in a boilerplate version, then 
modifying it.

Mar 07 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Saturday, 8 March 2014 at 01:10:38 UTC, H. S. Teoh wrote:
 Having a way to auto-generate input range boilerplate, though, 
 would be really, *really* nice.

Eh, I don't think it is a big deal and would be fairly limited 
compared to the current setup. If you use a fiber or state 
variable or something for the yield this yield that trick, how do 
you go backward? Random access?

I think the best the yield stuff can do is maybe forward range 
and maybe infinite (probably with an annotation of some sort, 
since otherwise, the infiniteness wouldn't be obvious at compile 
time).


So the best we're looking to automate is input or perhaps forward 
ranges. And how hard are these really to write?

yield query(string q) {
    auto result = c_query(toStringz(q));
    while(!HasRow(result))
       yield GetNextRow(result);
}

OK, that is kinda nice, but, is the status quo so bad? (BTW the 
reason I went with some kind of C database api here is everything 
else I could think of are actually pretty short when using 
std.algorithm functions to help define them.)

struct query {
     private Result result;
     this(string q) {
          result = c_query(toStringz(q));
          if(!empty) popFront();
     }

     Row front;
      property bool empty() { return HasRow(result); }
     void popFront() in { assert(!empty); } body {
          front = GetNextRow(result);
     }
}


It is certainly a bit longer, but it isn't that bad, and is 
easily extended to other range capabilities.


Translating recursive iteration to a range does take a bit more, 
you need to track your local variables and put them in a stack of 
your own, but even that isn't too hard (though a bit wordier).


I guess the whole yield thing can be kinda nice, I'm just not 
sure it is that big of a win given its other limitations compared 
to full ranges.

Mar 07 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Adam D. Ruppe:

 struct query {
     private Result result;
     this(string q) {
          result = c_query(toStringz(q));
          if(!empty) popFront();
     }

     Row front;
      property bool empty() { return HasRow(result); }
     void popFront() in { assert(!empty); } body {
          front = GetNextRow(result);
     }
 }


 It is certainly a bit longer, but it isn't that bad, and is 
 easily extended to other range capabilities.

Your code is badly formatted.


 Translating recursive iteration to a range does take a bit 
 more, you need to track your local variables and put them in a 
 stack of your own, but even that isn't too hard (though a bit 
 wordier).

In generally this is rather bad, unless you are a C programmer 
used to use tweezers and needles to implement your own hash set 
every other day :-(


 I guess the whole yield thing can be kinda nice, I'm just not 
 sure it is that big of a win given its other limitations 
 compared to full ranges.

yield is limited, but in a large amount of cases it's enough, 
especially if it's well implemented (now Python 3 has yield that 
is usable for coroutines too, and it has recently added another 
optimization). For the other cases you can use normal range 
protocols as you have done.

Bye,
bearophile

Mar 07 2014

"bearophile" <bearophileHUGS lycos.com> writes:

 (now Python 3 has yield that is usable for coroutines too, and 
 it has recently added another optimization).


http://theburningmonk.com/2011/09/fsharp-yield-vs-yield/



Bye,
bearophile

Mar 08 2014

"bearophile" <bearophileHUGS lycos.com> writes:



I have not yet found the paper, but I think this was the topic:
http://blogs.msdn.com/b/dsyme/archive/2007/09/22/some-details-on-f-computation-expressions-aka-monadic-or-workflow-syntax.aspx

Bye,
bearophile

Mar 08 2014

"bearophile" <bearophileHUGS lycos.com> writes:

 I have not yet found the paper,

Here it is:
http://tomasp.net/academic/papers/computation-zoo/

More on the same:
http://en.wikibooks.org/wiki/F_Sharp_Programming/Computation_Expressions
http://msdn.microsoft.com/en-us/library/dd233182.aspx

Bye,
bearophile

Mar 08 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Saturday, 8 March 2014 at 02:15:19 UTC, Adam D. Ruppe wrote:
 I guess the whole yield thing can be kinda nice, I'm just not 
 sure it is that big of a win given its other limitations 
 compared to full ranges.

I slapped this together to see if we can do it with a mixin 
template:

http://arsdnet.net/dcode/next.d

struct ShortRange {
	int next() {
		switch(callNumber) {
			case 0: return 50;
			case 1: return 60;
			case 2: return 70;
			case 3: return finished();
			default: assert(0);
		}
	}

	mixin MakeShortRange!();
}

Wordier than
yield ShortRange() { yield 50; yield 60; yield 70; }

but the same idea. (In fact, i'm sure the whole yield thing 
should pretty much just rewrite into something like this anyway).

My hypothetical from the previous post would be written:


struct ShortRange {
         Result result;
         this(string s) {
             result = c_query(s);
         }
	Row next() {
              if(!HasRow(result))
                  return finished();
              return GetNextRow(result);
	}

	mixin MakeShortRange!();
}

Which is a bit shorter and perhaps nicer to use than the long 
form range.

Mar 07 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Adam D. Ruppe:

 Wordier than
 yield ShortRange() { yield 50; yield 60; yield 70; }

 but the same idea.

In general a typed syntax could be:

yield(int) ShortRange() { yield 50; yield 60; yield 70; }

Or even:

yield(auto) ShortRange() { yield 50; yield 60; yield 70; }

So yield(auto) could be sugar for yield(auto):

yield ShortRange() { yield 50; yield 60; yield 70; }

Bye,
bearophile

Mar 07 2014

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sat, Mar 08, 2014 at 03:27:15AM +0000, bearophile wrote:
 Adam D. Ruppe:
 
Wordier than
yield ShortRange() { yield 50; yield 60; yield 70; }

but the same idea.

 
 In general a typed syntax could be:
 
 yield(int) ShortRange() { yield 50; yield 60; yield 70; }
 
 Or even:
 
 yield(auto) ShortRange() { yield 50; yield 60; yield 70; }
 
 So yield(auto) could be sugar for yield(auto):
 
 yield ShortRange() { yield 50; yield 60; yield 70; }

[...]

Yield syntax is generally not a big deal with simple ranges, but once
you start needing recursion and branching logic, things become
significantly hairier without yield. For example:

	yield ComplicatedRange(int arg) {
		if (arg == 0) {
			foreach (i; 0 .. 10)
				yield i;
		} else if (arg >= 1 && arg <= 5) {
			yield 10;
			yield 20;
			yield 30;
		} else if (arg >= 6) {
			foreach (i; 10 .. 20) {
				if (i >= 5 && i < 9) {
					yield 40;
					yield 50;
				} else {
					yield i*2;
				}
			}
		}
	}

With yield, the branching logic is clear and relatively easy to follow.
Without yield, this will turn into a nasty mess of nested state
variables, due to the different numbers of yielded items within nested
conditional blocks.

(Of course, you could argue that if things get this complicated, one
should be splitting it up into multiple composed ranges instead, but not
everything is readily decomposible.)


T

-- 
Bare foot: (n.) A device for locating thumb tacks on the floor.

Mar 07 2014

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/08/2014 03:15 AM, Adam D. Ruppe wrote:
 So the best we're looking to automate is input or perhaps forward
 ranges. And how hard are these really to write?

 yield query(string q) {
     auto result = c_query(toStringz(q));
     while(!HasRow(result))
        yield GetNextRow(result);
 }

 OK, that is kinda nice, but, is the status quo so bad?
 (BTW the reason I
 went with some kind of C database api here is everything else I could
 think of are actually pretty short when using std.algorithm functions to
 help define them.)

 struct query {
      private Result result;
      this(string q) {
           result = c_query(toStringz(q));
           if(!empty) popFront();
      }

      Row front;
       property bool empty() { return HasRow(result); }
      void popFront() in { assert(!empty); } body {
           front = GetNextRow(result);
      }
 }


 It is certainly a bit longer, but it isn't that bad, and is easily
 extended to other range capabilities.

This does not do the same thing. It computes the first row even if it is 
never requested. std.algorithm.filter suffers from the same problem. 
This is part of the reason why I think the separation of front and 
popFront is suboptimal design.

Mar 08 2014

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:

On Saturday, 8 March 2014 at 09:25:38 UTC, Timon Gehr wrote:
 This does not do the same thing. It computes the first row even 
 if it is never requested. std.algorithm.filter suffers from the 
 same problem. This is part of the reason why I think the 
 separation of front and popFront is suboptimal design.

Please have a look at this pull request:
https://github.com/D-Programming-Language/phobos/pull/1987

Mar 09 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 3/7/2014 9:15 PM, Adam D. Ruppe wrote:
 On Saturday, 8 March 2014 at 01:10:38 UTC, H. S. Teoh wrote:
 Having a way to auto-generate input range boilerplate, though, would
 be really, *really* nice.

 Eh, I don't think it is a big deal and would be fairly limited compared
 to the current setup. If you use a fiber or state variable or something
 for the yield this yield that trick, how do you go backward? Random access?

Oh it would definitely be for nothing more advanced than a forward 
range. It's a coroutine, ie a generator function, so the elements are 
determined by execution flow. Since execution can't go in reverse or 
random-access, anything beyond forward range is ruled out. But you knew 
that :)

Forward range should be possible though (at least for pure-ish 
coroutines anyway), since all you'd have to do is clone the coroutine's 
internal state structure (which it must already have anyway, since you 
can't have coroutines without it).

While it couldn't be used for bidirectional or random-access ranges, it 
would still be a big help for generators - ie the whole point of 
coroutines in the first place. Other languages have coroutines - we'd 
have coroutines that can be used as ranges :)


 I think the best the yield stuff can do is maybe forward range and maybe
 infinite (probably with an annotation of some sort, since otherwise, the
 infiniteness wouldn't be obvious at compile time).


 So the best we're looking to automate is input or perhaps forward
 ranges. And how hard are these really to write?

 yield query(string q) {
     auto result = c_query(toStringz(q));
     while(!HasRow(result))
        yield GetNextRow(result);
 }

 OK, that is kinda nice, but, is the status quo so bad? (BTW the reason I
 went with some kind of C database api here is everything else I could
 think of are actually pretty short when using std.algorithm functions to
 help define them.)

 struct query {
      private Result result;
      this(string q) {
           result = c_query(toStringz(q));
           if(!empty) popFront();
      }

      Row front;
       property bool empty() { return HasRow(result); }
      void popFront() in { assert(!empty); } body {
           front = GetNextRow(result);
      }
 }


 It is certainly a bit longer, but it isn't that bad, and is easily
 extended to other range capabilities.


 Translating recursive iteration to a range does take a bit more, you
 need to track your local variables and put them in a stack of your own,
 but even that isn't too hard (though a bit wordier).


 I guess the whole yield thing can be kinda nice, I'm just not sure it is
 that big of a win given its other limitations compared to full ranges.

I dunno, what you wrote sounds to me like a pretty convincing argument 
in favor of coroutine ranges. ;)

Keep in mind, making *all* ranges easier to write isn't the charter 
here, and it doesn't need to be: Bidirectional and random-access ranges 
ARE more complicated than generators, so I think they're already just 
about as easy to write as we *can* make them.

Instead, the problem we're looking at solving here is exactly what 
you've described above: Making generators in D is more complicated and 
boilerplate-y than it really needs to be (unless you want to give up on 
ranges and use opApply - which still isn't particularly great without 
that mixin you suggested a couple years ago to cover up the return value 
ugliness).

Generators may be a subset of ranges, but they're an important subset. 

generators.

What really itches at me is that we're so tantalizingly close with 
opApply. The only real problem with it (aside from the return value 
system being kinda awkward compared to a "yield" statement) is that it 
can't be used as a range. And it can't be converted to a range without 
using Phobos fibers which imposes too much of an overhead to be a 
one-size-fits-all technique for generators.

Mar 08 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 3/8/2014 5:56 AM, Nick Sabalausky wrote:
 What really itches at me is that we're so tantalizingly close with
 opApply. The only real problem with it (aside from the return value
 system being kinda awkward compared to a "yield" statement) is that it
 can't be used as a range. And it can't be converted to a range without
 using Phobos fibers which imposes too much of an overhead to be a
 one-size-fits-all technique for generators.

The troublesome thing about that is anyone who writes a generator in D 
is forced to choose between [mostly] straightforward implementation 
logic (ie opApply) *or* range compatibility.

I think we can do better than to force an ugly choice like that.

Mar 08 2014

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/08/2014 02:09 AM, H. S. Teoh wrote:
 On Sat, Mar 08, 2014 at 01:14:29AM +0100, Timon Gehr wrote:
 On 03/07/2014 02:37 AM, H. S. Teoh wrote:
 Unfortunately, input ranges are somewhat tedious to write -- nice
 foreach loops have to be broken up into .empty, .front, .popFront,
 which is a lot of boilerplate code and not so nice in inner loops
 (though I suppose the idea is to improve compiler inlining to handle
 these cases).

 I think that the separation of front and popFront causes much of this.

 I'm on the fence about that one. On the one hand, some ranges that do
 complex calculations in .front are better off with .popAndReturnFront.
 But on the other hand, there are also algorithms for which you don't
 want to advance the range just to pick off its front element. Since the
 separated semantics can't be achieved with a non-separated range API, I
 lean slightly towards favoring the current split approach.
 ...

I agree that there are trade-offs involved.

 Having a way to auto-generate input range boilerplate, though, would be
 really, *really* nice. Coroutine-style code would be ideal.
 ...

The drawback is that without the compiler being really clever, ranges 
thus defined will be just input ranges.

Mar 08 2014

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sat, Mar 08, 2014 at 10:38:50AM +0100, Timon Gehr wrote:
 On 03/08/2014 02:09 AM, H. S. Teoh wrote:
On Sat, Mar 08, 2014 at 01:14:29AM +0100, Timon Gehr wrote:
On 03/07/2014 02:37 AM, H. S. Teoh wrote:
Unfortunately, input ranges are somewhat tedious to write -- nice
foreach loops have to be broken up into .empty, .front, .popFront,
which is a lot of boilerplate code and not so nice in inner loops
(though I suppose the idea is to improve compiler inlining to
handle these cases).

I think that the separation of front and popFront causes much of
this.

I'm on the fence about that one. On the one hand, some ranges that do
complex calculations in .front are better off with
.popAndReturnFront.  But on the other hand, there are also algorithms
for which you don't want to advance the range just to pick off its
front element. Since the separated semantics can't be achieved with a
non-separated range API, I lean slightly towards favoring the current
split approach.  ...

 
 I agree that there are trade-offs involved.
 
Having a way to auto-generate input range boilerplate, though, would
be really, *really* nice. Coroutine-style code would be ideal.
...

 
 The drawback is that without the compiler being really clever,
 ranges thus defined will be just input ranges.

We can achieve at least forward ranges with coroutines, if they're pure.


T

-- 
Your inconsistency is the only consistent thing about you! -- KD

Mar 08 2014

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

W dniu 2014-03-08 02:09, H. S. Teoh pisze:
 Having a way to auto-generate input range boilerplate, though, would be
 really, *really* nice. Coroutine-style code would be ideal.

https://github.com/pszturmaj/dgenerators

Mar 08 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 3/8/2014 5:58 PM, Piotr Szturmaj wrote:
 W dniu 2014-03-08 02:09, H. S. Teoh pisze:
 Having a way to auto-generate input range boilerplate, though, would be
 really, *really* nice. Coroutine-style code would be ideal.

 https://github.com/pszturmaj/dgenerators

Yea, there is that approach, which is nice in certain ways. I did the 
same kinda thing a couple years ago[1], but your API appears much, much 
nicer.

The unfortunate downside, though, is that as jerro demonstrated[2] 
there's a high overhead when using fibers for iteration. It likely 
wouldn't be an issue for some things, like IO, but for other things it 
could be a problem.

So what we end up with is an unfortunate three-way choice anytime 
someone needs a generator in D:

1. Give up on range compatibility (opApply)
2. Give up on straightforward implementation (input/forward range)
3. Give up on maximum performance (fiber-based coroutine range)

D's just dancing all around the ideal solution for generators without 
ever actually hitting the mark. Which I find frustrating because I know 
that's a target D's capable of nailing.

[1] 
http://semitwist.com/articles/article/view/combine-coroutines-and-input-ranges-for-dead-simple-d-iteration

[2] http://forum.dlang.org/thread/jno6o5$qtb$1 digitalmars.com

Mar 08 2014

"w0rp" <devw0rp gmail.com> writes:

On Saturday, 8 March 2014 at 23:23:21 UTC, Nick Sabalausky wrote:
 3. Give up on maximum performance (fiber-based coroutine range)

I think that's what I would go for. yield with co-routines could 
be a quick and dirty way to create an InputRange at some 
performance cost. Then writing the InputRange by hand would then 
be an optimisation. I think I tend to write things in such a way 
where I have room to optimise in such a way, but I almost always 
reach for the easier to write, less performant option first. 
(It's a bonus when the easier way is also the more performant 
way.)

Mar 08 2014

"bearophile" <bearophileHUGS lycos.com> writes:

w0rp:

 3. Give up on maximum performance (fiber-based coroutine range)

 I think that's what I would go for.

Yet there's no need for that. You can have your pie and eat it 
too, with a small cost. D can rewrite code that contains a yield 
in an efficient finite state machine (this is what ShedSkin 

same).

Bye,
bearophile

Mar 09 2014

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Sunday, 9 March 2014 at 10:30:37 UTC, bearophile wrote:
 w0rp:

 3. Give up on maximum performance (fiber-based coroutine 
 range)

 I think that's what I would go for.

 Yet there's no need for that. You can have your pie and eat it 
 too, with a small cost. D can rewrite code that contains a 
 yield in an efficient finite state machine (this is what 

 does the same).

How does this handle recursive generators, e.g. like you would 
see with a depth-first tree traversal? The "state" in this case 
is a call stack. How is the memory allocated for the stack?

Mar 09 2014

"Tobias Pankrath" <tobias pankrath.net> writes:

On Sunday, 9 March 2014 at 11:00:13 UTC, Peter Alexander wrote:
 On Sunday, 9 March 2014 at 10:30:37 UTC, bearophile wrote:
 w0rp:

 3. Give up on maximum performance (fiber-based coroutine 
 range)

 I think that's what I would go for.

 Yet there's no need for that. You can have your pie and eat it 
 too, with a small cost. D can rewrite code that contains a 
 yield in an efficient finite state machine (this is what 

 does the same).

 How does this handle recursive generators, e.g. like you would 
 see with a depth-first tree traversal? The "state" in this case 
 is a call stack. How is the memory allocated for the stack?

The generators mentioned are stack-less. You'd need to maintain 
the stack yourself.

Mar 09 2014

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Sunday, 9 March 2014 at 11:17:14 UTC, Tobias Pankrath wrote:
 On Sunday, 9 March 2014 at 11:00:13 UTC, Peter Alexander wrote:
 On Sunday, 9 March 2014 at 10:30:37 UTC, bearophile wrote:
 w0rp:

 3. Give up on maximum performance (fiber-based coroutine 
 range)

 I think that's what I would go for.

 Yet there's no need for that. You can have your pie and eat 
 it too, with a small cost. D can rewrite code that contains a 
 yield in an efficient finite state machine (this is what 

 compiler does the same).

 How does this handle recursive generators, e.g. like you would 
 see with a depth-first tree traversal? The "state" in this 
 case is a call stack. How is the memory allocated for the 
 stack?

 The generators mentioned are stack-less. You'd need to maintain 
 the stack yourself.

That's unfortunate. I find that non-recursive generators are 
pretty easy to write as ranges anyway :-/

Mar 09 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Peter Alexander:

 How does this handle recursive generators, e.g. like you would 
 see with a depth-first tree traversal? The "state" in this case 
 is a call stack. How is the memory allocated for the stack?

A Python 2.x test program:


class Node:
     def __init__(self, data, left, right):
         self.data = data
         self.left = left
         self.right = right

def in_order(node):
     if node is not None:
         for n in in_order(node.left):
             yield n
         yield node.data
         for n in in_order(node.right):
             yield n

def main():
     tree = Node(1,
                 Node(2,
                      Node(4,
                           Node(7, None, None),
                           None),
                      Node(5, None, None)),
                 Node(3,
                      Node(6,
                           Node(8, None, None),
                           Node(9, None, None)),
                      None))

     for n in in_order(tree):
         print n,
     print

main()


It prints:

7 4 2 5 1 8 6 9 3


ShedSkin translates it to some C++ code, here I show only the 
in_order function:



class __gen_in_order : public __iter<__ss_int> {
public:
     Node *node;
     __iter<__ss_int> *__0, *__1, *__4, *__5;
     __ss_int __2, __6, n;
     __iter<__ss_int>::for_in_loop __3, __7;

     int __last_yield;

     __gen_in_order(Node *node) {
         this->node = node;
         __last_yield = -1;
     }

     __ss_int __get_next() {
         switch(__last_yield) {
             case 0: goto __after_yield_0;
             case 1: goto __after_yield_1;
             case 2: goto __after_yield_2;
             default: break;
         }
         if ((node!=NULL)) {

             FOR_IN(n,in_order(node->left),0,2,3)
                 __last_yield = 0;
                 __result = n;
                 return __result;
                 __after_yield_0:;
             END_FOR

             __last_yield = 1;
             __result = node->data;
             return __result;
             __after_yield_1:;

             FOR_IN(n,in_order(node->right),4,6,7)
                 __last_yield = 2;
                 __result = n;
                 return __result;
                 __after_yield_2:;
             END_FOR

         }
         __stop_iteration = true;
     }

};

__iter<__ss_int> *in_order(Node *node) {
     return new __gen_in_order(node);
}



Where FOR_IN is a macro to some foreach-like code:

#define FOR_IN(e, iter, temp, i, t) \




     { \




Note this is not efficient, because it could generate O(n^2) 
code, but this is not fault of ShedSkin, as the original Python 
code has the same problem.

They have recently (in Python 3.3) added to Python the syntax 
"yield from" that will allow to remove that performance problem 
(currently I think the performance is not improved):

http://legacy.python.org/dev/peps/pep-0380/

Look especially at the section regarding optimization:

http://legacy.python.org/dev/peps/pep-0380/#optimisations

With this improvement the in_order function becomes:

def in_order(node):
     if node is not None:
         yield from in_order(node.left)
         yield node.data
         yield from in_order(node.right)



kinds of yield are "yield" and "yield!":

http://theburningmonk.com/2011/09/fsharp-yield-vs-yield/

http://stackoverflow.com/questions/3500488/f-yield-operator-implementation-and-possible-c-sharp-equivalents

Bye,
bearophile

Mar 09 2014

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Friday, 7 March 2014 at 01:15:43 UTC, Walter Bright wrote:
 On 3/6/2014 4:43 PM, H. S. Teoh wrote:
 What about using output ranges?

 A great question. I tend to regard output ranges as suitable 
 for a container to expose. An algorithm reads an InputRange, 
 and presents its output as another InputRange. This is so that 
 algorithms can be easily chained together by the user, and the 
 last algorithm in the chain would be 
 std.algorithm.copy(OutputRange).

While I prefer this approach most of the time, I fear it has a 
performance cost over sink-style algorithms (whether they use an 
output range or build an array result). I guess the question is 
whether this cost is generally offset by gains in the memory 
allocation code or not.

 For example, we could implement toStringz() as an algorithm 
 that looks like:

     auto toStringz(S)(S s) {
         return chain(s, repeat(0, 1))
     }

std.range.only is more suitable than `repeat` here (and I don't 
know if `chain` would let you chain a range of characters with a 
range of integers):

auto toStringz(S)(S s)
     if (is(Unqual!(ElementType!S) == dchar)) {
     return s.chain('\0'.only());
}

Either way, perhaps the most serious problem is that when using 
`copy` to write the result to an array, both UTF decoding and 
re-encoding takes place (the result is always a range of `dchar`).

Mar 06 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/6/2014 5:54 PM, Jakob Ovrum wrote:
 While I prefer this approach most of the time, I fear it has a performance cost
 over sink-style algorithms (whether they use an output range or build an array
 result). I guess the question is whether this cost is generally offset by gains
 in the memory allocation code or not.

std.buffer.scopebuffer was developed for that purpose, and as the discussion on 
why it is the way it is shows, it was developed with extensive use of a
profiler 
to ensure there was no speed compromise from using it.


 For example, we could implement toStringz() as an algorithm that looks like:

     auto toStringz(S)(S s) {
         return chain(s, repeat(0, 1))
     }

 std.range.only is more suitable than `repeat` here

Yes, I missed that. I'm still learning about the proper way to use ranges.


 (and I don't know if `chain`
 would let you chain a range of characters with a range of integers):

My code fragment is untested and pretty much off the cuff. A real version would 
be a bit more complex, as you suggest.


 Either way, perhaps the most serious problem is that when using `copy` to write
 the result to an array, both UTF decoding and re-encoding takes place (the
 result is always a range of `dchar`).

I know. I struggled with that myself, and I object to the design decision that 
auto-decodes and auto-encodes any ranges that deal with char's. It
automatically 
turns fast code into molasses. I wound up casting everything to ubytes, ugh.

But it's too late to change that now, sigh.

Mar 06 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

07-Mar-2014 01:26, Walter Bright пишет:
 A major goal for D in the short term is to reduce reliance in Phobos on
 the GC. I was looking at std.string last night, and I noticed a couple
 things:

 1. The inputs are constrained to being strings. This is overly
 restrictive, the inputs should be InputRanges.

Fun is most of std.algorithm is special-cased on strings. So we have 
double volume of crap.


-- 
Dmitry Olshansky

Mar 07 2014

D Programming

C/C++ Programming

Other

digitalmars.D - Lots of low hanging fruit in Phobos