www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Bidirectional range dilemma

reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
I'm repeatedly hitting a range API problem in my combinatorics 
library. Many of the ranges require (potentially sizeable) 
internal buffers to maintain their current state (permutations, 
set partitions, power sets etc.) Also, many of these ranges are 
bidirectional. What this means is they have to store *two* 
buffers, one for the front, and one for the back.

This is a problem because 90% of the time you'll only iterate 
forward, and of the remaining 10%, 9% is just iterating 
backwards, and maybe 1% of the time you want to iterate from both 
the front and back in the same iteration. This means that 99% of 
the time, having two buffers is just wasted memory. Of course, 
I'm plucking these numbers from the air, but you get the idea.

To make matters worse, thanks to the lack of logical const 
there's no way the buffers can be lazily allocated on first use.

The only (ugly) solution I can come up with is to create three 
different ranges, e.g.

ForwardPermutations
ReversePermutations
BidirectionalPermutations

(or equivalently, one range with an iteration policy).

I'd rather not do that. Any ideas on how to resolve this?
Jan 13 2013
next sibling parent David <d dav1d.de> writes:
Am 13.01.2013 13:05, schrieb Peter Alexander:
 To make matters worse, thanks to the lack of logical const there's no
 way the buffers can be lazily allocated on first use.
can't be Rebindable used for that?
Jan 13 2013
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Peter Alexander:

 (or equivalently, one range with an iteration policy).
If no other better solution is found, one range with an iteration policy is acceptable, in my opinion. (By the way, what's the API of your functions? In similar generators I usually add a boolean doCopy template argument, that defaults to true. If it's true, the generator yields different buffers, otherwise it yields always the same modified buffer. This allows to have both convenience & safety on default, and speed on request). Bye, bearophile
Jan 13 2013
parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Sunday, 13 January 2013 at 12:34:38 UTC, bearophile wrote:
 (By the way, what's the API of your functions? In similar 
 generators I usually add a boolean doCopy template argument, 
 that defaults to true. If it's true, the generator yields 
 different buffers, otherwise it yields always the same modified 
 buffer. This allows to have both convenience & safety on 
 default, and speed on request).
Interesting. I do empathize with having convenience and safety by default and speed on demand, but in this case I worry about the performance lost by not having speed as the default, especially since I imagine that most of the time the safety is unneeded. If speed is the default then the template parameter is unneeded since you can just do permutations(r).map!array(). This is what I currently do.
Jan 13 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
 Peter Alexander:
 especially since I imagine that most of the time the safety is 
 unneeded.
The lazyness of D ranges is useful, but unlike Haskell, in D you can't always use lazyness, sometimes you need eagerness, because D is strongly array-based. A newbie programmer, or a new user of the permutations functions that has not studied its API, will find permutations() more handy and safe if it allocates a new buffer for each permutation. You remember the discussions about File.byLine that is not allocating one array for each line on default, but it yields char[] so when you convert it to a string you usually copy the buffer. So its unsafety is limited. This is not true for permutations(). I have found that in small script-like programs I sometimes (about half of the time?) prefer an eager generation of combinations or permutations, because it's more handy. You don't always need max speed in the code. Safety on default and speed on request is one of the tenets of D language. And it's a good thing. Not allocating a buffer for each permutation is premature optimization. In past I have had several bugs caused by generators that don't allocate a buffer each time. When you fix some of such bugs, you learn to appreciate safety on default. So my current generators of permutations, combinations, permutations by swapping, derangements, lexicographic generator, etc are safe on default. It makes my life simpler. Bye, bearophile
Jan 13 2013
parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Sunday, 13 January 2013 at 14:01:28 UTC, bearophile wrote:
 Safety on default and speed on request is one of the tenets of 
 D language. And it's a good thing.

 Not allocating a buffer for each permutation is premature 
 optimization.
You may have convinced me. I'll need to think more about it. FWIW, it is not a premature optimisation. On a simple benchmark I did, adding .dup to front() increased runtime by 3.5x on DMD with all optimisations. 3.5 is the difference between C and Javascript in the Computer Language Benchmarks Game. http://benchmarksgame.alioth.debian.org/u32/which-programs-are-fastest.php I do care about safety, but I also believe that performance is critically important to D's success. Performance on demands is good in theory, but if I'm writing high performance code then I want performance by default, and I don't want to have to fill my code with annotations and special flags/options. I think we need a D mission statement that we can refer to, to settle these disputes. How much performance loss is acceptable in the name of safety by default?
Jan 13 2013
next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Sunday, 13 January 2013 at 15:49:13 UTC, Peter Alexander wrote:
 On Sunday, 13 January 2013 at 14:01:28 UTC, bearophile wrote:
 Safety on default and speed on request is one of the tenets of 
 D language. And it's a good thing.

 Not allocating a buffer for each permutation is premature 
 optimization.
You may have convinced me. I'll need to think more about it. FWIW, it is not a premature optimisation. On a simple benchmark I did, adding .dup to front() increased runtime by 3.5x on DMD with all optimisations. 3.5 is the difference between C and Javascript in the Computer Language Benchmarks Game. http://benchmarksgame.alioth.debian.org/u32/which-programs-are-fastest.php I do care about safety, but I also believe that performance is critically important to D's success. Performance on demands is good in theory, but if I'm writing high performance code then I want performance by default, and I don't want to have to fill my code with annotations and special flags/options. I think we need a D mission statement that we can refer to, to settle these disputes. How much performance loss is acceptable in the name of safety by default?
You spend most of the time in the same piece of code. That one must be annotated and everything, but most code could be written in javascript running in a VM written in PHP that it would change much. Plus, you probably here are benchmarking mostly the GC, which is known to be inefficient in D.
Jan 13 2013
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Peter Alexander:

 FWIW, it is not a premature optimisation. On a simple benchmark 
 I did, adding .dup to front() increased runtime by 3.5x on DMD 
 with all optimisations.
Yes, around 3X is compatible with my timings. I think with a generational GC (you can test it in Java), or with some kind of arena allocator, that performance loss will decrease significantly (but then you need to add a memory allocator template argument, that I presume defaults to the normal GC allocator).
 I do care about safety, but I also believe that performance is 
 critically important to D's success.
I care a lot for performance (where performance is needed), but I care even more for correctness :-) A fast but buggy program is often useless and sometimes it's even dangerous.
 Performance on demands is good in theory, but if I'm writing 
 high performance code then I want performance by default, and I 
 don't want to have to fill my code with annotations and special 
 flags/options.
Compare: foreach (p; permutations(items)) {... Vs: foreach (p; permutations!false(items)) {... Or even: foreach (p; permutations!0(items)) {... Or even: alias fastPermutations = permutations!false; foreach (p; fastPermutations(items)) {...
 I think we need a D mission statement that we can refer to, to 
 settle these disputes. How much performance loss is acceptable 
 in the name of safety by default?
I don't think a "mission statement" is enough to settle all such questions, in life there is too much variety. You have to decide on specific cases, on the base of few general rules, like the D Zen rule I have written in the precedent answer. Bye, bearophile
Jan 13 2013
prev sibling next sibling parent reply "Phil Lavoie" <maidenphil hotmail.com> writes:
 This is a problem because 90% of the time you'll only iterate 
 forward, and of the remaining 10%, 9% is just iterating 
 backwards, and maybe 1% of the time you want to iterate from 
 both the front and back in the same iteration. This means that 
 99% of the time, having two buffers is just wasted memory. Of 
 course, I'm plucking these numbers from the air, but you get 
 the idea.
You might just suggest these numbers like that, but in my experience they seem accurate.
 To make matters worse, thanks to the lack of logical const 
 there's no way the buffers can be lazily allocated on first use.

 The only (ugly) solution I can come up with is to create three 
 different ranges, e.g.

 ForwardPermutations
 ReversePermutations
 BidirectionalPermutations
That. I am not sure why you find it so ugly however :(. When the most capable/general range yields a noticeable cost on performance/space consumption (which seems to be the case here), I think it better to empower the user with choices rather than try to make decisions for him/her. Documentating how bidirectionalPerms modifies perf/space might just be what you need to make peace with yourself. In the event a user REALLY needs the bidirectional functionality, he/she will most likely read that documentation and comprehend why it wasn't the default in the first place and I am sure they will be thankful to know why.
Jan 13 2013
parent reply "Phil Lavoie" <maidenphil hotmail.com> writes:
Personally,

I would have the policy implemented:
permutations( Policy p = Policy.forward )() {
...
}

And maybe extend it with aliases:

alias fPermutations permutations!( Policy.forward );
alias rPermutations permutations!( Policy.backward );
alias biPermutations permutations!( Policy.bidirectional );

Cheers!
Jan 13 2013
parent reply "Phil Lavoie" <maidenphil hotmail.com> writes:
On Sunday, 13 January 2013 at 16:31:30 UTC, Phil Lavoie wrote:
 Personally,

 I would have the policy implemented:
 permutations( Policy p = Policy.forward )() {
 ...
 }

 And maybe extend it with aliases:

 alias fPermutations permutations!( Policy.forward );
 alias rPermutations permutations!( Policy.backward );
 alias biPermutations permutations!( Policy.bidirectional );

 Cheers!
And of course I managed to invert alias terms again!
Jan 13 2013
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 13 January 2013 at 16:54:24 UTC, Phil Lavoie wrote:
 On Sunday, 13 January 2013 at 16:31:30 UTC, Phil Lavoie wrote:
 Personally,

 I would have the policy implemented:
 permutations( Policy p = Policy.forward )() {
 ...
 }

 And maybe extend it with aliases:

 alias fPermutations permutations!( Policy.forward );
 alias rPermutations permutations!( Policy.backward );
 alias biPermutations permutations!( Policy.bidirectional );

 Cheers!
And of course I managed to invert alias terms again!
Just use the new syntax. alias fPermutations = permutations!( Policy.forward ); alias rPermutations = permutations!( Policy.backward ); alias biPermutations = permutations!( Policy.bidirectional );
Jan 13 2013
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Phil Lavoie:

 And of course I managed to invert alias terms again!
Use the new syntax: alias forwardPermutations = permutations!(CombPolicy.forward); Bye, bearophile
Jan 13 2013
parent "Phil Lavoie" <maidenphil hotmail.com> writes:
On Sunday, 13 January 2013 at 17:26:16 UTC, bearophile wrote:
 Phil Lavoie:

 And of course I managed to invert alias terms again!
Use the new syntax: alias forwardPermutations = permutations!(CombPolicy.forward); Bye, bearophile
Love it
Jan 13 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Jan 13, 2013 at 01:05:25PM +0100, Peter Alexander wrote:
 I'm repeatedly hitting a range API problem in my combinatorics
 library. Many of the ranges require (potentially sizeable) internal
 buffers to maintain their current state (permutations, set
 partitions, power sets etc.) Also, many of these ranges are
 bidirectional. What this means is they have to store *two* buffers,
 one for the front, and one for the back.
[...]
 The only (ugly) solution I can come up with is to create three
 different ranges, e.g.
 
 ForwardPermutations
 ReversePermutations
 BidirectionalPermutations
 
 (or equivalently, one range with an iteration policy).
[...] On Sun, Jan 13, 2013 at 05:23:24PM +0100, Phil Lavoie wrote: [...]
 That. I am not sure why you find it so ugly however :(. When the
 most capable/general range yields a noticeable cost on
 performance/space consumption (which seems to be the case here), I
 think it better to empower the user with choices rather than try to
 make decisions for him/her.
+1.
 Documentating how bidirectionalPerms modifies perf/space might just
 be what you need to make peace with yourself. In the event a user
 REALLY needs the bidirectional functionality, he/she will most
 likely read that documentation and  comprehend why it wasn't the
 default in the first place and I am sure they will be thankful to
 know why.
Another point I'd like to ask is, are these ranges transient, or do they allocate per iteration? If they are transient (i.e., .front is invalidated upon calling popFront()), they need to be clearly marked as such. Also, I think the default setting should be non-transient, and only when the user asks for higher performance, a transient range should be given. T -- Guns don't kill people. Bullets do.
Jan 14 2013