
digitalmars.D - More on "Component Programming"

reply "bearophile" <bearophileHUGS lycos.com> writes:
This simple task on the Rosetta Code site is useful to show some uses 
of Phobos and the "component programming" recently discussed by 
Walter (other languages use a different name to denote the same 
idea).

Given a dictionary file of words, the task asks to find one of the 
longest anagram pairs whose words share no equal characters in the 
same position (such pairs are called deranged anagrams):

http://rosettacode.org/wiki/Anagrams/Deranged_anagrams#D

There are many ways to do this in D+Phobos. The following 
solution is long, but it's quite fast (the "warmed up" run-time 
is only about 0.03 seconds with a dictionary of about 200 KB, on 
an old CPU core). I have chosen it over simpler solutions because 
it gives me a chance to discuss certain things:



import std.stdio, std.file, std.array, std.algorithm, std.string,
       std.typecons, std.range, std.functional;

auto findDeranged(in string[] words) pure /*nothrow*/ {
    //return words.pairwise.filter!(ww => ww[].zip.all!q{ a[0] != a[1] });
    Tuple!(string, string)[] result;
    foreach (immutable i, const w1; words)
        foreach (const w2; words[i + 1 .. $])
            if (zip(w1, w2).all!q{ a[0] != a[1] })
                result ~= tuple(w1, w2);
    return result;
}

void main() {
    // Bucket the words by length; longer words end up at lower indexes.
    Appender!(string[])[30] wClasses;
    foreach (word; std.algorithm.splitter("unixdict.txt".readText))
        wClasses[$ - word.length] ~= word;

    "Longest deranged anagrams:".writeln;
    foreach (words; wClasses[].map!q{ a.data }.filter!(not!empty)) {
        // Group the words of this length by their sorted characters.
        string[][const ubyte[]] anags; // Assume ASCII input.
        foreach (w; words)
            anags[w.dup.representation.sort().release.idup] ~= w;
        auto pairs = anags.byValue.map!findDeranged.join;
        if (!pairs.empty)
            return writefln("  %s, %s", pairs.front[]);
    }
}


- - - - - - - - - - - -

That program contains five foreach loops. Foreach loops are not 
evil and I like them, but for a certain kind of programming (more 
common in functional languages) every time you use a for/foreach 
it's one small "failure" for the standard library :-)

The following weird (untested and maybe buggy) program replaces 
all the foreach loops with higher order functions and other 
library functions. It can't be compiled because it uses some 
things not yet present in Phobos (on the Rosettacode page there 
is also a slower and simpler D solution of this problem that uses 
only one foreach):


void main() {
    import std.stdio, std.file, std.algorithm, std.string,
           std.typecons, std.range, std.functional;

    "unixdict.txt"
    .readText
    .splitter
    .classify!q{ a.length }
    .map!q{ a.values } // .byValue is almost OK.
    .array
    .schwartzSort!q{ -a[0].length }
    .release
    .map!(words => words
                   .classify!q{ a
                                .dup
                                .representation
                                .sort()
                                .release
                                .idup }
                   .byValue
                   .map!(words => words
                                  .pairwise
                                  .filter!(ww => ww[]
                                                 .zip
                                                 .all!q{ a[0] != a[1] }))
                   .join)
    .filter!(not!empty)
    .front[]
    .binaryReverseArgs!writefln("  %s, %s");
}


A copy of the same code if the newsgroup has messed up the 
formatting and indents, turning that code into a soup:
http://codepad.org/L4TyDkcQ


I am not suggesting you write whole script-like D programs in 
this strange style. But I think Phobos should offer all the tools 
needed to write a program like this, because even if you don't 
want to write a whole little program this way, you sometimes want 
to use one part of it or another. So I think all the most common 
and distinct micro-patterns should be available in Phobos.

- - - - - - - - - - - -

"binaryReverseArgs" is in the std.functional module. Here it 
allows the use of writefln in UFCS style, inverting the 
formatting string position. I think I'd like a shorter and more 
handy name for it. In Haskell it's named "flip", and its usage is 
not uncommon.
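
A tiny self-contained illustration of the flipping (just a sketch, 
separate from the program above):

import std.functional : binaryReverseArgs;
import std.stdio : writefln;

void main() {
    // binaryReverseArgs!fun(a, b) simply calls fun(b, a), so the data
    // can come first and the format string second:
    binaryReverseArgs!writefln("world", "Hello %s!");
    // The same call in UFCS style:
    "world".binaryReverseArgs!writefln("Hello %s!");
}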

- - - - - - - - - - - -

"classify" is a simple function, that given a forward range of T 
and an optional function T->K, returns an associative array 
T[][K]. (Arrays are used by default as values. But maybe you can 
optionally specify a different type of values, like Appenders, 
Arrays, sets, etc). (Currently in Phobos the only function to 
build an associative array is std.array.assocArray, but here we 
need something different). 
(http://d.puremagic.com/issues/show_bug.cgi?id=5502 ).

[1, 7, 6, 3, 2].classify!(x => x % 2 ? "odd": "even").writeln;

==>
["odd": [1, 7, 3], "even": [6, 2]]

- - - - - - - - - - - -

"pairwise" is a very useful lazy range similar to 
cartesianProduct, but it yields only the ordered pairs, so they 
cover only about half (a triangle) of the square matrix of the 
possibilities. 
(http://d.puremagic.com/issues/show_bug.cgi?id=6788 ).


This simple example shows the difference:

import std.stdio, std.algorithm;
void main() {
     auto data = [1, 2, 3, 4];
     foreach (xy; cartesianProduct(data, data))
         writeln(xy);
}


Generates the tuples:
(1, 1)
(2, 1)
(3, 1)
(4, 1)
(1, 2)
(2, 2)
(3, 2)
(4, 2)
(1, 3)
(2, 3)
(3, 3)
(4, 3)
(1, 4)
(2, 4)
(3, 4)
(4, 4)


While:

import std.stdio, std.range;
void main() {
     auto data = [1, 2, 3, 4];
     foreach (tup; pairwise(data))
         writeln(tup);
}


Should generate:
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)
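
A rough lazy sketch of such a pairwise, restricted to arrays to keep 
it short (the real one would accept any forward range):

import std.stdio, std.typecons;

// Yields each pair (items[i], items[j]) with i < j exactly once.
struct Pairwise(T) {
    T[] items;
    size_t i = 0, j = 1;

    bool empty() { return j >= items.length; }
    Tuple!(T, T) front() { return tuple(items[i], items[j]); }
    void popFront() {
        ++j;
        if (j >= items.length) {
            ++i;
            j = i + 1;
        }
    }
}

Pairwise!T pairwise(T)(T[] items) { return Pairwise!T(items); }

void main() {
    foreach (tup; pairwise([1, 2, 3, 4]))
        writeln(tup);  // the six pairs (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)
}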


In the Python standard library there is a lazy generator that's 
more general than pairwise:

>>> from itertools import combinations
>>> list(combinations([1, 2, 3, 4], 2))
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

So if you prefer that more general solution, the D code becomes:

...
.map!(words => words
               .combinations(2)
               .filter!(ww => ww[]
...

Bye,
bearophile
May 27 2013
"Sebastian Graf" <SebastianGraf t-online.de> writes:
On Monday, 27 May 2013 at 21:36:12 UTC, bearophile wrote:
 snip
Every time I see that kind of code, my heart makes a delightful jump. That code is what I enjoy most about D compared to C++. Plus, the compiler is still able to optimize most of the [...]

I'm all for more algorithm primitives in std.algorithm. I missed [...] and was confused that std.algorithm.group did not do what I thought it did.

Is there any reason why you keep using quoted strings instead of string literals for lambdas besides taste?
May 27 2013
"bearophile" <bearophileHUGS lycos.com> writes:
Sebastian Graf:

 Plus, the compiler is still able to optimize most of the 

There are several optimizations that D/DMD is not performing on those ranges and higher order functions. The Haskell compiler GHC optimized that stuff using purity, library defined "rewrite rules", stream fusion/deforestation and more. DMD does nothing of this, or very little. I think so far Walter has shown no interest in this.

 I'm all for more algorithm primitives in std.algorithm. I was 
 confused that std.algorithm.group did not do what I thought it did.
The "group" of Phobos is useful and it has purposes quite different from the Perl6 "classify", both are needed. I have also suggested "group" to act more like the Python "itertools.grouby" and yield not just the (head,count) tuples, but (head,lazy_range_of_the_equality_ class) that is quite useful, example:
>>> from itertools import groupby
>>> s = "abABACAaaaaBCAB"
>>> [(h, list(g)) for h, g in groupby(s, key=str.isupper)]
[(False, ['a', 'b']), (True, ['A', 'B', 'A', 'C', 'A']), (False, ['a', 'a', 'a', 'a']), (True, ['B', 'C', 'A', 'B'])]

I think Andrei was looking at this change with interest. Changing the Phobos group() API now is maybe not easy to do, so maybe it's better to introduce a differently named function for that, like one named "groupAll" or something similar.
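
To make the desired semantics concrete, a rough eager sketch in D 
(eager arrays instead of lazy sub-ranges, and "groupRuns" is only a 
placeholder name, it is not the current Phobos group):

import std.stdio, std.typecons, std.uni;

// Pair every run of adjacent elements having an equal key with that
// key, like Python's itertools.groupby.
auto groupRuns(alias key, T)(T[] items) {
    alias K = typeof(key(T.init));
    Tuple!(K, T[])[] result;
    foreach (item; items) {
        auto k = key(item);
        if (result.length && result[$ - 1][0] == k)
            result[$ - 1][1] ~= item;   // extend the current run
        else
            result ~= tuple(k, [item]); // start a new run
    }
    return result;
}

void main() {
    foreach (run; "abABACAaaaaBCAB"d.dup.groupRuns!(c => isUpper(c)))
        writefln("%s: %s", run[0], run[1]);
    // false: ab | true: ABACA | false: aaaa | true: BCAB
}
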
 Is there any reason why you keep using quoted strings instead 
 of string literals for lambdas besides taste?
My editor uses a single uniform color for the contents of normal strings, unlike quoted strings.

Bye,
bearophile
May 27 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, May 28, 2013 at 02:16:22AM +0200, bearophile wrote:
 Sebastian Graf:
 
Plus, the compiler is still able to optimize most of the

There are several optimizations that D/DMD is not performing on those ranges and higher order functions. The Haskell compiler GHC optimized that stuff using purity, library defined "rewrite rules", stream fusion/deforestation and more.
Library-defined rewrite rules would be über-cool if they were supported in D. *(waits for somebody to mention "AST macros", and everyone else to shout him down. :-P)
 DMD does nothing of this, or very little. I think so far Walter has
 shown no interest in this.
[...]

The DMD optimizer certainly leaves a lot of room for improvement. I think Walter would accept pull requests to that effect. ;-)


T

--
Try to keep an open mind, but not so open your brain falls out. -- theboz
May 27 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/27/13 8:16 PM, bearophile wrote:
 The "group" of Phobos is useful and it has purposes quite different from
 the Perl6 "classify", both are needed.
 I have also suggested "group" to act more like the Python
 "itertools.grouby" and yield not just the (head,count) tuples, but
 (head,lazy_range_of_the_equality_ class) that is quite useful, example:


 from itertools import groupby
 s = "abABACAaaaaBCAB"
 [(h, list(g)) for h,g in groupby(s, key=str.isupper)]
 [(False, ['a', 'b']), (True, ['A', 'B', 'A', 'C', 'A']), (False, ['a', 'a', 'a', 'a']), (True, ['B', 'C', 'A', 'B'])]

 I think Andrei was looking at this change with interest. Changing Phobos group() API now is maybe not easy to do, so maybe it's better to introduce a differently named function for that, like one named "groupAll" or something similar.

I wrote that a while ago. Someone please review and pull
https://github.com/D-Programming-Language/phobos/pull/1186 already!

Andrei
May 27 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 5/27/2013 5:16 PM, bearophile wrote:
 There are several optimizations that D/DMD is not performing on those ranges
and
 higher order functions. The Haskell compiler GHC optimized that stuff using
 purity, library defined "rewrite rules", stream fusion/deforestation and more.
 DMD does nothing of this, or very little. I think so far Walter has shown no
 interest in this.
I don't really know what you're talking about.
May 27 2013
"bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 There are several optimizations that D/DMD is not performing 
 on those ranges and
 higher order functions. The Haskell compiler GHC optimized 
 that stuff using
 purity, library defined "rewrite rules", stream 
 fusion/deforestation and more.
 DMD does nothing of this, or very little. I think so far 
 Walter has shown no
 interest in this.
I don't really know what you're talking about.
Then maybe you have missed the last two times I have discussed such topics in D.learn.

They are optimizations done by compilers of functional languages, in particular the GHC compiler of the lazy functional language Haskell. The optimizations performed thanks to purity and immutability are needed by GHC because in Haskell you use Higher Order Functions (HOFs), and generally you use functions as first class immutable values, so if you don't inline a lot, your Haskell program will be slow as a snail.

The "rewrite rules" are a feature of the GHC compiler. They allow library writers to define rules that in most (or many) cases are advantageous and lead to better optimization. As an example, one such rule swaps map and filter, putting the filter before the map, because this is more efficient. Haskell allows you to perform such swapping because (nearly) everything it does is pure and immutable. Some examples and explanations of the GHC rewrite rules:

http://www.haskell.org/ghc/docs/7.0.1/html/users_guide/rewrite-rules.html
http://www.haskell.org/haskellwiki/GHC/Using_rules
http://www.haskell.org/haskellwiki/Playing_by_the_rules

Stream fusion, or more generally deforestation, is another optimization done by GHC. It's needed because Haskell is a functional language that uses lists all the time, and such optimizations are helped by Haskell's default laziness (deforestation is possible in an eager language too, but it's a bit less easy to do, they say). If you perform a map and then a filter and then another HOF you are traversing lists all the time, and this is slow. Haskell default strings currently are lists of chars, so string operations have to be optimized well. To generate fast binaries GHC "fuses" similar lazy streams, and avoids creating all the intermediate lists. With this optimization GHC has recently been shown to generate good enough vectorized SSE-aware binaries that are faster than naive C code optimized by the vectorizing stages of the GCC optimizer, and beaten only by very vector-intrinsics-heavy handwritten C code:

http://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf

But that's not the paper to start reading about such topics, because it's a late paper on the topic. More on the topic:

http://www.randomhacks.net/articles/2007/02/10/map-fusion-and-haskell-performance

On this topic there are both easy to understand and harder to understand papers, so you have to choose.

Bye,
bearophile
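
P.S. The shape of that map/filter rule, written out by hand in D (just 
an illustrative sketch; D/DMD performs no such automatic rewriting):

import std.algorithm, std.range, std.stdio;

void main() {
    static int f(int x) { return x * x; }
    static bool p(int y) { return y % 3 == 0; }

    // Map first, then filter...
    auto a = iota(1, 11).map!f.filter!p;
    // ...or filter first, with the predicate composed with f.
    auto b = iota(1, 11).filter!(x => p(f(x))).map!f;

    // With a pure f both pipelines yield the same elements; a compiler
    // with rewrite rules may pick whichever order it considers cheaper.
    assert(equal(a, b));
    writeln(a);  // [9, 36, 81]
}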
May 28 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 28 May 2013 at 09:54:35 UTC, bearophile wrote:
 The "rewrite rules" is a feature of the GHC compiler. It allows 
 the library writers to define rules that in most (or many) 
 cases are advantageous and lead to better optimization. As 
 example one of such rules swap map and filter, putting the 
 filter before the map, because this is more efficient. Haskell 
 allows you to perform such swapping because (nearly) everything 
 it does is pure and immutable.
Phobos does this a little bit in simple cases. For example, retro(retro(r)) returns r, if I remember correctly.

More complicated rewrites would be trickier to implement, and may be impossible cross-module (I cannot provide a rewrite for retro(my.module.range) without modifying Phobos).

General rewrites, like the kind relational database query optimisers do, require statistics on the data structures. For example:

1) cartesianProduct(r1, r2).filter!(pred).sort("a[1]<b[1]")
2) cartesianProduct(r1, r2.sort("a[1]<b[1]")).filter!(pred)

Which is faster depends on the sizes of r1 and r2, and the pass ratio of the filter predicate.
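
The retro case can be checked directly (a tiny sketch; it relies on the 
unwrapping described above being in place):

import std.range, std.stdio;

void main() {
    auto r = [1, 2, 3];
    auto rr = r.retro.retro;
    static assert(is(typeof(rr) == int[]));  // the double reversal is unwrapped
    assert(rr is r);                         // same slice, no wrapper left
    writeln(rr);                             // [1, 2, 3]
}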
May 28 2013
parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, May 28, 2013 19:55:43 Peter Alexander wrote:
 Phobos does this a little bit in simple cases. For example,
 retro(retro(r)) returns r if I remember correctly.
Yes. Phobos tries to optimize useless templates like that, and definitely does it in this particular case, though I don't know if it does it everywhere that it should.
 More complicated rewrites would be trickier to implement, and may
 be impossible cross-modules (I cannot provide a rewrite for
 retro(my.module.range) without modifying Phobos).
True, though you might be able to pull off something very close with alias this. The type wouldn't be the same, but it would at least convert to the same type.

- Jonathan M Davis
May 28 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 5/28/2013 2:54 AM, bearophile wrote:
 The "rewrite rules" is a feature of the GHC compiler. It allows the library
 writers to define rules that in most (or many) cases are advantageous and lead
 to better optimization. As example one of such rules swap map and filter,
 putting the filter before the map, because this is more efficient. Haskell
 allows you to perform such swapping because (nearly) everything it does is pure
 and immutable.
Thank you.
May 28 2013
"bearophile" <bearophileHUGS lycos.com> writes:
 Then maybe you have missed the last two times I have discussed 
 such topics in D.learn.
Sorry, I meant the main D newsgroup.

Bye,
bearophile
May 28 2013
"Sebastian Graf" <SebastianGraf t-online.de> writes:
On Tuesday, 28 May 2013 at 00:16:23 UTC, bearophile wrote:
 Sebastian Graf:

 Plus, the compiler is still able to optimize most of the 

There are several optimizations that D/DMD is not performing on those ranges and higher order functions. The Haskell compiler GHC optimized that stuff using purity, library defined "rewrite rules", stream fusion/deforestation and more. DMD does nothing of this, or very little. I think so far Walter has shown no interest in this.
Point taken, but what I actually meant was that the compiler is allowed to optimize it all away in _theory_.

You cannot have this in C#: iterating over a big array is faster with a foreach-if combination than with an array.Where(predicate) LINQ statement (I blame the cache and delegate indirection). Since there is no strong compile-time templating, this will not change in the future. Of course this is a very specific case, but it is something you have to keep in mind. The difference for some MB-sized arrays is noticeable.

TIL that GHC already has this. I should really continue learning Haskell.
May 28 2013
parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Tuesday, 28 May 2013 at 07:26:06 UTC, Sebastian Graf wrote:
 On Tuesday, 28 May 2013 at 00:16:23 UTC, bearophile wrote:
 Sebastian Graf:

 Plus, the compiler is still able to optimize most of the 

There are several optimizations that D/DMD is not performing on those ranges and higher order functions. The Haskell compiler GHC optimized that stuff using purity, library defined "rewrite rules", stream fusion/deforestation and more. DMD does nothing of this, or very little. I think so far Walter has shown no interest in this.
Point taken, but what I actually meant was that the compiler is allowed to optimize it all away in _theory_. You cannot have big array is faster with a foreach-if combination than with an array.Where(predicate) linq statement (I blame the cache and delegate indirection). Since there is no strong compile time templating, this will not change in the future. Of course this is a very specific case, but it is something you have to keep in mind. The difference for some MB-sized arrays is noticable. TIL that GHC already has this. I should really continue learning Haskell.
Even with NGEN or Mono -aot?
May 28 2013
Jacob Carlborg <doob me.com> writes:
On 2013-05-28 02:16, bearophile wrote:

 My editor uses a single uniform color for the contents of normal
 strings, unlike quoted strings.
Why not use proper lambdas instead of strings?

--
/Jacob Carlborg
May 28 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
Jacob Carlborg:

 Why not use proper lambdas instead of strings?
Mostly for personal reasons: quoted strings are sometimes a little shorter, and they require you to use arguments with default names (such as "a" and "b"); this increased standardization makes me read them a little faster than lambdas that have varying argument names. If I see q{a + b} I don't need to go look at what a and b are; for me it's a "standard" sum, like one to be used by a reduce().

It's a bit like the "pointfree" style in Haskell (http://www.haskell.org/haskellwiki/Pointfree ), also named "tacit" in other languages, where you don't name input variables. That style, if overused, makes the code unreadable, but if used a bit and wisely it allows you to underline the transformations done on the data instead of the (sometimes arbitrary) names you give to parts of generic data.

Bye,
bearophile
May 28 2013
"Paulo Pinto" <pjmlp progtools.org> writes:
On Monday, 27 May 2013 at 23:23:52 UTC, Sebastian Graf wrote:
 On Monday, 27 May 2013 at 21:36:12 UTC, bearophile wrote:
 snip
 Every time I see that kind of code, my heart makes a delightful jump. That code is what I enjoy most about D compared to C++. Plus, the compiler is still able to optimize most of the [...]

 I'm all for more algorithm primitives in std.algorithm. I was confused that std.algorithm.group did not do what I thought it did.

 Is there any reason why you keep using quoted strings instead of string literals for lambdas besides taste?
Microsoft could actually spend a bit more time on their optimizer, but I guess business has other priorities.
May 28 2013
"bearophile" <bearophileHUGS lycos.com> writes:
     .map!(words => words
                    .classify!q{ a
                                 .dup
                                 .representation
                                 .sort()
                                 .release
Also, let's kill the built-in sort already :-)

Bye,
bearophile
May 27 2013
Peter Williams <pwil3058 bigpond.net.au> writes:
On 28/05/13 10:37, bearophile wrote:
     .map!(words => words
                    .classify!q{ a
                                 .dup
                                 .representation
                                 .sort()
                                 .release
Also, let's kill the built-in sort already :-)
But I just found it and started using it. :-)

I was contemplating writing my own sort function, as the ones in std.algorithm didn't meet my needs (or using them was too messy), until I discovered this feature.

Are the () necessary on sort? I found that

auto sorted_array = an_array.dup.sort;

worked.

Peter

PS Now I've found this I can go back and simplify all the code where I iterated over associative arrays in key order by getting the keys and then sorting them separately.

PPS Every time I discover one of these features I like D even more.
May 27 2013
"bearophile" <bearophileHUGS lycos.com> writes:
Peter Williams:

 Are the () necessary on sort?
If you don't use () I think you call the slower, less flexible and buggy built-in sort. I think it's already deprecated. Maybe I am wrong...
 PS Now I've found this I can go back and simplify all the code 
 where I iterated over associative arrays in key order by 
 getting the keys and the sorting them separately.
I don't fully understand.

Bye,
bearophile
May 27 2013
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, May 28, 2013 03:50:48 bearophile wrote:
 Peter Williams:
 Are the () necessary on sort?
If you don't use () I think you call the slower, not flexible and buggy built-in sort. I think it's already deprecated. Maybe I am wrong...
If it's on an array, then yes, I believe that calling sort without parens would call the built-in one that should be deprecated.

- Jonathan M Davis
May 27 2013
Peter Williams <pwil3058 bigpond.net.au> writes:
On 28/05/13 11:50, bearophile wrote:
 Peter Williams:

 Are the () necessary on sort?
If you don't use () I think you call the slower, not flexible and buggy built-in sort. I think it's already deprecated. Maybe I am wrong...
Ah. Does that mean that importing std.algorithm is needed to use sort()?
 PS Now I've found this I can go back and simplify all the code where I
 iterated over associative arrays in key order by getting the keys and
 the sorting them separately.
I don't fully understand.
I'm assuming here that it's safe to call sort() on the .keys property, which is a freshly allocated dynamic array. This enables me to write:

foreach (key; aa.keys.sort()) {...}

instead of the much more complex code I'm currently writing, which gets the keys, sorts them and then uses them in the foreach. I had actually written a generic function to do all of that, but now I find all that work was unnecessary :-).

As I learn more about D I'm finding I can go back and simplify a lot of my code. I'm going to reread Andrei's book to see what I missed the first time.

Peter

PS I think I need to read more about component programming as I'm beginning to suspect I don't understand it fully.
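
Spelled out as a tiny complete program (with std.algorithm imported, so 
that sort() with the parens is the library sort):

import std.algorithm : sort;
import std.stdio : writeln;

void main() {
    int[string] aa = ["banana": 2, "apple": 1, "cherry": 3];
    // .keys allocates a fresh dynamic array, so sorting it is safe.
    foreach (key; aa.keys.sort())
        writeln(key, ": ", aa[key]);
}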
May 27 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/27/13 9:30 PM, Peter Williams wrote:
 On 28/05/13 10:37, bearophile wrote:
 .map!(words => words
 .classify!q{ a
 .dup
 .representation
 .sort()
 .release
Also, let's kill the built-in sort already :-)
But I just found it and started using it. :-) I was contemplating writing my own sort function as the ones in std.algorithm didn't meet my needs (or using them was too messy) until I discovered this feature.
We need to fix that! std.algorithm.sort should include the built-in sort's functionality.

Andrei
May 27 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 We need to fix that! std.algorithm.sort should include builtin 
 sort's functionality,
What is the missing functionality we are talking about? I think the only advantage of the built-in sort is that it causes less template bloat and less compilation time.

Anyway, I'd like to see a warning or deprecation for the built-in sort soon... unless you/we want to un-deprecate it and fix it properly. Keeping several things in a "will-be-deprecated" limbo for a long time doesn't sound good for a serious language.

Bye,
bearophile
May 27 2013
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, May 28, 2013 11:30:55 Peter Williams wrote:
 Are the () necessary on sort?  I found:
 
 auto sorted_array = an_array.dup.sort;
Any function which takes no arguments can be called without parens, and thanks to UFCS (Universal Function Call Syntax) you're calling these functions as if they were member functions, so they no longer have any arguments between the parens and can be called without them.

- Jonathan M Davis
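
For instance, these are all ordinary free functions called member-style, 
and the empty parens can be dropped:

import std.array : array;
import std.range : iota, retro;
import std.stdio : writeln;

void main() {
    // retro(array(iota(5))) written member-style, without empty parens:
    auto a = 5.iota.array.retro;
    writeln(a);  // [4, 3, 2, 1, 0]
}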
May 27 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/27/13 5:36 PM, bearophile wrote:
 This simple example shows the difference:

 import std.stdio, std.algorithm;
 void main() {
 auto data = [1, 2, 3, 4];
 foreach (xy; cartesianProduct(data, data))
 writeln(xy);
 }


 Generates the tuples:
 (1, 1)
 (2, 1)
 (3, 1)
 (4, 1)
 (1, 2)
 (2, 2)
 (3, 2)
 (4, 2)
 (1, 3)
 (2, 3)
 (3, 3)
 (4, 3)
 (1, 4)
 (2, 4)
 (3, 4)
 (4, 4)
I'm disappointed cartesianProduct works that way; I should have caught that during the code review. A better iteration order would have spanned the lower position in both ranges first, i.e. create squares of increasing side in the 2D space.

Andrei
May 27 2013
"bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 I'm disappointed cartesianProduct works that way; I should have 
 caught that during the code review. A better iteration order 
 would have spanned the lower position in both ranges first, 
 i.e. create squares of increasing side in the 2D space.
I opened this:
http://d.puremagic.com/issues/show_bug.cgi?id=9878

Bye,
bearophile
May 27 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, May 27, 2013 at 10:01:32PM -0400, Andrei Alexandrescu wrote:
 On 5/27/13 5:36 PM, bearophile wrote:
This simple example shows the difference:

import std.stdio, std.algorithm;
void main() {
auto data = [1, 2, 3, 4];
foreach (xy; cartesianProduct(data, data))
writeln(xy);
}


Generates the tuples:
(1, 1)
(2, 1)
(3, 1)
(4, 1)
(1, 2)
(2, 2)
(3, 2)
(4, 2)
(1, 3)
(2, 3)
(3, 3)
(4, 3)
(1, 4)
(2, 4)
(3, 4)
(4, 4)
I'm disappointed cartesianProduct works that way; I should have caught that during the code review. A better iteration order would have spanned the lower position in both ranges first, i.e. create squares of increasing side in the 2D space.
This is not too hard to change; it's just a matter of swapping left- and right-recursion in the templates. I'll see if I can cook up a pull request in a bit.


T

--
It only takes one twig to burn down a forest.
May 27 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, May 27, 2013 at 07:35:36PM -0700, H. S. Teoh wrote:
 On Mon, May 27, 2013 at 10:01:32PM -0400, Andrei Alexandrescu wrote:
 On 5/27/13 5:36 PM, bearophile wrote:
This simple example shows the difference:

import std.stdio, std.algorithm;
void main() {
auto data = [1, 2, 3, 4];
foreach (xy; cartesianProduct(data, data))
writeln(xy);
}


Generates the tuples:
(1, 1)
(2, 1)
(3, 1)
(4, 1)
(1, 2)
(2, 2)
(3, 2)
(4, 2)
(1, 3)
(2, 3)
(3, 3)
(4, 3)
(1, 4)
(2, 4)
(3, 4)
(4, 4)
I'm disappointed cartesianProduct works that way; I should have caught that during the code review. A better iteration order would have spanned the lower position in both ranges first, i.e. create squares of increasing side in the 2D space.
This is not too hard to change; it's just a matter of swapping left- and right- recursion in the templates. I'll see if I can cook up an pull request in a bit.
[...]

Done, turns out the fix was trivial, just swapping two static ifs:

https://github.com/D-Programming-Language/phobos/pull/1314


T

--
I think Debian's doing something wrong, `apt-get install pesticide', doesn't seem to remove the bugs on my system! -- Mike Dresser
May 27 2013
Timothee Cour <thelastmammoth gmail.com> writes:
 Done, turns out the fix was trivial, just swapping two static ifs:
 https://github.com/D-Programming-Language/phobos/pull/1314
This isn't what Andrei had in mind in his post above:
 I'm disappointed cartesianProduct works that way; I should have caught
 that during the code review. A better iteration order would have spanned
 the lower position in both ranges first, i.e. create squares of
 increasing side in the 2D space.

I would suggest an additional template parameter to specify the order of iteration, for example an enum Order { lexicographic_depth, lexicographic_breadth, antilexicographic_depth, antilexicographic_breadth }. The naming is horrible but you get the idea; for cartesianProduct!(order)(['a','b','c'], [1,2,3]):

order=Order.lexicographic_depth:
a1 a2 a3 b1 b2 b3 c1 c2 c3

order=Order.lexicographic_breadth:
a1 b1 a2 c1 b2 a3 c2 b3 c3
(i.e. in 2D it follows the diagonals x+y=k, with k=0,1,2,... and 0<=x<length(x range), 0<=y<length(y range))

order=Order.antilexicographic_depth:
a1 b1 c1 a2 b2 c2 a3 b3 c3

order=Order.antilexicographic_breadth:
a1 a2 b1 a3 b2 c1 b3 c2 c3

On Mon, May 27, 2013 at 7:58 PM, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
 On Mon, May 27, 2013 at 07:35:36PM -0700, H. S. Teoh wrote:
 On Mon, May 27, 2013 at 10:01:32PM -0400, Andrei Alexandrescu wrote:
 On 5/27/13 5:36 PM, bearophile wrote:
This simple example shows the difference:

import std.stdio, std.algorithm;
void main() {
auto data = [1, 2, 3, 4];
foreach (xy; cartesianProduct(data, data))
writeln(xy);
}


Generates the tuples:
(1, 1)
(2, 1)
(3, 1)
(4, 1)
(1, 2)
(2, 2)
(3, 2)
(4, 2)
(1, 3)
(2, 3)
(3, 3)
(4, 3)
(1, 4)
(2, 4)
(3, 4)
(4, 4)
I'm disappointed cartesianProduct works that way; I should have caught that during the code review. A better iteration order would have spanned the lower position in both ranges first, i.e. create squares of increasing side in the 2D space.
This is not too hard to change; it's just a matter of swapping left- and right- recursion in the templates. I'll see if I can cook up an pull request in a bit.
[...] Done, turns out the fix was trivial, just swapping two static ifs: https://github.com/D-Programming-Language/phobos/pull/1314 T -- I think Debian's doing something wrong, `apt-get install pesticide', doesn't seem to remove the bugs on my system! -- Mike Dresser
May 27 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, May 27, 2013 at 09:24:28PM -0700, Timothee Cour wrote:
 Done, turns out the fix was trivial, just swapping two static ifs:
 https://github.com/D-Programming-Language/phobos/pull/1314
This isn't what Andrei had in mind in his post above:
 I'm disappointed cartesianProduct works that way; I should have
 caught that during the code review. A better iteration order would
 have spanned the lower position in both ranges first, i.e. create
 squares of increasing side in the 2D space.
I would suggest an additional template parameter to specify the order of iteration:
[...]

The problem with allowing the user to specify order is that when one or more of the input ranges are infinite, the order of traversal is much less flexible, and it may not be possible to satisfy the requested order.

The order that Andrei suggested is what's currently used for two infinite ranges, though when finite ranges are involved I chose a slightly simpler implementation. The order bearophile proposed in issue 9878 is different from what Andrei describes, at any rate.

What's the typical output order of cartesian products in other languages / libraries? It seems nobody can agree on what the order should be. :-/


T

--
People demand freedom of speech to make up for the freedom of thought which they avoid. -- Soren Aabye Kierkegaard (1813-1855)
May 27 2013
Timothee Cour <thelastmammoth gmail.com> writes:
On Mon, May 27, 2013 at 9:53 PM, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:

 On Mon, May 27, 2013 at 09:24:28PM -0700, Timothee Cour wrote:
 Done, turns out the fix was trivial, just swapping two static ifs:
 https://github.com/D-Programming-Language/phobos/pull/1314
This isn't what Andrei had in mind in his post above:
 I'm disappointed cartesianProduct works that way; I should have
 caught that during the code review. A better iteration order would
 have spanned the lower position in both ranges first, i.e. create
 squares of increasing side in the 2D space.
I would suggest an additional template parameter to specify the order of iteration:
[...] The problem with allowing the user to specify order is that when one or more of the input ranges are infinite, the order of traversal is much less flexible, and it may not be possible to satisfy the requested order.
Explicit is better than implicit. The user should request the order, and the compiler can statically disallow lexicographic_depth/antilexicographic_depth when at least one of the ranges is infinite.
 The order that Andrei suggested is what's currently used for two
 infinite ranges, though when finite ranges are involved I chose a
 slightly simpler implementation. The order bearophile proposed in issue
 9878 is different from what Andrei describes, at any rate.

 What's the typical output order of cartesian products in other languages
 / libraries? It seems nobody can agree on what the order should be. :-/
Python uses itertools.product, which is lexicographic_depth.

Like you say, no-one agrees on what the order should be, so let's leave it up to the user through a template parameter. Sounds like a no-brainer to me. There are use cases for each order I mentioned.
 T

 --
 People demand freedom of speech to make up for the freedom of thought
 which they avoid. -- Soren Aabye Kierkegaard (1813-1855)
May 27 2013
"bearophile" <bearophileHUGS lycos.com> writes:
Timothee Cour:

 python uses itertools.product which is lexicographic_depth.
 Like you say, no-one can agrees what the order should be, so 
 let's leave it
 up to user through a template. Sounds like a no-brainer to me. 
 There are
 use cases for each order I mentioned.
Different cases enjoy different orderings, so giving it a compile-time enum is OK. But I prefer the order of 9878 to be the default one, because it's the most commonly needed by me, and it's what I expect when I port Python code to D.

Bye,
bearophile
May 28 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, May 28, 2013 at 11:37:06AM +0200, bearophile wrote:
 Timothee Cour:
 
python uses itertools.product which is lexicographic_depth.  Like you
say, no-one can agrees what the order should be, so let's leave it up
to user through a template. Sounds like a no-brainer to me.  There
are use cases for each order I mentioned.
Different cases enjoy different orderings, so giving it a compile-time enum is OK. But I prefer the order of 9878 to be the default one, because it's the most commonly needed by me, and it's what I expect when I port Python code to D.
[...]

OK, I'm looking at the code to see how this could be implemented, and it seems to require some complications:

If both ranges are infinite, then the ordering needs to be Andrei's suggested ordering (incremental 2D square). It *can* have other orderings too, like diagonals, but that requires bidirectional ranges, not just forward ranges.

If only one range is infinite, then it is allowed to be an input range, but that constrains it to a specific ordering (namely, finite range first, then increment the infinite range). If the infinite range is more than an input range, then of course other orderings are possible, but explicitly supporting those cases seems to be a lot of work for very little added value.

If both ranges are finite, then many different orderings are possible, but again it depends on the nature of the ranges. If one of them is an input range, then the other must be a forward range, and the order is constrained.

The biggest problem here is, what should be the default ordering for the template? There is no single ordering that will work for every case. Furthermore, how many different kinds of orderings should be supported? There are a lot of possibilities, even just in the finite case (if the ranges are both bidirectional, for example -- who's to say we should exclude a Peano-curve traversal of the cartesian product, or everything in between?). Even restricting ourselves to the simplest orderings, there's the question of how to efficiently implement something like Andrei's 2D traversal when you have two finite ranges of different lengths.

Currently, I'm inclined to say the library gets to decide on the ordering based on its inputs. I'd rather the template require minimal functionality from its input ranges and automatically decide on which ordering is best. I'd just follow bearophile's suggestion for the finite case -- the ordering in issue 9878 is certainly the most useful, and most likely expected (most uses of cartesian products would expect an ordering analogous to binary enumeration, I think).

While adding an enum to specify the exact ordering is certainly possible, I'm not sure if it's adding enough value to justify the complications in the implementation.


T

--
Gone Chopin. Bach in a minuet.
May 28 2013
parent "Diggory" <diggsey googlemail.com> writes:
On Tuesday, 28 May 2013 at 17:24:13 UTC, H. S. Teoh wrote:
 On Tue, May 28, 2013 at 11:37:06AM +0200, bearophile wrote:
 Timothee Cour:
 
python uses itertools.product which is lexicographic_depth.  
Like you
say, no-one can agrees what the order should be, so let's 
leave it up
to user through a template. Sounds like a no-brainer to me.  
There
are use cases for each order I mentioned.
Different cases enjoy different orderings, so giving it a compile-time enum is OK. But I prefer the order of 9878 to be the default one, because it's the most commonly needed by me, and it's what I expect when I port Python code to D.
[...] OK, I'm looking at the code to see how this could be implemented, and it seems to require some complications: If both ranges are infinite, then the ordering needs to be Andrei's suggested ordering (incremental 2D square). It *can* have other orderings too, like diagonals, but that requires bidirectional ranges, not just forward ranges. If only one range is infinite, then it is allowed to be an input range, but that constrains it to a specific ordering (namely, finite range first, then increment infinite range). If the infinite range is more than an input range, then of course other orderings are possible, but explicitly supporting those cases seems to be a lot of work for very little added value. If both ranges are finite, then many different orderings are possible, but again it depends on the nature of the ranges. If one of them is an input range, then the other must be a forward range, and the order is constrained. The biggest problem here is, what should be the default ordering for the template? There is no single ordering that will work for every case. Furthermore, how many different kinds of orderings should be supported? There are a lot of possibilities, even just in the finite case (if the ranges are both bidirectional, for example -- who's to say we should exclude a Peano-curve traversal of the cartesian product, or everything in between?). Even restricting ourselves to the simplest orderings, there's the question of how to efficiently implement something like Andrei's 2D traversal when you have two finite ranges of different lengths. Currently, I'm inclined to say, the library gets to decide on the ordering based on its inputs. I'd rather the template require minimal functionality from its input ranges and automatically decide on which ordering is best. I'd just follow bearophile's suggestion for the finite case -- the ordering in issue 9878 is certainly the most useful, and most likely expected (most uses of cartesian products would expect an ordering analogous to binary enumeration, I think). While adding an enum to specify the exact ordering is certainly possible, I'm not sure if it's adding enough value to justify the complications in the implementation. T
I don't think the ordering should change just because the input type changes; that would be very confusing, especially when code is just passing on ranges it has received from somewhere else.

The default should be the simplest ordering (across then down), and other orderings must be explicitly asked for. When an incompatible ordering is used (including using an infinite range for the "across" range with the default ordering) it should show a helpful error message suggesting a compatible ordering.

There could also be an ordering constant "dontCare" which simply picks any ordering compatible with the nature of the input ranges. I don't think this should be the default, because the common case is that you do care what the ordering is; you should have to explicitly say that you don't care.

It's also worth mentioning that the rules are more complicated than has been suggested - using an infinite range with the default ordering is fine as long as a finite range is used for the range that is being iterated over more quickly.
May 28 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 28 May 2013 at 02:01:30 UTC, Andrei Alexandrescu 
wrote:
 I'm disappointed cartesianProduct works that way; I should have 
 caught that during the code review. A better iteration order 
 would have spanned the lower position in both ranges first, 
 i.e. create squares of increasing side in the 2D space.
Why is that better? It would be both unexpected and almost certainly slower. It has the advantage that it works sensibly with infinite ranges, but I think the correct approach would be to provide a policy option for the iteration strategy (lexicographic, colexicographic, Gray, etc.) so that the user can control this.
May 28 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Monday, 27 May 2013 at 21:36:12 UTC, bearophile wrote:
 "pairwise" is a very useful lazy range similar to 
 cartesianProduct, but it yields only the ordered pairs, so they 
 cover only about half (a triangle) of the square matrix of the 
 possibilities.
This will be part of my combinatorics library, but is more general. pairWise is just a specific case of k-subsets (with k=2). By default, it currently provides the subsets in colexicographic order instead of lexicographic, but I intend to make the ordering a policy before I'm done.
May 27 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
Peter Alexander:

 This will be part of my combinatorics library, but is more 
 general. pairWise is just a specific case of k-subsets (with 
 k=2).
Yes, that's what's written a bit lower in my post, with Python's combinations().

Bye,
bearophile
May 28 2013