digitalmars.D - Few things II

bearophile (152/152) Aug 07 2007 Here are some more notes, that I think are less important/significant th...

Sean Kelly (42/121) Aug 07 2007 I disagree. Since the key in an AA is equivalent to the index in an

Bruno Medeiros (11/23) Aug 09 2007 I think some amount of people in the NG knew that already :)

BCS (10/42) Aug 07 2007 D used to have this after a sorts. It was removed. I forget why.
Jari-Matti =?ISO-8859-1?Q?M=E4kel=E4?= (18/60) Aug 07 2007 It does a bit more type inference than D at the moment. Currently you ei...

Jari-Matti =?ISO-8859-1?Q?M=E4kel=E4?= (36/53) Aug 08 2007 I lost my train of thought a bit here. I'm not sure if this works, but

Walter Bright (3/45) Aug 07 2007 D's ulong is 64 bits, while C's unsigned long is 32 bits. This is likely...
Oskar Linde (92/136) Aug 07 2007 In that case, writefln(1) should not print the same thing as

Sean Kelly (6/24) Aug 07 2007 Another advantage is that the property syntax can simplify template code...
Chris Nicholson-Sauls (16/34) Aug 07 2007 Agreed. Incidentally, consider this little thing I did in Ruby a few
Bill Baxter (10/24) Aug 07 2007 This is the heart of the repr/str distinction from Python that was

bearophile (32/53) Aug 09 2007 I was looking for something O(1).

Oskar Linde (27/47) Aug 09 2007 This is much better, but I would still recommend you to reply to each
Sean Kelly (4/12) Aug 09 2007 For what it's worth, Tango contains a Fiber implementation that can be
BCS (30/45) Aug 09 2007 I am currently working on a project that is written in C# but which we a...

bearophile (190/194) Aug 11 2007 I understand. We may collect a large repository of code/snippets that us...

BCS (76/116) Aug 11 2007 [dropped D code]

bearophile (33/35) Aug 16 2007 Thank you. I'm learning D, so I can comment your code; this is the first...

Oskar Linde (42/56) Aug 16 2007 As far as I can tell, the following should work. Maybe I am missing
BCS (22/48) Aug 16 2007 that is because it can't, not directly at least. This is because the arg...

bearophile <bearophileHUGS lycos.com> writes:

Here are some more notes, that I think are less important/significant than the
first ones. Probably some of the following notes are silly/useless. There are
various things I don't know about D yet.

1) The Ddoc has some bugs. For example I'd like to write in the docs that a
function takes an input in [0,1), but that produces a wrong html output.

----------------------

2) Regarding the printing functions:

2a) I think that the names writef e writefln are a bit too much long and not
easy to remember. So better names can be found. For example show/shownl,
show/showr, show/showl, put/putr, write/writel, print/printnl, print/printl,
etc. I have seen Tango uses print.


2b) I like the idea of the writefl/write functions that don't perform
formatting. I was printing a string that casually contained an % and that has
produced a bug. So I've created a better printing function, that avoids that
bug too.


2c) writefln may print structs too in a default way, my basic (non-recursive!)
implementation (that represents structs as: <field1, field2, ...> ): 

char[] sstr(TyStruct)(TyStruct s) {
  auto result = new char[][s.tupleof.length];
  foreach (i, field; s.tupleof)
    result[i] = str(field);
  return "<" ~ result.join(", ") ~ ">";
}


2d) Regarding the printing I think it's not nice that:
writefln(["1", "2"])
prints the same thing as:
writefln([1, 2])
So maybe writefln can use "" when printing strings inside arrays/AAs, to
differentiate them.
(To sort out the ideas you may also take a look at the difference between
repr() and str() in Python).



2e) I also think writefln when in printing arrays/AAs can put spaces between
the commas.
I think [el1, el2, el3] and [1:"a", 2:"b"] are more readable than [el1,el2,el3]
and [1:a,2:b].


2f) I think this is a writefln inconsistency:
import std.stdio;
void main() {
  char[3] a = "abc";
  writefln([a, a]); // Prints: [abc,abc]
  writefln([a: 1]); // Prints: [[a,b,c]:1]
}

----------------------

3) Given an array x1, like a char[], it seems to me that x1.dup is a bit slower
compared to:
char[] x2 = new char[x1.length];
x2[] = x1;
I hope this can be fixed (or maybe someone can tell me why).

----------------------

4) The __builtin_expect is a method that gcc (versions >= 2.96) offer for
programmers to indicate branch prediction information to the compiler. The
return value of __builtin_expect is the first argument (which could only be an
integer) passed to it. This may be useful for dmd too, maybe for far future
times, when dmd has solved most of its bugs and the D syntax is more finished.

----------------------

5) I think built-in sets can be quite useful (I use them often in Python, I can
show some examples if you want). Few set operations among the keys of AAs may
be quite useful too (Python doesn't have them). (A first version may be inside
Phobos, implemented using the AAs with empty values.)

----------------------

6) I suggest to add a third optional to the AAs, the numerical progressive
index, it seems easy to add, the foreach already supports more than two
variables:
foreach(index, key, value; someAA) {...}

But I don't like much the fact that the foreach with one parameter scans the
values:
foreach(value; someAA) {...}
because most times you have to iterate on the keys (in Python too the default
iteration on a dict is on its keys). With the key you can find the value, while
I believe with the value you can't find the key.

I'd like a way to find the key if you have a pointer to the actual value
contained inside the AA. (I have seen you can cast the pointer to the value to
a struct and you can start to walk back with that, but I have not succeed yet
to find the key in a reliable way, you may suggest it to me. Maybe a function
to do this can be added to Phobos.) I can use it to create a "ordered
associative array" data structure (using the built-in AAs) that keeps a linked
list of the AA pairs, according to the original insertion order inside the AA.
To do that linked list of values you may need to find the key of a given value
pointer (or a given key-value pair).

----------------------

7) A "len" property instead of "length" may be useful (even if it's a bit less
clear) because it's used so much often it's boring to write such long word over
and over. (And because 'length' isn't an easy to write word for non-English
speaking people).

----------------------

8) From the docs:
Overlapping copies are an error:
s[0..2] = s[1..3];	// error, overlapping copy
s[1..3] = s[0..2];	// error, overlapping copy
Disallowing overlapping makes it possible for more aggressive parallel code
optimizations than possible with the serial semantics of C.

But the programmer may need such operations anyway now and then, so maybe a
different syntax can be added... I don't know. Or maybe other solutions can be
found.

----------------------

9) Bit fields of structs: D is a practical language, so I think it can support
them too, to allow people to translate C => D code more simply (D has already
chosen similar practical compromises regarding C for other things).

----------------------

10) I think the compiler has to accept a line like this too, converting the
strings as necessary:
char[][int] a = [1:"abba", 2:"hello"];

It may even accept this too (probably the values have to be converted to
dynamic arrays):
auto a = [1:"abba", 2:"hello"];
auto aa1 = [1:2, 3:4];
auto sa = ["0x","1x"];
auto a2 = [[1, 2], [3, 4]];

While this works already:
auto a = [1, 2, 3, 4];
auto ca = ['0','1'];
int[int] aa2 = [1:2, 3:4];
int[][] a3 = [[1, 2], [3, 4]];

----------------------

11) I have seen that D AAs are quite efficient, but probably there are very
good hash implementations around. So "superfasthash" or a modified version of
the super refined and optimized Python/Perl hashes can be useful to replace the
currently used by DMD. (Python source code is fully open source but code
licenses may be a problem anyway, I don't know).

----------------------

12) I have seen D allows to pass anonymous functions like:
(int x){return x*x;}


x => x*x

That syntax isn't bad, but I don't know how the compiler automatically manages
the types in such situations, it can be a bit complex.

----------------------

13) The Haskell language community is quite organized, they have a large page
on the main Haskell site that compares the best ways to create the entries for
the Shootout site. They have even improved the language because one of such
tests has shown a bad (slow) side of Haskell. I have recently posted few D
entries into the Shootout site, with mixed results (one entry is at the top).
Regarding the shootout code the following code may be used by DMD developers to
improve the compiler, it shows a bad side of DMD compared to GCC:

C code:

#include <stdio.h>
#include <stdlib.h>
unsigned long fib(unsigned long n) {
    return( (n < 2) ? 1 : (fib(n-2) + fib(n-1)) );
}
int main(int argc, char *argv[]) {
    int N = ((argc == 2) ? atoi(argv[1]) : 1);
    printf("%ld\n", fib(N));
    return(0);
}


The D code:

import std.stdio, std.conv;
ulong fib(ulong n) {
    return (n < 2) ? 1 : (fib(n-2) + fib(n-1));
}
void main(string[] argv) {
    ulong n = ((argv.length == 2) ? toUlong(argv[1]) : 1);
    writefln(fib(n));
}


Used:
gcc version 3.4.2 (mingw-special)
DMD Digital Mars D Compiler v1.020 (Windows)

Compiled with:
gcc -O2 fiboC.c -o fiboC.exe
dmd -O -release -inline fiboD.d

Timings with n = 38:
C: 4.38 s
D: 5.28 s

I think such diffece is big enough to justify an improvement.

----------------------

14) This code gives:
problem.d(5): Error: cannot evaluate StrType() at compile time
But I'd like to use typeid().toString at compile time (it's useful inside
compile time functions, etc):

import std.stdio;
char[] StrType(T)() {
    return typeid(T).toString;
}
const char[] r = StrType!(typeof(1))();
void main() {
    writefln(r);
}

Later I have solved similar problems with mixins, compile-time demangle, etc,
but I think the toString of typeid() may just work at compile time.

----------------------

15) On the newsgroup I have seen a list of "wishes", among them I like:
- support struct&array in switch
- Multi-Dimensional Allocation (of dynamic arrays too)
- Named keyword arguments
- Iterators and Generators (yield is a good starting point)
- Templates in classes
- !in (probably easy to implement)
- Multiple return values (tuples) (Quite useful, but I have found other
solutions)
- Statically check for == null (maybe not easy to implement)
- Implicit New or short syntax for New

Beside them I like list generators from Python, but their syntax isn't much
compatibile with D syntax, so I don't know how they can be written.

----------------------

16) Beside C++ and Python, I think D can copy Erlang e Fortress languages a bit
too. Erlang is nice for its easy parallel capabilities, and the Fortress
language made by Sun shows lot of collections literals (bags, sets, lists,
etc), and an easy management of parallel processing (I think it looks somewhat
similar to the ParallelPascal one).

----------------------

17) I am following this newsgroup for some weeks, I have seen now and then it
pops out some DMD problems regarding type inferencing (I too have found a 'bug'
related to that, I have shown it on the learn newsgroup). I am helping the
development of "ShedSkin" it's a compiler for an implicitly static subset of
Python to C++. This program contains a really powerful type inferencer able to
find fully by itself the types of all the variables used inside a 3000-lines
long Python program (that contains no explicit type annotations).
ShedSkin is rather slow and I think D will never need all its type inferencing
capabilities, but I think its main point is to show that you can produce very
fast code even if you start with code that's written with a very easy hi-level
syntax. This 'discovery' may be useful to D too in the future.

----------------------

18) I don't like still that string literals are immutable while the others not.
I suggest to make the string literals mutable as the other ones (automatic dup
during program creation). This will avoid some of my bugs (and it can probably
avoid problems in porting code Win <=>Linux that uses strings) and help me
avoid some tricks to de-constant strings when they are constant. In few
situations you may need to create a program that avoids heap activity, so in
those situations you may need the normal string literals anyway, in such
situations you may add a certain letter after the string literal (like c?) that
tells the compile to not perform the automatic dup.

----------------------

19) I have created many very general function templates like map, imap, map,
zip, sum, max, min, filter, ifilter, all, any, etc, that accept arrays, AAs and
iterable classes. It may be useful to add a bit of functional-style programming
to D, inside a Phobos module. Such functions may be fit for the 90% of the code
where max running speed isn't necessary. They shorten the code, reduce bugs,
speed up programming, make the program simpler to read, etc, and being hi-level
they may even end helping the compiler produce efficient code (as foreach
sometimes does).

----------------------

20) After few test of mine I think the default way dmd computes struct hashes
has to be improved (or removed), to make it compute the hash looking inside the
dynamic arrays contained into the struct too. Otherwise it silently gives the
"wrong" results.

----------------------

21) In this newsgroup I've seen that people like a lot the ability of doing:
txt.split()
But I don't like it much because it's a syntactical sugar that masks the fact
that those split are functions (that pollute the namespace) and they aren't
true methods (object functions). That namespace pollution may cause problems,
if you have already defined elsewhere a function with the same name and similar
signature.


I think that's all, my successive post will probably be much smaller and mostly
comments to specific posts :-)

Bear hugs,
bearophile

Aug 07 2007

Sean Kelly <sean f4.ca> writes:

bearophile wrote:
 
 6) I suggest to add a third optional to the AAs, the numerical progressive
index, it seems easy to add, the foreach already supports more than two
variables:
 foreach(index, key, value; someAA) {...}
 
 But I don't like much the fact that the foreach with one parameter scans the
values:
 foreach(value; someAA) {...}
 because most times you have to iterate on the keys (in Python too the default
iteration on a dict is on its keys). With the key you can find the value, while
I believe with the value you can't find the key.

I disagree.  Since the key in an AA is equivalent to the index in an 
array, the current behavior of foreach is consistent for all types 
passed to it.  One parameter gives you the value, two parameters gives 
you the 'key' and the value.  And since the key for an AA is equivalent 
to the index value, I'm not sure I like the three parameter idea either. 
  It's just as easy to use a separate variable for that when counting 
entries is important anyway.

 I'd like a way to find the key if you have a pointer to the actual value
contained inside the AA. (I have seen you can cast the pointer to the value to
a struct and you can start to walk back with that, but I have not succeed yet
to find the key in a reliable way, you may suggest it to me. Maybe a function
to do this can be added to Phobos.) I can use it to create a "ordered
associative array" data structure (using the built-in AAs) that keeps a linked
list of the AA pairs, according to the original insertion order inside the AA.
To do that linked list of values you may need to find the key of a given value
pointer (or a given key-value pair).

It's hardly efficient, but:

Key getKeyOf(Value, Key)( Value[Key] aa, Value val )
{
     foreach( k, v; aa )
     {
         if( v == val )
             return k;
     }
     return Key.init;
}

 ----------------------
 
 7) A "len" property instead of "length" may be useful (even if it's a bit less
clear) because it's used so much often it's boring to write such long word over
and over. (And because 'length' isn't an easy to write word for non-English
speaking people).

It would bed nice if there were a way to support this within the current 
syntax.  Though I suppose a property method could be created:

void copyTo(T)( T src, T dst )
in
{
     assert( src.length == dst.length );
}
body
{
     memmove( dst.ptr, src.ptr, src.length );
}

myArray[1 .. 3].copyTo( myArray[2 .. 4] );

 ----------------------
 
 11) I have seen that D AAs are quite efficient, but probably there are very
good hash implementations around. So "superfasthash" or a modified version of
the super refined and optimized Python/Perl hashes can be useful to replace the
currently used by DMD. (Python source code is fully open source but code
licenses may be a problem anyway, I don't know).

Not hard to do.  I've considered even using a "power of two" number of 
buckets for the D AA but because the degenerate case is so much worse 
than with a prime number of buckets I think the current implementation 
is more generally useful.  A library hashtable is probably the best 
option for more specialized needs.

 ----------------------
 
 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}
 

 x => x*x



 ----------------------
 
 13) The Haskell language community is quite organized, they have a large page
on the main Haskell site that compares the best ways to create the entries for
the Shootout site. They have even improved the language because one of such
tests has shown a bad (slow) side of Haskell. I have recently posted few D
entries into the Shootout site, with mixed results (one entry is at the top).
 Regarding the shootout code the following code may be used by DMD developers
to improve the compiler, it shows a bad side of DMD compared to GCC:
 
 C code:
 
 #include <stdio.h>
 #include <stdlib.h>
 unsigned long fib(unsigned long n) {
     return( (n < 2) ? 1 : (fib(n-2) + fib(n-1)) );
 }
 int main(int argc, char *argv[]) {
     int N = ((argc == 2) ? atoi(argv[1]) : 1);
     printf("%ld\n", fib(N));
     return(0);
 }
 
 
 The D code:
 
 import std.stdio, std.conv;
 ulong fib(ulong n) {
     return (n < 2) ? 1 : (fib(n-2) + fib(n-1));
 }
 void main(string[] argv) {
     ulong n = ((argv.length == 2) ? toUlong(argv[1]) : 1);
     writefln(fib(n));
 }
 
 
 Used:
 gcc version 3.4.2 (mingw-special)
 DMD Digital Mars D Compiler v1.020 (Windows)
 
 Compiled with:
 gcc -O2 fiboC.c -o fiboC.exe
 dmd -O -release -inline fiboD.d
 
 Timings with n = 38:
 C: 4.38 s
 D: 5.28 s
 
 I think such diffece is big enough to justify an improvement.

Unless this was run on a 64-bit Unix machine, the major differernce in 
performance is that 'long' in the C code is a 32-but value, while 'long' 
in the D code is a 64-bit value.  Someone should verify that the sizes 
match.

 ----------------------
 
 14) This code gives:
 problem.d(5): Error: cannot evaluate StrType() at compile time
 But I'd like to use typeid().toString at compile time (it's useful inside
compile time functions, etc):
 
 import std.stdio;
 char[] StrType(T)() {
     return typeid(T).toString;
 }
 const char[] r = StrType!(typeof(1))();
 void main() {
     writefln(r);
 }

What about .stringof?


Sean

Aug 07 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Sean Kelly wrote:
 bearophile wrote:
 
 ----------------------

 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}


 x => x*x

 

 

I think some amount of people in the NG knew that already :)
There were several mentions of it some time ago (more than year), 


already supports delegate literals, with a syntax nearly identical to D, 
and with full support for outer variables (ie, detects outer variables 
and allocates them in heap instead of the stack).

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Aug 09 2007

BCS <ao pathlink.com> writes:

Reply to bearophile,

 Here are some more notes, that I think are less important/significant
 than the first ones. Probably some of the following notes are
 silly/useless. There are various things I don't know about D yet.
 
 1) The Ddoc has some bugs. For example I'd like to write in the docs
 that a function takes an input in [0,1), but that produces a wrong
 html output.
 

It has a few more than that <G>

 ----------------------
 
 9) Bit fields of structs: D is a practical language, so I think it can
 support them too, to allow people to translate C => D code more simply
 (D has already chosen similar practical compromises regarding C for
 other things).
 

D used to have this after a sorts. It was removed. I forget why.

 ----------------------
 
 15) On the newsgroup I have seen a list of "wishes", among them I
 like:
 - Iterators and Generators (yield is a good starting point)

under the covers, yeild is a disaster IMO

 21) In this newsgroup I've seen that people like a lot the ability of
 doing:
 
 txt.split()
 
 But I don't like it much because it's a syntactical sugar that masks
 the fact that those split are functions

How? Just about anything after a dot is a funcion (excluding member access 
and with regars to types) even method cals are function.

 (that pollute the namespace)

If that were removed then nothing could be used with that syntax on a char[], 
so what pollution is their?

 and they aren't true methods (object functions).

In what way?

 That namespace
 pollution may cause problems, if you have already defined elsewhere a
 function with the same name and similar signature.

Could you give an example?

Aug 07 2007

Jari-Matti =?ISO-8859-1?Q?M=E4kel=E4?= <jmjmak utu.fi.invalid> writes:

bearophile wrote:

 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}
 

 x => x*x
 
 That syntax isn't bad, but I don't know how the compiler automatically
 manages the types in such situations, it can be a bit complex.

It does a bit more type inference than D at the moment. Currently you either
get return type or parameter type inference, but not both:

  (int x){return x*x;}

  int mul(T)(T x) { return x*x; }


 17) I am following this newsgroup for some weeks, I have seen now and then
 it pops out some DMD problems regarding type inferencing (I too have found
 a 'bug' related to that, I have shown it on the learn newsgroup). I am
 helping the development of "ShedSkin" it's a compiler for an implicitly
 static subset of Python to C++. This program contains a really powerful
 type inferencer able to find fully by itself the types of all the
 variables used inside a 3000-lines long Python program (that contains no
 explicit type annotations). ShedSkin is rather slow and I think D will
 never need all its type inferencing capabilities, but I think its main
 point is to show that you can produce very fast code even if you start
 with code that's written with a very easy hi-level syntax. This
 'discovery' may be useful to D too in the future.

At least the compile times are quite fast currently so there's much
potential for this.

 19) I have created many very general function templates like map, imap,
 map, zip, sum, max, min, filter, ifilter, all, any, etc, that accept
 arrays, AAs and iterable classes. It may be useful to add a bit of
 functional-style programming to D, inside a Phobos module. Such functions
 may be fit for the 90% of the code where max running speed isn't
 necessary. They shorten the code, reduce bugs, speed up programming, make
 the program simpler to read, etc, and being hi-level they may even end
 helping the compiler produce efficient code (as foreach sometimes does).

It's hard to predict how much generic D code will still change in the near
future, but a functional library sounds useful. I've also written both
compile and runtime versions of generic map, fold, min, max, ... functions.
Probably some dsource project already implements them too, but it's faster
to write a simple algorithm from scratch than search the whole dsource and
worry about possible IP issues. But in overall I assume a lot of developer
time is wasted on reinventing the wheel :-/

 21) In this newsgroup I've seen that people like a lot the ability of
 doing: txt.split()
 But I don't like it much because it's a syntactical sugar that masks the
 fact that those split are functions (that pollute the namespace) and they
 aren't true methods (object functions). That namespace pollution may cause
 problems, if you have already defined elsewhere a function with the same
 name and similar signature.

There was discussion about this some time ago. For example Scala has an
interesting language construct we could use in D.

 
 
 I think that's all, my successive post will probably be much smaller and
 mostly comments to specific posts :-)
 
 Bear hugs,
 bearophile

--

Aug 07 2007

Jari-Matti =?ISO-8859-1?Q?M=E4kel=E4?= <jmjmak utu.fi.invalid> writes:

Jari-Matti M�kel� wrote:

 bearophile wrote:
 
 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}
 

 x => x*x
 
 That syntax isn't bad, but I don't know how the compiler automatically
 manages the types in such situations, it can be a bit complex.

 
 It does a bit more type inference than D at the moment. Currently you
 either get return type or parameter type inference, but not both:
 
   (int x){return x*x;}
 
   int mul(T)(T x) { return x*x; }

I lost my train of thought a bit here. I'm not sure if this works, but

  (T)(T x) { return x; }

could allow type inference in both ways. Still on the parameter side the
polymorphism is explicit and you can't "chain" the literals because there's
no implicit parametric type in D afaik:

  auto myfun = (x) { return "foo"; }

  auto myval = myfun(myfun(42));


A bad example, but this kind of code would also benefit a lot

  template foo(T : int) { alias char[] foo; }
  template foo(T : char[]) { alias int foo; }

  template zzz(T : int) { const zzz = "hello"; }
  template zzz(T : char[]) { const zzz = 0; }

  U bar(T, U = foo!(T))(T t) {
    return zzz!(T);
  }

  /* with proper type inference
  def bar(t) {
    static if (is(t : int)
      return "hello";
    else
      return 0;
  }
  */

  void main() {

    auto a = bar(42);
    auto b = bar("hi");

    pragma(msg, typeof(a).stringof); // char[]
    pragma(msg, typeof(b).stringof); // int
  }

I've even seen wicked code that in the end looked pretty much like this

  U bar(T, U = inferType!(parseAST!(scanSource!(traits!(getSourceOfThisFun
())))) )(T t) {
    ...
  }

What can I say - somebody must have had fun :P

Aug 08 2007

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 13) The Haskell language community is quite organized, they have a large page
on the main Haskell site that compares the best ways to create the entries for
the Shootout site. They have even improved the language because one of such
tests has shown a bad (slow) side of Haskell. I have recently posted few D
entries into the Shootout site, with mixed results (one entry is at the top).
 Regarding the shootout code the following code may be used by DMD developers
to improve the compiler, it shows a bad side of DMD compared to GCC:
 
 C code:
 
 #include <stdio.h>
 #include <stdlib.h>
 unsigned long fib(unsigned long n) {
     return( (n < 2) ? 1 : (fib(n-2) + fib(n-1)) );
 }
 int main(int argc, char *argv[]) {
     int N = ((argc == 2) ? atoi(argv[1]) : 1);
     printf("%ld\n", fib(N));
     return(0);
 }
 
 
 The D code:
 
 import std.stdio, std.conv;
 ulong fib(ulong n) {
     return (n < 2) ? 1 : (fib(n-2) + fib(n-1));
 }
 void main(string[] argv) {
     ulong n = ((argv.length == 2) ? toUlong(argv[1]) : 1);
     writefln(fib(n));
 }
 
 
 Used:
 gcc version 3.4.2 (mingw-special)
 DMD Digital Mars D Compiler v1.020 (Windows)
 
 Compiled with:
 gcc -O2 fiboC.c -o fiboC.exe
 dmd -O -release -inline fiboD.d
 
 Timings with n = 38:
 C: 4.38 s
 D: 5.28 s
 
 I think such diffece is big enough to justify an improvement.

D's ulong is 64 bits, while C's unsigned long is 32 bits. This is likely 
the source of the speed difference.

Aug 07 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

bearophile wrote:
 2d) Regarding the printing I think it's not nice that:
 writefln(["1", "2"])
 prints the same thing as:
 writefln([1, 2])
 So maybe writefln can use "" when printing strings inside arrays/AAs, to
differentiate them.

In that case, writefln(1) should not print the same thing as 
writefln("1") either. But I agree that it would be useful to have a 
function that prints strings with "". Preferably, it should be able to 
convert any expression of basic types into a string representation, that 
when parsed by the compiler as a literal yields an identical value.

 2e) I also think writefln when in printing arrays/AAs can put spaces between
the commas.
 I think [el1, el2, el3] and [1:"a", 2:"b"] are more readable than
[el1,el2,el3] and [1:a,2:b].

Ideally, a formatter should allow you to define any format you want.

 2f) I think this is a writefln inconsistency:
 import std.stdio;
 void main() {
   char[3] a = "abc";
   writefln([a, a]); // Prints: [abc,abc]
   writefln([a: 1]); // Prints: [[a,b,c]:1]
 }

This is not a writefln inconsistency as much as it is a DMD inconsistency.

note that typeof([a, a]) == char[][2],
while     typeof([a: 1]) == int[char[3]],

This feels like a bug, but somehow, I have a feeling it is intentional 
to allow string array literals such as:

["a","ab","abc"]

without giving a stupid compiler error about type incompatibility.

IMHO, it would be much better if array literals instead picked the best 
common type, allowing expressions such as: [1, 1.5, 2, 2.5]


 3) Given an array x1, like a char[], it seems to me that x1.dup is a bit
slower compared to:
 char[] x2 = new char[x1.length];
 x2[] = x1;
 I hope this can be fixed (or maybe someone can tell me why).

.dup should definitely not be any slower (unless you have a GC compiled 
with DbC that does a memcmp after each .dup). Do you have a benchmark 
showing this?

 6) I suggest to add a third optional to the AAs, the numerical progressive
index, it seems easy to add, the foreach already supports more than two
variables:
 foreach(index, key, value; someAA) {...}
 
 But I don't like much the fact that the foreach with one parameter scans the
values:
 foreach(value; someAA) {...}
 because most times you have to iterate on the keys (in Python too the default
iteration on a dict is on its keys). With the key you can find the value, while
I believe with the value you can't find the key.

I agree with Sean here. The current behavior is the most consistent one.

 
 I'd like a way to find the key if you have a pointer to the actual value
contained inside the AA. (I have seen you can cast the pointer to the value to
a struct and you can start to walk back with that, but I have not succeed yet
to find the key in a reliable way, you may suggest it to me. Maybe a function
to do this can be added to Phobos.) I can use it to create a "ordered
associative array" data structure (using the built-in AAs) that keeps a linked
list of the AA pairs, according to the original insertion order inside the AA.
To do that linked list of values you may need to find the key of a given value
pointer (or a given key-value pair).

Something like this should work with current DMD and derivations that 
keep the memory layout of AA's:

template getKeyFromPtr(Key) {
   Key getKeyFromPtr(Val)(Val *ptr) {
     auto offset = (Key.sizeof + size_t.sizeof - 1) & ~(size_t.sizeof - 1);
     return *cast(Key *)(cast(void*)ptr - offset);
   }
}

(Not really tested though.)


 9) Bit fields of structs: D is a practical language, so I think it can support
them too, to allow people to translate C => D code more simply (D has already
chosen similar practical compromises regarding C for other things).

Using string mixins and some CTFE it is not that hard implementing a bit 
field replacement.

 10) I think the compiler has to accept a line like this too, converting the
strings as necessary:
 char[][int] a = [1:"abba", 2:"hello"];

I agree. My solution would be to make string literals dynamic arrays 
instead of static. It doesn't make much sense to put the length into the 
type. "a" and "ab" should be of the same type.

 11) I have seen that D AAs are quite efficient, but probably there are very
good hash implementations around. So "superfasthash" or a modified version of
the super refined and optimized Python/Perl hashes can be useful to replace the
currently used by DMD. (Python source code is fully open source but code
licenses may be a problem anyway, I don't know).

I usually make my own containers when the built in AA doesn't cut it, 
such as:

Map!(K,V), OrderedMap!(K,V), DiskStoredMap!(K,V), and so on.

The only big problem is D's lack of a reference return type.

 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}
 

 x => x*x
 
 That syntax isn't bad, but I don't know how the compiler automatically manages
the types in such situations, it can be a bit complex.

It is not bad, but it can never evaluate to a single function. It would 
have to evaluate to a template alias or similar. But I too wish D had a 
more convenient syntax for single expression delegate literals, such as:

(int x) => x*x

or something.

 Beside them I like list generators from Python, but their syntax isn't much
compatibile with D syntax, so I don't know how they can be written.

They could probably be written something like:

int x; // dummy
List(x*x, x, range(10))

or

List((int x) { return x*x; }, range(10));

or (as I have implemented them):

map(range(10), (int x) { return x*x; });

range(10).map((int x) { return x*x; });

or equivalently as a lazy expression:

a = range(1_000_000_000_000_000L).mapLazy((int x) { return x*x; });
writefln(a[555_555_555_555]);

 18) I don't like still that string literals are immutable while the others
not. I suggest to make the string literals mutable as the other ones (automatic
dup during program creation). This will avoid some of my bugs (and it can
probably avoid problems in porting code Win <=>Linux that uses strings) and
help me avoid some tricks to de-constant strings when they are constant. In few
situations you may need to create a program that avoids heap activity, so in
those situations you may need the normal string literals anyway, in such
situations you may add a certain letter after the string literal (like c?) that
tells the compile to not perform the automatic dup.

Most strings will never be changed, so this will make lots of 
unnecessary dups. D2.0's strings are constant (pronounced constant but 
spelled invariant) is a better solution. Maybe an automatic dup from 
constant strings into char[] would be a good thing.

 19) I have created many very general function templates like map, imap, map,
zip, sum, max, min, filter, ifilter, all, any, etc, that accept arrays, AAs and
iterable classes. It may be useful to add a bit of functional-style programming
to D, inside a Phobos module. Such functions may be fit for the 90% of the code
where max running speed isn't necessary. They shorten the code, reduce bugs,
speed up programming, make the program simpler to read, etc, and being hi-level
they may even end helping the compiler produce efficient code (as foreach
sometimes does).

I have implemented heaps of such functions too and I believe most people 
have implemented at least a few of those.

I made a proposal to standardize a number of such functions a long time 
ago. It is a shame there is no standard regarding such functions. If not 
only because it would make it so much easier to publish short snippets 
of code without having to constantly define all the basic functions.

Luckily, Tango has some really well implemented and solid functions in 
tango.core.Array. A collection that can only grow.

BTW, I really like your naming convention for in place versions (as I 
assume they are). I might steal that right away :). Or does i mean those 
are iterator functions?

 20) After few test of mine I think the default way dmd computes struct hashes
has to be improved (or removed), to make it compute the hash looking inside the
dynamic arrays contained into the struct too. Otherwise it silently gives the
"wrong" results.

This is just another example that treating structs as plain old data may 
be a too simplistic view. Another example is comparing two structs for 
equality. One could argue that two structs should be equal if all 
members are equal. That is not how it works today.

 21) In this newsgroup I've seen that people like a lot the ability of doing:
 txt.split()
 But I don't like it much because it's a syntactical sugar that masks the fact
that those split are functions (that pollute the namespace) and they aren't
true methods (object functions). That namespace pollution may cause problems,
if you have already defined elsewhere a function with the same name and similar
signature.

I disagree. First, the name space "pollution" is no problem in D. You 
can never accidentally call an unwanted overload of a function from 
another module. D will issue an ambiguity error when the same identifier 
matches instances in several different modules.

Secondly, I have really grown to like the way of chaining functions on 
arrays. The function call order and the data flow is read left to right. 
An example from my own code:

references ~= refs
	.splitIter()
	.filterIter((char[] c) { return c.length > 2; })
	.map((char[] c) { return a[1..$-1]; });

Writing this as:

references ~= map(filterIterate(refs.splitIterate(),
                                 (char[] c) { return c.length > 2; }),
                   (char[] c) { return a[1..$-1]; });

Is both much harder to read and parse (to me atleast) and also hard to 
indent in any meaningful way.

-- 
Oskar

Aug 07 2007

Sean Kelly <sean f4.ca> writes:

Oskar Linde wrote:
 bearophile wrote:
 
 21) In this newsgroup I've seen that people like a lot the ability of 
 doing:
 txt.split()
 But I don't like it much because it's a syntactical sugar that masks 
 the fact that those split are functions (that pollute the namespace) 
 and they aren't true methods (object functions). That namespace 
 pollution may cause problems, if you have already defined elsewhere a 
 function with the same name and similar signature.

 
 I disagree. First, the name space "pollution" is no problem in D. You 
 can never accidentally call an unwanted overload of a function from 
 another module. D will issue an ambiguity error when the same identifier 
 matches instances in several different modules.
 
 Secondly, I have really grown to like the way of chaining functions on 
 arrays. The function call order and the data flow is read left to right. 

Another advantage is that the property syntax can simplify template code 
in some cases because it allows arrays to be treated as if they were 
UDTs so long as the proper support routine is available.  To me, this 
goes a long way towards supporting arrays as viable containers.


Sean

Aug 07 2007

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Oskar Linde wrote:
 Secondly, I have really grown to like the way of chaining functions on 
 arrays. The function call order and the data flow is read left to right. 
 An example from my own code:
 
 references ~= refs
     .splitIter()
     .filterIter((char[] c) { return c.length > 2; })
     .map((char[] c) { return a[1..$-1]; });
 
 Writing this as:
 
 references ~= map(filterIterate(refs.splitIterate(),
                                 (char[] c) { return c.length > 2; }),
                   (char[] c) { return a[1..$-1]; });
 
 Is both much harder to read and parse (to me atleast) and also hard to 
 indent in any meaningful way.
 

Agreed.  Incidentally, consider this little thing I did in Ruby a few 
weeks back as an answer to a programming challenge (hey, technically 
just two lines):












We can actually come quite close to this in D.

-- Chris Nicholson-Sauls

Aug 07 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Oskar Linde wrote:
 bearophile wrote:
 2d) Regarding the printing I think it's not nice that:
 writefln(["1", "2"])
 prints the same thing as:
 writefln([1, 2])
 So maybe writefln can use "" when printing strings inside arrays/AAs, 
 to differentiate them.

 
 In that case, writefln(1) should not print the same thing as 
 writefln("1") either. But I agree that it would be useful to have a 
 function that prints strings with "". Preferably, it should be able to 
 convert any expression of basic types into a string representation, that 
 when parsed by the compiler as a literal yields an identical value.
 

This is the heart of the repr/str distinction from Python that was 
mentioned.  In Python instead of just one toString method, you have 
__str__ and __repr__.

__str__ is supposed to return a version that looks nice for "human reading"

__repr__ is supposed to return a string that (if possible) would 
recreate the object if passed to the interpreter.

So the str of "1" prints without quote characters, but repr of "1" 
prints with quotes.

--bb

Aug 07 2007

bearophile <bearophile <bearophileHUGS lycos.com> > writes:

I have changed the way I answer, hopefully this is better, now I group answers
to a person.

Sean Kelly:

It's hardly efficient, but:<

I was looking for something O(1).


Though I suppose a property method could be created:<

Nice, I'll try that. But maybe a syntax like this one is simpler:
myArray[1 .. 3].copyTo(2, 4);


Not hard to do.<

Well copying large portions of hashing Python/Perl C sourcecode isn't that
easy, it has to be adapted.


A library hashtable is probably the best option for more specialized needs.<

I know, but I was talking about improving the D built in AA copying the already
refined implementations that come from 15+ years of community efforts to
produce the best.


What about .stringof?<

Thank you.

--------------------------

BCS <ao at pathlink dot com>:

under the covers, yeild is a disaster IMO<

Maybe this is true for a low level language, I don't know. But in Python it's
very useful and quite commonly used (but I don't like its V.2.5 use for
coroutines, it's too much confusing and its syntax is dirty because it looks

cause no disasters :-) It avoids you to mess with the opApply.

--------------------------

Oskar Linde:

But I agree that it would be useful to have a function that prints strings with
"".<

I have already written it, it's called put/putr, and it solves all the problems
I was talking of :-)


Preferably, it should be able to convert any expression of basic types into a
string representation, that when parsed by the compiler as a literal yields an
identical value.<

This may require some extra testing to see if it's true.


Ideally, a formatter should allow you to define any format you want.<

You can define all the formats you want, but sensible defaults are necessary
anyway.
I was talking about the standard tools in standard configuration, that you use
when you debug, when you write things quickly, etc.


This is not a writefln inconsistency as much as it is a DMD inconsistency.<

I see, and I think it deserves to be fixed :-)


.dup should definitely not be any slower (unless you have a GC compiled with
DbC that does a memcmp after each .dup). Do you have a benchmark showing this?<

I have just written a benchmark, and now .dup is quite faster than the other
(about 50% faster, and it's a repeatable result), so I shut up.


Something like this should work with current DMD and derivations that keep the
memory layout of AA's:<

Thank you, I'll test it.


Using string mixins and some CTFE it is not that hard implementing a bit field
replacement.<

Okay. Maybe this can be standardized inside a Phobos lib (unrelated: bitvectors
of Phobos are nice, but they are slow compared to the old 'bit' type).


My solution would be to make string literals dynamic arrays instead of static.<

I'd like that. I don't know what's the point of having them a different kind of
strings (but I like to have a way to use static array of chars anyway, they may
be useful now and then).


I usually make my own containers when the built in AA doesn't cut it, such as:<

Designing and tuning a very good associative array/set requires months or
years, so I was talking about the built-in one.


I have implemented heaps of such functions too and I believe most people have
implemented at least a few of those.<

I am sure lot of people have done similar things, because when used (and not
over-used) they can help (expecially if you are a bit used to functional
programming). It's just I think I have implemented lot of them, and mine are
probably more flexible (they usually accept arrays, AAs and iterable objects,
more than one when possible, mixed types when possible, etc. There are some
limitations that I have tries to push, some can't be removed) :-)


Luckily, Tango has some really well implemented and solid functions in
tango.core.Array. A collection that can only grow.<

I have seen them, and I think they are nice, but I was talking about something
for functional-style coding.


BTW, I really like your naming convention for in place versions (as I assume
they are). I might steal that right away :). Or does i mean those are iterator
functions?<

You are free to steal. I can probably offer my functional module to Tango
developers too. My "isomething" has the same meaning as in the itertools module
of Python, they are related to lazy iterable objects.


Another example is comparing two structs for equality.<

I see :-)


First, the name space "pollution" is no problem in D. You can never
accidentally call an unwanted overload of a function from another module. D
will issue an ambiguity error when the same identifier matches instances in
several different modules.<

I see.


Is both much harder to read and parse (to me atleast) and also hard to indent
in any meaningful way.<

I think both versions aren't much readable (but the first is better), I suggest
you to split your lines in more parts, so you can give those parts a name and
you can comment the lines too.

Thank you for the many answers, and the two snippets of code I may find useful.

Bear hugs,
bearophile

Aug 09 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

bearophile <bearophile wrote:
 I have changed the way I answer, hopefully this is better, now I group answers
to a person.

This is much better, but I would still recommend you to reply to each 
post(and person) in a separate message. Not doing that breaks the 
message threading (the References header in your post will not refer to 
the correct messages) making it much harder for others to follow the 
discussion.

The web-news reader has lots of issues in itself, so I would really 
recommend you to use an NNTP client if possible. Thunderbird seems to be 
the most popular one.

 Sean Kelly:
 Though I suppose a property method could be created:<

 
 Nice, I'll try that. But maybe a syntax like this one is simpler:
 myArray[1 .. 3].copyTo(2, 4);

This would also have the advantage of actually being implementable with 
current DMD.

 BCS <ao at pathlink dot com>:
 
 under the covers, yeild is a disaster IMO<

 
 Maybe this is true for a low level language, I don't know. But in Python it's
very useful and quite commonly used (but I don't like its V.2.5 use for
coroutines, it's too much confusing and its syntax is dirty because it looks

cause no disasters :-) It avoids you to mess with the opApply.

yield would not be any worse than current opApply. In fact, it would be 
identical on the low level side. To improve the low level side of this, 
the compiler would have to be better at inlining and const-folding 
delegate calls.

 I usually make my own containers when the built in AA doesn't cut it, such as:<

 
 Designing and tuning a very good associative array/set requires months or
years, so I was talking about the built-in one.

Many times people have been posting improvements to the built in AA. The 
problem is proving that the improvements really are better for the 
general case. The hardest part has proven to be defining the "general case".

 I have implemented heaps of such functions too and I believe most people have
implemented at least a few of those.<

 
 I am sure lot of people have done similar things, because when used (and not
over-used) they can help (expecially if you are a bit used to functional
programming). It's just I think I have implemented lot of them, and mine are
probably more flexible (they usually accept arrays, AAs and iterable objects,
more than one when possible, mixed types when possible, etc. There are some
limitations that I have tries to push, some can't be removed) :-)

I have also made quite general functions and implemented iterator 
versions for many of them too. I have really not had much time lately 
(as in the last 10 months), but would still be interested to see your 
code. There are quite a few new features in D since I worked on this 
last that should make some things much easier now.

 Is both much harder to read and parse (to me atleast) and also hard to indent
in any meaningful way.<

 
 I think both versions aren't much readable (but the first is better), I
suggest you to split your lines in more parts, so you can give those parts a
name and you can comment the lines too.

Thank you for your comments on my personal coding style ;p.

Cheers,

-- 
Oskar

Aug 09 2007

Sean Kelly <sean f4.ca> writes:

bearophile <bearophile wrote:
 
 --------------------------
 
 BCS <ao at pathlink dot com>:
 
 under the covers, yeild is a disaster IMO<

 
 Maybe this is true for a low level language, I don't know. But in Python it's
very useful and quite commonly used (but I don't like its V.2.5 use for
coroutines, it's too much confusing and its syntax is dirty because it looks

cause no disasters :-) It avoids you to mess with the opApply.

For what it's worth, Tango contains a Fiber implementation that can be 
used for coroutines and such.


Sean

Aug 09 2007

BCS <ao pathlink.com> writes:

Reply to bearophile,

 --------------------------
 
 BCS <ao at pathlink dot com>:
 
 under the covers, yeild is a disaster IMO<
 

 Maybe this is true for a low level language, I don't know. But in
 Python it's very useful and quite commonly used (but I don't like its
 V.2.5 use for coroutines, it's too much confusing and its syntax is
 dirty because it looks like a statement used as an expression). It's



considering converting to D. The use of yield has turned out to be a
significant 
performance problem. This is the reson I don't like it.

 It avoids you to
 mess with the opApply.
 

D's opApply syntax IMO seems much cleaner and easier to understand. Also 
it can work without any heap allocations.


OTOH you can fake it quiet well with an delegate that stores most of it's 
state in an object.

|class Walk(T)
|{
|  T[] dat;
|  int at;
|
|  this(T[] d){dat = d; at = 0;}
|
|  int opApply(int delegate(inout int, inout T) dg)
|  {
|    while(at < dat.length)
|    {
|      int j = at;
|      if(int ret = dg(j, dat[at])) return ret; //yeild somtimes
|      at++;
|    }
|  }
|}
|
|auto walker = new Walk!(char)("hello world");
|foreach(inout int i, inout char c; walker) if(c == ' ') break;  //skip hello
|foreach(inout int i, inout char c; walker) writef("%s", c);     //prints 
world

 --------------------------

Aug 09 2007

bearophile <bearophileHUGS lycos.com> writes:

Oskar Linde:

Many times people have been posting improvements to the built in AA. The
problem is proving that the improvements really are better for the general
case. The hardest part has proven to be defining the "general case".<

I understand. We may collect a large repository of code/snippets that use AAs,
so they can be tuned against that repository.

-----------------

BCS:


considering converting to D. The use of yield has turned out to be a
significant performance problem. This is the reson I don't like it.<

I see. Let's assume you are right, that yield is too much slow for a low level
language (as D. D is a low leven language that is able to be used as
medium-level language too).
I think yield is good if you use Python, that's a high level language.
I also think a language like D can have both some hi-level constructs and some
lower level ones. Often many parts of a program don't need max speed (a
profiler usually shows just few hot spots in a program), so there you may use
yield. Where speed is more important the language can offer you means to go
faster. This looks like a win-win situation to me (but the programmer must know
what language constructs are slow, so they don't have to be used in speed
critical parts of the code). After developing ShedSkin I think languages fit
for both low level and high level programming can be designed (but a programmer
has to know them well to avoid using the slow features in the speed/memory
critical parts of the code).


D's opApply syntax IMO seems much cleaner and easier to understand.<

I can't agree (but I know Python much more than D, so I am biased), yield of
Python is very simple :-) Here are some examples (to reduce space I have
reduced the indent from 4 to 2 spaces, I have removed most comments and all the
docstrings):

Space-efficient and lazy version of sieve of Eratosthene, by Eppstein:

def xsieve():




  roots = xsieve()
  root = roots.next()
  square = root*root





  D = {}
  n = 3
  while True:

      D[square] = root+root
      root = roots.next()
      square = root*root


      yield n

      p = D[n]
      q = n+p
      while q in D:
        q += p
      del D[n]
      D[q] = p



Generator that returns the integer codons of the given (string) sequence:

def xcodons(seq, overlap=0):
  assert 0 <= overlap <= 2
  len_seq = len(seq)
  pos = 0
  while len_seq - pos >= 3:
    yield seq[pos : pos+3]
    pos += 3 - overlap


Divides a given 2D mat in equal blocks, of given size, and yields them:

def xblockfy(mat, blocknrows, blockncols):
  if blocknrows <= 0: raise ValueError("blocknrows must be >= 1")
  if blockncols <= 0: raise ValueError("blockncols must be >= 1")
  if mat and mat[0]:
    nx = len(mat[0])
    ny = len(mat)
    for y in xrange(0, ny, blocknrows):
      ymax = min(y+blocknrows, ny)
      for x in xrange(0, nx, blockncols):
        block = []
        for yy in xrange(y, ymax):
          block.extend(mat[yy][x: x+blockncols])
        yield block


Arithmetic progression generator, that accepts noninteger increments too:

def xfrange(start, end=None, inc=None):
  if inc is None:
    inc = 1.0
  if end is None:
    end = start + 0.0
    start = 0.0
  else:
    start = float(start)
  count = int( ceil((end-start)/inc) )
  for i in xrange(count):
    yield start + i * inc


Lazy iterable split, with key function too (I have created a similar thing for
D):

def xsplit(seq, key=bool, keepkeys=True):
  group = []
  for el in seq:
    if key(el):
      if group:
        yield group
      group = []
      if keepkeys:
        group.append(el)
    else:
      group.append(el)
  yield group


Yields all the distinct pairs of the given seq:

def xpairs(seq):
  lung = len(seq)
  for i,e1 in enumerate(seq):
    for j in xrange(i+1, lung):
      yield e1, seq[j]


Yields all the positions of item in a given iterable:

def xpositions(seq, item):
  for pos, el in enumerate(seq):
    if el == item:
      yield pos


I know you can do those things with D too, this is an example of a translation
from Python that uses yield to D, a lazy generator of all the permutations of a
sequence, algorithm from Phillip Paul Fuchs, modified:

def xpermutations(items, copy=True):
  if copy:
    items = list(items)
    n = len(items)
    p = range(n+1)
    i = 1
    yield items[:]
    while i < n:
      p[i] -= 1
      j = (p[i] if i & 1 else 0)
      items[j], items[i] = items[i], items[j]
      yield items[:]
      i = 1
      while p[i] == 0:
        p[i] = i
        i += 1
  else:
    items = items[:]
    n = len(items)
    p = range(n+1)
    i = 1
    yield items
    while i < n:
      p[i] -= 1
      j = (p[i] if i & 1 else 0)
      items[j], items[i] = items[i], items[j]
      yield items
      i = 1
      while p[i] == 0:
        p[i] = i
        i += 1


Xpermutations!(TyItem) xpermutations(TyItem)(TyItem[] items, bool copy=true) {
  return new Xpermutations!(TyItem)(items, copy);
}

class Xpermutations(TyItem) {
  int[] range(int stop) {
    int[] result;

    if (stop > 0) {
      result.length = stop;
      for(int i = 0; i < stop; i++)
        result[i] = i;
    }
    return result;
  }

  TyItem[] items;
  bool copy;

  this(TyItem[] items, bool copy=true) {
    this.items = items.dup;
    this.copy = copy;
  }

  int opApply(int delegate(ref TyItem[]) dg) {
    int result;

    int n = items.length;
    TyItem aux;
    auto p = range(n + 1);
    int i = 1;

    if (copy) { // yield duplicated arrays (safer)
      auto items2 = items.dup;
      result = dg(items2);  if (result) goto END; // yield items2
      while (i < n) {
        p[i] -= 1;
        int j = (i & 1) ? p[i] : 0;
        aux = items[j]; items[j] = items[i]; items[i] = aux; // swap
        items2 = items.dup;
        result = dg(items2);  if (result) goto END; // yield items2
        i = 1;
        while (p[i] == 0) {
          p[i] = i;
          i++;
        }
      }
    } else { // yield just references (faster)
      result = dg(items);  if (result) goto END; // yield items
      while (i < n) {
        p[i] -= 1;
        int j = (i & 1) ? p[i] : 0;
        aux = items[j]; items[j] = items[i]; items[i] = aux; // swap
        result = dg(items);  if (result) goto END; // yield items
        i = 1;
        while (p[i] == 0) {
          p[i] = i;
          i++;
        }
      }
    }

    END:
    return result;
  }
}


(If you spot problems in that D code you are encouraged to say it, I am a
newbie of D still).
They do similar things, but I can't see how you can say that opApply of D
"seems much cleaner and easier to understand" than Python yield :-)


OTOH you can fake it quiet well with an delegate that stores most of it's state
in an object.<

Something quite similar to a working version of your code is the builtin Python
function iter(), that given an iterable (like an array, an iterable object,
etc) return a lazy iterable object that allows to scan its elements in a
flexible way (I have created some similar functions with D). But iter() doesn't
replace yield.

Bye and thank you for the answers,
bearophile

Aug 11 2007

BCS <ao pathlink.com> writes:

Reply to bearophile,

 Oskar Linde:
 
 Many times people have been posting improvements to the built in AA.
 The problem is proving that the improvements really are better for
 the general case. The hardest part has proven to be defining the
 "general case".<
 

 I understand. We may collect a large repository of code/snippets that
 use AAs, so they can be tuned against that repository.
 
 -----------------
 
 BCS:
 
 D's opApply syntax IMO seems much cleaner and easier to understand.<
 

 I can't agree (but I know Python much more than D, so I am biased),
 yield of Python is very simple :-) Here are some examples 

[dropped python code]

 I know you can do those things with D too, this is an example of a
 translation from Python that uses yield to D, a lazy generator of all
 the permutations of a sequence, algorithm from Phillip Paul Fuchs,
 modified:
 

[dropped D code]

 (If you spot problems in that D code you are encouraged to say it, I
 am a newbie of D still).

I don't see anything that is a problem (I don't follow the details though). 
I'll put a copy at the end that has some edits, mostly for code space and 
editablility.

 
 They do similar things, but I can't see how you can say that opApply
 of D "seems much cleaner and easier to understand" than Python yield
 :-)

Maybe I under-spoke, the opApply is much more transparent. the system is 
doing, and hiding, a lot less complexity. IMO a system hiding to much
complexity 
is not a good thing. Also, opApply is working within the existing constraints 
of the language.
 
 OTOH you can fake it quiet well with an delegate that stores most of
 it's state in an object.<
 

 Something quite similar to a working version of your code is the
 builtin Python function iter(), that given an iterable (like an array,
 an iterable object, etc) return a lazy iterable object that allows to
 scan its elements in a flexible way (I have created some similar
 functions with D). But iter() doesn't replace yield.
 
 Bye and thank you for the answers,
 bearophile




|Xpermutations!(TyItem) xpermutations(TyItem)(TyItem[] items, bool copy=true)
|{
|  if(copy)
|    return new Xpermutations!(TyItem, true)(items);
|  else
|    return new Xpermutations!(TyItem, false)(items);
|}
|
|class Xpermutations(TyItem, bool copy)
|{
|  int[] range(int stop)
|  {
|    int[] result;
|    if (stop > 0)
|    {
|      result.length = stop;
|      for (int i = 0; i < stop; i++)
|        result[i] = i;
|    }
|    return result;
|  }
|
|  TyItem[] items;
|  bool copy;
|
|  this(TyItem[] items, bool copy=true)
|  {
|    this.items = items.dup;
|    this.copy = copy;
|  }
|
|  int opApply(int delegate(ref TyItem[]) dg)
|  {
|    TyItem[] items2 = items;
|
|    static if(copy) items2 = items.dup;
|    if (int result = dg2(items2)) return result; // yield items2
|
|    int n = items.length;
|    int i = 1;
|    auto p = range(n + 1);
|    TyItem aux;
|
|    while (i < n)
|    {
|      p[i] -= 1;
|      int j = (i & 1) ? p[i] : 0;
|      aux = items[j];
|      items[j] = items[i];
|      items[i] = aux; // swap
|      
|      static if(copy) items2 = items.dup;
|      if (int result = dg2(items2)) return result; // yield items2
|      
|      i = 1;
|      
|      while (p[i] == 0)
|      {
|        p[i] = i;
|        i++;
|      }
|    }
|  return 0;
|  }
|}

Aug 11 2007

bearophile <bearophileHUGS lycos.com> writes:

Sorry for my answering delay.

BCS:

I'll put a copy at the end that has some edits, mostly for code space and
editablility.<

Thank you. I'm learning D, so I can comment your code; this is the first line:
Xpermutations!(TyItem) xpermutations(TyItem)(TyItem[] items, bool copy=true) 

I think it has a problem, this templated class (that't the resulting object of
the helper function) doesn't receive the boolean value:
Xpermutations!(TyItem)

And this line:
if (int result = dg2(items2)) return result;
May raise an error like this:
modulename.d(linenumber): Error: '=' does not give a boolean result


Maybe I under-spoke, the opApply is much more transparent. the system is doing,
and hiding, a lot less complexity. IMO a system hiding to much complexity is
not a good thing.<

I understand now. Because of their nature and usage, in dynamic languages like
Python/Ruby it's normal to hide part of the complexity from the user, but you
are right asserting that in lower level languages like C++/D it's probably
better to show most complexities to the programmer, to avoid problems, reduce
"mysteries", to improve the capability to optimize the code, etc.
(Maybe a language like D can be designed to have some hi-level constructs to be
used in the spots where you don't care of speed; beside other lower lever ones
to be used where you need more speed and you need to exactly know what's
happening under the cover. Other people may argue you can't fit both things in
the same language).

Thank you for your code version and your answers.

----------------------

In the meantime I have found some problems translating lazy recursive
generators from Py=>D, like this one, that yields the combinations of the given
items, taking k distinct elements (this kind of code may look slow). You may
help me:

def xcombinations(items, k):
  if k:
    for i in xrange(len(items) - k + 1):
      for cc in xcombinations(items[i+1:], k-1):
        yield [items[i]] + cc
  else:
    yield []

Usage example:
xcombinations("abcd", 1) ==> "a" "b" "c" "d"
xcombinations("abcd", 2) ==> "ab" "ac" "ad" "bc" "bd" "cd"
xcombinations("abcd", 3) ==> "abc" "abd" "acd" "bcd"

Note that:

xrange(n) just yields int numbers from 0 to n-1.

+ among arrays equals to ~.

array[a:] equals to:
(a >= cast(long)array.length) ? new typeof(array)[0] : array[max(a,0) .. $].

Bear hugs,
bearophile

Aug 16 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

bearophile wrote:

 In the meantime I have found some problems translating lazy recursive
generators from Py=>D, like this one, that yields the combinations of the given
items, taking k distinct elements (this kind of code may look slow). You may
help me:
 
 def xcombinations(items, k):
   if k:
     for i in xrange(len(items) - k + 1):
       for cc in xcombinations(items[i+1:], k-1):
         yield [items[i]] + cc
   else:
     yield []
 
 Usage example:
 xcombinations("abcd", 1) ==> "a" "b" "c" "d"
 xcombinations("abcd", 2) ==> "ab" "ac" "ad" "bc" "bd" "cd"
 xcombinations("abcd", 3) ==> "abc" "abd" "acd" "bcd"


As far as I can tell, the following should work. Maybe I am missing 
something fundamental, as this is just a straight forward implementation 
of the above. Note that yield(x) is:

{ auto tmp = x; if (auto r = dg(tmp)) return r; }

(Might be a nice macro in the future.)

Code:

struct XRange {
   int n;

   int opApply(int delegate(inout int v) dg) {
     for (int i = 0; i < n; i++)
       if (auto r = dg(i)) return r;
     return 0;
   }
}

XRange xrange(int n) { XRange r; r.n = n; return r; }

struct XCombinations(T) {
   T[] items;
   int k;
	
   int opApply(int delegate(inout T[] arr) dg) {
     if (k > 0)
       foreach (i; xrange(items.length - k + 1))
         foreach(cc; xcombinations(items[i+1..$], k-1)) {
           auto tmp = items[i] ~ cc; // must be an lvalue
           if (auto r = dg(tmp)) return r;
	}
     else {
       T[] tmp = null; // need an lvalue
       return dg(tmp);
     }
     return 0;
   }	
}

XCombinations!(T) xcombinations(T)(T[] items, int k) {
   XCombinations!(T) ret;
   ret.items = items;
   ret.k = k;
   return ret;
}



-- 
Oskar

Aug 16 2007

BCS <ao pathlink.com> writes:

Reply to bearophile,

 Sorry for my answering delay.
 
 BCS:
 
 I'll put a copy at the end that has some edits, mostly for code space
 and editablility.<
 

 Thank you. I'm learning D, so I can comment your code; this is the
 first line: Xpermutations!(TyItem) xpermutations(TyItem)(TyItem[]
 items, bool copy=true)
 
 I think it has a problem, this templated class (that't the resulting
 object of the helper function) doesn't receive the boolean value:
 
 Xpermutations!(TyItem)

that is because it can't, not directly at least. This is because the argument 
is a runtime value and template parameter must be compile time values. However 
because bools can only have two values it is trivial to generate both versions 
and have the run time code select between them.

OTOH your correct about the return type. this could be solved by using an 
abstract base class that uses the (TyItem) and a derived class that uses 
both args.
Another solution would be to return a delegate to the opApply function:

|int delegate(int delegate(ref TyItem[])) xpermutations(TyItem)(TyItem[] 
items, bool copy=true)
|{
| if(copy)
| return &(new Xpermutations!(TyItem, true)(items)).opApply;
| else
| return &(new Xpermutations!(TyItem, false)(items)).opApply;
|}
 
 And this line:
 if (int result = dg2(items2)) return result;
 May raise an error like this:
 modulename.d(linenumber): Error: '=' does not give a boolean result

it doesn't. assignment expressions cause that error, variable delectations 
don't

 
 (Maybe a language like D can be designed to have some hi-level
 constructs to be used in the spots where you don't care of speed;
 beside other lower lever ones to be used where you need more speed and
 you need to exactly know what's happening under the cover. Other
 people may argue you can't fit both things in the same language).
 


high level". ff

Aug 16 2007

D Programming

C/C++ Programming

Other

digitalmars.D - Few things II