www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Few things II

reply bearophile <bearophileHUGS lycos.com> writes:
Here are some more notes, that I think are less important/significant than the
first ones. Probably some of the following notes are silly/useless. There are
various things I don't know about D yet.

1) The Ddoc has some bugs. For example I'd like to write in the docs that a
function takes an input in [0,1), but that produces a wrong html output.

----------------------

2) Regarding the printing functions:

2a) I think that the names writef e writefln are a bit too much long and not
easy to remember. So better names can be found. For example show/shownl,
show/showr, show/showl, put/putr, write/writel, print/printnl, print/printl,
etc. I have seen Tango uses print.


2b) I like the idea of the writefl/write functions that don't perform
formatting. I was printing a string that casually contained an % and that has
produced a bug. So I've created a better printing function, that avoids that
bug too.


2c) writefln may print structs too in a default way, my basic (non-recursive!)
implementation (that represents structs as: <field1, field2, ...> ): 

char[] sstr(TyStruct)(TyStruct s) {
  auto result = new char[][s.tupleof.length];
  foreach (i, field; s.tupleof)
    result[i] = str(field);
  return "<" ~ result.join(", ") ~ ">";
}


2d) Regarding the printing I think it's not nice that:
writefln(["1", "2"])
prints the same thing as:
writefln([1, 2])
So maybe writefln can use "" when printing strings inside arrays/AAs, to
differentiate them.
(To sort out the ideas you may also take a look at the difference between
repr() and str() in Python).



2e) I also think writefln when in printing arrays/AAs can put spaces between
the commas.
I think [el1, el2, el3] and [1:"a", 2:"b"] are more readable than [el1,el2,el3]
and [1:a,2:b].


2f) I think this is a writefln inconsistency:
import std.stdio;
void main() {
  char[3] a = "abc";
  writefln([a, a]); // Prints: [abc,abc]
  writefln([a: 1]); // Prints: [[a,b,c]:1]
}

----------------------

3) Given an array x1, like a char[], it seems to me that x1.dup is a bit slower
compared to:
char[] x2 = new char[x1.length];
x2[] = x1;
I hope this can be fixed (or maybe someone can tell me why).

----------------------

4) The __builtin_expect is a method that gcc (versions >= 2.96) offer for
programmers to indicate branch prediction information to the compiler. The
return value of __builtin_expect is the first argument (which could only be an
integer) passed to it. This may be useful for dmd too, maybe for far future
times, when dmd has solved most of its bugs and the D syntax is more finished.

----------------------

5) I think built-in sets can be quite useful (I use them often in Python, I can
show some examples if you want). Few set operations among the keys of AAs may
be quite useful too (Python doesn't have them). (A first version may be inside
Phobos, implemented using the AAs with empty values.)

----------------------

6) I suggest to add a third optional to the AAs, the numerical progressive
index, it seems easy to add, the foreach already supports more than two
variables:
foreach(index, key, value; someAA) {...}

But I don't like much the fact that the foreach with one parameter scans the
values:
foreach(value; someAA) {...}
because most times you have to iterate on the keys (in Python too the default
iteration on a dict is on its keys). With the key you can find the value, while
I believe with the value you can't find the key.

I'd like a way to find the key if you have a pointer to the actual value
contained inside the AA. (I have seen you can cast the pointer to the value to
a struct and you can start to walk back with that, but I have not succeed yet
to find the key in a reliable way, you may suggest it to me. Maybe a function
to do this can be added to Phobos.) I can use it to create a "ordered
associative array" data structure (using the built-in AAs) that keeps a linked
list of the AA pairs, according to the original insertion order inside the AA.
To do that linked list of values you may need to find the key of a given value
pointer (or a given key-value pair).

----------------------

7) A "len" property instead of "length" may be useful (even if it's a bit less
clear) because it's used so much often it's boring to write such long word over
and over. (And because 'length' isn't an easy to write word for non-English
speaking people).

----------------------

8) From the docs:
Overlapping copies are an error:
s[0..2] = s[1..3];	// error, overlapping copy
s[1..3] = s[0..2];	// error, overlapping copy
Disallowing overlapping makes it possible for more aggressive parallel code
optimizations than possible with the serial semantics of C.

But the programmer may need such operations anyway now and then, so maybe a
different syntax can be added... I don't know. Or maybe other solutions can be
found.

----------------------

9) Bit fields of structs: D is a practical language, so I think it can support
them too, to allow people to translate C => D code more simply (D has already
chosen similar practical compromises regarding C for other things).

----------------------

10) I think the compiler has to accept a line like this too, converting the
strings as necessary:
char[][int] a = [1:"abba", 2:"hello"];

It may even accept this too (probably the values have to be converted to
dynamic arrays):
auto a = [1:"abba", 2:"hello"];
auto aa1 = [1:2, 3:4];
auto sa = ["0x","1x"];
auto a2 = [[1, 2], [3, 4]];

While this works already:
auto a = [1, 2, 3, 4];
auto ca = ['0','1'];
int[int] aa2 = [1:2, 3:4];
int[][] a3 = [[1, 2], [3, 4]];

----------------------

11) I have seen that D AAs are quite efficient, but probably there are very
good hash implementations around. So "superfasthash" or a modified version of
the super refined and optimized Python/Perl hashes can be useful to replace the
currently used by DMD. (Python source code is fully open source but code
licenses may be a problem anyway, I don't know).

----------------------

12) I have seen D allows to pass anonymous functions like:
(int x){return x*x;}

C# V.3.0 uses a more synthetic syntax:
x => x*x

That syntax isn't bad, but I don't know how the compiler automatically manages
the types in such situations, it can be a bit complex.

----------------------

13) The Haskell language community is quite organized, they have a large page
on the main Haskell site that compares the best ways to create the entries for
the Shootout site. They have even improved the language because one of such
tests has shown a bad (slow) side of Haskell. I have recently posted few D
entries into the Shootout site, with mixed results (one entry is at the top).
Regarding the shootout code the following code may be used by DMD developers to
improve the compiler, it shows a bad side of DMD compared to GCC:

C code:

#include <stdio.h>
#include <stdlib.h>
unsigned long fib(unsigned long n) {
    return( (n < 2) ? 1 : (fib(n-2) + fib(n-1)) );
}
int main(int argc, char *argv[]) {
    int N = ((argc == 2) ? atoi(argv[1]) : 1);
    printf("%ld\n", fib(N));
    return(0);
}


The D code:

import std.stdio, std.conv;
ulong fib(ulong n) {
    return (n < 2) ? 1 : (fib(n-2) + fib(n-1));
}
void main(string[] argv) {
    ulong n = ((argv.length == 2) ? toUlong(argv[1]) : 1);
    writefln(fib(n));
}


Used:
gcc version 3.4.2 (mingw-special)
DMD Digital Mars D Compiler v1.020 (Windows)

Compiled with:
gcc -O2 fiboC.c -o fiboC.exe
dmd -O -release -inline fiboD.d

Timings with n = 38:
C: 4.38 s
D: 5.28 s

I think such diffece is big enough to justify an improvement.

----------------------

14) This code gives:
problem.d(5): Error: cannot evaluate StrType() at compile time
But I'd like to use typeid().toString at compile time (it's useful inside
compile time functions, etc):

import std.stdio;
char[] StrType(T)() {
    return typeid(T).toString;
}
const char[] r = StrType!(typeof(1))();
void main() {
    writefln(r);
}

Later I have solved similar problems with mixins, compile-time demangle, etc,
but I think the toString of typeid() may just work at compile time.

----------------------

15) On the newsgroup I have seen a list of "wishes", among them I like:
- support struct&array in switch
- Multi-Dimensional Allocation (of dynamic arrays too)
- Named keyword arguments
- Iterators and Generators (yield is a good starting point)
- Templates in classes
- !in (probably easy to implement)
- Multiple return values (tuples) (Quite useful, but I have found other
solutions)
- Statically check for == null (maybe not easy to implement)
- Implicit New or short syntax for New

Beside them I like list generators from Python, but their syntax isn't much
compatibile with D syntax, so I don't know how they can be written.

----------------------

16) Beside C++ and Python, I think D can copy Erlang e Fortress languages a bit
too. Erlang is nice for its easy parallel capabilities, and the Fortress
language made by Sun shows lot of collections literals (bags, sets, lists,
etc), and an easy management of parallel processing (I think it looks somewhat
similar to the ParallelPascal one).

----------------------

17) I am following this newsgroup for some weeks, I have seen now and then it
pops out some DMD problems regarding type inferencing (I too have found a 'bug'
related to that, I have shown it on the learn newsgroup). I am helping the
development of "ShedSkin" it's a compiler for an implicitly static subset of
Python to C++. This program contains a really powerful type inferencer able to
find fully by itself the types of all the variables used inside a 3000-lines
long Python program (that contains no explicit type annotations).
ShedSkin is rather slow and I think D will never need all its type inferencing
capabilities, but I think its main point is to show that you can produce very
fast code even if you start with code that's written with a very easy hi-level
syntax. This 'discovery' may be useful to D too in the future.

----------------------

18) I don't like still that string literals are immutable while the others not.
I suggest to make the string literals mutable as the other ones (automatic dup
during program creation). This will avoid some of my bugs (and it can probably
avoid problems in porting code Win <=>Linux that uses strings) and help me
avoid some tricks to de-constant strings when they are constant. In few
situations you may need to create a program that avoids heap activity, so in
those situations you may need the normal string literals anyway, in such
situations you may add a certain letter after the string literal (like c?) that
tells the compile to not perform the automatic dup.

----------------------

19) I have created many very general function templates like map, imap, map,
zip, sum, max, min, filter, ifilter, all, any, etc, that accept arrays, AAs and
iterable classes. It may be useful to add a bit of functional-style programming
to D, inside a Phobos module. Such functions may be fit for the 90% of the code
where max running speed isn't necessary. They shorten the code, reduce bugs,
speed up programming, make the program simpler to read, etc, and being hi-level
they may even end helping the compiler produce efficient code (as foreach
sometimes does).

----------------------

20) After few test of mine I think the default way dmd computes struct hashes
has to be improved (or removed), to make it compute the hash looking inside the
dynamic arrays contained into the struct too. Otherwise it silently gives the
"wrong" results.

----------------------

21) In this newsgroup I've seen that people like a lot the ability of doing:
txt.split()
But I don't like it much because it's a syntactical sugar that masks the fact
that those split are functions (that pollute the namespace) and they aren't
true methods (object functions). That namespace pollution may cause problems,
if you have already defined elsewhere a function with the same name and similar
signature.


I think that's all, my successive post will probably be much smaller and mostly
comments to specific posts :-)

Bear hugs,
bearophile
Aug 07 2007
next sibling parent reply Sean Kelly <sean f4.ca> writes:
bearophile wrote:
 
 6) I suggest to add a third optional to the AAs, the numerical progressive
index, it seems easy to add, the foreach already supports more than two
variables:
 foreach(index, key, value; someAA) {...}
 
 But I don't like much the fact that the foreach with one parameter scans the
values:
 foreach(value; someAA) {...}
 because most times you have to iterate on the keys (in Python too the default
iteration on a dict is on its keys). With the key you can find the value, while
I believe with the value you can't find the key.

I disagree. Since the key in an AA is equivalent to the index in an array, the current behavior of foreach is consistent for all types passed to it. One parameter gives you the value, two parameters gives you the 'key' and the value. And since the key for an AA is equivalent to the index value, I'm not sure I like the three parameter idea either. It's just as easy to use a separate variable for that when counting entries is important anyway.
 I'd like a way to find the key if you have a pointer to the actual value
contained inside the AA. (I have seen you can cast the pointer to the value to
a struct and you can start to walk back with that, but I have not succeed yet
to find the key in a reliable way, you may suggest it to me. Maybe a function
to do this can be added to Phobos.) I can use it to create a "ordered
associative array" data structure (using the built-in AAs) that keeps a linked
list of the AA pairs, according to the original insertion order inside the AA.
To do that linked list of values you may need to find the key of a given value
pointer (or a given key-value pair).

It's hardly efficient, but: Key getKeyOf(Value, Key)( Value[Key] aa, Value val ) { foreach( k, v; aa ) { if( v == val ) return k; } return Key.init; }
 ----------------------
 
 7) A "len" property instead of "length" may be useful (even if it's a bit less
clear) because it's used so much often it's boring to write such long word over
and over. (And because 'length' isn't an easy to write word for non-English
speaking people).

It would bed nice if there were a way to support this within the current syntax. Though I suppose a property method could be created: void copyTo(T)( T src, T dst ) in { assert( src.length == dst.length ); } body { memmove( dst.ptr, src.ptr, src.length ); } myArray[1 .. 3].copyTo( myArray[2 .. 4] );
 ----------------------
 
 11) I have seen that D AAs are quite efficient, but probably there are very
good hash implementations around. So "superfasthash" or a modified version of
the super refined and optimized Python/Perl hashes can be useful to replace the
currently used by DMD. (Python source code is fully open source but code
licenses may be a problem anyway, I don't know).

Not hard to do. I've considered even using a "power of two" number of buckets for the D AA but because the degenerate case is so much worse than with a prime number of buckets I think the current implementation is more generally useful. A library hashtable is probably the best option for more specialized needs.
 ----------------------
 
 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}
 
 C# V.3.0 uses a more synthetic syntax:
 x => x*x

Crazy. Who knew C# had functional influences.
 ----------------------
 
 13) The Haskell language community is quite organized, they have a large page
on the main Haskell site that compares the best ways to create the entries for
the Shootout site. They have even improved the language because one of such
tests has shown a bad (slow) side of Haskell. I have recently posted few D
entries into the Shootout site, with mixed results (one entry is at the top).
 Regarding the shootout code the following code may be used by DMD developers
to improve the compiler, it shows a bad side of DMD compared to GCC:
 
 C code:
 
 #include <stdio.h>
 #include <stdlib.h>
 unsigned long fib(unsigned long n) {
     return( (n < 2) ? 1 : (fib(n-2) + fib(n-1)) );
 }
 int main(int argc, char *argv[]) {
     int N = ((argc == 2) ? atoi(argv[1]) : 1);
     printf("%ld\n", fib(N));
     return(0);
 }
 
 
 The D code:
 
 import std.stdio, std.conv;
 ulong fib(ulong n) {
     return (n < 2) ? 1 : (fib(n-2) + fib(n-1));
 }
 void main(string[] argv) {
     ulong n = ((argv.length == 2) ? toUlong(argv[1]) : 1);
     writefln(fib(n));
 }
 
 
 Used:
 gcc version 3.4.2 (mingw-special)
 DMD Digital Mars D Compiler v1.020 (Windows)
 
 Compiled with:
 gcc -O2 fiboC.c -o fiboC.exe
 dmd -O -release -inline fiboD.d
 
 Timings with n = 38:
 C: 4.38 s
 D: 5.28 s
 
 I think such diffece is big enough to justify an improvement.

Unless this was run on a 64-bit Unix machine, the major differernce in performance is that 'long' in the C code is a 32-but value, while 'long' in the D code is a 64-bit value. Someone should verify that the sizes match.
 ----------------------
 
 14) This code gives:
 problem.d(5): Error: cannot evaluate StrType() at compile time
 But I'd like to use typeid().toString at compile time (it's useful inside
compile time functions, etc):
 
 import std.stdio;
 char[] StrType(T)() {
     return typeid(T).toString;
 }
 const char[] r = StrType!(typeof(1))();
 void main() {
     writefln(r);
 }

What about .stringof? Sean
Aug 07 2007
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Sean Kelly wrote:
 bearophile wrote:
 
 ----------------------

 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}

 C# V.3.0 uses a more synthetic syntax:
 x => x*x

Crazy. Who knew C# had functional influences.

I think some amount of people in the NG knew that already :) There were several mentions of it some time ago (more than year), somewhere after the time C# 3.0 was announced and being discussed. But that is just the short (and "templated") syntax for delegates, C# 2.0 already supports delegate literals, with a syntax nearly identical to D, and with full support for outer variables (ie, detects outer variables and allocates them in heap instead of the stack). -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Aug 09 2007
prev sibling next sibling parent BCS <ao pathlink.com> writes:
Reply to bearophile,

 Here are some more notes, that I think are less important/significant
 than the first ones. Probably some of the following notes are
 silly/useless. There are various things I don't know about D yet.
 
 1) The Ddoc has some bugs. For example I'd like to write in the docs
 that a function takes an input in [0,1), but that produces a wrong
 html output.
 

It has a few more than that <G>
 ----------------------
 
 9) Bit fields of structs: D is a practical language, so I think it can
 support them too, to allow people to translate C => D code more simply
 (D has already chosen similar practical compromises regarding C for
 other things).
 

D used to have this after a sorts. It was removed. I forget why.
 ----------------------
 
 15) On the newsgroup I have seen a list of "wishes", among them I
 like:
 - Iterators and Generators (yield is a good starting point)

under the covers, yeild is a disaster IMO
 21) In this newsgroup I've seen that people like a lot the ability of
 doing:
 
 txt.split()
 
 But I don't like it much because it's a syntactical sugar that masks
 the fact that those split are functions

How? Just about anything after a dot is a funcion (excluding member access and with regars to types) even method cals are function.
 (that pollute the namespace)

If that were removed then nothing could be used with that syntax on a char[], so what pollution is their?
 and they aren't true methods (object functions).

In what way?
 That namespace
 pollution may cause problems, if you have already defined elsewhere a
 function with the same name and similar signature.

Could you give an example?
Aug 07 2007
prev sibling next sibling parent reply Jari-Matti =?ISO-8859-1?Q?M=E4kel=E4?= <jmjmak utu.fi.invalid> writes:
bearophile wrote:

 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}
 
 C# V.3.0 uses a more synthetic syntax:
 x => x*x
 
 That syntax isn't bad, but I don't know how the compiler automatically
 manages the types in such situations, it can be a bit complex.

It does a bit more type inference than D at the moment. Currently you either get return type or parameter type inference, but not both: (int x){return x*x;} int mul(T)(T x) { return x*x; }
 17) I am following this newsgroup for some weeks, I have seen now and then
 it pops out some DMD problems regarding type inferencing (I too have found
 a 'bug' related to that, I have shown it on the learn newsgroup). I am
 helping the development of "ShedSkin" it's a compiler for an implicitly
 static subset of Python to C++. This program contains a really powerful
 type inferencer able to find fully by itself the types of all the
 variables used inside a 3000-lines long Python program (that contains no
 explicit type annotations). ShedSkin is rather slow and I think D will
 never need all its type inferencing capabilities, but I think its main
 point is to show that you can produce very fast code even if you start
 with code that's written with a very easy hi-level syntax. This
 'discovery' may be useful to D too in the future.

At least the compile times are quite fast currently so there's much potential for this.
 19) I have created many very general function templates like map, imap,
 map, zip, sum, max, min, filter, ifilter, all, any, etc, that accept
 arrays, AAs and iterable classes. It may be useful to add a bit of
 functional-style programming to D, inside a Phobos module. Such functions
 may be fit for the 90% of the code where max running speed isn't
 necessary. They shorten the code, reduce bugs, speed up programming, make
 the program simpler to read, etc, and being hi-level they may even end
 helping the compiler produce efficient code (as foreach sometimes does).

It's hard to predict how much generic D code will still change in the near future, but a functional library sounds useful. I've also written both compile and runtime versions of generic map, fold, min, max, ... functions. Probably some dsource project already implements them too, but it's faster to write a simple algorithm from scratch than search the whole dsource and worry about possible IP issues. But in overall I assume a lot of developer time is wasted on reinventing the wheel :-/
 21) In this newsgroup I've seen that people like a lot the ability of
 doing: txt.split()
 But I don't like it much because it's a syntactical sugar that masks the
 fact that those split are functions (that pollute the namespace) and they
 aren't true methods (object functions). That namespace pollution may cause
 problems, if you have already defined elsewhere a function with the same
 name and similar signature.

There was discussion about this some time ago. For example Scala has an interesting language construct we could use in D.
 
 
 I think that's all, my successive post will probably be much smaller and
 mostly comments to specific posts :-)
 
 Bear hugs,
 bearophile

--
Aug 07 2007
parent Jari-Matti =?ISO-8859-1?Q?M=E4kel=E4?= <jmjmak utu.fi.invalid> writes:
Jari-Matti Mäkelä wrote:

 bearophile wrote:
 
 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}
 
 C# V.3.0 uses a more synthetic syntax:
 x => x*x
 
 That syntax isn't bad, but I don't know how the compiler automatically
 manages the types in such situations, it can be a bit complex.

It does a bit more type inference than D at the moment. Currently you either get return type or parameter type inference, but not both: (int x){return x*x;} int mul(T)(T x) { return x*x; }

I lost my train of thought a bit here. I'm not sure if this works, but (T)(T x) { return x; } could allow type inference in both ways. Still on the parameter side the polymorphism is explicit and you can't "chain" the literals because there's no implicit parametric type in D afaik: auto myfun = (x) { return "foo"; } auto myval = myfun(myfun(42)); A bad example, but this kind of code would also benefit a lot template foo(T : int) { alias char[] foo; } template foo(T : char[]) { alias int foo; } template zzz(T : int) { const zzz = "hello"; } template zzz(T : char[]) { const zzz = 0; } U bar(T, U = foo!(T))(T t) { return zzz!(T); } /* with proper type inference def bar(t) { static if (is(t : int) return "hello"; else return 0; } */ void main() { auto a = bar(42); auto b = bar("hi"); pragma(msg, typeof(a).stringof); // char[] pragma(msg, typeof(b).stringof); // int } I've even seen wicked code that in the end looked pretty much like this U bar(T, U = inferType!(parseAST!(scanSource!(traits!(getSourceOfThisFun ())))) )(T t) { ... } What can I say - somebody must have had fun :P
Aug 08 2007
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 13) The Haskell language community is quite organized, they have a large page
on the main Haskell site that compares the best ways to create the entries for
the Shootout site. They have even improved the language because one of such
tests has shown a bad (slow) side of Haskell. I have recently posted few D
entries into the Shootout site, with mixed results (one entry is at the top).
 Regarding the shootout code the following code may be used by DMD developers
to improve the compiler, it shows a bad side of DMD compared to GCC:
 
 C code:
 
 #include <stdio.h>
 #include <stdlib.h>
 unsigned long fib(unsigned long n) {
     return( (n < 2) ? 1 : (fib(n-2) + fib(n-1)) );
 }
 int main(int argc, char *argv[]) {
     int N = ((argc == 2) ? atoi(argv[1]) : 1);
     printf("%ld\n", fib(N));
     return(0);
 }
 
 
 The D code:
 
 import std.stdio, std.conv;
 ulong fib(ulong n) {
     return (n < 2) ? 1 : (fib(n-2) + fib(n-1));
 }
 void main(string[] argv) {
     ulong n = ((argv.length == 2) ? toUlong(argv[1]) : 1);
     writefln(fib(n));
 }
 
 
 Used:
 gcc version 3.4.2 (mingw-special)
 DMD Digital Mars D Compiler v1.020 (Windows)
 
 Compiled with:
 gcc -O2 fiboC.c -o fiboC.exe
 dmd -O -release -inline fiboD.d
 
 Timings with n = 38:
 C: 4.38 s
 D: 5.28 s
 
 I think such diffece is big enough to justify an improvement.

D's ulong is 64 bits, while C's unsigned long is 32 bits. This is likely the source of the speed difference.
Aug 07 2007
prev sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
bearophile wrote:
 2d) Regarding the printing I think it's not nice that:
 writefln(["1", "2"])
 prints the same thing as:
 writefln([1, 2])
 So maybe writefln can use "" when printing strings inside arrays/AAs, to
differentiate them.

In that case, writefln(1) should not print the same thing as writefln("1") either. But I agree that it would be useful to have a function that prints strings with "". Preferably, it should be able to convert any expression of basic types into a string representation, that when parsed by the compiler as a literal yields an identical value.
 2e) I also think writefln when in printing arrays/AAs can put spaces between
the commas.
 I think [el1, el2, el3] and [1:"a", 2:"b"] are more readable than
[el1,el2,el3] and [1:a,2:b].

Ideally, a formatter should allow you to define any format you want.
 2f) I think this is a writefln inconsistency:
 import std.stdio;
 void main() {
   char[3] a = "abc";
   writefln([a, a]); // Prints: [abc,abc]
   writefln([a: 1]); // Prints: [[a,b,c]:1]
 }

This is not a writefln inconsistency as much as it is a DMD inconsistency. note that typeof([a, a]) == char[][2], while typeof([a: 1]) == int[char[3]], This feels like a bug, but somehow, I have a feeling it is intentional to allow string array literals such as: ["a","ab","abc"] without giving a stupid compiler error about type incompatibility. IMHO, it would be much better if array literals instead picked the best common type, allowing expressions such as: [1, 1.5, 2, 2.5]
 3) Given an array x1, like a char[], it seems to me that x1.dup is a bit
slower compared to:
 char[] x2 = new char[x1.length];
 x2[] = x1;
 I hope this can be fixed (or maybe someone can tell me why).

.dup should definitely not be any slower (unless you have a GC compiled with DbC that does a memcmp after each .dup). Do you have a benchmark showing this?
 6) I suggest to add a third optional to the AAs, the numerical progressive
index, it seems easy to add, the foreach already supports more than two
variables:
 foreach(index, key, value; someAA) {...}
 
 But I don't like much the fact that the foreach with one parameter scans the
values:
 foreach(value; someAA) {...}
 because most times you have to iterate on the keys (in Python too the default
iteration on a dict is on its keys). With the key you can find the value, while
I believe with the value you can't find the key.

I agree with Sean here. The current behavior is the most consistent one.
 
 I'd like a way to find the key if you have a pointer to the actual value
contained inside the AA. (I have seen you can cast the pointer to the value to
a struct and you can start to walk back with that, but I have not succeed yet
to find the key in a reliable way, you may suggest it to me. Maybe a function
to do this can be added to Phobos.) I can use it to create a "ordered
associative array" data structure (using the built-in AAs) that keeps a linked
list of the AA pairs, according to the original insertion order inside the AA.
To do that linked list of values you may need to find the key of a given value
pointer (or a given key-value pair).

Something like this should work with current DMD and derivations that keep the memory layout of AA's: template getKeyFromPtr(Key) { Key getKeyFromPtr(Val)(Val *ptr) { auto offset = (Key.sizeof + size_t.sizeof - 1) & ~(size_t.sizeof - 1); return *cast(Key *)(cast(void*)ptr - offset); } } (Not really tested though.)
 9) Bit fields of structs: D is a practical language, so I think it can support
them too, to allow people to translate C => D code more simply (D has already
chosen similar practical compromises regarding C for other things).

Using string mixins and some CTFE it is not that hard implementing a bit field replacement.
 10) I think the compiler has to accept a line like this too, converting the
strings as necessary:
 char[][int] a = [1:"abba", 2:"hello"];

I agree. My solution would be to make string literals dynamic arrays instead of static. It doesn't make much sense to put the length into the type. "a" and "ab" should be of the same type.
 11) I have seen that D AAs are quite efficient, but probably there are very
good hash implementations around. So "superfasthash" or a modified version of
the super refined and optimized Python/Perl hashes can be useful to replace the
currently used by DMD. (Python source code is fully open source but code
licenses may be a problem anyway, I don't know).

I usually make my own containers when the built in AA doesn't cut it, such as: Map!(K,V), OrderedMap!(K,V), DiskStoredMap!(K,V), and so on. The only big problem is D's lack of a reference return type.
 12) I have seen D allows to pass anonymous functions like:
 (int x){return x*x;}
 
 C# V.3.0 uses a more synthetic syntax:
 x => x*x
 
 That syntax isn't bad, but I don't know how the compiler automatically manages
the types in such situations, it can be a bit complex.

It is not bad, but it can never evaluate to a single function. It would have to evaluate to a template alias or similar. But I too wish D had a more convenient syntax for single expression delegate literals, such as: (int x) => x*x or something.
 Beside them I like list generators from Python, but their syntax isn't much
compatibile with D syntax, so I don't know how they can be written.

They could probably be written something like: int x; // dummy List(x*x, x, range(10)) or List((int x) { return x*x; }, range(10)); or (as I have implemented them): map(range(10), (int x) { return x*x; }); range(10).map((int x) { return x*x; }); or equivalently as a lazy expression: a = range(1_000_000_000_000_000L).mapLazy((int x) { return x*x; }); writefln(a[555_555_555_555]);
 18) I don't like still that string literals are immutable while the others
not. I suggest to make the string literals mutable as the other ones (automatic
dup during program creation). This will avoid some of my bugs (and it can
probably avoid problems in porting code Win <=>Linux that uses strings) and
help me avoid some tricks to de-constant strings when they are constant. In few
situations you may need to create a program that avoids heap activity, so in
those situations you may need the normal string literals anyway, in such
situations you may add a certain letter after the string literal (like c?) that
tells the compile to not perform the automatic dup.

Most strings will never be changed, so this will make lots of unnecessary dups. D2.0's strings are constant (pronounced constant but spelled invariant) is a better solution. Maybe an automatic dup from constant strings into char[] would be a good thing.
 19) I have created many very general function templates like map, imap, map,
zip, sum, max, min, filter, ifilter, all, any, etc, that accept arrays, AAs and
iterable classes. It may be useful to add a bit of functional-style programming
to D, inside a Phobos module. Such functions may be fit for the 90% of the code
where max running speed isn't necessary. They shorten the code, reduce bugs,
speed up programming, make the program simpler to read, etc, and being hi-level
they may even end helping the compiler produce efficient code (as foreach
sometimes does).

I have implemented heaps of such functions too and I believe most people have implemented at least a few of those. I made a proposal to standardize a number of such functions a long time ago. It is a shame there is no standard regarding such functions. If not only because it would make it so much easier to publish short snippets of code without having to constantly define all the basic functions. Luckily, Tango has some really well implemented and solid functions in tango.core.Array. A collection that can only grow. BTW, I really like your naming convention for in place versions (as I assume they are). I might steal that right away :). Or does i mean those are iterator functions?
 20) After few test of mine I think the default way dmd computes struct hashes
has to be improved (or removed), to make it compute the hash looking inside the
dynamic arrays contained into the struct too. Otherwise it silently gives the
"wrong" results.

This is just another example that treating structs as plain old data may be a too simplistic view. Another example is comparing two structs for equality. One could argue that two structs should be equal if all members are equal. That is not how it works today.
 21) In this newsgroup I've seen that people like a lot the ability of doing:
 txt.split()
 But I don't like it much because it's a syntactical sugar that masks the fact
that those split are functions (that pollute the namespace) and they aren't
true methods (object functions). That namespace pollution may cause problems,
if you have already defined elsewhere a function with the same name and similar
signature.

I disagree. First, the name space "pollution" is no problem in D. You can never accidentally call an unwanted overload of a function from another module. D will issue an ambiguity error when the same identifier matches instances in several different modules. Secondly, I have really grown to like the way of chaining functions on arrays. The function call order and the data flow is read left to right. An example from my own code: references ~= refs .splitIter() .filterIter((char[] c) { return c.length > 2; }) .map((char[] c) { return a[1..$-1]; }); Writing this as: references ~= map(filterIterate(refs.splitIterate(), (char[] c) { return c.length > 2; }), (char[] c) { return a[1..$-1]; }); Is both much harder to read and parse (to me atleast) and also hard to indent in any meaningful way. -- Oskar
Aug 07 2007
next sibling parent Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 bearophile wrote:
 
 21) In this newsgroup I've seen that people like a lot the ability of 
 doing:
 txt.split()
 But I don't like it much because it's a syntactical sugar that masks 
 the fact that those split are functions (that pollute the namespace) 
 and they aren't true methods (object functions). That namespace 
 pollution may cause problems, if you have already defined elsewhere a 
 function with the same name and similar signature.

I disagree. First, the name space "pollution" is no problem in D. You can never accidentally call an unwanted overload of a function from another module. D will issue an ambiguity error when the same identifier matches instances in several different modules. Secondly, I have really grown to like the way of chaining functions on arrays. The function call order and the data flow is read left to right.

Another advantage is that the property syntax can simplify template code in some cases because it allows arrays to be treated as if they were UDTs so long as the proper support routine is available. To me, this goes a long way towards supporting arrays as viable containers. Sean
Aug 07 2007
prev sibling next sibling parent Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Oskar Linde wrote:
 Secondly, I have really grown to like the way of chaining functions on 
 arrays. The function call order and the data flow is read left to right. 
 An example from my own code:
 
 references ~= refs
     .splitIter()
     .filterIter((char[] c) { return c.length > 2; })
     .map((char[] c) { return a[1..$-1]; });
 
 Writing this as:
 
 references ~= map(filterIterate(refs.splitIterate(),
                                 (char[] c) { return c.length > 2; }),
                   (char[] c) { return a[1..$-1]; });
 
 Is both much harder to read and parse (to me atleast) and also hard to 
 indent in any meaningful way.
 

Agreed. Incidentally, consider this little thing I did in Ruby a few weeks back as an answer to a programming challenge (hey, technically just two lines): # haystack = ( # master = IO.read('wordlist.txt').split # ).collect {|word| word.split(//).sort!.join } # # File.new('result.txt','w').puts( # (IO.read('input.txt').split) # .collect {|word | word.split(//).sort!.join } # .collect {|needle| master[haystack.index(needle)] } # .join(',') # ) We can actually come quite close to this in D. -- Chris Nicholson-Sauls
Aug 07 2007
prev sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Oskar Linde wrote:
 bearophile wrote:
 2d) Regarding the printing I think it's not nice that:
 writefln(["1", "2"])
 prints the same thing as:
 writefln([1, 2])
 So maybe writefln can use "" when printing strings inside arrays/AAs, 
 to differentiate them.

In that case, writefln(1) should not print the same thing as writefln("1") either. But I agree that it would be useful to have a function that prints strings with "". Preferably, it should be able to convert any expression of basic types into a string representation, that when parsed by the compiler as a literal yields an identical value.

This is the heart of the repr/str distinction from Python that was mentioned. In Python instead of just one toString method, you have __str__ and __repr__. __str__ is supposed to return a version that looks nice for "human reading" __repr__ is supposed to return a string that (if possible) would recreate the object if passed to the interpreter. So the str of "1" prints without quote characters, but repr of "1" prints with quotes. --bb
Aug 07 2007
parent reply bearophile <bearophile <bearophileHUGS lycos.com> > writes:
I have changed the way I answer, hopefully this is better, now I group answers
to a person.

Sean Kelly:

It's hardly efficient, but:<

I was looking for something O(1).
Though I suppose a property method could be created:<

Nice, I'll try that. But maybe a syntax like this one is simpler: myArray[1 .. 3].copyTo(2, 4);
Not hard to do.<

Well copying large portions of hashing Python/Perl C sourcecode isn't that easy, it has to be adapted.
A library hashtable is probably the best option for more specialized needs.<

I know, but I was talking about improving the D built in AA copying the already refined implementations that come from 15+ years of community efforts to produce the best.
What about .stringof?<

Thank you. -------------------------- BCS <ao at pathlink dot com>:
under the covers, yeild is a disaster IMO<

Maybe this is true for a low level language, I don't know. But in Python it's very useful and quite commonly used (but I don't like its V.2.5 use for coroutines, it's too much confusing and its syntax is dirty because it looks like a statement used as an expression). It's used in C# too and it seem to cause no disasters :-) It avoids you to mess with the opApply. -------------------------- Oskar Linde:
But I agree that it would be useful to have a function that prints strings with
"".<

I have already written it, it's called put/putr, and it solves all the problems I was talking of :-)
Preferably, it should be able to convert any expression of basic types into a
string representation, that when parsed by the compiler as a literal yields an
identical value.<

This may require some extra testing to see if it's true.
Ideally, a formatter should allow you to define any format you want.<

You can define all the formats you want, but sensible defaults are necessary anyway. I was talking about the standard tools in standard configuration, that you use when you debug, when you write things quickly, etc.
This is not a writefln inconsistency as much as it is a DMD inconsistency.<

I see, and I think it deserves to be fixed :-)
.dup should definitely not be any slower (unless you have a GC compiled with
DbC that does a memcmp after each .dup). Do you have a benchmark showing this?<

I have just written a benchmark, and now .dup is quite faster than the other (about 50% faster, and it's a repeatable result), so I shut up.
Something like this should work with current DMD and derivations that keep the
memory layout of AA's:<

Thank you, I'll test it.
Using string mixins and some CTFE it is not that hard implementing a bit field
replacement.<

Okay. Maybe this can be standardized inside a Phobos lib (unrelated: bitvectors of Phobos are nice, but they are slow compared to the old 'bit' type).
My solution would be to make string literals dynamic arrays instead of static.<

I'd like that. I don't know what's the point of having them a different kind of strings (but I like to have a way to use static array of chars anyway, they may be useful now and then).
I usually make my own containers when the built in AA doesn't cut it, such as:<

Designing and tuning a very good associative array/set requires months or years, so I was talking about the built-in one.
I have implemented heaps of such functions too and I believe most people have
implemented at least a few of those.<

I am sure lot of people have done similar things, because when used (and not over-used) they can help (expecially if you are a bit used to functional programming). It's just I think I have implemented lot of them, and mine are probably more flexible (they usually accept arrays, AAs and iterable objects, more than one when possible, mixed types when possible, etc. There are some limitations that I have tries to push, some can't be removed) :-)
Luckily, Tango has some really well implemented and solid functions in
tango.core.Array. A collection that can only grow.<

I have seen them, and I think they are nice, but I was talking about something for functional-style coding.
BTW, I really like your naming convention for in place versions (as I assume
they are). I might steal that right away :). Or does i mean those are iterator
functions?<

You are free to steal. I can probably offer my functional module to Tango developers too. My "isomething" has the same meaning as in the itertools module of Python, they are related to lazy iterable objects.
Another example is comparing two structs for equality.<

I see :-)
First, the name space "pollution" is no problem in D. You can never
accidentally call an unwanted overload of a function from another module. D
will issue an ambiguity error when the same identifier matches instances in
several different modules.<

I see.
Is both much harder to read and parse (to me atleast) and also hard to indent
in any meaningful way.<

I think both versions aren't much readable (but the first is better), I suggest you to split your lines in more parts, so you can give those parts a name and you can comment the lines too. Thank you for the many answers, and the two snippets of code I may find useful. Bear hugs, bearophile
Aug 09 2007
next sibling parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
bearophile <bearophile wrote:
 I have changed the way I answer, hopefully this is better, now I group answers
to a person.

This is much better, but I would still recommend you to reply to each post(and person) in a separate message. Not doing that breaks the message threading (the References header in your post will not refer to the correct messages) making it much harder for others to follow the discussion. The web-news reader has lots of issues in itself, so I would really recommend you to use an NNTP client if possible. Thunderbird seems to be the most popular one.
 Sean Kelly:
 Though I suppose a property method could be created:<

Nice, I'll try that. But maybe a syntax like this one is simpler: myArray[1 .. 3].copyTo(2, 4);

This would also have the advantage of actually being implementable with current DMD.
 BCS <ao at pathlink dot com>:
 
 under the covers, yeild is a disaster IMO<

Maybe this is true for a low level language, I don't know. But in Python it's very useful and quite commonly used (but I don't like its V.2.5 use for coroutines, it's too much confusing and its syntax is dirty because it looks like a statement used as an expression). It's used in C# too and it seem to cause no disasters :-) It avoids you to mess with the opApply.

yield would not be any worse than current opApply. In fact, it would be identical on the low level side. To improve the low level side of this, the compiler would have to be better at inlining and const-folding delegate calls.
 I usually make my own containers when the built in AA doesn't cut it, such as:<

Designing and tuning a very good associative array/set requires months or years, so I was talking about the built-in one.

Many times people have been posting improvements to the built in AA. The problem is proving that the improvements really are better for the general case. The hardest part has proven to be defining the "general case".
 I have implemented heaps of such functions too and I believe most people have
implemented at least a few of those.<

I am sure lot of people have done similar things, because when used (and not over-used) they can help (expecially if you are a bit used to functional programming). It's just I think I have implemented lot of them, and mine are probably more flexible (they usually accept arrays, AAs and iterable objects, more than one when possible, mixed types when possible, etc. There are some limitations that I have tries to push, some can't be removed) :-)

I have also made quite general functions and implemented iterator versions for many of them too. I have really not had much time lately (as in the last 10 months), but would still be interested to see your code. There are quite a few new features in D since I worked on this last that should make some things much easier now.
 Is both much harder to read and parse (to me atleast) and also hard to indent
in any meaningful way.<

I think both versions aren't much readable (but the first is better), I suggest you to split your lines in more parts, so you can give those parts a name and you can comment the lines too.

Thank you for your comments on my personal coding style ;p. Cheers, -- Oskar
Aug 09 2007
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
bearophile <bearophile wrote:
 
 --------------------------
 
 BCS <ao at pathlink dot com>:
 
 under the covers, yeild is a disaster IMO<

Maybe this is true for a low level language, I don't know. But in Python it's very useful and quite commonly used (but I don't like its V.2.5 use for coroutines, it's too much confusing and its syntax is dirty because it looks like a statement used as an expression). It's used in C# too and it seem to cause no disasters :-) It avoids you to mess with the opApply.

For what it's worth, Tango contains a Fiber implementation that can be used for coroutines and such. Sean
Aug 09 2007
prev sibling parent BCS <ao pathlink.com> writes:
Reply to bearophile,

 --------------------------
 
 BCS <ao at pathlink dot com>:
 
 under the covers, yeild is a disaster IMO<
 

Python it's very useful and quite commonly used (but I don't like its V.2.5 use for coroutines, it's too much confusing and its syntax is dirty because it looks like a statement used as an expression). It's used in C# too and it seem to cause no disasters :-)

I am currently working on a project that is written in C# but which we are considering converting to D. The use of yield has turned out to be a significant performance problem. This is the reson I don't like it.
 It avoids you to
 mess with the opApply.
 

D's opApply syntax IMO seems much cleaner and easier to understand. Also it can work without any heap allocations. OTOH you can fake it quiet well with an delegate that stores most of it's state in an object. |class Walk(T) |{ | T[] dat; | int at; | | this(T[] d){dat = d; at = 0;} | | int opApply(int delegate(inout int, inout T) dg) | { | while(at < dat.length) | { | int j = at; | if(int ret = dg(j, dat[at])) return ret; //yeild somtimes | at++; | } | } |} | |auto walker = new Walk!(char)("hello world"); |foreach(inout int i, inout char c; walker) if(c == ' ') break; //skip hello |foreach(inout int i, inout char c; walker) writef("%s", c); //prints world
 --------------------------

Aug 09 2007