
digitalmars.D.learn - Memory leak with dynamic array

reply Joseph Wakeling <joseph.wakeling gmail.com> writes:
Hello all,

This one is another 'oh God it doesn't work like C++' query ... :-)

I've been having a problem with a memory leak that I've tied down to a dynamic
array that is repeatedly resized to zero and then extended within a loop.  The
following code gives a minimal example:

////////////////////////////////////////////////////////////////////
import std.array;
import std.stdio;

void main()
{
    real[] x;

    x.length = 5000000;

    foreach(uint i;0..100) {
        x.length = 0;

        foreach(uint j;0..5000000)
            x ~= j;

        writefln("At iteration %u, x has %u elements.",i,x.length);
    }
}
////////////////////////////////////////////////////////////////////

If I run this on my system, the memory usage explodes within a very short
time, despite the fact that the length of the array is set early on, so the
memory should all be reserved.  The writefln() command confirms that the
actual number of elements always stays within the original bounds set.

I'm guessing this is down to concatenation ~ working differently from C++
vectors' push_back() command.  I have also tried using the Appender and its
put() command, but the same result happens.

I even tried moving the array declaration within the first foreach() loop, and
putting an explicit delete at the end, but to no avail.

Can anyone advise? :-)

Thanks & best wishes,

    -- Joe
Apr 10 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Joseph Wakeling:

 Can anyone advise? :-)

Dynamic arrays in D are not black boxes; they are an abstraction that leaks a lot, so you must know how dynamic arrays are implemented. The implementation of dynamic arrays has changed recently -- does anyone have a link to a description of how they work now?

A few notes on your code: there is no need to import std.array in that little program. Often there is no need for the "uint" inside the foreach. I suggest putting a space after every comma. I also suggest writing big numbers like this: 5_000_000, because it helps avoid some mistakes.

From the little I know about the new arrays, this doesn't work anymore; zeroing the length produces a deallocation:

x.length = 5_000_000;
x.length = 0;

So a basic strategy for your problem is to not let the GC work:

import std.stdio: writefln;

void main() {
    auto x = new real[5_000_000];

    foreach (i; 0 .. 1_000) {
        int x_len = 0;

        foreach (j; 0 .. 5_000_000) {
            x[x_len] = j;
            x_len++;
        }

        writefln("At iteration %u, x has %u elements.", i, x_len);
    }
}

When the GC is not good enough, one thing you can do is manage the memory in a less automatic way...

Bye,
bearophile
Apr 10 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Joseph Wakeling:

 Can anyone advise? :-)

My previous answer was mistaken. Read about capacity, reserve and assumeSafeAppend here:
http://www.digitalmars.com/d/2.0/phobos/object.html

I think a short page about the design of the new dynamic arrays could be added to the D site. This is a version of the program that doesn't blow up the memory (it uses more and more memory, but just a bit):

import std.stdio: writeln, writefln;

void main() {
    enum int N = 5_000_000;
    real[] arr;

    writeln("A) arr.capacity: ", arr.capacity);
    arr.reserve(N); // optional
    writeln("B) arr.capacity: ", arr.capacity);

    foreach (i; 0 .. 100) {
        arr.length = 0;
        writeln("1) arr.capacity: ", arr.capacity);
        assumeSafeAppend(arr); // not optional
        writeln("2) arr.capacity: ", arr.capacity);

        foreach (j; 0 .. N)
            arr ~= j;

        writefln("At iteration %u, arr has %u elements.", i, arr.length);
    }
}

Bye,
bearophile
Apr 10 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
A few dynamic array benchmarks that can show you more differences between C++
and D2.

Timings dmd, N=5_000_000, NLOOPS=15, T=double, seconds:
  C++ v1: 0.67
  C++ v2: 0.72
  D v1:   6.47 (uses >1 GB RAM)
  D v2:   0.76
  D v3:   5.25

dmd v2.043, dmd -O -release -inline
g++ 4.4.1, -O3 -Wall -s

-----------------

// D v.1
import std.c.stdio: printf;

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 15;

    T[] arr;
    arr.length = N;

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;

        foreach (j; 0 .. N)
            arr ~= j;

        printf("At iteration %d, arr has %u elements.\n", i, arr.length);
    }
}

-----------------

// D v.2
import std.c.stdio: printf;

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 15;

    auto arr = new T[N];

    foreach (i; 0 .. NLOOPS) {
        uint arr_len = 0;

        foreach (j; 0 .. N) {
            arr[arr_len] = j;
            arr_len++;
        }

        printf("At iteration %d, arr has %u elements.\n", i, arr_len);
    }
}

-----------------

// D v.3
import std.c.stdio: printf;

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 15;

    T[] arr;
    arr.reserve(N);

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;
        arr.assumeSafeAppend;

        foreach (j; 0 .. N)
            arr ~= j;

        printf("At iteration %d, arr has %u elements.\n", i, arr.length);
    }
}

-----------------

// C++ v.1
#include <stdio.h>
#include <vector>

int main() {
    typedef double T;
    const int N = 5000000;
    const int NLOOPS = 15;

    std::vector<T> arr;
    arr.reserve(N);

    for (int i = 0; i < NLOOPS; i++) {
        arr.clear();

        for (int j = 0; j < N; j++)
            arr.push_back(j);

        printf("At iteration %d, arr has %zu elements.\n", i, arr.size());
    }
}

-----------------

// C++ v.2
#include <stdio.h>
#include <vector>

int main() {
    typedef double T;
    const int N = 5000000;
    const int NLOOPS = 15;

    std::vector<T> arr(N);

    for (int i = 0; i < NLOOPS; i++) {
        size_t arr_len = 0;

        for (int j = 0; j < N; j++) {
            arr[arr_len] = j;
            arr_len++;
        }

        printf("At iteration %d, arr has %zu elements.\n", i, arr_len);
    }
}

-----------------

Bye,
bearophile
Apr 10 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
This can help confuse your mind a bit:

import std.stdio: writeln;

void main() {
    {
        int[] arr = [0, 1, 2, 3, 4, 5, 6].dup;
        int[] slice = arr[2 .. 4];
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 4 5 6 | 2 3

        slice ~= 10; slice ~= 20;
        writeln("arr.capacity, slice.capacity: ", arr.capacity, " ", slice.capacity); // arr.capacity, slice.capacity: 7 7
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 4 5 6 | 2 3 10 20
    }

    {
        int[] arr = [0, 1, 2, 3, 4, 5, 6].dup;
        int[] slice = arr[2 .. 4];
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 4 5 6 | 2 3

        slice.assumeSafeAppend;
        slice ~= 10; slice ~= 20; // causes stomping
        writeln("arr.capacity, slice.capacity: ", arr.capacity, " ", slice.capacity); // arr.capacity, slice.capacity: 0 5
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 10 20 6 | 2 3 10 20

        slice ~= 30; slice ~= 40;
        writeln("arr.capacity, slice.capacity: ", arr.capacity, " ", slice.capacity); // arr.capacity, slice.capacity: 7 7
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 10 20 30 | 2 3 10 20 30 40
    }
}


The slice.capacity == 7 in the first case is just a coincidence; it's the
result of the over-allocation.

But I don't know why arr.capacity is zero and then seven in the second and
third cases.

Bye,
bearophile
Apr 10 2010
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

Thank you for your answers, Steven; I always appreciate your efforts both in
trying to improve the D runtime and in teaching me :-)

The new dynamic array regime has the main advantage of avoiding some
stomping-related bugs (a secondary advantage is slightly higher append speed).
One disadvantage is its increased complexity, which makes it harder for the
programmer to understand what dynamic arrays are actually doing under the
covers. Sometimes you don't need to know what's happening under the covers,
but arrays are such basic and common data structures, and their abstraction
leaks so much, that I think a D programmer must know what happens inside them
to use them at their best.


   C++ v1: 0.67
   C++ v2: 0.72


The 0.72 timing was from the second C++ program, sorry.
 That is to be expected.  The append function is not the most efficient  
 thing.  It still must do a lot of work just to look up the valid length  
 and verify appending can be done, whereas the v2 "append" is a single  
 instruction!

The v2 "append" seems to be two instructions:

arr[arr_len] = j;
arr_len++;

And the C++ v1 also has to verify whether the append can be done, comparing the size with the capacity, and if so assigning the item and increasing the size. I understand that the D code has to work more here, and I presume your code can't be improved. But those timings are a little sad anyway :-|
 A better method is to set the length, and then write the data.

What I have done in the D v2 example.
0 means if you append to the array, it will reallocate.  It will return 0  

taken over the "allocated" length of the block. Essentially, capacity indicates how many elements can be appended. The function gives up and returns 0 if it determines the array does not end at the allocated part of a block.<
For fun, add one more element to slice, and the arr will now have a valid
capacity :)<

I will have to read this some more times, and probably I will have to read the source code of the append implementation :-)
Technically, I could return the length of the array, but I'm not sure whether
that is as useful.<

I like this.
FYI, using arr after assuming safe append on the slice is undefined.<

/me nods.

Can you please write a short page, to be added to the D site (for example in the "Articles" section), that explains the data structures used under the covers by dynamic arrays and how those data structures work (and maybe why this design was chosen instead of other possible designs, and how it interacts with the current GC)? The purpose of this short text is to help D programmers understand what dynamic arrays do in their programs, and why they work this way. Dynamic arrays are among the most common data structures in D programs, so they are pretty important and basic.

If you are willing to write a first draft of this small document I can offer any kind of help you want: I can read and comment on your text, fix typos, convert it to HTML, I can even draw small data structures with their little pointers, I can beg Walter to add the resulting HTML page to his site, and so on :-) It's not a big job; despite everything it can't be a very complex data structure.

Bye and thank you,
bearophile
Apr 10 2010
next sibling parent bearophile writes:
One more small thing I was forgetting. In my dlibs1 there are functions similar
to:


T[][] NewVoidGCMatrix(T)(int nr, int nc) {
    // Part of the code by Marius Muja <mariusm cs.ubc.ca>
    assert(nr > 0, "NewVoidGCMatrix: nr must be > 0.");
    assert(nc > 0, "NewVoidGCMatrix: nc must be > 0.");
    void* mem = cast(void*)gcmalloc(nr * (T[]).sizeof + nr * nc * T.sizeof);
    hasNoPointers(mem);
    T[]* index = cast(T[]*)mem;
    T* mat = cast(T*)(mem + nr * (T[]).sizeof);

    for (int i = 0; i < nr; ++i) {
        index[i] = mat[0 .. nc];
        mat += nc;
    }

    return index[0 .. nr];
}

void delVoidGC(T)(ref T[] a) {
    gcrealloc(a.ptr, 0);
    a = null;
}


They allow allocating only one chunk of memory for a matrix, and in some
situations they give a little more performance (because they give a little more
CPU cache coherence). I guess such functions can't be used in D2. But maybe
once I know the underlying data structure I can write something similar for D2
too.

Bye,
bearophile
Apr 11 2010
prev sibling next sibling parent reply Joseph Wakeling <joseph.wakeling gmail.com> writes:
Thanks very much for the extended and detailed explanations.  The
assumeSafeAppend function indeed fixes the memory leakage. :-)

In reply to some comments ...

 That is to be expected.  The append function is not the most efficient
 thing.  It still must do a lot of work just to look up the valid length
 and verify appending can be done, whereas the v2 "append" is a single
 instruction!

 The v2 "append" seems to be two instructions: arr[arr_len] = j; arr_len++; And the C++ v1 also has to verify whether the append can be done, comparing the size with the capacity, and if so assigning the item and increasing the size. I understand that the D code has to work more here, and I presume your code can't be improved. But those timings are a little sad anyway :-|
 A better method is to set the length, and then write the data.

 What I have done in the D v2 example.

I was very happy to see that D _does_ have a 'reserve' function for arrays,
which I had been missing compared to C++ (it's not mentioned in the array docs).

Still, I don't think that pre-reserving the memory per se is the influencing
factor on the differences in performance.  To see why, compare these two pieces
of code -- a non-leaking D code, and the equivalent in C++:

////////////// D example /////////////////////////
import std.stdio;

void main()
{
    double[] x;

    foreach(uint i;0..100) {
        x.length = 0;
        writefln("Array capacity (1) = %u",x.capacity);
        assumeSafeAppend(x);
        writefln("Array capacity (2) = %u",x.capacity);

        foreach(uint j;0..5000000)
            x ~= j;

        writefln("At iteration %u, x has %u elements.",i,x.length);
    }
}
/////////////////////////

//////////// C++ example /////////////////////////
#include <vector>
#include <iostream>

using namespace std;

int main()
{
    vector<double> x;

    for(unsigned int i=0;i!=100;++i) {
        x.resize(0);
        cout << "Vector capacity: " << x.capacity() << endl;

        for(unsigned int j=0;j!=5000000;++j)
            x.push_back(j);

        cout << "At iteration " << i << ", x has " << x.size()
             << " elements." << endl;
    }

    return 0;
}
/////////////////////////

Note that in C++ the memory is not preassigned either.  The difference between
the performance of these pieces of code is striking -- on my machine the D
example takes about 70--75 seconds to run, whereas the C++ example speeds
through it in 10s or less.  D also uses about 20% more memory than the C++,
even though the C++ code declares a higher capacity for the vector (above 8
million) than D does for the array (a little over 5 million).

I don't know if it was naive to assume that D's dynamic arrays would be
equivalent to vectors in C++.  But it's clear that the D array appender ~= is
much slower than the C++ vector push_back().  Using an Appender class and
put() (from std.array) is even slower, despite the std.array docs recommending
this over ~. :-(

It's disappointing because I'm loving so much of D's syntax, but I can write
far more efficient code (with less explicit memory management) in C++ ...
Apr 11 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Joseph Wakeling:

 I was very happy to see that D _does_ have a 'reserve' function for arrays,
 which I had been missing compared to C++ (it's not mentioned in the array
 docs).

'reserve' is not a dynamic array method; it's a free function in the Phobos default module that also contains object, see the docs page:
http://www.digitalmars.com/d/2.0/phobos/object.html

It's not mentioned in the array docs because 'reserve' and the related functions are new; they come from a recent change in how dynamic arrays are implemented in D.
 Still, I don't think that pre-reserving the memory per se is the influencing
 factor on the differences in performance.

You are right. The difference in performance comes mostly from the way D arrays are designed. I agree with you that dynamic array append in D is slow, in absolute terms. Comparing the performance of D code with C++ is a good thing to do; it gives a baseline for comparison.

D dynamic arrays are more flexible than C++'s vector: they can be sliced, such slicing is O(1), and the slices are seen by the language just like other arrays. So you pay the price of some performance for such increased flexibility. The idea here is that the built-in data types must be as flexible as possible, even if their performance is not so high, so they can be used for many different purposes. Then the D standard library will have specialized data structures that are faster thanks to being more specialized and less flexible.

In D dynamic arrays some of the performance price is also paid for the automatic memory management, for the GC that's not a precise GC (for example, if your array has some empty items at the end past its true length, the GC must ignore them). I think D dynamic arrays have lower append performance because they don't keep the capacity in a handy place (the new design has a capacity, but it's not stored beside the length and pointer). I don't know how D array append works in multithreading situations.

If you don't require the slicing flexibility, you can use the D language to implement a C++-style Vector in D. If you implement a struct that contains capacity, length and a pointer to the data, don't let the GC scan the memory of the data, and overload ~= so that it doubles the size of the memory with a realloc once in a while, you probably get performance similar to the C++ one (not equal, because the performance profile of the GC in allocating memory chunks is a little more complex than the C malloc). See the bottom of this post.
 D also uses about 20% more memory than the C++

I don't know why; I presume it's caused by the GC bookkeeping, which must be careful to avoid memory fragmentation. Currently the D GC is not so refined.
 Using an Appender class and put() (from std.array) is even slower,
 despite the std.array docs recommending this over ~. :-(

That was useful for, and designed around, the previous design of the dynamic arrays. Maybe it can be removed from Phobos now...
 It's disappointing because I'm loving so much of D's syntax, but I can
 write far more efficient code (with less explicit memory management)
 in C++ ...

As you know, all data structures are the result of compromises: they balance performance across their different uses, and there is no perfect data structure. In practice D arrays are not designed for efficient "en masse" appending. A D dynamic array is seen locally as a struct of 2 words, so it's cheap to copy and pass around. Why that struct doesn't also contain the capacity is a long story. If you like writing benchmarks you will also find performance differences between, for example, a C++ hash_map and the built-in associative arrays of D. Besides the design (which for example can make it hard to implement append-efficient dynamic arrays in D), the D language is also still rather new; so far the focus has been on its design, not on tuning the performance of its implementation. C++ implementations have been performance-tuned for several years now :-)

-------------------

Below are three implementations, in D and C++. The Vector(T) is bare-bones: it misses most features and it's unsafe. In the D2 version I don't use the GC, and the memory is deallocated when arr goes out of scope.

Timings, T=double, N=5_000_000, NLOOPS=100, seconds:
  D1:  38.20
  D2:   8.32
  C++:  6.65

You can see the performance of the append in D2 is a matter of data structure implementation; it's not a language problem :-) With LDC (once we'll have a D2 version of it) the performance of D2 can probably be the same as the C++. DMD maybe loses a little here because it's not so good at inlining, or maybe because the C++ vector is better than this D2 code.

-------------------

// First D version
import std.stdio: printf;

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 100;
    T[] arr;

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;
        arr.assumeSafeAppend;
        printf("Array capacity = %d\n", arr.capacity);

        foreach (uint j; 0 .. N)
            arr ~= j;

        printf("At iteration %u, x has %u elements.\n", i, arr.length);
    }
}

------------------

// Second D version
import std.stdio: printf;
import std.c.stdlib: malloc, realloc, free;

struct Vector(T) {
    enum double FAST_GROW = 2;
    enum double SLOW_GROW = 1.3;
    enum int STARTING_SIZE = 4;
    enum int BIG_MEM = (1 << 27) / T.sizeof;

    T* ptr;
    int length, capacity;

    ~this() {
        free(ptr);
        ptr = null;
        length = 0;
        capacity = 0;
    }

    void opOpAssign(string Op:"~=")(T item) {
        if (length >= capacity) {
            if (capacity == 0)
                _grow_capacity(STARTING_SIZE);
            else if (capacity < BIG_MEM)
                _grow_capacity(cast(int)(capacity * FAST_GROW));
            else
                _grow_capacity(cast(int)(capacity * SLOW_GROW));
        }
        ptr[length] = item;
        length++;
    }

    void _grow_capacity(int new_capacity) {
        ptr = cast(T*)realloc(ptr, new_capacity * T.sizeof);
        assert(ptr);
        this.capacity = new_capacity;
    }
}

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 100;
    Vector!T arr;

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;
        printf("Array capacity = %d\n", arr.capacity);

        foreach (uint j; 0 .. N)
            arr ~= j;

        printf("At iteration %u, x has %u elements.\n", i, arr.length);
    }
}

------------------

// C++ version
#include <stdio.h>
#include <vector>

int main() {
    typedef double T;
    const unsigned int N = 5000000;
    const unsigned int NLOOPS = 100;
    std::vector<T> arr;

    for (unsigned int i = 0; i < NLOOPS; i++) {
        arr.resize(0);
        printf("Array capacity = %zu\n", arr.capacity());

        for (unsigned int j = 0; j < N; j++)
            arr.push_back(j);

        printf("At iteration %u, x has %zu elements.\n", i, arr.size());
    }
}

Bye,
bearophile
Apr 11 2010
next sibling parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 04/11/2010 07:42 PM, bearophile wrote:
 D dynamic arrays are more flexible than C++ vector, they can be sliced, such
slicing is O(1), and the slices are seen by the language just like other
arrays. So you pay the price of some performance for such increased
flexibility. The idea here is that the built-in data types must be as flexible
as possible even if their performance is not so high, so they can be used for
many different purposes. Then D standard library will have specialized data
structures that are faster thanks to being more specialized and less flexible.
In D dynamic arrays some of the performance price is also paid for the
automatic memory management, for the GC that's not a precise GC (for example if
your array has some empty items at the end past its true length, the GC must
ignore them).

OTish: It would be nice if the data-structures-to-be are designed such that they can be easily dropped in wherever dynamic arrays are used. At least the list-like ones.
Apr 11 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Ellery Newcomer:
 OTish: It would be nice if the data-structures-to-be are designed such 
 that they can be easily dropped in wherever dynamic arrays are used. At 
 least the list-like ones.

Andrei will probably do what you are asking for here. But those list-like data structures probably can't be defined using the normal array literals :-)

D2 operator overloading is flexible enough for many purposes, but data structure literals are a weak spot of D. For example, you need to put the digits in a string to initialize a large BigInt. It's not easy to invent a way to allow literal design through code in a C-like language.

Bye,
bearophile
Apr 11 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
 But those list-like data structures probably can't be defined using the normal
array literals :-)

I was wrong, this is not so bad looking:

struct List(T) {
    // Node of std.range.SListRange is not static!
    static struct Node {
        T data;
        Node* next;
        this(T d, Node* p=null) {
            data = d;
            next = p;
        }
    }

    Node* lh;

    this(T[] arr) {
        foreach_reverse (el; arr)
            lh = new Node(el, lh);
    }
}

void main() {
    List!int items = [1, 2, 3];
}

Bye,
bearophile
Apr 11 2010
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 04/11/2010 09:36 PM, bearophile wrote:
 But those list-like data structures probably can't be defined using the normal
array literals :-)

 I was wrong, this is not so bad looking:

 struct List(T) {
     // Node of std.range.SListRange is not static!
     static struct Node {
         T data;
         Node* next;
         this(T d, Node* p=null) {
             data = d;
             next = p;
         }
     }

     Node* lh;

     this(T[] arr) {
         foreach_reverse (el; arr)
             lh = new Node(el, lh);
     }
 }

 void main() {
     List!int items = [1, 2, 3];
 }

 Bye, bearophile

It won't work for classes, though. I'd think there'd be more trouble with other parts of the language, though, like slices. foreach with index is another thing I care about particularly, but it should be trivial to implement. I can't recall what else is still special-cased with array literals. And I agree about integer literals.

What would be nice is a way to override how the literals are interpreted. Maybe some sort of macroish facility that pattern matches and manipulates the AST at compile time...

<dreaming>
macro (<Decl>: (<Type> is X) (<Identifier>) = (<IntegerLiteral>, lit), decl){
    string str = "`" ~ lit.getText() ~ "`";
    lit = ( str2X( $(<Exp>, str) ));
    return decl;
}
</dreaming>

Ick, that turned into a mess fast.
Apr 11 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Something that I have forgotten, for the original poster: the D append is not
templated code, so this produces no template bloat as in the C++ <vector>. This
is another compromise.

-----------

Ellery Newcomer:

It won't work for classes, though.<

You have to write:

class List(T) {
    static struct Node {
        T data;
        Node* next;
        this(T d, Node* p=null) {
            data = d;
            next = p;
        }
    }

    Node* lh;

    this(T[] arr) {
        foreach_reverse (el; arr)
            lh = new Node(el, lh);
    }
}

void main() {
    auto items = new List!int([1, 2, 3]);
}
Ick, that turned into a mess fast.<

I'd like to be a computer scientist, able to create a DMeta similar to PyMeta. A DMeta can be much cleaner than that "mess". See the original OMeta:
http://tinlizzie.org/ometa/
and its Python version:
http://washort.twistedmatrix.com/
https://launchpad.net/pymeta

Bye,
bearophile
Apr 12 2010
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
A third, uglier D version; this is a bit faster than the C++ version. With a
few experiments I've seen that the second D version was slower than the C++
code because dmd missed the inlining of the opOpAssign, and because it
contained the if (capacity < BIG_MEM) test too.

LDC can surely inline the method in a situation like that. But the BIG_MEM
test was something I added to reduce memory usage for very large vectors. I
don't know if the C++ implementation has something similar.


// Third D version
import std.stdio: printf;
import std.c.stdlib: malloc, realloc, free;

struct Vector(T) {
    enum double FAST_GROW = 2;
    enum double SLOW_GROW = 1.3;
    enum int STARTING_SIZE = 4;
    enum int BIG_MEM = (1 << 27) / T.sizeof;

    T* ptr;
    int length, capacity;

    ~this() {
        free(ptr);
        ptr = null;
        length = 0;
        capacity = 0;
    }
}

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 100;
    Vector!T arr;

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;
        printf("Array capacity = %d\n", arr.capacity);

        foreach (uint j; 0 .. N) {
            if (arr.length >= arr.capacity) {
                int new_capacity = arr.capacity
                    ? cast(int)(arr.capacity * arr.FAST_GROW)
                    : arr.STARTING_SIZE;
                arr.ptr = cast(T*)realloc(arr.ptr, new_capacity * T.sizeof);
                assert(arr.ptr);
                arr.capacity = new_capacity;
            }

            arr.ptr[arr.length] = j;
            arr.length++;
        }

        printf("At iteration %u, x has %u elements.\n", i, arr.length);
    }
}

Bye,
bearophile
Apr 11 2010
prev sibling parent reply Joseph Wakeling <joseph.wakeling gmail.com> writes:
Thanks to all again for the discussion, examples and explanations. :-)

One note -- I wouldn't want anyone to think I'm bashing D or complaining.
I've been interested in the language for some time and this seemed an
opportune time to start experimenting properly.  It's fun, I'm learning
a lot, and I'm genuinely touched by the amount of effort put in by
everyone on this list to teach and share examples.

I'm also fully aware that D is still growing, and that I need to be
patient in some cases ... :-)

bearophile wrote:
 D dynamic arrays are more flexible than C++ vector, they can be sliced,
 such slicing is O(1), and the slices are seen by the language just like
 other arrays. So you pay the price of some performance for such
 increased flexibility. The idea here is that the built-in data types
 must be as flexible as possible even if their performance is not so
 high, so they can be used for many different purposes.

No complaint there. :-)
 Then D standard library will have specialized data structures that are
 faster thanks to being more specialized and less flexible.

In my case -- I'm turning into 'Mr C++' again -- probably that's often what I need.

If I look at the major benefits I found in moving from C to C++, the first was memory management that was as automatic as I required. For example, C++ vectors are great because they do away with having to put in malloc/realloc/free statements and let you treat dynamic arrays pretty much as 'just another variable'.

Within my own needs I've not yet found a case where the kind of smart GC functionality discussed on this thread seemed necessary, but of course I've never had it available to use before ... :-)
 In D dynamic arrays some of the performance price is also paid for the
 automatic memory management, for the GC that's not a precise GC (for
 example if your array has some empty items at the end past its true
 length, the GC must ignore them).

An idea was floating in my head about whether it is/could be possible to turn off GC safety features in a scope where they are unnecessary -- rather like a more general version of the 'assumeSafeAppend' function...
 With LDC (once we'll have a D2 version of it) the performance of D2
 can probably be the same as the C++. DMD maybe loses a little here
 because it's not so good at inlining, or maybe because the C++ vector
 is better than this D2 code.

I thought dev effort was now focusing back on GDC ... ? :-P

I have actually not made much use of the -inline option, because in the code I wrote (maybe not best suited to inlining...) it made the program generally run slower ...

Steven Schveighoffer wrote:
 The C++ example is reallocating memory, freeing memory it is no longer
 using.  It also manually handles the memory management, allocating larger
 and larger arrays in some algorithmically determined fashion (for example,
 multiplying the length by some constant factor).  This gives it an edge in
 performance because it does not have to do any costly lookup to determine
 if it can append in place, plus the realloc of the memory probably is
 cheaper than the GC realloc of D.

Right.  In fact you get precisely 24 allocs/deallocs, each doubling the memory reserve to give a total capacity of 2^23 -- and then that memory is there and can be used for the rest of the 100 iterations of the outer loop.

The shock for me was finding that D wasn't treating the memory like this, but was preserving each loop's memory (as you say, for good reason).
 D does not assume you stopped caring about the memory being pointed to
 when it had to realloc. [...] You can't do the same thing with C++
 vectors, when they reallocate, the memory they used to own could be
 freed.  This invalidates all pointers and iterators into the vector,
 but the language doesn't prevent you from having such dangling pointers.

I have a vague memory of trying to do something exactly like your example when I was working with C++ for the first time, and getting bitten on the arse by exactly the problem you describe. I wish I could remember where. I know that I found another (and possibly better) solution to do what I wanted, but it would be nice to see if a D-ish solution would give me something good.
 This must be fixed, the appender should be blazingly fast at appending
 (almost as fast as C++), with the drawback that the overhead is higher.

Overhead = memory cost? I'm not so bothered as long as the memory stays within constant, predictable bounds. It was the memory explosion that scared me. And I suspect I'd pay a small performance cost (though it would have to be small) for the kind of safety and flexibility the arrays have.
 You haven't done much with it yet.  When you start discovering how much D
 takes care of, you will be amazed :)

I know. :-) My needs are in some ways quite narrow -- numerical simulations in interdisciplinary physics -- hence the C background, and hence the premium on performance. They're also not very big programs -- simple enough for me to generally keep a personal overview on the memory management, even though with C++ that's usually all taken care of automatically (no new or delete statements if I can avoid it). What I'm fairly confident about is that, given not too much time, D will become a _far_ preferable language for that kind of development.
 The thing about D is it *can* be fast and unsafe, just as fast and unsafe
 as C, but that's not the default.

That's apparent -- I mean, given that D wraps the whole C standard library, I could basically write C code in D if I wanted, no? But of course it would have all the notational complexities of C, which is what I'd like to escape from ... :-P
Apr 12 2010
next sibling parent Don <nospam nospam.com> writes:
Joseph Wakeling wrote:
 Thanks to all again for the discussion, examples and explanations. :-)

[snip]
 My needs are in some ways quite narrow -- numerical simulations in
 interdisciplinary physics -- hence the C background, and hence the premium
 on performance.  They're also not very big programs -- simple enough for me
 to generally keep a personal overview on the memory management, even though
 with C++ that's usually all taken care of automatically (no new or delete
 statements if I can avoid it).
 
 What I'm fairly confident about is that, given not too much time, D will
 become a _far_ preferable language for that kind of development.

There are quite a lot of us here with exactly that kind of background. Something about the array issue -- D dynamic arrays are heavily geared towards algorithms which perform an initial allocation and afterwards avoid memory allocation entirely. In D, such slicing algorithms are extremely clean, extremely fast, and memory safe. In C++, it's much more difficult to write code in that manner. A straightforward translation from C++ will generally miss the benefits of D arrays, and you'll end up with slower code. A kind of "Zen of D" is to use array slices as much as possible.
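A minimal sketch of that allocate-once, slice-thereafter style (illustrative only):

```d
import std.stdio;

void main()
{
    // One allocation up front ...
    auto data = new double[10];
    foreach (i, ref x; data)
        x = i;

    // ... then work entirely through slices: no copying, no allocation.
    double[] firstHalf  = data[0 .. 5];
    double[] secondHalf = data[5 .. $];

    firstHalf[] *= 2.0;  // writes through to the underlying array

    writeln(data);  // the modifications are visible in the original
}
```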
Apr 12 2010
prev sibling parent reply Joseph Wakeling <joseph.wakeling webdrake.net> writes:
Joseph Wakeling wrote:
 On the other hand, if the assumeSafeAppend takes place outside the loops
 entirely, blow-up occurs -- because the command is not issued after the
 array resize to zero?  I've attached examples.

If I include a line x.reserve(5_000_000); before the assumeSafeAppend, the memory does not explode indefinitely -- but it uses about 3x as much memory as the 'NoLeak' code even when the latter lacks advance reservation of memory.
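For reference, the pattern under discussion looks roughly like this (D2, scaled down from the thread's 5_000_000 elements so it runs quickly):

```d
import std.stdio;

void main()
{
    real[] x;
    x.reserve(1000);  // scaled down from 5_000_000

    foreach (i; 0 .. 100)
    {
        x.length = 0;
        // After shrinking, tell the runtime that no other slice still
        // references the old elements, so '~=' may reuse the existing
        // block instead of reallocating.  Note this has to be re-issued
        // inside the loop, after each shrink -- issuing it once outside
        // the loop is what causes the blow-up described above.
        x.assumeSafeAppend();

        foreach (j; 0 .. 1000)
            x ~= j;
    }

    writefln("length: %s, capacity: %s", x.length, x.capacity);
}
```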
Apr 13 2010
next sibling parent reply Don <nospam nospam.com> writes:
Joseph Wakeling wrote:
 Steven Schveighoffer wrote:
 Assuming you may ask questions about this, it's not an exact description
 of what happens.  The append code and GC are smarter about it than this,
 I just wanted to explain it in an easy-to-understand way :)  The real
 code uses algorithms to determine optimal grow sizes and can extend into
 consecutive blocks if possible.

I realised that while running some of the code that bearophile suggested to me, because if you look at 'capacity' in a non-pre-reserved D array that's appended to 5 million times, its capacity is only a little over 5 million -- compared to a C++ vector, which sits at the nearest power of 2 (2^23, over 8 million). The GC overhead means that in this case actual memory usage is a bit higher than C++, but on a larger scale this could surely make a big difference.

In the case of the piece of code I'm working on, I don't think the pre-reserve is really so important as far as performance goes -- the append speed is a bit of a bugger, but the main issue is probably to do with the speed of copying and iterating across arrays. For example, I suspect that the D arrays'

    x[] = 0;
    y[] = z[];

is not as efficient as a C++ vector's

    x.assign(x.size(), 0);
    y.assign(z.begin(), z.end());

The D code compiles directly to a memset and memcpy, which are intrinsics. There's no way C++ could beat it.
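That is, the slice operations under discussion, sketched minimally:

```d
void main()
{
    auto x = new double[1_000];
    auto y = new double[1_000];
    auto z = new double[1_000];

    z[] = 1.5;   // whole-array fill: compiled down to a memset-style loop
    x[] = 0.0;
    y[] = z[];   // element-wise copy: compiled to a memcpy (lengths must match)

    assert(x[500] == 0.0);
    assert(y[999] == 1.5);
}
```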
Apr 13 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Joseph Wakeling:
 OK.  I'm just bemused about why -- having solved the memory leak -- my D
 code is running significantly slower than equivalent C++ code.

I'll try to take a look at your D code. In the meantime you can show the equivalent C++ code too, so I can compare. Note that dmd has an old backend, so it doesn't optimize as well as g++ or as llvm. LDC is often able to optimize better. Bye, bearophile
Apr 13 2010
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
So far I've just given a light reading of the code. Notes:
- 1.0L in D is a real, not a double, it's 10-12-16 bytes long.
- Mt19937 is a Mersenne twister, keep in mind that it's really good but slow.
If you need less quality, use a faster generator.
- Compile your code with -w, so it tells you when you have missed an "override".
- ReputationAlgorithm can be an abstract class, I think. So you can write it as:
abstract class ReputationAlgorithm {
    this() {}
    this(ref Rating[] ratings,
         ref double[] reputationUser,
         ref double[] reputationObject);
}

- Both the Yzlm and AvgArithmetic classes can be marked as final, and in the other
classes there are further methods that can be marked as final. As in Java, in D class
methods are virtual unless you mark them as final; but unlike Java's HotSpot, as far
as I know no D compiler is currently able to inline virtual methods.
- Dmd has both a built-in profiler and code coverage; use them to find hot
spots.
- Put a space after commas, generally.
- pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but you have to
import std.math anyway, because of a bug).
- You can learn to use a little contract programming, and move the asserts into a
precondition, for example in the Yzlm constructor. Then you can add class
invariants too, if you want.
- Switching the contents of two dynamic arrays does work (it doesn't work well with
stack-allocated arrays, whose contents get copied).
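
The precondition suggestion can be sketched like this (a trimmed-down hypothetical version of the Yzlm constructor, not the full class):

```d
// Moving the constructor asserts into an 'in' contract, roughly as
// suggested above; the class is cut down to just the three fields.
class Yzlm
{
    double beta, convergenceRequirement, errorMin;

    this(double b, double c, double e)
    in
    {
        // Checked before the constructor body runs (in non-release builds).
        assert(b >= 0);
        assert(c > 0);
        assert(e >= 0);
    }
    body
    {
        beta = b;
        convergenceRequirement = c;
        errorMin = e;
    }

    invariant()
    {
        // A class invariant, checked around public method calls.
        assert(beta >= 0);
    }
}

void main()
{
    auto y = new Yzlm(0.8, 1e-12, 1e-36);
    assert(y.beta == 0.8);
}
```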

I'll try to add more comments later.
Bye,
bearophile
Apr 13 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
This is a little cleaned-up version of the result of the profiling (no inlining,
final classes, -O -release):

   Func  
   Time  

7586124  std.random __uniform std.random __MersenneTwisterEngine
4040384  main
2408110  performance userReputationUpdate performance Rating
2299561  std.math __nextafter nextafter
2176746  std.contracts __enforce
2051276  std.conv __to
2006079  performance.objectReputationUpdate performance Rating
1898177  std.array __Appender performance Rating Appender __put performance Rating put performance Rating

It's a little disaster. As I have thought the Mersenne twister is taking up lot
of time. And then the nextafter, the enforce, the to!, the appender...

Bye,
bearophile
Apr 13 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
This is a single module, and on my PC it is more than two times faster than the
original code. Now I think there are no naive ways left to speed it up.

Compile it with:
dmd -O -release -inline

The low performance was caused by:
- virtual methods that don't need to be virtual;
- ^^0.5 and pow(x, 2) aren't perfectly efficient;
- the two nested loops in the main are more efficient as ref double; this is
something dmd will need to fix;
- the Mersenne twister is slow. I have used a Kiss rnd generator here, adapted from
Tango;
- the uniform of std.random is bad: it contains nextafter, which is slow; it can
be avoided using a closed range, "[]", instead of the default one, "[)";
- somewhere in the middle of the random generator there was an enforce. I am
now not so sure it's good to use a lot of enforce() in the std lib of a system
language. Maybe Andrei will need to go back to using asserts inside contracts;
- the temporary struct inside the two main loops didn't get optimized away;
- the append was messy and slow; I have removed the appends and replaced them
with simpler and tidier code using an index, which is quite a bit faster.
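The closed-range variant mentioned above can be used like this (a small sketch against the std.random API; the seed value is arbitrary):

```d
import std.random;

void main()
{
    auto rng = Random(42);

    // The default boundaries "[)" exclude the upper bound, which
    // internally involves a nextafter call; the closed variant "[]"
    // avoids it.
    auto x = uniform(0.0, 1.0, rng);        // half-open: [0, 1)
    auto y = uniform!"[]"(0.0, 1.0, rng);   // closed: [0, 1]

    assert(x >= 0.0 && x < 1.0);
    assert(y >= 0.0 && y <= 1.0);
}
```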

Now the profiling result is clean:
16394266 main
 8079735 2performance510FastRandom7uniformMFddZd
 2841876 2performance510FastRandom8randUintMFZk
 1923377 2performance54Yzlm20userReputationUpdateMFKAS12performance56RatingKAdKAdZv
 1729333 2performance54Yzlm22objectReputationUpdateMFKAS12performance56RatingKAdKAdZd

Most of the time is spent in the main and generating rnd values, but the
timings are clean.

The resulting code can be slower than the C++ code, but now it's mostly a
matter of backend. If I port this code to D1 for Tango and I compile it very
well with LDC the performance can be about as good as the C++ version or better
:-) If necessary I can use an even faster random number generator, but that's
for tomorrow, if you want.


import std.stdio: printf;
import std.math: sqrt, pow;

struct FastRandom {
    uint kiss_x = 1;
    uint kiss_y = 2;
    uint kiss_z = 4;
    uint kiss_w = 8;
    uint kiss_carry = 0;
    uint kiss_k, kiss_m;

    this(uint seed) {
        this.seed(seed);
    }

    void seed(uint seed) {
        kiss_x = seed | 1;
        kiss_y = seed | 2;
        kiss_z = seed | 4;
        kiss_w = seed | 8;
        kiss_carry = 0;
    }

    uint randUint() {
        kiss_x = kiss_x * 69069 + 1;
        kiss_y ^= kiss_y << 13;
        kiss_y ^= kiss_y >> 17;
        kiss_y ^= kiss_y << 5;
        kiss_k = (kiss_z >> 2) + (kiss_w >> 3) + (kiss_carry >> 2);
        kiss_m = kiss_w + kiss_w + kiss_z + kiss_carry;
        kiss_z = kiss_w;
        kiss_w = kiss_m;
        kiss_carry = kiss_k >> 30;
        return kiss_x + kiss_y + kiss_w;
    }

    double random() {
        return this.randUint() / (uint.max + 1.0);
    }

    double uniform(double a, double b) {
        double r = cast(double)this.randUint() / (uint.max + 1.0);
        return a + (b - a) * r;
    }
}


struct Rating {
    uint user, object;
    double value;
}


abstract class ReputationAlgorithm {
    this() {}
}


final class Yzlm : ReputationAlgorithm {
    double beta;
    double convergenceRequirement;
    double errorMin;
//    uint[] userLinks; // commented out because for now it
                        // has a constant value for all users
    double[] weightSum;
    double[] oldReputationObject;

    double objectReputationUpdate(ref Rating[] ratings,
                                  ref double[] reputationUser,
                                  ref double[] reputationObject) {
        double diff = 0;
        double[] temp = oldReputationObject;

        // Original version had:
        //
        //    oldReputationObject[] = reputationObject[]
        //
        // This version is an attempt to save effort
        // by just switching round the memory the two
        // arrays are pointing at -- not sure if it
        // actually does what I'm expecting it to.
        // Doesn't seem to improve speed. :-(
        oldReputationObject = reputationObject;
        reputationObject = temp;
        reputationObject[] = 0;

        weightSum[] = 0;

        foreach (Rating r; ratings) {
            reputationObject[r.object] += reputationUser[r.user] * r.value;
            weightSum[r.object] += reputationUser[r.user];
        }

        foreach (uint object, ref double r; reputationObject) {
            r /= (weightSum[object] > 0) ? weightSum[object] : 1;
            auto aux = (r - oldReputationObject[object]);
            diff += aux * aux;
        }

        return sqrt(diff);
    }

    void userReputationUpdate(ref Rating[] ratings,
                              ref double[] reputationUser,
                              ref double[] reputationObject) {
        reputationUser[] = 0;

        foreach (Rating r; ratings) {
            auto aux = (r.value - reputationObject[r.object]);
            reputationUser[r.user] += aux * aux;
        }

        foreach (uint user, ref double r; reputationUser) {
            //if(userLinks[user]>0)
                r = pow((r / reputationObject.length /*userLinks[user]*/) + errorMin, -beta);
        }
    }

    void opCall(ref Rating[] ratings,
                ref double[] reputationUser,
                ref double[] reputationObject) {
    //    userLinks.length = reputationUser.length;
    //    userLinks[] = 0;

        weightSum.length = reputationObject.length;
        oldReputationObject.length = reputationObject.length;

    //    foreach (Rating r; ratings)
    //        userLinks[r.user]++;

        double diff;
        uint iterations = 0;
        do {
            userReputationUpdate(ratings, reputationUser, reputationObject);
            diff = objectReputationUpdate(ratings, reputationUser, reputationObject);
            ++iterations;
        } while (diff > convergenceRequirement);
        printf("Exited in %u iterations with diff = %g < %g\n",
               iterations, diff, convergenceRequirement);
    }

    this() {}

    this(double b, double c, double e) {
        beta = b;
        convergenceRequirement = c;
        errorMin = e;
        assert(beta >= 0);
        assert(convergenceRequirement > 0);
        assert(errorMin >= 0);
    }

    this(ref Rating[] ratings,
         ref double[] reputationUser,
         ref double[] reputationObject,
         double b, double c, double e) {
        this(b,c,e);

        opCall(ratings, reputationUser, reputationObject);
    }
}


class AvgWeighted : ReputationAlgorithm {
    double[] weightSum;

    void opCall(ref Rating[] ratings,
                ref double[] reputationUser,
                ref double[] reputationObject) {
        weightSum.length = reputationObject.length;
        weightSum[] = 0;

        reputationObject[] = 0;

        foreach (Rating r; ratings) {
            reputationObject[r.object] += reputationUser[r.user]*r.value;
            weightSum[r.object] += reputationUser[r.user];
        }

        foreach (uint o, ref double r; reputationObject)
            r /= weightSum[o];
    }

    this() {}

    this(ref Rating[] ratings,
         ref double[] reputationUser,
         ref double[] reputationObject) {
        opCall(ratings, reputationUser, reputationObject);
    }
}


final class AvgArithmetic : AvgWeighted {
    override void opCall(ref Rating[] ratings,
            ref double[] reputationUser,
            ref double[] reputationObject) {
        reputationUser[] = 1;
        super.opCall(ratings, reputationUser, reputationObject);
    }

    this() {}

    this(ref Rating[] ratings,
         ref double[] reputationUser,
         ref double[] reputationObject) {
        opCall(ratings, reputationUser, reputationObject);
    }
}


void main() {
    Rating[] ratings;
    double[] reputationObject, reputationUser;
    double[] objectQuality, userError;
    auto aa = new AvgArithmetic;
    auto yzlm = new Yzlm(0.8, 1e-12, 1e-36);

    auto rnd = FastRandom(1001);

    reputationObject.length = 1_000;
    reputationUser.length = 1_000;

    objectQuality.length = reputationObject.length;
    userError.length = reputationUser.length;

    ratings.length = reputationObject.length * reputationUser.length;

    foreach (uint i; 0 .. 100) {
        foreach (ref double Q; objectQuality)
            Q = rnd.uniform(0.0, 10.0);

        foreach (ref double sigma2; userError)
            sigma2 = rnd.random();

        int pos;
        foreach (uint object, ref double Q; objectQuality)
            foreach (uint user, ref double sigma2; userError)
                ratings[pos++] = Rating(user, object, rnd.uniform(Q - sigma2, Q + sigma2));

        printf("We now have %u ratings.\n", ratings.length);

        aa(ratings, reputationUser, reputationObject);
        yzlm(ratings, reputationUser, reputationObject);

        double deltaQ = 0;
        foreach (uint object, double r; reputationObject) {
            auto aux = (r - objectQuality[object]);
            deltaQ += aux * aux;
        }
        deltaQ = sqrt(deltaQ / reputationObject.length);
        printf("[%u] Error in quality estimate: %g\n", i, deltaQ);
    }
}


Bye,
bearophile
Apr 13 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
This compiles for D1. With LDC it runs about two times faster still, about
4.5-5 times faster than the original version. We can compare this to the C++
version of yours.

ldc -O5 -release -inline -enable-unsafe-fp-math -unroll-allow-partial test.d

I have also tried the faster rnd generator (R250-521), but in this program it's
not faster than Kiss, so I've kept Kiss. Note that Tango that you can use with
LDC has Kiss already.


version (Tango) {
    import tango.stdc.stdio: printf;
    import tango.math.Math: sqrt, pow;
} else {
    import std.stdio: printf;
    import std.math: sqrt, pow;
}

struct FastRandom {
    uint kiss_x = 1;
    uint kiss_y = 2;
    uint kiss_z = 4;
    uint kiss_w = 8;
    uint kiss_carry = 0;
    uint kiss_k, kiss_m;

    void seed(uint seed) {
        kiss_x = seed | 1;
        kiss_y = seed | 2;
        kiss_z = seed | 4;
        kiss_w = seed | 8;
        kiss_carry = 0;
    }

    uint randUint() {
        kiss_x = kiss_x * 69069 + 1;
        kiss_y ^= kiss_y << 13;
        kiss_y ^= kiss_y >> 17;
        kiss_y ^= kiss_y << 5;
        kiss_k = (kiss_z >> 2) + (kiss_w >> 3) + (kiss_carry >> 2);
        kiss_m = kiss_w + kiss_w + kiss_z + kiss_carry;
        kiss_z = kiss_w;
        kiss_w = kiss_m;
        kiss_carry = kiss_k >> 30;
        return kiss_x + kiss_y + kiss_w;
    }

    double random() {
        return this.randUint() / (uint.max + 1.0);
    }

    double uniform(double a, double b) {
        double r = cast(double)this.randUint() / (uint.max + 1.0);
        return a + (b - a) * r;
    }
}


struct Rating {
    uint user, object;
    double value;
}


abstract class ReputationAlgorithm {
    this() {}
}


final class Yzlm : ReputationAlgorithm {
    double beta;
    double convergenceRequirement;
    double errorMin;
//    uint[] userLinks; // commented out because for now it
                        // has a constant value for all users
    double[] weightSum;
    double[] oldReputationObject;

    double objectReputationUpdate(ref Rating[] ratings,
                                  ref double[] reputationUser,
                                  ref double[] reputationObject) {
        double diff = 0;
        double[] temp = oldReputationObject;

        // Original version had:
        //
        //    oldReputationObject[] = reputationObject[]
        //
        // This version is an attempt to save effort
        // by just switching round the memory the two
        // arrays are pointing at -- not sure if it
        // actually does what I'm expecting it to.
        // Doesn't seem to improve speed. :-(
        oldReputationObject = reputationObject;
        reputationObject = temp;
        reputationObject[] = 0;

        weightSum[] = 0;

        foreach (Rating r; ratings) {
            reputationObject[r.object] += reputationUser[r.user] * r.value;
            weightSum[r.object] += reputationUser[r.user];
        }

        foreach (uint object, ref double r; reputationObject) {
            r /= (weightSum[object] > 0) ? weightSum[object] : 1;
            auto aux = (r - oldReputationObject[object]);
            diff += aux * aux;
        }

        return sqrt(diff);
    }

    void userReputationUpdate(ref Rating[] ratings,
                              ref double[] reputationUser,
                              ref double[] reputationObject) {
        reputationUser[] = 0;

        foreach (Rating r; ratings) {
            auto aux = (r.value - reputationObject[r.object]);
            reputationUser[r.user] += aux * aux;
        }

        foreach (uint user, ref double r; reputationUser) {
            //if(userLinks[user]>0)
                r = pow((r / reputationObject.length /*userLinks[user]*/) + errorMin, -beta);
        }
    }

    void opCall(ref Rating[] ratings,
                ref double[] reputationUser,
                ref double[] reputationObject) {
    //    userLinks.length = reputationUser.length;
    //    userLinks[] = 0;

        weightSum.length = reputationObject.length;
        oldReputationObject.length = reputationObject.length;

    //    foreach (Rating r; ratings)
    //        userLinks[r.user]++;

        double diff;
        uint iterations = 0;
        do {
            userReputationUpdate(ratings, reputationUser, reputationObject);
            diff = objectReputationUpdate(ratings, reputationUser, reputationObject);
            ++iterations;
        } while (diff > convergenceRequirement);
        printf("Exited in %u iterations with diff = %g < %g\n",
               iterations, diff, convergenceRequirement);
    }

    this() {}

    this(double b, double c, double e) {
        beta = b;
        convergenceRequirement = c;
        errorMin = e;
        assert(beta >= 0);
        assert(convergenceRequirement > 0);
        assert(errorMin >= 0);
    }

    this(ref Rating[] ratings,
         ref double[] reputationUser,
         ref double[] reputationObject,
         double b, double c, double e) {
        this(b,c,e);

        opCall(ratings, reputationUser, reputationObject);
    }
}


class AvgWeighted : ReputationAlgorithm {
    double[] weightSum;

    void opCall(ref Rating[] ratings,
                ref double[] reputationUser,
                ref double[] reputationObject) {
        weightSum.length = reputationObject.length;
        weightSum[] = 0;

        reputationObject[] = 0;

        foreach (Rating r; ratings) {
            reputationObject[r.object] += reputationUser[r.user]*r.value;
            weightSum[r.object] += reputationUser[r.user];
        }

        foreach (uint o, ref double r; reputationObject)
            r /= weightSum[o];
    }

    this() {}

    this(ref Rating[] ratings,
         ref double[] reputationUser,
         ref double[] reputationObject) {
        opCall(ratings, reputationUser, reputationObject);
    }
}


final class AvgArithmetic : AvgWeighted {
    override void opCall(ref Rating[] ratings,
            ref double[] reputationUser,
            ref double[] reputationObject) {
        reputationUser[] = 1;
        super.opCall(ratings, reputationUser, reputationObject);
    }

    this() {}

    this(ref Rating[] ratings,
         ref double[] reputationUser,
         ref double[] reputationObject) {
        opCall(ratings, reputationUser, reputationObject);
    }
}


void main() {
    Rating[] ratings;
    double[] reputationObject, reputationUser;
    double[] objectQuality, userError;
    auto aa = new AvgArithmetic;
    auto yzlm = new Yzlm(0.8, 1e-12, 1e-36);

    FastRandom rnd;
    rnd.seed(1001);

    reputationObject.length = 1_000;
    reputationUser.length = 1_000;

    objectQuality.length = reputationObject.length;
    userError.length = reputationUser.length;

    ratings.length = reputationObject.length * reputationUser.length;

    for (uint i; i < 4; i++) { // 100  4  ***************
        foreach (ref double Q; objectQuality)
            Q = rnd.uniform(0.0, 10.0);

        foreach (ref double sigma2; userError)
            sigma2 = rnd.random();

        int pos;
        foreach (uint object, ref double Q; objectQuality)
            foreach (uint user, ref double sigma2; userError)
                ratings[pos++] = Rating(user, object, rnd.uniform(Q - sigma2, Q + sigma2));

        printf("We now have %u ratings.\n", ratings.length);

        aa(ratings, reputationUser, reputationObject);
        yzlm(ratings, reputationUser, reputationObject);

        double deltaQ = 0;
        foreach (uint object, double r; reputationObject) {
            auto aux = (r - objectQuality[object]);
            deltaQ += aux * aux;
        }
        deltaQ = sqrt(deltaQ / reputationObject.length);
        printf("[%u] Error in quality estimate: %g\n", i, deltaQ);
    }
}

Bye,
bearophile
Apr 13 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Joseph Wakeling:

 My guess for the difference is that you ran less than the full 100 iterations
 of the main for loop.

You are right, sorry.
 So I think it's probably just compiler difference that's to blame for speed
 differences.

This can be true, but such differences are not magic: they can be found, and eventually they can even become precise enhancement requests for the LLVM developers, who are open to such requests.
 After all, D in general and DMD in particular is in development.

DMD has a quite old back-end that is not in active development. Maybe it will become 64 bit.
I _am_ surprised that in your profiling the random number generator took up so
much time, as if you cut out the lines,

     yzlm(ratings, reputationUser, reputationObject);

 ... and recompile, the program screams through in no time at all -- which
 would lead me to think that it's the yzlm opCall and the functions it calls
 that take up the majority of time.  Or is there some lazy evaluation going
 on here ... ?

I've done a few more tests, and I've seen that my profiling in DMD was wrong: I had disabled the inlining (to have a less aggregated result), but this produced fake results, because in the true optimized run the inlining removes most of the overhead of the Kiss generator. So the true profiling is now like you say:

   Func
   Time

17703021  Yzlm.objectReputationUpdate
12709135  Yzlm.userReputationUpdate
 1385194  main

In this program I have seen that a good percentage of the speed difference between ldc/dmd is caused by foreach loops. When you iterate over structs longer than size_t, use ref (or in D2, better, ref const(...)):

foreach (Rating r; ratings)  ==>  foreach (ref const(Rating) r; ratings)

There is nothing algorithmically strange going on here, but to be sure you can take a look at the produced asm. This is the middle part (the two loops) of Yzlm.objectReputationUpdate compiled with ldc; it is one of the two main hot spots of the program, and you can compare this asm with the asm of the same part from the C++ version:

...
.LBB6_2:
    movl    (%eax), %edx
    movl    52(%esp), %edi
    movl    4(%edi), %ebx
    movsd   (%ebx,%edx,8), %xmm0
    mulsd   8(%eax), %xmm0
    movl    4(%eax), %edx
    movl    48(%esp), %ebx
    movl    4(%ebx), %ebx
    addsd   (%ebx,%edx,8), %xmm0
    movsd   %xmm0, (%ebx,%edx,8)
    movl    4(%eax), %edx
    movl    36(%esi), %ebx
    movsd   (%ebx,%edx,8), %xmm0
    movl    (%eax), %ebp
    movl    4(%edi), %edi
    addsd   (%edi,%ebp,8), %xmm0
    movsd   %xmm0, (%ebx,%edx,8)
    addl    $16, %eax
    decl    %ecx
    jne     .LBB6_2
.LBB6_3:
    movl    48(%esp), %eax
    movl    (%eax), %eax
    testl   %eax, %eax
    je      .LBB6_9
    movl    48(%esp), %ecx
    movl    4(%ecx), %ecx
    pxor    %xmm0, %xmm0
    xorl    %edx, %edx
    pxor    %xmm1, %xmm1
    .align  16
.LBB6_5:
    movl    36(%esi), %edi
    movsd   (%edi,%edx,8), %xmm2
    ucomisd %xmm1, %xmm2
    ja      .LBB6_7
    movsd   .LCPI6_0, %xmm2
.LBB6_7:
    movsd   (%ecx,%edx,8), %xmm3
    divsd   %xmm2, %xmm3
    movsd   %xmm3, (%ecx,%edx,8)
    movl    44(%esi), %edi
    subsd   (%edi,%edx,8), %xmm3
    mulsd   %xmm3, %xmm3
    addsd   %xmm3, %xmm0
    incl    %edx
    cmpl    %eax, %edx
    jne     .LBB6_5
...

I've also seen that the program gets a little faster with some unrolling, done like this:

const int UNROLL = 5;
assert(!(ratings.length % UNROLL));
assert(!(reputationUser.length % UNROLL));

for (int i; i < ratings.length; i += UNROLL) {
    foreach (k; Range!(UNROLL)) {
        auto r = &(ratings[i + k]);
        auto aux = (r.value - reputationObject[r.object]);
        reputationUser[r.user] += aux * aux;
    }
}

Where:

template Tuple(T...) { alias T Tuple; }

template Range(int stop) {
    static if (stop <= 0)
        alias Tuple!() Range;
    else
        alias Tuple!(Range!(stop-1), stop-1) Range;
}

But to make it work with arrays of arbitrary length you need a little more code, using a strategy similar to:

sum = 0;
for (i = 0; i < n - 3; i += 4)
    sum += A[i] + A[i+1] + A[i+2] + A[i+3];
for (; i < n; i++)
    sum += A[i];
I don't think it's the algorithm in any case (although it might be Phobos'
implementation) as working with Tango and LDC, changing between the Twister or
Kiss random number generators doesn't make for any meaningful performance
difference.<

Those Phobos and Tango implementations, when you use uniform(), are probably different: there are enforce() and nextafter() calls that are probably not used in that Tango code.
I noticed that when compiled with LDC the memory usage stays constant, but in
DMD the memory use blows up.  Is this a D1/D2 difference (with D2 having the
more advanced features that require preservation of memory) or is it a bug in
 one or the other of the compilers ... ?

D2 dynamic arrays have been changed recently; D1 dynamic arrays have not changed. LDC is still D1, so this is just a D1/D2 difference. The purpose of this change between D1 and D2 was to avoid bugs caused by "memory stomping" of array slices. Bye, bearophile
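A minimal illustration of the D2 behaviour being described (the second slice survives because the runtime refuses to stomp memory it may still reference):

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a;          // a second slice over the same data

    a.length = 2;         // shrink a's view; b still sees all four elements
    a ~= 99;              // D2 reallocates rather than overwrite b[2]

    assert(b == [1, 2, 3, 4]);  // b is unaffected
    assert(a == [1, 2, 99]);
}
```

Under D1 semantics the append could happen in place, overwriting `b[2]` -- exactly the stomping this change prevents.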
Apr 15 2010
parent bearophile <bearophileHUGS lycos.com> writes:
 This is the middle parts (the two loops) of Yzlm.objectReputationUpdate
compiled with ldc, this is one of the two main hot spots of the program, the
you can compared this asm with the asm of the same part from the C++ version:

To find why the C++ code is faster you can show me the equivalent C++ code that I can compile myself, or you can compile your C++ code and show me the asm of the two functions that are hot spots. Bye, bearophile
Apr 15 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
 - the two nested loops in the main are more efficient as ref double,
 this is something dmd will need to fix;

A test shows that on ldc the two nested loops are a little faster without the ref. I'd like the compiler to use a "const ref" when the foreach iterates on array items bigger than a word. I've done a related small test:

version (Tango)
    import tango.stdc.stdio: printf;
else
    import std.stdio: printf;

void main() {
    auto data = new double[1_000];
    double tot = 0.0;
    foreach (uint i, ref el; data)
        tot += el * i;
    printf("%f\n", tot);
}

Just the asm of the loop. Note how ldc keeps two iteration indexes, one in xmm0 and one in ESI, because it's faster this way. LCPI1_0 is ".quad 4607182418800017408", that is, the double value 1.0. (Here the presence of ref doesn't change the code produced by ldc, but in the two loops of the original code it makes a little difference.)

ldc -O5 -release -inline -enable-unsafe-fp-math -output-s test.d
dmd -O -release -inline test.d

-------------------------
ldc, with ref:

.LBB1_1:
    movapd  %xmm0, %xmm1
    mulsd   (%eax,%esi,8), %xmm1
    movsd   16(%esp), %xmm2
    addsd   %xmm1, %xmm2
    movsd   %xmm2, 16(%esp)
    incl    %esi
    cmpl    $1000, %esi
    addsd   .LCPI1_0, %xmm0
    jne     .LBB1_1

-------------------------
ldc, without ref:

.LBB1_1:
    movapd  %xmm0, %xmm1
    mulsd   (%eax,%esi,8), %xmm1
    movsd   16(%esp), %xmm2
    addsd   %xmm1, %xmm2
    movsd   %xmm2, 16(%esp)
    incl    %esi
    cmpl    $1000, %esi
    addsd   .LCPI1_0, %xmm0
    jne     .LBB1_1

-------------------------
dmd, with ref:

L45:
    mov     8[ESP],EDX
    xor     ESI,ESI
    fld     qword ptr [EDX*8][ECX]
    mov     0Ch[ESP],ESI
    mov     EBX,EDX
    inc     EDX
    fild    long64 ptr 8[ESP]
    fmulp   ST(1),ST
    fadd    qword ptr 018h[ESP]
    cmp     EDX,010h[ESP]
    fstp    qword ptr 018h[ESP]
    jb      L45

-------------------------
dmd, without ref:

L45:
    mov     8[ESP],EDX
    xor     ESI,ESI
    mov     EBX,EDX
    mov     0Ch[ESP],ESI
    fild    long64 ptr 8[ESP]
    fmul    qword ptr [EDX*8][ECX]
    inc     EDX
    cmp     EDX,010h[ESP]
    fadd    qword ptr 018h[ESP]
    fstp    qword ptr 018h[ESP]
    jb      L45

Bye,
bearophile
Apr 14 2010
prev sibling parent reply Don <nospam nospam.com> writes:
bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but you have to
import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number. Note that x^^2 doesn't require import std.math. (The fact that x ^^ 5 requires import std.math is a temporary limitation).
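A minimal sketch of the distinction Don describes, as the behavior stood in the compilers discussed in this thread (exact diagnostics may differ by compiler version):

```d
// Integer exponents are rewritten into multiplications by the
// compiler, so no import is needed for them:
void squares() {
    double x = 3.0;
    auto y = x ^^ 2;    // fine: lowered to x * x
}

// A floating-point exponent is lowered to a call to std.math.pow,
// so std.math must be visible at the use site:
import std.math;        // without this, x ^^ 0.5 is a compile error

void roots() {
    double x = 3.0;
    auto r = x ^^ 0.5;  // calls std.math.pow(x, 0.5) under the hood
}
```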
Apr 14 2010
next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but you 
 have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing; I always believed it to be a temporary solution.

I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

If the language made it possible to overload operators using free functions, I wouldn't mind if opBinary!"^^"(float, float) was implemented in std.math. The way it is now, it's a halfway built-in, halfway library feature, and just seems halfway altogether.

-Lars
Apr 14 2010
parent reply Don <nospam nospam.com> writes:
Lars T. Kyllingstad wrote:
 Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but 
 you have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing, I always believed it to be a temporary solution. I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

Because pow() for floating point, when implemented properly, is a HUGE function that ends up dragging almost all of std.math into the executable. And I think it's deceptive to do that silently. To make it completely built-in, basically all of std.math would need to be moved into druntime. Feel free to try to change my mind, of course.
 If the language made it possible to overload operators using free 
 functions, I wouldn't mind if opBinary!"^^"(float, float) was 
 implemented in std.math.  The way it is now, it's a halfway built-in, 
 halfway library feature, and just seems halfway altogether.

You can expect the integration to become cleaner; still, it's a compromise. It was a big fight to get it into the language at all, and I've tried to minimize the cost of it. ^^ is basically syntax sugar, and the price you pay for the additional tiny bit of syntax sugar (avoiding import std.math) has an appalling cost-benefit ratio. Raising to a float power is really a niche feature. Are there really many uses for ^^ of floats, where std.math isn't imported already (for example, where you don't even use abs()) ?
Apr 14 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Don:

Thank you for your answer and comments.

 Because pow() for floating point, when implemented properly, is a HUGE
 function, that ends up dragging almost all of std.math into the
 executable. And I think it's deceptive to do that silently.
 To make it completely built-in, basically all of std.math would need to
 be moved into druntime.

I have just read the source of pow() and indeed it's a complex function that needs several other functions of math.
Raising to a float power is really a niche feature.<

Using it as x^^0.5 to perform the square root is not a niche feature; square roots are common enough. And sqrt is an intrinsic, it doesn't need all of std.math.

Error messages are not good yet. This program:

import std.c.stdio: printf;
void main() {
    printf("%f\n", 2.5 ^^ 0.5);
}

Generates:

test.d(3): Error: must import std.math to use ^^ operator
test.d(3): Error: undefined identifier module test5.std
test.d(3): Error: no property 'math' for type 'void'
Error: no property 'sqrt' for type 'int'
test.d(3): Error: function expected before (), not __error of type int

A better alternative is a single, focused error:

test.d(4): Error: must import std.math to use "x ^^ floating point".

If I change the code a little the situation doesn't improve much:

import std.c.stdio: printf;
import std.math: sqrt;
void main() {
    printf("%f\n", 2.5 ^^ 0.5);
}

test5.d(4): Error: undefined identifier module test5.std
test5.d(4): Error: no property 'math' for type 'void'
Error: no property 'sqrt' for type 'int'
test5.d(4): Error: function expected before (), not __error of type int

So I can see various solutions:

1) Keep the situation as it is now, but improve the error messages (essentially printing only one error message that's more focused).

2) As in point (1), but x^^0.5 doesn't require importing std.math; x^^FP still requires the import.

3) Automatically import pow() in all modules, even if this causes the import of a lot of std.math. So no errors are generated.

4) Automatically import pow() in a module only if it contains an x^^FP where FP is a floating point != 0.5, so the code in most modules is not affected by std.math.

Regardless of what the final decision will be, this topic must be discussed in the D newsgroup; it's not something you have to decide by yourself (even though you are a very good programmer, Don). This subthread can even be moved to the main D newsgroup.

Bye,
bearophile
Apr 14 2010
next sibling parent Don <nospam nospam.com> writes:
bearophile wrote:
 Don:
 
 Thank you for your answer and comments.
 
 Because pow() for floating point, when implemented properly, is a HUGE
 function, that ends up dragging almost all of std.math into the
 executable. And I think it's deceptive to do that silently.
 To make it completely built-in, basically all of std.math would need to
 be moved into druntime.

I have just read the source of pow() and indeed it's a complex function that needs several other functions of math.

It's not even implemented correctly yet. A correct implementation is AT LEAST ten times larger.
 
 
 Raising to a float power is really a niche feature.<

Used it as x^^0.5 to perform the square root is not a niche feature, square roots are common enough. And sqrt is an intrinsic, it doesn't need all std.math.

 
 Error messages are not good yet, this program:
 
 
 import std.c.stdio: printf;
 void main() {
     printf("%f\n", 2.5 ^^ 0.5);
 }
 
 
 Generates:
 
 test.d(3): Error: must import std.math to use ^^ operator
 test.d(3): Error: undefined identifier module test5.std
 test.d(3): Error: no property 'math' for type 'void'
 Error: no property 'sqrt' for type 'int'
 test.d(3): Error: function expected before (), not __error of type int
 
 
 Alternative single error:
 
 test.d(4): Error: must import std.math to use "x ^^ floating point".

Please don't complain about this obvious stuff. Parasitic error suppression is an ongoing task in the compiler. It's gradually improving, and there's nothing special about this particular case.
 If I change the code a little the situation doesn't improve a lot:
 
 import std.c.stdio: printf;
 import std.math: sqrt;
 void main() {
     printf("%f\n", 2.5 ^^ 0.5);
 }
 
 
 test5.d(4): Error: undefined identifier module test5.std
 test5.d(4): Error: no property 'math' for type 'void'
 Error: no property 'sqrt' for type 'int'
 test5.d(4): Error: function expected before (), not __error of type int
 
 
 So I can see various solutions:
 
 1) Keep the situation as now, but improve the error messages (essentially
printing only one error message that's more focused).
 
 2) As the point (1), but in the case of x^^0.5 doesn't require the import
math; x^^FP requires the import still.
 
 3) Automatically import the pow() in all modules, even if this causes the
import of lot of std.math. So no errors are generated.
 
 4) Automatically import the pow() in a module only if it contains a x^^FP
where FP is a floating point != 0.5  So the code in most modules is not
affected by std.math.
 
 Regardless what the final decision will be, this topic must be discussed in
the D newsgroup, it's not something you have to decide for yourself (despite
you are a very programmer, Don). This subthread can even be moved to the main D
newsgroup.

I see it as a VERY minor issue, we don't need to deal with it for a long time.
Apr 14 2010
prev sibling parent Justin Spahr-Summers <Justin.SpahrSummers gmail.com> writes:
On Wed, 14 Apr 2010 05:58:55 -0400, bearophile 
<bearophileHUGS lycos.com> wrote:
 
 Don:
 
Raising to a float power is really a niche feature.<

Used it as x^^0.5 to perform the square root is not a niche feature, square roots are common enough. And sqrt is an intrinsic, it doesn't need all std.math.

I think I've only ever used a square root once, and that was to get the distance between two points. I'm not sure that square roots are common enough in general code to justify an implicit import (especially if we think of D in its context as a systems programming language).
Apr 14 2010
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Don wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but 
 you have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing, I always believed it to be a temporary solution. I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

Because pow() for floating point, when implemented properly, is a HUGE function, that ends up dragging almost all of std.math into the executable. And I think it's deceptive to do that silently. To make it completely built-in, basically all of std.math would need to be moved into druntime. Feel free to try to change my mind, of course.

Exponentiation is a built-in operation in FORTRAN, so I made this little program to check:

program test
    implicit none
    real :: base, expo, pow

    write (*,*) "Base:"
    read (*,*) base
    write (*,*) "Exponent:"
    read (*,*) expo

    pow = base**expo
    write (*,*) pow
end program test

The produced executable is 11K. If I replace exponentiation with multiplication, it is still 11K. Why wouldn't the same be possible in D?

Note that I'm not trying to argue with you; I am sure you know what you're talking about. I know very little of what's going on under the hood of either compiler, so I'm asking out of curiosity.
 If the language made it possible to overload operators using free 
 functions, I wouldn't mind if opBinary!"^^"(float, float) was 
 implemented in std.math.  The way it is now, it's a halfway built-in, 
 halfway library feature, and just seems halfway altogether.

You can expect the integration to become cleaner; still, it's a compromise. It was a big fight to get it into the language at all, and I've tried to minimize the cost of it. ^^ is basically syntax sugar, and the price you pay for the additional tiny bit of syntax sugar (avoiding import std.math) has an appalling cost-benefit ratio. Raising to a float power is really a niche feature. Are there really many uses for ^^ of floats, where std.math isn't imported already (for example, where you don't even use abs()) ?

It's the inconsistency with the other operations on built-in types that bothers me, not the hassle of writing 'import std.math'. But please don't get me wrong, I absolutely prefer the current solution over nothing. Now that I've gotten used to it I'm never going back. :) For fun I grepped my numerics library for uses of ^^, and found I've used it >65 times (65 is the line count, and it's used twice in some lines). And let it be known that the majority of the cases are float^^float! ;) -Lars
Apr 14 2010
parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Lars T. Kyllingstad wrote:
 Don wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but 
 you have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing, I always believed it to be a temporary solution. I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

Because pow() for floating point, when implemented properly, is a HUGE function, that ends up dragging almost all of std.math into the executable. And I think it's deceptive to do that silently. To make it completely built-in, basically all of std.math would need to be moved into druntime. Feel free to try to change my mind, of course.

Exponentiation is a built-in operation in FORTRAN, so I made this little program to check:

program test
    implicit none
    real :: base, expo, pow

    write (*,*) "Base:"
    read (*,*) base
    write (*,*) "Exponent:"
    read (*,*) expo

    pow = base**expo
    write (*,*) pow
end program test

The produced executable is 11K. If I replace exponentiation with multiplication, it is still 11K. Why wouldn't the same be possible in D?

Scratch that, I just remembered that libgfortran is dynamically linked in by default. However, compiling with -static-libgfortran makes the executable 155K with both exponentiation and multiplication. -Lars
Apr 14 2010
parent reply Don <nospam nospam.com> writes:
Lars T. Kyllingstad wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but 
 you have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing, I always believed it to be a temporary solution. I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

Because pow() for floating point, when implemented properly, is a HUGE function, that ends up dragging almost all of std.math into the executable. And I think it's deceptive to do that silently. To make it completely built-in, basically all of std.math would need to be moved into druntime. Feel free to try to change my mind, of course.

Exponentiation is a built-in operation in FORTRAN, so I made this little program to check:

program test
    implicit none
    real :: base, expo, pow

    write (*,*) "Base:"
    read (*,*) base
    write (*,*) "Exponent:"
    read (*,*) expo

    pow = base**expo
    write (*,*) pow
end program test

The produced executable is 11K. If I replace exponentiation with multiplication, it is still 11K. Why wouldn't the same be possible in D?

Scratch that, I just remembered that libgfortran is dynamically linked in by default. However, compiling with -static-libgfortran makes the executable 155K with both exponentiation and multiplication. -Lars

I have a vague recollection that correctly-rounded pow() will require bigint (can't quite remember, though). I'm also concerned about build tools -- I don't want them to have to know about the dependency. As a bare minimum, the error message will need to improve (with some explanation of why std.math is required, to reduce the WTF factor). But in any case, it's a very minor issue. Personally, I'm finding that having ^^ as standard nomenclature is extremely useful, even apart from the use in code.
Apr 14 2010
parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 14/04/10 20:54, Don wrote:
 I have a vague recollection that correctly-rounded pow() will require
 bigint (can't quite remember, though). I'm also concerned about build
 tools -- I don't want them to have to know about the dependency.
 As a bare minimum, the error message will need to improve (with some
 explanation of why std.math is required, to reduce the WTF factor).
 But in any case, it's a very minor issue.
 Personally, I'm finding that having ^^ as standard nomenclature is
 extremely useful, even apart from the use in code.

I've always disliked the requirement to import std.math, but only because of its lack of flexibility... Surely it would be better to require the pow() function to be defined somewhere; this way, if/when Tango is ported to D2, it doesn't have to hack around this to allow users to use ^^.
Apr 14 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Robert Clipsham Wrote:

 this way if/when 
 tango is ported to D2 it doesn't have to hack around this to allow it to 
 allow users to use ^^.

I hope Tango2 will be designed to be installed beside Phobos2, and not in place of it. Bye, bearophile
Apr 14 2010
prev sibling parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Don wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but 
 you have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing, I always believed it to be a temporary solution. I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

Because pow() for floating point, when implemented properly, is a HUGE function, that ends up dragging almost all of std.math into the executable. And I think it's deceptive to do that silently. To make it completely built-in, basically all of std.math would need to be moved into druntime. Feel free to try to change my mind, of course.

Is there a better way to do pow() for floating point numbers without importing std.math? I see this:

1. You want to do x ^^ fp.
2. The compiler complains, saying "if you want to do that, import std.math. I'm telling you this just to let you know you'll be importing the whole module".

Alternative 1 for the user: the user says "Ok, I import std.math".

Alternative 2 for the user: the user says "No way I'm importing std.math just to compute a pow. But... I still *need* to compute that pow... what is your advice, mr. compiler?"
Apr 15 2010
parent reply Don <nospam nospam.com> writes:
Ary Borenszweig wrote:
 Don wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but 
 you have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing, I always believed it to be a temporary solution. I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

Because pow() for floating point, when implemented properly, is a HUGE function, that ends up dragging almost all of std.math into the executable. And I think it's deceptive to do that silently. To make it completely built-in, basically all of std.math would need to be moved into druntime. Feel free to try to change my mind, of course.

Is there a better way to do pow() for floating point numbers without importing std.math? I see this: 1. You want to do x ^^ fp. 2. The compiler complains saying "if you want to do that, import std.math. I'm telling you this just to let you know you'll be importing the whole module". Alternative 1 for user: User says "Ok, I import std.math" Alternative 2 for user: User says "No way I'm importing std.math just to make a pow. But... I still *need* to make that pow... what is your advice, mr. compiler?"

That's a good point, it should be possible to use a static import as well. I do think it's pretty odd to be doing floating point without importing std.math, though. I mean, abs() is absolutely fundamental.
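For illustration, the qualified-import form Don is referring to would make the dependency explicit without pulling every std.math name into scope (a sketch; whether ^^ accepts a static import depends on how the compiler's lowering looks up pow, which is exactly the point under discussion):

```d
static import std.math;  // no names are injected into the current scope

void main() {
    double x = 2.5;
    auto r = x ^^ 0.5;              // still lowered to a call to std.math.pow
    auto p = std.math.pow(x, 3.0);  // other std.math symbols need full qualification
}
```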
Apr 15 2010
parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Don wrote:
 Ary Borenszweig wrote:
 Don wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 (but 
 you have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing, I always believed it to be a temporary solution. I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

Because pow() for floating point, when implemented properly, is a HUGE function, that ends up dragging almost all of std.math into the executable. And I think it's deceptive to do that silently. To make it completely built-in, basically all of std.math would need to be moved into druntime. Feel free to try to change my mind, of course.

Is there a better way to do pow() for floating point numbers without importing std.math? I see this: 1. You want to do x ^^ fp. 2. The compiler complains saying "if you want to do that, import std.math. I'm telling you this just to let you know you'll be importing the whole module". Alternative 1 for user: User says "Ok, I import std.math" Alternative 2 for user: User says "No way I'm importing std.math just to make a pow. But... I still *need* to make that pow... what is your advice, mr. compiler?"

That's a good point, it should be possible to use a static import as well. I do think it's pretty odd to be doing floating point without importing std.math, though. I mean, abs() is absolutely fundamental.

But if you do a static import, the whole module gets linked in, right? My point is, if you are going to use pow, you will need std.math, so it'll always be a burden to import it by hand when using it. ^^
Apr 15 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Ary Borenszweig:
 My point is, if you are going to pow, you will need std.math, so it'll 
 always be a burden to import it by hand when using it. ^^

Can the automatic import happen only iff a module uses the ^^fp? Bye, bearophile
Apr 15 2010
parent Ary Borenszweig <ary esperanto.org.ar> writes:
bearophile wrote:
 Ary Borenszweig:
 My point is, if you are going to pow, you will need std.math, so it'll 
 always be a burden to import it by hand when using it. ^^

Can the automatic import happen only iff a module uses the ^^fp? Bye, bearophile

That's what I'm asking for.
Apr 16 2010
prev sibling parent reply Don <nospam nospam.com> writes:
Ary Borenszweig wrote:
 Don wrote:
 Ary Borenszweig wrote:
 Don wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 
 (but you have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing, I always believed it to be a temporary solution. I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

Because pow() for floating point, when implemented properly, is a HUGE function, that ends up dragging almost all of std.math into the executable. And I think it's deceptive to do that silently. To make it completely built-in, basically all of std.math would need to be moved into druntime. Feel free to try to change my mind, of course.

Is there a better way to do pow() for floating point numbers without importing std.math? I see this: 1. You want to do x ^^ fp. 2. The compiler complains saying "if you want to do that, import std.math. I'm telling you this just to let you know you'll be importing the whole module". Alternative 1 for user: User says "Ok, I import std.math" Alternative 2 for user: User says "No way I'm importing std.math just to make a pow. But... I still *need* to make that pow... what is your advice, mr. compiler?"

That's a good point, it should be possible to use a static import as well. I do think it's pretty odd to be doing floating point without importing std.math, though. I mean, abs() is absolutely fundamental.

But if you do a static import the whole module gets linked in, right? My point is, if you are going to pow, you will need std.math, so it'll always be a burden to import it by hand when using it. ^^

But you're assuming that you're using ^^ without using anything else from std.math. I think that's a very obscure case. For example, any code which is ported from C, C++, or D1 that uses pow will already be importing std.math. In cases where you see that you could use ^^ but aren't already using pow() (e.g., where you see z = x*x + y*y), you will need to add an import.
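Spelling out the kind of rewrite Don means, where introducing ^^ into code that never touched pow() suddenly requires the import (a sketch; the distance function here is just an illustrative example):

```d
import std.math;  // required only by the ^^ 0.5 below; the integer
                  // exponents on the previous line don't need it

double distance(double x, double y) {
    // before the rewrite this was: z = x*x + y*y; r = sqrt(z);
    auto z = x ^^ 2 + y ^^ 2;  // integer exponent: no std.math required
    return z ^^ 0.5;           // floating-point exponent: requires std.math
}
```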
Apr 16 2010
parent Ary Borenszweig <ary esperanto.org.ar> writes:
Don wrote:
 Ary Borenszweig wrote:
 Don wrote:
 Ary Borenszweig wrote:
 Don wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 bearophile wrote:
 So far I've just given a light reading of the code. Notes:

 - pow(x, 2) and sqrt(y) can be written as x ^^ 2 and y ^^ 0.5 
 (but you have to import std.math anyway, because of a bug).

That's not a bug. It's intentional. x ^^ y will probably always require import std.math, if y is a floating point number.

Really? Why is that? I find that kind of disappointing, I always believed it to be a temporary solution. I think the inconsistency with the other operators will make this a major WTF for people new to the language. Why should a^^b require an explicit import while a*b doesn't?

Because pow() for floating point, when implemented properly, is a HUGE function, that ends up dragging almost all of std.math into the executable. And I think it's deceptive to do that silently. To make it completely built-in, basically all of std.math would need to be moved into druntime. Feel free to try to change my mind, of course.

Is there a better way to do pow() for floating point numbers without importing std.math? I see this: 1. You want to do x ^^ fp. 2. The compiler complains saying "if you want to do that, import std.math. I'm telling you this just to let you know you'll be importing the whole module". Alternative 1 for user: User says "Ok, I import std.math" Alternative 2 for user: User says "No way I'm importing std.math just to make a pow. But... I still *need* to make that pow... what is your advice, mr. compiler?"

That's a good point, it should be possible to use a static import as well. I do think it's pretty odd to be doing floating point without importing std.math, though. I mean, abs() is absolutely fundamental.

But if you do a static import the whole module gets linked in, right? My point is, if you are going to pow, you will need std.math, so it'll always be a burden to import it by hand when using it. ^^

But you're assuming that you're using ^^ without using anything else from std.math. I think that's a very obscure case. For example, any code which is ported from C or C++, or D1, that uses pow, will already be importing std.math. Cases where you see that you could use ^^ that isn't already using pow() (eg, where you see z = x*x + y*y), you will need to add an import.

- Oh, hi Foo, I want to show you this great language D. You can do blah and bleh.
- Cool, I will code something little to see how it is. ... Hey, WTF, I use the pow operator and it tells me I need to add an import? WTF??

You know, people program from scratch; they don't spend their lives porting C and C++ code... ^^
Apr 16 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Don:

 That's not a bug. It's intentional. x ^^ y will probably always require 
 import std.math, if y is a floating point number.

What's the rationale behind this decision? I have filed it as a bug many days ago, because I think it's a wrong intention: http://d.puremagic.com/issues/show_bug.cgi?id=3832 Having to import all names from the math module to use a built-in operator is unintuitive, not handy, and sounds silly. Bye, bearophile
Apr 14 2010
prev sibling next sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:

Don wrote:
 The D code compiles directly to a memset and memcpy, which are
 intrinsics. There's no way C++ could beat it.

OK. I'm just bemused about why -- having solved the memory leak -- my D code is running significantly slower than equivalent C++ code.

For now I'm just playing with D by doing a rewrite of some C++ code I did for a project a while back -- to see if I can write a neater version of the code in the new language, and learn something about D in the process. I may have fallen into the trap you described earlier, of just 'translating' C++-style code directly into D, but I don't think that's the issue here. There are a couple of algorithmic inefficiencies in there for deliberate reasons, but removing them for testing purposes doesn't improve matters. The array append is slow, but it's not the major part of the running time of the program.

For the sake of clarity, I've attached the complete code. The really important part to look at is the file yzlm.d, in the opCall() function and the class functions objectReputationUpdate() and userReputationUpdate(). This is a reputation-generating process which takes a set of ratings by users of objects, and uses them to iteratively recalculate user reputation and re-aggregate the ratings taking this into account.

I don't see anything particularly bad about the way these algorithms are written, but maybe they are imperfectly D-ish ... :-)

Best wishes,

    -- Joe
Apr 13 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
I'm catching up with the posts.

Joseph Wakeling:

 As for iteration, I don't know to what degree D's foreach() across
 arrays compares to for() commands in C++ -- I guess it should be pretty
 close in performance, no?

As you have seen from my benchmarks, when with dmd you use:

foreach (x; arr)

and x is longer than a word, the loop is not as fast as it can be. When x is longer than a word it's often more efficient to use a ref, especially in the inner loop:

foreach (ref const(double) x; arr)

With the ldc compiler the difference between using or not using a ref is zero, or sometimes using ref slows down the foreach a little. I have not filed enhancement requests about this because I think it's useless.

Bye,
bearophile
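The two loop forms being compared, side by side (a sketch; the speed difference only matters for element types larger than a machine word, and varies by compiler, as noted above):

```d
import std.stdio: writefln;

void main() {
    auto arr = new double[1_000];
    arr[] = 1.5;

    double tot1 = 0.0;
    foreach (x; arr)                   // each element is copied into x
        tot1 += x;

    double tot2 = 0.0;
    foreach (ref const(double) x; arr) // x aliases the element, no copy
        tot2 += x;

    // Same result either way; only the generated code may differ.
    writefln("%s %s", tot1, tot2);
}
```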
Apr 14 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 14 Apr 2010 17:19:53 -0400, Joseph Wakeling  
<joseph.wakeling webdrake.net> wrote:


 /////////////////////////////////////////////////////////////////////////
 version(Tango) {
     import tango.stdc.stdio: printf;
 } else {
     import std.stdio: printf;
 }

 void main()
 {
     double[] x;

     for(uint i=0;i<100;++i) {
         x.length = 0;

         for(uint j=0;j<5_000;++j) {
             for(uint k=0;k<1_000;++k) {
                 x ~= j*k;
             }
         }

         printf("At iteration %u, x has %u elements.\n",i,x.length);
     }
 }
 /////////////////////////////////////////////////////////////////////////

 I noticed that when compiled with LDC the memory usage stays constant,
 but in DMD the memory use blows up.  Is this a D1/D2 difference (with D2
 having the more advanced features that require preservation of memory)
 or is it a bug in one or the other of the compilers ... ?

It is because D2 is more advanced in preserving memory.  In D1, setting
the length to zero is equivalent to also assuming it can append safely.

One thing I forgot about: there is currently a design flaw in D2 array
appending which will keep around a certain number of arrays even after
all references have been removed (i.e. the arrays should be collectable,
but they are not collected by the GC).  I have a clear path on how to
fix that, but I haven't done it yet.  See this bugzilla report:

http://d.puremagic.com/issues/show_bug.cgi?id=3929

This might also help explain why the memory blows up.

-Steve
Apr 15 2010
prev sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:
bearophile wrote:
 Robert Clipsham Wrote:
 
 this way if/when 
 tango is ported to D2 it doesn't have to hack around this to allow it to 
 allow users to use ^^.

I hope Tango2 will be designed to be installed beside Phobos2, and not in place of it.

I thought that was the aim ... ? At least according to D's Wikipedia page: http://en.wikipedia.org/wiki/D_%28programming_language%29#Division_concerning_the_standard_library
Apr 15 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 10 Apr 2010 18:17:12 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 This can help confuse your mind a bit:

 import std.stdio: writeln;
 void main() {
     {
         int[] arr = [0, 1, 2, 3, 4, 5, 6].dup;
         int[] slice = arr[2 .. 4];
         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2  
 3 4 5 6 | 2 3
         slice ~= 10; slice ~= 20;
         writeln("arr.capacity, slice.capacity: ", arr.capacity, " ",  
 slice.capacity); // arr.capacity, slice.capacity: 7 7
         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2  
 3 4 5 6 | 2 3 10 20
     }

     {
         int[] arr = [0, 1, 2, 3, 4, 5, 6].dup;
         int[] slice = arr[2 .. 4];
         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2  
 3 4 5 6 | 2 3

         slice.assumeSafeAppend;
         slice ~= 10; slice ~= 20; // causes stomping
         writeln("arr.capacity, slice.capacity: ", arr.capacity, " ",  
 slice.capacity); // arr.capacity, slice.capacity: 0 5
         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2  
 3 10 20 6 | 2 3 10 20
         slice ~= 30; slice ~= 40;
         writeln("arr.capacity, slice.capacity: ", arr.capacity, " ",  
 slice.capacity); // arr.capacity, slice.capacity: 7 7
         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2  
 3 10 20 30 | 2 3 10 20 30 40
     }
 }


 The  slice.capacity = 7 in the first case is just a coincidence, it's  
 the result of the overallocation.

Yes, to allocate an array of 4 integers, you need 16 bytes, but there needs to be a padding byte to prevent cross-block pointer problems, so it uses a 32-byte block. The 32-byte block also needs a padding byte, so really you can only put in 7 elements, not 8.
 But I don't know why arr.capacity is zero and then seven in the second  
 and third case.

0 means that if you append to the array, it will reallocate.  It will
also return 0 for stack-allocated arrays.  This makes sense since the
slice has taken over the "allocated" length of the block.

Essentially, capacity indicates how many elements can be appended.  The
function gives up and returns 0 if it determines the array does not end
at the allocated part of a block.  Technically, I could return the
length of the array, but I'm not sure whether that is as useful.

For fun, add one more element to slice, and arr will then have a valid
capacity :)

FYI, using arr after assuming safe append on the slice is undefined.

-Steve
Apr 10 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 10 Apr 2010 17:19:17 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Few dynamic array benchmarks that can show you more differences between  
 C++ and D2.

 Timings dmd, N=5_000_000, NLOOPS=15, T=double, seconds:
   C++ v1: 0.67
   C++ v1: 0.72
   D v1:   6.47 (uses >1 GB RAM)
   D v2:   0.76
   D v3:   5.25

 dmd v2.043, dmd -O -release -inline
 g++ 4.4.1, -O3 -Wall -s

That is to be expected.  The append function is not the most efficient
thing.  It still must do a lot of work just to look up the valid length
and verify appending can be done, whereas the v2 "append" is a single
instruction!

The append function is good when you don't know what the length of an
array is, and you are appending a small number of elements, or another
array.  I don't advise using it for one-at-a-time appending if you know
you have a large number of elements to append.  A better method is to
set the length, and then write the data.

-Steve
Apr 10 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 11 Apr 2010 12:50:11 -0400, Joseph Wakeling  
<joseph.wakeling gmail.com> wrote:

 I was very happy to see that D _does_ have a 'reserve' function for  
 arrays,
 which I had been missing compared to C++ (it's not mentioned in the array
 docs).

It's new. It probably should be mentioned in the spec, but it's a druntime thing.
 Still, I don't think that pre-reserving the memory per se is the  
 influencing
 factor on the differences in performance.

No, and like bearophile said, the D array is a compromise between performance and flexibility. There are amazing ways to use D arrays that you could never do with C++ vectors.
 Note that in C++ the memory is not preassigned either.  The difference
 between the performance of these pieces of code is striking -- on my
 machine the D example takes about 70--75 seconds to run, whereas the
 C++ example speeds through it in 10s or less.

The C++ example is reallocating memory, freeing memory it is no longer
using.  It also manually handles the memory management, allocating
larger and larger arrays in some algorithmically determined fashion (for
example, multiplying the length by some constant factor).  This gives it
an edge in performance because it does not have to do any costly lookup
to determine if it can append in place, plus the realloc of the memory
is probably cheaper than the GC realloc of D.

If you want to compare apples to apples (well, probably more like red
apples to green apples), you need to do these things in a struct for D.
I had thought the D appender class would do the trick, but as you stated
below, it's even slower.  This needs to be remedied.
 D also uses about 20% more memory than the C++ even though the C++ code
 declares a higher capacity for the vector (above 8 million) than D does
 for the array (a little over 5 million).

D does not assume you stopped caring about the memory being pointed to
when it had to realloc.  Therefore, it leaves it around in case you are
still using it.  You can also expect more overhead in the GC because it
tends to hang on to memory in case it wants to use it again, or because
it hasn't collected it yet.  This is true of most GC-based languages.
For example, you can do this in D:

int[] arr = new int[50];
int *arrelem = &arr[5];
for(int i = 0; i < 10000; i++)
   arr ~= i;
*arrelem = 12345; // valid.

You can't do the same thing with C++ vectors: when they reallocate, the
memory they used to own could be freed.  This invalidates all pointers
and iterators into the vector, but the language doesn't prevent you from
having such dangling pointers.  Using one of them can result in memory
corruption, one of the worst bugs.  D tries to eliminate as many memory
corruption problems as possible.
 I don't know if it was naive to assume that D's dynamic arrays would be
 equivalent to vectors in C++.  But it's clear that the D array appender
 ~= is much slower than the C++ vector push_back().

But is much safer, and supports safe slicing. Compromises.
 Using an Appender class and put() (from std.array) is even slower,
 despite the std.array docs recommending this over ~. :-(

This must be fixed, the appender should be blazingly fast at appending (almost as fast as C++), with the drawback that the overhead is higher.
 It's disappointing because I'm loving so much of D's syntax, but I can
 write far more efficient code (with less explicit memory management)
 in C++ ...

You haven't done much with it yet.  When you start discovering how much
D takes care of, you will be amazed :)  Array appending isn't the
fastest operation, but it is safe, and safety can be worth infinitely
more than speed sometimes.

The thing about D is it *can* be fast and unsafe, just as fast and
unsafe as C, but that's not the default.

-Steve
Apr 11 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 12 Apr 2010 12:03:38 -0400, Joseph Wakeling  
<joseph.wakeling gmail.com> wrote:

 I thought dev effort was now focusing back on GDC ... ? :-P

AFAIK, gdc hasn't been actively developed for a few years. ldc, on the other hand, has regular releases. I think ldc may be the future of D compilers, but I currently use dmd since I'm using D2.
 Steven Schveighoffer wrote:
 The C++ example is reallocating memory, freeing memory it is no longer
 using.  It also manually handles the memory management, allocating  
 larger
 and larger arrays in some algorithmically determined fashion (for  
 example,
 multiplying the length by some constant factor).  This gives it an edge  
 in
 performance because it does not have to do any costly lookup to  
 determine
 if it can append in place, plus the realloc of the memory probably is
 cheaper than the GC realloc of D.

Right. In fact you get precisely 24 allocs/deallocs, each doubling the memory reserve to give a total capacity of 2^23 -- and then that memory is there and can be used for the rest of the 100 iterations of the outer loop. The shock for me was finding that D wasn't treating the memory like this but was preserving each loop's memory (as you say, for good reason).

Yes, you get around this by preallocating.
 D does not assume you stopped caring about the memory being pointed to
 when it had to realloc. [...] You can't do the same thing with C++
 vectors, when they reallocate, the memory they used to own could be
 freed.  This invalidates all pointers and iterators into the vector,
 but the language doesn't prevent you from having such dangling pointers.

I have a vague memory of trying to do something exactly like your example when I was working with C++ for the first time, and getting bitten on the arse by exactly the problem you describe. I wish I could remember where. I know that I found another (and possibly better) solution to do what I wanted, but it would be nice to see if a D-ish solution would give me something good.

It's often these types of performance discrepancies that critics point
to (not that you are a critic), but it's the cost of having a more
comprehensive language.  Your appetite for the sheer performance of a
language will sour once you get bit by a few of these nasty bugs.  But D
fosters a completely different way of thinking about solving problems.

One problem with C++'s vector is that it is a value type -- you must
pass a reference in order to avoid copying an entire vector.  However,
D's arrays are a hybrid between reference and value type.  Often, once
you set data in a vector/array, you never change it again.  D allows
ways to enforce this (i.e. immutable) and also allows you to pass around
"slices" of your array with zero overhead (no copying).  It results in
some extremely high-performance code, which wouldn't be easy, or maybe
even possible, with C++.

Take for instance a split function.  In C++, I'd expect split(string x)
to return a vector<string>.  However, vector<string> makes a copy of
each part of the string it has split out.  D, however, can return
references to the original data (slices), which consume no overhead.
The only extra space allocated is the array to hold the string
references.  All this is also completely safe!  You could then even
modify the original string (assuming you were not using immutable
strings) in place!  Or append to any one of the strings in the array
safely.
 This must be fixed, the appender should be blazingly fast at appending
 (almost as fast as C++), with the drawback that the overhead is higher.

Overhead = memory cost? I'm not so bothered as long as the memory stays within constant, predictable bounds. It was the memory explosion that scared me. And I suspect I'd pay a small performance cost (though it would have to be small) for the kind of safety and flexibility the arrays have.

Overhead = bigger initialization cost, bigger memory footprint.  It's
not important if you are building a large array (which is what appender
should be for), but the cost would add up if you had lots of little
appenders that you didn't append much to.

The point is, the builtin array optimizes performance for operations
besides append, but allows appending as a convenience.  Appender should
optimize appending, sacrificing performance in other areas.  It all
depends on your particular application whether you should use appender
or builtin arrays (or something entirely different/custom).
 You haven't done much with it yet.  When you start discovering how much  
 D
 takes care of, you will be amazed :)

I know. :-) My needs are in some ways quite narrow -- numerical simulations in interdisciplinary physics -- hence the C background, and hence the premium on performance. They're also not very big programs -- simple enough for me to generally keep a personal overview on the memory management, even though with C++ that's usually all taken care of automatically (no new or delete statements if I can avoid it).

There are many in the community that use D for numerical stuff. It's definitely not as mature as it could be, but getting better. Don is adding a lot of cool stuff to it, including a builtin exponent operator and arbitrary precision numbers.
 The thing about D is it *can* be fast and unsafe, just as fast and  
 unsafe
 as C, but that's not the default.

That's apparent -- I mean, given that D wraps the whole C standard library, I could basically write C code in D if I wanted, no?

Yes, but that's not what I meant ;) I mean, you can write your own types, like the Appender (or what the appender *should* be) that optimize the behavior of code to meet any needs. And it can do it with a much better syntax than C. D's template system and ability to make user-types seem like builtins I think is unparalleled in C-like languages. -Steve
Apr 12 2010
prev sibling next sibling parent Brad Roberts <braddr puremagic.com> writes:
On Mon, 12 Apr 2010, Joseph Wakeling wrote:

 Curious question: how come with a registered email address my messages
 have to be moderated, whereas via the http interface I can post straight
 away? :-P
 
 Best wishes,
 
     -- Joe

All new subscribers to the lists start off moderated until a post flows through that isn't obvious spam. As soon as I see that, I remove the moderated flag. The list is a much bigger spam target than the news server is. I deflect dozens of spam messages a day, some of which are clever enough to register for the list first (not daily, but probably monthly). Later, Brad
Apr 12 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 12 Apr 2010 15:36:19 -0400, Joseph Wakeling  
<joseph.wakeling webdrake.net> wrote:

 Joseph Wakeling wrote:
 (Actually I'm still having some issues with this, despite using
 assumeSafeAppend, but more on that in a separate email.)

... solved; it's interesting to note that assumeSafeAppend has to be
used in _exactly_ the same scope as the append ~= itself.  e.g. with
this loop, the memory blows up:

////////////////////////////////////////////////////////////////
foreach(uint i;0..100) {
    x.length = 0;
    assumeSafeAppend(x);

    foreach(uint j;0..5000)
        foreach(uint k;0..1000)
            x ~= j*k;

    writefln("At iteration %u, x has %u elements.",i,x.length);
}
////////////////////////////////////////////////////////////////

This should work as you expect. Can you produce a full code example? Are you reserving space in x before-hand?
 ... while with this one it's OK:

     ////////////////////////////////////////////////////////////////
     foreach(uint i;0..100) {
         x.length = 0;

         foreach(uint j;0..5000) {
             foreach(uint k;0..1000) {
                 assumeSafeAppend(x);
                 x ~= j*k;
             }
         }

         writefln("At iteration %u, x has %u elements.",i,x.length);
     }
     ////////////////////////////////////////////////////////////////

The first assumeSafeAppend is the only one that matters. Subsequent ones are wasted cycles. I'm unsure why this one works and the other doesn't. Again, I need full compilable examples. -Steve
Apr 13 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 13 Apr 2010 08:24:57 -0400, Joseph Wakeling  
<joseph.wakeling webdrake.net> wrote:

 Steven Schveighoffer wrote:
 This should work as you expect.  Can you produce a full code example?
 Are you reserving space in x before-hand?

 Weird; now it works.  I wonder if it's because I upgraded to 2.043
 ... ?  I understand there was a fix in there for array append.

2.043 has fixed a corruption bug, one which you are unlikely to have triggered with a simple program like you wrote. However, 2.041 had a severe corruption bug, which any array could fall victim to. If you upgraded from 2.041, it's quite possible that fixed the problem. If you just upgraded from 2.042, I think this may be a case of stale object/binary files ;)
 On the other hand, if the assumeSafeAppend takes place outside the loops
 entirely, blow-up occurs -- because the command is not issued after the
 array resize to zero?  I've attached examples.

Yes, assumeSafeAppend must be called every time you want to overwrite
memory.  The runtime cannot tell whether you still have references to
the memory that was in use which you don't want to be changed by
appending.  So it makes a conservative decision: "hey, this memory was
in use, I'm not sure if it's in use anymore, so I'll reallocate rather
than stomp on it."  assumeSafeAppend tells the runtime "yes, it was in
use, but I'm no longer using it, so it is safe to stomp on it."

However, you only have to call it once per loop, because a single
assumeSafeAppend assumes the remainder of the memory block after the
array you pass as an argument is ok to stomp on.

-Steve
Apr 13 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 13 Apr 2010 08:30:57 -0400, Joseph Wakeling  
<joseph.wakeling webdrake.net> wrote:

 Joseph Wakeling wrote:
 On the other hand, if the assumeSafeAppend takes place outside the loops
 entirely, blow-up occurs -- because the command is not issued after the
 array resize to zero?  I've attached examples.

If I include a line x.reserve(5_000_000); before the assumeSafeAppend, the memory does not explode indefinitely -- but it uses about 3x as much memory as the 'NoLeak' code even when the latter lacks advance reservation of memory.

x.reserve(5_000_000) will reserve the memory for the first loop.  But
the second time through the loop will start reallocating from scratch
unless you tell the runtime it's safe to reuse that memory.  Most likely
it uses less memory than the NoLeak code because it has a nice
contiguous 40MB region to use in subsequent loops.  Because of the size
of the array, it will naturally gravitate towards that region, and once
it finds it, it will happily grow in place for a while.

assumeSafeAppend is needed to avoid reallocating the memory when setting
the length back to 0.  It is an optimization designed specifically for
this purpose.  My recommended approach for your little code sample is
this:

import std.array;
import std.stdio;

void main()
{
    double[] x;
    x.reserve(5_000_000); // heads up, runtime, I'm about to use 40MB of contiguous space

    foreach(uint i;0..100) {
        x.length = 0;
        assumeSafeAppend(x); // you can reuse the memory I just got done filling, I don't need it anymore

        foreach(uint j;0..5_000) {
            foreach(uint k;0..1_000) {
                x ~= j*k;
            }
        }

        writefln("At iteration %u, x has %u elements.",i,x.length);
    }
}

If you don't use reserve, here is what happens during the loop:

1st append: The smallest block possible to hold 1 double is found and
allocated.
2nd append: If the block holding the first double can hold 2, the
"allocated" size of the block is increased to 2 doubles.  Otherwise, the
smallest block possible to hold 2 doubles is found and allocated.
...

Which means the first loop will consume extra memory on its way to
building the full array.  With reserve, the array is preallocated, so
every append is able to simply update the allocated length instead of
reallocating, increasing performance and saving memory.

Note that every time a reallocation occurs because the block isn't big
enough, the old block is left behind until the GC cleans it up.  But
even when the GC cleans it up, it doesn't necessarily release the memory
back to the OS, so your program may still be consuming that memory.
-Steve
Apr 13 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 13 Apr 2010 08:56:03 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:


 If you don't use reserve, here is what happens during the loop:

 1st append: The smallest block possible to hold 1 double is found and  
 allocated
 2nd append: If the block holding the first double can hold 2, the  
 "allocated" size of the block is increased to 2 doubles.  Otherwise, the  
 smallest block possible to hold 2 doubles is found and allocated.
 ...

Assuming you may ask questions about this, it's not an exact description of what happens. The append code and GC are smarter about it than this, I just wanted to explain it in an easy-to-understand way :) The real code uses algorithms to determine optimal grow sizes and can extend into consecutive blocks if possible. -Steve
Apr 13 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 13 Apr 2010 11:15:15 -0400, Joseph Wakeling  
<joseph.wakeling webdrake.net> wrote:

 As for iteration, I don't know to what degree D's foreach() across
 arrays compares to for() commands in C++ -- I guess it should be pretty
 close in performance, no?

D special cases foreach on arrays to basically be a standard array for
loop, i.e.:

foreach(x; arr)
{
   ...
}

translates to something like:

auto __len = arr.length;
for(uint __i = 0; __i < __len; ++__i)
{
   auto x = arr[__i];
   ...
}

Although it may be even further optimized, I'm not sure.

-Steve
Apr 13 2010
prev sibling next sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:
Steven Schveighoffer wrote:
 On Mon, 12 Apr 2010 12:03:38 -0400, Joseph Wakeling
 <joseph.wakeling gmail.com> wrote:
 
 I thought dev effort was now focusing back on GDC ... ? :-P

AFAIK, gdc hasn't been actively developed for a few years. ldc, on the other hand, has regular releases. I think ldc may be the future of D compilers, but I currently use dmd since I'm using D2.

http://bitbucket.org/goshawk/gdc/wiki/Home :-) Either way I'm happy. I don't have any issues with dmd, but I do want to see a properly free D compiler that can be prepackaged in all the Linux distros.
 Yes, you get around this by preallocating.

Sure. But that wasn't what shocked me -- what I was amazed by was setting up a situation where I _had_ preallocated the memory and still seeing the memory usage explode, because D was preserving the memory from each round of the loop. (Actually I'm still having some issues with this, despite using assumeSafeAppend, but more on that in a separate email.)
 It's often these types of performance discrepancies that critics point
 to (not that you are a critic), but it's the cost of having a more
 comprehensive language.  Your appetite for the sheer performance of a
 language will sour once you get bit by a few of these nasty bugs.

For sure.
 But D fosters a completely different way of thinking about solving
 problems.

I can see how the example you give would be fantastically useful. More generally, I think this is the point -- I need to adjust my head to writing D-ish code, just as when moving from C to C++ I needed to switch to various new ways of doing things.
 There are many in the community that use D for numerical stuff.  It's
 definitely not as mature as it could be, but getting better.  Don is
 adding a lot of cool stuff to it, including a builtin exponent operator
 and arbitrary precision numbers.

I guessed there would be -- I knew for example that there was someone out there working on a fairly major mathematical/numerical library for D, but it's a while since I checked that out. So take my earlier comment about numerical work to refer only to doing things the way I'm used to ... ;-)
 Yes, but that's not what I meant ;)  I mean, you can write your own
 types, like the Appender (or what the appender *should* be) that
 optimize the behavior of code to meet any needs.  And it can do it with
 a much better syntax than C.  D's template system and ability to make
 user-types seem like builtins I think is unparalleled in C-like languages.

Hence much pleasure and excitement in learning D ... :-) Don wrote:
 There are quite a lot of us here with exactly that kind of background.
 
 Something about the array issue -- D dynamic arrays are heavily geared towards
algorithms which perform an initial allocation and afterwards avoid memory
allocation entirely.
 In D, such slicing algorithms are extremely clean, extremely fast, and memory
safe.
 In C++, it's much more difficult to write code in that manner. A
straightforward translation from C++ will generally miss the benefits of D
arrays, and you'll end up with slower code.

Exactly my current situation. :-P
 A kind of "Zen of D" is to use array slices as much as possible.

I will look into this more and see if this approach can help with some
of my code -- are there existing projects I could take a look at to get
some examples?

Thanks & best wishes,

    -- Joe
Apr 12 2010
prev sibling next sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:
Joseph Wakeling wrote:
 (Actually I'm still having some issues with this, despite using
 assumeSafeAppend, but more on that in a separate email.)

... solved; it's interesting to note that assumeSafeAppend has to be
used in _exactly_ the same scope as the append ~= itself.  e.g. with
this loop, the memory blows up:

////////////////////////////////////////////////////////////////
foreach(uint i;0..100) {
    x.length = 0;
    assumeSafeAppend(x);

    foreach(uint j;0..5000)
        foreach(uint k;0..1000)
            x ~= j*k;

    writefln("At iteration %u, x has %u elements.",i,x.length);
}
////////////////////////////////////////////////////////////////

... while with this one it's OK:

////////////////////////////////////////////////////////////////
foreach(uint i;0..100) {
    x.length = 0;

    foreach(uint j;0..5000) {
        foreach(uint k;0..1000) {
            assumeSafeAppend(x);
            x ~= j*k;
        }
    }

    writefln("At iteration %u, x has %u elements.",i,x.length);
}
////////////////////////////////////////////////////////////////

Curious question: how come with a registered email address my messages
have to be moderated, whereas via the http interface I can post straight
away? :-P

Best wishes,

    -- Joe
Apr 12 2010
prev sibling next sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:

Steven Schveighoffer wrote:
 This should work as you expect.  Can you produce a full code example? 
 Are you reserving space in x before-hand?

Weird; now it works.  I wonder if it's because I upgraded to 2.043 ... ?
I understand there was a fix in there for array append.

On the other hand, if the assumeSafeAppend takes place outside the loops
entirely, blow-up occurs -- because the command is not issued after the
array resize to zero?  I've attached examples.

Thanks & best wishes,

    -- Joe
Apr 13 2010
prev sibling next sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:
Steven Schveighoffer wrote:
 Assuming you may ask questions about this, it's not an exact description
 of what happens.  The append code and GC are smarter about it than this,
 I just wanted to explain it in an easy-to-understand way :)  The real
 code uses algorithms to determine optimal grow sizes and can extend into
 consecutive blocks if possible.

I realised that while running some of the code that bearophile suggested
to me, because if you look at 'capacity' in a non-pre-reserved D array
that's appended to 5 million times, its capacity is only a little over 5
million -- compared to a C++ vector, whose capacity is the nearest power
of 2 (2^23, over 8 million).  The GC overhead means that in this case
actual memory usage is a bit higher than C++, but on a larger scale this
could surely make a big difference.

In the case of the piece of code I'm working on I don't think the
pre-reserve is really so important as far as performance goes -- the
append speed is a bit of a bugger, but the main issue is probably to do
with the speed of copying and iterating across arrays.  For example, I
suspect that the D arrays'

    x[] = 0;
    y[] = z[];

is not as efficient as a C++ vector's

    x.assign(x.size(), 0);
    y.assign(z.begin(), z.end());

I think I can find some ways round this by taking advantage of some of
the D arrays' cleverness, limiting the copying of values and instead
just swapping around which memory each array is pointing to.

As for iteration, I don't know to what degree D's foreach() across
arrays compares to for() commands in C++ -- I guess it should be pretty
close in performance, no?

Best wishes,

    -- Joe
Apr 13 2010
prev sibling next sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:
Hi Bearophile,

Thanks ever so much for the amount of effort you put into this -- it is
extremely kind and generous. :-)

I'm not seeing the significant gains that you see with DMD, but for me
too LDC doubles the speed.

My guess for the difference is that you ran less than the full 100
iterations of the main for loop.  As you can see in the code the Yzlm
class launches an iterative procedure which takes more or less
iterations to converge depending on initial conditions.

This _normally_ is relatively few, but can in some circumstances be
large (several hundred).

An effect of changing the random number generator is that it changes the
initial conditions presented to the algorithm.  For example, in the
first 10 runs of the loop, the original code performs 404 iterations
overall and your new code performs only 348.

Combine that with other savings like the removal of the appender and the
faster random number generation and you get a fairly sizeable saving --
about a 25-30% cut in running time when compiling with dmd.  But those
savings become less of an influence on running time as you increase the
number of simulations.

The _real_ measure of speed is thus not overall program running time but
running time relative to the total number of iterations.  The good news
is that while the LDC-compiled version doesn't beat g++ it comes pretty
close, with about 32 iterations per second for the D code compared to 26
for the C++. :-)

The C++ code can probably be optimised a bit further but not too much;
the algorithms in the iterative process are pretty much identical
between the two versions.  (The C++ doesn't have the 'clever' memory
swap-around that I put in the D code, but that doesn't make any
difference to D's performance anyway...)  So I think it's probably just
compiler difference that's to blame for speed differences -- which is
fine.  After all, D in general and DMD in particular is in development.

I _am_ surprised that in your profiling the random number generator took
up so much time, as if you cut out the lines,

        aa(ratings, reputationUser, reputationObject);
        yzlm(ratings, reputationUser, reputationObject);

... and recompile, the program screams through in no time at all --
which would lead me to think that it's the yzlm opCall and the functions
it calls that take up the majority of time.  Or is there some lazy
evaluation going on here ... ?

I don't think it's the algorithm in any case (although it might be
Phobos' implementation) as working with Tango and LDC, changing between
the Twister or Kiss random number generators doesn't make for any
meaningful performance difference.

One last thing.  While comparing DMD and LDC I noticed something odd.
Take this code, which is an LDC and DMD-compatible version of the 'array
leak' code I posted originally:

/////////////////////////////////////////////////////////////////////////
version(Tango) {
    import tango.stdc.stdio: printf;
} else {
    import std.stdio: printf;
}

void main()
{
    double[] x;

    for(uint i=0;i<100;++i) {
        x.length = 0;

        for(uint j=0;j<5_000;++j) {
            for(uint k=0;k<1_000;++k) {
                x ~= j*k;
            }
        }

        printf("At iteration %u, x has %u elements.\n",i,x.length);
    }
}
/////////////////////////////////////////////////////////////////////////

I noticed that when compiled with LDC the memory usage stays constant,
but in DMD the memory use blows up.  Is this a D1/D2 difference (with D2
having the more advanced features that require preservation of memory)
or is it a bug in one or the other of the compilers ... ?

Thanks & best wishes,

    -- Joe
Apr 14 2010
prev sibling next sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:

bearophile wrote:
 You are right, sorry.

No need to apologise! You helped me significantly improve my code, helped me understand D a lot better and left me feeling generally very positive about developing further in D. I'd call that a result. :-)
 So I think it's probably just compiler difference that's to blame for speed
differences<

This can be true, but such differences are not magic: they can be found, and they can eventually become precise enhancement requests for the LLVM developers, who are open to such requests.

I hope my questions here have been useful in this respect ...
 After all, D in general and DMD in particular is in development.<

DMD has quite an old back-end that is not in active development. Maybe it will become 64 bit.

The DMD backend seems a generally problematic piece of code, given the proprietary restrictions that apply to it.  I have the impression that it's being used simply because it's the backend that is known and tested, and that over the longer term there may be a switch ... ?

The one compiler-related concern I have is whether there will be an effective free-as-in-freedom compiler for D2 in the near future.  Work is going on with GDC, but it's moving slowly -- and I don't see a clear LDC roadmap for D2 support.
 In this program I have seen that a good percentage of the speed difference
between ldc/dmd is caused by foreach loops. When you iterate over structs
longer than size_t use ref (or in D2 better ref const(...)).
 foreach (Rating r; ratings)
 ==>
 foreach (ref const(Rating) r; ratings)
 
 There is nothing algorithmically strange going on here, but to be sure you can
take a look at the produced asm.

Interesting ... ! I'm afraid the asm is over my head, but I will read up and try to understand it.
 To find why the C++ code is faster you can show me the equivalent C++
 code that I can compile myself, or you can compile your C++ code and
 show me the asm of the two functions that are hot spots.

Sure -- attached.  The key functions are the ones in tref.cpp (the smalltref class) and trefsim.cpp, plus the treftest.cpp file, which runs a roughly-equivalent simulation to test.d.

It's a little embarrassing to share because it's fairly badly-designed code -- my first serious C++ attempt -- which served its purpose some time ago and was done with.  It started to become vaguely relevant again, so I thought I'd try to redo it with a better design, and rather than do it in C++ (which I could do much better by now), this seemed like a nice excuse to build my first D project ... :-)

I already made a couple of tweaks based on your improvements to the D code, but I think g++ probably makes such optimisations during compilation anyway.  I also realised that I wasn't comparing like with like -- the code was using the RanLux random number generator, and switching to mt19937 shaves quite a bit of time off.  So C++ still has a bit of an edge right now.

There's surely a lot of other stuff that could be done to improve this C++, but please don't put lots of effort in unless it helps with improving D (the language and/or frontend, not my code, which is not so important :-).

Thanks & best wishes,

    -- Joe
Apr 16 2010
prev sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:

bearophile wrote:
 This is the middle part (the two loops) of Yzlm.objectReputationUpdate
compiled with ldc; this is one of the two main hot spots of the program. You
can compare this asm with the asm of the same part from the C++ version:

To find why the C++ code is faster you can show me the equivalent C++ code that I can compile myself, or you can compile your C++ code and show me the asm of the two functions that are hot spots.

I took a little time last night and wrote up a C++ version that is much closer in design to the D code (as well as being smaller, simpler, and less hell to read...).  It has libboost as a dependency, for random number generation and for the BOOST_FOREACH macro.

Following your observation with D, it looks like the C++ also benefits from using references in the FOREACH loops for elements larger than size_t -- this shaves about 20s off the running time on my machine.  Either way, like its uglier predecessor, it's quite a bit faster than DMD or LDC.

Unfortunately I can't get llvm-g++ running on my machine, or I would compare performance there too.

Hope this makes an interesting point of comparison. :-)

Best wishes,

    -- Joe
Apr 17 2010