digitalmars.D - Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL

Dan W (15/15) May 16 2010 Hi all, I'm toying around with the idea of porting my raytracer codebase...

bearophile (7/10) May 16 2010 You can't ask a new open source language to have the features of a ten+ ...

Walter Bright (21/23) May 16 2010 Right, but dmd is using an optimizer and code generator that has been ar...

bearophile (18/26) May 16 2010 Only practical experiments can tell if you are right.

%u (30/40) May 18 2010 Hi all, due to the slow speed of my browser and multiple posts, I'll be

Walter Bright (7/9) May 18 2010 When comparing D performance with C++, it is best to compare compilers w...

retard (8/20) May 18 2010 What if I'm using a clean room implementation of D with a custom backend...

Robert Clipsham (14/21) May 18 2010 I don't believe Walter is arguing against this methodology. What he is

bearophile (4/7) May 18 2010 Life isn't fair. Too bad for the one with a inferior back-end.

Walter Bright (7/8) May 18 2010 Of course it isn't fair. But if you want to draw useful conclusions from...

Walter Bright (4/25) May 18 2010 You're allowed to do whatever you want. I'm pointing out that the differ...

retard (41/63) May 18 2010 It's a rookie mistake to believe that languages have some kind of

Walter Bright (11/13) May 20 2010 Well, they do. It's also true that these performance differences can be ...

retard (8/25) May 21 2010 Of course. The language/implementation comparisons are all faulty. You

bearophile (239/243) May 18 2010 In LDC (LLVM) this optimization is named -loop-unswitch and it's present...

Robert Clipsham (29/44) May 16 2010 dmd is under 2 (3) licenses, one for the front end and one for the

Jérôme M. Berger (8/13) May 16 2010 That's a common misconception about the GPL: you have to distribute

Leandro Lucarella (12/20) May 16 2010 The source must be available. You usually don't distribute the source if

Jérôme M. Berger (33/48) May 17 2010 Here is the relevant section in GPLv2:

Alex Makhotin (10/11) May 16 2010 Hi,

bearophile (6/9) May 16 2010 At the moment performance (if compared to C++ code compiled with GCC or ...

BCS (19/34) May 16 2010 The front end is under an Open Source (R) license. The back end is open ...

bearophile (5/7) May 16 2010 This page explains this topic:

Walter Bright (9/14) May 16 2010 What's actually happening is interprocedural analysis, and inlining acro...

bearophile (106/109) May 16 2010 Thank you for your answers.

Walter Bright (12/34) May 16 2010 Right, dmd doesn't do custom calling conventions. But, it is not necessa...

bearophile (5/8) May 16 2010 I'll talk about this a bit with LLVM devs.

Nick Sabalausky (20/26) May 16 2010 Other people mentioned the recent D plugin for Visual Studio. If that is...

Robert Jacques (4/7) May 16 2010 I have some decent CUDA bindings with a nice high level API that I'd be ...

Dan W <twinbee42 skytopia.com> writes:

Hi all, I'm toying around with the idea of porting my raytracer codebase to D.
But before committing, I have a few rookie questions:

1: What kind of license is the D compiler under? I'm thinking of shipping a
commercial, closed-source (for now) program with the D compiler (so that users
can compile within the GUI). Is this possible to do, or can I at least pay for
the privilege?

2: Is it possible to use D with the Visual C++ IDE? Preferably, I would like
the appropriate compiler and D options listed in the options (in place of the
usual C/C++ options).

3: I need my program to be as fast as possible. The Visual C++ compiler has
features such as "link-time code generation" and "Profile guided optimization".
Does D have equivalents?

4: Does D play nicely with QT, SDL, Lua?

5: How about compatibility with GPGPU stuff like CUDA and OpenCL? Can I just as
easily write GPGPU programs which run as fast as I can with C/C++?

May 16 2010

bearophile <bearophileHUGS lycos.com> writes:

Dan W.:

 3: I need my program to be as fast as possible. The Visual C++ compiler has
 features such as "link-time code generation" and "Profile guided optimization".
 Does D have equivalents?

You can't ask a new open source language to have the features of a ten+ years
old commercial compiler.
If you compile D1/Tango code on 32-bit Linux with LDC, using all the right
compiler switches, you can get performance comparable to C code compiled with
GCC. LDC lacks the auto-vectorization that GCC has, but it has link-time
optimization, which GCC 4.5 supports only in part. This is the very best
performance you can hope for with D.


General note: D has fallen into a trap: a significant group of people seem
interested in D only as a high-performance language. But history shows that
almost no new language starts its life being very fast. High performance,
especially compared to quite mature C++ implementations, can only come some
years after a language has already reached some success and people start to
spend weeks, months or years just tuning the GC, creating whole new kinds of
GC, inventing and implementing other D-specific optimizations, and
implementing good escape analysis, good devirtualization and inlining of
virtual functions, various kinds of efficient vectorization, good pointer
alias analysis, and so on.

Today some kinds of Java programs running on HotSpot have performance
comparable to C++ programs. JavaScript running on V8 is often less than ten
times slower than well-compiled C. But for years both Java and JavaScript were
dog-slow. Most things in D are designed to require a simple enough compiler;
it doesn't need an advanced JIT just to be efficient. So even naively compiled
D programs aren't 50 times slower than equivalent C++ programs. Yet the
performance is not the same as that of commercial C++ compilers, and it will
not be until groups of serious people make the creation of an efficient D2
compiler their main or only purpose. "Performance" is not something that just
happens; you need a lot of focused work to gain it.

Bye,
bearophile

May 16 2010

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 You can't ask a new open source language to have the features of a ten+ years
 old commercial compiler.

Right, but dmd is using an optimizer and code generator that has been around
for 25 years now. Its optimization is competent and reasonably advanced: the
usual data flow optimizations are there, and the expected back end
optimizations like register allocation using live range analysis and
instruction scheduling are all there.

The back end can be improved for floating point, but for integer/pointer work
it is excellent.

It does not do link-time code generation or profile-guided optimization,
although in my experiments such features pay off only in a small minority of
cases. In my experiments on vector array operations, the improvement from the
CPU's vector instructions is disappointing. It always seems to get hopelessly
bottlenecked by memory bandwidth.
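
A rough, roofline-style estimate shows why simple array operations tend to be
memory-bound. It assumes an element-wise double-precision `c[i] = a[i] + b[i]`
on arrays much larger than the caches, counts only the bytes the loop itself
must move (ignoring write-allocate traffic), and uses a hypothetical sustained
bandwidth of $B = 10$ GB/s purely as a round number; none of these figures
come from the measurements discussed in this thread.

```latex
\begin{aligned}
\text{traffic per element} &= \underbrace{8 + 8}_{\text{load } a_i,\ b_i}
  + \underbrace{8}_{\text{store } c_i} = 24\ \text{bytes},
  \qquad \text{work per element} = 1\ \text{flop} \\
I &= \frac{1\ \text{flop}}{24\ \text{bytes}} \approx 0.042\ \text{flop/byte} \\
P_{\text{attainable}} &\le I \cdot B
  = \frac{10\ \text{GB/s}}{24\ \text{bytes/flop}} \approx 0.42\ \text{GFLOP/s}
\end{aligned}
```

Wider vector instructions raise the peak arithmetic rate, but they leave
$I \cdot B$ unchanged, which is consistent with the bandwidth bottleneck
described above.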

dmd does have a built-in profiling tool, which is extremely effective in
pinpointing trouble spots in the source code. For example, in the recent issue 
where the spell checker was slow, the profiler pointed the damning finger at 
exactly where the problem was. (It was an algorithmic problem, not an 
optimization problem.)

Just to brag about how good it can be, DMC++ remains by far the fastest C++
compiler available, and DMD is incredibly fast at compiling. Both are built
with the same optimizer and code generator that DMD uses.

May 16 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter Bright:

Thank you for your answers and explanations.

 The back end can be improved for floating point, but for integer/pointer work
 it is excellent.

Only practical experiments can tell if you are right.
(A raytracer uses a lot of floating-point ops. My small raytracers compiled
with dmd are several times slower than the same ones compiled with ldc.)


 It does not do link time code generation nor profile guided optimization, 
 although in my experiments such features pay off only in a small minority of 
 cases.

I agree that profile-guided optimization on GCC usually pays off little, so I
usually don't use it with GCC. My theory is that GCC is not yet using the
profiling information well enough. Reading the asm output of Java HotSpot (and
this is not easy to do) has shown me that HotSpot does some things that GCC
doesn't do yet, and in numerical programs they give a good performance
increase. Here I have shown one of the simpler and most effective
optimizations done by HotSpot thanks to the profile information it
continuously collects:
http://llvm.org/bugs/show_bug.cgi?id=5540

Link-time optimization, as done by LDC, has given a good speedup in several of
my programs; I like it enough. It allows all the other compiler optimizations
to be applied more effectively, and it can decrease the program size too.


 In my experiments on vector array operations, the improvement from the 
 CPU's vector instructions is disappointing. It always seems to get hopelessly 
 bottlenecked by memory bandwidth.

dmd array operations are currently not so useful.
But there are several ways to vectorize code that can give very large (up to
10-16 times) speedups on numerical code.

This is one of the kinds of vectorization:
http://gcc.gnu.org/wiki/Graphite
http://wiki.llvm.org/Polyhedral_optimization_framework

Another kind of vectorization is performing up to three levels of tiling (when
the implemented algorithm is not cache oblivious).
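
A minimal sketch of one level of loop tiling (cache blocking), written in C++.
The matrix-multiply kernel, the function names and the default tile size of 64
are illustrative assumptions, not anything from this thread. Multi-level tiling
repeats the same idea once per cache level, each with its own tile size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: C += A * B for n x n row-major matrices.
void matmul_naive(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

// One level of tiling: the same loops, but iterating over tile x tile blocks
// so the pieces of A, B and C being reused stay resident in cache.
void matmul_tiled(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n,
                  std::size_t tile = 64) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    assert(tile > 0);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t kk = 0; kk < n; kk += tile)
            for (std::size_t jj = 0; jj < n; jj += tile) {
                // Clamp block bounds so n need not be a multiple of tile.
                const std::size_t i_end = std::min(ii + tile, n);
                const std::size_t k_end = std::min(kk + tile, n);
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

The tiled version computes the same product as the naive one; it only changes
the order in which the blocks of the matrices are visited.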

Another kind of vectorization is using all the fields of an SSE (and future
AVX) register in parallel. Doing this well seems very hard for compilers, and
I don't know why: LLVM is not able to do it, GCC does it a bit in some
situations, and I don't know what the Intel compiler does here (I think it
performs it only if the given C code is written in a specific way that you
often have to find by time-consuming trial and error). So this optimization is
often done manually, writing asm by hand... if you look at the asm written in
video decoders you can see that it's many times faster than the asm produced
from C by the best compilers.
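
To show what "using all the fields of an SSE register" means at the source
level, here is a minimal hand-vectorized sketch using the SSE2 intrinsics from
`<emmintrin.h>` (`_mm_loadu_pd`, `_mm_add_pd`, `_mm_storeu_pd`). Each 128-bit
register holds two doubles, so one add instruction performs two element
additions. The function name `add_arrays` is hypothetical, and a modern
compiler may well auto-vectorize the scalar loop by itself; the `__SSE2__`
guard (predefined by GCC and Clang when SSE2 is enabled) keeps the code
compiling on other targets.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const double* a, const double* b, double* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
#if defined(__SSE2__)
    // Two doubles per 128-bit register: one _mm_add_pd does two additions.
    // The unaligned load/store variants avoid any alignment requirement.
    for (; i + 2 <= n; i += 2) {
        const __m128d va = _mm_loadu_pd(a + i);
        const __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    // Scalar tail (and the whole loop when SSE2 is not available).
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```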

Then there are true parallel optimizations, which means using more than one
CPU core to perform operations. Examples of this are Cilk, parallel fors done
in a simple way or in a more refined way as in the Chapel language, and then
there are the various message passing implementations, etc.

If you have heavy numerical code and you combine all those things, you can
often get code 40 times faster or more. To perform all such optimizations you
need smart compilers and/or a language that gives a lot of semantics to the
back end (such as Cilk, Chapel, Fortress).

Bye,
bearophile

May 16 2010

%u <twinbee42 skytopia.com> writes:

Hi all, due to the slow speed of my browser and multiple posts, I'll be
posting just one email which covers everything. Please let me know if
replying to each individually is really preferred. Many thanks for all
and any help.

May I ask why you are planning to port an existing codebase to D?
What benefits specifically (apart from performance comparable to C)
do you expect from D?

Thank you.

Sure. There are a couple of reasons, really. First, a lot of 'fluff' in C is
rectified in D, so that separate declarations and header files are a thing of
the past. Hence less repetition and housekeeping.
Second (and I know this might sound idealistic), it'd be nice to promote D
more and get more people using it, since it is a step up from C in many
regards.

My code is still fairly small (certainly less than 1 million lines :) ), so it
won't be too much hassle.

Walter said:

It does not do link time code generation nor profile guided optimization,
although in my experiments such features pay off only in a small minority of
cases.

In VC++, PGO is a great speed help because of inlining, but from what you said
later, this doesn't seem to be so much of an issue with D since, as you said,
it has access to all the code anyway. I'm a little concerned, though, about
the floating-point performance, as raytracing of course does quite a bit of
it.

The DMC++ compiler you mentioned sounds interesting too. I'd like to compare
performance with that, the VC++ one, and the Intel compiler.

Thanks to Robert for recommending VisualD and the bindings. I might try all
three D compilers to see which gets the best speed, but LDC seems most
promising from what you've said. I suppose that in the future, when many-core
becomes prevalent, compiler optimization won't be so much of an issue, because
of the relative simplicity compared to the tricks of present-day CPUs.

One issue I have with the Visual C++ compiler is that it doesn't seem to
support loop unswitching (i.e. hoisting a loop-invariant boolean if statement
out of the loop by duplicating the loop code for each case). I wonder if one
of the D compilers supports it. I started a thread about it over at
cprogramming: http://cboard.cprogramming.com/c-programming/126756-lack-compiler-loop-optimization-loop-unswitching.html

I have some decent CUDA bindings with a nice high level API that I'd be
willing to share/open source. But you still have to write the actual GPU
kernels in C/C++.

Thanks, I'll bear those in mind.

Cheers, Dan

May 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

%u wrote:
 The DMC++ compiler you mentioned sounds interesting too. I'd like to compare
 performance with that, the VC++ one, and the Intel compiler.

When comparing D performance with C++, it is best to compare compilers with the 
same back end, i.e.:

    dmd with dmc
    gcc with gdc
    lcc with ldc

This is because back ends can vary greatly in the code generated.

May 18 2010

retard <re tard.com.invalid> writes:

Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:

 %u wrote:
 The DMC++ compiler you mentioned sounds interesting too. I'd like to
 compare performance with that, the VC++ one, and the Intel compiler.

 
 When comparing D performance with C++, it is best to compare compilers
 with the same back end, i.e.:
 
     dmd with dmc
     gcc with gdc
     lcc with ldc
 
 This is because back ends can vary greatly in the code generated.

What if I'm using a clean room implementation of D with a custom backend 
and no accompanying C compiler, am I not allowed to compare the 
performance with anything?

When people compare C compilers, they usually use the latest Visual 
Studio, gcc, icc, and llvm versions -- i.e. C compilers from various 
vendors. Using the same logic one is not allowed to compare dmc against 
those since it would always lose.

May 18 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 18/05/10 20:19, retard wrote:
 What if I'm using a clean room implementation of D with a custom backend
 and no accompanying C compiler, am I not allowed to compare the
 performance with anything?

 When people compare C compilers, they usually use the latest Visual
 Studio, gcc, icc, and llvm versions -- i.e. C compilers from various
 vendors. Using the same logic one is not allowed to compare dmc against
 those since it would always lose.

I don't believe Walter is arguing against this methodology. What he is 
arguing against is comparing dmd with gcc for example. Comparing ldc 
with gdc and dmd is fine, comparing dmd with dmc is fine, but when it 
comes to comparing D and C, he believes you should compare compilers 
using the same backend, that is dmd and dmc rather than dmd and gcc. Or 
that's what I took from it.

That said, I don't agree with that methodology, unless it's only a small 
test. If you're comparing lots of C compilers and D you should include 
dmc for example if you're using dmd as the D reference, or clang if 
you're using ldc as a reference. If you're comparing C and D, you should 
stick to compilers with the same backend, otherwise the one with the 
superior backend will always win, and it's not a fair interlanguage 
comparison.

May 18 2010

bearophile <bearophileHUGS lycos.com> writes:

Robert Clipsham:
 otherwise the one with the 
 superior backend will always win, and it's not a fair interlanguage 
 comparison.

Life isn't fair. Too bad for the one with an inferior back end.

Bye,
bearophile

May 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 Life isn't fair. Too bad for the one with a inferior back-end.

Of course it isn't fair. But if you want to draw useful conclusions from a
benchmark, you have to do what is known as "isolate the variables". If there
are two independent variables feeding into performance, you CANNOT draw a
conclusion about one of them from the performance. In other words, if:

    g = f(x,y)

then knowing g, x and y tells you nothing at all about x's contribution to g.
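
One way to make this concrete for the benchmark debate, with labels chosen
here for illustration only: let $x$ be the language, $y$ the back end, and $g$
the measured run time. Comparing two languages on different back ends measures
the sum of two terms that a single pair of timings cannot separate:

```latex
\begin{aligned}
g &= f(x, y) \\
f(x_D, y_1) - f(x_C, y_2)
  &= \underbrace{\big[f(x_D, y_1) - f(x_C, y_1)\big]}_{\text{language, back end fixed}}
   + \underbrace{\big[f(x_C, y_1) - f(x_C, y_2)\big]}_{\text{back end, language fixed}}
\end{aligned}
```

Only when $y_1 = y_2$ does the second bracket vanish, leaving a difference
attributable to the language alone; this is the reasoning behind pairing dmd
with dmc and gdc with gcc.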

May 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

retard wrote:
 Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
 
 %u wrote:
 The DMC++ compiler you mentioned sounds interesting too. I'd like to
 compare performance with that, the VC++ one, and the Intel compiler.

 When comparing D performance with C++, it is best to compare compilers
 with the same back end, i.e.:

     dmd with dmc
     gcc with gdc
     lcc with ldc

 This is because back ends can vary greatly in the code generated.

 
 What if I'm using a clean room implementation of D with a custom backend 
 and no accompanying C compiler, am I not allowed to compare the 
 performance with anything?

You're allowed to do whatever you want. I'm pointing out that the difference in 
code generator ability should not be misconstrued as a difference in the
languages.


 When people compare C compilers, they usually use the latest Visual 
 Studio, gcc, icc, and llvm versions -- i.e. C compilers from various 
 vendors. Using the same logic one is not allowed to compare dmc against 
 those since it would always lose.

It's perfectly reasonable to compare dmc and gcc for code generation quality.

May 18 2010

retard <re tard.com.invalid> writes:

Tue, 18 May 2010 15:03:43 -0700, Walter Bright wrote:

 retard wrote:
 Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
 
 %u wrote:
 The DMC++ compiler you mentioned sounds interesting too. I'd like to
 compare performance with that, the VC++ one, and the Intel compiler.

 When comparing D performance with C++, it is best to compare compilers
 with the same back end, i.e.:

     dmd with dmc
     gcc with gdc
     lcc with ldc

 This is because back ends can vary greatly in the code generated.

 
 What if I'm using a clean room implementation of D with a custom
 backend and no accompanying C compiler, am I not allowed to compare the
 performance with anything?

 
 You're allowed to do whatever you want. I'm pointing out that the
 difference in code generator ability should not be misconstrued as a
 difference in the languages.

It's a rookie mistake to believe that languages have some kind of 
differences performance wise. That kind of comparison was likely useful 
in the 80s, when languages and instruction sets had a greater resemblance 
(they were all low-level languages). But as you can see from bearophile's 
link ( http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html ), 
there is a larger performance gap between a naive and a highly tuned 
implementation of the same language than between decent implementations 
of different modern languages.

Developers want to compare dmd with g++ simply because they're not 
interested in D or D's code generator per se. They have a task to solve 
and they want the fastest production-ready (stable enough to compile 
their solution) toolchain for the problem, NOW. There is no loyalty 
left. Most mainstream languages contain the same imperative/object-oriented 
hybrid core with small functional extensions (closures/lambdas). 
You only need to choose the best one for this particular task. Usually 
there's only a limited amount of time, so you may need to guess. You 
just have to evaluate partial snippets of information, for instance that 
dmd sucks at inlining closures and Java doesn't do tail call optimization.

Ideally a casual developer studies the language grammar for a few hours 
and then starts writing code. If the language turns out to be bad, he 
just moves on and forgets it, unless the toolchain improves later and 
there's a reddit post about it. That's how I met Perl. With years of 
Pascal/C/C++/Java experience under my belt, I learned that Perl might be 
a perfect tool for extending Apache with our plugin. A few hours of 
studying (the language) + quite a bit more (the API docs) and there I was 
writing Perl: probably really buggy code, but code nonetheless.

There are even languages that consist of visual graphs (the "editor" is 
just a CAD-like GUI) or sentences written in normal English; they don't 
have any kind of link between the target machine and the solution other 
than the abstract computational model. If you encounter a statement such 
as:

  find_longest_common_substring(string1, string2);

you cannot know how fast it is. This kind of code is getting more popular 
and it's called declarative: it doesn't tell how it solves its problem, 
it just tells what it does. It's also the abstraction level that most 
developers are (or should be) using. You may ask whether that statement is 
faster in C than in Python. The Python coder could just use the one 
written in C and invoke it via a foreign function interface. The FFI 
might add a few cycles' worth of overhead, but overall the algorithm is 
the same.
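
To make the point concrete, here is one possible body for the hypothetical
`find_longest_common_substring` above, using the textbook dynamic-programming
approach with O(n*m) time and O(m) extra memory. This is only an illustrative
C++ sketch; a library routine with the same name could instead use a suffix
automaton or suffix tree and run in roughly linear time, which is exactly why
the call site alone says nothing about the cost.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// cur[j] holds the length of the longest common suffix of s1[0..i) and
// s2[0..j); the maximum over all (i, j) is the longest common substring.
// Only two rows are kept, so extra memory is O(s2.size()).
std::string find_longest_common_substring(const std::string& s1,
                                          const std::string& s2) {
    std::vector<std::size_t> prev(s2.size() + 1, 0), cur(s2.size() + 1, 0);
    std::size_t best_len = 0;
    std::size_t best_end = 0;  // one past the end of the best match in s1
    for (std::size_t i = 1; i <= s1.size(); ++i) {
        for (std::size_t j = 1; j <= s2.size(); ++j) {
            if (s1[i - 1] == s2[j - 1]) {
                cur[j] = prev[j - 1] + 1;
                if (cur[j] > best_len) {
                    best_len = cur[j];
                    best_end = i;
                }
            } else {
                cur[j] = 0;
            }
        }
        std::swap(prev, cur);  // cur[0] stays 0 in both rows
    }
    assert(best_len <= best_end);
    return s1.substr(best_end - best_len, best_len);
}
```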

May 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

retard wrote:
 It's a rookie mistake to believe that languages have some kind of 
 differences performance wise.

Well, they do. It's also true that these performance differences can be swamped 
by the quality of the implementation, and the ability of the programmer. But 
that doesn't mean there are not inherent performance differences due to the 
semantics the language requires.

It's like car racing. The performance is a combination of 3 factors:

1. the 'formula' for the particular class you're racing in
2. the quality of the construction of the car to that formula
3. the ability of the driver

It's simply wrong to measure the performance and then naively attribute it to 
one of those three, pretending the other two are constant.

May 20 2010

retard <re tard.com.invalid> writes:

Thu, 20 May 2010 10:06:17 -0700, Walter Bright wrote:

 retard wrote:
 It's a rookie mistake to believe that languages have some kind of
 differences performance wise.

 
 Well, they do. It's also true that these performance differences can be
 swamped by the quality of the implementation, and the ability of the
 programmer. But that doesn't mean there are not inherent performance
 differences due to the semantics the language requires.
 
 It's like car racing. The performance is a combination of 3 factors:
 
 1. the 'formula' for the particular class you're racing in
 2. the quality of the construction of the car to that formula
 3. the ability of the driver
 
 It's simply wrong to measure the performance and then naively attribute
 it to one of those three, pretending the other two are constant.

Of course. The language/implementation comparisons are all faulty. You 
also need to model the performance of the programmer by building some 
kind of developer skill profiles and measure how the languages & 
implementations compete against each other in all these skill classes. 
For example, the language shootout site favors experienced programmers; 
bad programmers generate code with 2-3 orders of magnitude worse 
performance.

May 21 2010

bearophile <bearophileHUGS lycos.com> writes:

%u:
 One issue I have with the Visual C++ compiler is that it doesn't seem to
 support loop unswitching (i.e. doubling up code with boolean If statements).
 I wonder if one of the D compilers supports it. I started a thread over at cprogramming
 about it here: http://cboard.cprogramming.com/c-programming/126756-lack-compiler-loop-optimization-loop-unswitching.html

In LDC (LLVM) this optimization is named -loop-unswitch, and it's enabled by
default at -O3 and higher.

--------------------------

Your C++ code cleaned up a bit:


#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double test(bool b) {
    double d = 0.0;
    double u = 0.0;
    for (int n = 0; n < 1000000000; n++) {
        d += u;
        if (b)
            u = sin((double)n);
    }
    return d;
}

int main() {
    bool b = (bool)atoi("1");
    printf("%f\n", test(b));
}


The asm generated for just the test() function, with:
g++ -O3 -S

__Z4testb:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	subl	$36, %esp
	cmpb	$0, 8(%ebp)
	jne	L2
	fldz
	movl	$1000000000, %eax
	fld	%st(0)
	.p2align 4,,7
L3:
	subl	$1, %eax
	fadd	%st(1), %st
	jne	L3
	fstp	%st(1)
	addl	$36, %esp
	popl	%ebx
	popl	%ebp
	ret
	.p2align 4,,7
L2:
	fldz
	xorl	%ebx, %ebx
	fld	%st(0)
	jmp	L5
	.p2align 4,,7
L9:
	fxch	%st(1)
L5:
	faddp	%st, %st(1)
	movl	%ebx, -12(%ebp)
	addl	$1, %ebx
	fildl	-12(%ebp)
	fstpl	(%esp)
	fstpl	-24(%ebp)
	call	_sin
	cmpl	$1000000000, %ebx
	fldl	-24(%ebp)
	jne	L9
	fstp	%st(1)
	addl	$36, %esp
	popl	%ebx
	popl	%ebp
	ret

-------------------

More aggressive compilation:
g++ -O3 -s -fomit-frame-pointer -msse3 -march=native -ffast-math -S

__Z4testb:
	subl	$4, %esp
	cmpb	$0, 8(%esp)
	jne	L2
	movl	$1000000000, %eax
	.p2align 4,,10
L3:
	decl	%eax
	jne	L3
	fldz
	addl	$4, %esp
	ret
	.p2align 4,,10
L2:
	fldz
	xorl	%eax, %eax
	fld	%st(0)
	.p2align 4,,10
L5:
	movl	%eax, (%esp)
	faddp	%st, %st(1)
	incl	%eax
	fildl	(%esp)
	cmpl	$1000000000, %eax
	fsin
	jne	L5
	fstp	%st(0)
	addl	$4, %esp
	ret

--------------------------

This is a D1 translation:


import tango.math.Math: sin;
import tango.stdc.stdio: printf;
import tango.stdc.stdlib: atoi;

double test(bool b) {
    double d = 0.0;
    double u = 0.0;
    for (int n; n < 1_000_000_000; n++) {
        d += u;
        if (b)
            u = sin(cast(double)n);
    }

    return d;
}

void main() {
    bool b = cast(bool)atoi("1");
    printf("%f\n", test(b));    
}


Compiled with:
ldc -O3 -release -inline test.d
Asm produced (note the je .LBB1_4 near the top):


_D5test54testFbZd:
	pushl	%esi
	subl	$64, %esp
	testb	$1, %al
	je	.LBB1_4
	pxor	%xmm0, %xmm0
	movsd	%xmm0, 32(%esp)
	movl	$1000000000, %esi
	movsd	%xmm0, 24(%esp)
	movsd	%xmm0, 16(%esp)
	.align	16
.LBB1_2:
	movsd	32(%esp), %xmm0
	movsd	%xmm0, 56(%esp)
	fldl	56(%esp)
	fstpt	(%esp)
	call	sinl
	fstpl	48(%esp)
	movsd	24(%esp), %xmm1
	addsd	16(%esp), %xmm1
	movsd	%xmm1, 24(%esp)
	decl	%esi
	movsd	32(%esp), %xmm0
	addsd	.LCPI1_0, %xmm0
	movsd	%xmm0, 32(%esp)
	movsd	48(%esp), %xmm0
	movsd	%xmm0, 16(%esp)

	jne	.LBB1_2
.LBB1_3:
	movsd	24(%esp), %xmm0
	movsd	%xmm0, 40(%esp)
	fldl	40(%esp)
	addl	$64, %esp
	popl	%esi
	ret
.LBB1_4:
	movl	$1000000000, %eax
	.align	16
.LBB1_5:
	decl	%eax
	jne	.LBB1_5
	pxor	%xmm0, %xmm0
	movsd	%xmm0, 24(%esp)
	jmp	.LBB1_3

This runs in about 86 seconds.

--------------------------

Aggressive compilation with LDC:
ldc -O3 -release -inline -enable-unsafe-fp-math -unroll-allow-partial test.d

_D5test54testFbZd:
	subl	$92, %esp
	testb	$1, %al
	je	.LBB1_4
	pxor	%xmm0, %xmm0
	xorl	%eax, %eax
	movapd	%xmm0, %xmm1
	movapd	%xmm0, %xmm2
	.align	16
.LBB1_2:
	leal	1(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 40(%esp)
	leal	2(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 48(%esp)
	leal	3(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 56(%esp)
	leal	4(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 64(%esp)
	movsd	%xmm0, 80(%esp)
	fldl	80(%esp)
	fsin
	fstpl	72(%esp)
	fldl	40(%esp)
	fsin
	fstpl	8(%esp)
	fldl	48(%esp)
	fsin
	fstpl	16(%esp)
	fldl	56(%esp)
	fsin
	fstpl	24(%esp)
	fldl	64(%esp)
	fsin
	fstpl	32(%esp)
	addsd	%xmm1, %xmm2
	addsd	72(%esp), %xmm2
	addsd	8(%esp), %xmm2
	addsd	16(%esp), %xmm2
	movapd	%xmm2, %xmm1
	addsd	24(%esp), %xmm1
	addl	$5, %eax
	cmpl	$1000000000, %eax
	addsd	.LCPI1_0, %xmm0
	movsd	32(%esp), %xmm2

	jne	.LBB1_2
.LBB1_3:
	movsd	%xmm1, (%esp)
	fldl	(%esp)
	addl	$92, %esp
	ret
.LBB1_4:
	xorl	%eax, %eax
	.align	16
.LBB1_5:
	addl	$10, %eax
	cmpl	$1000000000, %eax
	jne	.LBB1_5
	pxor	%xmm1, %xmm1
	jmp	.LBB1_3


This runs in about 58 seconds. Note also that the loop is partially unrolled:
each pass through .LBB1_2 computes five sines and advances the counter by 5.

Here both G++ and LDC are performing loop unswitching.
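
For reference, this is roughly what the unswitched code looks like when
written by hand. It is a C++ sketch, not compiler output: the function names
and the `iters` parameter (added so the loops can be run with small counts)
are hypothetical. The test on `b` is hoisted out of the loop and the loop is
duplicated once per outcome, which is what the `cmpb`/`jne` (g++) and
`testb`/`je` (LDC) at the top of the listings above do.

```cpp
#include <cassert>
#include <cmath>

// Same shape as test() above: the loop-invariant test on b is
// evaluated on every iteration.
double sum_switched(bool b, int iters) {
    assert(iters >= 0);
    double d = 0.0;
    double u = 0.0;
    for (int n = 0; n < iters; n++) {
        d += u;
        if (b)
            u = std::sin(static_cast<double>(n));
    }
    return d;
}

// Unswitched by hand: the test on b is hoisted out of the loop and the
// loop is duplicated, one copy per outcome, so neither copy branches on b.
double sum_unswitched(bool b, int iters) {
    assert(iters >= 0);
    double d = 0.0;
    double u = 0.0;
    if (b) {
        for (int n = 0; n < iters; n++) {
            d += u;
            u = std::sin(static_cast<double>(n));
        }
    } else {
        for (int n = 0; n < iters; n++)
            d += u;  // u is never updated here, so this only adds 0.0
    }
    return d;
}
```

The cost of the transformation is code size: each hoisted boolean doubles the
loop body, which is why compilers apply it selectively.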

Bye,
bearophile

May 18 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 16/05/10 15:27, Dan W wrote:
 Hi all, I'm toying around with the idea of porting my raytracer codebase to D.
 But before committing, I have a few rookie questions:

 1: What kind of license is the D compiler under? I'm thinking of shipping a
 commercial, close sourced (for now) program with the D compiler (so that users
 can compile within the GUI). Is this possible to do, or can I least pay for the
 priviledge?

dmd is under 2 (3) licenses, one for the front end and one for the 
backend. I won't go into details, you can find the details in the 
archives though. Long story short if you want to redistribute dmd you 
have to ask Walter for the privilege. LDC and GDC have no such 
restrictions, you can include them as long as you don't modify the 
source, and if you do then you distribute the source as well as the 
binaries.

 2: Is it possible to use D with the Visual C++ IDE? Preferably, I would like
 the apprepriate compiler and D options listed in the options (in place of the
 usual c/c++ options).

Try VisualD, which was released about a month ago. I haven't tried it 
yet; I believe it still has some way to go... That said, its current 
feature list looks impressive.

http://dsource.org/projects/visuald/

 3: I need my program to be as fast as possible. The Visual C++ compiler has
 features such as "link-time code generation" and "Profile guided optimization".
 Does D have equivalents?

If you want LTO you'll need to use LDC with some fancy compilation steps 
(I believe bearophile, our resident benchmarker, should be able to 
provide you with these). The downside to LDC is that it does not support 
exceptions on Windows (it will support them as soon as LLVM does).

 4: Does D play nicely with QT, SDL, Lua?

See:
http://dsource.org/projects/qtd/ - Qt bindings
http://dsource.org/projects/luad/ - Lua bindings
http://dsource.org/projects/derelict/ - Various bindings for 
multimedia/game apps including SDL, OpenGL, OpenAL etc

 5: How about compatibility with GPGPU stuff like CUDA and OpenCL? Can I just as
 easily write GPGPU programs which run as fast as I can with C/C++?

I don't know what the status of this is; I think a couple of people have 
written some initial bindings for either CUDA or OpenCL, and perhaps someone 
else can enlighten you as to their status. As for their speed, it will be 
just as fast as the equivalent code in C/C++.

I hope things go well for you. There are a lot of initial hurdles to 
getting into D, but once you find your way around them you'll learn to 
love this great language! Lots of people have written ray tracers in D, 
so should you need assistance, there are people who can help.

May 16 2010

Jérôme M. Berger <jeberger free.fr> writes:

Robert Clipsham wrote:
 LDC and GDC have no such
 restrictions, you can include them as long as you don't modify the
 source, and if you do then you distribute the source as well as the
 binaries.

	That's a common misconception about the GPL: you have to distribute
the source even if you didn't modify it.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

May 16 2010

Leandro Lucarella <llucax gmail.com> writes:

"Jérôme M. Berger", on 16 May at 22:50 you wrote:
 Robert Clipsham wrote:
 LDC and GDC have no such
 restrictions, you can include them as long as you don't modify the
 source, and if you do then you distribute the source as well as the
 binaries.
 

 	That's a common misconception about the GPL: you have to distribute
 the source even if you didn't modify it.

The source must be available. You usually don't distribute the source if
you didn't modify the program because anyone can find it in the original
place. But when you do modify it, you must provide a way to access the
source.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
More than 50% of the people in the world have never made
Or received a telephone call

May 16 2010

"Jérôme M. Berger" <jeberger free.fr> writes:

Leandro Lucarella wrote:
 "Jérôme M. Berger", on May 16 at 22:50, you wrote:
 Robert Clipsham wrote:
 LDC and GDC have no such
 restrictions, you can include them as long as you don't modify the
 source, and if you do then you distribute the source as well as the
 binaries.

 	That's a common misconception about the GPL: you have to distribute
 the source even if you didn't modify it.

 The source must be available. You usually don't distribute the source if
 you didn't modify the program because anyone can find it in the original
 place. But when you do modify it, you must provide a way to access the
 source.

	Here is the relevant section in GPLv2:
==============================8<------------------------------
  3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable
    source code, which must be distributed under the terms of Sections
    1 and 2 above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three
    years, to give any third party, for a charge no more than your
    cost of physically performing source distribution, a complete
    machine-readable copy of the corresponding source code, to be
    distributed under the terms of Sections 1 and 2 above on a medium
    customarily used for software interchange; or,

    c) Accompany it with the information you received as to the offer
    to distribute corresponding source code.  (This alternative is
    allowed only for noncommercial distribution and only if you
    received the program in object code or executable form with such
    an offer, in accord with Subsection b above.)
------------------------------>8==============================

	Since the OP was talking about a commercial distribution, point (c)
does not apply, and therefore the source must be redistributed.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

May 17 2010

Alex Makhotin <alex bitprox.com> writes:

Dan W wrote:
 Hi all, I'm toying around with the idea of porting my raytracer codebase to D.

Hi,

May I ask why you are planning to port an existing codebase to D?
What specific benefits (apart from C-comparable performance) do you
expect from D?

Thank you.



-- 
Alex Makhotin,
the founder of BITPROX,
http://bitprox.com

May 16 2010

bearophile <bearophileHUGS lycos.com> writes:

Alex Makhotin:

 May I ask you why are you planning to port an existing codebase to D?
 What kind of benefits specifically(except comparable to C performance) 
 you expect from D?

At the moment performance (compared to C++ code compiled with GCC or ICC) is
not a selling point of D.
But D can be advertised for its other qualities: compared to C or C++ it's very
nice to write D code, it's handier, and a little safer. This can be enough
to justify a switch from C++ to D :-)
A problem with such an advertising strategy is that a lot of people I know
don't seem to be looking for a better C++; they seem to want to keep away from
anything that smells even a bit of C++ :-(

Bye,
bearophile

May 16 2010

BCS <none anon.com> writes:

Hello Dan,

 Hi all, I'm toying around with the idea of porting my raytracer
 codebase to D. But before committing, I have a few rookie questions:
 
 1: What kind of license is the D compiler under? I'm thinking of
 shipping a commercial, close sourced (for now) program with the D
 compiler (so that users can compile within the GUI). Is this possible
 to do, or can I least pay for the priviledge?

The front end is under an Open Source (R) license. The back end is open source
only in the sense that you can see the source. Several projects combine the
front end with a FOSS back end that can be shipped, but you can't ship copies
of the official exe without Walter's OK. That said, he's been known to give it
at no cost if you ask really nicely.

 2: Is it possible to use D with the Visual C++ IDE? Preferably, I
 would like the apprepriate compiler and D options listed in the
 options (in place of the usual c/c++ options).

A D plugin that allows that was posted recently. I've never gotten it
working, but I think that's me FUBARing VS.

 3: I need my program to be as fast as possible. The Visual C++
 compiler has features such as "link-time code generation" and "Profile
 guided optimization". Does D have equivalents?

For link-time code generation, you might get the same effect via templates
(they are way easier in D than in C++). As for profile-guided optimization, I
think DMD can do some of that, but I don't remember the details.
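A minimal sketch of the template idea, assuming a raytracer that works on 3-component vectors. `dot` and `lengthSquared` are hypothetical helpers, not library functions. Because a D template is instantiated from source inside the module that uses it, the compiler sees the body at every call site and can inline it (for dmd, with -inline) without any link-time code generation. Whether a given call actually gets inlined is still up to the compiler.

```d
/// Hypothetical raytracer helper written as a template: its body is
/// compiled in whichever module instantiates it, so it is visible to
/// the inliner there even if it is defined in another module.
T dot(T)(const T[3] a, const T[3] b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/// Squared length of a vector, built on dot(); also a template, so the
/// whole call chain is available to the compiler at the use site.
T lengthSquared(T)(const T[3] v)
{
    return dot(v, v);
}
```

The same source works for `float`, `double` or `int` vectors; each instantiation is generated where it is used.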

 5: How about compatibility with GPGPU stuff like CUDA and OpenCL?

I remember seeing some work in that direction about 2-3 years ago. If you 
can get a C API to that stuff, you can do it in D. There might be a wrapper 
somewhere that gives an API that's cleaner to use from D.
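The "if there is a C API, you can use it from D" point can be sketched like this. The C standard library's `qsort` stands in for a CUDA/OpenCL entry point (an assumption made so the example needs no GPU SDK), and the declaration is written by hand the way a binding would be. `CompareFn`, `compareInts` and `sortWithC` are hypothetical names.

```d
// Hand-written binding to a C function, the same technique a CUDA or
// OpenCL binding uses. Everything in this block gets C linkage,
// including the callback type alias.
extern (C)
{
    // Matches C's int (*)(const void*, const void*)
    alias CompareFn = int function(const(void)* a, const(void)* b);
    void qsort(void* base, size_t nmemb, size_t size, CompareFn compar);
}

// A callback handed to C code must itself use C linkage.
extern (C) int compareInts(const(void)* a, const(void)* b)
{
    const x = *cast(const(int)*) a;
    const y = *cast(const(int)*) b;
    return (x > y) - (x < y);   // -1, 0 or 1 without overflow
}

/// Thin D-friendly wrapper: takes a slice instead of pointer + length.
void sortWithC(int[] values)
{
    qsort(values.ptr, values.length, int.sizeof, &compareInts);
}
```

A wrapper layer like `sortWithC` is what "an API that's cleaner to use from D" usually means: the raw C declarations stay as-is, and D code calls slice-based functions on top of them.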

 Can I just as easily write GPGPU programs which run as fast as I can with C/C++?

Assuming a reasonable API, you should be able to whip out D code to interact 
with CUDA/OpenCL at least as fast as you can write the same in C/C++.


-- 
... <IXOYE><

May 16 2010

bearophile <bearophileHUGS lycos.com> writes:

Dan W.:

 3: I need my program to be as fast as possible. The Visual C++ compiler has
 features such as "link-time code generation"

This page explains this topic:
http://msdn.microsoft.com/en-us/magazine/cc301698.aspx

Bye,
bearophile

May 16 2010

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 3: I need my program to be as fast as possible. The Visual C++ compiler has
 features such as "link-time code generation"

 
 This page explains this topic:
 http://msdn.microsoft.com/en-us/magazine/cc301698.aspx

What's actually happening is interprocedural analysis and inlining across
source modules. In C++ this needs to happen at link time because in C++ each
source file is a separate compilation unit, compiled completely independently
of the other source files.

This is not true of D. In D, the compiler can (depending on how it is
invoked and how the programmer sets up the source modules) look at all the
source to the program. Hence, a lot of inlining can (and does) happen across
modules without needing any support from the linker.

May 16 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter Bright:

This is not true of D. In D, the compiler can<

Thank you for your answers.
At the moment D compilers aren't doing this, I think (LDC does perform some
optimizations at link time). But it's nice that D leaves this optimization
opportunity open to future D compilers.

----------------------

As you may have noticed, that HTML page lists three optimizations. The first
one is the one you have explained.

The second optimization it talks about is custom calling conventions:

Normally, all functions are either cdecl, stdcall, or fastcall. With custom
calling conventions, the back end has enough knowledge that it can pass more
values in registers, and less on the stack. This usually cuts code size and
improves performance.<


I have translated his demo code to D:

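// j and k are passed by pointer, so foo() both reads and writes them
int foo(int i, int* j, int* k, int l) {
    *j = *k;
    *k = i + l;
    return i + *j + *k + l;
}
int main(string[] args) {
    int i, j, k, l;
    // args.length is a size_t; the casts keep this compiling on 64-bit targets
    l = i = cast(int) args.length;
    int x = foo(i, &j, &k, l);
    return x * cast(int) args.length;
}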


This is how dmd compiles foo() (-O -release):

_D7stdcall3fooFiPiPiiZi	comdat
		push	EAX
		mov	ECX,8[ESP]
		mov	EDX,[ECX]
		push	EBX
		mov	EBX,010h[ESP]
		push	ESI
		mov	ESI,018h[ESP]
		push	EDI
		lea	EDI,[EAX][ESI]
		mov	[EBX],EDX
		mov	[ECX],EDI
		mov	EAX,[EBX]
		add	EAX,ESI
		add	EAX,EDI
		add	EAX,0Ch[ESP]
		pop	EDI
		pop	ESI
		pop	EBX
		pop	ECX
		ret	0Ch


This is how LDC compiles foo() with -O3 -release:

_D4test3fooFiPiPiiZi:
    pushl   %esi
    movl    8(%esp), %ecx
    movl    (%ecx), %edx
    movl    12(%esp), %esi
    movl    %edx, (%esi)
    addl    16(%esp), %eax
    movl    %eax, (%ecx)
    addl    %eax, %eax
    addl    (%esi), %eax
    popl    %esi
    ret $12


This is the asm of foo() shown in that article:

_foo:
    mov         ecx,dword ptr [eax]
    mov         dword ptr [esi],ecx
    lea         ecx,[edi+edx]
    mov         dword ptr [eax],ecx
    mov         eax,dword ptr [esi]    // *j
    add         eax,ecx                // *k sub-expression (from
    add         eax,edi                // l
    add         eax,edx                // i
    ret

It seems LDC isn't performing this optimization.

----------------------

The third optimization it talks about is 'Small TLS Encoding':

When you use __declspec(thread) variables, the code generator stores the
variables at a fixed offset in each per-thread data area. Without LTCG, the
code generator has no idea of how many __declspec(thread) variables there will
be. As such, it must generate code that assumes the worst, and uses a four-byte
offset to access the variable. With LTCG, the code generator has the
opportunity to examine all __declspec(thread) variables, and note how often
they're used. The code generator can put the smaller, more frequently used
variables at the beginning of the per-thread data area and use a one-byte
offset to access them.<

This is the C++ example code he uses:

__declspec(thread) int i = 1;
int main() {
    i = 4;
    return i;
}


The asm he shows without this optimization:
_main:
    mov         eax,dword ptr [__tls_index]
    mov         ecx,dword ptr fs:[2Ch]
    mov         ecx,dword ptr [ecx+eax*4]
    push        4
    pop         eax
    mov         dword ptr [ecx+4],eax
    ret



The asm he shows with this optimization:
_main:
    mov         eax,dword ptr fs:[0000002Ch]
    mov         ecx,dword ptr [eax]
    mov         eax,4
    mov         dword ptr [ecx+8],eax
    ret


I have translated that last C++ example in this D code:

int i = 1;   // module-level variables are thread-local by default in D2
int main() {
    i = 4;
    return i;
}


I don't think I can test this with LDC, because it doesn't have TLS/__gshared.
dmd compiles it to:

__Dmain
		mov	ECX,FS:__tls_array
		mov	EDX,[ECX]
		mov	EAX,4
		mov	_D4test1ii[EDX],EAX
		ret


On this little example dmd seems to produce asm similar to the optimized version.
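To put numbers on the "one-byte versus four-byte offset" claim, here is a sketch that encodes the x86 instruction `mov dword ptr [reg+disp], eax` (the form of the `mov dword ptr [ecx+4],eax` line above), picking the short disp8 form when the offset fits in a signed byte. `encodeMovToMem` is a hypothetical helper; it models only the ModRM displacement forms and deliberately leaves out the `fs:` segment override, SIB bytes, and ESP as a base register.

```d
/// Encodes `mov dword ptr [base + disp], eax` (opcode 0x89, MOV r/m32, r32).
/// baseReg uses the x86 register numbers: EAX=0, ECX=1, EDX=2, EBX=3,
/// EBP=5, ESI=6, EDI=7. ESP (4) needs a SIB byte, which is not emitted here.
ubyte[] encodeMovToMem(ubyte baseReg, int disp)
{
    enum ubyte opcode = 0x89;
    enum ubyte srcReg = 0;   // EAX goes in the ModRM reg field
    assert(baseReg < 8 && baseReg != 4, "ESP base requires a SIB byte");

    if (disp >= byte.min && disp <= byte.max)
    {
        // mod = 01: [base + disp8], 3 bytes in total
        ubyte modrm8 = cast(ubyte)((0b01 << 6) | (srcReg << 3) | baseReg);
        return [opcode, modrm8, cast(ubyte) disp];
    }

    // mod = 10: [base + disp32], little-endian displacement, 6 bytes in total
    ubyte modrm32 = cast(ubyte)((0b10 << 6) | (srcReg << 3) | baseReg);
    uint d = cast(uint) disp;
    return [opcode, modrm32,
            cast(ubyte) d, cast(ubyte)(d >> 8),
            cast(ubyte)(d >> 16), cast(ubyte)(d >> 24)];
}
```

With ECX as the base, an offset of 4 encodes as 89 41 04 (three bytes), while an offset of 0x1000 needs 89 81 00 10 00 00 (six bytes). The saving is code size; the instruction does the same work either way.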

Bye,
bearophile

May 16 2010

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 Walter Bright:
 
 This is not true of D. In D, the compiler can<

 
 Thank you for your answers. At the moment D compilers aren't doing this,

Yes, they are. dmd definitely inlines across source modules.


 The second optmizations it talks about is custom calling conventions:
 
 Normally, all functions are either cdecl, stdcall, or fastcall. With custom
 calling conventions, the back end has enough knowledge that it can pass
 more values in registers, and less on the stack. This usually cuts code
 size and improves performance.<


Right, dmd doesn't do custom calling conventions. But it is not necessary for D
to have the linker do them. As I explained, the compiler has as much source
available to it as the user wishes to supply.


 The third optimizations it talks about is 'Small TLS Encoding':
 
 When you use __declspec(thread) variables, the code generator stores the
 variables at a fixed offset in each per-thread data area. Without LTCG, the
 code generator has no idea of how many __declspec(thread) variables there
 will be. As such, it must generate code that assumes the worst, and uses a
 four-byte offset to access the variable. With LTCG, the code generator has
 the opportunity to examine all __declspec(thread) variables, and note how
 often they're used. The code generator can put the smaller, more frequently
 used variables at the beginning of the per-thread data area and use a
 one-byte offset to access them.<


Yes, but you won't find this to be a speed improvement. The various addressing 
modes all run at the same speed. Furthermore, the use of global variables (and 
that includes TLS) should be minimized. Use of TLS (or any globals) in a tight 
loop should be avoided on general principles in favor of caching the value in a 
local. I don't believe this optimization is worth the effort.

Many compilers spend a lot of time trying to optimize access to statics and 
globals. This ain't low hanging fruit for any but badly written programs.
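The advice to cache a global in a local inside a tight loop can be sketched in D as follows. `hitCount`, `countDirect` and `countCached` are hypothetical names. In D2 a module-level variable such as `hitCount` is thread-local unless declared `shared` or `__gshared`, so it plays the role of the `__declspec(thread)` variable discussed above. Whether the direct version really touches the TLS slot on every iteration depends on the optimizer; the cached version simply hands it a local that is easy to keep in a register.

```d
// Module-level, therefore thread-local in D2 (hypothetical counter).
int hitCount;

/// Updates the TLS variable inside the loop.
void countDirect(const(int)[] values)
{
    foreach (v; values)
        if (v > 0)
            ++hitCount;
}

/// Loads the TLS variable once, counts in a local, and stores it back
/// once at the end.
void countCached(const(int)[] values)
{
    int local = hitCount;
    foreach (v; values)
        if (v > 0)
            ++local;
    hitCount = local;
}
```

Within one thread both functions leave the same value in `hitCount`; only the access pattern inside the loop differs.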

May 16 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter Bright:

 Right, dmd doesn't do custom calling conventions. But, it is not necessary for D
 to have the linker do them. As I explained, the compiler has as much source
 available to it as the user wishes to supply.

I'll talk about this a bit with LLVM devs.
Thank you for all your explanations, you often teach me things.

Bye,
bearophile

May 16 2010

"Nick Sabalausky" <a a.a> writes:

"Dan W" <twinbee42 skytopia.com> wrote in message 
news:hsovdd$1s1j$1 digitalmars.com...
 2: Is it possible to use D with the Visual C++ IDE? Preferably, I would like
 the apprepriate compiler and D options listed in the options (in place of the
 usual c/c++ options).

Other people mentioned the recent D plugin for Visual Studio. If that isn't 
mature enough for you, there's a very mature plugin for Eclipse called 
Descent: http://www.dsource.org/projects/descent

 3: I need my program to be as fast as possible.

Optimization often seems to be a mixed bag across any two modern languages.

On one hand, there are some cases where D can be a little slower than 
average. For instance, I've heard that the GC isn't great at handling lots 
of small objects. Bearophile can probably tell you a lot about any slow 
spots of D, he's done a lot of testing in that area.

On the other hand, there's plenty that D is fast at. Other people have
already mentioned a lot about this. But I'll also add that D's design has a
few features that allow certain things to be done more efficiently than can
easily be done in C/C++. Array slicing (combined with GC), for example, has
been shown to go a long way toward making a ridiculously fast (and
memory-efficient) XML parser with less effort than it would take in C/C++:

http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-parsequerymutateserialize/
http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/
http://dotnot.org/blog/archives/2008/03/12/why-is-dtango-so-fast-at-parsing-xml/
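The slicing idea behind those results can be sketched with a toy scanner: every tag name it returns is a slice pointing into the original buffer, so no characters are copied. `startTagNames` is a hypothetical function, not the Tango parser's API, and it ignores comments, CDATA and quoting rules, so it is an illustration rather than a conforming XML parser.

```d
/// Returns the names of all start tags in `xml` as slices of the input.
/// Each result shares memory with `xml`; only the array of slices is allocated.
const(char)[][] startTagNames(const(char)[] xml)
{
    const(char)[][] names;
    size_t i = 0;
    while (i < xml.length)
    {
        // A start tag: '<' not followed by '/', '?' or '!'
        if (xml[i] == '<' && i + 1 < xml.length
            && xml[i + 1] != '/' && xml[i + 1] != '?' && xml[i + 1] != '!')
        {
            size_t start = i + 1;
            size_t end = start;
            // The name ends at whitespace, '>' or '/'
            while (end < xml.length && xml[end] != ' ' && xml[end] != '\t'
                   && xml[end] != '\n' && xml[end] != '>' && xml[end] != '/')
                ++end;
            names ~= xml[start .. end];   // a slice, not a copy
            i = end;
        }
        else
            ++i;
    }
    return names;
}
```

In C or C++ the equivalent zero-copy approach means tracking pointer/length pairs and who owns the buffer by hand; in D the slice carries both, and the GC keeps the buffer alive while any slice refers to it.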

May 16 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 16 May 2010 10:27:57 -0400, Dan W <twinbee42 skytopia.com> wrote:
 5: How about compatibility with GPGPU stuff like CUDA and OpenCL? Can I just as
 easily write GPGPU programs which run as fast as I can with C/C++?

I have some decent CUDA bindings with a nice high level API that I'd be  
willing to share/open source. But you still have to write the actual GPU  
kernels in C/C++.

May 16 2010
