D.gnu - D vs C code generation

wscott wscott1.homeip.net (Wayne Scott) writes:
I was doing some tests using the gdc compiler and comparing it to gcc.

First I created a C version of the example wc program:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>

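/* Read the whole file into a NUL-terminated buffer; exits on any error. */
char *
readfile(char *file)
{
	char	*ret;
	int	fd;
	ssize_t	n;
	struct stat sb;

	fd = open(file, O_RDONLY);
	if (fd < 0 || fstat(fd, &sb) < 0) {
		perror(file);
		exit(1);
	}
	ret = malloc(sb.st_size + 1);
	if (!ret) {
		perror("malloc");
		exit(1);
	}
	n = read(fd, ret, sb.st_size);
	if (n < 0) {
		perror(file);
		exit(1);
	}
	ret[n] = 0;		/* terminate so the scan loop stops at NUL */
	close(fd);
	return (ret);
}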

int
main (int ac, char **av)
{
	int	w_total = 0;
	int	l_total = 0;
	int	c_total = 0;
	int	i;

	printf ("   lines   words   bytes file\n");
	for (i = 1; i < ac; i++) {
		char	*input;
		int	w_cnt = 0, l_cnt = 0, c_cnt = 0;
		int	inword = 0;
		char	*p;

		input = readfile(av[i]);

		p = input;
		while (*p) {
			if (*p == '\n') ++l_cnt;
			if (*p != ' ') {
				if (!inword) {
					inword = 1;
					++w_cnt;
				}
			} else {
				inword = 0;
			}
			++c_cnt;
			++p;
		}
		free(input);
		/* the counters are plain ints, so print them with %d */
		printf ("%8d%8d%8d %s\n", l_cnt, w_cnt, c_cnt, av[i]);
		l_total += l_cnt;
		w_total += w_cnt;
		c_total += c_cnt;
	}
	if (ac > 2) {
		printf ("--------------------------------------\n"
		    "%8d%8d%8d total\n",
		    l_total, w_total, c_total);
	}
	return (0);
}

Then I compiled both versions with -O2 and used cachegrind to find
out exactly how many instructions each one needed to run.  Here are the
results, with the C version first.

(This is over 2 megs of C source)
$ valgrind --tool=cachegrind ./wc_c ~/bk/bk-3.3.x/src/*.c > /dev/null
==3349== I   refs:      22,529,481
==3349== I1  misses:           784
==3349== L2i misses:           778
==3349== I1  miss rate:        0.0%
==3349== L2i miss rate:        0.0%
==3349== 
==3349== D   refs:       2,393,366  (2,262,770 rd + 130,596 wr)
==3349== D1  misses:        10,159  (    9,671 rd +     488 wr)
==3349== L2d misses:         9,680  (    9,315 rd +     365 wr)
==3349== D1  miss rate:        0.4% (      0.4%   +     0.3%  )
==3349== L2d miss rate:        0.4% (      0.4%   +     0.2%  )
==3349== 
==3349== L2 refs:           10,943  (   10,455 rd +     488 wr)
==3349== L2 misses:         10,458  (   10,093 rd +     365 wr)
==3349== L2 miss rate:         0.0% (      0.0%   +     0.2%  )
farm Dlang $ valgrind --tool=cachegrind ./wc_d ~/bk/bk-3.3.x/src/*.c > /dev/null
==3351== Cachegrind, an I1/D1/L2 cache profiler for x86-linux.
==3351== I   refs:      29,081,497
==3351== I1  misses:         1,216
==3351== L2i misses:         1,199
==3351== I1  miss rate:        0.0%
==3351== L2i miss rate:        0.0%
==3351== 
==3351== D   refs:       4,891,118  (3,663,754 rd + 1,227,364 wr)
==3351== D1  misses:        61,871  (   24,677 rd +    37,194 wr)
==3351== L2d misses:        60,880  (   23,757 rd +    37,123 wr)
==3351== D1  miss rate:        1.2% (      0.6%   +       3.0%  )
==3351== L2d miss rate:        1.2% (      0.6%   +       3.0%  )
==3351== 
==3351== L2 refs:           63,087  (   25,893 rd +    37,194 wr)
==3351== L2 misses:         62,079  (   24,956 rd +    37,123 wr)
==3351== L2 miss rate:         0.1% (      0.0%   +       3.0%  )

As you can see, the D version of the code used about 29% more
instructions (29.1M vs. 22.5M I refs) and about 104% more data
accesses (4.9M vs. 2.4M D refs).
(BTW the system wc program was a lot slower than both of these...)
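
For readers who want to check those percentages, here is a small C sketch that recomputes them from the "I refs" and "D refs" totals in the two cachegrind runs above. The helper names pct_more and report are made up for this illustration and are not part of the original post; the constants are copied from the wc_c and wc_d output.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post): by how many
 * percent does `other` exceed `base`? */
double
pct_more(double other, double base)
{
	return ((other - base) * 100.0 / base);
}

/* Print the gdc-vs-gcc comparison from the totals reported above. */
void
report(void)
{
	/* "I refs" and "D refs" lines from the wc_c and wc_d runs */
	const double c_irefs = 22529481.0, d_irefs = 29081497.0;
	const double c_drefs = 2393366.0, d_drefs = 4891118.0;

	printf("instructions:  %.1f%% more\n", pct_more(d_irefs, c_irefs));
	printf("data accesses: %.1f%% more\n", pct_more(d_drefs, c_drefs));
}
```

Calling report() from a main() of your own prints the two figures. The same helper works for any other pair of cachegrind totals, such as the miss counts.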

That is not too bad for the benefits, but I was hoping they would
be closer.  Originally I was seeing MUCH different results, but I was
using smaller input sets.  D has a much higher startup overhead.

Next I tried making the D code look like my C version, without the
dynamic arrays and just using pointers.  It didn't really change the
numbers at all.  Also, adding -fno-bounds-check didn't help.  That is a
good sign because it means that the array code generates the same code
you would write using pointers.
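
The D source is not shown in this post, so the following is only a C analogue of the two styles being compared: an index-based scan over an explicit (buffer, length) pair, loosely like iterating a D dynamic array, and the pointer-walking scan used in main() above. The struct and function names are assumptions made for this sketch. It shows that the two loops compute the same counts; whether a given compiler emits the same instructions for both is something to measure, not something this code proves.

```c
#include <stddef.h>

/* Illustrative result type; not part of the original program. */
struct counts {
	int	lines;
	int	words;
	int	chars;
};

/* Index-based scan over an explicit (buffer, length) pair, loosely
 * analogous to iterating over a D dynamic array. */
struct counts
count_indexed(const char *buf, size_t len)
{
	struct counts c = { 0, 0, 0 };
	int	inword = 0;
	size_t	i;

	for (i = 0; i < len; i++) {
		if (buf[i] == '\n') ++c.lines;
		if (buf[i] != ' ') {
			if (!inword) {
				inword = 1;
				++c.words;
			}
		} else {
			inword = 0;
		}
		++c.chars;
	}
	return (c);
}

/* Pointer-walking scan with the same logic as the loop in main():
 * only ' ' ends a word, and the scan stops at the terminating NUL. */
struct counts
count_pointer(const char *p)
{
	struct counts c = { 0, 0, 0 };
	int	inword = 0;

	while (*p) {
		if (*p == '\n') ++c.lines;
		if (*p != ' ') {
			if (!inword) {
				inword = 1;
				++c.words;
			}
		} else {
			inword = 0;
		}
		++c.chars;
		++p;
	}
	return (c);
}
```

For a NUL-terminated buffer with no embedded NUL bytes, count_indexed(buf, strlen(buf)) and count_pointer(buf) return the same counts. Building each variant into its own binary and running both under cachegrind is one way to see how close the generated code is.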

Anyway I thought the result was interesting...

-Wayne
Jul 17 2004
Stephen Waits <steve waits.net> writes:
Wayne Scott wrote:
 Anyway I thought the result was interesting...

Interesting indeed.  Can you please say which versions of dmd and gcc
you used?

Thanks,
Steve
Jul 19 2004
wscott wscott1.homeip.net (Wayne Scott) writes:
In article <cdhcbc$no3$2 digitaldaemon.com>,
Stephen Waits  <steve waits.net> wrote:
Wayne Scott wrote:
 Anyway I thought the result was interesting...

Interesting indeed.  Can you please say which versions of dmd and gcc
you used?

Thanks,
Steve

Ahh yes, I did leave out that information.

I used release 1f of the D gcc frontend from here:
http://home.earthlink.net/~dvdfrdmn/d/
built on top of GCC 3.3.4, and compared it to that same gcc.

I tried rebuilding with the Linux version of the official compiler and
I get this result with -O -release:

==22599== Cachegrind, an I1/D1/L2 cache profiler for x86-linux.
==22599== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
==22599== Using valgrind-2.1.1, a program supervision framework for x86-linux.
==22599== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward.
==22599== For more details, rerun with: -v
==22599== 
==22599== 
==22599== I   refs:      23,711,446
==22599== I1  misses:         1,066
==22599== L2i misses:         1,055
==22599== I1  miss rate:        0.0%
==22599== L2i miss rate:        0.0%
==22599== 
==22599== D   refs:       7,230,055  (6,404,429 rd +   825,626 wr)
==22599== D1  misses:        48,292  (   10,964 rd +    37,328 wr)
==22599== L2d misses:        46,787  (    9,685 rd +    37,102 wr)
==22599== D1  miss rate:        0.6% (      0.1%   +       4.5%  )
==22599== L2d miss rate:        0.6% (      0.1%   +       4.4%  )
==22599== 
==22599== L2 refs:           49,358  (   12,030 rd +    37,328 wr)
==22599== L2 misses:         47,842  (   10,740 rd +    37,102 wr)
==22599== L2 miss rate:         0.1% (      0.0%   +       4.4%  )

That is similar to the number of instructions in the C version, but
over 3X the number of D refs.  The number of D1 misses was the same so
the extra loads and stores were probably all on the stack.

-Wayne

PS: Does anyone read this newsgroup or should I have posted this stuff
    to the digitalmars.D newsgroup?
Jul 20 2004
Stephen Waits <steve waits.net> writes:
Wayne Scott wrote:

 PS: Does anyone read this newsgroup or should I have posted this stuff
     to the digitalmars.D newsgroup?

Probably wouldn't hurt to cross post it over there.  It does involve
DMD in addition to gcc, so it's on-topic in both groups.  You'll
definitely get more response in the main group.

--Steve
Jul 20 2004