
D.gnu - D vs C code generation

wscott wscott1.homeip.net (Wayne Scott) writes:
I was doing some tests using the gdc compiler and comparing it to gcc.

First I created a C version of the example wc program:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>

char *
readfile(char *file)
{
	char	*ret;
	int	fd;
	struct stat sb;

	stat(file, &sb);

	ret = malloc(sb.st_size + 1);
	fd = open(file, O_RDONLY);
	read(fd, ret, sb.st_size);
	ret[sb.st_size] = 0;
	close(fd);
	return (ret);
}

int
main (int ac, char **av)
{
	int	w_total = 0;
	int	l_total = 0;
	int	c_total = 0;
	int	i;

	printf ("   lines   words   bytes file\n");
	for (i = 1; i < ac; i++) {
		char	*input;
		int	w_cnt = 0, l_cnt = 0, c_cnt = 0;
		int	inword = 0;
		char	*p;

		input = readfile(av[i]);

		p = input;
		while (*p) {
			if (*p == '\n') ++l_cnt;
			if (*p != ' ') {
				if (!inword) {
					inword = 1;
					++w_cnt;
				}
			} else {
				inword = 0;
			}
			++c_cnt;
			++p;
		}
		free(input);
		printf ("%8u%8u%8u %s\n", l_cnt, w_cnt, c_cnt, av[i]);
		l_total += l_cnt;
		w_total += w_cnt;
		c_total += c_cnt;
	}
	if (ac > 2) {
		printf ("--------------------------------------\n"
		    "%8u%8u%8u total\n",
		    l_total, w_total, c_total);
	}
	return (0);
}
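
The readfile() above skips all error handling, which is fine for a benchmark but fails badly on a missing file or a short read. A minimal sketch of a checked variant follows; the name readfile_checked is hypothetical and not part of the program that was measured, so it has no bearing on the numbers below.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical checked variant of readfile(): read a whole file into a
 * NUL-terminated heap buffer.  Returns NULL on failure; the caller
 * frees the result.
 */
char *
readfile_checked(const char *file)
{
	struct stat sb;
	char	*ret;
	off_t	got = 0;
	int	fd;

	fd = open(file, O_RDONLY);
	if (fd < 0) return (NULL);

	/* fstat the open descriptor so size and data come from the same file */
	if (fstat(fd, &sb) < 0) {
		close(fd);
		return (NULL);
	}
	ret = malloc((size_t)sb.st_size + 1);
	if (ret == NULL) {
		close(fd);
		return (NULL);
	}
	/* read() may return fewer bytes than requested, so loop */
	while (got < sb.st_size) {
		ssize_t	n = read(fd, ret + got, (size_t)(sb.st_size - got));

		if (n < 0) {		/* read error: give up */
			free(ret);
			close(fd);
			return (NULL);
		}
		if (n == 0) break;	/* file shrank: keep what we have */
		got += n;
	}
	close(fd);
	ret[got] = 0;
	return (ret);
}
```

In main() the call site would then need a NULL check before the counting loop, for example skipping the file with a message on stderr.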

Then I compiled both versions with -O2 and used cachegrind to find
out exactly how many instructions each one needed to run.  Here are the
results, with the C version first.

(This is over 2 megs of C source)
$ valgrind --tool=cachegrind ./wc_c ~/bk/bk-3.3.x/src/*.c > /dev/null
==3349== I   refs:      22,529,481
==3349== I1  misses:           784
==3349== L2i misses:           778
==3349== I1  miss rate:        0.0%
==3349== L2i miss rate:        0.0%
==3349== 
==3349== D   refs:       2,393,366  (2,262,770 rd + 130,596 wr)
==3349== D1  misses:        10,159  (    9,671 rd +     488 wr)
==3349== L2d misses:         9,680  (    9,315 rd +     365 wr)
==3349== D1  miss rate:        0.4% (      0.4%   +     0.3%  )
==3349== L2d miss rate:        0.4% (      0.4%   +     0.2%  )
==3349== 
==3349== L2 refs:           10,943  (   10,455 rd +     488 wr)
==3349== L2 misses:         10,458  (   10,093 rd +     365 wr)
==3349== L2 miss rate:         0.0% (      0.0%   +     0.2%  )
farm Dlang $ valgrind --tool=cachegrind ./wc_d ~/bk/bk-3.3.x/src/*.c > /dev/null
==3351== Cachegrind, an I1/D1/L2 cache profiler for x86-linux.
==3351== I   refs:      29,081,497
==3351== I1  misses:         1,216
==3351== L2i misses:         1,199
==3351== I1  miss rate:        0.0%
==3351== L2i miss rate:        0.0%
==3351== 
==3351== D   refs:       4,891,118  (3,663,754 rd + 1,227,364 wr)
==3351== D1  misses:        61,871  (   24,677 rd +    37,194 wr)
==3351== L2d misses:        60,880  (   23,757 rd +    37,123 wr)
==3351== D1  miss rate:        1.2% (      0.6%   +       3.0%  )
==3351== L2d miss rate:        1.2% (      0.6%   +       3.0%  )
==3351== 
==3351== L2 refs:           63,087  (   25,893 rd +    37,194 wr)
==3351== L2 misses:         62,079  (   24,956 rd +    37,123 wr)
==3351== L2 miss rate:         0.1% (      0.0%   +       3.0%  )

As you can see, the D version of the code used about 29% more
instructions (29.1M vs. 22.5M I refs) and about twice as many data
accesses (4.9M vs. 2.4M D refs).
(BTW the system wc program was a lot slower than both of these...)
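
The overhead figures can be checked directly from the "I refs" and "D refs" totals in the two cachegrind runs above. A small sketch, where pct_more and print_overhead are illustrative names only:

```c
#include <stdio.h>

/* Percentage by which `other` exceeds `base`. */
double
pct_more(double base, double other)
{
	return ((other - base) / base * 100.0);
}

/* Compare the wc_c and wc_d totals reported by cachegrind. */
void
print_overhead(void)
{
	/* I refs: 22,529,481 (C) vs. 29,081,497 (D) */
	printf("I refs: %.1f%% more\n", pct_more(22529481.0, 29081497.0));
	/* D refs: 2,393,366 (C) vs. 4,891,118 (D) */
	printf("D refs: %.1f%% more\n", pct_more(2393366.0, 4891118.0));
}
```

Calling print_overhead() gives a little over 29% for instructions and a little over 104% for data references.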

That is not too bad for the benefits, but I was hoping they would
be closer.  Originally I was seeing MUCH different results, but I was
using smaller input sets.  D has a much higher startup overhead.

Next I tried making the D code look like my C version, without the
dynamic arrays and just using pointers.  It didn't really change the
numbers at all.  Also, adding -fno-bounds-check didn't help.  That is a
good sign because it means that the array code generates the same code
you would write using pointers.
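
To make the comparison concrete, here is a C analogue of the two loop styles. This is only a sketch: the D source is not shown in this post, the function names are made up, and the explicit abort() check merely stands in for the kind of per-access test that bounds checking adds.

```c
#include <stddef.h>
#include <stdlib.h>

/* Indexed loop with an explicit bounds check on every access. */
size_t
count_lines_indexed(const char *buf, size_t len)
{
	size_t	i, lines = 0;

	for (i = 0; i < len; i++) {
		/* redundant here: the loop condition already implies i < len */
		if (i >= len) abort();
		if (buf[i] == '\n') ++lines;
	}
	return (lines);
}

/* The same loop written with a raw pointer and no per-access check. */
size_t
count_lines_pointer(const char *buf, size_t len)
{
	const char	*p = buf;
	const char	*end = buf + len;
	size_t	lines = 0;

	while (p < end) {
		if (*p++ == '\n') ++lines;
	}
	return (lines);
}
```

Because the check in the indexed version is implied by the loop condition, an optimizer can often drop it, which would be consistent with the two styles costing about the same.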

Anyway I thought the result was interesting...

-Wayne
Jul 17 2004
Stephen Waits <steve waits.net> writes:
Wayne Scott wrote:
 Anyway I thought the result was interesting...

Interesting indeed. Can you please say which versions of dmd and gcc you used?

Thanks,
Steve
Jul 19 2004
wscott wscott1.homeip.net (Wayne Scott) writes:
In article <cdhcbc$no3$2 digitaldaemon.com>,
Stephen Waits  <steve waits.net> wrote:
Wayne Scott wrote:
 Anyway I thought the result was interesting...

Interesting indeed. Can you please say which versions of dmd and gcc you used?

Thanks,
Steve

Ahh yes, I did leave out that information.

I used release 1f of the D gcc frontend from here:
	http://home.earthlink.net/~dvdfrdmn/d/
built on top of GCC 3.3.4, and compared it to that same gcc.

I tried rebuilding with the Linux version of the official compiler and
I get this result with -O -release:

==22599== Cachegrind, an I1/D1/L2 cache profiler for x86-linux.
==22599== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
==22599== Using valgrind-2.1.1, a program supervision framework for x86-linux.
==22599== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward.
==22599== For more details, rerun with: -v
==22599== 
==22599== 
==22599== I   refs:      23,711,446
==22599== I1  misses:         1,066
==22599== L2i misses:         1,055
==22599== I1  miss rate:        0.0%
==22599== L2i miss rate:        0.0%
==22599== 
==22599== D   refs:       7,230,055  (6,404,429 rd + 825,626 wr)
==22599== D1  misses:        48,292  (   10,964 rd +  37,328 wr)
==22599== L2d misses:        46,787  (    9,685 rd +  37,102 wr)
==22599== D1  miss rate:        0.6% (      0.1%   +     4.5%  )
==22599== L2d miss rate:        0.6% (      0.1%   +     4.4%  )
==22599== 
==22599== L2 refs:           49,358  (   12,030 rd +  37,328 wr)
==22599== L2 misses:         47,842  (   10,740 rd +  37,102 wr)
==22599== L2 miss rate:         0.1% (      0.0%   +     4.4%  )

That is similar to the number of instructions in the C version, but
over 3X the number of D refs.  The number of D1 misses was the same, so
the extra loads and stores were probably all on the stack.

-Wayne

PS: Does anyone read this newsgroup or should I have posted this stuff
    to the digitalmars.D newsgroup?
Jul 20 2004
Stephen Waits <steve waits.net> writes:
Wayne Scott wrote:

 PS: Does anyone read this newsgroup or should I have posted this stuff
     to the digitalmars.D newsgroup?

Probably wouldn't hurt to cross-post it over there. It does involve DMD in addition to gcc, so it's on-topic in both groups. You'll definitely get more response in the main group.

--Steve
Jul 20 2004