
digitalmars.D - Expected performance

reply Martin Hess <martinhess mac.com> writes:
I'm considering D for a project that is very CPU intensive so I'm evaluating
D's performance. As a first test I started with associative arrays and ended up
with a result that surprised me a bit so I would like to get some input on the
test's construction.

The test builds a dictionary of 50k byte arrays that map to integers.

int[byte[]] dictionary;

Next it looks up elements a half billion times.

Assuming 1.5 instructions per cycle I'm seeing about 387 instructions per
lookup.

I tried it in Java and was seeing about 309 instructions per lookup.

Naively I assumed I would get better performance than Java. Does anyone have
any thoughts on this?

Test details:

Mac OSX 10.5
GDC 4.0.1
Java 1.5.0_13

================

import std.stdio;
import std.random;

char[] letters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

byte[] makeGUID(int size)
{
	byte[] a = new byte[size];
	
	for (int i = 0; i < size; i++)
	{
		a[i] = letters[rand() % letters.length];
	}
	
	return a;
}

const int tableSize = 50000;

version(Unix)
{
	private import std.c.unix.unix;
	
int main (char[][] args)
{
	byte[][tableSize] guids;
	int[byte[]] dictionary;
		
	for (int i = 0; i < tableSize; i++)
	{
		byte[] guid = makeGUID(26);
		guids[i] = guid;
		dictionary[guid] = i;
	}
	
	dictionary.rehash;
	
	ulong start = time(null);
	for (int i = 0; i < (500 * 1000 * 1000); i++)
	{
		byte[] guid = guids[i % guids.length];
		int j = dictionary[guid];
	}
	ulong stop = time(null);
	
	writefln(stop-start);
	
	return 0;
}
}
Nov 07 2007
next sibling parent Sascha Katzner <sorry.no spam.invalid> writes:
Martin Hess wrote:
 Naively I assumed I would get better performance than Java. Does
 anyone have any thoughts on this?

Perhaps they use different hash functions? If you really want as fast an
implementation as possible, I would suggest you implement your own hash table,
with a customized hash function.

LLAP,
Sascha Katzner
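A minimal sketch of that idea in D1-style code: rather than a whole custom table, wrapping the key in a struct with its own toHash lets the built-in AA use a custom hash function. The struct name and the FNV-1a constants here are illustrative (not from this thread), and the exact operator signatures vary between D versions:

```d
// Sketch only: a key wrapper whose toHash() the built-in AA will call.
// FastKey is a made-up name; signatures follow D1-era conventions.
struct FastKey
{
    byte[] data;

    // FNV-1a over the bytes: cheap and reasonably well mixed for short keys.
    uint toHash()
    {
        uint h = 2166136261u;
        foreach (byte b; data)
            h = (h ^ cast(ubyte) b) * 16777619u;
        return h;
    }

    // The AA also needs equality and ordering for collision handling.
    int opEquals(FastKey other) { return data == other.data; }
    int opCmp(FastKey other)
    {
        return typeid(byte[]).compare(&data, &other.data);
    }
}
```

The original int[byte[]] dictionary would then become int[FastKey] dictionary.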
Nov 07 2007
prev sibling next sibling parent reply Aarti_pl <aarti interia.pl> writes:
Martin Hess wrote:
 I'm considering D for a project that is very CPU intensive so I'm evaluating
D's performance. As a first test I started with associative arrays and ended up
with a result that surprised me a bit so I would like to get some input on the
test's construction.
 
 The test builds a dictionary of 50k byte arrays that map to integers.
 
 int[byte[]] dictionary;
 
 Next it looks up elements a half billion times.
 
 Assuming 1.5 instructions per cycle I'm seeing about 387 instructions per
lookup.
 
 I tried it in Java and was seeing about 309 instructions per lookup.
 
 Naively I assumed I would get better performance than Java. Does anyone have
any thoughts on this?
 

Did you turn on optimizations when compiling D code?

-inline        do function inlining
-O             optimize
-release       compile release version

BR
Marcin Kuszczak
(aarti_pl - www.zapytajmnie.com)
Nov 07 2007
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2007-11-07 05:06:55 -0500, Aarti_pl <aarti interia.pl> said:

 Did you turn on optimizations when compiling D code?
 
 -inline        do function inlining
 -O             optimize
 -release       compile release version

He's using GDC, so that would rather be:

-finline-functions
one of -O3 (fastest) or -Os (fast, smaller)
-frelease

You could also optimize instruction scheduling for a specific processor; for
me that would be: -mtune=G4

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 07 2007
parent Martin Hess <martinhess mac.com> writes:
I had -O3 already but I added the other 2 optimizations and didn't see any time
difference.

Michel Fortin Wrote:

 On 2007-11-07 05:06:55 -0500, Aarti_pl <aarti interia.pl> said:
 
 Did you turn on optimizations when compiling D code?
 
 -inline        do function inlining
 -O             optimize
 -release       compile release version

 He's using GDC, so that would rather be:
 
 -finline-functions
 one of -O3 (fastest) or -Os (fast, smaller)
 -frelease
 
 You could also optimize instruction scheduling for a specific processor; for
 me that would be: -mtune=G4
 
 -- 
 Michel Fortin
 michel.fortin michelf.com
 http://michelf.com/

Nov 07 2007
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Martin Hess:
 The test builds a dictionary of 50k byte arrays that map to integers.
 int[byte[]] dictionary;
 Next it looks up elements a half billion times.

From my tests I think D AAs aren't much optimized yet. In the past I have
shown simple Python code using Python dicts that runs faster than D code (you
can find it in the group, or I can try to find it again), and I have done
similar tests with C# 3.0 beta, with similar results.

To have fast hash maps you need a good hash function and good hash-collision
management. And if you put tons of keys like that you also need a good GC (or
memory management). With time D will probably improve; D developers can take a
look at how Python implements dicts, to copy them all (Perl dicts may be fit
too).

In your situation you have both a lot of memory to manage and a large data
block to compute the hash on, so you need both quick memory management and a
fast hash function. For the memory management I can't help you much, but for
the hash function you may take a look at the assembly version of
SuperFastHash:

http://www.azillionmonkeys.com/qed/hash.html
http://www.team5150.com/~andrew/blog/2007/03/breaking_superfasthash.html
http://www.team5150.com/~andrew/blog/2007/03/when_bad_hashing_means_good_caching.html

Bye,
bearophile
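As a point of comparison with the SuperFastHash links, a much simpler classic multiply-and-add hash (Bernstein's djb2, sketched here in D; not the hash D's AAs actually use) over a byte array looks like this:

```d
// djb2 (Dan Bernstein's classic string hash), sketched for a byte[] key.
// Each byte costs one multiply and one add; with 26-byte keys and half a
// billion lookups, this per-byte cost is what a faster hash tries to cut.
uint djb2(byte[] key)
{
    uint h = 5381;
    foreach (byte b; key)
        h = h * 33 + cast(ubyte) b;
    return h;
}
```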
Nov 07 2007
parent sambeau <sambeau do-not-spam-sambeau.com> writes:
bearophile Wrote:

 For the memory management I can't help you much, but for the hash function you
may take a look at the assembly version of the superfasthash:
 
 http://www.azillionmonkeys.com/qed/hash.html
 http://www.team5150.com/~andrew/blog/2007/03/breaking_superfasthash.html
 http://www.team5150.com/~andrew/blog/2007/03/when_bad_hashing_means_good_caching.html

You might also want to take a look at this: http://judy.sourceforge.net/

Cheers,
s
Nov 07 2007
prev sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
Martin Hess wrote:
 I'm considering D for a project that is very CPU intensive so I'm evaluating
D's performance. As a first test I started with associative arrays and ended up
with a result that surprised me a bit so I would like to get some input on the
test's construction.
 
 The test builds a dictionary of 50k byte arrays that map to integers.
 
 int[byte[]] dictionary;
 
 Next it looks up elements a half billion times.
 
 Assuming 1.5 instructions per cycle I'm seeing about 387 instructions per
lookup.
 
 I tried it in Java and was seeing about 309 instructions per lookup.
 
 Naively I assumed I would get better performance than Java. Does anyone have
any thoughts on this?
 
 Test details:
 
 Mac OSX 10.5
 GDC 4.0.1
 Java 1.5.0_13
 
 ================
 
 import std.stdio;
 import std.random;
 
 char[] letters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
 
 byte[] makeGUID(int size)
 {
 	byte[] a = new byte[size];
 	
 	for (int i = 0; i < size; i++)
 	{
 		a[i] = letters[rand() % letters.length];
 	}
 	
 	return a;
 }
 
 const int tableSize = 50000;
 
 version(Unix)
 {
 	private import std.c.unix.unix;
 	
 int main (char[][] args)
 {
 	byte[][tableSize] guids;
 	int[byte[]] dictionary;
 		
 	for (int i = 0; i < tableSize; i++)
 	{
 		byte[] guid = makeGUID(26);
 		guids[i] = guid;
 		dictionary[guid] = i;
 	}
 	
 	dictionary.rehash;
 	
 	ulong start = time(null);
 	for (int i = 0; i < (500 * 1000 * 1000); i++)
 	{
 		byte[] guid = guids[i % guids.length];
 		int j = dictionary[guid];
 	}
 	ulong stop = time(null);
 	
 	writefln(stop-start);
 	
 	return 0;
 }
 }

A toHash for a GUID could basically just return the first 4 bytes of the GUID
as the hash:

struct Guid
{
	byte[] guid;	// or fixed size?
	uint toHash() { return *cast(uint*) guid.ptr; }
}

int[Guid] dictionary;

I've boosted AA filling/reading in the past by temporarily disabling the
garbage collector (std.gc.disable()/enable()).

Anyway, if performance doesn't turn you to D, surely this will:

	for (int i = 0; i < 500_000_000; i++)

L.
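The GC trick mentioned above can be sketched like this with D1's std.gc module (disable/enable are the calls Lionello cites; fullCollect is also in D1's Phobos):

```d
import std.gc;  // D1 Phobos; later D moved this functionality to core.memory

void buildDictionary()
{
    std.gc.disable();      // pause collections while 50k keys are allocated
    // ... build guids[] and dictionary here, as in the original test ...
    std.gc.enable();       // re-enable the collector afterwards
    std.gc.fullCollect();  // optionally collect once, at a time we choose
}
```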
Nov 08 2007