
digitalmars.D - Expected performance

reply Martin Hess <martinhess mac.com> writes:
I'm considering D for a project that is very CPU intensive so I'm evaluating
D's performance. As a first test I started with associative arrays and ended up
with a result that surprised me a bit so I would like to get some input on the
test's construction.

The test builds a dictionary of 50k byte arrays that map to integers.

int[byte[]] dictionary;

Next it looks up elements a half billion times.

Assuming 1.5 instructions per cycle I'm seeing about 387 instructions per
lookup.

I tried it in Java and was seeing about 309 instructions per lookup.

Naively I assumed I would get better performance than Java. Does anyone have
any thoughts on this?

Test details:

Mac OSX 10.5
GDC 4.0.1
Java 1.5.0_13

================

import std.stdio;
import std.random;

char[] letters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

byte[] makeGUID(int size)
{
	byte[] a = new byte[size];
	
	for (int i = 0; i < size; i++)
	{
		a[i] = letters[rand() % letters.length];
	}
	
	return a;
}

const int tableSize = 50000;

version(Unix)
{
	private import std.c.unix.unix;
	
int main (char[][] args)
{
	byte[][tableSize] guids;
	int[byte[]] dictionary;
		
	for (int i = 0; i < tableSize; i++)
	{
		byte[] guid = makeGUID(26);
		guids[i] = guid;
		dictionary[guid] = i;
	}
	
	dictionary.rehash;
	
	ulong start = time(null);
	for (int i = 0; i < (500 * 1000 * 1000); i++)
	{
		byte[] guid = guids[i % guids.length];
		int j = dictionary[guid];
	}
	ulong stop = time(null);
	
	writefln(stop-start);
	
	return 0;
}
}
Nov 07 2007
next sibling parent Sascha Katzner <sorry.no spam.invalid> writes:
Martin Hess wrote:
 Naively I assumed I would get better performance than Java. Does
 anyone have any thoughts on this?

Perhaps they use different hash functions? If you really want as fast an
implementation as possible, I would suggest you implement your own hash table,
with a customized hash function.

LLAP,
Sascha Katzner
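A minimal sketch of that idea in D1-style code: rather than a whole custom table, wrapping the key in a struct with its own toHash lets the built-in AA use a custom hash function. The struct name and the FNV-1a constants here are illustrative (not from this thread), and the exact operator signatures vary between D versions:

```d
// Sketch only: a key wrapper whose toHash() the built-in AA will call.
// FastKey is a made-up name; signatures follow D1-era conventions.
struct FastKey
{
    byte[] data;

    // FNV-1a over the bytes: cheap and reasonably well mixed for short keys.
    uint toHash()
    {
        uint h = 2166136261u;
        foreach (byte b; data)
            h = (h ^ cast(ubyte) b) * 16777619u;
        return h;
    }

    // The AA also needs equality and ordering for collision handling.
    int opEquals(FastKey other) { return data == other.data; }
    int opCmp(FastKey other)
    {
        return typeid(byte[]).compare(&data, &other.data);
    }
}
```

The original int[byte[]] dictionary would then become int[FastKey] dictionary.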
Nov 07 2007
prev sibling next sibling parent reply Aarti_pl <aarti interia.pl> writes:
Martin Hess wrote:
 I'm considering D for a project that is very CPU intensive so I'm evaluating
D's performance. As a first test I started with associative arrays and ended up
with a result that surprised me a bit so I would like to get some input on the
test's construction.
 
 The test builds a dictionary of 50k byte arrays that map to integers.
 
 int[byte[]] dictionary;
 
 Next it looks up elements a half billion times.
 
 Assuming 1.5 instructions per cycle I'm seeing about 387 instructions per
lookup.
 
 I tried it in Java and was seeing about 309 instructions per lookup.
 
 Naively I assumed I would get better performance than Java. Does anyone have
any thoughts on this?
 

Did you turn on optimizations when compiling D code?

-inline        do function inlining
-O             optimize
-release       compile release version

BR
Marcin Kuszczak
(aarti_pl - www.zapytajmnie.com)
Nov 07 2007
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2007-11-07 05:06:55 -0500, Aarti_pl <aarti interia.pl> said:

 Did you turn on optimizations when compiling D code?
 
 -inline        do function inlining
 -O             optimize
 -release       compile release version

He's using GDC, so that would rather be:

-finline-functions
one of -O3 (fastest) or -Os (fast, smaller)
-frelease

You could also optimize instruction scheduling for a specific processor; for
me that would be: -mtune=G4

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 07 2007
parent Martin Hess <martinhess mac.com> writes:
I had -O3 already but I added the other 2 optimizations and didn't see any time
difference.

Michel Fortin Wrote:

 On 2007-11-07 05:06:55 -0500, Aarti_pl <aarti interia.pl> said:
 
 Did you turn on optimizations when compiling D code?
 
 -inline        do function inlining
 -O             optimize
 -release       compile release version

 He's using GDC, so that would rather be:
 
 -finline-functions
 one of -O3 (fastest) or -Os (fast, smaller)
 -frelease
 
 You could also optimize instruction scheduling for a specific processor; for
 me that would be: -mtune=G4
 
 -- 
 Michel Fortin
 michel.fortin michelf.com
 http://michelf.com/

Nov 07 2007
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Martin Hess:
 The test builds a dictionary of 50k byte arrays that map to integers.
 int[byte[]] dictionary;
 Next it looks up elements a half billion times.

From my tests I think D AAs aren't much optimized yet. In the past I have
shown simple Python code using Python dicts that runs faster than D code (you
can find it in the group, or I can try to find it again), and I have done
similar tests with C# 3.0 beta, with similar results.

To have fast hash maps you need a good hash function and good hash-collision
management. And if you put tons of keys like that you also need a good GC (or
memory management). With time D will probably improve; D developers can take a
look at how Python implements dicts, to copy them all (Perl dicts may be fit
too).

In your situation you have both a lot of memory to manage and a large data
block to compute the hash on, so you need both quick memory management and a
fast hash function. For the memory management I can't help you much, but for
the hash function you may take a look at the assembly version of
SuperFastHash:

http://www.azillionmonkeys.com/qed/hash.html
http://www.team5150.com/~andrew/blog/2007/03/breaking_superfasthash.html
http://www.team5150.com/~andrew/blog/2007/03/when_bad_hashing_means_good_caching.html

Bye,
bearophile
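As a point of comparison with the SuperFastHash links, a much simpler classic multiply-and-add hash (Bernstein's djb2, sketched here in D; not the hash D's AAs actually use) over a byte array looks like this:

```d
// djb2 (Dan Bernstein's classic string hash), sketched for a byte[] key.
// Each byte costs one multiply and one add; with 26-byte keys and half a
// billion lookups, this per-byte cost is what a faster hash tries to cut.
uint djb2(byte[] key)
{
    uint h = 5381;
    foreach (byte b; key)
        h = h * 33 + cast(ubyte) b;
    return h;
}
```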
Nov 07 2007
parent sambeau <sambeau do-not-spam-sambeau.com> writes:
bearophile Wrote:

 For the memory management I can't help you much, but for the hash function you
may take a look at the assembly version of the superfasthash:
 
 http://www.azillionmonkeys.com/qed/hash.html
 http://www.team5150.com/~andrew/blog/2007/03/breaking_superfasthash.html
 http://www.team5150.com/~andrew/blog/2007/03/when_bad_hashing_means_good_caching.html

You might also want to take a look at this: http://judy.sourceforge.net/

Cheers,
s
Nov 07 2007
prev sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
Martin Hess wrote:
 I'm considering D for a project that is very CPU intensive so I'm evaluating
D's performance. As a first test I started with associative arrays and ended up
with a result that surprised me a bit so I would like to get some input on the
test's construction.
 
 The test builds a dictionary of 50k byte arrays that map to integers.
 
 int[byte[]] dictionary;
 
 Next it looks up elements a half billion times.
 
 Assuming 1.5 instructions per cycle I'm seeing about 387 instructions per
lookup.
 
 I tried it in Java and was seeing about 309 instructions per lookup.
 
 Naively I assumed I would get better performance than Java. Does anyone have
any thoughts on this?
 
 Test details:
 
 Mac OSX 10.5
 GDC 4.0.1
 Java 1.5.0_13
 
 ================
 
 import std.stdio;
 import std.random;
 
 char[] letters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
 
 byte[] makeGUID(int size)
 {
 	byte[] a = new byte[size];
 	
 	for (int i = 0; i < size; i++)
 	{
 		a[i] = letters[rand() % letters.length];
 	}
 	
 	return a;
 }
 
 const int tableSize = 50000;
 
 version(Unix)
 {
 	private import std.c.unix.unix;
 	
 int main (char[][] args)
 {
 	byte[][tableSize] guids;
 	int[byte[]] dictionary;
 		
 	for (int i = 0; i < tableSize; i++)
 	{
 		byte[] guid = makeGUID(26);
 		guids[i] = guid;
 		dictionary[guid] = i;
 	}
 	
 	dictionary.rehash;
 	
 	ulong start = time(null);
 	for (int i = 0; i < (500 * 1000 * 1000); i++)
 	{
 		byte[] guid = guids[i % guids.length];
 		int j = dictionary[guid];
 	}
 	ulong stop = time(null);
 	
 	writefln(stop-start);
 	
 	return 0;
 }
 }

A toHash for a GUID could basically just return the first 4 bytes of the GUID
as the hash:

struct Guid
{
	byte[] guid;	// or fixed size?
	uint toHash() { return *cast(uint*) guid.ptr; }
}

int[Guid] dictionary;

I've boosted AA filling/reading in the past by temporarily disabling the
garbage collector (std.gc.disable()/enable()).

Anyway, if performance doesn't turn you to D, surely this will:

	for (int i = 0; i < 500_000_000; i++)

L.
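The GC trick mentioned above can be sketched like this with D1's std.gc module (disable/enable are the calls Lionello cites; fullCollect is also in D1's Phobos):

```d
import std.gc;  // D1 Phobos; later D moved this functionality to core.memory

void buildDictionary()
{
    std.gc.disable();      // pause collections while 50k keys are allocated
    // ... build guids[] and dictionary here, as in the original test ...
    std.gc.enable();       // re-enable the collector afterwards
    std.gc.fullCollect();  // optionally collect once, at a time we choose
}
```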
Nov 08 2007