## digitalmars.D.learn - speeding up + ctfe question

- maarten van damme <maartenvd1994 gmail.com> May 26 2012
- sclytrack <sclytrack iq87.fr> May 26 2012
- "jerro" <a a.com> May 26 2012
- sclytrack <sclytrack iq87.fr> May 27 2012
- sclytrack <sclytrack iq87.fr> May 27 2012
- maarten van damme <maartenvd1994 gmail.com> May 26 2012
- "jerro" <a a.com> May 26 2012
- maarten van damme <maartenvd1994 gmail.com> May 27 2012
- maarten van damme <maartenvd1994 gmail.com> May 27 2012
- "jerro" <a a.com> May 27 2012
- maarten van damme <maartenvd1994 gmail.com> May 27 2012
- "jerro" <a a.com> May 27 2012
- maarten van damme <maartenvd1994 gmail.com> May 28 2012
- "jerro" <a a.com> May 28 2012
- maarten van damme <maartenvd1994 gmail.com> May 28 2012

I finally made the primerange I wanted to make. I think I'm using a rather dumb algorithm. The idea is to store the prime we're at in curPrime and the amount of preceding primes in countPrime. Then opindex is able to calculate how many primes following or preceding curPrime we have to find to get the asked value. I'm not english so some names can be poorly chosen. Code is at http://dl.dropbox.com/u/15024434/d/primeRange.d The problem is that it's pretty slow. I wanted to find the first prime number with 10 digits and it literally takes forever (but manages it though). Is there an easy way to speed up or is this the ceiling? (better algorithms, stupidity's I've left,...) I also have an array of primes I use to "quicktest" to see if something is a prime. You can change it's size using setQuicktestSize at runtime. I want to make it large at compile time though. as I'm new to ctfe I have no idea as to how one can do this.

May 26 2012

On 05/26/2012 03:49 PM, maarten van damme wrote:I finally made the primerange I wanted to make. I think I'm using a rather dumb algorithm. The idea is to store the prime we're at in curPrime and the amount of preceding primes in countPrime. Then opindex is able to calculate how many primes following or preceding curPrime we have to find to get the asked value. I'm not english so some names can be poorly chosen. Code is at http://dl.dropbox.com/u/15024434/d/primeRange.d The problem is that it's pretty slow. I wanted to find the first prime number with 10 digits and it literally takes forever (but manages it though). Is there an easy way to speed up or is this the ceiling? (better algorithms, stupidity's I've left,...) I also have an array of primes I use to "quicktest" to see if something is a prime. You can change it's size using setQuicktestSize at runtime. I want to make it large at compile time though. as I'm new to ctfe I have no idea as to how one can do this.

See below for a rather dumb algorithm. The idea is to list a list of prime numbers. The problem is that it is pretty slow and you get a ... //src/main.d(57): Error: template instance main.isDivisable!(499,498) recursive expansion error when the number is too large. I'm not English so the names ARE poorly chosen. //writeln(primeList!(200)); //<---don't pick over 200 or it goes nuts //writeln(decompose!(49,2)); template primeList(int index) { static if (index > 1) { static if (isPrime!(index)) { enum int [] primeList = primeList!(index-1) ~ [index]; } else { enum int [] primeList = primeList!(index-1); } } else { enum int [] primeList = []; } } template isDivisable( int number, int divisor) { enum bool isDivisable = (number % divisor) == 0; } // 10 ---> [2,5] template decompose(int number, int d = 2) { static if (d < number) { static if (isDivisable!(number, d)) { enum int [] decompose = [d] ~ decompose!(number/d, d); } else { enum int [] decompose = decompose!(number, d+1); } } else { enum int [] decompose = [number]; } } template isPrimeDecompose(int number, int d = 2) { static if (d < number) { static if (isDivisable!(number, d)) { enum bool isPrimeDecompose = false; } else { enum bool isPrimeDecompose = isPrimeDecompose!(number, d+1); } } else { enum bool isPrimeDecompose = true; } } template isPrime(int number) { static if (number > 1) enum bool isPrime = isPrimeDecompose!(number, 2); else enum bool isPrime = false; }

May 26 2012

On Saturday, 26 May 2012 at 13:49:53 UTC, maarten van damme wrote: Is there an easy way to speed up or is this theceiling?

I got a 30 percent speedup by replacing this line: if(&& canFind(quicktestPrimes, possiblePrime)) with this: if(quicktestPrimes.back >= possiblePrime && canFind(quicktestPrimes, possiblePrime)) You could probably get a much better speedup by using some kind of a prime sieve. BTW you shouldn't be using opIndex like that. Calling opIndex shouldn't change observable state of the object and it should run in O(log(n)) or better. People expect an index operator to behave similarly to the index operator on arrays. If the function does something completely different, it is very confusing to make it an opIndex.

May 26 2012

On 05/27/2012 01:18 PM, maarten van damme wrote:well, maybe, looking back at it, a range isn't the ideal way to go about generating primes easily. I'm going to implement the sieve of Atkin and make it return an array of primes up to a given number. This might be a bit more useful and fast. Is there somewhere I can catch up on ctfe? after reading andrei's book and the documentation it's still not clear to me how I could make my setquicktestsize function run at compile time.

https://github.com/PhilippeSigaud/D-templates-tutorial

May 27 2012

On 05/27/2012 02:10 PM, maarten van damme wrote:thank you :) I wrote a sieve of aristotle because atkin's sieve needed waay to many optimizations to beat aristotle's sieve by even a little bit so I wanted to see if aristotle's was good enough. I ran in two problems. It was extremely fast when sieving a byte array of 1 million entries (without optimizations at all) but when trying with 10_000_000 entries the program crashes before it even starts to execute (main isn't reached). Then I decided to benchmark the code and dmd fails to compile with "Internal error: ..\ztc\cgcs.s 354". I assume the first problem has to do with memory limits and that the second is a little dmd bug. code is at http://dl.dropbox.com/u/15024434/d/aristotleSieve.d

Yeah I get the same errors. And I'm not acquainted with the benchmark code. -- http://dlang.org/concepts.html isprime there says that 2 is not a prime. enum result = isprime(2); compile time enum "result" says false. -- http://erdani.com/d/web/phobos-prerelease/std_benchmark.html (the future) -- Er is zon buiten. Zonnige zondag namiddag met priemgetallen in de plaats van buiten zitten. Tss tss. :-)

May 27 2012

well, I was thinking about using a sieve but when you were to request prime 400_000 you're sieve would blow up in size. That's why I opted for something else (and I don't know if it was the right thing to do though). (Ab)using opIndex like that is indeed a bit wrong but what would be the alternative? remove slicing and introduce getPrime? Wouldn't that be the same but without the great syntactic sugar?

May 26 2012

On Saturday, 26 May 2012 at 18:40:53 UTC, maarten van damme wrote:well, I was thinking about using a sieve but when you were to request prime 400_000 you're sieve would blow up in size.

Because you only need primes up to sqrt(n) to check if n is prime, you can make a sieve based range that only uses O(sqrt(n)) memory. I've put an example at https://gist.github.com/2795954 .It could be optimized further (skipping even numbers, keeping track of the multiples of small primes...), but it is already much faster than a range that does not use a sieve.That's why I opted for something else (and I don't know if it was the right thing to do though). (Ab)using opIndex like that is indeed a bit wrong but what would be the alternative? remove slicing and introduce getPrime? Wouldn't that be the same but without the great syntactic sugar?

Syntax sugar isn't great if it confuses people and causes bugs. You can use r.drop(n).front to get nth element of a range. If you need to do it often, you could just write a nth() function.

May 26 2012

well, maybe, looking back at it, a range isn't the ideal way to go about generating primes easily. I'm going to implement the sieve of Atkin and make it return an array of primes up to a given number. This might be a bit more useful and fast. Is there somewhere I can catch up on ctfe? after reading andrei's book and the documentation it's still not clear to me how I could make my setquicktestsize function run at compile time.

May 27 2012

thank you :) I wrote a sieve of aristotle because atkin's sieve needed waay to many optimizations to beat aristotle's sieve by even a little bit so I wanted to see if aristotle's was good enough. I ran in two problems. It was extremely fast when sieving a byte array of 1 million entries (without optimizations at all) but when trying with 10_000_000 entries the program crashes before it even starts to execute (main isn't reached). Then I decided to benchmark the code and dmd fails to compile with "Internal error: ..\ztc\cgcs.s 354". I assume the first problem has to do with memory limits and that the second is a little dmd bug. code is at http://dl.dropbox.com/u/15024434/d/aristotleSieve.d

May 27 2012

I ran in two problems. It was extremely fast when sieving a byte array of 1 million entries (without optimizations at all) but when trying with 10_000_000 entries the program crashes before it even starts to execute (main isn't reached).

You say it does compile, but then crashes immediatly? I can't reproduce that. For me it is a compile time error if toSieve is 16MB or more, but otherwise it runs fine. Anyway you should use a dynamic array instead of a static one if you want really large arrays.Then I decided to benchmark the code and dmd fails to compile with "Internal error: ..\ztc\cgcs.s 354". I assume the first problem has to do with memory limits and that the second is a little dmd bug.

It is a bug, all internal errors are. It looks like you can avoid like this: auto tmp=(benchmark!(f0)(1000)); int time=tmp[0].to!("msecs", int);

May 27 2012

ok, can't seem to reproduce the crashing. now on to optimizing my sieve a bit more ,9 miliseconds for 1_000_000 is still to slow. -- Er is zon buiten. Zonnige zondag namiddag met priemgetallen in de plaats van buiten zitten. Tss tss. :-) hoe? wie? x)

May 27 2012

On Sunday, 27 May 2012 at 17:00:01 UTC, maarten van damme wrote:ok, can't seem to reproduce the crashing. now on to optimizing my sieve a bit more ,9 miliseconds for 1_000_000 is still to slow. -- Er is zon buiten. Zonnige zondag namiddag met priemgetallen in de plaats van buiten zitten. Tss tss. :-) hoe? wie? x)

Have you tried turning optimization on? I get about 8ms with your code with no flags and about 2.9 with -O -inline -release. To optimize it further you could make a sieve that only works with odd numbers. I get (https://gist.github.com/2817289) about 1ms that way. I'm sure it is possible to do much better, but that would probably be much more work too.

May 27 2012

I tried using 6k+-1 for all primes and for some reason it performed slower. I think I have something completely inefficient somewhere, can't figure out where though. I think it has something to do with me increasing k and then multiplying with k while I could have simply added 6 to K... and I haven't tried with optimization on, first I want to get the algorithm as good as possible before trying to squeeze out those extra milliseconds :)

May 28 2012

I tried using 6k+-1 for all primes and for some reason it performed slower. I think I have something completely inefficient somewhere, can't figure out where though.

You aren't, by any chance, using divisions or remainders? those are much slower than, say, multiplications (unless the divisor is a power of two and known at compile time, in which case the compiler will optimize them to bitwise operations).and I haven't tried with optimization on, first I want to get the algorithm as good as possible before trying to squeeze out those extra milliseconds :)

I prefer to compile with optimizations on when trying to optimize something. That way I don't waste my time on micro-optimizations that the compiler could do for me.

May 28 2012

I didn't use divisions or remainders, only multiplications. I've changed it so it only uses addition and now it still doesn't outperform a version that only checks odd's. it's not as fast as your version where every index corresponds to i*2+1 because I fill every even number with false...

May 28 2012