## digitalmars.D - Euler problems 14, 135, 174

- bearophile <bearophileHUGS lycos.com> Apr 08 2010
- Don <nospam nospam.com> Apr 09 2010
- bearophile <bearophileHUGS lycos.com> Apr 09 2010
- BCS <none anon.com> Apr 09 2010
- bearophile <bearophileHUGS lycos.com> Apr 10 2010
- bearophile <bearophileHUGS lycos.com> Apr 09 2010

When I show D examples here I can't show large D programs because people will not read them, and I can't show 10-lines long D programs because they get ignored as useless toys. I have chosen three problems from the famous Project Euler problem set, number 14, 135 and 174: http://projecteuler.net/index.php?section=problems&id=14 http://projecteuler.net/index.php?section=problems&id=135 http://projecteuler.net/index.php?section=problems&id=174 You can surely find a way to solve those three problems efficiently using idiomatic C language. I have chosen to translate to D2 their Python implementation found in this blog post, "Three Pathological cases for the PyPy JIT": http://www.ripton.net/blog/?p=51 Here I have on purpose translated Python code not written by me. D is designed to help in the translation from C to D, but in the past I have translated Python code to D V.1 almost every day, it was a common thing. I think I am not the only person that wants to translate some Python code to D. ------------------------------- Problem 14: The following iterative sequence is defined for the set of positive integers: n -> n / 2 (n is even) n -> 3n + 1 (n is odd) Which starting number, under one million, produces the longest chain? That's a famous sequence. This is the original Python+Psyco implementation: def next_num(num): if num & 1: return 3 * num + 1 else: return num // 2 MAX_NUM = 1000000 # try with 5 millions too lengths = {1: 0} def series_length(num): global lengths if num in lengths: return lengths[num] else: num2 = next_num(num) result = 1 + series_length(num2) lengths[num] = result return result def main(): num_with_max_length = 1 max_length = 0 for ii in range(1, MAX_NUM): length = series_length(ii) if length > max_length: max_length = length num_with_max_length = ii print num_with_max_length, max_length import psyco; psyco.full() if __name__ == "__main__": main() I am using Python 2.6.5 with Psyco 1.6.0final. The translation to D2 is easy: import std.stdio: writeln; int next_num(int num) { if (num & 1) return 3 * num + 1; else return num / 2; } int[int] lengths; static this() { lengths = [1: 0]; } int series_length(int num) { int* p = num in lengths; if (p) { return *p; } else { int num2 = next_num(num); int result = 1 + series_length(num2); lengths[num] = result; return result; } } void main() { enum int MAX_NUM = 1_000_000; // //enum int MAX_NUM = 5_000_000; // s int num_with_max_length = 1; int max_length = 0; foreach (ii; 1 .. MAX_NUM) { int length = series_length(ii); if (length > max_length) { max_length = length; num_with_max_length = ii; } } writeln(num_with_max_length, " ", max_length); } The lengths array is not a static variable inside series_length() because it needs some initialization, that I have to perform in the static this(). If you try to run it (I am using dmd 2.042) you receive an error: object.Error: Stack Overflow It's a nice error because you know for sure there's a bug somewhere. In Python the integer numbers are multi-precision, and often Euler problems use large integer numbers. So the first thing I've done is to change the ints into longs: import std.stdio: writeln; alias long Int; Int next_num(Int num) { if (num & 1) return 3 * num + 1; else return num / 2; } Int[Int] lengths; static this() { lengths = [1: 0]; } Int series_length(Int num) { Int* p = num in lengths; if (p) { return *p; } else { Int num2 = next_num(num); Int result = 1 + series_length(num2); lengths[num] = result; return result; } } void main() { enum Int MAX_NUM = 1_000_000; // //enum Int MAX_NUM = 5_000_000; // s Int num_with_max_length = 1; Int max_length = 0; foreach (ii; 1 .. MAX_NUM) { Int length = series_length(ii); if (length > max_length) { max_length = length; num_with_max_length = ii; } } writeln(num_with_max_length, " ", max_length); } I can say this works because it gives the same answer of the Python program. If you don't have a Python program to compare the output to you can't be sure, because somewhere some number can be bigger than a long too, and cause silent bugs. If you don't have a Python program to compare it to you need unit tests and lot of care to be sure the result is correct. With MAX_NUM=1_000_000 the max key in lengths is 56_991_483_520, so it's truly a sparse associative array. You can't use just a normal dynamic array to represent its key-values. When the D2 program is debugged I compile with -O -release -inline. The Python+Psyco program prints: 837799 524 It runs in 1.60 seconds on Windows 32 bit, 2.3 GHz CPU. The D2 version prints the same output: 837799 524 It runs in 5.40 seconds. The cause of the significantly lower performance of the D code is surely the Python dicts that are far more optimized, despite the interpter and JIT costs some time. Putting __gshared before the definition of lengths doesn't change the running time of the D code. I will be glad to benchmark the bigints of dmd 2.043. (I hope Don has implemented the small int optimization I suggested him. The less significant bit of a 32/64 bit union can be used as a boolean flag to tell apart a small int allocated in place on the stack, from a pointer to heap-allocated big int). ------------------------------- Problem 135: Given the positive integers, x, y, and z, are consecutive terms of an arithmetic progression, the least value of the positive integer, n, for which the equation, x**2 - y**2 - z**2 = n, has exactly two solutions is n = 27: 34**2 - 27**2 - 20**2 = 12**2 - 9**2 - 6**2 = 27 It turns out that n = 1155 is the least value which has exactly ten solutions. How many values of n less than one million have exactly ten distinct solutions? The Python code is not complex: def main(): limit = 1000000 counts = limit * [0] for d in range(1, limit / 4 + 1): d24 = d ** 2 * 4 if d24 < limit: start = 1 counts[d24] += 1 else: start = int((d24 - limit) ** 0.5) if start < 1: start = 1 for root in xrange(start, 2 * d): n = d24 - root ** 2 if n <= 0: break if n < limit: z = root + d y = z + d x = y + d counts[n] += 1 if d > root: z = d - root y = z + d x = y + d counts[n] += 1 total = 0 assert counts[0] == 0 for val in counts: if val == 10: total += 1 print total import psyco; psyco.full() if __name__ == "__main__": main() The translation to D2 is easy, it uses the useful ^^ operator: import std.stdio: writeln; import std.math; void main() { enum int limit = 1_000_000; auto counts = new int[limit]; foreach (d; 1 .. limit / 4 + 1) { int d24 = d ^^ 2 * 4; // line 9 int start; if (d24 < limit) { start = 1; counts[d24]++; // line 13 } else start = cast(int)((d24 - limit) ^^ 0.5); if (start < 1) start = 1; foreach (root; start .. 2 * d) { int n = d24 - root ^^ 2; if (n <= 0) break; if (n < limit) { int z = root + d; int y = z + d; int x = y + d; counts[n]++; if (d > root) { z = d - root; y = z + d; x = y + d; counts[n]++; } } } } int total; assert(counts[0] == 0); foreach (val; counts) if (val == 10) total++; writeln(total); } I have imported std.math as a workaround to a bug (already in bugzilla), because ^^0.5 can't find the sqrt. The D2 program produces a run time bug (thanks to array bound tests): core.exception.RangeError temp(13): Range violation Some short debugging shows that line 9 has an integer overflow: Values of d24: ... 2146654224 2146839556 2147024896 2147210244 2147395600 -2147386332 => error When d24 becomes negative this if(d24<limit) is true, so this line runs: counts[d24]++; And it causes the out of array bounds error. So I can replace the ints with longs as before, but then I have three compile errors because you can't index array items with a long. So I have to use some casts, this is the new code: import std.stdio: writeln; import std.math; void main() { enum long limit = 1_000_000; auto counts = new long[limit]; foreach (d; 1 .. limit / 4 + 1) { long d24 = d ^^ 2 * 4; long start; if (d24 < limit) { start = 1; counts[cast(uint)d24]++; } else start = cast(long)((d24 - limit) ^^ 0.5); if (start < 1) start = 1; foreach (root; start .. 2 * d) { long n = d24 - root ^^ 2; if (n <= 0) break; if (n < limit) { long z = root + d; long y = z + d; long x = y + d; counts[cast(uint)n]++; if (d > root) { z = d - root; y = z + d; x = y + d; counts[cast(uint)n]++; } } } } long total; assert(counts[0] == 0); foreach (val; counts) if (val == 10) total++; writeln(total); } This time the D program is much faster, it runs in 0.2 seconds, while the Psyco version needs 2.47 seconds. Both print the 4989 result. ------------------------------- Problem 174: We shall define a square lamina to be a square outline with a square "hole" so that the shape possesses vertical and horizontal symmetry. Given eight tiles it is possible to form a lamina in only one way: 3x3 square with a 1x1 hole in the middle. However, using thirty-two tiles it is possible to form two distinct laminae. http://projecteuler.net/project/images/p_173_square_laminas.gif If t represents the number of tiles used, we shall say that t = 8 is type L(1) and t = 32 is type L(2). Let N(n) be the number of t <= 1000000 such that t is type L(n); for example, N(15) = 832. What is sum(N(n)) for 1 <= n <= 10? The Python implementation: from math import ceil from collections import defaultdict def gen_laminae(limit): """Yield laminae with up to limit squares, as tuples (outer, inner, squares)""" for outer in range(3, limit // 4 + 2): if outer & 1: min_min_inner = 1 else: min_min_inner = 2 min_inner_squared = outer ** 2 - limit if min_inner_squared < 0: min_inner = min_min_inner else: min_inner = max(min_min_inner, int(ceil(min_inner_squared ** 0.5))) if outer & 1 != min_inner & 1: min_inner += 1 outer_squared = outer ** 2 for inner in range(min_inner, outer - 1, 2): squares = outer_squared - inner ** 2 yield (outer, inner, squares) def main(): squares_to_count = defaultdict(int) for (outer, inner, squares) in gen_laminae(1000000): squares_to_count[squares] += 1 histogram = defaultdict(int) for val in squares_to_count.values(): if val <= 10: histogram[val] += 1 print sum(histogram.values()) if __name__ == "__main__": main() The defaultdict is like a D associative array, you can for example increment a value even if the key is not already present (this is not allowed in normal python dict). If a value is not already present uses the given "constructor" to create a default value. Here it uses int, calling int() it returns 0. Here I am not sure what to do, I can translate that code to D2 as it is, but I don't like that code too much, because it returns a tuple (outer,inner,squares) but then the main() uses just its squares item. This wastes some time to build all those tuples. So in theory I can translate to D both that Python version with the 3-tuple and a versione that yields a single squares value. To keep things simpler I translate just a Python version that yields a single number (I have not used Psyco because here it worsens the performance a bit): from math import ceil from collections import defaultdict def gen_laminae(limit): for outer in range(3, limit // 4 + 2): if outer & 1: min_min_inner = 1 else: min_min_inner = 2 min_inner_squared = outer ** 2 - limit if min_inner_squared < 0: min_inner = min_min_inner else: min_inner = max(min_min_inner, int(ceil(min_inner_squared ** 0.5))) if (outer & 1) != (min_inner & 1): min_inner += 1 outer_squared = outer ** 2 for inner in range(min_inner, outer - 1, 2): squares = outer_squared - inner ** 2 yield squares def main(): squares_to_count = defaultdict(int) for squares in gen_laminae(1000000): squares_to_count[squares] += 1 histogram = defaultdict(int) for val in squares_to_count.values(): if val <= 10: histogram[val] += 1 print sum(histogram.values()) if __name__ == "__main__": main() First translation to D: import std.stdio: writeln; import std.math: ceil; import std.algorithm: max; import std.math; // for ^^0.5 /// Yield laminae with up to limit squares, as squares struct GenLaminae { int limit; int opApply(int delegate(ref int) dg) { int result; foreach (outer; 3 .. limit / 4 + 2) { int min_min_inner; if (outer & 1) min_min_inner = 1; else min_min_inner = 2; int min_inner_squared = outer ^^ 2 - limit; int min_inner; if (min_inner_squared < 0) { min_inner = min_min_inner; } else { min_inner = max(min_min_inner, cast(int)ceil(min_inner_squared ^^ 0.5)); if ((outer & 1) != (min_inner & 1)) min_inner++; } int outer_squared = outer ^^ 2; // foreach can't be used here for (int inner = min_inner; inner < (outer - 1); inner += 2) { int squares = outer_squared - inner ^^ 2; result = dg(squares); if (result) break; } } return result; } } void main() { int[int] squares_to_count; foreach (squares; GenLaminae(1_000_000)) squares_to_count[squares]++; int[int] histogram; foreach (val; squares_to_count) if (val <= 10) histogram[val]++; int tot; foreach (val; histogram) tot += val; writeln(tot); } At the end of the main() I have used a foreach because Phobos2 seems to miss a sum(). It has reduce(), but reduce is less easy to use and it introduces a cognitive burden that's small but not small enough, so using the foreach is simpler. I suggest to add a sum() to Phobos2, if it's missing. The D2 program doesn't seem to stop. Probably some other invisible overflow. So crossing my fingers I have used longs instead (the opApply uses ints still, it just yields the long): import std.stdio: writeln; import std.math: ceil; import std.algorithm: max; import std.math; // for ^^0.5 /// Yield laminae with up to limit squares, as squares struct GenLaminae { long limit; int opApply(int delegate(ref long) dg) { int result; foreach (outer; 3 .. limit / 4 + 2) { long min_min_inner; if (outer & 1) min_min_inner = 1; else min_min_inner = 2; long min_inner_squared = outer ^^ 2 - limit; long min_inner; if (min_inner_squared < 0) { min_inner = min_min_inner; } else { min_inner = max(min_min_inner, cast(long)ceil(min_inner_squared ^^ 0.5)); if ((outer & 1) != (min_inner & 1)) min_inner++; } long outer_squared = outer ^^ 2; // foreach can't be used here for (long inner = min_inner; inner < (outer - 1); inner += 2) { long squares = outer_squared - inner ^^ 2; result = dg(squares); if (result) break; } } return result; } } void main() { long[long] squares_to_count; foreach (squares; GenLaminae(1_000_000)) squares_to_count[squares]++; long[long] histogram; foreach (val; squares_to_count) if (val <= 10) histogram[val]++; long tot; foreach (val; histogram) tot += val; writeln(tot); } By chance it seems to not overflow longs, and it gives the right output. The result is 209566. The D programs runs in 0.77 seconds, the Psyco version in 3.66 seconds. As a final test I have translated the D code to C#2, for mono. First I have translated the wrong version, that uses ints: using System; using System.Collections.Generic; static class Euler174 { static IEnumerable<int> GenLaminae(int limit) { checked { for (int outer = 3; outer < limit / 4 + 2; outer++) { int min_min_inner; if ((outer & 1) != 0) min_min_inner = 1; else min_min_inner = 2; int min_inner_squared = outer * outer - limit; int min_inner; if (min_inner_squared < 0) { min_inner = min_min_inner; } else { min_inner = Math.Max(min_min_inner, (int)Math.Ceiling(Math.Sqrt(min_inner_squared))); if ((outer & 1) != (min_inner & 1)) min_inner++; } int outer_squared = outer * outer; for (int inner = min_inner; inner < (outer - 1); inner += 2) { int squares = outer_squared - inner * inner; yield return squares; } } } } static void Main() { checked { var squares_to_count = new Dictionary<int, int>(); foreach (var squares in GenLaminae(1000000)) { if (squares_to_count.ContainsKey(squares)) squares_to_count[squares]++; else squares_to_count.Add(squares, 1); } var histogram = new Dictionary<int, int>(); foreach (var val in squares_to_count.Values) { if (val <= 10) { if (histogram.ContainsKey(val)) // slow double lookup? histogram[val]++; else histogram.Add(val, 1); } } int tot = 0; foreach (var val in histogram.Values) tot += val; Console.WriteLine("{0} ", tot); } } } At runtime it stops in about a second and gives a nice OverflowException, so I know the program is wrong: Unhandled Exception: System.OverflowException: Number overflow. at Euler174+<GenLaminae>c__Iterator0.MoveNext () [0x00000] at Euler174.Main () [0x00000] So I can try with longs, and this time it throws no exceptions, so longs are enough! (There can be other kind of bugs, of course, but one big source of bugs is killed): using System; using System.Collections.Generic; static class Euler174 { static IEnumerable<long> GenLaminae(long limit) { checked { for (long outer = 3; outer < limit / 4 + 2; outer++) { long min_min_inner; if ((outer & 1) != 0) min_min_inner = 1; else min_min_inner = 2; long min_inner_squared = outer * outer - limit; long min_inner; if (min_inner_squared < 0) { min_inner = min_min_inner; } else { min_inner = Math.Max(min_min_inner, (long)Math.Ceiling(Math.Sqrt(min_inner_squared))); if ((outer & 1) != (min_inner & 1)) min_inner++; } long outer_squared = outer * outer; for (long inner = min_inner; inner < (outer - 1); inner += 2) { long squares = outer_squared - inner * inner; yield return squares; } } } } static void Main() { checked { var squares_to_count = new Dictionary<long, long>(); foreach (var squares in GenLaminae(1000000)) { if (squares_to_count.ContainsKey(squares)) squares_to_count[squares]++; else squares_to_count.Add(squares, 1); } var histogram = new Dictionary<long, long>(); foreach (var val in squares_to_count.Values) { if (val <= 10) { if (histogram.ContainsKey(val)) // slow double lookup? histogram[val]++; else histogram.Add(val, 1); } } long tot = 0; foreach (var val in histogram.Values) tot += val; Console.WriteLine("{0} ", tot); } } } That C# code runs in 0.52 seconds (Mono C# compiler, gmcs, probably version 2.6.3. Using -optimize+ changes nothing. I have not tried on dotnet that's faster). I have done few small tests and I've seen that most of the time is spent in the Main(), so here the lower performance of the D code (the D version needs 0.77 seconds to run) is again caused by slower associative arrays. Note: If I comment out the checked{ the running time is about the same 0.52-0.53 seconds. Those checks don't slow down this program (because most of the time is spent in the associative array logic). In general the overhead of such tests is minimal. Here C# shows to be safer and faster, even on mono. This post shows that if you want to write (even simple) numerical code that uses integral numbers, you need to use everywhere efficient multi-precision integers, or at least you need integer overflows at runtime. Otherwise you are programming in the darkness. Good luck with your programs, bearophile

Apr 08 2010

bearophile wrote:This post shows that if you want to write (even simple) numerical code that uses integral numbers, you need to use everywhere efficient multi-precision integers, or at least you need integer overflows at runtime. Otherwise you are programming in the darkness.

Although this type of code is extremely common in programming puzzles, I seriously doubt that it's used much anywhere else. But anyway --- use doubles instead of ints, and turn all the floating point exceptions on. Will be much faster (on 32-bit compilers), and overflow errors get caught as soon as they occur.

Apr 09 2010

Don:Although this type of code is extremely common in programming puzzles, I seriously doubt that it's used much anywhere else.

But integer-related bugs infests many other kinds of programs too :-) Good luck with your bugs too Don, bearophile

Apr 09 2010

Hello bearophile,Don:Although this type of code is extremely common in programming puzzles, I seriously doubt that it's used much anywhere else.

IIRC FP overflows can trigger hardware based interrupts where as I don't recall seeing similar features for integers. -- ... <IXOYE><

Apr 09 2010

Don:But anyway --- use doubles instead of ints, and turn all the floating point exceptions on. Will be much faster (on 32-bit compilers), and overflow errors get caught as soon as they occur.

I have tried to use doubles as you say or to use BitInt, but in both cases I have so far failed in producing working programs... Regarding BigInt, once they are more complete and refined, they can become a built in of the language (they can be called "bint" for example). Multi-precision ints are useful, and this can improve their literals, their implicit conversions, etc. Bye, bearophile

Apr 10 2010

I have tried two of those programs with the latest dmd2 release. The performance of the euler174 is the same. While the running time of the euler14 goes from (with a fully unloaded PC) 5.49 seconds with v2.042, to 4.13 seconds with v.2.043, probably thanks to AAs improvements. Bye, bearophile

Apr 09 2010