www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Hamming numbers comparison, take 2

reply bearophile <bearophileHUGS lycos.com> writes:
I have tried to implement the rosettacode Hamming Numbers Task:

I have tried to translate the Python version (but the best version is the
Haskell one). I have tried to perform this translation time ago, and it was too
much early for Phobos2. Now I have tried again, and I can see things are
getting better, but Phobos2 seems not good enough yet.

I am using this to find the multiples, this is easy:
    auto m2 = map!q{2 * a}(p2); // multiples of 2
    auto m3 = map!q{3 * a}(p3); // multiples of 3
    auto m5 = map!q{5 * a}(p5); // multiples of 5

But then nWayUnion can't be used, because it accepts an iterable of iterables:
    auto merged = nWayUnion([m2, m3, m5]);
So I'd like nWayUnion to accept N different iterables all of of different type,
with just the constraint that each one yields the same type. I presume it can
work as chain() (I don't know why chain() and nWayUnion() have a so much
different signature).

To remove the duplicates I use this, that looks good enough:
auto output = map!q{a.field[0]}(group(combined)); // eliminate duplicates

The group() of Phobos2 isn't as powerful as the Python groupby(), because
group() yields the pair of head item and its count, while Python groupby()
yields the pair of the head item and a lazy iterable of the duplicated items.

This design of the Python version is sometimes less handy because if you want
to know how many similar items there are (and this is a common usage of
groupby()) you have to count them for example with something like a walkLength,
but it's more powerful because the items can define a generic equality, so they
can be actually not the same item and you may need them all.

Adding the starting item is easy:
auto combined = chain([1], merged); // prepend a starting point
But in practice this Task asks for the millionth item of this Hamming sequence,
that is larger than a ulong, so I have to use something like this:
auto combined = chain([BigInt(1)], merged); // prepend a starting point
But then the BigInt lack a handy toString() (irony: I have seen that inside the
unittests of BigInt there is one such function).

I think Phobos2 lacks a tee() function:
And I have not seen the islice() (a lazy slice) but this is easy enough to
(I don't know if such ranges are already present in some dsource module about
extra ranges).

A bigger problem in the translation comes from the deferred_output and output,
that I am not sure how to define/translate yet. I accept suggestions :-)
I invite other people to try to use Phobos2, because in practice you can spot
similar troubles only if you actually try to use it.

Jun 09 2010
parent bearophile <bearophileHUGS lycos.com> writes:
This D2 code adapted from the Java version finds the correct solutions, but I
don't think of it as the a good solution because it stores all items and uses
lot of memory. The Python version I was trying to translate uses only few MB of
RAM, while this version uses almost 100 MB of it.

import std.stdio: writeln;
import std.bigint: BigInt;
import std.algorithm: min, map;
import std.range: iota;
import std.array: array;

BigInt hamming(int limit) {
    BigInt two   = 2;
    BigInt three = 3;
    BigInt five  = 5;

    auto h = new BigInt[limit];
    h[0] = 1;
    BigInt x2 = 2;
    BigInt x3 = 3;
    BigInt x5 = 5;
    int i, j, k;

    foreach (ref el; h[1 .. $]) {
        el = min(x2, x3, x5);
        if (x2 == el)
            x2 = two * h[++i];
        if (x3 == el)
            x3 = three * h[++j];
        if (x5 == el)
            x5 = five * h[++k];
    return h[$ - 1];

const(char)[] bigIntRepr(BigInt i) {
    const(char)[] result;
    i.toString((const(char)[] s){ result = s; }, "d");
    return result;

void main() {
    writeln(array(map!bigIntRepr(map!hamming(iota(1, 21)))));

I think it's the first time I am able to write a working program that uses
BigInt :-)
In the first line of the main I have had to use two maps because I was not able
to make it run with just one map.
While writing this program I have found only two problems (beside the known map
one), filed as 4300 and 4301.
This code runs in D 2.62 seconds, its Java version in 1.95 seconds, and its
Python+Psyco version in 1.03 seconds. (The Haskell version that uses a better
algorithm can be a bit faster than the Python version).

The Python translation:

import psyco

def hamming(limit):
    h = [1] * limit
    x2, x3, x5 = 2, 3, 5
    i = j = k = 0

    for n in xrange(1, limit):
        h[n] = min(x2, x3, x5)
        if x2 == h[n]:
            i += 1
            x2 = 2 * h[i]
        if x3 == h[n]:
            j += 1
            x3 = 3 * h[j]
        if x5 == h[n]:
            k += 1
            x5 = 5 * h[k]

    return h[-1]

print [hamming(i) for i in xrange(1, 21)]
print hamming(1691)
print hamming(1000000)

Jun 10 2010