www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Parallel prefix algos & D

This post is about many things at the same time.

This is a short slides pack about Parallel Prefix Algorithms (but it's not very
readable, it's too much succinct):
http://ocw.mit.edu/courses/mathematics/18-337j-applied-parallel-computing-sma-5505-spring-2005/lecture-notes/lec4.pdf

Not too much time ago Martin Odersky has given a lecture about quite related
matters.

literateprograms.org is a site where people show algorithms implemented in
various languages, and explain them more or less according to the ideas of
Knuth's Literate Programming. The site already allows pages about D language,
but I think so far no one has added D implementations (while the site
rosettacode has hundreds of D2 implementations). You are invited to add D stuff
there.

This little task page of literateprograms.org uses Python as implementation
language:
http://en.literateprograms.org/Parallel_prefix_%28Python%29


The basic Python2 code (with small changes) is short and simple, but it's not
parallel (I have omitted the code relative to the second part of the page):




def scan(op, a):
    print "in:", a
    if len(a) > 1:
        a[1::2] = imap(op, a[0::2], a[1::2])
        a[1::2] = scan(op, a[1::2])
        a[2::2] = imap(op, a[1::2], a[2::2])
    print "out:", a
    return a

if __name__ == "__main__":
    from operator import add, mul
    scan(add, range(16))
    print
    scan(mul, range(1, 17))
    print
    scan(add, list("hi world"))


A D2 translation of this Python code is interesting for two reasons:
1) The D code is supposed to use parallelism for real.
2) It shows some consequences of the design decisions of D and Phobos.

Regarding the point 2, std.algorithm.map doesn't work with two or more
iterables in parallel as Python (I have not appreciated this design decision,
since the beginning: from my experience creating two outputs from a single map
scan is quite less commonly needed than having a mapping function that used two
or more arguments). And built-in D slices don't have an optional stride, so the
map() has to work with a lambda that unpacks the tuples produced by something
like:

zip(stride(a,2), stride(a[1..$],2))

and then to perform the strided (striped) assign like "a[1::2] = " you need a
copy. D looks quite worse than the very clear original python version (but
Python is harder to parallelize).

Regarding the point 1 I think Phobos already has enough to parallelize the
code. You are invited to write the code, if you have some free time, and if you
want even to see if the parallelism actually reduces the run time here.

Bye,
bearophile
Jan 30 2012