www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Vectorized Laziness

reply bearophile <bearophileHUGS lycos.com> writes:
I've seen this good Google Talk video from Jan 2008, "MonetDB/X100: a (very)
fast column-store", about a more efficient column-oriented DBMS:

The efficiency of this DBMS (something like 50 times faster) is produced by few
- It's column-wise, this helps in several other successive optimizations too
(row-oriented DBMS have their purpose still, the column-oriented are good only
if you want to perform statistics on most of your data, certain kinds of data
mining, etc).
- At 13.57 it shows that instead of yielding single tuples (or single items,
it's column-oriented), it yields arrays of about 100 fields. This allows to
create primitives that are much more efficient. And the CPU can process them
better, using SSE instruction too. Such small arrays are designed to fit in the
CPU cache (probably L2). Such vectorized operations are also pipelined in some
- The filtering operations often don't produce new vectors, they just mark the
items as not present any more inside an array. This helps reducing the number
of allocatations of new arrays.
- On disk data is kept compressed, the compression is column-wise, and
decompression is done only just-in-time to reduce the amount of data transfers.
The number compression takes in account that often data is sorted, so it's
delta-compressed, and then such delta is compressed only in a range of the
Gaussian-like residual distribution (outliers are encoded directly). This
compression also allows to keep large indexes in RAM, that speeds up things
more. Such compression is I/O bound, the CPU performs it at max speed.
- They even shown vectorized hashing, but I have not understood how they work.
- Reading from the disks is done in a merged way, to avoid reading the same
data many times for similar queries.

(The good thing is that it's not hard to understand most things shown in this
video. But I'd like to see the C code they use as "reference", that's just 3
times faster than their DBMS. Such C code may be improved).

Inside, DBMS work as the lazy operations that are getting more common in D2
code, that are common in Python3, and even more in Haskell.

So in D2 to reduce the overhead of the lazy operations it may be useful to use
a vectorized strategy (that's meant to be almost transparent for the D
programmer), so items can be produced and moved in groups, inside arrays, like
in that video. (The DBMS has an interpreter overhead, it's made of fast
primitives used by lazy interpreted code). This idea can be applied in other
languages too.

Lazy filtering is a bit less common in D2 code, so I don't know if the idea of
not creating new blocks (and just marking items as absent, for example using a
bit array attached to the items array) can be useful in D2 too, as in that X100

In this discussion you have to think about pieces of code like:
xfilter(xchain(xmap(_, xrange(_)), xfilter(xmap(_))))
or things even more complex, and not something simpler, that LDC may be able to
just fully inline.

(Time ago I have posted related comments on the Python newsgroup, but it was

(Partially unrelated note: years after starting to program in D, I still miss a
lot strict/lazy array comprehensions in D. Maybe Walter will eventually see how
handy they are.)

Nov 10 2009
parent BCS <none anon.com> writes:
Hello bearophile,

I seem to recall watching that a while ago. Fun and cool stuff IIRC.
Nov 10 2009