digitalmars.D.learn - Stream-Based Processing of Range Chunks in D
- "Nordlöw" (31/31) Dec 10 2013 I'm looking for an elegant way to perform chunk-stream-based
- qznc (14/47) Dec 10 2013 You could make a range step for each kind of statistic, which
- Philippe Sigaud (16/24) Dec 10 2013 Concerning the put, you could have an auxiliary function that's
I'm looking for an elegant way to perform chunk-stream-based
processing of arrays/ranges. I'm building a file indexing/search
engine in D that calculates various kinds of statistics on files
such as histograms and SHA1-digests. I want these calculations to
be performed in a single pass with regard to data-access
locality.
Here is an excerpt from the engine:
    /** Process File in Cache Friendly Chunks. */
    void calculateCStatInChunks(immutable (ubyte[]) src,
                                size_t chunkSize, bool doSHA1,
                                bool doBHist8) {
        if (!_cstat.contentsDigest[].allZeros) { doSHA1 = false; }
        if (!_cstat.bhist8.allZeros) { doBHist8 = false; }
        import std.digest.sha;
        SHA1 sha1;
        if (doSHA1) { sha1.start(); }
        import std.range: chunks;
        foreach (chunk; src.chunks(chunkSize)) {
            if (doSHA1) { sha1.put(chunk); }
            if (doBHist8) { /*...*/ }
        }
        if (doSHA1) {
            _cstat.contentsDigest = sha1.finish();
        }
    }
This does not seem like a very elegant (functional) approach, as I
have to spread the logic for each statistic (reducer) across three
different places in the code, namely `start`, `put` and `finish`.
Does anybody have suggestions/references on Haskell-monad-like
stream-based APIs that can make this code more D-style
component-based?
Dec 10 2013
On Tuesday, 10 December 2013 at 09:57:44 UTC, Nordlöw wrote:
> I'm looking for an elegant way to perform chunk-stream-based
> processing of arrays/ranges. [...]
> Does anybody have suggestions/references on Haskell-monad-like
> stream-based APIs that can make this code more D-style
> component-based?
You could make a range step for each kind of statistic, which
outputs the input range unchanged and does its job as a side
effect.
    SHA1 sha1;
    src.chunks(chunkSize)
       .add_sha1(doSHA1, &sha1)
       .add_bhist(doBHist8)
       .strict_consuming();
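A minimal sketch of such pass-through steps, assuming a Phobos version that provides `std.range.tee`, which forwards each element unchanged while handing it to a function as a side effect. It stands in for the hypothetical `add_sha1` / `add_bhist` / `strict_consuming` helpers above; the names `Stats` and `statsViaTee` are made up for illustration and are not part of the original engine.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range : chunks, tee;

/// Hypothetical result holder; not part of the original engine.
struct Stats
{
    ubyte[20] digest;   // SHA-1 of the whole input (zeros if disabled)
    size_t[256] hist;   // byte-value histogram (zeros if disabled)
}

/// Single pass over `src`: each tee step sees a chunk and passes it on unchanged.
Stats statsViaTee(immutable(ubyte)[] src, size_t chunkSize,
                  bool doSHA1, bool doBHist8)
{
    Stats stats;
    SHA1 sha1;
    sha1.start();

    auto pipeline = src.chunks(chunkSize)
        .tee!((c) { if (doSHA1) sha1.put(c); })
        .tee!((c) { if (doBHist8) foreach (b; c) ++stats.hist[b]; });

    // Ranges are lazy: nothing is computed until the pipeline is consumed.
    foreach (chunk; pipeline) {}

    if (doSHA1) stats.digest = sha1.finish();
    return stats;
}
```

The empty `foreach` plays the role of `strict_consuming`. Note that `finish` still sits outside the pipeline, so this only merges the per-chunk work into the range chain.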
You could try to use constructor/destructor mechanisms for
sha1.start and sha1.finish. Or at least scope guards:
    SHA1 sha1;
    if (doSHA1) { sha1.start(); }
    scope(exit) if (doSHA1) { _cstat.contentsDigest = sha1.finish(); }
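As a rough, self-contained illustration of the scope-guard variant: the sketch below returns the digest instead of writing to `_cstat`, and the function name `digestInChunks` is an assumption made for the example only.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range : chunks;

/// Hypothetical helper: SHA-1 of `src`, or all zeros when `doSHA1` is false.
ubyte[20] digestInChunks(immutable(ubyte)[] src, size_t chunkSize, bool doSHA1)
{
    ubyte[20] digest;   // stays zeroed if SHA-1 is disabled
    {
        SHA1 sha1;
        if (doSHA1) sha1.start();
        // Registered right next to start(); runs when this inner block is
        // left, whether normally or through an exception.
        scope(exit) if (doSHA1) digest = sha1.finish();

        foreach (chunk; src.chunks(chunkSize))
            if (doSHA1) sha1.put(chunk);
    }
    return digest;
}
```

The inner block is there so that `finish` has already run by the time `digest` is read for the return. This keeps `start` and `finish` next to each other, while `put` remains in the loop body.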
Dec 10 2013
On Tue, Dec 10, 2013 at 10:57 AM, "Nordlöw" <per.nordlow gmail.com> wrote:
> I'm looking for an elegant way to perform chunk-stream-based
> processing of arrays/ranges. I'm building a file indexing/search
> engine in D that calculates various kinds of statistics on files
> such as histograms and SHA1-digests. I want these calculations to
> be performed in a single pass with regard to data-access locality.
> [...]
> Seemingly this is not a very elegant (functional) approach as I
> have to spread logic for each statistic (reducer) across three
> different places in the code, namely `start`, `put` and `finish`.

Concerning the put, you could have an auxiliary function that's
defined only once:

    // typeofChunk? immutable(ubyte)[] is assumed here, matching the
    // immutable(ubyte[]) `src` in the original excerpt.
    void delegate(immutable(ubyte)[] chunk) worker, sha, bhist8;
    if (doSHA1) sha = (chunk) { sha1.put(chunk); };
    else        sha = (chunk) {};
    if (doBHist8) bhist8 = (chunk) { /* some BHist8 work */ };
    else          bhist8 = (chunk) {};
    worker = (chunk) { sha(chunk); bhist8(chunk); };
    ...
    foreach (chunk; src.chunks(chunkSize))
        worker(chunk);
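One possible variation on the delegate idea, sketched under assumptions: instead of one no-op delegate per disabled statistic, only the enabled workers are collected in an array. The names `ByteStats`, `Worker` and `computeStats` are hypothetical, and the histogram body is just one plausible reading of the elided BHist8 work.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range : chunks;

alias Chunk  = immutable(ubyte)[];
alias Worker = void delegate(Chunk);

/// Hypothetical result holder; not part of the original engine.
struct ByteStats
{
    ubyte[20] digest;   // SHA-1 of the whole input (zeros if disabled)
    size_t[256] hist;   // byte-value histogram (zeros if disabled)
}

ByteStats computeStats(Chunk src, size_t chunkSize, bool doSHA1, bool doBHist8)
{
    ByteStats stats;
    SHA1 sha1;
    sha1.start();

    // Each enabled statistic contributes exactly one per-chunk worker.
    Worker[] workers;
    if (doSHA1)   workers ~= (Chunk c) { sha1.put(c); };
    if (doBHist8) workers ~= (Chunk c) { foreach (b; c) ++stats.hist[b]; };

    // Single pass: every chunk is handed to every worker while it is hot in cache.
    foreach (chunk; src.chunks(chunkSize))
        foreach (w; workers)
            w(chunk);

    if (doSHA1) stats.digest = sha1.finish();
    return stats;
}
```

Adding a further statistic then means appending one more worker, at the cost of an indirect call per worker and chunk.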
Dec 10 2013