digitalmars.D.learn - Stream-Based Processing of Range Chunks in D


"Nordlöw" <per.nordlow gmail.com> writes:

I'm looking for an elegant way to perform chunk-stream-based 
processing of arrays/ranges. I'm building a file indexing/search 
engine in D that calculates various kinds of statistics on files 
such as histograms and SHA1 digests. I want these calculations to 
be performed in a single pass to preserve data-access locality.

Here is an excerpt from the engine:

     /** Process File in Cache Friendly Chunks. */
     void calculateCStatInChunks(immutable (ubyte[]) src,
                                 size_t chunkSize,
                                 bool doSHA1, bool doBHist8) {
         if (!_cstat.contentsDigest[].allZeros) { doSHA1 = false; }
         if (!_cstat.bhist8.allZeros) { doBHist8 = false; }

         import std.digest.sha;
         SHA1 sha1;
         if (doSHA1) { sha1.start(); }

         import std.range: chunks;
         foreach (chunk; src.chunks(chunkSize)) {
             if (doSHA1) { sha1.put(chunk); }
             if (doBHist8) { /*...*/ }
         }

         if (doSHA1) {
             _cstat.contentsDigest = sha1.finish();
         }
     }

This does not seem like a very elegant (functional) approach, as I 
have to spread the logic for each statistic (reducer) across three 
different places in the code, namely `start`, `put` and `finish`.

Does anybody have suggestions/references on Haskell-monad-like 
stream based APIs that can make this code more D-style 
component-based?

Dec 10 2013

"qznc" <qznc web.de> writes:

You could make a range step for each kind of statistic, which 
outputs the input range unchanged and does its job as a side 
effect.

   SHA1 sha1;
   src.chunks(chunkSize)
      .add_sha1(doSHA1, &sha1)
      .add_bhist(doBHist8)
      .strict_consuming();
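
A minimal sketch of this pass-through idea, assuming a current Phobos in which `std.range.tee` is available. `tee` stands in here for the hypothetical `add_sha1`/`add_bhist` steps above; `ChunkStats`, `chunkedStats` and the 256-bin byte histogram are names made up for illustration and are not part of the original engine.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range : chunks, tee;

/// Hypothetical result bundle (not from the original engine).
struct ChunkStats
{
    ubyte[20] digest;    // SHA-1 of the whole input
    size_t[256] bhist8;  // number of occurrences of each byte value
}

/// Walks `src` once in chunks and feeds every chunk to each enabled statistic.
ChunkStats chunkedStats(immutable(ubyte)[] src, size_t chunkSize,
                        bool doSHA1, bool doBHist8)
{
    ChunkStats stats;
    SHA1 sha1;
    sha1.start();

    auto pipeline = src.chunks(chunkSize)
        // Each tee step hands the chunk on unchanged and
        // updates its statistic as a side effect.
        .tee!((c) { if (doSHA1) sha1.put(c); })
        .tee!((c) { if (doBHist8) foreach (b; c) ++stats.bhist8[b]; });

    // tee is lazy: nothing is computed until the range is consumed.
    foreach (_; pipeline) {}

    if (doSHA1) stats.digest = sha1.finish();
    return stats;
}

unittest
{
    import std.string : representation;

    immutable(ubyte)[] data = "hello, chunked world".representation;
    auto s = chunkedStats(data, 4, true, true);
    assert(s.digest == sha1Of(data)); // same digest as a one-shot hash
    assert(s.bhist8['l'] == 3);
}
```

The empty `foreach` plays the role of `strict_consuming()`: it only drives the lazy range to completion.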

You could try to use constructor/destructor mechanisms for 
sha1.start and sha1.finish. Or at least scope guards:

   SHA1 sha1;
   if (doSHA1) { sha1.start(); }
   scope(exit) if (doSHA1) { _cstat.contentsDigest = sha1.finish(); }
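
The constructor/destructor variant might look roughly like the sketch below. `ScopedSHA1` and `digestInChunks` are hypothetical names invented for this example; the only library pieces assumed are `SHA1` from `std.digest.sha` and `chunks` from `std.range`.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range : chunks;

/// Hypothetical RAII wrapper: starts hashing on construction and stores
/// the digest into `*target` when the wrapper goes out of scope.
struct ScopedSHA1
{
    private SHA1 sha;
    private ubyte[20]* target;

    this(ubyte[20]* target)
    {
        this.target = target;
        sha.start();
    }

    @disable this(this); // a copy would finish the digest a second time

    ~this()
    {
        if (target !is null) *target = sha.finish();
    }

    void put(scope const(ubyte)[] data) { sha.put(data); }
}

/// Hashes `src` chunk by chunk; the digest is written when `hasher` is destroyed.
ubyte[20] digestInChunks(immutable(ubyte)[] src, size_t chunkSize)
{
    ubyte[20] digest;
    {
        auto hasher = ScopedSHA1(&digest);
        foreach (chunk; src.chunks(chunkSize))
            hasher.put(chunk);
    } // destructor runs here and fills in `digest`
    return digest;
}

unittest
{
    assert(digestInChunks(cast(immutable(ubyte)[]) "abc", 2) == sha1Of("abc"));
}
```

This keeps `start` and `finish` together in one type, at the cost of an extra wrapper; the scope guard above does the same job with less code when the logic is only needed in one function.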

Dec 10 2013

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Tue, Dec 10, 2013 at 10:57 AM, "Nordlöw" <per.nordlow gmail.com> wrote:
 I'm looking for an elegant way to perform chunk-stream-based processing of
 arrays/ranges. I'm building a file indexing/search engine in D that
 calculates various kinds of statistics on files such as histograms and
 SHA1-digests. I want these calculations to be performed in a single pass
 with regards to data-access locality.

 This does not seem like a very elegant (functional) approach, as I have to
 spread the logic for each statistic (reducer) across three different places
 in the code, namely `start`, `put` and `finish`.

Concerning the put, you could have an auxiliary function that's
defined only once:

// Chunk type assumed from `src` being immutable(ubyte[]).
void delegate(const(ubyte)[] chunk) worker, sha, bhist8;

if (doSHA1)
    sha = (chunk) { sha1.put(chunk); };
else
    sha = (chunk) {};

if (doBHist8)
    bhist8 = (chunk) { /*some BHist8 work*/ };
else
    bhist8 = (chunk) {};

worker = (chunk) { sha(chunk); bhist8(chunk); };

...

       foreach (chunk; src.chunks(chunkSize))
            worker(chunk);
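
A self-contained variation on the same idea, as a sketch only: instead of no-op delegates for disabled statistics, only the enabled ones are registered in an array. `ChunkWorker`, `WorkerStats`, `processChunks` and the byte-histogram body are assumptions made for this example, not code from the original engine.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range : chunks;

alias ChunkWorker = void delegate(const(ubyte)[] chunk);

/// Hypothetical result bundle (not from the original engine).
struct WorkerStats
{
    ubyte[20] digest;    // SHA-1 of the whole input
    size_t[256] bhist8;  // number of occurrences of each byte value
}

/// Registers one delegate per enabled statistic, then runs all of
/// them on every chunk in a single pass over `src`.
WorkerStats processChunks(immutable(ubyte)[] src, size_t chunkSize,
                          bool doSHA1, bool doBHist8)
{
    WorkerStats stats;
    SHA1 sha1;
    sha1.start();

    ChunkWorker[] workers;
    if (doSHA1)
        workers ~= (const(ubyte)[] chunk) { sha1.put(chunk); };
    if (doBHist8)
        workers ~= (const(ubyte)[] chunk) {
            foreach (b; chunk) ++stats.bhist8[b];
        };

    foreach (chunk; src.chunks(chunkSize))
        foreach (work; workers)
            work(chunk);

    if (doSHA1) stats.digest = sha1.finish();
    return stats;
}

unittest
{
    auto s = processChunks(cast(immutable(ubyte)[]) "abcabc", 4, true, true);
    assert(s.digest == sha1Of("abcabc"));
    assert(s.bhist8['b'] == 2);
}
```

The `put` logic for each statistic now lives in one place, although `start` and `finish` are still handled separately.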

Dec 10 2013
