digitalmars.D.learn - Collect Statistics efficiently and easily

Brett <Brett gmail.com> writes:
I often need to collect statistics on a data set that is either 
still being generated or already generated.

The code usually is

M = max(M, v);
m = min(m, v);

but other things like standard deviation, mean, etc might need to 
be computed.

This may need to be done on several data sets simultaneously.

Is there any way to compute them all in one line that is efficient, 
probably using ranges? I'd like to avoid having to loop through a 
data set multiple times, as that would be quite inefficient.
Sep 16 2019
Paul Backus <snarwin gmail.com> writes:
On Tuesday, 17 September 2019 at 01:53:39 UTC, Brett wrote:
 is there any way that one could just compute them in one line 
 that is efficient, probably using ranges? I'd like to avoid 
 having to loop through a data set multiple times as it would be 
 quite inefficient.
You can use `std.algorithm.fold` to compute multiple results in a single pass:

auto stats = v.fold!(max, min);
M = stats[0];
m = stats[1];
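For reference, a minimal self-contained sketch of the `fold` suggestion might look like the following. The helper name `extremes` and the sample values are assumptions made for illustration and do not come from the thread. One caveat worth knowing: without an explicit seed, `fold` seeds each accumulator with the first element, so it throws on an empty range.

```d
import std.algorithm : fold, max, min;
import std.stdio : writeln;

// Hypothetical helper (name chosen for illustration): returns a tuple
// holding (max, min) of `v`, computed in a single pass.
// Without an explicit seed, fold throws if `v` is empty.
auto extremes(double[] v)
{
    return v.fold!(max, min);
}

void main()
{
    // Hypothetical sample data standing in for the data set `v`.
    double[] v = [3.0, 1.5, 4.0, 1.0, 5.5];

    auto stats = extremes(v);
    double M = stats[0]; // largest element
    double m = stats[1]; // smallest element

    writeln("max = ", M, ", min = ", m);
}
```

Any binary function can be passed alongside `max` and `min`, for example `(a, b) => a + b` for a running sum, as long as the first element is an acceptable starting value for every accumulator.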
Sep 17 2019
Brett <Brett gmail.com> writes:
On Tuesday, 17 September 2019 at 14:06:41 UTC, Paul Backus wrote:
 You can use `std.algorithm.fold` to compute multiple results in 
 a single pass:

 auto stats = v.fold!(max, min);
 M = stats[0];
 m = stats[1];

That may work, but I'm already iterating and doing the computation inside a loop. What I'm specifically talking about is abstracting the computation of each statistic type. If I were to convert my algorithm to a range, then maybe I could do something similar to what you are saying, but I would still need more than min and max (such as avg, std, and others). It may be viable, but I'll have to think about it.

I tend to find myself writing the same abstract code to compute the same statistics quite often. Sometimes it deals with a history and sometimes not; e.g., I might want to compute the average and keep the last 5, or the 5 largest.
Sep 18 2019
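
One possible way to abstract the per-statistic bookkeeping described in the last post is a small accumulator that is fed one value at a time from inside the existing loop. The sketch below is an assumption-laden illustration and not something proposed in the thread: the struct name `RunningStats` and its members are hypothetical, and the mean and variance are updated with Welford's online algorithm, a technique substituted here so that no second pass over the data is needed. Keeping a history (the last 5 values, or the 5 largest) is not covered.

```d
import std.algorithm : max, min;
import std.math : sqrt;
import std.stdio : writeln;

// Hypothetical accumulator; the name and members are illustrative only.
struct RunningStats
{
    size_t count;
    double minValue = double.infinity;
    double maxValue = -double.infinity;
    double mean = 0;
    private double m2 = 0; // running sum of squared deviations from the mean

    // Feed one value; intended to be called from inside an existing loop.
    void put(double x)
    {
        ++count;
        minValue = min(minValue, x);
        maxValue = max(maxValue, x);

        // Welford's update for the mean and the squared deviations.
        immutable delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Sample variance (n - 1 denominator); NaN with fewer than two values.
    double variance() const
    {
        return count > 1 ? m2 / (count - 1) : double.nan;
    }

    double stddev() const
    {
        return sqrt(variance());
    }
}

void main()
{
    // Hypothetical sample data; in practice put() would be called from
    // whatever loop is already producing the values.
    double[] first = [2, 4, 4, 4, 5, 5, 7, 9];
    double[] second = [1.5, 2.5, 0.5];

    RunningStats a, b; // one accumulator per data set
    foreach (x; first) a.put(x);
    foreach (x; second) b.put(x);

    writeln(a.minValue, " ", a.maxValue, " ", a.mean, " ", a.stddev);
    writeln(b.minValue, " ", b.maxValue, " ", b.mean, " ", b.stddev);
}
```

Because each accumulator holds its own state, several data sets can be tracked simultaneously by declaring one `RunningStats` per set, and each set is still visited only once.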