
digitalmars.D.learn - D Mir: standard deviation speed

tastyminerals <tastyminerals gmail.com> writes:
I am trying to implement standard deviation calculation in Mir 
for benchmark purposes.
I have two implementations. One is the straightforward std = 
sqrt(mean(abs(x - x.mean())**2)) and the other follows Welford's 
algorithm for computing variance (as described here: 
https://www.johndcook.com/blog/standard_deviation/).

However, although the first implementation should be less 
efficient / slower, the benchmarking results show a startling 
difference in its favour. I'd like to understand if I am doing 
something wrong and would appreciate some explanation.


import std.math : abs;
import mir.ndslice;
import mir.math.common : pow, sqrt, fastmath;
import mir.math.sum : sum;
import mir.math.stat : mean;

@fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
{
     pragma(inline, false);
     if (flatMatrix.empty)
         return 0.0;
     double n = cast(double) flatMatrix.length;
     double mu = flatMatrix.mean;
     return (flatMatrix.map!(a => (a - mu).abs ^^ 2).sum!"fast" / n).sqrt;
}



@fastmath double sdWelford(T)(Slice!(T*, 1) flatMatrix)
{
     pragma(inline, false);
     if (flatMatrix.empty)
         return 0.0;

     double m0 = 0.0;
     double m1 = 0.0;
     double s0 = 0.0;
     double s1 = 0.0;
     double n = 0.0;
     foreach (x; flatMatrix.field)
     {
         ++n;
         m1 = m0 + (x - m0) / n;
         s1 = s0 + (x - m0) * (x - m1);
         m0 = m1;
         s0 = s1;
     }
     // switch to n - 1 for sample variance
     return (s1 / n).sqrt;
}
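
For reference, here is a minimal, self-contained usage sketch (not the actual benchmark harness, and the input matrix is made up) that runs both functions on a flattened 60x60 matrix:

import mir.ndslice;
import std.math : approxEqual;
import std.stdio : writeln;

void main()
{
    // made-up input: a 60x60 matrix of doubles, flattened to Slice!(double*, 1)
    auto matrix = iota(60, 60).map!(a => cast(double) (a % 7)).slice;
    auto flat = matrix.flattened;
    writeln("sd0:       ", sd0(flat));
    writeln("sdWelford: ", sdWelford(flat));
    // both implementations should agree up to floating-point error
    assert(approxEqual(sd0(flat), sdWelford(flat)));
}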

Benchmarking:

Naive std (1k loops):
   std of [60, 60] matrix 0.001727
   std of [300, 300] matrix 0.043452
   std of [600, 600] matrix 0.182177
   std of [800, 800] matrix 0.345367

std with Welford's variance (1k loops):
   std of [60, 60] matrix 0.0225476
   std of [300, 300] matrix 0.534528
   std of [600, 600] matrix 2.0714
   std of [800, 800] matrix 3.60142
Jul 14 2020
jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
 [...]
It would be helpful to provide a link.

You should only need one accumulator for the mean and one for the centered sum of squares; see the Python example under Welford's algorithm at https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance (a D sketch of what I mean is at the end of this post). Using the extra accumulators may have broken optimization somehow.

variance and standardDeviation were recently added to mir.math.stat. They have the option to switch between Welford's algorithm and the others: what you call the naive algorithm is VarianceAlgo.twoPass, and Welford's algorithm can be toggled with VarianceAlgo.online, which is the default option. It would also be interesting if you re-did the analysis with the built-in mir functions.

There are some other small differences between your implementation and the one in mir, beyond the issue discussed above. You take the absolute value before squaring and force the use of sum!"fast". Another difference is that VarianceAlgo.online in mir uses a precise calculation of the mean rather than the fast running update that Welford uses. This may have a modest impact on performance, but should provide more accurate results.
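
Something along these lines (an untested sketch following the Wikipedia update; the names are mine, and it assumes the same imports as your post):

private double sdWelfordOnePass(T)(Slice!(T*, 1) x)
{
    pragma(inline, false);
    if (x.empty)
        return 0.0;
    double mu = 0.0; // running mean
    double m2 = 0.0; // centered sum of squares
    double n = 0.0;
    foreach (e; x.field)
    {
        ++n;
        immutable delta = e - mu;
        mu += delta / n;
        m2 += delta * (e - mu);
    }
    // divide by n - 1 instead for the sample standard deviation
    return sqrt(m2 / n);
}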
Jul 14 2020
tastyminerals <tastyminerals gmail.com> writes:
On Tuesday, 14 July 2020 at 19:36:21 UTC, jmh530 wrote:
 [...]
Ok, the wiki page looks more informative, I shall look into my Welford implementation.

I've just used standardDeviation from Mir and it showed even worse results than both of the examples above. Here is a (WIP) project as of now. Line 160 in
https://github.com/tastyminerals/mir_benchmarks_2/blob/master/source/basic_ops.d

std of [60, 60] matrix 0.0389492 (> 0.001727)
std of [300, 300] matrix 1.03592 (> 0.043452)
std of [600, 600] matrix 4.2875 (> 0.182177)
std of [800, 800] matrix 7.9415 (> 0.345367)
Jul 14 2020
jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 15 July 2020 at 05:57:56 UTC, tastyminerals wrote:
 [snip]

 Here is a (WIP) project as of now.
 Line 160 in 
 https://github.com/tastyminerals/mir_benchmarks_2/blob/master/source/basic_ops.d

 std of [60, 60] matrix 0.0389492 (> 0.001727)
 std of [300, 300] matrix 1.03592 (> 0.043452)
 std of [600, 600] matrix 4.2875 (> 0.182177)
 std of [800, 800] matrix 7.9415 (> 0.345367)
I changed the dflags-ldc to "-mcpu=native -O" and compiled with `dub run --compiler=ldc2`. I got similar results as yours for both in the initial run. I changed sd to

@fmamath private double sd(T)(Slice!(T*, 1) flatMatrix)
{
    pragma(inline, false);
    if (flatMatrix.empty)
        return 0.0;
    double n = cast(double) flatMatrix.length;
    double mu = flatMatrix.mean;
    return (flatMatrix.map!(a => (a - mu) ^^ 2).sum!"precise" / n).sqrt;
}

and got

std of [10, 10] matrix 0.0016321
std of [20, 20] matrix 0.0069788
std of [300, 300] matrix 2.42063
std of [60, 60] matrix 0.0828711
std of [600, 600] matrix 9.72251
std of [800, 800] matrix 18.1356

And the biggest change by far was sum!"precise" instead of sum!"fast".

When I ran your benchStd function with

ans = matrix.flattened.standardDeviation!(double, "online", "fast");

I got

std of [10, 10] matrix 1e-07
std of [20, 20] matrix 0
std of [300, 300] matrix 0
std of [60, 60] matrix 1e-07
std of [600, 600] matrix 0
std of [800, 800] matrix 0

I got the same result with Summation.naive. That almost seems too low. The default is Summation.appropriate, which resolves to Summation.pairwise in this case. It is faster than Summation.precise, but still slower than Summation.naive or Summation.fast. Your welfordSD should line up with Summation.naive.

When I change that to

ans = matrix.flattened.standardDeviation!(double, "online", "precise");

I get

Running .\mir_benchmarks_2.exe
std of [10, 10] matrix 0.0031737
std of [20, 20] matrix 0.0153603
std of [300, 300] matrix 4.15738
std of [60, 60] matrix 0.171211
std of [600, 600] matrix 17.7443
std of [800, 800] matrix 34.2592

I also tried changing your welfordSD function based on the stuff I mentioned above, but it did not make a large difference.
Jul 15 2020
9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 15 July 2020 at 11:23:00 UTC, jmh530 wrote:
 [...]
 I changed the dflags-ldc to "-mcpu=native -O" and compiled with `dub run --compiler=ldc2`. I got similar results as yours for both in the initial run. I changed sd to

 @fmamath private double sd(T)(Slice!(T*, 1) flatMatrix)
@fastmath violates all summation algorithms except `"fast"`. The same bug is in the original author's post.
Jul 15 2020
jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 15 July 2020 at 11:26:19 UTC, 9il wrote:
 [snip]
 @fmamath private double sd(T)(Slice!(T*, 1) flatMatrix)
@fastmath violates all summation algorithms except `"fast"`. The same bug is in the original author's post.
I hadn't realized that @fmamath was the problem, rather than @fastmath overall. @fmamath is used on many mir.math.stat functions, though admittedly not in the accumulators.
Jul 15 2020
9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 15 July 2020 at 11:37:23 UTC, jmh530 wrote:
 On Wednesday, 15 July 2020 at 11:26:19 UTC, 9il wrote:
 [snip]
 @fmamath private double sd(T)(Slice!(T*, 1) flatMatrix)
@fastmath violates all summation algorithms except `"fast"`. The same bug is in the original author's post.
I hadn't realized that @fmamath was the problem, rather than @fastmath overall. @fmamath is used on many mir.math.stat functions, though admittedly not in the accumulators.
Ah, no, my bad! You wrote @fmamath; I read it as @fastmath. @fmamath is OK here.
Jul 15 2020
jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 15 July 2020 at 11:41:35 UTC, 9il wrote:
 [snip]

 Ah, no, my bad! You wrote @fmamath; I read it as @fastmath. @fmamath is OK here.
I've mixed up @fastmath and @fmamath as well. No worries.
Jul 15 2020
9il <ilyayaroshenko gmail.com> writes:
On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
 @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
@fastmath shouldn't really be used with summation algorithms except the `"fast"` version of them. Otherwise, they may or may not behave like "fast".
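
For example, a variant without the attribute (an untested sketch, names made up) leaves the summation algorithm alone; `sum` without a template argument uses Summation.appropriate, which typically resolves to pairwise for doubles:

import mir.ndslice;
import mir.math.common : sqrt;
import mir.math.sum : sum;
import mir.math.stat : mean;

// no @fastmath here, so the compiler is not allowed to reassociate the summation
private double sdDefaultSum(T)(Slice!(T*, 1) flatMatrix)
{
    if (flatMatrix.empty)
        return 0.0;
    immutable n = cast(double) flatMatrix.length;
    immutable mu = flatMatrix.mean;
    return sqrt(flatMatrix.map!(a => (a - mu) * (a - mu)).sum / n);
}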
Jul 14 2020
9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 15 July 2020 at 02:08:48 UTC, 9il wrote:
 On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
 @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
@fastmath shouldn't really be used with summation algorithms except the `"fast"` version of them. Otherwise, they may or may not behave like "fast".
`mean` is a summation algorithm too.
Jul 14 2020
tastyminerals <tastyminerals gmail.com> writes:
On Wednesday, 15 July 2020 at 02:08:48 UTC, 9il wrote:
 On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
 @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
@fastmath shouldn't really be used with summation algorithms except the `"fast"` version of them. Otherwise, they may or may not behave like "fast".
Good to know. So, it's fine to use it with sum!"fast" but better avoid it for general purposes.
Jul 14 2020
9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 15 July 2020 at 06:00:46 UTC, tastyminerals wrote:
 On Wednesday, 15 July 2020 at 02:08:48 UTC, 9il wrote:
 On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
 @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
@fastmath shouldn't really be used with summation algorithms except the `"fast"` version of them. Otherwise, they may or may not behave like "fast".
Good to know. So, it's fine to use it with sum!"fast" but better avoid it for general purposes.
They both are more precise by default.
Jul 14 2020
9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 15 July 2020 at 06:55:51 UTC, 9il wrote:
 On Wednesday, 15 July 2020 at 06:00:46 UTC, tastyminerals wrote:
 On Wednesday, 15 July 2020 at 02:08:48 UTC, 9il wrote:
 On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
 @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
@fastmath shouldn't really be used with summation algorithms except the `"fast"` version of them. Otherwise, they may or may not behave like "fast".
Good to know. So, it's fine to use it with sum!"fast" but better avoid it for general purposes.
They both are more precise by default.
This was a reply to your other post in the thread, sorry. Mir algorithms are more precise by default than the algorithms you have provided.
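
For example (an illustrative sketch, with values chosen to force catastrophic cancellation; not from the benchmark):

import mir.math.sum : sum;

void main()
{
    double[] xs = [1.0e16, 1.0, -1.0e16];
    // naive left-to-right summation loses the 1.0 to rounding,
    // the precise summator recovers it
    assert(xs.sum!"naive" == 0.0);
    assert(xs.sum!"precise" == 1.0);
}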
Jul 14 2020
tastyminerals <tastyminerals gmail.com> writes:
On Wednesday, 15 July 2020 at 06:57:21 UTC, 9il wrote:
 On Wednesday, 15 July 2020 at 06:55:51 UTC, 9il wrote:
 On Wednesday, 15 July 2020 at 06:00:46 UTC, tastyminerals 
 wrote:
 On Wednesday, 15 July 2020 at 02:08:48 UTC, 9il wrote:
 On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals 
 wrote:
 @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
@fastmath shouldn't really be used with summation algorithms except the `"fast"` version of them. Otherwise, they may or may not behave like "fast".
Good to know. So, it's fine to use it with sum!"fast" but better avoid it for general purposes.
They both are more precise by default.
This was a reply to your other post in the thread, sorry. Mir algorithms are more precise by default than the algorithms you have provided.
Right. Is this why standardDeviation is significantly slower?
Jul 15 2020
9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 15 July 2020 at 07:34:59 UTC, tastyminerals wrote:
 On Wednesday, 15 July 2020 at 06:57:21 UTC, 9il wrote:
 On Wednesday, 15 July 2020 at 06:55:51 UTC, 9il wrote:
 On Wednesday, 15 July 2020 at 06:00:46 UTC, tastyminerals 
 wrote:
 On Wednesday, 15 July 2020 at 02:08:48 UTC, 9il wrote:
 [...]
Good to know. So, it's fine to use it with sum!"fast" but better avoid it for general purposes.
They both are more precise by default.
This was a reply to your other post in the thread, sorry. Mir algorithms are more precise by default than the algorithms you have provided.
Right. Is this why standardDeviation is significantly slower?
Yes. It allows you to pick a summation option; you can try options other than the default in your benchmarks.
Jul 15 2020
tastyminerals <tastyminerals gmail.com> writes:
On Wednesday, 15 July 2020 at 07:51:31 UTC, 9il wrote:
 On Wednesday, 15 July 2020 at 07:34:59 UTC, tastyminerals wrote:
 On Wednesday, 15 July 2020 at 06:57:21 UTC, 9il wrote:
 On Wednesday, 15 July 2020 at 06:55:51 UTC, 9il wrote:
 On Wednesday, 15 July 2020 at 06:00:46 UTC, tastyminerals 
 wrote:
 On Wednesday, 15 July 2020 at 02:08:48 UTC, 9il wrote:
 [...]
Good to know. So, it's fine to use it with sum!"fast" but better avoid it for general purposes.
They both are more precise by default.
This was a reply to your other post in the thread, sorry. Mir algorithms are more precise by default than the algorithms you have provided.
Right. Is this why standardDeviation is significantly slower?
Yes. It allows you to pick a summation option; you can try options other than the default in your benchmarks.
Indeed, I played around with VarianceAlgo and Summation options, and they impact the end result a lot.

ans = matrix.flattened.standardDeviation!(VarianceAlgo.naive, Summation.appropriate);

std of [300, 300] matrix 0.375903
std of [60, 60] matrix 0.0156448
std of [600, 600] matrix 1.54429
std of [800, 800] matrix 3.03954

ans = matrix.flattened.standardDeviation!(VarianceAlgo.online, Summation.appropriate);

std of [300, 300] matrix 1.12404
std of [60, 60] matrix 0.041968
std of [600, 600] matrix 5.01617
std of [800, 800] matrix 8.64363

Summation.fast behaves strangely, though. I wonder what happened here?

ans = matrix.flattened.standardDeviation!(VarianceAlgo.naive, Summation.fast);

std of [300, 300] matrix 1e-06
std of [60, 60] matrix 9e-07
std of [600, 600] matrix 1.2e-06
std of [800, 800] matrix 9e-07

ans = matrix.flattened.standardDeviation!(VarianceAlgo.online, Summation.fast);

std of [300, 300] matrix 9e-07
std of [60, 60] matrix 9e-07
std of [600, 600] matrix 1.1e-06
std of [800, 800] matrix 1e-06
Jul 15 2020