digitalmars.D - More on vectorized comparisons

bearophile (43/43) Aug 22 2012 Some time ago I have suggested to add support to vector

Sean Cavanaugh (38/57) Aug 23 2012 Well, right now the binary operators == != >= <= > and < are required to...

bearophile (13/17) Aug 23 2012 The hypothetical D sugar I was looking for is this, where 'a',

Don Clugston (13/29) Aug 24 2012 It's just syntax sugar for a very obscure operation, and it's somewhat

bearophile (20/25) Aug 24 2012 It's an operation included in Cilk Plus, I think Intel devs know

Don Clugston (5/21) Aug 27 2012 Wow. Then I misunderstood, and it's even less appropriate for D than I

Andrei Alexandrescu (3/4) Aug 27 2012 Beware of wraparound :o).

Peter Alexander (5/11) Aug 27 2012 How could this possibly be useful? It's like the loop, but you

bearophile (21/25) Aug 27 2012 I think this feature is a part of Cilk+. Maybe I have not fully

Peter Alexander (9/14) Aug 27 2012 Ok, I can see the use of this, but I find the syntax *very*

tn (9/28) Aug 24 2012 The proposed syntax looks weird. Wouldn't the followind be more

"bearophile" <bearophileHUGS lycos.com> writes:

Some time ago I have suggested to add support to vector 
comparisons in D, because this is sometimes useful and in the 
modern SIMD units there is hardware support for such operations:


double[] a = [1.0, 1.0, -1.0, 1.0, 0.0, -1.0];
bool[] t = a[] > 0;
assert(t == [true, true, false, true, false, false]);

Usable instructions like (here shows the intrinsic):
http://msdn.microsoft.com/en-us/library/11dy102s%28v=vs.80%29.aspx


Now on Reddit I have found a small discussion about a slides pack 
by Intel:
http://www.reddit.com/r/programming/comments/ym8m6/parallel_programming_for_c_and_c_done_right/

The slides:
https://speakerdeck.com/u/multicoreworld/p/james-reinders-intel-united-states

Link to the PDF:
https://speakerd.s3.amazonaws.com/presentations/5006069136af010002005325/Reinders_KEYNOTE_C_done_right.pdf

At page 69 of those slides there is some code that looks 
interesting, I think this is a reduced version of part of it, 
that shows another way to use vectorized comparisons:


void main() {
     double[] a = [1.0, 1.0, -1.0, 1.0, 0.0, -1.0];
     double[] b = [10,   20,   30,  40,  50,   60];
     double[] c = [1,     2,    3,   4,   5,    6];
     if (a[] > 0)
         b[] += c[];
}


I think that code is semantically equivalent to:

void main() {
     double[] a = [1.0, 1.0, -1.0, 1.0, 0.0, -1.0];
     double[] b = [10,   20,   30,  40,  50,   60];
     double[] c = [1,     2,    3,   4,   5,    6];
     foreach (i; 0 .. a.length)
         if (a[i] > 0)
             b[i] += c[i];
}


After that code b is:
[11, 22, 30, 44, 50, 60]


This means the contents of the 'then' branch of the vectorized 
comparison is done only on items of b and c where the comparison 
has given true.

This looks useful. Is it possible to implement this in D, and do 
you like it?

Bye,
bearophile

Aug 22 2012

Sean Cavanaugh <WorksOnMyMachine gmail.com> writes:

On 8/22/2012 7:19 PM, bearophile wrote:
 Some time ago I have suggested to add support to vector comparisons in
 D, because this is sometimes useful and in the modern SIMD units there
 is hardware support for such operations:


 I think that code is semantically equivalent to:

 void main() {
      double[] a = [1.0, 1.0, -1.0, 1.0, 0.0, -1.0];
      double[] b = [10,   20,   30,  40,  50,   60];
      double[] c = [1,     2,    3,   4,   5,    6];
      foreach (i; 0 .. a.length)
          if (a[i] > 0)
              b[i] += c[i];
 }


 After that code b is:
 [11, 22, 30, 44, 50, 60]


 This means the contents of the 'then' branch of the vectorized
 comparison is done only on items of b and c where the comparison has
 given true.

 This looks useful. Is it possible to implement this in D, and do you
 like it?

Well, right now the binary operators == != >= <= > and < are required to 
return bool instead of allowing a user defined type, which prevents a 
lot of the sugar you would want to make the code nice to write.  Without 
the sugar the code would ends up this:

foreach(i; 0 .. a.length)
{
     float4 mask = greaterThan(a[i], float4(0,0,0,0));
     b[i] = select(mask, b[i] + c[i], b[i]);
}

in GPU shader land this expression is at least simpler to write:

foreach(i; 0 .. a.length)
{
     b[i] = (b[i] > 0) ? (b[i] + c[i]) : b[i];
}


All of these implementations are equivalent and remove the branch from 
the code flow, which is pretty nice for the CPU pipeline.   In SIMD the 
comparisons generate masks into a register which you can immediately 
use.  On modern (SSE4) CPUs the select is a single instruction, on older 
ones it takes three: (mask & A) | (~mask & B), but its all better than a 
real branch.

If you have a large amount of code needing a branch, you can take the 
mask generated by the compare, and extract it into a CPU register, and 
compare it for 0, nonzero, specific or any bits set.  a float4 
comparison ends up generating 4 bits, so the code with a real branch is 
like:

if (any(a[i] > 0))
{
     // do stuff if any of a[i] are greater than zero
}	
if (all(a[i] > 0))
{
     // do stuff if all of a[i] are greater than zero
}
if ((getMask(a[i] > 0) & 0x7) == 0x7)
{
     // do stuff if the first three elements are greater than zero
}

Aug 23 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Sean Cavanaugh:

 Well, right now the binary operators == != >= <= > and < are 
 required to return bool instead of allowing a user defined 
 type, which prevents a lot of the sugar you would want to make 
 the code nice to write.

The hypothetical D sugar I was looking for is this, where 'a', 
'b' and 'c' are normal dynamic arrays of doubles (not of float[4] 
of double[2]) (currently this code is a syntax error):

if (a[] > 0)
     b[] += c[];


The front-end is able to implement those two lines of code as it 
likes, like seeing those normal arrays as arrays of double[2] (or 
double[4] on more modern CPUs) and put there all the needed 
intrinsics or assembly needed to implement that semantics.

So what's the problem the > operator causes in this code?

Bye,
bearophile

Aug 23 2012

Don Clugston <dac nospam.com> writes:

On 24/08/12 00:13, bearophile wrote:
 Sean Cavanaugh:

 Well, right now the binary operators == != >= <= > and < are required
 to return bool instead of allowing a user defined type, which prevents
 a lot of the sugar you would want to make the code nice to write.

 The hypothetical D sugar I was looking for is this, where 'a', 'b' and
 'c' are normal dynamic arrays of doubles (not of float[4] of double[2])
 (currently this code is a syntax error):

 if (a[] > 0)
      b[] += c[];


 The front-end is able to implement those two lines of code as it likes,
 like seeing those normal arrays as arrays of double[2] (or double[4] on
 more modern CPUs) and put there all the needed intrinsics or assembly
 needed to implement that semantics.

 So what's the problem the > operator causes in this code?

 Bye,
 bearophile

It's just syntax sugar for a very obscure operation, and it's somewhat 
ambiguous -- is it allowed to use short-circuit evaluation?
Mathematically, it doesn't make sense. You can compare scalars, but 
ordered comparison of vectors is a bit nonsensical, unless it is 
element-wise.
Usually, a[] > 0, a[] < 0, and a[] == 0 will all be false.

Most likely, you really meant  dot(a[]) > 0.

Something like

if ( all( a[] > 0 ) )
     b[] += c[];

is more reasonable. But an implicit 'reduce' in a vector operation has 
little to commend it, I think.

Aug 24 2012

"bearophile" <bearophileHUGS lycos.com> writes:

It's just syntax sugar for a very obscure operation,<

It's an operation included in Cilk Plus, I think Intel devs know
enough what they are doing.
And I think code like this is a common need:

if (a[] > 0) {
      // ...
}


and it's somewhat ambiguous -- is it allowed to use 
short-circuit evaluation?<

That code means:

foreach (i; 0 .. a.length) {
      if (a[i] > 0) {
          // ...
      }
}

Here I don't see problems caused by short circuit evaluation.


You can compare scalars, but ordered comparison of vectors is a 
bit nonsensical, unless it is element-wise.<

It's a comparison element-wise between each item and a constant,
it's similar to:

a[] = a[] + 5;

Probably you have misunderstood the semantics of what I am
discussing.

Bye,
bearophile

Aug 24 2012

Don Clugston <dac nospam.com> writes:

On 24/08/12 15:57, bearophile wrote:
 It's just syntax sugar for a very obscure operation,<

 It's an operation included in Cilk Plus, I think Intel devs know
 enough what they are doing.
 And I think code like this is a common need:

 if (a[] > 0) {
       // ...
 }
 and it's somewhat ambiguous -- is it allowed to use short-circuit
 evaluation?<

 That code means:

 foreach (i; 0 .. a.length) {
       if (a[i] > 0) {
           // ...
       }
 }

 Here I don't see problems caused by short circuit evaluation.

Wow. Then I misunderstood, and it's even less appropriate for D than I 
thought.

vote -= int.max;

Worst syntax I've seen in a very long time.

Aug 27 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 8/27/12 8:06 AM, Don Clugston wrote:
 vote -= int.max;

Beware of wraparound :o).

Andrei

Aug 27 2012

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Friday, 24 August 2012 at 13:57:42 UTC, bearophile wrote:
 That code means:

 foreach (i; 0 .. a.length) {
      if (a[i] > 0) {
          // ...
      }
 }

How could this possibly be useful? It's like the loop, but you 
lose the index variable. I can't see how you could possibly do 
anything with this.

Can you show an example of some code that uses this?

Aug 27 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Peter Alexander:

 How could this possibly be useful? It's like the loop, but you 
 lose the index variable. I can't see how you could possibly do 
 anything with this.

I think this feature is a part of Cilk+. Maybe I have not fully 
understood this feature, or maybe some Intel developers are mad 
:-)

I think in code like this:

if (a[] >= 0)
     b[] += c[];

The 'b' and 'c' arrays receive the implicit index of the items of 
'a' that aren't negative.

I think its semantics is a bit like this (assuming zip() supports 
ref iteration), you see no index variable here:

parallel_foreach (ai, ref bi, ci; zip(a, b, c))
     if (ai > 0)
         bi += ci;


I think it's something commonly useful.


 Can you show an example of some code that uses this?

On the net I have found two examples of Cilk+ code that uses that 
feature. One of them is referenced in the first email of this 
thread. With Google you can find the full code from Intel that 
piece of code comes from.

Bye,
bearophile

Aug 27 2012

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Monday, 27 August 2012 at 20:29:29 UTC, bearophile wrote:
 I think in code like this:

 if (a[] >= 0)
     b[] += c[];

 The 'b' and 'c' arrays receive the implicit index of the items 
 of 'a' that aren't negative.

Ok, I can see the use of this, but I find the syntax *very* 
confusing. Expressions shouldn't be able to mess with code 
semantics like that.

If you wanted to do something like that, I could live with this 
syntax:

b[] += (a[] >= 0 ? c[] : 0);

The a[] >= 0 returns a vector of booleans, and then the ternary 
operator acts element-wise with those booleans as the condition.

Aug 27 2012

"tn" <no email.com> writes:

On Thursday, 23 August 2012 at 00:19:39 UTC, bearophile wrote:
 At page 69 of those slides there is some code that looks 
 interesting, I think this is a reduced version of part of it, 
 that shows another way to use vectorized comparisons:


 void main() {
     double[] a = [1.0, 1.0, -1.0, 1.0, 0.0, -1.0];
     double[] b = [10,   20,   30,  40,  50,   60];
     double[] c = [1,     2,    3,   4,   5,    6];
     if (a[] > 0)
         b[] += c[];
 }


 I think that code is semantically equivalent to:

 void main() {
     double[] a = [1.0, 1.0, -1.0, 1.0, 0.0, -1.0];
     double[] b = [10,   20,   30,  40,  50,   60];
     double[] c = [1,     2,    3,   4,   5,    6];
     foreach (i; 0 .. a.length)
         if (a[i] > 0)
             b[i] += c[i];
 }

The proposed syntax looks weird. Wouldn't the followind be more 
intuitive:

foreach (i; a[i] > 0)
     b[i] += c[i];

Or alternatively it would be nice to be able to do it like in 
Matlab:

i = (a[] > 0);
b[i] += c[i];

Aug 24 2012

D Programming

C/C++ Programming

Other

digitalmars.D - More on vectorized comparisons