www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Vectorization vs. sql

reply =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
I think vectorization is very similar to sql.

Take matrix multiplication in sql.


select a.index1,b.index2, sum(a.value*b.value) into c (index1,index2,value) 
from a,b where a.index2=b.index1 group by a.index1,b.index2;


So, maybe we can make a similar notation for vectorization in D.

Something like: 
vectorize sum(a[i,j]*b[j,k]) into c[i,k] with i=1..l,k=1..n over j=1..m;

Just an idea.

Knud
Feb 16 2005
next sibling parent reply "Charlie Patterson" <charliep1 excite.com> writes:
"Knud Sørensen" <12tkvvb02 sneakemail.com> wrote in message 
news:pan.2005.02.16.17.09.36.655981 sneakemail.com...
I think vectorization is very similar to sql.

Sorry to be the first rain on the parade, but lots of people despise SQL (and lots admire it). It is usually considered to have "impedence mismatch" with imperative languages and that's how I tend to feel too. It's hard to map the mind back and forth and the setting up of variable, etc are done differently. The question would be, does another syntax to work "through" and array buy enough to justify itself? Or is it easy enough to write one or a couple of for loops when the occasional need rises?
Feb 16 2005
parent reply =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
On Wed, 16 Feb 2005 13:51:00 -0500, Charlie Patterson wrote:

 
 "Knud Sørensen" <12tkvvb02 sneakemail.com> wrote in message 
 news:pan.2005.02.16.17.09.36.655981 sneakemail.com...
I think vectorization is very similar to sql.

Sorry to be the first rain on the parade, but lots of people despise SQL (and lots admire it). It is usually considered to have "impedence mismatch" with imperative languages and that's how I tend to feel too. It's hard to map the mind back and forth and the setting up of variable, etc are done differently. The question would be, does another syntax to work "through" and array buy enough to justify itself? Or is it easy enough to write one or a couple of for loops when the occasional need rises?

So, far none have posted a general syntax for vectorization and I would like to play with this syntax to see where it leads. I invites you to play with me so we might learn from each other ;-) If you think vectorization is just for loops you might read up on the subject. http://www.google.com/custom?domains=www.digitalmars.com&q=vectorization&sa=Search&sitesearch=www.digitalmars.com
Feb 16 2005
parent reply "Charlie Patterson" <charliep1 excite.com> writes:
"Knud Sørensen" <12tkvvb02 sneakemail.com> wrote in message
 Sorry to be the first rain on the parade, but lots of people despise SQL
 (and lots admire it).  It is usually considered to have "impedence 
 mismatch"
 with imperative languages and that's how I tend to feel too.
  It's hard to
 map the mind back and forth and the setting up of variable, etc are done
 differently.  The question would be, does another syntax to work 
 "through"
 and array buy enough to justify itself?
 Or is it easy enough to write one
 or a couple of for loops when the occasional need rises?

So, far none have posted a general syntax for vectorization and I would like to play with this syntax to see where it leads. I invites you to play with me so we might learn from each other ;-)

Thanks and I will follow along and help if I can. But my point still stands that any relational-looking syntax will have that "impedance mismatch." My understanding of D is that it cleans up where Java is too limited and C++ is too complicated. I think vectorization is "out there" and not used often enough to warrant scaring some potential new users looking for an upgrade from Java. IMHO. Which is not to say I don't want any new concepts. I think a good regular expressions engine would be used in every non-class-assignment program and is worthy of debating syntax. (Maybe completely forgetting about the grep/awk/perl legacy.) Again, I just think vectorization is an advanced concept that won't be used enough to warrant all that effort grafting it into D. Sorry.
Feb 17 2005
parent Dave <Dave_member pathlink.com> writes:
In article <cv2fha$b1q$1 digitaldaemon.com>, Charlie Patterson says...
"Knud Sørensen" <12tkvvb02 sneakemail.com> wrote in message
 Sorry to be the first rain on the parade, but lots of people despise SQL
 (and lots admire it).  It is usually considered to have "impedence 
 mismatch"
 with imperative languages and that's how I tend to feel too.
  It's hard to
 map the mind back and forth and the setting up of variable, etc are done
 differently.  The question would be, does another syntax to work 
 "through"
 and array buy enough to justify itself?
 Or is it easy enough to write one
 or a couple of for loops when the occasional need rises?

So, far none have posted a general syntax for vectorization and I would like to play with this syntax to see where it leads. I invites you to play with me so we might learn from each other ;-)

Thanks and I will follow along and help if I can. But my point still stands that any relational-looking syntax will have that "impedance mismatch."

Agreed here, but..
My understanding of D is that it cleans up where Java is too limited and C++ 
is too complicated.  I think vectorization is "out there" and not used often 
enough to warrant scaring some potential new users looking for an upgrade 
from Java.  IMHO.

. I'm not so sure here. Totally aside from the performance implications of the compiler vectorizing, what this discussion also implies is some sort of abbreviated syntax for common array operations. IMHO, abbreviated array operations would probably be a larger productivity enhancement to D and, if done right, not nearly as scary as learning the regex syntax for new users ;) That is not to say that the regex ideas are not great - they are - and would also be very welcome by me. - Dave
Which is not to say I don't want any new concepts.  I think a good regular 
expressions engine would be used in every non-class-assignment program and 
is worthy of debating syntax.  (Maybe completely forgetting about the 
grep/awk/perl legacy.)  Again, I just think vectorization is an advanced 
concept that won't be used enough to warrant all that effort grafting it 
into D.  Sorry.

Feb 17 2005
prev sibling next sibling parent reply "Craig Black" <cblack ara.com> writes:
"Knud Sørensen" <12tkvvb02 sneakemail.com> wrote in message 
news:pan.2005.02.16.17.09.36.655981 sneakemail.com...
I think vectorization is very similar to sql.

 Take matrix multiplication in sql.


 select a.index1,b.index2, sum(a.value*b.value) into c 
 (index1,index2,value)
 from a,b where a.index2=b.index1 group by a.index1,b.index2;


 So, maybe we can make a similar notation for vectorization in D.

 Something like:
 vectorize sum(a[i,j]*b[j,k]) into c[i,k] with i=1..l,k=1..n over j=1..m;

 Just an idea.

 Knud

Why not just have a vectorize keyword that will give a hint to the compiler to optimize a block of code? Then just use regular D loops. for(int i = 1; i < l; i++) for(int j = 1; j < m; j++) for(int k = 1; k < n; k++) { vectorize { c[i, k] = a[i,j] * b[j, k]; } }
Feb 16 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Craig Black schrieb:
 "Knud S=F8rensen" <12tkvvb02 sneakemail.com> wrote in message=20
 news:pan.2005.02.16.17.09.36.655981 sneakemail.com...
=20
I think vectorization is very similar to sql.

Take matrix multiplication in sql.


select a.index1,b.index2, sum(a.value*b.value) into c=20
(index1,index2,value)
from a,b where a.index2=3Db.index1 group by a.index1,b.index2;


So, maybe we can make a similar notation for vectorization in D.

Something like:
vectorize sum(a[i,j]*b[j,k]) into c[i,k] with i=3D1..l,k=3D1..n over j=3D=


Just an idea.

Knud

=20 Why not just have a vectorize keyword that will give a hint to the comp=

 to optimize a block of code?  Then just use regular D loops.
=20
 for(int i =3D 1; i < l; i++)
 for(int j =3D 1; j < m; j++)
 for(int k =3D 1; k < n; k++)
 {
    vectorize { c[i, k] =3D a[i,j] * b[j, k]; }
 }=20

I don't like that idea too much: first, you say "do in that order", then = you say "but ignore the order". The for-loop simply is far too flexible to combine it with=20 vectorization. What is needed is a new kind of a loop that does not=20 imply an order in the first place. (See my suggestion in the other post)
Feb 16 2005
parent reply Georg Wrede <georg.wrede nospam.org> writes:
 Why not just have a vectorize keyword that will give a hint to the 
 compiler to optimize a block of code?  Then just use regular D loops.

 for(int i = 1; i < l; i++)
 for(int j = 1; j < m; j++)
 for(int k = 1; k < n; k++)
 {
    vectorize { c[i, k] = a[i,j] * b[j, k]; }
 } 

I don't like that idea too much: first, you say "do in that order", then you say "but ignore the order". The for-loop simply is far too flexible to combine it with vectorization. What is needed is a new kind of a loop that does not imply an order in the first place.

Why not
 vectorize(int i; minVi; maxVi) {
 vectorize(int j; minVj; maxVj) {
 vectorize(int k; minVk; maxVk)
 {
    c[i, k] = a[i,j] * b[j, k];
 } }}


Feb 17 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Georg Wrede schrieb:
 Why not
 
  >> vectorize(int i; minVi; maxVi) {
  >> vectorize(int j; minVj; maxVj) {
  >> vectorize(int k; minVk; maxVk)
  >> {
  >>    c[i, k] = a[i,j] * b[j, k];
  >> } }}

Because this means that you can only vectorize statements and not expressions. This again means: * you cannot express vectorized returns * you cannot mix vectorized operations with other array operations * you cannot feed the result of a vectorized operation into a function without storing it into a temporary variable Furthermore, this syntax still seems to imply that the 'i' loop is the outermost loop. Especially loop reordering is an important tool for optimization which could be done by an advanced optimizing compiler. (Instead of having a syntax that implies an order and then saying "But the compiler is allowed to change the order", I would strongly prefer a syntax that differs substantially from sequential statements.) By the way: be aware that I only consider *vectorization*, i.e. command level parallelization. Parallelizing whole blocks of code is a completely different story and should be handled separately. That kind of parallelization needs different strategies in the compiler and aims for a mostly distinct audience.
Feb 17 2005
parent reply Georg Wrede <georg.wrede nospam.org> writes:
Norbert Nemec wrote:
 Georg Wrede schrieb:
 
 Why not

  >> vectorize(int i; minVi; maxVi) {
  >> vectorize(int j; minVj; maxVj) {
  >> vectorize(int k; minVk; maxVk)
  >> {
  >>    c[i, k] = a[i,j] * b[j, k];
  >> } }}

Because this means that you can only vectorize statements and not expressions. This again means: * you cannot express vectorized returns * you cannot mix vectorized operations with other array operations * you cannot feed the result of a vectorized operation into a function without storing it into a temporary variable Furthermore, this syntax still seems to imply that the 'i' loop is the outermost loop. Especially loop reordering is an important tool for optimization which could be done by an advanced optimizing compiler.

Unerstood.
 (Instead of having a syntax that implies an order and then saying "But 
 the compiler is allowed to change the order", I would strongly prefer a 
 syntax that differs substantially from sequential statements.)

Bear with me, on this area i a total noob... So, immediately above you say "sequential statements". Does that include the c[i, k] = a[i,j] * b[j, k]; above? (I like to understand thoroughly before I can think clearly. ;-) My dream is that we can come up with something that "vector folks" would love, but that also does not look too scary or intractable to programmers used to "the C notation family", including D.
 By the way: be aware that I only consider *vectorization*, i.e. command 
 level parallelization. Parallelizing whole blocks of code is a 
 completely different story and should be handled separately. That kind 
 of parallelization needs different strategies in the compiler and aims 
 for a mostly distinct audience.

Feb 17 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Georg Wrede schrieb:
 (Instead of having a syntax that implies an order and then saying "But 
 the compiler is allowed to change the order", I would strongly prefer 
 a syntax that differs substantially from sequential statements.)

So, immediately above you say "sequential statements". Does that include the c[i, k] = a[i,j] * b[j, k]; above?

In a way, yes. What I mean is: C is a strictly sequential language. The programmer is used to the fact that commands are executed in order. (This is, what "sequential" means) If we want to extend this by constructions that break that strict sequentiality, the syntax should be sufficiently distinct.
 My dream is that we can come up with something that "vector folks"
 would love, but that also does not look too scary or intractable
 to programmers used to "the C notation family", including D.

True. This is my dream as well. Of course, vectorized expressions will never be the first thing to learn for a D newby. They are an advanced concept and people will be able to use D for years without touching it. Anyhow, it should of course fit in as smooth as possible. As I see it, people can go forth step by step: 1) use D just like C 2) start using arrays 3) start using array expressions (simple to understand, but neither fully flexible nor extendable) 4) start using vectorized expressions (more advanced, fully mixable with array expressions. To illustrate the last point, it should be clear that array expressions int[] a,b,c; [...] c = a+b; can always be replaced by their equivalent vectorized expressions int[] a,b,c; [...] assert(a.length == b.length) c = [i in 0..a.length](a[i]+b[i]);
Feb 18 2005
next sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Norbert Nemec wrote:
 Georg Wrede schrieb:
 
 My dream is that we can come up with something that "vector folks"
 would love, but that also does not look too scary or intractable
 to programmers used to "the C notation family", including D.


 To illustrate the last point, it should be clear that array expressions
     int[] a,b,c;
     [...]
     c = a+b;
 can always be replaced by their equivalent vectorized expressions
     int[] a,b,c;
     [...]
     assert(a.length == b.length)
     c = [i in 0..a.length](a[i]+b[i]);

That looks quite nice! Does that work the same with complicated expressions, with many indices? Does it still look as nice?
Feb 19 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Georg Wrede schrieb:
 Norbert Nemec wrote:
 
 Georg Wrede schrieb:

 My dream is that we can come up with something that "vector folks"
 would love, but that also does not look too scary or intractable
 to programmers used to "the C notation family", including D.


....
 To illustrate the last point, it should be clear that array expressions
     int[] a,b,c;
     [...]
     c = a+b;
 can always be replaced by their equivalent vectorized expressions
     int[] a,b,c;
     [...]
     assert(a.length == b.length)
     c = [i in 0..a.length](a[i]+b[i]);

That looks quite nice! Does that work the same with complicated expressions, with many indices? Does it still look as nice?

I think, the syntax should scale up rather directly. Of course, complicated expressions will never loop simple, but the syntax also should not get into the way. To restate the example I gave before - matrix multiplication: double[,] a,b,c; [...initialization of a and b...] assert(a.length[1] == b.length[0]) c = [i in 0..a.length[0], k in 0..b.length[1]] (sum( [j in 0..a.length[1]] (a[i,j]*b[j,k]) )); Be aware, that vectorized expressions give rather lengthy code. Everything has to be given explicitely. Of course, one could think of thousand ways of shorthand notations that do stuff implicitely. However, anything that happens implicitely is bound to get in the way some day when you want to do something unusual. As a short-hand solution, there are (index-free) array expressions. Vectorized expressions on the other hand are meant only for cases where you need full flexibility.
Feb 20 2005
parent Georg Wrede <georg.wrede nospam.org> writes:
Norbert Nemec wrote:
 Georg Wrede schrieb:
 
 Norbert Nemec wrote:

 Georg Wrede schrieb:

 My dream is that we can come up with something that "vector folks"
 would love, but that also does not look too scary or intractable
 to programmers used to "the C notation family", including D.


....
 To illustrate the last point, it should be clear that array expressions
     int[] a,b,c;
     [...]
     c = a+b;
 can always be replaced by their equivalent vectorized expressions
     int[] a,b,c;
     [...]
     assert(a.length == b.length)
     c = [i in 0..a.length](a[i]+b[i]);

That looks quite nice! Does that work the same with complicated expressions, with many indices? Does it still look as nice?

I think, the syntax should scale up rather directly. Of course, complicated expressions will never loop simple, but the syntax also should not get into the way. To restate the example I gave before - matrix multiplication: double[,] a,b,c; [...initialization of a and b...] assert(a.length[1] == b.length[0]) c = [i in 0..a.length[0], k in 0..b.length[1]] (sum( [j in 0..a.length[1]] (a[i,j]*b[j,k]) ));

Excellent. This looks quite readable, and doesn't seem to demand "vectorizing knowledge" from the casual reader. This is important in projects where only some people are "vector people".
 Be aware, that vectorized expressions give rather lengthy code. 
 Everything has to be given explicitely. 

I see no problem with that. In a "C family" language it is not unusual at all to find quite explicit passages in source code.
 Of course, one could think of 
 thousand ways of shorthand notations that do stuff implicitely. However, 
 anything that happens implicitely is bound to get in the way some day 
 when you want to do something unusual. 

True! And the uninitiated would have a hard time understanding the code.
Feb 20 2005
prev sibling next sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <cv5jbl$ihb$1 digitaldaemon.com>, Norbert Nemec says...
Georg Wrede schrieb:
 (Instead of having a syntax that implies an order and then saying "But 
 the compiler is allowed to change the order", I would strongly prefer 
 a syntax that differs substantially from sequential statements.)

So, immediately above you say "sequential statements". Does that include the c[i, k] = a[i,j] * b[j, k]; above?

In a way, yes. What I mean is: C is a strictly sequential language. The programmer is used to the fact that commands are executed in order. (This is, what "sequential" means) If we want to extend this by constructions that break that strict sequentiality, the syntax should be sufficiently distinct.
 My dream is that we can come up with something that "vector folks"
 would love, but that also does not look too scary or intractable
 to programmers used to "the C notation family", including D.

True. This is my dream as well. Of course, vectorized expressions will never be the first thing to learn for a D newby. They are an advanced concept and people will be able to use D for years without touching it. Anyhow, it should of course fit in as smooth as possible. As I see it, people can go forth step by step: 1) use D just like C 2) start using arrays 3) start using array expressions (simple to understand, but neither fully flexible nor extendable) 4) start using vectorized expressions (more advanced, fully mixable with array expressions. To illustrate the last point, it should be clear that array expressions int[] a,b,c; [...] c = a+b; can always be replaced by their equivalent vectorized expressions int[] a,b,c; [...] assert(a.length == b.length) c = [i in 0..a.length](a[i]+b[i]);

Just curious - in the examples above (and for the rest of what you have in mind Norbert), using array expressions wouldn't neccessarily preclude any of the compiler optimizations that are available for the equivalent vectorized expression, would it? Or will there need to be an implied order of operations with what you have in mind for array expressions? Thanks, - Dave
Feb 21 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Dave schrieb:
 In article <cv5jbl$ihb$1 digitaldaemon.com>, Norbert Nemec says...
 
Georg Wrede schrieb:

(Instead of having a syntax that implies an order and then saying "But 
the compiler is allowed to change the order", I would strongly prefer 
a syntax that differs substantially from sequential statements.)

So, immediately above you say "sequential statements". Does that include the c[i, k] = a[i,j] * b[j, k]; above?

In a way, yes. What I mean is: C is a strictly sequential language. The programmer is used to the fact that commands are executed in order. (This is, what "sequential" means) If we want to extend this by constructions that break that strict sequentiality, the syntax should be sufficiently distinct.
My dream is that we can come up with something that "vector folks"
would love, but that also does not look too scary or intractable
to programmers used to "the C notation family", including D.

True. This is my dream as well. Of course, vectorized expressions will never be the first thing to learn for a D newby. They are an advanced concept and people will be able to use D for years without touching it. Anyhow, it should of course fit in as smooth as possible. As I see it, people can go forth step by step: 1) use D just like C 2) start using arrays 3) start using array expressions (simple to understand, but neither fully flexible nor extendable) 4) start using vectorized expressions (more advanced, fully mixable with array expressions. To illustrate the last point, it should be clear that array expressions int[] a,b,c; [...] c = a+b; can always be replaced by their equivalent vectorized expressions int[] a,b,c; [...] assert(a.length == b.length) c = [i in 0..a.length](a[i]+b[i]);

Just curious - in the examples above (and for the rest of what you have in mind Norbert), using array expressions wouldn't neccessarily preclude any of the compiler optimizations that are available for the equivalent vectorized expression, would it?

Array expressions are basically just a shorthand for vectorized expressions. They are simpler to type and to read, but they work only for the simple cases. Apart from details like the range checking, there is no internal difference between the two examples above. The compiler should be able to produce identical code from both. (Actually, I think, a compiler would even use the same internal representation for both.) In general: if array operations are flexible enough for the problem you want to solve, fine for you. Vectorized expressions are really meant only for the trickier cases.
 Or will there need to be an implied order of operations with what you
 have in mind for array expressions?

I'm not sure that I understand your question. For array operations, it is obvious, that the order is irrelevant: c = a+b; might be executed like for(int i=0;i<a.length;i++) c[i] = a[i]+b[i]; or for(int i=a.length-1;i>=0;i--) c[i] = a[i]+b[i]; or in any other wild order. For vectorized expressions, it is not quite as obvious, since you cannot prohibit side-effects. You might write something like [i in 1..m](a[i] = 0.25*(a[i-1]+2*a[i]+a[i+1])) and suddenly, the result depends on the order in which the vectorized expression is calculated. Such expressions are *errors*! The whole point of vectorized expressions is that the order is not defined so that the compiler is free to choose the most efficient one. The compiler may not be able to detect errors like the one above. A good compiler will complain. A less intelligent one might produce unpredictable code. (A moderately smart one might at least warn you, if it cannot outrule the possibility of a order-dependency.)
Feb 21 2005
parent Dave <Dave_member pathlink.com> writes:
In article <cvdjjc$pl6$1 digitaldaemon.com>, Norbert Nemec says...
Dave schrieb:
 In article <cv5jbl$ihb$1 digitaldaemon.com>, Norbert Nemec says...
 
Georg Wrede schrieb:

(Instead of having a syntax that implies an order and then saying "But 
the compiler is allowed to change the order", I would strongly prefer 
a syntax that differs substantially from sequential statements.)

So, immediately above you say "sequential statements". Does that include the c[i, k] = a[i,j] * b[j, k]; above?

In a way, yes. What I mean is: C is a strictly sequential language. The programmer is used to the fact that commands are executed in order. (This is, what "sequential" means) If we want to extend this by constructions that break that strict sequentiality, the syntax should be sufficiently distinct.
My dream is that we can come up with something that "vector folks"
would love, but that also does not look too scary or intractable
to programmers used to "the C notation family", including D.

True. This is my dream as well. Of course, vectorized expressions will never be the first thing to learn for a D newby. They are an advanced concept and people will be able to use D for years without touching it. Anyhow, it should of course fit in as smooth as possible. As I see it, people can go forth step by step: 1) use D just like C 2) start using arrays 3) start using array expressions (simple to understand, but neither fully flexible nor extendable) 4) start using vectorized expressions (more advanced, fully mixable with array expressions. To illustrate the last point, it should be clear that array expressions int[] a,b,c; [...] c = a+b; can always be replaced by their equivalent vectorized expressions int[] a,b,c; [...] assert(a.length == b.length) c = [i in 0..a.length](a[i]+b[i]);

Just curious - in the examples above (and for the rest of what you have in mind Norbert), using array expressions wouldn't neccessarily preclude any of the compiler optimizations that are available for the equivalent vectorized expression, would it?

Array expressions are basically just a shorthand for vectorized expressions. They are simpler to type and to read, but they work only for the simple cases. Apart from details like the range checking, there is no internal difference between the two examples above. The compiler should be able to produce identical code from both. (Actually, I think, a compiler would even use the same internal representation for both.) In general: if array operations are flexible enough for the problem you want to solve, fine for you. Vectorized expressions are really meant only for the trickier cases.
 Or will there need to be an implied order of operations with what you
 have in mind for array expressions?

I'm not sure that I understand your question. For array operations, it is obvious, that the order is irrelevant: c = a+b; might be executed like for(int i=0;i<a.length;i++) c[i] = a[i]+b[i]; or for(int i=a.length-1;i>=0;i--) c[i] = a[i]+b[i]; or in any other wild order. For vectorized expressions, it is not quite as obvious, since you cannot prohibit side-effects. You might write something like [i in 1..m](a[i] = 0.25*(a[i-1]+2*a[i]+a[i+1])) and suddenly, the result depends on the order in which the vectorized expression is calculated. Such expressions are *errors*! The whole point of vectorized expressions is that the order is not defined so that the compiler is free to choose the most efficient one. The compiler may not be able to detect errors like the one above. A good compiler will complain. A less intelligent one might produce unpredictable code. (A moderately smart one might at least warn you, if it cannot outrule the possibility of a order-dependency.)

Right - just checking my understanding of what is being discussed. Wrt to the question on the order of array expression operations, I was just checking that there would not (for some reason) be an implicit order on those as they are being discussed in this thread. Thanks, - Dave
Feb 21 2005
prev sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Norbert Nemec wrote:
 Georg Wrede schrieb:
 
...

2) start using arrays 3) start using array expressions (simple to understand, but neither fully flexible nor extendable) 4) start using vectorized expressions (more advanced, fully mixable with array expressions. To illustrate the last point, it should be clear that array expressions int[] a,b,c; [...] c = a+b; can always be replaced by their equivalent vectorized expressions int[] a,b,c; [...] assert(a.length == b.length) c = [i in 0..a.length](a[i]+b[i]);

Are these arrays or matricies? How does one interpret: int[] a,b,c; [...] c = a*b; What about: c = a/b; Is there a dot-product as well as a cross-product? (Here, I guess, I'm assuming that we're talking about matricies.) Arrays are useful for bunches of things. Matricies are useful for bunches of things. They tend to be different bunches of things. And except at the very lowest level the operations tend to be quite different. Is there a reasonable way to handle BOTH?
Feb 23 2005
parent Norbert Nemec <Norbert Nemec-online.de> writes:
Charles Hixson schrieb:
 Are these arrays or matricies?

Good question. I know, in a numerics language, both should be present. However, they are mostly distinct concepts and should therefore be kept cleanly distinct. So far, all my comments have revolved around arrays. Array operations are elementwise operations which are relatively easy to handle exhaustively in the language specs. IMO, matrix semantics should better be encapsulated into a Matrix library. Of course, this class would probably be a wrapper around arrays. It might, though, also feature a selection of storage models that interact transparently. The simple solution of offering two different kinds of multiplication for the same objects (Like Matlab does it) does not appeal to me too much. Basically, it is a matter of the object how it is multiplied. For an array, matrix multiplication does not make sense. In the same way, matrices are never multiplied elementwise. It is hard to find clean arguments for this, but when I think about it, it just seems natural to me that way.
Feb 23 2005
prev sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Knud S=F8rensen schrieb:
 I think vectorization is very similar to sql.
=20
 Take matrix multiplication in sql.
=20
=20
 select a.index1,b.index2, sum(a.value*b.value) into c (index1,index2,va=

 from a,b where a.index2=3Db.index1 group by a.index1,b.index2;
=20
=20
 So, maybe we can make a similar notation for vectorization in D.
=20
 Something like:=20
 vectorize sum(a[i,j]*b[j,k]) into c[i,k] with i=3D1..l,k=3D1..n over j=3D=

I guess, most people in this group will not like that SQL-like syntax.=20 Anyhow, the idea is very similar to what I had in mind already for=20 vectorized expressions: c =3D [i in 0..l,k in 0..n](sum([j in 0..m](a[i,j]*b[j,k]))) (The exact syntax is still disputable.) The idea is to make [x in a..b](some_expr(x)) a vectorized expression, i.e. an expression that returns an array, where = the entries of the array are evaluated without any specific order any=20 only if needed. I.e. ([i in 0..5](a[i]*b[i]))[2] would be equivalent to a[2]*b[2] or, a bit trickier: ([i in 2..5](a[i]*b[i]))[2] =3D=3D a[4]*b[4] Main difference to the SQL-like 'vectorize'-statement is, that this is=20 not restrictred to complete statements, even though, it does work for=20 assignments just as well, of course: [i in 0..m](a[i] =3D b[i+1]+c); Furthermore, it should work across functions as well: int[] add(int[] a, int[] b) { assert(a.length =3D=3D b.length); return [i in 0..a.length](a[i]+b[j]); } The compiler should be able to recognize the vectorized expression in=20 the return statement and optimize it when the function is inlined. (This detail needs further refinement)
Feb 16 2005
next sibling parent =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
On Thu, 17 Feb 2005 05:25:09 +0100, Norbert Nemec wrote:

I think your syntax looks very good.
But I think we can learn from the similarities between sql 
and vectorisation, the have been done a lot of research in
query optimization for sql and maybe we can use some of it 
for vectorisation.

Also if we change focus from array of numbers to array of 
objects, then we almost have a indexed table.

vectorize a[i,j].name(),a[i,j].address() into file[i] with i=0..10 over
j=0..100;
 
So, maybe we can expand the vectorization engine to work on those too.

Also have you thought of how to implement aggregated functions,
maybe we can learn form databases.


Knud
Feb 17 2005
prev sibling parent reply =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
Hi 

I have been play a little with the vectorization notation
here is some examples.

Adding a scalar to a vector.
[i in 0..l](a[i]+=0.5)

Finding size of a vector.
size=sqrt(sum([i in 0..l](a[i]*a[i])));

Finding dot-product; 
dot=sum([i in 0..l](a[i]*b[i]));

Matrix vector multiplication.
[i in 0..l](r[i]=sum([j in 0..m](a[i,j]*v[j])));

Calculating the trace of a matrix
res=sum([i in 0..l](a[i,i]));

Taylor expansion on every element in a vector
[i in 0..l](r[i]=sum([j in 0..m](a[j]*pow(v[i],j))));

Calculating Fourier series.
f=sum([j in 0..m](a[j]*cos(j*pi*x/2)+b[j]*sin(j*pi*x/2)))+c;

Calculating (A+I)*v using the Kronecker delta-tensor : delta(i,j)={i=j ? 1 : 0}
[i in 0..l](r[i]=sum([j in 0..m]((a[i,j]+delta(i,j))*v[j])));

Calculating cross product of two 3d vectors using the 
antisymmetric tensor/Permutation Tensor/Levi-Civita tensor
[i in 0..3](r[i]=sum([j in 0..3,k in 0..3](anti(i,j,k)*a[i]*b[k])));

Calculating determinant of a 4x4 matrix using the antisymmetric tensor
det=sum([i in 0..4,j in 0..4,k in 0..4,l in 0..4]
    (anti(i,j,k,l)*a[0,i]*a[1,j]*a[2,k]*a[3,l]));

I think we need some way to tell the vectorization engine about 
special tensors like delta() and anti() !

The vectorization engine should be able to exploit that 
this tensor is zero many times to optimize the calculations.

In sql one would say "where i!=k and i!=j and j!=k"
in the example with anti(i,j,k).

In this notation we need a way to encapsulate the where statements 
into specials tensors so that the vectorization engine
know how to use them in optimization.
Feb 20 2005
next sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Nice work, Knud! Examples always are the best way to test a design idea..=
=2E



Knud S=F8rensen schrieb:
 Matrix vector multiplication.
 [i in 0..l](r[i]=3Dsum([j in 0..m](a[i,j]*v[j])));

alternatively: r[0..l] =3D [i in 0..l](sum([j in 0..m](a[i,j]*v[j]))); It is interesting to see how the syntax allows different ways to express = the same thing.
 Calculating (A+I)*v using the Kronecker delta-tensor : delta(i,j)=3D{i=3D=

 [i in 0..l](r[i]=3Dsum([j in 0..m]((a[i,j]+delta(i,j))*v[j])));
=20
 Calculating cross product of two 3d vectors using the=20
 antisymmetric tensor/Permutation Tensor/Levi-Civita tensor
 [i in 0..3](r[i]=3Dsum([j in 0..3,k in 0..3](anti(i,j,k)*a[i]*b[k])));
=20
 Calculating determinant of a 4x4 matrix using the antisymmetric tensor
 det=3Dsum([i in 0..4,j in 0..4,k in 0..4,l in 0..4]
     (anti(i,j,k,l)*a[0,i]*a[1,j]*a[2,k]*a[3,l]));
=20
 I think we need some way to tell the vectorization engine about=20
 special tensors like delta() and anti() !
=20
 The vectorization engine should be able to exploit that=20
 this tensor is zero many times to optimize the calculations.

I think this needs a lot of investigation. Optimization is only=20 possible, if they are inside a sum. It should be considered, but I have=20 no clear concept for it, yet.
 In sql one would say "where i!=3Dk and i!=3Dj and j!=3Dk"
 in the example with anti(i,j,k).

Of course, something like "where" does not really make much sense for=20 vectorized expressions: It would mean that the number and position of=20 the elements in the result depends on their content at runtime, which=20 completely breaks vectorization.
 In this notation we need a way to encapsulate the where statements=20
 into specials tensors so that the vectorization engine
 know how to use them in optimization.

Well - as I said, the real power of anti and delta, as you call them=20 (I'm not yet fully convinced of the naming) only shows up when they are=20 used inside a sum. Up to now, I had considered 'sum' a rather stupid=20 function that just takes an arbitrary array and sums it up. If you want=20 the compiler to handle anti and delta, I think most of the intelligence=20 has to go into the 'sum' handling in the compiler. Anyhow: I think these are rather futuristic considerations. It is=20 interesting to think ahead, of course...
Feb 20 2005
parent reply Georg Wrede <georg.wrede nospam.org> writes:
Norbert Nemec wrote:
 Nice work, Knud! Examples always are the best way to test a design idea...

 I think we need some way to tell the vectorization engine about 
 special tensors like delta() and anti() !


Are there some special problems to be expected with that? (I'm no vector guy.) As in, do delta(), anti() and such demand something very special from the compiler/language, or are they just functions that could be written in D right now? And still work with vectorization?
Feb 21 2005
next sibling parent =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
On Mon, 21 Feb 2005 10:05:47 +0200, Georg Wrede wrote:

 
 
 Norbert Nemec wrote:
 Nice work, Knud! Examples always are the best way to test a design idea...

 I think we need some way to tell the vectorization engine about 
 special tensors like delta() and anti() !


Are there some special problems to be expected with that? (I'm no vector guy.) As in, do delta(), anti() and such demand something very special from the compiler/language, or are they just functions that could be written in D right now? And still work with vectorization?

Yes, take the determinant expression det=sum([i in 0..4,j in 0..4,k in 0..4,l in 0..4] (anti(i,j,k,l)*a[0,i]*a[1,j]*a[2,k]*a[3,l])); for every anti() we got 1 addition and 4 multiplications. Now anti(i,j,k,l) have 256 combinations and is zero on 232 so if the compiler knows how to optimize anti it would save 232 additions and 928 multiplications and just leaving 24 additions and 96 multiplications.
Feb 21 2005
prev sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Georg Wrede schrieb:
 
 
 Norbert Nemec wrote:
 
 Nice work, Knud! Examples always are the best way to test a design 
 idea...

....
 I think we need some way to tell the vectorization engine about 
 special tensors like delta() and anti() !


Are there some special problems to be expected with that? (I'm no vector guy.) As in, do delta(), anti() and such demand something very special from the compiler/language, or are they just functions that could be written in D right now? And still work with vectorization?

Both could be implemented as plain functions quite trivially. However, using them without optimization would be the killer. Just consider replacing. f(3) by sum([i in 0..1000](f(i)*delta(i,3))) which is equivalent to sum([i in 0..1000](f(i)*(i==3?1:0)) Generally, I believe the Kronecker-Delta and the Levi-Civita-Symbol are best left to the pen-and-paper mathematicians. On the paper, that kind of formalism really saves the day. In numerics, I cannot see yet, why it would be a good idea to use it.
Feb 21 2005
parent reply =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
On Mon, 21 Feb 2005 13:59:25 +0100, Norbert Nemec wrote:

 Georg Wrede schrieb:
 
 
 Norbert Nemec wrote:
 
 Nice work, Knud! Examples always are the best way to test a design 
 idea...

....
 I think we need some way to tell the vectorization engine about 
 special tensors like delta() and anti() !


Are there some special problems to be expected with that? (I'm no vector guy.) As in, do delta(), anti() and such demand something very special from the compiler/language, or are they just functions that could be written in D right now? And still work with vectorization?

Both could be implemented as plain functions quite trivially. However, using them without optimization would be the killer. Just consider replacing. f(3) by sum([i in 0..1000](f(i)*delta(i,3))) which is equivalent to sum([i in 0..1000](f(i)*(i==3?1:0)) Generally, I believe the Kronecker-Delta and the Levi-Civita-Symbol are best left to the pen-and-paper mathematicians. On the paper, that kind of formalism really saves the day. In numerics, I cannot see yet, why it would be a good idea to use it.

Well I see vectorization notation as 2 parts. 1) unordered loops like [i in 0..3,k in 0..4] 2) an element relation like r[i]=sum([j in 0..m](a[i,j]*v[j]) The Kronecker-delta is identity matrix (I) expressed as an element relation. I =[i in 0..n,j in 0..n](delta(i,j)); The Levi-Civita-Symbol (anti(i,j,k)) is the concept of anti-symmetry expressed as an element relation. You will see it in calculation of cross products, determinants and co-matrices in normal linear algebra. But the difference is that they can be used in high dimensions as well. You could try to write a vectorized expression for cross product, a co-matrix or determinant without anti() to see how that looks.
Feb 21 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Knud S=F8rensen schrieb:
 Well I see vectorization notation as 2 parts.
 1) unordered loops like [i in 0..3,k in 0..4]
 2) an element relation like r[i]=3Dsum([j in 0..m](a[i,j]*v[j])
=20
 The Kronecker-delta is identity matrix (I) expressed as an element
 relation.
=20
 I =3D[i in 0..n,j in 0..n](delta(i,j));
=20
 The Levi-Civita-Symbol (anti(i,j,k)) is the concept of anti-symmetry
 expressed as an element relation.
=20
 You will see it in calculation of cross products, determinants and
 co-matrices in normal linear algebra.
=20
 But the difference is that they can be used in high dimensions as well.=

=20
 You could try to write a vectorized expression for cross product, a
 co-matrix or determinant without anti() to see how that looks.

I believe I understand pretty well what you say (just doing my PhD in=20 theoretical physics, I have had my share of index-orgies on the paper=20 and at the blackboard... :-) ) Mathematically, the idea of "relations" makes perfect sense.=20 Computationally, I don't know yet, whether the concept is really that=20 helpful. I makes sense to allow mathematical notation, but my opinion=20 is, that the language should not demand any deep intelligence from the=20 compiler. Meaning: Even without optimization, the performance should be=20 reasonable. It might seem convenient to have 'anti' available and be able to take=20 compact expressions directly from the textbook. However, optimizing the=20 expressions takes a lot of intelligence from the compiler. And unless=20 you know exactly what the compiler optimizes, you should rather not=20 trust that it will do what you expect. So in the end, you'll probably be = better of coding these operations by hand anyway. For the cross-product, for example the following expression would give=20 you a nice vectorized version: [i in 0..2](u[(i+1)%3] * v[(i+2)%3] - u[(i+2)%3] * v[(i+1)%3]) Maybe not as beautiful as the mathematical version, but then - we do not = need to simplify it, but we want to crunch numbers.
Feb 21 2005
parent reply =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
On Mon, 21 Feb 2005 17:02:05 +0100, Norbert Nemec wrote:



 It might seem convenient to have 'anti' available and be able to take 
 compact expressions directly from the textbook. However, optimizing the 
 expressions takes a lot of intelligence from the compiler. And unless 
 you know exactly what the compiler optimizes, you should rather not 
 trust that it will do what you expect. So in the end, you'll probably be 
 better of coding these operations by hand anyway.

I see your concern.
 For the cross-product, for example the following expression would give
 you a nice vectorized version:
 	[i in 0..2](u[(i+1)%3] * v[(i+2)%3] - u[(i+2)%3] * v[(i+1)%3])

Nice, but hard to generalizes to more dimensions. I got the idea that maybe we put anti-symmetry in to the sum function and make a antisum function, but I need to think more about it to make a concrete suggestion.
 Maybe not as beautiful as the mathematical version, but then - we do not
 need to simplify it, but we want to crunch numbers.

Yes, but we also have to make sure that we can crunch the right numbers ;-) ps.) Here is a beautiful vectorized expression for the n-dimensional determinant (using anti). sum([id[n] in 0..n](anti(id[])*multiply([i in 0..n]a[i,id[i]]))));
Feb 22 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Knud S=F8rensen schrieb:
For the cross-product, for example the following expression would give
you a nice vectorized version:
	[i in 0..2](u[(i+1)%3] * v[(i+2)%3] - u[(i+2)%3] * v[(i+1)%3])

=20 Nice, but hard to generalizes to more dimensions.

The cross-product as such is inherently 3-dimensional. In higher=20 dimensions, there may be similar things (like outer products and similar = creatures) but they cannot really be called "generalizations" of the=20 cross product.
 I got the idea that maybe we put anti-symmetry in to the sum function=20
 and make a antisum function, but I need to think more about it to=20
 make a concrete suggestion.=20

That might be the way to go. Still not sure yet, whether it is really=20 essential for the language. Unless I see a really nice and clean=20 proposal, I would suggest to wait until we there is some experience with = vectorized expressions in general.
 ps.) Here is a beautiful vectorized expression for the n-dimensional
 determinant (using anti).=20
 sum([id[n] in 0..n](anti(id[])*multiply([i in 0..n]a[i,id[i]]))));

Don't know, yet, what to think about this syntax for n-dimensional sums. = I would have to see and understand a general syntax proposal before=20 making up my mind about it. B.t.w.: The 'multiply' should rather be called 'product' (since it=20 should go parallel with 'sum' and not with 'add').
Feb 22 2005
parent =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
On Tue, 22 Feb 2005 14:18:19 +0100, Norbert Nemec wrote:

 Knud Sørensen schrieb:
For the cross-product, for example the following expression would give
you a nice vectorized version:
	[i in 0..2](u[(i+1)%3] * v[(i+2)%3] - u[(i+2)%3] * v[(i+1)%3])

Nice, but hard to generalizes to more dimensions.

The cross-product as such is inherently 3-dimensional. In higher dimensions, there may be similar things (like outer products and similar creatures) but they cannot really be called "generalizations" of the cross product.

I where referring to the way you use anti-symmetry, but here is a generalization of cross-product to n-1 n-vectors. Let a[n-1,n] represented an array of n-1 n-vectors v=[i in 0..n]sum([id[n-1] in 0..n]anti(i,id[])*product([k in 0..n-1]a[k,id[k]));
 I got the idea that maybe we put anti-symmetry in to the sum function 
 and make a antisum function, but I need to think more about it to 
 make a concrete suggestion. 


Another idea might be to allow advance slices/masking. Let p_anti and n_anti be slices which represent the positive and negetive part of the anti tensor. p_anti(3) would return (0,1,2),(2,0,1)... then det= sum([id[n] in p_anti(n)](product([i in 0..n]a[i,id[i]])) - [id[n] in n_anti(n)](product([i in 0..n]a[i,id[i]])));
 That might be the way to go. Still not sure yet, whether it is really 
 essential for the language. Unless I see a really nice and clean 
 proposal, I would suggest to wait until we there is some experience with 
 vectorized expressions in general.

Right now we are just playing with the problem to see what we can learn.
 ps.) Here is a beautiful vectorized expression for the n-dimensional
 determinant (using anti). 
 sum([id[n] in 0..n](anti(id[])*multiply([i in 0..n]a[i,id[i]]))));

Don't know, yet, what to think about this syntax for n-dimensional sums. I would have to see and understand a general syntax proposal before making up my mind about it.

The important ting is that we have a syntax which can express this type of problem not that it uses this exact syntax.
 B.t.w.: The 'multiply' should rather be called 'product' (since it 
 should go parallel with 'sum' and not with 'add').

agree.
Feb 22 2005
prev sibling parent Georg Wrede <georg.wrede nospam.org> writes:
Knud Sørensen wrote:
 Hi 
 
 I have been play a little with the vectorization notation
 here is some examples.
 
 Adding a scalar to a vector.
 [i in 0..l](a[i]+=0.5)
 
 Finding size of a vector.
 size=sqrt(sum([i in 0..l](a[i]*a[i])));
 
 Finding dot-product; 
 dot=sum([i in 0..l](a[i]*b[i]));
 
 Matrix vector multiplication.
 [i in 0..l](r[i]=sum([j in 0..m](a[i,j]*v[j])));
 
 Calculating the trace of a matrix
 res=sum([i in 0..l](a[i,i]));
 
 Taylor expansion on every element in a vector
 [i in 0..l](r[i]=sum([j in 0..m](a[j]*pow(v[i],j))));
 
 Calculating Fourier series.
 f=sum([j in 0..m](a[j]*cos(j*pi*x/2)+b[j]*sin(j*pi*x/2)))+c;
 
 Calculating (A+I)*v using the Kronecker delta-tensor : delta(i,j)={i=j ? 1 : 0}
 [i in 0..l](r[i]=sum([j in 0..m]((a[i,j]+delta(i,j))*v[j])));
 
 Calculating cross product of two 3d vectors using the 
 antisymmetric tensor/Permutation Tensor/Levi-Civita tensor
 [i in 0..3](r[i]=sum([j in 0..3,k in 0..3](anti(i,j,k)*a[i]*b[k])));
 
 Calculating determinant of a 4x4 matrix using the antisymmetric tensor
 det=sum([i in 0..4,j in 0..4,k in 0..4,l in 0..4]
     (anti(i,j,k,l)*a[0,i]*a[1,j]*a[2,k]*a[3,l]));

In spite of some hairy names above, the code itself looks clear, and readable! The reader can figure out what's going on without ever having heard of Fourier. :-) In my experience, many times when a specialist programmer gets stuck, and explains the code to the next cubicle guy, often the latter can point out the obvious bugs, that are too trivial for the specialist to notice.
Feb 20 2005