
digitalmars.D - Small Vectors Proposal

reply Mikola Lysenko <mclysenk mtu.edu> writes:
Coordinate geometry is a fundamental part of mathematics.  Many 
important problems in computer science are easily expressed in terms of 
vectors and products.  There are many domains which could benefit from 
enhanced vector performance, especially in scientific and graphical 
applications.

Despite their importance, very few languages provide support for vector 
types.  The prevailing notion is that it should be up to the compiler to 
optimize and somehow automatically determine where SIMD code can be 
applied.  The problem with this approach is that it is often very 
difficult for the compiler to spot where a SIMD operation can be 
applied.  Code must be carefully tweaked to make the compiler recognize 
the correct possible optimization.  Even then, many programmers still 
write their own low-dimensional vector classes for convenience, and 
perform all SIMD optimizations by hand in assembler.

It is important to make the distinction between low dimension vectors 
and other higher order arrays.  This is crucial since the higher 
dimensional arrays are often more tenuously connected with any sort of 
geometric or physical interpretation.  Moreover, many architectures are 
specially optimized for the lower dimensional cases, and offer special 
registers which are virtually impossible to properly exploit using 
libraries alone.  The situation is analogous to floating point 
arithmetic: proper hardware support requires language-level integration. 
Shader languages (Cg/GLSL) already provide extremely robust vector 
arithmetic support, and give some direction to this proposal.  Here are
the proposed language extensions:

1. Additional primitive types

All numerical primitive types in the language will be supplemented with 
3 additional vector extensions.  Each vector extension will contain 2, 3 
or 4 components since these are best supported by hardware, and the most 
relevant geometrically.  The initial value for each component in each 
type will be the same as that of its base type.  The size of each 
vector type will be equal to the size of its base type times its 
dimension.  Here is a complete list of the new types to be added:

Integer types:
byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
ucent3, ucent4

Floating types:
float2, float3, float4, double2, double3, double4, real2, real3, real4

Imaginary types:
ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, 
ireal4

Complex types:
cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, 
creal4

All new vector types are pass-by-value, and may be used in combination 
with any number of other types.  The following expressions would all be 
valid:

int2[] a;
float4* b;
char[][ucent4] c;


2. Vector literals

It must be possible to declare a vector literal unambiguously, whether 
as an argument to a function or in any number of other situations.  This 
syntax
must be concise since it will be used often, and it must be flexible, 
allowing vectors to be constructed via concatenation.  In this light, 
the following is proposed:

float4(1, 2, 3, 4);
int2(100, 200);
creal3(1 + 2i, 3+4i, 5-6i);

Additionally, this syntax will also support the construction of vectors 
from other sub-vectors.  This can be useful for augmenting a 3-vector 
with a fourth homogeneous component for transformation, amongst other 
things.  Here are some examples:

float4(float3(x, y, z), 1);		// returns (x, y, z, 1)
int3(0, int2(a, b));			// returns (0, a, b)
creal4(ireal2(r, z), real2(p, q));	// returns (r, z, p, q)


3. Casting rules

Vector types will obey the same casting rules as their base type within 
a single dimension.  It is not possible to use a cast to change the 
dimension of a vector.

cast(float2)(int2(1, 2));	//ok
cast(int)(float2(3, 4));	//error, int is not the same dimension as float2
real3(a, b, c) + ireal3(d, e, f);	//ok, type of expression is creal3


4. Component-Wise Arithmetic

All of the arithmetic operators on primitive types will be extended such 
that when applied to vectors they operate on each component 
independently.  The size and types of the vectors must be compatible 
with the operator.

float4(1, 2, 3, 4) + float4(3, 5, 6, 7);	// Vector-Vector operation, returns (4, 7, 9, 11)
float3(1, 5, 6) * float3(1, 0, 2);		// returns (1, 0, 12)
uint2(0xFF00FF, 0x00FF88) & uint2(0xFF, 0xFF);	// returns (0xFF, 0x88)

int3(x, y, z) * int2(a, b);			// error, size of vectors does not match

Additionally, scalar vector operations will be supported.  Each 
scalar-vector operator is applied per-component within the vector.

int2(0, 3) * 5;					// Vector-Scalar operation, returns (0, 15)
byte2(2, 5) + 6;				// returns (8, 11)

Note that the only two comparison operators allowed on vectors are == 
and !=.  The problem with <, >, etc. is that there is no commonly agreed 
upon definition for such things.  Therefore the operators are omitted to 
avoid confusion.

int2(1, 2) == int2(1, 2);		//ok, evaluates to true
float3(1, 0, 0) < float3(0, 0, 1);	// error, < is undefined for vectors


5. Swizzle operations

In many vector applications, it is often necessary to reorder the 
components.  In shader languages this is accomplished via 'swizzle' 
properties in each vector type.  Each swizzle selects a set of 1-4 
vector components and places them into a new vector of the given 
dimension.  Conventionally these components have the following names:

x - first component
y - second component
z - third component
w - fourth component

A swizzle is then given as a property consisting of 1-4 components in a 
given order.  Here are some examples:

float2(a, b).yx;		// evaluates to (b, a)
float3(1, 2, 3).zzyx;		// evaluates to (3, 3, 2, 1)
byte4('a', 'b', 'c', 'd').w;	// evaluates to 'd'
creal3(1 + 2i, 0, 4i).zzzz;	// evaluates to (4i, 4i, 4i, 4i)

int2(1, 2).z;			// error, vector does not have 3 components

These can be useful when projecting homogeneous vectors into a lower 
dimensional subspace, or when computing various products.  Here is an
example of a cross product implemented using swizzles:

float3 cross(float3 a, float3 b)
{
	return a.yzx * b.zxy - a.zxy * b.yzx;
}

And here is a simple homogeneous projection:

float3 project(float4 hg)
{
	return hg.xyz / hg.w;
}


6. Standard Library Additions

The basic vector type additions have been kept as spartan as possible to 
avoid cluttering the basic language, while still allowing the maximum 
expressiveness.  Instead of defining various vector products as 
properties, they are implemented within a standard library extension. 
The reason for this is simple: many users may wish to redefine these 
products based on their own coordinate system.  Within the standard 
library, all vector products are performed in right-handed Euclidean 
space.

Vector specific functions will be placed in a new library std.vmath, and 
all functions in std.math will be extended to work on vectors as well 
as primitive types.  Here is a list of functions std.vmath will support
along with a brief description:

dot(a, b) - Computes the dot product of two vectors:
	a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w

cross(a, b) - Computes the cross product as defined above.  Only defined 
for 3d vectors.

perp(a, b) - Computes the perp product of two vectors as (a.x * b.y - 
a.y * b.x).  Only defined for 2d vectors.

mag(a) - Computes the magnitude of a vector, given as:  sqrt(x * x + y * 
y + z * z + w * w);

mag2(a) - Computes the square of the magnitude of a vector.

norm(a) - Returns a / mag(a)

lerp(a, b, t) - Computes an interpolated position between two vectors: 
a * (1 - t) + b * t

Additionally, it would be convenient to have quaternion functions 
supported in the standard library.  A quaternion q would be a 4-vector 
with the following form:  q = q.w + q.x * i + q.y * j + q.z * k

qconj(a) - Computes the quaternion conjugate of a.  Given as 
float4(-a.xyz, a.w)

qmul(a, b) - Computes the quaternion product of two 4d vectors.

qrot(a, b) - Performs a quaternion rotation/dilation on the 3d vector b 
by the 4d quaternion a.


Finally, there are range manipulation functions:

clamp(a, min, max) - Clamps each component of a to the range [min, max]

minv(a) - Returns the smallest component in the vector a

maxv(a) - Returns the largest component in the vector a
Jan 30 2007
next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Mikola Lysenko wrote:
[snip]
 1. Additional primitive types
 
 byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
 long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
 ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
 ucent3, ucent4
 float2, float3, float4, double2, double3, double4, real2, real3, real4
 ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, 
 ireal4
 cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, 
 creal4

I think the arguing makes much sense, but as the proposal adds 57(!) new 
domain specific primitive types to the D language specification, I 
believe it is way too heavy.

For discussion, here is a not very thought out counter proposal:

1. Make the D 2.0 static array pass-by-value instead of its inconsistent 
and strange pseudo-reference/value type heritage from C.  (Pass by 
reference could be retained for extern(C) functions only.)

2. Define ABIs for SSE2, SSE3, etc platforms so that small static arrays 
can be allocated and passed in SIMD registers.

3. Implement vector operations for static arrays.

Advantages:

* The code would automatically be platform independent.  On a non 
SIMD-able platform, float[4], byte[8], and so on would all be well 
defined and work identically to the way they do on SIMD platforms.

* Future platforms would not require changes to the language 
specification.  float[8] could suddenly become SIMD-able for instance.

* Minor changes to the language specification.  Mostly ABI changes.

A much worse, but less radical proposal, would be to revive the C 
"register" keyword.  Instead of:

float4 x;

you get:

register float[4] x;

/Oskar
Jan 30 2007
next sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Oskar Linde wrote:
 Mikola Lysenko wrote:
 [snip]
 1. Additional primitive types

 byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
 long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
 ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
 ucent3, ucent4
 float2, float3, float4, double2, double3, double4, real2, real3, real4
 ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, 
 ireal3, ireal4
 cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, 
 creal3, creal4

 I think the arguing makes much sense, but as the proposal adds 57(!) 
 new domain specific primitive types to the D language specification, I 
 believe it is way too heavy.

 For discussion, here is a not very thought out counter proposal:

 1. Make the D 2.0 static array pass-by-value instead of its 
 inconsistent and strange pseudo-reference/value type heritage from C. 
 (Pass by reference could be retained for extern(C) functions only.)

 2. Define ABIs for SSE2, SSE3, etc platforms so that small static 
 arrays can be allocated and passed in SIMD registers.

 3. Implement vector operations for static arrays.

 Advantages:

 * The code would automatically be platform independent.  On a non 
 SIMD-able platform, float[4], byte[8], and so on would all be well 
 defined and work identically to the way they do on SIMD platforms.

 * Future platforms would not require changes to the language 
 specification.  float[8] could suddenly become SIMD-able for instance.

 * Minor changes to the language specification.  Mostly ABI changes.

I must say, I like this version better.
 A much worse, but less radical proposal, would be to revive the C 
 "register" keyword. Instead of:
 
 float4 x;
 
 you get:
 
 register float[4] x;

Something like this may unfortunately be necessary if backwards compatibility is to be maintained with pass-by-reference code. :(
Jan 30 2007
prev sibling next sibling parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Oskar Linde wrote:
 For discussion, here is a not very thought out counter proposal:
 
 1. Make the D 2.0 static array pass-by-value instead of it's 
 inconsistent and strange pseudo-reference/value type heritage from C. 
 (Pass by reference could be retained for extern(C) functions only).
 
 2. Define ABIs for SSE2, SSE3, etc platforms so that small static arrays 
 can be allocated and passed in SIMD registers.
 
 3. Implement vector operations for static arrays.
 

After thinking about it some more, I believe this approach would 
probably work just as well.

Another possibility is to simply restrict vectors to floating point 
types only: i.e., float2, float3, float4, double2, double3, double4, 
real2, real3, real4.  This makes the number of new types only 9 instead 
of 57.  The disadvantage is that it is no longer easy to use the MMX 
extensions, which are specially tailored for integer vectors.

Overall, I guess I'd be happy with either situation.
Jan 30 2007
parent reply Benji Smith <dlanguage benjismith.net> writes:
Mikola Lysenko wrote:
 After thinking about it some more, I believe this approach would 
 probably work just as well.  Another possibility is to simply restrict 
 vectors to floating point types only:
 
 ie, float2, float3, float4, double2, double3, double4, real2, real3, real4.
 
 This makes the number of new types only 9 instead of 57.  The 
 disadvantage is that it is no longer easy to use the MMX extensions 
 which are specially tailored for integer vectors.  Overall, I guess I'd 
 be happy with either situation.

Three and four dimensional vectors are nice for physics and whatnot, but 
for those of us in the machine-learning field, it's commonplace to 
construct vectors (and to calculate distances, dot-products, and cosines 
between those vectors) using many, many more dimensions.

Right now, I'm working on a statistical prediction algorithm that uses 
sparse 300,000-dimensional feature vectors.  There's no reason to limit 
SSE optimizations to small-dimensionality vectors.

I don't think having special datatypes is the right solution.  I think 
it should be possible to annotate any array with a flag that says "I'll 
be doing vector math here.  Watch out for possible optimizations".  Or 
just having a library of vector functions would probably be sufficient 
to solve the problem in the majority of cases.

--benji
Jan 30 2007
parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Benji Smith wrote:
 Right now, I'm working on a statistical prediction algorithm that uses 
 *sparse* 300,000-dimensional feature vectors. There's no reason to limit 
 SSE optimizations to small-dimensionality vectors.
 

I can understand your desire to create a universal solution, and for a 
long time I too was convinced this was the way to go.  However, I have 
come to understand that there are many practical differences between 
low and high dimension vectors.

Think about it this way: would you try to use a sparse matrix/vector 
library for doing operations on 4-d vectors and matrices?  It should be 
clear that these are two very different types of problems.  If we tried 
to use pass-by-value semantics and array operations on gigantic vectors 
the result would be equally bad.  Obviously, passing a 300k+ element 
array around by value is going to destroy your performance.  The classic 
solution is to create a series of custom routines for whichever problem 
you are trying to solve, or perhaps use a robust optimized library like 
BLAS for the heavy lifting.  High dimension linear algebra is a tricky 
subject, and it can take years of research to get algorithms with 
competitive performance.

On the other hand, low dimension vectors require relatively trivial 
compiler support, and are used far more often.  To get a rough estimate 
of their relative popularity, I invoke Google CodeSearch:

vector math -sparse : 282k hits
http://www.google.com/codesearch?q=vector+math+-sparse&hl=en&btnG=Search+Code

vector sparse : 31k hits
http://www.google.com/codesearch?hl=en&lr=&q=vector+sparse&btnG=Search

From this informal statistic, we could infer that low dimensional 
vectors are used at least an order of magnitude more frequently than 
their higher order counterparts.

I do not mean to say that fast sparse vector/matrix operations are 
unimportant; after all, a fast matrix solver can sometimes shave weeks 
off your processing time.  In such a situation, it is permissible for a 
library to be slightly cumbersome - just as long as it gets the job done 
fast.  Compare this to small vectors, where many applications place 
convenience and readability above performance.  Integrating small 
vectors into the core language will not only improve their usability, 
but it should also provide substantial performance gains.
Jan 30 2007
next sibling parent reply janderson <askme me.com> writes:
Mikola Lysenko wrote:
 Benji Smith wrote:
 Right now, I'm working on a statistical prediction algorithm that uses 
 *sparse* 300,000-dimensional feature vectors. There's no reason to 
 limit SSE optimizations to small-dimensionality vectors.

 I can understand your desire to create a universal solution, and for a 
 long time I too was convinced this was the way to go.  However I have 
 come to understand that there are many practical differences between 
 low and high dimension vectors.
 [snip]
 Integrating small vectors into the core language will not only improve 
 their usability, but it should also provide substantial performance 
 gains.

I don't think it's critical that high order matrix multiplications get 
the same performance gains as smaller arrays that use the SIMD.  It's 
just important that it's consistent.  Putting vector operations into 
arrays gives us more power because we get all the standard array 
operations as well (I concede that these operations could be put into 
the base arrays as well).

I hope that eventually dynamic arrays will get SIMD support as well. 
That way you could, for instance, load an entire list of vertexes (or 
whatever) and apply an operation on all of them.  D could even do this 
in parallel.  (I concede here that you could probably do that with an 
array of these base types.  Also, vectors with 4 parts can be treated 
somewhat differently in that the 4th component is sometimes used for 
other reasons.)

I generally think that representing vector types as arrays makes the 
language more powerful than having individual special case types.

-Joel
Feb 01 2007
parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
janderson wrote:
 Mikola Lysenko wrote:
 
 Benji Smith wrote:

 Right now, I'm working on a statistical prediction algorithm that 
 uses *sparse* 300,000-dimensional feature vectors. There's no reason 
 to limit SSE optimizations to small-dimensionality vectors.

 I can understand your desire to create a universal solution, and for a 
 long time I too was convinced this was the way to go.
 [snip]

 I don't think it's critical that high order matrix multiplications get 
 the same performance gains as smaller arrays that use the SIMD.  It's 
 just important that it's consistent.
 [snip]
 I generally think that representing vector types as arrays makes the 
 language more powerful than having individual special case types.

I think we should have array operations for dynamic arrays 
(n-dimensional vectors) as well, but distinct from lo-D static arrays. 
The reason is the same as others state: hi-D arrays can spend some fixed 
overhead setting things up before jumping into a gigantic loop, whereas 
lo-D arrays get optimized much differently, with no fixed overhead and 
possibly higher running overhead or less generality.  They are two 
different beasts.

Since hi-D arrays can deal with some overhead at startup, they are ideal 
candidates for a library implementation of array operations.  There are 
only one or two features D would need to get there and have this 
seamless library implementation (maybe they are done already?? I 
forget), and that may be an easier and more general route than trying to 
make compiler writers (Walter) write compiler support for both lo-D and 
hi-D optimization cases.
Feb 01 2007
parent Benji Smith <dlanguage benjismith.net> writes:
Chad J wrote:
 I think we should have array operations for dynamic arrays 
 (n-dimensional vectors) as well, but distinct from lo-D static arrays. 
 The reason is the same as others state: hi-D arrays can spend some fixed 
 overhead setting things up before jumping into a gigantic loop, whereas 
 lo-D arrays get optimized much differently with no fixed overhead and 
 possibly higher running overhead or less generality.  They are two 
 different beasts.

The more I think about it, the more I agree.

Hi-D vectors are almost always sparse, so the feature/magnitude pairs 
are usually stored in a HashMap (or some other similar container) rather 
than being represented in arrays.  Adding compiler optimizations for 
hi-D vector operators would mean knowing the underlying implementation 
of the container, and that'd obviously be bad.

But I don't think vector optimizations should be constrained to 3D or 4D 
vectors.  Generalize the optimizations on array types, and then any code 
that performs vector operations on those arrays automatically gets the 
performance boost, regardless of the dimensionality of their vectors.

--benji
Feb 07 2007
prev sibling parent reply Don Clugston <dac nospam.com.au> writes:
Mikola Lysenko wrote:
 Benji Smith wrote:
 Right now, I'm working on a statistical prediction algorithm that uses 
 *sparse* 300,000-dimensional feature vectors. There's no reason to 
 limit SSE optimizations to small-dimensionality vectors.

 I can understand your desire to create a universal solution, and for a 
 long time I too was convinced this was the way to go.  However I have 
 come to understand that there are many practical differences between 
 low and high dimension vectors.  Think about it this way: would you try 
 to use a sparse matrix/vector library for doing operations on 4-d 
 vectors and matrices?  It should be clear that these are two very 
 different types of problems.

I completely agree.  For large vectors and matrices, you're going to 
have to use a loop, and the loop startup costs are amortised; you can 
use the same BLAS code for a 3159 element vector as for a 3158 element 
one, and it makes negligible difference to the speed.  But for 3 and 4 
element vectors it makes an enormous difference.

And the applications are completely different.  I imagine games 
programmers don't use much above 4x4 matrices.
Feb 01 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Don Clugston wrote:
 Mikola Lysenko wrote:
 Benji Smith wrote:
 Right now, I'm working on a statistical prediction algorithm that 
 uses *sparse* 300,000-dimensional feature vectors. There's no reason 
 to limit SSE optimizations to small-dimensionality vectors.

 I can understand your desire to create a universal solution, and for a 
 long time I too was convinced this was the way to go.
 [snip]

 I completely agree.  For large vectors and matrices, you're going to 
 have to use a loop, and the loop startup costs are amortised.
 [snip]
 And the applications are completely different.  I imagine games 
 programmers don't use much above 4x4 matrices.

I'd say that's not necessarily true, with games using more and more 
realistic physics, and even with run of the mill skeletal animation 
techniques.  But in those cases you're talking arbitrary N-vectors, so 
in any event the most common dimensions for data are: 2 (screen/texture 
space), 3 (3D world space, homogeneous 2D space, color space), 4 
(homogeneous 3D world space), and N (physics and everything else).

--bb
Feb 01 2007
prev sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Oskar Linde wrote:
 Mikola Lysenko wrote:
 [snip]
 1. Additional primitive types

 byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
 long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
 ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
 ucent3, ucent4
 float2, float3, float4, double2, double3, double4, real2, real3, real4
 ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, 
 ireal3, ireal4
 cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, 
 creal3, creal4

 I think the arguing makes much sense, but as the proposal adds 57(!) 
 new domain specific primitive types to the D language specification, I 
 believe it is way too heavy.

Yes, I agree with Oskar and Walter that all those 57 primitive type 
names are not needed in the core language, and that instead float[4] 
should act like your float4.  But for users, this kind of alias should 
behave as expected:

alias float[4] float4;

And a std.vecmath library could define all 57 of the aliases above.

--bb
Jan 30 2007
prev sibling next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Mikola Lysenko wrote:
 All new vector types are pass-by-value, and may be used in combination 
 with any number of other types.  The following expressions would all be 
 valid:
 

 char[][ucent4] c;

[snip]
 Note that the only two comparison operators allowed on vectors are == 
 and !=.  The problem with <, >, etc. is that there is no commonly agreed 
 upon definition for such things.  Therefore the operators are omitted to 
 avoid confusion.
 
 int2(1, 2) == int2(1, 2);        //ok, evaluates to true
 float3(1, 0, 0) < float3(0, 0, 1);    // error, < is undefined for vectors

These two sections are contradictory: associative arrays need an 
ordering to be defined on the key type.

How about a simple lexicographical order, with two non-equal vectors (of 
the same dimensionality) comparing as their first non-equal component 
would?  (That's how comparisons work for arrays, IIRC.)

Other than that I don't see any obvious problems with it.
Jan 30 2007
parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Frits van Bommel wrote:
 Mikola Lysenko wrote:
 All new vector types are pass-by-value, and may be used in combination 
 with any number of other types.  The following expressions would all 
 be valid:

 char[][ucent4] c;

[snip]
 Note that the only two comparison operators allowed on vectors are == 
 and !=.  The problem with <, >, etc. is that there is no commonly 
 agreed upon definition for such things.  Therefore the operators are 
 omitted to avoid confusion.

 int2(1, 2) == int2(1, 2);        //ok, evaluates to true
 float3(1, 0, 0) < float3(0, 0, 1);    // error, < is undefined for 
 vectors

 These two sections are contradictory: associative arrays need an 
 ordering to be defined on the key type.  How about a simple 
 lexicographical order, with two non-equal vectors (of the same 
 dimensionality) comparing as their first non-equal component would? 
 (That's how comparisons work for arrays, IIRC.)  Other than that I 
 don't see any obvious problems with it.

Hmm... Fair enough. We could arbitrarily define an ordering on a per-component level so associative arrays work. I could see this being confusing in some situations, but it would be worth it in order to make the types consistent. Therefore:

float3(x, y, z) < float3(a, b, c)

would translate to:

(x < a) ? true : ((y < b) ? true : (z < c))
Jan 30 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Mikola Lysenko wrote:
 Hmm...  Fair enough.  We could define ordering on a per component level 
 arbitrarily so associative arrays work.  I could see this being 
 confusing in some situations, but it would be worth it in order to make 
 the types consistent.  Therefore:
 
 float3(x, y, z) < float3(a, b, c)
 
 would translate to:
 
 (x < a) ? true : ((y < b) ? true : (z < c))

Your code is wrong when e.g. (x > a) and (y < b): it returns true while it should return false; x != a means ((x, y, z) < (a, b, c)) == (x < a).

Think of the order of words in a dictionary: first you compare the first character, if it's equal you go on to the second, and so on. The first character that differs determines the sort order. It'd be implemented something like this:

---
int compare(short3 a, short3 b)
{
    if (a.x != b.x) return a.x - b.x;
    if (a.y != b.y) return a.y - b.y;
    if (a.z != b.z) return a.z - b.z;
    return 0; // all elements equal, so the vectors are equal
}
---

Note 1: I used shorts to keep the code clear while avoiding overflow in the subtractions.
Note 2: This uses the opCmp/TypeInfo.compare return value convention: negative means a < b, positive means a > b, zero means equality.
Jan 30 2007
parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Frits van Bommel wrote:
 Mikola Lysenko wrote:
 Hmm...  Fair enough.  We could define ordering on a per component 
 level arbitrarily so associative arrays work.  I could see this being 
 confusing in some situations, but it would be worth it in order to 
 make the types consistent.  Therefore:

 float3(x, y, z) < float3(a, b, c)

 would translate to:

 (x < a) ? true : ((y < b) ? true : (z < c))

Your code is wrong when e.g. (x > a) and (y < b): it returns true while it should return false; x != a means ((x, y, z) < (a, b, c)) == (x < a).

Think of the order of words in a dictionary: first you compare the first character, if it's equal you go on to the second, and so on. The first character that differs determines the sort order. It'd be implemented something like this:

---
int compare(short3 a, short3 b)
{
    if (a.x != b.x) return a.x - b.x;
    if (a.y != b.y) return a.y - b.y;
    if (a.z != b.z) return a.z - b.z;
    return 0; // all elements equal, so the vectors are equal
}
---

Note 1: I used shorts to keep the code clear while avoiding overflow in the subtractions.
Note 2: This uses the opCmp/TypeInfo.compare return value convention: negative means a < b, positive means a > b, zero means equality.

Whoops! I wasn't really thinking too carefully when I wrote that. You're correct, a lexicographic ordering is definitely the simplest in this case.
Jan 30 2007
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Mikola Lysenko wrote:
 Whoops!  I wasn't really thinking too carefully when I wrote that. 
 You're correct, a lexicographic ordering is definitely the simplest in 
 this case.

It also fits really well with the "plain static array" proposed change: this is already what arrays do[1] :).

[1]: Well, if they're the same length. Otherwise they compare by length, presumably for efficiency.
Jan 30 2007
prev sibling next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Thank you, this is in line with what I thought you meant. I'm glad I was 
right. Your proposal is sensible and well thought out.

The only improvement I'd make is instead of:

float4 f;

Have:

float[4] f;

i.e. make static arrays of 2,3,4 dimensions have the suggested 
properties. I think this is quite doable.
Jan 30 2007
next sibling parent Mikola Lysenko <mclysenk mtu.edu> writes:
Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I was 
 right. Your proposal is sensible and well thought out.
 
 The only improvement I'd make is instead of:
 
 float4 f;
 
 Have:
 
 float[4] f;
 
 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

Yes, Oskar made a good point in his post. The main issue I was concerned with was the vector literal syntax. Currently array literals are allocated on the heap when you declare them, which would be prohibitively costly in most situations. In order for low-dimension vectors to be useful, they need a simple constructor syntax that does not allocate anything on the heap.
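For reference, the usual library-side workaround in today's D is a struct with a static opCall, so the "constructor" builds the value on the stack. This is only a sketch (the float3 struct here is hypothetical, not part of the proposal):

```d
// Sketch: a struct-based vector whose constructor syntax is a static
// opCall, so float3(x, y, z) builds the value on the stack -- no heap
// allocation, unlike an array literal.
struct float3
{
    float x, y, z;

    static float3 opCall(float x, float y, float z)
    {
        float3 v;
        v.x = x;
        v.y = y;
        v.z = z;
        return v;
    }
}

// float3 n = float3(0, 1, 0);  // pass-by-value, no GC traffic
```

The remaining pain point is exactly what the post describes: there is no literal syntax, so every construction goes through the opCall.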
Jan 30 2007
prev sibling next sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I was 
 right. Your proposal is sensible and well thought out.
 
 The only improvement I'd make is instead of:
 
 float4 f;
 
 Have:
 
 float[4] f;
 
 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays start being passed around by value.

What exactly makes small vectors unsuitable for an implementation via library elements? If we distill a list, maybe we can also figure out ways in which we can make the language better, not only one special case of it.

Andrei
Jan 30 2007
next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I 
 was right. Your proposal is sensible and well thought out.

 The only improvement I'd make is instead of:

 float4 f;

 Have:

 float[4] f;

 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.

I agree that it's a very bad idea to have such a discontinuity between ref and value. Static arrays would have to be all by-value or all by-ref. Then you'd just see slower performance for the larger ones.
Jan 30 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I 
 was right. Your proposal is sensible and well thought out.

 The only improvement I'd make is instead of:

 float4 f;

 Have:

 float[4] f;

 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.

I agree that it's a very bad idea to have such a discontinuity between ref and value. Static arrays would have to be all by-value or all by-ref. Then you'd just see slower performance for the larger ones.

So then arrays by ref and a stdlib template that wraps an array in a struct could make everyone happy. Andrei
Jan 30 2007
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I 
 was right. Your proposal is sensible and well thought out.

 The only improvement I'd make is instead of:

 float4 f;

 Have:

 float[4] f;

 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.

I agree that it's a very bad idea to have such a discontinuity in ref vs value. Static arrays would have to be all by value or all by ref. Then, you'll just see slower speed for the larger ones.

So then arrays by ref and a stdlib template that wraps an array in a struct could make everyone happy.

I disagree. What you describe is a way of working around a design issue instead of resolving it.

The C (and D) static array has always been an odd type. It is not assignable, but is passed by reference to functions. You cannot return static arrays from functions. Template libraries like std.bind have to add lots of extra code to special-case static arrays. Structs containing static arrays behave differently than static arrays themselves. Also, in D, the static array is the only type where typeof(T.init) != T.

If static arrays became a value type, all these irregularities could disappear. The only place the change would make any difference for old code is where static arrays are passed as function parameters. I believe the amount of code containing functions with static array arguments is very small in D. In fact, only two Phobos modules contain such functions (std.utf and std.md5). Converting those functions would be trivial: the by-ref behavior would be attainable with the inout and out parameter types that today are forbidden for static arrays.

/Oskar
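The conversion described in that last paragraph might look like this (a sketch with entirely hypothetical signatures, assuming static arrays gain value semantics as proposed):

```d
// Today: a static array parameter is implicitly passed by reference.
void scale(float[4] v, float s);

// Under the proposal: parameters are by-value by default, and a function
// that mutates its argument states the reference semantics explicitly.
void scale(inout float[4] v, float s);
```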
Feb 01 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Oskar Linde wrote:
 The C (and D) static array has always been an odd type. It is not 
 assignable, but is passed by reference to functions. You can not return 
 static arrays from functions. Template libraries like std.bind have to 
 add lots of extra code to special case the static arrays. Structs 
 containing static arrays behave differently than static arrays 
 themselves. Also, in D, the static array is the only type where 
 typeof(T.init) != T.
 
 If static arrays became a value type, all irregularities could disappear.

But then people will complain about the inconsistency between static arrays and dynamic arrays. Why are some passed by value and others by reference?
 The only place this change would make any difference for old code is 
 where static arrays are passed as function parameters. I believe the 
 amount of code containing functions with static array arguments is very 
 small in D. In fact, only two Phobos modules contain such functions 
 (std.utf and std.md5). Converting those functions would be trivial: The 
 by ref behavior would be attainable with the inout and out parameter 
 types that today are forbidden for static arrays.

Probably things could be made to work; I personally am unclear on what's the most self-consistent set of rules. Andrei
Feb 01 2007
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Oskar Linde wrote:
 If static arrays became a value type, all irregularities could disappear.

But then people will complain about the inconsistency between static arrays and dynamic arrays. Why are some passed by value and others by reference?

I don't see many complaints about the "inconsistency" between structs and classes (which is probably the closest analogue in the language currently).

Besides, static arrays are already more of a value type (typically allocated on the stack, directly in an aggregate, or in static data, like structs) and dynamic arrays more of a reference type (typically allocated on the heap, like classes). The biggest difference is that dynamic arrays _can_ also refer to static arrays.
Feb 01 2007
prev sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I 
 was right. Your proposal is sensible and well thought out.

 The only improvement I'd make is instead of:

 float4 f;

 Have:

 float[4] f;

 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.

That difference could be made using different notations:

void func( float[4] ar );           // by value
void funct(T, uint N)( T[N] ar );

void func( float[] ar );            // by ref
void funct(T)( T[] ar );

This makes sense, too, since "float[4] x;" is already allocated on the stack.
 What exactly makes small vectors unsuitable for an implementation via 
 library elements? If we distill a list, maybe we also can figure ways in 
 which we can make the language better, not only one special case of it.

This is already very doable, thanks to the possibility of calling global functions taking an array as if they were members of the array:

float Length( float[] ar );  // by ref

float[4] ar;
auto len = ar.Length();

L.
Jan 31 2007
prev sibling parent janderson <askme me.com> writes:
Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I was 
 right. Your proposal is sensible and well thought out.
 
 The only improvement I'd make is instead of:
 
 float4 f;
 
 Have:
 
 float[4] f;
 
 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I think some of the properties could be used with multiple dimensions. SIMD could still be used efficiently on these higher orders (just repeat the operation when you run out). This also opens up opportunities for parallel operations in the future. Anything else could go in the standard library.

Swizzle is the odd one out. It would have to be a special case. We can only support up to N characters with it, unless you do something fancy like:

float[8] value;
float[8] value2;

value.xyz = value2.x;       // xyz in value are set to value2.x
value.xyz[xyz] = value2.x;  // Sets both value[0-3] and value[4-7] to value2.x.

or

(value+4).xyz = value2.x;   // Just offset the array (value[4-7] are set to value2.x)

-Joel
Jan 31 2007
prev sibling next sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Having read the proposal, I think it can likely be accommodated entirely 
with current and up-and-coming language features. To expand:

Mikola Lysenko wrote:
 It is important to make the distinction between low dimension vectors 
 and other higher order arrays.  This is crucial since the higher 
 dimensional arrays are often more tenuously connected with any sort of 
 geometric or physical interpretation.

Well, I disagree here, but it's not important. Look outside your domain: there are people who think the future belongs to machine learning and statistical methods, which are rife with high-dimensional vectors. I think it's just shortsighted to claim that the geometrical and physical interpretation is more special than others.
 Moreover, many architectures are 
 specially optimized for the lower dimensional cases, and offer special 
 registers which are virtually impossible to properly exploit using 
 libraries alone.  The situation is analogous to floating point 
 arithmetic, proper hardware support requires language level integration.

I don't see much of an analogy, but this is also a minor point. Floating point arithmetic was weird way before it was implemented in hardware, and the weirdness was reflected in languages. But on many machines adding two floating point numbers required a routine call, which didn't add or subtract anything from the equation.
 1. Additional primitive types

Let's try: s/primitive types/types/ig and see what happens, just for the sake of experiment.
 All numerical primitive types in the language will be supplemented with 
 3 additional vector extensions.  Each vector extension will contain 2, 3 
 or 4 components since these are best supported by hardware, and the most 
 relevant geometrically.  The initial value for each component in each 
 type will be the same as that of its base type.  The size of each 
 vector type will be equal to the size of its base type times its 
 dimension.  Here is a complete list of the new types to be added:
 
 Integer types:
 byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
 long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
 ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
 ucent3, ucent4
 
 Floating types:
 float2, float3, float4, double2, double3, double4, real2, real3, real4
 
 Imaginary types:
 ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, 
 ireal4
 
 Complex types:
 cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, 
 creal4

My variant: define a vector_ops module:

struct small_vector(base, ubyte size)
{
    base[size] data;
    ...
}

alias small_vector!(byte, 2) byte2;
...
 All new vector types are pass-by-value, and may be used in combination 
 with any number of other types.

Check this one. Structs are passed by value, and various constructors and (up-and-coming) conversions will take care of the rest.
 The following expressions would all be 
 valid:
 
 int2[] a;
 float4* b;
 char[][ucent4] c;

Check that one.
 2. Vector literals
 
 It must be possible to declare a vector literal unambiguously as an 
 argument to a function, or in any number of other situations.  This syntax 
 must be concise since it will be used often, and it must be flexible, 
 allowing vectors to be constructed via concatenation.  In this light, 
 the following is proposed:
 
 float4(1, 2, 3, 4);
 int2(100, 200);
 creal3(1 + 2i, 3+4i, 5-6i);

Constructors. Check that one.
 Additionally, this syntax will also support the construction of vectors 
 from other sub vectors.  This can be useful for augmenting a 3-vector 
 with a fourth homogenous component for transformation amongst other 
 things.  Here are some examples:
 
 float4(float3(x, y, z), 1);        // returns (x, y, z, 1)
 int3(0, int2(a, b));            // returns (0, a, b)
 creal4(ireal2(r, z), real2(p, q));    // returns (r, z, p, q)

Constructors. Check that one too.
 3. Casting rules
 
 Vector types will obey the same casting rules as their base type within 
 a single dimension.  It is not possible to use a cast to change the 
 dimension of a vector.
 
 cast(float2)(int2(1, 2));    //ok
 cast(int)(float2(3, 4));    //error, int is not the same dimension as 
 float2
 real3(a, b) + ireal(c, d);    //ok, type of expression is creal3

opCast. Yup, check that sucker too.
 4. Component-Wise Arithmetic
 
 All of the arithmetic operators on primitive types will be extended such 
 that when applied to vectors they operate on each component 
 independently.  The size and types of the vectors must be compatible 
 with the operator.
 
 float4(1, 2, 3, 4) + float4(3, 5, 6, 7);     // Vector-Vector operation, 
 returns (4, 7, 9, 11)
 float3(1, 5, 6) * float3(1, 0, 2);        // returns (1, 0, 12)
 uint2(0xFF00FF, 0x00FF88) & uint2(0xFF, 0xFF);    // returns (0xFF, 0x88)
 
 int3(x, y, z) * int2(a, b);            // error, size of vectors does 
 not match

Overloaded operators. Check.
 Additionally, scalar vector operations will be supported.  Each 
 scalar-vector operator is applied per-component within the vector.
 
 int2(0, 3) * 5;                    // Vector-Scalar operation, returns 
 (0, 15)
 byte2(2, 5) + 6;                // returns (8, 11)

Overloaded operators. Check.
 Note that the only two comparison operators allowed on vectors are == 
 and !=.  The problem with <, >, etc. is that there is no commonly agreed 
 upon definition for such things.  Therefore the operators are omitted to 
 avoid confusion.
 int2(1, 2) == int2(1, 2);        //ok, evaluates to true
 float3(1, 0, 0) < float3(0, 0, 1);    // error, < is undefined for vectors

Check :o).
 5. Swizzle operations
 
 In many vector applications, it is often necessary to reorder the 
 components.  In shader languages this is accomplished via 'swizzle' 
 properties in each vector type.  Each swizzle selects a set of 1-4 
 vector components and places them into a new vector of the given 
 dimension.  Conventionally these components have the following names:
 
 x - first component
 y - second component
 z - third component
 w - fourth component
 
 A swizzle is then given as a property consisting of 1-4 components in a 
 given order.

This is more interesting. Today's D can't do it. Fortunately, code generation techniques that Walter works on right now (possibly literally at the time I'm writing this and/or at the time you're reading this) will enable comfortable generation of combinatorial functions, of which swizzles are actually an excellent motivating example.

 Here are some examples:
 
 float2(a, b).yx;        // evaluates to (b, a)
 float3(1, 2, 3).zzyx;        // evaluates to (3, 3, 2, 1)
 byte4('a', 'b', 'c', 'd').w;    // evaluates to 'd'
 creal3(1 + 2i, 0, 4i).zzzz;    // evaluates to (4i, 4i, 4i, 4i)
 
 int2(1, 2).z;            // error, vector does not have 3 components

Check (for D 2.0).
 These can be useful when projecting homogenous vectors into a lower 
 dimension subspace, or when computing various products.  Here is an 
 example of a cross product implemented using swizzles:
 
 float3 cross(float3 a, float3 b)
 {
     return a.zxy * b.yzx - a.yzx * b.zxy;
 }
 
 And here is a simple homogenous projection:
 
 float3 project(float4 hg)
 {
     return hg.xyz / hg.w;
 }

Yum. Love that.
 6. Standard Library Additions

Check by virtue of definition :o). Andrei
Jan 30 2007
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Having read the proposal, I think it can likely be accommodated entirely 
 with current and up-and-coming language features. To expand:

 Moreover, many architectures are specially optimized for the lower 
 dimensional cases, and offer special registers which are virtually 
 impossible to properly exploit using libraries alone.  The situation 
 is analogous to floating point arithmetic, proper hardware support 
 requires language level integration.

I don't see much of an analogy, but this is also a minor point. Floating point arithmetic was weird way before it was implemented in hardware, and the weirdness reflected in languages. But on many machines adding two floating point numbers required a routine call, which didn't add or subtract anything to the equation.

Your responses to the other points seem reasonable and encouraging, but your response to this issue concerning performance is... what? That making optimal use of the hardware is not important?

For more specifics of what Mikola finds problematic on the efficiency front see:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=47596

--bb
Jan 30 2007
parent "Andrei Alexandrescu (See Website for Email)" <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Having read the proposal, I think it can likely be accommodated 
 entirely with current and up-and-coming language features. To expand:

 Moreover, many architectures are specially optimized for the lower 
 dimensional cases, and offer special registers which are virtually 
 impossible to properly exploit using libraries alone.  The situation 
 is analogous to floating point arithmetic, proper hardware support 
 requires language level integration.

I don't see much of an analogy, but this is also a minor point. Floating point arithmetic was weird way before it was implemented in hardware, and the weirdness reflected in languages. But on many machines adding two floating point numbers required a routine call, which didn't add or subtract anything to the equation.

Your responses to the other points seem reasonable and encouraging, but your response to this issue concerning performance is... what? That making optimal use of the hardware is not important?

Nonono. I just meant: the fact that today's architectures have dedicated operations and hardware for floating point did not change the semantics of floating point operations in programming languages.
 For more specifics of what Mikola finds problematic on the efficiency 
 front see:
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=47596

Thanks. I'm seeing:

1. Function calls

I agree that optimization of compound operations a la d = a * b + c is hard, but it should be addressed in a principled way (e.g. expression templates).

2. Register allocation

This has been discussed in an older thread on comp.lang.c++.moderated. Basically current compiler technology conservatively puts every "serious" data structure (e.g. structs with constructors) in real memory, not registers. I see this as more of a limitation of current compiler technology than a fundamental issue. Of course, I do no work on improving said technology, which lessens my point. :o)

3. Data alignment

D must add primitives to define types aligned at arbitrary boundaries. This will not only help small vectors, but a host of other systems-level software as well (e.g. allocators).

Andrei
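As an aside on point 1, the expression-template idea can be sketched in today's D. This is only an illustration with hypothetical types (not anything the proposal specifies): the multiply-add is represented as a lazy proxy, and assignment evaluates it elementwise in a single fused pass, with no temporaries.

```d
// Hypothetical sketch of expression templates for d = a * b + c.
struct Vec4
{
    float[4] data;
}

// Lazy proxy: holds the operands, computes one component on demand.
struct MulAdd
{
    Vec4 a, b, c;

    float opIndex(int i)
    {
        return a.data[i] * b.data[i] + c.data[i];
    }
}

MulAdd mulAdd(Vec4 a, Vec4 b, Vec4 c)
{
    MulAdd e;
    e.a = a;
    e.b = b;
    e.c = c;
    return e;
}

// Assignment forces evaluation: one loop, no intermediate Vec4.
void assign(inout Vec4 dst, MulAdd e)
{
    for (int i = 0; i < 4; i++)
        dst.data[i] = e[i];
}
```

With overloaded operators returning such proxies, the compound expression fuses automatically; the hard part, as noted above, is getting the optimizer to then map the fused loop onto SIMD registers.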
Jan 30 2007
prev sibling parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Mikola Lysenko wrote:
 3. Casting rules

 Vector types will obey the same casting rules as their base type
 within a single dimension.  It is not possible to use a cast to change
 the dimension of a vector.

 cast(float2)(int2(1, 2));    //ok
 cast(int)(float2(3, 4));    //error, int is not the same dimension as float2
 real3(a, b) + ireal(c, d);    //ok, type of expression is creal3

opCast. Yup, check that sucker too.

I presume the original proposal would have allowed both cast(float2)(int2(1, 2)) and cast(real2)(int2(1, 2)). Currently, however, we can have only one opCast per struct/class. Is this a planned D 2.0 change I've missed, or how would this work?
Feb 01 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Deewiant wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Mikola Lysenko wrote:
 3. Casting rules

 Vector types will obey the same casting rules as their base type
 within a single dimension.  It is not possible to use a cast to change
 the dimension of a vector.

 cast(float2)(int2(1, 2));    //ok
 cast(int)(float2(3, 4));    //error, int is not the same dimension as float2
 real3(a, b) + ireal(c, d);    //ok, type of expression is creal3


I presume the original proposal would have allowed both cast(float2)(int2(1, 2)) and cast(real2)(int2(1, 2)). Currently, however, we can have only one opCast per struct/class. Is this a planned D 2.0 change I've missed, or how would this work?

It is a change that I personally think should be made. Walter devised the one-cast-per-class rule out of caution when facing an issue known to be potentially disastrous. Generally, the "when in doubt, don't put it in" approach turned out to be a winner for D --- the "length" Chernobyl disaster notwithstanding :o). As our grasp of the tradeoffs involved improves, the right choices will appear naturally. This is sort of what happened with opAssign.

Andrei
Feb 01 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 It is a change that I personally think should be made. Walter devised 
 the one cast per class rule out of caution when facing an issue known to 
 be potentially disastrous. Generally, the "when in doubt, don't put it 
 in" approach turned out to be a winner for D in general --- the "length" 
 Chernobyl disaster notwithstanding :o).

Yes, it's the "when we don't know what the semantics of this combination of features should be, make it an error" rule. That way, when it sometimes eventually becomes clear what those semantics should be, we don't have a backwards compatibility problem.
Feb 01 2007
parent "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 It is a change that I personally think should be made. Walter devised 
 the one cast per class rule out of caution when facing an issue known 
 to be potentially disastrous. Generally, the "when in doubt, don't put 
 it in" approach turned out to be a winner for D in general --- the 
 "length" Chernobyl disaster notwithstanding :o).

Yes, it's the "when we don't know what the semantics of this combination of features should be, make it an error" rule. That way, when it sometimes eventually becomes clear what those semantics should be, we don't have a backwards compatibility problem.

That's how D started without templates and ended up having one of the best template systems around. The same, I think, will happen with macros.

Andrei
Feb 01 2007
prev sibling next sibling parent Chad J <gamerChad _spamIsBad_gmail.com> writes:
Another thing I realized it would be nice to have is some intrinsic 
functions.

Here are some things I can think of off hand:

/// Saturated componentwise addition of two vectors.
adds( T vector1, T vector2 );

/// Multiply-add: componentwise vector1 * vector2 + vector3.
madd( T vector1, T vector2, T vector3 );

/// Shifts all bits in "vector" right or left, respectively.
shiftr( T vector );
shiftl( T vector );

/** Comparison routines that return masks.
     Compare the operands in the appropriate way.  If the comparison is
       true for a particular set of components, the resulting component
       will be filled with all 1's, otherwise it will be filled with all
       0's.
     These should be hardware-supported on CPUs with SSE and MMX, which are
       used on most desktops today and probably will be for a looong time.
*/
compareLess( T vector1, T vector2 ); // returns vector1 < vector2
compareLessEquals( T vector1, T vector2 ); // returns vector1 <= vector2
compareGreater( T vector1, T vector2 ); // returns vector1 > vector2
... (you get the idea)
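A scalar sketch of that mask convention (a hypothetical emulation only; real hardware does each comparison in one instruction, e.g. SSE's cmpltps):

```d
// Each component of the result is all 1 bits where the comparison
// holds, and all 0 bits where it does not.
int2 compareLess(int2 a, int2 b)
{
    int2 r;
    r.x = (a.x < b.x) ? -1 : 0;  // -1 has all bits set
    r.y = (a.y < b.y) ? -1 : 0;
    return r;
}
```

The masks can then be combined with the bitwise operators the proposal already defines, which is how branchless per-component selection is usually done.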


I'm looking forward to some slick SIMD support in D!  I'm getting 
excited just to see this considered.  *stoked*

Oh and speaking of intrinsics, std.intrinsics is lacking the kinda 
obvious rotate left and rotate right (rol/ror) :(
Jan 30 2007
prev sibling next sibling parent reply Knud Soerensen <4tuu4k002 sneakemail.com> writes:
I think it is better to make a very general framework for describing the
vector operations, because when we have a very general description
then the compiler is free to optimize as it sees fit.
You can think of it as an SQL for vector operations.
See http://all-technology.com/eigenpolls/dwishlist/index.php?it=10
There is also a suggestion for a general form of Swizzle operations
http://all-technology.com/eigenpolls/dwishlist/index.php?it=11

I think you forget that a program using vectors typically uses a lot 
of them and therefore needs to transform a lot at a time.

Example: take a model using 1000 vectors.

Your suggestion would have us define it like 
float3[1000] model;
and the vectors would be placed in memory as
x1 y1 z1
x2 y2 z2
x3 y3 z3
.
.
.

each vector following the other.

But now imagine that the most used operation in the program 
is a plan projection on the x-y plan, then we only need the x and y
coordinate.
So an arrangement in memory which looks like 
x1 x2 x3 . . .
y1 y2 y3 . . .
z1 z2 z3 . . .
would result in that the cpu could specific load only the x and the y
coordinates into the cache, which would speed up the calculations a lot.

So, if one could just specify 3x1000 floats and let the compiler
arrange them in memory, that would be optimal.
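The structure-of-arrays layout described above can be sketched in C as follows (the names `ModelSoA` and `project_xy` are hypothetical; a plane projection here is simply dropping z, so the pass never touches the z array):

```c
#include <stddef.h>

enum { N = 1000 };

/* Structure-of-arrays layout: the model's x, y and z coordinates live
   in three separate contiguous arrays, so a pass that only needs x and
   y never pulls z into the cache. */
typedef struct {
    float x[N];
    float y[N];
    float z[N];
} ModelSoA;

/* x-y plane projection: reads only the x and y arrays. */
static void project_xy(const ModelSoA *m, float outx[N], float outy[N]) {
    for (size_t i = 0; i < N; ++i) {
        outx[i] = m->x[i];
        outy[i] = m->y[i];
    }
}
```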
Feb 02 2007
next sibling parent janderson <askme me.com> writes:
Knud Soerensen wrote:


 You can think at it as a SQL for vector operations.
 See http://all-technology.com/eigenpolls/dwishlist/index.php?it=10
 There is also a suggestion for a general form of Swizzle operations
 http://all-technology.com/eigenpolls/dwishlist/index.php?it=11
 
 I think that you forget that a program using vectors typical use a lot 
 of them and therefor it needs to transform a lot at a time.
 
 Exsample take a models using 1000 vectors.
 
 Your suggestion would suggest defining it like 
 float3 model[1000].
 and it would suggest that they where placed in memory as
 x1 y1 z1
 x2 y2 z2
 x3 y3 z3
 .
 .
 .
 
 each vector following each other.
 
 But now imagine that the most used operation in the program 
 is a plan projection on the x-y plan, then we only need the x and y
 coordinate.
 So an arrangement in memory which looks like 
 x1 x2 x3 . . .
 y1 y2 y3 . . .
 z1 z2 z3 . . .
 would result in that the cpu could specific load only the x and the y
 coordinates into the cache, which would speed up the calculations a lot.
 
 So, if one could specify that 3x1000 floats and let the compiler
 arrange them in memory that would be optimal.

I think this is a special case. Typically I'd say the programmer would be applying operations on several components at a time, and SIMD is set up this way. Maybe the compiler could detect the case you're talking about; however, I think eventually the elements in the array would want to be used as components. I do, however, think a long list of values (such as a vertex buffer) would gain from SIMD-optimized operations, especially on future hardware devices which will add more and more array support and parallelism.
 I think it is better to make very general framework for describing the
 vector operations, because when we have a very general description
 then the compiler is free to optimize as it sees fit.

This I agree with. The first versions need not provide SIMD optimizations; they just need to be done in such a way that these can be added later without changing our code. I can imagine situations in the future where we write, for instance, a multiply and then an add. The compiler sees this (re-arranging code as it needs to) and changes it to the SIMD muladd; i.e. arrays become just another numerical datatype the compiler can manipulate. This sort of thing would help reduce the micro-optimizations needed for this type of coding. -Joel
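The multiply-then-add pattern mentioned above can be sketched in C (the name `mul_then_add` is hypothetical; the point is that an optimizer is free to turn the loop body into a single fused multiply-add, e.g. an FMA instruction, without any change to the source):

```c
/* A separate multiply and add over arrays: a compiler targeting
   hardware with muladd/FMA support may fuse the two operations in the
   loop body into one instruction. */
static void mul_then_add(const float *a, const float *b, const float *c,
                         float *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];   /* candidate for muladd fusion */
}
```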
Feb 02 2007
prev sibling parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Knud Soerensen wrote:
 I think it is better to make very general framework for describing the
 vector operations, because when we have a very general description 
 then the compiler is free to optimize as it sees fit.
 You can think at it as a SQL for vector operations.

Well, I can't say that I share your enthusiasm for the power of necromantic compiler optimizations. If you want to see what a vectorizing compiler looks like, take a gander at: http://www.codeplay.com/

VectorC is a C compiler which focuses on architecture-specific SIMD optimizations. The problem with their approach is that it is too general. If you want to create optimized vector code, you need to tweak things just-so to make the compiler understand it. This means tedious trial-and-error optimization sessions where you end up rewriting the same piece of C code 50 times in order to get the compiler to spit out the right assembler. To facilitate this process, they provide several tools and pragmas you can use to give the compiler hints. By the time you are done, you have a very fast vector-optimized SSE function that only works on one specific compiler. Good luck porting your results to GCC!

Does it save you time at the end of the day? Maybe, but what happens when some future release changes the optimizer and your code no longer comes out all clean and vectorized? You have to go back and repeat the process from scratch. For the amount of hair pulling these compilers cause, I would just as soon write the damn code in assembler and be done with it once and for all.

The basic issue with VectorC and all the other vectorizing compilers is that they try to do too much. Everyone seems so hell-bent on creating this hyper-general-super-chocolate-coated-vectorizing-optimizer, when in reality it's not even necessary. Small geometric vectors are far more common, and improving their utility is a much bigger win for less effort. We need a convenient and fast set of arithmetic routines for doing simple geometric stuff, not an overarching super vector-framework.
 ... imagine that the most used operation in the program 
 is a plan projection on the x-y plan, then we only need the x and y
 coordinate.
 So an arrangement in memory which looks like 
 x1 x2 x3 . . .
 y1 y2 y3 . . .
 z1 z2 z3 . . .
 would result in that the cpu could specific load only the x and the y
 coordinates into the cache, which would speed up the calculations a lot.
 
 So, if one could specify that 3x1000 floats and let the compiler
 arrange them in memory that would be optimal.

This doesn't strike me as very realistic. How is the compiler ever going to figure out that you wanted to project these vectors onto the x-y plane somewhere down the road? This is a very difficult global optimization, and I can think of no way to perform it without direct programmer intervention.

Moreover, I'm not sure why you split up the x-y components. A much better layout would look like this:

x y x y x y x y x y ...
z z z ....

Now the alignment for each vector is such that you could load 2 x-y pairs into a single SSE register at once. This is not only better for caching, but it is also better for performance.

However, neither the layout you proposed nor the one above makes much sense. Typically you want to do more with vectors than just drop the z-component. You need to perform matrix multiplication, normalization, dot and cross products, etc. Each of these operates most efficiently when all vector components are packed into a single SIMD register. Therefore it seems clear that the preferred layout ought to be:

x y z w x y z w ....

I can't really fathom why you would want to go through the extra trouble of splitting up each component. Not only does it make arithmetic less efficient, but it also creates a bookkeeping nightmare. Imagine trying to grow the total number of vectors in your original layout! You would need to do multiple costly memcpy operations.

For low-dimension vectors, the necessary compiler optimizations are obvious, and there is one clear 'best' data layout. We don't need any fancy compiler magic, since the code can be directly translated into vector operations just like ordinary arithmetic.
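The packed x y z w layout argued for above can be sketched in C (the names `vec4` and `dot4` are hypothetical illustrations; the point is that every operand for one result sits in one contiguous, SIMD-register-sized chunk):

```c
/* Packed array-of-structures layout: each vector's four components are
   adjacent in memory, so one 16-byte SSE load fetches a whole vector,
   and growing the array is a single realloc rather than a
   per-component shuffle. */
typedef struct { float x, y, z, w; } vec4;

/* Dot product on the packed layout: all inputs for one result come
   from a single contiguous vector. */
static float dot4(const vec4 *a, const vec4 *b) {
    return a->x * b->x + a->y * b->y + a->z * b->z + a->w * b->w;
}
```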
Feb 03 2007
parent janderson <askme me.com> writes:
Mikola Lysenko wrote:
 Knud Soerensen wrote:
 I think it is better to make very general framework for describing the
 vector operations, because when we have a very general description 
 then the compiler is free to optimize as it sees fit.
 You can think at it as a SQL for vector operations.

Moreover, I'm not sure why you split up the x-y components. A much better layout would look like this: x y x y x y x y x y ... z z z .... Now the alignment for each vector is such that you could load 2 x-y pairs into a single SSE register at once. This is not only better for caching, but it is also better for performance. However, neither the layout you proposed or the one above make much sense. Typically you want to do more with vectors than just drop the z-component. You need to perform matrix multiplication, normalization, dot and cross products etc. Each of these operates most efficiently when all vector components are packed into a single SIMD register. Therefore it seems clear that the preferred layout ought to be: x y z w x y z w .... I can't really fathom why you would want to go through the extra trouble of splitting up each component. Not only does it make arithmetic less efficient, but it also creates a book keeping nightmare. Imagine trying to grow the total number of vectors in your original layout! You would need to do multiple costly memcpy operations. For low dimension vectors, the necessary compiler optimizations are obvious, and there is one clear 'best' data layout. We don't need any fancy compiler magic, since the code can be directly translated into vector operations just like ordinary arithmetic.

I couldn't agree more with this last part. However, I must say that for very particular problems in the past I've found that splitting structs (vectors or whatever) up by their components can have significant performance gains when you're doing batch operations on one component, due to the cache. However, this is such a special case that it would be very difficult for the compiler to figure out. It's something I believe the programmer should do, not the compiler. -Joel
Feb 03 2007
prev sibling parent Knud Soerensen <4tuu4k002 sneakemail.com> writes:
On Sat, 03 Feb 2007 13:18:55 -0500, Mikola Lysenko wrote:

 Knud Soerensen wrote:
 I think it is better to make very general framework for describing the
 vector operations, because when we have a very general description 
 then the compiler is free to optimize as it sees fit.
 You can think at it as a SQL for vector operations.

Well, I can't say that I share your enthusiasm for the power of necromantic compiler optimizations. If you want to see what a vectorizing compiler looks like, take a gander at: http://www.codeplay.com/ VectorC is a C compiler which focuses on architecture specific SIMD optimizations. The problem with their approach is that it is too general. If you want to create optimized vector code, you need to tweak things just-so to make the compiler understand it. This means tedious trial-and-error optimization sessions where you end up rewriting the same piece of C-code 50 times in order to get the compiler to spit out the right assembler. To facilitate this process, they provide several tools and pragmas you can use to give the compiler hints. By the time you are done, you have a very fast vector optimized SSE function that only works on one specific compiler. Good luck porting your results it to GCC! Does it save you time at the end of the day? Maybe, but what happens when some future releases changes the optimizer and your code no longer comes out all clean and vectorized? You have to go back and repeat the process from scratch. For the amount of hair pulling these compilers cause I would just as soon write the damn code in assembler and be done with it once and for all.

We need a way to write the vector code once and be able to compile it on many architectures with good results.
 The basic issue with VectorC and all the other vectorizing compilers is 
 that they try to do too much.  Everyone seems so hell bent on creating 
 this hyper-general-super-chocolate-coated-vectorizing-optimizer, when in 
 reality it's not even necessary.  Small geometric vectors are far more 
 common, and improving their utility is a much bigger win for less 
 effort.

That might be true for your projects but not for mine.
  We need a convenient and fast set of arithmetic routines for 
 doing simple geometric stuff, not an overarching super vector-framework.

Actually, what I need is an overarching super TENSOR-framework ;-)
 ... imagine that the most used operation in the program 
 is a plan projection on the x-y plan, then we only need the x and y
 coordinate.
 So an arrangement in memory which looks like 
 x1 x2 x3 . . .
 y1 y2 y3 . . .
 z1 z2 z3 . . .
 would result in that the cpu could specific load only the x and the y
 coordinates into the cache, which would speed up the calculations a lot.
 
 So, if one could specify that 3x1000 floats and let the compiler
 arrange them in memory that would be optimal.

This doesn't strike me as very realistic. How is the compiler ever going to figure out that you wanted to project these vectors onto the x-y plane somewhere down the road? This is a very difficult global optimization, and I can think of no way to perform it without direct programmer intervention.

Yes, the optimization might be difficult, but the point is that the code should not impose artificial limitations on the compiler. If the compiler knows how to make the optimization, then it is free to make it.
 Moreover, I'm not sure why you split up the x-y components.  A much 
 better layout would look like this:
 
 x y x y x y x y x y ...
 z z z ....
 
 Now the alignment for each vector is such that you could load 2 x-y 
 pairs into a single SSE register at once.  This is not only better for 
 caching, but it is also better for performance.

Good, that's a better example :-)
 However, neither the layout you proposed or the one above make much 
 sense.  

I did not mean to propose a specific layout; what I tried to say was that the compiler should be free to choose the layout which fits the problem.
 Typically you want to do more with vectors than just drop the 
 z-component.  You need to perform matrix multiplication, normalization, 
 dot and cross products etc.  Each of these operates most efficiently 
 when all vector components are packed into a single SIMD register. 
 Therefore it seems clear that the preferred layout ought to be:
 
 x y z w x y z w ....

Then the compiler should use this layout for those typical problems, but we should not limit the compiler to generating sub-optimal code for non-typical problems.
 I can't really fathom why you would want to go through the extra trouble 
 of splitting up each component.  Not only does it make arithmetic less 
 efficient, but it also creates a book keeping nightmare.  Imagine trying 
 to grow the total number of vectors in your original layout!  You would 
 need to do multiple costly memcpy operations.

Imagine trying to grow the number of dimensions in your layout ;-)
 For low dimension vectors, the necessary compiler optimizations are 
 obvious, and there is one clear 'best' data layout.  We don't need any 
 fancy compiler magic, since the code can be directly translated into 
 vector operations just like ordinary arithmetic.

Yes, good performance for typical problems is important, but we should be careful not to choose a notation which ruins the performance for non-typical problems.
Feb 03 2007