
digitalmars.D - Small Vectors Proposal

reply Mikola Lysenko <mclysenk mtu.edu> writes:
Coordinate geometry is a fundamental part of mathematics.  Many 
important problems in computer science are easily expressed in terms of 
vectors and products.  There are many domains which could benefit from 
enhanced vector performance, especially in scientific and graphical 
applications.

Despite their importance, very few languages provide support for vector 
types.  The prevailing notion is that it should be up to the compiler to 
optimize and somehow automatically determine where SIMD code can be 
applied.  The problem with this approach is that it is often very 
difficult for the compiler to spot where a SIMD operation can be 
applied.  Code must be carefully tweaked to make the compiler recognize 
the correct possible optimization.  Even then, many programmers still 
write their own low-dimensional vector classes for convenience, and 
perform all SIMD optimizations by hand in assembler.

It is important to make the distinction between low dimension vectors 
and other higher order arrays.  This is crucial since the higher 
dimensional arrays are often more tenuously connected with any sort of 
geometric or physical interpretation.  Moreover, many architectures are 
specially optimized for the lower dimensional cases, and offer special 
registers which are virtually impossible to properly exploit using 
libraries alone.  The situation is analogous to floating point 
arithmetic: proper hardware support requires language-level integration. 
Shader languages (Cg/GLSL) already provide extremely robust vector 
arithmetic support, and give some direction to this proposal.  Here are
the proposed language extensions:

1. Additional primitive types

All numerical primitive types in the language will be supplemented with 
3 additional vector extensions.  Each vector extension will contain 2, 3 
or 4 components since these are best supported by hardware, and the most 
relevant geometrically.  The initial value for each component in each 
type will be the same as that of its base type.  The size of each 
vector type will be equal to the size of its base type times its 
dimension.  Here is a complete list of the new types to be added:

Integer types:
byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
ucent3, ucent4

Floating types:
float2, float3, float4, double2, double3, double4, real2, real3, real4

Imaginary types:
ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, 
ireal4

Complex types:
cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, 
creal4

All new vector types are pass-by-value, and may be used in combination 
with any number of other types.  The following expressions would all be 
valid:

int2[] a;
float4* b;
char[][ucent4] c;


2. Vector literals

It must be possible to declare a vector literal unambiguously, whether 
as an argument to a function or in any number of other situations.  This 
syntax
must be concise since it will be used often, and it must be flexible, 
allowing vectors to be constructed via concatenation.  In this light, 
the following is proposed:

float4(1, 2, 3, 4);
int2(100, 200);
creal3(1 + 2i, 3+4i, 5-6i);

Additionally, this syntax will also support the construction of vectors 
from other sub-vectors.  This can be useful for augmenting a 3-vector 
with a fourth homogeneous component for transformation, amongst other 
things.  Here are some examples:

float4(float3(x, y, z), 1);		// returns (x, y, z, 1)
int3(0, int2(a, b));			// returns (0, a, b)
creal4(ireal2(r, z), real2(p, q));	// returns (r, z, p, q)


3. Casting rules

Vector types will obey the same casting rules as their base type within 
a single dimension.  It is not possible to use a cast to change the 
dimension of a vector.

cast(float2)(int2(1, 2));	//ok
cast(int)(float2(3, 4));	//error, int is not the same dimension as float2
real3(a, b, c) + ireal3(d, e, f);	//ok, type of expression is creal3


4. Component-Wise Arithmetic

All of the arithmetic operators on primitive types will be extended such 
that when applied to vectors they operate on each component 
independently.  The size and types of the vectors must be compatible 
with the operator.

float4(1, 2, 3, 4) + float4(3, 5, 6, 7);	// Vector-Vector operation, returns (4, 7, 9, 11)
float3(1, 5, 6) * float3(1, 0, 2);		// returns (1, 0, 12)
uint2(0xFF00FF, 0x00FF88) & uint2(0xFF, 0xFF);	// returns (0xFF, 0x88)

int3(x, y, z) * int2(a, b);			// error, size of vectors does not match

Additionally, scalar vector operations will be supported.  Each 
scalar-vector operator is applied per-component within the vector.

int2(0, 3) * 5;					// Vector-Scalar operation, returns (0, 15)
byte2(2, 5) + 6;				// returns (8, 11)

Note that the only two comparison operators allowed on vectors are == 
and !=.  The problem with <, >, etc. is that there is no commonly agreed 
upon definition for such things.  Therefore the operators are omitted to 
avoid confusion.

int2(1, 2) == int2(1, 2);		//ok, evaluates to true
float3(1, 0, 0) < float3(0, 0, 1);	// error, < is undefined for vectors


5. Swizzle operations

In many vector applications, it is often necessary to reorder the 
components.  In shader languages this is accomplished via 'swizzle' 
properties in each vector type.  Each swizzle selects a set of 1-4 
vector components and places them into a new vector of the given 
dimension.  Conventionally these components have the following names:

x - first component
y - second component
z - third component
w - fourth component

A swizzle is then given as a property consisting of 1-4 components in a 
given order.  Here are some examples:

float2(a, b).yx;		// evaluates to (b, a)
float3(1, 2, 3).zzyx;		// evaluates to (3, 3, 2, 1)
byte4('a', 'b', 'c', 'd').w;	// evaluates to 'd'
creal3(1 + 2i, 0, 4i).zzzz;	// evaluates to (4i, 4i, 4i, 4i)

int2(1, 2).z;			// error, vector does not have 3 components

These can be useful when projecting homogeneous vectors into a lower 
dimensional subspace, or when computing various products.  Here is an
example of a cross product implemented using swizzles:

float3 cross(float3 a, float3 b)
{
	return a.yzx * b.zxy - a.zxy * b.yzx;
}

And here is a simple homogeneous projection:

float3 project(float4 hg)
{
	return hg.xyz / hg.w;
}


6. Standard Library Additions

The basic vector type additions have been kept as spartan as possible to 
avoid cluttering the basic language, while still allowing the maximum 
expressiveness.  Instead of defining various vector products as 
properties, they are implemented within a standard library extension. 
The reason for this is simple: many users may wish to redefine these 
products based on their own coordinate system.  Within the standard 
library, all vector products are performed in right-handed Euclidean 
space.

Vector specific functions will be placed in a new library std.vmath, and 
all functions in std.math will be extended to work on vectors as well 
as primitive types.  Here is a list of functions std.vmath will support
along with a brief description:

dot(a, b) - Computes the dot product of two vectors:
	a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w

cross(a, b) - Computes the cross product as defined above.  Only defined 
for 3d vectors.

perp(a, b) - Computes the perp product of two vectors as (a.x * b.y - 
a.y * b.x).  Only defined for 2d vectors.

mag(a) - Computes the magnitude of a vector, given as:  sqrt(x * x + y * 
y + z * z + w * w);

mag2(a) - Computes the square of the magnitude of a vector.

norm(a) - Returns a / mag(a)

lerp(a, b, t) - Computes an interpolated position between two vectors: 
a * (1 - t) + b * t

Additionally, it would be convenient to have quaternion functions 
supported in the standard library.  A quaternion q would be a 4-vector 
with the following form:  q = q.w + q.x * i + q.y * j + q.z * k

qconj(a) - Computes the quaternion conjugate of a.  Given as 
float4(-a.xyz, a.w)

qmul(a, b) - Computes the quaternion product of two 4d vectors.

qrot(a, b) - Performs a quaternion rotation/dilation on the 3d vector b 
by the 4d quaternion a.


Finally, there are range manipulation functions:

clamp(a, min, max) - Clamps each component of a to the range [min, max]

minv(a) - Returns the smallest component in the vector a

maxv(a) - Returns the largest component in the vector a
Jan 30 2007
next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Mikola Lysenko wrote:
[snip]
 1. Additional primitive types
 
 byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
 long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
 ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
 ucent3, ucent4
 float2, float3, float4, double2, double3, double4, real2, real3, real4
 ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, 
 ireal4
 cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, 
 creal4

I think the arguing makes much sense, but as the proposal adds 57(!) new 
domain specific primitive types to the D language specification, I 
believe it is way too heavy.

For discussion, here is a not very thought out counter proposal:

1. Make the D 2.0 static array pass-by-value instead of its inconsistent 
and strange pseudo-reference/value type heritage from C.  (Pass by 
reference could be retained for extern(C) functions only.)

2. Define ABIs for SSE2, SSE3, etc platforms so that small static arrays 
can be allocated and passed in SIMD registers.

3. Implement vector operations for static arrays.

Advantages:

* The code would automatically be platform independent.  On a non 
SIMD-able platform, float[4], byte[8], and so on would all be well 
defined and work identically to the way they do on SIMD platforms.

* Future platforms would not require changes to the language 
specification.  float[8] could suddenly become SIMD-able for instance.

* Minor changes to the language specification.  Mostly ABI changes.

A much worse, but less radical proposal, would be to revive the C 
"register" keyword.  Instead of:

float4 x;

you get:

register float[4] x;

/Oskar
Jan 30 2007
next sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Oskar Linde wrote:
 Mikola Lysenko wrote:
 [snip]
 1. Additional primitive types

 byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
 long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
 ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
 ucent3, ucent4
 float2, float3, float4, double2, double3, double4, real2, real3, real4
 ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, 
 ireal3, ireal4
 cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, 
 creal3, creal4

 I think the arguing makes much sense, but as the proposal adds 57(!) 
 new domain specific primitive types to the D language specification, I 
 believe it is way too heavy.

 For discussion, here is a not very thought out counter proposal:

 1. Make the D 2.0 static array pass-by-value instead of its 
 inconsistent and strange pseudo-reference/value type heritage from C. 
 (Pass by reference could be retained for extern(C) functions only.)

 2. Define ABIs for SSE2, SSE3, etc platforms so that small static 
 arrays can be allocated and passed in SIMD registers.

 3. Implement vector operations for static arrays.

 Advantages:

 * The code would automatically be platform independent.  On a non 
 SIMD-able platform, float[4], byte[8], and so on would all be well 
 defined and work identically to the way they do on SIMD platforms.

 * Future platforms would not require changes to the language 
 specification.  float[8] could suddenly become SIMD-able for instance.

 * Minor changes to the language specification.  Mostly ABI changes.

I must say, I like this version better.
 A much worse, but less radical proposal, would be to revive the C 
 "register" keyword. Instead of:
 
 float4 x;
 
 you get:
 
 register float[4] x;

Something like this may unfortunately be necessary if backwards compatibility is to be maintained with pass-by-reference code. :(
Jan 30 2007
prev sibling next sibling parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Oskar Linde wrote:
 For discussion, here is a not very thought out counter proposal:
 
 1. Make the D 2.0 static array pass-by-value instead of it's 
 inconsistent and strange pseudo-reference/value type heritage from C. 
 (Pass by reference could be retained for extern(C) functions only).
 
 2. Define ABIs for SSE2, SSE3, etc platforms so that small static arrays 
 can be allocated and passed in SIMD registers.
 
 3. Implement vector operations for static arrays.
 

After thinking about it some more, I believe this approach would 
probably work just as well.

Another possibility is to simply restrict vectors to floating point 
types only: i.e., float2, float3, float4, double2, double3, double4, 
real2, real3, real4.  This makes the number of new types only 9 instead 
of 57.  The disadvantage is that it is no longer easy to use the MMX 
extensions, which are specially tailored for integer vectors.

Overall, I guess I'd be happy with either situation.
Jan 30 2007
parent reply Benji Smith <dlanguage benjismith.net> writes:
Mikola Lysenko wrote:
 After thinking about it some more, I believe this approach would 
 probably work just as well.  Another possibility is to simply restrict 
 vectors to floating point types only:
 
 ie, float2, float3, float4, double2, double3, double4, real2, real3, real4.
 
 This makes the number of new types only 9 instead of 57.  The 
 disadvantage is that it is no longer easy to use the MMX extensions 
 which are specially tailored for integer vectors.  Overall, I guess I'd 
 be happy with either situation.

Three and four dimensional vectors are nice for physics and whatnot, but 
for those of us in the machine-learning field, it's commonplace to 
construct vectors (and to calculate distances, dot-products, and cosines 
between those vectors) using many, many more dimensions.

Right now, I'm working on a statistical prediction algorithm that uses 
sparse 300,000-dimensional feature vectors.  There's no reason to limit 
SSE optimizations to small-dimensionality vectors.

I don't think having special datatypes is the right solution.  I think 
it should be possible to annotate any array with a flag that says "I'll 
be doing vector math here.  Watch out for possible optimizations".  Or 
just having a library of vector functions would probably be sufficient 
to solve the problem in the majority of cases.

--benji
Jan 30 2007
parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Benji Smith wrote:
 Right now, I'm working on a statistical prediction algorithm that uses 
 *sparse* 300,000-dimensional feature vectors. There's no reason to limit 
 SSE optimizations to small-dimensionality vectors.
 

I can understand your desire to create a universal solution, and for a 
long time I too was convinced this was the way to go.  However, I have 
come to understand that there are many practical differences between 
low and high dimension vectors.

Think about it this way: would you try to use a sparse matrix/vector 
library for doing operations on 4-d vectors and matrices?  It should be 
clear that these are two very different types of problems.  If we tried 
to use pass-by-value semantics and array operations on gigantic vectors 
the result would be equally bad.  Obviously, passing a 300k+ element 
array around by value is going to destroy your performance.  The classic 
solution is to create a series of custom routines for whichever problem 
you are trying to solve, or perhaps use a robust optimized library like 
BLAS for the heavy lifting.  High dimension linear algebra is a tricky 
subject, and it can take years of research to get algorithms with 
competitive performance.

On the other hand, low dimension vectors require relatively trivial 
compiler support, and are used far more often.  To get a rough estimate 
of their relative popularity, I invoke Google CodeSearch:

vector math -sparse : 282k hits
http://www.google.com/codesearch?q=vector+math+-sparse&hl=en&btnG=Search+Code

vector sparse : 31k hits
http://www.google.com/codesearch?hl=en&lr=&q=vector+sparse&btnG=Search

From this informal statistic, we could infer that low dimensional 
vectors are used at least an order of magnitude more frequently than 
their higher order counterparts.

I do not mean to say that fast sparse vector/matrix operations are 
unimportant; after all, a fast matrix solver can sometimes shave weeks 
off your processing time.  In such a situation, it is permissible for a 
library to be slightly cumbersome - just as long as it gets the job done 
fast.  Compare this to small vectors, where many applications place 
convenience and readability above performance.  Integrating small 
vectors into the core language will not only improve their usability, 
but it should also provide substantial performance gains.
Jan 30 2007
next sibling parent reply janderson <askme me.com> writes:
Mikola Lysenko wrote:
 Benji Smith wrote:
 Right now, I'm working on a statistical prediction algorithm that uses 
 *sparse* 300,000-dimensional feature vectors. There's no reason to 
 limit SSE optimizations to small-dimensionality vectors.

 I can understand your desire to create a universal solution, and for a 
 long time I too was convinced this was the way to go.  However I have 
 come to understand that there are many practical differences between 
 low and high dimension vectors.
 [snip]
 Integrating small vectors into the core language will not only improve 
 their usability, but it should also provide substantial performance 
 gains.

I don't think it's critical that high order matrix multiplications get 
the same performance gains as smaller arrays that use the SIMD.  It's 
just important that it's consistent.  Putting vector operations into 
arrays gives us more power because we get all the standard array 
operations as well (I concede that these operations could be put into 
the base arrays as well).

I hope that eventually dynamic arrays will get SIMD support as well. 
That way you could, for instance, load an entire list of vertexes (or 
whatever) and apply an operation on all of them.  D could even do this 
in parallel.  (I concede here that you could probably do that with an 
array of these base types.  Also, vectors with 4 parts can be treated 
somewhat differently in that the 4th component is sometimes used for 
other reasons.)

I generally think that representing vector types as arrays makes the 
language more powerful than having individual special case types.

-Joel
Feb 01 2007
parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
janderson wrote:
 Mikola Lysenko wrote:
 
 Benji Smith wrote:

 Right now, I'm working on a statistical prediction algorithm that 
 uses *sparse* 300,000-dimensional feature vectors. There's no reason 
 to limit SSE optimizations to small-dimensionality vectors.

 I can understand your desire to create a universal solution, and for a 
 long time I too was convinced this was the way to go.
 [snip]

 I don't think it's critical that high order matrix multiplications get 
 the same performance gains as smaller arrays that use the SIMD.  It's 
 just important that it's consistent.
 [snip]
 I generally think that representing vector types as arrays makes the 
 language more powerful than having individual special case types.

I think we should have array operations for dynamic arrays 
(n-dimensional vectors) as well, but distinct from lo-D static arrays. 
The reason is the same as others state: hi-D arrays can spend some fixed 
overhead setting things up before jumping into a gigantic loop, whereas 
lo-D arrays get optimized much differently, with no fixed overhead and 
possibly higher running overhead or less generality.  They are two 
different beasts.

Since hi-D arrays can deal with some overhead at startup, they are ideal 
candidates for a library implementation of array operations.  There are 
only one or two features D would need to get there and have this 
seamless library implementation (maybe they are done already?? I 
forget), and that may be an easier and more general route than trying to 
make compiler writers (Walter) write compiler support for both lo-D and 
hi-D optimization cases.
Feb 01 2007
parent Benji Smith <dlanguage benjismith.net> writes:
Chad J wrote:
 I think we should have array operations for dynamic arrays 
 (n-dimensional vectors) as well, but distinct from lo-D static arrays. 
 The reason is the same as others state: hi-D arrays can spend some fixed 
 overhead setting things up before jumping into a gigantic loop, whereas 
 lo-D arrays get optimized much differently with no fixed overhead and 
 possibly higher running overhead or less generality.  They are two 
 different beasts.

The more I think about it, the more I agree.

Hi-D vectors are almost always sparse, so the feature/magnitude pairs 
are usually stored in a HashMap (or some other similar container) rather 
than being represented in arrays.  Adding compiler optimizations for 
hi-D vector operators would mean knowing the underlying implementation 
of the container, and that'd obviously be bad.

But I don't think vector optimizations should be constrained to 3D or 4D 
vectors.  Generalize the optimizations on array types, and then any code 
that performs vector operations on those arrays automatically gets the 
performance boost, regardless of the dimensionality of their vectors.

--benji
Feb 07 2007
prev sibling parent reply Don Clugston <dac nospam.com.au> writes:
Mikola Lysenko wrote:
 Benji Smith wrote:
 Right now, I'm working on a statistical prediction algorithm that uses 
 *sparse* 300,000-dimensional feature vectors. There's no reason to 
 limit SSE optimizations to small-dimensionality vectors.

 I can understand your desire to create a universal solution, and for a 
 long time I too was convinced this was the way to go.  However I have 
 come to understand that there are many practical differences between 
 low and high dimension vectors.  Think about it this way: would you try 
 to use a sparse matrix/vector library for doing operations on 4-d 
 vectors and matrices?  It should be clear that these are two very 
 different types of problems.

I completely agree.  For large vectors and matrices, you're going to 
have to use a loop, and the loop startup costs are amortised; you can 
use the same BLAS code for a 3159 element vector as for a 3158 element 
one, and it makes negligible difference to the speed.  But for 3 and 4 
element vectors it makes an enormous difference.

And the applications are completely different.  I imagine games 
programmers don't use much above 4x4 matrices.
Feb 01 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Don Clugston wrote:
 Mikola Lysenko wrote:
 Benji Smith wrote:
 Right now, I'm working on a statistical prediction algorithm that 
 uses *sparse* 300,000-dimensional feature vectors. There's no reason 
 to limit SSE optimizations to small-dimensionality vectors.

 I can understand your desire to create a universal solution, and for a 
 long time I too was convinced this was the way to go.
 [snip]

 I completely agree.  For large vectors and matrices, you're going to 
 have to use a loop, and the loop startup costs are amortised.
 [snip]
 And the applications are completely different.  I imagine games 
 programmers don't use much above 4x4 matrices.

I'd say that's not necessarily true, with games using more and more 
realistic physics, and even with run of the mill skeletal animation 
techniques.  But in those cases you're talking arbitrary N-vectors, so 
in any event the most common dimensions for data are: 2 (screen/texture 
space), 3 (3D world space, homogeneous 2D space, color space), 4 
(homogeneous 3D world space), and N (physics and everything else).

--bb
Feb 01 2007
prev sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Oskar Linde wrote:
 Mikola Lysenko wrote:
 [snip]
 1. Additional primitive types

 byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
 long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
 ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
 ucent3, ucent4
 float2, float3, float4, double2, double3, double4, real2, real3, real4
 ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, 
 ireal3, ireal4
 cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, 
 creal3, creal4

 I think the arguing makes much sense, but as the proposal adds 57(!) 
 new domain specific primitive types to the D language specification, I 
 believe it is way too heavy.

Yes, I agree with Oskar and Walter that all those 57 primitive type 
names are not needed in the core language, and that instead float[4] 
should act like your float4.  But for users, this kind of alias should 
behave as expected:

alias float[4] float4;

And a std.vecmath library could define all 57 of the aliases above.

--bb
Jan 30 2007
prev sibling next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Mikola Lysenko wrote:
 All new vector types are pass-by-value, and may be used in combination 
 with any number of other types.  The following expressions would all be 
 valid:
 

 char[][ucent4] c;

[snip]
 Note that the only two comparison operators allowed on vectors are == 
 and !=.  The problem with <, >, etc. is that there is no commonly agreed 
 upon definition for such things.  Therefore the operators are omitted to 
 avoid confusion.
 
 int2(1, 2) == int2(1, 2);        //ok, evaluates to true
 float3(1, 0, 0) < float3(0, 0, 1);    // error, < is undefined for vectors

These two sections are contradictory: associative arrays need an 
ordering to be defined on the key type.

How about a simple lexicographical order, with two non-equal vectors (of 
the same dimensionality) comparing as their first non-equal component 
would?  (That's how comparisons work for arrays, IIRC.)

Other than that I don't see any obvious problems with it.
Jan 30 2007
parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Frits van Bommel wrote:
 Mikola Lysenko wrote:
 All new vector types are pass-by-value, and may be used in combination 
 with any number of other types.  The following expressions would all 
 be valid:

 char[][ucent4] c;

[snip]
 Note that the only two comparison operators allowed on vectors are == 
 and !=.  The problem with <, >, etc. is that there is no commonly 
 agreed upon definition for such things.  Therefore the operators are 
 omitted to avoid confusion.

 int2(1, 2) == int2(1, 2);        //ok, evaluates to true
 float3(1, 0, 0) < float3(0, 0, 1);    // error, < is undefined for 
 vectors

 These two sections are contradictory: associative arrays need an 
 ordering to be defined on the key type.  How about a simple 
 lexicographical order, with two non-equal vectors (of the same 
 dimensionality) comparing as their first non-equal component would? 
 (That's how comparisons work for arrays, IIRC.)  Other than that I 
 don't see any obvious problems with it.

Hmm... Fair enough. We could arbitrarily define an ordering on a per-component level so associative arrays work. I could see this being confusing in some situations, but it would be worth it in order to make the types consistent. Therefore:

float3(x, y, z) < float3(a, b, c)

would translate to:

(x < a) ? true : ((y < b) ? true : (z < c))
Jan 30 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Mikola Lysenko wrote:
 Hmm...  Fair enough.  We could define ordering on a per component level 
 arbitrarily so associative arrays work.  I could see this being 
 confusing in some situations, but it would be worth it in order to make 
 the types consistent.  Therefore:
 
 float3(x, y, z) < float3(a, b, c)
 
 would translate to:
 
 (x < a) ? true : ((y < b) ? true : (z < c))

Your code is wrong when e.g. (x > a) and (y < b): it returns true while it should return false; x != a means ((x, y, z) < (a, b, c)) == (x < a).

Think of the order of words in a dictionary: first you compare the first character, if it's equal you go on to the second, and so on. The first character that differs determines the sort order. It'd be implemented something like this:

---
int compare(short3 a, short3 b)
{
    if (a.x != b.x) return a.x - b.x;
    if (a.y != b.y) return a.y - b.y;
    if (a.z != b.z) return a.z - b.z;
    return 0; // all elements equal, so the vectors are equal
}
---

Note 1: I used shorts to keep the code clear while avoiding overflow in the subtractions.
Note 2: This uses the opCmp/TypeInfo.compare return value convention: negative means a < b, positive means a > b, zero means equality.
Jan 30 2007
parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Frits van Bommel wrote:
 Mikola Lysenko wrote:
 Hmm...  Fair enough.  We could define ordering on a per component 
 level arbitrarily so associative arrays work.  I could see this being 
 confusing in some situations, but it would be worth it in order to 
 make the types consistent.  Therefore:

 float3(x, y, z) < float3(a, b, c)

 would translate to:

 (x < a) ? true : ((y < b) ? true : (z < c))

Your code is wrong when e.g. (x > a) and (y < b): it returns true while it should return false; x != a means ((x, y, z) < (a, b, c)) == (x < a).

Think of the order of words in a dictionary: first you compare the first character, if it's equal you go on to the second, and so on. The first character that differs determines the sort order. It'd be implemented something like this:

---
int compare(short3 a, short3 b)
{
    if (a.x != b.x) return a.x - b.x;
    if (a.y != b.y) return a.y - b.y;
    if (a.z != b.z) return a.z - b.z;
    return 0; // all elements equal, so the vectors are equal
}
---

Note 1: I used shorts to keep the code clear while avoiding overflow in the subtractions.
Note 2: This uses the opCmp/TypeInfo.compare return value convention: negative means a < b, positive means a > b, zero means equality.

Whoops! I wasn't really thinking too carefully when I wrote that. You're correct, a lexicographic ordering is definitely the simplest in this case.
Jan 30 2007
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Mikola Lysenko wrote:
 Whoops!  I wasn't really thinking too carefully when I wrote that. 
 You're correct, a lexicographic ordering is definitely the simplest in 
 this case.

It also fits really well with the "plain static array" proposed change: this is already what arrays do[1] :).

[1]: Well, if they're the same length. Otherwise they compare by length, presumably for efficiency.
Jan 30 2007
prev sibling next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Thank you, this is in line with what I thought you meant. I'm glad I was 
right. Your proposal is sensible and well thought out.

The only improvement I'd make is instead of:

float4 f;

Have:

float[4] f;

i.e. make static arrays of 2,3,4 dimensions have the suggested 
properties. I think this is quite doable.
Jan 30 2007
next sibling parent Mikola Lysenko <mclysenk mtu.edu> writes:
Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I was 
 right. Your proposal is sensible and well thought out.
 
 The only improvement I'd make is instead of:
 
 float4 f;
 
 Have:
 
 float[4] f;
 
 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

Yes, Oskar made a good point in his post. The main issue I was concerned with was the vector literal syntax. Currently array literals are allocated on the heap when you declare them, which would be prohibitively costly in most situations. In order for low-dimension vectors to be useful, they need a simple constructor syntax that does not allocate anything on the heap.
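For reference, the usual library-side workaround in today's D is a struct with a static opCall, so the "constructor" builds the value on the stack. This is only a sketch (the float3 struct here is hypothetical, not part of the proposal):

```d
// Sketch: a struct-based vector whose constructor syntax is a static
// opCall, so float3(x, y, z) builds the value on the stack -- no heap
// allocation, unlike an array literal.
struct float3
{
    float x, y, z;

    static float3 opCall(float x, float y, float z)
    {
        float3 v;
        v.x = x;
        v.y = y;
        v.z = z;
        return v;
    }
}

// float3 n = float3(0, 1, 0);  // pass-by-value, no GC traffic
```

The remaining pain point is exactly what the post describes: there is no literal syntax, so every construction goes through the opCall.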
Jan 30 2007
prev sibling next sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I was 
 right. Your proposal is sensible and well thought out.
 
 The only improvement I'd make is instead of:
 
 float4 f;
 
 Have:
 
 float[4] f;
 
 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays start being passed around by value.

What exactly makes small vectors unsuitable for an implementation via library elements? If we distill a list, maybe we can also figure out ways in which we can make the language better, not only one special case of it.

Andrei
Jan 30 2007
next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I 
 was right. Your proposal is sensible and well thought out.

 The only improvement I'd make is instead of:

 float4 f;

 Have:

 float[4] f;

 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.

I agree that it's a very bad idea to have such a discontinuity between ref and value. Static arrays would have to be all by-value or all by-ref. Then you'd just see slower performance for the larger ones.
Jan 30 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I 
 was right. Your proposal is sensible and well thought out.

 The only improvement I'd make is instead of:

 float4 f;

 Have:

 float[4] f;

 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.

I agree that it's a very bad idea to have such a discontinuity between ref and value. Static arrays would have to be all by-value or all by-ref. Then you'd just see slower performance for the larger ones.

So then arrays by ref and a stdlib template that wraps an array in a struct could make everyone happy. Andrei
Jan 30 2007
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I 
 was right. Your proposal is sensible and well thought out.

 The only improvement I'd make is instead of:

 float4 f;

 Have:

 float[4] f;

 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.

I agree that it's a very bad idea to have such a discontinuity in ref vs value. Static arrays would have to be all by value or all by ref. Then, you'll just see slower speed for the larger ones.

So then arrays by ref and a stdlib template that wraps an array in a struct could make everyone happy.

I disagree. What you describe is a way of working around a design issue instead of resolving it.

The C (and D) static array has always been an odd type. It is not assignable, but is passed by reference to functions. You cannot return static arrays from functions. Template libraries like std.bind have to add lots of extra code to special-case static arrays. Structs containing static arrays behave differently than static arrays themselves. Also, in D, the static array is the only type where typeof(T.init) != T.

If static arrays became a value type, all these irregularities could disappear. The only place the change would make any difference for old code is where static arrays are passed as function parameters. I believe the amount of code containing functions with static array arguments is very small in D. In fact, only two Phobos modules contain such functions (std.utf and std.md5). Converting those functions would be trivial: the by-ref behavior would be attainable with the inout and out parameter types that today are forbidden for static arrays.

/Oskar
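The conversion described in that last paragraph might look like this (a sketch with entirely hypothetical signatures, assuming static arrays gain value semantics as proposed):

```d
// Today: a static array parameter is implicitly passed by reference.
void scale(float[4] v, float s);

// Under the proposal: parameters are by-value by default, and a function
// that mutates its argument states the reference semantics explicitly.
void scale(inout float[4] v, float s);
```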
Feb 01 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Oskar Linde wrote:
 The C (and D) static array has always been an odd type. It is not 
 assignable, but is passed by reference to functions. You can not return 
 static arrays from functions. Template libraries like std.bind have to 
 add lots of extra code to special case the static arrays. Structs 
 containing static arrays behave differently than static arrays 
 themselves. Also, in D, the static array is the only type where 
 typeof(T.init) != T.
 
 If static arrays became a value type, all irregularities could disappear.

But then people will complain about the inconsistency between static arrays and dynamic arrays. Why are some passed by value and others by reference?
 The only place this change would make any difference for old code is 
 where static arrays are passed as function parameters. I believe the 
 amount of code containing functions with static array arguments is very 
 small in D. In fact, only two Phobos modules contain such functions 
 (std.utf and std.md5). Converting those functions would be trivial: The 
 by ref behavior would be attainable with the inout and out parameter 
 types that today are forbidden for static arrays.

Probably things could be made to work; I personally am unclear on what's the most self-consistent set of rules. Andrei
Feb 01 2007
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Oskar Linde wrote:
 If static arrays became a value type, all irregularities could disappear.

But then people will complain about the inconsistency between static arrays and dynamic arrays. Why are some passed by value and others by reference?

I don't see many complaints about the "inconsistency" between structs and classes (which is probably the closest analogue in the language currently).

Besides, static arrays are already more of a value type (typically allocated on the stack, directly in an aggregate, or in static data, like structs) and dynamic arrays more of a reference type (typically allocated on the heap, like classes). The biggest difference is that dynamic arrays _can_ also refer to static arrays.
Feb 01 2007
prev sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I 
 was right. Your proposal is sensible and well thought out.

 The only improvement I'd make is instead of:

 float4 f;

 Have:

 float[4] f;

 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.

That difference could be made using different notations:

void func( float[4] ar );           // by value
void funct(T, uint N)( T[N] ar );

void func( float[] ar );            // by ref
void funct(T)( T[] ar );

This makes sense, too, since "float[4] x;" is already allocated on the stack.
 What exactly makes small vectors unsuitable for an implementation via 
 library elements? If we distill a list, maybe we also can figure ways in 
 which we can make the language better, not only one special case of it.

This is already very doable, thanks to the possibility of calling global functions taking an array as if they were members of the array:

float Length( float[] ar );  // by ref

float[4] ar;
auto len = ar.Length();

L.
Jan 31 2007
prev sibling parent janderson <askme me.com> writes:
Walter Bright wrote:
 Thank you, this is in line with what I thought you meant. I'm glad I was 
 right. Your proposal is sensible and well thought out.
 
 The only improvement I'd make is instead of:
 
 float4 f;
 
 Have:
 
 float[4] f;
 
 i.e. make static arrays of 2,3,4 dimensions have the suggested 
 properties. I think this is quite doable.

I think some of the properties could be used with multiple dimensions. SIMD could still be used efficiently on these higher orders (just repeat the operation when you run out). This also opens up opportunities for parallel operations in the future. Anything else could go in the standard library.

Swizzle is the odd one out. It would have to be a special case. We can only support up to N characters with it, unless you do something fancy like:

float[8] value;
float[8] value2;

value.xyz = value2.x;       // xyz in value are set to value2.x
value.xyz[xyz] = value2.x;  // Sets both value[0-3] and value[4-7] to value2.x.

or

(value+4).xyz = value2.x;   // Just offset the array (value[4-7] are set to value2.x)

-Joel
Jan 31 2007
prev sibling next sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Having read the proposal, I think it can likely be accommodated entirely 
with current and up-and-coming language features. To expand:

Mikola Lysenko wrote:
 It is important to make the distinction between low dimension vectors 
 and other higher order arrays.  This is crucial since the higher 
 dimensional arrays are often more tenuously connected with any sort of 
 geometric or physical interpretation.

Well, I disagree here, but it's not important. Look outside your domain: there are people who think the future belongs to machine learning and statistical methods, which are rife with high-dimensional vectors. I think it's just shortsighted to claim that the geometrical and physical interpretation is more special than others.
 Moreover, many architectures are 
 specially optimized for the lower dimensional cases, and offer special 
 registers which are virtually impossible to properly exploit using 
 libraries alone.  The situation is analogous to floating point 
 arithmetic, proper hardware support requires language level integration.

I don't see much of an analogy, but this is also a minor point. Floating point arithmetic was weird way before it was implemented in hardware, and the weirdness was reflected in languages. But on many machines adding two floating point numbers required a routine call, which didn't add or subtract anything from the equation.
 1. Additional primitive types

Let's try: s/primitive types/types/ig and see what happens, just for the sake of experiment.
 All numerical primitive types in the language will be supplemented with 
 3 additional vector extensions.  Each vector extension will contain 2, 3 
 or 4 components since these are best supported by hardware, and the most 
 relevant geometrically.  The initial value for each component in each 
 type will be the same as that of its base type.  The size of each 
 vector type will be equal to the size of its base type times its 
 dimension.  Here is a complete list of the new types to be added:
 
 Integer types:
 byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, 
 long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, 
 ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, 
 ucent3, ucent4
 
 Floating types:
 float2, float3, float4, double2, double3, double4, real2, real3, real4
 
 Imaginary types:
 ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, 
 ireal4
 
 Complex types:
 cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, 
 creal4

My variant: define a vector_ops module:

struct small_vector(base, ubyte size)
{
    base[size] data;
    ...
}

alias small_vector!(byte, 2) byte2;
...
 All new vector types are pass-by-value, and may be used in combination 
 with any number of other types.

Check this one. Structs are passed by value, and various constructors and (up-and-coming) conversions will take care of the rest.
 The following expressions would all be 
 valid:
 
 int2[] a;
 float4* b;
 char[][ucent4] c;

Check that one.
 2. Vector literals
 
 It must be possible to declare a vector literal unambiguously as an 
 argument to a function, or in any number of other situations.  This syntax 
 must be concise since it will be used often, and it must be flexible, 
 allowing vectors to be constructed via concatenation.  In this light, 
 the following is proposed:
 
 float4(1, 2, 3, 4);
 int2(100, 200);
 creal3(1 + 2i, 3+4i, 5-6i);

Constructors. Check that one.
 Additionally, this syntax will also support the construction of vectors 
 from other sub vectors.  This can be useful for augmenting a 3-vector 
 with a fourth homogenous component for transformation amongst other 
 things.  Here are some examples:
 
 float4(float3(x, y, z), 1);        // returns (x, y, z, 1)
 int3(0, int2(a, b));            // returns (0, a, b)
 creal4(ireal2(r, z), real2(p, q));    // returns (r, z, p, q)

Constructors. Check that one too.
 3. Casting rules
 
 Vector types will obey the same casting rules as their base type within 
 a single dimension.  It is not possible to use a cast to change the 
 dimension of a vector.
 
 cast(float2)(int2(1, 2));    //ok
 cast(int)(float2(3, 4));    //error, int is not the same dimension as 
 float2
 real3(a, b) + ireal(c, d);    //ok, type of expression is creal3

opCast. Yup, check that sucker too.
 4. Component-Wise Arithmetic
 
 All of the arithmetic operators on primitive types will be extended such 
 that when applied to vectors they operate on each component 
 independently.  The size and types of the vectors must be compatible 
 with the operator.
 
 float4(1, 2, 3, 4) + float4(3, 5, 6, 7);     // Vector-Vector operation, 
 returns (4, 7, 9, 11)
 float3(1, 5, 6) * float3(1, 0, 2);        // returns (1, 0, 12)
 uint2(0xFF00FF, 0x00FF88) & uint2(0xFF, 0xFF);    // returns (0xFF, 0x88)
 
 int3(x, y, z) * int2(a, b);            // error, size of vectors does 
 not match

Overloaded operators. Check.
 Additionally, scalar vector operations will be supported.  Each 
 scalar-vector operator is applied per-component within the vector.
 
 int2(0, 3) * 5;                    // Vector-Scalar operation, returns 
 (0, 15)
 byte2(2, 5) + 6;                // returns (8, 11)

Overloaded operators. Check.
 Note that the only two comparison operators allowed on vectors are == 
 and !=.  The problem with <, >, etc. is that there is no commonly agreed 
 upon definition for such things.  Therefore the operators are omitted to 
 avoid confusion.
 int2(1, 2) == int2(1, 2);        //ok, evaluates to true
 float3(1, 0, 0) < float3(0, 0, 1);    // error, < is undefined for vectors

Check :o).
 5. Swizzle operations
 
 In many vector applications, it is often necessary to reorder the 
 components.  In shader languages this is accomplished via 'swizzle' 
 properties in each vector type.  Each swizzle selects a set of 1-4 
 vector components and places them into a new vector of the given 
 dimension.  Conventionally these components have the following names:
 
 x - first component
 y - second component
 z - third component
 w - fourth component
 
 A swizzle is then given as a property consisting of 1-4 components in a 
 given order.

This is more interesting. Today's D can't do it. Fortunately, code generation techniques that Walter works on right now (possibly literally at the time I'm writing this and/or at the time you're reading this) will enable comfortable generation of combinatorial functions, of which swizzles are actually an excellent motivating example.

 Here are some examples:
 
 float2(a, b).yx;        // evaluates to (b, a)
 float3(1, 2, 3).zzyx;        // evaluates to (3, 3, 2, 1)
 byte4('a', 'b', 'c', 'd').w;    // evaluates to 'd'
 creal3(1 + 2i, 0, 4i).zzzz;    // evaluates to (4i, 4i, 4i, 4i)
 
 int2(1, 2).z;            // error, vector does not have 3 components

Check (for D 2.0).
 These can be useful when projecting homogenous vectors into a lower 
 dimension subspace, or when computing various products.  Here is an 
 example of a cross product implemented using swizzles:
 
 float3 cross(float3 a, float3 b)
 {
     return a.zxy * b.yzx - a.yzx * b.zxy;
 }
 
 And here is a simple homogenous projection:
 
 float3 project(float4 hg)
 {
     return hg.xyz / hg.w;
 }

Yum. Love that.
 6. Standard Library Additions

Check by virtue of definition :o). Andrei
Jan 30 2007
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Having read the proposal, I think it can likely be accommodated entirely 
 with current and up-and-coming language features. To expand:

 Moreover, many architectures are specially optimized for the lower 
 dimensional cases, and offer special registers which are virtually 
 impossible to properly exploit using libraries alone.  The situation 
 is analogous to floating point arithmetic, proper hardware support 
 requires language level integration.

I don't see much of an analogy, but this is also a minor point. Floating point arithmetic was weird way before it was implemented in hardware, and the weirdness reflected in languages. But on many machines adding two floating point numbers required a routine call, which didn't add or subtract anything to the equation.

Your responses to the other points seem reasonable and encouraging, but your response to this issue concerning performance is... what? That making optimal use of the hardware is not important?

For more specifics of what Mikola finds problematic on the efficiency front see:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=47596

--bb
Jan 30 2007
parent "Andrei Alexandrescu (See Website for Email)" <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Having read the proposal, I think it can likely be accommodated 
 entirely with current and up-and-coming language features. To expand:

 Moreover, many architectures are specially optimized for the lower 
 dimensional cases, and offer special registers which are virtually 
 impossible to properly exploit using libraries alone.  The situation 
 is analogous to floating point arithmetic, proper hardware support 
 requires language level integration.

I don't see much of an analogy, but this is also a minor point. Floating point arithmetic was weird way before it was implemented in hardware, and the weirdness reflected in languages. But on many machines adding two floating point numbers required a routine call, which didn't add or subtract anything to the equation.

Your responses to the other points seem reasonable and encouraging, but your response to this issue concerning performance is... what? That making optimal use of the hardware is not important?

Nonono. I just meant: the fact that today's architectures have dedicated operations and hardware for floating point did not change the semantics of floating point operations in programming languages.
 For more specifics of what Mikola finds problematic on the efficiency 
 front see:
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=47596

Thanks. I'm seeing:

1. Function calls

I agree that optimization of compound operations a la d = a * b + c is hard, but it should be addressed in a principled way (e.g. expression templates).

2. Register allocation

This has been discussed in an older thread on comp.lang.c++.moderated. Basically current compiler technology conservatively puts every "serious" data structure (e.g. structs with constructors) in real memory, not registers. I see this as more of a limitation of current compiler technology than a fundamental issue. Of course, I do no work on improving said technology, which lessens my point. :o)

3. Data alignment

D must add primitives to define types aligned at arbitrary boundaries. This will not only help small vectors, but a host of other systems-level software as well (e.g. allocators).

Andrei
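As an aside on point 1, the expression-template idea can be sketched in today's D. This is only an illustration with hypothetical types (not anything the proposal specifies): the multiply-add is represented as a lazy proxy, and assignment evaluates it elementwise in a single fused pass, with no temporaries.

```d
// Hypothetical sketch of expression templates for d = a * b + c.
struct Vec4
{
    float[4] data;
}

// Lazy proxy: holds the operands, computes one component on demand.
struct MulAdd
{
    Vec4 a, b, c;

    float opIndex(int i)
    {
        return a.data[i] * b.data[i] + c.data[i];
    }
}

MulAdd mulAdd(Vec4 a, Vec4 b, Vec4 c)
{
    MulAdd e;
    e.a = a;
    e.b = b;
    e.c = c;
    return e;
}

// Assignment forces evaluation: one loop, no intermediate Vec4.
void assign(inout Vec4 dst, MulAdd e)
{
    for (int i = 0; i < 4; i++)
        dst.data[i] = e[i];
}
```

With overloaded operators returning such proxies, the compound expression fuses automatically; the hard part, as noted above, is getting the optimizer to then map the fused loop onto SIMD registers.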
Jan 30 2007
prev sibling parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Mikola Lysenko wrote:
 3. Casting rules

 Vector types will obey the same casting rules as their base type
 within a single dimension.  It is not possible to use a cast to change
 the dimension of a vector.

 cast(float2)(int2(1, 2));    //ok
 cast(int)(float2(3, 4));    //error, int is not the same dimension as float2
 real3(a, b) + ireal(c, d);    //ok, type of expression is creal3

opCast. Yup, check that sucker too.

I presume the original proposal would have allowed both cast(float2)(int2(1, 2)) and cast(real2)(int2(1, 2)). Currently, however, we can have only one opCast per struct/class. Is this a planned D 2.0 change I've missed, or how would this work?
Feb 01 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Deewiant wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Mikola Lysenko wrote:
 3. Casting rules

 Vector types will obey the same casting rules as their base type
 within a single dimension.  It is not possible to use a cast to change
 the dimension of a vector.

 cast(float2)(int2(1, 2));    //ok
 cast(int)(float2(3, 4));    //error, int is not the same dimension as float2
 real3(a, b) + ireal(c, d);    //ok, type of expression is creal3


I presume the original proposal would have allowed both cast(float2)(int2(1, 2)) and cast(real2)(int2(1, 2)). Currently, however, we can have only one opCast per struct/class. Is this a planned D 2.0 change I've missed, or how would this work?

It is a change that I personally think should be made. Walter devised the one-cast-per-class rule out of caution when facing an issue known to be potentially disastrous. Generally, the "when in doubt, don't put it in" approach turned out to be a winner for D --- the "length" Chernobyl disaster notwithstanding :o). As our grasp of the tradeoffs involved improves, the right choices will appear naturally. This is sort of what happened with opAssign.

Andrei
Feb 01 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 It is a change that I personally think should be made. Walter devised 
 the one cast per class rule out of caution when facing an issue known to 
 be potentially disastrous. Generally, the "when in doubt, don't put it 
 in" approach turned out to be a winner for D in general --- the "length" 
 Chernobyl disaster notwithstanding :o).

Yes, it's the "when we don't know what the semantics of this combination of features should be, make it an error" rule. That way, when it sometimes eventually becomes clear what those semantics should be, we don't have a backwards compatibility problem.
Feb 01 2007
parent "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 It is a change that I personally think should be made. Walter devised 
 the one cast per class rule out of caution when facing an issue known 
 to be potentially disastrous. Generally, the "when in doubt, don't put 
 it in" approach turned out to be a winner for D in general --- the 
 "length" Chernobyl disaster notwithstanding :o).

Yes, it's the "when we don't know what the semantics of this combination of features should be, make it an error" rule. That way, when it sometimes eventually becomes clear what those semantics should be, we don't have a backwards compatibility problem.

That's how D started without templates and ended up having one of the best template systems around. The same, I think, will happen with macros.

Andrei
Feb 01 2007
prev sibling next sibling parent Chad J <gamerChad _spamIsBad_gmail.com> writes:
Another thing I realized it would be nice to have is some intrinsic 
functions.

Here are some things I can think of off hand:

/// Saturated componentwise addition of two vectors.
adds( T vector1, T vector2 );

/// Multiply-add: componentwise vector1 * vector2 + vector3.
madd( T vector1, T vector2, T vector3 );

/// Shifts all bits in "vector" right or left, respectively.
shiftr( T vector );
shiftl( T vector );

/** Comparison routines that return masks.
     Compare the operands in the appropriate way.  If the comparison is
       true for a particular set of components, the resulting component
       will be filled with all 1's, otherwise it will be filled with all
       0's.
     These should be hardware-supported on CPUs with SSE and MMX, which are
       used on most desktops today and probably will be for a looong time.
*/
compareLess( T vector1, T vector2 ); // returns vector1 < vector2
compareLessEquals( T vector1, T vector2 ); // returns vector1 <= vector2
compareGreater( T vector1, T vector2 ); // returns vector1 > vector2
... (you get the idea)
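A scalar sketch of that mask convention (a hypothetical emulation only; real hardware does each comparison in one instruction, e.g. SSE's cmpltps):

```d
// Each component of the result is all 1 bits where the comparison
// holds, and all 0 bits where it does not.
int2 compareLess(int2 a, int2 b)
{
    int2 r;
    r.x = (a.x < b.x) ? -1 : 0;  // -1 has all bits set
    r.y = (a.y < b.y) ? -1 : 0;
    return r;
}
```

The masks can then be combined with the bitwise operators the proposal already defines, which is how branchless per-component selection is usually done.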


I'm looking forward to some slick SIMD support in D!  I'm getting 
excited just to see this considered.  *stoked*

Oh and speaking of intrinsics, std.intrinsics is lacking the kinda 
obvious rotate left and rotate right (rol/ror) :(
Jan 30 2007
prev sibling next sibling parent reply Knud Soerensen <4tuu4k002 sneakemail.com> writes:
I think it is better to make a very general framework for describing the
vector operations, because when we have a very general description
then the compiler is free to optimize as it sees fit.
You can think of it as an SQL for vector operations.
See http://all-technology.com/eigenpolls/dwishlist/index.php?it=10
There is also a suggestion for a general form of Swizzle operations
http://all-technology.com/eigenpolls/dwishlist/index.php?it=11

I think you forget that a program using vectors typically uses a lot 
of them and therefore needs to transform a lot at a time.

Example: take a model using 1000 vectors.

Your suggestion would have us define it like 
float3[1000] model;
and the vectors would be placed in memory as
x1 y1 z1
x2 y2 z2
x3 y3 z3
.
.
.

each vector following the other.

But now imagine that the most used operation in the program 
is a plan projection on the x-y plan, then we only need the x and y
coordinate.
So an arrangement in memory which looks like 
x1 x2 x3 . . .
y1 y2 y3 . . .
z1 z2 z3 . . .
would result in that the cpu could specific load only the x and the y
coordinates into the cache, which would speed up the calculations a lot.

So, if one could just specify 3x1000 floats and let the compiler
arrange them in memory, that would be optimal.
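The structure-of-arrays layout described above can be sketched in C as follows (the names `ModelSoA` and `project_xy` are hypothetical; a plane projection here is simply dropping z, so the pass never touches the z array):

```c
#include <stddef.h>

enum { N = 1000 };

/* Structure-of-arrays layout: the model's x, y and z coordinates live
   in three separate contiguous arrays, so a pass that only needs x and
   y never pulls z into the cache. */
typedef struct {
    float x[N];
    float y[N];
    float z[N];
} ModelSoA;

/* x-y plane projection: reads only the x and y arrays. */
static void project_xy(const ModelSoA *m, float outx[N], float outy[N]) {
    for (size_t i = 0; i < N; ++i) {
        outx[i] = m->x[i];
        outy[i] = m->y[i];
    }
}
```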
Feb 02 2007
next sibling parent janderson <askme me.com> writes:
Knud Soerensen wrote:


 You can think at it as a SQL for vector operations.
 See http://all-technology.com/eigenpolls/dwishlist/index.php?it=10
 There is also a suggestion for a general form of Swizzle operations
 http://all-technology.com/eigenpolls/dwishlist/index.php?it=11
 
 I think that you forget that a program using vectors typical use a lot 
 of them and therefor it needs to transform a lot at a time.
 
 Exsample take a models using 1000 vectors.
 
 Your suggestion would suggest defining it like 
 float3 model[1000].
 and it would suggest that they where placed in memory as
 x1 y1 z1
 x2 y2 z2
 x3 y3 z3
 .
 .
 .
 
 each vector following each other.
 
 But now imagine that the most used operation in the program 
 is a plan projection on the x-y plan, then we only need the x and y
 coordinate.
 So an arrangement in memory which looks like 
 x1 x2 x3 . . .
 y1 y2 y3 . . .
 z1 z2 z3 . . .
 would result in that the cpu could specific load only the x and the y
 coordinates into the cache, which would speed up the calculations a lot.
 
 So, if one could specify that 3x1000 floats and let the compiler
 arrange them in memory that would be optimal.

I think this is a special case. Typically I'd say the programmer would be applying operations on several components at a time, and SIMD is set up this way. Maybe the compiler could detect the case you're talking about; however, I think eventually the elements in the array would want to be used as components. I do, however, think a long list of values (such as a vertex buffer) would gain from SIMD-optimized operations, especially on future hardware devices which will add more and more array support and parallelism.
 I think it is better to make very general framework for describing the
 vector operations, because when we have a very general description
 then the compiler is free to optimize as it sees fit.

This I agree with. The first versions need not provide SIMD optimizations; they just need to be done in such a way that these can be added later without changing our code. I can imagine situations in the future where we write, for instance, a multiply and then an add. The compiler sees this (re-arranging code as it needs to) and changes it to the SIMD muladd; i.e. arrays become just another numerical datatype the compiler can manipulate. This sort of thing would help reduce the micro-optimizations needed for this type of coding. -Joel
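The multiply-then-add pattern mentioned above can be sketched in C (the name `mul_then_add` is hypothetical; the point is that an optimizer is free to turn the loop body into a single fused multiply-add, e.g. an FMA instruction, without any change to the source):

```c
/* A separate multiply and add over arrays: a compiler targeting
   hardware with muladd/FMA support may fuse the two operations in the
   loop body into one instruction. */
static void mul_then_add(const float *a, const float *b, const float *c,
                         float *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];   /* candidate for muladd fusion */
}
```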
Feb 02 2007
prev sibling parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Knud Soerensen wrote:
 I think it is better to make very general framework for describing the
 vector operations, because when we have a very general description 
 then the compiler is free to optimize as it sees fit.
 You can think at it as a SQL for vector operations.

Well, I can't say that I share your enthusiasm for the power of necromantic compiler optimizations. If you want to see what a vectorizing compiler looks like, take a gander at: http://www.codeplay.com/

VectorC is a C compiler which focuses on architecture-specific SIMD optimizations. The problem with their approach is that it is too general. If you want to create optimized vector code, you need to tweak things just-so to make the compiler understand it. This means tedious trial-and-error optimization sessions where you end up rewriting the same piece of C code 50 times in order to get the compiler to spit out the right assembler. To facilitate this process, they provide several tools and pragmas you can use to give the compiler hints. By the time you are done, you have a very fast vector-optimized SSE function that only works on one specific compiler. Good luck porting your results to GCC!

Does it save you time at the end of the day? Maybe, but what happens when some future release changes the optimizer and your code no longer comes out all clean and vectorized? You have to go back and repeat the process from scratch. For the amount of hair pulling these compilers cause, I would just as soon write the damn code in assembler and be done with it once and for all.

The basic issue with VectorC and all the other vectorizing compilers is that they try to do too much. Everyone seems so hell-bent on creating this hyper-general-super-chocolate-coated-vectorizing-optimizer, when in reality it's not even necessary. Small geometric vectors are far more common, and improving their utility is a much bigger win for less effort. We need a convenient and fast set of arithmetic routines for doing simple geometric stuff, not an overarching super vector-framework.
 ... imagine that the most used operation in the program 
 is a plan projection on the x-y plan, then we only need the x and y
 coordinate.
 So an arrangement in memory which looks like 
 x1 x2 x3 . . .
 y1 y2 y3 . . .
 z1 z2 z3 . . .
 would result in that the cpu could specific load only the x and the y
 coordinates into the cache, which would speed up the calculations a lot.
 
 So, if one could specify that 3x1000 floats and let the compiler
 arrange them in memory that would be optimal.

This doesn't strike me as very realistic. How is the compiler ever going to figure out that you wanted to project these vectors onto the x-y plane somewhere down the road? This is a very difficult global optimization, and I can think of no way to perform it without direct programmer intervention.

Moreover, I'm not sure why you split up the x-y components. A much better layout would look like this:

x y x y x y x y x y ...
z z z ....

Now the alignment for each vector is such that you could load 2 x-y pairs into a single SSE register at once. This is not only better for caching, but it is also better for performance.

However, neither the layout you proposed nor the one above makes much sense. Typically you want to do more with vectors than just drop the z-component. You need to perform matrix multiplication, normalization, dot and cross products, etc. Each of these operates most efficiently when all vector components are packed into a single SIMD register. Therefore it seems clear that the preferred layout ought to be:

x y z w x y z w ....

I can't really fathom why you would want to go through the extra trouble of splitting up each component. Not only does it make arithmetic less efficient, but it also creates a bookkeeping nightmare. Imagine trying to grow the total number of vectors in your original layout! You would need to do multiple costly memcpy operations.

For low-dimension vectors, the necessary compiler optimizations are obvious, and there is one clear 'best' data layout. We don't need any fancy compiler magic, since the code can be directly translated into vector operations just like ordinary arithmetic.
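The packed x y z w layout argued for above can be sketched in C (the names `vec4` and `dot4` are hypothetical illustrations; the point is that every operand for one result sits in one contiguous, SIMD-register-sized chunk):

```c
/* Packed array-of-structures layout: each vector's four components are
   adjacent in memory, so one 16-byte SSE load fetches a whole vector,
   and growing the array is a single realloc rather than a
   per-component shuffle. */
typedef struct { float x, y, z, w; } vec4;

/* Dot product on the packed layout: all inputs for one result come
   from a single contiguous vector. */
static float dot4(const vec4 *a, const vec4 *b) {
    return a->x * b->x + a->y * b->y + a->z * b->z + a->w * b->w;
}
```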
Feb 03 2007
parent janderson <askme me.com> writes:
Mikola Lysenko wrote:
 Knud Soerensen wrote:
 I think it is better to make very general framework for describing the
 vector operations, because when we have a very general description 
 then the compiler is free to optimize as it sees fit.
 You can think at it as a SQL for vector operations.

Moreover, I'm not sure why you split up the x-y components. A much better layout would look like this: x y x y x y x y x y ... z z z .... Now the alignment for each vector is such that you could load 2 x-y pairs into a single SSE register at once. This is not only better for caching, but it is also better for performance. However, neither the layout you proposed or the one above make much sense. Typically you want to do more with vectors than just drop the z-component. You need to perform matrix multiplication, normalization, dot and cross products etc. Each of these operates most efficiently when all vector components are packed into a single SIMD register. Therefore it seems clear that the preferred layout ought to be: x y z w x y z w .... I can't really fathom why you would want to go through the extra trouble of splitting up each component. Not only does it make arithmetic less efficient, but it also creates a book keeping nightmare. Imagine trying to grow the total number of vectors in your original layout! You would need to do multiple costly memcpy operations. For low dimension vectors, the necessary compiler optimizations are obvious, and there is one clear 'best' data layout. We don't need any fancy compiler magic, since the code can be directly translated into vector operations just like ordinary arithmetic.

I couldn't agree more with this last part. However, I must say that for very particular problems in the past I've found that splitting structs (vectors or whatever) up by their components can have significant performance gains when you're doing batch operations on one component, due to the cache. However, this is such a special case that it would be very difficult for the compiler to figure out. It's something I believe the programmer should do, not the compiler. -Joel
Feb 03 2007
prev sibling parent Knud Soerensen <4tuu4k002 sneakemail.com> writes:
On Sat, 03 Feb 2007 13:18:55 -0500, Mikola Lysenko wrote:

 Knud Soerensen wrote:
 I think it is better to make very general framework for describing the
 vector operations, because when we have a very general description 
 then the compiler is free to optimize as it sees fit.
 You can think at it as a SQL for vector operations.

Well, I can't say that I share your enthusiasm for the power of necromantic compiler optimizations. If you want to see what a vectorizing compiler looks like, take a gander at: http://www.codeplay.com/ VectorC is a C compiler which focuses on architecture specific SIMD optimizations. The problem with their approach is that it is too general. If you want to create optimized vector code, you need to tweak things just-so to make the compiler understand it. This means tedious trial-and-error optimization sessions where you end up rewriting the same piece of C-code 50 times in order to get the compiler to spit out the right assembler. To facilitate this process, they provide several tools and pragmas you can use to give the compiler hints. By the time you are done, you have a very fast vector optimized SSE function that only works on one specific compiler. Good luck porting your results it to GCC! Does it save you time at the end of the day? Maybe, but what happens when some future releases changes the optimizer and your code no longer comes out all clean and vectorized? You have to go back and repeat the process from scratch. For the amount of hair pulling these compilers cause I would just as soon write the damn code in assembler and be done with it once and for all.

We need a way to write the vector code once and be able to compile it on many architectures with good results.
 The basic issue with VectorC and all the other vectorizing compilers is 
 that they try to do too much.  Everyone seems so hell bent on creating 
 this hyper-general-super-chocolate-coated-vectorizing-optimizer, when in 
 reality it's not even necessary.  Small geometric vectors are far more 
 common, and improving their utility is a much bigger win for less 
 effort.

That might be true for your projects but not for mine.
  We need a convenient and fast set of arithmetic routines for 
 doing simple geometric stuff, not an overarching super vector-framework.

Actually, what I need is an overarching super TENSOR-framework ;-)
 ... imagine that the most used operation in the program 
 is a plan projection on the x-y plan, then we only need the x and y
 coordinate.
 So an arrangement in memory which looks like 
 x1 x2 x3 . . .
 y1 y2 y3 . . .
 z1 z2 z3 . . .
 would result in that the cpu could specific load only the x and the y
 coordinates into the cache, which would speed up the calculations a lot.
 
 So, if one could specify that 3x1000 floats and let the compiler
 arrange them in memory that would be optimal.

This doesn't strike me as very realistic. How is the compiler ever going to figure out that you wanted to project these vectors onto the x-y plane somewhere down the road? This is a very difficult global optimization, and I can think of no way to perform it without direct programmer intervention.

Yes, the optimization might be difficult, but the point is that the code should not impose artificial limitations on the compiler. If the compiler knows how to make the optimization, then it is free to make it.
 Moreover, I'm not sure why you split up the x-y components.  A much 
 better layout would look like this:
 
 x y x y x y x y x y ...
 z z z ....
 
 Now the alignment for each vector is such that you could load 2 x-y 
 pairs into a single SSE register at once.  This is not only better for 
 caching, but it is also better for performance.

Good, that's a better example :-)
 However, neither the layout you proposed or the one above make much 
 sense.  

I did not mean to propose a specific layout; what I tried to say was that the compiler should be free to choose the layout which fits the problem.
 Typically you want to do more with vectors than just drop the 
 z-component.  You need to perform matrix multiplication, normalization, 
 dot and cross products etc.  Each of these operates most efficiently 
 when all vector components are packed into a single SIMD register. 
 Therefore it seems clear that the preferred layout ought to be:
 
 x y z w x y z w ....

Then the compiler should use this layout for those typical problems, but we should not limit the compiler to generating sub-optimal code for non-typical problems.
 I can't really fathom why you would want to go through the extra trouble 
 of splitting up each component.  Not only does it make arithmetic less 
 efficient, but it also creates a book keeping nightmare.  Imagine trying 
 to grow the total number of vectors in your original layout!  You would 
 need to do multiple costly memcpy operations.

Imagine trying to grow the number of dimensions in your layout ;-)
 For low dimension vectors, the necessary compiler optimizations are 
 obvious, and there is one clear 'best' data layout.  We don't need any 
 fancy compiler magic, since the code can be directly translated into 
 vector operations just like ordinary arithmetic.

Yes, good performance for typical problems is important, but we should be careful not to choose a notation which ruins the performance for non-typical problems.
Feb 03 2007