D - Optimising

nicO (18/18) Aug 17 2001 I'm from the fcpu project (f-cpu.org), the goal is to create a "free

Walter (3/21) Aug 17 2001 Since D supports rectangular arrays, much more aggressive array optimiza...

LuigiG (7/33) Aug 18 2001 BTW, Microsoft's compiler has a

Walter (4/8) Aug 18 2001 I decided long ago not to support such a switch. It would really only be

LuigiG (5/13) Aug 19 2001 Yep,

nicO (3/31) Aug 18 2001 Is that enough to introduice strip mining ? Or to change the order of

Walter (3/10) Aug 18 2001 I think so.

nicO (2/15) Aug 19 2001 Good news !
nicO (2/15) Aug 19 2001 And what about vector computing ? (for the use of MMX and SSE)

Walter (2/3) Aug 19 2001 I don't know.

nicO (4/9) Aug 19 2001 To work better with current CPU, any compiler should use SIMD

Sean L. Palmer (24/33) Oct 29 2001 I think the key concept to take advantage of SSE and MMX and 3DNow etc S...

Walter (10/45) Jan 09 2002 You're right. -Walter

Sean L. Palmer (80/131) Jan 10 2002 If D had good native support for SIMD programming, that would seem to be...

Walter (16/89) Jan 10 2002 Actually, I think D does with its array operation semantics. They aren't

Pavel Minayev (3/5) Jan 10 2002 But there is no dot and cross product for vectors.

Walter (3/8) Jan 10 2002 True.

Pavel Minayev (4/5) Jan 10 2002 Then why not add them? As well as matrix multiply? And optimize

Walter (3/8) Jan 11 2002 Because I need to finish what I've got first!

Robert W. Cunningham (6/16) Jan 12 2002 I nominate Walter to be the first human to be cloned...

Pavel Minayev (4/7) Jan 12 2002 Yes, but who will hold the copyright?

Pavel Minayev (3/4) Jan 12 2002 I don't mean "now". I mean "in general" =)

nicO <nicolas.boulay ifrance.com> writes:

I'm from the fcpu project (f-cpu.org), the goal is to create a "free
cpu". 

One of our main problem is the compiler. I have wrote some algorithme in
C to improve matrix multiplication (using gcc on a P3). I have used loop
unrolling, software pipelining, a have brake a lot of false
dependancies, remove calculation from inner loop. I reorder the loop to
align data access and do strip mining. Most of this are processor
dependant and should be done by the compiler. I have won an average of 4
times compare to usual algorithme, my best gain was made on an 1.2Ghz
athlon processors with 512*512 matrix : x25 !

One of the main problem is the alias problem induice by pointer. In my
test, i lose 25% of the performance by using pointer instead of array.

So what in D will be done to improve the use of new instruction ? All
new cpu use conditional move (avoid to empty the pipeline), and vector
instructions (like MMX and SSE). We can also add the fact to compile for
2 processors and more.

What do you think ?

nicO

Aug 17 2001

"Walter" <walter digitalmars.com> writes:

Since D supports rectangular arrays, much more aggressive array optimization
becomes possible.

nicO wrote in message <3B7DFFFC.7FB87E4F ifrance.com>...
I'm from the fcpu project (f-cpu.org), the goal is to create a "free
cpu".

One of our main problem is the compiler. I have wrote some algorithme in
C to improve matrix multiplication (using gcc on a P3). I have used loop
unrolling, software pipelining, a have brake a lot of false
dependancies, remove calculation from inner loop. I reorder the loop to
align data access and do strip mining. Most of this are processor
dependant and should be done by the compiler. I have won an average of 4
times compare to usual algorithme, my best gain was made on an 1.2Ghz
athlon processors with 512*512 matrix : x25 !

One of the main problem is the alias problem induice by pointer. In my
test, i lose 25% of the performance by using pointer instead of array.

So what in D will be done to improve the use of new instruction ? All
new cpu use conditional move (avoid to empty the pipeline), and vector
instructions (like MMX and SSE). We can also add the fact to compile for
2 processors and more.

What do you think ?

nicO

Aug 17 2001

"LuigiG" <ask.if iyou.need.it> writes:

BTW, Microsoft's compiler has a
"/Oa assume no aliasing"
switch.
kind of playing with fire I assume ;)

"Walter" <walter digitalmars.com> wrote in message
news:9lkk0g$2qdq$2 digitaldaemon.com...
 Since D supports rectangular arrays, much more aggressive array

optimization
 becomes possible.

 nicO wrote in message <3B7DFFFC.7FB87E4F ifrance.com>...
I'm from the fcpu project (f-cpu.org), the goal is to create a "free
cpu".

One of our main problem is the compiler. I have wrote some algorithme in
C to improve matrix multiplication (using gcc on a P3). I have used loop
unrolling, software pipelining, a have brake a lot of false
dependancies, remove calculation from inner loop. I reorder the loop to
align data access and do strip mining. Most of this are processor
dependant and should be done by the compiler. I have won an average of 4
times compare to usual algorithme, my best gain was made on an 1.2Ghz
athlon processors with 512*512 matrix : x25 !

One of the main problem is the alias problem induice by pointer. In my
test, i lose 25% of the performance by using pointer instead of array.

So what in D will be done to improve the use of new instruction ? All
new cpu use conditional move (avoid to empty the pipeline), and vector
instructions (like MMX and SSE). We can also add the fact to compile for
2 processors and more.

What do you think ?

nicO

Aug 18 2001

"Walter" <walter digitalmars.com> writes:

LuigiG wrote in message <9ll862$5ch$1 digitaldaemon.com>...
BTW, Microsoft's compiler has a
"/Oa assume no aliasing"
switch.
kind of playing with fire I assume ;)


I decided long ago not to support such a switch. It would really only be
useful to someone who understood exactly how the internal compiler
optimizations really worked, which is likely nobody <g>.

Aug 18 2001

"LuigiG" <ask.if iyou.need.it> writes:

"Walter" <walter digitalmars.com> wrote in message
news:9lmlj0$111a$3 digitaldaemon.com...
 LuigiG wrote in message <9ll862$5ch$1 digitaldaemon.com>...
BTW, Microsoft's compiler has a
"/Oa assume no aliasing"
switch.
kind of playing with fire I assume ;)


 I decided long ago not to support such a switch. It would really only be
 useful to someone who understood exactly how the internal compiler
 optimizations really worked, which is likely nobody <g>.


Yep,
basically, if you know so much about the compiler that you can use the
noalias switch; you don't the noalias switch anymore.

Aug 19 2001

nicO <nicolas.boulay ifrance.com> writes:

Walter a �crit :
 
 Since D supports rectangular arrays, much more aggressive array optimization
 becomes possible.
 

Is that enough to introduice strip mining ? Or to change the order of
the loop ?

 nicO wrote in message <3B7DFFFC.7FB87E4F ifrance.com>...
I'm from the fcpu project (f-cpu.org), the goal is to create a "free
cpu".

One of our main problem is the compiler. I have wrote some algorithme in
C to improve matrix multiplication (using gcc on a P3). I have used loop
unrolling, software pipelining, a have brake a lot of false
dependancies, remove calculation from inner loop. I reorder the loop to
align data access and do strip mining. Most of this are processor
dependant and should be done by the compiler. I have won an average of 4
times compare to usual algorithme, my best gain was made on an 1.2Ghz
athlon processors with 512*512 matrix : x25 !

One of the main problem is the alias problem induice by pointer. In my
test, i lose 25% of the performance by using pointer instead of array.

So what in D will be done to improve the use of new instruction ? All
new cpu use conditional move (avoid to empty the pipeline), and vector
instructions (like MMX and SSE). We can also add the fact to compile for
2 processors and more.

What do you think ?

nicO

Aug 18 2001

"Walter" <walter digitalmars.com> writes:

nicO wrote in message <3B7EB156.6D6761E6 ifrance.com>...
Walter a �crit :
 Since D supports rectangular arrays, much more aggressive array


optimization
 becomes possible.

Is that enough to introduice strip mining ? Or to change the order of
the loop ?


I think so.

Aug 18 2001

nicO <nicolas.boulay ifrance.com> writes:

Walter a �crit :
 
 nicO wrote in message <3B7EB156.6D6761E6 ifrance.com>...
Walter a �crit :
 Since D supports rectangular arrays, much more aggressive array


 optimization
 becomes possible.

Is that enough to introduice strip mining ? Or to change the order of
the loop ?

 
 I think so.

Good news !

Aug 19 2001

nicO <nicolas.boulay ifrance.com> writes:

Walter a �crit :
 
 nicO wrote in message <3B7EB156.6D6761E6 ifrance.com>...
Walter a �crit :
 Since D supports rectangular arrays, much more aggressive array


 optimization
 becomes possible.

Is that enough to introduice strip mining ? Or to change the order of
the loop ?

 
 I think so.

And what about vector computing ? (for the use of MMX and SSE)

Aug 19 2001

"Walter" <walter digitalmars.com> writes:

nicO wrote in message <3B801890.5A7216C7 ifrance.com>...
And what about vector computing ? (for the use of MMX and SSE)

I don't know.

Aug 19 2001

nicO <nicolas.boulay ifrance.com> writes:

Walter a �crit :
 
 nicO wrote in message <3B801890.5A7216C7 ifrance.com>...
And what about vector computing ? (for the use of MMX and SSE)

 
 I don't know.

To work better with current CPU, any compiler should use SIMD
instruction. The difference in performance is great.

nicO

Aug 19 2001

"Sean L. Palmer" <spalmer iname.com> writes:

I think the key concept to take advantage of SSE and MMX and 3DNow etc SIMD
instruction sets is that the language needs to recognize the data types
manipulated by these instruction sets (usually short, power-of-two constant
sized vectors of int or float of various sizes) and have the language
provide operators to work with these "extra" basic types.  Vector add,
subtract, vector componentwise multiply, vector-by-scalar multiply, square
root of one or all components, reciprocal square root, dot product, and
matrix multiply (SSE/MMX don't provide much help for these last two).
Conditional testing (for every component in A, if less than corresponding
component in B, set result component to 1, else set result component to 0).
Bitwise ops for integer SIMD.  Component swizzling/shuffling/rearrangement.

If you're keen on understanding how SIMD programming works, download the
Microsoft Processor Pack for MSVC 6 and install it... it provides new __m64
and __m128 data types and a bunch of new intrinsic C functions that work on
these types, as well as __declspec(aligned(16)) so you can have a prayer of
getting the compiler to help you meet these types' heavy memory alignment
restrictions.

Perhaps if a few operators were added, constant-sized arrays of basic int
and float types could automatically take advantage of these types of
instruction sets.  Operators that work on two arrays and take an array as
result.  This really ties into the operator overloading discussion.

Sean

"nicO" <nicolas.boulay ifrance.com> wrote in message
news:3B807A0C.33576068 ifrance.com...
 Walter a �crit :
 nicO wrote in message <3B801890.5A7216C7 ifrance.com>...
And what about vector computing ? (for the use of MMX and SSE)

 I don't know.

 To work better with current CPU, any compiler should use SIMD
 instruction. The difference in performance is great.

 nicO

Oct 29 2001

"Walter" <walter digitalmars.com> writes:

You're right. -Walter

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:9rj7js$2sq8$1 digitaldaemon.com...
 I think the key concept to take advantage of SSE and MMX and 3DNow etc

SIMD
 instruction sets is that the language needs to recognize the data types
 manipulated by these instruction sets (usually short, power-of-two

constant
 sized vectors of int or float of various sizes) and have the language
 provide operators to work with these "extra" basic types.  Vector add,
 subtract, vector componentwise multiply, vector-by-scalar multiply, square
 root of one or all components, reciprocal square root, dot product, and
 matrix multiply (SSE/MMX don't provide much help for these last two).
 Conditional testing (for every component in A, if less than corresponding
 component in B, set result component to 1, else set result component to

0).
 Bitwise ops for integer SIMD.  Component

swizzling/shuffling/rearrangement.
 If you're keen on understanding how SIMD programming works, download the
 Microsoft Processor Pack for MSVC 6 and install it... it provides new

__m64
 and __m128 data types and a bunch of new intrinsic C functions that work

on
 these types, as well as __declspec(aligned(16)) so you can have a prayer

of
 getting the compiler to help you meet these types' heavy memory alignment
 restrictions.

 Perhaps if a few operators were added, constant-sized arrays of basic int
 and float types could automatically take advantage of these types of
 instruction sets.  Operators that work on two arrays and take an array as
 result.  This really ties into the operator overloading discussion.

 Sean

 "nicO" <nicolas.boulay ifrance.com> wrote in message
 news:3B807A0C.33576068 ifrance.com...
 Walter a �crit :
 nicO wrote in message <3B801890.5A7216C7 ifrance.com>...
And what about vector computing ? (for the use of MMX and SSE)

 I don't know.

 To work better with current CPU, any compiler should use SIMD
 instruction. The difference in performance is great.

 nicO

Jan 09 2002

"Sean L. Palmer" <spalmer iname.com> writes:

If D had good native support for SIMD programming, that would seem to be a
great big selling point for D.  "D:  Language of the future of parallel
programming".  ;)   That's what SIMD programming is, parallel programming.
You do several operations in parallel usually and mix the results together
occasionally.  Modern processors can do several operations in parallel
usually but we usually have to resort to assembly language to program that
way.  D could abstract the whole concept of that kind of data structure and
emulate it if the actual hardware isn't available  (you could choose to have
the compiler emit SSE or 3DNow instructions, or emulate using the FPU.)

Not just 3D graphics, but a *lot* of higher order math and physics
calculations are done with vectors and matrices.  This stuff needs to run as
fast as possible, and a compiler is much better suited to the job of
reducing stalls and such than a human is.  (because we humans are lazy,
mainly, and it's tedious, repetitive work... work computers are well suited
to do)  If you don't have language support you have to do it thru the
assembly language pathway, which is littered with minefields since usually
even inline assembly causes all registers to be saved and restored or at
least causes some kind of inefficiency with the optimizer.  Non-inline
assembly (not every platform supports inline assembly) involves call/return
statements and more register save/load bullshit.  Not the kind of stuff you
want happening all over in your tight, fast inner loops where 99% of the
time in the application is spent... the parts that benefit most from vector
computing.

You could even provide convenience features that may not exist in hardware,
such as inner product (which isn't available on Intel SSE instruction set,
yet definitely belongs in any hardware vector processor IMO) or cross
product (no instruction sets provide this to my knowledge, it takes like 3
or 4 instructions to emulate).

I think all the 3D engine programmers would switch to D overnight.  ;)

But not only that, the code generator could support using these instructions
to optimize other parts of the code too.  It could unroll this, for
instance:

float a[4],b[4],c[4];

for (int i = 0; i<4; ++i)
  a[i] = b[i]*b[i] + c[i];

Into this:

a[] = b[] * b[] + c[];

Or perhaps with some Unicode operator (Dot and Cross product symbols both
exist in Unicode)
Let's change that pesky "evaluates right-to-left" assignment into something
that reads more left-to-right (or in this case,

b[] * b[] + c[] ==> a[];

Which breaks down to this:

b[0] * b[0] + c[0] => a[0];
b[1] * b[1] + c[1] => a[1];
b[2] * b[2] + c[2] => a[2];
b[3] * b[3] + c[3] => a[3];

Let's just turn that line sideways, rotate it 90 degrees to the right, and
drop the square brackets:

b0 b1 b2 b3
*  *  *  *
b0 b1 b2 b3
+  +  +  +
c0 c1 c2 c3
=> => => =>
a0 a1 a2 a3

Which would essentially boil down to this kind of assembly language:

fmov4 xr0, b
fmov4 xr1, c
fmul4 xr2,xr0,xr0
fadd4 xr2,xr2,xr1
fmov4 a, xr2

or even this:

fmov4 xr0, b
fmad4 xr2,xr0,xr0,c ; multiply each component of xr0 by itself and add to
corresponding component of xr1, store all 4 into xr2
fmov4 a, xr2

Check out the instruction set for D3D's vertex or pixel shaders sometime.
;)  All new video cards have that kind of vector processor.  Your kid's PS2
has two of those vector units.  Every PC since the Pentium MMX has at least
one that can do very fast operations on bytes.   Yet I can't name a
programming language above assembly that inherently supports such things.

Sean

"Walter" <walter digitalmars.com> wrote in message
news:a1j6gs$2b02$1 digitaldaemon.com...
 You're right. -Walter

 "Sean L. Palmer" <spalmer iname.com> wrote in message
 news:9rj7js$2sq8$1 digitaldaemon.com...
 I think the key concept to take advantage of SSE and MMX and 3DNow etc

 SIMD
 instruction sets is that the language needs to recognize the data types
 manipulated by these instruction sets (usually short, power-of-two

 constant
 sized vectors of int or float of various sizes) and have the language
 provide operators to work with these "extra" basic types.  Vector add,
 subtract, vector componentwise multiply, vector-by-scalar multiply,


square
 root of one or all components, reciprocal square root, dot product, and
 matrix multiply (SSE/MMX don't provide much help for these last two).
 Conditional testing (for every component in A, if less than


corresponding
 component in B, set result component to 1, else set result component to

 0).
 Bitwise ops for integer SIMD.  Component

 swizzling/shuffling/rearrangement.
 If you're keen on understanding how SIMD programming works, download the
 Microsoft Processor Pack for MSVC 6 and install it... it provides new

 __m64
 and __m128 data types and a bunch of new intrinsic C functions that work

 on
 these types, as well as __declspec(aligned(16)) so you can have a prayer

 of
 getting the compiler to help you meet these types' heavy memory


alignment
 restrictions.

 Perhaps if a few operators were added, constant-sized arrays of basic


int
 and float types could automatically take advantage of these types of
 instruction sets.  Operators that work on two arrays and take an array


as
 result.  This really ties into the operator overloading discussion.

 Sean

 "nicO" <nicolas.boulay ifrance.com> wrote in message
 news:3B807A0C.33576068 ifrance.com...
 Walter a �crit :
 nicO wrote in message <3B801890.5A7216C7 ifrance.com>...
And what about vector computing ? (for the use of MMX and SSE)

 I don't know.

 To work better with current CPU, any compiler should use SIMD
 instruction. The difference in performance is great.

 nicO

Jan 10 2002

"Walter" <walter digitalmars.com> writes:

Actually, I think D does with its array operation semantics. They aren't
implemented :-(, but they are in the spec. -Walter

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a1jj23$2jl8$1 digitaldaemon.com...
 If D had good native support for SIMD programming, that would seem to be a
 great big selling point for D.  "D:  Language of the future of parallel
 programming".  ;)   That's what SIMD programming is, parallel programming.
 You do several operations in parallel usually and mix the results together
 occasionally.  Modern processors can do several operations in parallel
 usually but we usually have to resort to assembly language to program that
 way.  D could abstract the whole concept of that kind of data structure

and
 emulate it if the actual hardware isn't available  (you could choose to

have
 the compiler emit SSE or 3DNow instructions, or emulate using the FPU.)

 Not just 3D graphics, but a *lot* of higher order math and physics
 calculations are done with vectors and matrices.  This stuff needs to run

as
 fast as possible, and a compiler is much better suited to the job of
 reducing stalls and such than a human is.  (because we humans are lazy,
 mainly, and it's tedious, repetitive work... work computers are well

suited
 to do)  If you don't have language support you have to do it thru the
 assembly language pathway, which is littered with minefields since usually
 even inline assembly causes all registers to be saved and restored or at
 least causes some kind of inefficiency with the optimizer.  Non-inline
 assembly (not every platform supports inline assembly) involves

call/return
 statements and more register save/load bullshit.  Not the kind of stuff

you
 want happening all over in your tight, fast inner loops where 99% of the
 time in the application is spent... the parts that benefit most from

vector
 computing.

 You could even provide convenience features that may not exist in

hardware,
 such as inner product (which isn't available on Intel SSE instruction set,
 yet definitely belongs in any hardware vector processor IMO) or cross
 product (no instruction sets provide this to my knowledge, it takes like 3
 or 4 instructions to emulate).

 I think all the 3D engine programmers would switch to D overnight.  ;)

 But not only that, the code generator could support using these

instructions
 to optimize other parts of the code too.  It could unroll this, for
 instance:

 float a[4],b[4],c[4];

 for (int i = 0; i<4; ++i)
   a[i] = b[i]*b[i] + c[i];

 Into this:

 a[] = b[] * b[] + c[];

 Or perhaps with some Unicode operator (Dot and Cross product symbols both
 exist in Unicode)
 Let's change that pesky "evaluates right-to-left" assignment into

something
 that reads more left-to-right (or in this case,

 b[] * b[] + c[] ==> a[];

 Which breaks down to this:

 b[0] * b[0] + c[0] => a[0];
 b[1] * b[1] + c[1] => a[1];
 b[2] * b[2] + c[2] => a[2];
 b[3] * b[3] + c[3] => a[3];

 Let's just turn that line sideways, rotate it 90 degrees to the right, and
 drop the square brackets:

 b0 b1 b2 b3
 *  *  *  *
 b0 b1 b2 b3
 +  +  +  +
 c0 c1 c2 c3
 => => => =>
 a0 a1 a2 a3

 Which would essentially boil down to this kind of assembly language:

 fmov4 xr0, b
 fmov4 xr1, c
 fmul4 xr2,xr0,xr0
 fadd4 xr2,xr2,xr1
 fmov4 a, xr2

 or even this:

 fmov4 xr0, b
 fmad4 xr2,xr0,xr0,c ; multiply each component of xr0 by itself and add to
 corresponding component of xr1, store all 4 into xr2
 fmov4 a, xr2

 Check out the instruction set for D3D's vertex or pixel shaders sometime.
 ;)  All new video cards have that kind of vector processor.  Your kid's

PS2
 has two of those vector units.  Every PC since the Pentium MMX has at

least
 one that can do very fast operations on bytes.   Yet I can't name a
 programming language above assembly that inherently supports such things.

 Sean

Jan 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a1jlrk$2lgs$1 digitaldaemon.com...

 Actually, I think D does with its array operation semantics. They aren't
 implemented :-(, but they are in the spec. -Walter

But there is no dot and cross product for vectors.

Jan 10 2002

"Walter" <walter digitalmars.com> writes:

True.

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a1k1lr$2shq$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a1jlrk$2lgs$1 digitaldaemon.com...

 Actually, I think D does with its array operation semantics. They aren't
 implemented :-(, but they are in the spec. -Walter

 But there is no dot and cross product for vectors.

Jan 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a1kmio$2hu$2 digitaldaemon.com...

 True.

Then why not add them? As well as matrix multiply? And optimize
using MMX/3DNow!/SSE?

Jan 10 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a1knkr$d57$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a1kmio$2hu$2 digitaldaemon.com...

 True.

 Then why not add them? As well as matrix multiply? And optimize
 using MMX/3DNow!/SSE?

Because I need to finish what I've got first!

Jan 11 2002

"Robert W. Cunningham" <rcunning acm.org> writes:

Walter wrote:

 "Pavel Minayev" <evilone omen.ru> wrote in message
 news:a1knkr$d57$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a1kmio$2hu$2 digitaldaemon.com...

 True.

 Then why not add them? As well as matrix multiply? And optimize
 using MMX/3DNow!/SSE?

 Because I need to finish what I've got first!

I nominate Walter to be the first human to be cloned...

With enough Walters, and if they all work the same hours and all think
alike, then we may have an SIMD-based language author and compiler
writer!


-BobC

Jan 12 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Robert W. Cunningham" <rcunning acm.org> wrote in message
news:3C408D59.DE2F85AB acm.org...

 With enough Walters, and if they all work the same hours and all think
 alike, then we may have an SIMD-based language author and compiler
 writer!

Yes, but who will hold the copyright?
Also, imagine all those Walters arguing here on the newsgroup... =)

Jan 12 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a1oifr$brs$2 digitaldaemon.com...

 Because I need to finish what I've got first!

I don't mean "now". I mean "in general" =)

Jan 12 2002

D Programming

C/C++ Programming

Other

D - Optimising