
digitalmars.D - Array operations, C#, etc

Mono for gaming and higher performance:
http://tirania.org/tmp/PC54-slides-as-pdf.pdf
Link coming from this blog post:
http://tirania.org/blog/archive/2008/Nov-03.html

The article (well, slide set) shows how C# (on Mono) is used to replace
scripting languages like Lua/Python for AI in games (D isn't listed there;
maybe they think D is a dinosaur like C++).

Near the end the slide set also shows the approach taken by Mono to use the
SIMD instructions of the CPU, defining many types like:

Mono.Simd.Vector16b  - 16 unsigned bytes
Mono.Simd.Vector16sb - 16 signed bytes
Mono.Simd.Vector2d   - 2 doubles
Mono.Simd.Vector2l   - 2 signed 64-bit longs
Mono.Simd.Vector2ul  - 2 unsigned 64-bit longs
Mono.Simd.Vector4f   - 4 floats
etc...

Operations on them become translated as SIMD instructions.

D instead augments all arrays with array operations, but then it also has to
manage the cases where lengths aren't exact multiples of the SIMD register
width.

I think that such length management is done at runtime, so you pay a small
price when you have just a few items, like 4 floats, a price that you don't
pay using Mono.Simd.Vector4f.
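A minimal C sketch (not D's actual runtime code) of what that length
management looks like: the bulk of the work goes in blocks of four floats
(where a real implementation would emit one SSE addps per block), and a
scalar tail loop handles the leftover elements — extra branching that a
fixed-size type like Vector4f avoids entirely:

```c
#include <stddef.h>

/* Sketch of runtime length handling for s[] = a[] + b[]:
   process the bulk in blocks of 4, then mop up the remainder
   with a scalar loop when n isn't a multiple of 4. */
void add_arrays(float *s, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {       /* vectorizable bulk */
        s[i]     = a[i]     + b[i];
        s[i + 1] = a[i + 1] + b[i + 1];
        s[i + 2] = a[i + 2] + b[i + 2];
        s[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i)                 /* scalar tail, n % 4 != 0 */
        s[i] = a[i] + b[i];
}
```

For very short arrays the tail test and the remainder loop are pure overhead
compared to a type whose length is fixed at four.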

When array sizes are known at compile time and fixed, like in this situation:

void main() {
  float[4] a = [1.0, 2.0, 3.0, 4.0];
  float[4] b = [10.0, 20.0, 30.0, 40.0];
  float[4] s;
  s[] = a[] + b[];
}

The compiler, inside the arrayfloat._arraySliceSliceAddSliceAssign_f()
function, can use that compile-time information to skip the runtime length
checks and fallbacks, using some static ifs.

The purpose is to make it produce only the bare instructions in that case
(I may write a little benchmark in D with inline asm to compare the speed of
the s[]=a[]+b[] line):

movups (%eax),%xmm0
movups (%edi),%xmm1
addps %xmm1,%xmm0
movups %xmm0,(%eax)

I think another small and less easily solved problem comes from this check
near the beginning of that _arraySliceSliceAddSliceAssign_f function:
if (sse() && ...
That check is probably quick, but if you have to sum just 4 floats inside a
loop I presume it may slow the code down somewhat (there's also the overhead
of the function call, which isn't inlined).
The sse() check can't be done at compile time because you don't know where
the code will run (though eventually a compiler switch could be added to
specify that the program will run only on CPUs with SSE2, etc.). A brutal
solution is to duplicate the object code of the functions that contain SSE
instructions, so at the beginning of the run you can patch the jump pointers
of the function calls in the whole program once :-) It may make the
executable bigger, but seeing how executables are often 300+ KB I don't
think that's a big problem, and I presume only a few functions will contain
array operations.
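The "patch the pointers once at startup" idea can be sketched in C with a
function pointer selected a single time (has_sse() is a hypothetical stub
here; a real probe would execute CPUID, and the SSE variant would contain
actual addps instructions or intrinsics — both bodies are plain scalar code
in this sketch):

```c
#include <stddef.h>

typedef void (*add_fn)(float *, const float *, const float *, size_t);

static void add_scalar(float *s, const float *a,
                       const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        s[i] = a[i] + b[i];
}

/* Stand-in for the SSE build of the same routine; a real version
   would use addps (e.g. via intrinsics). */
static void add_sse(float *s, const float *a,
                    const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        s[i] = a[i] + b[i];
}

/* Hypothetical CPU probe; a real one would run CPUID. */
static int has_sse(void) { return 1; }

add_fn array_add;   /* the "jump pointer" */

/* Run once at program startup, so the sse() test is paid one
   time instead of on every array operation. */
void init_dispatch(void)
{
    array_add = has_sse() ? add_sse : add_scalar;
}
```

After init_dispatch() every call goes through array_add with no per-call
CPU-feature test, at the cost of carrying both code paths in the executable.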

Bye,
bearophile
Nov 03 2008