
digitalmars.D - OOP, faster data layouts, compilers

bearophile <bearophileHUGS lycos.com> writes:
Through Reddit I've found a set of wordy slides, "Design for Performance", on
designing efficient games code:
http://www.scribd.com/doc/53483851/Design-for-Performance
http://www.reddit.com/r/programming/comments/guyb2/designing_code_for_performance/

The slides touch on many small topics, like the need for prefetching, designing
cache-aware code, etc. One of the main topics is how to better lay out data
structures in memory for modern CPUs. They show how object-oriented style often
leads to collections of little trees, for example arrays of object references
(or struct pointers) that refer to objects that contain other references to
sub-parts. Iterating over such data structures is not very efficient.

The slides also briefly discuss the difference between creating an array of
2-item structs and a struct that contains two arrays of single native values.
If the code needs to scan just one of those two fields, then the struct that
contains the two arrays is faster.
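The comparison in the slides is the classic array-of-structs vs. struct-of-arrays choice. A minimal C++ sketch (the field and function names are made up for illustration):

```cpp
#include <vector>

// Array of structs (AoS): scanning only `x` still drags `y` through the cache.
struct PairAoS { float x; float y; };

// Struct of arrays (SoA): scanning `x` walks one dense array.
struct PairsSoA {
    std::vector<float> x;
    std::vector<float> y;
};

float sum_x_aos(const std::vector<PairAoS>& v) {
    float s = 0;
    for (const auto& p : v) s += p.x;  // half of every cache line fetched is wasted on y
    return s;
}

float sum_x_soa(const PairsSoA& v) {
    float s = 0;
    for (float f : v.x) s += f;        // every byte fetched is used
    return s;
}
```

When only one field is scanned, the SoA loop touches about half the memory of the AoS loop, which is the effect the slides measure.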

Similar topics were discussed in more depth in "Pitfalls of Object Oriented
Programming" (2009):
http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf

In my opinion, if D2 has some success, then one of its significant uses will be
writing fast games, so the design/performance concerns expressed in those two
sets of slides should carry real weight in D's design.

D probably allows laying out data in memory as shown in those slides, but I'd
like some help from the compiler too. I don't think compilers will soon be able
to turn an immutable binary tree into an array to speed up its repeated
scanning, but maybe there are ways to express semantics in the code that will
allow future, smarter compilers to perform some of those memory layout
optimizations, like transposing arrays. A possible idea is a
 no_inbound_pointers annotation that forbids taking the address of the items
and allows the compiler to modify the data layout a little.

Bye,
bearophile
Apr 21 2011
"Paulo Pinto" <pjmlp progtools.org> writes:
Many thanks for the links, they provide very nice discussions.

Especially the link below, which you can follow from your first link:
http://c0de517e.blogspot.com/2011/04/2011-current-and-future-programming.html

But in what concerns game development, D2 might already be too late.

I know a bit about it, since I live a bit in that part of the universe.

Due to XNA (Windows and XBox 360), Mono/Unity, and now WP7, many game studios
have started to move their tooling to C#. And some of them are nowadays even
using it for the server-side code.

Java used to have a foothold there, especially due to J2ME game development,
with a small push thanks to Android, which has decreased since Google made the
NDK available.

If one day Microsoft really lets C# free, the same way AT&T somehow did with C
and C++, then C# might actually be the next C++, at least where game
development is concerned.

And the dependency on a JIT environment is an implementation issue. The Bartok
compiler in Singularity compiles to native code, and Mono also provides a
similar option.

So who knows?

--
Paulo



"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:ioqdhe$2030$1 digitalmars.com...
 [...]

Apr 22 2011
Kai Meyer <kai unixlords.com> writes:
On 04/22/2011 02:55 AM, Paulo Pinto wrote:
 [...]

I don't think C# is the next C++; it's impossible for C# to be what C/C++ is.
There is a purpose and a place for interpreted languages like C# and Java, just
like there is for C/C++. What language do you think the interpreters for Java
and C# are written in? (Hint: it's not Java or C#.) I also don't think that the
core of Unity (or any decent game engine) is written in an interpreted
language, which means the guts are likely written in either C or C++.

The point being made is that systems programming languages like C/C++ and D are
picked for their execution speed, and interpreted languages are picked for
their ease of programming (or development speed). Since D is picked for
execution speed, we should seriously consider every opportunity to improve in
that arena. The OP wasn't just for game developers, but for game framework
developers as well.
Apr 22 2011
Daniel Gibson <metalcaedes gmail.com> writes:
On 22.04.2011 18:48, Kai Meyer wrote:
 
 [...]

IMHO D won't be successful for games as long as it only supports Windows, Linux
and OSX on PC(-like) hardware. We'd need support for modern game consoles
(XBOX360, PS3, maybe Wii) and for mobile devices (Android, iOS, maybe Win7
phones and other stuff). This means good PPC support (maybe the PS3's Cell CPU
would need special support even though it understands PPC code? I don't know.)
and ARM support, plus support for the operating systems and SDKs used on those
platforms.

Of course execution speed is very important as well, but D in its current state
is not *that* bad in this regard. Sure, the GC is a bit slow, but in
high-performance games you shouldn't use it (or even malloc/free) all the time
anyway; see http://www.digitalmars.com/d/2.0/memory.html#realtime

Another point: I find Minecraft pretty impressive. It really changed my view of
games developed in Java.

Cheers,
- Daniel
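The "preallocate instead of allocating per frame" advice from that page can be sketched as a fixed-capacity object pool. This is a generic illustration of the pattern, not code from the linked article:

```cpp
#include <cstddef>
#include <vector>

// Minimal fixed-capacity pool: all memory is allocated up front, so the
// per-frame acquire/release path never calls malloc/free (or triggers a GC).
template <typename T>
class FramePool {
    std::vector<T> slots_;             // storage, allocated once
    std::vector<std::size_t> free_;    // indices of unused slots
public:
    explicit FramePool(std::size_t n) : slots_(n) {
        for (std::size_t i = n; i-- > 0;) free_.push_back(i);
    }
    T* acquire() {
        if (free_.empty()) return nullptr;   // pool exhausted; no fallback allocation
        std::size_t i = free_.back();
        free_.pop_back();
        return &slots_[i];
    }
    void release(T* p) {
        free_.push_back(static_cast<std::size_t>(p - slots_.data()));
    }
};
```

The hot loop then only moves indices around; the cost of real allocation is paid once, outside the frame.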
Apr 22 2011
Kai Meyer <kai unixlords.com> writes:
On 04/22/2011 11:05 AM, Daniel Gibson wrote:
 [...]

Hah, Minecraft. Have you tried loading up a high-resolution texture pack yet?
There's a reason it looks like 8-bit graphics. It's not Java that makes
Minecraft awesome, imo :)
Apr 22 2011
Daniel Gibson <metalcaedes gmail.com> writes:
On 22.04.2011 19:11, Kai Meyer wrote:
 [...]

Hah, Minecraft. Have you tried loading up a high resolution texture pack yet? There's a reason why it looks like 8-bit graphics. It's not Java that makes Minecraft awesome, imo :)

No, I haven't. What I find impressive is this (almost infinitely) big world
that is completely changeable, i.e. you can build new stuff everywhere and dig
tunnels everywhere (ok, somewhere really deep there's a limit), and the game
still runs smoothly. I haven't seen anything like that in any game before.
Apr 22 2011
Kai Meyer <kai unixlords.com> writes:
On 04/22/2011 11:20 AM, Daniel Gibson wrote:
 [...]

No I haven't. What I find impressive is this (almost infinitely) big world that is completely changeable, i.e. you can build new stuff everywhere, you can dig tunnels everywhere (ok, somewhere really deep there's a limit) and the game still runs smoothly. Haven't seen something like that in any game before.

The random world generator is amazing, but that's not speed. The polygon count
of the game is excruciatingly low because the client is smart enough to only
draw the faces of blocks that are visible. From the very bottom (bedrock) to
the very top of the sky (as high as you can build blocks) is 256 blocks. The
game is full of low-level bit-stuffing (like stacks of 64). The genius of the
game is not in any special feature of Java; it's in the data structure and data
generator, which could be done much faster in other languages. But that raises
the question, "why does it need to be faster?" It is "fast enough" in the JVM
(unless you load up the high-resolution textures, in which case the game
becomes unbearably slow when viewing long distances).

The purpose of the original post was to indicate that some low-level research
shows that underlying data structures (as applied to video game development)
can have an impact on the performance of the application, which D (I think)
cares very much about.
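The face-culling trick mentioned above (only draw block faces that touch air) is easy to sketch. This toy C++ version just counts the renderable faces in a small voxel grid; the names and grid shape are illustrative, not Minecraft's actual code:

```cpp
#include <array>
#include <cstddef>

// Count renderable faces in an N^3 voxel grid: a face is drawn only if the
// neighboring cell is empty (or outside the grid). Two adjacent solid blocks
// hide the face between them, which is what keeps the polygon count low.
template <std::size_t N>
int exposed_faces(const std::array<std::array<std::array<bool, N>, N>, N>& solid) {
    const int d[6][3] = {{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}};
    int faces = 0;
    for (int x = 0; x < (int)N; ++x)
        for (int y = 0; y < (int)N; ++y)
            for (int z = 0; z < (int)N; ++z) {
                if (!solid[x][y][z]) continue;
                for (const auto& v : d) {
                    int nx = x + v[0], ny = y + v[1], nz = z + v[2];
                    bool neighborSolid = nx >= 0 && nx < (int)N &&
                                         ny >= 0 && ny < (int)N &&
                                         nz >= 0 && nz < (int)N &&
                                         solid[nx][ny][nz];
                    if (!neighborSolid) ++faces;  // face touches air: render it
                }
            }
    return faces;
}
```

A single isolated block contributes 6 faces; two adjacent blocks contribute 10, not 12, because the shared pair of faces is culled.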
Apr 22 2011
bearophile <bearophileHUGS lycos.com> writes:
Kai Meyer:

 The purpose of the original post was to indicate that some low level 
 research shows that underlying data structures (as applied to video game 
 development) can have an impact on the performance of the application, 
 which D (I think) cares very much about.

The idea of the original post was a bit more complex: how can we invent
new/better ways to express semantics in D code that will not forbid future D
compilers from making a bit of change in the layout of data structures to
increase code performance? Complex transforms of the data layout seem too
complex even for a good compiler, but maybe simpler ones will be possible, and
I think to do this the D code needs some more semantics. I was suggesting an
annotation that forbids inbound pointers, which would allow the compiler to
move data around a little, but this is just a start.

Bye,
bearophile
Apr 22 2011
Sean Cavanaugh <WorksOnMyMachine gmail.com> writes:
On 4/22/2011 2:20 PM, bearophile wrote:
 [...]

In many ways the biggest thing I use regularly in game development that I would
lose by moving to D would be good built-in SIMD support. The PC compilers from
MS and Intel both have intrinsic data types and instructions that cover all the
operations from SSE1 up to AVX. The intrinsics are nice in that the job of
register allocation and scheduling is given to the compiler, and generally the
code it outputs is good enough (though it needs to be watched at times).

Unlike ASM, intrinsics can be inlined, so your math library can provide a
platform abstraction at that layer before building up to larger operations
(like vectorized forms of sin, cos, etc.) and algorithms (like frustum cull
checks, k-DOP polygon collision, etc.), which makes porting and reusing the
algorithms on other platforms much easier, as only the low-level layer needs to
be ported and only outliers at the algorithm level need to be tweaked after you
get it up and running.

On the consoles there is AltiVec (VMX), which is very similar to SSE in many
ways. The common ground is basically SSE1-tier operations: 128-bit values
operating on 4x32-bit integers and 4x32-bit floats. 64-bit AMD/Intel makes SSE2
the minimum standard, and a systems language on those platforms should reflect
that. Loading and storing are comparable across platforms, with similar
alignment restrictions or penalties for working with unaligned data.
Packing/swizzle/shuffle/permuting are different, but this is not a huge problem
for most algorithms. The lack of fused multiply-add on the Intel side can be
worked around or abstracted (i.e. always write code as if it existed, and have
the Intel version expand to multiple ops).

And now my wish list: if you have worked with shader programming through HLSL
or Cg, the expressiveness of doing the work in SIMD is very high. If I could
write something that looked exactly like HLSL but was integrated perfectly into
a language like D or C++, it would be pretty huge to me. The amount of math you
can have in a line or two of HLSL is mind-boggling at times, yet extremely
intuitive and rather easy to debug.
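The abstraction layer described here can be mocked up in plain C++: a 4-wide float with overloaded math plus HLSL-style any/all tests on a comparison mask. In a real build the body of float4 would be an __m128 (SSE) or vector float (VMX); the scalar loops below are only a portable stand-in that documents the interface:

```cpp
#include <array>

// Scalar mock-up of a SIMD math-library wrapper type.
struct float4 {
    std::array<float, 4> v;
};

inline float4 operator+(float4 a, float4 b) {
    for (int i = 0; i < 4; ++i) a.v[i] += b.v[i];   // lane-wise add
    return a;
}
inline float4 operator*(float4 a, float4 b) {
    for (int i = 0; i < 4; ++i) a.v[i] *= b.v[i];   // lane-wise multiply
    return a;
}

// SIMD comparisons yield a per-lane boolean mask, not a single bool.
inline std::array<bool, 4> cmp_gt(float4 a, float4 b) {
    std::array<bool, 4> m{};
    for (int i = 0; i < 4; ++i) m[i] = a.v[i] > b.v[i];
    return m;
}

// HLSL-style reductions over the mask, used for conditional branching.
inline bool any(const std::array<bool, 4>& m) { return m[0] || m[1] || m[2] || m[3]; }
inline bool all(const std::array<bool, 4>& m) { return m[0] && m[1] && m[2] && m[3]; }
```

Algorithms written against this interface only need the thin wrapper layer reimplemented per platform, which is the porting story described above.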
Apr 22 2011
bearophile <bearophileHUGS lycos.com> writes:
Sean Cavanaugh:

 In many ways the biggest thing I use regularly in game development that
 I would lose by moving to D would be good built-in SIMD support.  The PC
 compilers from MS and Intel both have intrinsic data types and
 instructions that cover all the operations from SSE1 up to AVX.  The
 intrinsics are nice in that the job of register allocation and
 scheduling is given to the compiler and generally the code it outputs is
 good enough (though it needs to be watched at times).

This is a topic quite different from the one I was talking about, but it's an
interesting topic :-)

SIMD intrinsics look ugly, they add a lot of noise to the code, and they are
very specific to one CPU or instruction set. You can't design a clean language
with hundreds of those. Once 256- or 512-bit registers come, you need to add
new intrinsics and change your code to use them. This is not so good.

D array operations are probably meant to become smarter, so when you perform:

int[8] a, b, c;
a = b + c;

a future good D compiler may use just two inlined instructions, or little more.
This will probably include shuffling and broadcasting properties too. Maybe
this kind of code is not as efficient as handwritten assembly code (or C code
that uses SIMD intrinsics), but it's adaptable to different CPUs, future ones
too; it's much less noisy, and it seems safer. I think such optimizations are
better left to the back-end, so a while ago I asked the LLVM devs about this,
for future LDC:
http://llvm.org/bugs/show_bug.cgi?id=6956

The presence of such well-implemented vector ops will not forbid another D
compiler from adding true SIMD intrinsics too.
 Unlike ASM, intrinsics can be inlined so your math library can provide a

DMD may eventually need this feature of the LDC compiler:
http://www.dsource.org/projects/ldc/wiki/InlineAsmExpressions

Bye,
bearophile
Apr 22 2011
Sean Cavanaugh <WorksOnMyMachine gmail.com> writes:
On 4/22/2011 4:41 PM, bearophile wrote:
 [...]

This is a topic quite different from the one I was talking about, but it's an interesting topic :-) SIMD intrinsics look ugly, they add lot of noise to the code, and are very specific to one CPU, or instruction set. You can't design a clean language with hundreds of those. Once 256 or 512 bit registers come, you need to add new intrinsics and change your code to use them. This is not so good.

In C++ the intrinsics are easily wrapped by __forceinline global functions, to
provide a platform abstraction over the intrinsics. Then you can write class
wrappers to provide the most common level of functionality, which boils down to
a class with vectorized math operators for + - * / and vectorized comparison
functions == != >= <= < and >. From HLSL you have to borrow the 'any' and 'all'
statements (along with variations for every permutation of the bitmask of the
test result) to do conditional branching on the tests. This pretty much leaves
swizzle/shuffle/permuting and outlying features (8-, 16-, 64-bit integers) in
the realm of 'ugly'. From here you could build up portable SIMD transcendental
functions (sin, cos, pow, log, etc.) and other libraries (matrix
multiplication, inversion, quaternions, etc.).

I would say in D this could be faked provided the language at a minimum
understood what a 128-bit (SSE1 through 4.2) and a 256-bit value (AVX) was and
how to efficiently move it via registers for function calls. Kind of a 'make it
at least work in the ABI, come back to a good implementation later' solution.
There is some room to beat Microsoft here, as the code Visual Studio 2010
currently outputs for 64-bit environments cannot pass 128-bit SIMD values by
register (__forceinline functions are the only workaround), even though scalar
32- and 64-bit float values are passed by XMM register just fine.

The current hardware landscape dictates organizing your data in SIMD-friendly
manners. Naive OOP-based code is going to dereference too many pointers to get
to scattered data. This makes the hardware prefetcher work too hard, and it
wastes cache memory by only using a fraction of the RAM from each cache line,
plus wasting 75-90% of the bandwidth and memory on the machine.
 D array operations are probably meant to become smarter, when you perform a:

 int[8] a, b, c;
 a = b + c;

Now the original topic pertains to data layouts, with which SIMD, the CPU
cache, and efficient code all inter-relate. I would argue the above code is an
idealistic example, as when writing SIMD code you almost always have to
transpose or rotate one of the sets of data to work in parallel across the
other one. What happens when this code has to branch? In SIMD land you have to
test whether any or all 4 lanes of SIMD data need to take it. And a lot of the
time the best course of action is to compute the other code path in addition to
the first one, AND the first result, NAND the second one, and OR the results
together to make valid output. I could maybe see a functional language doing ok
at this.

The only reasonable construct to explain how common this is in optimized SIMD
code is to compare it to HLSL's vectorized ternary operator (understanding that
'a' and 'b' can be fairly intricate chunks of code if you are clever):

float4 a = {1,2,3,4};
float4 b = {5,6,7,8};
float4 c = {-1,0,1,2};
float4 d = {0,0,0,0};
float4 foo = (c > d) ? a : b;

results in foo = {5,6,3,4}.

For a lot of algorithms the 'a' and 'b' paths have similar cost, so for SIMD
this executes about 2x faster than the scalar case, although better-than-2x
gains are possible, since using SIMD also naturally reduces or eliminates a ton
of branching, which CPUs don't really like to do due to their long pipelines.

And as much as Intel likes to argue that a structure containing positions for a
particle system should look like the following, because it makes their hardware
benchmarks awesome, this vertex layout is a failure:

struct ParticleVertex
{
    float[1000] XPos;
    float[1000] YPos;
    float[1000] ZPos;
}

The GPU (or audio devices) does not consume it this way. The data is also not
cache coherent if you are trying to read or write a single vertex out of the
structure.
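The AND/NAND/OR blend behind that vectorized ternary can be written out lane by lane. This scalar C++ sketch mirrors what the SSE _mm_and_ps/_mm_andnot_ps/_mm_or_ps sequence does; the function name is invented for illustration:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using float4 = std::array<float, 4>;

// Branchless select: foo = (c > d) ? a : b, computed per lane.
// Each comparison produces an all-ones or all-zeros mask, and the two
// candidate values are blended with (mask AND a) OR (NOT mask AND b).
inline float4 select_gt(float4 c, float4 d, float4 a, float4 b) {
    float4 out{};
    for (int i = 0; i < 4; ++i) {
        std::uint32_t mask = (c[i] > d[i]) ? 0xFFFFFFFFu : 0u;  // per-lane compare result
        std::uint32_t ai, bi, ri;
        std::memcpy(&ai, &a[i], sizeof(float));
        std::memcpy(&bi, &b[i], sizeof(float));
        ri = (mask & ai) | (~mask & bi);                        // blend without branching
        std::memcpy(&out[i], &ri, sizeof(float));
    }
    return out;
}
```

Both the 'a' and 'b' paths are always evaluated, which is exactly the cost trade-off discussed above: no branches, but you pay for both sides of the conditional.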
A hybrid structure which is aware of the size of a SIMD register is the next
logical choice:

align(16) struct ParticleVertex
{
    float[4] XPos;
    float[4] YPos;
    float[4] ZPos;
}
ParticleVertex[250] ParticleVertices;
// struct is also now 75% of a 64-byte cache line
// Also, 2 of any 4 random accesses for a vertex are in the same
// cache line, and only 2 cache lines are touched in the worst case

But this hybrid structure still has to be shuffled before being given to a GPU
(albeit in much more bite-sized increments that could easily read-shuffle-write
at the same speed as a platform-optimized memcpy). Things get real messy when
you have multiple vertex attributes, as decisions to keep them together or
separate conflict, and both choices make sense to different systems :)
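The hybrid (blocked) layout translates to C++ directly; the indexing helper below is an invented name, added to show how a single particle maps to a block plus a lane:

```cpp
#include <array>
#include <cstddef>

// Hybrid layout: positions grouped in SIMD-register-sized runs of 4, so
// lane math is natural but one vertex still lives inside one small block.
struct alignas(16) ParticleVertexBlock {
    std::array<float, 4> x, y, z;   // 48 bytes: 75% of a 64-byte cache line
};

constexpr std::size_t kBlocks = 250;                   // 250 * 4 = 1000 particles
using Particles = std::array<ParticleVertexBlock, kBlocks>;

// Fetching particle i means one block index plus a lane index.
inline float x_of(const Particles& p, std::size_t i) {
    return p[i / 4].x[i % 4];
}
```

Reading one vertex touches a single 48-byte block, instead of three cache lines a kilobyte apart as in the pure SoA layout criticized above.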
Apr 22 2011
bearophile <bearophileHUGS lycos.com> writes:
Sean Cavanaugh:

 In C++ the intrinsics are easily wrapped by __forceinline global
 functions, to provide a platform abstraction against the intrinsics.

When AVX becomes 512 bits wide, or you need to use a very different set of
vector registers, your global functions need to change, so the code that calls
them has to change too. This is acceptable for library code, but it's not good
for D built-in operations. D's built-in vector ops need to be cleaner, more
general and longer-lasting, even if they may not fully replace SSE intrinsics.
 I would say in D this could be faked provided the language at a minimum
 understood what a 128 (SSE1 through 4.2) and 256 bit value (AVX) was and
 how to efficiently move it via registers for function calls.

Also think about what the D ABI will be 15-25 years from now. D design must look a bit more forward too.
 Now the original topic pertains to data layouts,

It was about how to not preclude future D compilers from shuffling data around a bit by themselves :-)
 I would argue the above
 code is an idealistic example, as when writing SIMD code you almost
 always have to transpose or rotate one of the sets of data to work in
 parallel across the other one.

Right.
 float4 a = {1,2,3,4};
 float4 b = {5,6,7,8};
 float4 c = {-1,0,1,2};
 float4 d = {0,0,0,0};
 float4 foo = (c > d) ? a : b;

Recently I have asked for a D vector comparison operation too (the compiler is
supposed to be able to split it into register-sized chunks for the
comparisons); this is good for AVX instructions. (A little problem here is that
I think DMD currently allocates memory on the heap to instantiate those four
little arrays.)

int[4] a = [1,2,3,4];
int[4] b = [5,6,7,8];
int[4] c = [-1,0,1,2];
int[4] d = [0,0,0,0];
int[4] foo = (c[] > d[]) ? a[] : b[];
 Things get real messy when you have multiple vertex attributes as
 decisions to keep them together or separate are conflicting and both
 choices make sense to different systems :)

It's not easy for future compilers to perform similar auto-vectorizations :-)

Bye and thank you for your answer,
bearophile
Apr 22 2011
Don <nospam nospam.com> writes:
Sean Cavanaugh wrote:
 [...]

In many ways the biggest thing I use regularly in game development that I would lose by moving to D would be good built-in SIMD support. The PC compilers from MS and Intel both have intrinsic data types and instructions that cover all the operations from SSE1 up to AVX.

The intrinsics are nice in that the job of register allocation and scheduling is given to the compiler, and generally the code it outputs is good enough (though it needs to be watched at times). Unlike ASM, intrinsics can be inlined, so your math library can provide a platform abstraction at that layer before building up to larger operations (like vectorized forms of sin, cos, etc.) and algorithms (like frustum cull checks, k-dop polygon collision, etc.), which makes porting and reusing the algorithms on other platforms much easier: only the low-level layer needs to be ported, and only outliers at the algorithm level need to be tweaked after you get it up and running.

On the consoles there is AltiVec (VMX), which is very similar to SSE in many ways. The common ground is basically SSE1-tier operations: 128-bit values operating on 4x32-bit integers and 4x32-bit floats. 64-bit AMD/Intel makes SSE2 the minimum standard, and a systems language on those platforms should reflect that.
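The platform-abstraction layer described above can be sketched in C. This is a hypothetical portable fallback only; an SSE build would implement the same function names with _mm_add_ps / _mm_mul_ps, and a VMX build with vec_add / vec_madd, as the comments note:

```c
/* One vector type per platform behind a common interface.  Only the
   portable fallback is shown; an SSE build would typedef __m128 and
   an AltiVec build vector float, keeping the same function names. */
typedef struct { float f[4]; } vec4;

static inline vec4 vec4_add(vec4 a, vec4 b)  /* SSE: _mm_add_ps */
{
    vec4 r;
    for (int i = 0; i < 4; i++) r.f[i] = a.f[i] + b.f[i];
    return r;
}

static inline vec4 vec4_mul(vec4 a, vec4 b)  /* SSE: _mm_mul_ps */
{
    vec4 r;
    for (int i = 0; i < 4; i++) r.f[i] = a.f[i] * b.f[i];
    return r;
}
```

Higher-level routines (vectorized sin, frustum culls) are then written once against this layer, and only the layer itself is ported.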

Yes. It is primarily for this reason that we made static arrays return-by-value. It is intended that on x86, float[4] will be an SSE1 register, so it should be possible to write SIMD code with standard array operations. (Note that this is *much* easier for the compiler than trying to vectorize scalar code.) This gives syntax like:
 float[4] a, b, c;
 a[] += b[] * c[];
(currently works, but doesn't use SSE, so has dismal performance).
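For comparison, the C equivalent of `a[] += b[] * c[]` is a loop the compiler must first prove free of aliasing before it can vectorize; `restrict` is the C-side hint, and this analysis burden is exactly what the D array-op syntax sidesteps (names here are illustrative):

```c
/* C equivalent of D's  a[] += b[] * c[];  Without the restrict
   qualifiers the compiler must assume the arrays may overlap and
   often falls back to scalar code; D's array-op syntax promises
   non-overlap up front. */
void muladd(float *restrict a, const float *restrict b,
            const float *restrict c, int n)
{
    for (int i = 0; i < n; i++)
        a[i] += b[i] * c[i];
}
```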
 
 Loading and storing is comparable across platforms with similar 
 alignment restrictions or penalties for working with unaligned data. 
 Packing/swizzle/shuffle/permuting are different but this is not a huge 
 problem for most algorithms.  The lack of fused multiply and add on the 
 Intel side can be worked around or abstracted (i.e. always write code as 
 if it existed, have the Intel version expand to multiple ops).
 
 And now my wish list:
 
 If you have worked with shader programming through HLSL or CG the 
 expressiveness of doing the work in SIMD is very high.  If I could write 
 something that looked exactly like HLSL but it was integrated perfectly 
 in a language like D or C++, it would be pretty huge to me.  The amount 
 of math you can have in a line or two in HLSL is mind boggling at times, 
 yet extremely intuitive and rather easy to debug.
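The fused multiply-add workaround mentioned above (write madd everywhere; let the Intel build expand it to two ops) might look like this portable sketch, with the per-platform expansions noted in comments (the function name is hypothetical):

```c
typedef struct { float v[4]; } float4;

/* r = a*b + c.  On VMX this is a single vec_madd; on SSE (pre-FMA3)
   it expands to _mm_add_ps(_mm_mul_ps(a, b), c) behind the same
   signature, so algorithm-level code never knows the difference. */
static inline float4 madd(float4 a, float4 b, float4 c)
{
    float4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] * b.v[i] + c.v[i];
    return r;
}
```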

Apr 26 2011
parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
On 26/04/11 9:01 AM, Don wrote:
 Sean Cavanaugh wrote:
 In many ways the biggest thing I use regularly in game development
 that I would lose by moving to D would be good built-in SIMD support.
 <snip>

Yes. It is primarily for this reason that we made static arrays return-by-value. It is intended that on x86, float[4] will be an SSE1 register, so it should be possible to write SIMD code with standard array operations. (Note that this is *much* easier for the compiler than trying to vectorize scalar code.) This gives syntax like:
 float[4] a, b, c;
 a[] += b[] * c[];
(currently works, but doesn't use SSE, so has dismal performance).

What about float[4]s that are part of an object? Will they automatically be align(16) so that they can be quickly moved into the SSE registers, or will the user have to specify that manually?

Also, what if I don't want my float[4] to be stored in an SSE register, e.g. because I will be treating those four floats as individual floats, and never as a vector?

IMO, float[4] should be left as it is and you should introduce a new vector data type that has all these optimisations. Just because a vector is four floats doesn't mean that all groups of four floats are vectors.
Apr 26 2011
parent Don <nospam nospam.com> writes:
Peter Alexander wrote:
 On 26/04/11 9:01 AM, Don wrote:
 Sean Cavanaugh wrote:
 In many ways the biggest thing I use regularly in game development
 that I would lose by moving to D would be good built-in SIMD support.
 <snip>

Yes. It is primarily for this reason that we made static arrays return-by-value. It is intended that on x86, float[4] will be an SSE1 register, so it should be possible to write SIMD code with standard array operations. (Note that this is *much* easier for the compiler than trying to vectorize scalar code.) This gives syntax like:
 float[4] a, b, c;
 a[] += b[] * c[];
(currently works, but doesn't use SSE, so has dismal performance).

What about float[4]s that are part of an object? Will they automatically be align(16) so that they can be quickly moved into the SSE registers, or will the user have to specify that manually?

No special treatment; they just use the alignment for arrays of the type, which I believe is indeed align(16) in that case.
 Also, what if I don't want my float[4] to be stored in an SSE register 
 e.g. because I will be treating those four floats as individual floats, 
 and never as a vector?

That's a decision for the compiler to make. It'll generate whatever code it thinks is appropriate. (My mention of float[4] being in an SSE register applies ONLY to parameter passing; but it isn't decided yet anyway).
 IMO, float[4] should be left as it is and you should introduce a new 
 vector data type that has all these optimisations. Just because a vector 
 is four floats doesn't mean that all groups of four floats are vectors.

It has absolutely nothing to do with vectors. All groups of floats (of ANY length) benefit from SIMD. D's semantics make it easy to take advantage of SIMD, regardless of what size it is. C's ancient machine model doesn't envisage SIMD, so C compilers are left with a massive abstraction inversion. It's really quite ridiculous that in this area, most mainstream programming languages are still operating at a lower level of abstraction than asm.
Apr 28 2011
prev sibling parent Mike Parker <aldacron gmail.com> writes:
On 4/23/2011 4:22 AM, Andrew Wiley wrote:

 The reason Minecraft runs so well in Java, from my point of view, is
 that the authors resisted the Java urge to throw objects at the problem
 and instead put everything into large byte arrays and wrote methods to
 manipulate them. From that perspective, using Java would be about the
 same as using any language, which let them stick to what they knew
 without incurring a large performance penalty.

FYI, Markus, the author, has been a figure in the Java game development community for years. He was the original client programmer for Wurm Online[1] (where the landscape is 'infinite' and tiled) and a frequent participant in the Java4k competition[2] (with Left4kDead[3] perhaps being his most popular). I think it's a safe assumption that the techniques he put to use in Minecraft were learned from his experiments with the Wurm landscape and with cramming Java games into 4kb.

[1] http://www.wurmonline.com/
[2] http://www.java4k.com/index.php?action=home
[3] http://www.mojang.com/notch/j4k/l4kd/
Apr 22 2011
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 22/04/2011 18:20, Daniel Gibson wrote:
 Am 22.04.2011 19:11, schrieb Kai Meyer:
 On 04/22/2011 11:05 AM, Daniel Gibson wrote:
 Am 22.04.2011 18:48, schrieb Kai Meyer:
 I don't think C# is the next C++; it's impossible for C# to be what
 C/C++ is. There is a purpose and a place for Interpreted languages like
 C# and Java, just like there is for C/C++. What language do you think
 the interpreters for Java and C# are written in? (Hint: It's not Java or
 C#.) I also don't think that the core of Unity (or any decent game
 engine) is written in an interpreted language either, which basically
 means the guts are likely written in either C or C++. The point being
 made is that Systems Programming Languages like C/C++ and D are picked
 for their execution speed, and Interpreted Languages are picked for
 their ease of programming (or development speed). Since D is picked for
 execution speed, we should seriously consider every opportunity to
 improve in that arena. The OP wasn't just for the game developers, but
 for game framework developers as well.

IMHO D won't be successful for games as long as it only supports Windows, Linux and OSX on PC(-like) hardware. We'd need support for modern game consoles (XBOX360, PS3, maybe Wii) and for mobile devices (Android, iOS, maybe Win7 phones and other stuff). This means good PPC (maybe the PS3's Cell CPU would need special support even though it understands PPC code? I don't know.) and ARM support, plus support for the operating systems and SDKs used on those platforms.

Of course execution speed is very important as well, but D in its current state is not *that* bad in this regard. Sure, the GC is a bit slow, but in high performance games you shouldn't use it (or even malloc/free) all the time anyway, see http://www.digitalmars.com/d/2.0/memory.html#realtime

Another point: I find Minecraft pretty impressive. It really changed my view upon games developed in Java.

Cheers,
- Daniel

Hah, Minecraft. Have you tried loading up a high resolution texture pack yet? There's a reason why it looks like 8-bit graphics. It's not Java that makes Minecraft awesome, imo :)

No I haven't. What I find impressive is this (almost infinitely) big world that is completely changeable, i.e. you can build new stuff everywhere, you can dig tunnels everywhere (ok, somewhere really deep there's a limit) and the game still runs smoothly. Haven't seen something like that in any game before.

Yes, that is why Minecraft is so appealing, but AFAIK that is more of a game design issue than a technical one. It may not be easy to implement such an engine, but I'm sure many game coders out there could have done it; it's not "rocket" science. Rather, it was the gameplay design idea (and the fleshing out of it) that made Minecraft unique and popular, AFAIK. -- Bruno Medeiros - Software Engineer
Apr 29 2011
prev sibling parent Andrew Wiley <wiley.andrew.j gmail.com> writes:

On Fri, Apr 22, 2011 at 12:31 PM, Kai Meyer <kai unixlords.com> wrote:

 On 04/22/2011 11:20 AM, Daniel Gibson wrote:

 Am 22.04.2011 19:11, schrieb Kai Meyer:

 On 04/22/2011 11:05 AM, Daniel Gibson wrote:

 Am 22.04.2011 18:48, schrieb Kai Meyer:

 I don't think C# is the next C++; it's impossible for C# to be what
 C/C++ is. There is a purpose and a place for Interpreted languages like
 C# and Java, just like there is for C/C++. What language do you think
 the interpreters for Java and C# are written in? (Hint: It's not Java
 or
 C#.) I also don't think that the core of Unity (or any decent game
 engine) is written in an interpreted language either, which basically
 means the guts are likely written in either C or C++. The point being
 made is that Systems Programming Languages like C/C++ and D are picked
 for their execution speed, and Interpreted Languages are picked for
 their ease of programming (or development speed). Since D is picked for
 execution speed, we should seriously consider every opportunity to
 improve in that arena. The OP wasn't just for the game developers, but
 for game framework developers as well.

IMHO D won't be successful for games as long as it only supports Windows, Linux and OSX on PC(-like) hardware. We'd need support for modern game consoles (XBOX360, PS3, maybe Wii) and for mobile devices (Android, iOS, maybe Win7 phones and other stuff). This means good PPC (maybe the PS3's Cell CPU would need special support even though it understands PPC code? I don't know.) and ARM support, plus support for the operating systems and SDKs used on those platforms.

Of course execution speed is very important as well, but D in its current state is not *that* bad in this regard. Sure, the GC is a bit slow, but in high performance games you shouldn't use it (or even malloc/free) all the time anyway, see http://www.digitalmars.com/d/2.0/memory.html#realtime

Another point: I find Minecraft pretty impressive. It really changed my view upon games developed in Java.

Cheers,
- Daniel

Hah, Minecraft. Have you tried loading up a high resolution texture pack yet? There's a reason why it looks like 8-bit graphics. It's not Java that makes Minecraft awesome, imo :)

No I haven't. What I find impressive is this (almost infinitely) big world that is completely changeable, i.e. you can build new stuff everywhere, you can dig tunnels everywhere (ok, somewhere really deep there's a limit) and the game still runs smoothly. Haven't seen something like that in any game before.

The random world generator is amazing, but it's not speed. The polygon count of the game is excruciatingly low because the client is smart enough to only draw the faces of blocks that are visible. From the very bottom (bedrock) to the very top of the sky (as high as you can build blocks) is 256 blocks. The game is full of low-level bit-stuffing (like stacks of 64). The genius of the game is not in any special features of Java, it's in the data structure and data generator, which can be done much faster in other languages. But it begs the question, "why does it need to be faster?" It is "fast enough" in the JVM (unless you load up the high resolution textures, in which case the game becomes unbearably slow when viewing long distances.)

Actually, the world is 128 blocks tall, and divided into 16x128x16 block "chunks." To elaborate on the bit stuffing, at the end of the day, each block is 2.5 bytes (type, metadata, and some lighting info) with exceptions for things like chests.

The reason Minecraft runs so well in Java, from my point of view, is that the authors resisted the Java urge to throw objects at the problem and instead put everything into large byte arrays and wrote methods to manipulate them. From that perspective, using Java would be about the same as using any language, which let them stick to what they knew without incurring a large performance penalty.

However, it's also true that as soon as you try to use a 128x128 texture pack, you very quickly become disillusioned with Minecraft's performance.
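The byte-array layout described here (2.5 bytes per block: one byte of type plus 4-bit fields packed two cells per byte) can be sketched in C; all names and the exact field split are assumptions for illustration:

```c
#include <stdint.h>

#define CHUNK_X 16
#define CHUNK_Y 128
#define CHUNK_Z 16
#define CELLS   (CHUNK_X * CHUNK_Y * CHUNK_Z)

/* 1 byte of type + three 4-bit fields (metadata, block light, sky
   light) = 1 + 3*0.5 = 2.5 bytes per cell, matching the figure above. */
typedef struct {
    uint8_t type[CELLS];
    uint8_t meta[CELLS / 2];
    uint8_t block_light[CELLS / 2];
    uint8_t sky_light[CELLS / 2];
} Chunk;

/* Flatten (x, y, z) within a 16x128x16 chunk to a cell index. */
static int cell(int x, int y, int z)
{
    return (x * CHUNK_Z + z) * CHUNK_Y + y;
}

/* Two cells share each byte of a nibble array: even index in the
   low nibble, odd index in the high nibble. */
static uint8_t nibble_get(const uint8_t *arr, int i)
{
    return (i & 1) ? (arr[i >> 1] >> 4) : (arr[i >> 1] & 0x0F);
}

static void nibble_set(uint8_t *arr, int i, uint8_t v)
{
    uint8_t *b = &arr[i >> 1];
    *b = (i & 1) ? (uint8_t)((*b & 0x0F) | (v << 4))
                 : (uint8_t)((*b & 0xF0) | (v & 0x0F));
}
```

The payoff is that a whole chunk is a handful of flat allocations scanned linearly, rather than 32768 objects chased through the heap.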
Apr 22 2011