digitalmars.D - 3d graphics float "benchmarks"

reply redditor <a b.com> writes:
MiniLight: a minimal global illumination renderer (re)written in Scala, OCaml,
Python, Ruby, Lua, Flex and C++ : http://www.hxa.name/minilight/ 

I think it would be interesting to have a D version and compare for the float
performance and maybe even lines of code.

There is another similar "benchmark" here:
http://lucille.atso-net.jp/aobench/

(from reddit)

I'm too much of a D noobie myself to attempt this.
Mar 24 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
redditor:
 There is another similar "benchmark" here:
 http://lucille.atso-net.jp/aobench/
I may send this D2 version to the original author...

WIDTH = 256
HEIGHT = 256
NSUBSAMPLES = 2
NAO_SAMPLES = 8

Timings, seconds:
  Python:     193.6
  Psyco:       49.80
  C gcc:        3.98
  C llvm-gcc:   3.84

Psyco timings may be improved using classes.
I have tried to create a ShedSkin version too.
presume the timings aren't far from the llvm-gcc ones).

D code compiled with:
  DMD v1.041
  -O -release -inline

C code compiled with:
  gcc: V. 4.3.3-dw2-tdm-1 (GCC)
  LLVM: gcc version 4.2.1 (Based on Apple Inc. build 5636) (LLVM build)
  For both: -Wall -O3 -s -fomit-frame-pointer -msse3 -march=core2

Python: ActivePython 2.6.1.1 (r261:67515, Dec 5 2008, 13:58:38) [MSC v.1500 32 bit (Intel)] on win32
Psyco for Python 2.6, V.1.6.0 final 0
 I'm too much of a D noobie myself to attempt this.
training to learn D1, for you.

-----------------------

import std.c.stdio: fopen, fprintf, fwrite, fclose, FILE;
import std.c.stdlib: rand;
import std.math: sqrt, cos, sin, PI;

const int RAND_MAX = short.max;
const int WIDTH = 256;
const int HEIGHT = 256;
const int NSUBSAMPLES = 2;
const int NAO_SAMPLES = 8;

double fpRand() { // if not available
    return rand() / cast(double)RAND_MAX;
}

struct Vec3 {
    double x, y, z;

    static double dot(ref Vec3 v0, ref Vec3 v1) {
        return v0.x * v1.x + v0.y * v1.y + v0.z * v1.z;
    }

    static void cross(ref Vec3 v0, ref Vec3 v1, out Vec3 c) {
        c.x = v0.y * v1.z - v0.z * v1.y;
        c.y = v0.z * v1.x - v0.x * v1.z;
        c.z = v0.x * v1.y - v0.y * v1.x;
    }

    static void normalize(ref Vec3 c) {
        double length = sqrt(dot(c, c));
        if (length > 1e-17) {
            c.x /= length;
            c.y /= length;
            c.z /= length;
        }
    }
}

struct RayIntersection {
    Vec3 rayPosition, rayDirection;
    double distance;
    Vec3 hitPosition, normal;
    bool isHit;
}

struct Sphere {
    Vec3 center;
    double radius;

    void intersects(ref RayIntersection isect) {
        Vec3 rs = Vec3(isect.rayPosition.x - center.x,
                       isect.rayPosition.y - center.y,
                       isect.rayPosition.z - center.z);
        double B = Vec3.dot(rs, isect.rayDirection);
        double C = Vec3.dot(rs, rs) - radius * radius;
        double D = B * B - C;

        if (D > 0.0) {
            double t = -B - sqrt(D);
            if (t > 0.0 && t < isect.distance) {
                isect.distance = t;
                isect.isHit = true;
                isect.hitPosition.x = isect.rayPosition.x + isect.rayDirection.x * t;
                isect.hitPosition.y = isect.rayPosition.y + isect.rayDirection.y * t;
                isect.hitPosition.z = isect.rayPosition.z + isect.rayDirection.z * t;
                isect.normal.x = isect.hitPosition.x - center.x;
                isect.normal.y = isect.hitPosition.y - center.y;
                isect.normal.z = isect.hitPosition.z - center.z;
                Vec3.normalize(isect.normal);
            }
        }
    }
}

struct Plane {
    Vec3 position, normal;

    void intersects(ref RayIntersection isect) {
        double d = -Vec3.dot(position, normal);
        double v = Vec3.dot(isect.rayDirection, normal);
        if (-1e-17 < v && v < 1e-17)
            return;

        double t = -(Vec3.dot(isect.rayPosition, normal) + d) / v;
        if (t > 0.0 && t < isect.distance) {
            isect.distance = t;
            isect.isHit = true;
            isect.hitPosition.x = isect.rayPosition.x + isect.rayDirection.x * t;
            isect.hitPosition.y = isect.rayPosition.y + isect.rayDirection.y * t;
            isect.hitPosition.z = isect.rayPosition.z + isect.rayDirection.z * t;
            isect.normal = normal;
        }
    }
}

Sphere[3] spheres;
Plane plane;

Vec3[] getOrthoBasis(ref Vec3 normal) {
    auto orthoBasis = new Vec3[3];
    orthoBasis[2] = normal;
    orthoBasis[1].x = 0.0;
    orthoBasis[1].y = 0.0;
    orthoBasis[1].z = 0.0;

    if (normal.x < 0.6 && normal.x > -0.6) {
        orthoBasis[1].x = 1.0;
    } else if (normal.y < 0.6 && normal.y > -0.6) {
        orthoBasis[1].y = 1.0;
    } else if (normal.z < 0.6 && normal.z > -0.6) {
        orthoBasis[1].z = 1.0;
    } else {
        orthoBasis[1].x = 1.0;
    }

    Vec3.cross(orthoBasis[1], orthoBasis[2], orthoBasis[0]);
    Vec3.normalize(orthoBasis[0]);
    Vec3.cross(orthoBasis[2], orthoBasis[0], orthoBasis[1]);
    Vec3.normalize(orthoBasis[1]);

    return orthoBasis;
}

void getAmbientOcclusion(ref RayIntersection isect, out Vec3 ambientOcclusion) {
    int ntheta = NAO_SAMPLES;
    int nphi = NAO_SAMPLES;
    const double eps = 0.0001;

    RayIntersection occIsect;
    occIsect.rayPosition.x = isect.hitPosition.x + eps * isect.normal.x;
    occIsect.rayPosition.y = isect.hitPosition.y + eps * isect.normal.y;
    occIsect.rayPosition.z = isect.hitPosition.z + eps * isect.normal.z;

    auto basis = getOrthoBasis(isect.normal);
    int hitCount;

    for (int j = 0; j < ntheta; j++) {
        for (int i = 0; i < nphi; i++) {
            double theta = sqrt(fpRand());
            double phi = 2.0 * PI * fpRand();
            double x = cos(phi) * theta;
            double y = sin(phi) * theta;
            double z = sqrt(1.0 - theta * theta);

            occIsect.rayDirection.x = x * basis[0].x + y * basis[1].x + z * basis[2].x;
            occIsect.rayDirection.y = x * basis[0].y + y * basis[1].y + z * basis[2].y;
            occIsect.rayDirection.z = x * basis[0].z + y * basis[1].z + z * basis[2].z;
            occIsect.distance = 1.0e+17;
            occIsect.isHit = false;

            spheres[0].intersects(occIsect);
            spheres[1].intersects(occIsect);
            spheres[2].intersects(occIsect);
            plane.intersects(occIsect);

            if (occIsect.isHit)
                hitCount++;
        }
    }

    double occlusionRatio = cast(double)(ntheta * nphi - hitCount) /
                            cast(double)(ntheta * nphi);
    ambientOcclusion.x = occlusionRatio;
    ambientOcclusion.y = occlusionRatio;
    ambientOcclusion.z = occlusionRatio;
}

ubyte clamp(double value) {
    int i = cast(int)(value * 255.5);
    if (i > 255)
        i = 255;
    else if (i < 0)
        i = 0;
    return cast(ubyte)i;
}

void render(ubyte[] byteImage, int width, int height, int numberOfSubSamples) {
    auto fimg = new double[width * height * 3];
    fimg[] = 0.0;

    RayIntersection isect;
    isect.rayPosition.x = 0.0;
    isect.rayPosition.y = 0.0;
    isect.rayPosition.z = 0.0;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            for (int v = 0; v < numberOfSubSamples; v++) {
                for (int u = 0; u < numberOfSubSamples; u++) {
                    isect.rayDirection.x = (x + (u / cast(double)numberOfSubSamples) - (width / 2.0)) / (width / 2.0);
                    isect.rayDirection.y = -(y + (v / cast(double)numberOfSubSamples) - (height / 2.0)) / (height / 2.0);
                    isect.rayDirection.z = -1.0;
                    isect.distance = 1.0e+17;
                    isect.isHit = false;
                    Vec3.normalize(isect.rayDirection);

                    spheres[0].intersects(isect);
                    spheres[1].intersects(isect);
                    spheres[2].intersects(isect);
                    plane.intersects(isect);

                    if (isect.isHit) {
                        Vec3 ambientOcclusion;
                        getAmbientOcclusion(isect, ambientOcclusion);
                        fimg[3 * (y * width + x) + 0] += ambientOcclusion.x;
                        fimg[3 * (y * width + x) + 1] += ambientOcclusion.y;
                        fimg[3 * (y * width + x) + 2] += ambientOcclusion.z;
                    }
                }
            }

            fimg[3 * (y * width + x) + 0] /= cast(double)(numberOfSubSamples * numberOfSubSamples);
            fimg[3 * (y * width + x) + 1] /= cast(double)(numberOfSubSamples * numberOfSubSamples);
            fimg[3 * (y * width + x) + 2] /= cast(double)(numberOfSubSamples * numberOfSubSamples);

            byteImage[3 * (y * width + x) + 0] = clamp(fimg[3 * (y * width + x) + 0]);
            byteImage[3 * (y * width + x) + 1] = clamp(fimg[3 * (y * width + x) + 1]);
            byteImage[3 * (y * width + x) + 2] = clamp(fimg[3 * (y * width + x) + 2]);
        }
    }
}

void setupScene() {
    spheres[0].center.x = -2.0;
    spheres[0].center.y = 0.0;
    spheres[0].center.z = -3.5;
    spheres[0].radius = 0.5;

    spheres[1].center.x = -0.5;
    spheres[1].center.y = 0.0;
    spheres[1].center.z = -3.0;
    spheres[1].radius = 0.5;

    spheres[2].center.x = 1.0;
    spheres[2].center.y = 0.0;
    spheres[2].center.z = -2.2;
    spheres[2].radius = 0.5;

    plane.position.x = 0.0;
    plane.position.y = -0.5;
    plane.position.z = 0.0;
    plane.normal.x = 0.0;
    plane.normal.y = 1.0;
    plane.normal.z = 0.0;
}

void savePPM(char* fname, int w, int h, ubyte* img) {
    FILE *fp;
    fp = fopen(fname, "wb");
    assert(fp);
    fprintf(fp, "P6\n");
    fprintf(fp, "%d %d\n", w, h);
    fprintf(fp, "255\n");
    fwrite(img, w * h * 3, 1, fp);
    fclose(fp);
}

void main() {
    auto img = new ubyte[WIDTH * HEIGHT * 3];
    setupScene();
    render(img, WIDTH, HEIGHT, NSUBSAMPLES);
    savePPM("ao3_d.ppm".ptr, WIDTH, HEIGHT, img.ptr);
}
Mar 24 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
After replacing "double" with "float" throughout the C and D programs, the
output image is the same, but the timings change:

Timings, seconds:
  Python:     193.6
  Psyco:       49.80
  D2:           6.30
  D3:           4.22
  C2 gcc:       4.06 (float)  
  C gcc:        3.98
  C llvm-gcc:   3.84
  C2 llvm-gcc:  3.73 (float)
  D3b:          3.62 (float)

Using both cores of my CPU, the timings would probably be halved. Perhaps
someone here can suggest the changes needed to use 2 or 4 cores (on Windows).
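
One possible direction, as a rough sketch only: it assumes a D2 compiler whose Phobos ships std.parallelism (a module added to Phobos after this thread was written), and it uses a hypothetical shadePixel function as a stand-in for the per-pixel ambient occlusion work. Rows are handed out to worker threads, and each row writes to its own part of the image, so no locking is needed for the output.

```d
import std.parallelism : parallel;
import std.range : iota;

enum WIDTH = 256;
enum HEIGHT = 256;

// Hypothetical stand-in for the real per-pixel work (not from the post).
ubyte shadePixel(int x, int y) pure nothrow {
    return cast(ubyte)((x ^ y) & 0xFF);
}

// Each image row is an independent work item; rows never overlap in img.
void renderParallel(ubyte[] img, int width, int height) {
    foreach (y; parallel(iota(height))) {
        foreach (x; 0 .. width) {
            immutable v = shadePixel(x, y);
            immutable idx = 3 * (y * width + x);
            img[idx + 0] = v;
            img[idx + 1] = v;
            img[idx + 2] = v;
        }
    }
}

void main() {
    auto img = new ubyte[WIDTH * HEIGHT * 3];
    renderParallel(img, WIDTH, HEIGHT);
    // Spot-check one pixel against the serial definition.
    assert(img[3 * (5 * WIDTH + 9)] == shadePixel(9, 5));
}
```

Note that the posted code calls the C rand(), which keeps shared state, so a real parallel version would presumably need a separate random generator per thread; that part is not shown here.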

Be careful: the GPUAO versions, which are supposed to run in 0.01 s on the GPU,
may contain a virus.

Bye,
bearophile
Mar 24 2009
parent bearophile <bearophileHUGS lycos.com> writes:
I give advice to new Python programmers. They are often people coming from
Java, and they write "Java in Python". One of the many things they do wrong is
to use classes for everything and to build deep class trees. The result is
code that is too slow and too long.

The D language isn't very widespread, so many new D programmers may come from
Java, and D1 compiles code quite similar to basic Java. So they may write
Java-style D code. I have naively translated some small programs from Java to
D1, and the result is often a slowdown in running speed. This shows that D
compilers have a lot of optimizations to catch up on compared with HotSpot (I
think two important "optimizations" of HotSpot are its efficient garbage
collector and its ability to fully or partially inline many virtual methods).

One of those tiny 3D benchmarks shows what I mean:
http://leonardo-m.livejournal.com/79346.html

A few of those timings, in seconds:
  ao_d, float:                  3.67 s
  AO.java, float, naive:        6.81 s
  ao2_py with Psyco:           16.72 s
  ao2_d, float, naive:         33.77 s

(Note that I have seen a C++ version that uses threads and runs in about 1
second on my PC, which has 2 cores, so it's about two times faster than the D
code. The code doesn't differ much among translations, because the purpose of
this benchmark is to compare languages, not different skills in manually
optimizing code.)

ao_d is a fast D version of mine that uses structs and a lot of references and
hidden mutation. It's not easy to translate this version to "clean" and
bug-free Java code because of the hidden mutations. Although it's fast, I
usually don't like this style of programming, because all those references and
hidden mutations lead to code that is hard to debug and hard to translate to
other languages (and sooner or later I have to translate a lot of the programs
I write).

AO.java is a quite naive Java adaptation of the original Processing version.

The ao2_d version is a direct D translation of the Java version. It's clean,
easy to understand, easy to debug and easy enough to translate to other
languages. It shows that D code written in a Java-like style can sometimes be
very slow. Profiling the ao2_d code seems to show that part of the slowdown
comes from virtual methods like dot that aren't inlined. This little program
also shows why Java is so widespread: it's easy to write correct and readable
programs that aren't that much slower (and it's probably not too difficult to
add threads to this Java program, making it about as fast as my faster D
version).

So D style guides have to warn new D programmers coming from Java that some of
the optimizations they used to rely on aren't available, so they have to change
their style and code if they want to keep or gain performance.
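
To make the point concrete, here is a minimal sketch of the same dot product written three ways. The type names VecClass, VecFinal and VecStruct are made up for this illustration and are not taken from ao_d or ao2_d. In D, methods of a class are virtual by default, so the Java-style version pays for an indirect call unless the compiler can prove the target; marking the class final, or using a struct, removes the virtual dispatch and gives the compiler a chance to inline. Whether it actually inlines depends on the compiler and flags.

```d
// Java-style: class methods are virtual by default in D,
// and instances are allocated on the GC heap.
class VecClass {
    double x, y, z;
    this(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    double dot(VecClass o) { return x * o.x + y * o.y + z * o.z; }
}

// Same layout, but final: dot() cannot be overridden,
// so the call can be resolved statically.
final class VecFinal {
    double x, y, z;
    this(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    double dot(VecFinal o) { return x * o.x + y * o.y + z * o.z; }
}

// Struct: a value type with no vtable and no heap allocation.
struct VecStruct {
    double x, y, z;
    double dot(ref const VecStruct o) const {
        return x * o.x + y * o.y + z * o.z;
    }
}

void main() {
    auto a = new VecClass(1, 2, 3);
    auto b = new VecClass(4, 5, 6);
    auto fa = new VecFinal(1, 2, 3);
    auto fb = new VecFinal(4, 5, 6);
    auto sa = VecStruct(1, 2, 3);
    auto sb = VecStruct(4, 5, 6);

    // All three compute the same value; only the call mechanism differs.
    assert(a.dot(b) == 32.0);
    assert(fa.dot(fb) == 32.0);
    assert(sa.dot(sb) == 32.0);
}
```

The struct form is closest to what the fast version does, at the cost of passing by reference explicitly; the final class keeps the Java-like look while dropping the virtual call.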

(If someone is willing, I'd like to know the timings for the ao2_py version
with Psyco on a CPU with 3+ cores, and for the D code compiled with LDC.)

Bye,
bearophile
Mar 27 2009
prev sibling parent Moritz Warning <moritzwarning web.de> writes:
On Tue, 24 Mar 2009 11:51:31 -0400, redditor wrote:

 MiniLight: a minimal global illumination renderer (re)written in Scala,
 OCaml, Python, Ruby, Lua, Flex and C++ : http://www.hxa.name/minilight/
 
 I think it would be interesting to have a D version and compare for the
 float performance and maybe even lines of code.
 
 There is another similar "benchmark" here:
 http://lucille.atso-net.jp/aobench/
 
 (from reddit)
 
 I'm too much of a D noobie myself to attempt this.
I'm in the process of porting it to D just for fun. Stay tuned. :)
Mar 24 2009