digitalmars.D.learn - float[] → Vertex[] – decreases performa

reply David <d dav1d.de> writes:
I am writing a game engine, well I was using a float[] array to store my 
vertices, this worked well, but I have to send more and more uv 
coordinates (and other information) which needn't be stored as `float`'s 
so I moved from a float-Array to a Vertex Array:
https://github.com/Dav1dde/BraLa/blob/master/brala/dine/builder/tessellator.d#L30


align(1) struct Vertex {
     float x;
     float y;
     float z;
     float nx;
     float ny;
     float nz;
     float u_terrain;
     float v_terrain;
     float u_biome;
     float v_biome;
}

Everything is still a float, so it's easier. Nothing wrong with that or? 
Well this change decreases my performance by 1000%. My frame rate drops 
from ~12ms per frame to ~120ms per frame. I tried to find the bottleneck 
with `perf` but no results (the time is not spent in the game/engine).

The commit:
https://github.com/Dav1dde/BraLa/commit/02a37a0e46f195f5a46404747d659d26490e6c32

I hope you can see anything wrong. I have no idea!
Jul 24 2012
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
David:

 align(1) struct Vertex {
     float x;
     float y;
     float z;
     float nx;
     float ny;
     float nz;
     float u_terrain;
     float v_terrain;
     float u_biome;
     float v_biome;
 }

 Everything is still a float, so it's easier. Nothing wrong with 
 that or? Well this change decreases my performance by 1000%.

Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1).

Bye,
bearophile
Jul 24 2012
next sibling parent reply David <d dav1d.de> writes:
On 24.07.2012 20:57, bearophile wrote:
 David:
 Everything is still a float, so it's easier. Nothing wrong with that
 or? Well this change decreases my performance by 1000%.

Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1). Bye, bearophile

This makes no difference.
Jul 24 2012
next sibling parent reply Simon <s.d.hammett gmail.com> writes:
On 24/07/2012 20:08, David wrote:
 On 24.07.2012 20:57, bearophile wrote:
 David:
 Everything is still a float, so it's easier. Nothing wrong with that
 or? Well this change decreases my performance by 1000%.

Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1). Bye, bearophile

This makes no difference.

Could be that your structs are getting default initialised, so you will be getting a constructor called for every instance of a Vertex. This will be a lot slower than a float array.

Try void initialising your Vertex arrays.

http://dlang.org/declaration.html

See the bit on Void Initializations near the bottom.

Also make sure that you are passing fixed-size arrays by reference.

--
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
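
A minimal sketch of the difference between default and void initialisation, assuming the all-float Vertex from the first post. The helper name `makeQuad` and the values written are made up for illustration and are not taken from BraLa:

```d
struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

// Hypothetical helper, not part of BraLa.
Vertex[4] makeQuad()
{
    // `= void` skips the default initialisation, which would otherwise
    // fill every float with float.nan before it is overwritten anyway.
    Vertex[4] quad = void;
    foreach (i, ref v; quad)
        v = Vertex(cast(float) i, 0, 0,  0, 1, 0,  0, 0,  0, 0);
    return quad;
}
```

Void initialisation only saves the initial write; reading an element before assigning to it gives undefined contents, so it probably only pays off for large arrays that are filled completely right away.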
Jul 24 2012
parent David <d dav1d.de> writes:
 Could be that your structs are getting default initialised so you will
 be getting a constructor called for every instance of a Vertex.

 This will be a lot slower than a float array.
 Try void initialising your Vertex arrays.

 http://dlang.org/declaration.html

 See the bit Void Initializations near the bottom.

 Also make sure that you are passing fixed size arrays by reference.

No. The vertices are just created once (with a call to the default ctor) and immediately added to the Vertex* but they are never instantiated.
Jul 24 2012
prev sibling parent reply David <d dav1d.de> writes:
 Hmm. Could this be a GC-related issue?

Actually this could be. They are stored inside a Vertex* array which is allocated with `malloc`, maybe the GC scans all of the created vertex structs? Could this be?
Jul 24 2012
parent reply David <d dav1d.de> writes:
On 24.07.2012 21:46, David wrote:
 Hmm. Could this be a GC-related issue?

Actually this could be. They are stored inside a Vertex* array which is allocated with `malloc`, maybe the GC scans all of the created vertex structs? Could this be?

import core.memory; GC.disable(); directly when entering main didn't help, so I guess it's not the GC
Jul 24 2012
parent David <d dav1d.de> writes:
 This is strange. You said that you profiled the program and the extra
 time spent is not in user code? Where is it spent then?

This is a damn good question. I tried to debug it manually with writefln's, it showed that glfwSwapBuffers needed the time (which, I looked it up, is just a wrapper around glXSwapBuffers). `perf` showed me nothing, the time was used in some unresolved calls. I will make new tests with perf tomorrow.
Jul 24 2012
prev sibling parent reply David <d dav1d.de> writes:
 I agree. I don't know how the CPU handles misaligned floats, but from
 what I understand, it will do two loads to fetch the two word-aligned
 parts of the float, and then assemble it together. This may be what's
 causing the slowdown.


 T

Removing the `align(1)` changes nothing, not 1ms slower or faster, unfortunately.
Jul 24 2012
parent reply David <d dav1d.de> writes:
On 25.07.2012 01:10, Era Scarecrow wrote:
 Removing the `align(1)` changes nothing, not 1ms slower or faster,
 unfortunately.

[quote]
[code]
Vertex[] data;
foreach(i; 0..6) {
    data ~= Vertex(positions[i][0], positions[i][1], positions[i][2],
[/code]
[/quote]

Try using reserve? The new structure size looks like it's about 40 bytes, and aside from resizing I'm not sure why it would have issues.

[code]
Vertex[] data;
data.reserve(6);
//following foreach...
[/code]

Also not the problem, I returned the whole array at once and it didn't help. But thanks for your idea.

The strange thing is, these tessellation functions are just run once and then the data is passed to the GPU. So my comment shouldn't have a direct impact on the speed (e.g. a GC issue would explain it, but unfortunately it isn't the GC).

I'll try a different compiler, too.
Jul 25 2012
parent reply David <d dav1d.de> writes:
 I'll try a different compiler, too.

It's the same issue with ldc
Jul 25 2012
parent David <d dav1d.de> writes:
On 25.07.2012 15:44, Andrea Fontana wrote:
 Have you checked your default compiler/linker args?

 On Wed, 25/07/2012 at 15.23 +0200, David wrote:
 I'll try a different compiler, too.

It's the same issue with ldc


They didn't change (of course I changed the args which are different for ldc), what exactly do you mean?
Jul 25 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 24, 2012 at 08:57:08PM +0200, bearophile wrote:
 David:
 
align(1) struct Vertex {
    float x;
    float y;
    float z;
    float nx;
    float ny;
    float nz;
    float u_terrain;
    float v_terrain;
    float u_biome;
    float v_biome;
}

Everything is still a float, so it's easier. Nothing wrong with
that or? Well this change decreases my performance by 1000%.

Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1).

I agree. I don't know how the CPU handles misaligned floats, but from what I understand, it will do two loads to fetch the two word-aligned parts of the float, and then assemble it together. This may be what's causing the slowdown.

T

--
Little children, little troubles.
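
Whether `align(1)` can misalign anything in this particular struct can be checked at compile time. A small sketch, assuming the struct from the first post (the two type names and the helper `sameLayout` are invented for the comparison): since every member is a 4-byte float there is no padding to remove, so both variants should end up with the same size and the same field offsets.

```d
// Same fields as the Vertex from the first post, once with and once without align(1).
align(1) struct PackedVertex
{
    float x, y, z, nx, ny, nz, u_terrain, v_terrain, u_biome, v_biome;
}

struct PlainVertex
{
    float x, y, z, nx, ny, nz, u_terrain, v_terrain, u_biome, v_biome;
}

// Ten floats, no padding in either variant.
static assert(PackedVertex.sizeof == 10 * float.sizeof);
static assert(PlainVertex.sizeof == 10 * float.sizeof);

// Runtime-callable version of the same comparison (hypothetical helper).
bool sameLayout()
{
    return PackedVertex.sizeof == PlainVertex.sizeof
        && PackedVertex.nx.offsetof == PlainVertex.nx.offsetof
        && PackedVertex.v_biome.offsetof == PlainVertex.v_biome.offsetof;
}
```

If the layouts match, each float stays 4-byte aligned relative to the start of the buffer, which would be consistent with removing `align(1)` making no measurable difference.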
Jul 24 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 24, 2012 at 09:08:10PM +0200, David wrote:
 On 24.07.2012 20:57, bearophile wrote:
David:
Everything is still a float, so it's easier. Nothing wrong with that
or? Well this change decreases my performance by 1000%.

Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1). Bye, bearophile

This makes no difference.

Hmm. Could this be a GC-related issue?

T

--
No! I'm not in denial!
Jul 24 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 24, 2012 at 10:53:05PM +0200, David wrote:
 On 24.07.2012 21:46, David wrote:
Hmm. Could this be a GC-related issue?

Actually this could be. They are stored inside a Vertex* array which is allocated with `malloc`, maybe the GC scans all of the created vertex structs? Could this be?

import core.memory; GC.disable(); directly when entering main didn't help, so I guess it's not the GC

This is strange. You said that you profiled the program and the extra time spent is not in user code? Where is it spent then?

T

--
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.
Jul 24 2012
prev sibling next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Tue, 24 Jul 2012 22:53:05 +0200, David <d dav1d.de> wrote:

 On 24.07.2012 21:46, David wrote:
 Hmm. Could this be a GC-related issue?

Actually this could be. They are stored inside a Vertex* array which is allocated with `malloc`, maybe the GC scans all of the created vertex structs? Could this be?

import core.memory; GC.disable(); directly when entering main didn't help, so I guess it's not the GC

As long as you're using malloc, the GC should leave it alone.

--
Simen
Jul 24 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, July 25, 2012 00:12:19 David wrote:
 This is strange. You said that you profiled the program and the extra
 time spent is not in user code? Where is it spent then?

This is a damn good question. I tried to debug it manually with writefln's, it showed that glfwSwapBuffers needed the time (which, I looked it up, is just a wrapper around glXSwapBuffers). `perf` showed me nothing, the time was used in some unresolved calls. I will make new tests with perf tomorrow.

dmd comes with a profiler built into it. Just compile with -profile, and you'll get profile information when you run your program.

- Jonathan M Davis
Jul 24 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Tuesday, 24 July 2012 at 19:42:34 UTC, David wrote:
 I agree. I don't know how the CPU handles misaligned floats, 
 but from
 what I understand, it will do two loads to fetch the two 
 word-aligned
 parts of the float, and then assemble it together. This may be 
 what's
 causing the slowdown.


 T

Removing the `align(1)` changes nothing, not 1ms slower or faster, unfortunately.

[quote]
[code]
Vertex[] data;
foreach(i; 0..6) {
    data ~= Vertex(positions[i][0], positions[i][1], positions[i][2],
[/code]
[/quote]

Try using reserve? The new structure size looks like it's about 40 bytes, and aside from resizing I'm not sure why it would have issues.

[code]
Vertex[] data;
data.reserve(6);
//following foreach...
[/code]
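
Spelled out a bit more, the reserve idea might look like the sketch below. The function name `buildFace` is made up, and because the quoted snippet is cut off after the position arguments, the normal and UV values here are plain placeholders rather than what the tessellator really passes:

```d
struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

// Hypothetical helper: builds one Vertex per position.
Vertex[] buildFace(const float[3][] positions)
{
    Vertex[] data;
    // Ask for the capacity once so that the appends below
    // do not have to reallocate while the array grows.
    data.reserve(positions.length);
    foreach (p; positions)
        data ~= Vertex(p[0], p[1], p[2],  0, 0, 0,  0, 0,  0, 0); // placeholder normal/UVs
    return data;
}
```

With only six vertices per call the saving is likely tiny, which would fit the report that it made no difference here.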
Jul 24 2012
prev sibling next sibling parent Andrea Fontana <nospam example.com> writes:
Have you checked your default compiler/linker args?

On Wed, 25/07/2012 at 15.23 +0200, David wrote:

 I'll try a different compiler, too.

It's the same issue with ldc

Jul 25 2012
prev sibling next sibling parent reply David <d dav1d.de> writes:
Ok here we go:

perf.data: http://dav1d.de/perf.data

and a fancy image (showing the results of perf): http://dav1d.de/output.png

I hope anyone knows where the time is spent.

Most time spent:
+  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b
Jul 25 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 25-Jul-12 17:54, David wrote:
 Ok here we go:

 perf.data: http://dav1d.de/perf.data

 and a fancy image (showing the results of perf): http://dav1d.de/output.png

 I hope anyone knows where the time is spent.

 Most time spent:
 +  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b

Would be cool to have a before/after graph.

--
Dmitry Olshansky
Jul 25 2012
parent reply David <d dav1d.de> writes:
On 25.07.2012 16:23, Dmitry Olshansky wrote:
 On 25-Jul-12 17:54, David wrote:
 Ok here we go:

 perf.data: http://dav1d.de/perf.data

 and a fancy image (showing the results of perf):
 http://dav1d.de/output.png

 I hope anyone knows where the time is spent.

 Most time spent:
 +  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b

Would be cool to have before/after graph.

I don't know how to make comparisons with perf.data but here is the captured data of the "working" version: http://dav1d.de/output_before.png

perf.data: http://dav1d.de/perf_before.data
Jul 25 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 25-Jul-12 19:32, David wrote:
 On 25.07.2012 16:23, Dmitry Olshansky wrote:
 On 25-Jul-12 17:54, David wrote:
 Ok here we go:

 perf.data: http://dav1d.de/perf.data

 and a fancy image (showing the results of perf):
 http://dav1d.de/output.png

 I hope anyone knows where the time is spent.

 Most time spent:
 +  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b

Would be cool to have before/after graph.

I don't know how to make comparisons with perf.data but here is the captured data of the "working" version: http://dav1d.de/output_before.png perf.data: http://dav1d.de/perf_before.data

It looks like a syscall/opengl issue. You somehow managed to hit a dark corner of the GL driver. It's either a (partial) fallback to software or some extra translation layer. I once had a cool table that showed which GL calls are direct to hardware and which are not for various nvidia cards.

Now the trick is to get an idea why. The best idea to debug driver related stuff is to test on some other computer (like a different version of OS, video card etc.).

Can't quite decipher the output but I find it strange that it mentions _d_invariant. You'd better compile with -release if you care for speed.

--
Dmitry Olshansky
Jul 25 2012
parent reply David <d dav1d.de> writes:
 It looks like a syscall/opengl issue. You somehow managed to hit a dark
 corner of GL driver. It's either a fallback to software (partial) or
 some extra translation layer.
 I once had a cool table that showed which GL calls  are direct to
 hardware and which are not for various nvidia cards.

 Now the trick is to get an idea why. The best idea to debug driver
 related stuff is to test on some other computer (like different version
 of OS, video card etc.).

Worst case scenario ... driver issue.
 Can't quite decipher output but I find it strange that it mentions
 _d_invariant. You'd better compile with -release if you care for speed.

I don't care about speed much, but 1000% less performance is just too bad.
Jul 25 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 26-Jul-12 00:52, David wrote:
 It looks like a syscall/opengl issue. You somehow managed to hit a dark
 corner of GL driver. It's either a fallback to software (partial) or
 some extra translation layer.
 I once had a cool table that showed which GL calls  are direct to
 hardware and which are not for various nvidia cards.

 Now the trick is to get an idea why. The best idea to debug driver
 related stuff is to test on some other computer (like different version
 of OS, video card etc.).

Worst case scenario ... driver issue.

Been there once. In any case I'd try to split coordinates into 2 or 3 interleaved arrays (like vertex+norm and separately 2 UV). It's usually slower but not 10x ;)

 Can't quite decipher output but I find it strange that it mentions
 _d_invariant. You'd better compile with -release if you care for speed.

I don't care about speed much, but 1000% less performance is just too bad.

-- Dmitry Olshansky
Jul 25 2012
parent reply David <d dav1d.de> writes:
On 25.07.2012 23:03, Dmitry Olshansky wrote:
 On 26-Jul-12 00:52, David wrote:
 It looks like a syscall/opengl issue. You somehow managed to hit a dark
 corner of GL driver. It's either a fallback to software (partial) or
 some extra translation layer.
 I once had a cool table that showed which GL calls  are direct to
 hardware and which are not for various nvidia cards.

 Now the trick is to get an idea why. The best idea to debug driver
 related stuff is to test on some other computer (like different version
 of OS, video card etc.).

Worst case scenario ... driver issue.

Been there once. In any case I'd try to split coordinates into 2 or 3 interleaved arrays (like vertex+norm and separately 2 UV). It's usually slower but not 10x ;)

Well the interesting question is, why is it slower? I checked it twice, the data passed to the GPU is 100% the same, no difference, the only difference is the stored format on the CPU (and that's just a matter of casting).
Jul 25 2012
parent David <d dav1d.de> writes:
 It's not easy to answer similar general questions. Why don't you list
 the assembly of the two versions and compare?

My assembly is pretty rusty and actually, I have no idea what to look for.
Jul 25 2012
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
David:

 Well the interesting question is, why is it slower? I checked it 
 twice, the data passed to the GPU is 100% the same, no 
 difference, the only difference is the stored format on the CPU 
 (and that's just a matter of casting).

It's not easy to answer similar general questions. Why don't you list the assembly of the two versions and compare?

Bye,
bearophile
Jul 25 2012
prev sibling next sibling parent Andrea Fontana <nospam example.com> writes:
I had a performance problem with std.xml some months ago. It took me a long time to figure out that there was a default linker param (in gdc & dmd under Linux) that slowed down the whole thing.
So maybe it's not a code-related issue, I mean :)

On Wed, 25/07/2012 at 15.53 +0200, David wrote:

 On 25.07.2012 15:44, Andrea Fontana wrote:
 Have you checked your default compiler/linker args?

 On Wed, 25/07/2012 at 15.23 +0200, David wrote:
 I'll try a different compiler, too.

It's the same issue with ldc

They didn't change (of course I changed the args which are different for ldc), what exactly do you mean?

Jul 25 2012
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 07/24/2012 11:38 AM, David wrote:

 Well this change decreases my performance by 1000%.

Random guess: CPU cache misses?

Ali
Jul 25 2012
parent reply David <d dav1d.de> writes:
On 26.07.2012 00:12, Ali Çehreli wrote:
 On 07/24/2012 11:38 AM, David wrote:

  > Well this change decreases my performance by 1000%.

 Random guess: CPU cache misses?

 Ali

You're the 2nd one mentioning this, any ideas how to check this?
Jul 25 2012
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 07/25/2012 03:26 PM, David wrote:
 On 26.07.2012 00:12, Ali Çehreli wrote:
 On 07/24/2012 11:38 AM, David wrote:

 Well this change decreases my performance by 1000%.

Random guess: CPU cache misses? Ali

You're the 2nd one mentioning this, any ideas how to check this?

I have no experience. Pages like this look promising:

http://stackoverflow.com/questions/2486840/linux-c-how-to-profile-time-wasted-due-to-cache-misses

Ali
Jul 25 2012
parent David <d dav1d.de> writes:
On 26.07.2012 00:37, Ali Çehreli wrote:
 On 07/25/2012 03:26 PM, David wrote:
 On 26.07.2012 00:12, Ali Çehreli wrote:
 On 07/24/2012 11:38 AM, David wrote:

 Well this change decreases my performance by 1000%.

Random guess: CPU cache misses? Ali

You're the 2nd one mentioning this, any ideas how to check this?

I have no experience. Pages like this look promising: http://stackoverflow.com/questions/2486840/linux-c-how-to-profile-time-wasted-due-to-cache-misses Ali

From what I've seen everything is ok (I used `perf top -e L1-dcache-load-misses -e L1-dcache-loads` to see the hotspots, nothing too bad)
Jul 25 2012
prev sibling next sibling parent reply David <d dav1d.de> writes:
Ok, interesting thing.

I switched my buffer from Vertex* to void* and I cast every Vertex I get 
to void[] and add it to the buffer (slice → memcopy) and everything 
works fine now. I can live with that (once the basic functions are 
implemented it's not even a pain to use), but still, I wonder where the 
problem is.
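
For reference, a minimal sketch of what such a workaround might look like, assuming a malloc-backed buffer. `RawBuffer` and its members are hypothetical names, not the actual BraLa code:

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memcpy;

struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

// Hypothetical stand-in for the engine's buffer type.
struct RawBuffer
{
    void* data;      // untyped storage from malloc
    size_t used;     // bytes written so far
    size_t capacity; // bytes allocated

    static RawBuffer create(size_t bytes)
    {
        auto p = malloc(bytes);
        assert(p !is null, "malloc failed");
        return RawBuffer(p, 0, bytes);
    }

    // View the vertex as raw bytes (a void[] slice) and copy them in.
    void add(ref const Vertex v)
    {
        const(void)[] bytes = (&v)[0 .. 1];
        assert(used + bytes.length <= capacity, "buffer full");
        memcpy(cast(ubyte*) data + used, bytes.ptr, bytes.length);
        used += bytes.length;
    }

    void release()
    {
        free(data);
        data = null;
        used = capacity = 0;
    }
}
```

The bytes that end up in the buffer are the same as with a typed `Vertex*`; only the way they are written differs, so this sketch shows the mechanics of the workaround and does not explain the slowdown.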
Jul 26 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 26-Jul-12 14:14, David wrote:
 Ok, interesting thing.

 I switched my buffer from Vertex* to void* and I cast every Vertex I get
 to void[] and add it to the buffer (slice → memcopy) and everything
 works fine now. I can live with that (once the basic functions are
 implemented it's not even a pain to use), but still, I wonder where the
 problem is.

Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and offsets correct (like in Vertex vs float)?

--
Dmitry Olshansky
Jul 26 2012
parent reply David <d dav1d.de> writes:
 Hm. Do you ever do pointer arithmetic on Vertex*?  Are the size and
 offsets correct (like in Vertex vs float)?

No, yes. I really have no idea why this happens, I saved the contents of my buffers and compared them with the buffers of the `float[]` version (thanks to `git checkout`) and they were exactly 100% the same. It's a mystery.
Jul 26 2012
parent dennis luehring <dl.soluz gmx.net> writes:
On 26.07.2012 21:18, David wrote:
 Hm. Do you ever do pointer arithmetic on Vertex*?  Are the size and
 offsets correct (like in Vertex vs float)?

No, yes. I really have no idea why this happens, I saved the contents of my buffers and compared them with the buffers of the `float[]` version (thanks to `git checkout`) and they were exactly 100% the same. It's a mystery.

Can you create a version of your code that allows switching (version(Vertex) else ...) between array and Vertex, or provide both versions here again?

You checked dmd and ldc output, so it can't be a backend thing (maybe frontend or GC), or mysterious GL bugs.
Jul 26 2012
prev sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 24.07.2012 20:38, David wrote:
 I am writing a game engine, well I was using a float[] array to store my
 vertices, this worked well, but I have to send more and more uv
 coordinates (and other information) which needn't be stored as `float`'s
 so I moved from a float-Array to a Vertex Array:
 https://github.com/Dav1dde/BraLa/blob/master/brala/dine/builder/tessellator.d#L30


 align(1) struct Vertex {
      float x;
      float y;
      float z;
      float nx;
      float ny;
      float nz;
      float u_terrain;
      float v_terrain;
      float u_biome;
      float v_biome;
 }

 Everything is still a float, so it's easier. Nothing wrong with that or?
 Well this change decreases my performance by 1000%. My frame rate drops
 from ~12ms per frame to ~120ms per frame. I tried to find the bottleneck
 with `perf` but no results (the time is not spent in the game/engine).

 The commit:
 https://github.com/Dav1dde/BraLa/commit/02a37a0e46f195f5a46404747d659d26490e6c32


 I hope you can see anything wrong. I have no idea!

Check the disassembly view of this line:

buffer[elements++] = Vertex(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

If you are using an old version of dmd it will allocate a block of memory which has the size of Vertex, then it will fill the data into that block of memory, and then memcpy it to your buffer array.

You could try working around this by doing:

buffer[elements++].__ctor(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

Kind Regards
Benjamin Thaut
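
A related way to construct each Vertex directly in the destination slot is `std.conv.emplace`. This is a swapped-in alternative to the `__ctor` call above and a sketch only; `buildBuffer` and the values are made up, and whether it changes the generated code depends on the compiler version:

```d
import std.conv : emplace;
import core.stdc.stdlib : malloc;

struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

// Hypothetical helper: fills a malloc'ed Vertex buffer in place.
// The caller is responsible for calling free() on the result.
Vertex* buildBuffer(size_t count)
{
    auto buffer = cast(Vertex*) malloc(count * Vertex.sizeof);
    assert(buffer !is null, "malloc failed");

    size_t elements = 0;
    foreach (i; 0 .. count)
    {
        // Construct the Vertex at the address of the slot itself.
        emplace(&buffer[elements++],
                cast(float) i, 0.0f, 0.0f,   // position
                0.0f, 1.0f, 0.0f,            // normal
                0.0f, 0.0f,                  // terrain UV
                0.0f, 0.0f);                 // biome UV
    }
    return buffer;
}
```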
Aug 24 2012
next sibling parent reply David <d dav1d.de> writes:
 Check the disassembly view of this line:
 buffer[elements++] = Vertex(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

 If you are using an old version of dmd it will allocate a block of
 memory which has the size of Vertex, then it will fill the data into
 that block of memory, and then memcpy it to your buffer array.

 You could try working around this by doing:

 buffer[elements++].__ctor(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

 Kind Regards
 Benjamin Thaut

That's not the problem. The problem has nothing to do with the tessellation, since the *rendering* is also 1000% slower (when all data is already processed).
Aug 24 2012
parent reply David <d dav1d.de> writes:
On 28.08.2012 01:53, Sean Kelly wrote:
 On Aug 24, 2012, at 1:16 PM, David <d dav1d.de> wrote:
 That's not the problem. The problem has nothing to do with the tessellation,
since the *rendering* is also 1000% slower (when all data is already processed).

Is the alignment different between one and the other? I wouldn't think so since it's dynamic memory, but the performance difference suggests that it might be.

The arrays are 100% identical (I dumped a Vertex()-array and a raw float-array, they were 100% identical).
Aug 28 2012
parent reply David <d dav1d.de> writes:
On 28.08.2012 17:41, bearophile wrote:
 David:

 The arrays are 100% identical (I dumped a Vertex()-array and a raw
 float-array, they were 100% identical).

I hope some people are realizing how much time is being wasted in this thread. Taking a look at the asm is my suggestion still. If someone is rusty in asm, it's time to brush away the rust with a steel brush. Bye, bearophile

You're right, but I also said that I don't care any longer. I found a workaround, I can live with it. I generally tend to ignore dmd bugs and just work around them, I don't have the time to track down every stupid bug from a ~8k codebase. Thanks anyway for your help.
Aug 28 2012
next sibling parent David <d dav1d.de> writes:
 But I'd like you to not ignore all the bugs you find, and instead
 minimize some of them and submit them to Bugzilla. Despite thousands of
 open bugs and about a hundred of open patches, many bugs do get fixed at
 every release. If you submit bugs, D/DMD will improve, in your future
 you will find less bugs to work around in your D code, and you will help
 other present and future D programmers avoid hitting them. This is
 important because D is young and its community is small. The idea is:
 they give you a compiler/language for free, and you give something back
 to the community submitting some bugs :-)

I totally agree
 I understand you don't care much anymore for the discussed problem, and
 I know that localizing D/DMD bugs requires some time and work.

And that's the problem, I tried to track down a few of the bugs I hit. 50% vanished when I changed unrelated code (cool, huh? getting a segfault in std.net.curl → std.regex → std.functional.memoize when changing your ResourceManager, which has really nothing to do with either curl, regex or std.functional, nor with the module which calls std.net.curl), then I wasn't able to reproduce a few others. In the end, I think, I was able to track down a single dmd bug. That was with a relatively small code-base (maybe 1-2k?), now I have around 8k and I just don't have the time and maybe the knowledge. At least I can fix phobos/druntime bugs.

Not sure why I wrote that, I don't want to whine, D is great/buggy and I knew it when I started that project. And I am glad there are people like you, Kenji and lots of others who keep on improving D in their free time (not to forget Walter and Andrei).
Aug 28 2012
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/28/2012 06:35 PM, David wrote:
 On 28.08.2012 17:41, bearophile wrote:
 David:

 The arrays are 100% identical (I dumped a Vertex()-array and a raw
 float-array, they were 100% identical).

I hope some people are realizing how much time is being wasted in this thread. Taking a look at the asm is my suggestion still. If someone is rusty in asm, it's time to brush away the rust with a steel brush. Bye, bearophile

You're right, but I also said that I don't care any longer. I found a workaround, I can live with it. I generally tend to ignore dmd bugs and just work around them, I don't have the time to track down every stupid bug from a ~8k codebase. Thanks anyway for your help.

Use this to create a minimal test case with minimal user interaction: https://github.com/CyberShadow/DustMite
Aug 28 2012
parent reply David <d dav1d.de> writes:
 Use this to create a minimal test case with minimal user interaction:
 https://github.com/CyberShadow/DustMite

Doesn't help if dmd doesn't crash, or?
Aug 28 2012
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/29/2012 01:26 AM, David wrote:
 Use this to create a minimal test case with minimal user interaction:
 https://github.com/CyberShadow/DustMite

Doesn't help if dmd doesn't crash, or?

It doesn't help a lot if compilation succeeds, but you stated that you generally tend to ignore dmd bugs. Most dmd bugs make compilation fail.
Aug 28 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 24, 2012, at 1:16 PM, David <d dav1d.de> wrote:

 That's not the problem. The problem has nothing to do with the tessellation,
 since the *rendering* is also 1000% slower (when all data is already processed).

Is the alignment different between one and the other? I wouldn't think so since it's dynamic memory, but the performance difference suggests that it might be.
Aug 27 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
David:

 The arrays are 100% identical (I dumped a Vertex()-array and a 
 raw float-array, they were 100% identical).

I hope some people are realizing how much time is being wasted in this thread. Taking a look at the asm is my suggestion still. If someone is rusty in asm, it's time to brush away the rust with a steel brush.

Bye,
bearophile
Aug 28 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
David:

 I generally tend to ignore dmd bugs and just work around them, I 
 don't have the time to track down every stupid bug from a ~8k 
 codebase.

I understand you don't care much anymore for the discussed problem, and I know that localizing D/DMD bugs requires some time and work.

But I'd like you to not ignore all the bugs you find, and instead minimize some of them and submit them to Bugzilla. Despite thousands of open bugs and about a hundred of open patches, many bugs do get fixed at every release. If you submit bugs, D/DMD will improve, in your future you will find less bugs to work around in your D code, and you will help other present and future D programmers avoid hitting them. This is important because D is young and its community is small. The idea is: they give you a compiler/language for free, and you give something back to the community submitting some bugs :-)

Bye and thank you,
bearophile
Aug 28 2012
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
On Wed, 29 Aug 2012, Timon Gehr wrote:

 On 08/29/2012 01:26 AM, David wrote:
 Use this to create a minimal test case with minimal user interaction:
 https://github.com/CyberShadow/DustMite

Doesn't help if dmd doesn't crash, or?

It doesn't help a lot if compilation succeeds, but you stated that you generally tend to ignore dmd bugs. Most dmd bugs make compilation fail.

It's more generally useful than that. It can reduce for any set of commands that together produce a binary decision: pass or fail. The key problem is that it does need to be deterministic. It doesn't matter if it's dmd that fails, or an execution of the output code, or really anything that determines pass or fail.

The basic pattern is:

    while (progress can be made)
        try a reduction
        if reduction still reproduces the error
            continue
        else
            revert
    done

(it's obviously more complex and there's tons of magic inside try a reduction)
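
As a toy illustration of that pattern (not DustMite's actual algorithm, which is far more elaborate), here is a line-based sketch. `reduceLines` and `stillCrashes` are hypothetical names; the predicate stands in for "run the commands and see whether the failure still reproduces":

```d
// Repeatedly try to drop one line; keep the change if the
// failure still reproduces, otherwise revert and move on.
string[] reduceLines(string[] lines, bool delegate(const(string)[]) stillFails)
{
    bool progress = true;
    while (progress)
    {
        progress = false;
        for (size_t i = 0; i < lines.length; )
        {
            auto candidate = lines[0 .. i] ~ lines[i + 1 .. $]; // try a reduction
            if (stillFails(candidate))
            {
                lines = candidate; // still reproduces: keep it
                progress = true;
            }
            else
                ++i;               // revert: leave this line in place
        }
    }
    return lines;
}

// Example predicate (an assumption for the demo): the "failure"
// is simply the presence of one particular line.
bool stillCrashes(const(string)[] lines)
{
    foreach (l; lines)
        if (l == "bug();")
            return true;
    return false;
}
```

For a slowdown rather than a crash, the predicate could be something like "frame time above some threshold", as long as the result is deterministic.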
Aug 28 2012