
digitalmars.D - Consistent bugs with dmd -O -inline in a large project

reply "Lumi Pakkanen" <frostburn suomi24.fi> writes:
I'm creating a somewhat large hobby project with D. I'm enjoying 
the ride so far. Unit tests and contract programming have saved 
me from long bug hunts, but today I ran into a bug that seems to 
be caused by the -O and -inline flags with dmd.

Without the flags the program runs correctly, but -O produces 
wrong results consistently and -inline seems to cause memory 
corruption.

Now my problem here is that the program has over 5000 lines of 
code with interdependencies running everywhere so I'm not sure if 
it's possible to come up with a neat small program that 
demonstrates the problem for a bug report.

What should I do? Am I stuck with not using -O and -inline for 
now, hoping that things will improve in the future?
Oct 12 2014
next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 12 Oct 2014 15:44:13 +0000
Lumi Pakkanen via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 What should I do?
haven't you tried DustMite? https://github.com/CyberShadow/DustMite/wiki it may help to get a reduced test case.
 Am I stuck with not using -O and -inline for
 now, hoping that things will improve in the future?
yep. people are trying hard to squash the optimiser bugs, but the optimiser is a complex beast, so it's not easy. a dustmite'd test case (if you're able to produce one) can help a lot though.
Oct 12 2014
prev sibling next sibling parent Mike Parker <aldacron gmail.com> writes:
On 10/13/2014 12:44 AM, Lumi Pakkanen wrote:
 Now my problem here is that the program has over 5000 lines of code with
 interdependencies running everywhere so I'm not sure if it's possible to
 come up with a neat small program that demonstrates the problem for a
 bug report.

 What should I do? Am I stuck with not using -O and -inline for now,
 hoping that things will improve in the future?
You might try to reduce it with DustMite: https://github.com/CyberShadow/DustMite/wiki
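For readers new to DustMite: it repeatedly shrinks a copy of the source tree and, after each step, runs a test command that you supply; exit status 0 means the problem is still present. One way to write that test for an optimizer-only failure is a small D program like the sketch below. It assumes a POSIX environment, a single entry file called main.d, and output names plain and opt. All of these, and the helper isInteresting, are placeholders rather than anything from this thread.

```d
import std.process : execute;

// Pure decision rule, kept separate so it can be checked on its own.
// A reduction is "interesting" only if both builds succeed, the plain
// binary runs cleanly, and the optimized binary does not.
bool isInteresting(int plainBuild, int plainRun, int optBuild, int optRun)
{
    return plainBuild == 0 && optBuild == 0 && plainRun == 0 && optRun != 0;
}

int main()
{
    // File names and flags are placeholders for your own project.
    auto plainBuild = execute(["dmd", "-ofplain", "main.d"]);
    auto optBuild = execute(["dmd", "-O", "-inline", "-ofopt", "main.d"]);
    if (plainBuild.status != 0 || optBuild.status != 0)
        return 1;

    auto plainRun = execute(["./plain"]);
    auto optRun = execute(["./opt"]);

    // DustMite treats exit status 0 as "the problem is still there".
    return isInteresting(plainBuild.status, plainRun.status,
                         optBuild.status, optRun.status) ? 0 : 1;
}
```

Assuming the sketch is saved as oracle.d outside the tree being reduced, an invocation would look roughly like: dustmite path/to/src 'rdmd /full/path/to/oracle.d'. Requiring the unoptimized build to run cleanly keeps DustMite from reducing towards code that is simply broken.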
Oct 12 2014
prev sibling next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sun, 12 Oct 2014 15:44:13 +0000
Lumi Pakkanen via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 What should I do? Am I stuck with not using -O and -inline for
 now, hoping that things will improve in the future?
p.s. you can try ldc/gdc too. they use different codegens, and their optimisers are better than dmd's. yet they aren't "bleeding edge" for parser/phobos, so some recently added/fixed things may not work with them. still worth a try, i think.
Oct 12 2014
prev sibling next sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Sunday, 12 October 2014 at 15:44:14 UTC, Lumi Pakkanen wrote:
 I'm creating a somewhat large hobby project with D. I'm 
 enjoying the ride so far. Unit tests and contract programming 
 have saved me from long bug hunts, but today I ran into a bug 
 that seems to be caused by the -O and -inline flags with dmd.

 Without the flags the program runs correctly, but -O produces 
 wrong results consistently and -inline seems to cause memory 
 corruption.

 Now my problem here is that the program has over 5000 lines of 
 code with interdependencies running everywhere so I'm not sure 
 if it's possible to come up with a neat small program that 
 demonstrates the problem for a bug report.

 What should I do? Am I stuck with not using -O and -inline for 
 now, hoping that things will improve in the future?
I have the same problem. If I don't use -O it works fine (-inline is ok). If I use it, I get an error when executing the program:

Error executing command run: Program exited with code -11

or

Segmentation fault (core dumped)

I posted here a few months ago, but to no avail. I still haven't found the answer to the problem. As in your case, my project has become too big to just "try to trace the bug".
Oct 13 2014
parent reply "Gary Willoughby" <dev nomad.so> writes:
On Monday, 13 October 2014 at 09:18:01 UTC, Chris wrote:
 I have the same problem. If I don't use -O it works fine 
 (-inline is ok). If I use it, I get an error when executing the 
 program.
I had the same about a year ago, thought i was going crazy and refactored the program and it went away. Never did find out what was causing it.
Oct 13 2014
next sibling parent reply "OlaOst" <olaa81 gmail.com> writes:
On Monday, 13 October 2014 at 13:11:13 UTC, Gary Willoughby wrote:
 On Monday, 13 October 2014 at 09:18:01 UTC, Chris wrote:
 I have the same problem. If I don't use -O it works fine 
 (-inline is ok). If I use it, I get an error when executing 
 the program.
I had the same about a year ago, thought i was going crazy and refactored the program and it went away. Never did find out what was causing it.
Here too. I just managed to pare it down to 2 files:

-- main.d --
import std.algorithm;
import failsinline;

void main() {
   auto fail = new FailsInline();
}
-- main.d --

-- failsinline.d --
import std.algorithm;
import std.array;

void failsinline() {
   auto transform = (int i) => i;
   [0].map!transform.array;
}
-- failsinline.d --

'rdmd main.d' works fine.
'rdmd -inline main.d' gives object.Error (0): Access Violation.

Removing the std.algorithm import from main.d makes it work fine.

Same issue in dmd 2.066, 2.066-rc2 and 2.067-b1.
Oct 13 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Mon, 13 Oct 2014 14:28:49 +0000
OlaOst via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Here too. I just managed to pare it down to 2 files:
care to file a bug report?
Oct 13 2014
parent "OlaOst" <olaa81 gmail.com> writes:
On Monday, 13 October 2014 at 14:53:24 UTC, ketmar via 
Digitalmars-d wrote:
 On Mon, 13 Oct 2014 14:28:49 +0000
 OlaOst via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Here too. I just managed to pare it down to 2 files:
care to fill bugreport?
Added to https://issues.dlang.org/show_bug.cgi?id=13244
Oct 13 2014
prev sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Monday, 13 October 2014 at 13:11:13 UTC, Gary Willoughby wrote:
 On Monday, 13 October 2014 at 09:18:01 UTC, Chris wrote:
 I have the same problem. If I don't use -O it works fine 
 (-inline is ok). If I use it, I get an error when executing 
 the program.
I had the same about a year ago, thought i was going crazy and refactored the program and it went away. Never did find out what was causing it.
I think there is no easy way of finding out where the optimization goes wrong. But should this happen at all, i.e. does it point to a flaw in my program or is it a compiler bug? I like to think it's the latter, after all the program works perfectly without -O. On the other hand, it's scary because I have no clue where to look for the offender.
Oct 16 2014
parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Thursday, 16 October 2014 at 08:45:18 UTC, Chris wrote:
 I think there is no easy way of finding out where the 
 optimization goes wrong. But should this happen at all, i.e. 
 does it point to a flaw in my program or is it a compiler bug? 
 I like to think it's the latter, after all the program works 
 perfectly without -O. On the other hand, it's scary because I 
 have no clue where to look for the offender.
It could be either. Sometimes, if your program relies on undefined behaviour, enabling optimizations might be what uncovers the bug, which then manifests as a crash. On the other hand, it could be just a compiler bug. It has happened to me several times with DMD, so it's not entirely unlikely. These things happen. Run DustMite, reduce, and if you still think your program is right, file a bug against DMD.
Oct 16 2014
next sibling parent "Sag Academy" <sagacademyjaipur gmail.com> writes:
On Thursday, 16 October 2014 at 10:25:12 UTC, Peter Alexander
wrote:
 On Thursday, 16 October 2014 at 08:45:18 UTC, Chris wrote:
 I think there is no easy way of finding out where the 
 optimization goes wrong. But should this happen at all, i.e. 
 does it point to a flaw in my program or is it a compiler bug? 
 I like to think it's the latter, after all the program works 
 perfectly without -O. On the other hand, it's scary because I 
 have no clue where to look for the offender.
It could be either. Sometimes, if your program relies on undefined behaviour, enabling optimizations might be what uncovers the bug, which then manifests as a crash. On the other hand, it could be just a compiler bug. It has happened to me several times with DMD, so it's not entirely unlikely. These things happen. Run DustMite, reduce, and if you still think your program is right, file a bug against DMD.
Maybe it is right.
Oct 16 2014
prev sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Thursday, 16 October 2014 at 10:25:12 UTC, Peter Alexander 
wrote:
 On Thursday, 16 October 2014 at 08:45:18 UTC, Chris wrote:
 I think there is no easy way of finding out where the 
 optimization goes wrong. But should this happen at all, i.e. 
 does it point to a flaw in my program or is it a compiler bug? 
 I like to think it's the latter, after all the program works 
 perfectly without -O. On the other hand, it's scary because I 
 have no clue where to look for the offender.
It could be either. Sometimes, if your program relies on undefined behaviour, enabling optimizations might be what uncovers the bug, which then manifests as a crash. On the other hand, it could be just a compiler bug. It has happened to me several times with DMD, so it's not entirely unlikely. These things happen. Run DustMite, reduce, and if you still think your program is right, file a bug against DMD.
Ok, I've found the flaw in my program. It's code that was left over after refactoring some modules. It looks like this (simplified):

I import a module and access an enum in that module. However, I never use the accessed element.

module politician.answer;

enum { Statement = "Blah" }

mixin template PressConference {
   int STATEMENT_LEN;
   // ...
}

-------------

module press.article;

mixin PressConference;
this() {
   STATEMENT_LEN = Statement.length;  // Not good! Left over code from refactoring.
}

Even worse, I never use STATEMENT_LEN in this class. The whole logic is nonsense and is due to my not cleaning up the constructor.

So the optimizer optimized this away, seeing that it is never used, but it is still accessed in the class constructor.

My question now is, shouldn't the optimizer have noticed that it is still being accessed? Or what did the optimizer actually do?

This helped me to find "dead code" at least.
Oct 16 2014
parent reply "Chris" <wendlec tcd.ie> writes:
Update on the above. I actually do use the variable STATEMENT_LEN later in the mixed in code. This escapes the optimizer.

mixin template PressConference {
   int STATEMENT_LEN;
   // ...
   void someFunction() {
     // uses STATEMENT_LEN
   }
}

Hm.
Oct 16 2014
parent reply "Chris" <wendlec tcd.ie> writes:
If I compile with

-release -noboundscheck -inline

(but without -O), I get this error:

Internal error: backend\cod4.c 358

If I compile with

-O -release -noboundscheck -inline

it compiles, but crashes. The only thing that works is:

-release -noboundscheck

What is the optimizer optimizing away?
Oct 16 2014
parent "Chris" <wendlec tcd.ie> writes:
Ok. It was the compiler. To reproduce the error, I wrote a small 
example:

import std.stdio;
import std.algorithm : sort;

enum {
   Answers = [
     "Are you corrupt?" : "No!",
     "Will you resign?" : "No!"
   ]
}

void main() {
   auto journalist = new myClass;
   journalist.printAnswers();
}

class myClass {
   mixin News;
   this() {
     Questions = Answers.keys();
     // Only here to do what my program does
     sort!((a, b) => a.length > b.length)(Questions);
   }

   protected void printAnswers() {
     foreach (q; Questions) {
       writefln("Q: %s\nA: %s", q, getAnswer(q));
     }
   }
}

mixin template News() {
   string[] Questions;

   auto getAnswer(string q) {
     return Answers[q];
   }
}

[version 2.065]
$ dmd optimizer.d -O -release -inline -noboundscheck
$ ./optimizer
Segmentation fault (core dumped)

[version 2.066]
$ dmd optimizer.d -O -release -inline -noboundscheck
$ ./optimizer
Q: Are you corrupt?
A: No!
Q: Will you resign?
A: No!
$

Sorry, I couldn't try it with 2.066 first, because I still have 
to update my code base.
Oct 16 2014
prev sibling next sibling parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Lumi Pakkanen"  wrote in message 
news:choqmgtkydoxleapeyhw forum.dlang.org...

 I'm creating a somewhat large hobby project with D. I'm enjoying the ride 
 so far. Unit tests and contract programming have saved me from long bug 
 hunts, but today I ran into a bug that seems to be caused by the -O 
 and -inline flags with dmd.

 Without the flags the program runs correctly, but -O produces wrong 
 results consistently and -inline seems to cause memory corruption.

 Now my problem here is that the program has over 5000 lines of code with 
 interdependencies running everywhere so I'm not sure if it's possible to 
 come up with a neat small program that demonstrates the problem for a bug 
 report.

 What should I do? Am I stuck with not using -O and -inline for now, hoping 
 that things will improve in the future?
There are a few techniques to try and track this sort of thing down.

0. Build dmd from the latest master and see if it works (if you haven't done this already). The bug may have been fixed.

1. As others have suggested, run DustMite on your code. It's magical.

2. Do a binary search, compiling with some modules not using -inline (or instead with -O). Then, do the same with functions within the module, moving them to another module (or using d/di split) to prevent inlining. When the caller function is found, disable inlining of the potentially problematic callees by adding asm { nop; } or similar to their body.

3. Spend some quality time with a debugger and a disassembler, tracing back from the fault to find out where it all went wrong. This becomes more difficult, but is still possible, if the call stack is corrupted. This could be the fastest or the slowest method depending on your luck. Usual debugging tools like valgrind may be a huge help.

4. Switching word size (-m32/-m64) may make the problem go away, if that's an option for your project.
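The asm { nop; } trick from technique 2 can be sketched as follows. The function name suspect and its body are made up for illustration; the version blocks are an added precaution so the file still compiles where x86 inline assembly is unavailable. Later compiler releases also offer pragma(inline, false) for the same purpose, which was not an option at the time of this thread.

```d
import std.stdio;

// Hypothetical callee suspected of misbehaving once it is inlined.
int suspect(int x)
{
    // As suggested above, an inline asm statement in the body is meant
    // to keep DMD's inliner away from this function.
    version (D_InlineAsm_X86_64)
        asm { nop; }
    else version (D_InlineAsm_X86)
        asm { nop; }
    return x * 2;
}

void main()
{
    writeln(suspect(21)); // prints 42
}
```

If the crash disappears after marking one callee this way, that callee (or its call site) is a good candidate for the reduced test case.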
Oct 13 2014
prev sibling parent reply "Trass3r" <un known.com> writes:
 What should I do? Am I stuck with not using -O and -inline for 
 now, hoping that things will improve in the future?
Step 1) DustMite the heck out of it and create a bug report.
Step 2) Start using ldc/gdc for release builds if possible.
Oct 16 2014
parent reply "Chris" <wendlec tcd.ie> writes:
On Thursday, 16 October 2014 at 21:23:09 UTC, Trass3r wrote:
 What should I do? Am I stuck with not using -O and -inline for 
 now, hoping that things will improve in the future?
Step 1) DustMite the heck out of it and create a bug report. Step 2) Start using ldc/gdc for release build
I had planned to use GDC/LDC too, but GDC is 2.064, so no option for me. LDC is 2.065, that would still be ok for my program (although I've just updated my code to 2.066).

I always use DMD for development (short compilation times), and even for release builds I use DMD, if I build the program in order to give it out for first tests. Andrei mentioned that DMD built programs are only around 10% slower than GDC/LDC builds; in many cases I can put up with that. So DMD should not have any serious issues with release builds, imo, even if alternatives exist.

Next time I'll use DustMite too, once I've learned how to use it properly. However, I managed to find the place where things went wrong quite fast by using the oldest (and sometimes still the best) debugging tool: inserting writeln() statements along the path of initializations/routines in the program.
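A minimal sketch of that writeln() approach, for anyone who wants to copy it. The helper names tracePoint and trace are invented here; the only real idea is to print the source location and flush immediately, so the last line seen before a segfault marks where to look next.

```d
import std.format : format;
import std.stdio;

// Hypothetical helper: builds "[file:line] message" for a trace point.
// The default arguments pick up the caller's file and line.
string tracePoint(string msg, string file = __FILE__, size_t line = __LINE__)
{
    return format("[%s:%s] %s", file, line, msg);
}

// Writes the trace point to stderr and flushes, so the line is visible
// even if the program crashes right afterwards.
void trace(string msg, string file = __FILE__, size_t line = __LINE__)
{
    stderr.writeln(tracePoint(msg, file, line));
    stderr.flush();
}

void main()
{
    trace("before init");
    // ... the suspicious initialization would go here ...
    trace("after init");
}
```

Sprinkling trace() calls through constructors and module initializers, then halving the distance between the last printed line and the crash, is a manual version of the binary search described earlier in the thread.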
Oct 16 2014
next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thu, 16 Oct 2014 21:53:40 +0000
Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
 Andrei mentioned that DMD built programs are
 only around 10% slower than GDC/LDC builds
it depends on the task. my voxel renderer runs at a miserable 15 FPS with dmd -O -inline, yet at a much more appropriate 40 FPS with gdc -O2. but it's a specific task; plenty of other software can work at reasonable speed.
Oct 16 2014
parent reply "Chris" <wendlec tcd.ie> writes:
On Thursday, 16 October 2014 at 22:01:37 UTC, ketmar via 
Digitalmars-d wrote:
 On Thu, 16 Oct 2014 21:53:40 +0000
 Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
But why does it say 2.064 here http://dlang.org/download.html?
 Andrei mentioned that DMD built programs are only around 10% 
 slower than GDC/LDC builds
it depends of the task. my voxel renderer runs with miserable 15 FPS with dmd -O -inline, yet with much more appropriate 40 FPS with gdc -O2. but it's a specific task, many other software can work with reasonable speed.
Oct 17 2014
parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 17 October 2014 09:53, Chris via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Thursday, 16 October 2014 at 22:01:37 UTC, ketmar via Digitalmars-d
 wrote:
 On Thu, 16 Oct 2014 21:53:40 +0000
 Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
But why does it say 2.064 here http://dlang.org/download.html?
dlang.org is not like a wiki. If I were to send a PR to change that to 2.065, the site probably won't be updated until the 2.067 release, by which point that information will be wrong again. Iain
Oct 17 2014
parent "Chris" <wendlec tcd.ie> writes:
On Friday, 17 October 2014 at 09:15:37 UTC, Iain Buclaw via 
Digitalmars-d wrote:
 On 17 October 2014 09:53, Chris via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Thursday, 16 October 2014 at 22:01:37 UTC, ketmar via 
 Digitalmars-d
 wrote:
 On Thu, 16 Oct 2014 21:53:40 +0000
 Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
But why does it say 2.064 here http://dlang.org/download.html?
dlang.org is not like a wiki. If I were to send a PR to change that to 2.065, the site probably won't be updated until the 2.067 release, by which point that information will be wrong again. Iain
I see, I see. But that should really be updated on the D homepage. After all it's the first port of call for D programmers. If I cannot trust the information there ...
Oct 17 2014
prev sibling next sibling parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 16 Oct 2014 23:01, "ketmar via Digitalmars-d" <
digitalmars-d puremagic.com> wrote:
 On Thu, 16 Oct 2014 21:53:40 +0000
 Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
And soon to be 2.066 as soon as I apply the last 244 patches between May and the final release date. Iain.
Oct 17 2014
parent "Chris" <wendlec tcd.ie> writes:
On Friday, 17 October 2014 at 07:02:53 UTC, Iain Buclaw via
Digitalmars-d wrote:
 On 16 Oct 2014 23:01, "ketmar via Digitalmars-d" <
 digitalmars-d puremagic.com> wrote:
 On Thu, 16 Oct 2014 21:53:40 +0000
 Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I had planned to use GDC/LDC too, but GDC is 2.064
GDC is 2.065.
And soon to be 2.066 as soon as I apply the last 244 patches between May and the final release date. Iain.
Thanks. Good to know.
Oct 17 2014
prev sibling next sibling parent "Trass3r" <un known.com> writes:
 LDC is 2.065
Already 2.066 in the repo.
Oct 17 2014
prev sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Fri, 17 Oct 2014 08:02:42 +0100
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 GDC is 2.065.
And soon to be 2.066 as soon as I apply the last 244 patches between May and the final release date.
yay! you're my hero, do you know that? ;-)
Oct 17 2014