digitalmars.D - Consistent bugs with dmd -O -inline in a large project
- Lumi Pakkanen (13/13) Oct 12 2014 I'm creating a somewhat large hobby project with D. I'm enjoying
- ketmar via Digitalmars-d (7/10) Oct 12 2014 haven't you tried DustMite? https://github.com/CyberShadow/DustMite/wiki
- Mike Parker (6/12) Oct 12 2014 You might try to reduce it with DustMite:
- ketmar via Digitalmars-d (6/8) Oct 12 2014 p.s. you can try ldc/gdc too. they using different codegens, and their
- Chris (9/22) Oct 13 2014 I have the same problem. If I don't use -O it works fine (-inline
- Gary Willoughby (4/7) Oct 13 2014 I had the same about a year ago, thought i was going crazy and
- OlaOst (23/30) Oct 13 2014 Here too. I just managed to pare it down to 2 files:
- ketmar via Digitalmars-d (3/4) Oct 13 2014 care to fill bugreport?
- OlaOst (3/7) Oct 13 2014 Added to https://issues.dlang.org/show_bug.cgi?id=13244
- Chris (7/14) Oct 16 2014 I think there is no easy way of finding out where the
- Peter Alexander (10/16) Oct 16 2014 It could be either.
- Sag Academy (3/19) Oct 16 2014 may be it is right
- Chris (28/44) Oct 16 2014 Ok, I've found the flaw in my program. It's code that was left
- Daniel Murphy (18/30) Oct 13 2014 There are a few techniques to try and track this sort of thing down.
- Trass3r (2/4) Oct 16 2014 Step 1) DustMite the heck out of it and create a bug report.
- Chris (15/19) Oct 16 2014 I had planned to use GDC/LDC too, but GDC is 2.064, so no option
- ketmar via Digitalmars-d (8/11) Oct 16 2014 GDC is 2.065.
- Chris (3/16) Oct 17 2014 But why does it say 2.064 here http://dlang.org/download.html?
- Iain Buclaw via Digitalmars-d (6/16) Oct 17 2014 dlang.org is not like a wiki. If I were to send a PR to change that
- Chris (5/28) Oct 17 2014 I see, I see. But that should really be updated on the D
- Iain Buclaw via Digitalmars-d (5/9) Oct 17 2014 And soon to be 2.066 as soon as I apply the last 244 patches between May
- Chris (3/16) Oct 17 2014 Thanks. Good to know.
- Trass3r (1/2) Oct 17 2014 Already 2.066 in the repo.
- ketmar via Digitalmars-d (3/6) Oct 17 2014 yay! you're my hero, do you know that? ;-)
I'm creating a somewhat large hobby project with D. I'm enjoying the ride so far. Unit tests and contract programming have saved me from long bug hunts, but today I ran into a bug that seems to be caused by the -O and -inline flags with dmd. Without the flags the program runs correctly, but -O consistently produces wrong results and -inline seems to cause memory corruption. Now my problem is that the program has over 5000 lines of code with interdependencies running everywhere, so I'm not sure it's possible to come up with a neat, small program that demonstrates the problem for a bug report. What should I do? Am I stuck with not using -O and -inline for now, hoping that things will improve in the future?
Oct 12 2014
On Sun, 12 Oct 2014 15:44:13 +0000 Lumi Pakkanen via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> What should I do?

haven't you tried DustMite? https://github.com/CyberShadow/DustMite/wiki it may help to get a reduced test case.

> Am I stuck with not using -O and -inline for now, hoping that things will improve in the future?

yep. people are trying hard to squash the bugs in the optimiser, but the optimiser is a complex beast, so it's not easy. a dustmite'd test case (if you're able to produce one) can help a lot though.
Oct 12 2014
On 10/13/2014 12:44 AM, Lumi Pakkanen wrote:
> Now my problem here is that the program has over 5000 lines of code with interdependencies running everywhere so I'm not sure if it's possible to come up with a neat small program that demonstrates the problem for a bug report. What should I do? Am I stuck with not using -O and -inline for now, hoping that things will improve in the future?

You might try to reduce it with DustMite:
https://github.com/CyberShadow/DustMite/wiki
Oct 12 2014
On Sun, 12 Oct 2014 15:44:13 +0000 Lumi Pakkanen via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> What should I do? Am I stuck with not using -O and -inline for now, hoping that things will improve in the future?

p.s. you can try ldc/gdc too. they use different codegens, and their optimisers are better than dmd's. yet they aren't "bleeding edge" for the parser/phobos, so some recently added/fixed things may not work with them. still worth a try, i think.
Oct 12 2014
On Sunday, 12 October 2014 at 15:44:14 UTC, Lumi Pakkanen wrote:
> I'm creating a somewhat large hobby project with D. I'm enjoying the ride so far. Unit tests and contract programming have saved me from long bug hunts, but today I ran into a bug that seems to be caused by the -O and -inline flags with dmd. Without the flags the program runs correctly, but -O produces wrong results consistently and -inline seems to cause memory corruption. Now my problem here is that the program has over 5000 lines of code with interdependencies running everywhere so I'm not sure if it's possible to come up with a neat small program that demonstrates the problem for a bug report. What should I do? Am I stuck with not using -O and -inline for now, hoping that things will improve in the future?

I have the same problem. If I don't use -O it works fine (-inline is ok). If I use it, I get an error when executing the program:

Error executing command run: Program exited with code -11

or

Segmentation fault (core dumped)

I posted here a few months ago, but to no avail. I still haven't found the answer to the problem. As in your case, my project has become too big to just "try to trace the bug".
Oct 13 2014
On Monday, 13 October 2014 at 09:18:01 UTC, Chris wrote:
> I have the same problem. If I don't use -O it works fine (-inline is ok). If I use it, I get an error when executing the program.

I had the same about a year ago, thought I was going crazy, refactored the program, and it went away. Never did find out what was causing it.
Oct 13 2014
On Monday, 13 October 2014 at 13:11:13 UTC, Gary Willoughby wrote:
> On Monday, 13 October 2014 at 09:18:01 UTC, Chris wrote:
>> I have the same problem. If I don't use -O it works fine (-inline is ok). If I use it, I get an error when executing the program.
> I had the same about a year ago, thought I was going crazy and refactored the program and it went away. Never did find out what was causing it.

Here too. I just managed to pare it down to 2 files:

-- main.d --
import std.algorithm;
import failsinline;

void main()
{
    auto fail = new FailsInline();
}
-- main.d --

-- failsinline.d --
import std.algorithm;
import std.array;

void failsinline()
{
    auto transform = (int i) => i;
    [0].map!transform.array;
}
-- failsinline.d --

'rdmd main.d' works fine.
'rdmd -inline main.d' gives object.Error (0): Access Violation.
Removing the std.algorithm import from main.d makes it work fine.
Same issue in dmd 2.066, 2.066-rc2 and 2.067-b1.
Oct 13 2014
On Mon, 13 Oct 2014 14:28:49 +0000 OlaOst via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> Here too. I just managed to pare it down to 2 files:

care to file a bug report?
Oct 13 2014
On Monday, 13 October 2014 at 14:53:24 UTC, ketmar via Digitalmars-d wrote:
> On Mon, 13 Oct 2014 14:28:49 +0000 OlaOst via Digitalmars-d <digitalmars-d puremagic.com> wrote:
>> Here too. I just managed to pare it down to 2 files:
> care to file a bug report?

Added to https://issues.dlang.org/show_bug.cgi?id=13244
Oct 13 2014
On Monday, 13 October 2014 at 13:11:13 UTC, Gary Willoughby wrote:
> On Monday, 13 October 2014 at 09:18:01 UTC, Chris wrote:
>> I have the same problem. If I don't use -O it works fine (-inline is ok). If I use it, I get an error when executing the program.
> I had the same about a year ago, thought I was going crazy and refactored the program and it went away. Never did find out what was causing it.

I think there is no easy way of finding out where the optimization goes wrong. But should this happen at all, i.e. does it point to a flaw in my program or is it a compiler bug? I like to think it's the latter; after all, the program works perfectly without -O. On the other hand, it's scary because I have no clue where to look for the offender.
Oct 16 2014
On Thursday, 16 October 2014 at 08:45:18 UTC, Chris wrote:
> I think there is no easy way of finding out where the optimization goes wrong. But should this happen at all, i.e. does it point to a flaw in my program or is it a compiler bug? I like to think it's the latter, after all the program works perfectly without -O. On the other hand, it's scary because I have no clue where to look for the offender.

It could be either.

Sometimes, if your program relies on undefined behaviour, enabling optimizations might be what uncovers the bug, and it then manifests as a crash.

On the other hand, it could just be a compiler bug. It has happened to me several times with DMD, so it's not entirely unlikely. These things happen.

Run DustMite, reduce, and if you still think your program is right, file a bug against DMD.
Oct 16 2014
On Thursday, 16 October 2014 at 10:25:12 UTC, Peter Alexander wrote:
> It could be either.

Maybe it is right.
Oct 16 2014
On Thursday, 16 October 2014 at 10:25:12 UTC, Peter Alexander wrote:
> It could be either.

Ok, I've found the flaw in my program. It's code that was left over after refactoring some modules. It looks like this (simplified): I import a module and access an enum in that module. However, I never use the accessed element.

module politician.answer;

enum { Statement = "Blah" }

mixin template PressConference() {
    int STATEMENT_LEN;
    // ...
}

-------------

module press.article;

import politician.answer;

class Article { // class name added for illustration; not given in the original post
    mixin PressConference;

    this() {
        STATEMENT_LEN = Statement.length; // Not good! Left over code from refactoring.
    }
}

Even worse, I never use STATEMENT_LEN in this class. The whole logic is nonsense and is due to my not cleaning up the constructor. So the optimizer optimized this away, seeing that it is never used, but it is still accessed in the class constructor. My question now is: shouldn't the optimizer have noticed that it is still being accessed? Or what did the optimizer actually do? This helped me to find "dead code", at least.
Oct 16 2014
On Thursday, 16 October 2014 at 11:54:09 UTC, Chris wrote:
> My question now is, shouldn't the optimizer have noticed that it is still being accessed? Or what did the optimizer actually do.

Update on the above: I actually do use the variable STATEMENT_LEN later in the mixed-in code. This escapes the optimizer.

mixin template PressConference() {
    int STATEMENT_LEN;
    // ...
    void someFunction() {
        // uses STATEMENT_LEN
    }
}

Hm.
Oct 16 2014
On Thursday, 16 October 2014 at 13:35:57 UTC, Chris wrote:
> Update on the above. I actually do use the variable STATEMENT_LEN later in the mixed in code. This escapes the optimizer.

If I compile with -release -noboundscheck -inline (but without -O), I get this error:

Internal error: backend\cod4.c 358

If I compile with -O -release -noboundscheck -inline, it compiles, but crashes. The only thing that works is:

-release -noboundscheck

What is the optimizer optimizing away?
Oct 16 2014
Ok. It was the compiler. To reproduce the error, I wrote a small example:

import std.stdio;
import std.algorithm : sort;

enum {
    Answers = [ "Are you corrupt?" : "No!", "Will you resign?" : "No!" ]
}

void main() {
    auto journalist = new myClass;
    journalist.printAnswers();
}

class myClass {
    mixin News;

    this() {
        Questions = Answers.keys(); // Only here to do what my program does
        sort!((a, b) => a.length > b.length)(Questions);
    }

    protected void printAnswers() {
        foreach (q; Questions) {
            writefln("Q: %s\nA: %s", q, getAnswer(q));
        }
    }
}

mixin template News() {
    string[] Questions;
    auto getAnswer(string q) {
        return Answers[q];
    }
}

[version 2.065]
$ dmd optimizer.d -O -release -inline -noboundscheck
$ ./optimizer
Segmentation fault (core dumped)

[version 2.066]
$ dmd optimizer.d -O -release -inline -noboundscheck
$ ./optimizer
Q: Are you corrupt?
A: No!
Q: Will you resign?
A: No!

Sorry, I couldn't try it with 2.066 first, because I still have to update my code base.
Oct 16 2014
"Lumi Pakkanen" wrote in message news:choqmgtkydoxleapeyhw forum.dlang.org...
> Now my problem here is that the program has over 5000 lines of code with interdependencies running everywhere so I'm not sure if it's possible to come up with a neat small program that demonstrates the problem for a bug report. What should I do? Am I stuck with not using -O and -inline for now, hoping that things will improve in the future?

There are a few techniques to try and track this sort of thing down.

0. Build dmd from the latest master and see if it works (if you haven't done this already). The bug may have been fixed.
1. As others have suggested, run DustMite on your code. It's magical.
2. Do a binary search, compiling some modules without -inline (or, for the -O problem, without -O). Then do the same with functions within the module, moving them to another module (or using a .d/.di split) to prevent inlining. When the caller function is found, disable inlining of the potentially problematic callees by adding asm { nop; } or similar to their body.
3. Spend some quality time with a debugger and a disassembler, tracing back from the fault to find out where it all went wrong. This becomes more difficult, but is still possible, if the call stack is corrupted. This could be the fastest or the slowest method depending on your luck. Usual debugging tools like valgrind may be a huge help.
4. Switching word size (-m32/-m64) may make the problem go away, if that's an option for your project.
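Step 2 can be sketched in D roughly as follows. This is a minimal illustration, not code from the thread: `suspectCallee` and `caller` are hypothetical names, and it assumes, as the post suggests, that DMD does not inline a function whose body contains an inline assembly statement. The `version` guards keep the sketch compiling on targets without DMD-style x86 inline assembler.

```d
// Hypothetical suspect callee while bisecting an -inline miscompilation.
// Assumption (from the advice above): DMD will not inline a function that
// contains an asm statement, so a single nop acts as an inlining barrier.
int suspectCallee(int x)
{
    version (D_InlineAsm_X86_64)
    {
        asm { nop; }
    }
    else version (D_InlineAsm_X86)
    {
        asm { nop; }
    }
    // On targets without DMD-style inline asm nothing is inserted, and the
    // function stays an ordinary, inlinable function.
    return x * 2;
}

// Hypothetical caller: with -inline, this is where the body of
// suspectCallee would otherwise have been copied in.
int caller(int x)
{
    return suspectCallee(x) + 1;
}
```

In a real bisection the barrier would be moved from callee to callee, rebuilding with the failing flags each time; if the crash disappears only when one particular function is kept out of line, that function is a good starting point for DustMite and a bug report. Later D compilers also accept `pragma(inline, false)` for the same purpose, which avoids relying on x86 inline assembly.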
Oct 13 2014
> What should I do? Am I stuck with not using -O and -inline for now, hoping that things will improve in the future?

Step 1) DustMite the heck out of it and create a bug report.
Step 2) Start using ldc/gdc for release builds if possible.
Oct 16 2014
On Thursday, 16 October 2014 at 21:23:09 UTC, Trass3r wrote:
> Step 1) DustMite the heck out of it and create a bug report.
> Step 2) Start using ldc/gdc for release builds if possible.

I had planned to use GDC/LDC too, but GDC is 2.064, so that's not an option for me. LDC is 2.065; that would still be OK for my program (although I've just updated my code to 2.066).

I always use DMD for development (short compilation times), and I even use DMD for release builds when I build the program to hand it out for first tests. Andrei mentioned that DMD-built programs are only around 10% slower than GDC/LDC builds; in many cases I can put up with that. So DMD should not have any serious issues with release builds, IMO, even if alternatives exist.

Next time I'll use DustMite too, once I've learned how to use it properly. However, I managed to find the place where things went wrong quite fast by using the oldest (and sometimes still the best) debugging tool: inserting writeln() statements along the path of initializations/routines in the program.
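The writeln() approach can be made a little more useful for crash hunting with a tiny helper. This is a hypothetical sketch (`traceLine` and `trace` are names made up here, not from the thread): `__FILE__` and `__LINE__` used as default arguments are filled in at the call site, so each message records where it was emitted, and writing to stderr with an explicit flush makes it less likely that the last messages before a segfault are lost in an output buffer.

```d
import std.format : format;
import std.stdio : stderr;

// Builds "file(line): message", the same shape dmd uses for its diagnostics.
string traceLine(string msg, string file, size_t line)
{
    return format("%s(%s): %s", file, line, msg);
}

// __FILE__ and __LINE__ as default arguments are evaluated at the call site,
// so every message carries the location of the trace() call itself.
void trace(string msg, string file = __FILE__, size_t line = __LINE__)
{
    stderr.writeln(traceLine(msg, file, line));
    stderr.flush(); // push the text out before a possible crash
}
```

Calls such as `trace("before sort");` placed along the initialization path then show the last point the program reached before it went down.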
Oct 16 2014
On Thu, 16 Oct 2014 21:53:40 +0000 Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> I had planned to use GDC/LDC too, but GDC is 2.064

GDC is 2.065.

> Andrei mentioned that DMD built programs are only around 10% slower than GDC/LDC builds

it depends on the task. my voxel renderer runs at a miserable 15 FPS with dmd -O -inline, yet at a much more acceptable 40 FPS with gdc -O2. but that's a specific task; a lot of other software runs at reasonable speed.
Oct 16 2014
On Thursday, 16 October 2014 at 22:01:37 UTC, ketmar via Digitalmars-d wrote:
> On Thu, 16 Oct 2014 21:53:40 +0000 Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:
>> I had planned to use GDC/LDC too, but GDC is 2.064
> GDC is 2.065.

But why does it say 2.064 here: http://dlang.org/download.html?
Oct 17 2014
On 17 October 2014 09:53, Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> On Thursday, 16 October 2014 at 22:01:37 UTC, ketmar via Digitalmars-d wrote:
>> On Thu, 16 Oct 2014 21:53:40 +0000 Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:
>>> I had planned to use GDC/LDC too, but GDC is 2.064
>> GDC is 2.065.
> But why does it say 2.064 here http://dlang.org/download.html?

dlang.org is not like a wiki. If I were to send a PR to change that to 2.065, the site probably won't be updated until the 2.067 release, by which point that information will be wrong again.

Iain
Oct 17 2014
On Friday, 17 October 2014 at 09:15:37 UTC, Iain Buclaw via Digitalmars-d wrote:
> dlang.org is not like a wiki. If I were to send a PR to change that to 2.065, the site probably won't be updated until the 2.067 release, by which point that information will be wrong again.

I see, I see. But that should really be updated on the D homepage. After all, it's the first port of call for D programmers. If I cannot trust the information there ...
Oct 17 2014
On 16 Oct 2014 23:01, "ketmar via Digitalmars-d" <digitalmars-d puremagic.com> wrote:
> On Thu, 16 Oct 2014 21:53:40 +0000 Chris via Digitalmars-d <digitalmars-d puremagic.com> wrote:
>> I had planned to use GDC/LDC too, but GDC is 2.064
> GDC is 2.065.

And soon to be 2.066, as soon as I apply the last 244 patches between May and the final release date.

Iain.
Oct 17 2014
On Friday, 17 October 2014 at 07:02:53 UTC, Iain Buclaw via Digitalmars-d wrote:
> And soon to be 2.066 as soon as I apply the last 244 patches between May and the final release date.

Thanks. Good to know.
Oct 17 2014
On Fri, 17 Oct 2014 08:02:42 +0100 Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> wrote:
>> GDC is 2.065.
> And soon to be 2.066 as soon as I apply the last 244 patches between May and the final release date.

yay! you're my hero, do you know that? ;-)
Oct 17 2014