digitalmars.D - Serious Problems with the Test Suite

Walter Bright <newshound2 digitalmars.com> writes:
A good test suite should:

1. verify that things that are supposed to work do work

2. when things don't verify, point to where the problem is

The D test suite fails miserably at point 2. The only bright spot is the 
autotester, where when one of the tests fails it's quick to find the source of 
the problem.

But I cringe every time something else fails, because then I know I'm in for 
hours or even DAYS trying to figure out what and where things went wrong.

For example,

https://github.com/dlang/dmd/pull/11287

has several failures, all of which come with USELESS log files. I have no idea 
what went wrong. Some principles for log files:

1. If the log file says ERROR, it should be an ERROR, i.e. the test should fail. 
I'm often confronted with log files that list multiple ERRORs, but never mind, 
those aren't the errors that made the test fail. All benign ERROR messages, all 
deprecation messages, all warning messages need to be fixed, so that when the 
log file says ERROR, that's why the test failed.

2. The ERROR that causes the test to fail should be the LAST line in the log 
file, not 300 lines back.

3. Log files need to contain comment text at each step to SAY WHAT THEY ARE 
DOING.

4. Makefiles should NEVER, EVER be run in "quiet" mode, for the simple reason 
that one has no idea what it was trying to do when it failed.

5. Test files must either include a URL to the bugzilla issue they fix or have 
some clue in the comments what they are doing.

6. Running tests multi-process makes them go faster, but since the log files 
randomly interleave the output from them, it makes it impossible to figure out 
where the failure is.

7. Any test that fails because of a network error, or other environmental error 
unrelated to what is being tested, should automatically sleep for a minute or 
ten, then try again.

8. Any timeout terminations MUST say which test timed out.

9. Tests should not be Rube Goldberg Machines with layers and layers of 
complexity before the actual test is even run. The harness should be a THIN 
layer over the test.

10. Many tests are UTTERLY UNDOCUMENTED. For example,

https://github.com/dlang/dmd/tree/master/test/unit

What is that? What does it do? Is it one test or many tests? Let's look at:

https://github.com/dlang/dmd/blob/master/test/unit/frontend.d

Not a SINGLE COMMENT in it. What it is, what it does, etc., is all left to the 
imagination. This is completely unacceptable for production code, and it is 
also unacceptable for any code accepted into the D repository.

11. Every time we run into "oh, that's just a heisenbug, try re-running the 
test" that is a BUG in the test suite and needs to be fixed. Those are gigantic 
time and resource wasting problems.
Jun 17 2020
Avrina <avrina12309412342 gmail.com> writes:
On Wednesday, 17 June 2020 at 23:59:52 UTC, Walter Bright wrote:
> 11. Every time we run into "oh, that's just a heisenbug, try
> re-running the test" that is a BUG in the test suite and needs
> to be fixed. Those are gigantic time and resource wasting
> problems.
I've run into these problems with, for example, optlink. When trying to get optlink removed, you prevent it. These heisenbugs exist because, a lot of the time, you aren't willing to chop off dead weight.
Jun 17 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jun 18, 2020 at 01:59:39AM +0000, Avrina via Digitalmars-d wrote:
> On Wednesday, 17 June 2020 at 23:59:52 UTC, Walter Bright wrote:
>> 11. Every time we run into "oh, that's just a heisenbug, try
>> re-running the test" that is a BUG in the test suite and needs to be
>> fixed. Those are gigantic time and resource wasting problems.
> I've run into these problems with, for example, optlink. When trying to get optlink removed, you prevent it. These heisenbugs exist because, a lot of the time, you aren't willing to chop off dead weight.
Whoa, holy miss the point, Batman! Optlink may have its own share of issues, but the problem here isn't with this or that piece of software, it's with the structure of the test suite.

Tests that are non-deterministic or depend on external state, strictly speaking, shouldn't be in the test suite. This includes tests that involve downloading some remote resource over the network, tests that assume things about the host OS and filesystem, etc. There are a couple of these in the test suite, and they put you at the mercy of external state which is beyond your control. (I remember one time there was a heisenbug that had to do with random number generators, meaning its probability of arbitrary, totally coincidental failure was non-zero. Sigh.) These tests ought to be removed, or at least disabled in CI.

Any time a test depends on external state, it really does not belong in the test suite, or at least it does not belong in the autotester, because it just leads to tons of wasted time trying to track down exactly what it is that failed, which most of the time isn't even relevant to the PR you're trying to push through.

T

--
MASM = Mana Ada Sistem, Man!
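
For the random-number case, one common way to take chance out of a test is to give it a private, seeded generator and to print the seed with the result, so that any failure can be replayed exactly. A minimal Python sketch follows; the property being tested and the names `run_seeded` and `shuffle_preserves_elements` are invented for illustration and are not taken from the D test suite.

```python
import random


def shuffle_preserves_elements(seed):
    """Property under test: shuffling keeps the same multiset of items."""
    rng = random.Random(seed)  # private generator, no global state
    data = [rng.randrange(100) for _ in range(50)]
    shuffled = data[:]
    rng.shuffle(shuffled)
    return sorted(shuffled) == sorted(data)


def run_seeded(test_fn, seed=12345):
    """Run a randomized test with a fixed seed and report that seed."""
    ok = test_fn(seed)
    status = "PASS" if ok else "ERROR"
    return f"{status}: {test_fn.__name__} (seed={seed})"


print(run_seeded(shuffle_preserves_elements))
# prints "PASS: shuffle_preserves_elements (seed=12345)"
```

With a fixed seed the test does the same thing on every run, so a failure points at the code under test rather than at luck.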
Jun 17 2020
Avrina <avrina12309412342 gmail.com> writes:
On Thursday, 18 June 2020 at 02:34:42 UTC, H. S. Teoh wrote:
> On Thu, Jun 18, 2020 at 01:59:39AM +0000, Avrina via Digitalmars-d wrote:
>> On Wednesday, 17 June 2020 at 23:59:52 UTC, Walter Bright wrote:
>>> 11. Every time we run into "oh, that's just a heisenbug, try
>>> re-running the test" that is a BUG in the test suite and
>>> needs to be fixed. Those are gigantic time and resource
>>> wasting problems.
>> I've run into these problems with, for example, optlink. When trying to get optlink removed, you prevent it. These heisenbugs exist because, a lot of the time, you aren't willing to chop off dead weight.
> Whoa, holy miss the point, Batman! Optlink may have its own share of issues, but the problem here isn't with this or that piece of software, it's with the structure of the test suite.
There are issues with optlink; I've seen them manifest in the test suite, and just running the test again "fixes" it. It's not the only problem where this has occurred. I'm sure there are more problems with the test suite, and it is rather messy and has grown slow. I was replying specifically to the point about "heisenbugs", some of which are of Walter's own creation due to his refusal to accept change.
Jun 18 2020
Walter Bright <newshound2 digitalmars.com> writes:
On 6/18/2020 7:38 AM, Avrina wrote:
> There are issues with optlink, I've seen them manifest in testsuite and just
> running the test again "fix" it. It's not the only problem where this has
> occurred.
I've run those tests more than anyone, and have not seen an optlink heisenbug.
Jun 18 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jun 18, 2020 at 02:40:33PM -0700, Walter Bright via Digitalmars-d wrote:
> On 6/18/2020 7:38 AM, Avrina wrote:
>> There are issues with optlink, I've seen them manifest in testsuite
>> and just running the test again "fix" it. It's not the only problem
>> where this has occurred.
> I've run those tests more than anyone, and have not seen an optlink heisenbug.
I think it's because Walter uses advanced quantum technology that can directly handle quantum-superimposed computation states [1], so none of these heisenbugs affect him. ;-)

[1] https://forum.dlang.org/post/mailman.3657.1591403118.31109.digitalmars-d puremagic.com

T

--
English is useful because it is a mess. Since English is a mess, it maps well onto the problem space, which is also a mess, which we call reality. Similarly, Perl was designed to be a mess, though in the nicest of all possible ways. -- Larry Wall
Jun 18 2020
Walter Bright <newshound2 digitalmars.com> writes:
On 6/18/2020 3:20 PM, H. S. Teoh wrote:
> On Thu, Jun 18, 2020 at 02:40:33PM -0700, Walter Bright via Digitalmars-d wrote:
>> I've run those tests more than anyone, and have not seen an optlink
>> heisenbug.
> I think it's because Walter uses advanced quantum technology that can directly handle quantum-superimposed computation states [1], so none of these heisenbugs affect him. ;-)
>
> [1] https://forum.dlang.org/post/mailman.3657.1591403118.31109.digitalmars-d puremagic.com
That's not an optlink issue.
Jun 18 2020
Mathias LANG <geod24 gmail.com> writes:
On Friday, 19 June 2020 at 00:54:15 UTC, Walter Bright wrote:
> On 6/18/2020 3:20 PM, H. S. Teoh wrote:
>> On Thu, Jun 18, 2020 at 02:40:33PM -0700, Walter Bright via Digitalmars-d wrote:
>>> I've run those tests more than anyone, and have not seen an
>>> optlink heisenbug.
>> I think it's because Walter uses advanced quantum technology that can directly handle quantum-superimposed computation states [1], so none of these heisenbugs affect him. ;-)
>>
>> [1] https://forum.dlang.org/post/mailman.3657.1591403118.31109.digitalmars-d puremagic.com
> That's not an optlink issue.
Starting a new thread so as not to derail the original topic, which contained valid points.

Optlink has been a pain for everyone on x86 Windows for a while. I personally use Linux and Mac OSX, but tried doing some work on Windows recently and the first thing I got was a linker crash. There have been active steps taken to limit its use / reduce the exposure of new users to it, among them:

- Dub defaults to mscoff since v1.15.0, and that has drastically improved the UX for new users. See https://github.com/dlang/dub/pull/1661 for the many reasons this was done.
- Vibe.d recently dropped support for it because it was causing crashes / timeouts: https://github.com/vibe-d/vibe.d/pull/2445
- This was tried in DMD, and you obviously shut it down: https://github.com/dlang/dmd/pull/8347 . I will just quote the last post by Manu here: "I don't have the energy to pursue this. I do think it's important though."

And yes, they are documented, advertised, and have been advocated for years, yet you refused to listen to the feedback countless users have given.
Jun 18 2020
Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 17 June 2020 at 23:59:52 UTC, Walter Bright wrote:
> A good test suite should:
>
> 1. verify that things that are supposed to work do work
>
> [...]
Most of those could be fixed with an improved test runner, for instance if we did a timeout per test. Another obvious improvement would be printing only the tests which failed. As for the missing comments, I think that's a plus. When introducing a change in how dmd interprets D's semantics, one should be forced to scratch their head.
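
To make the runner idea concrete, here is a rough Python sketch of a per-test timeout that names the test which timed out and reports only the tests that did not pass. It is an illustration under assumptions, not the actual dmd runner: `run_with_timeout`, `failures_only`, the 5-second limit and the three stand-in commands are all hypothetical.

```python
import subprocess
import sys


def run_with_timeout(name, cmd, timeout_s):
    """Run one test command; return (name, status, output)."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # Name the test that timed out instead of dying silently.
        return name, "TIMEOUT", f"ERROR: {name} timed out after {timeout_s}s"
    if proc.returncode != 0:
        return name, "FAIL", proc.stdout + proc.stderr
    return name, "PASS", ""


def failures_only(results):
    """Keep only the tests that did not pass."""
    return [(n, s, out) for n, s, out in results if s != "PASS"]


# Hypothetical test commands: small Python one-liners standing in for tests.
tests = [
    ("quick", [sys.executable, "-c", "print('ok')"]),
    ("broken", [sys.executable, "-c", "import sys; sys.exit('boom')"]),
    ("hang", [sys.executable, "-c", "import time; time.sleep(30)"]),
]

results = [run_with_timeout(n, cmd, timeout_s=5) for n, cmd in tests]
for name, status, output in failures_only(results):
    print(f"{status}: {name}\n{output.rstrip()}")
```

The passing test produces no output at all, the failing one prints its own stderr, and the hanging one is killed after the limit and reported by name.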
Jun 17 2020
Walter Bright <newshound2 digitalmars.com> writes:
I've added a new keyword TestSuite and here are the current test suite bugs 
that I found:

https://issues.dlang.org/buglist.cgi?keywords=TestSuite&list_id=231900
Jun 18 2020