97

In some organisations, apparently, part of the software release process is to use unit testing, but at any point in time all unit tests must pass. E.g. there might be some screen which shows all unit tests passing in green - which is supposed to be good.

Personally, I think this is not how it should be for the following reasons:

  1. It promotes the idea that code should be perfect and no bugs should exist - which in the real world is surely impossible for a program of any size.

  2. It is a disincentive to think up unit tests that will fail. Or certainly come up with unit tests that would be tricky to fix.

  3. If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

  4. It deters writing unit tests up-front - before the implementation.

I would even suggest that releasing software with failing unit tests is not necessarily bad. At least then you know that some aspect of the software has limitations.

Am I missing something here? Why do organisations expect all unit tests to pass? Isn't this living in a dream world? And doesn't it actually deter a real understanding of code?

19 Answers

288

This question contains IMHO several misconceptions, but the main one I would like to focus on is that it does not differentiate between local development branches, trunk, staging or release branches.

In a local dev branch, it is normal to have some failing unit tests at almost any time. In the trunk, this is only acceptable to a degree, and it is already a strong indicator that things need to be fixed ASAP. Note that failing unit tests in the trunk can disturb the rest of the team, since they require everyone to check whether their own latest change caused the failure.

In a staging or release branch, failing tests are a "red alert", showing that something has gone utterly wrong with some changeset when it was merged from the trunk into the release branch.

I would even suggest that releasing software with failing unit tests is not necessarily bad.

Releasing software with some known bugs below a certain severity is not necessarily bad. However, these known glitches should not cause a failing unit test. Otherwise, after each unit test run, one will have to look into the 20 failed unit tests and check one-by-one if the failure was an acceptable one or not. This gets cumbersome, error-prone, and discards a huge part of the automation aspect of unit tests.

If you really have tests for acceptable, known bugs, use your unit testing tool's disable/ignore feature (so they are not run by default, only on demand). Additionally, add a low-priority ticket to your issue tracker, so the problem won't get forgotten.
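
For illustration, a minimal sketch of that approach with JUnit 5 (the exporter code is invented for the example; most frameworks have an equivalent ignore/skip mechanism):

import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

class CsvExporterTest {

  // Hypothetical production code, stubbed here so the example is self-contained.
  static String exportRow(String... fields) {
    return String.join(",", fields);   // known glitch: embedded quotes are not escaped
  }

  @Test
  void exportsSimpleRows() {
    // Regression test for behaviour that works: must stay green.
    assertEquals("a,b", exportRow("a", "b"));
  }

  // Known, accepted glitch, tracked as a low-priority ticket in the issue tracker.
  // Disabled so the default run stays green; re-enable on demand or when the ticket is picked up.
  @Disabled("known issue: embedded quotes are not escaped yet (see tracker)")
  @Test
  void escapesEmbeddedQuotes() {
    assertEquals("\"a\"\"b\"", exportRow("a\"b"));
  }
}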

15
  • 22
    I think this is the real answer. OP mentions "release process" and "some screen [showing test results]", which sounds like a build server. Release is not the same as development (don't develop in production!); it's fine to have failing tests in dev, they're like TODOs; they should all be green (DONE) when pushed to the build server.
    – Warbo
    Commented May 22, 2018 at 14:08
  • 8
A much better answer than the highest voted one. It shows an understanding of where the OP is coming from without lecturing them about some ideal world situation, acknowledges the possibility of known bugs (for which not the entire roadmap is dropped to fix some rare corner case) and explains that unit tests should only definitely be green in a release branch/process. Commented May 22, 2018 at 16:54
  • 6
@SebastiaanvandenBroek: thanks for your positive reply. Just to make this clear: IMHO failing unit tests should be rare even in the trunk, since getting such failures too often will disturb the whole team, not just the one who made the change which caused the failure.
    – Doc Brown
    Commented May 22, 2018 at 17:56
  • 5
    I think the problem here is thinking all automated tests are unit tests. Many test frameworks include the ability to mark tests that are expected to fail (often called XFAIL). (This is different from a test that requires an error result. XFAIL tests would ideally succeed, but don't.) The test suite still passes with these failing. The most common use case is things that only fail on some platforms (and are only XFAIL on those), but using the feature to track something that will require too much work to fix right now is also within reason. But these sorts of tests are usually not unit tests. Commented May 22, 2018 at 23:04
  • 1
+1, though I suggest a slight addition (in bold) to this sentence: "This gets cumbersome, error-prone, conditions people to ignore failures in the test suite as noise, and discards a huge part of the automation aspect of unit tests."
    – mtraceur
    Commented May 25, 2018 at 21:46
235

... all unit tests passing in green - which is supposed to be good.

It is good. No "supposed to be" about it.

It promotes the idea that code should be perfect and no bugs should exist - which in the real world is surely impossible for a program of any size.

No. It proves that you've tested the code as well as you can up to this point. It is entirely possible that your tests do not cover every case. If so, any errors will eventually turn up in bug reports and you'll write [failing] tests to reproduce the problems and then fix the application so that the tests pass.

It is a disincentive to think up unit tests that will fail.

Failing or negative tests place firm limits on what your application will and will not accept. Most programs I know of will object to a "date" of February the 30th. Also, Developers, creative types that we are, don't want to break "their babies". The resulting focus on "happy-path" cases leads to fragile applications that break - often.
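
To make the "February the 30th" point concrete, here is a small negative-test sketch against the JDK's own date API, assuming JUnit 5 is available:

import org.junit.jupiter.api.Test;

import java.time.DateTimeException;
import java.time.LocalDate;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

class DateBoundaryTest {

  @Test
  void acceptsTheLastRealDayOfFebruary() {
    // Happy path: 29 February exists in a leap year.
    assertEquals(29, LocalDate.of(2024, 2, 29).getDayOfMonth());
  }

  @Test
  void rejectsFebruaryThe30th() {
    // Negative test: the firm limit is that this "date" must be refused.
    assertThrows(DateTimeException.class, () -> LocalDate.of(2024, 2, 30));
  }
}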

To compare the mindset of the Developer and the Tester:

  • A Developer stops as soon as the code does what they want it to.
  • A Tester stops when they can no longer make the code break.

These are radically different perspectives, and the difference is one that many Developers find difficult to reconcile.

Or certainly come up with unit tests that would be tricky to fix.

You don't write tests to make work for yourself. You write tests to ensure that your code is doing what it's supposed to do and, more importantly, that it continues to do what it's supposed to do after you've changed its internal implementation.

  • Debugging "proves" that the code does what you want it to today.
  • Tests "prove" that the code still does what you want it to over time.

If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

The only "picture" testing gives you is a snapshot that the code "works" at the point in time that it was tested. How it evolves after that is a different story.

It deters writing unit tests up-front - before the implementation.

That's exactly what you should be doing. Write a test that fails (because the method it's testing hasn't been implemented yet) then write the method code to make the method work and, hence, the test pass. That's pretty much the crux of Test-Driven Development.
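
As a sketch of that cycle (the slugify function and its expected behaviour are invented purely for illustration), assuming JUnit 5:

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

class SlugifyTest {

  // Step 1 (red): this test is written first and fails (it does not even compile)
  // because slugify() does not exist yet.
  @Test
  void replacesSpacesWithDashesAndLowercases() {
    assertEquals("hello-world", slugify("Hello World"));
  }

  // Step 2 (green): the simplest implementation that makes the test pass.
  // Step 3 (refactor): clean the code up while keeping the test green.
  static String slugify(String input) {
    return input.trim().toLowerCase().replaceAll("\\s+", "-");
  }
}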

I would even suggest that releasing software with failing unit tests is not necessarily bad. At least then you know that some aspect of the software has limitations.

Releasing code with broken tests means that some part of its functionality no longer works as it did before. That may be a deliberate act because you've fixed a bug or enhanced a feature (but then you should have changed the test first so that it failed, then coded the fix/enhancement, making the test work in the process). More importantly: we are all Human and we make mistakes. If you break the code, then you should break the tests and those broken tests should set alarm bells ringing.

Isn't this living in a dream world?

If anything, it's living in the Real World, acknowledging that Developers are neither omniscient nor infallible, that we do make mistakes and that we need a safety net to catch us if and when we do mess up!
Enter Tests.

And doesn't it actually deter a real understanding of code?

Perhaps. You don't necessarily need to understand the implementation of something to write tests for it (that's part of the point of them). Tests define the behaviour and limits of the application and ensure that those stay the same unless you deliberately change them.

19
  • 7
    @Tibos: Disabling a test is like commenting out a function. You have version control. Use it.
    – Kevin
    Commented May 22, 2018 at 15:26
  • 7
    @Kevin I don't know what you mean by 'use it'. I mark a test as 'skipped' or 'pending' or whatever convention my test runner uses, and commit that skip tag to version control.
    – dcorking
    Commented May 22, 2018 at 15:33
  • 4
    @dcorking: I mean don't comment code out, delete it. If you later decide that you need it, then restore it from version control. Committing a disabled test is no different.
    – Kevin
    Commented May 22, 2018 at 16:14
  • 5
    "It is entirely possible that your tests do not cover every case." I would go so far to say that for every non-trivial piece of code tested, you definitely do not have every case covered.
    – corsiKa
    Commented May 22, 2018 at 16:24
  • 8
    @Tibos Proponents of unit testing say that the cycle time from writing a failing test to writing the code for it should be small (e.g. 20 minutes. Some claim 30 seconds). If you don't have time to write the code immediately, it's probably way too complex. If it's not complex, delete the test as it can be rewritten if the dropped feature gets added again. Why not comment it out? You don't know that the feature will ever be added again, so the commented out test (or code) is just noise.
    – CJ Dennis
    Commented May 23, 2018 at 2:49
35

Why are unit tests failing seen as bad?

They aren't -- test driven development is built upon the notion of failing tests. Failing unit tests to drive development, failing acceptance tests to drive a story....

What you are missing is context; where are the unit tests allowed to fail?

The usual answer is that unit tests are allowed to fail in private sandboxes only.

The basic notion is this: in an environment where failing tests are shared, it takes extra effort to understand whether a change to the production code has introduced a new error. The difference between zero and not zero is much easier to detect and manage than the difference between N and not N.

Furthermore, keeping the shared code clean means that developers can stay on task. When I merge your code, I don't need to shift contexts from the problem I'm being paid to solve to calibrating my understanding of how many tests should be failing. If the shared code is passing all of the tests, any failures that appear when I merge in my changes must be part of the interaction between my code and the existing clean baseline.

Similarly, during onboarding, a new developer can become productive more quickly, as they don't need to spend time discovering which failing tests are "acceptable".

To be more precise: the discipline is that tests which run during the build must pass.

There is, as best I can tell, nothing wrong with having failing tests that are disabled.

For instance, in a "continuous integration" environment, you'll be sharing code on a high cadence. Integrating often doesn't necessarily mean that your changes have to be release ready. There are an assortment of dark deploy techniques that prevent traffic from being released into sections of the code until they are ready.

Those same techniques can be used to disable failing tests as well.
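
One hedged sketch of what that can look like with JUnit 5 assumptions; the feature name, the system property and the PricingEngine class are all invented for the example:

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertTrue;
import static org.junit.jupiter.api.Assumptions.assumeTrue;

class DarkFeatureTest {

  // Minimal stand-in so the example compiles; the real class is hypothetical.
  static class PricingEngine {
    double discountFor(int items) {
      return items >= 3 ? 0.05 : 0.0;
    }
  }

  @Test
  void newPricingRulesApply() {
    // Aborted (not failed) unless the dark-deployed feature is switched on,
    // e.g. by running the suite with -Dfeature.newPricing=true.
    assumeTrue(Boolean.getBoolean("feature.newPricing"),
        "new pricing engine not enabled in this environment");

    assertTrue(new PricingEngine().discountFor(3) > 0);
  }
}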

One of the exercises I went through on a point release was dealing with development of a product with many failing tests. The answer we came up with was simply to go through the suite, disabling the failing tests and documenting each. That allowed us to quickly reach a point where all of the enabled tests were passing, and management/goal donor/gold owner could all see what trades we had made to get to that point, and could make informed decisions about cleanup vs new work.

In short: there are other techniques for tracking work not done than leaving a bunch of failing tests in the running suite.

2
  • I would have said "There is ... nothing wrong with having failing tests that are disabled".
    – CJ Dennis
    Commented May 23, 2018 at 2:53
  • That change certainly clarifies the meaning. Thank you. Commented May 23, 2018 at 12:30
30

There are many great answers, but I'd like to add another angle that I believe is not yet well covered: what exactly is the point of having tests.

Unit tests aren't there to check that your code is bug free.

I think this is the main misconception. If this was their role, you'd indeed expect to have failing tests all over the place. But instead,

Unit tests check that your code does what you think it does.

In extreme cases it may include checking that known bugs are not fixed. The point is to have control over your codebase and avoid accidental changes. When you make a change it is fine and actually expected to break some tests - you are changing the behavior of the code. The freshly broken tests are now a fine trail of what you changed. Check that all the breakages conform to what you want from your change. If so, just update the tests and go on. If not - well, your new code is definitely buggy, go back and fix it before submitting!

Now, all of the above works only if all tests are green, giving a strong positive result: this is exactly how the code works. Red tests don't have that property. "This is what this code doesn't do" is rarely useful information.
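
A hedged sketch of that idea, sometimes called a characterization test: pin down what the code currently does, quirks included, so that any change to that behaviour is deliberate. The rounding code here is a made-up stand-in (JUnit 5 assumed):

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

class LegacyRoundingTest {

  // Stand-in for inherited legacy code; the quirk is deliberate for the example.
  static long toCents(double euros) {
    return (long) (euros * 100);   // truncates instead of rounding
  }

  @Test
  void pinsCurrentTruncatingBehaviour() {
    // Not "the right answer", but the answer the rest of the system relies on today.
    // If a change makes this go red, observable behaviour changed - review it,
    // then either fix the code or consciously update the test.
    assertEquals(1019, toCents(10.1999));
  }
}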

Acceptance tests may be what you are looking for.

There is such a thing as acceptance testing. You may write a set of tests that have to be fulfilled to call the next milestone. These are OK to be red, because that's what they were designed for. But they are a very different thing from unit tests, and they neither can nor should replace them.

1
  • 3
I once had to replace a library with another. Unit tests helped me ensure that all corner cases were still treated identically by the new code. Commented May 24, 2018 at 20:12
24

I view it as the software equivalent of broken window syndrome.

Working tests tell me that the code is of a given quality and that the owners of the code care about it.

As for when you should care about the quality, that rather depends what source code branch/repository you're working on. Dev code may very well have broken tests indicating work in progress (hopefully!).

Broken tests on a branch/repository for a live system should immediately set alarm bells ringing. If broken tests are allowed to continue failing or if they're permanently marked as "ignore" - expect their number to creep up over time. If these aren't regularly reviewed the precedent will have been set that it is OK for broken tests to be left.

Broken tests are viewed so pejoratively in many shops as to have a restriction on whether broken code can even be committed.

3
  • 10
    If tests document the way a system is, they should certainly always be passing - if they aren't, it means the invariants are broken. But if they document the way a system is supposed to be, failing tests can have their use as well - as long as your unit testing framework supports a good way of marking them as "known issues", and if you link them with an item in your issue tracker. I think both approaches have their merit.
    – Luaan
    Commented May 22, 2018 at 13:02
  • 1
    @Luaan Yes, this does rather assume that all unit tests are created equally. It certainly isn't uncommon for build managers to slice and dice the tests via some attribute depending on how long they run, how brittle they are and various other criteria.
    – Robbie Dee
    Commented May 22, 2018 at 13:17
This answer matches my own experience. Once some people get used to ignoring a bunch of failing tests, or to breaking best practices in some places, let a couple of months pass and you will see the percentage of ignored tests increase dramatically and code quality drop to "hack-script" level. And it will be very hard to bring everyone back to the process. Commented May 28, 2018 at 10:15
13

Here is the underlying logical fallacy:

If it is good when all tests pass, then it must be bad if any tests fail.

With unit tests, it IS good when all the tests pass. It is ALSO GOOD when a test fails. The two need not be in opposition.

A failing test is a problem that was caught by your tooling before it reached a user. It's an opportunity to fix a mistake before it is published. And that's a good thing.

What is not good is allowing a failing unit test to persist over time, as that will train you to ignore your tooling and reports. When a test fails, it's time to investigate why. Sometimes it's a bug in the code. Sometimes it's a bug in the test. Sometimes it's an environmental issue, or assumptions have changed. Sometimes in those cases the issue is temporary, but will still hang around long enough to be confounding; in these cases, you should disable the test (with a reminder somewhere to re-examine the situation later) until the issue can be resolved.

2
  • Interesting line of thought. I see the question's fallacy more like this: "since it is good when a unit test fails, it is bad when all tests pass".
    – Doc Brown
    Commented May 22, 2018 at 18:31
  • While your last paragraph is a good point, it seems that the problem is more a misunderstanding of "at any point in time all unit tests must pass" (as the accepted answer indicates) and the point of unit tests. Commented May 23, 2018 at 11:33
10

Phill W's answer is great. I can't replace it.

However, I do want to focus on another part that may have been part of the confusion.

In some organisations, apparently, part of the software release process is to use unit testing, but at any point in time all unit tests must pass

"at any point in time" is overstating your case. What's important is that unit tests pass after a certain change has been implemented, before you start implementing another change.
This is how you keep track of which change caused a bug to arise. If the unit tests started failing after implementing change 25 but before implementing change 26, then you know that change 25 caused the bug.

During the implementation of a change, of course the unit tests could fail; that very much depends on how big the change is. If I'm redeveloping a core feature, which is more than just a minor tweak, I'm likely going to break the tests for a while until I finish implementing my new version of the logic.


This can create conflicts as to team rules. I actually encountered this a few weeks ago:

  • Every commit/push causes a build. The build must never fail (if it does or any test fails, the committing developer is blamed).
  • Every developer is expected to push their changes (even if incomplete) at the end of the day, so the team leads can code review in the morning.

Either rule would be fine. But both rules cannot work together. If I am assigned a major change that takes several days to complete, I wouldn't be able to adhere to both rules at the same time. Unless I were to comment out my changes every day and only commit them uncommented after everything was done - which is just nonsensical work.

In this scenario, the issue here isn't that unit tests have no purpose; it's that the company has unrealistic expectations. Their arbitrary ruleset does not cover all cases, and failure to adhere to the rules is blindly regarded as developer failure rather than a rule failure (which it is, in my case).

13
  • 3
    The one way this can work is to use branching, such that the devs commit and push to feature branches that don't need to build cleanly while incomplete, but commits to the core branch do trigger a build, which should build cleanly.
    – Gwyn Evans
    Commented May 22, 2018 at 17:57
  • 1
    Enforcing pushing incomplete changes is absurd, I can't see any justification for doing so. Why not code review when the change is complete? Commented May 22, 2018 at 21:49
  • Well, for one, it's a quick way of ensuring that the code's not only on the dev's laptop/workstation if their hard disk were to stop working or be otherwise lost - if there's a policy of committing even if in the middle of working, then there's a limited amount of work at risk.
    – Gwyn Evans
    Commented May 22, 2018 at 22:00
  • 1
    Feature flags fix the apparent paradox.
    – RubberDuck
    Commented May 23, 2018 at 10:07
  • 1
    @Flater yes, for reworking existing logic too.
    – RubberDuck
    Commented May 23, 2018 at 11:38
6

If you don't fix all unit tests you can rapidly get into the state where nobody fixes any broken tests.

  1. Is incorrect as passing unit tests don't show the code is perfect

  2. It's a disincentive to come up with code that would be difficult to test too, which is good from a design point of view

  3. Code coverage can help there (though it's not a panacea). Also unit tests are just one aspect of testing - you want integration/acceptance tests too.

6

To add a few points to the already-good answers...

but at any point in time all unit tests must pass

This shows a lack of understanding of a release process. A test failure may indicate a planned feature under TDD which isn't yet implemented; or it may indicate a known issue which has a fix planned for a future release; or it may simply be something where management have decided this isn't important enough to fix because customers are unlikely to notice. The key thing all these share is that management have made a judgement call about the failure.

It promotes the idea that code should be perfect and no bugs should exist - which in the real world is surely impossible for a program of any size.

Other answers have covered the limits of testing.

I don't understand why you think eliminating bugs is a downside though. If you don't want to deliver code which you've checked (to the best of your ability) does what it's supposed to, why are you even working in software?

If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

Why must there be a roadmap?

Unit tests initially check that functionality works, but then (as regression tests) check that you haven't inadvertently broken anything. For all the features with existing unit tests, there is no roadmap. Every feature is known to work (within the limits of testing). If that code is finished, it has no roadmap because there is no need for more work on it.

As professional engineers, we need to avoid the trap of gold-plating. Hobbyists can afford to waste time tinkering round the edges with something that works. As professionals, we need to deliver a product. That means we get something working, verify that it's working, and move on to the next job.

6

It promotes the idea that code should be perfect and no bugs should exist - which in the real world is surely impossible for a program of any size.

Not true. Why do you think it's impossible? Here is an example of a program for which it works:

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertTrue;

public class MyProgram {
  public boolean alwaysTrue() {
    return true;
  }

  @Test
  public void testAlwaysTrue() {
    assertTrue(alwaysTrue());
  }
}

It is a disincentive to think up unit tests that will fail. Or certainly come up with unit tests that would be tricky to fix.

In that case it may not be a unit test but an integration test, if it's complicated.

If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

True, it's called a unit test for a reason: it checks a small unit of code.

It deters writing unit tests up-front - before the implementation.

Developers will be deterred from writing any tests if they don't understand their benefits (unless they came from QA)

6
  • “Developers will deter[sic] writing any tests by their nature” — that is utter nonsense. I work at an entire company of developers who practice TDD and BDD.
    – RubberDuck
    Commented May 23, 2018 at 10:09
@RubberDuck I tried to answer a "fact" in the question and I was exaggerating. I'll update
    – Ori Marko
    Commented May 23, 2018 at 10:12
  • "X will be deterred from doing Y if they don't understand the benefits of Y" applies for just about any X and Y, so that statement probably isn't particularly useful. It would probably make more sense to explain the benefits of writing the tests, and specifically doing so upfront. Commented May 23, 2018 at 11:04
  • 2
    "impossible for a program of any size" doesn't mean "all programs, no matter what size", it means "any significant program (having a non-trivial length)" Your attempted counter-example is inapplicable, because it isn't a significant and useful program.
    – Ben Voigt
    Commented May 25, 2018 at 22:00
  • @BenVoigt I don't think I'm expected to give a "significant program" as an answer.
    – Ori Marko
    Commented May 27, 2018 at 12:53
4

It promotes the idea that code should be perfect and no bugs should exist

Most definitely not. It promotes the idea that your tests should not fail, nothing more and nothing less. Assuming that having tests (even a lot of them) says anything about "perfect" or "no bugs" is a fallacy. Deciding how shallow or deep your tests should be is a significant part of writing good tests, and the reason why we have distinctively separate categories of tests ("unit" tests, integration tests, "scenarios" in the cucumber sense etc.).

It is a disincentive to think up unit tests that will fail. Or certainly come up with unit tests that would be tricky to fix.

In test-driven development, it is mandatory that every unit test fails first, before you start to code. It's called the "red-green cycle" (or "red-green-refactor cycle") for this very reason.

  • Without the test failing, you do not know whether the code is actually tested by the test. The two might not be related at all.
  • By changing the code to exactly make the test turn from red to green, nothing more and nothing less, you can be pretty confident that your code does what it is supposed to do, and not a lot more (which you might never need).

If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

Tests are more kind of a micro-goal. In test-driven development, the programmer will write a test (singular) first, and then have a clear goal to implement some code; then the next test, and so on.

The function of the tests is not to be there in completeness before code is written.

When done correctly, in a language and with a testing library that is well-suited to this approach, this can actually massively speed up development, since the error messages (exceptions/stacktraces) can directly point the developer to where he needs to perform work next.

It deters writing unit tests up-front - before the implementation.

I don't see how this statement would be true. Writing tests should ideally be a part of the implementation.

Am I missing something here? Why do organisations expect all unit tests to pass?

Because organisations expect tests to have relevance to the code. Writing tests that succeed means you have documented some part of your application, and have proven that the application does what it (the test) says. Nothing more and nothing less.

Also, a very big part of having tests is "regression". You want to be able to develop or refactor new code with confidence. Having a large amount of green tests allows you to do that.

This goes from organizational to a psychological level. A developer who knows that his errors will in high likelihood be caught by the tests will be much more free to come up with intelligent, bold solutions for the problems he needs to solve. On the other hand, a developer who does not have tests will, after some time, be grinding to a standstill (due to fear) because he never knows if a change he does breaks the rest of the application.

Isn't this living in a dream world?

No. Working with a test-driven application is pure joy - unless you just do not like the concept for whatever reason ("more effort" etc. etc.) which we can discuss in another question.

And doesn't it actually deter a real understanding of code?

Absolutely not, why would it?

You find plenty of large open source projects (for which the management of "understanding" and know-how about the code is a very pressing topic) that actually use the tests as the main documentation of the software by, aside from being tests, also providing real, working, syntactically correct examples for users or developers of the application/library. This often works splendidly.

Obviously, writing bad tests is bad. But that has nothing to do with the function of tests per se.

3

(From my original comments)

There's a difference between required functionality and future goals. Tests are for required functionality: they're precise, formal, executable and if they fail the software doesn't work. Future goals might not be precise or formal, let alone executable, so they're better left in natural language like in issue/bug trackers, documentation, comments, etc.

As an exercise, try replacing the phrase "unit test" in your question with "compiler error" (or "syntax error", if there's no compiler). It's obvious that a release shouldn't have compiler errors, since it would be unusable; yet compiler errors and syntax errors are the normal state of affairs on a developer's machine when they're writing code. The errors only disappear when they've finished; and that's exactly when the code should be pushed. Now replace "compiler error" in this paragraph with "failing unit test" :)

2

The purpose of automated tests is to tell you when you have broken something as early as possible. The workflow looks a bit like this:

  1. Make a change
  2. Build and test your change (ideally automatically)
  3. If the tests fail, it means that you broke something that previously worked
  4. If the tests pass, you should be confident that your change introduced no new regressions (depending on test coverage)

If your tests were already failing then step #3 doesn't work as effectively - the tests will fail, but you don't know whether that means you broke something or not without investigating. Maybe you could count the number of failing tests, but then a change might fix one bug and break another, or a test might start failing for a different reason. This means you need to wait some amount of time before you know if something has been broken, either until all of the issues have been fixed or until each failing test has been investigated.

The ability for unit tests to find newly introduced bugs as early as possible is the most valuable thing about automated testing - the longer a defect goes undiscovered the more expensive it is to fix.

It promotes the idea that code should be perfect and no bugs should exist
It is a disincentive to think up unit tests that will fail

Tests for things that don't work don't tell you anything - write unit tests for things that do work, or that you are about to fix. It doesn't mean your software is defect free, it means that none of the defects you previously wrote unit tests for have come back again.

It deters writing unit tests up-front

If it works for you then write tests up front, just don't check them into your master / trunk until they pass.

If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

Unit tests aren't for setting a roadmap/goal, maybe use a backlog for that instead? If all your tests pass then the "big picture" is that your software isn't broken (if your test coverage is good). Well done!

2

The existing answers are certainly good, but I have not seen anyone address this foundational misconception in the question:

at any point in time all unit tests must pass

No. Most assuredly, this will not be true. While I am developing software, NCrunch is most often either brown (build failure) or red (failed test).

Where NCrunch needs to be green (all tests passing) is when I am ready to push a commit to the source control server, because at that point others may take a dependency on my code.

This also feeds into the topic of creating new tests: tests should assert the logic and behavior of the code. Boundary conditions, fault conditions, etc. When I write new tests, I try to identify these "hot spots" in the code.

Unit tests document how I expect my code to be called - preconditions, expected outputs, etc.
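
For illustration, a small sketch of tests acting as that kind of documentation; the parsePercentage helper is invented for the example and stubbed in so the snippet is self-contained (JUnit 5 assumed):

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

class PercentageParserTest {

  // Hypothetical code under test, included so the example runs on its own.
  static int parsePercentage(String text) {
    int value = Integer.parseInt(text.trim());
    if (value < 0 || value > 100) {
      throw new IllegalArgumentException("percentage out of range: " + value);
    }
    return value;
  }

  @Test
  void documentsTheExpectedInputAndOutput() {
    assertEquals(42, parsePercentage(" 42 "));   // leading/trailing whitespace is tolerated
  }

  @Test
  void documentsTheBoundaryConditions() {
    assertEquals(0, parsePercentage("0"));
    assertEquals(100, parsePercentage("100"));
  }

  @Test
  void documentsTheFaultConditions() {
    assertThrows(IllegalArgumentException.class, () -> parsePercentage("101"));
    assertThrows(NumberFormatException.class, () -> parsePercentage("abc"));
  }
}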

If a test breaks following a change, I need to decide whether the code or the test is in error.


As a side note, unit testing sometimes goes hand-in-hand with Test Driven Development. One of the principles of TDD is that broken tests are your guideposts. When a test fails, you need to fix the code so the test passes. Here is a concrete example from earlier this week:

Background: I wrote and now support a library used by our developers that is used to validate Oracle queries. We had tests that asserted that the query matched some expected value, which made case important (it is not, in Oracle) and merrily approved of invalid queries as long as they completely matched the expected value.

Instead, my library parses the query using Antlr and an Oracle 12c syntax, and then wraps various assertions on the syntax tree itself. Things like, it's valid (no parse errors were raised), all its parameters are satisfied by the parameter collection, all the expected columns read by the data reader are present in the query, etc. All of these are items that have slipped through to production at various times.

One of my fellow engineers sent me a query on Monday that had failed (or rather, had succeeded when it should have failed) over the weekend. My library said the syntax was fine, but it blew up when the server tried to run it. And when he looked at the query, it was obvious why:

UPDATE my_table(
SET column_1 = 'MyValue'
WHERE id_column = 123;

I loaded up the project and added a unit test that asserted that this query should not be valid. Obviously, the test failed.

Next, I debugged the failing test, stepped through the code where I expected it to throw the exception, and figured out that Antlr was raising an error on the open paren, just not in a way the previous code was expecting. I modified the code, verified that the test was now green (passing) and that none others had broken in the process, committed, and pushed.

This took maybe 20 minutes, and in the process I actually improved the library significantly because it now supported an entire range of errors that previously it had been ignoring. If I did not have unit tests for the library, researching and fixing the issue could have taken hours.
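
Purely as an illustration of that red-to-green step, here is roughly what such a regression test might look like; the QueryValidator interface and its stand-in implementation are hypothetical, invented for this sketch (the real library wraps a full Antlr parse):

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertFalse;

class QueryValidatorRegressionTest {

  // Hypothetical entry point standing in for the internal Antlr-based validator.
  interface QueryValidator {
    boolean isValidSyntax(String sql);
  }

  // Crude stand-in so the sketch runs: "valid" here just means the parentheses
  // are balanced. The real check is a parse against the Oracle grammar.
  private final QueryValidator validator = sql ->
      sql.chars().filter(c -> c == '(').count()
          == sql.chars().filter(c -> c == ')').count();

  @Test
  void rejectsUpdateWithStrayOpenParenthesis() {
    // The query that slipped through: written as a failing (red) test first,
    // then the error handling was fixed until it went green.
    String sql = "UPDATE my_table( SET column_1 = 'MyValue' WHERE id_column = 123";
    assertFalse(validator.isValidSyntax(sql));
  }
}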

1

"but at any point in time all unit tests must pass"

If that's the attitude in your company, that's a problem. At a CERTAIN time, namely, when we declare that code is ready to move to the next environment, all unit tests should pass. But during development, we should routinely expect many unit tests to fail.

No reasonable person expects a programmer to get his work perfect on the first try. What we do reasonably expect is that he will keep working on it until there are no known problems.

"It is a disincentive to think up unit tests that will fail. Or certainly come up with unit tests that would be tricky to fix." If someone in your organization thinks that they should not mention a possible test because it might fail and cause them more work to fix it, that person is totally unqualified for their job. This is a disastrous attitude. Would you want a doctor who says, "When I'm doing surgery, I deliberately don't check if the stitches are right, because if I see they're not I'll have to go back and re-do them and that will slow down finishing the operation"?

If the team is hostile to programmers who identify errors before code goes to production, you have a real problem with the attitude of that team. If management punishes programmers who identify errors that slow down delivery, odds are that your company is headed for bankruptcy.

Yes, it's certainly true that sometimes rational people say, "We're approaching the deadline, this is a trivial problem and it's not worth devoting the resources right now that it would take to fix it." But you can't make that decision rationally if you don't know. Coolly examining a list of errors and assigning priorities and schedules to fixing them is rational. Deliberately making yourself ignorant of problems so you don't have to make this decision is foolish. Do you think the customer won't find out just because you didn't want to know?

0

One point that I don't think comes out from previous answers is that there's a difference between internal tests and external tests (and I think many projects aren't careful enough to distinguish the two). An internal test tests that some internal component is working the way it should; an external test shows that the system as a whole is working the way it should. It's quite possible, of course, to have failures in components that don't result in a failure of the system (perhaps there is a feature of the component that the system doesn't use, or perhaps the system recovers from a failure of the component). A component failure that doesn't result in a system failure shouldn't stop you releasing.

I have seen projects that are paralysed by having too many internal component tests. Every time you try and implement a performance improvement, you break dozens of tests, because you are changing the behaviour of components without actually changing the externally-visible behaviour of the system. This leads to a lack of agility in the project as a whole. I believe investment in external system tests generally has a much better payoff than investment in internal component tests, especially when you're talking about very low-level components.

When you suggest that failing unit tests don't really matter, I wonder whether this is what you have in mind? Perhaps you should be assessing the value of the unit tests and ditching those that cause more trouble than they are worth, while focusing more on tests that verify the externally-visible behaviour of the application.

2
  • I think what you're describing as "external tests" are often described elsewhere as "integration" tests. Commented May 24, 2018 at 14:22
  • Yes, but I've come across differences in terminology. For some people, integration testing is more about the deployed software/hardware/network configuration, whereas I'm talking about the external behaviour of a piece of software that you're developing. Commented May 24, 2018 at 15:16
0

The other answers cover most of the misconceptions around TDD, and the fallacies expressed in the question.

I did notice that no one brought up basic testing doctrine.

  • Progression Tests
  • Regression Tests

Progression tests are tests that describe a desired future state of the software. These are expected to fail as they highlight missing features or present flaws.

In fact, if they pass, this is some cause for concern, as either:

  • the developer failed to write a meaningful test
  • the feature has already been written/bug already fixed, and this test should be moved to regression

Regression Tests are tests that describe the current state of the software. Think of them as the current developer's manual on how to use this function or that object. Here, failure is the concern.

But even then, it is a concern mainly when it is on the mainline, e.g. ready for deployment.

If it's failing in the developer's local checkout, it's a warning to the developer that they have changed something about how the software works, and that it will affect the clients of this software in some way that really should be understood first (by "client" this includes your own code as it calls deeper functions). The assumption is that before pushing the code up into the mainline these tests are restored to passing by fixing them, the called code, or both (or simply removed).
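
One hedged way to make that split operational is to tag tests and filter them in the pipeline; the tag names, the shippingCost code and the CI filtering below are assumptions for the sketch (JUnit 5):

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

class ShippingCostTest {

  // Stand-in for the code under test; invented for this example.
  static int shippingCost(int weightKg) {
    return weightKg * 2;   // flat rate, no free-shipping rule yet
  }

  @Tag("regression")
  @Test
  void currentFlatRateStillHolds() {
    // Describes the software as it is: a failure here should gate the mainline build.
    assertEquals(10, shippingCost(5));
  }

  @Tag("progression")
  @Test
  void heavyOrdersWillGetFreeShipping() {
    // Describes a desired future state: expected to stay red until the feature lands.
    // The CI pipeline would exclude the "progression" tag from the gating run
    // via the build tool's test-filtering options, and run it separately for visibility.
    assertEquals(0, shippingCost(50));
  }
}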


So to be blunt. When you write a test, it is always a progression test. It is up to you the developer to turn it into a regression test, or to have the CI/CD pipeline be aware that it is a progression test and treat it accordingly.

It really doesn't matter if the test is a unit test, component test, integration test, end-to-end test, or any other flavour of test written using TDD, BDD, or whatever development practice you like, in this or that framework, against a UI or on a function.

2
  • Clearly if you go from “no test” to “failing test” that is very valuable progress.
    – gnasher729
    Commented Jun 26, 2022 at 15:48
@gnasher729 From an infrastructure perspective I completely agree. Sometimes the largest hurdle to getting automated testing is getting all the pieces of the automation lined up. From a confidence perspective, not really. Does the failing test mean that the developer who wrote the test doesn't understand, or that the developer who wrote the code doesn't understand? The same can be said for a passing test, but at least then you have a 50/50 chance that both are accurate and correct.
    – Kain0_0
    Commented Jun 27, 2022 at 23:03
0

The answer to this really depends on how you are using your unit tests. In what I would call the "usual" way of using unit tests, any failures are a problem.

A unit test is a pass-or-fail check which is a proxy for saying "this functionality is working as intended." Other software may depend on said functionality. Typically one gets undefined behavior when dependencies that you rely on do not fulfill their roles. If there is a failing test, there is a chance that somebody is using that code and expecting it to do its job. The consequences of that failure are unknown.

This does not have to be such a monolithic pattern. You could break it up into interfaces and say "If tests X and Y pass, then the class meets the requirements of interface A. If test Z passes, then the class meets the requirements of interface B." This gives you a much more fine grained image of what is happening in your software, and lets you make more fine grained decisions. However, it takes effort. Whether or not it is worth the effort is a business decision.

If you look at a framework for unit testing, there's usually some limited support for this. I most recently used pytest, and they have an entire "skipping" API. This lets you skip tests, either all the time (because you know they're wrong), or conditionally (such as not including the 3d rendering tests if the software was tested on a headless node with no graphics card).

This is a quick and cheap solution, making it popular. And it's reasonably easy to collect a list of skipped tests and manage them properly.

-7

This is a specific example of confirmation bias, wherein people tend to seek information that confirms their existing beliefs.

One famous example of this occurring is the 2, 4, 6 game.

  • I have a rule in my head that any series of three numbers will pass or fail,
  • 2,4,6 is a pass
  • you may list sets of three numbers, and I will tell you if they pass or fail.

Most people pick a rule, say "the gap between the 1st and 2nd number is the same as the gap between the 2nd and 3rd."

They will test some numbers:

  • 4, 8, 12? Pass
  • 20, 40, 60? Pass
  • 2, 1004, 2006? Pass

They say "Yes, every observation confirms my hypothesis, it must be true." And announce their rule to the person giving the riddle.

But they never received a single 'fail' to any set of three numbers. The rule could just have been 'the three numbers need to be numbers' for all the information they actually have.

The rule is actually just that the numbers are in ascending order. People typically only get this riddle correct if they test for failure. Most people get it wrong, by choosing a more specific rule, and only testing numbers that meet this specific rule.

As to why people fall for confirmation bias, and may see failing unit tests as evidence of a problem, there are many psychologists who can explain confirmation bias better than I can. It basically comes down to people not liking being wrong, and struggling to genuinely attempt to prove themselves wrong.

2
  • 2
    How is it relevant to the question? Failing unit tests are evidence of a problem, by definition.
    – Frax
    Commented May 23, 2018 at 8:52
  • 1
    You absolutely can have unit tests that require the system under test enter a failure mode. That's not the same as never seeing a test fail. It's also why TDD is specified as a "Red->Green->Refactor" cycle
    – Caleth
    Commented May 25, 2018 at 12:38
