38

We all know the standard TDD cycle: I first write a unit test, the smallest unit test that fails; then I write some production code, again the smallest amount of production code that is sufficient to pass the test; finally I refactor. In this way, it is hoped, I will produce code that is 100% covered by unit tests.

Note that there is an emphasis on unit here. Tests produced in this process are supposed to limit their scope only to the method I am changing; every method must be accompanied by a test that tests this particular method. Also all dependencies of the class that contains this method are mocked. Each class, therefore, will be accompanied by a mock, much like each method will be accompanied by a test.

This, it is believed, will ensure correctness of the code. Without such coverage the risk of bugs will be far greater. Also, perhaps even more importantly, it is believed that such code will allow fearless modifications without breaking it. Without 100% coverage by unit tests, any change in untested code risks breaking it; so any change of business requirements, any attempt at refactoring, is likely to introduce regressions, making the code fragile, untrustworthy and solidifying its current state.

I can't see how this is the case. To my mind, unit tests are, rather, inherently hostile to refactoring, while providing little confidence in the correctness of the code.

Assume I have a unit of code and a test for it. Now the time comes to refactor. Trivially, this test can never guard against regressions in any other place of my code, because all dependencies are mocked. But it also can't guard against regressions in this very piece of code it is associated with! Whenever I change this unit of code, or maybe even remove it completely, I will also have to rewrite or even completely remove the test that guards it. The test, therefore, gets removed right before it could become useful. It was, therefore, useless, and writing it was a wasted effort.

Unit tests are inherently tied to the way code is implemented. They are, by definition, tied to particular classes and methods. They, therefore, by their very existence, solidify the way code is implemented. Code cannot be refactored without modifying the tests as well. Refactoring is, thus, rendered difficult.

Finally, many bugs stem from misunderstandings of other pieces of code I'm working with, or misunderstandings of the contracts of 3rd party APIs. Errors, therefore, lie not in any particular method or class, but rather in the way methods or classes interact. Unit tests can never catch such errors.

Note I am not trying to say that automated tests are useless. However, it does seem to me that integration tests can be better suited here. Firstly, integration tests don't force me to write mocks, meaning they incur less overhead. Secondly, integration tests, by their nature, test the way code behaves, rather than the way it is written. The refactoring of the internals of the code, therefore, will leave an integration test untouched, indeed letting the test guard against regressions. Ideally, an integration test will only change if business requirements do.

Of course, there are things that cannot be tested by integration tests. For example, if a method must guard against error conditions that are never supposed to happen in reality, then I must write a unit test for it. Also, if I have lots of logic, with complex algorithms, then I also must test this logic and these algorithms in particular. For example, if I forgot that an implementation of a hash map is probably already present in my language's standard library and wrote one myself, I would most likely also have to write tests for this hash map in particular. Even then, however, I may limit myself to testing the input/output of the algorithm I'm writing, without testing particular methods and classes that implement it.

A well written set of tests at a higher level of integration may already achieve large code coverage. If code is added to handle some special cases, then this may be covered by adding these edge cases to the suite of integration tests.

Still, I can't see how mocks can be useful. They only seem to me to lower the confidence I can put in my tests. Separating logic from input/output may help reduce the need for mocks.

This seems to give rise to two approaches that, despite superficial similarity, in practice tend to yield very different code. The first approach is to focus on achieving 100% coverage with unit tests while mocking everything, only rising to the level of integration tests if something cannot be tested by unit tests. The second approach is to start from integration tests, maybe even locking oneself into a TDD-like cycle. Mocks will be avoided if possible; unit tests will be written only if necessary.

My question is, how does the first of these approaches (i.e. sticking to the pyramid of tests and covering the code with lots of unit tests that use mocks) help grant confidence in the correctness of the code and enable fearless refactoring? Where is the error in my reasoning that unit tests grant little confidence and are hostile to refactoring, as opposed to integration tests?

Many experienced coders vehemently argue for focusing on unit tests, hinting I must be missing something.

23
  • 14
    "Unit tests are inherently tied to the way code is implemented." - no, but that's the crux of the issue: how most people write tests makes them tied to the implementation. Suppose you're testing a function that anonymizes data - if you write a test so that it checks a for a specific value, you're tied to the implementation and you can't choose a different anonymization algorithm. Instead you should be testing for properties that are expected of this function (a.k.a. "behavior") that client code relies on. 1/2 Commented Mar 15 at 22:25
  • 12
    @FilipMilovanović More condensed example: if you unit-test a sorting function, you check that the output is sorted, not that it compares specific elements in specific order. The latter ties tests to the specific implementation; the former allows you to swap bubble sort for quicksort without touching tests at all. Commented Mar 16 at 17:41
  • 2
    @user3840170, in development I'm familiar with, the definition of the interfaces themselves is part of the implementation of an application. You seem to be suggesting that altering algorithms which sit behind pre-ordained interfaces is the main kind of alteration which programmers make. In fact, altering the overall configuration of modules and interfaces between them, including the shape and dynamical behaviour of interfaces, and their existence, features significantly in the kinds of alterations which regularly occur. Tests inevitably end up tied to the current implementation.
    – Steve
    Commented Mar 17 at 11:50
  • 2
    @FilipMilovanović, point is, the system of modules and interfaces is certainly as much a product of software design as anything else, and is just as often implicated in rework (which may occur either due to initial error, or due to re-adaptation). I'm not constantly reworking interfaces any more than I'm constantly swapping bubblesort for quicksort. It's that when rework does happen, then that rework is quite likely to involve alterations to interfaces that would be hooked for automatic testing (including the shape, dynamics, or whatever else is considered the "contract" of the interface).
    – Steve
    Commented Mar 17 at 16:01
  • 3
    "This, it is believed, will ensure correctness of the code." I'm not sure who believes that. I see unit tests as risk mitigations: they reduce the risk that I'll get things wrong. If I write a test, and it says the code is OK, but the code still breaks, it indicates that I made an assumption that turned out to be wrong: I'd look for the assumption and write a new test before doing anything else. Commented Mar 17 at 23:16

12 Answers

25

There is a lot to digest in this question. There appear to be some misconceptions about unit tests, what exactly constitutes a "regression", and how much unit test code you should expect to change when you change the System Under Test. Understanding how unit tests prevent regressions requires a change in perspective first.

I would like to define what "unit test" means to me. I think it has some similarities with your definition:

  1. A unit test executes fast — blazing fast! Sub-millisecond, please.
  2. A unit test can be executed concurrently without affecting other tests.
  3. A unit test should not utilize resources from the outside world.
    • No file system access.
    • No web service calls.
    • No e-mails.
    • No cross-thread communication.
    • No cross-process communication.
    • No SQL and no database connectivity.
  4. A unit test verifies that the public behavior of that unit conforms to a requirement.

Beyond that, mocking has nothing to do with unit tests, except for rule #3 above. I think this is the most important thing to remember:

Unit tests do not need to mock every dependency.

The "overhead" you describe in writing unit tests is the overhead of mocking dependencies. If these dependencies don't have side effects outside of the current test (see rule 3 above), then you don't need to mock that dependency. Use the real thing. The goal here is to reduce the very overhead that frustrates you.

Finally, many bugs stem from misunderstandings of other pieces of code I'm working with, or misunderstandings of the contracts of 3rd party APIs. Errors, therefore, lie not in any particular method or class, but rather in the way methods or classes interact.

This is very, very, very true. The interactions between objects can be quite complex, especially when the behavior involves side effects like database calls, file system access, etc. In my opinion, this code is not suitable for unit tests. What you are calling "integration tests" are a better strategy here, which is something you've already noted.

Firstly, integration tests don't force me to write mocks, meaning they incur less overhead. Secondly, integration tests, by their nature, test the way code behaves, rather than the way it is written.

Oh boy, do we have a lot to dissect here. The first sentence mentions overhead again. The overhead with mocking comes when you write the test. An integration test might be faster to write, but it is orders of magnitude slower to run! The "overhead" doesn't disappear; it gets shifted further down the line, from the time the test is written to the time it is executed. Humans end up waiting minutes to hours for tests to run. That is overhead, too. If you mock dependencies, you get a sub-millisecond test rather than one taking many seconds to many minutes to run.

The second sentence is very interesting too. You say integration tests verify the behavior, but unit tests verify the way code is written. This is a big red flag that the unit test is either not written properly, or not valuable (see rule #4 above). The unit test should verify public behavior which corresponds to a requirement. If you have unit tests that mock a bunch of dependencies only to assert that some method got called on that dependency, then I would say you haven't written a good unit test. What is the outward behavior the rest of the world should see? That's what you should test.

To be very clear, unit tests should test how code behaves, not how it is written.
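As a hedged illustration (the IDiscountPolicy/Checkout names, xUnit, and Moq are assumptions for this sketch): the first test below pins down how Total is written by verifying a call; the second pins down only what Total returns, and survives internal rewrites.

using Moq;
using Xunit;

public interface IDiscountPolicy
{
    decimal DiscountFor(decimal subtotal);
}

public class Checkout
{
    private readonly IDiscountPolicy _policy;
    public Checkout(IDiscountPolicy policy) => _policy = policy;

    public decimal Total(decimal subtotal) =>
        subtotal - _policy.DiscountFor(subtotal);
}

public class CheckoutTests
{
    [Fact]
    public void Brittle_test_pins_the_implementation()
    {
        var policy = new Mock<IDiscountPolicy>();
        policy.Setup(p => p.DiscountFor(100m)).Returns(10m);

        new Checkout(policy.Object).Total(100m);

        // Asserting that a particular call happened couples the test to how Total is written.
        policy.Verify(p => p.DiscountFor(100m), Times.Once());
    }

    [Fact]
    public void Behavioral_test_checks_the_outcome()
    {
        var policy = new Mock<IDiscountPolicy>();
        policy.Setup(p => p.DiscountFor(100m)).Returns(10m);

        // Only the externally visible result is asserted; the internals of Total can change freely.
        Assert.Equal(90m, new Checkout(policy.Object).Total(100m));
    }
}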

Ideally, an integration test will only change if business requirements do.

Until management gets a good deal on a different cloud provider, and now you need to lift-and-shift your entire infrastructure. What used to be a MySQL database is now a SQL Server database. What used to be MongoDB is now Couchbase. What used to be an SMTP e-mail server is now a "notification service". The infrastructure of your ecosystem undergoes a lot more change than you might think, especially if multiple teams are involved building out microservices. There is a lot more churn here than you think, so be careful about making this assumption. This brings us to the issue of "regressions".

A software regression is a kind of defect that occurs in something that was working previously, but suddenly doesn't. This can happen because of a code change, or infrastructure change within your ecosystem. Unit tests cannot guard against regressions caused by the outside world. Unit tests guard against those code changes that your team has control over.

Assume I have a unit of code and a test for it. Now the time comes to refactor. Trivially, this test can never guard against regressions in any other place of my code, because all dependencies are mocked. But it also can't guard against regressions in this very piece of code it is associated with! Whenever I change this unit of code, or maybe even remove it completely, I will also have to rewrite or even completely remove the test that guards it. The test, therefore, gets removed right before it could become useful. It was, therefore, useless, and writing it was a wasted effort.

... Code cannot be refactored without modifying the tests as well. Refactoring is, thus, rendered difficult.

Don't conflate "regression" with "change in business requirements." If business requirements change, the System Under Test changes, and guess what? You need to update tests. You are also correct in saying that you cannot guard against regressions in code that changes because requirements change. That's not a regression. That's a change in requirements.

Regressions occur in other parts of the application that are not directly related to the requirements being changed. All of the other unit tests (that should each be running sub-millisecond, by the way) guard against regressions — accidental changes in behavior that violate previously functional implementations of unchanged requirements. So the test you are changing doesn't guard against regressions. The tests you aren't changing are guarding against regressions. All of those other Systems Under Test should continue functioning as they did before.

Remember how each unit test should execute sub-millisecond? This is where test execution time becomes important. The faster a test runs, the more likely you are to run it, which means it becomes more likely to catch regressions earlier in the development lifecycle where they are quicker and easier to fix.

The "regression" is not prevented for the person changing the code. The regression is prevented for your end users, because you caught it at 2:13 PM on Wednesday 4 seconds after you changed the code, and not 5 months later after getting a vague bug report from an end user first thing in the morning before you've had your first cup of coffee.

That's how unit tests prevent regressions.

5
  • 4
    A change in requirements is not the only reason to refactor code. Other reasons include improving maintainability (fixing bad design decisions), or generalising code to support changed requirements elsewhere. The claim by @gaazkam was that refactoring some code, even if the requirements for that piece of code did not change, necessitates changing unit tests too (if they are written with scope smaller than the entire refactored unit), meaning that those tests did not guard well against regressions introduced by the refactoring. What about these actual regressions?
    – tomsmeding
    Commented Mar 16 at 9:30
  • @tomsmeding You've beaten me to writing this comment right before I intended to write the same :) Thanks
    – gaazkam
    Commented Mar 16 at 13:14
  • 4
    @tomsmeding: ok, so you refactor some code. Guess what? You need to update tests if the code structure changes. All those other tests? Those should keep passing. That's how unit tests prevent regressions. Commented Mar 16 at 13:35
  • Right -- the unit tests that prevent regressions here are the ones that either have a scope larger than the "module" being refactored, or that test modules that depend on the refactored module (that is, some "social" unit tests), and thus indirectly function as a sort of partial integration test for the refactored module. This justifies existence of unit tests for all non-leaf modules; unit tests for leaf modules are useful until said leaf module is refactored. Is that right?
    – tomsmeding
    Commented Mar 16 at 19:08
  • 7
    Those unit tests are especially useful. Refactoring implies no change in behavior. A change in structure, perhaps, but no changes in behavior. Commented Mar 16 at 22:04
23

You're running into the classicists vs mockists debate or the sociable tests vs solitary tests debate.

I may also suggest that the pyramid of tests may not be the only way to think about tests. Brian Marick wrote about "technology-facing tests" and "business-facing tests", which evolved into the testing quadrants (see also: Lisa Crispin and Janet Gregory's Agile Testing: A Practical Guide for Testers and Agile Teams). I'd suggest that thinking about the primary audience of the test and who is given confidence by its existence is more useful than naming based on the scope of the test.

The problem is restricting yourself to a single point of view. Perhaps consider sociable tests instead of solitary tests. Perhaps consider the testing quadrants instead of just the testing pyramid.

3
  • 1
    Thanks for interesting reads. So I guess that long story short is that my workplace decided to enforce mockist style of unit tests, which I dread because I find classicist style more intuitive, but I guess I must just fall into line...
    – gaazkam
    Commented Mar 15 at 20:12
  • 4
    @gaazkam Neither approach is wrong. They both have a place. I also find classicist style or sociable tests a little more intuitive, but the more sociable the tests become, the harder it is to isolate the failure to a specific area. You don't just need to fall in line - you could educate your colleagues on the pros and cons of each technique, perhaps by writing more sociable tests and explaining cases where they are valuable. But also realizing that some cases may need more solitary tests, including extensive mocking, because that is what is valuable.
    – Thomas Owens
    Commented Mar 16 at 2:17
  • 1
    @gaazkam It also sounds like the mockist-style tests you're concerned about are not very high quality (or else you are caricaturing them). Even when you're mocking all the dependencies, the tests should aim to be checking the observable behaviour of this unit, not just duplicating the implementation by asserting that a particular series of method calls were made. It should be quite possible to change one aspect of the unit's behaviour and re-run all the other tests for this unit (unchanged) to check that you haven't accidentally changed other aspects of the unit's behaviour.
    – Ben
    Commented Mar 19 at 1:03
14

Please allow me to begin with a preamble: this is not your fault. The literature sucks.


Note that there is an emphasis on unit here.

Yup; and there shouldn't be. "Unit test" was the language used by Kent Beck when he first started sharing his ideas. By the time he wrote "I call them unit tests, but they don't match the accepted definition of unit tests very well", it was too late; the language had taken deep enough hold of the community that the deliberate efforts to change the vocabulary were doomed.

I'm inclined to agree with Mike Hill's assessment

To begin with, tho we liked the practice of TDD, and were gradually learning how best to use it, our theory about what made it work was not very strong yet. Introducing a practice w/o any idea of what makes it work is a recipe for failure. We pushed too hard, too soon.

If we were starting all over today, I would instead emphasize that what we are creating are controlled experiments -- we put a lump of code in an isolated box, put some data in, turn the crank, see if the right data falls out.

The grain of the test, which is to say how big the lump is, isn't actually a critical constraint. There are contexts in which large lumps make sense, there are others where small lumps make sense. Welcome to the world of trade offs. Beck's Test Desiderata essay touches on some of these trade offs.

It might also be worth reviewing what Jay Fields has written about "solitary" tests vs "sociable" tests.

In short: information hiding is still a thing; we still want to arrange our code such that, if we need to change things later, we can cost effectively limit the blast radius of the change.

And that's very difficult to do when you insist on coupling your tests tightly to a specific decomposition of modules.


I can't see how mocks can be useful.

Mocks are a whole separate mess; you can get some of the context here.

But the short version: the Mock Objects practice, as described in the original Endo-Testing presentation, is intended to align with a particular style of implementing objects (and object interfaces), where the emphasis is on message passing, combined with a desire to have errors detected as close as possible to the point where the message is generated, rather than waiting until the end of the tests (at which point all of the transient context has been lost).

If you are thinking of TDD as an analysis and design tool (which is how it was represented early on), and the thing that you are trying to analyze is the messaging protocols your objects use when collaborating to achieve a goal, then you are going to be writing tests that fix those protocols.

And the mistakes in the designs of those protocols that you don't recognize when you are writing the first tests? Yup, those are going to mean some rework.


About mocks. Assume I have a class that contains little logic, its main purpose is talking to other classes, or the db, or some restful api developed at the same organization, etc. To test this class we mock all those dependencies out....

Don't Mock what you Don't Own is old advice from the team that introduced "Mock Objects" into the discourse.

3
  • 1
    About mocks. Assume I have a class that contains little logic, its main purpose is talking to other classes, or the db, or some restful api developed at the same organization, etc. To test this class we mock all those dependencies out. But the greatest potential source of errors probably lies in our misunderstanding of the data model or of the behavior of these other classes, apis or dbs. These misunderstandings will then be encoded in the mock. (cont)
    – gaazkam
    Commented Mar 16 at 13:12
  • 1
    The testing becomes somewhat tautological then: the correctness of our code depends on the correctness of our understanding of its dependencies, which is correct if we correctly understand its dependencies. Unfortunately, this does not show that code is, indeed, correct. If dependencies were not mocked out, then the test would give far greater confidence that our code is actually correct.
    – gaazkam
    Commented Mar 16 at 13:13
  • "the greatest potential source of errors probably lies in our misunderstanding of the data model or of the behavior of these other classes, apis or dbs." Absolutely. And TDD (with or without mocks) is not intended to solve that problem. It's a knife when you need a screwdriver; sure, the knife has a blade, and you can use it to turn a screw, but it's a bit rubbish at it (especially in comparison to a real screwdriver). Commented Mar 16 at 14:49
7

You are absolutely correct. The "mockist" approach with mocking every dependency of a class or method leads to brittle tests which require a lot of work, makes refactoring difficult, and doesn't even help ensure correctness of the code.

The sane approach is to only mock those dependencies which have non-deterministic behavior or where it is otherwise impractical to use the real implementation. E.g. if you have a subsystem which sometimes sends a mail as a side effect, it makes sense to inject a stubbed SMTP client instead of a real one.
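As a rough sketch of that idea (the ISmtpClient/ReminderService names and xUnit are assumptions made up for illustration): the mail-sending dependency is replaced by a hand-written stub that records calls, while everything else stays real.

using System.Collections.Generic;
using Xunit;

// Hypothetical abstraction over the real SMTP client (a side-effecting dependency).
public interface ISmtpClient
{
    void Send(string to, string subject);
}

// Hand-written stub: records mails instead of sending them.
public class FakeSmtpClient : ISmtpClient
{
    public List<(string To, string Subject)> Sent { get; } = new List<(string To, string Subject)>();
    public void Send(string to, string subject) => Sent.Add((to, subject));
}

public class ReminderService
{
    private readonly ISmtpClient _smtp;
    public ReminderService(ISmtpClient smtp) => _smtp = smtp;

    public void RemindIfOverdue(string email, int daysOverdue)
    {
        if (daysOverdue > 0)
            _smtp.Send(email, $"Payment {daysOverdue} day(s) overdue");
    }
}

public class ReminderServiceTests
{
    [Fact]
    public void Sends_a_reminder_only_when_overdue()
    {
        var smtp = new FakeSmtpClient();
        var service = new ReminderService(smtp);

        service.RemindIfOverdue("a@example.com", 0);
        service.RemindIfOverdue("b@example.com", 3);

        // No real mail leaves the test process; we only inspect the recorded calls.
        Assert.Single(smtp.Sent);
        Assert.Equal("b@example.com", smtp.Sent[0].To);
    }
}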

If we go back to the origins of unit testing, stubbing and mocking were only recommended in cases where the real implementations had not been written yet, and a bottom-up development approach (writing the dependencies first) was said to avoid the need for mocking.

The early proponents of test-driven development emphasized that unit tests allowed you to safely refactor. But with a mockist approach, many refactorings would also require changes to the mocks. But if you change code and tests at the same time, you have no guarantee that functionality is unaffected. You lose the safety the tests were supposed to ensure.

So clearly the over-reliance on mocking is counter-productive. How did it become so widespread?

The confusion comes down to differing definitions of "unit" in "unit test":

Definition 1: A unit is the smallest part which can be tested in isolation, i.e. a single function or class. Testing without mocking every single dependency would then be classified as an integration test rather than a unit test.

Given this definition, integration tests are much more useful and valuable than unit tests. Unit tests are mostly useful for testing those functions and classes which do not have any dependencies.

Definition 2: A unit is a component which can be tested in isolation without testing the whole system end-to-end. But that component can be small or big, i.e. having many constituent parts which will be exercised as part of the test.

Integration tests are then classified as tests which cross the boundaries of subsystems, e.g. the interaction between application and database. Integration tests would then tend to be significantly slower than unit tests, and it makes sense to separate them out.

Whether you use one definition or the other is not that important. The "unit" is not what you should focus on anyway. Instead you should focus on internal interfaces which can be tested programmatically. Automated tests should avoid being tightly coupled to implementation details and therefore should not care if the functionality behind the interface is implemented using one or multiple objects.

The problem arises when definition 1 is combined with the "test pyramid", a piece of dogma which says you should have more unit tests than integration tests. This is completely sensible if you follow definition 2, but if you follow definition 1, you might conclude that most of your tests should be brittle and useless and only a few should be useful.

Finally, note that the Testing Pyramid as originally defined by Mike Cohn uses the term "service test" rather than integration test. Services refer to the subsystems in a service-oriented architecture and service tests are tests of the integrations between these services. It certainly does not refer to just any test touching two or more classes.

3
  • 2
    I've even seen people argue that even the database should not be mocked out. Many errors happen because the database engine behaves differently from our expectations. It is not difficult to set up a test database and it does not have to be slow. Mocking the database only tests that our mock behaves according to our expectations, not that our code works well. To be honest, I find this approach convincing.
    – gaazkam
    Commented Mar 16 at 11:47
  • 1
    @gaazkam Absolutely, if the application is data driven and the tests can be made fast enough (e.g. by using an in-memory database instance). The important factor is not whether the tests can be classified as "unit" or "integration" but which form of testing provides the most value.
    – JacquesB
    Commented Mar 16 at 13:39
  • @JacquesB Would using an in-memory DB for testing count as a ‘mock’?  (Either way, it's jolly useful.  Also agreed on not worrying about naming — we'd got into the bad habit of calling all tests ‘unit tests’; once I realised that many weren't, I started calling all of them ‘automated tests’, which is the important thing.)
    – gidds
    Commented Mar 17 at 0:35
4

Note that there is an emphasis on unit here. Tests produced in this process are supposed to limit their scope only to the method I am changing; every method must be accompanied by a test that tests this particular method.

A unit is not equivalent to a method, and diligently writing unit tests for every single method (sometimes even going out of your way to test private methods using various ugly hacks) is almost certainly going to result in fragile tests that need a lot of maintenance and - since the developer expects them to fail every time they touch the code - hardly catch any actual issues.

What constitutes a unit and what should be mocked out is a subject of a lot of heated debate, but you usually want to write unit tests at the level at which it makes sense to have some sort of specification. Consider the following class:

public class PopulationService
{
    public double GetPopulationFooRatio(Population pop)
    {
        int fooCount = 0; // ...calculate some things...
        try
        {
            return GetNormalizedFooRatio(fooCount, pop.Count);
        }
        catch (DivideByZeroException)
        {
            return 0;
        }
    }

    private double GetNormalizedFooRatio(int foos, int total)
    {
        int tweakedFoos = foos; // ...maybe do some more calculations...

        // Floating-point division by zero does not throw in C#; it yields Infinity/NaN,
        // so the degenerate case is signalled explicitly here.
        if (total == 0)
        {
            throw new DivideByZeroException();
        }

        return ((double)tweakedFoos) / total * Constants.NormalizationConstant;
    }
}

Now, does it make sense to have a GetNormalizedFooRatio_WhenTotalIsZero_ThrowsDivideByZeroException test, then mock out the method in GetPopulationFooRatio tests and write a GetPopulationFooRatio_WhenGetNormalizedFooRatioThrowsDivideByZeroException_ReturnsZero test - or is it enough to just have GetPopulationFooRatio_WhenPopulationCountIsZero_ReturnsZero?

There are valid reasons to mock out the method and separate the two methods' tests. If you use the private method in multiple other methods, having a test for it means you won't be facing ten failed tests that actually have the same root issue you need to dig for. If the concept of a "normalized foo ratio" has a specific meaning for your business logic, it might be useful to have a documented and thoroughly tested way to calculate it. Or on a more practical note, if GetNormalizedFooRatio is computationally expensive, you might not want to have every test of GetPopulationFooRatio call it.

But if you can't find a good reason to draw a unit boundary, all you've done is tie yourself to the specific implementation of what actually matters - that the public method returns zero when the population count is zero - and you'll need to scrap the test when code review comes and you get told to rewrite it to if (pop.Count == 0) return 0; at the beginning of the method.
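For instance, the coarser test might look like the sketch below (assuming xUnit and that an empty Population reports Count == 0; both are assumptions for illustration, not part of the example above). It survives the code-review rewrite just described untouched.

using Xunit;

public class PopulationServiceTests
{
    [Fact]
    public void GetPopulationFooRatio_WhenPopulationCountIsZero_ReturnsZero()
    {
        var service = new PopulationService();
        var emptyPopulation = new Population(); // assumed to have Count == 0

        double ratio = service.GetPopulationFooRatio(emptyPopulation);

        // Only the public behavior is pinned down; whether the zero comes from catching
        // DivideByZeroException or from an early "if (pop.Count == 0) return 0;" guard
        // is an implementation detail this test does not care about.
        Assert.Equal(0.0, ratio, 5);
    }
}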

Ultimately, outside of external dependencies (such as an SQL database, or a cloud service - which you'll typically want to mock for unit tests since you're not guaranteed to have access to those in your build environment), unit boundaries are a tradeoff. Fine-grained tests are easier to write and can help in isolating a failure to a specific piece of code, but often result in leaking implementation details. Coarse tests, on the other hand, let you test for what actually matters and change the implementation freely - but may require complex setup to cover an edge case, or take a long time to execute.

But if you find yourself having to change tests every time you refactor and your mocks seem to bring you no value, it's a good sign that you could be testing at a higher level than you are now - public methods of classes instead of every method, or outer boundaries of n-tier layers instead of every class. As long as you're monitoring your code coverage to make sure you're not missing those "impossible" edge cases, the gods of software development should not smite you.

3

I'm not going to provide an answer but instead I'll give you a scenario and some questions that will hopefully lead you to a more informed conclusion.

Assume you're working on a very large scale financial system. For brevity, assume that the decision was made to write everything in house. Your applications sit on top of some complex business logic, that sits on top of some statistical library implementing all the formulas the business uses, which in turn sits on top of a math library. There are multiple teams working on the system, each on their own area.

How can the math or statistical library teams be sure they're delivering functional code when they don't know and/or aren't supposed to know how that code is going to be used? Is it then the integration team's job to make sure everything is correct and functional? How can they be 100% sure the issue is not in their code, without mocking the lower level libraries and having little or no math/statistical knowledge?

4
  • 1
    This is an interesting answer because it touches on the fact that, as teams get bigger and organised into specialised functions, the extent to which any one person comprehends both the fine detail and the overall purpose reduces, and therefore the ability of anyone to properly conceive appropriate tests that cover the internals of the whole system reduces.
    – Steve
    Commented Mar 15 at 21:23
    @Steve, I think you've missed the allegory. This answer is an illustration of the sharp increase in cognitive load for integration testing compared to unit testing.
    – Basilevs
    Commented Mar 17 at 10:49
  • @Basilevs, I think what you're saying is that there is a way of maintaining the overall quality of the testing whilst reducing the cognitive load on those designing the tests. I'm saying that, by dividing things up to relieve the cognitive load, the programmers become less able to conceive the appropriate tests, because appropriate tests require a global comprehension. There should probably be a name for the fallacy of applying divide-and-conquer to a problem which requires an indivisible analysis.
    – Steve
    Commented Mar 17 at 11:06
    The situation described is inherently divisible - there is no feasible way for a single person to understand all intricacies of a complex system. Without a proper division of labor and responsibilities, such systems are unmaintainable. Similarly, if a component requires understanding of the overall product to test, it can only be tested on a system level, and can't be a subject for a unit test.
    – Basilevs
    Commented Mar 18 at 9:29
1

My question is, how does the first of these approaches (i.e. sticking to the pyramid of tests and covering the code with lots of unit tests that use mocks) help grant confidence in the correctness of the code and enable fearless refactoring?

Many of the points you make about the limitation of unit tests are well-made.

I think the problem with tests is about how people think and talk about them. Nobody should ever be "fearless" about altering existing code, and there is no facility that exists that significantly removes the need for developers to work carefully and cautiously from a good understanding when making alterations to machinery that is already somewhat proven in use.

"Refactoring" is a euphemism for what is almost always an actual alteration to the workings of code, not merely a rearrangement of source code that works equivalently.

A lot of people also forget that code can become incorrect because what it is supposed to do has changed: in such a case the code is still doing what it did before, which is no longer correct (and it is not yet doing what it is now supposed to be doing).

If it weren't for the fact that circumstances change and code has to be almost constantly remade to fit new circumstances, then a great many testing strategies would work far better than in fact they do in practice.

This is because almost every testing strategy involves additional development effort, more code being written, and more complexity in that code. These are factors that themselves make code slower to change, more likely to be accidentally broken in the process of being changed, and increase the amount of code that can be broken.

When things have to be made and remade constantly in a bespoke way (rather than a repetitive and routine way), as computer workings do, it is desirable for both the production process and the resulting artefacts to be as simple as possible, to both fulfil the function currently desired and to remain amenable to re-adaptation to a different function.

The circumstances in which tests contribute to reliability and justify their own cost of development are quite complicated. It's not the case that engaging in the apparent activity of developing more tests automatically improves reliability on the whole - not least because the allocation of available resources in this fashion can compete with different allocations that might often improve reliability even more.

4
  • Thank you for your answer, I appreciate your point of view. Though I am really surprised that it is not downvoted into oblivion, I expected very strong resistance to any answer questioning the need of automated tests.
    – gaazkam
    Commented Mar 16 at 12:00
  • Also I am very curious what are those different allocations of available resources that might often improve reliability even more than writing tests?
    – gaazkam
    Commented Mar 16 at 12:01
  • @gaazkam, I'm not going so far as to flatly deny that automated tests are ever desirable. Just that claims about their usefulness are often overcooked and lacking context. There are many things that contribute to the reliability of computer applications - all the things technical staff are doing when they are not ostensibly writing automated tests (assuming you don't conceive your employer's entire development function as being the activity of test-writing).
    – Steve
    Commented Mar 16 at 14:19
  • @gaazkam, it would be good to reread this answer again. The heart and soul of this answer is to stay focused on why we write tests: because it helps. And when it doesn't help, write a different kind of test, or reconsider whether writing the test as code is even a worthwhile expenditure of labor — think rather than follow dogma simply because someone said "best practice". Commented Mar 17 at 23:19
1

I believe a lot of these questions come from a lack of good TDD experience. I have used TDD for almost 10 years, so hopefully this answer will add value to this question.

First things first. Here are some axioms (IMHO):

  • unit tests can be well-engineered or poorly-engineered, just like any other code
  • mocks simplify unit testing and reduce the effort of maintaining both a unit and its unit tests

Now, having well-written unit tests gives you an up-to-date view of how a unit can be used in different scenarios, including edge cases. This is extremely important, because having no unit tests does not provide you with such information. Imagine you were working on a feature, understood and implemented all its quirks and edge cases, and fixed dozens of bugs. But you haven't written any unit tests. What will happen when you leave the company? Do you expect newcomers to grasp all of those details? They will suffer while debugging and understanding what you already understood. The project may struggle as well.

Next, what are well-written unit tests? They are tests which facilitate refactoring. What does that mean? It means that they are atomic enough that you don't need to spend more than 5-10 minutes to understand what each test does. It also means that if business requirements change, you don't need to rewrite everything. This also depends on how well-written, structured and architected the code base is. In a well-designed application, fully written using TDD, making a change is a breeze. Sometimes new requirements do require removing a unit and all its unit tests, but usually they are small enough that you don't feel a great loss.

My question is, how does the first of these approaches (i.e. sticking to the pyramid of tests and covering the code with lots of unit tests that use mocks) help grant confidence in the correctness of the code and enable fearless refactoring?

Having said that, you covered all units with unit tests. And you MOCKED all dependencies. Why mocked? Without mocking you have too many dependencies and side effects in the unit, and you don't want your code to be affected by third-party code. Each unmocked dependency is potentially a hole in the unit tests which may require more effort from you to support them. An unmocked dependency may be considered tech debt. But don't be an idealist: sometimes it's OK to have unmocked dependencies. Such tech debt may be resolved in the future if something goes wrong with your logic or you see that the tests have become harder to support.

Bear with me, the question is almost answered. Now the "pyramid of tests" comes in. You have well-written unit tests. But how do you know that all those small pieces of logic work as you expect in the application? You write integration tests, which verify how units behave together (but not how each unit works internally; that is the responsibility of a unit test). Writing good integration tests is another skill as well. And then you may add e2e tests.

Where is the error in my reasoning that unit tests grant little confidence and are hostile to refactoring, as opposed to integration tests?

It's not "that unit tests grant little confidence", it's poorly-written unit tests with poorly-engineered system grant little confidence. Integration tests involve many dependencies which are not mocked, and that is why it's difficult to write and support them. That's why you have much less integration tests, compared to unit tests. You have thousands of unit tests and dozens of integration tests. And you have some end to end tests. When it comes to refactoring. In a project with well-separated layers of a test pyramid, you much more often modify individual unit tests, and much less you modify integration tests (often you don't need to touch them at all). And you don't afraid to do so, because all you little edge-case scenarios are covered by tests, which means if you update the logic, you'll see which ones are failing and no longer satisfy the requirements. And then you decide whether it's a bug in your new logic or the tests must be updated.

Unit tests are like your best friends or team-mates who help you do your job much more confidently by telling you: "Hey, remember this small AC? You broke it; don't forget to implement it as well".

1

The definition of “refactoring” is to change the way the code is written without changing its behaviour. An extreme case: we had some code that crashed sometimes and nobody could figure out why or how to fix it. The code was a bit too complicated. So I refactored it until the code I had was:

Some code
If condition1 and condition2 and condition3 then crash
Some more code

After refactoring I fixed it by removing the “crash” line.

So after refactoring, following the correct definition of refactoring, all your unit tests will behave unchanged. If not, your change was not “refactoring”.

Now refactoring might not be what you wanted. You may have figured out that a method is not actually useful, and a different method would be more useful. Since the behaviour changes, your unit tests change. And code calling the method changes. And if you mock the method, the mock needs changing. If you don't want to change the method name, do the following:

You have a method named X. Create a method X_changed with the changed behaviour. The code for X_changed starts out like X, then you modify it. Create mocks and unit tests by starting with those for X and modifying them. Then you change the callers one by one. Each change should be a refactoring; otherwise you do the same for "caller" and "caller_changed". When you think all changes have been made, you comment out X and fix problems if your assumption was wrong, until everything works; then X and its unit tests and mocks are removed. Finally, you rename the new method back to the name of the old method (if that name is still appropriate for the changed behaviour).
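A minimal sketch of that sequence (hypothetical PriceService/Quote names, in C# for concreteness; the pattern is often called "parallel change" or "expand and contract"):

public class PriceService
{
    // Step 1: the old method keeps working while existing callers still use it.
    public decimal Quote(decimal basePrice) => basePrice * 1.20m;

    // Step 2: the changed behaviour lives alongside it, with its own unit tests and mocks
    //         (written by copying and adapting those for Quote).
    public decimal Quote_Changed(decimal basePrice, decimal taxRate) =>
        basePrice * (1 + taxRate);

    // Step 3: callers migrate to Quote_Changed one by one, each migration being a small change.
    // Step 4: once nothing calls Quote, delete it together with its tests and mocks,
    //         then rename Quote_Changed back to Quote if the old name still fits.
}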

1
  • The refactoring can affect unit boundaries and change unit behaviors without changing system behavior. Discussion of unchanged unit behavior is not useful in the context of this question.
    – Basilevs
    Commented Mar 18 at 14:33
0

I think you really need to go back to the fundamentals: what is a test?

A test verifies that defined input into action A produces the expected output B.

That is what your unit test is supposed to be doing, and that’s its value, and that is how it makes SOME refactoring of your code safe.

Mocks are sometimes useful and sometimes necessary in order to perform action A.

But the art in designing a good test is in identifying a useful action A and the benefit is, in part, in understanding why the output will be B.

If not doing TDD, the first question I would ask myself is: will a test describe the intended result of A, or the range of inputs to A, more clearly than A itself does? If not, there's no need for a test.

If you have a method that is supposed to get the first character of a first name, that method doesn't need a unit test. If it's supposed to get the initials from a full name, then it may (what are the initials for Jim “The Plumber” Smith?). A good unit test will explain something about A better than just reading A; if it doesn't, it's mediocre at best.
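For example (a sketch with an invented NameFormatter.GetInitials helper and xUnit; the "ignore quoted nicknames" rule is an assumption chosen just to illustrate the point): the test states the expected answer for the awkward case far more directly than the code does.

using Xunit;

public static class NameFormatter
{
    // Hypothetical helper: a quoted nickname is not part of the legal name,
    // so it contributes no initial.
    public static string GetInitials(string fullName)
    {
        // Strip a quoted nickname such as "The Plumber" before taking initials.
        string withoutNickname =
            System.Text.RegularExpressions.Regex.Replace(fullName, "\"[^\"]*\"", " ");

        var initials = new System.Text.StringBuilder();
        foreach (var part in withoutNickname.Split(
                     new[] { ' ' }, System.StringSplitOptions.RemoveEmptyEntries))
        {
            initials.Append(char.ToUpperInvariant(part[0]));
        }
        return initials.ToString();
    }
}

public class NameFormatterTests
{
    [Fact]
    public void GetInitials_IgnoresQuotedNicknames()
    {
        // The expectation documents the decision better than reading the method does.
        Assert.Equal("JS", NameFormatter.GetInitials("Jim \"The Plumber\" Smith"));
    }
}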

Not everything needs to be tested, but verifying that your code matches your expectations is not wasted effort.

-1

I believe you are falling into the trap of assuming that "TDD" specifies a very strict way to develop. As you write:

I first write a unit test, the smallest unit test that fails; then I write some production code, again the smallest amount of production code that is sufficient to pass the test; finally I refactor.

While the Wikipedia page on Test-Driven Development does mention unit testing a lot, it has a more generic definition at the start:

Test-driven development (TDD) is a software development process relying on software requirements being converted to test cases before software is fully developed, and tracking all software development by repeatedly testing the software against all test cases. This is as opposed to software being developed first and test cases created later.

Test-driven development means that tests drive your development. Nothing more and nothing less. You write your tests first, then you change your application to make the tests green, thereby enjoying the near-complete test coverage with all its benefits. It is quite literally just the switching of the order in which you write your code-under-test and the tests (with the mindset that goes with that, of course).

The way you defined TDD is too specific. You do not need to write a unit test first. You need to write some kind of automated test first, whichever kind best expresses the thing you want to achieve with the code change the test is for. And you do not need to refactor afterwards - refactoring happens when you need to change your code later, but not because of TDD or to enable TDD. If requirements never change, and the code does not evolve to call for refactoring later, it is perfectly usual to never touch it again in a big sweeping manner.

There are many ways to write tests, today. Unit tests are just one specific instance; there are other levels on which you can test. For example, integration tests (where your test code runs more unmocked code from your application, sometimes even with actual 3rd party apps spinning up - e.g., maybe a real DB or Kafka broker or something like that). Or feature tests (i.e., Cucumber-based approaches) which specify test scenarios much more from the perspective of the user instead of the unit of code and usually have the actual frontend talk to the actual backend (+ 3rd parties) during test runtime.

In the title you ask "How do unit tests facilitate refactoring without introducing regressions" - well, they do that by making sure that the units you tested keep conforming to their specification (provided you implemented that specification correctly in the unit tests). Obviously they cannot guard against what you did not test for, but that's nothing special.

Note that there is an emphasis on unit here.

That seems to be your emphasis. You are limiting your thinking by focusing on unit testing. You can write huge, well-working test suites which contain not a single unit test, and which guard against refactoring-introduced regressions just fine.

-1

I'm going to assume you're already aware of a lot of the scripture on unit testing, and I want to precision strike on this claim:

I can't see how this is the case. To my mind, unit tests are, rather, inherently hostile to refactoring

The issue here is one of a particular subset of cases. If your tests define overly granular units, which seems to be the case here since you state that "every class has a mock", then you have indeed created a test suite that does not let developers change any class' dependencies without forcing them to rewrite the tests. I agree that such a system is hostile in nature.

However, I don't agree with your definition of a "unit", and the source of the hostility here is that your unit is being defined differently by your requirements than it is by your test suite.

So, to boil it down to the core issue here: a test suite should strive to mirror its requirements, not its implementation. Your complaints seem to match scenarios where the units under test are based on the implementation, which would explain why you encounter hostility.

Secondly, I want us to first agree that unit tests are not hostile towards changes that only entail internal logic being changed, if none of the class dependencies are being changed. The main complaint here usually is about needing to update the class' dependencies in order to be able to unit test it (as the test needs to mock those dependencies), and that is what I'm responding to.
If you don't agree with that point, then what is written below isn't going to answer that part.

If you find yourself often needing to change your class dependencies, more so than the implementation of your classes themselves, then it seems to me that you've drawn the wrong class boxes around your complexity.

Allow me to elaborate that statement. Every application, at the end of the day, is a massive pile of logic and business rules, which exists as an attempt to model a non-trivial process. I'm going to call this "complexity", in the sense of it being "the sum of all logic and rules".

The goal of software engineering principles is to subdivide that complexity into smaller, more digestible tasks, tasks which we can individually solve before we compose them as part of a larger orchestration. If you want a mental image, what we're essentially doing is looking at the pile of complexity, and encircling it into subcomponents. We get to decide how we draw these lines, the only rule is that everything needs to be encircled.

The individual solvability of these components is imperative. This is why we engage in the whole process to begin with. When you tell me that you have to redesign your class dependencies when your implementation changes, this suggests to me that you drew inaccurate circles on your pile of complexity, which is the underlying cause of the subsequent hostility you experience.

When drawing your circles, you should attempt to draw them in a way that the most complex bits are inside the circle, and the least complex bits are on the boundaries between the circles. Stepping away from the analogy, this means that interfaces/contracts should be simple, and implementation should be complex.
Obviously, I don't mean that you need to make the implementation more complex than it needs to be. I'm saying that unavoidable complexity should err towards the implementation logic more so than the interfacing/orchestration logic.

If your interfaces/contracts are complex, and your implementation is simple, then it makes a lot of sense why you don't like unit tests:

  • The dependency graph is highly volatile and contracts have complex expectations of public behavior, which makes the respective mocks equally complex to model.
  • The value of unit testing trivial implementation logic makes this feel like a pointless exercise.

But if your interfaces/contracts are simple and your implementation is complex, then unit testing does not suffer the kind of hostility that you describe (see the sketch after this list):

  • The dependency graph is not volatile, only requiring change for significant reworks, at which point the effort of redefining the mock for a simple contract is a proportionate task.
  • The complex implementation is hard to visually verify, making the tests provide actual value to you, the developer, to trust that your logic behaves the way you intended it to.
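To sketch that second case (IRiskScorer/CreditDecision are invented names; xUnit and Moq are assumed): the contract is a single small method, so the mock is trivial, while the genuinely complex part hides behind the implementation, where behavioural tests pay off.

using Moq;
using Xunit;

// Simple, stable contract: trivial to mock, unlikely to churn during rework.
public interface IRiskScorer
{
    double Score(decimal exposure, int daysOverdue);
}

// The complexity lives inside the implementation, where behavioural tests add real value.
public class DefaultRiskScorer : IRiskScorer
{
    public double Score(decimal exposure, int daysOverdue)
    {
        // ...imagine the genuinely complex rules living here...
        double overduePenalty = daysOverdue > 30 ? 0.5 : daysOverdue / 60.0;
        return (double)exposure / 1000.0 + overduePenalty;
    }
}

public class CreditDecision
{
    private readonly IRiskScorer _scorer;
    public CreditDecision(IRiskScorer scorer) => _scorer = scorer;

    public bool Approve(decimal exposure, int daysOverdue) =>
        _scorer.Score(exposure, daysOverdue) < 1.0;
}

public class CreditDecisionTests
{
    [Fact]
    public void Rejects_when_the_risk_score_is_too_high()
    {
        // Mocking the one-method contract stays cheap even if DefaultRiskScorer
        // is rewritten from scratch.
        var scorer = new Mock<IRiskScorer>();
        scorer.Setup(s => s.Score(5000m, 45)).Returns(3.2);

        Assert.False(new CreditDecision(scorer.Object).Approve(5000m, 45));
    }
}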

Anecdotally, I've worked as a .NET consultant for about a decade, specializing in joining teams that were struggling to deliver. Often there wasn't even a test suite to begin with, but when there was (with active complaints of "test hostility" like yours), it seemed to consistently be the case that the "units" of the unit tests were being defined differently from where the actual complexity of the overall application was situated. The tests were too granular, or they focused on implementation details that a product owner or business analyst wouldn't even be aware of, which means they were not testing actual application behavior (i.e. requirements), but rather developer implementation (i.e. classes); thus missing the mark on what testing is supposed to achieve.
