
When writing unit tests, I feel that there is a trade-off between code repetition and test logic.

Example of my current (likely flawed) approach:

To test this function (overly simple function for illustration):

from warnings import warn

class AmbiguousSignWarning(Warning):
    pass

def product_sign(arg1, arg2):
    """
    Returns 1 if product is non-negative and -1 if negative. Warns when product is 0.
    """
    product = arg1 * arg2
    if product >= 0:
        if product == 0:
            warn(AmbiguousSignWarning(
                "0 can be considered both positive or negative. Treated as positive."
            ))
        return 1
    else:
        return -1

I'd write something like:

from warnings import filterwarnings, resetwarnings
from pytest import mark, warns
# (product_sign and AmbiguousSignWarning are imported from the module under test)

PRODUCT_SIGN_TEST_CASES = [
    {
        "description": "arg1 > 0, arg2 > 0",
        "arg1": 1,
        "arg2": 14,
        "expected_warning": None,
        "expected_result": 1,
    },
    {
        "description": "arg1 == 0, arg2 > 0",
        "arg1": 0,
        "arg2": 14,
        "expected_warning": AmbiguousSignWarning,
        "expected_result": 1,
    },
    {
        "description": "arg1 < 0, arg2 > 0",
        "arg1": -12,
        "arg2": 14,
        "expected_warning": None,
        "expected_result": -1,
    },
    # Goes on like this to exhaust combinations of signs of arg1 and arg2
]

class TestProductSign:
    @mark.parametrize("test_case", PRODUCT_SIGN_TEST_CASES)
    def test_product_sign(self, test_case):
        if test_case["expected_warning"] is None:
            filterwarnings("error", category=AmbiguousSignWarning)
            result = product_sign(
                arg1=test_case["arg1"],
                arg2=test_case["arg2"],
            )
            resetwarnings()
        else:
            with warns(AmbiguousSignWarning):
                result = product_sign(
                    arg1=test_case["arg1"],
                    arg2=test_case["arg2"],
                )
        assert result == test_case["expected_result"]

What I don't like about this approach is that I basically copy-paste test cases, modifying them slightly depending on what each one tests. Copy-pasting is a no-no (modifying the function would likely mean having to modify each test case), but how do I avoid code duplication without introducing more test logic?

More logic means more potential for error (perhaps not in this simple example, but in more complex functions); at some point, I would have to test my test logic. My approach also bloats the test module, since functions with more complex signatures or arguments end up needing many lines of code just to define the test cases.

If this is indeed an unavoidable trade-off, should I prefer duplicated (logic-free) code (higher maintenance) over more testing logic (more sources of error), especially in a setting where correctness of code is vital?

Some comments on why this design:

  • I wanted to avoid logic as much as possible: a single test with minimal logic covers all cases, so a developer doesn't need to understand the logic of multiple tests
  • The code that is duplicated basically contains no logic, so downstream modifications may be laborious but are less error-prone
  • All test cases are written as literals; minimal code to arrange test cases
  • Adding or removing test cases should require only looking at the variable PRODUCT_SIGN_TEST_CASES, seeing what dictionary keys the other test cases provide, and mimicking them
  • A test_case object is usually printed by pytest when it fails, so inspecting "description" explains what is tested (since I don't have descriptive test names)

Potential antipatterns I tried to avoid:

  • Lots of testing code, requiring someone who reads the code to understand various test structures
  • Need for testing unit test code due to complex logic

I couldn't find my answer here:

Also, I looked at some repositories on GitHub and found that they either have too much testing logic for my taste, or repeat testing code to achieve the same thing I achieve with repetition inside the PRODUCT_SIGN_TEST_CASES variable (example: https://github.com/scipy/scipy/blob/main/scipy/linalg/tests/test_decomp_cholesky.py).

  • Avoid the conditional logic entirely - split that into two parameterised tests, one for the if cases, the other for the else cases. – jonrsharpe, Commented Apr 2, 2022 at 18:25

3 Answers


The problem is that the test has any logic at all. Conditionals and looping structures indicate the test is doing too much. As jonrsharpe said in a comment, this test needs to be split into at least two parameterized tests.

To promote this separation, create two different variables holding test data. One set is the "happy path" and the other set is for the "unhappy path". The advantage here is that you can give each test data variable a relevant name.

The more focused a test is, the easier it is to maintain as the application evolves. Rather than parameterize the warning tests, consider writing specific tests with a good name to verify each kind of warning. Data-driven tests are useful for boundary testing, where a range of values are valid and each value in the range has the same expected behavior.
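
One way that split might look for the question's product_sign, reusing its AmbiguousSignWarning (the variable and test names here are only illustrative; since there is only one kind of warning, the unhappy path becomes a single, specifically named test rather than a second data-driven one):

from pytest import mark, warns

# Happy path: non-zero products; every row has the same expected behavior.
HAPPY_PATH_CASES = [
    (1, 14, 1),
    (-12, 14, -1),
    (12, -14, -1),
    (-3, -5, 1),
]

class TestProductSign:
    @mark.parametrize("arg1, arg2, expected", HAPPY_PATH_CASES)
    @mark.filterwarnings("error")  # any unexpected warning fails the test
    def test_sign_of_nonzero_product(self, arg1, arg2, expected):
        assert product_sign(arg1, arg2) == expected

    def test_zero_product_warns_and_is_treated_as_positive(self):
        with warns(AmbiguousSignWarning):
            assert product_sign(0, 14) == 1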

Tests should verify only one behavior. As soon as they verify more than one behavior, the test has too many reasons to fail. Conditionals are an immediate sign the test has too many reasons to fail. Refactor the test into multiple tests and give each test a good name. Tests that make more than one assertion should be inspected to ensure the assertions are related to the behavior under test. Assertions unrelated to the behavior being tested should be refactored into their own tests.

  • "ideally a single assertion" needs some clarification. This should not be taken to mean "a single assert statement" but asserting a single behavior. For example, a test that checks if the output of a function is an array with a single specific element would be OK to assert that the length is 1 and the first element has the expected value, in two separate assert statements (a minimal sketch of this follows the comments). – Commented Apr 3, 2022 at 9:33
  • @MartinEpsz: I agree with your comment. For whatever reason, I've always had a problem communicating this. Too often I find developers see the words "two separate assert statements" and they immediately jump to "make as many asserts as I possibly can so I don't have to write more unit tests." I like the example in your comment. – Commented Apr 4, 2022 at 16:14
  • @MartinEpsz: I've been mulling this answer around in my head for a few days. I clarified "ideally a single assertion" (well, I rewrote that sentence entirely). Does that help? – Commented Apr 6, 2022 at 13:16
  • I think it's great. My comment was more directed at whoever read the answer than to you, and I agree this is not easy to write about. But thanks for updating. – Commented Apr 6, 2022 at 16:11
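
A minimal illustration of the first comment's point (build_singleton is a hypothetical function under test):

def test_returns_exactly_one_expected_element():
    result = build_singleton()  # hypothetical function returning a one-element list
    # Two assert statements, but a single behavior:
    # "returns exactly one element, and that element has the expected value".
    assert len(result) == 1
    assert result[0] == "expected value"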

Here's an extreme way of looking at it:

The function being tested is doing two things: multiplication and reporting the sign. So we're trying to test both that multiplication produces a value with the correct sign and that we're classifying that sign correctly. So the testing matrix is the product of the tests of the two operations.

If we broke product_sign into two functions, say product and sign_of, the total number of tests would be only the sum of the individual tests, not the product. (And since multiplication is part of the language, it might not be our job to test it.)
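
A sketch of what that decomposition might look like, keeping the behavior and the AmbiguousSignWarning from the question (only the names product and sign_of come from this answer; everything else is illustrative):

from warnings import warn
from pytest import warns

def product(arg1, arg2):
    # Multiplication is part of the language; arguably not ours to test.
    return arg1 * arg2

def sign_of(value):
    """Return 1 for non-negative values and -1 for negative ones; warn on 0."""
    if value == 0:
        warn(AmbiguousSignWarning(
            "0 can be considered both positive or negative. Treated as positive."
        ))
    return 1 if value >= 0 else -1

def product_sign(arg1, arg2):
    return sign_of(product(arg1, arg2))

# sign_of needs only a few direct tests (a sum of cases, not a product):
def test_sign_of_positive():
    assert sign_of(42) == 1

def test_sign_of_negative():
    assert sign_of(-42) == -1

def test_sign_of_zero_warns():
    with warns(AmbiguousSignWarning):
        assert sign_of(0) == 1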

I know this sounds somewhat absurd, but I'd argue that's because we're working with a very simple example. In real-life problems, it can be easy to overlook ways to decompose functions into more testable ones. Coming up with a matrix of tests is often a clue that we missed a way to separate the concerns of the unit under test.


Preamble: I don't write Python, but am approaching this from a general unit testing perspective. Please take any attempts to code in Python with a heavy dose of salt and correct them in your mind, not comments. :-) Or just edit for correctness.

This approach seems to create a maintainability challenge. There are a few reasons.

The first problem I have is that the test function itself is complicated enough that it needs to be inspected and tested. You don't want to have to test your test code. There is also a problem every time something changes in the function under test, because the test function has to be rewritten to handle the change, and each test case setup has to be reexamined and potentially updated.

Second, the test setup is incredibly verbose. You could achieve a comparable setup, if you really want a single test engine, by creating a series of tests that each make a single function call with 4 parameters, eliminating the need to repeat the parameter names. This would also eliminate the need for the "description" key, because you could have a test name for each function call. Succinct example of this (remember, I don't write Python):

class TestProductSign:
    # Parameters are arg1, arg2, expected warning, expected result.
    # check_product_sign is the question's single test engine, renamed so
    # pytest does not collect it as a test in its own right.
    def test_zero_positive(self):
        check_product_sign(0, 14, AmbiguousSignWarning, 1)

    def test_neg_pos(self):
        check_product_sign(-12, 14, None, -1)

etc. This would keep your single test engine (renamed here to check_product_sign so the test runner doesn't collect it as a test in its own right), but create separate tests with meaningful individual titles and succinct parameter lists. However, it still has the problem of test code that needs testing.

But I think the most readable version would be to make a number of very simple tests that are trivial to read and pretty trivial to code. (This time I'll leave the Python out and just give the core code.)

def test_pos_neg():
    assert product_sign(12, -14) == -1

def test_pos_pos():
    assert product_sign(3, 5) == 1

etc. I'm not even trying to figure out how you capture the warning. I leave that as an exercise for the student. ;-)
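
(For completeness: in pytest the warning case can be captured with pytest.warns. A minimal sketch, reusing the names from the question:)

from pytest import warns

def test_zero_positive_warns():
    with warns(AmbiguousSignWarning):
        assert product_sign(0, 14) == 1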

The advantages of this include that the test code is trivial to read and that adding functionality to product_sign should not cause any rewriting of the existing tests unless you actually change the result. I'd likely make a constant to represent 1 and -1 in case someone decides to return true and false at some point (in the actual code too). By shortening the test cases you also probably get a faster read of which cases are covered; with 7 lines for every test case it's much harder to see whether all the required tests are present.

Also, I don't have a clue how pytest works, but in my current language, Elixir, I can run mix test --trace test_product_sign.exs and get a list of the test names as they are executed. I can also do mix test test_product_sign.exs:30 to test only the test case that starts on line 30. With the provided technique there is only one test function and I wouldn't gain those benefits. There are other advantages (perhaps some language-specific) that come from having a separate test for each actual case.
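
For reference, pytest offers comparable selection, by test name rather than by line number (the file name here is illustrative):

pytest -v test_product_sign.py                # print each test's name as it runs
pytest test_product_sign.py::test_pos_neg     # run a single named test
pytest test_product_sign.py -k pos            # run only tests whose names match a pattern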

If the setup to test your function is more complicated and repetitive, as in the linked GitHub example, you may be able to create setup or local functions to minimize the repetitive parts and parameterize as much as possible. Keeping as much common setup as possible out of the test functions, so that the parts of each test that really differ from the others stand out, is certainly desirable.
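
In pytest, that kind of shared setup usually goes in a fixture. A minimal sketch reusing the question's product_sign (the setup here is deliberately trivial, just to show the mechanics; the names are illustrative):

from pytest import fixture

@fixture
def sample_args():
    # Shared, logic-free setup reused by several tests.
    return {"positive": (1, 14), "negative": (-12, 14)}

def test_positive_pair(sample_args):
    assert product_sign(*sample_args["positive"]) == 1

def test_negative_pair(sample_args):
    assert product_sign(*sample_args["negative"]) == -1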
