
Earlier today I came up with an idea, based on a particular real use case, which I would like to have checked for feasibility and usefulness. This question features a fair chunk of Java code, but it can be applied to all languages running inside a VM, and maybe even outside one. While the code is real, it uses nothing language-specific, so please read it mostly as pseudocode.

The idea
Make unit testing less cumbersome by adding ways to autogenerate code based on human interaction with the codebase. I understand this goes against the principles of TDD, but I don't think anyone has ever proven that doing TDD is better than first creating code and then immediately thereafter the tests. This may even be adapted to fit into TDD, but that is not my current goal.

To show how it is intended to be used, I'll copy one of my classes here, for which I need to make unit tests.

public class PutMonsterOnFieldAction implements PlayerAction {
    private final int handCardIndex;
    private final int fieldMonsterIndex;

    public PutMonsterOnFieldAction(final int handCardIndex, final int fieldMonsterIndex) {
        this.handCardIndex = Arguments.requirePositiveOrZero(handCardIndex, "handCardIndex");
        this.fieldMonsterIndex = Arguments.requirePositiveOrZero(fieldMonsterIndex, "fieldMonsterIndex");
    }

    @Override
    public boolean isActionAllowed(final Player player) {
        Objects.requireNonNull(player, "player");
        Hand hand = player.getHand();
        Field field = player.getField();
        if (handCardIndex >= hand.getCapacity()) {
            return false;
        }
        if (fieldMonsterIndex >= field.getMonsterCapacity()) {
            return false;
        }
        if (field.hasMonster(fieldMonsterIndex)) {
            return false;
        }
        if (!(hand.get(handCardIndex) instanceof MonsterCard)) {
            return false;
        }
        return true;
    }

    @Override
    public void performAction(final Player player) {
        Objects.requireNonNull(player);
        if (!isActionAllowed(player)) {
            throw new PlayerActionNotAllowedException();
        }
        Hand hand = player.getHand();
        Field field = player.getField();
        field.setMonster(fieldMonsterIndex, (MonsterCard)hand.play(handCardIndex));
    }
}

We can observe the need for the following tests:

  • Constructor test with valid input
  • Constructor test with invalid inputs
  • isActionAllowed test with valid input
  • isActionAllowed test with invalid inputs
  • performAction test with valid input
  • performAction test with invalid inputs

My idea mainly focuses on the isActionAllowed test with invalid inputs. Writing these tests is not fun: you need to set up a number of conditions and then check whether the method really returns false. The same applies to performAction, which needs to throw an exception in that case.
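For illustration, a hand-written version of the performAction invalid-input test might look like the sketch below. The Hand and Player classes here are simplified stand-ins with assumed constructors (the real ones live in the linked repository), and the test is condensed to its core:

```java
// Hand-written sketch of the "performAction test with invalid inputs" case.
// Hand and Player are simplified stand-ins with assumed constructors.
class PlayerActionNotAllowedException extends RuntimeException {}

class Hand {
    private final int capacity;
    Hand(int capacity) { this.capacity = capacity; }
    int getCapacity() { return capacity; }
}

class Player {
    private final Hand hand;
    Player(Hand hand) { this.hand = hand; }
    Hand getHand() { return hand; }
}

public class PerformActionInvalidInputTest {
    // Simplified stand-in for performAction: reject an out-of-range hand index.
    static void performAction(Player player, int handCardIndex) {
        if (handCardIndex >= player.getHand().getCapacity()) {
            throw new PlayerActionNotAllowedException();
        }
        // ... the actual move would happen here ...
    }

    public static void main(String[] args) {
        // Arrange: a hand of capacity 1, so index 1 is out of range.
        Player player = new Player(new Hand(1));

        // Act + assert: performAction must refuse the action.
        boolean thrown = false;
        try {
            performAction(player, 1);
        } catch (PlayerActionNotAllowedException e) {
            thrown = true;
        }
        if (!thrown) {
            throw new AssertionError("expected PlayerActionNotAllowedException");
        }
        System.out.println("ok");
    }
}
```

Even in this reduced form, most of the work is the arrange step, which is exactly the part the proposed tool would take over.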

The goal of my idea is to generate those tests by indicating (hopefully through the GUI of an IDE) that you want to generate tests based on a specific branch.

The implementation by example

  1. User clicks on "Generate code for branch if (handCardIndex >= hand.getCapacity())".

  2. Now the tool needs to find a case where that holds.

    (I haven't included the relevant code, as it would ultimately clutter the post.)

  3. To trigger the branch, the tool needs to find a handCardIndex and a hand.getCapacity() such that the condition >= holds.

  4. It needs to construct a Player with a Hand that has a capacity of at least 1.

  5. It notices that the private int capacity of Hand needs to be at least 1.

  6. It searches for ways to set it to 1. Fortunately it finds a constructor that takes the capacity as an argument. It uses 1 for this.

  7. Some more work needs to be done to successfully construct a Player instance, involving the creation of objects whose constraints can be seen by inspecting the source code.

  8. It has found the hand with the least capacity possible and is able to construct it.

  9. Now, to trigger the invalid case, it will need to set handCardIndex = 1.

  10. It constructs the test and asserts the result to be false (the value returned from the branch).
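Put together, the test generated by these steps might look roughly like this sketch. The branch condition is inlined into a helper so the example is self-contained, and the assertion style is an assumption:

```java
// Sketch of the output of steps 1-10. Everything here is hypothetical: the
// tool does not exist, and the real test would construct Hand/Player objects
// instead of passing the capacity directly.
public class GeneratedBranchTest {
    // Stand-in for the guard clause selected in step 1.
    static boolean isActionAllowed(int handCardIndex, int handCapacity) {
        if (handCardIndex >= handCapacity) {
            return false; // the branch the user clicked on
        }
        return true;
    }

    public static void main(String[] args) {
        // Steps 6-8: the smallest constructible capacity is 1.
        int handCapacity = 1;
        // Step 9: choose handCardIndex = 1 so the condition holds.
        int handCardIndex = 1;
        // Step 10: assert the branch's return value.
        if (isActionAllowed(handCardIndex, handCapacity)) {
            throw new AssertionError("expected isActionAllowed to return false");
        }
        System.out.println("ok");
    }
}
```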

What does the tool need to work?
In order to function properly, it will need the ability to scan through all source code (including JDK code) to figure out all constraints. Optionally this could be done through the Javadoc, but that is not always used to indicate all constraints. It could also do some trial and error, but that pretty much stops working if you cannot attach source code to compiled classes.

Then it needs some basic knowledge of what the primitive types are, including arrays. And it needs to be able to construct some form of "modification trees". The tool knows that it needs to change a certain variable to a different value in order to get the correct test case. Hence it will need to list all possible ways to change it, obviously without using reflection.
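To make the idea of a "modification tree" more concrete, here is a purely hypothetical sketch of how such a tree could be represented; all names and the structure are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "modification tree": for a target member, record
// the way to set it (constructor, setter, factory method) plus whatever other
// modifications that way requires. All names here are made up.
public class ModificationNode {
    final String member;   // e.g. "Hand.capacity"
    final String via;      // e.g. "constructor Hand(int)"
    final List<ModificationNode> prerequisites = new ArrayList<>();

    ModificationNode(String member, String via) {
        this.member = member;
        this.via = via;
    }

    ModificationNode requires(ModificationNode child) {
        prerequisites.add(child);
        return this;
    }

    // Depth-first listing of the steps needed to perform this modification.
    void print(String indent) {
        System.out.println(indent + member + " via " + via);
        for (ModificationNode p : prerequisites) {
            p.print(indent + "  ");
        }
    }

    public static void main(String[] args) {
        ModificationNode root =
            new ModificationNode("Hand.capacity", "constructor Hand(int)")
                .requires(new ModificationNode("Player.hand",
                                               "constructor Player(Hand, Field)"));
        root.print("");
    }
}
```

The tool would walk such a tree bottom-up to emit the arrange section of the generated test.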

What this tool will not replace is the need to create tailored unit tests that test all kinds of conditions for when a certain method actually works. It is purely meant to test methods with inputs that violate their constraints.

My questions:

  • Is creating such a tool feasible? Would it ever work, or are there some obvious problems?
  • Would such a tool be useful? Is it even useful to automatically generate these testcases at all? Could it be extended to do even more useful things?
  • Does, by chance, such a project already exist and would I be reinventing the wheel?

If it's not proven useful but still possible to make, I will still consider building it for fun. If it's considered useful, then I might make an open source project for it, depending on the time available.

For people looking for more background information about the Player and Hand classes used in my example, please refer to this repository. At the time of writing, PutMonsterOnFieldAction has not been uploaded to the repo yet, but it will be once I'm done with the unit tests.

  • There exist commercial tools that do this, so it is feasible and works to some degree. See: Automatic generation of unit tests for Java? - one such application is Agitar
    – user40980
    Commented May 29, 2014 at 13:57
  • It seems that a test generated this way is primarily an echo chamber for the implementation, i.e. it confirms that the implementation does what you wrote it to do, not whether this is sensible from any other perspective. It might incidentally produce tests that are meaningful w.r.t. the requirements and the contract of the code, but it doesn't seem intrinsically biased towards that sort of test.
    – user7043
    Commented May 29, 2014 at 14:05
  • @delnan I understand that. But when writing unit tests, they need to be done either way if you intend to go for the highest coverage possible.
    – skiwi
    Commented May 29, 2014 at 14:06
  • @skiwi Coverage produced for the sake of coverage doesn't buy you anything (except to appease mindless insistence on metrics). Tests should test what matters, such as adherence to the documented API and robustness against invalid inputs. Lack of coverage can mean either that there are cases you've implemented but not tested, or pointless code. Assuming the latter is putting the cart before the horse. Tests as you propose them only test that (1) the automatic test generation is correct and (2) the VM executes the code you wrote correctly.
    – user7043
    Commented May 29, 2014 at 14:20
  • @skiwi Putting code through a unit tester is not the same as testing the code. Commented Jun 16, 2014 at 22:11

5 Answers


It's all just software, so given enough effort it's possible ;-). In a language which supports a decent way of doing code analysis it should be feasible too.

As for the usefulness: I think some automation around unit testing is useful to a certain level, depending on how it's implemented. You need to be clear about where you want to go in advance, and be very aware of the limitations of this type of tooling.

However, the flow you describe has a huge limitation: the code being tested leads the test. This means an error in reasoning made while developing the code will likely end up in the test as well. The end result will probably be a test which basically confirms the code 'does what it does' instead of confirming it does what it should do. This isn't completely useless, because it can be used later to verify the code still does what it did earlier, but it isn't a functional test, and you shouldn't consider your code tested based on such a test. (You might still catch some shallow bugs, like null handling.) Not useless, but it can't replace a 'real' test. If that means you still have to create the 'real' test, it might not be worth the effort.

You could go slightly different routes with this, though. The first is just throwing data at the API to see where it breaks, perhaps after defining some generic assertions about the code. That's basically fuzz testing.
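A minimal sketch of that first route, assuming the requirePositiveOrZero behavior described in the question: random inputs are thrown at a constructor-style check, and only a generic property is asserted (invalid input must raise IllegalArgumentException, valid input must not):

```java
import java.util.Random;

// Minimal fuzz-testing sketch: feed random ints to a constructor-style check
// and assert only a generic property. The requirePositiveOrZero behavior is
// assumed from the question's Arguments helper.
public class ConstructorFuzz {
    static int requirePositiveOrZero(int value, String name) {
        if (value < 0) {
            throw new IllegalArgumentException(name + " must be >= 0: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < 1_000; i++) {
            int value = random.nextInt(201) - 100; // range [-100, 100]
            try {
                requirePositiveOrZero(value, "handCardIndex");
                if (value < 0) {
                    throw new AssertionError("negative input accepted: " + value);
                }
            } catch (IllegalArgumentException expected) {
                if (value >= 0) {
                    throw new AssertionError("valid input rejected: " + value);
                }
            }
        }
        System.out.println("ok");
    }
}
```

Note that the fuzzer only knows the generic property, not the method's actual specification, which is exactly the limitation discussed above.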

The other would be to generate the tests but without the assertions, basically skipping step 10. You end up with an 'API quiz' where your tool determines useful test cases and asks the tester for the expected answer given a specific call. That way you're actually testing again. It's still not complete, though: if you forgot a functional scenario in your code, the tool won't magically find it, and it doesn't remove all assumptions. Suppose the code should have been if (handCardIndex > hand.getCapacity()) instead of >=; a tool like this will never figure that out by itself. If you want to go back to TDD, you could suggest test cases based on just the interface, but that would be even more functionally incomplete, because there isn't even code from which you can infer some functionality.

Your main issues are always going to be (1) carrying errors in the code over to the test and (2) functional completeness. Both issues can be suppressed somewhat, but never eliminated. You'll always have to go back to the requirements and sit down to verify they are all actually tested correctly. The clear danger here is a false sense of security because your code coverage shows 100% coverage. Sooner or later someone will make that mistake; it's up to you to decide if the benefits outweigh that risk. IMHO it boils down to a classic trade-off between quality and development speed.

  • Not everything is possible with code. Commented Jul 1, 2014 at 11:46
  • 100% coverage leading to a false sense of security is a problem as it is, even without generating tests. 100% coverage only means that all the code was executed when all the tests were run - it won't actually tell you whether anything was tested, let alone how well it was tested.
    – eis
    Commented Jan 6, 2015 at 15:20

TLDR: It's a bad idea (see below to read why).

First, I wanted to address this:

I understand this goes against the principles of TDD, but I don't think anyone has ever proven that doing TDD is better than first creating code and then immediately thereafter the tests. This may even be adapted to fit into TDD, but that is not my current goal.

TDD is better than writing the code first (yes, it has been proven).

Here are some of the benefits (you can also read about these in lots of online articles and books):

  • TDD ensures you don't write stuff you don't need (e.g. "write minimal code for the test to pass")
  • TDD ensures the interface of your API is optimized from the client's point of view, instead of from the implementor's point of view. The first ensures the code will be easier to use for client code. The second tends to pollute the public API of your module with implementation details (even when you are careful to prevent this).
  • TDD ensures that you write code to your specs, not to an abstract idea of your algorithm
  • with writing code first, you find many situations where you believe the code will work and "see no reason to write an extra test just for a hard-to-reach corner case" (also known as "developer laziness", "very bad mistranslation of YAGNI", and "programmer excuse number 8"). With TDD you don't have that excuse. It may not sound like much, until you work in a team and your colleagues bull***t you with an excuse - or call you out for bull****ting them.
  • TDD minimizes effort (usually, when writing code first, the development loop is "implement, write tests, modify/fix implementation for tests to pass"; with TDD, it is "write tests, implement until tests pass", which can be much shorter).

Is creating such a tool feasible? Would it ever work, or are there some obvious problems?

Given enough resources, it is theoretically possible to create such a tool.

In practice, it is doubtful you can make something like that that truly fits generic implementations. It is easy enough to make something for a simple case, but consider that you may be testing an API that calls 25 other APIs internally (with each one adding to the cyclomatic complexity of your algorithm). The tool would have to do much more than the best static analyzers on the market today.

The obvious problem is that the tests would not cover your (client's) requirements in any meaningful way - they would instead cover all cases of the implementation.

A correct unit test should test that a unit of your code does what it is supposed to do, in a situation. A test generated like you mention would test that the code does what it is implemented to do (and maybe that your code is reachable). This in no way relates to your specs.

Would such a tool be useful?

Partially (with limited functionality). It could shorten the time needed to write unit tests for corner cases.

Is it even useful to automatically generate these testcases at all? Could it be extended to do even more useful things?

Not really. I think it would be useful to automatically generate one testcase at a time, with the user choosing which cases he wants generated (generating all cases would have a very low signal-to-noise ratio: you would have to discard most generated code as irrelevant).

  • Your assertion that writing the tests first is better is considerably in need of citation. In fact, research has quite mixed results, with meta-analyses pointing to considerably reduced productivity in professional environments. Commented Jan 11, 2019 at 10:36

With these main questions:

Is it even useful to automatically generate these testcases at all? Could it be extended to do even more useful things? Does, by chance, such a project already exist and would I be reinventing the wheel?

I will boldly claim that yes, generating testcases can be useful, and yes, such projects already exist and are used. However, it is not that useful in the way asked about in the question, namely as a TDD replacement. That it is not.

This was said in one of the other answers:

A correct unit test should test that a unit of your code does what it is supposed to do, in a situation. A test generated like you mention would test that the code does what it is implemented to do (and maybe that your code is reachable). This in no way relates to your specs.

Which I mostly agree with. This is how unit tests are generally used, and how they are used in TDD. However, this is not all that tests are used for.

Assume a situation where you have inherited a large codebase with no existing unit tests. You don't know its behaviour, and you certainly don't know if any change you're about to introduce will break existing behaviour. Wouldn't it be great to have generated tests that capture the current behaviour of the existing codebase?

It sure would. Those kinds of tests are called regression tests, and in this case I do see generating test cases as useful. Tools that do this already exist: perhaps the most popular one is the commercial tool AgitarOne, with its regression-test-generating capabilities. There are other alternatives as well.

Another use case where this would be useful is checking that some common contracts are not violated. Common contracts could be that (a) your code should never throw a null pointer exception if you didn't give it any null parameters, and (b) if your code implements the equals method, it should follow the rules required by the equals contract. There exist tools for this as well, such as the free tool Randoop, which uses feedback-directed random test generation to catch such issues.
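As a sketch of what checking contract (b) amounts to, here is a hand-rolled version of the equals-contract checks such a tool automates; the Card class is a made-up example, and real tools generate the call sequences automatically:

```java
import java.util.Objects;

// Hand-rolled sketch of the equals-contract checks a tool like Randoop
// automates. The Card class is a made-up example.
public class EqualsContractCheck {
    static final class Card {
        private final String name;
        Card(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Card)) return false;
            return name.equals(((Card) o).name);
        }

        @Override
        public int hashCode() { return Objects.hash(name); }
    }

    static void check(Card a, Card b) {
        assertTrue(a.equals(a), "reflexive");
        assertTrue(a.equals(b) == b.equals(a), "symmetric");
        assertTrue(!a.equals(null), "non-null");
        if (a.equals(b)) {
            assertTrue(a.hashCode() == b.hashCode(), "hashCode consistent");
        }
    }

    private static void assertTrue(boolean condition, String rule) {
        if (!condition) {
            throw new AssertionError("equals contract violated: " + rule);
        }
    }

    public static void main(String[] args) {
        check(new Card("Slime"), new Card("Slime"));
        check(new Card("Slime"), new Card("Dragon"));
        System.out.println("ok");
    }
}
```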

These are some examples where I see usefulness in generated test cases. However, as pointed out in the previous answers, it is not the same benefit one gets from tests in TDD, for example, or from a good test suite in general. These are somewhat corner cases: in the general case, you should still write the tests yourself.

  • The regression argument doesn't work in practice, because with auto-generated tests you end up with tests that validate the implementation. So as long as you don't change the existing code, your code and your app won't break. As soon as you change existing code, your tests will break and you're back to square one, where the implementation is the ultimate authority (you end up regenerating the tests for the new implementation). What generated tests do give you is a safety net against accidental changes. Marginally beneficial imo.
    – Manuel
    Commented Dec 14, 2016 at 4:07
    @Manuel Two comments: first, generated tests can be generated in a way that verifies actual results only, not internal implementation, so if you're only changing internal implementation, the tests should not break. Second, even if you change something that is expected to change actual end results, it is still highly useful to have tests verifying that behavior which is not intended to change doesn't break. You don't want to break other parts of your application, and it is useful to verify that.
    – eis
    Commented Dec 15, 2016 at 13:17
  • Can you give an example of a tool that generates tests which verify results, not implementation?
    – Manuel
    Commented Dec 15, 2016 at 17:19
    @Manuel Of the tools I've listed here, only AgitarOne verifies implementation details (private variables etc.); most verify the class interface. You can also configure which interfaces are used for test generation.
    – eis
    Commented Dec 15, 2016 at 20:59
1

I recently did some research on generated tests. It seems that there are three main approaches to generating tests for unknown code. All three assume that the current behavior (or specification) is intended, or at least accepted, so it can be characterized with a test.

The three approaches are:

No analysis

The code is not analyzed at all. The methods are called in a random sequence and all intermediate results are verified in later regression tests.

This approach is implemented by Randoop or EvoSuite.
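The "no analysis" approach can be sketched as follows: call methods of a unit in a random (but seeded) order, record the observed results, and later replay the same sequence and compare. Everything here is a toy stand-in for what those tools actually do:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy sketch of the "no analysis" approach: call methods of a unit in random
// order, record the observed results, then replay the same calls and compare.
// The Counter class is a made-up example unit under test.
public class RandomSequenceRecorder {
    static final class Counter {
        private int value;
        int increment() { return ++value; }
        int get() { return value; }
    }

    // One recorded observation: which method was called and what it returned.
    static final class Observation {
        final String method;
        final int result;
        Observation(String method, int result) {
            this.method = method;
            this.result = result;
        }
        @Override
        public boolean equals(Object o) {
            return o instanceof Observation
                && ((Observation) o).method.equals(method)
                && ((Observation) o).result == result;
        }
        @Override
        public int hashCode() { return method.hashCode() * 31 + result; }
    }

    static List<Observation> record(long seed, int calls) {
        Random random = new Random(seed);
        Counter counter = new Counter();
        List<Observation> observations = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            if (random.nextBoolean()) {
                observations.add(new Observation("increment", counter.increment()));
            } else {
                observations.add(new Observation("get", counter.get()));
            }
        }
        return observations;
    }

    public static void main(String[] args) {
        // Recording now and replaying with the same seed must match; this is
        // the regression test such a tool would effectively emit.
        List<Observation> first = record(7L, 20);
        List<Observation> replay = record(7L, 20);
        if (!first.equals(replay)) {
            throw new AssertionError("behavior changed since recording");
        }
        System.out.println("ok");
    }
}
```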

Static analysis

The code's data and control flow are analyzed. Some tools just try to construct test data that touches all branches; others even verify that the code complies with a certain specification. The analysis should also determine the state that is changed by a method with certain test data. Both analysis results can be used to generate tests (set up with test data, assert the analyzed state changes).

This seems to me to be the most convenient approach for the user. It is also the nearest to your problem. Symbolic PathFinder seems to be a tool for this approach, yet it is not production-ready.

Dynamic analysis

The code is analyzed at runtime (debugging, profiling). Common approaches try to capture certain method calls (with the state before and after the call) and serialize them to tests.

I developed a tool for this approach, that is called Testrecorder.

The questions

Is creating such a tool feasible? Would it ever work, or are there some obvious problems?

As some answers above point out, there cannot be a solution that is always correct. Yet I think there could be a solution that increases your efficiency.

Would such a tool be useful? Is it even useful to automatically generate these testcases at all? Could it be extended to do even more useful things?

This depends on the scenario. I think writing tests along with your code (TDD) is a good way to keep code at a minimum quality.

Yet sometimes I am confronted with large legacy code bases. Nobody really wants to spend much time on such code, yet the risk of introducing new code into it is quite high. I think it could be a valid trade-off to use generated tests in these scenarios.


It is not feasible, given the current theoretical knowledge in computer science.

The easiest way to see it is this: put into your code a branch that depends on a conjecture that hasn't been proven (there are many of them). You don't know if the code can get into this or that branch, and you also don't know which value would lead it into the branch.

So you can't really generate test cases that go into a certain branch, even if you have access to the source code.

And if you think it is just software and you will get it if you try hard enough, remember: after even a small increase in the encoding length of the conjecture, the search space grows exponentially. Then you have to find another way.
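To make this concrete, here is the kind of branch whose reachability encodes the subset-sum problem, which is NP-complete; the values are illustrative only:

```java
import java.util.Set;

// Concrete illustration: a branch guarded by a subset-sum question. Deciding
// whether any input reaches branch A amounts to solving subset-sum, so a test
// generator cannot efficiently construct covering inputs for arbitrary code
// (as far as we know). The values are illustrative only.
public class HardBranch {
    static String classify(Set<Integer> program, int[] input, int target) {
        int sum = 0;
        for (int n : input) {
            if (!program.contains(n)) {
                return "B"; // not a subset of the program's set
            }
            sum += n;
        }
        if (sum == target) {
            return "A"; // reaching this branch encodes solving subset-sum
        }
        return "B";
    }

    public static void main(String[] args) {
        Set<Integer> numbers = Set.of(3, 34, 4, 12, 5, 2);
        System.out.println(classify(numbers, new int[] {4, 5}, 9)); // A
        System.out.println(classify(numbers, new int[] {3, 4}, 9)); // B
    }
}
```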

  • You say that it is not feasible "speaking with current theoretical knowledge in computer science". What does that mean? Do you have any supporting documentation for your statement? Commented Jul 1, 2014 at 12:52
    I think I need to make an example with the conjecture remark. So let's say this: your program has a set of integers and a target integer. It has a branching statement: if a set of input numbers is a subset of the program's set and sums to the target integer, then you do A, else you do B. With the current knowledge in computer science, this problem is NP-complete, so in order to generate a test case that goes into branch A, you would need an exponential amount of resources. This is enough to say it's not feasible from a theoretical as well as a practical standpoint.
    – InformedA
    Commented Jul 2, 2014 at 6:15
