Support parametrized tests in unittest #52145
Comments
IPython has unittest-based parametric testing (something nose has); we import the decorators into our public decorators module for actual use, and simple tests showing them in action are available. The code is all BSD and we'd be more than happy to see it used. If there is interest in this code, I'm happy to sign a PSF contributor agreement; the code is mostly my authorship. I received help for the 3.x version on the Testing in Python mailing list, and I would handle asking for permission on-list if there is interest in including this.
I'm not sure what this brings. It is easy to write a loop iterating over test data.
With parameterized tests *all* the tests are run and *all* failures reported. With testing in a loop the tests stop at the first failure.
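To make the contrast concrete, here is a minimal illustrative sketch (the data in `values` and the test name are made up for this example): with a plain loop, the assertion for 5 raises and the method exits, so the failure for 7 is never reported, whereas a parameterized version would report both.

    import unittest

    values = [2, 4, 5, 6, 7]  # hypothetical data: 5 and 7 are both "bad"

    class LoopTest(unittest.TestCase):
        def test_all_even(self):
            for v in values:
                # the loop aborts at the first failing value (5);
                # the failure for 7 is never even attempted
                self.assertEqual(v % 2, 0)

    if __name__ == "__main__":
        unittest.main()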
By the way - I have no opinion on whether or not using yield is the right way to support parameterized tests. It may be better for the test method to take arguments, and be decorated as a parameterized test, with the decorator providing the parameters. When I come to look at it I will look at how py.test and nose do it and solicit advice on the Testing in Python list. We had a useful discussion there previously that would be good to refer to.
+1 to this justification. Parameterized tests are a big win over a simple for loop in a test. (However, I haven't looked at the IPython code at all, and Antoine's objection seemed to have something in particular to do with the IPython code?)
+1 on something like this. That's also how NUnit supports parameterized tests.
Ah, thank you. Looks better indeed.
No, it has to do with the fact that you need to be able to distinguish the different runs. If I have 500 runs in my parameterized (*) test and only one of them fails, I need to be able to tell which one. (*) (this is horrible to type)
Antoine: the failure message would include a repr of the parameters used in the particular test that failed. So you can tell which test failed and with what parameters.
Something else I think it would be nice to consider is what the id() (and shortDescription(), heh) of the resulting tests will be. It would be great if the id were sufficient to identify a particular test *and* data combination. In trial, we're trying to use test ids to support distributed test running. If the above property holds for parameterized tests, then we'll be able to automatically distribute them to different processes or hosts to be run.
I should probably have clarified better our reasons for using this type of code.

The first is the one Michael pointed out, where such parametric tests all execute; it's very common in scientific computing to have algorithms that only fail for certain values, so it's important to identify these points of failure easily while still running the entire test suite.

The second is that the approach nose uses produces on failure the nose stack, not the stack of the test. Nose consumes the test generators at test discovery time, and then simply calls the stored assertions at test execution time. If a test fails, you see a nose traceback which is effectively useless for debugging, and using --pdb for interactive debugging doesn't help much (all you can do is print the values, as your own stack is gone). This code, in contrast, evaluates the full test at execution time, so a failure can be inspected 'live'. In practice this makes an enormous difference in a test suite being actively useful for ongoing development, where changes may send you into debugging often.

I hope this helps clarify the intent of the code; I'd be happy to provide further details.
In PyMVPA we have our little decorator as an alternative to Fernando's generators, which is closer, I think, to what Michael was wishing for: http://github.com/yarikoptic/PyMVPA/blob/master/mvpa/testing/sweepargs.py

NB it has some minor PyMVPA specificity which could be easily wiped out, and since it was at most 4 eyes looking at it and it bears "evolutionary" changes, it is far from being the cleanest/best piece of code, BUT:

    @sweepargs(arg=range(5))
    def test_sweepargs_demo(arg):
        ok_(arg < 5)
        ok_(arg < 3)
        ok_(arg < 2)

For nose/unittest it would still look like a single test:
    $> nosetests -s test_sweepargs_demo.py
    Traceback (most recent call last):
      File "/usr/lib/pymodules/python2.5/nose/case.py", line 183, in runTest
        self.test(*self.arg)
      File "/usr/lib/pymodules/python2.5/nose/util.py", line 630, in newfunc
        return func(*arg, **kw)
      File "/home/yoh/proj/pymvpa/pymvpa/mvpa/tests/test_sweepargs_demo.py", line 11, in test_sweepargs_demo
        ok_(arg < 2)
      File "/usr/lib/pymodules/python2.5/nose/tools.py", line 25, in ok_
        assert expr, msg
    AssertionError:
      Different scenarios lead to failures of unittest test_sweepargs_demo (specific tracebacks are below):
      File "/home/yoh/proj/pymvpa/pymvpa/mvpa/tests/test_sweepargs_demo.py", line 10, in test_sweepargs_demo
        ok_(arg < 3)
      File "/usr/lib/pymodules/python2.5/nose/tools.py", line 25, in ok_
        assert expr, msg
      on
        arg=3
        arg=4
      File "/home/yoh/proj/pymvpa/pymvpa/mvpa/tests/test_sweepargs_demo.py", line 11, in test_sweepargs_demo
    ----------------------------------------------------------------------
    FAILED (failures=1)
Hey Yarick,

Thanks for the post; I obviously defer to Michael on the final decision. On the other hand, your code does have nifty features that could be incorporated.

Cheers,

f
Fernando, I agree... somewhat ;-) At some point (whenever everything works fine and no unittests fail) I wanted to marry sweepargs to nose and make it spit out a dot (or animate a spinning wheel ;)) for every passed unittest, so instead of 300 dots I got a picturesque field of thousands of dots and Ss, and also saw how many were skipped for some parametrizations. But I became "not sure" about such a feature, since the field became quite large and hard to "grasp" visually, although it did give me a better idea of the total number of "testings" that were done and skipped. So maybe it would be helpful to separate the notions of tests and testings and give the user the ability to control the level of verbosity (1 -- tests, 2 -- testings, 3 -- verbose listing of testings (test(parametrization))). But I have blessed sweepargs every time something goes nuts and a test starts failing for (nearly) all parametrizations at the same point. And that is where I really enjoy the concise summary.
Yarick: Yes, I do actually see the value of the summary view. When I have a parametric test that fails, I tend to just run nose with -x so it stops at the first error and with the --pdb options to study it, so I simply ignore all the other failures. To me, test failures are quite often like compiler error messages: if there's a lot of them, it's best to look only at the first few, fix those and try again, because the rest could be coming from the same cause. I don't know if Michael has plans/bandwidth to add the summary support as well, but I agree it would be very nice to have, just not at the expense of individual reporting.
If we provide builtin support for parameterized tests it will have to report each test separately otherwise there is no point. You can already add support for running tests with multiple parameters yourself - the *only* advantage of building support into unittest (as I see it) is for better reporting of individual tests.
I agree with Michael - one test that covers multiple settings can easily be done by collecting results within the test itself and then checking at the end that no failures were detected (e.g. I've done this myself with a test that needed to be run against multiple input files - the test knew the expected results and maintained lists of filenames where the result was incorrect. At the end of the test, if any of those lists contained entries, the test was failed, with the error message giving details of which files had failed and why). What parameterised tests could add which is truly unique is for each of those files to be counted and tracked as a separate test. Sometimes the single-test-with-internal-failure-recording will still make more sense, but tracking the tests individually will typically give a better indication of software health (e.g. in my example above, the test ran against a few dozen files, but the only way to tell if it was one specific file that was failing or all of them was to go look at the error message).
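A minimal sketch of that single-test-with-internal-failure-recording pattern (the `process` function and the file names are hypothetical, not from the thread): failures are collected in a list and the test fails once, at the end, with a message naming every offending input.

    import unittest

    def process(filename):
        # hypothetical function under test
        return filename.endswith(".txt")

    class ProcessFilesTest(unittest.TestCase):
        def test_all_input_files(self):
            failures = []
            for name in ["a.txt", "b.dat", "c.txt", "d.csv"]:
                if not process(name):
                    failures.append(name)
            # one test result, but the message lists every failing input
            if failures:
                self.fail("processing failed for: %s" % ", ".join(failures))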
Hi Nick, am I reading you right? Are you suggesting to implement this summarising of the parameterised tests?
No, I'm saying I don't see summarising the parameterised tests separately from the overall test run as a particularly important feature, since you can test multiple parameters in a single test manually now. The important part is for the framework to be able to generate multiple tests from a single test function with a selection of arguments. Doing a separate run with just those tests will give you any information that a summary would give you.
In case it could be useful, here is how generative/parametrized tests are handled in logilab.common.testlib: http://hg.logilab.org/logilab/common/file/a6b5fe18df99/testlib.py#l1137
Has a decision been made to implement some form of parametric tests? Is work being done? Along with parameterizing individual test methods, I'd also like to throw out a request for parametric TestCases. In some cases I find that I want the behavior of setUp() or tearDown() to vary based on some set of parameters, but the individual tests are not parametric per se.
This issue existing is effectively the decision that we should probably do this, and I think the discussion shows we agree it should happen. How it's done is another matter, and we have roughly a year to get it figured out before 3.3 gets closer. I have a patch that implements this as a decorator, but it's out of date by now and not feature complete. It's on my todo list of patches to revive.
Brian, if you don't have time to work on it in the next little while, maybe you could post your partial patch in case someone else wants to work on it? Might be a good project for someone on the mentoring list. Unless someone sees a clever way to implement both with the same mechanism, parameterizing test cases should probably be a separate issue. Doing it via subclassing is straightforward, which also argues for a separate issue, since an 'is this worth it' discussion should be done separately for that feature.
*If* we add this to unittest then we need to decide between test load time parameterised tests and test run time parameterisation. Load time is more backwards compatible / easier (all tests can be generated at load time and the number of tests can be known). Run time is more useful. (With load time parameterisation the danger is that test generation can fail so we need to have the test run not bomb out in this case.) A hack for run time parameterisation is to have all tests represented by a single test but generate a single failure that represents all the failures. I think this would be an acceptable approach. It could still be done with a decorator.
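As a rough sketch of what load-time parameterisation could look like (the `with_params`/`expand_params` names are invented for illustration, not a proposed API): the parameter sets are attached to the test function and expanded into separate, individually named test methods when the class is created, so the loader sees a fixed, known number of tests.

    import unittest

    def with_params(*param_sets):
        # attach the parameter sets to the test function (illustrative only)
        def mark(func):
            func._param_sets = param_sets
            return func
        return mark

    def expand_params(cls):
        # runs at class-definition (load) time: one new test method per parameter set
        for name, attr in list(vars(cls).items()):
            param_sets = getattr(attr, "_param_sets", None)
            if not param_sets:
                continue
            for i, params in enumerate(param_sets):
                def make_test(func=attr, params=params):
                    return lambda self: func(self, *params)
                setattr(cls, "%s_%d" % (name, i), make_test())
            delattr(cls, name)  # drop the unexpanded template method
        return cls

    @expand_params
    class SquareTest(unittest.TestCase):
        @with_params((2, 4), (3, 9), (4, 16))
        def test_square(self, base, expected):
            self.assertEqual(base * base, expected)

    if __name__ == "__main__":
        unittest.main()

Because the expansion happens when the module is imported, discovery and counting work exactly as for hand-written tests; the trade-off is that the parameter values must be available at import time.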
And yes, parameterising test cases is a different issue. bzr does this IIRC. This is easier in some ways, and can be done through load_tests, or any other test load time mechanism.
Michael, would your "single test" clearly indicate all the individual failures by name? If not, then I would not find it useful. I can already easily "parameterize" inside a single test using a loop; it's the detailed reporting piece that I want support for.
The reporting piece, and ideally being able to use the arguments to unittest to run a single one of the parameterized tests. (I can get the reporting piece now using the locals() hack, but that doesn't support test selection.) Does test selection require load time parameterization?
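For readers who haven't seen it, the "locals() hack" being referred to is, roughly, generating named test methods inside the class body by writing them into the class namespace in a loop; this is a generic illustration (the values and names are made up, not David's actual code). It gives one reported failure per value, but there is no clean way to select a single generated case from the unittest command line without knowing the generated name.

    import unittest

    class RoundTripTest(unittest.TestCase):
        def _check_roundtrip(self, value):
            # hypothetical property being checked
            self.assertEqual(int(str(value)), value)

        # the "modify locals()" hack: create one test method per value
        for _i, _value in enumerate([0, 7, 42, 1000]):
            def _test(self, value=_value):
                self._check_roundtrip(value)
            locals()["test_roundtrip_%d" % _i] = _test
        del _test, _i, _value

Each generated `test_roundtrip_N` shows up as its own test in the output, which is the per-case reporting benefit mentioned above.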
Test selection would require load time parameterisation - although the current test selection mechanism is through importing, which would probably *not* work without a specific fix. Same for run time parameterisation.

Well, how *exactly* you generate the names is an open question, and once you've solved that problem it should be no harder to show them clearly in the failure message with a "single test report" than with multiple test reports. The way to generate the names is to number each test *and* show the parameterised data as part of the name (which can lead to *huge* names if you're not careful - or just object reprs in names, which isn't necessarily useful). I have a decorator example that does runtime parameterisation, concatenating failures into a single report but still keeping the generated name for each failure.

Another issue is whether or not parameterised tests share a TestCase instance or have separate ones. If you're doing load time generation it makes sense to have a separate test case instance, with setUp and tearDown run individually. This needs to be clearly documented, as the parameter generation would run against an uninitialised (not setUp) testcase.

Obviously reporting multiple test failures separately (run time parameterisation) is a bit nicer, but runtime test generation doesn't play well with anything that works with test suites - where you expect all tests to be represented by a single test case instance in the suite. I'm not sure that's a solvable problem.
OK, I created bpo-12600 for dealing with parameterized TestCases as a separate issue.
Well, pyflakes will tell you about name clashes within a TestCase (unless you're shadowing a test on a base class, which I guess is rarely the case)... When we generate the tests we could add the parameter reprs to the docstring. A decorator factory that takes arguments and an optional name builder seems like a reasonable API to me. Actually, name clashes *aren't* a problem - as we're generating TestCase instances these generated tests won't shadow existing tests (you'll just have two tests with the same name - does this mean id() may not be unique? worth checking).
Note that name clashes *would* result in non-unique testcase ids, so we need to prevent that.
Please implement name+argtuple first and build auto-naming on top of that. Nick's approach would not allow me to specify a custom (hand coded) name for each set of arguments, which is my normal use case. I also would not like the arguments auto-generated into the docstring unless that was optional, since I often have quite substantial argument tuples and it would just clutter the output to have them echoed in the docstring. In my use cases the custom name is more useful than seeing the actual parameters.
David, I don't understand - it looks like Nick's suggestion would allow you to create a name per case, that's the point of it! You could even do this:

    def _name_from_case(name, cases):
        for idx, case in enumerate(cases, start=1):
            test_name = case[0]
            params = case[1:]
            yield (test_name, params)
Sorry, misclicked and removed this comment from David:

Oh, I see. Make the name the first element of the argtuple and then strip it off. Well, that will work, it just seems bass-ackwards to me :) And why is it 'case'? I thought we were talking about tests.
In my example, I needed a word to cover each entry in the collection of parameter tuples. 'case' fit the bill. The reason I like the builder approach is that it means the simplest usage is to just create a list (or other iterable) of parameter tuples to be tested, then pass that list to the decorator factory. The sequential naming will then let you find the failing test cases in the sequence. Custom builders then cover any cases where better naming is possible and desirable (such as explicitly naming each case as part of the parameters). One refinement that may be useful is for the builders to produce (name, description, parameters) 3-tuples rather than 2-tuples, though. Then the default builder could just insert repr(params) as the description, while David's custom builder could either leave the description blank, or include a relevant subset of the parameters.
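A small sketch of the two kinds of builder being discussed (purely illustrative, not a proposed API): the default builder numbers each case and uses repr() of the parameters as the description, while a custom builder can instead pull a hand-coded name out of each case, which matches David's use case.

    def default_builder(base_name, cases):
        # yields (name, description, parameters) 3-tuples, repr(params) as description
        for idx, params in enumerate(cases, start=1):
            yield ("%s_%d" % (base_name, idx), repr(params), params)

    def named_builder(base_name, cases):
        # custom builder: the first element of each case is a hand-coded name,
        # and the description is left empty
        for case in cases:
            name, params = case[0], tuple(case[1:])
            yield ("%s_%s" % (base_name, name), "", params)

For example, `list(default_builder("test_parse", [(1, 2), (3, 4)]))` gives `('test_parse_1', '(1, 2)', (1, 2))` and `('test_parse_2', '(3, 4)', (3, 4))`.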
Seen that :)
That too, in our own test suite (test_list or test_tuple, I have that change in a Subversion or old hg clone somewhere). I agree about “rarely”, however.
You may be interested in an existing, unittest-compatible library that provides this: http://pypi.python.org/pypi/testscenarios
Another nice API: http://feldboris.alwaysdata.net/blog/unittest-template.html
I just remembered that many of the urllib.urlparse tests are guilty of only reporting the first case that fails, instead of testing everything and reporting all failures: http://hg.python.org/cpython/file/default/Lib/test/test_urlparse.py

IMO, it would make a good guinea pig for any upgrade to the stdlib support for parameterised unit tests.
FWIW I think nose2 is going to have "test load time" parameterized tests rather than "run time" parameterized tests, which is what I think we should do for unittest. The API should be as simple as possible for basic cases, but suitable for more complex cases too. I quite liked Nick's idea of allowing a custom "case builder" but providing a sane default one.
People interested in this issue might be interested in changeset e6a33938b03f. I use parameterized unit tests in email a lot, and was annoyed by the fact that I couldn't run the tests individually using the unittest CLI. The fix for that turned out to be trivial, but by the time I figured it out, I'd already written most of the metaclass. So since the metaclass reduces the boilerplate (albeit at the cost of looking like black magic), I decided to go with it. And the metaclass at least avoids the rather questionable use of the "modify locals()" hack I was using before.
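As a rough illustration of why expanding the parameters into real, named test methods makes CLI selection possible (a simplified sketch only, not the actual email-package metaclass; the class and parameter names are invented): each generated method has its own name, so something like `python -m unittest mymod.StringTest.test_lower_empty` can address a single case.

    import unittest

    class ParamsMeta(type):
        # expand any "<base>_params" dict on the class into one test method per entry
        def __new__(mcls, name, bases, ns):
            for attr, params in list(ns.items()):
                if not attr.endswith("_params"):
                    continue
                base = attr[:-len("_params")]
                template = ns[base]  # e.g. the test_lower method below
                for key, args in params.items():
                    def test(self, _template=template, _args=args):
                        _template(self, *_args)
                    ns["%s_%s" % (base, key)] = test
                # drop the unexpanded template and the params dict so the loader
                # only sees the generated, individually named tests
                del ns[base], ns[attr]
            return super().__new__(mcls, name, bases, ns)

    class StringTest(unittest.TestCase, metaclass=ParamsMeta):
        test_lower_params = {"ascii": ("ABC", "abc"), "empty": ("", "")}

        def test_lower(self, value, expected):
            self.assertEqual(value.lower(), expected)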
I like test_email’s decorator. It looks like https://github.com/wolever/nose-parameterized which I’m using. The implementation and generation of test method names may be less nice than what was discussed here, but the API (decorator + list of arguments) is something I like (much better than nose’s built-in parametrization).
Hi Murray! I use a lot of parametrized tests. I usually use the ENV to pass these parameters. What is your approach to parametrize all the test stuff? Regards, Borja.
Sorry, I failed to mention that I use Testify to launch all my tests!
As another in-the-standard-library use case: my additions to the ipaddress test suite are really crying out for parameterised test support. My current solution is adequate for coverage and debugging purposes (a custom assert applied to multiple values in a test case), but has the known limitations of that approach (specifically, only the first failing case gets reported rather than all failing cases, which can sometimes slow down the debugging process).
http://gist.github.com/mfazekas/1710455

I have a parametric decorator which is similar to sweepargs in concept. It can be applied at either class or method level. And it mutates test names so failures should be nice, and filters can be applied too.

    @parametric
    class MyTest(unittest.TestCase):
        @parametric(foo=[1,2],bar=[3,4])
        def testWithParams(self,foo,bar):
            self.assertLess(foo,bar)
        def testNormal(self):
            self.assertEqual('foo','foo')

    @parametric(foo=[1,2],bar=[3,4])

Sample failures:

    Traceback (most recent call last):
      File "/Work/temp/parametric.py", line 63, in f
        self.fun(*args,**v)
      File "/Work/temp/parametric.py", line 158, in testNegative
        self.assertLess(-foo,-bar)
    AssertionError: -1 not less than -3
    ======================================================================
    Traceback (most recent call last):
      File "/Work/temp/parametric.py", line 63, in f
        self.fun(*args,**v)
      File "/Work/temp/parametric.py", line 158, in testNegative
        self.assertLess(-foo,-bar)
    AssertionError: -2 not less than -3
Looks like we're going to get subtests (issue bpo-16997) instead of parameterized tests.
Since we now have subtests, can this be closed?
subtests don't satisfy my use cases. You can't run an individual subtest by name, and I find that to be a very important thing to be able to do during development and debugging. At the moment at least I'm fine with just having my parameterize decorator in the email module, so I'm not motivated to move this forward right now. I would like to come back to it eventually, though.
Right, subtests are about improving reporting without adding selectivity.
Is there any possibility of getting this into 3.5? If it helps I've always got time on my hands so if nothing else I could do testing on Windows 8.1.
I don't believe the "how to request specific parameters" aspect has been resolved. Any new API (if any) should likely also be based on the subtest feature.
I agree with Nick. There is a potential use case for parameterized tests as well as subtests, but it's not something we're going to rush into.
+1 for the feature

Subtests make the test results of all asserts visible at test execution time but decrease the readability of a test:

    @parameterized([2,4,6])
    def test_method_whenCalled_returnsNone(self, a):
        # 1) arrange
        something = Something()
        # 2) act
        result = something.method(a)
        # 3) assert
        self.assertIsNone(result)

When using subtests the phases of 1) arrange, 2) act, 3) assert are not clearly separated; the unit test contains logic and two additional indentation levels that could be avoided with parameterized tests:

    def test_method_whenCalled_returnsNone(self):
        # 1) arrange
        something = Something()
        for a in [2,4,6]:
            with self.subTest(a=a):
                # 2) act
                result = something.method(a)
                # 3) assert
                self.assertIsNone(result)
The unittest-enhancing