-2

I am writing a unit-test framework (for Matlab/Octave, but that does not matter) that also supports property-based testing. I therefore need functions that generate random data tailored to the functions under test. (At the same time I am writing a scientific program, which I test with this framework.)

These data-generator functions are useful in their own right, not only in connection with unit tests. My question is: where should I put them? I want the framework to be usable for other projects as well, not just the one I am writing at the moment.

My current idea is that the data generators are not part of the unit-test framework. Instead, they belong to the software under test, and I specify simple properties that the data generators must have. The test framework can then use every data generator that satisfies these properties.
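A minimal sketch of this idea, written in Python for brevity rather than Matlab/Octave. The contract (a generator is any callable taking a random source and a size) and the names `GENERATORS`, `register`, and `for_all` are hypothetical, chosen only for illustration:

```python
import random

# Hypothetical contract: a generator is any callable that takes a
# random.Random instance and a size hint and returns one sample.
# The registry lives with the generators, not inside the framework.
GENERATORS = {}

def register(name):
    """Decorator that adds a generator to the registry under `name`."""
    def wrap(fn):
        GENERATORS[name] = fn
        return fn
    return wrap

@register("white_noise")
def white_noise(rng, n):
    # n independent standard-normal samples
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

@register("positive_samples")
def positive_samples(rng, n):
    # n strictly positive samples (a simple stand-in category)
    return [rng.uniform(0.1, 1.0) for _ in range(n)]

def for_all(generator, prop, trials=100, seed=0):
    """Framework side: run `prop` on generated samples.

    Returns the first counterexample, or None if every trial passed.
    The framework only relies on the contract, not on a concrete generator.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        sample = generator(rng, rng.randint(1, 16))
        if not prop(sample):
            return sample
    return None
```

With this split, the framework needs to know only the calling convention, while each project ships its own generators.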

  • Is this approach good?
  • What is "best practice" for this problem?
  • What other approaches could I take?

E.g.

  • I have functions that transform a signal from the time domain into the frequency domain (similar to the FFT) and back. I want to test whether my transform functions behave correctly, or rather how large the error is after a round trip. To test this I need many signals of different types:

    • total positive functions
    • bandlimited functions
    • noise (coloured and white)
    • functions whose Fourier Series diverges
    • etc...
  • Another function I have computes spectral properties of sets of matrices. To test this I need many different sets:

    • random matrices
    • badly conditioned matrices
    • sparse matrices
    • matrices arising from some counterexamples

All of these examples are interesting in their own right for users of the software I am writing, so I want users to have access to these data-generating functions.
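To make the transform example concrete, here is a rough sketch of such a round-trip test in Python. The naive `dft`/`idft` pair is only a stand-in for my actual transform, and the generator names and parameters (`white_noise`, `bandlimited`, `max_freq`) are assumptions for illustration:

```python
import cmath
import math
import random

def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for the real transform)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(spectrum):
    """Inverse of `dft`, including the 1/n normalisation."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

def white_noise(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def bandlimited(rng, n, max_freq=3):
    """Sum of sinusoids with frequencies 1..max_freq, random amplitude and phase."""
    amps = [rng.uniform(-1.0, 1.0) for _ in range(max_freq)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(max_freq)]
    return [sum(a * math.sin(2.0 * math.pi * (f + 1) * k / n + p)
                for f, (a, p) in enumerate(zip(amps, phases)))
            for k in range(n)]

def max_roundtrip_error(generator, trials=20, n=32, seed=1234):
    """Largest pointwise error of idft(dft(x)) over `trials` generated signals."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = generator(rng, n)
        y = idft(dft(x))
        worst = max(worst, max(abs(a - b) for a, b in zip(x, y)))
    return worst
```

The same generators are what a user of the program might call directly, for instance to produce a band-limited demo signal.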

3
  • ... and did you look at other unit testing frameworks, how they solved this problem? For example, NUnit's Random attribute? Or its TestCaseSource attribute, which allows to define arbitrary functions in the tests providing test data in a functional manner?
    – Doc Brown
    Commented Dec 7, 2020 at 16:17
  • @Doc Brown: 1) Standard properties: shift invariance, linearity, inverse of inverse is the original, etc. 2) I want to mimic the approach of Python's Hypothesis. I have not yet looked at how NUnit does it; I will do that next.
    – tommsch
    Commented Dec 7, 2020 at 18:28
  • @DocBrown I assume that my data-generating functions are useful for a) people who use my test framework but not my scientific program, and b) people who use my "scientific program" but not my test framework.
    – tommsch
    Commented Dec 9, 2020 at 9:53

2 Answers

4

First, I think having "data generating functions" for tests and other purposes makes perfect sense in the specific context you described. The fact that those functions may use a random generator is fine as long as the tests stay deterministic (by using a fixed seed value), and as long as the created data is not completely random. That is, despite the randomness, the functions should be able to create representative examples for specific categories of data (like the types of matrices mentioned in your question).
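As a small sketch of both points (a fixed seed and a well-defined category), here is a Python example. The generator `nearly_singular_2x2` and its `eps` parameter are hypothetical and only stand in for a "badly conditioned matrices" category:

```python
import random

def nearly_singular_2x2(rng, eps=1e-8):
    """Hypothetical generator: the second row is almost a multiple of the
    first, so the matrix is badly conditioned by construction."""
    a, b = rng.uniform(1.0, 2.0), rng.uniform(1.0, 2.0)
    scale = rng.uniform(1.0, 2.0)
    return [[a, b],
            [scale * a + eps * rng.uniform(-1.0, 1.0),
             scale * b + eps * rng.uniform(-1.0, 1.0)]]

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def samples(seed, count=5):
    """Draw `count` matrices from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [nearly_singular_2x2(rng) for _ in range(count)]
```

Calling `samples(42)` twice yields identical matrices, so a failing test can be reproduced, while every sample still belongs to the intended category (entries of order 1, determinant of order `eps`).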

But now to your actual question: in a comment, you wrote

I assume that my data-generating functions are useful for a) people who use my test framework but not my scientific program, and b) people who use my "scientific program" but not my test framework

This gives a clear indication of where to put the generator functions: they should be part of a separate library that can be reused by the test framework as well as by the "scientific program". Then people can do exactly what you wrote above: use the generator functions with or without the test framework, and with or without your program.

The only potential issue I see is the following: tests will depend on your scientific program as well as on your framework. If the framework is linked against one version of the "generator library" and the program against another, a test that links both may run into name collisions, so you need to take care to avoid them.

       Generator    Generator  
          Lib V1    Lib V2 
             /       \
            /         \
    Test Framework    Program
             \        /
              \      /
               \    / 
              Unit Test: which Version of the lib to use here?

Whether this really is an issue depends on the intended change management process of both. How to resolve it depends on the programming language environment and its module system (maybe you have to provide two variants of the lib with different namespaces, or something like that). But I guess you get the idea and can apply it to your environment.

1
  • After writing the cited comment, I already came to the same conclusion.
    – tommsch
    Commented Dec 10, 2020 at 11:53
10

Drawbacks of Random Test Data

It may not be a good idea to use random data in unit tests: failures are slightly harder to debug, edge cases are hard to produce deliberately, and the expected answer is hard to specify. Also, because the data is random, it may not even trigger the bug in the code.

File based Test Data

Instead, you can keep the test data in a file that the tests read, so it can be shared with other projects if they need it.

In Java, by convention, the main source code is located in src/main/java, with config files and other static assets in src/main/resources. Analogously, the test code is located in src/test/java and the data used by tests in src/test/resources. Test code can therefore read a file under resources and use its data. I assume other languages have similar conventions.
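A rough sketch of the file-based approach, in Python rather than Java. The file name `sum_cases.json`, the JSON layout, and the helper names are assumptions for illustration; a temporary directory stands in for a shared resources folder so that the example is self-contained:

```python
import json
import math
import tempfile
from pathlib import Path

def save_cases(path, cases):
    """Write hand-picked test cases (input plus expected result) to a JSON file."""
    path.write_text(json.dumps(cases, indent=2))

def load_cases(path):
    """Read the test cases back; any project can reuse the same file."""
    return json.loads(path.read_text())

def run_cases(func, cases, tol=1e-12):
    """Return the names of all cases whose result deviates from the expectation."""
    return [c["name"] for c in cases
            if abs(func(c["input"]) - c["expected"]) > tol]

# Hand-picked cases, including an edge case with catastrophic cancellation.
CASES = [
    {"name": "empty", "input": [], "expected": 0.0},
    {"name": "tenths", "input": [0.1] * 10, "expected": 1.0},
    {"name": "cancellation", "input": [1e16, 1.0, -1e16], "expected": 1.0},
]

with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "sum_cases.json"  # hypothetical file name
    save_cases(data_file, CASES)
    # math.fsum is the function under test in this sketch.
    failures = run_cases(math.fsum, load_cases(data_file))
```

Here the edge cases and expected answers are explicit in the data file, which is the main advantage over purely random input.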

6
  • 2
    Agreed. Unit tests should have the same behavior each time they're run. If you're using random data, a test can easily fail once and then pass when rerun.
    – Shelby115
    Commented Dec 7, 2020 at 13:24
  • 2
    I agree with most of what you wrote, but making random tests deterministic is pretty simple: use a fixed seed for the random number generator. So this is not a very strong argument.
    – Doc Brown
    Commented Dec 7, 2020 at 16:10
  • Thanks @DocBrown yeah that's right, I rolled it back
    – lennon310
    Commented Dec 7, 2020 at 16:13
  • In property based testing (e.g. quickcheck) it is the standard approach to use random data.
    – tommsch
    Commented Dec 7, 2020 at 18:30
  • 1
    @tommsch: I am not an expert on property-based tests, but I guess it is a standard approach to use pseudo-random data with a fixed seed, to keep the tests deterministic.
    – Doc Brown
    Commented Dec 8, 2020 at 15:15
