30

Here's a skeleton of a class I built that loops through and deduplicates data - it's in C# but the principles of the question aren't language specific.

public static void DedupeFile(FileContents fc)
{
    BuildNameKeys(fc);
    SetExactDuplicates(fc);
    FuzzyMatching(fc);
}

// algorithm to calculate fuzzy similarity between surname strings
public static bool SurnameMatch(string surname1, string surname2)

// algorithm to calculate fuzzy similarity between forename strings
public static bool ForenameMatch(string forename1, string forename2)

// algorithm to calculate fuzzy similarity between title strings
public static bool TitleMatch(string title1, string title2)

// used by above fn to recognise that "Mr" isn't the same as "Ms" etc
public static bool MrAndMrs(string title1, string title2)

// gives each row a unique key based on name
public static void BuildNameKeys(FileContents fc)

// loops round data to find exact duplicates 
public static void SetExactDuplicates(FileContents fc)

// threads looping round file to find fuzzy duplicates
public static void FuzzyMatching(FileContents fc, int maxParallels = 32)

Now, in actual usage only the first function actually needs to be public. All the rest are only used inside this class and nowhere else.

Strictly that means they should of course be private. However, I've left them public for ease of unit testing. Some people will no doubt tell me I should be testing them via the public interface but that's partly why I picked this class: it's an excellent example of where that approach gets awkward. The fuzzy matching functions are great candidates for unit tests, and yet a test on that single "public" function would be near-useless.

This class won't ever get used outside a small team at this office, and I don't believe that the structural understanding imparted by making the other methods private is worth the extra faff of packing my tests with code to access private methods directly.

Is this "all public" approach reasonable for classes in internal software? Or is there a better approach?

I am aware there is already a question on How do you unit test private methods?, but this question is about whether there are scenarios where it's worthwhile bypassing those techniques in favour of simply leaving methods public.

EDIT: For those interested, I added the full code on CodeReviewSE as restructuring this class seemed too good a learning opportunity to miss.

12
  • 11
    Have you considered making the methods Internal? And add an InternalsVisibleTo attribute, that points to your Unit Test project.
    – Falgantil
    Commented Jun 6, 2016 at 9:18
  • 1
    @BjarkeSøgaard No! I'd never come across that attribute before but it sounds like it would solve my specific problem really well. So thanks very much for that. Going to leave the question here as it still seems to have value for other languages.
    – Bob Tway
    Commented Jun 6, 2016 at 9:22
  • 5
    @MattThrower, Hi Matt, sorry, but I edited my comment after reading your question a bit more carefully. When thinking of a unit test, a "unit" != small amount of code, it means "one unit of work". A test should test a feature, not a method. I.e TestDedupeFile_WhenCalledWithFuzzyTitles_MatchesThem. You would have as many tests for the public method, as you would if you wrote one for each private method. Commented Jun 6, 2016 at 9:35
  • 1
    As to; Is it worth the tradeoff? I don't know about the mocking part, how much work it implies. However, when a developer sees this code, he should be able to make changes to the privates of the class as he see fit, whether that be splitting one method into two, or two into one. Commented Jun 6, 2016 at 9:37
  • 1
    Related
    – Robbie Dee
    Commented Jun 6, 2016 at 14:37

8 Answers 8

64

I've left them public for ease of unit testing

Ease of writing those tests, maybe. But you are then tightly coupling that class to a bunch of tests that interact with its inner workings. This results in brittle tests: they will likely break as soon as you make changes to the code. This creates a real maintenance headache, that often results in people simply deleting the tests as they become more trouble than they are worth.

Remember, unit testing doesn't mean "testing the smallest possible piece of code". It's a test of a functional unit, ie for a set of inputs into part of the system, I expect these results. That unit might be a static method, a class, or a bunch of classes within an assembly. By targeting public APIs only, you mimic the behaviour of the system and so your tests become both less coupled and more robust.

So make the methods private, mock the "whole 'FileContents' DTO" and only test the one true public method. It'll be more work initially, but over time, you'll reap the benefits of creating useful tests like this.

6
  • In my experience you have a choice between too many things public, writing only integration instead of unit tests or testing internals. (Though the last option is usually limited to public methods on internal classes) Commented Jun 6, 2016 at 12:55
  • 11
    @CodesInChaos, whether a test is a unit or integration test is a matter of semantics in my experience. If tests can be run in parallel, without side effects, then they are unit tests, even if they result in the entire application running. Testing internals still leads to brittle tests: internals should be free to change in any way, which cannot happen if tests get their sticky paws on them. I have just been through an exercise of removing some sixty internal classes in a library I created. Because my tests only hit public methods, all 600+ of them still work properly.
    – David Arno
    Commented Jun 6, 2016 at 13:03
  • 5
    @DavidArno "If tests can be run in parallel, without side effects, then they are unit tests" - great definition!
    – Lovis
    Commented Jun 6, 2016 at 14:46
  • 2
    @DavidArno Great first sentence. Taking the extra hour to do things the right way will pay off not only with the maintainability benefits you describe, but also make it so doing the right thing only takes 45 extra minutes next time and eventually becomes second nature. Commented Jun 6, 2016 at 15:11
  • Thanks for all the contributions. I'm accepting this as the answer as it contains the clearest explanation of why it's a better idea to test only public methods.
    – Bob Tway
    Commented Jun 6, 2016 at 15:57
37

I would normally expect you to exercise the private member functions via the public interface. In this instance I would write different tests to feed different file contexts in, with different data sets present in order to exercise those methods.

I don't think your tests should know about those private methods. I think they're part of the implementation, and your test should be concerned with how your component functions, not how it's been implemented.

This class won't ever get used outside a small team at this office

You might be surprised what gets reused as soon as you make it public. An alternative is to accept that making it public will result in re-use, and then pull the methods out into a general utility class. That way you can test those methods separately (since you've publicised them) and since it's a public utility class, you're not making any implicit assumptions that they'll only be used in this scenario you're currently focused on.

3
  • 4
    That last paragraph is particularly valuable. If your implementation details are interesting enough and useful enough it is worthwhile to consider pulling them out to a separate location where they are public and can be tested separately.
    – YoungJohn
    Commented Jun 6, 2016 at 13:29
  • Basically what your last paragraph is saying is not just "the public parts of your interface need to be unit tested", but in fact also the reverse: "the parts of your interface that are unit tested need / are interesting enough to be public". I never thought of it this way, but my first hunch is to agree with this and view it as a strong guideline.
    – CompuChip
    Commented Jun 6, 2016 at 14:48
  • 1
    "No need to wear a seatbelt. Today's itinerary doesn't mention any drunks running red li..." Commented Jun 6, 2016 at 15:50
9

I think the issue comes from the design. My gut feeling says you either wrote the tests after the code, or that you already had a complete implementation in mind when you started writing the tests, and then you just shoehorned the tests to fit the design.

I think this type of trouble can be avoided by remembering the short TDD cycle of making a small assertion and then making it pass in the simplest way possible.

You speak of it being hard to exercise all the private methods via public methods. This is probably indicative of your class doing too much. If you can't construct it by adding test after test and just seeing the requirements being met and then refactoring it into maintainable and readable code, then you either don't have enough knowledge about the entity to implement it, or it's too complex (yeah yeah, it's a generalization, I know they're evil).

By looking at your method names it looks to me like this class has way too many responsibilities. It has several algorithm implementations, it manages threads, it even possibly reads files from disk. Split up your work into manageable reusable chunks. Matching/validation algorithms can easily be injected (as strategies, more more preferably imo, as delegates), thread management should probably occur at a higher level, etc.

When you no longer have a large complicated class with billions of responsibilities (well, more or less), the testing becomes almost trivial.

8
  • Thanks for the input. You're right that I wrote tests after the class - that's the way I normally work. TDD has value but I have issues with it (way outside the scope of the question). The class does less than you think: it's only 125 lines of code. It doesn't read files, and the actual distance algorithm is a separate class.
    – Bob Tway
    Commented Jun 6, 2016 at 10:23
  • 1
    Fair enough, I won't argue that point here since it's out of scope as you say. I hope the answer is useful to others though, it's a rule of thumb I like to use.
    – sara
    Commented Jun 6, 2016 at 12:39
  • 1
    it's still useful to me, and I appreciate the input :) got a +1.
    – Bob Tway
    Commented Jun 6, 2016 at 12:51
  • @MattThrower - To a certain extent you just contradicted yourself. Since the class is only 125 lines and does not do as much as we might think it does, then there should be no reason why it couldn't be thoroughly unit tested via a single public interface.
    – Cerad
    Commented Jun 6, 2016 at 13:23
  • 2
    +1 This is probably indicative of your class doing too much - this to me is what came jumping off the screen - an obvious violation of the SRP...
    – Robbie Dee
    Commented Jun 6, 2016 at 14:47
6

I agree with @BrianAgnew and @kai, but would like to add more than a comment.

While an IDedupeFiler (or whatever) should be tested through its public interface, the OP has decided there is value in testing the individual sub-routines. Irrespective of file size or line count (which is only a rough proxy count for class responsibilites), the OP has decided that there is too much complexity to test from the top of this class.

This is a good thing incidentally, one of the reasons people like TDD is that the need to test forces the coder to adapt (and improve) their design. It’s valid to point out that the earlier tests are written the earlier this process happens, but the OP isn’t at all far down the road of untestable design decisions and refactoring will not be onerous.

The question seems to be whether the OP should (1) make the methods public and achieve testability with less refactoring, or whether they should do something else. I’d agree with @kai and say the OP should (2) split this class up into isolated, separately testable chunks.

(1) breaks encapsulation and makes it less immediately obvious what the use and public interface of the class actually is. I think the OP acknowledges that this isn’t the best OOP design choice in their question. (2) means more classes, but I don’t think that’s much of a problem, and it provides testability without design compromises.

If you really disagree with me that your sub-methods represent separate and separately testable concerns, then don't try to dig into the class to test them. Exercise them through your top public method. How hard you find this will be a good indicator of whether this was the right choice.

1
  • 3
    Good points. I like the last sentence. People sometimes say "I chose an untestable design because it was better", but I don't buy that. An untestable design is inherently bad imo. All code needs to be tested. Doesn't matter if it's an automatic unit test runner, some big full-system tests or manual exploration tests or whatever. Teh code MUST be tested, so designing for testability is high prio. I'd rather have 10 well tested small classes than 3 large complex ones that cause brittle unmaintainable tests (or worse yet: force me to fire up IIS, compile all aspx files and launch my browser)
    – sara
    Commented Jun 6, 2016 at 12:44
3

I think you might benefit from a slight shift in the way you view unit tests. Instead of thinking about them as a way to guarantee that all your code is working, think of them as a way to guarantee that your public interface does what you claim it does.

In other words, don't worry about testing the internals at all - write unit tests that prove that when you give your class X inputs, you get Y outputs - that's it. How it manages to do that doesn't matter at all. In fact, the private methods could be totally wrong, useless, redundant, etc., and as long as the public interface does what it is supposed to do, from the point of view of the unit test, it is working.

This is important because you want to be able to go back in and refactor that code later when a new library comes out that does it better, or you realize you were doing unnecessary work, or you just decide that the way things are named and organized could be made more clear. If your tests are only around the public interface, then you are free to do that without worrying about breaking things.

If you make this shift in the way you think about unit tests and find that it is still hard to write the test for a given class, then it is a good sign that your design needs some improvements. Maybe the class should try to do less - maybe some of those private methods really belong in a new smaller class. Or maybe you need to use dependency injection to manage external dependencies.

In this case, without knowing more details about how you are doing this deduplication, I'd guess that the methods you want to make public might actually be better off as separate classes, or as public methods in a utility library instead. That way, you could define the interface for each one along with its range of allowed inputs and expected outputs, and test them in isolation from the deduplication script itself.

3

In addition to the other answers, there's another benefit to making these functions private and testing them via the public interface.

If you gather code coverage metrics on your code, it becomes easier to tell when a function is no longer being used. But, if you make all of these functions public, and make unit tests for them, then they will always have at least the unit test calling them even when nothing else does.

Consider this example:

public void foo() {
    bar();
    baz();
}

public void bar() { ... }

public void baz() { ... }

And then you make individual unit tests for foo, bar, and baz. So they will all be called by your unit test framework, and they will all have code coverage saying they're being used.

Now consider this change:

public void foo() {
    bar();
}

public void bar() { ... }

public void baz() { ... }

Since your unit tests still call all of these functions, your code coverage is going to say they're all being used. But baz is no longer being called by any other part of your software other than the unit tests. It's effectively dead code.

Had you written it like this:

public void foo() {
    bar();
}

private void bar() { ... }

private void baz() { ... }

Then you would lose 100% of your code coverage of baz when foo changed, and that's your signal to remove it from the code. OR, this raises a red flag because baz is actually a super important operation, and its omission causes other problems (hopefully if this is the case, then other unit tests would catch it, but perhaps not).

0

I would seperate the function and the test cases to seperate classes. The one contains the function the other contains the test cases which calls the function and asserts if the result is equals to what you expect. I'm not in C but in JAVA I would use Junit for this. This divides the logic and makes your class more readable. You also don't have to worry about your problem.

-1

You do not need to declare the methods public just to Unit-test them.

Unit test normally should belong to the same package where the class under testing belongs (of course, inside the different source code folder). As a result, also package private and protected methods can be accessed for testing. See Maven layout, for instance.

While it may not be worth writing very detailed tests for all internals, the visibility scope is not the only indicator to identify the unit.

1
  • @Downvoter, you can write Java program where main() is the only public method, rest can be easily made at least package private even for a relatively complex application spanning over ten classes or about. Is it so that such application does not require any Unit testing?
    – h22
    Commented Jun 8, 2016 at 6:25

Not the answer you're looking for? Browse other questions tagged or ask your own question.