85

I read about this snafu: Programming bug costs Citigroup $7m after legit transactions mistaken for test data for 15 years.

When the system was introduced in the mid-1990s, the program code filtered out any transactions with three-digit branch codes from 089 to 100, as those prefixes were reserved for testing purposes.

But in 1998, the company started using alphanumeric branch codes as it expanded its business. Among them were the codes 10B, 10C and so on, which the system treated as being within the excluded range, and so their transactions were removed from any reports sent to the SEC.

(I think this illustrates that using a non-explicit data indicator is ... sub-optimal. It would have been much better to populate and use a semantically explicit Branch.IsLive property.)
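A minimal sketch of that contrast (in Python rather than the system's original language; the `Branch.IsLive` idea comes from the question, everything else is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Branch:
    code: str
    is_live: bool   # set explicitly when the branch is created

def is_test_by_range(code: str) -> bool:
    # Implicit convention: whether a code is "test data" depends on where
    # it sorts relative to "089" and "100", which makes the meaning hostage
    # to the code format and to the platform's collation (EBCDIC vs ASCII).
    return "089" <= code <= "100"

def is_test_explicit(branch: Branch) -> bool:
    # Semantically explicit flag: immune to any change in the code format.
    return not branch.is_live
```

With the explicit flag, introducing alphanumeric codes cannot silently reclassify a live branch, because liveness is stored, not inferred.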

That aside, my first reaction was "Unit tests would have helped here"... but would they?

I recently read Why most unit testing is waste with interest, and so my question is: what would the unit tests that would have failed on the introduction of alphanumeric branch codes look like?

25
  • 15
  • 17
    It seems they also missed an integration test which checked the amount of transactions exported to SEC. If you build an export feature that would be a reasonable check. Commented Jul 14, 2016 at 7:14
  • 31
The author of the article doesn't seem to understand unit testing. Some claims are just plain ridiculous ("unit tests are unlikely to test more than one trillionth of the functionality of any given method"), others would destroy any chance of catching regressions ("look at the tests that have never failed in a year and consider throwing them away"). Or suggestions like "turn unit tests into assertions", which would trade failing tests for runtime exceptions?
    – vgru
    Commented Jul 14, 2016 at 13:24
  • 25
    @gnat I didn't read the external link, and I still found this question meaningful
    – Jeutnarg
    Commented Jul 14, 2016 at 16:06
  • 23
    For what it's worth, I pretty much disagree with everything in "Why Most Unit Testing is Waste." I'd write a rebuttal, but this margin is too small to contain it. Commented Jul 14, 2016 at 19:24

11 Answers

21

Are you really asking, "would unit tests have helped here?", or are you asking, "could any kind of tests possibly have helped here?".

The most obvious form of testing that would have helped is a precondition assertion in the code itself: that a branch identifier consists only of digits (supposing that this is the assumption the coder relied on when writing the code).

This could then have failed in some kind of integration test; as soon as the new alphanumeric branch IDs were introduced, the assertion would blow up. But that's not a unit test.
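Such a precondition assertion might look like this (a hypothetical sketch in Python, not the actual system's code):

```python
def transactions_for_report(transactions):
    """Filter out test-branch transactions before building the SEC report."""
    real = []
    for tx in transactions:
        branch = tx["branch"]
        # Precondition: the range filter below assumes purely numeric codes.
        # This fails loudly the moment an alphanumeric code like "10B" shows up,
        # instead of silently misclassifying it.
        assert branch.isdigit(), f"unexpected branch code format: {branch!r}"
        if not ("089" <= branch <= "100"):
            real.append(tx)
    return real
```

The assertion documents the assumption at the exact point where the code relies on it, so the 1998 format change would have surfaced immediately rather than 15 years later.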

Alternatively, there could be an integration test of the procedure that generates the SEC report. This test ensures that every real branch identifier reports its transactions (and therefore requires real-world input, a list of all branch identifiers in use). So that's not a unit test either.

I can't see any definition or documentation of the interfaces involved, but it may be that unit tests could not possibly have detected the error, because the unit was not faulty. If the unit is permitted to assume that branch identifiers consist only of digits, and the developers never decided what the code should do when they don't, then they should not write a unit test to enforce particular behaviour for non-digit identifiers. Such a test would reject a hypothetical valid implementation of the unit that handled alphanumeric branch identifiers correctly, and you don't usually want to write a unit test that prevents valid future implementations and extensions.

Or maybe a document written 40 years ago implicitly defined (via some lexicographical range in raw EBCDIC, rather than a more human-friendly collation rule) that 10B is a test identifier, because it does in point of fact fall between 089 and 100. Then 15 years ago someone decided to use it as a real identifier, so the "fault" doesn't lie in the unit that correctly implements the original definition: it lies in the process that failed to notice that 10B was defined to be a test identifier and therefore should not be assigned to a branch. The same would happen in ASCII if you defined 089 - 100 as a test range and then introduced an identifier 10$ or 1.0. It just happens that in EBCDIC the digits come after the letters.
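The collation point is easy to reproduce: Python ships an EBCDIC codec (`cp500`), so the byte-wise comparison the original host would have made can be sketched directly (a hypothetical reconstruction, not the actual code):

```python
def in_test_range(code: str, encoding: str) -> bool:
    # Compare branch codes the way the host system would: byte-wise,
    # in the system's native character encoding.
    lo, hi = "089".encode(encoding), "100".encode(encoding)
    return lo <= code.encode(encoding) <= hi

# In EBCDIC the digits sort AFTER the letters, so "10B" lands between
# "089" and "100"; in ASCII the same comparison puts it outside the range.
print(in_test_range("10B", "cp500"))   # EBCDIC: classified as a test branch
print(in_test_range("10B", "ascii"))   # ASCII: a real branch
```

The unit is doing exactly what a lexicographic range check does; only the choice of collation makes "10B" a test identifier.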

One unit test (or arguably a functional test) that conceivably might have saved the day is a test of the unit that generates or validates new branch identifiers. That test would assert that identifiers must contain only digits, and would be written in order to allow users of the branch identifiers to assume the same. Or perhaps there's a unit somewhere that imports real branch identifiers but never sees the test ones, and that could be unit-tested to ensure it rejects all test identifiers (if identifiers are only three characters we can enumerate them all, and compare the behaviour of the validator to that of the test-filter to ensure they match, which deals with the usual objection to spot-tests).

Then when somebody changed the rules, the unit test would have failed, since it contradicts the newly-required behaviour.
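Since identifiers are only three characters, the exhaustive cross-check described above is cheap to write (a sketch; the validator and filter below are hypothetical stand-ins, not Citigroup's code):

```python
from itertools import product
from string import ascii_uppercase, digits

def is_valid_branch_code(code: str) -> bool:
    # Hypothetical validator for newly issued identifiers: under the
    # original rules, only all-digit codes outside the test range are issued.
    return code.isdigit() and not ("089" <= code <= "100")

def is_filtered_as_test(code: str) -> bool:
    # Hypothetical report filter (the unit under suspicion).
    return "089" <= code <= "100"

# Exhaustively check every possible code: nothing that may be issued
# to a real branch is ever filtered out as test data.
for chars in product(digits + ascii_uppercase, repeat=3):
    code = "".join(chars)
    assert not (is_valid_branch_code(code) and is_filtered_as_test(code))
```

The moment the validator is widened to accept alphanumerics, codes such as "08A" (which the string-range filter still swallows) make this loop fail, flagging the mismatch between the two units.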

Since the test was there for a good reason, the point where you need to remove it due to changed business requirements becomes an opportunity for somebody to be given the job, "find every place in the code that relies on the behaviour that we want to change". Of course this is difficult and hence unreliable, so would by no means guarantee saving the day. But if you capture your assumptions in tests of the units that you're assuming properties of, then you've given yourself a chance and so the effort is not wholly wasted.

I agree of course that if the unit hadn't been defined in the first place with a "funny-shaped" input, then there would be nothing to test. Fiddly namespace divisions can be hard to test properly because the difficulty doesn't lie in implementing your funny definition, it lies in making sure that everyone understands and respects your funny definition. That is not a local property of one code unit. Furthermore, changing some data type from "a string of digits" to "a string of alphanumerics" is akin to making an ASCII-based program handle Unicode: it will not be simple if your code is heavily coupled to the original definition, and when the data type is fundamental to what the program does then it often is heavily coupled.

"it’s a bit disturbing to think it’s largely wasted effort"

If your unit tests sometimes fail (while you're refactoring, for example), and in doing so give you useful information (your change is wrong, for example), then the effort wasn't wasted. What they don't do, is test whether your system works. So if you're writing unit tests instead of having functional and integration tests then you may be using your time sub-optimally.

5
  • Assertions are good!
    – user186205
    Commented Jul 15, 2016 at 16:27
  • 3
    @nocomprende: as Reagan had it, "trust, but verify". Commented Jul 15, 2016 at 17:26
  • 1
I was going to also say "Unit tests bad!" but I thought most people would miss the reference to Animal Farm and start actually criticizing me instead of thinking about what I was saying (knee-jerk responses are ineffective), so I didn't say it. Maybe a person who is smarter and has greater erudition can make that point.
    – user186205
    Commented Jul 15, 2016 at 17:40
  • 2
    "All the tests are passing, but some tests are MORE passing than others!"
    – GHP
    Commented Jul 18, 2016 at 12:18
  • 1
Testing is a red herring. These guys just did not know how "branch code" was defined. This would be like the US Post Office not knowing that it was changing the definition of zip code when it added 4 digits.
    – radarbob
    Commented Jul 18, 2016 at 13:33
118

Unit tests could have caught that the branch codes 10B and 10C were incorrectly classified as "testing branches", but I find it unlikely that the tests for that branch classification would have been extensive enough to catch that error.

On the other hand, spot checks of the generated reports could have revealed that branches 10B and 10C were consistently missing from the reports, long before the 15 years for which the bug was actually allowed to persist.
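Such a spot check can be automated as a report-completeness test; a minimal sketch, assuming hypothetical data shapes:

```python
def missing_from_report(live_branches, report_rows):
    """Return live branches that never show up in the generated report."""
    reported = {row["branch"] for row in report_rows}
    return sorted(set(live_branches) - reported)

# Branches 10B and 10C transacted, but the filter dropped their rows:
missing = missing_from_report(
    live_branches=["010", "10B", "10C", "200"],
    report_rows=[{"branch": "010"}, {"branch": "200"}],
)
print(missing)  # → ['10B', '10C']
```

The key ingredient is the authoritative list of live branches, which comes from outside the report pipeline; that is what makes this a check of the system rather than of a single unit.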

Finally, this is a good illustration of why it is a bad idea to mix testing data with real production data in one database. If they had used a separate database for the testing data, there would have been no need to filter it out of the official reports, and it would have been impossible to filter out too much.

16
  • 80
    +1 Unit tests can never compensate for poor design decisions (like mixing test and real data)
    – Jeutnarg
    Commented Jul 14, 2016 at 16:06
  • 6
    While it is best to avoid mixing test data with real data, it can be hard to validate a production system if that requires modifying real data. For example, it's a bad idea to validate a banking system by modifying bank accounts totals in production. Using code ranges to designate meaning is problematic. A more explicit attribute of the records would probably have been a better choice.
    – JimmyJames
    Commented Jul 14, 2016 at 16:40
  • 4
    @Voo I think there is a tacit assumption that there exists a level of complexity or requirement of reliability where testing the actual, deployed production system is considered worthwhile or necessary. (Consider how much could go wrong because of a wrong configuration variable.) I could see this being the case for a large financial institution.
    – jpmc26
    Commented Jul 14, 2016 at 20:39
  • 4
    @Voo I'm not talking about testing. I'm talking about validation of the system. In a real production system, there are many ways that it can fail that have nothing to do with code. If you are putting a new banking system into production, you might have some issue in the db or network etc. that is preventing transactions from being applied to the accounts. I've never worked at a bank but I'm pretty sure it's frowned upon to start modifying real accounts with phony transactions. So that leaves you with either setting up fake accounts or wait and pray.
    – JimmyJames
    Commented Jul 14, 2016 at 20:41
  • 12
    @JimmyJames In healthcare it's common to periodically copy the production database into the test environment to perform testing on data that's as close to real as possible; I'd think a bank can do the same.
    – dj18
    Commented Jul 14, 2016 at 21:54
76

The software had to handle certain business rules. If there were unit tests, the unit tests would have checked that the software handled the business rules correctly.

The business rules changed.

Apparently nobody realised that the business rules had changed, and nobody changed the software to apply the new business rules. If there had been unit tests, those unit tests would have to be changed, but nobody would have done that because nobody realised that the business rules had changed.

So no, unit tests wouldn't have caught that.

The exception would be if the unit tests and the software had been created by independent teams, and the team doing the unit tests changed the tests to apply the new business rules. Then the unit tests would have failed, which hopefully would have resulted in a change of the software.

Of course, in the same case, if only the software was changed and not the unit tests, then the unit tests would also fail. Whenever a unit test fails, it doesn't mean the software is wrong; it means that either the software or the unit test (or sometimes both) is wrong.

11
  • 2
Is it feasible to have different teams where one is working on the code and the other on "unit" tests? How is that even possible?... I'm refactoring my code all the time.
    – Sergio
    Commented Jul 14, 2016 at 11:50
  • 2
    @Sergio from one perspective, refactoring changes internals while preserving behavior - so if the test is written in a way that tests behavior without relying on internals, then it doesn't need updating.
    – Daenyth
    Commented Jul 14, 2016 at 12:39
  • 1
    I've seen this happen a number of times. Software is in production with no complaints, then all of a sudden users explode with complaints that it no longer works and has been gradually failing over the years. That's what happens when you decide to go and change your internal procedures without following the standard notification process... Commented Jul 14, 2016 at 14:50
  • 42
    "The business rules changed" is the critical observation. Unit tests validate that you implemented the logic you thought you implemented, not that your logic was correct. Commented Jul 14, 2016 at 18:14
  • 5
    If I'm correct about what happened, it's unlikely that unit tests to catch this would be written. The basic principle for selecting tests is to test some "good" cases, some "bad" cases, and cases bracketing any boundaries. In this case, you'd test "099", "100", and "101". Since "10B" was covered by the "reject non-numbers" tests under the old system, and is greater than 101 (and so is covered by testing that) under the new system, there's no reason to test it -- except that in EBCDIC, "10B" sorts between "099" and "100".
    – Mark
    Commented Jul 14, 2016 at 18:47
30

No. This is one of the big problems with unit tests: they lull you into a false sense of security.

If all your tests pass, it doesn't mean your system is working right; it means all your tests are passing. It means that the parts of your design that you consciously thought about and wrote tests for are working as you consciously thought they would, which really isn't that much of a big deal anyway: that was the stuff that you were actually paying close attention to, so it's very likely you got it right anyway! But it does nothing to catch cases you never thought of, such as this one, because you never thought to write a test for them. (And if you had, you'd have been aware that that meant code changes were necessary, and you'd have changed them.)

19
  • 17
    My father used to ask me that: Why didn't you think of the thing you didn't think of? (Only he used to make it confusing by saying "If you don't know, ask!") But how do I know that I don't know?
    – user186205
    Commented Jul 14, 2016 at 13:28
  • 7
    "It means that the parts of your design that you consciously thought about and wrote tests for are working as you consciously thought they would." Exactly right. This information is invaluable if you are refactoring, or if something changes somewhere else in the system that breaks your assumptions. Developers who are lulled into a false sense of security simply do not understand the limitations of unit testing, but that doesn't render unit testing a useless tool. Commented Jul 15, 2016 at 15:09
  • 12
    @MasonWheeler: Like you, the author thinks that unit testing is somehow supposed to prove that your program works. It doesn't. Let me repeat that: unit testing does not prove that your program works. Unit testing proves that your methods fulfill your testing contract, and that's all it does. The rest of the paper falls down, because it rests on that single invalid premise. Commented Jul 15, 2016 at 15:18
  • 5
    Naturally, developers who have that false belief are going to be disappointed when unit testing utterly fails them, but that's the fault of the developer, not of unit testing, and it doesn't invalidate the genuine value that unit testing provides. Commented Jul 15, 2016 at 15:22
  • 5
    o_O @ your first sentence. Unit tests give you a false sense of security while coding like having your hands on the steering wheel gives you a false sense of security while driving.
    – djechlin
    Commented Jul 15, 2016 at 22:45
10

No, not necessarily.

The original requirement was to use numeric branch codes, so a unit test would have been produced for a component that accepted various codes and rejected any like 10B. The system would have been passed as working (which it was).

Then, the requirement would have changed and the codes updated, but this would have meant the unit test code that supplied the bad data (that is now good data) would have to be modified.

Now we assume that the people managing the system would know this was the case and would change the unit test to handle the new codes... but if they knew that was occurring, they would also have known to change the code that handled those codes anyway... and they didn't do that. A unit test that originally rejected code 10B would have happily said "everything is fine here" when run, if you didn't know to update that test.

Unit testing is good for original development but not for system testing, especially not 15 years after the requirements have long since been forgotten.

What they need in this kind of situation is an end-to-end integration test: one where you can pass in the data you expect to work and see if it does. Someone would have noticed that their new input data didn't produce a report and would then have investigated further.
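A sketch of such an end-to-end check (the pipeline is hypothetical; the legacy range rule is compared byte-wise in EBCDIC via Python's `cp500` codec to reproduce the fault):

```python
def sec_report(transactions):
    # Hypothetical report pipeline with the legacy rule, compared byte-wise
    # in EBCDIC (cp500) as the original host would have done.
    lo, hi = "089".encode("cp500"), "100".encode("cp500")
    def is_test(branch):
        return lo <= branch.encode("cp500") <= hi
    return [tx for tx in transactions if not is_test(tx["branch"])]

def missing_real_branches(transactions):
    # End-to-end check: every branch we fed in as real data must surface
    # in the generated report. A non-empty result means data went missing.
    fed = {tx["branch"] for tx in transactions}
    reported = {tx["branch"] for tx in sec_report(transactions)}
    return fed - reported

print(missing_real_branches([{"branch": "101"}, {"branch": "10B"}]))  # → {'10B'}
```

Because the check compares the pipeline's output against its own input, it catches the dropped branch no matter which unit inside the pipeline swallowed it.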

1
  • Spot on. And the main (only?) problem with unit tests. Saved me wording my own answer, as I'd have said precisely the same thing (but probably worse!) :) Commented Jul 15, 2016 at 23:04
8

Type testing (the process of testing invariants using randomly-generated valid data, as exemplified by the Haskell testing library QuickCheck and various ports/alternatives inspired by it in other languages) may well have caught this issue, unit testing almost certainly would not have done.

This is because when the rules for the validity of branch codes were updated, it is unlikely anybody would have thought to test those specific ranges to ensure they worked correctly.

However, if type testing had been in use, somebody should have written a pair of properties at the time the original system was implemented: one to check that the specific codes for test branches were treated as test data, and one to check that no other codes were. When the data type definition for the branch code was updated (which would have been required in order to allow testing that the change from digit to alphanumeric branch codes worked), this test would have started generating values in the new range and would most likely have identified the fault.
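A sketch of that pair of properties, using plain random generation to stand in for a QuickCheck-style library (the code under test is a hypothetical reconstruction):

```python
import random
import string

TEST_CODES = {f"{n:03d}" for n in range(89, 101)}   # "089" .. "100"

def is_test_branch(code: str) -> bool:
    # Unit under test: the legacy lexicographic range check.
    return "089" <= code <= "100"

def random_branch_code(rng: random.Random) -> str:
    # Generator driven by the *current* data-type definition; widening the
    # alphabet here is what makes the properties probe the new codes.
    alphabet = string.digits + string.ascii_uppercase
    return "".join(rng.choice(alphabet) for _ in range(3))

# The pair of properties: every designated test code is classified as test
# data, and no other code is. Collect every counterexample found.
rng = random.Random(0)
samples = {random_branch_code(rng) for _ in range(10_000)}
mismatches = sorted(c for c in samples if is_test_branch(c) != (c in TEST_CODES))
print(mismatches[:5])   # misclassified codes, e.g. letter-suffixed 08x/09x ones
```

As soon as the generator's alphabet includes letters, random codes such as "08A" land inside the string range without being designated test codes, and the property check reports them.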

Of course, QuickCheck was first released in 1999, so it came too late to catch this issue.

1
  • 1
    I think it's more normal to call this property based testing, and of course it's just as possible to write a property based test that would still pass given this change (though I do think you are more likely to write a test that could find it)
    – jk.
    Commented Jul 14, 2016 at 9:15
5

I really doubt unit testing would have made a difference to this problem. It sounds like one of those tunnel-vision situations: functionality was changed to support new branch codes, but the change was not carried through all areas of the system.

We use unit testing to design a class. Re-running a unit test is only required if the design has changed. If a particular unit does not change, then the unchanged unit tests will return the same results as before. Unit tests are not going to show you the impacts of changes to other units (if they do you are not writing unit tests).

You can only reasonably detect this problem via:

  • Integration tests - but you would have to specifically add the new code formats to feed through multiple units in the system (i.e. they would only show you the problem if the original tests included the now-valid branches)
  • End-to-end testing - the business should run an end-to-end test that incorporated old and new branch code formats

Not having sufficient end-to-end testing is more worrying. You cannot rely on unit testing as your ONLY or MAIN test for system changes. Sounds like it only required someone to run a report on the newly supported branch code formats.

3

The takeaway from this is to Fail Fast.

We don't have the code, nor do we have many examples of prefixes that are or are not test branch prefixes according to the code. All we have is this:

  • 089 - 100 => test branch
  • 10B, 10C => test branch
  • < 089 => presumably real branches
  • > 100 => presumably real branches

The fact that the code allows numbers and strings is more than a little strange. Of course, 10B and 10C can be considered hex numbers, but if the prefixes are all treated as hex numbers, 10B and 10C fall outside of the test range and would be treated as real branches.

This likely means that the prefix is stored as a string but treated as a number in some cases. Here is the simplest code I can think of that replicates this behavior (using C# for illustrative purposes):

bool IsTest(string strPrefix) {
    int iPrefix;
    if(int.TryParse(strPrefix, out iPrefix))
        return iPrefix >= 89 && iPrefix <= 100;
    return true; //here is the problem
}

In English, if the string is a number and is between 89 and 100, it's a test. If it's not a number, it's a test. Otherwise it's not a test.

If the code follows this pattern, no unit test would have caught this at the time the code was deployed. Here are some example unit tests:

assert.isFalse(IsTest("088"))
assert.isTrue(IsTest("089"))
assert.isTrue(IsTest("095"))
assert.isTrue(IsTest("100"))
assert.isFalse(IsTest("101"))
assert.isTrue(IsTest("10B")) // <--- business rule change

The unit test shows that "10B" should be treated as a test branch. User @gnasher729 above says that the business rules changed and that's what the last assertion above shows. At some point that assert should have switched to an isFalse, but that didn't happen. Unit tests get run at development- and build-time but then at no point afterwards.


What's the lesson here? The code needs some way to signal that it received unexpected input. Here is an alternative way to write this code that emphasizes that it expects the prefix to be a number:

// Alternative A
bool TryGetIsTest(string strPrefix, out bool isTest) {
    int iPrefix;
    if(int.TryParse(strPrefix, out iPrefix)) {
        isTest = iPrefix >= 89 && iPrefix <= 100;
        return true;
    }
    isTest = true; //this is just some value that won't be read
    return false;
}

For those who don't know C#, the return value indicates whether or not the code was able to parse a prefix from the given string. If the return value is true, the calling code can use the isTest out variable to check if the branch prefix is a test prefix. If the return value is false, the calling code should report that the given prefix is not expected, and the isTest out variable is meaningless and should be ignored.

If you are ok with exceptions, you can do this instead:

// Alternative B
bool IsTest(string strPrefix) {
    int iPrefix = int.Parse(strPrefix);
    return iPrefix >= 89 && iPrefix <= 100;
}

This alternative is more straightforward. In this case, the calling code needs to catch the exception. In either case, the code should have some way of reporting to the caller that it didn't expect a strPrefix that couldn't be converted to an integer. That way the code fails fast, and the bank can quickly find the problem without the embarrassment of an SEC fine.

2

An assertion built-in to the run-time might have helped; for example:

  1. Create a function like bool isTestOnly(string branchCode) { ... }
  2. Use this function to decide which reports to filter out
  3. Re-use that function in an assertion, in the branch-creation code, to verify or assert that a branch isn't (can't be) created using this type of branch code ‼
  4. Have this assertion enabled in the real run-time (and not "optimized away except in the debug-only developer version of the code") ‼
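A minimal sketch of that reuse (the `isTestOnly` name follows step 1, rendered in Python; everything else is hypothetical, and the check is a plain runtime exception so it cannot be compiled away):

```python
def is_test_only(branch_code: str) -> bool:
    # Single shared definition of "test branch", used by both the
    # report filter and the branch-creation path.
    return "089" <= branch_code <= "100"

def filter_report(rows):
    # Step 2: the report filter uses the shared definition.
    return [r for r in rows if not is_test_only(r["branch"])]

def create_branch(branch_code: str):
    # Steps 3-4: a run-time check, enabled in production, that a real
    # branch can never be created with a code the report filter discards.
    if is_test_only(branch_code):
        raise ValueError(f"branch code {branch_code!r} is reserved for testing")
    return {"branch": branch_code}
```

Because both paths share one definition, any branch code the filter would drop is rejected at creation time, where the mistake is cheap to see and fix.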


1

So many answers, and not even one Dijkstra quote:

Testing shows the presence, not the absence of bugs.

Therefore, it depends. If the code was tested properly, most likely this bug would not exist.

-1

I think a unit test here would have ensured the problem never existed in the first place.

Consider, you have written the bool IsTestData(string branchCode) function.

The first unit tests you write should be for null and the empty string, then for strings of incorrect length, then for non-integer strings.

To make all those tests pass you will have to add parameter checking to the function.

Even if you then only test for 'good' data (001 -> 999), never thinking about the possibility of 10A, the parameter checking will force you to rewrite the function when you start using alphanumerics, in order to avoid the exceptions it will throw.
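A sketch of how those parameter checks force the failure (the `IsTestData` name comes from the answer, rendered in Python; the checks themselves are hypothetical):

```python
def is_test_data(branch_code: str) -> bool:
    # Parameter checks written to satisfy the edge-case unit tests first:
    if not branch_code:
        raise ValueError("branch code must be non-empty")
    if len(branch_code) != 3:
        raise ValueError("branch code must be exactly three characters")
    if not branch_code.isdigit():
        # This is the check that forces a rewrite (a loud production error
        # rather than silent misclassification) when alphanumerics arrive.
        raise ValueError(f"branch code must be numeric: {branch_code!r}")
    return 89 <= int(branch_code) <= 100
```

The day "10B" shows up, the function throws instead of quietly treating it as test data, so the format change cannot go unnoticed for 15 years.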

11
  • 1
    This would not have helped - the function was not changed, and the test would not start failing given the same test data. Someone would have had to think of changing the test in order to make it fail, but if they had thought of that, they would have probably thought of changing the function as well.
    – Hulk
    Commented Jul 15, 2016 at 7:45
  • (Or perhaps I'm missing something, as I'm not sure what you mean by "parameter checking")
    – Hulk
    Commented Jul 15, 2016 at 7:51
The function would be forced to throw an exception for non-integer strings in order to pass the simple edge-case unit test. Hence production code would error if you started to use alphanumeric branch codes without specifically programming for them
    – Ewan
    Commented Jul 15, 2016 at 7:58
  • But wouldn't the function have used some IsValidBranchCode-function for performing this check? And this function would probably have been changed without any need to modify the IsTestData? So if you are only testing 'good data' the test would not have helped. The edge case test would have had to include some now-valid branch code (and not simply some still invalid ones) in order to start failing.
    – Hulk
    Commented Jul 15, 2016 at 8:11
  • 1
If the check is in IsValidCode, so that the function passes without its own explicit check, then yes, it's possible to miss it. But then we would have an extra set of even more tests, mock validators, etc., and even more chances for specific "these are test numbers" checks
    – Ewan
    Commented Jul 15, 2016 at 8:40
