85

I read about this snafu: Programming bug costs Citigroup $7m after legit transactions mistaken for test data for 15 years.

When the system was introduced in the mid-1990s, the program code filtered out any transactions with three-digit branch codes from 089 to 100, as those prefixes were reserved for testing purposes.

But in 1998, the company started using alphanumeric branch codes as it expanded its business. Among them were the codes 10B, 10C and so on, which the system treated as being within the excluded range, and so their transactions were removed from any reports sent to the SEC.

(I think this illustrates that using a non-explicit data indicator is ... sub-optimal. It would have been much better to populate and use a semantically explicit Branch.IsLive property.)
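A minimal sketch of that contrast (in Python rather than the system's original language; the `Branch.IsLive` idea comes from the question, everything else is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Branch:
    code: str
    is_live: bool   # set explicitly when the branch is created

def is_test_by_range(code: str) -> bool:
    # Implicit convention: whether a code is "test data" depends on where
    # it sorts relative to "089" and "100", which makes the meaning hostage
    # to the code format and to the platform's collation (EBCDIC vs ASCII).
    return "089" <= code <= "100"

def is_test_explicit(branch: Branch) -> bool:
    # Semantically explicit flag: immune to any change in the code format.
    return not branch.is_live
```

With the explicit flag, introducing alphanumeric codes cannot silently reclassify a live branch, because liveness is stored, not inferred.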

That aside, my first reaction was "Unit tests would have helped here"... but would they?

I recently read Why most unit testing is waste with interest, and so my question is: what would the unit tests that would have failed on the introduction of alphanumeric branch codes look like?

25
  • 15
  • 17
    It seems they also missed an integration test which checked the amount of transactions exported to SEC. If you build an export feature that would be a reasonable check. Commented Jul 14, 2016 at 7:14
  • 31
The author of the article doesn't seem to understand unit testing. Some claims are just plain ridiculous ("unit tests are unlikely to test more than one trillionth of the functionality of any given method"), others would destroy any chance of catching regressions ("look at the tests that have never failed in a year and consider throwing them away"). Or suggestions like "turn unit tests into assertions", which would trade failing tests for runtime exceptions?
    – vgru
    Commented Jul 14, 2016 at 13:24
  • 25
    @gnat I didn't read the external link, and I still found this question meaningful
    – Jeutnarg
    Commented Jul 14, 2016 at 16:06
  • 23
    For what it's worth, I pretty much disagree with everything in "Why Most Unit Testing is Waste." I'd write a rebuttal, but this margin is too small to contain it. Commented Jul 14, 2016 at 19:24

11 Answers

21

Are you really asking, "would unit tests have helped here?", or are you asking, "could any kind of tests possibly have helped here?".

The most obvious form of testing that would have helped is a precondition assertion in the code itself: that a branch identifier consists only of digits (supposing that this is the assumption the coder relied on when writing the code).

This could then have failed in some kind of integration test; as soon as the new alphanumeric branch IDs were introduced, the assertion would blow up. But that's not a unit test.
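Such a precondition assertion might look like this (a hypothetical sketch in Python, not the actual system's code):

```python
def transactions_for_report(transactions):
    """Filter out test-branch transactions before building the SEC report."""
    real = []
    for tx in transactions:
        branch = tx["branch"]
        # Precondition: the range filter below assumes purely numeric codes.
        # This fails loudly the moment an alphanumeric code like "10B" shows up,
        # instead of silently misclassifying it.
        assert branch.isdigit(), f"unexpected branch code format: {branch!r}"
        if not ("089" <= branch <= "100"):
            real.append(tx)
    return real
```

The assertion documents the assumption at the exact point where the code relies on it, so the 1998 format change would have surfaced immediately rather than 15 years later.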

Alternatively, there could be an integration test of the procedure that generates the SEC report. This test ensures that every real branch identifier reports its transactions (and therefore requires real-world input, a list of all branch identifiers in use). So that's not a unit test either.

I can't see any definition or documentation of the interfaces involved, but it may be that unit tests could not possibly have detected the error, because the unit was not faulty. If the unit is permitted to assume that branch identifiers consist only of digits, and the developers never decided what the code should do when they don't, then they should not write a unit test to enforce particular behaviour for non-digit identifiers. Such a test would reject a hypothetical valid implementation of the unit that handled alphanumeric branch identifiers correctly, and you don't usually want to write a unit test that prevents valid future implementations and extensions.

Or maybe a document written 40 years ago implicitly defined (via some lexicographical range in raw EBCDIC, rather than a more human-friendly collation rule) that 10B is a test identifier, because it does in point of fact fall between 089 and 100. Then 15 years ago someone decided to use it as a real identifier, so the "fault" doesn't lie in the unit that correctly implements the original definition: it lies in the process that failed to notice that 10B was defined to be a test identifier and therefore should not be assigned to a branch. The same would happen in ASCII if you defined 089 - 100 as a test range and then introduced an identifier 10$ or 1.0. It just happens that in EBCDIC the digits come after the letters.
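The collation point is easy to reproduce: Python ships an EBCDIC codec (`cp500`), so the byte-wise comparison the original host would have made can be sketched directly (a hypothetical reconstruction, not the actual code):

```python
def in_test_range(code: str, encoding: str) -> bool:
    # Compare branch codes the way the host system would: byte-wise,
    # in the system's native character encoding.
    lo, hi = "089".encode(encoding), "100".encode(encoding)
    return lo <= code.encode(encoding) <= hi

# In EBCDIC the digits sort AFTER the letters, so "10B" lands between
# "089" and "100"; in ASCII the same comparison puts it outside the range.
print(in_test_range("10B", "cp500"))   # EBCDIC: classified as a test branch
print(in_test_range("10B", "ascii"))   # ASCII: a real branch
```

The unit is doing exactly what a lexicographic range check does; only the choice of collation makes "10B" a test identifier.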

One unit test (or arguably a functional test) that conceivably might have saved the day is a test of the unit that generates or validates new branch identifiers. That test would assert that identifiers must contain only digits, and would be written in order to allow users of the branch identifiers to assume the same. Or perhaps there's a unit somewhere that imports real branch identifiers but never sees the test ones, and that could be unit-tested to ensure it rejects all test identifiers (if identifiers are only three characters we can enumerate them all, and compare the behaviour of the validator to that of the test-filter to ensure they match, which deals with the usual objection to spot-tests).

Then when somebody changed the rules, the unit test would have failed, since it contradicts the newly-required behaviour.
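Since identifiers are only three characters, the exhaustive cross-check described above is cheap to write (a sketch; the validator and filter below are hypothetical stand-ins, not Citigroup's code):

```python
from itertools import product
from string import ascii_uppercase, digits

def is_valid_branch_code(code: str) -> bool:
    # Hypothetical validator for newly issued identifiers: under the
    # original rules, only all-digit codes outside the test range are issued.
    return code.isdigit() and not ("089" <= code <= "100")

def is_filtered_as_test(code: str) -> bool:
    # Hypothetical report filter (the unit under suspicion).
    return "089" <= code <= "100"

# Exhaustively check every possible code: nothing that may be issued
# to a real branch is ever filtered out as test data.
for chars in product(digits + ascii_uppercase, repeat=3):
    code = "".join(chars)
    assert not (is_valid_branch_code(code) and is_filtered_as_test(code))
```

The moment the validator is widened to accept alphanumerics, codes such as "08A" (which the string-range filter still swallows) make this loop fail, flagging the mismatch between the two units.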

Since the test was there for a good reason, the point where you need to remove it due to changed business requirements becomes an opportunity for somebody to be given the job, "find every place in the code that relies on the behaviour that we want to change". Of course this is difficult and hence unreliable, so would by no means guarantee saving the day. But if you capture your assumptions in tests of the units that you're assuming properties of, then you've given yourself a chance and so the effort is not wholly wasted.

I agree of course that if the unit hadn't been defined in the first place with a "funny-shaped" input, then there would be nothing to test. Fiddly namespace divisions can be hard to test properly because the difficulty doesn't lie in implementing your funny definition, it lies in making sure that everyone understands and respects your funny definition. That is not a local property of one code unit. Furthermore, changing some data type from "a string of digits" to "a string of alphanumerics" is akin to making an ASCII-based program handle Unicode: it will not be simple if your code is heavily coupled to the original definition, and when the data type is fundamental to what the program does then it often is heavily coupled.

"it’s a bit disturbing to think it’s largely wasted effort"

If your unit tests sometimes fail (while you're refactoring, for example), and in doing so give you useful information (your change is wrong, for example), then the effort wasn't wasted. What they don't do, is test whether your system works. So if you're writing unit tests instead of having functional and integration tests then you may be using your time sub-optimally.

5
  • Assertions are good!
    – user186205
    Commented Jul 15, 2016 at 16:27
  • 3
    @nocomprende: as Reagan had it, "trust, but verify". Commented Jul 15, 2016 at 17:26
  • 1
I was going to also say "Unit tests bad!" but I thought most people would miss the reference to Animal Farm and start actually criticizing me instead of thinking about what I was saying (knee-jerk responses are ineffective), so I didn't say it. Maybe a person who is smarter and has greater erudition can make that point.
    – user186205
    Commented Jul 15, 2016 at 17:40
  • 2
    "All the tests are passing, but some tests are MORE passing than others!"
    – GHP
    Commented Jul 18, 2016 at 12:18
  • 1
Testing is a red herring. These guys just did not know how "branch code" was defined. This would be like the US Post Office not knowing that it was changing the definition of zip code when it added 4 digits.
    – radarbob
    Commented Jul 18, 2016 at 13:33
118

Unit tests could have caught that the branch codes 10B and 10C were incorrectly classified as "testing branches", but I find it unlikely that the tests for that branch classification would have been extensive enough to catch that error.

On the other hand, spot checks of the generated reports could have revealed that branches 10B and 10C were consistently missing from the reports, long before the 15 years for which the bug was actually allowed to persist.
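Such a spot check can be automated as a report-completeness test; a minimal sketch, assuming hypothetical data shapes:

```python
def missing_from_report(live_branches, report_rows):
    """Return live branches that never show up in the generated report."""
    reported = {row["branch"] for row in report_rows}
    return sorted(set(live_branches) - reported)

# Branches 10B and 10C transacted, but the filter dropped their rows:
missing = missing_from_report(
    live_branches=["010", "10B", "10C", "200"],
    report_rows=[{"branch": "010"}, {"branch": "200"}],
)
print(missing)  # → ['10B', '10C']
```

The key ingredient is the authoritative list of live branches, which comes from outside the report pipeline; that is what makes this a check of the system rather than of a single unit.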

Finally, this is a good illustration of why it is a bad idea to mix testing data with real production data in one database. If they had used a separate database for the testing data, there would have been no need to filter it out of the official reports, and it would have been impossible to filter out too much.

16
  • 80
    +1 Unit tests can never compensate for poor design decisions (like mixing test and real data)
    – Jeutnarg
    Commented Jul 14, 2016 at 16:06
  • 6
    While it is best to avoid mixing test data with real data, it can be hard to validate a production system if that requires modifying real data. For example, it's a bad idea to validate a banking system by modifying bank accounts totals in production. Using code ranges to designate meaning is problematic. A more explicit attribute of the records would probably have been a better choice.
    – JimmyJames
    Commented Jul 14, 2016 at 16:40
  • 4
    @Voo I think there is a tacit assumption that there exists a level of complexity or requirement of reliability where testing the actual, deployed production system is considered worthwhile or necessary. (Consider how much could go wrong because of a wrong configuration variable.) I could see this being the case for a large financial institution.
    – jpmc26
    Commented Jul 14, 2016 at 20:39
  • 4
    @Voo I'm not talking about testing. I'm talking about validation of the system. In a real production system, there are many ways that it can fail that have nothing to do with code. If you are putting a new banking system into production, you might have some issue in the db or network etc. that is preventing transactions from being applied to the accounts. I've never worked at a bank but I'm pretty sure it's frowned upon to start modifying real accounts with phony transactions. So that leaves you with either setting up fake accounts or wait and pray.
    – JimmyJames
    Commented Jul 14, 2016 at 20:41
  • 12
    @JimmyJames In healthcare it's common to periodically copy the production database into the test environment to perform testing on data that's as close to real as possible; I'd think a bank can do the same.
    – dj18
    Commented Jul 14, 2016 at 21:54
76

The software had to handle certain business rules. If there were unit tests, the unit tests would have checked that the software handled the business rules correctly.

The business rules changed.

Apparently nobody realised that the business rules had changed, and nobody changed the software to apply the new business rules. If there had been unit tests, those unit tests would have to be changed, but nobody would have done that because nobody realised that the business rules had changed.

So no, unit tests wouldn't have caught that.

The exception would be if the unit tests and the software had been created by independent teams, and the team doing the unit tests changed the tests to apply the new business rules. Then the unit tests would have failed, which hopefully would have resulted in a change of the software.

Of course, in the same case, if only the software was changed and not the unit tests, then the unit tests would also fail. Whenever a unit test fails, it doesn't mean the software is wrong; it means that either the software or the unit test (or sometimes both) is wrong.

11
  • 2
Is it feasible to have different teams where one is working on the code and the other on "unit" tests? How is that even possible?... I'm refactoring my code all the time.
    – Sergio
    Commented Jul 14, 2016 at 11:50
  • 2
    @Sergio from one perspective, refactoring changes internals while preserving behavior - so if the test is written in a way that tests behavior without relying on internals, then it doesn't need updating.
    – Daenyth
    Commented Jul 14, 2016 at 12:39
  • 1
    I've seen this happen a number of times. Software is in production with no complaints, then all of a sudden users explode with complaints that it no longer works and has been gradually failing over the years. That's what happens when you decide to go and change your internal procedures without following the standard notification process... Commented Jul 14, 2016 at 14:50
  • 42
    "The business rules changed" is the critical observation. Unit tests validate that you implemented the logic you thought you implemented, not that your logic was correct. Commented Jul 14, 2016 at 18:14
  • 5
    If I'm correct about what happened, it's unlikely that unit tests to catch this would be written. The basic principle for selecting tests is to test some "good" cases, some "bad" cases, and cases bracketing any boundaries. In this case, you'd test "099", "100", and "101". Since "10B" was covered by the "reject non-numbers" tests under the old system, and is greater than 101 (and so is covered by testing that) under the new system, there's no reason to test it -- except that in EBCDIC, "10B" sorts between "099" and "100".
    – Mark
    Commented Jul 14, 2016 at 18:47
30

No. This is one of the big problems with unit tests: they lull you into a false sense of security.

If all your tests pass, it doesn't mean your system is working right; it means all your tests are passing. It means that the parts of your design that you consciously thought about and wrote tests for are working as you consciously thought they would, which really isn't that much of a big deal anyway: that was the stuff that you were actually paying close attention to, so it's very likely you got it right anyway! But it does nothing to catch cases you never thought of, such as this one, because you never thought to write a test for them. (And if you had, you'd have been aware that that meant code changes were necessary, and you'd have changed them.)

19
  • 17
    My father used to ask me that: Why didn't you think of the thing you didn't think of? (Only he used to make it confusing by saying "If you don't know, ask!") But how do I know that I don't know?
    – user186205
    Commented Jul 14, 2016 at 13:28
  • 7
    "It means that the parts of your design that you consciously thought about and wrote tests for are working as you consciously thought they would." Exactly right. This information is invaluable if you are refactoring, or if something changes somewhere else in the system that breaks your assumptions. Developers who are lulled into a false sense of security simply do not understand the limitations of unit testing, but that doesn't render unit testing a useless tool. Commented Jul 15, 2016 at 15:09
  • 12
    @MasonWheeler: Like you, the author thinks that unit testing is somehow supposed to prove that your program works. It doesn't. Let me repeat that: unit testing does not prove that your program works. Unit testing proves that your methods fulfill your testing contract, and that's all it does. The rest of the paper falls down, because it rests on that single invalid premise. Commented Jul 15, 2016 at 15:18
  • 5
    Naturally, developers who have that false belief are going to be disappointed when unit testing utterly fails them, but that's the fault of the developer, not of unit testing, and it doesn't invalidate the genuine value that unit testing provides. Commented Jul 15, 2016 at 15:22
  • 5
    o_O @ your first sentence. Unit tests give you a false sense of security while coding like having your hands on the steering wheel gives you a false sense of security while driving.
    – djechlin
    Commented Jul 15, 2016 at 22:45
10

No, not necessarily.

The original requirement was to use numeric branch codes, so a unit test would have been produced for a component that accepted various codes and rejected any like 10B. The system would have been passed as working (which it was).

Then, the requirement would have changed and the codes updated, but this would have meant the unit test code that supplied the bad data (that is now good data) would have to be modified.

Now we assume that the people managing the system would know this was the case and would change the unit test to handle the new codes... but if they knew that was occurring, they would also have known to change the code that handled those codes anyway... and they didn't do that. A unit test that originally rejected code 10B would have happily said "everything is fine here" when run, if you didn't know to update that test.

Unit testing is good for original development but not for system testing, especially not 15 years after the requirements have long since been forgotten.

What they need in this kind of situation is an end-to-end integration test: one where you can pass in the data you expect to work and see if it does. Someone would have noticed that their new input data didn't produce a report and would then have investigated further.
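A sketch of such an end-to-end check (the pipeline is hypothetical; the legacy range rule is compared byte-wise in EBCDIC via Python's `cp500` codec to reproduce the fault):

```python
def sec_report(transactions):
    # Hypothetical report pipeline with the legacy rule, compared byte-wise
    # in EBCDIC (cp500) as the original host would have done.
    lo, hi = "089".encode("cp500"), "100".encode("cp500")
    def is_test(branch):
        return lo <= branch.encode("cp500") <= hi
    return [tx for tx in transactions if not is_test(tx["branch"])]

def missing_real_branches(transactions):
    # End-to-end check: every branch we fed in as real data must surface
    # in the generated report. A non-empty result means data went missing.
    fed = {tx["branch"] for tx in transactions}
    reported = {tx["branch"] for tx in sec_report(transactions)}
    return fed - reported

print(missing_real_branches([{"branch": "101"}, {"branch": "10B"}]))  # → {'10B'}
```

Because the check compares the pipeline's output against its own input, it catches the dropped branch no matter which unit inside the pipeline swallowed it.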

1
  • Spot on. And the main (only?) problem with unit tests. Saved me wording my own answer, as I'd have said precisely the same thing (but probably worse!) :) Commented Jul 15, 2016 at 23:04
8

Type testing (the process of testing invariants using randomly-generated valid data, as exemplified by the Haskell testing library QuickCheck and various ports/alternatives inspired by it in other languages) may well have caught this issue, unit testing almost certainly would not have done.

This is because when the rules for the validity of branch codes were updated, it is unlikely anybody would have thought to test those specific ranges to ensure they worked correctly.

However, if type testing had been in use, somebody should have written a pair of properties at the time the original system was implemented: one to check that the specific codes for test branches were treated as test data, and one to check that no other codes were. When the data type definition for the branch code was updated (which would have been required in order to allow testing that the change from digit to alphanumeric branch codes worked), this test would have started generating values in the new range and would most likely have identified the fault.
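A sketch of that pair of properties, using plain random generation to stand in for a QuickCheck-style library (the code under test is a hypothetical reconstruction):

```python
import random
import string

TEST_CODES = {f"{n:03d}" for n in range(89, 101)}   # "089" .. "100"

def is_test_branch(code: str) -> bool:
    # Unit under test: the legacy lexicographic range check.
    return "089" <= code <= "100"

def random_branch_code(rng: random.Random) -> str:
    # Generator driven by the *current* data-type definition; widening the
    # alphabet here is what makes the properties probe the new codes.
    alphabet = string.digits + string.ascii_uppercase
    return "".join(rng.choice(alphabet) for _ in range(3))

# The pair of properties: every designated test code is classified as test
# data, and no other code is. Collect every counterexample found.
rng = random.Random(0)
samples = {random_branch_code(rng) for _ in range(10_000)}
mismatches = sorted(c for c in samples if is_test_branch(c) != (c in TEST_CODES))
print(mismatches[:5])   # misclassified codes, e.g. letter-suffixed 08x/09x ones
```

As soon as the generator's alphabet includes letters, random codes such as "08A" land inside the string range without being designated test codes, and the property check reports them.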

Of course, QuickCheck was first released in 1999, so it came too late to catch this issue.

1
  • 1
    I think it's more normal to call this property based testing, and of course it's just as possible to write a property based test that would still pass given this change (though I do think you are more likely to write a test that could find it)
    – jk.
    Commented Jul 14, 2016 at 9:15
5

I really doubt unit testing would have made a difference to this problem. It sounds like one of those tunnel-vision situations: functionality was changed to support new branch codes, but the change was not carried through all areas of the system.

We use unit testing to design a class. Re-running a unit test is only required if the design has changed. If a particular unit does not change, then the unchanged unit tests will return the same results as before. Unit tests are not going to show you the impacts of changes to other units (if they do you are not writing unit tests).

You can only reasonably detect this problem via:

  • Integration tests - but you would have to specifically add the new code formats to feed through multiple units in the system (i.e. they would only show you the problem if the original tests included the now-valid branches)
  • End-to-end testing - the business should run an end-to-end test that incorporated old and new branch code formats

Not having sufficient end-to-end testing is more worrying. You cannot rely on unit testing as your ONLY or MAIN test for system changes. Sounds like it only required someone to run a report on the newly supported branch code formats.

3

The takeaway from this is to Fail Fast.

We don't have the code, nor do we have many examples of prefixes that are or are not test branch prefixes according to the code. All we have is this:

  • 089 - 100 => test branch
  • 10B, 10C => test branch
  • < 089 => presumably real branches
  • > 100 => presumably real branches

The fact that the code allows numbers and strings is more than a little strange. Of course, 10B and 10C can be considered hex numbers, but if the prefixes are all treated as hex numbers, 10B and 10C fall outside of the test range and would be treated as real branches.

This likely means that the prefix is stored as a string but treated as a number in some cases. Here is the simplest code I can think of that replicates this behavior (using C# for illustrative purposes):

bool IsTest(string strPrefix) {
    int iPrefix;
    if(int.TryParse(strPrefix, out iPrefix))
        return iPrefix >= 89 && iPrefix <= 100;
    return true; //here is the problem
}

In English, if the string is a number and is between 89 and 100, it's a test. If it's not a number, it's a test. Otherwise it's not a test.

If the code follows this pattern, no unit test would have caught this at the time the code was deployed. Here are some example unit tests:

assert.isFalse(IsTest("088"))
assert.isTrue(IsTest("089"))
assert.isTrue(IsTest("095"))
assert.isTrue(IsTest("100"))
assert.isFalse(IsTest("101"))
assert.isTrue(IsTest("10B")) // <--- business rule change

The unit test shows that "10B" should be treated as a test branch. User @gnasher729 above says that the business rules changed and that's what the last assertion above shows. At some point that assert should have switched to an isFalse, but that didn't happen. Unit tests get run at development- and build-time but then at no point afterwards.


What's the lesson here? The code needs some way to signal that it received unexpected input. Here is an alternative way to write this code that emphasizes that it expects the prefix to be a number:

// Alternative A
bool TryGetIsTest(string strPrefix, out bool isTest) {
    int iPrefix;
    if(int.TryParse(strPrefix, out iPrefix)) {
        isTest = iPrefix >= 89 && iPrefix <= 100;
        return true;
    }
    isTest = true; //this is just some value that won't be read
    return false;
}

For those who don't know C#, the return value indicates whether or not the code was able to parse a prefix from the given string. If the return value is true, the calling code can use the isTest out variable to check if the branch prefix is a test prefix. If the return value is false, the calling code should report that the given prefix is not expected, and the isTest out variable is meaningless and should be ignored.

If you are ok with exceptions, you can do this instead:

// Alternative B
bool IsTest(string strPrefix) {
    int iPrefix = int.Parse(strPrefix);
    return iPrefix >= 89 && iPrefix <= 100;
}

This alternative is more straightforward. In this case, the calling code needs to catch the exception. In either case, the code should have some way of reporting to the caller that it didn't expect a strPrefix that couldn't be converted to an integer. That way the code fails fast, and the bank can quickly find the problem without the embarrassment of an SEC fine.

2

An assertion built-in to the run-time might have helped; for example:

  1. Create a function like bool isTestOnly(string branchCode) { ... }
  2. Use this function to decide which reports to filter out
  3. Re-use that function in an assertion, in the branch-creation code, to verify or assert that a branch isn't (can't be) created using this type of branch code ‼
  4. Have this assertion enabled in the real run-time (and not "optimized away except in the debug-only developer version of the code") ‼
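A minimal sketch of that reuse (the `isTestOnly` name follows step 1, rendered in Python; everything else is hypothetical, and the check is a plain runtime exception so it cannot be compiled away):

```python
def is_test_only(branch_code: str) -> bool:
    # Single shared definition of "test branch", used by both the
    # report filter and the branch-creation path.
    return "089" <= branch_code <= "100"

def filter_report(rows):
    # Step 2: the report filter uses the shared definition.
    return [r for r in rows if not is_test_only(r["branch"])]

def create_branch(branch_code: str):
    # Steps 3-4: a run-time check, enabled in production, that a real
    # branch can never be created with a code the report filter discards.
    if is_test_only(branch_code):
        raise ValueError(f"branch code {branch_code!r} is reserved for testing")
    return {"branch": branch_code}
```

Because both paths share one definition, any branch code the filter would drop is rejected at creation time, where the mistake is cheap to see and fix.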


1

So many answers, and not even one Dijkstra quote:

Testing shows the presence, not the absence of bugs.

Therefore, it depends. If the code was tested properly, most likely this bug would not exist.

-1

I think a unit test here would have ensured the problem never existed in the first place.

Consider, you have written the bool IsTestData(string branchCode) function.

The first unit tests you write should be for null and the empty string, then for strings of incorrect length, then for non-integer strings.

To make all those tests pass you will have to add parameter checking to the function.

Even if you then only test for 'good' data (001 -> 999), never thinking about the possibility of 10A, the parameter checking will force you to rewrite the function when you start using alphanumerics, in order to avoid the exceptions it will throw.
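A sketch of how those parameter checks force the failure (the `IsTestData` name comes from the answer, rendered in Python; the checks themselves are hypothetical):

```python
def is_test_data(branch_code: str) -> bool:
    # Parameter checks written to satisfy the edge-case unit tests first:
    if not branch_code:
        raise ValueError("branch code must be non-empty")
    if len(branch_code) != 3:
        raise ValueError("branch code must be exactly three characters")
    if not branch_code.isdigit():
        # This is the check that forces a rewrite (a loud production error
        # rather than silent misclassification) when alphanumerics arrive.
        raise ValueError(f"branch code must be numeric: {branch_code!r}")
    return 89 <= int(branch_code) <= 100
```

The day "10B" shows up, the function throws instead of quietly treating it as test data, so the format change cannot go unnoticed for 15 years.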

11
  • 1
    This would not have helped - the function was not changed, and the test would not start failing given the same test data. Someone would have had to think of changing the test in order to make it fail, but if they had thought of that, they would have probably thought of changing the function as well.
    – Hulk
    Commented Jul 15, 2016 at 7:45
  • (Or perhaps I'm missing something, as I'm not sure what you mean by "parameter checking")
    – Hulk
    Commented Jul 15, 2016 at 7:51
The function would be forced to throw an exception for non-integer strings in order to pass the simple edge-case unit test. Hence production code would error if you started to use alphanumeric branch codes without specifically programming for them
    – Ewan
    Commented Jul 15, 2016 at 7:58
  • But wouldn't the function have used some IsValidBranchCode-function for performing this check? And this function would probably have been changed without any need to modify the IsTestData? So if you are only testing 'good data' the test would not have helped. The edge case test would have had to include some now-valid branch code (and not simply some still invalid ones) in order to start failing.
    – Hulk
    Commented Jul 15, 2016 at 8:11
  • 1
If the check is in IsValidCode, so that the function passes without its own explicit check, then yes, it's possible to miss it. But then we would have an extra set of even more tests, mock validators, etc., and even more chances for specific "these are test numbers" checks
    – Ewan
    Commented Jul 15, 2016 at 8:40
