16

I currently have a position as a team leader at a company (20-30 employees) that offers a service and relies on internal software to deliver it. My team consists of 5 developers and a product manager, and our main responsibility is to maintain this internal software, which is the tool used by all employees every day, and to implement new features in it.

As background, all members of the team have been with the company for between 6 months and 2 years, and I hold the highest seniority on the team. Something to note is that there's an age difference between myself and some team members: I'm in my 20s while some of them are in their 30s, though with less work experience in the IT field. I report directly to the CTO of the company while also acting as lead developer on the team.

The problem I'm facing is the following: over the last few months, we've introduced bugs while developing new features or fixing other bugs in the software. Some of these bugs had little effect on the business (for example, a page not being displayed correctly), but others had an impact on the company's revenue, meaning we affected very important features.

For every bug that happened, I made sure the person who caused it fixed it and understood its impact. My role on the team is to review all newly written code, so some of those bugs were also overlooked by me during review. Since we don't have a QA engineer, the product manager takes care of testing all features when they're delivered.

Every time, the employee apologized for causing the issue, and analyzing the bugs makes it clear they could have been avoided if the code had been tested more thoroughly or if the employee had paid more attention. Every 2 weeks (i.e., every sprint; we use Scrum), we hold a retrospective to discuss what went well and what didn't, and I've told the team that we need to pay more attention to all changes and be stricter when developing new features to avoid problems.

Even after that, issues keep happening, so I want to know the most effective way to communicate to the team that this dynamic cannot continue. I have the following questions:

  1. How can I effectively communicate to my team that we need to be accountable for, and careful about, the issues we cause?
  2. Should I use the 1-on-1 meetings with each team member to discuss this personally, instead of addressing the whole team?
  3. How can I give feedback to the product manager (who doesn't report to me) that they're not testing thoroughly enough and need to be more attentive in that part of the process?
  4. I've been discussing these issues with the CTO and followed her advice, but the situation hasn't improved. How can I tell her that I need her intervention to communicate with the product manager, while making clear that I hold myself accountable for the team as well? I want to avoid office politics where I appear to be shifting the blame onto someone else.

Notes:

  • To avoid these issues, we're currently implementing automated tests to reduce the risk of bugs in the software.
  • I'm holding myself accountable as well, since these issues should be caught during the review process. But since I'm also participating in development tasks, I end up with no time for thorough reviews. I'm trying to change the team's workflow to free up more of my time.
  • 44
    It's possible that the root cause is that you don't have a dedicated QA. Your PM is also the QA, which means he has to divide his time between the PM and QA jobs, and can't work 40 hours per week as the QA. You should ask the company to hire a dedicated tester who will catch more bugs for the team. Commented Mar 30, 2023 at 22:45
  • 9
    @fiftyfifty 100%, the high frequency and severity of bugs are directly correlated with the lack of dedicated QA Commented Mar 31, 2023 at 5:51
  • 8
    "My role inside the team is to review all new written code" seriously?
    – njzk2
    Commented Mar 31, 2023 at 19:57
  • 2
    "I've told the team that we need to pay more attention" how do you concretely suggest to do that? do you have testing in place? do you cover all discovered bugs by tests to avoid regressions?
    – njzk2
    Commented Mar 31, 2023 at 19:59
  • 3
    “You know, try harder” is never a tactic that pays off. You have to change the process.
    – mxyzplk
    Commented Jul 14, 2023 at 1:40

10 Answers

47

I want to know the most effective way to communicate to the team that this dynamic cannot continue.

This isn't an effective way of addressing the issue.

If you want developers to spend more time manually testing their changes, or generally to be more cautious and careful when coding, this will mean a slower - most likely significantly slower - development process. It's very unlikely that any cutting of corners (if that's the reason for these bugs) is due to team members not caring about the quality of the product, so appealing to them to essentially work harder isn't a long-term solution. Telling them to reprioritise manual testing over coding new features is an option, but this may be hard to convince your bosses to accept: as far as they're concerned, you're offering a reduction in bugs in exchange for a permanent and significant reduction in productivity. It's also a very inefficient way of finding bugs compared to automated testing.

What you need to do is change the development process to include more automated testing and a better code review process (you shouldn't be the only person doing code reviews). This will have an upfront cost in productivity, but once the team settles into the new system it can be overall faster, as well as less prone to bugs, than the current approach. Other answers discuss which changes might be useful, and other questions on this site cover how to lead that transition and how to sell it to your boss, but from what you've described in your question, this is what you need to solve the problems you're having. And this is definitely something that, as team lead, you should be responsible for.
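For concreteness, here is a minimal sketch of the kind of automated test this refers to, assuming a Python codebase and pytest (the question doesn't say what the internal software is written in, and the module and function names below are invented for illustration):

```python
# test_billing.py -- hypothetical example; "billing" and
# "calculate_invoice_total" are illustrative names, not from the question.
import pytest

from billing import calculate_invoice_total  # assumed internal module


def test_invoice_total_applies_discount_and_tax():
    """Pin down a revenue-critical calculation so a change can't silently break it."""
    total = calculate_invoice_total(subtotal=100.0, discount=0.10, tax_rate=0.19)
    # 100 * (1 - 0.10) = 90.00, plus 19% tax = 107.10
    assert total == pytest.approx(107.10)
```

Wired into CI so it runs on every change, a growing suite of tests like this means that a single reviewer's lapse is no longer enough for a regression to reach production.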

  • 13
    just as an addition you might integrate or not: there is a reason that developer and QA are different professions, it's two different skill sets that are required and having developers do (too much) manual testing can be a total waste of resources and drive them away. Yes, developers should somehow make sure their code works, but for most the ideal solution for that is "you compile/run tests and it's green or red". There are some exceptions, but at a certain point when you need to click around a lot in an application to test specifics it makes sense to include a QA person early in the process. Commented Mar 30, 2023 at 21:48
  • 3
    @FrankHopkins I agree, but I don't think it fits into this answer. The term 'QA' in practice is used for a very wide range of roles, from manual testers to automation engineers to (sometimes) roles more like PO or BA -- I've seen it happen on projects I've worked on. Adding designated testers would be an improvement for OP, but needs some process changes to happen first anyway. The details of why & how to integrate the right kind of QA for a given project is really its own answer, and more realistically its own essay.
    – Dakeyras
    Commented Mar 31, 2023 at 0:22
  • 16
    Agreed - telling the team they can't keep causing bugs is redundant. No engineer should need telling that errors are bad, and doing so will only harm morale and cause stress. You've said that several of the issues would have been caught if a particular person had paid attention or if a particular bit of code had been tested, so you need to change your process so that one person's lapse isn't all it takes for a bug to reach production, and to ensure that everything gets tested that needs to be. "Pay more attention" isn't a process change, it's an admonishment. Implement something measurable. Commented Mar 31, 2023 at 9:44
  • 1
    @FrankHopkins Absolutely right about testing needing a different mindset — also a different attitude. Developers are natural optimists; we assume it's working, and look for ways to confirm that. Testers assume something's broken, and try to find how. For the same reason that you need different lawyers working for the prosecution and the defence, you need different people writing and testing the code. You can't do both effectively!
    – gidds
    Commented Apr 1, 2023 at 13:07
  • 1
    Even with no automation, the fact is that developers are better at developing than testers, and testers are better at testing than developers, and handing over from development to QA as soon as there are no known bugs is just more efficient.
    – gnasher729
    Commented Apr 2, 2023 at 9:47
27

If you want your team to be effective, you need to remember to blame the process, not the team member. Doing otherwise leads to a significant drop in team morale and a serious case of both CYA (cover your ass) and NJH (new job hunting). Despite the mass layoffs at Facebook/Twitter, it's still pretty easy to find work.

From the sounds of it, your development process is atrocious. Your development process can be summed up as "we write code and pray to the machine god that it works as intended with no side effects." You don't have QA, automated unit tests, automated functional tests, or--it sounds like--a good test environment.

It also sounds like there might be a lot of technical debt/fragile code in your software. On a well-designed system, fixing bugs and adding new features shouldn't impact anything except things directly related to the bug or feature you are working on. In a well-designed system you shouldn't need to do any testing outside of the code you are working on (you still do, because good testing habits run through your blood).

While this is ultimately on your team members, as they should have pushed back against the company's lack of good development processes and your PoS code base, and pushed for better coding practices, it sounds like many of them are not very experienced at software engineering and so might not know better.

So what do you need to do? Have a discussion with your team and openly talk about the number of bugs and the difficulties they are experiencing while writing software. You need to get buy-in from your team members to improve your development practices. Good development practices should be branded onto their hearts.

To implement these good development processes, improve your testing environment and write unit tests and functional tests for existing features. If you have a GUI, automate testing that as well; there are unit testing frameworks for everything. I would think about assigning someone on your team to spend half their time in charge of QA, and some other team members to building a bunch of regression tests. Balance the cost of bugs against the cost of slower development: if bugs are costing you enough money, spending the time/money to improve your code base will pay for itself.
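To make the "automate the GUI as well" point concrete, here is a minimal sketch of a browser-level functional test, assuming the internal tool is a web application and that Selenium is used; the URL, element identifiers and workflow are placeholders, not taken from the question:

```python
# Hypothetical end-to-end check of one critical workflow. Requires a local
# Chrome/chromedriver; all identifiers below are made up for illustration.
from selenium import webdriver
from selenium.webdriver.common.by import By


def test_create_order_happy_path():
    driver = webdriver.Chrome()
    try:
        driver.get("https://internal-tool.example/orders")  # placeholder URL
        driver.find_element(By.ID, "new-order").click()
        driver.find_element(By.NAME, "customer").send_keys("Test Customer")
        driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
        # Asserting on the success message guards the whole path end to end.
        assert "Order created" in driver.page_source
    finally:
        driver.quit()
```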

Also start doing test-driven development... Don't write code and then test it to make sure it works. Write unit tests that test for the desired behavior, then write code that implements that behavior.
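A tiny sketch of that red-green loop, again assuming Python and pytest; the discount function is invented purely for illustration:

```python
# Step 1 (red): write the tests first; they fail because apply_discount
# does not exist yet.
def test_apply_discount_normal_case():
    assert apply_discount(price=50.0, percent=10) == 45.0


def test_apply_discount_caps_at_100_percent():
    assert apply_discount(price=50.0, percent=150) == 0.0


# Step 2 (green): write the simplest implementation that makes both pass.
def apply_discount(price: float, percent: float) -> float:
    percent = min(max(percent, 0.0), 100.0)  # clamp to a sane range
    return round(price * (1 - percent / 100.0), 2)
```

In a real project the tests and the implementation live in separate modules; the point is only the order of work: the desired behavior is pinned down by a failing test before the code that satisfies it is written.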

All of this will slow down your rate of development and cost time/money. But once you have paid back your technical debt, you should see an increase in efficiency and speed, because good development/testing processes give you coding confidence.

  • 15
    Generally good answer, but I'd be careful with a fixed introduction of TDD. I know quite a few developers who find it tedious, counter-productive, annoying etc. It's a process that doesn't suit everybody. I'd rather make sure there are good tests - and give the team / developers options to make that happen, TDD being one option among them. To get buy-in it typically helps to agree on the goal and then come up together with the right tools. So perhaps for the practicalities - I'd consider them good suggestions, but wouldn't fix on "do this" but discuss them with the team and pick fitting ones. Commented Mar 30, 2023 at 21:42
  • 3
    You can write tests afterward, or write tests first, or write tests as the same time that you introduce any functional change to demonstrate the difference between what it did (or didn't do) before and what it does now. The critical thing is DON'T MERGE UNTESTED CODE. In a large project, that can mean DON'T MERGE CODE WITHOUT RUNNING A FULL-SYSTEM REGRESSION TEST, to guard against unexpected interaction. An automated system to enforce that is very helpful and most change-control systems have at least one such system available to plug into them which will enforce that rule.
    – keshlam
    Commented Mar 31, 2023 at 5:05
  • 14
    And if you think testing is expensive... just wait until you find out what failing to test will cost you.
    – keshlam
    Commented Mar 31, 2023 at 5:05
  • 2
    @FrankHopkins That is a good point, Test Driven development can be very cumbersome/frustrating to get into the habit of. I personally prefer it because in the process of writing my tests I better understand the problem I am trying to solve.
    – Questor
    Commented Mar 31, 2023 at 16:14
  • 2
    However, I have found that if you don't write code with unit tests in the back of your mind you often end up with code that is difficult to unit test.
    – Questor
    Commented Mar 31, 2023 at 16:18
15

The Questor has written a good answer about the technical side of this - however here is what I'm going to address:

Leadership starts with yourself

If people see someone heading in what they think is the right direction, they will want to follow. It's up to you to set the standards for yourself and that will inspire people to follow/adhere to them.

There are two reasons I say this: firstly, because I believe it to be true, and secondly, in relation to your specific issue:

"I reviewed the code and missed the Bug"

Start there. Start with that conversation "Here is what I didn't do, and here's how I'm going to improve on it"

You will need to hold yourself accountable - but if people see that, they will want to do likewise.

All the technical problems can be fixed by technical solutions - this is what will fix the culture.

  • 20
    This quote in particular jumped out at me: "For every bug that happened, I made sure the person who caused it fixed it and understood its impact." But the team caused the issue - a single dev might have written the bug, but the QA missed it and the team lead signed off shipping it to PROD! The Swiss Cheese Model (en.wikipedia.org/wiki/Swiss_cheese_model) is a good analogy - the layers don't have to be perfect, but you need to work out how the dev, the QA and the team lead all missed the same defect and let it pass right through undetected...
    – mclayton
    Commented Mar 30, 2023 at 22:46
  • 11
    Yes, that, and the quote "Every time, the employee apologized for causing the issue" really made me wonder how his team members viewed the situation. In particular, the OP seems to think assigning blame and getting an apology are signs of a healthy team. Me, I'm wondering why their reaction wasn't "how do we fix our process to prevent this?". It created an image of a strong blame culture and very disempowered subordinates for me.
    – JonathanZ
    Commented Apr 1, 2023 at 15:07
  • @JonathanZsupportsMonicaC "It created an image of a strong blame culture ..." I agree. However it would be interesting to know the country in which the OP works, since work culture may also be related. Some work cultures tend to reflect the general attitude of the population of the country and there are countries where assigning the blame to a specific individual and this latter apologizing is commonplace. Commented Apr 2, 2023 at 14:21
  • 4
    @LorenzoDonatisupportUkraine - Yeah, so much of this is cultural. Sometimes I wonder if the workplace stack exchange should require a "Country" specifier before allowing answers. But man, that apology keeps bugging me. Who did they apologize to? And if the OP is supposed to review every submission, did they apologize as well for missing the mistakes? It's very alien to my experience.
    – JonathanZ
    Commented Apr 2, 2023 at 15:21
  • @JonathanZsupportsMonicaC Yep, although I don't have much experience in corporate environments (I've worked for public institutions most of my working life), from what I've read this reminds me of something common in Far East countries, like South Korea or Japan, but it's just a wild guess, admittedly. Commented Apr 2, 2023 at 16:34
10

Having no dedicated QA is absolutely the problem. People making mistakes is unavoidable. These mistakes costing the company revenue, that’s avoidable by having QA.

One problem is that you have inexperienced developers. An experienced developer would have told you this a long time ago. Also, software developers cannot test their own code: having written the code, they subconsciously don't want to find bugs. Finding bugs makes them look bad. For a QA person, finding bugs makes them look good.

  • 7
    It's not that developers don't want to find bugs. The problem is that once you have developed a piece of software, it's hard to fathom how it could be used in ways and situations you didn't think about while developing it. But this answer is correct in that having a QA department of people who test the application from a different perspective is the solution to this problem.
    – Philipp
    Commented Mar 31, 2023 at 16:10
  • 2
    +1 No dedicated QA team and process is practically the same as telling your customers you don't give a damn what you give them. I would not knowingly do business with a company who did not have a dedicated test team and process. Commented Mar 31, 2023 at 19:28
  • 1
    @Phillipp: It's not conscious. Consciously I want to find bugs because it means I do a better job. Subconsciously I don't want to find bugs. Of course just the fact that QA employs different people doing things in a different way will uncover bugs. If the QA person follows a script from a sheet of paper, steps will be done slowly - and that can change behaviour.
    – gnasher729
    Commented Mar 31, 2023 at 22:54
  • (Also @Philipp) There are indeed subconscious aspects which are not directly related to a subconscious "I don't want to admit I made a mistake". It's our brain that is incapable of spotting even glaring mistakes when re-reading something it has already scanned a ton of times. It's the same mechanism at play when you reread for the tenth time a letter or an article you have just written and still don't spot the dumb typo (which you may spot if you reread it after a week, though). That's why publishers have proofreaders check manuscripts even if they come from highly educated professionals. ... Commented Apr 2, 2023 at 16:44
  • 1
    (Also @Philip) Anyway, it always pays to have a QA person whose job is exclusively that of trying to break your code without any kind of subconscious "coder remorse or pity". You really want someone who enjoys breaking a piece of code and is motivated to do so (both emotionally and financially). IIRC there were some Dilbert strips (I may be remembering the wrong comic) where the QA guys were depicted as "black-cape villains" cackling evilly any time they broke a piece of code. :-) Commented Apr 2, 2023 at 16:51
5

Several key principles may help you here.

First, let's look at the team-level items:

  1. Since you're using SCRUM, it might help to clarify your "Definition of Done" for your tasks. Does "Done" only include initial coding? Does it require internal testing, as well? And how thorough is the limited-audience public review prior to distribution of each module? If your developers think "done" means, "I got it to run," then there will always be buggy code. If, on the other hand, coders know that "done" means well-verified, and know that their code will be rigorously tested by some end user, with a potential for additional hassle and embarrassment (not shame--just the natural consequences of having to correct your own code after a recall), they will generally code more responsibly.
  2. In front of your entire team, your product manager (PM) should be clear about the consequences of developing buggy code. There are a couple of principles here. First, according to the SCRUM model, the specification (including quality level) of the product is the responsibility of the PM. If the PM doesn't care about buggy code enough to specify the parameters, it's going to be harder for your team to care about buggy code. If all the PM cares about is getting modules developed, your team is going to respond to that. So find out if the PM cares (and if they don't, perhaps help them understand the consequences of emphasizing speed). Second, when a team understands the consequences of releasing buggy code, they will naturally choose to be more diligent. However, if you keep them shielded from the consequences, their motivation for correcting their behavior is low. So have the PM help them see the costs, even to the extent that it means passing the consequences down to the developers responsible for them (maybe not the financial costs, per se, but the additional workload, the hassle of having to revisit old code, the velocity impediments from switching tasks, etc.)

Next, let's look at the individual-level question you asked: Should you address the bug issue in 1-on-1 discussions with team members?

  1. You want to present vision, standards, and processes generally, so that everyone gets to participate, gets the same instruction, and commits to the same goals.
  2. You want to correct mistakes privately, clearly, and quickly. Help your developers know that their mistakes are not evidence that they're "bad" coders, but just that their approach is not as efficient as it could be. Your approach in a 1-on-1 can incorporate principles you have previously discussed in the group setting--reiteration can be helpful--but also make sure you work to understand any obstacles to a particular developer's compliance. Make it a 1-on-1 dedicated to helping them feel more successful, and if they are naïve to the needs of the team, see if you can invite them to "participate," to "help solve a problem."

Now, let's look at the CTO/PM/you triangle:

  1. This is a tricky structure, and a bit un-Scrum-like. What makes it complicated? A) You and the PM both "lead" your team, but you don't necessarily have the same priorities. B) Further, lines of accountability are not clear/aligned--if code is buggy, does all of the blame fall to you? It should fall at least as much on the PM. C) The direct line from the CTO to you means that discussions about bugs are likely going to hit you first, because you can do things about the bugs more quickly.
  2. To remedy these things, I recommend a re-alignment of accountability, with most of the changes moving toward the PM. A) You both still lead the team, but your role is to help your team meet the PM's objectives as fast and effectively as possible. You let them choose the direction (including level of quality), and you lead your team to that objective. B) While you are responsible for bugs in the code, the PM is responsible for releasing a buggy product. If they are incentivized by the speed at which you work (fixing forgotten bugs is SLOW work), and penalized when clients turn over because they encounter bugs, then they will choose to help you focus on cleaner code. Re-aligning the rewards/consequences will naturally help them to value the bug-free environment/process. C) It will likely feel ostentatious, but I recommend talking with the CTO about ways you can improve your processes, and NOT about bugs that need to be fixed. Conversations about bugs that need to be fixed should be had between the CTO and the PM. The CTO is kind of like your scrum master in this setting--they give you resources, remove obstacles, etc., but you get your list of tasks from the PM.

Finally, let's consider one personal-level item:

  1. While there is always potential for politics and positioning, especially when resources are scarce, remember that a gracious, helpful perspective/mindset and approach will get you MUCH further than a cynical or critical one. If you want the PM's help, see if what you do can further their objectives somehow. If you need your team's buy-in, find out what's important to them. As you work together to help everyone win (including yourself), you'll succeed as a company.
  • “When the team understands the consequences of releasing buggy code…” who says they don’t? Bugs happen. It is your responsibility to avoid the consequences. Usually QA finds the bugs and someone fixes them. And usually many bugs are found during code reviews which doesn’t seem to happen in a competent way either.
    – gnasher729
    Commented Apr 1, 2023 at 13:43
  • Regarding the specification issue. This doesn't need to be upfront. A key part of the scrum philosophy is a) communication with the PO during the sprint to fill in questions the developers discover during the sprint, b) rework as a result of missed/unexpected specification detail is to be expected and is a key part of the sprint review process. Commented Apr 1, 2023 at 17:00
5

As other answers have alluded to: it is the team (organization) as a whole that bears the responsibility. The corollary is: how do you improve the team so that it delivers higher-quality code and thus stops impacting the business negatively? Pointing fingers is counterproductive; but if a pattern emerges, leadership should initiate corrective action.

A lot of good points were made re. what is wrong and what could be done; I want to add some more observations from my experience. I see some similarities between OP's circumstances and my own: team of ~7 developers, project ~2 years old (so no devs working on it longer than that), product used by the company's "resellers", Agile-ish process, no official/structured QA. Differences: older people, mostly 30-50-ish (so some more experience and maturity - I for one know a lot more than I did even at 30). Big international luxury goods org.

  1. Complexity: initially the product was quite easy to grasp in its totality, but as new features and integrations were added, one would only be vaguely aware of some other functionality, let alone the intricacies of how it was implemented. There's also a lot of concurrency, which is not always handled safely. This complexity is (IMHO) the biggest source of introduced bugs: something was added and tested just fine by the dev, but they were not aware of how some adjacent code was impacted by it, and some very subtle bugs got introduced. (No real finger-pointing about new bugs: a new bugfix story gets logged, estimated, and done whenever; or if it's urgent, one or more of us do a mob programming session and get it fixed ASAP.) Another point of personal opinion: functional programming makes it easier to reason about the flow of data and execution; unfortunately a lot of team members have not caught on yet, so you get state mutated in one place affecting some other functionality, and it is not always visible that it does. Another BIG help would have been detailed documentation, e.g. of the architecture and functionality, so one could read up on what is expected and fit in. However, this would take a lot more man-hours, and it seems most orgs choose the tradeoff where they would rather respond to user needs very quickly.

  2. Knowledge: Technology gets updated very fast, not all our team members have even a working knowledge of the WHOLE tech stack, and there seems to be a bit of penny-pinching reluctance from management to get people upskilled constantly. Time is under pressure... Also, it's not always feasible to self-study: some areas require access to expensive resources and/or real-world scenarios. Result: things break because people simply didn't know.

  3. Testing: we do have a strict (automatically enforced) policy of unit testing and a minimum required code coverage threshold. The problem with metrics is that people then code to pass the metric... It still does not catch all bugs or regressions. Again a personal opinion: TDD is difficult, as I often find myself not knowing how something should be implemented, so there's a lot of experimenting until it works. Unit tests are added afterwards. So in my case it would perhaps be better to use unit/integration testing as a kind of "proof that the functionality works", for all possible use cases (if only the functional requirements were documented...); a small sketch of what I mean appears after this list.

  4. Documentation: I mention documentation a lot. The sad truth is that if one's requirements are all spread out in (often badly worded) Jira stories, with changes in subsequent stories, it is nigh impossible to get it perfect or look up the "should have been" - people are again trying to keep it in their imperfect memories. Ideal would be to have them all compiled into an accessible, coherent source. (I'm fully convinced that if Documentation were part of the work products, Agile could easily accommodate updating them just as code is updated - you lose time now but gain clarity (and thus time) later - question remains with the tradeoff: is it worth it to you? And if not, then put up with the consequences.)

  5. Code Reviews: The GitHub pull request tool makes it quite difficult to get an overall picture of a more complex change, because only the changes are highlighted (and in the web browser you don't have access to all the helpful IDE features). While our whole team participates (2 approvals required), I feel in many cases I get a better idea of the code by checking it out and tracing through it, reading the IDE's hints, or even running it. And by that time others have already approved and the author has merged. Get a code review process that adds real value and isn't just a rubber stamp.
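To illustrate point 3, using unit tests as an executable record of the functional requirements could look something like this (Python/pytest as an assumed example; the shipping rules and function name are invented):

```python
# Each parametrized case encodes one documented functional requirement, so
# the suite doubles as a "proof that the functionality works". All names
# and values here are illustrative only.
import pytest

from pricing import shipping_cost  # assumed module under test


@pytest.mark.parametrize(
    "order_total, country, expected",
    [
        (10.00, "DE", 4.99),   # requirement: small domestic orders pay a flat fee
        (60.00, "DE", 0.00),   # requirement: free domestic shipping above 50
        (60.00, "US", 14.99),  # requirement: international flat rate
    ],
)
def test_shipping_cost_matches_documented_rules(order_total, country, expected):
    assert shipping_cost(order_total, country) == pytest.approx(expected)
```

Each case maps to one requirement, so the suite both proves the functionality works and stands in for a small piece of the missing documentation.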

Long story short: nobody makes mistakes on purpose, but cognitive complexity makes them overlook things. Organizing things helps to simplify the complexity in a lot of disparate, orthogonal areas and different levels of abstraction. And to make things worse each human's cognitive abilities differ, so you have to cater for your team's lowest common denominator (which can potentially be a fresh-out-of-college new hire).

  • Good answer, one minor issue: If there's a changeset that can't be reviewed effectively in browser but your teammates approve it based on a browser review regardless, you don't have a code review process but a rubber stamping process. :) Everyone should obviously be on the same page regarding what's expected from them. Commented Apr 2, 2023 at 22:17
4

The thing with developers is that they (we) usually fix known problems, but what software development needs is testing for unknown problems. Like others have said, QA is a step you can't skip. A team of 5 developers in particular is a large one; in my experience, a small project with 2 developers is the maximum you can run without dedicated testers. It's not enough to test each feature with its intended usage: you need a test plan to go through with every release, covering all test cases with intended usage, corner cases, unintended usage and impossible scenarios. It's also important to have someone from the target audience testing, which, as it is an internal tool for employees, might already be covered, but you need to make sure there is no knowledge gap that makes developers/testers miss important ways to use the application.

Another thing is, you simply can't have a requirement of "we have no bugs". Unknown things will always happen. "Test until there are no more bugs" is not a useful thing to tell the development team. Neither is apologizing. What could be useful (in addition to having QA) is finding out what types of bugs happen most often. Is there a complicated swirl of features, where fixing bugs causes more bugs? You need refactoring and possibly improvements to the architecture. Are the bugs appearing on different devices, OSes or browsers? Make sure to test all environments; automated tests could especially help here. Are there a lot of responsiveness issues and a need to create exceptions for different screen sizes? You'd most likely have needed a better approach or libraries to begin with; if you have to handle responsiveness issues manually case by case, the bugs will be never-ending. Are there beginner coders making simple code logic bugs? The code review process needs improvement. Performance? UX? Your project-specific information is needed to understand where the issue is.

3

You don't fix process by burning witches

If every developer is producing code that adds bugs, the problem is almost certainly not every developer. The problem is the process of how you develop.

Even if it was only half of the developers that are producing bugs, it is a process and management problem.

If all you do is start punishing developers for releasing bugs, you'll get every developer who can get a job elsewhere doing so, leaving behind the rest who will almost all be unhappy.

You need to fix your process.

High Quality Code is Expensive

People will try to sell you "just do X" and your problem goes away. But your real problem is that high quality code is expensive compared to the code you are producing right now.

It can easily be 10 to 100 to 1000x as expensive as the process you describe.

Now, it does reduce the rate at which you introduce new bugs. But it will also massively increase the cost of fixing old bugs and introducing new features. Sometimes this cost increase is so large that the company cannot survive it, and it goes under.

I have no time to improve quality

If you (as an organization) have no time to do the development you are currently doing, then no, nothing you will do will prevent new bugs from occurring.

There is no cheap fix.

There are things you can do to marginally improve the quality of your bug fixes -- many things -- but they definitely won't prevent even the majority of bugs from getting through without a significant decrease in (at least short-term) productivity.

  1. Serious and effective automated testing way more than doubles the cost of a code base. And deferring it doesn't make it cheaper. Currently your code base lacks automated testing. So tooling it with serious automated testing will require as much development effort as it took to create your code base in the first place, and probably more.

  2. Extensive manual QA is an expensive process. Effectively QAing most changes requires more person-hours than writing them. And even getting it down to that level requires a bunch of the QA to be automated.

  3. Full review of changes is more work than writing them.

  4. Creating changes that don't break effective automated tests is significantly more work than making a cowboy change that just works.

  5. Entire domains of coding, like asynchronous code, graphics, and a bunch of others, require a high level of skill that can only be gained through a pile of expensive failures. Either you experience those failures internally, or you hire someone who has that expertise. If you don't want these failures to impact customers, this means an insanely thick buffer between developers and customers so you can catch these errors, possibly at the level of developing entire products and discarding them. When google buys a company they usually discard the code base, because it isn't up to google standards, but they hire the employees, because those employees know what they did wrong the first time.

Marginal, "Cheap" fixes

These should double development costs, but have a hope of reducing your regression problem.

  1. Have every change done twice independently with no communication. Once both changes are fully approved, compare the two changes and determine which one is better and safer, or if a combined one would be even better.

Each changer has an incentive to find problems in the other change now, even if just out of pride.

  2. Develop continuous deployment. Offer different levels of stability to your customers (nightly, weekly, monthly, and quarterly releases). Hire a large QA team (including engineers-in-test) to pound on your nightly builds and raise issues.

  3. Require new automated tests for every new bug fix. When a bug is fixed, there must be an automated test that failed before the fix and passes after the fix. These tests must exercise the behavior of some subsystem or API, and the pass/fail criterion cannot be arbitrary: "this vector has 3 elements" is not a meaningful pass condition; the test must fail because the result is nonsense or incorrect.

Ideally the test should be cheap, but for some kinds of bug that isn't possible. An expensive test is still required.

Using git bisect or similar, it must be possible, with an automated harness, to locate the commit where the fix was introduced and (ideally) where the bug was introduced (a sketch of such a harness is shown below).

Run as many of these bug tests nightly as you can. If you can't run them all, run a random subset. If a regression happens, git bisect from the last time the test passed to find when it was introduced.

New changes have to pass the entire suite of tests, even the more expensive ones. This may require a pile of computers whose only job is to run these tests.
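As a sketch of what such an automated harness could look like (assuming Python and pytest; the file and test names are invented), the per-bug regression test can double as a script for git bisect run, which classifies each commit by the command's exit status:

```python
#!/usr/bin/env python3
# bisect_check.py -- hypothetical harness: exits 0 if the bug's regression
# test passes (commit is "good"), non-zero if it fails (commit is "bad").
# Typical use:
#   git bisect start <bad-commit> <good-commit>
#   git bisect run python bisect_check.py
import subprocess
import sys

# The regression test written for the bug fix; the path and ID are illustrative.
TEST_ID = "tests/regressions/test_issue_1234.py::test_invoice_total_not_doubled"

result = subprocess.run([sys.executable, "-m", "pytest", "-q", TEST_ID])
sys.exit(result.returncode)
```

The same per-bug tests are what you would run nightly, or sample randomly from, as described above.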

...

None of these are cheap. But they are cheaper than a full-on fix, and should reduce your rate of introducing new bugs.

On the other hand, telling an employee they are bad because they introduced a regression they should have fixed is free.

  • 2
    "Serious and effective automated testing way more than doubles the cost of a code base." "Creating changes that don't break effective automated tests significantly more work than making a cowboy change that just works." I think, based on my experience, that these claims are unfounded. To put it mildly. Especially the latter quote. Quality is speed. When you have good automated tests in place, making a change is faster since worrying about breaking things is handled by the computer, not the programmer. That's the point of automation. I'm kinda morbidly curious what has led you to these views. Commented Apr 2, 2023 at 22:05
  • @neonblitzer Oh sure! Once you have gotten the code written, later changes are easier. But adding full on automated testing is a lot of work! And guess what happens if you skip adding automated tests for a new feature later on? That ... is again cheaper. But at each point, the easy and cheap route is "don't test it", which means you need to have a longer term than a few weeks horizon to do the extra work. Have you ever taken a large, legacy code base and got it up to a fully automated testing state? It is an insanely huge endeavor.
    – Yakk
    Commented Apr 2, 2023 at 23:10
  • Fair enough – what you're saying is that technical debt doubles the cost of a code base. Deciding to write high quality code and automated tests isn't expensive in any timeframe longer than half a year, but the fact that OP's company hasn't done that makes fixing the issue later on expensive. Commented Apr 4, 2023 at 17:15
  • 1
    @neonblitzer I mean, given a large enough code base to developer ratio, it can take a LOT longer than a year to automated test even the majority of it. But yes, writing an easily testable code base will get you a lower long-term regression and bug rate than not doing so at the same development effort. Initially you'll get some features slower (which is why "throw away" code still has lots of value) and not notice the bug problem, so even here you need to have good policies (enforcing it) or short-term metrics will favour people who don't, and buy-in from the bean counters.
    – Yakk
    Commented Apr 4, 2023 at 17:20
2

Your people are not the problem, your systems are

People make mistakes. People will always make mistakes. The solution to your problem is not to find people who don’t make mistakes because those people don’t exist.

You need to design and implement systems that expect mistakes and minimise their impact. Even then, your systems are designed and implemented by people. People who, as I believe I mentioned, will make mistakes. So your systems will fail.

You also need to be aware that there is a non-trivial risk that a system you implement to avoid failure will introduce new sources of failure.

You can’t eliminate failure, you can only be prepared for it

  • “Minimise the impact”: At one company, a new version of the software was released at 8am in a single small country. At 9am we got a notice that the app was completely broken. At 9:30 am we stopped any updates going to customers that hadn’t received them yet. At about 12am the problem was fixed, and the app sent to Apple for an emergency review. In the evening, the new version was available to customers. That was about the worst case failure possible, fixed with the least possible damage. “Never release on Friday evening” alone reduces the possible damage considerably.
    – gnasher729
    Commented Apr 1, 2023 at 16:09
  • @gnasher729 so nobody died? Sounds like your software is low impact anyway - it’s not like it runs air traffic control or defibrillators.
    – Dale M
    Commented Apr 1, 2023 at 21:57
  • 1
    Most software is “low impact”. This software would have been problematic if it had malfunctioned but some important functionality failed completely and very visibly. “This bug is Like an elephant in the kitchen. It’s either there or it isn’t there, but you will know”.
    – gnasher729
    Commented Apr 2, 2023 at 10:20
2

Due to the complexity of software, it is unrealistic to expect zero bugs; however, most bugs can be caught before they do any damage with a three-stage process consisting of dev, code review and testing.

Each step in the process must be handled by a different person, for example a junior dev might write the initial code, a more senior dev will do the CR, and testing should be performed by a dedicated tester.

Due to this, there are three opportunities to catch the error, and while all three people are responsible, the blame is shared, so there is minimal drama.

But also if the code is going to affect a critical system, there should be an initial risk assessment done by the project manager which will tell you whether it is a good idea to slow down on this particular task and spend extra time on quality control.

All of this is assuming you're already following basic principles like being able to roll back the deployment easily, keeping backups, having up to date docs and ensuring the team is a cohesive unit. Avoiding a culture of blame will help with the last point, though there is only so much you can do if that blame culture comes from the upper management.
