19

So this is a basic question that was asked by a junior developer (not in these exact words). He had worked on a feature which was used in multiple projects. The main feature can be easily isolated and written as a library and can be used in three projects. The feature was basically showing a regulatory dialog when the user wanted to do some action on a trade.

Instead of writing the models, services, presentation, validation logic in a single library and reusing, what he had done was to do all this in all three projects. He basically duplicated the code.

So, when this came to me and I started refactoring he was happy with it, but at one point he asked an honest question.

Why can't I copy/paste code?

I told him why with examples of well used libraries, how code copy/pasting can lead to untested brittle code with hidden bugs, how that increases maintenance effort, how it prevents people from reusing same functionality in the future, etc.

However, later I thought that a person who doesn't understand why he shouldn't copy/paste code might not understand these concerns either.

The coding culture at my workplace is not so great, so there's that. But apart from all the failures of the upper layers and culture we can point out here (granting all that), what is a good way to explain to a layman why code duplication is bad?

15
  • 7
    How is explaining it to lay people different from the textbook explanation? When you have to change duplicated code, it takes way longer and risks introducing subtle deviations. That's pretty much it. (If your code never had to change that wouldn't be an issue, but that isn't how the world works.) Commented Oct 18, 2023 at 8:46
  • 7
    I'm tempted to say "let him do that and check back in a month". But I agree with the above - "if you have three copies of code you need to change it three times every time, and are three times as likely to screw it up" should be simple enough for him to grasp, and once he gets that idea you can preach modularity and the idea of common libraries. Commented Oct 18, 2023 at 9:06
  • 5
    To be fair, there are some situations where copying code is the best solution. For example, when copying code between different domain contexts, where you intend for each implementation to diverge over time.
    – JonasH
    Commented Oct 18, 2023 at 9:42
  • 6
    Copying code is the best idea a lot more often than people think it is. If sharing it means that it will have no real owner or no clear purpose, that it will change according to requirements from 3 different directions, and/or that it can't be maintained without testing 3 projects and coordinating 3 release cycles, then it's only the same in 3 places by coincidence, and it's best if each project maintains their own version. Commented Oct 18, 2023 at 23:42
  • 2
    @MattTimmermans disagree. a) people think far too often that copy&paste is a good idea when it's not, b) there's not really much harm in sharing code even when copy&paste would in hindsight have been better (because you can still fork later on), but careless copy&paste does a lot of damage in terms of increased maintenance, and since the copies tend to get out of sync it is a lot of refactoring&testing effort to merge them later on, often infeasible to do. Commented Oct 19, 2023 at 16:23

9 Answers

44

Copying code (causing duplication) and sharing code (de-duplicating it via shared modules, classes, libraries, whatever) each have their benefits and drawbacks.

In my opinion, the key difference is in how they respond to changes, so what is appropriate comes down to what kind of changes you anticipate you'll need to make in the future:

| | De-duplicated code | Duplicated code |
|---|---|---|
| Need to change 1 case | Hard. You need to decouple things first, and then change the one case. | Very easy. Just change the 1 case and not the others. |
| Need to change all cases together | Very easy. You just make the change once centrally, and all cases are done. | Very hard. You need to find all cases, and make the same change to all of them. Very error-prone. |
| No changes ever needed | Doesn't matter, so DRYing it might have been a premature abstraction. | Doesn't matter. |

In other words:

  • De-duplicating code makes consistency easy. This can be a problem when you need divergence (e.g. you want one case to differ from the others). It protects against unintentional divergence.
  • Duplicating code makes divergence easy. This introduces the opportunity for unintentional divergence (e.g. you have some code with a bug repeated 3 times. You fix the bug, but only in 2 out of 3 places, so now the 3rd is unintentionally inconsistent with the others). On the other hand, it makes intentional divergence easier.
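
The unintentional divergence described above can be sketched in a few lines. This is only an illustration: the function names, the threshold value, and the "reportable trade" rule are all made up here and are not taken from the question.

```python
# Assumed rule, for illustration only: a trade is reportable when its
# notional is at least THRESHOLD. The original code used ">" by mistake.
THRESHOLD = 1_000_000  # hypothetical value


def is_reportable_project_a(notional: int) -> bool:
    return notional >= THRESHOLD  # bug fixed in this copy


def is_reportable_project_b(notional: int) -> bool:
    return notional >= THRESHOLD  # bug fixed in this copy


def is_reportable_project_c(notional: int) -> bool:
    return notional > THRESHOLD  # this copy was missed and still has the bug


copies = [is_reportable_project_a, is_reportable_project_b, is_reportable_project_c]

# The three "identical" checks now disagree on the boundary value.
results = [check(THRESHOLD) for check in copies]
print(results)
```

Nothing in the code itself records that these three functions were meant to stay in sync, which is why the third copy is easy to miss.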
24
  • 4
    Was the asterisk after "No changes ever needed" meant to refer to a footnote of some kind?
    – gerrit
    Commented Oct 19, 2023 at 6:18
  • 6
    @gerrit Wondering about that too. I guess the footnote could just say "Ha ha." in a sarcastic tone.
    – Thomas
    Commented Oct 19, 2023 at 7:48
  • 7
    For "No changes ever needed" (a questionable premise), there is still a relatively small benefit of limiting the size of your code base. If there are significant amounts of duplication, it could take more time to understand the code, to track down bugs and to find some code you're looking for. When you need to change 1 case, that can often be a fairly easy case of adding a parameter to a method (although it depends on the nature of the change). It can also often be really hard to judge how often something would need to be changed in future, and how big those changes would be.
    – NotThatGuy
    Commented Oct 19, 2023 at 13:13
  • 8
    As an addition to the "no changes ever needed" point: changes also include bugfixes, which, in 99% of cases will need to be changed in all the places the code is used. So unless you expect that the code you just wrote has no bugs whatsoever (ha ha), you should lean towards deduplication by default.
    – Tomeamis
    Commented Oct 19, 2023 at 16:41
  • 3
    "is designed to be extensible" that's the hard part.
    – Alexander
    Commented Oct 19, 2023 at 17:04
20

You can copy and paste code.

There are times when it is the right thing to do. We are taught not to do it because, more often than when it is the right thing to do, it is the lazy thing to do.

It is not the right thing to do when you’re copying an idea that, if it changes, will have to be changed everywhere you copied it. If each copy needs to vary independently, then it’s good to copy.

You can dedupe code

The thing that makes not-copying difficult to write is you’ll need a good name. Avoiding creating bad names is good but avoiding creating good names is bad.

If we can get a good name out of you then the code becomes easier to read. You’ve discovered a useful abstraction that saves readers time.

Sometimes duping is required

However, sometimes code expresses a different idea that varies independently and just happens to look the same. In this case duplication is no sin.

You can’t tell by looking at the copied code. Being the same code character for character isn’t the sin.

Just because x and y both happen to equal 1 doesn’t, on its own, mean one of them has to go. No, but if you had a rule that said x must equal y then you should question whether you really need y.
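
A small sketch of code that looks the same but expresses different ideas. The two rules and their 5% rates are invented for illustration; the point is only that the bodies match character for character while the reasons for change differ.

```python
def late_fee_cents(overdue_cents: int) -> int:
    """Late-payment fee: 5% of the overdue amount (assumed business rule)."""
    return overdue_cents * 5 // 100


def sales_tax_cents(price_cents: int) -> int:
    """Sales tax: 5% of the price (assumed tax rule)."""
    return price_cents * 5 // 100


# Today both functions return the same number for the same input...
same_today = late_fee_cents(10_000) == sales_tax_cents(10_000)
print(same_today)
```

If the tax rate changes, the late fee should not move with it, so merging these into one "five percent" helper would couple two unrelated rules. Had there been a rule that the fee must always equal the tax, one definition would be the better choice.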

Uncertainty

What if you can’t predict the future and aren’t sure if it needs to vary independently? In that case, consider avoiding copying simply because it makes it harder to see what you have done. Requirements change isn't the only thing that forces change. Discovered bugs force change as well. It's not fun to find you've copied a bug to a bunch of different places and left yourself no easy way to find every place. Also there's refactoring. Your copy might be an easy to detect dupe now but then someone refactors it and suddenly it's a dupe (still does the same thing for the same reasons) that looks completely different.

When you give a snippet of code a name and reuse that name you are making the reuse explicit, ensuring that each use changes together, and hopefully using a good name on a good abstraction. When you don't need those features, copy as you please.

Try again later

You might read this and conclude that since these features were desired the junior developer made a mistake. Hold on. Often the easy way to reuse code is to first copy it, figure out what code needs to be created to support it, and test that it really does what you need it to do. Then consider factoring it out with a name and abstraction (method, object, etc). Yes, copy and paste can be a good first step to explicitly reusing code. Which is exactly what you used it for.

Mark your path

But this only works if you know where all that copying happened. You were lucky you could find it. People with questions and bathroom trips interrupt at the most inconvenient times. So if you want to be a casual copier consider adopting a habit of documenting where your code came from when you copy it. A quick little comment can save your bladder and ensure that the reuse is detectable.
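
One way to make such a habit searchable is a fixed comment marker. The sketch below assumes a made-up convention, a `# Copied from: <origin>` comment next to each pasted snippet, and a hypothetical helper that lists those markers. It is not an established tool or standard.

```python
import re

# Hypothetical convention: each pasted snippet carries a provenance comment.
MARKER = re.compile(r"#\s*Copied from:\s*(?P<origin>\S+)")


def find_copies(source: str) -> list[tuple[int, str]]:
    """Return (line number, origin) for each provenance marker in source text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            hits.append((number, match.group("origin")))
    return hits


# Example input: a file that contains one documented copy.
sample = """\
def show_dialog(action):
    # Copied from: project_a/dialogs.py
    return "Confirm " + action + "?"
"""

found = find_copies(sample)
print(found)
```

Running the same scan over every project gives a list of places to revisit when the original changes.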

1
  • 2
    Excellent answer. You didn't explicitly mention the acronym DRY, but that's what the discussion boils down to. I once wrote a blog post where I coined the SRY principle: Sometimes Repeat Yourself. The "independent variations" argument is there, plus two additional reasons why repetition can be good: readability (because there's less abstraction) and searchability (only applies in specific situations).
    – Thomas
    Commented Oct 19, 2023 at 7:47
14

You can talk about untested code, brittleness, etc. That's just going to fly over their head. The simplest way to describe it is this: if you need to change the behavior of that dialog window and it's all in one library, you don't have to hunt around in 20 different projects for every place it's been used so you can change each copy/pasted instance.
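
As a rough sketch of the "all in one library" side, assuming a hypothetical shared function `regulatory_notice` and three made-up call sites standing in for the three projects:

```python
# Hypothetical shared module, e.g. regulatory_dialog.py, imported by each project.
def regulatory_notice(action: str, trade_id: str) -> str:
    """Build the text of the regulatory dialog for an action on a trade."""
    return f"Regulatory notice: you are about to {action} trade {trade_id}. Continue?"


# Each project keeps only a thin call site; none of them owns the wording.
def project_a_on_cancel(trade_id: str) -> str:
    return regulatory_notice("cancel", trade_id)


def project_b_on_amend(trade_id: str) -> str:
    return regulatory_notice("amend", trade_id)


def project_c_on_settle(trade_id: str) -> str:
    return regulatory_notice("settle", trade_id)


messages = [
    project_a_on_cancel("T-1"),
    project_b_on_amend("T-2"),
    project_c_on_settle("T-3"),
]
print(messages)
```

If the wording of the notice has to change, the edit happens once in `regulatory_notice` and every caller picks it up.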

5
  • 4
    I think this is the most intuitive answer that a junior programmer could understand. They might say, "But I know all the places where it was copied to" because they haven't truly grasped the scale of a large project yet, or they haven't grasped that they might not be the only person maintaining the code. Commented Oct 18, 2023 at 17:54
  • 2
    I'd add that, when trying to tell someone else later how to achieve the same thing, instead of "Copy and paste this segment into your code to make the thing you want, and keep an eye out anytime we need to update that part", you can say "Call this thing that we made, and if it works once, you can ignore the details." Commented Oct 18, 2023 at 18:03
  • 5
    @NateBarbettini You may be right. The stages of DRY seem to be: 1) Always repeat 2) Never repeat 3) repeat only when you can see each as a special case 4) repeat and fix later as needed when you can see how change will come. If you just want to move the junior dev from 1 to 2 this is fine. Commented Oct 18, 2023 at 18:03
  • 2
    @NateBarbettini yeah... you might know all of the places where it was copied to, and still forget to modify some (or modify them incorrectly).
    – RonJohn
    Commented Oct 20, 2023 at 13:10
  • @RonJohn Exactly. It's something that every developer can feel almost physically, but that's because we've all done it. Juniors may not have had the painful experience yet. Commented Oct 21, 2023 at 0:52
13

You can share source code files between projects without copy & paste, so sharing binary modules vs. copy & paste is a false dichotomy / false dilemma.

Copy & paste offers no built-in mechanism to trace the various copies that went into the different projects.  Sometimes that is fine, but other times highly problematic.

Sharing source code files (obviously) requires that both projects have access to the (original) shared source files, and this must be reflected in the build of a project — though this means that there is a record of the sharing built into the build system for the projects.

Sharing binary modules (if you mean dynamically loadable binaries) is sometimes a good answer, though adds complexity to the delivery and deployment of a project build.

You have to decide what kind of sharing is most appropriate and meets your sharing objectives as there are several to choose from, not just the two of copy & paste vs. binary modules.

Refactoring to find additional abstractions may also increase the ability to share without copy & paste, not just across projects but within one project.

9

Every line of code is future maintenance cost. Make three copies of your library as source code and you just tripled your maintenance cost.

Now what’s really bad is to have the same code three times with tiny changes.

4
  • 2
    This is absolutely true. "Every line of code is future maintenance cost". But visit Code Golf and after that try to tell me that every line of code has an equal maintenance cost. Commented Oct 18, 2023 at 20:56
  • 3
    For a prime example, see the Microsoft Office suite. It's infamous for features that should reasonably be identical between programs, but yet they're slightly different in ways that frustrate users. Newer versions have improved this somewhat, but in the early 2000s it was a mess. I'm sure it was easier for programmers, but the user experience was very annoying.
    – bta
    Commented Oct 18, 2023 at 23:49
  • 1
    @bta exactly this. In power point you can easily draw flow charts, because when you draw an arrow you get anchor points. In word the anchor points are missing. What the ... Commented Oct 19, 2023 at 11:41
  • Agree with the choice of words... but you skipped the second half of the sentence - "Make three copies of your library as source code and you just tripled your maintenance cost... try to make the change to that library used in 50 projects and you are toast". :) Commented Oct 20, 2023 at 6:19
4

In this case, there is a particular reason why copy-paste is worse than reuse.

You wrote:

Feature was basically showing a regulatory dialog when the user wanted to do some action on a trade.

This is an external dependency which is likely to change. Compliance is simpler when this type of code is centralised.

3

The coding culture at my workplace is not so great so there's that. But apart from all the failures of the upper layers and culture we can point out here (granting all that), what is a good way to explain to a layman why code duplication is bad?

To your point of explaining it to a layman: Why reinvent the wheel?

Their (hopefully) logical response to that would be "Copying and pasting isn't reinventing, it is just using it again".

Then use the multiple clock paradigm to explain why copy/pasting isn't really reuse, it is inheritance (editor inheritance).

Having multiple copies of what started out as the same thing is just like having multiple clocks on the wall. Which one is correct (think of cheap mechanical clocks as they drift out of time)? And consider the effort to set them all for daylight saving time.

Having the code (binary) once is similar in concept to a single high quality clock. One clock to maintain, everyone knows what time it is.

Anyone past a noobie will cringe at this answer, but they wouldn't ask the question.

1
  • 1
    More specifically, having multiple clocks with each its independent clockwork is what's a problem. Shared code is also like having multiple clocks, but they're all synchronised so it's enough to spend maintenance effort on one of them. Commented Oct 19, 2023 at 16:17
2

I'm a big believer in @gnasher729's answer, but...

When you reference someone else's binary code, you have just linked that code's lifecycle to your code's lifecycle. Look at what the various package managers (npm, nuget, ...) go through to manage versioning.

If the original code wasn't designed and packaged to share, then you are doing the wrong thing; eventually the glue that bonds your code to the shared code will either break or make you pull your hair out.

This is a lot like class inheritance. It's great, till it isn't. There's a reason why many projects now specify that base classes that aren't designed to be inherited should be marked sealed (or final or whatever the language calls it).

Testing also enters the picture. It's no fun when you need a new feature from a new version of your shared dependency, and once you build things, all your tests break (and, how much test coverage do you need given that your dependency has changed).

I've done way too much maintenance programming over my career; copy/pasta can lead to much pain and maintenance cost. Unfortunately, so can ill-designed binary sharing.

2
  • There is an important point you are making here that I think needs to be called out explicitly - that the primary way to share code between multiple projects is that you have to create yet another binary - which incurs the cost of binary dependency management - this is not a cost that has to be paid if you have a monolith (that compiles to a single binary).
    – DavidT
    Commented Oct 19, 2023 at 22:26
  • Yes, everything is hunky-dory if you have a monolith - except, of course, that you have a monolith. The last thing I worked on had 37 separate runnable applications and dozens upon dozens of supporting "assemblies" (the .NET equivalent of Java packages) all bound together into a single git repo and a single Visual Studio solution. That's a much different dependency management problem.
    – Flydog57
    Commented Oct 19, 2023 at 22:53
0

Given all the benefits of DRY code, and given how much time I spend hunting for similarities or blunt copy-pastes to dedupe, seeing the question put this way made me want to play a bit of devil's advocate: Why do we end up with copy-pasted code? Why does it stick around?

I'd say one benefit of the lazy approach is that it is quicker to PoC code. You can have an idea for some logic and you want it written down and dev-tested before you spend a day designing it "properly". Sure, something good for maintenance will pay off in the long run, but if during this or next sprint you expect to find that this was a dead end, why invest too much early on? Tech debt stories are full of "quick and dirty that we can fix later" approaches ;)

Another possible benefit is what one colleague called "Darwinian evolution of code": you end up with several codebases that started from a common ancestor and end up independently solving different problems and adapting to those. If you keep an attentive lookout for these deviating clones, and eventually but not immediately refactor them into a single implementation (or perhaps some base class/interface and per-situation implementations), you can find that the resulting code is sturdier precisely because you've tried several approaches and have tried their up- and down-sides in more wild situations than just in one greenhouse with a lot of assumptions/constraints. More so if it started with a bright neat idea rather than some diligent design (not implying here that one is always better than the other).

So surely, ending up with less code to maintain is good (e.g. a bug fixed once in the shared code is fixed for everyone at once; you quickly get feedback if it breaks something). But smothering the copy-pasted code sprawl in the crib might actually be counter-productive. There may be situations where it is better to let it sprawl for a month or two, only then refactor ;)
