58

A fellow developer has started work on a new Drupal project, and the sysadmin has suggested that they should only put the sites/default subdirectory in source control, because it "will make updates easily scriptable." Setting aside that somewhat dubious claim, it raises another question -- what files should be under source control? And is there a situation where some large chunk of files should be excluded?

My opinion is that the entire tree for the project should be under control, and this would be true for a Drupal project, rails, or anything else. This seems like a no-brainer -- you clearly need versioning for your framework as much as you do for any custom code you write.

That said, I would love to get other opinions on this. Are there any arguments for not having everything under control?

1
  • 2
    Anything that generates the final representation (including documentation) should be under version control, provided the storage is feasible. It sounds like code generation is being dubiously conflated with versioning here, in which case I would check the claims of easily scripting (read: generating) updates from what you have versioned.
    – MrGomez
    Commented Nov 21, 2011 at 18:15

17 Answers

72

I would say that the minimum that source control should contain is all of the files necessary to recreate a running version of the project. This even includes DDL files to set up and modify any database schema, and in the correct sequence too. Minus, of course, the tools necessary to build and execute the project as well as anything that can be automatically derived/generated from other files in source control (such as JavaDoc files generated from the Java files in source control).
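A minimal sketch of that sequencing idea, with hypothetical file names and the actual database call left commented out — numbering the DDL scripts so that lexical order matches apply order:

```shell
# apply_ddl: apply versioned DDL scripts in lexical (= numeric) order.
# Sketch only; the psql line is commented out and DATABASE_URL is hypothetical.
apply_ddl() {
    dir="${1:-db/ddl}"
    for script in "$dir"/[0-9][0-9][0-9]_*.sql; do
        [ -e "$script" ] || continue      # glob matched nothing
        echo "applying $script"
        # psql "$DATABASE_URL" -f "$script"
    done
}
```

Because the scripts themselves are in source control, re-creating any historical schema state is just a matter of checking out the right revision and running them in order.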

9
  • 1
    @EdWoodcock: You're right, getting the order correct can be a real pain, but sometimes you want to re-create a particular state of the database, or optionally apply certain changes when testing, rather than dropping/recreating the whole thing. I find it varies by project. Commented Nov 18, 2011 at 16:08
  • 1
    Point taken, there's a level of pragmatism required for that one.
    – Ed James
    Commented Nov 18, 2011 at 16:16
  • 3
    @JayBazuzi: Workstation setup guides (in source control) should outline the necessary tools and dependencies, as well as how to set up and where to get the tools from. Maintaining a usable toolkit is important, but is not the purpose of source control. I suppose if you REALLY wanted to, you could add the installer file/.msi and some instruction files, but that might not be feasible in many workplaces. Would you really want to check in VisualStudio Pro 2010 or IBM RAD, XMLSpy, etc, into your source control system? Many workplaces have controlled deployments for these tools. Commented Nov 18, 2011 at 21:56
  • 2
    @artistoex: That's splitting hairs. It's generally assumed that the build box has the same libraries as the dev boxes do. If the two differ, there's something wrong with the IT manager. All you (ideally) would need is just the source code. For some projects this isn't applicable, but for most it should be.
    – Mike S
    Commented Dec 9, 2011 at 10:24
  • 1
    @mike I meant it. I think it was Kent Beck, in a book about XP, who actually proposed that. Not such a bad idea. You are nearly 100% sure to be able to reconstruct all build factors. And don't forget that environments most likely change over the course of a project.
    – wnrph
    Commented Dec 9, 2011 at 13:11
30

It is best to put just about everything under the sun into source control.

  • Code

  • Libraries

  • Resources

  • Build/Deploy Scripts

  • Database creation and update scripts

  • Certain documentation

  • Environment Specific Configuration Files

The only things that shouldn't be put into source control are the build artifacts of your project.
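In .gitignore terms, that rule might look something like this (patterns illustrative; adjust to your build):

```
# Build artifacts -- everything here can be regenerated by the build
build/
dist/
*.o
*.class
*.log
```

Everything else in the working tree — code, libraries, resources, scripts — gets committed.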

4
  • 5
    Make sure that the "certain documentation" isn't dependent on a particular tool. I've run into a number of projects that used something like the SunOS version of Frame to do docs; they checked in all of the ".mif" files, but not the resulting .ps or .pdf files. Now that SunOS and Frame are relegated to the dustbin of history, a lot of design docs only exist as treasured paper copies. Commented Nov 18, 2011 at 16:22
  • 2
    @BruceEdiger In that case I would personally want both the output and the tool specific information. If the tool disappears, you at least still have a static electronic copy :)
    – maple_shaft
    Commented Nov 18, 2011 at 16:26
  • One of the advantages of a big process-heavy company: the source goes into the VCS, and the generated stuff has to go into the configuration management system, so even if your tool is defunct you still have the results controlled
    – jk.
    Commented Nov 18, 2011 at 21:26
  • How about the specific version of the compiler(s) you're using? Heck, why not the whole OS?
    – wnoise
    Commented Nov 20, 2011 at 13:24
19

I would say that:

  • any file needed to perform the build goes into version control
  • any file (which can be) generated by the build does not

I would tend to put large binaries such as tool install packages somewhere outside of the trunk, but they should still be under version control.

18

Hard won experience has taught me that almost everything belongs in source control. (My comments here are colored by a decade and a half developing for embedded/telecom systems on proprietary hardware with proprietary, and sometimes hard to find, tools.)

Some of the answers here say "don't put binaries in source control". That's wrong. When you're working on a product with lots of third party code and lots of binary libraries from vendors, you check in the binary libraries. Because, if you don't, then at some point you're going to upgrade and you'll run into trouble: the build breaks because the build machine doesn't have the latest version; someone gives the new guy the old CDs to install from; the project wiki has stale instructions regarding what version to install; etc. Worse still, if you have to work closely with the vendor to resolve a particular issue and they send you five sets of libraries in a week, you must be able to track which set of binaries exhibited which behavior. The source control system is a tool that solves exactly that problem.

Some of the answers here say "don't put the toolchain in source control". I won't say it's wrong, but it's best to put the toolchain in source control unless you have a rock solid configuration management (CM) system for it. Again, consider the upgrade issue as mentioned above. Worse still, I worked on a project where there were four separate flavors of the toolchain floating around when I got hired -- all of them in active use! One of the first things I did (after I managed to get a build to work) was put the toolchain under source control. (The idea of a solid CM system was beyond hope.)

And what happens when different projects require different toolchains? Case in point: After a couple of years, one of the projects got an upgrade from a vendor and all the Makefiles broke. Turns out they were relying on a newer version of GNU make. So we all upgraded. Whoops, another project's Makefiles all broke. Lesson: commit both versions of GNU make, and run the version that comes with your project checkout.

Or, if you work in a place where everything else is wildly out of control, you have conversations like, "Hey, the new guy is starting today, where's the CD for the compiler?" "Dunno, haven't seen them since Jack quit, he was the guardian of the CDs." "Uhh, wasn't that before we moved up from the 2nd floor?" "Maybe they're in a box or something." And since the tools are three years old, there's no hope of getting that old CD from the vendor.

All of your build scripts belong in source control. Everything! All the way down to environment variables. Your build machine should be able to run a build of any of your projects by executing a single script in the root of the project. (./build is a reasonable standard; ./configure; make is almost as good.) The script should set up the environment as required and then launch whatever tool builds the product (make, ant, etc).
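A sketch of what such an entry-point script might look like (the `tools/bin` layout and the `CFLAGS` value are illustrative assumptions, not part of the answer):

```shell
#!/bin/sh
# ./build -- single entry point: set up the environment, then delegate.
set -e

# Prefer the toolchain committed alongside the project, if present,
# so everyone (including the build machine) runs the same tools.
TOOLS="$(pwd)/tools/bin"
if [ -d "$TOOLS" ]; then
    PATH="$TOOLS:$PATH"
fi
export PATH

# Build-critical environment lives here, not on individual machines.
export CFLAGS="-O2 -Wall"

# Delegate to whatever actually builds the product:
# make "$@"      # or ant, mvn, etc. (commented out in this sketch)
```

Because the script travels with the checkout, the 1.7 branch keeps its old flags and the 1.8 branch gets the new ones, with no per-machine drift.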

If you think it's too much work, it's not. It actually saves a ton of work. You commit the files once at the beginning of time, and then whenever you upgrade. No lone wolf can upgrade his own machine and commit a bunch of source code that depends on the latest version of some tool, breaking the build for everyone else. When you hire new developers, you can tell them to check out the project and run ./build. When version 1.8 has a lot of performance tuning, and you tweak code, compiler flags, and environment variables, you want to make sure that the new compiler flags don't accidentally get applied to version 1.7 patch builds, because they really need the code changes that go along with them or you see some hairy race conditions.

Best of all, it will save your ass someday: imagine that you ship version 3.0.2 of your product on a Monday. Hooray, celebrate. On Tuesday morning, a VIP customer calls the support hotline, complaining about this supercritical, urgent bug in version 2.2.6 that you shipped 18 months ago. And you still contractually have to support it, and they refuse to upgrade until you can confirm for certain that the bug is fixed in the new code, and they are large enough to make you dance. There are two parallel universes:

  • In the universe where you don't have libraries, toolchain, and build scripts in source control, and you don't have a rock-solid CM system.... You can check out the right version of the code, but it gives you all kinds of errors when you try to build. Let's see, did we upgrade the tools in May? No, that was the libraries. Ok, go back to the old libraries -- wait, were there two upgrades? Ah yes, that looks a little better. But now this strange linker crash looks familiar. Oh, that's because the old libraries didn't work with the new toolchain, that's why we had to upgrade, right? (I'll spare you the agony of the rest of the effort. It takes two weeks and nobody is happy at the end of it, not you, not management, not the customer.)

  • In the universe where everything is in source control, you check out the 2.2.6 tag, have a debug build ready in an hour or so, spend a day or two recreating the "VIP bug", track down the cause, fix it in the current release, and convince the customer to upgrade. Stressful, but not nearly as bad as that other universe where your hairline is 3cm higher.

With that said, you can take it too far:

  • You should have a standard OS install that you have a "gold copy" of. Document it, probably in a README that is in source control, so that future generations know that version 2.2.6 and earlier only built on RHEL 5.3, and that 2.3.0 and later only built on Ubuntu 11.04. If it's easier for you to manage the toolchain this way, go for it, just make sure it's a reliable system.
  • Project documentation is cumbersome to maintain in a source control system. Project docs are always ahead of the code itself, and it's not uncommon to be working on documentation for the next version while working on code for the current version. Especially if all your project docs are binary docs that you can't diff or merge.
  • If you have a system that controls the versions of everything used in the build, use it! Just make sure it's easy to sync across the whole team, so that everyone (including the build machine) is pulling from the same set of tools. (I'm thinking of systems like Debian's pbuilder and responsible usage of python's virtualenv.)
6
  • 1
    Don't forget to check in any hard-to-replace hardware. One company lost a build because they no longer had some CPU (HPPA? 68040?) that the build tools ran on.
    – hotpaw2
    Commented Dec 12, 2011 at 17:55
  • 1
    What does “CM system” stand for?
    – bodo
    Commented Nov 12, 2015 at 12:07
  • 1
    In most cases I'd rather document the binaries and versions than commit the binaries themselves. Yes - in your case the binaries were hard to get, and you didn't have another good method of stashing them. But I feel generally documenting all dependencies as well as how to set things (like the dev VM) up works as a lighter-weight equivalent. Scripting it improves reproduction, but in the end we all have to ship.
    – Iiridayn
    Commented Jul 30, 2018 at 23:57
  • Downvoting because of the advice to put the toolchain and build artifacts in source control. Yes, if you have poor management solutions for those, it may sometimes be necessary, but it’s never desirable. And popular OSS tools like PHP will always be available (because there’s no single publisher to vanish), so it’s definitely not necessary in the case of the present question. Commented Mar 26, 2019 at 4:20
  • Some time later, thinking more about this: I now do most of my development within well-defined Docker images, which I suppose is a way of putting the tool dependencies (or rather, a description of them) in the VCS. Build artifacts don’t belong there, though. Commented Sep 20, 2021 at 13:05
16

And don't forget to put all database code in source control as well! This would include the original create scripts, the scripts to alter tables (marked by which version of the software uses them, so you can recreate any version of the database for any version of the application) and scripts to populate any lookup tables.

13

The only things that I do not put under source control are files that you can easily regenerate or are developer specific. This means executables and binaries that are composed of your source code, documentation that is generated from reading/parsing files under source control, and IDE-specific files. Everything else goes into version control and is appropriately managed.

7

The use-case for source control is: what if all our developers' machines and all of our deployment machines were hit by a meteor? You want recovery to be as close to "checkout and build" as possible. (If that's too silly, you can go with "hire a new developer.")

In other words, everything other than OS, apps, and tools should be in VCS, and in embedded systems, where there can be a dependency on a specific tool binary version, I've seen the tools kept in VCS too!

Incomplete source control is one of the most common risks I see when consulting -- there's all sorts of friction associated with bringing on a new developer or setting up a new machine. Along with the concepts of Continuous Integration and Continuous Delivery you ought to have a sense of "Continuous Development" -- can an IT person set up a new development or deployment machine essentially automatically, so that the developer can be looking at code before they finish their first cup of coffee?

4
  • 1
    This also means that working from multiple machines is painless. Just pull the repo, and you are ready to go. Commented Nov 18, 2011 at 18:50
  • +1 for the meteor reference, that sums things up nicely.
    – muffinista
    Commented Nov 20, 2011 at 1:01
  • Can someone point to an example of (eg) a java project with the full toolchain under rev control such that it can be checked out and used in a straightforward way?
    – andersoj
    Commented Mar 28, 2013 at 2:22
  • @andersoj Check out boxen.github.com Commented Mar 28, 2013 at 20:12
6

Anything that contributes to the project and for which you want to track changes.

Exceptions can include large binary blobs such as images, if you're using an scm that doesn't handle binary data very well.

0
2

Drupal uses git, so I will use git's terminology. I would use subrepos for each module to be able to pull down module updates from Drupal's official repos, while still preserving the structure of individual deployments. That way you get the scriptability benefits without losing the benefits of having everything under source control.
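A runnable sketch of those mechanics with git submodules (git's name for subrepos), using a throwaway local repository as a stand-in for Drupal's official module repo — real URLs would replace the `$work` paths:

```shell
#!/bin/sh
# Sketch: pin a module repo as a submodule inside the site repo.
set -e
work=$(mktemp -d)

# Stand-in for the official "views" module repository.
git init -q "$work/views_origin"
git -C "$work/views_origin" -c user.email=a@b -c user.name=demo \
    commit -q --allow-empty -m "views 1.0"

# The site repository tracks the module as a submodule, pinned to a commit.
git init -q "$work/site"
cd "$work/site"
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m "initial site"
git -c protocol.file.allow=always submodule add "$work/views_origin" sites/all/modules/views
git -c user.email=a@b -c user.name=demo commit -qm "Pin views module at a known commit"
```

Updating a module is then `git submodule update --remote` plus a commit recording the new pin, which is exactly the scriptable-update workflow the sysadmin wanted.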

2

Anything your automated build generates does not go in source control. Anything that requires no modification during the build does go in source control. It's that simple.

For example, the following don't go in source control:

  • generated code
  • generated binaries
  • anything created by your build
  • anything created at runtime by your service, process, web application

What does go in source control:

  • anything a human creates
  • anything that is created by another person or group (e.g. a third-party in-house library where source control is distributed or an open-source project's binaries).
  • scripts and other source that create things like a database (i.e. how would you recreate the db if all the DBAs go AWOL).

These rules-of-thumb are predicated on the notion that whatever is in source control could be modified by a human and could take someone's valuable time to understand why it's there.

1

Everything should be under source control, except:

  • Configuration files, if they include configuration options that are different for each developer and / or each environment (development, testing, production)
  • Cache files, if you are using filesystem caching
  • Log files, if you are logging to text files
  • Anything that, like cache files and log files, is generated content
  • (Very) Large binary files that are unlikely to change (some version control systems don't like them, but if you are using hg or git they don't mind much)

Think of it like this: every new member of the team should be able to check out a working copy of the project (minus the configuration items).

And don't forget to put database schema changes (simple sql dumps of every schema change) under version control too. You could include user and api documentation, if it makes sense to the project.
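For example, a repository layout along these lines (names hypothetical) keeps those dumps ordered and diffable:

```
db/
  schema/
    001_initial_schema.sql
    002_add_profile_table.sql
    003_add_index_on_login.sql
  lookup/
    100_populate_countries.sql
```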


@maple_shaft raises an important issue with my first statement regarding environment configuration files in the comments. I'd like to clarify that my answer is to the specifics of the question, which is about Drupal or generic CMS projects. In such scenarios you typically have a local and a production database, and one environment configuration option is the credentials to these databases (and similar credentials). It's advisable that these are NOT under source control, as that would create several security concerns.

In a more typical development workflow however, I do agree with maple_shaft that environment configuration options should be under source control to enable for an one-step build and deploy of any environment.
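One way to reconcile the two positions is to version a template rather than the credentials themselves, rendering the real settings.php at deploy time. A sketch (the file name, placeholder syntax, and variable names are all made up for illustration):

```shell
#!/bin/sh
# Render settings.php from a committed template; real credentials come
# from the deploy environment, so they never enter the repository.
set -e
DB_USER="${DB_USER:-changeme}"
DB_PASS="${DB_PASS:-changeme}"

# settings.php.template is committed; sites/default/settings.php is not.
if [ -f settings.php.template ]; then
    mkdir -p sites/default
    sed -e "s/@DB_USER@/$DB_USER/" -e "s/@DB_PASS@/$DB_PASS/" \
        settings.php.template > sites/default/settings.php
fi
```

The template's history lives in source control; the secrets live only on the machines that need them.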

6
  • 3
    -1 HIGHLY DISAGREE with your statement about configuration files not belonging in source control. Perhaps developer specific configuration files yes, however environment specific configuration files are necessary if you want the ability for a one-step build and deploy of any environment.
    – maple_shaft
    Commented Nov 18, 2011 at 16:07
  • 2
    @maple_shaft In the context of the question (Drupal project or generic CMS web project) "one-step build and deploy of any environment" is a highly unlikely scenario (will you put production database credentials in with everything?). I'm answering the question, not providing general guidelines on what should be put under version control. - But your downvote is welcome :)
    – yannis
    Commented Nov 18, 2011 at 16:11
    I can see that in situations where the source code repository is public, as in open source, or where security is an extreme concern, as in financial institutions, database credentials do not belong in source control. Beyond that, source control should be password protected and limited to a certain set of users, so database credentials in source control should not be a primary concern in that scenario. Now that you've pointed that out to me, the downvote does seem harsh; if you edit your answer I can remove it.
    – maple_shaft
    Commented Nov 18, 2011 at 16:23
  • @maple_shaft Don't worry about the downvote (I've edited the question, but feel free to leave it if you want). As for password protected version control: We recently had to deal with a situation where a laptop was stolen from a member of our management team, that contained the password to our version control system (which at the time had our S3 credentials in). It was a big snafu from his part (laptop wasn't password protected, and a few other details I can't really disclose) but still it's something that can happen to everyone. Building from that experience we moved everything out of vcs.
    – yannis
    Commented Nov 18, 2011 at 16:34
  • @maple_shaft and although it may seem like I'm advocating paranoia, we now go to the extreme to protect anything related to credentials from similar snafus.
    – yannis
    Commented Nov 18, 2011 at 16:35
1

Anything that you need in order to work, and that can change, needs to be versioned one way or another. But there is rarely a need to have two independent systems keep track of it.

Anything generated in a reliable way can usually be attached to a source version - therefore it doesn't need to be tracked independently: generated source, binaries that are not passed from one system to another, etc.

Build logs and other stuff that probably nobody cares about (but you never know for sure) are usually best tracked by whatever is generating them: Jenkins, etc.

Build products that are passed from one system to another need to be tracked, but a Maven repository is a good way to do that - you don't need the level of control that source control provides. Deliverables are often in the same category.

Whatever remains (and at this point, there should be little more than source files and build server configuration) goes into source control.

0

My answer's pretty simple: not binaries. By implication, almost everything else.

(Definitely not database backups or schema migrations or user-data, though.)

1
  • Schema migrations absolutely go in source control. That way you know what DB schema the code expects. Commented Oct 6, 2019 at 20:50
0

Source control is a change tracking mechanism. Use it when you want to know who changed what and when.

Source control is not free. It adds complexity to your workflow, and requires training for new colleagues. Weigh the benefits against the cost.

For example, it can be tough to control databases. We used to have a system where you had to manually save definitions in a text file and then add those to source control. This took a lot of time and was unreliable. Because it was unreliable, you could not use it to set up a new database, or to check at what time a change was made. But we kept it for years, wasting countless hours, because our manager thought "all things should be in source control".

Source control is not magic. Try it, but abandon it if it doesn't add enough value to offset the cost.

3
  • 2
    Are you serious? Source control is bad because it requires training for new colleagues? Are you actually saying you'd prefer to work long-term with people who don't know how to use source control and aren't willing to learn? Personally I'd rather flip burgers.
    – Zach
    Commented Nov 19, 2011 at 1:40
  • Hehe I'm not arguing against source control, just against blindly using source control for everything. If source control has a very complex workflow and that doesn't add value, I would prefer not to use it.
    – Andomar
    Commented Nov 19, 2011 at 1:51
  • 2
    My point is, even if you're only using it for some things (cough source code cough), your colleagues should already know how to use it, so training them shouldn't be increased overhead in using it for something else.
    – Zach
    Commented Nov 19, 2011 at 1:55
0

Stuff I would not put in source control:

  • Secret keys and passwords
  • The SDK, even though it's in the same directory; if I make a patch to the SDK I should make it a separate project, since it would be per framework instead of per app
  • 3rd-party libraries. Leftovers from migration, backups, compiled code, code under other licenses (perhaps)

So I don't do an hg addremove, for instance, since I make a new clone every once in a while when the SDK updates. That also makes me do a complete backup every time the SDK updates, and check that a new version cloned from the repository works.

0

I highly recommend the following book, which addresses your concerns:

Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Specifically, Chapter 2 addresses items to be placed into source control, which, as quite a few folks have said, is practically everything except content generated as a result of a build.

I do not agree with one piece of the accepted answer provided by @FrustratedWithFormsDesigner: where he advocates not placing into version control the tools necessary to build the project. Somewhere in source control (adjacent to the code being built) should be the build scripts for the project, build scripts that run from a command line only. If by tools he means IDEs and editors, they should not be required to build the project whatsoever. They are good for active/rapid development, and the setup of that type of environment could be scripted as well, or downloaded from another section of SCM or from some type of binary management server; setting up such IDEs should be as automated as possible.

I also disagree with what @Yannis Rizos states about keeping environment configurations out of source control. The reason is that you should be able to reconstruct any environment at will using nothing but scripts, and that is not manageable without having configuration settings in source control. There is also no history of how configurations for various environments have evolved unless you place this information into source control. Now, production environment settings may be confidential, or companies may not want to place them in version control, so a second option is to still place them in version control so that they have a history, and to give this repository limited access.

-1

Keep all code in version control and keep all configuration and user data out. To be specific to Drupal, you need to put everything in version control except the files directory and settings.php.
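For a Drupal 6/7-era layout, that advice corresponds roughly to this .gitignore (patterns illustrative):

```
# user-uploaded content and generated files
sites/*/files/
# per-environment configuration (credentials, etc.)
sites/*/settings.php
```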
