73

I am following the advice of @Piotr Migdal in Is there an internet Git-like repository for collaboration on a paper?, and I want to ask about version control systems: how beneficial are they (especially in a LaTeX setting) for writing papers compared to Dropbox and SugarSync?

I have been using SugarSync for almost a year without trouble. Usually, I create the paper folder and invite the other authors to join, so we can all see and edit the latest version of the paper.

3

7 Answers

58

tl;dr: Version control is harder to set up, but makes it safe to work on the same file, and makes it easy to track history (i.e. previous versions).

Pros and cons of syncing files

Yes, the biggest advantage of things like Dropbox (I use it as well, for backing up and synchronizing my files) and SugarSync is their ease of use.

They may work for collaboration on files, but:

  • they are not meant for two people editing the same file at once (there is no merge functionality, so one person changing a file can overwrite changes made by another, without either of them knowing),
  • you get no history, so you cannot easily answer questions such as:
    • did anyone work on the file I want to work on now?
    • did anyone add or modify any other files?
    • which changes were made?
    • can I go back to a previous version, the one I sent to my supervisor?

Depending on what you do, this may not be an issue. For example, if only one person is editing the tex file, while the others are only reading or uploading figures, it's perfectly fine.

Also, see my answer to Simplest way to jointly write a manuscript? for working with collaborators who are not technically inclined.

Version control

Version control systems require some technical skills.

The two most common version control systems are Git and Mercurial (the latter being more Windows-friendly and, arguably, easier to start with).

By default, both come with command-line access only, but there are graphical interfaces as well (I really recommend starting with SourceTree).

So, if the collaborators are technically inclined, just teach them how to use it. If not, there is a way around it.

You can keep the paper under version control by yourself, without involving the others (I'm doing exactly that right now with 2 collaborators).

Just start a repository inside the folder you share (the examples use Git):

cd ~/path/to/the/folder
git init                    # start a Git repository inside this folder
git add .                   # tell Git to track all files inside it

Now, every time you or your collaborator make some changes (e.g. add some files, correct typos, revise a chapter, ...) you do:

git add .                                      # stage any newly added files as well
git commit -a -m "Fixed typos in Section 3"    # -a stages changes to already tracked files

Later, you will be able to go back to this version, and also to compare, e.g., the current version of your file with the previous one (the default comparison is line by line; here it is word by word):

git diff HEAD~1 --color-words my_file.tex
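The "go back to this version" step is mentioned above but not shown. A minimal sketch of it, assuming a scratch repository created with mktemp and a placeholder author identity (both are illustrative and not part of the original answer):

```shell
# Scratch repository so the example is self-contained (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

printf 'First draft of the introduction.\n' > my_file.tex
git add my_file.tex
git commit -q -m "First draft"

printf 'Second draft of the introduction.\n' > my_file.tex
git commit -q -a -m "Revise introduction"

git --no-pager log --oneline             # list the saved versions
old=$(git show HEAD~1:my_file.tex)       # read the previous version without touching the working copy
git checkout HEAD~1 -- my_file.tex       # overwrite the working copy with the previous version
```

`git show` is the safer of the two when you only want to look at an old version, since `git checkout ... -- file` replaces the current contents of the file.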

And a real-world example of using diff (it makes my life so much easier :)); the commit messages are in Polish, but I guess you get the idea:

[two screenshots of diff output]

Otherwise (a strip from PhD Comics):

[comic strip from PhD Comics]

11
  • 6
    If two people are editing the same file using Dropbox, Dropbox creates "conflicted copies" rather than trying to merge both changes into the same file, thereby avoiding damage to the file.
    – Ben Norris
    Commented Nov 15, 2012 at 13:36
  • 2
    "If two people are editing the same file using DropBox, DropBox created 'Conflicted copies' rather than try to merge both changes into the same file, thereby avoiding damaging the file." How can it tell when that happens? The two conflicting versions could be hours or even days apart. This is impossible to do without analyzing the content carefully (and sometimes even analyzing it). Commented Nov 15, 2012 at 16:34
  • 3
    @Kaz I'm not a computer scientist and for me it worked. Commented Nov 15, 2012 at 21:45
  • 19
    One awesome fringe benefit to using version control for a paper: Suppose you print out copies of version n of your paper and give them to collaborators for comments. A while later, when you're on version n+10, some of them get back to you with their changes. With version control (at least with Git), it's easy to enter their changes against version n and then apply them to your current version. Commented Jan 29, 2013 at 20:33
  • 4
    I use SmartGit to access Git. It's a lot easier than messing around with the command line, and free for non-commercial use. I found its automatic connection to BitBucket didn't work correctly though, so you have to cut and paste the link. Commented Feb 3, 2013 at 11:21
36

I'm not entirely sure how Dropbox and SugarSync work, but their main aim is not to monitor changes, but to keep files in sync across a multitude of platforms and to provide backup. A good version control system, in addition, allows you not only to keep older versions but also to comment on the changes to explain why they were made. Version control is also guaranteed to keep the chain of changes of a tex file even over very long periods of time (say submitting to journal A, getting rejected, submitting to journal B, getting reviews, new version, acceptance: such a cycle could easily take 1.5 years).

Also, in a Version Control System (VCS) you decide when you want to save a version, whereas in Dropbox I imagine the system makes that decision. Being in control yourself is important, for example to be able to generate a difference file when resubmitting a paper (see also my answer to this question on TeX SE).
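As a rough illustration of marking the submitted version yourself and diffing against it later, here is a sketch using Git (this answer uses Mercurial, so the commands are a swapped-in equivalent). The tag name, file names, scratch directory, and identity are all assumptions made for the example, and the plain word diff shown is not necessarily the difference file the linked TeX SE answer produces:

```shell
repo=$(mktemp -d)                            # scratch repository (illustrative)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

printf 'We study version control.\n' > paper.tex
git add paper.tex
git commit -q -m "Version submitted to journal A"
git tag -a submitted-journal-a -m "As submitted to journal A"   # hypothetical tag name

printf 'We study version control for papers.\n' > paper.tex
git commit -q -a -m "Address reviewer comments"

# Word-level differences between the submitted version and the current one.
git --no-pager diff --word-diff submitted-journal-a HEAD -- paper.tex > changes.diff
```

Because the tag is created by hand, it points at exactly the version you sent, no matter how many commits follow.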

Using a VCS you can also collaborate easily with people. Just create a private repository at Bitbucket (which supports Mercurial and Git), arrange for the other authors to have read and/or write access to your tex files in the repository, and they can change the paper or add to it. The VCS will take care of the merging.

I use Mercurial myself for version controlling papers. For version controlling a single tex file, a VCS might be overkill, but I would still recommend Mercurial.
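To make "the VCS will take care of the merging" concrete, here is a small sketch in Git (again a stand-in for the Mercurial setup described here). Two co-authors are simulated with a local branch instead of a hosted repository; the branch name, file contents, and identity are invented for the example:

```shell
repo=$(mktemp -d)                            # scratch repository (illustrative)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
main=$(git symbolic-ref --short HEAD)        # name of the default branch, whatever it is

printf 'Introduction.\n\nBackground.\n\nMethods.\n\nResults.\n' > paper.tex
git add paper.tex
git commit -q -m "Shared starting point"

# Co-author A rewrites the introduction on a separate line of work.
git checkout -q -b coauthor-a
printf 'A better introduction.\n\nBackground.\n\nMethods.\n\nResults.\n' > paper.tex
git commit -q -a -m "Rewrite introduction"

# Co-author B, starting from the same version, extends the results.
git checkout -q "$main"
printf 'Introduction.\n\nBackground.\n\nMethods.\n\nResults and discussion.\n' > paper.tex
git commit -q -a -m "Extend results"

# Both sets of edits end up in one file because they touch different lines.
git merge -q -m "Merge co-author A's introduction" coauthor-a
cat paper.tex
```

If both authors had changed the same line, Git would stop and report a conflict instead of silently picking one version.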

8
  • 3
    For one tex file it is not overkill - see the screenshot from my answer and judge for yourself :). Commented Nov 15, 2012 at 11:07
  • 2
    +1 Really good answer! There are two things I miss here, though: 1) it's better to split a big file into smaller ones, this can be easily done in LaTeX. 2) When two people want to work realtime on the same file I recommend to put a multi-editor layer before the VCS. C9 or Google Docs are good examples. Commented Nov 15, 2012 at 14:14
  • 2
    I had to create an account to this Stack Exchange site as well just to give +1 to this answer. :) The nice part about Bitbucket is that there also exists a nice Android client for the site, allowing you to monitor changes to your repositories from anywhere (provided you've got an Android smartphone). VCSs in general are great because they save the author information for all the files in them on a per row basis. Tex files work well with VCSs because they are plain text files, unlike, say, Word documents, where you'd need to use Word's internal versioning features.
    – ZeroOne
    Commented Nov 15, 2012 at 14:39
  • 3
    @jmendeth for an academic paper I would not split up in subfiles, for a report or book I would. Commented Nov 15, 2012 at 14:42
  • @PaulHiemstra of course, I wouldn't either for a paper. But I was speaking in general. :) Commented Nov 15, 2012 at 14:51
17

Given the praise received by version control systems in the existing answers, I’ll play the devil’s advocate here for a second and underline what I think is a very important point: it strongly depends on what your co-authors are comfortable with.

I use version control for most of the projects I do on my own, from code to papers. However, you have to realize that not everyone is familiar with this paradigm, and those who are familiar with it may not be familiar with a given piece of software (I myself am a heavy Subversion user, but have never used Git…). This is particularly true of people who don't develop software, as those tools come from the field of software development. So, check out what your co-authors use and what they are willing to learn. The great thing about a simple synchronization solution (such as DropBox) with no version control is that its learning curve is flat: just agree on a few rules (date-stamp all files, add initials, always send an email when you have created a new version). Anyone can understand that in a minute.

Finally, I'll add another remark: needing to track revision history in the short term does not necessarily require recording that history for posterity. For example, my incremental backup system (Apple’s Time Machine) creates snapshots of my files every hour for a day, every day for the past month, and so on. This covers some of the need for tracking older versions in the short term.

2
  • 1
    +1 for the "know your collaborators" message, but if you are collaborating with LaTeX (vs. say Word) users as the OP suggests, they are likely to be more open to the idea of VC (I would think), so it is probably worth making the case.
    – DQdlM
    Commented Nov 15, 2012 at 15:06
  • 7
    You can also move to a solo workflow when working with non-VCS users: you exchange papers via e-mail with your collaborators, and as soon as you receive them you run a git commit --author="..." on your private git repository. Alternatively, you put a git repository in Dropbox, and tell your co-author to just ignore the hidden .git folder. Commented Nov 16, 2012 at 8:00
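The solo workflow from the last comment could look roughly like this. It is only a sketch: the co-author's name and address, the file contents, and the scratch repository are made up, and the second printf stands in for saving the attachment you received over your working copy:

```shell
repo=$(mktemp -d)                            # scratch repository (illustrative)
cd "$repo"
git init -q
git config user.name "Me"                    # placeholder identity of the repository owner
git config user.email "me@example.com"

printf 'Draft by me.\n' > paper.tex
git add paper.tex
git commit -q -m "My draft"

# Pretend this is the file a co-author sent back by e-mail.
printf 'Draft by me, edited by a co-author.\n' > paper.tex
git commit -q -a --author="Jane Coauthor <jane@example.com>" \
    -m "Edits received by e-mail from Jane"

git --no-pager log --format='%an: %s'        # shows who authored each version
```

The history then records the co-author as the author of that change even though they never touched Git themselves.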
12

how beneficial are they (especially in a LaTeX setting) for writing papers compared to Dropbox and SugarSync?

I am a long time user of version control systems, in fact everything I have (my $HOME folder) is backed up in a VC.

I tried hard to use various version control systems for writing many (10+) research papers, all of them written in LaTeX. My experience with using VCs for writing research papers is, however, mixed, if not outright negative. Although synchronization with a VC is easy, the main problem is merging the updates. Unlike the source code of programs, LaTeX is not that straightforward to merge, mainly due to line-breaking issues. Secondly, even though I have no problem with various VCs, my co-authors (a very heterogeneous mix of people) do not necessarily have experience with the one I use, or use a different one, or have no clue about this stuff. Add the quirkiness of setting up passwords, ssh tunnels, installation of client-side software etc. and you see that, all in all, using a VC is not a smooth experience (at best).

Recently (3 papers so far), I gave a try to Dropbox and I am pretty pleased with the result. While it does not solve all the issues, it seems to me to solve at least some:

  • almost zero set-up; even laypeople have no problem installing the client
  • no explicit sync, everything just works instantly (no svn/git/bzr/... add/remove/move/... command line stuff involved)
  • merging issues are about the same as with a version control system - even with a VC in place I always tended to send explicit write-lock notifications to co-authors by e-mail or IM
  • Dropbox has some rudimentary version control, which is sufficient for my purposes. Writing papers is not about branching, right?
  • moreover, no repository setup is necessary. You just share a folder with a selected group of co-authors and that's it. Nobody else can see it. A few clicks, almost zero hassle.

As you see, my advice would be to stay with a Dropbox-like solution. For my purposes, at least, it has turned out to be the best solution so far.


As a follow-up to comments received: consider also the requirements you have for writing a research paper. Why use a heavyweight solution, such as distributed version control, when we are speaking here about 1-10 text files, a handful of images and possibly a repository of data (binary or text blobs)? Do you really need to go through all the hassle of a DVCS for that? Maybe, if your research is a special case; most of the time, I guess not. To me, the ease of use and accessibility to laypeople of solutions such as Dropbox by far outweigh advanced technological features such as branching, tagging, etc.

9
  • Can you choose yourself which old versions are saved in dropbox? You would want to for example keep the version you submitted to a journal for creating difference files, exactly that one. Commented Nov 15, 2012 at 8:02
  • 1
    @Paul Hiemstra: I speak about using dropbox solely for the purposes of collaborative writing. Additionally, I always store the text in my own $HOME folder, which as I said, is versioned separately. But to answer the question, upon a milestone (submission, revision 1,2,...) I always create a separate folder and store the milestone version there. After all, it's about the same as e.g., Subversion would do if you create a tag - it's a separate folder in svn anyway... Remember, a paper is a small piece of data (few kB), not a *GB code base.
    – walkmanyi
    Commented Nov 15, 2012 at 8:44
  • 5
    “merging LaTeX is not that straightforward” – The way to handle line breaks in LaTeX, since code VCS are all line-based, is to have each sentence on a separate logical line. Modern text editors handle this smoothly and it makes the organisation of the document much more logical, and handled gracefully by VCS. — “Writing papers is not about branching, right?” – wrong. In fact, branching perfectly suits trial-and-error work, or work on separate features/sections concurrently. This is perfect for editing papers. Commented Nov 15, 2012 at 17:52
  • 1
    @KonradRudolph: I do not want to start a flame about these issues, still there's one important non-technological point here. I don't know with whom you write papers, but my experience shows that not everybody is technologically skilled like me. Just installing and understanding e.g., git is hard enough for some, not speaking about wrapping their heads around branching. Did you try that one on a professor over 50? How about colleagues from humanities? Good luck with DVCS and branching. I better write a full paper and collaborate via good old e-mail attachments in the meantime.
    – walkmanyi
    Commented Nov 15, 2012 at 20:50
  • 1
    @walkmanyi, whatever collaborators use is irrelevant, just handle your end of the mess under version control.
    – vonbrand
    Commented Dec 23, 2015 at 1:11
10

I strongly recommend using version control for writing a paper because my advisers have never been very good at using computers. They often edit the wrong versions of documents and then send them to me. Then I have to figure out what they changed and manually reenter it into my latest version. I work around this problem by keeping track of what version I emailed to them and then comparing what they sent back to me using release tags.

Don't assume the boss will ever use your version control system. He doesn't need to. But it's still extremely useful to use version control! Our papers are prepared in MS Word because that's all that the boss knows how to use, and that's the file format the journal wants. He often forgets to use the "Track Changes" feature, but you can use the "Compare and Merge Documents" under the Tools menu to determine what he edited. (Just "merge" it with the version you emailed, and the resulting document will display the differences using the "Track Changes" highlighting.) I never have to compare timestamps or worry about which file is the latest version, and even when MS Word destroys one of my figures I know that I can easily recover it.
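Recovering the exact file that was emailed, so it can be fed to Word's compare feature, might look like the following. This is a sketch in Git rather than the Mercurial setup this answer uses; the tag name is hypothetical and the .docx files here are plain-text stand-ins for real Word documents:

```shell
repo=$(mktemp -d)                            # scratch repository (illustrative)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

printf 'draft v1\n' > paper.docx             # stand-in for a real Word file
git add paper.docx
git commit -q -m "Draft emailed to the boss"
git tag emailed-2012-11-15                   # hypothetical tag marking what was sent

printf 'draft v2\n' > paper.docx
git commit -q -a -m "Keep working on the draft"

# Later: write out exactly what was emailed, to compare with the returned copy.
git show emailed-2012-11-15:paper.docx > emailed_copy.docx

# Has the working file changed since that tag? This works for binary files too.
if git diff --quiet emailed-2012-11-15 -- paper.docx; then
    changed=no
else
    changed=yes
fi
echo "changed since emailing: $changed"
```

The version control system only tells you that a binary file differs; seeing what differs is still left to Word's own comparison.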

You can keep all of your raw experimental data, post-processing code, figure files, and lab notes under version control, too. Then you can back up the whole repository and be really sure that you'll never lose anything. I apply repository-wide tags to indicate when I do new experiments, which helps to keep the code in sync with the data; this answers the old question about which method was used to generate the figures. ("Was it method A? We last used that six months ago, but it could've been similar method B that we started developing around that time. Maybe we used A.1? Great, we'll have to do it all over again...")

You can use the repository-pushing feature as a type of distributed backup system. I use TortoiseHg (a Mercurial GUI for Windows) to push/pull the repository to a USB flashdrive to carry between my home and work computers and also to a network share as a backup, and I never overwrite the wrong files or make extra copies of the files. By the way, forget about using the branching and merging features -- they don't really make sense for binary files, but it's valuable to know whether they got accidentally changed. Mercurial works quite well, even with huge binary files in vendor-proprietary formats.
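The push/pull backup described above is done with TortoiseHg in this answer; a comparable sketch with command-line Git follows. The temporary directories stand in for the working folder, the USB drive (or network share), and the second computer, and the remote name `usb` is an arbitrary choice:

```shell
work=$(mktemp -d)      # stand-in for the working folder on the first computer
usb=$(mktemp -d)       # stand-in for a mounted USB drive or network share
other=$(mktemp -d)     # stand-in for the second computer

git init -q --bare "$usb/paper.git"          # bare repository: history only, no working files

cd "$work"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
printf 'sample,value\n1,0.5\n' > results.csv
git add results.csv
git commit -q -m "Add raw results"

git remote add usb "$usb/paper.git"
branch=$(git symbolic-ref --short HEAD)      # name of the default branch, whatever it is
git push -q usb "$branch"                    # back up the full history to the drive

# On the other computer: clone the first time, then use "git pull" afterwards.
git clone -q "$usb/paper.git" "$other/paper"
```

Because only committed history is transferred, there is no way to overwrite a newer file with an older copy by dragging the wrong folder.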

Summary: Real world science experiments produce too many files to version manually, and the boss might not be very tech-savvy. Version control fixes these problems, and you'll never again have to sort through filenames with random dates hardcoded in them.

2
  • 2
    +1 for real world situation. :) My boss can't use anything else than Word, too. I'm using LaTeX to write my dissertation and still thinking about how to handle this.
    – Eekhoorn
    Commented Nov 29, 2012 at 10:43
  • 2
    While I use LaTeX, there's no way to get the boss to use it. "Track Changes" in Word is just too important to him. Maybe you can both compromise and use LyX (www.lyx.org).
    – user244795
    Commented Nov 29, 2012 at 18:05
9

There are many good points in the other answers, but I'd like to add another one, concerning the time/project management. Although you can do version control with Dropbox, the main strength of Dropbox is that everybody works on the same file(s) at the same time, which makes it fast and always synced, and it's quite good for a "rush", where n people have to work together over a given period of time on a given objective.

However, I'm currently working on 5+ papers at the same time, with different time constraints, different deadlines, and different levels of involvement, and I appreciate being able to easily see the history of a paper and who committed what and when. I also like having to commit contributions to a paper. That way, I know that the version on the main repo is consistent, I can leave some parts hanging in a local repo without breaking everything, and when I commit, I need to make the effort to understand what has actually changed and why it matters. In this regard, the fact that you can easily associate an issue tracker with a repo (for instance with BitBucket) can also be quite helpful (for instance, you can add an issue "cite this other paper", attach the paper, and resolve the issue when you commit the paragraph actually citing the paper).

This project management approach might be a bias coming from my programming background, and might be overkill in some cases, but in the end, there is no killer feature from one approach or the other, it's also how comfortable it makes your life.

0

I do not have experience with what I'm about to suggest, but it might be helpful. Use both: Dropbox and some VCS. How? Well, in the Dropbox folder that you want to share, start a git repository (see @PiotrMigdal's answer). As far as I recall you can exclude a directory from being synced in Dropbox, and you should exclude the (hidden) .git directory since it is of no interest to your collaborators.

This way, you and your collaborators can easily share the data over Dropbox, and you personally can enjoy the benefits of a real, full-scale VCS.

However, as always with shared-digital work, one of the most important issues is to set the guidelines - they should be clear to all participants.

4
  • 2
    This is a dangerous suggestion. If someone messes with the .git folder, it can be irreversibly damaged. Git has sharing built in and this can be automated Dropbox style. Use this instead. Not this easy, but way more secure.
    – Eekhoorn
    Commented Nov 29, 2012 at 10:41
  • @zenbomb: I see the problem. You could exclude the .git directory from the Dropbox sync.
    – Dror
    Commented Nov 30, 2012 at 7:20
  • 1
    There is a way out - use bare repository on DropBox which is the remote of the actual full repository which is located somewhere on your drive.
    – Dror
    Commented Mar 8, 2013 at 6:28
  • 2
    Although unrelated to the question, I must mention that this solution is flawed. You cannot exclude a directory on your local Dropbox folder from being synced with the online Dropbox account. It is the other way round which is possible.
    – pnp
    Commented Jul 11, 2014 at 19:47
