12

I want to cite a GitHub repository in a scientific publication. The repository I used is a fork of the original GitHub repository, which is accompanied by a paper. The two repositories are otherwise the same, but the fork has additional code for further calculations and plots.

I cited the paper to refer to the methodology. However, I did not use the original GitHub repository provided with the paper; I used the fork instead.

Should I cite the fork?

3
  • Welcome back, nikki! Would you mind explaining "fork" to make the question more self-contained?
    – user111388
    Commented Feb 29, 2020 at 20:55
  • 3
    You can be generous with citing. It won't hurt you, but give credit where credit is due. Commented Feb 29, 2020 at 21:40
  • 1
    If you didn't use the additional code, you could have used the original repository. Assuming you did use the additional code, of course you should cite the fork.
    – chepner
    Commented Mar 1, 2020 at 16:26

3 Answers

22

You should cite the original paper and the GitHub fork. You are obligated to cite all sources you used, even if they overlap. Your methods should also state exactly what code you reused.

1
  • The licence on the original repository might say all subsidiary works need to reference the original. The licence on the fork might say it needs to be referenced too.
    – CJ Dennis
    Commented Mar 1, 2020 at 1:49
7

For simplicity, original code refers to the software provided by the original repository or paper, and additional code refers to what is only provided by the fork.

A good litmus test is this: Suppose you had written the additional code yourself. Would you mention it, the algorithms it uses, etc. in the paper? Or would this code be part of your publication, e.g., to ensure reproducibility? If yes, you should also cite the fork.

For example, if the original code allows you to perform some simulations and the additional code is only about plotting and does not touch the actual simulations (and you checked this to a reasonable extent), do not cite the fork, for the same reason that you would not mention or provide any code of your own or a plotting library that does nothing but plot existing data. There is no reason to assume that the original does not suffice to reproduce your results. If it considerably helped you in preparing your plots, consider acknowledging it.

If, however, the additional code modifies the original code in a way that could affect the results, cite it. It does not matter whether the additional code provides any functionality you used: without this citation, your work might no longer be reproducible. Remember that citing not only gives credit but also shifts blame.

3
  • 4
    Better to err on the side of giving a potential replicator of your results too much detail.
    – vonbrand
    Commented Feb 29, 2020 at 23:49
  • This does not account for the licenses on both repositories. The license on the forked repository might state that accreditation is required. Commented Mar 1, 2020 at 20:00
  • @RobinDeSchepper: In my experience, licences requiring accreditation are rather unusual in science (and in general), because they prevent the inclusion of the software in larger bundles, distributions, etc. and thus do more harm than good. But yes, if that applies here, you have to account for it.
    – Wrzlprmft
    Commented Mar 1, 2020 at 20:58
2

Cite the most relevant version.

Could you please let me know if I should cite the fork version? or not?

You used a repository that forks the original to add code for further calculations and plots.* Given that you cite the paper's methodology, which appears in the original repository and is unchanged in the fork, I'd suggest using the original repository. That said, it doesn't really matter which you use.

If the paper is published, then cite the published version, rather than GitHub.

* I don't immediately understand why a fork would be needed for additional code. That code could have appeared in the original repository.

3
  • 8
    Perhaps the additional code was added by someone different than the owners of the original repository (neither of which is the OP), which would explain having it in a fork.
    – GoodDeeds
    Commented Feb 29, 2020 at 19:22
  • 2
    Could have appeared, but didn't.
    – vonbrand
    Commented Feb 29, 2020 at 23:49
  • @vonbrand Indeed. It raises questions.
    – user2768
    Commented Mar 2, 2020 at 8:06
