29

I'm an academic rather than a programmer, and I have many years' experience writing Python programs for my own use, to support my research. My latest project is likely to be useful to many others as well as me, and I'm thinking of releasing it as an open-source Python library.

However, there seem to be quite some hurdles to cross in going from a functioning personal project to a library that can be installed and used painlessly by others. This question is about the first steps I should take in order to start working toward a public release.

Currently, I have a single git repository that contains my code that uses the library as well as the library itself, and I use git as an emergency undo button in case anything breaks. All of this works fine for a single user but is obviously not appropriate if I want to release it. Where I want to end up is that my library is in a separate repository and can be installed by others using pip, and has a stable API.

Learning to use setuptools etc. is probably not so hard once I'm at the point of wanting to publish it - my problem is in knowing how I should be working in order to get to that point.

So my question is, what are the first steps one should take in order to start preparing a Python library project for public consumption? How should I reorganise my directory structure, git repository etc. in order to start working towards a public release of the library?

More generally, it would be very helpful if there are resources that are known to be helpful when attempting this for the first time. Pointers toward best practices and mistakes to avoid, etc., would also be very helpful.

Some clarification: the current answers are addressing a question along the lines of "how can I make my Python library a good one for others to use?" This is useful, but it's different from the question I intended to ask.

I'm currently at the start of a long journey towards releasing my project. The core of my implementation works (and works really well), but I'm feeling overwhelmed by the amount of work ahead of me, and I'm looking for guidance on how to navigate the process. For example:

  • My library code is currently coupled to my own domain-specific code that uses it. It lives in a subfolder and shares the same git repository. Eventually, it will need to be made into a stand-alone library and put into its own repository, but I keep putting this off because I don't know how to do it. (I know neither how to install a library in 'development mode' so that I can still edit it, nor how to keep the two git repos in sync.)

  • My docstrings are terse, because I know that eventually I will have to use Sphinx or some other tool. But these tools seem not to be simple to learn, so this becomes a major sub-project and I keep putting it off.

  • At some point I need to learn to use setuptools or some other tool to package it and track the dependencies, which are quite complex. I'm not sure whether I need to do this now or not, and the documentation is an absolute maze for a new user, so I keep deciding to do it later.

  • I've never had to do systematic testing, but I definitely will for this project, so I have to (i) learn enough about testing to know which methodology is right for my project; (ii) learn what tools are available for my chosen methodology; (iii) learn to use my chosen tool; (iv) implement test suites etc. for my project. This is a project in itself.

  • There may well be other things I have to do as well. For example, jonrsharpe posted a helpful link that mentions git-flow, tox, TravisCI, virtualenv and CookieCutter, none of which I'd heard of before. (The post is from 2013, so I also have to do some work to find out how much is still current.)

When you put this all together it's a huge amount of work, but I'm sure I can get it all done if I keep plugging away at it, and I'm not in a hurry. My problem is knowing how to break it down into manageable steps that can be done one at a time.

In other words, I'm asking which are the most important concrete steps I can take now, in order to reach a releasable product eventually. If I have a free weekend, which of these things should I focus on? Which (if any) can be done in isolation from the others, so that I can at least get one step done without needing to do the whole thing? What's the most efficient way to learn these things so that I will still have time to focus on the project itself? (Bearing in mind that all of this is essentially a hobby project, not my job.) Is there any of it that I don't actually need to do, thus saving myself a huge amount of time and effort?

All answers are greatly appreciated, but I would especially welcome answers that focus on these project management aspects, with specific reference to modern Python development.

7
  • 2
    jeffknupp.com/blog/2013/08/16/…
    – jonrsharpe
    Commented Dec 31, 2018 at 8:44
  • 10
    The best way to check if a library is ready for release "into the wild" is to ask a fellow researcher or a student to try to use it and to write down all the difficulties they run into. If they can use it without constantly having to call you for assistance, then the library is in a shape that it can be used by others. Commented Dec 31, 2018 at 9:51
  • @jonrsharpe thanks, there's a lot of super useful information there
    – N. Virgo
    Commented Dec 31, 2018 at 9:58
  • @BartvanIngenSchenau thank you, I'll definitely bear that in mind once I'm close to that step. I'm very much at the "first steps" stage now, of taking something that works but is very far from ready for release, and wondering how I should be doing things now to make sure it can become releasable in the future.
    – N. Virgo
    Commented Dec 31, 2018 at 10:00
  • 3
    You should definitely make a stand-alone git repo for the library and then be your own first customer. Only use the library in your project as a proper library, not linking to its source. Commented Dec 31, 2018 at 14:27

6 Answers

21

Adding a setup.py, while necessary, is not the most important step if you want your library to be used. More important is to add documentation and advertise your library. Since the second point strongly depends on the library, let me focus on the documentation aspect.

  1. You know everything about your library. And this is problematic. You already know how to install and how to use it, so many things may seem intuitive or plainly obvious to you. Unfortunately, the same things may be neither obvious nor intuitive for the users. Try to look at your library as if you knew nothing about it, and more importantly, ask other people to use it and try to spot all the difficulties they had.

  2. Explain, in plain English, what your library is about. Too many libraries assume that everybody knows about them. When this is not the case, it may be difficult to grasp what the purpose of the library is.

  3. Write detailed technical documentation, but also don't forget about short pieces of code which show how to do some of the tasks with your library. Most developers are in a hurry, and if they need to spend hours trying to understand how to do a basic thing, they may tend to switch to other libraries.

  4. Include your contact information. If your library is a success (and my own experience has shown that this happens even with rather unknown ones), people will encounter difficulties with it: either bugs or simply difficulties understanding or using some parts of it. It is often useful to receive their feedback to improve your library: for every person who reports a problem, there are possibly hundreds who, when encountering it, would just prefer to switch to another library.

In addition:

  1. Make it clear if your library works with Python 2 or 3 or both.

  2. If the library doesn't work on Windows, say so.

  3. Ensure you use official conventions (use pep8 to check). If not, either explain it clearly or fix it.

  4. Take care of handling edge cases. When your library is called with a wrong type or with a value which is not supported, it should say, in plain English, what exactly is wrong. What it shouldn't do is to raise a cryptic exception ten levels down the stack and let the user figure out what went wrong.
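
As a minimal sketch of the last point, the function below checks its arguments up front and raises errors that say in plain English what was wrong. The function `moving_average` and its parameters are hypothetical, invented purely for illustration; the same pattern applies to whatever your own public functions accept.

```python
from numbers import Real
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Return the simple moving average of `values` over `window` points.

    Hypothetical example function, not part of any real library.
    """
    # Validate at the API boundary so the error points at the caller's
    # mistake rather than at something deep inside the library.
    if isinstance(window, bool) or not isinstance(window, int):
        raise TypeError(
            f"window must be an integer, got {type(window).__name__}: {window!r}"
        )
    if window < 1:
        raise ValueError(f"window must be at least 1, got {window}")

    values = list(values)
    for index, value in enumerate(values):
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(
                f"values[{index}] must be a number, "
                f"got {type(value).__name__}: {value!r}"
            )
    if window > len(values):
        raise ValueError(
            f"window ({window}) is larger than the number of values ({len(values)})"
        )

    # One average per position where a full window fits.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Calling `moving_average([1, 2, 3], "2")` then fails immediately with a message naming the offending argument, instead of an unrelated error from inside the slicing or division.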

3
  • Thank you, I completely agree that the quality of documentation makes or breaks a project. (It's usually the second thing I check when deciding whether to use a project, after the date of the last commit.) On a more technical level, there is a confusingly large ecosystem of tools for managing documentation of Python code. How can I tell which one I should invest in learning for my project?
    – N. Virgo
    Commented Dec 31, 2018 at 10:03
  • 3
    @Nathaniel Sphinx is a bit tricky to set up but is the de-facto standard. You can use readthedocs.org to host Sphinx documentation on the web. Sphinx is able to use the docstrings from the functions and modules in your library. Alternatively, just type up the docs yourself in the readme file, but that gets unwieldy for larger projects. The Python project I maintain uses Github pages for the Sphinx documentation which means that I have to commit the HTML files, though I'm planning to move away from that.
    – amon
    Commented Dec 31, 2018 at 10:53
  • 5
    How can I tell which one I should invest in learning for my project? - you don't. You spend a little time choosing one that seems reasonable and roll with it. As a javascript dev where you have 40 options for every decision, I promise this is the right decision :)
    – aaaaaa
    Commented Dec 31, 2018 at 19:22
2

Having used a fair few less-than-mature libraries over the years, my key piece of advice is to work through the following once you’ve picked your deployment tool. First, ask whether your library does something really useful that you can build a community around.

Identify your library’s dependencies.

Attempt a deployment into a clean environment, either a Docker container or a VM. I consider this step crucial, as all too often there’s something unique about a personal environment that causes problems.

Consider who is going to maintain the library in the future: there’s nothing more frustrating than coming across a library that was somebody’s pet project for three or four years and then doesn’t get the updates needed to keep it current.

Consider whether you or your team want to make the commitment to keep the library tested and documented (unit tests and CI pipelines start being part of the equation here).

2

Perhaps you could find a mature OSS project in your field and contribute your code to that project? There could be a few advantages, such as:

  • You can maximise your contribution. Indeed, many "hobby" OSS projects are potentially valuable but used little by the community (cf. @ReaddyEddy answer). It's just a lot of effort to make the project up to scratch initially, then to maintain it, advertise it, provide proper examples and documentation, etc.
  • Lots of the technical issues you mentioned would already be solved in the mature project.
  • If your library adds value to the OSS project, its contributors could help you bring your code up to the project standards. So you can save effort and gain experience. You'll also get specific answers about Sphinx, TravisCI, CookieCutter and other technical aspects.

If there's a relevant OSS project that you like and maybe use, why not open an issue or a pull request or otherwise get in touch with the maintainers? (A good way to start might be to solve an existing issue.)

1
  • Thank you, it's a nice idea. However, in my case there isn't an existing package into which my code could be integrated. There is an established OSS project with similar functionality, but it's built on different technology and uses a fundamentally different algorithm at its core. (As a result, some things are fundamentally impossible that become easy in my version.) I'm certain there's a small but potentially dedicated audience for my code, but because it's a novel approach I don't think there's any way to make it available other than developing it as a new project.
    – N. Virgo
    Commented Jan 3, 2019 at 12:08
2

It's 2019, and I strongly suggest starting with the most modern tools. You don't need a setup.py; that's something people in the Python community want to get rid of, and I believe eventually they will.

Try Poetry. You will not regret it.

4
  • 2
    Thank you for the answer. I will take a look into Poetry. I would like to say though, that in 2019 it's fantastically difficult for a newcomer to work out what the most modern tools actually are. If you're not in the know it's very hard to tell which tools are the de facto standard ones that everyone uses and which are among the many also-rans and experimental projects. The official documentation doesn't keep up with these things, and development moves so fast that any introductory material I find is guaranteed to be out of date.
    – N. Virgo
    Commented Jan 5, 2019 at 13:38
  • All of which is to say thank you for telling me that Poetry is the one I should be looking into, rather than the three or four other active projects I found that seem to do the same thing. This is the kind of information I was hoping to get from this question.
    – N. Virgo
    Commented Jan 5, 2019 at 13:39
  • 1
    @Nathaniel Python "packaging" is changing fast (and that's why there are many ways to do this, and it's hard to find what's best), but with PEP 517 and 518 implemented by many tools (like Poetry), we finally have something that's not so terrible. Note that Poetry is not necessarily the "best" tool, but at least it's one of the best. Do take a look at testandcode.com/52, you'll get a pretty good idea around this topic.
    – laike9m
    Commented Jan 6, 2019 at 1:03
  • Thank you, that's very helpful, I'm listening now. Perhaps all this means I should set packaging aside for now and concentrate on the other aspects (e.g. learning tools for documentation and testing), simply because there might be a more stable Python packaging ecosystem in six months or so.
    – N. Virgo
    Commented Jan 6, 2019 at 2:48
2

This is a complicated question that you are asking, and I completely agree with Arseni's answer. Good documentation is a very important aspect. If I can't get your library up and running with a few simple steps, I will just drop it right there (unless I am really anxious to try it).

Some things you should definitely consider:

  • Think about how you are going to version your library. You want to have backwards compatibility to some level, and bugfixes along the route as well. Read about semantic versioning.
  • You're using git in a relatively linear way (to undo). Are you familiar with branching in git? It is really not that difficult, and it makes life easier. Once you have got to grips with branches, adopt a branching model for your repository. Pick the parts of this branching model that you deem relevant. Also compare this with branches from repositories that you are using.
  • Licensing: you should provide a license for your library. I am no legal expert on this matter, so I can only share a link to this comparison between common licenses. Don't take this choice lightly.
  • Bug tracker: you want users to be able to send you bug reports. This helps you to improve the quality of the code. For each bug that you solve, add a test to your testing framework, which ensures that it doesn't break again in the future (regression testing). A bug tracking system can also be used for feature requests.
  • User contributions: do you want user contributions? I am not sure how this typically works on open-source products, but I can imagine that you can allow users to create feature branches. On GitHub you seem to be able to control this via pull requests.
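
To make the versioning point concrete, here is a rough sketch of the core semantic versioning rule: breaking changes bump MAJOR, backwards-compatible features bump MINOR, and bugfixes bump PATCH. The `Version` class and the change labels "breaking", "feature" and "bugfix" are hypothetical names chosen for this illustration, and the sketch deliberately ignores pre-release and build-metadata suffixes.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A plain MAJOR.MINOR.PATCH version (illustrative helper only)."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        parts = text.strip().split(".")
        if len(parts) != 3 or not all(part.isdecimal() for part in parts):
            raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
        return cls(*(int(part) for part in parts))

    def bump(self, change: str) -> "Version":
        # Lower-order components reset to zero when a higher one increases.
        if change == "breaking":
            return Version(self.major + 1, 0, 0)
        if change == "feature":
            return Version(self.major, self.minor + 1, 0)
        if change == "bugfix":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(
            f"change must be 'breaking', 'feature' or 'bugfix', got {change!r}"
        )

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


current = Version.parse("1.4.2")
after_bugfix = current.bump("bugfix")
after_feature = current.bump("feature")
after_breaking = current.bump("breaking")
```

Because the class is a tuple, versions compare component by component, so `Version.parse("1.10.0") > Version.parse("1.9.3")` holds even though the strings themselves would sort the other way.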

I've no relevant experience with Python, so I cannot give you any hints in that direction. However, it is possible to automate all testing so that it is triggered by each commit on your remote repository (e.g. using Jenkins). I suggest postponing this, though, because it is a lot of work to set up without prior experience.

2

These are great questions.

About important concrete incremental steps towards a releasable library:

  • Separate the files that will become the library from the rest of the project.
    • The library should go into its own git repository but you might find it a useful intermediate step to put it in a separate top-level directory within your current repository. When you do make it a separate repository, store that adjacent to the rest of your project, which can then refer to it via ../library until you get around to the pip packaging and development mode steps.
    • All accesses from the rest of the project to this library should go through its public API. You might find some interdependencies to tease apart.
  • Write docstrings incrementally to document the library's API.
    • Eventually the docstrings will feed into a documentation tool, but the important work is to write the text that explains the API concisely and sufficiently to other people. It's easier to fill it out a bit at a time than all at once, and it'll come out much better by writing rough drafts and getting back to it later when better explanations and examples come to mind.
    • If you find some part of the API is difficult to document, ask if that part of the API has room for improvement. Could it be simpler? More regular? Is it too general? Too specialized? Could it use more familiar names?
    • The docstrings can document argument types using structured comments that tools can check. I've yet to find real documentation on that, but the PyCharm IDE will help construct those docstrings and will immediately check argument types while editing method calls.
    • Speaking of which, PyCharm is a wonderful tool for saving developer time and improving code quality. It will run "inspections" to check the code while you edit it, e.g. checking types when it can, checking for missing and unused imports, duplicate methods, PEP 8 style mistakes, and so on.
  • Start writing unit tests using pytest. Long before you make a release, unit tests will pay off in your own development by finding bugs in corner cases and providing confidence that code changes didn't break things. Again, you can build this up over time. It's pretty easy to start.
  • Peruse existing open source libraries (that are roughly the same size) on GitHub to see how they organize the files and releases. Watch how they do bug/issue tracking and pull requests. Contribute to one or more of them to get experience with these multi-person project organization processes if you don't have experience there. GitHub has good tools for these processes. It does nice things with README.md documentation files at the top level and in any directory, and with a license file.
  • Consider enlisting a collaborator to get feedback on the library, its API, and documentation.
    • When you release, it'll help to have one or more collaborators to fix bugs when you're on vacation, to help answer user questions, and meanwhile to start doing Pull Requests with code reviews, to divvy up the library-releasing tasks, and bring in additional experience with project management and library design.
  • So far you've been doing a linear git commit history. Eventually it'll be useful to use "issue branches" for specific fixes and changes, "release branches" for the controlled run-up to a release, and "development branches" for any multi-person work in progress that's not ready to merge into the master branch. So set aside a day or two along the way to learn about this and start getting practice with it before you need to rely on those git skills. git is very flexible and useful but the user interface can get fraught.
    • One place to read about git branches and their uses is in the Pro Git book. Of the many ways to use branches, start with just "issue branches."
    • The GitHub Desktop app is a great tool to manage branches. It's also great for making commits since it makes it easy to write the commit message while reviewing all the changes.
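
As a small sketch of the docstring and unit-test bullets above, here is a documented function followed by tests written the way pytest expects them. The function `clamp` is a hypothetical stand-in for one of your own API functions. The docstring uses the `:param:` / `:returns:` / `:raises:` field style that Sphinx understands, alongside ordinary type hints. The tests are plain functions with `assert` statements; in a real project they would normally live in a separate file named something like `test_clamp.py` and import `clamp` from the library.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Limit ``value`` to the closed interval [``low``, ``high``].

    :param value: the number to limit
    :param low: the smallest value that may be returned
    :param high: the largest value that may be returned
    :returns: ``value`` if it lies in the interval, otherwise the nearer bound
    :raises ValueError: if ``low`` is greater than ``high``
    """
    if low > high:
        raise ValueError(f"low ({low}) must not exceed high ({high})")
    return max(low, min(value, high))


# pytest collects functions whose names start with "test_" and reports
# any failing assert together with the values involved.
def test_value_inside_interval_is_unchanged():
    assert clamp(0.5, 0.0, 1.0) == 0.5


def test_value_outside_interval_moves_to_nearest_bound():
    assert clamp(-3.0, 0.0, 1.0) == 0.0
    assert clamp(7.0, 0.0, 1.0) == 1.0


def test_reversed_bounds_are_rejected():
    try:
        clamp(0.5, 1.0, 0.0)
    except ValueError as error:
        assert "must not exceed" in str(error)
    else:
        raise AssertionError("expected ValueError for reversed bounds")
```

The last test is written with `try`/`except` only to keep the sketch free of imports; with pytest installed, the usual idiom for the same check is a `with pytest.raises(ValueError):` block.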
