
I am looking for a way to manage the packages and resources needed for different coding projects. I'm not sure this is the correct place to ask, but I couldn't find a site that seemed like a better fit, so please redirect me if appropriate.

Essentially, I work on a lot of different projects, primarily in Python but sometimes in C++, currently on Ubuntu Linux. Each of these projects requires language-specific packages, but also occasionally some operating-system-wide resources (such as GPU drivers). I frequently run into trouble due to conflicting inter-dependencies, package updates, etc. I'd like a way to associate all of the relevant files, packages, and a coding environment with each project and isolate them, such that I can put the operating system into a state to run each project when necessary and deactivate that state when it isn't needed.

I understand there are some tools for this. I have found that Python or Anaconda environments don't work that well because they can only deal with Python-specific packages. I could create a separate virtual machine for each project, but this seems like overkill and a large resource allocation. I don't fully understand what Docker containers or Kubernetes are, but perhaps they are the solution I'm looking for.

In short, what system do you recommend for keeping everything needed to work on a project in a clean, isolated container that can be activated and deactivated as necessary to avoid conflicts?


1 Answer


Well, I've spent the last month or so thinking about this question, and I thought I'd share a bit of what I've learned. I am by no means an expert on this, so take what I say with a grain of salt, but I'll at least propose a variety of solutions that I've considered and learned a bit about. I'll discuss them in approximate order of complexity.

  1. Google Colab - if you're just doing simple Python programming and need a variety of different packages for different projects, Google Colab is a great resource. It is essentially a Jupyter-Notebook-style environment that runs in the cloud, and it even provides a GPU (or TPU). You get a dedicated Linux instance with a basic set of Python packages installed, and it's easy to install the packages you need. This can be a nice workaround to installing a bunch of OS-level and Python-level packages: in my case (CUDA, ffmpeg, OpenCV, PyTorch, and a number of other Python packages), a setup that took me a solid week to get integrated correctly locally took about 15 minutes to reproduce as a functionally (almost) equivalent environment in Google Colab (a small example is sketched after this list). However, if you need to perform a lot of OS-level installations beyond enabling GPU computation, this may not be your best bet, as there is no convenient terminal provided.

  2. Pip or Anaconda environments - it is slightly easier to install and manage different Python environments with different packages using virtual environments (see the sketch after this list), but again these don't provide support for installing OS-level packages, so you can run into trouble if different packages require different versions of a shared lib, etc.

  3. Docker container - a Docker container is essentially a packaged image of all of the libs and dependencies an application needs, but it still runs on the host kernel (in contrast to a virtual machine). For this reason, there is less compute overhead to using a Docker container. You can create a Docker image (which operates and feels roughly like a Linux instance), make the necessary modifications, package downloads, and installations, and then save the image for easy reloading in the future (a minimal workflow is sketched after this list). There are a few notable limitations. Programs such as CUDA that require an intimate connection with the host kernel need extra support: NVIDIA does provide its own set of Docker images and tooling for CUDA computation, but you'll still need the NVIDIA driver installed on the host independently of the Docker files. Secondly, if you are used to programming in an IDE, it can be hard to point the IDE at the interpreter inside the container. Some IDEs let you connect to a remote interpreter over SSH, which is a potential workaround, but this is often a paid feature. In general, Docker is thought of more as a tool for deployment than for development.

  4. Virtual machine - runs a separate kernel entirely. The advantage is that you have more freedom to install OS-level software and IDEs within the virtual machine itself, but you pay the cost of increased overhead on your computer.

  5. Checkpointing - if you don't want to deal with the potential drawbacks of any of these tools but also want to avoid a full operating-system reinstall whenever a new package installation goes badly awry, a nice, easy solution is to take regular checkpoints of your system. A checkpoint is essentially a complete image of all system (and optionally user) files. You can take checkpoints at regular time intervals or use a "try it, and save it if it works" approach when installing a bunch of software. Timeshift is a free and easy-to-use checkpointing tool available on Ubuntu (basic commands are sketched after this list).
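
For item 1, here is roughly what installing the kind of stack I mentioned looks like in a Colab notebook cell. This is only a sketch: the package names are examples, and you still have to select a GPU runtime separately in the Colab menu (Runtime -> Change runtime type).

    # In a Colab cell, lines prefixed with '!' run as shell commands on the hosted instance.
    !apt-get -qq update && apt-get -qq -y install ffmpeg
    !pip install opencv-python torch
    !nvidia-smi    # confirms a GPU is attached to the runtime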
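
For item 2, the basic workflow looks like the following, using a hypothetical project name ("myproject") and a hypothetical requirements file; the conda and venv variants achieve the same isolation of Python packages.

    # One environment per project keeps Python dependencies isolated (conda syntax).
    conda create -n myproject python=3.10
    conda activate myproject
    pip install -r requirements.txt    # or: conda install <packages>

    # The same idea with the standard library's venv module.
    python3 -m venv ~/envs/myproject
    source ~/envs/myproject/bin/activate
    pip install -r requirements.txt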
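
For item 3, a minimal Docker workflow might look like this. Again, a sketch only: the base image, package list, and image name are placeholders, and real projects usually keep the Dockerfile in the repository rather than generating it from the shell as shown here.

    # Write a hypothetical Dockerfile for the project image.
    cat > Dockerfile <<'EOF'
    FROM ubuntu:22.04
    RUN apt-get update && apt-get install -y python3 python3-pip ffmpeg
    RUN pip3 install opencv-python torch
    WORKDIR /workspace
    EOF

    docker build -t myproject-env .                                  # build the image once
    docker run -it --rm -v "$PWD":/workspace myproject-env bash      # develop inside it, with the project dir mounted
    # For CUDA workloads, install the NVIDIA container toolkit and add --gpus all to docker run.

Because the image is rebuilt from the Dockerfile, you can tear the environment down completely and recreate it later without touching the host's packages.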
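
For item 5, the Timeshift command-line interface covers the "try it, and save it if it works" approach; the snapshot comment below is just an example.

    # Take a manual snapshot before a risky install (Timeshift can also snapshot on a schedule).
    sudo timeshift --create --comments "before CUDA install"
    sudo timeshift --list       # see existing snapshots
    sudo timeshift --restore    # interactive restore if something breaks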

For my own personal projects, I opted for a mix of these solutions. I used Timeshift to checkpoint regularly as I installed CUDA and a variety of other packages. Once I had a working install of all packages and libs, I took a final snapshot that I can restore if I ever break my installation, and I checkpoint again any time I install a new program or package. I also use Anaconda virtual environments to manage Python packages across multiple environments. If I'm working on something that doesn't require anything too niche in terms of shared system files, I might also use Google Colab (which has the additional advantage that I can work from anywhere), although I personally prefer Spyder over the Jupyter-style working environment.

Feel free to make corrections or suggestions, as again I am by no means an expert, but hopefully this helps at least slightly.
