Utilities: Conda

Reading time: ~25 min

Suppose you've written some Python code that you want to share. Other users will have to get your code and perform some setup operations, including making their Python environment aware of your package so they can import it. Ideally you'd communicate information about any other modules that your module requires, so that users can make sure they have all of the requirements before they try to use your module. When you make improvements to your code, you'd like for your users to be able to get those changes as effortlessly as possible, preferably without having to go through installation steps again.

These code distribution challenges are difficult to manage manually, so developers have built systems to automate them. These systems are called package managers. The main package managers for Python are Pip and Anaconda. Pip is a general-purpose Python installer that installs packages from the Python Package Index (PyPI). Anaconda is geared more toward data science; its package manager, the conda command, installs packages from its own collection, called the Anaconda Repository.

We recommend installing Anaconda and using it to manage your packages. Anaconda has a few important advantages over Pip:

  1. Anaconda ensures that the requirements of all packages in an environment remain satisfied. Pip, by contrast, updates your environment based on the requirements of the package you are installing. Such an update might break previously installed packages, since they might depend on a different version of the same package.

  2. Anaconda provides built-in support for managing multiple virtual environments. If Package A and Package B require incompatible versions of Package C, you can set up one virtual environment with Package A and one version of Package C, and a second virtual environment with Package B and another version of Package C.

  3. For packages that depend on compiled code, Anaconda directly installs binaries. This means that these dependencies are built by the package maintainers and sent to you ready to run. If the build process happens on your computer, there are more opportunities for things to go wrong during installation.

Some packages are available on PyPI but not through Anaconda; in these cases we recommend that you use Pip.
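The incompatibility described in the second advantage can be made concrete with a toy sketch. The package names, version numbers, and requirement ranges below are hypothetical and chosen only for illustration; real package managers read this information from package metadata and use far more elaborate solvers.

```python
# Toy illustration: hypothetical packages and version ranges, not real metadata.
AVAILABLE_C = [(1, 0), (1, 5), (2, 0), (2, 1)]

# Each requirement is a (minimum inclusive, maximum exclusive) range
# of acceptable Package C versions.
REQUIREMENTS = {
    "package_a": ((1, 0), (2, 0)),  # needs C >= 1.0 and < 2.0
    "package_b": ((2, 0), (3, 0)),  # needs C >= 2.0 and < 3.0
}


def compatible_versions(packages):
    """Return the versions of Package C that satisfy every listed package."""
    result = []
    for version in AVAILABLE_C:
        if all(REQUIREMENTS[p][0] <= version < REQUIREMENTS[p][1] for p in packages):
            result.append(version)
    return result


print(compatible_versions(["package_a"]))               # [(1, 0), (1, 5)]
print(compatible_versions(["package_b"]))               # [(2, 0), (2, 1)]
print(compatible_versions(["package_a", "package_b"]))  # []
```

No single version of Package C satisfies both packages at once, which is why the two packages would need separate virtual environments.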

Exercise

  1. Virtual environments are important for _____.
  2. Because Conda is doing computation to ensure all required dependencies are met, it often takes longer than Pip to install a new package.
  3. Separate virtual environments can be used to manage incompatible dependencies between two projects.

Virtual Environments

Your Python environment is the set of packages you have available to import in a Python session. For example, a user's Python environment often includes all of the packages installed on the computer. A virtual environment emulates such an environment by exposing specific packages (and specific versions of those packages) to the Python interpreter. Virtual environments are useful because they allow the user to quickly switch between different sets of available packages. They also make it possible to be confident about exactly what packages are needed for a given application and share that information so that others can reproduce an environment without interfering with other environments they might need on that machine.

For example, if you need NumPy 1.16.3 for one project and NumPy 1.16.4 for a different project, your package manager can install both versions and just change which one is used when you execute import numpy. This is much more convenient than uninstalling one version and installing the other every time you need to switch between the two projects.
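One way to see which environment a Python session is actually using is to ask the interpreter itself. The sketch below relies only on the standard library (Python 3.8 or later); the helper name installed_version is our own and not part of any package manager.

```python
import sys
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# The interpreter path and prefix change when a different environment is active.
print("interpreter:", sys.executable)
print("environment root:", sys.prefix)
print("numpy:", installed_version("numpy"))
```

Running this script in two different environments should show a different environment root and, if the environments pin different versions, a different NumPy version.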

To use conda virtual environments, we first have to set up conda to work with our shell. The change takes effect only after bash restarts, so the last command below exits the shell.

conda init bash
conda config --set changeps1 False
exit

The second line configures conda to refrain from its default behavior of adding the name of the active environment to the shell prompt. You might find this setting preferable on your own computer, but it is essential for us as we execute the bash cells in this section.

To create a new Anaconda virtual environment, use conda create. To activate an environment, use conda activate. (Note: this cell takes a few dozen seconds to run, and it prints quite a bit of text. The --yes argument automatically answers "yes" when conda asks whether we want to proceed.)

conda create -n myenv python numpy=1.16.4 --yes
conda activate myenv

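We can check that our newly activated environment has NumPy but not Pandas. The script below prints the NumPy version and then fails with a ModuleNotFoundError when it reaches import pandas.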

echo "import numpy" > tmp.py
echo "print(numpy.version.version)" >> tmp.py
echo "import pandas" >> tmp.py
python tmp.py
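A variant of this check reports the status of every package instead of raising an error at the first missing import. This is a minimal sketch using the standard library's importlib; the helper name is_importable is hypothetical.

```python
from importlib import util


def is_importable(module_name):
    """Return True if the top-level module can be found in the current environment."""
    return util.find_spec(module_name) is not None


for name in ("numpy", "pandas"):
    status = "available" if is_importable(name) else "missing"
    print(name, status)
```

The result depends on which environment is active when the script runs, so no fixed output is shown here.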

We can view all of the environments we've set up with conda env list:

conda env list

Conda installation operations modify the current environment. For example, we can add pandas:

conda install pandas --yes
python tmp.py

We can get a human-readable description of the current environment using conda env export:

conda env export

The output of this command can be saved to a file (customarily called environment.yml), which others can use to replicate the environment. Just for practice, let's save the environment to a YAML file, remove the environment from our system, and then re-create it from the YAML file.

conda env export > environment.yml
conda deactivate  # conda refuses to remove the environment that is currently active
conda remove -n myenv --all --yes
conda env create -f environment.yml

Note that we used the -f argument to make conda env create get the package list from the environment.yml file rather than directly from the command line.
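To get a feel for what the exported file contains, the sketch below pulls package names and versions out of text laid out like an environment.yml. The sample text is a hand-written, hypothetical example (the build strings and the prefix path are placeholders), and the line-based parsing is a deliberate simplification that only handles the flat list under dependencies; a real tool would use a proper YAML parser.

```python
# Hypothetical sample in the layout of an exported environment file.
SAMPLE = """\
name: myenv
channels:
  - defaults
dependencies:
  - numpy=1.16.4=build_0
  - python=3.7.3=build_1
prefix: /path/to/envs/myenv
"""


def conda_dependencies(yaml_text):
    """Return {package: version} for top-level entries under 'dependencies:'."""
    packages = {}
    in_dependencies = False
    for line in yaml_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if not line.startswith(" "):
            # An unindented line is a top-level key; track whether it opens the list.
            in_dependencies = line.strip() == "dependencies:"
            continue
        entry = line.strip()
        # Only two-space list items count; nested lists (such as pip's) are skipped.
        if in_dependencies and line.startswith("  - ") and not entry.endswith(":"):
            name, _, rest = entry[2:].partition("=")
            packages[name] = rest.split("=")[0] or None
    return packages


print(conda_dependencies(SAMPLE))  # {'numpy': '1.16.4', 'python': '3.7.3'}
```

Each entry in the file has the form name=version=build, which is what lets conda env create rebuild the same set of packages.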

Exercise
If you want a colleague to be able to reproduce the Python environment you used in a particular project, one convenient way to do that is to give them your _____ file, using _____.

Other reproducibility solutions

We will close this section by mentioning two other solutions for the reproducibility problem. If you're working with a non-Conda Python installation, you can use pip together with virtualenv to reproduce the virtual environment functionality of Conda. You can also get pip to give you a list of the packages and versions available in the local virtual environment using pip freeze.
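The kind of list that pip freeze produces can be roughly approximated from inside Python. The sketch below uses the standard library's importlib.metadata (Python 3.8 or later) rather than pip itself, so its output can differ from pip freeze in the details, for example in which packages are included; the function name freeze is our own.

```python
from importlib import metadata


def freeze():
    """Return sorted 'name==version' lines for distributions visible to this interpreter."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with missing or broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


for line in freeze():
    print(line)
```

Saving such a list to a file gives collaborators a record of the package versions in the environment, much as environment.yml does for conda.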

A much more general-purpose tool for achieving reproducibility is Docker. We'll discuss Docker more extensively in the final section of this course.
