Exercise 3 step 1: Conda environment with Python packages

You cannot do this exercise on ehproduction02 because it does not have internet access. If you need to skip this step, use the pre-created environment on ehproduction02 at /home/martin/training/orthophotos/conda-segmentation.tar.gz for the subsequent steps. If you want to perform it on your local computer or another Linux machine with internet access, follow the instructions below.

The steps that need to be performed are:

  • Install Miniconda and conda-pack.
  • Copy the segmentation software into a directory sander-script in your working directory.
  • Create a file environment.yml, as described below, and list all the required packages found in setup.py.
  • Create a virtual environment conda-segmentation.
  • Pack the environment into a relocatable archive with conda-pack.

Note that we will need to define and distinguish a few artifacts during this exercise:

  • a conda environment conda-segmentation
  • the segmentation software sander-script provided by Sander Tars (thank you!)
  • a processor that we call segmentation
  • a processor package we call sander-segmentation-1.0
  • a working directory we call segmentation-wd, on your local computer or on ehproduction02

A Virtual Environment

In the Python world, virtual environments are used to isolate the package versions used by a particular program in order to avoid conflicts, for example if two programs require the same package but depend on different versions of it. The conda documentation goes into more detail on the concept of virtual environments if you are curious. For now, it is enough to know that a virtual environment provides a way to install packages used by a Python script into a designated directory.

There are different kinds of virtual environments for Python; in our case, we will use a conda environment. The Jupyter Notebook already explains how to install the required packages into a conda environment. To avoid issues with deployment, we will install the same packages, but without using the pip tool for package installation (unlike in the Notebook).

Dependencies

The Python program for orthophoto classification lists the necessary packages in a file called setup.py. The call to the setup() function contains two important parameters:

# setup.py

from setuptools import setup

setup(
    # ... more arguments
    python_requires='>=3.10',
    install_requires=[
        'geopandas>=0.10',
        'rasterio>=1.1.5',
        # ... more packages
    ],
)
# ...

This tells us that we need a Python interpreter of version 3.10 or newer, as well as a number of Python packages with their respective versions.
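If setup.py lists many packages, copying them by hand is error-prone. The following Python sketch reads the two parameters from the source text of setup.py with the standard ast module, without executing the file. It assumes that python_requires and install_requires are passed as literal values directly in the setup() call; the function name read_setup_requirements is our own and not part of setuptools.

```python
from __future__ import annotations

import ast


def read_setup_requirements(setup_source: str) -> tuple[str | None, list[str]]:
    """Return (python_requires, install_requires) from the source text of a setup.py.

    Assumption: both values are literals passed directly to setup(...).
    The file is only parsed, never executed.
    """
    python_requires = None
    install_requires: list[str] = []
    for node in ast.walk(ast.parse(setup_source)):
        if not isinstance(node, ast.Call):
            continue
        # Accept both setup(...) and setuptools.setup(...)
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for keyword in node.keywords:
            if keyword.arg == "python_requires":
                python_requires = ast.literal_eval(keyword.value)
            elif keyword.arg == "install_requires":
                install_requires = list(ast.literal_eval(keyword.value))
    return python_requires, install_requires
```

Once sander-script has been copied into your working directory, read_setup_requirements(Path("sander-script/setup.py").read_text()) (with Path from pathlib) would return the Python version requirement and the package list.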

To install the packages with conda, we need to list this information in a different format that conda understands. Conda specifies environments in YAML (.yml) files, typically named environment.yml, which have the following format:

# environment.yml

name: "conda-segmentation"
channels:
    - conda-forge
dependencies:
    - python>=3.10
    - geopandas>=0.10
    - rasterio>=1.1.5
    # - ...

Note that the syntax differs from setup.py, but a version requirement is specified in the same way, e.g. geopandas>=0.10. The environment.yml file lists each dependency on its own line starting with a dash (-), without a comma (,) after each entry. Furthermore, the Python version is listed as an ordinary entry in the dependencies list instead of in a separate parameter (python_requires) as in setup.py.
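Because this translation is mechanical, it can be sketched in a few lines of Python. The hypothetical helper render_environment_yml below produces the environment.yml layout shown above from a Python version requirement and a list of pip-style requirement strings. It assumes every package is available on conda-forge under the same name as on PyPI, which is not always the case, and it does not handle environment markers or extras, so check the generated file before using it.

```python
from __future__ import annotations


def render_environment_yml(name: str, python_requires: str | None,
                           requirements: list[str],
                           channels: tuple[str, ...] = ("conda-forge",)) -> str:
    """Build environment.yml text from pip-style requirement strings (a sketch)."""
    lines = [f'name: "{name}"', "channels:"]
    lines += [f"    - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Python is an ordinary dependency here, unlike python_requires in setup.py
    lines.append("    - python" + (python_requires or ""))
    for requirement in requirements:
        # Remove all whitespace, e.g. 'rasterio >= 1.1.5' -> 'rasterio>=1.1.5'
        lines.append("    - " + "".join(requirement.split()))
    return "\n".join(lines) + "\n"
```

The returned text can be saved as sander-script/environment.yml, for example with Path.write_text from pathlib.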

We also gave the environment a name, conda-segmentation. In the following, conda-segmentation denotes the conda environment with the dependencies, while sander-script denotes the provided software that implements the classification algorithm.

Download Miniconda and set up the base environment

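To install Miniconda and conda-pack, run:

# download the Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# install non-interactively (-b) into the directory conda-base (-p)
bash Miniconda3-latest-Linux-x86_64.sh -b -p conda-base
# make the conda command available in this shell and activate the base environment
eval "$(conda-base/bin/conda shell.bash hook)"
conda activate
# install conda-pack from the conda-forge channel
conda install -c conda-forge conda-pack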

You cannot run the wget and conda install commands on ehproduction02 because it has no internet access; use a machine with internet access instead. Furthermore, the tutorial assumes that you have access to a Linux command line. You can get one by creating a virtual machine running Linux or (not tested by us) by using the Windows Subsystem for Linux.

Create a conda environment using an environment.yml file

From the training material, we need the setup.py file provided in the sander-script directory. Copy the complete directory so that you can make changes to it. The instructions show the path on ehproduction02; if you have copied the training material elsewhere, use the respective path instead.

You will need a working directory in which you can place the various directories and files that you will deploy in a later step. After you have deployed your application, you will not need the directory anymore unless you want to make changes and deploy the application again.

If you want to package the conda environment yourself, this directory must be on a computer with internet access, such as your own computer. Otherwise, you can create it on ehproduction02 instead. In either case, choose a location for your working directory (e.g. segmentation-wd in your home directory) and create it:

# on your local computer or ehproduction02
mkdir ~/segmentation-wd
cd ~/segmentation-wd

The following steps will be performed from your working directory unless explicitly specified otherwise.

Copy the source code of the script to your working directory. If you are working on ehproduction02:

# in your working directory on ehproduction02
cp -r /home/martin/training/orthophotos/sander-script .
ls -l sander-script

Otherwise, if you are working on your local computer:

# in your working directory on your local computer
scp -r ehproduction02:/home/martin/training/orthophotos/sander-script .
ls -l sander-script

Create a file environment.yml, as described in the Dependencies subsection above, with all dependencies listed in setup.py. Use an editor of your choice and save the file in your sander-script directory.

cd sander-script
cat setup.py
# create a file environment.yml from the information contained in setup.py as described in the introduction

Use your environment.yml to create a new Conda environment with the dependencies.

cd ..
conda env create -p conda-segmentation --file sander-script/environment.yml

Conda might ask you to confirm that the selected packages should be installed; answer y and press Return. If everything goes well, you should see a directory named after your environment, i.e. conda-segmentation, in your working directory.
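To check which versions conda actually resolved, for example that geopandas satisfies >=0.10, you can run conda list -p conda-segmentation. Alternatively, the Python sketch below reads the package records that conda keeps as JSON files in the conda-meta directory of the environment. It relies on the name and version fields of those records; the helper installed_packages is hypothetical and skips JSON files without these fields.

```python
from __future__ import annotations

import json
from pathlib import Path


def installed_packages(env_prefix: str | Path) -> dict[str, str]:
    """Map package name -> version for the conda environment at env_prefix."""
    meta_dir = Path(env_prefix) / "conda-meta"
    if not meta_dir.is_dir():
        raise FileNotFoundError(f"{meta_dir} not found; is {env_prefix} a conda environment?")
    packages: dict[str, str] = {}
    # conda keeps one JSON record per installed package in conda-meta/
    for record in sorted(meta_dir.glob("*.json")):
        data = json.loads(record.read_text())
        # Skip JSON files that do not look like package records
        if isinstance(data, dict) and "name" in data and "version" in data:
            packages[data["name"]] = data["version"]
    return packages
```

For example, installed_packages("conda-segmentation").get("geopandas") returns the installed geopandas version, or None if the package is missing.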

Activating the environment

Now that you have created the virtual environment, you need to activate it before you can use it. You can activate the environment with the command:

conda activate $(realpath conda-segmentation)

When your environment is active, your shell prompt should change to show it. Because the environment was created with -p, conda typically shows its full path rather than a short name.
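As an additional check, the following sketch can be saved as a script (check_env.py is a hypothetical name) and run with python while the environment is active. It prints sys.prefix, which should point to your conda-segmentation directory, and lists the modules that fail to import. The import names geopandas and rasterio are taken from the setup.py excerpt above; extend the list with the other packages, keeping in mind that a package's import name can differ from its name in setup.py.

```python
import importlib
import sys


def missing_modules(module_names):
    """Return the names from module_names that this interpreter cannot import."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # sys.prefix is the root of the environment the interpreter runs from
    print("Python", sys.version.split()[0], "from", sys.prefix)
    # Assumed import names; add the remaining packages from setup.py
    print("missing:", missing_modules(["geopandas", "rasterio"]))
```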

Packing a conda environment

Unfortunately, conda environments cannot simply be copied to another location in the file system or to another computer: once moved, they no longer work. Therefore, we need to make our environment relocatable (i.e. movable) before we can deploy it. Luckily, there is a tool called conda-pack for this purpose: it takes a conda environment and creates an archive from it that can be moved to a different computer.
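One reason is that conda writes the absolute path of the environment into some of its files, for example into the "#!" (shebang) line of scripts in bin/ such as pip. The sketch below, using a hypothetical helper scripts_with_prefix, lists such scripts for a given environment directory. It only inspects the first line of each file in bin/, so it shows part of the problem rather than all of it.

```python
from __future__ import annotations

from pathlib import Path


def scripts_with_prefix(env_prefix: str | Path) -> list[str]:
    """List files in <prefix>/bin whose '#!' line contains the environment's absolute path."""
    prefix = Path(env_prefix).resolve()
    bin_dir = prefix / "bin"
    if not bin_dir.is_dir():
        return []
    hits = []
    for path in sorted(bin_dir.iterdir()):
        if not path.is_file():
            continue
        with open(path, "rb") as handle:
            # Read at most 4 KiB so binaries without early newlines stay cheap
            first_line = handle.readline(4096)
        if first_line.startswith(b"#!") and str(prefix).encode() in first_line:
            hits.append(path.name)
    return hits
```

Running scripts_with_prefix("conda-segmentation") in your working directory lists the scripts that would point to the old location after a plain copy.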

Create a relocatable archive out of our conda environment using:

conda pack --prefix conda-segmentation

The --prefix argument instructs conda pack to package the environment we just created and not the one that is currently active. The generated archive is named after the environment, i.e. conda-segmentation.tar.gz, and is written to the current directory.

conda pack sometimes creates broken archives. To avoid problems with unpacking at runtime on Calvalus, unpack the archive in a temporary directory and re-pack it with the tar command. The following commands ensure that the archive contains the contents of the environment at its top level, not inside a subdirectory conda-segmentation or tmp.

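mkdir tmp
cd tmp
# unpack the archive created by conda pack
tar xzf ../conda-segmentation.tar.gz
# re-pack the environment contents, overwriting the original archive
tar czf ../conda-segmentation.tar.gz *
cd ..
rm -r tmp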
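To confirm that re-packing produced the expected layout, you can list the archive with tar tzf conda-segmentation.tar.gz | head. The Python sketch below performs the same check automatically: since conda-pack adds a bin/conda-unpack script to the packed environment, that file should appear at the top level of the archive, not below conda-segmentation/ or tmp/. The helper name archive_layout_ok is hypothetical.

```python
from __future__ import annotations

import tarfile


def archive_layout_ok(archive_path: str, marker: str = "bin/conda-unpack") -> bool:
    """Return True if marker sits at the top level of the gzipped tar archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Normalise optional './' prefixes so './bin/x' and 'bin/x' compare equal
        names = {name.removeprefix("./") for name in archive.getnames()}
    return marker in names
```

Calling archive_layout_ok("conda-segmentation.tar.gz") in your working directory should return True after the re-packing step.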

Next steps

Now that we have created a relocatable conda environment with the dependencies of our algorithm, we can continue with a local test run.