Exercise 3 step 1: Conda environment with Python packages
You cannot do this exercise on ehproduction02 because ehproduction02 does not have internet access.
If you need to skip this step, use the pre-created environment on ehproduction02 at /home/martin/training/orthophotos/conda-segmentation.tar.gz for the subsequent steps.
If you want to perform this step on your local computer or on another Linux machine with internet access, follow the instructions below.
The steps that need to be performed are:
- Install Miniconda and conda-pack.
- Copy the segmentation software into your working directory as sander-script.
- Create an environment.yml file as described below and list in it all required packages found in setup.py.
- Create a virtual environment conda-segmentation from it.
- Pack the environment with conda-pack.
Note that we will need to define and distinguish a few artifacts during this exercise:
- a conda environment conda-segmentation
- the segmentation software sander-script, provided by Sander Tars (thank you!)
- a processor that we call segmentation
- a processor package we call sander-segmentation-1.0
- a local working directory we call segmentation-wd on your local computer
A Virtual Environment
In the Python world, virtual environments are used to isolate the package versions used by a particular program in order to avoid conflicts, for example if two programs require the same package but depend on different versions of it. The conda documentation goes into more detail on the concept of virtual environments if you are curious. For now, it is enough to know that a virtual environment provides a way to install packages used by a Python script into a designated directory.
There are different kinds of virtual environments for Python. In our case, we will use a conda environment.
The Jupyter Notebook already explains how to install the required packages into a conda environment.
To avoid issues with deployment, we will install the same packages without using the pip tool, unlike the Notebook.
Dependencies
The Python program for orthophoto classification lists the necessary packages in a file called setup.py. The call to the setup() function contains two important parameters:
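# setup.py
from setuptools import setup

setup(
    # ... more arguments
    # minimum version of the Python interpreter
    python_requires='>=3.10',
    # packages that must be installed, with their version requirements
    install_requires=[
        'geopandas>=0.10',
        'rasterio>=1.1.5',
        # ... more packages
    ],
)
# ...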
This tells us that we need Python 3.10 or newer, as well as a number of Python packages with their respective version requirements.
To install the packages with conda, we need to provide this information in a format that conda understands.
Conda uses .yml files to specify an environment. Typically, this file is named environment.yml.
The format of the environment.yml file is as follows:
# environment.yml
name: "conda-segmentation"
channels:
- conda-forge
dependencies:
- python>=3.10
- geopandas>=0.10
- rasterio>=1.1.5
# - ...
Note that the syntax is different from setup.py, but a version requirement for a package is specified the same way, e.g. geopandas>=0.10. The environment.yml file uses dashes (-) to list the dependencies and has no comma (,) after each list entry.
Furthermore, the Python version appears in the general dependencies list. In setup.py it has its own parameter (python_requires).
We also gave the environment a name, in this case conda-segmentation.
In the following, conda-segmentation denotes the conda environment with the dependencies, while sander-script denotes the provided software that implements the classification algorithm.
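The formatting rules above are mechanical, so a few lines of Python can sketch them. The hypothetical function render_environment_yml below turns a python_requires string and an install_requires list into environment.yml text that uses the conda-forge channel. It assumes plain "name, operator, version" specifiers without extras or environment markers. It also assumes that every PyPI package is available on conda-forge under the same name, which is not always the case.

```python
from __future__ import annotations


def render_environment_yml(
    name: str,
    python_requires: str | None,
    install_requires: list[str],
    channels: tuple[str, ...] = ("conda-forge",),
) -> str:
    """Hypothetical helper: build environment.yml text from setup.py-style requirements."""
    lines = [f'name: "{name}"', "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # In conda the interpreter is an ordinary dependency, e.g. "python>=3.10".
    if python_requires:
        lines.append("  - python" + python_requires.replace(" ", ""))
    # One dash per entry and no trailing commas; spaces as in
    # "geopandas >= 0.10" are removed to give "geopandas>=0.10".
    lines += ["  - " + requirement.replace(" ", "") for requirement in install_requires]
    return "\n".join(lines) + "\n"
```

Writing the result of render_environment_yml("conda-segmentation", ">=3.10", install_requires) to sander-script/environment.yml would give a file in the format shown above. This sketch indents the list items, which YAML treats the same as the unindented form.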
Download Miniconda and set up base environment
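To install Miniconda and conda-pack, run:
# download the Miniconda installer for 64-bit Linux
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# install in batch mode (-b) into the directory conda-base (-p)
bash Miniconda3-latest-Linux-x86_64.sh -b -p conda-base
# make the conda command available in the current shell
eval "$(conda-base/bin/conda shell.bash hook)"
# activate the base environment
conda activate
# install conda-pack from the conda-forge channel
conda install -c conda-forge conda-pack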
You cannot run the wget and conda install commands on ehproduction02 because it has no internet access. You need to use a machine with internet access.
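To find out whether a machine can run these download steps, you can test whether it can open a connection to the Miniconda download server. The helper can_reach below is a hypothetical sketch that uses only the Python standard library. It checks TCP reachability of repo.anaconda.com on port 443 and does not account for HTTP proxies that a machine might require.

```python
import socket


def can_reach(host: str = "repo.anaconda.com", port: int = 443, timeout: float = 3.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Name-resolution errors (socket.gaierror), refused connections and
        # timeouts are all subclasses of OSError.
        return False
```

On a machine without internet access, such as ehproduction02, can_reach() is expected to return False. A True result only shows that the server is reachable, not that a download will succeed.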
Furthermore, the tutorial assumes that you have access to a Linux command line. You can get one by creating a virtual machine running Linux or (not tested by us) by using the Windows Subsystem for Linux.
Create a conda environment using an environment.yml file
From the training material, we need the setup.py file provided in the sander-script directory. Copy the complete directory so that you can make changes to it. The instructions show the path on ehproduction02. If you have copied the training material elsewhere, use the corresponding path.
You will need a working directory in which you can place the various directories and files that you will deploy in a later step. After you have deployed your application, you will not need the directory anymore unless you want to make changes and deploy the application again.
If you want to package the conda environment yourself, this directory should be on your own computer with internet access. Otherwise, you can create it on ehproduction02 instead.
In either case, choose a location for your working directory (e.g. segmentation-wd in your home directory) and create it:
# on your local computer or ehproduction02
mkdir ~/segmentation-wd
cd ~/segmentation-wd
The following steps will be performed from your working directory unless explicitly specified otherwise.
Copy the source code of the script to your working directory. If you are working on ehproduction02:
# in your working directory on ehproduction02
cp -r /home/martin/training/orthophotos/sander-script .
ls -l sander-script
Otherwise, if you are working on your local computer:
# in your working directory on your local computer
scp -r ehproduction02:/home/martin/training/orthophotos/sander-script .
ls -l sander-script
Using an editor of your choice, create a file environment.yml as described in the Dependencies subsection above, listing all dependencies from setup.py. Save it in your sander-script directory.
cd sander-script
cat setup.py
# create a file environment.yml from the information contained in setup.py as described in the introduction
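Before you create the environment, it can help to check that no package from setup.py was forgotten. The sketch below compares package names only, not versions. The function names package_name and missing_in_environment are hypothetical. The simple line-based reading of environment.yml assumes the flat layout shown in the Dependencies subsection and is not a full YAML parser.

```python
from __future__ import annotations

import re


def package_name(requirement: str) -> str:
    """Return the bare, lower-cased package name of a specifier such as 'geopandas>=0.10'."""
    return re.split(r"[\s<>=!~\[;]", requirement.strip(), maxsplit=1)[0].lower()


def missing_in_environment(install_requires: list[str], environment_yml: str) -> list[str]:
    """Hypothetical check: requirements from setup.py whose package is absent from environment.yml."""
    listed: set[str] = set()
    in_dependencies = False
    for line in environment_yml.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments such as "# - ..."
        if not line.startswith((" ", "-")):
            # A new top-level key such as "name:", "channels:" or "dependencies:".
            in_dependencies = stripped == "dependencies:"
            continue
        if in_dependencies and stripped.startswith("- "):
            listed.add(package_name(stripped[2:]))
    return [req for req in install_requires if package_name(req) not in listed]
```

Here install_requires is the list copied from setup.py, and the second argument is the content of sander-script/environment.yml. An empty result means that every package name appears in the file.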
Use your environment.yml to create a new conda environment with the dependencies.
cd ..
conda env create -p conda-segmentation --file sander-script/environment.yml
conda might ask you to confirm that the chosen packages should be installed. Answer y and press Return to confirm. The -p option creates the environment in the directory conda-segmentation inside your working directory instead of in conda's default location. If everything goes well, you should see a folder named conda-segmentation in your working directory.
Activating the environment
Now that you have created the environment, you need to activate it before you can use it. The environment was created with -p in your working directory rather than as a named environment, so refer to it by its absolute path:
conda activate $(realpath conda-segmentation)
When the environment is active, your shell prompt changes to show it. For an environment created with -p, conda shows the environment's full path.
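You can also confirm from Python that the interpreter in use belongs to the new environment. sys.prefix holds the root directory of the environment the running interpreter belongs to, and conda activate sets the CONDA_PREFIX variable. The functions below are hypothetical helpers. Run them with the python command of the activated shell, from your working directory.

```python
import os
import sys


def active_environment_matches(env_dir: str) -> bool:
    """Hypothetical helper: True if the running interpreter lives in the environment env_dir."""
    # Resolve symlinks and relative paths on both sides before comparing.
    return os.path.realpath(sys.prefix) == os.path.realpath(env_dir)


def conda_prefix_matches(env_dir: str) -> bool:
    """Hypothetical helper: True if CONDA_PREFIX (set by `conda activate`) points to env_dir."""
    prefix = os.environ.get("CONDA_PREFIX")
    return prefix is not None and os.path.realpath(prefix) == os.path.realpath(env_dir)
```

With conda-segmentation activated, both active_environment_matches("conda-segmentation") and conda_prefix_matches("conda-segmentation") should return True.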
Packing a conda environment
Unfortunately, conda environments cannot simply be copied to another location in the file system or to another computer.
They contain hard-coded absolute paths to their original location (for example in the shebang lines of scripts), so a moved environment no longer works.
Therefore, we need to make our environment relocatable (possible to move) before we can deploy it.
Luckily, there is a tool called conda-pack that we can use for this purpose.
conda-pack takes a conda environment and creates an archive of it that can be moved to a different computer.
Create a relocatable archive out of our conda environment using:
conda pack --prefix conda-segmentation
The --prefix argument instructs conda pack to package the environment we just created and not the one that is currently active.
The generated .tar.gz file is named after our environment, i.e. conda-segmentation.tar.gz.
conda pack sometimes creates broken archives. To avoid problems with unpacking at runtime on Calvalus, unpack the archive and re-pack it in a temporary directory with the tar command.
The following commands make sure that the archive contains the content of the environment directly, not a subdirectory conda-segmentation or a subdirectory tmp.
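# unpack the archive into a temporary directory
mkdir tmp
cd tmp
tar xzf ../conda-segmentation.tar.gz
# re-pack from inside tmp so that the archive entries have no leading directory
tar czf ../conda-segmentation.tar.gz *
# remove the temporary directory
cd ..
rm -r tmp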
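To check the result of the re-packing, you can list the top-level entries of the archive. The sketch below uses Python's tarfile module, and the function names are hypothetical. It treats a bin directory at the top level as the sign of a correctly flattened environment. This is an assumption based on the usual layout of a conda environment on Linux.

```python
from __future__ import annotations

import tarfile


def top_level_entries(archive_path: str) -> set[str]:
    """Return the names of the top-level files and directories in a .tar.gz archive."""
    entries: set[str] = set()
    with tarfile.open(archive_path, "r:gz") as archive:
        for name in archive.getnames():
            # Treat "./bin/python" like "bin/python".
            while name.startswith("./"):
                name = name[2:]
            top = name.split("/", 1)[0]
            if top not in ("", "."):
                entries.add(top)
    return entries


def looks_flat(archive_path: str, unwanted: tuple[str, ...] = ("conda-segmentation", "tmp")) -> bool:
    """Hypothetical check: environment content at the top level, no wrapping directory."""
    entries = top_level_entries(archive_path)
    # Assumption: a conda environment on Linux has a bin directory at its root.
    return "bin" in entries and not entries.intersection(unwanted)
```

Run from your working directory, looks_flat("conda-segmentation.tar.gz") should return True after the commands above. If it returns False, top_level_entries shows which directory wraps the environment.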
Next steps
Now that we have created a relocatable conda environment with the dependencies of our algorithm, we can continue with a local test run.