Exercise 3: Converting a Python Script to a Software Package on Calvalus

This exercise walks through a real-world case: deploying a meaningful processor on Calvalus. It is split into three steps: local test, packaging, and run. If you do not have a local development environment, you can skip the first one and a half steps and start directly with deployment and run.

The application

Imagine your colleague or business partner (let's call him Sander) has created a Python script that uses a machine learning (ML) model, implemented with the PyTorch framework, to classify the pixels of an orthophoto into 12 classes:

Class                    MSK
building                 1
pervious surface         2
impervious surface       3
bare soil                4
water                    5
coniferous               6
deciduous                7
brushwood                8
vineyard                 9
herbaceous vegetation    10
agricultural land        11
plowed land              12
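
When inspecting classification results, it can help to translate MSK values back into class names. The following is a minimal sketch of such a lookup, with the values copied from the table above. The names CLASS_NAMES and class_name are illustrative assumptions and are not part of Sander's software.

```python
# MSK value -> class name, copied from the class table.
CLASS_NAMES = {
    1: "building",
    2: "pervious surface",
    3: "impervious surface",
    4: "bare soil",
    5: "water",
    6: "coniferous",
    7: "deciduous",
    8: "brushwood",
    9: "vineyard",
    10: "herbaceous vegetation",
    11: "agricultural land",
    12: "plowed land",
}


def class_name(msk_value: int) -> str:
    """Return the class name for an MSK value, or 'unknown' if it is not in the table."""
    return CLASS_NAMES.get(msk_value, "unknown")


print(class_name(5))  # water
```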

Your colleague also helpfully provided a Jupyter Notebook explaining how to use the script, which you can find together with the provided software. The software also includes a file with the weights of the pre-trained ML model, so you do not have to perform the training yourself and can go straight to applying the script to your own orthophotos.

The challenge

You would like to apply the model to a very large number of orthophotos to create segmentation products for a whole city or country. Running the model on every input one by one on your workstation would take a very long time and consume most of your workstation's CPU and RAM, keeping you from doing other work while you wait.

In order to prevent this scenario, we move the computation to a cluster instead of using the local workstation, using the Calvalus processing system. By using Calvalus, we can run many classification tasks at the same time, leveraging multiple computers (nodes) of the cluster. As an added benefit, you can use your workstation for other tasks in the meantime without it being slowed down.

The approach

In order to run the Python script as provided by Sander on Calvalus, we need to perform a number of steps to deploy the software and run it on the cluster:

Deployment

  • Create a Python virtual environment with all dependencies and package it
  • Add a parameter to the main Python script of the classification software
  • Create a wrapper script
  • Install the processor package
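
To illustrate the second deployment step, adding a parameter to a Python script usually means extending its command-line parser. The sketch below uses the standard argparse module. This overview does not name the parameter to be added, so the `--output-dir` option, the positional `input` argument, and the function name build_parser are hypothetical placeholders and not taken from Sander's script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface for a hypothetical classification script."""
    parser = argparse.ArgumentParser(
        description="Classify the pixels of an orthophoto."
    )
    # Positional argument: the orthophoto to classify.
    parser.add_argument("input", help="path to the input orthophoto")
    # Hypothetical added parameter (an assumption, not named in the exercise text).
    parser.add_argument(
        "--output-dir",
        default=".",
        help="directory for the classification result",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["photo.tif", "--output-dir", "results"])
print(args.input, args.output_dir)
```

Giving the new option a default value keeps existing invocations of the script working, which is useful when the same script is run both locally and from a wrapper script.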

Execution

  • Use the processing system instance of exercise 1
  • Write a request
  • Submit the request

First, we make sure that we can run the script on our local machine to verify that everything works correctly in principle. In order to do that, we need to install the required Python packages first, using a virtual environment.
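
As a minimal sketch of this step, a virtual environment can be created with Python's standard venv module, which is what `python -m venv` runs. The function name create_env and the directory name demo-env are illustrative assumptions. The demonstration uses a temporary directory and skips pip so that it runs quickly and offline; for real use, keep `with_pip=True` so the required packages can be installed into the environment afterwards.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return the path to its config file."""
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"


# Demonstration in a throwaway directory; with_pip=False keeps it fast and offline.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "demo-env", with_pip=False)
    created = cfg.is_file()

print(created)
```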

Note

In an ideal world we would need just two locations for this exercise: ehproduction02 and ESTHub's Calvalus cluster. But since ehproduction02 is protected in such a way that it cannot access the internet, you need a third place to create the Python virtual environment. You can use your local computer to build the environment, or you can skip this step and use the pre-built conda environment provided in the material to run the processor locally on ehproduction02.