Exercise 3: Converting a Python Script to a Software Package on Calvalus
This exercise walks through a real-world case of deploying a meaningful processor on Calvalus. It is split into three steps: local test, packaging, and run. If you do not have a local development environment, you can skip the first one and a half steps and start directly with deployment and run.
The application
Imagine your colleague or business partner (let's call him Sander) has created a Python script that uses a Machine Learning (ML) model, implemented with the PyTorch framework, to classify the pixels of an orthophoto into 12 different classes:
Class | MSK |
---|---|
building | 1 |
pervious surface | 2 |
impervious surface | 3 |
bare soil | 4 |
water | 5 |
coniferous | 6 |
deciduous | 7 |
brushwood | 8 |
vineyard | 9 |
herbaceous vegetation | 10 |
agricultural land | 11 |
plowed land | 12 |
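When inspecting classification results, it can help to have the table above available as a lookup in Python. The following is a minimal sketch only: the names `MSK_CLASSES` and `class_name` are hypothetical and not part of Sander's software, and returning "unknown" for values outside the table is an assumption.

```python
# Mapping from MSK value to class name, copied from the table above.
# MSK_CLASSES and class_name are illustrative names, not part of the provided software.
MSK_CLASSES = {
    1: "building",
    2: "pervious surface",
    3: "impervious surface",
    4: "bare soil",
    5: "water",
    6: "coniferous",
    7: "deciduous",
    8: "brushwood",
    9: "vineyard",
    10: "herbaceous vegetation",
    11: "agricultural land",
    12: "plowed land",
}


def class_name(msk_value: int) -> str:
    """Return the class name for an MSK value.

    Values that are not in the table map to "unknown" (an assumption of this sketch).
    """
    return MSK_CLASSES.get(msk_value, "unknown")
```

For example, `class_name(5)` looks up the entry for water in the table.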
Your colleague also helpfully provided a Jupyter Notebook explaining how to use the script, which you can find together with the provided software. The software also includes a file with the weights of the pre-trained ML model, so you do not have to perform the training yourself and can go straight to applying the script to your own orthophotos.
The challenge
You would like to apply the model to a very large number of orthophotos, to create segmentation products for a whole city or country. Running the model on every input one by one on your workstation would take a very long time and consume most of your workstation's CPU and RAM, keeping you from doing other work while you wait.
In order to prevent this scenario, we move the computation to a cluster instead of using the local workstation, using the Calvalus processing system. By using Calvalus, we can run many classification tasks at the same time, leveraging multiple computers (nodes) of the cluster. As an added benefit, you can use your workstation for other tasks in the meantime without it being slowed down.
The approach
In order to run the Python script as provided by Sander on Calvalus, we need to perform a number of steps to deploy the software and run it on the cluster:
Deployment
- Create a Python virtual environment with all dependencies and package it
- Add a parameter to the main Python script of the classification software
- Create a wrapper script
- Install the processor package
Execution
- Use the processing system instance of exercise 1
- Write a request
- Submit the request
First, we make sure that we can run the script on our local machine to verify that everything works correctly in principle. In order to do that, we need to install the required Python packages first, using a virtual environment.
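As a rough illustration of the environment step, the sketch below creates a virtual environment with Python's standard-library `venv` module. It assumes a plain `venv` is used rather than conda, and the function name `create_environment` is hypothetical. Installing the required packages into the environment is not shown, because the dependency list belongs to the provided software.

```python
import venv
from pathlib import Path


def create_environment(target_dir, with_pip=True):
    """Create a Python virtual environment in target_dir and return its path.

    with_pip=True bootstraps pip into the new environment so that the
    required packages can be installed afterwards.
    """
    env_path = Path(target_dir)
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

Calling `create_environment("classification-env")` (the directory name is only an example) creates the environment in the current working directory. It must be run on a machine with internet access if packages are to be installed into it afterwards.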
Note
In an ideal world we would need just two locations for this exercise, ehproduction02 and ESTHub's Calvalus cluster. But since ehproduction02 is protected in a way that it cannot access the internet, you need a third place in order to create the Python virtual environment. You can use your local computer for building the environment. Or you can skip this step and use the pre-built conda environment provided in the material to run the processor locally on ehproduction02.