This post was originally written for the CBC-UCONN HC wiki
The purpose of this post is to enable cluster users to run jupyter on the cluster interactively, so that they can conduct data analysis and visualization directly on the HPC. There are different “flavors” of jupyter notebooks; the most appropriate ones are pointed out in the Picking a container section.
Preliminaries
We assume that the user has access to ssh through a terminal. In addition, it is necessary to have SingularityCE (singularity for short) installed on a computer on which you have root privileges.
In order to use singularity on a Windows (or Mac) machine, a Linux Virtual Machine (VM) needs to be set up. Setting up a VM and installing SingularityCE is beyond the scope of this document.
In addition, singularity is assumed to be available on the HPC that you have access to. Usually, users have to run
module load singularity/<version>
before using it.
Familiarity with containers is helpful but not necessary. Loosely speaking, a container allows us to “isolate” a set of tools and software in order to guarantee code reproducibility and portability. Moreover, singularity was developed (among other reasons) to integrate these tools with HPC clusters.
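As a quick illustration of that isolation (a minimal sketch, assuming an image such as the mycontainer.sif built later in this post), a tool that lives only inside the container can be run without installing it on the host:
singularity exec mycontainer.sif python --version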
Picking a container
The Jupyter Docker Stacks project contains several useful docker containers that can easily be used to build singularity containers. A list of the containers (along with their principal features) maintained by the Jupyter team can be found here. Some of these containers are detailed below:
- jupyter/r-notebook: a container containing a basic installation for Machine Learning using R.
- jupyter/scipy-notebook: contains popular libraries for scientific computing using python.
- jupyter/tensorflow-notebook: this is the jupyter/scipy-notebook with tensorflow installed on it.
- jupyter/datascience-notebook: includes libraries for data analysis from the julia, python, and R communities.
Converting a docker into a singularity container
Once you have chosen a container suitable for your needs (and have root access to a machine with singularity), a singularity container can be generated by executing the following chunk of code in the terminal.
## singularity pull <choose-a-name>.sif docker://jupyter/<preferred-notebook>
singularity pull mycontainer.sif docker://jupyter/datascience-notebook
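Optionally (this is not part of the original recipe, just a sanity check), you can confirm that the image was created and view its metadata with singularity's inspect subcommand:
singularity inspect mycontainer.sif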
In the example above, I chose to use the datascience-notebook. After doing so, the .sif file generated by singularity needs to be transferred to the cluster. My personal preference is to use either scp or rsync, for example
rsync mycontainer.sif <username>@<hpc-url>:<location>
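If you prefer scp, the equivalent transfer (with the same placeholders) would be
scp mycontainer.sif <username>@<hpc-url>:<location>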
Using the singularity container on the cluster
After transferring the .sif file to the cluster, follow these steps. First, set up the VPN and log in to the cluster using ssh, then navigate to the location where you transferred the container (.sif file) to. Next, you will have to start an interactive job. If the workload manager used on the HPC that you have access to is SLURM, this can be done either with srun or fisbatch. To start an interactive job with srun, use
srun --partition=<partition-name> --qos=<queue-name> --mem=64G --pty bash
The same task can be achieved with fisbatch (if available) with
fisbatch --partition=<partition-name> --qos=<queue-name> --mem=64G
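As an aside (not part of the original recipe): once the interactive shell starts you are already on the allocated compute node, so if you missed its name in the scheduler's output, the standard hostname command will print it:
hostname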
Either of these commands will allocate your job to a specific node. It is important to save the name of the node that your job has been allocated to, since it is needed later for the ssh tunnel. Next, load singularity on that node as follows
module load singularity/<version>
The penultimate step is to start the jupyter instance. This is done as follows
singularity exec --nv mycontainer.sif jupyter notebook --no-browser --ip='*'
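If you prefer the JupyterLab interface over the classic notebook (the Jupyter Docker Stacks images ship with it, but it is worth double-checking for the image you picked), the analogous command would be
singularity exec --nv mycontainer.sif jupyter lab --no-browser --ip='*'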
After executing the command to start jupyter, the terminal will be “busy” and the output will provide three URLs; they will look something like “http://127.0.0.1:8888/”. Copy the last address provided by the output. The last step before being able to access the notebook through that address is to create an ssh tunnel. To do so, open another terminal window and execute
ssh -NL localhost:8888:<node>:8888 <username>@<hpc-url>
where <node> should be replaced by the name of the node to which the job submitted using srun (or fisbatch) was allocated. This tunnel will keep the other terminal window busy too.
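For example, with purely hypothetical values for the placeholders (node cn05, username jdoe, cluster login.hpc.edu), the tunnel command would look like
ssh -NL localhost:8888:cn05:8888 jdoe@login.hpc.edu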
Finally, copy the address provided by the notebook (e.g., “http://127.0.0.1:8888/”) and paste it into your browser.
Citation
@online{godoy2022,
  author = {Godoy, Lucas},
  title = {Using Jupyter on an {HPC} Cluster},
  date = {2022-06-09},
  url = {https://lcgodoy.me/posts/jupyter_hpc/jupyter_hpc.html},
  langid = {en}
}