Using jupyter on an HPC cluster
This post was originally written for the CBC-UCONN HPC wiki.
The purpose of this post is to enable cluster users to run jupyter interactively on the cluster, so that they can conduct data analysis and visualization there. There are different “flavors” of jupyter notebooks; the most appropriate ones are pointed out in Picking a container.
We assume that the user has access to ssh through a terminal. In addition, it is necessary to have SingularityCE (singularity for short) installed on a computer on which you have root access.
In order to use singularity on a Windows (or Mac) machine, a Linux Virtual Machine (VM) needs to be set up. Setting up a VM and installing SingularityCE is beyond the scope of this document.
singularity is assumed to be available on the HPC that you have
access to. Usually, users have to run
module load singularity/<version>
before using it.
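Many clusters provide several singularity versions; the exact module names vary from site to site, so treat the lines below as a sketch for checking what is available and confirming that the load worked:
module avail singularity
module load singularity/<version>
singularity --version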
Familiarity with containers is helpful but not necessary. Loosely speaking, a container allows us to “isolate” a set of tools and software in order to guarantee code reproducibility and portability. Moreover, singularity was developed (among other reasons) to integrate these tools with HPC clusters.
Picking a container
The Jupyter Docker Stacks project contains several useful docker containers that can be easily used to build singularity containers, for example:
- jupyter/r-notebook: a container containing a basic installation for Machine Learning using R.
- jupyter/scipy-notebook: contains popular libraries for scientific computing using python.
- jupyter/tensorflow-notebook: this is the scipy-notebook container with tensorflow installed on it.
- jupyter/datascience-notebook: includes libraries for data analysis from the Julia, python, and R communities.
Turning the docker container into a singularity container
Once you have chosen a container suitable for your needs (and have root access to a machine with singularity installed), a singularity container can be generated by executing the following chunk of code in the terminal.
## singularity pull <choose-a-name>.sif docker://jupyter/<preferred-notebook>
singularity pull mycontainer.sif docker://jupyter/datascience-notebook
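Before transferring anything, it can be worth a quick sanity check that the image works; mycontainer.sif is the name chosen in the example above:
singularity exec mycontainer.sif jupyter --version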
In the example above, I chose to use the datascience-notebook. After doing so, the .sif file generated by singularity needs to be transferred to the cluster. My personal preference is to use rsync, for example
rsync mycontainer.sif <username>@<hpc-url>:<location>
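If rsync is not available, scp works just as well with the same placeholders:
scp mycontainer.sif <username>@<hpc-url>:<location>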
Running the singularity container on the cluster
After transferring the .sif file to the cluster, follow the steps below. First, set up the VPN and log in to the cluster using ssh, then navigate to the location where you transferred the container (the .sif file) to.
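For concreteness, the log-in and navigation steps might look like this (the username, cluster address, and path are placeholders):
ssh <username>@<hpc-url>
cd <location>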
Next, you will have to start an interactive job. If the workload manager used on the HPC that you have access to is SLURM, this can be done either with srun or with fisbatch. To start an interactive job with srun, execute
srun --partition=<partition-name> --qos=<queue-name> --mem=64G --pty bash
The same task can be achieved with fisbatch (if available) by running
fisbatch --partition=<partition-name> --qos=<queue-name> --mem=64G
Either of these commands will allocate your job to a specific node. It is
important to save the name of the node that your job has been allocated
to. Next, load
singularity on that node as follows
module load singularity/<version>
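If you did not note the node name when the job started, SLURM can still tell you where it is running; the username below is a placeholder:
## from a login node: list your jobs and the nodes they are running on
squeue -u <username>
## or, from within the interactive session itself
hostname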
The penultimate step is to start the
jupyter instance. It is done as follows
singularity exec --nv mycontainer.sif jupyter notebook --no-browser --ip='*'
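The --nv flag passes the NVIDIA GPU drivers through to the container and is only useful on GPU nodes; on CPU-only nodes it can simply be dropped. If you prefer the JupyterLab interface, which these images also ship with, the same pattern should work:
singularity exec mycontainer.sif jupyter lab --no-browser --ip='*'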
After executing the last chunk of code, the terminal will be “busy” and will print three URLs that look somewhat like “http://127.0.0.1:8888/”. Copy the last address provided by the output. The last step before being able to access the notebook through that address is to create a tunnel. To do so, open another terminal window and execute
ssh -NL localhost:8888:<node>:8888 <username>@<hpc-url>
Here <node> should be replaced by the node to which the job (submitted using srun or fisbatch) was allocated. This tunnel will keep this second terminal window busy too.
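If port 8888 is already taken on your local machine, any free local port can be used for the tunnel instead; 9999 below is an arbitrary choice, and in that case you would replace 8888 with 9999 in the address you paste into the browser:
ssh -NL localhost:9999:<node>:8888 <username>@<hpc-url>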
Finally, copy the address provided by the notebook (e.g., “http://127.0.0.1:8888/”) and paste it into your browser.