Automatic SLURM Build and Installation Script

Automatic SLURM Build Script for RH/CentOS 7, 8 and 9 as well as Ubuntu 18, 20 and 22, including Accounting

Building and installing SLURM is a partially manual process. We have compiled a script which automatically builds and installs SLURM on Red Hat/CentOS 7.x, 8.x and 9.x as well as Ubuntu 18, 20 and 22, including optional accounting.

You can execute the script in one go or step by step if you want to see what happens at each stage. We recommend running the script as a standard user who can sudo to root.

You will be asked whether you also want to install the optional accounting support, which installs MariaDB and configures SLURM accounting.
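For reference, accounting in SLURM is handled by the slurmdbd daemon backed by MariaDB. The relevant configuration ends up roughly like the following sketch (host names, the database password and the database name are placeholders, not necessarily the exact values the script writes):

# slurm.conf - accounting-related entries (sketch)
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
JobAcctGatherType=jobacct_gather/linux

# slurmdbd.conf - database daemon settings (sketch)
DbdHost=localhost
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=<db-password>
StorageLoc=slurm_acct_db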

The automatic SLURM build and installation script for RH/CentOS/Rocky and Ubuntu derivatives can be downloaded here:

You can simply run the following steps on your SLURM master:

#
# Automatic SLURM build and installation script for EL7, EL8 and EL9, Ubuntu and derivatives 
#
# sudo yum install wget -y
# sudo apt install wget -y
wget --no-check-certificate https://www.ni-sp.com/wp-content/uploads/2019/10/SLURM_installation.sh
# set the desired SLURM version
export VER=20.11.9   # latest 20.11
# export VER=21.08.5
# export VER=22.05.02
bash SLURM_installation.sh
# wait a couple of minutes
# and test your SLURM installation yourself
sinfo
# see above for more SLURM commands and their output
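Besides sinfo you can, for example, run a quick interactive job and submit a small batch script to verify the scheduler end to end (the file name test_job.sh and the <jobid> placeholder below are just examples):

srun -N 1 hostname            # run a single job interactively
cat > test_job.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=test_%j.out
#SBATCH --ntasks=1
hostname
EOF
sbatch test_job.sh            # submit the batch job
squeue                        # watch the queue
sacct -j <jobid>              # accounting record, if accounting was enabled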

You can also follow our guide on installing SLURM on Ubuntu (in WSL). Please see below for a container-based setup of a SLURM cluster as well.

You can download pre-compiled RPMs for EL7 and EL8 here (you can basically start the script above at “cd ~/rpmbuild/RPMS/x86_64/” after extracting the tarball and setting up MariaDB and munge):
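In rough terms, the manual steps before that point look like the following sketch (the tarball name is a placeholder; the munge and MariaDB setup mirrors what the installation script otherwise automates):

# extract the pre-compiled RPMs into your home directory (placeholder file name)
tar xjf slurm-rpms-el8.tar.bz2 -C $HOME
# set up munge authentication
sudo yum install munge munge-libs -y
sudo /usr/sbin/create-munge-key
sudo systemctl enable --now munge
# set up MariaDB in case you want accounting
sudo yum install mariadb-server -y
sudo systemctl enable --now mariadb
# then continue with the script above
cd ~/rpmbuild/RPMS/x86_64/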

In case you are interested in HPC in the Cloud head over to our overview article HPC in the Cloud – Pros and Cons.

Build RPMs only for RH/CentOS

In case you want to build only the RPMs, here is the script for EL7:

sudo yum install epel-release -y
sudo yum install python3 gcc openssl openssl-devel pam-devel numactl \
     numactl-devel hwloc lua readline-devel ncurses-devel man2html \
     libibmad libibumad rpm-build  perl-ExtUtils-MakeMaker.noarch \
     rrdtool-devel lua-devel hwloc-devel munge munge-libs munge-devel \
     mariadb-server mariadb-devel -y
mkdir slurm-tmp
cd slurm-tmp
# set the desired SLURM version
# export VER=20.11.8   # latest 20.11
# export VER=21.08.6
export VER=22.05.9
# export VER=23.02.2

wget https://download.schedmd.com/slurm/slurm-$VER.tar.bz2
rpmbuild -ta slurm-$VER.tar.bz2 
echo Your RPMs are at $HOME/rpmbuild/RPMS/x86_64:
ls -al $HOME/rpmbuild/RPMS/x86_64

And here is the automatic RPM builder for EL8:

sudo yum install epel-release -y
sudo yum install dnf-plugins-core -y
sudo yum config-manager --set-enabled powertools
# in case of repo access issues
# sudo sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-*
# sudo sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*
sudo yum install --enablerepo=powertools python3 gcc openssl \
     openssl-devel pam-devel numactl wget make numactl-devel \
     hwloc lua readline-devel ncurses-devel man2html \
     libibmad libibumad rpm-build  perl-ExtUtils-MakeMaker.noarch \
     rrdtool-devel lua-devel hwloc-devel munge munge-libs munge-devel \
     mariadb-server mariadb-devel -y
mkdir slurm-tmp
cd slurm-tmp
# set the desired SLURM version
# export VER=20.11.8   # latest 20.11
# export VER=21.08.6
export VER=22.05.9
# export VER=23.02.2
wget https://download.schedmd.com/slurm/slurm-$VER.tar.bz2
rpmbuild -ta slurm-$VER.tar.bz2 
ls -al $HOME/rpmbuild/RPMS/x86_64
echo Your RPMs are at $HOME/rpmbuild/RPMS/x86_64
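Once the RPMs are built you typically install a subset per node type, roughly as in the following sketch (package names as generated by the SLURM spec file; copy the RPMs to the compute nodes first):

cd $HOME/rpmbuild/RPMS/x86_64
# controller / accounting host
sudo yum localinstall slurm-$VER*.rpm slurm-perlapi-*.rpm \
     slurm-slurmctld-*.rpm slurm-slurmdbd-*.rpm -y
# compute nodes
sudo yum localinstall slurm-$VER*.rpm slurm-slurmd-*.rpm -y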

SLURM Cluster in Docker Containers

SciDAS has created an easy-to-use container-based SLURM setup to jump-start a small SLURM cluster. The automatic container build creates two SLURM compute workers with OpenMPI integration as well as a controller and a database container, as shown in this diagram from the GitHub page:

SLURM in Containers Diagram with Services and Ports

Here is an overview of how the straightforward installation looks on Ubuntu, with input from the GitHub page:

> git clone https://github.com/SciDAS/slurm-in-docker
Cloning into 'slurm-in-docker'...
remote: Enumerating objects: 549, done.
remote: Total 549 (delta 0), reused 0 (delta 0), pack-reused 549
Receiving objects: 100% (549/549), 144.72 KiB | 682.00 KiB/s, done.
Resolving deltas: 100% (310/310), done.
# BEGIN - install docker in case not yet done - in our case for Ubuntu
> sudo apt-get install -y apt-transport-https \
    ca-certificates curl gnupg-agent \
    software-properties-common make 
> curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
> sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
> sudo apt-get update
> sudo apt-get install -y docker-ce docker-ce-cli containerd.io 
> sudo apt-get install -y docker-compose
> sudo groupadd docker
> sudo usermod -aG docker $USER
# END of Docker installation
# You might need to log out and log in again to activate the docker group access rights
#
# Create the SLURM 19.05.1 containers (SLURM version can be adapted) 
#
> cd slurm-in-docker/
> make                  # building will take some minutes 
# ... lots of output ;) .............................
> docker images
REPOSITORY                      TAG                 IMAGE ID            CREATED             SIZE
scidas/slurm.database           19.05.1             035a7fb27574        3 days ago          828MB
scidas/slurm.worker             19.05.1             6faf0d7804f7        3 days ago          1.31GB
scidas/slurm.controller         19.05.1             e2445edbad54        3 days ago          1.31GB
scidas/slurm.base               19.05.1             668e97c1fb7b        3 days ago          805MB
scidas/slurm.rpms               19.05.1             8b5682048fee        3 days ago          885MB
centos                          7                   7e6257c9f8d8        6 weeks ago         203MB
krallin/centos-tini             7                   748636d1c058        16 months ago       226MB
> docker-compose up -d  # start the environment 
Creating network "slurmindocker_slurm" with the default driver
Creating controller ...
Creating controller ... done
Creating worker01 ...
Creating database ...
Creating worker02 ...
Creating worker01
Creating database
Creating worker02 ... done
> docker exec -ti controller sinfo -lN
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
worker01       1   docker*        idle    1    1:1:1   1800        0      1   (null) none
worker02       1   docker*        idle    1    1:1:1   1800        0      1   (null) none
> docker exec -ti controller srun -N 2 hostname
worker02
worker01
> docker exec -ti controller srun --mpi=list
srun: MPI types are...
srun: pmi2
srun: openmpi
srun: none
> docker exec -ti controller ompi_info 
# ......... OpenMPI info output .......
# Test OpenMPI
> cat > home/worker/mpi_hello.c << EOF
/******************************************************************************
 * * FILE: mpi_hello.c
 * * DESCRIPTION: MPI tutorial example code: Simple hello world program
 * * AUTHOR: Blaise Barney
 * * LAST REVISED: 03/05/10
 * ******************************************************************************/
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define  MASTER 0

int main (int argc, char *argv[]) {
   int   numtasks, taskid, len;
   char hostname[MPI_MAX_PROCESSOR_NAME];

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
   MPI_Comm_rank(MPI_COMM_WORLD,&taskid);
   MPI_Get_processor_name(hostname, &len);

   printf ("Hello from task %d on %s!\n", taskid, hostname);

   if (taskid == MASTER)
      printf("MASTER: Number of MPI tasks is: %d\n",numtasks);

   //while(1) {}

   MPI_Finalize();
}
EOF
> docker exec -ti worker01 mpicc mpi_hello.c -o mpi_hello.out
> docker exec -ti worker01 srun -N 2 --mpi=openmpi mpi_hello.out
Hello from task 1 on worker02!
Hello from task 0 on worker01!
MASTER: Number of MPI tasks is: 2
# disable message about missing openib in case with the following setting
# docker exec -ti worker01 bash -c "export \
# OMPI_MCA_btl_base_warn_component_unused=0; srun -N 2 --mpi=openmpi mpi_hello.out"
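# As a further optional test you could submit the same MPI program as a
# batch job instead of calling srun directly - a sketch, assuming the shared
# home/worker directory is the working directory inside the containers:
# > cat > home/worker/mpi_hello.sh << 'EOS'
# #!/bin/bash
# #SBATCH -N 2
# #SBATCH -o mpi_hello.%j.out
# srun --mpi=openmpi ./mpi_hello.out
# EOS
# > docker exec -ti controller sbatch mpi_hello.sh
# > docker exec -ti controller squeue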
# login to a worker container
> docker exec -ti worker01 bash
# and finally shutdown the SLURM container environment
> sh teardown.sh
# docker-compose stop
# docker-compose rm -f
# docker volume rm slurmindocker_home slurmindocker_secret
# docker network rm slurmindocker_slurm

In case the controller constantly restarts with messages like

sacctmgr: error: Malformed RPC of type PERSIST_RC(1433) received
sacctmgr: error: slurm_persist_conn_open: Failed to unpack persistent connection init resp message from database:6819 :

you can remove the generated keys and secrets and bring the environment up again:

sh teardown.sh
rm -rf home/worker/.ssh/*
sudo rm -rf secret/*
docker-compose up -d
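After the environment is up again you can check, for example, that the controller stays running and the database connection works:

docker exec -ti controller sinfo -lN
docker exec -ti controller sacctmgr show cluster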

Have a look at our other technical guides related to the high-end remote desktop software NICE DCV and the EnginFrame HPC and session management portal. If there are any questions, let us know.

Commercial Support for SLURM

Our experienced technical team offers professional SLURM support for commercial and academic customers. We help you solve issues with your SLURM installation via email, phone and web conference. In case you are interested, let us know via our contact form.