SLURM Job Scheduler: Quick Reference Guide

Introduction

These pedagogic modules are designed to give students hands-on experience with parallel computing. So that students can work with a real-world distributed-memory environment, we use a cluster that runs the SLURM batch scheduler. This section is a quick reference guide to the SLURM commands used throughout these pedagogic modules.

The full SLURM documentation is available at https://slurm.schedmd.com/documentation.html. A summary of job script options and SLURM commands is available at https://slurm.schedmd.com/pdfs/summary.pdf.

If you are using a cluster with a different scheduler (e.g., PBS), it provides equivalent commands for each of the functions described here. A table mapping the commands of other schedulers to their SLURM equivalents is available at https://slurm.schedmd.com/rosetta.pdf.

SLURM Example Script

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --output=/path/to/scratch/myfile.out
#SBATCH --error=/path/to/scratch/myfile.err
#SBATCH --time=20:00 # 20 minutes
#SBATCH --mem=4000
#SBATCH --nodes=2
#SBATCH --ntasks=15
#SBATCH --cpus-per-task=1
#SBATCH --exclusive

module load openmpi

srun /path/to/bin/main 100

job-name: The name of your job.
output: Standard output (stdout) is written to this file. This is often a file on scratch storage.
error: Standard error (stderr) is written to this file. This is often a file on scratch storage.
time: The maximum wall-clock time for the job. Here, 20:00 means 20 minutes (minutes:seconds). The job is killed if it exceeds this limit.
mem: Memory requested per node, in megabytes (the default unit when no suffix is given).
nodes: Number of nodes requested.
ntasks (-n): Number of tasks to launch. This usually corresponds to the number of MPI ranks.
cpus-per-task: Number of CPUs assigned to each task. For example, a hybrid MPI+Pthreads program that runs two threads within each MPI rank requires 2 CPUs per task.
exclusive: The allocated nodes are not shared with any other jobs. Without this flag, many clusters let a single node run jobs from multiple users at the same time.
module load <module>: Loads the software modules the job needs (here, Open MPI). This assumes the cluster uses a module system; module names vary from cluster to cluster.
srun: Launches your program. For MPI programs, srun takes the place of mpirun and starts the tasks using the resources requested by the #SBATCH directives (number of nodes, tasks, and CPUs per task).
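The sbatch documentation lists several accepted formats for the time value: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds". To make the arithmetic explicit, here is a minimal bash sketch that converts each of these formats to seconds. slurm_time_to_seconds is a hypothetical helper for illustration, not a SLURM command, and it only handles the numeric formats listed above.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): convert a --time value to seconds.
# Numeric formats handled, as listed in the sbatch documentation:
#   minutes, minutes:seconds, hours:minutes:seconds,
#   days-hours, days-hours:minutes, days-hours:minutes:seconds
slurm_time_to_seconds() {
  local t=$1 days rest h m s
  local -a parts

  if [[ $t == *-* ]]; then
    # days-hours[:minutes[:seconds]]
    days=${t%%-*}
    rest=${t#*-}
    IFS=: read -r h m s <<< "$rest"
    # 10# forces base 10 so fields such as "08" are not parsed as octal
    echo $(( 10#$days * 86400 + 10#${h:-0} * 3600 + 10#${m:-0} * 60 + 10#${s:-0} ))
    return 0
  fi

  IFS=: read -r -a parts <<< "$t"
  case ${#parts[@]} in
    1) echo $(( 10#${parts[0]} * 60 )) ;;                                            # minutes
    2) echo $(( 10#${parts[0]} * 60 + 10#${parts[1]} )) ;;                           # minutes:seconds
    3) echo $(( 10#${parts[0]} * 3600 + 10#${parts[1]} * 60 + 10#${parts[2]} )) ;;   # hours:minutes:seconds
    *) echo "unrecognized time format: $t" >&2; return 1 ;;
  esac
}

# The value used in the example script: 20 minutes
slurm_time_to_seconds 20:00
```

Note that a single number such as 90 is read as minutes (5400 seconds), not seconds, which is an easy mistake when requesting short jobs.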

Using srun

To make the best use of resources, you can pass flags to srun that control how processes are assigned to nodes. Consider the two-node job script below. Its three srun lines launch 2, 8, and 16 MPI ranks, respectively, and the ntasks-per-node flag tells srun to split the ranks evenly between the nodes (e.g., in the 16-rank run, 8 ranks are placed on each node). Depending on the scheduler settings, omitting this flag may produce an uneven distribution (e.g., 10 ranks on one node and 6 on the other).

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --output=/path/to/scratch/myfile.out
#SBATCH --error=/path/to/scratch/myfile.err
#SBATCH --time=20:00 # 20 minutes
#SBATCH --mem=4000
#SBATCH --nodes=2
#SBATCH --ntasks=28
#SBATCH --cpus-per-task=1
#SBATCH --exclusive

module load openmpi

srun --nodes=2 --ntasks-per-node=1 -n2 /path/to/scratch/bin/main
srun --nodes=2 --ntasks-per-node=4 -n8 /path/to/scratch/bin/main
srun --nodes=2 --ntasks-per-node=8 -n16 /path/to/scratch/bin/main
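The per-node count in each srun line above is the total number of ranks divided by the number of nodes, so an even split requires the rank count to be a multiple of the node count. The minimal bash sketch below makes that arithmetic explicit. build_srun_cmd is a hypothetical helper (not part of SLURM); it only prints the command line rather than running it, and it assumes the two-node allocation and executable path from the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): print an srun command line that
# splits a given number of MPI ranks evenly across a fixed number of nodes.
build_srun_cmd() {
  local nodes=$1 ntasks=$2 exe=$3
  # An even split is only possible when the ranks divide by the node count
  if (( ntasks % nodes != 0 )); then
    echo "error: $ntasks ranks cannot be split evenly over $nodes nodes" >&2
    return 1
  fi
  local per_node=$(( ntasks / nodes ))
  echo "srun --nodes=$nodes --ntasks-per-node=$per_node -n$ntasks $exe"
}

# Reproduce the three srun lines from the two-node script above
for n in 2 8 16; do
  build_srun_cmd 2 "$n" /path/to/scratch/bin/main
done
```

Printing instead of executing keeps the sketch runnable outside a SLURM allocation; inside a job script you would call srun directly with the same flags.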

Last updated on Jan 8, 2020