Parallel Jobs example
Parallel Jobs and SLURM
NOTE: This page is out of date.
Sango supports the two common types of parallel programs: multi-threaded and MPI (Message Passing Interface). A multi-threaded job runs several threads on a single node, with the threads sharing the CPU and memory resources of that node. An MPI job runs several ranked, independent processes on one or more nodes, with the processes able to communicate with each other. The fundamental difference is that a multi-threaded job executes within, and can only use the resources of, a single node, whereas an MPI job can execute on and use the resources of an arbitrary number of nodes.
The SLURM scheduler uses CPU and task resources to implement multi-threaded and MPI parallel jobs, respectively. A SLURM multi-threaded job uses a single SLURM task that allocates up to the number of CPUs (cores) available on one compute node, whereas a SLURM MPI job allocates a number of tasks up to the total number of CPUs (cores) available across all compute nodes. In the following we give examples of the computation of an approximation of pi (π), first as a multi-threaded job and then as an MPI job, on the Sango cluster.
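As a minimal sketch of the difference (the values below are placeholders, not recommendations for a real job), the two kinds of resource requests look as follows in a job script:

## multi-threaded job: one task with several CPUs on a single node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

## MPI job: several tasks, possibly spread over several nodes
#SBATCH --ntasks=8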
Multi-thread job script
A multi-threaded Python script is used to compute an approximation of pi by adding up partial sums computed on different SLURM CPUs (cores) from one SLURM task.
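For reference, the partial sums implement the midpoint rule for the integral of 4/(1+x^2) over [0,1], which equals pi; the MPI version below reuses the same integrand:

\pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx \frac{1}{n} \sum_{i=0}^{n-1} \frac{4}{1+x_i^2}, \qquad x_i = \frac{i+0.5}{n}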
1. Python script calpi_mp.py for multi-threaded computation of an approximation of pi
The Python script reads as follows
#!/usr/bin/env python3
import sys, os, math, multiprocessing

## partial computation of the approximation of pi
def compute_rpi(a, b, n):
    h = 1.0 / n
    s = 0.0
    for i in range(a, b):
        x = h * (i + 0.5)
        s += 4.0 / (1.0 + x*x)
    return s * h

## wrapper for the partial computation function
def wrp_rpi(abn):
    (a, b, n) = abn
    return compute_rpi(a, b, n)

## main program
def main_mp(args=None):
    if args is None:
        args = sys.argv
    ## parameters from command arguments
    try:
        nt = int(args[1])  ## number of workers (one per allocated CPU)
        ns = int(args[2])  ## number of subdivisions
    except (IndexError, ValueError):
        sys.stderr.write('Usage: {0} <nt> <ns>\n'.format(args[0]))
        return 1
    ## init table of nt integral ranges
    ssz = ns // nt
    lrngs = [(ii*ssz, (ii+1)*ssz, ns) for ii in range(0, nt-1)]
    lrngs.append(((nt-1)*ssz, ns, ns))
    ## spawn nt worker processes to compute the integral in parallel
    pool = multiprocessing.Pool(processes=nt)
    results = pool.map(wrp_rpi, lrngs)
    cal_pi = math.fsum(results)
    error = abs(cal_pi - math.pi)
    print('pi_mp is approximately {0} error is {1}'.format(cal_pi, error))
    return 0

## ----------------------------------
if __name__ == "__main__":
    sys.exit(main_mp())
2. SLURM job script job_calpi_mp.slurm for submission to the Sango scheduler
The SLURM job script below calls calpi_mp.py to compute the approximation of pi from a summation up to the 1000000000th order using 10 threads. Note that the number of allocated CPUs (cores) is made available through the SLURM variable ${SLURM_CPUS_PER_TASK}.
#!/bin/bash
#SBATCH --job-name=calpi_mp
#SBATCH --partition=compute
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=2G

module load python/3.7.3

NBSUBDIV=1000000000
python3 calpi_mp.py ${SLURM_CPUS_PER_TASK} ${NBSUBDIV} > calpi_value_mp.txt
Save calpi_mp.py and job_calpi_mp.slurm in the same directory and use the command sbatch to submit the computation to the Sango scheduler as follows
$ sbatch job_calpi_mp.slurm
After completion, the computation results can be read from the output file calpi_value_mp.txt
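While the job is running, its state can be checked with squeue, and the result can be displayed once the job has finished (the exact squeue columns depend on the site configuration):

$ squeue -u $USER
$ cat calpi_value_mp.txt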
3. Tips
How to obtain the maximum number of CPUs (cores) available on a node in the compute partition
$ sinfo -p compute -o %c
CPUS
40
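If you also want to see the memory available per node, for example to choose a sensible --mem value, the same command can be extended with additional format specifiers (node name, CPUs, memory in MB); the exact output depends on the cluster configuration:

$ sinfo -p compute -o "%n %c %m"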
MPI job script
An MPI Python script is used to compute an approximation of pi by adding up partial sums computed on different SLURM tasks.
1. SSH key setup requirement
When using MPI on the cluster, the nodes have to communicate with each other through SSH without a password. Follow the steps below to set up your environment before running MPI jobs
1.1. Check whether you have an id_rsa / id_rsa.pub pair of keys
$ ls -lh ~/.ssh
If the output reads like the following two lines, go to step 1.3
-rw-------. 1 user group 1.7K Oct 5 2010 id_rsa
-rw-r--r--. 1 user group 400 Oct 5 2010 id_rsa.pub
1.2. Create a pair of keys using the command ssh-keygen (press Enter at each prompt to make the key passwordless)
$ ssh-keygen
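Alternatively, the same passwordless key can be created non-interactively in one line; this is a standard ssh-keygen invocation and assumes the default key location is still free:

$ ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa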
1.3. Append your own public key to the list of authorized keys
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
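SSH ignores keys whose files are too permissive, so it is worth making sure the permissions are restrictive; optionally, passwordless login can be verified against the local host (the first connection may ask you to confirm the host key):

$ chmod 700 $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ ssh localhost hostname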
2. Python script calpi_MPI.py for MPI computation of the approximation of pi
The Python script reads as follows
#!/usr/bin/env python3
import sys, os, math
from mpi4py import MPI

## partial computation of the approximation of pi
def compute_rpi(a, b, n):
    h = 1.0 / n
    s = 0.0
    for i in range(a, b):
        x = h * (i + 0.5)
        s += 4.0 / (1.0 + x*x)
    return s * h

## main program
def main_mp(args=None):
    comm = MPI.COMM_WORLD
    lrank = comm.Get_rank()
    nprocs = comm.Get_size()  ## number of MPI processes
    if args is None:
        args = sys.argv
    ## parameters from command arguments
    try:
        ns = int(args[1])  ## number of subdivisions
    except (IndexError, ValueError):
        sys.stderr.write('Usage: {0} <ns>\n'.format(args[0]))
        return 1
    ## compute the partial approximation of pi for the range of this rank
    ssz = ns // nprocs
    if lrank != nprocs - 1:
        lval = compute_rpi(lrank*ssz, (lrank+1)*ssz, ns)
    else:
        lval = compute_rpi(lrank*ssz, ns, ns)
    ## sum up the results from all ranks on rank 0
    cal_pi = comm.reduce(lval, op=MPI.SUM, root=0)
    if lrank == 0:
        error = abs(cal_pi - math.pi)
        print('pi_MPI is approximately {0} error is {1}'.format(cal_pi, error))
    return 0

## ----------------------------------
if __name__ == "__main__":
    sys.exit(main_mp())
3. SLURM job script job_calpi_MPI.slurm for submission to the Sango scheduler
The SLURM job script below calls calpi_MPI.py to compute the approximation of pi from a summation up to the 1000000000th order using 10 tasks.
#!/bin/bash
#SBATCH --job-name=calpi_MPI
#SBATCH --mail-user=%u@oist.jp
#SBATCH --ntasks=10
#SBATCH --partition=compute
#SBATCH --mem-per-cpu=1g

module load openmpi.gcc/4.0.3
module load python/3.7.3

NBSUBDIV=1000000000
srun --mpi=pmix python3 calpi_MPI.py ${NBSUBDIV} > calpi_value_MPI.txt
Unlike multi-threaded programs, MPI programs must be launched with the srun --mpi=pmix command.
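As with the multi-thread example, save calpi_MPI.py and job_calpi_MPI.slurm in the same directory, submit the job with sbatch and, after completion, read the result from calpi_value_MPI.txt:

$ sbatch job_calpi_MPI.slurm
$ cat calpi_value_MPI.txt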