Parallel Jobs example

Parallel Jobs and SLURM

NOTE: This page is out of date.

Sango supports the two common types of parallel programs: multi-threaded and MPI (Message Passing Interface). A multi-threaded job runs several threads on a single node, with the threads sharing the CPU and memory resources of that node. An MPI job runs several ranked, independent processes on one or more nodes, with the processes able to communicate with each other. The two models differ fundamentally in scope: a multi-threaded job executes within, and can use the resources of, only a single node, whereas an MPI job can execute on and use the resources of an arbitrary number of nodes.

The SLURM scheduler uses CPU and task resources to implement multi-threaded and MPI parallel jobs, respectively. A SLURM multi-threaded job consists of a single task that allocates up to the number of CPUs (cores) available on one compute node, whereas a SLURM MPI job allocates a number of tasks up to the total number of CPUs (cores) available across all compute nodes. In the following we give examples of the computation of an approximation of pi (π), as a multi-threaded job and as an MPI job, on the tombo cluster.
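Both examples estimate pi by applying the midpoint rule to the integral of 4/(1+x^2) over [0, 1], which is what the compute_rpi function in the scripts below implements; written out for reference:

\pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx h \sum_{i=0}^{n-1} \frac{4}{1+x_i^2}, \qquad x_i = \left(i + \tfrac{1}{2}\right) h, \quad h = \frac{1}{n}

Each thread (or MPI rank) sums a contiguous block of indices i, and the partial results are added together at the end.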

Multi-thread job script

A multi-threaded Python script is used to compute an approximation of pi by adding up partial sums computed on the different SLURM CPUs (cores) of one SLURM task.

1. Python script calpi_mp.py for multi-threaded computation of an approximation of pi

The Python script reads as follows

#!/usr/bin/env python3

import sys, math, multiprocessing

## partial computation of the approximation of pi:
## midpoint-rule sum of 4/(1+x^2) over the index range [a, b)
def compute_rpi(a, b, n):
  h = 1.0 / n
  s = 0.0
  for i in range(a, b):
    x = h * (i + 0.5)
    s += 4.0 / (1.0 + x*x)
  return s * h

## wrapper for the partial computation function
def wrp_rpi(abn):
  (a, b, n) = abn
  return compute_rpi(a, b, n)

## main program
def main_mp(args=None):
  if args is None:
    args = sys.argv

  ## parameters from command arguments
  try:
    nt = int(args[1]) ## number of parallel workers
    ns = int(args[2]) ## number of subdivisions
  except (IndexError, ValueError):
    sys.stderr.write('Usage: {0} <nt> <ns>\n'.format(args[0]))
    return 1

  ## build the table of nt integration ranges
  ssz = ns // nt
  lrngs = [(ii*ssz, (ii+1)*ssz, ns) for ii in range(0, nt-1)]
  lrngs.append(((nt-1)*ssz, ns, ns))

  ## spawn nt workers to compute the partial sums in parallel
  pool = multiprocessing.Pool(processes=nt)
  results = pool.map(wrp_rpi, lrngs)
  pool.close()
  pool.join()
  cal_pi = math.fsum(results)
  error = abs(cal_pi - math.pi)
  print('pi_mp is approximately {0} error is {1}'.format(cal_pi, error))

  return 0

## ----------------------------------
if __name__ == "__main__":
  sys.exit(main_mp())

2. SLURM job script job_calpi_mp.slurm for submission to the Sango scheduler

The SLURM job script below calls calpi_mp.py to compute the approximation of pi from a summation with 1000000000 subdivisions, using 10 threads. Note that the number of allocated CPUs (cores) is made available to the job through the SLURM environment variable ${SLURM_CPUS_PER_TASK}.

#!/bin/bash

#SBATCH --job-name=calpi_mp
#SBATCH --partition=compute
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=2G

module load python/3.7.3

NBSUBDIV=1000000000
python3 calpi_mp.py ${SLURM_CPUS_PER_TASK} ${NBSUBDIV} > calpi_value_mp.txt

Save calpi_mp.py and job_calpi_mp.slurm in the same directory and use the sbatch command to submit the computation to the tombo scheduler as follows

 $ sbatch job_calpi_mp.slurm

After completion, the computation results can be printed from the output file calpi_value_mp.txt.
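For example, assuming the job script above ran to completion (the file name comes from the output redirection in job_calpi_mp.slurm):

 $ cat calpi_value_mp.txt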

3. Tips

How to obtain the maximum number of CPUs (cores) available on a node in the compute partition

 $ sinfo -p compute -o %c
CPUS
40
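
As a possible variant (these are standard sinfo format fields, not specific to this cluster), the memory available per node can be printed alongside the CPU count, which helps when choosing the --mem or --mem-per-cpu values:

 $ sinfo -p compute -o "%c %m"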

MPI job script

An MPI Python script is used to compute an approximation of pi by adding up partial sums computed on different SLURM tasks.

1. SSH key setup requirement

When using MPI on the cluster, the nodes have to communicate with each other through SSH without a password. Follow the steps below to set up your environment before running MPI jobs.

1.1. Check whether you have an id_rsa / id_rsa.pub key pair

$ ls -lh ~/.ssh

If the output reads like the following two lines, go to step 1.3

-rw-------. 1 user group 1.7K Oct  5  2010 id_rsa
-rw-r--r--. 1 user group  400 Oct  5  2010 id_rsa.pub

1.2. Create a pair of keys using the command ssh-keygen (press enter for each prompt to make the key passwordless)

 $ ssh-keygen

1.3. Append your own public key to your list of authorized keys

 $ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
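
If SSH still prompts for a password after this step, a common cause is file permissions: OpenSSH ignores an authorized_keys file that is writable by others. The following is a quick check/fix (a standard OpenSSH requirement, not specific to this cluster):

 $ chmod 700 $HOME/.ssh
 $ chmod 600 $HOME/.ssh/authorized_keys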

2. Python script calpi_MPI.py for MPI computation of the approximation of pi

The Python script reads as follows

#!/usr/bin/env python3

import sys, math
from mpi4py import MPI

## partial computation of the approximation of pi:
## midpoint-rule sum of 4/(1+x^2) over the index range [a, b)
def compute_rpi(a, b, n):
  h = 1.0 / n
  s = 0.0
  for i in range(a, b):
    x = h * (i + 0.5)
    s += 4.0 / (1.0 + x*x)
  return s * h

## main program
def main_mpi(args=None):
  comm = MPI.COMM_WORLD
  lrank = comm.Get_rank()   ## rank of this MPI process
  nprocs = comm.Get_size()  ## number of MPI processes

  if args is None:
    args = sys.argv

  ## parameters from command arguments
  try:
    ns = int(args[1]) ## number of subdivisions
  except (IndexError, ValueError):
    sys.stderr.write('Usage: {0} <ns>\n'.format(args[0]))
    return 1

  ## compute the partial approximation of pi for this rank's range
  ssz = ns // nprocs
  if lrank != nprocs - 1:
    lval = compute_rpi(lrank*ssz, (lrank+1)*ssz, ns)
  else:
    lval = compute_rpi(lrank*ssz, ns, ns)

  ## sum up the results from all ranks on rank 0
  cal_pi = comm.reduce(lval, op=MPI.SUM, root=0)

  if lrank == 0:
    error = abs(cal_pi - math.pi)
    print('pi_MPI is approximately {0} error is {1}'.format(cal_pi, error))

  return 0

## ----------------------------------
if __name__ == "__main__":
  sys.exit(main_mpi())

3. SLURM job script job_calpi_MPI.slurm for submission to the Sango scheduler

The SLURM job script below calls calpi_MPI.py to compute the approximation of pi from a summation with 1000000000 subdivisions, using 10 MPI tasks.

#!/bin/bash

#SBATCH --job-name=calpi_MPI
#SBATCH --mail-user=%u@oist.jp
#SBATCH --ntasks=10
#SBATCH --partition=compute
#SBATCH --mem-per-cpu=1g

module load openmpi.gcc/4.0.3
module load python/3.7.3

NBSUBDIV=1000000000
srun --mpi=pmix python3 calpi_MPI.py ${NBSUBDIV} > calpi_value_MPI.txt

Unlike multi-threaded programs, MPI programs must be launched with the srun --mpi=pmix command.
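
To check which MPI plugin types are supported by the SLURM installation on the cluster (depending on the SLURM version, the pmix plugin may also appear with a version suffix such as pmix_v3), they can be listed with:

 $ srun --mpi=list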