GPGPU computation using Matlab

This tutorial assumes that the user has a good working knowledge of Matlab and already knows how to use the GPU nodes on Saion. It does not cover how to use Matlab through its web interface.

In the following we show how to run a Matlab GPU computation on the Saion cluster using a SLURM job script. Only a subset of applications and algorithms is suitable for speedup via GPU computing. Generally, your program should satisfy the following criteria:

Computationally intensive: the heavy computation can be done on the GPU with little data transfer between CPU and GPU memory.
Massively parallel: a similar task is performed repeatedly on different data.
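One rough way to check whether a workload meets these criteria is to time the same operation on the CPU and on the GPU, along with the cost of moving the data. The following is a minimal sketch, not a benchmark: the matrix size N is a hypothetical value chosen for illustration, and the timings will depend on the node and GPU you are given.

```matlab
% Sketch: compare CPU and GPU timings for one matrix product.
% N is a hypothetical problem size chosen for illustration only.
N = 2048;
A_host = rand(N, 'single');        % matrix in CPU memory
A_gpu  = gpuArray(A_host);         % same data copied to GPU memory

% timeit measures the CPU; gputimeit waits for the GPU to finish
% before stopping the clock.
t_cpu  = timeit(@() A_host * A_host);
t_gpu  = gputimeit(@() A_gpu * A_gpu);

% Round trip: copy host data to the GPU and back again.
t_copy = gputimeit(@() gather(gpuArray(A_host)));

fprintf('CPU multiply:   %.4f s\n', t_cpu);
fprintf('GPU multiply:   %.4f s\n', t_gpu);
fprintf('Copy there+back: %.4f s\n', t_copy);
```

If the copy time is comparable to, or larger than, the time saved by the GPU multiply, the computation is probably too small or too transfer-heavy to benefit.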

We provide a simple example below that sets up two matrices in GPU memory, multiplies them, then copies the result back to CPU memory and displays it.

The Matlab script performing the matrix multiplication (matlab_prog_gpu.m, the name invoked by the SLURM script further down) reads:

gpuDeviceCount

gpuDevice

A = ones(10, 'single', 'gpuArray');
B = 5 .* eye(10, 'single', 'gpuArray');
C = A * B;
C_host = gather(C);

C_host

The command gpuDeviceCount returns the number of GPUs you currently have access to. The SLURM script below requests one node and one GPU, but you can request two GPUs by setting --gres=gpu:2. The command gpuDevice returns the properties of the GPU, including its name, the maximum thread-block and grid sizes it supports, the GPU memory, etc. In the computation part, matrix A is set up with all entries equal to 1, and matrix B is set up as a diagonal matrix with all diagonal entries equal to 5. Both matrices are created directly in GPU memory because 'gpuArray' is specified when they are created, but you can also copy existing data from the CPU to the GPU. Note that matrix C is calculated on the GPU and therefore stays in GPU memory. You have to copy it to CPU memory by calling gather() to display it or to use it in other code that executes on the CPU.
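The two variations mentioned above, inspecting more than one GPU and copying existing CPU data to the GPU, can be sketched as follows. This is an illustrative variant of the tutorial script, not part of it; the variable names A_host, B_host and nGPU are made up for the example. Note that gpuDevice(idx) selects device idx and resets it, so it is called here before any GPU arrays are created.

```matlab
% Sketch: list the GPUs visible to this job, then move existing
% CPU data to the GPU, compute there, and copy the result back.
nGPU = gpuDeviceCount;             % 2 if the job asked for --gres=gpu:2
for idx = 1:nGPU
    d = gpuDevice(idx);            % selects (and resets) device idx
    fprintf('GPU %d: %s, %.1f GB total memory\n', ...
            idx, d.Name, d.TotalMemory / 1e9);
end

% Ordinary arrays that already live in CPU memory.
A_host = ones(10, 'single');
B_host = 5 .* eye(10, 'single');

% gpuArray() copies host data to the currently selected GPU.
A = gpuArray(A_host);
B = gpuArray(B_host);

C = A * B;                         % computed on the GPU
disp(class(C))                     % gpuArray

C_host = gather(C);                % copy the result back to CPU memory
disp(class(C_host))                % single
```

After the loop, the last device listed remains the selected one, so the arrays in this sketch are placed on that GPU.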

Detailed documentation on GPU programming with MATLAB is available from GPU Computing.

The SLURM script (/apps/local/training/samples/matlab_gpu.slurm) used to run the Matlab computation is given below:

#!/bin/bash -l

#SBATCH --job-name=matlab-gpu
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
#SBATCH --input=none
#SBATCH --output=job_%j.out
#SBATCH --error=job_%j.err

module load matlab/R2018b

matlab_exe="matlab -nosplash -nodisplay -nojvm -nodesktop"

${matlab_exe} -r "matlab_prog_gpu, exit"

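A sample run of the code is shown below. Note that the banner in this captured output reports R2015a, whereas the script above loads matlab/R2018b; the output was presumably recorded with an older module.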

 $ sbatch matlab_gpu.slurm
Submitted batch job 879537
 $ cat job_879537.out

                            < M A T L A B (R) >
                  Copyright 1984-2015 The MathWorks, Inc.
                   R2015a (8.5.0.197613) 64-bit (glnxa64)
                             February 12, 2015

 
For online documentation, see http://www.mathworks.com/support
For product information, visit www.mathworks.com.

        Academic License

ans =

     1

ans =

  CUDADevice with properties:

                      Name: 'Tesla K80'
                     Index: 1
         ComputeCapability: '3.7'
            SupportsDouble: 1
             DriverVersion: 7.5000
            ToolkitVersion: 6.5000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.2079e+10
           AvailableMemory: 1.1950e+10
       MultiprocessorCount: 13
              ClockRateKHz: 823500
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

C_host =

     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
