Running Matlab Jobs
We have several versions of Matlab installed on Deigo, from 2009 onwards. However, only the latest installed versions will work well on the cluster.
You can use it interactively or in batch jobs. We have support for parallel execution so you can use multiple cores on a node or, with some programming, several nodes to speed up your jobs.
We are not going to describe how to use Matlab itself. The Matlab documentation on the Mathworks website is a great resource for learning the specifics.
Matlab Options
These are useful command line options for running on the cluster:
| option | meaning |
|---|---|
| `-nosplash` | Don't show the "welcome to Matlab" splash message when Matlab starts. |
| `-nodesktop` | Use the text-mode interface. Much faster than the graphical desktop, and you can still plot and view graphs. |
| `-nodisplay` | Don't allow any graphical display of any kind. Useful for batch jobs. |
| `-nojvm` | Start without Java. This disables some graphics and network-related functions, but reduces startup time and memory consumption. |
In general you'll want to use `-nosplash` and maybe `-nodesktop` for interactive work, and add `-nodisplay` and perhaps `-nojvm` for batch jobs.
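As a quick illustration, a typical interactive session and a typical batch invocation (assuming a script of yours called `myscript.m`) would look like:

$ matlab -nosplash -nodesktop
$ matlab -nosplash -nodisplay -nojvm -r "myscript"

Both patterns appear in context in the sections below.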
We will first run Matlab interactively, with the recommended text interface and with the graphical GUI. Then we'll show how you write a basic Matlab job script so you can start a long-running computation without having to stay logged in. Finally we'll show how you can speed up your Matlab jobs by using multiple cores and nodes on the cluster.
- Run Matlab in interactive mode
- Matlab job scripts
- Access the environment
- Parallel jobs with task arrays
- Parallel code using the Parallel Toolbox
- Parallel code through the Distributed Computing Server
Run Matlab in interactive mode
If you want to plot data or use the graphical interface, you need to be able to forward and view X windows. On Linux and OSX you add the "-X" parameter to your ssh command:
$ ssh -X your-name@deigo.oist.jp
On OSX you will need the XQuartz server installed. On Windows you need an SSH client that can forward X window graphics. MobaXterm is a free SSH client with a built-in X server. PuTTY plus the Xming X server reportedly also works.
You can run Matlab as an interactive job on Deigo and use it like you would on your own computer. Some Matlab functions (such as vector operations and fft) and toolboxes can take advantage of multiple cores, so it's often a good idea to allocate more than one core even if you're just using it interactively.
Log in to the cluster. Load a Matlab module (Use the module system):
$ module load matlab/R2018b
Start an interactive job with the `srun` command:
$ srun -p short -t 1:00:00 --mem=10G -c8 --x11 --pty bash
In this example we ask for 10GB of memory and 8 cores for one hour. `--x11` tells Slurm to forward graphical output to our local machine, and `--pty bash` starts an interactive command line. See Run Computations in the Getting Started page for more on running jobs, and `man srun` for details on the srun command.
Text mode is the best way to run Matlab interactively on the cluster. It's not as pretty as the GUI, but it's much faster over a network, especially from home or over WiFi, and you still have all the functionality of Matlab, including plotting. You will want to disable the graphical desktop and the start-up splash screen. On the compute node, start Matlab as:
$ matlab -nosplash -nodesktop
If you're impatient, or you know you will only use Matlab this session, you can skip the command line and just run Matlab directly as the interactive command:
$ srun -p short -t 1:00:00 --mem=10G -N 1 -c 12 --x11 --pty matlab -nosplash -nodesktop
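One caveat: Matlab may detect all the cores on the node rather than just the ones Slurm allocated to you. If your session seems to use the wrong number of threads, you can set the thread count explicitly from within Matlab; a minimal sketch, assuming you started the job with `-c` as above:

>> maxNumCompThreads(str2num(getenv('SLURM_CPUS_PER_TASK')))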
Run Matlab interactively with the GUI
If you want you can run Matlab with the full GUI. It will be a bit sluggish on the wireless network, and very slow when you connect from outside OIST. To run Matlab with the GUI you start a job just like above, and run Matlab with just the `-nosplash` parameter:
$ module load matlab/R2018b
$ srun -p short -t 1:00:00 --mem=10G -N 1 -c 12 --x11 --pty bash
$ matlab -nosplash
This is the Linux desktop version of Matlab, and it should work the same as the OSX and Windows versions.
Run Matlab job scripts
You can run Matlab scripts as batch jobs. This is the best way to run longer, non-interactive calculations. Let's take a very simple example script and call it `svdprog.m`:
A = rand(msize, msize);   % random msize-by-msize matrix
[U,S,V] = svd(A);         % singular-value decomposition
save(wsname);             % save the whole workspace to the file named in wsname
This script generates a random matrix of size `msize`, calculates the singular-value decomposition, then saves everything to a Matlab `.mat`-type file with the name in `wsname`. The script needs inputs for `msize` and `wsname`.
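You can try this out interactively before writing a job script: set the two input variables by hand, then run the script by name:

>> msize = 15; wsname = 'test.mat';
>> svdprog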
Matlab scripts can't be run as regular programs, and they don't have a mechanism to read input parameters directly. Instead, we run a small script inline that sets the parameter values, then calls our `svdprog.m` script:
#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=mlab_test
#SBATCH --output=mlab_test_%u.out
#SBATCH --time=10:00
#SBATCH --mem=1G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# load matlab. You can see available versions with 'module avail'
module load matlab/R2018b
# matlab program with options for batch processing
mlab_cmd="matlab -nosplash -nodisplay -nojvm -nodesktop"
# Set our matrix size and output file name
mat_size=15
out_name="${SLURM_JOB_NAME}_${SLURM_JOB_ID}.mat"
# Run a small script that sets our variables then calls the real script
${mlab_cmd} -r "msize=${mat_size}; wsname='${out_name}'; svdprog;"
We set "mlab_cmd
" to our Matlab program and options. We set "mat_size
" to our desired matrix size, and "out_name
" to the output file name. Slurm defines "SLURM_JOB_NAME
" and "SLURM_JOB_ID
" for us with the current job name and job ID number.
Before bash runs a line, it replaces "${variable}
" with the value of variable
. "${mlab_cmd}
" is replaced with out Matlab command and options, "${mat_size}
" is replaced with the value "15
" that we set and so on.
In the end, our script actually runs this command line:
matlab -nosplash -nodisplay -nojvm -nodesktop -r "msize=15; wsname='mlab_test_123456.mat'; svdprog;"
The "-r
" option to Matlab tells it to run the following string as a Matlab script. This script sets the variables msize
and wsname
then runs svdprog.m
, like we wanted.
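As an aside, Matlab R2019a and later also offer a `-batch` option designed for exactly this kind of non-interactive use; it runs without the desktop or display and makes Matlab exit with a non-zero status if the script throws an error. The equivalent invocation would look something like this:

matlab -batch "msize=15; wsname='mlab_test_123456.mat'; svdprog"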
Access the environment from Matlab Scripts
It's often useful to access and change the environment in different ways. Slurm sets a number of environment variables with information on the job name, user, number of cores and so forth. You can access them with `getenv` like this:
jobname = getenv('SLURM_JOB_NAME');
jobid = str2num(getenv('SLURM_JOB_ID'));
`getenv` returns the value as a string. If you want the numerical value, use `str2num` to convert it. We will see a number of examples of this further below.
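One thing to watch out for: if a variable is not set (when you test the same script on your own machine, say), `getenv` returns an empty string, and `str2num` then returns an empty matrix. A small sketch of how to guard against that:

cores = str2num(getenv('SLURM_CPUS_PER_TASK'));
if isempty(cores)
    cores = 1;   % fall back to a single core outside Slurm
end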
You can create and run from a temporary directory like this:
scratchdir = strcat('/flash/YourunitU/scratch/',getenv('USER'),'/',getenv('SLURM_JOB_ID'));
unix(['mkdir -p ' scratchdir]);
cd(scratchdir);
This builds a directory path `/flash/YourunitU/scratch/<user name>/<job id>/` (replace "YourunitU" with your real unit directory name), then creates it by calling the shell `mkdir` command directly with the `-p` flag (so it creates all intermediate directories in the path). We call the system version (using the `unix` function) because the Matlab `mkdir` function has a bug. Finally, we change the current directory with `cd`.
Run Matlab jobs in parallel through Slurm job arrays
If you need to do multiple independent matlab computations with different parameters, you can use job arrays to run them in parallel. We will give you a quick example here; please see "Array Batch Jobs" to learn more about Slurm job arrays.
This is our script above, changed to run a range of matrix sizes, with a different output file for each size. The changes are the new `--array` option, the `sizearr` array, and the way we set `mat_size` and `out_name`:
#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=mlab_test
#SBATCH --output=mlab_test_%u.out
#SBATCH --time=10:00
#SBATCH --mem=1G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --array=0-9%5
# load matlab. You can see available versions with 'module avail'
module load matlab/R2018b
# matlab program with options for batch processing
mlab_cmd="matlab -nosplash -nodisplay -nojvm -nodesktop"
# our range of matrix sizes as a bash array
sizearr=( 8 12 15 22 540 400 234 35 17 42 )
# Set our matrix size and output file name
mat_size=${sizearr[${SLURM_ARRAY_TASK_ID}]}
out_name="${SLURM_JOB_NAME}_${SLURM_JOB_ID}_${mat_size}.mat"
# Run matlab on a small inline script
${mlab_cmd} -r "msize=${mat_size}; wsname='${out_name}'; svdprog;"
We add a new sbatch parameter: `--array=0-9%5` tells Slurm this is an array job; that it should run one task for each value between 0 and 9; and that it should run no more than 5 of them at the same time (we ask that you run no more than 50 Matlab jobs at any one time).

Slurm will submit this script as a job once for each array value. The value itself is stored in the `SLURM_ARRAY_TASK_ID` variable, and you can use this to set parameters and file names.
By way of example, we define a bash array `sizearr` with a range of values. We use `SLURM_ARRAY_TASK_ID` to index `sizearr`, then use `mat_size` to set the matrix size and create a unique output file name for each task.
When you start this job, Slurm will run the first 5 tasks (indexes 0-4). As each task finishes, Slurm starts another one to keep 5 running at a time. This is more efficient than trying to submit batches of jobs yourself.
Remember that if you need it, you can get the value of `SLURM_ARRAY_TASK_ID` from inside your Matlab script with `getenv`, as we described earlier:
i=str2num(getenv('SLURM_ARRAY_TASK_ID'));
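Note that array task IDs start at 0 while Matlab indexes from 1, so if you keep the parameter list in Matlab rather than in bash, offset the index. A minimal sketch:

i = str2num(getenv('SLURM_ARRAY_TASK_ID'));
sizearr = [8, 12, 15, 22, 540, 400, 234, 35, 17, 42];
msize = sizearr(i + 1);   % task IDs are 0-based, Matlab arrays are 1-based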
Run parallel Matlab code using the Parallel Toolbox
You can use the Parallel Toolbox to write multi-threaded scripts that use the available cores on a single node. You replace `for` loops with the `parfor` ("parallel for") keyword, which runs the loop iterations in parallel, in a manner similar to that of OpenMP.
Let's rewrite our SVD example to use `parfor`. First, here's our Slurm sbatch script:
#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=mlab_parbox
#SBATCH --output=mlab_parbox_%u.out
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=4G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
# load matlab.
module load matlab/R2018b
# matlab program without "-nojvm", since we need java for this toolbox.
mlab_cmd="matlab -nosplash -nodisplay -nodesktop"
# Run matlab on our script
${mlab_cmd} -r "svdprog_par"
Our new SVD script takes no parameters. We ask for 8 cores and 4G per core, for a total of 32G of memory.
Our Matlab script `svdprog_par.m` looks like this:
% our array of matrix sizes
sizearr = [8, 12, 15, 22, 540, 400, 234, 35, 17, 42];
% get the number of cores and the job name from the environment
cores = str2num(getenv('SLURM_CPUS_PER_TASK'));
jobname = getenv('SLURM_JOB_NAME');
% Create a parallel pool with 'cores' workers
pc = parcluster('local');
poolobj = parpool(pc, cores);
% Parallel loop
parfor j = 1:length(sizearr)
    msize = sizearr(j);
    wsname = strcat(jobname, '_', int2str(msize), '.mat');
    % our SVD code
    A = rand(msize, msize);
    [U,S,V] = svd(A);
    % We need to call 'save' from an external function
    pf_save(wsname, A, U, S, V);
end
% delete the parallel pool
delete(poolobj)
We get the number of available cores from the `SLURM_CPUS_PER_TASK` environment variable. We initialize the parallel toolbox, then create a pool of workers using the `parpool()` function.
`parfor` runs a loop from 1 to the number of elements in our parameter array `sizearr`. But unlike a regular `for` loop, it runs the iterations in parallel on different workers. We create our `msize` and `wsname` parameters for each iteration, then calculate the SVD of a random matrix of that size.
The Matlab `save()` function can't be called directly inside a `parfor` loop. As a workaround, create a script `pf_save.m` with the following, and call it instead:
% define an external save command
function pf_save(dname, A, U, S, V)
    save(dname, 'A', 'U', 'S', 'V');
end
You can't parallelize every loop. The iterations need to be independent: the calculation in one iteration can't depend on the result of an earlier one. Also, `parfor` adds some overhead, so unless you have a fairly heavy calculation inside the loop, you may end up with slower code than without it.
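For example, this cumulative sequence can not become a `parfor` loop, since each iteration reads the previous iteration's result:

x = zeros(1, 10);
x(1) = 1;
for j = 2:10
    % NOT parallelizable: iteration j depends on iteration j-1
    x(j) = 2 * x(j-1);
end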
Use Slurm array jobs and the Parallel Toolbox at the same time
The Parallel Toolbox uses multiple CPU cores in parallel in a single Matlab job. With Slurm arrays you can easily run multiple Matlab jobs in parallel. Sometimes you may want to do both: run multiple Matlab jobs, where each job uses the Parallel Toolbox. Ideally, you would be able to run your Matlab Parallel Toolbox code with Slurm arrays without issue.
Unfortunately there is a problem: the Parallel Toolbox records information about the job in a directory in your home directory. If you start multiple such jobs at the same time, some of them may get confused and overwrite each other's files, which will result in failing jobs.
The solution is to change the directory that it saves the information in, so that each job has its own unique directory. You need to do this in your Matlab source code, right after you call "parcluster()", and before you call "parpool()":
% here we create a new parallel environment
pc = parcluster('local');
% Find out if we are running a cluster job, and get the job ID
slurmjob = getenv('SLURM_JOB_ID');
% do we have a job ID? (getenv returns an empty string if not)
if ~isempty(slurmjob)
    % use the job ID as the name of a new subdirectory, and set that as
    % the location for the Parallel Toolbox to save its information
    cluster_dir = fullfile(pc.JobStorageLocation, slurmjob);
    mkdir(cluster_dir);
    pc.JobStorageLocation = cluster_dir;
end
This snippet of code does a few things:
- We look for a Slurm Job ID.
- If we have one, we're running on the cluster
- Get the old location for Parallel Toolbox save files, then add the Job ID as a subdirectory to that
- Create the new directory
- Set this as the new directory for Parallel Toolbox to use
We need one more thing: when you create a parpool, you must pass the `pc` cluster object above as the first argument:
poolobj = parpool(pc, .......);
That applies the new settings to the parpool so it actually uses the new directory.
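Putting it all together, a job script for this combination might look like the sketch below; the module version and the array range are placeholders, so adjust them to your situation:

#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=mlab_array_par
#SBATCH --output=mlab_array_par_%u_%a.out
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=4G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --array=0-3
module load matlab/R2019a
# svdprog_par.m should contain the JobStorageLocation snippet above
matlab -nosplash -nodisplay -nodesktop -r "svdprog_par"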
Run Matlab code in parallel through the Distributed Computing Server
We can also run distributed computations across multiple nodes directly from Matlab itself, using the Matlab Distributed Computing Server (MDCS). Matlab creates and submits a Slurm job that runs the computation in parallel for you. This is similar in spirit to using MPI for parallel computation.
About Matlab versions
- The most recent versions of Matlab (2019a onwards) have good support for the Slurm scheduler built in. Using the Parallel Toolbox is quite easy. Follow these instructions.
- Versions before 2018 don't have good support for Slurm and need more setup to work. Follow these instructions.
- Versions 2018a and 2018b can't use Parallel Toolbox due to a bug in the scheduler; please use 2019a instead.
Parallel Toolbox for Matlab 2019a onwards
This is the documentation for running the Parallel Toolbox on newer Matlab versions (2019a onwards) with built-in support for the Slurm scheduler. If you are using a version before 2018, please follow these instructions instead.
NOTE: Matlab 2018a and 2018b suffer from a bug. For Parallel Toolbox, please use Matlab 2019a instead.
As Matlab needs to create and submit a new job, we have to run Matlab on a login node. We need to be very careful not to run any computation on the login node itself.
Load Matlab and start it:
$ module load matlab/R2019b
$ matlab -nosplash -nodisplay -nodesktop
Write a job submission script `mlab_mdcs.m` like this:
% MDCS configuration for the Deigo cluster
% Create a temporary directory in /flash/YourunitU/scratch/$USER;
% without this, Matlab would store temporary files in your
% current directory instead.
scratchdir = strcat('/flash/YourunitU/scratch/',getenv('USER'));
unix(['mkdir -p ' scratchdir]);
% create a cluster instance, with our scratchdir as location
cluster = parallel.cluster.Slurm('JobStorageLocation', scratchdir);
% make sure we use the "compute" partition
set(cluster, 'ResourceTemplate', '-p compute --ntasks=^N^ --cpus-per-task=^T^')
%% Number of jobs/tasks to use
numWorkers = 8;
set(cluster, 'NumWorkers', numWorkers);
%% Job submission through the SPMD engine
pjob = createCommunicatingJob(cluster,'Type','spmd');
% This is the code we actually want to run
pjob.AttachedFiles={'colsum.m'};
t=createTask(pjob, @colsum, 1, {}) ;
pjob.NumWorkersRange = [8,8]
submit(pjob)
This creates a temporary directory under `/flash/YourunitU/scratch/$USER/` for the server to put temporary files in. We set up a new instance of the server, with the number of processes we wish to use.
This time we don't use a parallel loop. Instead we use a parallel block (called `SPMD`, for "Single Program, Multiple Data", in Matlab), and call our task in `colsum.m` once for each process.
This is "colsum.m
" :
function total_sum = colsum
    if labindex == 1
        % Send the magic square to the other labs
        A = labBroadcast(1, magic(numlabs));
    else
        % Receive the broadcast on the other labs
        A = labBroadcast(1);
    end
    % Calculate the sum of the column identified by labindex for this lab
    column_sum = sum(A(:,labindex));
    % Calculate the total sum by combining the column sums from all labs
    total_sum = gplus(column_sum);
This code calculates the sum of all elements in a matrix (a magic square) in parallel. The code is called once per process, with the process rank in `labindex` and the total number of processes in `numlabs`. Process 1 creates a magic square and sends it to all processes using `labBroadcast`. Each process calculates the sum of one column, then `gplus()` sums (reduces) the column values from all processes and distributes the final value to `total_sum` on every process.
You run this code interactively from Matlab:
>> mlab_mdcs
...
%% you can wait for the task to finish
>> wait(t)
%% Get the final output
>> t.OutputArguments
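Once you have retrieved the output, you can remove the finished job and its files from the storage location with `delete`:

>> delete(pjob)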
Please see the Mathworks documentation for much more on the parallel computing toolbox.
Parallel Toolbox for Matlab before 2018
This is the documentation for running the Parallel Toolbox on older Matlab versions without direct support for the Slurm scheduler. If you are using Matlab 2019a onwards, please follow these instructions instead.
Matlab needs to create and submit a new job, so we have to run our Matlab code on a login node. We need to be very careful not to run any computation on the login node itself.
Load Matlab and start it:
$ module load matlab/R2015b
$ matlab -nosplash -nodisplay -nodesktop
Write a job submission script `mlab_mdcs.m` like this:
% MDCS configuration for the Deigo cluster
slurmpath = strcat(getenv('MATLAB_ROOT_DIR'), '/toolbox/distcomp/examples/integration/slurm/shared');
addpath(genpath(slurmpath));
% Create a temporary directory in /flash/YourunitU/scratch/$USER
scratchdir = strcat('/flash/YourunitU/scratch/',getenv('USER'));
unix(['mkdir -p ' scratchdir]);
% create a cluster instance, with our scratchdir as location
cluster = parallel.cluster.Generic('JobStorageLocation', scratchdir);
set(cluster, 'HasSharedFilesystem', true);
set(cluster, 'RequiresMathWorksHostedLicensing', false);
set(cluster, 'ClusterMatlabRoot', getenv('MATLAB_ROOT_DIR'));
set(cluster, 'OperatingSystem', 'unix');
set(cluster, 'IndependentSubmitFcn', @independentSubmitFcn);
set(cluster, 'CommunicatingSubmitFcn', @communicatingSubmitFcn);
set(cluster, 'GetJobStateFcn', @getJobStateFcn);
set(cluster, 'DeleteJobFcn', @deleteJobFcn);
%% Number of jobs/tasks to use
numWorkers = 8;
set(cluster, 'NumWorkers', numWorkers);
%% Job submission through the SPMD engine
pjob = createCommunicatingJob(cluster,'Type','spmd');
% This is the code we actually want to run
pjob.AttachedFiles={'colsum.m'};
t=createTask(pjob, @colsum, 1, {}) ;
pjob.NumWorkersRange = [8,8]
submit(pjob)
This adds the Slurm integration code for the distributed server, then creates a temporary directory under `/flash/YourunitU/scratch/$USER/` for the server. We set up a new instance of the server, with the number of processes we wish to use.
This time we don't use a parallel loop. Instead we use a parallel block (called `SPMD`, for "Single Program, Multiple Data", in Matlab), and call our task in `colsum.m` once for each process.
This is "colsum.m
" :
function total_sum = colsum
    if labindex == 1
        % Send the magic square to the other labs
        A = labBroadcast(1, magic(numlabs));
    else
        % Receive the broadcast on the other labs
        A = labBroadcast(1);
    end
    % Calculate the sum of the column identified by labindex for this lab
    column_sum = sum(A(:,labindex));
    % Calculate the total sum by combining the column sums from all labs
    total_sum = gplus(column_sum);
This code calculates the sum of all elements in a matrix (a magic square) in parallel. The code is called once per process, with the process rank in `labindex` and the total number of processes in `numlabs`. Process 1 creates a magic square and sends it to all processes using `labBroadcast`. Each process calculates the sum of one column, then `gplus()` sums (reduces) the column values from all processes and distributes the final value to `total_sum` on every process.
You run this code interactively from Matlab:
>> mlab_mdcs
...
%% you can wait for the task to finish
>> wait(t)
%% Get the final output
>> t.OutputArguments
Please see the Mathworks documentation for much more on the parallel computing toolbox.