# The Restricted Saion Partitions

The "gpu", "powernv" and "kannel" partitions are restricted partitions meant for general GPU or high-core computation. The GPU partitions are often used for deep learning, and the kannel partition for highly parallel physics simulations. Apply for access here.

The Saion system is set up slightly differently from the main cluster. Please read the Saion introduction here for the best way to organise your computations.

## "gpu"

This partition has 16 GPU nodes in total. Each node has 36 CPU cores and 512GB of main memory. 8 nodes have four NVIDIA Tesla P100 GPUs with 16GB of GPU memory each, and the other 8 nodes have four NVIDIA V100 GPUs, also with 16GB each.
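If you want to see which GPU type a given node carries, you can ask SLURM directly. A sketch (the exact node names and GRES strings depend on the site configuration):

```shell
# List the nodes in the "gpu" partition together with their GRES (GPU) configuration.
$ sinfo -p gpu -N -o "%N %G"

# Show full details, including the GRES line, for one node (node name is a placeholder).
$ scontrol show node <node-name>
```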

The "gpu" partition will give you the next free node, irrespective of GPU type. You can specify the GPU type if you want (see below), but in practice most jobs won't see any significant difference.

The GPU-GPU interconnect on the nodes has very high bandwidth, as does the interconnect between GPU nodes for multi-node GPU jobs. The CPU-GPU connection is comparatively slow, so you should do as much computation on the GPU as possible for each piece of data you transfer to it.

Your allocation on these partitions is 36 CPU cores and 4 GPUs in total. The maximum job run time is 7 days.
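As a sketch, a batch script that stays within those limits might start like this (the job name, memory request and application name are placeholders, not site defaults):

```shell
#!/bin/bash -l
#SBATCH --partition=gpu
#SBATCH --time=7-0           # up to 7 days
#SBATCH --cpus-per-task=36   # at most 36 cores in total
#SBATCH --gres=gpu:4         # at most 4 GPUs in total
#SBATCH --mem=500G           # illustrative; the nodes have 512GB
#SBATCH --job-name=my-gpu-job

srun ./my_gpu_application
```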

You should build your application on a GPU compute node. The login nodes use a different CPU model and have no GPU-related libraries, so you can't build your application there.

CUDA 11.1, 11.3, 10.1, 10.0 and 8, CuDNN 8 and 6, and nccl2 are available as modules. Please use these. We may install more modules on request, but we expect many users will want to build their own applications.

To build an application you need to log in to a GPU node. The login nodes do not have the CUDA and other libraries that you typically need. To start an interactive job, try this:

```shell
$ srun -t 0-4 -c 8 -p gpu --mem=32G --gres=gpu:1 --pty bash -l
```

This gives you 32G memory, 8 cores and one GPU for 4 hours. If you only intend to compile the code and not test it, you can of course refrain from asking for a GPU at all. That way you can still use a GPU node for building your code even if all GPUs are in use by somebody else.

To request a specific GPU type, add the type to the gres request:

```shell
$ srun ... --gres=gpu:V100:1 ...
```

But as noted above, most jobs won't see any notable difference. In fact, if you specify a single type of GPU you're limiting yourself to only those nodes with that GPU, and if those happen to be busy you will end up waiting far longer than you would have by just taking the first available GPU.

## powernv

The powernv partition is an 8-node IBM Power system. All nodes have 512GB memory and 4 NVIDIA GPUs. All interconnects, both GPU-GPU and CPU-GPU, are very fast, making this system especially useful for data-intensive GPU tasks.

| Nodes | CPU | Cores | GPU |
|-------|-----------------|-------|-----------|
| 2 | 2 × Power 9 | 160 | 4 × V100 |
| 2 | 2 × Power 8 | 128 | 4 × P100 |
| 4 | 2 × Power 8 | 160 | 4 × P100 |

You can use all cores and GPUs for up to 7 days. We do ask you to not allocate more resources than you actually need.

Also, you need to allocate cores in multiples of 8. The Power cores are grouped in units of 8 within the CPU and must be allocated as a unit.
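For example, a request within those rules might look like this (times and memory are illustrative):

```shell
# 16 cores (a multiple of 8) and one GPU on the powernv partition for one day.
$ srun -p powernv -c 16 --gres=gpu:1 --mem=64G -t 1-0 --pty bash -l

# A request for 12 cores would not work, as 12 is not a multiple of 8.
```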

### Building and Running your code

The Power CPUs are not Intel-compatible. You need to rebuild your code on a Power node in order to run it on the powernv system. As the nodes differ somewhat in hardware and software, we recommend that you use the saion-power03 node to build your code. That node is the lowest common denominator, and code built there is likely to run well on all the nodes.

To build your code, run an interactive job on the power03 node:

```shell
$ srun -p powernv --nodelist=saion-power03 --gres=gpu:1 -t 0-4 --mem=32G -c 16 --pty bash -l
```

This gives you 32G memory, 16 cores and one GPU for 4 hours on the power03 node.

This partition does not share code or modules with the other partitions on the Saion system. To make use of the system you always need to add the "-l" option to bash. "-l" stands for "login" and makes bash re-read and update all configuration when the job starts. With interactive jobs, always start a "bash -l" shell on the node rather than starting your application directly. For batch jobs, make sure your first line reads:

```shell
#!/bin/bash -l
#SBATCH .....
....
```

### IBM-supplied deep-learning software

We have a full IBM-supplied suite of deep learning software installed on the powernv system, by way of an Anaconda Python installation. The "python/3.7.6" module contains a number of deep-learning related packages that have been rebuilt and optimised for the Power system. Load it with:

```shell
$ module load python/3.7.6
```

Software and libraries include PyTorch, TensorFlow, Caffe and Caffe2, as well as OpenCV, Keras, Pandas, Numba and many other support libraries.

The "pytorch/1.5" module is a partial installation of the PowerAI distribution, with a separately installed and updated version of Pytorch and related software.
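As a quick sanity check after loading the module, you can print the installed version. A sketch (the module name is from above; the one-liner only confirms the import works):

```shell
$ module load pytorch/1.5
$ python -c 'import torch; print(torch.__version__)'
```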

### Other Software

The "openmpi.gcc/4.0.3" module contains an OpenMPI installation built for the Power nodes. However, as the system version and hardware differ between nodes, this MPI version is by default set to use only the ethernet interface. This may have an impact on speed; however, these nodes are primarily meant for GPU computing, so it is not a major issue.

If you want to use it, make sure to load this module last, after any other modules. The IBM PowerAI modules (python and pytorch) contain libraries that will interfere with MPI if they are loaded after this one.
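In other words, a working load order might look like this (a sketch using the module names mentioned above):

```shell
# Load the PowerAI modules first...
$ module load python/3.7.6
# ...and OpenMPI last, so its libraries take precedence.
$ module load openmpi.gcc/4.0.3
```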

For other software, you will need to build and install it yourself on these nodes. This can sometimes be very non-trivial; if you are looking to get started quickly we suggest you use the "gpu" partition instead. We are happy to give you advice and help if you get stuck, so don't hesitate to contact us if you want.

## "kannel"

The kannel partition has 16 KNL nodes, each equipped with an Intel Xeon Phi 7230 with 64 cores, four threads per core and 128GB memory. The operating system on the kannel nodes is CentOS 7.3, the same as we use on Sango but different from the login nodes. The KNL is fully x86-64 compatible, so you should be able to build and run most code unchanged on these systems.

Each KNL core is slower than a normal Intel CPU core, but there are more of them; they can run vectorized code efficiently; and they have a fast on-board interconnect to other KNL nodes. An application that scales well should be at least as fast on one KNL node (with a single KNL CPU) as it is on one Sango node (with two Xeons). An application optimized for KNL can be significantly faster.

The nodes also have a very fast network connecting the KNL CPUs directly to each other, so multi-node applications will run faster here than they do across Sango nodes.

We have some software packages installed through the module system, and we welcome requests for more software that you may want to use on kannel.

The KNL nodes are x86-64 compatible, which means that most general Linux code built for Intel CPUs will run on them unchanged. But that will not take full advantage of the features of the KNL system. For that you need to rebuild your code specifically for the KNL, and you should ideally use the Intel compiler on a kannel node for best results.

To build an application you would do these steps:

```shell
$ srun -p kannel -t 0-8 --pty bash
$ module load intel/2017_update1
```

Then build your application using icc. We suggest these options when building for KNL:
```
C:       -O3 -xMIC-AVX512 -fma -align -finline-functions
C++:     -std=c++11 -O3 -xMIC-AVX512 -fma -align -finline-functions
Fortran: -O3 -xMIC-AVX512 -fma -align array64byte -finline-functions
```

If you use GCC, try these options for a start:

```
gcc:      -march=knl -O3 -mavx512f -mavx512pf -mavx512er -mavx512cd -mfma -malign-data=cacheline -finline-functions
g++:      -std=c++11 -march=knl -O3 -mavx512f -mavx512pf -mavx512er -mavx512cd -mfma -malign-data=cacheline -finline-functions
gfortran: -O3 -march=knl -mavx512f -mavx512pf -mavx512er -mavx512cd -mfma -malign-data=cacheline -finline-functions
```

Keep in mind that the GCC installed on kannel is quite old and will not generate the best code for the KNL nodes.

The AVX 512-related options generate code to take advantage of the 512 bit vector processing instructions on KNL when applicable. Good use of vectorization is essential for performance on KNL, so if you are developing your own code it may be well worth your time to make sure your code is really being vectorized.
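One way to check is to ask the compiler for a vectorization report. A sketch (`mycode.c` is a placeholder, and the exact report flags depend on the compiler version):

```shell
# Intel compiler: write a detailed vectorization report alongside the object file.
$ icc -O3 -xMIC-AVX512 -qopt-report=5 -qopt-report-phase=vec -c mycode.c

# GCC: print which loops were (and were not) vectorized.
$ gcc -march=knl -O3 -fopt-info-vec -fopt-info-vec-missed -c mycode.c
```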

## "amd" and "prio-gpu"

This partition consists of 4 nodes with dual x86 AMD CPUs, each node with 64 full cores and 512GB memory, plus one GPU node with 512GB memory and four NVIDIA P100 GPUs with 16GB each.

These systems are reserved for high-throughput data processing applications such as Relion (used for post-processing of cryo-microscopy data). If you think you may have a need for this, contact us and we can discuss it further.
