The Open Saion partitions
The "test-gpu" and "intel" partitions on Saion are available to any user with an HPC account. You can use these partitions for developing and testing code; experimenting with GPU computations or with high core-count nodes; or simply to run short jobs with less waiting time.
These partitions are not well suited for long-running computations. If you need hours or days for your job, you are probably better off using the regular "compute" partition, or you may want to apply for access to a restricted resource that will give you plenty of compute time and resources. Please read the descriptions below before you decide to use them.
The Saion system is set up slightly differently from the main cluster. Please read the Saion introduction here for the best way to organise your computations.
The "test-gpu" partition is freely available. It gives you easy access to modern GPU nodes without applying for restricted resources.
This partition is running on hardware that is normally used for specific high-priority tasks. If such a task is started, the jobs running on the "test-gpu" partition will be stopped, then restarted once the high-priority tasks are done. As your jobs can be interrupted at any time it is not suitable for important long-running jobs, unless the job can restart where it left off.
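One way to make a job robust against such interruptions is to save its progress to a checkpoint file and resume from that file when the job is restarted. A minimal sketch in shell (the loop, step count, and file name are made up for illustration; a real job would do its actual work inside the loop):

```shell
#!/bin/bash
# Restartable-loop sketch: progress is recorded after every step,
# so a requeued job resumes where the previous run stopped.
CKPT=checkpoint.txt

# Resume from the last completed step, or start at 0.
start=0
[ -f "$CKPT" ] && start=$(cat "$CKPT")

for ((step = start; step < 100; step++)); do
    # ... one unit of real work would go here ...
    echo "$((step + 1))" > "$CKPT"   # record the number of completed steps
done

echo "done at step $(cat "$CKPT")"
```

If this job is stopped partway through, the next run reads the checkpoint file and skips the steps that were already done.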
This partition is best used for short computations, including interactive use and compiling and testing code. Your allocation is maximum 18 cores and 2 GPUs.
"test-gpu" is a set of 6 nodes with 4 NVIDIA P100 GPUs each. The layout and usage notes are the same as for the restricted Saion "gpu" partition. To take advantage of these nodes, you generally have to start a job on one of the nodes and build your software there. The login nodes don't have the drivers or libraries needed to build GPU software.
Building your code
You should build your application on a GPU compute node. The login nodes use a different CPU model and have no GPU-related libraries, so you can't build your application there.
We have CUDA 8, 10.0, 10.2, 11.1 and 11.3; and CuDNN 6.0 and 8.8 available as modules.
To build an application, start an interactive job on a GPU node, for example:
$ srun -t 0-1 -c 8 -p test-gpu --mem=32G --gres=gpu:1 --pty bash -l
This gives you 32G memory, 8 cores and one GPU for 1 hour. If you only intend to compile the code and not test it, you can of course refrain from asking for a GPU at all. That way you can still use a GPU node for building your code even if all GPUs are in use by somebody else.
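The same build-and-test workflow can of course run as a batch job. A minimal sketch, assuming a CUDA source file called saxpy.cu (a placeholder for your own source; the module name follows the CUDA 11.3 version listed above, but check "module avail" on the node for the exact names installed):

```shell
#!/bin/bash
# Hypothetical build-and-test job on the "test-gpu" partition.
#SBATCH -p test-gpu
#SBATCH -t 0-1              # 1 hour
#SBATCH -c 8
#SBATCH --mem=32G
#SBATCH --gres=gpu:1        # drop this line if you only want to compile

module load cuda/11.3       # one of the CUDA versions listed above
nvcc -O2 -o saxpy saxpy.cu  # saxpy.cu stands in for your own source file
./saxpy                     # quick smoke test on the allocated GPU
```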
"intel" is a set of 4 dual Intel Xeon nodes, each with 40 cores and 512GB memory. We ask you to use no more than 120G of memory per job.
The "intel" partition is gang scheduled. Your job will start immediately, but if too many other jobs are also running on the partition, the jobs will take turns running on the system. The time slice is 10 seconds, meaning your job runs for 10 seconds at a time, before it is suspended in favour of another job.
This system is good for shorter computations, and especially good when you want to quickly test a program. You might simply want to see if your program runs at all, or you want to run it for a few minutes to see that it does what it's supposed to do. On the "intel" system you don't need to wait for your turn; your jobs will usually start immediately.
The frequent interruptions mean that the "intel" partition is not suitable for interactive use. Also, if the node runs out of memory it cannot accept more jobs, and any further jobs you submit will have to wait until memory is freed. This is why we ask you not to use more than 120G per job on the "intel" partition.
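For batch use, a job script for this partition might look like the following sketch; my_program is a placeholder for your own application, and the resource requests simply illustrate the limits described above:

```shell
#!/bin/bash
# Hypothetical short test job on the "intel" partition.
#SBATCH -p intel
#SBATCH -t 0-1         # short jobs suit this partition best
#SBATCH -c 40          # up to one full node
#SBATCH --mem=120G     # stay within the per-job memory limit

./my_program           # placeholder for your own application
```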
Next Section: The restricted Saion partitions.