Large Memory Partitions

Some computations, such as many bioinformatics workflows, are limited by memory rather than processing power. We provide large memory partitions for these kinds of tasks.

System      Memory            Nodes     Max resources   Max (recommended) time
largemem    500GB / 750GB     48 / 14   5 nodes         inf (3 weeks)
bigmem      2987GB / 1500GB   1 / 1     8 cores         inf (3 weeks)

Access

The large memory partitions are available only after applying for access. Please apply here. We will ask you to come to the Open Hours to show us what you want to do, and we will give you a quick guide on how to best use the partition for your particular jobs.

You select these partitions with the "-p" or "--partition" parameter. For an interactive job asking for 250GB of memory for one hour, you would do:

$ srun -t 0-1 --mem=250G -p largemem .....

In a Slurm script, add the lines:

#SBATCH --partition=largemem
#SBATCH --mem=250G
#SBATCH --time=0-1

For bigmem use, replace "largemem" with "bigmem" above, and change the requested memory as needed.
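
Put together, a minimal largemem batch script might look like this (the program name and input file are placeholders; substitute your own command):

#!/bin/bash
#SBATCH --partition=largemem
#SBATCH --mem=250G
#SBATCH --time=0-1

# Placeholder command; replace with your actual analysis
./my_analysis input.dat

Submit it with "sbatch" as usual.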

Largemem

"largemem" has two sets of nodes: 48 nodes with 500GB allocatable memory each, and 14 nodes with 750GB memory.

Slurm will allocate appropriate nodes for you. If you ask for more than 500GB you will be allocated one of the larger nodes, but this restricts your job to a smaller subset of nodes, and you may have to wait longer before it gets to run.
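
For example, a memory request like the one below can only be satisfied by the fourteen 750GB nodes (the figure is only an illustration):

#SBATCH --partition=largemem
#SBATCH --mem=600G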

Resource limits

Largemem is a limited resource, and some users completely depend on it being available for their computing work. It is important that you not use too many resources and block others from using it.

These are the resource limits on Largemem:

  • Use no more than 5 nodes worth of resources in total
    • max 5 * 500GB memory
    • max 5 * 40 cores
       
  • Do not run jobs that need less than 500GB of memory and less than a week of running time.
    • If they need less than 500GB, run them on the "compute" or "short" partition.
    • If you need longer running time on "compute", please see this page.
       
  • Make sure you are not allocating more memory than you actually use
    • We monitor usage on Largemem, and may stop jobs that are grossly overallocating resources.

We do not set a hard limit on the number of cores or amount of memory that you can allocate. You are responsible for staying within these limits, and we may stop your jobs if you exceed them.
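
One way to check that your running jobs stay within these limits is to list them with "squeue"; the format string below is just one possible selection of columns (job ID, allocated CPUs, requested memory, and elapsed time):

$ squeue -u $USER -p largemem -t RUNNING -o "%.10i %.6C %.10m %.12M"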

We have an easy way to measure your memory use; please see the section on using "ruse" below.

Time Limits

There is no time limit on your jobs, but it is not a good idea to run them for more than about 2-3 weeks. Software may crash, power may be cut, the network may fail, or computers may break. If we need to do maintenance, we do not take long-running jobs into account before shutting down nodes.

The risk that your job will fail increases rapidly beyond about 2 weeks. If you have very long-running tasks, either try to split them into several jobs, or make sure your software can periodically save partial results and continue from there if or when it fails.
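
As a sketch of the second approach, a job script can queue a new attempt if the work did not finish, provided your program can save its state and resume from it. The program name, checkpoint directory, marker file, and script name below are all hypothetical:

#!/bin/bash
#SBATCH --partition=largemem
#SBATCH --mem=500G
#SBATCH --time=14-0

# Hypothetical program that writes checkpoints and resumes from the newest one
./my_analysis --checkpoint-dir=checkpoints/

# "analysis_done" is a marker file the program is assumed to create when finished;
# if it is missing, submit another run that continues from the last checkpoint.
if [ ! -f analysis_done ]; then
    sbatch my_largemem_job.sh
fi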

Bigmem

"bigmem" is one node with 3TB memory in total, and one node with 1.5TB in total. It is reserved only for jobs that truly can't run on any other system. In order to get access, you need to show that you have tried and failed to get your work done on a 750GB largemem node or equivalent.

Limits:

  • You may use only up to 8 cores.

  • You must ask only for the amount of memory you will actually use (with a safety margin).

  • Any jobs that can use 750GB or less must be run on other partitions.

  • You are strongly encouraged to keep your job time limit below 3 weeks.
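
As an illustration, a bigmem request that stays within these limits could look like this (the memory figure is just an example; ask for what you have measured, plus a margin):

#SBATCH --partition=bigmem
#SBATCH --cpus-per-task=8
#SBATCH --mem=1200G
#SBATCH --time=14-0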

We actively monitor the use of Bigmem, and we may remove jobs that allocate excessive amounts of memory, too many cores, or that don't need Bigmem resources. Please see Using Ruse below for a way to measure your memory use.

We ask you to limit the running time for your own sake. Jobs that exceed a couple of weeks are increasingly unlikely to finish running without encountering software bugs, network or hardware faults or other problems. We do not take long-running jobs into account when we need to shut down a system for maintenance.

Use Ruse to measure your memory use

We have an easy way to measure the actual memory and runtime that your applications used.

The "ruse" (for "Resource USE") utility will measure your applications memory use over time, and write the maximum amount of memory you used, the amount of CPU you needed and the time it took.

Load the 'ruse' module:

$ module load ruse

Then add "ruse" to the front of the command you run for your calculations. For exemple, if you are running "BUSCO", a genomics assessment tool, you would normally run it like:

$ busco -c 16 -i assembly.fasta -l insecta_odb10 -m genome -o amazing_result

To see how much memory and cores you're actually using, add "ruse" to the front:

$ ruse busco -c 16 -i assembly.fasta -l insecta_odb10 -m genome -o amazing_result
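
The same works inside a batch script; a sketch (with placeholder resource requests) could be:

#!/bin/bash
#SBATCH --partition=largemem
#SBATCH --mem=250G
#SBATCH --time=0-12

module load ruse
ruse busco -c 16 -i assembly.fasta -l insecta_odb10 -m genome -o amazing_result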

Ruse will wake up once every 10 seconds, measure how much memory and how many cores your application is currently using, then go back to sleep. At the end of the run it will create a small file named like "busco_123456.ruse" (the program name and the process ID) that tells you the peak memory that you actually used, as well as the total runtime:

Time:           01:33:56
Memory:         18.1 GB
Cores:          16
Total_procs:    78
Active_procs:   47
Proc(%): 85.6  57.9  54.3  52.1  50.2  48.0  46.1  43.5  41.3  38.7  36.1  33.1    [...]

Use those figures to guide you in selecting the right amount of memory and running time.
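
For the example output above, the job clearly does not need a large memory node at all. A request along these lines on the "compute" partition would be more appropriate (the exact margins are a matter of judgment):

#SBATCH --partition=compute
#SBATCH --cpus-per-task=16
# Peak memory use was 18.1 GB, so 24G leaves a reasonable margin
#SBATCH --mem=24G
# The run took about one and a half hours; ask for three to be safe
#SBATCH --time=0-3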

Ruse is very light-weight, and has no impact on your running time or memory use. You can add it in front of each command you run, and use it in your regular analysis jobs.

Please take a look at the Ruse documentation page for more examples and all the details.