Install Software and Create Modules

You can install any software you need for your own use on the cluster. And by using modules you can install software for your entire unit, so you don't all need to do it separately. We will give you some pointers on building software on the clusters below.

We install all scientific software as modules on the OIST clusters. You can use any software simply by loading the module. We will show you how you can create your own modules for your software, that any unit member can use in the same way. You can also create separate modules for different versions of the software.

Finally, we will dive a little deeper into software installation and development with a few pointers on how to improve performance.

Also, remember that you can contact us or come to our Open Hours if you ever need any help or support!

We will cover the following three topics:

  1. Install software on the clusters;
  2. Create a module file;
  3. Use numerical libraries and improve performance.
     

Install your software

First, please refer to the pages for Deigo and Saion for system-specific notes on installing software.

In particular, you need to build software on a compute node, and select the appropriate node for the software you are building: an AMD node on Deigo for general software, or a GPU node on Saion for GPU-accelerated applications.

For Deigo, code compiled on an AMD node will work fine everywhere. If you compile on an Intel node it might not work properly on the AMD nodes. So unless you intend to use the Intel nodes specifically, you should build software on the AMD nodes.

Here's an example of an interactive job for building software:

$ srun -t 0-2 -p short -C epyc -c 16 --mem=8G --pty bash

This command will start a 2-hour job on the "short" partition, with 16 cores and 8GB memory. The "-C epyc" option tells Slurm to select a node with tag "epyc" — these are the AMD nodes. If you want to use the Intel nodes, use "-C xeon" instead.

You can install software into your own home directory. This is convenient if you are the only user. But all units also have a directory /apps/unit/UnitU/ where you can install software for yourself or for your unit.

In your unit you can organize this area any way you want, but we recommend that you organize your software something like this:

/apps/unit/UnitU/software-name/version/
/apps/unit/UnitU/user-name/software-name/version/

If you are installing, say, program Foobar version 1.15 for everyone in your unit then you would install it into /apps/unit/UnitU/foobar/1.15/. With one directory per package and one subdirectory per version you can have multiple versions installed. You can upgrade to the newest release, while other users can stay with an older, well-tested version.

If you're installing just for yourself, you would install into /apps/unit/UnitU/your-name/foobar/1.15/. This keeps it separate from other members' software, and after you've left OIST, your colleagues know this particular software is no longer needed.
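As a minimal sketch, the layout above could be created like this. The variable names (unit_base, pkg, ver) are our own for illustration, and unit_base defaults to a temporary directory so you can try the sketch safely; on the cluster it would be /apps/unit/UnitU.

```shell
# Base directory for unit software. Hypothetical default: a temporary
# directory; on the cluster you would set UNIT_BASE=/apps/unit/UnitU
unit_base="${UNIT_BASE:-$(mktemp -d)}"

pkg="foobar"    # package name, lower case
ver="1.15"      # package version

# Shared install for the whole unit: one directory per package,
# one subdirectory per version
shared_prefix="$unit_base/$pkg/$ver"

# Personal install, kept under your own user name
personal_prefix="$unit_base/${USER:-your-name}/$pkg/$ver"

mkdir -p "$shared_prefix" "$personal_prefix"
echo "shared:   $shared_prefix"
echo "personal: $personal_prefix"
```

Installing version 1.16 later only means changing ver; the 1.15 directory stays untouched for anyone who still depends on it.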
 

Building software

The general flow for building a new piece of software is:

  1. Find the source repository (often GitHub or similar) online;
  2. Take a quick look at the "readme" for specific information;
  3. Download the version you need;
  4. Unpack into a temporary directory, perhaps in your home;
  5. Check the "README" and/or "INSTALL" files for building instructions;
  6. Configure and build the software according to the instructions;
  7. Install into /apps/unit/ or into your home;
  8. Once it works, you may want to clean up or delete the source code.

We can't give you definite instructions — there are many ways to package and build code, and each piece of software is different. But we can give you some pointers to common issues.

Separate the build directory and the installation directory

Never install the software into the same directory as the source code. This can break the software, and it makes it impossible for you to clean up the source code once you've finished installing it.

The exception is when the instructions explicitly tell you to do this (as we said above, every piece of software is different).

There is always a way to install into a directory of your choice

Most software and most instructions assume that you are installing on your own personal computer, and that you can install it into one of the system directories under /usr such as /usr/local. However, on the clusters you don't have permission to do so, and indeed, we do not install anything there either.

Instead you need to install either into your home or into /apps/unit . There is always a way to do this, even if the instructions don't tell you how.

Binary software or software written in Java can often simply be copied to the target directory. You may need to set a few environment variables so the system can find the program (we'll talk about that below), and occasionally the software may need a few extra settings as well, but the documentation will tell you about that.

If you build the software from source you can usually set the target directory when you build the package. Read the documentation ("install.txt" or "readme.txt" for instance), and check the application website for details. We look at a few common cases below:


Automake: Many programs use GNU Automake to configure, build and install. The instructions usually tell you to do something similar to:

./configure
make
make install

With Automake you can almost always give the target directory as a --prefix parameter to configure:

./configure --prefix=/apps/unit/UnitU/foobar/1.15

Other than that, follow the installation instructions.

As a tip, it can sometimes be convenient to create a build/ subdirectory and build from there:

mkdir build
cd build
../configure --prefix= ...
make
make install

This way, all generated files and build configuration changes happen inside the build directory. The actual source tree is completely untouched. If you fail to build and want to try again, all you need to do is delete everything in build/ and start over, without having to clean out and unpack the whole thing from the beginning.

CMake: CMake is another popular build system. You specify the target directory with CMAKE_INSTALL_PREFIX when you run cmake:

cmake -DCMAKE_INSTALL_PREFIX=/apps/unit/UnitU/foobar/1.15 .

As with Automake above, you can usually do the entire build process in a build/ subdirectory:

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/apps/unit/UnitU/foobar/1.15 ..

 

Python: Don’t use the default system Python to install or run python programs. It is meant for system administration and is not optimized. Instead, load a python module like this:

$ module load python/3.7.3

Always use Python 3 (we no longer support Python 2) and, just to be safe, specify the version when you run it: use “python3” or “pip3” instead of just “python” and “pip”.

You can install Conda-Forge for yourself if you want the "conda" package tool. We’re hesitant to recommend it, as it breaks easily. You need to be very careful never to mix conda packages with anything else. Always use conda environments for anything you do.

Installing python programs

We now recommend that you always use python environments when installing python packages for yourself. An environment is a self-contained area where the packages you install can’t affect any other software outside of it. As an aside, Anaconda environments are exactly the same thing under a different name.

Create an environment

A python environment is a directory where everything gets installed, together with scripts for activating it (making the software available) and deactivating it. You create an environment like this:

$ virtualenv --system-site-packages /apps/unit/MyunitU/mysoftware

This creates a new environment called mysoftware in the directory /apps/unit/MyunitU. You can create an environment anywhere, including your home directory. If you don’t give a path it gets created in your current directory.

You activate and deactivate an environment by running “source” or “.” on the scripts in the bin/ directory inside:

# activate the environment "mysoftware"
$ source /apps/unit/MyunitU/mysoftware/bin/activate

# to deactivate, run the 'deactivate' command
$ deactivate

Once activated you can use pip3 to install any python package, and it will be installed in the environment. Once installed the environment needs to be activated every time you log in and want to use the software.

As an example, here are the steps to install pytorch:

# load a python module
$ module load python/3.7.3

# create a "pytorch" environment in the current directory
# and activate it
$ virtualenv --system-site-packages pytorch
$ source pytorch/bin/activate

# install pytorch (the package is called just "torch")
$ pip3 install torch

...
# Next time we log in, we need to activate it again
$ source pytorch/bin/activate

If you need a specific version of a package, add the version like this:

pip3 install packagename==version 

If you want to ignore an already installed system-wide package, use the "-I" option:

pip3 install -I packagename==version 

You can see all available versions by deliberately not giving a valid version:

pip3 install packagename==

It will throw an error, but also give you a list of all available versions.

Using “--user” or “--prefix” to install Python packages

We used to recommend using the “--user” and “--prefix” options to install python packages. For various reasons we now recommend that you use environments as we described above. Python environments will isolate your different installations from each other and make sure you don’t break any existing software whenever you install something new.

Nevertheless, there are still times you may want to do it the “old” way, so here are brief instructions on doing that:

With pip you can use a “--prefix” parameter to tell it where to install a package:

pip3 install --prefix=/apps/unit/UnitU/foobar/1.15 packagename

After installation you need to make sure that Python can find the packages. You set the PYTHONPATH environment variable. If the package included a program to run you also want to set PATH to point to the bin/ directory:

# So python can find the package
export PYTHONPATH="/apps/unit/UnitU/foobar/1.15/lib/python3.7/site-packages/:$PYTHONPATH"

# so you can find and run installed programs
export PATH="/apps/unit/UnitU/foobar/1.15/bin:$PATH"

Note that it points into a specific subdirectory lib/python<python version>/site-packages under your installation directory. Make sure the python version (“3.7”) matches the one you used to install the package.
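One way to avoid a version mismatch is to ask python3 itself for its major and minor version and build the path from that. This is only a sketch: the variable names are ours, it assumes the lib/python<version>/site-packages layout shown above, and the fallback to "3.7" is there only so the sketch still runs where python3 is not available.

```shell
prefix="/apps/unit/UnitU/foobar/1.15"   # installation directory from the text

# Ask the python3 that is currently loaded for its "major.minor" version
if command -v python3 >/dev/null 2>&1; then
    pyver="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
else
    pyver="3.7"    # fallback: the version used in the examples above
fi

site_dir="$prefix/lib/python$pyver/site-packages"

# Prepend to PYTHONPATH; the ${VAR:+...} form avoids a trailing ":"
# when PYTHONPATH was empty or unset
export PYTHONPATH="$site_dir${PYTHONPATH:+:$PYTHONPATH}"
export PATH="$prefix/bin:$PATH"

echo "Python will look for packages in: $site_dir"
```

Run this with the same python module loaded that you used for the pip3 install, otherwise the derived version will not match the directory that pip3 created.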

If you just want to install into your home directory you can use the "--user" option:

pip3 install --user packagename 

This will install it into the hidden .local/ subdirectory in your home. Python will already look for packages there so you need to do nothing to help it find the package. You may still need to add .local/bin to your PATH as above if the package includes an executable program.

PATH and LD_LIBRARY_PATH

When you try to run a program on the command line, the system has a list of directories that it searches for the program. This list is stored in the environment variable called PATH. It contains a list of directories, separated with colons, and when you try to run something, the shell will visit each in turn until it finds an executable file with the same name. You can print the list like this:

$ echo $PATH
/opt/shared/slurm/bin:/opt/shared/slurm/sbin:/usr/lib64/qt-3.3/bin:/home/j/jan-moren/perl5/bin:
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/opt/shared/local/bin:
/apps/local/bin:/home/j/jan-moren/bin

The "$" in front of PATH tells the shell to replace "$PATH" with the contents of PATH, then "echo" prints it on screen.

To make your newly installed application available as a command that you can run, you need to add the directory where the binary is installed to the PATH variable. If you installed it under, say /apps/unit/UnitU/foobar/1.15 there will often be a bin/ directory with the actual runnable program in /apps/unit/UnitU/foobar/1.15/bin . You want to add this directory to PATH. On the command line, do:

export PATH="/apps/unit/UnitU/foobar/1.15/bin:$PATH"

We set PATH to be our new directory, followed by a colon ":", followed by $PATH which (as above) expands to the existing contents of PATH. The end result is that our new directory gets added to the front of the list in PATH.

In the same way there is a LD_LIBRARY_PATH variable with a list of directories to search for libraries. If your program complains about missing libraries and you have a lib/ or lib64/ directory next to the bin/ directory, add that lib/ directory to LD_LIBRARY_PATH in the same way you added bin/ to PATH above.

Finally, as we explained above, for Python programs you may need to add the installation directory to PYTHONPATH in the same way as well.
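The following sketch shows the search order in PATH one directory per line, and adds a directory to the front of a path variable only if it is not already listed. The helper name prepend_once is our own invention, not a standard command; the foobar directories are the example paths from the text.

```shell
# Show the directories in PATH, one per line, in the order they are searched
printf '%s\n' "$PATH" | tr ':' '\n'

# Prepend a directory to a colon-separated list unless it is already there.
#   $1 = directory to add, $2 = current value of the list
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s' "$2" ;;             # already present: unchanged
        *)        printf '%s' "$1${2:+:$2}" ;;    # add to the front
    esac
}

PATH="$(prepend_once /apps/unit/UnitU/foobar/1.15/bin "$PATH")"
LD_LIBRARY_PATH="$(prepend_once /apps/unit/UnitU/foobar/1.15/lib "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

Running the last three lines several times leaves the variables unchanged after the first time, which keeps them from growing with duplicates if you put such lines in a script.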
 

General advice

  • Start from a clean slate. Do "module purge", then load only the modules you need. Build the software, and write down all the steps you take, the compilation options you use, modules you need, and environment variables you set. Then make sure you can run the software from the new location, and again note anything you need to do.
     
  • For complex software it can be a good idea to create a shell script that does the compilation and installation. That will be your documentation for how to install the software, and it's very handy if you need to install a new version later, or you need to show somebody else how to do it. It can be very difficult to remember all the details months later.
     
  • If your software package is big and complicated, don't try to build a highly optimized version at once. First make just a plain, default application without any special options so you are sure that you can build and install it. Once you can build and run it, you can start improving the installation.
     
  • Once you're done and the program runs, it's often a good idea to remove the build directory and perhaps also the source tree. These files can often take a lot of space, better used for research data.
     
  • And never forget that you can get help from us at the daily Open Hours.
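An install script along the lines suggested above might look like the sketch below. It assumes an Automake-style package for the hypothetical Foobar 1.15, run from the top of the unpacked source tree; the module to load and the configure options will differ for your software. The run helper and the DRY_RUN variable are our own additions: by default the script only prints each step, so you can review it before running it for real with DRY_RUN=0.

```shell
#!/bin/bash
# Sketch of an install script for the hypothetical "foobar" 1.15.
# It doubles as documentation of how the software was built.
set -e                                   # stop at the first failing step

dry_run="${DRY_RUN:-1}"                  # 1 = only print the commands
prefix="${PREFIX:-/apps/unit/UnitU/foobar/1.15}"

# Print a command, and execute it unless we are doing a dry run
run() {
    echo "+ $*"
    if [ "$dry_run" != "1" ]; then
        "$@"
    fi
}

# Start from a clean slate and load only what the build needs
run module purge
run module load python/3.7.3

# Build in a separate build/ directory, install into the prefix
run mkdir -p build
run cd build
run ../configure --prefix="$prefix"
run make
run make install
```

Keeping the script next to your notes means that installing a new version is mostly a matter of changing the prefix and running it again.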
     

Create a module file

We use the “Lmod” module system on our clusters. Lmod can use module files written both in a language called “Lua” and in a language called “TCL”. They’re interchangeable; we describe how to write TCL module files here as that’s the format we’ve been using at OIST for many years.

You’ve installed the application in your unit directory, and made sure that it runs. Now let’s create a module file. You want to put all module files for your unit into one directory:

/apps/unit/UnitU/.modulefiles/

Note the leading dot; that makes this a hidden directory. It doesn’t have to be hidden if you don’t want, but it makes it a little neater. You can see it with the '-a' parameter to ls:

$ ls -a /apps/unit/UnitU/

If the modulefile directory doesn’t already exist then create it. Also make it accessible for everybody in your unit — your other group members should also be able to install software modules here after all:

$ mkdir /apps/unit/UnitU/.modulefiles 
$ chmod g+rwx /apps/unit/UnitU/.modulefiles

If another unit member has created the directory but you can’t write to it then you need to ask them to set group permissions as above.

The module file should be named after the version of the package, and it should be in a subdirectory of the same name as the package. So if we are installing software called “foobar” version 1.15, it should be installed into /apps/unit/UnitU/foobar/1.15/ and the module file should be named “1.15” in the subdirectory “foobar”:

/apps/unit/UnitU/.modulefiles/foobar/1.15

The module file itself is fairly simple. If you follow the naming convention above you can use the template below and only change a few things. Let’s look at the template:

#%Module1.0##################################################################
set appname    [lrange [split [module-info name] {/}] 0 0]
set appversion [lrange [split [module-info name] {/}] 1 1]
set apphome    /apps/unit/UnitU/$appname/$appversion

## URL of application homepage:
set appurl     https://foobar.com

## Short description of package:
module-whatis  "Foobar does things. It's great! And wonderful! Also good."

## Load any needed modules:
module load python/3.7.3

## Modify as needed, removing any variables not needed.
## Non-path variables can be set with "setenv VARIABLE value"
prepend-path    PATH            $apphome/bin
prepend-path    LD_LIBRARY_PATH $apphome/lib

Adapt the module file

This module file is already mostly complete, so you only need to edit a few things. The first is the “set apphome …” line near the top. Replace “UnitU” with the directory for your unit software. The “appname” and “appversion” values are taken from the module name, which follows the directory layout, so you do not need to edit them.
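To illustrate what the first lines of the template do, here is a rough shell equivalent. The module file itself is written in TCL, so this is only an illustration of the logic, not something to put in a module file. The variable modname is our stand-in for the value of [module-info name] when somebody loads "foobar/1.15".

```shell
modname="foobar/1.15"          # stand-in for [module-info name]

appname="${modname%%/*}"       # everything before the first "/"
appversion="${modname#*/}"     # everything after the first "/"
apphome="/apps/unit/UnitU/$appname/$appversion"

echo "name:    $appname"
echo "version: $appversion"
echo "home:    $apphome"
```

This is why the naming convention matters: if the module file is stored as foobar/1.15 and the software is installed in the matching directory, apphome points to the right place without any further editing.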

Fill in the appurl and module-whatis lines. Set appurl to the homepage for the software. Give the module a good one- or two-line description with module-whatis. That will help other users quickly find out what the software is and where to find more information about it.

In the module load line you add any modules that this software depends on. Please note that modules you used to build the software are not always needed to run it. If you used a particular compiler version to build an application, for instance, then it probably doesn’t need to load the compiler when you run it. It’s better to have fewer module dependencies, so try to avoid loading unnecessary modules.

Next you specify any paths and any environment variables your software needs. You can add entries at the front of path variables with prepend-path, and add to the end with append-path. You can set other environment variables directly with setenv SOME_VARIABLE value. These should all be the same that you used when you ran the application manually.

The paths you need to set depend completely on your package. Here are some common path settings:

prepend-path    PATH            $apphome/bin
prepend-path    LD_LIBRARY_PATH $apphome/lib
prepend-path    LD_LIBRARY_PATH $apphome/lib64
prepend-path    MANPATH         $apphome/share/man

If your module is a library, you sometimes also need to set the paths below so that other applications can find your library:

prepend-path    CPATH           $apphome/include
prepend-path    FPATH           $apphome/include
prepend-path    LIBRARY_PATH    $apphome/lib
prepend-path    PKG_CONFIG_PATH $apphome/lib/pkgconfig
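If you are unsure which of these lines apply, a small helper can list the ones whose directories actually exist in your installation. This is a sketch only: suggest_paths is a hypothetical helper of our own, it covers just the common directories mentioned above, and its output is a starting point to review rather than a finished module file.

```shell
# Print a prepend-path line for each common subdirectory that exists.
#   $1 = installation directory (what the module file calls $apphome)
suggest_paths() {
    while read -r sub var; do
        if [ -d "$1/$sub" ]; then
            # \$apphome is printed literally, ready to paste into the module file
            printf 'prepend-path    %-15s $apphome/%s\n' "$var" "$sub"
        fi
    done <<'EOF'
bin PATH
lib LD_LIBRARY_PATH
lib64 LD_LIBRARY_PATH
share/man MANPATH
include CPATH
lib/pkgconfig PKG_CONFIG_PATH
EOF
}

# Try it on a throwaway directory that has only bin/ and lib/
demo_home="$(mktemp -d)"
mkdir -p "$demo_home/bin" "$demo_home/lib"
suggest_paths "$demo_home"
```

On the cluster you would call it with the real installation directory, for instance suggest_paths /apps/unit/UnitU/foobar/1.15, and copy the lines you need into the module file.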

Python Modules

If you want to make a module from software installed into a python environment, you can’t run “source” on the activation script. Instead you set a variable VIRTUAL_ENV to the environment home directory:

setenv          VIRTUAL_ENV     $apphome

If you used “--prefix” or “--user” to install a python package, add the “site-packages” directory under the installation directory to the PYTHONPATH variable like this:

prepend-path    PYTHONPATH      $apphome/lib/python3.7/site-packages

Note that the path includes the major and minor python version; change it as needed.

The module system is very flexible, and you can do quite a lot with it if you need to. You can read all about Lmod on its home page here: Lmod Documentation

Use your new module

To use your new module, you first need to tell the module system where your unit-specific files are. You can tell the module system where to look for module files with the ‘use’ command:

$ module use /apps/unit/UnitU/.modulefiles

You can add that to your $HOME/.bashrc configuration file so you don’t need to type it every time you log in. Now you can load your new module:

$ module load foobar/1.15

When your unit members want to use the modules they only need to do the module use and module load commands as above to access the software.
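If you script the .bashrc change, it is worth making sure the line is only added once. The sketch below uses a hypothetical helper, add_line_once, and works on a temporary file as a stand-in for $HOME/.bashrc so it is safe to try; point it at your real .bashrc once you are happy with it.

```shell
# Append a line to a file unless the exact same line is already in it.
#   $1 = file, $2 = line to add
add_line_once() {
    touch "$1"
    if ! grep -qxF -- "$2" "$1"; then
        printf '%s\n' "$2" >> "$1"
    fi
}

bashrc="$(mktemp)"    # stand-in for "$HOME/.bashrc"

# Calling it twice still leaves only one copy of the line
add_line_once "$bashrc" 'module use /apps/unit/UnitU/.modulefiles'
add_line_once "$bashrc" 'module use /apps/unit/UnitU/.modulefiles'
cat "$bashrc"
```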

If you think your software could be useful also for members outside your unit, you can ask us to add it to the system-wide installation. We will be more than happy to add it for you. And if you have an installation script and working module file then that makes it all the easier for us to add it for you.

Advanced Installation

Here are some notes on slightly more advanced topics related to building software for the clusters. A lot of this is from other pages, collected here for reference.

On optimisation

By far the most important thing to keep in mind when trying to improve code speed is to always measure everything. Compilers and CPUs are exceedingly complex systems, and intuition and reason are very unreliable guides for what will be faster. Measure before you make a change, then measure again afterwards. And measure multiple times, under different conditions; a single data point is very unreliable.

If you feel this would be very tedious and time consuming you are completely right. Unless you stand to gain a massive amount of time from improving execution speed you should probably not spend the effort to do so.
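A very simple way to get more than one data point is to run the same command several times and print the wall-clock time of each run. The helper below, time_runs, is a hypothetical sketch of ours; it assumes the GNU date command found on Linux (for the %N nanosecond format), and it measures only elapsed time, not CPU time or memory.

```shell
# Run a command several times and print the elapsed time of each run.
#   $1 = number of repetitions, remaining arguments = the command
time_runs() {
    runs="$1"
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start="$(date +%s%N)"            # nanoseconds since the epoch
        "$@" >/dev/null 2>&1
        end="$(date +%s%N)"
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}

# Example: time a short command three times
time_runs 3 sleep 0.1
```

Run it once with the old build and once with the new one, on the same type of node, and compare the spread of the numbers rather than a single value.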
 

Compilers and Numerical libraries

The choice of compiler can make a fair bit of difference in speed; it can sometimes also be important for being able to build an application at all.

On Deigo, GCC version 8.3 is the system default compiler. However, we have GCC version 9.1.1 available as a module (and may have newer versions still when you read this). It has several performance improvements, and we recommend using it if you can. All Deigo modules have been built using this version. We will make newer versions available as they are released.

As an alternative, we also have the Intel compilers (in the "intel-modules" area) and the AMD AOCC compiler (in "amd-modules"). Both can generate code for either Intel or AMD systems.

In our recent experience the Intel Fortran compiler still generates clearly better code than GCC, while for C and C++ the better compiler depends on the code you're building. The AMD AOCC compiler generates good numerical code (for AMD systems especially) but still lags somewhat compared to the other two in general.

If you don't know which to use, and if the installation instructions don't specify a preference, go with GCC 9. GCC is in many ways the de-facto standard and will be least likely to give you problems.
 

We have three numerical libraries available: OpenBLAS, AOCL and MKL.

OpenBLAS

OpenBLAS is a modern, high-performance implementation of BLAS and LAPACK. Version 0.3.3 is installed system-wide, and 0.3.9 is available as a module. The module is quite a lot faster. OpenBLAS is sometimes the fastest numerical library for low-level operations, but some higher-level operations can be slow.

If you're unsure what to use, and you don't want to spend a lot of time on this, pick the OpenBLAS module.

AOCL

AOCL is AMD's optimized numerical libraries (they work fine on both Intel and AMD CPUs). They are split into two modules: "aocl.gcc" and "aocl.aocc", for the GCC compiler and AMD's AOCC compiler respectively.

AOCL contains quite a few libraries, including Blis (for BLAS); libFlame (for LAPACK); Scalapack; a high-performance replacement to libm; an AMD optimized version of FFTW and so on. Please refer to the AMD AOCL web page for details on the libraries.

Blis has single-threaded and multi-threaded implementations. To use the single-threaded version, link with "libblis" or "blis", and link with "libblis-mt" or "blis-mt" for the multithreaded version. We recommend the multithreaded version for best performance. For more information, see the Blis multithreading documentation.

Blis can be very high performance; on AMD it is the fastest library for large matrix multiplications. The higher-level libFlame is not as far along yet, and many high-level operations such as singular-value decomposition are still very slow and unoptimized.

Consider using AOCL if you are mostly doing low-level matrix operations, especially if you are using the AOCC compiler.

MKL

The Intel MKL library is a popular numerical library for x86 computers. It is not open source - you need a license to use it, so if you depend on this for your own software you may be unable to use it when you leave OIST.

Traditionally it has been the fastest library available. Today OpenBLAS and Blis can be as fast or faster for many lower-level operations. While not always the absolute fastest, MKL is still consistently good, making it a good choice for applications that do a very wide range of numerical operations. We build and link Python's NumPy and SciPy libraries with MKL for this reason.

MKL and AMD CPUs

The Intel MKL library has always been very slow on non-Intel CPUs. The library checks the CPU maker and if it's not Intel it deliberately chooses the slowest, unoptimized version of its math functions.

We can get around this by explicitly telling the library which type of CPU it should assume that we are using. If we set the environment variable MKL_DEBUG_CPU_TYPE to "5" the library will run very well on our AMD CPUs. But you need to leave this unset for our Intel CPUs.

You do not need to worry about this. Don't set this yourself. We set and unset this environment variable as needed on the different compute nodes, giving you the best speed everywhere.

You should use the Intel MKL library if you are using high-level functions or if you are using the Intel compiler suite; and if you are not worried about being able to use your code without having an Intel software license.

MPI Applications

Our current newest version of OpenMPI is 4.0.3. We strongly advise you to use the newest version of OpenMPI (or the newest Intel MPI), as they have the best support for our high-speed network.

You may have applications that were built against the old OpenMPI 1.10. This version, while still available as a module, is ancient and has no support for our current network hardware. It will run in a slow generic compatibility mode if it works at all.

  • Your MPI program may fail to work with the 1.10.1 library, due to timing issues.
  • Your MPI program may work, but will be quite slow and not scale well.

For this reason we strongly recommend that you rebuild your application using the newest available OpenMPI version on the cluster.

For many applications it should be as simple as loading the new 4.0.3 version of OpenMPI instead of 1.10.1, then do a clean rebuild.

Missing symbols

In rare cases you may get an error about missing or deprecated symbols when updating to a newer OpenMPI version. This may happen when you are rebuilding a very old application that was never updated for more recent MPI standards.

  • If you can, we suggest you find a newer version of your application and use that. Something so old that it still uses the MPI 1 symbols will likely have other issues as well, so a newer version will give you a better experience.

  • If you can't update or if there is no new version available, you need to edit the source code to fix these issues. It is not actually very difficult - you mostly just need to replace the old names with the new. You can find a detailed guide here.