Advanced Bash - More about the Command Line


Our introduction to the Linux command line gives you enough information to run commands interactively. But Bash can do a lot more. Some of the more advanced ways to use the cluster and the Slurm scheduler need more Bash features than we cover in the introduction.

Here we will present a grab-bag of Bash features that are useful when you go beyond the basics of using the cluster. It is by no means a complete tour of Bash features; we focus on the things that are particularly useful on our clusters.

Also please don’t feel that you have to read this now if you are still a beginner. Save this for when you actually feel you need it, or if you are curious and want to know more.

We will cover the following subjects:

  • Bash variables
  • Parameter expansion
  • Add input parameters to your script
  • Run a command and capture the output
  • Bash arrays
  • Commands, loops and if statements
  • Math in Bash

Bash variables

We briefly covered the basics of variables in the introduction. Let’s review it again.

Bash can use variables to hold values. Bash is text-oriented, so usually the values are strings.

Here is an example that sets a variable to “World!” then prints (using the “echo” command) “hello World!” to the screen (feel free to try this on a command line - the “$” at the beginning of the line shows this is a command you can type):

# Set "my_var" to the value "World!" 
$ my_var="World!"

# Echo "Hello", followed by the value of "my_var"
$ echo "Hello" $my_var
Hello World!

# You can enclose the variable name in curly brackets
$ echo "Hello" ${my_var}
Hello World!

When we set the variable we use just the name, then an equals sign, then the value, with no spaces around the equals sign. If you add spaces, Bash sees the line as a command with arguments rather than an assignment, and it fails (try it and see what happens).

To use the variable value, we say “$variable” or “${variable}”. That is, we set a “$” sign in front, and optionally enclose the name in curly brackets “{}”.

This is called substitution - the variable, with dollar sign and all, is substituted with its value at that point in the script. It’s also sometimes called parameter expansion, which generally means the same thing.

So, with this line

echo "Hello" $my_var

“$my_var” is first substituted with its value, and the line becomes

echo "Hello" World!

then Bash runs the line.

As we saw above, you can say either “$my_var” or “${my_var}” to use the variable value:

echo $my_var        # This is the same as below, but easier to write
echo ${my_var}      # this is the same as above, but more correct

Why would you use the curly brackets? Because sometimes you want to insert a value into the middle of some text, and the brackets help Bash understand what part is the variable:

$ name="Janne"

# This works: Bash understands "name" is a variable inside '${...}'
$ echo "This is ${name}s lunch. Please don't touch."
This is Jannes lunch. Please don't touch.

# This fails: Bash thinks the variable after the dollar sign is "names"
$ echo "This is $names lunch. Please don't touch."
This is  lunch. Please don't touch.

Bash understands that in the expression “${name}s” we want to insert the value of name before the “s”. Without the curly brackets, the expression becomes “$names”, so Bash thinks the variable we want to use is “names”, which of course doesn’t exist, so it prints out nothing. Or worse, “names” does exist, and you get nonsense.
 

Parameter expansion

When you do variable substitution - when you insert the value of a variable into a place in your script - you can change or manipulate the output at the same time. This can be very powerful and very useful.

There are a lot of things you can do, but we will cover only the most useful ones. Also note that you always need to use curly brackets with your substitution for this to work.

As a first example, we can get the number of characters of a variable.

$ myvar="bipp bopp"
$ echo ${#myvar}        # '#' at the start gets the number of characters
9

This shows us the main idea. In the curly brackets, we have not just the variable name, but special other characters that manipulate the value in some way. In this case, adding “#” to the beginning of the name gives us the number of characters in the value, rather than the value itself.

Let’s see two other things you can do: remove the beginning or end of a string value, and search and replace a pattern in the string.
 

remove the beginning or end of a string value

You can remove part of the beginning or end of a value. This is common and very useful. For instance, if you have a filename such as “mypicture.jpg”, you can remove the “.jpg” part and get just the base filename.

You select what to remove by adding “#” or “##” to the end of the variable name to remove text in the beginning; or “%” or “%%” to remove text at the end. After the characters comes what you want to remove.

“#” and “%” remove as little as possible. “##” and “%%” remove as much as possible.

Here’s some examples using “#” and “##” that remove parts from the beginning:

# let's remove leading 0's 
$ val="000123"
$ echo ${val#000}       # remove exactly three '0' from the beginning
123
                        # "*" means any number of any characters   
$ echo ${val#*0}        # remove anything, then a zero, as little as possible
00123

$ echo ${val##*0}       # remove anything, then a zero, as much as possible
123
                        # "?" means any single character
$ echo ${val#???}       # remove three characters, any character
123

You can use simple pattern matching. You can match any explicit character (such as “000” to match exactly three zeros), you can use “?” to match any single character, and you can use “*” to match any number (including zero) of any character. You can do a lot more with pattern matching; see this Linux Journal article for more in-depth examples.

Let’s look at a practical example. You want to convert an image file from one format to another, and you want to keep the same base file name, but you want to change the file ending:

$ filename="mypicture.jpg"

# remove '.jpg' at the end using "`%.jpg`", 
# add '.png' to the end instead, store in outname
$ outname=${filename%.jpg}.png

# 'convert' is an image processing utility
$ convert ${filename} ${outname}

That looks like a roundabout way to convert a single file. But if you have hundreds of files to convert, you can use this as part of a script that converts them all automatically.
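As a sketch of that idea (using a for loop, which we cover further down this page), you could print the convert command for every .jpg file, and drop the echo once the output looks right:

```shell
# print the convert command for every .jpg file in the
# current directory; remove 'echo' to actually convert them
for filename in *.jpg
do
    echo convert "$filename" "${filename%.jpg}.png"
done
```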

Here is how you can use this, combined with capturing the output of a command (see the next section) for a script that submits a Slurm job on the cluster, then grabs the Job ID from the sbatch output:

# This stores the output of running "sbatch" in out
out=$(sbatch my_job.slurm)

# 'out' is now "Submitted batch job 12345"

# remove exactly the text up to the job ID - note the space at the end
jobid=${out#Submitted batch job }

# another way: remove everything from beginning up to the last space
jobid=${out##* }


search and replace any substring

Let’s say you have a whole list of filenames like this: “mydata_987_input_123.file”. For a given file, you want to generate an equivalent output file by changing the “input” part to “output”. This is one way:

# An example filename
$ inputfile="mydata_987_input_123.file"

# search for "input", replace with "output", save in outfile
$ outfile=${inputfile/input/output}

$ echo $outfile
mydata_987_output_123.file

After the variable name, you add a forward slash (“/”), a pattern that you search for (just the string “input” in this case), another forward slash, then the replacement text.

This replaces the first match. If you want to replace all matches in the string, begin with two slashes (“//”) instead of one. You can also use “*” and “?” to match any sequence and any one character.
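For example (the variable and values here are made up):

```shell
s="a_b_c"

echo ${s/_/-}       # replace the first match: a-b_c
echo ${s//_/-}      # replace all matches: a-b-c
```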
 

Add input parameters to your script

It can be very useful to add parameters to your script. Parameters are the options, the input file names and so on that you give commands on the command line. With input parameters, you can make a script much more general; you may want to give a file or directory name as a parameter instead of hardcoding it into the script, for instance.

Bash defines several special variables that are useful for reading parameters. There are a lot more special variables than these, however; please look up the Bash reference documentation if you are curious.

  • $1-$9 - the first nine parameters given to our bash script, in order (use ${10} and up for more)
  • $@ - all parameters in a single variable
  • $# - the number of parameters we got
  • $0 - the name of our script

$1 holds the value of our first parameter (if any). $2 holds the value of the second parameter, and so on.

$@ holds all parameters as a single string, and is useful when you just want to pass the parameters right on to another program.

$# holds the number of parameters. It is useful to check if the user (you) remembered to give all the necessary parameters or not.

$0 holds the name of our script.
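Here is a minimal sketch that just prints these special variables (the script name “params.sh” is made up):

```shell
#!/bin/bash
# print the special parameter variables
echo "script name:      $0"
echo "parameter count:  $#"
echo "first parameter:  $1"
echo "all parameters:   $@"
```

If you save this as “params.sh” and run “bash params.sh one two”, it prints the script name, “2”, “one” and “one two”.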
 

Let’s look at a real-world example: make a program easier to run. Some applications are just a little messy to actually use. The “Trimmomatic” application, for instance, asks you to run it like:

$ java -jar /path/to/trimmomatic-0.39.jar <all your options and inputs>

Typing that every time is clearly no fun at all.

We can create a small script that we call “trimmomatic” that actually runs Trimmomatic for us, and passes through all the options and inputs that we give the script. Here is the actual real script we have for the Trimmomatic module on Deigo:

#!/bin/bash
java -jar /apps/free81/Trimmomatic/0.39/lib/trimmomatic-0.39.jar "$@"

We added this script to the Trimmomatic bin/ directory (as trimmomatic) and made it executable with the command:

$ chmod a+x bin/trimmomatic

For (a)ll users, we add (+) e(x)ecute permission. As an aside, the “#!/bin/bash” line tells Linux to use the /bin/bash program to run this file.

$@ contains all options and parameters we give the script, and "$@" passes them on to the real Trimmomatic program exactly as if we had run it directly. The quotes are important, by the way.

Now you can run:

$ trimmomatic <options and input files>

and never realize you’re really running a small shell script that runs the real program in turn. You can also add default parameters and other things into your script if you want to make it more advanced.
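For example, one common Bash idiom for a default parameter is the “${1:-value}” form of parameter expansion: it gives you the first parameter if one was given, and “value” otherwise. (This wrapper is a hypothetical sketch, not the actual Deigo script.)

```shell
#!/bin/bash
# use the first parameter as the input directory,
# or fall back to the current directory "."
indir=${1:-.}
echo "Using input directory: $indir"
```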
 

Here’s a second example that uses the input parameters directly:

Let’s say you have a Slurm script that creates video from png images in a directory. Over time you’ll probably want to do this over and over again, so you don’t want to manually edit the directory name in the script every time you use it.

Instead you can make the directory a parameter to your script. You then run it something like this:

$ sbatch video.slurm /bucket/SmithU/data/experiment_1

Now you can use the same script for any directory in the future, without having to edit it or run only in the current directory.

Here is the script, with some error checking to avoid bad things happening if you forget the parameter or misspell the directory (always a good idea):

#!/bin/bash
#SBATCH --partition short
#SBATCH --time 0-1
#SBATCH --mem 10G
#SBATCH --cpus-per-task 8

# get our directory as a parameter

# First see if there is any input: is $# less than 1?
if [ "$#" -lt 1 ]
then
    echo "give a directory with video files"
    exit 1
fi

# Did we get a valid directory? if not (!) a directory (-d) then...
if [ ! -d "$1" ]
then
    echo "$1 is not a valid directory"
    exit 1
fi

# we're good, go to the directory and convert the
# images (numbered in sequence) to video
cd "$1"
ffmpeg -framerate 1/10 -i *%03d.png -c:v libx264 -r 30 -pix_fmt yuv420p out.mp4

Please see this section for the “if” statement, and this section for testing if a directory exists. Note that the ffmpeg line above is an example only; you need to control a couple more parameters for best results.

 

Run a command and capture the output

You often want to run a command, then store the output in a variable, so you can use it later.

You do this by enclosing the command either with “$( )” or with “` `” (that is, the backwards single quote, or “backtick”, “backquote” or “grave accent”).

They mean exactly the same thing, but I strongly urge you to only use the $() form. The backwards quotes can be really difficult to spot, and are easily confused with straight quotes " ’ " or forwards quotes " ´ ". They are an easy way to confuse yourself.

Think of capturing output as similar to variable substitution above: the $( ) means we run whatever command is inside the parentheses, take the printed output, and substitute the whole “$( )” part with that output, just as we substitute ${ } with the content of a variable.

A couple of examples:

#put the contents of the file 'macbeth.txt' into a variable
$ macbeth=$(cat macbeth.txt)
$ echo "$macbeth"
SCENE I. A desert place.

    Thunder and lightning. Enter three Witches
...

Here we use cat macbeth.txt to output the text of Macbeth. We enclose that command with $( ) so we capture that text, and assign it to our macbeth variable. Finally we output the contents of macbeth using echo.

Why do we put double quotes around $macbeth when we echo it? The text contains spaces and newlines. The echo command displays a set of strings to the output, where each string is separated by spaces. If we print without quotes, each word becomes its own string as far as echo is concerned, and the formatting is lost. With the quotes we keep the text format:

# each word is separate, so we lose the formatting
$ echo is this    a    dagger
is this a dagger

# the entire text is one string, so we retain the spaces
$ echo "is this    a    dagger"
is this    a    dagger

Generally it’s a good idea to double quote variables. I don’t do that everywhere in this document because often it doesn’t matter and I’m lazy. Please do what I say, not what I do.

Here’s a practical example of capturing output:

$ job=$(sbatch myjob.slurm)
$ echo "$job"
Submitted batch job 12345

# We can get just the job ID like this:
$ jobid=${job##* }

We can capture the output of the sbatch command, and the job ID we got. From this, you can extract the job ID for further use - see the section on parameter expansion for an explanation. Our section on Chain Jobs in the Advanced Slurm page (Coming Soon) will show you why this can be very useful.

 

Bash arrays

Bash has arrays where you can keep a list of values. They look a little messy, but are really not very different from lists in Python or arrays in Matlab. Arrays can be very useful - you will see them used in the “Slurm Arrays” section of the Advanced Slurm page, for instance.

A bash array is a variable that holds a list of values. You can then use a number as an index to access any of the values:

$ arr=(alpha beta gamma delta)    # our array

$ echo ${arr[0]}            # We begin counting at 0
alpha
$ echo ${arr[1]}            # we get the value at index 1
beta
$ echo ${arr[-2]}           # negative indexes count from the end
gamma
$ echo ${arr}               # the plain value gives us the first element
alpha
$ echo $arr[1]              # without { } bash doesn't know '[1]' is an index
alpha[1]
$ echo ${arr[@]}            # @ as index gives us all values
alpha beta gamma delta

We define an array called “arr” with four values. The values are separated by spaces. Arrays start counting from 0, and ‘@’ as an index gives us the entire array, with elements again separated by spaces. Negative values count from the end.

Notice that the formal substitution syntax with curly brackets is necessary (see our section on variable substitution). Without it, Bash doesn’t see the index as a part of the variable; for Bash, $arr[1] is the same as ${arr}[1] which is not what we want.
 

Array operations

You can get the length (number of elements) of an array and the index of each element:

$ arr=(alpha beta gamma delta)
$ echo ${#arr[@]}           # number of elements - note the '#'
4
$ echo ${!arr[@]}           # element indexes - note the '!'
0 1 2 3

You can add elements to an array by adding elements to the end:

$ arr=(a b)         # arr is (a b)
$ arr+=(c)          # add a value to the end: arr is (a b c)

If you want to add elements to the start you can use variable substitution:

# use variable substitution to add elements to the start and the end:
$ arr=(start ${arr[@]} end) 
$ echo ${arr[@]}
start a b c end

We create a new array with "start", the contents of the old array, then "end"; then assign the new array to the old variable.

You can also set a value at any index directly. Bash allows arrays to have gaps:

$ arr=( )               # empty array

$ arr[4]="first"        # set three elements
$ arr[12]="second"      
$ arr[7]="third"        

$ echo ${arr[@]}        # a list of elements
first third second      

$ echo ${#arr[@]}       # three elements
3

$ echo ${!arr[@]}       # element indexes
4 7 12

With variable substitution, we can of course use variables for the index instead of just plain numbers:

$ i=3
$ arr=(zero one two three four)
$ echo ${arr[$i]}       # simpler way
three
$ i=2
$ echo ${arr[${i}]}     # formal way, using '${ }' everywhere
two

 

Filling arrays

If you want an array of file names, you can let Bash fill in the list directly into an array:

slurmfiles=(./*slurm)                   
echo "slurm files: " ${slurmfiles[@]}

In the first line, ./*slurm matches all files in the current directory that end with “slurm”. Bash recognizes this as a file name pattern, not a single string element.

slurmfiles=(./*slurm) creates an array called ‘slurmfiles’ with the filenames as elements.

As we saw earlier, you can capture the output of a command. And you can create an array from a list of strings with spaces between them. This means you can create an array from the output of any command. Each array element will be one space-separated word of the output.

Let’s capture the output of an sbatch submission command, and split into an array:

$ cmdarr=( $(sbatch myjob.slurm) )
$ echo ${cmdarr[@]}
Submitted batch job 12345

$ echo ${cmdarr[3]}
12345

We captured the output of the sbatch command using $( ... ), then made an array with each space-separated word in a separate element using ( ). Finally we store that array in cmdarr.

Now we can get the actual job ID by looking at element 3 (remember, elements start at 0). In the section on “chain jobs” in our “Advanced Slurm” page you will find this very useful.

As we saw in the section on variables, you can do this in a different way with parameter expansion. The benefit of doing it with an array like this is that you can easily get any word in the output, not just the first or last. Both methods are useful, and both have their place. Also, it’s always good to know multiple ways to do something.

The mapfile command

“mapfile”, also called “readarray”, can read an input line by line (or word by word) and fill an array with the result. It’s a built-in Bash command (so it’s not available in other shells).

mapfile takes its input from stdin, so you can redirect text from any source (you can’t pipe into it, though: each command in a pipeline runs in its own subshell, so the array would disappear as soon as the pipeline finishes). It is quite flexible: you can set what to split on (by default it’s newline), how many lines to read, whether you want to skip lines at the start, and more.

Let’s read Macbeth into an array:

# we get Macbeth line by line in "maclines"
$ mapfile -t <macbeth.txt maclines

# How many lines?
$ echo ${#maclines[@]}
4828
# Seems right:
$ wc -l macbeth.txt
4828

By default mapfile looks for the “newline” separator and splits the input there. The “-t” option tells mapfile to remove that separator; if we didn’t use it, each element in our array would end with a newline.

With the “-d” option we can change the separator so we can get the words instead. If we set it to " " (a space character) mapfile will split on spaces instead of newlines:

# Let's try to get the words:
$ mapfile -d " " -t <macbeth.txt macwords
$ echo ${#macwords[@]}
25332
# Something is wrong:
$ wc -w <macbeth.txt 
18101
# we have extra empty "words"
$ echo ${macwords[13]}

# Let's use command capture instead:
$ macwords=($(cat macbeth.txt))
$ echo ${#macwords[@]}
18101

mapfile will split on every space. That is, with two spaces between words, there will be two splits. If we split the string “hi  Bob” (with two spaces between the words), we’d get the elements “hi”, an empty element “”, and “Bob”: mapfile treats the first space as the end of “hi”, and the second space as the end of an empty element. If you want to get the words of a text into an array, it’s better to use command capture to create the array instead.

Splitting on other characters can still be useful for things such as comma-delimited numerical data and things like that, though, so don't completely ignore this.
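Here is a sketch of reading comma-delimited values into an array (the file name and values are made up; the “-d” option needs a reasonably recent version of Bash):

```shell
# three comma-separated values, no trailing newline
printf '10,20,30' > data.csv

# split on "," instead of newline; -t removes the delimiters
mapfile -d "," -t nums < data.csv

echo ${#nums[@]}        # 3
echo ${nums[2]}         # 30
```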
 

Commands, loops and if statements

Bash is not just a shell. It is a programming language. Many people agree that it’s not a particularly good language, and you probably shouldn’t write any big programs in it. But it is very convenient to create small shell scripts that automate various things in your sbatch files or elsewhere.
 

Command exit codes and testing

When we run a command - a program or a built-in Bash function - it always returns a value between 0 and 255, called the exit code. This is almost always used to find out if the command was successful or not: 0 means it worked, while any other value indicates some kind of error or problem.

You can get the exit code directly from the “$?” variable. It contains the latest exit code:

# this hopefully works fine, so exit code is 0
$ ls
...
$ echo $?
0

# We try to list a directory that doesn't exist, so exit code is not 0
$ ls ighfkfslghslfhg/
ls: cannot access 'ighfkfslghslfhg/': No such file or directory
$ echo $?
2

The application manual will often tell us what the different error codes mean, but anything not zero is usually a problem of some sort.
 

Some commands use exit codes as a result, not as an error signal. These are typically commands that search for something, compare files or values, and other operations like that. The exit code then tells you if the comparison or search was successful or not. A 0 value means “true” or successful - you found what you searched for, or the things you compared were equal. A non-zero value means “false” or unsuccessful - you found nothing, or the things you compared were different.

Here’s a few examples:

# We search for a string in Macbeth.
$ grep "a dagger" macbeth.txt 
Is this a dagger which I see before me,

# We found a match so the exit code is 0 (true)
$ echo $?
0

# We search for a different string.
$ grep "a donut" macbeth.txt 

# No match, so the exit code is 1 (false)
$ echo $?
1


The “test” command

The “test” command is a general command for comparing values and file types. It’s very useful, and as you’ll see below, you often use it without even realizing it.

We’re not going to show you everything it can do (do “man test” to get a summary), but here are a few examples:

$ a=1
$ mystr="hello"

# numerical comparisons
$ test $a -eq 1                # this is true; a is equal to 1
$ test $a -lt 0                # this is false, a is not less than 0

# string tests
$ test "$mystr" == "hello"     # this is true.
$ test -z "$mystr"             # this is false - mystr is not empty

# file tests
$ test -e myfile.txt           # myfile.txt exists
$ test -f myfile.txt           # myfile.txt is a normal file
$ test -d myfile.txt           # myfile.txt is a directory

There’s another version of the test program. It is called “[” - yes, the command name is the left square bracket. The main difference from test is that it needs a “]” as the final argument. Now you can write your tests inside brackets, which looks much nicer:

# these two lines do the same thing
$ test $s == "hello"
$ [ $s == "hello" ]

You can use either. Many people feel “[” is more elegant, but it’s really just a matter of taste.

Finally, there is another version of test that is done with double brackets “[[ ]]”. This is an extension of the original test command, with a few more ways to test things and to combine tests. Also, it is built into Bash so it can be a bit faster; on the other hand, it works only in Bash.
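As a small sketch of what the double brackets add: you can combine tests with “&&” or “||” directly inside them, and an unquoted right-hand side of “==” is matched as a pattern:

```shell
s="hello world"

# pattern match and a second test, combined with &&
if [[ $s == hello* && -n $s ]]
then
    echo "s starts with hello and is not empty"
fi
```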
 

if Statements

The “if” statement does something depending on if the output of a program (such as a test) is true or not. This is the pattern:

if [ some kind of test ]
then
    A block of commands that 
    run if the test is true
fi

Here’s an example:

# is 1 less than 2?
if [ "1" -lt "2" ]
then
    echo "yes"
fi

# same thing, on one line
# if you put "then" or "fi" on the same line, separate with ";"
if [ "1" -lt "2" ]; then echo "yes"; fi

Here you can see the basic structure: the “if” keyword, then a test or some other operation that gives a zero exit code for “true” or non-zero code for “false”.

After that a “then” keyword. If you put it on the same line as the if part, you need to use a semicolon “;” to separate the test from the then. After then you can add a block of any number of lines that get executed only if the if test returned 0 (“true”).

The whole if block is terminated with “fi” (if backwards); again with a semicolon if you don’t put it on its own line.

Here’s another example, this time doing an operation on a file only if it exists:

f="macbeth.txt"

# see if macbeth.txt is a regular file
if [ -f $f ]
then
    # how many times is Macbeth mentioned in the play itself
    grep -o  "Macbeth" $f |wc -l
fi

# same thing, on a single line with ";" to separate statements:
if [ -f $f ]; then grep -o  "Macbeth" $f |wc -l; fi

The “if” statement also has “elif” and “else” parts. elif lets you add a second test that is only run if the first failed. else is a catchall block that will run if no tests returned “true”:

f1="macbeth.txt"
f2="hamlet.txt"

if [ -f $f1 ]
then
    # how many times is Macbeth mentioned in the play itself
    echo -n "Mentions of Macbeth in Macbeth: "
    grep -o  "Macbeth" $f1 |wc -l

# We don't have Macbeth. See if we have Hamlet instead.
elif [ -f $f2 ]
then
    # how many times is Hamlet mentioned in the play itself
    echo -n "Mentions of Hamlet in Hamlet: "
    grep -o  "Hamlet" $f2 |wc -l

# We have neither file
else
    echo "Oh no!"
fi


for and while statements

You use loops to do something repeatedly, or to do the same thing to a whole set of files, for instance. Bash has two ways to create loops: “for” loops, and “while” loops.

for loops are used when you want to run commands on a list of things. This is the pattern:

for variable in list of things
do
    block of statements
    this runs once for each item in list
done

For will go through and assign the space-separated elements in list to variable one by one, and run the statement block each time.

Here is an example:

# print "one", "two" and "three" in turn
for number in one two three
do
    echo $number
done

# same thing on one command line - note the semicolons to separate statements:
$ for number in one two three; do echo $number; done
one
two
three

This will print “one”, “two” and “three” in turn. We give the for statement a variable name - number - the in keyword, and a list of words separated by spaces. For goes through the list one by one, and sets number to each element in turn. In the block of commands inside we echo the value of number, once per item.

You can use file matching to loop over a list of files:

for textfile in *.txt
do
    echo $textfile " is a text file"
done

You can use command substitution to loop over the output of commands, and you can loop over elements in a bash array with the “${arrayname[@]}” construct that prints out all elements separated by spaces:

# create a bash array with all files ending with .txt, then loop over them
$ textfiles=(*.txt)

$ for tf in ${textfiles[@]}; do echo $tf; done
hamlet.txt
macbeth.txt

# find all .fasta files in all subdirectories, copy them to fastafiles/
for f in $(find . -name "*.fasta")
do
    cp $f fastafiles/
done

You can use for loops to iterate over a range of values, in a couple of ways:

# you can express a range directly with curly brackets
# prints 0 1 2 3 4 5
for i in {0..5}
do
    echo $i
done

# add leading zeros by adding them to the first number, and set a step as a third value
# prints 001 003 005 007 009
for i in {001..10..2}
do 
    echo $i
done

You can express a range for a loop with curly brackets as above. The pattern is {start..stop..increment}, where increment is optional. Any leading zeros on start determine how much to pad the output with zeros.

Another way is to use the external seq command. It, too, can generate sequences with settable start, stop and increments. Let seq generate a sequence, capture the output, then use the output in a for loop:

# use seq to generate a sequence
for i in $(seq 5)
do 
   echo $i
done

# seq is useful when you want the sequence to be variable
iter=100

# this doesn't work:
for i in {1..${iter}}

# this works
for i in $(seq ${iter})

The main benefit of seq is that you can use variables to set the start, stop and increments. As always, it's good to know more than one way to do things.
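As a sketch, here is seq with variables for all three values. Note the argument order: seq takes start, increment, stop, which is different from the {start..stop..increment} order of the curly bracket form:

```shell
start=1
step=2
stop=10

# prints 1 3 5 7 9, one per line
for i in $(seq $start $step $stop)
do
    echo $i
done
```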
 

While is perhaps less common than for. The pattern is:

while [test]
do
    block of statements
    run as long as the test is true
done

As long as [test] returns 0 (true), while runs the block of statements. Here’s one way to loop from 1 to 10, using the math functions we describe in the next section:

# a less elegant way to create a sequence of numbers
# loop as long as n is less than (-lt) 11
n=1
while [ $n -lt 11 ]
do
    echo $n
    # this increases n - see "math in Bash" below
    (( n++ ))
done

As you can see above, for is really better for creating number sequences.

While is most often used when you want to do something on each line of a text file. Here we read each line one by one, then reverse each line:

while read -r line
do
    echo $line|rev
done <macbeth.txt

I TCA

.ecalp tresed A .I ENECS

A bit of explanation: read is a Bash built-in that reads one line of input and stores it in the variable you give it - “line” in this case. “-r” prevents it from treating backslashes as special characters. If you give it multiple variables you can get each line split into columns. You can do “help read” to get the documentation.

Where is the input coming from? read gets its input from standard in. As you see, we redirect the input from the file after the done keyword. The entire while construction is a single statement that we can redirect data into or out of.
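As a sketch of the multiple-variable form of read (the file name and data are made up), each line is split into the variables we give it:

```shell
# two columns per line: a name and a score
printf 'alice 10\nbob 20\n' > scores.txt

# read splits each line into 'name' and 'score'
while read -r name score
do
    echo "$name scored $score"
done < scores.txt
```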

We can use while and for statements as part of a pipeline. Here’s another way to reverse each line, as a single line command, this time piping the input in using cat and using “less” so we can see the output clearly:

$ cat macbeth.txt| while read -r line; do echo $line|rev; done |less

However, in this case the while loop is unnecessary; the “rev” command already reverses its input line by line:

# Just reverse each line directly
$ cat macbeth.txt| rev| less

# Even easier, give "rev" a filename to read
$ rev macbeth.txt | less

If we wanted to do multiple things with each line, however, the while loop would be useful.

 

Math in Bash

Sometimes you need to do simple math in your shell scripts. As always there are numerous ways to do this, but we will present a couple of simple approaches.
 

Integer math and comparisons

Perhaps the simplest way to do both integer math and numerical comparisons is with double parentheses. Enclosing something in double parentheses tells Bash to interpret it as a mathematical expression:

# Set a couple of variables
$ a="2"
$ b="3"

# Inside the double parens (( )) we can use variables and assign to
# them without using "$" to substitute them.
# also, we don't need spaces if we don't want them
$ ((c=a*b+2))
$ echo $c
8

# With a "$" in front, we get arithmetic expansion, just like a variable:
# the expression is calculated, then the entire construct is replaced
# with the result.
$ c=$(( a*b + 2 ))
$ echo $c
8

# We can use comparisons. 0 means "false"
$ echo $((a>b))
0

# anything not 0 means true
$ echo $((a<b))
1

$ echo $((a==2))
1

There are two slightly tricky things going on here, and it’s worth looking at them to avoid some confusion.

First, the difference between using variable substitution in the expression, like this:

(( c = $a * 2 ))

And just using the plain variable name, like this:

(( c = a * 2))

If you try these two variants, you’ll get the same result. So what is the difference? When you use variable substitution, the variable is substituted with its value before the expression is evaluated. In the first case, what the double parentheses actually evaluate is this:

((c = 2 * 2 ))

As a is two, “$a” gets replaced with 2, then the expression is evaluated. In the second example, the value of a is looked up when the expression is evaluated. Here’s an example where it matters:

# this works
((c = a*2 ))

# this does not
(($c = a*2 ))

Why does the second one fail? Let’s say c is “4”, just as an example. First “$c” is substituted with the value “4”, then bash tries to evaluate the expression:

((4 = a*2 ))

Which of course is nonsense - you can’t assign a value to the number 4 - and so it fails.

Second, what is really the difference between “(( ))” and “$(( ))”?

In short, (( )) works like a command and will have a return code, as we saw in the section on commands. We can for instance do:

if (( a<10 )); then
    ...

The expression inside is evaluated; if the result is non-zero, the return code is 0 (true), and if the result is zero, the return code is 1 (false). We can then use it in tests like the one above.

On the other hand, $(( )) works like variable substitution. The expression is evaluated, then the whole thing is replaced with the final value. We can treat it the same way as when we use a variable value:

# 'x' is the value of "a+2"
x=$((a+2))

# 'smaller' is true if a is smaller than 10, and false otherwise
smaller=$((a<10))
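To tie the two forms together, here is a small sketch that sums the numbers from 1 to 10, using (( )) both as the loop test and for the updates:

```shell
# Sum the numbers from 1 to 10 with pure Bash integer arithmetic
sum=0
i=1
while (( i <= 10 )); do
    (( sum += i ))
    (( i += 1 ))
done
echo $sum
# 55
```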

 

Floating point Math

Floating point math is not common in Bash. When you need that sort of thing it’s usually time to move to a language more suited to “real” calculations, such as Python, R, or Julia.

However, sometimes you do need it. Bash has no built-in way to do this, so we have to resort to external tools. The easiest way is probably by using the “bc” program.

bc is a calculator tool that reads expressions and evaluates them. We use it by piping in our expression as a string, then capturing the output. For example:

# quick calculation.
$ echo "1.2*8" | bc
9.6

# use a bash variable value
$ pi="3.141592"
$ r="5"
$ echo "$pi * $r*$r" | bc
78.539800

# same, but capture the output
$ area=$(echo "$pi*$r*$r" | bc)
$ echo $area
78.539800

The bc command is quite powerful. It has many useful features such as variables and control statements, and you can optionally enable trigonometric and other more advanced functions. But again, if you need these you probably really should move to a language better suited for this sort of thing.

 

Related Commands

As you may have noticed, we often use other commands as part of our shell scripts.

Any application can in principle work in a script, and there are a lot of small utilities written specifically for use on the shell or in shell scripts. We’ve seen “grep” used in several examples to search for text, for instance, and “bc” in the previous section can help us with floating point calculations.

Let's take a brief look at "sed" and "awk", two real workhorses in shell scripting, and another look at "grep" and regular expressions.
 

Sed and Awk

Sed and Awk both edit and transform text in various ways. While sed is all about changing the text, awk is more about extracting and summarising information. They complement each other.

sed

“sed” stands for “stream editor”. It lets us edit and transform any kind of text in a programmable way. It reads a line of text, runs a set of commands that perhaps change it, then writes out the resulting line. It then takes the next line and repeats.

Let’s see a few examples.

# 's' is the search and replace command. Replace 'Macbeth' with 'Bob':
$ sed "s/Macbeth/Bob/" Macbeth.txt
The tragedy of Bob
...
Bob! Macbeth! Macbeth! Beware Macduff!
...

# the 'g' modifier replaces all occurrences, not just the first. 'i' makes
# the match case insensitive
$ sed "s/macbeth/Bob/ig" Macbeth.txt
...

# print only the lines from "ACT I" to "ACT II" inclusive
#
# -n suppresses any default output
# "X,Y p" is a range of lines from X to Y, and "p" prints the lines
# /ACT I$/ searches for any line ending with "ACT I"

$ sed -n '/ACT I$/, /ACT II$/ p' Macbeth.txt

With the "-n" option sed prints nothing by default. “<something>,<something> ...” specifies a range. “/ACT I$/” matches any line that ends with “ACT I” (the "$" matches the end of the line), and “/ACT II$/” means any line that ends with “ACT II”. So the range is from the first match of “ACT I”, to the first match of “ACT II” after that.

Finally the command is “p” for “print”. So we print all lines from “ACT I” to “ACT II” but nothing else.
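The same addressing syntax works with sed’s other commands too. For instance, “d” deletes the selected lines instead of printing them - a small sketch using inline input rather than a file:

```shell
# "2,3 d" deletes lines 2 through 3; everything else passes through
printf 'one\ntwo\nthree\nfour\n' | sed '2,3 d'
# one
# four
```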

Learning sed can fill a book. Here is a tutorial with a bit more information: How to use the sed Command on Linux.

Awk

Awk is a full programming language for text processing. Like sed, it reads text line by line and processes each line in turn. An Awk program can have a “BEGIN” section that runs once before it processes any text; a middle section that does the per-line processing; and an “END” section that runs once after the text is finished.

Here are a few simple examples using the “animals.txt” file:

# print the third then the first item, separated with comma
# on each line of animals.txt
# -e ' ' specifies a line of code.
# $1 - $9 contain the first to ninth word on the line
$ awk -e '{print $3 ", " $1}' animals.txt
15, Rabbit
1, Habu
...

# Print the animal name only if the count is bigger than 10
# -e is not needed if you have only one code line
$ awk '$3>10 {print $1}' animals.txt 
Rabbit
Fly

# Another way to get the job ID for a slurm submission:
$ out=$( sbatch myjob.slurm )
$ echo $out | awk '{print $4}'
123456
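The BEGIN and END sections mentioned above look like this in practice. A sketch using a hypothetical “counts.txt” file with the same layout as animals.txt:

```shell
# Hypothetical data in the same shape as animals.txt: name, colour, count
printf 'Rabbit white 15\nHabu grey 1\nFly black 12\n' > counts.txt

# BEGIN runs once before any input, the middle block runs once per
# line, and END runs once after the last line
awk 'BEGIN { print "name count" }
     { total = total + $3; print $1, $3 }
     END { print "total", total }' counts.txt
# name count
# Rabbit 15
# Habu 1
# Fly 12
# total 28
```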

Like with sed, you can fill a book with awk information. The GNU Awk Users Guide is a comprehensive walkthrough. Here are some example uses. Here is a free ebook. You can find many other example uses online.

One of the best resources you can find for both sed and awk is the book sed & awk. It’s a really good, comprehensive walkthrough of these two tools. Don’t be put off by the age - these are mature, stable tools and the book is still relevant and up to date today.
 

Grep and regular expressions

We’ve seen the “grep” command several times already in the examples. In short, it searches a file for matching text and prints the lines that match. The format it uses to describe what to search for is called “regular expressions”.

A regular expression is a compact format to describe patterns. It’s used not just by grep, but by a lot of languages and other tools, sed and awk above included. There are several variants, but they all support at least the following:

  • You can search for literal characters (“hello” matches “hello”)
  • Zero or more of the preceding character (“h*” matches “”, “h”, “hh”, and so on)
  • Any one character (“.” will match any one character)
  • Character classes (“[abc123]” will match any one of the characters in the list)
  • Beginning or end of the line (“^” matches the beginning, “$” matches the end)

A few quick examples using the grep syntax:

# Find "Horatio" in Hamlet.txt
$ grep "Horatio" Hamlet.txt

# Find any line that has neither Hamlet nor Horatio
# -v inverts the search to print lines that don't match
$ grep -v "Hamlet\|Horatio" Hamlet.txt

# find either "gray" or "grey" 
$ grep "gr[ae]y" Hamlet.txt

# Find anything beginning with "H" then zero or more alphabetic 
# characters (not numbers, spaces or punctuation)
$ grep "H[[:alpha:]]*" Hamlet.txt

# find a doubled character (any character appearing twice in a row)
# followed by a question mark
# "." matches any single character
# \( \) makes it into a subexpression we can refer to
# \1 refers to the first subexpression
# \(.\)\1 means any one character, followed by the same character
$ grep "\(.\)\1?" Hamlet.txt

As you can see, regular expressions can quickly become very messy. However they are also very, very powerful, and they’re used to specify patterns not just by grep but by a lot of tools, so they’re well worth learning.

The trick to get a complicated regular expression right is to build it up piece by piece. Start by matching part of what you need, then add things one bit at a time.
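As a sketch of that process, here is the doubled-character pattern from above built up one stage at a time against a test string (“grep -q” is silent and only sets the return code):

```shell
# Build the pattern up step by step, checking each stage as we go
echo "too?" | grep -q "."          # any single character: matches
echo "too?" | grep -q "\(.\)"      # the same, captured as group 1
echo "too?" | grep -q "\(.\)\1"    # a character followed by itself ("oo")
echo "too?" | grep -q "\(.\)\1?"   # the doubled character, then "?" ("oo?")
echo "all stages matched"
```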

There are several variants of regular expressions, and the details differ depending on the tool. With grep, for instance, we need to add a backslash “\” to characters such as “(”, “)” and “|” so that they get their special regular expression meaning instead of just matching those characters literally. In other tools you may need to do the opposite.

There are many tutorials online; which one is best for you depends in part on what tool you want to use. The Wikipedia page is quite good, as is this tutorial. The regex101 website lets you experiment with regular expressions and try them out interactively. It's a fun way to learn how to write them.

 

Finally

This is not a complete guide to everything Bash can do. See it as a short guide to some of the more useful parts of the shell. If you are interested in more detail, we suggest you look at the reference documentation on the Bash site, or pick up one of the many excellent books on using Bash.