From Motulsky (2014) Chapter 1:
\[\begin{matrix} & \mbox{Population (Model)}& \\ \mbox{Probability} & \downarrow \quad \uparrow & \mbox{Statistics} \\ & \mbox{Sample (Data)} & \end{matrix}\]
R was developed by R. Ihaka and R. Gentleman (1996) as a successor of S language (Chambers & Hastie, 1988).
First, download R package for your computer from R Project site and install as instructed there.
$ r
It is more convenient to use an integrated develpment environment (IDE) like RStudio
R Markdown combines explanation, code, and result in a same file.
“File >New File >R Markdown…” menu to create a new file
You can embed LaTeX-like mathematical formula by Mathjax
\( \)
or $ $
for inline, e.g., \(\sum_{i=1}^n x_i\) or \(\int_0^\infty f(x)dx\).
\[ \]
or $$ $$
for a separate line, e.g., \[\sum_{i=1}^n x_i\] or \[\int_0^\infty f(x)dx.\]
To embed your R code, “Option+Command+I” or “Insert” icon to make a code chunk
R Notebook a new type of R Markdown for embedding R Markdown code in an html file. This file is an example of an R Notebook.
“File >New File >R Notebook” menu to create a new file, which includes
output: html_notebook
in the file header.
“Prevew >Preview Notebook” button to show a page preview.
Everytime you save your “file.Rmd”, R Studio creates “file.nb.html”.
You can just type in numbers and operators in the Console window.
1+1
[1] 2
2*3
[1] 6
4/5
[1] 0.8
Either **
or ^
for power.
2**3
[1] 8
2^3
[1] 8
Many math functions and some constants are pre-defined.
exp(1)
[1] 2.718282
sin(pi/2)
[1] 1
You can use <-
or =
to assign a value to a variable.
x <- 1
y = 2
x+y
[1] 3
x/y
[1] 0.5
z = x+y
z
[1] 3
ls()
lists objects in your work space.
rm()
removes objects.
ls()
[1] "a" "ai" "alpha"
[4] "av" "b" "b0"
[7] "b0hat" "b1" "b1hat"
[10] "c" "CI" "d2norm"
[13] "dmn" "eps" "ess"
[16] "f" "F" "fpbc"
[19] "fpfd" "fpt" "fptt"
[22] "g" "i" "interval"
[25] "j" "k" "L"
[28] "lambda" "m" "M"
[31] "miss" "mu" "n"
[34] "nb" "nu" "ones"
[37] "p" "phat" "pp"
[40] "pval" "q" "r2"
[43] "rss" "s" "S"
[46] "SE" "sig" "sigma"
[49] "t" "T" "t1"
[52] "tss" "tstar" "tt"
[55] "V" "w" "W"
[58] "x" "X" "x1"
[61] "X1" "x2" "X2"
[64] "xbar" "xq" "XY"
[67] "y" "ybar" "z"
rm(x,y)
ls()
[1] "a" "ai" "alpha"
[4] "av" "b" "b0"
[7] "b0hat" "b1" "b1hat"
[10] "c" "CI" "d2norm"
[13] "dmn" "eps" "ess"
[16] "f" "F" "fpbc"
[19] "fpfd" "fpt" "fptt"
[22] "g" "i" "interval"
[25] "j" "k" "L"
[28] "lambda" "m" "M"
[31] "miss" "mu" "n"
[34] "nb" "nu" "ones"
[37] "p" "phat" "pp"
[40] "pval" "q" "r2"
[43] "rss" "s" "S"
[46] "SE" "sig" "sigma"
[49] "t" "T" "t1"
[52] "tss" "tstar" "tt"
[55] "V" "w" "W"
[58] "X" "x1" "X1"
[61] "x2" "X2" "xbar"
[64] "xq" "XY" "ybar"
[67] "z"
getwd()
to get or set the working directory
getwd()
[1] "/Users/doya/Dropbox (OIST)/R/StatisticalMethods"
help()
or ?
for help. ??
for related topics.
?sin
??sin
You can also use “Help” panel of RStudio.
c( )
cancatenates objects to make a vector.
a = c(1,2,3)
a
[1] 1 2 3
You can access each component by [ ]
.
a[1]
[1] 1
a[c(2,3)]
[1] 2 3
Operators work component-wise. There are many functions that work with vectors.
a+a
[1] 2 4 6
3*a
[1] 3 6 9
a*a
[1] 1 4 9
mean(a)
[1] 2
exp(a)
[1] 2.718282 7.389056 20.085537
Regular vectors can be created by :
or seq()
.
b = 1:5
b
[1] 1 2 3 4 5
c = seq(0,10,2)
c
[1] 0 2 4 6 8 10
You can also create an empty array and later assign values.
d = array(dim=10)
d[1] = 1
d
[1] 1 NA NA NA NA NA NA NA NA NA
You can create a new matrix by
A = matrix(0, 2, 3)
A
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
or by combining vectors
B = rbind(1:3, 4:6)
B
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
C = cbind(1:3, 4:6)
C
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
You can reshaping a vector into a matrix.
D = 1:6
dim(D) = c(2, 3)
D
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
You can access the components by [ , ]
D[1,2]
[1] 3
D[1:2,2:3]
[,1] [,2]
[1,] 3 5
[2,] 4 6
D[1,]
[1] 1 3 5
D[,1]
[1] 1 2
Most operators and functions work component-wise.
A = rbind(1:2, 3:4)
A
[,1] [,2]
[1,] 1 2
[2,] 3 4
A+A
[,1] [,2]
[1,] 2 4
[2,] 6 8
A*A
[,1] [,2]
[1,] 1 4
[2,] 9 16
sin(A)
[,1] [,2]
[1,] 0.841471 0.9092974
[2,] 0.141120 -0.7568025
To perform matrix multiplication, we must use %*%
A %*% A
[,1] [,2]
[1,] 7 10
[2,] 15 22
Inverse is given by solve()
function
Ainv = solve(A)
Ainv
[,1] [,2]
[1,] -2.0 1.0
[2,] 1.5 -0.5
A %*% Ainv
[,1] [,2]
[1,] 1 1.110223e-16
[2,] 0 1.000000e+00
Use t()
for a transpose matrix.
t(A)
[,1] [,2]
[1,] 1 3
[2,] 2 4
list
is a collection of components of arbitrary types, e.g., numbers, matrices, characters, or functions.L = list(name="OIST", address=c("Tancha","Onna","Okinawa"), age=6)
L[1]
$name
[1] "OIST"
L[[1]]
[1] "OIST"
L[["name"]]
[1] "OIST"
L$name
[1] "OIST"
L$add
[1] "Tancha" "Onna" "Okinawa"
You can store multiple lines of R commands as a script for repeated use.
# bern.R: Bernoulli distribution of 0 and 1
p = 0.5 # probability of 1
if (runif(1) < p){ # sample a random number uniformly in 0 and 1
x = 1 # 1 if that falls below p
}else{
x = 0 # 0 otherwise
}
print(x)
# starts a comment, which is skipped by R but helpful for humans, including yourself, to understand what the code does and what the parameter means.
Go to the directory where the file was saved, start R, and type in:
source("bern.R")
[1] 0
For repeating a procedure with different parameters, you would define your own function, in the following convention:
bern <- function(p){ # p: probability of 1
if (runif(1) < p){ # sample a uniform random number
x = 1 # 1 if that falls below p
}else{
x = 0 # 0 otherwise
}
return(x)
}
You can call it with any parameter:
bern(0.2)
[1] 0
bern(0.8)
[1] 1
Try making bionomial distribution by adding 1s of n bernoulli samples using for loop:
binom = function(n, p) { # n:samples, p:probability
y = 0 # prepare a counter
for (i in 1:n) {
x = bern(p) # 1 or 0
# print(x) # uncomment to check each x
y = y + x
}
return(y)
}
You can test it with different parameters:
binom(5, 0.5)
[1] 2
binom(10, 0.3)
[1] 4
Let us take samples from a binomial distribution
m = 1000 # number of samples
n = 10
p = 0.5
x = array(dim=m) # prepare an empty array
for (i in 1:m) {
x[i] = binom(n, p) # between 0 to n
}
You can visualize the raw data by plot()
function.
plot(x)
or make a histogram by hist()
.
hist(x)
You can plot any function by specifying x and y for plot()
function.
X = seq(0, n)
plot(X, dbinom(X, n, p))
To overlay a curve on an existing plot, use lines()
function.
hist(x, freq=FALSE) # normalize to sum up to 1
lines(X, dbinom(X, n, p))
R has a simple 3D plot function persp()
.
x = seq(-5, 5, 0.1)
y = seq(0.5, 3, 0.1)
f = function(x,y){exp(-x**2/(2*y**2))/y}
z = outer(x, y, f)
persp(x, y, z, theta=45, phi=30)
There are fancier 3D packages like plot3D
and rgl
.