Topics
- Sample space, Random variable
- Probability mass/density function
- Cumulative distribution function
- Discrete distributions
  - Uniform distribution
  - Bernoulli distribution
  - Binomial distribution
  - Poisson distribution
  - Multinomial distribution
- Continuous distributions
  - Uniform distribution
  - Exponential distribution
  - Gamma distribution
  - Normal (Gaussian) distribution
- Expectation
- Variance
- Law of large numbers
- Central limit theorem
Reference
- Wasserman (2004), Chapters 2, 3, and 5
Sample Space
The sample space \(\Omega\) is the set of all possible outcomes.
- Examples: \(\Omega = \{H, T\}\) for one coin toss
  \(\Omega = \{HH, HT, TH, TT\}\) for two coin tosses
An outcome \(\omega\) is an element of the sample space \(\Omega\).
- Example: \(\omega=H\) for a coin toss
An event \(A\) is a subset of the sample space \(\Omega\).
- Example: \(A = \{HT, TH\}\) (exactly one head) for two coin tosses
The probability \(P(A)\) of an event \(A\) represents the long-run relative frequency of observing \(A\) when the experiment is repeated many times.
Random Variable
Random variable \(X\) is a mapping from each outcome \(\omega\in\Omega\) to a real number
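The probability that a random variable takes a value is the probability of the event made up of all outcomes that give that value. For example, let \(X\) be the number of heads in two tosses of a fair coin:
- \(P(X=1) = P(\{HT,TH\}) = 1/2\)
- \(P(X\le 1) = P(\{TT,HT,TH\}) = 3/4\)
In general:
- \(P(X=x) = P(\{\omega\in\Omega : X(\omega)=x\})\)
- \(P(X\in A) = P(\{\omega\in\Omega : X(\omega)\in A\})\)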
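The mapping from outcomes to values can be made concrete by enumerating the two-toss sample space. This is a minimal sketch assuming a fair coin (each outcome has probability 1/4) and taking \(X\) to be the number of heads; the names omega, prob, X, p_eq1 and p_le1 are illustrative.

```r
# Sample space for two tosses of a coin
omega = c("HH", "HT", "TH", "TT")
prob = rep(1/4, length(omega))   # assumption: fair coin, all outcomes equally likely
# Random variable X: number of heads in each outcome
X = sapply(strsplit(omega, ""), function(s) sum(s == "H"))
names(X) = omega
X                                # HH=2, HT=1, TH=1, TT=0
# P(X = 1): total probability of the outcomes that X maps to 1
p_eq1 = sum(prob[X == 1])        # 0.5
# P(X <= 1): total probability of the outcomes that X maps to 0 or 1
p_le1 = sum(prob[X <= 1])        # 0.75
```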
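Probability function (or probability mass function)
For a discrete random variable, the probability mass function is \(f(x) = P(X=x)\). Example: the number of heads \(X\) in two tosses of a coin with probability of heads \(p\):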
p = 0.3
x = 0:2
f = c((1-p)^2, 2*p*(1-p), p^2)
plot(x, f, type="h", lwd=3, ylim=c(0, 1)) # "h" for histogram-like

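Probability density function (PDF)
For a continuous random variable, probabilities are areas under the probability density function \(f(x)\): \(P(a < X < b) = \int_a^b f(x)\,dx\). Example: uniform distribution on \([0,1]\):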
x = seq(-0.5, 1.5, 0.01)
f = dunif(x, 0, 1)
plot(x, f, type="l")
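For a continuous variable the density itself is not a probability; probabilities come from integrating it. A small sketch for the uniform density on \([0,1]\), assuming an arbitrary interval \((a, b) = (0.2, 0.5)\), compares the numerical integral with the difference of CDF values (the names a, b, area and diff_cdf are illustrative).

```r
a = 0.2
b = 0.5
# P(a < X < b) as the area under the Uniform(0, 1) density
area = integrate(dunif, lower=a, upper=b, min=0, max=1)$value
# The same probability from the CDF: F(b) - F(a)
diff_cdf = punif(b, 0, 1) - punif(a, 0, 1)
c(area, diff_cdf)   # both 0.3, up to numerical error
```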

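Cumulative distribution function (CDF)
The CDF of a random variable \(X\) is \(F(x) = P(X \le x)\).
Inverse CDF (or quantile function)
For a probability \(q\in[0,1]\), the inverse CDF gives the smallest value whose CDF reaches \(q\): \(F^{-1}(q) = \inf\{x : F(x) \ge q\}\).
- Example: number of heads in two tosses of a fair coin (\(p = 0.5\))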
p = 0.5
f = c((1-p)^2, 2*p*(1-p), p^2)
# CDF
F = rep( c(0, cumsum(f)), each=2)
x = c(-1, rep(0:2, each=2), 3)
par(mfrow=c(1, 2)) # side by side
plot(x, F, type="l")
# Inverse CDF
plot(F, x, type="l", xlab="q", ylab="x=F^-1(q)")
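For a discrete variable, the step heights of the CDF are cumulative sums of the mass function. The number of heads in two tosses is Binomial(2, p) (introduced below), so the sketch checks the manual values against R's pbinom() and qbinom(); F_vals and F_builtin are illustrative names.

```r
p = 0.5
f = c((1-p)^2, 2*p*(1-p), p^2)        # P(X=0), P(X=1), P(X=2)
F_vals = cumsum(f)                    # F(0), F(1), F(2) = 0.25, 0.75, 1
F_builtin = pbinom(0:2, size=2, prob=p)
all.equal(F_vals, F_builtin)          # TRUE
# Quantile function: smallest x with F(x) >= q
qbinom(c(0.1, 0.5, 0.8), size=2, prob=p)   # 0 1 2
```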

- Example: Uniform distribution in \([0,1]\)
x = seq(-0.5, 1.5, 0.01)
# CDF
F = punif(x, 0, 1)
par(mfrow=c(1, 2)) # side by side
plot(x, F, type="l")
# Inverse CDF
plot(F, x, type="l", xlab="q", ylab="x=F^-1(q)")
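For the uniform distribution on a general interval \([a,b]\), the CDF and its inverse have simple closed forms, \(F(x) = (x-a)/(b-a)\) and \(F^{-1}(q) = a + q(b-a)\) inside the support. The sketch below, assuming the arbitrary choice \(a=2\), \(b=5\), compares them with punif() and qunif() and checks that the quantile function undoes the CDF.

```r
a = 2
b = 5
x = seq(a, b, 0.5)
q = seq(0, 1, 0.25)
# CDF of Uniform(a, b) inside [a, b]
F_manual = (x - a) / (b - a)
# Quantile function (inverse CDF)
Finv_manual = a + q * (b - a)
all.equal(F_manual, punif(x, a, b))        # TRUE
all.equal(Finv_manual, qunif(q, a, b))     # TRUE
# Applying the quantile function to the CDF recovers x
all.equal(qunif(punif(x, a, b), a, b), x)  # TRUE
```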

Discrete Random Variables
\(X \sim F\) means \(X\) has distribution \(F\)
Bernoulli Distribution
\[X \sim \mbox{Bernoulli}(p)\]
- a single coin toss with probability of heads \(p\)
\[P(X=1) = p\] \[P(X=0) = 1-p\]
The probability (mass) function can be written compactly as: \[f(x) = p^x(1-p)^{1-x}, \quad x\in\{0,1\}\]
p = 0.3
x = 0:1
f = p^x * (1-p)^(1-x) # = c(1-p, p)
plot(x, f, type="h", lwd=3, ylim=c(0,1))
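R has no separate Bernoulli functions, but a Bernoulli(p) draw is a Binomial draw with size = 1. This sketch (seed and sample size are arbitrary choices) draws many coin tosses and shows that the relative frequency of 1s approaches \(P(X=1)=p\), in line with the frequency interpretation of probability.

```r
set.seed(1)
p = 0.3
n = 10000
# Bernoulli(p) samples via rbinom with size = 1
x = rbinom(n, size=1, prob=p)
head(x)
# Relative frequency of heads (1s); close to p for large n
freq1 = mean(x)
freq1
```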

Binomial Distribution
\[X \sim \mbox{Binomial}(n, p)\]
- number of heads in \(n\) independent coin tosses, each with probability of heads \(p\)
\[f(x) = {n \choose x} p^x(1-p)^{n-x}, \quad x = 0, 1, \ldots, n\] where \({n \choose x}=\frac{n!}{x!(n-x)!}\) is the number of ways of choosing \(x\) items out of \(n\).
n = 5
p = 0.6
x = 0:n
f = choose(n,x) * p^x * (1-p)^(n-x)
# f = dbinom(x, n, p)  # same values from R's built-in mass function
plot(x, f, type="h", lwd=3, ylim=c(0,1))
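The formula and R's dbinom() should agree term by term, and the probabilities over all possible counts \(0,\ldots,n\) should sum to one. A quick check, using the same \(n=5\), \(p=0.6\) as above (f_formula and f_builtin are illustrative names):

```r
n = 5
p = 0.6
x = 0:n
f_formula = choose(n, x) * p^x * (1-p)^(n-x)
f_builtin = dbinom(x, n, p)
all.equal(f_formula, f_builtin)   # TRUE
sum(f_formula)                    # 1: total probability over all outcomes
# choose() counts subsets, e.g. 10 ways to pick 2 tosses out of 5
choose(5, 2)
```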

Probability distributions in R
Most popular distributions are available with the following naming convention:
- d...(): probability density or mass function
- p...(): CDF
- q...(): quantile function (inverse CDF)
- r...(): draw random samples
# binomial distribution
n = 5
p = 0.3
par(mfcol=c(2,2)) # in 2x2 grid
# mass function
x = 0:n
plot(x, dbinom(x, n, p), type="h", lwd=3, ylim=c(0,1))
# CDF
x = seq(0, n, 0.05)
plot(x, pbinom(x, n, p), type="l")
# Quantile function
q = seq(0, 1, 0.01)
plot(q, qbinom(q, n, p), type="l")
# draw samples
plot(rbinom(100, n, p))
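The same four prefixes apply to other distributions. As a sketch, here they are for the standard normal distribution (norm, described further below); the evaluation points and the seed are arbitrary. For a continuous distribution, q...() exactly inverts p...().

```r
x = c(-1, 0, 1.5)
q = c(0.025, 0.5, 0.975)
dnorm(0)         # density at 0: 1/sqrt(2*pi), about 0.3989
pnorm(0)         # CDF at 0: 0.5
qnorm(0.975)     # about 1.96
# Quantile function and CDF are inverses of each other
all.equal(pnorm(qnorm(q)), q)   # TRUE
all.equal(qnorm(pnorm(x)), x)   # TRUE
set.seed(2)
s = rnorm(5)     # five random draws
```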

Poisson Distribution
\[X \sim \mbox{Poisson}(\lambda)\]
- number of events in a fixed interval when events occur independently at an average rate \(\lambda\) per interval
\[f(x) = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots\]
lambda = 2
x = 0:10
f = dpois(x, lambda)
plot(x, f, type="h", lwd=3, ylim=c(0,1))
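Writing the mass function out directly and comparing with dpois() is a quick sanity check. Because a Poisson variable can take any non-negative integer, the terms for \(x = 0,\ldots,10\) shown in the plot sum to slightly less than one; the sketch below uses the same \(\lambda = 2\).

```r
lambda = 2
x = 0:10
f_formula = exp(-lambda) * lambda^x / factorial(x)
all.equal(f_formula, dpois(x, lambda))   # TRUE
# Truncated at x = 10, so slightly below 1
sum(f_formula)
```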

Multinomial distribution
\[X \sim \mbox{Multinomial}(n,p)\]
- For \(k\) possible outcomes with probabilities \(p=(p_1,\ldots,p_k)\), the number of times each outcome occurs in \(n\) independent draws, \(X=(X_1,\ldots,X_k)\)
\[f(x) = {n \choose x_1,\ldots,x_k} p_1^{x_1}\cdots p_k^{x_k}\] where \({n \choose x_1,\ldots,x_k} = \frac{n!}{x_1!\cdots x_k!}\) and \(x_1+\cdots+x_k = n\).
p = c(0.4, 0.5, 0.1) # k=3
n = 10
# probability of the counts (x1, x2, n-x1-x2)
dmn <- function(x1, x2){
  if(x1+x2 > n){
    return(0) # cannot happen
  }else{
    x = c(x1, x2, n-x1-x2) # sum up to n
    return(dmultinom(x, prob=p))
  }
}
x1 = x2 = 0:n
f = outer(x1, x2, Vectorize(dmn))
persp(x1, x2, f, theta=60)
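The formula can be checked against dmultinom() for a single outcome, and rmultinom() draws whole count vectors. A sketch with the same \(p\) and \(n\) as above, assuming the arbitrary outcome \(x = (3, 5, 2)\): each sampled column sums to \(n\), and the average count of outcome \(i\) approaches \(n p_i\).

```r
p = c(0.4, 0.5, 0.1)
n = 10
x = c(3, 5, 2)                   # one possible outcome; sums to n
f_formula = factorial(n) / prod(factorial(x)) * prod(p^x)
all.equal(f_formula, dmultinom(x, prob=p))   # TRUE
# 1000 draws; each column is one draw of (X1, X2, X3)
set.seed(3)
s = rmultinom(1000, size=n, prob=p)
dim(s)                           # 3 x 1000
all(colSums(s) == n)             # TRUE: counts always sum to n
rowMeans(s)                      # close to n * p = (4, 5, 1)
```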

Continuous Random Variables
Exponential Distribution
\[X \sim \mbox{Exp}(\lambda)\]
- waiting time between successive events that occur at rate \(\lambda\)
\[f(x) = \lambda e^{-\lambda x}\]
Defined for \(x \ge 0\) and \(\lambda > 0\).
lambda = .5
x = seq(-5, 10, 0.1)
f = dexp(x, lambda)
plot(x, f, type="l")

It is sometimes parameterized by the scale \(\beta = \frac{1}{\lambda}\) instead of the rate \(\lambda\).
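R's dexp() takes only a rate argument, so a scale parameter \(\beta\) has to be passed as rate = 1/beta. The sketch, assuming an arbitrary \(\beta = 2\), checks this against the PDF written in terms of \(\beta\), \(f(x) = \frac{1}{\beta}e^{-x/\beta}\).

```r
beta = 2                  # scale parameter (arbitrary choice)
lambda = 1 / beta         # corresponding rate parameter
x = seq(0, 10, 0.5)
f_rate = dexp(x, rate=lambda)
f_scale = (1/beta) * exp(-x/beta)   # PDF written with the scale beta
all.equal(f_rate, f_scale)          # TRUE
```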
Gamma Distribution
\[X \sim \mbox{Gamma}(a,b)\]
For integer \(a\), the sum of \(a\) independent samples from Exp(\(b\))
\[f(x) = \frac{b^a}{\Gamma(a)} x^{a-1} e^{-bx}\] for \(x > 0\), with shape \(a > 0\) and rate \(b > 0\), where the "Gamma function" is defined as \[\Gamma(a) =\int_0^\infty t^{a-1}e^{-t}dt\] For positive integer values of \(a\), \(\Gamma(a)=(a-1)!\).
a = 1 # same as exp
b = 1
x = seq(-2, 10, 0.01)
f = dgamma(x, a, b) # shape a, rate b
plot(x, f, type="l")
for (a in 2:6){ # see the change with a
  lines(x, dgamma(x, a, b), col=a)
}
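The Gamma function can be evaluated numerically from its integral definition and compared with R's gamma() and with \((a-1)!\) for positive integers. In this sketch the tolerance of 1e-4 is an assumption chosen to cover integrate()'s default relative accuracy.

```r
a = 1:6
# Gamma function by numerical integration of t^(a-1) e^(-t) over (0, Inf)
gamma_int = sapply(a, function(ai) integrate(function(t) t^(ai-1) * exp(-t), 0, Inf)$value)
gamma_int
gamma(a)            # built-in Gamma function: 1 1 2 6 24 120
factorial(a - 1)    # (a-1)!
all.equal(gamma_int, gamma(a), tolerance=1e-4)   # TRUE
all.equal(gamma(a), factorial(a - 1))            # TRUE
```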

For independent random variables \(X_i \sim \mbox{Gamma}(a_i,b)\), \[\sum_{i=1}^n X_i \sim \mbox{Gamma}(\sum_{i=1}^n a_i, b)\]
Normal (Gaussian) Distribution
\[X \sim \mathcal{N}(\mu,\sigma^2)\] with mean \(\mu\) and variance \(\sigma^2\) (standard deviation \(\sigma\))
\[f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]
x = seq(-5, 5, 0.1)
f = dnorm(x) # default: mu=0, sigma=1
plot(x, f, type="l")
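Note that dnorm() is parameterized by the standard deviation (sd), not the variance. The sketch, assuming arbitrary values \(\mu = 1\), \(\sigma = 2\), compares the formula with dnorm() and shows the standardization identity \(f(x) = \frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right)\), where \(\phi\) is the standard normal density.

```r
mu = 1
sigma = 2
x = seq(-5, 7, 0.5)
# Density written out from the formula
f_formula = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
# dnorm() takes the standard deviation, not the variance
f_builtin = dnorm(x, mean=mu, sd=sigma)
all.equal(f_formula, f_builtin)                        # TRUE
# Standardization: shift by mu, scale by sigma
all.equal(f_builtin, dnorm((x - mu) / sigma) / sigma)  # TRUE
```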

Expectation
Expectation (or mean, or first moment) of random variable \(X\):
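In general, \[E(X) = \int x\, dF(x)\]
which for a discrete \(X\) with mass function \(f\) is \[E(X) = \sum_x x f(x)\]
and for a continuous \(X\) with density \(f\) is \[E(X) = \int_{-\infty}^{\infty} x f(x)\, dx\]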
Expectation is often denoted as \(E(X)=\mu_X=\mu\).
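Both forms of the expectation can be evaluated numerically in R. This sketch uses a Poisson(2) variable for the sum (truncated at \(x = 100\), assuming the remaining mass is negligible) and a Normal with \(\mu = 1\), \(\sigma = 2\) for the integral; the expected results are the known means \(\lambda\) and \(\mu\).

```r
# Discrete case: E(X) = sum over x of x f(x), for Poisson(lambda)
lambda = 2
x = 0:100                     # truncation point is an assumption
EX_pois = sum(x * dpois(x, lambda))
EX_pois                       # 2, i.e. lambda
# Continuous case: E(X) = integral of x f(x) dx, for Normal(mu = 1, sigma = 2)
EX_norm = integrate(function(x) x * dnorm(x, mean=1, sd=2), -Inf, Inf)$value
EX_norm                       # 1, i.e. mu
```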
Properties of Expectations
- Expectation of a function \(Y=r(X)\):
\[E(Y) = E(r(X)) = \int r(x)dF_X(x)\]
- Linearity, for any constants \(a_i\):
\[E(\sum_i a_i X_i) =\sum_i a_i E(X_i)\]
- Product of independent random variables:
\[E(\prod_i X_i) =\prod_i E(X_i)\]
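These properties can be checked by Monte Carlo, replacing expectations with sample averages. A sketch assuming independent draws \(X_1 \sim\) Poisson(2) and \(X_2 \sim \mathcal{N}(1, 2^2)\), with arbitrary coefficients 3 and -2; the averages match the predicted values up to sampling error.

```r
set.seed(4)
N = 1e5
X1 = rpois(N, 2)                  # E(X1) = 2
X2 = rnorm(N, mean=1, sd=2)       # E(X2) = 1, drawn independently of X1
# Linearity: E(3 X1 - 2 X2) = 3 * 2 - 2 * 1 = 4
mean(3*X1 - 2*X2)
# Independence: E(X1 X2) = E(X1) E(X2) = 2
mean(X1 * X2)
```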
Variance
Variance is a measure of the spread of a distribution.
- Variance \(V(X)=\sigma_X^2=\sigma^2\) of a random variable \(X\) with mean \(\mu\):
\[V(X) = E((X-\mu)^2) = \int(x-\mu)^2 dF(x)\]
- For independent \(X_i\) and constants \(a_i\):
\[V(\sum_i a_i X_i) =\sum_i a_i^2 V(X_i)\]
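The variance rule for sums relies on independence. This Monte Carlo sketch, assuming \(X_1 \sim \mathcal{N}(0,1)\) and an independent \(X_2 \sim \mathcal{N}(0,2^2)\), compares the independent case with a fully dependent one in which \(X_2\) is replaced by \(X_1\) itself.

```r
set.seed(5)
N = 1e5
X1 = rnorm(N, sd=1)        # V(X1) = 1
X2 = rnorm(N, sd=2)        # V(X2) = 4, independent of X1
# Independent: V(X1 + 3 X2) = 1^2 * 1 + 3^2 * 4 = 37
var(X1 + 3*X2)
# Dependent: V(X1 + 3 X1) = V(4 X1) = 16, not 1 + 9 = 10
var(X1 + 3*X1)
```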
Limit Theory
- Probability theory allows us to predict what happens when we gather a large sample.
The Law of Large Numbers
For independent and identically distributed (i.i.d.) \(X_1,\ldots,X_n\), the sample average \(\bar{X}_n=\frac{1}{n}\sum_{i=1}^n X_i\) converges in probability to the expectation \(\mu=E(X_i)\).
- \(X_n\) converges to \(X\) in probability
\[X_n\stackrel{P}{\rightarrow}X\]
For every \(\epsilon>0\), as \(n\rightarrow\infty\), \[P(|X_n-X|>\epsilon)\rightarrow 0\]
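A simulation sketch of the law of large numbers, assuming i.i.d. Poisson(2) samples (any distribution with a finite mean would do; the seed and sample size are arbitrary): the running average \(\bar{X}_n\) settles around \(\mu = \lambda = 2\) as \(n\) grows.

```r
set.seed(6)
n = 10000
lambda = 2
x = rpois(n, lambda)          # i.i.d. samples with mean mu = lambda
xbar = cumsum(x) / (1:n)      # running sample average for n = 1, 2, ...
plot(xbar, type="l", log="x", xlab="n", ylab="sample average")
abline(h=lambda, col=2)       # true mean
xbar[c(10, 100, 1000, 10000)] # deviations from 2 tend to shrink with n
```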
The Central Limit Theorem
For i.i.d. \(X_i\) from any distribution with mean \(\mu\) and finite variance \(\sigma^2\), the distribution of the sample average \(\bar{X}_n\) is approximately the Normal distribution \(\mathcal{N}(\mu,\frac{\sigma^2}{n})\) for large \(n\).
More precisely, the deviation of the sample average from the true mean, scaled by \(\sqrt{n}\) as \(\sqrt{n}(\bar{X}_n-\mu)\), converges in distribution to the Normal distribution \(\mathcal{N}(0,\sigma^2)\).
\(X_n\) converges to \(X\) in distribution
\[X_n\leadsto X\]
The CDF \(F_n\) of \(X_n\) converges to the CDF \(F\) of \(X\) at every point \(x\) where \(F\) is continuous: \[\lim_{n\rightarrow\infty}F_n(x)=F(x)\]
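A simulation sketch of the central limit theorem, assuming Poisson(3) samples (so \(\mu = \sigma^2 = 3\)) with an arbitrary sample size \(n = 50\) and 10000 repetitions: the histogram of \(\sqrt{n}(\bar{X}_n-\mu)\) is compared with the \(\mathcal{N}(0,\sigma^2)\) density.

```r
set.seed(7)
lambda = 3               # Poisson(3): mu = 3, sigma^2 = 3
n = 50                   # sample size per average
reps = 10000             # number of sample averages
# Each row of the matrix is one sample of size n
xbar = rowMeans(matrix(rpois(reps * n, lambda), nrow=reps))
z = sqrt(n) * (xbar - lambda)      # scaled deviation from the true mean
hist(z, breaks=50, freq=FALSE, main="sqrt(n)(sample mean - mu)")
curve(dnorm(x, mean=0, sd=sqrt(lambda)), add=TRUE, col=2)
c(mean(z), var(z))       # close to 0 and sigma^2 = 3
```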
Exercise
1. PDF and CDF
- For an exponential distribution, plot the PDF, CDF and inverse CDF (quantile function) by dexp, pexp and qexp.
- Derive the mathematical form of the CDF of the exponential distribution from its PDF \[f(x) = \lambda e^{-\lambda x}\]
and compare with the plot above.
- Derive the mathematical form of the inverse CDF (quantile function) of the exponential distribution
and compare with the plot above.
2. Relationships between distributions
- Make a sample by summing samples from a Bernoulli distribution. Plot its histogram and check whether it fits the Binomial distribution given by dbinom().
- By taking \(n\) large and setting \(p = \lambda/n\) in the Binomial distribution, see whether the distribution comes close to the Poisson distribution with rate \(\lambda\).
- Draw a sequence of samples from a Bernoulli distribution with small \(p\). Make a histogram of the time intervals between 1s and see what distribution it follows.
- Divide the above sequence into time bins of length \(T\) and count 1s in each bin. What distribution does it follow?
- By summing up multiple samples from an exponential distribution, check whether the sum follows a Gamma distribution.
- See in what case the Gamma distribution becomes close to the normal distribution.
3. Expectation and Variance
(Items marked with * are optional, for those with a mathematical background.)
1) Derive the mean and the variance of the Bernoulli distribution.
2) Derive the mean and the variance of the Binomial distribution.
3) Derive the mean and the variance of the uniform distribution.
4*) Compute the mean of the exponential distribution from PDF: \[E(X) = \int_0^\infty x f(x) dx\]
5*) Compute the mean of the exponential distribution from CDF: \[E(X) = \int x dF(x) = \int_0^1 F^{-1}(q) dq\]
6*) Derive the variance of the exponential distribution.
- Derive the mean and the variance of the Gamma distribution, viewed as a sum of samples from the exponential distribution.
