Topics

  • P value
  • t-test
  • Multiple comparison
  • ANOVA

References

  • Wasserman (2004), Chapter 10
  • Motulsky (2014), Chapters 4, 12, 15-17

Statistical Testing

P Values

  • Probability of obtaining a result at least as extreme as the observed sample, assuming the null hypothesis is true

    • binary: more occurrences than in the control

    • continuous: the mean differs from the control
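As a minimal sketch of the two cases above, the following R code computes a one-sided P value for a binary outcome with `binom.test()` and a two-sided P value for a continuous outcome with `t.test()`. The counts, the control rate of 0.5, and the simulated sample are made-up illustrations, not data from these notes.

```r
# Binary case (hypothetical numbers): 16 occurrences in 20 trials,
# compared with an assumed control rate of 0.5.
p_binary <- binom.test(16, 20, p = 0.5, alternative = "greater")$p.value

# The same P value computed directly: P(X >= 16) under the null hypothesis.
p_direct <- sum(dbinom(16:20, size = 20, prob = 0.5))

# Continuous case (simulated data): is the mean different from 0?
set.seed(1)
x <- rnorm(20, mean = 0.5, sd = 1)
p_continuous <- t.test(x, mu = 0)$p.value

print(c(binary = p_binary, direct = p_direct, continuous = p_continuous))
```

The direct sum makes the definition explicit: the P value adds up the null-hypothesis probabilities of every outcome at least as extreme as the observed one.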

Significance level \(\alpha\)

  • Reject the null hypothesis if \(P<\alpha\)

  • trade-off between false positives and false negatives
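The trade-off can be illustrated with a small simulation. This is only a sketch: the sample size, the number of repetitions, and the true effect of 0.5 under the alternative are arbitrary assumptions chosen for illustration.

```r
set.seed(1)
n_rep <- 2000   # number of simulated experiments (arbitrary choice)
n <- 20         # sample size per experiment (arbitrary choice)

# P values when the null hypothesis (mean = 0) is true
p_null <- replicate(n_rep, t.test(rnorm(n, mean = 0))$p.value)
# P values when the null hypothesis is false (assumed true mean = 0.5)
p_alt <- replicate(n_rep, t.test(rnorm(n, mean = 0.5))$p.value)

for (alpha in c(0.05, 0.01)) {
  false_pos <- mean(p_null < alpha)    # null true but rejected
  false_neg <- mean(p_alt >= alpha)    # null false but not rejected
  cat("alpha =", alpha, " false positive rate =", false_pos,
      " false negative rate =", false_neg, "\n")
}
```

Lowering \(\alpha\) reduces the false positive rate but can only raise the false negative rate, since fewer P values fall below the threshold.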

Confidence Interval and P Value

  • 95% CI includes the value stated by the null hypothesis

    • P>0.05 … null hypothesis not rejected

  • 95% CI does not include the value stated by the null hypothesis

    • P<0.05 … null hypothesis rejected
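The correspondence between the confidence interval and the P value can be checked with `t.test()`, which reports both. The sample below is simulated purely for illustration.

```r
set.seed(2)
x <- rnorm(15, mean = 0.3, sd = 1)   # simulated sample (illustrative only)

res <- t.test(x, mu = 0, conf.level = 0.95)
ci <- res$conf.int
ci_includes_null <- ci[1] <= 0 && 0 <= ci[2]
not_rejected <- res$p.value > 0.05

print(ci)
print(res$p.value)
# The two logical values agree for this test
print(c(ci_includes_null = ci_includes_null, not_rejected = not_rejected))
```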

t-Test

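  • Sample from Gaussian \(X_1,\ldots,X_n \sim \mathcal{N}(\mu,\sigma^2)\)

  • Null hypothesis: \(\mu=0\)

  • \(T = \frac{\sqrt{n}\hat{\mu}}{\hat{\sigma}}\) follows the t distribution with \(\nu=n-1\) degrees of freedom, where \(\hat{\mu}\) is the sample mean and \(\hat{\sigma}\) the sample standard deviation.

  • Reject the null hypothesis if \(|T| > t^*\), where \(t^*\) is the critical value of that t distribution for the chosen \(\alpha\)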
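The steps above can be sketched by hand in R. The sample is simulated for illustration, and the variable names (`T_stat`, `t_crit`) are not from the notes.

```r
set.seed(3)
x <- rnorm(12, mean = 0.8, sd = 1)   # simulated sample (illustrative only)
n <- length(x)
alpha <- 0.05

mu_hat <- mean(x)
sigma_hat <- sd(x)                       # uses the n - 1 denominator
T_stat <- sqrt(n) * mu_hat / sigma_hat   # test statistic
t_crit <- qt(1 - alpha / 2, df = n - 1)  # two-sided critical value t*
p_value <- 2 * pt(-abs(T_stat), df = n - 1)

reject <- abs(T_stat) > t_crit
print(c(T = T_stat, t_crit = t_crit, p = p_value))
print(reject)
```

Comparing \(|T|\) with \(t^*\) and comparing the two-sided P value with \(\alpha\) lead to the same decision.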

Multiple Comparison

  • e.g., Genome-wide association study (GWAS)

  • compare rates of \(m\) mutations in patients and controls

  • probability of a false positive in each test is \(\alpha\)

  • probability of no false positive in \(m\) independent tests: \((1-\alpha)^m\)

  • probability of at least one false positive: \(1-(1-\alpha)^m\)

    • \(\alpha=0.05\): 0.4 for \(m=10\)
    • \(\alpha=0.001\): 0.63 for \(m=1,000\)
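The two numerical examples follow directly from the formula. A short check in R, where `fwer` is a helper name introduced here for illustration:

```r
# Probability of at least one false positive among m independent tests
fwer <- function(alpha, m) 1 - (1 - alpha)^m

print(fwer(0.05, 10))     # about 0.40
print(fwer(0.001, 1000))  # about 0.63
```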

Bonferroni correction

  • reject null hypothesis if \(P < \frac{\alpha}{m}\)

  • probability of no false positive is at least \(1-\alpha\)
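A minimal sketch of the correction on simulated data in which every null hypothesis is true. The number of tests and the sample size are arbitrary assumptions.

```r
set.seed(4)
m <- 100        # number of tests (arbitrary choice)
n <- 10         # sample size per test (arbitrary choice)
alpha <- 0.05

# All m null hypotheses are true: every sample has mean 0
p <- replicate(m, t.test(rnorm(n))$p.value)

n_uncorrected <- sum(p < alpha)       # rejections without correction
n_bonferroni <- sum(p < alpha / m)    # rejections with Bonferroni

# p.adjust() multiplies each P value by m (capped at 1), which is equivalent
p_adj <- p.adjust(p, method = "bonferroni")
print(c(uncorrected = n_uncorrected, bonferroni = n_bonferroni,
        via_p_adjust = sum(p_adj < alpha)))
```

Every uncorrected rejection here is a false positive by construction, so the difference between the two counts shows what the correction removes.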

False discovery rate (FDR)

  • Suppose \(m\) null hypotheses are all true

  • then P values should be uniformly distributed in (0,1)

  • set FDR = Q (e.g., 0.05)

  • sort the P values; the test with the smallest P value is a ‘discovery’ if \(P<Q/m\)

  • second smallest if \(P<2Q/m\), 3rd smallest if \(P<3Q/m\), …

  • expected (false positives)/(positives) is kept below Q
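The ranking rule above corresponds to the Benjamini-Hochberg procedure, which is what `p.adjust()` applies for `method = "BH"` (and its alias `"fdr"`). The sketch below uses the usual step-up form: find the largest rank \(i\) whose P value is at most \(iQ/m\) and count every test up to that rank as a discovery. The mixture of 40 true nulls and 10 real effects is a made-up assumption.

```r
set.seed(5)
m <- 50
Q <- 0.05
# Hypothetical mixture: 40 true nulls (mean 0) and 10 real effects (mean 1.5)
mu <- c(rep(0, 40), rep(1.5, 10))
p <- sapply(mu, function(mu_i) t.test(rnorm(10, mean = mu_i))$p.value)

# Step-up rule: find the largest rank i with P_(i) <= i * Q / m
p_sorted <- sort(p)
passes <- which(p_sorted <= seq_len(m) * Q / m)
n_discoveries <- if (length(passes) > 0) max(passes) else 0

# Same count from the built-in adjustment ("fdr" is an alias for "BH")
n_builtin <- sum(p.adjust(p, method = "BH") <= Q)
print(c(manual = n_discoveries, builtin = n_builtin))
```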

Analysis of Variance (ANOVA)

Comparing \(k>2\) groups: \((X^1_1,\ldots,X^1_n),\ldots,(X^k_1,\ldots,X^k_n)\)

  • \(k(k-1)/2\) pairwise t-tests?
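To see why pairwise t-tests scale badly, the number of pairs can be tabulated for a few group counts. The last column reuses the formula from the Multiple Comparison section as a rough guide only; it assumes independent tests, which pairwise comparisons sharing groups are not.

```r
k <- c(3, 5, 10)
n_pairs <- k * (k - 1) / 2          # same as choose(k, 2)
alpha <- 0.05
# Rough guide only: treats the pairwise tests as independent, which they are not
fwer_approx <- 1 - (1 - alpha)^n_pairs
print(data.frame(k, n_pairs, fwer_approx))
```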

One-way ANOVA

  • Null hypothesis: all come from the same Gaussian

  • group means: \(M^j = \frac{1}{n} \sum_{i=1}^n X^j_i\)

  • total mean: \(M = \frac{1}{nk} \sum_{j=1}^k\sum_{i=1}^n X^j_i\)

  • between-group variance: \(V_B = \frac{n}{k-1} \sum_{j=1}^k(M^j-M)^2\)

  • within-group variance: \(V_W = \frac{1}{nk-k} \sum_{j=1}^k \sum_{i=1}^n(X^j_i-M^j)^2\)

  • \(F=\frac{V_B}{V_W}\) follows the \(F\) distribution \(F(k-1,nk-k)\)

\[f(x;n_1,n_2) \propto x^{\frac{n_1-2}{2}}\left(1+\frac{n_1}{n_2}x\right)^{-\frac{n_1+n_2}{2}}\]

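# Density of the F distribution for several degrees of freedom
x <- seq(0, 5, 0.1)
plot(x, exp(-x), type = "l", ylab = "density")  # exponential curve, just for comparison
n1 <- c(2, 5)      # numerator degrees of freedom
n2 <- c(10, 100)   # denominator degrees of freedom
for (i in seq_along(n1)) {
  for (j in seq_along(n2)) {
    f <- df(x, n1[i], n2[j])
    # a distinct colour index (3 to 6) for each (n1, n2) pair
    lines(x, f, col = i * length(n2) + j)
  }
}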

  • Reject the null hypothesis if \(F\) is large, i.e., above the critical value of \(F(k-1,nk-k)\) for the chosen \(\alpha\)
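The one-way ANOVA formulas above can be sketched by hand and cross-checked against `aov()`. The three simulated groups and their means are assumptions made for illustration. The groups are stored as columns of a matrix `X`, a layout chosen here for convenience.

```r
set.seed(6)
k <- 3; n <- 8
# Simulated groups stored as columns of an n-by-k matrix (illustrative only)
X <- sapply(c(0, 0.5, 1.5), function(mu_j) rnorm(n, mean = mu_j))

M_j <- colMeans(X)     # group means
M <- mean(X)           # total mean
V_B <- n / (k - 1) * sum((M_j - M)^2)           # between-group variance
V_W <- sum(sweep(X, 2, M_j)^2) / (n * k - k)    # within-group variance
F_stat <- V_B / V_W
p_value <- pf(F_stat, k - 1, n * k - k, lower.tail = FALSE)

# Cross-check against aov()
d <- data.frame(value = as.vector(X), group = factor(rep(1:k, each = n)))
tab <- summary(aov(value ~ group, data = d))[[1]]
print(c(F_manual = F_stat, F_aov = tab[["F value"]][1]))
print(c(p_manual = p_value, p_aov = tab[["Pr(>F)"]][1]))
```

The P value is the upper tail of the \(F(k-1,nk-k)\) distribution, which is why only large values of \(F\) lead to rejection.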

Exercise

1. P-value and t Test

  1. For the samples above, perform a t test with \(\alpha=0.05\).
  2. Compare the result with that of the t.test() function.

2. Multiple Testing

  1. For the \(m\) samples above, check the result with Bonferroni correction.

  2. Try False Discovery Rate with p.adjust(p, method="fdr")

3. ANOVA

  1. Apply aov() to several of the samples above.
  2. Take iris or another dataset with three or more groups and try aov().