Develop the basic methodology of hypothesis testing for the statistical analysis of experimental and simulation studies. Through lectures and Python exercises, explore the fundamentals of probability theory, population statistics, and statistical methods, including p-values, the t-test, U-test, and Welch test, confidence intervals, univariate and multivariate analyses, and correlations. Extend these concepts with a discussion of information theory, mutual information, and experimental design.

Students who have not yet learned the basics of statistical methods but plan to conduct experimental studies or numerical simulations are encouraged to take this course.

Each week, a lecture on one topic is followed by an exercise in Python.

1. Introduction

The history and basic concepts of hypothesis testing are explained, together with the fundamentals of probability distributions.

2. Sampling and Central Limit Theorem

The central limit theorem is at the core of many hypothesis-testing methods. The law of large numbers and the central limit theorem are explained in the context of sampling from a population. I will also explain the degrees of freedom in data sampling.
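A minimal simulation sketch of the two limit results, using only the Python standard library. The population (an exponential distribution with rate 1, so mean and standard deviation are both 1), the sample sizes, the 5000 repetitions, and the helper name `sample_means` are all assumptions chosen for illustration. The last lines show why the unbiased sample variance divides by n - 1.

```python
import math
import random
import statistics

# Assumption for illustration: the population is exponential with rate 1,
# so its mean and standard deviation are both 1 and it is strongly skewed.
POP_MEAN = 1.0
POP_SD = 1.0

def sample_means(n, repeats, rng):
    """Draw `repeats` independent samples of size n; return each sample's mean."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repeats)]

rng = random.Random(0)  # fixed seed so the run is reproducible
results = {}
for n in (1, 4, 16, 64):
    means = sample_means(n, 5000, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    # Law of large numbers: the sample means centre on POP_MEAN.
    # Central limit theorem: their spread shrinks like POP_SD / sqrt(n),
    # and their histogram approaches a normal curve even though the
    # population itself is far from normal.
    print(f"n={n:3d}  mean of means={results[n][0]:.3f}  "
          f"sd of means={results[n][1]:.3f}  "
          f"sigma/sqrt(n)={POP_SD / math.sqrt(n):.3f}")

# Degrees of freedom: the unbiased sample variance divides by n - 1 because
# one degree of freedom is spent on estimating the mean from the same data.
data = [rng.expovariate(1.0) for _ in range(10)]
m = statistics.fmean(data)
ss = sum((x - m) ** 2 for x in data)
print(statistics.variance(data), ss / (len(data) - 1))   # n - 1 divisor
print(statistics.pvariance(data), ss / len(data))        # n divisor
```

Plotting a histogram of `means` for each n would make the approach to a normal shape visible; the printed standard deviations already track sigma/sqrt(n).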

3. T-test, U-test, Welch test

Comparing two groups, most often by their means, is frequently required in the statistical assessment of measured data. Which method is appropriate, however, depends on the properties of the data, such as whether they are approximately normally distributed and whether the two groups have similar variances. These methods are explained together with the basic notions of statistical significance and p-values.
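A sketch of the three test statistics written out by hand, assuming only the Python standard library; the function names and the two data groups are hypothetical. It shows how the methods differ: Student's t pools the variances, Welch's t keeps them separate and adjusts the degrees of freedom, and the Mann-Whitney U statistic uses only the ordering of the data. Turning these statistics into p-values needs the t distribution or the U null distribution; in practice a library routine such as SciPy's `scipy.stats.ttest_ind` (with `equal_var=False` for Welch) or `scipy.stats.mannwhitneyu` would normally be used.

```python
import math
import statistics

def student_t(a, b):
    """Student's two-sample t statistic and its degrees of freedom.
    Assumes both populations share the same variance (pooled estimate)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom.
    Does not assume equal variances."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a: number of pairs (x, y) with x > y,
    ties counting 1/2. Relies on ranks only, not on normality."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical measurements, made up for illustration only:
# group_b has a much larger spread than group_a.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_b = [4.2, 6.8, 3.9, 7.1, 4.5]
print("Student:", student_t(group_a, group_b))
print("Welch:  ", welch_t(group_a, group_b))
print("U:      ", mann_whitney_u(group_a, group_b))
```

With unequal spreads like these, Welch's degrees of freedom come out well below the pooled value of 9, which is exactly the correction Student's test omits.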

4. Confidence Intervals

Relying on p-values alone is now discouraged by many statisticians. First, I will explain why p-values by themselves are not sufficient for statistical assessment. Then, I will show how statistical differences can be assessed more reliably within the hypothesis-testing framework by using confidence intervals for means and proportions.
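A minimal sketch of a confidence interval for a difference of two proportions, assuming the simple Wald (normal-approximation) interval; the counts and the function name `diff_proportion_ci` are hypothetical. Unlike a bare p-value, the interval reports both the size of the difference and how precisely it is estimated. Other interval constructions (e.g. Wilson or Newcombe intervals) behave better for small samples or proportions near 0 or 1.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Wald (normal-approximation) confidence interval for p1 - p2.
    x1 successes out of n1 trials in group 1, x2 out of n2 in group 2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95 % interval
    d = p1 - p2
    return d - z * se, d + z * se

# Hypothetical counts: 45 of 100 vs 30 of 100 respond to a treatment.
low, high = diff_proportion_ci(45, 100, 30, 100)
print(f"difference = 0.15, 95% CI = ({low:.3f}, {high:.3f})")
```

Here the interval excludes zero, so the difference is "significant" at the 5 % level, but its width shows that the true difference could plausibly be anywhere from a couple of percentage points to nearly thirty.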

5. ANOVA, Effect Size

Statistical comparison of more than two groups is frequently required in practice. I will explain how such a comparison can be made by comparing the within-group variance with the between-group variance. The corrections required for multiple comparisons are also explained, together with effect sizes and the criteria for judging statistical differences.
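A sketch of one-way ANOVA computed from its between-group and within-group sums of squares, assuming the standard library only. Eta squared is used here as an example effect-size measure and the Bonferroni correction as an example multiple-comparison correction; both are common choices, not necessarily the ones covered in the lecture. The function names and data are hypothetical. The p-value for F requires the F distribution; SciPy's `scipy.stats.f_oneway` is one library routine that returns it.

```python
import statistics

def one_way_anova(groups):
    """One-way ANOVA: F = between-group mean square / within-group mean square.
    Also returns eta squared, a common effect-size measure."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    means = [statistics.fmean(g) for g in groups]
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: scatter of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    # Fraction of the total variation explained by group membership.
    eta_squared = ss_between / (ss_between + ss_within)
    return f, (df_between, df_within), eta_squared

def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: reject H0 only if p < alpha / (number of tests)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

f, dfs, eta2 = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f, dfs, eta2)
print(bonferroni_reject([0.01, 0.02, 0.04]))
```

A large F means the group means are spread out relative to the noise within groups; eta squared then says how much of the total variation that spread accounts for, which a p-value alone does not convey.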

6. Correlation Analysis

Correlation analysis is a standard method for analyzing the relationship between two random variables. After explaining the meaning of statistical independence, I will explain correlation analysis for continuous and discrete variables, together with its limitations.
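A short sketch, assuming the standard library only, of the Pearson correlation and of Spearman's rank correlation (Pearson applied to ranks), followed by one of the limitations: a variable that is completely determined by another can still have zero correlation with it. The function names and data are hypothetical.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

xs = list(range(10))
print(pearson(xs, [math.exp(x) for x in xs]))   # monotonic but nonlinear: below 1
print(spearman(xs, [math.exp(x) for x in xs]))  # ranks agree perfectly: 1.0

sym = [-3, -2, -1, 0, 1, 2, 3]
print(pearson(sym, [x * x for x in sym]))       # y is determined by x, yet r is 0
```

Pearson's r measures only linear association, and Spearman's only monotonic association; the symmetric y = x² case defeats both, even though the two variables are clearly not independent.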

7. Information Theory

Information theory is a comparatively modern concept, one that was unknown in ancient Greece. In particular, mutual information is often used to quantify the relationship between two random variables. An advantage of mutual information is that, unlike correlation coefficients, it can also detect nonlinear relationships between variables. I will explain the basics of information theory.
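A minimal sketch of the plug-in (histogram-count) estimate of mutual information for discrete variables, in bits, using only the standard library. The function names and the data are hypothetical: X takes the values -1, 0, 1 equally often and Y = X², a relationship whose Pearson correlation is exactly zero. The plug-in estimator is biased upward for small samples, and continuous variables would first need to be discretized or handled with a different estimator.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired samples of discrete variables:
    sum over observed (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y)))."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # c*n/(px*py) equals p(x,y) / (p(x) p(y)) written with raw counts.
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def entropy(vs):
    """Plug-in Shannon entropy in bits."""
    n = len(vs)
    return -sum(c / n * math.log2(c / n) for c in Counter(vs).values())

# Hypothetical data: X uniform on {-1, 0, 1} and Y = X**2 (nonlinear, deterministic).
xs = [-1, 0, 1] * 100
ys = [x * x for x in xs]
print(mutual_information(xs, ys), entropy(ys))  # equal: X fully determines Y
```

Because Y is a function of X, the mutual information equals the full entropy of Y (about 0.92 bits here), whereas the correlation coefficient for the same data is zero; for independent variables the estimate would be zero.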

Half-term course: first 6 weeks of term 2