Review of basics in statistics

We will take a brief tour of some core ideas in statistics. Much of this content may already be familiar. For more in-depth discussion, see, for example, All of Statistics by Wasserman and The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman.

Probability: modes of convergence and limit laws

Let \(X_n\), for \(n = 1, 2, 3, \dots\), be a sequence of real-valued random variables.

  • Almost sure convergence: \(X_n \overset{\mathrm{as}}{\to} X\) means
    \[ \mathbb{P}(\lim_{n \to \infty} X_n = X) = 1. \]

  • Convergence in probability: \(X_n \overset{p}{\to} X\) means that for every
    \(\epsilon > 0\), \[ \mathbb{P}(|X_n - X| > \epsilon) \to 0 \quad \text{as } n \to \infty. \]

  • Convergence in distribution: \(X_n \overset{d}{\to} X\) means
    \[ \mathbb{P}(X_n \leq x) \to \mathbb{P}(X \leq x) \] at every point \(x\) at which the CDF of \(X\) is continuous.

  • Almost sure convergence implies convergence in probability.

  • Convergence in probability implies convergence in distribution.

  • Convergence in distribution does not imply convergence in probability, unless the limit \(X\) is a constant, in which case the two modes are equivalent.

  • Convergence in distribution is equivalent to \(\mathbb{E}[f(X_n)] \to \mathbb{E}[f(X)]\) for all bounded and continuous \(f\) (this is part of what’s called the portmanteau lemma).

  • Asymptotic probability notation:
    \(X_n = O_p(a_n)\) means \(\frac{X_n}{a_n}\) is bounded in probability, that is, for each \(\epsilon > 0\), there exists \(M > 0\) such that for large enough \(n\), \[ \mathbb{P}\left( \left| \frac{X_n}{a_n} \right| > M \right) \leq \epsilon. \]
    Similarly, \(X_n = o_p(a_n)\) means \(\frac{X_n}{a_n} \overset{p}{\to} 0\).
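To see the \(O_p\) notation in action: by the central limit theorem below, \(\bar{X}_n - \mu = O_p(n^{-1/2})\), i.e. \(\sqrt{n}(\bar{X}_n - \mu)\) stays stochastically bounded as \(n\) grows. A quick simulation sketch (the Uniform(0,1) draws and the function name are illustrative choices, not from the text):

```python
import random, math

random.seed(7)

# xbar_n - mu = O_p(n^{-1/2}): sqrt(n) * |xbar_n - mu| stays stochastically
# bounded as n grows (Uniform(0,1) draws, mu = 0.5; illustrative setup).
def scaled_error(n):
    xbar = sum(random.random() for _ in range(n)) / n
    return math.sqrt(n) * abs(xbar - 0.5)

for n in [100, 10000, 1000000]:
    print(n, round(scaled_error(n), 4))
```

The printed values fluctuate but do not grow with \(n\), which is exactly what bounded in probability means.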

Theorem (Law of Large Numbers).
If \(X_1, X_2, \dots\) are i.i.d. with mean \(\mu = \mathbb{E}[X_i]\), then the sample mean \[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i \] satisfies \(\bar{X}_n \overset{\mathrm{as}}{\to} \mu\).

Note: this also implies \(\bar{X}_n \overset{p}{\to} \mu\) (the weak law of large numbers). The converse fails in general: for certain distributions with infinite mean, the sample mean can still converge in probability to a constant even though almost sure convergence fails.
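A quick numerical illustration of the law of large numbers (the Exp(1) distribution, with mean 1, and the function name are illustrative choices):

```python
import random

random.seed(0)

# Sample mean of n i.i.d. Exp(1) draws; should approach the true mean mu = 1.
def sample_mean(n):
    return sum(random.expovariate(1.0) for _ in range(n)) / n

for n in [10, 1000, 100000]:
    print(n, sample_mean(n))
```

As \(n\) grows, the printed sample means settle near 1.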

Theorem (Central Limit Theorem).
Under the same conditions and assuming a finite second moment, letting \(\sigma^2 = \mathrm{Var}(X_i)\), we have \[ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \overset{d}{\to} N(0,1). \]
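A simulation sketch of the CLT: for Uniform(0,1) draws (\(\mu = 1/2\), \(\sigma^2 = 1/12\), an illustrative distribution choice), roughly 95% of standardized sample means should land in \([-1.96, 1.96]\):

```python
import random, math

random.seed(1)

# Standardized mean of n Uniform(0,1) draws, centered and scaled as in the CLT.
def standardized_mean(n, mu=0.5, sigma=math.sqrt(1 / 12)):
    xbar = sum(random.random() for _ in range(n)) / n
    return math.sqrt(n) * (xbar - mu) / sigma

# If the limit is N(0,1), about 95% of these should fall inside [-1.96, 1.96].
zs = [standardized_mean(100) for _ in range(2000)]
frac = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"fraction inside +/-1.96: {frac:.3f}")
```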

Theorem (Glivenko-Cantelli).
If \(X_1, X_2, \dots\) are i.i.d., then the empirical distribution function \[ F_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{X_i \leq x\} \] satisfies \[ \sup_x |F_n(x) - F(x)| \overset{\mathrm{as}}{\to} 0. \]
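The Glivenko-Cantelli statement can be checked numerically. For Uniform(0,1) draws (so \(F(x) = x\)), the supremum is attained at a data point and can be computed from the sorted sample; the setup below is an illustrative sketch:

```python
import random

random.seed(2)

# sup_x |F_n(x) - F(x)| for F the Uniform(0,1) CDF, via the sorted-sample
# formula: the sup occurs just before or at one of the order statistics.
def ks_sup(xs):
    xs = sorted(xs)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

for n in [100, 10000]:
    print(n, ks_sup([random.random() for _ in range(n)]))
```

The discrepancy shrinks toward 0 as \(n\) grows, consistent with almost sure uniform convergence.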

Theorem (Kolmogorov-Smirnov).
Under the same setup, assuming \(F\) is continuous, \[ \sqrt{n} \sup_x |F_n(x) - F(x)| \overset{d}{\to} \sup_{t \in [0,1]} |B(t)|, \] where \(B(t)\) is a Brownian bridge (a standard Brownian motion on \([0,1]\) conditioned on \(B(1) = 0\); it already starts at \(B(0) = 0\)).
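A simulation sketch of the limit: the classical 95% critical value of \(\sup_t |B(t)|\) is about 1.358, so \(\sqrt{n}\,D_n\) should exceed it roughly 5% of the time (Uniform(0,1) draws; all names are illustrative):

```python
import random, math

random.seed(3)

# sqrt(n) * sup_x |F_n(x) - F(x)| for a Uniform(0,1) sample (F(x) = x).
def scaled_ks(n):
    xs = sorted(random.random() for _ in range(n))
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    return math.sqrt(n) * d

# P(sup_t |B(t)| > 1.358) is approximately 0.05 (the classical KS critical value).
vals = [scaled_ks(500) for _ in range(2000)]
frac = sum(v > 1.358 for v in vals) / len(vals)
print(f"P(sqrt(n) D_n > 1.358) ~= {frac:.3f}")
```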

Probability: basic concentration inequalities

  • Markov’s inequality: if \(X \geq 0\) and \(\mu = \mathbb{E}[X]\), then for any \(a > 0\),
    \[ \mathbb{P}(X \geq a) \leq \frac{\mu}{a}. \]

  • Chebyshev’s inequality: if \(\mu = \mathbb{E}[X]\) and \(\sigma^2 = \mathrm{Var}(X)\), then for any \(t > 0\),
    \[ \mathbb{P}(|X - \mu| \geq t) \leq \frac{\sigma^2}{t^2}. \]

  • Hoeffding’s inequality: if \(X_1, \dots, X_n\) are independent and mean zero, and \(a_i \leq X_i \leq b_i\), then for any \(t > 0\),
    \[ \mathbb{P}(\bar{X}_n \geq t) \leq \exp\left(-\frac{2n^2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right). \]

  • Bernstein’s inequality: let \(\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i\). If \(X_1, \dots, X_n\) are independent and mean zero, with \(\sigma_i^2 = \mathrm{Var}(X_i)\) and \(|X_i| \leq M\) for all \(i\), then for any \(t > 0\),
    \[ \mathbb{P}(\bar{X}_n \geq t) \leq \exp\left(-\frac{nt^2/2}{\frac{1}{n} \sum_{i=1}^n \sigma_i^2 + Mt/3}\right). \]
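All four inequalities can be compared numerically. The sketch below uses Exp(1) for Markov and Chebyshev, where the tail \(\mathbb{P}(X \geq a) = e^{-a}\) is known exactly, and Rademacher variables for Hoeffding and Bernstein; the distribution choices and names are illustrative, not from the text:

```python
import random, math

random.seed(4)

# Markov and Chebyshev for X ~ Exp(1) (mean 1, variance 1; illustrative).
for a in [2.0, 5.0]:
    exact = math.exp(-a)              # P(X >= a) exactly
    markov = 1.0 / a                  # Markov bound: mu / a
    chebyshev = 1.0 / (a - 1.0) ** 2  # Chebyshev bound on P(|X - 1| >= a - 1)
    print(f"a={a}: exact={exact:.4f}, Markov={markov:.4f}, Chebyshev={chebyshev:.4f}")

# Hoeffding and Bernstein for Rademacher X_i (uniform on {-1, +1}: mean 0,
# a_i = -1, b_i = 1, sigma_i^2 = 1, M = 1; illustrative choice).
n, t, reps = 100, 0.2, 5000
def mean_exceeds():
    return sum(random.choice((-1, 1)) for _ in range(n)) / n >= t

empirical = sum(mean_exceeds() for _ in range(reps)) / reps
hoeffding = math.exp(-2 * n**2 * t**2 / (n * 4))     # sum (b_i - a_i)^2 = 4n
bernstein = math.exp(-(n * t**2 / 2) / (1 + t / 3))  # avg sigma_i^2 = 1, M = 1
print(f"empirical={empirical:.4f}, Hoeffding={hoeffding:.4f}, Bernstein={bernstein:.4f}")
```

Both exponential bounds hold but are loose; the empirical tail probability is well below either one.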

Some basic results in stochastic convergence

Continuous Mapping Theorem

Let \(g: \mathbb{R} \to \mathbb{R}\) be a continuous function. Then:

  • If \(X_n \overset{\mathrm{as}}{\to} X\), then \(g(X_n) \overset{\mathrm{as}}{\to} g(X)\).
  • If \(X_n \overset{p}{\to} X\), then \(g(X_n) \overset{p}{\to} g(X)\).
  • If \(Y_n \overset{d}{\to} Y\), then \(g(Y_n) \overset{d}{\to} g(Y)\).
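For example, if \(\bar{X}_n \overset{p}{\to} \mu\), then \(\bar{X}_n^2 \overset{p}{\to} \mu^2\), since \(g(x) = x^2\) is continuous. A quick sketch (Uniform(0,1) draws, so \(\mu = 1/2\) and \(\mu^2 = 1/4\); an illustrative setup):

```python
import random

random.seed(6)

# xbar_n ->p 0.5 for Uniform(0,1), so by the continuous mapping theorem
# g(xbar_n) = xbar_n**2 ->p 0.25.
def squared_mean(n):
    xbar = sum(random.random() for _ in range(n)) / n
    return xbar ** 2

for n in [100, 100000]:
    print(n, squared_mean(n))
```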

Slutsky’s Theorem

Let \(X_n \overset{d}{\to} X\) and \(Y_n \overset{d}{\to} c \in \mathbb{R}\) (equivalently \(Y_n \overset{p}{\to} c\), since the limit is a constant). Then:

  • \(X_n + Y_n \overset{d}{\to} X + c\)
  • \(X_n Y_n \overset{d}{\to} cX\)
  • \(X_n / Y_n \overset{d}{\to} X / c\), provided \(c \neq 0\)
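A standard application combines the CLT, the law of large numbers, and Slutsky: the t-statistic \(\sqrt{n}(\bar{X}_n - \mu)/S_n \overset{d}{\to} N(0,1)\), since the sample standard deviation satisfies \(S_n \overset{p}{\to} \sigma\). A simulation sketch (Exp(1) draws, with \(\mu = 1\); an illustrative choice):

```python
import random, math, statistics

random.seed(5)

# t-statistic sqrt(n) (xbar - mu) / S: Slutsky with X_n = sqrt(n)(xbar - mu)/sigma
# converging in distribution to N(0,1) and sigma / S_n ->p 1.
def t_stat(n):
    xs = [random.expovariate(1.0) for _ in range(n)]  # Exp(1): mu = 1
    return math.sqrt(n) * (statistics.fmean(xs) - 1.0) / statistics.stdev(xs)

# Coverage of the +/-1.96 interval should be close to 95%.
ts = [t_stat(200) for _ in range(2000)]
frac = sum(abs(t) <= 1.96 for t in ts) / len(ts)
print(f"fraction inside +/-1.96: {frac:.3f}")
```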