Appendix C — The multivariate normal distribution

C.1 Definition

The normal distribution is the most important distribution for a single random variable, and its extension to a random vector is equally important in statistics.

Let us denote by \(\mathbf{x}\) the \(p\)-dimensional random vector \[\mathbf{x}^T=(x_1,\ldots,x_p),\] where \(x_1,\ldots, x_p\) are univariate random variables.

In the univariate case the probability density function (p.d.f.) is \[f(x)=\frac{1}{\sqrt{2\pi} \sigma} e^{ - \frac{(x-\mu)^2}{2\sigma^2}},\] which depends on two parameters: \(\mu\) and \(\sigma\). Note that this formula can also be written as \[\label{eq5_1} f(x)=\frac{1}{\sqrt{2\pi} \sigma} e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}\] where \(\Sigma=\sigma^2\). When \(\mathbf{x}\) is a \(p\)-dimensional random vector it can be shown that the joint p.d.f. is \[f(\mathbf{x})=\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} e^{-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T\Sigma^{-1}(\mathbf{x}-\mathbf{\mu})}\] where \(\Sigma\) is the \(p\times p\) covariance matrix of the random variables and \(|\Sigma|\) is its determinant. This equation reduces to the previous one when \(p=1\). The p.d.f. of the MVN also has two parameters: the mean vector \(\mathbf{\mu}\) and the covariance matrix \(\Sigma\). In statistical notation, we write \(\mathbf{x} \sim N_p(\mathbf{\mu},\Sigma)\).
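The joint p.d.f. above can be evaluated directly from the formula. The following sketch implements it with numpy; the mean vector, covariance matrix, and evaluation point are illustrative choices, not taken from the text.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Joint p.d.f. of N_p(mu, Sigma) evaluated at the point x."""
    p = len(mu)
    diff = x - mu
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = diff @ np.linalg.inv(Sigma) @ diff
    # normalizing constant (2 pi)^{p/2} |Sigma|^{1/2}
    norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(mvn_pdf(np.array([0.5, 0.5]), mu, Sigma))
```

With \(p=1\) the function reduces to the familiar univariate density, mirroring the remark in the text.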

C.2 Linear transformation

The key results about multivariate normal distributions are these.

  • Linear transformation. If \(A\) is a \(q\times p\) matrix of rank \(q\) (with \(q\leq p\)) and \(\mathbf{b}\) is a \(q\)-dimensional vector, then \[A\mathbf{x}+\mathbf{b}\sim N_q(A\mathbf{\mu}+\mathbf{b},A\Sigma A^T)\]

    (Note that the formulae for the mean and variance follow from the general results on transformations in the previous section; the extra content in this result is that Normality is preserved by linear transformations.)
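The mean and variance formulae can be checked empirically by simulation. In this sketch, \(A\), \(\mathbf{b}\), \(\mathbf{\mu}\), and \(\Sigma\) are illustrative choices: the sample mean of \(A\mathbf{x}+\mathbf{b}\) should be close to \(A\mathbf{\mu}+\mathbf{b}\) and the sample covariance close to \(A\Sigma A^T\).

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])   # q x p with q = 2, p = 3, rank 2
b = np.array([0.5, -0.5])

# draws from N_p(mu, Sigma); each row of y is A x + b for one draw
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b

print(y.mean(axis=0))           # close to A @ mu + b
print(np.cov(y, rowvar=False))  # close to A @ Sigma @ A.T
```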

  • Standardization. We can derive a standardizing transformation that produces a vector with zero mean and identity variance. Since \(\Sigma\) is a variance covariance matrix it is positive definite. One of the properties of positive definite matrices is that we can find a \(p\times p\) matrix \(C\) such that \(\Sigma= CC^T\). Then it follows immediately that \[C^{-1}(\mathbf{x}-\mathbf{\mu})\sim N_p(\mathbf{0}_p,I_p)\] where \(\mathbf{0}_p\) is the \(p\times 1\) vector of zeroes and \(I_p\) is the \(p\times p\) identity matrix. Note that \(C\) is not unique; there are many possible choices for \(C\), which can be described as a square root of \(\Sigma\).

    So standardization produces a vector of random variables that are independent and identically distributed as \(N(0,1)\), i.e. independent standard normal random variables.
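One convenient choice of \(C\) is the Cholesky factor of \(\Sigma\). The sketch below, with illustrative values of \(\mathbf{\mu}\) and \(\Sigma\), verifies by simulation that \(C^{-1}(\mathbf{x}-\mathbf{\mu})\) has mean close to \(\mathbf{0}_p\) and covariance close to \(I_p\).

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([2.0, -1.0])
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

C = np.linalg.cholesky(Sigma)        # Sigma = C C^T, C lower triangular
x = rng.multivariate_normal(mu, Sigma, size=100_000)
# C^{-1}(x - mu) for each draw, via a triangular solve rather than inv(C)
z = np.linalg.solve(C, (x - mu).T).T

print(z.mean(axis=0))            # close to the zero vector
print(np.cov(z, rowvar=False))   # close to the 2 x 2 identity
```

Any other square root of \(\Sigma\), e.g. the symmetric one from an eigendecomposition, would standardize equally well, reflecting the non-uniqueness of \(C\) noted above.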

  • If \(\Sigma\) is diagonal, i.e. \[\Sigma =\left(\begin{array}{cccc} \sigma_{11} & 0 & \cdots & 0\\ 0 & \sigma_{22} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_{pp}\end{array}\right)\] then the random variables \(x_1,\ldots,x_p\) are independent (and hence, in particular, uncorrelated).
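Independence in the diagonal case can be seen from the density: the quadratic form splits into \(p\) separate terms, so the joint p.d.f. factorizes into a product of univariate normal densities. The following sketch, with illustrative values, checks this numerically at a single point.

```python
import numpy as np

mu = np.array([0.0, 1.0, -1.0])
var = np.array([1.0, 4.0, 0.25])   # diagonal entries sigma_11, ..., sigma_pp
Sigma = np.diag(var)

x = np.array([0.5, 2.0, -1.2])

# joint MVN density at x
p = len(mu)
quad = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
joint = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2)
                               * np.sqrt(np.linalg.det(Sigma)))

# product of the p univariate N(mu_i, sigma_ii) densities
marginals = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

print(joint, np.prod(marginals))   # the two values coincide
```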