Appendix A — Matrix algebra essentials
Consider a \(p\times q\) matrix \(A\) \[A=\left( \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1q} \\ a_{21} & a_{22} & \cdots & a_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pq} \end{array}\right)\] Sometimes we write \(A=(a_{ij})_{i=1,\ldots,p;j=1,\ldots,q}\) or simply \(A=(a_{ij})\) if the dimensions \(p,q\) are clear. The following are some basic properties concerning matrices.
\(A^T\) denotes the transpose of matrix \(A\) and is obtained from \(A\) by interchanging the rows and columns; that is, the columns of \(A^T\) are the rows of \(A\) and the rows of \(A^T\) are the columns of \(A\). Formally, if \(A=(a_{ij})\) we have \(A^T=(b_{ij})\) with \(b_{ij}=a_{ji}\).
E.g. \[A=\left(\begin{array}{ccc} 2 & 1 & 3 \\ 1 & 0 & -1\end{array}\right) \quad \textrm{and} \quad A^T=\left(\begin{array}{cc} 2 & 1\\ 1 & 0\\ 3 & -1\end{array}\right)\]
A square matrix \(A\) (i.e. one with \(p=q\)) is symmetric if \(A = A^T\).
E.g. \[A=\left(\begin{array}{cc} 1 & 2\\ 2 & -1\end{array}\right)\]
The identity matrix is a square matrix that has units in its main diagonal and zero elsewhere; it is denoted by \(I\). Sometimes we write \(I_n\) to indicate that this is a \(n\times n\) identity matrix.
E.g. \[I_3=\left(\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0 \\ 0 & 0 & 1\end{array}\right)\]
If a square matrix \(A\) is invertible, then \(AA^{-1}=A^{-1}A=I\) and \(A^{-1}\) is the inverse of \(A\).
E.g. \[A=\left(\begin{array}{cc} 1 & 2 \\ 2 & -1\end{array}\right) \quad A^{-1}=\frac{1}{-1-4}\left(\begin{array}{cc} -1 & -2 \\ -2 & 1\end{array}\right) = \frac{1}{5} \left(\begin{array}{cc} 1 & 2 \\ 2 & -1\end{array}\right)\] Check \[AA^{-1}=\frac{1}{5} \left(\begin{array}{cc} 1 & 2 \\ 2 & -1\end{array}\right) \left(\begin{array}{cc} 1 & 2 \\ 2 & -1\end{array}\right) = \frac{1}{5} \left(\begin{array}{cc} 5 & 0\\ 0 & 5\end{array}\right) = \left(\begin{array}{cc} 1 & 0\\ 0 & 1\end{array}\right) = I_2\]
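Numerical linear algebra libraries can be used to verify such computations. Below is a minimal numpy sketch (not part of the original text) that recomputes the inverse above, using the matrix values from the example.

```python
# Minimal numpy sketch verifying the 2 x 2 inverse example above.
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -1.0]])

A_inv = np.linalg.inv(A)                   # should equal (1/5) * [[1, 2], [2, -1]]
print(A_inv)
print(np.allclose(A @ A_inv, np.eye(2)))   # True: A A^{-1} = I_2
print(np.allclose(A_inv @ A, np.eye(2)))   # True: A^{-1} A = I_2
```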
If the symmetric matrix \(A\) is non-singular then \(A^{-1}\) is also symmetric.
E.g. \(A\) and \(A^{-1}\) as in the inverse example above.
The square matrix \(A\) is diagonal if all off-diagonal elements of \(A\) are zero.
E.g. \[A=\left(\begin{array}{ccc} 2 & 0 & 0 \\ 0 & -1 & 0\\ 0 & 0 & 1\end{array}\right)\] Sometimes we write \(A\) as \[A=\textrm{diag}(2,-1,1)\]
Matrix multiplication is distributive over addition and subtraction, so \((A-B)(C-D)=AC-AD-BC+BD\) whenever the dimensions conform.
\((A+B)^T=A^T+B^T\) and \((AB)^T=B^TA^T\).
If \(\mathbf{a}=(a_1,\ldots,a_n)^T\) is a vector of length \(n\) then \(\mathbf{a}^T\mathbf{a}=a_1^2+\cdots+a_n^2\).
E.g. \[\mathbf{a}=\left(\begin{array}{c} 1\\ 2\\ 3\\ 4\\ 5\end{array}\right) \quad \mathbf{a}^T\mathbf{a}=(1,2,3,4,5) \left(\begin{array}{c} 1\\ 2\\ 3\\ 4\\ 5\end{array}\right) = 1^2+2^2+3^2+4^2+5^2=55\]
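For completeness, a one-line numpy check of the sum-of-squares example (illustrative only):

```python
# a^T a as a sum of squares.
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a @ a)   # 55 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2
```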
If \(A\) is an \(n \times p\) matrix then \(AA^T\) is an \(n \times n\) matrix obtained by taking inner products of the rows of \(A\), whilst \(A^TA\) is a \(p \times p\) matrix obtained by taking inner products of the columns of \(A\); both matrices are symmetric.
E.g. \[\begin{gathered} A=\left(\begin{array}{cc} 2 & 1\\ 3 & 0\\ 1 & -1\end{array}\right) \quad A^T =\left(\begin{array}{ccc} 2 & 3 & 1\\ 1 & 0 & -1\end{array}\right) \\ A^TA=\left(\begin{array}{ccc} 2 & 3 & 1\\ 1 & 0 & -1\end{array}\right)\left(\begin{array}{cc} 2 & 1\\ 3 & 0\\ 1 & -1\end{array}\right) = \left(\begin{array}{cc} 14 & 1\\ 1 & 2\end{array}\right) \\ AA^T=\left(\begin{array}{cc} 2 & 1\\ 3 & 0\\ 1 & -1\end{array}\right)\left(\begin{array}{ccc} 2 & 3 & 1\\ 1 & 0 & -1\end{array}\right)=\left(\begin{array}{ccc} 5 & 6 & 1\\ 6 & 9 & 3\\ 1 & 3 & 2\end{array}\right) \end{gathered}\]
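The same products can be checked numerically; the sketch below simply recomputes \(A^TA\) and \(AA^T\) for the matrix in the example.

```python
# Recomputing A^T A and A A^T for the 3 x 2 example above.
import numpy as np

A = np.array([[2, 1],
              [3, 0],
              [1, -1]])

print(A.T @ A)   # [[14  1]
                 #  [ 1  2]]   -- 2 x 2 and symmetric
print(A @ A.T)   # [[5 6 1]
                 #  [6 9 3]
                 #  [1 3 2]]   -- 3 x 3 and symmetric
```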
The trace of a square matrix \(A\) is the sum of its diagonal elements: writing \(A=(a_{ij})_{i,j=1,\ldots,n}\), the trace of \(A\) is \(tr(A)=\sum_{i=1}^n a_{ii}\). We have that \(tr(A+B)=tr(A)+tr(B)\), \(tr(cA)=c\,tr(A)\) and \(tr(AB)=tr(BA)\), for matrices \(A\) and \(B\) of conformable dimensions and any scalar \(c\).
E.g. \[A=\left(\begin{array}{ccc} 1 & 2 & -1 \\ 0 & 1 & 3 \\ 1 & 4 & -2\end{array}\right) \quad tr(A)=1+1+(-2)=0\]
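The trace properties above are easy to illustrate numerically; in the sketch below the matrix \(B\) and the scalar \(c\) are chosen arbitrarily.

```python
# Illustrating tr(A + B) = tr(A) + tr(B), tr(cA) = c tr(A) and tr(AB) = tr(BA).
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 3.0],
              [1.0, 4.0, -2.0]])
B = np.random.default_rng(0).standard_normal((3, 3))   # arbitrary 3 x 3 matrix
c = 2.5

print(np.trace(A))                                                # 0.0
print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))     # True
print(np.isclose(np.trace(c * A), c * np.trace(A)))               # True
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))               # True
```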
An \(n\times n\) matrix \(A\) is called idempotent if \(A^2=AA=A\). An obvious example is \(A=I_n\), and this is the only non-singular idempotent matrix. One can show that if \(A\) is idempotent and \(A\neq I_n\), then \(A\) is singular and its trace equals its rank, which is \(n-p\) for some \(p>0\). Furthermore, the eigenvalues of an idempotent matrix are all \(0\) or \(1\), and the matrix is diagonalizable.
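A standard example of a singular idempotent matrix is a projection matrix of the form \(H=X(X^TX)^{-1}X^T\) for a full-column-rank matrix \(X\); the numpy sketch below (with an arbitrarily chosen \(X\)) illustrates that \(H\) is idempotent and that its trace equals its rank.

```python
# The projection matrix H = X (X^T X)^{-1} X^T is idempotent with trace equal to its rank.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))                  # arbitrary full-column-rank 6 x 2 matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T             # 6 x 6 projection matrix

print(np.allclose(H @ H, H))                                 # True: H is idempotent
print(np.isclose(np.trace(H), np.linalg.matrix_rank(H)))     # True: trace = rank = 2
```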
If \(A\) and \(B\) are non-singular matrices then \((AB)^{-1}=B^{-1}A^{-1}\).
A.1 Rank of a matrix
Consider the \(p\times q\) matrix \(A\) \[A = \left( \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1q} \\ a_{21} & a_{22} & \cdots & a_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pq} \end{array}\right) = \left(\begin{array}{cccc} \mathbf{a}_1, & \mathbf{a}_2, & \ldots, & \mathbf{a}_q\end{array}\right) = \left( \begin{array}{c} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \vdots \\ \mathbf{b}_p \end{array}\right)\] where \(\mathbf{a}_i\) is the \(i\)-th column and \(\mathbf{b}_j\) is the \(j\)-th row \((i=1,\ldots,q;j=1,\ldots,p)\).
The rank of \(A\), written \(rank(A)\), is a non-negative integer giving the number of linearly independent columns (equivalently, rows) of \(A\). Accordingly, \[rank(A)=k, \quad 0\leq k\leq \min(p,q)\] means that there are \(k\) linearly independent row vectors of \(A\), say \(\mathbf{b}_1,\ldots,\mathbf{b}_k\), i.e. \[\lambda_1 \mathbf{b}_1+\cdots+\lambda_k\mathbf{b}_k =\mathbf{0} \Rightarrow \lambda_1=\cdots=\lambda_k=0\] and \(k\) linearly independent column vectors of \(A\), say \(\mathbf{a}_1,\ldots,\mathbf{a}_k\), i.e. \[\mu_1 \mathbf{a}_1+\cdots+\mu_k\mathbf{a}_k=\mathbf{0} \Rightarrow \mu_1=\cdots=\mu_k=0\] Any collection of \(k+1\) or more row vectors or column vectors of \(A\) is linearly dependent. Below we give a simple example.
Consider the following matrix \[A=\left(\begin{array}{ccc} 2 & 0 & -1 \\ 1 & -4 & 1\end{array}\right)\] What is the rank of \(A\)?
We have \(p=2\) and \(q=3\). There are only 2 row vectors and so, writing the rows as column vectors, \[\lambda_1\mathbf{b}_1+\lambda_2\mathbf{b}_2=\lambda_1\left(\begin{array}{c} 2\\ 0\\ -1\end{array}\right) +\lambda_2\left(\begin{array}{c} 1\\ -4\\ 1\end{array}\right)=\left(\begin{array}{c} 0\\ 0\\ 0\end{array}\right)\] leads to the linear system \[\begin{gathered} 2\lambda_1+\lambda_2=0 \\ -4\lambda_2=0\\ -\lambda_1+\lambda_2=0 \end{gathered}\] which gives \(\lambda_1=\lambda_2=0\), and so the 2 row vectors of \(A\) are linearly independent and the rank of \(A\) is 2.
Now let us consider the column vectors of \(A\). There are 3 column vectors and we have \[\mu_1\mathbf{a}_1+\mu_2\mathbf{a}_2+\mu_3\mathbf{a}_3=\mu_1\left(\begin{array}{c} 2\\ 1\end{array}\right) +\mu_2\left(\begin{array}{c} 0\\ -4\end{array}\right) +\mu_3\left(\begin{array}{c} -1\\ 1\end{array}\right)=\left(\begin{array}{c} 0\\ 0\end{array}\right)\] which leads to the linear system \[\begin{gathered} 2\mu_1-\mu_3=0 \\ \mu_1-4\mu_2+\mu_3=0 \end{gathered}\] with solution \(\mu_1=4\mu_2/3\), \(\mu_3=8\mu_2/3\), where \(\mu_2\) is free to take any real value. The 3 column vectors are therefore linearly dependent, so the rank of \(A\) cannot be 3 (as expected, since \(rank(A)\leq\min(2,3)=2\)).
Now let us take the first 2 column vectors and write \[\nu_1\left(\begin{array}{c} 2\\ 1\end{array}\right) + \nu_2\left(\begin{array}{c} 0\\ -4\end{array}\right) =\left(\begin{array}{c} 0\\ 0\end{array}\right)\] which leads to the linear system \[\begin{gathered} 2\nu_1=0 \\ \nu_1-4\nu_2=0 \end{gathered}\] and gives the solution \(\nu_1=\nu_2=0\). So there are exactly 2 linearly independent column vectors and the rank of \(A\) is 2, the same as we found using the row vectors.
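The conclusion of the worked example can also be obtained directly with numpy, which computes the rank via a singular value decomposition:

```python
# Rank of the 2 x 3 example matrix; row rank equals column rank.
import numpy as np

A = np.array([[2, 0, -1],
              [1, -4, 1]])

print(np.linalg.matrix_rank(A))      # 2
print(np.linalg.matrix_rank(A.T))    # 2
```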
We can now state some properties of the rank of a matrix.
\(0\leq rank(A) \leq \min(p,q)\). If \(rank(A)=\min(p,q)\), then \(A\) is said to be of full rank.
If \(rank(A)<\min(p,q)\), we can find some linearly dependent columns or rows of \(A\).
If \(A\) is a \(p\times q\) matrix with \(p\geq q\) and \(rank(A)=q\), then the matrix \(A^TA\) is non-singular; if in addition \(p>q\), the matrix \(AA^T\) is singular. A similar result holds for \(p\leq q\) with the roles of \(A^TA\) and \(AA^T\) reversed.
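The last property can be illustrated numerically; the sketch below uses an arbitrary \(5\times 3\) matrix, which has full column rank with probability one.

```python
# For a p x q matrix A with p > q and rank(A) = q: A^T A is non-singular, A A^T is singular.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))                     # p = 5, q = 3

print(np.linalg.matrix_rank(A))                     # 3: full column rank
print(np.linalg.matrix_rank(A.T @ A) == 3)          # True: A^T A (3 x 3) is non-singular
print(np.linalg.matrix_rank(A @ A.T) < 5)           # True: A A^T (5 x 5) is singular
```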
A.2 Vector differentiation
Suppose that \(\mathbf{x}=(x_1,\ldots,x_p)^T\) is a \(p\times 1\) vector of variables and \(f(\mathbf{x})=f(x_1,\ldots,x_p)\) is a real-valued function of the \(p\) variables \(x_1,\ldots,x_p\), whose domain is a subset of \(\mathbb{R}^p\). Let \(\partial f(\mathbf{x})/\partial x_i\) denote the partial derivative of \(f(\mathbf{x})\) with respect to \(x_i\). Then, by definition, the partial derivative of \(f(\mathbf{x})\) with respect to the vector \(\mathbf{x}\) is \[\frac{\partial f(\mathbf{x}) }{ \partial \mathbf{x} } = \left(\begin{array}{c} \partial f(\mathbf{x}) / \partial x_1 \\ \vdots \\ \partial f(\mathbf{x}) / \partial x_p \end{array}\right)\] or in words: the partial derivative of \(f(\mathbf{x})\) with respect to the vector \(\mathbf{x}\) is the vector whose elements are the partial derivatives of \(f(\mathbf{x})\) with respect to the variables \(x_1,\ldots,x_p\).
Suppose now that \(\mathbf{c}=(c_1,\ldots,c_p)^T\) is a vector of constants and \(f(\mathbf{x})=\mathbf{c}^T\mathbf{x}\). Then we have \[\frac{\partial f(\mathbf{x}) }{ \partial \mathbf{x} } = \mathbf{c}\] To prove this, note that \(\mathbf{c}^T\mathbf{x}=\sum_{i=1}^p c_ix_i\), so that \(\partial f(\mathbf{x})/\partial x_i=c_i\) and hence \[\frac{\partial f(\mathbf{x}) }{ \partial \mathbf{x} } = \left(\begin{array}{c} c_1 \\ \vdots \\ c_p\end{array}\right) = \mathbf{c}\] as required.
Note that since \(\mathbf{c}^T\mathbf{x}=\mathbf{x}^T\mathbf{c}\), we immediately have that \[\label{ch1:app1} \frac{ \partial \mathbf{c}^T\mathbf{x} }{\partial \mathbf{x}} = \frac{ \partial \mathbf{x}^T \mathbf{c}}{ \partial \mathbf{x}} = \mathbf{c}\]
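The result \(\partial \mathbf{c}^T\mathbf{x}/\partial\mathbf{x}=\mathbf{c}\) can be checked with a finite-difference approximation; the vectors in the sketch below are chosen arbitrarily.

```python
# Finite-difference check that the gradient of f(x) = c^T x is c.
import numpy as np

c = np.array([1.0, -2.0, 3.0])
f = lambda x: c @ x

x0 = np.array([0.5, 1.0, -1.5])
eps = 1e-6
grad = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad, c))   # True
```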
Now suppose that \(A=(a_{ij})_{i,j=1,\ldots,p}\) is a \(p\times p\) symmetric matrix of constants and write \(A=(\mathbf{a}_1,\ldots,\mathbf{a}_p)\), where \(\mathbf{a}_i\) is the \(i\)-th column of \(A\). Then with \(f(\mathbf{x})=\mathbf{x}^TA\mathbf{x}\) we have \[\frac{\partial f(\mathbf{x}) }{ \partial \mathbf{x} } = 2A\mathbf{x}\] To prove this result, first note that, using the symmetry \(a_{ij}=a_{ji}\), \[\mathbf{x}^TA\mathbf{x}= \sum_{i=1}^p\sum_{j=1}^p a_{ij}x_ix_j= \sum_{i=1}^p a_{ii}x_i^2 +2\sum_{i<j} a_{ij}x_ix_j\] Then \[\frac{ \partial \mathbf{x}^T A\mathbf{x} }{\partial x_i} = 2a_{ii}x_i +2\sum_{j\neq i} a_{ij}x_j = 2 \sum_{j=1}^p a_{ij}x_j = 2\mathbf{a}_i^T\mathbf{x}\] So we get \[\frac{ \partial f(\mathbf{x}) }{\partial \mathbf{x}} = 2\left(\begin{array}{c} \mathbf{a}_1^T\mathbf{x} \\ \vdots \\ \mathbf{a}_p^T\mathbf{x} \end{array}\right) = 2 \left(\begin{array}{c} \mathbf{a}_1^T \\ \vdots \\ \mathbf{a}_p^T\end{array}\right) \mathbf{x} = 2A\mathbf{x}\] So we have established \[\label{ch1:app2} \frac{ \partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}} = 2A\mathbf{x}\] Note that for this result to hold we need \(A\) to be symmetric.
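Similarly, the identity \(\partial \mathbf{x}^TA\mathbf{x}/\partial\mathbf{x}=2A\mathbf{x}\) can be checked by finite differences for an arbitrarily chosen symmetric \(A\) and point \(\mathbf{x}\):

```python
# Finite-difference check that the gradient of f(x) = x^T A x is 2 A x for symmetric A.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, -1.0],
              [0.0, -1.0, 1.0]])       # symmetric
f = lambda x: x @ A @ x

x0 = np.array([1.0, -0.5, 2.0])
eps = 1e-6
grad = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad, 2 * A @ x0))   # True
```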