How p affects your posterior judgements in BHMs

There may be simpler/more elegant ways to show what I show below, but I find that this question comes up often and it’s good to have a toy example handy.

Consider the hierarchical model

$Z = Y + v,$

$Y =X{\beta} + e,$

where for simplicity, we assume that

$v \sim \mathcal{N}(0,I),$

$e \sim \mathcal{N}(0,I).$

How does increasing the number of covariates p affect our posterior judgements on $Y$ and $\beta$? We will analyse the simple case in which the columns of $X$ are mutually orthogonal. Then the answer to this question is:

Let X be an n x p matrix with mutually orthogonal columns, where n = dim(Y) and p = dim($\beta$). Then

• Var(Y | Z) increases with p (in the positive-semidefinite ordering) and
• Var($\beta_i$ | Z), i = 1,…,p, is independent of p.

To illustrate (not prove) why these two claims are true we first re-write the model as

$Z = CV + v,$

$AV = e,$

where $C = [I \quad 0]$, $A = [I \quad -X]$ and $V = [Y^T \quad \beta^T]^T$. In typical applications Z would constitute the observations, Y the hidden state, X the regressors and $\beta$ the weights attached to each component in $X$. For our simple model

$\textrm{Var}(V | Z) = (C^TC + A^TA)^{-1}.$
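Writing the blocks out may make the next two sections easier to follow. With $C = [I \quad 0]$ and $A = [I \quad -X]$ as defined above, the posterior precision of $V$ is

$C^TC + A^TA = \left[ \begin{array}{cc} 2I & -X\\ -X^T & X^TX \end{array} \right],$

where the top-left block corresponds to $Y$ and the bottom-right block to $\beta$. This reading simply adds the precision contributed by each of the two equations; it assumes that no further prior information on $\beta$ is used (a flat prior), which the model above does not state explicitly.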

Var(Y | Z)

Here we are interested in the effect of the number of columns in $X$ on the posterior uncertainty of Y, that is, Var(Y | Z). Using Schur complements on $\textrm{Var}(V | Z)$ we obtain

$\textrm{Var}(Y | Z) = (2I - X(X^TX)^{-1}X^T)^{-1}.$
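The Schur-complement step can be checked numerically. The sketch below is only an illustration: the sizes `n`, `p`, the random seed and the random `X` are arbitrary choices, not part of the original example. It builds $C$ and $A$, inverts $C^TC + A^TA$, and compares the $Y$ block with the closed form above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 2                       # illustrative sizes (assumed)
X = rng.standard_normal((n, p))   # arbitrary full-rank regressors (assumed)

C = np.hstack([np.eye(n), np.zeros((n, p))])  # C = [I  0]
A = np.hstack([np.eye(n), -X])                # A = [I -X]

# Joint posterior covariance of V = [Y; beta]
var_V = np.linalg.inv(C.T @ C + A.T @ A)

# The top-left n x n block is Var(Y | Z)
var_Y_block = var_V[:n, :n]

# Closed form: (2I - X (X^T X)^{-1} X^T)^{-1}
P = X @ np.linalg.solve(X.T @ X, X.T)
var_Y_formula = np.linalg.inv(2.0 * np.eye(n) - P)

match = bool(np.allclose(var_Y_block, var_Y_formula))
print(match)
```

Note that this check does not require the columns of `X` to be orthogonal; orthogonality only enters in the comparison across different p.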

Now, if $X = x_1$, where $x_1$ is a vector (a single covariate), then

$\textrm{Var}_1(Y | Z) = \left(2I - \frac{x_1x_1^T}{\|x_1\|^2}\right)^{-1},$

while if $X = [x_1 \quad x_2]$ (two covariates) then it can be shown that

$\textrm{Var}_2(Y | Z) = \left(2I - \frac{\|x_2\|^2x_1x_1^T + \|x_1\|^2x_2x_2^T - (x_1\cdot x_2)(x_1x_2^T + x_2x_1^T)}{\|x_1\|^2\|x_2\|^2 - (x_1\cdot x_2)^2}\right)^{-1},$

where $x_1 \cdot x_2$ denotes the inner product between $x_1$ and $x_2$.

Note that both $\textrm{Var}_1(Y | Z)$ and $\textrm{Var}_2(Y | Z)$ can be written in the form

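$\textrm{Var}_i(Y | Z) = (2I - B_i)^{-1}.$

Because each $B_i$ is an orthogonal projection matrix, $2I - B_i$ is positive definite, so in the positive-semidefinite ordering $\textrm{Var}_1(Y | Z) \preceq \textrm{Var}_2(Y | Z)$ if and only if $B_1 \preceq B_2$. Now, when $x_1$ and $x_2$ are orthogonal,

$B_2 = \frac{\|x_2\|^2x_1x_1^T + \|x_1\|^2x_2x_2^T}{\|x_1\|^2\|x_2\|^2} = \frac{x_1x_1^T}{\|x_1\|^2} +\frac{x_2x_2^T}{\|x_2\|^2} \succeq B_1 =\frac{x_1x_1^T}{\|x_1\|^2},$

with $B_2 - B_1 = x_2x_2^T/\|x_2\|^2 \neq 0$.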

Therefore the posterior variance of Y increases with the number of orthogonal regressors in X.
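As a quick numerical illustration of this conclusion, the sketch below builds three mutually orthogonal columns and compares Var(Y | Z) for p = 1, 2, 3. The helper name `posterior_cov_Y`, the dimension `n = 6`, the seed and the column scalings are assumptions made for the example only.

```python
import numpy as np

def posterior_cov_Y(X):
    """Var(Y | Z) = (2I - X (X^T X)^{-1} X^T)^{-1} for the toy model."""
    n = X.shape[0]
    P = X @ np.linalg.solve(X.T @ X, X.T)  # projection onto the columns of X
    return np.linalg.inv(2.0 * np.eye(n) - P)

rng = np.random.default_rng(0)
n = 6
# Orthonormal columns from a QR factorisation, rescaled so that the
# columns stay orthogonal but no longer have unit norm.
Q, _ = np.linalg.qr(rng.standard_normal((n, 3)))
X_full = Q * np.array([0.5, 2.0, 3.0])

covs = [posterior_cov_Y(X_full[:, :p]) for p in (1, 2, 3)]

# Total posterior variance of Y for p = 1, 2, 3
traces = [float(np.trace(c)) for c in covs]

# Smallest eigenvalue of Var_{p+1} - Var_p; a value >= 0 (up to rounding)
# means the difference is positive semidefinite.
min_eigs = [float(np.linalg.eigvalsh(covs[k + 1] - covs[k]).min())
            for k in range(2)]

print(traces)
print(min_eigs)
```

Since $B_i$ is a projection, $(2I - B_i)^{-1} = (I + B_i)/2$, so the trace of Var(Y | Z) should equal $(n + p)/2$: each extra orthogonal column adds 1/2 to the total posterior variance, regardless of its scaling.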

Var($\beta$ | Z)

Here we are interested in the effect of the number of columns in $X$ on the posterior uncertainty of $\beta$, that is, Var($\beta$ | Z). Using Schur complements on $\textrm{Var}(V | Z)$ we obtain

$\textrm{Var}(\beta | Z) = 2(X^TX)^{-1},$

and therefore when $p = 1$ (one regressor),

$\textrm{Var}_1(\beta | Z) = \frac{2}{\|x_1\|^2}.$

When $p=2$ we obtain

$\textrm{Var}_2(\beta | Z) = \frac{2}{\|x_1\|^2\|x_2\|^2 - (x_1\cdot x_2)^2} \left[ \begin{array}{cc} \|x_2\|^2 & -x_2\cdot x_1\\ -x_1\cdot x_2 & \|x_1\|^2 \end{array} \right].$

If $x_1$ is orthogonal to $x_2$ then we obtain

$\textrm{Var}_2(\beta | Z) = \left[ \begin{array}{cc} 2/\|x_1\|^2 & 0\\ 0 & 2/\|x_2\|^2 \end{array} \right].$

Note that $\textrm{Var}_1(\beta_1 | Z) =\textrm{Var}_2(\beta_1 | Z) = 2/\|x_1\|^2$, so adding an orthogonal regressor leaves the posterior variance of $\beta_1$ unchanged.
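The same comparison can be run numerically from the joint covariance $(C^TC + A^TA)^{-1}$. In the sketch below the vectors `x1`, `x2`, `x3` and the helper `posterior_cov_beta` are made up for illustration: `x2` is orthogonal to `x1`, while `x3` is not, which shows by contrast what changes when orthogonality fails.

```python
import numpy as np

def posterior_cov_beta(X):
    """beta block of (C^T C + A^T A)^{-1} for the toy model."""
    n, p = X.shape
    C = np.hstack([np.eye(n), np.zeros((n, p))])  # C = [I  0]
    A = np.hstack([np.eye(n), -X])                # A = [I -X]
    var_V = np.linalg.inv(C.T @ C + A.T @ A)
    return var_V[n:, n:]

x1 = np.array([1.0, 2.0, 0.0, -1.0])
x2 = np.array([2.0, -1.0, 3.0, 0.0])  # x1 . x2 = 0 (orthogonal)
x3 = np.array([1.0, 1.0, 0.0, 1.0])   # x1 . x3 = 2 (not orthogonal)

var1 = posterior_cov_beta(x1[:, None])[0, 0]
var2_orth = posterior_cov_beta(np.column_stack([x1, x2]))[0, 0]
var2_corr = posterior_cov_beta(np.column_stack([x1, x3]))[0, 0]

expected = 2.0 / (x1 @ x1)  # 2 / ||x1||^2

print(var1, var2_orth, var2_corr, expected)
```

With the orthogonal second column the variance of $\beta_1$ should match the one-regressor value; with the correlated column the $p = 2$ formula above gives a larger value, since the denominator shrinks by $(x_1\cdot x_3)^2$.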