How p affects your posterior judgements in BHMs

There may be simpler/more elegant ways to show what I show below, but I find that this question comes up often and it’s good to have a toy example handy.

Consider the hierarchical model

Z = Y + v,

Y = X\beta + e,

where for simplicity, we assume that

v \sim \mathcal{N}(0,I),

e \sim \mathcal{N}(0,I).

How does increasing the number of covariates p affect our posterior judgements on Y and \beta? We will analyse the simple case in which the columns of X are all orthogonal. In that case the answer is:

Let X be an n x p matrix, where n = dim(Y) and p = dim(\beta), and suppose the columns of X are orthogonal. Then

  • Var(Y | Z) increases with p (in particular, its trace strictly increases) and
  • Var(\beta_i | Z), i = 1,…,p, is independent of p.
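
A quick way to see both claims numerically is to plug orthogonal columns into the closed-form posterior covariances derived below, \textrm{Var}(Y | Z) = (2I - X(X^TX)^{-1}X^T)^{-1} and \textrm{Var}(\beta | Z) = 2(X^TX)^{-1}. The following minimal numpy sketch does exactly that (the dimensions, seed and column norms are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20

# Orthogonal columns with different norms: take columns of a QR
# factorisation of a random matrix and rescale them.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

def posterior_covs(X):
    """Closed forms derived below: Var(Y | Z) and Var(beta | Z)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    var_Y = np.linalg.inv(2 * np.eye(X.shape[0]) - X @ XtX_inv @ X.T)
    var_beta = 2 * XtX_inv
    return var_Y, var_beta

for p in range(1, 6):
    X = Q[:, :p] * (1.0 + np.arange(p))      # p orthogonal columns
    var_Y, var_beta = posterior_covs(X)
    print(p, round(np.trace(var_Y), 3), np.round(np.diag(var_beta), 3))
# trace(Var(Y | Z)) grows with p, while the leading diagonal entries
# of Var(beta | Z) are unchanged as orthogonal columns are added.
```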

To illustrate (not prove) why these two claims are true, we first rewrite the model as

Z = CV + v,

AV = e,

where C = [I \quad 0], A = [I \quad -X] and V = [Y^T \quad \beta^T]^T. In typical applications Z would constitute the observations, Y the hidden state, X the regressors and \beta the weights attached to each column of X. For our simple model (with an implicit flat prior on \beta), the posterior precision of V is simply the sum of the contributions C^TC and A^TA from the two layers, so that

\textrm{Var}(V | Z) = (C^TC + A^TA)^{-1}.
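
The posterior precision C^TC + A^TA has a simple block structure which the Schur-complement manipulations below exploit. Here is a minimal numpy sketch (dimensions and seed arbitrary) that builds C and A from a random X and checks this block form:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
X = rng.standard_normal((n, p))

# C = [I 0] and A = [I -X] act on the stacked vector V = [Y; beta]
C = np.hstack([np.eye(n), np.zeros((n, p))])
A = np.hstack([np.eye(n), -X])

precision = C.T @ C + A.T @ A            # posterior precision of V

# Block structure used below: [[2I, -X], [-X^T, X^T X]]
expected = np.block([[2 * np.eye(n), -X],
                     [-X.T,          X.T @ X]])
assert np.allclose(precision, expected)

var_V = np.linalg.inv(precision)         # Var(V | Z)
```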

Var(Y | Z)

Here we are interested in the effect of the number of columns in X on the posterior uncertainty of Y, that is, Var(Y | Z). Computing the Y-block of \textrm{Var}(V | Z) via the Schur complement (block-inverse) formula, we obtain

\textrm{Var}(Y | Z) = (2I - X(X^TX)^{-1}X^T)^{-1}.
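
As a sanity check, here is a minimal numpy sketch (dimensions and seed arbitrary) confirming that this Schur-complement expression matches the Y-block of (C^TC + A^TA)^{-1}:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 3
X = rng.standard_normal((n, p))

# Posterior precision of V = [Y; beta] in block form
precision = np.block([[2 * np.eye(n), -X],
                      [-X.T,          X.T @ X]])
var_V = np.linalg.inv(precision)

# Schur-complement expression for the Y-block
var_Y = np.linalg.inv(2 * np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T)

assert np.allclose(var_V[:n, :n], var_Y)
```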

Now, if X = x_1, where x_1 is a vector (a single covariate), then

\textrm{Var}_1(Y | Z) = \left(2I - \frac{x_1x_1^T}{\|x_1\|^2}\right)^{-1},

while if X = [x_1 \quad x_2] (two covariates) then it can be shown that

\textrm{Var}_2(Y | Z) = \left(2I - \frac{\|x_2\|^2x_1x_1^T + \|x_1\|^2x_2x_2^T - (x_1\cdot x_2)(x_1x_2^T + x_2x_1^T)}{\|x_1\|^2\|x_2\|^2 - (x_1\cdot x_2)^2}\right)^{-1},

where x_1 \cdot x_2 denotes the inner product between x_1 and x_2.
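
The p = 2 expression is easy to verify numerically; the sketch below (random x_1, x_2, arbitrary seed) checks it against the general formula 2I - X(X^TX)^{-1}X^T:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack([x1, x2])

# General expression with two covariates
general = 2 * np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

# Expanded p = 2 expression from the text
d = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
B2 = ((x2 @ x2) * np.outer(x1, x1) + (x1 @ x1) * np.outer(x2, x2)
      - (x1 @ x2) * (np.outer(x1, x2) + np.outer(x2, x1))) / d
assert np.allclose(general, 2 * np.eye(n) - B2)

var_Y2 = np.linalg.inv(general)          # Var_2(Y | Z)
```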

Note that both \textrm{Var}_1(Y | Z) and \textrm{Var}_2(Y | Z) can be written in the form

\textrm{Var}_i(Y | Z) = (2I - B_i)^{-1}.

Therefore \textrm{Var}_1(Y | Z) \preceq \textrm{Var}_2(Y | Z) if and only if B_1 \preceq B_2, where \preceq denotes the positive semi-definite (Loewner) ordering. Now, when x_1 and x_2 are orthogonal,

B_2 = \frac{\|x_2\|^2x_1x_1^T + \|x_1\|^2x_2x_2^T}{\|x_1\|^2\|x_2\|^2} = \frac{x_1x_1^T}{\|x_1\|^2} + \frac{x_2x_2^T}{\|x_2\|^2} = B_1 + \frac{x_2x_2^T}{\|x_2\|^2} \succeq B_1.

Since the increment x_2x_2^T/\|x_2\|^2 is a nonzero positive semi-definite matrix, the posterior variance of Y increases with the number of orthogonal regressors in X.
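
The same behaviour holds beyond p = 2. The minimal sketch below uses orthonormal columns (an arbitrary choice; any orthogonal columns would do) and checks that each additional column adds a positive semi-definite increment to Var(Y | Z):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthonormal columns

def var_Y(X):
    m = X.shape[0]
    return np.linalg.inv(2 * np.eye(m) - X @ np.linalg.inv(X.T @ X) @ X.T)

prev = var_Y(Q[:, :1])
for p in range(2, 7):
    cur = var_Y(Q[:, :p])
    # Each additional orthogonal regressor adds a PSD increment:
    # all eigenvalues of the difference are (numerically) non-negative.
    assert np.linalg.eigvalsh(cur - prev).min() > -1e-10
    prev = cur
```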

Var(\beta | Z)

Here we are interested in the effect of the number of columns in X on the posterior uncertainty of \beta, that is, Var(\beta | Z). Computing the \beta-block of \textrm{Var}(V | Z) via the Schur complement (block-inverse) formula, we obtain

\textrm{Var}(\beta | Z) = 2(X^TX)^{-1},

and therefore when p = 1 (one regressor),

\textrm{Var}_1(\beta | Z) = \frac{2}{\|x_1\|^2}.
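
As before, a minimal numpy check (dimensions and seed arbitrary) confirms that 2(X^TX)^{-1} is indeed the \beta-block of (C^TC + A^TA)^{-1}:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 8, 3
X = rng.standard_normal((n, p))

precision = np.block([[2 * np.eye(n), -X],
                      [-X.T,          X.T @ X]])
var_V = np.linalg.inv(precision)

# Schur complement for the beta-block:
# (X^T X - X^T (2I)^{-1} X)^{-1} = 2 (X^T X)^{-1}
var_beta = 2 * np.linalg.inv(X.T @ X)
assert np.allclose(var_V[n:, n:], var_beta)
```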

When p=2 we obtain

\textrm{Var}_2(\beta | Z) = \frac{2}{\|x_1\|^2\|x_2\|^2 - (x_1\cdot x_2)^2} \left[ \begin{array}{cc} \|x_2\|^2 & -x_1\cdot x_2\\ -x_1\cdot x_2 & \|x_1\|^2 \end{array} \right].

If x_1 is orthogonal to x_2 then we obtain

\textrm{Var}_2(\beta | Z) = \left[ \begin{array}{cc} 2/\|x_1\|^2 & 0\\ 0 & 2/\|x_2\|^2 \end{array} \right].

Note that \textrm{Var}_1(\beta_1 | Z) = \textrm{Var}_2(\beta_1 | Z) = 2/\|x_1\|^2: the marginal posterior variance of each weight is unchanged when further orthogonal regressors are added, i.e. Var(\beta_i | Z) is independent of p.
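
The same invariance holds for larger p, as the minimal sketch below illustrates with orthogonal columns of different (arbitrary) norms: the diagonal of Var(\beta | Z) does not change as further orthogonal columns are appended.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 12
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
cols = Q[:, :6] * np.array([1.0, 2.0, 0.5, 3.0, 1.5, 2.5])   # orthogonal, different norms

for p in range(1, 7):
    X = cols[:, :p]
    var_beta = 2 * np.linalg.inv(X.T @ X)
    print(p, np.round(np.diag(var_beta), 3))
# Each diagonal entry equals 2 / ||x_i||^2 and is unchanged as more
# orthogonal columns are added: Var(beta_i | Z) does not depend on p.
```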