\SweaveOpts{engine=R, eps=FALSE, pdf=TRUE, width=8, strip.white=all}
\SweaveOpts{keep.source=TRUE}
\SweaveOpts{prefix=TRUE, prefix.string=figs/Covariates, include=TRUE}
\setkeys{Gin}{width=\textwidth}
%% MM: Have we made sure the reader sees this in one place ?
<<>>=
options(show.signif.stars = FALSE,  # <- Doug
        lattice.theme = function() canonical.theme("pdf", color = FALSE),
        str = strOptions(strict.width = "cut"))
library(splines)
library(lattice)
library(Matrix)
library(Rcpp)
library(minqa)
library(lme4)
data(Machines, package = "MEMSS")
data(ergoStool, package = "MEMSS")
.f4 <- "%.4f"
.f1 <- "%.1f"
fm06 <- lmer(effort ~ 1 + (1|Subject) + (1|Type), ergoStool, REML=FALSE)
if (file.exists("pr06.rda")) {
    load("pr06.rda")
} else {
    pr06 <- profile(fm06)
    save(pr06, file = "pr06.rda")
}
fm07 <- lmer(effort ~ 1 + Type + (1|Subject), ergoStool, REML=FALSE)
if (file.exists("pr07.rda")) {
    load("pr07.rda")
} else {
    pr07 <- profile(fm07)
    save(pr07, file = "pr07.rda")
}
f7 <- "%.7f"
f3 <- "%.3f"
if (file.exists("classroom.rda")) {
    load("classroom.rda")
} else {
    classroom <-
        within(read.csv("http://www-personal.umich.edu/~bwest/classroom.csv"),
           {
               classid  <- factor(classid)
               schoolid <- factor(schoolid)
               sex      <- factor(sex, labels = c("M","F"))
               minority <- factor(minority, labels = c("N", "Y"))
           })
    classroom$childid <- NULL
    save(classroom, file = "classroom.rda")
}
if (file.exists("pr7.rda")) {
    load("pr7.rda")
} else {
    fm7 <- lmer(mathgain ~ I(mathkind-450) + sex + minority + ses
                + housepov + (1|classid) + (1|schoolid), classroom)
    pr7 <- profile(fm7)
    save(pr7, file = "pr7.rda")
}
if (file.exists("ratbrain.rda")) {
    load("ratbrain.rda")
} else {
    ratbrain <-
        within(read.delim("http://www-personal.umich.edu/~bwest/rat_brain.dat"),
           {
               treatment <- factor(treatment, labels = c("Basal", "Carbachol"))
               region    <- factor(region, labels = c("BST", "LS", "VDB"))
           })
    save(ratbrain, file = "ratbrain.rda")
}
@
%% Practical issues: Number of levels of a factor
%% Model building - the tendency to over-model
%% Convergence to a singular variance-covariance matrix for the
%% random effects.
\chapter{Building Linear Mixed Models}
\label{chap:Covariates}
%% Topics - interactions
In the previous chapter we incorporated the time covariate of
longitudinal data in both the fixed-effects terms and the
random-effects terms of linear mixed models. In this chapter we will
extend the use of covariates, both numeric and categorical, in linear
mixed models and discuss general approaches to building and assessing
these models.
Statistical model building is still somewhat more of an art than a
science and there are many practical issues that we should bear in
mind while engaged in it. We discuss how some of the optional
arguments to the \code{lmer} function can be used to check for
possible problems.
We begin with a discussion of categorical covariates and how they are
incorporated in the fixed-effects terms.
\section{Incorporating categorical covariates with fixed levels}
\label{sec:categoricalcov}
As we have seen in the earlier chapters, each random-effects term
in a linear mixed model is defined with respect to a \emph{grouping
factor}, which is a categorical covariate whose levels are regarded
as a sample from a population of possible levels. In some cases we
may have other categorical covariates with a fixed set of levels (such
as gender, with fixed levels male and female) that we wish to
incorporate into the model. Although incorporating such covariates
into the model formula is straightforward, the interpretation of
coefficients and the model's structure can be subtle.
\subsection{The \texttt{Machines} Data}
\label{sec:MachinesData}
\citet[Table 23.1]{milliken09:_analy_messy_data} discuss data from an
experiment measuring productivity on a manufacturing task according to
the type of machine used and the operator. No further details on the
experiment are given and it is possible that these data, which are
available as \code{Machines} in the \package{MEMSS} package, were
constructed for illustration and are not observed data from an actual
experiment.
<<>>=
str(Machines)
xtabs(~ Machine + Worker, Machines)
@
The cross-tabulation shows that each of the six operators used each
of the three machines on three occasions, producing replicate
observations of the ``subject-stimulus'' combinations. Although the
operators represent a sample from the population of potential
operators, the three machines are the specific machines of interest.
That is, we regard the levels of \code{Machine} as fixed levels and
the levels of \code{Worker} as a random sample from a population. In
other subject/stimulus studies we may regard the levels of the
stimulus as a random sample from a population of stimuli.
We can see in Fig.~\ref{fig:Machinesplot}
\begin{figure}[tbp]
\centering
<<>>=
print(dotplot(reorder(Worker, score) ~ score, Machines,
              groups = Machine,
              xlab = "Quality and productivity score",
              ylab = "Worker", type = c("p","a"),
              auto.key = list(columns = 3, lines = TRUE),
              jitter.y = TRUE))
@
\caption[Quality score by operator and machine]{A quality and
productivity score for each of six operators (the \code{Worker}
factor) on each of three machine types.}
\label{fig:Machinesplot}
\end{figure}
that, while the scores for each machine-operator combination are
tightly clustered, there are considerable, apparently systematic,
differences between machines and somewhat smaller differences between
operators. The pattern across operators for each of the machines is
similar except for one unusual combination, operator 6 on machine B.
For the other five operators the scores on machine A are less than
those on machine B, which are less than those on machine C, but for
operator 6 the scores on machine B are less than those on machine A.
We expect our models to show a significant interaction between the
\code{Worker} and \code{Machine} factors because of this unusual
pattern. As we will see, we can incorporate such an interaction into
a linear mixed model in two different ways.
\subsection{Comparing models with and without interactions}
We fit and compare three models for these data: \code{fm08} without
interactions, \code{fm09} with vector-valued random effects to allow
for interactions, and \code{fm10} with interactions incorporated into
a second simple scalar random-effects term.
<<>>=
fm08 <- lmer(score ~ Machine + (1|Worker), Machines, REML=FALSE)
fm09 <- lmer(score ~ Machine + (Machine|Worker), Machines, REML=FALSE)
fm10 <- lmer(score ~ Machine + (1|Worker) + (1|Machine:Worker),
             Machines, REML=FALSE)
anova(fm08, fm09, fm10)
@
Notice that in the \code{anova} summary the order of the models has
been rearranged according to the complexity of the models, as measured
by the number of parameters to be estimated. The simplest model,
\code{fm08}, incorporates three fixed-effects parameters and two
variance component parameters --- a total of 5 parameters, the number
displayed in the column labeled \code{Df}. Model \code{fm10}, with
scalar random effects for \code{Worker} and for the
\code{Machine:Worker} combination incorporates one additional variance
component for a total of six parameters, while model \code{fm09} adds
five variance-component parameters to those in \code{fm08}, for a
total of 10 parameters.
In the comparison of models \code{fm08} and \code{fm10} (i.e. the line
labeled \code{fm10} in the \code{anova} table) we see that the
additional parameter is highly significant. The change in the
deviance of \Sexpr{sprintf(.f4,deviance(fm08)-deviance(fm10))} (in the
column labeled \code{Chisq}) at a cost of one additional parameter is
huge; hence we prefer the more complex model \code{fm10}. In the next
line, which is the comparison of the more complex model \code{fm09} to
the simpler model \code{fm10}, the change in the deviance is
\Sexpr{sprintf(.f4,deviance(fm10)-deviance(fm09))} at a cost of 4
additional parameters with a p-value of
\Sexpr{sprintf(.f1,100*pchisq(deviance(fm10)-deviance(fm09),4,lower=FALSE))}\%.
In formal hypothesis tests we establish a boundary, often chosen to
be 5\%, below which we regard the p-value as providing ``significant''
evidence to prefer the more complex model and above which the results
are regarded as representing an ``insignificant'' improvement. Such
boundaries, while arbitrary, help us to assess the numerical
results and here we prefer model \code{fm10}, of intermediate complexity.
A display of this fitted model,
<<>>=
fm10
@
shows that the estimated standard deviations for the \code{Worker}
random-effects and for \code{Machine:Worker} random effects are
comparable and both are larger than the estimated residual standard
deviation. This is consistent with our discussion of the patterns in
Fig.~\ref{fig:Machinesplot}.
Furthermore, a plot of the prediction intervals on the random effects
(Fig.~\ref{fig:fm10ranef})
\begin{figure}[tbp]
\centering
<<>>=
qrr10 <- dotplot(ranef(fm10, condVar = TRUE), strip = FALSE)
print(qrr10[[1]], pos = c(0,0,1,0.75), more = TRUE)
print(qrr10[[2]], pos = c(0,0.65,1,1))
@
\caption[Random effects prediction intervals for model \code{fm10}]{95\%
prediction intervals on the random effects for model
\code{fm10} fit to the \code{Machines} data. The intervals in the
upper panel are those for levels of the \code{Worker} factor. Those
in the lower panel correspond to levels of the \code{Machine:Worker} interaction.}
\label{fig:fm10ranef}
\end{figure}
shows that \code{B:6} is the dominant interaction. In fact, the
prediction interval for this level of the interaction is the only prediction interval
that does not contain zero.
For model \code{fm10} the interpretation of the random effects
parameter estimates and the random effects themselves is similar to
those for the models in Chap.~\ref{chap:Multiple}. The interpretation
of the fixed-effects coefficients is somewhat different.
\subsection{Coefficients for factors in the fixed effects}
\label{sec:coefficients}
The models fit in previous chapters incorporated categorical
covariates, represented as factors, in the random effects but not in
the fixed effects. We have seen that a simple, scalar random effects
term contributes one coefficient to the linear predictor, $\vec
X\vec\beta+\vec Z\vec b$, for each level of the grouping factor. For
many of these models there was a single fixed-effects coefficient,
labeled \code{(Intercept)}, in the parameter $\vec\beta$. This
coefficient represents a typical response value when it is the only
fixed-effect coefficient.
When we have a factor in the fixed-effects part of the model formula,
however, we cannot simultaneously estimate an \code{(Intercept)}
coefficient and ``effects'' for each of the levels of the factor
because there would be a redundancy in the coefficients. Suppose we
had an \code{(Intercept)} coefficient and three separate effects for
the levels of the \code{Machine} factor. If we were to add a
constant, say 2.6, to each of the effects coefficients and subtract
2.6 from the \code{(Intercept)} we would generate the same set of
fitted values and hence the same likelihood. We would not be able to
define a unique set of maximum likelihood estimates,
$\widehat{\vec\beta}$.
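This redundancy is easy to illustrate numerically. The values below
are hypothetical, not taken from a fit to the \code{Machines} data;
the point is only that shifting a constant between the intercept and
the level effects leaves the fitted values, and hence the likelihood,
unchanged.
<<eval=FALSE>>=
## hypothetical intercept and three level effects (not from fm10)
mu    <- 10
alpha <- c(2, 5, 7)
lev   <- c(1, 2, 3, 1, 2, 3)              # factor level of each observation
fit1  <- mu + alpha[lev]                  # fitted values
fit2  <- (mu - 2.6) + (alpha + 2.6)[lev]  # shift 2.6 between the two
all.equal(fit1, fit2)                     # identical fitted values
@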
In \R{} this redundancy is removed by applying \code{contrasts} to the
levels of a factor when it is incorporated in the fixed-effects part
of a linear predictor formula, and we can choose which contrasts are
used when fitting a model. To show the effect of changing the
contrasts we will refit model \code{fm10} with different contrasts for
the \code{Machine} factor and examine the estimated fixed effects. We
can either extract the fixed-effects coefficient estimates by themselves
<<>>=
fixef(fm10)
@
or extract the coefficients and their standard errors and t ratios
<<>>=
coef(summary(fm10))
@
for comparisons. To save on typing we will use \code{update} to fit
the modified models. Recall that \code{update} applied to a fitted
model allows us to re-fit the model with changes that we specify in
subsequent arguments.
First let's consider how the model changes if we suppress the
intercept coefficient, which we can do by appending \code{- 1} to the
model formula. The \code{update} function allows us to do this by
writing a model formula with dots indicating ``what was there in
the original fit''. Thus
<<>>=
(fm10a <- update(fm10, . ~ . - 1))
@
Notice that there are three fixed-effects coefficients in this model
fit, just as there are in model \code{fm10}. All we have done by
suppressing the intercept is to estimate a different set of
coefficients in the fixed effects. The log-likelihood and the various
statistics summarizing the fit are the same --- for example,
<<>>=
AIC(fm10a,fm10)
@
Even the fitted values are the same
<<>>=
all.equal(fitted(fm10a), fitted(fm10))
@
<<>>=
all.equal(getME(fm10, "mu"), getME(fm10a, "mu"))
@
There are several things to note about the fit of model \code{fm10a}
compared to that of model \code{fm10}. When the intercept is
suppressed we can estimate one coefficient for each level of the
factor,
<<>>=
coef(summary(update(fm10, . ~ . + 0)))
@
whereas with an intercept in the model we cannot, because an intercept
plus a full set of level effects is redundant unless we impose
additional \emph{estimability conditions} on the coefficients. We
return to this redundancy, and to the default way in which \R{}
removes it, in Sect.~\ref{sec:contrasts}.

In the models of earlier chapters we fit different configurations of
simple, scalar random effects but always with the same fixed-effects
specification of an intercept, or overall mean response, only.
It is common in practice to have several fixed-effects terms involving
one or more covariates in the specification of a linear mixed model.
Indeed, the purpose of fitting a linear mixed model is often to draw
inferences about the effects of the covariates while appropriately
accounting for different sources of variability in the responses.
In this chapter we demonstrate how fixed-effects terms are
incorporated in a linear mixed model and how inferences about the
effects of the covariates are drawn from a fitted linear mixed model.
\section{Models for the \texttt{ergoStool} data}
\label{sec:ergoStool}
Problems~\ref{pr:2ergoexamine} and \ref{pr:2ergoplot} in
Chap.~\ref{chap:Multiple} involve examining the structure of the
\code{ergoStool} data from the \package{MEMSS} package
<<>>=
str(ergoStool)
@
and plotting these data, as in Fig.~\ref{fig:ergoplot}.
\begin{figure}[tbp]
\centering
<<>>=
print(dotplot(reorder(Subject, effort) ~ effort, ergoStool,
              groups = Type, type = c("p","a"),
              xlab = "Effort to arise (Borg scale)",
              auto.key = list(columns = 4, lines = TRUE)))
@
\caption[Effort to arise by subject and stool type]{Subjective
evaluation of the effort required to arise (on the Borg scale) by 9
subjects, each of whom tried each of four types of stool.}
\label{fig:ergoplot}
\end{figure}
These data are from an ergometrics experiment where nine subjects
evaluated the difficulty to arise from each of four types of stools.
The measurements are on the scale of perceived exertion developed by
the Swedish physician and researcher Gunnar Borg. Measurements on
this scale are in the range 6--20, with lower values indicating less
exertion.
From Fig.~\ref{fig:ergoplot} we can see that all nine subjects
rated type \code{T1} or type \code{T4} as requiring the least
exertion and rated type \code{T2} as requiring the most exertion.
Type \code{T3} was perceived as requiring comparatively little
exertion by some subjects (H and E) and comparatively greater exertion
by others (F, C and G).
Problem~\ref{pr:2ergofitre} involves fitting and evaluating a model in
which the effects of both the \code{Subject} and the \code{Type}
factors are incorporated as random effects. Such a model may not be
appropriate for these data where we wish to make inferences about
these particular four stool types. According to the distinction
between fixed- and random-effects described in Sect.~\ref{sec:memod},
if the levels of the \code{Type} factor are fixed and reproducible
we generally incorporate the factor in the fixed-effects part of the
model.
Before doing so, let's review the results of fitting a linear mixed
model with random effects for both \code{Subject} and \code{Type}.
\subsection{Random-effects for both \texttt{Subject} and \texttt{Type}}
\label{sec:SubjTypeRE}
A model with random effects for both \code{Subject} and \code{Type} is
fit in the same way that we fit such models in Chap.~\ref{chap:longitudinal},
<<>>=
(fm06 <- lmer(effort ~ 1 + (1|Subject) + (1|Type), ergoStool, REML=FALSE))
@
from which we determine that the mean effort to arise, across stool
types and across subjects, is \Sexpr{sprintf(f3, fixef(fm06))} on this
scale, with standard deviations of
\Sexpr{sprintf(f3, attr(VarCorr(fm06)[["Subject"]], "stddev"))}
for the random-effects for the \code{Subject} factor and
\Sexpr{sprintf(f3, attr(VarCorr(fm06)[["Type"]], "stddev"))}
for the \code{Type} factor.
One question we would want to address is whether there are
``significant'' differences between stool types, taking into account
the differences between subjects. We could approach this question by
fitting a reduced model, without random effects for \code{Type}, and
comparing this fit to model \code{fm06} using a likelihood-ratio test.
<<>>=
fm06a <- lmer(effort ~ 1 + (1|Subject), ergoStool, REML=FALSE)
anova(fm06a, fm06)
@
The p-value in this test is very small, indicating that the more
complex model, \code{fm06}, which allows for differences in the effort
to arise for the different stool types, provides a significantly
better fit to the observed data.
In Sect.~\ref{sec:TestingSig2is0} we indicated that, because the
constraint on the reduced model, $\sigma_2=0$, is on the boundary of
the parameter space, the p-value for this likelihood ratio test
statistic calculated using a $\chi^2_1$ reference distribution will be
conservative. That is, the p-value one would obtain by, say,
simulation from the null distribution, would be even smaller than the
p-value, \Sexpr{sprintf(f7, anova(fm06a, fm06)[["Pr(>Chisq)"]][2])},
reported by this test, which is already very small.
Thus the evidence against the null hypothesis ($H_0:\sigma_2 = 0$) and
in favor of the alternative, richer model ($H_a:\sigma_2>0$) is very strong.
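The simulation from the null distribution mentioned above can be
carried out as a parametric bootstrap: simulate response vectors from
the fitted null model, \code{fm06a}, refit both models to each
simulated response with \code{refit}, and compare the observed
likelihood-ratio statistic to the simulated ones. A minimal sketch
(the number of simulations is chosen here for illustration only):
<<eval=FALSE>>=
set.seed(101)                        # for reproducibility
nsim   <- 1000                       # illustrative choice
lrstat <- numeric(nsim)
for (i in seq_len(nsim)) {
    y <- unlist(simulate(fm06a))     # simulate under H0: sigma_2 = 0
    lrstat[i] <- deviance(refit(fm06a, y)) - deviance(refit(fm06, y))
}
## empirical p-value for the observed statistic
mean(lrstat >= deviance(fm06a) - deviance(fm06))
@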
Another way of addressing the question of whether it is
reasonable for $\sigma_2$ to be zero is to profile \code{fm06} and
examine profile zeta plots (Fig.~\ref{fig:pr06zeta})
\begin{figure}[tbp]
\centering
<<>>=
print(xyplot(pr06, aspect=1.3, layout=c(4,1)))
@
\caption[Profile zeta plot for the parameters in model \code{fm06}]
{Profile zeta plot for the parameters in model \code{fm06} fit to the
\code{ergoStool} data.}
\label{fig:pr06zeta}
\end{figure}
and the corresponding profile pairs plot (Fig.~\ref{fig:pr06pairs}).
\begin{figure}[tbp]
\centering
<<>>=
print(splom(pr06))
@
\caption[Profile pairs plot for the parameters in model \code{fm06}]
{Profile pairs plot for the parameters in model \code{fm06} fit to the
\code{ergoStool} data.}
\label{fig:pr06pairs}
\end{figure}
We can see from the profile zeta plot (Fig.~\ref{fig:pr06zeta}) that
both $\sigma_1$, the standard deviation of the \code{Subject} random
effects, and, $\sigma_2$, the standard deviation of the \code{Type}
random effects, are safely non-zero. We also see that $\sigma_2$ is
very poorly determined. That is, a 95\% profile-based confidence
interval on this parameter, obtained as
<<>>=
confint(pr06)[".sig02",]
@
is very wide. The upper end point of this 95\% confidence interval,
\Sexpr{sprintf(f3,confint(pr06)[2,2])}, is more than twice as large as
the estimate, $\widehat{\sigma_2}=$
\Sexpr{sprintf(f3, attr(VarCorr(fm06)[["Type"]], "stddev"))}.
A plot
\begin{figure}[tbp]
\centering
<<>>=
print(dotplot(ranef(fm06, condVar = TRUE), strip=FALSE)[["Type"]])
@
\caption[Prediction intervals on the random effects for stool
type]{95\% prediction intervals on the random effects for \code{Type}
from model \code{fm06} fit to the \code{ergoStool} data.}
\label{fig:fm06dot}
\end{figure}
of the prediction intervals on the random effects for \code{Type}
(Fig.~\ref{fig:fm06dot}) confirms the impression from
Fig.~\ref{fig:ergoplot} regarding the stool types. Type \code{T2}
requires the greatest effort and type \code{T1} requires the least
effort. There is considerable overlap of the prediction intervals for
types \code{T1} and \code{T4} and somewhat less overlap between types
\code{T4} and \code{T3} and between types \code{T3} and \code{T2}.
In an analysis like this we begin by asking if there are any
significant differences between the stool types, which we answered for
this model by testing the hypothesis $H_0:\sigma_2=0$ versus
$H_a:\sigma_2>0$. If we reject $H_0$ in favor of $H_a$ --- that is, if
we conclude that the more complex model including random effects for
\code{Type} provides a significantly better fit than the simpler
model --- then usually we want to follow up with the question, ``Which
stool types are significantly different from each other?''. It is
possible, though not easy, to formulate an answer to that question
from a model fit such as \code{fm06} in which the stool types are
modeled with random effects, but it is more straightforward to address
that question when we model the stool types as fixed-effects
parameters, which we do next.
\subsection{Fixed Effects for \texttt{Type}, Random for \texttt{Subject}}
\label{sec:ergoSubjRE}
To incorporate the \code{Type} factor in the fixed-effects parameters,
instead of as a grouping factor for random effects, we remove the
random-effects term, \code{(1|Type)}, and add \code{Type} to the
fixed-effects specification.
<<>>=
(fm07 <- lmer(effort ~ 1 + Type + (1|Subject), ergoStool, REML = FALSE))
@
It appears that the last three levels of the \code{Type} factor are
now modeled as fixed-effects parameters, in addition to the
\code{(Intercept)} parameter, whose estimate has decreased markedly
from that in model \code{fm06}. Furthermore, the estimates of the
fixed-effects parameters labeled \code{TypeT2}, \code{TypeT3} and
\code{TypeT4}, while positive, are very much smaller than would be
indicated by the average responses for these types.
It turns out, of course, that the fixed-effects parameters generated
by a factor covariate do not correspond to the overall mean and the
effect for each level of the covariate. Although a model for an
experiment such as this is sometimes written in a form like
\begin{equation}
\label{eq:fm07formula}
y_{ij}=\mu + \alpha_i + b_j + \epsilon_{ij},\quad
i=1,\dots,4,\;j=1,\dots 9
\end{equation}
where $i$ indexes the stool type and $j$ indexes the subject, the
parameters $\left\{\mu,\alpha_1,\alpha_2,\alpha_3,\alpha_4\right\}$,
representing the overall mean and the effects of each of the stool
types, are redundant. Given a set of estimates for these
parameters we would not change the predictions from the model if, for
example, we added one to $\mu$ and subtracted one from all the
$\alpha$'s. In statistical terminology we say that this set of
parameters is not \emph{estimable} unless we impose some other
conditions on them. The estimability condition
$\sum_{i=1}^4\alpha_i=0$ is often used in introductory texts.
The approach taken in \R{} is not based on redundant parameters that
are subject to estimability conditions. Although such parameterizations
may initially seem reasonable, in complex models they quickly lead to
the unnecessary complication of constrained optimization for parameter
estimation. Instead we incorporate the constraints into the parameters
that we estimate. That is, we reduce the redundant set of parameters
to an estimable set of \emph{contrasts} between the levels
of the factors.
\subsubsection{The default contrasts generated for a factor}
\label{sec:contrasts}
Although the particular set of contrasts used for a categorical factor
can be controlled by the user, either as a global option for a session
(see \code{?options}) or by the optional \code{contrasts} argument
available in most model-fitting functions, most users do not modify
the contrasts, preferring to leave them at the default setting, which
is the ``treatment'' contrasts (\code{contr.treatment}) for an
unordered factor and orthogonal polynomial contrasts
(\code{contr.poly}) for an ordered factor. You can check the current
global setting with
<<>>=
getOption("contrasts")
@
Because these were the contrasts in effect when model \code{fm07} was
fit, the particular contrasts used for the \code{Type}
factor, which has four levels, correspond to
<<>>=
contr.treatment(4)
@
In this display the rows correspond to the levels of the \code{Type}
factor and the columns correspond to the parameters labeled
\code{TypeT2}, \code{TypeT3} and \code{TypeT4}.
The values of \code{Type} in the data frame, whose first few rows are
<<>>=
head(ergoStool)
@
combined with the contrasts produce the model matrix $\vec X$, whose
first few rows are
<<>>=
head(model.matrix(fm07))
@
We see that the rows of $\vec X$ for observations on stool type \code{T1}
have zeros in the last three columns; the rows for observations on stool
type \code{T2} have a 1 in the second column and zeros in the last two
columns, and so on. As before, the \code{(Intercept)} column is a
column of 1's.
When we evaluate $\vec X\vec\beta$ in the linear predictor expression,
$\vec X\vec\beta+\vec Z\vec b$, we take the $p$ elements of the
fixed-effects parameter vector, $\vec\beta$, whose estimate is
<<>>=
fixef(fm07)
@
and the $p$ elements of a row of the matrix $\vec X$, multiply
corresponding components and sum the resulting products. For example,
the fixed-effects predictor for the first observation (stool type
\code{T1}) will be
\begin{displaymath}
8.5556\times1+3.8889\times0+2.2222\times0+0.6667\times0=8.5556
\end{displaymath}
and the fixed-effects predictor for the second observation (stool type
\code{T2}) will be
\begin{displaymath}
8.5556\times1+3.8889\times1+2.2222\times0+0.6667\times0=12.4444
\end{displaymath}
We see that the parameter labeled \code{(Intercept)} is actually the
fixed-effects prediction for the first level of \code{Type}
(i.e. level \code{T1}) and the second parameter, labeled
\code{TypeT2}, is the difference between the fixed-effects prediction
for the second level (\code{T2}) and the first level (\code{T1}) of
the \code{Type} factor.
Similarly, the fixed-effects predictions for the \code{T3} and
\code{T4} levels of \code{Type} are $8.5556+2.2222=10.7778$ and
$8.5556+0.6667=9.2222$, respectively, as can be verified from
<<>>=
head(as.vector(model.matrix(fm07) %*% fixef(fm07)))
@
The fact that the parameter labeled \code{TypeT2} is the difference
between the fixed-effects prediction for levels \code{T2} and
\code{T1} of the \code{Type} factor is why we refer to the parameters
as being generated by \emph{contrasts}. They are formed by
contrasting the fixed-effects predictions for some combination of the
levels of the factor. In this case the contrast is between levels
\code{T2} and \code{T1}.
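Because this design is balanced, the fixed-effects predictions just
computed coincide with the simple per-type mean responses, which
provides a quick check (with unbalanced data the two would generally
differ):
<<eval=FALSE>>=
## per-type means of effort; in this balanced design these equal the
## fixed-effects predictions for the four stool types
with(ergoStool, tapply(effort, Type, mean))
@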
In general, the parameters generated by the ``treatment'' contrasts
(the default for unordered factors) represent differences between the
first level of the factor, which is incorporated into the
\code{(Intercept)} parameter, and the subsequent levels. We say that
the first level of the factor is the \emph{reference} level and the
others are characterized by their shift relative to this reference
level.
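Other contrast schemes correspond to other estimability conditions.
The ``sum'' contrasts (\code{contr.sum}), for example, implement the
sum-to-zero condition mentioned earlier; for a four-level factor they
are
<<eval=FALSE>>=
## each column sums to zero, encoding the sum-to-zero condition
contr.sum(4)
@
A model such as \code{fm07} could be refit under these contrasts by
passing, e.g., \code{contrasts = list(Type = "contr.sum")} to the
fitting function, although we do not do so here.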
\subsubsection{Profiling the contrasts}
\label{sec:profilingcontrasts}
Because some of the contrasts that are of interest to us correspond to
fixed-effects parameter values, we can profile the fitted model
\begin{figure}[tbp]
\centering
<<>>=
print(xyplot(pr07, aspect=1.3, layout=c(6,1)))
@
\caption[Profile zeta plot for the parameters in model \code{fm07}]
{Profile zeta plot for the parameters in model \code{fm07} fit to the
\code{ergoStool} data.}
\label{fig:pr07zeta}
\end{figure}
(Fig.~\ref{fig:pr07zeta}) and check, say, the confidence intervals on
these parameters.
<<>>=
confint(pr07, c("TypeT2", "TypeT3", "TypeT4"))
@
According to these intervals, and from what we see from
Fig.~\ref{fig:pr07zeta}, types \code{T2} and \code{T3} are
significantly different from type \code{T1} (the intervals do not contain
zero) but type \code{T4} is not (the confidence interval on this
contrast contains zero).
However, this process must be modified in two ways to provide a
suitable answer. The most important modification is to take into
account the fact that we are performing \emph{multiple comparisons}
simultaneously. We describe what this means and how to account for
it in the next subsection. The other problem is that this process
only allows us to evaluate contrasts of the reference level,
\code{T1}, with the other levels and the reference level is
essentially arbitrary. For completeness we should evaluate all six
possible contrasts of pairs of levels.
We can do this by refitting the model with a different reference
level for the \code{Type} factor and profiling the modified model
fit. The \code{relevel} function allows us to change the reference
level of a factor.
<<>>=
pr07a <- profile(lmer(effort ~ 1 + Type + (1|Subject),
                      within(ergoStool, Type <- relevel(Type, "T2")),
                      REML=FALSE))
pr07b <- profile(lmer(effort ~ 1 + Type + (1|Subject),
                      within(ergoStool, Type <- relevel(Type, "T3")),
                      REML=FALSE))
@
The other contrasts of interest are
<<>>=
confint(pr07a, c("TypeT3", "TypeT4"))
confint(pr07b, "TypeT4")
@
from which we would conclude that type \code{T2} requires significantly
greater effort than any of the other types at the 5\% level (because
none of the 95\% confidence intervals on contrasts with \code{T2}
contain zero) and that types \code{T3} and \code{T4} are significantly
different at the 5\% level.
However, we must take into account that we are performing multiple,
simultaneous comparisons of levels.
\subsubsection{Multiple comparisons}
\label{sec:multcomp}
In the technical definition of a confidence interval we regard the end
points as being random (because they are calculated from the random
variable which is the observed data) and the value of the parameter or
the contrast of interest as being fixed (because it is determined from
the fixed, but unknown, values of the parameters). Thus we speak of
the probability that an interval covers the true parameter value
rather than the probability that the parameter falls in the interval.
The distinction may seem, and probably is, somewhat pedantic. We
introduce it here simply to clarify the term ``coverage probability''
used throughout this section.
We have evaluated six possible pairwise comparisons of the four levels
of the \code{Type} factor. A 95\% confidence interval on a particular
contrast has, in theory, a 5\% probability of failing to cover the
true difference. That is, if the difference between two levels was in
fact zero, there would still be a 5\% probability that a 95\%
confidence interval on that contrast would not include zero. When we
consider the coverage of the six intervals contrasting all possible
pairs of stool types we usually have in mind that there should be a
95\% probability of all six intervals covering the true, but unknown,
differences in effort for the stool types. That is, we think of the
coverage probability as applying to the simultaneous coverage of the
family of intervals, not to the coverage of one specific interval.
But the intervals calculated in the previous section were based on
95\% coverage for each individual interval. In the worst case the
family-wise coverage could be as low as $1-6\times 0.05=0.70$ or 70\%.
For factors with more than four levels there are even more possible
pairwise comparisons (for $k$ levels there are $k(k-1)/2$ possible
pairs) and this worst-case coverage probability is even further from
the nominal level of 95\%.
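The deterioration of this worst-case bound is easy to tabulate (the
range of $k$ shown is arbitrary). Once $1-0.05\,k(k-1)/2$ falls below
zero the bound is vacuous: it no longer guarantees any family-wise
coverage at all.
<<eval=FALSE>>=
k <- 4:8
data.frame(levels = k,
           pairs = choose(k, 2),                    # k(k-1)/2 comparisons
           worst.case = 1 - 0.05 * choose(k, 2))    # Bonferroni-style bound
@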
Several methods have been developed to compensate for multiple
comparisons in the analysis of linear models with fixed effects only.
One of the simplest, although somewhat coarse, adjustments is the
Bonferroni correction, in which the individual intervals are chosen
to have a greater coverage probability so that the ``worst-case''
family-wise probability is at the desired level.  With six comparisons
to get a family-wise coverage probability of 95\% the individual
intervals are chosen to have coverage of
<>=
(covrge <- 1 - 0.05/6)
@
or a little more than 99\%. We can specify this coverage level for
the individual intervals to ensure a family-wise coverage of at least 95\%.
<>=
rbind(confint(pr07, c("TypeT2","TypeT3","TypeT4"), covrge),
confint(pr07a, c("TypeT3","TypeT4"), covrge),
confint(pr07b, "TypeT4", covrge))
@
We again reach the conclusion that the only pair of stool types for
which zero is within the confidence interval on the difference in
effects is the (\code{T1},\code{T4}) pair but, for these intervals,
the family-wise coverage of all six intervals is at least 95\%.
There are other, perhaps more effective, techniques for adjusting
intervals to take multiple comparisons into account.  The purpose of
this section is to show that profile-based confidence intervals can
accommodate at least the Bonferroni correction.
The easiest way to apply other multiple comparison adjustment methods
is to model both the \code{Type} and the \code{Subject} factors with
fixed effects, which we do next.
\subsection{Fixed Effects for \texttt{Type} and \texttt{Subject}}
\label{sec:ergofe}
Even though the subjects in this study are chosen as representatives
of a population, many statisticians would regard \code{Subject} as a
\emph{blocking factor} in the experiment and fit a model with
fixed-effects for both \code{Type} and \code{Subject}. A blocking
factor is a known source of variability in the response. We are not
interested in the effects of the levels of the blocking factor---we
only wish to account for this source of variability when comparing
levels of the experimental factor, which is the \code{Type} factor in
this example.
We will discuss the advantages and disadvantages of the fixed- versus
random-effects choice for the \code{Subject} factor at the end of this
section. For the moment we proceed to fit the fixed-effects model,
for which we could use the \code{lm} function or the \code{aov}
function. These two functions produce exactly the same model fit but
the \code{aov} function returns an object of class \code{"aov"} which
extends the class \code{"lm"}, providing more options for examining
the fitted model.
<>=
summary(fm08 <- aov(effort ~ Subject + Type, ergoStool))
@
As seen above, the \code{summary} method for objects of class
\code{"aov"} provides an analysis of variance table. The order in
which the terms are listed in the model formula can affect the results
in this table, if the data are unbalanced, and we should take care to
list the terms in the model in the appropriate order, even for a
balanced data set such as \code{ergoStool}.  The rule is that
blocking factors should precede experimental factors because the
contributions of the terms are assessed sequentially. Thus we read
the rows in this table as measuring the variability due to the
\code{Subject} factor and due to the \code{Type} factor after taking
into account the \code{Subject}. We want to assess the experimental
factor after having removed the variability due to the blocking
factor.
If desired we can assess individual coefficients by applying the
summary method for \code{"lm"} objects, \code{summary.lm}, to
this fitted model.  For example, the coefficients table is available as
<>=
coef(summary.lm(fm08))
@
but often the individual coefficients are of less interest than the
net effect of the variability due to the levels of the factor, as
shown in the analysis of variance table. For example, in the summary
of the coefficients shown above the \code{(Intercept)} coefficient is
the predicted response for the reference subject (subject A) on the
reference stool type (type T1). Other coefficients generated by the
\code{Subject} term are the differences from the reference subject to
other subjects. It is not clear why we would want to compare all the
other subjects to subject A.
One of the multiple comparison methods that we can apply to
\code{fm08} is Tukey's Honest Significant Difference (HSD) method
<>=
TukeyHSD(fm08, which = "Type")
@
from which we reach essentially the same conclusions as before,
although perhaps less arduously.
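For equal group sizes, Tukey's HSD interval half-width is
$q_{1-\alpha;k,\nu}\sqrt{s^2/n}$, where $q_{1-\alpha;k,\nu}$ is a
quantile of the studentized range distribution for $k$ means and $\nu$
residual degrees of freedom.  A sketch of this computation in Python,
using SciPy's \code{studentized\_range} distribution (available in
SciPy 1.7 or later); the \code{mse} value below is invented purely to
exercise the function and is not the \code{ergoStool} residual mean
square:

```python
from math import sqrt
from scipy.stats import studentized_range

def hsd_halfwidth(k, df, mse, n, level=0.95):
    # Half-width of Tukey HSD intervals comparing k group means, each
    # based on n observations, with residual mean square `mse` on
    # `df` degrees of freedom.
    q = studentized_range.ppf(level, k, df)
    return q * sqrt(mse / n)

# Dimensions matching the ergoStool layout: k = 4 stool types,
# n = 9 subjects per type, residual df = 36 - 8 - 3 - 1 = 24.
# The mse value here is hypothetical.
print(hsd_halfwidth(k=4, df=24, mse=1.2, n=9))
```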
\subsection{Fixed Effects Versus Random Effects for \texttt{Subject}}
\label{sec:fevsreSubj}
These three analyses provoke the question of whether to use
fixed-effects parameters or random effects for the \code{Subject}
factor. That is, should \code{Subject} be treated as a blocking
factor or as a sample of levels from a population?  If the sole
purpose of the experiment is to rate the stool types relative to each
other then it may be more convenient to consider \code{Subject} as a
blocking factor in the experiment and incorporate it as the first term
in a fixed-effects model for this completely balanced data set from a
designed experiment. Strictly speaking, the inferences about the
stool types that we would draw apply to these particular nine subjects
only, not to a general population, but in practice it is not too much
of a stretch to think of them as applying to a population
\emph{unless} we want to predict the score that a general member of
the population would give to a particular stool type. We can
formulate the prediction but assessing the variability in the
prediction is difficult when using fixed-effects models.
Random effects are preferred, perhaps even necessary, if we wish to
make inferences that apply to the population of potential users.
Also, when we have unbalanced data or large samples, the flexibility
of mixed models becomes important in stabilizing the estimation of
parameters. The estimation of parameters in fixed-effects models by
least squares is somewhat rigid. In theory it requires that the
columns of the model matrix, $\vec X$, are linearly independent. In
practice, it is very difficult to determine if the columns of a large
model matrix are indeed linearly independent or, equivalently, if the
rank of $\vec X$ is exactly $p$. The best we can do is determine if
the condition number, $\kappa$, of $\vec X$ (the ratio of the largest
to smallest singular values --- see \code{?kappa}) is sufficiently
small to trust the numerical linear algebra results. One way of
thinking of $\kappa$ is as a ``magnification factor'' for numerical
perturbations. In the worst-case scenario, perturbations of relative
size $\epsilon$ in the elements of $\vec y$ result in perturbations
of relative size $\kappa\epsilon$ in the coefficients
$\widehat{\beta}$.
When the number of columns, $p$, of $\vec X$ is small the condition
number tends to be small.
<>=
kappa(fm08, exact = TRUE)
@
However, when $p$ is large the condition number of $\vec X$ tends to
become very large and evaluation of fixed-effects parameter
estimates and their standard errors is an ill-conditioned problem.
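As a concrete illustration of $\kappa$ (a NumPy sketch, separate from
the R code in this chapter; the matrices are artificial), the condition
number is the ratio of the largest to the smallest singular value, and
two nearly collinear columns are enough to inflate it by many orders of
magnitude.

```python
import numpy as np

def cond(X):
    # Condition number: ratio of largest to smallest singular value.
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] / s[-1]

# Orthogonal columns: the best-conditioned case, kappa = 1.
X_ortho = np.eye(5)

# Two nearly collinear columns: kappa blows up as the columns
# approach exact linear dependence.
x = np.linspace(0.0, 1.0, 50)
noise = 1e-8 * np.random.default_rng(1).standard_normal(50)
X_near = np.column_stack([x, x + noise])

print(cond(X_ortho))   # 1.0
print(cond(X_near))    # very large
```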
Calculations involving the random effects model matrix, $\vec Z$, are
not as sensitive to ill-conditioning. The numerical accuracy is
determined by the condition number of the sparse Cholesky factor,
$\vec L_\theta$, defined in (\ref{eq:sparseCholesky1}) which is less
than the condition number of $\Lambda_\theta\trans\vec Z\trans\vec
Z\Lambda_\theta$, even when $\Lambda_\theta$ is singular.
%% base :: .kappa_tri() is doc.ed as "an internal function called by
%% 'kappa.qr' and 'kappa.default'. "
<>=
.kappa_tri(as(as(getME(fm06, "L"), "sparseMatrix"), "matrix"), exact = TRUE)
@
The evaluation of $\vec L_\theta$ is an example of
\emph{regularization} methods for solving ill-posed problems or
preventing overfitting.
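A standard way to see why this acts as a regularization: adding the
identity to a positive semidefinite matrix shifts every eigenvalue up
by one, so the condition number of $\vec A + \vec I$ is at most
$\lambda_{\max} + 1$ even when $\vec A$ itself is singular.  A small
NumPy sketch (the matrix here is artificial, constructed only to
exhibit a rank deficiency):

```python
import numpy as np

def cond_sym(A):
    # Condition number of a symmetric PSD matrix via its eigenvalues.
    w = np.linalg.eigvalsh(A)
    return w[-1] / w[0]

rng = np.random.default_rng(2)
Z = rng.standard_normal((10, 4))
Z[:, 3] = Z[:, 2]      # force a rank deficiency, so Z'Z is singular
A = Z.T @ Z            # symmetric PSD but not positive definite

w = np.linalg.eigvalsh(A)
print(w[0])                       # essentially zero: A is numerically singular
print(cond_sym(A + np.eye(4)))    # finite: bounded by max eigenvalue + 1
```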
\section{Covariates Affecting Mathematics Score Gain}
\label{sec:mathgain}
\citet{west07:_linear_mixed_model} provides comparisons of several
different software systems for fitting linear mixed models by fitting
sample models to different data sets using each of these software
systems. The \code{lmer} function from the \package{lme4} package is
not included in these comparisons because it was still being developed
when that book was written.
In this section we will use \code{lmer} to fit models described in
Chap.~4 of \citet{west07:_linear_mixed_model} to data on the gain in
mathematics scores for students in a selection of classrooms in
several schools.
<>=
str(classroom)
@
The response modeled in Chap.~4 of \citet{west07:_linear_mixed_model}
is \code{mathgain}, the difference between a student's mathematics
score at the end of grade one and at the end of kindergarten. To
allow comparisons with the results in
\citet{west07:_linear_mixed_model} we will use that response in this section.
There is one observation per student. In the terminology of the
multilevel modeling literature the levels of variability correspond to
student (level 1), classroom (level 2) and school (level 3) with the
assumption that classroom is nested within school. The concept of
``levels'' can only be applied to models and data sets in which the
grouping factors of the random effects form a nested sequence. In
crossed or partially crossed configurations there are no clearly
defined levels.
At this point we should check for implicit nesting.  That is, are the
levels of the \code{classid} factor nested within the \code{schoolid}
factor?  We could simply create the interaction factor to guard
against the possibility of implicit nesting, but it saves a bit of
trouble if we check before doing so
<>=
with(classroom, isNested(classid, schoolid))
@
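The logic behind \code{isNested} is simply that each level of the
inner factor must co-occur with exactly one level of the outer factor.
A sketch of that check in Python (the function \code{is\_nested} and
its data are ours, for illustration only):

```python
def is_nested(inner, outer):
    # True if each level of `inner` co-occurs with exactly one level
    # of `outer`, i.e. inner is nested within outer.
    seen = {}
    for i, o in zip(inner, outer):
        if i in seen and seen[i] != o:
            return False
        seen[i] = o
    return True

# Classrooms 1-4 nested within schools A and B.
print(is_nested([1, 1, 2, 3, 3, 4], ["A", "A", "A", "B", "B", "B"]))  # True

# A crossed configuration: classroom 1 appears in both schools.
print(is_nested([1, 1, 2, 2], ["A", "B", "A", "B"]))                  # False
```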
\begin{figure}[tbp]
\centering
<>=
lsch <- names(which(xtabs(~ schoolid, unique(subset(classroom, select=c(schoolid,classid)))) == 5))
funiq <- function(x) if(is.factor(x)) factor(x) else x
cl <- within(do.call(data.frame,lapply(subset(classroom, schoolid %in% lsch),
funiq)),
{
sch <- reorder(schoolid, mathgain)
cls <- reorder(reorder(classid, mathgain), as.numeric(schoolid))
})
print(dotplot(cls ~ mathgain | sch, cl, pch = 21,
strip = FALSE, strip.left = TRUE, layout = c(1, 12),
scales = list(y = list(relation = "free")),
ylab = "Classroom within school", type = c("p", "a"),
xlab = "Gain in Mathematics score from kindergarten to first grade",
jitter.y = TRUE))
@
\caption{Comparative dotplots of gain in the mathematics scores in classrooms within schools.}
\label{fig:classroomdot}
\end{figure}
A model with simple, scalar random effects and without any
fixed-effects terms (other than the implicit intercept) is called the
``unconditional model'' in the multilevel modeling literature. We fit
it as
<>=
(fm09 <- lmer(mathgain ~ (1|classid) + (1|schoolid), classroom))
@
The results from this model fit using the REML criterion can be
compared to Table 4.6 (page 156) of
\citet{west07:_linear_mixed_model}.
It seems that the \code{housepov} value is a property of the school.
We can check this by considering the number of unique combinations of
\code{housepov} and \code{schoolid} and comparing that to the number
of levels of \code{schoolid}. For safety we check the number of
levels of \code{factor(schoolid)} in case there are unused levels in
\code{schoolid}.
<>=
with(classroom, length(levels(factor(schoolid))))
nrow(unique(subset(classroom, select = c(schoolid, housepov))))
@
In some formulations of multilevel models or hierarchical linear
models it is important to associate covariates with particular levels
in the hierarchy.  In the model fit below, \code{mathkind}, \code{sex},
\code{minority} and \code{ses} are student-level covariates, while
\code{housepov} is a school-level covariate.
<>=
(fm7 <- lmer(mathgain ~ 1 + I(mathkind-450) + sex + minority + ses
+ housepov + (1|classid) + (1|schoolid), classroom))
@
A profile plot of the parameters in model \code{fm7}
\begin{figure}[tbp]
\centering
<>=
print(xyplot(pr7, absVal=0, aspect=1.3, layout=c(5,2)))
@
\caption{Profile plot of the parameters in model \code{fm7}.}
\label{fig:fm4prplot}
\end{figure}
is shown in Fig.~\ref{fig:fm4prplot}.
\section{Rat Brain example}
\label{sec:RatBrain}
<>=
ftable(xtabs(activate ~ animal + treatment + region, ratbrain))
@
Description of the Rat Brain data should go here.
\begin{figure}[tbp]
\centering
<>=
print(dotplot(region ~ activate | treatment, groups = animal,
ratbrain, type = c("p","a"), layout = c(1,2),
strip=FALSE, strip.left = TRUE,
auto.key = list(space = "right", lines = TRUE)))
@
\caption[Activation of brain regions in rats]{Activation of brain
regions in rats.}
\label{fig:Ratbraindot}
\end{figure}