\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=8,strip.white=all}
\SweaveOpts{keep.source=TRUE}
\SweaveOpts{prefix=TRUE,prefix.string=figs/Long,include=TRUE}
\setkeys{Gin}{width=\textwidth}
<>=
options(width=74, show.signif.stars = FALSE,
lattice.theme = function() canonical.theme("pdf", color = FALSE),
str = strOptions(strict.width = "cut"))
library(splines)
library(lattice)
library(Matrix)
library(Rcpp)
library(minqa)
library(lme4)
fm17 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject),
sleepstudy, REML=FALSE)
if (file.exists("pr17.rda")) {
load("pr17.rda")
} else {
pr17 <- profile(fm17)
save(pr17, file = "pr17.rda")
}
data(Orthodont, package = "MEMSS")
@
\chapter{Models for Longitudinal Data}
\label{chap:longitudinal}
Longitudinal data consist of repeated measurements on the same subject
(or some other ``experimental unit'') taken over time. Generally we
wish to characterize the time trends within subjects and between
subjects. The data will always include the response, the time
covariate and the indicator of the subject on which the measurement
has been made. If other covariates are recorded, say whether the
subject is in the treatment group or the control group, we may wish to
relate the within- and between-subject trends to such covariates.
In this chapter we introduce graphical and statistical techniques
for the analysis of longitudinal data by applying them to a simple example.
\section{The \texttt{sleepstudy} Data}
\label{sec:sleep}
\citet{belenky03:_patter} report on a study of the effects of sleep
deprivation on reaction time for a number of subjects chosen from a
population of long-distance truck drivers. These subjects were
divided into groups that were allowed only a limited amount of sleep
each night. We consider here the group of 18 subjects who were
restricted to three hours of sleep per night for the first ten days of
the trial. Each subject's reaction time was measured several times on
each day of the trial.
<>=
str(sleepstudy)
@
In this data frame, the response variable \code{Reaction}, is the average
of the reaction time measurements on a given subject for a given day.
The two covariates are \code{Days}, the number of days of sleep
deprivation, and \code{Subject}, the identifier of the subject on
which the observation was made.
\begin{figure}[tbp]
\centering
<>=
print(xyplot(Reaction ~ Days | Subject, sleepstudy, aspect = "xy",
layout = c(6,3), type = c("g", "p", "r"),
index.cond = function(x,y) coef(lm(y ~ x))[1],
xlab = "Days of sleep deprivation",
ylab = "Average reaction time (ms)"))
@
\caption[Lattice plot of the \code{sleepstudy} data]{A lattice plot of
the average reaction time versus number of days of sleep deprivation
by subject for the \code{sleepstudy} data. Each subject's data are
shown in a separate panel, along with a simple linear regression
line fit to the data in that panel. The panels are ordered, from
left to right along rows starting at the bottom row, by increasing
intercept of these per-subject linear regression lines. The subject
number is given in the strip above the panel.}
\label{fig:sleepxyplot}
\end{figure}
As recommended for any statistical analysis, we begin by plotting the
data. The most important relationship to plot for longitudinal data on
multiple subjects is the trend of the response over time by subject,
as shown in Fig.~\ref{fig:sleepxyplot}. This plot, in which the
data for different subjects are shown in separate panels with the axes
held constant for all the panels, allows for examination of the
time-trends within subjects and for comparison of these patterns
between subjects. Through the use of small panels in a repeating
pattern Fig.~\ref{fig:sleepxyplot} conveys a great deal of
information, the individual time trends for 18 subjects over 10 days
--- a total of 180 points --- without being overly cluttered.
\subsection{Characteristics of the \texttt{sleepstudy} Data Plot}
\label{sec:DataPlotChar}
The principles of ``Trellis graphics'', developed by Bill
Cleveland\index{Cleveland,William} and his coworkers at Bell Labs and
implemented in the \package{lattice} package for \R{} by Deepayan
Sarkar\index{Sarkar,Deepayan}, have been incorporated in this plot.
As stated above, all the panels have the same vertical and horizontal
scales, allowing us to evaluate the pattern over time for each subject
and also to compare patterns between subjects. The line drawn in each
panel is a simple least squares line fit to the data in that panel
only. It is provided to enhance our ability to discern patterns in
both the slope (the typical change in reaction time per day of sleep
deprivation for that particular subject) and the intercept (the
average response time for the subject when on their usual sleep
pattern).
The aspect ratio of the panels (ratio of the height to the width) has
been chosen, according to an algorithm described in
\citet{cleveland93:_visual_data}, to facilitate comparison of slopes.
The effect of choosing the aspect ratio in this way is to have the
slopes of the lines on the page distributed around $\pm 45^\circ$,
thereby making it easier to detect systematic changes in slopes.
The panels have been ordered (from left to right starting at the
bottom row) by increasing intercept. Because the subject identifiers,
shown in the strip above each panel, are unrelated to the response it
would not be helpful to use the default ordering of the panels, which
is by increasing subject number. If we did so our perception of
patterns in the data would be confused by the, essentially random,
ordering of the panels. Instead we use a characteristic of the data to
determine the ordering of the panels, thereby enhancing our ability to
compare across panels. For example, a question of interest to the
experimenters is whether a subject's rate of change in reaction time
is related to the subject's initial reaction time. If this were the
case we would expect that the slopes would show an increasing trend
(or, less likely, a decreasing trend) in the left to right, bottom to
top ordering.
There is little evidence in Fig.~\ref{fig:sleepxyplot} of such a
systematic relationship between the subject's initial reaction time
and their rate of change in reaction time per day of sleep
deprivation. We do see that for each subject, except 335, reaction
time increases, more-or-less linearly, with days of sleep deprivation.
However, there is considerable variation both in the initial reaction
time and in the daily rate of increase in reaction time. We can also
see that these data are balanced, both with respect to the number of
observations on each subject, and with respect to the times at which
these observations were taken. This can be confirmed by
cross-tabulating \code{Subject} and \code{Days}.
<>=
xtabs(~ Subject + Days, sleepstudy)
@
In cases like this where there are several observations (10) per
subject and a relatively simple within-subject pattern (more-or-less
linear) we may want to examine coefficients from within-subject
fixed-effects fits. However, because the subjects constitute a sample
from the population of interest and we wish to draw conclusions about
typical patterns in the population and the subject-to-subject
variability of these patterns, we will eventually want to fit mixed
models and we begin by doing so. In section~\ref{sec:fm17re} we will
compare estimates from a mixed-effects model with those from the
within-subject fixed-effects fits.
\section{Mixed-effects Models For the \texttt{sleepstudy} Data}
\label{sec:SleepMixed}
Based on our preliminary graphical exploration of these data, we fit a
mixed-effects model with two fixed-effects parameters, the intercept
and slope of the linear time trend for the population, and two random
effects for each subject. The random effects for a particular subject
are the deviations in intercept and slope of that subject's time trend
from the population values.
We will fit two linear mixed models to these data. One model,
\code{fm16}, allows for correlation (in the unconditional distribution)
of the random effects for the same subject. That is, we allow for the
possibility that, for example, subjects with higher initial reaction
times may, on average, be more strongly affected by sleep deprivation.
The second model provides independent (again, in the unconditional
distribution) random effects for intercept and slope for each subject.
\subsection{A Model With Correlated Random Effects }
\label{sec:correlatedre}
The first model is fit as
<>=
(fm16 <- lmer(Reaction ~ 1 + Days + (1 + Days|Subject), sleepstudy,
REML=FALSE))
@
From the display we see that this model incorporates both an
intercept and a slope (with respect to \code{Days}) in the fixed
effects and in the random effects. Extracting the conditional modes
of the random effects
<>=
head(ranef(fm16)[["Subject"]])
@
confirms that these are \emph{vector-valued} random effects. There
are a total of $q=36$ random effects, two for each of the 18 subjects.
The random effects section of the model display,
<>=
cat(paste(capture.output(print(fm16))[8:11], collapse = "\n"), "\n")
@
indicates that there will be a random effect for the intercept and a
random effect for the slope with respect to \code{Days} at each level
of \code{Subject} and, furthermore, the unconditional distribution of
these random effects, $\bc B\sim\mathcal{N}(\vec 0,\Sigma)$, allows
for correlation of the random effects for the same subject.
\begin{figure}[tbp]
\centering
<>=
print(image(getME(fm16, "Lambda"), sub = NULL, xlab = expression(Lambda),
ylab = NULL),
split = c(1,1,3,1), more = TRUE)
print(image(tcrossprod(getME(fm16, "Lambda")), sub = NULL,
xlab = expression(Sigma), ylab = NULL),
split = c(2,1,3,1), more = TRUE)
print(image(getME(fm16, "L"), sub = NULL, xlab = "L", ylab = NULL),
split = c(3,1,3,1))
@
\caption{Images of $\Lambda$, $\Sigma$ and $\vec L$ for model \code{fm16}.}
\label{fig:fm16LambdaL}
\end{figure}
We can confirm the potential for correlation of random effects within
subject in the images of $\Lambda$, $\Sigma$ and $\vec L$ for this
model (Fig.~\ref{fig:fm16LambdaL}). The matrix $\Lambda$ has 18
triangular blocks of size 2 along the diagonal, generating 18 square,
symmetric blocks of size 2 along the diagonal of $\Sigma$. The 18
symmetric blocks on the diagonal of $\Sigma$ are identical. Overall
we estimate two standard deviations and a correlation for a
vector-valued random effect of size 2, as shown in the model summary.
Often the variances and the covariance of random effects are quoted,
rather than the standard deviations and the correlation shown here.
We have already seen that the variance of a random effect is a poor
scale on which to quote the estimate because confidence intervals on
the variance are so badly skewed. It is more sensible to assess the
estimates of the standard deviations of random effects or, possibly, the
logarithms of the standard deviations if we can be confident that 0
is outside the region of interest. We do display the estimates of the
variances of the random effects but mostly so that the user can
compare these estimates to those from other software or for cases
where an estimate of a variance is expected (sometimes even required)
to be given when reporting a mixed model fit.
We do not quote estimates of covariances of vector-valued random
effects because the covariance is a difficult scale to interpret,
whereas a correlation has a fixed scale. A correlation must be
between $-1$ and $1$, allowing us to conclude that a correlation
estimate close to those extremes indicates that $\Sigma$ is close to
singular and the model is not well formulated.
The estimates of the fixed effects parameters are
$\widehat{\vec\beta}=(251.41,10.467)\trans$. These represent a typical
initial reaction time (i.e. without sleep deprivation) in the
population of about 250 milliseconds, or 1/4 sec., and a typical
increase in reaction time of a little more than 10 milliseconds per
day of sleep deprivation.
The estimated subject-to-subject variation in the intercept
corresponds to a standard deviation of about 25 ms. A 95\% prediction
interval on this random variable would be approximately $\pm 50$ ms.
Combining this range with a population estimated intercept of 250
ms. indicates that we should not be surprised by intercepts as low as 200
ms. or as high as 300 ms. This range is consistent with the reference
lines shown in Figure~\ref{fig:sleepxyplot}.
Similarly, the estimated subject-to-subject variation in the slope
corresponds to a standard deviation of about 5.7 ms./day so we would not
be surprised by slopes as low as $10.5 - 2\cdot 5.7=-0.9$ ms./day or as
high as $10.5 + 2\cdot 5.7=21.9$ ms./day. Again, the conclusions from
these rough, ``back of the envelope'' calculations are consistent with
our observations from Fig.~\ref{fig:sleepxyplot}.
The estimated residual standard deviation is about 25 ms. leading us to
expect a scatter around the fitted lines for each subject of up to
$\pm 50$ ms. From Figure~\ref{fig:sleepxyplot} we can see that some
subjects (309, 372 and 337) appear to have less variation than $\pm
50$ ms. about their within-subject fit but others (308, 332 and 331)
may have more.
Finally, we see the estimated within-subject correlation of the random
effect for the intercept and the random effect for the slope is very
low, $0.081$, confirming our impression that there is little evidence
of a systematic relationship between these quantities. In other
words, observing a subject's initial reaction time does not give us
much information for predicting whether their reaction time will be
strongly affected by each day of sleep deprivation or not.
It seems reasonable that we could get nearly as good a fit from a
model that does not allow for correlation, which we describe next.
\subsection{A Model With Uncorrelated Random Effects}
\label{sec:uncorrelatedre}
In a model with uncorrelated random effects we have
$\bc B\sim\mathcal{N}(\vec 0,\Sigma)$ where $\Sigma$ is diagonal. We
have seen models like this in previous chapters but those models had
simple scalar random effects for all the grouping factors. Here
we want to have a simple scalar random effect for \code{Subject} and a
random effect for the slope with respect to \code{Days}, also indexed
by \code{Subject}. We accomplish this by specifying two
random-effects terms. The first, \code{(1|Subject)}, is a simple
scalar term. The second has \code{Days} on the left hand side of the
vertical bar.
It may seem that the model formula we want should be
<>=
Reaction ~ 1 + Days + (1|Subject) + (Days|Subject)
@
but it is not. Because the intercept is implicit in linear models,
the second random effects term is equivalent to
\code{(1+Days|Subject)} and will, by itself, produce correlated,
vector-valued random effects.
We must suppress the implicit intercept in the second random-effects
term, which we do by writing it as \code{(0+Days|Subject)}, read as
``no intercept and \code{Days} by \code{Subject}''. An alternative
expression for \code{Days} without an intercept by \code{Subject} is
\code{(Days - 1 | Subject)}. Using the first form we have
<>=
(fm17 <- lmer(Reaction ~ 1 + Days + (1|Subject) + (0+Days|Subject),
sleepstudy, REML=FALSE))
@
As in model \code{fm16}, there are two random effects for each subject
<>=
head(ranef(fm17)[["Subject"]])
@
but no correlation has been estimated
<>=
cat(paste(capture.output(print(fm17))[8:11], collapse = "\n"), "\n")
@
The \code{Subject} factor is repeated in the ``Groups'' column because
there were two distinct terms generating these random effects and
these two terms had the same grouping factor.
\begin{figure}[tbp]
\centering
<>=
print(image(getME(fm17, "Lambda"), sub = NULL, xlab = expression(Lambda),
ylab = NULL),
split = c(1,1,3,1), more = TRUE)
print(image(tcrossprod(getME(fm17, "Lambda")), sub = NULL,
xlab = expression(Sigma), ylab = NULL),
split = c(2,1,3,1), more = TRUE)
print(image(getME(fm17, "L"), sub = NULL, xlab = "L", ylab = NULL),
split = c(3,1,3,1))
@
\caption[Images of $\Lambda$, $\Sigma$ and $\vec L$ for model
\code{fm17}]{Images of $\Lambda$, the relative covariance factor,
$\Sigma$, the variance-covariance matrix of the random effects, and
$\vec L$, the sparse Cholesky factor, in model \code{fm17}.}
\label{fig:fm17LambdaL}
\end{figure}
Images of the matrices $\Lambda$, $\Sigma$ and $\vec L$
(Fig.~\ref{fig:fm17LambdaL}) show that $\Sigma$ is indeed diagonal.
The order of the random effects in $\Sigma$ and $\Lambda$ for model
\code{fm17} is different from the order in model \code{fm16}. In model
\code{fm16} the two random effects for a particular subject were
adjacent. In model \code{fm17} all the intercept random effects occur
first then all the \code{Days} random effects. The sparse Cholesky
decomposition, $\vec L$, has the same form in both models because the
fill-reducing permutation (described in
Sect.~\ref{sec:fill-reducingP}) calculated for model \code{fm17}
provides a post-ordering to group random effects with similar
structure in $\vec Z$.
\begin{figure}[tbp]
\centering
<>=
print(image(getME(fm16, "Zt"), sub = NULL, xlab = NULL, ylab = NULL,
scales = list(x = list(labels = NULL))),
pos = c(0,0.38,1,1), more = TRUE)
print(image(getME(fm17, "Zt"), sub = NULL, xlab = NULL, ylab = NULL,
scales = list(x = list(labels = NULL))),
pos = c(0,0,1,0.62))
@
\caption[Images of $\vec Z\trans$ for models \code{fm16} and
\code{fm17}]{Images of $\vec Z\trans$ for models \code{fm16} (upper
panel) and \code{fm17} (lower panel).}
\label{fig:fm169Zt}
\end{figure}
Images of $\vec Z\trans$ for these two models (Fig.~\ref{fig:fm169Zt})
shows that the columns of $\vec Z$ (rows of $\vec Z\trans$) from one
model are the same those from the other model but in a different
order.
\subsection{Generating $\vec Z$ and $\Lambda$ From Random-effects Terms}
\label{sec:GeneratingZLambda}
Let us consider these columns in more detail, starting with the columns
of $\vec Z$ for model \code{fm17}. The first 18 columns (rows in the
bottom panel of Fig.~\ref{fig:fm169Zt}) are the indicator columns for
the \code{Subject} factor, as we would expect from the simple, scalar
random-effects term \code{(1|Subject)}. The pattern of zeros and
non-zeros in the second group of 18 columns is determined by the
indicators of the grouping factor, \code{Subject}, and the values of
the non-zeros are determined by the \code{Days} covariate. In other
words, these columns are formed by the \emph{interaction} of the
numeric covariate, \code{Days}, and the categorical covariate,
\code{Subject}.
The non-zero values in the model matrix $\vec Z$ for model \code{fm16}
are the same as those for model \code{fm16} but the columns are in a
different order. Pairs of columns associated with the same level of
the grouping factor are adjacent. One way to think of the process of
generating these columns is to extend the idea of an interaction
between a single covariate and the grouping factor to generating an
``interaction'' of a model matrix and the levels of the grouping
factor. In other words, we begin with the two columns of the model
matrix for the expression \code{1 + Days} and the 18 columns of
indicators for the \code{Subject} factor. The result will have 36
columns that we regard as 18 adjacent pairs. The values within
each of these pairs of columns are the values of the \code{1 + Days}
columns, when the indicator is 1, otherwise they are zero.
We can now describe the general process of creating the model matrix,
$\vec Z$, and the relative covariance factor, $\Lambda$ from the
random-effects terms in the model formula. Each random-effects term
is of the form \code{(expr|fac)}. The expression \code{expr} is
evaluated as a linear model formula, producing a model matrix with $s$
columns. The expression \code{fac} is evaluated as a factor. Let $k$
be the number of levels in this factor, after eliminating unused
levels, if any. The $i$th term generates $s_ik_i$ columns in the
model matrix, $\vec Z$, and a diagonal block of size $s_ik_i$ in the
relative covariance factor, $\Lambda$. The $s_ik_i$ columns in $\vec
Z$ have the pattern of the interaction of the $s_i$ columns from the
$i^{th}$ \code{expr} with the $k_i$ indicator columns for the $i$th
grouping factor \code{fac}. The diagonal block in $\Lambda$ is itself
block diagonal, consisting of $k_i$ blocks, each a lower triangular
matrix of size $s_i$. In fact, these inner blocks are repetitions of
the same lower triangular $s_i\times s_i$ matrix. The $i^{th}$ term
contributes $s_i(s_i+1)/2$ elements to the variance-component
parameter, $\vec\theta$, and these are the elements in the lower
triangle of this $s_i\times s_i$ template matrix.
Note that when counting the columns in a model matrix we must take
into account the implicit intercept term. For example, we could write
the formula for model \code{fm16} as
<>=
Reaction ~ Days + (Days|Subject)
@
realizing that the linear model expression, \code{Days}, actually
generates two columns because of the implicit intercept.
Whether or not to include an explicit intercept term in a model
formula is a matter of personal taste. Many people prefer to write
the intercept explicitly so as to emphasize the relationship between
terms in the formula and coefficients or random effects in the model.
Others omit these implicit terms so as to economize on the amount of
typing required. Either approach can be used. The important point to
remember is that the intercept must be explicitly suppressed when you
don't want it in a term.
Also, the intercept term must be explicit when it is the only term
in the expression. That is, a simple, scalar random-effects term must
be written as \code{(1|fac)} because a term like \code{(|fac)} is not
syntactically correct. However, we can omit the intercept from the
fixed-effects part of the model formula if we have any random-effects
terms. That is, we could write the formula for model \code{fm01} in
Chap.~\ref{chap:ExamLMM} as
<>=
Yield ~ (1|Batch)
@
or even
<>=
Yield ~ 1|Batch
@
although omitting the parentheses around a random-effects term is
risky. Because of operator precedence, the vertical bar operator,
\code{|}, takes essentially everything in the expression to the left
of it as its first operand. It is advisable always to enclose such
terms in parentheses so the scope of the operands to the \code{|}
operator is clearly defined.
\subsection{Comparing Models \texttt{fm17} and \texttt{fm16}}
\label{sec:comparingfm16fm17}
Returning to models \code{fm16} and \code{fm17} for the
\code{sleepstudy} data, it is easy to see that these are nested models
because \code{fm16} is reduced to \code{fm17} by constraining the
within-group correlation of random effects to be zero (which is
equivalent to constraining the element below the diagonal in the
$2\times 2$ lower triangular blocks of $\Lambda$ in
Fig.~\ref{fig:fm16LambdaL} to be zero).
We can use a likelihood ratio test to compare these fitted models.
<>=
anova(fm17, fm16)
@
The value of the $\chi^2$ statistic, $0.0639$, is very small,
corresponding to a p-value of $0.80$ and indicating that the extra
parameter in model \code{fm16} relative to \code{fm17} does not produce
a significantly better fit. By the principal of parsimony we prefer
the reduced model, \code{fm17}.
This conclusion is consistent with the visual impression provided by
Fig.~\ref{fig:sleepxyplot}. There does not appear to be a strong
relationship between a subject's initial reaction time and the extent
to which his or her reaction time is affected by sleep deprivation.
In this likelihood ratio test the value of the parameter being tested,
a correlation of zero, is not on the boundary of the parameter space.
We can be confident that the p-value from the LRT adequately reflects
the underlying situation.
(\textbf{Note}: It is possible to extend profiling to the correlation
parameters and we will do so but that has not been done yet.)
\section{Assessing the Precision of the Parameter Estimates}
\label{sec:assess-prec-param}
Plots of the profile $\zeta$
\begin{figure}[tbp]
\centering
%% pr17 -- wrong labels? "Type T2"
<>=
print(xyplot(pr17, aspect = 1.3, layout = c(5,1)))
@
\caption[Profile zeta plots for the parameters in model
\code{fm17}]{Profile zeta plot for each of the parameters in model
\code{fm17}. The vertical lines are the endpoints of 50\%, 80\%,
90\%, 95\% and 99\% profile-based confidence intervals for each
parameter.}
\label{fig:pr17plt}
\end{figure}
for the parameters in model \code{fm17} (Fig.~\ref{fig:pr17plt}) show
that confidence intervals on $\sigma_1$ and $\sigma_2$ will be slightly
skewed; those for $\log(\sigma)$ will be symmetric and
well-approximated by methods based on quantiles of the standard normal
distribution and those for the fixed-effects parameters, $\beta_1$ and
$\beta_2$ will be symmetric and slightly over-dispersed relative to
the standard normal. For example, the 95\% profile-based confidence
intervals are
<>=
confint(pr17)
@
The profile pairs plot (Fig.~\ref{fig:pr17pairs})
\begin{figure}[tbp]
\centering
<>=
print(splom(pr17))
@
\caption[Profile pairs plot for the parameters in model
\code{fm17}]{Profile pairs plot for the parameters in model \code{fm17}.
The contour lines correspond to marginal 50\%, 80\%, 90\%, 95\% and
99\% confidence regions based on the likelihood ratio. Panels below
the diagonal represent the $(\zeta_i,\zeta_j)$ parameters; those above
the diagonal represent the original parameters.}
\label{fig:pr17pairs}
\end{figure}
shows, for the most part, the usual patterns. First, consider the
panels below the diagonal, which are on the $(\zeta_i,\zeta_j)$
scales. The $\zeta$ pairs for $\log(\sigma)$ and $\beta_0$, in the
$(4,3)$ panel, and for $\log(\sigma)$ and $\beta_1$, in the $(5,3)$
panel, show the ideal pattern. The profile traces are straight and
orthogonal, producing interpolated contours on the $\zeta$ scale that
are concentric circles centered at the origin. When mapped back to
the scales of $\log(\sigma)$ and $\beta_0$ or $\beta_1$, in panels
$(3,4)$ and $(3,5)$, these circles become slightly distorted, but this
is only due to the moderate nonlinearity in the profile $\zeta$ plots
for these parameters.
Examining the profile traces on the $\zeta$ scale for $\log(\sigma)$
versus $\sigma_1$, the $(3,1)$ panel, or versus $\sigma_2$, the
$(3,2)$ panel, and for $\sigma_1$ versus $\sigma_2$, the $(2,1)$ panel,
we see that close to the estimate the traces are orthogonal but as one
variance component becomes small there is usually an increase in the
others. In some sense the total variability in the response will be
partitioned across the contribution of the fixed effects and the
variance components. In each of these panels the fixed-effects
parameters are at their optimal values, conditional on the values of
the variance components, and the variance components must compensate
for each other. If one is made smaller then the others must become
larger to compensate.
The patterns in the $(4,1)$ panel ($\sigma_1$ versus $\beta_0$, on the
$\zeta$ scale) and the $(5,2)$ panel ($\sigma_2$ versus $\beta_1$, on
the $\zeta$ scale) are what we have come to expect. As the
fixed-effects parameter is moved from its estimate, the standard
deviation of the corresponding random effect increases to compensate.
The $(5,1)$ and $(4,2)$ panels show that changing the value of a fixed
effect doesn't change the estimate of the standard deviation of the
random effects corresponding to the other fixed effect, which makes
sense although the perfect orthogonality shown here will probably not
be exhibited in models fit to unbalanced data.
In some ways the most interesting panels are those for the pair of
fixed-effects parameters: $(5,4)$ on the $\zeta$ scale and $(4,5)$ on
the original scale. The traces are not orthogonal. In fact the
slopes of the traces at the origin of the $(5,4)$ ($\zeta$ scale)
panel are the correlation of the fixed-effects estimators ($-0.194$
for this model) and its inverse. However, as we move away from the
origin on one of the traces in the $(5,4)$ panel it curves back toward
the horizontal axis (for the horizontal trace) or the vertical axis
(for the vertical trace). In the $\zeta$ scale the individual
contours are still concentric ellipses but their eccentricity changes
from contour to contour. The innermost contours have greater
eccentricity than the outermost contours. That is, the outermost
contours are more like circles than are the innermost contours.
In a fixed-effects model the shapes of projections of deviance
contours onto pairs of fixed-effects parameters are consistent. In a
fixed-effects model the profile traces in the original scale will
always be straight lines. For mixed models these traces can fail to
be linear, as we see here, contradicting the widely-held belief that
inferences for the fixed-effects parameters in linear mixed models,
based on T or F distributions with suitably adjusted degrees of
freedom, will be completely accurate. The actual patterns of deviance
contours are more complex than that.
%\clearpage << MM's suggestion
\section{Examining the Random Effects and Predictions}
\label{sec:fm17re}
The result of applying \code{ranef} to fitted linear mixed model is a
list of data frames. The components of the list correspond to the
grouping factors in the random-effects terms, not to the terms
themselves. Model \code{fm17} is the first model we have fit with more
than one term for the same grouping factor where we can see the
combination of random effects from more than one term.
<>=
str(rr1 <- ranef(fm17))
@
The \code{plot} method
\begin{figure}[tbp]
\centering
<>=
print(plot(rr1, type = c("g","p"), aspect = 1)$Subject,
split = c(1,1,2,1), more = TRUE)
print(plot(coef(fm17), type = c("g","p"), aspect = 1)$Subject,
split = c(2,1,2,1))
@
\caption{Plot of the conditional modes of the random effects for model \code{fm17} (left panel) and the corresponding subject-specific coefficients (right panel).}
\label{fig:ranefcoeffm17}
\end{figure}
for \code{"ranef.mer"} objects produces one plot for each grouping
factor. For scalar random effects the plot is a normal probability
plot. For two-dimensional random effects, including the case of two
scalar terms for the same grouping factor, as in this model, the plot
is a scatterplot. For three or more random effects per level of the
grouping factor, the plot is a scatterplot matrix. The left hand
panel in Fig.~\ref{fig:ranefcoeffm17} was created with
\code{plot(ranef(fm17))}.
The \code{coef} method for a fitted \code{lmer} model combines the
fixed-effects estimates and the conditional modes of the random
effects, whenever the column names of the random effects correspond to
the names of coefficients. For model \code{fm17} the fixed-effects
coefficients are \code{(Intercept)} and \code{Days} and the columns of
the random effects match these names. Thus we can calculate some kind
of per-subject ``estimates'' of the slope and intercept and plot them,
as in the right hand panel of Fig.~\ref{fig:ranefcoeffm17}. By comparing
the two panels in Fig.~\ref{fig:ranefcoeffm17} we can see that the
result of the \code{coef} method is simply the conditional modes of
the random effects shifted by the coefficient estimates.
It is not entirely clear how we should interpret these values. They
are a combination of parameter estimates with the modal values of
random variables and, as such, are in a type of ``no man's land'' in
the probability model. (In the Bayesian
approach~\citep{box73:_bayes_infer_statis_analy} to inference,
however, both the parameters and the random effects are random
variables and the interpretation of these values is straightforward.)
Despite the difficulties of interpretation in the probability model,
these values are of interest because they determine the fitted
response for each subject.
\begin{figure}[tbp]
\centering
<>=
df <- coef(lmList(Reaction ~ Days | Subject, sleepstudy))
fclow <- subset(df, `(Intercept)` < 251)
fchigh <- subset(df, `(Intercept)` > 251)
cc1 <- as.data.frame(coef(fm17)$Subject)
names(cc1) <- c("A", "B")
df <- cbind(df, cc1)
ff <- fixef(fm17)
with(df,
print(xyplot(`(Intercept)` ~ Days, aspect = 1,
x1 = B, y1 = A,
panel = function(x, y, x1, y1, subscripts, ...) {
panel.grid(h = -1, v = -1)
x1 <- x1[subscripts]
y1 <- y1[subscripts]
larrows(x, y, x1, y1, type = "closed", length = 0.1,
angle = 15, ...)
lpoints(x, y,
pch = trellis.par.get("superpose.symbol")$pch[2],
col = trellis.par.get("superpose.symbol")$col[2])
lpoints(x1, y1,
pch = trellis.par.get("superpose.symbol")$pch[1],
col = trellis.par.get("superpose.symbol")$col[1])
lpoints(ff[2], ff[1],
pch = trellis.par.get("superpose.symbol")$pch[3],
col = trellis.par.get("superpose.symbol")$col[3])
ltext(fclow[,2], fclow[,1], row.names(fclow),
adj = c(0.5, 1.7))
ltext(fchigh[,2], fchigh[,1], row.names(fchigh),
adj = c(0.5, -0.6))
},
key = list(space = "top", columns = 3,
text = list(c("Mixed model", "Within-group", "Population")),
points = list(col = trellis.par.get("superpose.symbol")$col[1:3],
pch = trellis.par.get("superpose.symbol")$pch[1:3]))
)))
@ %$
\caption[Comparison of within-subject estimates and conditional modes
for \code{fm17}]{Comparison of the within-subject estimates of the
intercept and slope for each subject and the conditional modes of
the per-subject intercept and slope. Each pair of points joined by
an arrow are the within-subject estimates and conditional modes of
the random effects for a particular subject. The arrow points from the
within-subject estimate to the conditional mode for the
mixed-effects model. The subject identifier number is at the tail
of each arrow.}
\label{fig:shrinkage}
\end{figure}
Because responses for each individual are recorded on each of ten days
we can determine the within-subject estimates of the slope and
intercept (that is, the slope and intercept of each of the lines in
Fig.~\ref{fig:sleepxyplot}). In Fig.~\ref{fig:shrinkage} we compare
the within-subject least squares estimates to the per-subject slope
and intercept calculated from model \code{fm17}. We see that, in
general, the per-subject slopes and intercepts from the mixed-effects
model are closer to the population estimates than are the
within-subject least squares estimates. This pattern is sometimes
described as a \emph{shrinkage} of coefficients toward the population
values.
The term ``shrinkage'' may have negative connotations. John Tukey
chose to characterize this process in terms of the estimates for
individual subjects ``borrowing strength'' from each other. This is a
fundamental difference in the models underlying mixed-effects models
versus strictly fixed-effects models. In a mixed-effects model we
assume that the levels of a grouping factor are a selection from a
population and, as a result, can be expected to share characteristics
to some degree. Consequently, the predictions from a mixed-effects
model are attenuated relative to those from strictly fixed-effects
models.
\begin{figure}[tbp]
\centering
<>=
print(xyplot(Reaction ~ Days | Subject, sleepstudy, aspect = "xy",
layout = c(6,3), type = c("g", "p", "r"),
coef.list = df[,3:4],
panel = function(..., coef.list) {
panel.xyplot(...)
panel.abline(as.numeric(coef.list[packet.number(),]),
col.line = trellis.par.get("superpose.line")$col[2],
lty = trellis.par.get("superpose.line")$lty[2]
)
panel.abline(fixef(fm17),
col.line = trellis.par.get("superpose.line")$col[4],
lty = trellis.par.get("superpose.line")$lty[4]
)
},
index.cond = function(x,y) coef(lm(y ~ x))[1],
xlab = "Days of sleep deprivation",
ylab = "Average reaction time (ms)",
key = list(space = "top", columns = 3,
text = list(c("Within-subject", "Mixed model", "Population")),
lines = list(col = trellis.par.get("superpose.line")$col[c(1:2,4)],
lty = trellis.par.get("superpose.line")$lty[c(1:2,4)]))))
@
\caption[Comparison of predictions from separate fits and
\code{fm17}]{Comparison of the predictions from the within-subject fits
with those from the conditional modes of the subject-specific
parameters in the mixed-effects model.}
\label{fig:shrinkfit}
\end{figure}
The predictions from model \code{fm17} and from the within-subject
least squares fits for each subject are shown in
Fig.~\ref{fig:shrinkfit}. It may seem that the shrinkage from the
per-subject estimates toward the population estimates depends only on
how far the per-subject estimates (solid lines) are from the
population estimates (dot-dashed lines). However, careful examination
of this figure shows that there is more at work here than a simple
shrinkage toward the population estimates proportional to the distance
of the per-subject estimates from the population estimates.
It is true that the mixed model estimates for a particular subject are
``between'' the within-subject estimates and the population estimates,
in the sense that the arrows in Fig.~\ref{fig:shrinkage} all point
somewhat in the direction of the population estimate. However, the
extent of the attenuation of the within-subject estimates toward the
population estimates is not simply related to the distance between
those two sets of estimates. Consider the two panels, labeled 330 and
337, at the top right of Fig.~\ref{fig:shrinkfit}. The within-subject
estimates for 337 are quite unlike the population estimates but the
mixed-model estimates are very close to these within-subject
estimates. That is, the solid line and the dashed line in that panel
are nearly coincident and both are a considerable distance from the
dot-dashed line. For subject 330, however, the dashed line is
more-or-less an average of the solid line and the dot-dashed line,
even though the solid and dot-dashed lines are not nearly as far apart
as they are for subject 337.
The difference between these two cases is that the within-subject
estimates for 337 are very well determined. Even though this subject
had an unusually large intercept and slope, the overall pattern of the
responses is very close to a straight line. In contrast, the overall
pattern for 330 is not close to a straight line so the within-subject
coefficients are not well determined. The multiple $R^2$ for the
solid line in the 337 panel is $93.3\%$ but in the 330 panel it is
only $15.8\%$. The mixed model can pull the predictions in the 330
panel, where the data are quite noisy, closer to the population line
without increasing the residual sum of squares substantially. When
the within-subject coefficients are precisely estimated, as in the 337
panel, very little shrinkage takes place.
We see from Fig.~\ref{fig:shrinkfit} that the mixed-effects model
smooths out the between-subject differences in the predictions by
bringing them closer to a common set of predictions, but not at the
expense of dramatically increasing the sum of squared residuals. That
is, the predictions are determined so as to balance fidelity to the
data, measured by the residual sum of squares, with simplicity of the
model. The simplest model would use the same prediction in each panel
(the dot-dashed line) and the most complex model, based on linear
relationships in each panel, would correspond to the solid lines. The
dashed lines are between these two extremes. We will return to this
view of the predictions from mixed models balancing complexity versus
fidelity in Sect.~\ref{sec:IntegratingH}, where we make the
mathematical nature of this balance explicit.
We should also examine
\begin{figure}[tbp]
%% FIXME: ranef(fm17, condVar = TRUE) currently throws an error.
\centering
<>=
print(dotplot(ranef(fm16,post=TRUE),
scales = list(x = list(relation = "free")))[[1]])
@
\caption{Prediction intervals on the random effects for model \code{fm17}.}
\label{fig:caterpillar}
\end{figure}
the prediction intervals on the random effects
(Fig.~\ref{fig:caterpillar}) where we see that many prediction
intervals overlap zero but there are several that do not. In this
plot the subjects are ordered from bottom to top according to
increasing conditional mode of the random effect for
\code{(Intercept)}. The resulting pattern in the conditional modes of
the random effect for \code{Days} reinforces our conclusion that the
model \code{fm17}, which does not allow for correlation of the random
effects for \code{(Intercept)} and \code{Days}, is suitable.
\section{Chapter Summary}
\section*{Problems}
\addcontentsline{toc}{section}{Problems}
\begin{prob}\label{probL1}
Check the structure of documentation, structure and a summary of the
\code{Orthodont} data set from the \package{MEMSS} package.
\begin{compactenum}[(a)]
\item Create an \code{xyplot} of the \code{distance} versus
\code{age} by \code{Subject} for the female subjects
only. You can use the optional argument \code{subset = Sex ==
"Female"} in the call to \code{xyplot} to achieve this. Use
the optional argument \code{type = c("g","p","r")} to add
reference lines to each panel.
\item Enhance the plot by choosing an aspect ratio for which the
typical slope of the reference line is around 45$^o$. You can set
it manually (something like \code{aspect = 4}) or with an
automatic specification (\code{aspect = "xy"}). Change the layout
so the panels form one row (\code{layout = c(11,1)}).
\item Order the panels according to increasing response at age 8.
This is achieved with the optional argument \code{index.cond}
which is a function of arguments \code{x} and \code{y}. In this
case you could use \code{index.cond = function(x,y) y[x == 8]}.
Add meaningful axis labels. Your final plot should be like
\begin{center}
<>=
print(xyplot(distance ~ age|Subject, Orthodont, subset = Sex == "Female",
index.cond = function(x,y) y[x == 8],
aspect = 'xy', layout = c(11,1), type = c("g","p","r"),
xlab = "Age (yr)",
ylab = "Distance from pituitary to pterygomaxillary fissure (mm)"))
@
\end{center}
\item Fit a linear mixed model to the data for the females only with
random effects for the intercept and for the slope by subject,
allowing for correlation of these random effects within subject.
Relate the fixed effects and the random effects' variances and
covariances to the variability shown in the figure.
\item Produce a ``caterpillar plot'' of the random effects for
intercept and slope. Does the plot indicate correlated random effects?
\item Consider what the Intercept coefficient and random effects
represents. What will happen if you center the ages by
subtracting 8 (the baseline year) or 11 (the middle of the age range)?
\item Repeat for the data from the male subjects.
\end{compactenum}
\end{prob}
\begin{prob}
\item Fit a model to both the female and the male subjects in the
\code{Orthodont} data set, allowing for differences by sex in the
fixed-effects for intercept (probably with respect to the centered
age range) and slope.
\end{prob}