Stata Practical Multilevel

Module 5 (Stata Practical): Introduction to Multilevel Modelling

Module 5: Introduction to Multilevel Modelling
Stata Practical

George Leckie 1
Centre for Multilevel Modelling

Pre-requisites

• Modules 1-4

Contents

Introduction to the Scottish Youth Cohort Trends Dataset
P5.1 Comparing Groups using Multilevel Modelling
  P5.1.1 A multilevel model of attainment with school effects
  P5.1.2 Examining school effects (residuals)
P5.2 Adding Student-level Explanatory Variables: Random Intercept Models
P5.3 Allowing for Different Slopes across Schools: Random Slope Models
  P5.3.1 Testing for random slopes
  P5.3.2 Interpretation of random cohort effects across schools
  P5.3.3 Examining intercept and slope residuals for schools
  P5.3.4 Between-school variance as a function of cohort
  P5.3.5 Adding a random coefficient for gender (dichotomous x)
  P5.3.6 Adding a random coefficient for social class (categorical x)
P5.4 Adding Level 2 Explanatory Variables
  P5.4.1 Contextual effects
  P5.4.2 Cross-level interactions
P5.5 Complex Level 1 Variation
  P5.5.1 Within-school variance as a function of cohort (continuous x)
  P5.5.2 Within-school variance as a function of gender (dichotomous x)
  P5.5.3 Within-school variance as a function of cohort and gender
P5.6 References

1 This Stata practical is adapted from the corresponding MLwiN practical: Steele, F. (2008) Module
5: Introduction to Multilevel Modelling. LEMMA VLE, Centre for Multilevel Modelling. Accessed at
http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13.

Centre for Multilevel Modelling, 2010

Introduction

Some of the sections within this module have online quizzes for you to test your
understanding. To find the quizzes:

EXAMPLE
From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.1 Comparing Groups Using Multilevel Modelling" to open Lesson 5.1
• Click Q1 to open the first question

Introduction to the Scottish Youth Cohort Trends Dataset

You will be analysing data from the Scottish School Leavers Survey (SSLS), a
nationally representative survey of young people. We use data from seven cohorts
of young people collected in the first sweep of the study, carried out at the end of
the final year of compulsory schooling (aged 16-17) when most sample members
had taken Standard grades.2

In the practical for Module 3 on multiple regression, we considered the predictors
of attainment in Standard grades (subject-based examinations, typically taken in
up to eight subjects). In this practical, we extend the (previously single-level)
multiple regression analysis to allow for dependency of exam scores within schools
and to examine the extent of between-school variation in attainment. We also
consider the effects on attainment of several school-level predictors.

The dependent variable is a total attainment score. Each subject is graded on a
scale from 1 (highest) to 7 (lowest) and, after recoding so that a high numeric
value denotes a high grade, the total is taken across subjects. The analysis
dataset contains the student-level variables considered in Module 3 together with a
school identifier and three school-level variables:

Variable name   Description and codes
caseid          Anonymised student identifier
schoolid        Anonymised school identifier
score           Point score calculated from awards in Standard grades taken at age 16.
                Scores range from 0 to 75, with a higher score indicating a higher
                attainment
cohort90        The sample includes the following cohorts: 1984, 1986, 1988, 1990,
                1996 and 1998. The cohort90 variable is calculated by subtracting
                1990 from each value. Thus values range from -6 (corresponding to
                1984) to 8 (1998), with 1990 coded as zero
female          Sex of student (1 = female, 0 = male)
sclass          Social class, defined as the higher class of mother or father
                (1 = managerial and professional, 2 = intermediate, 3 = working, 4 =
                unclassified)
schtype         School type, distinguishing independent schools from state-funded
                schools (1 = independent, 0 = state-funded)
schurban        Urban-rural classification of school (1 = urban, 0 = town or rural)
schdenom        School denomination (1 = Roman Catholic, 0 = non-denominational)

There are 33,988 students in 508 schools.

2 We are grateful to Linda Croxford (Centre for Educational Sociology, University of Edinburgh) for
providing us with these data. The dataset was constructed as part of an ESRC-funded project on
Education and Youth Transitions in England, Wales and Scotland 1984-2002.
Further analyses of the data can be found in Croxford, L. and Raffe, D. (2006) “Education Markets
and Social Class Inequality: A Comparison of Trends in England, Scotland and Wales”. In R. Teese
(Ed.) Inequality Revisited. Berlin: Springer.

P5.1 Comparing Groups using Multilevel Modelling

Load “5.1.dta” into memory and open the do-file for this lesson:

From within the LEMMA Learning Environment
Go to Module 5: Introduction to Multilevel Modelling, and scroll down to
Stata Datasets and Do-files
Click “5.1.dta” to open the dataset

and use the describe command to produce a summary of the dataset:

. describe

Contains data from 5.1.dta
  obs:        33,988
 vars:             9                          3 Sep 2009 09:31
 size:       713,748 (99.9% of memory free)
--------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------
caseid          float  %9.0g                  Case ID
schoolid        int    %9.0g                  School ID
score           byte   %9.0g                  Score
cohort90        byte   %9.0g                  Cohort
female          byte   %9.0g                  Female
sclass          byte   %9.0g                  Social class
schtype         byte   %9.0g                  School type
schurban        byte   %9.0g                  School urban-rural classification
schdenom        byte   %9.0g                  School denomination
--------------------------------------------------------------------------------
Sorted by:
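
The sample sizes and the cohort coding described for this dataset can be checked directly once the data are in memory. The following is a minimal sketch, assuming 5.1.dta has been loaded; the variable names oneschool and year are hypothetical and are introduced here only for the check.

```stata
* Assumes 5.1.dta is in memory.
* oneschool and year are hypothetical variable names used only for this check.

* Number of students (records in the dataset)
count

* Tag one record per school, then count the tagged records to count schools
egen oneschool = tag(schoolid)
count if oneschool == 1

* Undo the centring of cohort on 1990 and tabulate the calendar years
generate year = cohort90 + 1990
tabulate year
```

If the data match the description, the two counts should agree with the 33,988 students and 508 schools quoted for this dataset.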

P5.1.1 A multilevel model of attainment with school effects

We will start with the simplest multilevel model which allows for school effects on
attainment, but without explanatory variables. This ‘null’ model may be written

$score_{ij} = \beta_0 + u_{0j} + e_{ij}$

where $score_{ij}$ is the attainment of student $i$ in school $j$, $\beta_0$ is the overall mean
across schools, $u_{0j}$ is the effect of school $j$ on attainment, and $e_{ij}$ is a student-level
residual. The school effects $u_{0j}$, which we will also refer to as school (or
level 2) residuals, are assumed to follow a normal distribution with mean zero and
variance $\sigma^2_{u0}$.

Stata’s main command for fitting multilevel models for continuous response
variables is the xtmixed command.3 To fit the above model using the xtmixed
command, we type: xtmixed score || schoolid:, mle variance nostderr.

The response variable (score) follows the command and is in turn followed by the
list of fixed part explanatory variables (excluding the constant as this is included
by default4). The above model contains only an intercept and so no fixed part
explanatory variables are specified. The level 2 random part of the model is
specified after two vertical bars ||. The level 2 identifier (schoolid) is specified
first, followed by a colon and then the list of random part explanatory variables
(again excluding the constant as this is included by default). The mle option is
used to request maximum likelihood estimation (as opposed to the default of
restricted maximum likelihood estimation). The variance option reports the
variances of the random intercept and any random coefficients included in the
model (as opposed to the default of standard deviations). The nostderr option is
specified to avoid calculating standard errors for the random part parameters.
This reduces the time it takes to fit each xtmixed model and we can still use
likelihood ratio tests to compare nested models with different random part
specifications.

Issuing the xtmixed command gives the following output:

. xtmixed score || schoolid:, mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -143269.53

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(0)       =         .
Log likelihood = -143269.53                     Prob > chi2        =         .

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |    30.6006   .3694317    82.83   0.000     29.87652    31.32467
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Identity           |
                  var(_cons) |   61.02457          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   258.3572          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  3749.78 Prob >= chibar2 = 0.0000

Before interpreting the model, we will discuss the estimation procedure that
xtmixed uses.5 The default estimation option is to fit the model using the EM
(expectation maximisation) algorithm until convergence (or until 20 iterations have
been reached). At that point, maximization switches to a gradient-based method,
unless the emonly option is specified, in which case maximization stops.6 In the
analysis which follows we will mainly use this default estimation option.

While the default estimation options are normally the preferred approach,
complicated models can be very slow to iterate. The advantage of specifying
emonly is that EM iterations are typically much faster than those for gradient-based
methods. However, the disadvantage is that it can take a large number of
EM iterations to converge (if the algorithm converges at all).

3 Note, two-level random intercept models can equally be fitted with the xtreg command (with
the mle option); see help xtreg. We do not discuss the xtreg command as it cannot be used to
fit more complicated multilevel models while xtmixed can. However, we do note that xtreg
(with the mle option) fits models considerably faster than xtmixed and is therefore recommended
for fitting two-level random intercept models. See Rabe-Hesketh and Skrondal (2008) for examples
of two-level random intercept models fitted with both commands.

4 Note, the noconstant option can be used to omit the constant; see help xtmixed.

5 For further details see help xtmixed.

6 By default, the gradient-based method is Newton–Raphson iterations, but other methods are
available by specifying the appropriate maximize options; see help xtmixed.
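
Footnote 3 mentions that the same two-level random intercept model can be fitted with xtreg. A minimal sketch of that alternative is given below, assuming 5.1.dta is in memory; it is offered as an illustration of the footnote, not as part of the analysis that follows.

```stata
* Assumes 5.1.dta is in memory.

* Declare schoolid as the grouping (panel) variable
xtset schoolid

* Null random intercept model fitted by maximum likelihood
xtreg score, mle
```

Note that xtreg reports the random part as standard deviations (sigma_u and sigma_e) and the intraclass correlation (rho), so its output should be compared with the square roots of the variances that xtmixed reports when the variance option is used.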
The overall mean attainment (across schools) is estimated as 30.60. The mean for
school $j$ is estimated as $30.60 + \hat{u}_{0j}$, where $\hat{u}_{0j}$ is the school residual which we
will estimate in a moment. A school with $\hat{u}_{0j} > 0$ has a mean that is higher than
average, while $\hat{u}_{0j} < 0$ for a below-average school. (We will obtain confidence
intervals for residuals to determine whether differences from the overall mean can
be considered ‘real’ or due to chance.)

Before we continue, we store the results using the estimates store command:

. estimates store nullmodel

We can then explore other model specifications with the option of restoring these
estimates later (by using the estimates restore command) without having to
refit this model. This will be particularly helpful when we fit more complex
models that are slower to converge. We can even store each model we fit under a
different name so that we can restore any previously fitted model at a later point.

Partitioning variance

The between-school (level 2) variance var(_cons) in attainment is estimated as
$\hat{\sigma}^2_{u0} = 61.02$, and the within-school between-student (level 1) variance
var(Residual) is estimated as $\hat{\sigma}^2_{e} = 258.36$. Thus the total variance is
61.02 + 258.36 = 319.38.

The variance partition coefficient (VPC) is 61.02/319.38 = 0.19, which indicates
that 19% of the variance in attainment can be attributed to differences between
schools. Note, however, that we have not accounted for intake ability (measured
by exams taken on entry to secondary school) so the school effects are not value-added.
Previous studies have found that the between-school share of the variance in progress,
i.e. after accounting for intake attainment, is close to 10%.

Testing for school effects

To test the significance of school effects, we can carry out a likelihood ratio test
comparing the null multilevel model with a null single-level model. To fit the null
single-level model, we need to remove the random school effect:

$score_{ij} = \beta_0 + e_{ij}$

. xtmixed score, mle variance nostderr

                                                Wald chi2(0)       =         .
Log likelihood = -145144.42                     Prob > chi2        =         .

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   31.09462   .0939156   331.09   0.000     30.91055    31.27869
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
               var(Residual) |   299.7787          .             .           .
------------------------------------------------------------------------------

The likelihood ratio test statistic is calculated as two times the difference in the
log likelihood values for the two models:

$LR = 2(-143269.53 - (-145144.42)) = 3750$ on 1 d.f. (because there is only one
parameter difference between the models, $\sigma^2_{u0}$).

Bearing in mind that the 5% point of a chi-squared distribution on 1 d.f. is 3.84,
there is overwhelming evidence of school effects on attainment. We will therefore
revert to the multilevel model with school effects.7

Note, the xtmixed command automatically compares the specified model with
the equivalent single-level model. The likelihood ratio test statistic for this
comparison can be seen in the last line of the xtmixed output of the first model
we fitted: chibar2(01) = 3749.78. Note that there is not a corresponding
likelihood ratio test statistic for the second model we fitted as this model is a
single-level model.

7 Note that this test statistic has a non-standard sampling distribution as the null hypothesis of a
zero variance is on the boundary of the parameter space; we do not envisage a negative variance.
In this case the correct p-value is half the one obtained from the tables of chi-squared distribution
with 1 degree of freedom. In the output of the xtmixed command, Stata automatically reports
the correct p-value for this test. See help j_xtmixedlr for further details.
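
The variance partition coefficient and the likelihood ratio test in this section can be reproduced by hand with display. The sketch below simply re-does the arithmetic using the estimates and log likelihoods reported in the two xtmixed outputs; the scalar names vpc and lr are hypothetical.

```stata
* Values are copied from the xtmixed outputs reported in this section.
* vpc and lr are hypothetical scalar names.

* Variance partition coefficient: between-school variance over total variance
scalar vpc = 61.02457 / (61.02457 + 258.3572)
display "VPC = " %5.3f scalar(vpc)

* Likelihood ratio statistic: twice the difference in log likelihoods
scalar lr = 2 * (-143269.53 - (-145144.42))
display "LR  = " %8.2f scalar(lr)

* Boundary-corrected p-value: half the upper tail area of chi-squared on 1 d.f.
display "p   = " 0.5 * chi2tail(1, scalar(lr))
```

The halving in the last line follows footnote 7, because the null hypothesis of a zero variance lies on the boundary of the parameter space.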
Now restore the estimates of the earlier model using the estimates restore
command. We can now continue our analysis of that model:

. estimates restore nullmodel
(results nullmodel are active now)

P5.1.2 Examining school effects (residuals)

To estimate the school-level residuals $\hat{u}_{0j}$ and their associated standard errors, we
use the predict command first with the reffects option and second with the
reses option (the reses option is available as of Stata 11):8

. predict u0, reffects

. predict u0se, reses

The school-level residuals and their standard errors have been calculated and
stored for every record in the dataset. However, summary statistics and graphs for
school-level variables must be based on a dataset with one record per school. We
therefore create a dummy variable pickone to pick one observation per school
(see P3.1.2 in Module 3 where we explain this approach in detail):

. egen pickone = tag(schoolid)

Next we sort the school effects in ascending order based on the values of u0:

. sort u0

Then we rank the school effects. To do this, we use the generate command with
the sum() function to create a new variable u0rank equal to the running (i.e.
cumulative) sum of pickone. Thus the nth observation on u0rank contains the sum
of the first n observations on pickone.

. generate u0rank = sum(pickone)

To see the school residual, standard error and ranking for a particular school, we
can list the data. Here we do this for the first 10 schools in the data by making
use of the <= (less than or equal) operator.9

. sort schoolid
. list schoolid u0 u0se u0rank if pickone==1 & schoolid<=10

     +------------------------------------------+
     | schoolid          u0       u0se   u0rank |
     |------------------------------------------|
 41. |        1   -11.84128   2.389899       37 |
 58. |        2    3.206334   1.302732      337 |
201. |        3    3.396004   1.497341      344 |
309. |        4   -7.415012    2.07105       73 |
413. |        5    3.426228   1.630054      345 |
     |------------------------------------------|
544. |        6    12.43373   1.403097      487 |
660. |        7   -1.651931   1.459818      199 |
727. |        8    20.97878   2.021325      508 |
753. |        9   -8.694923   6.437819       59 |
772. |       10    1.737383   1.904442      291 |
     +------------------------------------------+

From these values we can see, for example, that school 1 had an estimated
residual of -11.84 which was ranked 37, i.e. 37 places from the bottom. For this
school, we estimate a mean score of 30.60 - 11.84 = 18.76. In contrast, the mean
for school 8 (ranked 508, the highest) is estimated as 30.60 + 20.98 = 51.58.

Finally, we use the serrbar command to produce a ‘caterpillar plot’ to show the
school effects in rank order together with 95% confidence intervals. The order of
the three variables that follow the command is important. The first variable must
contain the point estimates, the second the associated standard errors and the
third the rank of the point estimates. We use the scale(1.96) option to obtain
95% confidence limits and the yline(0) option to plot a horizontal line at zero
which represents the average school in the data:

. serrbar u0 u0se u0rank if pickone==1, scale(1.96) yline(0)

8 The estimated residuals $\hat{u}_{0j}$ are called shrunken residuals or sometimes empirical Bayes estimates
or posterior estimates.

9 If the schools in the dataset were not numbered consecutively then the above command would
only list those schools where schoolid took a value of 10 or less. It is therefore often useful to
recode identifier variables such as schoolid so that they do take consecutive values. The relevant
command is egen newschoolid = group(schoolid). The group function creates a new
variable, which we call here newschoolid, taking on values 1, 2, ... for the groups formed by
schoolid. The order of the groups is that of the sort order of schoolid.
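
The 95% confidence limits drawn in the caterpillar plot can also be computed as variables, which makes it possible to count the schools whose interval excludes zero. This is a minimal sketch, assuming u0, u0se and pickone have been created as described above; u0lo, u0hi and u0sig are hypothetical variable names.

```stata
* Assumes u0, u0se and pickone exist as created in this section.
* u0lo, u0hi and u0sig are hypothetical variable names.

* Lower and upper 95% confidence limits for each school residual
generate u0lo = u0 - 1.96*u0se
generate u0hi = u0 + 1.96*u0se

* Flag schools whose interval lies entirely above or entirely below zero
generate byte u0sig = (u0lo > 0 | u0hi < 0) if !missing(u0, u0se)

* Tabulate the flag using one record per school
tabulate u0sig if pickone==1
```

Schools with u0sig equal to 1 are those whose mean differs from the overall mean by more than can plausibly be attributed to chance at the 5% level.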
Notice that the confidence intervals around the residual estimates vary greatly in
their width; smaller schools will have wider intervals than larger schools.

Note that because we have not accounted for intake ability, we cannot interpret
these residuals as “school effects” in the value-added sense in which the term is used in
school effectiveness research. Unfortunately, no measure of prior attainment is
available from the Scottish School Leavers Survey. Nevertheless, exam
performance at age 16 is an important educational outcome because it is a strong
predictor of post-16 educational attainment and entry to university depends on
attainment rather than progress. In these exercises, we will study trends in mean
attainment and variation in attainment between individuals and between schools.

Don’t forget to take the online quiz!

From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.1 Comparing Groups Using Multilevel Modelling" to open Lesson 5.1
• Click Q1 to open the first question

P5.2 Adding Student-level Explanatory Variables: Random Intercept Models

Load “5.2.dta” into memory and open the do-file for this lesson:

Go to Module 5: Introduction to Multilevel Modelling, and scroll down to
Stata Datasets and Do-files
Click “5.2.dta” to open the dataset

We begin by allowing for a linear cohort effect:

$score_{ij} = \beta_0 + \beta_1 cohort90_{ij} + u_{0j} + e_{ij}$

. xtmixed score cohort90 || schoolid:, mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(1)       =   6120.93
Log likelihood = -140456.79                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.214954   .0155293    78.24   0.000     1.184518    1.245391
       _cons |   30.55915   .3225441    94.74   0.000     29.92698    31.19133
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Identity           |
                  var(_cons) |   45.98856          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   219.2879          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  3158.04 Prob >= chibar2 = 0.0000
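
As a rough guide to reading the fixed part estimates in the output above, the average predicted score for any cohort is the intercept plus the cohort coefficient multiplied by the value of cohort90 (year minus 1990). The sketch below does this arithmetic for the earliest and latest cohorts; the scalar names are hypothetical.

```stata
* Fixed part estimates copied from the xtmixed output above.
* b0, b1, pred1984 and pred1998 are hypothetical scalar names.
scalar b0 = 30.55915
scalar b1 = 1.214954

* Average predicted score for the 1984 cohort (cohort90 = -6)
scalar pred1984 = scalar(b0) + scalar(b1)*(-6)

* Average predicted score for the 1998 cohort (cohort90 = 8)
scalar pred1998 = scalar(b0) + scalar(b1)*8

display "1984: " %6.2f scalar(pred1984) "   1998: " %6.2f scalar(pred1998)
```

These are predictions for the average school; the prediction for a particular school adds that school's residual to both values.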
The equation of the average fitted regression line (across schools) is

$\widehat{score}_{ij} = 30.559 + 1.215\, cohort90_{ij}$

The fitted line for a given school will differ from this average line in its intercept,
by an amount $\hat{u}_{0j}$ for school $j$. However, the slope of the school lines is assumed
to be fixed at 1.215, i.e. the effect of cohort is assumed the same for all schools.

A plot of the predicted school lines will show a set of parallel lines. To produce
this plot, we first need to compute $\widehat{score}$ for each student, based on their cohort
and school. We do this using the predict command with the fitted option to
create a new variable (predscore) which is equal to the average fitted regression
line plus the relevant school’s intercept residual $\hat{u}_{0j}$:

. predict predscore, fitted

Next we create a variable to pick out the minimum amount of data required to
plot the predicted school lines (see P3.1.2 in Module 3 where we explain this
approach in detail):

. egen pickone = tag(schoolid cohort90)

We will use the twoway command with the connected plottype instead of the
line plottype in order to display markers (i.e. the data points) in addition to the
school lines. First, however, we must arrange observations in ascending order
based on the values of schoolid, and within each value of schoolid arrange
observations in ascending order based on the values of cohort90. This is required
as we use the connect(ascending) option to connect points as long as
cohort90 is increasing. Whenever cohort90 jumps to a lower value, the two
scatter points are not connected. Sorting the data in this way ensures that only
scatter points for the same school are connected.

. sort schoolid cohort90
. twoway connected predscore cohort90 if pickone==1, connect(ascending)

Careful examination of the top left hand corner of the graph shows that these
commands have not been totally successful. A few scatter points belonging to
different schools are connected vertically. This has occurred as a small number of
schools are observed for only one cohort and so cohort90 does not jump to a lower
value as we move from one of these schools to the next in the dataset.

To circumvent this problem, we will reproduce the graph for the subset of schools
which are observed for two or more cohorts. To do this, we first generate a new
variable multiplecohorts and initially set its values equal to those of the pickone
variable.

. generate multiplecohorts = pickone

We will then replace multiplecohorts with the value 0 for those schools observed
for only one cohort. This can be achieved by sorting the observations within each
school by cohort90 and then looking to see whether the value of the last
observation of cohort90 in each school is the same as the first observation. If it
is, the school is observed for only one cohort and we set multiplecohorts equal to
0. The relevant command is:

. bysort schoolid (cohort90): replace multiplecohorts = 0 ///
>     if cohort90[_N]==cohort90[1]
(32 real changes made)

where we have used the /// line join indicator to inform Stata that the two lines
of code form one command. If we did not do this, Stata would incorrectly
interpret the second line as a new command.

In this command, we have used the bysort prefix which repeats the command
after the colon for each group of observations for which the values of the variables
listed between the bysort prefix and the colon are the same. The use of
parentheses in this variable list ensures that the data are sorted first by schoolid
and then by cohort90, and then repeats the command after the colon once for
each value of schoolid only. Had we omitted the parentheses, Stata would have
instead repeated the command after the colon once for each observed
combination of schoolid and cohort90.

The command after the colon replaces the dummy variable multiplecohorts with
the value 0 when the if logical expression is true, but otherwise leaves the
dummy variable unchanged. Within the expression, we use explicit subscripting
[_N] and [1] to refer to the last and first values of cohort90 within each school.
The single line of output after the command states that 32 changes have been
made to multiplecohorts. This informs us that 32 schools in the data are observed
for only one cohort.

Now we can simply repeat the previous twoway command but this time we
condition upon multiplecohorts taking the value 1.

. twoway connected predscore cohort90 if multiplecohorts==1, connect(ascending)

The graph is now plotted correctly.

Returning to the results and comparing with the results for the null model of P5.1,
we can see that the addition of cohort has reduced the amount of variance at both
the school and the student level. The between-school variance has reduced from
61.02 to 45.99, and the within-school variance has reduced from 258.36 to 219.29.
The decrease in the within-school variance is expected because cohort is a
student-level variable. The large reduction in the between-school variance
suggests that the distribution of students by cohort differs from school to school
(see C5.2.3). In Module 3 (C3.1.1) we found that, pooling across all schools, the
proportions in each cohort were:

Table 5.1. Proportion of students in each cohort

Year         1984   1986   1988   1990   1996   1998
% students   19.1   18.6   15.4   12.9   12.5   21.6

One source of the variation in these proportions across schools can be seen from
the plot of the predicted lines above. If you look at the top line (corresponding to
the school with the highest intercept), you can see that there are only three
predicted values, for cohort90 = -4, -2 and 0 (1986, 1988 and 1990). This is
because, in this school, no data were collected for 1984, 1996 and 1998. Clearly,
for this school, the proportions for the missing years will be zero. Similarly, for
the school with the second lowest intercept, there are no data points for the last
two cohorts (cohort90 = 6 and 8).

After accounting for cohort effects, the proportion of unexplained variance that is
due to differences between schools decreases slightly to 45.99/(45.99 + 219.29) =
17%.

Don’t forget to take the online quiz!

• Click "5.2 Multilevel Regression with a Level 1 Explanatory Variable:
Random Intercept Models" to open Lesson 5.2
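
The number of schools observed for only one cohort, which this section obtains from the bysort command, can be cross-checked by counting the distinct cohorts within each school. This is a minimal sketch, assuming the P5.2 data are in memory and that pickone was created with tag(schoolid cohort90) as above; ncohorts and oneschool are hypothetical variable names.

```stata
* Assumes 5.2.dta is in memory and pickone = tag(schoolid cohort90) exists.
* ncohorts and oneschool are hypothetical variable names.

* pickone is 1 once per school-cohort combination, so its total within a
* school is the number of distinct cohorts observed for that school
egen ncohorts = total(pickone), by(schoolid)

* Tag one record per school
egen oneschool = tag(schoolid)

* Count the schools that are observed for a single cohort only
count if oneschool == 1 & ncohorts == 1
```

If the two approaches agree, the count should match the 32 changes reported by the replace command.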

P5.3 Allowing for Different Slopes across Schools: Random Slope Models

In the previous exercise, we allowed for school effects on the mean attainment by
allowing the intercept of the regression of attainment on cohort to vary randomly
across schools. We assumed, however, that cohort changes in attainment are the
same for all schools, i.e. the slope of the regression line was assumed fixed across
schools. We will now extend the random intercept model fitted in P5.2
to allow both the intercept and the slope to vary randomly across schools.

From within the LEMMA Learning Environment
Go to Module 5: Introduction to Multilevel Modelling, and scroll down to
Stata Datasets and Do-files

Fit the model:

$score_{ij} = \beta_0 + \beta_1 cohort90_{ij} + u_{0j} + u_{1j} cohort90_{ij} + e_{ij}$

Note that a new term $u_{1j}$ has been added to the model, so that the coefficient of
cohort90 has become $\beta_{1j} = \beta_1 + u_{1j}$, and so the school-level variance has been
replaced by a matrix with two new parameters, $\sigma^2_{u1}$ and $\sigma_{u01}$.

$\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim \mathrm{MVN}(\mathbf{0}, \Omega_u), \quad \mathbf{0} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Omega_u = \begin{pmatrix} \sigma^2_{u0} & \\ \sigma_{u01} & \sigma^2_{u1} \end{pmatrix}$

Note that the slope residual, and associated variance and covariance, have a
subscript of ‘1’ because cohort90 is the 1st explanatory variable in the model (not
including the constant).

Fit the model:

. xtmixed score cohort90 ///
>     || schoolid: cohort90, covariance(unstructured) ///
>     mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 1:   log likelihood = -140343.09

Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(1)       =   2376.07
Log likelihood = -140343.09                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.233902   .0253135    48.74   0.000     1.184289    1.283516
       _cons |   30.60963    .313448    97.65   0.000     29.99529    31.22398
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1605836          .             .           .
                  var(_cons) |   42.85854          .             .           .
         cov(cohort90,_cons) |  -1.024181          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   215.7394          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  3385.44   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

In the output, the estimate of the intercept variance $\sigma^2_{u0}$ is given to the right of
var(_cons) while the estimate of the slope variance $\sigma^2_{u1}$ is given to the right of
var(cohort90). Note, however, that the model output reports the random
slope variance before the random intercept variance. This is because although
Stata includes a constant term by default, it includes it as the last variable in the
list of explanatory variables rather than the first. We can see that this is the case
in both the fixed and random parts of the model. The estimate of the covariance
$\sigma_{u01}$ is given to the right of cov(cohort90,_cons) and is reported after the
variance parameters. Note, we had to specify the covariance(unstructured)
option to allow the random intercepts and slopes to covary (as opposed to the
default that they are independent).
The last line of output states that the likelihood ratio test, which compares the current model to a single-level model, is conservative and provided only for reference.10 This means that the reported p-value is not the correct p-value; rather it is an upper bound for the correct p-value. The test is described as conservative because when the correct p-value is 0.05 (i.e. the multilevel model is just preferred to the single-level model), the reported p-value will be slightly higher, leading us to incorrectly favour the simpler model. However, as long as the reported p-value is less than 0.05 then the same will be the case for the correct p-value and so it is safe to infer that the multilevel model is preferred to the single-level model.

The intercept-slope correlation is estimated as:

$$\hat{\rho}_{u01} = \frac{\hat{\sigma}_{u01}}{\sqrt{\hat{\sigma}^2_{u0}\,\hat{\sigma}^2_{u1}}} = \frac{-1.024}{\sqrt{42.859 \times 0.161}} = -0.390$$

We can obtain the estimated correlation directly by using the estat recov command with the corr option to display the school-level random effects correlation matrix:11
. estat recov, corr
Random-effects correlation matrix for level schoolid

| cohort90 _cons
-------------+----------------------
cohort90 | 1
_cons | -.3903978 1

To estimate the school intercepts and slopes we use the predict command with the reffects option. We specify two new variables: the first for the random slopes, the second for the random intercepts. This ordering reflects the order in which the random effects were specified in the xtmixed command:

. predict u1 u0, reffects

P5.3.1 Testing for random slopes

We can use a likelihood ratio test to test whether the cohort effect varies across schools. The null hypothesis for this test is that the two additional parameters $\sigma_{u01}$ and $\sigma^2_{u1}$ are simultaneously equal to zero. The log-likelihood value for the random intercept model was found to be -140457 (P5.2), so the likelihood ratio test statistic is

LR = 2 (-140343 - -140457) = 228 on 2 d.f.

So there is very strong evidence that the cohort effect differs across schools.
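The arithmetic behind this test can be checked with a few scalar calculations. The following is a minimal sketch, assuming the rounded log-likelihood values quoted above; the scalar names (ll_ri, ll_rs, lr, crit, pval) are illustrative and not part of the original do-file.

```stata
* Likelihood ratio test for the random slope, using the rounded
* log-likelihood values quoted in the text (assumed inputs).
scalar ll_ri = -140457      // random intercept model (P5.2)
scalar ll_rs = -140343      // random slope model (P5.3)
scalar lr    = 2*(ll_rs - ll_ri)
scalar crit  = invchi2tail(2, 0.05)   // 5% point of chi-squared on 2 d.f.
scalar pval  = chi2tail(2, lr)        // upper-tail p-value on 2 d.f.
display "LR = " lr "   5% critical value = " %5.2f crit "   p-value = " %9.2e pval
```

The statistic far exceeds the 5% critical value, which is what the conclusion above relies on.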
P5.3.2 Interpretation of random cohort effects across schools
The cohort effect for school j is estimated as $1.234 + \hat{u}_{1j}$, and the between-school variance in these slopes is estimated as 0.161. For the ‘average’ school we predict an increase of 1.234 points in the attainment score for each successive cohort. A 95% coverage interval for the school slopes is estimated as $1.234 \pm 1.96\sqrt{0.161}$ = 0.448 to 2.020. Thus, assuming a normal distribution, we would expect the middle 95% of schools to have a slope between 0.448 and 2.020.
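The coverage interval can be reproduced in Stata rather than by hand. This is a small sketch that assumes the rounded estimates quoted in the text; the scalar names are hypothetical.

```stata
* 95% coverage interval for the school-specific cohort slopes,
* using the rounded estimates quoted in the text.
scalar b1     = 1.234    // average cohort slope
scalar var_u1 = 0.161    // between-school variance of the slopes
scalar lower  = b1 - 1.96*sqrt(var_u1)
scalar upper  = b1 + 1.96*sqrt(var_u1)
display "95% coverage interval: " %5.3f lower " to " %5.3f upper
```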
The intercept variance of 42.859 is interpreted as the between-school variance when cohort90 = 0, i.e. for the 1990 cohort.
The negative covariance estimate of -1.024 means that schools with a high intercept (above-average attainment in 1990) tend to have a flatter-than-average slope. Similarly, schools with a low intercept (below-average attainment in 1990) tend to have seen a more marked increase in attainment with cohort (above-average slope).
P5.3.3 Examining intercept and slope residuals for schools
10
See help j_xtmixedlr for further details.
11
Note that omitting the corr option would result in the school-level random effects covariance matrix being displayed.
To obtain a plot of the school slopes versus the school intercepts, $\hat{u}_{1j}$ vs. $\hat{u}_{0j}$:

. egen pickone = tag(schoolid)
. scatter u1 u0 if pickone==1, yline(0) xline(0) ///
> ytitle("Slope of cohort90 (u1j)") xtitle("Intercept (u0j)")

where we have used the ytitle() and xtitle() options to add axis titles to the graph. The use of double quotes is not necessary, but we find that it makes the syntax easier to read.
From this plot, it is possible to identify, for example, those schools which had lower-than-average attainment in 1990 but a better-than-average year-on-year improvement. Schools in the top-left quadrant are such schools, while schools in the bottom-left quadrant also had below-average mean attainment in 1990, but the below-average slopes for these schools mean that they continued at this low level.
The equation for the fitted regression line for school j is

$$\widehat{\text{score}}_{ij} = (30.610 + \hat{u}_{0j}) + (1.234 + \hat{u}_{1j})\,\text{cohort90}_{ij}$$

where the values of $\hat{u}_{0j}$ and $\hat{u}_{1j}$ are shown in the pairwise residual plot above.
To produce a plot of the predicted school lines, we first need to compute $\widehat{\text{score}}_{ij}$ for each student, based on their cohort and school:

. predict predscore, fitted

As in P5.2, we plot the fitted regression lines for the subset of schools for which we have multiple cohorts of data:

. egen multiplecohorts = tag(schoolid cohort90)
. bysort schoolid (cohort90): replace multiplecohorts = 0 ///
> if cohort90[_N]==cohort90[1]
(32 real changes made)
. twoway connected predscore cohort90 if multiplecohorts==1, connect(ascending)

P5.3.4 Between-school variance as a function of cohort
The random slope model we have fitted implies that the between-school variance in attainment is a function of cohort; that is, the amount of between-school variance differs across cohorts.
In C5.3.5 (Equation 5.9), we saw that for a model with a random slope for an explanatory variable $x_{ij}$, the level 2 variance is:

$$\begin{aligned} \text{var}(u_{0j} + u_{1j}x_{ij}) &= \text{var}(u_{0j}) + 2x_{ij}\,\text{cov}(u_{0j}, u_{1j}) + x_{ij}^2\,\text{var}(u_{1j}) \\ &= \sigma^2_{u0} + 2\sigma_{u01}x_{ij} + \sigma^2_{u1}x_{ij}^2 \end{aligned}$$
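As a quick check of this formula, the variance function can be evaluated at a few values of x. The sketch below assumes the rounded random-part estimates from the random slope model output (42.859, -1.024 and 0.161); the scalar names and the choice of cohorts are illustrative only.

```stata
* Between-school variance implied by the random slope model:
* var(u0j) + 2*x*cov(u0j,u1j) + x^2*var(u1j), evaluated at selected cohorts.
* The three inputs are rounded estimates taken from the xtmixed output.
scalar var_u0  = 42.859
scalar cov_u01 = -1.024
scalar var_u1  = 0.161
foreach x of numlist -6 0 6 {
    scalar bsvar = var_u0 + 2*cov_u01*(`x') + var_u1*(`x')^2
    display "cohort90 = " %2.0f (`x') "   between-school variance = " %6.3f bsvar
}
```

Because the covariance is negative and the slope variance is small, the implied variance falls as cohort90 increases over this range.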
Substituting cohort90 for x, and the estimates for $\sigma^2_{u0}$, $\sigma_{u01}$ and $\sigma^2_{u1}$, we obtain:

$$\text{Between-school variance} = 42.859 - 2.048\,\text{cohort90} + 0.161\,\text{cohort90}^2$$

Applying this equation to selected cohorts we obtain the following estimates of level 2 variance.

Table 5.2. Estimates of the between-school variance
cohort90   Year   Between-school variance
-6         1984   42.859 – (2.048 × -6) + [0.161 × (-6)²] = 60.943
0          1990   42.859
6          1996   42.859 – (2.048 × 6) + (0.161 × 6²) = 36.367

We would therefore conclude that the mean attainment increased with cohort, and the variation in mean attainment among schools has decreased.
We can produce a plot of the between-school variance with the twoway command and the function plottype. The equation for the between-school variance is typed as part of the command, where we can think of x as corresponding to the cohort90 variable. However, the command does not make use of any data; it simply plots the line associated with the typed equation. The range(-6 8) option specifies that the function should only be plotted when x ranges between -6 and 8. This restricts the plot of the between-school variance to the cohorts in the data (1984 to 1998).

. twoway function 42.859 + -2.048*x + 0.161*x^2, range(-6 8)

P5.3.5 Adding a random coefficient for gender (dichotomous x)
In Module 3, we found that the mean attainment was higher for girls than for boys. We will now consider whether this gender difference is the same across schools by introducing a random coefficient for gender.12
We will start by adding a fixed effect for gender. This will be our comparison model for testing for a random coefficient.

$$\text{score}_{ij} = \beta_0 + \beta_1\,\text{cohort90}_{ij} + \beta_2\,\text{female}_{ij} + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij}$$

. xtmixed score cohort90 female ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Mixed-effects ML regression Number of obs = 33988
Obs per group: min = 1
avg = 66.9
max = 190
Wald chi2(2) = 2517.61
Log likelihood = -140272.07 Prob > chi2 = 0.0000
------------------------------------------------------------------------------

-------------+----------------------------------------------------------------
cohort90 | 1.227326 .0253264 48.46 0.000 1.177687 1.276965
female | 1.944526 .1629805 11.93 0.000 1.62509 2.263962
_cons | 29.58487 .3240554 91.30 0.000 28.94974 30.22001
------------------------------------------------------------------------------
------------------------------------------------------------------------------
-----------------------------+------------------------------------------------
var(cohort90) | .1612602 . . .
var(_cons) | 42.57498 . . .
cov(cohort90,_cons) | -1.030571 . . .
-----------------------------+------------------------------------------------
var(Residual) | 214.8374 . . .
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 3403.21 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
12
As noted in Module 3, we use the more general term ‘coefficient’ rather than ‘slope’ for
categorical explanatory variables. The term ‘slope’ is reserved for straight line relationships
between y and a continuous x .
To add a random coefficient for gender:

$$\text{score}_{ij} = \beta_0 + \beta_1\,\text{cohort90}_{ij} + \beta_2\,\text{female}_{ij} + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + u_{2j}\,\text{female}_{ij} + e_{ij}$$

. xtmixed score cohort90 female ///
> || schoolid: cohort90 female, covariance(unstructured) ///
> mle variance nostderr

Performing EM optimization:
Iteration 0: log likelihood = -140272
Obs per group: min = 1
avg = 66.9
max = 190
Wald chi2(2) = 2524.31
Log likelihood = -140269.45 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
cohort90 | 1.22777 .0253452 48.44 0.000 1.178094 1.277446
female | 1.93145 .1738955 11.11 0.000 1.590621 2.272278
_cons | 29.58908 .317659 93.15 0.000 28.96648 30.21169
------------------------------------------------------------------------------
------------------------------------------------------------------------------
-----------------------------+------------------------------------------------
var(cohort90) | .1617015 . . .
var(female) | 1.37019 . . .
var(_cons) | 40.55858 . . .
cov(cohort90,female) | -.0530665 . . .
cov(cohort90,_cons) | -1.008308 . . .
cov(female,_cons) | 1.535505 . . .
-----------------------------+------------------------------------------------
var(Residual) | 214.5159 . . .
------------------------------------------------------------------------------

The effect of gender in school j is estimated as $1.931 + \hat{u}_{2j}$. Allowing for a random effect of gender at the school level has led to the addition of three new random parameters to the model ($\sigma_{u02}$, $\sigma_{u12}$, $\sigma^2_{u2}$). The estimate of the random coefficient variance for female $\sigma^2_{u2}$ is reported to the right of var(female). The covariance between the intercept and female $\sigma_{u02}$ is given to the right of cov(female,_cons) while the covariance between cohort90 and female $\sigma_{u12}$ is given to the right of cov(cohort90,female). A likelihood ratio test comparing this model with the previous model with a fixed gender effect tests the null hypothesis that all three of these parameters are equal to zero.

The likelihood ratio test statistic is

LR = 2 (-140269.45 - -140272.07) = 5.24 on 3 d.f.

This is not significant at the 5% level (the 5% point of a chi-squared distribution on 3 d.f. is 7.82), so we cannot reject the null hypothesis: there is no evidence that the gender effect differs across schools. We therefore revert to a model with a fixed coefficient for female.

P5.3.6 Adding a random coefficient for social class (categorical x)
In Module 3, we found strong social class effects on attainment. We will now explore whether these class effects can be assumed the same across schools.
Before adding social class to the model, we create three dummy variables for when sclass is 1, 2 and 4 respectively (taking social class 3 as the reference category).

. generate sclass1 = sclass==1
. generate sclass2 = sclass==2
. generate sclass4 = sclass==4
We will start by fitting a fixed effect for social class:

$$\begin{aligned} \text{score}_{ij} = {} & \beta_0 + \beta_1\,\text{cohort90}_{ij} + \beta_2\,\text{female}_{ij} + \beta_3\,\text{sclass1}_{ij} + \beta_4\,\text{sclass2}_{ij} + \beta_5\,\text{sclass4}_{ij} \\ & + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij} \end{aligned}$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -138346.13
Mixed-effects ML regression Number of obs = 33988
Group variable: schoolid Number of groups = 508
avg = 66.9
max = 190
Wald chi2(5) = 6918.15
Log likelihood = -138346.13 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
score | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cohort90 | 1.182831 .0243149 48.65 0.000 1.135175 1.230488
female | 1.961342 .1542812 12.71 0.000 1.658956 2.263727
sclass1 | 11.08567 .2063932 53.71 0.000 10.68115 11.4902
sclass2 | 5.875198 .2040505 28.79 0.000 5.475266 6.275129
sclass4 | -3.737739 .2845318 -13.14 0.000 -4.295412 -3.180067
_cons | 24.60987 .2796221 88.01 0.000 24.06182 25.15792
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured |
var(cohort90) | .150845 . . .
var(_cons) | 22.51349 . . .
cov(cohort90,_cons) | -.5841601 . . .
-----------------------------+------------------------------------------------
var(Residual) | 192.9457 . . .
LR test vs. linear regression: chi2(3) = 1797.04 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.

Next we add random coefficients for the social class dummy variables:

$$\begin{aligned} \text{score}_{ij} = {} & \beta_0 + \beta_1\,\text{cohort90}_{ij} + \beta_2\,\text{female}_{ij} + \beta_3\,\text{sclass1}_{ij} + \beta_4\,\text{sclass2}_{ij} + \beta_5\,\text{sclass4}_{ij} \\ & + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + u_{3j}\,\text{sclass1}_{ij} + u_{4j}\,\text{sclass2}_{ij} + u_{5j}\,\text{sclass4}_{ij} + e_{ij} \end{aligned}$$

where:

$$\begin{pmatrix} u_{0j} \\ u_{1j} \\ u_{3j} \\ u_{4j} \\ u_{5j} \end{pmatrix} \sim N(\mathbf{0}, \Omega_u), \qquad \Omega_u = \begin{pmatrix} \sigma^2_{u0} & & & & \\ \sigma_{u01} & \sigma^2_{u1} & & & \\ \sigma_{u03} & \sigma_{u13} & \sigma^2_{u3} & & \\ \sigma_{u04} & \sigma_{u14} & \sigma_{u34} & \sigma^2_{u4} & \\ \sigma_{u05} & \sigma_{u15} & \sigma_{u35} & \sigma_{u45} & \sigma^2_{u5} \end{pmatrix}, \qquad e_{ij} \sim N(0, \sigma^2_e)$$

Fit the model:

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> || schoolid: cohort90 sclass1 sclass2 sclass4, ///
> covariance(unstructured) ///
> mle variance nostderr emonly

Performing EM optimization:
Iteration 0: log likelihood = -138913.58
Iteration 1: log likelihood = -138603.9
Iteration 2: log likelihood = -138482.58
Iteration 3: log likelihood = -138424.11
Iteration 4: log likelihood = -138391.59
Iteration 5: log likelihood = -138371.6
Iteration 6: log likelihood = -138358.4
Iteration 9: log likelihood = -138337.37
Iteration 10: log likelihood = -138333.42
Iteration 11: log likelihood = -138330.28
Iteration 12: log likelihood = -138327.74
Iteration 13: log likelihood = -138325.64
Iteration 14: log likelihood = -138323.88
Iteration 15: log likelihood = -138322.39
Iteration 16: log likelihood = -138321.11
Iteration 18: log likelihood = -138319.05
Iteration 20: log likelihood = -138317.45


avg = 66.9
max = 190
Wald chi2(5) = 5267.22

------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
cohort90 | 1.183949 .0246006 48.13 0.000 1.135732 1.232165
female | 1.96809 .1540539 12.78 0.000 1.66615 2.270031
sclass1 | 11.2049 .2591391 43.24 0.000 10.69699 11.7128
sclass2 | 6.12456 .2414345 25.37 0.000 5.651357 6.597763
sclass4 | -3.1019 .3462576 -8.96 0.000 -3.780552 -2.423247
_cons | 24.3627 .2372518 102.69 0.000 23.8977 24.8277
------------------------------------------------------------------------------
------------------------------------------------------------------------------
-----------------------------+------------------------------------------------
var(cohort90) | .1581957 . . .
var(sclass1) | 9.988941 . . .
var(sclass2) | 6.943944 . . .
var(sclass4) | 14.72518 . . .
var(_cons) | 12.22236 . . .
cov(cohort90,sclass1) | -.1699083 . . .
cov(cohort90,_cons) | -.5084394 . . .
cov(sclass1,sclass2) | 5.726013 . . .
cov(sclass1,_cons) | 3.49037 . . .
cov(sclass2,_cons) | 3.230988 . . .
cov(sclass4,_cons) | 6.062856 . . .
-----------------------------+------------------------------------------------
var(Residual) | 190.7541 . . .
------------------------------------------------------------------------------

Note: EM algorithm failed to converge

Before interpreting the output, it is important to notice that we have specified the emonly option. We have done this because the model takes a very long time to converge. Specifying the emonly option leads the model to stop prematurely, after just 20 EM iterations (note that a maximum of 20 iterations is the default setting). Stopping the model prematurely has some advantages: we can check the output to see that we have specified the model correctly and we can get a rough idea as to whether the additional random effects might be important. However, it is important to realise that this model has not converged (the final line of output confirms this) and so the estimates should never be used as final estimates.

Table 5.3 presents estimated model parameters from a version of the model where we did not specify the emonly option (i.e. where a gradient-based method was used and the model was allowed to iterate until convergence).13 There are considerable differences between the two sets of parameter estimates. This highlights the importance of checking that convergence has been reached. In the analysis below we will therefore interpret the estimates from the converged model.
13
Note that we have reordered both the fixed and random part parameter estimates to agree with
the way we have written down the model as opposed to the order in which the parameters appear
in the model output above. Note also that no standard errors are reported for the random part
parameters. This is because we have continued to specify the nostderr option as omitting this
option prevents the model from converging.
Table 5.3. Estimates from the converged model

Parameter                                    Estimate      Standard error
$\beta_0$ (_cons)                            24.401        0.231
$\beta_1$ (cohort90)                         1.185         0.024
$\beta_2$ (female)                           1.966         0.154
$\beta_3$ (sclass1)                          11.182        0.244
$\beta_4$ (sclass2)                          6.072         0.220
$\beta_5$ (sclass4)                          -3.190        0.314
$\sigma^2_{u0}$ var(_cons)                   11.267        -
$\sigma_{u01}$ cov(cohort90,_cons)           -0.554        -
$\sigma^2_{u1}$ var(cohort90)                0.156         -
$\sigma_{u03}$ cov(sclass1,_cons)            4.813         -
$\sigma_{u13}$ cov(cohort90,sclass1)         -0.111        -
$\sigma^2_{u3}$ var(sclass1)                 7.136         -
$\sigma_{u04}$ cov(sclass2,_cons)            5.059         -
$\sigma_{u14}$ cov(cohort90,sclass2)         -0.118        -
$\sigma_{u34}$ cov(sclass1,sclass2)          4.175         -
$\sigma^2_{u4}$ var(sclass2)                 3.321         -
$\sigma_{u05}$ cov(sclass4,_cons)            8.077         -
$\sigma_{u15}$ cov(cohort90,sclass4)         -0.442        -
$\sigma_{u35}$ cov(sclass1,sclass4)          3.141         -
$\sigma_{u45}$ cov(sclass2,sclass4)          2.957         -
$\sigma^2_{u5}$ var(sclass4)                 7.182         -
$\sigma^2_e$                                 191.768       -
Log likelihood                               -138306.57

The new model contains a large number of additional random parameters. There are 12 more parameters in this model than in the fixed social class effects model. The likelihood ratio test statistic for a comparison of these models is:

LR = 2 (-138306 - -138346) = 80 on 12 d.f.

The 5% point of a chi-squared distribution on 12 d.f. is 21.03, so we conclude that there is evidence that the effect of social class on attainment differs across schools.

The coefficients of sclass1, sclass2 and sclass4 have a fixed component, representing contrasts with the reference category 3 (working class) on average, and a school-specific component. For example, after accounting for cohort and gender effects, children with a parent in a professional or managerial occupation (sclass = 1) attending school j are expected to have an attainment score that is $11.2 + \hat{u}_{3j}$ points higher than working class children in the same school.

Due to the large number of parameters in the random part of this model, the simplest way to interpret the random coefficient for class is to compute the between-school variance

$$\text{var}(u_{0j} + u_{1j}\,\text{cohort90}_{ij} + u_{3j}\,\text{sclass1}_{ij} + u_{4j}\,\text{sclass2}_{ij} + u_{5j}\,\text{sclass4}_{ij})$$

We will do this for each social class category, holding constant the value of cohort90 (the other variable with a random coefficient). For convenience, we will fix cohort90 at zero, so the between-school variances will refer to 1990. This simplifies the expression for the between-school variance to:

$$\text{var}(u_{0j} + u_{3j}\,\text{sclass1}_{ij} + u_{4j}\,\text{sclass2}_{ij} + u_{5j}\,\text{sclass4}_{ij})$$

We can use the display command as a substitute for calculating by hand. For example, the variance for category 1 of sclass (sclass1 = 1, sclass2 = 0 and sclass4 = 0) is:

$$\text{var}(u_{0j} + u_{3j}) = \sigma^2_{u0} + 2\sigma_{u03} + \sigma^2_{u3}$$

. display 11.267 + 2*4.813 + 7.136
28.029

Similarly, the variance for category 2 of sclass (sclass1 = 0, sclass2 = 1 and sclass4 = 0) is:

$$\text{var}(u_{0j} + u_{4j}) = \sigma^2_{u0} + 2\sigma_{u04} + \sigma^2_{u4}$$

. display 11.267 + 2*5.059 + 3.321
24.706
The variance for category 3 of sclass (sclass1 = 0, sclass2 = 0 and sclass4 = 0) is simply:

$$\text{var}(u_{0j}) = \sigma^2_{u0}$$

. display 11.267
11.267

While the variance for category 4 of sclass (sclass1 = 0, sclass2 = 0 and sclass4 = 1) is:

$$\text{var}(u_{0j} + u_{5j}) = \sigma^2_{u0} + 2\sigma_{u05} + \sigma^2_{u5}$$

. display 11.267 + 2*8.077 + 7.182
34.603

The between-school variance is similar for the first two categories (professional/managerial and intermediate), highest for the unclassified group (category 4) and lowest for working class children (category 3). This implies that the school attended matters most for the unclassified group (in terms of their age 16 attainment), and least for working class children. For example, the difference between unclassified and working class in school j is estimated as $-3.190 + \hat{u}_{5j}$. The estimated variance of $u_{5j}$ is 7.182, so a 95% coverage interval for the unclassified-working class difference is $-3.190 \pm 1.96\sqrt{7.182}$ = -8.443 to 2.063. Suppose we rank schools according to their unclassified-working class difference, such that schools with the largest difference (in favour of working class children) are ranked lowest. In the bottom 2.5% of schools, unclassified children are expected to have a mean score that is more than 8.443 points lower than working class children. In the top 2.5% of schools, however, the difference is estimated to be more than 2.063, in favour of unclassified children.
Once again, it is important to note that these school differences should not be interpreted as school effects in the usual sense because we have not accounted for prior attainment. Of most interest is the extent of between-school variance in the progress made by children from different social backgrounds.

• Click "5.3 Allowing for Different Slopes Across Groups: Random Slope Models" to open Lesson 5.3

P5.4 Adding Level 2 Explanatory Variables
In the last two exercises, you have seen how to add level 1 explanatory variables to the model and interpret the results of random intercept and random slope models. A key motivation for using multilevel modelling, however, is to assess the effects of level 2 explanatory variables on level 1 outcomes and the extent to which they can explain the level 2 variance. In education, for example, we may be interested in the contextual effect of prior attainment on students’ later academic performance. A student’s progress may be affected by the performance of others in their peer group, and this effect may differ according to the student’s own prior attainment (a cross-level interaction).
Our example dataset contains three school-level variables that are potential predictors of a student’s attainment at age 16: schtype (independent vs. state schools), schurban (urban vs. rural location of school), and schdenom (Roman Catholic vs. non-denominational school). In this exercise, we will add these variables to our model and consider whether the effect on attainment of one of them depends on a selected student-level variable.
As in any analysis, we should look at the distribution of our variables before including them in a model.
Load “5.4.dta” into memory and open the do-file for this lesson:
From within the LEMMA Learning Environment
Go to Module 5: Introduction to Multilevel Modelling, and scroll down to Stata Datasets and Do-files
Each school-level variable is binary, so we will simply look at the proportion in each category. This can be done by using the tab1 command to produce one-way tables of frequencies. We restrict the scope of the command to one record per school.

. tab1 schtype schurban schdenom if pickone==1
-> tabulation of schtype if pickone==1
School type | Freq. Percent Cum.
------------+-----------------------------------
0 | 456 89.76 89.76
1 | 52 10.24 100.00
------------+-----------------------------------
Total | 508 100.00
-> tabulation of schurban if pickone==1
School |
urban-rural |
classificat |
ion | Freq. Percent Cum.
------------+-----------------------------------
0 | 163 32.09 32.09
1 | 345 67.91 100.00
------------+-----------------------------------
Total | 508 100.00
-> tabulation of schdenom if pickone==1
School |
denominatio |
n | Freq. Percent Cum.
------------+-----------------------------------
0 | 425 83.66 83.66
1 | 83 16.34 100.00
------------+-----------------------------------
Total | 508 100.00

You should obtain the following proportion of schools in category 1 of each variable: schtype (10% independent), schurban (68% urban), and schdenom (16% Catholic).

We will add these variables, one at a time, to a simplified version of the model fitted at the end of P5.3. Although we found evidence that the effect of social class on attainment differs across schools, we will work with a simpler model by removing the random coefficients on the class dummy variables.

$$\begin{aligned} \text{score}_{ij} = {} & \beta_0 + \beta_1\,\text{cohort90}_{ij} + \beta_2\,\text{female}_{ij} + \beta_3\,\text{sclass1}_{ij} + \beta_4\,\text{sclass2}_{ij} + \beta_5\,\text{sclass4}_{ij} \\ & + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij} \end{aligned}$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -138346.13
Iteration 1: log likelihood = -138346.13
Mixed-effects ML regression Number of obs = 33988
Group variable: schoolid Number of groups = 508
Obs per group: min = 1
avg = 66.9
max = 190
Wald chi2(5) = 6918.15
Log likelihood = -138346.13 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
score | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cohort90 | 1.182831 .0243149 48.65 0.000 1.135175 1.230488
female | 1.961342 .1542812 12.71 0.000 1.658956 2.263727
sclass1 | 11.08567 .2063932 53.71 0.000 10.68115 11.4902
sclass2 | 5.875198 .2040505 28.79 0.000 5.475266 6.275129
sclass4 | -3.737739 .2845318 -13.14 0.000 -4.295412 -3.180067
_cons | 24.60987 .2796221 88.01 0.000 24.06182 25.15792
------------------------------------------------------------------------------
------------------------------------------------------------------------------
-----------------------------+------------------------------------------------
var(cohort90) | .150845 . . .
var(_cons) | 22.51349 . . .
cov(cohort90,_cons) | -.5841601 . . .
-----------------------------+------------------------------------------------
var(Residual) | 192.9457 . . .
------------------------------------------------------------------------------
P5.4.1 Contextual effects

We will begin by adding school type (independent vs. state) to the model.

$$\begin{aligned} \text{score}_{ij} = {} & \beta_0 + \beta_1\,\text{cohort90}_{ij} + \beta_2\,\text{female}_{ij} + \beta_3\,\text{sclass1}_{ij} + \beta_4\,\text{sclass2}_{ij} + \beta_5\,\text{sclass4}_{ij} \\ & + \beta_6\,\text{schtype}_{j} \\ & + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij} \end{aligned}$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 schtype ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Performing EM optimization:
Group variable: schoolid Number of groups = 508
avg = 66.9
max = 190
Wald chi2(6) = 6997.32
Log likelihood = -138333.44 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
score | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cohort90 | 1.184027 .0242101 48.91 0.000 1.136576 1.231478
female | 1.963768 .154259 12.73 0.000 1.661426 2.26611
sclass1 | 11.03064 .2069528 53.30 0.000 10.62502 11.43626
sclass2 | 5.856441 .2041423 28.69 0.000 5.45633 6.256553
sclass4 | -3.750241 .2845443 -13.18 0.000 -4.307938 -3.192544
schtype | 4.247496 .8168863 5.20 0.000 2.646428 5.848564
_cons | 24.27927 .2787147 87.11 0.000 23.733 24.82554
------------------------------------------------------------------------------
------------------------------------------------------------------------------
-----------------------------+------------------------------------------------
schoolid: Unstructured |
var(cohort90) | .1481224 . . .
var(_cons) | 20.56986 . . .
cov(cohort90,_cons) | -.4585435 . . .
-----------------------------+------------------------------------------------
var(Residual) | 192.9941 . . .
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 1681.11 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.

A child in an independent school would be expected to have a score that is 4.25 points higher than a child in a state school (from the same cohort, and of the same sex and social background). We can see that this effect is strongly statistically significant because the estimated coefficient is more than 5 times its standard error. There has also been a slight reduction in the school-level variance. After accounting for school type, the between-school variance for the 1990 cohort (the intercept variance) reduces from 22.5 to 20.6. However, there remains a large amount of unexplained between-school variance.

We will now add in the urban-rural indicator of school location:

$$\begin{aligned} \text{score}_{ij} = {} & \beta_0 + \beta_1\,\text{cohort90}_{ij} + \beta_2\,\text{female}_{ij} + \beta_3\,\text{sclass1}_{ij} + \beta_4\,\text{sclass2}_{ij} + \beta_5\,\text{sclass4}_{ij} \\ & + \beta_6\,\text{schtype}_{j} + \beta_7\,\text{schurban}_{j} \\ & + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij} \end{aligned}$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> schtype schurban ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Iteration 0: log likelihood = -138328.96
Mixed-effects ML regression Number of obs = 33988
Group variable: schoolid Number of groups = 508
avg = 66.9
max = 190
Wald chi2(7) = 7018.43
------------------------------------------------------------------------------
score | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cohort90 | 1.181905 .0242256 48.79 0.000 1.134424 1.229386
female | 1.966677 .1542528 12.75 0.000 1.664347 2.269006
sclass1 | 11.03297 .2069363 53.32 0.000 10.62738 11.43856
sclass2 | 5.84713 .2041836 28.64 0.000 5.446938 6.247323
sclass4 | -3.739987 .2845715 -13.14 0.000 -4.297737 -3.182237
schtype | 4.391871 .8092549 5.43 0.000 2.80576 5.977981
schurban | -1.437171 .4763462 -3.02 0.003 -2.370793 -.50355
_cons | 25.25994 .4272198 59.13 0.000 24.4226 26.09727
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured |
var(cohort90) | .148265 . . .
var(_cons) | 19.95181 . . .
cov(cohort90,_cons) | -.4522767 . . .
-----------------------------+------------------------------------------------
var(Residual) | 193.0111 . . .
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 1650.58 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
On average, a student in an urban school has a score that is 1.44 points lower than The ratio of the estimated coefficient of schdenom to its standard error is less
a student attending a school in a town or rural area. This difference is adjusted than 0.3, so there is little evidence of a difference between Catholic and non-
for the effects of school type, and student cohort, gender and social class. The denominational schools. We will therefore remove this variable from our model.14
between-school variance in 1990 has decreased further but by a very small amount
(from 20.6 to 20.0).
P5.4.2 Cross-level interactions
Finally, we will test for differences in attainment by school denomination.
Our analysis thus far has revealed that student attainment at age 16 is significantly
scoreij = β0 + β1cohort90ij + β 2 femaleij + β3sclass1ij + β 4 sclass2ij + β5sclass4ij related to the year in which the exams were taken (cohort), and student gender
+ β6 schtype j + β7 schurban j + β8 schdenom j and parental social class. At the school level, there are differences in student
attainment between independent and state schools, and between urban and rural
+u0 j + u1j cohort90ij + eij schools. However, we have considered only main effects of these variables. In
practice, the relationship between y and an explanatory variable x1 may depend
> schtype schurban schdenom /// on the value of another variable x 2 , i.e. an interaction effect between x1 and x 2 .
In a multilevel model, x1 and x 2 may be defined at the same or different levels. If
they are at different levels, the interaction is referred to as a cross-level
interaction.
To illustrate cross-level interactions and their interpretation, we will test for an
interaction between cohort (level 1) and school type (level 2). We will also
explore whether a cohort-school type interaction can explain between-school
Mixed-effects ML regression Number of obs = 33988 differences in attainment trends (i.e. whether such an interaction reduces some of
the variance of the random part of the slope for cohort). First we generate this
Obs per group: min = 1 new interaction variable:
avg = 66.9
max = 190
. generate cohort90Xschtype = cohort90*schtype
Wald chi2(8) = 7019.84

------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
cohort90 | 1.182018 .0242188 48.81 0.000 1.13455 1.229486
female | 1.966649 .1542535 12.75 0.000 1.664318 2.26898
sclass1 | 11.0335 .2069607 53.31 0.000 10.62786 11.43913
sclass2 | 5.847271 .2041904 28.64 0.000 5.447065 6.247477
sclass4 | -3.740548 .284578 -13.14 0.000 -4.298311 -3.182786
schtype | 4.398108 .8108271 5.42 0.000 2.808916 5.9873
schurban | -1.462191 .4849347 -3.02 0.003 -2.412646 -.511737
schdenom | .1698495 .6015188 0.28 0.778 -1.009106 1.348805
_cons | 25.2484 .4293585 58.80 0.000 24.40687 26.08993
------------------------------------------------------------------------------
------------------------------------------------------------------------------
-----------------------------+------------------------------------------------
schoolid: Unstructured | 14
It is a possible that a variable with a non-significant main effect could be involved in a significant
var(cohort90) | .1481931 . . . interaction effect. To illustrate how this might arise, suppose we have a binary student-level
var(_cons) | 19.96586 . . .
cov(cohort90,_cons) | -.4598982 . . .
variable z (coded 0 and 1). Suppose also that attending a Catholic school is associated with higher
-----------------------------+------------------------------------------------ attainment among students with z = 0, but lower attainment among students with z = 1. This would
var(Residual) | 193.0126 . . . be an example of an interaction between school denomination and z (actually a cross-level
------------------------------------------------------------------------------ interaction because the two variables are defined at different levels). If the categories of z are of
LR test vs. linear regression: chi2(3) = 1647.69 Prob > chi2 = 0.0000 a similar size, ignoring the interaction with z and allowing only for an overall main effect of school
denomination is likely to lead to an apparently non-significant effect. We will not pursue this
Note: LR test is conservative and provided only for reference. possibility here.
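A quick sanity check on a product term such as cohort90Xschtype can catch coding slips before the model is fitted. The sketch below uses a small made-up dataset (the six rows are hypothetical and are not taken from the SSLS data) to show what the interaction variable should look like: zero for every state-school student and equal to cohort90 for every independent-school student.

```stata
* Hypothetical toy data: three cohorts in a state school (schtype = 0)
* and the same three cohorts in an independent school (schtype = 1)
clear
input cohort90 schtype
-6 0
 0 0
 8 0
-6 1
 0 1
 8 1
end

* Build the cross-level interaction as the product of the two variables
generate cohort90Xschtype = cohort90*schtype

* The product must vanish for state schools and equal cohort90 otherwise;
* assert stops with an error if either condition fails
assert cohort90Xschtype == 0        if schtype == 0
assert cohort90Xschtype == cohort90 if schtype == 1

list, sepby(schtype)
```

With the real data, only the two assert lines would be needed once the interaction variable has been generated.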
We then add this interaction to the model:

\[
\begin{aligned}
\text{score}_{ij} = {} & \beta_0 + \beta_1\,\text{cohort90}_{ij} + \beta_2\,\text{female}_{ij} + \beta_3\,\text{sclass1}_{ij} + \beta_4\,\text{sclass2}_{ij} + \beta_5\,\text{sclass4}_{ij} \\
& + \beta_6\,\text{schtype}_{j} + \beta_7\,\text{schurban}_{j} + \beta_8\,\text{cohort90Xschtype}_{ij} \\
& + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij}
\end{aligned}
\]

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> schtype schurban cohort90Xschtype ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Iteration 0:   log likelihood = -138312.52

Mixed-effects ML regression                     Number of obs      =     33988

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(8)       =   7154.79
Log likelihood = -138312.52                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |    1.21353   .0244236    49.69   0.000      1.16566    1.261399
      female |   1.970255   .1541846    12.78   0.000     1.668059    2.272451
     sclass1 |   11.01941   .2068684    53.27   0.000     10.61395    11.42486
     sclass2 |   5.830755   .2041076    28.57   0.000     5.430712    6.230798
     sclass4 |  -3.742837   .2844436   -13.16   0.000    -4.300336   -3.185338
     schtype |   5.290782   .8307023     6.37   0.000     3.662635    6.918929
    schurban |  -1.403769   .4828696    -2.91   0.004    -2.350176   -.4573615
cohort90Xs~e |   -.599423   .1037954    -5.78   0.000    -.8028582   -.3959878
       _cons |   25.18708   .4320316    58.30   0.000     24.34031    26.03384
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1380192          .             .           .
                  var(_cons) |   20.41395          .             .           .
         cov(cohort90,_cons) |  -.3906912          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   192.8513          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  1651.36   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

The estimated coefficient of the interaction variable cohort90Xschtype is almost 6 times its standard error, so this is strong evidence that the effect of cohort differs for independent and state schools. (Equivalently, we can say that the difference between independent and state schools differs across cohorts.) However, the addition of this interaction effect does little to explain between-school differences in attainment trends: the school-level variance in the cohort90 coefficient has reduced only slightly from 0.148 to 0.138.

To see the nature of the interaction effect, consider the fixed part of the model that contains cohort90 and schtype:

\[
1.214\,\text{cohort90} + 5.291\,\text{schtype} - 0.599\,\text{cohort90Xschtype}
\]

For schtype = 0 (state schools), this equation reduces to:

\[
1.214\,\text{cohort90}
\]

So in the average state school (i.e. with u1j = 0) [15], we would expect a year-on-year increase in attainment of 1.214 points.

For schtype = 1 (independent schools), this equation reduces to:

\[
\begin{aligned}
1.214\,\text{cohort90} + 5.291 - 0.599\,\text{cohort90} &= (1.214 - 0.599)\,\text{cohort90} + 5.291 \\
&= 0.615\,\text{cohort90} + 5.291
\end{aligned}
\]

So in the average independent school, we would expect a year-on-year increase in attainment of 0.615 points.

The coefficient of schtype (estimated as 5.291) is the expected difference in attainment between independent and state schools in 1990 (i.e. when cohort90 = 0).

Our overall conclusion is that the mean attainment is higher in independent schools than in state schools, but independent schools experienced a smaller increase in attainment with cohort. As in our earlier analyses, it would be interesting to investigate whether the trends in progress are different for independent and state schools.

From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.4 Adding Level 2 Explanatory Variables" to open Lesson 5.4
• Click Q1 to open the first question

[15] The effect of cohort varies randomly across schools, so we fix the school cohort residual at its mean of zero to examine the cohort-school type interaction effect.
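The arithmetic behind the cohort-school type interaction in P5.4.2 can be reproduced with a few scalars. This is only a sketch: it uses the rounded fixed-part estimates quoted in the text (1.214, 5.291 and −0.599), the scalar names are made up for illustration, and the three cohort values are arbitrary points chosen to show how the independent-state gap changes.

```stata
* Rounded fixed-part estimates as quoted in the text
scalar b_cohort = 1.214     // coefficient of cohort90
scalar b_type   = 5.291     // coefficient of schtype
scalar b_inter  = -0.599    // coefficient of cohort90Xschtype

* Year-on-year change in the average school (u1j = 0), by school type
scalar slope_state = b_cohort
scalar slope_indep = b_cohort + b_inter
display "Slope, state schools:       " %5.3f scalar(slope_state)
display "Slope, independent schools: " %5.3f scalar(slope_indep)

* Expected independent-state difference at illustrative cohort values
foreach c of numlist -6 0 8 {
    scalar gap = b_type + b_inter*(`c')
    display "cohort90 = " %2.0f (`c') "   gap = " %6.3f scalar(gap)
}

* With the fitted model still in memory, the independent-school slope and
* its standard error could instead be obtained with:
*     lincom cohort90 + cohort90Xschtype
```

The gap equals the schtype coefficient only at cohort90 = 0 and shrinks by 0.599 points for each later cohort, which is the sense in which the school type difference "differs across cohorts".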
P5.5 Complex Level 1 Variation

In a random slope (coefficient) model, the level 2 variance is a function of the explanatory variable(s) with a random coefficient. For example, in P5.3, we allowed the effects of cohort and social class to vary randomly across schools, which implies that the between-school variance depends on cohort and class. Up to this point, however, we have assumed that the level 1 (within-school) variance is constant. In this exercise, we will allow the within-school variance to depend on explanatory variables in a complex level 1 variance model.

Go to Module 5: Introduction to Multilevel Modelling, and scroll down to Stata Datasets and Do-files
Click "5.5.dta" to open the dataset

P5.5.1 Within-school variance as a function of cohort (continuous x)

Unfortunately, it is not possible to use the xtmixed command to fit models where the level 1 variance is a function of a continuous variable. We recommend that the interested reader considers the gllamm command which can fit such models (Rabe-Hesketh and Skrondal, 2008). What is possible is to fit models where the level 1 variance is a function of a categorical variable. An example is given in P5.5.2.

P5.5.2 Within-school variance as a function of gender (dichotomous x)

We will extend the model in P5.4.2 to assess whether boys and girls differ in terms of the variability in their scores. We have already found that the mean score is higher among girls than boys from fitting a model with a dummy for gender in the fixed part of the model. We will now include gender in the random level 1 part of the model.

In C5.5.2 we considered two alternative ways of specifying a model where the level 1 variance depends on a dichotomous variable. The preferred approach is to specify separate level 1 residuals for boys and girls, and then to estimate a separate variance for each. We do this by including in the random level 1 part of the model a dummy for boys and a separate dummy for girls. We do not need to create these variables as the xtmixed command will do this automatically for us when we specify a model with complex level 1 variation.

To allow boys and girls to have separate variances, we specify the residuals(independent, by(female)) option (the residuals() option is available as of Stata 11).

\[
\begin{aligned}
\text{score}_{ij} = {} & \beta_0 + \beta_1\,\text{cohort90}_{ij} + \beta_2\,\text{female}_{ij} + \beta_3\,\text{sclass1}_{ij} + \beta_4\,\text{sclass2}_{ij} + \beta_5\,\text{sclass4}_{ij} \\
& + \beta_6\,\text{schtype}_{j} + \beta_7\,\text{schurban}_{j} + \beta_8\,\text{cohort90Xschtype}_{ij} \\
& + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij}
\end{aligned}
\]

where:

\[
\operatorname{var}(e_{ij}) =
\begin{cases}
\sigma^2_{e0} & \text{for males} \\
\sigma^2_{e1} & \text{for females}
\end{cases}
\]

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> schtype schurban cohort90Xschtype ///
> || schoolid: cohort90, covariance(unstructured) ///
> residuals(independent, by(female)) ///
> mle variance

Obtaining starting values by EM:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -138312.52
Iteration 2:   log likelihood = -138303.77

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(8)       =   7189.37

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.215688   .0243563    49.91   0.000      1.16795    1.263425
      female |   1.969503   .1544564    12.75   0.000     1.666774    2.272232
     sclass1 |   11.01903   .2067112    53.31   0.000     10.61388    11.42418
     sclass2 |   5.835307    .203975    28.61   0.000     5.435523     6.23509
     sclass4 |  -3.765776   .2843587   -13.24   0.000    -4.323109   -3.208443
     schtype |   5.311512   .8300741     6.40   0.000     3.684597    6.938428
    schurban |  -1.413508   .4821454    -2.93   0.003    -2.358496   -.4685208
cohort90Xs~e |  -.5929665   .1036443    -5.72   0.000    -.7961056   -.3898274
       _cons |   25.19513   .4319988    58.32   0.000     24.34843    26.04184
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1369753   .0181445      .1056544    .1775813
                  var(_cons) |   20.38501   1.786094      17.16842    24.20425
         cov(cohort90,_cons) |  -.4000899   .1334368     -.6616212   -.1385586
-----------------------------+------------------------------------------------
Residual: Independent,       |
    by female                |
                0: var(e)    |   199.5662   2.273028      195.1605    204.0714
                1: var(e)    |   186.8744   2.009239      182.9775    190.8542
------------------------------------------------------------------------------

Note: LR test is conservative and provided only for reference.

The likelihood ratio test statistic comparing this model with the constant level 1 variance model is:

\[
LR = 2\,\bigl(-138304 - (-138313)\bigr) = 18 \quad \text{on 1 d.f.}
\]

So there is strong evidence (critical value at 5% is 3.84) that the amount of within-school variance differs for boys and girls. The estimated within-school variance is 186.874 for girls and 199.566 for boys. So girls have a higher mean attainment than boys and there is less variation in their scores.

P5.5.3 Within-school variance as a function of cohort and gender

Unfortunately, it is not possible to use the xtmixed command to fit models where the level 1 variance is a function of a continuous variable. We recommend that the interested reader considers the gllamm command which can fit such models (Rabe-Hesketh and Skrondal, 2008).

From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.5 Complex Level 1 Variance" to open Lesson 5.5

Don't forget to take the quiz that tests you on the whole of Module 5!

From within the LEMMA learning environment
• Click "Module 5 Understanding Quiz" to open the Quiz

P5.6 References

Rabe-Hesketh, S. and Skrondal, A. (2008) Multilevel and longitudinal modeling using Stata (Second Edition). College Station, TX: Stata Press.
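Returning to the likelihood ratio test in P5.5.2, the comparison of the constant-variance and separate-variance models can be checked by hand. The sketch below assumes the two log likelihoods as rounded in the text (-138313 and -138304); the scalar names are invented for illustration.

```stata
* Log likelihoods as rounded in the text
scalar ll_const = -138313    // constant level 1 variance (model from P5.4.2)
scalar ll_sep   = -138304    // separate level 1 variances for boys and girls

* Likelihood ratio statistic; one extra parameter, so 1 degree of freedom
scalar lr   = 2*(scalar(ll_sep) - scalar(ll_const))
scalar crit = invchi2tail(1, 0.05)     // 5% critical value of chi-squared(1)
scalar pval = chi2tail(1, scalar(lr))  // upper-tail p-value

display "LR statistic      = " %6.2f scalar(lr)
display "5% critical value = " %6.2f scalar(crit)
display "p-value           = " %8.6f scalar(pval)

* If both models are fitted in the same session, storing each with
* estimates store and comparing them with lrtest is an alternative route.
```

Using the unrounded log likelihoods reported in the iteration logs would give a slightly smaller statistic, but the conclusion is the same because the value remains far above 3.84.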
