
Journal of Applied Research in Higher Education

Evaluating the effectiveness of a mathematics bridge program using propensity scores
Sally A. Lesik, Karen G. Santoro and Edward A. DePeau

To cite this document:
Sally A. Lesik, Karen G. Santoro and Edward A. DePeau (2015), "Evaluating the effectiveness of a mathematics bridge program using propensity scores", Journal of Applied Research in Higher Education, Vol. 7 Iss 2, pp. 331-345
Permanent link to this document:
http://dx.doi.org/10.1108/JARHE-01-2014-0010



Evaluating the effectiveness of a mathematics bridge program using propensity scores

Sally A. Lesik, Karen G. Santoro and Edward A. DePeau
Department of Mathematical Sciences, Central Connecticut State University, New Britain, Connecticut, USA

Received 29 January 2014
Revised 26 August 2014; 16 November 2014
Accepted 2 December 2014
Abstract
Purpose – The purpose of this paper is to illustrate how to examine the effectiveness of a pilot
summer bridge program for elementary algebra using propensity scores. Typically, selection
into treatment programs, such as summer bridge programs, is based on self-selection. Self-selection
makes it very difficult to estimate the true treatment effect because the selection process itself
often introduces a source of bias.
Design/methodology/approach – By using propensity scores, the authors can match students
who participated in the summer bridge program with equivalent students who did not participate
in the summer bridge program. By matching students in the treatment group to equivalent students
who do not participate in the treatment, the authors can obtain an unbiased estimate of the treatment
effect. The authors also describe a method to conduct a sensitivity analysis to estimate the amount
of hidden bias generated from unobserved factors that would be needed to alter the inferences made
from a propensity score matching analysis.
Findings – Findings suggest there is no significant difference in the pass rates of the subsequent
intermediate algebra course for students who participated in the summer bridge program when
compared to matched students who did not participate in the summer bridge program. Thus, students
who participate in the summer bridge program fared no better or worse when compared to similar
students who do not participate in the program. These findings also appear to be robust to hidden bias.
Originality/value – This study describes a unique way to estimate the causal effect of participating
in a treatment program when there is self-selection into the treatment program.
Keywords Elementary algebra, Intermediate algebra, Mathematics education,
Propensity score analysis, Sensitivity analysis, Rosenbaum bounds, Summer bridge programmes
Paper type Research paper

Introduction
Developmental mathematics programs have been a controversial issue in higher
education for many years (Adelman, 1998; Boylan et al., 1994; Boylan and Saxon, 1998).
The need to evaluate developmental mathematics programs is widespread, with most
institutions doing some type of analysis on their effectiveness (Altieri, 1990; Umoh
et al., 1994; Waycaster, 2001, 2004). Early intervention programs for developmental
mathematics are often used to bridge the gap between high school and college-level
mathematics courses (Boylan, 1999; Boylan et al., 1992). One such early intervention
program is the summer bridge program (Garcia, 1991; Strayhorn, 2011).
Summer bridge programs are usually offered to students in the summer before they
enroll in their first semester. The objectives of such bridge programs are to increase
students’ precollege mathematics knowledge through targeted instruction in content areas
of need (Edgecombe, 2011), and sometimes also include assistance with study skills and
a general acclimation to campus and college life. These programs can help students avoid
sitting through an entire semester of remedial coursework when in reality they may only
need to review a portion of the content found in the course (Boylan and Saxon, 1999).
Many community colleges and universities offer bridge programs and have
researched their effectiveness. For instance, the National Center for Postsecondary
Research (Barnett et al., 2012) published a study of eight summer bridge programs in
Texas. The study found that in the first year and a half after the summer bridge
programs, students who completed the programs passed their first college-level math
course at higher rates than students who did not participate in the
programs. In addition, the percentage of students who passed their first college-level
mathematics course after participating in a summer bridge program was higher than
for non-participating students, and the difference remained statistically significant for
the first four semesters following participation in the program.
An evaluation of a summer bridge program in mathematics, reading, and writing
conducted in 2009 at Elgin Community College in Illinois found that 70 percent of
students who participated in the program then placed into a college-level course.
In the following year, the percentage of bridge students who placed into a college-level
mathematics course increased to 87 percent. The evaluators also found that 82 percent
of the successful summer bridge students then earned a grade of C or better in the next
subject-related college-level course that fall, as compared to 76 percent of students who
did not participate in the program (Douglas and Schaid, 2010).
These are just a few examples of studies suggesting that summer bridge programs
can be an effective way to get students ready for college-level courses and their success
can be attributed to many different factors. One such factor is the use of a variety of
instructional methods such as hands-on and visual approaches to learning (Boylan and
Saxon, 1999). Bonham and Boylan (2011) also suggest utilizing technology to deliver a
variety of instructional methods, especially with developmental mathematics programs
where students will use technology to identify strengths and weaknesses in their
content knowledge.
Although summer bridge programs appear to be effective in getting students
prepared for college-level courses, most of the research on the effectiveness of these
programs relies on observational data and this poses a problem for estimating the true
effect of such an intervention program. Many of the aforementioned studies used simple
descriptive techniques to compare students who participated in the bridge program to
those students who did not. As with many observational studies in mathematics
education, when participation in a treatment program is based on selection, this can be a
concern because selection effects can bias the estimate of a treatment effect (Graham,
2010; Graham and Kurlaender, 2011).
Summer bridge programs are typically designed for students who demonstrate
weaker skills and most programs invite, rather than mandate, students to participate
in the program. Thus, some invited students may choose to participate while others do
not. This makes estimating the true program effect very difficult because the reasons
students self-select into such a program vary and are based on many different factors
that may or may not also have an effect on student performance in and after the
program (Shadish et al., 2002). For example, students with the weakest mathematical
skills may more frequently elect to participate in a summer bridge program because
they (and/or their parents) are more likely to believe they need the remediation and
hope the program will help them. It may also be that more females will select to
participate in a bridge program. Thus, the only way to estimate the true effectiveness of
such programs would be to conduct a random assignment (Shadish et al., 2002).
A random assignment would require assigning students to participate in the
summer bridge program based only on chance (Shadish et al., 2002). By randomly
assigning students to participate in the summer bridge program (the treatment group)
versus not participating in the summer bridge program (the control group), the two
groups would be equivalent except for the treatment assignment. If both groups were
equivalent at the onset of the study, then a simple comparison of the success rates
between the two groups would be an unbiased estimate of the treatment effect.
However, with summer bridge programs, assigning students to participate by virtue of
random assignment is not practical. Thus, we are forced to deal with self-selection,
which is one of the key issues in evaluating summer bridge programs, and it can make
the estimate of the program effect biased (Schneider et al., 2007).
To address selection effects with observational data, statistical techniques such as
propensity scores (Graham, 2010; Graham and Kurlaender, 2011; Guo and Fraser, 2010;
Morgan and Winship, 2007; Rubin, 1997) can be used to obtain an unbiased estimate
of a treatment effect. Propensity score analysis is a way to match treatment and control
participants and thus obtain an unbiased estimate of the treatment effect (Guo and
Fraser, 2010; Morgan and Winship, 2007; Rubin, 2006).


The basic idea behind propensity scores is to first formulate a statistical model
of the selection process by using pre-treatment characteristics to describe the selection
process and then comparing participants in the treatment and control groups that have
equivalent propensity scores. For instance, gender and skill level could be seen as some
of the core pre-treatment variables that could describe the selection process into
a summer bridge program. More specifically, females are often perceived as having
weaker mathematical skills (Niederle and Vesterlund, 2010) and stronger verbal skills
(Hyde and Linn, 1988), and these are factors that could be used to model selection into a
treatment program.
This paper describes the evaluation of a pilot summer bridge program to
illustrate how propensity scores can be used to estimate the effect of participating in a
summer bridge program in mathematics (i.e. the treatment effect) on subsequent
success in mathematics even when there is self-selection into the treatment program.
We will also describe in detail how to use a sensitivity analysis to address the limitation
of hidden bias with a propensity score analysis as well as other limitations, and present
a discussion of some of the findings.

The Summer Institute


The summer bridge program we examine in this study, the Summer Institute,
was offered at a mid-sized public university in the northeastern USA. In an effort
to encourage steady progress toward graduation, that state passed a law that
mandated all students fulfill a remedial mathematics proficiency requirement through
placement testing or by passing an elementary algebra course within their first
24 credits. Elementary algebra at this university is a course that serves remediation
purposes only – it carries no credit toward graduation. After 24 credits, students who
have not yet met the remediation requirement are not allowed to register for classes
at any of the state universities until it is met. The Summer Institute was created
through a grant, began in 2010, and was developed for incoming first-year
students who had not yet fulfilled the state’s remediation requirement. The goal was to
provide students with academic support in mathematics along with the opportunity to
meet the proficiency requirement during the summer before enrolling for the first time
in the fall semester.
Students eligible to participate in the Summer Institute for the 2010 cohort were
identified by the score they received on the mathematics portion of the SAT examination.
Scores of 450 or less place a student into the non-credit remedial mathematics course
(elementary algebra), and all students with such scores who were also state residents were
invited to attend. Participants self-selected into the program and admission was granted
to approximately 40 students on a first-come, first-served basis. The program was
completely cost-free to students.
The Summer Institute program consisted of on-campus workshops, online practice
problem sets, and an exemption examination. Students were expected to attend each of
12 two-hour workshops that were held three times per week in July. These workshops
were led by faculty with successful experience teaching elementary algebra at the
university, and the structure of the workshops was a combination of faculty-led
instruction along with individual and group practice and activities. For at-home study,
sets of questions were made available to students through the online learning
management system used by the university. A database of questions was created by
faculty to align with the elementary algebra curriculum at the institution. For the
Downloaded by 114.120.239.122 At 20:46 20 September 2016 (PT)

cohort in this study, there were six problem sets, one for each of the six units in the
elementary algebra curriculum, and students had access to them prior to and
throughout the Summer Institute. The problem sets had to be completed in one sitting
but could be attempted (with the same question types but with different questions)
as often as students chose. Students received immediate feedback after each attempt in
a review that showed each question again along with the student answer and the
correct answer, and those attempt reviews could be examined again at any time. On the
last day of the program, students took the exemption examination. The examination
is based on the database of questions used for the practice problem sets and is
administered and proctored on campus. Scores of 70 percent or better earn exemption
from elementary algebra and provide placement into the next course, intermediate
algebra. Although intermediate algebra does not count toward the general education
requirements for the university, it does earn credit toward graduation.

Data
The data consist of n = 506 full-time, first-time students, all of whom took
intermediate algebra in their first semester in the fall of 2010. One group consisted of
students who initially placed into, and enrolled directly in, the intermediate algebra
course, and the other group was the treatment group. The treatment group consisted of
approximately 18 students who originally placed into elementary algebra and then
successfully completed the bridge program during the summer of 2010, which elevated
their placement to intermediate algebra, and who also enrolled in intermediate algebra in
the fall of 2010. About half of the students that successfully completed the bridge program
did not enroll in intermediate algebra in the fall of 2010 when the data were collected
despite their eligibility[1]. Descriptive statistics for both the treatment and control groups
can be found in Table I. Variables collected for this study are described as follows:
INSTITUTE – this is a binary variable that represents whether or not a student
participated in the Summer Institute (the value “1” was assigned if the student participated
in the Summer Institute, and “0” was assigned if the student did not participate in the
Summer Institute).
PASS – this is a binary variable that represents whether or not a student passed the
intermediate algebra course with the grade of a C- or better on their first attempt[2]
(the value “1” was assigned if the student passed intermediate algebra with the grade of
C- or better on their first attempt, and “0” if the student did not earn at least a C- on their
first attempt taking the course).
FEMALE – this is a binary variable that represents a student’s gender (the value “1”
was assigned if the student identified as female and “0” was assigned if the student
identified as male).
SATMATH – this is the continuous score received on the mathematics portion of the
SAT examination (scores range from 280 to 540).
SATVERBAL – this is the continuous score received on the verbal portion of the SAT
examination (scores range from 290 to 690).
SATWRITING – this is the continuous score received on the writing portion of the
SAT examination (scores range from 320 to 710).
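The coding scheme above can be sketched with a small helper function. This is purely illustrative: the helper name and the sample values below are hypothetical, not actual records from the study.

```python
# Illustrative sketch of the variable coding described above.
# The record below is a hypothetical student, not data from the study.

def code_student(participated, passed, female, sat_math, sat_verbal, sat_writing):
    """Return one student record using the binary/continuous coding
    defined in the Data section."""
    return {
        "INSTITUTE": 1 if participated else 0,   # 1 = attended Summer Institute
        "PASS": 1 if passed else 0,              # 1 = C- or better on first attempt
        "FEMALE": 1 if female else 0,            # 1 = identified as female
        "SATMATH": sat_math,                     # 280-540 in this sample
        "SATVERBAL": sat_verbal,                 # 290-690 in this sample
        "SATWRITING": sat_writing,               # 320-710 in this sample
    }

# A hypothetical treatment-group student:
student = code_student(True, True, True, 430, 500, 480)
```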

Analysis
The effect of participating in the Summer Institute on whether or not a student passed
intermediate algebra on their first try was initially estimated by considering the
percentage of students who passed intermediate algebra based on whether or not they
participated in the Summer Institute. The numbers of students who passed and failed
intermediate algebra based on Summer Institute participation are presented in Table II.
Approximately 74.0 percent of students who did not participate in the Summer
Institute passed the intermediate algebra course as compared to only 50.0 percent of
students who participated in the Summer Institute (Fisher’s exact test, p < 0.05). This
finding would suggest that the Summer Institute did not help students succeed in
intermediate algebra, and in fact, the students who participated in the Summer Institute
fared worse than did non-participating students, as a significantly smaller percentage
of students who participated in the Summer Institute passed intermediate algebra as
compared to those students who did not participate.
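The unmatched comparison can be reproduced from the counts in Table II with a two-sided Fisher's exact test. The sketch below is one minimal pure-Python way to compute it from the hypergeometric null distribution (it is an illustration of the standard test, not the authors' code):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Fixes both margins and sums the probabilities of every table that is
    as likely or less likely than the observed one under the
    hypergeometric null distribution.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):  # probability of a table with x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)        # feasible range for the top-left cell
    hi = min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Counts from Table II: rows = Summer Institute (No/Yes), cols = Pass (No/Yes)
p = fisher_exact_two_sided(127, 361, 9, 9)
```

Consistent with the text, the resulting p-value falls below 0.05 for the unmatched sample.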
However, the question arises as to whether or not the larger percentage of non-
participants who passed intermediate algebra was because the program was
ineffective, or was it because those students who participated in the Summer Institute
(the treatment group) were somehow systematically different from those students who

Table I. Summary statistics for Summer Institute participants and non-participants who enrolled in intermediate algebra in the fall 2010 semester

                        Summer Institute participation
                    Yes                        No
Variable        n    Mean     SD          n    Mean     SD
Female          18   0.61     0.50        488  0.49     0.50
SAT math        18   426.11   23.04       488  495.68   39.42
SAT verbal      18   505.56   45.27       488  496.33   58.50
SAT writing     18   485.56   68.88       484  496.43   62.53
Note: n = 506

Table II. Number of students who passed and failed intermediate algebra based on Summer Institute participation

                          Pass
Summer Institute     No     Yes    Total
No                   127    361    488
Yes                  9      9      18
Total                136    370    506
Note: n = 506
did not participate in the Summer Institute (the control group). Furthermore, the
number of Summer Institute participants in this study (n = 18) is much smaller
than the number of students in the control group (n = 488).
One indication that the treatment and control groups are different can be seen by
looking at the mean differences in the gender and mathematics knowledge levels,
as demonstrated by SAT scores between the treatment and control groups illustrated
in Table I. For example, 61.1 percent of the treatment group is female as compared
to 49.0 percent of the control group. The mean SAT mathematics score for the
treatment group is 426.11 as compared to 495.68 for the control group. The mean
SAT verbal score for the treatment group is 505.56 as compared to 496.33 for the
control group. The mean SAT writing score for the treatment group is 485.56 as
compared to 496.43 for the control group. Thus, we can see that the treatment and
control groups are not equivalent when considering the mathematical, verbal,
and writing skills of the students, and this is especially noticeable for the mean SAT
mathematics scores.
Propensity scores can be used to establish equivalent treatment and control groups
by creating a model that predicts treatment status using relevant pre-treatment
characteristics (Graham, 2010; Graham and Kurlaender, 2011; Guo and Fraser, 2010;
Lunceford and Davidian, 2004). The propensity score model used in this study is
a logistic regression analysis that predicts participation in the treatment program
based on gender, SAT mathematics score, SAT verbal score, and SAT writing score
as is described in Equation (1). This model estimates the conditional probability of
participating in the Summer Institute (treatment program) based on the aforementioned
collection of relevant pre-treatment variables. The results of this analysis are presented
in Table III:
prob(treatment | pre-treatment variables)
    = 1 / (1 + e^−(b0 + b1·FEMALE + b2·SATMATH + b3·SATVERBAL + b4·SATWRITING))     (1)
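Plugging the estimates reported in Table III into Equation (1) gives each student's estimated propensity score. The sketch below reproduces that calculation for two hypothetical students, one resembling the treatment-group means and one resembling the control-group means from Table I; the specific SAT values are illustrative:

```python
from math import exp

# Estimated parameters from the logistic selection model (Table III)
B0, B_FEMALE, B_MATH, B_VERBAL, B_WRITING = 5.631, 0.247, -0.043, 0.019, 0.003

def propensity(female, sat_math, sat_verbal, sat_writing):
    """Estimated probability of Summer Institute participation, Equation (1)."""
    linear = (B0 + B_FEMALE * female + B_MATH * sat_math
              + B_VERBAL * sat_verbal + B_WRITING * sat_writing)
    return 1.0 / (1.0 + exp(-linear))

# Hypothetical students near the group means in Table I:
p_treated_like = propensity(1, 426, 506, 486)  # resembles the treatment group
p_control_like = propensity(0, 496, 496, 496)  # resembles the control group
```

As expected from the negative SAT mathematics coefficient, the student with the weaker mathematics score has the higher estimated propensity to participate.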
There are many different ways to match equivalent observations in the treatment and
control groups with propensity scores using techniques such as Mahalanobis metric
matching, k-nearest neighbor matching, and caliper matching to name a few (Guo and

Table III. Parameter estimates, standard errors, test statistics, and p-values for the logistic regression model of selection that predicts Summer Institute participation based on gender, SAT mathematics score, SAT verbal score, and SAT writing score

Variable       Estimated parameter    SE       Test statistic    p-Value
Female         0.247                  0.579    0.43              0.669
SAT math       −0.043                 0.008    −5.29             0.000
SAT verbal     0.019                  0.006    2.98              0.003
SAT writing    0.003                  0.006    0.46              0.645
Constant       5.631                  2.421    2.33              0.020
Note: n = 502
Fraser, 2010). We decided on one-to-one nearest neighbor matching as this technique
matches each individual participant in the treatment group to an individual member of
the control group with the closest propensity score. We initially decided on sampling
with replacement because there is reason to believe that the propensity score
distributions were likely to be different between Summer Institute participants and
non-participants. This is probably due to the fact that fewer students participated in
the Summer Institute as compared to non-participants (Smith and Todd, 2005; Caliendo
and Kopeinig, 2005).
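One-to-one nearest-neighbor matching on the propensity score can be sketched in a few lines. This is a simplified illustration of the technique, not the authors' actual procedure; a flag switches between matching with and without replacement, and the tiny example data are hypothetical:

```python
def nearest_neighbor_match(treated, controls, with_replacement=True):
    """Match each treated unit to the control with the closest propensity score.

    `treated` and `controls` map unit id -> estimated propensity score.
    Returns a dict mapping each treated id to its matched control id.
    """
    available = dict(controls)
    matches = {}
    for t_id, t_score in treated.items():
        # control whose propensity score is nearest to this treated unit's
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        matches[t_id] = c_id
        if not with_replacement:
            del available[c_id]  # each control may be used only once
    return matches

# Tiny hypothetical example (propensity scores, not real students):
treated = {"t1": 0.20, "t2": 0.195}
controls = {"c1": 0.19, "c2": 0.05, "c3": 0.30}
m_with = nearest_neighbor_match(treated, controls, with_replacement=True)
m_without = nearest_neighbor_match(treated, controls, with_replacement=False)
```

In the example, matching with replacement reuses the closest control ("c1") for both treated units, while matching without replacement forces the second treated unit onto a more distant control, which is exactly the trade-off discussed above.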
Table IV shows the summary statistics for the participants and non-participants
who are matched with replacement (i.e. those non-participants who have similar
estimated propensity scores). Notice that in the matched-with-replacement sample
there is the same percentage of females in the treatment and control groups, and the
mean SAT mathematics scores show much less of a difference.
Propensity scores can be used to create matched treatment and control groups and
once equivalent groups are established, regression analysis or simple inferential


techniques can then be used to model the outcome of interest (Graham, 2010; Graham
and Kurlaender, 2011; Guo and Fraser, 2010). First estimating propensity scores and
then considering only the matched cases can lead to an unbiased estimate of the
treatment effect, provided that the functional form of the propensity score model is
specified correctly (Rubin, 1997, 2006). Thus, an unbiased
estimate of the treatment effect can be found because the treatment and control groups
are now more similar with respect to gender and skill level. This improves our ability to
make causal inferences about a treatment effect as compared to trying to do so on
unmatched samples (Graham, 2010; Graham and Kurlaender, 2011; Guo and Fraser,
2010; Lunceford and Davidian, 2004; Morgan and Winship, 2007; Rosenbaum and
Rubin, 1983; Rubin, 1997, 2006; Schneider et al., 2007).
Table V presents a two-way table describing the pass rates for the matched with
replacement group of Summer Institute participants and non-participants. No significant

Table IV. Summary statistics for Summer Institute participants and matched (with replacement) non-participants who enrolled in intermediate algebra

                        Summer Institute participation
                    Yes                        No
Variable        n    Mean     SD          n    Mean     SD
Female          18   0.61     0.50        18   0.61     0.50
SAT math        18   426.11   23.04       18   434.44   44.75
SAT verbal      18   505.56   45.27       18   520.00   77.61
SAT writing     18   485.56   68.88       18   514.44   98.35
Note: n = 36

Table V. Number of students who passed and failed intermediate algebra for matched (with replacement) students

                          Pass
Summer Institute     No     Yes    Total
No                   10     8      18
Yes                  9      9      18
Total                19     17     36
Note: n = 36
difference in the pass rates for the matched samples was found, as 44.4 percent of the
non-participants passed intermediate algebra on their first try as compared to 50.0
percent of Summer Institute participants (Fisher’s exact test, p > 0.05). This contradicts
the previous finding with the non-matched sample, which suggested that a lower
percentage of students in the treatment group passed intermediate algebra when
compared to those students in the control group.
We can also consider matching without replacement; these summary statistics
are presented in Table VI. Notice that in the sample that was matched without
replacement, there are some similarities as well as some differences between the
treatment and control groups. For instance, 61 percent of the treatment group is female
as compared to 72 percent of the control group. The mean SAT mathematics score for
the treatment group is 426.11 as compared to 419.44 for the control group. The mean
SAT verbal score for the treatment group is 505.56 as compared to 490.56 for the
control group. The mean SAT writing score for the treatment group is 485.56 as
compared to 478.89 for the control group.


Table VII presents a two-way table describing the pass rates for the matched (without
replacement) group of Summer Institute participants and non-participants. No significant
difference in the pass rates for the matched samples was found, as 61.1 percent of the non-
participants passed intermediate algebra on their first try as compared to 50.0 percent of
Summer Institute participants (Fisher’s exact test, p > 0.05). This too contradicts the
original finding with the non-matched sample, which suggested that a significantly lower
percentage of students in the treatment group passed intermediate algebra when
compared to those students in the control group; in the matched sample this difference
is not significant.

Limitations
Perhaps the biggest limitation of propensity scores is that the method only matches on
observed variables and not unobserved variables (Graham, 2010; Graham and

Table VI. Summary statistics for Summer Institute participants and matched (without replacement) non-participants who enrolled in intermediate algebra

                        Summer Institute participation
                    Yes                        No
Variable        n    Mean     SD          n    Mean     SD
Female          18   0.61     0.50        18   0.72     0.46
SAT math        18   426.11   23.04       18   419.44   46.33
SAT verbal      18   505.56   45.27       18   490.56   68.73
SAT writing     18   485.56   68.88       18   478.89   75.61
Note: n = 36

Table VII. Number of students who passed and failed intermediate algebra for matched (without replacement) students

                          Pass
Summer Institute     No     Yes    Total
No                   7      11     18
Yes                  9      9      18
Total                16     20     36
Note: n = 36
Kurlaender, 2011; Guo and Fraser, 2010; Lunceford and Davidian, 2004; Morgan and
Winship, 2007; Rosenbaum and Rubin, 1983; Rubin, 1997, 2006; Schneider et al., 2007).
In other words, the assignment to the treatment is assumed to be “strongly ignorable”
(Rosenbaum and Rubin, 1983). The true value of propensity scores relies on correctly
modeling the selection process (Weiss, 1998). If the selection model does not adequately
describe the selection process because there are unobserved factors that are related to the
selection process and also related to the outcome, then propensity score techniques do
not offer much more in terms of reducing bias than do standard inferential techniques.
Thus, hidden bias can exist if there are unobserved factors that impact both the
treatment and the outcome measure simultaneously (Rosenbaum, 2002).
Although there are no direct tests that can be done to detect hidden bias,
a sensitivity analysis that relies on a bounding approach can be used to determine
how strong an unobserved variable needs to be to alter the inferences made with
a propensity score analysis (Becker and Caliendo, 2007; Rosenbaum, 2002). Rosenbaum
(2002) developed a method to conduct a sensitivity analysis to estimate the magnitude
of hidden bias generated from unobserved factors that would be needed to alter the
inferences made from a one-to-one matching analysis. The basic idea behind
Rosenbaum bounds is to intentionally violate the “strongly ignorable” assumption by
specifying hypothetical odds ratios that represent an increase in the odds for the
outcome for those exposed to the treatment due to unobserved factors.
The Mantel and Haenszel (1959) test compares the observed number of students
who did not pass the subsequent intermediate algebra course with the expected
number of students who did not pass the subsequent intermediate algebra course if
there is no treatment effect (Aakvik, 2001). This test statistic is bounded by two
distributions, one distribution that describes overestimating the treatment effect and
the other that describes underestimating the treatment effect. Essentially, selection is
influenced not only by observed factors, but perhaps also by unobserved factors.
Because of these unobserved factors, students who share similar propensity scores
may differ in their odds of participating in the Summer Institute by a factor of Γ.
In other words, given some degree of hidden bias, two students with the same observed
characteristics would differ in their odds of participating in the Summer Institute due to
unobserved factors. When Γ = 1, there is no hidden bias. If the significance level of
the estimate of the treatment effect remains stable as Γ increases, then the estimate of
the treatment effect is robust to hidden bias. If the significance level does not remain
stable as Γ increases, then the estimate of the treatment effect is sensitive to hidden bias
(Becker and Caliendo, 2007).
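The bounding idea can be sketched in code. For a 1:1 matched design with a binary outcome, hidden bias of magnitude Γ constrains the chance that the treated member of a discordant pair is the one with the event to lie between 1/(1 + Γ) and Γ/(1 + Γ) (Rosenbaum, 2002). The Python sketch below is illustrative only: it uses a McNemar-type sign test rather than the Mantel-Haenszel statistic reported in this study, and the pair counts in the example are hypothetical, not taken from the data.

```python
import math

def binom_sf(k, n, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def rosenbaum_bounds(d_treated, d_discordant, gamma):
    """Bound the one-sided p-value for a matched-pairs sign test when hidden
    bias of magnitude gamma constrains each discordant pair's chance of a
    'treated-only' event to the interval [1/(1+gamma), gamma/(1+gamma)]."""
    p_hi = gamma / (1.0 + gamma)  # bias working toward treatment: largest p-value
    p_lo = 1.0 / (1.0 + gamma)    # bias working against treatment: smallest p-value
    return (binom_sf(d_treated, d_discordant, p_lo),
            binom_sf(d_treated, d_discordant, p_hi))

# Hypothetical example: 10 discordant pairs, 8 with the event in the treated member.
low, high = rosenbaum_bounds(8, 10, 2.0)
# At gamma = 1 the two bounds coincide at the usual sign-test p-value;
# they spread apart as gamma grows.
```

At Γ = 1 the two bounds collapse to a single p-value, which corresponds to the "no hidden bias" row of a bounds table; scanning the bounds over a grid of Γ values reproduces the structure of Table VIII.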
Once we have found significance levels for the estimate of the treatment effect based
on different values of Γ, we can see what magnitude of Γ is needed to alter the original
conclusions. Table VIII provides the Mantel-Haenszel statistics as well as the significance
levels of the treatment effect for varying values of Γ for both overestimation and
underestimation. These statistics and significance levels can be used to estimate the
magnitude of hidden bias for one-to-one propensity score matching without replacement
when there is a binary outcome measure (Becker and Caliendo, 2007).
In Table VIII, when Γ = 1 there is no hidden bias (the significance level for the
treatment effect is the original 0.3704), and thus no significant treatment effect.
First consider overestimating the treatment effect. Notice that when Γ = 2.50 the
estimate becomes significant (the p-value for the treatment effect is now 0.0424).
This suggests that if unobserved factors caused the odds ratio of selection into
Table VIII. Significance levels for Rosenbaum bounds for one-to-one matched samples
without replacement

         Mantel-Haenszel statistic          Significance level
Γ        Overestimation  Underestimation    Overestimation  Underestimation
1.00     0.3307           0.3307            0.3704          0.3704
1.25     0.6688           0.0031            0.2518          0.4988
1.50     0.9427          −0.2683            0.1729          0.6058
1.75     1.1759          −0.1794            0.1198          0.5712
2.00     1.3793           0.0167            0.0839          0.4933
2.25     1.5601           0.1897            0.0594          0.4248
2.50     1.7230           0.3446            0.0424*         0.3652
2.75     1.8715           0.4849            0.0306*         0.3139
3.00     2.0080           0.6133            0.0223*         0.2698
3.25     2.1344           0.7317            0.0164*         0.2322
3.50     2.2523           0.8415            0.0122*         0.2000
3.75     2.3628           0.9441            0.0091*         0.1726
4.00     2.4669           1.0403            0.0068*         0.1491
4.25     2.5652           1.1309            0.0052*         0.1290
4.50     2.6586           1.2166            0.0039*         0.1119
4.75     2.7474           1.2979            0.0030*         0.0972
5.00     2.8321           1.3753            0.0023*         0.0845
5.25     2.9133           1.4492            0.0018*         0.0736
5.50     2.9910           1.5198            0.0014*         0.0643
5.75     3.0658           1.5875            0.0011*         0.0562
6.00     3.1377           1.6525            0.0009*         0.0492*
6.25     3.2071           1.7151            0.0007*         0.0432*
6.50     3.2741           1.7754            0.0005*         0.0379*
6.75     3.3389           1.8336            0.0004*         0.0334*
7.00     3.4017           1.8898            0.0003*         0.0294*
7.25     3.4626           1.9443            0.0003*         0.0259*
7.50     3.5216           1.9970            0.0002*         0.0229*
7.75     3.5791           2.0482            0.0002*         0.0203*
8.00     3.6349           2.0979            0.0001*         0.0180*
8.25     3.6893           2.1463            0.0001*         0.0159*
8.50     3.7424           2.1933            0.0001*         0.0141*
8.75     3.7941           2.2391            0.0001*         0.0126*
9.00     3.8445           2.2837            0.0001*         0.0112*
9.25     3.8938           2.3273            0.0000*         0.0100*
9.50     3.9420           2.3698            0.0000*         0.0089*
9.75     3.9892           2.4113            0.0000*         0.0079*
10.00    4.0353           2.4519            0.0000*         0.0071*

Notes: n = 36. *p < 0.05

the treatment to differ by a factor of 2.5, then we may see a significant difference in the
pass rates between the treatment and control groups. In other words, if students are 2.5
times more likely to participate in the treatment group and these students also have
a higher probability of passing intermediate algebra (i.e. a positive treatment effect),
then the estimate of the treatment effect would be considered an overestimate of the
true but unknown treatment effect, suggesting that there could be hidden bias. As for
underestimating the treatment effect, notice that when Γ = 6.00 the p-value for the
treatment effect is now 0.0492. This suggests that if unobserved factors caused the odds
ratio of selection into the treatment to differ by a factor of 6, then we may see a significant
difference in the pass rates between the treatment and control groups. Thus, if students
are six times more likely to participate in the treatment group and these students also
have a higher probability of not passing intermediate algebra (i.e. a negative treatment
effect), then the estimate of the treatment effect would be considered an underestimate of
the true treatment effect, again suggesting that there could be hidden bias. However, for
this study we are not necessarily concerned with overestimating the treatment effect, since
this is an example where the treatment effect at Γ = 1 is non-significant and the
underestimation bounds provide the magnitude of hidden bias that would be needed for
the treatment effect to become significant (Becker and Caliendo, 2007).
Essentially, the closer to 1 the value of Γ at which the significance level changes, the
more likely it is that the treatment effect is overestimated or underestimated due to
unobserved factors, and thus the less robust the estimate of the treatment effect is to
hidden bias.
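Finding the crossing point in a bounds table like Table VIII can be automated. The short Python sketch below scans (Γ, p-value) pairs and reports the smallest Γ at which the bound p-value drops below a chosen α; the helper name `critical_gamma` is ours, and the rows shown are a few values taken from the underestimation column of Table VIII around the crossing.

```python
def critical_gamma(rows, alpha=0.05):
    """Return the smallest gamma whose bound p-value falls below alpha,
    or None if the p-value never crosses the threshold."""
    for gamma, p_value in sorted(rows):
        if p_value < alpha:
            return gamma
    return None

# A few rows from the underestimation column of Table VIII, around the crossing.
underestimation = [(5.50, 0.0643), (5.75, 0.0562), (6.00, 0.0492), (6.25, 0.0432)]
print(critical_gamma(underestimation))  # → 6.0
```

This reproduces the reading of Table VIII in the text: the underestimation bound first becomes significant at Γ = 6.00.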
Although there are no formal methods for deciding the appropriate critical value for Γ,
various rules of thumb suggest that values of Γ less than 2.0 that alter the conclusion
regarding the significance of the estimate of the treatment effect should be cause for
concern. This is because if Γ is less than 2.0, then one student could be less than twice as
likely as another to participate in the treatment program due to unobserved factors.
Since the values for Γ that we found are greater than 2, this leads us to believe that the
estimate of the treatment effect is robust to hidden bias. This helps strengthen the
inference of no significant difference in the pass rates between students who
participated in the Summer Institute and matched students who did not participate in
the program.
Even though the main goal of this study is to illustrate how propensity scores can
be used to evaluate the effectiveness of a treatment program, another limitation of this
study is the small sample size of the treatment group and subsequent matched
comparison group. One of the concerns with our study is that even though more than
40 students initially participated in the Summer Institute, and more than 90 percent of
these students passed the elementary algebra exemption examination, only 18 of those
successful students went on to take the intermediate algebra course in the subsequent
fall semester. Most of the other students who passed the Summer Institute did not
take intermediate algebra during the course of time when the data were collected.
Furthermore, some of the students who passed the exemption examination earned
a high enough score on a computerized placement examination to be able to take
college-level mathematics courses above the level of intermediate algebra. Clearly,
a larger sample size would have allowed for a more powerful study and thus have made
for a more robust evaluation (Weiss, 1998).

Discussion
As we have shown, propensity score matching has many advantages over more
traditional methods that consider unmatched samples (Zanutto, 2006). Even though
propensity score matching does rely on the correct functional form of the model between
the participation in the treatment program and the relevant variables, sensitivity
analyses can be done to see if the estimate of the treatment effect is robust to hidden bias.
As Lunceford and Davidian (2004) suggest, including additional variables in the
propensity score model can increase the precision of the estimate of the
treatment effect. We also found that changes in unobserved characteristics did not
appear to alter the estimate of the treatment effect, thus giving more confidence in the
propensity score analysis.
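To make the matching step itself concrete, the following Python sketch performs greedy one-to-one nearest-neighbor matching on the propensity score without replacement. It is an illustrative sketch, not the procedure used in this study: it assumes propensity scores have already been estimated (e.g. via logistic regression), and the function name, caliper value, and example data are hypothetical.

```python
def match_one_to_one(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping unit id -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs; treated units with
    no available control within the caliper are left unmatched.
    """
    available = dict(control)  # controls still available for matching
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement: each control used once
    return pairs

# Hypothetical scores for two treated and three control students.
treated = {"T1": 0.62, "T2": 0.35}
control = {"C1": 0.60, "C2": 0.33, "C3": 0.90}
print(match_one_to_one(treated, control))  # → [('T2', 'C2'), ('T1', 'C1')]
```

The caliper discards matches whose scores are too far apart, which trades sample size for closer comparability between the matched groups.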
Perhaps the greatest advantage of using propensity score matching is that it is easy
to describe findings to an audience with little or no statistical background, who can
appreciate the importance of comparing groups that are equivalent. Comparing two
groups that are similar gives more confidence in the estimates of the treatment effect.
Given that the Summer Institute was a pilot program, further analysis of subsequent
cohorts will surely yield a more extensive analysis. However, with our analysis we
found that students who participated in the pilot summer bridge program fared no
better or worse when compared to students who were matched based on gender,
mathematics, verbal, and writing skills. Furthermore, this finding appears to be robust
to hidden bias. In some of the developmental education literature there is a
misunderstanding that students who participate in developmental mathematics
programs should perform better in mathematics than non-developmental students
(Goudas and Boylan, 2012). As Goudas and Boylan (2012) state, the purpose of
developmental education is, “[…] that remedial students should perform equally to
non-remedial students and only in gatekeeper courses […]” (p. 4). Given that the
purpose of developmental education is to put students on a level playing field with
non-developmental students, the findings from this study suggest that bridge programs
in mathematics can be a success.
Students who take semester-long remedial courses tend to take longer to complete
their degrees and are therefore less likely to graduate (Attewell et al., 2006). Instruction
in the summer bridge program targeted individual students’ weaknesses that
prevented them from placing directly into intermediate algebra. The Summer Institute
was effective enough to get approximately 90 percent of the participating students to
place into the next course, intermediate algebra. Success in the program allowed these
students to take intermediate algebra one semester sooner than they would have if they
did not participate in the program. This places first-time developmental students on the
same time line to graduate as first-time non-developmental students.
We attribute the student success in the pilot summer bridge program to an effective
combination of intervention strategies – targeted content instruction, online practice
with immediate feedback, and coaching on general mathematical study strategies,
as well as to the early introduction to campus and college life that the program provides.
Though online delivery of mathematics remediation has received a great deal of
attention in recent years, some research suggests that students benefit most when
technology is used in moderation and in conjunction with in-person instruction and
coaching as was done in this program (Boylan, 2002). The online component was
beneficial to both students and workshop instructors because the online practice
problems gave students the opportunity to practice skills and review concepts, and also
provided immediate feedback. Instructors naturally had access to information about
students’ online progress, so not only could they check assessment scores, but they
could also view individual assignment attempts and monitor the amounts of time
students spent working online, all of which was used to inform the targeted instruction
and coaching.
Also contributing to the success of the program were the non-academic benefits.
It provided an early introduction to various campus resources and facilities such as the
tutoring center, which provides students with tutoring assistance and coaching on
successful study strategies. Thus, students began their fall semester already knowing
how and where to get help, a key to student success in developmental education
(Boylan, 2002). Finally, students often worked in groups during the Summer
Institute workshops, which gave them opportunities to build friendships, to learn from
each other, and to build confidence in mathematics.
Notes
1. More than 90 percent of the students who participated in the Summer Institute passed the
elementary algebra exemption examination, and of these, a total of 18 students went on to
take intermediate algebra in the subsequent fall semester (fall 2010). The rest of the students
who passed the Summer Institute either have not yet taken intermediate algebra or took the
course after the subsequent fall semester. Some of the students who passed the exemption
examination had a high enough score to be able to take college-level mathematics courses
(i.e. mathematics courses which have intermediate algebra as a prerequisite).
2. The grade of C- or better was used as passing, as this is the prerequisite grade that is
required to take any subsequent courses in mathematics.

References
Aakvik, A. (2001), “Bounding a matching estimator: the case of a Norwegian training program”,
Oxford Bulletin of Economics and Statistics, Vol. 63 No. 1, pp. 115-143.
Adelman, C. (1998), “The kiss of death? An alternative view of college remediation”, National
Crosstalk, Vol. 6 No. 3, p. 11.
Altieri, G. (1990), “A structural model for student outcomes: assessment programs in community
colleges”, Community College Review, Vol. 17 No. 4, pp. 15-32.
Attewell, P., Lavin, D., Domina, T. and Levey, T. (2006), “New evidence on college remediation”,
The Journal of Higher Education, Vol. 77 No. 5, pp. 886-924.
Barnett, E.A., Bork, R.H., Mayer, A.K., Pretlow, J., Wathington, H.D. and Weiss, M.J. (2012),
Bridging the Gap: An Impact Study of Eight Developmental Summer Bridge Programs in
Texas, Community College Research Center, Teachers College, Columbia University,
New York, NY.
Becker, S.O. and Caliendo, M. (2007), “Sensitivity analysis for average treatment effects”, The
Stata Journal, Vol. 7 No. 1, pp. 71-83.
Bonham, B.S. and Boylan, H.R. (2011), “Developmental mathematics: challenges, promising
practices, and recent initiatives”, Journal of Developmental Education, Vol. 34 No. 3, pp. 2-10.
Boylan, H. (1999), “Exploring alternatives to remediation”, Journal of Developmental Education,
Vol. 22 No. 3, pp. 2-10.
Boylan, H. and Saxon, D. (1998), “The origin, scope, and outcomes of developmental education in
the 20th century”, in Higbee, J. and Dwinell, P. (Eds), Developmental Education: Preparing
Successful College Students, National Resource Center for the First-Year Experience and
Students in Transition, University of South Carolina, Columbia, SC, pp. 5-14.
Boylan, H. and Saxon, D. (1999), Outcomes of Remediation, National Center for Developmental
Education, Boone, NC.
Boylan, H., Bonham, B. and Bliss, L. (1994), “Who are the developmental students?”, Research in
Developmental Education, Vol. 11 No. 2, pp. 1-4.
Boylan, H., Bonham, B., Claxton, C. and Bliss, L. (1992), “The state of the art in developmental
education: report of a national study”, First National Conference on Research in
Developmental Education, Charlotte, NC, November.
Boylan, H.R. (2002), What Works: Research-Based Best Practices in Developmental Education,
National Center for Developmental Education, Boone, NC.
Caliendo, M. and Kopeinig, S. (2005), “Some practical guidance for the implementation
of propensity score matching”, discussion paper series, The Institute for the Study of
Labor, Bonn.
JARHE Douglas, A. and Schaid, J. (2010), From Curiosity to Results: The Creation of A Summer Bridge
Program, University of Illinois at Urbana-Champaign, Champaign, IL.
7,2
Edgecombe, N. (2011), “Accelerating the academic achievement of students referred to
developmental education”, Brief No. 55, Community College Research Center (CCRC),
Teachers College, Columbia University, New York, NY.
Garcia, P. (1991), “Summer bridge: improving retention rates for underprepared students”,
Journal of the Freshman Year Experience, Vol. 3 No. 2, pp. 91-105.
Goudas, A.M. and Boylan, H.R. (2012), “Addressing flawed research in developmental education”,
Journal of Developmental Education, Vol. 36 No. 1, pp. 2-13.
Graham, S.E. (2010), “Using propensity scores to reduce selection bias in mathematics education
research”, Journal for Research in Mathematics Education, Vol. 41 No. 2, pp. 147-168.
Graham, S.E. and Kurlaender, M. (2011), “Using propensity scores in educational research:
general principles and practical applications”, The Journal of Educational Research,
Vol. 104 No. 5, pp. 340-353.
Guo, S. and Fraser, M.W. (2010), Propensity Score Analysis: Statistical Methods and Applications,
Sage, Los Angeles, CA.
Hyde, J.S. and Linn, M.C. (1988), “Gender differences in verbal ability: a meta-analysis”,
Psychological Bulletin, Vol. 104 No. 1, pp. 53-69.
Lunceford, J.K. and Davidian, M. (2004), “Stratification and weighting via the propensity score in
estimation of causal treatment effects: a comparative study”, Statistics in Medicine, Vol. 23
No. 19, pp. 2937-2960.
Mantel, N. and Haenszel, W. (1959), “Statistical aspects of the analysis of data from retrospective
studies”, Journal of the National Cancer Institute, Vol. 22 No. 4, pp. 719-748.
Morgan, S.L. and Winship, C. (2007), Counterfactuals and Causal Inference: Methods and
Principles for Social Research, Cambridge University Press, Cambridge, MA.
Niederle, M. and Vesterlund, L. (2010), “Explaining the gender gap in math scores: the role of
competition”, Journal of Economic Perspectives, Vol. 24 No. 2, pp. 129-144.
Rosenbaum, P.R. (2002), Observational Studies, Springer, New York, NY.
Rosenbaum, P.R. and Rubin, D.B. (1983), “The central role of the propensity score in
observational studies for causal effects”, Biometrika, Vol. 70 No. 1, pp. 41-55.
Rubin, D.B. (1997), “Estimating causal effects from large data sets using propensity scores”,
Annals of Internal Medicine, Vol. 127 No. 8, pp. 757-763.
Rubin, D.B. (2006), Matched Sampling for Causal Effects, Cambridge University Press,
Cambridge, MA.
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W.H. and Shavelson, R.J. (2007), Estimating
Causal Effects Using Experimental and Observational Designs, American Educational
Research Association, Washington, DC.
Shadish, W., Cook, T. and Campbell, D. (2002), Experimental and Quasi-Experimental Designs for
Generalized Causal Inference, Houghton Mifflin, Boston, MA.
Smith, J. and Todd, P. (2005), “Does matching overcome LaLonde’s critique of nonexperimental
estimators?”, Journal of Econometrics, Vol. 125 No. 1, pp. 305-353.
Strayhorn, T.L. (2011), “Bridging the pipeline: increasing underrepresented students’ preparation
for college through a summer bridge program”, American Behavioral Scientist, Vol. 55
No. 2, pp. 142-159.
Umoh, U., Eddy, J. and Spaulding, D. (1994), “Factors related to student retention in community
college developmental education”, Community College Review, Vol. 22 No. 2, pp. 37-47.
Waycaster, P. (2001), “Factors impacting success in community college developmental
mathematics courses and subsequent courses”, Community College Journal of Research
& Practice, Vol. 25 Nos 5-6, pp. 403-416.
Waycaster, P. (2004), “The best predictors of success in developmental mathematics courses”,
Inquiry, Vol. 9 No. 1, pp. 1-8.
Weiss, C.H. (1998), Evaluation, Prentice Hall, Upper Saddle River, NJ.
Zanutto, E.L. (2006), “A comparison of propensity score and linear regression analysis of complex
survey data”, Journal of Data Science, Vol. 4 No. 1, pp. 47-91.

Corresponding author
Dr Sally A. Lesik can be contacted at: lesiks@ccsu.edu