

Methodology Matters

Using multiple linear regression in pharmacy education scholarship


Amanda A. Olsen (a), Jacqueline E. McLaughlin (b,*), Spencer E. Harpe (c)

(a) College of Education, University of Texas at Arlington, Arlington, TX, United States
(b) UNC Eshelman School of Pharmacy, University of North Carolina, Campus Box 7355, 329 Beard Hall, 301 Pharmacy Lane, Chapel Hill, North Carolina 27599, United States
(c) Midwestern University Chicago College of Pharmacy, Downers Grove, IL, United States

* Corresponding author. E-mail address: Jacqui_McLaughlin@unc.edu (J.E. McLaughlin).

https://doi.org/10.1016/j.cptl.2020.05.017

ARTICLE INFO

Keywords: Regression; Statistics; Biostatistics; Quantitative research; Modeling

ABSTRACT

Our situation: There has been an increased interest in regression techniques within pharmacy education to allow researchers to determine variables that may predict a specific outcome (e.g., predicting student scores on the Pharmacy Curriculum Outcomes Assessment). This article has been tailored for individuals who are interested in learning more about multiple linear regression as a data analysis tool and its potential utility in pharmacy education research.

Methodological literature review: Within this section, the basic steps of regression are outlined, starting with correlational analysis before progressing to simple linear regression and multiple regression. Key terms needed to understand and interpret regressions are also discussed.

Our recommendations and their applications: Nine practical recommendations are provided to help researchers better understand and implement regression analyses in their studies.

Potential impact: Regression analyses could be helpful in advancing pharmacy education scholarship by enabling scholars to better understand variables that may predict specific outcomes such as student achievement or program retention.

Our situation

Throughout our careers as pharmacy educators, we have been asked to engage in research with students and colleagues. We have found that one of the most challenging aspects of conducting quantitative research is arguably determining and implementing the correct statistical analyses. Specifically, as researchers we should be thinking about how the research design informs the data collected and the analysis plan. One statistical analysis commonly misunderstood is regression. Often, we find that students and faculty struggle to understand, implement, and interpret this type of analysis within their research. For example, we have seen researchers report multiple correlational relationships when a regression analysis would have been a more informative and robust option.
Regression analyses are often used to determine variables that are thought to explain or predict student performance. This often
occurs when we are interested in determining factors that predict student course failure,1 passing the North American Pharmacist
Licensure Examination (NAPLEX),2 or matching for residency.3 One instrument commonly used to predict student performance is the
Pharmacy Curriculum Outcomes Assessment (PCOA), a comprehensive tool designed to assess pharmacy knowledge. The PCOA
became a required assessment for all pharmacy students nearing completion of the didactic curriculum with the release of the 2016
Accreditation Council for Pharmacy Education Accreditation Standards (Standards 2016).4 Since then, pharmacy educators have
worked to better understand how results from the PCOA might be used to inform curriculum change and student development.5,6 We
often approach studying PCOA outcomes by using a regression analysis, which can help identify predictors of student performance on the assessment.



Box 1
Recommended regression resources.

1. Blalock SJ. Making Meaning of Data in Pharmaceutical and Clinical Research. American Pharmacists Association; 2019.
2. Chatterjee S, Hadi AS. Regression Analysis by Example. 5th ed. John Wiley & Sons; 2012.
3. Dupont WD. Statistical Modeling for Biomedical Researchers. 2nd ed. Cambridge University Press; 2009.
4. Fox J. Applied Regression Analysis and Generalized Linear Models. 3rd ed. SAGE Publications; 2015.
5. Friesner D, Bentley JP. Simple and multiple linear regression. In: Aparasu RR, Bentley JP, eds. Principles of Research Design and Drug Literature Evaluation. 2nd ed. McGraw-Hill; 2019.
6. Montgomery DC, Peck EA, Vining GG. Introduction to Linear Regression Analysis. 5th ed. John Wiley & Sons; 2012.
7. Randolph KA, Myers LL. Basic Statistics in Multivariate Analysis. Oxford University Press; 2013. [Chapter 5 discusses regression]
8. Salkind NJ. Statistics for People Who (Think They) Hate Statistics. 6th ed. SAGE Publications; 2017. [Chapter 15 discusses correlation; Chapter 16 discusses regression]
9. Skrepnek GH. Regression methods in the empiric analysis of health care data. J Manag Care Pharm. 2005;11(3):240–251. doi:10.18553/jmcp.2005.11.3.240
10. UCLA Institute for Digital Research and Education (https://stats.idre.ucla.edu/). Provides several online resources including annotated output (https://stats.idre.ucla.edu/other/annotatedoutput/) and data analysis examples (https://stats.idre.ucla.edu/other/dae/) for several common statistical packages.

For example, in a study by Gillette et al.,5 the Pharmacy College Admissions Test (PCAT), the Health Science Reasoning Test, and cumulative pharmacy grade point averages (GPAs) were associated with higher PCOA scores. Similarly, Giuliano et al.6 found that PCOA performance was predicted by GPA, PCAT Reading section score, institution, accommodators, and student affinity towards reading. By identifying possible variables that predict PCOA performance, pharmacy schools can better understand variables that have a relationship with student success.
Although regression can be a reasonable analytic approach for studying PCOA scores, there are a number of challenges that can occur when developing a regression model.5,6 Common pitfalls include the use of overly causal language to describe results, inappropriate sample size, failure to acknowledge sample-dependence, and incorrect beta coefficient interpretations for categorical variables.7 Therefore, the goal of this article is to outline the potential uses, assumptions, advantages, and disadvantages of this regression technique, as well as discuss terminology, the interpretation of regression results, potential pitfalls, and tips for successful implementation. We have tailored this article for individuals who are interested in learning more about multiple linear regression as a data analysis tool and its potential utility in pharmacy education research. Box 1 provides additional regression resources.

Methodological literature review

In this paper, a correlational analysis will be described before advancing to simple linear regression and multiple regression.
Although there are many types of regression, such as logistic regression and multilevel modeling, this article will focus on linear
regression, sometimes called ordinary least squares regression. Linear regression was selected as it is one of the most common
regression techniques used in educational scholarship and is one of the first types of regression analyses learned by scholars.

Introduction to correlation

The goal of a correlation is to understand the strength and direction of the relationship between two different variables.8 The
strength of the relationship describes how closely two variables are associated with each other and is measured from 0 to 1, where 0
indicates there is no relationship and 1 indicates there is a perfect linear relationship.8 In the social sciences, a correlation of 0.1 is
typically considered weak, 0.3 is considered moderate, and 0.5 is considered strong.9,10 It is important to note that these benchmarks
may vary by discipline (e.g., a correlation of 0.5 may be viewed as poor in some laboratory-based disciplines) and even context within
a discipline (e.g., a correlation of 0.8 may be required for a strong relationship when assessing test-retest reliability within education).
Furthermore, the direction of a correlation can either be positive or negative. A positive relationship occurs when both variables
move in the same direction (e.g., as the temperature outside increases, the air conditioning bill increases) and a negative relationship
occurs when one variable increases as the other variable decreases, or vice versa (e.g., as the outside temperature decreases, the
heating bill increases).
Results from a correlational analysis are typically reported using the letter "r" and include both the strength (i.e. 0 to 1) and direction (i.e. + or -) of the relationship between the two variables of interest.8 Often a strong correlation coefficient means that a linear relationship exists between two variables, whereas a weak correlation coefficient suggests that a linear relationship may not exist. For example, a weak correlation coefficient may be found when the actual relationship is curvilinear, which would require a different type of analysis. It is important to note that strong correlations alone do not establish causation.7


A common way to visualize the correlation between two variables is to use a scatterplot, where each axis represents one variable.
A positive correlation would have data points starting in the bottom left corner of the plot increasing to the top right corner, while a
negative correlation would have data points starting from the top left corner of the plot decreasing to the bottom right corner. The
tighter the data points are clustered together along the diagonal, the stronger the correlation. When no linear pattern emerges and the
points are randomly scattered around the plot, there is no correlation between the two variables of interest.
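To make this concrete, the short sketch below computes a Pearson correlation and draws the corresponding scatterplot. It is a minimal illustration, assuming Python with scipy and matplotlib available; the temperature and bill figures are simulated, not real data.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
temperature = rng.uniform(60, 100, size=50)                    # outside temperature (simulated)
ac_bill = 20 + 1.5 * temperature + rng.normal(0, 10, size=50)  # air conditioning bill (simulated)

r, p = stats.pearsonr(temperature, ac_bill)  # strength/direction (r) and P value
print(f"r = {r:.2f}, P = {p:.4f}")

plt.scatter(temperature, ac_bill)            # a tighter diagonal cluster means a stronger r
plt.xlabel("Outside temperature")
plt.ylabel("Air conditioning bill")
plt.show()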

Introduction to regression

Linear regression is an extension of Pearson's correlation (Pearson r), although it should be noted that there are other types of correlation (e.g., Kendall, Spearman) that will not be discussed in this paper.8 While correlations determine whether two variables are associated with each other, regression analyses determine whether one variable can predict the outcome of another variable.11 Since we are interested in predicting one variable based on another, we must decide which variable is the dependent (outcome) variable and which is the independent (predictor) variable. Unlike correlational analyses, variables in regression analyses need to be classified as either predictor or outcome variables.

Simple linear regression

In simple linear regression, only one predictor variable is used to predict or estimate a single outcome variable.12 The main difference between regression and correlation plots is that regression plots include a straight, or linear, line known as the line of best fit.12 The line of best fit shows the general direction and best representation, or trend, of the data points. The standard equation for simple linear regression is Yi = (b0 + b1Xi) + ei, where the line of best fit is defined by the slope of the line, or the regression coefficient of the predictor variable (b1), and the point at which the line crosses the y, or vertical, axis (b0). The variable Yi is the outcome variable to be predicted, Xi is the participant's score on the predictor variable, and ei is the error or residual term, which is the difference between the line of best fit and the participant's actual observed score.11
When plotting the line of best fit for variables that have strong correlations, the line of best fit is close to the data points, suggesting little difference between the observed data and the line of best fit. When there are large distances between the observed data points and the line of best fit, there are high levels of error. This occurs when there are deviations between what we expect the value to be and its true value. Although the true value is unobservable, it can be estimated using the line of best fit. Therefore, when variables are strongly associated with each other, there is less error associated with the regression results.
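As a minimal sketch of these quantities (assuming Python with scipy; the GPA and PCOA values below are simulated for illustration), scipy.stats.linregress returns the intercept (b0), slope (b1), and related statistics, from which residuals can be computed:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gpa = rng.uniform(2.0, 4.0, 100)                # predictor X (simulated)
pcoa = 200 + 60 * gpa + rng.normal(0, 25, 100)  # outcome Y (simulated)

res = stats.linregress(gpa, pcoa)               # fits Y = b0 + b1*X
print(f"b0 = {res.intercept:.1f}, b1 = {res.slope:.1f}, r = {res.rvalue:.2f}")

predicted = res.intercept + res.slope * gpa     # line of best fit
residuals = pcoa - predicted                    # e_i: observed minus predicted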

Multiple linear regression

In multiple linear regression, which will be called "regression" for the remainder of this paper, the term "multiple" refers to the use of multiple predictor variables. Multiple linear regression may also be referred to as "multivariable" linear regression. This can be confused with the term "multivariate," which should only be used when there are multiple outcome variables. In the regression formula Yi = (b0 + b1X1i + b2X2i + … + bnXni) + ei, the variable Yi is the outcome variable to be predicted, b1 is the coefficient of the first predictor variable (X1i), b2 is the coefficient of the second predictor variable (X2i), bn is the coefficient of the nth predictor variable (Xni), and ei is the error term, which is the difference between the predicted value from the line of best fit and the actual observed value of the outcome variable.12
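A minimal sketch of fitting such a model, assuming Python with pandas and statsmodels available and using simulated GPA, PCAT, and PCOA values for illustration:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({"gpa": rng.uniform(2.0, 4.0, n),
                   "pcat": rng.uniform(300, 600, n)})
# simulated outcome following Y = b0 + b1*gpa + b2*pcat + e
df["pcoa"] = 100 + 50 * df["gpa"] + 0.3 * df["pcat"] + rng.normal(0, 20, n)

model = smf.ols("pcoa ~ gpa + pcat", data=df).fit()  # ordinary least squares fit
print(model.params)    # b0 (Intercept), b1 (gpa), b2 (pcat)
print(model.summary()) # full regression output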

The intercept (b0)


The intercept or constant (b0) is the expected mean value of the outcome variable when all predictor variables are set to zero. Often the intercept is not interpreted directly. For example, it would rarely be meaningful to interpret the intercept for a predictor such as age (i.e., when age equals zero) because an age of zero typically falls outside the range of the data. However, depending on the variable, interpreting the intercept may be useful. For example, if you were interested in the expected mean blood pressure for a group of patients when no medication is given, you would need to interpret the intercept for that information. Therefore, you as the researcher need to determine when it makes sense to interpret the intercept.

The regression coefficients (bn)


The relationship between a predictor variable (X1i, X2i, to Xni) and the outcome variable in a regression model is represented by
the regression coefficient. When interpreting these coefficients, all other variables in the model are held constant, which is called
“controlling for all variables in the model” or “holding all other variables constant.” In other words, the model equalizes the participants for all variables except for the predictor being interpreted. Regression coefficients can be either unstandardized or standardized.

Unstandardized coefficient (also called slope, unstandardized beta, represented with a “B”)
The value of the unstandardized coefficient for the predictor represents the amount of change in the outcome, given a one-unit
change in the predictor. The sign of the coefficient determines whether the changes in the variables are in the same direction (a
positive coefficient) or in opposite directions (a negative coefficient). To make the interpretation as simple as possible, it may be
helpful to think of what happens to the outcome variable when the predictor variable is increased by one unit (i.e., for a one unit
increase in X, there is a B unit increase/decrease in Y).

3
A.A. Olsen, et al. Currents in Pharmacy Teaching and Learning xxx (xxxx) xxx–xxx

Categorical predictor variables require additional consideration because they must be coded when added to regression models. There are many different approaches to coding categorical variables (e.g., effect coding, difference coding, Helmert coding).13 Given its fairly intuitive interpretation, one common approach, dummy coding, is briefly described here. In the simplest case of a dichotomous variable, like gender, one value is selected as the reference value and is coded as 0 while the other value is coded as 1. With dummy coding, the unstandardized coefficient represents the change in Y for the category in question (e.g., female) compared to the reference category (e.g., male). When categorical variables have multiple response options, such as race/ethnicity (e.g., White, Black, Hispanic, Asian, other), one option is still selected as the reference group (e.g., White). The various response options are coded into separate variables as 0 (not in that category) or 1 (in that category). All of the dummy variables except the reference category are included in the regression model (i.e. the set of 0/1 variables for Black, Hispanic, Asian, and other; a variable for White is not included). In this case, the regression coefficients for any of the included race/ethnicity variables are interpreted as the effect for that category compared to the reference group. For example, the coefficient for Asian individuals would describe the relationship with the outcome variable in comparison to White individuals. If the goal were to compare a given response category to all others within the group (e.g., Asian individuals vs. any other race/ethnicity), then alternate coding approaches would be required. The most intuitive approach would be to create a dichotomous variable for each race/ethnicity coded as 1 for individuals within the group of interest and 0 for those in any other group (e.g., Black would be 1 if the individual were Black and 0 if they were of any other race/ethnicity). A note of caution with dummy coding: it is important to think about dummy-coded variables when determining sample size because the more variables that are created, the larger the sample size needs to be (see Recommendation 4 below). A brief coding illustration follows.
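The sketch below illustrates dummy (treatment) coding with an explicitly chosen reference category. It assumes Python with pandas and statsmodels, whose formula interface handles the 0/1 coding automatically; the data and variable names are simulated and hypothetical.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
groups = ["White", "Black", "Hispanic", "Asian", "Other"]
df = pd.DataFrame({"race": rng.choice(groups, size=200),
                   "pcoa": rng.normal(350, 30, size=200)})  # simulated scores

# Treatment (dummy) coding with White as the reference category; each
# coefficient compares that category's expected outcome with White's
model = smf.ols("pcoa ~ C(race, Treatment(reference='White'))", data=df).fit()
print(model.params)  # one coefficient per non-reference category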

Standardized coefficient (also called beta coefficients or beta weights, represented with a “β”)
Using unstandardized coefficients alone may not be sufficient because each is measured in the unit or scale of its predictor variable. There could be numerous units or scales represented within a single model, for example “years” for age, “0 to 100 points” for exam grade, “1 to 4 points” for GPA, and “0 to 800 points” for PCAT. As a result, one cannot glean the relative importance of the relationship between one predictor and the outcome in comparison to that of the other variables. In contrast, standardizing the coefficients rescales the variances of the predictor and outcome variables to all equal one. This standardization produces coefficients that indicate the relative importance of each variable and enables researchers to determine which variables have the highest and lowest relative predictive value. When interpreting standardized coefficients, for each one standard deviation increase in X, there is a β standard deviation increase/decrease in Y. Note how the “unit” here is a standard deviation unit when interpreting standardized coefficients, in contrast to the native units of the variables when interpreting unstandardized coefficients.
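One simple way to obtain standardized coefficients, sketched below with simulated data (Python with pandas and statsmodels assumed), is to z-score every variable before fitting so that each slope is expressed in SD units:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 150
df = pd.DataFrame({"gpa": rng.uniform(2.0, 4.0, n),
                   "pcat": rng.uniform(300, 600, n)})
df["pcoa"] = 100 + 50 * df["gpa"] + 0.3 * df["pcat"] + rng.normal(0, 20, n)

z = (df - df.mean()) / df.std()  # every variable now has mean 0 and SD 1
betas = smf.ols("pcoa ~ gpa + pcat", data=z).fit().params
print(betas)                     # slopes are standardized (beta) coefficients

Note that statsmodels does not report standardized coefficients by default, which is why the variables are standardized manually here.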

The P value
At the outset of any empirical research, the researcher must identify the threshold for statistical significance, or alpha. Conceptually, alpha represents the probability of rejecting the null hypothesis when it is true. An alpha level of 0.05 indicates that the researcher is willing to accept a 5% risk of concluding that a difference exists when there is no difference. The P value is the product of a statistical test, and in regression, the analysis provides a P value for each predictor, as well as for the overall regression model. Under null-hypothesis statistical testing, the null hypothesis is rejected when the P value is less than alpha, indicating that the predictor variable has a statistically significant relationship with the outcome variable. When the P value is greater than alpha, the null hypothesis fails to be rejected and no statistically significant relationship between the predictor and outcome is inferred. Although 0.05 is a common alpha level in contemporary research, consideration should be given to whether a more lenient (0.1) or more stringent (0.01) criterion should be established for determining statistical significance. Regardless of the threshold chosen for statistical significance, the actual P value should be reported rather than simply reporting a cut-point (e.g., P < .05). Similarly, reporting various cut-points to suggest levels of statistical significance, such as the traditional * for P < .05, ** for P < .01, and *** for P < .001, is not appropriate.14 It is also recommended that confidence intervals for the regression coefficients be reported. It is important to note that there is spirited discussion occurring throughout the academic community on the appropriate use of P values.15,16
While statistical significance is linked to the variability of the sample or how well the data fit the proposed statistical model or
hypothetical explanation, practical significance is linked to whether the result has meaning or is useful in the real world.10,17
Confusion between practical and statistical significance often occurs when interpreting correlation coefficients. It is not uncommon to
see statistically significant correlation coefficients of 0.2 or 0.3. However, while these coefficients may be statistically significant,
they may not be practically significant since these are generally considered weak correlations. To help determine what is considered
to be practically significant, Peeters10 suggests using generalized guidelines (e.g., Cohen's effect size interpretation), benchmarking,
or selecting a minimal important difference. Effect sizes can measure the magnitude of a relationship and aid in determining practical
significance. As proposed by Cohen,9 an effect size of 0.1 is considered to be small, 0.3 is medium, and 0.5 is large. It is important to
remember effect sizes are context-specific, so the strict application of Cohen's benchmarks without careful consideration is not
recommended.10,17–19

Confidence interval
The confidence interval is calculated for the slope of the regression line and is constructed using the standard error of the sampling distribution. The standard error represents the average distance that the observed values fall from the regression line and can be used to determine the precision of predictions. Typically, researchers choose either a 95% or a 99% confidence interval depending on the alpha level selected (either 0.05 or 0.01). A 95% confidence interval spans approximately plus or minus two times the standard error around the estimated coefficient. In other words, the researchers would be 95% confident that the identified confidence interval contains the true unstandardized coefficient. In addition, the confidence interval can be used to determine statistical significance: when the confidence interval for a linear regression coefficient does not contain 0, the coefficient is statistically significant.
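Most statistical packages report these quantities directly. As a minimal sketch with simulated data (Python with statsmodels assumed), the exact P values and confidence intervals can be pulled from the fitted results object:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 150
df = pd.DataFrame({"gpa": rng.uniform(2.0, 4.0, n),
                   "pcat": rng.uniform(300, 600, n)})
df["pcoa"] = 100 + 50 * df["gpa"] + 0.3 * df["pcat"] + rng.normal(0, 20, n)

model = smf.ols("pcoa ~ gpa + pcat", data=df).fit()
print(model.pvalues)               # exact P value for each coefficient
print(model.conf_int(alpha=0.05))  # 95% CI; significant if 0 is excluded
print(model.f_pvalue)              # P value for the overall model F test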

R2 (also called the coefficient of determination in simple linear regression or the coefficient of multiple determination in multiple linear regression)
R2 (also expressed r2) is a measure of how well the variables in the model predict the outcome. Technically, it is the amount of variance in the outcome that is accounted for when the values of the predictors are known. This can be seen visually by inspecting how close the data are to the regression line. An R2 of 1 indicates that the model explains all of the variability in the outcome. This, however, is unlikely to happen in any educational research study because people are extremely difficult to predict. As a result, there is often error in our predictions, which leads to lower R2 values. Error does not mean that you as the researcher made a mistake; instead, it means that there is unexplained variation in the data or that the regression model is misspecified (e.g., important variables omitted, wrong variables included), so not every data point falls on the line of best fit. For example, if R2 = 0.34, then 34% of the variance in the outcome is accounted for, or explained, by the predictors in the model. While a very small R2 value may suggest that the model is missing important variables for understanding the outcome, if there are statistically significant predictors, some conclusions can still be drawn. Since R2 will always increase when variables are added to the model, adjusted R2 is often used to control for that increase, as it accounts for the number of predictors relative to the sample size.
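As a minimal sketch with simulated data (Python with statsmodels assumed), both values are available from a fitted model, and the adjustment formula can be written out explicitly:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n, k = 150, 2  # sample size and number of predictors
df = pd.DataFrame({"gpa": rng.uniform(2.0, 4.0, n),
                   "pcat": rng.uniform(300, 600, n)})
df["pcoa"] = 100 + 50 * df["gpa"] + 0.3 * df["pcat"] + rng.normal(0, 20, n)

model = smf.ols("pcoa ~ gpa + pcat", data=df).fit()
r2 = model.rsquared                          # share of outcome variance explained
adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted R2 penalizes extra predictors
print(r2, adj, model.rsquared_adj)           # the last two values should match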

Assumptions
In addition to understanding how outcome variables are predicted, there are a few assumptions that should be met and tested. The main regression assumptions include variable types, multicollinearity, independence of errors, distribution of errors, linearity, and homoscedasticity (see Table 1).12 A brief diagnostic sketch in code follows the list.

(1) Variable types: In a regression analysis, predictor variables are either continuous (i.e. interval or ratio data) or categorical (i.e.
nominal or ordinal data), meaning that predictor variables can be at any level of measurement. However, the outcome variable
must be continuous.20 One common misconception is that the dependent variable must be normally distributed. Technically, the
normality assumption refers to the error terms, but assessing normality of the outcome variable can sometimes be a helpful
indicator that the error distribution will also be normal.12
(2) Multicollinearity: It is important that the predictor variables included in the regression analysis are not too highly correlated with
each other. Multicollinearity is the excessive correlation between predictor variables. Multicollinearity can be examined before or
after the regression analysis is completed by analyzing the correlations between predictor variables. This assumption can be
tested by calculating the variance inflation factor (VIF), which measures whether predictor variables are highly correlated with
each other suggesting that they are measuring the same thing.12 When multicollinearity is detected, careful thought should be
given to which variable(s) should be included in the model or if strongly correlated variables should be aggregated into a single
variable.
(3) Independent errors: The independent errors assumption analyzes the correlation between the error terms of any two observations.
Researchers can ensure this assumption is met early in the research process by taking care that the individual observations are
independent of each other (i.e. there is no matching or pairing). The Durbin-Watson (D-W) test can be used to confirm that error
terms are independent from one another in situations where there are temporal or serial relationships at play (e.g., a pharmacy
program's NAPLEX pass rate in one year may be correlated with the previous year's pass rate).12 When errors are not independent,
regression results can become biased, potentially leading to incorrect inferences.
(4) Normally distributed errors: The error terms in the regression model are assumed to be normally distributed with a mean of 0. When the error terms are not normally distributed, the resulting coefficient estimates may be biased.12
(5) Linearity: To meet this assumption, the data representing the relationship between each predictor variable and the outcome
variable must be linear (form a straight line).12
(6) Homoscedasticity: The homoscedasticity assumption concerns the spread of the error term. Specifically, the spread of the errors should be similar across the entire range of each predictor variable. For example, if the variable age had smaller errors for younger individuals but larger errors for older individuals on the outcome variable, this would be a homoscedasticity violation. When this assumption is violated, the standard errors become biased, which can change confidence intervals and statistical significance.
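The diagnostic sketch promised above checks two of these assumptions, multicollinearity (VIF) and independent errors (Durbin-Watson). It assumes Python with pandas and statsmodels and uses simulated data:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n = 150
X = pd.DataFrame({"gpa": rng.uniform(2.0, 4.0, n),
                  "pcat": rng.uniform(300, 600, n)})
y = 100 + 50 * X["gpa"] + 0.3 * X["pcat"] + rng.normal(0, 20, n)

Xc = sm.add_constant(X)            # adds the intercept column
model = sm.OLS(y, Xc).fit()

for i, name in enumerate(Xc.columns):
    if name != "const":            # VIF > 10 flags multicollinearity
        print(name, variance_inflation_factor(Xc.values, i))

print(durbin_watson(model.resid))  # values near 2 suggest independent errors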

Appropriate uses of regression

As stated in the introduction, the purpose of regression is to predict an outcome variable or determine how well the predictor
variables explain the outcome variable.

Steps for conducting a regression

Step 1
Determine that your research question and data are appropriate for regression. It is important to determine that the question you are trying to answer aligns with the purpose of regression. While regression is a flexible tool, it must be used in the appropriate situations.


Table 1
Linear regression assumptions.

Variable types. Meaning: predictor variables can be any level of measurement, but the outcome variable must be a continuous variable. How to test it: NA. Remedies for violations: exclusion of variables that do not meet the criteria (e.g., an outcome variable that is categorical).

Multicollinearity. Meaning: predictor variables are highly correlated with each other, suggesting they are measuring the same thing. How to test it: variance inflation factor > 10. Remedies for violations: exclusion of variables that are highly correlated with each other (e.g., variables with a VIF > 10).

Independent errors. Meaning: the error terms of any two observations are independent of each other. How to test it: Durbin-Watson test between 1.4 and 2.6, or a P value > .05. Remedies for violations: specify the model to account for the lack of independence (e.g., incorporate clustering) or change how the outcome variable is measured.

Normally distributed errors. Meaning: error terms are normally distributed. How to test it: straight line on a Q-Q plot. Remedies for violations: data cleaning and the exclusion of outliers, or transform the data (apply a mathematical function to the data, such as a square root or log function).

Linearity. Meaning: the relationship between the predictor variables and the outcome variable must form a straight line. How to test it: correlational analyses. Remedies for violations: either transform the data or exclude variables that do not have a linear relationship.

Homoscedasticity. Meaning: the spread of the error terms is similar across a predictor variable. How to test it: graph of residuals (error terms) against predicted values. Remedies for violations: conduct a weighted least squares regression or transform the outcome variable.

NA = not applicable; VIF = variance inflation factor.

As with any statistical analysis approach, regression requires an adequate sample size; the sample size requirements for regression may be larger than those for simpler tests. This is discussed in more detail later in this paper. The data should also be screened to check for errors such as out-of-range values, duplicates, and unusual cases.

Step 2
Identify which variables you plan to include in your regression analysis and potential relationships between those variables. To
determine which variables should be included in the regression model, researchers consult the literature to identify variables of
interest and determine their research questions. It is also important to identify the level of measurement for each variable in the
study. As stated earlier, the outcome variable must be continuous, while the predictor variables can be either continuous or categorical. For example, PCOA scores are continuous, while predictor variables like gender or major prior to pharmacy school would be categorical. After variables have been identified, it is recommended that the researchers hypothesize how the variables in their analysis relate to each other. Being aware of the relationships between variables supports a better understanding of the regression results.

Step 3
Set the alpha level for the statistical analyses. Alpha levels are typically set by the researcher at either 0.05 or 0.01. If alpha was
set at 0.05, variables with a P value < .05 would be considered statistically significant.

Step 4
Build the regression model. There are multiple ways to build a regression model, and the purpose of the analysis usually drives the modeling approach. For example, if you are interested simply in determining which predictor variables statistically significantly predict an outcome variable, all candidate variables can be included in the model (the most popular approach). However, if you are interested in identifying the best set of predictor variables, then the model will need to be built more strategically, for example by including all variables in the model and then removing non-significant ones one at a time. This is also called stepwise regression, since variables are either added or removed step by step to assess changes in the model. Stepwise regression in either direction (forward or backward) is not a recommended best practice as it can bias the regression results.21,22 The use of automated algorithms in statistical software to build regression models is also strongly discouraged. Subject matter knowledge is a vital piece of building useful regression models.

Step 5
Test the regression model assumptions. Satisfied assumptions result in the best unbiased estimates of the values in the sample being analyzed.12 If you violate an assumption, solutions can be found in Table 1. Although deviations from assumptions should be reported, most (if not all) routine findings of this assumption data-screening will not appear in a research report; rather, the screening should inform and inspire confidence among investigators in their overall findings.

(1) Multicollinearity: Calculate the VIF to determine whether there are multiple predictor variables that are highly correlated with
each other. Although there is no agreed upon benchmark for the VIF, Bowerman and O'Connell23 and Meyers24 determined that a
VIF > 10 would be cause for concern.
(2) Independent errors: The D-W test statistic can range between 0 and 4, with a value of 2 suggesting that the residuals are not
correlated. When there is a value > 2, there is a negative correlation between adjacent residuals while a value < 2 suggests there
is a positive correlation.25 Statistical packages typically report a P value for the D-W test. When the P value is < 0.05, the
independent errors assumption is violated. Error terms can also be plotted to test this assumption.
(3) Assumptions about errors: Data visualization is the easiest way to determine whether assumptions regarding residuals are met.
Optimally, the data points in a scatter plot of the residuals vs. fitted (or predicted) values should look like a random array of dots
scattered around zero. If the data points appear to display any pattern, such as a funnel or an upward trend, it is possible that the
assumption of homoscedasticity was violated.12

A Q-Q plot is often used to show deviations from normality. A straight line represents a normal distribution, while the points around the line represent observed residuals or error terms. In a normally distributed dataset, all points would lie on the line.12 Larger datasets are more robust to these different types of violations.
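A minimal sketch of these two diagnostic plots, assuming Python with statsmodels and matplotlib and using simulated data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 150
df = pd.DataFrame({"gpa": rng.uniform(2.0, 4.0, n)})
df["pcoa"] = 200 + 60 * df["gpa"] + rng.normal(0, 25, n)
model = smf.ols("pcoa ~ gpa", data=df).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(model.fittedvalues, model.resid)  # want a patternless cloud around 0
ax1.axhline(0, color="gray")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")
sm.qqplot(model.resid, line="45", fit=True, ax=ax2)  # points near the line suggest normality
plt.show()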

Step 6
Interpret the results of the regression. Interpretation has several key pieces, many of which are common sources of confusion and
opportunities for missteps. Interpretation is covered in the next section of this paper.

Interpretation of regression results

Accurately interpreting the results of a regression model is critical for understanding the relationship between predictor variables
and the outcome variable. In the following section, Table 2 as reported in Gillette et al.5(p87) will be used as an example to help explain how to interpret each core element in a regression model. In that study, admissions and demographic variables that could help predict student success on the PCOA during the first (P1), second (P2), and third (P3) years of pharmacy school were investigated, although in this paper we will focus on P1 only.

The P value
In Gillette et al.,5 the researchers identified an alpha level of 0.05 as the threshold for statistical significance. In the P1 data in Table 2,5(p87) five variables (PCAT composite, Health Sciences Reasoning Test, age at admission, cumulative pharmacy GPA, and the 2017 graduating cohort) have a P value < .05. Therefore, only these five variables were statistically significantly related to student PCOA scores when accounting for the remaining variables in the model.

Confidence interval
As stated earlier, confidence intervals can be used as a way to determine statistical significance. Although the confidence intervals
are not reported in the Gillette et al.5 article, they are often reported when using regression. When a confidence interval does not
include 0, then that variable would be statistically significant. If we were to interpret a confidence interval, we could say, “The
researchers are 95% confident that the interval contains the true coefficient.”

Standardized coefficient
The standardized coefficients are interpreted very similarly to the unstandardized coefficients. The only difference is that instead
of using one-point increases or decreases, we can use one SD increases or decreases. When using a standardized coefficient, the PCAT
composite score could be interpreted as, “For every one SD increase in the PCAT composite score, student PCOA scores increased by
0.42 SDs.”5 Student gender could be interpreted as, “Being a female student, compared to being a male student, was associated with a
0.01 SD decrease in student PCOA scores."5 When variables are on different scales, it may be useful to use the standardized coefficients, which place all variables in terms of SD units. Furthermore, since the variables are standardized, they can be interpreted
using common effect size guidelines, such as Cohen's. Using Cohen's guidelines, PCAT composite scores would have a moderately
strong effect size while student gender would have a negligible effect size.9

Unstandardized coefficient
All predictors in the model are interpreted one at a time, and typically only statistically significant predictors are interpreted. As an example, PCAT composite, a continuous variable from Gillette et al.,5 could be interpreted as, "For every one-point increase in the PCAT composite score, student PCOA scores increased by 0.84 points, holding all other variables in the model constant." Student gender would be interpreted slightly differently since this was a dummy-coded categorical variable. Student gender could be interpreted as, "Being a female student (assuming it was coded as 1), compared to being a male student (assuming it was coded as 0), was associated with a 0.76-point decrease in PCOA scores." This highlights the importance of clearly stating the reference category or group when using dummy coding. Since student gender was not statistically significant in the model, it was not interpreted in the paper.5


The intercept
The intercept is the average value of the outcome variable when all the predictor variables are set at 0. Often researchers do not
interpret the intercept since we are rarely interested in when all predictor variables are at 0. In the Gillette et al.5 article, the intercept
is not reported. Ideally, the intercept should have been reported (if not in the table, at least as a footnote). In this case, it is unclear
whether the intercept term was omitted from the regression model or simply not reported.
R2
In the Gillette et al.5 example, R2 is 0.49. This means that 49% of the variance in student PCOA scores was accounted for by the predictors in the model. Since R2 ranges from 0% to 100%, a higher percentage accounted for is typically better, although benchmarks for appropriate R2 levels may differ by case and between or within disciplines.

Our recommendations and their applications

Below are a series of recommendations to guide potential users of regression analysis. These generally follow the steps of conducting a regression analysis.

Recommendation 1: Determine the purpose of the regression analysis

Regression analysis, in any form, can be used for a variety of purposes: predicting an outcome, explaining an outcome, or estimating an effect while adjusting for other predictors. Being clear about the intended purpose of a regression analysis is important since that purpose has implications for the selection of predictor variables that may be included in the model and even for the general model-building strategy used.

Recommendation 2: Clearly specify the outcome variable and the predictor variable(s) of interest

Since regression analysis involves the researcher selecting an outcome variable, it is important to be clear about the outcome variable early in the research process. Similarly, potential predictor variables need to be identified. The use of conceptual or theoretical frameworks can be extremely helpful in determining the outcome and predictor variables. Sometimes these frameworks have been proposed elsewhere in the literature. Other times they may be developed for a specific project. The use of a clear framework can also help readers, including those who may build on your work, better understand the regression analysis. When considering the potential predictor variables, remember to keep in mind the overall purpose of the regression analysis (Recommendation 1).

Recommendation 3: Get to know the data before constructing any regression model

Taking time to screen the data before starting any regression analysis is beneficial. Data distributions and descriptive statistics should always be explored prior to any model-building. One benefit is that frank violations of regression analysis assumptions may be identified at this stage rather than after constructing a regression model that is ill-fitting or does not behave well. For example, severe departures from normality, particularly in the outcome variable, can be identified early. Although the assumption of normality applies to the error terms, identifying severe non-normality can be a signal to make certain analytical choices. Simple visualizations can be extremely helpful in getting to know your data. While correlations between all variables of interest can show the strength of associations, those associations are assumed to be linear. Scatterplots of two variables can quickly identify potential non-linear relationships, which is important since curves can be fit with linear regression if appropriate terms are included in the model (e.g., log variables, quadratic terms). Usually, the data analysis completed for this recommendation is not placed within the research report, but it is still important to conduct.

Recommendation 4: Ensure you have a sufficient sample size

It is important to have a sufficient sample size to ensure findings are reliable and have enough power to detect statistically significant differences. However, sample size calculation for regression analysis is not always straightforward. Generally, rules of thumb are applied in two complementary ways. The first involves determining how large a sample is needed given the modeling approach (e.g., prediction) and the anticipated number of variables. The second speaks to how many independent variables may be considered given the available sample size. The rules of thumb vary widely, from 2 to > 50 observations per variable.26–30 Within that range, a commonly used rule of thumb is 10 observations for each independent variable being considered in the model; under this rule, for example, a model with eight predictors (counting each dummy-coded variable separately) would require at least 80 observations. For predictive modeling or when using stepwise model-building approaches, larger sample sizes on the order of 50 observations per independent variable have been recommended.31 Larger sample sizes do come with the risk of identifying small relationships as statistically significant, which is another reason why focusing on practical significance is important. Regression analyses conducted on two different samples may yield different results. This speaks to the importance of having a sample that is representative of the target population being examined for the research question.

Recommendation 5: Consider transformations of variables carefully

Sometimes researchers decide to apply transformations to variables (e.g., logarithmic, square root, inverse) to address violations of
certain assumptions. These transformed variables are then used in regression models. This happens commonly when the outcome variable


is highly skewed. Hospital length of stay and medical costs are examples of variables in health-related scholarship that are sometimes transformed prior to analysis. Examples from an educational context may include student loan debt or time spent on a particular activity. While transformations may make sense from a strictly mathematical standpoint, they can cause problems from an interpretational standpoint. Using length of stay as an example, a researcher may apply a natural logarithmic transformation, that is ln(length of stay in days), and use that transformed variable as the outcome variable in the regression. The difficulty comes in the interpretation since the regression coefficients now speak to changes in the transformed variable. Rather than relating to days, the coefficients relate to ln(length of stay in days). Few people would find that interpretation intuitive (when is the last time somebody said they spent half of a "log-day" in the hospital?).
There are general rules of thumb to ease interpretation difficulties32; however, transformed variables have been an issue of
considerable debate in the areas of economics and health services research.33–35 In situations where sample sizes are large, linear
regression is often quite robust to these violated assumptions provided they are not severe violations.12
There are also alternative approaches to regression that allow for model specifications that better reflect the data distribution and data generation processes. These alternative approaches fall under the umbrella of generalized linear models (GLMs). In fact, linear regression is a specific type of GLM. Logistic regression, which is commonly used in health care research, is another example of a specific type of GLM. Examples of when a GLM may be more appropriate include highly skewed outcome variables, such as student loan debt or time spent on an activity with many outliers, or outcome variables that represent counts, such as the number of courses a student fails. The general principle behind these alternate models is similar to linear regression, but the interpretation is slightly different.

Recommendation 6: When reporting regression results, clearly specify the model-building approach that was used

Using regression as a data analysis tool is as much an art as it is a science. As noted earlier in this paper, there are many steps involved that represent various decision points on the analytical journey, such as the need for strong theory, conceptual frameworks, and/or the collation of prior research. It is vital that the thought process behind those decisions is made clear for readers. A primary issue is the general model-building approach that was used, which may relate to the overall purpose of the regression (see Recommendation 1). If the purpose is to identify all statistically significant predictors, then explain how that process unfolded. If the purpose was to adjust for various confounders, explain which confounders were included and why (e.g., the results of previous research, a guiding theoretical or conceptual framework). Another important part of regression modeling, particularly relevant when the purpose is prediction or explanation, is determining whether a model was "good." There are many ways to do this, but examining R2 values is a common approach. When various competing models are being compared, it is important to include an explicit discussion of the decision criteria, including any statistical approaches, used to compare regression models.

Recommendation 7: Pay close attention to the units when interpreting regression coefficients

When standardized regression coefficients are used, the issue of units is removed since all interpretations are effectively on a scale
of SDs (i.e. for a 1 SD increase in a predictor variable, we would expect to see a change in the outcome variable corresponding to
beta). While standardized regression coefficients are helpful when comparing the relative importance of variables, some may find it
difficult to think about life in terms of “SD units.” When interpreting unstandardized regression coefficients, remember the native
units of the variables. One common situation where this may cause confusion is when the outcome is a score that represents a
percentage (e.g., the percent correct on an exam). In these situations, it is very important to remember the difference between an
absolute change (a percentage point change) and a relative change (a percent change). Consider a regression model where the
outcome is the percent correct on an exam. The predictor variable being interpreted is amount of time spent studying in hours. The
coefficient for study time is 5.2. The interpretation here would be that for every additional hour a student studies, we would expect a 5.2 percentage point increase in exam score. Notice that the interpretation is not a 5.2% increase, since that would represent a relative change.

Recommendation 8: Do not confuse statistically significant relationships as causal inferences

In regression analysis, one outcome variable is selected. It may seem straightforward to say that the predictor variables “cause”
the outcome variable given the thinking behind regression. Unfortunately, this is not always the case. For example, a regression model could be constructed to predict a student's NAPLEX score from a host of predictor variables. In the model, a significant variable may be whether the student lives on campus vs. off campus, where students living on campus score higher on the NAPLEX. While this variable may be a significant predictor of NAPLEX scores from a purely mathematical standpoint, it does not mean that living on campus causes students to score higher than those who live off campus. Regression methods can be useful in causal inference, but there are many other issues to consider, including study design, theoretical and/or conceptual frameworks, and the potential manipulability of the predictor variables.36 This recommendation is important to consider when presenting the results of a regression analysis since it can be easy to inadvertently suggest that variable X caused variable Y. As a rule of thumb, in most cases hedge against using causal language.

Recommendation 9: Use standardized coefficients or other methods to examine relative importance of predictors

Sometimes researchers are interested in discussing the relative impact or importance of different variables in a regression, that is, which predictor has the "most" influence on the outcome variable. When making these comparisons, it is important to consider the scales of measurement or the units. If a set of independent variables have different scales or units, then the unstandardized coefficients cannot be used to determine relative influence. Standardized regression coefficients allow the comparison of variables measured on different scales. In some situations, particularly with highly correlated predictor variables, other approaches may be required, such as dominance analysis.37 It is important to keep in mind the measurement tools commonly used for psychological or social constructs, which are often measured using Likert scales or similar tools.38 While two measures that both use Likert scaling techniques may seem to be "on the same scale," the researcher must pay attention to the response options used in the measure (e.g., four vs. five vs. seven ordered responses). These differences can affect the underlying measurement properties and influence apparent variable importance, so standardized coefficients or similar methods must be used for comparisons. Remember that standardized coefficients are most useful for comparing the relative impact of variables within or across regression models, but they may not provide the most intuitive interpretation of the relationship between the independent variable and the outcome.

Applications and implications

To illustrate the recommendations cited in the previous section and discuss regression within the context of PCOA, the article
written by Gillette et al.5 will be used as an example.
Recommendation 1: Purpose. The purpose of this study was to determine which admissions and demographic variables could help predict student success on the PCOA during the P1, P2, and P3 years. Predictor variables were selected based on recommendations from the literature, and the model-building strategy was selected to highlight differences between the various predictors on the outcome variable for students in different years of the pharmacy curriculum.
Recommendation 2: Clearly specify variables. The outcome and predictor variables were clearly stated. The outcome variable was the PCOA score. Predictor variables included PCAT score, cumulative pharmacy GPA, student age, student gender, pre-requisite GPA, Health Sciences Reasoning Test score, attainment of a bachelor's degree, a group dilemma score to measure teamwork, and a standardized behavioral interview score, where students were assessed on integrity, leadership abilities, communication, and writing.
Recommendation 3: Familiarize yourself with the data. The data appear to have been explored before regression analyses were conducted, as descriptive statistics were reported and the age variable was transformed. Although assumptions for regression models should always be tested, they are not regularly reported in manuscripts. Often correlation tables are presented before regression results to demonstrate the relationships between variables.
Recommendation 4: Check for a sufficient sample size. All three regression models have a sufficient sample size, as there are at least 10 observations for each independent variable being considered in the model. Typically, larger sample sizes result in better predictions and are more robust to assumption violations in study data.
Recommendation 5: Consider variable transformations. The log of the age at admission variable was taken, demonstrating that this variable was transformed. Variables are transformed when they are highly skewed, which is not surprising in this example given the specific sample of pharmacy students being investigated. To determine whether variables are highly skewed, we recommend computing descriptive statistics and graphing each variable (e.g., with a histogram or scatterplot).
Recommendation 6: Specify the model-building approach. Although the model-building approach was not explicitly stated in this article, it was clearly related to the overall purpose of the regression. The models were built to determine whether there were differences between the P1, P2, and P3 students. Furthermore, R2 was reported for each of the regression models. For Model 1, 49% of the variance in student PCOA scores was accounted for by the predictors in the model, while 65% and 59% of the variance were accounted for in Model 2 and Model 3, respectively.
Recommendations 7 and 9: Observe the units of the regression coefficients. In Gillette et al.5 and the example in this paper, both the standardized and unstandardized regression coefficients were provided. Since the variables in the regression model were on different scales, the standardized coefficients help compare the relative importance of variables. However, if interpreting the native unit is of interest, then the predictor variables could be interpreted using the unstandardized coefficients.
Recommendation 8: Understand the meaning of statistically significant relationships. When this recommendation is violated and causal relationships are referenced, it typically occurs in the discussion section of an article where statistically significant relationships are examined.

Potential impact

As demonstrated by this paper, regression analysis is a versatile analytic technique that allows researchers to predict an outcome variable, explain an outcome variable, or estimate the effect of a predictor variable while adjusting for other predictors. Because of this versatility, it is commonly used in educational, biomedical, and social science research, making it one of the most popular analytic techniques.
By expanding our knowledge and use of regression, there is an opportunity to continue improving the quality of educational research by asking more complex research questions and deriving richer results. Through multiple examples, we hope to have provided a scaffolded overview of regression analysis so that more pharmacy educators feel empowered to use this type of analysis in their research. For more resources on regression, see Box 1.
Regression is a powerful and robust analytical technique that can help pharmacy educators continue to enhance the quality of pharmacy research. While this article should not be your only resource when exploring, designing, or implementing regression models, we hope it proves useful for those who already engage in pharmacy education research, such as the scholarship of teaching and learning, and for those who would like to begin.


Disclosure(s)

None.

Declaration of Competing Interest

None.

References

1. Houglum JE, Aparasu RR, Delfinis TM. Predictors of academic success and failure in a pharmacy professional program. Am J Pharm Educ. 2005;69(3):43. https://
doi.org/10.5688/aj690343.
2. McCall KL, MacLaughlin EJ, Fike DS, Ruiz B. Preadmission predictors of PharmD graduates’ performance on the NAPLEX. Am J Pharm Educ. 2007;71(1):5. https://
doi.org/10.5688/aj710105.
3. Lyons K, Taylor DA, Minshew LM, McLaughlin JE. Student and school-level predictors of pharmacy residency attainment. Am J Pharm Educ. 2018;82(2):6220.
https://doi.org/10.5688/ajpe6220.
4. Accreditation standards and key elements for the professional program in pharmacy leading to the doctor of pharmacy degree (“Standards 2016”). Accreditation
Council for Pharmacy Education. 2 February 2015. Accessed 29 May 2020. https://www.acpe-accredit.org/pdf/Standards2016FINAL.pdf.
5. Gillette C, Rudolph M, Rockich-Winston N, et al. Predictors of student performance on pharmacy curriculum outcomes assessment at a new school of pharmacy
using admissions and demographic data. Curr Pharm Teach Learn. 2017;9(1):84–89. https://doi.org/10.1016/j.cptl.2016.08.033.
6. Giuliano CA, Gortney J, Binienda J. Predictors of performance on the pharmacy curriculum outcomes assessment (PCOA). Curr Pharm Teach Learn.
2016;8(2):148–154. https://doi.org/10.1016/j.cptl.2015.09.011.
7. Aggarwal R, Ranganathan P. Common pitfalls in statistical analysis: linear regression analysis. Perspect Clin Res. 2017;8(2):100–102. https://doi.org/10.4103/
2229-3485.203040.
8. de Winter JCF, Gosling SD, Potter J. Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: a tutorial using simulations and empirical data. Psychol Methods. 2016;21(3):273–290. https://doi.org/10.1037/met0000079.
9. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Routledge Academic; 1988.
10. Peeters MJ. Practical significance: moving beyond statistical significance. Curr Pharm Teach Learn. 2016;8(1):83–89. https://doi.org/10.1016/j.cptl.2015.09.001.
11. Berry WD, Feldman S. Multiple Regression in Practice. Sage Publications, Inc; 1985.
12. Field A, Miles J, Field Z. Discovering Statistics Using R. Sage Publications, Inc; 2012.
13. Wendorf CA. Primer on multiple regression coding: common forms and the additional case of repeated contrasts. Underst Stat. 2004;3(1):47–57. https://doi.org/10.1207/s15328031us0301_3.
14. Peeters MJ, Harpe SE. Numbers etiquette in reports of pharmacy education scholarship. Curr Pharm Teach Learn. 2016;8(6):896–904. https://doi.org/10.1016/j.
cptl.2016.08.026.
15. Berger VW. On the generation and ownership of alpha in medical studies. Control Clin Trials. 2004;25(6):613–619. https://doi.org/10.1016/j.cct.2004.07.006.
16. Schreiber JB. New paradigms for considering statistical significance: a way forward for health services research journals, their authors, and their readership. Res
Social Adm Pharm. 2020;16(4):591–594. https://doi.org/10.1016/j.sapharm.2019.05.023.
17. Kirk RE. Practical significance: a concept whose time has come. Educ Psychol Meas. 1996;56(5):746–759. https://doi.org/10.1177/0013164496056005002.
18. Lakens D. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front Psychol. 2013;4:863. https://doi.
org/10.3389/fpsyg.2013.00863.
19. Thompson B. Effect sizes, confidence intervals, and confidence intervals for effect sizes. Psychol Sch. 2007;44(5):423–432. https://doi.org/10.1002/pits.20234.
20. Norman G. Likert scales, levels of measurement, and the “laws” of statistics. Adv Health Sci Educ. 2010;15(5):625–632. https://doi.org/10.1007/s10459-010-
9222-y.
21. Malek MH, Berger DE, Coburn JW. On the inappropriateness of stepwise regression analysis for model building and testing. Eur J Appl Physiol.
2007;101(2):263–264. https://doi.org/10.1007/s00421-007-0485-9.
22. Smith G. Step away from stepwise. J Big Data. 2018;5:32. https://doi.org/10.1186/s40537-018-0143-6.
23. Bowerman BL, O'Connell RT. Linear Statistical Models: An Applied Approach. 2nd ed. Duxbury Press; 1990.
24. Myers RH. Classical and Modern Regression with Applications. 2nd ed. Duxbury Press; 1990.
25. Durbin J, Watson GS. Testing for serial correlation in least squares regression. Biometrika. 1951;38(1/2):159–177. https://doi.org/10.2307/2332391.
26. Austin PC, Steyerberg EW. The number of subjects per variable required in linear regression analyses. J Clin Epidemiol. 2015;68(6):627–636. https://doi.org/10.
1016/j.jclinepi.2014.12.014.
27. Norman GR, Streiner DL. Pretty Darned Quick Statistics. 3rd ed. B.C. Decker; 2003.
28. Good PI, Hardin JW. Common Errors in Statistics (and how to Avoid Them). 4th ed. John Wiley & Sons, Inc; 2012.
29. Harrell FE. Regression Modeling Strategies with Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer; 2001.
30. Green SB. How many subjects does it take to do a regression analysis. Multivar Behav Res. 1991;26(3):499–510. https://doi.org/10.1207/s15327906mbr2603_7.
31. Hair JF, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis. 7th ed. Pearson; 2009.
32. Skrepnek GH. Regression methods in the empiric analysis of health care data. J Manag Care Pharm. 2005;11(3):240–251. https://doi.org/10.18553/jmcp.2005.
11.3.240.
33. Duan N. Smearing estimate: a nonparametric retransformation method. J Am Stat Assoc. 1983;78(383):605–610. https://doi.org/10.2307/2288126.
34. Manning WG, Mullahy J. Estimating log models: to transform or not to transform? J Health Econ. 2001;20:461–494. https://doi.org/10.1016/S0167-6296(01)
00086-8.
35. Deb P, Norton EC. Modeling health care expenditures and use. Annu Rev Public Health. 2018;39:489–505. https://doi.org/10.1146/annurev-publhealth-040617-
013517.
36. Harpe SE. Design, analysis, and conclusions: telling a consistent causal story. Curr Pharm Teach Learn. 2017;9(1):121–136. https://doi.org/10.1016/j.cptl.2016.
09.001.
37. Azen R, Budescu DV. The dominance analysis approach for comparing predictors in multiple regression. Psychol Methods. 2003;8(2):129–148. https://doi.org/10.
1037/1082-989X.8.2.129.
38. Harpe SE. How to analyze Likert and other rating scale data. Curr Pharm Teach Learn. 2015;7(6):836–850. https://doi.org/10.1016/j.cptl.2015.08.001.
