BRM Session 14: SPSS


Basic Quantitative Data Analysis Using SPSS

1
SPSS – What Is It?
• SPSS stands for “Statistical Package for the Social Sciences” and
was first released in 1968.
• SPSS is software for editing and analyzing all sorts of data.
These data may come from virtually any source: scientific
research, a customer database, Google Analytics or even the
server log files of a website. SPSS can open all file formats
that are commonly used for structured data, such as:
• spreadsheets from MS Excel or OpenOffice;
• plain text files (.txt or .csv).
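As a minimal illustration (outside SPSS itself), the same kind of plain-text structured data can be read with Python's standard csv module; the file contents below are invented:

```python
import csv
import io

# Invented .csv contents, standing in for a file SPSS could also open.
raw = io.StringIO("id,age,sex\n1,34,F\n2,29,M\n")

# Each row becomes a dict of variable name -> value (all read as text).
rows = list(csv.DictReader(raw))
print(rows[0])
```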

2
SPSS Views

SPSS Data View
• The Data View displays the data values.
SPSS Variable View
• An SPSS data file always has a second sheet called
Variable View.
• It shows the metadata associated with the data.
• Metadata is information about the meaning of variables
and data values.
• This is generally known as the “codebook”, but in SPSS
it's called the dictionary.
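The dictionary/codebook idea can be sketched outside SPSS as a plain mapping from coded data values to their meanings; the variable name and value labels below are invented for illustration:

```python
# A rough analogue of the SPSS dictionary: metadata about what coded
# values mean. Variable name and labels are invented.
codebook = {
    "sex": {"label": "Respondent sex", "values": {1: "male", 2: "female"}},
}

def decode(variable, value):
    """Look up the value label for a coded data value."""
    return codebook[variable]["values"][value]

print(decode("sex", 1))  # -> male
```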
3
SPSS Output Window

• After clicking OK, a new window opens: the SPSS Output
Viewer window.
• It displays tables with the statistical results for all the
variables we chose.

4
Creating a data file and entering data
• Defining Variables
• Name, Type, Width, Decimals, Label, Values, Missing,
Columns, Align, Measure
• Entering Data in SPSS
• Using Data Editor
• Using EXCEL

5
Descriptive Statistics
Instruction: open the Survey.sav file for this task
• Categorical Variables
• Frequencies
• Crosstabs
• Continuous Variables
• Descriptive Statistics
• Missing Data
• Exclude cases listwise
• Exclude cases pairwise
• Replace with mean
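As an illustration of what the Frequencies and Crosstabs procedures count, here is a rough sketch in Python on invented categorical data (SPSS produces the same counts, plus percentages):

```python
from collections import Counter

# Invented categorical data for two variables.
sex    = ["M", "F", "F", "M", "F"]
smoker = ["yes", "no", "yes", "no", "no"]

freq = Counter(sex)                  # frequency table for one variable
cross = Counter(zip(sex, smoker))    # cell counts for a 2-way crosstab

print(freq["F"], freq["M"])          # 3 2
print(cross[("F", "no")])            # 2
```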

6
Descriptive Statistics
Missing Values
• The Exclude cases listwise option will include cases in the analysis only
if they have full data on all of the variables listed in your Variables box
for that case. A case will be totally excluded from all the analyses if it is
missing even one piece of information. This can severely, and
unnecessarily, limit your sample size. (Remove the entire case from
analysis).
• The Exclude cases pairwise option (recommended), however, excludes
a case (person) only from the specific analyses for which it is missing
the required data. The case will still be included in any of the analyses
for which it has the necessary information.
• The Replace with mean option (not recommended as data is biased),
which is available in some SPSS statistical procedures (e.g. multiple
regression), calculates the mean value for the variable and gives every
missing case this value. This option should never be used, as it can
severely distort the results of your analysis, particularly if you have a lot
of missing values.
7
Descriptive Statistics

• Assessing Normality
• Descriptives
• Skewness and Kurtosis
• Histogram
• Normal Q-Q plot (probability plot)

8
Interpreting Skewness and Kurtosis

• Fairly symmetrical data: skewness between -0.5 and +0.5
• Normal distribution (there is normality in the data)
• Moderately skewed data:
• Negatively skewed: skewness between -1.0 and -0.5
• Positively skewed: skewness between 0.5 and 1.0
• This is an approximately normal distribution
• Highly skewed data:
• Negatively skewed: skewness is less than -1.0
• Positively skewed: skewness is more than 1.0
• Kurtosis between -2.0 and 2.0 is considered acceptable
• Less than -2.0 is too flat (negative kurtosis)
• More than 2.0 is too peaked (positive kurtosis)
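For illustration, the skewness and kurtosis statistics interpreted above can be computed by hand from the data's moments. This is a simplified sketch: SPSS applies small-sample corrections, so its printed values will differ slightly from these formulas:

```python
from statistics import mean

def skew_kurtosis(data):
    """Moment-based skewness and excess kurtosis (simplified formulas)."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3   # excess kurtosis: 0 for a normal curve
    return skewness, kurtosis

# Symmetric invented data, so skewness comes out at 0.
s, k = skew_kurtosis([1, 2, 3, 4, 5])
print(round(s, 3), round(k, 3))
```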
9
Manipulating the Data

• Calculating Total Scale Scores
• Step 1: Reverse any negatively worded items.
• Optimism Scale: items 2, 4, 6 (procedure handout)
• Step 2: Add together the scores from all the items
that make up the subscale or scale (procedure handout)

Use the survey.sav file for this exercise
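Steps 1 and 2 above can be sketched in a few lines; the item names and responses are invented, and a 1-5 Likert scale is assumed, so a negatively worded item is reversed as (1 + 5) - old score:

```python
def reverse(score, low=1, high=5):
    """Reverse-score an item on a low..high response scale."""
    return (low + high) - score

# Invented responses to a 6-item scale; items 2, 4 and 6 are
# negatively worded (mirroring the Optimism Scale example above).
responses = {"op1": 4, "op2": 2, "op3": 5, "op4": 1, "op5": 3, "op6": 2}
negative_items = ["op2", "op4", "op6"]

scored = {k: reverse(v) if k in negative_items else v
          for k, v in responses.items()}
total = sum(scored.values())   # Step 2: total scale score

print(total)
```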

10
Reliability of a Scale

• When you are selecting scales to include in your study, it is
important to find scales that are reliable.
• One of the main issues concerns the scale's internal
consistency.
• This refers to the degree to which the items that make up the scale
'hang together'. Are they all measuring the same underlying
construct?
• One of the most commonly used indicators of internal
consistency is Cronbach's alpha coefficient.
• Ideally, the Cronbach alpha coefficient of a scale should be above .7
(DeVellis 2003).
Example: survey.sav; variables: lifsat1 to lifsat5 (procedure handout)
Warning: you must check all negatively worded items first
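Cronbach's alpha itself is simple to compute from the item variances and the variance of the total score. A minimal sketch on invented data (not the survey.sav life-satisfaction items):

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of per-item score lists, all the same length.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)"""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Two perfectly parallel items give the maximum, alpha = 1.0.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
```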
11
Interpretation

• Cronbach’s Alpha
• Values above .7 are considered acceptable; however, values above .8
are preferable.
• Inter-Item Correlation Matrix
• Check for negative values. All values should be positive, indicating that
the items are measuring the same underlying characteristic.
• Corrected Item-Total Correlation
• The values shown in the Item-Total Statistics table give you an indication of
the degree to which each item correlates with the total score.
• Low values (less than .3) here indicate that the item is measuring
something different from the scale as a whole.
• If your scale’s overall Cronbach alpha is too low (e.g. less than .7) and
you have checked for incorrectly scored items, you may need to
consider removing items with low item-total correlations.
12
Interpretation (contd.)

• Alpha if Item Deleted
• The impact of removing each item from the scale is given.
• Compare these values with the final alpha value obtained.
• If any of the values in this column are higher than the final alpha
value, you may want to consider removing that item from the scale.

13
Correlation

• Correlation analysis is used to describe the strength and
direction of the linear relationship between two variables.
• PRELIMINARY ANALYSES FOR CORRELATION
• Before performing a correlation analysis, it is a good idea to
generate a scatterplot.
• This enables you to check for violation of the assumptions of
linearity and homoscedasticity.
Follow the handout to generate the scatterplot
File: survey.sav

14
Interpretation of output from scatterplot
• Step 1: Inspecting the distribution of data points
• Are the data points spread all over the place? This suggests a very low
correlation.
• Are all the points neatly arranged in a narrow cigar shape? This suggests
quite a strong correlation.
• Could you draw a straight line through the main cluster of points, or would a
curved line better represent the points? If a curved line is evident (suggesting
a curvilinear relationship) Pearson correlation should not be used, as it
assumes a linear relationship.
• What is the shape of the cluster? Is it even from one end to the other? Or
does it start off narrow and then get fatter? If this is the case, your data may
be violating the assumption of homoscedasticity.
• Step 2: Determining the direction of the relationship between the variables
• The scatterplot can tell you whether the relationship between your two
variables is positive or negative.

15
Pearson Correlation

• RQ: Is there a relationship between the amount of control people
have over their internal states and their levels of perceived stress?
Do people with high levels of perceived control experience lower
levels of perceived stress?
• Follow the procedure handout; example file: survey.sav
• Interpretation
• Step 1: Checking the information about the sample
• Step 2: Determining the direction of the relationship
• Step 3: Determining the strength of the relationship
• small r=.10 to .29; medium r=.30 to .49; large r=.50 to 1.0
• Step 4: Calculating the coefficient of determination
• Step 5: Assessing the significance level
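Steps 3 and 4 above (strength of the relationship and the coefficient of determination) can be illustrated by computing Pearson's r directly; the data below are invented, not the survey.sav variables:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented scores with a perfectly negative relationship.
control = [10, 20, 30, 40]
stress  = [8, 6, 4, 2]

r = pearson_r(control, stress)
print(r, r ** 2)   # r = -1.0; r squared is the coefficient of determination
```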
16
PRESENTING THE RESULTS FROM
CORRELATION (Analysis reporting)
• The relationship between perceived control of
internal states (as measured by the PCOISS) and
perceived stress (as measured by the Perceived
Stress Scale) was investigated using the Pearson
product-moment correlation coefficient.
• There was a strong, negative correlation between
the two variables, r = –.58, n = 426, p < .001, with
high levels of perceived control associated with
lower levels of perceived stress.

17
Correlation is often used to explore the relationship among a
group of variables, rather than just two as described above.
It is cumbersome to report all the individual correlation
coefficients in a paragraph; it would be better to present
them in a table. One way this could be done is as follows:

18
COMPARING THE CORRELATION
COEFFICIENTS FOR TWO GROUPS
• Sometimes when doing correlational research you may
want to compare the strength of the correlation
coefficients for two separate groups.
• For example, you may want to look at the relationship between
optimism and negative affect for males and females separately.
• Follow procedure handout
• Important:
• Remember, when you have finished looking at males and
females separately you will need to turn the Split File
option off. It stays in place until you specifically turn it off.
• To do this, click on Data, Split File and select the first button:
Analyze all cases, do not create groups.
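The Split File logic (the same analysis repeated separately per group) can be sketched as follows, with invented data and group labels:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation, as in the earlier sketch."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented (optimism, negative affect) scores per group.
groups = {
    "male":   ([1, 2, 3, 4], [8, 6, 4, 2]),
    "female": ([1, 2, 3, 4], [2, 4, 6, 8]),
}

# One correlation per group; afterwards, analyse all cases together
# again (the Split File warning above).
by_group = {g: pearson_r(x, y) for g, (x, y) in groups.items()}
print(by_group)
```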
19
Multiple Regression

• Multiple regression tells you how much of the
variance in your dependent variable can be
explained by your independent variables.
• It also gives you an indication of the relative
contribution of each independent variable. Tests
allow you to determine the statistical significance of
the results, in terms of both the model itself and
the individual independent variables.

20
Multiple Regression

• Some of the main types of research questions that
multiple regression can be used to address are:
• how well a set of variables is able to predict a
particular outcome
• which variable in a set of variables is the best
predictor of an outcome
• whether a particular predictor variable is still able
to predict an outcome when the effects of another
variable are controlled for (e.g. socially desirable
responding).
21
MAJOR TYPES OF MULTIPLE
REGRESSION

1. Standard / simultaneous

2. Hierarchical / sequential

3. Stepwise: forward and backward selection
• Stepwise regression is generally not recommended.

22
Standard Multiple Regression
• All the independent (or predictor) variables are entered into
the equation simultaneously.
• Each independent variable is evaluated in terms of its
predictive power (Beta), over and above that offered by all
the other independent variables.
• This is the most commonly used multiple regression
analysis. You would use this approach if you had a set of
variables (e.g. various personality scales) and wanted to
know how much variance in a dependent variable (e.g.
anxiety) they were able to explain (R square).
• This approach would also tell you how much unique
variance in the dependent variable each of the
independent variables explained.

23
Steps for interpreting the SPSS output for Multiple Regression

1. Look in the Model Summary table, under the R Square and the Sig. F
Change columns. These are the values that are interpreted.

The R Square value is the amount of variance in the outcome that is accounted for
by the predictor variables you have used.
• Adjusted R Square: When a small sample is involved, the R square value in the
sample tends to be a rather optimistic overestimation of the true value in the
population. The Adjusted R square statistic ‘corrects’ this value to provide a
better estimate of the true population value.
• In the ANOVA Table, Sig. column (contains the p-value):
If the p-value is LESS THAN .05, the model has accounted for a statistically
significant amount of variance in the outcome.
If the p-value is MORE THAN .05, the model has not accounted for a statistically
significant amount of variance in the outcome.
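The R Square and Adjusted R Square quantities described above can be reproduced by hand. A sketch with numpy on invented data (a noise-free outcome, so both come out at 1.0):

```python
import numpy as np

# Invented predictors and a noise-free outcome: y = 1 + 2*x1 + 3*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = 1.0 + 2.0 * x1 + 3.0 * x2

# Standard multiple regression: both predictors entered together.
X = np.column_stack([np.ones_like(x1), x1, x2])   # intercept column first
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
ss_res = float(resid @ resid)
ss_tot = float(((y - y.mean()) ** 2).sum())
r2 = 1 - ss_res / ss_tot                  # R Square

n, k = len(y), 2                          # cases, predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # Adjusted R Square

print(np.round(b, 6), round(r2, 6), round(adj_r2, 6))
```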

24
Steps for interpreting the SPSS output for Multiple Regression
(Contd.)

2. Look in the Coefficients table, under the B, Std. Error, Beta, and Sig. columns.
The B column contains the unstandardized beta coefficients that depict the
magnitude and direction of the effect on the outcome variable. Use these to make
the regression equation for model prediction.
The Std. Error contains the error values associated with the unstandardized beta
coefficients.
The Beta column presents the standardized beta coefficients for each predictor
variable. Use these to identify the strongest predictor. The values for the
different variables have been converted to the same scale so that you can
compare them. Do not build the prediction equation from these values.
• The Sig. column shows the p-value associated with each predictor variable.
If a p-value is LESS THAN .05, then that variable has a significant association with
the outcome variable.
If a p-value is MORE THAN .05, then that variable does not have a significant
association with the outcome variable.
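The B versus Beta distinction can be illustrated directly: standardizing every variable to z-scores and refitting yields the Beta values, which are unit-free and therefore comparable across predictors. The data below are invented:

```python
import numpy as np

def zscore(v):
    """Standardize to mean 0, sample standard deviation 1."""
    return (v - v.mean()) / v.std(ddof=1)

# Invented data: raw (unstandardized) B values are 2 and 3 here.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = 1.0 + 2.0 * x1 + 3.0 * x2

# z-scored variables have mean 0, so no intercept column is needed.
Z = np.column_stack([zscore(x1), zscore(x2)])
betas, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)

# The Betas are on a common scale, so predictors can be ranked by |Beta|;
# the raw B values stay on each predictor's original measurement scale.
print(np.round(betas, 4))
```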

25
PRESENTING THE RESULTS FROM
MULTIPLE REGRESSION IN THE REPORT
• Multiple regression was used to predict levels of
stress (DV: Perceived Stress Scale) from Mastery
and perceived control over internal states (PCOISS).
After entry of the Mastery Scale and the PCOISS Scale
as IVs, the total variance explained by the model as a
whole was 47.4%, p < .001.
• The Mastery Scale recorded a higher beta value (beta
= –.44, p < .001) than the PCOISS Scale (beta = –.33, p
< .001), showing that perceived control over external
factors (Mastery) is a stronger predictor of perceived
stress than perceived control over internal states
(PCOISS).
26
