SPSS Training Manual EARO-01
Gondar
(SRMP-NG)
(SPSS)
By
Minilik Tsega
Knowledge is power and data is just data. No matter how much data you have on
hand, if you don’t have a way to make sense of it, you really have nothing at all.
That is where SPSS comes in.
Introduction
Chapter One
Introduction to SPSS
If you want to keep the active cell where it is but view another part
of the window, use the scroll arrows along the right and bottom
sides of the workbook window. To practice, click the arrow in the
direction you want to move in the Data Editor window. Then click
the down scroll arrow in the vertical scroll bar. The worksheet
scrolls down one row. Then click the up scroll arrow in the vertical
scroll bar. The worksheet scrolls up one row. Similarly, the
3. Status Bar
The status bar at the bottom of each SPSS window apprises the
user of the stage of operations. In particular, for each procedure
you run, a case counter indicates the number of cases processed so
far. There are also messages about the selection of specified
subsets of the data set (filter status). The message "Weight on"
indicates that a weight variable is being used to weight cases for
analysis. When the message "SPSS Processor is ready" appears in
the Status Bar, SPSS is ready to receive your instructions.
4. Dialog Boxes
Most menu selections open dialog boxes. Each dialog box for
statistical procedures and charts has several basic components.
- Source variable list: a list of variables in the working data file.
- Target variable list: one or more lists indicating the variables
you have chosen for analysis, such as dependent and
independent variable lists.
Some changes from the default that we recommend and that are
used in all of the ITC Public Computing facilities and classrooms
are:
The following are some types of formats which can be read into
SPSS or into which you can save your SPSS data file:
SPSS for Windows can read different types of data files. To read
data files, click on File in the menu bar, and then on Open. The
Open File dialog box is displayed:
To read an SPSS data file, click on File in the menu bar, and then
click on Open. This opens the Open File dialog box. Point the
arrow to the data file you wish to open and click on it. If necessary,
use the up and down arrows to scroll through the files until you locate your
file. Click OK.
For spreadsheets, you can read variable names from the first row of
the file or the first row of the defined range. If the names are
longer than eight characters, they are truncated. If the first eight
characters do not create a unique variable name, the name is
modified to make it unique.
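The truncation rule above can be illustrated with a small Python sketch. It is only an illustration: the manual does not say how SPSS modifies a clashing name, so the numeric suffix used here and the function name `import_names` are assumptions.

```python
def import_names(headers, max_len=8):
    """Truncate column headers to max_len characters and make them unique.

    The numeric-suffix scheme is an assumption for illustration; the manual
    only says that a clashing name is modified to be unique.
    """
    used = set()
    result = []
    for header in headers:
        base = header[:max_len]          # keep the first eight characters
        name = base
        counter = 1
        while name.lower() in used:      # clash: replace the tail with a number
            suffix = str(counter)
            name = base[: max_len - len(suffix)] + suffix
            counter += 1
        used.add(name.lower())
        result.append(name)
    return result


# Hypothetical spreadsheet headers
names = import_names(["systolic_before", "systolic_after", "age"])
print(names)
```

Both long headers share the same first eight characters, so the second one has to be altered to stay unique.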
Once the Excel file is in this format, you can read it into SPSS by
going to File > Open, selecting Excel in the "Files of Type" option
box, and locating the file. Note that because Excel 5 or later
files can contain multiple worksheets, the Data Editor reads the
first worksheet by default.
Once you have located the file, if it is a newer Excel file (Office 98
or later), SPSS will display the following dialog
box:
This box asks whether the Excel file has variable names
in the first row of the data set. If you do have such variable
names, check this box as above. Doing so makes SPSS assign
those names to the new variables. You can also select which
worksheet to read in if there are multiple worksheets in the file.
As with the Excel file, SPSS will give you the following prompt
about whether you have variable names that appear in the first row
of the data set.
The Text Open Wizard uses six steps to open any text file. In the
first step you can apply a predefined format (previously saved in
the text wizard). As this is not the case for our data file, we select
the No button in this step.
In the next step, you are requested to provide information about the
variables in your data file. In particular, you are asked to answer
the question about the arrangement of the variables in the data file.
Fixed width (format) means that each variable is recorded in the
same column for every case. Delimited means that spaces,
commas, tabs, or other characters are used to separate variables.
The variables are recorded in the same order for each case but not
necessarily in the same column locations.
In step 3, you are asked to provide information about cases. In our
data file, each subject is a case. The top line in the data file
introlab.txt contains the variable names (gender, age, systolic), so in
this step we indicate that the data values start on the second line.
The next three steps are straightforward. Accept all default options
provided by the Text Wizard. The data from introlab.txt will be
displayed in the Data View window and the description of all the
variables in the Variable View window.
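For readers who want to see what the wizard is doing, the same idea (a delimited file whose first line holds the variable names and whose data start on the second line) can be sketched in Python. The file contents below are made up and merely stand in for introlab.txt; a space is assumed as the delimiter.

```python
import csv
import io

# Hypothetical contents standing in for introlab.txt; the values are invented.
raw_text = "gender age systolic\nF 34 118\nM 45 131\n"

reader = csv.reader(io.StringIO(raw_text), delimiter=" ")
variable_names = next(reader)  # line 1 holds the variable names
# Data values start on line 2: one dictionary per case
cases = [dict(zip(variable_names, row)) for row in reader]

print(variable_names)
print(cases)
```

Every value is read as text here; converting age and systolic to numbers would be a separate step, much like setting the variable type in the wizard.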
Chapter two
The Data Editor window can be displayed in one of the two views:
Data View or Variable View. The Data View displays the contents
of the data file in the form of a spreadsheet. The Variable View
defines all variables in the data file. Switching from one view to
the other can be done by clicking the appropriate tab (Data View or
Variable View) at the bottom of the Data Editor window (see the
picture on page 4).
The Data View window is a grid, whose rows represent subjects
(or cases) and whose columns contain values of the variables
(gender, salary, age etc.) for each subject. Each cell of the grid,
therefore, will usually contain the score of one particular subject
on one particular variable. For example, the salaries of employees
in a company can be presented in a column, and then each
employee is a case.
The cell is the intersection of the case and the variable. Cells
contain only data values. Unlike spreadsheet programs (Excel,
Lotus), cells in the Data Editor cannot contain formulas.
The data file is rectangular. The dimensions of the data file are
determined by the number of cases and variables. Initially, every
column in the Data Editor has the heading var, and all the cells are
empty.
Now you will enter the data into the Data Editor window. Do not
enter the names of the variables at the top of each column yet.
Follow the instructions below.
1.1. Variables:
The variable name must begin with a letter and cannot end with a
period. The length of the name cannot exceed 8 characters.
Variable names that end with an underscore should be avoided.
Blanks and special characters (such as !, ?, " and *) cannot be used.
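The naming rules above can be written down as a short checker. This is a sketch of the rules exactly as stated in this section, not of SPSS's own validation; the function name `check_name` is hypothetical.

```python
def check_name(name):
    """Return a list of problems with a proposed variable name.

    Only the rules stated in the text are checked; SPSS itself may apply
    further rules that are not covered here.
    """
    problems = []
    if not name or not name[0].isalpha():
        problems.append("must begin with a letter")
    if len(name) > 8:
        problems.append("longer than 8 characters")
    if name.endswith("."):
        problems.append("cannot end with a period")
    if name.endswith("_"):
        problems.append("ending with an underscore should be avoided")
    if any(ch.isspace() or ch in '!?"*' for ch in name):
        problems.append("contains a blank or special character")
    return problems


for candidate in ["gender", "2age", "bloodpressure", "my var"]:
    print(candidate, check_name(candidate))
```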
Enter the new variable name in the column Name in any blank
row. For example, enter the name gender in the first row. After
entering the name, the default attributes (Type, Width,...) are
automatically assigned. Then if you click on the Type column, the
variable type sub dialog box appears
1.2. Variable Type
- Numeric, Comma and Dot: you can enter values with any
number of decimal positions. The Data Editor displays only
the defined number of decimal positions.
- String: all values are right-padded to the maximum width.
- Date: you can use slashes, dashes, spaces, commas, or
periods as delimiters between day, month and year
(dd/mm/yy).
- Time: you can use colons, periods or spaces.
To assign a label to a value, enter the value in the text box, enter the
label in the label text box, and then click Add.
In the same way enter the remaining variables Age and Systolic.
Age should be defined as a numeric variable with two digits (to
minimize the chances of transcription error) and Systolic as a
numeric variable with 3 digits.
View, and click the Missing cell for the variable (systolic). A
button in the cell appears.
With this definition of the missing values for the variable systolic,
SPSS will treat 9999 as a missing value of the variable and not
include it in any computations involving the systolic blood
pressure variable.
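The effect of declaring 9999 as a missing value can be seen in a small Python sketch. The readings are invented; only the code 9999 comes from the text.

```python
from statistics import mean

MISSING_SYSTOLIC = 9999  # user-defined missing code from the text

# Made-up systolic readings; the third case has no valid measurement
systolic = [118, 131, 9999, 124]

# Drop the missing code before computing, as SPSS does once it is declared
valid = [value for value in systolic if value != MISSING_SYSTOLIC]
valid_mean = mean(valid)

# Without the declaration, the code would be treated as a real reading
naive_mean = mean(systolic)

print(valid_mean, naive_mean)
```

The naive mean is far outside any plausible blood pressure, which is why the missing code has to be declared before analysis.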
To delete a variable (row), select the row number that you wish to
delete, click on Edit, and then on Clear. The selected variable will
be deleted and all variables to the right of the deleted variable will
shift to the left. Alternatively, you can select the row and press
Delete key on your keyboard.
Clicking on any cell will highlight it (active cell) and its contents
will appear in the cell editor. You can enter the data in any order.
Data values are not recorded until you press Enter or select another
Enter the values for all cases on one variable (column) and then
repeat the procedure for all values in the remaining columns. Enter
the data for our example. You will learn how to save the data in the
next Section.
1. Editing Data
To delete the old value and enter a new value: click the cell, enter
the new value, press Enter. To modify a data value: click the cell,
click the cell editor, edit the data value, and press Enter. To delete
the values in a range, select (highlight) the area concerned and
press Delete. Use the Undo command in Edit to undo any action
you just performed. For example, use the Undo command to delete
the value you have just entered in the Data Editor window.
2. Adding Cases
To insert a new case (row) in between cases that already exist in
your data file: click on the row below the row where you wish to
enter the new case, click on Data on the menu bar, click on Insert
Case from the pull-down menu.
3. Deleting Cases
To delete a case, click on the case number that you wish to delete,
click on Edit from the menu, and then on Clear. The selected case
will be deleted and the rows below will shift upward.
To select a subset of cases, click on Data in the main menu and then
on Select Cases from the pull-down menu. This opens the Select
Cases dialog box.
To save changes to an SPSS data file make the Data Editor the
active window and from the menus choose File and then Save. The
Chapter Three
1. Data Transformations
After a data set has been entered into SPSS, it may be necessary to
modify it in certain ways. With SPSS, you can perform data
transformations ranging from simple tasks, such as combining
categories for analysis, to more advanced tasks, such as creating
new variables based on complex equations.
In the dialog box there are two basic places to focus on.
- Calculator Pad: contains numbers, arithmetic operators,
relational operators and logical operators. You can use it just
like calculator.
- Functions: there are over 130 built-in functions, including:
o Arithmetic function (SQRT, EXP, LG10, LN, SIN,
COS, ABS, RND(round))
o Statistical functions ( SUM, MEAN, SD, VARIANCE,
CFVAR, MIN, MAX)
o Distribution functions
o Logical functions
o Date and time aggregation and extraction functions
o Missing value functions
o Cross-case functions
o String functions
1. Conditional expressions
You can use conditional expressions to apply transformations
to selected cases. A conditional expression returns a value of true, false or missing
for each case.
To specify a conditional expression, click on If ... in the
compute variable dialog box. This opens the If Cases dialog
box. You can choose one of the following alternatives:
- Include all cases: values are calculated for all cases, and
any conditional expressions are ignored. It is the default.
- Include if case satisfies condition: The expression can
include variable names, constants, arithmetic operators,
numeric and other functions, logical variables, and
relational operators.
Observe that there is a pound sign (#) icon at the variable age and
the variable systolic. In fact, all numeric variables (age, and
systolic are numeric) are identified with the icon. On the other
hand, all string variables are identified by an icon with the letter A.
Obviously, gender is a string variable. For information about a
variable, click the left mouse button on the variable name to select
it, and then the right mouse button and choose Variable
Information from the pop-up menu. Enter the name of the new
variable in the Target Variable box. To build an expression, either
paste components into the Expression field or type directly in the
Expression field. The If… dialog box allows you to apply data
transformations to selected subsets of cases.
You have two options available for recoding variables. You may
recode values into the same variable, which eliminates all record of
the original values. You also have the option to create a new
Then click on Old and New Values. You will obtain the following
box:
The old value of the variable gender is "F", and the new value is
"1". Then click on Add tab to recode the old value and its new
value. Similarly enter "M" as the old value, "2" as the new value,
and click on Add tab.
When you have indicated all the recode instructions, click on
Continue to close the above dialog box. Then click on OK to close
the Recode Into Same Variables dialog box. Now gender no longer
is expressed as either "F" or "M", but it is one of two integers 1, or
2. Close the file without saving the changes you have made and
retrieve the original data file introlab.sav.
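The recoding of gender described above amounts to a lookup from old values to new values. A minimal Python sketch, with made-up cases, is:

```python
# Old value -> new value, as entered in the Old and New Values box
recode_map = {"F": 1, "M": 2}

gender = ["F", "M", "M", "F"]  # made-up cases

# Recoding into the same variable overwrites the original values
gender = [recode_map[value] for value in gender]

print(gender)
```

Because the original letters are overwritten, there is no way back from 1 and 2 unless the original file is kept, which is why the text suggests closing the file without saving.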
Now click on Old and New Values. The old value of the variable
age is the range Lowest through 40, and the new value is 1. Then
click on Add tab to recode the next range and its new value.
Finally, you will obtain the following box:
Ranking Method
To choose other ranking methods, click on Rank Types... in the
Rank Cases dialog box. Available options are: Rank, Savage score,
Fractional rank, Fractional rank as %, Sum of case weights, and
Ntiles.
You can choose any one or both of the options below appearing in
the dialog box.
- Proportion Estimates :- the estimates of the cumulative
proportion (area) of the distribution.
- Normal scores:- the new variable contains the Z scores
from the standard normal distribution that correspond to
the estimated cumulative proportion.
You can recode numeric and string variables. You can recode
numeric variables into string variables and vice versa. If you
select multiple variables, they must all be the same type. You
cannot recode numeric and string variables together.
First Case Is. Defines the starting date value, which is assigned to
the first case. Sequential values, based on the time interval, are
assigned to subsequent cases.
If date variables have already been defined, they are replaced when
you define new date variables that will have the same names as the
existing date variables.
To Define Dates for Time Series Data
From the menus choose:
Select the time series function you want to use to transform the
original variable(s). Select the variable(s) from which you want to
create new time series variables. Only numeric variables can be
used.
of valid values above and below the missing value used to compute
the mean.
Median of nearby points. Replaces missing values with the
median of valid surrounding values. The span of nearby points is
the number of valid values above and below the missing value
used to compute the median.
2. Sorting Data
Suppose that we would like to sort the data in the data file
introlab.sav according to the age of the subjects enrolled in the
study. In order to sort the data, from the menus choose Data, and
then Sort Cases. The following dialog box will be displayed:
In order to sort the subjects according to the age, select age and
move it to the Sort by box. You can sort cases in ascending and
descending order. If you select multiple sort variables, cases are
sorted by each variable within category of the prior variable on the
Sort list. For example, if you select gender as the first sorting
variable and age as the second sorting variable, cases will be sorted
by age classification within each gender category. For string
variables, uppercase letters precede their lowercase counterparts in
sort order. For example, the string value "F" comes before "f" in
sort order.
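Sorting by more than one variable can be sketched in Python with a tuple as the sort key. The cases are invented, and Python's string ordering is only an approximation of the SPSS rule: Python compares by character code, so every uppercase letter sorts before every lowercase letter.

```python
# Made-up cases with a string variable and a numeric variable
cases = [
    {"gender": "M", "age": 45},
    {"gender": "F", "age": 52},
    {"gender": "f", "age": 30},
    {"gender": "F", "age": 34},
    {"gender": "M", "age": 29},
]

# gender is the first sort variable, age the second:
# cases are ordered by age within each gender category
sorted_cases = sorted(cases, key=lambda case: (case["gender"], case["age"]))

for case in sorted_cases:
    print(case["gender"], case["age"])
```

For descending order, `sorted(..., reverse=True)` reverses the whole ordering.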
3. Transpose
Transpose creates a new data file in which the rows and columns in
the original data file are transposed so that cases (rows) become
variables and variables (columns) become cases. Transpose
automatically creates new variable names and displays a list of the
new variable names.
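What Transpose does to a rectangular data file can be sketched in a few lines of Python. The data are made up, and the generated names (var001, var002, ...) are an assumption about the naming pattern used for the new variables.

```python
# Original file: 3 cases (rows) by 2 variables (columns); values are invented
variables = ["age", "systolic"]
rows = [
    [34, 118],
    [45, 131],
    [52, 140],
]

# After transposing, each original variable becomes a case (row)
transposed = [list(column) for column in zip(*rows)]

# One new variable per original case; the naming pattern is an assumption
new_names = [f"var{i:03d}" for i in range(1, len(rows) + 1)]

print(new_names)
for old_name, values in zip(variables, transposed):
    print(old_name, values)
```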
example, you might want to merge a data file that contains pre-test
results with one that contains post-test results.
Open one of the data files. From the menus choose: Data >
Merge Files > Add Variables...
Select the data file to merge with the open data file.
Select the variables from the external file variables (+) on the
Excluded Variables list. Select Match cases on key variables in
sorted files. Add the variables to the Key Variables list.
The key variables must exist in both the working data file and the
external data file. Both data files must be sorted in ascending order
of the key variables. The key variables must have the same names in both data files.
4. Aggregate Data
Select one or more break variables that define how cases are
grouped to create aggregated data. Select one or more aggregate
variables to include in the new data file. Select an aggregate
function for each aggregate variable.
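Aggregation can be pictured as grouping cases by the break variable and then applying a function to each group. In the Python sketch below, gender is the break variable, systolic is the aggregate variable, and the mean and case count are the chosen functions; all values are invented.

```python
from collections import defaultdict
from statistics import mean

# Made-up cases: (gender, systolic)
cases = [("F", 118), ("M", 131), ("F", 124), ("M", 139)]

# Group the aggregate variable by the break variable
groups = defaultdict(list)
for gender, systolic in cases:
    groups[gender].append(systolic)

# One row per break group in the aggregated file
aggregated = {
    gender: {"mean_systolic": mean(values), "n_cases": len(values)}
    for gender, values in groups.items()
}

print(aggregated)
```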
5. Split File
Split File splits the data file into separate groups for analysis based
on the values of one or more grouping variables.
To split a data file for analysis, from the menus choose: Data >
Split File...
Unselected Cases. You can filter or delete cases that don’t meet
the selection criteria. Filtered cases remain in the data file but are
excluded from analysis. Select Cases creates a filter variable,
FILTER_$, to indicate filter status. Selected cases have a value of
1; filtered cases have a value of 0. Filtered cases are also indicated
with a slash through the row number in the Data Editor. To turn
filtering off and include all cases in your analysis, select All cases.
Deleted cases are removed from the data file and cannot be
recovered if you save the data file after deleting the cases.
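The filter variable can be mimicked in Python as a list of 1s and 0s that travels alongside the data. The selection condition (age of 40 or more) and the ages are hypothetical.

```python
ages = [34, 45, 29, 52]  # made-up cases

# 1 = selected, 0 = filtered out; plays the role of FILTER_$
filter_flag = [1 if age >= 40 else 0 for age in ages]

# Filtered cases stay in the data but are skipped in the analysis
selected_ages = [age for age, flag in zip(ages, filter_flag) if flag == 1]

print(filter_flag)
print(selected_ages)
print(len(ages))  # the file still holds every case
```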
7. Weight Cases
Weight Cases gives cases different weights (by simulated
replication) for statistical analysis.
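The idea of weighting by simulated replication can be checked with a short Python sketch: a weighted mean gives the same answer as physically repeating each case as many times as its weight. The values and weights are invented, and whole-number weights are assumed so that replication is possible.

```python
values = [10, 20, 30]   # made-up data values
weights = [1, 3, 2]     # made-up whole-number case weights

# Simulated replication: repeat each value according to its weight
replicated = [v for v, w in zip(values, weights) for _ in range(w)]
replicated_mean = sum(replicated) / len(replicated)

# Weighted mean computed directly, without replicating anything
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)

print(replicated_mean, weighted_mean)
```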
Chapter Four
Chapter Five
1. Evaluating Assumptions
Tests of Normality
To test whether our data have come from a normal distribution, we
can use the normal probability plot. In a normal probability plot,
each observed value is paired with its expected value from the
normal distribution. If the sample is from a normal distribution, we
expect that the points will fall more or less on a straight line.
When the Explore dialog box opens, the following options are
available.
$$ \chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} $$
dialog box click on the statistics... push button and check in the
chi-square check box.
Chi-square-based measures
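The phi coefficient, which is a modification of the Pearson chi-square, is
$$ \phi = \sqrt{\frac{\chi^2}{N}} $$
The contingency coefficient is
$$ C = \sqrt{\frac{\chi^2}{\chi^2 + N}} $$
and Cramér's V is
$$ V = \sqrt{\frac{\chi^2}{N(k-1)}} $$
where k is the smaller of the number of rows and the number of columns.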
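As a worked illustration, the Pearson chi-square and the chi-square-based measures (phi, the contingency coefficient C and Cramér's V) can be computed by hand for a small table. The 2 x 2 counts below are invented, and expected frequencies are taken as row total times column total divided by N.

```python
from math import sqrt

# Made-up 2 x 2 table of observed counts
observed = [[20, 10],
            [10, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (O - E)^2 / E
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / n  # expected count
        chi_square += (o_ij - e_ij) ** 2 / e_ij

k = min(len(row_totals), len(col_totals))  # smaller table dimension
phi = sqrt(chi_square / n)
contingency_c = sqrt(chi_square / (chi_square + n))
cramers_v = sqrt(chi_square / (n * (k - 1)))

print(chi_square, phi, contingency_c, cramers_v)
```

For a 2 x 2 table k - 1 equals 1, so Cramér's V and phi coincide.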
Ordinal Measures
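$$ \tau_b = \frac{P - Q}{\sqrt{(P + Q + T_X)(P + Q + T_Y)}} $$
$$ \tau_c = \frac{2m(P - Q)}{N^2(m - 1)} $$
$$ r = \frac{\sum XY - N\bar{X}\bar{Y}}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} $$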
Chi-square. For tables with two rows and two columns, select
Chi-square to calculate the Pearson chi-square, the likelihood-ratio
chi-square, Fisher’s exact test, and Yates’ corrected chi-square
(continuity correction). For 2 X 2 tables, Fisher’s exact test is
computed when a table that does not result from missing rows or
columns in a larger table has a cell with an expected frequency of
less than 5. Yates’ corrected chi-square is computed for all other
2 X 2 tables. For tables with any number of rows and columns,
select Chi-square to calculate the Pearson chi-square and the
likelihood-ratio chi-square. When both table variables are
quantitative, Chi-square yields the linear-by-linear association test.
Kappa. For tables that have the same categories in the columns as
in the rows (for example, measuring agreement between two
raters), select Cohen’s Kappa.
Risk. For tables with two rows and two columns, select Risk for
relative risk estimates and the odds ratio.
McNemar. The McNemar test is a nonparametric test for two
related dichotomous variables. It tests for changes in responses
using the chi-square distribution. It is useful for detecting changes
in responses due to experimental intervention in "before and after"
designs.
Cochran’s and Mantel-Haenszel. Cochran’s and Mantel-
Haenszel statistics can be used to test for independence between a
dichotomous factor variable and a dichotomous response variable,
conditional upon covariate patterns defined by one or more layer
(control) variables. The Mantel-Haenszel common odds ratio is
also computed, along with Breslow-Day and Tarone's statistics for
testing the homogeneity of the common odds ratio.
4. Subpopulation Differences
The Means dialog box opens. It provides a dependent list and an
independent list to be filled with the appropriate variables.
When the Options... pushbutton is clicked, the Means:
Options dialog box appears, which contains check boxes and radio
buttons for univariate statistics and analysis of variance.
There are options which you may set. When clicked, the Define Ranges... and
Options pushbuttons in the dialog box display
smaller dialog boxes for specifying value ranges and setting
options.
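$$ \bar{X} = \frac{\sum X_i}{N} $$
$$ S_{\bar{X}} = \frac{S}{\sqrt{N}} $$
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}}} $$
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_p^2}{N_1} + \frac{S_p^2}{N_2}}} $$
$$ t = \frac{\bar{D}}{S_D / \sqrt{N}} $$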
You must define the two groups for the grouping variable.
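The three t statistics used in this section (separate variances, pooled variance, and paired differences) can be sketched in Python as follows. The pooled variance is computed with the usual degrees-of-freedom weighting, which is a standard formula rather than something spelled out in this manual, and the data are invented.

```python
from math import sqrt
from statistics import mean, stdev, variance


def t_separate(a, b):
    """Two independent samples, separate variance estimates."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def t_pooled(a, b):
    """Two independent samples, pooled variance (assumes equal variances)."""
    n1, n2 = len(a), len(b)
    # Standard pooled variance; not given explicitly in the text
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 / n1 + sp2 / n2)


def t_paired(first, second):
    """Paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up scores
group1, group2 = [1, 2, 3], [2, 3, 4]
print(t_separate(group1, group2), t_pooled(group1, group2))
print(t_paired([5, 7, 9], [4, 5, 6]))
```

With equal group sizes the separate-variance and pooled statistics are identical; they differ only when the groups have different sizes and variances.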
You require one - way analysis of variance when only one variable
is used to classify cases into the different groups. When two or
more variables are used to form the groups, the simple Factorial
ANOVA procedure is required.
You can use the One-way ANOVA procedure only when your
groups are independent. If you observe the same person under
several conditions, you cannot use this procedure.
Assumptions Required
Each of the groups is an independent random sample from a
normal population
In the population, the variances of the groups are equal
You can test the null hypothesis that the groups come from
populations with the same variance by means of the Levene test,
which can be obtained with the One-Way ANOVA procedure.
Variability Analysis
be close to 1. The statistical test for the null hypothesis that all
groups have the same mean in the population is based on this ratio,
called an F statistic.
Degree. You can choose a 1st, 2nd, 3rd, 4th, or 5th degree
polynomial.
Coefficients. User-specified a priori contrasts to be tested by the t
statistic. Enter a coefficient for each group (category) of the factor
variable and click Add after each entry. Each new value is added to
the bottom of the coefficient list. To specify additional sets of
contrasts, click Next. Use Next and Previous to move between sets
of contrasts.
Tests. Once you have determined that differences exist among the
means, post hoc range tests and pairwise multiple comparisons can
determine which means differ. Range tests identify homogeneous
subsets of means that are not different from each other. Pairwise
multiple comparisons test the difference between each pair of
of predicted mean values for the cells in the model, and profile
plots (interaction plots) of these means allow you to easily
visualize some of the relationships.
$$ r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{(N-1) S_X S_Y} $$
where N is the number of cases and S_X and S_Y are the standard
deviations of the two variables. The absolute value of r indicates
the strength of the linear relationship. The largest possible absolute
value of r is 1, which occurs when all points fall exactly on the line.
To test whether the population correlation
coefficient is different from zero, we use the statistic
$$ t = r \sqrt{\frac{N-2}{1-r^2}} $$
If the population correlation coefficient (ρ) is zero, the test statistic
has a Student’s t distribution with N-2 degrees of freedom. The
assumption required to use the above statistic is that independent
random samples are taken from a distribution in which the two
variables together are distributed normally.
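The correlation coefficient and its t statistic can be computed directly from these definitions. The following Python sketch uses invented data and hypothetical function names; it assumes r is not exactly 1 or -1, since the t statistic is undefined in that case.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Sum of cross-products of deviations over (N - 1) * Sx * Sy."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / ((n - 1) * stdev(x) * stdev(y))


def t_for_r(r, n):
    """t statistic with N - 2 degrees of freedom; undefined when |r| = 1."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Made-up paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
t = t_for_r(r, len(x))
print(r, t)
```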
$$ t = r \sqrt{\frac{N - \theta - 2}{1 - r^2}} $$
Where θ is the order of the coefficient and r is the partial
correlation coefficient. The degrees of freedom for t are N- θ-2,
where N is the number of cases.
10. Distances
This procedure calculates any of a wide variety of statistics
measuring either similarities or dissimilarities (distances), either
between pairs of variables or between pairs of cases. These
$$ Y = B_0 + B_1 X $$
where B_0 is the intercept of the line and B_1 is the slope, i.e., the
amount of change in Y for a single unit change in X.
$$ Y_i = \beta_0 + \beta_1 X_i + e_i $$
The population parameters (values) for the slope and intercept are
denoted by β1 and β0.
The term e_i, usually called the error, is the difference between the
observed value of Y_i
and the subpopulation mean at the point X_i. The e_i are assumed to
be normally distributed, independent random variables with a
mean of 0 and variance of σ².
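$$ B_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} $$
The standard error of B_0 is
$$ \sigma_{B_0} = \sigma \sqrt{\frac{1}{N} + \frac{\bar{X}^2}{(N-1) S_X^2}} $$
where S_X^2 is the sample variance of the
independent variable. The standard error of B_1 is
$$ \sigma_{B_1} = \frac{\sigma}{\sqrt{(N-1) S_X^2}} $$
$$ S^2 = \frac{\sum (Y_i - B_0 - B_1 X_i)^2}{N-2} $$
S is termed the standard error of the estimate.
$$ t = \frac{B_1}{S_{B_1}} $$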
The distribution of the statistic, when the assumptions are met and
the hypothesis of no linear relationship is true, is Student's t
distribution with N-2 degrees of freedom. The statistic for testing
the hypothesis that the intercept is 0 is
$$ t = \frac{B_0}{S_{B_0}} $$
Its distribution is also Student's t with N-2 degrees of freedom.
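The slope, the standard error of the estimate and the two t statistics can be pulled together in one Python sketch. It is an illustration under stated assumptions: the data are invented, the function name is hypothetical, and the intercept is obtained from the standard least-squares relation B0 = mean(Y) - B1 * mean(X), which this section does not write out.

```python
from math import sqrt
from statistics import mean


def simple_regression(x, y):
    """Least-squares line with standard errors and t statistics."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)            # equals (N - 1) * Sx^2
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx                              # standard result, assumed here
    s2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    se_b1 = sqrt(s2 / sxx)
    se_b0 = sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return {
        "b0": b0, "b1": b1, "s": sqrt(s2),
        "t_b0": b0 / se_b0, "t_b1": b1 / se_b1,
    }


# Made-up data
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)
```

Both t statistics are referred to Student's t distribution with N - 2 degrees of freedom, here 3.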
$$ R_a^2 = R^2 - \frac{P(1 - R^2)}{N - P - 1} $$
where P is the number of independent variables in the equation.
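The adjustment to R squared is a one-line calculation. In the Python sketch below the function name and the input values are hypothetical.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """R^2 reduced by P(1 - R^2) / (N - P - 1)."""
    penalty = n_predictors * (1 - r_squared) / (n_cases - n_predictors - 1)
    return r_squared - penalty


# Hypothetical model: R^2 = 0.6 from 5 cases and 1 independent variable
print(adjusted_r_squared(0.6, 5, 1))
```

The penalty grows as more independent variables are added relative to the number of cases, so the adjusted value never exceeds the unadjusted R squared.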
0=1=2 =…=N
11.6 Determining Important Variables.
Beta Coefficients
Use of the actual regression coefficients may not give the true
picture of the importance of variables. Instead regression
coefficients are standardized using the following equation.
$$ \text{beta}_k = B_k \left( \frac{S_k}{S_Y} \right) $$
Where Sk is the standard deviation of the Kth independent variable.
Beta is the beta coefficient calculated.
Part and Partial Coefficients
Another way of assessing the relative importance of independent
variables is to consider the increase in R 2 when a variable is
entered into an equation that already contains the other
independent variables. This increase if
Forward Selection
In forward selection, the first variable considered for entry into the
equation is the one with the largest positive or negative correlation
with the dependent variable. The F test for the hypothesis that the
coefficient of entered variable is 0 is then calculated. To determine
whether this variable (and each succeeding variable) is entered, the
F value is compared to an established criterion. You can specify
one of two criteria in SPSS. One criterion is the minimum value of
Backward Elimination
While forward selection starts with no independent variables in the
equation and sequentially enters them, backward elimination
starts with all variables in the equation and sequentially removes
them. Instead of entry criteria, removal criteria are used.
Stepwise Selection
12 Curve Estimation
In a situation where you want to fit a curve that you think is
appropriate to data you have, you can do it in SPSS. The Curve
Estimation procedure produces curve estimation regression
statistics and related plots for 11 different curve estimation
regression models. You can also save predicted values, residuals,
and prediction intervals as new variables.
In the Curve Estimation dialog box, click your right mouse button
on a model to obtain the equation of the model.
variables are the same. This test makes no assumptions about the
shape of these distributions.
To compute the sign test, the difference between the buying scores
of husbands and wives is calculated for each case. Next, the
numbers of positive and negative differences are obtained. If the
distributions of the two variables are the same, the numbers of
positive and negative differences should be similar.
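The counting step of the sign test is easy to reproduce. The Python sketch below counts positive and negative differences and then adds an exact two-sided binomial probability; that probability is a standard way to finish the test but is an assumption here, since the text only describes the counts. The scores are invented.

```python
from math import comb


def sign_test(first, second):
    """Count positive and negative differences and return an exact p-value.

    Ties (zero differences) are dropped. The two-sided binomial p-value
    is an illustrative assumption, not taken from the manual.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    positive = sum(d > 0 for d in diffs)
    negative = len(diffs) - positive
    n = positive + negative
    k = min(positive, negative)
    # Probability of a split at least this uneven if both signs are equally likely
    p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return positive, negative, p_value


# Made-up buying scores for six couples
husbands = [5, 6, 7, 8, 6, 7]
wives = [4, 5, 6, 7, 6, 8]
print(sign_test(husbands, wives))
```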
To obtain a sign test, from the menus choose
$$ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} $$
$$ \text{Prob(event)} = \frac{1}{1 + e^{-Z}} $$
where Z is the linear combination
$$ Z = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \dots + B_p X_p $$
The probability of an event not occurring is estimated as,
(
regression model as
$$ \log\left( \frac{P(\text{event})}{1 - P(\text{event})} \right) = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_p X_p $$
The quantity on the left side of the equal sign is called a logit. It is
a natural log of the odds that the event will occur.
Output
The logistic regression analysis gives us estimates of B, the standard
error, the Wald statistic, and the significance level. From this
output we can write the logistic regression equation for the
probability that an event occurs.
$$ \text{Prob(event)} = \frac{1}{1 + e^{-(-3.346)}} = 0.0340 $$
Based on this estimate, we could say the event will not occur
because the probability is <0.5
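The probability calculation can be checked with a few lines of Python. The value Z = -3.346 comes from the example above; the function name and the 0.5 cut-off rule as coded here are only an illustration.

```python
from math import exp


def event_probability(z):
    """Logistic function: probability of the event for linear predictor z."""
    return 1 / (1 + exp(-z))


z = -3.346  # linear combination from the example in the text
p = event_probability(z)

# Classify with the 0.5 cut-off used in the text
prediction = "event occurs" if p >= 0.5 else "event does not occur"
print(round(p, 4), prediction)
```

A value of Z equal to 0 gives a probability of exactly 0.5, so negative values of Z always predict that the event does not occur.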
Test
We do not stop at obtaining the output; we also have to examine the
significance of the coefficients. For large sample sizes, the test that a
coefficient is 0 can be based on the Wald statistic, which has a chi-square
distribution. When a variable has a single degree of
freedom, the Wald statistic is just the square of the ratio of the
coefficient to its standard error.
However, the Wald statistic has a very undesirable property. When the
absolute value of the regression coefficient becomes large, the
estimated standard error is too large. This produces a Wald statistic
that is too small, leading you to fail to reject the null hypothesis
that the coefficient is 0 when in fact you should. So when you have
a large coefficient, you should not rely on the Wald statistic for
hypothesis testing. Instead, you should build a model with and without that
variable and base your hypothesis test on the change in the log-likelihood.
L ( 1-P
P
)
n
This quantity is called a logit. If the observed proportion is 0.5, the
logit-transformed value is 0. And if the observed proportion is
0.95, the logit-transformed value is 1.47. In most situations,
analyses based on logits and probits give very similar results.
Transformed Pi=A+BXi
Where Pi is observed proportion responding at dose Xi (usually
log of the dose is used)
When taking the log of the dose in probit analysis, first look at the plot
of observed probits against the dose. If the plot looks linear, go
ahead. If not, try another transformation until the
relationship looks linear.
E.g. $Y = B_0 + B_1 X^2$
This can be rewritten as $Y = B_0 + B_1 X_1$, where
$X_1 = X^2$.
Consider the model
$$ Y = e^{B_0 + B_1 X_1 + E} $$
This model is not linear in form, but if we take the natural
logarithm of both sides:
$$ \ln(Y) = B_0 + B_1 X_1 + E $$
This model is linear in the parameters, so we can use the usual techniques
to estimate them. Models that seem nonlinear but can be transformed to a
linear form are sometimes called intrinsically linear models. It is a good
idea to always look for a way to make the model linear.
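The log transformation of an exponential model can be tried out in Python. In this sketch the data are generated without any error term from assumed parameter values (0.5 and 0.3), so that an ordinary least-squares fit on ln(Y) can be seen to recover them.

```python
from math import exp, log
from statistics import mean

# Assumed "true" parameters, used only to generate illustrative data
TRUE_B0, TRUE_B1 = 0.5, 0.3

x = [0, 1, 2, 3, 4]
y = [exp(TRUE_B0 + TRUE_B1 * xi) for xi in x]  # no error term in this example

# Taking the natural log of Y makes the model linear in B0 and B1
ln_y = [log(value) for value in y]

# Ordinary least squares on the transformed data
mean_x, mean_ln_y = mean(x), mean(ln_y)
b1 = sum((xi - mean_x) * (li - mean_ln_y) for xi, li in zip(x, ln_y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
b0 = mean_ln_y - b1 * mean_x

print(b0, b1)
```

With real data the error term means the recovered values are estimates, but the same two steps (transform, then fit a straight line) apply.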
$$ Y_i = \frac{C}{1 + e^{A + B t_i}} + E_i $$
where Y_i is the population size at time t_i.
If you ignore the error term, sometimes a linear form of the model
can be derived. Linear regression can then be used to obtain initial
values.
$$ Y = e^{A + Bx} + E $$
If we ignore the error term and take the natural log of both sides:
$$ \ln(Y) = A + Bx $$
you form the sum of a series of terms, one for each condition. Each
term consists of a logical expression (in parentheses) multiplied by
the expression that should result when that logical expression is
true.
dialog box.) Note: This selection persists in this dialog box for the
rest of your session. If you change the model, be sure to deselect it.
expression or paste from the Parameters list at the left. You cannot
use ordinary variables in a constraint.
• One of the three logical operators <=, =, or >=.
• A numeric constant, to which the expression is compared using
the logical operator.
Type the constant. Numeric constants must be typed in American
format, with the dot as a decimal delimiter.
You can save a number of new variables to your active data file.
Available options are Predicted values, Residuals, Derivatives, and
Loss function values. These variables can be used in subsequent
analyses to test the fit of the model or to identify problem cases.
Interpreting Nonlinear Regression Results
Nonlinear regression problems often present computational
difficulties:
• The choice of initial values for the parameters influences
convergence. Try to choose initial values that are reasonable and, if
possible, close to the expected final solution.
• Sometimes one algorithm performs better than the other on a
particular problem. In the Options dialog box, select the other
algorithm if it is available. (If you specify a loss function or certain
types of constraints, you cannot use the Levenberg- Marquardt
algorithm.)
• When iteration stops only because the maximum number of
iterations has occurred, the “final” model is probably not a good
solution. Select Use starting values from previous analysis in the
Chapter Six
Factor Analysis
Descriptives
Extraction
Rotation
Options
Chapter Seven
Discriminant Analysis
Note: The grouping variable can have more than two values. The
codes for the grouping variable must be integers, however, and you
need to specify their minimum and maximum values. Cases with
values outside of these bounds are excluded from the analysis.
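The effect of the minimum and maximum bounds can be illustrated with a short Python sketch. It only mimics the rule stated in the note. The function name, the variable name `group` and the data are assumptions made for this example.

```python
def cases_in_range(cases, group_key, minimum, maximum):
    """Keep cases whose integer group code lies within [minimum, maximum]."""
    kept = []
    for case in cases:
        code = case[group_key]
        # Codes outside the bounds (or non-integer codes) are left out.
        if isinstance(code, int) and minimum <= code <= maximum:
            kept.append(case)
    return kept

# Hypothetical cases: 'group' is the grouping variable.
cases = [
    {"id": 1, "group": 1},
    {"id": 2, "group": 2},
    {"id": 3, "group": 3},
    {"id": 4, "group": 9},  # outside the range 1 to 3, so excluded
]

used = cases_in_range(cases, "group", 1, 3)
print([case["id"] for case in used])
```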
Define Range
Select Cases
To select cases for your analysis, in the main dialog box click
Select, choose a selection variable, and click Value to enter an
integer as the selection value. Only cases with that value for the
selection variable are used to derive the discriminant functions.
Statistics
Stepwise Method
Chapter Eight
Cluster Analysis
As long as all the variables are of the same type, the Hierarchical
Cluster Analysis procedure can analyze interval (continuous),
count, or binary variables.
Options
Exclude cases listwise. Excludes cases with missing values
for any clustering variable from the analysis.
Exclude cases pairwise. Assigns cases to clusters based on
distances computed from all variables with nonmissing
values.
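The difference between the two options can be sketched in Python. This is an illustration under stated assumptions, not the procedure's actual algorithm: missing values are represented by `None`, the distance is squared Euclidean, the cluster centers are given in advance, and all names and data are made up.

```python
def exclude_listwise(cases):
    """Keep only cases with no missing value on any clustering variable."""
    return [case for case in cases if all(v is not None for v in case)]

def distance_nonmissing(case, center):
    """Squared Euclidean distance over the variables that are nonmissing."""
    return sum((v - m) ** 2 for v, m in zip(case, center) if v is not None)

def assign_pairwise(cases, centers):
    """Assign each case to its nearest center using its nonmissing variables.

    A case with every value missing has distance 0 to all centers and
    falls into the first cluster here; real software would need a rule
    for that situation.
    """
    return [
        min(range(len(centers)), key=lambda k: distance_nonmissing(case, centers[k]))
        for case in cases
    ]

# Hypothetical data: two clustering variables, None marks a missing value.
cases = [(1.0, 2.0), (None, 9.0), (8.0, None)]
centers = [(1.0, 1.0), (9.0, 9.0)]

print(exclude_listwise(cases))          # only the complete case remains
print(assign_pairwise(cases, centers))  # every case receives a cluster
```

Listwise exclusion drops two of the three cases in this small example, while the pairwise approach keeps all of them at the cost of comparing cases on different sets of variables.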
Options
will use the disk to store information that will not fit in memory.
Specify a number greater than or equal to 4.
Plots
Cluster pie chart. Displays a pie chart showing the percentage and
counts of observations within each cluster.
Method
Transform Values
Statistics
Plots
Chapter Nine
Reliability Analysis
Statistics
You can select various statistics describing your scale and items.
Statistics reported by default include the number of cases, the
number of items, and reliability estimates as follows:
Chapter Ten
Graphs
We produced examples of pie charts and bar charts as part of the
Frequencies and Crosstabs output. In this section we look at the
range of SPSS graphs and charts in more detail, and at how to
customize them to your own taste, using some of the more popular
types of graphs.
Two kinds of graphs are available from the Graphs menu. The main
part of the menu gives access to the old-style graphs, which have
been available in SPSS for a long time. The Interactive submenu
contains a similar list of graphs in the new interactive graphics
format.
You may recognize some of the graph types and have used them
before. Some of the graph types have specialized applications,
such as the Pareto... and Control... options for quality control
work, Sequence... and Time Series for business and econometric
work, and ROC Curve, first available in SPSS 9.
1. Graphs Gallery
If you want to know more about the different types of graph then
select Gallery from the Graphs menu to get information about
how to construct them. This will open the SPSS help system at the
Main Chart Gallery
Above the histogram you should see the chart menus and a variety
of buttons on the Toolbar which can be used to change different
aspects of the graph. The Toolbar looks different now that a chart
is open. The menus have also changed to provide facilities for
changing the content and appearance of graphs.
The top section of the tool bar contains the same buttons you
would see when a data or output window is active. On the second
row are the chart editing buttons. From left to right the chart
editing buttons are used to: -
Some of the buttons can only be used with particular types of
graph, so for a histogram they will be grayed out on the toolbar.
label. The axis will also be rescaled so that it starts at 60" (five feet)
and finishes at 78" (six feet six inches).
You will then be presented with a box to choose which axis to change,
either the Interval or the Scale axis. The Interval axis corresponds to
the range of values of the variable and the Scale axis corresponds to the
number of cases in each height interval (i.e. the horizontal and
vertical axis respectively for this histogram).
Both scale and interval axes have the same general form - although
the dialog boxes and thus the things about them you can change are
slightly different. The interval axis, along the base of the
histogram, corresponds to the range of the variable values - the
scale for the weight variable.
The intervals between each tick on an axis and the axis range can
be customised rather than letting SPSS automatically set the scales.
Custom and Define... buttons are used to make changes to the axis
intervals.
3. Saving a chart
Often it is enough to save the charts you create along with the rest
of the output in an output window (with a .SPO extension).
However, it is possible to export charts individually into a graphics
file format using Export Chart... from the File menu. If you can't
see Export Chart... in the File menu, double click on the chart to
make sure it is open. This opens the Export Chart dialog box, so
the chart can be saved to a file. The default file type is JPEG
(.jpg), a format that is used a lot on the web. A variety of other
formats is available; which one you choose will depend on what
you want to do with the chart.
4. Scatter plot
So far we have dealt with charts that display a single scale or
categorical variable. A scatter plot shows the relationship between
two scale variables.
Once the plot has appeared in the viewer, it can be edited, saved
with the Output window or exported to another format as we have
seen earlier with other graphs. Double-click on the scatterplot to
open it in the Chart Editor. The Chart Editor window is similar to the
earlier ones and most facilities work the same way, but a few more
buttons are now active on the toolbar.
5. Interactive Charts
So far all the charts we have seen have been available since version
6 of SPSS, but a newer interactive method of producing charts has
been available since version 8. The new interactive chart type
makes displaying charts and graphs of subgroups much easier,
and the editing facilities are nicer to use, although it can be quite
slow on older computers. Interactive is second from the top in the
Graphs menu and leads to a submenu of chart types similar to
the ones we have seen already. The system for editing the
charts is better, and their dimensions and contents can be changed
interactively, as the name suggests.
You can see that the interactive Create Bar Chart dialog box has a lot
more elements. Along the top of the dialog box there are tabs
leading to other aspects of the bar chart. One of the most
noticeable differences is that an axis diagram shows
which variables have been assigned to which dimension of the bar
chart. There are no arrow buttons in this dialog box to direct the
variables into the appropriate space. In these dialog boxes you
simply drag and drop a variable from the variable list into the
desired part of the chart diagram. Three different icons in the
variable list indicate how SPSS will deal with that variable in a
chart.
The undo and redo buttons act on the most recent edit: undo reverses
the last thing done to the chart, and redo reapplies an edit that has
just been undone. There are also two buttons to swap axes, like the
swap axis button in the old-style charts.
Text tools
On the right of the horizontal toolbar are pop-up menus to change
the style of selected text, including the font face, size, bold and
italic. The font type and size are shown only when a text label
is selected in the graph.
Cursor tools
Cursor buttons change the editing mode. The arrow tool is for
selecting objects. Text can be changed by using the text tool to select
the text and then changing the font using the menus. The Point ID tool
looks like a target sight and is for identifying points in scatterplots.
Style tools
The vertical tool bar contains buttons to change the appearance of
the charts in various ways. Some of the items can only be used
with particular types of chart.
The style buttons are normally on the left hand vertical tool bar
below the Cursor tools. In descending order they are fill color,
border color, fill pattern, plot symbols for scatterplots, symbol
size for scatterplots, line style, line width, connector style.
Color
Placing a variable in the Color: box will produce a clustered bar
chart with a different colored bar for each category of the color
variable.
Notice the Cluster heading beside the Style and Color fields. This
can be used to change the graph style from a clustered to a stacked
bar chart.
Size
For this type of chart size really isn't important! The Size: option
will only be useful for scatter plots, where the plot point size will
vary with the category of the size variable. In fact if you look back
at the original Create Bar Chart dialog box then the size option is
not available.
If you wish to try out this option, choose Graphs/ Interactive/
Scatterplot... from the menus and reproduce the height vs. weight
scatter plot we did earlier, this time with cigarette consumption
defining the plot point size.
5.2. 3-D
Instead of producing a clustered or stacked bar chart, it is possible
to arrange the bars in a 3-D formation. There is a drop-down menu
with three choices in the assign variables dialog box.
There are also a couple of relevant buttons in the utility tools (see
page 37). The 3-D tool will also give you access to the 3-D and
3-D Light palettes. As illustrated below, the 3-D Light palette can
change the direction and strength of the lighting on the chart.
The Co-ordinate systems tool does the same job as the dimension
menus in the Create Bar chart dialog box and the Assign Variables
dialog box. If we changed from 3-D to 2-D then the chart would be
reduced to a simple bar chart again.
Panel variables
Extra dimensions can be added using panel variables. Each
category of a panel variable defines a separate graph. If there is
more than one panel variable, one graph is produced for each
combination of the variable categories.
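To see how quickly the number of graphs grows, the combinations of panel categories can be listed with a short Python sketch. The panel variables `sex` and `smoker` and their categories are hypothetical, and the function only enumerates combinations; it does not draw anything.

```python
from itertools import product

def panel_combinations(panel_categories):
    """Return one dict per combination of panel-variable categories."""
    names = list(panel_categories)
    value_lists = [panel_categories[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

# Hypothetical panel variables with two categories each.
panels = panel_combinations({
    "sex": ["female", "male"],
    "smoker": ["no", "yes"],
})

print(len(panels), "graphs would be produced:")
for panel in panels:
    print(panel)
```

Two panel variables with two categories each already give four separate graphs, so panel variables with many categories are best used sparingly.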
Insert Elements
Elements can be added to the graph or chart using the Insert
Elements button on the tool bar
Chart Looks
You can also change the default look of the interactive charts you
produce (in Edit/Options... under the interactive tab) in
ChartLook:. This can be set so the charts you produce for reports
or presentations will have a uniform image - you could even
specify your own look by saving a suitable chart as a chartlook
while editing it.
These chart looks are also available from the menus while creating
or editing an interactive chart. While creating an interactive chart,
you can choose another chart look under the Option tab in the
Index
SPSS 10 for Windows Keystroke
New Data Editor Sheet = Alt + F + N + A
New Syntax Window = Alt + F + N + S
New Output Window = Alt + F + N + O
New Draft Output Window = Alt + F + N + R
New Script Window = Alt + F + N + C
Open Datafile = Alt + F + O + A
Open Text Data = Alt + F + R
Save = Alt + F + S or Ctrl + S
Save As = Alt + F + A
Print Data = Alt + F + P
Print Preview = Alt + F + V
Undo = Alt + E + U or Ctrl + Z
Redo = Alt + E + R or Ctrl + R
Cut = Alt + E + T or Ctrl + X
Copy = Alt + E + C or Ctrl + C
Paste = Alt + E + P or Ctrl + V
Paste Variables = Alt + E + V
Clear = Alt + E + E or DEL
Define Dates = Alt + D + E
Insert Variable at Pointer = Alt + D + V
Insert Case at Pointer = Alt + D + I
Go to Case = Alt + D + S
Sort Cases = Alt + D + O
Transpose Variables = Alt + D + N
Merge Files - Add Cases = Alt + D + G + C
Merge Files - Add Variables = Alt + D + G + V
Aggregate Data = Alt + D + A
Split File = Alt + D + F
Select Cases = Alt + D + C
Weight Cases = Alt + D + W