
Introduction

• Statistics is the branch of mathematics that converts data into useful information.
• This transformation requires complex calculations which can easily be done using computers.
• Attempting these calculations manually is difficult.
• Therefore, SPSS was created to help researchers handle the large volumes of data that they collect during their research studies.
SPSS
• SPSS is a software package used for statistical analysis.
• It is statistical software that accepts raw data and converts it into relevant statistics that can be used for further analysis.
• Its full name is Statistical Product and Service Solutions, though it is widely known as the Statistical Package for the Social Sciences.
• It is a comprehensive tool for analyzing statistical data. SPSS accepts data in different file formats and uses them to generate tabulated reports, charts, and graphs, including descriptive and inferential statistics.
• SPSS is a Windows based program that can be used to perform data
entry and analysis and to create tables and graphs. SPSS is capable of
handling large amounts of data and can perform all of the analyses
covered in the text and much more.
• It was launched in 1968 and later bought by IBM in 2009.
• It was developed by three students at Stanford University: Norman H. Nie, C. Hadlai (Tex) Hull, and Dale H. Bent.
• They developed a software system based on the idea of using statistics to turn raw data into information essential to decision-making.
Advantages
• It provides a wide range of statistical procedures, including descriptive statistics,
inferential statistics, and graphics.
• User-friendly interface: SPSS has a simple and intuitive interface that makes it easy for
researchers to navigate and use.
• Comprehensive statistical analyses: SPSS provides a wide range of statistical
procedures, making it a versatile tool for analyzing data from different types of
research studies.
• Data management: SPSS includes data management capabilities, such as data recoding
and data cleaning, which can save researchers time and effort.
• Output customization: SPSS allows researchers to customize the output of their
analyses, such as changing the colors and formatting of charts and tables.
• It saves time and effort, performing a job in seconds.
• Its calculations are more exact than manual calculation.
• Many complex statistical tests are available as built-in features.
Basic concepts in quantitative study
• Research is basically a systematic enquiry or an objective process of
gathering data for the purpose of making decisions.
• Research is a fact-finding process undertaken through a systematic procedure which includes the collection, compilation, presentation, and interpretation of data.
• Research which aims to establish some theory is known as basic research,
• while research conducted to solve problems with the use of theories is termed applied research.
• There are certain concepts or terminologies associated with quantitative study which will be used repeatedly.
• Population: A population consists of all the items or individuals about
which you want to draw a conclusion.
• Variable: It refers to properties or characteristics that can take different values, quantitative or qualitative. There are two types of variables: the independent variable and the dependent variable.
• Model: It is a representation of some system, established to study some aspect of that system or the system as a whole.
• Sample: It is the part of a population that is selected for analysis.
• Parameter: It represents characteristics of the population.
• Statistic: It represents the characteristics of the sample.
• Hypothesis: It is a proposition that can be experimentally verified and has a definite practical consequence.
• Data: Data are a systematic record of the values taken by a variable or a number of variables at a particular point of time or over different points of time.
• Data can be:
1. Quantitative: numerically represented, and calculations can be performed on them
2. Qualitative: representing some characteristics or attributes of the variables
3. Discrete: taking specific values rather than a range of values
4. Continuous: able to take any value within a certain range
• By the manner of collection, data can be of the following types:
1. Time series: a set of observations on the values that a variable takes at different times; such data may be collected at regular time intervals
2. Cross-sectional: data on one or more variables collected at the same point in time, such as a census of population
3. Pooled: pooled, or combined, data contain elements of both time series and cross-section data
4. Panel: a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time
Frequency distribution
• When observations, discrete or continuous, are available on a single
characteristic of a large number of individuals, often it becomes necessary to
condense the data as far as possible without losing any information of interest
• Frequency of a variable is the number of times it occurs in given data
• There are two types of frequency distribution, namely, simple frequency
distribution and grouped frequency distribution. Simple frequency
distribution shows the values of the variable individually whereas the grouped
frequency distribution shows the values of the variable in groups or intervals
• ARRAY: the arrangement of data in ascending or descending order.
• The representation of the data in the form of an array with tally marks is known as a frequency distribution.
• The word 'frequency' is derived from 'how frequently' a variable occurs
• Inclusive classes: In such a series, the upper limit of one interval is not equal to the lower limit of the next interval; both the upper limit and the lower limit are included in the same class.
e.g. 15-19, 20-24, and so on
• Exclusive classes: In such a series, the upper limit of one interval is the lower limit of the next interval.
e.g. 15-20, 20-25, 25-30, and so on
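To make this concrete, here is a minimal sketch in plain Python (not SPSS); the marks and the inclusive class limits below are invented for illustration.

```python
# Simple vs. grouped frequency distribution (illustrative data).
from collections import Counter

marks = [17, 22, 17, 29, 15, 22, 24, 17, 29, 20, 15, 24]

# Simple frequency distribution: each value shown individually.
simple = Counter(sorted(marks))
print(dict(simple))  # {15: 2, 17: 3, 20: 1, 22: 2, 24: 2, 29: 2}

# Grouped frequency distribution with inclusive classes 15-19, 20-24, 25-29:
# both limits belong to the class, so counting uses lo <= x <= hi.
classes = [(15, 19), (20, 24), (25, 29)]
grouped = {f"{lo}-{hi}": sum(lo <= x <= hi for x in marks) for lo, hi in classes}
print(grouped)       # {'15-19': 5, '20-24': 5, '25-29': 2}
```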
Charts and diagrams for ungrouped data
• Charts and diagrams are useful devices for data presentation. Diagrams are appealing to the eye, and they are helpful in assimilating data readily and quickly.
• A chart, on the other hand, can clarify complex problems and reveal hidden facts. But charts and diagrams, unlike tables, do not show the details of the data and require much time to construct.
• The common types of charts and diagrams are,
1) Line diagrams: Mostly, the time series data are represented by line diagrams. In a
line diagram, data are shown by means of a curve or a straight line.
2) Bar diagrams: A bar diagram consists of a group of equally spaced rectangular bars, one for each category (or class) of given statistical data
3) Pie diagrams: A pie diagram is a circle whose area is divided proportionately
among the different components by straight lines drawn from the center to the
circumference
4) Pictogram: A pictogram is a chart that uses pictures to represent data.
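As a hedged sketch of the first three diagram types (using matplotlib rather than SPSS's chart builder; the series and category figures below are invented):

```python
# Line, bar, and pie diagrams side by side (illustrative data).
import matplotlib.pyplot as plt

years, sales = [2019, 2020, 2021, 2022], [40, 55, 48, 62]   # time series
cats, counts = ["A", "B", "C"], [30, 45, 25]                # categorical data

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))
ax1.plot(years, sales)        # line diagram: time series shown by a curve/line
ax1.set_title("Line diagram")
ax2.bar(cats, counts)         # bar diagram: one equally spaced bar per category
ax2.set_title("Bar diagram")
ax3.pie(counts, labels=cats)  # pie diagram: area divided proportionately
ax3.set_title("Pie diagram")
plt.tight_layout()
plt.show()
```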
Diagrams of frequency distribution
• Histograms, frequency polygons, and ogives are means of diagrammatic presentation of frequency-type data.
1. Histogram: It is the most common form of diagrammatic presentation of grouped frequency data. It is a set of adjacent rectangles on a common base line. The base of each rectangle measures the class width, whereas the height measures the frequency density.
2. Frequency polygon: The frequency polygon of a frequency distribution is obtained by joining the midpoints of the tops of the consecutive rectangles. The two end points of a frequency polygon are joined to the base line at the mid-values of the empty classes at either end of the frequency distribution.
3. Ogives are the graphical representation of the cumulative frequency distribution. Plotting the cumulative frequencies against the class boundaries and joining the points, we obtain ogives.
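The sketch below (numpy/matplotlib, with invented data) draws all three diagrams from one grouped distribution; note that the 'less than' ogive is plotted against the upper class boundaries.

```python
# Histogram, frequency polygon, and 'less than' ogive from one data set.
import numpy as np
import matplotlib.pyplot as plt

data = np.random.default_rng(0).normal(50, 10, 200)  # illustrative sample
counts, edges = np.histogram(data, bins=8)           # grouped frequencies
mids = (edges[:-1] + edges[1:]) / 2                  # class mid-values

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))
ax1.hist(data, bins=8)                    # adjacent rectangles on a common base
ax1.set_title("Histogram")
ax2.plot(mids, counts, marker="o")        # join midpoints of the rectangle tops
ax2.set_title("Frequency polygon")
ax3.plot(edges[1:], np.cumsum(counts))    # cumulative frequency vs upper boundaries
ax3.set_title("Ogive (less than)")
plt.tight_layout()
plt.show()
```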
Measures of central tendency
• According to Professor Bowley, averages are "statistical constants which enable us to comprehend in a single effort the significance of the whole."
• They give us an idea about the concentration of the values in the central part of the
distribution
• A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. Such measures are also called summary statistics.
• The following are the five measures of central tendency that are in common use:
1. Mean
2. Median
3. Mode
4. Geometric mean
5. Harmonic mean
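All five averages are available in Python's standard library, as the sketch below shows (the observations are invented for illustration):

```python
# The five common measures of central tendency (illustrative data).
import statistics

x = [4, 8, 8, 5, 10, 9]

print(statistics.mean(x))            # arithmetic mean: sum / count
print(statistics.median(x))          # middle value of the ordered data
print(statistics.mode(x))            # most frequent value (8 here)
print(statistics.geometric_mean(x))  # nth root of the product of the values
print(statistics.harmonic_mean(x))   # reciprocal of the mean of reciprocals
```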
Measures of dispersion
• In addition to locating the center of the observed values of the
variable in the data, another important aspect of a descriptive study
of the variable is numerically measuring the extent of variation
around the center.
• The word dispersion is used to denote the degree of heterogeneity in the data.
• It is an important characteristic indicating the extent to which the
observations may vary among themselves.
• It tells you how your data are clustered around the mean.
• The literal meaning of dispersion is "scatteredness".
Absolute measure of dispersion
• Range
• Mean deviation
• Quartile deviation
• Standard deviation
Relative measures of dispersion
• Coefficient of range
• Coefficient of mean deviation
• Coefficient of quartile deviation
• Coefficient of variation
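As a sketch (plain Python, invented data) of the four absolute measures plus one relative measure, the coefficient of variation:

```python
# Absolute measures of dispersion plus the coefficient of variation.
import statistics

x = [12, 15, 9, 20, 14, 18, 11, 16]

m = statistics.mean(x)
rng = max(x) - min(x)                           # range
mean_dev = sum(abs(v - m) for v in x) / len(x)  # mean deviation about the mean
q1, _, q3 = statistics.quantiles(x, n=4)        # first and third quartiles
quartile_dev = (q3 - q1) / 2                    # quartile (semi-interquartile) deviation
sd = statistics.pstdev(x)                       # population standard deviation
cv = sd / m * 100                               # coefficient of variation, in percent

print(rng, mean_dev, quartile_dev, sd, cv)
```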
Correlation
● Bivariate distribution: So far we have confined ourselves to univariate distributions, i.e., distributions involving only one variable. We may, however, come across certain series where each term of the series assumes the values of two or more variables. For example, if we measure the heights and weights of a certain group of persons, we obtain what is known as a bivariate distribution, one variable relating to height and the other to weight.
● In a bivariate distribution we may be interested to find out if there is any correlation or covariation between the two variables under study. If a change in one variable produces a change in the other variable, the variables are said to be correlated.
• Correlation can be generally defined as the degree of association between any two variables.
Utility of correlation coefficient
• It is a useful tool for economists to study the relationship between variables.
• It helps in numerically measuring the degree of relationship between variables.
• It helps in testing the significance of the relationship.
• Sampling error can be calculated.
• Correlation is the basis for the study of regression analysis.
● Between two variables, the degree of association may range all the
way from no relationship at all to a relationship so close that one
variable is a function of the other. Thus, correlation may be:
1) Perfectly positive
2) Limited positive degree
3) No correlation at all
4) Limited negative degree
5) Perfectly negative
● We use the following methods to measure simple correlation between two variables:
1) Scatter Diagram
2) Karl Pearson’s Coefficient of Correlation
3) Coefficient of Rank Correlation
4) Concurrent deviation methods
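A brief sketch of two of these methods, Karl Pearson's coefficient and rank correlation, via scipy; the height/weight pairs below are invented:

```python
# Pearson's and Spearman's (rank) correlation coefficients.
from scipy import stats

height = [150, 155, 160, 165, 170, 175]
weight = [52, 55, 61, 64, 69, 72]

r, p = stats.pearsonr(height, weight)         # Karl Pearson's coefficient
rho, p_rho = stats.spearmanr(height, weight)  # coefficient of rank correlation
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```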
Regression
• Simple regression analysis
Regression analysis is concerned with the study of the dependence of
one variable, the dependent variable, on one or more other variables,
the explanatory variables, with a view to estimating and/or predicting
the (population) mean or average value of the former in terms of the
known or fixed (in repeated sampling) values of the latter.
• Regression analysis is a statistical tool to study the nature and extent of the functional relationship between two or more variables and to estimate (or predict) the unknown values of the dependent variable from the known values of the independent variable.
Simple linear regression analysis
• The simplest possible regression analysis, namely, the bivariate, or
two-variable, regression in which the dependent variable (the
regressand) is related to a single explanatory variable (the regressor).
• Regression analysis is largely concerned with estimating and/or predicting the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable(s).
• Here we call the average or mean value the conditional expected value, because it depends on the given fixed values of the variable X.
• Symbolically, we denote the conditional expected value E(Y|X), read as the expected value of Y for the given X.
• The stochastic specification of the above relation is Yi = E(Y|Xi) + ui, where ui is a random disturbance term.
• To sum up, then, our primary objective in regression analysis is to estimate the PRF (population regression function).
• Two-variable regression analysis is concerned with the study of the
dependence of one variable, the dependent variable, on one other
variable, the explanatory variable, with a view to estimating and/or
predicting the (population) mean or average value of the former in terms
of the known or fixed (in repeated sampling) values of the latter
• Example:
Yi = β1 + β2Xi + ui : the two-variable regression model, with PRF E(Y|Xi) = β1 + β2Xi, where
Yi = dependent variable (consumption)
β1 = intercept
β2 = slope
Xi = explanatory variable (income)
ui = random disturbance term
Estimation of OLS estimators
• β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
• β̂1 = Ȳ − β̂2X̄
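A minimal plain-Python sketch of these two formulas; the income and consumption figures below are invented for illustration.

```python
# OLS estimates for the two-variable model Yi = b1 + b2*Xi + ui.
income = [80, 100, 120, 140, 160, 180, 200]     # X, illustrative
consumption = [70, 85, 95, 110, 120, 135, 150]  # Y, illustrative

n = len(income)
x_bar = sum(income) / n
y_bar = sum(consumption) / n

# b2_hat = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, consumption)) \
     / sum((x - x_bar) ** 2 for x in income)
b1 = y_bar - b2 * x_bar                         # b1_hat = Ybar - b2_hat * Xbar
print(f"Yhat = {b1:.2f} + {b2:.4f} X")
```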
The Coefficient of Determination: A Measure of "Goodness of Fit"
• We now consider the goodness of fit of the fitted regression line to a set of data; that is, we shall find out how "well" the sample regression line fits the data.
• It is clear that if all the observations were to lie on the regression line, we would obtain a "perfect" fit, but this is rarely the case. Generally, there will be some positive ûi and some negative ûi.
• What we hope for is that these residuals around the regression line are as small as possible.
• The coefficient of determination r² (two-variable case) or R² (multiple regression) is a summary measure that tells how well the sample regression line fits the data.
• The value of r² lies between 0 and 1.
• TSS = ESS + RSS, i.e., the total sum of squares equals the explained sum of squares plus the residual sum of squares.
• Now, dividing the above equation by TSS on both sides,
1 = ESS/TSS + RSS/TSS
so that r² = ESS/TSS = 1 − RSS/TSS.
The quantity r² thus defined is known as the (sample) coefficient of determination and is the most commonly used measure of the goodness of fit of a regression line.
It measures the proportion or percentage of the total variation in Y explained by the regression model.
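A self-contained sketch of r² = 1 − RSS/TSS using numpy, reusing the same invented income/consumption figures as in the OLS sketch above:

```python
# Coefficient of determination r^2 = ESS/TSS = 1 - RSS/TSS.
import numpy as np

x = np.array([80, 100, 120, 140, 160, 180, 200])  # income (illustrative)
y = np.array([70, 85, 95, 110, 120, 135, 150])    # consumption (illustrative)

b2, b1 = np.polyfit(x, y, 1)        # slope and intercept of the fitted line
y_hat = b1 + b2 * x                 # fitted values
rss = np.sum((y - y_hat) ** 2)      # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
print(1 - rss / tss)                # r^2: share of variation in Y explained
```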
Hypothesis testing
• Hypothesis: It is a mere assumption or supposition to be proved or disproved. A research hypothesis is a predictive statement, capable of being tested by scientific methods, that relates an independent variable to some dependent variable.
• Null hypothesis:
It states that there is no difference between the true/actual value and the hypothesised value.
The null hypothesis is denoted as H0.
• Alternative hypothesis:
Any hypothesis, which is complementary to the null hypothesis is
called an alternative hypothesis, usually denoted by H1.
• No hypothesis test is 100% certain. Because the test is based on
probabilities, there is always a chance of making an incorrect
conclusion. When you do a hypothesis test, two types of errors are
possible: type I and type II.
Type I error:
When the null hypothesis is true and you reject it, you make a type I
error.
Type II error
When the null hypothesis is false and you fail to reject it, you make a type II error; that is, you accept the null hypothesis even though it is false.
Steps in hypothesis testing
1. Formulation of hypotheses (H0 and H1)
2. Level of significance (α) and the critical region: α is the maximum size of the type I error that we may commit; it refers to the degree of confidence with which we accept or reject a particular hypothesis. A region (corresponding to a statistic t) in the sample space S that amounts to rejection of H0 is termed the 'critical region'.
3. One-tailed or two-tailed test: a test whose alternative hypothesis is expressed with the symbol (<) or (>) is called a one-tailed test, while a test of any statistical hypothesis whose alternative is written with the symbol (≠) is called a two-tailed test.
H0: no difference between the sample mean and the population mean
H1 ≠ : two-tailed test
H1 < : left-tailed test
H1 > : right-tailed test
4. Decide on the sample statistic
5. Calculate the sample statistic
6. Find the critical (table) value
7. Compare the calculated and critical values and make a decision: if the calculated value is greater than the table (critical) value, reject H0 and accept H1, and vice versa
8. Write the conclusion
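As a hedged end-to-end sketch of these steps, here is a one-sample two-tailed t-test in scipy; the sample values and the hypothesised mean of 50 are invented:

```python
# Steps 1-8 for testing H0: mu = 50 against H1: mu != 50 (two-tailed).
from scipy import stats

sample = [52, 48, 55, 51, 49, 53, 54, 50, 47, 56]  # illustrative data

alpha = 0.05                                              # step 2: level of significance
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)   # steps 4-5: sample statistic

# Steps 7-8: compare and conclude (p-value form of the comparison).
if p_value < alpha:
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}: reject H0")
else:
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}: fail to reject H0")
```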
