Introduction to Statistics,
Brief Knowledge in SPSS &
Data Entry in SPSS
Introduction to Statistics
Definition of Statistics:
It is difficult to define statistics in a few words, since its dimension, scope, function, use and
importance are constantly changing over time. Facts and figures about phenomena or events are
called statistics.
Dr. Md. Abdus Salam Akanda Website: http://du.ac.bd
Professor of Statistics, DU E-mail: akanda@du.ac.bd
The role of statistics in any field of application is to aid in solving problems or, to be more
precise, in answering the variety of questions raised in solving them. It has wide
applications in accounting, marketing, finance and production processes. In modern times, the
importance of statistics is felt in every walk of life, and it has an indispensable connection with
appropriate policy formulation and sound decision making. Policy makers and program
managers depend on statistical information for national planning. Statistical
methods have extensive applications in the natural, biological, agricultural and other branches of
science. They are also widely used in solving problems of both clinical and preventive medicine
as well as in health. A few examples of applications of statistics may interest the reader:
A farmer's profit depends largely on the yields, costs and sale prices of his products. A
farmer will want to know whether a given amount of fertilizer will result in
additional yield. This needs to be verified through a field trial. It is part of the work of a
statistician, who will design an experiment to establish a relationship between crop
yield and the amount of fertilizer. This relationship may take the form of a curve or a
mathematical expression. The farmer uses this tool to decide on the amount of fertilizer to
use, together with the information on cost and anticipated sale price.
In a nutrition survey, the weights of a group of under-five children in rural areas were
compared with those of another group of children of the same age in urban areas. The
investigation revealed that the urban children weigh one and a half pounds more than the
rural children on average. Can this difference be considered a true difference, or is it
due to chance? Only statistical testing can provide the answer to this query.
wants to ascertain which machine produced this bulb. To arrive at a probabilistic
decision, the manager must use statistical knowledge.
An insurance company offers to sell a one-year term insurance policy to a 40-year-old
person, who has a probability of 0.90 of surviving one more year. How large a premium
should the insurance company charge him for a Tk. 50,000 term life insurance? A
statistical approach is needed to solve this problem.
A survey was conducted among the children of Bhola district to study the prevalence of
night-blindness. The investigation revealed that the children of well-to-do families
suffer from night-blindness more often than the children of poor families. What
caused this difference? A statistical analysis will help to draw a valid conclusion and arrive
at a decision.
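The insurance question above can be approached through expected value. The sketch below is only an illustration: it assumes the premium is set at the break-even level, where it equals the expected payout, and it ignores expenses, interest and profit margin, none of which are specified in the example. The names break_even_premium, p_survive and sum_insured are hypothetical.

```python
def break_even_premium(p_survive, sum_insured):
    """Expected payout of a one-year term policy.

    Assumption (not from the notes): no expenses, interest or profit loading.
    """
    p_death = 1.0 - p_survive          # probability that the claim is paid
    return p_death * sum_insured       # expected value of the payout


# Figures from the example: survival probability 0.90, cover of Tk. 50,000.
premium = break_even_premium(p_survive=0.90, sum_insured=50_000)
print(f"Break-even premium: Tk. {premium:,.0f}")
```

Under these assumptions the company would need to charge at least 0.10 × 50,000 = Tk. 5,000; a real premium would be higher once costs and profit are included.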
Statistical inference and generalization are needed in many instances. To mention a few
examples, we may think of
(i) assessing the value of all property in Dhaka City for the year 2020 on the basis of
business trends, population projections, and other factors;
(ii) comparing the effectiveness of two or more teaching methods on the basis of samples
of students taught by each method;
(iii) determining the most effective dose of a new vaccine on the basis of experiments
conducted with volunteer patients from selected hospitals;
(iv) predicting the traffic congestion on a bridge to be built in the near future;
(v) predicting the number of teaching positions to be created in the university within
the next 10 years on the basis of the current trend in student enrollment and
future planning.
In each of the above examples, uncertainties always remain. It is the job of the
statistician to suggest the most reasonable and promising course of action.
Quantitative variable: A quantitative variable is one for which the resulting observations are
numeric and thus possess a natural ordering. Examples: age, height, family size etc.
Qualitative variable: A qualitative variable is one for which numerical measurement is not
possible, such as hair color, religion, profession etc.
Difference between quantitative and qualitative variables:
Discrete variable: When a variable can assume only isolated values within a given range,
it is called a discrete variable. Examples: family size, class size etc.
Continuous variable: When a variable can theoretically assume any value within a given
range, it is said to be a continuous variable. Thus age, height, temperature etc. are
continuous variables.
Population versus Sample
(1) Population: An aggregate of all individuals or items (actual or possible) defined on some
    common characteristics is called a population.
    Sample: A small but representative part of a population, with a finite number of
    individuals or items, is called a sample.
(2) Population: It may be finite or infinite.
    Sample: A sample is always finite.
(3) Population: Capital letters are used to denote population size, usually 𝑁.
    Sample: Small letters are used to denote sample size, usually 𝑛.
(4) Population: The statistical constants of a population are usually referred to as parameters.
    Sample: The statistical measures obtained from the sample observations are termed statistics.
(5) Population: Population size is always greater than the sample size.
    Sample: Sample size is always smaller than the population size.
(6) Population: A census survey deals with the population.
    Sample: A sample survey deals with the sample.
(7) Population: The population is considered as a universal set.
    Sample: The sample is a subset of the population.
Parameter versus Statistic
(1) Parameter: Any function of the population values is called a parameter.
    Statistic: Any function of the sample observations is called a statistic.
(2) Parameter: A parameter is an unknown constant.
    Statistic: A statistic does not contain any unknown constant.
(3) Parameter: Parameters are not used to estimate population characteristics.
    Statistic: Statistics are used to estimate population characteristics (such as parameters).
(4) Parameter: Parameters are free from sampling and other errors.
    Statistic: Statistics are subject to sampling and non-sampling errors.
(5) Parameter: A parameter has no distribution.
    Statistic: A statistic has a distribution, which is called its sampling distribution.
(6) Parameter: The population mean μ, variance σ² etc. are called parameters.
    Statistic: The sample mean x̄, variance s² etc. are called statistics.
Distinguishing Between a Parameter and a Statistic
Decide whether the numerical value describes a population parameter or a sample statistic.
Explain your reasoning.
1) A recent survey of 200 college career centers reported that the average starting salary
for petroleum engineering majors is $83,121.
2) The 2182 students who accepted admission offers to Northwestern University in 2009
have an average SAT score of 1442.
3) In a random check of a sample of retail stores, the Food and Drug Administration
found that 34% of the stores were not storing fish at the proper temperature.
Solution
1) Because the average of $83,121 is based on a subset of the population, it is a sample
statistic.
2) Because the SAT score of 1442 is based on all the students who accepted admission
offers in 2009, it is a population parameter.
3) Because the percentage of 34% is based on a subset of the population, it is a sample
statistic.
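The distinction can also be seen numerically. The following is a minimal sketch using Python's standard statistics module; the eight population values and the choice of every second value as the sample are hypothetical and serve only to show that a parameter is computed from all N population values, while a statistic is computed from the n sampled values.

```python
import statistics

# Hypothetical population of N = 8 values (not taken from the notes).
population = [2, 4, 4, 4, 5, 5, 7, 9]
# Hypothetical sample of n = 4: every second population value.
sample = population[::2]

mu = statistics.mean(population)           # population mean (parameter)
sigma2 = statistics.pvariance(population)  # population variance, divisor N (parameter)
x_bar = statistics.mean(sample)            # sample mean (statistic)
s2 = statistics.variance(sample)           # sample variance, divisor n - 1 (statistic)

print("N =", len(population), " mu =", mu, " sigma^2 =", sigma2)
print("n =", len(sample), " x_bar =", x_bar, " s^2 =", round(s2, 3))
```

A different sample from the same population would give a different x̄ and s², while μ and σ² stay fixed; this is why a statistic has a sampling distribution and a parameter does not.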
Data: The word data is plural and refers to a collection of pieces of information
on some variables. Data are the foundation stones of any inquiry: the basic, raw, unorganized
facts and figures collected from any field of inquiry.
[Figure: Model of data and information]
2) Secondary data: Data that are collected or obtained from published or
unpublished sources are called secondary data. Data of this type are not original in
character. For example, the reports and publications of the Central Bureau of
Statistics are primary for that organization but secondary for those who use them.
The difference between primary and secondary data is only one of degree. Data that
are primary in the hands of one party become secondary in the hands of another. That is, primary
data, once collected and published, become secondary data for other investigators. For example,
the data on the population of Bangladesh published by the Bangladesh Bureau of Statistics
are primary for that organization but secondary for those who use them.
There are the following differences between primary and secondary data:
Data collected on quantitative variables are called quantitative data, and data collected
on qualitative variables are called qualitative data.
heart of any discipline. For example, we want to measure how much an employee is satisfied
with his job.
Scale: A scale may be defined as any series of items [that] are arranged progressively
according to value or magnitude, into which an item can be placed according to its
quantification. In other words, a scale is a continuous spectrum or series of categories. The
purpose of scaling is to represent, usually quantitatively, an item’s, a person’s, or an event’s
place in the spectrum.
Scale of Measurement:
Measurement is the process of assigning numbers to characteristics, variables or events
according to scientific rules.
The variables in any study may be of different nature and they may represent some attributes,
characteristics or key factors of interest. These variables can be measured under four levels or
scales of measurement. The measurement scales are:
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
Comparative study of scales of measurement:
Nominal Scale: The measurement scale, in which numbers are assigned to the categories or
variable values for identification only, is called a nominal scale. For example: gender,
Ordinal Scale: The measurement scale in which numbers are assigned to the categories or
variable values for identification as well as for ranking is called an ordinal scale. For
example: consider the variable economic status which can be categorized as rich (1), middle
class (2) and poor (3).
Interval Scale: The measurement scale in which numbers are assigned to the variable values
in such a way that the level of measurement is broken down on a scale of equal units and the
zero value on the scale is not an absolute zero, is called an interval scale. For example: the
variable temperature can have values 0°C, 10°C, 20°C etc.
Ratio Scale: The measurement scale in which numbers are assigned to the variable values in
such a way that the level of measurement is broken down on a scale of equal units and the
zero value on the scale is absolutely zero, is called a ratio scale. For example: age, weight,
pulse rate, parity etc.
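One practical consequence of the difference between interval and ratio scales is that ratios of values are meaningful only when the zero is absolute. The sketch below illustrates this with a change of units; the temperatures 10°C and 20°C and the weights 40 kg and 80 kg are illustrative values chosen here, and the kilogram-to-pound factor is approximate.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


KG_TO_LB = 2.20462  # approximate conversion factor

# Interval scale: the zero of the Celsius scale is arbitrary, so the ratio
# of two temperatures changes when the unit changes.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: zero weight means "no weight" in every unit, so the ratio
# of two weights is the same in kilograms and in pounds.
ratio_kg = 80 / 40
ratio_lb = (80 * KG_TO_LB) / (40 * KG_TO_LB)

print("Temperature ratio in C:", ratio_c, " in F:", round(ratio_f, 2))
print("Weight ratio in kg:", ratio_kg, " in lb:", round(ratio_lb, 2))
```

So 20°C cannot be described as "twice as hot" as 10°C, whereas 80 kg is twice as heavy as 40 kg in any unit.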
Classification of variables by scale of measurement:
Qualitative variables: economic status, religion
Quantitative variables: age, family size
SPSS – X can use up to 315 variables in comparison to the 500 that SPSS/PC+ can use. SPSS
for Windows can use more than 500 variables.
SPSS for Windows has been derived from the mainframe version and not from SPSS/PC+
version.
The exchange of files between the different versions of SPSS (SPSS – X, SPSS/PC+, SPSS
for Windows) is handled by special SPSS files that are created and read with the IMPORT
and EXPORT commands. Communication with other well-known PC packages is also
possible.
We will stick to SPSS for Windows. SPSS for Windows is an advanced statistical package
designed to run interactively on PCs and other computers in a graphical environment, using
descriptive menus and simple dialog boxes to do most of the work. Most tasks can be
accomplished simply by pointing and clicking the mouse.
Viewer Window: The viewer window is where we see the statistics and graphics – the output
from the work in SPSS. The viewer window is also called Output Window which is split
into two parts or panes:
The Outline Pane (Left side of the viewer window)
The Display Pane (Right side of the viewer window)
Pivot table: Most of SPSS’s tabular and statistical output appears in the viewer in the form of
pivot tables. Double clicking a pivot table lets you edit it.
Chart editor: Double clicking a chart in the viewer will open the chart editor. Now we can
Naming a Variable:
The name of a variable should be short. SPSS 12 and later accept names of up to 64 bytes;
earlier versions accepted at most 8 characters. You may use an alphanumeric name, but the
first character must be a letter.
You may use _ (underscore) or . (dot) between two words if the name is long, but you
cannot use a space or - (dash). For example, name the Residential Status variable res_stat or
res.stat, but not res-stat or res stat.
Give a label to the variable so that a description of the variable appears in the
output.
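The naming rules above can be expressed as a simple pattern. The following is a minimal sketch in Python that encodes only the rules listed in these notes (first character a letter; then letters, digits, underscore or dot; no space or dash). It is not a complete statement of what SPSS accepts, and the function name is_valid_name is hypothetical.

```python
import re

# Encodes only the rules stated in the notes, not the full SPSS naming rules:
# a letter first, followed by letters, digits, underscores or dots.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")


def is_valid_name(name):
    """Return True if the whole name matches the rules listed in the notes."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["res_stat", "res.stat", "res-stat", "res stat", "1age"]:
    verdict = "ok" if is_valid_name(candidate) else "not allowed"
    print(f"{candidate!r}: {verdict}")
```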
Now we will learn how to input data in SPSS. Let us create a data file in SPSS using the
information given below:
ID  Name            Sex     Age  Region      Height (inches)  Education  Monthly income (Tk)
1   Ahmed Ali       Male    45   Dhaka       67               Higher     32000
2   John Abraham    Male    36   Chittagong  70               Secondary  14000
3   Meena           Female  23   Barisal     62               Secondary  13000
4   Ronjon Sharma   Male    42   Khulna      71               Illiterate 4000
5   Helal           Male    57   Rajshahi    65               Primary    9000
6   Nancy           Female  40   Dhaka       59               Higher     28000
7   Suzuka          Female  34   Rangpur     64               Secondary  11000
8   Mintu           Male    67   Rajshahi    57               Illiterate 7500
9   Romeo           Male    38   Khulna      68               Primary    9500
10  Anisul Haque    Male    41   Sylhet      69               Secondary  10000
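Before entering such a table in SPSS, it helps to decide which variables are strings and which are numeric, since only the numeric ones can be averaged. As a rough illustration outside SPSS, the sketch below holds the same ten records in Python, summarises the quantitative variables by their means and the qualitative variable Sex by a frequency count. The tuple layout and variable names are choices made for this sketch only.

```python
import statistics
from collections import Counter

# (name, sex, age, region, height, education, income) copied from the table above.
rows = [
    ("Ahmed Ali", "Male", 45, "Dhaka", 67, "Higher", 32000),
    ("John Abraham", "Male", 36, "Chittagong", 70, "Secondary", 14000),
    ("Meena", "Female", 23, "Barisal", 62, "Secondary", 13000),
    ("Ronjon Sharma", "Male", 42, "Khulna", 71, "Illiterate", 4000),
    ("Helal", "Male", 57, "Rajshahi", 65, "Primary", 9000),
    ("Nancy", "Female", 40, "Dhaka", 59, "Higher", 28000),
    ("Suzuka", "Female", 34, "Rangpur", 64, "Secondary", 11000),
    ("Mintu", "Male", 67, "Rajshahi", 57, "Illiterate", 7500),
    ("Romeo", "Male", 38, "Khulna", 68, "Primary", 9500),
    ("Anisul Haque", "Male", 41, "Sylhet", 69, "Secondary", 10000),
]

# Quantitative variables: a mean is meaningful.
mean_age = statistics.mean(r[2] for r in rows)
mean_height = statistics.mean(r[4] for r in rows)
mean_income = statistics.mean(r[6] for r in rows)

# Qualitative variable: only counting the categories is meaningful.
sex_counts = Counter(r[1] for r in rows)

print("Mean age:", mean_age)
print("Mean height:", mean_height)
print("Mean monthly income (Tk):", mean_income)
print("Sex:", dict(sex_counts))
```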
Questionnaire
ID Number ……… Date of interview:
1. Name of the Respondent: … … …
2. Age: …
3. Sex:
Male …1
Female … 2
4. Residential Status:
Urban …1
Semi urban …2
Slum …3
Rural …4
5. Education:
Illiterate …0
Primary …1
S.S.C …2
H.S.C …3
Graduate and above …4
6. Occupation:
Day laborer …1
Farmer …2
Service …3
Business …4
House wife …5
Retired from service … 6
Others (please specify)…7
7. Family members: ……
8. Total family income per month: ………
9. Smoking Status:
Current smoker …1
Past smoker …2
Never smoked …0
10. Are you suffering from diabetes? (if No go to Q # 14)
Yes …1
No …0
Enter the hypothetical data in the data editor window. Find inconsistencies, if there are any.
id  date (dd.mm.yy)  name    age  sex  Res  edu  occu  f_size  income  smok
1   02.02.08         Riaz    43   1    4    1    2     5       13000   2
2   10.02.08         Kamal   59   1    1    4    1     6       10000   1
3   05.02.08         Bely    42   2    1    1    3     4       5000    1
4   09.02.08         Nanto   39   1    3    2    4     7       9500    2
5   15.02.08         Monir   43   1    4    4    5     4       8000    2
6   21.02.08         Poresh  64   1    1    3    2     8       14000   0
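Inconsistencies of this kind can be searched for mechanically. The sketch below, written in Python rather than SPSS, checks each record against the codes printed in the questionnaire and applies one cross-field rule. That rule (a respondent coded Male should not be coded House wife) is an assumption made for this illustration, and the function names parse and find_problems are hypothetical.

```python
# Valid codes taken from the questionnaire above.
VALID_CODES = {
    "sex": {1, 2},
    "res": {1, 2, 3, 4},
    "edu": {0, 1, 2, 3, 4},
    "occu": {1, 2, 3, 4, 5, 6, 7},
    "smok": {0, 1, 2},
}

FIELDS = ["id", "date", "name", "age", "sex", "res",
          "edu", "occu", "f_size", "income", "smok"]

RAW = """\
1 02.02.08 Riaz 43 1 4 1 2 5 13000 2
2 10.02.08 Kamal 59 1 1 4 1 6 10000 1
3 05.02.08 Bely 42 2 1 1 3 4 5000 1
4 09.02.08 Nanto 39 1 3 2 4 7 9500 2
5 15.02.08 Monir 43 1 4 4 5 4 8000 2
6 21.02.08 Poresh 64 1 1 3 2 8 14000 0
"""


def parse(line):
    """Split one whitespace-separated record and convert the numeric fields."""
    record = dict(zip(FIELDS, line.split()))
    for key in FIELDS:
        if key not in ("date", "name"):
            record[key] = int(record[key])
    return record


def find_problems(record):
    """Return a list of messages describing suspicious values in a record."""
    problems = []
    for field, allowed in VALID_CODES.items():
        if record[field] not in allowed:
            problems.append(f"{field}={record[field]} is not a valid code")
    # Assumed cross-field rule: Male (sex=1) should not be House wife (occu=5).
    if record["sex"] == 1 and record["occu"] == 5:
        problems.append("sex=1 (Male) but occu=5 (House wife)")
    return problems


records = [parse(line) for line in RAW.splitlines()]
flagged = {r["id"]: find_problems(r) for r in records if find_problems(r)}
for rec_id, problems in flagged.items():
    print("id", rec_id, "->", "; ".join(problems))
```

With these rules only the record with id 5 is flagged. Other combinations may also deserve a second look by the data entry operator but are not errors that a fixed rule can establish.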
* Read fixed-format data: each variable occupies the column range given after its name.
DATA LIST / NAME 1-10 (A) AGE 15-19 SEX 20 (A) EDU 25-28 BSAL 30-40.
VARIABLE LABELS EDU 'EDUCATION' /BSAL 'BASIC SALARY'.
BEGIN DATA
SHAFIQUE      24   M    12   8000
MUKTA         30   F    16   10000
KALAM         35   M    10   5000
SHAHIN        32   M    14   7000
END DATA.
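The DATA LIST command above reads each variable from a fixed range of columns, which is why the position of every value on the data line matters. The following Python sketch mimics that idea for illustration only; it is not how SPSS is implemented. The helper make_line, which pads the values into the stated columns, and the function read_record are hypothetical names.

```python
# Column ranges from the DATA LIST command (1-based, inclusive).
# "A" marks a string variable, "N" a numeric one.
LAYOUT = [
    ("NAME", 1, 10, "A"),
    ("AGE", 15, 19, "N"),
    ("SEX", 20, 20, "A"),
    ("EDU", 25, 28, "N"),
    ("BSAL", 30, 40, "N"),
]


def make_line(name, age, sex, edu, bsal):
    """Hypothetical helper: pad the values into the column positions of LAYOUT."""
    return f"{name:<10}    {age:<5}{sex}    {edu:<4} {bsal:<11}"


def read_record(line):
    """Cut one fixed-format line into variables according to LAYOUT."""
    record = {}
    for var, start, end, kind in LAYOUT:
        field = line[start - 1:end].strip()   # convert 1-based columns to a slice
        record[var] = field if kind == "A" else int(field)
    return record


lines = [
    make_line("SHAFIQUE", 24, "M", 12, 8000),
    make_line("MUKTA", 30, "F", 16, 10000),
    make_line("KALAM", 35, "M", 10, 5000),
    make_line("SHAHIN", 32, "M", 14, 7000),
]
records = [read_record(line) for line in lines]
mean_bsal = sum(r["BSAL"] for r in records) / len(records)

for r in records:
    print(r)
print("Mean basic salary:", mean_bsal)
```

If a value is shifted out of its column range, for example by collapsing the spaces between fields, the slice picks up the wrong characters; the same happens when SPSS reads misaligned fixed-format data.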