Arm 4
Introduction
❑ Processing: editing, coding, classification and tabulation
of collected data so that analysis can be performed.
❑ Analysis:
❑ The computation of certain measures along with searching for
patterns of relationship that exist among data-groups.
❑ Relationships or differences that support or conflict with the
original or new hypotheses should be subjected to statistical
tests of significance to determine how validly the data can be
said to indicate any conclusions.
Data Processing and Analysis
Statistics in research
The important statistical measures that are used to summarize the
survey/research data are:
1. measures of central tendency or statistical averages;
2. measures of dispersion;
3. measures of asymmetry (skewness);
4. measures of relationship;
5. other measures.
Measures of Central Tendency
Mean:
❑ Also known as arithmetic average.
❑ The most common measure of central tendency.
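As a minimal sketch, the arithmetic mean is the sum of the items divided by the number of items. The function name `arithmetic_mean` and the sample values are illustrative assumptions, not taken from the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / len(values)


# Hypothetical sample of five observations
scores = [4, 8, 15, 16, 23]
print(arithmetic_mean(scores))  # 66 / 5
```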
Median:
❑ The value of the middle item of a series when it is arranged in
ascending or descending order of magnitude.
❑ It divides the series into two halves: in one half all items are less than
the median, whereas in the other half all items are greater than the
median.
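The definition above can be sketched as follows. The handling of an even number of items (averaging the two middle values) is the usual convention and is an assumption here; `median_of` is a hypothetical name.

```python
def median_of(values):
    """Return the middle value of the sorted series.

    For an even number of items, the two middle values are averaged
    (a common convention, assumed here).
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([7, 1, 5]))     # odd count: the middle item after sorting
print(median_of([4, 1, 3, 2]))  # even count: average of the two middle items
```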
Measures of dispersion
❑ Measures of central tendency give no idea of how the values
of a variable are scattered around the average of the
series.
❑ In order to measure this scatter, statistical devices called
measures of dispersion are calculated.
❑ The most important measures of dispersion are:
1. Range,
2. Mean deviation,
3. Standard deviation.
Range:
❑ The difference between the values of the extreme items of a
series.
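A minimal sketch of the range as the largest value minus the smallest; `value_range` and the data are illustrative assumptions.

```python
def value_range(values):
    """Return the difference between the largest and smallest items."""
    if not values:
        raise ValueError("the range of an empty series is undefined")
    return max(values) - min(values)


# Hypothetical series: extremes are 20 and 3
print(value_range([12, 7, 3, 20, 9]))
```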
Mean deviation:
❑ The average of the differences between the values of the items and some
average of the series (mean, median or mode).
❑ Only absolute differences are used (i.e. minus signs are ignored).
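The two points above can be sketched as a short function. It takes the deviations from the mean by default, and any other average (median or mode) can be passed in as `center`; both the function name and the parameter name are assumptions for illustration.

```python
def mean_deviation(values, center=None):
    """Average of the absolute deviations of the values from `center`.

    If `center` is not given, the arithmetic mean of the series is used.
    """
    if not values:
        raise ValueError("the mean deviation of an empty series is undefined")
    if center is None:
        center = sum(values) / len(values)
    # abs() discards the sign, so only the size of each deviation counts
    return sum(abs(v - center) for v in values) / len(values)


# Hypothetical series with mean 6: deviations are 4, 2, 0, 2, 4
print(mean_deviation([2, 4, 6, 8, 10]))
```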
Standard deviation:
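A minimal sketch, assuming the usual definition of the standard deviation as the square root of the average squared deviation from the mean. Whether the divisor is n (population form) or n − 1 (sample form) depends on the context, so the `sample` flag below is an assumption added for illustration.

```python
import math


def standard_deviation(values, sample=False):
    """Square root of the average squared deviation from the mean.

    sample=False divides by n; sample=True divides by n - 1.
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values")
    mean = sum(values) / n
    squared = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared / (n - 1 if sample else n))


# Hypothetical series: mean 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))               # divides by n = 8
print(standard_deviation(data, sample=True))  # divides by n - 1 = 7
```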
Measures of asymmetry (skewness)
❑ When the distribution of items in a series is perfectly
symmetrical, we have the following type of curve for the
distribution:
[Figure: curve of a symmetrical distribution]
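Asymmetry can also be expressed as a number. The sketch below assumes Pearson's second coefficient of skewness, 3 × (mean − median) / standard deviation, which is one common choice and not necessarily the measure intended in these slides; `pearson_skewness` is a hypothetical name.

```python
import statistics


def pearson_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd.

    Zero for a symmetrical series, positive when the long tail is on
    the right, negative when it is on the left.
    """
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (statistics.mean(values) - statistics.median(values)) / sd


print(pearson_skewness([1, 2, 3, 4, 5]))   # symmetrical series
print(pearson_skewness([1, 2, 3, 4, 10]))  # long right tail
```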
Measures of relationships
Correlation
❑ A correlation is a statistical measure of the relationship
between two variables.
❑ The measure is best suited to variables that have a
linear relationship with each other.
❑ The fit of the data can be visually represented in a
scatterplot. Using a scatterplot, we can generally assess the
relationship between the variables and determine whether
they are correlated or not.
❑ Pearson correlation coefficient:
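A minimal sketch, assuming the standard product-moment definition: r is the sum of the products of the deviations of x and y from their means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y rises exactly in line with x, then falls exactly
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

Values close to +1 or −1 indicate a strong linear relationship, and values close to 0 indicate little or no linear relationship.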
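❑ For any two correlated events, A and B, their possible
relationships include:
❑ A causes B;
❑ B causes A;
❑ a third factor, C, causes both A and B;
❑ the correlation is a coincidence.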
❑ The famous expression “correlation does not mean causation” is crucial to the
understanding of the two statistical concepts.
❑ If two variables are correlated, it does not imply that one variable causes the
changes in another variable.
❑ Causation may be a reason for the correlation, but it is not the only possible
explanation.
❑ Example: an earlier study had concluded that infants who sleep with the light on
are more likely to develop myopia. However, a later study at Ohio State University
did not find that infants sleeping with the light on caused the development of
myopia. It did find a strong link between parental myopia and the development of
child myopia, also noting that myopic parents were more likely to leave a light on in
their children's bedroom. In this case, the cause of both conditions is parental
myopia, and the earlier conclusion (that the light causes myopia) is false.
Simple regression
❑ Regression is the determination of a statistical
relationship between two or more variables.
❑ In simple regression there are only two variables: one
(the independent variable) is the cause of the behaviour
of the other (the dependent variable).
❑ Regression can only interpret what exists physically, i.e.,
there must be a physical way in which the independent
variable X can affect the dependent variable Y.
❑ The basic relationship between x and y is given by:
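A minimal sketch, assuming the usual straight-line form y = a + b·x fitted by least squares, where b is the sum of the products of the deviations divided by the sum of squared deviations of x, and a is the mean of y minus b times the mean of x. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x must vary for a slope to be estimated")
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
print(a + b * 5)  # estimated y for x = 5
```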
❑ In Excel, use Data Analysis on the Data tab.
❑ If you cannot find the Data Analysis button, load the Analysis ToolPak
using the link below:
https://www.excel-easy.com/data-analysis/analysis-toolpak.html
Multiple regression
❑ When there are two or more independent
variables, the analysis of the relationship is known as
multiple correlation.
❑ The equation describing such a relationship is known as the multiple
regression equation.
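A minimal sketch for the case of two independent variables, assuming the linear form y = a + b1·x1 + b2·x2 fitted by least squares. The coefficients are obtained by solving the two normal equations written in terms of deviations from the means; `fit_plane` and the data are hypothetical.

```python
def fit_plane(x1, x2, y):
    """Least-squares estimates (a, b1, b2) for y = a + b1*x1 + b2*x2."""
    n = len(y)
    if not (len(x1) == len(x2) == n) or n < 3:
        raise ValueError("need three series of equal length (at least 3 items)")
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products of the deviations
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear or constant")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2
print(fit_plane([1, 2, 3, 4, 5], [2, 1, 4, 3, 6], [9, 8, 19, 18, 29]))
```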
❑ In Excel, the steps are the same as for simple regression, except that the two
columns X1 and X2 are both selected in “Input X Range”.
Sampling Fundamentals
Sampling
❑ Sampling: the selection of some part of an aggregate or
totality on the basis of which a judgement or inference
about the aggregate or totality is made.
❑ Sampling is needed because:
❑ It saves time and money
❑ It is the only way to analyze data when the population contains infinitely many
members
❑ It is the only way when a test involves the destruction of the item
Important terms
Population:
❑ The population or universe can be finite or infinite (e.g. the number of stars in the sky,
the number of grains of sand in a sample)
Sampling error
❑ The inaccuracy in the information collected that arises because only a small portion of
the population is studied.
Confidence level
❑ The expected percentage of times that the actual value will fall within the stated limits.
Thus, a confidence level of 95% means that there are 95 chances in 100
(or .95 in 1) that the sample results represent the true condition of the population within
a specified precision range, against 5 chances in 100 (or .05 in 1) that they do not.
Sampling Distributions
Sampling distribution of mean
❑ The sampling theory for large samples is not applicable in small samples
because when samples are small, we cannot assume that the sampling
distribution is approximately normal.
Student’s t-test, based on t-distribution
❑ For cases where sample size is 30 or less and the population variance is not
known.
Estimation
Student’s t-test, based on t-distribution
❑ To apply the t-test, we work out the value of the test statistic (i.e., ‘t’) and then compare it with the table
value of t (based on the t-distribution) at a certain level of significance for the given degrees of freedom.
❑ If the calculated value of ‘t’ equals or exceeds the table value, we infer that the difference is
significant; if the calculated value of t is less than the corresponding table value, the difference is not
treated as significant.
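The procedure above can be sketched for the one-sample case. The sketch assumes the usual statistic t = (sample mean − hypothesised mean) / (s / √n), with s computed using n − 1 and n − 1 degrees of freedom. The table value is looked up by the reader and passed in; the names `one_sample_t` and `is_significant` and the data are hypothetical.

```python
import math


def one_sample_t(sample, mu0):
    """Return (t, degrees_of_freedom) for testing the sample mean against mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n
    # Sample standard deviation with divisor n - 1
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    if s == 0:
        raise ValueError("t is undefined when all observations are equal")
    return (mean - mu0) / (s / math.sqrt(n)), n - 1


def is_significant(t, table_value):
    """Two-tailed rule: significant if |t| equals or exceeds the table value."""
    return abs(t) >= table_value


# Hypothetical sample; the table value must be read from a t-table
# for the chosen significance level and the degrees of freedom.
t, df = one_sample_t([5, 6, 7, 8, 9], mu0=4)
print(t, df)
print(is_significant(t, table_value=2.776))  # 2.776: two-tailed 5% value for 4 d.f.
```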
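❑ For a 95% confidence level, the area in each tail is (1 − 95%)/2 = 5%/2 = 0.025:
95% of the distribution lies in the middle and 2.5% in each tail.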
❑ At a tail area of 0.025, the table value is t = 2.145 (the tabulated value for 14 degrees of freedom).
Population mean (Normal distribution)
Tables of normal distribution:
Suggested problems (Walpole):