Allama Iqbal Open University Islamabad: Book Name (8614) Level: B.Ed
Book name
(8614)
Level: B.Ed.
Question No.3
Explain advantages and disadvantages of bar charts and scatter plot.
Answer:
Advantages and Disadvantages of Bar Charts
Following are the advantages of bar charts.
i) They show each data category in a frequency distribution.
ii) They display relative numbers / proportions of multiple categories.
iii) They summarize a large amount of data in an easily interpretable manner.
iv) They make trends easier to highlight than tables do.
v) They allow estimates to be made quickly and accurately.
vi) They are easily accessible to everyone.
Following are the disadvantages of bar charts.
i) They often require additional explanation.
ii) They fail to expose key assumptions, causes, impacts and patterns.
iii) They can be manipulated to give false impressions.
Scatter Plots
Advantages of Scatter plots:
1. Show the relationship between two variables and any trend in that relationship.
2. Show all data points, including minimum and maximum and outliers.
3. Can highlight correlations.
4. Retain the exact data values and sample size.
5. Show both positive and negative correlation.
Disadvantages of Scatter Plots:
1. A flat best-fit line gives inconclusive results.
2. Interpretation can be subjective.
3. Correlation does not imply causation, and a scatter plot cannot show causation.
4. Data on both axes have to be continuous data.
5. A basic scatter diagram cannot show the relation of more than two variables.
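The correlation that a scatter plot makes visible can also be summarized as a number. The following is a minimal sketch, assuming Pearson's correlation coefficient as the measure; the `hours` and `scores` values are hypothetical illustration data, not taken from the course material.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

r_positive = pearson_r(hours, scores)        # close to +1: upward trend
r_negative = pearson_r(hours, scores[::-1])  # close to -1: downward trend
print(round(r_positive, 2), round(r_negative, 2))
```

A value near +1 or -1 corresponds to points lying close to a rising or falling line, while a value near 0 corresponds to the flat, inconclusive best-fit line listed among the disadvantages. A high value still says nothing about causation.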
Question No.4
Explain normal distribution. How does normality of data affect the analysis of data?
Answer
Normal Distribution
Normal distribution, also known as the Gaussian distribution, is a probability distribution that
is symmetric about the mean, showing that data near the mean are more frequent in occurrence
than data far from the mean. In graph form, normal distribution will appear as a bell curve.
The normal distribution is the most common type of distribution assumed in technical stock
market analysis and in other types of statistical analyses. The normal distribution has
two parameters: the mean and the standard deviation (the standard normal distribution is the
special case with mean 0 and standard deviation 1). For a normal distribution, about 68% of the
observations are within ±1 standard deviation of the mean, about 95% are within ±2 standard
deviations, and about 99.7% are within ±3 standard deviations.
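The 68%, 95% and 99.7% figures can be checked numerically. The sketch below assumes only the standard result that, for any normal distribution, the probability of falling within k standard deviations of the mean equals erf(k / √2); the function name `prob_within` is introduced here for illustration.

```python
import math

def prob_within(k):
    """P(|X - mean| <= k * sd) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The result does not depend on the particular mean or standard deviation, which is why the rule applies to every normal distribution.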
Effects of Normality of Data
The normality assumption is one of the most misunderstood in all of statistics. In multiple
regression, the assumption requiring a normal distribution applies only to the disturbance term,
not to the independent variables as is often believed. Perhaps the confusion about this
assumption derives from difficulty understanding what this disturbance term refers to. Simply
put, it is the random error in the relationship between the independent variables and the
dependent variable in a regression model. Each case in the sample actually has a different
random variable which encompasses all the "noise" that accounts for differences in the
observed and predicted values produced by a regression equation, and it is the distribution of
this disturbance term or noise for all cases in the sample that should be normally distributed.
There are few consequences associated with a violation of the normality assumption, as it does
not contribute to bias or inefficiency in regression models. It is only important for the
calculation of p values for significance testing, but this is only a consideration when the sample
size is very small. When the sample size is sufficiently large (>200), the normality assumption
is not needed at all, as the Central Limit Theorem ensures that the sampling distribution of the
estimated coefficients will approximate normality. When dealing with very small samples, it is important to
check for a possible violation of the normality assumption. This can be accomplished through
an inspection of the residuals from the regression model (some programs will perform this
automatically while others require that you save the residuals as a new variable and examine
them using summary statistics and histograms). There are several statistics available to examine
the normality of variables, including skewness and kurtosis, as well as numerous graphical
depictions, such as the normal probability plot. Unfortunately, the statistics to assess it are
unstable in small samples, so their results should be interpreted with caution. When the
distribution of the disturbance term is found to deviate from normality, the best solution is to
use a more conservative p value (.01 rather than .05) for conducting significance tests and
constructing confidence intervals. More precisely, tests of normality are a form of model
selection, and can be interpreted several ways, depending on one's interpretation of probability:
1. In descriptive statistics terms, one measures a goodness of fit of a normal model to the data.
If the fit is poor, then the data are not well modeled in that respect by a normal distribution,
without making a judgment on any underlying variable.
2. In frequentist statistical hypothesis testing, data are tested against the null
hypothesis that they are normally distributed.
3. In Bayesian statistics, one does not "test normality" per se, but rather computes the
likelihood that the data come from a normal distribution with given parameters μ, σ (for all
μ, σ), and compares that with the likelihood that the data come from other distributions under
consideration, most simply using a Bayes factor (giving the relative likelihood of seeing the
data given different models), or more finely taking a prior distribution on possible models and
parameters and computing a posterior distribution given the computed likelihoods.
A normality test is used to determine whether sample data has been drawn from a normally
distributed population (within some tolerance). A number of statistical tests, such as the
Student's t-test and the one-way and two-way ANOVA, require a normally distributed sample
population.
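Skewness and kurtosis, mentioned above as statistics for examining normality, can be computed directly from the saved residuals. This is a minimal sketch, assuming the simple moment-based (population) definitions; statistical packages often apply small-sample corrections, so their output may differ slightly. The `residuals` list is hypothetical illustration data.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both about 0 for normal data)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3
    return skew, excess_kurt

# Hypothetical residuals saved from a regression model.
residuals = [-2.1, -0.8, -0.3, 0.0, 0.2, 0.5, 0.9, 1.6]
skew, excess_kurt = skewness_and_excess_kurtosis(residuals)
print(f"skewness={skew:.3f}, excess kurtosis={excess_kurt:.3f}")
```

Values far from 0 suggest a departure from normality, but as noted above these statistics are unstable in small samples, so a sample as small as the one in this example should be read with caution.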
Question No.5
How is mean different from median? Explain the role of level of measurement in measure of
central tendency.
Answer:
Mean
Mean is the most commonly used measure in educational research. It is appropriate for
describing ratio or interval data. It can also be used for both continuous and discrete numeric
data. It is the arithmetic average of the scores. It is determined by adding up all the scores and
then dividing the sum by the total number of scores. Suppose we have the scores 40, 85, 94, 62, 76, 66,
90, 59, 68, and 84. To find the mean of these scores we simply add all the scores, which
comes to 724, and then divide this sum by 10 (the total number of scores). We get 72.4, which is the
mean score. The formula for computing the mean is:
X̄ = ΣX / n
where X̄ is the mean score, Σ represents "sum of", X represents any raw score value, and n
represents the total number of scores. We can also define the mean as the amount each individual
would get if the total (ΣX) were divided equally among all the individual members (n) in the
distribution. In other words, the mean is the balance point for the distribution. To interpret
the mean as the "balance point or the center value", we can use the analogy of a seesaw: the mean
lies right at the center, where the fulcrum keeps the board perfectly balanced. As the mean is
based on every score or value of the dataset, it is influenced by outliers and skewed
distributions. Also, it cannot be calculated for categorical data, as the values cannot be summed.
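The worked example above can be reproduced in a few lines. This sketch uses the ten scores from the text; the variable names are chosen here for illustration. It also checks the "balance point" interpretation: the deviations of the scores from the mean sum to zero.

```python
scores = [40, 85, 94, 62, 76, 66, 90, 59, 68, 84]

total = sum(scores)   # sum of all raw scores
n = len(scores)       # total number of scores
mean = total / n      # mean = sum / n

print(total, n, mean)  # 724 10 72.4

# Balance point: deviations above and below the mean cancel out
# (up to floating-point rounding).
deviation_sum = sum(score - mean for score in scores)
print(abs(deviation_sum) < 1e-9)  # True
```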
Median
Median is the middle value of rank-ordered data. It divides the distribution into two halves (i.e. 50%
of scores or observations on either side of median value). It means that this value separates
higher half of the data set from the lower half. The goal of the median is to determine the
precise midpoint of the distribution. Median is appropriate for describing ordinal data.
Procedure for Determining Median
When the number of scores is odd, simply arrange the scores in order (from lower to higher or
from higher to lower). The median will be the middle score in the list. Consider the set of scores
2, 5, 7, 10, 12. The score "7" lies in the middle of the scores, so it is the median. When there is an
even number of scores in the distribution, arrange the scores in order (from lower to higher or
from higher to lower). The median will be the average of the middle two scores in the list.
Consider the set of scores 4, 6, 9, 14, 16, 20. The average of the middle two scores, 11.5 (i.e.
(9 + 14)/2 = 23/2 = 11.5), is the median of the distribution. The median is less affected by outliers and
skewed data and is usually the preferred measure of central tendency when the distribution is not
symmetrical. The median cannot be determined for categorical or nominal data.
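The procedure for odd and even numbers of scores can be sketched as a short function. It is applied here to the two score sets from the text. The final comparison uses a hypothetical outlier (120 in place of 12), which is not in the original, to illustrate why the median is less affected by extreme values than the mean.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 5, 7, 10, 12]))       # 7
print(median([4, 6, 9, 14, 16, 20]))   # 11.5

# Hypothetical outlier: replace 12 with 120.
with_outlier = [2, 5, 7, 10, 120]
print(median(with_outlier))                    # 7 (unchanged)
print(sum(with_outlier) / len(with_outlier))   # 28.8 (mean was 7.2 before)
```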
Central Tendency
Central tendency is a descriptive summary of a dataset through a single value that reflects the
center of the data distribution. Along with the variability (dispersion) of a dataset, central
tendency is a branch of descriptive statistics. Central tendency is one of the most
fundamental concepts in statistics. Although it does not provide information regarding the
individual values in the dataset, it delivers a comprehensive summary of the whole dataset.
Mean (Average):
Represents the sum of all values in a dataset divided by the total number of values.
Median:
The middle value in a dataset that is arranged in ascending order (from the smallest value to
the largest value). If a dataset contains an even number of values, the median of the dataset is
the mean of the two middle values.
Mode:
Defines the most frequently occurring value in a dataset. In some cases, a dataset may contain
multiple modes while some datasets may not have any mode at all.
Even though the measures above are the most commonly used to define central tendency, there are
some other measures, including, but not limited to, geometric mean, harmonic mean, midrange, and
geometric median.
The selection of a central tendency measure depends on the properties of a dataset. For
instance, the mode is the only central tendency measure for categorical data, while a median
works best with ordinal data. Although the mean is regarded as the best measure of central
tendency for quantitative data, that is not always the case. For example, the mean may not work
well with quantitative datasets that contain extremely large or extremely small values. The
extreme values may distort the mean. Thus, you may consider other measures.
The measures of central tendency can be found using a formula or definition. Also, they can be
identified using a frequency distribution graph. Note that for datasets that follow a normal
distribution, the mean, median, and mode are located on the same spot on the graph.
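The role of the level of measurement can be summarized as a small lookup: nominal data support only the mode, ordinal data add the median, and interval or ratio data add the mean. The sketch below is one way to express this using Python's standard `statistics` module; the function name `central_tendency`, the level labels, and the sample data other than the ten scores used earlier are assumptions made for illustration. Ordinal data are assumed to be coded as ranks (1, 2, 3, ...).

```python
import statistics

def central_tendency(values, level):
    """Return the measures of central tendency that suit the level of measurement."""
    result = {"mode": statistics.multimode(values)}  # valid at every level
    if level == "nominal":
        return result
    if level == "ordinal":
        # median_low always returns an actual data value, which suits ranks.
        result["median"] = statistics.median_low(values)
        return result
    if level in ("interval", "ratio"):
        result["median"] = statistics.median(values)
        result["mean"] = statistics.mean(values)
        return result
    raise ValueError(f"unknown level of measurement: {level}")

print(central_tendency(["red", "blue", "red", "green"], "nominal"))
print(central_tendency([1, 2, 2, 3, 4], "ordinal"))
print(central_tendency([40, 85, 94, 62, 76, 66, 90, 59, 68, 84], "ratio"))
```

For the ten scores every value occurs once, so `multimode` returns all of them, which corresponds to a dataset without a single mode.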