
ASSIGNMENT NO: 1

Course code: (8614) educational statistics

Unit: 1 to 5

Semester: spring 2021

Name: sana Manzoor

Roll no: CA651932


Q 1: What do you understand by statistics? What are the characteristics of statistics?
Explain in detail.
Answer:
Meaning of Statistics
The word 'statistics' comes from the Latin word 'status', the Italian word 'statista', or the
German word 'Statistik'.

All of these terms refer to a political state. Historically, states were
compelled to collect statistical data, mostly on the number of young men, in order to enlist
them into the army.

Definition of Statistics
Different authors have defined statistics in various ways.
• "Statistics are numerical statements of facts in any department of inquiry placed in
relation to each other." (Bowley)

• "By statistics we mean quantitative data affected to a marked extent by multiplicity of
causes." (Yule and Kendall)

• "By statistics we mean aggregates of facts affected to a marked extent by multiplicity
of causes, numerically expressed, enumerated or estimated according to reasonable
standards of accuracy, collected in a systematic manner for a predetermined purpose,
and placed in relation to each other." (Horace Secrist)

Importance of Statistics
• It depicts complex data in graphical, tabular, and diagrammatic form, making it
easier to understand.
• It provides an exact description and better understanding.
• It gives valid inferences, with measures of reliability, about population
parameters from sample data.
• It helps to understand patterns of variability through quantitative observations.

LIMITATIONS OF STATISTICS

The science of statistics has following limitations:

I. The use of statistics is limited to numerical studies
We cannot apply statistical techniques to all types of phenomena. These
techniques can only be applied to phenomena that are capable of being
quantitatively measured and numerically expressed.
II. Statistical techniques deal with populations or aggregates of individuals
rather than with individuals
Statistical results are derived from the majority of cases in a group. They
describe the group as a whole and need not hold true for every individual
in it.
III. Statistics relies on estimation and approximation
Statistical techniques are not exact laws like mathematical or chemical
laws. Because they rest on estimates and approximations, statistical
inferences are uncertain.
IV. Statistical results might lead to fallacious conclusions
Statistical results are represented by figures, which are liable to be
manipulated. Also, data placed in the hands of a non-expert may lead to
fallacious results because figures may be stated without their context or
may be applied to a fact other than the one to which they really relate.

CHARACTERISTICS OF STATISTICS

1. Statistics are Aggregates of Facts:

Only those facts which are capable of being studied in relation to time, place or frequency
can be called statistics. Individual, single or unconnected figures are not statistics because
they cannot be studied in relation to each other. For this reason, only aggregates of
facts, e.g. data relating to the I.Q. of a group of students or the academic achievement of
students, are called statistics and are studied in relation to each other.

2. Statistics are Affected to a Marked Extent by Multiplicity of Causes:

Statistical data relate mostly to the social sciences, where changes result from the
combined effect of many factors. We cannot isolate the effect of a particular cause on a
phenomenon. It is only in the physical sciences that individual causes can be traced and their
impact clearly known. In the statistical study of the social sciences, we come to know the
combined effect of multiple causes.

For example, a decline in the academic achievement scores of some students may be due
not only to lack of interest in school subjects, but also to lack of motivation, ineffective
teaching methods, the students' attitude towards school subjects, faulty scoring
procedures, etc.

Similarly, the scores of a group on a memory test depend on the meaningfulness of the
learning materials, the maturation of the students, methods of learning, motivation, interest
of the students, etc.

3. Statistics are Numerically Expressed:

Qualitative phenomena that cannot be numerically expressed, e.g. honesty, goodness or
ability, cannot be described as statistics. If we assign numerical values to them, however,
they may be described as 'statistics'.

4. Statistics are Enumerated or Estimated according to Reasonable Standards of Accuracy:

The standard of estimation and of accuracy differs from enquiry to enquiry or from
purpose to purpose. There cannot be one uniform standard for all types of enquiries
and for all purposes. A single student cannot be ignored while calculating the I.Q. of 100
students in a group, whereas 10 soldiers can easily be ignored while finding out the I.Q. of
the soldiers of a whole country.
Similarly, we can ignore ten deaths in a country but we cannot ignore even a single death
in a family. The amount of time and resources at our disposal also determines the degree of
accuracy of the estimates.

5. Statistics are Collected in a Systematic Manner:

In order to have a reasonable standard of accuracy, statistics must be collected in a very
systematic manner. Any rough and haphazard method of collection is undesirable,
for it may lead to improper and wrong conclusions. The accuracy of such data is also
uncertain, so the data cannot be relied upon.

6. Statistics are Collected for a Pre-determined Purpose:

The investigator must have a purpose beforehand and only then start the work of
collection. Data collected without any purpose is of no use. Suppose we want to know the
intelligence of a section of people; we need not collect data relating to income, attitude
and interest. Without a clear idea of the purpose, we will not be in a position
to distinguish between necessary and unnecessary data, or between relevant and
irrelevant data.

7. Statistics are Capable of being Placed in Relation to each other:

Statistics are collected for the purpose of comparison. They must be capable of being
compared; otherwise, they lose much of their value and significance. Comparison can be
made only if the data are homogeneous.

Data on a memory test can be compared with I.Q. but not with the salary status of parents.
It is only through comparison that we can depict changes which may relate to time,
place, frequency or any other character, and statistical devices are used for this purpose.
Q 2: What do you understand by the term 'data'? Write in detail the types of data.

Answer:

• Data is a set of values of subjects with respect to qualitative or quantitative variables.
• Data is raw, unorganized facts that need to be processed. Data can be something
simple and seemingly random and useless until it is organized.
• When data is processed, organized, structured or presented in a given context so as
to make it useful, it is called information.
• The information needed for research activities is obtained in different forms.

TYPES OF DATA
• Primary data
• Secondary data
• Cross-sectional data
• Categorical data
• Time series data
• Spatial data
• Ordered data

1: Primary data:
• Primary data is original and unique information that the researcher collects directly
from a source to meet his or her needs.
• It is the information gathered by the investigator for a specific purpose.

Examples of primary data collection include observing firsthand a community's attitudes
toward health services, determining a community's health needs, evaluating a social
programme, determining the job satisfaction of an organization's personnel, and observing
the quality of service provided by a worker.

2: Secondary data:

• Secondary data is information that has previously been collected for a certain purpose
and documented elsewhere.
• It is information that was obtained by someone else for one purpose but is being used
by the investigator for a different purpose.
• Examples include using census data to determine a population's age-sex structure,
hospital records to determine a community's morbidity and mortality patterns, an
organization's records to determine its activities, and sources such as articles, journals,
magazines, books, and periodicals to obtain historical and statistical data.

3: Cross-sectional data:
• Cross-sectional data is gathered by observing many subjects (such as
individuals, firms, countries, or regions) at the same point in time, or without regard to
differences in time.
• It is the information for a single point in time or space.
This type of data is limited in that it cannot depict changes over time or cause-
and-effect relationships in which one variable influences another.

4: Categorical data:
• Categorical variables are variables whose values can be separated into categories. Race,
sex, age group, and educational level are examples of categorical variables.
• Categorical data refers to information that cannot be measured numerically. The nature
of categorical data is qualitative.
• Attributes are another name for categorical data.
• A univariate data set is made up of observations on a single characteristic. If the
individual observations in a univariate data set are categorical responses, the data set is
categorical.
Intelligence, beauty, literacy, and unemployment are examples of categorical data.

5: Time series data:
• Whenever the same measurements are taken at regular intervals, time series data is
created.
• It consists of numbers that trace a variable's values over time, for example by month,
quarter, or year.
• Different phenomena, such as temperature, weight, population, and so on, can have
their values recorded throughout a period of time.
• The variable's values may rise, fall, or remain stable.
• Time series data refers to data organized by time periods, for example the population
of a country recorded in different years.

6: Spatial data:
• Geospatial data, also known as geographic information, is data or information
that defines the geographic position of features and boundaries on Earth, such as
natural or man-made landforms, oceans, and more.
• Spatial data is data that can be mapped and is usually represented as coordinates
and topology.
• Geographical information systems (GIS) and other geolocation or positioning
services use spatial data.
• Spatial data is made up of points, lines, polygons, and other geographic and
geometric data primitives that can be mapped to a specific location, stored as
metadata with an object, or used by a communication system to find end-user
devices.

7: Ordered data:
• Data that is organized into categories with a natural order is referred to as ordered data.

• Ordered data is similar to categorical data, except that the categories are arranged in a
specific order.

• For example, for the economic status category, ordered data may be low, medium, or
high.
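
As a rough illustration of the difference between plain categorical data and ordered data, the Python sketch below sorts economic-status labels by their rank rather than alphabetically. The list of households is made up for the example, and the low < medium < high ranking is the only assumption taken from the text.

```python
# Ordered categories, from lowest to highest (as in the economic status example).
LEVELS = ["low", "medium", "high"]
RANK = {level: position for position, level in enumerate(LEVELS)}

def sort_by_level(values):
    """Sort category labels by their rank, not alphabetically."""
    return sorted(values, key=lambda value: RANK[value])

# Hypothetical observations for four households.
households = ["medium", "high", "low", "medium"]
print(sort_by_level(households))
```

An alphabetical sort would put "high" before "low", which is why ordered data needs an explicit ranking of its categories.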
Q 3: What characteristics should a pictogram have to convey its meaning successfully?
Write down the advantages and drawbacks of using pictograms.

Answer:

A pictogram is a graphical symbol that conveys its meaning through its pictorial
resemblance to a physical object. A pictogram may include a symbol plus graphic
elements such as a border, background pattern, or color that are intended to convey specific
information. We can also say that a pictogram is a kind of graph that uses pictures instead
of bars to represent the data under analysis. A pictogram is also called a 'pictograph' or simply
a 'picto'.

A pictogram or pictograph represents the frequency of data as pictures or symbols. Each
picture or symbol may represent one or more units of data.
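
The idea that each symbol stands for a fixed number of units can be sketched in a few lines of Python. This is only an illustration: the function name, the subject counts, and the choice of one symbol per 5 students are all assumptions made for the example.

```python
def render_pictogram(counts, unit=5, symbol="*"):
    """Return one text line per category; each symbol stands for `unit` items."""
    width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        symbols = symbol * round(count / unit)
        lines.append(f"{label.ljust(width)} | {symbols}")
    return lines

# Hypothetical data: number of students who chose each subject.
students = {"Maths": 20, "Urdu": 15, "Science": 10}
for line in render_pictogram(students):
    print(line)
print("Key: each * represents 5 students")
```

A key line like the last one is needed so the reader knows how many units each symbol represents.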

Pictograms form a part of our daily lives. They are used in transport, medication,
education, computers, etc. They indicate, in iconic form, places, directions, actions or
constraints on actions in either the real world (a road, a town, etc.) or the virtual world
(computer, internet, etc.).

To successfully convey the meaning, a pictogram:

1. Should be self-explanatory.
2. Should be recognizable by all people.
3. Must represent a general concept.
4. Should be clear, concise and interesting.
5. Should be identifiable as part of a set, through uniform treatment of scale, style and
subject.
6. Should be highly visible and easy to reproduce at any scale.
7. Should not be dependent upon a border and should work equally well in positive or
negative form.
8. Should avoid stylistic fads or a commercial appearance and should appeal to a wide
audience that has a sophisticated, creative culture.
9. Should be attractive when used with other design elements and typestyles.
Advantages and drawbacks of pictograms:
1) Pictograms can make warnings more eye-catching.
2) They can serve as an instant reminder of a hazard or an established
message.
3) They can improve warning comprehension for those with visual or literacy
difficulties.
4) They have the potential to be interpreted more accurately and more quickly
than words.
5) They can be recognized and recalled far better than words.
6) They can improve the legibility of warnings.
7) They may be better when undertaking familiar routine tasks.

There are a number of disadvantages of relying on pictograms:

i. Very few pictograms are universally understood.
ii. Even well-understood pictograms will not be interpreted equally by
all groups of people and across all cultures, and it takes years for
any pictogram to reach maximum effectiveness.
iii. They can be interpreted as having the opposite or an otherwise
undesirable meaning, which can create additional confusion.

How to use pictograms

1: Add graphical skill summaries to your resume.

The infographic resume is gaining popularity as a distinctive way to present your
qualifications and experience to potential employers.

Pictograms are a useful addition to the skills summary section of an infographic
resume because they work well for illustrating proportions. You can use them to
show your level of proficiency in each of your professional skills.

2: In a project status report, indicate how far you have progressed toward a target.
Pictograms can also be used to show progress or status towards a goal. Project plans,
product roadmaps, and project status reports, for example, can all benefit from a
pictogram as a visual sign of progress.
In a project status report, pictograms can give an overview of project progress and
status (in terms of schedule, scope, and budget). It is a quick way to communicate a
project's high-level status.
3: Make a plain bar chart more striking.
Pictograms can help make plain information more memorable and interesting.
You can use a series of pictograms to make your data more visually appealing if
your infographic is built around a simple dataset (such as a single bar chart).
For instance, custom pictograms can turn an ultra-simple bar chart with only three
bars into a full-page, high-impact data visualization.

4: To summarize survey results, use pictograms as visual tallies.

Infographics are well suited to summarizing and displaying survey results. They are far
more engaging than figures in tables and spreadsheets, and they can help readers
understand significant survey findings.

One of the keys to a strong survey results infographic is the use of pictograms. You can
use them to represent things like basic demographic data (such as job title or role) as
visual counts. They are an engaging way to illustrate the magnitude of various data.
Q 4: Define the normal curve. Write down the properties of the normal curve.

Answer:

Because it fits many natural phenomena, the normal distribution is the most important
probability distribution in statistics. Height, blood pressure, measurement error, and IQ
scores, for example, all approximately follow the normal distribution. It is also known as the
bell curve or the Gaussian distribution.

The normal distribution is a probability function that describes how the values of a variable
are distributed. The majority of the observations cluster around the central peak,
while the probability of values further from the mean drops off equally in both directions.
Extreme values in both the left and right tails of the distribution are rare.

The distribution is commonly employed in both the natural and social sciences. Its relevance
comes from the Central Limit Theorem, which states that averages obtained from independent,
identically distributed random variables (with finite variance) tend towards a normal
distribution as the sample size grows, regardless of the type of distribution sampled from.
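
The Central Limit Theorem can be illustrated with a small simulation. The Python sketch below is not a proof: it draws samples from a uniform distribution (which is flat, not bell-shaped) and looks at how the sample means behave. The sample size of 30, the 2000 repetitions, and the fixed seed are all arbitrary choices made for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(size):
    """Mean of `size` draws from a uniform distribution on [0, 1)."""
    return statistics.mean(random.random() for _ in range(size))

# 2000 sample means, each based on 30 observations.
means = [sample_mean(30) for _ in range(2000)]

centre = statistics.mean(means)
spread = statistics.stdev(means)

# For a normal distribution roughly 68% of values lie within one SD of the mean.
share_within_one_sd = sum(abs(m - centre) <= spread for m in means) / len(means)

print(round(centre, 3), round(spread, 3), round(share_within_one_sd, 3))
```

Although the individual draws are uniform, the sample means pile up around 0.5 in a roughly bell-shaped pattern.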

Shape of Normal Distribution

A normal distribution is symmetric about the peak of the curve, where the mean lies.
The majority of the observed data is clustered near the mean, with data
becoming less frequent as one moves away from it. The resulting graph is bell-shaped,
with the mean, median, and mode all having the same value and appearing at the curve's
apex.

Because one-half of the observed data points fall on either side of the mean, if you fold
the graph in half at the middle, you obtain two equal halves.

Normal Distribution Parameters

The mean and standard deviation are the two parameters of a normal
distribution. The shape and probabilities of the distribution are determined by these
parameters. As the parameter values change, the form of the distribution changes.
1. Mean:

Researchers use the mean as a measure of central tendency. It can be used to describe
the distribution of variables measured on interval or ratio scales. The mean dictates the
location of the peak in a normal distribution graph, and the majority of the data points are
concentrated around the mean. Any change in the mean value causes the curve to move to the
left or right along the x-axis.

2. Standard Deviation

The standard deviation measures the dispersion of the data points from the mean. It
represents the typical distance between the mean and the observations and so indicates how
far away the data points tend to be from the mean.

On the graph, the standard deviation determines the curve width, tightening or expanding
the distribution along the x-axis. A small standard deviation produces a tall, steep curve,
while a large standard deviation produces a flatter, wider curve.
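
The effect of the two parameters can be checked with the standard formula for the normal density. The Python sketch below is only an illustration; the mean of 100 and the standard deviations of 5 and 15 are assumed values chosen for the example.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal curve at x for the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Same mean, different standard deviations (illustrative values).
narrow_peak = normal_pdf(100, mean=100, sd=5)
wide_peak = normal_pdf(100, mean=100, sd=15)
print(narrow_peak > wide_peak)  # a smaller SD gives a taller, steeper curve

# Changing the mean shifts the curve along the x-axis without changing its shape.
shifted_peak = normal_pdf(110, mean=110, sd=15)
print(math.isclose(shifted_peak, wide_peak))
```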

Properties

The following characteristics are shared by all normal distributions:

1. It is symmetric.

The shape of a normal distribution is perfectly symmetrical. This means that the distribution
curve can be split in half at the middle, yielding two equal halves, with one-half of the
observations falling on each side of the curve.

2. The mean, median, and mode are all the same.

The highest frequency of a normal distribution lies at the middle point, which means it
contains the most observations of the variable. The midpoint is also where these three
measures coincide. In a perfectly normal distribution, the three measures are exactly
equal.
3. Empirical rule

In normally distributed data, a constant proportion of the area under the curve lies between
the mean and a specific number of standard deviations from the mean. About 68.27
percent of all cases, for example, are within one standard deviation of the mean. About
95.45 percent of all cases are within two standard deviations of the mean, and about
99.73 percent are within three standard deviations of the mean.
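
The proportions in the empirical rule can be checked numerically. For a normal distribution, the share of cases within k standard deviations of the mean equals erf(k / √2), where erf is the error function. The Python sketch below uses only the standard library; the function name is chosen for the example.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * proportion_within(k), 2))
# prints approximately 68.27, 95.45 and 99.73 percent
```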

4. Skewness and kurtosis

Real-life data rarely, if ever, follow a perfect normal distribution. The skewness and
kurtosis coefficients measure how different a given distribution is from a normal
distribution. Skewness measures the asymmetry of a distribution, while kurtosis measures
the thickness of its tails. The normal distribution is symmetric and has a skewness of zero.

If the distribution of a data set has a skewness less than zero, or negative skewness, then
the left tail of the distribution is longer than the right tail; positive skewness implies that
the right tail of the distribution is longer than the left.

The kurtosis statistic measures the thickness of the tail ends of a distribution in relation
to the tails of the normal distribution.

The kurtosis of the normal distribution is three, indicating that it has neither fat nor skinny
tails. When compared to the normal distribution, if an observed distribution has a kurtosis
greater than three, the distribution is said to have heavy tails. When compared to the
normal distribution, a distribution with a kurtosis of less than three is considered to have
thin tails.
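
As a minimal sketch, skewness and kurtosis can be computed from the central moments of the data. The version below uses the simple population (moment) formulas, which is an assumption; statistical packages often apply small-sample corrections, so their results may differ slightly. The two data sets are made up for the example.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness (m3 / m2^1.5) and kurtosis (m4 / m2^2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical scores: one symmetric set, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skewness_and_kurtosis(symmetric))     # skewness is 0 for symmetric data
print(skewness_and_kurtosis(right_tailed))  # skewness is positive
```

With this definition a normal distribution has a kurtosis of three, which matches the reference value used in the text.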

Coefficient of variation:

Another relevant statistic for describing a data set's dispersion is the coefficient of
variation, defined as C.V. = (S / X̄) × 100, where S is the standard deviation and X̄ is the
mean.

The coefficient of variation is invariant with respect to the scale of the data: multiplying
every value by the same positive constant leaves it unchanged.
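
A short Python sketch of the coefficient of variation follows. It assumes that S is the sample standard deviation (the text does not say whether the sample or population form is meant), and the scores are made up for the example.

```python
import math
import statistics

def coefficient_of_variation(values):
    """C.V. = (S / mean) * 100, using the sample standard deviation for S (an assumption)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical test scores.
scores = [40, 50, 60]
cv = coefficient_of_variation(scores)
print(round(cv, 2))

# Rescaling every score (for example, marks out of 1000 instead of 100)
# leaves the coefficient of variation unchanged.
rescaled = [10 * score for score in scores]
print(math.isclose(cv, coefficient_of_variation(rescaled)))
```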
Q 5: Explain the procedure for determining the median, with at least one example of each, if:

i. The number of scores is even
ii. The number of scores is odd.

Answer:

What does the Median mean?

The median can be defined as the size (real or estimated) of the item that occurs in the
middle of a series of items ordered in ascending or descending order of magnitude. It
lies in the middle of a series, dividing it into two equal halves. The median is also
known as a positional average.

The middle number or central value in a set of data is called the median. The midpoint of
the set is also known as the median.
To find the median, arrange the data in order, either from least to greatest or from
greatest to least value.

The median formula

The median is the value of the ((n + 1)/2)th item, where "n" refers to the number of
elements in the set and "th" simply indicates the position of the item in the ordered set.

To find the median, sort the numbers in ascending order from smallest to greatest. Then
locate the number in the middle. The median of this collection of numbers, for example, is
5, because it is in the middle: 1, 2, 3, 5, 6, 7, 9.
The formula yields the same answer. The set has seven numbers, thus n = 7:
• (7 + 1) / 2
• = 8 / 2
• = 4
In the numbers 1, 2, 3, 5, 6, 7, and 9, the fourth number is 5.

Calculate the median when the number of scores is odd.

If the total number of observations is odd, the median is calculated using the formula:
Median = ((n + 1)/2)th term.

Find the median for the following data set: 102, 56, 34, 99, 89, 101, 10.

Step 1:
Sort your data in ascending order from smallest to largest. The following is the order for
this example data set: 10, 34, 56, 89, 99, 101, 102.

Step 2:
Find the number in the middle (10, 34, 56, 89, 99, 101, 102), which has an equal number
of data points above and below it. Here n = 7, so the median is the (7 + 1)/2 = 4th term.
The median is 89.

Calculate the median when the number of scores is even.

If the total number of observations is even, the median formula is:
Median = [(n/2)th term + ((n/2) + 1)th term] / 2.

For the following data set, find the median: 102, 56, 34, 99, 89, 101, 10, 54.

Step 1:
Sort the data in ascending order (smallest to largest):
10, 34, 54, 56, 89, 99, 101, 102.

Step 2:
Locate the MIDDLE TWO NUMBERS (where there are an equal number of data points
above and below the two middle numbers). Here n = 8, so they are the 4th and 5th terms:
10, 34, 54, 56, 89, 99, 101, 102. The middle two numbers are 56 and 89.

Step 3:
To get the median, add the two middle figures and divide by two:
• 56 + 89 = 145
• 145 / 2 = 72.5
The median is 72.5.
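
The two procedures above can be combined into one short function. The Python sketch below is one possible way to write it; the function name is chosen for the example, and the data sets are the ones used in the worked examples.

```python
def median(scores):
    """Median of a list of scores, handling both odd and even counts."""
    ordered = sorted(scores)  # Step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the ((n + 1)/2)th term, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even count: average of the (n/2)th and ((n/2) + 1)th terms.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([102, 56, 34, 99, 89, 101, 10]))      # odd number of scores
print(median([102, 56, 34, 99, 89, 101, 10, 54]))  # even number of scores
```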

The median has the following advantages:

1. It is straightforward to grasp and calculate, especially for individual and discrete
series.
2. It is unaffected by the series' extreme items.
3. It can be determined graphically.
4. The median can be computed for open-ended classes.
5. After arranging the data in order of magnitude, it may be found by inspection.

Demerits of the median:

1. Because it is a positional average, it does not take into account all observations.
2. Sampling fluctuations have a greater impact on the median than on the mean.
3. It cannot be further algebraically treated. Unlike the combined mean, a combined median
cannot be determined.
4. It cannot be located exactly when it falls between two items; in that case it can only be
estimated.
