Unit-1 Introduction To Statistics
STATISTICS
UNIT-I
Introduction to Statistics
Syllabus :
• Definitions and need for data
• Survey Design
• Source of Data
• Sampling
Descriptive statistics
Descriptive statistics describes a set of data through
numerical calculations, graphs or tables.
e.g., Bar charts, line graphs and pie charts are graphic
methods, whereas numeric measures include measures of
central tendency and dispersion.
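As a small illustration of the numeric measures, the sketch below uses Python's standard `statistics` module to compute measures of central tendency and dispersion. The list of marks is made-up data, used only for illustration.

```python
import statistics

# Hypothetical marks of 10 students (illustrative data only)
marks = [45, 52, 52, 60, 61, 67, 70, 72, 78, 85]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    # Measures of dispersion
    "range": max(marks) - min(marks),
    "std_dev": statistics.pstdev(marks),  # population standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A bar chart or pie chart of the same marks would be the graphic counterpart of these numeric summaries.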
Inferential statistics
Inferential statistics makes inferences and predictions about
a population based on a sample of data taken from that
population.
e.g., Populations studied in this way include employees in a
company, students in a university/college, companies, voters,
households, manufactured items, births and deaths, road
accidents, etc.
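A minimal sketch of the idea, assuming a synthetic population of household incomes generated with a fixed random seed: a sample is drawn, the sample mean estimates the population mean, and an approximate 95% confidence interval is built with the normal approximation (z = 1.96). The population, the sample size and the interval method are illustrative assumptions, not part of these notes.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 household incomes (synthetic, illustrative only)
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]

# Draw a sample of 200 households without replacement
sample = random.sample(population, 200)

# Point estimate of the population mean
x_bar = statistics.mean(sample)
# Standard error of the mean, estimated from the sample standard deviation
se = statistics.stdev(sample) / math.sqrt(len(sample))
# Approximate 95% confidence interval (normal approximation, z = 1.96)
ci_low, ci_high = x_bar - 1.96 * se, x_bar + 1.96 * se

print(f"sample mean = {x_bar:.0f}, 95% CI = ({ci_low:.0f}, {ci_high:.0f})")
print(f"true population mean = {statistics.mean(population):.0f}")
```

In practice the whole population is unknown; it is generated here only so the estimate can be compared with the true value.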
Types of data
Qualitative (Categorical) data
Quantitative (Numerical) data
Qualitative (categorical) data represent characteristics,
such as:
• Data at the nominal level consist of names, labels and
categories.
e.g., Brand of cell phone.
• Data at the ordinal level consist of data that can be
ordered or ranked.
• Continuous data
• Non-statistical.
Statistical sources refer to data gathered for official
purposes, including censuses and officially administered
surveys.
Non-statistical sources refer to data collected for other
administrative purposes or for the private sector.
Two sources of data
Internal sources
When data is collected from reports and records of the
organization itself, they are known as the internal sources.
e.g., a company publishes its annual report on profit and
loss, total sales, loans, wages, etc.
External sources
When data is collected from sources outside the
organization, they are known as the external sources.
e.g., a tour and travel company obtains information on
Andhra Pradesh tourism from the Andhra Pradesh Transport
Corporation.
Sources of Data :
Data sources are classified as :
Primary Data
Secondary Data
Primary and Secondary Data in Statistics :
The difference between primary and secondary data in
statistics is that primary data is collected firsthand by a
researcher (an organization, person, authority, agency,
party, etc.) through experiments, surveys, questionnaires,
focus groups, interviews and direct measurements, whereas
secondary data has already been collected by someone else
and is available to the public through publications,
journals and newspapers.
Primary Data
• Primary data means first-hand information collected by an
investigator.
• It is collected for the first time.
• Non-probability Sampling
Probability Sampling
• Probability sampling involves random selection,
allowing you to make strong statistical inferences about
the whole group.
Simple Random Sampling
• Respondents are selected at random from a larger group,
so every member has an equal chance of being chosen.
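A sketch of simple random sampling without replacement, using `random.Random.sample` from Python's standard library. The roll numbers and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Select n members without replacement; every member has an equal chance."""
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return rng.sample(population, n)

# Hypothetical roll numbers S001..S100
students = [f"S{i:03d}" for i in range(1, 101)]
print(simple_random_sample(students, 5, seed=1))
```

Passing the same seed reproduces the same sample, which is convenient when a selection has to be audited.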
Systematic Sampling
• Items are selected from the target population by
choosing a random starting point and then selecting the
other items at a fixed sampling interval.
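The procedure can be sketched as follows, assuming the sampling interval is k = N // n (population size divided by sample size, rounded down) and that the random start is chosen within the first interval. `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start within the first interval, then every k-th item (k = N // n)."""
    if not 0 < n <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    k = len(population) // n                  # fixed sampling interval
    start = random.Random(seed).randrange(k)  # random starting point in 0..k-1
    return population[start::k][:n]           # every k-th item, trimmed to n

items = list(range(1, 101))  # hypothetical item numbers 1..100
print(systematic_sample(items, 10, seed=5))
```

Trimming to `n` matters when N is not an exact multiple of n, because a start near the beginning can otherwise yield one extra item.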
Stratified Sampling:
• Respondents are split into sub-groups (strata) and then
randomly selected from each group.
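A minimal sketch of stratified sampling, assuming proportional allocation (each stratum contributes roughly the same fraction of its members, and at least one). The department data and the helper `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Split the population into strata, then randomly sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)  # group members by stratum
    sample = []
    for members in strata.values():
        # Proportional allocation (assumed), with at least one member per stratum
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))  # random selection within the stratum
    return sample

# Hypothetical students: 60 in CSE and 40 in ECE
students = [("CSE", i) for i in range(60)] + [("ECE", i) for i in range(40)]
chosen = stratified_sample(students, stratum_of=lambda s: s[0], fraction=0.1, seed=7)
print(sum(1 for s in chosen if s[0] == "CSE"), sum(1 for s in chosen if s[0] == "ECE"))  # 6 4
```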
Cluster Sampling:
• Clusters (groups of people) are formed from the
population. Each group has similar significant
characteristics, and some clusters are then selected at
random for the sample.
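A sketch of one-stage cluster sampling, assuming that whole clusters are chosen at random and every member of a chosen cluster is included (two-stage designs that sample again inside each cluster also exist). The class sections and the helper `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; every member of a chosen cluster is included."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random selection of clusters
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: class sections of a college
sections = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1", "d2"],
}
picked = cluster_sample(sections, n_clusters=2, seed=3)
print(picked)
```

Unlike stratified sampling, randomness here operates on the groups, not on the individuals inside them.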
Non-Probability Sampling
Non-probability sampling involves non-random selection
based on convenience or other criteria, allowing you to
easily collect data.
Non-Probability Sampling Types
• Convenience sampling,
• Quota sampling,
• Judgmental sampling,
• Snowball sampling.
Convenience Sampling:
• The samples are selected from the population directly
because they are conveniently available to the researcher.
Quota Sampling:
The researcher forms a sample of individuals chosen to
represent the population according to specific traits or
qualities.
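A minimal sketch of quota sampling, assuming respondents are accepted in the order they arrive until the quota for each trait is filled; selection inside each quota is therefore not random. The respondents, quotas and the helper `quota_sample` are hypothetical.

```python
def quota_sample(respondents, trait_of, quotas):
    """Accept respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)  # places still open for each trait
    sample = []
    for person in respondents:
        trait = trait_of(person)
        if remaining.get(trait, 0) > 0:
            sample.append(person)
            remaining[trait] -= 1
        if not any(remaining.values()):  # all quotas filled
            break
    return sample

# Hypothetical respondents as (name, gender) pairs, in order of arrival
arrivals = [("Anil", "M"), ("Bala", "M"), ("Chitra", "F"), ("Divya", "F"), ("Esha", "F")]
print(quota_sample(arrivals, trait_of=lambda p: p[1], quotas={"F": 2, "M": 1}))
# [('Anil', 'M'), ('Chitra', 'F'), ('Divya', 'F')]
```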
Purposive or Judgmental Sampling
The researcher selects a typical group of individuals who
might represent the larger population and then collects data
from this group.
Snowball Sampling
Selecting participants by finding one or two participants
and then asking them to refer you to others.
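Following referrals from a few starting participants behaves like a breadth-first traversal of a referral network. A minimal sketch, assuming referrals are known as a dictionary from each participant to the people they refer, and that recruitment stops at a maximum sample size; the names and the helper `snowball_sample` are hypothetical.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Start from seed participants and follow their referrals breadth-first."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:  # avoid recruiting the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral network
referrals = {
    "Asha": ["Ravi", "Meena"],
    "Ravi": ["Kiran"],
    "Meena": ["Ravi", "Sita"],
}
print(snowball_sample(referrals, seeds=["Asha"], max_size=10))
# ['Asha', 'Ravi', 'Meena', 'Kiran', 'Sita']
```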
Class Limits :
For a class interval, the class limits are the minimum
value and the maximum value that the class interval may
contain.
The minimum value is known as the lower class limit (LCL)
and the maximum value is known as the upper class limit
(UCL).
Class Interval : The difference between the upper class
limit and the lower class limit of a class is known as the
class interval.
Class boundary :
Class boundaries are the actual class limits of a class
interval.
For overlapping classification (or) mutually exclusive
classification, the upper class limit is excluded from the
class, as in 10-20, 20-30, 30-40, 40-50, etc. This is
usually applicable to continuous variables.
For non-overlapping classification (or) mutually
inclusive classification, both class limits are included,
as in 0-9, 10-19, 20-29, 30-39, etc. This is usually
applicable to discrete variables.
LCB = LCL – D /2
UCB = UCL + D /2
Where D is the difference between the LCL of the next
class interval and the UCL of the given class interval.
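The boundary rule can be checked with a short sketch. It assumes the classes are given as (LCL, UCL) pairs in ascending order with the same gap D between every pair of consecutive classes, so D is taken from the first two classes; `class_boundaries` is a hypothetical helper.

```python
def class_boundaries(classes):
    """LCB = LCL - D/2 and UCB = UCL + D/2 for each (LCL, UCL) class."""
    # D = LCL of the next class - UCL of the given class (assumed equal for all classes)
    d = classes[1][0] - classes[0][1] if len(classes) > 1 else 0
    return [(lcl - d / 2, ucl + d / 2) for lcl, ucl in classes]

# Inclusive classes: D = 1
print(class_boundaries([(0, 9), (10, 19), (20, 29)]))
# [(-0.5, 9.5), (9.5, 19.5), (19.5, 29.5)]

# Exclusive classes: D = 0, so the boundaries equal the limits
print(class_boundaries([(10, 20), (20, 30), (30, 40)]))
# [(10.0, 20.0), (20.0, 30.0), (30.0, 40.0)]
```

The inclusive case shows why boundaries are needed: they close the gap between 9 and 10, so the classes become continuous.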
Mid point or Mid value or Class mark
For a class interval, the mid point is the sum of the two
class limits (or the two class boundaries) divided by 2.
Mid point = (LCL + UCL) / 2
          = (LCB + UCB) / 2
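A short sketch of the mid point formula; the classes are illustrative and `mid_points` is a hypothetical helper. It also shows that class limits and class boundaries give the same mid point when the gap D is split equally on both sides.

```python
def mid_points(classes):
    """Mid point (class mark) = (lower + upper) / 2 for each class."""
    return [(lower + upper) / 2 for lower, upper in classes]

limits = [(0, 9), (10, 19), (20, 29)]
boundaries = [(-0.5, 9.5), (9.5, 19.5), (19.5, 29.5)]

print(mid_points(limits))      # [4.5, 14.5, 24.5]
print(mid_points(boundaries))  # [4.5, 14.5, 24.5]
```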
• The number of class intervals in a frequency
distribution can be calculated using the
following formula:
Where k = Number of classes
N = The total number of observations
• Width of class intervals
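The formulas for the number of classes and the class width did not survive in these notes. A minimal sketch, assuming Sturges' rule k = 1 + 3.322 log10 N (one rule commonly taught for this purpose, not confirmed by the source) and a class width of (largest value - smallest value) / k, both rounded up. The helper names and the data are hypothetical.

```python
import math

def sturges_classes(n_obs):
    """Assumed Sturges' rule: k = 1 + 3.322 * log10(N), rounded up to a whole class."""
    return math.ceil(1 + 3.322 * math.log10(n_obs))

def class_width(values, k):
    """Assumed width rule: (largest - smallest) / k, rounded up so all values fit."""
    return math.ceil((max(values) - min(values)) / k)

k = sturges_classes(100)  # 1 + 3.322 * 2 = 7.644, rounded up to 8
print(k)
print(class_width([12, 45, 67, 99, 30], k))  # (99 - 12) / 8 = 10.875, rounded up to 11
```

Rounding conventions differ between textbooks (some round k to the nearest whole number), so the rounding here is a design choice rather than a fixed rule.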
• Tabular
• Graphical
Textual Presentation :
• The data gathered are presented in paragraph
form.
• Data are written and read.