
OPTS1101 - FUNDAMENTALS OF BUSINESS STATISTICS

UNIT-I
Introduction to Statistics
Syllabus :
• Definitions and need for data

• Techniques of Conducting Surveys

• Survey Design

• Source of Data

• Methods of Primary data collection

• Sampling

• Different types of sample design

• Data analysis and presentation


Statistics is a branch of mathematics dealing with the collection,
analysis, interpretation, and presentation of masses of numerical
data.
It involves collecting, classifying, summarizing,
organizing, analyzing, and interpreting data.
Population
It is the collection of individuals, objects, or
events whose properties are to be analyzed.
Sample
It is a subset of the population. The size of the sample is always
less than the total size of the population.
Types of Statistical methods
• Descriptive statistics
• Inferential statistics

Descriptive statistics
Descriptive statistics uses data to provide a description of
the population, either through numerical calculations or through
graphs or tables.
e.g., Bar charts, line graphs, and pie charts are
graphic methods, whereas numeric measures include
measures of central tendency and dispersion.
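As a minimal sketch of the numeric measures mentioned above, the following Python snippet uses the standard `statistics` module. The data set is hypothetical and made up purely for illustration.

```python
import statistics

# Hypothetical marks of seven students (illustrative values only).
marks = [12, 15, 15, 18, 20, 22, 24]

# Measures of central tendency
mean_value = statistics.mean(marks)      # arithmetic mean
median_value = statistics.median(marks)  # middle value of the sorted data
mode_value = statistics.mode(marks)      # most frequent value

# Measures of dispersion
data_range = max(marks) - min(marks)     # largest value minus smallest value
std_dev = statistics.pstdev(marks)       # population standard deviation

print(mean_value, median_value, mode_value, data_range, round(std_dev, 2))
```

Graphic methods (bar charts, pie charts) would need a plotting library and are not shown here.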
Inferential statistics
Inferential statistics makes inferences and predictions about a
population based on a sample of data taken from that population.
e.g., Populations of interest include employees in a company,
students in a university/college, companies, voters, households,
manufactured items, births and deaths, road accidents, etc.
Types of data
Qualitative (Categorical) data
Quantitative (Numerical) data
Qualitative (Categorical) data represent characteristics and are
measured at the nominal or ordinal level.
• Data at the nominal level consists of names, labels, and
categories.
e.g., Brand of cell phone.
• Data at the ordinal level consists of data that can be ordered.
e.g., Fat level measured as low, medium, high.
Quantitative (Numerical) data are measured at the interval or
ratio level.
• Data at the interval level consists of data that can be
ordered, and differences are meaningful.
e.g., Temperature measured in Celsius.
• Data at the ratio level consists of data that can be ordered,
differences are meaningful, and zero corresponds to none of
the quantity being present.
e.g., Shoe length in inches.
Quantitative (Numerical) data are of two types:
• Discrete data

• Continuous data

Discrete data refers to data values which can only
take certain specific values.
e.g., The number of students in a class, the number of
chips in a bag, the number of stars in the sky.
Continuous data can take any value within a certain
range, that is, between the lowest and highest values.
e.g., Height and weight of a student, voltage, temperature,
length.
Conducting the Survey
The four main ways to conduct surveys are through in-person
interviews, by telephone, through the mail, and over the Internet.
Survey Design
Surveys are used in a variety of ways to determine the
opinions, attitudes, and buying preferences of
customers, as well as to gauge employee issues like job
satisfaction.
Sources of Data
The sources of data can be classified into two types:
• Statistical

• Non-statistical.
Statistical sources refer to data that is gathered for
official purposes, including censuses and officially
administered surveys.
Non-statistical sources refer to data collected for
other administrative purposes or for the private sector.
Two sources of data
Internal sources
When data is collected from reports and records of the
organization itself, they are known as internal sources.
e.g., A company publishes its annual report on profit and
loss, total sales, loans, wages, etc.
External sources
When data is collected from sources outside the
organization, they are known as external sources.
e.g., A tour and travel company obtains information on
Andhra Pradesh tourism from Andhra Pradesh Transport
Corporation.
Sources of Data :
Data sources are classified as :
Primary Data
Secondary Data
Primary and Secondary Data in Statistics :
The difference between primary and secondary data in
Statistics is that Primary data is collected firsthand by a
researcher (organization, person, authority, agency or party
etc.) through experiments, surveys, questionnaires, focus
groups, conducting interviews and taking (required)
measurements, while the secondary data is readily available
(collected by someone else) and is available to the public
through publications, journals and newspapers.
Primary Data
• Primary data means first-hand information collected by an
investigator.
• It is collected for the first time.

• It is original and more reliable.

e.g., The population census conducted by the Government of
India every ten years is primary data.

Sources of Primary Data


The sources of primary data are primary units such as basic
experimental units, individuals, and households. The following
methods are usually used to collect data from primary units,
and the choice of method depends on the nature of the primary unit.
Personal Investigation
The researcher conducts the experiment or survey
himself/herself and collects data from it. The collected data is
generally accurate and reliable. This method of collecting
primary data is feasible only for small-scale
laboratory or field experiments or pilot surveys and is not
practicable for large-scale experiments and surveys because it
takes too much time.
Through Investigators

Trained (experienced) investigators are employed to
collect the required data. In the case of surveys, they contact
the individuals and fill in the questionnaires after asking for
the required information. A questionnaire is an
inquiry form with several questions designed to obtain
information from the respondents. This method of
collecting data is used by most
organizations and gives reasonably accurate information,
but it is very costly and may be time-consuming too.
Through questionnaire

The required information (data) is obtained by sending a
questionnaire (printed or soft form) by mail to the selected
individuals (respondents), who fill in the
questionnaire and return it to the investigator. This method
is relatively cheap compared to the “through investigators”
method, but the non-response rate is very high, as most
respondents do not bother to fill in the questionnaire and
send it back to the investigator.
Through Local Sources

Local representatives or agents are asked to send the requisite
information, which they provide based upon their own
experience. This method is quick, but it gives rough estimates only.
Through Telephone

The information may be obtained by contacting the individuals by
telephone. It is quick and provides the required information accurately.
Through Internet :
With the introduction of information technology, people may be
contacted through the Internet and asked to
provide the relevant information. Google survey is widely used as
an online method for data collection nowadays. There are many paid
online survey services too.
Secondary Data :
• Secondary data refers to second-hand information.
• It is not collected originally but is obtained from already
published or unpublished sources.
e.g., the address of a person taken from the telephone directory, or the
phone number of a company taken from Just Dial are secondary data.
Sources of Secondary Data
The secondary data may be available from the following sources:
Government organizations
Federal and Provincial Bureau of Statistics, Crop Reporting Service
(Agriculture Department), Census and Registration Organization, etc.
Semi-Government organizations
Municipal committees, District Councils, Commercial and Financial
Institutions like banks etc.
• Teaching and Research Organizations
• Research Journals and Newspapers
• Internet
Sampling
Sampling is the process of selecting a portion of the population to
represent the entire population.
Types of Sampling Method
• Probability Sampling

• Non-probability Sampling

Probability Sampling
• Probability sampling involves random selection,
allowing you to make strong statistical inferences about
the whole group.
Simple Random Sampling
• Respondents are randomly selected from a larger group.

Systematic Sampling
• The items are selected from the target population by
choosing a random starting point and then selecting
every subsequent item after a fixed sampling interval.
Stratified Sampling:
• Respondents are split into sub-groups and then
randomly selected from each group.
Cluster Sampling:
• Clusters or groups of people are formed from the
population, each cluster having similar significant
characteristics. Whole clusters are then randomly selected.
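The four probability sampling methods above can be sketched in a few lines of Python using the standard `random` module. This is only an illustration: the population of 20 labelled respondents, the odd/even strata, and the clusters of five are all hypothetical choices made for the example.

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: 20 respondents labelled 1..20.
population = list(range(1, 21))

# Simple random sampling: every respondent has an equal chance of selection.
simple_sample = random.sample(population, 5)

# Systematic sampling: random starting point, then every k-th item.
k = len(population) // 5          # sampling interval
start = random.randrange(k)       # random start within the first interval
systematic_sample = population[start::k]

# Stratified sampling: split into sub-groups, then sample randomly from each.
strata = {
    "odd": [x for x in population if x % 2 == 1],
    "even": [x for x in population if x % 2 == 0],
}
stratified_sample = [x for group in strata.values()
                     for x in random.sample(group, 2)]

# Cluster sampling: form clusters, then randomly pick whole clusters.
clusters = [population[i:i + 5] for i in range(0, len(population), 5)]
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [x for cluster in chosen_clusters for x in cluster]

print(simple_sample, systematic_sample, stratified_sample, cluster_sample)
```

In practice the strata and clusters would come from real characteristics of the population (region, department, age band), not from an arbitrary split like the one used here.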
Non-Probability Sampling
Non-probability sampling involves non-random selection
based on convenience or other criteria, allowing you to
easily collect data .
Non-Probability Sampling Types
• Convenience sampling,

• Quota sampling,

• Judgmental sampling,

• Snowball sampling.

Convenience Sampling:
• The samples are selected from the population directly
because they are conveniently available to the researcher.
Quota Sampling:
The researcher forms a sample of individuals chosen
to represent the population based on specific traits or
qualities.
Purposive or Judgmental Sampling
Researcher selects a typical group of individuals who
might represent the larger population and then collects data
from this group .
Snowball Sampling
Selecting participants by finding one or two participants
and then asking them to refer you to others.
Class Limits :
Corresponding to a class interval, the class limits may be
defined as the minimum value and the maximum value
the class interval may contain.
The minimum value is known as the lower-class
limit(LCL) and the maximum value is known as the
upper-class limit (UCL).
Class Interval : The difference between the upper-class
limit and the lower-class limit of a class is known as a
class interval .
Class boundary :
Class boundaries may be defined as the actual (true) class limits
of a class interval.
For overlapping classification (or) mutually exclusive
classification, which excludes the upper class limit, such as
10-20, 20-30, 30-40, 40-50, etc., the class boundaries coincide
with the class limits. This is usually applicable for a continuous variable.
For non-overlapping classification (or) mutually
inclusive classification, which includes both class
limits, such as 0-9, 10-19, 20-29, 30-39, etc., the class boundaries
are obtained as follows. This is usually applicable for a discrete variable.
LCB = LCL – D /2
UCB = UCL + D /2
Where D is the difference between the LCL of the next
class interval and the UCL of the given class interval.
Mid point or Mid value or Class mark
Corresponding to a class interval, this may be defined as the
total of the two class limits or class boundaries
divided by 2.
Mid point = (LCL + UCL) / 2
= (LCB + UCB) / 2
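A short sketch of the boundary and mid point formulas, applied to the inclusive classes 0-9, 10-19, 20-29, 30-39. The function names are illustrative, and the code assumes that the gap D is the same between every pair of adjacent classes.

```python
def class_boundaries(classes):
    """Return (LCB, UCB) for each (LCL, UCL) pair.

    Assumes at least two classes and the same gap D between all
    adjacent classes.
    """
    # D = LCL of the next class minus UCL of the given class
    d = classes[1][0] - classes[0][1]
    return [(lcl - d / 2, ucl + d / 2) for lcl, ucl in classes]


def mid_point(lower, upper):
    """Mid point = (lower + upper) / 2, for limits or boundaries."""
    return (lower + upper) / 2


inclusive_classes = [(0, 9), (10, 19), (20, 29), (30, 39)]
boundaries = class_boundaries(inclusive_classes)

mid_from_limits = [mid_point(lcl, ucl) for lcl, ucl in inclusive_classes]
mid_from_boundaries = [mid_point(lcb, ucb) for lcb, ucb in boundaries]

print(boundaries)
print(mid_from_limits, mid_from_boundaries)
```

Both ways of computing the mid point give the same value, because the boundaries move each limit outward by the same amount D/2.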
• The number of class intervals in a frequency
distribution can be estimated using a rule such as
2^k ≥ N
where k = Number of classes
N = The total number of observations
• Width of class intervals = (LNV - SNV) / k
where LNV = Largest numerical value
SNV = Smallest numerical value
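A minimal sketch of these two calculations, assuming the rule of thumb 2^k ≥ N for the number of classes (other rules exist) and width = (LNV - SNV) / k rounded up to a whole number. The function names are illustrative only.

```python
import math


def number_of_classes(n):
    """Smallest k such that 2**k >= n (rule-of-thumb assumption)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def class_width(lnv, snv, k):
    """Width = (LNV - SNV) / k, rounded up to a convenient whole number."""
    return math.ceil((lnv - snv) / k)


# Example: 40 observations ranging from 10 to 38.
k = number_of_classes(40)
width = class_width(38, 10, k)
print(k, width)
```

Rounding the width up is a design choice so that k classes always cover the full range of the data.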
• The frequency is the number of times a particular data
point occurs in the set of data.
• A frequency distribution is a table that lists each data
point and its frequency.
• The relative frequency is the frequency of a data point
expressed as a percentage of the total number of data
points.
• Ungrouped data is data given as individual data
points.
• Grouped data is data given in intervals.
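The definitions above can be illustrated with a small sketch that builds a frequency distribution for ungrouped data using `collections.Counter`. The data values are hypothetical.

```python
from collections import Counter

# Hypothetical ungrouped data: number of defects found on 10 items.
data = [2, 3, 2, 1, 3, 2, 4, 1, 2, 3]

frequency = Counter(data)            # data point -> number of occurrences
total = sum(frequency.values())      # total number of data points

# Relative frequency: frequency as a percentage of the total.
relative = {value: 100 * count / total for value, count in frequency.items()}

# Print the frequency distribution as a simple table.
for value in sorted(frequency):
    print(value, frequency[value], f"{relative[value]:.0f}%")
```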
(1) The following set of numbers represents mutual fund prices
reported at the end of a week for 40 selected nationally sold
funds.
10 17 15 22 11 16 19 24 29 18
25 26 32 14 17 20 23 27 30 12
15 18 24 36 18 15 21 28 33 38
34 13 10 16 20 22 29 29 23 31
Arrange these prices into a frequency distribution having a suitable
number of classes.
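One possible approach to exercise (1) is sketched below. It is not the only valid answer: it assumes the rule of thumb 2^k ≥ N for the number of classes, a class width rounded up to a whole number, and exclusive-type classes (upper limit excluded) starting at the smallest price.

```python
prices = [10, 17, 15, 22, 11, 16, 19, 24, 29, 18,
          25, 26, 32, 14, 17, 20, 23, 27, 30, 12,
          15, 18, 24, 36, 18, 15, 21, 28, 33, 38,
          34, 13, 10, 16, 20, 22, 29, 29, 23, 31]

n = len(prices)

# Number of classes: smallest k with 2**k >= n (assumed rule of thumb).
k = 1
while 2 ** k < n:
    k += 1

# Class width: (largest - smallest) / k, rounded up via ceiling division.
width = -(-(max(prices) - min(prices)) // k)

# Count how many prices fall in each class [lower, upper).
start = min(prices)
table = []
for i in range(k):
    lower = start + i * width
    upper = lower + width
    count = sum(1 for p in prices if lower <= p < upper)
    table.append((lower, upper, count))

for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```

A different starting point or width (for example, classes of width 4 or 6) would give an equally acceptable frequency distribution.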
(2) The take-home salary (in Rs) of 40 unskilled workers from
a company for a particular month was
3482 3392 3499 3412 3440 3444
3446 3540 3394 3365 3412 3458
3482 3394 3450 3444 3440 3494
3460 3425 3500 3390 3414 3365
3390 3460 3422 3500 3470 3428
Construct a frequency distribution having a suitable number
of classes.
(3) A computer company received a rush order for as many
home computers as could be shipped during a six-week
period. Company records provide the following daily
shipments:
22 65 65 67 55 50 65
77 73 30 62 54 48 65
79 60 63 45 51 68 79
83 33 41 49 28 55 61
65 75 55 75 39 87 45
50 66 65 59 25 35 53
Group these daily shipment figures into a frequency
distribution having a suitable number of classes.
Presentation of data: This refers to the
organization of data into tables, graphs, or
charts, so that logical and statistical conclusions
can be derived from the collected measurements.
Data may be presented by three methods:
• Textual

• Tabular

• Graphical

Textual Presentation : 
• The data gathered are presented in paragraph
form.
• Data are written and read .

• It is a combination of texts and figures.


Example: Of the 150 respondents interviewed, the following
complaints were noted: 27 for lack of books in the library,
25 for a dirty playground, 20 for lack of laboratory
equipment, and 17 for a poorly maintained university
building.
Tabular Presentation :
• A method of presenting data using a statistical table.
• A systematic organization of data in columns and rows.

Example: Total population distribution by region: 2000

Region        Population    Percentage
Region-I       4,200,478       5.49
Region-II      2,813,159       3.68
Region-III     8,030,945      10.50
Region-IV     11,793,655      15.42
Region-V       4,686,669       6.13
Region-VI      6,211,038       8.12
Graphical Method:
Histogram:
• Represented by a set of rectangular bars.
• The variable (class) is taken along the X-axis and
frequency along the Y-axis.
• With the class intervals as base, rectangles with
heights proportional to the class frequencies are drawn.
• The set of rectangular bars so obtained gives the
histogram.
Frequency Curve:
• The variable is taken along the X-axis and frequency
along the Y-axis.
• Frequencies are plotted against the class mid-values,
and then these points are joined by a
smooth curve.
• The curve so obtained is the frequency curve.
• Total area under the frequency curve represents total
frequency.
Frequency Polygon:
• The variable is taken along the X-axis and frequency along
the Y-axis.
• Class frequencies are plotted against the class mid-values,
and then these points are joined by straight lines.
• The figure so obtained is the frequency polygon.
• Total area under the frequency polygon represents total
frequency.
Ogive (Cumulative Frequency Curve)
• An ogive is a smooth graph with cumulative frequency (cf)
plotted against values of the variable (class limits).
• Class limits are taken along the X-axis and cumulative
frequency along the Y-axis.
• There are 2 types of ogives:
Less than curve or less than ogive (<cf)
Greater than curve or greater than ogive (>cf)
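The cumulative frequencies behind the two ogives can be sketched as follows. The classes and frequencies are hypothetical. The less than cumulative frequencies would be plotted against the upper class limits and the greater than cumulative frequencies against the lower class limits.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower limit, upper limit) and class frequencies.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [4, 10, 8, 3]

# Less than cf: running total from the first class downward.
less_than_cf = list(accumulate(freqs))

# Greater than cf: running total from the last class upward.
greater_than_cf = list(accumulate(reversed(freqs)))[::-1]

# Points for the two ogives.
less_than_points = [(upper, cf) for (_, upper), cf in zip(classes, less_than_cf)]
greater_than_points = [(lower, cf) for (lower, _), cf in zip(classes, greater_than_cf)]

print(less_than_points)
print(greater_than_points)
```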
Pie diagram (Sector diagram):
• Used for presenting discrete data of qualitative characteristics
such as blood groups, Rh factor, age group, sex group,
causes of mortality, or social group in a population, etc.
• The frequencies of the groups are shown in a circle.
• The angle (in degrees) and the area of each sector denote
the frequency.
• Size of each sector (degree measure) = (Frequency of the group / Total frequency) × 360°
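A minimal sketch of the sector calculation, assuming the usual rule that each sector's angle is (frequency of the group / total frequency) × 360 degrees. The blood group counts are hypothetical.

```python
# Hypothetical frequencies of blood groups in a sample of 100 people.
blood_groups = {"A": 30, "B": 25, "AB": 10, "O": 35}

total = sum(blood_groups.values())

# Angle of each sector in degrees: (frequency / total) * 360.
angles = {group: count / total * 360 for group, count in blood_groups.items()}

for group, angle in angles.items():
    print(f"{group}: {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check that the sectors fill the whole circle.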
