
Introduction to Statistics Chapter One: The subject matter of statistics

Chapter One: The Subject Matter of Statistics


1.1. Definition of Statistics

A few relevant definitions of statistics are given below:


 Statistics is the science which deals with the methods of collecting, classifying, presenting,
comparing, and interpreting numerical data collected to throw some light on any sphere of
enquiry (Seligman).
 The science of statistics is the method of judging collective, natural, or social phenomena
from the results obtained from the analysis or enumeration or collection of estimates (King
A. L.).
 Bowley has given the following three definitions keeping in mind various aspects of
statistics as a science:
 Statistics may be called the science of counting.
 Statistics may be called the science of averages.
 Statistics is the science of the measurement of the social organism, regarded as a whole in
all its manifestations.
o These definitions confine the scope of statistical analysis to counting, averages, and
applications in the field of sociology alone. Bowley realized this limitation and himself
said that statistics cannot be confined to any one science.
 Statistics is a collection of mathematical methods for the organization, analysis, and
interpretation of data.
 Three main statistical methods are used in data analysis: descriptive statistics,
inferential statistics, and regression analysis.
 In general, statistics may be defined as the science of collecting, organizing, presenting,
analyzing, and interpreting numerical data for making better decisions.
 Statistics provides you with methods for making better sense of the numbers used every
day to describe or analyze the world we live in.
 Scope of Statistics
 The scope of applications of statistics has assumed unprecedented dimensions these
days.

By: Asimamaw B. (MSc.)

 Statistical methods are applicable in all diversified fields such as economics, trade,
industry, commerce, agriculture, bio-sciences, physical sciences, education,
astronomy, insurance, accountancy and auditing, sociology, psychology,
meteorology, and so on.
 Statistics in Economics
 Statistical methods are extensively used in all branches of economics. For example:
Time-series analysis is used for studying the behaviour of prices, production, and
consumption of commodities, money in circulation, and bank deposits and clearings.
 Index numbers are useful in economic planning as they indicate the changes over a
specified period of time in (a) prices of commodities, (b) imports and exports, (c)
industrial/agricultural production, (d) cost of living, and the like.
 Demand analysis is used to study the relationship between the price of a commodity
and the quantity demanded of it.
 Forecasting techniques, such as curve fitting by the principle of least squares and
exponential smoothing, are used to predict the inflation rate, unemployment rate, or manufacturing
capacity utilization.
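The two forecasting techniques named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inflation figures are made up, the helper names `fit_line` and `exponential_smoothing` are hypothetical, and the smoothing constant 0.5 is an arbitrary choice.

```python
# Hypothetical yearly inflation rates (%); the figures are invented for illustration.
inflation = [6.1, 6.8, 7.4, 8.3, 8.9, 9.6]
periods = list(range(1, len(inflation) + 1))  # t = 1, 2, ..., n


def fit_line(x, y):
    """Fit y = a + b*x by the principle of least squares; return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def exponential_smoothing(y, alpha):
    """Simple exponential smoothing; return the smoothed series."""
    smoothed = [y[0]]  # start from the first observation
    for value in y[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


intercept, slope = fit_line(periods, inflation)
trend_forecast = intercept + slope * (len(inflation) + 1)  # trend value for the next period
smoothing_forecast = exponential_smoothing(inflation, alpha=0.5)[-1]

print("Least-squares trend forecast:", round(trend_forecast, 2))
print("Exponential smoothing forecast:", round(smoothing_forecast, 2))
```

The trend line extrapolates a steady rise, while exponential smoothing weights recent observations more heavily; which one suits a given series depends on the data.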
 Why Study Statistics?
 Businesses use statistical methodology and thinking to make decisions about which
products to produce, how much to spend advertising them, how to evaluate their
employees, how often to service their machinery and equipment, how large their
inventories should be, and nearly every aspect of running their operations.
 In the business world, statistics has these important specific uses:
 To summarize business data.
 To draw conclusions from those data.
 To make reliable forecasts about business activities.
 To improve business processes.

1.2. Types of Statistical Methods


 As we have seen, statistics can refer to a set of individual numbers or numerical facts, or
to general or specific statistical techniques. A further breakdown of the subject is possible,
depending on whether the emphasis is on (1) simply describing the characteristics of a set


of data or (2) proceeding from data characteristics to making generalizations, estimates,
forecasts, or other judgments based on the data.
 Statistical methods, broadly, fall into the following two categories:
 Descriptive statistics, and
 Inferential statistics
 Descriptive statistics summarizes the data and usually focuses on the distribution, the central
tendency, and the dispersion of data.
 The distribution can be, for example, a normal distribution or a binomial distribution, and the central
tendency describes the data with respect to the center of the data.
 The central tendency can be the mean, median, and mode of data.
 The dispersion describes the spread of the data, and dispersion can be the variance,
standard deviation, and interquartile range.
 Inferential statistics tests the relationship between two data sets or two samples, and a
hypothesis is usually set for the statistical relationships between them.
 The hypothesis can be a null hypothesis or an alternative hypothesis, and the decision to reject the null
hypothesis is made using tests such as the t-test, the chi-square test, and ANOVA.
 The chi-square test is used mainly for categorical variables, and the t-test mainly for
continuous variables.
 ANOVA (analysis of variance) extends the comparison of means to three or more groups.
 Regression analysis is used to identify the relationships between two or more variables.
 Regressions can be linear or non-linear.
 A linear regression can be a simple linear regression (one explanatory variable) or a multiple
linear regression, which identifies relationships among more variables.
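The descriptive measures named above (mean, median, mode, variance, standard deviation, and interquartile range) can be computed with Python's standard `statistics` module. The sketch below uses a made-up list of exam scores; the data and variable names are assumptions for illustration only.

```python
import statistics

# Hypothetical exam scores for ten students (illustrative values only).
scores = [52, 61, 61, 67, 70, 74, 78, 83, 88, 96]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
variance = statistics.variance(scores)  # sample variance (divisor n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1                           # interquartile range

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", round(variance, 2), "std dev:", round(std_dev, 2), "IQR:", iqr)
```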
1. Descriptive statistics
 Descriptive statistics consists of procedures used to summarize and describe the
characteristics of a set of data.
 Descriptive statistics includes statistical methods involving the collection, presentation,
and characterization of a set of data in order to describe the various features of that set of
data.
 Bar charts, line graphs, and pie charts comprise the graphic methods, whereas numeric
measures include measures of central tendency, dispersion, skewness, and kurtosis.
 Consider the national census conducted by the Ethiopian government every 10 years.
Results of this census give you the average age, income, and other characteristics of the
Ethiopian population. To obtain this information, the Census Bureau must have some
means to collect relevant data. Once data are collected, the bureau must organize and
summarize them. Finally, the bureau needs a means of presenting the data in some
meaningful form, such as charts, graphs, or tables.
 For example, by looking at the ring fingers of the students in your class, you may find that
35% of your fellow students are married. If so, the figure “35%” is a descriptive
statistic.
 A frequency distribution of weekly wages in ABC Company during the past year
can be described in the following table.

Table 1: Frequency distribution of weekly wages

Weekly wage in ETB    Number of workers (f)
1200 - 1799            5
1800 - 2399            7
2400 - 2999            3
3000 - 3599           11
3600 - 4199            9
4200 - 4799            2
Total                 37
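As a rough sketch of how such a table is summarized further, the code below takes the class limits and frequencies from Table 1 and derives the total, the relative and cumulative frequencies, and an approximate mean. The grouped mean rests on the usual assumption that every observation in a class sits at the class midpoint; the variable names are illustrative.

```python
# Class limits (ETB) and frequencies copied from Table 1.
classes = [(1200, 1799), (1800, 2399), (2400, 2999),
           (3000, 3599), (3600, 4199), (4200, 4799)]
frequencies = [5, 7, 3, 11, 9, 2]

total = sum(frequencies)
relative = [f / total for f in frequencies]

# Cumulative frequencies: running total of workers up to each class.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

# Approximate mean from grouped data, assuming each class is
# represented by its midpoint.
midpoints = [(low + high) / 2 for low, high in classes]
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / total

for (low, high), f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{low}-{high}: f={f}, relative={r:.3f}, cumulative={c}")
print("Approximate mean wage:", round(grouped_mean, 2))
```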

2. Inferential Statistics
 Inferential statistics are often (but not always) the next step after you have collected and
summarized data.
 Inferential statistics are used to make inferences based on a smaller group of data (such as
a group of 50 students) about a possibly larger one (such as all the undergraduate
students in the College of Business and Economics). With inferential statistics, you take
data from samples and make generalizations about a population.
 In inferential statistics, sometimes referred to as inductive statistics, we go beyond mere
description of the data and arrive at inferences regarding the phenomenon or phenomena
for which sample data were obtained. Most of the time, due to the expense, time, size of
population, medical concerns, etc., it is not possible to use the entire population for a
statistical study; therefore, researchers use samples.
 Inferential statistics include those techniques by which decisions about a statistical
population or process are made based only on a sample having been observed.
 Inferential statistics includes statistical methods which facilitate estimating the
characteristic of a population or making decisions concerning a population on the basis
of sample results.
 Sample and population are two relative terms.
 The larger group of units about which inferences are to be made is called the population
or universe.
 Sample is a fraction, subset, or portion of that universe.
 Inferential statistics uses probability, i.e., the chance of an event occurring. You may be
familiar with the concepts of probability through various forms of gambling. If you play cards,
dice, bingo, or lotteries, you win or lose according to the laws of probability. Probability theory
is also used in the insurance industry and other areas.
 There are two main areas of inferential statistics:


 Estimating parameters. This means taking a statistic from your sample data (for
example, the sample mean) and using it to say something about a population parameter
(for example, the population mean).
 Hypothesis tests. This is where you use sample data to answer research questions.
For example, you might be interested in knowing whether a new cancer drug is effective, or
whether breakfast helps children perform better in school.
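Parameter estimation can be illustrated with a small sketch: a sample mean is used as a point estimate of the population mean, and a confidence interval expresses the uncertainty around it. The sample values are hypothetical, and the interval uses the normal approximation as a simplifying assumption; for a sample this small, a t-based interval would be somewhat wider.

```python
import math
import statistics

# Hypothetical sample: ages of 12 students drawn from a larger population.
sample = [19, 21, 20, 22, 23, 20, 21, 19, 24, 22, 20, 21]

n = len(sample)
sample_mean = statistics.mean(sample)                      # point estimate
standard_error = statistics.stdev(sample) / math.sqrt(n)   # estimated SE of the mean

# Approximate 95% confidence interval (normal approximation, an assumption here).
z = statistics.NormalDist().inv_cdf(0.975)
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print("Point estimate of the population mean:", sample_mean)
print("Approximate 95% interval:", round(lower, 2), "to", round(upper, 2))
```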
 An area of inferential statistics called hypothesis testing is a decision-making process for
evaluating claims about a population, based on information obtained from samples.
 For example, a researcher may wish to know if a new drug will reduce the number
of heart attacks in men over 70 years of age. For this study, two groups of men over
70 would be selected. One group would be given the drug, and the other would be
given a placebo (a substance with no medical benefits or harm). Later, the number
of heart attacks occurring in each group of men would be counted, a statistical test
would be run, and a decision would be made about the effectiveness of the drug.
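The decision step in the drug example can be sketched with a two-proportion z-test, one of several tests that could be used for such a comparison. The counts below are hypothetical, the function name `two_proportion_z_test` is an illustrative helper, and the 5% significance level is an assumed convention rather than something stated in the example.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Null hypothesis: both groups share the same underlying proportion.
    Returns the z statistic and the two-sided p-value.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)  # common proportion under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: heart attacks among 500 men given the drug
# and among 500 men given the placebo.
z, p_value = two_proportion_z_test(events_a=30, n_a=500, events_b=50, n_b=500)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print("z =", round(z, 3), "p-value =", round(p_value, 4), "reject null:", reject_null)
```

Rejecting the null hypothesis here would suggest that the heart attack rates differ between the two groups; it does not by itself measure how large or how important the difference is.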
 Statisticians also use statistics to determine relationships among variables.
 For example, relationships were the focus of the most noted study in the 20th
century, “Smoking and Health”, published by the Surgeon General of the United
States in 1964. He stated that after reviewing and evaluating the data, his group
found a definite relationship between smoking and lung cancer. He did not say that
cigarette smoking actually causes lung cancer, but that there is a relationship
between smoking and lung cancer. This conclusion was based on a study done in
1958 by Hammond and Horn. In this study, 187,783 men were observed over a
period of 45 months. The death rate from lung cancer in this group of volunteers
was 10 times as great for smokers as for nonsmokers.
 By studying past and present data and conditions, statisticians try to make predictions
based on this information. For example, a car dealer may look at past sales records for a
specific month to decide what types of automobiles and how many of each type to order
for that month next year.
 Procedure for Performing an Inferential Test: an inferential test typically involves the
following steps.
 Start with a theory
 Make a research hypothesis
 Operationalize the variables
 Identify the population to which the study results should apply
 Form a null hypothesis for this population
 Collect a sample from the population and run the study
 Perform statistical tests to see if the obtained sample characteristics are sufficiently
different from what would be expected under the null hypothesis to be able to reject
the null hypothesis.
 The following types of inferential statistics are extensively used and relatively easy to
interpret:
 One sample test of difference/One sample hypothesis test

 Confidence Interval
 Contingency Tables and Chi Square Statistic
 t-test or ANOVA
 Pearson Correlation
 Bi-variate Regression
 Multi-variate Regression
 Descriptive and inferential statistics work hand in hand; which statistic you use, and when,
depends on the question you want answered.
 Data visualization is the technique used to communicate or present data using graphs, charts,
and dashboards. Data visualizations can help us understand the data more easily.
1.3. Need For Data
 Definition of data and its types
 Data consist of information coming from observations, counts, measurements, or
responses.
 Data are the facts and figures, seen or observed, that are collected, organized,
presented, summarized, analyzed, and interpreted.
 Data set – all the data collected in a particular study, say data on sex, age, awareness,
and so on.
 Discrete Data – data obtained by counting. They always take whole-number values.
Data on age, by contrast, are continuous because age can take any real value up to the
human age limit.
 Continuous Data – data gathered by measuring; they can take any real value within a
range, including decimal numbers.
 Qualitative Data – data in non-numeric form, say data on sex (male or female) or
awareness (yes or no, or measured by degree of awareness). Note that we can present
qualitative data in numeric form (say, 1 for male and 0 for female) and then apply
statistical methods to the transformed data.
 Quantitative Data – data expressed in numeric form, say data on age, the number
of people lost to HIV, and so on.
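The numeric coding of qualitative data mentioned above (1 for male, 0 for female) can be sketched as follows. The survey responses are hypothetical, and the names `codes` and `coded` are illustrative.

```python
from collections import Counter

# Hypothetical qualitative data on sex from five respondents.
sex = ["male", "female", "female", "male", "female"]

# Frequency count of each category.
counts = Counter(sex)

# Numeric coding as described in the text: 1 for male, 0 for female.
codes = {"male": 1, "female": 0}
coded = [codes[s] for s in sex]

# With 0/1 coding, the mean of the coded values is the proportion of males.
proportion_male = sum(coded) / len(coded)

print(counts)
print(coded, proportion_male)
```

The codes are only labels: the mean of a 0/1 variable has a clear meaning (a proportion), but arithmetic on arbitrary category codes generally does not.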
 Statistical data are the basic material needed to make an effective decision in a particular
situation. The main reasons for collecting data are as listed below:
 To provide necessary inputs to a given phenomenon or situation under study.
 To measure performance in an ongoing process such as production, service, and so on.
 To enhance the quality of decision-making by enumerating alternative courses of action
in a decision-making process, and selecting an appropriate one.
 To satisfy the desire to understand an unknown phenomenon.
 To assist in guessing the causes and probable effects of certain characteristics in given
situations.
 When we collect data, we should consider the following questions:
 Do the data come from an unbiased source? That is, the source should not have an interest
in supplying data that lead to a misleading conclusion.


 Do the data represent the entire population under study, i.e., how many observations are
needed to represent the population?
 Do the data support other evidence already available? Is any evidence missing that
may lead to a different conclusion?
 Do the data support the logical conclusions drawn?

Table 2: Nature of Data, Information, and Measurement

Data Type                 Information Type                               Measurement Type
Categorical               Do you practice yoga?                          Yes / No
Numerical (discrete)      How many books do you have in your library?    Number
Numerical (continuous)    What is your height?                           Centimeters or inches

1.4. Scales of Measurement


 The four common levels of data measurement are the nominal, ordinal, interval, and
ratio levels or scales.
A. Nominal scale: Nominal scale is simply a system of assigning number symbols to events in
order to label them.
- The word ‘nominal’ is derived from the Latin word nomen, meaning ‘name’.
- Its simple function is to divide the data into separate categories that can then be compared
with each other. By first giving names to or labelling the parts or states of a concept, or
by naming discrete units of data, we are then able to measure the concept or data at the
simplest level.
- Examples of nominal scale categories: sex (male or female); labelling the household head
as 1 if the household head is female and 0 if the household head is male; Internet email
provider (Gmail, Windows Live, Yahoo, Other).

- Variables measured on a nominal scale can be summarized by frequencies, percentages,
and the mode. The median and the mean are not meaningful, because the categories have no order.
- Chi-square test is the most common test of statistical significance that can be utilized,
and for the measures of correlation, the contingency coefficient can be worked out.


- Nominal scale is the least powerful level of measurement. It indicates no order or distance
relationship and has no arithmetic origin. A nominal scale simply describes differences
between things by assigning them to categories. Nominal data are, thus, counted data. The
scale wastes any information that we may have about varying degrees of attitude, skills,
understandings, etc.
- In spite of all this, nominal scales are still very useful and are widely used in surveys and
other ex-post-facto research when data are being classified by major sub-groups of the
population.
B. Ordinal Scale
- An ordinal scale classifies values into distinct categories in which ranking is implied.
- The ordinal scale places events in order. But the intervals between two consecutive orders
may not be equal.
- The ordinal scale is a more precise scale than the nominal scale. It allows the researcher to
assign values by placing or arranging the observations in relative rank order.
- The ordinal level of measurement implies that an entity being measured is quantified in
terms of being more than or less than, or of a greater or lesser order than.
- This is the most important characteristic of ordinal measures: There is no way to tell how
far apart the attributes are from one another.
- Ordinal scale categories imply an ordering, e.g. job positions, levels of satisfaction, etc.
- Scales of opinion, like the familiar “strongly agree,” “agree,” “neutral,” “disagree,”
“strongly disagree” found on so many surveys, are ordinal measures.
C. Interval Scale
- An interval scale of measurement is a further improvement over an ordinal scale of
measurement.
- It is a scale of measurement for a variable in which the interval between observations is
expressed in terms of a fixed standard unit of measurement.
- Interval scales are those where the values measured are not only rank-ordered, but are also
equidistant from adjacent attributes. In the case of interval scale, the intervals are adjusted
in terms of some rule that has been established as a basis for making the units equal.
- The units are equal only in so far as one accepts the assumptions on which the rule is
based.
- Interval scales can have an arbitrary zero, but it is not possible to determine for them what
may be called an absolute zero or the unique origin. The primary limitation of the interval
scale is the lack of a true zero; it does not have the capacity to measure the complete
absence of a trait or characteristic.
o Temperature scales are one of the most familiar types of interval scale.


o For example, on the temperature scale (in Fahrenheit or Celsius), the
difference between 30 and 40 degrees Fahrenheit is the same as that
between 80 and 90 degrees Fahrenheit.
o Standardized exam scores, e.g. 85-90 = A, ≥ 90 = A+, etc.
- Likewise, if you have a scale that asks for respondents’ monthly income using the following
attributes (ranges): Birr 0 to 10,000, Birr 10,000 to 20,000, Birr 20,000 to 30,000, and so
forth, this is also an interval scale, because the mid-points of the ranges are equidistant
from each other.
- The mean is the appropriate measure of central tendency, while the standard deviation is the most
widely used measure of dispersion. Product moment correlation techniques are
appropriate, and the generally used tests for statistical significance are the ‘t’ test and the ‘F’
test.
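The lack of a true zero on an interval scale can be made concrete with a short sketch. It converts Fahrenheit readings to Celsius and shows that equal differences stay equal under the change of unit, while ratios do not. The function name and the chosen temperatures are illustrative.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# Equal differences remain equal after changing units.
diff_low_c = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
diff_high_c = fahrenheit_to_celsius(90) - fahrenheit_to_celsius(80)

# Ratios depend on the unit, because zero is arbitrary on both scales.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

print("Differences in Celsius:", round(diff_low_c, 3), round(diff_high_c, 3))
print("Ratio in Fahrenheit:", ratio_f, "Ratio in Celsius:", round(ratio_c, 3))
```

Because the ratio changes with the unit, a statement such as "80 degrees is twice as hot as 40 degrees" has no fixed meaning on an interval scale.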

D. Ratio scale
 The main demerit of an interval scale, the lack of an absolute zero point of measurement, is
overcome in a ratio scale.
 A ratio scale is a type of interval scale with equal intervals between
consecutive scale points, along with the added feature of a true zero point on the
scale.
 We can conceive of an absolute zero of length and similarly we can conceive of an
absolute zero of time. For example, the zero point on a centimeter scale indicates
the complete absence of length or height.
 Ratio scales are those that have all the qualities of nominal, ordinal, and interval
scales and, in addition, also have a “true zero” point (where the value zero implies
lack or non-availability of the underlying construct).
 Ratio scale represents the actual amounts of variables. Measures of physical
dimensions such as weight, height, distance, etc. are examples.
 The ratio scales have wider acceptability and use. Generally, almost all statistical
tools are usable with the variables measured in a ratio scale.
 Most measurements in the natural sciences and engineering, such as mass, incline of
a plane, and electric charge, employ ratio scales, as do some social science
variables such as age, tenure in an organization, and firm size (measured as
employee count or gross revenues).
 In summary, you can use the following simple test to determine which scale of
measurement applies to the values of a variable. If you can say that:
One value is different from another, you have a nominal scale.
o Nominal: categorize into boxes, names.
One value is bigger, better or more of anything than another, you have an ordinal scale.
o Ordinal: prioritize according to relative values, put into order.

One value is so many units (degrees, inches) more or less than another, you have an
interval scale.
o Interval: sort according to measured value.
One value is so many times as big or bright or tall or heavy as another, you have a ratio
scale.
o Ratio: measure in relation to a zero value.
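The simple test above can be written as a small decision function. This is only a sketch: the function name `measurement_scale` and its three yes/no parameters are illustrative, and the example variables are taken from cases mentioned earlier in this section.

```python
def measurement_scale(ordered, equal_units, true_zero):
    """Return the scale of measurement implied by three yes/no properties.

    ordered     -- can one value be ranked above another?
    equal_units -- can values be said to differ by so many units?
    true_zero   -- does zero mean complete absence of the characteristic?
    """
    if not ordered:
        return "nominal"
    if not equal_units:
        return "ordinal"
    if not true_zero:
        return "interval"
    return "ratio"


# Example variables mentioned in the text.
examples = {
    "email provider": measurement_scale(False, False, False),
    "level of agreement": measurement_scale(True, False, False),
    "temperature in Fahrenheit": measurement_scale(True, True, False),
    "height in centimeters": measurement_scale(True, True, True),
}

for variable, scale in examples.items():
    print(f"{variable}: {scale}")
```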
 Samples ("frames") and sample size, instruments.
 Methodologies for collecting data, etc.
 While deciding on the method of data collection to be used for the study, the researcher
should keep in mind two types of data, viz., primary and secondary.
 The primary data are those which are collected afresh and for the first time, and thus happen
to be original in character.
 The secondary data, on the other hand, are those which have already been collected by
someone else and which have already been passed through the statistical process.
1.5. Sources of Data
 Data sources are classified as
- primary sources, and
- secondary sources.
 Individuals, focus groups, and/or panels of respondents specifically decided upon and set
up by the investigator for data collection are examples of primary data sources. Any one or
a combination of the following methods can be chosen to collect primary data:
- Direct personal observations
- Direct or indirect oral interviews
- Administering questionnaires
 Secondary sources provide data that were collected earlier by someone other than the
present user.
- External secondary data sources
 Government publications, which include (i) The National Accounts Statistics,
published by the Central Statistical Agency (CSA). It contains estimates of
national income for several years, growth rate, and rate on major economic
activities such as agriculture, industry, trade, transport, and so on; (ii)
Wholesale Price Index, published by the office of the Economic Advisor,
Ministry of Commerce and Industry; (iii) Consumer Price Index; (iv) Reserve
Bank of India bulletins; (v) Economic Survey.
 Non-Government publications include publications of various industrial and
trade associations.
- Internal secondary data sources
 The data generated within an organization in the process of routine business
activities, are referred to as internal secondary data.


 Financial accounts, production, quality control, and sales records are
examples of such data.
 However, data originating from one department of an organization may not
be useful for another department in its original form. It is, therefore,
desirable to condense such data into a form needed by the other.

“End of Chapter One”
