
Unit 1: Introduction

Introduction to Statistics
The term “Statistics” has been derived from the Latin word “status”, the Italian word “Statista” or
the German word “Statistik”, all of which mean political state.
In those days, “Statistics” was used only for collecting information relating to the population
of the state, military strength, incomes etc. for framing state policy, and it was considered only
as the science of statecraft. However, with the passage of time, the science of statistics has been
applied very widely and its scope has considerably enlarged. It is used not only by the state in
administration but also in computing, IT, economics, business, research, banking etc. There is
hardly any field of human activity where statistics has not been used.
Statistics is the science and art of learning from data. It is concerned with the collection of data, its
subsequent description, and its analysis, which often leads to the drawing of valid conclusions.
It may be defined as the collection, presentation, analysis and interpretation of numerical data.

Scopes of Statistics
1. Statistics, computer and information technology.
2. Statistics and Accounting.
3. Statistics and Economics.
4. Statistics and Business.
5. Statistics and Planning.
6. Statistics and Mathematics.
7. Statistics and Medical science.
8. Statistics and Psychology education.

Parts of statistics:
Statistics may be classified into two parts:
1. Theoretical statistics or mathematical statistics
i. Descriptive statistics
ii. Inferential statistics
2. Applied Statistics:

i. Descriptive statistics: The part of statistics concerned with the description and summarization
of data is called descriptive statistics. Descriptive statistics includes measures of location,
measures of dispersion, measures of skewness, measures of kurtosis etc.
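
The measures named above can be sketched in a few lines of Python. This is only an illustrative sketch: the `marks` list is hypothetical data, the helper name `describe` is not from these notes, and the skewness and kurtosis shown are the simple moment-based (population) versions, which is one of several conventions in use.

```python
import statistics

def describe(data):
    """Return basic descriptive measures for a list of numbers."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    # Third and fourth central moments (population versions)
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,                        # measure of location
        "median": statistics.median(data),   # measure of location
        "std_dev": sd,                       # measure of dispersion
        "skewness": m3 / sd ** 3,            # measure of skewness
        "kurtosis": m4 / sd ** 4,            # measure of kurtosis
    }

marks = [35, 42, 45, 45, 50, 53, 60]  # hypothetical marks of seven students
summary = describe(marks)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A perfectly symmetric data set such as 1, 2, 3, 4, 5 gives a skewness of zero under this definition.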

ii. Inferential statistics: The part of statistics concerned with the drawing of conclusions is called
inferential statistics. In inferential statistics, samples are taken from the population in such a way
that the drawn sample can represent the entire population. Different statistical techniques are
used to draw valid conclusions on the basis of the statistical measures calculated from the
sample data, so that the conclusions can be representative of the whole data.
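
As a rough illustration of drawing a conclusion about a population from a sample, the sketch below takes a simple random sample and builds an approximate 95% confidence interval for the population mean. Everything here is an assumption made for the example: the population is simulated, the sample size of 100 is arbitrary, and the interval uses the normal approximation (z = 1.96), which is only one of many inferential techniques.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 values (simulated, not real data)
population = [random.gauss(50, 10) for _ in range(10_000)]

# Simple random sample drawn without replacement
sample = random.sample(population, 100)

n = len(sample)
sample_mean = statistics.fmean(sample)
std_error = statistics.stdev(sample) / n ** 0.5  # estimated standard error

# Approximate 95% confidence interval (normal approximation)
ci_low = sample_mean - 1.96 * std_error
ci_high = sample_mean + 1.96 * std_error

print(f"sample mean     = {sample_mean:.2f}")
print(f"95% CI          = ({ci_low:.2f}, {ci_high:.2f})")
print(f"population mean = {statistics.fmean(population):.2f}")
```

In practice the population mean is unknown; it is printed here only so that the estimate can be compared with the value it is trying to recover.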

2. Applied statistics: This consists of the large-scale application of theoretical or mathematical statistics
in different areas. Statistical tools and methods are used to solve many
practical problems in diversified areas. Besides that, applied statistics is used in decision-making
problems.

Limitations of statistics:
Although statistics has a wide field of application, it has some limitations. Some of these
limitations are as follows:
i. Statistics doesn’t deal with individuals.
ii. Statistics doesn’t deal with qualitative characteristics.
iii. Statistical laws are not exact.
iv. Statistics can be misused.

i. Statistics doesn’t deal with individuals: Statistics deals with the aggregate of facts and
doesn’t give any specific recognition to individuals. For example, the statement that the
mark of one student in probability and statistics is 45 doesn’t constitute statistics, but
the statement that the average mark of a group of students in probability and statistics
is 45 forms statistics.

ii. Statistics doesn’t deal with qualitative characteristics: Qualitative characteristics like
honesty, intelligence, kindness, efficiency etc., which cannot be expressed in numbers, are
not directly studied by statistics. However, it is possible to analyze such problems
statistically by expressing them in numbers. For example, we can study the
intelligence of students on the basis of their test grades.

iii. Statistical laws are not exact: Statistical laws and rules do not hold good in every
case. However, they are true in the majority of cases. Generally, statistical laws are
probabilistic in nature. It is also said that statistics itself is not an absolute measure: it
provides results that are as precise as possible by minimizing error.

iv. Statistics can be misused: Only a statistician can handle statistical data properly.
Statistics is likely to be misused by non-statistical persons in handling data and
interpreting the results.

Scales of measurement
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale

1. Nominal scale: It is the simplest type of scale, also known as the categorical scale. It is the lowest
level of measurement. It is simply a system of assigning numbers or symbols to objects
or events to distinguish one from another or to label them. The symbols or
numbers have no numerical meaning. Arithmetic operations cannot be used on these
numerals.
For example, gender, occupation and religion are measured on a nominal scale. If we use 1 for
male and 2 for female for measuring gender, then 1 and 2 have no numeric meaning.

2. Ordinal scale: In this scale, the numerals are arranged in some order, but the gaps between
the positions of the numerals are not necessarily equal. It is used to rate the preferences of
respondents. It represents qualitative values in ascending or descending order.
For example, if the characteristic under study is the attitude of people towards a certain fact,
such as positive, negative and bad, then we may assign the numbers 1 for positive, 2 for negative
and 3 for bad. These numbers are known as ranks.

3. Interval scale: In addition to ordering the data, this scale uses equidistant units to measure
the difference between scores. It assumes data have equal intervals. The intervals
between the ordered numerals are adjusted in terms of some rule.
For example, the scale of temperature is an example of an interval scale. For an increase in
temperature from 32°F to 42°F and from 64°F to 74°F, we can say the increases are equal, at
10°F each, but one cannot say that a temperature of 64°F is twice as warm as a temperature of
32°F. This means there is no true zero; the scale possesses only an arbitrary zero.

4. Ratio scale: The ratio scale is an extension of the interval scale. It includes all the properties of
the interval scale. This scale also has a true zero point. Physical scales of time, length,
breadth, weight etc. can be considered simple examples of the ratio scale. Thus, for
example, we can say that 40 seconds is twice as long as 20 seconds in a certain
measurement of time. Mathematical operations like addition, subtraction, multiplication
and division can be performed.
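
The difference between the interval and ratio scales can be checked numerically. The sketch below reuses the temperatures and times from the examples above; the helper name `fahrenheit_to_kelvin` and the choice of Kelvin as the comparison scale (it has a true zero) are assumptions made for illustration.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvin (a scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15

# Interval scale: differences are meaningful and comparable
diff_low = 42 - 32
diff_high = 74 - 64
print(diff_low == diff_high)  # both increases are 10 degrees F

# ...but ratios are not: the "2x" in Fahrenheit depends on the arbitrary zero
ratio_f = 64 / 32
ratio_k = fahrenheit_to_kelvin(64) / fahrenheit_to_kelvin(32)
print(f"ratio in Fahrenheit: {ratio_f:.3f}")
print(f"ratio in kelvin:     {ratio_k:.3f}")

# Ratio scale: the ratio stays the same whatever unit is used
ratio_seconds = 40 / 20
ratio_millis = (40 * 1000) / (20 * 1000)
print(ratio_seconds == ratio_millis)
```

The Fahrenheit ratio changes when the zero point changes, whereas the ratio of two durations is the same in seconds and in milliseconds.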

Types of data
The first step in the statistical approach to a problem is the collection of numerical information,
i.e. data. Data are the raw materials for final statistical conclusions. In statistics,
the main source is the data. To start any statistical work, we need information. This
information is data.
There are mainly two types of data on the basis of collection procedures.
i. Primary data
ii. Secondary data

i. Primary data: Data which are originally collected by an investigator or
researcher for the first time for the purpose of a statistical enquiry are called
primary data. They are collected by governments, individuals, institutions and
research bodies. They need more funds, time and manpower. They are more reliable
and suitable. Different methods used for collecting primary data are as
follows:
a) Direct personal interview
b) Indirect oral interview
c) Mailed questionnaire
d) Information through correspondents
e) Schedule sent through enumerator

ii. Secondary data: Data that have already been collected for a particular
purpose and are used for another purpose are called secondary data. They are not new
and original data. These types of data are generally published in
newspapers, magazines, bulletins, reports, journals, websites, radio etc.
Secondary data sources are broadly classified into two types:
i. Published source
ii. Unpublished source

Cross Section data
Data collected at a single point in time is called cross section data. It is a snapshot of information
at a particular point in time. This type of data cannot describe changes over time or cause-and-effect
relationships in which one variable affects another variable. For example, the population of
children in census year 2068.

Time series data
Data recorded over different periods of time is called time series data. In this
case the same measurements are recorded on a regular basis, i.e. daily, weekly, monthly, quarterly, half
yearly, yearly etc. E.g. the population of Nepal in census years 2048, 2058, 2068.

Failure time data
Data in which each unit is followed up and recorded until the occurrence of an event, or until the
unit fails, is called failure time data. Units with similar characteristics are taken and followed up
until the occurrence of the event. It is also called time-to-event data. Such data are obtained from clinical
studies, cancer studies, biomedical sciences etc.

Panel Data
It is longitudinal data, also called cross-sectional time series data. It is data related to the behavior of entities
observed across time. Data on the same individuals are recorded repeatedly over a number of years. E.g.
income of persons X and Y in years 2013, 2014 and 2015 according to age and qualification.
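
The relationship between cross section, time series and panel data can be sketched with the X and Y example above. All income figures below are hypothetical, and the helper names `cross_section` and `time_series` are made up for this illustration.

```python
# Panel data: (person, year) -> income. All figures are hypothetical.
panel = {
    ("X", 2013): 30000, ("X", 2014): 32000, ("X", 2015): 35000,
    ("Y", 2013): 45000, ("Y", 2014): 46000, ("Y", 2015): 50000,
}

def cross_section(panel, year):
    """All entities observed at one point in time."""
    return {person: value for (person, y), value in panel.items() if y == year}

def time_series(panel, person):
    """One entity observed over successive periods, ordered by year."""
    return {y: value for (p, y), value in sorted(panel.items()) if p == person}

print(cross_section(panel, 2014))  # snapshot of X and Y in 2014
print(time_series(panel, "X"))     # X's income across 2013 to 2015
```

Fixing the year gives a cross section, fixing the person gives a time series, and keeping both dimensions gives the panel.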

Population
It is the totality of units or items under study belonging to a particular class or group, e.g. children
in a school, patients in a hospital, fruits on a tree, fishes in a pond etc. A census survey is conducted
to enumerate all the population units. A population can be divided into finite and infinite
populations according to the number of individuals belonging to the group.

Finite Population
A population containing a countable number of individuals is called a finite population, e.g. vehicles in
a workshop, customers in a shopping mall, passengers in a vehicle etc.

Infinite Population
A population containing an unlimited number of individuals is called an infinite population, e.g. fishes in
an ocean, stars in the sky etc.
A population can be further divided into homogeneous and heterogeneous according to the type of
individuals in the population.

Homogeneous Population
A population consisting of individuals of the same type is called a homogeneous population, e.g.
a population of graduate students.

Heterogeneous Population
A population consisting of individuals of different types is called a heterogeneous population. It
contains sub-populations of different types, e.g. the population of the United States.

Application of Statistics in the Field of CSIT


Statistics responds appropriately to the new demands of research and development in various
areas of experimental sciences. With the advances in computer technology, many innovative
techniques for use in applied statistics have recently been developed. In medicine, agriculture,
business and government, these applications have solved important problems. Performing
calculations almost at the speed of light, the computer has become one of the most useful
research tools in modern times. Computers are ideally suited for data analysis concerning large
research projects. Researchers are essentially concerned with huge storage of data, their faster
retrieval when required and the processing of data with the aid of various techniques. In all these operations,
computers are of great help. Their use, apart from expediting the research work, has reduced human
drudgery and added to the quality of research activity.
Computers can perform many statistical calculations easily and quickly. Computation of
means, standard deviations, correlation coefficients, ‘t’ tests, analysis of variance, analysis of
covariance, multiple regression, factor analysis and various nonparametric analyses are just a few
of the programs and subprograms that can be run very easily using a computer. Similarly,
linear programming, multivariate analysis, Monte Carlo simulation etc. can also be
done very easily using a computer. Software packages are readily available for the various simple and complicated
analytical and quantitative techniques that researchers generally make use of. The only
work a researcher has to do is to feed in the data he/she gathered after loading the operating
system and the particular software package on the computer.
The output, that is to say the result, will be ready within seconds or minutes depending upon the
quantum of work. The storage facility of computers provides statisticians with immense help in using
the stored data whenever it is required. Large volumes of data can be processed and analyzed with
greater ease and speed. Moreover, the results obtained are generally correct and reliable. Not
only this, even the design, online data collection, pictorial graphing and reports are being
developed with the help of computers.
Hence, researchers should be given computer education and be trained along these lines so that they
can use computers for their research work.
Researchers interested in developing skills in computer data analysis must be aware of the
following steps:
(i) data organization and coding
(ii) storing the data in the computer
(iii) selection of appropriate statistical measures/techniques
(iv) selection of appropriate software package
(v) execution of the computer program.

First of all, the researcher must pay attention to data organization and coding prior to the input
stage of data analysis. If data are not properly organized, the researcher may face difficulty while
analyzing their meaning later on. For this purpose, the data must be coded. Categorical data need
to be given a number to represent them. Once the data are coded, they are ready to be stored in the
computer.
Input devices may be used for this purpose. After this, the researcher must decide on the appropriate
statistical measures he/she will use to analyze the data. He/she will also have to select the appropriate
program to be used. SPSS, SAS, STATA etc. are special-purpose statistical packages,
whereas Microsoft Excel can be used for simple statistical analysis.
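
The steps above can be sketched in a general-purpose language as well. The following is a minimal illustration in Python, not a substitute for the packages named above; the records are hypothetical, and the coding scheme (1 for male, 2 for female) follows the nominal-scale example given earlier.

```python
import statistics

# Step (i): organize and code the data. Records are hypothetical.
records = [
    {"gender": "male", "score": 62},
    {"gender": "female", "score": 70},
    {"gender": "female", "score": 74},
    {"gender": "male", "score": 58},
]
GENDER_CODES = {"male": 1, "female": 2}  # nominal codes, no numeric meaning

# Step (ii): store the coded data (here, simply as a list of tuples)
coded = [(GENDER_CODES[r["gender"]], r["score"]) for r in records]

# Steps (iii) to (v): choose a measure (the mean) and execute the computation
def group_mean(coded_rows, code):
    """Mean score of all rows whose category code equals `code`."""
    return statistics.fmean([score for g, score in coded_rows if g == code])

print("mean score, code 1:", group_mean(coded, 1))
print("mean score, code 2:", group_mean(coded, 2))
```

The codes are used only to select groups; averaging the codes themselves would be meaningless because gender is measured on a nominal scale.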
