Lecture 2

Probability and Statistics

Credit hours (3+0)


Course Code: - BS-301

Dr. Bulbul Jan


Introduction to Statistics

Chapter 01

What is Statistics?
• The word “statistics” comes from the Latin word status, meaning
a political state.
• It originally meant information useful to the state, for example:
• information about the sizes of populations, i.e. a census,
• the size of the armed forces,
• calculations of wealth, etc.

History
• The word “statist” was used in 1584 for a person skilled in state affairs, having
political knowledge, power or influence. Sir William Petty, a seventeenth-century
polymath and statesman (see the book entitled Sir William Petty), used the phrase
“political arithmetic” for “statistics”.
• It was used by kings and politicians.
• That is why it was termed the Science of Kings, Politicians and Statecraft.
• But this word has now acquired different meanings.
• In its early stage, statistics was used by the state to collect information on public
affairs for administration.
• In current use, statistics is the area of study that aims to collect and arrange numerical
data, whether relating to human affairs or to natural phenomena.
• Gradually it was extended to all scientific experiments, for the collection, presentation
and analysis of data.
Definition
Statistics is a mathematical science involving the collection,
measurement, enumeration or estimation, analysis, interpretation and
presentation of data on natural or social phenomena. Through the
application of various tools and techniques, raw data becomes meaningful
and yields information for decision making.
OR
• It is a branch of applied mathematics that helps us to collect, analyze
and present data systematically.
OR
• It is the process of collecting, processing, summarizing, presenting,
analyzing and interpreting data in order to study and describe a given
problem.
Defined by Scientists

• “Statistics is the science of counting.” - A. L. Bowley
• “Statistics is the collection of noteworthy facts
concerning states, both historical and descriptive.” - Achenwall
• “Statistics is the study that deals with the collection,
classification, and tabulation of numerical facts as the
basis for explanation, description and comparison of
phenomena.” - Holtzclawn
Importance of Statistics
Statistics is a subject used by everybody. The following functions show the
importance of statistics:
• It helps us to summarize large sets of data in a form which is easily
understandable.
• It helps to get solid information about any problem.
• It supports reliable and objective decision making.
• It presents facts in a precise and definite form.
• It facilitates comparison and prediction of data.
• It helps in the formulation of suitable policies.
• It assists in designing experiments and surveys in a field.
• It assists in planning in any field of inquiry.
• It helps us in drawing conclusions and in predicting how much of a thing will
happen under given conditions.
Limitations of Statistics

• Statistics does not provide information about individual
particulars; it is concerned only with average or aggregate facts.
• Statistics deals with quantitative items, not qualitative ones.
• Statistical laws are not perfectly accurate. Because of this limitation,
the results obtained are true only on average or in the long
run.
• Its application is limited to disciplines of knowledge where the study of
quantitative phenomena is required.
Applications in Real life
• Firstly, statistics refers to numerical facts systematically arranged,
• such as statistics of prices,
• statistics of roads,
• statistics of accidents,
• statistics of crimes,
• statistics of births, etc.
• In all the above examples, the word statistics denotes a set of numerical
“DATA” in the respective fields.

• Statistical techniques, being powerful, are used in almost
every branch of learning. The biological and physical sciences, such as
Genetics, Agronomy, Astronomy, Physics and Geology, are the
main areas.
• A businessman, an industrialist and a research worker all employ
statistical methods in their work. Banks, insurance companies
and governments all have their statistics departments.
• Administrators in the public and private sectors use it to provide a
factual basis for decisions.

Application of Statistics in Science & other fields
• Economics: The development of statistical methods such as
analysis, observation, and the drawing and verification of conclusions,
together with the expansion of statistical data, has brought economics
and statistics very close to each other.
• Mathematics: Mathematics is the queen of all the sciences.
Mathematics is so widely applied in statistics that one branch of
statistics is called Mathematical Statistics. The theories of
sampling, insurance and the normal law of error all depend on
mathematics.

• Agriculture: Agriculture has benefited greatly from statistics.
The analysis of variance, correlation and regression are of
much importance in studying the effects of temperature, rainfall,
fertilizer and sunshine on crops. Nowadays the field of agriculture
is incomplete without statistics.
• Astronomy: Statistics is deeply related to astronomy. The
method of least squares was first introduced by an
astronomer. The collection of data on planets and their movements,
and the choice of their best positions, all rely on statistical
methods.

• Meteorology: Statistics helps meteorology to average
data relating to temperature, humidity, air pressure and
annual rainfall, to describe fluctuations in temperature and
weather conditions, and to predict and forecast different
aspects of the weather.
• Banking: Banks depend on statistical information
to conduct enquiries regarding deposits, transactions,
loans, withdrawals, amounts of credit, etc.
Statistical data helps them to form banking policies by
analyzing previous records.

Applications of Statistics in Computer Science (CS) &
Artificial Intelligence (AI)
• Statistics is used for data mining, speech recognition, visualization and image
analysis, data compression, artificial intelligence, and network and traffic
modeling.
• A statistical background is essential for understanding algorithms and
statistical properties that form the backbone of computer science.
• Computer scientists tend to focus on data acquisition/cleaning, recovery,
mining, and reporting.
• They are often tasked with the development of algorithms for prediction and
systems efficiency.

• Focus is also placed on machine learning (an aspect of artificial
intelligence), particularly for the purposes of data mining (finding
patterns and associations in data for a variety of purposes, such as
marketing and finance).
• Probability and Statistics for CS treats the most common discrete and
continuous distributions, showing how they find use in decision and
estimation problems, and constructs computer algorithms for generating
observations from the various distributions.
• Probability is everywhere in CS. In networks and systems, it is a key
tool that allows us to predict performance, to understand how delay
changes with the system parameters, and more. In algorithms,
randomization is used to design faster and simpler algorithms than their
deterministic counterparts.
• It is important for robustness analysis, measurement system error analysis,
testing of data, probabilistic risk management and many other fields in
engineering and computer science.
• AI has made remarkable progress in various fields of application. These
include automated face recognition, automated speech recognition and
translation, object tracking in film material, autonomous driving, and
strategy games such as chess or Go, where computer programs now beat the
best human players.
• Nowadays, automatic language translation systems can even translate
languages such as Chinese into languages of the European language family in
real time and are in use.
• Another growing area for AI applications is medicine. Here, AI is used, e.g.,
to improve the early detection of diseases, for more accurate diagnoses, or to
predict acute events.
IMPORTANT TERMS USED IN STATISTICS
• Data: A collection of facts is called data. These facts may
take the form of counts or measurements. A collection of
numerical facts is called statistical data.
• Array: When data is sorted in ascending or descending
order, the arrangement is called an array.
• Population: A large collection of individuals, objects or
measurements (such as products, customers, firms, employees,
prices, etc.) is called a population. In other words, a master set of
similar elements is called a population, and its size is denoted by N.
• Sample: A representative part of a population, selected from
that population, is called a sample. In other words, a sample is a
subset of the master set, or a small part of the
population. Its size is denoted by n.
• Parameter: A constant of the population is called a parameter.
For example, the average height of all students of a college is
a parameter.
• Error: An error generally means a mistake, but in
statistics the word has a different meaning. It is only the
difference between the true value and the estimated value of
the phenomenon.
• Statistic: A numerical value, such as an average, used as a
summary measure for a sample (e.g. the sample average).
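
The terms population, sample, parameter, statistic and error can be illustrated with a small Python sketch. The data here are simulated and purely hypothetical (heights of 1000 students generated at random), and the variable names are chosen only for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of all N = 1000 students of a college.
population = [random.gauss(170, 8) for _ in range(1000)]
N = len(population)

# Parameter: a constant of the population (here, the population mean).
parameter = statistics.mean(population)

# Sample: a subset of n = 50 students drawn at random from the population.
sample = random.sample(population, 50)
n = len(sample)

# Statistic: a summary measure computed from the sample (the sample mean).
statistic = statistics.mean(sample)

# Error: the difference between the estimated value and the true value.
error = statistic - parameter

print(f"N = {N}, n = {n}")
print(f"parameter (population mean) = {parameter:.2f}")
print(f"statistic (sample mean)     = {statistic:.2f}")
print(f"error                       = {error:.2f}")
```

In practice the parameter is usually unknown, which is why the statistic from a sample is used to estimate it.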
• Variable: A characteristic or phenomenon which takes on different
values for different members of the population or sample data set
(e.g. weight, monthly sales, gender of employee, income, etc.).
• Discrete Variable: A numerical variable whose values can vary
only in steps, often associated with counts (e.g. number of
employees, products, etc.).
• Continuous Variable: A numerical variable whose values, in
contrast to a discrete variable, are measured on a continuous scale
and hence are not restricted to specific, discrete values (e.g.
weights, ages, heights, etc. of objects or people).
Types of Data
From the experimental point of view, the first step in statistics is the
collection of numerical data. Data is classified into two types:
• Primary data
Data which are collected for the first time for a specific purpose and are
original in nature are known as primary data. They are also called raw data.
• Secondary data
Data which are used in an investigation but were originally collected by
someone else are known as secondary data.
• Data are primary to the collector and secondary to the user.
Cont.
Further classification of data:

• The primary difference between discrete and continuous data is
that discrete data takes countable values, whereas
continuous data has an infinite number of possible values that
are measured.
• Discrete data can only take certain values that can be counted,
e.g. the number of students in a class or the number of computers in
a computer lab.
• Continuous data comes from measuring and can take any value within a
given range, e.g. the weight of students or the daily temperature.

Branches of Statistics
• There are two main branches of statistics:
• Descriptive Statistics: This is the first phase of statistics. It provides basic
information about data, through numerical calculations, graphs or
tables. It can be used for data sets which relate to entire
populations or to samples.
• Inferential Statistics: This is the second phase of statistics. It covers the
techniques employed in making estimates and drawing conclusions about the
characteristics of a statistical “population” using results from a sample data
set. It includes hypothesis testing, determining
relationships among variables and making predictions.
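
A minimal sketch of the two branches, using hypothetical marks of 10 students. The descriptive part only summarizes the sample. The inferential part uses the sample to give a rough 95% interval for the population mean; the normal approximation with z = 1.96 is an assumption made here for simplicity and is not taken from the lecture.

```python
import math
import statistics

# Hypothetical sample: marks of 10 students drawn from a much larger class.
sample = [62, 71, 55, 80, 68, 74, 59, 66, 77, 70]
n = len(sample)

# Descriptive statistics: summarize the data we actually have.
mean = statistics.mean(sample)
median = statistics.median(sample)
sd = statistics.stdev(sample)  # sample standard deviation

# Inferential statistics: use the sample to estimate the population mean.
# Rough 95% interval using the normal approximation (z = 1.96); this is an
# assumption, and for so small a sample a t-based interval would be wider.
margin = 1.96 * sd / math.sqrt(n)
interval = (mean - margin, mean + margin)

print(f"mean = {mean}, median = {median}, sd = {sd:.2f}")
print(f"approximate 95% interval for the population mean: "
      f"({interval[0]:.2f}, {interval[1]:.2f})")
```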
What is Classification and Tabulation?
• Classification is the process of arranging data into different classes or
groups according to their resemblances and similarities.
Bases of Classification:
• The following are four important bases of classification:
• Qualitative Classification: When data is classified according to
characteristics which are not capable of quantitative
measurement, such as religion, gender, marital status, occupation, etc.
• Quantitative Classification: When data is classified according to
measurable characteristics, that is, when groups or classes are formed
according to the numerical values of the variables.
• Geographical Classification: When data is classified on the basis of
geographical regions, e.g. the temperature of different places, the
population of different cities, etc.
• Chronological Classification: When data is classified on the
basis of time, i.e. years, months, days or hours, e.g. the temperature
of a city over 7 days.
• Classification should be definite, stable and flexible.

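
The four bases of classification can be sketched in Python as follows. The records, the helper names `classify` and `income_class`, and the 10,000-wide income classes are all hypothetical and chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical records: (city, year, gender, monthly_income)
records = [
    ("Karachi", 2022, "F", 42000),
    ("Lahore", 2022, "M", 38000),
    ("Karachi", 2023, "M", 55000),
    ("Quetta", 2023, "F", 31000),
    ("Lahore", 2023, "F", 47000),
]

def classify(rows, key):
    """Group rows into classes according to key(row)."""
    classes = defaultdict(list)
    for row in rows:
        classes[key(row)].append(row)
    return dict(classes)

def income_class(row):
    """Quantitative class: the 10,000-wide income interval containing the value."""
    lower = (row[3] // 10000) * 10000
    return f"{lower}-{lower + 9999}"

by_gender = classify(records, lambda r: r[2])  # qualitative
by_income = classify(records, income_class)    # quantitative
by_city = classify(records, lambda r: r[0])    # geographical
by_year = classify(records, lambda r: r[1])    # chronological

for name, classes in [("gender", by_gender), ("income", by_income),
                      ("city", by_city), ("year", by_year)]:
    print(name, {label: len(rows) for label, rows in classes.items()})
```

The same grouping function serves all four bases; only the characteristic used as the key changes.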
• Tabulation: The process of arranging data into rows
and columns is called tabulation. Tabulation may be
simple, double or complex depending on the type of
classification.
A table should be brief and self-explanatory.
• Advantages:
1. Figures can be located quickly.
2. Comparison of figures is possible.
3. It takes less space.
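
As an illustration of tabulation, the sketch below arranges hypothetical observations (department and gender of 8 employees) into a two-way table of rows and columns with row totals. The data and the helper name `build_table` are assumptions made for this example.

```python
from collections import Counter

# Hypothetical observations: (department, gender) of 8 employees.
observations = [
    ("Sales", "F"), ("Sales", "M"), ("Sales", "M"),
    ("IT", "F"), ("IT", "F"), ("IT", "M"),
    ("HR", "F"), ("HR", "F"),
]

counts = Counter(observations)
rows = sorted({dept for dept, _ in observations})      # row labels
cols = sorted({gender for _, gender in observations})  # column labels

def build_table(counts, rows, cols):
    """Return a list of text lines forming a two-way table with row totals."""
    lines = [f"{'Dept':<8}" + "".join(f"{c:>6}" for c in cols) + f"{'Total':>7}"]
    for r in rows:
        cells = [counts[(r, c)] for c in cols]  # a missing pair counts as 0
        lines.append(f"{r:<8}" + "".join(f"{v:>6}" for v in cells)
                     + f"{sum(cells):>7}")
    return lines

table = build_table(counts, rows, cols)
print("\n".join(table))
```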
What is Grouped and Ungrouped data?
• When data is small in amount and presented in its original form, it is
called ungrouped data.
• When data is large in amount and is organized into a frequency
distribution, it is called grouped data.

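
A small sketch of turning ungrouped data into grouped data. The marks, the starting value 40 and the class width 10 are hypothetical choices, and the class labels assume whole-number data.

```python
# Hypothetical ungrouped data: marks of 20 students, in original form.
marks = [45, 52, 67, 71, 58, 63, 49, 77, 82, 55,
         61, 69, 74, 50, 66, 59, 88, 73, 64, 57]

def frequency_distribution(data, start, width):
    """Group data into classes [start, start + width), ... and count each class."""
    table = {}
    lower = start
    while lower <= max(data):
        upper = lower + width
        label = f"{lower}-{upper - 1}"  # label assumes integer data
        table[label] = sum(1 for x in data if lower <= x < upper)
        lower = upper
    return table

grouped = frequency_distribution(marks, start=40, width=10)
for label, freq in grouped.items():
    print(f"{label:>7}  {freq}")
```

The frequencies add up to the number of observations, so no value is lost when the data are grouped, although the individual values can no longer be read from the table.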
What are the methods used to collect primary data?

• 1) Direct personal investigation: In this method an
investigator collects the information personally from the
individuals concerned by interviewing the informants.
• 2) Collection through questionnaires: A questionnaire is an
inquiry form comprising a number of relevant questions with
space for entering the information asked for. The questionnaires are
sent by mail, published in newspapers or displayed on websites.
This method is cheap, and we can collect information from a
large population. The questionnaire should be simple, brief and
easy for all respondents to answer.
What are the methods used to collect secondary data?

• Secondary data may be obtained from:
• official publications and journals of the statistical
division,
• public offices of government,
• ministries of finance,
• food, agriculture, population planning, labour and education
departments, etc.,
• State Bank of Pakistan, private banks, semi-government
organizations,
• hospitals, district government offices, union council
offices, etc.,
• trade associations, journals and newspapers,
research institutes, universities, etc.