Introduction & Basic Concepts in Statistics

Statistics is used in business and economics. It plays
an important role in the exploration of new markets
for a product, forecasting of business trends, control
and maintenance of high-quality products, improvement
of employer-employee relationships, and analysis of data
concerning insurance, investment, sales, employment,
transportation, communications, auditing and
accounting procedures.
STATISTICS is the branch of mathematics that deals
with the theory and method of collecting, organizing,
presenting, analyzing and interpreting data.
Two Main Divisions/Phases of Statistics

1. DESCRIPTIVE STATISTICS refers to summary statistics that
quantitatively describe or summarize features of a collection of
data under investigation. The goal is to describe: numerical measures
are used to tell about features of a set of data.
Examples:
• The average, or measure of the center of a data set, such as the mean, median, mode, or midrange
• The spread of a data set, which can be measured with the range or standard deviation
• Overall descriptions of data such as the five-number summary
• Measurements such as skewness and kurtosis
• The exploration of relationships and correlation between paired data
• The presentation of statistical results in graphical form
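The descriptive measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the data set is hypothetical, and the quartiles use the "inclusive" convention, which is only one of several conventions found in textbooks.

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of center
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
midrange = (min(data) + max(data)) / 2

# Measures of spread
data_range = max(data) - min(data)
std_dev = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

# Five-number summary: minimum, Q1, median, Q3, maximum.
# Quartile conventions differ between textbooks; "inclusive" is one common choice.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, q2, q3, max(data))

print("center:", mean, median, mode, midrange)
print("spread:", data_range, round(std_dev, 4))
print("five-number summary:", five_number_summary)
```

Each value only describes the data in hand; nothing here generalizes beyond the seven observations.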

2. INFERENTIAL STATISTICS refers to statistical tools that are used to
examine the relationships between variables within a sample and then
make generalizations or predictions about how those variables will
relate in a larger population.
Examples:
• A confidence interval gives a range of values for an unknown population parameter, estimated from
a statistical sample. It is expressed as an interval together with the degree of
confidence that the parameter lies within the interval.
• Tests of significance, or hypothesis tests, in which scientists evaluate a claim about the population
by analyzing a statistical sample. By design, there is some uncertainty in this process, which
is expressed in terms of a level of significance.
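As an illustration of the confidence interval idea, the sketch below estimates a population mean from a sample. The sample values are hypothetical, and the interval uses a normal (z) approximation for simplicity; with a sample this small, a t-based interval would normally be preferred.

```python
import math
import statistics

# Hypothetical sample of 10 measurements, used only for illustration.
sample = [52, 48, 50, 47, 53, 51, 49, 50, 54, 46]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

confidence = 0.95
# Critical value from the standard normal distribution (about 1.96 for 95%).
z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error = critical value x standard error of the mean.
margin_of_error = z * sample_sd / math.sqrt(n)
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"{confidence:.0%} confidence interval: ({lower:.2f}, {upper:.2f})")
```

The interval is a statement about the unknown population mean, not about the sample, which is what makes it inferential rather than descriptive.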
Two Branches of Statistics
1. Statistical Theory – is concerned with the formulation of
theories, principles, and formulas which are used as bases
in the solution of problems related to Statistics.
2. Statistical Methods – is concerned with the application of
the theories, principles and formulas in the solution of
everyday problems.
OTHER STATISTICAL TERMS:
• POPULATION – a set of data consisting of all conceivable observations of a certain
phenomenon. It refers to the totality of the observations. The population size is denoted by capital N.
• SAMPLE – a finite number of items selected from a population, ideally possessing the same
characteristics as the population from which it was taken. The sample size is denoted by
small letter n.
• PARAMETERS – characteristics/measures computed from the population
• STATISTIC/S – characteristics/measures computed from the sample

• VARIABLE – refers to a fundamental quantity that changes in value
from one observation to another within a given domain and under a
given set of conditions. Variables may be represented by the letters
X, Y, etc.
• DISCRETE VARIABLE – a variable whose value is obtained by
counting.
• CONTINUOUS VARIABLE – a variable whose value is obtained by
measuring.
• CONSTANT – refers to fundamental quantities that do not change in
value.
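The difference between a parameter and a statistic can be seen by computing the same measure twice, once on a whole population and once on a sample drawn from it. The population below is hypothetical and generated at random purely for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: the ages of N = 1000 people.
population = [random.randint(18, 65) for _ in range(1000)]
N = len(population)

# A sample of n = 50 items drawn from that population.
sample = random.sample(population, 50)
n = len(sample)

parameter = statistics.mean(population)  # computed from the population
statistic = statistics.mean(sample)      # computed from the sample

print(f"N = {N}, population mean (parameter) = {parameter:.2f}")
print(f"n = {n}, sample mean (statistic) = {statistic:.2f}")
```

The parameter is fixed for a given population, while the statistic changes from one sample to the next.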
Four Levels of Data Measurement
Nominal – also called the categorical variable scale, is defined as a scale
used for labeling variables into distinct classifications; it doesn’t
involve a quantitative value or order. This scale is the simplest of the
four variable measurement scales. Arithmetic on these variables
is not meaningful because the options have no numerical value (ex. sex,
gender, place of residence, political affiliation).
Ordinal – a variable measurement scale used to depict the order
of variables but not the difference between them.
These scales are generally used to depict non-mathematical ideas such
as frequency, satisfaction, happiness, a degree of pain, etc.
• The ordinal scale maintains descriptive qualities along with an intrinsic order but has
no origin of scale, so the distance between variables can’t be
calculated. Descriptive qualities indicate tagging properties similar to the nominal
scale; in addition, the ordinal scale gives the relative position of variables.
Because the scale has no origin, there is no fixed start or “true zero”.
• Examples:
• High school class ranking: 1st, 9th, 87th…
• Socioeconomic status: poor, middle class, rich.
• The Likert Scale: strongly disagree, disagree, neutral, agree, strongly agree.
• Level of Agreement: yes, maybe, no.
• Time of Day: dawn, morning, noon, afternoon, evening, night.
• Political Orientation: left, center, right.

• Interval Scale is defined as a numerical scale where the order of the variables is
known as well as the difference between these variables. Variables that have
familiar, constant, and computable differences are classified using the interval
scale. The name is a reminder of the scale’s primary role: ‘interval’ indicates
‘distance between two entities’, which is what the interval scale measures.
• These scales are effective as they open doors for the statistical analysis of
provided data. Mean, median, or mode can be used to calculate the central tendency
in this scale. The only drawback of this scale is that there is no pre-decided starting
point or true zero value.
• Interval scale contains all the properties of the ordinal scale and, in addition,
offers a calculation of the difference between variables. The main characteristic of
this scale is the equidistant difference between objects.
Interval Scale Examples
• Apart from the temperature scale, time is also a very common example of an interval
scale as the values are already established, constant, and measurable.
• Calendar years and time also fall under this category of measurement scales.
• There are situations where attitude scales are considered to be interval scales:
the Likert scale, Net Promoter Score, Semantic Differential Scale, Bipolar Matrix
Table, etc. are often treated as interval scales in practice.
• Celsius temperature
• Fahrenheit temperature
• IQ (intelligence scale)
• SAT scores
• Time on a clock with hands
• Ratio Scale: 4th Level of Measurement
• The ratio scale is defined as a variable measurement scale that not only produces the order of
variables but also makes the difference between variables known, along with
information on the value of true zero. It assumes that the variables
have an option for zero, that the difference between adjacent values is the same, and
that there is a specific order between the options.
• With a true zero, a wide range of inferential and descriptive analysis techniques
can be applied to the variables. The ratio scale does
everything that the nominal, ordinal, and interval scales can do, and it also establishes the
value of absolute zero. The best examples of ratio scales are weight and height. In
market research, a ratio scale is used to calculate market share, annual sales, the
price of an upcoming product, the number of consumers, etc.
• Examples of Ratio scale
• Age
• Weight
• Height
• Sales figures
• Ruler measurements
• Income earned in a week
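The four levels differ in which operations make sense on the data. The sketch below pairs each level with one operation that is meaningful at that level; all values are hypothetical and chosen only for illustration.

```python
import statistics

# Nominal: labels only, so the mode (most frequent label) is the usual summary.
residence = ["city", "town", "city", "village", "city"]
most_common = statistics.mode(residence)

# Ordinal: order matters, so a median can be found once the order is fixed.
likert_order = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
responses = ["agree", "neutral", "strongly agree", "disagree", "agree"]
ranked = sorted(responses, key=likert_order.index)
median_response = ranked[len(ranked) // 2]

# Interval: differences are meaningful, but ratios are not,
# because 0 degrees Celsius does not mean "no temperature".
celsius_a, celsius_b = 10.0, 20.0
difference = celsius_b - celsius_a

# Ratio: a true zero makes statements like "twice as heavy" meaningful.
weight_a, weight_b = 40.0, 80.0
ratio = weight_b / weight_a

print(most_common, median_response, difference, ratio)
```

Each level supports the operations of the levels before it, so a ratio variable such as weight also has a mode, a median, and meaningful differences.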

STEPS IN A STATISTICAL INQUIRY OR INVESTIGATION
Start with a problem, then proceed through the following steps:
1. Collection of data
2. Presentation of data
3. Analysis of data
4. Interpretation of data
Data Collection and Data Presentation
What are DATA?
• Data are plain facts, usually raw numbers, words, measurements,
observations or just descriptions of things. Think of a spreadsheet full
of numbers with no meaningful description. In order for these
numbers to become information, they must be interpreted to have
meaning.
TWO TYPES OF DATA
1. QUALITATIVE DATA is descriptive in nature, ex. color, shape
2. QUANTITATIVE DATA is numerical information, ex. weight, height

DATA COLLECTION
• Data collection is concerned with the accurate gathering of data;
although methods may differ depending on the field, the emphasis is always on
ensuring accuracy. The primary goal of any data collection is to
capture quality data or evidence that easily translates to rich data
analysis that may lead to credible and conclusive answers to questions
that have been posed.
METHODS OF DATA COLLECTION
1. THE INTERVIEW or DIRECT METHOD
The researcher or interviewer gets the needed data
from the respondent or interviewee verbally and directly,
through face-to-face contact.
2. THE QUESTIONNAIRE or INDIRECT METHOD
The questionnaire is a tool for data gathering and
research that consists of a set of questions of
different types, used to collect
information from the respondents for the purpose of
either a survey or a statistical study.
3. REGISTRATION METHOD
This method is used by government agencies, for example the records of births at the
Philippine Statistics Authority (PSA) and the registration records at the COMELEC.
4. OBSERVATION
This method is a way of collecting data through observing. The observer gains
firsthand knowledge by being in and around the social setting that is being
investigated.
5. EXPERIMENTATION
An experiment is a procedure carried out to support, refute, or validate a
hypothesis. An experiment is a method that most clearly shows cause-and-effect
because it isolates and manipulates a single variable, in order to clearly show its
effect.
DATA PRESENTATION
Once data has been collected, it has to be classified and organized in such a way that it becomes
easily readable and interpretable, that is, converted to information.
TYPES OF DATA PRESENTATION

1. TEXTUAL PRESENTATION
This type of presentation combines text and figures in a statistical report.
Example: news item in the newspaper
2. TABULAR PRESENTATION
This type of presentation uses tables consisting of vertical columns and horizontal rows
with headings describing these rows and columns. The data are presented in a more concise and
orderly manner.
Example: frequency table
3. GRAPHICAL PRESENTATION
This is one of the most effective means of presenting statistical data because important
relationships are brought out more clearly in graphs.
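A frequency table, the example given for tabular presentation, can be built from raw data by counting how often each category occurs. The sketch below uses `collections.Counter` on hypothetical survey responses.

```python
from collections import Counter

# Hypothetical survey responses, used only for illustration.
responses = ["yes", "no", "yes", "maybe", "yes", "no", "yes", "maybe"]

frequency = Counter(responses)
total = sum(frequency.values())

# Print a small table: category, frequency, relative frequency.
print(f"{'Response':<10}{'Frequency':>10}{'Percent':>10}")
for category, count in frequency.most_common():
    print(f"{category:<10}{count:>10}{count / total:>10.1%}")
print(f"{'Total':<10}{total:>10}{1:>10.1%}")
```

The column headings and the total row are what turn the raw counts into a readable table.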
DIFFERENT TYPES OF GRAPHS COMMONLY USED IN DATA PRESENTATION
1. BAR GRAPH
A bar chart or bar graph is a chart or graph that presents categorical data with
rectangular bars with heights or lengths proportional to the values that they
represent. The bars can be plotted vertically or horizontally.
2. LINE GRAPH
A line graph is a graphical display of information that changes continuously over time.
A line graph may also be referred to as a line chart. Within a line graph, there are
points connecting the data to show a continuous change. The lines in a line graph can
descend and ascend based on the data. We can use a line graph to compare different
events, situations, and information.
3. PIE GRAPH
A pie chart is a circular chart divided into wedge-like sectors, illustrating
proportion. Each wedge represents a proportionate part of the whole, and the total
value of the pie is always 100 percent.
Pie charts can make the size of portions easy to understand at a glance. They're
widely used in business presentations and education to show the proportions among a
large variety of categories including expenses, segments of a population, or answers to
a survey.

4. SCATTER DIAGRAM
A scatter diagram, also called a scatterplot, is a type of plot or
mathematical diagram using Cartesian coordinates to display values for typically two
variables for a set of data. If the points are coded (color/shape/size), one additional
variable can be displayed. The data are displayed as a collection of points, each having
the value of one variable determining the position on the horizontal axis and the value
of the other variable determining the position on the vertical axis.
5. PICTOGRAPH/PICTOGRAM
A pictograph is a chart or graph, which uses pictures to represent data. A pictograph
is one of the simplest forms of data visualization.
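A pictograph can be sketched even in plain text by repeating a symbol in proportion to each value. In the example below the data, the `*` symbol, and the key of 5 units per symbol are all assumptions chosen for illustration.

```python
# Hypothetical data: books borrowed per day.
books_borrowed = {"Mon": 20, "Tue": 35, "Wed": 15, "Thu": 30}

UNITS_PER_SYMBOL = 5  # assumed key: each symbol stands for 5 books


def pictograph_row(label, value, symbol="*"):
    """Return one row of the pictograph as text."""
    return f"{label:<4}{symbol * (value // UNITS_PER_SYMBOL)}"


for day, count in books_borrowed.items():
    print(pictograph_row(day, count))
print(f"Key: each * = {UNITS_PER_SYMBOL} books")
```

A pictograph always needs its key, since the reader cannot otherwise tell how much one picture represents.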
Two types of Sampling
• Probability sampling
• Simple random
• Systematic
• Stratified
• Cluster
• Non-probability sampling
• Convenience/Accidental
• Judgmental/Purposive
• Quota
• Snowball
Probability vs non-probability sampling
1. Probability or Random Sampling
Gives every element of the population a known, nonzero chance of being
included in the sample; in simple random sampling, the chances are equal.
2. Non-Probability Sampling
The samples are selected in a process that does not give all the
individuals in the population a known or equal chance of being selected.
Samples are selected on the basis of their accessibility or by the
purposive personal judgment of the researcher.
Probability-based Sampling
Simple Random Sampling
• Lottery Method
• Fish Bowl Method
• Table of Random Numbers
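The lottery and fish bowl methods both amount to drawing members at random without replacement. A software equivalent, sketched here with a hypothetical numbered population, is `random.sample` from Python's standard library.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: population members numbered 1 to 500.
population = list(range(1, 501))

# Draw 25 members without replacement, like drawing names from a bowl.
sample = random.sample(population, k=25)

print(sorted(sample))
```

Because the draw is without replacement, no member can appear in the sample twice.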
Systematic Sampling
Step 1. Identify the population (N)
Step 2. Identify the sample size (n) to be drawn from the
population
Step 3. Divide N by n to find the sampling interval k; every kth item is selected
Example
Population is 1,000. Desired sample size is 100. Sampling interval is 1,000 ÷ 100 = 10.
Pick a random start from 1 to 10 in the list as the first sample, then take every 10th item
in the list after it.
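The systematic sampling steps can be sketched as a short function. The function name `systematic_sample` and the numbered population are hypothetical; the interval is computed with integer division, which matches the example exactly because 1,000 is divisible by 100.

```python
import random


def systematic_sample(population, n, rng=random):
    """Select n items by taking every k-th item after a random start."""
    N = len(population)
    k = N // n                 # Step 3: sampling interval
    start = rng.randint(1, k)  # random start between 1 and k (1-based position)
    # Positions start, start + k, start + 2k, ... give exactly n items.
    return [population[i - 1] for i in range(start, start + k * n, k)]


population = list(range(1, 1001))  # N = 1,000 members numbered 1 to 1,000
sample = systematic_sample(population, n=100)

print(sample[:5], "...", len(sample), "items")
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined by the interval.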
Stratified Sampling
Used to ensure that different groups in the population are adequately represented in the sample.
Step 1. Identify the population and divide it into different groups or strata according to
chosen criteria.
Step 2. Decide on the sample size or the actual percentage of the population to be considered as sample.
Step 3. Get a proportionate sample from each group.
Step 4. Select the respondents by random sampling.
Example: Population = 2,000; desired sample size = 10% of the population
Proportion of sample per stratum = 10%
500 students x .10 = 50
600 businessmen x .10 = 60
400 teachers x .10 = 40
500 farmers x .10 = 50
Total sample = 200
Select the 200 by random sampling within each stratum.
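The stratified example can be reproduced in a few lines. The stratum sizes and the 10% figure come from the example above; the member IDs are hypothetical placeholders standing in for a real list of respondents.

```python
import random

# Stratum sizes from the example (total population = 2,000).
strata_sizes = {"students": 500, "businessmen": 600, "teachers": 400, "farmers": 500}
sampling_percent = 10

# Step 3: proportional allocation, the same percentage from every stratum.
allocation = {
    name: size * sampling_percent // 100 for name, size in strata_sizes.items()
}

# Step 4: simple random sampling inside each stratum.
sample = {}
for name, size in strata_sizes.items():
    members = [f"{name}-{i}" for i in range(1, size + 1)]  # hypothetical member IDs
    sample[name] = random.sample(members, allocation[name])

print(allocation, "total =", sum(allocation.values()))
```

Sampling within each stratum separately is what guarantees that every group appears in the sample in its correct proportion.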
Cluster Sampling
Often called geographic sampling.
Used in large-scale surveys.
The population is divided into multiple groups called clusters. The
clusters are selected with a simple random or systematic sampling
technique for data collection and data analysis.
Example: the population includes the elementary schools in a province.
The province is first divided into districts, which are treated as clusters
and are randomly selected. From the selected districts, schools are picked
at random, and then classes and finally students are selected at random.
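The province example can be sketched as a two-stage selection: districts first, then schools within the chosen districts. The number of districts and schools and their names are hypothetical; further stages (classes, students) would repeat the same pattern.

```python
import random

random.seed(3)  # fixed seed so the run is repeatable

# Hypothetical province: 8 districts (clusters), each with 5 schools.
province = {
    f"District {d}": [f"School {d}-{s}" for s in range(1, 6)]
    for d in range(1, 9)
}

# Stage 1: randomly select whole districts (the clusters).
chosen_districts = random.sample(list(province), k=3)

# Stage 2: randomly select schools within each chosen district.
chosen_schools = {
    district: random.sample(province[district], k=2)
    for district in chosen_districts
}

for district, schools in chosen_schools.items():
    print(district, "->", schools)
```

Unlike stratified sampling, only some of the groups are visited at all, which is what makes the method practical for large-scale surveys.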
Non-Probability Sampling
1. Accidental or Convenience Sampling
Researcher selects subjects that are more readily accessible or
available.
2. Purposive Sampling
Subjects are selected based on the needs of the study.
3. Quota Sampling
The researcher takes a sample that is in proportion to some characteristic or trait of the
population.
• The population is divided into groups or strata (the basis may be age, gender,
education level, race, religion, etc.).
• Samples are taken from each group to meet a quota.
• Care is taken to maintain the correct proportions representative of the population.
Example 1:
The population consists of 60% female and 40% male.
The desired sample size is 200.
Therefore, the sample should consist of ____ females and ____ males.
Example 2:
A study on science teaching is to be conducted in the high schools of a region.
There are 4,641 teachers grouped according to area of specialization.
There are 2,243 biology teachers, 1,406 chemistry teachers and 992 physics
teachers.
The desired sample size is 300.
Select the sample according to the Quota Sampling technique.
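When group sizes do not divide evenly, the quotas have to be rounded so that they still add up to the desired sample size. The sketch below shows one way to do this for the teacher exercise; the function name `quota_allocation` and the largest-remainder rounding rule are assumptions, not something the exercise prescribes.

```python
def quota_allocation(group_sizes, sample_size):
    """Split sample_size across groups in proportion to their sizes.

    Uses largest-remainder rounding (an assumed rule) so the quotas
    are whole numbers that sum exactly to sample_size.
    """
    total = sum(group_sizes.values())
    # Whole-number part of each proportional share.
    quotas = {g: size * sample_size // total for g, size in group_sizes.items()}
    # Fractional part of each share, kept as an integer remainder.
    remainders = {g: size * sample_size % total for g, size in group_sizes.items()}
    # Hand out the leftover units to the groups with the largest remainders.
    leftover = sample_size - sum(quotas.values())
    for g in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[g] += 1
    return quotas


teachers = {"biology": 2243, "chemistry": 1406, "physics": 992}
quotas = quota_allocation(teachers, 300)

print(quotas, "total =", sum(quotas.values()))
```

Once the quotas are set, the researcher fills each one non-randomly, which is what separates quota sampling from stratified sampling.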
4. Snowball Sampling
This type of sampling starts with known sources of information, who or
which will in turn point to other sources of information. As this goes on,
data accumulates.
This is used to reach socially devalued urban populations such as drug
addicts, alcoholics, child abusers and criminals because they are usually
hidden from outsiders.
