
ENGINEERING DATA ANALYSIS

CHAPTER 1.1: INTRODUCTION

Statistics is a group of methods used to collect, analyze, present, and interpret data and to make decisions.

• Descriptive statistics consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures.
• Inferential statistics consists of methods that use sample results to help make decisions or predictions about a population.

Population consists of all elements (individuals, items, or objects) whose characteristics are being studied. The population that is being studied is also called the target population.

Sample is a portion of the population selected for study.

Census is a survey that includes every member of the population.

Sample survey is the technique of collecting information from a portion of the population.

Representative sample is a sample that represents the characteristics of the population as closely as possible.

Random sample is a sample drawn in such a way that each element of the population has a chance of being selected.
• In simple random sampling, all samples of the same size selected from a population have the same chance of being selected.

Element or member of a sample or population is a specific subject or object about which the information is collected.

Observation or measurement is the value of a variable for an element.

Data set is a collection of observations on one or more variables.

Variable is a characteristic under study that assumes different values for different elements. In contrast to a variable, the value of a constant is fixed.

• Quantitative variables are variables that can be measured numerically.
  • Discrete variable: a variable whose values are countable. It can assume only certain values, with no intermediate values.
  • Continuous variable: a variable that can assume any numerical value over a certain interval (for example, height, weight, age).
• Qualitative variable: a variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories.

Cross-section data are data collected on different elements at the same time.

Time-series data are data collected on the same element for the same variable at different times or periods.

Data may be obtained from:
• Internal sources
• External sources
• Surveys and experiments

CHAPTER 1.4: NUMERICAL DESCRIPTIVE MEASURES

MEAN FOR UNGROUPED DATA:

Mean for population data: $\mu = \frac{\sum x}{N}$
Mean for sample data: $\bar{x} = \frac{\sum x}{n}$
Here $\sum x$ is the sum of all values, $N$ is the population size, and $n$ is the sample size.

Median is the value of the middle term in a data set that has been ranked in increasing order. If there is an even number of values in the data set, the median is given by the average of the two middle values. The median gives the center of a histogram, with half the data values to the left of the median and half to the right of the median. The advantage of using the median as a measure of central tendency is that it is not influenced by outliers. Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers.

Mode is the value that occurs with the highest frequency in a data set. One advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative.

• Unimodal: a data set with only one mode.
• Bimodal: a data set with two modes.
• Multimodal: a data set with more than two modes.

RELATIONSHIPS AMONG MEAN-MEDIAN-MODE

1. For a symmetric histogram and frequency curve with one peak (Figure 3.2), the values of the mean, median, and mode are identical, and they lie at the center of the distribution.
2. For a histogram and a frequency curve skewed to the right (Figure 3.3), the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two. (Notice that the mode always occurs at the peak point.) The value of the mean is the largest in this case because it is sensitive to outliers that occur in the right tail. These outliers pull the mean to the right.
3. If skewed to the left, the value of the mean is the smallest
and that of the mode is the largest, with the value of the
median lying between these two. In this case, the outliers
in the left tail pull the mean to the left.
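
The three measures of central tendency and the right-skew relationship described above can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the data values and the helper `describe_modality` are invented for illustration and do not come from these notes.

```python
import statistics

# Hypothetical right-skewed data set (values invented for illustration).
# The single large value 21 acts as an outlier in the right tail.
data = [2, 3, 3, 3, 4, 4, 5, 6, 9, 21]

mean = statistics.mean(data)        # sum of the values divided by their count
median = statistics.median(data)    # even count: average of the two middle values
modes = statistics.multimode(data)  # every value tied for the highest frequency

def describe_modality(mode_list):
    """Classify a data set by how many modes it has."""
    if len(mode_list) == 1:
        return "unimodal"
    if len(mode_list) == 2:
        return "bimodal"
    return "multimodal"

# Note: if no value repeats, multimode() returns every value; many textbooks
# then say the data set has no mode, which this sketch does not handle.
print("mean:", mean)
print("median:", median)
print("modes:", modes, "->", describe_modality(modes))
print("mode < median < mean:", modes[0] < median < mean)
```

For this data set the mode is the smallest of the three and the mean is the largest, as expected for a distribution skewed to the right. Removing the value 21 moves the mean noticeably but leaves the median almost unchanged, which is why the median is preferred when outliers are present.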

MEASURES OF DISPERSION
• Range = largest value - smallest value
• Standard deviation is the most used measure of dispersion, as it tells how closely the values of a data set are clustered around the mean. For a population it is denoted by σ, and for sample data it is denoted by s.
  • A lower value indicates that the values of the data set are spread over a relatively smaller range around the mean.
  • A larger value indicates that the values of the data set are spread over a relatively larger range around the mean.
• Variance calculated for population data is denoted by σ², and the variance calculated for sample data is denoted by s².
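
As a rough illustration of these measures, the sketch below computes the range, variance, and standard deviation with Python's standard `statistics` module. The data values are invented. The divisors (N for a population, n - 1 for a sample) are the usual textbook definitions and are an assumption here, since the notes do not spell out the formulas.

```python
import statistics

# Hypothetical data set (values invented for illustration).
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)     # largest value - smallest value

# Treating the data as a whole population: divide by N.
pop_var = statistics.pvariance(data)   # sigma squared
pop_sd = statistics.pstdev(data)       # sigma

# Treating the data as a sample: divide by n - 1.
samp_var = statistics.variance(data)   # s squared
samp_sd = statistics.stdev(data)       # s

print("range:", data_range)
print("population variance and sd:", pop_var, pop_sd)
print("sample variance and sd:", samp_var, samp_sd)
```

In both cases the standard deviation is the positive square root of the corresponding variance, and the sample values are slightly larger than the population values because of the smaller divisor.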

POPULATION PARAMETERS AND SAMPLE STATISTICS

A population parameter (or simply a parameter) is a numerical measure, such as the mean, median, mode, range, variance, or standard deviation, calculated for population data.

A sample statistic (or simply a statistic) is a summary measure calculated for sample data.
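
The difference between a parameter and a statistic can be sketched by drawing a simple random sample from a known population and comparing the two means. Everything in this example is hypothetical: the population is simulated, and the seed, the value range, and the sample size are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed, chosen arbitrarily, for repeatability

# Hypothetical population: 1000 simulated heights in centimetres.
population = [rng.randint(150, 200) for _ in range(1000)]

# Population parameter: computed from every element of the population.
mu = statistics.mean(population)

# Simple random sample without replacement: every possible sample of
# size 50 has the same chance of being selected.
sample = rng.sample(population, 50)

# Sample statistic: computed from the sample only.
x_bar = statistics.mean(sample)

print("population mean (parameter):", mu)
print("sample mean (statistic):", x_bar)
```

The parameter is fixed for a given population, while the statistic changes from one sample to the next; rerunning the sketch with a different seed gives a different sample mean.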

MEAN FOR GROUPED DATA
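
The formula under this heading did not survive in these notes. A common textbook approximation, assumed here, takes the midpoint m of each class, multiplies it by the class frequency f, and divides the total by the number of observations: mean = Σmf / Σf. The sketch below applies that approximation to an invented frequency table; the function name `grouped_mean` is hypothetical.

```python
# Hypothetical frequency table: (lower limit, upper limit, frequency).
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 2)]

def grouped_mean(table):
    """Approximate the mean of grouped data as sum(m * f) / sum(f)."""
    total_f = sum(f for _, _, f in table)                        # number of observations
    total_mf = sum(((lo + hi) / 2) * f for lo, hi, f in table)   # midpoint times frequency
    return total_mf / total_f

print("grouped mean:", grouped_mean(classes))
```

The result is an approximation because every value in a class is treated as if it were equal to the class midpoint.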

VARIANCE AND STANDARD DEVIATION FOR GROUPED DATA
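
The formulas under this heading are also missing from these notes. The sketch below assumes the usual textbook short-cut form, variance = (Σm²f - (Σmf)² / n) / d, where m is the class midpoint, f the class frequency, n = Σf, and the divisor d is n for a population and n - 1 for a sample. The frequency table and the function name `grouped_variance` are invented for illustration.

```python
import math

# Hypothetical frequency table: (lower limit, upper limit, frequency).
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 2)]

def grouped_variance(table, sample=True):
    """Approximate the variance of grouped data from class midpoints."""
    n = sum(f for _, _, f in table)
    sum_mf = sum(((lo + hi) / 2) * f for lo, hi, f in table)
    sum_m2f = sum(((lo + hi) / 2) ** 2 * f for lo, hi, f in table)
    divisor = n - 1 if sample else n   # n - 1 for a sample, n for a population
    return (sum_m2f - sum_mf ** 2 / n) / divisor

pop_var = grouped_variance(classes, sample=False)
samp_var = grouped_variance(classes, sample=True)

# The standard deviation is the positive square root of the variance.
print("population variance and sd:", pop_var, math.sqrt(pop_var))
print("sample variance and sd:", samp_var, math.sqrt(samp_var))
```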
