Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

Probability and Statistics

ENG-PST

Dr. Tarik Adnan


Email: tarikalmohamad@karabuk.edu.tr
1
Office: 164
1.2. Common-Sense
Notions of Sampling,
Collection of Data

2
1- Simple random sampling
imply = ima etmek
• Simple random sampling implies that any certain
sample of a specified sample size has the same
chance of being chosen as any other sample of the
same size

• The importance of correct sampling revolves around


the degree of confidence (güven derecesi) with
which the engineer is able to analyse the problems
being given.

3
1- Simple random sampling (cont.)
• biased sample =
yanlı örneklem,
yanlı örnek
• For example, a sample is to be chosen to answer certain
• inference = sonuç
questions regarding football team preferences in a certain çıkarmak

state in Turkey. • Confined =


sınırlanmış, mahsur

• The sample involves the choice of 1000 families, and a


survey is to be conducted. What happen if random
sampling is not used? biased sample

4
2- Stratified random sampling
• Stratified =
• The previous random sampling approach is not always the katmanlı,tabakalı

solution to each existing problem. • Stratum = tabaka

• What is the difference between stratified and simple sampling? • Non-overlapping =


- stratified random sampling involves random selection of a sample within each üst üste gelmeyen ,
stratum örtüşmeyen

Safranbolu

Karabuk
Center
5
Example of data analysis
corrosion =
aşındırma ,
• The following example illustrates the analysis of ways that korozyon

may or may not reduce the amount of corrosion Fatigue =


yorgunluk,
• The coating is a protectant (e.g. a substance that provides bitkinlik, yorulma
protection) to minimize fatigue damage in an aluminum metal hasarı

Each value is an
average of two
experimental units
(specimens)

6
Cont. Corrosion results

• A plot of the averages in previous


table is illustrated in the next figure

• From the figure, we can draw


conclusions about the impact of (1-
coating the specimens) and (2- the
humidity)

• A relatively large value of cycles to


failure represents a small amount of
corrosion.

• Note that an increase in humidity appears to make the corrosion worse. The
use of the chemical corrosion coating procedure appears to reduce corrosion.
7
1.3. Measures of Location:
The Sample Mean and
Median

8
Definition 1.1

• Measures of location are designed to provide the engineer


with some quantitative values of where the center, or some
other location, of data is located.
• One obvious and very useful measure is the sample mean.
It is basically a numerical average.

• Suppose that the observations in a sample are 𝑥1 , 𝑥2 , … , 𝑥𝑛 . The


sample mean, denoted by 𝑥ҧ is:

9
Definition 1.2
• Tendency =
• One important measure is the sample median. The purpose meyletme, eğilim
of the sample median is to reflect the central tendency of
the sample in such a way that it is not influenced by extreme
values or outliers (i.e., smallest & largest values).

• Given that the observations in a sample are


𝑥1 , 𝑥2 , … , 𝑥𝑛 , arranged in ascending \ increasing order of
magnitude, the sample median is:

10
Example

Example
• Let say, the data set 𝐴 = 1.7 , 2.2 , 3.9 , 3.11 , 14.7 . The
sample mean and median are, respectively,

11
Cont.
Fulcrum =
• The median is the value separating the higher half from the dayanma
noktası
lower half of a data sample, a population, or a probability
distribution.
• Clearly there is a difference in concept between the mean and
median. It may be of interest to the engineer that the sample
mean is the centroid of the data in a sample.
• It is the point at which a fulcrum (centroid) can be placed to
balance a system of “weights” which are the locations of the
individual data. This figure shows the sample mean as a centroid of the with-

nitrogen sample.

12
MATLAB

• In the command window of MATLAB, create a vector called


A and has 5 elements.
• Use the functions mean and median to calculate the vector’
values

%A is a matrix with only one


dimension (i.e., vector)

13
Definition 1.3
• A trimmed mean is another method to quantify the
center of data in a given sample.

• How is it calculated? is computed by trimming away a


certain percent of both the largest and the smallest set
of values and calculating the mean of the remaining
values

• For example, let’s back to our preceding example of


nitrogen & no-nitrogen data and we want to calculate the
10% trimmed mean?
14
Cont.
• We can compute the 10% trimmed mean by eliminating
the largest 10% and smallest 10% and computing the
average of the remaining values.

• Since the sample size is 10 for each sample, hence, the


10% trimmed mean for the nitrogen group is

• And the 10% trimmed mean for the no-nitrogen group is

15
Exercises
1- The following measurements were recorded for the drying
time, in hours, of a certain brand of latex paint.

Assume that the measurements are a simple random sample.


(a) What is the sample size for the above sample?
(b) Calculate the sample mean for these data.
(c) Calculate the sample median.
(d) Plot the data by way of a dot plot.
(e) Compute the 20% trimmed mean for the above data set.
(f) Is the sample mean for these data more or less descriptive
as a center of location than the trimmed mean? 16
Exercises
2- According to the journal Chemical Engineering, an important
property of a fiber is its water absorbency. A random sample of 20
pieces of cotton fiber was taken and the absorbency on each
piece was measured. The following are the absorbency values:

(a) Calculate the sample mean and median for the above
sample values.
(b) Compute the 10% trimmed mean.
(c) Do a dot plot of the absorbency data.
(d) Using only the values of the mean, median, and trimmed
mean, do you have evidence of outliers in the data? 17
Samples’ Variability Measures

• The magnitude of the variability among the observations in the


sample may determine the success of a particular statistical
method. It plays an important role in data analysis x o

• In fact, it appears that the variability


within the nitrogen sample is larger than
that of the no-nitrogen sample

18
Cont.
• As another example, contrast the two data sets below. Each
contains two samples and the difference in the means is
roughly the same for the two samples, but data set B seems
to provide a much sharper contrast between the two
populations from which the samples were taken..

19
Sample Range & Sample Standard Deviation

Variance
Variability

Standard deviation

• There are many measures of spread or variability. The


simplest one is the sample range = 𝑋𝑚𝑎𝑥 − 𝑋𝑚𝑖𝑛 .

• The range can be very useful on statistical quality control.

• Another quantity is the sample measure of spread that is used


most often is the sample standard deviation. Let’s learn their
definitions 20
Definition: Variance & sample standard
deviation
• Assume 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 denote sample values.

• Large variability in a data set produces relatively large


2 and
values of 𝑥 − 𝑥ҧ thus a large sample variance. The
quantity 𝑛 − 1 is called the degrees of freedom associated
with the variance estimate 21
Exercises
• Solve the following exercises
1- Compute the sample variance and standard deviation of the
data set (5, 17, 6, 6).

2- A researcher is interested in testing the “bias” in a pH meter.


Data are collected on the meter by measuring the pH of a
neutral substance (pH = 7.0). A sample of size 10 is taken, with
results given by:

What is the sample mean 𝑥ҧ , the sample variance 𝑠 2 and


sample standard deviation? 22
MATLAB

• Create a vector and compute its variance & standard


deviation.

• Use the functions var & std to calculate the vector’s values

23
MATLAB

• Now create a matrix A and compute its variance and


standard deviation .
• Use the function var & std to calculate the matrix’s values

24
Thank you

Feel free to ask questions in the class or


through Microsoft Team.

25 25

You might also like