Professional Documents
Culture Documents
STA124 Complete Note (Edward cares)
STA124 Complete Note (Edward cares)
Example 2: A coin is tossed twice (or equivalently, two coins are tossed together
once) and the number of Heads is counted. The sample space for this experiment is
S = {HH, HT, TH, TT}. We can assign the real number 2 to the outcome HH, the
real number 1 to the outcome HT and TH, and the real number 0 to the outcome TT.
Several representations of this assignment are possible:
If we define the random variable Y as the number of heads, then we can have the
functional notation of this as Y : S R , and the rule of assignment defined by
Y(HH) = 2
Y(HT) = Y(TH)=1
Y(TT) = 0
Range Space: Clearly, X and Y in the above examples are mappings of elements
of the sample space S into the set of real numbers R. Therefore, R is called the
range space. The range space for example 1 is R = {0, 1} and for example 2 is R =
{0, 1, 2}
Discrete random variables
X will be referred to as a discrete random variable if the range space Rx contains
finite or countably infinite numbers of points. Examples 1 and 2 above are discrete
random variables
If we define the random variable X as the number of defectives, then the real number
assignment can be given as follows
X(DDD) = 3
X(DDN) = 2
X(DND) = 2
X(NDD) = 2
X(DNN) = 1
X(NDN ) = 1
X(NND) = 1
X(NNN) = 0
The range space is thus R = {0, 1, 2, 3}
Example 4: Consider an experiment involving the toss of a fair die twice (or tossing
two dice together once). Let the sample space S consist of the 36 ordered pairs (i,j),
i = 1,2,3,4,5,6); j = 1,2,3,4,5,6, where i is the outcome on the first die and j the
outcome on the second die.
We can represent the outcomes as follows
(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6)
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6)
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6)
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)
(6,6): X(6,6) = 12
Then the range space R ={2,3,4…,11,12}
Probability density (mass) function of a discrete random variable: The
probability density (mass) function of a discrete random variable X (denoted f(x) or
probability mass function p(x)) is a real valued function defined as f(x) = Pr(X=x).
Thus if X is a discrete random variable which can take on values x1, x2,…, R x , then
f(x) will be called a probability density (mass) function if
(i) f(x) ≥ 0
(ii) f ( x) = 1
Rx
(iii) Pr(X =x ) = f(x)
6
k (2 x 1) 1
x1
E ( X ) xf ( x)
Rx
The discrete formula given above says to take a weighted sum of the values x of X, where the
weights are the probabilities f(x)
Example 1: Suppose that the probability function of a random variable X is given in the table
below
X 0 1 2
f(x) ¼ ½ ¼
Example 2: Suppose that an engineer who needs a little extra money devises a game of chance in
which some of his friends might wish to participate. The game he proposed was to let his friends
cast an unbiased die and then receive a payment according to the following schedule.
A: If {1,2,3} occurs, he receives N1000, B if {4,5} occurs, he receives N5000, and if C = {6}
occurs, he receives N35000. Determine the average mount expected to be paid.
Solution: Since the die is unbiased, the probabilities of the respective events are 3/6/ 2/6 and 1/6.
We can present the information as follows
Definition: The expectation of any function of discrete X, denoted h(x) can be given as
E(h(X)) = h(x)f(x)
Rx
2
Suppose h(x) = x , then
E( X 2 ) = x 2
f(x)
Rx
Variance of X
Suppose that the function of X, h(x) = ( x ) , then for the discrete X the variance is given as
2
x2 = E ( X ) 2 = (x - ) 2
f(x)
Rx
Consider x2 = E ( X )
2
= E ( X 2X )
2 2
= E ( X ) 2E ( X ) )
2 2
= E( X 2 ) 2
= E ( X ) ( E ( X ))
2 2
Activity 1
Suppose that the probability density function of a random variable X is as given in the table below
X -2 1 3
Activity 2
Suppose that a game is played with a single die which is assumed fair. In this game, a player wins
N90 if a 2 turns up, loses N30 if a 6 turns up while he neither wins nor loses if any other face turns
up. Find the expected amount won/lost on the toss and its variance.
Solution: Let X be a random variable denoting the amount won/lost on a toss with probability f(x)
of winning or losing. For the equally likely experiment, f(x) =1/6 for all i = 1,2,…,6.
We can then express the scenario in terms of the observed X as
90 , f ( x) 1 / 6
X 30 , f ( x) 1 / 6 ,
0, f(x) 4 / 6
Activity 3
Let X have the pdf given as
x
, x 1, 2, 3, 4
f ( x) 10
0, otherwise
4
x 2 1 2 2 2 3 2 4
2
E(X ) =
x
x
1
2
(1 ) (2 ) (3 ) (4 ) 10
10 10 10 10 10
Activity 4
Let X be a random variable denoting the square of the score of number shown when a die is tossed
once.
Find the mean and the standard deviation of X
Solution
Let the possible values of the experiment be given as
S = {1, 4, 9, 16, 25, 36}
Each outcome has probability of 1/6
The mean of X, E(X) = 1/6(1+4+9+16+25+36)
= 91/6 = 15.1667
Computing E(X2) = 1/6(12+42+92+162+252+362)
= 1/6(1+16+81+256+625+1296)
= 2275/6 =379.1667
Definition: Let X be a continuous random variable, then the probability density function (pdf) of
X with range space Rx that is an interval or union of intervals, is a non-negative integrable function
defined for all real values of x (, ) and satisfying
Activity 1
Let the pdf of a continuous random variable X be given as
3
(4 x 2 x ), 0 x 2
2
f ( x) 8
0 otherwise
Compute Pr(X>1)
3 2 1
Solution: Pr(X>1) =
8 1
(4 x 2 x 2 )dx =
2
Activity 2
Suppose that X is a continuous random variable having the pdf given as
K (4 x 2 x 2 ), 0 x 2
f ( x)
0 otherwise
Solution: Since f(x) is a pdf, we have that f ( x)dx 1
2
K (4 x 2 x 2 )dx 1
0
8K 3
1 and K = ,
3 8
Activity 3
Given that a random variable Y has the pdf given as
y
f ( y) e , y 0
0 otherwise
For continuous-type random variables, the definitions associated with mathematical expectation
are the same as those in the discrete case except that integrals replace summations. For
illustrations, let X be a continuous random variable with a pdf f(x).
E ( X ) xf ( x)dx ,
Rx
Definition: The expectation of any function of continuous random variable X, denoted h(x) can
be given as
E(h(X)) = R h(x)f(x)dx
x
Activity 1
Let the probability density function of a continuous random variable X be given as
2
f ( x) 3x in the range [0,2]
8
Find E(X)
Solution:
E ( X ) Rx xf ( x)dx
2 3x 2 2 3
= 0 x dx = 3x dx
8 0 8
2
4 3
= 3x
32 2
0
E( X ) 2
Activity 2
Let the pdf of a continuous random variables be given by
4
(9 x x ), 0 x 3
3
f ( x) 81
0, otherwise
Solution:
4 3
81 0
= x(9 x x 3 )dx
4 3 2
81 0
= (9 x x 4 )dx
= 1.6
4 3 2
E(X2) =
81 0
x (9 x x 3 )dx
= 3
Var(X) = E(X2)-(E(X))2 = 3-(1.6)2
= 0.44
STA124 Lecture Module 2
Bernoulli Trials and Binomial Distribution
Bernoulli Trial: This is a random experiment with exactly two possible
outcomes "success" and "failure", in which the probability of success is
the same every time the experiment is conducted
For example, tossing a fair coin has outcomes head (H) or tail (T). An
item selected from a production process can be defective (D) or non-
defective (ND). A patient admitted in the hospital can either be dead (D)
or alive (A). It is generally termed as a success-failure experiment with
probability of success p and the probability of failure 1-p or q
Definition: Let X be a discrete random variable, which can only assume
two values, 1 or 0, then the
probability density function of X can be written as
f ( x) p x (1 p)1 x , x 0,1
= 0. p 0 (1 p)10 1. p1 (1 p)11
0 p p
Variance of X is
Var(X) = E(X2) – (E(X))2
= 1
x
x p (1 p)
2 1 x p 2
x 0
= 0. p 0 (1 p)10 1. p1 (1 p)11 p 2
p p 2 p (1 p ) or pq
Thus the mean and variance of the above Bernoulli distribution are p and
p(1-p) respectively.
= (0.25)5
= 0.0009765
(iii) Pr(X≥1) = 1-Pr(X<1)
= 1- 0.0009765
= 0.999
Mean and variance of a binomial distribution: The mean and variance
of a binomial distribution can be derived using direct method, method of
moment or moment generating function.
We shall use the direct method in this course
Consider
X2 = (X(X-1) + X)
and
E(X2) = E(X(X-1) + E(X)
where
n
= EX x p x 1 p
n! n x
x 0 x!n x !
n
n(n 1)!
x p x 1 p1 p
n x
x 0 x( x 1)!n x !
n 1 n 1
x 1
np P 1 P
n x
x 0 x 1
n 1 n 1
= np , since P x 1 1 P n x =1
x 1 x 0
n n x
Also. E X X 1 xx 1 p 1 p n x
x 0 x
n
n(n 1)( n 2)!
xx 1 x( x 1)( x 2)!(n x)! p p 2 1 p
x2 n x
x 0
n2
(n 2)!
n(n 1) p 2 p x 2 1 p
n x
x 0 ( x 2)!( n x )!
n(n 1) p 2
Thus
E(X2) = E(X(X-1) + E(X) = n(n-1)p2 + np
= 1-(0.012+0.058+0.137+0.205+0.218)
= 0.370
(a)The average number of ceiling fans that functions more than three
months is obtained
as
E(X) =np
= 20(0.2) =4
Therefore, the average number to be replaced in three months
20-4 = 16
Self Assessment Questions
1. The probability that a certain component will survive electric shock is
3/4. If 5 components are
selected at random, what is the probability that
(a) at most 2 components will survive electric shock?
(b) at least 3 components will survive electric shock?
Generally, we can define a geometric random variable X as the number of trials until the first
success is obtained.
(a). Terminals on an on-line computer system are attached to a communication line to the central
computer system. The probability that any terminal is ready to transmit is 0.95. Let X = number
of terminals polled until the first ready terminal is located.
(b). It is known that 20% of products on a production line are defective. Products are inspected
until first defective is encountered. Let X = number of inspections to obtain first defective
(c). One percent of bits transmitted through a digital transmission are received in error. Bits are
transmitted until the first error. Let X denote the number of bits transmitted until the first error.
Definition: Consider a repeated independent trials where the outcome of each trial can result in a
success with probability p and a failure with probability q =1-p, then the probability distribution
of a random variable X, the number of trials before the first successes occurs is given as
Pr( X x) q x1 p, x 1, 2, 3,
The random variable defined above denotes the number of trials required to obtain the first success
The mean and variance of geometric distribution are
1 1 p
The mean E(X) = , and Var(X) =
p p2
Examples
(1) Suppose that the random variable X has a geometric distribution with p = 0.5, determine the
following
(a)Pr(X=1) (b) Pr(X=8) (c)Pr(X ≤ 2) (d) Pr(X > 2)
(e) The mean and variance of X
Solutions
The pdf is given by
Pr( X x) 0.5 0.5 x1, 1, 2, 3,
(a)Pr(X=1) Pr( X 1) 0.5 0.511 0.5
(b) Pr( X 8) 0.5 0.581 0.0039
(2) A representative from the National Football League's Marketing Division randomly selects
people
on a random street in Kansas City, Kansas until he finds a person who attended the last home
football game. Let p, the probability that he succeeds in finding such a person be 0.20, and, let X
denote the number of people he selects until he finds his first success.
(a) What is the probability that the marketing representative must select 4 people before he finds
one
who attended the last home football game?
(b) Find the average number of people the marketing representative needs to select before he finds
one who will attend the last home football game and the standard deviation
Solution: Since X denotes the number of people he selected until he finds his first success, then
using Geometric distribution, the required probability is
41
(a) Pr( X 4) 0.20 (0.80 )
0.20(0.80)3 = 0.1024
1 1
5
(b) The average number is p 0.20
That is, we should expect the marketing representative to select 5 people before he finds one who
will attend the last football game.
The variance is
1 p
σ =Var(X) = p2 =
0.08
4.472
0.20 2
POISSON DISTRIBURION
Definition: Let X be a discrete random variable with a non-negative integer support. X will be
said to have a Poisson distribution with parameter λ>0 if the pdf is given as
e x
f ( x) x 0,1, ,
x!
where λ is the mean number of occurrence per unit.
Example 1
During a laboratory experiment, the average number of radioactive particles passing through a
point per millisecond is 4. What is the probability that
(a) exactly 6 particles will pass through the point in a given millisecond?
(b) at least 7 particles will pass through the point in a given millisecond?
(c) between 2 and 6 particles will pass through the point in a given millisecond?
Solution: let X be the random variable denoting the number of particles that pass through the point
per millisecond.
Given λ = 4, then the probability that x particles will pass through the point is given as
e 4 4 x
Pr( X x) x 0,1,
x!
Thus
e 4 4 6 e 4 4 6
(a) Pr( X 6)
6! 6 5 4 3 2 1
0.0183 4096
720
= 0.1041
e 4 4 x
(b) Pr( X 7)
x 7 6!
6
4x
1 e 4
x 0 6!
0 1 2 3 4 5 6
= 1 e 4 4 4 4 4 4 4 4
0! 1! 2! 3! 4! 5! 6!
= 1-0.0183(1+4+8+10.6667+10.6667+8.5333+5.68889)
= 1- 0.8886 = 0.1114
(a) `Exercise for students
Example 2
A sales firm receives on the average three call per hour on its toll-free number. For any given hour,
find the probability that it will receive the following.
(a) At most three calls (b) At least three calls
Solution: Let X be the number of calls per hour with average number =3, then
e 3 3 x
Pr( X x) x 0,1,
x!
(a) The probability of at most three calls is
30 31 3 2 33
e 3
0! 1! 2! 3!
e 3 13 = 0.64723
(b) The probability of at least three calls is
2 e 33x 30 31 3 2
Pr( X 3) 1 1 e 3 0.57680
x0 x! 0! 1! 2!
UNIFORM DISTRIBUTION
A continuous random variable X is said to have a uniform distribution if the pdf is given as
1
f ( x) , x
The uniform distribution arises in a situation when we suppose a certain random variable is
equally likely to be near any value in the interval [α, β]. The probability that X lies in any
subinterval of [α, β] is equal to the length of that subinterval divided by the length of the interval
[α, β].
Mean and variance of a continuous uniform distribution
The mean of X is given as
x
E( X ) dx
2 2 ( )( )
=
2( ) 2( )
=
2
Example 2
Let the continuous random variable X denote the current measured in a thin copper wire in
Milliamperes (mA). Assume that the range of X is [0, 20 mA], and assume that the probability
density function of X is f(x) = 0.05, 0≤x≤20.
(i)What is the probability that a measure of current is between 5 and 10 millamperes?
(ii) Find the mean current and the variance.
Solution:
10
(i) Pr( 0<x<10) = 5 0.05dx
= 0.05 x 5 0.25
10
(20 0) 2
(ii) a=0, b=20, so that E(X) = (0+20)/2 =10mA and Var(X) = 33.33mA 2
12
NORMAL DISTRIBUTION
A random variable X is said to have a normal distribution with mean μ and variance σ2 if the pdf
is given by
( x )
e
1
2 2 , -∞<x<∞; -∞< μ <∞; σ >0
2
f(x) =
2 2
The normal distribution above is often denoted N(μ, σ2).
Some useful results concerning a normal distribution are summarized in Fig. 6.1. For any normal
random variable, it is shown that
Pr(μ-σ<X< μ+ σ) = 0.6827
Pr(μ-2σ<X< μ+ 2σ) = 0.9545
Pr(μ-3σ<X <μ+ 3σ) = 0.9973
Standard normal random variable: If X is a normal random variable with E(X) = μ and Var(X)
= σ2, then Z = X is called the standard normal random variable with E(Z) = 0 and Var(Z) =
1.
The cumulative distribution function of a standard normal random variable is denoted as
( z ) Pr( Z z )
In order to find the probability of interest, the normal random variable X needs to be transformed
to the standard normal random variable. Examples are as illustrated below,
X a
(1) Pr(X ≤ a) = Pr
Pr Z z a
X a
(2) Pr(X ≥ a) = Pr
Pr Z z a
1 Pr Z z a
Solution:
a. (i) Let X denote the current measurements in milliamperes.
The mean = 10 and variance 2 = 4, so that =2
Using example 2 above, the probability can be written as
X 10 13 10
Pr(X>13) = Pr
2 2
Pr Z 1.5
1 Pr Z 1.5
= 1-0.93319
= 0.6681
Example 2
Each month, an American household generates an average of 28 pounds of newspaper for
garbage or recycling. It is assumed
that the amount generated is normally distributed and the standard deviation is 2 pounds. If a
household is selected at random, find the probability of its generating
(a) Between 27 and 31 pounds per month
(b) More than 30.2 pounds per month
Solution
µ = 28, σ = 2
(a) P(27<X,31)
Find the Z value as follows
X 27 28
Z1 0.5
2
X 31 28
Z2 1.5
2
Pr(27< X <31) = Pr(-0.5 < Z<1.5)
0.9332-0.3085 = 0.6247
APPENDIX
Negative Binomial Random variable
The number X of trials before producing k successes in a negative binomial experiment is called
a negative binomial random variable, and its probability distribution is called negative binomial
distribution.
Probability Density function of Negative Binomial Distribution
We can formally conceptualize negative binomial distribution as follows. In a series of Bernoulli
trials (independent trials with constant probability of success), let the random variable X denote
the number of trials until r successes occur. Then X is a negative binomial random variable with
parameter 0<p<1 and r = 1, 2, 3, ……., with pdf given by
x 1 r xr
f ( x) Pr( X x) p q , r , r 1, r 2, , where q = 1-p.
r 1
Mean and Variance of Negative binomial random variable
If X is a negative binomial random variable with parameter p and r, then the mean
r
E( X ) and 2 Var( X )
rq
p p2
Example 1
(1) In an opinion poll on voters exiting from a polling booth and asking them if they voted for
a female Presidential candidate. The probability (p) that a person voted for a female
candidate is 20%
(a) What is the probability that 15 voters will be asked until you can find 5 people who voted
for a female candidate?
(b) Compute the expected number of voters to be interviewed in order finding 5 people who
voted for female candidates and the variance.
Solution: Let X be the random variable denoting the number of voters interviewed before finding
5 voters who voted for a female presidential candidate. We have that number of success k =5 and
the number of voters required to obtain the 5 successes is x-k =15-5 =10.
P= 0.2, q =1-p = 0.8
(a) Thus the required probability is
14
Pr( X 5) (0.2)5 (0.8)10 = 0.034
4
5 5(0.8)
E(X) = = 25 and Var(X) = =100
0 .2 ( 0 .2 ) 2
Example 2: Suppose that independent tosses of a biased coin are made until r heads are
observed, where the probability of observing a head on any given toss is 0.25. Let X be
the random variable denoting the number of trials required to obtain the rth head.
(a) Give the appropriate probability density function for this experiment.
(b) Given that r =3 in question 2(a) above, find the probability that
(i) 3 heads are observed on the sixth trial.
(ii) at least 6 trials are required to observe 3 heads.
x 1
Pr( X x) 0.25 20.75 x2 , 3, 4,…
2
Required to find
x 1
(i) Pr( X 6) 0.2530.75 x3 ,
2
5
0.2530.753
2
= 0.066
Example 3
On each randomly dialed number, there is a 9% chance of reaching an adult who will complete the
survey.
(a) What is the probability that the 3rd completed survey occurs on the 10th call?
(b) Determine the average number of telephone calls required to get 3 completed survey and the
variance.
Solution:
(a) Let X be a random variable denoting the number of calls required to complete or more
surveys.
10 1
Pr( X 10) 0.093 (1 0.09103 ,
3 1
9
Pr( X 10) 0.093 (1 0.09 7 ,
2
= 0.0136
3 3(0.91)
(b) E(X) = = 33.3 and Var(X) = =337.03
0.09 (0.09) 2
CORRELATION
Correlation refers to a process for establishing the relationships between two variables. You
learned a way to get a general idea about whether or not two variables are related, is to plot them
on a “scatter plot”.
In statistics, Correlation studies and measures the direction and extent of relationship among
variables, so the correlation measures co-variation, not causation. Therefore, we should never
interpret correlation as implying cause and effect relation. For example, there exists a correlation
between two variables X and Y, which means the value of one variable is found to change in one
direction, the value of the other variable is found to change either in the same direction (i.e. positive
change) or in the opposite direction (i.e. negative change). Furthermore, if the correlation exists,
it is linear, i.e. we can represent the relative movement of the two variables by drawing a straight
line on graph paper
Scatter Diagram
A scatter diagram is used to examine the relationship between both the axes (X and Y) with one
variable. In the graph, if the variables are correlated, then the point drops along a curve or line. A
scatter diagram or scatter plot gives an idea of the nature of relationship.
In a scatter correlation diagram, if all the points stretch in one line, then the correlation is perfect
and is in unity. However, if the scatter points are widely scattered throughout the line, then the
correlation is said to be low. If the scatter points rest near a line or on a line, then the correlation
is said to be linear.
Perfect positive correlation Perfect negative correlation
No Correlation
Correlation coefficient
Correlation coefficient shows the measure of correlation. To compare two datasets, we use the
correlation formulas.
The most common formula is the Pearson Correlation coefficient used for correlation between the
two variables say X and Y. The value of the coefficient lies between -1 to +1. When the coefficient
comes down to zero, then the data is considered as not related. While, if we get the value of +1,
then the data are perfectly positively correlated, and -1 has a perfect negative correlation. If r
denotes the sample correlation coefficient between two variables, then the Pearson correlation
coefficient is given by
,
where n = Number of observations
Σx = Total of the X variable value
Σy = Total of the Y variable value
Σxy = Sum of the product of X and Y variables
Σx2 = Sum of the squares of X variable
Σy2 = Sum of the squares of the Y variable
Baby No 1 2 3 4 5 6 7 8 9 10
Weight (kg) 3.63 3.02 3.82 3.42 3.59 2.87 3.03 3.46 3.36 3.30
Length (cm) 53.1 49.7 48.4 54.2 54.9 43.7 47.2 45.2 54.4 50.4
Σx = 3.63 + 3.02 + 3.82 + 3.42 + 3.59 + 2.87 + 3.03 + 3.46 + 3.36 + 3.30 = 33.5
Σy = 53.1 + 49.7 + 48.4 + 54.2 + 54.9 + 43.7 + 47.2 + 45.2 + 54.4 + 50.4 = 501.2
Σx2 = 13.18 + 9.12 + 14.59 + 11.70 + 12.89 + 8.24 + 9.18 + 11.97 + 11.29 + 10.89 = 113.05
Σy2 = 2 819.6 + 2 470.1 + 2 342.6 + 2 937.6 + 3 014.0 + 1 909.7 + 2 227.8 + 2 043.0 + 2 959.4 +
2 540.2 = 25 264
Σxy = 192.8 + 150.1 + 184.9 + 185.4 + 197.1 + 125.4 + 143.0 + 156.4 + 182.8 + 166.3
= 1 684.2
Use the formula and the numbers you calculated in the previous steps to find r.
Simple linear regression is used to estimate the relationship between two quantitative variables.
You can use simple linear regression when you want to know:
1. How strong the relationship is between two variables (e.g., the relationship between
rainfall and soil erosion).
2. The value of the dependent variable at a certain value of the independent variable (e.g.,
the amount of soil erosion at a certain level of rainfall).
Regression models describe the relationship between variables by fitting a line to the observed
data. Linear regression models use a straight line, while logistic and nonlinear regression models
use a curved line. Regression allows you to estimate how a dependent variable changes as the
independent variable(s) change.
Simple linear regression example. Suppose that you are a social researcher interested in the
relationship between income and happiness. You survey 500 people whose incomes range from
15k to 75k and ask them to rank their happiness on a scale from 1 to 10.
Your independent variable (income) and dependent variable (happiness) are both quantitative, so
you can do a regression analysis to see if there is a linear relationship between them.
If you have more than one independent variable, use multiple linear regression instead.
Assumptions of simple linear regression
Simple linear regression is a parametric test, meaning that it makes certain assumptions about the
data. These assumptions are:
4. The relationship between the independent and dependent variable is linear: the line of
best fit through the data points is a straight line (rather than a curve or some sort of
grouping factor).
y is the predicted value of the dependent variable (y) for any given value of the
independent variable (x).
β 0 is the intercept, the predicted value of y when the x is 0.
β1 is the regression coefficient – how much we expect y to change as x increases.
x is the independent variable ( the variable we expect to influence y).
ԑ is the error term, or how much variation there is in our data.
Linear regression finds the line of best fit line through your data by searching for the regression
coefficient (B1) that minimizes the total error (e) of the model.
While you can perform a linear regression by hand, this is a tedious process, so most people use
statistical programs to help them quickly analyze the data.
STA 124 assignment 2
In a study of the relationship between the producer and retail prices of a commodity, the following
monthly average prices were collected over a period of ten months.