
Assignment 05

Course: Statistics (STA 240)


Submitted To: Md. Mortuza Ahmmed Submitted By: Shamsul Islam Raisy (BSCE-11106005) Submitting Date: 21/11/2011.

Statistics:
Statistics is the study of the collection, organization, analysis and interpretation of data.

Application of statistics in civil engineering: Statistics has many uses across the different fields of civil engineering. One of the most important is disaster management. Every year our country faces many natural disasters. We collect, organize and analyze the data from these events and interpret them to find better ways of dealing with nature.

Sample and population: A population is the whole group being studied and a sample is a part taken from it. Ex. in a class of students, all the students together are the population and the monitor is a sample.

Variable: A variable is a quantity whose value can change from one situation to another. Ex. height of a student (different students can have different heights), size of a shirt (a shirt can come in many different sizes).

Scale of measurement:
1. Nominal scale: The nominal scale of measurement satisfies only the identity property of measurement. Values assigned to variables represent a descriptive category, but have no inherent numerical value with respect to magnitude. Ex. gender (male, female), color (black, white, red), religion (Islam, Hinduism, Buddhism).

2. Ordinal scale: The ordinal scale has the properties of both identity and magnitude. Each value on the ordinal scale has a unique meaning and an ordered relationship to every other value on the scale. Ex. in a horse race: win, place and show; in a class: superior, good, average and poor.

3. Interval scale: The interval scale of measurement has the properties of identity, magnitude and equal intervals. Equal differences between values on the scale represent equal differences in the quantity measured. Ex. women's dress size and temperature.

Size     8     10
Bust     32    34
Waist    24    26
Hips     35    37

Day        Temperature
Sunday     60 °F
Monday     65 °F
Tuesday    70 °F

4. Ratio scale: The ratio scale of measurement satisfies all four properties of measurement: identity, magnitude, equal intervals and an absolute zero. Ex. weight of an object, which can be zero. We can say that C weighs twice as much as B and that D is heavier than A, B and C.

Object    Weight
A         0
B         2
C         4
D         8

Line Graph :
[Line graph: share index (0 to 1200) plotted against months, August to December]

Months       Share Index
August       700
September    800
October      1000
November     600
December     400

Bar Diagram :
[Bar diagram: population (0 to 250) by religion]

Religion    Population
Muslim      200
Hindu       100
Other       50

Pie Chart :
[Pie chart: share of the population by religion]

Religion    Population    Percent
Muslim      200           57.14%
Hindu       100           28.57%
Other       50            14.29%
Total       350           100%
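
The percent column of the pie chart table can be checked with a few lines of Python. This is only a minimal sketch; the names `population` and `percent_shares` are chosen here for illustration and are not part of the assignment.

```python
# Population counts taken from the religion table above.
population = {"Muslim": 200, "Hindu": 100, "Other": 50}


def percent_shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


shares = percent_shares(population)
for name, share in shares.items():
    # Two decimal places, as in the table.
    print(f"{name}: {share:.2f}%")
```

Each share is the category count divided by the total of 350, which is also the fraction of the circle given to that slice.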

Scatter Diagram :
{x, y} = {(10,20), (20,40), (30,45), (40,50), (50,55), (60,50), (70,45)}
[Scatter diagram: weight (0 to 60) plotted against age (0 to 80)]

Age    Weight
10     20
20     40
30     45
40     50
50     55
60     50
70     45

Stem & Leaf Plot :
Data: 11, 14, 16, 21, 23, 24, 27, 30, 31, 35, 36, 37, 38, 40, 41, 42, 43, 50, 51.

Stem    Leaf
1       1, 4, 6
2       1, 3, 4, 7
3       0, 1, 5, 6, 7, 8
4       0, 1, 2, 3
5       0, 1
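
The grouping behind the plot can be sketched in Python as follows. It assumes two-digit values, so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is only illustrative.

```python
from collections import defaultdict

# The 19 observations listed above.
data = [11, 14, 16, 21, 23, 24, 27, 30, 31, 35,
        36, 37, 38, 40, 41, 42, 43, 50, 51]


def stem_and_leaf(values):
    """Group values by tens digit (stem); the units digits are the leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```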

Histogram :
[Histogram: bar length (0 to 3.5) for the classes 5 to 10 through 25 to 30]

Class       Frequency    Length (frequency / class width)
5 to 10     5            5/5 = 1
10 to 15    10           10/5 = 2
15 to 20    15           15/5 = 3
20 to 25    10           10/5 = 2
25 to 30    5            5/5 = 1

Central tendency:
A measure of central tendency is a single significant number that represents the whole data set.

Measures of central tendency:

1. Mean: Also known as the arithmetic mean, it is the value we get by dividing the total of all the values by the number of values.
$\bar{x} = \frac{\sum x}{n}$
Ex. given values: 5, 6, 7, 3, 4, 7, 8, 5, 4
$\bar{x} = \frac{5 + 6 + 7 + 3 + 4 + 7 + 8 + 5 + 4}{9} = \frac{49}{9} = 5.44$

2. Median: It is the middle value that we get after arranging all the values in ascending order. It is generally used when an extreme value is present.
$\text{Median} = \left(\frac{n + 1}{2}\right)\text{th value}$
Ex. given values: 5, 6, 7, 3, 4, 7, 8, 5, 4 [in ascending order: 3, 4, 4, 5, 5, 6, 7, 7, 8]
$\text{Median} = \frac{9 + 1}{2} = 5\text{th value} = 5$
Again, if the given values are 3, 4, 4, 5, 5, 6, 7, 7, 8, 9, then n = 10 is even and the median is the average of the two middle values:
$\frac{10}{2} = 5\text{th value} = 5, \quad \frac{10}{2} + 1 = 6\text{th value} = 6$
$\text{Median} = \frac{5 + 6}{2} = 5.5$

3. Mode: In a given set of numbers, the one that appears most often is the mode of that data set. When two numbers appear most often, both of them are modes.
Ex. given values: 3, 4, 4, 5, 5, 6, 7, 7, 7, 8, 9.
$\text{Mode} = 7$

4. Geometric mean: It is useful when we want to find the average change of percentages, ratios or growth rates over time. We can't use the GM when there is a 0 or a negative value in the set. The GM is never greater than the AM. The GM of a set of n positive numbers is defined as the nth root of the product of the n values:
$GM = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$ {here, $n$ = number of values, $x_1, x_2, \ldots, x_n$ = the values}
To find the average percent increase over time the formula would be:
$GM = \sqrt[n]{\frac{\text{value at end of period}}{\text{value at start of period}}} - 1$ {here, $n$ = number of periods}

5. Harmonic mean: We can't use the HM when there is a 0 or a negative value in the set. The formula is:
$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$ {here, $n$ = number of values, $x_1, x_2, \ldots, x_n$ = the values}

Arithmetic mean is the best measure of central tendency: There are some criteria for a good measure of central tendency. They are:
I. Clearly defined.
II. Readily comprehensible.
III. Easily calculated.
IV. Based on all observations.
V. Little affected by extreme values.
VI. Capable of further algebraic treatment.
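
The five measures above can be tried out with Python's standard `statistics` module. This is only a sketch using the example values from this section; `geometric_mean` and `multimode` need Python 3.8 or later, and the variable names are chosen here for illustration.

```python
import statistics

values = [5, 6, 7, 3, 4, 7, 8, 5, 4]                 # mean and median example
even_values = [3, 4, 4, 5, 5, 6, 7, 7, 8, 9]         # ten values, so n is even
mode_values = [3, 4, 4, 5, 5, 6, 7, 7, 7, 8, 9]      # mode example

mean = statistics.mean(values)                # total of the values divided by n
median = statistics.median(values)            # middle value of the sorted data
even_median = statistics.median(even_values)  # average of the two middle values
modes = statistics.multimode(mode_values)     # every value tied for most frequent

# Both of these require strictly positive data, as noted above.
geometric = statistics.geometric_mean(values)
harmonic = statistics.harmonic_mean(values)

print(round(mean, 2), median, even_median, modes)
print(round(geometric, 2), round(harmonic, 2))
```

For positive data the three means always come out in the order HM ≤ GM ≤ AM, which is one way to sanity-check a hand calculation.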

And since the arithmetic mean fulfills all the criteria other than the fifth one, some can say the arithmetic mean is the best measure of central tendency.

Extreme value in a real-life situation: In a family of three whose members weigh 40, 45 and 50 kg, the average weight is 45 kg. But if someone weighing 100 kg joins the family, the average becomes 58.75 kg, which is more than three of the values and very far from the fourth.

Sometimes, measures of central tendency are not appropriate to work with. If we consider only the central tendency of a data set, we may draw a wrong conclusion about the whole data set. Ex. in a company, the average salary of three employees (98000 tk, 75000 tk and 65000 tk) is about 79333 tk, but this doesn't mean that everyone's salary is 79333 tk or some amount near that.

Find the two numbers whose harmonic mean = 32/5 and geometric mean = 8.
Let the two numbers be $a$ and $b$, with $a < b$. Given, $HM = 32/5$ and $GM = 8$.
$\sqrt{ab} = 8 \Rightarrow ab = 64$ ... (i)
$\frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{32}{5}$
$\Rightarrow \frac{2ab}{a + b} = \frac{32}{5}$
$\Rightarrow \frac{2 \times 64}{a + b} = \frac{32}{5}$ [from (i)]
$\Rightarrow a + b = \frac{5 \times 2 \times 64}{32} = 20$
$\Rightarrow \frac{64}{b} + b = 20$ [since $a = \frac{64}{b}$ from (i)]
$\Rightarrow 64 + b^2 = 20b$
$\Rightarrow b^2 - 20b + 64 = 0$
$\Rightarrow b(b - 16) - 4(b - 16) = 0$
$\Rightarrow (b - 16)(b - 4) = 0$
So, $b = 16$ or $b = 4$, and $a = 20 - 16 = 4$ or $a = 20 - 4 = 16$.
Since $a < b$, $a = 4$ and $b = 16$.

Find the mean for the series 1, 2, 3, ..., 500.
Mean for the series $= \frac{n + 1}{2} = \frac{500 + 1}{2} = 250.5$

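What is the median of the sample 4, 5, 7, 9, 6, 3, 2, 5, 1, 9, 8, 5, 8?
In ascending order: 1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 9, 9.
$\text{Median} = \frac{13 + 1}{2} = 7\text{th value} = 5$

Find the mean for the series 1000, 2000, 3000, ..., 50000.
We know that the mean of the series 1, 2, 3, ..., n is $\frac{n + 1}{2}$.
Dividing 1000, 2000, 3000, ..., 50000 by 1000, we get 1, 2, 3, ..., 50.
$\text{Mean} = \frac{50 + 1}{2} = 25.5$
So, the mean of the original series $= 25.5 \times 1000 = 25500$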
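
The worked problems of this section (the two numbers with HM = 32/5 and GM = 8, the two series means and the sample median) can be cross-checked numerically. This is a rough sketch; the helper `two_numbers_from_hm_gm` is a name made up here, and it solves the same quadratic as the hand derivation with the quadratic formula instead of factoring.

```python
import math
import statistics


def two_numbers_from_hm_gm(hm, gm):
    """Return (a, b), a <= b, with harmonic mean hm and geometric mean gm."""
    product = gm ** 2              # GM = sqrt(ab), so ab = GM^2
    total = 2 * product / hm       # HM = 2ab / (a + b), so a + b = 2ab / HM
    # a and b are the roots of t^2 - total*t + product = 0.
    root = math.sqrt(total ** 2 - 4 * product)
    return (total - root) / 2, (total + root) / 2


a, b = two_numbers_from_hm_gm(32 / 5, 8)

mean_1_to_500 = statistics.mean(range(1, 501))
mean_thousands = statistics.mean(range(1000, 50001, 1000))
sample_median = statistics.median([4, 5, 7, 9, 6, 3, 2, 5, 1, 9, 8, 5, 8])

print(round(a, 6), round(b, 6))
print(mean_1_to_500, mean_thousands, sample_median)
```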

Measures of Dispersion
Measures of dispersion show how widely the values of a data set are spread around an average. In other words, they tell us how well an average represents its data set.

Methods of measures of dispersion:


Range is based on the largest and the smallest values in the data set; mean deviation, variance and standard deviation are all based on deviations from the arithmetic mean.

a. Range: The simplest measure of dispersion is the range. It is the difference between the largest and the smallest values in a data set. The formula:
$\text{Range} = \text{largest value} - \text{smallest value}$
Ex. A student took five exams and scored 92, 75, 95, 90 and 98. We have to find the range of his scores.
$\text{Range} = 98 - 75 = 23$

b. Mean deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean. The formula:
$MD = \frac{\sum |x - \bar{x}|}{n}$ {here, $x$ = value of each observation, $\bar{x}$ = arithmetic mean, $n$ = number of observations}
Again, when we have to find the MD and SD for two numbers:
$MD = SD = \frac{|a - b|}{2}$ {here, $a$, $b$ = the two numbers}
Ex. A student took five exams and scored 92, 75, 95, 90 and 98. We have to find the mean deviation of his scores.
Given, $x_1 = 92$, $x_2 = 75$, $x_3 = 95$, $x_4 = 90$, $x_5 = 98$
So, $\bar{x} = \frac{92 + 75 + 95 + 90 + 98}{5} = \frac{450}{5} = 90$
$MD = \frac{|92 - 90| + |75 - 90| + |95 - 90| + |90 - 90| + |98 - 90|}{5} = \frac{2 + 15 + 5 + 0 + 8}{5} = \frac{30}{5} = 6$

c. Variance: The arithmetic mean of the squared deviations from the mean. The formula is:
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$
Ex. A student took five exams and scored 86, 94, 76, 76 and 88. We have to find the variance of his scores.
Given, $x_1 = 86$, $x_2 = 94$, $x_3 = 76$, $x_4 = 76$, $x_5 = 88$
So, $\bar{x} = \frac{86 + 94 + 76 + 76 + 88}{5} = \frac{420}{5} = 84$
$\sigma^2 = \frac{(86 - 84)^2 + (94 - 84)^2 + (76 - 84)^2 + (76 - 84)^2 + (88 - 84)^2}{5} = \frac{2^2 + 10^2 + 8^2 + 8^2 + 4^2}{5} = \frac{4 + 100 + 64 + 64 + 16}{5} = \frac{248}{5} = 49.6$

d. Standard deviation: The square root of the variance. The formula is:
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Again, when we have to find the MD and SD for two numbers:
$MD = SD = \frac{|a - b|}{2}$ {here, $a$, $b$ = the two numbers}
or, when we have to find the standard deviation of the first n natural numbers:
$\sigma = \sqrt{\frac{n^2 - 1}{12}}$
Ex. A student took five exams and scored 86, 94, 76, 76 and 88. We have to find the standard deviation of his scores.
Given, $x_1 = 86$, $x_2 = 94$, $x_3 = 76$, $x_4 = 76$, $x_5 = 88$
So, $\bar{x} = \frac{86 + 94 + 76 + 76 + 88}{5} = \frac{420}{5} = 84$
$\sigma = \sqrt{\frac{(86 - 84)^2 + (94 - 84)^2 + (76 - 84)^2 + (76 - 84)^2 + (88 - 84)^2}{5}} = \sqrt{\frac{4 + 100 + 64 + 64 + 16}{5}} = \sqrt{\frac{248}{5}} = \sqrt{49.6} = 7.04$
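
The four measures can be reproduced in Python with the exam scores used in the examples. This is a minimal sketch: `value_range` and `mean_deviation` are helper names invented here, while `pvariance` and `pstdev` are the standard library's population variance and standard deviation (they divide by n, as the formulas in this section do).

```python
import statistics


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = statistics.mean(values)
    return sum(abs(x - mean) for x in values) / len(values)


scores_a = [92, 75, 95, 90, 98]   # range and mean deviation example
scores_b = [86, 94, 76, 76, 88]   # variance and standard deviation example

range_a = value_range(scores_a)
md_a = mean_deviation(scores_a)
variance_b = statistics.pvariance(scores_b)   # divides by n, not n - 1
sd_b = statistics.pstdev(scores_b)            # square root of the variance

print(range_a, md_a, variance_b, round(sd_b, 2))
```

Note that `statistics.variance` and `statistics.stdev` (without the `p`) divide by n - 1 and would give different numbers from the hand calculation.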

Co-efficient of variation: The co-efficient of variation is used to compare the dispersion in different sets of data with different units of measurement. The formula is:
$CV = \frac{\sigma}{\bar{x}} \times 100$
Ex. Find the co-efficient of variation for 5 kg and 3 taka.
Here, $\sigma = \frac{|5 - 3|}{2} = 1$ and $\bar{x} = \frac{5 + 3}{2} = 4$
$CV = \frac{1}{4} \times 100 = 25\%$
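
The same calculation in Python, as a small sketch. The function name `coefficient_of_variation` is chosen here for illustration, and it assumes the population standard deviation (division by n), which is what the two-number shortcut corresponds to.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


cv = coefficient_of_variation([5, 3])
print(f"{cv:.0f}%")
```

Because the standard deviation and the mean carry the same unit, the unit cancels, which is why the result can be compared across data sets measured in different units.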

Correlation: The correlation is a way to measure how associated or related two variables are. Formula is:
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$ {here, $x$, $y$ = the paired observations, $\bar{x}$ = mean of $x$, $\bar{y}$ = mean of $y$}

Chart for correlation strength
Range       Strength
+1          Perfectly positive
-1          Perfectly negative
0 to .3     Weakly positive
.3 to .7    Moderately positive
.7 to 1     Strongly positive

There are three patterns of correlation:
I. Positive correlation: In a positive correlation, if one of the observations increases the second one does the same, and when the first observation decreases the second one does the same. Ex. higher education and years spent on education: people with higher education tend to have spent more years in education.
II. Negative correlation: In a negative correlation, if one of the observations increases the second one decreases, and when the first observation decreases the second one increases. Ex. watching TV and exam grade: when a student watches a lot of TV, he tends to have a lower grade in the exam.
III. Zero correlation: In a zero correlation, one observation doesn't have any effect on the other. Ex. Bill Gates' money and my happiness: no matter how much money Bill Gates has, that doesn't make me sadder or happier.
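
As an illustration, the correlation coefficient can be computed for the age and weight pairs from the scatter diagram section. This sketch assumes the usual Pearson product-moment formula; the function name `pearson_r` is made up for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Age and weight pairs from the scatter diagram.
ages = [10, 20, 30, 40, 50, 60, 70]
weights = [20, 40, 45, 50, 55, 50, 45]

r = pearson_r(ages, weights)
print(round(r, 2))
```

The result always lies between -1 and +1, so it can be read directly against the strength chart.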

Regression: It is a statistical tool for investigating the relationship between variables. Formula is:
$y = a + bx$ {here, $a$ = intercept ($\beta_0$), $b$ = slope ($\beta_1$)}
and, $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$, $a = \bar{y} - b\bar{x}$

Difference between regression & correlation
Regression                                   Correlation
1. It can explain cause or effect.           1. It can't explain cause or effect.
2. The limit of regression is -∞ to +∞.      2. The limit of correlation is -1 to +1.
3. It can predict the future.                3. It can't predict the future.
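
A short sketch of fitting the line y = a + bx by least squares, again using the age and weight pairs from the scatter diagram section. The function name `least_squares_line` is an assumption of this example, not something from the assignment.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the fitted line passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


ages = [10, 20, 30, 40, 50, 60, 70]
weights = [20, 40, 45, 50, 55, 50, 45]

a, b = least_squares_line(ages, weights)
predicted_weight_at_45 = a + b * 45   # prediction for a hypothetical age of 45
print(round(a, 2), round(b, 3), round(predicted_weight_at_45, 2))
```

Unlike the correlation coefficient, the slope and intercept are not limited to the range -1 to +1, and the fitted line can be used to estimate y for an x that was not observed.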

Probability:
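Probability provides a way to measure and express our uncertainty when making decisions about a population from sample information. Probability reflects the long-run relative frequency of an outcome. A probability can be expressed as a decimal (0.1), a fraction ($\frac{1}{10}$) or a percentage (10%). Formula:

I. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
II. $P(A \cup B) = P(A) + P(B)$ [when A and B are mutually exclusive]
III. $P(A \cap B) = P(A) \cdot P(B/A) = P(B) \cdot P(A/B)$
IV. $P(A \cap B) = P(A) \cdot P(B)$ [when A and B are independent]
V. $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B/A)}{P(B)}$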
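
The rules can be checked on a small example. The sketch below assumes one roll of a fair six-sided die, with events A ("even number") and B ("more than 3") chosen purely for illustration; exact fractions are used so that the identities hold without rounding.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair die.
sample_space = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def conditional(a, b):
    """P(A/B): probability of A given that B has occurred."""
    return prob(a & b) / prob(b)


A = {2, 4, 6}   # even number
B = {4, 5, 6}   # more than 3

p_union = prob(A) + prob(B) - prob(A & B)    # addition rule
p_joint = prob(B) * conditional(A, B)        # multiplication rule
p_a_given_b = conditional(A, B)              # conditional probability

print(p_union, p_joint, p_a_given_b)
```

Here P(A ∩ B) is not equal to P(A)·P(B), so these two events are not independent and the multiplication rule for independent events would not apply to them.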

Important Terms:
Experiment: It is an activity that is either observed or measured, such as tossing a coin or drawing a card.
Event: An event is a possible outcome of an experiment. Ex. if the experiment is to sample six lamps coming off a production line, an event could be to get one defective and five good ones.
Certain event: An event that is sure to occur, so its probability is 1. Ex. if S = {2, 3, 5, 7, 11, 13, 17, 19} with n(S) = 8, the probability that a number drawn from S is one of these eight numbers is P[certain event] = $\frac{8}{8}$ = 1.
Impossible event: An event which has no possibility of occurring. Ex. in a jar of red balls, finding a white ball is an impossible event.
Sample space: A sample space is the complete set of all outcomes of an experiment. Ex. singer = {s, i, n, g, e, r}, bee = {b, e}.
Mutually exclusive events: Events that can't happen at the same time are called mutually exclusive events. Ex. in a toss of a single coin, the events heads and tails are mutually exclusive.
Independent event: Two or more events are called independent when the occurrence or non-occurrence of one doesn't affect the other. Ex. coin toss and exam grade.
Conditional probability: The probability of X given that Y has occurred. A conditional probability is denoted by P(X/Y).

Probability Distributions: There are three types of probability distribution:

I. Binomial distribution: The probability distribution of a random variable X that counts the number of successes in n independent trials, each with the same probability of success, is called the binomial distribution. The formula is:
$P(x) = {}^{n}C_{x}\, p^{x} q^{n - x}$ {here, $n$ = number of trials, $x$ = 0, 1, 2, 3, ..., $n$, $p$ = probability of success, $q$ = probability of failure [$q = 1 - p$]}
Mean of binomial distribution is: $\mu = E(X) = np$
Variance of binomial distribution is: $V(X) = \sigma^2 = npq$

II. Poisson probability: There are some applications for the Poisson distribution. Applications are:
a) The number of deaths by horse kick in the army.
b) Birth defects and genetic mutations.
c) Rare diseases (leukemia).
d) Car accidents.
e) Traffic flow and ideal gap distance.
f) Number of typing errors on a page.
g) Hairs found in McDonald's burgers.
h) Spread of an endangered animal in Africa.
i) Failure of a machine in one month.
Formula is: $P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$ {here, $x$ = 0, 1, 2, 3, ..., $e$ = 2.71828, $\lambda$ = mean number of occurrences}
Mean and variance: $\mu = \lambda$, $V(X) = \sigma^2 = \lambda$.

III. Normal probability distributions: The normal probability distribution is very common in the field of statistics.
Formula: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
Mean and variance: $E(X) = \mu$, $V(X) = \sigma^2$

Area under the normal curve using integration: The probability that a continuous normal variable X is found in a particular interval [a, b] is the area under the curve bounded by x = a and x = b:
$P(a < X < b) = \int_a^b f(x)\,dx$

The standard normal distribution: If we have the standardized situation of $\mu = 0$ and $\sigma = 1$, then we have
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}$
We can transform all the observations of any normal random variable x with mean ($\mu$) and variance ($\sigma^2$) to a new set of observations of another normal random variable z with mean 0 and variance 1 using the following transform:
$z = \frac{x - \mu}{\sigma}$

Property of normal distribution:
a) The normal curve is symmetrical about the mean $\mu$.
b) The mean is at the middle and divides the area into halves.
c) The total area under the curve is equal to 1.
d) It is completely determined by its mean and standard deviation.
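
The three formulas can be written as small Python functions. This is only a sketch with illustrative function names. One substitution should be noted: instead of integrating the normal curve numerically, `normal_interval` uses the error function `math.erf`, which gives the same area. `math.comb` needs Python 3.8 or later.

```python
import math


def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(x) = e^(-lambda) * lambda^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x."""
    z = (x - mu) / sigma                      # standardizing transform
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def normal_interval(a, b, mu, sigma):
    """P(a < X < b) for a normal variable with mean mu and std. deviation sigma."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# Illustrative parameters only (not taken from the assignment).
n, p, lam = 10, 0.3, 2.0
binomial_mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
poisson_mean = sum(x * poisson_pmf(x, lam) for x in range(60))
within_one_sd = normal_interval(-1, 1, 0, 1)

print(round(binomial_mean, 4), round(poisson_mean, 4), round(within_one_sd, 4))
```

Summing x·P(x) over all x gives back np for the binomial and λ for the Poisson distribution, matching the mean formulas above.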

Sampling:
The methods of drawing a sample from a population are:

1. Simple random sampling: A simple random sample is a sample selected so that each item or person in the population has the same chance of being included. This can be done by two methods:
I. Lottery method: Suppose we have to select 3 people randomly from a group. We write all their names on separate small pieces of paper and fold them so that no one can read which name is written on which. Then we shuffle them all in a jar and ask someone to pick three pieces of paper from the jar. The three names picked are our 3 selected people. This way of selecting a simple random sample is called the lottery method.
II. Random number applying: Random numbers can be obtained using a calculator, a spreadsheet, printed tables of random numbers, tossing coins or rolling dice.

2. Stratified sampling: Suppose we have to select 1 single, 1 married and 2 divorced people from a group. To do that, we divide all the males and females of the group into 3 subgroups: 1. single, 2. married and 3. divorced. Then we take one person from the first subgroup, one from the second subgroup and two from the third subgroup. This way we get our 1 single, 1 married and 2 divorced people. This method of sampling is called stratified sampling.

3. Systematic sampling: Suppose we have to find out within two days what the students of a university think about a new drink, but we can ask only a hundred students in that time. There are five thousand students in the university. So, to complete this task in two days, we arrange all the student IDs in order and ask every 50th ID holder (5000 / 100 = 50) about our new drink. This process of sampling is called systematic sampling.

4. Cluster sampling: Suppose we have to find out within two days what the students of a university think about a new drink. There are five thousand students in the university, studying thirty subjects. That is a huge amount of data to process in two days. So, to complete this task in two days, we select five specific subjects and ask twenty students from each selected subject about our new drink. This process of sampling is called cluster sampling.

Difference between stratified sampling & cluster sampling
Stratified sampling                                           Cluster sampling
1. Two strata cannot be the same.                             1. Two clusters can be the same.
2. Strata show the homogeneous and the heterogeneous type.    2. Clusters show the homogeneous type [depending on the situation].
3. Strata are divided into groups.                            3. Clusters are divided into branches.
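
The four sampling methods can be sketched with Python's `random` module. Everything named here is hypothetical: the function names, the member labels in `strata` and the way students are assigned to the thirty subjects are invented only to make the example runnable.

```python
import random


def simple_random_sample(population, k, rng):
    """Lottery method: every member has the same chance of being drawn."""
    return rng.sample(population, k)


def stratified_sample(strata, quotas, rng):
    """Draw the requested number of members at random from each stratum."""
    return {name: rng.sample(members, quotas[name])
            for name, members in strata.items()}


def systematic_sample(population, step, rng):
    """Random start within the first `step` members, then every step-th one."""
    start = rng.randrange(step)
    return population[start::step]


def cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Choose whole clusters at random, then sample inside each chosen one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}


rng = random.Random(0)                 # fixed seed only for repeatability
student_ids = list(range(1, 5001))     # five thousand students

lottery = simple_random_sample(student_ids, 3, rng)
every_50th = systematic_sample(student_ids, 50, rng)

# Hypothetical strata, invented for illustration.
strata = {
    "single": ["S1", "S2", "S3"],
    "married": ["M1", "M2", "M3"],
    "divorced": ["D1", "D2", "D3"],
}
by_status = stratified_sample(
    strata, {"single": 1, "married": 1, "divorced": 2}, rng)

# Hypothetical assignment of students to thirty subjects.
subjects = {f"subject_{i}": student_ids[i::30] for i in range(30)}
by_subject = cluster_sample(subjects, 5, 20, rng)

print(len(lottery), len(every_50th), len(by_subject))
```

Both the systematic and the cluster design end up with a hundred students here; they differ in how those hundred are chosen.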

Hypothesis:
Hypothesis is a statement about a parameter subject to verification.

Null hypothesis: A statement about the value of a population parameter developed for the purpose of testing numerical evidence. It is expressed by $H_0$.

Alternate hypothesis: A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false. It is expressed by $H_1$.

Level of significance: The probability of rejecting the null hypothesis when it is true. It is also called the level of risk, because it is the risk we take of rejecting the null hypothesis when it is really true. It is expressed by $\alpha$.

Hypothesis testing is done in five simple steps. They are:
Step 1: Establishing $H_0$ and $H_1$.
Step 2: Selecting the value for $\alpha$.
Step 3: Selecting the appropriate formula. $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Step 4: Calculating the value of z.
Step 5: Making a decision: we have to accept or reject $H_0$ depending on the value of z. For a right-tailed test, if the value of z is more than $z_\alpha$, then $H_0$ is rejected and $H_1$ is accepted; if the value of z is less than $z_\alpha$, then $H_0$ is not rejected.

In these five steps, even after calculating all the values correctly, two kinds of error are possible. They are:
Type 1 error: Rejecting $H_0$ when it is true.
Type 2 error: Accepting $H_0$ when it is false.

The average I.Q. of university women in Bangladesh is suspected to be more than 110. A random sample of 64 women yielded an average I.Q. of 115.5 and a standard deviation of 20. Can you conclude that the average I.Q. of the women in the population is really more than 110? Test this at the 5% level of significance ($z_{5\%} = 1.64$).
Step 1: $H_0: \mu = 110$, $H_1: \mu > 110$
Step 2: $z_{5\%} = 1.64$
Step 3: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Step 4: Here, $\bar{x} = 115.5$, $\mu = 110$, $n = 64$, $\sigma = 20$
$z = \frac{115.5 - 110}{20 / \sqrt{64}} = \frac{5.5}{2.5} = 2.2$
Step 5: The value of z = 2.2 is more than the value of $z_{5\%} = 1.64$, so we reject $H_0$ and conclude that the average I.Q. of university women in Bangladesh is more than 110.
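
The calculation in the I.Q. example can be repeated in a few lines of Python. This is a minimal sketch for a right-tailed z test; the function name `z_statistic` is chosen here, and the critical value 1.64 is the one given in the problem.

```python
import math


def z_statistic(sample_mean, hypothesized_mean, sd, n):
    """z = (x_bar - mu) / (sigma / sqrt(n))"""
    return (sample_mean - hypothesized_mean) / (sd / math.sqrt(n))


z = z_statistic(sample_mean=115.5, hypothesized_mean=110, sd=20, n=64)
critical_value = 1.64                 # z at the 5% level, as given in the problem

# Right-tailed test: reject H0 when z exceeds the critical value.
reject_null = z > critical_value
print(round(z, 2), reject_null)
```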
