BNFO615: Bioinformatics Centric Data Structure

Prof. Zhi Wei

Homework Assignment I
Due: Wed. Oct. 6

Notes:
1) Hand in a paper copy at the beginning of the class period on the due date.
2) The paper copy (properly stapled together) must include the following items, as applicable:
   - Detailed calculations for the probability problems;
   - Source code of your program;
   - Output produced by the program;
   - A brief description, if needed.
3) Late assignments will be penalized at a rate of 25 points per 24-hour period.

Problem 1 (20 pt):


A rare disease affects only 1 out of every 100,000 people. A test for the disease comes back positive:
   - 99% of the time when applied to an ill patient, and
   - 5% of the time when applied to a healthy patient.

a) What is the probability that a patient has the disease, given that his test result is positive?
b) If the patient takes another test and the result is positive again, what is the probability
that the patient is ill? Assume that the two tests are independent given the patient's true condition.
c) Suppose the patient keeps taking the test and every result comes back positive. What is the
minimum number of tests the patient must take to be 99% sure that he is actually ill? Assume
that all tests are independent given the patient's true condition.
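A minimal Python sketch of the Bayes'-rule calculation behind Problem 1, assuming (as parts b and c intend) that every result is positive and that repeated results are independent given whether the patient is actually ill. The constant and function names are illustrative only; the assignment itself expects the detailed calculation to be shown by hand.

```python
# Problem 1 sketch: Pr(ill | n consecutive positive results) by Bayes' rule,
# ASSUMING the results are conditionally independent given the true status.
PREVALENCE = 1 / 100_000   # Pr(ill)
SENSITIVITY = 0.99         # Pr(positive | ill)
FALSE_POSITIVE = 0.05      # Pr(positive | healthy)


def posterior_ill(n_positive: int) -> float:
    """Pr(ill | n_positive positive results in a row)."""
    ill = PREVALENCE * SENSITIVITY ** n_positive
    healthy = (1 - PREVALENCE) * FALSE_POSITIVE ** n_positive
    return ill / (ill + healthy)


def min_tests(target: float = 0.99) -> int:
    """Smallest n such that n positive results push Pr(ill) to at least target."""
    n = 1
    while posterior_ill(n) < target:
        n += 1
    return n


for n in (1, 2):
    print(f"n = {n}: Pr(ill | positives) = {posterior_ill(n):.6f}")
print("minimum number of tests for 99%:", min_tests())
```

Each extra positive multiplies the prior odds of illness by the likelihood ratio 0.99 / 0.05 = 19.8, which is why the posterior climbs quickly once it starts moving.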

Problem 2 (20 pt):


Consider three events: 1) R: it rains, 2) W: the glass is wet, and 3) U: people bring umbrellas.
We assume that events U and W are conditionally independent given R. More specifically, we
have Pr(U, W | R) = Pr(U | R) Pr(W | R) and Pr(U, W | R̄) = Pr(U | R̄) Pr(W | R̄), where R̄
denotes the complement of R (it does not rain), and W̄ and Ū are defined likewise. The graphical
representation of the relationship among the three events is illustrated in Figure 1. The prior
probability of event R is Pr(R) = 0.8. The conditional probabilities Pr(W | R) and Pr(U | R) are
given in the two tables below. Based on this information, compute the conditional probability
Pr(W | U).

Pr(W|R) R R
Pr(U|R) R R

W 0.7 0.4 U 0.9 0.2

W 0.3 0.6 U 0.1 0.8

[Figure 1: Graphical representation of events R, W, and U]
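A minimal Python sketch of the Problem 2 calculation, assuming the factorisation Pr(R, W, U) = Pr(R) Pr(W | R) Pr(U | R) that the conditional-independence assumption provides, with Pr(R̄) = 1 − 0.8 = 0.2. Pr(W | U) is the ratio Pr(W, U) / Pr(U), where both terms are obtained by summing over whether it rains. The dictionary and function names are illustrative.

```python
# Problem 2 sketch: Pr(W | U) by summing out R, ASSUMING
# Pr(R, W, U) = Pr(R) * Pr(W | R) * Pr(U | R).
P_R = {True: 0.8, False: 0.2}          # Pr(R), Pr(not R)
P_W_GIVEN_R = {True: 0.7, False: 0.4}  # Pr(W | R), Pr(W | not R)
P_U_GIVEN_R = {True: 0.9, False: 0.2}  # Pr(U | R), Pr(U | not R)


def prob_w_given_u() -> float:
    # Pr(W, U) = sum over r of Pr(r) Pr(W | r) Pr(U | r)
    joint_wu = sum(P_R[r] * P_W_GIVEN_R[r] * P_U_GIVEN_R[r] for r in (True, False))
    # Pr(U) = sum over r of Pr(r) Pr(U | r)
    marginal_u = sum(P_R[r] * P_U_GIVEN_R[r] for r in (True, False))
    return joint_wu / marginal_u


print(f"Pr(W | U) = {prob_w_given_u():.4f}")
```

With the probabilities given in the problem, this is 0.52 / 0.76 ≈ 0.684.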

Problem 3 (30 pt):


Install the R package UsingR and then solve the following problems:

1) The data set pi2000 (UsingR) contains the first 2,000 digits of π. What percentage of the
digits are 3 or less? What percentage of the digits are 5 or more?
2) The time variable in the nym.2002 (UsingR) data set contains the time to finish the 2002
New York City marathon for a random sample of the finishers.
   1. What percentage of the runners finished the race in under 3 hours?
   2. What is the time cutoff for the top 10%? For the top 25%?
   3. What time cuts off the bottom 10%?
   Do you expect this data set to be symmetrically distributed?
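The assignment expects these answers to be computed in R on pi2000 and nym.2002. The Python sketch below only illustrates the same logic on any sequence of values. Two assumptions are worth stating: the finishing times are taken to be in minutes (so 3 hours = 180), and because faster runners have smaller times, the "top" 10% and 25% cutoffs are the 0.10 and 0.25 quantiles, while the bottom 10% cutoff is the 0.90 quantile. The quantile helper uses linear interpolation, the rule behind R's default quantile() (type 7). All function names are illustrative.

```python
import math
from typing import Sequence


def digit_percentages(digits: Sequence[int]) -> tuple[float, float]:
    """Return (% of digits that are 3 or less, % of digits that are 5 or more)."""
    n = len(digits)
    low = sum(d <= 3 for d in digits)
    high = sum(d >= 5 for d in digits)
    return 100.0 * low / n, 100.0 * high / n


def quantile(values: Sequence[float], p: float) -> float:
    """Linear-interpolation quantile (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def marathon_summary(times_min: Sequence[float]) -> dict[str, float]:
    """Summaries for finishing times, ASSUMED to be in minutes (3 h = 180 min)."""
    # Smaller times are better: the fastest 10% lie below the 0.10 quantile,
    # and the slowest 10% lie above the 0.90 quantile.
    return {
        "pct_under_3h": 100.0 * sum(t < 180 for t in times_min) / len(times_min),
        "top_10_cutoff": quantile(times_min, 0.10),
        "top_25_cutoff": quantile(times_min, 0.25),
        "bottom_10_cutoff": quantile(times_min, 0.90),
    }


# Stand-in input: the first ten digits of pi, NOT the pi2000 data set.
print(digit_percentages([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]))  # (50.0, 40.0)
# For the marathon, pass the finishing times exported from nym.2002:
# print(marathon_summary(times))
```

For the symmetry question, comparing the gap between the 0.10 quantile and the median with the gap between the median and the 0.90 quantile gives a quick, informal check for skew.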

Problem 4 (30 pt):


1) An elevator can safely hold 3,500 pounds. A sign in the elevator limits the passenger count to
15. If the adult population has a mean weight of 180 pounds with a 25-pound standard
deviation, how unusual would it be, assuming the central limit theorem applies, for an elevator
holding 15 people to be carrying more than 3,500 pounds?
2) A traffic officer writes an average of four tickets per day, with a daily variance of 1.
Assume the central limit theorem applies. What is the probability that she will write fewer
than 75 tickets in a 21-day cycle?
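A minimal Python sketch of the normal approximations in Problem 4, assuming the 15 passengers' weights and the 21 daily ticket counts are independent, so that a sum of n values has mean n·μ and standard deviation σ·√n. The helper names are illustrative. The elevator tail is computed directly with math.erfc because 1 − Φ(z) rounds to exactly 0 in floating point for z this large. The optional continuity correction (using 74.5 instead of 75, since ticket counts are integers) is a common refinement, not something the problem asks for.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def elevator_overload(n: int = 15, mean: float = 180.0, sd: float = 25.0,
                      limit: float = 3500.0) -> tuple[float, float]:
    """Return (z, Pr(total weight > limit)), total ~ Normal(n*mean, sd*sqrt(n))."""
    z = (limit - n * mean) / (sd * math.sqrt(n))
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # avoids 1 - Phi(z) cancellation
    return z, upper_tail


def fewer_tickets_prob(days: int = 21, mean: float = 4.0, var: float = 1.0,
                       threshold: int = 75, continuity: bool = False) -> float:
    """Pr(total tickets < threshold), total ~ Normal(days*mean, sqrt(days*var))."""
    x = threshold - 0.5 if continuity else threshold
    z = (x - days * mean) / math.sqrt(days * var)
    return normal_cdf(z)


z, p = elevator_overload()
print(f"elevator: z = {z:.2f}, Pr(> 3,500 lb) = {p:.2e}")
print(f"tickets: Pr(< 75 in 21 days) = {fewer_tickets_prob():.4f}")
```

Under these assumptions the elevator z-score is 800 / (25·√15) ≈ 8.26, far out in the tail, while the ticket total has mean 84 and standard deviation √21 ≈ 4.58.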
