LOG708 Applied Statistics 24.nov.2020

Front page

LOG708 Applied Statistics


Date: 24.11.20
Time: 09.00-13.30, including time for any practical and technical actions needed to hand in your exam paper.
Supporting materials: All supporting materials permitted. It is not allowed to cooperate with or receive help from others.
Number of papers in exam question set: 8
Technical, administrative or academic questions: studentweb@himolde.no or +47 71 19 59 90

Keep your mobile phone close by. Important messages concerning all candidates might be sent by SMS.
The best of luck!

Assignment
You can find the exam question set in the panel on the left. If you wish to download the question set to your
machine, follow this link: LOG708_H2020_23.11.2020 (002)

If you are not able to see the exam question set in Inspera, you can also find the question set in Canvas.

Write your answer in Word and save the document as one single PDF file on your own machine. Upload your
PDF file below.

Your file is saved in Inspera until the deadline for handing in your assignment. After the deadline has passed,
the last version of any uploaded PDF files is submitted automatically.

More information on how to submit in Inspera.

If you have any questions or need assistance, contact studentweb@himolde.no


Problem 1 [25%]
Mekanic AS operates five production lines for their best-selling component called MDX100.
A production supervisor at Mekanic has randomly selected 150 units of MDX100 from the
five production lines (Line A – E) and created a chart (Figure 1) to summarize her data.

Figure 1. Condition of the units selected from the production lines

(a) Among the perfect units, approximately what proportion is contributed by Line B? [2.5%]
(b) Suggest an alternative method that the supervisor could use to summarize the data. [2.5%]
(c) Suppose the supervisor reclassified the variable “Condition” into two categories: Defective (components with a major or minor fault) and Perfect. Sketch a simple bar chart to summarize the new variable. [4%]

The supervisor decided to examine historical data on the daily number of defective units
produced by each of the production lines for the last 81 days. The summary statistics are as
follows:
Table 1: Daily number of defective components

                     Line A   Line B   Line C   Line D   Line E
Mean                     40       35       20       30       25
Standard deviation       16       10        7        9        8

(d) Which of the production lines has the most predictable number of defective
components? Why? [8%]

(e) Using the information provided, create a 99% confidence interval for the average
number of defective components produced by Line A. [8%]
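
One possible way to approach tasks (d) and (e) numerically is sketched below. It rests on two assumptions that are not stated in the exam text: "most predictable" is read as the lowest relative variability (coefficient of variation), and the 99% interval uses a large-sample normal approximation with n = 81 days. A t-based interval with 80 degrees of freedom would be slightly wider. The names `table1`, `cv` and `ci_a` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from Table 1: (mean, standard deviation) per line.
table1 = {"A": (40, 16), "B": (35, 10), "C": (20, 7), "D": (30, 9), "E": (25, 8)}
n_days = 81  # number of days of historical data

# (d) ASSUMPTION: predictability is judged by the coefficient of variation
# (standard deviation divided by mean), not by the raw standard deviation.
cv = {line: sd / mean for line, (mean, sd) in table1.items()}
most_predictable = min(cv, key=cv.get)
for line, value in sorted(cv.items(), key=lambda kv: kv[1]):
    print(f"Line {line}: CV = {value:.3f}")

# (e) ASSUMPTION: large-sample normal approximation for the 99% interval.
mean_a, sd_a = table1["A"]
z = NormalDist().inv_cdf(0.995)          # two-sided 99% critical value
half_width = z * sd_a / sqrt(n_days)     # critical value times standard error
ci_a = (mean_a - half_width, mean_a + half_width)
print(f"99% CI for Line A: ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
```

Ranking by the raw standard deviation instead would point to a different line than the coefficient of variation does, so an answer should state which criterion it uses and why.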

Problem 2 [25%]
A dataset about energy use of appliances in a low-energy house consists of 95 observations.
One of the variables in the dataset is T1, which represents energy use of light fixtures
measured in Watt-hours. Figure 2 shows the distribution of T1.

Figure 2. Boxplot summarizing values of T1


Tasks:
a) Estimate the interquartile range for the values of T1 [3%]
b) The mean and standard deviation for the values of T1 are 20.6 and 0.6, respectively.
Assuming that T1 has a normal distribution, respond to the following questions:
i. Find the proportion of values that fall between 19 and 20.5. [6%]
ii. Determine a value that separates the highest 10% of T1 values from the rest. [6%]
c) In the dataset, there is another variable called T2. An analyst has used the following
formula to create a new variable called AT:

AT = 2T1 + 3T2

If the values of T2 are also normally distributed and the analyst has observed a mean of
19.5 and a standard deviation of 4, find the probability that AT is less than 95. [10%]
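
A minimal sketch of the normal-distribution calculations in tasks (b) and (c), using only the Python standard library. For task (c) it assumes that T1 and T2 are independent, which the exam text does not state; if they are correlated, the variance of AT would also need a covariance term. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# T1 is taken as normal with the mean and standard deviation given in task (b).
t1 = NormalDist(mu=20.6, sigma=0.6)

# (b)(i) Proportion of values between 19 and 20.5.
p_between = t1.cdf(20.5) - t1.cdf(19.0)

# (b)(ii) Value that separates the highest 10% from the rest (90th percentile).
cutoff_top10 = t1.inv_cdf(0.90)

# (c) AT = 2*T1 + 3*T2.
# ASSUMPTION: T1 and T2 are independent, so the variances add.
mu_t2, sd_t2 = 19.5, 4.0
mu_at = 2 * t1.mean + 3 * mu_t2
sd_at = sqrt((2 * t1.stdev) ** 2 + (3 * sd_t2) ** 2)
p_at_below_95 = NormalDist(mu_at, sd_at).cdf(95)

print(f"P(19 < T1 < 20.5) = {p_between:.4f}")
print(f"90th percentile   = {cutoff_top10:.3f}")
print(f"P(AT < 95)        = {p_at_below_95:.4f}")
```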

Problem 3 [25%]
This problem consists of tasks (a) and (b). For task (a), assume that the values of the given
variables are normally distributed.

(a) Terje has surveyed all physical stores in Oslo and found that the average price for product
X is 171 NOK. He then randomly visited five online stores and observed the following
prices.

Table 2: Price of product X


Online store 1   Online store 2   Online store 3   Online store 4   Online store 5
165 NOK          173 NOK          168 NOK          172 NOK          166 NOK

Task: Test the hypothesis that, on average, it is cheaper to buy product X online. [10%]
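
One way to set up this test is a one-sample, lower-tailed t-test of H0: mu = 171 against H1: mu < 171, sketched below. The 5% significance level is an assumption (the task does not fix one), and the critical value is hard-coded from a standard t table for 4 degrees of freedom because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import mean, stdev

prices = [165, 173, 168, 172, 166]  # online prices from Table 2, in NOK
mu0 = 171                           # average price in physical stores

n = len(prices)
x_bar = mean(prices)
s = stdev(prices)                   # sample standard deviation (n - 1 divisor)
t_stat = (x_bar - mu0) / (s / sqrt(n))

# ASSUMPTION: 5% significance level; lower-tail critical value for
# 4 degrees of freedom, taken from a standard t table.
t_crit = -2.132
reject_h0 = t_stat < t_crit

print(f"mean = {x_bar:.1f}, s = {s:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```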

(b) Assume that the Ministry of Trade in Norway conducted an extensive study in June and
concluded that 45% of companies in Norway had lost between 5% and 25% of sales since
the onset of COVID-19. In September, a trade organization called Norsk Industri
conducted a similar study and found that, among the 360 companies that responded to the
survey, 51% had lost between 5% and 25% of sales since the onset of the pandemic.

Tasks:
(i) Using Norsk Industri’s report, calculate the margin of error for the estimate of the
proportion of companies that lost between 5% and 25% of sales since the onset of the
pandemic. [5%]

(ii) Test the following hypothesis: “The proportion reported by Norsk Industri is
significantly different from the figure reported by the Ministry of Trade”. [10%]
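
The two tasks above can be sketched as follows, using the normal approximation for a sample proportion. The 95% confidence level for the margin of error is an assumption, since the task does not specify one. The test treats the Ministry's 45% as a fixed reference value and uses it in the standard error under H0.

```python
from math import sqrt
from statistics import NormalDist

n = 360        # companies that responded to Norsk Industri's survey
p_hat = 0.51   # proportion reported by Norsk Industri
p0 = 0.45      # proportion reported by the Ministry of Trade
std_normal = NormalDist()

# (i) Margin of error. ASSUMPTION: 95% confidence level.
z_95 = std_normal.inv_cdf(0.975)
margin_of_error = z_95 * sqrt(p_hat * (1 - p_hat) / n)

# (ii) Two-sided z-test of H0: p = 0.45 against H1: p != 0.45.
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))

print(f"margin of error = {margin_of_error:.4f}")
print(f"z = {z_stat:.3f}, two-sided p-value = {p_value:.4f}")
```

With these inputs, the conclusion depends on the chosen significance level, so an answer should state it explicitly.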

Problem 4 [25%]
In this problem, we will continue to use the energy use dataset (95 observations). After some
preliminary analyses, the analyst conducted a regression analysis to explain the variation in the
energy use of light fixtures. Table 3 presents an extract of her dataset, consisting of six variables
whose names and descriptions are as follows:

• T1: Energy use of light fixtures in the house in Watt-hours
• RH_1: Temperature in kitchen area, in Celsius
• RH_2: Temperature in living room area, in Celsius
• T3: Humidity in living room area, in %
• RH_3: Temperature in laundry room area
• HMD: Humidity outside; classified as Low (1), Moderate (2) and High (3)

Table 3: An extract of energy use dataset


ID   T1      RH_1    RH_2    T3      RH_3    HMD
1    19.89   47.60   44.79   19.79   44.73   3
2    19.89   46.69   44.72   19.79   44.79   3
3    19.89   46.30   44.63   19.79   44.93   1
4    19.89   46.07   44.59   19.79   45.00   2
5    19.89   46.33   44.53   19.79   45.00   1
6    19.89   46.03   44.50   19.79   44.93   2

The analyst conducted the analysis and Table 4 presents a portion of the results.

Table 4: A portion of results from the analysis


            Estimate   t value
Intercept      12.35      3.29
RH_1           -0.06     -2.57
RH_2           -0.78     -9.79
T3              0.45      2.25
RH_3            0.81     12.81
HMD1           -0.27     -3.39
HMD2            0.03      0.44

R² = 0.85

Note:
HMD1 and HMD2 are dummy variables defined as follows:
HMD1 = 1 if the humidity outside is low, otherwise, HMD1 = 0
HMD2 = 1 if the humidity outside is moderate, otherwise, HMD2 = 0

(a) Estimate the value of T1 for the following observation [5%]


ID   RH_1   RH_2   T3   RH_3   HMD
42   46     44     19   43     3
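
A minimal sketch of the point prediction for this observation, plugging the Table 4 estimates into the fitted equation. HMD = 3 (High) is the reference category, so both dummies are zero. The dictionary names `coef`, `obs` and `x` are illustrative.

```python
# Estimated coefficients from Table 4.
coef = {
    "Intercept": 12.35, "RH_1": -0.06, "RH_2": -0.78, "T3": 0.45,
    "RH_3": 0.81, "HMD1": -0.27, "HMD2": 0.03,
}

# Observation with ID 42.
obs = {"RH_1": 46, "RH_2": 44, "T3": 19, "RH_3": 43, "HMD": 3}

# Encode HMD with the dummy definitions given in the note under Table 4.
x = {
    "RH_1": obs["RH_1"], "RH_2": obs["RH_2"], "T3": obs["T3"], "RH_3": obs["RH_3"],
    "HMD1": 1 if obs["HMD"] == 1 else 0,
    "HMD2": 1 if obs["HMD"] == 2 else 0,
}

# Fitted value: intercept plus the sum of coefficient * predictor value.
t1_hat = coef["Intercept"] + sum(coef[name] * value for name, value in x.items())
print(f"Predicted T1 = {t1_hat:.2f}")
```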

(b) Interpret the estimated coefficient of HMD1 [5%]

(c) Using a 99% confidence interval, would you conclude that RH_1 is a significant predictor
of T1? [5%]

(d) If the total sum of squares for the estimated model is 37.27, compute the adjusted R² and an
estimate of the standard deviation of the random term. [5%]
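
Tasks (c) and (d) can be sketched as below, assuming n = 95 observations and k = 6 estimated slope coefficients (RH_1, RH_2, T3, RH_3, HMD1, HMD2), which leaves 88 residual degrees of freedom. The standard error of RH_1 is recovered as estimate / t value, and the critical value 2.63 is an approximate table value for a two-sided 99% interval with about 88 degrees of freedom. It is hard-coded because the Python standard library has no t quantile function.

```python
from math import sqrt

n, k = 95, 6                 # ASSUMPTION: 95 observations, 6 slope coefficients
df_resid = n - k - 1         # residual degrees of freedom

# (c) 99% confidence interval for the RH_1 coefficient.
b_rh1, t_rh1 = -0.06, -2.57
se_rh1 = b_rh1 / t_rh1       # standard error recovered from estimate / t value
t_crit = 2.63                # ASSUMPTION: approximate t-table value, ~88 df, 99% two-sided
ci_low = b_rh1 - t_crit * se_rh1
ci_high = b_rh1 + t_crit * se_rh1
contains_zero = ci_low < 0 < ci_high

# (d) Adjusted R-squared and residual standard error.
r2, sst = 0.85, 37.27
sse = (1 - r2) * sst                                # unexplained sum of squares
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
s_e = sqrt(sse / df_resid)                          # estimate of the error standard deviation

print(f"99% CI for RH_1: ({ci_low:.4f}, {ci_high:.4f}), contains 0: {contains_zero}")
print(f"adjusted R^2 = {adj_r2:.4f}, s_e = {s_e:.4f}")
```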

(e) The analyst conducted another analysis as described below:


Step 1: She created a new dummy variable called NHMD and defined it as follows:
NHMD = 1 if the humidity outside is low, otherwise, NHMD = 0

Step 2: She multiplied T3 by NHMD and labeled the result as T3_NHMD

Step 3: She estimated a new regression model. Table 5 presents a portion of the results

Table 5: A portion of results from the second analysis


            Estimate   p value
Intercept      10.96   ------
RH_1           -0.06   ------
RH_2           -0.71   ------
T3              0.53   ------
RH_3            0.72   ------
T3_NHMD        -0.01   0.003

Task: Interpret the effect of T3_NHMD on T1. [5%]



Appendix 1

Appendix 2
