Data Analytics for Public Administration

Student’s Name

Institutional Affiliation

Course Code

Instructor

Date

Data Analytics for Public Administration

Question 1

Suppose that a city manager would like you to analyze the mileage of the city’s vehicle fleet.
Begin your analysis by creating a frequency distribution of total vehicle mileage. In this
frequency distribution, group total vehicle mileage into classes, using a class width of 10,000 and
a starting point of 20,000 miles (you should have a total of 7 classes). Answer the following:

a. What percentage of vehicles have a total mileage that is between 30,000 and 39,999?

Answer = 63%

b. How many vehicles have a mileage that is 80,000 or higher?

Answer = 1 vehicle (1% of the fleet)
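
The class grouping described in the question can be sketched in a few lines of Python. This is only an illustration of the binning rule (start at 20,000, width 10,000, 7 classes); the mileage values and the names `class_label` and `frequency_distribution` are hypothetical and are not taken from the assignment's dataset.

```python
from collections import Counter

START = 20_000    # lower limit of the first class (from the question)
WIDTH = 10_000    # class width (from the question)
N_CLASSES = 7     # 20,000-29,999 up to 80,000-89,999

def class_label(mileage):
    """Return the class label for one mileage, or None if it is outside all classes."""
    if mileage < START:
        return None
    index = int((mileage - START) // WIDTH)
    if index >= N_CLASSES:
        return None
    lower = START + index * WIDTH
    return f"{lower:,}-{lower + WIDTH - 1:,}"

def frequency_distribution(mileages):
    """Return (label, count, percent of all vehicles) for each class, in class order."""
    counts = Counter(class_label(m) for m in mileages)
    total = len(mileages)
    table = []
    for i in range(N_CLASSES):
        lower = START + i * WIDTH
        label = f"{lower:,}-{lower + WIDTH - 1:,}"
        n = counts.get(label, 0)
        table.append((label, n, round(100 * n / total, 2)))
    return table

# Hypothetical mileages for illustration only; not the assignment's data.
sample_mileages = [21_500, 34_000, 38_250, 39_999, 45_100, 81_000]
for label, n, pct in frequency_distribution(sample_mileages):
    print(f"{label:>15}  {n:>3}  {pct:6.2f}%")
```

Part a reads the percent column for the 30,000-39,999 class, while part b reads the count column for the last class, which is why the two answers are in different units.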

Question 2

Is there a relationship between vehicle type and annual maintenance cost? Continue your
analysis by creating a crosstabulation with vehicle type and annual maintenance cost as the two
variables of interest.

In this crosstabulation, group annual maintenance cost into classes, using a class width of $1000
and a starting point of $200 (you should have 6 classes). Answer the following questions:

a. How many vehicles in the city’s vehicle fleet are emergency service vehicles?

Answer = 11 Vehicles

b. Among vehicles that cost between $200 and $1199 to maintain annually, what percent are
non-emergency service vehicles?

Answer = 96.30%

c. Among emergency service vehicles, what percent cost more than $3199 per year to maintain?

Answer = 72.73%

d. Among non-emergency service vehicles, what percent cost more than $3199 per year to
maintain?

Answer = 10.11%
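
Parts b through d mix two kinds of percentage: part b is a percentage within a cost class, while parts c and d are percentages within a vehicle type. A minimal sketch of the crosstabulation and the within-type percentage follows, assuming records of the form (vehicle type, annual cost). The records and the helper names `cost_class`, `crosstab`, and `row_percent` are hypothetical, chosen only to illustrate the grouping rule from the question (start $200, width $1000, 6 classes).

```python
from collections import Counter

def cost_class(cost, start=200, width=1000, n_classes=6):
    """Return the lower limit of the class containing `cost`, or None if out of range."""
    if cost < start:
        return None
    index = int((cost - start) // width)
    if index >= n_classes:
        return None
    return start + index * width

def crosstab(records):
    """Count vehicles by (vehicle type, lower limit of cost class)."""
    return Counter((vtype, cost_class(cost)) for vtype, cost in records)

def row_percent(table, vtype, lower_limits):
    """Percent of one vehicle type whose cost class is in `lower_limits`."""
    row_total = sum(n for (t, _), n in table.items() if t == vtype)
    selected = sum(n for (t, c), n in table.items()
                   if t == vtype and c in lower_limits)
    return round(100 * selected / row_total, 2)

# Hypothetical (type, annual cost) records for illustration only.
records = [
    ("emergency", 3500), ("emergency", 4100), ("emergency", 900),
    ("non-emergency", 450), ("non-emergency", 1250),
    ("non-emergency", 3300), ("non-emergency", 700),
]
table = crosstab(records)

# Classes above $3,199 are those with lower limits 3200, 4200, and 5200.
above_3199 = {3200, 4200, 5200}
print(row_percent(table, "emergency", above_3199))
print(row_percent(table, "non-emergency", above_3199))
```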

Question 3

Please answer the next set of questions pertaining to the annual maintenance cost per vehicle. For
all responses, round to two decimal places.

a. What is the mean annual maintenance cost per vehicle?

Answer = $1673.69

b. What is the median annual maintenance cost per vehicle?

Answer = $1119.50

c. What is the standard deviation?

Answer = $1572.15

d. Suppose a given vehicle in the fleet costs $5000/year to maintain. How many standard
deviations away from the mean is this vehicle?

Answer = (5000 - 1673.69) / 1572.15 = 2.12 standard deviations above the mean

e. What is the probability of a vehicle in the fleet costing more than $3500/year to maintain?

Calculating the z-score: z = (X - Mean) / SD = (3500 - 1673.69) / 1572.15 = 1.161663

Assuming annual maintenance cost is approximately normally distributed, P(Z > 1.161663) = 0.12269

Therefore, the probability is approximately 12.27%.

f. What is the correlation between annual maintenance cost and total number of days the vehicle
is in use per year?

Answer = 0.29
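
The z-score and tail-probability calculations in parts d and e can be reproduced with Python's standard library. The mean and standard deviation are the values reported in parts a and c; the use of a normal model is an assumption of the calculation, and since the mean ($1673.69) is well above the median ($1119.50) the cost data may be right-skewed, so the result is best read as an approximation.

```python
from statistics import NormalDist

mean_cost = 1673.69   # mean annual maintenance cost (part a)
sd_cost = 1572.15     # standard deviation (part c)

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_5000 = z_score(5000, mean_cost, sd_cost)
z_3500 = z_score(3500, mean_cost, sd_cost)

# Upper-tail probability under a standard normal model (an assumption).
p_above_3500 = 1 - NormalDist().cdf(z_3500)

print(round(z_5000, 2))        # 2.12
print(round(z_3500, 6))
print(round(p_above_3500, 4))
```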

Question 4

For the remainder of the questions, suppose that the data in the dataset is from a simple random
sample of vehicles within a large city’s vehicle fleet, and you are interested in making some
statistical inferences about the city’s entire fleet. For the questions below, round your answers to
two decimal places.

a. For the total number of days the vehicle was in use over the year (TDAYS), what is the point
estimate of the population mean?

Answer = 204.19

b. The city manager wants an estimate of the degree to which vehicles in the fleet are being used.
Provide an interval estimate for the mean total number of days vehicles are in use over the year.
Specifically, what is the 99% confidence interval?

Lower limit = 180.83

Upper limit = 227.55

99% confidence interval = (180.83, 227.55)
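
A t-based interval has the form point estimate ± margin of error, so it should be symmetric around the sample mean. The short check below uses only the figures reported above and confirms that the limits are centred on 204.19; the variable names are illustrative.

```python
point_estimate = 204.19          # sample mean of TDAYS (part a)
lower, upper = 180.83, 227.55    # reported 99% limits (part b)

midpoint = (lower + upper) / 2   # should equal the point estimate
margin = (upper - lower) / 2     # margin of error (half the interval width)

print(round(midpoint, 2))  # 204.19
print(round(margin, 2))    # 23.36
```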

Question 5

Suppose that, in a previous year, the city’s fleet drove 200 days out of the year on average. The
city manager would like to know if this year’s fleet is any different. Conduct a hypothesis test to
examine if the city’s vehicle fleet is being driven 200 days out of the year on average. The
population standard deviation is unknown.

a. This is a two-tailed test, because the city manager wants to know whether this year’s average
is any different from 200 days, in either direction.

Null Hypothesis (H₀): The average number of days the vehicles were driven in the current year
is equal to the average of 200 days from the previous year (μ = 200)

Alternative Hypothesis (H₁): The average number of days the vehicles were driven in the
current year is not equal to the average of 200 days from the previous year (μ ≠ 200)

b. What is the test statistic?

Answer: t = 0.47

c. What is the p-value associated with the test statistic?

Answer: p = 0.64 (two-tailed; the corresponding one-tailed p-value is 0.32)

d. Would you reject the null hypothesis that the city’s vehicle fleet is driven 200 days out of the year?

We do not reject the null hypothesis, because the p-value is well above conventional significance
levels such as 0.05.

e. What would you conclude?

There is not enough evidence to conclude that the average number of days the vehicles were driven
in the current year differs from the previous year’s average of 200 days. That is, the data are
consistent with the city’s fleet being driven 200 days out of the year on average.
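
Because the population standard deviation is unknown, the test statistic is a one-sample t: the difference between the sample mean and 200, divided by the estimated standard error s/√n. A minimal sketch of that calculation follows. The function name `one_sample_t` and the sample values are hypothetical and are not the assignment's data.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # stdev uses the n - 1 denominator
    t_stat = (mean(sample) - mu0) / standard_error
    return t_stat, n - 1

# Hypothetical days-in-use values for illustration only.
sample_days = [180, 220, 195, 240, 210, 175, 230, 205]
t_stat, df = one_sample_t(sample_days, 200)
print(round(t_stat, 2), df)
```

The p-value then comes from the t distribution with n - 1 degrees of freedom. Python's standard library does not include a t-distribution CDF, so that step needs a statistics package or a t table.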

Question 6

The city manager wants to develop a better understanding of the factors that influence vehicle
maintenance costs. Use the data to run a simple linear regression with annual maintenance cost
per vehicle (MCOST) as the dependent variable and total vehicle mileage (TMILE) as the
independent variable. Then, answer the following questions:

a. What is the regression equation?

MCOST (y) = 791.19 + 0.02 (TMILE)

b. What is the coefficient of determination?

Answer = 0.0407

c. With a level of significance of 0.01, we would conclude that the slope coefficient for total
vehicle mileage is: not significantly different from zero

d. With a level of significance of 0.05, we would conclude that the slope coefficient for total
vehicle mileage is: significantly different from zero
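
For readers who want to see where the intercept, slope, and coefficient of determination come from, here is a minimal least-squares sketch using only the standard library. `fit_line` is a hypothetical helper, and `predict_mcost` simply plugs a mileage into the equation reported in part a. Note that the reported slope of 0.02 is rounded to two decimals, so predictions made from it are approximate.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx                       # slope
    b0 = y_bar - b1 * x_bar              # intercept
    r_squared = sxy ** 2 / (sxx * syy)   # coefficient of determination
    return b0, b1, r_squared

def predict_mcost(tmile, b0=791.19, b1=0.02):
    """Predicted annual maintenance cost from the equation reported in part a."""
    return b0 + b1 * tmile

# 50,000 miles is an arbitrary illustrative input, not a value from the dataset.
print(round(predict_mcost(50_000), 2))  # 1791.19
```

A coefficient of determination of 0.0407 means that total mileage alone accounts for only about 4% of the variation in annual maintenance cost.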

Question 7

Suppose that the city manager was underwhelmed by the results of the simple linear regression.
Use the data to run a multiple regression with annual maintenance cost per vehicle (MCOST) as
the dependent variable and total vehicle mileage (TMILE), total number of days the vehicle was
in use over the year (TDAYS), and vehicle type (VTYPE) as the independent variables.

Y = MCOST

X1 = TMILE

X2 = TDAYS

X3 = VTYPE

a. What is the regression equation?

Y = −236.22 + 0.0218 (TMILE) + 3.36 (TDAYS) + 2978.92 (VTYPE)

b. What is the adjusted coefficient of determination?

Answer = 0.4319

c. Provide an interpretation of the slope coefficients for each independent variable in the model.
In your interpretation, first consider whether each slope coefficient is significantly different from
zero at a level of significance of 0.05. Then, interpret what each slope coefficient means, making
sure you are specific to the variables and the example.

TMILE (Total Vehicle Mileage)

The slope coefficient for TMILE is 0.0218 with a p-value of 0.0109.

At a significance level of 0.05, the slope coefficient for TMILE is significantly different from
zero.

Interpretation: For each additional mile of total vehicle mileage (TMILE), holding TDAYS and
VTYPE constant, the annual maintenance cost per vehicle is expected to increase by approximately
$0.0218 (about $21.80 per additional 1,000 miles).

TDAYS (Total Number of Days in Use)

The slope coefficient for TDAYS is 3.36 with a p-value of 0.0177.

At a significance level of 0.05, the slope coefficient for TDAYS is significantly different from
zero.

Interpretation: For each additional day the vehicle is in use over the year (TDAYS), holding
other variables constant, there is an associated increase in the annual maintenance cost per
vehicle by approximately $3.36.

VTYPE (Vehicle Type)

The slope coefficient for VTYPE is 2978.92 with a p-value very close to zero (7.70E-12).

At any reasonable significance level, the slope coefficient for VTYPE is significantly different
from zero.

Interpretation: Vehicle type, entered as a dummy-coded (0/1) variable, is a significant predictor
of annual maintenance cost per vehicle. Holding TMILE and TDAYS constant, vehicles coded 1 on
VTYPE are expected to cost approximately $2978.92 more per year to maintain than vehicles in the
reference category (coded 0). The crosstabulation in Question 2 suggests that the higher-cost
group is the emergency service vehicles.
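
The "holding other variables constant" reading of the coefficients can be illustrated by plugging values into the reported equation. In the sketch below, the coefficients are those from part a; the inputs (50,000 miles, 200 days) are arbitrary illustrative values, and which vehicle type is coded 1 on VTYPE is not stated in this write-up, so that coding is an assumption.

```python
def predict_mcost(tmile, tdays, vtype):
    """Predicted annual maintenance cost from the multiple regression in part a.

    vtype is the 0/1 dummy variable; which vehicle type is coded 1 is assumed,
    not confirmed by the write-up.
    """
    return -236.22 + 0.0218 * tmile + 3.36 * tdays + 2978.92 * vtype

# Two otherwise identical vehicles that differ only in vehicle type.
reference = predict_mcost(tmile=50_000, tdays=200, vtype=0)
other_type = predict_mcost(tmile=50_000, tdays=200, vtype=1)

print(round(reference, 2))               # 1525.78
print(round(other_type - reference, 2))  # 2978.92

# One extra day in use, with mileage and type held fixed.
extra_day = predict_mcost(50_000, 201, 0) - reference
print(round(extra_day, 2))               # 3.36
```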