Predictive Analysis Midterm - F22
Student Name:
MIDTERM EXAM
1. In an Excel spreadsheet, calculate (i) the Final Grade for each student (NOTE: each exam is
equally weighted, but the Final Exam counts as the equivalent of two exams, i.e., it is weighted
double); and the population (ii) Mean, (iii) Median, (iv), and (v) Standard Deviation of the Final
Grades for the following data set from a graduate-level statistics course:
Student   Exam 1  Exam 2  Exam 3  Exam 4  Final Exam
Lonny       81      83      88      90       89
Charlie     87      80      87      91       89
Doris       90      88      92      92       93
Rita        76      71      85      90       87
Timmy       81      77      86      91       91
Ray         77      69      77      88       92
Donnie      64      60      71      89       87
Xavier      88      85      89      77       89
Millie      85      89      94      89       88
Alice       91      87      95      96       93
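The weighting rule in Question 1 gives the Final Exam two of six equal shares, so each Final Grade is (Exam 1 + Exam 2 + Exam 3 + Exam 4 + 2 × Final) / 6. In a spreadsheet laid out with names in column A and scores in columns B to F (a hypothetical layout), this would be `=(B2+C2+D2+E2+2*F2)/6`, followed by `=AVERAGE(...)`, `=MEDIAN(...)` and `=STDEV.P(...)` over the results. The sketch below does the same calculation in Python so the numbers can be cross-checked. It assumes "population" means dividing by N (Excel's `STDEV.P`, Python's `statistics.pstdev`), and it leaves out statistic (iv), which is not named in the question.

```python
import statistics

# Scores transcribed from the table above: (Exam 1, Exam 2, Exam 3, Exam 4, Final Exam).
scores = {
    "Lonny":   (81, 83, 88, 90, 89),
    "Charlie": (87, 80, 87, 91, 89),
    "Doris":   (90, 88, 92, 92, 93),
    "Rita":    (76, 71, 85, 90, 87),
    "Timmy":   (81, 77, 86, 91, 91),
    "Ray":     (77, 69, 77, 88, 92),
    "Donnie":  (64, 60, 71, 89, 87),
    "Xavier":  (88, 85, 89, 77, 89),
    "Millie":  (85, 89, 94, 89, 88),
    "Alice":   (91, 87, 95, 96, 93),
}


def final_grade(e1, e2, e3, e4, final):
    """Weighted mean: four exams at weight 1 and the Final at weight 2 (six shares total)."""
    return (e1 + e2 + e3 + e4 + 2 * final) / 6


final_grades = {name: final_grade(*s) for name, s in scores.items()}
grades = list(final_grades.values())

pop_mean = statistics.mean(grades)
pop_median = statistics.median(grades)
pop_sd = statistics.pstdev(grades)  # population SD (divides by N), like Excel's STDEV.P

for name, g in final_grades.items():
    print(f"{name:8s} {g:6.2f}")
print(f"Mean {pop_mean:.2f}  Median {pop_median:.2f}  SD (population) {pop_sd:.2f}")
```

Swapping `pstdev` for `statistics.stdev` gives the sample (N − 1) version, which matches Excel's `STDEV.S`.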
2. Using the same data set, create (i) a scatterplot of all FINAL calculated grades; (ii) a single,
clustered bar graph illustrating EACH student's grade performance; and (iii) a simple line
graph depicting student performance over time (i.e., collective progression from Exam 1
through the Final).
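The line graph in (iii) needs one value per assessment. This sketch assumes "collective progression" means the class average for each exam. That is an interpretation, not something the question spells out. It computes the five points to plot (in a spreadsheet, one `AVERAGE` per column would do the same).

```python
# Scores from the table in Question 1, one tuple per student:
# (Exam 1, Exam 2, Exam 3, Exam 4, Final Exam).
scores = [
    (81, 83, 88, 90, 89), (87, 80, 87, 91, 89), (90, 88, 92, 92, 93),
    (76, 71, 85, 90, 87), (81, 77, 86, 91, 91), (77, 69, 77, 88, 92),
    (64, 60, 71, 89, 87), (88, 85, 89, 77, 89), (85, 89, 94, 89, 88),
    (91, 87, 95, 96, 93),
]
labels = ["Exam 1", "Exam 2", "Exam 3", "Exam 4", "Final"]

# zip(*scores) transposes rows (students) into columns (assessments).
class_means = [sum(col) / len(col) for col in zip(*scores)]

for label, m in zip(labels, class_means):
    print(f"{label:7s} {m:.1f}")
```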
3. Using the same data set, create an ILLUSTRATED regression analysis comparing (i) the Top
Performing Student (Student A) to the Mean and (ii) the Lowest Performing Student
(Student B) to the Mean; these may be illustrated in one graph (Students A & B vs. Mean)
or in separate graphs.
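One way to read Question 3 is as a regression of a student's five assessment scores on the class mean for the same assessments. This sketch makes that assumption. It takes Student A and Student B to be the highest and lowest weighted Final Grades from Question 1 (Alice, 92.50, and Donnie, 76.33). It fits an ordinary least-squares line by hand, so no plotting or statistics library is required. The `least_squares` helper name is illustrative.

```python
def least_squares(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Class mean per assessment (Exam 1-4, Final), computed from the table in Question 1.
class_means = [82.0, 78.9, 86.4, 89.3, 89.8]
# Highest and lowest weighted final grades in that table (Final counted twice).
student_a = [91, 87, 95, 96, 93]  # Alice
student_b = [64, 60, 71, 89, 87]  # Donnie

for name, y in (("Student A", student_a), ("Student B", student_b)):
    slope, intercept = least_squares(class_means, y)
    print(f"{name}: score = {intercept:.2f} + {slope:.3f} * class_mean")
```

A slope above 1 means the student's scores swing more widely than the class average from one assessment to the next. A slope below 1 means they vary less.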
Part 2: Short Answer (50%)
Please provide the briefest explanation that FULLY answers the following questions.
1. In data analysis, what do the terms (i) “frequency” and (ii) “mode” mean?
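As a small illustration, applied here to the Final Exam column of the Part 1 table (a choice made for this example only): a value's frequency is the number of times it occurs, and the mode is the value (or values) with the highest frequency. `collections.Counter` and `statistics.multimode` are both in Python's standard library (`multimode` requires Python 3.8 or later).

```python
from collections import Counter
import statistics

# Final Exam column from the table in Part 1, Question 1.
final_exam = [89, 89, 93, 87, 91, 92, 87, 89, 88, 93]

frequency = Counter(final_exam)           # value -> number of occurrences
modes = statistics.multimode(final_exam)  # most frequent value(s); a list in case of ties

for value, count in sorted(frequency.items()):
    print(value, count)
print("mode:", modes)
```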
2. What are the three ways to link quantitative and qualitative data?
Parametric tests assume statistical distributions underlying the data. Therefore, certain
validity conditions must be met for the result of a parametric test to be reliable. For example, Student's t-test for
two independent samples is reliable only if each sample follows a normal distribution and the variances are
homogeneous.
Nonparametric tests do not require the data to follow any particular distribution. They can therefore be applied even when
the parametric validity conditions are not met.
4200 Connecticut Avenue NW | Washington, DC 20008 | 202.274.5869 | udc.edu
9. What is a “t-test”?
A t-test (also known as Student's t-test) is a tool for evaluating the means of one or two
groups through hypothesis testing. A t-test can be used to determine whether a single group
differs from a known value (a one-sample t-test), whether two groups differ from each other
(an independent-samples t-test), or whether there is a significant difference in paired
measurements (a dependent-samples, or paired, t-test).
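The three variants described above differ only in how the test statistic is formed. The sketch below computes each t statistic with the standard library. The function names are illustrative. Turning a t statistic into a p-value needs the t distribution's CDF, which the standard library does not provide. SciPy's `scipy.stats.ttest_1samp`, `ttest_ind` and `ttest_rel` return both values if SciPy is available.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with s the sample SD; df = n - 1."""
    n = len(data)
    return (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))


def paired_t(before, after):
    """Paired (dependent-samples) t-test: a one-sample t-test on the differences against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0)


def independent_t(x, y):
    """Independent-samples t with pooled variance (assumes equal variances); df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
```

The pooled version builds in the equal-variance (homogeneity) condition that Student's two-sample test relies on. When that condition is doubtful, Welch's unequal-variance form is the usual alternative.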
12. The p-value is a probability. For the pattern analysis tools, it is the probability that
the observed spatial pattern was created by some random process. When the p-value is very
small, it means it is very unlikely (small probability) that the observed spatial pattern is
the result of random processes, so you can reject the null hypothesis. You might ask: How
small is small enough? Good question. See the table and discussion below.
13. Z-scores are standard deviations. If, for example, a tool returns a z-score of +2.5, you
would say that the result is 2.5 standard deviations above the mean. Both z-scores and
p-values are associated with the standard normal distribution, as shown below.
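To make the link between z-scores and p-values concrete, the sketch below converts a z-score into a p-value with `statistics.NormalDist` (Python 3.8 or later). It assumes a two-tailed test, in which a result far from zero in either direction counts as unusual. That is an assumption about how the p-value is defined here.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_tailed_p(z):
    """Probability of a z-score at least this far from 0, in either tail."""
    return 2 * (1 - std_normal.cdf(abs(z)))


print(round(two_tailed_p(2.5), 4))
```

For the +2.5 example, the two tails together hold a little over 1% of the distribution.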
14. Very high or very low (negative) z-scores, associated with very small p-values, are found
in the tails of the normal distribution. When you run a feature pattern analysis tool and it
yields small p-values and either a very high or a very low z-score, this indicates that it is
unlikely that the observed spatial pattern reflects the theoretical random pattern represented
by your null hypothesis (complete spatial randomness, CSR).
15. To reject the null hypothesis, you must make a subjective judgment regarding the degree of
risk you are willing to accept of being wrong (that is, of falsely rejecting the null
hypothesis). Consequently, before you run the spatial statistic, you select a confidence
level. Typical confidence levels are 90, 95, or 99 percent. A confidence level of 99 percent
would be the most conservative in this case, indicating that you are unwilling to reject the
null hypothesis unless the probability that the pattern was created by random chance is
really small (less than a 1 percent probability).
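Choosing a confidence level fixes a critical z value in advance, and the null hypothesis is rejected only if the observed |z| exceeds it. This sketch assumes a two-tailed test and uses `statistics.NormalDist.inv_cdf`. The helper names are illustrative.

```python
from statistics import NormalDist


def critical_z(confidence):
    """Two-tailed critical value: |z| must exceed this to reject at the given confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_null(z, confidence):
    """True when the observed z-score falls beyond the critical value."""
    return abs(z) > critical_z(confidence)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: |z| > {critical_z(level):.3f}")
```

A z-score of +2.5 clears the 90% and 95% thresholds (about 1.645 and 1.960) but not the more conservative 99% threshold (about 2.576).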
Degrees of freedom (DF) combine how much data you have with how many parameters you
need to estimate. They indicate how much independent information goes into a parameter estimate. In
this vein, it's easy to see that you want a lot of information to go into parameter estimates to obtain
more precise estimates and more powerful hypothesis tests. So, you want many DF!
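The "data minus estimated parameters" idea shows up directly in the usual textbook DF formulas. A minimal sketch with illustrative function names:

```python
def df_one_sample(n):
    """One-sample or paired t-test: n observations, one estimated mean."""
    return n - 1


def df_pooled_two_sample(n1, n2):
    """Pooled two-sample t-test: two estimated means."""
    return n1 + n2 - 2


def df_regression_residual(n, k):
    """Linear regression with k predictors plus an intercept (k + 1 estimated parameters)."""
    return n - (k + 1)
```

For example, with the ten students in Part 1, a one-sample t-test on their Final Grades would have 9 DF, and a simple linear regression (one predictor) on the same ten points would leave 8 residual DF.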
17. In each of these statements, tell whether descriptive or inferential statistics have been used.
a. By 2040 at least 3.5 billion people will run short of water (World Future Society). Inferential
b. Nine out of ten on-the-job fatalities are men (Source: USA TODAY Weekend). Inferential
c. Expenditures for the cable industry were $5.66 billion in 1996 (Source: USA TODAY). Descriptive
d. The median household income for people aged 25–34 is $35,888 (Source: USA TODAY). Descriptive
e. Allergy therapy makes bees go away (Source: Prevention). Inferential
19. The average quantitative GRE scores for the top 30 graduate schools of engineering are listed. Construct a
grouped frequency distribution and a cumulative frequency distribution with 5 classes.
767 770 761 760 771 768 776 771 756 770 763 760 747 766 754 771 771 778 766 762 780 750 746 764
769 759 757 753 758 746
https://www.chegg.com/homework-help/questions-and-answers/760-11-gre-scores-top-ranked-engineering-schools-average-quantitative-gre-scores-top-30-gr-q88567273
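A common textbook procedure for grouped frequency distributions sets the class width to range ÷ number of classes, rounded up, and starts the first class at the minimum value. This sketch follows that convention, which is an assumption, since other texts use class boundaries or a different starting point. It builds the 5-class frequency and cumulative frequency columns for the 30 scores.

```python
import math

# Average quantitative GRE scores for the top 30 engineering schools (from the question).
scores = [767, 770, 761, 760, 771, 768, 776, 771, 756, 770, 763, 760, 747, 766, 754,
          771, 771, 778, 766, 762, 780, 750, 746, 764, 769, 759, 757, 753, 758, 746]

num_classes = 5
low, high = min(scores), max(scores)
width = math.ceil((high - low) / num_classes)  # range 34 / 5 = 6.8, rounded up to 7

classes = []
cumulative = 0
for i in range(num_classes):
    lower = low + i * width
    upper = lower + width - 1  # integer data, so class limits are whole numbers
    freq = sum(lower <= s <= upper for s in scores)
    cumulative += freq
    classes.append((lower, upper, freq, cumulative))

for lower, upper, freq, cum in classes:
    print(f"{lower}-{upper}  f={freq:2d}  cf={cum:2d}")
```

The last cumulative frequency should equal the number of data values (30). That is a quick check that no score fell outside the classes.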
20. In a recent survey, 3 in 10 people indicated that they are likely to leave their jobs when the economy
improves. Of those surveyed, 34% indicated that they would make a career change, 29% want a new job
in the same industry, 21% are going to start a business, and 16% are going to retire. Make a pie chart for
the data.
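Each pie slice's central angle is its percentage × 360°. The four percentages describe the people who said they are likely to leave their jobs (the "3 in 10"), and they sum to 100%, so they fill the whole pie. This sketch computes the angles only. Drawing the chart (for example with matplotlib's `pyplot.pie`, if available) is left out.

```python
# Percentages from the survey (they sum to 100).
shares = {
    "Career change": 34,
    "New job, same industry": 29,
    "Start a business": 21,
    "Retire": 16,
}

assert sum(shares.values()) == 100
angles = {label: pct * 360 / 100 for label, pct in shares.items()}

for label, deg in angles.items():
    print(f"{label:24s} {shares[label]:3d}%  {deg:6.1f} degrees")
```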
24. In building new homes, a contractor finds that the probability of a home buyer selecting a two-car garage
is 0.70 and of selecting a one-car garage is 0.20. Find the probability that the buyer will select no garage.
The builder does not build houses with three-car or more garages.
https://www.chegg.com/homework-help/mathzone-access-card-for-math-in-our-world-2nd-edition-chapter-11.7-problem-30e-solution-9780077314118
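Because the builder offers only a two-car garage, a one-car garage, or no garage, these three outcomes are mutually exclusive and exhaustive. Their probabilities must therefore sum to 1, and the complement rule gives the answer directly:

```latex
P(\text{no garage}) = 1 - \left[ P(\text{two-car}) + P(\text{one-car}) \right] = 1 - (0.70 + 0.20) = 0.10
```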
25. The composition of the Senate of the 111th Congress is 41 Republicans, 2 Independents, and 57 Democrats. A
new committee is being formed to study ways to benefit the arts in education.
a. If 3 Senators are selected at random to head the committee, what is the probability that they will
all be Republicans?
b. What is the probability that they will all be Democrats?
c. What is the probability that there will be 1 from each party, including the Independent?
https://www.chegg.com/homework-help/questions-and-answers/help-3-questions-please-sorry-shotty-photoshop-skills-think-get-point-q19045788
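Selecting 3 of 100 senators at random is a selection without replacement where order does not matter. This sketch assumes the three heads have no distinct roles, so it counts combinations. It uses `math.comb` (Python 3.8 or later) and exact `Fraction` arithmetic to avoid rounding until the final display.

```python
from fractions import Fraction
from math import comb

republicans, independents, democrats = 41, 2, 57
senators = republicans + independents + democrats  # 100
ways = comb(senators, 3)  # unordered selections of 3 senators

p_all_r = Fraction(comb(republicans, 3), ways)
p_all_d = Fraction(comb(democrats, 3), ways)
# One from each party: 41 choices x 2 choices x 57 choices.
p_one_each = Fraction(republicans * independents * democrats, ways)

for label, p in (("a. all Republicans", p_all_r), ("b. all Democrats", p_all_d),
                 ("c. one from each party", p_one_each)):
    print(f"{label:24s} {float(p):.4f}")
```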
26. Approximately 10.3% of American high school students drop out of school before graduation. Choose 10
students entering high school at random. Find the probability that
a. No more than two drop out
b. At least 6 graduate
c. All 10 stay in school and graduate
https://www.chegg.com/homework-help/questions-and-answers/approximately-103-american-high-school-students-drop-school-graduation-choose-10-students--q87719579
https://www.chegg.com/homework-help/questions-and-answers/find-probability-p-0-q38586535
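Question 26 fits a binomial model if each student independently drops out with probability 0.103. That independence is an assumption of the sketch. Letting X be the number of dropouts among the 10 students, "at least 6 graduate" is the same event as "at most 4 drop out", and "all 10 graduate" is X = 0. The helper names are illustrative.

```python
from math import comb

n, p_drop = 10, 0.103


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# X = number of dropouts among the 10 students.
a = binom_cdf(2, n, p_drop)  # no more than two drop out
b = binom_cdf(4, n, p_drop)  # at least 6 graduate  <=>  at most 4 drop out
c = binom_pmf(0, n, p_drop)  # all 10 graduate      <=>  0 drop out

print(f"a. {a:.4f}\nb. {b:.4f}\nc. {c:.4f}")
```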
28. In the standard normal distribution, find the values of z for the 75th, 80th, and 92nd percentiles.
https://www.chegg.com/homework-help/questions-and-answers/need-answers-important-minitab-application-minitab-please-answer-find-z-value-minitab-stan-q27069865?trackid=a04fcc798c3e&strackid=55831424652b
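A percentile of the standard normal distribution is the z value whose cumulative area equals that percentage. The inverse CDF in Python's `statistics.NormalDist` returns it directly. Table lookups round to two decimals, so they may differ slightly in the last digit.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z such that P(Z <= z) equals the given percentile.
percentile_z = {pct: std_normal.inv_cdf(pct / 100) for pct in (75, 80, 92)}

for pct, z in percentile_z.items():
    print(f"P{pct}: z = {z:.4f}")
```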
29. The average credit card debt for college seniors is $3262. If the debt is normally distributed with a
standard deviation of $1100, find these probabilities.
a. That the senior owes at least $1000
b. That the senior owes more than $4000
c. That the senior owes between $3000 and $4000
[Worked solutions for parts (a) through (c) appeared as images and are not recoverable.]
(b) So the probability that the senior owes more than $4000 is
https://www.chegg.com/homework-help/elementary-statistics-9th-edition-chapter-6.2-problem-11e-solution-9780078136337
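Under the stated normal model, each part of Question 29 is an area under the curve. This sketch uses `statistics.NormalDist` with μ = 3262 and σ = 1100. It evaluates the CDF exactly rather than rounding z to two decimals, so the fourth decimal place may differ slightly from a table-based answer.

```python
from statistics import NormalDist

debt = NormalDist(mu=3262, sigma=1100)  # normal model stated in the question

p_at_least_1000 = 1 - debt.cdf(1000)               # a. P(X >= 1000)
p_more_than_4000 = 1 - debt.cdf(4000)              # b. P(X > 4000)
p_between = debt.cdf(4000) - debt.cdf(3000)        # c. P(3000 < X < 4000)

print(f"a. {p_at_least_1000:.4f}\nb. {p_more_than_4000:.4f}\nc. {p_between:.4f}")
```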
49. For the following, find the variance and the standard deviation:
Eighty randomly selected batteries were tested to determine their lifetimes (in hours). The following
frequency distribution was obtained.
Lifetime (hours)   Frequency
62.5–73.5              5
73.5–84.5             14
84.5–95.5             18
95.5–106.5            25
106.5–117.5           12
117.5–128.5            6
https://www.chegg.com/homework-help/questions-and-answers/class-frequency-625-735-5-735-845-14-845-955-18-955-1065-25-1065-1175-12-1175-1285-6-find--q79140215
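For grouped data, each class is represented by its midpoint, and the sums Σf·X and Σf·X² feed the shortcut formula s² = [nΣf·X² − (Σf·X)²] / [n(n − 1)]. This sketch uses that sample (n − 1) form, which is an assumption that treats the 80 batteries as a sample. Dividing by n instead would give the population variance.

```python
import math

# Class limits (hours) and frequencies from the table above.
classes = [
    (62.5, 73.5, 5),
    (73.5, 84.5, 14),
    (84.5, 95.5, 18),
    (95.5, 106.5, 25),
    (106.5, 117.5, 12),
    (117.5, 128.5, 6),
]

n = sum(f for _, _, f in classes)                                 # 80 batteries
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)          # sum of f * midpoint
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)  # sum of f * midpoint^2

# Sample variance for grouped data (divides by n - 1).
variance = (n * sum_fx2 - sum_fx**2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(f"variance = {variance:.1f}, standard deviation = {std_dev:.1f}")
```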
In a recent study of 35 ninth-grade students, the mean number of hours per week that they played video games was 16.6.
The standard deviation of the population was 2.8.
1. Find the 95% confidence interval of the mean of the time playing video games.
2. Calculate the margin of error.
a. The confidence interval is (15.67, 17.53).
https://www.chegg.com/homework-help/questions-and-answers/recent-study-35-ninth-grade-students-mean-number-hours-per-week-played-video-games-166-sta-q103046161
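Because the population standard deviation is given, the interval uses the z distribution: margin of error E = z₀.₀₂₅ · σ / √n, and the interval is x̄ ± E. This sketch reproduces the interval above and also computes the margin of error asked for in part 2.

```python
import math
from statistics import NormalDist

n, mean, sigma = 35, 16.6, 2.8  # sample size, sample mean (hours/week), population SD
confidence = 0.95

z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
margin_of_error = z_crit * sigma / math.sqrt(n)
ci = (mean - margin_of_error, mean + margin_of_error)

print(f"E = {margin_of_error:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```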
Part 4: Statistical Application 2 (15%)
Imagine a scenario where you've been asked to be the gatekeeper (and paid very well for your duty) for a
series of scientific experiments to determine "truthfulness" in humans. A series of Subjects will approach you
and make a single statement; you are to determine the veracity of that statement (for simplicity, assume that
the only options available to you are to determine whether a subject's statement is "true" or "false", i.e., there are
ONLY TWO OUTCOMES). REVIEW the Probability Equations!
Suppose that you are pre-equipped with knowledge of the following probabilities:
82% of all subjects you will interview are "liars"
75% of all "liars" will be identified
40% of the time, "truth-tellers" will be wrongly identified as liars
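The specific questions are not reproduced in this section. This sketch therefore computes only the quantities such questions typically ask for, using the three given probabilities, the law of total probability and Bayes' theorem. It reads "identified" as "judged to be a liar", which is an assumption, and all variable names are illustrative.

```python
p_liar = 0.82
p_flag_given_liar = 0.75   # liars correctly identified as liars
p_flag_given_truth = 0.40  # truth-tellers wrongly identified as liars
p_truth = 1 - p_liar

# Law of total probability: overall chance a subject is judged to be a liar.
p_flag = p_liar * p_flag_given_liar + p_truth * p_flag_given_truth

# Bayes' theorem: chance a subject really is a liar, given they were judged a liar.
p_liar_given_flag = p_liar * p_flag_given_liar / p_flag

# Chance a subject really is truthful, given they were judged truthful.
p_truth_given_clear = p_truth * (1 - p_flag_given_truth) / (1 - p_flag)

print(f"P(judged liar) = {p_flag:.3f}")
print(f"P(liar | judged liar) = {p_liar_given_flag:.3f}")
print(f"P(truthful | judged truthful) = {p_truth_given_clear:.3f}")
```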
Please answer the following questions: