Case 1

Last year, five randomly selected students took a math aptitude test before they began their statistics course.
The Statistics Department has two questions.
• What linear regression equation best predicts statistics performance, based on math aptitude scores?
• If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
How to Find the Regression Equation: In the table below, the xi column shows scores on the
aptitude test. Similarly, the yi column shows statistics grades. The last two rows show sums and
mean scores that we will use to conduct the regression analysis.
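
The sums and means described above feed the standard least-squares formulas for a line y = b0 + b1 * x, where b1 = sum((xi - mean_x) * (yi - mean_y)) / sum((xi - mean_x)^2) and b0 = mean_y - b1 * mean_x. The sketch below shows that calculation in Python. The function names fit_simple_regression and predict are illustrative, and the score lists are made-up placeholders, not the values from the case table, which should be substituted in.

```python
def fit_simple_regression(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    # Means of the predictor and the response.
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    b1 = sxy / sxx               # slope
    b0 = y_mean - b1 * x_mean    # intercept
    return b0, b1


def predict(b0, b1, x):
    """Predicted y for a given x on the fitted line."""
    return b0 + b1 * x


# Placeholder values only (NOT the data from the case): replace with the
# xi (aptitude) and yi (statistics grade) columns of the table.
aptitude = [50, 60, 70, 80, 90]
grades = [55, 65, 70, 80, 85]

b0, b1 = fit_simple_regression(aptitude, grades)
print("intercept:", b0, "slope:", b1)
print("expected grade for an aptitude score of 80:", predict(b0, b1, 80))
```

The first question is answered by the pair (b0, b1), and the second by evaluating the fitted line at x = 80.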

2. A digital media company (similar to Voot, Hotstar, Netflix, etc.) launched a show. Initially,
the show got a good response, but it then witnessed a decline in viewership. The company wants
to figure out what went wrong. We want to determine the driver variables for show
viewership. This is a case of prediction rather than projection: we are more interested in
identifying the key driver variables and their impact than in forecasting the results.

We have been given data for the period 1 March 2017 to 19 May 2017,
with the following columns:
Views_show : Number of times the show was viewed
Visitors : Number of visitors who browsed the platform, but not necessarily watched a video.
Views_platform : Number of times a video was viewed on the platform
Ad_impression : Proxy for marketing budget. Represents number of impressions generated by
ads
Cricket_match_india : Whether a cricket match was being played. 1 indicates a match on the given day, 0
indicates there wasn't one
Character_A : Describes presence of Character A. 1 indicates character A was in the episode, 0
indicates she/he wasn't

Media.csv
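
As a first look at which columns move with show viewership, one option is to compute the Pearson correlation of each candidate driver with Views_show. The sketch below is only a starting point, not the full multi-variable regression the case calls for. It assumes the header row of Media.csv uses exactly the column names listed above; the helper names pearson and driver_correlations are illustrative, and the in-memory sample is made up so the block runs without the file.

```python
import csv
import io
import math

# Candidate driver columns, as named in the column list of the case.
DRIVERS = [
    "Visitors",
    "Views_platform",
    "Ad_impression",
    "Cricket_match_india",
    "Character_A",
]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def driver_correlations(csv_file, target="Views_show", drivers=DRIVERS):
    """Read a CSV from an open file object and correlate each driver with the target."""
    rows = list(csv.DictReader(csv_file))
    target_values = [float(row[target]) for row in rows]
    return {
        name: pearson([float(row[name]) for row in rows], target_values)
        for name in drivers
    }


# Made-up sample rows (NOT taken from Media.csv), used only to show the call.
sample = io.StringIO(
    "Views_show,Visitors,Views_platform,Ad_impression,Cricket_match_india,Character_A\n"
    "10,1,5,100,0,0\n"
    "20,2,4,200,1,0\n"
    "30,3,3,300,0,1\n"
)
for name, r in driver_correlations(sample).items():
    print(name, round(r, 3))

# With the real file (assuming it sits in the working directory):
# with open("Media.csv", newline="") as f:
#     print(driver_correlations(f))
```

A strong correlation does not by itself make a column a driver, since the candidates may be correlated with each other; a regression on several columns at once is the natural next step.
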
3. According to the World Health Organization (WHO), stroke is the 2nd leading cause of death
globally, responsible for approximately 11% of total deaths.
This dataset is used to predict whether a patient is likely to have a stroke based on input
parameters like gender, age, various diseases, and smoking status. Each row in the data
provides relevant information about the patient.

Attribute Information
1) id: unique identifier
2) gender: "Male", "Female" or "Other"
3) age: age of the patient
4) hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has
hypertension
5) heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a
heart disease
6) ever_married: "No" or "Yes"
7) work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"
8) Residence_type: "Rural" or "Urban"
9) avg_glucose_level: average glucose level in blood
10) bmi: body mass index
11) smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"*
12) stroke: 1 if the patient had a stroke or 0 if not
*Note: "Unknown" in smoking_status means that the information is unavailable for this
patient

healthcare-dataset-stroke-data.csv
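
Before fitting any classifier, a simple check is the share of patients with stroke = 1 within each value of a categorical attribute such as smoking_status or work_type. The sketch below assumes the CSV header uses the attribute names listed above and that stroke is stored as 0 or 1; the helper name stroke_rate_by is illustrative, and the sample rows are made up so the block runs without the file.

```python
import csv
import io
from collections import defaultdict


def stroke_rate_by(rows, column):
    """Return {category: fraction of patients with stroke == 1} for one column."""
    counts = defaultdict(lambda: [0, 0])  # category -> [strokes, patients]
    for row in rows:
        counts[row[column]][0] += int(row["stroke"])
        counts[row[column]][1] += 1
    return {category: strokes / total for category, (strokes, total) in counts.items()}


# Made-up sample rows (NOT taken from the dataset), used only to show the call.
sample = io.StringIO(
    "id,smoking_status,stroke\n"
    "1,smokes,1\n"
    "2,smokes,0\n"
    "3,never smoked,0\n"
    "4,Unknown,0\n"
)
rows = list(csv.DictReader(sample))
print(stroke_rate_by(rows, "smoking_status"))

# With the real file (assuming it sits in the working directory):
# with open("healthcare-dataset-stroke-data.csv", newline="") as f:
#     print(stroke_rate_by(csv.DictReader(f), "smoking_status"))
```

Keeping "Unknown" as its own category, as the note on smoking_status suggests, avoids silently treating missing information as "never smoked".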
