
DATASET 2

1. What is the size of the dataset, and what types of variables are included?

The dataset has 10,886 rows and 12 columns. It includes both continuous and categorical
variables.

2. What are the distributions of the variables, and are they normally distributed?

No, the data is not normally distributed.

3. What are the most frequent values or categories in the dataset, and how do they relate
to the target variable?

Season, holiday, working, and weather are the columns containing the most frequent values.

4. What are the important variables that influence the target variable?

5. Are there any correlations or patterns between the independent variables?

Yes, “registered” and “casual” are highly correlated with each other, with a correlation
greater than 0.5.
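
As a rough illustration of the correlation check described above, the sketch below computes a Pearson correlation coefficient and compares it with the 0.5 threshold. The sample numbers are hypothetical and are not taken from the dataset, and Pearson is an assumption, since the answer does not say which correlation measure was used.

```python
from math import sqrt


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical values standing in for the "casual" and "registered" columns.
casual = [1, 2, 3, 4]
registered = [2, 4, 6, 8]

r = pearson(casual, registered)
# Threshold of 0.5 as stated in the answer above.
highly_correlated = abs(r) > 0.5
```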

6. Is the dataset balanced, or is there an imbalance in the target variable distribution?

The dataset is balanced with respect to the target variable “count” for this test set.

7. Are there any missing values, and if so, what is the best way to impute them?

Yes, there are missing values in this dataset. I filled them with the mean, but before that I
changed the data type.
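
A minimal sketch of that two-step approach, assuming the column was read in as text and needed to be cast to a numeric type before the mean could be computed. The helper names and the sample values are hypothetical.

```python
from statistics import mean


def to_float_or_none(value):
    """Cast a raw cell to float; return None if it is missing or unparseable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def impute_mean(raw_values):
    """Cast values to float, then fill missing entries with the observed mean."""
    converted = [to_float_or_none(v) for v in raw_values]
    observed = [v for v in converted if v is not None]
    fill_value = mean(observed)
    return [fill_value if v is None else v for v in converted]


# Hypothetical column stored as text, with two missing entries.
raw = ["10", "20", None, "30", ""]
filled = impute_mean(raw)
```

Computing the mean only from the observed values keeps the missing entries from distorting the fill value.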

8. Are there any outliers, and how should they be treated?

Yes, there are outliers in this dataset, which I identified using the IQR method.
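
The IQR method can be sketched as follows. The 1.5 multiplier is the conventional choice and is an assumption here, as the answer does not state which factor was used; the sample values are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" treats the data as the full population when interpolating.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample with one obviously extreme value.
sample = [12, 15, 14, 13, 16, 15, 14, 120]
outliers = iqr_outliers(sample)
```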

9. What is the appropriate method for feature scaling or normalization?

I used the min-max method to normalize the data.
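
Min-max normalization rescales each value to the range 0 to 1 using (x - min) / (max - min). A minimal sketch with hypothetical input values:

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical column values.
scaled = min_max_scale([10, 20, 40])
```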

10. What is the best way to handle categorical variables in the model?

The categorical data was converted using one-hot encoding.
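
One-hot encoding replaces a categorical column with one 0/1 indicator column per category. The sketch below shows the idea; the function name, the column prefix, and the sample season codes are hypothetical.

```python
def one_hot(values, prefix):
    """Return one dict per row with a 0/1 indicator for each category."""
    categories = sorted(set(values))
    return [
        {f"{prefix}_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical category codes for a "season" column.
seasons = [1, 2, 1, 3]
encoded = one_hot(seasons, "season")
```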
