Jitendra (BA Project Document)
BY
Simha Jitendra 33076
K. Pooja 33026
M. Indraja 33004
G. Avinash 33005
P. Ravi Teja 33034
Rama Krishna
Under the guidance of
Mr. Santhosh Pathro
Our exploratory analysis examines factors that may influence fuel economy,
measured in miles per gallon (MPG). The variables we compare against MPG are
horsepower and number of cylinders. We hypothesize that these variables have a
strong relationship to a car's fuel economy. We hope the analysis will identify
which components of a car contribute most to reducing fuel economy.
To carry out the analysis, we use both Python and R. We begin by describing
the data and then show the relationships between the design of the car and the
miles per gallon it achieves. A detailed report, including our code and
methods, follows.
Source Data:
We used the mtcars data set that is built into the R distribution.
We first call summary(data) and look for NA values, that is, missing values in
the data set. Apart from these, the data appears tidy. The summary also gives
the descriptive statistics for each field (Min, 1st Q, Median, Mean, 3rd Q,
Max). It shows 6 NA values, so we assign dummy values in place of those NAs.
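The missing-value step can be sketched in Python as follows. This is a minimal illustration, not the project's own code: the rows, the column names, and the choice of the column mean as the dummy value are all assumptions, and None stands in for R's NA.

```python
# Illustrative rows only; values and column names are assumptions,
# not the project's data. None plays the role of R's NA.
rows = [
    {"mpg": 21.0, "horsepower": 110.0},
    {"mpg": 22.8, "horsepower": None},
    {"mpg": 18.7, "horsepower": 175.0},
    {"mpg": 18.1, "horsepower": None},
]


def count_missing(rows, column):
    """Count the rows whose value in `column` is missing."""
    return sum(1 for row in rows if row[column] is None)


def column_mean(rows, column):
    """Mean of the non-missing values in `column`."""
    present = [row[column] for row in rows if row[column] is not None]
    return sum(present) / len(present)


def fill_missing(rows, column, dummy):
    """Return new rows with missing values in `column` replaced by `dummy`."""
    return [
        {**row, column: dummy if row[column] is None else row[column]}
        for row in rows
    ]


n_missing = count_missing(rows, "horsepower")
dummy = column_mean(rows, "horsepower")  # assumed choice of dummy value
cleaned = fill_missing(rows, "horsepower", dummy)
print(n_missing, count_missing(cleaned, "horsepower"))
```

Filling with the mean keeps the column average unchanged, but any other dummy value can be passed to fill_missing in the same way.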
We then clean the data and compute the descriptive statistics for each field
on the cleaned data (Min, 1st Q, Median, Mean, 3rd Q, Max).
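The six statistics listed above can be computed with Python's standard library, as in the sketch below. The sample values are illustrative assumptions, and the "inclusive" quantile method is chosen here because it interpolates between the minimum and maximum of the data; whether it reproduces R's summary() output exactly on a given data set should be checked rather than assumed.

```python
import statistics

# Illustrative values only, not the project's data.
mpg = [18.1, 18.7, 21.0, 21.4, 22.8]


def six_number_summary(values):
    """Return min, quartiles, median, mean, and max for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min": min(values),
        "1st Q": q1,
        "Median": median,
        "Mean": statistics.mean(values),
        "3rd Q": q3,
        "Max": max(values),
    }


summary = six_number_summary(mpg)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```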
Next we take the Model_year column. We add one more variable to the data set
based on a condition. The new column appears at the end of the data set.
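A derived column of this kind might look like the following sketch. The report does not state the actual condition, so the cut-off year, the group labels, and the year_group column name are hypothetical, as are the sample rows.

```python
# Illustrative rows only; names and years are assumptions.
rows = [
    {"name": "car_a", "Model_year": 70},
    {"name": "car_b", "Model_year": 76},
    {"name": "car_c", "Model_year": 82},
]


def year_group(model_year):
    """Map a model year to a range label (hypothetical cut-off of 76)."""
    return "old" if model_year < 76 else "new"


# Dictionaries keep insertion order, so the new key lands at the end,
# much like a new column appended to the end of a data frame.
with_group = [
    {**row, "year_group": year_group(row["Model_year"])} for row in rows
]

for row in with_group:
    print(row)
```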
We now perform simple linear regression using the equation y = ax + b, where
y is the response variable, x is the predictor variable, and a and b are
constants called the coefficients. We fit the model with the lm() function in
R and then use it to predict the value of Displacement.
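To make the equation concrete, here is a minimal least-squares sketch in plain Python. It is not the project's lm() call: the function names fit_line and predict are hypothetical, and the sample points are chosen to lie exactly on a line so that the result is easy to verify by hand. In R, lm(y ~ x) reports an intercept and a slope, which correspond to b and a here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx            # slope
    b = y_mean - a * x_mean  # intercept
    return a, b


def predict(a, b, x):
    """Predicted response for a new predictor value x."""
    return a * x + b


# Illustrative points lying exactly on y = 2x + 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 7.0, 9.0, 11.0]

a, b = fit_line(xs, ys)
print(a, b, predict(a, b, 5.0))
```

The slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from forcing the line through the point of means.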
SUMMARY
Our project is on mtcars. First, we imported the data set from a CSV file.
Second, we printed the data and inspected it. We then looked for missing
values and assigned dummy values to the variables that had them. After
cleaning the data set, we computed descriptive statistics such as the mean,
median, and minimum. Next, we created one more variable column in the data set
by combining values into ranges. We then performed simple linear regression
using the equation y = ax + b, where y is the response variable, x is the
predictor variable, and a and b are constants called the coefficients, fitted
with the lm() function in R. Finally, we used the fitted model to obtain the
predicted value. Our analysis shows a strong negative correlation for
Displacement (-0.06055).
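When reading a figure such as the one above, it helps to keep the regression slope and the correlation coefficient apart. The report does not say which of the two the value is, so the sketch below only illustrates the general point with made-up numbers: rescaling the predictor changes the slope but leaves the Pearson correlation unchanged, so a slope that is small in magnitude can still go with a strong relationship.

```python
import math


def slope(xs, ys):
    """Least-squares slope of y on x."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Pearson correlation coefficient between x and y."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: y falls by 2 for every unit of x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [8.0, 6.0, 4.0, 2.0]
xs_scaled = [x * 100 for x in xs]  # same predictor in smaller units

print(slope(xs, ys), pearson_r(xs, ys))
print(slope(xs_scaled, ys), pearson_r(xs_scaled, ys))
```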