Analysis for mtcars

Business Analytics Project


POST GRADUATE DIPLOMA IN MANAGEMENT

BY
Simha Jitendra 33076
K. Pooja 33026
M. Indraja 33004
G. Avinash 33005
P. Ravi Teja 33034
Rama Krishna
Under the guidance of
Mr. Santhosh Pathro

FACULTY OF BUSINESS ANALYTICS


INTEGRAL INSTITUTE OF ADVANCED MANAGEMENT
An Autonomous Institute
(Approved by A.I.C.T.E)
MVP Colony, Visakhapatnam, Andhra Pradesh
INTRODUCTION
An automobile consultancy firm, “MycarDream”, helps its clients make
appropriate car deals based on their requirements. Through various market
surveys, the firm has gathered a large dataset of different types of cars from
across the world and their attributes. The company's business model is based
solely on consumer interest: it aims to provide each client with the most
appropriate car and hence maximise customer satisfaction. Since fuel economy is
so relevant to today’s engineering challenges and to protecting the
environment, our team decided to analyze data in order to explore which avenues
engineering teams may consider when attempting to meet fuel economy standards.
To do this, we used the mtcars data set, which has data on the design,
performance and fuel economy of 32 automobiles.

Our exploratory analysis delves into the various factors that may influence
fuel economy (miles per gallon, MPG). The variables we have chosen to compare
with MPG are horsepower and number of cylinders. We hypothesize that these
variables will have a strong relationship to a car’s fuel economy. Our hope is
that the analysis will identify which components of a car contribute most to
reducing fuel economy.
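
One way to put a number on such a hypothesized relationship is the Pearson
correlation coefficient. The following is a minimal Python sketch, not the
project's own code: the `mpg` and `hp` lists are transcribed from R's built-in
mtcars data set (same row order), and the helper name `pearson` is our own
choice for illustration.

```python
import math

# mpg and hp columns transcribed from R's built-in mtcars (32 cars, same order).
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,
       16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5,
       15.2, 13.3, 19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]
hp = [110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180,
      205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113, 264,
      175, 335, 109]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_mpg_hp = pearson(mpg, hp)
print(f"correlation between mpg and hp: {r_mpg_hp:.3f}")
```

A value close to -1 would support the hypothesis that higher horsepower goes
with lower fuel economy; a value near 0 would not.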

In order to execute the analysis, we will use both Python and R. Our analysis
begins by describing the data and then proceeds to display different
relationships between the design of a car and the miles per gallon it is able
to achieve. A detailed report, including our code and methods, can be viewed
below.

Source Data:
We used the mtcars data set that is built into the R distribution.

First we look at the structure of the data set.


We find that it contains 32 rows and 11 variables. Now we look at some of the
actual data: the first few rows and the last few rows.
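
The inspection steps above could be sketched in Python roughly as follows. This
is an illustration only, not the project's code: the six rows are the first
rows of R's built-in mtcars, embedded as CSV text so the block is
self-contained, and the column name `model` is an assumption (R keeps the car
names as row names rather than as a column).

```python
import csv
import io

# First six rows of R's built-in mtcars written out as CSV text. A real run
# would open the project's CSV file instead of this embedded sample.
CSV_TEXT = """model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.90,2.620,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.90,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.320,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.440,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.460,20.22,1,0,3,1
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
variables = [name for name in rows[0] if name != "model"]

# Structure: how many rows and which variables the table holds.
print(len(rows), "rows,", len(variables), "variables:", variables)

# First few and last few rows, similar in spirit to head() and tail() in R.
for row in rows[:3]:
    print(row)
for row in rows[-3:]:
    print(row)
```

On the full data set the same code would report 32 rows; the sample here has
only six.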

Here we used the str() function, whose name is short for “structure”, to
display the structure of the data.

Next, in summary(data), we look for NA values, that is, missing values in the
data set. Otherwise the data appears tidy. We also look at the descriptive
statistics for each field (min, 1st quartile, median, mean, 3rd quartile, max).
As we can see, 6 values are missing (NA), so we have to assign dummy values in
their place.
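
A minimal Python sketch of counting missing values and filling them in is given
below. The report does not say which column held the 6 NA values or which dummy
value was used, so the `values` list is made up for illustration and filling
with the mean of the observed values is an assumption.

```python
from statistics import mean

# Hypothetical column with missing entries marked as None (made-up values).
values = [21.0, None, 22.8, 21.4, None, 18.1]

# Count the missing entries, comparable to counting NA values in R.
missing_count = sum(v is None for v in values)

# Assumed choice of dummy value: the mean of the observed entries.
observed = [v for v in values if v is not None]
fill_value = mean(observed)

# Replace each missing entry with the dummy value.
cleaned = [fill_value if v is None else v for v in values]

print("missing:", missing_count)
print("cleaned:", cleaned)
```

Other dummy values, such as the median or a fixed constant, would slot into
`fill_value` in the same way.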

We then clean the data and again find the descriptive statistics for each field
(min, 1st quartile, median, mean, 3rd quartile, max).
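
As a rough illustration, the six statistics that summary() reports can be
computed in Python as sketched here. This is not the project's code; the `mpg`
list is transcribed from R's built-in mtcars data set and stands in for
whichever field is being summarised.

```python
from statistics import mean, median, quantiles

# mpg column transcribed from R's built-in mtcars (32 cars).
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,
       16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5,
       15.2, 13.3, 19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]

# Quartile cut points; "inclusive" treats the data as the whole population.
q1, _, q3 = quantiles(mpg, n=4, method="inclusive")

summary = {
    "min": min(mpg),
    "q1": q1,
    "median": median(mpg),
    "mean": mean(mpg),
    "q3": q3,
    "max": max(mpg),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

To our understanding, the "inclusive" method interpolates in the same way as
R's default quantile type, so the quartiles should agree with summary().
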

Next we take the Model_year column. Here we add one more variable to the data
set, based on a condition. As we can see, the new variable column is added at
the end of the data set.
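
Adding a column derived from a condition could look like the Python sketch
below. The report does not state the condition that was used, so the
`model_year` values, the cut-offs and the labels here are all hypothetical and
serve only to show the mechanics.

```python
# Hypothetical records; the model_year values are invented for illustration.
records = [
    {"model_year": 70},
    {"model_year": 74},
    {"model_year": 79},
    {"model_year": 82},
]


def year_group(year):
    """Map a model year to a coarse range label (hypothetical cut-offs)."""
    if year <= 73:
        return "early"
    if year <= 79:
        return "middle"
    return "late"


# Add the derived variable; it becomes the last column of each record.
for record in records:
    record["year_group"] = year_group(record["model_year"])

for record in records:
    print(record)
```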

Now we perform simple linear regression using the equation y = ax + b, where
‘y’ is the response variable, ‘x’ is the predictor variable, and ‘a’ and ‘b’
are constants called the coefficients (‘a’ is the slope and ‘b’ is the
intercept). We use the lm() function in R for this. We then predicted the value
of the Displacement.
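
For a single predictor, the coefficients of y = ax + b can be estimated by
ordinary least squares, which is also what lm() fits. The Python sketch below
shows the calculation on made-up numbers; it is not the project's model, and
the function name `fit_line` and the data points are assumptions for
illustration.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


# Made-up illustrative data, not taken from the project's data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.8, 8.1, 6.2, 3.9, 2.0]

a, b = fit_line(x, y)

# Predict the response for a new predictor value using the fitted line.
new_x = 6.0
predicted = a * new_x + b

print(f"a (slope) = {a:.4f}, b (intercept) = {b:.4f}")
print(f"predicted y at x = {new_x}: {predicted:.4f}")
```

A negative slope means the response falls as the predictor rises.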

SUMMARY
Our project is on the mtcars data set. First, we imported the data set from a
CSV file. Second, we printed the data and inspected it. After that, we looked
for missing values in the data set and assigned dummy values to the variables
where we found them. We cleaned the data set and found the descriptive
statistics (mean, median, min and so on). After that, we created one more
variable column in the data set, combining some values to put them into ranges.
We then performed simple linear regression using the equation y = ax + b, where
‘y’ is the response variable, ‘x’ is the predictor variable, and ‘a’ and ‘b’
are constants called the coefficients, using the lm() function in R. After
that, we predicted the variable and obtained the predicted value. Our analysis
shows a strong negative relationship with Displacement (-0.06055).
