
DATA SCIENCE INTERNSHIP


EXPOSYS DATA LABS

PANKAJ P

1. ABSTRACT

This report documents the development and evaluation of several
machine learning models that predict company profit from R&D Spend,
Administration Cost, and Marketing Spend. The dataset consists of
financial data from 50 companies. We explored four regression
algorithms: Linear Regression, Decision Tree Regression, Random Forest
Regression, and Support Vector Regression (SVR). The models were
evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE),
and the R-squared (R2) score. On the held-out test data, the tree-based
models scored highest (R2 of 0.98 for Decision Tree Regression and 0.97
for Random Forest Regression), Linear Regression reached 0.94, and SVR
performed poorly (R2 of -0.13).

2. TABLE OF CONTENTS

1. Abstract
2. Table of Contents
3. Introduction
4. Existing Methods
5. Proposed Method with Architecture
6. Methodology
7. Implementation
8. Conclusion

3. INTRODUCTION

In the competitive business landscape, accurate profit forecasting is
crucial for strategic planning and decision-making. Traditional methods
often rely on historical data and linear models, which may not capture
the complexity of financial interactions. This project aims to leverage
machine learning techniques to develop robust models for predicting
company profits based on financial inputs such as R&D Spend,
Administration Cost, and Marketing Spend. The objective is to identify
the most effective regression model that can provide reliable profit
predictions.

4. EXISTING METHODS

Traditional profit prediction methods primarily involve statistical
techniques such as linear regression, which assume a linear relationship
between the input variables and the target variable. While these
methods are straightforward and interpretable, they may not perform
well with complex datasets where relationships are non-linear and
interactions between variables are significant. Existing methods often
lack the flexibility and accuracy needed for precise profit forecasting,
motivating the exploration of advanced machine learning models.

5. PROPOSED METHOD WITH ARCHITECTURE

To improve profit prediction accuracy, we propose a machine learning approach incorporating multiple
regression algorithms. The architecture of the proposed method includes several stages:

• Data Collection: Gather financial data from 50 companies.
• Data Preprocessing: Clean the data, handle missing values, and standardize features using StandardScaler.
• Model Training: Implement various regression models (Linear Regression, Decision Tree, Random Forest, SVR).
• Model Evaluation: Assess the models using metrics such as MSE, MAE, and R2 score.
• Prediction and Analysis: Use the best-performing model for profit prediction and analyze the results.
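
The preprocessing stage listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data are synthetic stand-ins generated with NumPy (the real 50-company dataset is not reproduced here), the value ranges and the profit formula are arbitrary assumptions, and `make_synthetic_data` is a hypothetical helper. The scaler is fitted on the training rows only, so that no test-set statistics leak into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n_companies=50, seed=0):
    """Hypothetical stand-in for the dataset: three spend columns plus profit."""
    rng = np.random.default_rng(seed)
    # Columns: R&D Spend, Administration Cost, Marketing Spend (assumed ranges).
    X = rng.uniform(
        low=[0, 50_000, 0],
        high=[160_000, 180_000, 470_000],
        size=(n_companies, 3),
    )
    # Assumed relationship between spend and profit, for illustration only.
    y = 50_000 + 0.8 * X[:, 0] + 0.03 * X[:, 2] + rng.normal(0, 5_000, n_companies)
    return X, y


X, y = make_synthetic_data()

# 80% of the rows for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

After this step every training feature has zero mean and unit variance; the test features are transformed with the same training statistics, so their means are close to, but not exactly, zero.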

MODEL ARCHITECTURE

6. METHODOLOGY
• Data Collection: The dataset consists of financial information from 50 companies, including R&D Spend,
Administration Cost, Marketing Spend, and Profit.

• Data Preprocessing: The dataset is cleaned to remove inconsistencies and to handle missing values.
StandardScaler then standardizes each feature to zero mean and unit variance, so that scale-sensitive
models such as SVR are not dominated by the features with the largest magnitudes.

• Model Selection: We explore several regression models:
    Linear Regression: A basic model to establish a baseline.
    Decision Tree Regression: Captures non-linear relationships.
    Random Forest Regression: An ensemble method that improves accuracy by averaging the predictions of
    multiple decision trees.
    Support Vector Regression (SVR): Uses the RBF kernel to capture complex patterns.

• Model Training and Evaluation: The models are trained on 80% of the dataset and evaluated on the
remaining 20%. The performance is measured using MSE, MAE, and R2 score.
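
The training and evaluation procedure described above might look like the sketch below. It is an illustration under stated assumptions rather than the project's code: the data are synthetic (generated with an arbitrary formula, not the real company figures), so the printed scores will not match the reported results. The model hyperparameters are the ones shown in the implementation section of this report.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 50 rows, 3 spend features, 1 profit target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 200_000, size=(50, 3))
y = 50_000 + 0.8 * X[:, 0] + 0.03 * X[:, 2] + rng.normal(0, 5_000, 50)

# 80/20 split, then standardize using training statistics only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": SVR(kernel="rbf", gamma="auto"),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MSE": mean_squared_error(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name:18s} MSE={scores['MSE']:.2f} MAE={scores['MAE']:.2f} R2={scores['R2']:.2f}")
```

Keeping the models in a dictionary means every model sees exactly the same split and the same scaled features, which keeps the comparison fair. With only 10 test rows, scores can shift noticeably with a different `random_state`.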

7. IMPLEMENTATION
Linear Regression:
    from sklearn.linear_model import LinearRegression

    # X_train, y_train, and X_test come from the 80/20 split of the standardized data
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

Support Vector Regression (SVR):
    from sklearn.svm import SVR

    # RBF kernel; gamma='auto' sets gamma to 1 / n_features
    svr_model = SVR(kernel='rbf', gamma='auto')
    svr_model.fit(X_train, y_train)
    svr_pred = svr_model.predict(X_test)

Decision Tree Regression:
    from sklearn.tree import DecisionTreeRegressor

    # random_state is fixed so the fitted tree is reproducible
    dt_model = DecisionTreeRegressor(random_state=0)
    dt_model.fit(X_train, y_train)
    dt_pred = dt_model.predict(X_test)

Random Forest Regression:
    from sklearn.ensemble import RandomForestRegressor

    # Ensemble of 100 trees; random_state is fixed for reproducibility
    rf_model = RandomForestRegressor(n_estimators=100, random_state=0)
    rf_model.fit(X_train, y_train)
    rf_pred = rf_model.predict(X_test)

EVALUATION METRICS

MODEL                MSE              MAE         R2 SCORE
LINEAR REGRESSION    77506468.17      7320.44     0.94
DECISION TREE        30129511.58      4480.29     0.98
RANDOM FOREST        41550185.80      5193.78     0.97
SVR                  1453663319.24    28983.26    -0.13
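
For reference, the three metrics can be computed directly from their definitions. The sketch below uses plain Python on made-up numbers (not the project's predictions), and the function names are illustrative. It also shows why R2 can be negative, as it is for SVR: R2 compares the model's squared error with that of always predicting the mean of the targets, so a model that does worse than that baseline scores below zero.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R-squared: 1 minus (model squared error / squared error of the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up targets and two sets of predictions, for illustration only.
y_true = [100.0, 120.0, 140.0, 160.0]
good_pred = [102.0, 118.0, 141.0, 159.0]   # close to the targets
bad_pred = [160.0, 100.0, 150.0, 110.0]    # worse than predicting the mean

for label, pred in [("good", good_pred), ("bad", bad_pred)]:
    print(label, mse(y_true, pred), mae(y_true, pred), r2(y_true, pred))
```

Because MSE is in squared units, its square root (RMSE) is the quantity comparable to MAE, and RMSE is never smaller than MAE for the same predictions.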

8. CONCLUSION

This project demonstrates the application of several machine learning
regression models to predict company profits from financial data. On
the test set, the tree-based models achieved the highest accuracy:
Decision Tree Regression (R2 of 0.98), closely followed by the Random
Forest ensemble (0.97). Both outperformed the Linear Regression
baseline (0.94), which suggests they capture some non-linear structure
that the linear model misses, although with only 50 companies the
margins are small. SVR, as configured here, performed worst (R2 of
-0.13). Future work can explore other advanced techniques and larger
datasets to further enhance prediction accuracy.


Thank You
