Data Science Internship

DATA SCIENCE INTERNSHIP

EXPOSYS DATA LABS
PANKAJ P
1. ABSTRACT
This report documents the development and evaluation of several machine learning models that predict a company's profit from its R&D Spend, Administration Cost, and Marketing Spend. The dataset contains financial data for 50 companies. We compared four regression algorithms: Linear Regression, Decision Tree Regression, Random Forest Regression, and Support Vector Regression (SVR). Each model was evaluated with Mean Squared Error (MSE), Mean Absolute Error (MAE), and the R-squared (R²) score. On the held-out test set, the tree-based models performed best: Decision Tree Regression (R² = 0.98) and Random Forest Regression (R² = 0.97) both outperformed Linear Regression (R² = 0.94), while SVR performed poorly (R² = -0.13).
2. TABLE OF CONTENTS
1. Abstract
2. Table of Contents
3. Introduction
4. Existing Method
5. Proposed Method with Architecture
6. Methodology
7. Implementation
8. Conclusion
3. INTRODUCTION
In a competitive business landscape, accurate profit forecasting is crucial for strategic planning and decision-making. Traditional approaches often fit simple linear models to historical data, which may not capture the complexity of the relationships between financial variables. This project applies machine learning techniques to build models that predict company profit from financial inputs such as R&D Spend, Administration Cost, and Marketing Spend. The objective is to identify the regression model that provides the most reliable profit predictions.
4. EXISTING METHODS
Traditional profit prediction methods rely mainly on statistical techniques such as linear regression, which assumes a linear relationship between the input variables and the target variable. These methods are straightforward and interpretable, but they can perform poorly on datasets where relationships are non-linear or where interactions between variables are significant. This lack of flexibility limits their accuracy for profit forecasting and motivates the exploration of more advanced machine learning models.
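The limitation described above can be illustrated with a toy example. This is a minimal sketch, assuming a one-dimensional quadratic target (y = x²) that is purely hypothetical and not part of the report's data: a linear model cannot represent the curve, while a decision tree can approximate it piecewise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical non-linear data: y = x^2 on a symmetric grid (not the report's dataset)
X_train = np.linspace(-3, 3, 200).reshape(-1, 1)
y_train = X_train.ravel() ** 2
X_test = np.linspace(-2.9, 2.9, 50).reshape(-1, 1)
y_test = X_test.ravel() ** 2

# A straight line fitted to a symmetric parabola is essentially flat
lin = LinearRegression().fit(X_train, y_train)
# A tree splits the x-axis into small intervals and predicts a constant in each
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

lin_r2 = r2_score(y_test, lin.predict(X_test))
tree_r2 = r2_score(y_test, tree.predict(X_test))
print(f"Linear R2: {lin_r2:.3f}  Tree R2: {tree_r2:.3f}")
```

The linear model's R² stays near zero because the best straight line through a symmetric parabola is horizontal; the tree's R² is close to 1.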
5. PROPOSED METHOD WITH ARCHITECTURE
To improve profit prediction accuracy, we propose a machine learning approach that compares multiple regression algorithms. The architecture of the proposed method consists of the following stages:
• Data Collection: Gather financial data from 50 companies.
• Data Preprocessing: Clean the data, handle missing values, and standardize the features using StandardScaler.
• Model Training: Train several regression models (Linear Regression, Decision Tree, Random Forest, SVR).
• Model Evaluation: Assess the models using MSE, MAE, and R² score.
• Prediction and Analysis: Use the best-performing model to predict profit and analyze the results.
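The five stages can be sketched end to end with scikit-learn. This is a minimal sketch, not the author's code: the 50-row dataset below is synthetic and hypothetical (three columns standing in for R&D Spend, Administration, and Marketing Spend), and wrapping each model in `make_pipeline(StandardScaler(), model)` is an assumed design choice that guarantees the scaler is fit on the training rows only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Stage 1 (Data Collection): hypothetical stand-in for the 50-company table.
# Columns mimic R&D Spend, Administration, Marketing Spend; all values are synthetic.
rng = np.random.default_rng(0)
X = rng.uniform(0, 200_000, size=(50, 3))
y = 50_000 + 0.8 * X[:, 0] + 0.03 * X[:, 2] + rng.normal(0, 8_000, size=50)

# Stage 2 (Preprocessing): 80/20 split; scaling happens inside each pipeline below
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Stage 3 (Model Training): the four candidate models named in the report
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": SVR(kernel="rbf", gamma="auto"),
}

# Stage 4 (Model Evaluation): MSE, MAE and R2 on the held-out 20%
results = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model).fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "MSE": mean_squared_error(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

# Stage 5 (Prediction and Analysis): keep the model with the highest test R2
best = max(results, key=lambda name: results[name]["R2"])
for name, m in results.items():
    print(f"{name:18s} MSE={m['MSE']:.0f} MAE={m['MAE']:.0f} R2={m['R2']:.2f}")
print("Best by R2:", best)
```

With 50 rows, `test_size=0.2` leaves 10 companies for evaluation, so a single split can rank close models differently from run to run.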
MODEL ARCHITECTURE
6. METHODOLOGY
• Data Collection: The dataset contains financial information for 50 companies: R&D Spend, Administration Cost, Marketing Spend, and Profit.
• Data Preprocessing: The dataset is cleaned to remove inconsistencies and handle missing values. StandardScaler then standardizes each feature to zero mean and unit variance, so that no feature dominates merely because of its scale. This matters most for scale-sensitive models such as SVR; tree-based models are unaffected by feature scaling.
• Model Selection: We explore four regression models:
   - Linear Regression: a simple model that establishes a baseline.
   - Decision Tree Regression: captures non-linear relationships.
   - Random Forest Regression: an ensemble method that averages the predictions of many decision trees to reduce variance and improve accuracy.
   - Support Vector Regression (SVR): uses the RBF kernel to capture complex patterns.
• Model Training and Evaluation: The models are trained on 80% of the dataset (40 companies) and evaluated on the remaining 20% (10 companies). Performance is measured using MSE, MAE, and R² score.
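To make the three evaluation metrics concrete, the sketch below computes them by hand on four hypothetical profit values (illustrative numbers, not the report's data) and checks them against scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score`. It also shows two properties useful when reading a results table: MSE can never be smaller than MAE squared, and a model that always predicts the mean scores R² = 0, so a negative R² means doing worse than that baseline.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted profits for four companies (illustrative only)
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 190.0, 270.0])

errors = y_true - y_pred                        # [-10, 10, 10, -20]
mse = np.mean(errors ** 2)                      # average squared error
mae = np.mean(np.abs(errors))                   # average absolute error
ss_res = np.sum(errors ** 2)                    # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot

print(mse, mae, f"{r2:.3f}")                    # 175.0 12.5 0.944

# The hand-written formulas agree with scikit-learn's implementations
print(np.isclose(mse, mean_squared_error(y_true, y_pred)),
      np.isclose(mae, mean_absolute_error(y_true, y_pred)),
      np.isclose(r2, r2_score(y_true, y_pred)))  # True True True

# MSE is always at least MAE squared (here 175.0 >= 156.25)
print(mse >= mae ** 2)                          # True

# Always predicting the mean gives R2 = 0; anything worse gives a negative R2
baseline = np.full_like(y_true, y_true.mean())
baseline_r2 = r2_score(y_true, baseline)
print(baseline_r2)                              # 0.0
```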
7. IMPLEMENTATION
Linear Regression:
from sklearn.linear_model import LinearRegression
# X_train, X_test, y_train, y_test come from the 80/20 split described in the Methodology
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
lr_pred = lr_model.predict(X_test)

Support Vector Regression (SVR):
from sklearn.svm import SVR
# RBF kernel; gamma='auto' sets gamma = 1 / n_features
svr_model = SVR(kernel='rbf', gamma='auto')
svr_model.fit(X_train, y_train)
svr_pred = svr_model.predict(X_test)

Decision Tree Regression:
from sklearn.tree import DecisionTreeRegressor
# A fixed random_state makes the fitted tree reproducible
dt_model = DecisionTreeRegressor(random_state=0)
dt_model.fit(X_train, y_train)
dt_pred = dt_model.predict(X_test)

Random Forest Regression:
from sklearn.ensemble import RandomForestRegressor
# 100 trees, each fit on a bootstrap sample; their predictions are averaged
rf_model = RandomForestRegressor(n_estimators=100, random_state=0)
rf_model.fit(X_train, y_train)
rf_pred = rf_model.predict(X_test)
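The Random Forest model above "combines multiple decision trees" by averaging. This minimal sketch, using hypothetical synthetic data with three features rather than the report's dataset, checks that `RandomForestRegressor.predict` equals the mean of the predictions of the fitted trees stored in `estimators_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data with three features (stand-ins for the report's inputs)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 3))
y = 3 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, size=50)

rf_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Each tree in estimators_ was trained on its own bootstrap sample;
# the forest's regression output is the average of the individual tree outputs.
per_tree = np.stack([tree.predict(X) for tree in rf_model.estimators_])
manual_avg = per_tree.mean(axis=0)
print(np.allclose(manual_avg, rf_model.predict(X)))  # True
```

Averaging many trees that each saw a different bootstrap sample smooths out the individual trees' errors, which is why a forest is usually more stable than a single tree.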
EVALUATION METRICS
MODEL               MSE                MAE        R² SCORE
Linear Regression   77,506,468.17      7,320.44     0.94
Decision Tree       30,129,511.58      4,480.29     0.98
Random Forest       41,550,185.80      5,193.78     0.97
SVR                 1,453,663,319.24   28,983.26   -0.13

MAE is in the same units as Profit; MSE is in squared profit units.
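The table reports a negative R² for SVR. One plausible cause, not verified against the original data, is that `SVR(kernel='rbf', gamma='auto')` keeps the defaults `C=1.0` and `epsilon=0.1`, which are tiny relative to profits in the hundreds of thousands, so the fitted function stays almost flat. This minimal sketch, on hypothetical synthetic data, compares that configuration with one that also standardizes the target via scikit-learn's `TransformedTargetRegressor`.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical synthetic data on a profit-like scale (not the report's dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 200_000, size=(50, 3))
y = 50_000 + 0.8 * X[:, 0] + rng.normal(0, 8_000, size=50)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Features standardized, target left in raw units (assumed to mirror the report's setup)
plain_svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", gamma="auto"))
plain_svr.fit(X_train, y_train)

# Same model, but the target is standardized for fitting and inverse-transformed on predict
scaled_svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf", gamma="auto")),
    transformer=StandardScaler(),
)
scaled_svr.fit(X_train, y_train)

plain_r2 = r2_score(y_test, plain_svr.predict(X_test))
scaled_r2 = r2_score(y_test, scaled_svr.predict(X_test))
print(f"R2 raw target: {plain_r2:.2f}   R2 scaled target: {scaled_r2:.2f}")
```

Raising `C` (and `epsilon`) to match the target's scale is an alternative to scaling the target; either way, the comparison above is only a hypothesis to test on the real 50-company data.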
8. CONCLUSION
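This project applied several machine learning regression models to predict company profits from financial data. On the held-out test set, the tree-based models gave the best results: Decision Tree Regression achieved the highest R² (0.98) and the lowest errors, with Random Forest Regression close behind (R² = 0.97). Both outperformed Linear Regression (R² = 0.94), suggesting that they capture some non-linear structure and interactions between the input variables. SVR, as configured, performed worst, with a negative R² (-0.13), meaning its predictions were less accurate than simply predicting the mean profit. Because the test set contains only 10 companies, these rankings should be interpreted with caution. Future work can explore other advanced techniques and larger datasets to further improve prediction accuracy.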
Thank You