Professional Documents
Culture Documents
SSRN-id3526707 2020
SSRN-id3526707 2020
net/publication/339055809
CITATIONS READS
2 7,620
3 authors, including:
Rupashri Barik
JIS COLLEGE OF ENGINEERING
8 PUBLICATIONS 4 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Rupashri Barik on 17 March 2020.
Keywords:
Machine Learning, Linear Regression, Polynomial Regression
words, ML [2] [3] is a type of artificial intelligence used as an aid for data visualization, to infer values
that extracts patterns out of raw data by using an of a function where no data are available, and to
algorithm or method. The key focus of ML is to summarize the relationships among two or more
allow computer systems to learn from experience variables. Extrapolation refers to the use of a fitted
without being explicitly programmed or human curve beyond the range of the observed data, and is
intervention. subject to a degree of uncertainty since it may
Linear Regression [4] is an algorithm of machine reflect the method used to construct the curve as
learning based on supervised learning scheme. much as it reflects the observed data.
Linear regression [6] carries out a task that may Following is the descriptions of the method that the
predict the value of a dependent variable (y) on work has been done.
basis of an independent variable (x) that is given.
Therefore, this kind of regression technique looks Proposed Method for Salary Prediction:
for a linear type of relationship between input x and
output y. Step 1: Salary data have been taken from dataset.
Polynomial Regression [5] [7] is a form of linear
Step 2: Then the points corresponding to the salary
regression in which the relationship between the
data of an individual person have been plotted in the
independent variable x and dependent variable y is graph. The data are initialized in pandas (ascending,
modelled as an nth degree polynomial. Polynomial descending, mixed-up). Taking the dataset from
regression [8] fits a nonlinear relationship between each pandas field and from the pandas dataset we
the value of x and the corresponding conditional plotted the points on the graph as per number wise
mean of y, denoted E(y |x). or input wise that came real dataset.
This application provides a Salary graph Step 3: After that we using linear regression for
representation that is mainly done by polynomial draw lines between the points.
regression statistics, polynomial regression [8] is a
Step 4: If the points are not in linear way then we
form of regression analysis which represents the
use polynomial regression for curving purpose.
relationship between the independent Through the clustering points we can make a
variable x and the dependent variable y and that is smooth and curve path.
modelled as the nth degree of polynomial in x.
Polynomial regression is suitable for a nonlinear Step 5: After then through the linear/polynomial
graph through the x-y axis we can predict salary.
type of relationship between the value of x and the
correlating conditional mean of y, represented as Step 6: Also, we predict a person on future salary
E(y |x). Although polynomial regression fits a position as per the graph goes. Only take a particular
nonlinear model to the data, as a statistical person position, then the prediction answer be
estimation problem it is linear, in the sense that the executed through the help of the graph.
regression function E(y | x) is linear in the
unknown parameters that are estimated from
the data. For this reason, polynomial regression is III. EXPERIMENTATION
assumed to be a special case of multiple linear
regressions. This prediction has been implemented through the
Curve fitting is a method of building a curve, or to following approaches:
represent a mathematical function that is optimally numpy:- NumPy is a Python package
suitable to a series of data points, and possibly it is which stands for 'Numerical Python'. It is
subject to constraints. Curve fitting can involve the core library for scientific computing,
either interpolation, where an exact relevant to the which contains a powerful n-dimensional
data is needed, or smoothing, in which a "smooth" array object, provide tools for integrating
function is created that is approximately suitable C, C++ etc. It is also useful in linear
the data. A related topic is regression algebra, random number cap ability etc.
analysis, which focuses more on questions
of statistical inference such as how much matplotlib.pyplot:- matplotlib.pyplot is a
uncertainty is present in a curve that is fit to data collection of command style functions that
observed with random errors. Fitted curves can be make matplotlib work like MATLAB.
Each pyplot function makes some change
I3SET2K19: INTERNATIONAL CONFERENCE ON INDUSTRY INTERACTIVE INNOVATIONS IN SCIENCE, ENGINEERING AND
TECHNOLOGY2
ELECTRONIC COPY AVAILABLE AT: https://ssrn.com/abstract=#######
to a figure: e.g., creates a figure, creates a NumPy is used for antialiasing a dynamic
plotting area in a figure, plots some lines array or large set.
in a plotting area, decorates the plot with matplotlib.pyplot is used for making graph
labels, etc. pandas are mainly used like database
pandas:- pandas is a kind of software where we store the dataset from the excel
library written for the Python file. Through the data set we are forming
programming language for data graph and predicting
manipulation and analysis. In particular, it
offers data structures and operations for Then importing the dataset on the pandas through
manipulating numerical tables and time the help of “<any variable>. read_csv” and then
series. It is free software released under showing that field that are importing from the excel
the three-clause BSD license. file. Through the salary sets the plotting the points
at the graph. After then it is being tried to input
read_csv:- Python is a great language for linear regression function makes only straight line,
doing data analysis, primarily because of if all points are linear then these function is used
the fantastic ecosystem of data- and directly goes making graph and predicting. But
centric python packages. pandas is one of if the points are not linear so we go for next step. If
those packages and makes importing and the points are not linear then we use polynomial
analyzing data much easier.
function for curving purpose that help to observing
user that how the growth are moving on a path.
Linear Regression :- Linear
After this the diagram on the graph is for
Regression will be used to perform linear
demonstration. Then it is predicted to a salary
and polynomial regression and make
through the graph using both the x-y-axis.
predictions accordingly. Now, you have
two arrays: the input x and output y. You “<variable>.predict(poly_reg.fit_transform(v))”.
should call .reshape() on x because this From here it gets a prediction value through the
array is required to be two-dimensional, or polynomial/linear regression.
to be more precise, to have one column
and as many rows as necessary,
IV. RESULTS AND DISCUSSION
Polynomial features: - Generate a new From this application can be observed as the
feature matrix consisting of all polynomial graphical representation and can also predict the
combinations of the features with degree any point from position and automatically
less than or equal to the specified degree. calculates salary. Also survey the salary venue.
For example, if an input sample is two Surveys of salary are therefore differentiated on
dimensional and of the form [a, b], the basis of their data source into those that -
degree-2 polynomial features.
Get data from companies, or
Collect data from employees.
Predict:-predict the values where find at
the time period that how much there in Survey operators assigned for salary strive to get
the most significant input data in every possible
that graph.
way. There is no way to decide that which
First, a dataset in excel file to be made and then to approach is correct. The first possibility may assure
large companies, where as the second choice is
be opened in Jupyter notebook from ANACONDA
mainly for comparative smaller companies.
Navigator.From there we can read the datasetin
ANACONDA Navigator. At Jupyter notebook first Output of accessing data set is as follows:
we taking three variables for importing purpose
such as NumPy, matplotlib.pyplot and pandas.
These functions are used are as follows:
REFERENCES
1. Andreas Mullar, “Introduction to Machine Learning
using Python: A guide for data Scientist,” in O’Reilly
Publisher, India.
2. S. Marsland, Machine learning: an algorithmic
Figure 3: Sampling the data from the salary dataset
perspective. CRC press, 2015.
3. A. L. Buczak and E. Guven, “A survey of data
In Figure 3 we can see all the points are not using. mining and machine learning methods for cyber
For that reason now we are implementing security intrusion detection,” IEEE Communications
“PolynomialFeatures” for curveness. “degree = 6” Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176,
here refer the smoothness of curve . Oct., 2015.
Rupashri Barik
Ayush Mukherjee