CO-367 Machine Learning

Lab File

Submitted to: Sanjay Pathidar (Associate Professor)

Submitted by: Shubham Anand (2k17/CO/336)

INDEX

S.No Experiment Date Sign

EXPERIMENT 1

AIM: To study basic Python libraries used in data science.

THEORY:
A. NumPy

NumPy is the fundamental package for scientific computing with Python.
It contains, among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number
  capabilities

FUNCTIONS OF NUMPY LIBRARY

1. MATHEMATICAL FUNCTIONS
numpy.arcsin(), numpy.arccos() and numpy.arctan() return the inverse
sine, cosine and tangent of the given value, as an angle in radians.

numpy.around() returns the value rounded to the desired number of
decimals.

numpy.floor() returns, element-wise, the largest integer not greater
than the input parameter (the result is still a floating-point value).
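
The short sketch below illustrates the three kinds of mathematical function described above. The sample values and variable names are chosen here for illustration only.

```python
import numpy as np

# Inverse trigonometric functions take a ratio and return an angle in radians.
angle = np.arcsin(1.0)
angle_deg = np.degrees(angle)  # convert radians to degrees for readability

# around() rounds each element to the requested number of decimals.
rounded = np.around(np.array([1.2345, 2.7182]), decimals=2)

# floor() gives the largest integer not greater than each element (as floats).
floored = np.floor(np.array([-1.5, 0.2, 3.9]))

print(angle_deg)  # 90.0
print(rounded)
print(floored)
```

Note that floor() rounds towards negative infinity, so -1.5 becomes -2.0 and not -1.0.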

2. STRING FUNCTIONS
These functions live in the numpy.char module.

numpy.char.add() returns the element-wise string concatenation of two
arrays of str or Unicode.

numpy.char.multiply() returns each string repeated the given number of
times, element-wise.

numpy.char.center() returns a copy of each string centered in a string
of the specified length.

numpy.char.split() returns a list of the words in each string, using
the given separator as the delimiter.
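
A small sketch of the string functions above, assuming they are reached through the numpy.char module. The input strings are made up for illustration.

```python
import numpy as np

first = np.array(["data", "machine"])
second = np.array([" science", " learning"])

# Element-wise concatenation of two string arrays.
joined = np.char.add(first, second)

# Each string repeated twice.
repeated = np.char.multiply(first, 2)

# Each string centered in a field of width 9, padded with '*'.
centered = np.char.center(first, 9, "*")

# Each string split on a single space; the result is an object array of lists.
words = np.char.split(joined, " ")

print(joined)
print(repeated)
print(centered)
print(words)
```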

3. SORTING FUNCTIONS
numpy.sort() returns a sorted copy of the input array.

numpy.argsort() performs an indirect sort on the input array, along the
given axis and using the specified kind of sort, and returns the array
of indices that would sort the data.

numpy.lexsort() performs an indirect sort using a sequence of keys. The
keys can be seen as columns in a spreadsheet; the last key in the
sequence is the primary sort key.
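
The sketch below shows the difference between a direct sort, an indirect sort and a multi-key sort. The scores and names are invented sample data.

```python
import numpy as np

scores = np.array([70, 85, 70, 92])
names = np.array(["dev", "asha", "bina", "chen"])

# sort() returns a sorted copy; the original array is left unchanged.
sorted_scores = np.sort(scores)

# argsort() returns the indices that would sort the array.
# kind="stable" keeps equal elements in their original order.
order = np.argsort(scores, kind="stable")

# lexsort() sorts by the LAST key first: here by score, with ties
# broken by name.
by_score_then_name = np.lexsort((names, scores))

print(sorted_scores)              # [70 70 85 92]
print(order)                      # [0 2 1 3]
print(names[by_score_then_name])  # ['bina' 'dev' 'asha' 'chen']
```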

4. STATISTICAL FUNCTIONS
numpy.amin() and numpy.amax() return the minimum and the maximum of the
elements in the given array along the specified axis.

numpy.ptp() returns the range (maximum - minimum) of values along an
axis.

numpy.median() returns the median, the value separating the higher half
of a data sample from the lower half.

numpy.percentile() returns a percentile (or centile), a measure used in
statistics indicating the value below which a given percentage of
observations in a group of observations fall.
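
A brief sketch of the statistical functions on a small made-up array. It uses np.min and np.max, which are the same functions as amin and amax under their shorter names.

```python
import numpy as np

data = np.array([[3, 7, 5],
                 [1, 6, 9]])

col_min = np.min(data, axis=0)      # minimum of each column
row_max = np.max(data, axis=1)      # maximum of each row
value_range = np.ptp(data, axis=1)  # max - min of each row
median = np.median(data)            # median of all six values
p50 = np.percentile(data, 50)       # the 50th percentile equals the median

print(col_min)      # [1 6 5]
print(row_max)      # [7 9]
print(value_range)  # [4 8]
print(median, p50)  # 5.5 5.5
```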

B. Matplotlib

Matplotlib is a Python 2D plotting library which produces publication
quality figures in a variety of hardcopy formats and interactive
environments across platforms. Matplotlib can be used in Python
scripts, the Python and IPython shells, the Jupyter notebook, web
application servers, and four graphical user interface toolkits.

Matplotlib tries to make easy things easy and hard things possible. You
can generate plots, histograms, power spectra, bar charts, errorcharts,
scatterplots, etc., with just a few lines of code.

FUNCTIONS OF MATPLOTLIB LIBRARY
Matplotlib comes with a wide variety of plots. Plots help us understand
trends and patterns and spot correlations. They are typically instruments
for reasoning about quantitative information. Some sample plots are
covered here.

1. LINE PLOT
# importing the pyplot module from matplotlib
from matplotlib import pyplot as plt

# function to plot the line
plt.plot(x, y)

# function to show the plot
plt.show()

2. BAR PLOT
# importing the pyplot module from matplotlib
from matplotlib import pyplot as plt

# function to plot the bars
plt.bar(x, y)

# function to show the plot
plt.show()

3. HISTOGRAM
# importing the pyplot module from matplotlib
from matplotlib import pyplot as plt

# function to plot the histogram
plt.hist(y)

# function to show the plot
plt.show()

4. SCATTER PLOT
# importing the pyplot module from matplotlib
from matplotlib import pyplot as plt

# function to plot the scatter
plt.scatter(x, y)

# function to show the plot
plt.show()
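
The four plot types can also be drawn together in one figure. The sketch below is one possible way to do that. It assumes made-up sample data for x and y, uses the non-interactive "Agg" backend so it runs without a display, and writes to a hypothetical file name, sample_plots.png.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so render off-screen
from matplotlib import pyplot as plt

# Made-up sample data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

# One figure with a 2 x 2 grid of axes, one plot type per cell.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, y)
axes[0, 0].set_title("Line plot")

axes[0, 1].bar(x, y)
axes[0, 1].set_title("Bar plot")

axes[1, 0].hist(y, bins=5)
axes[1, 0].set_title("Histogram")

axes[1, 1].scatter(x, y)
axes[1, 1].set_title("Scatter plot")

fig.tight_layout()
fig.savefig("sample_plots.png")  # hypothetical output file name
```

In an interactive session, the matplotlib.use("Agg") line can be dropped and plt.show() called in place of savefig().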

C. Pandas
Python has long been great for data munging and preparation, but
less so for data analysis and modeling. pandas helps fill this gap,
enabling you to carry out your entire data analysis workflow in
Python without having to switch to a more domain-specific language
like R.

pandas does not implement significant modeling functionality
outside of linear and panel regression; for this, look to statsmodels
and scikit-learn. More work is still needed to make Python a
first-class statistical modeling environment, but we are well on our
way toward that goal.

FUNCTIONS OF PANDAS LIBRARY

# The snippets below need both libraries
import numpy as np
import pandas as pd

1. INDEX
dataflair_index = pd.date_range('1/1/2000', periods=8)

2. SERIES
dataflair_s1 = pd.Series(np.random.randn(5),
                         index=['a', 'b', 'c', 'd', 'e'])

3. DATAFRAME
dataflair_df1 = pd.DataFrame(np.random.randn(8, 3),
                             index=dataflair_index,
                             columns=['A', 'B', 'C'])

4. PANEL
# Note: pd.Panel was deprecated in pandas 0.20 and removed in pandas 0.25,
# so this snippet only runs on older pandas versions.
dataflair_wp1 = pd.Panel(np.random.randn(2, 5, 4),
                         items=['Item1', 'Item2'],
                         major_axis=pd.date_range('1/1/2000', periods=5),
                         minor_axis=['A', 'B', 'C', 'D'])
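
The sketch below builds the same kinds of object on a current pandas version. Since Panel is no longer available there, the three-dimensional data is stored as a DataFrame with a two-level row index. This is one common substitute and not the only one. The variable names and the seeded random generator are assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so that the values are repeatable

# Index: eight consecutive dates.
dates = pd.date_range("2000-01-01", periods=8)

# Series: one-dimensional labelled data.
series = pd.Series(rng.standard_normal(5), index=["a", "b", "c", "d", "e"])

# DataFrame: two-dimensional labelled data.
frame = pd.DataFrame(rng.standard_normal((8, 3)),
                     index=dates, columns=["A", "B", "C"])

# Substitute for Panel: a DataFrame whose rows are indexed by (item, date).
panel_index = pd.MultiIndex.from_product(
    [["Item1", "Item2"], pd.date_range("2000-01-01", periods=5)],
    names=["item", "date"],
)
stacked = pd.DataFrame(rng.standard_normal((10, 4)),
                       index=panel_index, columns=["A", "B", "C", "D"])

# Selecting one item gives back an ordinary 5 x 4 DataFrame.
item1 = stacked.loc["Item1"]

print(series.shape, frame.shape, stacked.shape, item1.shape)
```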

CONCLUSION:
We’ve learnt the basics of the most commonly used data science libraries in
Python.

EXPERIMENT 2

AIM: To learn how to read from a CSV file using pandas.


THEORY:
Tabular data is commonly stored as CSV (comma-separated values), a text
format intended for the presentation of tabular data. Each line of the
file is one row of the table. The values of individual columns are
separated by a separator symbol: a comma (,), a semicolon (;) or another
symbol. CSV can be easily read and processed by Python.

CODE:
# Load the Pandas libraries with alias 'pd'

import pandas as pd

# Read data from file 'filename.csv'

# (in the same directory that your python process is based)

# Control delimiters, rows, column names with read_csv

data = pd.read_csv("filename.csv")

# Preview the first 5 lines of the loaded data

data.head()

FUNCTION    DESCRIPTION
read_csv    Read a comma-separated values (csv) file into a DataFrame.
            Also supports optionally iterating over the file or
            breaking it into chunks.
head        Preview the first 5 lines of the loaded data.
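
The sketch below shows two read_csv options mentioned in this experiment: a non-comma separator and reading in chunks. So that it runs without a file on disk, it reads from an in-memory string. The column names and values are made up for illustration.

```python
import io
import pandas as pd

# A small semicolon-separated table held in memory instead of a file.
csv_text = "name;score\nasha;91\nbina;78\nchen;85\ndev;66\n"

# sep tells read_csv which separator symbol the file uses.
data = pd.read_csv(io.StringIO(csv_text), sep=";")
print(data.head(2))  # head(n) previews the first n rows

# chunksize makes read_csv return an iterator of smaller DataFrames.
chunk_sizes = [
    len(chunk)
    for chunk in pd.read_csv(io.StringIO(csv_text), sep=";", chunksize=3)
]
print(chunk_sizes)  # [3, 1]
```

Reading in chunks is mainly useful when a file is too large to load into memory in one go.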

CONCLUSION:
We successfully read a CSV file and displayed the first five lines of our dataset.

EXPERIMENT 3

AIM: To implement linear regression.

THEORY:

Linear Regression is a Machine Learning algorithm based on supervised
learning. It performs a regression task: it models a target prediction
value based on independent variables. It is mostly used for finding out
the relationship between variables and for forecasting. Regression models
differ in the kind of relationship they assume between the dependent and
independent variables and in the number of independent variables used.

Linear Regression predicts the value of a dependent variable (y) from a
given independent variable (x), so it finds a linear relationship between
x (input) and y (output). In a simple regression problem (a single x and
a single y), the form of the model is:

y = B0 + B1*x

where B0 is the intercept and B1 is the slope. In higher dimensions, when
we have more than one input (x), the line becomes a plane or a
hyper-plane. The representation is therefore the form of the equation
together with the specific values used for the coefficients.
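
To make the coefficients concrete, the sketch below computes B0 and B1 for the single-input case directly with NumPy. It assumes the coefficients are chosen by ordinary least squares, which is the criterion scikit-learn's LinearRegression uses. The function name and the noise-free sample data are made up for illustration.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Noise-free sample data generated from salary = 30000 + 5000 * years.
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = 30000.0 + 5000.0 * years

b0, b1 = fit_simple_linear_regression(years, salary)
prediction = b0 + b1 * 6.0  # predicted salary for 6 years of experience

print(b0, b1, prediction)
```

Because the sample data lies exactly on a line, the fit recovers the intercept 30000 and the slope 5000 up to floating-point rounding.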

CODE:

# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Salary_Data.csv')

X = dataset.iloc[:, :-1].values

y = dataset.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3,
                                                    random_state = 0)

# Feature Scaling (disabled: the block is kept inside a string literal)

"""from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()

X_train = sc_X.fit_transform(X_train)

X_test = sc_X.transform(X_test)

sc_y = StandardScaler()

y_train = sc_y.fit_transform(y_train)"""

# Fitting Simple Linear Regression to the Training set

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)

# Predicting the Test set results

y_pred = regressor.predict(X_test)

# Visualising the Training set results

plt.scatter(X_train, y_train, color = 'red')

plt.plot(X_train, regressor.predict(X_train), color = 'blue' )

plt.title('Salary vs Experience (Training Set)')

plt.xlabel('Years of Experience')

plt.ylabel('Salary')

plt.show()

# Visualising the Test set results

plt.scatter(X_test, y_test, color = 'red')

plt.plot(X_train, regressor.predict(X_train), color = 'blue' )

plt.title('Salary vs Experience (Test Set)')

plt.xlabel('Years of Experience')

plt.ylabel('Salary')

plt.show()

CONCLUSION:
In this experiment we learned about linear regression, fitted a regression
model to the imported dataset, and plotted the resulting regression line
against the training and test sets.

EXPERIMENT 4

AIM: To implement the decision tree (DT) CART (classification and
regression trees) algorithm.

THEORY:
A decision tree is a widely used, effective, non-parametric machine
learning technique for regression and classification problems. To find
solutions, a decision tree makes sequential, hierarchical decisions about
the outcome variable based on the predictor data. It builds regression or
classification models in the form of a tree structure: it breaks a dataset
down into smaller and smaller subsets while an associated decision tree is
incrementally developed. The final result is a tree with decision nodes
and leaf nodes.

Classification and Regression Tree (CART) is one of the most commonly used
decision tree algorithms. In this experiment, we apply CART to an example
dataset. A decision tree is a recursive partitioning approach, and CART
splits each input node into two child nodes, so a CART decision tree is a
binary decision tree. At each level of the tree, the algorithm identifies
a condition, that is, which variable and which level to use for splitting
the input node (data sample) into two child nodes.
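
The search for a single split can be sketched in a few lines. The example below handles one numeric input and a regression target, and scores each candidate split by the total squared error of the two child nodes. That scoring rule is an assumption made for this sketch. The function name and the sample data are also made up.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) of the split x <= threshold with lowest error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.unique(x)  # sorted distinct values of the input
    best_threshold, best_sse = None, np.inf
    # Candidate thresholds are the midpoints between neighbouring values.
    for threshold in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= threshold], y[x > threshold]
        # Each child predicts its own mean; add up the squared errors.
        sse = (np.sum((left - left.mean()) ** 2)
               + np.sum((right - right.mean()) ** 2))
        if sse < best_sse:
            best_threshold, best_sse = threshold, sse
    return best_threshold, best_sse

# Made-up data with an obvious jump between level 3 and level 4.
levels = np.array([1, 2, 3, 4, 5, 6])
salaries = np.array([10.0, 12.0, 11.0, 50.0, 52.0, 51.0])

threshold, sse = best_split(levels, salaries)
print(threshold, sse)  # 3.5 4.0
```

A full tree would apply the same search recursively to each child node until a stopping rule is met.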

CODE:

# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Position_Salaries.csv')

X = dataset.iloc[:, 1:2].values

y = dataset.iloc[:, 2].values

# Splitting the dataset into the Training set and Test set
# (disabled: the block is kept inside a string literal)

"""from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,
                                                    random_state = 0)"""

# Feature Scaling

"""from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()

X_train = sc_X.fit_transform(X_train)

X_test = sc_X.transform(X_test)

sc_y = StandardScaler()

y_train = sc_y.fit_transform(y_train)"""

# Fitting Decision Tree Regression to the dataset

from sklearn.tree import DecisionTreeRegressor

regressor = DecisionTreeRegressor(random_state = 0)

regressor.fit(X, y)

# Predicting a new result
# (predict expects a 2D array of shape (n_samples, n_features))

y_pred = regressor.predict([[6.5]])

# Visualising the Decision Tree Regression results (higher resolution)

X_grid = np.arange(X.min(), X.max(), 0.01)

X_grid = X_grid.reshape((len(X_grid), 1))

plt.scatter(X, y, color = 'red')

plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')

plt.title('Truth or Bluff (Decision Tree Regression)')

plt.xlabel('Position level')

plt.ylabel('Salary')

plt.show()

CONCLUSION:
In this experiment we learned about the regression tree form of CART
(Classification and Regression Trees), fitted a regression tree model to
the imported dataset, and plotted the resulting predictions.
