PROJECT INTRODUCTION
DATA MINING DEFINITION AND TECHNIQUES
Data mining, also popularly known as Knowledge Discovery in Databases (KDD),
refers to extracting or "mining" knowledge from large amounts of data. Data
mining techniques operate on large volumes of data to discover hidden patterns
and relationships that help in decision making. Although data mining and
knowledge discovery in databases are frequently treated as synonyms, data
mining is actually one step of the knowledge discovery process. The sequence
of steps identified in extracting knowledge from data is shown in
2. Clustering
3. Prediction
4. Association rule
5. Neural networks
6. Decision Trees
Linear Regression
The term "linearity" in algebra refers to a linear relationship between two
or more variables. If we draw this relationship in a two-dimensional space
(between two variables, in this case), we get a straight line.
Suppose we want to determine the linear relationship between the number of
hours a student studies and the percentage of marks that student scores in an
exam. Given the number of hours a student prepares for a test, about how high
a score can the student achieve? If we plot the independent variable (hours)
on the x-axis and the dependent variable (percentage) on the y-axis, linear
regression gives us the straight line that best fits the data points.
LINEAR REGRESSION
1. y = mx + b, where m is the slope of the line and b is its intercept on the y-axis
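To make the equation y = mx + b concrete, here is a minimal sketch of how the slope and intercept of the best-fit line can be computed with ordinary least squares. The data values and the helper name fit_line are hypothetical, chosen only for illustration. They are not taken from student_scores.csv, and this is not necessarily the method used in the original slides.

```python
# Hypothetical sample data (NOT from student_scores.csv):
# hours studied and the percentage score obtained.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [20.0, 30.0, 40.0, 50.0, 60.0]


def fit_line(x, y):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


m, b = fit_line(hours, scores)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
```

For this made-up data the points lie exactly on a line, so the fit recovers m = 10 and b = 10. Real data will scatter around the line instead.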
2. Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
3. Dataset
# Load the data and preview the first five rows
dataset = pd.read_csv('student_scores.csv')
dataset.head()
DATASET
To see statistical details of the dataset, we can use describe():
dataset.describe()
DATASET
# Scatter plot of hours studied against percentage score
dataset.plot(x='Hours', y='Scores', style='o')
plt.title('Hours vs Percentage')
plt.xlabel('Hours Studied')
plt.ylabel('Percentage Score')
plt.show()
Evaluating the Algorithm
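The slide gives no detail on how the model is evaluated. One common approach for regression, assumed here, is to compare predicted scores against actual scores using the mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). The sketch below uses hypothetical values and a hypothetical helper named regression_errors.

```python
import math


def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length lists of numbers."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    return mae, mse, rmse


# Hypothetical actual and predicted percentage scores (illustration only).
actual = [20.0, 27.0, 69.0, 30.0, 62.0]
predicted = [17.0, 31.0, 71.0, 30.0, 60.0]

mae, mse, rmse = regression_errors(actual, predicted)
print(f"MAE = {mae:.2f}, MSE = {mse:.2f}, RMSE = {rmse:.2f}")
```

Lower values mean the predictions are closer to the actual scores. RMSE is expressed in the same units as the score itself, which makes it easier to interpret than MSE.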
MAKING PREDICTIONS
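Once a slope m and intercept b have been found, a prediction is just the line equation y = mx + b evaluated at a new number of hours. The sketch below assumes hypothetical values for m and b and a hypothetical helper named predict. Neither comes from the original slides.

```python
def predict(hours, m, b):
    """Evaluate the fitted line y = m*x + b for each value in hours."""
    return [m * h + b for h in hours]


# Hypothetical fitted parameters (illustration only, not a real fit).
m, b = 10.0, 2.0

# Hours of study for which we want an estimated percentage score.
new_hours = [1.5, 3.0, 9.25]
predicted = predict(new_hours, m, b)

for h, p in zip(new_hours, predicted):
    print(f"{h} hours -> predicted score {p}")
```

Predictions are most trustworthy for hours within the range seen in the data. A straight line extended far beyond that range can give scores that are not meaningful as percentages.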
MATPLOTLIB
THANK YOU