DataScience Exam10

Machine learning:-

• Machine learning is a set of algorithms that have the
capability to learn to perform tasks such as prediction and
classification effectively using data.

Machine Learning Algorithms:

• Supervised Learning Algorithms
• Unsupervised Learning Algorithms
• Reinforcement Learning Algorithms
• Evolutionary Learning Algorithms

Supervised Learning Algorithms

• These algorithms require knowledge of both the outcome
variable (dependent variable) and the features (independent
variables or input variables).
• Some real-world applications of supervised learning are risk
assessment, fraud detection, and spam filtering.

Unsupervised Learning Algorithms:-

• In unsupervised learning, the models are trained with data
that is neither classified nor labelled, and the model acts on that
data without any supervision.
• Let's take an example to understand it more precisely: suppose
there is a basket of fruit images, and we input them into the
machine learning model.
• The images are totally unknown to the model, and the task of
the machine is to find the patterns and categories of the objects.
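
As a rough illustration of finding categories without labels, the sketch below groups unlabeled values with a minimal 1-D k-means. K-means is only one common clustering method and is an assumption here, since the notes do not name a specific algorithm. The fruit weights and the function name `kmeans_1d` are hypothetical stand-ins for features that would be extracted from the images.

```python
# Minimal 1-D k-means sketch: group unlabeled values into k clusters.
# The weights are made-up illustration data, not taken from the notes.

def kmeans_1d(values, k, iterations=20):
    # Deterministic start: take the middle element of k equal chunks
    # of the sorted data as the initial centroids.
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return centroids, clusters


# Hypothetical fruit weights in grams; no labels are given to the algorithm.
weights = [150, 160, 155, 20, 25, 22]
centroids, clusters = kmeans_1d(weights, k=2)
print(centroids)
print(clusters)
```

The algorithm never sees a fruit name; it only separates the light items from the heavy ones, and a person would still have to interpret what each group means.
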
Framework for Developing an ML Model

• Problem or Opportunity Identification
• Feature Extraction: Collection of Relevant Data
• Data Pre-processing
• Model Building
• Communication and Deployment of the Data Analysis

Correlation

• Data correlation is a way to understand the relationship between
multiple variables and attributes in your dataset.
• Using correlation, you can get insights such as:
  • One or more attributes depend on another attribute or are a
  cause of another attribute.
  • One or more attributes are associated with other attributes.
• Correlation can help in predicting one attribute from another (a
great way to impute missing values).
• Correlation can (sometimes) indicate the presence of a causal
relationship.
• Correlation is used as a basic quantity in many modelling
techniques.
• Positive Correlation: if feature A increases then feature B also
increases, and if feature A decreases then feature B also
decreases. Both features move in tandem and they have a linear
relationship.
• Negative Correlation: if feature A increases then feature B
decreases, and vice versa.
• No Correlation: there is no linear relationship between the two
attributes.
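
The three cases above can be checked numerically with the Pearson correlation coefficient, which ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The notes do not say which coefficient they mean, so Pearson is an assumption here. The function name `pearson` and the sample lists are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustration data.
a = [1, 2, 3, 4, 5]
rises_with_a = [2, 4, 6, 8, 10]   # positive correlation
falls_with_a = [10, 8, 6, 4, 2]   # negative correlation
u_shaped = [4, 1, 0, 1, 4]        # (a - 3)^2: related to a, but not linearly

r_pos = pearson(a, rises_with_a)
r_neg = pearson(a, falls_with_a)
r_none = pearson(a, u_shaped)
print(r_pos, r_neg, r_none)
```

The last list depends entirely on `a` yet has a coefficient of zero, which is why a zero coefficient only rules out a linear relationship.
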

Decision Tree Algorithm:-

• A Decision Tree is a supervised learning technique that can be
used for both classification and regression problems, but it is
mostly preferred for solving classification problems.
• In a decision tree, there are two types of nodes: the Decision
Node and the Leaf Node.
• It is called a decision tree because, similar to a tree, it starts with
the root node, which expands into further branches and constructs
a tree-like structure.
• Root Node: The root node is where the decision tree starts. It
represents the entire dataset, which further gets divided into two
or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes, and the tree
cannot be split further after reaching a leaf node.
• Splitting: Splitting is the process of dividing the decision
node/root node into sub-nodes according to the given
conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted
branches from the tree.
• Information Gain
• Gini Index
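
The notes list Information Gain and the Gini Index without defining them. Both are commonly used to score candidate splits. The Gini index of a node is 1 minus the sum of squared class proportions, and information gain is the entropy of the parent node minus the size-weighted entropy of its children. A minimal sketch follows, with hypothetical function names and made-up labels.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 0.0 for a pure node, higher for more mixed nodes."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: 0.0 for a pure node."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Made-up labels: a perfectly mixed parent node and two candidate splits.
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(gini(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely earns the full gain, while a split that leaves both children as mixed as the parent earns none, so the tree would prefer the first.
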


Steps to Implement a Decision Tree Algorithm in Analysis:

• Load the libraries
• Data pre-processing step
• Fitting a decision tree algorithm to the training set
• Predicting the test result
• Testing the accuracy of the result (creation of a confusion matrix)
• Visualizing the test set result
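
The steps above can be sketched end to end as follows. To keep the example runnable without extra installs, it builds a tiny Gini-based tree from scratch with the standard library; in practice a library implementation would normally be used instead. The fruit data, the fixed train/test split, and all function names are hypothetical.

```python
# Step 1: load libraries (standard library only in this sketch).
from collections import Counter

# Step 2: data pre-processing. Made-up rows: ((weight_g, diameter_cm), label).
data = [
    ((150, 7.0), "apple"), ((160, 7.5), "apple"),
    ((155, 7.2), "apple"), ((170, 7.8), "apple"),
    ((20, 2.5), "berry"), ((25, 3.0), "berry"),
    ((22, 2.8), "berry"), ((18, 2.4), "berry"),
]
# Hold out every fourth row as the test set (a fixed split, so runs repeat).
train = [row for i, row in enumerate(data) if i % 4 != 3]
test = [row for i, row in enumerate(data) if i % 4 == 3]


def gini(rows):
    n = len(rows)
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows):
    """Return (feature, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(rows)
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= threshold]
            right = [r for r in rows if r[0][f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


# Step 3: fit the tree. A decision node is a tuple; a leaf is a class label.
def build_tree(rows, depth=0, max_depth=3):
    split = best_split(rows) if depth < max_depth else None
    if split is None:
        # Leaf node: predict the majority class of the rows that reached it.
        return Counter(label for _, label in rows).most_common(1)[0][0]
    f, threshold = split
    left = [r for r in rows if r[0][f] <= threshold]
    right = [r for r in rows if r[0][f] > threshold]
    return (f, threshold,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predict(tree, x):
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if x[f] <= threshold else right
    return tree


tree = build_tree(train)

# Step 4: predict the test result.
predictions = [predict(tree, x) for x, _ in test]

# Step 5: accuracy and confusion matrix, keyed by (actual, predicted).
confusion = Counter((actual, pred) for (_, actual), pred in zip(test, predictions))
accuracy = sum(actual == pred for (_, actual), pred in zip(test, predictions)) / len(test)


# Step 6: visualize the fitted tree as indented text.
def show(node, indent=""):
    if not isinstance(node, tuple):
        print(f"{indent}predict {node}")
        return
    f, threshold, left, right = node
    print(f"{indent}feature[{f}] <= {threshold}?")
    show(left, indent + "  ")
    show(right, indent + "  ")


show(tree)
print("accuracy:", accuracy)
print("confusion:", dict(confusion))
```

The data here is trivially separable, so a single split on weight is enough; real datasets usually need deeper trees and pruning to avoid overfitting.
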

