
What is machine learning?

Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the
use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.

INTRODUCTION TO LAB:
Machine learning is used for everything from automating mundane tasks to offering intelligent
insights, and industries in every sector try to benefit from it. Examples include a wearable
fitness tracker like Fitbit or an intelligent home assistant like Google Home, but there are many
more examples of ML in use.
Prediction: Machine learning can also be used in prediction systems. In a loan example,
to compute the probability of a default, the system needs to classify the
available data into groups.

Image recognition: Machine learning can be used for face detection in an image as well.
There is a separate category for each person in a database of several people.

Speech Recognition: This is the translation of spoken words into text. It is used in voice
searches and more. Voice user interfaces include voice dialing, call routing, and appliance
control. It can also be used for simple data entry and the preparation of structured documents.

Medical diagnosis: ML models can be trained to recognize cancerous tissues.

Financial industry: Finance and trading companies use ML in fraud investigations and credit
checks.

Types of Machine Learning


Machine learning can be classified into 3 types of algorithms:
1. Supervised Learning

2. Unsupervised Learning

3. Reinforcement Learning

Overview of Supervised Learning Algorithm


In supervised learning, an AI system is presented with labeled data, which means that
each data point is tagged with the correct label.
The goal is to approximate the mapping function so well that, given new input data (x),
you can predict the output variable (Y) for that data.
[Figure: e-mails labeled ‘Spam’ and ‘Not Spam’ are used to train a model, which then classifies new mail. Caption: Enables the machine to be trained to classify observations into some class.]
As shown in the above example, we first take some data and mark each item as ‘Spam’ or
‘Not Spam’. This labeled data is then used to train the supervised model.
Once the model is trained, we can test it on new mails and check whether
it predicts the right output.
Types of Supervised learning
Classification: A classification problem is when the output variable is a category, such as
“red” or “blue” or “disease” and “no disease”.

Regression: A regression problem is when the output variable is a real value, such as
“dollars” or “weight”.
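
As a minimal sketch of a classification problem, the toy example below labels a new e-mail as ‘Spam’ or ‘Not Spam’ by copying the label of its closest labeled example (a 1-nearest-neighbour rule). The two features (number of links, number of exclamation marks), the sample values, and the helper names are illustrative assumptions and do not come from the lab material.

```python
# Toy labeled training data: (links, exclamation_marks) -> label.
# The feature choice and the values are made up for illustration.
training_data = [
    ((8, 6), "Spam"),
    ((7, 9), "Spam"),
    ((1, 0), "Not Spam"),
    ((0, 1), "Not Spam"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_label(features, labeled_examples):
    """Return the label of the closest labeled example (1-nearest neighbour)."""
    _, label = min(
        labeled_examples,
        key=lambda example: squared_distance(features, example[0]),
    )
    return label


print(predict_label((6, 7), training_data))
print(predict_label((1, 1), training_data))
```

The output here is a category, which is what makes this classification; a regression model would return a number instead.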

Overview of Unsupervised Learning Algorithm


In unsupervised learning, an AI system is presented with unlabeled, uncategorized data and the
system’s algorithms act on the data without prior training. The output is dependent upon the
coded algorithms. Subjecting a system to unsupervised learning is one way of testing AI.
Types of Unsupervised learning:
Clustering: A clustering problem is where you want to discover the inherent groupings in
the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
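
To make the clustering idea concrete, here is a small sketch of k-means on one-dimensional data. The spending figures, the starting centroids, and the function name `k_means_1d` are assumptions chosen for illustration; no labels are given to the algorithm, it only groups nearby values.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = sum(members) / len(members)
    return centroids, clusters


# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 90, 95, 100]
centroids, clusters = k_means_1d(spend, centroids=[0, 50])
print(centroids)
print(clusters)
```

With this data the low spenders and the high spenders end up in separate groups, even though nobody told the algorithm which customer belongs where.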
Overview of Reinforcement Learning
A reinforcement learning algorithm learns by interacting with its environment. It receives
rewards for performing correctly and penalties for performing incorrectly. It learns without
intervention from a human by maximizing its reward and minimizing its penalty. It is closely
related to dynamic programming and trains algorithms using a system of reward and punishment.

In the above example, the agent is given 2 options: a path with water or a path
with fire. A reinforcement algorithm works on a reward system: if the agent takes the fire path,
reward points are subtracted and the agent learns that it should avoid the fire path. If it had
chosen the water path (the safe path), points would have been added to its reward.
The agent thus learns which path is safe and which is not.
In short, the agent uses the rewards it has obtained to improve its knowledge of the environment
and select its next action.
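
The water/fire example can be sketched as a tiny trial-and-error loop. This is a simplified epsilon-greedy value update rather than a full reinforcement learning algorithm, and the reward values (+1 for water, -1 for fire), the learning rate, and the function name `train_agent` are assumptions made for illustration.

```python
import random

REWARDS = {"water": 1.0, "fire": -1.0}  # assumed reward values


def train_agent(episodes=200, learning_rate=0.1, epsilon=0.2, seed=0):
    """Learn a value estimate for each path from trial and error."""
    rng = random.Random(seed)
    values = {path: 0.0 for path in REWARDS}
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: try a random path.
            path = rng.choice(list(REWARDS))
        else:
            # Exploit: take the path with the highest estimated value.
            path = max(values, key=values.get)
        reward = REWARDS[path]
        # Move the estimate a small step toward the observed reward.
        values[path] += learning_rate * (reward - values[path])
    return values


values = train_agent()
best_path = max(values, key=values.get)
print(values, best_path)
```

After training, the estimated value of the water path is higher than that of the fire path, so the agent prefers the safe path.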
Q1. Name a few libraries in Python used for Data Analysis and Scientific Computations.

Here is a list of Python libraries mainly used for Data Analysis:

• NumPy
• SciPy
• Pandas
• scikit-learn
• Matplotlib
• Seaborn
• Bokeh

Q2. Which library would you prefer for plotting in Python language: Seaborn or
Matplotlib or Bokeh?

It depends on the visualization you’re trying to achieve. Each of these libraries is used for a
specific purpose:

• Matplotlib: Used for basic plotting like bars, pies, lines, scatter plots, etc.
• Seaborn: Built on top of Matplotlib and Pandas to ease data plotting. It is used for
statistical visualizations like creating heatmaps or showing the distribution of your
data.
• Bokeh: Used for interactive visualization. In case your data is too complex and you
haven’t found any “message” in the data, use Bokeh to create interactive
visualizations that allow your viewers to explore the data themselves.

Q3. How are NumPy and SciPy related?

• NumPy and SciPy are separate packages; SciPy is built on top of NumPy.
• NumPy defines arrays along with some basic numerical functions like indexing,
sorting, reshaping, etc.
• SciPy implements computations such as numerical integration, optimization and
statistics using NumPy’s functionality.

Q4. Write a basic Machine Learning program to check the accuracy of a model, by
importing any dataset using any classifier?

# Import the dataset
from sklearn import datasets, tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data
Y = iris.target

# Split the dataset: half for training, half for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5)

# Select and train the classifier
my_classifier = tree.DecisionTreeClassifier()
my_classifier.fit(X_train, Y_train)
predictions = my_classifier.predict(X_test)

# Check accuracy on the held-out test set
print(accuracy_score(Y_test, predictions))

Q5. You are given a cancer detection data set. Let’s suppose when you build a
classification model you achieved an accuracy of 96%. Why shouldn’t you be happy
with your model performance? What can you do about it?

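Cancer detection data sets are typically imbalanced: only a small fraction of the cases are
positive. A model can reach 96% accuracy largely by predicting the majority class while still
missing many of the cancer cases, so accuracy alone is not a good measure here. Metrics such as
sensitivity (recall) and precision are more informative.

To improve the model, you can do the following:

• Add more data
• Treat missing and outlier values
• Feature Engineering
• Feature Selection
• Multiple Algorithms
• Algorithm Tuning
• Ensemble Method
• Cross-Validation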
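
The sketch below illustrates why a 96% accuracy figure can mislead on a cancer detection data set. It assumes a hypothetical test set of 100 patients in which only 4 have cancer, and a "model" that always predicts the healthy class; both are made up for illustration.

```python
# Hypothetical test set: 96 healthy patients (0) and 4 with cancer (1).
actual = [0] * 96 + [1] * 4

# A useless "model" that always predicts the majority class.
predicted = [0] * len(actual)

correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / len(actual)

# Recall: the share of real cancer cases that the model detected.
true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
recall = true_positives / sum(actual)

print(f"accuracy = {accuracy:.2f}")
print(f"recall   = {recall:.2f}")
```

The accuracy is 0.96 even though the recall is 0.00, meaning not a single cancer case was found.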

Q6. Explain false negative, false positive, true negative and true positive with a simple
example.

Let’s consider a scenario of a fire emergency:

• True Positive: The alarm rings and there is a fire.
The system predicted fire (positive) and the prediction is true.
• False Positive: The alarm rings but there is no fire.
The system predicted fire (positive), which is a wrong prediction, hence the
prediction is false.
• False Negative: The alarm does not ring but there is a fire.
The system predicted no fire (negative), which is false since there was a fire.
• True Negative: The alarm does not ring and there is no fire.
The system predicted no fire (negative) and this prediction is true.
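
The four outcomes can be counted mechanically once each observation records both what really happened and what the alarm did. The observations below are invented for illustration.

```python
# Each pair is (fire_actually_present, alarm_rang); the values are made up.
observations = [
    (True, True),
    (False, True),
    (True, False),
    (False, False),
    (False, False),
    (True, True),
]

counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
for fire, alarm in observations:
    if alarm and fire:
        counts["TP"] += 1  # alarm rang and there was a fire
    elif alarm and not fire:
        counts["FP"] += 1  # alarm rang without a fire
    elif not alarm and fire:
        counts["FN"] += 1  # fire, but the alarm stayed silent
    else:
        counts["TN"] += 1  # no fire and no alarm

print(counts)
```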

Q7. What’s the difference between Type I and Type II error?


Q8. What is Regression in machine learning?

Ans:
Regression consists of mathematical methods that allow us to predict a continuous
outcome (y) based on the value of one or more predictor variables (x). Linear regression
is probably the most popular form of regression analysis because of its ease of use in predicting and forecasting.

Types:
• Linear
• Polynomial
• Lasso
• Ridge
• Elastic Net
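
For the linear case with a single predictor, the fit can be sketched with the ordinary least-squares formulas (slope = covariance of x and y divided by variance of x). The data points and the function name `fit_line` are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Predict the continuous outcome for a new x value.
print(slope * 5 + intercept)
```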

Q9. Explain types of Regression Models

Q10. Explain Classification and Regression


Q11. Explain Univariate and Multivariate Regression

Univariate linear regression focuses on determining the relationship between one independent
variable and one dependent variable.
As the name implies, multivariate regression is a technique that estimates a single
regression model with more than one outcome variable. When there is more than one
predictor variable in a multivariate regression model, the model is a multivariate multiple
regression.

Q 12. How is KNN different from K-means clustering?

Q13. What is the difference between Gini Impurity and Entropy in a Decision Tree?

• Gini Impurity and Entropy are the metrics used for deciding how to split a Decision
Tree.
• Gini impurity is the probability of a random sample being classified incorrectly if
you randomly pick a label according to the distribution in the branch.
• Entropy is a measure of the lack of information (uncertainty). You calculate the
Information Gain (the difference in entropies) from making a split. This measure helps to
reduce the uncertainty about the output label.
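
Both metrics can be computed directly from the class proportions p in a branch. The sketch below uses the standard formulas, Gini impurity = 1 - sum(p^2) and entropy = -sum(p * log2(p)); the label lists and function names are illustrative assumptions.

```python
from collections import Counter
from math import log2


def class_probabilities(labels):
    """Proportion of each class among the labels in a branch."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini_impurity(labels):
    """Chance of mislabeling a random sample drawn from this branch."""
    return 1.0 - sum(p ** 2 for p in class_probabilities(labels))


def entropy(labels):
    """Uncertainty of the label distribution, in bits."""
    return sum(-p * log2(p) for p in class_probabilities(labels))


mixed = ["yes", "yes", "no", "no"]
pure = ["yes", "yes", "yes", "yes"]
print(gini_impurity(mixed), entropy(mixed))
print(gini_impurity(pure), entropy(pure))
```

A perfectly mixed two-class branch gives the maximum values (Gini 0.5, entropy 1 bit), while a pure branch gives 0 for both.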

Q14. What is the difference between Entropy and Information Gain?

• Entropy is an indicator of how messy your data is. It decreases as you move closer to
the leaf node.
• The Information Gain is based on the decrease in entropy after a dataset is split on an
attribute. It keeps on increasing as you move closer to the leaf node.
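
The relationship between the two can be sketched in a few lines: information gain is the parent's entropy minus the size-weighted entropy of the branches produced by a split. The label lists and the two example splits are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Uncertainty of the label distribution, in bits."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child branches."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely gains the full 1 bit of entropy, while a split that leaves every branch as mixed as the parent gains nothing.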

Q 15. What is Unsupervised Learning?

Unsupervised learning is a type of machine learning used to find patterns in
a given set of data. Here the data is unlabeled and uncategorized.
Unsupervised Learning Algorithms:

• Clustering,
• Anomaly Detection,
• Neural Networks and Latent Variable Models.

Example:

For example, clustering T-shirts will group them into categories such as “collar style and V neck style”,
“crew neck style” and “sleeve types”.

Q. 16 What is ‘Naive’ in a Naive Bayes?


Ans:
Naive Bayes is a simple and powerful algorithm for predictive modeling. ... Naive Bayes is called
naive because it assumes that the input variables are independent of one another given the class.
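
A small sketch of what the independence assumption buys: the likelihood of several inputs is obtained by simply multiplying the per-input likelihoods for each class. All probabilities below are hand-picked assumptions for a toy spam filter, not values learned from data.

```python
# Assumed, hand-picked probabilities for a toy spam filter.
prior = {"spam": 0.4, "ham": 0.6}

# P(word appears | class), treated as independent of the other words.
word_given_class = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.7},
}


def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    scores = {}
    for label in prior:
        score = prior[label]
        for word in words:
            # Independence: multiply the per-word likelihoods.
            score *= word_given_class[label][word]
        scores[label] = score
    # Normalize so the class probabilities sum to 1.
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


result = posterior(["free", "meeting"])
print(result)
```

In real text the words are rarely independent, which is why the assumption is called naive, yet the resulting classifier often works well in practice.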

Q. 17 What is the purpose of Training dataset and testing Dataset?


Ans:
• Training Dataset: The sample of data used to fit the model.
The goal is to produce a trained (fitted) model that generalizes well to new,
unknown data. The fitted model is then evaluated using “new” examples from the held-out
datasets (validation and test datasets) to estimate the model's accuracy in classifying
new data.

• Test Dataset: The sample of data used to provide an unbiased evaluation of a final model
fit on the training dataset.
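
A minimal sketch of holding out a test set, using only the standard library. The 25% test fraction, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration.

```python
import random


def split_dataset(samples, test_fraction=0.25, seed=42):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


samples = list(range(20))  # stand-in for 20 labeled examples
train, test = split_dataset(samples)
print(len(train), len(test))
```

The two parts never share an example, so the test set stays unseen during training and can give an unbiased estimate of performance.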

Q. 18 Write down any 5 applications of machine learning algorithms.


Ans:
Applications of Machine learning
1. Image Recognition: Image recognition is one of the most common applications of
machine learning. ...
2. Speech Recognition. ...
3. Traffic prediction: ...
4. Product recommendations: ...
5. Self-driving cars: ...
6. Email Spam and Malware Filtering: ...
7. Virtual Personal Assistant: ...
8. Online Fraud Detection:
