LP I ML Viva Questions
Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the
use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.
INTRODUCTION TO LAB:
Machine learning is used everywhere from automating mundane tasks to offering intelligent
insights, and industries in every sector try to benefit from it. Examples include a wearable fitness tracker
like Fitbit or an intelligent home assistant like Google Home, but there are many more examples
of ML in use.
Prediction: Machine learning can also be used in prediction systems. Considering the
loan example, to compute the probability of a fault, the system will need to classify the
available data into groups.
Image recognition: Machine learning can be used for face detection in an image as well.
There is a separate category for each person in a database of several people.
Speech Recognition: This is the translation of spoken words into text. It is used in voice
searches and more. Voice user interfaces include voice dialing, call routing, and appliance
control. It can also be used for simple data entry and the preparation of structured documents.
Financial industry: Financial institutions and trading companies use ML in fraud investigations and credit
checks.
Types of machine learning:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Regression: A regression problem is one where the output variable is a real value, such as
“dollars” or “weight”.
In the example above, the agent is given two options, i.e. a path with water or a path
with fire. A reinforcement learning algorithm works on a reward system: if the agent takes the fire path,
rewards are subtracted and the agent learns that it should avoid the fire path. If it had
chosen the water path (the safe path), some points would have been added to its reward
points, and the agent would then learn which path is safe and which isn't.
By leveraging the rewards it obtains, the agent improves its knowledge of the environment and uses it to
select its next action.
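The reward-driven learning described above can be sketched in a few lines. This is a minimal, hypothetical sketch: the reward values (+1 for water, -1 for fire), the epsilon-greedy exploration rate, and the learning rate are assumptions chosen for illustration, and the single-state value update is a deliberately tiny stand-in for a full reinforcement learning algorithm such as Q-learning. The agent starts with no preference, so it may try the fire path first; the negative reward lowers that path's value and the agent switches to water.

```python
import random

# Hypothetical reward values for the two paths in the example:
# the fire path is penalized (-1) and the water path is rewarded (+1).
REWARDS = {"fire": -1.0, "water": 1.0}


def train_agent(episodes=200, epsilon=0.1, learning_rate=0.1, seed=0):
    """Learn an estimated value for each path purely from the rewards received.

    Single-state (bandit-style) value update with epsilon-greedy exploration,
    used here as a tiny stand-in for a full reinforcement learning algorithm.
    """
    rng = random.Random(seed)
    values = {action: 0.0 for action in REWARDS}  # no prior preference for either path
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(list(REWARDS))  # explore: try a random path
        else:
            action = max(values, key=values.get)  # exploit: best-valued path so far
        reward = REWARDS[action]
        # Nudge the value estimate for the chosen path toward the reward just received.
        values[action] += learning_rate * (reward - values[action])
    return values


values = train_agent()
print(values)
print("Preferred path:", max(values, key=values.get))
```

Exploration (epsilon) matters: without it, an agent that happened to find one acceptable path would never test the other.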
Q1. Name a few libraries in Python used for Data Analysis and Scientific Computations.
NumPy
SciPy
Pandas
Scikit-learn
Matplotlib
Seaborn
Bokeh
Q2. Which library would you prefer for plotting in Python: Seaborn, Matplotlib, or
Bokeh?
It depends on the visualization you’re trying to achieve. Each of these libraries is used for a
specific purpose:
Matplotlib: Used for basic plotting such as bar, pie, line, and scatter plots.
Seaborn: Built on top of Matplotlib and Pandas to make data plotting easier. It is used for
statistical visualizations such as heatmaps or plots showing the distribution of your
data.
Bokeh: Used for interactive visualization. If your data is too complex and you
haven't found any “message” in it, use Bokeh to create interactive
visualizations that let your viewers explore the data themselves.
Q4. Write a basic machine learning program that imports any dataset, trains any classifier,
and checks the accuracy of the model.
#importing libraries and the dataset
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split  # replaces the old sklearn.cross_validation module
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X = iris.data
Y = iris.target

#splitting the dataset (train_test_split returns X_train, X_test, Y_train, Y_test in this order)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5, random_state=42)

#selecting and training the classifier
my_classifier = tree.DecisionTreeClassifier()
my_classifier.fit(X_train, Y_train)
predictions = my_classifier.predict(X_test)

#checking accuracy on the held-out test set
print(accuracy_score(Y_test, predictions))
Q5. You are given a cancer detection data set. Let’s suppose when you build a
classification model you achieved an accuracy of 96%. Why shouldn’t you be happy
with your model performance? What can you do about it?
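A short sketch can show why 96% accuracy may mean little here. The numbers are hypothetical: a cancer data set where only 40 of 1,000 patients (4%) have cancer. A "model" that always predicts "no cancer" reaches 96% accuracy while catching none of the actual cancer cases, which is why metrics such as recall (sensitivity) are usually checked alongside accuracy on imbalanced data. The helper functions `accuracy` and `recall` are written out by hand for illustration.

```python
# Hypothetical labels: 1 = cancer, 0 = no cancer; 40 positives out of 1000 (4%).
y_true = [1] * 40 + [0] * 960

# A useless "model" that always predicts the majority class (no cancer).
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model caught (sensitivity)."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # → accuracy = 0.96
print(f"recall   = {recall(y_true, y_pred):.2f}")    # → recall   = 0.00
```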
Q6. Explain false negative, false positive, true negative and true positive with a simple
example.
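As a simple illustration, take a hypothetical cancer test on 8 patients. A true positive (TP) is a sick patient correctly flagged, a false positive (FP) is a healthy patient wrongly flagged (a false alarm), a true negative (TN) is a healthy patient correctly cleared, and a false negative (FN) is a sick patient the test missed. The sketch below counts the four cases; the data and the `confusion_counts` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical test results for 8 patients: 1 = has cancer, 0 = healthy.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]


def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN and FN for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct -> TP, wrong -> FP (false alarm)
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: correct -> TN, wrong -> FN (missed case)
            counts["TN" if a != positive else "FN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}


print(confusion_counts(actual, predicted))
# → {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```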
Ans:
Regression consists of mathematical methods that allow you to predict a continuous
outcome (y) based on the value of one or more predictor variables (x). (Linear regression
is probably the most popular form of regression analysis because of its ease of use in predicting and forecasting.)
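For the simplest case of one predictor, linear regression can be fitted with the ordinary least squares formulas: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below is a minimal, dependency-free illustration on hypothetical data; in practice a library such as scikit-learn's `LinearRegression` would normally be used.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one predictor: y ≈ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: one predictor x and a continuous outcome y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_linear_regression(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"prediction at x = 6: {intercept + slope * 6:.2f}")
```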
Types:
Linear
Polynomial
Lasso
Ridge
Elastic Net
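To illustrate one of the listed types, Ridge regression adds an L2 penalty alpha * ||w||^2 to the least squares loss, which shrinks the coefficients toward zero as alpha grows (Lasso uses an L1 penalty instead, and Elastic Net combines both). The sketch below uses the closed-form solution w = (X^T X + alpha * I)^(-1) X^T y on centered, hypothetical synthetic data so that no intercept is needed; it assumes NumPy is available. In practice one would typically use `sklearn.linear_model.Ridge`, `Lasso`, or `ElasticNet`.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on centered data (no intercept term).

    Solves (X^T X + alpha * I) w = X^T y; alpha = 0 gives ordinary least squares.
    """
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Hypothetical synthetic data: y depends on two features plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([3.0, -2.0]) + rng.normal(scale=0.1, size=50)

# Center X and y so that dropping the intercept is valid.
X = X - X.mean(axis=0)
y = y - y.mean()

for alpha in (0.0, 10.0, 100.0):
    w = ridge_fit(X, y, alpha)
    print(f"alpha={alpha:>5}: coefficients={np.round(w, 3)}, norm={np.linalg.norm(w):.3f}")
```

Larger alpha values trade a little bias for lower variance, which can help when features are correlated or the data set is small.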
Q13. What is the difference between Gini Impurity and Entropy in a Decision Tree?
Gini Impurity and Entropy are the metrics used for deciding how to split a Decision
Tree.
Gini impurity is the probability of a random sample being classified incorrectly if
you randomly pick a label according to the distribution in the branch.
Entropy is a measure of the lack of information (uncertainty) in a node. You calculate the
Information Gain (the difference in entropies before and after a split) by making a split. Choosing the split with the
highest information gain reduces the uncertainty about the output label the most.
Entropy is an indicator of how messy your data is. It decreases as you reach closer to
the leaf node.
Information Gain is the decrease in entropy after a dataset is split on an
attribute. The total information gained since the root keeps increasing as you move closer to a leaf node.
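The two metrics and information gain can be computed directly from the label counts in a node: Gini = 1 - sum(p_k^2) and Entropy = -sum(p_k * log2(p_k)), where p_k is the fraction of samples with label k. The sketch below is a minimal illustration on a hypothetical node of 4 "yes" and 4 "no" labels split into two children; the split itself is made up. Scikit-learn's `DecisionTreeClassifier` exposes the choice between the two through its `criterion` parameter.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: chance of mislabeling a random sample from `labels`
    if it were labeled randomly according to the label distribution."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, 1 for a 50/50 binary node."""
    n = len(labels)
    # p * log2(1/p) equals -p * log2(p), written this way to avoid returning -0.0.
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with 4 "yes" and 4 "no" labels, split into two children.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes"] * 3 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 3

print(f"Gini(parent)     = {gini(parent):.3f}")     # → 0.500
print(f"Entropy(parent)  = {entropy(parent):.3f}")  # → 1.000
print(f"Gini(left)       = {gini(left):.3f}")       # → 0.375
print(f"Entropy(left)    = {entropy(left):.3f}")    # → 0.811
print(f"Information gain = {information_gain(parent, [left, right]):.3f}")  # → 0.189
```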
Unsupervised learning is a type of machine learning used to find patterns in a given
dataset where the data is unlabeled and uncategorized.
Unsupervised Learning Algorithms:
Clustering
Anomaly Detection
Neural Networks
Latent Variable Models
Example:
In the same example, clustering the T-shirts would group them into categories such as “collar style and V-neck style”,
“crew neck style” and “sleeve types”.
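Clustering can be illustrated with k-means, one common clustering algorithm. Because T-shirt styles are categorical, the sketch below swaps in hypothetical numeric 2-D points (they could stand for any two measured features). It alternates between assigning each unlabeled point to its nearest centroid and moving each centroid to the mean of its points. The initial centroids are picked by hand for reproducibility, which is an assumption; library implementations such as `sklearn.cluster.KMeans` use smarter initialization (k-means++ by default).

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def k_means(points, initial_centroids, max_iterations=100):
    """Plain k-means: assign points to the nearest centroid, then move each
    centroid to the mean of its assigned points, until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        new_centroids = []
        for k, old in enumerate(centroids):  # update step
            members = [p for p, label in zip(points, labels) if label == k]
            # Mean of each coordinate; an empty cluster keeps its old centroid.
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members)) if members else old
            )
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical unlabeled points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)  # → [0, 0, 0, 1, 1, 1]
print(centroids)
```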
Test Dataset: The sample of data used to provide an unbiased evaluation of a final model
fit on the training dataset.