Artificial Intelligence
"It is a branch of computer science by which we can create intelligent machines which
can behave like a human, think like humans, and able to make decisions."=
Strong AI: The hypothetical concept of a machine that will be better than humans and will surpass human intelligence.
What is the Turing Test?
Ans : The Turing test is a method of testing a machine's ability to match human-level intelligence. The machine is pitted against a human interrogator, and if the interrogator cannot reliably distinguish the machine's responses from a human's, the machine is considered intelligent.
What is an expert system?
Ans : An expert system is an AI program that combines a knowledge base of domain facts with an inference engine of rules to emulate the decision-making ability of a human expert.
MACHINE LEARNING
What is Unsupervised Learning?
Ans : Unsupervised Learning - The machine is trained on a set of data that has not been labeled, classified, or categorized, and the algorithm must act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.
In unsupervised learning we don't have a predetermined result; the machine tries to find useful insights from the huge amount of data.
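As an illustration, here is a minimal unsupervised-learning sketch using k-means clustering from scikit-learn; the toy points and the choice of two clusters are assumptions for the example:

```python
# Minimal unsupervised learning sketch: k-means clustering (scikit-learn assumed).
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled points: two loose groups, no predetermined result.
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.5], [8.2, 7.9], [7.8, 8.1]])

# Ask the algorithm to restructure the input into 2 groups of similar points.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)           # cluster assignment for each point
print(model.cluster_centers_)  # discovered group centers
```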
Linear Regression : It is a statistical method used for predictive analysis. The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Since the relationship is linear, it describes how the value of the dependent variable changes with the value of the independent variable, e.g. y = b0 + b1*x for a single feature. The main aim of linear regression is to find the best-fit line, so that the actual and predicted points are close to each other.
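A minimal sketch of fitting the best-fit line y = b0 + b1*x with scikit-learn; the toy data points are assumed for illustration:

```python
# Linear regression sketch: fit the best-fit line to toy data (scikit-learn assumed).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # independent variable (x)
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])  # dependent variable (y)

model = LinearRegression().fit(X, y)  # finds the best-fit line y = b0 + b1*x
print(model.intercept_, model.coef_)  # b0 and b1
print(model.predict([[6]]))           # prediction lies on the fitted line
```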
What are the common techniques used to improve the accuracy of a linear
regression model?
Feature selection: selecting the most relevant features for the model to improve its
predictive power.
Feature scaling: scaling the features to a similar range to prevent bias towards certain
features.
Cross-validation: dividing the data into multiple partitions and using a different partition for validation in each iteration to avoid overfitting (see the sketch after this list).
Ensemble methods: combining multiple models to improve the overall accuracy and reduce variance.
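The sketch below combines two of these techniques, feature scaling and cross-validation, in a single scikit-learn pipeline; the diabetes dataset is just a convenient stand-in:

```python
# Feature scaling + cross-validation sketch (scikit-learn assumed).
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Scale features to a similar range, then fit linear regression.
pipe = make_pipeline(StandardScaler(), LinearRegression())

# 5-fold cross-validation: each fold serves once as the validation partition.
scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
print(scores.mean())
```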
Classification Algorithm : These algorithms try to classify the data into labels like Yes/No, 0/1, or Spam/Not Spam. Regression gives output as a value, but classification gives output as a category.
Confusion Matrix – It is a matrix that summarises the performance of the model in terms of true and false positives/negatives.
Log loss – It is generally used for models whose output is a probability between 0 and 1.
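Both metrics are available in scikit-learn; the labels and probabilities below are made up for illustration:

```python
# Confusion matrix and log loss sketch (scikit-learn assumed).
from sklearn.metrics import confusion_matrix, log_loss

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]              # hard labels for the confusion matrix
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7]  # probabilities in [0, 1] for log loss

# Rows are actual classes, columns predicted: [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
print(log_loss(y_true, y_prob))  # penalises confident wrong probabilities
```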
2. K-Nearest Neighbour : It works on a similarity method: based on the old data it has stored, it measures the similarity between the new data and the old data, and classifies the new data accordingly. It is also called a lazy learner because it spends little time learning from the training dataset. The decision is based on the Euclidean or Manhattan distance to the stored data points.
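A minimal KNN sketch with scikit-learn; the two toy clusters and k = 3 are assumptions for the example:

```python
# K-Nearest Neighbour sketch (scikit-learn assumed).
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1, 1], [1, 2], [2, 1],  # class 0 cluster
           [8, 8], [8, 9], [9, 8]]  # class 1 cluster
y_train = [0, 0, 0, 1, 1, 1]

# metric="euclidean" is the default distance; "manhattan" is also supported.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)  # the "lazy learner" essentially just stores the data

print(knn.predict([[2, 2], [9, 9]]))  # classified by the 3 nearest stored points
```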
Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft margin technique that tolerates a limited number of margin violations.
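In scikit-learn's SVC, the C parameter controls how soft the margin is; the toy data below, including one outlier, is assumed for illustration:

```python
# Soft margin sketch: C controls the penalty on margin violations (scikit-learn assumed).
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [0, 1],
     [3, 3], [4, 4], [3, 4], [4, 3],
     [1.5, 1.5]]  # an outlier near the boundary
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]

# Small C = softer margin: some misclassified or inside-margin points are tolerated.
soft = SVC(kernel="linear", C=0.1).fit(X, y)
# Large C = harder margin: the fit bends to accommodate outliers.
hard = SVC(kernel="linear", C=100).fit(X, y)
print(len(soft.support_), len(hard.support_))  # the softer margin keeps more support vectors
```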
Regression Algorithm: predicts a continuous output value. Classification Algorithm: predicts a discrete category or label.
Decision Tree Algorithm : It is a tree data structure where the nodes represent the features of the dataset, the edges represent the decision rules, and the leaf nodes represent the output. It starts with the whole dataset at the root node; then the best attribute is chosen using an attribute selection measure, the node is split on that attribute's possible values, and this process is applied recursively until no further splitting is possible.
Random Forest : It is a type of ensemble learning algorithm that takes the average (or majority vote) of the outputs of a number of decision trees and then predicts the output.
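A short sketch contrasting a single decision tree with a random forest, assuming scikit-learn; the iris dataset is a stand-in:

```python
# Decision tree vs random forest sketch (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One tree: recursive splits on the best attribute until no further split helps.
tree = DecisionTreeClassifier(criterion="gini", random_state=0)

# Many trees on random subsets; their combined vote reduces variance.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print(cross_val_score(tree, X, y, cv=5).mean())
print(cross_val_score(forest, X, y, cv=5).mean())
```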
VARIANCE : It shows how much the model's learned function changes when the training dataset is changed.
The main goal of every machine learning model is to generalize well. Here, generalization means the ability of an ML model to provide a suitable output for a previously unseen set of inputs.
Overfitting : When the model learns the training data so well that it also captures the noise and unwanted characteristics; when it is introduced to a new dataset it gives wrong output. It has low bias and high variance.
It can be controlled by : cross-validation, regularisation, reducing model complexity, gathering more training data, and ensemble methods such as bagging.
Underfitting : When the model fails to learn the hidden relationships among the features in the training dataset, it gives inaccurate output. It has high bias and low variance.
Feature Engineering : Feature engineering is the process of extracting and organizing the important features from raw data in such a way that they fit the purpose of the machine learning model. It includes selecting relevant features, handling missing data, and encoding the data (data transformation).
Handling of missing values : We can drop the column if it's not relevant, or impute the mean or median of the column, or set default values for the column.
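A small pandas sketch of these strategies; the column names and values are made up for illustration:

```python
# Missing-value handling sketch (pandas assumed).
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31, 40],
                   "salary": [50_000, 62_000, np.nan, 58_000],
                   "city": ["Pune", "Delhi", np.nan, "Pune"]})

df["age"] = df["age"].fillna(df["age"].mean())             # impute with the mean
df["salary"] = df["salary"].fillna(df["salary"].median())  # or the median
df["city"] = df["city"].fillna("Unknown")                  # or a default value
# Alternatively, drop a column entirely if it is not relevant:
# df = df.drop(columns=["city"])
print(df)
```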
Outliers : They are data points that deviate significantly from the rest of the data. They may be due to inconsistent data entry. They can be detected with the help of the z-score, which measures how far (in standard deviations) a data point is from the mean. To deal with an outlier, we can either remove it or impute it with the median.
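A numpy sketch of z-score detection; the data and the cutoff of 2 (a common convention, sometimes 3) are assumptions:

```python
# Z-score outlier detection sketch (numpy assumed).
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 95])  # 95 looks inconsistent

z = (data - data.mean()) / data.std()  # std devs each point lies from the mean
outliers = data[np.abs(z) > 2]         # cutoff of 2 is a convention, not a rule
print(z.round(2))
print(outliers)  # candidates to remove or impute with the median
```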
Recall : TP / (TP + FN)
DEEP LEARNING
Defn : Deep Learning is computer software that mimics the network of neurons in a brain. It is a subset of machine learning based on artificial neural networks. It is called deep learning because it makes use of deep neural networks. The basic components of a deep learning model are the input layer, one or more hidden layers, the output layer, the weights and biases connecting them, and activation functions.
Application : computer vision (e.g. facial recognition), image captioning, speech recognition, and natural language processing.
How do artificial neural networks work?
Ans : Artificial neural networks are built on the principles of the structure and operation of human neurons. The input layer, which is the first layer, receives input from external sources and passes it on to the hidden layer, the second layer. Each neuron in the hidden layer gets information from the neurons in the previous layer, computes the weighted total, and then transfers it to the neurons in the next layer.
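To make the weighted total concrete, here is a minimal numpy sketch of one forward pass; the layer sizes, weights, and tanh activation are illustrative assumptions, not values from the text:

```python
# Forward pass sketch: weighted totals flow from inputs through a hidden layer.
import numpy as np

x = np.array([0.5, 0.8])      # input layer: values from external sources

W1 = np.array([[0.2, 0.4],    # weights from 2 inputs to 3 hidden neurons
               [0.6, 0.1],
               [0.3, 0.9]])
b1 = np.array([0.1, 0.1, 0.1])

hidden = np.tanh(W1 @ x + b1)  # weighted total + activation in the hidden layer

W2 = np.array([[0.5, 0.7, 0.2]])  # weights from 3 hidden neurons to 1 output
output = W2 @ hidden
print(output)
```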
Machine Learning: can work on a smaller amount of data; takes less time to train the model.
Deep Learning: requires a larger volume of data; takes more time to train the model.
Learning Rate : The learning rate determines how quickly or slowly a neural network model adapts to a problem and learns. A higher learning rate means the model makes rapid changes and may need only a few training epochs, whereas a lower learning rate means the model may take a long time to converge, or may never converge and become stuck on a poor solution.
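A small sketch of the effect, using plain gradient descent on f(w) = (w - 3)^2; the learning-rate values are illustrative:

```python
# Learning-rate sketch: gradient descent on f(w) = (w - 3)^2, minimum at w = 3.
def gradient_descent(lr, steps=50):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)  # gradient of (w - 3)^2
        w = w - lr * grad   # update rule: w <- w - learning_rate * gradient
    return w

print(gradient_descent(lr=0.01))  # small lr: slow progress toward the minimum
print(gradient_descent(lr=0.3))   # moderate lr: converges quickly
print(gradient_descent(lr=1.1))   # too large: updates overshoot and diverge
```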
1. Feed Forward Neural Network : This is the most basic type of neural network, in which information flows in one direction, from the input layer to the output layer. These networks have only a single layer or a single hidden layer. The inputs are multiplied by weights and their weighted sum is passed forward through the network. These networks are utilised, for example, in computer vision-based facial recognition.
2. Convolutional Neural Network : A CNN is a multi-layered neural network with a unique architecture designed to extract increasingly complex features of the data at each layer to determine the output. CNNs are mostly used on unstructured data sets (e.g., images).
Eg : Suppose a model has to predict a caption for an image. The image is of a cat, but to the model it is a collection of pixels. The hidden layers try to extract relevant features such as the tail of a cat or the head of a cat; then, depending on the probability of these features, the model predicts the caption.
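A minimal Keras sketch of such a CNN; the input shape, filter counts, and two-class output are illustrative assumptions, not values from the text:

```python
# CNN sketch: each layer extracts increasingly complex features (TensorFlow/Keras assumed).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),           # e.g. 64x64 RGB images (pixels)
    layers.Conv2D(16, 3, activation="relu"),  # early layers extract simple features
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),  # deeper layers extract complex ones
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),    # e.g. probabilities for cat / not cat
])
model.summary()
```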
Naive Bayes Classifier :
The classifier is called ‘naive’ because it makes assumptions that may or may not turn out to
be correct.
The algorithm assumes that the presence of one feature of a class is not related to the
presence of any other feature (absolute independence of features), given the class variable.
For instance, a fruit may be considered to be a cherry if it is red in color and round in shape,
regardless of other features. This assumption may or may not be right (as an apple also
matches the description).
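A sketch of the fruit example with scikit-learn's GaussianNB; the "redness" and "roundness" feature values are made up for illustration:

```python
# Naive Bayes sketch: features treated as independent given the class (scikit-learn assumed).
from sklearn.naive_bayes import GaussianNB

X = [[0.9, 0.9], [0.8, 0.95], [0.85, 0.9],   # cherries: red and round
     [0.7, 0.8], [0.65, 0.85], [0.75, 0.8]]  # apples: also red-ish and round
y = ["cherry", "cherry", "cherry", "apple", "apple", "apple"]

# Each feature contributes to the class probability independently of the others.
nb = GaussianNB().fit(X, y)
print(nb.predict([[0.88, 0.92]]))       # likely "cherry"
print(nb.predict_proba([[0.88, 0.92]])) # class probabilities behind that call
```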
Correlation: Correlation tells us how strongly two random variables are related to each other. It takes values between -1 and +1.
Matplotlib : It is an open-source library that plots high-quality figures like pie charts, histograms, scatterplots, graphs, etc.
Pandas : It is an open-source machine learning library that provides flexible high-level data structures and a variety of analysis tools. It eases data analysis, data manipulation, and cleaning of data. Pandas supports operations like sorting, concatenation, conversion of data, visualizations, aggregations, etc.
Numpy : It is a popular machine learning library that supports large matrices and
multi-dimensional data. It consists of in-built mathematical functions for easy
computations.
Scipy : The name “SciPy” stands for “Scientific Python”. It is an open-source library
used for high-level scientific computations.
Scikit-learn : It is an open-source library that supports machine learning. It supports various supervised and unsupervised algorithms like linear regression, classification, clustering, etc.
Tensorflow : It is an end-to-end machine learning library that deals specifically with deep learning-related tasks. It provides various methods and sub-libraries that can be used on unstructured data.
Keras: It is a high-level API developed by Google for implementing neural network-
related tasks.
If your dataset is suffering from high variance, how would you handle it?
For datasets with high variance, we could use the bagging algorithm. Bagging splits the data into subgroups, with samples drawn with replacement from the original data. After the data is split, a model is trained on each random subset using the training algorithm, and then a polling (voting or averaging) technique is used to combine all the models' predicted outcomes.
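A sketch of bagging with scikit-learn's BaggingClassifier (the estimator parameter assumes scikit-learn >= 1.2; older versions call it base_estimator; the iris dataset is a stand-in):

```python
# Bagging sketch: bootstrap subgroups + combined vote (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # high-variance base learner
    n_estimators=50,                     # 50 bootstrap-sampled subgroups
    bootstrap=True,                      # sampling with replacement
    random_state=0,
).fit(X, y)
print(bag.score(X, y))  # predictions are combined by majority vote
```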
Random Forest: creates each tree independently of the others.
Gradient Boosting: develops one tree at a time. It yields better outcomes than random forests if parameters are carefully tuned, but it is not a good option if the data set contains a lot of outliers/anomalies/noise, as this can result in overfitting of the model.
Normalisation: adjusts the data. If your data is on very different scales (especially low to high), you would want to normalise it: alter each column to have compatible basic statistics. This can be helpful to make sure there is no loss of accuracy.
Regularisation: adjusts the prediction function. One of the goals of model training is to identify the signal and ignore the noise; if the model is given free rein to minimize error, there is a possibility of overfitting. Regularization imposes some control on this by favouring simpler fitting functions over complex ones.
Normalization and Standardization are the two very popular methods used for feature scaling.
Normalisation: rescales the values of a feature into a fixed range, typically [0, 1], using the minimum and maximum values.
Standardization: rescales the values so that they have a mean of 0 and a standard deviation of 1.
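Both methods are available in scikit-learn; the single column of values below is assumed for illustration:

```python
# Feature scaling sketch: normalisation vs standardization (scikit-learn assumed).
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[10.0], [20.0], [30.0], [40.0]])

print(MinMaxScaler().fit_transform(X).ravel())    # normalisation: values in [0, 1]
print(StandardScaler().fit_transform(X).ravel())  # standardization: mean 0, std 1
```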