
Q No. 1
1.1 Machine Learning:
 Machine learning is a concept that allows a machine to learn from examples and experience without being explicitly programmed: instead of writing code, we simply feed data to a generic algorithm.
 Machine learning is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, e.g., email spam filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
 Machine learning is an application of artificial intelligence that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that
can access data and use it to learn for themselves.

1.2 Working of Machine Learning: Machine learning uses two types of techniques: supervised learning, which trains a model on known input and output data so that it can predict future outputs, and unsupervised learning, which finds hidden patterns or intrinsic structures in input data.
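As a minimal illustration of the two techniques, here is a hedged sketch using scikit-learn; the tiny dataset and its numbers are made up for illustration, not taken from the text above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Tiny made-up dataset: rows are [height_inches, weight_kg]
X = np.array([[60, 55], [62, 58], [70, 85], [72, 90]])
y = np.array([0, 0, 1, 1])  # known labels: 0 = Female, 1 = Male

# Supervised learning: train on known inputs AND outputs, then predict new outputs
clf = LogisticRegression().fit(X, y)
print(clf.predict([[68, 80]]))  # predicted label for a new, unseen input

# Unsupervised learning: inputs only; find hidden structure (here, 2 clusters)
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)  # cluster assignments discovered from the data alone
```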
1.3 Limitations:
1. Accuracy depends on the training data, which is sometimes not available.
2. Requires accurate detail on much past progress.
3. Needs large amounts of data for learning, which may have to be gathered from various sources.
4. A machine cannot learn if no data is available.
5. A machine needs heterogeneity in the data set to learn more accurately.
1.4 Main challenges and their solutions:
1. Data: this includes the amount and quality of data. If insufficient or no data is available, we cannot train the algorithm. Sometimes the data is not accurate, or its representation is unclear.
Solution:
 Have large amounts of data
 Accurate data
 Heterogeneous data
 Exact information
2. Feature selection and extraction:
Solution: Select the most relevant and useful features that are already available in the data set, and combine the available features to create even more useful features (see the sketch after this list).

3. Overfitting and underfitting: Overfitting is the situation where the model does really well on the training set but generalizes very poorly to future predictions. Underfitting occurs when the model gives far lower accuracy than expected, even on the training set.
Solution: Overfitting is avoided by cross-validation and by training on more data, whereas underfitting is avoided by selecting more features and increasing model complexity (see the sketch after this list).
4. Resources: machine learning may need expensive data and expensive hardware to run on, some of which may be out of reach.
Solution: There is no proper solution to this challenge.
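A minimal sketch of the feature-extraction and cross-validation remedies from points 2 and 3, assuming scikit-learn and a synthetic dataset (all names and numbers below are illustrative, not from the assignment):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import PolynomialFeatures

# Synthetic two-feature classification data standing in for a real dataset
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Feature extraction: combine the available features into new ones
# (products and squares of the two original features)
X_more = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Cross-validation: estimate generalization instead of trusting training
# accuracy, which is how overfitting is detected and avoided
clf = LogisticRegression(max_iter=1000)
print("original features:  ", cross_val_score(clf, X, y, cv=5).mean())
print("engineered features:", cross_val_score(clf, X_more, y, cv=5).mean())
```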
Q No. 2
2.1 Logistic regression: Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. The nature of the target or dependent variable is dichotomous, which means there are only two possible classes. In simple words, the dependent variable is binary in nature, having data coded as either 1 or 0.

Step by step explanation: I will explain logistic regression with an example.

Suppose we want to predict gender on the basis of weight and height. A data sample of 1000 people is collected; height is in inches and weight is in pounds. The process works like this:
1. Get the dataset
2. Train the classifier
3. Make predictions
We have the hypothesis $h_\theta(x) = g(\theta^T x)$, where

$$g(z) = \frac{1}{1 + e^{-z}}$$

is called the sigmoid (logistic) function; it maps any real number into the interval (0, 1).
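A quick numerical sketch of the sigmoid in Python (illustrative only):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# g(0) = 0.5 is the decision threshold; large positive z -> ~1, negative -> ~0
print(sigmoid(0.0), sigmoid(5.0), sigmoid(-5.0))  # 0.5  ~0.993  ~0.007
```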
Logistic regression decision boundary

Since our data set has two features, height and weight, the logistic regression hypothesis is the following:

$$h_\theta(x) = g(\theta_0 + \theta_1 \cdot \text{height} + \theta_2 \cdot \text{weight})$$

The logistic regression classifier will predict “Male” if:

$$h_\theta(x) \geq 0.5, \quad \text{equivalently} \quad \theta^T x \geq 0$$

This is because the logistic regression “threshold” is set at $g(z) = 0.5$; see the plot of the logistic regression function above for verification.

For our data set the values of θ are:

With the coefficients at hand, a manual prediction (that is, without using the function clf.predict()) simply requires computing the vector product

$$\theta^T x = \theta_0 + \theta_1 \cdot \text{height} + \theta_2 \cdot \text{weight}$$

and checking whether the resulting scalar is bigger than or equal to zero (to predict Male), or otherwise (to predict Female).

As an example, say we want to predict the gender of someone with Height = 70 inches and Weight = 180 pounds, as at line 14 of the script LogisticRegression.py above: one can simply compute the product $\theta^T x$, making a prediction using the logistic regression parameter θ.

Since the result of the product is bigger than zero, the classifier will predict Male.
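A sketch of this manual prediction in NumPy; the θ values below are placeholders, since the fitted coefficients shown in the original figure are not reproduced here (in practice they come from clf.intercept_ and clf.coef_):

```python
import numpy as np

# Hypothetical parameters [theta_0, theta_1, theta_2] -- NOT the fitted values
# from the text; substitute clf.intercept_ and clf.coef_ from your own model.
theta = np.array([-0.5, -0.15, 0.08])

x = np.array([1.0, 70.0, 180.0])  # [bias term, height in inches, weight in pounds]

score = theta @ x                  # the vector product theta^T x
print("Male" if score >= 0 else "Female")
```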

A visualization of the decision boundary and the complete data set can be seen here:

As you can see, above the decision boundary lie most of the blue points, which correspond to the Male class, and below it all the pink points, which correspond to the Female class.

Also, from just looking at the data you can tell that the predictions won’t be perfect. This can be improved by including more features (beyond weight and height), and by potentially using a different decision boundary. Logistic regression decision boundaries can also be non-linear functions, such as higher-degree polynomials.


Computing the logistic regression parameter

The scikit-learn library does a great job of abstracting the computation of the logistic regression parameter θ, and the way it is done is by solving an optimization problem.

Let’s start by defining the logistic regression cost for the two cases of interest, y = 1 and y = 0, that is, when the hypothesis function predicts Male or Female:

$$\text{cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

Then, we take a convex combination in y of these two terms to come up with the logistic regression cost function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \right]$$

The logistic regression cost function is convex. Thus, in order to compute θ, one needs to solve the following (unconstrained) optimization problem:

$$\min_{\theta} \; J(\theta)$$

There is a variety of methods that can be used to solve this unconstrained optimization problem, such as the first-order method gradient descent, which requires the gradient of the logistic regression cost function, or a second-order method such as Newton’s method, which requires both the gradient and the Hessian of the logistic regression cost.

For the case of gradient descent, the search direction is the negative partial derivative of the logistic regression cost function with respect to the parameter θ:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

In its most basic form, gradient descent iterates along the negative gradient direction of θ (producing a minimizing sequence) until reaching convergence.
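A compact sketch of this gradient descent loop in Python/NumPy; the learning rate, iteration count, and toy data are illustrative assumptions, not values from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, iters=10000):
    """Gradient descent on the logistic regression cost J(theta)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m  # the partial derivative above
        theta -= lr * grad                         # step along the negative gradient
    return theta

# Toy data: bias column plus standardized [height, weight] features;
# labels 1 = Male, 0 = Female (made-up numbers, for illustration only)
raw = np.array([[60, 120], [62, 130], [70, 180], [72, 190]], float)
X = np.hstack([np.ones((4, 1)), (raw - raw.mean(0)) / raw.std(0)])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = fit_logistic(X, y)
print(theta, sigmoid(X @ theta).round(3))  # parameters and fitted probabilities
```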

2.2 Neural Networks: Neural networks are a series of algorithms that mimic the operations of a human brain to recognize relationships between vast amounts of data. They are used in a variety of applications in financial services, from forecasting and marketing research to fraud detection and risk assessment.
Step by step explanation:
A neural network has many layers. Each layer performs a specific function, and the more complex the network is, the more layers it has. That is why a neural network is also called a multi-layer perceptron.
The purest form of a neural network has three layers:
1. Input layer
2. Hidden layer
3. Output layer

Working:
1. Information is fed into the input layer, which transfers it to the hidden layer
2. The interconnections between the two layers assign weights to each input randomly
3. A bias is added to every input after the weights are multiplied with them individually
4. The weighted sum is transferred to the activation function
5. The activation function determines which nodes to fire for feature extraction
6. The model applies an activation function to the output layer to deliver the output
7. Weights are adjusted, and the error is back-propagated to minimize it
The model uses a cost function to reduce the error rate; the weights are adjusted across different training runs:
1. The model compares the output with the original result
2. It repeats the process to improve accuracy
The model adjusts the weights in every iteration to enhance the accuracy of the output (a minimal sketch of these steps follows below).
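A minimal sketch of these steps for a tiny one-hidden-layer network in NumPy; the architecture, XOR-style data, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2 inputs -> 1 output (XOR pattern)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)

# Steps 1-3: random weights and biases between the layers
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output

sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10000):
    # Steps 4-6: weighted sums passed through the activation function
    h = sigmoid(X @ W1 + b1)        # hidden layer activations
    out = sigmoid(h @ W2 + b2)      # output layer

    # Step 7: back-propagate the error and adjust the weights
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round(2))  # should approach [0, 1, 1, 0] if training succeeds
```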

2.3 SVM: A support vector machine is a computer algorithm that learns by example to assign labels to objects. For instance, an SVM can learn to recognize fraudulent credit card activity by examining hundreds or thousands of fraudulent and non-fraudulent credit card activity reports. Alternatively, an SVM can learn to recognize handwritten digits by examining a large collection of scanned images of handwritten zeroes, ones, and so forth. SVMs have also been successfully applied to an increasingly wide variety of biological applications.
Step by step explanation:
 Import the dataset.
 Explore the data to figure out what they look like.
 Pre-process the data.
 Split the data into attributes and labels.
 Divide the data into training and testing sets.
 Train the SVM algorithm.
 Make some predictions.
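These steps might look as follows with scikit-learn; this is a hedged sketch using its built-in iris data, since the assignment’s own dataset is not specified:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Import the dataset: attributes X and labels y
X, y = load_iris(return_X_y=True)

# Pre-process, then divide into training and testing sets
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the SVM algorithm and make some predictions
clf = SVC(kernel="linear").fit(X_tr, y_tr)
print(clf.predict(X_te[:5]))
print("test accuracy:", clf.score(X_te, y_te))
```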
Example
We can see that the decision boundary line is the function $x_2 = x_1 - 3$. Using the formula $w^T x + b = 0$ we can obtain a first guess of the parameters as

$$w = [1, -1], \quad b = -3$$

Using these values, we would obtain the following width between the support vectors: $\frac{2}{\|w\|} = \frac{2}{\sqrt{2}} = \sqrt{2}$. Again, by inspection we see that the width between the support vectors is in fact $4\sqrt{2}$, meaning that these values are incorrect.

Recall that scaling the boundary by a factor of $c$ does not change the boundary line, hence we can generalize the equation as

$$c x_1 - c x_2 - 3c = 0, \quad\text{i.e.,}\quad w = [c, -c], \quad b = -3c$$

Plugging back into the equation for the width, we get

$$\frac{2}{\|w\|} = \frac{2}{\sqrt{2}\,c} = 4\sqrt{2} \quad\Rightarrow\quad c = \frac{1}{4}$$

Hence the parameters are in fact

$$w = \left[\frac{1}{4}, -\frac{1}{4}\right], \quad b = -\frac{3}{4}$$

To find the values of $\alpha_i$ we can use the following two constraints, which come from the dual problem:

$$w = \sum_{i}^{m} \alpha_i y^{(i)} x^{(i)}, \qquad \sum_{i}^{m} \alpha_i y^{(i)} = 0$$

Using the fact that $\alpha_i \geq 0$ for support vectors only (i.e., 3 vectors in this case), we obtain the following system of simultaneous linear equations:

$$\begin{bmatrix} 6\alpha_1 - 2\alpha_2 - 3\alpha_3 \\ -\alpha_1 - 3\alpha_2 - 4\alpha_3 \\ \alpha_1 - 2\alpha_2 - \alpha_3 \end{bmatrix} = \begin{bmatrix} 1/4 \\ -1/4 \\ 0 \end{bmatrix} \quad\Rightarrow\quad \alpha = \begin{bmatrix} 1/16 \\ 1/16 \\ 0 \end{bmatrix}$$
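A quick NumPy check of the recovered parameters; the support vectors themselves are not given explicitly in the text, so only the margin width and the boundary equation are verified, and the test point on the line is a hypothetical choice:

```python
import numpy as np

w = np.array([1/4, -1/4])
b = -3/4

# The margin width 2/||w|| should equal 4*sqrt(2)
print(2 / np.linalg.norm(w), 4 * np.sqrt(2))  # both ~5.657

# Any point on the line x2 = x1 - 3 should satisfy w.x + b = 0
x = np.array([5.0, 2.0])  # hypothetical point on the boundary
print(w @ x + b)          # 0.0
```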

3. Comparison of performance
3.1 Different metrics are used to evaluate the performance of these algorithms. We used the confusion matrix in class.
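A small sketch of computing a confusion matrix with scikit-learn; the labels below are made up for illustration:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # classifier output

# Rows = actual class, columns = predicted class
print(confusion_matrix(y_true, y_pred))
print("accuracy:", accuracy_score(y_true, y_pred))
```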
3.2 Conditions in which one algorithm is preferred over another
In fact, no single classifier is always the best; which one is suitable depends on the problem.

SVM is best:
1) When the number of features (variables) and the number of training instances are very large (say, millions of features and millions of instances).
2) When sparsity in the problem is very high, i.e., most of the features have zero values.
3) For document classification problems, where sparsity is high and the numbers of features and instances are also very high.
4) For problems like image classification, gene classification, drug disambiguation, etc., where the number of features is high.
Logistic regression: preferred when the classes are (roughly) linearly separable, the number of features is moderate, and a probabilistic interpretation of the output is needed.
Neural networks: preferred when a very large amount of training data is available and the relationships in the data are complex and non-linear, as in image or speech recognition.
