Unit 1
“A computer program is said to learn from experience E with respect to some class of tasks
T and performance measure P if its performance at tasks in T, as measured by P, improves
with experience E.”
Any problem can be classified as a well-posed learning problem if it has three traits:
Task
Performance Measure
Experience
Some examples of well-posed learning problems are:
1. To better filter emails as spam or not
Task – Classifying emails as spam or not
Performance Measure – The fraction of emails accurately classified as spam or not spam
Experience – Observing you label emails as spam or not spam
2. A checkers learning problem
Task – Playing checkers game
Performance Measure – percent of games won against opponents
Experience – playing practice games against itself
3. Handwriting Recognition Problem
Task – Recognizing and classifying handwritten words within images
Performance Measure – percent of words accurately classified
Experience – a database of handwritten words with given classifications
4. A Robot Driving Problem
Task – driving on public four-lane highways using vision sensors
Performance Measure – average distance traveled before an error
Experience – a sequence of images and steering commands recorded while
observing a human driver
5. Fruit Prediction Problem
Task – recognizing different kinds of fruit
Performance Measure – the variety of fruits the system predicts correctly
Experience – training the machine on a large dataset of fruit images
6. Face Recognition Problem
Task – recognizing different types of faces
Performance Measure – the variety of faces the system predicts correctly
Experience – training the machine on a large dataset of different face images
7. Automatic Translation of documents
Task – translating a document from one language into another
Performance Measure – how accurately and efficiently text is converted from one language to the other
Experience – training the machine on a large dataset of text in different languages
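As a rough illustration of how T, P, and E fit together, the spam example can be sketched in Python. The labels below are made-up sample data and the function name `accuracy` is an assumption chosen for this sketch; it only shows how the performance measure P could be computed.

```python
def accuracy(predicted, actual):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical data: labels the user assigned (experience E)
# and labels the filter produced while doing the task T.
actual = ["spam", "not spam", "spam", "not spam", "not spam"]
predicted = ["spam", "not spam", "not spam", "not spam", "not spam"]

score = accuracy(predicted, actual)
print(score)  # 4 of 5 correct -> 0.8
```

A program "learns" in the sense of the definition if this score tends to rise as it observes more labeled emails.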
Machine learning is one of the most exciting technologies. It is being used today, perhaps in
many more places than one would expect.
Today, almost all companies use Machine Learning to improve business decisions,
increase productivity, detect disease, forecast weather, and much more. With the
exponential growth of technology, we not only need better tools to understand the data we
currently have, but we also need to prepare ourselves for the data we will have.
To achieve this goal we need to build intelligent machines. We can write a program to do
simple things, but most of the time hardwiring intelligence into it is difficult. The best
approach is to give machines a way to learn things themselves.
A mechanism for learning – if a machine can learn from input, then it does the hard
work for us. This is where Machine Learning comes into action. Some of the most
common examples are:
Image Recognition
Image Recognition is one of the reasons behind the boom in the field of Deep Learning.
The task, which started with classifying images of cats and dogs, has now evolved to the
level of Face Recognition and real-world use cases built on it, such as employee
attendance tracking. Image recognition has also helped revolutionize the healthcare
industry through smart systems for disease recognition and diagnosis.
Speech Recognition
Most of us have come across smart systems like Alexa and Siri and have used speech to
communicate with them. In the backend, these systems are built on Speech Recognition,
which converts voice instructions into text.
Another application of speech recognition that we encounter in day-to-day life is
performing Google searches just by speaking.
Recommender Systems
As our world becomes more and more digital, nearly every tech giant tries to provide
customized services to its users. This is possible because of recommender systems, which
analyze a user’s preferences and search history and recommend content or services
based on them.
YouTube is a common example: it recommends new videos and content based on the user’s
past search patterns. Netflix recommends movies and series based on the interests users
provide when they first create an account.
Fraud Detection
In today’s world, most things have been digitalized: from buying a toothbrush to making
transactions worth millions of dollars, everything is accessible and easy to do. But with
this digitization, cases of fraudulent transactions and fraudulent activities have
increased. Identifying them is not easy, but machine learning systems are very efficient
at this task.
In these applications, whenever the system detects red flags in a user’s activity, a
suitable notification is sent to the administrator so that the case can be monitored
properly for spam or fraud.
Self Driving Cars
Not long ago, seeing a car being driven without a driver would have made us assume that
a ghost was driving it. Thanks to machine learning and deep learning, this is now possible
and no longer a story from a fictional book. Although the algorithms and tech stack behind
these technologies are highly advanced, at the core it is machine learning that has made
these applications possible.
The best-known example of this use case is Tesla cars, which offer driver-assistance
features built on these techniques.
Medical Diagnosis
If you are a machine learning practitioner or a student, you have probably heard about
projects like Breast Cancer Classification, Parkinson’s Disease Classification, Pneumonia
Detection, and many other health-related tasks that machine learning models are reported
to perform with more than 90% accuracy.
These models are not limited to disease diagnosis in human beings; they also work well for
plant disease-related tasks, whether predicting the type of disease or detecting whether a
disease is likely to occur in the future.
Stock Market Trading
The stock market has remained a hot topic among working professionals and even students,
because with sufficient knowledge of the markets and the forces that drive them you can
make a fortune in this domain. Attempts have been made to create intelligent systems that
can predict future price trends and market value.
This can be considered an application of time series forecasting, because stock price data
is sequential data in which the time at which each data point was recorded is of utmost
importance.
Virtual Try On
Have you ever purchased spectacles or lenses from Lenskart? If so, you have probably come
across its feature that lets you try different frames virtually without purchasing them or
visiting an outlet. This is possible because machine learning systems identify certain
landmarks on a person’s face and then place the frames virtually on the face using those
landmarks.
Data Representation
The process of collecting data and analyzing it in large quantities is known as
statistics. It is a branch of mathematics dealing with the collection, analysis,
interpretation, and presentation of numerical facts and figures.
Statistics is based on two concepts:
Statistical Data
Statistical Science
Collected data is arranged in tabular form to study its salient features. Such an
arrangement is known as the presentation of data.
Data Representation refers to the process of condensing the collected data into a tabular
or graphical form.
The rows can be placed in different orders: ascending order, descending order, or
alphabetical order.
EX:- Let the marks obtained by 10 students of class V in a class test, out of 50 according to
their roll numbers, be: 39, 44, 49, 40, 22, 10, 45, 38, 15, 50
Ans)
Roll No.   Marks
1          39
2          44
3          49
4          40
5          22
6          10
7          45
8          38
9          15
10         50
Ascending order:
10, 15, 22, 38, 39, 40, 44, 45, 49, 50
Descending order:
50, 49, 45, 44, 40, 39, 38, 22, 15, 10
When the data is placed in ascending or descending order, it is known as arrayed data.
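The arrangement of raw data into arrayed data can be reproduced with a short Python sketch. It uses the ten marks given in the example; the variable names are illustrative only.

```python
# Raw data: marks listed by roll number (1 to 10), as given in the example.
marks = [39, 44, 49, 40, 22, 10, 45, 38, 15, 50]

ascending = sorted(marks)                  # smallest to largest
descending = sorted(marks, reverse=True)   # largest to smallest

print(ascending)   # [10, 15, 22, 38, 39, 40, 44, 45, 49, 50]
print(descending)  # [50, 49, 45, 44, 40, 39, 38, 22, 15, 10]
```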
Types of Graphical Data Representation
Bar Chart
A bar chart helps us represent the collected data visually. The collected data, such as
amounts and frequencies, can be visualized with horizontal or vertical bars.
EX:- Let the marks obtained by 5 students of class V in a class test, out of 10 according to
their names, be:
7, 8, 4, 9, 6
The data in the given form is known as raw data. The above data can be placed in a bar
chart as shown below:
Name     Marks
Akshay   7
Maya     8
Dhanvi   4
Jaslen   9
Muskan   6
[Figure: bar chart of Marks (vertical axis, 0 to 10) for Akshay, Maya, Dhanvi, Jaslen, and Muskan (horizontal axis)]
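For readers who want to reproduce the chart, here is a minimal text-only sketch in Python using the five marks above. It draws horizontal bars with `#` characters rather than using a plotting library; the function name `text_bar_chart` is an assumption made for this illustration.

```python
marks = {"Akshay": 7, "Maya": 8, "Dhanvi": 4, "Jaslen": 9, "Muskan": 6}

def text_bar_chart(data):
    """Return one line per category; bar length equals the value."""
    width = max(len(name) for name in data)  # pad names so bars line up
    return [f"{name.ljust(width)} | {'#' * value} {value}"
            for name, value in data.items()]

chart = text_bar_chart(marks)
for line in chart:
    print(line)
```

Each bar's length is proportional to the student's marks, which is the same idea the vertical bar chart conveys.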
Histogram
A histogram is a graphical representation of the distribution of data. It looks similar to
a bar graph, but the two differ: a bar graph shows the frequency of categorical data, that
is, data grouped into two or more categories such as gender or months, whereas a histogram
is used for quantitative data grouped into numeric ranges.
For example:
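The grouping of quantitative data into ranges can be sketched in Python, reusing the ten test marks (out of 50) from the Data Representation example. The bin width of 10 and the function name `histogram` are assumptions chosen for illustration.

```python
marks = [39, 44, 49, 40, 22, 10, 45, 38, 15, 50]

def histogram(values, bin_width, low, high):
    """Count how many values fall into each equal-width bin."""
    n_bins = (high - low) // bin_width
    counts = [0] * n_bins
    for v in values:
        # A value equal to the upper limit is placed in the last bin.
        index = min((v - low) // bin_width, n_bins - 1)
        counts[index] += 1
    return counts

# Bins: 0-9, 10-19, 20-29, 30-39, 40-50
counts = histogram(marks, bin_width=10, low=0, high=50)
print(counts)  # [0, 2, 1, 2, 5]
```

Unlike the bar chart, the horizontal axis here is a numeric scale split into ranges, not a set of named categories.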
Line Graph
A graph that uses lines and points to present change over time is known as a line graph.
Line graphs can show, for example, the number of animals left on earth, the day-by-day
growth of the world population, or the day-by-day increase or decrease in the number of
bitcoins.
For Example:
Pie Chart
A pie chart is a circular graphic divided into slices that represent numerical
proportions. In most cases it can be replaced by other plots such as a bar chart, box
plot, or dot plot. Research has shown that it is difficult to compare the different
sections of a given pie chart, or to compare data across different pie charts.
For example:
Big Data is characterized by huge volume, high velocity, and a wide variety of data. There
are 3 types of data: structured data, semi-structured data, and unstructured data.
1. Structured data
Structured data is data whose elements are addressable for effective analysis. It has
been organized into a formatted repository, typically a database. It covers all data that
can be stored in an SQL database in tables with rows and columns. Such data has relational
keys and can easily be mapped into pre-designed fields.
Example: Relational data.
2. Semi-Structured data
Semi-structured data is information that does not reside in a relational database but
has some organizational properties that make it easier to analyze. With some
processing it can be stored in a relational database (though this can be very hard for
some kinds of semi-structured data), but the semi-structured form exists to save space.
Example: XML data.
3. Unstructured data
Unstructured data is data that is not organized in a predefined manner or does not
have a predefined data model. It has more alternative platforms for storing and
managing, is increasingly prevalent in IT systems, and is used by organizations in a
variety of business intelligence and analytics applications.
Example: Word, PDF, Text, Media logs.
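The difference between the three types can be sketched in Python by holding the same piece of information in each form. The table name `students`, the XML layout, and the sample sentence are all hypothetical, chosen only for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: a row in a relational table with fixed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.execute("INSERT INTO students VALUES (1, 'Akshay', 7)")
structured_row = conn.execute(
    "SELECT name, marks FROM students WHERE roll_no = 1"
).fetchone()
conn.close()

# Semi-structured: XML carries its own tags but no fixed table schema.
xml_text = "<student roll_no='1'><name>Akshay</name><marks>7</marks></student>"
root = ET.fromstring(xml_text)
semi_structured = {"name": root.findtext("name"), "marks": int(root.findtext("marks"))}

# Unstructured: free text with no predefined data model.
unstructured = "Akshay scored seven marks in the class test."

print(structured_row)   # ('Akshay', 7)
print(semi_structured)  # {'name': 'Akshay', 'marks': 7}
```

The structured and semi-structured forms can be queried by field name, while the unstructured sentence would need text processing before the marks could be extracted.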
Properties Structured data Semi-structured data Unstructured data
Classification
Regression
Classification
Classification deals with predicting categorical target variables, which represent
discrete classes or labels. For instance, classifying emails as spam or not spam, or predicting
whether a patient has a high risk of heart disease. Classification algorithms learn to map the
input features to one of the predefined classes.
Here are some classification algorithms:
Logistic Regression
Support Vector Machine
Random Forest
Decision Tree
K-Nearest Neighbors (KNN)
Naive Bayes
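To make the idea of mapping input features to a predefined class concrete, here is a minimal sketch of a nearest-neighbor classifier (K-Nearest Neighbors with K = 1) in plain Python. The two features (number of links and number of exclamation marks in an email) and the training points are invented for illustration.

```python
import math

def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to the query."""
    distances = [math.dist(point, query) for point in train_points]
    best = distances.index(min(distances))
    return train_labels[best]

# Hypothetical features: (number of links, number of exclamation marks)
train_points = [(0, 0), (1, 0), (7, 5), (9, 6)]
train_labels = ["not spam", "not spam", "spam", "spam"]

print(nearest_neighbor_predict(train_points, train_labels, (8, 4)))  # spam
print(nearest_neighbor_predict(train_points, train_labels, (1, 1)))  # not spam
```

The output is always one of the discrete labels seen during training, which is what distinguishes classification from regression.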
Regression
Regression, on the other hand, deals with predicting continuous target variables,
which represent numerical values. For example, predicting the price of a house based on its
size, location, and amenities, or forecasting the sales of a product. Regression algorithms
learn to map the input features to a continuous numerical value.
Here are some regression algorithms:
Linear Regression
Polynomial Regression
Ridge Regression
Lasso Regression
Decision Tree
Random Forest
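As a small illustration of predicting a continuous value, the sketch below fits a straight line by ordinary least squares in plain Python. The house sizes and prices are made-up numbers, and the function name `fit_line` is an assumption for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size (square metres) and price (thousands)
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept  # predicted price for a 90 square metre house
print(slope, intercept, predicted)
```

Because the sample prices were chosen to be exactly three times the size, the fitted line has a slope of about 3 and an intercept of about 0, and the prediction for 90 is about 270.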
Example: Consider a dataset that contains information about the purchases customers made
from a shop. Through clustering, the algorithm can group customers with similar purchasing
behavior, which reveals potential customer segments without predefined labels. This type
of information can help businesses target customers as well as identify outliers.
There are two main categories of unsupervised learning:
Clustering
Association
Clustering
Clustering is the process of grouping data points into clusters based on their similarity. This
technique is useful for identifying patterns and relationships in data without the need for
labeled examples.
Here are some clustering algorithms:
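K-Means Clustering algorithm
Mean-shift algorithm
DBSCAN Algorithm
Principal Component Analysis and Independent Component Analysis, often listed alongside
these, are dimensionality-reduction techniques rather than clustering algorithms.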
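The grouping of customers by purchasing behavior can be sketched with a one-dimensional K-Means in plain Python. The monthly spending figures, the starting centroids, and the function name `k_means_1d` are assumptions made for this illustration.

```python
def k_means_1d(values, centroids, iterations=10):
    """Basic K-Means on 1-D data: assign to nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spend per customer; no labels are provided.
spend = [10, 12, 14, 80, 85, 90]
centroids, clusters = k_means_1d(spend, centroids=[10, 90])

print(centroids)  # [12.0, 85.0]
print(clusters)   # [[10, 12, 14], [80, 85, 90]]
```

The algorithm separates low spenders from high spenders without ever being told which group a customer belongs to.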
Association
Association rule learning is a technique for discovering relationships between items in a
dataset. It identifies rules indicating that the presence of one item implies the presence
of another item with a specific probability.
Here are some association rule learning algorithms:
Apriori Algorithm
Eclat Algorithm
FP-growth
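The "specific probability" attached to a rule is usually expressed through support and confidence. The sketch below computes both for the rule bread -> butter on a made-up set of shopping baskets; it is not an implementation of Apriori, Eclat, or FP-growth, only the measures those algorithms rely on.

```python
# Hypothetical shopping baskets
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of all transactions that contain itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions with the antecedent, the fraction that also have the consequent."""
    return count(antecedent | consequent) / count(antecedent)

rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
print(rule_support)     # 3 of 5 baskets -> 0.6
print(rule_confidence)  # 3 of the 4 bread baskets -> 0.75
```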
Advantages of Unsupervised Machine Learning
It helps to discover hidden patterns and various relationships between the data.
Used for tasks such as customer segmentation, anomaly detection, and data
exploration.
It does not require labeled data and reduces the effort of data labeling.
3. Semi-Supervised Learning
Semi-supervised learning is a machine learning approach that sits between supervised and
unsupervised learning, so it uses both labeled and unlabeled data. It is particularly
useful when obtaining labeled data is costly, time-consuming, or resource-intensive, for
example when labeling requires skilled people and relevant resources.
Example:
Consider building a language translation model: having labeled translations for every
sentence pair can be resource-intensive. Semi-supervised learning allows the model to
learn from both labeled and unlabeled sentence pairs, making it more accurate. This
technique has led to significant improvements in the quality of machine translation
services.
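One common way to combine labeled and unlabeled data is pseudo-labeling, where a model trained on the labeled examples assigns labels to the unlabeled ones and then learns from them too. The toy sketch below is far simpler than a translation model: it uses a nearest-centroid rule on invented one-dimensional values, and the names `self_train`, "short", and "long" are assumptions for illustration only.

```python
def centroid(values):
    return sum(values) / len(values)

def self_train(labeled, unlabeled):
    """Pseudo-label each unlabeled value with the class of the nearest centroid.

    labeled: dict mapping class name -> list of 1-D feature values
    unlabeled: list of 1-D feature values without a class
    """
    labeled = {label: list(values) for label, values in labeled.items()}
    for value in unlabeled:
        # Recompute the centroids so earlier pseudo-labels influence later ones.
        centers = {label: centroid(values) for label, values in labeled.items()}
        nearest = min(centers, key=lambda label: abs(value - centers[label]))
        labeled[nearest].append(value)
    return labeled

# Hypothetical data: a few labeled values and some unlabeled ones
labeled = {"short": [1.0, 2.0], "long": [9.0, 10.0]}
unlabeled = [3.0, 8.0, 4.0]

result = self_train(labeled, unlabeled)
print(result)  # {'short': [1.0, 2.0, 3.0, 4.0], 'long': [9.0, 10.0, 8.0]}
```

Practical systems usually keep only the pseudo-labels the model is confident about; this sketch accepts every one to stay short.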
Basic Linear Algebra in Machine Learning Techniques
Machine learning has a strong connection with mathematics. Each machine learning
algorithm is based on mathematical concepts, and with the help of mathematics one can
choose the correct algorithm by considering training time, complexity, number of features,
etc. Linear Algebra is an essential field of mathematics that covers the study of vectors,
matrices, planes, mappings, and lines required for linear transformations.
The term Linear Algebra was initially introduced in the early 18th century to find the
unknowns in linear equations and solve them easily; hence it is an important branch of
mathematics that helps in studying data. Linear Algebra is fundamental to the applications
of Machine Learning and is a prerequisite for learning Machine Learning and data science.
V = [e1, e2, e3, e4]
Here V is a vector whose elements are e1, e2, e3 and e4. With zero-based indexing, V[2] is e3.
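The indexing convention can be checked with a Python list, which is also zero-based. The strings below simply stand in for the elements e1 to e4.

```python
# Placeholder strings stand in for the vector elements.
V = ["e1", "e2", "e3", "e4"]

first = V[0]  # index 0 is the first element
third = V[2]  # index 2 is the third element

print(first, third, len(V))  # e1 e3 4
```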
Vector Operations
1. Scalar-Vector Multiplication
p = [e1, e2, e3]
Multiplying a vector by a scalar multiplies every element of the vector by that scalar.
For example, when the vector p is multiplied by the scalar 2, all the elements of p are
multiplied by 2. This operation is commutative: p * 2 = 2 * p.
p * 2 = [2 * e1, 2 * e2, 2 * e3]
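Scalar-vector multiplication can be sketched in plain Python as follows. The function name `scale` and the sample values are assumptions for illustration. Note that writing `p * 2` on a plain Python list repeats the list instead of scaling it, so an explicit element-wise operation is needed.

```python
def scale(scalar, vector):
    """Multiply every element of the vector by the scalar."""
    return [scalar * element for element in vector]

p = [1, 2, 3]  # sample values standing in for e1, e2, e3

scaled = scale(2, p)
print(scaled)  # [2, 4, 6]

# The order of the factors does not matter for each element.
same = [element * 2 for element in p]
print(scaled == same)  # True
```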