Unit 1

1) Well Posed Learning Problem

“A computer program is said to learn from experience E with respect to some class of tasks
T and performance measure P if its performance at tasks in T, as measured by P, improves
with experience E.”

Any problem can be framed as a well-posed learning problem if it has three traits:
• Task
• Performance Measure
• Experience
Some examples that illustrate well-posed learning problems are:
1. Spam Filtering Problem
• Task – classifying emails as spam or not spam
• Performance Measure – the fraction of emails correctly classified as spam or not spam
• Experience – observing you label emails as spam or not spam
2. A Checkers Learning Problem
• Task – playing checkers
• Performance Measure – percentage of games won against opponents
• Experience – playing practice games against itself
3. Handwriting Recognition Problem
• Task – recognizing and classifying handwritten words within images
• Performance Measure – percentage of words correctly classified
• Experience – a database of handwritten words with given classifications
4. A Robot Driving Problem
• Task – driving on public four-lane highways using vision sensors
• Performance Measure – average distance traveled before an error
• Experience – a sequence of images and steering commands recorded while
observing a human driver
5. Fruit Prediction Problem
• Task – recognizing different kinds of fruit
• Performance Measure – the number of fruit varieties predicted correctly
• Experience – training the machine on a large dataset of fruit images
6. Face Recognition Problem
• Task – recognizing different faces
• Performance Measure – the number of different faces recognized correctly
• Experience – training the machine on a large dataset of different face images
7. Automatic Translation of Documents
• Task – translating a document from one language into another
• Performance Measure – how accurately and efficiently one language is converted to the other
• Experience – training the machine on a large dataset of text in different languages
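As a rough illustration of a performance measure P, the sketch below computes the fraction
of emails classified correctly, as in the spam example. The function name `accuracy` and the
two label lists are hypothetical and are not part of the original notes.

```python
def accuracy(predicted, actual):
    """Performance measure P: fraction of emails classified correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labelled email")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels: True means "spam", False means "not spam".
actual_labels = [True, False, True, False, False]
predicted_labels = [True, False, False, False, True]

score = accuracy(predicted_labels, actual_labels)
print(score)
```

A program "learns" in the sense of the definition above if this score tends to rise as it
observes more labelled emails (experience E).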

Applications of Machine Learning

Machine learning is one of the most exciting technologies. It is being used today, perhaps in
many more places than one would expect.
Today, almost all companies are using machine learning to improve business decisions,
increase productivity, detect disease, forecast weather, and much more. With the
exponential growth of technology, we not only need better tools to understand the data we
currently have, but we also need to prepare ourselves for the data we will have.
To achieve this goal we need to build intelligent machines. We can write a program to do
simple things, but most of the time hard-wiring intelligence into it is difficult. The best
approach is to give machines a way to learn things themselves: a mechanism for learning.
If a machine can learn from input, then it does the hard work for us. This is where
machine learning comes into action. Some of the most common examples are:

• Image Recognition
• Speech Recognition
• Recommender Systems
• Fraud Detection
• Self Driving Cars
• Medical Diagnosis
• Stock Market Trading
• Virtual Try On

Image Recognition
Image recognition is one of the reasons behind the boom in the field of deep learning.
The task, which started with classifying images of cats and dogs, has now evolved to
face recognition and real-world use cases built on it, such as employee attendance
tracking. Image recognition has also helped revolutionize the healthcare industry through
smart systems for disease recognition and diagnosis.

Speech Recognition
You have almost certainly come across speech-recognition-based smart assistants such
as Alexa and Siri and used your voice to communicate with them. In the backend, these
assistants rely on speech recognition systems, which are designed to convert voice
instructions into text.
Another application of speech recognition that we encounter in day-to-day life is
performing Google searches just by speaking.
Recommender Systems
As our world has become more and more digital, nearly every tech giant tries to
provide customized services to its users. This is possible because of recommender
systems, which analyze a user’s preferences and search history and, based on that,
recommend content or services.
YouTube is a common example: it recommends new videos and content based on the user’s
past search patterns. Netflix recommends movies and series based on the interests users
provide when they create an account for the very first time.
Fraud Detection
In today’s world most things have been digitalized: from buying a toothbrush to
making transactions worth millions of dollars, everything is accessible and easy to use.
But with this digitization, cases of fraudulent transactions and fraudulent activities
have increased. Identifying them is not easy, but machine learning systems are very
effective at these tasks.
Whenever such a system detects red flags in a user’s activity, a suitable notification is
sent to the administrator so that these cases can be monitored properly for spam or
fraud.
Self Driving Cars
If we had ever seen a car being driven without a driver, we would have assumed that
some ghost was driving it. Thanks to machine learning and deep learning, in today’s world
this is possible and not a story from some fictional book. Even though the algorithms and
tech stack behind these technologies are highly advanced, at the core it is machine
learning that has made these applications possible.
The most commonly cited example of this use case is Tesla, whose cars use machine
learning for their self-driving features.
Medical Diagnosis
If you are a machine learning practitioner or a student, you have probably heard of
projects such as breast cancer classification, Parkinson’s disease classification,
pneumonia detection, and many other health-related tasks on which machine learning
models are reported to achieve more than 90% accuracy.
These models are used not only for diagnosing disease in human beings but also for
plant-disease tasks, whether to predict the type of disease or to detect whether some
disease is likely to occur in the future.
Stock Market Trading
The stock market has remained a hot topic among working professionals and even
students, because if you have sufficient knowledge of the markets and the forces that
drive them, you can make a fortune in this domain. Attempts have been made to create
intelligent systems that can predict future price trends and market value.
This can be considered an application of time series forecasting, because stock price
data is sequential data in which the time at which each value was recorded is of utmost
importance.
Virtual Try On
Have you ever purchased your specs or lenses from Lenskart? If yes, then you must
have come across its feature that lets you try different frames virtually, without actually
purchasing them or visiting the outlet. This is possible because machine learning systems
identify certain landmarks on a person’s face and then place the specs virtually on the
face using those landmarks.
Data Representation

Statistics is the process of collecting data and analyzing it in large quantities. It is a
branch of mathematics dealing with the collection, analysis, interpretation, and
presentation of numerical facts and figures.
Statistics is based on two concepts:

• Statistical Data
• Statistical Science

Collected data is often arranged in tabular form to study its salient features. Such an
arrangement is known as the presentation of data.
Data Representation refers to the process of condensing the collected data into a table
or a graph.
The raw data can be placed in different orders: ascending order, descending order, or
alphabetical order.

EX:- Let the marks obtained by 10 students of class V in a class test, out of 50, according to
their roll numbers, be: 39, 44, 49, 40, 22, 10, 45, 38, 15, 50

| Roll No. | Marks |
|---|---|
| 1 | 39 |
| 2 | 44 |
| 3 | 49 |
| 4 | 40 |
| 5 | 22 |
| 6 | 10 |
| 7 | 45 |
| 8 | 38 |
| 9 | 15 |
| 10 | 50 |

Ans)
Ascending order:
10, 15, 22, 38, 39, 40, 44, 45, 49, 50
Descending order:
50, 49, 45, 44, 40, 39, 38, 22, 15, 10
When raw data is placed in ascending or descending order, it is known as arrayed data.
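The arrangement into arrayed data can be reproduced in a few lines of Python. This is only a
small sketch using the built-in `sorted` function on the marks listed in the example; the
variable names are chosen here for illustration.

```python
# Marks of the 10 students, listed by roll number (from the example above).
marks = [39, 44, 49, 40, 22, 10, 45, 38, 15, 50]

ascending = sorted(marks)                 # smallest to largest
descending = sorted(marks, reverse=True)  # largest to smallest

print("Ascending: ", ascending)
print("Descending:", descending)
```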
Types of Graphical Data Representation
Bar Chart
A bar chart represents the collected data visually. Quantities such as amounts and
frequencies can be drawn as horizontal or vertical bars.

EX:-Let the marks obtained by 5 students of class V in a class test, out of 10 according to
their names, be:
7,8,4,9,6
The data in the given form is known as raw data. The above given data can be placed in the
bar chart as shown below:

| Name | Marks |
|---|---|
| Akshay | 7 |
| Maya | 8 |
| Dhanvi | 4 |
| Jaslen | 9 |
| Muskan | 6 |

[Figure: bar chart of Marks (vertical axis, 0 to 10) for Akshay, Maya, Dhanvi, Jaslen and Muskan]
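To make the idea concrete without a plotting library, the following sketch draws a horizontal
bar chart as plain text, one bar per student. The helper `text_bar_chart` is a hypothetical
name introduced here; a real chart would normally be drawn with a plotting package.

```python
# Names and marks from the bar chart example above.
marks = {"Akshay": 7, "Maya": 8, "Dhanvi": 4, "Jaslen": 9, "Muskan": 6}


def text_bar_chart(data, symbol="#"):
    """Return one text line per category; the bar length equals the value."""
    width = max(len(name) for name in data)  # pad names so the bars line up
    lines = []
    for name, value in data.items():
        lines.append(f"{name.ljust(width)} | {symbol * value} {value}")
    return lines


for line in text_bar_chart(marks):
    print(line)
```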

Histogram
A histogram is a graphical representation of data. It looks similar to a bar graph, but the
two differ in the kind of data they show: a bar graph shows the frequency of categorical
data, that is, data based on two or more categories such as gender or months, whereas a
histogram is used for quantitative data whose values are grouped into ranges (bins).
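A minimal sketch of how a histogram groups quantitative data: the class-test marks used
earlier in this unit are counted into bins of width 10 and printed as text bars. The function
name `histogram_counts` and the bin width are assumptions made for this illustration.

```python
# Marks out of 50 from the earlier class-test example.
marks = [39, 44, 49, 40, 22, 10, 45, 38, 15, 50]


def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each bin [low, low + bin_width)."""
    counts = {}
    for v in values:
        low = start + ((v - start) // bin_width) * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))


bin_width = 10
counts = histogram_counts(marks, bin_width)
for low, n in counts.items():
    print(f"{low}-{low + bin_width - 1} | {'#' * n}")
```

Unlike the bars of a bar chart, the bins here are adjacent numeric ranges, so their order is
fixed by the data.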
Line Graph
A graph that uses points connected by lines to show how a quantity changes over time is
known as a line graph. Line graphs can show, for example, the number of animals left on
earth, the growth of the world’s population, or the day-by-day increase or decrease in the
number of bitcoins.

Pie Chart
A pie chart is a circular graphic divided into slices that represent numerical proportions.
In most cases it can be replaced by other plots such as a bar chart, box plot, or dot plot.
Research has shown that it is difficult to compare the different sections of a given pie
chart, or to compare data across different pie charts.
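The proportions behind a pie chart are simple arithmetic: each slice takes the share
value / total of the full circle of 360 degrees. The sketch below applies this to the marks
from the bar chart example; `pie_slices` is a hypothetical helper, not a library function.

```python
# Marks from the bar chart example; each student gets one slice.
marks = {"Akshay": 7, "Maya": 8, "Dhanvi": 4, "Jaslen": 9, "Muskan": 6}


def pie_slices(data):
    """Return {label: (percentage, angle_in_degrees)} for each slice."""
    total = sum(data.values())
    return {
        label: (100 * value / total, 360 * value / total)
        for label, value in data.items()
    }


slices = pie_slices(marks)
for label, (percent, angle) in slices.items():
    print(f"{label}: {percent:.1f}% of the pie, {angle:.1f} degrees")
```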

Frequency Distribution Table


A frequency distribution table is a chart that summarises the values in a data set and how
often each occurs. The table has two columns: the first column lists the various outcomes
in the data, while the second column lists the frequency of each outcome. Putting the data
into such a table makes it easier to understand and analyze.
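One way to build such a table is with `collections.Counter` from the Python standard
library. The list of scores below is made-up sample data, used only to show the two
columns (outcome and frequency).

```python
from collections import Counter

# Hypothetical raw data: marks out of 10 scored by twelve students.
scores = [7, 8, 4, 9, 6, 7, 8, 7, 9, 6, 7, 4]

frequency = Counter(scores)  # maps each outcome to how often it occurs

print("Marks | Frequency")
for outcome in sorted(frequency):
    print(f"{outcome:5d} | {frequency[outcome]}")
```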
Diversity Of Data
Diversity in machine learning aims to reduce redundancy in the training data, in the
learned model, and in inference, and so to provide more information to the learning
process. It can improve the performance of the model and plays an important role in
machine learning.

Big Data is characterized by huge volume, high velocity, and a wide variety of data. There
are 3 types: structured data, semi-structured data and unstructured data.

1. Structured data
Structured data is data whose elements are addressable for effective analysis. It has
been organized into a formatted repository, typically a database. It covers all data that
can be stored in an SQL database in tables with rows and columns. Such data has relational
keys and can easily be mapped into pre-designed fields.
Example: Relational data.
2. Semi-Structured data
Semi-structured data is information that does not reside in a relational database but
has some organizational properties that make it easier to analyze. With some processing
it can be stored in a relational database (this can be very hard for some kinds of
semi-structured data), but semi-structured formats exist to save space.
Example: XML data.
3. Unstructured data
Unstructured data is data that is not organized in a predefined manner or does not
have a predefined data model. There are alternative platforms for storing and managing
it; it is increasingly prevalent in IT systems and is used by organizations in a variety of
business intelligence and analytics applications.
Example: Word, PDF, Text, Media logs.
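The difference between semi-structured and structured data can be sketched with a small
XML snippet. The records below are hypothetical; the point is that XML elements may omit
fields, while a table row needs a value (or an explicit empty marker) in every column. The
parsing uses `xml.etree.ElementTree` from the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured records: the second student has no <email>.
xml_text = """
<students>
  <student roll="1"><name>Akshay</name><email>akshay@example.com</email></student>
  <student roll="2"><name>Maya</name></student>
</students>
""".strip()

root = ET.fromstring(xml_text)

# Map each element onto fixed columns (roll, name, email), as a table would need.
rows = []
for student in root.findall("student"):
    roll = int(student.get("roll"))
    name = student.findtext("name")
    email = student.findtext("email")  # None when the tag is missing
    rows.append((roll, name, email))

for row in rows:
    print(row)
```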
| Properties | Structured data | Semi-structured data | Unstructured data |
|---|---|---|---|
| Technology | It is based on relational database tables | It is based on XML/RDF (Resource Description Framework) | It is based on character and binary data |
| Transaction management | Matured transaction and various concurrency techniques | Transaction is adapted from DBMS, not matured | No transaction management and no concurrency |
| Version management | Versioning over tuples, rows, tables | Versioning over tuples or graph is possible | Versioned as a whole |
| Flexibility | It is schema dependent and less flexible | It is more flexible than structured data but less flexible than unstructured data | It is more flexible and there is absence of schema |
| Scalability | It is very difficult to scale DB schema | Its scaling is simpler than structured data | It is more scalable |
| Robustness | Very robust | New technology, not very widespread | - |
| Query performance | Structured query allows complex joining | Queries over anonymous nodes are possible | Only textual queries are possible |
MACHINE LEARNING TYPES
There are several types of machine learning, each with special characteristics and
applications. Some of the main types of machine learning algorithms are as follows:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning


Supervised learning is when a model is trained on a “labelled dataset”. Labelled
datasets have both input and output parameters. In supervised learning, algorithms learn
to map inputs to the correct outputs. Both the training and the validation datasets are
labelled.
Example:
Consider a scenario where you have to build an image classifier to differentiate
between cats and dogs. If you feed labelled images of dogs and cats to the algorithm, the
machine will learn to tell a dog from a cat from these labelled images. When we input new
dog or cat images that it has never seen before, it will use the learned model to predict
whether each is a dog or a cat. This is how supervised learning works, and this particular
task is image classification.
There are two main categories of supervised learning that are mentioned below:

 Classification
 Regression

Classification
Classification deals with predicting categorical target variables, which represent
discrete classes or labels. For instance, classifying emails as spam or not spam, or predicting
whether a patient has a high risk of heart disease. Classification algorithms learn to map the
input features to one of the predefined classes.
Here are some classification algorithms:
• Logistic Regression
• Support Vector Machine
• Random Forest
• Decision Tree
• K-Nearest Neighbors (KNN)
• Naive Bayes
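As one small illustration of classification, here is a sketch of K-Nearest Neighbors written
from scratch. The two features per email and the training data are invented for this
example, and `knn_predict` is a hypothetical helper rather than a library function.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical features per email: (number of links, number of "free" mentions).
emails = [(0, 0), (1, 0), (0, 1), (7, 5), (8, 6), (6, 7)]
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

print(knn_predict(emails, labels, (7, 6)))
print(knn_predict(emails, labels, (1, 1)))
```

The output is always one of the predefined classes, which is what distinguishes
classification from regression.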

Regression
Regression, on the other hand, deals with predicting continuous target variables,
which represent numerical values. For example, predicting the price of a house based on its
size, location, and amenities, or forecasting the sales of a product. Regression algorithms
learn to map the input features to a continuous numerical value.
Here are some regression algorithms:
• Linear Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression
• Decision Tree
• Random Forest
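A minimal sketch of linear regression with a single input feature, using the closed-form
least-squares solution. The house sizes and prices are made-up numbers chosen so that the
fit is easy to check by hand; `fit_line` is a name introduced here for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size (square metres) and price (in thousands).
sizes = [50, 60, 80, 100, 120]
prices = [150, 180, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept  # predicted price for a 90 square metre house
print(slope, intercept, predicted)
```

Here the prediction is a continuous number rather than a class label.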

Advantages of Supervised Machine Learning


• Supervised learning models can have high accuracy because they are trained on labelled data.
• The decision-making process of supervised learning models is often interpretable.
• Pre-trained supervised models can often be reused, which saves the time and resources of
developing new models from scratch.
2. Unsupervised Machine Learning
Unsupervised learning is a type of machine learning technique in which an algorithm
discovers patterns and relationships using unlabeled data. Unlike supervised learning,
unsupervised learning doesn’t involve providing the algorithm with labeled target outputs.
The primary goal of unsupervised learning is often to discover hidden patterns,
similarities, or clusters within the data, which can then be used for various purposes,
such as data exploration, visualization, dimensionality reduction, and more.

Example: Consider a dataset that contains information about the purchases you and other
customers made at a shop. Through clustering, the algorithm can group customers with
similar purchasing behavior, which reveals potential customer segments without
predefined labels. This type of information can help businesses target customers as well
as identify outliers.
There are two main categories of unsupervised learning that are mentioned below:
 Clustering
 Association

Clustering
Clustering is the process of grouping data points into clusters based on their similarity. This
technique is useful for identifying patterns and relationships in data without the need for
labeled examples.
Here are some clustering algorithms:
• K-Means Clustering algorithm
• Mean-shift algorithm
• DBSCAN Algorithm
Principal Component Analysis and Independent Component Analysis are often listed
alongside these unsupervised methods, but they are dimensionality-reduction techniques
rather than clustering algorithms.
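To show how grouping by similarity can work, the following is a very small k-means sketch
for one-dimensional data. The spending figures and the starting centroids are assumptions
made for this example, and `k_means_1d` is a hypothetical helper; practical implementations
handle many dimensions and choose starting centroids more carefully.

```python
def k_means_1d(values, centroids, iterations=10):
    """Very small k-means for 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids are stable, so assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spend of eight customers; two visible groups.
spend = [12, 15, 14, 10, 95, 102, 98, 110]
centroids, clusters = k_means_1d(spend, centroids=[10, 20])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from how close the values are
to each other.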

Association
Association rule learning is a technique for discovering relationships between items in a
dataset. It identifies rules stating that the presence of one item implies the presence of
another item with a specific probability.
Here are some association rule learning algorithms:
• Apriori Algorithm
• Eclat Algorithm
• FP-growth
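The "specific probability" attached to a rule is usually expressed through two measures,
support and confidence. The sketch below computes both for the rule bread -> butter on a
handful of invented shopping baskets; it illustrates the measures only and is not an
implementation of Apriori, Eclat, or FP-growth.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability that `consequent` is present given `antecedent` is."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

In this sample, bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets
containing bread also contain butter.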

Advantages of Unsupervised Machine Learning
 It helps to discover hidden patterns and various relationships between the data.
 Used for tasks such as customer segmentation, anomaly detection, and data
exploration.
 It does not require labeled data and reduces the effort of data labeling.

3. Semi-Supervised Learning
Semi-supervised learning is a machine learning approach that sits between supervised and
unsupervised learning: it uses both labeled and unlabeled data. It is particularly useful
when obtaining labeled data is costly, time-consuming, or resource-intensive, for example
when labeling requires specialist skills and resources.

Example:
Consider building a language translation model. Having labeled translations for every
sentence pair can be resource-intensive. Semi-supervised learning allows the model to
learn from both labeled and unlabeled sentence pairs, making it more accurate. This
technique has led to significant improvements in the quality of machine translation services.
Basic Linear Algebra in Machine Learning Techniques
Machine learning has a strong connection with mathematics. Each machine learning
algorithm is based on mathematical concepts, and with the help of mathematics one can
choose the correct algorithm by considering training time, complexity, number of features,
etc. Linear Algebra is an essential field of mathematics that covers the study of vectors,
matrices, planes, mappings, and lines required for linear transformations.

The term Linear Algebra was initially introduced in the early 18th century to find the
unknowns in linear equations and solve them easily; hence it is an important branch of
mathematics that helps in studying data. Linear Algebra is fundamental to the applications
of Machine Learning, and it is also a prerequisite for learning Machine Learning and data
science.

Different ways to represent the Data in Linear Algebra


• Scalar: A quantity described by a single element. It has only magnitude and no
direction. Basically, a scalar is just a single number.
Example: 17 and 256

• Vector: A geometric object having both magnitude and direction. It is an ordered array
of numbers, written as a row or a column. A vector has just one index, which refers to a
particular value within the vector.

V=[e1,e2,e3,e4]
Here V is a vector whose elements are e1, e2, e3 and e4. Counting indices from zero, V[2] is e3.

• Vector Operations
1. Scalar-Vector Multiplication
p = [e1, e2, e3]
Multiplying a vector by a scalar multiplies every element of the vector by that scalar. For
example, when the vector p is multiplied by the scalar 2, all the elements of p are
multiplied by 2. This operation is commutative: p * 2 = 2 * p.
p * 2 = [2 * e1, 2 * e2, 2 * e3]
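In plain Python, scalar-vector multiplication can be sketched with a list comprehension.
The numbers in `p` are placeholder values standing in for e1, e2 and e3, and `scale` is a
helper name introduced here.

```python
def scale(vector, scalar):
    """Multiply every element of `vector` by `scalar`."""
    return [scalar * element for element in vector]


p = [3, 5, 7]  # hypothetical values standing in for e1, e2, e3

print(p[2])         # zero-based indexing: the third element
print(scale(p, 2))  # every element doubled
```

Note that for a plain Python list the expression `p * 2` repeats the list instead of scaling
it, which is why the sketch uses an explicit helper.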
