Image Processing

Lecture 10: Image Analysis


Feature recognition and classification

Modified by:
Assoc. Prof. Dr Hossam Mahmoud Moftah
Faculty of Computers and Artificial Intelligence, Beni-Suef University
Boundary Descriptors Techniques (lab assignment using python)
 There are several simple geometric measures that can be useful for describing a boundary (a Python sketch follows this slide):
 Length
 the number of pixels along a boundary gives a rough approximation of its length
 Diameter (Major Axis)
 the longest line between two points on the boundary
 Minor Axis
 the line perpendicular to the major axis
 Eccentricity
 ratio of the major axis to the minor axis

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram
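A minimal Python sketch of these boundary descriptors (assuming scikit-image is available; the elliptical toy mask and variable names are illustrative, and eccentricity is computed as the major/minor ratio used on this slide rather than scikit-image's own ellipse eccentricity):

    import numpy as np
    from skimage import measure

    # Toy binary mask containing one elliptical region (illustrative only).
    mask = np.zeros((100, 100), dtype=np.uint8)
    rr, cc = np.ogrid[:100, :100]
    mask[((rr - 50) / 30) ** 2 + ((cc - 50) / 15) ** 2 <= 1] = 1

    props = measure.regionprops(measure.label(mask))[0]
    length = props.perimeter           # boundary length (rough pixel-count approximation)
    major = props.major_axis_length    # diameter (major axis)
    minor = props.minor_axis_length    # minor axis, perpendicular to the major axis
    eccentricity = major / minor       # ratio of major to minor axis
    print(length, major, minor, eccentricity)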


Region Descriptors Techniques
 Shape features (lab assignment using python)
 Area and perimeter

Adapted from Idar Dyrdal
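A minimal sketch of these two shape features (assuming scikit-image; the square mask is a toy example):

    import numpy as np
    from skimage import measure

    mask = np.zeros((64, 64), dtype=np.uint8)
    mask[16:48, 16:48] = 1                  # toy 32x32 square region

    props = measure.regionprops(measure.label(mask))[0]
    print(props.area)       # area: number of pixels in the region (1024 here)
    print(props.perimeter)  # perimeter: length of the region boundary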


Region Descriptors Techniques
 Texture features
 One of the simplest sets of statistical features for texture description consists of the following histogram-based descriptors of the image (or region):
 mean, variance (or its square root, the standard deviation), skew, energy (used as a measure of uniformity), and entropy

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram
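A minimal Python sketch of these descriptors for the lab assignment (assuming an 8-bit grayscale image; the function name and the small epsilon guard in the skew are my own):

    import numpy as np

    def histogram_features(gray):
        """Histogram-based texture descriptors for an 8-bit grayscale image."""
        hist, _ = np.histogram(gray, bins=256, range=(0, 256))
        p = hist / hist.sum()                      # normalized histogram p(i)
        levels = np.arange(256)
        mean = np.sum(levels * p)
        variance = np.sum((levels - mean) ** 2 * p)
        std = np.sqrt(variance)
        skew = np.sum((levels - mean) ** 3 * p) / (std ** 3 + 1e-12)
        energy = np.sum(p ** 2)                    # uniformity: 1 for a constant image
        entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
        return mean, variance, std, skew, energy, entropy

    print(histogram_features(np.random.randint(0, 256, (64, 64))))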


Region Descriptors Techniques

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram


Region Descriptors Techniques
 Histogram-based (statistical) features
 The energy descriptor
 provides another measure of how the pixel values are distributed along the gray-level range: images with a single constant value have maximum energy (i.e., energy = 1); images with few gray levels will have higher energy than those with many gray levels. The energy descriptor can be calculated as

 energy = Σ [p(i)]², summed over the gray levels i = 0, …, L−1

 where p(i) is the normalized histogram value for gray level i

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram


Region Descriptors Techniques

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram


Region Descriptors Techniques
 Histogram-based (statistical) features
 Roughness
 The variance is sometimes used as a normalized descriptor of roughness (R), defined as:

 R = 1 − 1 / (1 + σ²)

 where σ² is the variance normalized to the [0, 1] interval.

 R = 0 for areas of constant intensity, that is, smooth texture.

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram


Region Descriptors Techniques

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram


Texture Features Example

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram


Texture Features Example

 The texture with the highest uniformity (energy) has the lowest entropy

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram


Texture Features - Gray level co-occurrence Matrix

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram


Texture Features - Gray level co-occurrence Matrix
(lab assignment using python)

Adapted from Rushin Shah, Site Supervisor at Adani Shantigram
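A minimal GLCM sketch for the lab assignment (assuming scikit-image ≥ 0.19, where the functions are named graycomatrix/graycoprops; the tiny 4-level image is a made-up textbook-style example):

    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    # Tiny image with 4 gray levels (values 0..3).
    img = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [2, 2, 3, 3]], dtype=np.uint8)

    # Count co-occurring pixel pairs one step to the right (distance=1, angle=0).
    glcm = graycomatrix(img, distances=[1], angles=[0], levels=4,
                        symmetric=True, normed=True)

    for prop in ("contrast", "homogeneity", "energy", "correlation"):
        print(prop, graycoprops(glcm, prop)[0, 0])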


Classification

 To design a classifier it is essential to have a training set of images
 supervised learning:
 the classes to which the images belong are known
 unsupervised learning:
 the classes are unknown
 training the classifier is the process of using data to determine the best set of features for a classifier

Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009
Difference between Classification and Regression in Machine
Learning

Adapted from Jason Brownlee PhD, https://machinelearningmastery.com/about/




Statistical classification

 There are two general approaches to statistical classification: parametric and nonparametric
 Parametric Methods:
 require probability distributions and estimate parameters derived from them, such as the mean and standard deviation, to provide a compact representation of the classes
 a fixed set of parameters is used to determine a probability model; this is also common in machine learning
 Examples: Logistic Regression, Naïve Bayes Model, etc.
Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
https://www.geeksforgeeks.org/difference-between-parametric-and-non-parametric-methods/
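As an illustrative sketch of a parametric classifier (assuming scikit-learn; the single-feature toy data are hypothetical), Gaussian Naïve Bayes is parametric in exactly this sense, summarizing each class by an estimated mean and variance:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    X = np.array([[1.0], [1.2], [0.9], [3.0], [3.2], [2.9]])  # single feature x
    y = np.array([0, 0, 0, 1, 1, 1])                          # two classes

    model = GaussianNB().fit(X, y)
    print(model.theta_)            # estimated per-class means
    print(model.predict([[2.0]]))  # assign a new x to the more probable class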
Statistical classification

Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
Statistical classification

 Parametric Methods:
 Consider the case where there are just two classes, class 1 (ω1) and class 2 (ω2), and a single feature, x.
 We have a training set, i.e. representative examples from both classes,
 so that we can measure the feature for both classes and construct probability distributions for each
 These are formally known as the probability density functions or class-conditional probabilities (Appendix B.3)
 p(x|ω1) and p(x|ω2), i.e. the probabilities of measuring the value x, given that the feature is in class 1 or class 2, respectively.
Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
Statistical classification

 Parametric Methods:
 If we have a large number of examples in each class, then the probability density functions will be Gaussian in shape (the Central Limit Theorem)
 The classification problem is: given another feature measurement, x, to which class does this feature belong?
 We want the posterior probability, P(ωi|x), i.e. the probability that, given a feature value of x, the feature belongs to class ωi.
 Probability theory, and specifically Bayes' Rule, relates the posterior probabilities to the class-conditional probabilities or likelihoods (the derivation is given in Appendix B.3):
Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
Statistical classification

 Parametric Methods:

 P(ωi|x) = p(x|ωi) · P(ωi) / p(x)

 where P(ωi) is the a priori or prior probability (i.e. the probability of being in class ω1 or ω2 based on the relative numbers of those classes in the population, prior to taking the test)

 and p(x) is often considered a mere scaling factor (the evidence) that guarantees that the posterior probabilities sum to unity

Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
Statistical classification

 Parametric Methods:

 We want to maximize the posterior probability, P(ωi|x),
 which is the same as maximizing p(x|ωi) · P(ωi), since p(x) is the same for both classes.
 Bayes' decision rule is:

 decide ω1 if p(x|ω1) · P(ω1) > p(x|ω2) · P(ω2); otherwise decide ω2

Adapted from Geoff Dougherty, Image Processing for Medical Applications, CAMBRIDGE, 2009,
Statistical classification

 Nonparametric Methods:
 In non-parametric methods, there is no need to make any assumption about parameters for the population being studied.

 Non-parametric methods are gaining popularity

 Examples: KNN, Decision Tree Model, etc.

Adapted from https://www.geeksforgeeks.org/difference-between-parametric-and-non-parametric-methods/


k-nearest-neighbor (k-NN) classifier
(lab assignment using python for medical application)
 K nearest neighbors (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (distance function)

 A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common among its K nearest neighbors as measured by a distance function.

 If K=1, then the case is simply assigned to the class of its nearest neighbor

Adapted from Bing Liu, CS583, UIC
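A minimal sketch for the lab assignment (assuming scikit-learn; the training data are the toy "acid durability / strength" points from the worked example later in this lecture):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X_train = np.array([[7, 7], [7, 4], [3, 4], [1, 4]])
    y_train = np.array(["BAD", "BAD", "GOOD", "GOOD"])

    knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
    knn.fit(X_train, y_train)
    print(knn.predict([[3, 7]]))  # majority vote of the 3 nearest neighbors -> ['GOOD']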


k-nearest-neighbor (k-NN) classifier

 Distance Function Measurements:
 Most common: Euclidean distance

 d(x, y) = √( Σᵢ (xᵢ − yᵢ)² )

 To classify a new input vector x, examine the k closest training data points to x and assign the object to the most frequently occurring class

Adapted from https://www.cut-the-knot.org/pythagoras/DistanceFormula.shtml

Adapted from David Sontag, New York University and Vibhav Gogate, Carlos Guestrin,
Mehryar Mohri, & Luke Zettlemoyer
k-nearest-neighbor (k-NN) classifier

Adapted from David Sontag, New York University and Vibhav Gogate, Carlos Guestrin,
Mehryar Mohri, & Luke Zettlemoyer
k-nearest-neighbor (k-NN) classifier

Adapted from Carla P. Gomes, gomes@cs.cornell.edu

KNN Example

Points   X1 (Acid Durability)   X2 (Strength)   Y (Classification)
P1       7                      7               BAD
P2       7                      4               BAD
P3       3                      4               GOOD
P4       1                      4               GOOD
P5       3                      7               ?

Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
Euclidean Distance From Each Point

Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
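With d = √((x1−a)² + (x2−b)²), the Euclidean distances from P5(3, 7) work out as follows:

d(P5, P1) = √((7−3)² + (7−7)²) = √16 = 4
d(P5, P2) = √((7−3)² + (4−7)²) = √25 = 5
d(P5, P3) = √((3−3)² + (4−7)²) = √9 = 3
d(P5, P4) = √((1−3)² + (4−7)²) = √13 ≈ 3.61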
3 Nearest Neighbours

Point                             P1      P2      P3      P4
                                  (7,7)   (7,4)   (3,4)   (1,4)
Euclidean distance from P5(3,7)   4       5       3       √13 ≈ 3.61
Class                             BAD     BAD     GOOD    GOOD

The 3 nearest neighbours of P5 are P3, P4, and P1: two GOOD and one BAD.

Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
KNN Classification

Points   X1 (Durability)   X2 (Strength)   Y (Classification)
P1       7                 7               BAD
P2       7                 4               BAD
P3       3                 4               GOOD
P4       1                 4               GOOD
P5       3                 7               GOOD

Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
KNN pseudocode

Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
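A from-scratch sketch of the k-NN procedure in Python (the function and variable names are my own): compute the Euclidean distance to every training point, take the k smallest, and return the majority vote.

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=3):
        """Classify x by majority vote among its k nearest training points."""
        distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each training point
        nearest = np.argsort(distances)[:k]              # indices of the k closest points
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]

    X = np.array([[7, 7], [7, 4], [3, 4], [1, 4]])
    y = np.array(["BAD", "BAD", "GOOD", "GOOD"])
    print(knn_predict(X, y, np.array([3, 7]), k=3))      # -> GOOD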
Unsupervised methods

 With unsupervised classification, the class labels are unknown, and the data are plotted to see whether they cluster naturally.

 Examples of unsupervised methods:
 k-means clustering (see lecture 5)
 Hierarchical clustering

Adapted from Anand Bhosale presentation, International Institute of Information Technology, Innovation and Leadership
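A minimal sketch of both methods (assuming scikit-learn; the unlabeled toy points are hypothetical):

    import numpy as np
    from sklearn.cluster import KMeans, AgglomerativeClustering

    # Unlabeled feature vectors; we look for natural clusters.
    X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

    print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))  # k-means labels
    print(AgglomerativeClustering(n_clusters=2).fit_predict(X))            # hierarchical labels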
Measuring Classification Performance

Adapted from https://en.wikipedia.org/wiki/Confusion_matrix


The Confusion Matrix
(lab assignment using python for medical application)

Adapted from https://glassboxmedicine.com/2019/02/17/measuring-performance-the-confusion-matrix/
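A minimal sketch for the lab assignment (assuming scikit-learn; the label vectors are hypothetical, with 1 = diseased and 0 = healthy):

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # Rows are actual classes, columns are predicted classes:
    # [[TN, FP],
    #  [FN, TP]]
    print(confusion_matrix(y_true, y_pred))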


A receiver operating characteristic (ROC) curve
(lab assignment using python for medical application)
 The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.

 The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.

Adapted from https://en.wikipedia.org/wiki/Receiver_operating_characteristic
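A minimal sketch for the lab assignment (assuming scikit-learn; the labels and classifier scores are hypothetical):

    from sklearn.metrics import roc_curve, roc_auc_score

    y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3]  # e.g. predicted probabilities

    fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR and FPR at each threshold
    print(fpr, tpr)
    print(roc_auc_score(y_true, y_score))              # area under the ROC curve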


A receiver operating characteristic (ROC) curve

Adapted from https://en.wikipedia.org/wiki/Receiver_operating_characteristic
Confusion Matrix example

Adapted from Poornima Singh, Sanjay Singh, and Gayatri S Pandi-Jain, Effective heart disease prediction system using data mining techniques, Int J Nanomedicine, 2018
The End
