
Summary of Chapters 9-12

Niraj Soni
M180243MS

Chapter 9 explains the need for time series analysis in data analysis. Several models for time
series analysis exist, whose goal is to identify regular patterns in past observations in order to
make predictions for future periods. The chapter first introduces some basic concepts of time
series and then discusses the major indicators used to evaluate the accuracy of time series
models. It then covers decomposition methods, which are used to identify the basic
components of a time series, and exponential smoothing models, a class of predictive
methods considered among the most accurate on the basis of empirical analyses, before
considering the main characteristics of autoregressive models. Finally, it presents methods
based on combinations of predictive models, which are quite effective in practice, and the
general criteria underlying the choice of a forecasting method. Time series analysis has many
applications in business, financial, socio-economic, environmental and industrial domains;
depending on the specific application, predictions, patterns, trends, or sequences are sought.
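To make the exponential smoothing idea above more concrete, here is a minimal sketch in Python; the demand values, the choice of alpha and the function names are assumptions made only for illustration, and the mean absolute error (MAE) is used as one example of the accuracy indicators the chapter discusses.

# Minimal sketch of simple exponential smoothing (hypothetical data).
# The smoothed level s_t = alpha * y_t + (1 - alpha) * s_{t-1} serves as
# the one-step-ahead forecast for the next period.

def exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts for every period after the first."""
    smoothed = series[0]                 # initialise with the first observation
    forecasts = []
    for y in series[1:]:
        forecasts.append(smoothed)       # forecast issued before seeing y
        smoothed = alpha * y + (1 - alpha) * smoothed  # update the level
    return forecasts

def mean_absolute_error(actual, predicted):
    """MAE: one indicator for evaluating forecasting accuracy."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(predicted)

# Hypothetical monthly demand values, only for illustration.
demand = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
forecasts = exponential_smoothing(demand, alpha=0.3)
print("MAE:", round(mean_absolute_error(demand[1:], forecasts), 2))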

Chapter 10 explains that classification holds a prominent position in learning theory due to its
theoretical implications and the countless applications it affords. Starting from a set of past
observations whose target class is known, classification models are used to generate a set of
rules that allow the target class of future examples to be predicted. The chapter covers the
general characteristics of classification problems, discusses the main criteria used to evaluate
and compare different models, and presents the major classification methods: classification
trees, Bayesian methods, neural networks, logistic regression and support vector machines.
According to the chapter outline:
1. Classification problems: introduces the components of a classification problem and the
phases in the development of a classification model.
2. Evaluation of classification models: alternative models are developed and the one
affording the best prediction accuracy is selected, so the models must be evaluated against
several criteria.
3. Classification trees are perhaps the best-known and most widely used learning methods in
data mining applications, because of their conceptual simplicity, ease of use, computational
speed, robustness with respect to missing data and outliers and, most of all, the
interpretability of the rules they generate (see the sketch after this list).
4. Bayesian methods calculate the posterior probability that a given observation belongs to a
specific target class by means of Bayes' theorem, once the prior probability and the
class-conditional probabilities are known.
5. Logistic regression is a technique for converting binary classification problems into linear
regression problems.
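The sketch below illustrates items 2 and 3 in Python: it fits a classification tree on observations whose class is known and estimates its prediction accuracy on held-out examples. The use of scikit-learn, the Iris dataset and the chosen hyperparameters are assumptions made for the example, not the book's own material.

# Minimal sketch: train a classification tree and evaluate its accuracy
# on a held-out test set (scikit-learn and the Iris data are illustrative
# choices).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Split past observations into a training set (used to learn the rules)
# and a test set (used to estimate accuracy on future examples).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Accuracy is one of several criteria used to compare alternative models.
print("test accuracy:", accuracy_score(y_test, tree.predict(X_test)))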
Chapter 11 discusses association rules, whose purpose is to identify regular patterns and
recurrences within a large set of transactions. Association rules are a class of unsupervised
learning models that can be used when the dataset of interest does not include a target
attribute. The first part of the chapter briefly describes the structure and evaluation criteria of
association rules, showing how they can be used to identify possible recurrences in a massive
set of transactions. The second part explains simple association rules, which address a single
dimension of analysis and asymmetric binary attributes. The third part describes the Apriori
algorithm, the most popular method for generating association rules, both in its original form
and in a number of variants. The Apriori algorithm is an efficient method for extracting the
strong rules contained in a set of transactions: during the first phase the algorithm generates
the frequent itemsets in a systematic way, without exploring the space of all candidates, while
in the second phase it extracts the strong rules. The last part discusses the main issues that
arise in connection with the extraction of association rules of a more general nature,
characterized by a more complex structure.
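As a rough sketch of the two phases described above, the following Python snippet (a simplified illustration of my own, not the textbook's code) first generates frequent itemsets level by level and then extracts the rules whose confidence exceeds a chosen threshold; the transactions and the support and confidence thresholds are made up.

from itertools import combinations

# Hypothetical market-basket transactions, only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
min_support, min_confidence = 0.4, 0.7

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Phase 1: generate frequent itemsets level by level, extending only the
# itemsets already found frequent instead of exploring all candidates.
frequent = {frozenset([i]) for t in transactions for i in t
            if support(frozenset([i])) >= min_support}
all_frequent, level = set(frequent), frequent
while level:
    candidates = {a | b for a in level for b in level
                  if len(a | b) == len(a) + 1}
    level = {c for c in candidates if support(c) >= min_support}
    all_frequent |= level

# Phase 2: extract the strong rules (support and confidence above threshold).
for itemset in (s for s in all_frequent if len(s) > 1):
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            confidence = support(itemset) / support(antecedent)
            if confidence >= min_confidence:
                print(set(antecedent), "=>", set(itemset - antecedent),
                      round(confidence, 2))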

Chapter 12 explains various clustering methods. By defining appropriate metrics and the
induced notions of distance and similarity between pairs of observations, the purpose of
clustering methods is the identification of homogeneous groups of records called clusters.
Section 1 of the chapter describes the main features of clustering models, whose aim is to
subdivide the records of a dataset into homogeneous groups of observations, so that
observations belonging to one group are similar to one another and dissimilar from
observations included in other groups. It presents four methods for deriving the clusters, and
then discusses affinity measures of distance between pairs of observations in relation to the
nature of the attributes contained in the dataset, showing how the different attribute types
affect these measures. Section 2 examines in greater depth the methods introduced in Section
1 for deriving the clusters: partition methods, focusing in particular on the K-means and
K-medoids algorithms, and hierarchical methods, illustrating both agglomerative and divisive
hierarchical methods in connection with the main metrics that express the inhomogeneity
among distinct clusters. Section 3 gives some indicators for checking the quality of clustering
models, including cohesion, separation and the silhouette coefficient.
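As a small illustration of the methods and quality indicators summarized above, the following Python sketch runs K-means and computes the silhouette coefficient; the synthetic data, the use of scikit-learn and the choice of three clusters are all assumptions made for the example.

# Minimal sketch: K-means clustering plus the silhouette coefficient as a
# quality indicator (synthetic data and scikit-learn are illustrative choices).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Generate a hypothetical dataset with three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# Partition the observations into K homogeneous clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# The silhouette coefficient combines cohesion (within-cluster similarity)
# and separation (between-cluster dissimilarity); values near 1 are better.
print("silhouette:", round(silhouette_score(X, labels), 3))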
