2033 Rao Faisal Maqbool Data Mining 2
Q1. Briefly outline how to compute the dissimilarity between objects described by
nominal variables:
The dissimilarity between two objects i and j can be computed as the ratio of mismatches,
d(i, j) = (p - m) / p, where m is the number of matches (i.e., the number of variables on
which i and j are in the same state) and p is the total number of variables.
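The mismatch ratio above can be sketched as a short function; the attribute values used in the demo are made up for illustration:

```python
def nominal_dissimilarity(i, j):
    # i, j: equal-length sequences of nominal attribute values
    p = len(i)                                        # total number of variables
    m = sum(1 for a, b in zip(i, j) if a == b)        # number of matching states
    return (p - m) / p                                # d(i, j) = (p - m) / p

# Two objects agreeing on 2 of 3 nominal attributes -> dissimilarity 1/3
d = nominal_dissimilarity(["red", "single", "PhD"], ["red", "married", "PhD"])
```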
Alternatively, a nominal variable can be encoded with a larger number of binary variables
by creating a new binary variable for each of the M nominal states. For an object with a
given state value, the binary variable representing that state is set to 1, while the
remaining binary variables are set to 0.
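This one-binary-variable-per-state encoding can be sketched as follows; the color states are an invented example:

```python
def encode_nominal(value, states):
    # one binary variable per nominal state: the variable for the
    # observed state is set to 1, all remaining variables to 0
    return [1 if s == value else 0 for s in states]

states = ["red", "green", "blue"]          # the M nominal states
encoded = encode_nominal("green", states)  # -> [0, 1, 0]
```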
As an example, a set of documents on the auto industry is likely to contain the term
auto in almost every document. To handle this, a mechanism can be introduced for
attenuating the effect of terms that occur too often in the collection to be meaningful
for relevance determination. This can be done by scaling down the weights of terms with
high collection frequency.
Term Frequency (TF) is the ratio of the number of times a term occurs in a document to
the total number of terms in that document. The higher the frequency of a term in a
document, the more likely it is that the document is relevant to that query term.
TF-IDF is an abbreviation for Term Frequency-Inverse Document Frequency and is a very
common algorithm for transforming text into a meaningful numerical representation. The
technique is widely used to extract features in various NLP applications. This section
explains the importance of TF-IDF and how to compute and apply the algorithm in your
applications.
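A minimal sketch of the computation described above, using the plain logarithmic IDF variant (libraries often apply additional smoothing); the three-document auto corpus is invented for illustration:

```python
import math

def tf(term, doc):
    # term frequency: occurrences of the term / total words in the document
    words = doc.split()
    return words.count(term) / len(words)

def idf(term, corpus):
    # inverse document frequency: log(N / number of documents containing the term)
    n_containing = sum(1 for doc in corpus if term in doc.split())
    return math.log(len(corpus) / n_containing) if n_containing else 0.0

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

# "auto" appears in every document of this toy corpus, so its IDF is 0 and
# its TF-IDF weight vanishes -- exactly the down-weighting described above
corpus = ["auto engine parts", "auto dealer sale", "auto repair shop"]
```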
Q2. Exercise 2.2 gave the following data (in increasing order) for the attribute age: 13, 15,
16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
A. (a) Use smoothing by bin means to smooth these data, using a bin depth of 3. Illustrate
your steps. Comment on the effect of this technique for the given data.
Other smoothing techniques include:
A) Binning by boundaries
B) Exponential smoothing
C) Random walk
Q3. Consider the information given below; design and train a perceptron
using the given parameters.

    A   B   T
    1   1   1
    0   1   1
    1   0   1
    0   0   0

Initial weights: W1 = 0.6, W2 = -0.4
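The training in Q3 can be sketched with the classic perceptron learning rule, w_i += lr * (target - output) * x_i. The threshold (0.5) and learning rate (0.1) are assumed here, since the source does not specify them; only the truth table and initial weights W1 = 0.6, W2 = -0.4 come from the question:

```python
def step(net, theta):
    # threshold activation: fire (1) if the weighted sum reaches theta
    return 1 if net >= theta else 0

def train_perceptron(samples, w, theta=0.5, lr=0.1, max_epochs=100):
    # perceptron learning rule: w_i += lr * (target - output) * x_i
    for _ in range(max_epochs):
        errors = 0
        for (a, b), t in samples:
            y = step(w[0] * a + w[1] * b, theta)
            if y != t:
                w[0] += lr * (t - y) * a
                w[1] += lr * (t - y) * b
                errors += 1
        if errors == 0:   # converged: a full pass with no mistakes
            break
    return w

# Truth table from Q3 ((A, B) -> T) with initial weights W1 = 0.6, W2 = -0.4
samples = [((1, 1), 1), ((0, 1), 1), ((1, 0), 1), ((0, 0), 0)]
weights = train_perceptron(samples, [0.6, -0.4])
```

Because the target function is linearly separable, the perceptron convergence theorem guarantees this loop terminates with weights that classify all four patterns correctly.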