Homework 5


1. Using the collection of text documents D below, rank the documents for each query according to the
similarity function described in class.

Document Identifier    Document Content
1                      German culture began long before the rise of Germany as a nation-state and spanned the entire German-speaking world.
2                      The University of Wisconsin-Parkside is located in Kenosha, Wisconsin.
3                      Wisconsin sells cheese to other places in the world such as Germany.
4                      My culture class deals with German culture.
5                      I have a lot of classes at the University of Wisconsin, but nothing about Wisconsin even though it is in Wisconsin.

Queries:
a. What is Wisconsin?
b. The country of Germany
c. Cheeses of the world

After removing stop words and merging variants of the same word (German and Germany, for example), the
term lists for the documents are:
1. {germany, culture, germany, nation, state, germany, speak, world}
2. {university, wisconsin, parkside, kenosha, wisconsin}
3. {wisconsin, sell, cheese, places, world, germany}
4. {culture, class, deal, germany, culture}
5. {class, university, wisconsin, nothing, wisconsin, wisconsin}
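
The term lists above can be turned into raw counts with a few lines of Python. This is only a sketch: it assumes the counts of interest are the term frequency tf (occurrences of a term in one document) and the document frequency df (number of documents containing the term). The names `docs`, `tf`, `df` and `vocabulary` are illustrative and not part of the assignment.

```python
from collections import Counter

# Term lists copied from the five documents above.
docs = {
    1: ["germany", "culture", "germany", "nation", "state", "germany", "speak", "world"],
    2: ["university", "wisconsin", "parkside", "kenosha", "wisconsin"],
    3: ["wisconsin", "sell", "cheese", "places", "world", "germany"],
    4: ["culture", "class", "deal", "germany", "culture"],
    5: ["class", "university", "wisconsin", "nothing", "wisconsin", "wisconsin"],
}

# tf[d][t]: number of times term t occurs in document d (0 if absent).
tf = {d: Counter(terms) for d, terms in docs.items()}

# df[t]: number of documents that contain term t at least once.
df = Counter(t for terms in docs.values() for t in set(terms))

# Sorted list of every distinct term in the collection.
vocabulary = sorted(df)

# Print one row per term: its count in documents 1..5, then its df.
for term in vocabulary:
    row = [tf[d][term] for d in sorted(docs)]
    print(f"{term:<12}", *row, f"df={df[term]}")
```

Each printed row holds one term, its count in documents 1 to 5, and its document frequency.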
This is our initial table:

And the associated document vectors are:

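The term weights are computed with the formula w(t,d) = (0.5 + 0.5 * tf / max_tf) * log(N / df), where
tf is the number of times term t occurs in document d, max_tf is the highest term frequency in d, N is
the number of documents in the collection, and df is the number of documents that contain t.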
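
As a small worked instance of the weighting formula, the sketch below applies it to document 4. Two points are assumptions rather than something stated in the homework: the logarithm is taken base 10, and terms that do not occur in a document are simply left out (weight 0). The helper names `weight` and `weight_vector` are hypothetical.

```python
import math
from collections import Counter


def weight(tf, max_tf, n_docs, df):
    """w(t,d) = (0.5 + 0.5 * tf / max_tf) * log(N / df); base-10 log is an assumption."""
    return (0.5 + 0.5 * tf / max_tf) * math.log10(n_docs / df)


def weight_vector(terms, df, n_docs):
    """Weight every term that occurs in one term list; absent terms are omitted."""
    counts = Counter(terms)
    max_tf = max(counts.values())
    return {t: weight(c, max_tf, n_docs, df[t]) for t, c in counts.items()}


# Document 4 from the list above; df values are counted over all five documents.
doc4 = ["culture", "class", "deal", "germany", "culture"]
df = {"culture": 2, "class": 2, "deal": 1, "germany": 3}

w4 = weight_vector(doc4, df, n_docs=5)
for term, value in w4.items():
    print(f"{term:<8} {value:.4f}")
```

Because "culture" is the most frequent term in document 4, its tf factor is 1 and its weight reduces to log(5/2). A term that appears in every document would get weight 0, since log(N/N) = 0.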
These are our query vectors:

And this is the similarity (closeness) of each document to our queries:
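
The ranking step can be sketched as follows. The similarity function from class is not reproduced in this document, so cosine similarity is used here purely as an assumption. The query term lists are also assumed, obtained by applying the same stop-word removal and merging to the three queries; a query term that occurs in no document ("country") is ignored. All names in the code are illustrative.

```python
import math
from collections import Counter

# Term lists copied from the five documents above.
docs = {
    1: ["germany", "culture", "germany", "nation", "state", "germany", "speak", "world"],
    2: ["university", "wisconsin", "parkside", "kenosha", "wisconsin"],
    3: ["wisconsin", "sell", "cheese", "places", "world", "germany"],
    4: ["culture", "class", "deal", "germany", "culture"],
    5: ["class", "university", "wisconsin", "nothing", "wisconsin", "wisconsin"],
}
N = len(docs)
df = Counter(t for terms in docs.values() for t in set(terms))


def weight_vector(terms):
    """Apply w(t,d) to each known term of a document or query term list."""
    counts = Counter(t for t in terms if t in df)  # terms unseen in D carry no weight
    if not counts:
        return {}
    max_tf = max(counts.values())
    # Base-10 log is an assumption; the base does not change a cosine ranking.
    return {t: (0.5 + 0.5 * c / max_tf) * math.log10(N / df[t]) for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed similarity function)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc_vectors = {d: weight_vector(terms) for d, terms in docs.items()}


def rank(query_terms):
    """Return document identifiers ordered from most to least similar."""
    q = weight_vector(query_terms)
    scores = {d: cosine(q, v) for d, v in doc_vectors.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Assumed query term lists for queries a, b and c.
queries = {"a": ["wisconsin"], "b": ["country", "germany"], "c": ["cheese", "world"]}
for label, terms in queries.items():
    print(label, rank(terms))
```

Documents that share no term with a query score 0 and keep their original order at the end of the ranking. A different similarity function, such as a plain dot product, could produce a different order.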
