The document discusses challenges with measuring document similarity and proposes representing core document semantics as topic events. Specifically, it notes that existing methods focus more on information retrieval than semantic understanding. Additionally, long documents can contain many topic transitions that are difficult to capture. The document proposes representing each document's key elements as a structured topic event summary to help gain correlations and the core semantics within and between documents.
Document similarity, Vector space model, Topic event

Document similarity
No adequate similarity measure is available at the document level
right now.
Conventional metrics measure document similarity only by the
statistics or morphology of words, as in the vector space model.

Document similarity
Studies focusing on document similarity are relatively rare.
Existing methods mainly focus on information retrieval rather
than semantic-level understanding.

Vector space model
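The vector space model (VSM) regards each document as a collection
of words, and similarity measurement is based on the presence of
words. The two sentences below contain exactly the same words, so
a VSM treats them as identical even though their meanings differ:
1. Jack borrowed a book from the teacher.
2. The teacher borrowed a book from Jack.

Topic event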
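The limitation of the vector space model can be checked directly on the two borrowing sentences. The following is a minimal sketch, assuming plain word counts as the vector and cosine similarity as the measure; the helper names bag_of_words and cosine_similarity are hypothetical and chosen only for illustration, and the slides do not state which weighting or similarity function they have in mind.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each word, discarding word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


s1 = "Jack borrowed a book from the teacher"
s2 = "The teacher borrowed a book from jack"

v1 = bag_of_words(s1)
v2 = bag_of_words(s2)
score = cosine_similarity(v1, v2)

# Both sentences map to the same word counts, so the score is
# (up to floating-point rounding) the maximum possible value.
print(v1 == v2)
print(round(score, 6))
```

Because word order is thrown away before comparison, the representation cannot tell who borrowed from whom, which is the gap a structured, event-style summary is meant to address.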
Long documents contain many topic transitions and different
focuses, which makes it difficult to capture their core semantics.
Topics are coherent, and correlations can be gained by
comprehensive analysis.

Topic event
We represent core semantics as an event, which is called a topic
event.
A topic event is the structured summary that we get from each
document.
A topic event contains the key elements of the document.