
Summer Internship Report

Aditya Dhoke
Roll. No - 04005013
Department of Computer Science and Engineering
Indian Institute of Technology Bombay, India.
Guide Prof. Amit Sheth
August 5, 2007

Abstract
In this report, I describe the projects and implementations, namely Entity
Spotter, Semantic Browser and Web Portal, on which I worked with Kno.e.sis
lab members during my summer internship in 2007. I also provide the details
of the presentation that I gave as a part of my curriculum. Finally, the
conclusion states the utility of this internship for me and the future
endeavors that I plan.

0.1 Entity Spotter

The implementation reads a set of entities and relations together with their
IDs, builds a tree data structure similar to a trie, and uses it to tag the
entities in text.
In the initial stage, the program takes as input the concepts and
relationships along with their alphanumeric IDs and creates a tree in which
each node holds a hashmap. In this hashmap (a set of key-value pairs), each
key is a word of an entity and each value is another node in the tree. As
words are read, the tree is traversed from top to bottom, using the current
word as the key and moving to the node stored as its value. If the sequence
of words read matches an entity, the ID of that entity is fetched from the
current node. The program also handles the case where one entity is a prefix
of another: if the longer entity does not match, it backtracks and tags the
shorter entity.
The method runs in O(log(m)*n) time, where m is the number of entities
and n is the number of words in the input file, as opposed to the brute-force
technique, which would take O(m*n) time. The data structure is shown in
Figure 1. The figure shows the storage of two entities, gene mutation and
gene abnormality, and their corresponding IDs D130 and D432. If gene
mutation occurs in the text, the hash table of the uppermost node is
consulted first, followed by the hash table in the left node. This finally
leads to the node in which the ID D130 is stored.
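
The lookup described above can be sketched in a few lines of Python. This is
only an illustrative sketch, not the code written during the internship: the
names Node, build_trie and spot are hypothetical, words are split on
whitespace only, and the IDs X1 and X2 in the usage note are made up.

```python
class Node:
    """One tree node: a hashmap from word to child node, plus an optional ID."""

    def __init__(self):
        self.children = {}      # word -> Node
        self.entity_id = None   # set only on nodes where an entity ends


def build_trie(entities):
    """Build the tree from a dict mapping entity phrase -> ID."""
    root = Node()
    for phrase, entity_id in entities.items():
        node = root
        for word in phrase.lower().split():
            node = node.children.setdefault(word, Node())
        node.entity_id = entity_id
    return root


def spot(root, text):
    """Return (start, end, ID) word spans, preferring the longest match."""
    words = text.lower().split()
    spans = []
    i = 0
    while i < len(words):
        node = root
        last_match = None   # longest complete entity seen from position i
        j = i
        while j < len(words) and words[j] in node.children:
            node = node.children[words[j]]
            j += 1
            if node.entity_id is not None:
                last_match = (i, j, node.entity_id)
        if last_match is not None:
            spans.append(last_match)   # falls back to the shorter entity
            i = last_match[1]          # resume after the tagged entity
        else:
            i += 1
    return spans
```

With the two entities of Figure 1, spotting "a gene mutation was found" yields
one span covering words 1 to 3 with ID D130. If the tree holds "gene" (X1) and
"gene mutation rate" (X2), the text "gene mutation occurs" is tagged with X1
only, because the longer entity fails to match and the sketch backtracks to
the last complete match.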

Figure 1: Tree structure for storing entities

0.2 Presentation: Ontology Summarization Based on RDF Sentence Graph

The aim of the presentation was to put forward the idea in [1]. Researchers
at Southeast University, China introduced the novel idea of an RDF Sentence
Graph, which they used for summarizing ontologies in RDF format. Given an
ontology (an RDF graph) along with the desired length of the summary and a
preference, RDF sentences are detected, and a graph is built in which each
node is an RDF sentence. The summarization problem is thereby reduced to
finding salient nodes in the graph. The salient nodes are then re-ranked to
obtain a more appropriate summary. Degree Centrality, Shortest-Path-based
Centrality, Eigenvector Centrality and Weighted HITS are the methods that
were used for finding salience. The workflow is shown in Figure 2.
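
As a rough illustration of the simplest salience measure listed above, the
sketch below computes degree centrality and keeps the top-ranked nodes. It is
not the method of [1]: the graph is assumed here to be undirected and
unweighted, the function names are hypothetical, and the re-ranking step is
left out.

```python
def degree_centrality(graph):
    """graph: dict mapping each node to the set of its neighbours.

    Assumes an undirected, unweighted graph (a simplification).
    """
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    # A node's score is the fraction of the other nodes it is linked to.
    return {node: len(neighbours) / (n - 1)
            for node, neighbours in graph.items()}


def top_k(graph, k):
    """Return the k most salient nodes; k plays the role of summary length."""
    scores = degree_centrality(graph)
    # Highest score first; ties are broken by node name for determinism.
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]
```

For a toy graph of four sentences s1 to s4 in which s1 is linked to all the
others and s2 is linked to s3, top_k with k = 2 returns s1 followed by s2.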

Figure 2: Workflow for Ontology Summarization

0.3 Semantic Browser

Semantic Browser is a tool for browsing semantically connected PubMed
abstracts. The user can traverse the documents along with the RDF generated
from the text. The aim of the application is to evaluate the quality and
authenticity of the RDF that has been created from the text. It has been
built as a web application for platform independence.

0.3.1 Data Storage

The RDF statements are stored as a persistent object with a trie structure.
The abstracts and their PMIDs are indexed using a Lucene index [7].
The persistent object and the indexes are created off-line and stored on the
server.

0.3.2 Data Exchange

The data exchange is done using AJAX. Parameters are passed from the
client side to a JSP, which in turn queries information on the server side.
The data retrieved is converted to XML format by the JSP. The XML data is
parsed through the DOM on the client side and is then styled with CSS to
make it readable to the user.
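
The round trip described above (statements serialized to XML, then read back
through the DOM) can be sketched as follows. The actual application used a
JSP and browser-side JavaScript; this Python sketch only mirrors the idea,
and the element names statements, statement, subject, relation and object
are assumptions, not the format used by the application.

```python
from xml.dom.minidom import Document, parseString

FIELDS = ("subject", "relation", "object")  # hypothetical element names


def statements_to_xml(statements):
    """Serialize (subject, relation, object) triples into an XML string.

    Assumes every value is a non-empty string.
    """
    doc = Document()
    root = doc.createElement("statements")
    doc.appendChild(root)
    for triple in statements:
        stmt = doc.createElement("statement")
        for tag, value in zip(FIELDS, triple):
            child = doc.createElement(tag)
            child.appendChild(doc.createTextNode(value))
            stmt.appendChild(child)
        root.appendChild(stmt)
    return doc.toxml()


def xml_to_statements(xml_text):
    """Parse the XML with the DOM API and recover the triples."""
    dom = parseString(xml_text)
    triples = []
    for stmt in dom.getElementsByTagName("statement"):
        triples.append(tuple(
            stmt.getElementsByTagName(tag)[0].firstChild.data
            for tag in FIELDS
        ))
    return triples
```

Parsing the string produced by statements_to_xml returns the original list of
triples, which is the property the client side relies on before styling the
result.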

0.3.3 Functionality

The entities and relations in the abstract are highlighted. When the user
hovers over an entity (the subject), the corresponding relation and object
of the RDF statement are listed. The PMIDs of the files in which this
statement occurs are also displayed. Two search boxes are provided, one for
PMIDs and the other for keywords. As the user types, suggestions appear in a
drop-down menu.
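
One common way to produce such suggestions is a prefix lookup over a sorted
list of terms. The sketch below shows that idea only; it is an assumption
about how suggestions could be computed, not a description of the browser's
code, and the function name suggest and the sample terms are hypothetical.

```python
from bisect import bisect_left


def suggest(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms that start with `prefix`.

    `sorted_terms` must already be sorted, so that all terms sharing a
    prefix sit next to each other and can be found by binary search.
    """
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if len(matches) == limit or not term.startswith(prefix):
            break
        matches.append(term)
    return matches
```

For the sorted terms gene, gene abnormality, gene mutation, genome and
protein, the prefix "gene" returns the first three terms and the prefix "x"
returns an empty list.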

Figure 3: Phases of Semantic Browser

0.4 Web Portal

I worked on the library web page of Kno.e.sis. The resources were displayed
on the web using a tool named Exhibit, which provides an interface to browse
through the resources. Earlier, it fetched data in JSON format, which was
created manually from the spreadsheets. Now, the data is read directly
from the spreadsheet. The data in the spreadsheet (a Google Spreadsheet) was
cleaned up using a Java library so that every lab member's name appears only
once, irrespective of whether he or she uses initials or the canonical form.
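
The clean-up of name variants can be illustrated with a small sketch. The
Java library actually used is not named in this report, so the heuristic
below is purely an assumption: names are grouped by last name and first
initial, and the longest variant in each group is taken as the canonical
form. Two different people who share a last name and a first initial would be
merged by this heuristic.

```python
def name_key(name):
    """Reduce a non-empty name to (last name, first initial)."""
    parts = name.replace(".", " ").split()
    return (parts[-1].lower(), parts[0][0].lower())


def canonicalize(names):
    """Map every name variant to the longest variant sharing its key."""
    canonical = {}
    for name in names:
        key = name_key(name)
        if key not in canonical or len(name) > len(canonical[key]):
            canonical[key] = name
    return {name: canonical[name_key(name)] for name in names}
```

Given the variants "A. Sheth", "Amit Sheth" and "Amit P. Sheth", all three map
to "Amit P. Sheth", so the name appears only once in the displayed data.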

0.5 Acknowledgements

I am grateful to Prof. Amit Sheth for giving me the opportunity to work in
his lab. I am thankful to Cartic Ramakrishnan for his consistent guidance and
support.

0.6 Conclusion

At the end of the internship, most of my queries about research in the
Semantic Web and its future prospects have been answered. I acquainted
myself with different areas of the Semantic Web by interacting with the lab
members. I have discovered the research topic that I am interested in and
consequently want to pursue a Ph.D. in the same topic.

Bibliography
[1] Xiang Zhang, Gong Cheng, Yuzhong Qu, Ontology Summarization
Based on RDF Sentence Graph, World Wide Web Conference, 2007.
[2] Bush, V., As We May Think, The Atlantic Monthly, 1945. 176(1)
p. 101-108.
[3] Cartic Ramakrishnan, Krys J. Kochut, Amit P. Sheth, A Framework for
Schema-Driven Relationship Discovery from Unstructured Text, ISWC,
2006. p. 583-596.
[4] Marti A. Hearst, Untangling Text Data Mining, Proceedings of ACL,
1999.
[5] Partha Pratim Talukdar, Thorsten Brants, Mark Liberman, Fernando
Pereira, A Context Pattern Induction Method for Named Entity Extraction,
Proceedings of 10th Conference on Computational Natural Language
Learning, June 2006.
[6] Eugene Agichtein, Luis Gravano, Snowball: Extracting Relations from
Large Plain-Text Collections, ACM DL, 2000.
[7] lucene.apache.org
