Web Search Systems

Chapter 2
BACKGROUND AND LITERATURE REVIEW
The Web information resources are growing explosively in number and volume but to retrieve relevant information from these resources are becoming more difficult and time-consuming. The way of storing Web information resources is considered a root cause of irrelevancy in information retrieved because they are not stored in machine-understandable organization. It has been estimated [12] that nearly 48 percent to 63 percent irrelevant results are retrieved. That is, the relevant results are in the range of 37 percent to 52 percent, as shown in Table 2.1, which are far from the accuracy. An extension of current web known as Semantic Web, has been envisioned to organize the web content through ontologies in order to enable them machine understandable. To support the theme of semantic web, there is a crucial need of techniques for designing, development, populating and integrating ontologies.
2.1 Semantic Web Vs Current Web

Semantic web may be compared with non-semantic web within several parameters such as content, conceptual perception, scope, environment and resource-utilization. (a) Content: Semantic web encompasses actual content along with its formal semantics. Here, the formal semantics are machine understandable content, generated in logic-based languages such as Web Ontology Language (OWL) recommended by W3C. Through formal semantics of content, computers can make inferences about data i.e., to understand what the data resource is and how it relates to other data. In todays web there is no formal semantics of existing contents. These content are machine-readable but not machine understandable.
Table 2.1: Relevant results (from top 20 results) of search some engines Search Engine Yahoo Google MSN Ask Seekport Precision 0.52 0.48 0.37 0.44 0.37
(b) Conceptual Perception: Current web is just like a book having multiple hyperlinked documents. In book scenario, index of keywords are presented in each book but the contexts in which those keywords are used, are missing in the indexes. That is, there are no formal semantic of keywords in indexes. To check which one is relevant, we have to read the corresponding pages of that book. Same is the case with current web. In semantic web this limitation will be eliminated via ontologies where data is given with well-defined meanings, understandable by machines. (c) Scope: Through literature survey, it has been determined that inaccessible part of the Web is about five hundred times larger than accessible one [13]. It is estimated that there are billion pages of information available on the web, and only a few of them can be reached via traditional search engines. In semantic web formal semantics of data are available via ontologies, and the ontologies are the essential component of semantic web accessible to semantic search engines. (d) Environment: Semantic Web is the web of ontologies having data with formal meanings. This is in contrast to current Web which contains virtually boundless information in the form of documents. The Semantic Web, on the other hand, is about having data as well as documents that machines can process, transform, assemble, and even act on data in useful ways.
(e) Resources Utilization: There are a lot of Web resources that may be very useful in our everyday activities. In current web it is difficult to locate them; because they are not annotated properly by the metadata understandable by machines. In Semantic Web there will be a network of related resources. It will be very easy to locate and to use them in semantic web world. Similarly there are some other criteria factors for comparison between current web and semantic web. For example, information searching, accessing, extracting, interpreting and processing on semantic web will be more easy and efficient; Semantic web will have inference or reasoning capability; network or communication cost will be reduced in the presence of semantic web for the reason of relevant results; and many more - some are listed in the Table 2.2. Table 2.2: Semantic vs. current Web
Sr. No 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. Web Factors Conceptual Perception Content Scope Environment Resource Utilization Inference/Reasoning capability Knowledge Management application support Information searching,accessing,extracting Timeliness,accuracy,transparency information Semantic heterogeneity Ingredients Text simplification and clarification of (Non-Semantic) Web Large hyperlinked book No formal meanings Limited-Probably invisible web excluded Web of documents Minimum-Normal No No Difficult and timeconsuming task Less More -Content -Presentation No Semantic Web Large interlinked database Formally defined Boundless-Probable invisible web included Web of ontologies, data and documents Maximum Yes Yes Easy and Efficient More Less -Content, presentations -Formal Semantics Yes
According to theme of semantic web, if the explicit semantics of web resources are put together with their linguistic semantics and are represented in some logic-based languages then we can handle the limitations of current web. To support this theme of semantic web, W3C recommended some standards [14] such as RDF (Resource Description Framework), RDFS (RDF Schema), OWL (Web Ontology Language), SPARQL (a query language for RDF) and GRDDL (Gleaning Resource Descriptions from Dialects of Languages). RDF is used for data model of semantic web application. Resources are represented through URIs, connected through labeled edges which are also represented through URIs. RDF is represented
through a language called RDFS. There is another more powerful language so- called OWL to represent RDF model. The query language such as SPARQL can be used to querying RDF model. Now the semantic web vision has become a reality [15, 16]. Several semantic web systems have been developed. A subset of these systems is given in Table 2.3. A huge number of ontology based web documents have been published.
Table 2.3: Examples of Semantic Web Systems [17]

SW Systems A Digital Music Archive(DMA) for NRK using SW techniques A linked Open Data Resource List Management Tool for UndergradStds A Semantic Web Content Repository for Clinical Research An Intelligent Search Engine for Online Services for Public Admin. An Ontology of Cantabrias Cultural Heritage Composing Safer Drug Regimens for the Individual Patient using SWT CRUZARAn application of semantic matchmaking for eTourism in the city of Zaragoza Enhancement and Integration of Corporate Social Software Using SW Enhancing Content Search Using SW Geographic Referencing Framework Improving the Reliability of Internet Search Results Using Search Thresher Improving Web Search Using Metadata KDE 4.0 Semantic Desktop Search and Tagging POPSNASAs Expertise Location Service Powered by SW Technologies Prioritization of Biological Targets for Drug Discovery Real Time Suggestion of Related Ideas in the Financial Industry S_Ctnt_Desc to improve discovery Semantic MDR and IR for National Archives Semantic Tags SWT for Public Health Awareness Country Norway UK US Spain Spain US Spain Activity area Broadcastin g ELT, and publishing HC and PI PI and eG PI and museum HC PI and eToursim Util and energy IT industry PI,GIS & eG Web accessibilit y Search Semantic desktop PI Life sciences Financial Telecom PI IT industry HC,PI & eG PI and HC Application area of SWT IS,CD,& DI CD,CM,DI, and SA DI Portal and IS Portal and DI DI and IS Portal and DI SWT used 1,2,3 & IHV RDF,3,1,SK OS &PV 1,2,4,5, and PV 1 and IHV 1 and IHV 1,2,PV, and IHV 1,3,Rules,PV , and IHV 1,3, and PV 1 1,2 & IHV 1 SW technology benefits IS,INR,S & RD (ECR),P, reduced time to market, and S & RD Automation,IM,and IS ECR,INR, and IS ECR and IM S & RD,open model,IS,and DCG Faceted navigation,P,and (S&R D) IS,S & RD, and INR IS and S& RD S&RD and automation IS
France US UK Ireland
DI,CM, and SN Portal and IS DI IS
Spain and US Germany
US US Spain UK Korea Serbia US
Portal,IS,CD, and customization DI,CD,IS,service integration, and SA DI and SN DI and portal CD CD and IS Portal,CD, and SA SA,SN, and DI DI
1,2,RDF,4,P V,and IHV 1,3,2 and RDFS++ 1 and 3 1,3,2,2DL,P V,and IHV I and IHV 1,PV & IHV 1,2,Rules, PV, and IHV 1,3 and PV 1,2,PV & IHV 2,3,and PV
P,open model, and S&RD IS,IM, and open model
Faceted navigation, S&RD, and ECR IS,S&RD,IM,and ECR IS S&RD,open model, and IS IS INR,IS, S&RD,and DCG S&RD and IM
Semantic-based Search and Query China DI,IS and Schema S&RD and IS System for the Traditional Chinese mapping Medicine Community The SW for the Agricultural Domain, Italy PI and eG Portal,IS,SA, and 1,2,SKOS,P IS Semantic Navigation of Food, Nutrition CD V,and IHV and Agricultural Journal The swordfish Metadata Initiative US IT industry Portal and DI 1 and IHV DCG and ECR Better,Faster,Smarter Web Content Twine US IT industry SA,SN,and DI RDF,1,2,3 INR,IS,S&RD,and DCG Use of SWT in Natural Language India IT industry Natural language 1,2,5,3, and IM and ECR interface to Business Applications interface IHV Different abbreviations and integers used in above table are: SWT(Semantic Web Technologies),IHV(In-House Vocabularies),IS(Improved Search),IM(Incremental Modeling),CD(Content Discovery),DI(Data Integration),INR (Identity New Relationships),S&RD(Share and Reuse Data),PV(Public Vocabularies),ECR(Explicit Content Relationships),SA(Semantic Annotation),SN(Social Network),GIS(Geographic Information System) ,ELT(Education, Learning Technology),CM(Content Management),DCG(Dynamic Content Generation),P(Personalization),PI(Public Institution),HC(Health Care),eG(eGovernement),1(RDFS),2(OWL),3(SPARQL),4(GRDDL),5(Rules, Rules(N3))
2.2
The Web Ontology
Ontology formally provides structural knowledge of a domain and its data in the machineunderstandable way in some W3C [18] recommended technologies such as RDFS [19] and OWL [20] for ontology formalization. It is an essential component of any semantic web application. Basically, it is the central component in overall layered architecture of the semantic web, as shown in Figure 2.1. In ontologies, the information-resources are connected in such a way that each is uniquely identified by URI and new information can be drawn through a reasoning process. Basically, the ontology is a special type of network of web-resources. A conceptual view of ontology can be in RDF-graph or in triples form. Consider an example of a persons family ontology as shown in Figure 2.2. Now by definition of ontology, it looks a special kind of a network of information resources. The only information explicitly given in said ontology are isFatherOf, isMotherOf, isBrotherOf, isSisterOf, but we can drive a several types of new information such as isSonOf(3,1) (i.e. Humza is the son of Farooq), isHusbandOF(8,9), isWifeOf(9,8),
isGrandFatherOf(8,3), isGrandmotherOf(9,3) and many more can easily be derived. It can be implemented in some logic-based language such as OWL and conceptually it can be seen as triples or RDF-graph.
Figure 2.1: A layered model of semantic web [21].
10
Figure 2.2: A sample slice of persons family ontology
2.3 Engineering Ontologies for Semantic Web

We have reviewed the literature concerning the adaptation of existing web development methodologies for semantic web applications. The findings disclose that not much work has been done in this direction. XML web engineering methodologies adaptations are proposed in the form of WEESA [22]. It generates semantic annotations by defining a mapping between the XML schemas and existing ontologies. No proper guidelines are there for designing ontology. WEESA cannot directly design or develop ontologies. It focused on mapping of XML schema concepts onto concepts of existing ontology. In [23], the authors have
11
proposed an extension of the Web Site Design Method. In this approach object chunk entities are mapped to concepts in the ontology. OOHDM has extended in the light of the semantic web technologies [24]. Its primitives for conceptual and navigation models have been described as DAML classes and RDFS have been used for domain vocabulary. The Hera [25] methodology has been extended for adaptive web-based application engineering. It uses semantic web standards for models representation. Existing web engineering methodologies as mentioned above, have adapted the use of ontology in their development process, but their main focus is mapping and annotation, using existing ontologies. Design process of new ontologies during semantic web application development is remained lack of focus. No proper guidance is there on how to design ontology for semantic web applications. There are some other methodologies for ontology development as surveyed in [26, 27, 28]. Mostly these methodologies focus on specification and formalization of ontology and do not concentrate on its design phase. These methodologies are based on natural language processing (NLP) and machine learning
techniques. Orientation of these methodologies is web-agents facilitation rather than web-contents formalization. The work on ontology development was boosted when the idea of Semantic Web was envisioned. In KBSI IDEF5 methodology [29], data about domain is collected and analyzed. Then built and fixed strategy is used to create ontology. Ushold and King [30] proposed an ontology development methodology. In this methodology, after identifying purpose of ontology, it is captured and then coded. In MethOntology[31] after preparing ontology specification, knowledge is acquired and analyzed to determine domain terms such as concepts, relations and properties and then formalization is started. After that, evaluation and documentation is performed. In [32] a methodology, based on collaborative approach has been proposed. In its first phase, design criteria for the ontology, specified boundary conditions for the ontology and set of standards for evaluating ontology, are defined. In second phase, the initial version of ontology is produced, and then through
12
iterative process the desired ontology is obtained. Software Built and Fixed approach is followed, which leads to heavy development and maintenance cost. Helena and Joo [33] proposed a methodology for ontology construction in 2004. This method divides ontology construction activities in different steps such as specification, conceptualization, formalization, implementation and maintenance. Knowledge acquisition, evaluation and documentation are performed during each phase. There are some other approaches investigating the transformation of a relational model into an ontological model. In these approaches, the ontology is developed from database schemas. These approaches mainly use reverse engineering theory. In [34], a methodology is outlined for constructing
ontology from conceptual database schemas using a mapping process. The process is carried out by taking into consideration the logical database model of target system. Most of methodologies as overviewed above are based on the built and fixed approach. In that, first initial version of ontology is built and improved iteratively until domain requirements are satisfied. In this way the basic principles of software engineering are not followed properly. These methodologies mainly focus on data during developing process rather than focus on descriptive knowledge. These methodologies mainly work on specification and implementation phases and design phase lacks in proper attention. Moreover the design and implementation phases of these methodologies are difficult to identify and separate. One of the challenges in real world applications is to improve accessing and sharing knowledge that resides in database. Domain knowledge to be shared needs to be formalized using a formal ontology language. Extracting and building web ontology on top of Relational Database (RDB) is a way to represent domain specific knowledge. Moreover, information residing in RDBs is highly engineered for accessibility and scalability [35] and is characterized by high quality and rapidly increasing correspondence to the surface web [36]. In the schema mapping techniques, the idea is to convert RDB schema to ontology based on the predefined schema mapping rules. Various research groups have proposed various techniques.
13
Automapper [37] is a Semantic Web interface for RDB to generate the data source and the respective mapping for ontologies automatically. It uses RDB schema to create OWL ontology and the mapping instance data. The translation process relies on a set of data source-to-domain mapping rules. A processing module named as Semantic Bridge for RDB translates queries to produce OWL ontologies. Automapper [37] is a Semantic Web interface for RDB to generate the data source and the respective ontology mappings automatically. Automapper is an application independent tool that generates a basic ontology from a relational database schema and dynamically produces instance data using ontology. This method quickly exposes the data to the Semantic Web, where a variety of tools and applications are available to support translation, integration, reasoning, and visualization.
The following class descriptions, axioms and restrictions are currently generated by Automapper: maxCardinality is set to 1 for all nullable columns and is used for descriptive purposes minCardinality is set to 1 for all non-nullable columns and is used for descriptive purposes. All datatype and object properties that represent columns are marked as Functional Properties. To ensure global uniqueness and class specificity, these columns are given URIs based on concatenating the table and column names allValuesFrom restriction reflect the datatype or class associated with each column and is used for descriptive purposes. Table 2.4 lists the contents of Departments table Table 2.4: Departments Table
IDa 1 2 3 NAME System Solutions Research and Development Management
a. ID is Primary Key
14
From this schema, Automapper creates the data source ontology and class-specific inverse functional rules, as shown in the Fig.2.3
dsont:Hresources.Departments a owl:Class; rdfs:subClassOf [ a owl:Restriction; owl:onProperty dsont:hresources.departments.ID; owl:allValuesFrom xsd:decimal ], [ a owl:Restriction; owl:onProperty dsont:hresources.departments.ID; 1^^xsd:nonNegativeInteger ]
Figure 2.3: Ontology and inverse functional rules created by Automapper XTR-RTO [38] is an approach to build OWL ontology from eXtensible Markup Language (XML) document. The transform module firstly maps source of XML schema into RDB and next, maps it into OWL ontology. Specific attributes for describing RDB are used such as rdb: DbName, rdb: Relation, rdb: RelationList, rdb: Table and rdb: Attribute. All tables are mapped to instances of type rdb: Relation and consequently will be added to type rdb: RelationList, where as each attribute is mapped to an instance of type rdf: Attribute and an instance of type rdb: hasType. XTR-RTO [38] is an approach to build OWL ontology from eXtensible Markup Language (XML) document. The transformation module firstly maps source of XML schema into RDB and next, maps it into OWL ontology. Specific attributes for describing RDB are used such as rdb: DbName, rdb: Relation, rdb: RelationList, rdb: Table and rdb: Attribute. All tables are mapped to instances of type rdb: Relation and consequently will be added to type rdb: RelationList, where as each attribute is mapped to an instance of type rdf: Attribute and an instance of type rdb: hasType.
15
The entity-relation model is the most popular style for organizing database at present, which can express the relationship between data clearly. Therefore metadata information and structural restrictions extracted from relational database to construct ontologies.The ontology contains: Vocabularies for describing relational database systems such as:rdb:DBName,rdb:Relation, rdb:RelationList,rdb:Table,rdb:Attribute,rdb:PrimaryKeyAttribute , and rdb:ForeignKeyAttribute. Semantic relationships between vocabularies such
as:rdb:hasRelation,db:hasAttribute,rdb:primaryKey,db:hasType and rdb:isNullable. Restrictions on the vocabularies and their semantic relationships such as: each relation has zero or more attributes, and each attribute has exactly one type. Mapping approach used in XTR-RTO is described below: Each table is mapped to an instance of type rdb: Relation and added to type rdb: RelationList. Each attribute is mapped to an instance of type rdb: Attribute and an instance of type rdb: hasType is generated simultaneously. If the attribute is the foreign key, an instance of type rdb: ReferenceAttribute and an instance of type rdb: ReferenceRelation are generated to represent this information. Generate the restriction of each instance of type rdb: Attribute, such as cardinality restriction and foreign key restriction. There is one table in this relational database, as illustrated in Table 2.5 Table 2.5: Book Table in Relational Model
BOOK_ID TITLE AUTHOR PRINTER_ID
Suppose the database is saved in local host, and the OWL ontology describing the database has a namespace http://localhost/book.owl, which can be changed by users. Fig. 2.4 illustrates the relations in the ontology:
16
<?xml version=1.0 encoding=UTF-8> <rdf:RDF> <rdb:DBName rdf:ID=BOOK> <rdb:hasRelation> <rdb:RelationList> <rdb:Relation rdf:resource= http://localhost/book/BOOK/BOOK.owl #BOOK/> </rdb:RelationList> </rdb:hasRelation> </rdb:DBName> </rdf:RDF>
Figure 2.4: Relations in the ontology The OWL description of table BOOK is as shown in the Fig. 2.5
<rdb:Relation rdf:ID=BOOK> <rdb:hasAttribute> <rdb:AttributeList> <rdb:Attributerdf:resource=http://localhost/book/BOOK/BOOK.owl#BOOK_ID/> <rdb:Attribute rdf:resource=http://localhost/book/BOOK/BOOK.owl#TITLE/> <rdb:Attribute rdf:resource=http://localhost/book/BOOK/BOOK.owl#AUTHOR/> <rdb:Attribute rdf:resource=http://localhost/book/BOOK/BOOK.owl#PRINTER_ID/> </rdb:AttributeList> </rdb:hasAttribute> </rdb:Relation> <rdb:PrimarKey rdf:ID=BOOK_ID> <rdb:isNullable>false</rdb:isNullable> <rdb:type>string</rdb:type> </rdb:PrimarKey> <rdb:Attribute rdf:ID=TITLE> <rdb:isNullable>false</rdb:isNullable> <rdb:type>string</rdb:type> </rdb:Attribute> <rdb:Attribute rdf:ID=AUTHOR> <rdb:isNullable>false</rdb:isNullable> <rdb:type>string</rdb:type> </rdb:Attribute> <rdb:Attribute rdf:ID=PRINTER_ID> <rdb:isNullable>false</rdb:isNullable> <rdb:type>string</rdb:type> </rdb:Attribute> <rdb:referenceAttribute> <rdf:Attribute rdf:resource= http://localhost/book/PRINTER/PRINTER.owl #PRINTER#PRINTER_ID/> </rdb:referenceAttribute>
Figure 2.5: OWL description of BOOK Table
17
In RTAXON [39] data mining approach is used. This work proposed learning technique to exploit all database content to identify categorization patterns. These categorization patterns used to generate class hierarchies. This fully formalized method combines a classical schema analysis with hierarchy mining (extraction) in the data. Learning method focuses on concept hierarchy identification by identifying the relation of attributes. Due to the assumption that attribute names have specific role in the relation, this approach identifies lexical clues of attribute names that reveal their specific role in the relation (i.e. classifying the tuples). A learning method that shows how the content of the databases can be exploited to identify categorization patterns. These categorization patterns are then used to generate class hierarchies. This fully formalized method combines a classical schema analysis with hierarchy mining (extraction) in the data. Learning method focuses on concept hierarchy identification by identifying the relation of attributes. Due to the assumption that attribute names have specific role in the relation, this approach identifies lexical clues of attribute names that reveal their specific role in the relation (i.e. classifying the tuples). In example of Fig. 2.6, the categorizing attribute in the Products relation is clearly identified by its name (Category). 2.3.1 Identification of the categorizing attributes: Two sources are involved in the identification of categorizing attributes: names of attributes and data diversity in attribute extensions (i.e. in column data). These two sources are indicators that allow, finding attribute candidates and selecting the most plausible one. a. Identification of lexical clues in attribute names: Attributes bear names that reveal their specific role in the relation (i.e. classifying the tuples) used for categorization. The lexical clue that indicates the role of the attribute can just be a part of the name, as in the attribute names CategoryID or Object Type. In Fig. 2.6, for example, the categorizing attribute in the
18
Products relation is clearly identified by its name (i.e. Category).A list of clues set up and used to perform a first filtering of potential candidates. b. Filtering through entropy-based estimation of data diversity: With an extensive list of lexical clues, the first filtering step appears to be effective. For example, Category column in the Products relation can be used to derive subclasses. However, experiments on complex databases show that this step often identifies several candidates. The selection among the remaining candidates is based on an estimation of the data diversity in the attribute extensions. A good candidate might exhibit some typical degree of redundancy that can be formally characterized using the concept of entropy from information theory. Entropy is a measure of the uncertainty of a data source. Attributes with highly repetitive content will be characterized by low entropy. Conversely, among attributes of a given relation, the primary key will have the highest entropy since all values in its extension are distinct. Informally, the rationale behind this selection step is to favor the candidate that would provide the most balanced distribution of instances within the subclasses. 2.3.2 Generation and population of the subclasses: The generation of subclasses from an identified categorizing attribute can be straightforward. A subclass is derived from each value type of the attribute extension (i.e. for each element of the attribute active domain). However, proper handling of the categorization source may require more complex mappings. Example As illustrated in Fig. 2.6, the derivation applied in this example can be divided into two inter-related. The first part, named (a) in the figure, includes derivations that are motivated by the identification of patterns from the database schema. Each relation (or table) definition from the relational database schema is the source of a class in the ontology. To complete the class definitions, datatype properties are derived from some of the relation attributes. The foreign key relationships are the most reliable source for linking classes and, in this example; each relationship is translated into an object property. The derivations applied to obtain this upper part of the
19
ontology are well covered by current methods and, if applied on this database sample, most of the methods would provide the result of the (a) derivations as final output. However, by having a closer look at the data, additional structuring patterns can be exploited to refine the ontology structure. More particularly, the (b) part of the derivations shows how the Products class can be refined with subclasses derived from the values of Category column in the Products source table. PRODUCTS TABLE
PID 1 2 3 Name Gold Kaviar Tonic Juice Pepper Sauce Supplier 12 24 2 Price 55.30 Category Seafood Beverage Condiments
Product Name, Price
Seafood
Beverage
Condiments
Figure 2.6: An example of a categorization pattern where the categories to be employed for hierarchy generation are further defined in an external relation In [40], Li et al. proposed automatic ontology learning approach which acquires OWL ontology based on RDB schemas which are at least in third normal form (3NF).Since these learning rules depend upon relational database schema, therefore learning rules are applied for classes, properties and property characteristics, class hierarchy, cardinality and instances. Several implementation rules for ontology learning are (1) information from several relations/tables integrated into one class, (2) a class created from relation or table which describes an entity, instead of relationship between relations, and (3) inclusion dependencies identified from instances represented as subsumption relation of classes. Learning OWL ontology from relational database strongly depends on a group of learning rules. According to the target of learning, the rules are organized in five groups: rules for learning classes, properties, hierarchy, cardinality and instances.
20
2.3.3 Rules for learning classes

During learning ontological classes, one class may derive from the information spreading across several relations. These rules integrate the information in several relations into one ontological class, when these relations are used to describe one entity. 2.3.4 Rules for learning properties and property characteristics There are two kinds of properties: object properties and datatype properties. These rules learn object properties according to the reference relationship between relations. In a relational model, relations may be used to indicate the relationship between two relations; in this case, such relation can be mapped into an ontological object property. These rules learn object properties by using the binary relationship between relations. Complex or non-binary (n-ary) relations are not supported by ontology languages and need to be decomposed into a group of binary relations. For each attribute in relations, if it cannot be converted into ontological object property, it can be converted into ontological datatype property. 2.3.5 Rules for learning hierarchy Classes and properties can be organized in a hierarchy. If two relations in database have inheritance relationship, then the two corresponding ontological class or property organized in a hierarchy. 2.3.6 Rules for learning cardinality Ontology defines property cardinality to further specify properties. The property cardinality learned from the constraint of attributes in relations. If attribute is primary or foreign key then the minCardinality and maxCardinality of the property is 1. And if any attribute is declared as NOT NULL, the minCardinality of the property is 1. Furthermore if any attribute is declared as UNIQUE, the maxCardinality of the property is 1. 2.3.7 Rules for learning instances For an ontological class, its instances consist of tuples in relations corresponding to class and relations between instances are established using the information contained in the foreign keys in the database tuples.
21
According to above mentioned rules, ontology constructed based on the data in relational database automatically. 2.3.8 Implementation The overall framework of ontology learning is presented in Fig. 5. The input for the framework is data stored in relational database. The framework uses a database analyzer to extract schema information from database, such as the primary keys, foreign keys and dependencies. Then the obtained information is transferred to ontology generator. The ontology generator will generate ontology based on the schema information and rules. As a last step, user can modify and refine the obtained ontology with the aid of ontology reasoner and ontology editor. The framework is a domain/application independent and can learn ontology for general or specific domains from relational database.Fig.2.7 illustrates ontology construction framework
RDB
RDB Analyzer Ontology Generator Rules Lib Ontology Editor
Ontology Reasoner
Ontology
Ontology Based Application

Figure 2.7: Ontology Construction Framework
22
As this survey shows that each approach defines common rules for mapping basic RDB schema pattern to ontology such as relations, properties, reference keys and cardinalities as shown in the following Table 2.5. Table 2.5: Comparative Analysis of ontology construction approaches
Approach Automapper: Relational Database Semantic Translation using OWL and SWRL [37] Using Relational Database to Build OWL Ontology from XML Data Sources [38] Mapping Creation Automatic Procedure -Construction of ontology from RDB with the help of Configuration file - Construction of ontology from RDB Sources Used -Configuration file -RDB Schema -Mapping rules Mapping Rules Class,datatype property, and object property Limitations -Un-normalized RDB -Un-resolvable URIs
Semi-Automatic
-RDB schema -Mapping rules
rdb:Relation,rdb :Attribute,and rdb:ReferenceAt tribute
Mining the Content of Relational Database to Learn Ontology with Deeper Taxonomies [39] Learning Ontology from Relational Database[40]
Semi-Automatic
-Schema analysis and Hierarchy mining from stored data
- RDB schema - Mapping rules - Stored data
Class,datatype properties,object property, and M:M object property classes, properties, property characteristics, cardinalities and data instances
-Un-normalized RDB -Need domain experts help for extracting cardinality restrictions. -Sometimes, attribute name does not represent its value
Automatic
- Construction of Ontology from RDB without using middle model
-RDB Schema -Mapping rules
-Un-normalized RDB
Ontology built either from scratch using any ontology editor or by leveraging database (document collections) using (semi-) automatic ontology construction. The topic of ontology construction has been receiving growing attention since different application might focus on different aspect. However, the basic idea is to provide vocabularies of concepts and their relationship within a specific domain. Many works have been done in the area of ontology construction through the use of relational database as the data source. As described in Table 2.5, the techniques used in ontology construction from relational database are based on schema mapping and data mining approaches.
23
Automapper [37] is a Semantic Web interface for RDB to generate the data source and the respective mapping ontology automatically. The translation process relies on a set of data source-to-domain mapping rules. These mapping rules depend on well-form relational database schema. In many applications, sometimes well-form relational database schema is not available, thus it is not guarantee for better results in ontology construction process. XTR-RTO [38] uses metadata information extracted from relational database to construct ontology based on predefined schema translation rules. Effectiveness of theses translation rules depends upon wellform relational database schema. Sometimes unavailability of well-designed relational database, results in challenges in ontology construction process. RTXON [39] identifies lexical clues for attribute names in the filtering process. In many applications, attribute names, sometimes do not represent its value (any lexical clue), thus better results cannot be guaranteed in filtering process. The approach used in [40] acquired ontology from relational database automatically by using group of learning rules. These learning rules depend upon relational database schema and in many applications, unavailability of well-formed relational database resulting in inconsistent and incorrect ontology construction.
2.4 Populating Web Ontologies

Web-ontology automatic population is one of the research issues nowadays. There are two stages of web-ontology developments; one is its (schema) creation and second is its population. Although populating web-ontology is not a complicated task, but it is very time consuming, enormous and laborious when it is performed manually. For the success of Semantic Web there is a need of a proper solution for populating ontologies. In current web, mostly the data is available in XML data files, and there is extremely large number of these data resources, containing data in terabytes. To upgrade such data-intensive web
24
applications into semantic web applications, there is need of some proper methodologies for automatic populating of ontologies. Different approaches have been proposed for ontology populating. Some of them are based on natural language processing and machine learning techniques [41] [42]. Junte Zhang and Proscovia Olango [43] presented a novel approach for making and populating ontologies. According to this method domain knowledge about the ontology is collected and domain ontology is constructed by using open source tool called Protg. This domain ontology is transformed to equivalent RDF file and this RDF file is manipulated manually to populate the ontology skeleton created by the Protg. XSLT or Xquery have been used to extract the relevant information from Wikipedia pages into Perls regular expressions and then ontology instances are generated using those expressions. Semantic heterogeneities and inconsistency problems were raised while exporting Wikipedia pages to XML format and those remained unsolved. Web-ontology creation and populating guidelines are provided while developing new semantic web applications using WSDM [44] [45]. Concepts in the ontology are mapped to object chunks manually at conceptual level. This conceptual mapping is used to generate actual semantic web page at implementation level. A similar approached is used in WEESA [46]. This is an adaptation of XML-base web engineering. Web-ontology concepts are mapped to schema elements of XML document. This mapping is defined for each page. Then ontology is populated via tool [47]. In [48] a methodology is proposed for extracting data from web documents for ontology population. This methodology consists of three steps. The first step consists of extracting information in the form of sentences and paragraphs. Web documents are selected by using some search engines or we could select them manually. This information is understood by the system semantically and syntactically and it also understands the relations between the terms of text using some rhetorical structures. For efficient representation of extracted information, XML is used due to its flexibility and abilities for handling data. We proposed ontology populating methodology [49] to populate ontology from data stored in XML data files. This methodology may help in transforming an existing non-semantic
25
web application into semantic web application by populating its web-ontology semi-automatically through a set of transformation algorithms by reducing the time consuming task of ontology population. In [50], a similar work is presented. The proposed methodologies take a web-ontology schema and XML-document as input and produce a populated ontology as output.
2.5 Integrating Web Ontologies

When multiple ontologies are simultaneously used in some integration operations such as merging, mapping and aligning then they may suffer from different types of heterogeneities such as semantic heterogeneity and non-semantic or syntactic heterogeneity [51,52]. The syntactic heterogeneity is due to the use of different languages for the formalization of ontologies e.g., one ontology is formalized in OWL [20] and the other is formalized in DAML [53] language. The semantic heterogeneity consists of terminological, conceptual and contextual heterogeneities. The terminological heterogeneity occurs when different terms are used to represent same concepts or same term is used to represent different concepts. Conceptual heterogeneity between two concepts may occur due to granularities of concepts i.e., when one is sub-concept or super-concept of the other or both are overlapped. Similarly, two concepts are explicitsemantic heterogeneous when they have different roles or functionalities in the similar domain. In ontology model, the taxonomic and non-taxonomic relations of a concept represent its roles and super/sub concepts respectively. Since, in a certain domain an intellectual concept is explicitly defined in term of roles that it keeps, therefore we may call the roles of a concept as its explicit semantics. However, explicit semantics of a non-intellectual concept may be derived from its granularities and its physical attributes because of typical nature of those domains, e.g. in ontology of a furniture domain, concepts like chair, table and desk have no intellectual properties, these concepts have only taxonomic (i.e. parent, child, sibling) and elementary (i.e. color, type, etc.) characteristics.
26

Web Search Systems

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Web Search Systems

Uploaded by

Copyright:

Available Formats

Chapter 2

BACKGROUND AND LITERATURE REVIEW

2.1 Semantic Web Vs Current Web

Table 2.3: Examples of Semantic Web Systems [17]

DI,CM, and SN Portal and IS DI IS

Spain and US Germany

US US Spain UK Korea Serbia US

P,open model, and S&RD IS,IM, and open model

The Web Ontology

Figure 2.1: A layered model of semantic web [21].

Figure 2.2: A sample slice of persons family ontology

2.3 Engineering Ontologies for Semantic Web

Figure 2.5: OWL description of BOOK Table

Product Name, Price

2.3.3 Rules for learning classes

RDB Analyzer Ontology Generator Rules Lib Ontology Editor

Ontology Based Application

-RDB schema -Mapping rules

rdb:Relation,rdb :Attribute,and rdb:ReferenceAt tribute

-Schema analysis and Hierarchy mining from stored data

- RDB schema - Mapping rules - Stored data

- Construction of Ontology from RDB without using middle model

-RDB Schema -Mapping rules

2.4 Populating Web Ontologies

2.5 Integrating Web Ontologies

You might also like