Web Search Systems
Web information resources are growing explosively in number and volume, but retrieving relevant information from them is becoming more difficult and time-consuming. The way Web information resources are stored is considered a root cause of irrelevant retrieval results, because they are not stored in a machine-understandable organization. It has been estimated [12] that nearly 48 to 63 percent of retrieved results are irrelevant; that is, relevant results lie in the range of 37 to 52 percent, as shown in Table 2.1, which is far from acceptable accuracy. An extension of the current Web, known as the Semantic Web, has been envisioned to organize Web content through ontologies in order to make it machine-understandable. To support the theme of the Semantic Web, there is a crucial need for techniques for designing, developing, populating and integrating ontologies.
Table 2.1: Relevant results (from top 20 results) of some search engines

Search Engine   Yahoo   Google   MSN    Ask    Seekport
Precision       0.52    0.48     0.37   0.44   0.37
(b) Conceptual Perception: The current Web is like a book of multiple hyperlinked documents. In the book scenario, an index of keywords is present in each book, but the contexts in which those keywords are used are missing from the index; that is, the keywords in an index carry no formal semantics. To check which occurrence is relevant, we have to read the corresponding pages of the book. The same is the case with the current Web. In the Semantic Web this limitation will be eliminated via ontologies, where data is given well-defined meanings understandable by machines.
(c) Scope: The literature survey determined that the inaccessible part of the Web is about five hundred times larger than the accessible one [13]. It is estimated that billions of pages of information are available on the Web, and only a few of them can be reached via traditional search engines. In the Semantic Web, formal semantics of data are available via ontologies, and ontologies, as the essential component of the Semantic Web, are accessible to semantic search engines.
(d) Environment: The Semantic Web is a web of ontologies holding data with formal meanings. This is in contrast to the current Web, which contains virtually boundless information in the form of documents. The Semantic Web, on the other hand, is about having data as well as documents that machines can process, transform, assemble, and even act on in useful ways.
(e) Resource Utilization: There are many Web resources that may be very useful in our everyday activities. On the current Web it is difficult to locate them, because they are not properly annotated with machine-understandable metadata. The Semantic Web will provide a network of related resources, which will make them easy to locate and use. There are other criteria for comparing the current Web with the Semantic Web. For example, information searching, accessing, extracting, interpreting and processing on the Semantic Web will be easier and more efficient; the Semantic Web will have inference and reasoning capability; and network and communication cost will be reduced because the retrieved results are relevant. Some of these factors are listed in Table 2.2.

Table 2.2: Semantic vs. current Web
Sr. No  Factor                                              (Non-Semantic) Web                        Semantic Web
1.      Conceptual Perception                               Large hyperlinked book                    Large interlinked database
2.      Content                                             No formal meanings                        Formally defined
3.      Scope                                               Limited; invisible web probably excluded  Boundless; invisible web probably included
4.      Environment                                         Web of documents                          Web of ontologies, data and documents
5.      Resource Utilization                                Minimum/Normal                            Maximum
6.      Inference/Reasoning capability                      No                                        Yes
7.      Knowledge Management application support            No                                        Yes
8.      Information searching, accessing, extracting        Difficult and time-consuming              Easy and efficient
9.      Timeliness, accuracy, transparency of information   Less                                      More
10.     Semantic heterogeneity                              More                                      Less
11.     Ingredients                                         Content, presentation                     Content, presentation, formal semantics
12.     Text simplification and clarification               No                                        Yes
According to the theme of the Semantic Web, if the explicit semantics of Web resources are put together with their linguistic semantics and represented in some logic-based language, then we can address the limitations of the current Web. To support this theme, the W3C has recommended standards [14] such as RDF (Resource Description Framework), RDFS (RDF Schema), OWL (Web Ontology Language), SPARQL (a query language for RDF) and GRDDL (Gleaning Resource Descriptions from Dialects of Languages). RDF provides the data model of a Semantic Web application: resources are represented through URIs and connected through labeled edges, which are also identified by URIs. RDF vocabularies can be described with a schema language, RDFS; a more expressive language, OWL, can also be used to describe an RDF model. A query language such as SPARQL can be used to query RDF data. The Semantic Web vision has now become a reality [15, 16]. Several Semantic Web systems have been developed; a subset of these systems is given in Table 2.3, and a huge number of ontology-based Web documents have been published.
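The RDF data model described above can be sketched in a few lines of plain Python: statements are (subject, predicate, object) triples whose terms are URIs, and a SPARQL basic graph pattern amounts to matching triples against a template with variables. The URIs and facts below are illustrative assumptions, not drawn from any real vocabulary.

```python
# Minimal sketch of the RDF triple model; EX is an assumed namespace.
EX = "http://example.org/"

triples = {
    (EX + "Humza", EX + "isSonOf", EX + "Farooq"),
    (EX + "Farooq", EX + "livesIn", EX + "Lahore"),
    (EX + "Lahore", EX + "locatedIn", EX + "Pakistan"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL basic graph pattern."""
    return {(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)}

# Analogous to: SELECT ?s WHERE { ?s ex:isSonOf ex:Farooq }
sons = {s for s, _, _ in match(p=EX + "isSonOf", o=EX + "Farooq")}
print(sons)  # {'http://example.org/Humza'}
```

A real application would of course use an RDF library and a SPARQL engine rather than this toy store; the point is only that the labeled-edge model maps directly onto pattern matching over triples.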
Table 2.3: A subset of Semantic Web systems

System                                                        Country   Sector        Application                  SWT used                   Benefits
Semantic-based Search and Query System for the
Traditional Chinese Medicine Community                        China     -             DI, IS and schema mapping    -                          S&RD and IS
The SW for the Agricultural Domain: Semantic Navigation
of Food, Nutrition and Agricultural Journal                   Italy     PI and eG     Portal, IS, SA, and CD       1, 2, SKOS, PV, and IHV    IS
The swordfish Metadata Initiative: Better, Faster,
Smarter Web Content                                           US        IT industry   Portal and DI                1 and IHV                  DCG and ECR
Twine                                                         US        IT industry   SA, SN, and DI               RDF, 1, 2, 3               INR, IS, S&RD, and DCG
Use of SWT in Natural Language Interface to Business
Applications                                                  India     IT industry   Natural language interface   1, 2, 5, 3, and IHV        IM and ECR

Abbreviations and integers used in the above table: SWT (Semantic Web Technologies), IHV (In-House Vocabularies), IS (Improved Search), IM (Incremental Modeling), CD (Content Discovery), DI (Data Integration), INR (Identify New Relationships), S&RD (Share and Reuse Data), PV (Public Vocabularies), ECR (Explicit Content Relationships), SA (Semantic Annotation), SN (Social Network), GIS (Geographic Information System), ELT (Education, Learning Technology), CM (Content Management), DCG (Dynamic Content Generation), P (Personalization), PI (Public Institution), HC (Health Care), eG (eGovernment), 1 (RDFS), 2 (OWL), 3 (SPARQL), 4 (GRDDL), 5 (Rules (N3)).
2.2 Ontology
An ontology formally provides the structural knowledge of a domain and its data in a machine-understandable way, using W3C-recommended [18] technologies such as RDFS [19] and OWL [20] for formalization. It is an essential component of any Semantic Web application; in fact, it is the central component of the overall layered architecture of the Semantic Web, as shown in Figure 2.1. In an ontology, information resources are connected in such a way that each is uniquely identified by a URI, and new information can be derived through a reasoning process. Basically, an ontology is a special type of network of Web resources, and a conceptual view of an ontology can be given in RDF-graph or triple form. Consider the example of a person's family ontology shown in Figure 2.2. By the definition of ontology, it is a special kind of network of information resources. The only relationships explicitly given in this ontology are isFatherOf, isMotherOf, isBrotherOf and isSisterOf, but several types of new information can be derived, such as isSonOf(3,1) (i.e. Humza is the son of Farooq), isHusbandOf(8,9), isWifeOf(9,8), isGrandFatherOf(8,3), isGrandMotherOf(9,3) and many more. The ontology can be implemented in a logic-based language such as OWL, and conceptually it can be seen as triples or an RDF graph.
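The reasoning step above can be sketched as a rule that chains two explicit facts: isGrandFatherOf(x, z) holds whenever isFatherOf(x, y) and isFatherOf(y, z) hold. The individual "Abdullah" below is an assumed name for illustration; only Farooq and Humza appear in the text.

```python
# Explicit facts, in the spirit of Figure 2.2; "Abdullah" is an
# illustrative grandfather individual, not taken from the figure.
is_father_of = {
    ("Abdullah", "Farooq"),
    ("Farooq", "Humza"),
}

# Rule: isGrandFatherOf(x, z) <- isFatherOf(x, y) AND isFatherOf(y, z)
is_grandfather_of = {
    (x, z)
    for (x, y) in is_father_of
    for (y2, z) in is_father_of
    if y == y2
}
print(is_grandfather_of)  # {('Abdullah', 'Humza')}
```

An OWL 2 reasoner would express the same derivation declaratively, e.g. with a property chain axiom over isFatherOf; the set comprehension here is just the procedural equivalent.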
proposed an extension of the Web Site Design Method. In this approach, object chunk entities are mapped to concepts in the ontology. OOHDM has been extended in light of Semantic Web technologies [24]: its primitives for the conceptual and navigation models have been described as DAML classes, and RDFS has been used for the domain vocabulary. The Hera [25] methodology has been extended for adaptive Web-based application engineering; it uses Semantic Web standards for model representation. The existing Web engineering methodologies mentioned above have adopted ontologies in their development processes, but their main focus is mapping and annotation using existing ontologies. The design of new ontologies during Semantic Web application development has remained out of focus, and there is no proper guidance on how to design an ontology for a Semantic Web application. There are other methodologies for ontology development, as surveyed in [26, 27, 28]. Mostly these methodologies focus on the specification and formalization of an ontology and do not concentrate on its design phase. They are based on natural language processing (NLP) and machine learning techniques, and their orientation is the facilitation of Web agents rather than the formalization of Web content. Work on ontology development was boosted when the idea of the Semantic Web was envisioned. In the KBSI IDEF5 methodology [29], data about the domain is collected and analyzed, and then a build-and-fix strategy is used to create the ontology. Uschold and King [30] proposed an ontology development methodology in which, after identifying the purpose of the ontology, it is captured and then coded. In METHONTOLOGY [31], after preparing the ontology specification, knowledge is acquired and analyzed to determine domain terms such as concepts, relations and properties, and then formalization is started; after that, evaluation and documentation are performed. In [32] a methodology based on a collaborative approach has been proposed. In its first phase, the design criteria for the ontology, the boundary conditions for the ontology and a set of standards for evaluating the ontology are defined. In the second phase, an initial version of the ontology is produced, and then through
an iterative process the desired ontology is obtained. A build-and-fix approach is followed, which leads to heavy development and maintenance cost. Helena and João [33] proposed a methodology for ontology construction in 2004. This method divides ontology construction into steps such as specification, conceptualization, formalization, implementation and maintenance; knowledge acquisition, evaluation and documentation are performed during each phase. There are other approaches that investigate the transformation of a relational model into an ontological model. In these approaches, the ontology is developed from database schemas, mainly using reverse engineering. In [34], a methodology is outlined for constructing an ontology from conceptual database schemas using a mapping process, which is carried out by taking the logical database model of the target system into consideration. Most of the methodologies overviewed above are based on the build-and-fix approach: an initial version of the ontology is built and improved iteratively until the domain requirements are satisfied. In this way, the basic principles of software engineering are not followed properly. These methodologies mainly focus on data during the development process rather than on descriptive knowledge; they mainly work on the specification and implementation phases, while the design phase lacks proper attention. Moreover, their design and implementation phases are difficult to identify and separate. One of the challenges in real-world applications is to improve access to and sharing of the knowledge that resides in databases. Domain knowledge to be shared needs to be formalized using a formal ontology language, and extracting and building a Web ontology on top of a relational database (RDB) is one way to represent domain-specific knowledge. Moreover, information residing in RDBs is highly engineered for accessibility and scalability [35] and is characterized by high quality and a rapidly increasing correspondence to the surface Web [36]. In schema mapping techniques, the idea is to convert an RDB schema to an ontology based on predefined schema mapping rules; various research groups have proposed various such techniques.
Automapper [37] is a Semantic Web interface for RDBs that automatically generates a data source ontology and the respective mappings. It uses the RDB schema to create an OWL ontology and maps the instance data; the translation process relies on a set of data-source-to-domain mapping rules, and a processing module named the Semantic Bridge for RDB translates queries to produce OWL ontologies. Automapper is an application-independent tool that generates a basic ontology from a relational database schema and dynamically produces instance data using that ontology. This method quickly exposes the data to the Semantic Web, where a variety of tools and applications are available to support translation, integration, reasoning, and visualization.
The following class descriptions, axioms and restrictions are currently generated by Automapper:
- maxCardinality is set to 1 for all nullable columns and is used for descriptive purposes.
- minCardinality is set to 1 for all non-nullable columns and is used for descriptive purposes.
- All datatype and object properties that represent columns are marked as functional properties. To ensure global uniqueness and class specificity, these columns are given URIs based on concatenating the table and column names.
- An allValuesFrom restriction reflects the datatype or class associated with each column and is used for descriptive purposes.
Table 2.4 lists the contents of the Departments table.

Table 2.4: Departments Table
ID(a)   NAME
1       System Solutions
2       Research and Development
3       Management

a. ID is the primary key.
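The column-level rules listed above are mechanical enough to sketch directly: for each column, emit an allValuesFrom restriction for its datatype, plus a minCardinality 1 restriction if the column is non-nullable or a maxCardinality 1 restriction if it is nullable. The function below is a simplified illustration of those rules, not Automapper's actual code; the property URI scheme is an assumption modeled on the table-plus-column naming convention quoted above.

```python
# Hedged sketch of Automapper-style restriction generation.
def column_restrictions(table, column, datatype, nullable):
    # URI built by concatenating table and column names (assumed scheme).
    prop = f"dsont:{table.lower()}.{column}"
    restrictions = [
        # allValuesFrom reflects the column's datatype.
        f"[ a owl:Restriction ; owl:onProperty {prop} ; "
        f"owl:allValuesFrom {datatype} ]",
    ]
    if nullable:
        # Nullable column: at most one value.
        restrictions.append(
            f"[ a owl:Restriction ; owl:onProperty {prop} ; "
            f'owl:maxCardinality "1"^^xsd:nonNegativeInteger ]')
    else:
        # Non-nullable column: at least one value.
        restrictions.append(
            f"[ a owl:Restriction ; owl:onProperty {prop} ; "
            f'owl:minCardinality "1"^^xsd:nonNegativeInteger ]')
    return restrictions

for r in column_restrictions("Departments", "ID", "xsd:decimal", nullable=False):
    print(r)
```

Running this for the non-nullable ID column of the Departments table yields the allValuesFrom and minCardinality restrictions shown in Fig. 2.3.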
From this schema, Automapper creates the data source ontology and class-specific inverse functional rules, as shown in Fig. 2.3.
dsont:Hresources.Departments
    a owl:Class ;
    rdfs:subClassOf
        [ a owl:Restriction ;
          owl:onProperty dsont:hresources.departments.ID ;
          owl:allValuesFrom xsd:decimal ] ,
        [ a owl:Restriction ;
          owl:onProperty dsont:hresources.departments.ID ;
          owl:minCardinality "1"^^xsd:nonNegativeInteger ] .
Figure 2.3: Ontology and inverse functional rules created by Automapper

XTR-RTO [38] is an approach to building an OWL ontology from eXtensible Markup Language (XML) documents. Its transformation module first maps the source XML schema into an RDB and then maps the RDB into an OWL ontology. Specific attributes are used for describing the RDB, such as rdb:DbName, rdb:Relation, rdb:RelationList, rdb:Table and rdb:Attribute. All tables are mapped to instances of type rdb:Relation and consequently added to type rdb:RelationList, whereas each attribute is mapped to an instance of type rdb:Attribute and an instance of type rdb:hasType.
The entity-relationship model is currently the most popular style for organizing a database, and it can express the relationships between data clearly. Therefore, metadata information and structural restrictions extracted from the relational database are used to construct the ontology. The ontology contains:
- Vocabularies for describing relational database systems, such as rdb:DBName, rdb:Relation, rdb:RelationList, rdb:Table, rdb:Attribute, rdb:PrimaryKeyAttribute, and rdb:ForeignKeyAttribute.
- Semantic relationships between vocabularies, such as rdb:hasRelation, rdb:hasAttribute, rdb:primaryKey, rdb:hasType and rdb:isNullable.
- Restrictions on the vocabularies and their semantic relationships, such as: each relation has zero or more attributes, and each attribute has exactly one type.
The mapping approach used in XTR-RTO is described below:
- Each table is mapped to an instance of type rdb:Relation and added to type rdb:RelationList.
- Each attribute is mapped to an instance of type rdb:Attribute, and an instance of type rdb:hasType is generated simultaneously.
- If the attribute is a foreign key, an instance of type rdb:ReferenceAttribute and an instance of type rdb:ReferenceRelation are generated to represent this information.
- The restrictions on each instance of type rdb:Attribute, such as cardinality and foreign key restrictions, are generated.
There is one table in this relational database, as illustrated in Table 2.5.

Table 2.5: Book Table in Relational Model
BOOK_ID TITLE AUTHOR PRINTER_ID
Suppose the database is saved on the local host, and the OWL ontology describing the database has the namespace http://localhost/book.owl, which can be changed by users. Fig. 2.4 illustrates the relations in the ontology:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF>
  <rdb:DBName rdf:ID="BOOK">
    <rdb:hasRelation>
      <rdb:RelationList>
        <rdb:Relation rdf:resource="http://localhost/book/BOOK/BOOK.owl#BOOK"/>
      </rdb:RelationList>
    </rdb:hasRelation>
  </rdb:DBName>
</rdf:RDF>
Figure 2.4: Relations in the ontology

The OWL description of the table BOOK is shown in Fig. 2.5.
<rdb:Relation rdf:ID="BOOK">
  <rdb:hasAttribute>
    <rdb:AttributeList>
      <rdb:Attribute rdf:resource="http://localhost/book/BOOK/BOOK.owl#BOOK_ID"/>
      <rdb:Attribute rdf:resource="http://localhost/book/BOOK/BOOK.owl#TITLE"/>
      <rdb:Attribute rdf:resource="http://localhost/book/BOOK/BOOK.owl#AUTHOR"/>
      <rdb:Attribute rdf:resource="http://localhost/book/BOOK/BOOK.owl#PRINTER_ID"/>
    </rdb:AttributeList>
  </rdb:hasAttribute>
</rdb:Relation>

<rdb:PrimaryKeyAttribute rdf:ID="BOOK_ID">
  <rdb:isNullable>false</rdb:isNullable>
  <rdb:type>string</rdb:type>
</rdb:PrimaryKeyAttribute>

<rdb:Attribute rdf:ID="TITLE">
  <rdb:isNullable>false</rdb:isNullable>
  <rdb:type>string</rdb:type>
</rdb:Attribute>

<rdb:Attribute rdf:ID="AUTHOR">
  <rdb:isNullable>false</rdb:isNullable>
  <rdb:type>string</rdb:type>
</rdb:Attribute>

<rdb:Attribute rdf:ID="PRINTER_ID">
  <rdb:isNullable>false</rdb:isNullable>
  <rdb:type>string</rdb:type>
  <rdb:referenceAttribute>
    <rdb:Attribute rdf:resource="http://localhost/book/PRINTER/PRINTER.owl#PRINTER_ID"/>
  </rdb:referenceAttribute>
</rdb:Attribute>
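The table-to-rdb:Relation step behind the figure above is simple to sketch: each table becomes an rdb:Relation whose columns are listed as rdb:Attribute resources. The generator below is an illustrative simplification of XTR-RTO's output for the BOOK table, not the tool's actual implementation; the namespace follows the localhost URI used in the example.

```python
# Assumed namespace, mirroring the running BOOK example.
BASE = "http://localhost/book/BOOK/BOOK.owl#"

def relation_to_rdf(table, columns):
    """Emit a simplified rdb:Relation description for one table."""
    lines = [f'<rdb:Relation rdf:ID="{table}">',
             " <rdb:hasAttribute>",
             "  <rdb:AttributeList>"]
    for col in columns:
        lines.append(f'   <rdb:Attribute rdf:resource="{BASE}{col}"/>')
    lines += ["  </rdb:AttributeList>",
              " </rdb:hasAttribute>",
              "</rdb:Relation>"]
    return "\n".join(lines)

xml = relation_to_rdf("BOOK", ["BOOK_ID", "TITLE", "AUTHOR", "PRINTER_ID"])
print(xml)
```

The output matches the shape of the rdb:Relation block in Fig. 2.5; attribute-level details (nullability, types, foreign keys) would be emitted by analogous per-column rules.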
RTAXON [39] uses a data mining approach. This work proposes a learning technique that exploits the database content to identify categorization patterns, which are then used to generate class hierarchies. The fully formalized method combines a classical schema analysis with hierarchy mining (extraction) in the data. The learning method focuses on concept hierarchy identification by identifying the role of attributes in a relation. Based on the assumption that attribute names have a specific role in the relation, the approach identifies lexical clues in attribute names that reveal their specific role (i.e. classifying the tuples). In the example of Fig. 2.6, the categorizing attribute in the Products relation is clearly identified by its name (Category).

2.3.1 Identification of the categorizing attributes: Two sources are involved in the identification of categorizing attributes: the names of attributes and the data diversity in attribute extensions (i.e. in column data). These two indicators allow finding attribute candidates and selecting the most plausible one.

a. Identification of lexical clues in attribute names: Attributes bear names that reveal their specific role in the relation (i.e. classifying the tuples) used for categorization.
The lexical clue that indicates the role of the attribute can be just a part of the name, as in the attribute names CategoryID or ObjectType. In Fig. 2.6, for example, the categorizing attribute in the
Products relation is clearly identified by its name (i.e. Category). A list of clues is set up and used to perform a first filtering of potential candidates.

b. Filtering through entropy-based estimation of data diversity: With an extensive list of lexical clues, the first filtering step appears to be effective. For example, the Category column in the Products relation can be used to derive subclasses. However, experiments on complex databases show that this step often identifies several candidates. The selection among the remaining candidates is based on an estimation of the data diversity in the attribute extensions. A good candidate should exhibit some typical degree of redundancy, which can be formally characterized using the concept of entropy from information theory. Entropy is a measure of the uncertainty of a data source: attributes with highly repetitive content are characterized by low entropy, whereas, among the attributes of a given relation, the primary key has the highest entropy since all values in its extension are distinct. Informally, the rationale behind this selection step is to favor the candidate that would provide the most balanced distribution of instances within the subclasses.

2.3.2 Generation and population of the subclasses: The generation of subclasses from an identified categorizing attribute can be straightforward: a subclass is derived for each value of the attribute extension (i.e. for each element of the attribute's active domain). However, proper handling of the categorization source may require more complex mappings.

Example: As illustrated in Fig. 2.6, the derivation applied in this example can be divided into two inter-related parts. The first part, labeled (a) in the figure, includes derivations that are motivated by the identification of patterns from the database schema. Each relation (or table) definition from the relational database schema is the source of a class in the ontology.
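The two RTAXON signals just described (a lexical clue in the attribute name, then Shannon entropy of the column values) can be sketched as follows. The clue list and the six-row extension of the Products columns are illustrative assumptions; the original table in Fig. 2.6 has three rows.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy of a column's values, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

# Illustrative column extensions: PID is the primary key (all
# distinct, maximal entropy), Category is repetitive (low entropy).
columns = {
    "PID":      [1, 2, 3, 4, 5, 6],
    "Category": ["Seafood", "Seafood", "Beverage",
                 "Beverage", "Condiments", "Condiments"],
}
clues = ("category", "type", "class")  # assumed clue list

# Step 1: lexical filtering on attribute names.
candidates = [c for c in columns if any(k in c.lower() for k in clues)]
# Step 2: among candidates, favor the lowest-entropy (most repetitive) one.
best = min(candidates, key=lambda c: entropy(columns[c]))
print(best, round(entropy(columns["PID"]), 3), round(entropy(columns[best]), 3))
# Category 2.585 1.585

# One subclass per distinct value of the categorizing attribute.
subclasses = sorted(set(columns[best]))
print(subclasses)  # ['Beverage', 'Condiments', 'Seafood']
```

The primary key's entropy (log2 6 ≈ 2.585 bits) exceeds Category's (log2 3 ≈ 1.585 bits), which is exactly why the entropy criterion rejects key-like columns as categorizers.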
To complete the class definitions, datatype properties are derived from some of the relation attributes. The foreign key relationships are the most reliable source for linking classes and, in this example, each relationship is translated into an object property. The derivations applied to obtain this upper part of the
ontology are well covered by current methods; if applied to this database sample, most methods would provide the result of the (a) derivations as final output. However, by taking a closer look at the data, additional structuring patterns can be exploited to refine the ontology structure. More particularly, part (b) of the derivations shows how the Products class can be refined with subclasses derived from the values of the Category column in the Products source table.

The Products source table of Fig. 2.6:

PID   Name           Supplier   Price    Category
1     Gold Kaviar    12         55.30    Seafood
2     Tonic Juice    24                  Beverage
3     Pepper Sauce   2                   Condiments

Derived subclasses of Products: Seafood, Beverage, Condiments.
Figure 2.6: An example of a categorization pattern where the categories to be employed for hierarchy generation are further defined in an external relation

In [40], Li et al. proposed an automatic ontology learning approach that acquires an OWL ontology from RDB schemas that are at least in third normal form (3NF). Since the learning rules depend upon the relational database schema, rules are applied for classes, properties and property characteristics, class hierarchy, cardinality and instances. Several implementation principles for ontology learning are: (1) information from several relations/tables may be integrated into one class; (2) a class is created from a relation or table that describes an entity, rather than from a relationship between relations; and (3) inclusion dependencies identified from instances are represented as subsumption relations between classes. Learning an OWL ontology from a relational database strongly depends on a group of learning rules. According to the target of learning, the rules are organized in five groups: rules for learning classes, properties, hierarchy, cardinality and instances.
According to the above-mentioned rules, an ontology is constructed automatically from the data in a relational database.

2.3.8 Implementation: The overall framework of ontology learning is presented in Fig. 2.7. The input to the framework is the data stored in a relational database. The framework uses a database analyzer to extract schema information from the database, such as primary keys, foreign keys and dependencies. The obtained information is then transferred to the ontology generator, which generates the ontology based on the schema information and the rules. As a last step, the user can modify and refine the obtained ontology with the aid of an ontology reasoner and an ontology editor. The framework is domain/application independent and can learn ontologies for general or specific domains from a relational database. Fig. 2.7 illustrates the ontology construction framework.
Figure 2.7: Ontology construction framework (RDB → database analyzer → ontology generator → ontology reasoner and editor → ontology)
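The analyzer-then-generator pipeline described above can be sketched with standard database introspection. The snippet below uses SQLite's schema catalog as a stand-in for the database analyzer and applies one simplified rule (each entity table becomes a class, each column a datatype property); the table definition and rule are illustrative assumptions, not Li et al.'s exact rule set.

```python
import sqlite3

# Analyzer input: an in-memory database with one illustrative table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book ("
             " book_id INTEGER PRIMARY KEY,"
             " title TEXT NOT NULL,"
             " printer_id INTEGER REFERENCES Printer(printer_id))")

# Database analyzer: extract table and column metadata from the catalog.
classes = {}
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"):
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Ontology generator (simplified rule): entity table -> class,
    # column -> datatype property named after the column.
    classes[table] = [c[1] for c in cols]  # c[1] is the column name

print(classes)  # {'Book': ['book_id', 'title', 'printer_id']}
```

A fuller generator would also read `PRAGMA foreign_key_list` to turn reference keys into object properties and the `notnull`/`pk` flags into cardinality restrictions, mirroring the rule groups listed above.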
As this survey shows, each approach defines common rules for mapping basic RDB schema patterns, such as relations, properties, reference keys and cardinalities, to ontology constructs, as shown in Table 2.5 below.

Table 2.5: Comparative analysis of ontology construction approaches
Approach: Automapper: Relational Database Semantic Translation using OWL and SWRL [37]
  Mapping creation: Automatic
  Procedure: Construction of ontology from RDB with the help of a configuration file
  Sources used: Configuration file; RDB schema
  Mapping rules: Class, datatype property, and object property
  Limitations: Un-normalized RDB; un-resolvable URIs

Approach: Using Relational Database to Build OWL Ontology from XML Data Sources [38]
  Mapping creation: Semi-automatic
  Procedure: Construction of ontology from RDB
  Sources used: Mapping rules
  Mapping rules: Class, datatype properties, object property, and M:M object property
  Limitations: Un-normalized RDB; needs domain experts' help for extracting cardinality restrictions

Approach: Mining the Content of Relational Database to Learn Ontology with Deeper Taxonomies [39]
  Mapping creation: Semi-automatic
  Limitations: Sometimes the attribute name does not represent its value

Approach: Learning Ontology from Relational Database [40]
  Mapping creation: Automatic
  Mapping rules: Classes, properties, property characteristics, cardinalities and data instances
  Limitations: Un-normalized RDB
An ontology is built either from scratch using an ontology editor or by leveraging a database (or document collection) using (semi-)automatic ontology construction. The topic of ontology construction has been receiving growing attention, since different applications may focus on different aspects; however, the basic idea is to provide vocabularies of concepts and their relationships within a specific domain. Much work has been done on ontology construction using a relational database as the data source. As described in Table 2.5, the techniques used for ontology construction from relational databases are based on schema mapping and data mining approaches.
Automapper [37] is a Semantic Web interface for RDBs that automatically generates the data source ontology and the respective mappings. The translation process relies on a set of data-source-to-domain mapping rules, which depend on a well-formed relational database schema. In many applications a well-formed relational database schema is not available, so good results cannot be guaranteed in the ontology construction process. XTR-RTO [38] uses metadata information extracted from the relational database to construct the ontology based on predefined schema translation rules. The effectiveness of these translation rules likewise depends upon a well-formed relational database schema, and the unavailability of a well-designed relational database poses challenges for ontology construction. RTAXON [39] identifies lexical clues in attribute names during its filtering process. In many applications, attribute names do not represent their values (i.e. they offer no lexical clue), so good results cannot be guaranteed in the filtering process. The approach used in [40] acquires the ontology from the relational database automatically by using a group of learning rules. These learning rules depend upon the relational database schema, and in many applications the unavailability of a well-formed relational database results in inconsistent and incorrect ontology construction.
applications into Semantic Web applications, proper methodologies for automatically populating ontologies are needed. Different approaches have been proposed for ontology population; some are based on natural language processing and machine learning techniques [41, 42]. Junte Zhang and Proscovia Olango [43] presented a novel approach for building and populating ontologies. In this method, domain knowledge for the ontology is collected and the domain ontology is constructed using the open source tool Protégé. The domain ontology is transformed into an equivalent RDF file, and this RDF file is manipulated manually to populate the ontology skeleton created by Protégé. XSLT or XQuery is used to extract the relevant information from Wikipedia pages into Perl regular expressions, and ontology instances are then generated using those expressions. Semantic heterogeneity and inconsistency problems arose while exporting Wikipedia pages to XML format, and these remained unsolved. Web-ontology creation and population guidelines are provided for developing new Semantic Web applications using WSDM [44, 45]: concepts in the ontology are mapped to object chunks manually at the conceptual level, and this conceptual mapping is used to generate the actual semantic Web pages at the implementation level. A similar approach is used in WEESA [46], an adaptation of XML-based Web engineering in which Web-ontology concepts are mapped to the schema elements of an XML document; this mapping is defined for each page, and the ontology is then populated via a tool [47]. In [48] a methodology is proposed for extracting data from Web documents for ontology population. It consists of three steps. The first step extracts information in the form of sentences and paragraphs; the Web documents are selected using search engines or manually. This information is understood by the system semantically and syntactically, including the relations between the terms of the text, using rhetorical structures. For efficient representation of the extracted information, XML is used due to its flexibility in handling data. We proposed an ontology population methodology [49] to populate an ontology from data stored in XML files. This methodology may help in transforming an existing non-semantic
Web application into a Semantic Web application by populating its Web ontology semi-automatically through a set of transformation algorithms, reducing the time-consuming task of ontology population. In [50], similar work is presented. The proposed methodologies take a Web-ontology schema and an XML document as input and produce a populated ontology as output.
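The core transformation behind this kind of XML-driven population can be sketched as follows: element values are extracted from an XML data file and turned into ontology instances represented as triples. The XML tags, the tag-to-property mapping, and the minted instance names below are illustrative assumptions, not the exact algorithms of [49].

```python
import xml.etree.ElementTree as ET

# Illustrative XML data file.
xml_data = """<books>
  <book><title>Semantic Web Primer</title><author>Antoniou</author></book>
  <book><title>Learning SPARQL</title><author>DuCharme</author></book>
</books>"""

# Assumed mapping from XML tags to ontology datatype properties.
mapping = {"title": "ex:hasTitle", "author": "ex:hasAuthor"}

triples = []
for i, book in enumerate(ET.fromstring(xml_data).iter("book")):
    subject = f"ex:Book{i + 1}"  # mint an instance identifier
    triples.append((subject, "rdf:type", "ex:Book"))
    for tag, prop in mapping.items():
        triples.append((subject, prop, book.findtext(tag)))

print(len(triples))  # 6
```

Each `<book>` element yields one typed instance plus one triple per mapped tag; a full methodology would additionally validate the generated instances against the Web-ontology schema supplied as input.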