

PRES LUNAM

École Doctorale STIM
Sciences et Technologies de l'Information et Mathématiques

Speciality: Computer Science
Laboratory: LINA
Team: GDD

Improving Wikipedia with DBpedia Properties
Diego Torres
diego.torres@lifia.info.unlp.edu.ar

Director: Pascal Molli


Co-Director: Alicia Diaz (Arg)

1. Missing Links in Wikipedia


DBpedia extracts information and
stores it in a semantic representation.

Co-supervisor: Hala Skaf-Molli

3. BlueFinder Algorithm
BlueFinder implements the collaborative filtering recommender system approach.
Similarity between pairs of articles is determined by the degree of overlap among the DBpedia types of the articles.
It uses the Jaccard distance function in a k-Nearest Neighbours (kNN) algorithm adapted to the context of DBpedia and PIA.
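As a minimal sketch of the distance used above, the Jaccard distance between two sets of DBpedia types can be computed as follows. The type labels and the `from:`/`to:` role prefixes are assumptions made for illustration; the poster does not say how the types of the two articles in a pair are combined.

```python
def jaccard_distance(types_a, types_b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    a, b = set(types_a), set(types_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical type sets for two (City, Person) pairs. The role prefixes
# are an assumption of this sketch, not taken from the poster.
boston_moore = {"from:City", "from:Place", "to:Person"}
rosario_messi = {"from:City", "from:Place", "to:Person", "to:Athlete"}

distance = jaccard_distance(boston_moore, rosario_messi)
print(distance)
```

A distance of 0 means the two pairs carry exactly the same types, and 1 means they share none.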

BlueFinder
Querying DBpedia, for example, for all pairs (City, Person) where <City> is birthplace of <Person> returns 119,097 pairs, such as:

(Paris, Henri_Alekan)
(Rosario, Lionel_Messi)
(Boston, Robin_Moore)

It is not possible to navigate in Wikipedia from Boston to Robin_Moore.


Around 50,000 other cases have the same problem.

How can missing links be inserted while respecting Wikipedia conventions?

Input: (from, to): the unconnected pair,
K: number of neighbours,
PIAIndex(connectedPairs, pathQueries, E*): pre-computed for a DBpedia property,
maxRecom: limit of recommendations.
Output: set of recommended path queries.
1) Select the K nearest connected pairs to (from, to).
   By applying the Jaccard distance function to the DBpedia types of the articles.
2) Obtain from PIA the path queries that connect the K pairs.
   The path queries that connect similar pairs are better than the others.
3) Filter noise: eliminate path queries related to administrative categories.
   Administrative categories are not accessible to regular Wikipedia users.
4) Apply the star generalization.
   It groups path queries that end with the same category.
5) Return the first maxRecom path queries as recommendations.
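The five steps can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the PIA index is modelled as a plain dictionary from connected pairs to their path queries, path queries are ranked by how many neighbours they connect, the star form (`#from/*/<last category>/#to`) is one possible reading of the star generalization, and the `Category:Hidden_` marker for administrative categories is hypothetical, as are the sample pairs and types.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Jaccard distance between two sets of type labels."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def star_generalize(path_query):
    """Assumed star form: keep the endpoints and the last category,
    and collapse every step in between into a single '*'."""
    steps = path_query.split("/")
    if len(steps) <= 3:
        return path_query  # nothing between #from and the last category
    return "/".join([steps[0], "*", steps[-2], steps[-1]])


def bluefinder(target_types, k, pia_index, pair_types, max_recom,
               admin_markers=("Category:Hidden_",)):
    # 1) Select the K nearest connected pairs by Jaccard distance.
    neighbours = sorted(
        pia_index,
        key=lambda pair: jaccard_distance(target_types, pair_types[pair]),
    )[:k]
    votes = Counter()
    for pair in neighbours:
        # 2) Path queries that connect this neighbour in the PIA index.
        for query in pia_index[pair]:
            # 3) Filter noise: skip administrative categories
            #    (hypothetical marker, see lead-in).
            if any(marker in query for marker in admin_markers):
                continue
            # 4) Group path queries that end with the same category.
            votes[star_generalize(query)] += 1
    # 5) Return the first max_recom path queries (ranking by votes is assumed).
    return [query for query, _ in votes.most_common(max_recom)]


PEOPLE_FROM = "#from/Category:#from/Category:People_from_#from/#to"

# Toy PIA index and type sets, made up for illustration.
pia_index = {
    ("Paris", "Henri_Alekan"): [PEOPLE_FROM, "#from/Category:Hidden_maintenance/#to"],
    ("Rosario", "Lionel_Messi"): [PEOPLE_FROM, "#from/#to"],
    ("Band_X", "Album_Y"): ["#from/Category:Albums_by_#from/#to"],
}
pair_types = {
    ("Paris", "Henri_Alekan"): {"from:City", "to:Person"},
    ("Rosario", "Lionel_Messi"): {"from:City", "to:Person", "to:Athlete"},
    ("Band_X", "Album_Y"): {"from:Band", "to:Album"},
}

# Types of the unconnected pair (Boston, Robin_Moore), also hypothetical.
target_types = {"from:City", "to:Person", "to:Writer"}
recommendations = bluefinder(target_types, 2, pia_index, pair_types, 1)
print(recommendations)
```

In this toy run the two (City, Person) neighbours both vote for the "People from" convention, so it is the single recommendation returned for (Boston, Robin_Moore).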

Wikipedia convention example

* E is the set of edges that relate the connected pairs to the path queries.

#from/Category:#from/Category:People_from_#from/#to

A more general representation

A general path is called a path query.
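To illustrate how a concrete navigation path relates to its path query, the sketch below replaces the titles of the two articles with the `#from` and `#to` placeholders. The function name `to_path_query` and the plain substring replacement are assumptions of this example; a real implementation would need to handle titles that occur inside other words.

```python
def to_path_query(path, from_title, to_title):
    """Generalize a concrete path by substituting the article titles.

    Naive substring replacement: enough for the examples on this poster,
    but it would also rewrite a title that appears inside a longer word.
    """
    steps = path.split("/")
    generalized = [
        step.replace(from_title, "#from").replace(to_title, "#to")
        for step in steps
    ]
    return "/".join(generalized)


paris = to_path_query(
    "Paris/Category:Paris/Category:People_from_Paris/Henri_Alekan",
    "Paris", "Henri_Alekan",
)
rosario = to_path_query(
    "Rosario/Category:Rosario/Category:People_from_Rosario/Lionel_Messi",
    "Rosario", "Lionel_Messi",
)
print(paris)
print(paris == rosario)
```

Both concrete paths collapse to the same path query, which is what makes it a convention that can be reused for other pairs.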


The path queries that best represent connected pairs will also fix unconnected pairs like (Boston, Robin_Moore).

2. Collaborative Filtering Recommender Systems Approach

We ran an evaluation to answer: Is BlueFinder able to fix the unconnected pairs?

Methodology

We eliminated existing links and then observed whether BlueFinder was able to recreate them.
Example

A is birthplace of D

References
1. Torres, D., Molli, P., Skaf-Molli, H., and Diaz, A. From DBpedia to Wikipedia: Filling the gap by discovering Wikipedia conventions. In 2012 IEEE/WIC/ACM International Conference on Web Intelligence (WI'12) (2012).
2. Torres, D., Molli, P., Skaf-Molli, H., and Diaz, A. Improving Wikipedia with DBpedia. In WWW (Companion Volume), A. Mille, F. L. Gandon, J. Misselis, M. Rabinovich, and S. Staab, Eds., ACM (2012), 1107-1112.

Run BlueFinder with (A, D).

Results: BlueFinder is able to fix the unconnected pairs!


[Figure: General values for all scenarios]

2. Prediction for unconnected pairs: For an unconnected pair (e.g. (Boston, Robin_Moore)) of Wikipedia articles, BlueFinder finds the best path queries in the item set, learning from similar connected pairs in the user set.

We evaluated BlueFinder with twenty properties of DBpedia. The evaluation combines the
number of neighbours (k) and the number of recommendation results (maxRecom) to calculate
the value of fixed cases (pot).

We want to predict the utility of path queries for a particular pair of Wikipedia articles, learning from the good path query examples of connected pairs.
1. Building the item and user set: There is one item set and one user set for each DBpedia semantic property (e.g. is birthplace of). Both are pre-computed by the PIA algorithm in the PIA index [1,2].
The PIA index is a bipartite graph that represents the coverage of path queries for a set of pairs of Wikipedia articles that are related by a DBpedia property.
Item set: the set of path queries.
User set: the set of pairs of Wikipedia articles.
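One way to picture the PIA index is as two adjacency maps over the same set of edges, one from pairs (users) to path queries (items) and one in the opposite direction. The sketch below is only an assumed in-memory layout; the function name `build_pia_index` and the sample edges are illustrative and do not come from the PIA algorithm itself.

```python
from collections import defaultdict


def build_pia_index(edges):
    """Build both sides of a bipartite graph from (pair, path_query) edges."""
    pairs_to_queries = defaultdict(set)  # user set -> items
    queries_to_pairs = defaultdict(set)  # item set -> users (coverage)
    for pair, query in edges:
        pairs_to_queries[pair].add(query)
        queries_to_pairs[query].add(pair)
    return pairs_to_queries, queries_to_pairs


PEOPLE_FROM = "#from/Category:#from/Category:People_from_#from/#to"

# Edges for the property "is birthplace of"; the direct-link edge is made up.
edges = [
    (("Paris", "Henri_Alekan"), PEOPLE_FROM),
    (("Rosario", "Lionel_Messi"), PEOPLE_FROM),
    (("Rosario", "Lionel_Messi"), "#from/#to"),
]

pairs_to_queries, queries_to_pairs = build_pia_index(edges)
coverage = {query: len(pairs) for query, pairs in queries_to_pairs.items()}
print(coverage)
```

Here `coverage` counts how many connected pairs each path query links, which is the information the bipartite graph is described as representing.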

If the recommendation recreates the original path, then BlueFinder fixes the connection.
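The check described here can be sketched as a small loop: for each pair whose link was removed, ask the recommender for path queries and count the case as fixed when the original path query is among them. The names `fixed_rate` and `toy_recommend` are hypothetical, the recommender is a stand-in that always returns the same answer, and the data is invented, so the number printed says nothing about BlueFinder's actual results.

```python
def fixed_rate(held_out, recommend):
    """Fraction of held-out pairs whose original path query is recommended."""
    if not held_out:
        return 0.0
    fixed = sum(
        1 for pair, original in held_out if original in recommend(pair)
    )
    return fixed / len(held_out)


PEOPLE_FROM = "#from/Category:#from/Category:People_from_#from/#to"


def toy_recommend(pair):
    # Stand-in recommender for illustration only.
    return [PEOPLE_FROM]


# Each entry: (pair with its link removed, the path query that connected it).
held_out = [
    (("A", "D"), PEOPLE_FROM),
    (("B", "E"), "#from/#to"),
]

rate = fixed_rate(held_out, toy_recommend)
print(rate)
```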

In Wikipedia it is possible to navigate from A to D.



Paris/Category:Paris/Category:People_from_Paris/Henri_Alekan

4. Evaluation and Results

 

Rosario/Category:Rosario/Category:People_from_Rosario/Lionel_Messi

5. Conclusions and Future Work

BlueFinder can fix the missing links between DBpedia and Wikipedia. The evaluation showed a good level of fixed cases.
Further work on the recommendation algorithm is needed to improve the precision values.
We also plan to analyse other similarity measures involving other semantic properties and to extend the evaluation to the whole of DBpedia.
