Seminario Interdipartimentale di Algoritmica
 
 

Monday, November 13, 2006  12:00 noon
Efficient and Decentralized PageRank Approximation in a Peer-to-Peer Web Search Network
Debora Donato
Yahoo Research, Barcelona

DI - Department of Computer Science
Seminar Room, third floor

Abstract:

PageRank-style (PR) link analyses are a cornerstone of Web search engines and Web mining, but they are computationally expensive. Recently, various techniques have been proposed for speeding up these analyses by distributing the link graph among multiple sites, typically, the sites that own the corresponding pages. However, none of these advanced methods is suitable for a fully decentralized PR computation in a peer-to-peer (P2P) network with autonomous peers, JXP is an algorithm for dynamically and collaboratively computing PR scores of Web pages that are arbitrarily distributed in a P2P network. The algorithm runs at every peer, and it works by combining locally computed PR scores with random meetings among the peers in the network. It is scalable as the number of peers on the network grows, and experiments as well as theoretical arguments show that, after a certain number of peer meetings, JXP scores converge to the true PR scores that one would obtain by a centralized computation. A mathematical framework for the optimized version of JXP is also given, where important properties and the convergence proof of the algorithm are presented.