Seminario Interdipartimentale di Algoritmica (Interdepartmental Seminar on Algorithms)

Monday, June 12, 2006, 12:00 noon
Algorithms for Link-Based Web Spam Detection

Carlos Castillo, University of Rome "La Sapienza"

DIS - Department of Computer and System Sciences
Room C3, second floor


Abstract:

The talk will be about techniques for automating the detection of Web spam, that is, groups of pages that are linked together with the sole purpose of obtaining an undeservedly high score in search engines. The problem of Web spam is widespread and difficult to solve, mostly because the large size of Web collections makes many algorithms infeasible in practice.

For spam detection we apply only link-based methods; that is, we study only the topology of the Web graph, without looking at the contents of the pages. We compute Web page attributes by applying rank propagation and probabilistic counting over the Web graph. We have built automated classifiers whose performance is comparable to that of state-of-the-art content-based classifiers and better than that of previous link-based Web spam detection techniques.
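The abstract does not say which rank-propagation or counting schemes are used. As a rough illustration only, the Python sketch below pairs plain PageRank (one common form of rank propagation) with Flajolet-Martin bit-string sketches, a standard probabilistic-counting technique for estimating how many distinct pages lie within a few links of each page without storing those neighborhoods explicitly. The graph representation (a dict mapping each page to the list of pages it links to) and the function names `pagerank` and `estimate_neighborhood` are hypothetical choices made for this example.

```python
import random

PHI = 0.77351  # Flajolet-Martin correction constant


def pagerank(out_links, damping=0.85, iterations=50):
    """Rank propagation sketch: plain PageRank by power iteration.

    out_links maps every node to the list of nodes it links to; every
    link target must also be a key. Dangling nodes (no out-links)
    spread their score uniformly over all nodes.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Teleportation plus redistributed dangling mass.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new[w] += share
        rank = new
    return rank


def _random_mask(rng, bits=32):
    # Set exactly one bit; bit i is chosen with probability 2^-(i+1).
    i = 0
    while i < bits - 1 and rng.random() < 0.5:
        i += 1
    return 1 << i


def _lowest_zero_bit(mask):
    # Index of the lowest bit that is 0 in the mask.
    r = 0
    while mask & (1 << r):
        r += 1
    return r


def estimate_neighborhood(links, distance, num_masks=64, seed=0):
    """Probabilistic counting sketch (Flajolet-Martin style).

    Estimates, for every node v, how many distinct nodes (v included)
    are reachable from v in at most `distance` hops along `links`.
    Pass the reversed graph to count the nodes that reach v instead.
    """
    rng = random.Random(seed)
    masks = {v: [_random_mask(rng) for _ in range(num_masks)] for v in links}
    for _ in range(distance):
        # One hop: OR each node's masks with those of its link targets.
        new = {v: list(m) for v, m in masks.items()}
        for v, targets in links.items():
            for w in targets:
                for k in range(num_masks):
                    new[v][k] |= masks[w][k]
        masks = new
    estimates = {}
    for v, ms in masks.items():
        # Average the lowest-zero-bit index over all masks of v.
        mean_r = sum(_lowest_zero_bit(m) for m in ms) / num_masks
        estimates[v] = (2 ** mean_r) / PHI
    return estimates
```

Each page keeps only a fixed number of small integers, and each hop is a single pass over the links, which is what makes this style of counting attractive on very large graphs. The estimates are noisy and biased upward for very small neighborhoods, so they are best read as approximate sizes.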

Joint work with: Luca Becchetti, Debora Donato, Stefano Leonardi and Ricardo Baeza-Yates.