Seminario
Interdipartimentale di Algoritmica
Monday, June 12, 2006, 12:00 noon
Algorithms for Link-Based Web Spam Detection
Carlos Castillo, University of Rome "La Sapienza"
DIS - Department of Computer and System Sciences
Room C3, second floor
Abstract:
The talk will be about techniques for automating the detection of Web
spam, that is, groups of pages that are linked together with the sole
purpose of obtaining an undeservedly high score in search engines. The
problem of Web spam is widespread and difficult to solve, mostly because
the large size of Web collections makes many algorithms infeasible
in practice.
For spam detection we apply only link-based methods, that is, we study
only the topology of the Web graph, without looking at the contents of
the pages. We compute Web page attributes by applying rank propagation
and probabilistic counting over the Web graph. We have built automated
classifiers comparable to state-of-the-art content-based classifiers
and with better performance than previous link-based Web spam detection
techniques.
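
As a rough illustration of what rank propagation over a Web graph can look like, here is a minimal sketch of a PageRank-style power iteration in Python. It is not the classifier or the feature set presented in the talk. The abstract does not say which propagation scheme is used, so PageRank is assumed here as one well-known example. The function name pagerank, the damping value 0.85, the iteration count and the toy graph are all assumptions made for illustration. The probabilistic counting step is not shown.

```python
from typing import Dict, List

# Adjacency list: page -> pages it links to.
# Assumption: every link target also appears as a key.
Graph = Dict[str, List[str]]


def pagerank(graph: Graph, damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Propagate rank along links with a fixed number of power iterations."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = {v: base for v in nodes}
        for v in nodes:
            if graph[v]:
                # Each page splits its damped rank evenly among its out-links.
                share = damping * rank[v] / len(graph[v])
                for w in graph[v]:
                    nxt[w] += share
        rank = nxt
    return rank


# Hypothetical toy graph: pages s1..s3 exist only to link to "t",
# while "a" and "b" simply link to each other.
toy_graph: Graph = {
    "t": ["s1", "s2", "s3"],
    "s1": ["t"],
    "s2": ["t"],
    "s3": ["t"],
    "a": ["b"],
    "b": ["a"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(toy_graph).items(),
                              key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

In this toy graph the page "t" ends up with the highest score purely because of the pages that point to it, which is the kind of link structure that link-based attributes are meant to expose.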
Joint work with: Luca Becchetti, Debora Donato, Stefano Leonardi and Ricardo Baeza-Yates.