Seminario Interdipartimentale di Algoritmica
 
 
 

Monday, 19 January 2004, 11:00
UbiCrawler: a scalable fully distributed web crawler
Massimo Santini
Università di Modena e Reggio Emilia

DIS - Dipartimento di Informatica e Sistemistica, via Salaria 113
Room C2, second floor

Abstract:
This talk will report on our experience implementing UbiCrawler, a
scalable distributed web crawler, in the Java programming
language. The main features of UbiCrawler are platform independence,
linear scalability, graceful degradation in the presence of faults,
a very effective assignment function (based on consistent hashing)
for partitioning the domain to crawl, and, more generally, the
complete decentralization of every task. The need to handle
very large data sets has highlighted some limitations of the Java
APIs, which prompted the authors to partially reimplement them.
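
As a rough illustration of the idea behind a consistent-hashing assignment
function, the following minimal sketch maps host names to crawling agents
on a hash ring. It is a generic textbook construction, not UbiCrawler's
actual code: the class name ConsistentHashAssigner, its methods, the use
of MD5, and the number of replicas per agent are all assumptions made for
this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: assigns each host to an agent via a hash ring.
class ConsistentHashAssigner {
    // Ring positions (64-bit hashes) mapped to the agent owning that position.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    // Number of ring positions per agent (an assumed tuning parameter).
    private final int replicas;

    ConsistentHashAssigner(int replicas) {
        this.replicas = replicas;
    }

    // Takes the first 8 bytes of an MD5 digest as a 64-bit ring position.
    private static long hash(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Places the agent's replicas on the ring.
    void addAgent(String agent) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(agent + "#" + i), agent);
        }
    }

    // Removes only the positions still owned by this agent.
    void removeAgent(String agent) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(hash(agent + "#" + i), agent);
        }
    }

    // Returns the agent at the first ring position at or after the host's
    // hash, wrapping around to the start of the ring if necessary.
    String agentFor(String host) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no agents registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(host));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ConsistentHashAssigner assigner = new ConsistentHashAssigner(100);
        assigner.addAgent("agent-0");
        assigner.addAgent("agent-1");
        assigner.addAgent("agent-2");
        System.out.println(assigner.agentFor("www.example.org"));
    }
}
```

The property that makes this kind of scheme attractive for a decentralized
crawler is that when an agent leaves, only the hosts it was responsible for
are reassigned; every other host keeps its agent, so each agent can compute
the assignment locally without central coordination.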


SIA