Seminario Interdipartimentale di Algoritmica
 

Monday, May 4th, 2009, 12:00 noon
Compression and caching for efficient web retrieval
Flavio Chierichetti, "Sapienza" - University of Rome
   
DIS - Department of Computer Engineering, Via Ariosto 25
Room B1, ground floor

Abstract

Web search engines use indexes to efficiently retrieve pages containing specified query terms. The problem of compressed indexes that permit such fast retrieval has a long history. We consider [1] the problem: assuming that the terms in a page are generated from a probability distribution, how well can we build such indexes that allow fast retrieval? Of particular interest is the case when the probability distribution is Zipfian (or similar power laws), since these are the distributions that arise on the web.

Then, motivated by contextual advertising systems and other web applications involving efficiency-accuracy tradeoffs, we introduce the ``similarity caching'' problem. Here, a cache hit is said to occur if the requested item is similar but not necessarily equal to some cached item. We study this problem both experimentally [2] and theoretically in the framework of competitive algorithms [3].
[1]: Compressed Web Indexes
Joint work with Kumar, Raghavan
(To appear in WWW 2009)
[2]: Nearest-Neighbor Caching for Content-Match Applications
Joint work with Pandey, Broder, Josifovski, Kumar, Vassilvitskii
(To appear in WWW 2009)
[3]: Similarity Caching
Joint work with Kumar, Vassilvitskii
(To appear in PODS 2009)