Seminario Interdipartimentale di Algoritmica  

Monday, July 11, 2005, 12:00 noon
Counting Triangles in Data Streams

Luciana Buriol, Università di Roma "La Sapienza"

DIS - Department of Computer and System Sciences
C3 Room, second floor

Abstract:

Data stream algorithms aim to maintain the underlying information of a stream of data, using small memory space. Their main applications are in cases where the data is not stored and must be processed on the fly, and when the exact computation uses excessive space or time. In this work, we propose data stream algorithms for approximately counting the number of triangles of a graph. Counting the exact number of triangles of a graph is time and space consuming. Our algorithms are based on sampling techniques designed for the case when the graph is read as a list of edges in arbitrary order and for the case when the graph is read as an incidence stream (sequence of the adjacency lists of the nodes). In both cases we provide an ($1+\epsilon$)-approximation of the number of triangles with high probability that uses sublinear space. Our algorithms reduced considerable the space bounds with respect to previous results presented so far in the literature, besides of being much simpler to understand and to implement. Using analogous methods, we propose algorithms for computing the number of bipartite cliques of small size for directed networks. Counting the number of triangles of a graph is a fundamental problem in network analysis, e.g., it can be used to compute the clustering coefficient of a network. When considering the graph formed by the hyperlinked structure of the web, the clustering coefficient gives an indication of the presence of communities, that is an important measure performed in such kind of graphs. We also present experimental results using the proposed algorithms applied in a set of networks and show that in practice the algorithms provide results very close to the optimum.