Distributed Models, MapReduce and Large-Scale Algorithms
April 15th 2019, 10.00 - 14.00, Room B203, DIAG, Via Ariosto 25.
April 16th 2019, 9.00 - 13.00, Room B203, DIAG, Via Ariosto 25.
As a fundamental tool for modeling and analyzing real-world data, large-scale algorithms are a central part of any tool set for big data analysis. Processing datasets with hundreds of billions of entries is only possible by developing distributed algorithms in frameworks such as MapReduce, Pregel, Giraph, and the like. For these distributed algorithms to work well in practice, we need to take into account several metrics, such as the number of rounds of computation and the communication complexity of each round. For example, given the popularity and ease of use of the MapReduce framework, developing practical algorithms with good theoretical guarantees for basic algorithmic primitives is a problem of great importance. In this course, we discuss how to design and implement algorithms based on the traditional MapReduce architecture. In this regard, we discuss various basic algorithmic problems such as computing connected components, maximum matching, MST, triangle counting, clustering, and diversity maximization. In particular, we discuss a computation model for MapReduce and describe the sampling-and-filtering and coreset techniques for developing efficient algorithms in this framework.
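As an illustration of how round complexity enters the picture, here is a minimal sketch (not from the course material) of connected components as iterated MapReduce rounds: each vertex repeatedly adopts the smallest label seen among itself and its neighbors, so the number of rounds grows with the graph diameter. The `mapreduce_round` helper and all names are hypothetical, simulating the map-shuffle-reduce cycle in plain Python.

```python
from collections import defaultdict

def mapreduce_round(records, mapper, reducer):
    """Simulate one MapReduce round: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, val in mapper(rec):   # map phase emits (key, value) pairs
            groups[key].append(val)    # shuffle groups values by key
    out = []
    for key, vals in groups.items():   # reduce phase, one call per key
        out.extend(reducer(key, vals))
    return out

def cc_mapper(rec):
    # A vertex record is (vertex, current_label, neighbors).
    v, label, nbrs = rec
    yield v, (label, nbrs)             # keep own label and adjacency
    for u in nbrs:
        yield u, (label, None)         # send current label to each neighbor

def cc_reducer(v, values):
    labels = [l for l, _ in values]
    nbrs = next(n for _, n in values if n is not None)
    yield v, min(labels), nbrs         # adopt the smallest label seen

def connected_components(adj):
    """Iterate rounds until labels stabilize; return labels and round count."""
    state = [(v, v, tuple(ns)) for v, ns in adj.items()]
    rounds = 0
    while True:
        new_state = mapreduce_round(state, cc_mapper, cc_reducer)
        rounds += 1
        if {(v, l) for v, l, _ in new_state} == {(v, l) for v, l, _ in state}:
            break                      # no label changed: components found
        state = new_state
    return {v: l for v, l, _ in state}, rounds
```

On a path graph the number of rounds is proportional to the path length, which is exactly the kind of cost that a computation model for MapReduce makes explicit; the course's sampling-and-filtering and coreset techniques aim at algorithms that finish in far fewer rounds.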