Large-scale dependable distributed systems - Academic Year 2014-2015

Lecturer: Leonardo Querzoni
CFUs: 3
Lecture hours:
    1st semester:
    Friday 12:00-13:30, room A4.

The success of the *-as-a-service business model has recently shifted the demand for reliable distributed architectures towards previously unseen scales. Modern cloud platforms are the main result of years of research in the area of dependable distributed systems. Yet the design of their internal architectures has pushed researchers to find new solutions to well-known problems in order to withstand the sheer scale and the demand for elasticity that characterize cloud scenarios.
This course analyzes current trends in the design of large-scale dependable distributed systems, which form the basis of cloud platforms. The course will focus on the following topics:

  • Resilience to Byzantine faults (including Byzantine-resilient state machine replication)
  • Storage in dynamic settings
  • Event and stream processing

Notes:

No office hours from December 12th, 2014 to January 11th, 2015.

Lectures:

October 3rd, 2014 - Intro, a short recap on dependability.
October 10th, 2014 - RSMs, Distributed consensus and Paxos.
October 17th, 2014 - Paxos, Byzantine consensus.
October 24th, 2014 - BFT.
October 31st, 2014 - BFT.
November 7th, 2014 - BFT.
November 14th, 2014 - BFT.
November 21st, 2014 - Intro to Data Stream Processing.
November 28th, 2014 - DSP with Storm.
December 5th, 2014 - Scalability and elasticity in DSP systems.
December 12th, 2014 - Dynamic distributed systems.

Slides:

The password for accessing the following PDFs is "eds".
Introduction - 1 - 2 - 3 - 4 - 5

Exam rules

Instructions (Please read carefully and contact me for further information)
Suggested topics
Paper templates in LaTeX and Word.

Useful links:
  1. A. Avizienis, J.-C. Laprie, B. Randell and C. Landwehr. Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. on Dependable and Secure Computing, vol. 1, no. 1, pp. 11–33, 2004.
  2. L. Lamport, R. Shostak and M. Pease. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4, 3 (July 1982), 382-401.
  3. F. Schneider. Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. ACM Comp. Surveys, vol. 22, n. 4, 1990.
  4. L. Lamport. The Part-Time Parliament. ACM Transactions on Computer Systems, vol. 16, n. 2, pp.133-169, 1998.
  5. L. Lamport. Paxos Made Simple. ACM SIGACT News vol. 32, n. 4, pp. 51-58, 2001.
  6. M. Castro and B. Liskov. Practical Byzantine fault-tolerance and proactive recovery. ACM Transactions on Computer Systems (TOCS), vol. 20, n. 4, November 2002.
  7. R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: speculative Byzantine fault tolerance. Proc. of 21st ACM SIGOPS symposium on Operating systems principles (SOSP '07). ACM, New York, NY, USA, 45-58, 2007.
  8. J. Yin, J-Ph. Martin, A. Venkataramani, L. Alvisi, and M. Dahlin. Separating agreement from execution for Byzantine fault tolerant services. SIGOPS Oper. Syst. Rev. 37, 5 (October 2003), 253-267.
  9. Allen Clement, Edmund L. Wong, Lorenzo Alvisi, Michael Dahlin, Mirco Marchetti. Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults. NSDI 2009.
  10. Rachid Guerraoui, Nikola Knezevic, Vivien Quéma, Marko Vukolic. The next 700 BFT protocols. EuroSys 2010.
  11. Nathan Marz. Storm - Distributed and fault-tolerant realtime computation (from berkeley.edu)
  12. Dario Simonassi, Gabriel Eisbruch, Jonathan Leibiusky. Realtime Processing with Storm (from oreilly.com)

Many of these papers are freely available. Those that require an active subscription can be downloaded from computers connected through the proxy installed at La Sapienza. Check the BIXY service (in Italian), or contact me for further details.