Master of Science in Engineering in Computer Science
Facoltà di Ingegneria dell'Informazione, Informatica e Statistica
Dipartimento di Ingegneria Informatica, Automatica e Gestionale A. Ruberti
Sapienza Università di Roma

Elective in Software and Services

Section "Big Data Management"

2014/2015

prof. Domenico Lembo


Who this course is for. This 3-credit course is one of the sections of the course Elective in Software and Services of the Master of Science in Engineering in Computer Science at Sapienza Università di Roma.

Prerequisites. A good knowledge of the fundamentals of Programming Structures, Programming Languages, Databases (SQL, relational data model, Entity-Relationship data model, conceptual and logical database design) and Database systems.

Course goals. In one sentence, Big Data is data that exceeds the processing capacity of conventional database systems. In particular, Big Data applications deal with huge amounts of data, possibly collected from a huge number of data sources (volume), in highly heterogeneous formats (variety), and at a very high rate (velocity). This scenario calls for new technologies, ranging from new data storage mechanisms to new computing frameworks. In this course we will look at several key technologies used in manipulating, storing, and analyzing big data. In particular, we will study architectures for data-intensive distributed applications, Data Warehouse solutions, and NoSQL storage solutions, including RDF and graph databases.

Lectures

Schedule
  1. Lecture 1: Thursday, October 2, 2014

  2. Lecture 2: Friday, October 10, 2014

  3. Lecture 3: Friday, October 17, 2014

  4. Lecture 4: Friday, October 24, 2014

  5. Lecture 5: Friday, October 31, 2014

  6. Lecture 6: Friday, November 7, 2014

  7. Lecture 7: Friday, November 14, 2014

  8. Lecture 8: Friday, November 21, 2014

  9. Lecture 9: Friday, November 28, 2014

  10. Lecture 10: Friday, December 5, 2014

  11. Lecture 11: Wednesday, December 17, 2014

  12. Lecture 12: Friday, December 19, 2014

Slides

Slides are available at http://elearning2.uniroma1.it/

To access the material, log in to the system with your INFOSTUD account and select the course on Big Data Management.


Exams

There are three modalities available for exams:

(1) Written exam plus an oral examination. The written exam will consist of some open questions on the topics covered by this section of the Elective in Software and Services, and/or some exercises. Students will have 1.5 hours to complete the written exam. The oral examination will essentially be a discussion of the written exam, with possible additional questions. To register for an exam, a student must send an e-mail to lembo@dis.uniroma1.it by the dates indicated in the schedule below. Registered students who decide not to show up at the written exam are strongly invited to send another e-mail to cancel their registration. Details on the oral examination (e.g., date and time) will be communicated during the written exam (if possible, it will be held just after the written exam).

(2) Development of a small project. Students are strongly encouraged to propose their own project ideas. As a suggestion, they can refer to (and also select from) the following list of tools. A project connected to a tool consists, for example, of studying the logical data model(s) adopted by the tool, the native storage data structure it uses, and the query language it provides, and of highlighting further distinguishing features. A demonstration of the basic use of the tool through one or more examples is also required. The project presentation (possibly with slides) should last around 20 minutes, including the demo.

  1. Graph database and RDF tools
    1. Neo4j
    2. AllegroGraph
    3. InfiniteGraph
    4. HyperGraphDB
    5. Virtuoso
    6. OrientDB (it has features of both document and graph DBMSs).
  2. Key-value database tools
    1. Riak
    2. Redis
    3. MemcacheDB
    4. Voldemort
  3. Document database tools
    1. MongoDB
    2. Couchbase
    3. MarkLogic (Enterprise NoSQL)
  4. Column-family database tools
    1. Cassandra
    2. HBase
    3. Hypertable
  5. Data Warehousing tools
    1. Hive
    2. QlikView (a proprietary front-end tool for Business Intelligence. A personal edition can be downloaded for study purposes. Since it is a front-end tool, the student's analysis should focus on the mechanisms the tool provides for data analytics and for multidimensional access to data, rather than on data models or storage data structures).

Note: This kind of project can be developed individually or by groups of two students. In the latter case, the presentation should be split equally into two parts, one given by each member of the group, and the overall presentation time can be extended to 30-40 minutes.

The exam will consist of the project presentation, with possible additional questions on the topics covered by this section of the Elective in Software and Services.

To have a project assigned, students must send an email to lembo@dis.uniroma1.it indicating the kind, number, and title of the project they wish to present (please do not start working on a project before it has been assigned to you).

(3) Article Presentation

The article presentation consists of preparing a 20-minute presentation about one of the scientific papers listed below, or about a paper proposed by the student.

  1. Database Design for NoSQL Systems
  2. Optimizing Joins in a Map-Reduce Environment (the student does not have to study the appendix)
  3. Hive - A Petabyte Scale Data Warehouse Using Hadoop
  4. To be completed....

Note: Article presentations can only be prepared individually.

To have an article assigned, students must send an email to lembo@dis.uniroma1.it indicating the title of the paper they wish to present (please do not start working on a paper before it has been assigned to you).

Note: Both project and paper presentations will preferably be held during office hours (every Thursday afternoon). Students are, however, required to send an email in advance to fix the exact date and time of their presentation.

Note: These exam details refer only to the section on Big Data Management of the course "Elective in Software and Services". Once you have passed the exam for this section, the result will be reported to Prof. Giuseppe Santucci, who is responsible for the course for this academic year. The exam for the overall course "Elective in Software and Services" will be officially recorded (verbalizzato) through the INFOSTUD system only once the student has successfully passed the exams of all the sections of the course (or of the sections included in the student's study plan). For details on this final registration, please refer to the web page of the course "Elective in Software and Services".

Past Exam Evaluation tests (samples for modality (1))

  1. January 28, 2014
  2. February 20, 2014
  3. June 26, 2014