Master's Degree Programme in Data Science
Faculty of Information Engineering, Informatics and Statistics, Sapienza Università di Roma

Data Management for Data Science -
Homework Assignments

2017/2018

Prof. Riccardo Rosati


Students can present their homework during the lectures. There will be four homework assignments, which will be announced on this web page.

Assignment 1 - SQL

Choose an application domain and, using a relational DBMS, build a database. This can be done in two ways:

The work must be done by groups of two students.

Students can use publicly available DBMSs like MySQL or PostgreSQL (see below), or other, commercial DBMSs.
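As a rough illustration of the kind of work involved, the sketch below builds a tiny database and runs one query with a join and an aggregation. It assumes Python's built-in sqlite3 module purely so that it runs without a server; the library domain, the table names and the data are hypothetical and are not part of the assignment.

```python
import sqlite3

# Hypothetical application domain: a small library (all names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE author (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        year      INTEGER,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
""")
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Eco"), (2, "Calvino")])
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?)", [
    (1, "Il nome della rosa", 1980, 1),
    (2, "Il pendolo di Foucault", 1988, 1),
    (3, "Le città invisibili", 1972, 2),
])

# Join plus aggregation: number of books and earliest publication year per author.
rows = conn.execute("""
    SELECT a.name, COUNT(*) AS n_books, MIN(b.year) AS first_year
    FROM author AS a JOIN book AS b ON b.author_id = a.id
    GROUP BY a.id, a.name
    ORDER BY a.name
""").fetchall()
print(rows)
```

The same schema and query should carry over to MySQL or PostgreSQL with minor changes to the column types.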

The complexity of the queries produced should be at least comparable to the specification appearing in the following exercise on SQL.

The presentation of the work done will consist of a short (10-15 minutes) session in which the student(s) will show the work done by directly interacting with the relational DBMS on their own laptop.

Such presentations will be held during the lecture of April 10, 2018.

Useful links:


Assignment 2 - SQL evaluation and optimization

Starting from the database developed in the first homework, every group has to identify at least 2*n SQL queries (where n is the number of students in the group) that pose performance problems to the DBMS. The students then have to show how one or more of the following techniques make query execution significantly faster on the DBMS: indices, views, integrity constraints, query reformulation, and schema restructuring.
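One possible way to document the effect of an index is to compare the query plan before and after creating it. The sketch below does this with Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN statement; this is an assumption made for the sake of a self-contained example (MySQL and PostgreSQL offer their own EXPLAIN commands), and the orders table, its columns and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# 10,000 synthetic rows spread over 1,000 hypothetical customers.
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)

query = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"


def plan(sql):
    """Return the human-readable detail strings of SQLite's query plan for sql."""
    # The detail text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)  # without an index, the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # with the index, only matching rows are looked up

print(plan_before)
print(plan_after)
```

The plan is expected to change from a full table scan to a search that uses idx_orders_customer; the exact wording of the plan depends on the SQLite version. On a real DBMS, measured execution times on a realistically sized dataset would complement the plans.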

Students can present the second homework either during the lecture of May 7, 2018 or during the lecture of May 14, 2018.


Assignment 3 - NoSQL

Use a NoSQL tool (graph database, column database, key-value database, etc.) to manage and query a dataset. Ideally, the groups should use the same dataset as the one used in the first (and/or second) homework. Examples of such systems include (but are not limited to) MongoDB, Neo4j and GraphDB (see the course material on aggregated databases, graph databases and RDF databases for more details).
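To give an idea of how a relational dataset might be reshaped for an aggregate-oriented (document) store, the sketch below embeds the rows of one table inside the records of another and then queries the result without a join. Plain Python dictionaries stand in for documents here; no MongoDB API is used, and the data, the field names and the to_documents helper are hypothetical.

```python
# Relational shape: two flat tables linked by a foreign key (author_id).
authors = [{"id": 1, "name": "Eco"}, {"id": 2, "name": "Calvino"}]
books = [
    {"id": 1, "title": "Il nome della rosa", "year": 1980, "author_id": 1},
    {"id": 2, "title": "Il pendolo di Foucault", "year": 1988, "author_id": 1},
    {"id": 3, "title": "Le città invisibili", "year": 1972, "author_id": 2},
]


def to_documents(authors, books):
    """Embed each author's books in the author record (one aggregate per author)."""
    return [
        {
            "_id": a["id"],
            "name": a["name"],
            "books": [
                {"title": b["title"], "year": b["year"]}
                for b in books
                if b["author_id"] == a["id"]
            ],
        }
        for a in authors
    ]


documents = to_documents(authors, books)

# Query without a join: titles published before 1985, read inside each aggregate.
early = [
    (d["name"], b["title"])
    for d in documents
    for b in d["books"]
    if b["year"] < 1985
]
print(early)
```

The trade-off worth highlighting in a presentation is that the embedded form answers per-author queries without a join, while queries that cut across aggregates (for example, all books by year) have to traverse every document.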

The work must be done by the same student groups who presented the first and second homework assignments.

The presentation of the work done will consist of a short (at most 10-15 minutes) session in which the student(s) will show the work done by directly interacting with the NoSQL system on their own laptop, highlighting the differences with respect to a standard (SQL) relational database system.

The presentations of the third homework will be held during the lectures of May 28, 2018 and May 29, 2018.