Who this course is for. This 3-credit course is
one of the sections of the course Elective in Software and Services of
the Master of Science in Engineering in Computer Science at Sapienza
Università di Roma.
Prerequisites. A good knowledge of the fundamentals of
Programming Structures, Programming Languages, Databases (SQL,
relational data model, Entity-Relationship data model, conceptual and
logical database design) and Database systems.
Course goals. In one sentence, Big Data is data that exceeds the
processing capacity of conventional database systems. In particular,
Big Data applications deal with huge amounts of data, possibly
collected from a huge number of data sources (volume), in
highly heterogeneous formats (variety), at a very high rate (velocity).
This scenario calls for new technologies to be developed, ranging from
new data storage mechanisms to new computing frameworks. In this course
we will look at several key technologies used in manipulating, storing,
and analyzing big data. In particular, we will study architectures for
data-intensive distributed applications, Data Warehouse solutions,
and NoSQL storage solutions, including RDF and graph databases.
Lectures
- When: during the first semester (September 29 - December
20, 2014), every Friday from 2:00pm to 3:30pm and, sometimes, on Thursday
from 2:00pm to 3:30pm (check the schedule)
- Where: via Ariosto 25, Roma - classroom A3
Schedule
- Lecture 1: Thursday, October 2, 2014
- Introduction to Big Data Management
- Lecture 2: Friday, October 10, 2014
- Data Warehousing: Introduction and architectures
- Lecture 3: Friday, October 17, 2014
- Data Warehousing: ETL, Multidimensional Model, Accessing Data
Warehouses (Reports, OLAP, Dashboards, Data Mining)
- Lecture 4: Friday, October 24, 2014
- Data Warehousing: ROLAP, MOLAP, Methodology issues,
Dimensional Fact Model (basic constructs)
- Lecture 5: Friday, October 31, 2014
- Data Warehousing: Dimensional Fact Model (advanced constructs),
Logical Models for Data Marts, Star Schema and its variants
- Lecture 6: Friday, November 7, 2014
- Data Warehousing: Views; Logical Design
- Graph databases: introduction; Graph Databases vs. Relational Databases; Regular Path Queries and GDBMSs
- Lecture 7: Friday, November 14, 2014
- Graph Databases: Implementation of Graphs; Types of Graph databases; Resource
Description Framework (introduction)
- Lecture 8: Friday, November 21, 2014
- Graph Databases: Resource Description Framework; SPARQL
- Lecture 9: Friday, November 28, 2014
- Graph Databases: linked data
- NoSQL databases: introduction to aggregate DBs
- Lecture 10: Friday, December 5, 2014
- NoSQL: Aggregate Databases (key-values; document stores; Column-family databases)
- Lecture 11: Wednesday, December 17, 2014
- Presentation of two seminars:
- Luca Garulli (orientechnologies), The Multi-Model NoSQL Approach (http://www.orientechnologies.com/orientdb/)
- Stefano Grossi and Alessandra De Castro (Sogei S.p.A.), Data Discovery: A practical experience. A demo of Qlikview, a tool for OLAP and Business Discovery, was presented during this seminar.
- Lecture 12: Friday, December 19, 2014
- NoSQL: Aggregate Databases (Distribution models; Consistency; Map-Reduce).
Slides
Slides are available at http://elearning2.uniroma1.it/
To access the material, log in to the system with your INFOSTUD
account and select the course on Big Data Management.
Exams
There are three modalities available for exams:
(1) Written exam plus an oral examination. The written exam will consist
of some open questions on the topics covered by this
section of the Elective in Software and Services, and/or some
exercises. Students will have 1.5 hours to complete the written exam. The oral examination will essentially be a
discussion of the written exam, with possible additional questions. To register for an exam, a student must send
an e-mail to lembo@dis.uniroma1.it by the dates indicated in the
schedule below. Registered
students who decide not to show up at the written exam are strongly
invited to send another e-mail to cancel their registration. Details on the
oral examination (e.g., date and time) will be communicated during the
written exam (if possible, it will be held just after the written exam).
- Schedule of final exams
- First written exam: January 30, 2015, 9:30 a.m., room A3. (UPDATED DATE)
Deadline for registration: January 23, 2015
- Second written exam: February 24, 2015, 9:30 a.m., room B203 (second floor) NEW DATE.
Deadline for registration: February 20, 2015.
- Third written exam: June 25, 2015, 10:00 a.m., room A3.
Deadline for registration: June 21, 2015.
- Fourth written exam: July 23, 2015, 9:30 a.m., room A3.
Deadline for registration: July 19, 2015.
- Fifth written exam: September 16, 2015, 9:30 a.m., room A3.
Deadline for registration: September 12, 2015.
(2) Development of a small project. Students are strongly encouraged to propose their own ideas for projects. As a suggestion, they can refer to (and also select from) the following list of tools. A project connected to a tool consists, for example, of studying the logical data model(s) adopted by the tool, the native storage data structure it uses, and the query language it provides, and of highlighting further distinguishing features. A demonstration of the basic use of the tool through one or more examples is also required. The project presentation (possibly with slides) should last around 20 minutes (including the demo).
- Graph database and RDF tools
- Neo4j
- Allegrograph
- InfiniteGraph
- HyperGraphDB
- Virtuoso
- OrientDB (it has features of both document and graph DBMSs).
- key-value database tools
- Riak
- Redis
- MemcachedDB
- Voldemort
- document database tools
- MongoDB
- Couchbase
- MarkLogic (Enterprise NoSQL)
- column-family database tools
- Cassandra
- Hbase
- Hypertable
- Data Warehousing tools
- Hive
- Qlikview (a
proprietary front-end tool for Business Intelligence. A personal
edition can be downloaded for study purposes. Since it is a front-end
tool, the student's analysis should focus on the mechanisms
provided by the tool for data analytics and for multidimensional
access to data, rather than on data models or storage data structures).
Note: Projects of this kind can be developed individually or
by groups of two students. In the latter case, the
presentation should be split equally into two parts, one given by
each member of the group, and the overall presentation time
can be extended to 30-40 minutes.
The exam will consist of the project presentation, with possible additional questions on the
topics covered by this
section of the Elective in Software and Services.
To have a project assigned, students must send an e-mail to
lembo@dis.uniroma1.it
indicating the kind, number, and title of the project they wish
to present (please do not start working on a project before it has
been assigned to you).
(3) Article Presentation
An article presentation consists of preparing a 20-minute presentation about
one of the scientific papers listed below or proposed by students.
- Database Design for NoSQL Systems
- Optimizing Joins in a Map-Reduce Environment (the student does not have to study the appendix)
- Hive - A Petabyte Scale Data Warehouse Using Hadoop
- To be completed....
Note: Article presentations can only be prepared individually.
To have an article assigned, students must send an e-mail to
lembo@dis.uniroma1.it
indicating the title of the paper they wish
to present (please do not start working on a paper before it has
been assigned to you).
Note: Both project and paper presentations will preferably be
held during office hours (every Thursday afternoon). Students
are, however, required to send an e-mail in advance to fix the exact date
and time of their presentation.
Note: These exam details refer only to the
section on Big Data Management of the course "Elective in Software and
Services". Once you have passed the exam for this section, the result will be
communicated to Prof. Giuseppe Santucci, who is responsible for the
course for this academic year. The exam for the overall course
"Elective in Software and Services" will be officially recorded
(verbalizzato) through the INFOSTUD system only once the student
has successfully passed the exams for all the sections of the course
(or for the sections included in the student's study plan). For details
on this final registration, please refer to the web
page of the course "Elective in Software and Services".
Past Exam Evaluation tests (samples for modality (1))
- January 28, 2014
- February 20, 2014
- June 26, 2014