Information Integration (academic year 2018/2019)

This is one of the two sections of the course Large Scale Data Management. The lectures of this section will be held in March-May 2019.

Who this course is for. This 3-credit course is one of the two sections of the course Large Scale Data Management for the students of the Master in Engineering of Computer Science (School of Engineering) of Sapienza Università di Roma.
Prerequisites. A good knowledge of the fundamentals of Programming Structures, Programming Languages, Databases (SQL, relational data model, Entity-Relationship data model, conceptual and logical database design) and Database systems is required, as well as a basic knowledge of Mathematical Logic.
Course goals. Information integration is the problem of combining data residing at different sources and providing the user with a unified view of these data. The problem of designing information integration systems is important in current real-world applications, and is characterized by a number of issues that are interesting from both a theoretical and a practical point of view. In recent years there has been a huge amount of research on data integration, and a precise, clear picture of a systematic approach to the problem is now available. This section will present an overview of the research carried out in the area of data integration, with emphasis on the theoretical results that are relevant for the development of information integration solutions. Special attention will be devoted to the following aspects: architectures for information integration, modeling an information integration application, ontology-based data access and integration, processing queries in information integration, data exchange, and reasoning on queries.

  • News
    • February 19, 2019 The lectures of the course will start on Thursday, February 28, at 10:00am in classroom A5.
  • Topics covered
    • Architectures for information integration
    • Distributed data management
    • Data federation
    • Data exchange and data warehousing
    • ETL (Extraction, Transformation and Loading), data cleaning and data reconciliation
    • Data integration
    • Ontology-based data integration
  • Teaching material
    • Before the beginning of the lectures, students are invited to (re)study the basic notions of propositional and first-order logic. For this purpose, students may use the material they used in previous courses, or have a look at:
    • Slides
      The lecture notes can be downloaded from the course page in Moodle.

    • Book
      A good book on information integration is: Principles of data integration, by AnHai Doan, Alon Halevy, Zachary Ives.

    • Papers
      This is a list of papers that students can read if they are interested in specific topics:

      • Reasoning about schema mapping
        • Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, Wang Chiew Tan. Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Database Syst. 30(4): 994-1055 (2005)
        • Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, Wang Chiew Tan: Characterizing schema mappings via data examples. ACM Trans. Database Syst. 36(4): 23 (2011)
      • Query answering
        • Alon Y. Levy, Alberto O. Mendelzon, Yehoshua Sagiv, Divesh Srivastava. Answering Queries Using Views. PODS 1995: 95-104
        • Rachel Pottinger, Alon Halevy. MiniCon: A scalable algorithm for answering queries using views. The VLDB Journal, Volume 10, Issue 2-3 (September 2001)
        • Oliver M. Duschka, Michael R. Genesereth, Alon Y. Levy. Recursive Query Plans for Data Integration. J. Log. Program. 43(1): 49-73 (2000)
        • George Konstantinidis, José Luis Ambite. Scalable query rewriting: a graph-based approach. SIGMOD '11: Proceedings of the 2011 International Conference on Management of Data
      • Probabilistic data integration
        • Xin Luna Dong, Alon Y. Halevy, Cong Yu. Data integration with uncertainty. VLDB J. 18(2): 469-500 (2009)
      • Query answering under inconsistencies
        • Andrea Calì, Domenico Lembo, Riccardo Rosati. On the decidability and complexity of query answering over inconsistent and incomplete databases. PODS 2003: 260-271
        • Marcelo Arenas, Leopoldo E. Bertossi, Jan Chomicki. Consistent Query Answers in Inconsistent Databases. PODS 1999: 68-79
        • Balder ten Cate, Gaëlle Fontaine, Phokion G. Kolaitis: On the Data Complexity of Consistent Query Answering. Theory Comput. Syst. 57(4): 843-891 (2015)
      • Data cleaning and reconciliation
        • Douglas Burdick, Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, Wang Chiew Tan: Expressive Power of Entity-Linking Frameworks. ICDT 2017: 10:1-10:18
        • Anja Gruenheid, Xin Luna Dong, Divesh Srivastava: Incremental Record Linkage. PVLDB 7(9): 697-708 (2014)
      • Ontology-based data integration
        • Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez-Muro, Riccardo Rosati: Ontologies and Databases: The DL-Lite Approach. Reasoning Web 2009: 255-356
  • Exams
    The rules for the exams will be published shortly.

  • Schedule of exams
    • First exam: June 2019
    • Second exam: July 2019
    • Third exam: September 2019
    • First special session: October 2019
    • Fourth exam: January 2020
    • Fifth exam: February 2020
    • Second special session: April 2020
  • Lectures
    • When: Monday, 9:00am - 11:00am, and Thursday, 2:00pm - 5:00pm, starting from February 26, 2018.

    • Where: Classroom A5, via Ariosto 25, Roma
    • Schedule

      Time slots: Thursday (8:00am - 10:00am) and Thursday (10:00am - 01:00pm), both in classroom A5

      Week 01 (Feb 25): Lectures 1,2,3
        - Introduction to information integration
        - Propositional logic: syntax and semantics
      Week 02 (Mar 04)
      Week 03 (Mar 11): Lectures 4,5,6
        - First-order logic
        - Relationship between logic and data management
      Week 04 (Mar 18): Lectures 7,8,9
        - The various forms of information integration
        - Logical formalization of data integration
      Week 05 (Mar 25)
      Week 06 (Apr 01)
      Week 07 (Apr 08)
      Week 08 (Apr 15)
      Week 09 (Apr 22)
      Week 10 (Apr 29)
      Week 11 (May 06)
      Week 12 (May 13)
      Week 13 (May 20)
      Week 14 (May 27)

  • Past editions
  • Office hours. Tuesday, 5:00 pm, at the Dipartimento di Informatica e Sistemistica "Antonio Ruberti",
    via Ariosto 25, Roma, second floor, room B203 (if available) or room B217 (otherwise). Please check the
    last-minute news for the next office hours.