# Information Integration (academic year 2011/2012)

This is one of the sections of the course Elective in Software and Services (Complementi di software e servizi per la società dell'informazione). The lectures of this section were held in February-May 2012.

**For whom is this course.** This 3-credit course is one of the sections of the course Complementi di software e servizi per la società dell'informazione. It is intended for students of the Laurea Magistrale in Ingegneria Informatica and of the Master in Computer Engineering (School of Engineering) of the Sapienza Università di Roma.

**Prerequisites.** A good knowledge of the fundamentals of Programming Structures, Programming Languages, Databases (SQL, relational data model, Entity-Relationship data model, conceptual and logical database design) and Database systems, as well as a basic knowledge of Mathematical Logic, are required.

**Course goals.** Information integration is the problem of combining data residing at different sources and providing the user with a unified view of these data. Designing information integration systems is important in current real-world applications, and raises a number of issues that are interesting from both a theoretical and a practical point of view. In recent years there has been a huge amount of research on data integration, and a precise, clear picture of a systematic approach to the problem is now available. This section presents an overview of the research carried out in the area of data integration, with emphasis on the theoretical results that are relevant for the development of information integration solutions. Special attention is devoted to the following aspects: architectures for information integration, modeling an information integration application, ontology-based data access and integration, processing queries in information integration, data exchange, and reasoning on queries.

**News**

**April 6, 2013** Students who are "fuori corso" and want to take the Information Integration exam in April 2013 should send a message to Prof. Lenzerini indicating on which of the following dates:

- April 9, 4pm

- April 26, 4pm

- April 23, 4pm

the student would like to give the presentation.

We remind students that the exam is registered for the whole course "Elective in Software and Services", not for the single section of Information Integration. The exam registration dates are published on the home page of Elective in Software and Services.

**Teaching material**

- Before the beginning of the lectures, students are invited to (re)study the basic notions of propositional and first-order logic. For this purpose, students may use the material they used in previous courses, or have a look at:
  - Introduction to propositional logic (paper)
  - Introduction to first-order logic
  - FOL and conjunctive queries (from the material of "Metodi Formali per il Software e i Servizi", by Giuseppe De Giacomo)

- Slides

**Lectures**

**Exams**

For the exam, each student should prepare a 15-minute presentation (using slides, either a .ppt or a .pdf file) on a specific topic. The topic of the presentation must be chosen from the following list (use Google to find the papers and download them):

- 0. *Commercial or academic tool/system for data federation or data integration*
- 1. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Moshe Y. Vardi. *Rewriting of Regular Expressions and Regular Path Queries*. In J. Comput. Syst. Sci. 64(3): 443-465, 2002
- 2. Alon Y. Levy, Alberto O. Mendelzon, Yehoshua Sagiv, Divesh Srivastava. *Answering Queries Using Views*. PODS 1995: 95-104
- 3. Rachel Pottinger, Alon Halevy. *MiniCon: A scalable algorithm for answering queries using views*. The VLDB Journal (The International Journal on Very Large Data Bases), Volume 10, Issue 2-3 (September 2001)
- 4. Oliver M. Duschka, Michael R. Genesereth, Alon Y. Levy. *Recursive Query Plans for Data Integration*. J. Log. Program. 43(1): 49-73 (2000)
- 5. Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Riccardo Rosati. *Linking Data to Ontologies*. J. Data Semantics 10: 133-173 (2008)
- 6. Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset, Laurent Simon. *Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web*. Journal of Artificial Intelligence Research (JAIR) 25: 269-314 (2006)
- 7. Paolo Atzeni, Paolo Cappellari, Philip A. Bernstein. *Model-Independent Schema and Data Translation*. EDBT 2006: 368-385
- 8. Xin Luna Dong, Alon Y. Halevy, Cong Yu. *Data integration with uncertainty*. VLDB J. 18(2): 469-500 (2009)
- 9. Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, Lucian Popa. *Data Exchange: Semantics and Query Answering*. ICDT 2003: 207-224
- 10. Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, Wang Chiew Tan. *Composing schema mappings: Second-order dependencies to the rescue*. ACM Trans. Database Syst. 30(4): 994-1055 (2005)
- 11. Andrea Calì, Domenico Lembo, Riccardo Rosati. *On the decidability and complexity of query answering over inconsistent and incomplete databases*. PODS 2003: 260-271
- 12. Jens Bleiholder, Felix Naumann. *Data fusion*. ACM Comput. Surv. 41(1): (2008)
- 13. Marcelo Arenas, Leopoldo E. Bertossi, Jan Chomicki. *Consistent Query Answers in Inconsistent Databases*. PODS 1999: 68-79
- 15. Gösta Grahne, Alberto O. Mendelzon. *Tableau Techniques for Querying Information Sources through Global Schemas*. ICDT 1999: 332-347
- 16. George Konstantinidis, José Luis Ambite. *Scalable query rewriting: a graph-based approach*, SIGMOD '11 Proceedings of the 2011 international conference on Management of data.
- 17. Hector Gonzalez, Alon Y. Halevy, Christian S. Jensen, Anno Langen, Jayant Madhavan, Rebecca Shapley, Warren Shen, Jonathan Goldberg-Kidon. *Google fusion tables: web-centered data management and collaboration*. SIGMOD Conference 2010: 1061-1066
- 100. *Any other paper on data integration*

- If the topic is N.0, the student should explain the relationship between the tool and the general topic of data integration, should illustrate the features of the tool, relate such features with the concepts, the theory and the techniques studied in the course, and, hopefully, give a short demo of the tool.
- If the topic is a topic dealt with in a paper, then the student should explain the relationship between the paper and the general topic of data integration, should illustrate the content of the paper, and relate such content with the concepts, the theory and the techniques studied in the course.

Once the student has chosen the topic, (s)he should send an email message to Prof. Lenzerini indicating the chosen topic, and wait for confirmation. If the topic is N.0, the student must also indicate the tool (s)he has chosen to investigate. If the topic is N.100, the student must also indicate the paper (s)he has chosen to study. If the chosen topic (or tool/system) is already taken, the student will be asked to pick a new topic (or tool/system).

Once the student is ready for the presentation, (s)he should send an email message to Prof. Lenzerini indicating the date on which (s)he wants to take the exam, chosen among the following dates:

- June 26, 2012, at 4pm, in room B203

- July 3, 2012, at 4pm, in room B203

- July 10, 2012, at 4pm, in room B203

- July 17, 2012, at 4pm, in room B203

The presentation (15 minutes long) should be organized as follows:

**Past editions**

**Office hours.** Tuesday, 5:00 pm, at the Dipartimento di Informatica e Sistemistica "Antonio Ruberti", via Ariosto 25, Roma, second floor, room B203 (if available), or room B217 (otherwise). Please check the last-minute news for the next office hours.