Integration, Warehousing and Mining of Heterogeneous Sources
Prototype implementing algorithms for query rewriting, query answering using views, and data reconciliation: IBIS

Andrea Calì, Giuseppe De Giacomo, Diego Calvanese, Domenico Lembo, Maurizio Lenzerini

Theme: Theme 1, Integration of data from heterogeneous sources
Code: D1-P3
Date: 8 October 2002
Product type: Software product
Responsible unit: RM
Units involved: RM
Authors: Andrea Calì, Giuseppe De Giacomo, Diego Calvanese, Domenico Lembo, Maurizio Lenzerini
Contact author: Domenico Lembo
Dipartimento di Informatica e Sistemistica
Università di Roma "La Sapienza"
Prototype presentation: D1-P3
Online documentation: D1.P3.ps


The Internet-Based Information System (IBIS) is a tool for the semantic integration of heterogeneous data sources, developed both within the D2I project and in the context of a collaboration between the "Dipartimento di Informatica e Sistemistica" (DIS) of the University of Rome "La Sapienza" and CM Sistemi. IBIS adopts innovative solutions to deal with all aspects of a complex data integration environment, including source wrapping, limitations on source access, and query answering under integrity constraints. Regarding the last two aspects, CM Sistemi originally focused on query answering in the presence of limitations on source access, whereas, within the D2I project, DIS mainly studied query answering in the presence of integrity constraints on the global schema, as described in the deliverables D1.R5: "Survey on methods for query answering and query rewriting using views" and D1.R11: "Methodology and Tools to Reconcile Data". Since limitations on source access are highly relevant in data integration applications, we also investigated this problem within the D2I project, and studied techniques and algorithms to process queries correctly in such a setting. These algorithms are implemented in the IBIS system.

IBIS uses a relational global schema to query the data at the sources, and can cope with a variety of heterogeneous data sources, including Web sources, relational databases, and legacy sources. Each non-relational source is wrapped so that it offers a relational view of its data. Each source is assumed to be sound: every tuple retrieved through the mapping belongs to the corresponding global relation, although the global relation may contain further tuples. The system allows integrity constraints to be specified on the global schema; in addition, it takes into account some forms of constraints on the source schema in order to optimize data extraction at runtime. In particular, key and foreign key constraints can be specified on the global schema, while functional dependencies and full-width inclusion dependencies (i.e., inclusions between entire relations) can be specified on the source schema.
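To make the two global-schema constraints concrete, here is a minimal sketch that checks a global database against a key and a foreign key. The relations `player` and `team`, their contents, and the helper names are hypothetical illustrations, not part of IBIS; tuples are plain Python tuples and positions are 0-based.

```python
from typing import Dict, Set, Tuple

Row = Tuple[str, ...]
Database = Dict[str, Set[Row]]


def violates_key(db: Database, rel: str, key: Tuple[int, ...]) -> bool:
    """True if two distinct tuples of `rel` agree on the key positions."""
    seen: Dict[Row, Row] = {}
    for row in db.get(rel, set()):
        k = tuple(row[i] for i in key)
        if k in seen and seen[k] != row:
            return True
        seen[k] = row
    return False


def violates_foreign_key(db: Database, child: str, child_pos: Tuple[int, ...],
                         parent: str, parent_key: Tuple[int, ...]) -> bool:
    """True if some child tuple references a key value absent from `parent`."""
    keys = {tuple(r[i] for i in parent_key) for r in db.get(parent, set())}
    return any(tuple(r[i] for i in child_pos) not in keys
               for r in db.get(child, set()))


# Hypothetical global schema: player(name, team), team(name, city),
# with keys player[0] and team[0] and the foreign key player[1] -> team[0].
db: Database = {
    "player": {("Totti", "Roma"), ("Nesta", "Lazio")},
    "team": {("Lazio", "Rome")},
}
print(violates_key(db, "player", (0,)))                        # False
print(violates_foreign_key(db, "player", (1,), "team", (0,)))  # True: "Roma" is missing
```

Under the sound semantics described above, the foreign key violation is not an error: it only says that the global database is incomplete (a team named "Roma" must exist, with an unknown city). A key violation, by contrast, cannot be fixed by adding tuples, so it signals inconsistent data.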

The system has been designed to support both GAV and LAV mappings, and to process queries correctly in both approaches. However, the current implementation of IBIS supports only the definition of GAV mappings, and implements only the query processing techniques for this approach. Furthermore, the framework adopted in IBIS allows for dealing with both incomplete and inconsistent data sources. The techniques developed in D2I to cope with inconsistent data, though, have not yet been implemented in the system.
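In GAV, each global relation is associated with a query over the sources, and evaluating all these queries over the source data yields what is usually called the retrieved global database. The following minimal sketch represents a GAV mapping as Python functions; the sources `s1`, `s2`, `s3`, their contents, and the function names are hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, Set, Tuple

Row = Tuple[str, ...]
Database = Dict[str, Set[Row]]
GavMapping = Dict[str, Callable[[Database], Set[Row]]]

# Hypothetical GAV mapping, written as Datalog-like rules in the comments:
#   player(N, T) :- s1(N, T, Y)      (projection of a source)
#   player(N, T) :- s2(N, T)         (union with a second source)
#   team(T, C)   :- s3(T, C)
mapping: GavMapping = {
    "player": lambda s: {(n, t) for (n, t, _y) in s["s1"]} | set(s["s2"]),
    "team": lambda s: set(s["s3"]),
}


def retrieved_global_database(m: GavMapping, sources: Database) -> Database:
    """Evaluate every mapping query over the source data."""
    return {rel: query(sources) for rel, query in m.items()}


sources: Database = {
    "s1": {("Totti", "Roma", "1976")},
    "s2": {("Nesta", "Lazio")},
    "s3": {("Lazio", "Rome")},
}
print(retrieved_global_database(mapping, sources))
```

Because sources are sound, the actual global database may contain more tuples than this retrieved one, which is why query answering has to take the global integrity constraints into account.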

In more detail, query processing in IBIS is organized in three phases:

  1. the query is expanded to take into account the integrity constraints in the global schema;
  2. the atoms in the expanded query are unfolded according to their definition in the mapping, obtaining a query expressed over the sources;
  3. the expanded and unfolded query is executed over the retrieved source database (see below), to produce the answer to the original query.
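Phase 1 can be illustrated with a deliberately simplified rewriting under foreign keys. This is not the D1.R11 algorithm: it only applies foreign keys backwards to single atoms and omits the unification (reduction) step that a complete algorithm needs, so it may miss some rewritings. Variables are written with a leading `?`; the schema, the foreign key, the data, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]                # (relation, terms)
Query = Tuple[Tuple[str, ...], Tuple[Atom, ...]]  # (head terms, body atoms)
ForeignKey = Tuple[str, Tuple[int, ...], str, Tuple[int, ...]]  # child[pos] -> parent[key]
Database = Dict[str, Set[Tuple[str, ...]]]


def is_var(t: str) -> bool:
    return t.startswith("?")


def fresh(used: Set[str]) -> str:
    """Smallest ?eN not yet used (keeps the set of generated queries finite)."""
    n = 0
    while f"?e{n}" in used:
        n += 1
    used.add(f"?e{n}")
    return f"?e{n}"


def expand(q: Query, fks: List[ForeignKey], arity: Dict[str, int]) -> Set[Query]:
    """Apply foreign keys backwards until no new query is produced."""
    result, todo = {q}, [q]
    while todo:
        head, body = todo.pop()
        for i, (rel, terms) in enumerate(body):
            others = list(head) + [t for j, a in enumerate(body) if j != i for t in a[1]]
            for child, child_pos, parent, parent_key in fks:
                # Non-key positions must hold variables used nowhere else.
                if rel != parent or not all(
                        is_var(terms[p]) and terms.count(terms[p]) == 1
                        and terms[p] not in others
                        for p in range(len(terms)) if p not in parent_key):
                    continue
                used = set(head) | {t for _, ts in body for t in ts}
                new_terms = tuple(
                    terms[parent_key[child_pos.index(p)]] if p in child_pos else fresh(used)
                    for p in range(arity[child]))
                new_q = (head, body[:i] + ((child, new_terms),) + body[i + 1:])
                if new_q not in result:
                    result.add(new_q)
                    todo.append(new_q)
    return result


def evaluate(q: Query, db: Database) -> Set[Tuple[str, ...]]:
    """Naive nested-loop evaluation of a conjunctive query."""
    head, body = q
    answers: Set[Tuple[str, ...]] = set()

    def match(terms, row, sub):
        sub = dict(sub)
        for t, v in zip(terms, row):
            if is_var(t):
                if sub.setdefault(t, v) != v:
                    return None
            elif t != v:
                return None
        return sub

    def search(i, sub):
        if i == len(body):
            answers.add(tuple(sub.get(t, t) for t in head))
            return
        rel, terms = body[i]
        for row in db.get(rel, set()):
            extended = match(terms, row, sub)
            if extended is not None:
                search(i + 1, extended)

    search(0, {})
    return answers


# q(?x) :- team(?x, ?y), with the foreign key player[1] -> team[0]
query: Query = (("?x",), (("team", ("?x", "?y")),))
fks: List[ForeignKey] = [("player", (1,), "team", (0,))]
arity = {"player": 2, "team": 2}
db: Database = {"player": {("Totti", "Roma")}, "team": {("Lazio", "Rome")}}

expanded = expand(query, fks, arity)
answers = set().union(*(evaluate(q, db) for q in expanded))
print(sorted(answers))  # [('Lazio',), ('Roma',)]
```

For brevity the expanded union is evaluated here directly over a small global database instead of being unfolded and executed (phases 2 and 3). Although `team` has no tuple for Roma, the foreign key guarantees that such a team exists, so Roma is a certain answer that the unexpanded query would miss.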

Query unfolding and execution are the standard steps of query processing in GAV data integration systems, while for the expansion phase IBIS uses the algorithms presented in deliverable D1.R11. The expanded query has to be evaluated over the retrieved global database, i.e., the database obtained by evaluating the mapping over the source data, in order to produce the certain answers to the original query. Since constructing the retrieved global database is computationally costly, the IBIS Expander module does not build it explicitly. Instead, it unfolds the expanded query and evaluates the unfolded query over the retrieved source database, whose data are extracted by the Extractor module, which retrieves from the sources all the tuples that may be used to answer the original query.
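When sources can only be accessed by supplying values for some attributes (access limitations, often described with binding patterns), retrieving all the tuples that may contribute to an answer is typically done with a fixpoint: every constant obtained so far is fed back into the sources until nothing new appears. The sketch below shows only this general idea and is not the Extractor's actual code; the `Source` class, the `"i"`/`"o"` pattern strings, the data, and `extract` are hypothetical. For simplicity all constants share one domain, whereas a realistic implementation would only feed values of compatible attribute domains.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, ...]


class Source:
    """Simulated source with a binding pattern, e.g. "io": position 0 is an input."""

    def __init__(self, name: str, pattern: str, data: Set[Row]) -> None:
        self.name, self.pattern, self._data = name, pattern, data

    def inputs(self) -> List[int]:
        return [k for k, c in enumerate(self.pattern) if c == "i"]

    def access(self, binding: Dict[int, str]) -> Set[Row]:
        # The source answers only if every input position is bound.
        assert set(binding) == set(self.inputs())
        return {r for r in self._data if all(r[k] == v for k, v in binding.items())}


def extract(sources: List[Source], constants: Iterable[str]) -> Dict[str, Set[Row]]:
    """Access the sources with every known constant until no new tuple is found."""
    known = set(constants)
    retrieved: Dict[str, Set[Row]] = {s.name: set() for s in sources}
    changed = True
    while changed:
        changed = False
        for s in sources:
            pos = s.inputs()
            # sorted() takes a snapshot, so updating `known` inside the loop is safe.
            for values in product(sorted(known), repeat=len(pos)):
                new_rows = s.access(dict(zip(pos, values))) - retrieved[s.name]
                if new_rows:
                    retrieved[s.name] |= new_rows
                    for row in new_rows:
                        known.update(row)
                    changed = True
    return retrieved


# Hypothetical sources: s1(team, player) needs the team, s2(player, team) needs the player.
s1 = Source("s1", "io", {("Roma", "Totti"), ("Lazio", "Nesta"), ("Milan", "Maldini")})
s2 = Source("s2", "io", {("Totti", "Roma"), ("Nesta", "Lazio"), ("Nesta", "Milan")})
print(extract([s1, s2], {"Nesta"}))
```

Starting from "Nesta", the tuples about Roma and Totti are never retrieved: no chain of obtainable constants leads to them, so they cannot be obtained under these access limitations.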

Note that, for LAV mappings, phases 2 and 3 of the query answering process in IBIS can easily be replaced by a query rewriting procedure, such as the one described in deliverable D1.R11.
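In LAV, each source is instead described as a view over the global schema, and a query has to be rewritten in terms of these views. As a much simplified illustration (not the procedure of D1.R11), the sketch below assumes that every view is a projection of a single global relation: a query atom can then be answered by a view over the same relation if every position the view projects away holds a variable that appears nowhere else in the query. The view names and definitions are hypothetical; variables start with `?`.

```python
from itertools import product
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]
# Hypothetical LAV views: name -> (global relation, positions the view exports)
Views = Dict[str, Tuple[str, Tuple[int, ...]]]

VIEWS: Views = {
    "v1": ("player", (0, 1)),  # v1(N, T) :- player(N, T)
    "v2": ("team", (0,)),      # v2(T)    :- team(T, C)
}


def rewrite(head: Tuple[str, ...], body: Tuple[Atom, ...],
            views: Views) -> List[Tuple[Tuple[str, ...], Tuple[Atom, ...]]]:
    """Return conjunctive rewritings over the views (empty if none exists)."""
    occurrences = list(head) + [t for _, ts in body for t in ts]
    choices = []
    for rel, terms in body:
        options = [
            (name, tuple(terms[p] for p in exported))
            for name, (vrel, exported) in views.items()
            if vrel == rel and all(
                t.startswith("?") and occurrences.count(t) == 1
                for p, t in enumerate(terms) if p not in exported)
        ]
        if not options:
            return []  # some atom cannot be covered by any view
        choices.append(options)
    return [(head, combo) for combo in product(*choices)]


# q(?n) :- player(?n, ?t), team(?t, ?c)
print(rewrite(("?n",), (("player", ("?n", "?t")), ("team", ("?t", "?c"))), VIEWS))
# [(('?n',), (('v1', ('?n', '?t')), ('v2', ('?t',))))]
```

If the head also asked for the city `?c`, no rewriting would exist, because `v2` projects that attribute away.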

Development and runtime environment

Developed in Java and Visual C++ 6.0 within the Microsoft COM+ environment.
Runs under Windows 2000 Advanced Server.



Site maintained by Domenico Lembo