Seminario Interdipartimentale di Algoritmica  

Monday, January 16, 2006, 12:00 noon
Research Issues in Data Summarization and Approximate Query Answering: Efficiency, Accuracy, Privacy Preservation, Sensor Data Streams

Domenico Saccà, Università della Calabria

DI - Department of Computer Science
Seminar Room, third floor


Abstract:

In many application contexts, such as statistical databases, transaction recording systems, scientific databases, query optimizers and OLAP (On-Line Analytical Processing), among many others, a multidimensional view of data is often adopted: data are stored in multidimensional arrays, called datacubes, where every aggregation query (e.g., the sum of the values contained in a range, or the number of occurrences of distinct values) can be answered by sequentially visiting the sub-array covering the query range.

In demanding applications, to both save storage space and support fast access, datacubes are summarized into lossy synopses of aggregate values, and range queries are executed over the aggregate data rather than over the raw data, thus returning approximate answers. Approximate query answering is very useful when users want fast answers and would rather not wait a long time for a precision that is often unnecessary.

The talk will discuss a number of challenging research issues in data summarization and approximate query answering: (1) efficiency (techniques for summarizing multidimensional data based on binary hierarchical histograms), (2) accuracy (indices for improving query estimation inside histogram blocks), (3) privacy preservation (evaluating whether individual data values can be reconstructed from the summarized ones) and (4) sensor data streams (handling data continuously produced by a network of sensors, possibly within a grid framework).
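To make the contrast between exact and approximate range queries concrete, here is a minimal Python sketch. It is not the speaker's technique. A small 2-D datacube is summarized by recursively halving it along its longer side into blocks, a simplified stand-in for the binary hierarchical histograms mentioned in the abstract, and only each block's sum is kept. The exact answer visits every cell of the sub-array. The approximate answer uses only the block sums and assumes that values are spread uniformly inside each block. The functions `exact_range_sum`, `build_blocks` and `approx_range_sum`, the fixed split depth and the sample data are all hypothetical.

```python
from typing import List, Tuple

Cube = List[List[int]]
# A rectangle is (row_lo, row_hi, col_lo, col_hi) with half-open bounds.
Rect = Tuple[int, int, int, int]


def exact_range_sum(cube: Cube, q: Rect) -> int:
    """Exact answer: visit every cell of the sub-array covering the range."""
    r0, r1, c0, c1 = q
    return sum(cube[i][j] for i in range(r0, r1) for j in range(c0, c1))


def build_blocks(cube: Cube, rect: Rect, depth: int) -> List[Tuple[Rect, int]]:
    """Hypothetical lossy summary: halve `rect` along its longer side,
    `depth` times, and keep only (block, block_sum) pairs."""
    r0, r1, c0, c1 = rect
    if depth == 0 or (r1 - r0 <= 1 and c1 - c0 <= 1):
        return [(rect, exact_range_sum(cube, rect))]
    if r1 - r0 >= c1 - c0:
        mid = (r0 + r1) // 2
        halves = [(r0, mid, c0, c1), (mid, r1, c0, c1)]
    else:
        mid = (c0 + c1) // 2
        halves = [(r0, r1, c0, mid), (r0, r1, mid, c1)]
    blocks: List[Tuple[Rect, int]] = []
    for half in halves:
        blocks.extend(build_blocks(cube, half, depth - 1))
    return blocks


def approx_range_sum(blocks: List[Tuple[Rect, int]], q: Rect) -> float:
    """Approximate answer from block sums only, assuming values are
    uniformly distributed inside each block."""
    r0, r1, c0, c1 = q
    total = 0.0
    for (b0, b1, d0, d1), block_sum in blocks:
        rows = max(0, min(r1, b1) - max(r0, b0))
        cols = max(0, min(c1, d1) - max(c0, d0))
        if rows and cols:
            # Charge the query the fraction of the block it overlaps.
            total += block_sum * (rows * cols) / ((b1 - b0) * (d1 - d0))
    return total


cube = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
blocks = build_blocks(cube, (0, 4, 0, 4), depth=2)  # four 2x2 blocks
q = (1, 3, 1, 4)  # rows 1-2, columns 1-3
print(exact_range_sum(cube, q))   # 54
print(approx_range_sum(blocks, q))  # 53.0
```

The summary stores 4 numbers instead of 16, and the query on rows 1-2 and columns 1-3 is estimated as 53.0 against an exact 54. Queries aligned with block boundaries are answered exactly, while the error comes from blocks that the query only partially overlaps; this is the estimation problem that indices inside histogram blocks aim to reduce.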