Seminario Interdipartimentale di Algoritmica (Interdepartmental Algorithmics Seminar)
Monday, January 16, 2006, 12:00 noon
Research Issues in Data Summarization and Approximate Query Answering:
Efficiency, Accuracy, Privacy Preservation, Sensor Data Streams
Domenico Saccà, Università della Calabria
DI - Department of Computer Science
Seminar Room, third floor
Abstract:
In many application contexts, such as statistical databases, transaction
recording systems, scientific databases, query optimizers, and OLAP
(On-Line Analytical Processing), a multidimensional view of
data is often adopted: data are stored in multidimensional
arrays, called datacubes, where every aggregation query (e.g., the sum of
the values inside a range, or the number of occurrences of
distinct values) can be answered by sequentially visiting a sub-array
covering the range. In demanding applications, in order to both save
storage space and support fast access, datacubes are summarized into
lossy synopses of aggregate values, and range queries are executed over
the aggregate data rather than the raw data, thus returning approximate
answers. Approximate query answering is very useful when the user wants
fast answers and does not want to wait a long time for a
precision that is often unnecessary. The talk will discuss a number
of challenging research issues involved in data summarization and
approximate query answering: (1) efficiency (techniques for summarizing
multi-dimensional data based on binary hierarchical histograms), (2)
accuracy (indices for improving query estimation inside histogram
blocks), (3) privacy preservation (evaluating whether individual data
values can be reconstructed from summarized ones), and (4) sensor data streams
(handling data continuously produced by a network of sensors, possibly
within a grid framework).
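
As a rough illustration of the trade-off described in the abstract, the following minimal Python sketch compares an exact range-sum query over a small two-dimensional datacube with an estimate computed from a lossy synopsis. It is not the speaker's method: a plain fixed grid of block sums stands in for the binary hierarchical histograms mentioned above, values are assumed to be uniformly spread inside each block, and the function names (range_sum, build_blocks, approx_range_sum) are hypothetical.

```python
def range_sum(cube, r0, r1, c0, c1):
    """Exact answer: scan the sub-array of rows r0..r1-1 and columns c0..c1-1."""
    return sum(cube[r][c] for r in range(r0, r1) for c in range(c0, c1))


def build_blocks(cube, block):
    """Lossy synopsis (assumed layout): one stored sum per block x block tile."""
    rows, cols = len(cube), len(cube[0])
    return [
        [range_sum(cube, r, min(r + block, rows), c, min(c + block, cols))
         for c in range(0, cols, block)]
        for r in range(0, rows, block)
    ]


def approx_range_sum(synopsis, block, rows, cols, r0, r1, c0, c1):
    """Estimate a range sum from block sums only.

    Each block contributes its sum scaled by the fraction of its cells
    that fall inside the query range (uniform-distribution assumption).
    """
    total = 0.0
    for bi, synopsis_row in enumerate(synopsis):
        for bj, block_sum in enumerate(synopsis_row):
            br0, br1 = bi * block, min((bi + 1) * block, rows)
            bc0, bc1 = bj * block, min((bj + 1) * block, cols)
            overlap_rows = max(0, min(r1, br1) - max(r0, br0))
            overlap_cols = max(0, min(c1, bc1) - max(c0, bc0))
            area = (br1 - br0) * (bc1 - bc0)
            total += block_sum * (overlap_rows * overlap_cols) / area
    return total


cube = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
synopsis = build_blocks(cube, 2)  # 4 stored sums instead of 16 raw values

exact = range_sum(cube, 0, 1, 0, 1)                          # single cell
estimate = approx_range_sum(synopsis, 2, 4, 4, 0, 1, 0, 1)   # from block sums
print(exact, estimate)
```

A query aligned with block boundaries is answered exactly from the synopsis, while a query that cuts through a block is only estimated. This is the accuracy issue that indices inside histogram blocks, item (2) above, are meant to reduce.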