Seminario Interdipartimentale
di Algoritmica
Monday,
May 7, 2007, 12:00 noon
Monotony and Surprise: Conservative Approaches to Pattern Discovery
Alberto Apostolico, Accademia Nazionale dei Lincei & Georgia Tech
DIS - Department of Computer
Enigeering,
Via Salaria 113
Room C3, second floor
Abstract:
The
problem of characterizing and detecting surprisingly recurrent
sequence patterns such as substrings or motifs and related associations
or rules is pursued ubiquitously in order to compress data, unveil
structure, infer succinct descriptions, extract and classify features,
etc. In Molecular Biology, some such patterns are variously
implicated
in facets of biological structure and function. Because of that,
Pattern
Discovery constitutes one of the most battered, flourishing and
arguably
useful applications of Computational Molecular Biology. The very notion
of a pattern still embodies subtleties and ambiguities, as do related
concepts such as class and structure.
The discovery, particularly on a massive scale, of surprising
patterns and correlations thereof poses interesting
methodological and
algorithmic problems. Often, the tables and descriptors at the outset
grow faster and bigger than the phenomena they are meant to
encapsulate,
which generates daunting computational burdens, and gives rise to a
throughput that seems impossible to visualize and digest. While part of
these problems are endemic, another part seems rooted in the
characterizations traditionally offered for the notion of a motif or
association pattern, that are typically based either on syntax or on
statistics alone. This talk reviews such classical notions as well as
alternate ones, based on
constraints of pattern saturation and monotonicity of scores, that
tightly combine syntactic and statistical specifications, and shows how
they afford some parsimony in the generation and testing of surprising
patterns.