Seminario Interdipartimentale di Algoritmica
 

Monday, May 7, 2007, 12:00 noon
Monotony and Surprise: Conservative Approaches to Pattern Discovery

Alberto Apostolico, Accademia Nazionale dei Lincei & Georgia Tech

DIS - Department of Computer Enigeering, Via Salaria 113
Room C3, second floor

Abstract:

The problem of characterizing and detecting  surprisingly recurrent sequence patterns such as substrings or motifs and related associations or rules is pursued ubiquitously in order to compress data, unveil structure, infer succinct descriptions, extract and classify features, etc.   In Molecular Biology, some such patterns are variously implicated in facets of biological structure and function. Because of that, Pattern  Discovery constitutes one of the most battered, flourishing and arguably useful applications of Computational Molecular Biology. The very notion of a pattern still embodies subtleties and ambiguities, as do related concepts such as class and structure. The discovery, particularly on a massive scale, of surprising patterns and correlations thereof  poses interesting methodological and algorithmic problems. Often, the tables and descriptors at the outset grow faster and bigger than the phenomena they are meant to encapsulate, which generates daunting computational burdens, and gives rise to a throughput that seems impossible to visualize and digest. While part of these problems are endemic, another part seems rooted in the characterizations traditionally offered for the notion of a motif or association pattern, that are typically based either on syntax or on statistics alone. This talk reviews such classical notions as well as alternate ones,  based on constraints of pattern saturation and monotonicity of scores, that tightly combine syntactic and statistical specifications, and shows how they afford some parsimony in the generation and testing of surprising patterns.