Query logs record the queries and actions of search-engine users, and as
such they contain valuable information about users' interests, preferences,
and behavior, as well as their implicit feedback on search-engine results.
Mining the wealth of
information available in query logs has many important applications,
including query-log analysis, user profiling and personalization,
advertising, and query recommendation.
We introduce the query-flow graph, a graph
representation of the interesting knowledge about latent querying
behavior.
Intuitively, in the query-flow graph a directed edge from query $q_i$
to query $q_j$ means that the two queries are likely to be part of the
same ``search mission''.
Any path over the query-flow graph may be seen as a searching
behavior, whose likelihood is given by the strength of the edges along
the path.
The query-flow graph is an outcome of query-log mining and, at the same
time, a useful tool for it.
We build a real-world query-flow graph from a large-scale query log and
demonstrate its utility in two concrete applications: finding logical
sessions and query recommendation.
We further build an accurate model for classifying user query
reformulations into broad classes (generalization, specialization, error
correction, or parallel move), achieving 92% accuracy. We apply the model to
automatically label two large query logs, creating annotated query-flow
graphs. We study the resulting reformulation patterns, finding results
consistent with previous studies done on smaller, manually annotated
datasets, and discovering interesting new patterns, including connections
between reformulation types and topical categories.
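
As an illustration of the query-flow graph and of path likelihood, the following minimal Python sketch builds a small graph from toy sessions. It rests on assumptions that are not stated above: edge strength is estimated here as the relative frequency with which one query directly follows another, and the likelihood of a path is taken to be the product of the edge weights along it. The function names `build_query_flow_graph` and `path_likelihood` and the example sessions are hypothetical.

```python
from collections import defaultdict


def build_query_flow_graph(sessions):
    """Return {q_i: {q_j: weight}} from lists of consecutive queries.

    Assumption: the weight of edge (q_i, q_j) is the fraction of
    transitions out of q_i that go to q_j. This is one simple way to
    estimate edge strength, chosen only for illustration.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for q_i, q_j in zip(session, session[1:]):
            if q_i != q_j:  # ignore immediate repetitions of the same query
                counts[q_i][q_j] += 1

    graph = {}
    for q_i, successors in counts.items():
        total = sum(successors.values())
        graph[q_i] = {q_j: c / total for q_j, c in successors.items()}
    return graph


def path_likelihood(graph, path):
    """Multiply edge weights along the path; a missing edge gives 0.0."""
    likelihood = 1.0
    for q_i, q_j in zip(path, path[1:]):
        likelihood *= graph.get(q_i, {}).get(q_j, 0.0)
    return likelihood


# Toy data, invented for this example.
sessions = [
    ["cheap flights", "cheap flights rome", "rome hotels"],
    ["cheap flights", "cheap flights rome", "rome weather"],
    ["cheap flights", "train tickets"],
    ["cheap flights", "cheap flights rome"],
]
graph = build_query_flow_graph(sessions)
mission = ["cheap flights", "cheap flights rome", "rome hotels"]
print(path_likelihood(graph, mission))
```

Under these assumptions, a path made of frequently observed transitions receives a higher likelihood than one that relies on rare or unseen transitions, which is the intuition behind reading paths as searching behaviors.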