Elective in Computer Networks II - Section 2
Introduction to recommender systems
Teacher: Luca Becchetti
Syllabus
Massive
amounts of data are produced every day in the Internet and by
important Web applications such as search engines and social
networking platforms. Sources of such data are for example traffic
logs in IP routers, search engines' query-logs, logs of
user activity on social networking or e-commerce platforms to
name a few. Analyzing such data sources in order to extract
interesting trends and patterns is the main purpose of Data Mining and
it is of fundamental importance in many applications.
The area is vast and and this course will cover two specific, yet
representative aspects. A first part of the course will present
some of the main techniques that are used in important Web
applications to profile users, with an emphasis on recommendation
scenarios, since the challenges these present and the techniques used
are of more general interest. A second part of the course will cover
techniques to maintain statistics of interest over data streams,
with a focus on cases in which the statistics of interest have to be
maintained on the fly and/or the massive data size demands strictly
sublinear time algorithms.
Topics
1.
Introductory part
-
Events and probability
- Discrete variable, expected value and variance
- Deviations
- Hash functions
The
topics
above
will
be
presented
along
the course as and when needed
2. Main techniques for user profiling and recommendation
- Content-based methods
- Collaborative filtering
- User-based
- Item-based
- Hybrid methods
- Recommending contacts in social networks: social matching and
link prediction
- Algorithms for estimating set intersection
3. Link prediction and social matching
- Motivations
- Link prediction as a problem of node similarity estimation
- Link predictiors: neighbouhood based, path based
- Random walk based similarity measures
Material
Slides
References
Introductory
part: Chapters 1-4 of: Michael Mitzenmacher,Eli Upfal. Probability
and computing: randomized algorithms and probabilistic analysis.
Cambridge University Press.
Collaborative
filtering: user-based
1. Paul Resnick,
Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl.
1994. GroupLens: an open architecture for collaborative filtering of
netnews. In Proceedings
of
the
1994
ACM
conference
on Computer supported cooperative work
(CSCW '94). Available at http://pages.cpsc.ucalgary.ca/~saul/wiki/uploads/HCIPapers/resnick-grouplens-cscw94.pdf
2. Jon Herlocker,
Joseph A. Konstan and John Riedl.
An
Empirical Analysis of Design Choices in Neighborhood-Based
Collaborative Filtering Algorithms. Information Retrieval Volume 5,
Number 4, pp. , Springer. Available
at http://eurovision.tribler.org/trac/raw-attachment/wiki/SimilarityFunction/herlocker_02_empirical.pdf
Collaborative
filtering: item-based
1. Amazon.com
Recommendations: Item-to-Item Collaborative Filtering Linden, Smith,
and York, IEEE Internet Computing, 7:76--80, 2002. Available at http://debian.tribler.org/trac/raw-attachment/wiki/SimilarityFunction/Amazon-Recommendations.pdf
2. Badrul Sarwar,
George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based
collaborative filtering recommendation algorithms. In Proceedings of
the 10th international conference on World Wide Web (WWW '01). ACM,
New York, NY, USA, 285-295. Available at
http://glaros.dtc.umn.edu/gkhome/fetch/papers/itempsWWW01.pdf
Collaborative
filtering: surveys and more
1. Adomavicius,
G.
Toward the
next generation of recommender systems: a survey of the
state-of-the-art and possible extensions. IEEE Transactions on
Knowledge and Data Engineering, Vol. 17(6), pp. 734-749, 2005.
Available at http://ids.csom.umn.edu/faculty/gedas/papers/recommender-systems-survey-2005.pdf
2. The Netflix
Challenge prize rules. Read the "prize structure" section for the RMSE
evaluation method.
http://www.netflixprize.com/assets/rules.pdf
Compact set
representation
1. Andrei Z.
Broder. 2000. Identifying and Filtering Near-Duplicate Documents. In Proceedings of the
11th Annual Symposium on Combinatorial Pattern Matching
(COM '00), Raffaele Giancarlo and David Sankoff (Eds.).
Springer-Verlag, London, UK, 1-10. Available at http://www.cs.brown.edu/courses/cs253/papers/nearduplicate.pdf
Link prediction
1. David
Liben‐Nowell and Jon Kleinberg. "The link‐prediction problem for
social networks." Journal of the American society for information
science and technology 58.7 (2007): 1019-1031. Available at http://snap.stanford.edu/class/cs224w-readings/nowell04linkprediction.pdf
Social matching (suggested reading)
1. Terveen,
Loren, and David W. McDonald. "Social matching: A framework and
research agenda." ACM transactions on computer-human interaction
(TOCHI) 12.3 (2005): 401-434. Available at http://grouplens.org/system/files/Terveen-SocialMatching.pdf
Students
have to take a written exam and have to present a project.
Written Exam
The
written exam will consist of 2 or 3 assignments.
Project
Students
should participate to a project. The project will consist in an
assignment in which a group of maximum two students will
cooperate to perform an experimental study on some of the
aspects considered during the course. The outcome of the project
will consist in i) the code produced to perform the experiments;
ii) the data (statistics) produced during the experiments; iii)
a technical report describing the work done and the results
obtained. The technical report should have the form of a
conference paper. It should be between 4 and 5 pages long and
follow the ACM SIGs proceedings format. Word and LateX templates
are available at http://www.acm.org/sigs/publications/proceedings-templates#aL.
In particular, the report should contain an introduction, a
section describing the experimental setting (the dataset, the
platform used, the performance indices considered etc.), one
section describing and commenting on the results obtained and a
conclusion section, highlighting significant aspects (positive
or negative) of the results you obtained, and pointing to
directions that in your opinion deserve further
investigation. The introduction should provide an overview
of the problem studied, the methods/approach used and the result
obtained. It should also point to and comment some key
references on the topic (e.g., the main papers in which the
approach you are using was proposed/used before).
It is expected that you devote some effort to the writeup of the
technical report. The way you present the material is important.
Your report should help me understand what you have done, the
critical decisions you hade to take etc. Reading or simply
browsing some well-written scientific paper will definitely help
you improve the presentation. To this purpose, a good idea is
reading the suggested references listed above!
Grades
Each student will receive at most 20 points for the written exam
and at most 10 for the project.
Forming a group
Please contact me by email to notify the names of the members of
your team and receive your assignment. You can propose a topic
you liked and I will try to assign a related project to you.
Discussing your project
Each student should arrange an appointment and discuss the
project s/he participated in before taking part to the written
exam. Before that, I should receive by email (at least 3 days
before the appointment) a pdf copy of the report describing your
work. This means that I should receive this before any
participant to a group discusses his/her project with me.