Corso di reti

Elective in Computer Networks II - Section 2
Introduction to recommender systems

Teacher: Luca Becchetti

Syllabus (tentative)
Course material (tentative)
Exam

Syllabus

Massive amounts of data are produced every day in the Internet and by important Web applications such as search engines and social networking platforms. Sources of such data are for example traffic logs in IP routers, search engines' query-logs, logs of user activity on social networking or e-commerce platforms to name a few. Analyzing such data sources in order to extract interesting trends and patterns is the main purpose of Data Mining and it is of fundamental importance in many applications.
The area is vast and and this course will cover two specific, yet representative aspects. A first part of the course will present some of the main techniques that are used in important Web applications to profile users, with an emphasis on recommendation scenarios, since the challenges these present and the techniques used are of more general interest. A second part of the course will cover techniques to maintain statistics of interest over data streams, with a focus on cases in which the statistics of interest have to be maintained on the fly and/or the massive data size demands strictly sublinear time algorithms.

Topics

1. Introductory part

- Events and probability
- Discrete variable, expected value and variance
- Deviations
- Hash functions

The topics above will be presented along the course as and when needed

2. Main techniques for user profiling and recommendation
- Content-based methods
- Collaborative filtering
- User-based
- Item-based
- Hybrid methods
- Recommending contacts in social networks: social matching and link prediction
- Algorithms for estimating set intersection

3. Link prediction and social matching
- Motivations
- Link prediction as a problem of node similarity estimation
- Link predictiors: neighbouhood based, path based
- Random walk based similarity measures

Material

Slides

Introductory part. See also "Crash course on probability" slides by Prof. Anagnostopoulos
Main techniques for user profiling and recommendations: slides are those of the Course by Rahul Sami at Michigan's Open University: http://open.umich.edu/education/si/si583/winter2009. In particular, we considered slides for lectures 1, 5, 6, 8, 10. All material (including slides) can be downloaded at http://open.umich.edu/node/1638/zip_download. Please consider that, unless otherwise note this material is made available under the terms of the Creative Commons Attribution Noncommercial Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/
Here is a slightly revised version of lecture 5 above
Compact set represenation (Broder's sketches)
Link prediction and social matching

References
Introductory part: Chapters 1-4 of: Michael Mitzenmacher,Eli Upfal. Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press.

Collaborative filtering: user-based
1. Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. 1994. GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM conference on Computer supported cooperative work (CSCW '94). Available at http://pages.cpsc.ucalgary.ca/~saul/wiki/uploads/HCIPapers/resnick-grouplens-cscw94.pdf
2. Jon Herlocker, Joseph A. Konstan and John Riedl. An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms. Information Retrieval Volume 5, Number 4, pp. 287-310, Springer. Available at http://eurovision.tribler.org/trac/raw-attachment/wiki/SimilarityFunction/herlocker_02_empirical.pdf

Collaborative filtering: item-based
1. Amazon.com Recommendations: Item-to-Item Collaborative Filtering Linden, Smith, and York, IEEE Internet Computing, 7:76--80, 2002. Available at http://debian.tribler.org/trac/raw-attachment/wiki/SimilarityFunction/Amazon-Recommendations.pdf
2. Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web (WWW '01). ACM, New York, NY, USA, 285-295. Available at http://glaros.dtc.umn.edu/gkhome/fetch/papers/itempsWWW01.pdf

Collaborative filtering: surveys and more
1. Adomavicius, G. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, Vol. 17(6), pp. 734-749, 2005. Available at http://ids.csom.umn.edu/faculty/gedas/papers/recommender-systems-survey-2005.pdf
2. The Netflix Challenge prize rules. Read the "prize structure" section for the RMSE evaluation method. http://www.netflixprize.com/assets/rules.pdf

Compact set representation
1. Andrei Z. Broder. 2000. Identifying and Filtering Near-Duplicate Documents. In Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching (COM '00), Raffaele Giancarlo and David Sankoff (Eds.). Springer-Verlag, London, UK, 1-10. Available at http://www.cs.brown.edu/courses/cs253/papers/nearduplicate.pdf

Link prediction
1. David Liben‐Nowell and Jon Kleinberg. "The link‐prediction problem for social networks." Journal of the American society for information science and technology 58.7 (2007): 1019-1031. Available at http://snap.stanford.edu/class/cs224w-readings/nowell04linkprediction.pdf

Social matching (suggested reading)
1. Terveen, Loren, and David W. McDonald. "Social matching: A framework and research agenda." ACM transactions on computer-human interaction (TOCHI) 12.3 (2005): 401-434. Available at http://grouplens.org/system/files/Terveen-SocialMatching.pdf

Exam

Students have to take a written exam and have to present a project.

Written Exam
The written exam will consist of 2 or 3 assignments.

Project
Students should participate to a project. The project will consist in an assignment in which a group of maximum two students will cooperate to perform an experimental study on some of the aspects considered during the course. The outcome of the project will consist in i) the code produced to perform the experiments; ii) the data (statistics) produced during the experiments; iii) a technical report describing the work done and the results obtained. The technical report should have the form of a conference paper. It should be between 4 and 5 pages long and follow the ACM SIGs proceedings format. Word and LateX templates are available at http://www.acm.org/sigs/publications/proceedings-templates#aL.
In particular, the report should contain an introduction, a section describing the experimental setting (the dataset, the platform used, the performance indices considered etc.), one section describing and commenting on the results obtained and a conclusion section, highlighting significant aspects (positive or negative) of the results you obtained, and pointing to directions that in your opinion deserve further investigation. The introduction should provide an overview of the problem studied, the methods/approach used and the result obtained. It should also point to and comment some key references on the topic (e.g., the main papers in which the approach you are using was proposed/used before).
It is expected that you devote some effort to the writeup of the technical report. The way you present the material is important. Your report should help me understand what you have done, the critical decisions you hade to take etc. Reading or simply browsing some well-written scientific paper will definitely help you improve the presentation. To this purpose, a good idea is reading the suggested references listed above!

Grades
Each student will receive at most 20 points for the written exam and at most 10 for the project.

Forming a group
Please contact me by email to notify the names of the members of your team and receive your assignment. You can propose a topic you liked and I will try to assign a related project to you.

Discussing your project
Each student should arrange an appointment and discuss the project s/he participated in before taking part to the written exam. Before that, I should receive by email (at least 3 days before the appointment) a pdf copy of the report describing your work. This means that I should receive this before any participant to a group discusses his/her project with me.