Recommender Systems Comparison: The Best Performing Algorithm
In this post, SoftServe’s Data Science Group shows how to use four R packages for building recommender systems: recommenderlab, recosystem, SlopeOne and SVDApproximation. It also compares the performance of several algorithms on the MovieLens 1M dataset.
First, the packages need to be installed and loaded:
# recommenderlab and recosystem are available from CRAN
install.packages("recommenderlab")
install.packages("recosystem")
# SlopeOne and SVDApproximation are installed from GitHub
# (current devtools expects "user/repo" instead of the old username argument)
library(devtools)
install_github("tarashnot/SlopeOne")
install_github("tarashnot/SVDApproximation")
library(recommenderlab)
library(recosystem)
library(SlopeOne)
library(SVDApproximation)
library(data.table)
library(RColorBrewer)
library(ggplot2)
MovieLens 1M Dataset
MovieLens is a project of GroupLens Research, a research lab in the Department of Computer Science and Engineering at the University of Minnesota. The project was created in 1997 to gather research data on personalized recommendations. MovieLens is a recommender system and virtual community website that recommends movies to its users based on their film preferences, using collaborative filtering.
The MovieLens 1M dataset contains approximately one million ratings of 3,706 movies from 6,040 users on a 1-to-5 rating scale. General statistics of this dataset are shown in the figure below (note: grey lines show median values). The rating matrix is very sparse: only about 4.5% of its user-movie cells (a fraction of 0.045) contain a rating.
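The 0.045 figure is the share of user-movie cells that hold a rating. A minimal sketch of that arithmetic in base R follows. The helper name rating_density and the toy table are hypothetical and used only for illustration; the function assumes a table with user and item columns, like the ratings table in this post, and is not part of any of the packages above.

```r
# Toy ratings table with the same column names as the post's `ratings`
# (the values are made up for illustration)
toy_ratings <- data.frame(
  user   = c(1, 1, 2, 3, 3, 3),
  item   = c(1, 2, 1, 1, 2, 3),
  rating = c(5, 3, 4, 2, 5, 1)
)

# Hypothetical helper: share of user-item cells that contain a rating
rating_density <- function(ratings_table) {
  n_users <- length(unique(ratings_table$user))
  n_items <- length(unique(ratings_table$item))
  nrow(ratings_table) / (n_users * n_items)
}

# 6 ratings spread over a 3 x 3 user-item grid
toy_density <- rating_density(toy_ratings)

# Same arithmetic with the rounded MovieLens 1M figures quoted above:
# roughly one million ratings, 6,040 users, 3,706 movies
ml_1m_density <- round(1e6 / (6040 * 3706), 3)
```

With the real data loaded, calling rating_density(ratings) should give a value close to ml_1m_density, assuming the packaged table has the same user and item columns.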
The dataset ships with the SVDApproximation package, so no separate download is needed once the package is loaded:
data(ratings)
head(ratings)
   user item rating
1:    1    1      5
2:    6    1      4
3:    8    1      4
4:    9    1      5
5:   10    1      5
6:   18    1      4
The ratings table contains user IDs, item IDs (here, movies) and the ratings themselves.
Statistics of ratings data:
visualize_ratings(ratings_table = ratings)
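For readers who want the numbers behind such a plot, here is a rough base R sketch of the kind of summaries a ratings overview typically draws on: the distribution of rating values, the number of ratings per user, and the mean rating per item. It is an assumption about what is useful to inspect, not a description of how visualize_ratings is implemented, and it runs on a made-up toy table rather than the real data.

```r
# Toy ratings table with the same column names as the post's `ratings`
# (the values are made up for illustration)
toy_ratings <- data.frame(
  user   = c(1, 1, 2, 3, 3, 3),
  item   = c(1, 2, 1, 1, 2, 3),
  rating = c(5, 3, 4, 2, 5, 1)
)

# How often each rating value (1 to 5) occurs
rating_counts <- table(factor(toy_ratings$rating, levels = 1:5))

# Number of ratings given by each user, and the median across users
ratings_per_user <- as.vector(table(toy_ratings$user))
median_ratings_per_user <- median(ratings_per_user)

# Mean rating received by each item
mean_rating_per_item <- tapply(toy_ratings$rating, toy_ratings$item, mean)

print(rating_counts)
print(median_ratings_per_user)
print(mean_rating_per_item)
```

The same expressions can be applied to the ratings table loaded earlier by replacing toy_ratings with ratings.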