Understanding the Apriori algorithm and its implementation
The goal of this chapter is to produce rules of the following form: if a person recommends this set of movies, they will also recommend this movie. We will also discuss extensions where a person who recommends a set of movies is likely to recommend another particular movie.
To do this, we first need to determine if a person recommends a movie. We can do this by creating a new feature Favorable, which is True if the person gave a favorable review to a movie:
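A minimal sketch of this step on toy data might look like the following. The `Rating` column name, the 1 to 5 scale, and the cutoff of more than 3 stars are assumptions made for illustration, and `toy_ratings` is a hypothetical stand-in for `all_ratings`.

```python
import pandas as pd

# Hypothetical stand-in for all_ratings; the "Rating" column and its
# 1-5 scale are assumptions, not taken from the dataset itself.
toy_ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 2, 3],
    "MovieID": [10, 20, 10, 30, 20],
    "Rating": [5, 2, 4, 3, 1],
})

# Assumed threshold: a rating above 3 counts as a favorable review.
toy_ratings["Favorable"] = toy_ratings["Rating"] > 3

print(toy_ratings)
```

The comparison produces a Boolean column, so it can later be used directly as a row filter.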
We can see the new feature by viewing the dataset:
all_ratings[10:15]
We will sample our dataset to form training data. This also helps reduce the size of the dataset that will be searched, making the Apriori algorithm run faster. We obtain all reviews from the first 200 users:
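The sampling step could be sketched as follows, again on toy data. Reading "the first 200 users" as "every `UserID` below 200" is an assumption, as is the name `ratings` for the sampled frame; `favorable_ratings` is the name used by the grouping code later in this section.

```python
import pandas as pd

# Hypothetical stand-in for all_ratings with the Favorable feature already added.
toy_ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 250, 250],
    "MovieID": [10, 20, 10, 30, 20],
    "Favorable": [True, False, True, True, False],
})

# Keep only reviews from users whose UserID is below 200
# (assumed interpretation of "the first 200 users").
ratings = toy_ratings[toy_ratings["UserID"].isin(range(200))]

# Keep only the favorable reviews from that sample.
favorable_ratings = ratings[ratings["Favorable"]]

print(favorable_ratings)
```

If user IDs start at 1 rather than 0, `range(200)` selects 199 users, so the bound may need adjusting to match the intended sample size.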
We will be searching each user's favorable reviews for our itemsets. So, the next thing we need is the set of movies to which each user has given a favorable rating. We can compute this by grouping the dataset by the UserID and iterating over the movies in each group:
favorable_reviews_by_users = dict((k, frozenset(v.values)) for k, v in favorable_ratings.groupby("UserID")["MovieID"])
In the preceding code, we stored the values as a frozenset, allowing us to quickly check whether a user has given a movie a favorable review.
Sets are much faster than lists for this type of operation, and we will use them in later code.
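To illustrate the kind of lookups this enables, here is a small sketch on toy data. The ratings and the `itemset` variable are hypothetical; the dictionary is built with the same expression as above.

```python
import pandas as pd

# Hypothetical favorable reviews: user 1 liked movies 10 and 20, user 2 liked movie 10.
favorable_ratings = pd.DataFrame({
    "UserID": [1, 1, 2],
    "MovieID": [10, 20, 10],
})

favorable_reviews_by_users = dict(
    (k, frozenset(v.values))
    for k, v in favorable_ratings.groupby("UserID")["MovieID"]
)

# Membership test: did user 1 favorably review movie 10?
print(10 in favorable_reviews_by_users[1])

# Subset test: did a user favorably review every movie in a candidate itemset?
itemset = frozenset((10, 20))
print(itemset.issubset(favorable_reviews_by_users[1]))
print(itemset.issubset(favorable_reviews_by_users[2]))
```

Both checks rely on hashing, so their cost does not grow with the number of movies a user has reviewed the way a scan through a list would.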
Finally, we can create a DataFrame that tells us how frequently each movie has been given a favorable review:
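One way to build such a frame is to group by movie and sum the Boolean `Favorable` column, since each `True` counts as 1. The sketch below uses toy data; the names `num_favorable_by_movie` and `most_favorable` are hypothetical.

```python
import pandas as pd

# Hypothetical sampled ratings with the Favorable feature.
ratings = pd.DataFrame({
    "MovieID": [10, 20, 10, 30, 20, 10],
    "Favorable": [True, False, True, True, False, False],
})

# Summing the Boolean column per movie counts its favorable reviews.
num_favorable_by_movie = (
    ratings[["MovieID", "Favorable"]].groupby("MovieID").sum()
)

# Sort so that the most frequently recommended movies come first.
most_favorable = num_favorable_by_movie.sort_values("Favorable", ascending=False)

print(most_favorable)
```

The result is indexed by `MovieID`, so the count for a given movie can be looked up with `num_favorable_by_movie.loc[movie_id, "Favorable"]`.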