- Learning Data Mining with Python(Second Edition)
- Robert Layton
- 2021-07-02 23:40:12
Understanding the Apriori algorithm and its implementation
The goal of this chapter is to produce rules of the following form: if a person recommends this set of movies, they will also recommend this movie. We will also discuss extensions in which a person who recommends a set of movies is likely to recommend another particular movie.
To do this, we first need to determine whether a person recommends a movie. We can do this by creating a new feature, Favorable, which is True if the person gave the movie a rating above 3:
all_ratings["Favorable"] = all_ratings["Rating"] > 3
We can see the new feature by viewing the dataset:
all_ratings[10:15]
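As a minimal sketch of what the new column contains, the same comparison can be tried on a small made-up DataFrame. The name toy_ratings and all of its values are assumptions for illustration only; they are not taken from the real dataset.

```python
import pandas as pd

# Toy stand-in for all_ratings; the values are made up for illustration.
toy_ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 2, 3],
    "MovieID": [10, 20, 10, 30, 20],
    "Rating": [5, 3, 4, 2, 4],
})

# Same rule as in the text: a rating strictly above 3 counts as favorable.
toy_ratings["Favorable"] = toy_ratings["Rating"] > 3

print(toy_ratings)
```

Note that a rating of exactly 3 is not counted as favorable, because the comparison is strict.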

We will sample our dataset to form training data. Sampling also reduces the size of the dataset that will be searched, making the Apriori algorithm run faster. We keep all reviews from users whose UserID falls in range(200), that is, IDs below 200:
ratings = all_ratings[all_ratings['UserID'].isin(range(200))]
Next, we can create a dataset of only the favorable reviews in our sample:
favorable_ratings_mask = ratings["Favorable"]
favorable_ratings = ratings[favorable_ratings_mask]
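The two filtering steps above can be sketched on made-up data as follows. The names toy_ratings, sample, and favorable_sample, and the user ID 250, are hypothetical and chosen only to show one user being excluded by the sampling step.

```python
import pandas as pd

# Made-up ratings; user 250 falls outside the sampled ID range.
toy_ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 250, 250],
    "MovieID": [10, 20, 10, 10, 30],
    "Rating": [5, 2, 4, 5, 1],
})
toy_ratings["Favorable"] = toy_ratings["Rating"] > 3

# Keep only users whose ID falls in range(200), as in the text.
sample = toy_ratings[toy_ratings["UserID"].isin(range(200))]

# A boolean column can be used directly as a row mask.
favorable_sample = sample[sample["Favorable"]]

print(favorable_sample)
```

Because the mask is built from the sampled frame, its index lines up with the rows being filtered.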
We will be searching each user's favorable reviews for our itemsets. So, the next thing we need is the set of movies to which each user has given a favorable rating. We can compute this by grouping the dataset by UserID and iterating over the movies in each group:
favorable_reviews_by_users = dict((k, frozenset(v.values)) for k, v in favorable_ratings.groupby("UserID")["MovieID"])
In the preceding code, we stored the values as a frozenset, allowing us to quickly check whether a user has favorably reviewed a given movie.
Sets are much faster than lists for this type of membership test, and we will use them in later code.
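The following sketch illustrates why a frozenset is convenient here. It uses a hand-built dictionary in the same shape as favorable_reviews_by_users; the name toy_favorable_by_user, the IDs, and the candidate itemset are all assumptions made for illustration. Counting the users who contain a candidate itemset is the kind of check an itemset search typically relies on.

```python
# Hypothetical per-user data in the same shape as favorable_reviews_by_users:
# UserID -> frozenset of MovieIDs the user reviewed favorably.
toy_favorable_by_user = {
    1: frozenset({10, 20, 30}),
    2: frozenset({10, 30}),
    3: frozenset({20}),
}

# Membership test: constant time on average for a set, linear for a list.
user1_liked_movie_20 = 20 in toy_favorable_by_user[1]

# Subset test: did the user favorably review every movie in a candidate itemset?
candidate = frozenset({10, 30})
supporters = [user for user, movies in toy_favorable_by_user.items()
              if candidate.issubset(movies)]

# frozensets are hashable, so an itemset can serve as a dictionary key.
support_counts = {candidate: len(supporters)}

print(user1_liked_movie_20, supporters, support_counts)
```

A plain set would support the same membership and subset tests, but it cannot be used as a dictionary key, which is one reason to prefer frozenset when itemsets themselves need to be stored and counted.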
Finally, we can create a DataFrame that tells us how frequently each movie has been given a favorable review:
num_favorable_by_movie = ratings[["MovieID", "Favorable"]].groupby("MovieID").sum()
We can see the top five movies by running the following code:
num_favorable_by_movie.sort_values(by="Favorable", ascending=False).head()
The result lists the five movies with the most favorable reviews. At this point we only have their IDs; we will retrieve the titles later in the chapter.
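To see how the count is produced, here is a minimal sketch on made-up data. The names toy_ratings, toy_counts, and toy_top, and all values, are hypothetical. It relies on the fact that summing a boolean column counts its True values.

```python
import pandas as pd

# Made-up reviews: movie 10 has two favorable reviews, movie 20 one, movie 30 none.
toy_ratings = pd.DataFrame({
    "MovieID": [10, 10, 10, 20, 20, 30],
    "Favorable": [True, True, False, True, False, False],
})

# Summing a boolean column counts its True values (True counts as 1).
toy_counts = toy_ratings[["MovieID", "Favorable"]].groupby("MovieID").sum()

# Most favorably reviewed movies first.
toy_top = toy_counts.sort_values(by="Favorable", ascending=False)

print(toy_top)
```

The grouped result is indexed by MovieID, so the IDs appear as row labels rather than as a regular column.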
