
Understanding the Apriori algorithm and its implementation

The goal of this chapter is to produce rules of the following form: if a person recommends this set of movies, they will also recommend this movie. We will also discuss extensions where a person who recommends a set of movies is likely to recommend another particular movie.
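To make the rule form concrete, the sketch below scores one rule on hypothetical data using the standard confidence measure: among the users who favorably reviewed every movie in the premise, the fraction who also favorably reviewed the conclusion. The dictionary `toy_reviews_by_users` and the helper `rule_confidence` are illustrative assumptions, not the chapter's own code.

```python
# Hypothetical toy data: each user ID maps to the frozenset of movie IDs
# that user reviewed favorably.
toy_reviews_by_users = {
    1: frozenset({1, 2, 3}),
    2: frozenset({1, 2}),
    3: frozenset({1, 3}),
    4: frozenset({2, 3}),
}


def rule_confidence(premise, conclusion, reviews_by_users):
    """Confidence of the rule: if a user likes every movie in `premise`,
    they also like `conclusion` (hypothetical helper)."""
    premise = frozenset(premise)
    # Users who favorably reviewed every movie in the premise
    matching = [movies for movies in reviews_by_users.values() if premise <= movies]
    if not matching:
        return 0.0
    # Of those users, count how many also favorably reviewed the conclusion
    correct = sum(1 for movies in matching if conclusion in movies)
    return correct / len(matching)


print(rule_confidence({1, 2}, 3, toy_reviews_by_users))  # → 0.5
```

Here users 1 and 2 like both movie 1 and movie 2, but only user 1 also likes movie 3, so the rule "if {1, 2}, then 3" has a confidence of 0.5.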

To produce these rules, we first need to determine whether a person recommends a movie. We can do this by creating a new feature, Favorable, which is True if the person gave the movie a favorable review, that is, a rating greater than 3:

all_ratings["Favorable"] = all_ratings["Rating"] > 3

We can see the new feature by viewing the dataset:

all_ratings[10:15]
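On a small hypothetical table (`toy_ratings`, assuming ratings on a 1-to-5 scale), the same comparison marks only the ratings of 4 and 5 as favorable:

```python
import pandas as pd

# Hypothetical ratings, assuming a 1-5 scale; not taken from the real dataset
toy_ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 2, 3],
    "MovieID": [10, 20, 10, 30, 20],
    "Rating": [5, 3, 4, 1, 4],
})

# The comparison is vectorized: one boolean per row, True only for ratings above 3
toy_ratings["Favorable"] = toy_ratings["Rating"] > 3
print(toy_ratings)
```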

We will sample our dataset to form training data. Sampling also reduces the size of the dataset that will be searched, which makes the Apriori algorithm run faster. We keep all reviews from users whose UserID is below 200:

ratings = all_ratings[all_ratings['UserID'].isin(range(200))]

Next, we can create a dataset of only the favorable reviews in our sample:

favorable_ratings_mask = ratings["Favorable"]
favorable_ratings = ratings[favorable_ratings_mask]

We will be searching each user's favorable reviews for our itemsets, so the next thing we need is the set of movies that each user has rated favorably. We can compute this by grouping the dataset by UserID and iterating over the movies in each group:

favorable_reviews_by_users = dict((k, frozenset(v.values)) for k, v in favorable_ratings.groupby("UserID")["MovieID"])
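The same grouping, written as a dict comprehension and run on a hypothetical `toy_favorable` table, shows the shape of the result: one frozenset of movie IDs per user.

```python
import pandas as pd

# Hypothetical favorable-only reviews, standing in for favorable_ratings
toy_favorable = pd.DataFrame({
    "UserID": [1, 1, 2, 3, 3, 3],
    "MovieID": [10, 20, 10, 20, 30, 10],
})

# Iterating over a grouped column yields (user ID, Series of movie IDs) pairs;
# a dict comprehension is equivalent to passing a generator of pairs to dict()
toy_reviews_by_users = {
    user_id: frozenset(movies.values)
    for user_id, movies in toy_favorable.groupby("UserID")["MovieID"]
}
print(toy_reviews_by_users)
```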

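When building favorable_reviews_by_users, we stored each user's movies as a frozenset, which lets us quickly check whether that user rated a given movie favorably. A membership test on a set takes constant time on average, whereas a list must be scanned element by element, so sets are much faster for this type of operation. We will rely on them in later code.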
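Apriori relies on exactly these subset checks when counting support. The following sketch runs the first two levels of the level-wise search on hypothetical data: it keeps single movies liked by at least `min_support` users, then builds candidate pairs only from those frequent singletons, since an itemset containing an infrequent movie cannot itself be frequent. Because frozensets are hashable, candidate itemsets can serve as dictionary keys. The data, the threshold of 2, and the `count_support` helper are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical favorable reviews: user ID -> frozenset of movie IDs
toy_reviews_by_users = {
    1: frozenset({1, 2, 3}),
    2: frozenset({1, 2}),
    3: frozenset({1, 3}),
    4: frozenset({2, 4}),
}
min_support = 2  # hypothetical threshold: itemset must be liked by >= 2 users


def count_support(candidates, reviews_by_users):
    """Count how many users favorably reviewed every movie in each candidate."""
    # frozensets are hashable, so candidate itemsets can be dictionary keys
    return {
        itemset: sum(1 for movies in reviews_by_users.values() if itemset <= movies)
        for itemset in candidates
    }


# Level 1: single movies that meet the support threshold
all_movies = set().union(*toy_reviews_by_users.values())
level1 = count_support({frozenset({m}) for m in all_movies}, toy_reviews_by_users)
frequent1 = {itemset for itemset, count in level1.items() if count >= min_support}

# Level 2: combine only frequent singletons into candidate pairs
candidates2 = {a | b for a, b in combinations(frequent1, 2)}
level2 = count_support(candidates2, toy_reviews_by_users)
frequent2 = {itemset for itemset, count in level2.items() if count >= min_support}
print(sorted(sorted(itemset) for itemset in frequent2))  # → [[1, 2], [1, 3]]
```

Movie 4 is liked by only one user, so no pair containing it is ever generated; the pair {2, 3} is generated but dropped because only user 1 likes both. A full implementation repeats the candidate-generation and counting steps for larger itemsets until no new frequent itemsets are found.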

Finally, we can create a DataFrame that tells us how frequently each movie has been given a favorable review:

num_favorable_by_movie = ratings[["MovieID", "Favorable"]].groupby("MovieID").sum()

We can see the top five movies by running the following code:

num_favorable_by_movie.sort_values(by="Favorable", ascending=False).head()

The output lists the five movies with the most favorable reviews in our sample. At this point we only have their IDs; we will look up their titles later in the chapter.
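Summing the boolean Favorable column within each group counts favorable reviews because pandas treats True as 1 and False as 0. The following sketch applies the same counting and ranking code to a small hypothetical `toy_ratings` table so the result can be checked by hand:

```python
import pandas as pd

# Hypothetical sampled ratings with the Favorable flag already computed
toy_ratings = pd.DataFrame({
    "MovieID": [10, 20, 10, 30, 20, 10, 30],
    "Favorable": [True, True, True, False, False, True, True],
})

# Summing a boolean column counts its True values: favorable reviews per movie
toy_num_favorable = toy_ratings[["MovieID", "Favorable"]].groupby("MovieID").sum()

# Rank movies by number of favorable reviews, most-liked first
top = toy_num_favorable.sort_values(by="Favorable", ascending=False).head()
print(top)
```

Movie 10 comes first with three favorable reviews; movies 20 and 30 tie with one each, so their relative order in the ranking is not significant.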
