Obtaining the dataset

Since the inception of the Netflix Prize, GroupLens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. Among them is a movie rating dataset published in several sizes: one version with 100,000 reviews, one with 1 million reviews, and one with 10 million reviews.

The datasets are available from http://grouplens.org/datasets/movielens/. In this chapter we will use the MovieLens 100K dataset (with 100,000 reviews). Download this dataset and unzip it in your data folder. Then start a new Jupyter Notebook and type the following code:

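import os
import pandas as pd
# Folder where the MovieLens 100K archive was unzipped; adjust to your setup
data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
# u.data holds the ratings themselves
ratings_filename = os.path.join(data_folder, "u.data")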

Ensure that ratings_filename points to the u.data file in the unzipped folder.
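One way to confirm that the path is correct is to open the file and check its shape before going further. The following is a minimal sketch, not part of the original walkthrough. The helper name check_ratings_file is hypothetical, and it assumes that u.data is tab-separated with four fields per row (user, movie, rating, timestamp), as described in the dataset's README.

```python
import os


def check_ratings_file(path, expected_fields=4):
    """Return the number of rows in a tab-separated ratings file.

    Raises FileNotFoundError if the path does not point to a file, and
    ValueError if any row does not have the expected number of fields.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No ratings file at {path}")
    n_rows = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            # Assumption: one rating per line, fields separated by tabs
            fields = line.rstrip("\n").split("\t")
            if len(fields) != expected_fields:
                raise ValueError(
                    f"Line {line_number} has {len(fields)} fields, "
                    f"expected {expected_fields}"
                )
            n_rows += 1
    return n_rows


# Same location as in the code above; adjust if you unzipped elsewhere
ratings_filename = os.path.join(
    os.path.expanduser("~"), "Data", "ml-100k", "u.data"
)
if os.path.isfile(ratings_filename):
    print("Rows found:", check_ratings_file(ratings_filename))
else:
    print("u.data not found at", ratings_filename)
```

If the file is in place, the row count should match the size of the version you downloaded. A "not found" message usually means the archive was unzipped into a different folder than data_folder.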