
Obtaining the dataset

Since the inception of the Netflix Prize, GroupLens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. These include several versions of a movie rating dataset in different sizes: one with 100,000 reviews, one with 1 million reviews, and one with 10 million reviews.

The datasets are available from http://grouplens.org/datasets/movielens/. In this chapter, we will use the MovieLens 100K dataset (with 100,000 reviews). Download this dataset and unzip it in your data folder. Then start a new Jupyter Notebook and type the following code:

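import os
import pandas as pd
# Folder where the MovieLens 100K archive was unzipped; adjust to your own data folder
data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
# Full path to the ratings file inside that folder
ratings_filename = os.path.join(data_folder, "u.data")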

Ensure that ratings_filename points to the u.data file in the unzipped folder.
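One way to check the path is to ask the operating system whether the file is there before going any further. The following is a minimal sketch that assumes the same ~/Data/ml-100k location used above. The helper name describe_ratings_file is hypothetical and introduced only for this illustration; it is not part of the dataset or of pandas.

```python
import os


def describe_ratings_file(path):
    """Return (exists, size_in_bytes); size is None when the file is missing.

    Hypothetical helper used only for this illustration.
    """
    if os.path.isfile(path):
        return True, os.path.getsize(path)
    return False, None


# Same location as in the code above (an assumption about where you unzipped the data)
data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
ratings_filename = os.path.join(data_folder, "u.data")

exists, size = describe_ratings_file(ratings_filename)
if exists:
    print("Found {} ({} bytes)".format(ratings_filename, size))
else:
    print("Could not find {}; check where the archive was unzipped".format(ratings_filename))
```

If the second message is printed, change data_folder to match the folder where you extracted the archive and run the cell again.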
