- Learning Data Mining with Python (Second Edition)
- Robert Layton
- 2021-07-02 23:40:12
Obtaining the dataset
Since the inception of the Netflix Prize, GroupLens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. These include several versions of a movie rating dataset in different sizes: one with 100,000 ratings, one with 1 million ratings, and one with 10 million ratings.
The datasets are available from http://grouplens.org/datasets/movielens/. In this chapter we use the MovieLens 100K dataset, which contains 100,000 ratings. Download this dataset and unzip it into your data folder. Then start a new Jupyter Notebook and type the following code:
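```python
import os
import pandas as pd

# Folder holding the unzipped MovieLens 100K files (here ~/Data/ml-100k)
data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
# u.data is the file that contains the ratings themselves
ratings_filename = os.path.join(data_folder, "u.data")
```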
Ensure that ratings_filename points to the u.data file in the unzipped folder.
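If you would rather fetch the archive from code, or want to confirm that ratings_filename really points at the ratings file, here is a minimal sketch that uses only the standard library. It assumes that the zip is hosted at https://files.grouplens.org/datasets/movielens/ml-100k.zip and that it unpacks to a top-level ml-100k/ folder. The names ensure_ml100k and check_ratings_file are hypothetical helpers, not part of any library. The format check relies on u.data being a tab-separated list of user id, item id, rating (1 to 5), and timestamp.

```python
import csv
import os
import urllib.request
import zipfile

# Assumption: current hosting location of the ML-100K archive; if it has moved,
# look it up on http://grouplens.org/datasets/movielens/
ML_100K_URL = "https://files.grouplens.org/datasets/movielens/ml-100k.zip"


def ensure_ml100k(data_root, url=ML_100K_URL):
    """Download and unzip ml-100k into data_root unless u.data is already there.

    Returns the path to u.data. Assumes the archive contains a top-level
    ml-100k/ folder.
    """
    ratings_path = os.path.join(data_root, "ml-100k", "u.data")
    if not os.path.exists(ratings_path):
        os.makedirs(data_root, exist_ok=True)
        zip_path = os.path.join(data_root, "ml-100k.zip")
        # Fetch the archive, then extract every member under data_root
        urllib.request.urlretrieve(url, zip_path)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(data_root)
    return ratings_path


def check_ratings_file(path):
    """Sanity-check u.data: tab-separated user id, item id, rating, timestamp.

    Returns the number of rating rows; raises ValueError on a malformed row.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"u.data not found at {path!r}")
    count = 0
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) != 4:
                raise ValueError(f"expected 4 tab-separated fields, got {row!r}")
            # int() also validates that every field is an integer
            _user_id, _item_id, rating, _timestamp = map(int, row)
            if not 1 <= rating <= 5:
                raise ValueError(f"rating out of range 1-5: {rating}")
            count += 1
    return count
```

For example, ensure_ml100k(os.path.join(os.path.expanduser("~"), "Data")) returns the same path as ratings_filename above, and check_ratings_file on the full file should return 100000.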