- Learning Data Mining with Python(Second Edition)
- Robert Layton
- 2021-07-02 23:40:12
Loading with pandas
The MovieLens dataset is in good shape; however, we need to change some of the default options in pandas.read_csv. To start with, the data is separated by tabs, not commas. Next, there is no header row. This means the first line in the file is actually data, so we need to set the column names manually.
When loading the file, we set the delimiter parameter to the tab character, tell pandas not to read the first row as the header (with header=None), and supply the column names through the names parameter. Let's look at the following code:
all_ratings = pd.read_csv(ratings_filename, delimiter="\t", header=None,
                          names=["UserID", "MovieID", "Rating", "Datetime"])
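To see why header=None matters, here is a minimal sketch that reads a small in-memory sample instead of the real file. The three rows and the names sample, with_default_header and ratings are assumptions made for illustration; only the tab-separated, header-less layout is taken from the description above.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same layout as the ratings file:
# tab-separated values, no header row.
sample = (
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
)

# With the default header handling, pandas consumes the first data row
# as column names, so one rating is silently lost.
with_default_header = pd.read_csv(io.StringIO(sample), delimiter="\t")

# With header=None and explicit names, every row is kept as data.
ratings = pd.read_csv(
    io.StringIO(sample),
    delimiter="\t",
    header=None,
    names=["UserID", "MovieID", "Rating", "Datetime"],
)

print(with_default_header.shape)
print(ratings.shape)
print(list(ratings.columns))
```

Comparing the two shapes is a quick way to confirm that no row was swallowed as a header.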
While we won't use it in this chapter, you can parse the timestamp column, which stores each rating's time as seconds since the Unix epoch, into proper datetime values using the following line. Review dates can be an important feature in recommendation prediction, as movies that are rated together often have more similar ratings than movies rated separately. Accounting for this can improve models significantly.
all_ratings["Datetime"] = pd.to_datetime(all_ratings['Datetime'], unit='s')
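As a small illustration of what unit='s' does, the sketch below converts a few integer timestamps and derives a year column from them. The values and the names raw, parsed and years are chosen for the example and are not taken from the dataset.

```python
import pandas as pd

# Illustrative Unix timestamps in seconds: the epoch itself, one day
# after it, and a value from late 1997.
raw = pd.Series([0, 86400, 881250949])

# unit="s" tells pandas the integers count seconds since 1970-01-01 00:00:00.
parsed = pd.to_datetime(raw, unit="s")

# Once parsed, the .dt accessor exposes date parts that could serve as features.
years = parsed.dt.year

print(parsed)
print(years.tolist())
```

Without the unit argument, pandas would interpret the integers as nanoseconds, and all three values would land within the first second of 1970.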
You can view the first few records by running the following in a new cell:
all_ratings.head()
The result will come out looking something like this: