
Loading with pandas

The MovieLens dataset is in good shape; however, we need to change some of the default options of pandas.read_csv. First, the data is separated by tabs, not commas. Second, there is no header line: the first line of the file is actually data, so we need to set the column names manually.

When loading the file, we set the delimiter parameter to the tab character, tell pandas not to treat the first row as a header (with header=None), and supply the column names through the names parameter. Let's look at the following code:

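import pandas as pd
# Tab-separated file with no header row, so the column names are supplied explicitly
all_ratings = pd.read_csv(ratings_filename, delimiter="\t", header=None,
                          names=["UserID", "MovieID", "Rating", "Datetime"])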
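To check these read_csv options without the real file, the following sketch parses a small in-memory string in the same layout (tab-separated, no header row). The sample rows are made up for illustration and are not taken from MovieLens; io.StringIO simply stands in for ratings_filename. Note that the tab delimiter must be written as the escape sequence "\t", not the letter "t".

```python
import io

import pandas as pd

# Hypothetical sample in the MovieLens ratings layout: tab-separated, no header row.
# These values are invented for illustration only.
sample = "1\t10\t4\t881250949\n2\t20\t5\t891717742\n3\t10\t2\t878887116\n"

# Same options as for the real file: tab delimiter, no header, explicit column names.
all_ratings = pd.read_csv(
    io.StringIO(sample),
    delimiter="\t",
    header=None,
    names=["UserID", "MovieID", "Rating", "Datetime"],
)

print(all_ratings)
```

Because header=None is set, the first line is kept as a data row instead of being consumed as column names, so all three sample rows appear in the DataFrame.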

While we won't use it in this chapter, you can parse the Datetime column, which stores Unix timestamps (seconds since 1 January 1970), into proper dates using the following line. Review dates can be an important feature in recommendation prediction, as movies that a user rates around the same time often receive more similar ratings than movies rated at different times. Accounting for this can improve models significantly.

all_ratings["Datetime"] = pd.to_datetime(all_ratings['Datetime'], unit='s')
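The unit='s' argument matters because pd.to_datetime treats plain integers as nanoseconds by default. This minimal sketch applies the same conversion to a few made-up timestamps (not MovieLens data) so the effect is easy to see.

```python
import pandas as pd

# Hypothetical Unix timestamps in seconds, chosen so the results are easy to verify.
all_ratings = pd.DataFrame({"Datetime": [0, 86400, 881250949]})

# unit="s" tells pandas the integers count seconds since the epoch, not nanoseconds.
all_ratings["Datetime"] = pd.to_datetime(all_ratings["Datetime"], unit="s")

# 0 -> 1970-01-01 00:00:00, 86400 (one day of seconds) -> 1970-01-02 00:00:00
print(all_ratings)
```

Once converted, the column supports date operations such as all_ratings["Datetime"].dt.year, which is what makes it usable as a feature.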

You can view the first few records by running the following in a new cell:

all_ratings.head()

The result is the first five rows of the DataFrame, with the columns UserID, MovieID, Rating, and Datetime.
