Loading with pandas

The MovieLens dataset is in good shape; however, we need to change a few of the default options in pandas.read_csv. To start with, the data is separated by tabs, not commas. Next, there is no heading line: the first line in the file is actually data, so we need to set the column names manually.

When loading the file, we set the delimiter parameter to the tab character, tell pandas not to read the first row as the header (with header=None), and supply the column names through the names parameter. Let's look at the following code:

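import pandas as pd

# Tab-separated file with no header row, so the column names are given explicitly
all_ratings = pd.read_csv(ratings_filename, delimiter="\t", header=None,
                          names=["UserID", "MovieID", "Rating", "Datetime"])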
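To see what these options do without the dataset on hand, here is a minimal sketch that reads a small in-memory string instead of the ratings file. The two rows in sample are made up for illustration and only mimic the layout described above (user, movie, rating, Unix timestamp).

```python
import io

import pandas as pd

# Two made-up rows in the layout described above; these are not real MovieLens records
sample = "1\t10\t3\t881250949\n2\t20\t5\t891717742\n"

# Same options as for the real file: tab delimiter, no header row, explicit column names
ratings = pd.read_csv(io.StringIO(sample), delimiter="\t", header=None,
                      names=["UserID", "MovieID", "Rating", "Datetime"])

print(ratings.shape)
print(list(ratings.columns))
```

Because header=None is set, the first line is kept as data, so both rows end up in the DataFrame.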

While we won't use it in this chapter, you can parse the timestamp into a proper datetime using the following line. Review dates can be an important feature in recommendation prediction, as movies that are rated together often have more similar ratings than movies rated separately. Accounting for this can improve models significantly.

all_ratings["Datetime"] = pd.to_datetime(all_ratings['Datetime'], unit='s')
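The unit='s' argument tells pandas to interpret the integers as seconds since the Unix epoch (1970-01-01 00:00:00). A minimal sketch with two hand-picked values, chosen only because their dates are easy to check by hand:

```python
import pandas as pd

# 0 seconds is the epoch itself; 86400 seconds is exactly one day later
stamps = pd.Series([0, 86400])
parsed = pd.to_datetime(stamps, unit="s")

print(parsed)
```

Once the column is a datetime, accessors such as parsed.dt.year become available, which is convenient if you later want to build date-based features.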

You can view the first few records by running the following in a new cell:

all_ratings.head()

The result is a table of the first five rows, with the UserID, MovieID, Rating, and Datetime columns.
