Loading with pandas

The MovieLens dataset is in good shape; however, we need to change some of the default options in pandas.read_csv. To start with, the data is separated by tabs, not commas. Next, there is no header line, so the first line in the file is actually data and we need to set the column names manually.

When loading the file, we set the delimiter parameter to the tab character, tell pandas not to treat the first row as a header (with header=None), and supply the column names ourselves through the names parameter. Let's look at the following code:

all_ratings = pd.read_csv(ratings_filename, delimiter="\t", header=None,
                          names=["UserID", "MovieID", "Rating", "Datetime"])
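To see what these three options do without needing the dataset on disk, here is a minimal sketch that reads a small in-memory string instead of a file. The three sample rows and the sample_ratings name are illustrative assumptions; only the read_csv arguments mirror the call above.

```python
import io

import pandas as pd

# Illustrative rows in the same layout as the ratings file:
# tab-separated, with no header line (values are made up for this sketch).
sample = (
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
)

sample_ratings = pd.read_csv(
    io.StringIO(sample),   # stands in for ratings_filename
    delimiter="\t",        # fields are separated by tabs, not commas
    header=None,           # the first line is data, not column names
    names=["UserID", "MovieID", "Rating", "Datetime"],
)

print(sample_ratings.shape)
print(list(sample_ratings.columns))
```

Without header=None, pandas would consume the first row as column labels and the frame would hold one record fewer than the file.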

While we won't use it in this chapter, you can parse the timestamp column into proper dates using the following line; unit='s' tells pandas that the values are counts of seconds. Dates for reviews can be an important feature in recommendation prediction, as movies that are rated together often have more similar rankings than movies ranked separately. Accounting for this can improve models significantly.

all_ratings["Datetime"] = pd.to_datetime(all_ratings['Datetime'], unit='s')
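As a small sketch of what this conversion does, the snippet below applies pd.to_datetime with unit='s' to a few hand-picked integers, on the assumption that the values are Unix timestamps (seconds since 1970-01-01). The raw_seconds and parsed names are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical timestamps, assumed to be seconds since 1970-01-01.
raw_seconds = pd.Series([0, 86400, 881250949])

# unit="s" makes pandas interpret the integers as seconds;
# without it they would be read as nanoseconds.
parsed = pd.to_datetime(raw_seconds, unit="s")

print(parsed.dtype)             # datetime64[ns]
print(parsed.dt.year.tolist())  # [1970, 1970, 1997]
```

Once the column holds real datetimes, the .dt accessor gives access to components such as year or weekday, which is one way to build the date-based features mentioned above.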

You can view the first few records by running the following in a new cell:

all_ratings.head()

The result will come out looking something like this:
