
Reducing the size of the data

The dataset that we are working with contains over 6 million rows of data. Most machine learning algorithms would take a long time to train on a dataset of this size. To speed up execution, we will reduce the dataset to about 20,000 rows. We can do this by using the following code:

#Storing the fraudulent data into a dataframe
df_fraud = df[df['isFraud'] == 1]

#Storing the non-fraudulent data into a dataframe
df_nofraud = df[df['isFraud'] == 0]

#Storing 12,000 rows of non-fraudulent data
df_nofraud = df_nofraud.head(12000)

#Joining both datasets together
df = pd.concat([df_fraud, df_nofraud], axis = 0)

In the preceding code, the fraudulent rows are stored in one dataframe. This dataframe contains a little over 8,000 rows. The first 12,000 non-fraudulent rows are stored in another dataframe, and the two dataframes are stacked row-wise (axis = 0) using the concat method from pandas.

This results in a dataframe with a little over 20,000 rows, over which we can now execute our algorithms relatively quickly. 
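The steps above can be sketched end to end and verified on synthetic data. The DataFrame built here is a hypothetical stand-in for the real transaction dataset, which would normally be loaded from a CSV file; only the 'isFraud' label column matters for the size-reduction logic:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the full dataset: a synthetic frame with an
# 'isFraud' label column (roughly 10% fraud, for illustration only)
rng = np.random.default_rng(0)
df = pd.DataFrame({'isFraud': rng.choice([0, 1], size=100_000, p=[0.9, 0.1])})

# Split by class, keep all fraud rows and the first 12,000 non-fraud rows
df_fraud = df[df['isFraud'] == 1]
df_nofraud = df[df['isFraud'] == 0].head(12000)

# Stack the two frames row-wise into the reduced dataset
df_small = pd.concat([df_fraud, df_nofraud], axis=0)

# Verify the reduced size and the class counts
print(len(df_small))
print(df_small['isFraud'].value_counts())
```

Note that concat simply stacks the frames, so all fraud rows come first; if the order matters for a later step, the result can be shuffled with df_small.sample(frac=1).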
