Reducing the size of the data

The dataset that we are working with contains over 6 million rows of data. Most machine learning algorithms will take a large amount of time to work with a dataset of this size. In order to make our execution time quicker, we will reduce the size of the dataset to 20,000 rows. We can do this by using the following code:

#Storing the fraudulent data into a dataframe
df_fraud = df[df['isFraud'] == 1]

#Storing the non-fraudulent data into a dataframe
df_nofraud = df[df['isFraud'] == 0]

#Storing 12,000 rows of non-fraudulent data
df_nofraud = df_nofraud.head(12000)

#Joining both datasets together
df = pd.concat([df_fraud, df_nofraud], axis=0)

In the preceding code, all of the fraudulent rows are stored in one dataframe, which contains a little over 8,000 rows. The first 12,000 non-fraudulent rows, selected with the head method, are stored in another dataframe, and the two dataframes are joined together using the concat method from pandas.

This results in a dataframe with a little over 20,000 rows, over which we can now execute our algorithms relatively quickly. 
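To see the filter-and-concatenate pattern end to end, here is a minimal sketch on a small synthetic dataframe standing in for the transaction dataset; the column name isFraud matches the book's data, but the row counts and the amount column are made up for illustration:

```python
import pandas as pd
import numpy as np

# Hypothetical toy dataset: 30 fraudulent and 70 non-fraudulent rows
df = pd.DataFrame({
    'amount': np.random.rand(100),
    'isFraud': [1] * 30 + [0] * 70,
})

# Keep every fraudulent row
df_fraud = df[df['isFraud'] == 1]

# Keep only the first 50 non-fraudulent rows
df_nofraud = df[df['isFraud'] == 0].head(50)

# Join the two subsets back into one dataframe
df_small = pd.concat([df_fraud, df_nofraud], axis=0)

print(len(df_small))  # 80 rows: 30 fraud + 50 non-fraud
```

Note that head simply takes the first N rows in file order; if the non-fraudulent rows are not randomly ordered, drawing them with df_nofraud.sample(n=12000, random_state=0) instead would give a less biased subset.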
