
Reducing the size of the data

The dataset that we are working with contains over 6 million rows of data. Most machine learning algorithms take a long time to train on a dataset of this size. To keep execution times short, we will reduce the dataset to roughly 20,000 rows by using the following code:

#Importing pandas (df holds the transaction data loaded earlier)
import pandas as pd

#Storing the fraudulent data into a dataframe
df_fraud = df[df['isFraud'] == 1]

#Storing the non-fraudulent data into a dataframe
df_nofraud = df[df['isFraud'] == 0]

#Keeping only the first 12,000 rows of non-fraudulent data
df_nofraud = df_nofraud.head(12000)

#Joining the two dataframes together
df = pd.concat([df_fraud, df_nofraud], axis = 0)

In the preceding code, the fraudulent rows, a little over 8,000 of them, are stored in one dataframe. The first 12,000 non-fraudulent rows are stored in another dataframe, and the two dataframes are joined together using the concat method from pandas.

This results in a dataframe of a little over 20,000 rows, on which our algorithms can now execute relatively quickly.
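As a quick sanity check, we can confirm the size and class balance of the reduced dataframe. The following is a minimal sketch that assumes the downsampling code above has already been run:

#Checking the number of rows and columns after downsampling
print(df.shape)

#Counting the fraudulent (1) and non-fraudulent (0) rows that remain
print(df['isFraud'].value_counts())

The value_counts output should show roughly 8,000 rows labeled 1 and exactly 12,000 rows labeled 0, confirming that the reduction worked as intended.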
