- Machine Learning with scikit-learn Quick Start Guide
- Kevin Jolly
Reducing the size of the data
The dataset that we are working with contains over 6 million rows of data. Most machine learning algorithms would take a long time to run on a dataset of this size. To keep execution times short, we will reduce the dataset to roughly 20,000 rows. We can do this with the following code:
import pandas as pd

# df is the full transaction dataset loaded earlier
# Store the fraudulent rows in one dataframe
df_fraud = df[df['isFraud'] == 1]
# Store the non-fraudulent rows in another dataframe
df_nofraud = df[df['isFraud'] == 0]
# Keep only the first 12,000 non-fraudulent rows
df_nofraud = df_nofraud.head(12000)
# Join the two dataframes back together
df = pd.concat([df_fraud, df_nofraud], axis=0)
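Note that head(12000) simply keeps the first 12,000 non-fraudulent rows in the order they appear in the file. If you are worried about that ordering introducing bias, a random sample works just as well. The following is a minimal sketch of that variant, assuming the same full df as above; the random_state value is arbitrary and just makes the draw reproducible:

# Alternative: draw a random sample of 12,000 non-fraudulent rows
# instead of taking the first 12,000, to avoid any row-order bias.
# Assumes df is still the full dataset loaded earlier.
df_fraud = df[df['isFraud'] == 1]
df_nofraud = df[df['isFraud'] == 0].sample(n=12000, random_state=42)
df = pd.concat([df_fraud, df_nofraud], axis=0)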
In either version, the fraudulent rows are stored in one dataframe, which contains a little over 8,000 rows. The 12,000 non-fraudulent rows are stored in a second dataframe, and the two dataframes are joined together using the concat function from pandas.
This results in a dataframe of a little over 20,000 rows, over which we can now run our algorithms relatively quickly.
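As a quick sanity check, assuming df as built above, you can confirm the new size and the class balance before moving on:

# Verify the reduced dataframe: total row count,
# and how many rows fall into each class of isFraud
print(df.shape)
print(df['isFraud'].value_counts())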