
Dropping features that are redundant

In the dataset seen previously, a few columns are redundant for the machine learning process:

  • nameOrig: This column is a unique identifier that belongs to each customer. Because the identifier is different in every row of the dataset, the machine learning algorithm cannot discern any pattern from this feature.
  • nameDest: This column is also a unique identifier that belongs to each customer, and for the same reason it provides no value to the machine learning algorithm.
  • isFlaggedFraud: This column flags a transaction as fraudulent if a person tries to transfer more than 200,000 in a single transaction. Because the isFraud column already labels fraudulent transactions, this feature is redundant.
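
One way to check the uniqueness claim before dropping a column is to compare the number of distinct values with the number of rows. The following is a minimal sketch, not part of the original walkthrough: the toy DataFrame and all of its values are made up for illustration, and unique_ratio is a hypothetical helper name.

```python
import pandas as pd

# Toy data standing in for the real dataset; every value here is invented.
toy = pd.DataFrame({
    "nameOrig": ["C1", "C2", "C3", "C4"],
    "nameDest": ["M1", "M2", "M3", "M4"],
    "amount": [100.0, 250000.0, 100.0, 50.0],
    "isFraud": [0, 1, 0, 0],
})


def unique_ratio(frame: pd.DataFrame, column: str) -> float:
    """Return the fraction of rows that hold a distinct value in `column`."""
    return frame[column].nunique() / len(frame)


# A ratio of 1.0 means every row has its own value, so the column
# behaves like a row identifier rather than a feature.
id_ratio = unique_ratio(toy, "nameOrig")

# A ratio below 1.0 means values repeat across rows.
amount_ratio = unique_ratio(toy, "amount")
```

On the real dataset, a ratio at or very close to 1.0 for an identifier column would support treating it as uninformative.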

We can drop these features from the dataset by using the following code: 

# Drop the redundant features
df = df.drop(columns=['nameOrig', 'nameDest', 'isFlaggedFraud'])
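
To confirm that the drop behaves as intended, the same call can be tried on a small stand-in DataFrame. This is a sketch under assumptions: the toy data is invented, and the names toy, redundant, reduced and remaining are illustrative only.

```python
import pandas as pd

# Toy data with the same column names as the text; the values are invented.
toy = pd.DataFrame({
    "amount": [100.0, 250000.0],
    "nameOrig": ["C1", "C2"],
    "nameDest": ["M1", "M2"],
    "isFlaggedFraud": [0, 1],
    "isFraud": [0, 1],
})

redundant = ["nameOrig", "nameDest", "isFlaggedFraud"]

# drop() returns a new DataFrame and leaves the original untouched,
# which is why the result has to be assigned to a name.
reduced = toy.drop(columns=redundant)

# Only the columns that were not dropped remain, in their original order.
remaining = list(reduced.columns)
```

Because drop() does not modify the DataFrame in place by default, forgetting to assign the result leaves the redundant columns where they were.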