Dropping features that are redundant

In the dataset explored previously, a few columns contribute nothing useful to the machine learning process:

  • nameOrig: This column is a unique identifier for the customer who originates each transaction. Because the identifier is different in every row of the dataset, the machine learning algorithm cannot discern any pattern from this feature.
  • nameDest: This column is also a customer identifier, in this case for the recipient of each transaction, and as such it provides no value to the machine learning algorithm.
  • isFlaggedFraud: This column flags a transaction as fraudulent when someone tries to transfer more than 200,000 in a single transaction. Because the isFraud feature already labels each transaction as fraudulent or not, this feature is redundant.
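The reasoning for nameOrig and nameDest rests on how many distinct values a column holds relative to the number of rows. The following is a minimal sketch of that check, assuming pandas is available. The helper find_id_like_columns, its 0.95 threshold, and the toy rows are all hypothetical and are not taken from the original dataset. Numeric columns are skipped on purpose, since a continuous column such as a transaction amount can also be nearly unique without being an identifier.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def find_id_like_columns(df: pd.DataFrame, threshold: float = 0.95) -> list[str]:
    """Hypothetical helper: return non-numeric columns whose share of
    distinct values is at least `threshold` (i.e. identifier-like)."""
    n_rows = len(df)
    if n_rows == 0:
        return []
    flagged = []
    for col in df.columns:
        if is_numeric_dtype(df[col]):
            continue  # amounts, labels, etc. are not treated as identifiers
        if df[col].nunique(dropna=False) / n_rows >= threshold:
            flagged.append(col)
    return flagged


# Hypothetical toy rows mimicking the columns discussed above
toy = pd.DataFrame({
    "amount": [120.50, 98.00, 7400.00, 15.25],
    "nameOrig": ["C0001", "C0002", "C0003", "C0004"],
    "nameDest": ["M0001", "M0002", "C0005", "C0006"],
    "isFraud": [0, 0, 1, 0],
})

print(find_id_like_columns(toy))  # ['nameOrig', 'nameDest']
```

On the full dataset a high threshold like this may not catch every identifier column (a recipient can appear in many rows), so the check is a quick heuristic rather than a replacement for inspecting what each column means.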

We can drop these features from the dataset by using the following code: 

# Drop the redundant features
df = df.drop(['nameOrig', 'nameDest', 'isFlaggedFraud'], axis=1)
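The drop above raises a KeyError if the cell is run a second time, because the columns are already gone from df. One way to make the step safe to re-run is sketched below, assuming a recent pandas version. The toy DataFrame is a hypothetical stand-in for the real dataset. Passing columns=... is equivalent to passing the list with axis=1, and errors="ignore" tells pandas to skip labels that are no longer present.

```python
import pandas as pd

# Hypothetical toy frame standing in for the transaction dataset
df = pd.DataFrame({
    "amount": [100.0, 250000.0],
    "nameOrig": ["C0001", "C0002"],
    "nameDest": ["M0001", "C0003"],
    "isFlaggedFraud": [0, 0],
    "isFraud": [0, 1],
})

redundant = ["nameOrig", "nameDest", "isFlaggedFraud"]

# columns=... is the same as axis=1; errors="ignore" skips missing labels,
# so running this line again does not raise a KeyError
df = df.drop(columns=redundant, errors="ignore")
df = df.drop(columns=redundant, errors="ignore")  # second run is a no-op

print(list(df.columns))  # ['amount', 'isFraud']
```

The trade-off is that errors="ignore" also hides typos in column names, so it is worth checking the remaining columns after the drop, as the print statement does here.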