- Machine Learning for Cybersecurity Cookbook
- Emmanuel Tsukerman
- 2021-06-24 12:29:00
How it works...
The first step loads the libraries that let us manipulate data quickly and easily. In steps 2 and 3, we generate a training set and a testing set of normal observations, both drawn from the same distribution. In step 4, we generate the remainder of the testing set by creating outliers. This anomalous data has a different distribution from the training data and from the rest of the testing data. Plotting the data (step 5), we see that some outlier points look indistinguishable from normal points. Because the two distributions overlap, our classifier is bound to misclassify some observations, and we must keep this in mind when evaluating its performance. In step 6, we fit an instance of Isolation Forest with default parameters to the training data.
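Steps 1 to 4 and step 6 can be sketched roughly as follows. This is a minimal illustration, not the recipe's exact code: the sample sizes, the Gaussian and uniform distributions, and the random seed are assumptions chosen for the example. Only the names X_train, X_test, and X_outliers follow the text.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumed seed, used only to make the sketch reproducible
rng = np.random.RandomState(42)

# Normal observations: training and testing sets share one distribution
# (an assumed 2-D Gaussian centered at the origin)
X_train = 0.5 * rng.randn(500, 2)
X_test = 0.5 * rng.randn(500, 2)

# Outliers: drawn from a different distribution (assumed uniform on a wider box),
# so a few of them inevitably fall inside the normal cluster
X_outliers = rng.uniform(low=-5, high=5, size=(50, 2))

# Fit Isolation Forest with default parameters on the normal training data only;
# the outliers are never shown to the model
clf = IsolationForest(random_state=0)
clf.fit(X_train)
```

Fitting on X_train alone is what makes this an anomaly detection setup rather than ordinary supervised classification.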
Note that the algorithm is fed no information about the anomalous data. We use the trained Isolation Forest to predict whether each observation in the normal testing data is normal or anomalous, and then do the same for the outliers. To examine how the algorithm performs, we append the predicted labels to X_outliers (step 7) and then plot the predictions of the Isolation Forest instance on the outliers (step 8). We see that it captured most of the anomalies. Those that were incorrectly labeled were indistinguishable from normal observations. Next, in step 9, we append the predicted labels to X_test in preparation for analysis and then plot the predictions of the Isolation Forest instance on the normal testing data (step 10). We see that it correctly labeled the majority of normal observations. At the same time, a significant number of normal observations were incorrectly classified as anomalous (shown in red).
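The same check can be done numerically instead of visually. The sketch below rebuilds an assumed synthetic dataset (the distributions, sizes, and seed are illustrative, not the recipe's own) and counts how many outliers are caught and how many normal points are flagged. In scikit-learn, IsolationForest.predict returns -1 for anomalies and +1 for normal observations.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumed synthetic data, for illustration only
rng = np.random.RandomState(42)
X_train = 0.5 * rng.randn(500, 2)
X_test = 0.5 * rng.randn(500, 2)
X_outliers = rng.uniform(low=-5, high=5, size=(50, 2))

clf = IsolationForest(random_state=0).fit(X_train)

# predict returns +1 for normal observations and -1 for anomalies
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

# Fraction of true outliers flagged as anomalous
detection_rate = np.mean(y_pred_outliers == -1)
# Fraction of normal test points wrongly flagged as anomalous (false alarms)
false_alarm_rate = np.mean(y_pred_test == -1)

print(f"Outliers detected: {detection_rate:.1%}")
print(f"Normal points flagged as anomalous: {false_alarm_rate:.1%}")
```

The exact percentages depend on the data and the seed, so they are best read alongside the plots rather than as fixed benchmarks.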
Depending on how many false alarms we are willing to tolerate, we may need to fine-tune our classifier to reduce the number of false positives.
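One possible knob for this kind of tuning is the contamination parameter of scikit-learn's IsolationForest, which sets the proportion of training points treated as anomalous and therefore the decision threshold. The text does not say which parameter to adjust, so treat this as an assumption; the data and the candidate values below are also illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumed synthetic data, for illustration only
rng = np.random.RandomState(42)
X_train = 0.5 * rng.randn(500, 2)
X_test = 0.5 * rng.randn(500, 2)
X_outliers = rng.uniform(low=-5, high=5, size=(50, 2))

false_alarm_rates = []
detection_rates = []
candidates = (0.01, 0.05, 0.10)  # hypothetical values to compare

for contamination in candidates:
    # Same seed each time, so only the decision threshold changes
    clf = IsolationForest(contamination=contamination, random_state=0)
    clf.fit(X_train)
    false_alarm_rates.append(np.mean(clf.predict(X_test) == -1))
    detection_rates.append(np.mean(clf.predict(X_outliers) == -1))

for c, fa, det in zip(candidates, false_alarm_rates, detection_rates):
    print(f"contamination={c:.2f}  false alarms={fa:.1%}  outliers detected={det:.1%}")
```

A lower contamination value flags fewer normal points but can also miss more of the outliers that sit close to the normal cluster, so the choice reflects how costly a false alarm is compared with a missed anomaly.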