- Machine Learning for Cybersecurity Cookbook
- Emmanuel Tsukerman
- 2021-06-24 12:28:54
Train-test-splitting your data
In machine learning, our goal is to create a program that can perform tasks it has never been explicitly taught to perform. We do that by using data we have collected to train, or fit, a mathematical or statistical model. The data used to fit the model is referred to as training data. The trained model is then used to make predictions on new, previously unseen data. In this way, the program can handle new situations without human intervention.
One of the major challenges for a machine learning practitioner is overfitting: creating a model that performs well on the training data but fails to generalize to new, previously unseen data. To combat overfitting, practitioners set aside a portion of the data, called test data, and use it only to assess the performance of the trained model, never as part of the training dataset. This careful setting aside of test sets is key to training classifiers in cybersecurity, where overfitting is an omnipresent danger. One small oversight, such as using only benign data from one locale, can lead to a poor classifier.
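The idea of holding out test data can be sketched in a few lines. The following is a minimal illustration that uses only the Python standard library. The function name `holdout_split`, its parameters, and the toy data are assumptions made for this example, not part of the recipe. In practice, a library helper such as scikit-learn's `train_test_split` is the more common choice.

```python
import random


def holdout_split(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle the data and hold out a fraction of it as a test set.

    Hypothetical helper for illustration only.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data (made up): one feature per sample; label 1 = malicious, 0 = benign.
samples = [[i] for i in range(20)]
labels = [i % 2 for i in range(20)]

X_train, X_test, y_train, y_test = holdout_split(samples, labels)
print(len(X_train), len(X_test))  # 15 5
```

The model would be fit on `X_train` and `y_train` only. `X_test` and `y_test` stay untouched until the final performance assessment.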
There are various other ways to validate model performance, such as cross-validation. For simplicity, we will focus mainly on train-test splitting.
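For comparison, the cross-validation approach mentioned above can be sketched as follows. This is a minimal illustration of k-fold index generation in plain Python. The name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled. Libraries such as scikit-learn provide ready-made equivalents.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Hypothetical helper for illustration; assumes the data is already shuffled.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, k=5))
for train_idx, test_idx in folds:
    print(test_idx)
```

Each sample appears in exactly one test fold, so every data point is used for evaluation once and for training in the remaining rounds.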