- Machine Learning for Cybersecurity Cookbook
- Emmanuel Tsukerman
- 266字
- 2021-06-24 12:29:08
How it works...
In the literature and industry, it has been determined that the most frequent N-grams are also the most informative ones for a malware classification algorithm. For this reason, in this recipe, we will write functions to extract them for a file. We start by importing some helpful libraries for our extraction of N-grams (step 1). In particular, we import the collections library and the ngrams library from nltk. The collections library allows us to convert a list of N-grams to a frequency count of the N-grams, while the ngrams library allows us to take an ordered list of bytes and obtain a list of N-grams. We specify the file we would like to analyze and write a function that will read all of the bytes of a given file (steps 2 and 3). We define a few more convenience functions before we begin the extraction. In particular, we write a function to take a file's sequence of bytes and output a list of its N-grams (step 4), and a function to take a file and output the counts of its N-grams (step 5). We are now ready to pass in a file and extracts its N-grams. We do so to extract the counts of 4-grams of our file (step 6) and then display the 10 most common of them, along with their counts (step 7). We see that some of the N-gram sequences, such as (0,0,0,0) and (255,255,255,255) may not be very informative. For this reason, we will utilize feature selection methods to cut out the less informative N-grams in our next recipe.
- 機器學習實戰:基于Sophon平臺的機器學習理論與實踐
- PPT,要你好看
- 大數據戰爭:人工智能時代不能不說的事
- 計算機應用復習與練習
- 條碼技術及應用
- 可編程控制器技術應用(西門子S7系列)
- 21天學通Java
- 精通特征工程
- Photoshop CS3圖像處理融會貫通
- 零起點學西門子S7-200 PLC
- 電腦日常使用與維護322問
- Hands-On Data Warehousing with Azure Data Factory
- 常用傳感器技術及應用(第2版)
- Effective Business Intelligence with QuickSight
- Wireshark Revealed:Essential Skills for IT Professionals