- Machine Learning for Cybersecurity Cookbook
- Emmanuel Tsukerman
- 2021-06-24 12:29:08
How it works...
Both the research literature and industry practice have found that the most frequent N-grams tend to be the most informative ones for a malware classification algorithm. For this reason, in this recipe, we write functions to extract them from a file.

We start by importing some helpful libraries for our extraction of N-grams (step 1). In particular, we import the collections module and the ngrams function from nltk. The collections module lets us convert a list of N-grams into a frequency count of those N-grams, while the ngrams function takes an ordered list of bytes and returns its N-grams. We specify the file we would like to analyze and write a function that reads all of the bytes of a given file (steps 2 and 3). We then define a few more convenience functions before we begin the extraction: a function that takes a file's sequence of bytes and outputs a list of its N-grams (step 4), and a function that takes a file and outputs the counts of its N-grams (step 5).

We are now ready to pass in a file and extract its N-grams. We extract the counts of the 4-grams of our file (step 6) and then display the 10 most common of them, along with their counts (step 7). We see that some of the N-gram sequences, such as (0,0,0,0) and (255,255,255,255), may not be very informative, since runs of padding bytes like these can appear in many files regardless of whether they are malicious. For this reason, we will utilize feature selection methods to cut out the less informative N-grams in our next recipe.
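The steps described above can be sketched roughly as follows. This is a minimal illustration, not the recipe's exact code: the function names (`read_file`, `byte_sequence_to_ngrams`, `extract_ngram_counts`) are hypothetical, the sample file is a throwaway temporary file with made-up contents, and a small `zip`-based helper stands in for nltk's `ngrams` function so that the sketch runs with the standard library alone.

```python
import collections
import os
import tempfile


def read_file(file_path):
    """Return the raw bytes of a file."""
    with open(file_path, "rb") as binary_file:
        return binary_file.read()


def byte_sequence_to_ngrams(byte_sequence, n):
    """Return every length-n sliding window over the bytes as a tuple of ints.

    Assumption: this zip-based helper replaces nltk's ngrams function here.
    """
    return list(zip(*(byte_sequence[i:] for i in range(n))))


def extract_ngram_counts(file_path, n):
    """Return a Counter mapping each N-gram of the file to its frequency."""
    file_bytes = read_file(file_path)
    return collections.Counter(byte_sequence_to_ngrams(file_bytes, n))


# Hypothetical sample: six 0x00 bytes, five 0xFF bytes, then the bytes "MZ".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 6 + b"\xff" * 5 + b"MZ")
    sample_path = tmp.name

try:
    counts = extract_ngram_counts(sample_path, 4)
    top = counts.most_common(2)
finally:
    os.remove(sample_path)

print(top)
# [((0, 0, 0, 0), 3), ((255, 255, 255, 255), 2)]
```

Even on this tiny input, the most common 4-grams are the runs of 0 and 255 bytes, which is the same pattern that motivates feature selection in the text above. On a real binary, `counts.most_common(10)` would give the ten most frequent 4-grams.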