- Machine Learning for Cybersecurity Cookbook
- Emmanuel Tsukerman
- 2021-06-24 12:29:08
How to do it...
In the following steps, we will enumerate all the 4-grams of a sample file and list the 10 most frequent ones:
- We begin by importing the collections library to facilitate counting and the ngrams function from nltk to ease extraction of N-grams:
import collections
from nltk import ngrams
- We specify which file we would like to analyze:
file_to_analyze = "python-3.7.2-amd64.exe"
- We define a convenience function to read in a file's bytes:
def read_file(file_path):
    """Reads in the binary sequence of a binary file."""
    with open(file_path, "rb") as binary_file:
        data = binary_file.read()
    return data
- We write a convenience function to take a byte sequence and obtain N-grams:
def byte_sequence_to_Ngrams(byte_sequence, N):
    """Creates a list of N-grams from a byte sequence."""
    Ngrams = ngrams(byte_sequence, N)
    return list(Ngrams)
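To make the sliding window concrete, here is a minimal sketch that uses only the standard library. The helper name `sliding_ngrams` is hypothetical and not part of the recipe; it is meant to illustrate the same idea as `ngrams` from nltk, namely that iterating over a byte sequence yields integers, so each N-gram is a tuple of N integers.

```python
def sliding_ngrams(byte_sequence, n):
    """Return all overlapping n-grams of a byte sequence as tuples of ints.

    Hypothetical stdlib-only helper, written for illustration.
    """
    # Build n shifted views of the sequence and zip them together;
    # zip stops at the shortest view, so only full windows are produced.
    return list(zip(*(byte_sequence[i:] for i in range(n))))


sample = b"Fail"
print(sliding_ngrams(sample, 2))
# [(70, 97), (97, 105), (105, 108)]
```

A sequence of length L produces L - N + 1 N-grams, and a sequence shorter than N produces none.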
- We write a function to take a file and obtain its count of N-grams:
def binary_file_to_Ngram_counts(file, N):
    """Takes a binary file and outputs the N-gram counts of its binary sequence."""
    filebyte_sequence = read_file(file)
    file_Ngrams = byte_sequence_to_Ngrams(filebyte_sequence, N)
    return collections.Counter(file_Ngrams)
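For large files, materializing every N-gram in a list before counting can be costly. The following is a sketch of one possible alternative, not the recipe's method: the hypothetical function `count_ngrams_streaming` updates a Counter one window at a time. Note the assumption that differs from the recipe: keys here are `bytes` slices rather than tuples of integers.

```python
import collections


def count_ngrams_streaming(byte_sequence, n):
    """Count n-grams without first building a list of all of them.

    Hypothetical helper; keys are bytes objects, not tuples of ints.
    """
    counts = collections.Counter()
    # Each slice is an immutable bytes object, so it can serve as a Counter key.
    for start in range(len(byte_sequence) - n + 1):
        counts[byte_sequence[start:start + n]] += 1
    return counts


counts = count_ngrams_streaming(b"abababab", 4)
print(counts.most_common(2))
# [(b'abab', 3), (b'baba', 2)]
```

Whether this actually reduces peak memory in practice depends on the file and on N, so it is worth measuring before adopting it.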
- We specify that our desired value is N=4 and obtain the counts of all 4-grams in the file:
extracted_Ngrams = binary_file_to_Ngram_counts(file_to_analyze, 4)
- We list the 10 most common 4-grams of our file:
print(extracted_Ngrams.most_common(10))
The result is as follows:
[((0, 0, 0, 0), 24201), ((139, 240, 133, 246), 1920), ((32, 116, 111, 32), 1791), ((255, 255, 255, 255), 1663), ((108, 101, 100, 32), 1522), ((100, 32, 116, 111), 1519), ((97, 105, 108, 101), 1513), ((105, 108, 101, 100), 1513), ((70, 97, 105, 108), 1505), ((101, 100, 32, 116), 1503)]
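The integer tuples are easier to interpret when printable bytes are shown as characters. The sketch below copies the result above into a list and renders each 4-gram; `render_ngram` is a hypothetical helper introduced here for illustration.

```python
# The ten most common 4-grams, copied from the result above.
top_ngrams = [
    ((0, 0, 0, 0), 24201),
    ((139, 240, 133, 246), 1920),
    ((32, 116, 111, 32), 1791),
    ((255, 255, 255, 255), 1663),
    ((108, 101, 100, 32), 1522),
    ((100, 32, 116, 111), 1519),
    ((97, 105, 108, 101), 1513),
    ((105, 108, 101, 100), 1513),
    ((70, 97, 105, 108), 1505),
    ((101, 100, 32, 116), 1503),
]


def render_ngram(ngram):
    """Show printable ASCII bytes as characters and all other bytes as hex escapes."""
    return "".join(chr(b) if 32 <= b < 127 else f"\\x{b:02x}" for b in ngram)


for ngram, count in top_ngrams:
    print(render_ngram(ngram), count)
```

Seven of the ten entries decode to overlapping fragments of text (" to ", "led ", "d to", "aile", "iled", "Fail", and "ed t"), which is consistent with a phrase such as "Failed to " occurring many times in the file. The remaining entries are runs of 0x00 or 0xFF bytes and one non-printable sequence, which plausibly come from padding, code, or other binary data rather than text.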