- Machine Learning for Cybersecurity Cookbook
- Emmanuel Tsukerman
- 2021-06-24 12:29:04
How to do it...
In the following steps, we will collect notable portions of the PE header:
- Import pefile and modules for enumerating our samples:
import pefile
from os import listdir
from os.path import isfile, join
directories = ["Benign PE Samples", "Malicious PE Samples"]
- We define a function to collect the names of the sections of a file and preprocess them for readability and normalization:
def get_section_names(pe):
    """Gets a list of section names from a PE file."""
    list_of_section_names = []
    for sec in pe.sections:
        normalized_name = sec.Name.decode().replace("\x00", "").lower()
        list_of_section_names.append(normalized_name)
    return list_of_section_names
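To see what the normalization does, the following minimal sketch runs the same function on stand-in objects instead of a real file. The `SimpleNamespace` objects and the sample names are assumptions for illustration: they only mimic the `sections` list and the bytes-valued, null-padded `Name` attribute that the function reads.

```python
from types import SimpleNamespace


def get_section_names(pe):
    """Gets a list of section names from a PE file."""
    list_of_section_names = []
    for sec in pe.sections:
        # Section names are fixed-width bytes padded with null characters.
        normalized_name = sec.Name.decode().replace("\x00", "").lower()
        list_of_section_names.append(normalized_name)
    return list_of_section_names


# Hypothetical stand-in for a parsed PE object (not produced by pefile).
fake_pe = SimpleNamespace(
    sections=[
        SimpleNamespace(Name=b".text\x00\x00\x00"),
        SimpleNamespace(Name=b".RSRC\x00\x00\x00"),
        SimpleNamespace(Name=b"UPX0\x00\x00\x00\x00"),
    ]
)

names = get_section_names(fake_pe)
print(names)  # ['.text', '.rsrc', 'upx0']
```

Stripping the padding and lowercasing means that `.RSRC` and `.rsrc` are treated as the same section name.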
- We define a convenience function to preprocess and standardize our imports:
def preprocess_imports(list_of_DLLs):
    """Normalize the naming of the imports of a PE file."""
    return [x.decode().split(".")[0].lower() for x in list_of_DLLs]
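As a quick illustration, here is the same function applied to a few made-up DLL names (the input values are assumptions, chosen only to show the effect of the normalization):

```python
def preprocess_imports(list_of_DLLs):
    """Normalize the naming of the imports of a PE file."""
    return [x.decode().split(".")[0].lower() for x in list_of_DLLs]


# Illustrative inputs: import names arrive as bytes with inconsistent casing.
raw_dlls = [b"KERNEL32.dll", b"user32.DLL", b"ADVAPI32.dll"]
normalized = preprocess_imports(raw_dlls)
print(normalized)  # ['kernel32', 'user32', 'advapi32']

# Everything after the first dot is dropped, not just the extension.
dotted = preprocess_imports([b"some.lib.dll"])
print(dotted)  # ['some']
```

Note that `split(".")[0]` keeps only the text before the first dot, so a name that contains a dot in its stem is shortened more than the extension alone would require.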
- We then define a function to collect the imports from a file using pefile:
def get_imports(pe):
    """Get a list of the imports of a PE file."""
    list_of_imports = []
    for entry in pe.DIRECTORY_ENTRY_IMPORT:
        list_of_imports.append(entry.dll)
    return preprocess_imports(list_of_imports)
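A file without an import directory typically has no `DIRECTORY_ENTRY_IMPORT` attribute at all, so `get_imports` raises an `AttributeError` for it. The sketch below shows one possible defensive variant that returns an empty list in that case. The name `get_imports_or_empty` is hypothetical and not part of the recipe, and the `SimpleNamespace` objects are stand-ins that only mimic the attributes the function reads.

```python
from types import SimpleNamespace


def preprocess_imports(list_of_DLLs):
    """Normalize the naming of the imports of a PE file."""
    return [x.decode().split(".")[0].lower() for x in list_of_DLLs]


def get_imports_or_empty(pe):
    """Hypothetical variant: return [] when the import directory is absent."""
    entries = getattr(pe, "DIRECTORY_ENTRY_IMPORT", [])
    return preprocess_imports([entry.dll for entry in entries])


# Stand-in objects for illustration only (not produced by pefile).
with_imports = SimpleNamespace(
    DIRECTORY_ENTRY_IMPORT=[
        SimpleNamespace(dll=b"KERNEL32.dll"),
        SimpleNamespace(dll=b"WS2_32.dll"),
    ]
)
without_imports = SimpleNamespace()

found = get_imports_or_empty(with_imports)
missing = get_imports_or_empty(without_imports)
print(found)    # ['kernel32', 'ws2_32']
print(missing)  # []
```

Whether to keep such a file with an empty import list or to skip it entirely is a design choice; an empty import list can itself be an informative feature.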
- Finally, we prepare to iterate through all of our files and create lists to store our features:
imports_corpus = []
num_sections = []
section_names = []
for dataset_path in directories:
    samples = [f for f in listdir(dataset_path) if isfile(join(dataset_path, f))]
    for file in samples:
        file_path = join(dataset_path, file)
        try:
- In addition to collecting the preceding features, we also collect the number of sections of a file:
            pe = pefile.PE(file_path)
            imports = get_imports(pe)
            n_sections = len(pe.sections)
            sec_names = get_section_names(pe)
            imports_corpus.append(imports)
            num_sections.append(n_sections)
            section_names.append(sec_names)
- In case a file's PE header cannot be parsed, we wrap the parsing code in a try-except clause:
        except Exception as e:
            print(e)
            print("Unable to obtain imports from " + file_path)
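The loop above follows a general pattern: enumerate the files, try to parse each one, and skip those that fail. The sketch below isolates that pattern so it can be run without pefile or real samples. `collect_features` and `toy_parse` are hypothetical names introduced for illustration; `toy_parse` stands in for `pefile.PE` and merely checks for the `MZ` magic bytes that begin a PE file.

```python
import os
import tempfile


def collect_features(directories, parse):
    """Apply `parse` to every file; record the files it cannot handle."""
    features, failed = [], []
    for dataset_path in directories:
        for name in sorted(os.listdir(dataset_path)):
            file_path = os.path.join(dataset_path, name)
            if not os.path.isfile(file_path):
                continue
            try:
                features.append(parse(file_path))
            except Exception as e:
                # Keep going: one unparsable file should not stop the run.
                failed.append((file_path, str(e)))
    return features, failed


def toy_parse(file_path):
    """Hypothetical stand-in for pefile.PE: accept only files starting with b'MZ'."""
    with open(file_path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError("DOS header magic not found")
    return os.path.basename(file_path)


# Build a throwaway directory with one acceptable and one unacceptable file.
with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("good.exe", b"MZ\x90\x00"), ("notes.txt", b"hello")]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    features, failed = collect_features([tmp], toy_parse)

print(features)     # ['good.exe']
print(len(failed))  # 1
```

In the recipe, all three features are computed before any of them is appended, so a failure partway through leaves `imports_corpus`, `num_sections`, and `section_names` aligned with one another.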