書(shū)名： Machine Learning for Cybersecurity Cookbook
作者名： Emmanuel Tsukerman
本章字?jǐn)?shù)： 150字
更新時(shí)間： 2021-06-24 12:29:08

Extracting N-grams

In standard quantitative analysis of text, N-grams are sequences of N tokens (for example, words or characters). For instance, given the text The quick brown fox jumped over the lazy dog, if our tokens are words, then the 1-grams are the, quick, brown, fox, jumped, over, the, lazy, and dog. The 2-grams are the quick, quick brown, brown fox, and so on. The 3-grams are the quick brown, quick brown fox, brown fox jumped, and so on. Just like the local statistics of the text allowed us to build a Markov chain to perform statistical predictions and text generation from a corpus, N-grams allow us to model the local statistical properties of our corpus. Our ultimate goal is to utilize the counts of N-grams to help us predict whether a sample is malicious or benign. In this recipe, we demonstrate how to extract N-gram counts from a sample.

官术网_书友最值得收藏!

Machine Learning for Cybersecurity Cookbook

Extracting N-grams