
Basic searching for strings and files

Imagine searching for a four-leaf clover in a big garden. It would be really hard, and that kind of visual search is still hard for computers. Thankfully, words are not images, and text on a computer is easy to search, provided your tool understands its format. The format matters because if your tool cannot interpret a given text encoding, it may have trouble recognizing a pattern, or may not even detect that there is text at all!

Typically, when you look at the console, text files, source code (C, C++, Bash, HTML), spreadsheets, XML, and other types of data, you are looking at text encoded in ASCII or UTF. ASCII is a commonly used encoding on the console in the *NIX world. The UTF (Unicode Transformation Format) encodings go beyond ASCII and support a wide variety of extended characters that were not present in computing originally. They come in several forms, such as UTF-8, UTF-16, and UTF-32.
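To make the difference between these encodings concrete, the following minimal sketch counts how many bytes the same four-character string occupies in UTF-8, UTF-16, and UTF-32. It assumes `iconv` is installed; the variable names are illustrative only, and the non-ASCII character is written as octal escapes so the script does not depend on how the script file itself is saved.

```shell
#!/usr/bin/env bash
# Sketch: byte length of one string in three UTF encodings.
# Assumption: iconv is available on this system.

# "caf" followed by e-acute, written as its two UTF-8 bytes (octal 303 251).
text=$(printf 'caf\303\251')

# $(( ... )) strips the padding some wc implementations add to their output.
utf8_bytes=$(( $(printf '%s' "$text" | wc -c) ))
utf16_bytes=$(( $(printf '%s' "$text" | iconv -f UTF-8 -t UTF-16LE | wc -c) ))
utf32_bytes=$(( $(printf '%s' "$text" | iconv -f UTF-8 -t UTF-32LE | wc -c) ))

echo "UTF-8:  $utf8_bytes bytes"
echo "UTF-16: $utf16_bytes bytes"
echo "UTF-32: $utf32_bytes bytes"
```

The three plain letters take one byte each in UTF-8 and the accented letter takes two, while UTF-16 and UTF-32 use two and four bytes per character here. The explicit `LE` (little-endian) variants are used so that no byte order mark is added to the output.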

The words encoding and decoding may sound similar to encryption and decryption, but the purpose is not to hide something. Rather, it is to transform data into a form appropriate for the use case: for example, transmission, support for different languages, or compression.
ASCII and UTF are not the only encodings your target data might be in. Different types of files may encode their data in different ways. This is a separate problem that is specific to your data and needs additional consideration.
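As a rough illustration of why the encoding matters when searching, the sketch below writes the same sentence in UTF-8 and in UTF-16, then searches both with `grep`. It assumes `grep`, `iconv`, and `mktemp` are available; the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a byte-oriented search can miss text stored in another encoding.
# Assumptions: grep, iconv, and mktemp are installed; file names are made up.

workdir=$(mktemp -d)
printf 'a needle in the haystack\n' > "$workdir/utf8.txt"
iconv -f UTF-8 -t UTF-16LE "$workdir/utf8.txt" > "$workdir/utf16.txt"

# grep -c prints the number of matching lines. It exits non-zero when that
# number is 0, so "|| true" keeps the script going in that case.
plain_hits=$(grep -c 'needle' "$workdir/utf8.txt" || true)
raw_hits=$(grep -c 'needle' "$workdir/utf16.txt" || true)

# Converting back to UTF-8 first makes the text searchable again.
converted_hits=$(iconv -f UTF-16LE -t UTF-8 "$workdir/utf16.txt" | grep -c 'needle' || true)

echo "UTF-8 file:            $plain_hits matching line(s)"
echo "UTF-16 file, raw:      $raw_hits matching line(s)"
echo "UTF-16 file, decoded:  $converted_hits matching line(s)"

rm -rf "$workdir"
```

In the UTF-16 file every ASCII letter is followed by a zero byte, so the byte sequence `needle` never appears and the raw search finds nothing, even though the text is there.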

In this recipe, we will begin searching for strings and look at a couple of ways to find your own needles in a massive haystack of data. Let's dig in.
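As a first taste, here is a minimal sketch of the two kinds of search involved: looking for a string inside files with `grep`, and looking for files by name with `find`. The directory layout and file names are hypothetical, and the `-r` option assumes a `grep` that supports recursive search (GNU and BSD `grep` both do).

```shell
#!/usr/bin/env bash
# Sketch: search by content with grep, then by file name with find.
# Assumption: the directory and file names below are invented for the demo.

haystack=$(mktemp -d)
mkdir -p "$haystack/logs"
printf 'nothing here\n' > "$haystack/notes.txt"
printf 'first line\nfound the needle\n' > "$haystack/logs/app.log"

# -r: recurse into directories; -l: print only the names of matching files.
matching_files=$(grep -rl 'needle' "$haystack")

# -n: prefix each matching line with its line number.
match_line=$(grep -n 'needle' "$haystack/logs/app.log")

# find matches on file names and types rather than on file content.
log_files=$(find "$haystack" -type f -name '*.log')

echo "Files containing the string: $matching_files"
echo "Matching line: $match_line"
echo "Files named *.log: $log_files"

rm -rf "$haystack"
```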
