
Basic searching for strings and files

Imagine searching for a four-leaf clover in a big garden. It would be really hard (and it is still really hard for computers). Thankfully, words are not images, and text on a computer is easily searchable, depending on its format. Format matters here because if your tool cannot understand how a given piece of text is stored (its encoding), you might have trouble recognizing a pattern or even detecting that there is text at all!
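The effect of encoding on searching is easy to demonstrate. The following minimal Python sketch (the helper name `contains_bytes` and the sample text are made up for illustration) compares raw bytes, the way a byte-oriented search tool would. The word is found in the UTF-8 copy of the text but not in the UTF-16 copy, even though both hold exactly the same characters.

```python
# Minimal sketch: a raw byte search only finds a pattern when the needle and
# the haystack are stored in the same encoding.

def contains_bytes(haystack: bytes, needle: bytes) -> bool:
    """Return True if the byte sequence `needle` occurs anywhere in `haystack`."""
    return needle in haystack


text = "look for the clover here"
needle = "clover".encode("ascii")          # b'clover'

utf8_data = text.encode("utf-8")           # identical to the ASCII bytes for this text
utf16_data = text.encode("utf-16-le")      # every character takes 2 bytes here

print(contains_bytes(utf8_data, needle))   # True
print(contains_bytes(utf16_data, needle))  # False: stored as b'c\x00l\x00o\x00...'

# Re-encoding the needle to match the haystack makes the search work again.
print(contains_bytes(utf16_data, "clover".encode("utf-16-le")))  # True
```

The `utf-16-le` codec is used rather than plain `utf-16` because Python's `utf-16` codec prepends a byte-order mark, which would also end up at the front of the encoded needle and break the final search.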

Typically, when you are looking at the console, text files, source code (C, C++, Bash, HTML), spreadsheets, XML, and other types, you are looking at text encoded in ASCII or in one of the Unicode Transformation Formats (UTF). ASCII is commonly used on the console in the *NIX world. UTF is an improvement upon ASCII: it can represent a far wider range of characters (accented letters, non-Latin scripts, symbols) than ASCII's 128 codes allow. It comes in a number of variants, such as UTF-8, UTF-16, and UTF-32. UTF-8 is backward compatible with ASCII, so plain ASCII text is also valid UTF-8.
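To see how these encodings differ in practice, here is a small Python sketch (the helper `encoded_sizes` and the sample strings are illustrative choices, not part of any tool discussed here). It counts the bytes each encoding uses for the same text and shows where ASCII runs out of room.

```python
# Sketch: how many bytes the same text occupies in different encodings.
# The "-le" (little-endian) codecs are used so Python does not prepend a
# byte-order mark, which would inflate the UTF-16/UTF-32 counts.

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict:
    """Map each encoding name to the number of bytes `text` needs in it."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


# e.g. "hello" -> 5, 10, 20 bytes; "日本" -> 6, 4, 8 bytes
for sample in ("hello", "héllo", "日本"):
    print(sample, encoded_sizes(sample))

# Plain ASCII text produces identical bytes in ASCII and UTF-8 ...
assert "hello".encode("ascii") == "hello".encode("utf-8")

# ... but ASCII has no way to represent characters outside its 128 codes.
try:
    "héllo".encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot encode:", err.object[err.start:err.end])
```

Note that "日本" is shorter in UTF-16 than in UTF-8, while plain English text is shortest in UTF-8; which encoding is most compact depends on the characters involved.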

Encoding and decoding are similar in spirit to encryption and decryption, but the purpose is not to hide something. Rather, it is to transform data into a form appropriate for a particular use case, such as transmission, representing different languages, or compression.
ASCII and UTF are not the only encodings your target data might be in. Various file types may store their data using entirely different encodings. That is a separate problem, specific to your data, and it will need additional consideration.
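Both points can be illustrated with Base64, a common encoding chosen here purely as an example (the text above does not name a specific one). In this minimal Python sketch, Base64 turns bytes into printable characters suitable for transmission and hides nothing from anyone who decodes it, yet it still defeats a naive string search until the data is decoded.

```python
import base64

# Sketch: Base64 is an encoding, not encryption. It represents arbitrary bytes
# using printable ASCII characters (handy for transmission), and anyone can
# reverse it without a key.
note = b"the clover is under the oak"
encoded = base64.b64encode(note)
print(encoded)

# A plain search over the encoded bytes misses the word entirely ...
print(b"clover" in encoded)                    # False

# ... but decoding restores the original, and the search succeeds.
decoded = base64.b64decode(encoded)
print(decoded == note)                         # True
print(b"clover" in decoded)                    # True
```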

In this recipe, we will begin searching for strings and look at a couple of ways to find your own needles in a massive haystack of data. Let's dig in.
