
Basic searching for strings and files

Imagine searching for a four-leaf clover in a big garden. It would be really hard, and that kind of image search is still hard for computers. Thankfully, words are not images, and text on a computer is easy to search as long as your tool understands its format. The format matters because if your tool cannot interpret the encoding of the text, it might fail to recognize a pattern, or even fail to detect that there is any text at all!

Typically, when you are looking at the console, text files, source code (C, C++, Bash, HTML), spreadsheets, XML, and other text-based data, you are looking at ASCII or one of the UTF encodings. ASCII is a commonly used encoding on the console in the *NIX world. The UTF (Unicode Transformation Format) encodings go beyond ASCII and support a wide range of extended characters that were not present in computing originally. They come in several forms, such as UTF-8, UTF-16, and UTF-32. Of these, UTF-8 is backward compatible with ASCII: a plain ASCII character is stored as the same single byte in both.
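To make the difference between ASCII and UTF-8 concrete, the following is a minimal sketch that counts how many bytes a few characters occupy. The helper name `byte_count` is hypothetical and not part of any standard tool; the non-ASCII characters are built from explicit octal byte values so that the result does not depend on how the script file itself is saved.

```shell
#!/usr/bin/env bash
# byte_count is a hypothetical helper: print how many bytes a string occupies.
byte_count() {
  printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

ascii_a='A'                      # U+0041, a plain ASCII letter
e_acute=$(printf '\303\251')     # U+00E9, written as its two UTF-8 bytes
euro=$(printf '\342\202\254')    # U+20AC, written as its three UTF-8 bytes

echo "U+0041: $(byte_count "$ascii_a") byte(s)"
echo "U+00E9: $(byte_count "$e_acute") byte(s)"
echo "U+20AC: $(byte_count "$euro") byte(s)"

# Dump the raw bytes in hex to see what a search tool actually receives.
printf '%s' "$euro" | od -An -tx1
```

The ASCII letter takes one byte, while the other two characters take two and three bytes in UTF-8. A tool that assumes one byte per character would therefore see them as several unrelated bytes.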

Encoding and decoding may sound similar to encryption and decryption, but the purpose is not to hide anything. Instead, encoding transforms data into a form appropriate for a particular use case, such as transmission, support for different languages, or compression.
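
As one illustration of encoding for transmission, here is a small sketch using Base64, which is not discussed in the text above and is chosen here only as a familiar example. It assumes the `base64` utility from GNU coreutils is installed; some older BSD and macOS versions expect `-D` rather than `-d` for decoding.

```shell
#!/usr/bin/env bash
plain='needle'

# Encode: map the bytes onto a 64-character ASCII alphabet. No key is involved.
encoded=$(printf '%s' "$plain" | base64)
echo "encoded: $encoded"

# Decode: anyone can reverse the transformation, so nothing was hidden.
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "decoded: $decoded"
```

Because no secret is required to reverse the transformation, the encoded form offers no confidentiality. It only makes the data safe to pass through channels that expect plain text.
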
ASCII and UTF are not the only encodings your target data might use. Different types of files may encode their data in other ways. That is a separate problem, specific to your data, and it needs additional consideration.
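
The effect of an unexpected encoding on a search can be sketched as follows. This example writes the same word once as ASCII and once as UTF-16LE (built by hand with explicit NUL bytes), then searches both with `grep`. The file names and the temporary directory are assumptions made for illustration. Stripping NUL bytes with `tr` is a shortcut that only works because the sample is pure ASCII stored as UTF-16LE; for real data, a converter such as `iconv` would normally be the safer choice.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# The same line stored twice: once as ASCII, once as UTF-16LE
# (each ASCII character followed by a NUL byte).
printf 'needle\n' > "$workdir/ascii.txt"
printf 'n\000e\000e\000d\000l\000e\000\n\000' > "$workdir/utf16le.txt"

# grep -c prints the number of matching lines and exits non-zero when there are none.
ascii_hits=$(grep -c 'needle' "$workdir/ascii.txt")
utf16_hits=$(grep -c 'needle' "$workdir/utf16le.txt" || true)

# Shortcut conversion: valid only for ASCII-range text in UTF-16LE.
fixed_hits=$(tr -d '\000' < "$workdir/utf16le.txt" | grep -c 'needle')

echo "ascii: $ascii_hits  utf16le: $utf16_hits  after conversion: $fixed_hits"
rm -r "$workdir"
```

The word is present in both files, yet the plain search only finds it in the ASCII copy, because the interleaved NUL bytes break up the byte sequence that `grep` is looking for.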

In this recipe, we will begin searching for strings and look at a couple of ways to find your own needles in a massive haystack of data. Let's dig in.
