
Basic searching for strings and files

Imagine searching for a four-leaf clover in a big garden. It would be really hard for a person, and finding objects in images is still hard for computers. Thankfully, words are not images, and text on a computer is easily searchable, depending on the format. The format matters because if your tool cannot understand a given type of text (its encoding), it might have trouble recognizing a pattern, or even detecting that there is text at all!

Typically, when you are looking at the console, text files, source code (C, C++, Bash, HTML), spreadsheets, XML, and other types of text, you are looking at them in ASCII or UTF. ASCII is a commonly used format on the console in the *NIX world. There is also the UTF (Unicode Transformation Format) family of encoding schemes, which improves upon ASCII and supports a variety of extended characters that were not present in computing originally. It comes in a number of formats, such as UTF-8, UTF-16, and UTF-32.
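To make the point about encodings concrete, here is a minimal sketch showing how the same word can be invisible to a search tool when it is stored in a different encoding. The scratch directory and the file names `utf8.txt` and `utf16.txt` are hypothetical, and the UTF-16 bytes are written by hand (little-endian, no byte order mark) purely for illustration.

```shell
#!/bin/bash
# Hypothetical scratch directory and file names, used only for this sketch.
workdir=$(mktemp -d)

# The word "needle" stored as plain ASCII/UTF-8 bytes...
printf 'needle\n' > "$workdir/utf8.txt"
# ...and as hand-written UTF-16LE bytes (every character is followed by a NUL byte).
printf 'n\000e\000e\000d\000l\000e\000\n\000' > "$workdir/utf16.txt"

# grep -c prints the number of matching lines. The "|| true" keeps the
# script going, because grep exits non-zero when nothing matches.
utf8_hits=$(grep -c 'needle' "$workdir/utf8.txt" || true)
utf16_hits=$(grep -c 'needle' "$workdir/utf16.txt" || true)

echo "UTF-8 matches:  $utf8_hits"
echo "UTF-16 matches: $utf16_hits"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The pattern is compared byte for byte, so the NUL bytes between the letters in the second file prevent a match even though both files contain the same word. Converting such a file to UTF-8 first (for example with a tool such as `iconv`, where available) is one way around this.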

Encoding and decoding may sound similar to encryption and decryption, but the purpose is not to hide something. Rather, it is to transform data into a form appropriate for the use case, for example, transmission, use with different languages, or compression.
ASCII and UTF are not the only encodings your target data might be in. Different types of files may encode their data in different ways. That is a separate problem, specific to your data, and it will need additional consideration.
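As a small illustration of encoding for transmission rather than secrecy, the following sketch uses Base64, which is not mentioned in the text and is chosen here only as a familiar example. It assumes the `base64` utility from GNU coreutils is installed; the variable names are hypothetical.

```shell
#!/bin/bash
# Encode: turn bytes into a restricted set of ASCII characters that is
# safe to send through text-only channels.
encoded=$(printf 'hello' | base64)
echo "$encoded"

# Decode: anyone can reverse the transformation; no key or password is involved.
# (-d is the GNU coreutils spelling; other implementations may use -D or --decode.)
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"
```

Because decoding needs no secret, an encoded string should never be treated as protected. It has only been reshaped for its use case.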

In this recipe, we will begin searching for strings and look at a couple of ways to find your own needles in a massive haystack of data. Let's dig in.
