
Tokenization

The analysis phase of the compiler begins by splitting the source code into small units called tokens. A token may be a word or just a single symbol, such as = (the equals sign). It is the smallest unit of the source code that carries meaning for the compiler. For example, the declaration int a = 42; is divided into the tokens int, a, =, 42, and ;. The code isn't simply split on spaces, because the following declaration produces the same tokens (though it is still advisable to put spaces around operators for readability):

int a=42;
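
To make this concrete, here is a minimal sketch of how such splitting might look, assuming a toy token set (identifiers or keywords, integer literals, and single-character symbols). The function name tokenize and the pattern are illustrative assumptions, not how any real compiler implements its lexer:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper for illustration: splits a source string into tokens
// for a toy grammar. Real lexers also classify tokens and handle comments,
// string literals, multi-character operators, and so on.
std::vector<std::string> tokenize(const std::string& source) {
    // Alternatives, tried in order at each position:
    //   [A-Za-z_]\w*  an identifier or keyword
    //   \d+           an integer literal
    //   \S            any other single non-whitespace character
    static const std::regex token_pattern(R"([A-Za-z_]\w*|\d+|\S)");

    std::vector<std::string> tokens;
    // std::sregex_iterator walks over successive matches; whitespace matches
    // none of the alternatives, so it is skipped between tokens.
    for (std::sregex_iterator it(source.begin(), source.end(), token_pattern), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Under these assumptions, calling tokenize("int a = 42;") and tokenize("int a=42;") yields the same five tokens: int, a, =, 42, and ;.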

Splitting the source code into tokens is typically done with pattern-matching techniques based on regular expressions. This step is known as lexical analysis, or tokenization (dividing into tokens). For compilers, working with tokenized input makes it easier to construct the internal data structures used to analyze the syntax of the code. Let's see how.
