- Expert C++
- Vardan Grigoryan Shunguang Wu
- 2021-06-24 16:33:54
Tokenization
The analysis phase of the compiler begins by splitting the source code into small units called tokens. A token may be a word or a single symbol, such as = (the equals sign). It is the smallest unit of source code that carries meaning for the compiler. For example, the declaration int a = 42; is divided into the tokens int, a, =, 42, and ;. The code isn't simply split on spaces, because the following declaration produces the same tokens (though it is advisable to keep the spaces around operators for readability):
int a=42;
Splitting the source code into tokens is typically done with pattern-matching techniques such as regular expressions. This step is known as lexical analysis, or tokenization (dividing into tokens). For the compiler, tokenized input is a more convenient basis for constructing the internal data structures used to analyze the syntax of the code. Let's see how.