- Java 9 Regular Expressions
- Anubhava Srivastava
- 2021-07-02 18:58:34
The effect of eager matching on regular expression alternation
Because the regex engine matches eagerly, an alternation may return unexpected matches if its alternatives are not ordered carefully in the regex pattern.
Take an example of this regex pattern, which matches the strings white or whitewash:
white|whitewash
When this regex is applied against an input of whitewash, the regex engine finds that the first alternative, white, matches the white substring of the input string whitewash. The engine therefore stops right there and returns white as the match.
Note that our regex pattern has a better second alternative, whitewash, but due to the regex engine's eagerness to complete and return the match, the first alternative is returned as the match and the second alternative is never tried.
However, consider swapping the positions of the first and second alternatives in our regex pattern to make it as follows:
whitewash|white
If we apply this against the same input, whitewash, then the regex engine correctly returns the match as whitewash.
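To see this behavior in Java, here is a minimal sketch that runs both orderings against the same input using java.util.regex. The class name EagerAlternationDemo and the helper firstMatch are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EagerAlternationDemo {

    // Hypothetical helper: returns the first substring of input that the
    // regex matches, or null if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "whitewash";

        // Shorter alternative listed first: the engine stops at "white".
        System.out.println(firstMatch("white|whitewash", input)); // white

        // Longer alternative listed first: the whole word is matched.
        System.out.println(firstMatch("whitewash|white", input)); // whitewash
    }
}
```

As a rule of thumb, when one alternative is a prefix of another, placing the longer alternative first avoids the truncated match.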
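We can also use anchors or boundary matchers in our regular expressions to force a match of the complete word. The first alternative, white, still matches first, but the anchor or word boundary that follows it fails, so the engine backtracks and tries the second alternative. Either of the following two patterns will match and return whitewash as a match: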
^(white|whitewash)$
\b(white|whitewash)\b
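The following sketch checks both anchored patterns against the input whitewash. The class name AnchoredAlternationDemo and the helper firstMatch are hypothetical; note that each backslash must be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredAlternationDemo {

    // Hypothetical helper: returns the first substring of input that the
    // regex matches, or null if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "whitewash";

        // After "white" matches, $ fails because more input remains, so the
        // engine backtracks and tries the second alternative.
        System.out.println(firstMatch("^(white|whitewash)$", input)); // whitewash

        // After "white" matches, \b fails between "e" and "w" (both are word
        // characters), so the engine again falls back to "whitewash".
        System.out.println(firstMatch("\\b(white|whitewash)\\b", input)); // whitewash
    }
}
```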
Let's take a look at a more interesting example, which attempts to match a known literal string "cat & rat" or a complete word in the input, using the following pattern:
\b(\w+|cat & rat)\b
If the input string is story of cat & rat, and we apply our regex pattern repeatedly, then the following four matched substrings will be returned:
1. story
2. of
3. cat
4. rat
This is because the regex engine eagerly uses the first alternative, \w+, to match a complete word and returns every word it matches. The engine never reaches the second alternative, the literal string cat & rat, at a position where it could match, because the first alternative always succeeds there. However, let's change the regex pattern to the following:
\b(cat & rat|\w+)\b
If we apply this regex pattern repeatedly to the same string, story of cat & rat, then the following three matched substrings will be returned:
1. story
2. of
3. cat & rat
This is because now cat & rat is the first alternative and when the regex engine moves to a position before the letter c in the input, it is able to match and return a successful match using the first alternative.
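Both orderings can be compared with a short sketch that applies each pattern repeatedly by calling Matcher.find() in a loop. The class name AlternationOrderDemo and the helper allMatches are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {

    // Hypothetical helper: applies the regex repeatedly and collects every
    // matched substring, in the order the matches are found.
    static List<String> allMatches(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "story of cat & rat";

        // \w+ comes first, so it wins at every word and the literal is never used.
        System.out.println(allMatches("\\b(\\w+|cat & rat)\\b", input));
        // prints [story, of, cat, rat]

        // The literal comes first, so it is tried before \w+ at the letter c.
        System.out.println(allMatches("\\b(cat & rat|\\w+)\\b", input));
        // prints [story, of, cat & rat]
    }
}
```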