- Java 9 Regular Expressions
- Anubhava Srivastava
- 541字
- 2021-07-02 18:58:35
Possessive quantifiers
Possessive quantifiers are quantifiers that are greedy when matching text like greedy quantifiers do. Both greedy and possessive quantifiers try to match as many characters as possible. The important difference, however, is that the possessive quantifiers do not backtrack (go back) unlike greedy quantifiers; therefore, it is possible that the regex match fails if the possessive quantifiers go too far.
This table shows all the three types of quantifiers, side by side:

Let's take an example input string, a1b5, and see the behavior of the greedy, lazy, and possessive quantifiers.
If we apply a regex using the greedy quantifier, \w+\d, then it will match a1b (the longest match before backtracking starts) using \w+, and 5 will be matched using \d; thus, the full match will be a1b5.
Now, if we apply a regex using the non-greedy quantifier, \w+?\d, then it will match a (the shortest match before expanding starts) using \w+?, and then the adjacent digit 1 will be matched using \d. Thus, the first full match will be a1. If we let the regex execute again, then it will find another match, b5.
Finally, if we apply a regex using the possessive quantifier, \w++\d, then it will match all the characters a1b5 (the longest possible match without giving back) using \w++ . Due to this, \d remains unmatched, and hence the regex fails to find any match.
Let's take another example. The requirement is to match a string that starts with lowercase English alphabets or hyphen. The string can have any character after the alphabets/hyphens, except a colon. There can be any number of any characters of any length after the colon until the end.
An example of a valid input is as-df999 and that of an invalid input is asdf-:123.
Now, let's try solving this regex problem using a greedy quantifier regex:
^[a-z-]+[^:].*$
Unfortunately, this is not the right regex pattern because this regex will match both the aforementioned valid and invalid inputs. This is because of the backtracking behavior of the regex engine in greedy quantifiers. The [a-z-]+ pattern will find the longest possible match in the form of asdf-, but due to the negated character class pattern [^:] , the regex engine will backtrack one position to asdf and will match the next hyphen for [^:]. All the remaining text, that is, :123, will be matched using .*.
Let's try to solve this regex problem using the following possessive quantifier regex:
^[a-z-]++[^:].*$
This regex pattern will still match our valid input, but it will fail to match an invalid input because there is no backtracking in possessive quantifiers; hence, the regex engine will not go back any position after matching asdf- in the second example string. Since the next character is a colon and our regex sub-pattern is [^:], the regex engine will stop matching and correctly declare our invalid input a failed match.
Possessive quantifiers are good for the performance of the underlying regex engine because the engine does not have to keep any backtracking information in memory. The performance increase is even more when a regex fails to match because possessive quantifiers fail faster. So, remember that the benefit of possessive quantifiers is to improve the regex performance, especially when using nested quantifiers.
- 潮流:UI設計必修課
- Redis Applied Design Patterns
- 算法精粹:經典計算機科學問題的Java實現
- Vue.js快速入門與深入實戰
- Raspberry Pi for Secret Agents(Third Edition)
- Learning AWS Lumberyard Game Development
- Python 3網絡爬蟲實戰
- 組態軟件技術與應用
- Babylon.js Essentials
- HTML5開發精要與實例詳解
- 大學計算機基礎實驗指導
- Mastering Gephi Network Visualization
- Raspberry Pi開發實戰
- Learning Google Apps Script
- 面向對象程序設計教程(C#版)