官术网_书友最值得收藏!

Possessive quantifiers

Possessive quantifiers are quantifiers that are greedy when matching text like greedy quantifiers do. Both greedy and possessive quantifiers try to match as many characters as possible. The important difference, however, is that the possessive quantifiers do not backtrack (go back) unlike greedy quantifiers; therefore, it is possible that the regex match fails if the possessive quantifiers go too far.

This table shows all the three types of quantifiers, side by side:

Let's take an example input string, a1b5, and see the behavior of the greedy, lazy, and possessive quantifiers.

If we apply a regex using the greedy quantifier, \w+\d, then it will match a1b (the longest match before backtracking starts) using \w+, and 5 will be matched using \d; thus, the full match will be a1b5.

Now, if we apply a regex using the non-greedy quantifier, \w+?\d, then it will match a (the shortest match before expanding starts) using \w+?, and then the adjacent digit 1 will be matched using \d. Thus, the first full match will be a1. If we let the regex execute again, then it will find another match, b5.

Finally, if we apply a regex using the possessive quantifier, \w++\d, then it will match all the characters a1b5 (the longest possible match without giving back) using \w++ . Due to this, \d remains unmatched, and hence the regex fails to find any match.

Let's take another example. The requirement is to match a string that starts with lowercase English alphabets or hyphen. The string can have any character after the alphabets/hyphens, except a colon. There can be any number of any characters of any length after the colon until the end.

An example of a valid input is as-df999 and that of an invalid input is asdf-:123.

Now, let's try solving this regex problem using a greedy quantifier regex:

    ^[a-z-]+[^:].*$

Unfortunately, this is not the right regex pattern because this regex will match both the aforementioned valid and invalid inputs. This is because of the backtracking behavior of the regex engine in greedy quantifiers. The [a-z-]+ pattern will find the longest possible match in the form of asdf-, but due to the negated character class pattern [^:] , the regex engine will backtrack one position to asdf and will match the next hyphen for [^:]. All the remaining text, that is, :123, will be matched using .*.

Let's try to solve this regex problem using the following possessive quantifier regex:

    ^[a-z-]++[^:].*$

This regex pattern will still match our valid input, but it will fail to match an invalid input because there is no backtracking in possessive quantifiers; hence, the regex engine will not go back any position after matching asdf- in the second example string. Since the next character is a colon and our regex sub-pattern is [^:], the regex engine will stop matching and correctly declare our invalid input a failed match.

Possessive quantifiers are good for the performance of the underlying regex engine because the engine does not have to keep any backtracking information in memory. The performance increase is even more when a regex fails to match because possessive quantifiers fail faster. So, remember that the benefit of possessive quantifiers is to improve the regex performance, especially when using nested quantifiers.

主站蜘蛛池模板: 德清县| 大姚县| 双城市| 环江| 鹿邑县| 康定县| 荆州市| 麻阳| 时尚| 泸州市| 江山市| 大足县| 通辽市| 乃东县| 庆阳市| 霍邱县| 福泉市| 聂拉木县| 鄂温| 东城区| 察哈| 彩票| 三河市| 门头沟区| 图片| 平定县| 石柱| 丘北县| 桃园县| 渝北区| 凤阳县| 如东县| 台安县| 建水县| 万安县| 南澳县| 福建省| 丰台区| 雷州市| 瓦房店市| 弥渡县|