官术网_书友最值得收藏!

Possessive quantifiers

Possessive quantifiers are quantifiers that are greedy when matching text like greedy quantifiers do. Both greedy and possessive quantifiers try to match as many characters as possible. The important difference, however, is that the possessive quantifiers do not backtrack (go back) unlike greedy quantifiers; therefore, it is possible that the regex match fails if the possessive quantifiers go too far.

This table shows all the three types of quantifiers, side by side:

Let's take an example input string, a1b5, and see the behavior of the greedy, lazy, and possessive quantifiers.

If we apply a regex using the greedy quantifier, \w+\d, then it will match a1b (the longest match before backtracking starts) using \w+, and 5 will be matched using \d; thus, the full match will be a1b5.

Now, if we apply a regex using the non-greedy quantifier, \w+?\d, then it will match a (the shortest match before expanding starts) using \w+?, and then the adjacent digit 1 will be matched using \d. Thus, the first full match will be a1. If we let the regex execute again, then it will find another match, b5.

Finally, if we apply a regex using the possessive quantifier, \w++\d, then it will match all the characters a1b5 (the longest possible match without giving back) using \w++ . Due to this, \d remains unmatched, and hence the regex fails to find any match.

Let's take another example. The requirement is to match a string that starts with lowercase English alphabets or hyphen. The string can have any character after the alphabets/hyphens, except a colon. There can be any number of any characters of any length after the colon until the end.

An example of a valid input is as-df999 and that of an invalid input is asdf-:123.

Now, let's try solving this regex problem using a greedy quantifier regex:

    ^[a-z-]+[^:].*$

Unfortunately, this is not the right regex pattern because this regex will match both the aforementioned valid and invalid inputs. This is because of the backtracking behavior of the regex engine in greedy quantifiers. The [a-z-]+ pattern will find the longest possible match in the form of asdf-, but due to the negated character class pattern [^:] , the regex engine will backtrack one position to asdf and will match the next hyphen for [^:]. All the remaining text, that is, :123, will be matched using .*.

Let's try to solve this regex problem using the following possessive quantifier regex:

    ^[a-z-]++[^:].*$

This regex pattern will still match our valid input, but it will fail to match an invalid input because there is no backtracking in possessive quantifiers; hence, the regex engine will not go back any position after matching asdf- in the second example string. Since the next character is a colon and our regex sub-pattern is [^:], the regex engine will stop matching and correctly declare our invalid input a failed match.

Possessive quantifiers are good for the performance of the underlying regex engine because the engine does not have to keep any backtracking information in memory. The performance increase is even more when a regex fails to match because possessive quantifiers fail faster. So, remember that the benefit of possessive quantifiers is to improve the regex performance, especially when using nested quantifiers.

主站蜘蛛池模板: 德惠市| 长寿区| 弥渡县| 平果县| 大城县| 乐平市| 库尔勒市| 淮安市| 蒲江县| 阿拉尔市| 郑州市| 昭苏县| 阿合奇县| 高台县| 和平区| 贡山| 论坛| 孟村| 涞水县| 营口市| 扶余县| 宁海县| 湖州市| 巴林左旗| 阜阳市| 诏安县| 福州市| 紫云| 长汀县| 顺平县| 武定县| 康定县| 福建省| 吴旗县| 新干县| 临西县| 江孜县| 胶南市| 日喀则市| 宿迁市| 宜州市|