Regular Expressions (Regex)

Regular Expressions (Regex) since 2013-22-22

Basic Grammar

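Grouping and Backreferences

(regex): capturing group. In (abc){3}\1 there is one capturing group, (abc); the quantifier {3} repeats it three times, and \1 then refers back to the text captured by group 1.
(?:regex): non-capturing group. In (?:abc){3} the group (?:abc) is repeated three times, but it captures nothing, so it cannot be backreferenced.
(?<name>regex): named capturing group, supported since Java 7.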
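The three group forms above can be tried out with java.util.regex. The following is a minimal sketch, not a definitive recipe; the class name GroupDemo, its helper methods, and the yyyy-mm sample pattern are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    // (abc){3}\1: group 1 is repeated three times, then \1 must match
    // the text the group captured on its last iteration ("abc").
    static boolean repeatedThenBackref(String input) {
        return Pattern.matches("(abc){3}\\1", input);
    }

    // Number of capturing groups a pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Named groups (hypothetical yyyy-mm pattern): read the capture by name.
    static String year(String input) {
        Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(input);
        return m.find() ? m.group("year") : null;
    }

    public static void main(String[] args) {
        System.out.println(repeatedThenBackref("abcabcabcabc")); // true
        System.out.println(repeatedThenBackref("abcabcabc"));    // false
        System.out.println(groupCount("(abc){3}"));              // 1
        System.out.println(groupCount("(?:abc){3}"));            // 0
        System.out.println(year("released 2013-12"));            // 2013
    }
}
```

Note that the non-capturing form reports zero groups, which is why \1 has nothing to refer to in that case.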

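Modifiers

(?i)regex(?-i): turn case-insensitive matching on and off
(?s)regex(?-s): turn "dot matches newline" on and off
(?m)regex: ^ matches at the start of each line, $ matches at the end of each line
(?-m)regex: ^ matches only at the start of the whole input, $ only at its end (some flavors also let $ match just before a final line break)
(?i-sm)regex: combine several switches; here i is turned on, s and m are turned off
(?i-sm:reg1)reg2: combined switches that apply only to reg1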
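A short sketch of how these switches behave in java.util.regex. The class ModifierDemo and its two helpers are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ModifierDemo {
    // True when the regex matches the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Number of non-overlapping matches found in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // (?i) switches case-insensitive matching on, (?-i) switches it off again.
        System.out.println(matches("(?i)abc", "ABC"));       // true
        System.out.println(matches("(?i)ab(?-i)c", "ABC"));  // false: c is case-sensitive
        System.out.println(matches("(?i)ab(?-i)c", "ABc"));  // true

        // (?s) lets the dot match a line break.
        System.out.println(matches("a.c", "a\nc"));          // false
        System.out.println(matches("(?s)a.c", "a\nc"));      // true

        // (?m) makes ^ and $ work per line.
        System.out.println(count("^\\w+$", "one\ntwo"));     // 0
        System.out.println(count("(?m)^\\w+$", "one\ntwo")); // 2

        // Scoped form: the switch applies only inside the group.
        System.out.println(matches("(?i:ab)c", "ABc"));      // true
        System.out.println(matches("(?i:ab)c", "ABC"));      // false
    }
}
```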

Atomic Grouping and Possessive Quantifiers

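(?>regex): atomic group; once the group has matched, the engine will not backtrack into it
?+, *+, ++, {n,m}+: possessive quantifiers; they match like the greedy ones but never give back what they have matched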
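The effect is easiest to see on patterns that only succeed when the engine backtracks. This is a small sketch under that assumption; AtomicDemo and its helper are illustrative names, and the sample patterns are not from the original notes.

```java
import java.util.regex.Pattern;

final class AtomicDemo {
    // True when the regex matches the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy a* first takes all three a's, then gives one back for the final a.
        System.out.println(matches("a*a", "aaa"));         // true
        // Possessive a*+ keeps all three a's, so the final a has nothing to match.
        System.out.println(matches("a*+a", "aaa"));        // false
        // The atomic group behaves the same way as the possessive quantifier here.
        System.out.println(matches("(?>a*)a", "aaa"));     // false

        // A normal group can fall back from "bc" to "b" so that the last c matches.
        System.out.println(matches("a(?:bc|b)c", "abc"));  // true
        // The atomic group commits to "bc" and never retries the "b" alternative.
        System.out.println(matches("a(?>bc|b)c", "abc"));  // false
        System.out.println(matches("a(?>bc|b)c", "abcc")); // true
    }
}
```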

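Lookaround

(?=regex): zero-width positive lookahead.
- zero-width: it consumes no input, so 1(?=2)3 can never match (the character after 1 would have to be both 2 and 3).
- positive: the assertion succeeds when regex matches.
- lookahead: it checks the text that follows the current position. In 'streets', t(?=s) matches the second 't'.
(?!regex): zero-width negative lookahead.
- negative: the assertion succeeds when regex does not match.
(?<=regex): zero-width positive lookbehind.
- lookbehind: it checks the text that precedes the current position. In 'streets', (?<=s)t matches the first 't'.
(?<!regex): zero-width negative lookbehind.
- negative: the assertion succeeds when regex does not match.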
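The 'streets' examples can be checked by asking where the first match starts. A minimal sketch; LookaroundDemo and firstIndex are hypothetical names, and the two negative patterns are added here only to complete the set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundDemo {
    // Start index of the first match, or -1 when there is none.
    static int firstIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // Indices in "streets": s=0 t=1 r=2 e=3 e=4 t=5 s=6
        System.out.println(firstIndex("t(?=s)", "streets"));  // 5: t followed by s
        System.out.println(firstIndex("t(?!s)", "streets"));  // 1: t not followed by s
        System.out.println(firstIndex("(?<=s)t", "streets")); // 1: t preceded by s
        System.out.println(firstIndex("(?<!s)t", "streets")); // 5: t not preceded by s

        // Zero-width: the lookahead consumes nothing, so 3 is tested
        // against the same character that had to be 2.
        System.out.println(firstIndex("1(?=2)3", "123"));     // -1
    }
}
```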

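Continuing from the Previous Match

\Gregex: \G matches at the position where the previous match ended, or at the start of the input for the first match (written "\\G" inside a Java string literal)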
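One way to see what \G adds is to compare the same pattern with and without it. This is a sketch with made-up names (ContinueDemo, numbers) and a made-up sample input of comma-terminated numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContinueDemo {
    // Collects group 1 of every match that find() reports.
    static List<String> numbers(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "1,22,x,333,";

        // Without \G the matcher skips over "x," and keeps going.
        System.out.println(numbers("(\\d+),", input));    // [1, 22, 333]

        // With \G every match must start where the previous one ended,
        // so the chain stops at the first token that is not a number.
        System.out.println(numbers("\\G(\\d+),", input)); // [1, 22]
    }
}
```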

Conditionals

Comments

(?#comment): inline comment; the engine ignores it

Common Usage

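Match Chinese characters: [\u4e00-\u9fa5]
Match leading and trailing whitespace: ^\s*|\s*$
Match an email address: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Match a string that contains no consecutive hyphens: (-(?!-)|[a-z0-9])*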
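The four patterns above could be wrapped in small Java helpers roughly as follows. This is only a sketch: the class CommonPatterns and its method names are invented for illustration, and whether each pattern is applied with find(), matches(), or replaceAll() is an assumption about the intended use.

```java
import java.util.regex.Pattern;

final class CommonPatterns {
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]");
    private static final Pattern EMAIL =
            Pattern.compile("\\w+([-+.]\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*");
    private static final Pattern NO_DOUBLE_HYPHEN = Pattern.compile("(-(?!-)|[a-z0-9])*");

    // True when the text contains at least one character in the range above.
    static boolean containsChinese(String s) {
        return CHINESE.matcher(s).find();
    }

    // Removes leading and trailing whitespace by replacing both matches with "".
    static String trim(String s) {
        return s.replaceAll("^\\s*|\\s*$", "");
    }

    // True when the whole string has the shape of an email address.
    static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    // True when the whole string is lowercase letters, digits and hyphens
    // with no two hyphens in a row.
    static boolean noDoubleHyphen(String s) {
        return NO_DOUBLE_HYPHEN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsChinese("regex \u4e2d\u6587"));   // true
        System.out.println(containsChinese("regex"));                // false
        System.out.println("[" + trim("  hi there \t") + "]");       // [hi there]
        System.out.println(isEmail("first.last+tag@example.co.uk")); // true
        System.out.println(isEmail("not-an-email"));                 // false
        System.out.println(noDoubleHyphen("a-b-c"));                 // true
        System.out.println(noDoubleHyphen("a--b"));                  // false
    }
}
```

The email pattern is a loose shape check, so it will accept some strings that are not deliverable addresses.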