Regex (Regular Expressions) - Groups & Lookaround

Capturing Group `( )`

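In a regex pattern, the part enclosed in parentheses ( ) is a subexpression, also called a group.

Parentheses ( ) serve several purposes:

Parentheses in a pattern capture substrings, which you can then read from the capturing groups of the match result.
The substring matched inside a pair of parentheses can also be referenced again later in the same pattern; this is called a backreference.
Parentheses also group: everything inside them is treated as a single unit. For example, (foo){2,} matches the string foo repeated two or more times in a row, whereas in foo{2,} the quantifier {2,} applies only to the final o.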
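
To make the grouping point concrete, here is a minimal Python sketch contrasting a quantifier applied to a group with one applied to a single character. The sample strings are made up for illustration.

```python
import re

# (foo){2,} repeats the whole group: "foo" two or more times in a row.
grouped = re.search(r'(foo){2,}', 'xfoofoofoox')
print(grouped.group(0))  # foofoofoo

# foo{2,} repeats only the last "o": "fo" followed by two or more "o".
no_match = re.search(r'foo{2,}', 'xfoofoofoox')
print(no_match)  # None

stretched = re.search(r'foo{2,}', 'xfoooox')
print(stretched.group(0))  # foooo
```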

JavaScript example

var match = /(hello \S+)/.exec('This is a hello world!');
console.log(match[1]); // hello world!

Python example

import re
match = re.search(r'(hello \S+)', 'This is a hello world!')
print(match.group(1)) # hello world!

PHP example

preg_match('/(hello \S+)/', 'This is a hello world!', $match);
echo $match[1]; // hello world!

Ruby example

match = 'This is a hello world!'.match(/(hello \S+)/).captures
puts match[0] # hello world!

Backreference `\1`

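A backreference refers to the text captured by a capturing group.

The syntax is a backslash followed by a number, such as \1. Numbering starts at 1, so \1 refers to the first capturing group.

Example:

In the pattern (\w)a\1, \1 stands for the character captured by (\w). Matched against the string "hahaha", the result is "hah".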
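
The same example can be checked in Python. This is only a small sketch; the second input string, "hat", is an added illustration of a case where the backreference fails.

```python
import re

# \1 must repeat whatever (\w) captured, so "h" + "a" + "h" matches.
found = re.search(r'(\w)a\1', 'hahaha')
print(found.group(0))  # hah
print(found.group(1))  # h

# No match: the character after "a" differs from the captured one.
missing = re.search(r'(\w)a\1', 'hat')
print(missing)  # None
```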

JavaScript example

var match = /(hello) \1 \S+/.exec('This is a hello hello world!');
console.log(match[0]); // hello hello world!

Python example

import re
match = re.search(r'(hello) \1 \S+', 'This is a hello hello world!')
print(match.group(0)) # hello hello world!

PHP example

preg_match('/(hello) \1 \S+/', 'This is a hello hello world!', $match);
echo $match[0]; // hello hello world!

Ruby example

match = 'This is a hello hello world!'.match(/(hello) \1 \S+/)
puts match[0] # hello hello world!

Named Capturing Group `(?P<name> )`

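The regex engines of most programming languages today support named groups: you can give each capturing group a name, which makes it easier to read and reference later.

Example:

In the pattern (?P<digit>\d+), the group that matches the digits is named digit. Your program can then fetch the result by name, along the lines of result.get_group("digit") (the exact call depends on the language).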
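
In Python's re module, for instance, the lookup by name could look like the sketch below. The input string is a made-up sample.

```python
import re

match = re.search(r'(?P<digit>\d+)', 'order 42 shipped')

print(match.group('digit'))  # 42
print(match.groupdict())     # {'digit': '42'}

# A named group still receives a number, so positional access keeps working.
print(match.group(1))        # 42
```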

backreference name

(?P=name) is the backreference syntax for a named group.

For example, the pattern (?P<f>foo)123(?P=f) matches the string "foo123foo".
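
A quick way to try this pattern is the Python sketch below. The non-matching string "foo123bar" is an added illustration.

```python
import re

pattern = re.compile(r'(?P<f>foo)123(?P=f)')

# (?P=f) must repeat the text captured by the group named f.
same = pattern.search('foo123foo')
different = pattern.search('foo123bar')

print(same.group(0))  # foo123foo
print(different)      # None
```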

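JavaScript example

JavaScript has supported named capturing groups since ES2018, using the syntax (?<name> ) rather than (?P<name> ). The captured text is available as match.groups.name, and the named backreference syntax is \k<name>.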

Python example

import re
match = re.search(r'(?P<foo>hello \S+)', 'This is a hello world!')
print(match.group('foo')) # hello world!

PHP example

preg_match('/(?P<foo>hello \S+)/', 'This is a hello world!', $match);
echo $match['foo']; // hello world!

Ruby example

match = 'This is a hello world!'.match(/(?<foo>hello \S+)/)
puts match[:foo] # hello world!

Ruby uses the syntax (?<name> ) rather than (?P<name> ).

Non-Capturing Group `(?: )`

A non-capturing group is the counterpart of a capturing group: it only groups, without capturing what it matches.

For example: (?:foo){2,}.
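
The difference shows up in the list of captured groups. A minimal Python sketch, using a made-up input string:

```python
import re

capturing = re.search(r'(foo){2,}', 'foofoofoo')
print(capturing.groups())      # ('foo',)

non_capturing = re.search(r'(?:foo){2,}', 'foofoofoo')
print(non_capturing.group(0))  # foofoofoo
print(non_capturing.groups())  # ()
```

Both patterns match the same text; only the capturing version stores a group.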

JavaScript example

var match = /(?:hello) (\S+)/.exec('This is a hello world!');
console.log(match[1]); // world!

Python example

import re
match = re.search(r'(?:hello) (\S+)', 'This is a hello world!')
print(match.group(1)) # world!

PHP example

preg_match('/(?:hello) (\S+)/', 'This is a hello world!', $match);
echo $match[1]; // world!

Ruby example

match = 'This is a hello world!'.match(/(?:hello) (\S+)/).captures
puts match[0] # world!

Lookaround

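Lookaround is similar to an anchor: both check a boundary condition rather than match characters. A lookaround consumes no characters itself (it is zero-width), and the lookaround itself does not create a capturing group (it is non-capturing).

Anchors cover only simple boundaries such as a word boundary, the start of a line, or the end of a line. When the boundary is defined by more specific text, use lookaround instead.

Think of lookaround as standing still and looking forward or backward. Depending on the direction, it is either a lookahead or a lookbehind.

The direction is what beginners most often find confusing. It is relative to the engine's current position as it scans the string from left to right: a lookahead tests the text that follows the current position (to its right), and a lookbehind tests the text that precedes it (to its left).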
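
The "standing still" idea can be observed directly. The Python sketch below uses the (?= ) lookahead syntax described in the next section; the input string is a made-up sample.

```python
import re

text = 'foohello'

# The lookahead checks for "hello" but does not consume it.
peek = re.search(r'foo(?=hello)', text)
print(peek.group(0))  # foo
print(peek.span())    # (0, 3)

# Since "hello" was not consumed, the rest of the pattern must still match it.
follow_up = re.search(r'foo(?=hello)hello', text)
print(follow_up.group(0))  # foohello

mismatch = re.search(r'foo(?=hello)world', text)
print(mismatch)  # None
```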

Positive Lookahead `(?= )`

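A positive lookahead checks whether the text that follows matches what is expected; matching continues only if the upcoming text satisfies the condition.

Syntax: A(?=B). The parentheses hold the content to check, which can be any regex. The whole pattern means that A must be immediately followed by B.

Examples:

a(?=[bcd]) means a must be followed by b, c, or d. It matches in "ab" and "123acxyz", but not in "a101" or "ahi".

(?:foo)(?=hello) finds foo only when it is followed by hello. It matches in "foohello" and "this is foohello 123", but not in "foohell", "123 foo hi", or "oohello".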
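
These cases can be verified with a short Python sketch that filters the sample strings above down to the ones in which each pattern finds a match:

```python
import re

single_char = re.compile(r'a(?=[bcd])')
hits_char = [s for s in ['ab', '123acxyz', 'a101', 'ahi'] if single_char.search(s)]
print(hits_char)  # ['ab', '123acxyz']

word = re.compile(r'(?:foo)(?=hello)')
candidates = ['foohello', 'this is foohello 123', 'foohell', '123 foo hi', 'oohello']
hits_word = [s for s in candidates if word.search(s)]
print(hits_word)  # ['foohello', 'this is foohello 123']
```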

Negative Lookahead `(?! )`

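A negative lookahead is the opposite of a positive lookahead: matching continues only if the upcoming text does not satisfy the condition.

Syntax: A(?!B). The parentheses hold the content to check, which can be any regex. The whole pattern means that A must not be immediately followed by B.

Examples:

a(?![bcd]) means a must not be followed by b, c, or d. It matches in "a101" and "ahi", but not in "ab" or "123acxyz".

(?:foo)(?!hello) finds foo only when it is not followed by hello. It matches in "foohell" and "123 foo hi", but not in "foohello", "this is foohello 123", or "oohello".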
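
A Python sketch of the same cases follows. The final check, on the made-up string "cba", is an added illustration that a negative lookahead also succeeds at the end of the string, where nothing follows.

```python
import re

not_bcd = re.compile(r'a(?![bcd])')
char_results = {s: bool(not_bcd.search(s)) for s in ['a101', 'ahi', 'ab', '123acxyz']}
print(char_results)
# {'a101': True, 'ahi': True, 'ab': False, '123acxyz': False}

not_hello = re.compile(r'(?:foo)(?!hello)')
word_results = {s: bool(not_hello.search(s)) for s in ['foohell', '123 foo hi', 'foohello']}
print(word_results)
# {'foohell': True, '123 foo hi': True, 'foohello': False}

# Nothing follows the final "a", so the negative condition holds.
at_end = not_bcd.search('cba')
print(at_end.span())  # (2, 3)
```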

Positive Lookbehind `(?<= )`

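A positive lookbehind checks whether the text that precedes matches what is expected; matching continues only if the preceding text satisfies the condition.

Syntax: (?<=B)A. The parentheses hold the content to check, which can be any regex. The whole pattern means that A must be immediately preceded by B.

Examples:

(?<=[bcd])a means a must be preceded by b, c, or d. It matches in "ba" and "123caxyz", but not in "a101" or "helloa".

(?<=hello)(?:foo) finds foo only when it is preceded by hello. It matches in "hellofoo" and "this is hellofoo 123", but not in "foohell", "123 foo hi", or "hellooo".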
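
The sketch below checks these cases with Python's re module. One caveat is specific to that module and is not covered by the text above: re accepts only fixed-width lookbehind patterns, which the last part demonstrates with the made-up pattern (?<=a+)b.

```python
import re

after_bcd = re.compile(r'(?<=[bcd])a')
hits_char = [s for s in ['ba', '123caxyz', 'a101', 'helloa'] if after_bcd.search(s)]
print(hits_char)  # ['ba', '123caxyz']

after_hello = re.compile(r'(?<=hello)(?:foo)')
candidates = ['hellofoo', 'this is hellofoo 123', 'foohell', '123 foo hi', 'hellooo']
hits_word = [s for s in candidates if after_hello.search(s)]
print(hits_word)  # ['hellofoo', 'this is hellofoo 123']

# Python's re rejects a lookbehind whose width is not fixed.
rejected = False
try:
    re.compile(r'(?<=a+)b')
except re.error:
    rejected = True
print(rejected)  # True
```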

Negative Lookbehind `(?<! )`

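A negative lookbehind is the opposite of a positive lookbehind: matching continues only if the preceding text does not satisfy the condition.

Syntax: (?<!B)A. The parentheses hold the content to check, which can be any regex. The whole pattern means that A must not be immediately preceded by B.

Examples:

(?<![bcd])a means a must not be preceded by b, c, or d. It matches in "a101" and "helloa", but not in "ba" or "123caxyz".

(?<!hello)(?:foo) finds foo only when it is not preceded by hello. It matches in "foohell" and "123 foo hi", but not in "hellofoo", "this is hellofoo 123", or "hellooo".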
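
The same strings can be run through a short Python sketch. Note that "a101" matches because nothing precedes the a at the start of the string, so the negative condition holds.

```python
import re

not_after_bcd = re.compile(r'(?<![bcd])a')
char_results = {s: bool(not_after_bcd.search(s)) for s in ['a101', 'helloa', 'ba', '123caxyz']}
print(char_results)
# {'a101': True, 'helloa': True, 'ba': False, '123caxyz': False}

not_after_hello = re.compile(r'(?<!hello)(?:foo)')
candidates = ['foohell', '123 foo hi', 'hellofoo', 'this is hellofoo 123', 'hellooo']
hits_word = [s for s in candidates if not_after_hello.search(s)]
print(hits_word)  # ['foohell', '123 foo hi']
```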

JavaScript example

JavaScript has always supported lookahead. Lookbehind, (?<= ) and (?<! ), was added in ES2018, so older engines support lookahead only:

/Jack(?=Sprat)/.test('JackFrost'); // false
/Jack(?=Sprat)/.test('JackSprat'); // true

Python example

import re
match = re.search(r'(?<=<b>)\w+(?=</b>)', 'Fortune favours the <b>bold</b>')
print(match.group()) # bold

PHP example

preg_match('/(?<=<b>)\w+(?=<\/b>)/', 'Fortune favours the <b>bold</b>', $match);
echo $match[0]; // bold

Ruby example

match = 'Fortune favours the <b>bold</b>'.match(/(?<=<b>)\w+(?=<\/b>)/)
puts match[0] # bold
