I want to use preg_match_all() to extract the content between [[ and ]], but ignore anything wrapped in [[[ or ]]]. For example, given this text:
$text = <<<TEXT
Some text going here
[[ 1. this is a text ]]
another text but multiple lines
[[ 2. this
is a
text ]]
This should be ignored, having 3 on the left
[[[ 3. this is a text ]]
This should be ignored, having 3 on the right
[[ 4. this is a text ]]]
This should be ignored, having 3 both on the left and right
[[[ 5. this is a text ]]]
This is the final sentence.
[[ 6. this is a text ]]
TEXT;
if (preg_match_all("/(?!<\[)(\[\[.*?\]\])(?!\[)/", $text, $tags, PREG_PATTERN_ORDER)) {
$tags = $tags[0];
}
echo '<pre>';
print_r($tags);
echo '</pre>';
So only 1., 2. and 6. should be selected. However, the regex I tried above selects everything except 2., so it does not work as expected.
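You can use the following pattern:
preg_match_all('~(?<!\[)\[\[(?!\[)([^]]*)]](?!])~', $text, $tags);
Notes:
- There is no need to specify PREG_PATTERN_ORDER, since it is the default flag for preg_match_all().
- I added a capture group for the content inside the brackets; you can remove it if you don't need it.
If square brackets are not allowed inside the tags, the pattern can be shortened to:
~(?<!\[)\[\[([^][]*)]](?!])~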
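To see how the two patterns above differ, here is a minimal sketch that runs both against a shortened version of the question's text. Tag 7 (a tag containing a lone "[") and the $normalize helper are my own additions for illustration; they do not come from the question.

```php
<?php
// Shortened version of the question's sample text, plus a hypothetical
// tag 7 that contains a lone "[" (not part of the original question).
$text = <<<'TEXT'
Some text going here
[[ 1. this is a text ]]
[[ 2. this
is a
text ]]
[[[ 3. this is a text ]]
[[ 4. this is a text ]]]
[[[ 5. this is a text ]]]
[[ 6. this is a text ]]
[[ 7. contains a [ bracket ]]
TEXT;

// Illustrative helper: collapse whitespace runs so multi-line tags
// print on a single line.
$normalize = function (string $s): string {
    return trim(preg_replace('/\s+/', ' ', $s));
};

// First pattern: "[" is tolerated inside the tag, "]" is not.
$general = '~(?<!\[)\[\[(?!\[)([^]]*)]](?!])~';
// Shortened pattern: neither "[" nor "]" may appear inside the tag.
$strict = '~(?<!\[)\[\[([^][]*)]](?!])~';

preg_match_all($general, $text, $generalTags);
preg_match_all($strict, $text, $strictTags);

// Index 1 holds the capture group, i.e. the text between the brackets.
$generalContents = array_map($normalize, $generalTags[1]);
$strictContents = array_map($normalize, $strictTags[1]);

print_r($generalContents); // tags 1, 2, 6 and 7
print_r($strictContents);  // tags 1, 2 and 6
```

Both patterns skip tags 3, 4 and 5. The only difference in this sample is tag 7, which the shortened pattern rejects because of the inner "[".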
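Here is a regex that does the job:
((?<!\[)\[\[([^\[][^\]]*)\]\](?!\]))
Regex101
Breakdown
- (?<!\[) : not preceded by a [
- \[\[ : a literal [[
- [^\[] : any single character except [
- [^\]]* : any character except ], 0 or more times
- \]\] : a literal ]]
- (?!\]) : not followed by a ]
This should be bulletproof, except that it requires at least one character between [[ and ]].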
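The breakdown above can be checked with a short sketch. The "~" delimiters and the one-line input string are assumptions made for the demo; the answer itself only gives the bare expression.

```php
<?php
// Expression from the answer above, wrapped in "~" delimiters
// (the delimiters are an assumption; the answer shows the bare regex).
$pattern = '~((?<!\[)\[\[([^\[][^\]]*)\]\](?!\]))~';

// Hypothetical input: an empty tag, two valid tags, and two tags with
// a third bracket on one side.
$text = 'a [[]] b [[x]] c [[[y]] d [[z]]] e [[ two words ]]';

preg_match_all($pattern, $text, $m);

print_r($m[1]); // outer group: the full tags, brackets included
print_r($m[2]); // inner group: the text between the brackets
```

The empty tag [[]] is skipped because [^\[] has to consume one character before the closing ]], which is the "at least one character" caveat mentioned above.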
Try:
preg_match_all('/(\A|[^[])\[{2}[^[](?<content>[^]]+)[^]]\]{2}([^]]|\z)/s', ...)
http://regex101.com/r/jC2mM0
http://codepad.viper-7.com/bbs3oR
Array
(
[0] => Array
(
[0] =>
[[ 1. this is a text ]]
[1] =>
[[ 2. this
is a
text ]]
[2] =>
[[ 6. this is a text ]]
)
[1] => Array
(
[0] => 1. this is a text
[1] => 2. this
is a
text
[2] => 6. this is a text
)
[2] => Array
(
[0] =>
[1] =>
[2] =>
)
)
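A minimal sketch of reading the named group from this last pattern. The sample strings are shortened or made up for the demo, and the variable names are mine. Note that this pattern consumes one character before and one after each tag instead of using lookarounds, so two tags separated by a single character are not both found.

```php
<?php
// Pattern from the answer above; "content" is its named capture group.
$pattern = '/(\A|[^[])\[{2}[^[](?<content>[^]]+)[^]]\]{2}([^]]|\z)/s';

// Shortened version of the question's text (plain lines between tags).
$text = <<<'TEXT'
Some text going here
[[ 1. this is a text ]]
another text
[[[ 3. this is a text ]]
ignored
[[ 4. this is a text ]]]
final
[[ 6. this is a text ]]
TEXT;

preg_match_all($pattern, $text, $m);
print_r($m['content']); // contents of tags 1 and 6

// Hypothetical edge case: the first match consumes the space after "]]",
// so the second tag has no preceding character left to match against.
preg_match_all($pattern, '[[ a ]] [[ b ]]', $adjacent);
print_r($adjacent['content']); // only "a"
```

If adjacent tags can occur in your input, the lookaround-based patterns from the other answers avoid this, since lookarounds do not consume characters.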