
Finding matching brackets with php/regex in longer texts

I've been trying to work this out for quite a while, but I still haven't found a solution.

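I'm working on a simple formatting method where a tag wraps a string in braces and the tag name is written just before the opening brace. Tags should also be able to appear inside other tags' braces.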

The string:

This is some random text, tag1{while this is inside a tag2{tag}}. This is some
other text tag2{also with a tag  tag3{inside} of it}.

What I want now is the content of each of these:

tag1{}
tag2{}
tag3{}

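I found others with a similar problem (finding matching brackets with regex), but their questions focus on finding matching brackets inside other brackets. My problem is both that and finding multiple bracket groups in a longer text.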

If the tags are always balanced, you can use an expression like this one to get the names and contents of all tags, including nested ones:

\b(\w+)(?={((?:[^{}]+|{(?2)})*)})

Example:

$str = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag  tag3{inside} of it}.";
$re = "/\\b(\\w+)(?={((?:[^{}]+|{(?2)})*)})/";
preg_match_all($re, $str, $m);
echo "* Tag names:\n";
print_r($m[1]);
echo "* Tag content:\n";
print_r($m[2]);

Output:

* Tag names:
Array
(
    [0] => tag1
    [1] => tag2
    [2] => tag2
    [3] => tag3
)
* Tag content:
Array
(
    [0] => while this is inside a tag2{tag}
    [1] => tag
    [2] => also with a tag  tag3{inside} of it
    [3] => inside
)
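To make the one-line pattern easier to follow, here is the same expression written with PCRE's x (extended) modifier, which ignores whitespace and allows # comments, so each part can be annotated. This is a sketch that assumes the behaviour is identical to the compact version; the only changes are layout, comments, and escaping the braces explicitly. The key idea is that the whole brace part sits inside a lookahead, so only the tag name is consumed and the search resumes right after it, which is why nested tags are found too.

```php
<?php
// Same pattern as the one-liner, spread out with the x modifier.
// Note: comments must not contain the pattern delimiter or a single quote.
$re = '/
    \b(\w+)              # group 1: the tag name, e.g. tag1
    (?=                  # lookahead: matches without consuming, so nested tags are still found
        \{
        (                # group 2: everything between the tag braces
            (?:
                [^{}]+   # a run of characters that are not braces
              | \{(?2)\} # or a balanced brace pair, recursing into group 2
            )*
        )
        \}
    )
/x';

$str = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag  tag3{inside} of it}.";
preg_match_all($re, $str, $m);

// Print each tag name next to its content.
foreach ($m[1] as $i => $name) {
    echo $name, ' => ', $m[2][$i], "\n";
}
```

It should print the same four name/content pairs as the output above. Because the lookahead never consumes the braces, an unbalanced tag is simply skipped rather than breaking the matches that follow it.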

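I don't know whether there is a regexp that gets all inner and outer tags in a single call, but you can take this regexp from the linked question, /\{(([^\{\}]+)|(?R))*\}/, and recurse into the results.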

To make it clearer, I added your tag names and some named subpatterns to the regex:

function search_tags($string, $recursion = 0) {
    $Results = array();
    if (preg_match_all("/(?<tagname>[\w]+)\{(?<content>(([^\{\}]+)|(?R))*)\}/", $string, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $Results[] = array('match' => $match[0], 'tagname' => $match['tagname'], 'content' => $match['content'], 'deepness' => $recursion);
            if ($InnerResults = search_tags($match['content'], $recursion+1)) {
                $Results = array_merge($Results, $InnerResults);
            }
        }
        return $Results;
    }
    return false;
}

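This returns an array of all matches. Each entry holds the whole match, the tag name, the content inside the braces, and a recursion counter ("deepness") showing how deeply the match is nested inside other tags. I added another nesting level to your string for demonstration: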

$text = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag3{also with a tag tag4{and another nested tag5{inside}} of it}.";
echo '<pre>'.print_r(search_tags($text), true).'</pre>';

The output is:

Array
(
    [0] => Array
        (
            [match] => tag1{while this is inside a tag2{tag}}
            [tagname] => tag1
            [content] => while this is inside a tag2{tag}
            [deepness] => 0
        )
    [1] => Array
        (
            [match] => tag2{tag}
            [tagname] => tag2
            [content] => tag
            [deepness] => 1
        )
    [2] => Array
        (
            [match] => tag3{also with a tag tag4{and another nested tag5{inside}} of it}
            [tagname] => tag3
            [content] => also with a tag tag4{and another nested tag5{inside}} of it
            [deepness] => 0
        )
    [3] => Array
        (
            [match] => tag4{and another nested tag5{inside}}
            [tagname] => tag4
            [content] => and another nested tag5{inside}
            [deepness] => 1
        )
    [4] => Array
        (
            [match] => tag5{inside}
            [tagname] => tag5
            [content] => inside
            [deepness] => 2
        )
)

The regex would be this:

tag[0-9]+\{[^\}]+

and you should replace the inner tags first.
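One way to read "replace the inner tags first" is a loop: match a tag whose braces contain no other braces (an innermost tag), record it, and swap the whole tag for a placeholder, so that its parent becomes innermost on the next pass. The sketch below assumes tags look like tagN{...}, braces are balanced, and the input never contains a NUL byte (NUL-delimited indexes serve as placeholders). The function names are hypothetical, and it uses [^{}]* instead of [^\}]+ so that only innermost tags match. Arrow functions need PHP 7.4 or later.

```php
<?php
// Hypothetical helper: swaps each "\0N\0" placeholder back for the tag text it replaced.
function restore_placeholders(string $s, array $originals): string
{
    return preg_replace_callback(
        '/\x00(\d+)\x00/',
        fn(array $p): string => $originals[(int) $p[1]],
        $s
    );
}

// Hypothetical helper: returns [['tag' => ..., 'content' => ...], ...], innermost tags first.
// Assumes balanced braces and no NUL bytes in $text.
function extract_tags_innermost_first(string $text): array
{
    $found = [];
    $originals = []; // placeholder index => original "tagN{...}" text, already restored
    // A tag whose body contains no braces at all is an innermost tag.
    $re = '/(tag[0-9]+)\{([^{}]*)\}/';
    while (preg_match($re, $text, $m, PREG_OFFSET_CAPTURE)) {
        $i = count($originals);
        $originals[$i] = restore_placeholders($m[0][0], $originals);
        $found[] = [
            'tag' => $m[1][0],
            'content' => restore_placeholders($m[2][0], $originals),
        ];
        // Replace the whole tag with a brace-free placeholder so that its parent
        // becomes an innermost tag on the next pass.
        $text = substr_replace($text, "\0" . $i . "\0", $m[0][1], strlen($m[0][0]));
    }
    return $found;
}

$str = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag  tag3{inside} of it}.";
print_r(extract_tags_innermost_first($str));
```

Each pass removes one pair of braces, so the loop always terminates. Results come out innermost first (tag2, tag1, tag3, tag2 for the question's string), and each content is restored to its original text before being stored.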

I don't think there is another way. You need to loop over every bracket.

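$input = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag  tag3{inside} of it}.";
$output = array();
$pos = 0;
// Find the next "tagN{" opener, starting at $pos.
while (preg_match('/tag\d+\{/S', $input, $match, PREG_OFFSET_CAPTURE, $pos)) {
    $start = $match[0][1];
    // Resume the outer search just after "tagN{", so nested tags are found later.
    $pos = $offset = $start + strlen($match[0][0]);
    $bracket = 1; // number of braces still open for this tag
    while ($bracket !== 0 and preg_match('/\{|\}/S', $input, $found, PREG_OFFSET_CAPTURE, $offset)) {
        ($found[0][0] === '}') ? $bracket-- : $bracket++;
        $offset = $found[0][1] + 1;
    }
    $output[] = substr($input, $start, $offset - $start);
}
print_r($output);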
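The loop above restarts a regex search from every tag, so text inside nested tags is scanned more than once. A possible single-pass alternative tokenizes the string once and keeps a stack of open braces: each closing brace pops its opener, and if that opener was a tag, the full tag text is recorded. This is a sketch under the same assumption that tags look like tagN{. The function name find_tags_stack is hypothetical; stray closing braces are ignored and tags still open at the end are dropped.

```php
<?php
// Hypothetical single-pass variant of the bracket-counting loop.
function find_tags_stack(string $input): array
{
    // Tokens: "tagN{" openers, plus every bare "{" or "}".
    preg_match_all('/tag\d+\{|[{}]/', $input, $tokens, PREG_OFFSET_CAPTURE);
    $stack = [];  // one entry per open brace: start offset of its tag, or null for a bare "{"
    $output = []; // start offset => full tag text
    foreach ($tokens[0] as [$tok, $off]) {
        if ($tok === '}') {
            if (!$stack) {
                continue; // stray closing brace
            }
            $start = array_pop($stack);
            if ($start !== null) {
                $output[$start] = substr($input, $start, $off + 1 - $start);
            }
        } elseif ($tok === '{') {
            $stack[] = null;
        } else {
            $stack[] = $off; // "tagN{" token
        }
    }
    ksort($output); // report tags in order of appearance, like the loop above
    return array_values($output);
}

$str = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag  tag3{inside} of it}.";
print_r(find_tags_stack($str));
```

Inner tags close before their parents, so results are keyed by start offset and sorted at the end to match the order the loop above produces.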