Remove nested quotes

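I have this text and I'm trying to remove all the inner quotes, keeping only one quoting level. The text inside the quotes can contain any character, even line feeds. Can this be done with a regex, or do I have to write a small parser?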

[quote=foo]I really like the movie. [quote=bar]World 
War Z[/quote] It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]

This is the text I want:

[quote=foo]I really like the movie.  It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]

This is the regex I'm using in PHP:

%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si

I also tried this variant, but it doesn't match . or , and I don't know what else I might find inside a quote:

%\[quote\s*(=[a-zA-Z0-9\-_]*)?\]([\w\s]+)\[/quote\]%i

The problem is located here:

(.*)
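
To make the problem concrete, here is a small sketch (the shortened sample string is made up for illustration, it is not the text from the question). Because `(.*)` is greedy and the `s` flag lets `.` cross newlines, the capture runs from the first opening tag all the way to the last `[/quote]` in the text:

```php
<?php
// Made-up sample: one nested quote, a comment, then a second top-level quote.
$text = "[quote=foo]A [quote=bar]B[/quote] C[/quote]\nComment\n[quote]D[/quote]";

// The pattern from the question, with its backslashes in place.
$pattern = '%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si';

preg_match($pattern, $text, $m);

// $m[2] is what (.*) captured: it swallows the nested quote, the comment,
// and the start of the second quote, stopping only at the LAST [/quote].
echo $m[2], "\n";
```

So a single greedy group cannot tell the quoting levels apart; the pattern needs either recursion or some way to count the tags.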

You can use this:

$result = preg_replace('~\G(?!\A)(?>(\[quote\b[^]]*](?>[^[]+|\[(?!/?quote)|(?1))*\[/quote])|(?<!\[)(?>[^[]+|\[(?!/?quote))+\K)|\[quote\b[^]]*]\K~', '', $text);

Details:

\G(?!\A)              # contiguous to the previous match (but not at the start of the string)
(?>                   ## content inside "quote" tags at level 0
  (                    ## nested "quote" tags (group 1)
    \[quote\b[^]]*]
    (?>                ## content inside "quote" tags at any level
      [^[]+
     |                  # OR
      \[(?!/?quote)
     |                  # OR
      (?1)              # repeat the capture group 1 (recursive)
    )*
    \[/quote]
  )
 |
  (?<!\[)           # not preceded by an opening square bracket
  (?>              ## content that is not a quote tag
    [^[]+           # all that is not a [
   |                # OR
    \[(?!/?quote)   # a [ not followed by "quote" or "/quote"
  )+\K              # repeat 1 or more and reset the match
)
|                   # OR
\[quote\b[^]]*]\K   # "quote" tag at level 0
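
As a quick way to try the pattern, here is a minimal sketch that wraps the `preg_replace` call in a function. The name `remove_nested_quotes` and the short sample string are made up for illustration; only the pattern itself comes from the answer above.

```php
<?php
// Hypothetical wrapper around the pattern above (the function name is an assumption).
function remove_nested_quotes(string $text): string
{
    $pattern = '~\G(?!\A)(?>(\[quote\b[^]]*](?>[^[]+|\[(?!/?quote)|(?1))*\[/quote])'
             . '|(?<!\[)(?>[^[]+|\[(?!/?quote))+\K)|\[quote\b[^]]*]\K~';
    // Only the nested quotes (group 1) produce a non-empty match; every other
    // branch ends with \K, so it matches an empty string and removes nothing.
    return preg_replace($pattern, '', $text);
}

// Made-up sample: the inner quote spans a line break.
$sample = "[quote=foo]A [quote=bar]B\nC[/quote] D[/quote]\nE\n[quote]F[/quote]";
echo remove_nested_quotes($sample), "\n";
```

The text around the removed quote is left untouched, so the two spaces that surrounded the inner quote both remain, as in the expected output in the question.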

Use this pattern

\[quote=?[^\]]*\][^\[]*\[/quote\](?=((.(?!\[q))*)\[/)

and replace with nothing, as in this example

I think it would be easier to write a parser.

Use a regex to find [quote] and [/quote], then analyze the result.

preg_match_all('#(\[quote[^]]*\]|\[/quote\])#', $bbcode, $matches, PREG_OFFSET_CAPTURE);
$nestlevel = 0;
$cutfrom = 0;
$cut = false;
$removed = 0;
foreach ($matches[0] as $quote) {
    // $quote[0] is the matched tag, $quote[1] its offset in the original string
    if (substr($quote[0], 0, 2) == '[q') $nestlevel++; else $nestlevel--;
    if (!$cut && $nestlevel == 2){ // we reached the first nested quote, start remove here
        $cut = true;
        $cutfrom = $quote[1];
    }
    if ($cut && $nestlevel == 1){ // we closed the nested quote, stop remove here
        $cut = false;
        // substr_replace() takes a length, not an end offset; strlen('[/quote]') = 8
        $bbcode = substr_replace($bbcode, '', $cutfrom - $removed, $quote[1] + 8 - $cutfrom);
        $removed += $quote[1] + 8 - $cutfrom;
    }
}
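
For comparison, here is a sketch of the same level-counting idea written as a single pass that builds the output instead of cutting pieces out by offset. This is an alternative formulation, not the code above: the function name `strip_nested_quotes` is made up, and it assumes tags look exactly like `[quote...]` and `[/quote]` (lower case).

```php
<?php
// Hypothetical helper: keeps level-0 text and the outermost quote level only.
function strip_nested_quotes(string $bbcode): string
{
    // Split on quote tags and keep the tags themselves as separate pieces.
    $parts = preg_split('#(\[quote[^]]*\]|\[/quote\])#', $bbcode, -1, PREG_SPLIT_DELIM_CAPTURE);
    $level = 0;
    $out = '';
    foreach ($parts as $part) {
        if (preg_match('#^\[quote[^]]*\]$#', $part)) {
            $level++;
            if ($level === 1) $out .= $part;   // keep only the outermost opening tag
        } elseif ($part === '[/quote]') {
            if ($level <= 1) $out .= $part;    // outermost close (or a stray one) is kept
            if ($level > 0) $level--;
        } elseif ($level <= 1) {
            $out .= $part;                     // plain text at level 0 or 1
        }
    }
    return $out;
}

$text = "[quote=foo]I really like the movie. [quote=bar]World \nWar Z[/quote] It's amazing![/quote]\n"
      . "This is my comment.\n[quote]Hello, World[/quote]";
echo strip_nested_quotes($text), "\n";
```

Since nothing is removed by offset, there is no running `$removed` correction to keep in sync, and quotes nested more than two levels deep are dropped along with their parent.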