I'm trying to parse some text in html with custom nested tags

I want to parse some text into an array.

My text looks like this:

You've come to the {right; correct; appropriate} place! Start by {searching; probing; inquiring} our site below, or {browse; {search; lookup; examine}} our list of popular support articles.

The third group of words has nested tags. How can I ignore the opening and closing nested tags and end up with an array like this?

$tags[0][0] = 'right';
$tags[0][1] = 'correct';
$tags[0][2] = 'appropriate';
$tags[1][0] = 'searching';
$tags[1][1] = 'probing';
$tags[1][2] = 'inquiring';
$tags[2][0] = 'browse';
$tags[2][1] = 'search';
$tags[2][2] = 'lookup';
$tags[2][3] = 'examine';

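In other words, the nesting of the tags is essentially ignored. Any help would be greatly appreciated.

My only idea at the moment is to walk through the text character by character until I find a {, which would increment a "depth" variable, capture the words in between until I find a }, which decrements the depth variable, and stop capturing words when it returns to zero. I was just wondering if there is an easier way. Thanks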
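A minimal sketch of that character-by-character idea, for illustration only. The function name `extractGroups` and its variables are hypothetical, and it assumes that `;` separates the words and that the braces are balanced:

```php
<?php
// Hypothetical helper: returns one list of words per top-level {...} group,
// flattening any nested groups into the same list.
function extractGroups(string $text): array
{
    $groups = [];   // one list of words per top-level group
    $depth  = 0;    // current brace nesting level
    $word   = '';   // word currently being collected

    // Store the collected word (if any) in the current top-level group.
    $flush = function () use (&$groups, &$word) {
        $trimmed = trim($word);
        if ($trimmed !== '') {
            $groups[count($groups) - 1][] = $trimmed;
        }
        $word = '';
    };

    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $ch = $text[$i];
        if ($ch === '{') {
            if ($depth === 0) {
                $groups[] = [];   // entering a new top-level group
            } else {
                $flush();         // a nested group also ends the current word
            }
            $depth++;
        } elseif ($ch === '}' && $depth > 0) {
            $flush();
            $depth--;
        } elseif ($ch === ';' && $depth > 0) {
            $flush();
        } elseif ($depth > 0) {
            $word .= $ch;         // only collect characters inside a group
        }
    }
    return $groups;
}

$sample = "or {browse; {search; lookup; examine}} our list";
print_r(extractGroups($sample));
```

For the sample string this gives a single group containing browse, search, lookup and examine, with the nesting ignored.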

Thanks for the excellent help. I modified the suggested regex a bit and came up with the following solution:

$code = "You've come to {the right; the correct; the appropriate} place! 
    Start by {searching; probing; inquiring} our site below, or 
    {browse; {search; {foo; bar}; lookup}; examine} our list of 
    popular support articles.";
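echo $code."\r\n\r\n";
// Match each top-level {...} group; (?R) recurses to consume nested groups.
preg_match_all('/{((?:[^{}]*|(?R))*)}/', $code, $matches);
$arr = array();
$r = array('{','}');
foreach($matches[1] as $k1 => $m)
{
    // Drop the nested braces so the group becomes a flat ';'-separated list.
    $ths = explode(';',str_replace($r,'',$m));
    foreach($ths as $val)
    {
        $val = trim($val);
        if($val!='')
            $arr[$k1][] = $val;
    }
    // Swap the whole group for a numbered placeholder.
    $code = str_replace($matches[0][$k1],'[[rep'.$k1.']]',$code);
}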
echo $code;

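Returns:

You've come to {the right; the correct; the appropriate} place! Start by {searching; probing; inquiring} our site below, or {browse; {search; {foo; bar}; lookup}; examine} our list of popular support articles.

You've come to [[rep0]] place! Start by [[rep1]] our site below, or [[rep2]] our list of popular support articles.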

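My only idea at the moment is to walk through the text character by character until I find a {, which would increment a "depth" variable, capture the words in between until I find a }, which decrements the depth variable, and stop capturing words when it returns to zero. I was just wondering if there is an easier way.

That sounds like a reasonable approach. An alternative is to use a bit of regex, although that may give you a solution that is less readable (and therefore less maintainable) than your own.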

<?php
$text = "You've come to the {right; correct; appropriate} place! 
    Start by {searching; probing; inquiring} our site below, or 
    {browse; {search; {foo; bar}; lookup}; examine} our list of 
    popular support articles. {the right; the correct; the appropriate}";
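// Match each top-level {...} group; (?R) recurses into nested groups.
preg_match_all('/{((?:[^{}]*|(?R))*)}/', $text, $matches);
$arr = array();
foreach($matches[1] as $m) {
  // Pull out each word or phrase, skipping ';', braces and surrounding spaces.
  preg_match_all('/\w([\w\s]*\w)?/', $m, $words);
  $arr[] = $words[0];
}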
print_r($arr);
?>

This will produce:

Array
(
    [0] => Array
        (
            [0] => right
            [1] => correct
            [2] => appropriate
        )
    [1] => Array
        (
            [0] => searching
            [1] => probing
            [2] => inquiring
        )
    [2] => Array
        (
            [0] => browse
            [1] => search
            [2] => foo
            [3] => bar
            [4] => lookup
            [5] => examine
        )
    [3] => Array
        (
            [0] => the right
            [1] => the correct
            [2] => the appropriate
        )
)
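
Since the pattern above is fairly dense, here is a sketch of the same idea written out with the `x` (extended) modifier so that each part can carry a comment. The variable names are only illustrative. It also uses the possessive quantifier `++` in place of the original `*` inside the group, which is my own substitution to limit backtracking, and it splits on braces and semicolons rather than using the word regex:

```php
<?php
// The recursive pattern, spelled out with the x (extended) modifier.
$pattern = '/
    \{                  # opening brace of a group
    (                   # capture 1: text between the outer braces
        (?:
            [^{}]++     # a run of non-brace characters (possessive)
          |
            (?R)        # or a complete nested group, matched recursively
        )*
    )
    \}                  # the matching closing brace
/x';

$text = 'or {browse; {search; {foo; bar}; lookup}; examine} our list';
preg_match_all($pattern, $text, $matches);

// $matches[1][0] still contains the nested braces, so split on braces
// and semicolons, then trim and drop the empty pieces.
$pieces = preg_split('/[{};]/', $matches[1][0]);
$words  = array_values(array_filter(
    array_map('trim', $pieces),
    function ($w) { return $w !== ''; }
));
print_r($words);
```

Only the outermost braces produce a match here, which is why the nested words all land in the same array.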