Matching basic HTML using regex (or any other way)

I have some HTML like the following:

    <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)

Now, using PHP, I want to split it up and build two arrays, like this:

Array 1 (this will contain everything in the <b> tags)

    [0] -> <b>This is a title: </b>
    [1] -> <b>Some more text: </b>
    ...
    [n] -> <b>Hello world!: </b>

Array 2 (this will contain everything outside the <b> tags)

    [0] -> 0091 + Two + 423 + Four + (Five, Six, Seven)
    [1] -> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    [n] -> Test + Foo + 1122 + (120, 122, Four)

I tried using regular expressions and preg_match_all, but I can't seem to get my head around them. Any help would be appreciated.

Thanks!

<?php 
$string = '    <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)';
preg_match_all("#(<b>[^<]+</b>)([^<]+)#", $string, $matches);
print_r($matches);
?> 

Output:

Array
(
    [0] => Array
        (
            [0] => <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
            [1] => <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
            [2] => <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
        )
    [1] => Array
        (
            [0] => <b>This is a title: </b>
            [1] => <b>Some more text: </b>
            [2] => <b>Hello world!: </b>
        )
    [2] => Array
        (
            [0] =>  0091 + Two + 423 + Four + (Five, Six, Seven)
            [1] =>  Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
            [2] =>  Test + Foo + 1122 + (120, 122, Four)
        )
)
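
To get from that `$matches` structure to the two arrays the question asks for, something along these lines should work. This is only a sketch: the sample input leaves out the `...` placeholder lines, and `$titles` and `$values` are made-up names for illustration. Index 1 of `$matches` holds the first capture group and index 2 the second, so the two arrays can be read off directly, with `trim` removing the leading space and trailing newline that `[^<]+` picks up.

```php
<?php
// Sample input modeled on the question (the "..." placeholder lines are left out).
$string = <<<HTML
<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
HTML;

// Group 1 captures the <b>...</b> element, group 2 the text up to the next "<".
preg_match_all('#(<b>[^<]+</b>)([^<]+)#', $string, $matches);

// $titles and $values are hypothetical names for the two requested arrays.
$titles = $matches[1];
$values = array_map('trim', $matches[2]);

print_r($titles);
print_r($values);
```

Note that this pattern assumes there is no other markup inside or after the `<b>` element, since both groups stop at the first `<` they meet.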

You can try this:

<pre>
<?php
$subject =<<<LOD
<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
LOD;
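// Group 1: a <b>...</b> element; group 2: everything after it up to the next "<b".
$pattern = '~(<b>.*?</b>)((?>[^<]+|<(?!b))*)~';
preg_match_all($pattern, $subject, $matches);
// Drop the full-match entries so only the two capture groups remain.
array_shift($matches);
// Trim surrounding whitespace from every captured string.
array_walk_recursive($matches, function (&$val) { $val = trim($val); });
list($array1, $array2) = $matches;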
print_r($array1);
print_r($array2);