在PHP:DOMXPath中使用//text()时包装任何HTML标记


Wrap any HTML tags when using //text() in PHP: DOMXPath

我有以下HTML:

<div id="ABC">
    <i>Lorem Ipsum</i> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
    <br>
    It has survived not only <b>five centuries</b>, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with <i>desktop publishing software</i> like Aldus PageMaker including versions of Lorem Ipsum.
</div>

我使用以下查询将ABC内容存储在数组中:

foreach ( $xpath->query('//div[@id="ABC"]/text() | //div[@id="ABC"]/i | //div[@id="ABC"]/b') as $text ) {
     $data['content'][] = $text->nodeValue; 
}

输出是这样的:

   [content] => Array
        (
            [0] => Lorem Ipsum
            [1] => is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
            [2] => It has survived not only
            [3] => five centuries
            [4] => , but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with
            [5] => desktop publishing software
            [6] => like Aldus PageMaker including versions of Lorem Ipsum.
   )

如果我想要这样的输出,有可能吗

   [content] => Array
        (
            [0] => Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
            [1] => It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
   )

您可以将文本节点累积到字符串中,直到遇到br节点。此时,您将累积的字符串添加到$data['content']数组中,并将该字符串重置为零。在循环结束时,如果数组不为空,您还需要将累积的字符串添加到数组中。

所以这个循环应该是这样的:

$line = '';
foreach ( $xpath->query('//div[@id="ABC"]/text() | //div[@id="ABC"]/i | //div[@id="ABC"]/b  | //div[@id="ABC"]/br') as $text ) {
  if ($text->nodeName == 'br') {
    $data['content'][] = $line;
    $line = '';
  }
  else
    $line .= $text->nodeValue;
}
if ($line) $data['content'][] = $line;

请注意,我已经向$xpath->query调用添加了一个//div[@id="ABC"]/br查询,以便在循环中返回br节点。