PHP DOMDocument: wrap all text that is not inside an element with a p

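I get HTML from an RTE (rich text editor). Afterwards I manipulate its content with the DOMDocument class.

The editor sometimes gives me text that is not wrapped in any element, for example: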

    <p>This is some text inside a text-node</p>
    This is text without any node and should be wrapped with a text-node

Is it possible to wrap this text with a text-node using DOMDocument?

I use the following code inside a function:

    $dom = new \DOMDocument();
    $dom->loadHTML($MY_HTML);
    $xpath = new \DOMXPath($dom);
    // Append the "bodytext" class to every paragraph.
    foreach ($xpath->query('//p') as $k => $paragraph) {
        $paragraph->setAttribute('class', $paragraph->getAttribute('class') . ' bodytext');
    }
    // Serialize <body> and strip its own opening and closing tags.
    $body = $xpath->query('/html/body');
    return preg_replace('/^<body>|<\/body>$/', '', $dom->saveXML($body->item(0)));

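Technically, the text is already inside a "text node", but the following will wrap every text node that is not already inside a paragraph with a paragraph node: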

    <?php
    $html = <<<'END'
    <div>
        <p>This is some text inside a text-node</p>
        This is text without any node and should be wrapped with a text-node
    </div>
    END;
    $doc = new \DOMDocument();
    // LIBXML_HTML_NOIMPLIED keeps libxml from adding <html> and <body> wrappers.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED);
    $xpath = new \DOMXPath($doc);
    // Non-empty text nodes that have no <p> ancestor.
    $nodes = $xpath->query('//text()[not(ancestor::p)][normalize-space()]');
    foreach ($nodes as $node) {
        // Build a <p> holding the trimmed text and swap it in for the text node.
        $p = $doc->createElement('p', htmlspecialchars(trim($node->textContent)));
        $node->parentNode->replaceChild($p, $node);
    }
    print $doc->saveHTML($doc->documentElement);
    // Output (whitespace may differ slightly):
    // <div>
    //   <p>This is some text inside a text-node</p>
    // <p>This is text without any node and should be wrapped with a text-node</p>
    // </div>

The key is the XPath query //text()[not(ancestor::p)][normalize-space()], which selects all non-empty text nodes that do not have a p ancestor.
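
Note that not(ancestor::p) also matches text inside other elements such as li or h1, which may be more than you want. If the loose text only ever appears at the top level of the editor output, a narrower query like /html/body/text()[normalize-space()] might be enough. Below is a rough sketch of how that could be combined with the function from the question. The name wrapLooseText is made up for illustration, and the sketch assumes non-empty ASCII input, since loadHTML() does not assume UTF-8 by default.

```php
<?php
// Hypothetical helper (name is an assumption, not from the question):
// wraps loose top-level text in <p> and adds "bodytext" to every <p>.
// Assumes $html is non-empty and ASCII-only.
function wrapLooseText(string $html): string
{
    $dom = new DOMDocument();

    // Collect libxml warnings about imperfect editor markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Only non-empty text nodes that are direct children of <body>.
    foreach ($xpath->query('/html/body/text()[normalize-space()]') as $text) {
        $p = $dom->createElement('p');
        // createTextNode() stores raw text; it is escaped when serialized.
        $p->appendChild($dom->createTextNode(trim($text->nodeValue)));
        $text->parentNode->replaceChild($p, $text);
    }

    // Same class handling as in the question, minus the stray leading space.
    foreach ($xpath->query('//p') as $paragraph) {
        $class = trim($paragraph->getAttribute('class') . ' bodytext');
        $paragraph->setAttribute('class', $class);
    }

    // Serialize the children of <body> one by one instead of stripping tags with a regex.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo wrapLooseText("<p>Inside a paragraph</p>\nLoose text");
```

With this input the result should contain two paragraphs, both carrying the bodytext class. Using createTextNode() rather than passing a value to createElement() avoids the manual htmlspecialchars() call.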