How to remove contents of elements in string, leaving only outermost element tags?

I have a string like this:

<p>
This is some text
</p>
<p>
This is some text
</p>
<p>
This is some text
</p>
<blockquote data-id="1">
    This is some text
    <blockquote data-id="2">
        This is some text
    </blockquote>
</blockquote>
<blockquote data-id="3">
    <blockquote data-id="4">
        This is some text
        <blockquote data-id="5">
            This is some text
        </blockquote>
    </blockquote>
    This is some text
</blockquote>
<blockquote data-id="6">
    This is some text
</blockquote>

I want to keep the outermost blockquote tags but remove their contents. So I want to transform the above into this:

<p>
This is some text
</p>
<p>
This is some text
</p>
<p>
This is some text
</p>
<blockquote data-id="1"></blockquote>
<blockquote data-id="3"></blockquote>
<blockquote data-id="6"></blockquote>

What is an efficient way to do this in PHP?

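There are many ways to skin this cat. I would give the string a dummy root node, remove every node matching the XPath expression /root/blockquote/text() | /root/blockquote/*, and then rebuild the string from the root's child nodes.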


Example:

$string = <<<'STRING'
<p>
This is some text
</p>
<p>
This is some text
</p>
<p>
This is some text
</p>
<blockquote data-id="1">
    This is some text
    <blockquote data-id="2">
        This is some text
    </blockquote>
</blockquote>
<blockquote data-id="3">
    <blockquote data-id="4">
        This is some text
        <blockquote data-id="5">
            This is some text
        </blockquote>
    </blockquote>
    This is some text
</blockquote>
<blockquote data-id="6">
    This is some text
</blockquote>
STRING;
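$dom = new DOMDocument();
// Wrap the fragment in a dummy <root> so it parses as a single XML document
$dom->loadXML("<root>$string</root>");
$xpath = new DOMXPath($dom);
// Select every text node and child element directly inside a top-level <blockquote>
foreach ($xpath->query('/root/blockquote/text() | /root/blockquote/*') as $node) {
    $node->parentNode->removeChild($node);
}
// Rebuild the string from the children of the dummy root
$string = '';
foreach ($dom->documentElement->childNodes as $node) {
    $string .= $dom->saveHTML($node);
}
echo $string;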

Output:

<p>
This is some text
</p>
<p>
This is some text
</p>
<p>
This is some text
</p>
<blockquote data-id="1"></blockquote>
<blockquote data-id="3"></blockquote>
<blockquote data-id="6"></blockquote>
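A minimal sketch, not part of the original answer: the same dummy-root technique wrapped in a reusable function. The function name emptyTopLevelBlockquotes is hypothetical. Like the answer above, it assumes the input is well-formed XML (an unclosed HTML void tag such as <br> would make loadXML() fail). It uses node() instead of the text() | * union, so comments and processing instructions inside the blockquotes are removed as well.

```php
<?php
// Hypothetical helper: empties every top-level <blockquote> in a fragment.
// Assumption: the fragment is well-formed XML, since loadXML() is used.
function emptyTopLevelBlockquotes(string $fragment): string
{
    $dom = new DOMDocument();
    // Wrap in a dummy root so the fragment has a single document element.
    if (!$dom->loadXML("<root>$fragment</root>")) {
        throw new InvalidArgumentException('Fragment is not well-formed XML');
    }
    $xpath = new DOMXPath($dom);
    // node() matches elements, text, comments and processing instructions.
    // The XPath result is a snapshot, so removing nodes while looping is safe.
    foreach ($xpath->query('/root/blockquote/node()') as $node) {
        $node->parentNode->removeChild($node);
    }
    // Serialize the dummy root's children back into a string.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo emptyTopLevelBlockquotes(
    '<p>Keep me</p><blockquote data-id="1">x<blockquote data-id="2">y</blockquote></blockquote>'
);
```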

Shortly after posting my question, it occurred to me that DOMDocument could handle this nicely (although there may be better solutions).

This is what I came up with:

$html = '<p>
This is some text
</p>
<p>
This is some text
</p>
<p>
This is some text
</p>
<blockquote data-id="1">
    This is some text
    <blockquote data-id="2">
        This is some text
    </blockquote>
</blockquote>
<blockquote data-id="3">
    <blockquote data-id="4">
        This is some text
        <blockquote data-id="5">
            This is some text
        </blockquote>
    </blockquote>
    This is some text
</blockquote>
<blockquote data-id="6">
    This is some text
</blockquote>';

libxml_use_internal_errors(true); // collect libxml parse warnings instead of printing them
$dom = new \DOMDocument();
$dom->loadHTML($html); // pass the HTML string
$xpath = new \DOMXPath($dom); // pass the appropriate DOMDocument object to the constructor
foreach ($xpath->query('//blockquote') as $node) {
    /** @var \DOMElement $node */
    $node->nodeValue = ''; // assigning nodeValue on an element removes all of its children
}
// loadHTML() wraps the fragment in <html><body>, so print the body's inner HTML
echo domInnerHtml($xpath->query('//body')->item(0));

/**
 * Returns the inner HTML of a DOMNode
 *
 * @link http://stackoverflow.com/questions/2087103/innerhtml-in-phps-domdocument
 * @param DOMNode $element
 * @return string
 */
function domInnerHtml(DOMNode $element) {
    $innerHtml = '';
    $children  = $element->childNodes;
    foreach ($children as $child) {
        $innerHtml .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHtml;
}

The output is:

<p>
This is some text
</p>
<p>
This is some text
</p>
<p>
This is some text
</p>
<blockquote data-id="1"></blockquote>
<blockquote data-id="3"></blockquote>
<blockquote data-id="6"></blockquote>
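A variation on the loadHTML() approach, offered as a sketch rather than the author's code: the XPath can be restricted to outermost blockquotes with the ancestor axis, so nested blockquotes are never visited, and the children are removed one at a time instead of through nodeValue. The function name emptyOutermostBlockquotes is hypothetical. Unlike the loadXML() version, this one tolerates HTML that is not well-formed XML.

```php
<?php
// Hypothetical helper: empties only the outermost <blockquote> elements.
function emptyOutermostBlockquotes(string $html): string
{
    $dom = new DOMDocument();
    // Collect parser warnings instead of printing them, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // A blockquote is "outermost" when none of its ancestors is a blockquote.
    foreach ($xpath->query('//blockquote[not(ancestor::blockquote)]') as $quote) {
        while ($quote->firstChild !== null) {
            $quote->removeChild($quote->firstChild);
        }
    }

    // loadHTML() wraps the fragment in <html><body>; return the body's inner HTML.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo emptyOutermostBlockquotes(
    '<p>Keep me</p><blockquote data-id="3"><blockquote data-id="4">x</blockquote>y</blockquote>'
);
```

Note that loadHTML() treats input without a charset declaration as ISO-8859-1, so non-ASCII text may need an encoding declaration or conversion before parsing.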