Remove HTML element from parsed HTML document on a condition


Keywords: HTML, document, remove element, condition | Updated: 2023-09-27

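I have parsed an HTML document using Simple PHP HTML DOM Parser. The parsed document contains a ul tag with several li tags inside it. One of those li tags contains one of those dreaded "Add This" buttons, which I want to remove.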

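To make matters worse, the list item has no class or id, and it is not always in the same position within the list. So there is no easy way (correct me if I'm wrong) to remove it with the parser.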

What I want to do is search all the li elements for the string "addthis.com" and remove any element that contains it.

<ul>
    <li>Foobar</li>
    <li>addthis.com</li><!-- How do I remove this? -->
    <li>Foobar</li>
</ul>

FYI: this is a hobby project in my quest to learn PHP, not a case of content theft for profit.

All suggestions are welcome!

I couldn't find a method that explicitly removes a node, but you can remove one by setting its outertext to an empty string.

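include 'simple_html_dom.php'; // Simple HTML DOM library file; adjust the path as needed
$html = new simple_html_dom();
$html->load(file_get_contents("test.html"), false, false); // preserve formatting
foreach ($html->find('ul li') as $element) {
  // Blank out any li that contains an AddThis button link
  if (count($element->find('a.addthis_button')) > 0) {
    $element->outertext = "";
  }
}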
echo $html;

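You could use jQuery after parsing. jQuery runs in the browser, so this removes the item from the rendered page rather than from the PHP output. Something like this: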

$('li').each(function(i) {
    if ($(this).html().indexOf("addthis.com") !== -1) { // li markup contains the string
        $(this).remove();
    }
});

This solution uses the DOMDocument class and the DOMNode::removeChild method:

$str="<ul><li>Foobar</li><li>addthis.com</li><li>Foobar</li></ul>";
$remove='addthis.com';
$doc = new DOMDocument();
$doc->loadHTML($str);
$elements = $doc->getElementsByTagName('li');
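// Collect matches first: the node list is live, so removing nodes while iterating it would skip elements
$domElemsToRemove = array();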
foreach ($elements as $element) {
  $pos = strpos($element->textContent, $remove); // or similar $element->nodeValue
  if ($pos !== false) {
    $domElemsToRemove[] = $element;
  }
}
foreach( $domElemsToRemove as $domElement ){
  $domElement->parentNode->removeChild($domElement);
}
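// The ul now holds only the two Foobar items. Note that saveHTML() also
// emits the doctype and the <html><body> wrapper that loadHTML() added.
$str = $doc->saveHTML();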
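
A possible variation on the DOMDocument approach above, offered as a sketch rather than a definitive version: DOMXPath can select the matching li elements in a single query, and the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options ask libxml not to wrap the fragment in a doctype and html/body tags. The function name removeListItemsContaining and the sample markup are made up for illustration. The query also looks inside attribute values, on the assumption that the button is a link whose href mentions addthis.com.

```php
<?php
// Hypothetical helper, not part of the original answer.
function removeListItemsContaining(string $html, string $needle): string
{
    $doc = new DOMDocument();
    // Ask libxml not to add a doctype or implied <html>/<body> elements.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($doc);
    // Match an li if its text, or any attribute on it or on its descendants,
    // contains the needle. The needle is pasted into the expression as-is,
    // so it must not contain a double quote.
    $query = '//li[contains(., "' . $needle . '")'
           . ' or .//@*[contains(., "' . $needle . '")]]';

    // Copy the result into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $li) {
        $li->parentNode->removeChild($li);
    }

    return trim($doc->saveHTML());
}

// Sample input (assumed markup): one li matches by href, one by text.
$input = '<ul><li>Foobar</li>'
       . '<li><a href="http://www.addthis.com/bookmark.php">Share</a></li>'
       . '<li>addthis.com</li><li>Foobar</li></ul>';
echo removeListItemsContaining($input, 'addthis.com'), "\n";
```

The NOIMPLIED option expects the fragment to have a single root element (the ul here); for a full page, the plain loadHTML() call from the answer above is the safer choice.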