I'm using XPath to remove untidy (empty) HTML tags:
// Select every element whose text content is empty or whitespace-only, except <br>
$nodeList = $xpath->query("//*[normalize-space(.)='' and not(self::br)]");
foreach ($nodeList as $node) {
    $node->parentNode->removeChild($node);
}
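A minimal reproduction of the question's code, assuming the HTML is parsed with PHP's `DOMDocument::loadHTML` (the question doesn't show how `$xpath` is created). The wrapping `<div>` and the `<p>Some text</p>` paragraph are hypothetical additions so that the wrapper itself has text and survives. Because an `<img>` element has no text content, `normalize-space(.)=''` is true for the image and for its parent `<p>`, so both are removed.

```php
<?php
// Sketch: the parsing setup and the "<p>Some text</p>" paragraph are assumptions.
$html = '<div><p>Some text</p>'
    . '<p><em><br /></em></p>'
    . '<p><span style="text-decoration: underline;"><em><br /></em></span></p>'
    . '<p><img src="images/32913430_127001_e.jpg" alt="picture summit" /></p></div>';

$doc = new DOMDocument();
// Don't wrap the fragment in implied <html>/<body> elements or add a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);

// The question's expression: <img> has an empty string-value, so it matches too.
$nodeList = $xpath->query("//*[normalize-space(.)='' and not(self::br)]");
foreach ($nodeList as $node) {
    // query() returns a fixed node list, so removing nodes while looping is safe.
    $node->parentNode->removeChild($node);
}

echo $doc->saveHTML(); // only <div><p>Some text</p></div> is left
```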
It removes horrible input like this:
<p><em><br /></em></p>
<p><span style="text-decoration: underline;"><em><br /></em></span></p>
but it also strips out the img tags that I want to keep:
<p><img title="picture summit" src="images/32913430_127001_e.jpg" alt="picture summit" width="590" height="366" /></p>
How can I keep the img tags in this input using XPath?
Use:
//p[not(descendant::*[self::img or self::br]) and normalize-space()='']
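A sketch of this expression applied with `DOMXPath`, again assuming `DOMDocument::loadHTML`; the empty `<p></p>` is a hypothetical addition to show what does get removed. The image paragraph is kept. Note that because `br` is also in the exclusion, a paragraph whose only content is a `<br />` (like the "horrible" samples) is kept as well: the expression removes only paragraphs that are blank and contain neither an image nor a line break.

```php
<?php
// Sketch: parsing setup and the empty "<p></p>" input are assumptions for illustration.
$html = '<div><p></p>'
    . '<p><em><br /></em></p>'
    . '<p><img src="images/32913430_127001_e.jpg" alt="picture summit" /></p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);

// Blank <p> elements with no <img> or <br> anywhere inside them.
$query = "//p[not(descendant::*[self::img or self::br]) and normalize-space()='']";
foreach ($xpath->query($query) as $p) {
    $p->parentNode->removeChild($p);
}

// The empty <p> is gone; the <br /> paragraph and the <img> paragraph stay.
echo $doc->saveHTML();
```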
Perhaps you could use an XPath 1.0 expression like the following to remove the unwanted paragraphs:
//p[count(text())=0 and count(img)=0]
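In this expression, `count(text())=0` and `count(img)=0` look only at a paragraph's direct children. A sketch, assuming the same `DOMDocument::loadHTML` setup, shows that it removes both `<br />`-only samples and keeps the image paragraph. The extra `<p><em>Hello</em></p>` is a hypothetical input showing a side effect: its text sits inside `<em>`, not directly in `<p>`, so that paragraph is removed too.

```php
<?php
// Sketch: parsing setup and the "<p><em>Hello</em></p>" input are assumptions.
$html = '<div><p><em><br /></em></p>'
    . '<p><span style="text-decoration: underline;"><em><br /></em></span></p>'
    . '<p><img src="images/32913430_127001_e.jpg" alt="picture summit" /></p>'
    . '<p><em>Hello</em></p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);

// Paragraphs with no direct text-node children and no direct <img> children.
foreach ($xpath->query("//p[count(text())=0 and count(img)=0]") as $p) {
    $p->parentNode->removeChild($p);
}

// Only the image paragraph survives; "<p><em>Hello</em></p>" is removed as well.
echo $doc->saveHTML();
```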