How to Remove unwanted Style attributes from HTML String Using PHP DOMDocument

I have prepared a whitelist of allowed styles, and I want to remove every style that is not in the whitelist from an HTML string:
$allowed_styles = array('font-size','color','font-family','text-align','margin-left');
$html = 'xyz html';
$html_string = '<body>' . $html . '</body>';
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$elements = $dom->getElementsByTagName('body');
foreach($elements as $element) {
    foreach($element->childNodes as $child) {
        // text nodes have no attributes, so only look at elements
        if($child instanceof DOMElement && $child->hasAttribute('style')) {
            $style = strtolower(trim($child->getAttribute('style')));
            // match and get only the CSS property names
            preg_match_all('/(?<names>[a-z-]+):/', $style, $matches);
            for($i=0;$i<sizeof($matches["names"]);$i++) {
                $style_property = $matches["names"][$i];
                // if the CSS property is not in the allowed styles array,
                // remove the whole style attribute from this child
                if(!in_array($style_property,$allowed_styles)) {
                    $child->removeAttribute('style');
                    break;
                }
            }
        }
    }
}
$html_output = $dom->saveHTML();

I have tested this with many HTML strings and it works fine everywhere. But when I try to filter this HTML string:

$html_string = '<div style="font-style: italic; text-align: center;
background-color: red;">On The Contrary</div><span
style="font-style: italic; background-color: rgb(244, 249, 255);
font-size: 32px;"><b style="text-align: center;
background-color: rgb(255, 255, 255);">This is USA</b></span>';

every other disallowed style is removed from the string, except on this line:

<b style="text-align: center; background-color: rgb(255, 255, 255);">
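The `<b>` keeps its style because the inner loop walks `$element->childNodes`, which holds only the direct children of `<body>` (the `<div>` and the `<span>`). The `<b>` is nested one level deeper, inside the `<span>`, so it is never inspected. A small sketch that makes the difference visible, using a shortened version of the same markup (the variable names are illustrative only):

```php
<?php
// Shortened version of the question's markup.
$html = '<div style="font-style: italic">On The Contrary</div>'
      . '<span style="font-size: 32px"><b style="text-align: center">This is USA</b></span>';

$dom = new DOMDocument();
$dom->loadHTML('<body>' . $html . '</body>');
$body = $dom->getElementsByTagName('body')->item(0);

// Direct children only: this is what the question's inner loop visits.
$direct = [];
foreach ($body->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $direct[] = $child->tagName;
    }
}

// All descendants: getElementsByTagName('*') walks the whole subtree.
$all = [];
foreach ($body->getElementsByTagName('*') as $el) {
    $all[] = $el->tagName;
}

echo implode(',', $direct), PHP_EOL; // div,span
echo implode(',', $all), PHP_EOL;    // div,span,b
```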

Can anyone suggest a more efficient and reliable way to remove all styles except the whitelisted ones?
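One non-recursive alternative, sketched here as an assumption rather than taken from the answers below, is to let `DOMXPath` select every element that has a `style` attribute (`//*[@style]`) at any depth and filter its declarations one by one. The helper name `filterStyles` is hypothetical, and splitting on `;` and `:` assumes no value contains a semicolon (a `data:` URL, for example, would break it):

```php
<?php
// Hypothetical helper: keep only whitelisted CSS properties in every style attribute.
function filterStyles(DOMDocument $dom, array $allowed): void
{
    $xpath = new DOMXPath($dom);
    // Every element that carries a style attribute, at any nesting depth.
    foreach ($xpath->query('//*[@style]') as $el) {
        $kept = [];
        // Split "prop: value; prop: value" into individual declarations.
        foreach (explode(';', $el->getAttribute('style')) as $decl) {
            $parts = explode(':', $decl, 2);
            if (count($parts) !== 2) {
                continue; // empty or malformed declaration
            }
            $name = strtolower(trim($parts[0]));
            if (in_array($name, $allowed, true)) {
                $kept[] = $name . ': ' . trim($parts[1]);
            }
        }
        if ($kept) {
            $el->setAttribute('style', implode('; ', $kept));
        } else {
            $el->removeAttribute('style');
        }
    }
}

$allowed_styles = ['font-size', 'color', 'font-family', 'text-align', 'margin-left'];
$dom = new DOMDocument();
$dom->loadHTML('<body><span style="font-style: italic; font-size: 32px;">'
    . '<b style="text-align: center; background-color: red;">This is USA</b></span></body>');
filterStyles($dom, $allowed_styles);
echo $dom->getElementsByTagName('b')->item(0)->getAttribute('style'), PHP_EOL; // text-align: center
```

Only attributes are changed inside the loop, never nodes, so the node list being iterated is not disturbed.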

This is similar to Oleja's solution, but it removes only the disallowed properties rather than the whole style attribute:

// usage: removeStylesheet($dom, ['color', 'font-weight']);
function removeStylesheet($tree, $allowed_styles) {
    // only elements have attributes (skip the document itself, text and comment nodes)
    if ($tree instanceof DOMElement && $tree->hasAttribute('style')) {
        $style = strtolower(trim($tree->getAttribute('style')));
        // capture each "name: value" pair; a value ends at a semicolon or a quote
        preg_match_all('/(?<names>[a-z-]+) *:(?<values>[^\'";]+)/', $style, $matches);
        $new_styles = array();
        for ($i = 0; $i < sizeof($matches['names']); $i++) {
            if (in_array($matches['names'][$i], $allowed_styles)) {
                $new_styles[] = $matches['names'][$i] . ':' . trim($matches['values'][$i]);
            }
        }
        if ($new_styles)
            $tree->setAttribute('style', implode(';', $new_styles));
        else
            $tree->removeAttribute('style');
    }
    // recurse into the children of any node that has them (document, elements)
    if ($tree->childNodes) {
        foreach ($tree->childNodes as $child) {
            removeStylesheet($child, $allowed_styles);
        }
    }
}
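The part of this answer that differs from removing the whole attribute is the declaration filter. Pulled out into a standalone function (the name `keepAllowedDeclarations` is hypothetical), it can be tried on plain strings without building a DOM. This sketch uses the same regular expression as the answer, with the quote inside the character class escaped for a single-quoted PHP string:

```php
<?php
// Hypothetical standalone version of the filter inside removeStylesheet():
// returns the style string with only whitelisted properties, or null if none remain.
function keepAllowedDeclarations(string $style, array $allowed): ?string
{
    // A property name, optional spaces, a colon, then the value up to ; or a quote.
    preg_match_all('/(?<names>[a-z-]+) *:(?<values>[^\'";]+)/', strtolower(trim($style)), $m);
    $kept = [];
    foreach ($m['names'] as $i => $name) {
        if (in_array($name, $allowed, true)) {
            $kept[] = $name . ':' . trim($m['values'][$i]);
        }
    }
    return $kept ? implode(';', $kept) : null;
}

$allowed_styles = ['font-size', 'color', 'font-family', 'text-align', 'margin-left'];
echo keepAllowedDeclarations(
    'font-style: italic; background-color: rgb(244, 249, 255); font-size: 32px;',
    $allowed_styles
), PHP_EOL; // font-size:32px
```

One limitation of the pattern: a quoted value such as `font-family: "Times New Roman"` stops at the first quote, so the family name is lost and only `font-family:` is kept.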

For this HTML (and any other nested HTML) you have to use a recursive function like this:

$html = 'your html';
$allowed_styles = array('font-size','color','font-family','text-align','margin-left');
$html_string = '<body>' . $html . '</body>';
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$elements = $dom->getElementsByTagName('body');
foreach ($elements as $element)
    clearHtml($element, $allowed_styles);
$html_output = $dom->saveHTML();

function clearHtml($tree, $allowed_styles) {
    // only elements carry attributes; text and comment nodes are skipped
    if ($tree instanceof DOMElement) {
        if ($tree->hasAttribute('style')) {
            $style = strtolower(trim($tree->getAttribute('style')));
            preg_match_all('/(?<names>[a-z-]+):/', $style, $matches);
            for ($i = 0; $i < sizeof($matches['names']); $i++) {
                $style_property = $matches['names'][$i];
                // one disallowed property is enough to drop the whole style attribute
                if (!in_array($style_property, $allowed_styles)) {
                    $tree->removeAttribute('style');
                    break;
                }
            }
        }
        // recurse so that nested elements (like the <b> inside the <span>) are cleaned too
        foreach ($tree->childNodes as $child)
            clearHtml($child, $allowed_styles);
    }
}
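`$dom->saveHTML()` with no argument returns the whole document, including the `<!DOCTYPE>`, `<html>` and `<body>` wrappers that `loadHTML()` adds around the fragment. If only the cleaned fragment is wanted, one option (a sketch, not part of the answer above; `bodyInnerHtml` is a hypothetical name) is to serialise each child of `<body>` separately, since `saveHTML()` also accepts a node:

```php
<?php
// Hypothetical helper: serialise only what is inside <body>, without the
// <!DOCTYPE>, <html> and <body> wrappers that saveHTML() emits for the whole document.
function bodyInnerHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child); // saveHTML() accepts a node argument
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML('<body><div style="text-align: center">On The Contrary</div><b>This is USA</b></body>');
echo bodyInnerHtml($dom), PHP_EOL;
```

The echoed string starts directly with the `<div>`, with no document wrappers around it.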