PHP: remove invalid characters

When I call $dom->loadHTML('<?xml version="1.0" encoding="UTF-8"?>' . $html); I get the following warnings.

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Char 0xD860 out of allowed range in Entity, line: 1 in D:\xampp\xampp\htdocs\xampp\similarity\functions.php on line 438
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Char 0xDEE2 out of allowed range in Entity, line: 1 in D:\xampp\xampp\htdocs\xampp\similarity\functions.php on line 438
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Invalid char in CDATA 0x3 in Entity, line: 1 in D:\xampp\xampp\htdocs\xampp\similarity\functions.php on line 438

How can I locate and remove those "invalid" characters using PHP?

安德烈

Untested, but this should work:

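// Requires the tidy extension; parse and serialize the markup as UTF-8.
$tidy = new tidy();
$myHTML = $tidy->repairString($html, array(), 'utf8');
// Prepend the XML declaration so loadHTML() treats the input as UTF-8.
$dom = new DOMDocument();
$dom->loadHTML('<?xml version="1.0" encoding="UTF-8"?>' . $myHTML);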
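
If tidy is not available, or you want to see where the offending bytes are, the characters named in the warnings can also be located and stripped directly. The following is only a minimal sketch, assuming $html is meant to be UTF-8 and PHP 7 or later. The names INVALID_XML_BYTES, find_invalid_xml_chars and strip_invalid_xml_chars are made up for this example. The pattern covers control characters such as 0x3, UTF-8-encoded surrogates such as 0xD860 and 0xDEE2, and U+FFFE/U+FFFF; it does not repair other kinds of malformed UTF-8.

```php
<?php
// Byte-level pattern (no /u flag, so malformed UTF-8 does not make PCRE fail):
//   - control characters below 0x20 other than tab, LF and CR
//   - UTF-8-encoded surrogates U+D800..U+DFFF (bytes ED A0..BF xx)
//   - the noncharacters U+FFFE and U+FFFF (bytes EF BF BE / EF BF BF)
// Hypothetical constant name, chosen for this sketch.
const INVALID_XML_BYTES =
    '/[\x00-\x08\x0B\x0C\x0E-\x1F]|\xED[\xA0-\xBF][\x80-\xBF]|\xEF\xBF[\xBE\xBF]/';

// Return the byte offset and hex dump of every offending sequence.
function find_invalid_xml_chars(string $html): array
{
    preg_match_all(INVALID_XML_BYTES, $html, $matches, PREG_OFFSET_CAPTURE);
    $found = array();
    foreach ($matches[0] as $match) {
        $found[] = array('offset' => $match[1], 'hex' => bin2hex($match[0]));
    }
    return $found;
}

// Remove every offending sequence; fall back to the input if PCRE errors out.
function strip_invalid_xml_chars(string $html): string
{
    $clean = preg_replace(INVALID_XML_BYTES, '', $html);
    return $clean === null ? $html : $clean;
}

// Sample input: a 0x03 control byte plus the two surrogates from the warnings.
$dirty = "ok\x03-\xED\xA1\xA0\xED\xBB\xA2-end";
print_r(find_invalid_xml_chars($dirty));
echo strip_invalid_xml_chars($dirty), "\n"; // prints "ok--end"
```

The cleaned string can then be passed to loadHTML() in place of $html, for example $dom->loadHTML('<?xml version="1.0" encoding="UTF-8"?>' . strip_invalid_xml_chars($html));. Matching on bytes rather than with the /u modifier is deliberate here: with /u, PCRE rejects a subject that contains encoded surrogates and preg_replace() returns null.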