Parsing (invalid) HTML from another website using PHP

I'm trying to parse the HTML from the following URL:

http://md5.rednoize.com/?q=fbade9e36a3f36d3d676c1b808451dd7

Code:

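    // Fetch the page ($url and $hash are defined elsewhere)
    $html = file_get_contents($url . $hash);
    // Clean up the markup with Tidy first
    $config = array(
      'clean' => 'yes',
      'output-html' => 'yes',
    );
    $tidy = tidy_parse_string($html, $config, 'utf8');
    $tidy->cleanRepair();
    // Load the repaired markup into DOM and look up the element
    $dom = new DOMDocument;
    $dom->loadHTML($tidy);
    $result = $dom->getElementById('result');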

But it doesn't work:

    Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID switcher already defined in Entity, line: 128 in

Is there a way to still parse it?

You could try parsing it with strict error checking turned off:

    $dom = new DOMDocument;
    // Don't throw DOMException on DOM errors
    $dom->strictErrorChecking = FALSE;
    $dom->loadHTML($tidy);
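As far as I know, `strictErrorChecking` governs whether DOM methods throw `DOMException`; the "ID ... already defined" message printed by `loadHTML()` is a libxml parser warning, which PHP lets you capture with `libxml_use_internal_errors()` instead. The warning is also non-fatal, so the tree is still built. Below is a minimal sketch of that approach, assuming PHP 7.4+ with the DOM extension. `parseHtmlQuietly()` and `findById()` are hypothetical helper names, and the XPath lookup is an alternative to `getElementById()` that matches the `id` attribute directly instead of relying on libxml's ID table (the thing the duplicate-ID warning is about).

```php
<?php
// Sketch only: parseHtmlQuietly() and findById() are hypothetical helpers.

/**
 * Load HTML into a DOMDocument, buffering libxml parser errors instead of
 * printing them as PHP warnings. Returns [DOMDocument, string[] messages].
 */
function parseHtmlQuietly(string $html): array
{
    // Route libxml errors into an internal buffer; remember the old setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Collect the buffered messages (e.g. duplicate-ID notices) for logging.
    $messages = array_map(
        fn (LibXMLError $e): string => trim($e->message),
        libxml_get_errors()
    );

    // Empty the buffer and restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$dom, $messages];
}

/**
 * Return the first element (in document order) whose id attribute equals
 * $id, or null. Assumes $id contains no double-quote character.
 */
function findById(DOMDocument $dom, string $id): ?DOMElement
{
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    $node = $nodes === false ? null : $nodes->item(0);
    return $node instanceof DOMElement ? $node : null;
}

// Example with a duplicated id, similar to the page in the question.
$html = '<div id="switcher">a</div><div id="switcher">b</div><div id="result">found</div>';
[$dom, $messages] = parseHtmlQuietly($html);
$result = findById($dom, 'result');
echo $result !== null ? $result->textContent : 'not found', PHP_EOL; // prints "found"
```

For the page in the question you would pass the fetched (and optionally tidied) markup, e.g. `parseHtmlQuietly(file_get_contents($url . $hash))`, and inspect `$messages` only if you care about what libxml complained about.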