PHP XPath: calling str_replace() causes an 'unterminated entity' error

I am parsing this link with XPath:

$html = '<a href="/browse/product.do?cid=1&amp;vid=1&amp;pid=1" class="productItemName">what is going on here</a>';
$dom = new DOMDocument();
$dom->loadhtml($html);
$xpath = new DOMXPath($dom);
$selectors['link'] = '//a/@href';
$links_nodeList = $xpath->query($selectors['link']);
foreach ($links_nodeList as $link) {
    $link->nodeValue = str_replace("http://www.test.com",'',$link->nodeValue); // relativize link
    $links[] = $link->nodeValue;
}
echo("<p>links</p>");
echo("<pre>");
print_r($links);
echo("</pre>");

This gives the following result:

Warning: main() [function.main]: unterminated entity reference vid=1&pid=1 in C:\Users\dir\public_html\whatisgoingon.php on line 14
links
Array
(
    [0] => /browse/product.do?cid=1
)

This line causes both the error and the truncated link. What is going on?

$link->nodeValue = str_replace("http://www.test.com",'',$link->nodeValue);

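Reading nodeValue returns the decoded URL, in which each &amp; has become a bare &. When that string is assigned back to nodeValue, it is parsed for entity references again, so the bare & produces the warning and the value is cut off at that point. Escape the string with htmlspecialchars() before assigning it: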

foreach ($links_nodeList as $link) {
    $link->nodeValue = str_replace("http://www.test.com",'',htmlspecialchars($link->nodeValue)); // relativize link
    $links[] = htmlspecialchars($link->nodeValue, ENT_QUOTES, 'UTF-8');
}

Output:

Array
(
    [0] => /browse/product.do?cid=1&amp;vid=1&amp;pid=1
)
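As an alternative, here is a minimal sketch that avoids writing to nodeValue altogether. It selects the a elements instead of the @href attribute nodes and uses DOMElement::getAttribute() and DOMElement::setAttribute(), which treat the value as literal text, so no manual escaping should be needed. The absolute URL in the sample markup is an assumption added here so that the replacement has something to strip; the original question used a relative link.

```php
<?php
// Assumed sample input: the link from the question, made absolute for illustration.
$html = '<a href="http://www.test.com/browse/product.do?cid=1&amp;vid=1&amp;pid=1" class="productItemName">what is going on here</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$links = [];
// Select the <a> elements themselves rather than their @href attribute nodes.
foreach ($xpath->query('//a[@href]') as $a) {
    // getAttribute() returns the decoded value, i.e. with a bare "&".
    $href = str_replace('http://www.test.com', '', $a->getAttribute('href'));
    // setAttribute() stores the string as literal text; it is not parsed for entities.
    $a->setAttribute('href', $href);
    $links[] = $href;
}

print_r($links);
```

With this input, $links should hold the single entry /browse/product.do?cid=1&vid=1&pid=1. The & is only re-encoded as &amp; when the document is serialized, for example with saveHTML().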