Parsing this link with XPath
$html = '<a href="/browse/product.do?cid=1&vid=1&pid=1" class="productItemName">what is going on here</a>';
$dom = new DOMDocument();
$dom->loadhtml($html);
$xpath = new DOMXPath($dom);
$selectors['link'] = '//a/@href';
$links_nodeList = $xpath->query($selectors['link']);
foreach ($links_nodeList as $link) {
    $link->nodeValue = str_replace("http://www.test.com",'',$link->nodeValue); // relativize link
    $links[] = $link->nodeValue;
}
echo("<p>links</p>");
echo("<pre>");
print_r($links);
echo("</pre>");
This gives:
Warning: main() [function.main]: unterminated entity reference vid=1&pid=1 in C:\Users\dir\public_html\whatisgoingon.php on line 14
links
Array
(
[0] => /browse/product.do?cid=1
)
This line is what causes the warning and the truncated link. What is going on?
$link->nodeValue = str_replace("http://www.test.com",'',$link->nodeValue);
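When you assign to nodeValue, the setter treats each & in the new value as the start of an entity reference. In &vid=1&pid=1 no entity is terminated by a semicolon, so you get the "unterminated entity reference" warning and everything from that point on is dropped. The & characters must therefore be escaped (as &amp;) before the assignment. Wrap the value in htmlspecialchars():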
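foreach ($links_nodeList as $link) {
    // Escape '&' as '&amp;' so the nodeValue setter does not read it as an entity reference
    $link->nodeValue = str_replace("http://www.test.com", '', htmlspecialchars($link->nodeValue)); // relativize link
    // Reading nodeValue returns the decoded URL again; escape it for output into the HTML page
    $links[] = htmlspecialchars($link->nodeValue, ENT_QUOTES, 'UTF-8');
}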
Output:
Array
(
[0] => /browse/product.do?cid=1&vid=1&pid=1
)
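If the relativized URLs are only needed in the $links array, a possible alternative (a sketch, not the approach of the answer above) is to avoid assigning to nodeValue at all. Reading nodeValue returns the decoded string with & intact, and doing the replacement on a plain PHP string involves no entity parsing. The call to libxml_use_internal_errors() is an assumption: it hides libxml's HTML parser warnings for this fragment, which you may prefer to keep.

```php
<?php
// Sketch: collect relativized hrefs without writing back into the DOM.
$html = '<a href="/browse/product.do?cid=1&vid=1&pid=1" class="productItemName">what is going on here</a>';

// Collect libxml parser warnings internally instead of printing them
// (assumption: they are not needed for this fragment).
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$links = [];
foreach ($xpath->query('//a/@href') as $attr) {
    // nodeValue is only read here; the relativized URL is stored in a plain
    // PHP string, so the '&' characters are never parsed as entity references.
    $links[] = str_replace('http://www.test.com', '', $attr->nodeValue);
}

echo "<pre>";
// Escape only at output time, when printing into an HTML page.
echo htmlspecialchars(print_r($links, true), ENT_QUOTES, 'UTF-8');
echo "</pre>";
```

Writing the modified value back through nodeValue (with the escaping shown above) is only needed if the changed DOM itself is used afterwards, for example when it is serialized again with saveHTML().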