simplehtmldom - follow links

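Can anyone give an example of how to follow the link in each element's <a href> while scraping, and fetch the information on the page it points to?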

$html = file_get_html('http://www.blabla.com/');
$html->find('div', 1)->class = 'bar';

Each <li> on the page has a link to more information:

<li class="#Selected">
<a href="/contactinfo/ITService/">info</a>
<h2>New York</h2>
<h3>USA</h3>
<strong>ITService</strong>
</li>

The linked page then contains:

<div class="InfoD">
<h2>New York</h2>
<h3>USA</h3>
<strong>ITService</strong>
<p>
Tel. : XXXXXX   
</p>
<p>
Mail. : XXXX@XXX.com    
</p>
</div>

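I know how to scrape elements like these with Simple HTML DOM, but I don't know how to do it when every element has its own link and the data is spread over multiple pages. If anyone could point me to an example or a similar tutorial, I'd appreciate it. Thanks.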

First get all the links matching li.#Selected a, then loop over them and fetch div.InfoD from each linked page.

Here is a code snippet showing how:

// includes Simple HTML DOM Parser
include "simple_html_dom.php";
$url = "http://www.blabla.com/";
$baseUrl = "http://www.blabla.com";
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a URL
$html->load_file($url);
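// Get all links inside <li class="#Selected">
// (the class name contains "#"; if this selector matches nothing,
// an attribute selector such as 'li[class=#Selected] a' may work instead)
$anchors = $html->find('li.#Selected a');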
// Loop through each link and get the node with the "InfoD" class
// Clear the DOM objects on every iteration to avoid memory leaks
foreach ($anchors as $anchor) {
    // Create the new link to parse
    $urlTemp = $baseUrl . $anchor->href;
    //Create a DOM object
    $html2 = new simple_html_dom();
    // Load HTML from a URL
    $html2->load_file($urlTemp);
    // Get the first node with class "InfoD" from the linked page
    $div = $html2->find('div.InfoD', 0);
    echo $div;
    echo "<hr/>";
    // Clear dom object
    $html2->clear(); 
    unset($html2);
}
// Clear dom object
$html->clear(); 
unset($html);
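
The snippet above echoes the whole div.InfoD block. If the individual fields are needed instead, something like the following could be used. This is only a minimal sketch: it uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM so that it runs without any extra files, the function name parseInfoBlock and the array keys (city, country, company, details) are hypothetical names chosen for illustration, and the sample markup is the one from the question. With Simple HTML DOM, the equivalent lookups would be along the lines of $div->find('h2', 0)->plaintext.

```php
<?php
// Hypothetical helper: extracts the fields of the "InfoD" block from a page's HTML.
// The array keys are assumptions made for this example, not part of any library.
function parseInfoBlock(string $pageHtml): ?array
{
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is often not perfectly valid
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($pageHtml);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // First <div> whose class attribute is exactly "InfoD"
    $div = $xpath->query('//div[@class="InfoD"]')->item(0);
    if ($div === null) {
        return null; // no info block on this page
    }

    // Trimmed text of the first descendant with the given tag name, or '' if absent
    $first = function (string $tag) use ($xpath, $div): string {
        $node = $xpath->query('.//' . $tag, $div)->item(0);
        return $node === null ? '' : trim($node->textContent);
    };

    // Each <p> holds one "Label : value" line such as "Tel. : XXXXXX"
    $details = [];
    foreach ($xpath->query('.//p', $div) as $p) {
        $details[] = trim($p->textContent);
    }

    return [
        'city'    => $first('h2'),
        'country' => $first('h3'),
        'company' => $first('strong'),
        'details' => $details,
    ];
}

// Sample markup copied from the question
$sample = <<<HTML
<div class="InfoD">
<h2>New York</h2>
<h3>USA</h3>
<strong>ITService</strong>
<p>
Tel. : XXXXXX
</p>
<p>
Mail. : XXXX@XXX.com
</p>
</div>
HTML;

$info = parseInfoBlock($sample);
print_r($info);
```

Inside the loop, the source of each linked page could be fetched with file_get_contents($urlTemp) and passed to this function, collecting one array per link instead of echoing raw HTML.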