How to get Google search results page into a DOMDocument object in PHP?

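I'm not trying to scrape Google at scale. This is a one-time job, and it is a bit faster than collecting about 300 URLs by hand.

However, I can't seem to create the DOMDocument. It always ends up as an empty object.

search_list.txt contains my list of search terms. For now I'm testing with a single term, "legos".

The script downloads the search results page correctly. I opened it in a web browser and it looks fine.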

search_list.txt

legos

get_results.php

<?php
$search_list = 'search_list.txt'; // file containing search terms
$results = 'results.txt';
$handle = fopen($search_list,'r'); // open the list of search terms
while($line = fgets($handle)) {
        $fp = fopen($results,'w');
        $ch = curl_init('http://www.google.com/'
        . 'search?q=' . urlencode(trim($line))); // trim() drops the trailing newline
        curl_setopt($ch,CURLOPT_FILE,$fp);
        curl_setopt($ch,CURLOPT_HEADER,0);
        curl_exec($ch);
        curl_close($ch);
        fclose($fp);
        unset($ch,$fp);
}
fclose($handle);

$dom = new DOMDocument();
@$dom->loadHTML(file_get_contents($results)); // loadHTML() is an instance method; @ silences warnings about invalid markup
echo print_r($dom,true); // EMPTY
$search_div = $dom->getElementById('search');
if(is_null($search_div)) { // ALWAYS NULL
        echo 'Search_div is null';
} else {
        echo print_r($search_div,true);
}
?>
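One way to check whether the document really is empty is sketched below. It assumes that print_r() on a DOM object can print an empty-looking structure on some PHP versions even when parsing succeeded, so it serializes a node with saveHTML() instead. The helper name load_html_document and the inline sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: parses an HTML string and returns the DOMDocument.
function load_html_document(string $html): DOMDocument
{
    // Collect libxml parse warnings instead of printing them;
    // real-world pages are rarely strictly valid HTML.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

// Illustrative markup, not a real Google response.
$dom = load_html_document('<html><body><div id="search"><p>hit</p></div></body></html>');

// saveHTML($node) serializes the node, so it shows what was actually parsed.
$search_div = $dom->getElementById('search');
echo $search_div === null ? 'Search_div is null' : $dom->saveHTML($search_div);
```

If saveHTML() prints markup here, the parse worked and only the print_r() output was misleading.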

I made a few changes.

Instead of fopen - fgets - ..., I used file().

Instead of curl, I used simple_html_dom::load_file.

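<?php
include 'simple_html_dom.php'; // PHP Simple HTML DOM Parser; adjust the path to where you saved it

$search_list = 'search_list.txt'; // file containing search terms
$result_list = 'results.txt'; // file that receives the results
// Read the terms into an array, without trailing newlines or blank lines.
$searching_list = file($search_list, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$html = new simple_html_dom();
foreach ($searching_list as $key => $searching_word) {
    $html->load_file('http://www.google.com/'.'search?q='.urlencode($searching_word));
    $search_div = $html->find("div[id='search']");
    echo $search_div[0]; // See content of the search div.
    file_put_contents($result_list,$search_div[0],FILE_APPEND); // append, so earlier terms are kept
}
?>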

You can view the result with echo $search_div[0];

It shows you the entire contents of the search div.

I searched for "asd" =)...

In my results, the output starts like this:

<div id="search"><div id="ires"><ol><li class="g"><h3 class="r"><a href="/url?q=http://en.wikipedia.org/wiki/Atrial_septal_defect&amp;sa=U&amp;ei=5qhMUv36ILKX0AXxuYGYCQ&amp;ved=0CBgQFjAA&amp;usg=AFQjCNFo67q2pfiPWK5SDMKFTeu-QSfcxw"><b>Atrial septal defect</b> - Wikipedia, the free encyclopedia</a></h3><div class="s"><div class="kv" style="margin-bottom:2px"><cite>en.wikipedia.org/wiki/<b>Atrial_septal_defect</b></cite><span class="flc"> - <a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache:Ocu9slAHjr4J:http://en.wikipedia.org/wiki/Atrial_septal_defect%252Basd%26hl%3Den%26ct%3Dclnk&amp;sa=U&amp;ei=5qhMUv36ILKX0AXxuYGYCQ&amp;ved=0CBkQIDAA&amp;usg=AFQjCNEY245u_ERgmZd7-2vIk5RAIRbOeg">Cached</a> - <a href="/search?ie=UTF-8&amp;q=related:en.wikipedia.org/wiki/Atrial_septal_defect+asd&amp;tbo=1&amp;sa=X&amp;ei=5qhMUv36ILKX0AXxuYGYCQ&amp;ved=0CBoQHzAA">Similar</a></span></div><span class="st"><b>Atrial septal defect</b> (<b>ASD</b>)

and ends like this:

</span><br></div></li><li class="g"><h3 class="r"><a href="/url?q=http://achievementschooldistrict.org/&amp;sa=U&amp;ei=5qhMUv36ILKX0AXxuYGYCQ&amp;ved=0CEQQFjAJ&amp;usg=AFQjCNHqINq_rlt8mbk2WmlATfpx-fyP8w"><b>Achievement School District</b></a></h3><div class="s"><div class="kv" style="margin-bottom:2px"><cite>achievementschooldistrict.org/</cite><span class="flc"> - <a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache:s8DoGxDbr4oJ:http://achievementschooldistrict.org/%252Basd%26hl%3Den%26ct%3Dclnk&amp;sa=U&amp;ei=5qhMUv36ILKX0AXxuYGYCQ&amp;ved=0CEUQIDAJ&amp;usg=AFQjCNEPhVqK33c7ruuXT7cwVe3-8JdUVA">Cached</a></span></div><span class="st"><b>Achievement School District</b> &middot; The <b>ASD</b> &middot; Driving Results &middot; Campuses &middot; Join Our <br>  Team &middot; Enroll A Student &middot; <b>ASD</b> News &middot; Contact Us&nbsp;<b>...</b></span><br></div></li></ol></div></div>
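Since the question asks for a DOMDocument, here is a rough sketch of pulling the target URLs out of markup like the sample above with DOMDocument and DOMXPath. It assumes each result title is an a element inside h3 class="r" and that the real address sits in the q parameter of the /url?q=... link, as in my output. Google's markup changes over time, so treat the XPath as an assumption. The helper name extract_result_urls is made up for illustration.

```php
<?php
// Hypothetical helper: returns the target URLs of the result links inside div#search.
function extract_result_urls(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $urls = [];
    // Assumption: result titles look like <h3 class="r"><a href="/url?q=...">.
    foreach ($xpath->query('//div[@id="search"]//h3[@class="r"]/a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Take the query string of the relative link and decode its parameters.
        parse_str((string) parse_url($href, PHP_URL_QUERY), $params);
        if (isset($params['q'])) {
            $urls[] = $params['q'];
        }
    }
    return $urls;
}

// Shortened version of the sample output above.
$sample = '<div id="search"><ol><li class="g"><h3 class="r">'
    . '<a href="/url?q=http://achievementschooldistrict.org/&amp;sa=U">Achievement School District</a>'
    . '</h3></li></ol></div>';
print_r(extract_result_urls($sample));
```

The "Cached" and "Similar" links are skipped because they are not inside the h3 element.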

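Update

This part is adapted from a comment by 巴特尔·巴克.

If the first Google search result does not change, you can use this code to get the first result of a search.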

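<?php
$search_list = 'search_list.txt'; // file containing search terms
$result_list = 'results.txt'; // file that receives the generated links
$order_language = "en";
// Read the terms into an array, without trailing newlines or blank lines.
$searching_list = file($search_list, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($searching_list as $key => $searching_word) {
    // btnI=1 asks Google for the "I'm Feeling Lucky" redirect to the first result.
    $link = 'https://www.google.com.tr/search?hl='.$order_language.'&q='.urlencode($searching_word).'&btnI=1';
    echo $link;
    file_put_contents($result_list,$link.PHP_EOL,FILE_APPEND); // one link per line
}
?>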

I searched for "asd" again =)...

Result:

https://www.google.com.tr/search?hl=en&q=asd&btnI=1

When I copy and paste this link into Chrome, it redirects me to the first result of the "asd" search:

http://www.asd-europe.org/
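If you would rather resolve that redirect in PHP than in a browser, something like the following might work. It is only a sketch: the helper names build_lucky_link and resolve_final_url are made up, and it assumes Google answers the btnI link with a plain HTTP redirect. Google may instead return a consent or redirect-notice page, in which case the final URL will not be the first result.

```php
<?php
// Hypothetical helper: builds the "first result" link used above.
function build_lucky_link(string $word, string $language = 'en'): string
{
    return 'https://www.google.com.tr/search?hl=' . urlencode($language)
        . '&q=' . urlencode($word) . '&btnI=1';
}

// Hypothetical helper: follows HTTP redirects and returns the final URL,
// or null if the request failed. Requires the curl extension.
function resolve_final_url(string $link): ?string
{
    $ch = curl_init($link);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // stop after a few hops
    $ok = curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    return $ok === false ? null : $final;
}

echo build_lucky_link('asd'), PHP_EOL;
```

Calling resolve_final_url(build_lucky_link('asd')) would then give the address the request ended up at, which you can write to results.txt instead of the Google link.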

I'd be happy if this helps. Have a nice day.