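I'm not trying to rip off Google. This is a one-off job, and it's a bit faster than fetching about 300 URLs by hand.
However, I can't seem to create a DOMDocument. It always ends up as an empty object.
search_list.txt contains my list of search terms. For now I'm only testing with one term, "legos".
The script downloads the search results page correctly. I opened it in a web browser and it looks fine.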
search_list.txt
legos
get_results.php
<?php
$search_list = 'search_list.txt'; // file containing search terms
$results = 'results.txt';
$handle = fopen($search_list,'r');
while($line = fgets($handle)) {
    $fp = fopen($results,'w');
    $ch = curl_init('http://www.google.com/'
        . 'search?q=' . urlencode($line));
    curl_setopt($ch,CURLOPT_FILE,$fp);
    curl_setopt($ch,CURLOPT_HEADER,0);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    unset($ch,$fp);
}
fclose($handle);
$dom = DOMDocument::loadHTML(file_get_contents($results));
echo print_r($dom,true); // EMPTY
$search_div = $dom->getElementById('search');
if(is_null($search_div)) { // ALWAYS NULL
    echo 'Search_div is null';
} else {
    echo print_r($search_div,true);
}
?>
I made some changes.
Instead of fopen, fgets and the other f* functions, I used file().
Instead of curl, I used simple_html_dom::load_file.
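<?php
include 'simple_html_dom.php'; // assumes the PHP Simple HTML DOM Parser file sits next to this script
$search_list = 'search_list.txt'; // file containing search terms
$result_list = 'results.txt'; // file that receives the results
// One search term per line; drop trailing newlines and blank lines.
$searching_list = file($search_list, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$html = new simple_html_dom();
foreach ($searching_list as $key => $searching_word) {
    $html->load_file('http://www.google.com/'.'search?q='.urlencode($searching_word));
    $search_div = $html->find("div[id='search']");
    echo $search_div[0]; // See content of the search div.
    file_put_contents($result_list, $search_div[0], FILE_APPEND); // append so earlier results are kept
}
?>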
You can use echo $search_div[0]; to see the result.
It shows you the entire content of the search div.
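One detail worth checking when the terms come from a file: file() keeps the trailing newline of every line unless FILE_IGNORE_NEW_LINES is passed, and that newline would then be URL-encoded into the query. Here is a minimal sketch of reading the terms cleanly. The helper name read_search_terms and the temporary file are my own assumptions for illustration, not part of the original script.

```php
<?php
// Hypothetical helper (not from the original post): read search terms, one per line.
function read_search_terms(string $path): array
{
    // FILE_IGNORE_NEW_LINES strips the line ending from each line;
    // FILE_SKIP_EMPTY_LINES drops lines that are empty after that.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    // trim() also removes stray whitespace such as a leftover "\r".
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

// Demo with a temporary file standing in for search_list.txt.
$tmp = tempnam(sys_get_temp_dir(), 'terms');
file_put_contents($tmp, "legos\nasd\r\n\n");
$terms = read_search_terms($tmp);
unlink($tmp);

foreach ($terms as $term) {
    echo urlencode($term), PHP_EOL;
}
```

With the terms cleaned this way, urlencode() only ever sees the search word itself.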
I searched for "asd" =)...
In my results, it starts like this:
<div id="search"><div id="ires"><ol><li class="g"><h3 class="r"><a href="/url?q=http://en.wikipedia.org/wiki/Atrial_septal_defect&sa=U&ei=5qhMUv36ILKX0AXxuYGYCQ&ved=0CBgQFjAA&usg=AFQjCNFo67q2pfiPWK5SDMKFTeu-QSfcxw"><b>Atrial septal defect</b> - Wikipedia, the free encyclopedia</a></h3><div class="s"><div class="kv" style="margin-bottom:2px"><cite>en.wikipedia.org/wiki/<b>Atrial_septal_defect</b></cite><span class="flc"> - <a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache:Ocu9slAHjr4J:http://en.wikipedia.org/wiki/Atrial_septal_defect%252Basd%26hl%3Den%26ct%3Dclnk&sa=U&ei=5qhMUv36ILKX0AXxuYGYCQ&ved=0CBkQIDAA&usg=AFQjCNEY245u_ERgmZd7-2vIk5RAIRbOeg">Cached</a> - <a href="/search?ie=UTF-8&q=related:en.wikipedia.org/wiki/Atrial_septal_defect+asd&tbo=1&sa=X&ei=5qhMUv36ILKX0AXxuYGYCQ&ved=0CBoQHzAA">Similar</a></span></div><span class="st"><b>Atrial septal defect</b> (<b>ASD</b>)
and ends like this:
</span><br></div></li><li class="g"><h3 class="r"><a href="/url?q=http://achievementschooldistrict.org/&sa=U&ei=5qhMUv36ILKX0AXxuYGYCQ&ved=0CEQQFjAJ&usg=AFQjCNHqINq_rlt8mbk2WmlATfpx-fyP8w"><b>Achievement School District</b></a></h3><div class="s"><div class="kv" style="margin-bottom:2px"><cite>achievementschooldistrict.org/</cite><span class="flc"> - <a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache:s8DoGxDbr4oJ:http://achievementschooldistrict.org/%252Basd%26hl%3Den%26ct%3Dclnk&sa=U&ei=5qhMUv36ILKX0AXxuYGYCQ&ved=0CEUQIDAJ&usg=AFQjCNEPhVqK33c7ruuXT7cwVe3-8JdUVA">Cached</a></span></div><span class="st"><b>Achievement School District</b> · The <b>ASD</b> · Driving Results · Campuses · Join Our <br> Team · Enroll A Student · <b>ASD</b> News · Contact Us <b>...</b></span><br></div></li></ol></div></div>
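Since the goal is a list of URLs, the target of each result can be pulled out of the href values shown above, where it sits in the q parameter of /url?q=.... Below is a rough sketch using the built-in DOMDocument and DOMXPath classes. The helper name extract_result_urls is hypothetical, the sample string is a shortened copy of the markup above, and Google's markup may well have changed since, so treat the XPath as an assumption. As far as I know, print_r() on a DOMDocument can look empty on older PHP versions even when parsing succeeded, so running a query is a better check than printing the object.

```php
<?php
// Hypothetical helper (not from the original post): collect the target URLs
// from result links of the form /url?q=<target>&sa=...
function extract_result_urls(string $html): array
{
    $dom = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html); // called on an instance, not statically
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $urls = [];
    // Only the headline link of each result: <h3 class="r"><a href="...">
    foreach ($xpath->query('//h3[@class="r"]/a[@href]') as $a) {
        $href = $a->getAttribute('href');
        $pos = strpos($href, '?');
        if ($pos === false) {
            continue; // no query string, nothing to extract
        }
        // parse_str() splits the query string and URL-decodes the values.
        parse_str(substr($href, $pos + 1), $params);
        if (isset($params['q']) && is_string($params['q'])) {
            $urls[] = $params['q'];
        }
    }
    return $urls;
}

// Shortened sample modeled on the markup above (tracking parameters trimmed).
$sample = '<div id="search"><ol>'
    . '<li class="g"><h3 class="r"><a href="/url?q=http://en.wikipedia.org/wiki/Atrial_septal_defect&amp;sa=U">Atrial septal defect</a></h3></li>'
    . '<li class="g"><h3 class="r"><a href="/url?q=http://achievementschooldistrict.org/&amp;sa=U">Achievement School District</a></h3></li>'
    . '</ol></div>';

$urls = extract_result_urls($sample);
print_r($urls);
```

The same function could be fed the saved results page instead of the sample string, for example via file_get_contents('results.txt').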
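Update
This part was adapted based on 巴特尔·巴克's comment.
If the first Google result for a term does not change, you can use this code to get the first result of each search.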
<?php
$search_list = 'search_list.txt'; // file containing search terms
$result_list = 'results.txt'; // file that receives the generated links
$order_language = "en";
$searching_list = file($search_list, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($searching_list as $key => $searching_word) {
    // btnI=1 requests the "I'm Feeling Lucky" redirect to the first result.
    $link = 'https://www.google.com.tr/search?hl='.$order_language.'&q='.urlencode($searching_word).'&btnI=1';
    echo $link;
    file_put_contents($result_list, $link.PHP_EOL, FILE_APPEND); // one link per line
}
?>
I searched for "asd" again =)...
Result:
https://www.google.com.tr/search?hl=en&q=asd&btnI=1
When I copy and paste this link into Chrome, it redirects me to the first result for the "asd" search:
http://www.asd-europe.org/
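To do the same thing from PHP instead of pasting each link into a browser, something like the following could work. It is only a sketch: build_lucky_url and resolve_final_url are names I made up, it assumes the cURL extension is available, and Google may answer automated requests with an interstitial or a block page rather than a redirect, so the resolved URL is not guaranteed to be the first result.

```php
<?php
// Hypothetical helpers (not from the original post).

// Build the "first result" link for one search term.
function build_lucky_url(string $term, string $lang = 'en'): string
{
    // http_build_query() URL-encodes the values; '&' is passed explicitly
    // so the separator does not depend on php.ini settings.
    return 'https://www.google.com.tr/search?' . http_build_query([
        'hl'   => $lang,
        'q'    => $term,
        'btnI' => 1,
    ], '', '&');
}

// Follow redirects and report where the request ended up.
// Returns null if the cURL extension is missing or the request fails.
function resolve_final_url(string $url): ?string
{
    if (!function_exists('curl_init')) {
        return null;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // stop after a few hops
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body off stdout
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $ok = curl_exec($ch);
    // The handle is released when $ch goes out of scope.
    return ($ok === false) ? null : curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

$lucky = build_lucky_url('asd');
echo $lucky, PHP_EOL;
```

Calling resolve_final_url($lucky) would then perform the actual request. I left it out of the demo because it needs network access and its result depends on how Google responds at that moment.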
I would be happy if this helps. Have a nice day.