I'm trying to build a crawler for Facebook groups, and when I run it to fetch the page content I hit a problem:
<?php
$url = "https://www.facebook.com/groups/theGroupId/";
$ch = curl_init($url); // create a cURL handle for $url; all options below are set on $ch
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // make curl_exec() return the page as a string instead of printing it
$curl_scraped_page = curl_exec($ch); // perform the request: the page body on success, false on failure
var_dump($curl_scraped_page);
if ($curl_scraped_page === false) {
die(curl_error($ch));
}
curl_close($ch);
echo $curl_scraped_page;
?>
I get the following error: "SSL certificate problem: unable to get local issuer certificate".
I went through this tutorial: http://unitstep.net/blog/2009/05/05/using-curl-in-php-to-access-https-ssltls-protected-sites/ which explains why this happens and gives two different ways to fix it. I tried both, but I still get the same error message:
<?php
$url = "https://www.facebook.com/groups/{theGroupId}/";
$ch = curl_init($url); // create a cURL handle for $url; all options below are set on $ch
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // make curl_exec() return the page as a string instead of printing it
// Method 1: skip certificate verification entirely (commented out)
//curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// Method 2: verify the server against a specific CA certificate file
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_CAINFO, getcwd() . "/CAcerts/GTECyberTrustGlobalRoot.crt");
$curl_scraped_page = curl_exec($ch); // perform the request: the page body on success, false on failure
var_dump($curl_scraped_page);
if ($curl_scraped_page === false) {
die(curl_error($ch));
}
curl_close($ch);
echo $curl_scraped_page;
?>
This is the exact output (using var_dump):
boolean false
SSL certificate problem: unable to get local issuer certificate
Am I doing something wrong? Is this the right way to do it?
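The error means cURL could not find the CA certificate that issued the server's certificate chain among the CA certificates it was given. When `CURLOPT_CAINFO` is set, the file it names is what cURL typically verifies against, so two things are worth checking first. The first is whether that path actually resolves: `getcwd()` is the process's current working directory, which is not necessarily the directory the script lives in. The second is what PHP falls back on when `CURLOPT_CAINFO` is not set (the `curl.cainfo` php.ini setting). What follows is a minimal diagnostic sketch. `describeCaSetup()` is a hypothetical helper name and is not part of PHP or cURL.

```php
<?php
// Hypothetical diagnostic helper (not part of PHP or cURL): collects the
// facts that decide which CA certificates cURL will load for a request.
function describeCaSetup(string $caFile): array
{
    return array(
        'ca_file'          => $caFile,
        // false here means cURL cannot load the file the options point to
        'ca_file_readable' => is_file($caFile) && is_readable($caFile),
        'cwd'              => getcwd(),   // current working directory
        'script_dir'       => __DIR__,    // directory this script lives in
        // php.ini default for CURLOPT_CAINFO: '' if unset,
        // false if the curl extension is not loaded
        'curl.cainfo'      => ini_get('curl.cainfo'),
        // TLS library cURL was built against, if the extension is loaded
        'ssl_version'      => function_exists('curl_version')
            ? curl_version()['ssl_version']
            : null,
    );
}

// Same path expression as in the second attempt above.
print_r(describeCaSetup(getcwd() . '/CAcerts/GTECyberTrustGlobalRoot.crt'));
```

If `ca_file_readable` is false, or `cwd` and `script_dir` differ, building the path from `__DIR__` instead of `getcwd()` is a reasonable first change. If the file loads but the error persists, the certificate in it may not be the root that the server's chain actually ends in. A single root certificate only helps when it is that specific root.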
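<?php
$url = "http://www.facebook.com/groups/4189052132/";
// $session: optional cookie string such as "name1=value1; name2=value2".
// It has to be passed in; variables from outside a PHP function are not visible inside it.
function curl($url, $session = '') {
    $options = array(
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow Location: redirects
        CURLOPT_AUTOREFERER    => true,   // set Referer: when following a redirect
        CURLOPT_CONNECTTIMEOUT => 120,    // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 120,    // seconds allowed for the whole request
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_USERAGENT      => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",
        CURLOPT_URL            => $url,
    );
    if ($session !== '') {
        $options[CURLOPT_COOKIE] = $session; // only send cookies when one was given
    }
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$scraped_page = curl($url);
echo $scraped_page;
?>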
There's no need to verify their certificate; that's why you're running into this problem.
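If you would rather keep verification on than skip it, one common alternative is to point `CURLOPT_CAINFO` at a full CA bundle instead of a single root certificate. An example is the Mozilla-derived `cacert.pem` that the curl project distributes. This way the request does not depend on guessing which root the site chains to. The sketch below assumes that such a bundle has been downloaded next to the script as `cacert.pem`. `verifiedCurlOptions()` and `fetchVerified()` are hypothetical names, not an existing API.

```php
<?php
// Hypothetical helper: cURL options that keep TLS verification ON.
// ASSUMPTION: $caBundle is a full CA bundle file (e.g. cacert.pem).
function verifiedCurlOptions(string $url, string $caBundle): array
{
    // Fail early with a clear message instead of a generic TLS error.
    if (!is_readable($caBundle)) {
        throw new RuntimeException("CA bundle not readable: $caBundle");
    }
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_SSL_VERIFYPEER => true,      // verify the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,         // verify the host name matches the certificate
        CURLOPT_CAINFO         => $caBundle, // trust the CAs in this bundle
    );
}

// Hypothetical fetch wrapper: returns the body or throws with cURL's error text.
function fetchVerified(string $url, string $caBundle)
{
    $ch = curl_init();
    curl_setopt_array($ch, verifiedCurlOptions($url, $caBundle));
    $body = curl_exec($ch);
    $error = ($body === false) ? curl_error($ch) : null;
    curl_close($ch);
    if ($error !== null) {
        throw new RuntimeException($error);
    }
    return $body;
}
```

Usage, with the same placeholder group ID as in the question: `echo fetchVerified("https://www.facebook.com/groups/theGroupId/", __DIR__ . "/cacert.pem");`. Using `__DIR__` rather than `getcwd()` ties the bundle path to the script's own directory, whichever directory the script is launched from.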