I'm using Symfony2 and Goutte to scrape web data. I'm trying to simply log in to Facebook and return the page data from the logged-in page.
Here is my code:
<?php
namespace junk\scraperBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Goutte\Client;

class ThingController extends Controller
{
    public function somethingAction($something)
    {
        // make a request to an external site
        $client = new Client();
        $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYHOST, FALSE);
        $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYPEER, FALSE);
        $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_RETURNTRANSFER, TRUE);
        $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_FOLLOWLOCATION, TRUE);
        $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_COOKIESESSION, TRUE);
        $client->setHeader('User-Agent', "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36");
        $crawler = $client->request('GET', 'https://www.facebook.com');

        // select the form and fill in some values
        $form = $crawler->selectButton('Log In')->form();
        $form['email'] = 'email@junk.com';
        $form['pass'] = 'password';

        // submit that form
        $crawler = $client->submit($form);
        echo $crawler->html();

        return $this->render('scraperBundle:Thing:index.html.twig');
    }
} // END class ThingController
The problem is that I get this error:
Cookies Required
Cookies are not enabled on your browser. Please enable cookies in your browser preferences to continue.
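I think the problem is with my cURL configuration options. With only the CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER options set, I can successfully access other https pages, such as GitHub, but I can't figure out how to make this work for Facebook.
Any suggestions?
Thanks!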
You can try the following, which tells cURL to read cookies from a file and write them back to it:
public function somethingAction($something)
{
    $cookie_file = '/tmp/' . uniqid() . 'cookie';
    $client = new Client();
    // ...
    $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_COOKIEFILE, $cookie_file);
    $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_COOKIEJAR, $cookie_file);
    // ...
}
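To see what those two options do outside of Goutte, here is a minimal sketch using PHP's cURL extension directly. The helper name cookieJarOptions() and the example.com URL are hypothetical, chosen only for illustration, and tempnam() is swapped in for uniqid() so the cookie file is created up front. Whether Facebook accepts the login afterwards is not something this sketch can confirm.

```php
<?php
// Hypothetical helper (not part of Goutte or Symfony): returns cURL options
// that point both the cookie input and the cookie output at the same file.
function cookieJarOptions(string $cookieFile): array
{
    return [
        CURLOPT_COOKIEFILE     => $cookieFile, // load cookies from this file and enable cookie handling
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies to this file when the handle is released
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects, as in the question
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
    ];
}

// tempnam() creates a uniquely named empty file and returns its path.
$cookieFile = tempnam(sys_get_temp_dir(), 'cookie');
$options = cookieJarOptions($cookieFile);

// Placeholder URL (an assumption); no request is sent until curl_exec() is called.
$handle = curl_init('https://example.com/');
$applied = curl_setopt_array($handle, $options);

// $html = curl_exec($handle); // uncomment to actually perform the request
```

In the Goutte version above, the same two constants are passed through setDefaultOption() with the 'config/curl/' prefix, so every request made by that client shares one cookie file.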
Hope this helps.