PHP cURL failed to load response data

I'm trying to scrape data with PHP, but the URL I need to access requires POST data.

<?php 
//set POST variables
$url = 'https://www.ncaa.org/';
//$url = 'https://web3.ncaa.org/hsportal/exec/hsAction?hsActionSubmit=searchHighSchool';
// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';
// Initialize cURL
$curl = curl_init();
// Set the options
curl_setopt($curl,CURLOPT_URL, $url);
// This sets the number of fields to post
curl_setopt($curl,CURLOPT_POST, sizeof($data_to_post));
// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);
//execute the post
$result = curl_exec($curl);
//close the connection
curl_close($curl);
?>

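When I try to access the second $url, where the actual information is hosted, it returns "Failed to load response data", but it lets me access the NCAA home page. Is there a reason I get "Failed to load response data" even though I'm sending the correct form data?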

The site apparently checks for a recognizable user agent, and by default PHP cURL does not send a User-Agent header. Adding this line sets one:

curl_setopt($curl, CURLOPT_USERAGENT, 'curl/7.21.4');

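With that line, the script returns a response. In this case, however, the response says the site requires a newer browser than the one you have, so you should copy the user agent string from a real browser, for example: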

curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');

The site also requires the parameters to be sent as application/x-www-form-urlencoded. When you pass an array to CURLOPT_POSTFIELDS, cURL sends it as multipart/form-data. So change that line to:

curl_setopt($curl,CURLOPT_POSTFIELDS, http_build_query($data_to_post));

This converts the array to a URL-encoded string.
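
As a quick check of what actually goes on the wire, the following minimal sketch prints the string that http_build_query produces for the same fields as the script above. The explicit '&' separator is an addition here, so that the result does not depend on the arg_separator.output ini setting.

```php
<?php
// Same field names and values as in the question's script.
$data_to_post = array(
    'hsCode'         => '332680',
    'state'          => '',
    'city'           => '',
    'name'           => '',
    'hsActionSubmit' => 'Search',
);

// Passing '&' explicitly keeps the output independent of php.ini.
$encoded = http_build_query($data_to_post, '', '&');

// Empty strings are kept as "key=" pairs; the string form is what makes
// cURL send application/x-www-form-urlencoded instead of multipart/form-data.
echo $encoded, PHP_EOL;
```

Passing $encoded (a string) rather than $data_to_post (an array) to CURLOPT_POSTFIELDS is the only change needed for the content type.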

In the URL, leave out ?hsActionSubmit=searchHighSchool, since that parameter is sent in the POST fields.
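
If the URL is copied from the browser with its query string attached, one way to drop it is sketched below. strip_query is a hypothetical helper name, not part of PHP or cURL.

```php
<?php
// Hypothetical helper: returns the URL without its query string.
function strip_query(string $url): string
{
    // Split at the first '?' only and keep the part before it.
    return explode('?', $url, 2)[0];
}

$url = strip_query('https://web3.ncaa.org/hsportal/exec/hsAction?hsActionSubmit=searchHighSchool');
echo $url, PHP_EOL;
```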

The final working script looks like this:

<?php
//set POST variables
//$url = 'https://www.ncaa.org/';
$url = 'https://web3.ncaa.org/hsportal/exec/hsAction';
// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';
// Initialize cURL
$curl = curl_init();
// Set the options
curl_setopt($curl,CURLOPT_URL, $url);
// Send the request as a POST (CURLOPT_POST is a boolean flag, not a field count)
curl_setopt($curl, CURLOPT_POST, true);
// The fields to post, as a URL-encoded string
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($data_to_post));
// Send a real browser's User-Agent string
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');
//execute the post
$result = curl_exec($curl);
//close the connection
curl_close($curl);
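
The script above prints the response directly, because CURLOPT_RETURNTRANSFER is not set, and it gives no hint when the transfer itself fails. The sketch below is one possible way to wrap the same options so that the body is returned as a string and a transport error is reported. build_post_options and post_form are hypothetical helper names introduced here for illustration; the cURL functions and constants are the standard ones from PHP's curl extension.

```php
<?php
// Hypothetical helper: builds the option array for a form-encoded POST.
function build_post_options(string $url, array $fields, string $userAgent): array
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true, // boolean flag, not a field count
        CURLOPT_POSTFIELDS     => http_build_query($fields, '', '&'),
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
    );
}

// Hypothetical helper: sends the request and throws on a transport error.
function post_form(string $url, array $fields, string $userAgent): string
{
    $curl = curl_init();
    curl_setopt_array($curl, build_post_options($url, $fields, $userAgent));

    $body = curl_exec($curl);
    if ($body === false) {
        // curl_error() describes why the transfer failed (DNS, TLS, timeout, ...).
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }

    // The handle is released when $curl goes out of scope.
    return $body;
}
```

Calling post_form($url, $data_to_post, $userAgent) with the values from the script above would then return the page body, or throw with cURL's own error message.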

cURL HTTPS connections may need one specific option turned off: CURLOPT_SSL_VERIFYPEER. Note that this disables certificate verification.

// Initialize cURL
$curl = curl_init();
// Set the options
curl_setopt($curl,CURLOPT_URL, $url);
// ** This option MUST BE FALSE **
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
// Send the request as a POST (CURLOPT_POST is a boolean flag, not a field count)
curl_setopt($curl, CURLOPT_POST, true);
// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);
//execute the post
$result = curl_exec($curl);
//close the connection
curl_close($curl);