Using cURL to download a site's HTML source, but getting a different file than intended

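I am trying to download the HTML source of this page (as it appears in the browser) using cURL and PHP. But instead of the actual source, this is returned (a meta refresh link set to 0):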

<html>
    <head><title>Object moved</title></head>
    <body>
        <h2>Object moved to <a href="https://login.live.com/login.srf?wa=wsignin1.0&amp;rpsnv=11&amp;checkda=1&amp;ct=1321044850&amp;rver=6.1.6195.0&amp;wp=MBI&amp;wreply=http:%2F%2Fwww.windowsphone.com%2Fen-US%2Fapps%2Fea39f002-ac30-e011-854c-00237de2db9e&amp;lc=1033&amp;id=268289">here</a>.
        </h2>
    </body>
</html>

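I tried spoofing the Referer header so that it points to the site itself, but it seems I am doing it wrong. The code is below. Any suggestions? Thanks.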

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.windowsphone.com/en-US/apps/ea39f002-ac30-e011-854c-00237de2db9e');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6'); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_AUTOREFERER, false);
curl_setopt($ch, CURLOPT_REFERER, "http://www.windowsphone.com/en-US/apps/ea39f002-ac30-e011-854c-00237de2db9e");
$html = curl_exec($ch);
curl_close($ch);

Add the cURL option to follow redirects:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
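
To check whether the redirect was actually followed, it can help to look at the final status code and effective URL after the request. A minimal sketch, assuming the curl extension is loaded; the helper names `redirect_options` and `fetch_following_redirects` and the limit of 10 redirects are my own illustrative choices, not something from the question:

```php
<?php
// Hypothetical helper: options for a request that follows HTTP redirects.
function redirect_options(string $url, int $maxRedirects = 10): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // guard against redirect loops
    ];
}

// Hypothetical helper: performs the request and reports where it ended up.
function fetch_following_redirects(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, redirect_options($url));
    $body = curl_exec($ch);
    $result = [
        'body'      => $body,                                     // false on failure
        'error'     => curl_error($ch),                           // empty string if none
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),     // status of the last response
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), // URL after redirects
    ];
    curl_close($ch);
    return $result;
}
```

If `final_url` still points at login.live.com, the redirect was followed but the site sent you to the login page, which suggests cookies are the missing piece rather than the redirect itself.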

If it is a meta refresh rather than an HTTP redirect (a Location header), see: PHP: Can CURL follow meta redirects
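
cURL does not interpret HTML, so a meta refresh has to be pulled out of the body and requested separately. A rough sketch of the extraction step, assuming the DOM extension is available; `meta_refresh_url` is a hypothetical helper name, and the pattern only covers the common `N; url=...` form of the `content` attribute:

```php
<?php
// Hypothetical helper: returns the target URL of a
// <meta http-equiv="refresh"> tag, or null if there is none.
function meta_refresh_url(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML rejects empty input
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from malformed real-world HTML
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strcasecmp($meta->getAttribute('http-equiv'), 'refresh') !== 0) {
            continue;
        }
        // The content attribute looks like "0; url=http://example.com/next".
        $content = $meta->getAttribute('content');
        if (preg_match('/^\s*\d+\s*[;,]\s*url\s*=\s*["\']?([^"\']+)/i', $content, $m)) {
            return trim($m[1]);
        }
    }
    return null;
}
```

The returned URL can then be passed to a second cURL request. A relative URL would still need to be resolved against the page's address, which this sketch does not do.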

As flesk mentioned, you may also need to store cookies:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.windowsphone.com/en-US/apps/ea39f002-ac30-e011-854c-00237de2db9e');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6');
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_REFERER, "http://www.windowsphone.com");
$html = curl_exec($ch);
curl_close($ch);
echo $html;

The problem is not the referer; you need to enable cookies for this to work.

Try something like this:

curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");

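You have to request the page twice. The first request should follow the redirect so that you pick up the cookies from login.live.com; the second request then sends those cookies.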
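
The two requests could be sketched roughly like this. It assumes the curl extension is loaded and that redirects should be followed on both passes; `cookie_options` and `fetch_twice` are hypothetical helper names, and whether two passes are enough depends on how the site behaves:

```php
<?php
// Hypothetical helper: options shared by both requests.
function cookie_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect to the login host and back
        CURLOPT_MAXREDIRS      => 10,          // illustrative limit
        CURLOPT_COOKIEFILE     => $cookieFile, // enables the cookie engine and reads saved cookies
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is released
    ];
}

// Hypothetical helper: requests the same URL twice on one handle, so cookies
// received during the first pass are sent on the second.
function fetch_twice(string $url, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, cookie_options($url, $cookieFile));

    curl_exec($ch);         // first pass: collect cookies along the redirect chain
    $html = curl_exec($ch); // second pass: same URL, now with cookies attached

    curl_close($ch);
    return $html;           // page body as a string, or false on failure
}
```

Reusing the same handle keeps the cookies in memory between the two calls, so the second request does not depend on the jar file having been written yet.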