Attempt to get HTML from website using PHP cURL does not work

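I'm trying to write a script that retrieves the HTML from my school's schedule search webpage. I can visit the page normally in a browser, but when I request it with cURL, I get the HTML of the page it redirects to instead. When I change the CURLOPT_FOLLOWLOCATION option from true to false, it only outputs a blank page with the headers that were sent.

For reference, my PHP code is: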

<?php
$curl_connection = curl_init('https://www.registrar.usf.edu/ssearch/');
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
$result = curl_exec($curl_connection);
print $result;
?>

The page I'm trying to fetch the HTML from with cURL is https://www.registrar.usf.edu/ssearch/ or https://www.registrar.usf.edu/ssearch/search.php

Any ideas?

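I added 2 more lines so that the cookie is now saved. When you try to fetch the schedule page, the site uses that cookie to decide whether or not to redirect you.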

$curl_connection = curl_init();
$url = "https://www.registrar.usf.edu/ssearch/search.php";
curl_setopt($curl_connection, CURLOPT_URL, $url);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookie.txt'); // file that received cookies are written to when the handle is closed
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookie.txt'); // file that cookies are read from and sent back to the site
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
$result = curl_exec($curl_connection);
echo $result;
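
Because CURLOPT_HEADER is set to true in the snippet above, $result holds the response headers (one set per redirect hop) followed by the HTML. If only the HTML is wanted, the two parts can be separated using the header size that cURL reports. The following is a minimal sketch, not part of the original answer: split_response is a hypothetical helper name, and the sample response string is made up so the example runs without a network connection.

```php
<?php
// Hypothetical helper: split a raw response captured with
// CURLOPT_HEADER => true into its header part and its body part.
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// With a real handle, the size comes from cURL after curl_exec():
//   $size  = curl_getinfo($curl_connection, CURLINFO_HEADER_SIZE);
//   $parts = split_response($result, $size);

// Offline demonstration on a made-up response with one redirect hop.
$html = "<html>schedule</html>";
$raw  = "HTTP/1.1 302 Found\r\nLocation: /ssearch/\r\n\r\n"
      . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
      . $html;
$parts = split_response($raw, strlen($raw) - strlen($html));
echo $parts['body'], "\n";
```

Alternatively, leaving CURLOPT_HEADER at its default of false makes curl_exec return only the body.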

Also, I haven't seen anyone pass the URL to curl_init.

Here is the cookie:

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
www.registrar.usf.edu   FALSE   /   FALSE   0   PHPSESSID   eied78t0v1qlqcop0rdk214361
www.registrar.usf.edu   FALSE   /ssearch/   FALSE   1336718465  cookie_test cookie_set
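
To see what the site actually stored, the cookie file can be read back in PHP. Each data line has seven fields: domain, subdomain flag, path, secure flag, expiry timestamp, name and value. The sketch below is an illustration only: parse_cookie_lines is a hypothetical helper, and it assumes whitespace-separated fields and cookie values that contain no spaces.

```php
<?php
// Hypothetical helper: parse lines of a Netscape-format cookie file.
// Assumption: fields are separated by whitespace and values have no spaces.
function parse_cookie_lines(array $lines): array
{
    $cookies = [];
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip blank lines and comment lines.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $fields = preg_split('/\s+/', $line);
        if (count($fields) !== 7) {
            continue; // not a well-formed cookie line
        }
        [$domain, $subdomains, $path, $secure, $expires, $name, $value] = $fields;
        $cookies[$name] = [
            'domain'  => $domain,
            'path'    => $path,
            'secure'  => $secure === 'TRUE',
            'expires' => (int) $expires, // 0 means a session cookie
            'value'   => $value,
        ];
    }
    return $cookies;
}

// The two cookie lines shown above, passed in directly.
$cookies = parse_cookie_lines([
    '# Netscape HTTP Cookie File',
    "www.registrar.usf.edu\tFALSE\t/\tFALSE\t0\tPHPSESSID\teied78t0v1qlqcop0rdk214361",
    "www.registrar.usf.edu\tFALSE\t/ssearch/\tFALSE\t1336718465\tcookie_test\tcookie_set",
]);
echo implode(', ', array_keys($cookies)), "\n";
```

With a file on disk, the input could come from file('cookie.txt', FILE_IGNORE_NEW_LINES).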

If you want to debug a cURL request that isn't working, start with var_dump(curl_getinfo($curl_connection)); the next thing to check is curl_error($curl_connection);
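
As a rough sketch of that debugging step, the values from curl_getinfo and curl_error can be condensed into one line that shows the final status code, the final URL and how many redirects were followed. summarize_transfer is a hypothetical helper, and the demonstration values are made up so the example runs offline. The array keys http_code, url and redirect_count are the ones curl_getinfo returns.

```php
<?php
// Hypothetical helper: condense the curl_getinfo() array and the
// curl_error() string into a single diagnostic line.
function summarize_transfer(array $info, string $error): string
{
    // A non-empty error string means the transfer itself failed.
    if ($error !== '') {
        return "cURL error: $error";
    }
    return sprintf(
        'HTTP %d from %s after %d redirect(s)',
        $info['http_code'] ?? 0,
        $info['url'] ?? '(unknown)',
        $info['redirect_count'] ?? 0
    );
}

// With a real handle, call this after curl_exec():
//   echo summarize_transfer(curl_getinfo($curl_connection), curl_error($curl_connection));

// Offline demonstration with made-up values.
$demo = summarize_transfer(
    ['http_code' => 200, 'url' => 'https://www.registrar.usf.edu/ssearch/', 'redirect_count' => 1],
    ''
);
echo $demo, "\n";
```

A redirect count above zero together with an unexpected final URL suggests the server redirected the request, which is the symptom described in the question.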