HTTPS link fetch issue

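For the past few days I have been trying to fetch a page from a website, without success. I keep getting a 301 response. Can anyone help me grab the content of this page: https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET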

Looking forward to your reply.

Edit: this is the PHP function I am using:

function http_request(
    $verb = 'GET',             /* HTTP Request Method (GET and POST supported) */
    $ip,                       /* Target IP/Hostname */
    $port = 80,                /* Target TCP port */
    $uri = '/',                /* Target URI */
    $getdata = array(),        /* HTTP GET Data ie. array('var1' => 'val1', 'var2' => 'val2') */
    $postdata = array(),       /* HTTP POST Data ie. array('var1' => 'val1', 'var2' => 'val2') */
    $cookie = array(),         /* HTTP Cookie Data ie. array('var1' => 'val1', 'var2' => 'val2') */
    $custom_headers = array(), /* Custom HTTP headers ie. array('Referer: http://localhost/ */
    $timeout = 1000,           /* Socket timeout in milliseconds */
    $req_hdr = false,          /* Include HTTP request headers */
    $res_hdr = false           /* Include HTTP response headers */
    )
{
    $ret = '';
    $verb = strtoupper($verb);
    $cookie_str = '';
    $getdata_str = count($getdata) ? '?' : '';
    $postdata_str = '';
    foreach ($getdata as $k => $v)
        $getdata_str .= urlencode($k) .'='. urlencode($v) .'&';
    $getdata_str = rtrim($getdata_str, '&'); /* drop the trailing separator */
    foreach ($postdata as $k => $v)
        $postdata_str .= urlencode($k) .'='. urlencode($v) .'&';
    foreach ($cookie as $k => $v)
        $cookie_str .= urlencode($k) .'='. urlencode($v) .'; ';
    $crlf = "\r\n";
    $req = $verb .' '. $uri . $getdata_str .' HTTP/1.1' . $crlf;
    $req .= 'Host: '. $ip . $crlf;
    $req .= 'User-Agent: Mozilla/5.0 Firefox/3.6.12' . $crlf;
    $req .= 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' . $crlf;
    $req .= 'Accept-Language: en-us,en;q=0.5' . $crlf;
    $req .= 'Accept-Encoding: deflate' . $crlf;
    $req .= 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' . $crlf;
    foreach ($custom_headers as $k => $v)
        $req .= $k .': '. $v . $crlf;
    if (!empty($cookie_str))
        $req .= 'Cookie: '. substr($cookie_str, 0, -2) . $crlf;
    if ($verb == 'POST' && !empty($postdata_str)){
        $postdata_str = substr($postdata_str, 0, -1);
        $req .= 'Content-Type: application/x-www-form-urlencoded' . $crlf;
        $req .= 'Content-Length: '. strlen($postdata_str) . $crlf . $crlf;
        $req .= $postdata_str;
    }   
    else $req .= $crlf;
    if ($req_hdr)
        $ret .= $req;
    if (($fp = @fsockopen($ip, $port, $errno, $errstr)) == false)
        return "Error $errno: $errstr\n";
    stream_set_timeout($fp, 0, $timeout * 1000);
    fputs($fp, $req);
    while ($line = fgets($fp)) $ret .= $line;
    fclose($fp);
    if (!$res_hdr)
        $ret = substr($ret, strpos($ret, "\r\n\r\n") + 4);
    return $ret;
}

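First, a 301 is not an "error"; it indicates that you are being redirected. You need to parse the response headers, take the value of the Location: header (which the HTTP specification expects a redirect response to carry) and request that URI.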
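To illustrate the header parsing described above, here is a minimal sketch. The helper name extract_redirect_location() is made up for this example and is not part of the function in the question; it assumes the raw response still contains its headers (that is, http_request() was called with $res_hdr set to true) and that the sample response is representative.

```php
<?php
// Hypothetical helper: return the redirect target from a raw HTTP response,
// or null if the response is not a redirect or has no Location header.
function extract_redirect_location(string $rawResponse): ?string
{
    // The header block ends at the first blank line.
    $headerEnd = strpos($rawResponse, "\r\n\r\n");
    $headerBlock = $headerEnd === false
        ? $rawResponse
        : substr($rawResponse, 0, $headerEnd);
    $lines = explode("\r\n", $headerBlock);

    // The status line looks like "HTTP/1.1 301 Moved Permanently".
    if (!preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $lines[0], $m)) {
        return null;
    }
    if (!in_array((int) $m[1], [301, 302, 303, 307, 308], true)) {
        return null;
    }

    // Header names are case-insensitive, so match "Location:" loosely.
    foreach (array_slice($lines, 1) as $line) {
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Made-up sample response, for illustration only.
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: https://example.org/search.php?search=foo\r\n"
        . "Content-Length: 0\r\n\r\n";
var_dump(extract_redirect_location($sample));
```

The returned value may be a relative URI on some servers, so a real client would probably need to resolve it against the original request URL before following it.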

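Second, the function above does not appear to support HTTPS URLs. Your PHP instance needs the OpenSSL extension installed for that, and you also have to invoke it somehow. You can use the function above by prefixing the address in the $ip parameter with ssl:// or tls:// (and setting $port to 443, the default HTTPS port), but you cannot simply pass a bare IP.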
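As a rough sketch of the ssl:// approach, the hypothetical helper below (build_socket_target() is a name invented here) splits a URL into the pieces that fsockopen() and the request line would need. It assumes default ports of 443 for https and 80 for http when the URL does not specify one.

```php
<?php
// Hypothetical helper: derive the fsockopen() target, Host header value,
// port and request URI from a URL. Returns null for unusable input.
function build_socket_target(string $url): ?array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    $isHttps = strtolower($parts['scheme']) === 'https';

    // Rebuild the path plus query string for the request line.
    $uri = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $uri .= '?' . $parts['query'];
    }

    return [
        // fsockopen() accepts an ssl:// prefix when OpenSSL is available.
        'transport' => ($isHttps ? 'ssl://' : '') . $parts['host'],
        // The Host header must carry the bare hostname, without the prefix.
        'host'      => $parts['host'],
        'port'      => $parts['port'] ?? ($isHttps ? 443 : 80),
        'uri'       => $uri,
    ];
}

print_r(build_socket_target(
    'https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET'
));
```

Note that http_request() as posted builds its Host: header from $ip, so passing an ssl:// prefixed value straight through would put the prefix into that header as well. Keeping the transport string and the hostname separate, as in this sketch, is one way around that.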

Third, the usual way to do this kind of thing is with the cURL extension. You could do it like this:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET'); // Set the URL
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Get the result from the execution
if (($result = curl_exec($ch)) === FALSE) { // Execute the request
  echo "cURL failed! Error: ".curl_error($ch);
} else {
  echo "Success! Result: $result";
}
curl_close($ch);

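Alternatively, if cURL is not available or you don't want to use it for some reason, you could use my HTTPRequest class, which is PHP4-compatible and requires no extensions (apart from OpenSSL for HTTPS requests). It is documented (roughly) in the comments at the top of the script. You could do something like this: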

$request = new httprequest(); // Create an object
// Set the request URL
if (!$request->setRequestURL('https://pre.corrupt-net.org/search.php?search=Lasse_Stefanz-Bara_Du-SE-CD-FLAC-1995-LoKET')) echo "Failed! Error: ".$request->getLastErrorStr()."<br>\r\n";
// Send the request
if (!$request->sendRequest()) echo "Failed! Error: ".$request->getLastErrorStr()."<br>\r\n";
echo "Success! Result: ".$request->getResponseBodyData(TRUE);

Incidentally, many Scene PreDB managers/providers are not keen on automated scraping, and you may get banned...