PHP Error[2]: fopen(http://www1.macys.com/robots.txt)

I am trying to download the contents of a robots.txt file.

Link to my original question: PHP file_exists() for URL/robots.txt returns false

This is line 22: $f = fopen($file, 'r');

I get this error:

PHP Error[2]: fopen(http://www1.macys.com/robots.txt): failed to open stream: Redirection limit reached, aborting
    in file /host/chapache/host/apache/www/home/flaviuspogacian/proiecte/Mickey_ClosetAffair_Discovery/webroot/protected/modules/crawler/components/Robots.php at line 22
#0 /host/chapache/host/apache/www/home/flaviuspogacian/proiecte/Mickey_ClosetAffair_Discovery/webroot/protected/modules/crawler/components/Robots.php(22): fopen()

For this code, where $website_id is a number and $website looks like http://www.domain.com/:

public function read_website_save_2_db($website_id, $website) {
    $slashes = 0;
    for ($i = 0; $i < strlen($website); $i++)
        if ($website[$i] == '/')
            $slashes++;
    if ($slashes == 2)
        $file = $website . '/robots.txt';
    else
        $file = $website . 'robots.txt';
    echo $website_id . ' ' . $file . PHP_EOL;
    try {
        $f = fopen($file, 'r');
        if (($f) || (strpos(get_headers($file, 1), "404") !== FALSE)) {
            fclose($f);
            echo 'exists' . PHP_EOL;
            $curl_tool = new CurlTool();
            $content = $curl_tool->downloadFile($file, ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt');
            //if the file exists on local disk, delete it
            if (file_exists(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt'))
                unlink(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt');
            echo ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt', $content . PHP_EOL;
            file_put_contents(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt', $content);
        } else {
            echo 'maybe it\'s not there' . PHP_EOL;
        }
    } catch (Exception $e) {
        echo 'EXCEPTION ' . $e . PHP_EOL;
    }
}
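One detail worth pointing out in the code above: get_headers() returns an ARRAY of header lines, so passing its result straight to strpos() (as in strpos(get_headers($file, 1), "404")) cannot work. The status line is element 0 of that array. A minimal sketch of checking it, using a simulated response so no network is involved:

```php
<?php
// get_headers() returns an array of header lines; element 0 is the status
// line, e.g. "HTTP/1.1 404 Not Found". This hypothetical helper inspects
// that array the way the original strpos() call intended to.
function robots_txt_missing(array $headers): bool {
    // $headers has the same shape as the array get_headers() returns
    return strpos($headers[0], '404') !== false;
}

// Example with simulated get_headers() output (no network request made):
var_dump(robots_txt_missing(['HTTP/1.1 404 Not Found', 'Content-Type: text/html']));  // bool(true)
var_dump(robots_txt_missing(['HTTP/1.1 200 OK', 'Content-Type: text/plain']));        // bool(false)
```

In the real code you would call get_headers($file) and pass its return value in, guarding against get_headers() returning false on failure.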

Some parts of your code look confusing. I would do something like this (though of course without the echoes inside the function; they are only there as an example):

public function read_website_save_2_db($website_id, $website) {
  $url = rtrim($website, '/') . '/robots.txt';
  $content = @file_get_contents($url);
  $status = 0;
  $success = false;
  if( !empty($http_response_header) ) {
    foreach($http_response_header as $header) {
      if(strpos($header, 'HTTP/') === 0) {
        $status = trim(substr($header, strpos($header, ' ')));
        $success = strnatcasecmp($status, '200 OK') === 0;
        break;
      }
    }
  }
  if(!$success) {
    echo 'Request failed with status '.$status;
  }
  elseif(!$content) {
    echo 'Website responded with empty robots.txt';
  }
  else {
    file_put_contents(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt', $content);
    echo 'Wii, we have downloaded a copy of '.$url;
  }
}
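As for the original "Redirection limit reached" error: that often happens when a site keeps redirecting requests that arrive without a browser-like User-Agent header, which is what file_get_contents() and fopen() send by default. A stream context lets you set the User-Agent and cap redirects; the UA string, redirect limit, and timeout below are example values, not anything specific to macys.com:

```php
<?php
// Sketch: give file_get_contents() a stream context so the request carries
// a User-Agent header and a redirect cap. Some servers redirect-loop on
// PHP's default (absent) User-Agent, producing "Redirection limit reached".
$context = stream_context_create([
    'http' => [
        'user_agent'    => 'Mozilla/5.0 (compatible; ExampleCrawler/1.0)', // example UA
        'max_redirects' => 5,   // stop after 5 redirects instead of erroring out late
        'timeout'       => 10,  // seconds
    ],
]);

// @ suppresses the warning on failure; $content is false if the fetch fails
$content = @file_get_contents('http://www1.macys.com/robots.txt', false, $context);
```

You would then feed $content into the same status-check logic as above; $http_response_header is populated for context-based requests exactly as it is for plain file_get_contents() calls.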