I'm trying to download the contents of a robots.txt file.
Link to my original question: PHP file_exists() for URL/robots.txt returns false
This is line 22: $f = fopen($file, 'r');
I get this error:
PHP Error[2]: fopen(http://www1.macys.com/robots.txt): failed to open stream: Redirection limit reached, aborting
in file /host/chapache/host/apache/www/home/flaviuspogacian/proiecte/Mickey_ClosetAffair_Discovery/webroot/protected/modules/crawler/components/Robots.php at line 22
#0 /host/chapache/host/apache/www/home/flaviuspogacian/proiecte/Mickey_ClosetAffair_Discovery/webroot/protected/modules/crawler/components/Robots.php(22): fopen()
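The message "Redirection limit reached, aborting" comes from PHP's HTTP stream wrapper: it kept following Location headers until it hit its max_redirects limit (20 by default). One possible cause is a redirect loop, for example a server that keeps redirecting because it expects a cookie the wrapper never sends; that is an assumption about this site, not something the error itself confirms. A minimal diagnostic sketch, assuming network access, disables redirect following with the follow_location context option so you can see the first hop. fetch_headers_no_redirect() and location_header() are hypothetical helper names, not part of the original code.

```php
<?php
// Diagnostic sketch: request a URL once WITHOUT following redirects so the
// first hop is visible. Uses the HTTP wrapper context options
// follow_location, ignore_errors and timeout.

/**
 * Returns the raw response headers of a single request (redirects not followed).
 * Hypothetical helper; needs network access when called.
 */
function fetch_headers_no_redirect(string $url): array
{
    $context = stream_context_create([
        'http' => [
            'follow_location' => 0,    // do not follow Location headers
            'ignore_errors'   => true, // still return headers for 3xx/4xx/5xx
            'timeout'         => 10,   // seconds
        ],
    ]);
    // The HTTP wrapper fills $http_response_header in this scope.
    @file_get_contents($url, false, $context);
    // Undefined if no response arrived at all (e.g. DNS failure).
    return $http_response_header ?? [];
}

/**
 * Returns the value of the first Location header in a raw header list, or null.
 * Hypothetical helper; pure function, no network.
 */
function location_header(array $headers): ?string
{
    foreach ($headers as $header) {
        // Header names are case-insensitive, so compare with stripos().
        if (stripos($header, 'Location:') === 0) {
            return trim(substr($header, strlen('Location:')));
        }
    }
    return null;
}
```

Calling fetch_headers_no_redirect() again on each Location value shows whether the chain ends or loops back on itself. The same stream_context_create() options can also be passed to fopen() as its fourth argument.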
This is the code, where $website_id is a number and $website is something like http://www.domain.com/
public function read_website_save_2_db($website_id, $website) {
    $slashes = 0;
    for ($i = 0; $i < strlen($website); $i++)
        if ($website[$i] == '/')
            $slashes++;
    if ($slashes == 2)
        $file = $website . '/robots.txt';
    else
        $file = $website . 'robots.txt';
    echo $website_id . ' ' . $file . PHP_EOL;
    try {
        $f = fopen($file, 'r');
        if (($f) || (strpos(get_headers($file, 1), "404") !== FALSE)) {
            fclose($f);
            echo 'exists' . PHP_EOL;
            $curl_tool = new CurlTool();
            $content = $curl_tool->downloadFile($file, ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt');
            //if the file exists on local disk, delete it
            if (file_exists(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt'))
                unlink(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt');
            echo ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt', $content . PHP_EOL;
            file_put_contents(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt', $content);
        }
        else {
            echo 'maybe it\'s not there' . PHP_EOL;
        }
    } catch (Exception $e) {
        echo 'EXCEPTION ' . $e . PHP_EOL;
    }
}
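The slash counting in this function only distinguishes a bare host with or without a trailing slash; a URL with a path such as http://www.domain.com/shop would become http://www.domain.com/shoprobots.txt. Under the Robots Exclusion Protocol (RFC 9309), robots.txt lives at /robots.txt on the root of the host, so a sketch using parse_url() can build it from the scheme, host and optional port. robots_url() is a hypothetical helper name, and defaulting a missing scheme to http is an assumption.

```php
<?php
// Sketch: build the robots.txt URL for any site URL with parse_url()
// instead of counting slashes. robots.txt sits at the root of the host,
// so any path, query or fragment in the input is discarded.

/**
 * Returns scheme://host[:port]/robots.txt, or null if $website has no host.
 * Hypothetical helper.
 */
function robots_url(string $website): ?string
{
    $parts = parse_url($website);
    if ($parts === false || !isset($parts['host'])) {
        return null; // not an absolute URL we can use
    }
    $scheme = $parts['scheme'] ?? 'http'; // assumption: default to http
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $scheme . '://' . $parts['host'] . $port . '/robots.txt';
}
```

For example, robots_url('http://www.domain.com/') and robots_url('http://www.domain.com') both give http://www.domain.com/robots.txt.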
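Some parts of your code look confusing. I would do something like this (though of course without the echo calls inside the function; they are only there as an example):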
public function read_website_save_2_db($website_id, $website) {
    $url = rtrim($website, '/') . '/robots.txt';
    $content = @file_get_contents($url);
    $status = 0;
    $success = false;
    if (!empty($http_response_header)) {
        // When redirects are followed, $http_response_header holds the headers
        // of every response in the chain, so keep the LAST status line (no break).
        foreach ($http_response_header as $header) {
            if (substr($header, 0, 6) == 'HTTP/1') {
                $status = trim(substr($header, strpos($header, ' ')));
                $success = strnatcasecmp($status, '200 OK') === 0;
            }
        }
    }
    if (!$success) {
        echo 'Request failed with status ' . $status;
    }
    elseif (!$content) {
        echo 'Website responded with empty robots.txt';
    }
    else {
        file_put_contents(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt', $content);
        echo 'Wii, we have downloaded a copy of ' . $url;
    }
}
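The status check above compares the text of a status line with '200 OK', which depends on the reason phrase the server chooses to send. When redirects are followed, $http_response_header contains the headers of every response in the chain, each starting with its own status line, so the one that matters is the last. A sketch under those assumptions returns the numeric code of the last status line instead; final_status_code() is a hypothetical name.

```php
<?php
// Sketch: numeric status code of the FINAL response in a header list such
// as $http_response_header. Each response in a redirect chain starts with
// its own "HTTP/x.y NNN ..." line, so the last match wins.

/**
 * Returns the last status code found, or 0 if there is no status line.
 * Hypothetical helper.
 */
function final_status_code(array $headers): int
{
    $code = 0; // 0 = no status line seen (e.g. connection failed)
    foreach ($headers as $header) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $header, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}
```

For example, after $content = @file_get_contents($url); the check could become $success = final_status_code($http_response_header ?? []) === 200; which no longer depends on the wording of the reason phrase.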