PHP: reading a file containing a list of web addresses and searching the source of those sites for a string

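I created this script to read ny2.txt, which contains a list of web URLs (currently only one line): http://campersbarn.com

I then want to loop over each line and fetch the source of that site. Finally, I check whether the text krgrpowered appears in the page.

This is from the error log: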

[08-May-2014 08:10:55 America/Denver] PHP Warning:  file_get_contents(): php_network_getaddresses: getaddrinfo failed: Name or service not known in /home4/millipg7/public_html/limitedtee/test/test.php on line 8
[08-May-2014 08:10:55 America/Denver] PHP Warning:  file_get_contents(http://campersbarn.com
): failed to open stream: php_network_getaddresses: getaddrinfo failed: Name or service not known in /home4/millipg7/public_html/limitedtee/test/test.php on line 8

If the content of ny2.txt is "http the url.com" or "http the url.com", PHP executes very quickly, but nothing happens...

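<?php
// Read the list of URLs, one per line.
$lines = file('ny2.txt');
$fh = fopen("result.txt", 'w');
foreach ($lines as $line_num => $url) {
    // Fetch the page source and record the URL if the marker text is present.
    $html = file_get_contents($url);
    if (strpos($html, 'krgrpowered') !== false) {
        fwrite($fh, $url . "\n");
    }
}
fclose($fh);
?>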

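file() keeps the trailing newline on every line it returns, so the newline becomes part of the URL. To remove it, you can use rtrim():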

foreach ($lines as $line_num => $url) {
    $url = rtrim($url);
    $html = file_get_contents($url);
    // ... rest of the loop body as before ...
}

However, I'd suggest simply using the FILE_IGNORE_NEW_LINES flag, so that file() doesn't include them in the first place:

$lines = file('ny2.txt', FILE_IGNORE_NEW_LINES);

The trailing newlines are rarely useful anyway.
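Putting the pieces together, the loop could be sketched roughly as follows. This is only an illustration under some assumptions: the function name urls_containing and the injectable $fetch parameter are hypothetical (they are not part of the original script), and they exist mainly so the matching logic can be tried without network access.

```php
<?php
// Hypothetical helper (not from the original post): returns the URLs whose
// fetched source contains $needle. $fetch is an assumed injection point;
// when omitted, file_get_contents() is used.
function urls_containing(array $urls, string $needle, ?callable $fetch = null): array
{
    // Default fetcher: suppress the warning and report failure as false.
    $fetch = $fetch ?? function (string $url) {
        return @file_get_contents($url);
    };

    $matches = [];
    foreach ($urls as $url) {
        $url = trim($url);       // defensive: also strips stray spaces and "\r"
        if ($url === '') {
            continue;            // skip blank lines
        }
        $html = $fetch($url);
        // Only search the source when the fetch actually succeeded.
        if ($html !== false && strpos($html, $needle) !== false) {
            $matches[] = $url;
        }
    }
    return $matches;
}

// Possible usage, mirroring the script in the question:
// $lines = file('ny2.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// $found = urls_containing($lines, 'krgrpowered');
// file_put_contents('result.txt', implode("\n", $found) . "\n");
```

Checking the fetch result against false before calling strpos() means an unreachable site is skipped instead of being searched as an empty string.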

Check a var_dump of the $url that is being produced. It could be something as simple as a space or newline. Try:

$html = file_get_contents(trim($url));

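Or, if you want to be really thorough (bear in mind that rawurlencode() encodes the whole string, including the "://", so it is really only suitable for individual path or query components rather than a complete URL):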

$html = file_get_contents(rawurlencode(trim($url)));

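I noticed that the URL in the error message (http://campersbarn.com) is followed by a small gap before the closing parenthesis, so that may well be the cause.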
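To see what var_dump reveals here, the following sketch simulates one line as file() would return it without FILE_IGNORE_NEW_LINES. The helper name clean_url is hypothetical, and using filter_var() with FILTER_VALIDATE_URL is an assumed alternative for checking the trimmed URL, not something proposed in the original answers.

```php
<?php
// Simulated line from file() without FILE_IGNORE_NEW_LINES: note the "\n".
$url = "http://campersbarn.com\n";

var_dump($url);        // reported length is 23: 22 visible characters plus the newline
var_dump(trim($url));  // reported length is 22 once the newline is stripped

// Hypothetical helper: trim the line and return it only if it looks like a URL.
function clean_url(string $line): ?string
{
    $candidate = trim($line);
    // filter_var() returns false when the value is not a valid URL.
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false ? $candidate : null;
}

var_dump(clean_url($url));          // the trimmed URL
var_dump(clean_url("not a url"));   // NULL
```

The length that var_dump prints next to the string is the quickest giveaway: if it is larger than the number of characters you can see, there is invisible whitespace in the value.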