Collecting sites with 404, 403 codes

Keywords: code, sites, contains | Updated: 2023-09-27

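I have a directory of websites and I want to exclude the ones that return a 404 or 403 code (such sites have nothing of interest to show users). But PHP's file_get_contents or the curl functions, even with request headers set, sometimes report a 404 or 403 although I can see a normal page in the browser. What can I use to collect the correct codes, so I can be sure that a site really has no content?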

Try this function:

 <?php
    // Returns true when $url answers with a 2xx or 3xx status code.
    function Visit($url){
           // Browser-like User-Agent; some servers reject requests that lack one.
           $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
           $ch = curl_init();
           curl_setopt($ch, CURLOPT_URL, $url);
           curl_setopt($ch, CURLOPT_USERAGENT, $agent);
           curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
           curl_setopt($ch, CURLOPT_VERBOSE, false);
           curl_setopt($ch, CURLOPT_TIMEOUT, 5);
           // Certificate checks are disabled because only the status code matters here.
           curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
           curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
           // No CURLOPT_SSLVERSION: SSLv3 is disabled on current servers,
           // so let libcurl negotiate the TLS version.
           $page = curl_exec($ch);
           //echo curl_error($ch);
           $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
           curl_close($ch);
           // 2xx = success, 3xx = redirect; 0 means no HTTP response was received.
           return $httpcode >= 200 && $httpcode < 400;
    }
    if (Visit("http://www.google.com"))
           echo "Website OK"."\n";
    else
           echo "Website DOWN";
    ?>

Edited according to the W3 definition of status codes.
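
The function above only answers yes or no, while the question asks for the status code itself so that 404/403 sites can be dropped from the directory. Here is a minimal sketch of that idea. The names `fetchStatus` and `classifyStatus`, the User-Agent string, the extra headers, and the three labels are all assumptions made for illustration, not part of the original answer. It follows redirects so the code belongs to the final page, and it treats anything that is neither a clear success nor a 403/404 as "check again later" instead of excluding the site right away.

```php
<?php
// Hypothetical helper: return the HTTP status of the final page for $url,
// or 0 when no HTTP response arrived (DNS failure, timeout, TLS error).
function fetchStatus(string $url): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,  // report the status after redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_ENCODING       => '',    // accept every encoding libcurl supports
        // Example browser-like values (assumptions); adjust as needed.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
    ]);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Hypothetical policy: map a status code to what the directory should do.
function classifyStatus(int $code): string
{
    if ($code >= 200 && $code < 400) {
        return 'ok';       // success, or a redirect that was not followed further
    }
    if ($code === 403 || $code === 404) {
        return 'exclude';  // the codes the question wants to filter out
    }
    return 'retry';        // 0, 5xx, 429, ...: not conclusive, check again later
}
```

Usage would be something like `classifyStatus(fetchStatus("http://www.google.com"))`. A server can still answer 403 to automated clients while serving browsers normally, so it may be safer to exclude a site only after several checks spread over time all come back as 'exclude'.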