How to check whether a URL is external or internal with PHP?

I use this loop to get all the a href links on a page:

foreach($html->find('a[href!="#"]') as $ahref) {
    $ahrefs++;
}

I would like to do something like this:

foreach($html->find('a[href!="#"]') as $ahref) {
    if(isexternal($ahref)) {
        $external++;
    }
    $ahrefs++;
}

where isexternal is a function:

function isexternal($url) {
    // FOO...
    // Test if link is internal/external
    if(/*condition is true*/) {
        return true;
    }
    else {
        return false;
    }
}

Help!

Use parse_url and compare the host with your local host (usually, but not always, the same as $_SERVER['HTTP_HOST']):

function isexternal($url) {
  $components = parse_url($url);    
  return !empty($components['host']) && strcasecmp($components['host'], 'example.com'); // empty host will indicate url like '/relative.php'
}
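
As a usage illustration, here is a minimal sketch of the same idea with the local host passed in as a parameter instead of being hard-coded. The function name isExternalTo and the sample URLs are assumptions made for this example, not part of the answer above.

```php
<?php
// Hypothetical variant of isexternal(): the local host is an argument.
function isExternalTo(string $url, string $localHost): bool
{
    // parse_url() gives null when the URL has no host and false when it is badly malformed.
    $host = parse_url($url, PHP_URL_HOST);
    if (empty($host)) {
        return false; // relative links such as '/relative.php' count as internal
    }
    // Host names are case-insensitive, so compare without regard to case.
    return strcasecmp($host, $localHost) !== 0;
}

var_dump(isExternalTo('/relative.php', 'example.com'));           // bool(false)
var_dump(isExternalTo('http://EXAMPLE.com/page', 'example.com')); // bool(false)
var_dump(isExternalTo('http://www.example.com/', 'example.com')); // bool(true)
var_dump(isExternalTo('https://other.org/', 'example.com'));      // bool(true)
```

In a web request you could pass $_SERVER['HTTP_HOST'] as the second argument, keeping in mind that it is not always identical to the host name you consider local.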

However, this treats www.example.com and example.com as different hosts. If you want all subdomains to be treated as local links, the function gets somewhat larger:

function isexternal($url) {
  $components = parse_url($url);
  if ( empty($components['host']) ) return false;  // we will treat url like '/relative.php' as relative
  if ( strcasecmp($components['host'], 'example.com') === 0 ) return false; // url host looks exactly like the local host
  return strrpos(strtolower($components['host']), '.example.com') !== strlen($components['host']) - strlen('.example.com'); // check if the url host is a subdomain
}
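
The leading dot in '.example.com' matters: without it, a host such as notexample.com would also pass the suffix test. The sketch below isolates that suffix check in a helper; the name hostBelongsTo and the sample hosts are assumptions for illustration only.

```php
<?php
// Hypothetical helper: true when $host is $domain itself or one of its subdomains.
function hostBelongsTo(string $host, string $domain): bool
{
    $host   = strtolower($host);
    $domain = strtolower($domain);
    if ($host === $domain) {
        return true;
    }
    // Require the dot so that 'notexample.com' does not match 'example.com'.
    $suffix = '.' . $domain;
    return strlen($host) > strlen($suffix)
        && substr_compare($host, $suffix, -strlen($suffix)) === 0;
}

var_dump(hostBelongsTo('example.com', 'example.com'));          // bool(true)
var_dump(hostBelongsTo('WWW.example.com', 'example.com'));      // bool(true)
var_dump(hostBelongsTo('notexample.com', 'example.com'));       // bool(false)
var_dump(hostBelongsTo('example.com.evil.test', 'example.com')); // bool(false)
```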

This is a simple way to detect external URLs:

$url    = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';
$internal = (
    false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
    stripos( $url, '.' . $domain ) ||            // include subdomains, like "www.my-domain.com". DANGEROUS (see below)!
    (
        0 !== strpos( $url, '//' ) &&            // exclude protocol relative URLs, like "//example.com"
        0 === strpos( $url, '/' )                // include root-relative URLs, like "/demo"
    )
);

The check above treats both www.my-domain.com and my-domain.com as "internal".

Why this rule is dangerous

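The subdomain logic introduces a weakness that can be exploited: when an external URL contains your domain in its path, for example https://external.com/www.my-domain.com, it is treated as internal!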
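
To make the weakness concrete, this small sketch runs only the subdomain rule against a genuine subdomain and against the external URL mentioned above. The wrapper name subdomainRuleMatches is an assumption for this example.

```php
<?php
// Hypothetical wrapper around the subdomain rule only: stripos($url, '.' . $domain).
function subdomainRuleMatches(string $url, string $domain): bool
{
    // stripos() returns the match position or false; any position above 0 is truthy.
    return (bool) stripos($url, '.' . $domain);
}

$domain = 'my-domain.com';

// A real subdomain matches, as intended.
var_dump(subdomainRuleMatches('https://www.my-domain.com/demo/', $domain));        // bool(true)
// An external host with the domain in its path matches as well.
var_dump(subdomainRuleMatches('https://external.com/www.my-domain.com', $domain)); // bool(true)
```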

Safer code

This issue can be eliminated by removing subdomain support (which I recommend):

$url    = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';
$internal = (
    false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
    (
        0 !== strpos( $url, '//' ) &&            // exclude protocol relative URLs, like "//example.com"
        0 === strpos( $url, '/' )                // include root-relative URLs, like "/demo"
    )
);
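
One caveat worth checking: a plain substring search for '//my-domain.com' also matches a host that merely starts with that name. The sketch below wraps the check above in a function and compares it with a host-based test built on parse_url. Both function names and the lookalike URL are made up for this illustration.

```php
<?php
// Hypothetical wrapper around the prefix check shown above.
function isInternalByPrefix(string $url, string $domain): bool
{
    return false !== stripos($url, '//' . $domain)
        || (0 !== strpos($url, '//') && 0 === strpos($url, '/'));
}

// Hypothetical alternative: compare the parsed host instead of searching the raw string.
function isInternalByHost(string $url, string $domain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (empty($host)) {
        // No host at all: internal only when the URL is root-relative, like "/demo".
        return 0 === strpos($url, '/');
    }
    return strcasecmp($host, $domain) === 0;
}

$domain    = 'my-domain.com';
$lookalike = 'https://my-domain.com.evil.example/login'; // made-up URL

var_dump(isInternalByPrefix($lookalike, $domain)); // bool(true)
var_dump(isInternalByHost($lookalike, $domain));   // bool(false)
var_dump(isInternalByHost('/demo', $domain));      // bool(true)
```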

I know this post is old, but I have just written my own function. Maybe someone else needs it too.

function IsResourceLocal($url){
    if( empty( $url ) ){ return false; }
    $urlParsed = parse_url( $url );
    /* parse_url() returns false for badly malformed URLs and omits 'host' for relative ones */
    $host = ( is_array( $urlParsed ) && isset( $urlParsed['host'] ) ) ? $urlParsed['host'] : '';
    if( empty( $host ) ){
        /* maybe we have a relative link like: /wp-content/uploads/image.jpg */
        /* prepend the document root and check if the file exists */
        $doc_root = $_SERVER['DOCUMENT_ROOT'];
        $maybefile = $doc_root.$url;
        /* maybe you want to convert to full url? */
        return file_exists( $maybefile );
    }
    /* strip a leading www. if it exists (only at the start of the host) */
    $host = preg_replace( '/^www\./i', '', $host );
    $thishost = preg_replace( '/^www\./i', '', $_SERVER['HTTP_HOST'] );
    /* host names are case-insensitive */
    return strcasecmp( $host, $thishost ) === 0;
}
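
function isexternal($url) {
    // Test if link is internal/external
    // Internal: the URL contains our domain, or it is root-relative ("/..." but not "//...")
    // strpos() returns an integer position, so compare with the integer 0, not the string '0'
    if(strpos($url,'domainname.com') !== false || (strpos($url,"/") === 0 && strpos($url,"//") !== 0))
    {
         return false;
    }
    else
    {
         return true;
    }
}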

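You may want to check whether the link is in the same domain. This only works if every href attribute is absolute and contains the domain. Relative paths like /test/file.html are tricky, because there might be a folder with the same name as the domain. So, if you have the full URL in every link: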

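function isexternal($url) {
  // Test if link is internal/external
  // stristr() finds the domain case-insensitively; strpos() === 0 detects a leading "/"
  if(stristr($url, "myDomain.com") !== false || (strpos($url,"/") === 0 && strpos($url,"//") !== 0))
    return false; // internal: contains our domain or is root-relative
  else
    return true;  // external
}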
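
For completeness, here is a rough sketch of the counting loop from the question that also copes with relative links by looking at the parsed host rather than the raw string. It uses PHP's built-in DOMDocument (assuming the DOM extension is enabled) instead of the $html->find() call from the question; the sample markup and the $localHost value are assumptions for this example.

```php
<?php
$localHost = 'example.com'; // assumption: the host you treat as internal
$page = '<a href="/about">About</a> <a href="#">Top</a> '
      . '<a href="https://example.com/x">X</a> <a href="https://other.org/">Other</a>';

$doc = new DOMDocument();
@$doc->loadHTML($page); // @ silences warnings about imperfect markup

$ahrefs   = 0;
$external = 0;
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    if ($href === '' || $href === '#') {
        continue; // skip empty and "#" links, like a[href!="#"] in the question
    }
    // Relative links have no host and are therefore counted as internal.
    $host = parse_url($href, PHP_URL_HOST);
    if (!empty($host) && strcasecmp($host, $localHost) !== 0) {
        $external++;
    }
    $ahrefs++;
}

echo "$ahrefs links, $external external\n"; // prints "3 links, 1 external"
```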