PHP: Get Domain (without subdomain) of any Address available as String

Recently a question was asked about how to get the domain of any available URL as a string.

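Unfortunately, that question has been closed, and so far the linked answers only point to regex-based solutions (which fail for special cases such as .co.uk) and to static solutions that hard-code those exceptions (which, of course, may change over time).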
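
To illustrate why a purely string-based approach breaks on suffixes such as .co.uk, here is a minimal sketch of the naive "keep the last two labels" idea. The function name naiveDomain is hypothetical and is not taken from any of the linked answers.

```php
<?php
// Hypothetical helper: keeps only the last two dot-separated labels of the host.
function naiveDomain(string $url): ?string {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === "") {
        return null; // no host component could be extracted
    }
    $parts = explode(".", $host);
    return implode(".", array_slice($parts, -2));
}

echo naiveDomain("http://www.stackoverflow.com"), "\n"; // stackoverflow.com
echo naiveDomain("http://www.google.co.uk"), "\n";      // co.uk (wrong, google.co.uk was wanted)
```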

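So I went looking for a generic solution to this problem that keeps working over time, and found one. (At least a couple of checks came out positive.)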

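If you find a domain for which the attempted solution does not work, feel free to mention it, and I will try to improve the snippet to cover that case as well.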

To find the domain of a given string, a three-step approach seems to work best:

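  • First, use parse_url (http://php.net/manual/en/function.parse-url.php) to obtain the actual hostname.
  • Second, query any DNS server for the "top-most" A record that is available. (I use checkdnsrr for this: http://php.net/manual/en/function.checkdnsrr.php)
  • Last but not least: perform some validation to make sure you did not run into a "default response".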
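
The first two steps can be sketched without any network access by listing the names that would be checked, from the top-level label down to the full hostname. The helper name candidateDomains is an assumption made for this illustration only.

```php
<?php
// Hypothetical helper: returns the names to test, top-most first.
function candidateDomains(string $url): array {
    // Step 1: extract the hostname from the URL.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === "") {
        return [];
    }
    // Step 2 (preparation): re-assemble the hostname label by label, starting at the right.
    $parts = explode(".", $host);
    $candidates = [];
    for ($i = count($parts) - 1; $i >= 0; $i--) {
        $candidates[] = implode(".", array_slice($parts, $i));
    }
    return $candidates;
}

print_r(candidateDomains("http://www.google.co.uk"));
// Order of the entries: uk, co.uk, google.co.uk, www.google.co.uk
```

Each candidate would then be passed to checkdnsrr($candidate, "A") in turn, and the first one that survives the validation step is taken as the domain.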

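I only ran a few tests, and the results appear to be as expected. The function prints its output directly, but it can be modified to return the domain name instead: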

<?php
getDomain("http://www.stackoverflow.com");
getDomain("http://www.google.co.uk");
getDomain("http://books.google.co.uk");
getDomain("http://a.b.c.google.co.uk");
getDomain("http://www.nominet.org.uk/intelligence/statistics/registration/");
getDomain("http://invalid.fail.pooo");
getDomain("http://AnotherOneThatShouldFail.com");

function getDomain($url){
  echo "Searching Domain for '".$url."': ";
  //Step 1: Get the actual hostname
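  $actualHostname = parse_url($url, PHP_URL_HOST);
  //parse_url yields null or false when no host can be extracted
  if (!is_string($actualHostname) || $actualHostname === ""){
    echo " not resolved.<br />";
    return;
  }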
  //step 2: Top-Down approach: check DNS Records for the first valid A-record.
  //Re-Assemble url step-by-step, i.e. for www.google.co.uk, check: 
  // - uk
  // - co.uk
  // - google.co.uk (will match here)
  // - www.google.co.uk (will be skipped)
  $domainParts = explode(".", $actualHostname);
  for ($i= count($domainParts)-1; $i>=0; $i--){
    $domain = "";
    $currentCountry = null;
    for ($j = count($domainParts)-1; $j>=$i; $j--){
      $domain = $domainParts[$j] . "." . $domain;
      if ($currentCountry == null){
        $currentCountry = $domainParts[$j];
      }
    }
    $domain = trim($domain, ".");
    $validRecord = checkdnsrr($domain, "A"); //looking for records of type A
    if ($validRecord){
       //If the host can be resolved to an IP, it seems valid.
       //If the hostname itself is returned, the lookup failed and the record is invalid.
       $hostIp = gethostbyname($domain);
       $validRecord = ($hostIp !== $domain);
       if ($validRecord){
         //Last check: the DNS server might answer with one of the ISP's default server IPs for invalid domains.
         //Test for this by querying a domain under the same TLD that is certainly invalid, to obtain the
         //IP list of the ISP's default servers. Then compare it with the response for the current $domain.
         //gethostbynamel() returns false when nothing resolves, so guard before calling in_array().
         $defaultIps = gethostbynamel("iiiiiiiiiiiiiiiiiinvaliddomain." . $currentCountry);
         $validRecord = !(is_array($defaultIps) && in_array($hostIp, $defaultIps, true));
       }
    }
    //valid record?
    if ($validRecord){
      //return $domain;
      echo $domain."<br />";
      return;
    }
  }
  //return null;
  echo " not resolved.<br />";
}

?>
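
A returning variant might look like the sketch below. It is not the code used for the tests above: the name findDomain and the two optional callables $hasARecord and $resolveAll are assumptions added here so that the DNS lookups can be swapped out. By default they wrap checkdnsrr() and gethostbynamel(); note that the sketch compares the first address from gethostbynamel() instead of calling gethostbyname(). It needs PHP 7.4 or later because of the arrow functions.

```php
<?php
// Sketch: same top-down search, but returning the domain (or null) instead of echoing.
// $hasARecord and $resolveAll are hypothetical injection points for the DNS lookups.
function findDomain(string $url, ?callable $hasARecord = null, ?callable $resolveAll = null): ?string {
    $hasARecord = $hasARecord ?? fn(string $d): bool => checkdnsrr($d, "A");
    $resolveAll = $resolveAll ?? fn(string $d): array => gethostbynamel($d) ?: [];

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === "") {
        return null;
    }
    $parts = explode(".", $host);
    $tld = end($parts);
    $defaultIps = null; // looked up lazily, only once per call

    for ($i = count($parts) - 1; $i >= 0; $i--) {
        $domain = implode(".", array_slice($parts, $i));
        if (!$hasARecord($domain)) {
            continue;
        }
        $ips = $resolveAll($domain);
        if ($ips === []) {
            continue; // record claimed, but nothing resolves
        }
        if ($defaultIps === null) {
            // Addresses handed out for a name that should not exist under this TLD.
            $defaultIps = $resolveAll("iiiiiiiiiiiiiiiiiinvaliddomain." . $tld);
        }
        if (in_array($ips[0], $defaultIps, true)) {
            continue; // looks like a "default response"
        }
        return $domain;
    }
    return null;
}

// Usage with a fake lookup table (made-up addresses, no network access needed):
$fake = ["google.co.uk" => ["192.0.2.10"], "www.google.co.uk" => ["192.0.2.11"]];
$has = fn(string $d): bool => isset($fake[$d]);
$all = fn(string $d): array => $fake[$d] ?? [];
var_dump(findDomain("http://www.google.co.uk", $has, $all)); // string(12) "google.co.uk"
```

Calling findDomain($url) without the two extra arguments performs real DNS queries, so its result depends on the resolver in use.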

Output of the examples above:

Searching Domain for 'http://www.stackoverflow.com': stackoverflow.com
Searching Domain for 'http://www.google.co.uk': google.co.uk
Searching Domain for 'http://books.google.co.uk': google.co.uk
Searching Domain for 'http://a.b.c.google.co.uk': google.co.uk
Searching Domain for 'http://www.nominet.org.uk/intelligence/statistics/registration/': nominet.org.uk
Searching Domain for 'http://invalid.fail.pooo': not resolved.
Searching Domain for 'http://AnotherOneThatShouldFail.com': not resolved.

This is only a very limited set of test cases, but I cannot think of a case where a domain has no A record.

As a nice side effect, this also validates URLs instead of relying only on a theoretically valid format, as the last examples show.
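
For comparison, a format-only check accepts the two failing test URLs. The sketch below uses PHP's filter_var with FILTER_VALIDATE_URL; the wrapper name hasValidUrlFormat is hypothetical.

```php
<?php
// Hypothetical wrapper: true when the string is a syntactically valid URL.
// No DNS lookup happens here, so non-existent hosts still pass.
function hasValidUrlFormat(string $url): bool {
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

$urls = ["http://invalid.fail.pooo", "http://AnotherOneThatShouldFail.com", "not a url"];
foreach ($urls as $url) {
    echo $url, ": ", hasValidUrlFormat($url) ? "format valid" : "format invalid", "\n";
}
```

The first two URLs pass the format check even though the DNS-based lookup above reported them as not resolved.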

Best, dognose