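A question was asked recently about how to obtain the domain name of any URL as a string.
Unfortunately, that question has been closed, and so far the linked answers only point to regex-based solutions (which fail for special cases like .co.uk) and to static solutions that take those exceptions into account (which, of course, may change over time).
So I went looking for a generic solution to this problem that works at any time, and found one. (At least a couple of tests came back positive.)
If you find a domain for which the attempted solution does not work, feel free to mention it, and I will try to improve the snippet to cover that case as well.
To find the domain of any given string, a three-step solution seems to work best: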
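To illustrate the .co.uk problem mentioned above, here is a minimal sketch of a fixed "keep the last two labels" rule. The function name `naiveDomain` is hypothetical and only serves the illustration; it is not taken from any of the linked answers.

```php
<?php
// Hypothetical helper: keeps only the last two labels of the hostname.
function naiveDomain(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === "") {
        return null; // no hostname could be extracted
    }
    $labels = explode(".", $host);
    return implode(".", array_slice($labels, -2));
}

echo naiveDomain("http://www.stackoverflow.com"), PHP_EOL; // stackoverflow.com
echo naiveDomain("http://www.google.co.uk"), PHP_EOL;      // co.uk (not the domain we want)
```

Any rule of this kind needs a list of exceptions, which is exactly the part that may change over time.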
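- First, use parse_url (http://php.net/manual/en/function.parse-url.php) to obtain the actual hostname.
- Second, query any DNS server for the "Top-Most" A-Record available. (I used checkdnsrr for this purpose: http://php.net/manual/en/function.checkdnsrr.php)
- Last but not least: perform some validation to make sure you did not run into some "default response".
I have only run a few tests, and the results seem to be as expected. The method prints its output directly, but it could be modified to return the domain name instead of generating output: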
<?php
getDomain("http://www.stackoverflow.com");
getDomain("http://www.google.co.uk");
getDomain("http://books.google.co.uk");
getDomain("http://a.b.c.google.co.uk");
getDomain("http://www.nominet.org.uk/intelligence/statistics/registration/");
getDomain("http://invalid.fail.pooo");
getDomain("http://AnotherOneThatShouldFail.com");

function getDomain($url){
    echo "Searching Domain for '".$url."': ";

    //Step 1: Get the actual hostname
    $actualHostname = parse_url($url, PHP_URL_HOST);
    if (!is_string($actualHostname) || $actualHostname === ""){
        //parse_url() found no host (malformed url or missing scheme)
        echo " not resolved.<br />";
        return;
    }

    //Step 2: Top-Down approach: check DNS Records for the first valid A-record.
    //Re-Assemble url step-by-step, i.e. for www.google.co.uk, check:
    // - uk
    // - co.uk
    // - google.co.uk (will match here)
    // - www.google.co.uk (will be skipped)
    $domainParts = explode(".", $actualHostname);
    //the last label ("uk", "com", ...) is the "country" used for the default-response test
    $currentCountry = $domainParts[count($domainParts)-1];

    for ($i = count($domainParts)-1; $i>=0; $i--){
        $domain = implode(".", array_slice($domainParts, $i));

        $validRecord = checkdnsrr($domain, "A"); //looking for A (IPv4 address) records
        if ($validRecord){
            //If the host can be resolved to an ip, it seems valid.
            //If the unmodified hostname is returned, the lookup failed.
            $hostIp = gethostbyname($domain);
            $validRecord = ($hostIp != $domain);
            if ($validRecord){
                //Step 3: DNS server might answer with one of ISPs default server ips for invalid domains.
                //perform a test on this by querying a domain of the same "country" that is invalid for sure to obtain an
                //ip list of ISPs default servers. Then compare with the response of current $domain.
                $defaultIps = gethostbynamel("iiiiiiiiiiiiiiiiiinvaliddomain." . $currentCountry);
                //gethostbynamel() returns false if nothing resolves, so fall back to an empty list
                $validRecord = !in_array($hostIp, $defaultIps ?: array());
            }
        }

        //valid record?
        if ($validRecord){
            //return $domain;
            echo $domain."<br />";
            return;
        }
    }
    //return null;
    echo " not resolved.<br />";
}
?>
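To make the top-down order of step 2 easier to see, the following sketch only builds the list of names that would be checked, without any DNS lookups. The helper name `candidateDomains` is hypothetical and not part of the snippet above.

```php
<?php
// Hypothetical helper: lists the names that would be checked, in order,
// from the top-level label down to the full hostname.
function candidateDomains(string $hostname): array
{
    $parts = explode(".", $hostname);
    $candidates = [];
    for ($i = count($parts) - 1; $i >= 0; $i--) {
        // Join the labels from position $i to the end of the hostname.
        $candidates[] = implode(".", array_slice($parts, $i));
    }
    return $candidates;
}

print_r(candidateDomains("www.google.co.uk"));
```

For www.google.co.uk this yields uk, co.uk, google.co.uk and www.google.co.uk, in that order; the first name that passes the DNS checks is taken as the domain.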
Output of the example above:
Searching Domain for 'http://www.stackoverflow.com': stackoverflow.com
Searching Domain for 'http://www.google.co.uk': google.co.uk
Searching Domain for 'http://books.google.co.uk': google.co.uk
Searching Domain for 'http://a.b.c.google.co.uk': google.co.uk
Searching Domain for 'http://www.nominet.org.uk/intelligence/statistics/registration/': nominet.org.uk
Searching Domain for 'http://invalid.fail.pooo': not resolved.
Searching Domain for 'http://AnotherOneThatShouldFail.com': not resolved.
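A variant that returns the domain instead of printing it could look roughly like the sketch below. It assumes PHP 7.4 or later (arrow functions). The name `findDomain` and the two optional callables `$hasARecord` and `$resolveAll` are hypothetical additions so that the logic can be tried without a network; by default they simply wrap checkdnsrr() and gethostbynamel(). Unlike the snippet above, it compares the first address from gethostbynamel() rather than calling gethostbyname(), and it looks up the "surely invalid" name only once.

```php
<?php
// Sketch of a returning variant; null means "not resolved".
function findDomain(string $url, ?callable $hasARecord = null, ?callable $resolveAll = null): ?string
{
    // Defaults wrap the real DNS functions; gethostbynamel() returns false on failure.
    $hasARecord = $hasARecord ?? fn(string $d): bool => checkdnsrr($d, "A");
    $resolveAll = $resolveAll ?? fn(string $d): array => gethostbynamel($d) ?: [];

    // Step 1: extract the hostname.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === "") {
        return null;
    }

    $parts = explode(".", $host);
    $tld = end($parts);
    // Step 3 preparation: addresses returned for a name that should not exist.
    // A non-empty list hints at a resolver that answers with default servers.
    $defaultIps = $resolveAll("iiiiiiiiiiiiiiiiiinvaliddomain." . $tld);

    // Step 2: walk from the top-level label down to the full hostname.
    for ($i = count($parts) - 1; $i >= 0; $i--) {
        $domain = implode(".", array_slice($parts, $i));
        if (!$hasARecord($domain)) {
            continue;
        }
        $ips = $resolveAll($domain);
        if ($ips !== [] && !in_array($ips[0], $defaultIps, true)) {
            return $domain;
        }
    }
    return null;
}

// Offline usage with made-up records (192.0.2.0/24 is a documentation range).
$records = ["google.co.uk" => ["192.0.2.10"], "www.google.co.uk" => ["192.0.2.11"]];
$has = fn(string $d): bool => isset($records[$d]);
$all = fn(string $d): array => $records[$d] ?? [];
echo findDomain("http://www.google.co.uk", $has, $all), PHP_EOL; // google.co.uk
```

Calling `findDomain($url)` without the extra arguments performs real DNS lookups, so the result then depends on the resolver in use.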
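This is only a very limited set of test cases, but I could not imagine a case where a domain has no A record.
As a nice side effect, this also validates URLs instead of relying only on a theoretically valid format, as the last example shows.
Best, dognose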