How to check if a text contains a specific domain name?

I currently have this, but it does not work flawlessly:

$testcases = array(
    array("I love mywebsite.com", true),
    array("mywebsite.com/ is what I like", true),
    array("www.mywebsite.com is my website", true),
    array("Check out www.mywebsite.com/", true),
    array("... http://mywebsite.com ...", true),
    array("... http://mywebsite.com/ ...", true),
    array("... http://www.mywebsite.com ...", true),
    array("... http://www.mywebsite.com/ ...", true),
    array("I like commas and periods. Just like www.mywebsite.com, they do it too!", true),
    array("thisismywebsite.com is a lot better", false),
    array("The URL fake.mywebsite.com is unknown to their server", false),
    array("Check out http://redirect.mywebsite.com/www.ultraspammer.com", false)
);

function contains_link($text) {
    // Matches only when the domain is preceded by http://, https:// or www.
    return preg_match("/(https?:\/\/(?:www\.)?|(?:www\.))mywebsite\.com/", $text) > 0;
}

foreach ($testcases as $case) {
    echo $case[0] . "=" . (contains_link($case[0]) ? "true" : "false") . " and it should be " . ($case[1] ? "true" : "false") . "<br />";
}

Output:

I love mywebsite.com=false and it should be true
mywebsite.com/ is what I like=false and it should be true
www.mywebsite.com is my website=true and it should be true
Check out www.mywebsite.com/=true and it should be true
... http://mywebsite.com ...=true and it should be true
... http://mywebsite.com/ ...=true and it should be true
... http://www.mywebsite.com ...=true and it should be true
... http://www.mywebsite.com/ ...=true and it should be true
I like commas and periods. Just like www.mywebsite.com, they do it too!=true and it should be true
thisismywebsite.com is a lot better=false and it should be false
The URL fake.mywebsite.com is unknown to their server=false and it should be false
Check out http://redirect.mywebsite.com/www.ultraspammer.com=false and it should be false

An alternative to regular expressions: parse_url()

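$url = parse_url($text);
// 'host' is only set when $text is a single URL that includes a scheme
if (isset($url['host']) && ($url['host'] === 'www.mywebsite.com' || $url['host'] === 'mywebsite.com')) {
    // $text points at mywebsite.com
}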
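
parse_url() expects a single URL rather than free text, and it only reports a host when a scheme is present. A minimal sketch of applying it to a sentence, assuming URLs are separated by whitespace and using a hypothetical helper named contains_host() (not part of the original answer), might look like this:

```php
<?php
// Hypothetical helper: checks each whitespace-separated token of $text
// and reports whether any of them parses to one of the allowed hosts.
function contains_host(string $text, array $allowedHosts): bool {
    foreach (preg_split('/\s+/', trim($text)) as $token) {
        // Strip punctuation that commonly trails a URL inside a sentence.
        $token = rtrim($token, '.,!?');
        if ($token === '') {
            continue;
        }
        // parse_url() only fills in the host when a scheme is present,
        // so assume http:// for bare tokens such as "www.mywebsite.com/".
        if (!preg_match('~^https?://~i', $token)) {
            $token = 'http://' . $token;
        }
        $host = parse_url($token, PHP_URL_HOST);
        if (is_string($host) && in_array(strtolower($host), $allowedHosts, true)) {
            return true;
        }
    }
    return false;
}

$hosts = array('mywebsite.com', 'www.mywebsite.com');
var_dump(contains_host("Check out www.mywebsite.com/", $hosts));          // bool(true)
var_dump(contains_host("The URL fake.mywebsite.com is unknown", $hosts)); // bool(false)
```

Comparing the whole host against a fixed list is what rejects subdomains such as fake.mywebsite.com and lookalikes such as thisismywebsite.com.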

Update:

If $text can contain many domains, use strstr() instead:

if(strstr($text,"mywebsite.com") !== FALSE)
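
A plain substring search is simple, but it cannot tell where the domain name starts, so it also accepts the cases the question marks as false. A short check using test strings from the question:

```php
<?php
$needle = "mywebsite.com";

// Expected true, and the substring search agrees.
var_dump(strstr("I love mywebsite.com", $needle) !== false);                                  // bool(true)

// Expected false, but the substring is still present, so these report true.
var_dump(strstr("thisismywebsite.com is a lot better", $needle) !== false);                   // bool(true)
var_dump(strstr("The URL fake.mywebsite.com is unknown to their server", $needle) !== false); // bool(true)
```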

Update 2:

function contains_link($text) {
    return preg_match("/(^(https?:\/\/(?:www\.)?|(?:www\.))?|\s(https?:\/\/(?:www\.)?|(?:www\.))?)mywebsite\.com/", $text);
}

And:

contains_link("AAAAAAA http://mywebsite.com"); // 1
contains_link("foo BAaa http://www.mywebsite.com"); // 1
contains_link("abc.com www.mywebsite.com"); // 1

I think what you are looking for is:

^(https?://)?(www\.)?mywebsite\.com/?

See it in action here: http://regexr.com?30t6m


Here it is in PHP:

function contains_link($text) {
    return preg_match("~^(https?://)?(www\.)?mywebsite\.com/?~", $text);
}

P.S. If you want to make sure nothing follows it, append a $ to the end of the pattern.
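
To illustrate what the two anchors do, here is a small sketch comparing the pattern with and without the trailing $. The variable names $open and $closed are only for illustration:

```php
<?php
// Anchored at the start only: the text must begin with the domain.
$open   = '~^(https?://)?(www\.)?mywebsite\.com/?~';
// Anchored at both ends: the text must consist of the domain and nothing else.
$closed = '~^(https?://)?(www\.)?mywebsite\.com/?$~';

var_dump(preg_match($open, "http://www.mywebsite.com/"));       // int(1)
var_dump(preg_match($open, "mywebsite.com/ is what I like"));   // int(1)
var_dump(preg_match($open, "I love mywebsite.com"));            // int(0): ^ requires the domain at the start
var_dump(preg_match($closed, "http://www.mywebsite.com/"));     // int(1)
var_dump(preg_match($closed, "mywebsite.com/ is what I like")); // int(0): $ forbids trailing text
```

Because of the leading ^, this pattern suits validating a single URL more than searching inside a longer sentence.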

If you are only searching for the text:

strpos($text, "mywebsite.com") !== FALSE

If you want to match it as an exact "word" (checking the start):

preg_match("/(^|\s)(https?:\/\/)?(www\.)?mywebsite\.com/", $text);

Or (checking both the start and the end):

preg_match("/(^|\s)(https?:\/\/)?(www\.)?mywebsite\.com\/?(\s|[,.]|$)/", $text);
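
As a quick check, the start-and-end pattern can be run against the test cases from the question. This is a sketch under the assumption that the pattern is wrapped in a hypothetical function named contains_link_strict():

```php
<?php
// Hypothetical wrapper around the start-and-end pattern.
function contains_link_strict(string $text): bool {
    return preg_match('/(^|\s)(https?:\/\/)?(www\.)?mywebsite\.com\/?(\s|[,.]|$)/', $text) === 1;
}

// Test cases copied from the question: array(text, expected result).
$testcases = array(
    array("I love mywebsite.com", true),
    array("mywebsite.com/ is what I like", true),
    array("www.mywebsite.com is my website", true),
    array("Check out www.mywebsite.com/", true),
    array("... http://mywebsite.com ...", true),
    array("... http://mywebsite.com/ ...", true),
    array("... http://www.mywebsite.com ...", true),
    array("... http://www.mywebsite.com/ ...", true),
    array("I like commas and periods. Just like www.mywebsite.com, they do it too!", true),
    array("thisismywebsite.com is a lot better", false),
    array("The URL fake.mywebsite.com is unknown to their server", false),
    array("Check out http://redirect.mywebsite.com/www.ultraspammer.com", false)
);

$passed = 0;
foreach ($testcases as $case) {
    if (contains_link_strict($case[0]) === $case[1]) {
        $passed++;
    } else {
        echo "FAILED: " . $case[0] . "\n";
    }
}
echo $passed . " of " . count($testcases) . " test cases pass\n"; // 12 of 12 test cases pass
```

The leading (^|\s) is what rejects thisismywebsite.com and the subdomain cases, since the domain (with its optional scheme and www. prefix) must start at the beginning of the text or right after whitespace.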