Validity of a Facebook page: whether it really exists, and its URL structure

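I have spent several hours researching how to validate a Facebook page. I found and read many articles and posts, but none of them matched my requirements. I want to convert the URL entered by the user ($rawurl) into the format I want ($goodurl). From Googling, I gather that regex is the way to do this, but it is very complex and hard to understand, so I need help.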

Users may enter the URL in any way they like, for example:

http://facebook.com/WillSmith, 
https://facebook.com/WillSmith, 
http://www.facebook.com/WillSmith, 
https://www.facebook.com/WillSmith, 
www.facebook.com/WillSmith 
or just facebook.com/WillSmith

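or any other way. On top of that, besides the vanity URL format, Facebook pages also come in other formats such as facebook.com/pages/usernames/somenumbers. Subdomains like en-gb.facebook.com make things even harder. After more Googling, I found the regular expression http[s]?://(www|[a-zA-Z]{2}-[a-zA-Z]{2})\.facebook\.com/(pages/[a-zA-Z0-9\.-]+/[0-9]+|[a-zA-Z0-9\.-]+)[/]?$, but I am not sure whether it satisfies all of the conditions above.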

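What I need help with:

1. The standard format I need is https://www.facebook.com/WillSmith
2. I also need to check whether it is a valid URL. For example, the URL above is valid. The URL https://www.facebook.com/WillSmith555 fits the valid format, but no such page exists on Facebook. It shows "Sorry, this page isn't available. The link you followed may be broken, or the page may have been removed", along with an image of a broken thumb.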

After checking both conditions, I need to echo in the PHP file whether the URL entered by the user, after the regex conversion, is valid or invalid.

Please help.

You can perform a header-only request against Facebook:

<?php
    function header_req( $url )
    {
        $channel = curl_init();
        curl_setopt($channel, CURLOPT_URL, $url);
        curl_setopt($channel, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($channel, CURLOPT_TIMEOUT, 10);
        curl_setopt($channel, CURLOPT_HEADER, true);
        curl_setopt($channel, CURLOPT_NOBODY, true);
        curl_setopt($channel, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($channel, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201');
        curl_setopt($channel, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($channel, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
        // Keep TLS verification on so the response really comes from facebook.com
        curl_setopt($channel, CURLOPT_SSL_VERIFYPEER, true);
        curl_setopt($channel, CURLOPT_SSL_VERIFYHOST, 2);
        curl_exec($channel);
        $httpCode = curl_getinfo( $channel, CURLINFO_HTTP_CODE );
        curl_close($channel);
        return $httpCode;
    }
    $url = "https://www.facebook.com/WillSmith";
    //lets check the url for facebook as host:

    // 1 add http if not found in URL
    if ( stripos( $url , "http") !== 0)
        $url = "http://" . $url;

    // 2 get facebook.com from URL
    $host = parse_url( $url, PHP_URL_HOST );
    // 3 if host is indeed facebook.com then continue
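    //   (a plain stripos() test returns 0, i.e. falsy, for a bare "facebook.com" host)
    if ( is_string( $host ) && preg_match( '/(^|\.)facebook\.com$/i', $host ) )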
    {
        $response = header_req($url);
        if ( $response === 200 || $response === 302 )
            echo "Page Found";
        else
            echo "Page Not Found";
    }
?>

Advantages:

It only fetches the response headers, roughly 1KB-5KB.
It does not use regexp.
All pages are validated, whatever the URL pattern :)
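
Step 3 above only proceeds when the host belongs to facebook.com. As a minimal sketch of a stricter standalone check, the helper below accepts facebook.com and its subdomains (www, en-gb, and so on) and rejects look-alike hosts. The name is_facebook_host is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// true only when $host is facebook.com itself or one of its subdomains.
function is_facebook_host($host): bool
{
    // parse_url() may return null or false when no host can be extracted.
    if (!is_string($host) || $host === '') {
        return false;
    }
    $host = strtolower($host);
    $suffix = '.facebook.com';
    // Exact match, or a suffix match on ".facebook.com", so that
    // "notfacebook.com" and "facebook.com.example.org" are rejected.
    return $host === 'facebook.com'
        || substr($host, -strlen($suffix)) === $suffix;
}
```

With this in place, the condition in step 3 could read if ( is_facebook_host( $host ) ).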

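About the regular expression:

You need to escape the slashes with backslashes.
I made some modifications so that it matches all of your examples.

Your regular expression, modified:

^(http[s]?:\/\/)?((www|[a-zA-Z]{2}-[a-zA-Z]{2})\.)?facebook\.com\/(pages\/[a-zA-Z0-9\.-]+\/[0-9]+|[a-zA-Z0-9\.-]+)\b[\/]?$

Demo: http://regex101.com/r/lN1tN6/1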
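
The question also asks for turning $rawurl into the standard form https://www.facebook.com/.... A minimal sketch of how the modified regex could be used for that with preg_match follows. The function name normalize_facebook_url is hypothetical, "#" is used as the delimiter so the slashes need no escaping, and the i flag (case-insensitive host matching) is an assumption added here. It only checks the URL's shape; whether the page exists still needs the header request.

```php
<?php
// Hypothetical helper: returns the canonical URL, or null when $rawurl
// does not look like a Facebook page URL.
function normalize_facebook_url(string $rawurl): ?string
{
    // Same structure as the modified regex, written with "#" delimiters.
    $pattern = '#^(https?://)?((www|[a-zA-Z]{2}-[a-zA-Z]{2})\.)?facebook\.com/'
             . '(pages/[a-zA-Z0-9.-]+/[0-9]+|[a-zA-Z0-9.-]+)\b/?$#i';
    if (!preg_match($pattern, trim($rawurl), $m)) {
        return null;
    }
    // $m[4] is the path part: "WillSmith" or "pages/<name>/<id>".
    return 'https://www.facebook.com/' . $m[4];
}

$examples = [
    'http://facebook.com/WillSmith',
    'www.facebook.com/WillSmith',
    'https://en-gb.facebook.com/pages/usernames/12345',
    'https://example.com/WillSmith',
];
foreach ($examples as $rawurl) {
    $goodurl = normalize_facebook_url($rawurl);
    echo $rawurl, ' => ', $goodurl ?? 'invalid', PHP_EOL;
}
```

A non-null $goodurl can then be passed to header_req() to check whether the page actually exists.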