Implementing web address regular expression

I found the following online, but I'm having trouble implementing it:

(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

This is what I want PHP to do:

Take the following: Look here: http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php

And convert it to: Look here: <a href="http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php">http://www.rocketlanguages.com/span...anish_accents.php</a>

If the URL is long, the anchor text should be shortened with a ... in the middle.

Try this:

// URL regex from here:
// http://daringfireball.net/2010/07/improved_regex_for_matching_urls
define( 'URL_REGEX', <<<'_END'
~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~
_END
);
// PHP 5.3 or higher, can use closures (anonymous functions)
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
    $replace_function = function( $matches ) use ( $length, $elision_string) {
        $matched_url = $matches[ 0 ];
        return '<a href="' . $matched_url . '">' .
                abbreviated_url( $matched_url, $length, $elision_string )   .
                '</a>';
    };
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}
function abbreviated_url( $url, $length = 50, $elision_string = '...' ) {
    if ( strlen( $url ) <= $length ) {
        return $url;
    }
    $width_either_side = (int) ( ( $length - strlen( $elision_string ) ) / 2 );
    $left  = substr( $url, 0, $width_either_side );
    $right = substr( $url, strlen( $url ) - $width_either_side );
    return $left . $elision_string . $right;
}

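(The backtick in the URL_REGEX definition confuses stackoverflow.com's syntax highlighting, but that's nothing to worry about.)

The function replace_urls_with_anchor_tags takes a string and turns every URL it matches into an anchor tag, shortening long URLs by replacing the middle with an ellipsis. It takes optional length and elision_string arguments in case you want to change the maximum length or use something other than an ellipsis.

Here's a usage example: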

// Test it out
$test = <<<_END
Look here:
http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php
And here:
http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression
_END;
echo replace_urls_with_anchor_tags( $test, 50, '...' );
// OUTPUT:
// Look here:
// <a href="http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php">http://www.rocketlangua...ion_spanish_accents.php</a>
// And here:
// <a href="http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression">http://stackoverflow.co...ress-regular-expression</a>
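
To see how the optional arguments affect the result, here is a minimal sketch of just the shortening step. The name shorten_middle is hypothetical (it is a stand-alone copy of the abbreviated_url logic so the block runs on its own), and the length of 30 and the '[...]' elision string are arbitrary choices for illustration.

```php
<?php
// Hypothetical stand-alone copy of the shortening logic in abbreviated_url().
function shorten_middle( $url, $length = 50, $elision_string = '...' ) {
    if ( strlen( $url ) <= $length ) {
        return $url; // already short enough, leave it untouched
    }
    // Number of characters kept on each side of the elision string (rounded down).
    $width_either_side = (int) ( ( $length - strlen( $elision_string ) ) / 2 );
    $left  = substr( $url, 0, $width_either_side );
    $right = substr( $url, strlen( $url ) - $width_either_side );
    return $left . $elision_string . $right;
}

$url   = 'http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression';
$short = shorten_middle( $url, 30, '[...]' );

echo $short, "\n";            // http://stack[...]r-expression
echo strlen( $short ), "\n";  // 29
```

With a length of 30 and a 5-character elision string, (30 - 5) / 2 rounds down to 12 characters per side, so the result is 29 characters long: one short of the limit whenever the difference is odd.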

Note that if you are using PHP 5.2 or lower, you have to rewrite replace_urls_with_anchor_tags to use create_function instead of a closure. Closures were not introduced until PHP 5.3:

// No closures in PHP 5.2, must use create_function()
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
    $replace_function = create_function(
        '$matches',
        'return "<a href=\"$matches[0]\">" .
                abbreviated_url( $matches[ 0 ], '            .
                                 $length  . ', '             .
                                 '"' . $elision_string . '"' .
                               ') . "</a>";'
    );
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}

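Note that I replaced the URL regex you found with the one linked from the page DaveRandom mentioned in his comment. It is more complete, and the regex you were using does in fact contain a mistake: a couple of "/" characters are not escaped (here: [\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#]). Also, it doesn't detect port numbers like 80 or 8080.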
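
A small sketch of why the unescaped "/" characters matter in PHP, assuming the pattern is wrapped in "/" delimiters. PHP looks for the closing delimiter before the pattern is compiled, so an unescaped "/" ends the pattern early, even inside a character class. The variable names and the "!" delimiter are my own choices for illustration; "!" is used because it does not occur in the pattern body.

```php
<?php
// Body of the regex from the question, with "/" left unescaped in the two character classes.
$body = '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?';
$text = 'Look here: http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php';

// "/" as delimiter: the first unescaped "/" is taken as the end of the pattern,
// the rest is read as modifiers, and preg_match() fails (warning suppressed with @).
$with_slash = @preg_match( '/' . $body . '/', $text );
var_dump( $with_slash );   // bool(false)

// A delimiter that does not occur in the body avoids the problem.
$with_bang = preg_match( '!' . $body . '!', $text, $m );
var_dump( $with_bang );    // int(1)
echo $m[0], "\n";          // the full URL
```

The alternative is to keep "/" as the delimiter and write every literal slash in the pattern as \/.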

Hope this helps.

I am using this regex and it works fine for me; try it if you like:

(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?
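
For reference, a minimal sketch of how this last pattern might be used from PHP. The "~" delimiters, the "i" modifier, and the example.com test strings are assumptions of mine, not part of the answer above.

```php
<?php
// The pattern above, wrapped in "~" delimiters and made case-insensitive.
$pattern = '~(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?~i';

// A URL with a port number: group 3 captures the port.
preg_match( $pattern, 'http://example.com:8080/path', $with_port );
echo $with_port[0], "\n";   // http://example.com:8080/path
echo $with_port[3], "\n";   // :8080

// A URL followed by more text: the trailing (\/.*)? is greedy and
// consumes everything up to the end of the line.
preg_match( $pattern, 'See http://example.com/a and more', $in_sentence );
echo $in_sentence[0], "\n"; // http://example.com/a and more
```

As the second case shows, this pattern seems better suited to validating a string that contains only a URL than to picking URLs out of running text.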