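I've implemented "linkify" on our OSticket system and it seems to work great, except for one flaw that I hadn't noticed in the documentation at http://jmrware.com/articles/2010/linkifyurl/linkify.php. When a link is followed by a ), the ) gets included in the link. GRRR
So I'm wondering whether any regex gurus out there can debug this, since linkify hasn't been updated in a few years...
In case it's needed, here is the regex: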
$url_pattern = '/# Rev:20100913_0900 github.com\/jmrware\/LinkifyURL
# Match http & ftp URL that is not already linkified.
  # Alternative 1: URL delimited by (parentheses).
  (\() # $1 "(" start delimiter.
  ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $2: URL.
  (\)) # $3: ")" end delimiter.
| # Alternative 2: URL delimited by [square brackets].
  (\[) # $4: "[" start delimiter.
  ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $5: URL.
  (\]) # $6: "]" end delimiter.
| # Alternative 3: URL delimited by {curly braces}.
  (\{) # $7: "{" start delimiter.
  ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $8: URL.
  (\}) # $9: "}" end delimiter.
| # Alternative 4: URL delimited by <angle brackets>.
  (<|&(?:lt|\#60|\#x3c);) # $10: "<" start delimiter (or HTML entity).
  ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $11: URL.
  (>|&(?:gt|\#62|\#x3e);) # $12: ">" end delimiter (or HTML entity).
| # Alternative 5: URL not delimited by (), [], {} or <>.
  ( # $13: Prefix proving URL not already linked.
    (?: ^ # Can be a beginning of line or string, or
    | [^=\s\'"\]] # a non-"=", non-quote, non-"]", followed by
    ) \s*[\'"]? # optional whitespace and optional quote;
  | [^=\s]\s+ # or... a non-equals sign followed by whitespace.
  ) # End $13. Non-prelinkified-proof prefix.
  ( \b # $14: Other non-delimited URL.
    (?:ht|f)tps?:\/\/ # Required literal http, https, ftp or ftps prefix.
    [a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]+ # All URI chars except "&" (normal*).
    (?: # Either on a "&" or at the end of URI.
      (?! # Allow a "&" char only if not start of an...
        &(?:gt|\#0*62|\#x0*3e); # HTML ">" entity, or
      | &(?:amp|apos|quot|\#0*3[49]|\#x0*2[27]); # a [&\'"] entity if
        [.!&\',:?;]? # followed by optional punctuation then
        (?:[^a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]|$) # a non-URI char or EOS.
      ) & # If neg-assertion true, match "&" (special).
      [a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]* # More non-& URI chars (normal*).
    )* # Unroll-the-loop (special normal*)*.
    [a-z0-9\-_~$()*+=\/#[\]@%] # Last char can\'t be [.!&\',;:?]
  ) # End $14. Other non-delimited URL.
/imx';
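I started tinkering with it, but it turned out to be a bit over my head. I'm also open to suggestions for alternatives to linkify.
If you look at the second-to-last line of the pattern: # Last char can\'t be [.!&\',;:?]
Basically, would the fix be to add a ) to that list?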
If you just want to match substrings that look like HTTP/FTP URLs but don't end with punctuation and the like, you could use something simple such as:
\b(?:ht|f)tps?://[^\s<>"']+(?<![][()<>{}.,!?:;"'])
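For what it's worth, here is a minimal PHP sketch of how a pattern along these lines could be tried out. It uses a negative lookbehind, (?<!...), so the greedy run of URL characters backtracks until the last matched character is not punctuation. The helper name find_urls and the "~" delimiters are my own choices for illustration; they are not part of linkify or OSticket.

```php
<?php
// Hypothetical helper for illustration only (not part of linkify or OSticket).
// Returns every substring that looks like an http(s)/ftp(s) URL and does not
// end in a bracket, quote, or sentence punctuation character.
function find_urls(string $text): array
{
    // "~" delimiters avoid having to escape "/" inside the pattern.
    // [^\s<>"']+ greedily grabs the URL body; the lookbehind then forces the
    // engine to back off until the final character is not in the listed set.
    $pattern = '~\b(?:ht|f)tps?://[^\s<>"\']+(?<![][()<>{}.,!?:;"\'])~i';

    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

// The closing parenthesis and the full stop are left out of the match.
print_r(find_urls('See the docs (http://example.com/page).'));

// Trailing "," and "!" are dropped; "&" and "=" inside the query string stay.
print_r(find_urls('ftp://files.example.org/pub/a.txt, then https://example.com/?q=1&x=2!'));
```

One trade-off to be aware of: because a trailing ) is always excluded, a URL that legitimately ends in a parenthesis, such as http://en.wikipedia.org/wiki/PHP_(disambiguation), is matched without its final ).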