Regular expression to parse URLs to links, but only if they are not links yet

Keywords: links, URL, regular expression | Last updated: 2023-09-27

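We use the following regular expression to convert URLs in a text to links, shortening them in the middle with an ellipsis if they are too long: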

/**
 * Replace all links with <a> tags (shortening them if needed)
 */
$match_arr[] = '/((http|ftp)+(s)?:\/\/[^<>\s,!\)]+)/ie';
$replace_arr[] = "'<a href=\"\\0\" title=\"\\0\" target=\"_blank\">' . " .
    "( mb_strlen( '$0' ) > {$maxlength} ? mb_substr( '$0', 0, " . ( $maxlength / 2 ) . " ) . '…' . " .
    "mb_substr( '$0', -" . ( $maxlength / 2 ) . " ) : '$0' ) . " .
"'</a>'";

This works. However, I found that if the text already contains a link, for example:

$text = '... <a href="http://www.google.com">http://www.google.com</a> ...';

the pattern matches both URLs, so it tries to create two more <a> tags, which of course completely breaks the DOM.
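The double match can be reproduced with a short script. This is a minimal sketch that reuses the pattern from the question without the `/e` modifier (nothing is evaluated here, it only lists the matches); the variable names are illustrative.

```php
<?php
// Pattern from the question, without the /e modifier: we only inspect matches.
$pattern = '/((http|ftp)+(s)?:\/\/[^<>\s,!\)]+)/i';
$text = '... <a href="http://www.google.com">http://www.google.com</a> ...';

// $matches[0] receives every full match found in $text.
$count = preg_match_all($pattern, $text, $matches);

var_dump($count);      // number of URLs the pattern found
print_r($matches[0]);  // the matched strings
```

Both the `href` value and the link text are matched. Note that the first match also swallows the closing `"` of the attribute, because the character class does not exclude double quotes.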

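How can I prevent the regex from matching if the URL is already inside an <a> tag? The URL will also appear in the title attribute, so basically I just want to skip every <a> tag completely.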

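The simplest approach (using regex, which may not be the most reliable tool in this case) is probably to make sure the URL is not followed by a closing </a>: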

#(http|ftp)+(s)?://[^<>\s,!\)]++(?![^<]*</a>)#ie

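I used a possessive quantifier (`++`) to make sure the whole URL is matched, i.e. the engine cannot backtrack into the URL to satisfy the lookahead.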
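For current PHP versions, here is a hedged sketch of how this pattern might be applied. The `/e` modifier was deprecated in PHP 5.5 and removed in PHP 7, so the replacement logic moves into a `preg_replace_callback` callback. The function name `linkify_unlinked`, the default `$maxlength` of 40, the integer halving via `intdiv`, and the `htmlspecialchars` escaping are assumptions added for illustration, not part of the original post.

```php
<?php
// Hypothetical helper; the name and the default $maxlength are assumptions.
function linkify_unlinked(string $text, int $maxlength = 40): string
{
    // Pattern from the answer, minus the /e modifier (removed in PHP 7).
    $pattern = '#(http|ftp)+(s)?://[^<>\s,!\)]++(?![^<]*</a>)#i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($maxlength): string {
            $url  = $m[0];
            $half = intdiv($maxlength, 2);
            // Shorten long URLs in the middle, as in the question's code.
            $label = mb_strlen($url) > $maxlength
                ? mb_substr($url, 0, $half) . '…' . mb_substr($url, -$half)
                : $url;
            // Escaping is an addition; the original inserted the URL as-is.
            $href = htmlspecialchars($url, ENT_QUOTES);
            return '<a href="' . $href . '" title="' . $href . '" target="_blank">'
                . htmlspecialchars($label, ENT_QUOTES) . '</a>';
        },
        $text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

$text = 'See http://example.com/docs and '
      . '<a href="http://www.google.com">http://www.google.com</a> ...';
echo linkify_unlinked($text), "\n";
```

In this example only the bare URL is wrapped; the existing anchor is left untouched. The lookahead only inspects the text up to the next `<`, so markup nested inside an anchor (for example `<a href="..."><b>http://...</b></a>`) would still be matched. If that case matters, an HTML parser is likely the safer tool.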