Finding baseDomain in regex using PHP

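What I want to do is check whether $BaseDomain appears in the link matched by the regex; if it does, the link does not need to open in a new tab. I also tried using $replacePattern1 instead of $1 and $replacePattern2 instead of $2. When the link is not on $BaseDomain, target="_blank" is never appended to the href. Where am I going wrong?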

function ReplaceUrlToHtmlLink($source) {
    /// my site name ///
    $BaseDomain = "ivotism.com";
    //URLs starting with http://, https://, or ftp://
    $replacePattern1 = '/(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/i';
    $source = preg_replace($replacePattern1, '<a href="$1" ' . (strpos('$1', $BaseDomain) !== false ?
                    ' target="_blank"' : '') . '>$1</a>', $source);
    //URLs starting with "www." (without // before it, or it'd re-link the ones done above).
    $replacePattern2 = '/(^|[^\/])(www\.[\S]+(\b|$))/i';
    $source = preg_replace($replacePattern2, ' <a href="http://$2" ' . (strpos('$2', $BaseDomain) !== false ?
                    ' target="_blank"' : '') . '>$2</a>', $source);
    //Change email addresses to mailto:: links.
    $replacePattern3 = '/(([a-zA-Z0-9\-\_\.])+@[a-zA-Z\_]+?(\.[a-zA-Z]{2,6})+)/i';
    $source = preg_replace($replacePattern3, '<a href="mailto:$1">$1</a>', $source);
    return $source;
}
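
In every test, compare strictly against false (!== false), not against true. strpos returns the integer offset of the match, which can be 0, or false when nothing is found. It never returns true, so === true would always fail.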
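
As a reminder of why the comparison operator matters with strpos, here is a minimal sketch of the values it returns. The sample strings and variable names are made up for illustration and do not come from the question.

```php
<?php
// Hypothetical sample values, for illustration only.
$baseDomain = 'ivotism.com';

$atStart  = strpos('ivotism.com/page', $baseDomain);        // match at offset 0
$inMiddle = strpos('http://ivotism.com/page', $baseDomain); // match at offset 7
$missing  = strpos('http://example.org/', $baseDomain);     // no match

var_dump($atStart);   // int(0)
var_dump($inMiddle);  // int(7)
var_dump($missing);   // bool(false)

// A loose comparison confuses offset 0 with "not found".
var_dump($atStart == false);   // bool(true)
// The strict comparison against false handles offset 0 correctly.
var_dump($atStart !== false);  // bool(true)
// strpos() never returns true, so a strict test against true always fails.
var_dump($inMiddle === true);  // bool(false)
```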

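However, $1 (or $n) cannot be used inside another function call, even within the same argument. The replacement string is built before preg_replace runs, so strpos('$1', $BaseDomain) searches the literal two-character string '$1', never the matched URL.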
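
This follows from ordinary PHP evaluation order: the replacement argument is a plain expression that is fully evaluated before preg_replace is called. A minimal sketch of that behaviour follows; the variable names, the simplified pattern, and the sample URL are illustrative assumptions, not the question's own code.

```php
<?php
$baseDomain = 'ivotism.com';

// Single quotes: '$1' is just the two characters "$" and "1".
// strpos() runs now, before any regex matching has happened.
$found = strpos('$1', $baseDomain);
var_dump($found); // bool(false)

// The ternary is therefore decided once, for every link, up front.
$replacement = '<a href="$1"' . ($found !== false ? ' target="_blank"' : '') . '>$1</a>';
echo $replacement, "\n"; // <a href="$1">$1</a>

// preg_replace() only substitutes $1 into the already-built string.
$result = preg_replace('~(\bhttps?://\S+)~i', $replacement, 'see http://example.org/x');
echo $result, "\n"; // see <a href="http://example.org/x">http://example.org/x</a>
```

Because the condition is fixed before any match exists, a per-match decision needs either a second pass over the generated HTML or a callback.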

Use:

function ReplaceUrlToHtmlLink($source) {
    /// my site name ///
    $BaseDomain = "ivotism.com";
    //URLs starting with http://, https://, or ftp://
    $replacePattern1 = '/(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/i';
    $source = preg_replace($replacePattern1, '<a href="$1" target="_blank">$1</a>', $source);
    //URLs starting with "www." (without // before it, or it'd re-link the ones done above).
    $replacePattern2 = '/(^|[^\/])(www\.[\S]+(\b|$))/i';
    $source = preg_replace($replacePattern2, ' <a href="http://$2" target="_blank">http://$2</a>', $source);
    //Change link with domain: strip target="_blank" from links that contain the base domain.
    //preg_quote() escapes the dot so that it only matches a literal ".".
    $replacePattern3 = '/(<a href="[^"]+' . preg_quote($BaseDomain, '/') . '[^"]*") target="_blank">/i';
    $source = preg_replace($replacePattern3, '$1>', $source);
    //Change email addresses to mailto:: links.
    $replacePattern4 = '/(([a-zA-Z0-9\-\_\.])+@[a-zA-Z\_]+?(\.[a-zA-Z]{2,6})+)/i';
    $source = preg_replace($replacePattern4, '<a href="mailto:$1">$1</a>', $source);
    return $source;
}

I tried to write a clean and relatively robust function that parses the text only once.

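I deliberately use naive subpatterns to describe the part from the hostname to the end of the URL (i.e. \S+(?<=[\PP#?/&])) and to describe emails (i.e. [^\s@]+@[^\s@]+), and both are obviously too permissive, so feel free to improve them. Keep in mind, however, that the goal of the pattern is extraction, not validation. Validation is delegated in part to parse_url and filter_var.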

define ('BASE_DOMAIN', 'ivotism.com');
define ('DEFAULT_SCHEME', 'http');
function createLinks ($text) {
    $pattern =  '~
      (?=[hsfw/])     # quick check
      (?:             # start with scheme or //
          (?:
              \b (?<scheme> https? | s?ftp | ftps? ) :
            |
              (?<=\s|\A) # when there is no scheme, slashes
                         # must be preceded by a whitespace
          )
          (?<slashes> // )
          (?: www\. )?
        |             # OR start with "www."
          \bwww\.
      )
      \S+             # non-whitespace characters
      (?<=[\PP#?/&])  # last character allowed
    |
      (?<mail> [^\s@]+ @ [^\s@]+ )
    ~ix';
    return preg_replace_callback($pattern, function ($m) {
        if (isset($m['mail'])) {
            if (filter_var($m['mail'], FILTER_VALIDATE_EMAIL))
                return '<a href="mailto:' . $m['mail'] . '">' . $m['mail'] . '</a>';
            return $m[0];   
        } else {
            $url = (empty($m['scheme']) ? DEFAULT_SCHEME . ':' : '')
                 . (empty($m['slashes']) ? '//' : '') . $m[0];
            $host = parse_url($url, PHP_URL_HOST);
            if (empty($host) || preg_match('~(?:\A|\.)\Q'. BASE_DOMAIN . '\E\z~i', $host))
                return $m[0];
            return '<a href="'. $url . '" target="_blank">' . $url . '</a>';
        } 
    }, $text);
}
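
The host test inside the callback is what decides whether a link counts as internal. As a rough standalone sketch of that check, the snippet below isolates it in a helper; the name isInternalHost and the sample hosts are hypothetical and only serve to show which hosts the pattern accepts.

```php
<?php
// Hypothetical helper isolating the host check used in createLinks().
function isInternalHost(string $host, string $baseDomain): bool
{
    // Matches the base domain itself or any subdomain of it,
    // but not a different domain that merely ends with the same letters.
    $pattern = '~(?:\A|\.)\Q' . $baseDomain . '\E\z~i';
    return preg_match($pattern, $host) === 1;
}

$base = 'ivotism.com';
var_dump(isInternalHost('ivotism.com', $base));              // bool(true)
var_dump(isInternalHost('www.IVOTISM.com', $base));          // bool(true)
var_dump(isInternalHost('notivotism.com', $base));           // bool(false)
var_dump(isInternalHost('ivotism.com.evil.example', $base)); // bool(false)

// parse_url() supplies the host that is fed to the check.
$host = parse_url('http://blog.ivotism.com/post?id=1', PHP_URL_HOST);
var_dump($host);                        // string(16) "blog.ivotism.com"
var_dump(isInternalHost($host, $base)); // bool(true)
```

Anchoring the pattern at both ends is what keeps look-alike domains from being treated as internal, which a plain strpos on the whole URL would not guarantee.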