Regex: Match a substring in all lines, except when the substring is inside a comment section


Keywords: string, comment section, Regex | Updated: 2024-01-22

Here goes:

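I'm writing a PHP application, and I have a new official domain where all the FAQs now live. Some files in my script include help links to the old FAQ domain, so I want to replace them with the new domain. However, I want to leave a URL pointing to the old domain untouched when it sits in a comment or comment block (I still use the old domain for self-reference and other documentation).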

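So, basically, what I want is a regex that works under the following conditions:

- Match every occurrence of example.com in all lines*
- Do not match the whole line, only the example.com string
* If the line begins with //, /* or " *", do not match any instance of example.com in that line (though this could be a problem if a comment block is closed on the same line it was opened on)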

I usually write my block comments like this:

/* text
 * blah 
 * blah
*/

That is why I don't want to match "example.com" if it comes after //, /* or " *".

I think it should be something like this:

^(?:(?!//|/\*|\s\*).?).*example\.com

But this has a problem: it matches the whole line rather than just "example.com" (which mostly causes trouble when a line contains two or more "example.com" strings).

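Can anyone help me fix my regex? Please note: it doesn't have to be a PHP regex, since I can always use a tool like grepWin to edit all the files locally in one go.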

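Oh, and please tell me if there is a way to generalize block comments somehow, like: once /* is found, don't match example.com until */ is found. That would be really useful. Is it possible to achieve this in a generic (language-independent) regex?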

A regex that matches example.com only if it is not inside a block comment (it does not handle line comments, so those have to be dealt with separately):

$result = preg_replace(
    '%example\.com # Match example.com
    (?!            # only if it\'s not possible to match
     (?:           # the following:
      (?!/\*)      #  (unless an opening comment starts first)
      .            #  any character
     )*            # any number of times
     \*/           # followed by a closing comment.
    )              # End of lookahead
    %sx',
    'newdomain.com', $subject);
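
To see what the lookahead does and does not cover, here is a small sketch that runs the same pattern on a made-up sample. The wrapper name replaceOutsideBlockComments and the sample text are assumptions for illustration only, not part of the original answer.

```php
<?php
// Hypothetical wrapper around the pattern above; the name is illustrative only.
function replaceOutsideBlockComments(string $subject): string
{
    return preg_replace(
        '%example\.com      # match example.com
        (?!                 # only if what follows is NOT:
          (?: (?!/\*) . )*  #   a run of characters with no opening comment
          \*/               #   followed by a closing comment
        )%sx',
        'newdomain.com',
        $subject
    );
}

// Made-up sample: one block comment, one string literal, one line comment.
$sample = "/* docs: example.com */\n"
        . "\$faq = 'http://example.com/faq';\n"
        . "// see example.com\n";

echo replaceOutsideBlockComments($sample);
```

On this sample the occurrence inside /* ... */ is left alone and the one in the string literal is replaced. The one after // is replaced as well, which is the line-comment limitation mentioned above.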

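I would use some kind of tokenizer to tell comments apart from the other language tokens.

When processing PHP files, you should use PHP's own tokenizer function, token_get_all: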

$tokens = token_get_all($source);

Then you can iterate over the tokens and separate them by type:

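foreach ($tokens as &$token) {
    if (!is_array($token)) {
        // single-character tokens such as ";" or "{" are plain strings
        continue;
    }
    // T_ML_COMMENT only existed in PHP 4; later versions report T_COMMENT
    if (in_array($token[0], array(T_COMMENT, T_DOC_COMMENT), true)) {
        // comment: leave it untouched
    } else {
        // not a comment
        $token[1] = str_replace('example.com', 'example.net', $token[1]);
    }
}
unset($token); // break the reference left over from the loop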

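Finally, put everything back together with implode, joining the text of each token ($token[1] for array tokens, the token itself for plain-string tokens).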
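
The three steps (tokenize, replace outside comments, implode) can be put together roughly as follows. This is only a sketch: the helper name replaceOutsideComments and the sample source are assumptions, and it relies on the tokenizer extension being enabled, as it is in a default PHP build.

```php
<?php
// Hypothetical helper; name and sample input are illustrative only.
function replaceOutsideComments(string $source, string $old, string $new): string
{
    $pieces = array();
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens (";", "=", "{", ...) are plain strings
            $pieces[] = $token;
        } elseif ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
            // Comments are copied through unchanged
            $pieces[] = $token[1];
        } else {
            // Every other token gets the replacement applied to its text
            $pieces[] = str_replace($old, $new, $token[1]);
        }
    }
    // Joining the token texts in order reproduces the source file
    return implode('', $pieces);
}

$source = "<?php\n// see example.com\n"
        . "\$faq = 'http://example.com/faq'; /* example.com */\n";

echo replaceOutsideComments($source, 'example.com', 'example.net');
```

Because the tokenizer knows the language, both the // comment and the /* */ comment keep the old domain here, while the string literal is rewritten.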

For other languages, where you don't have a suitable tokenizer at hand, you can write your own small tokenizer:

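// Alternatives, tried in order: block comment, line comment, the target, any other character.
// The s modifier lets block comments span lines; line comments stop at the line break.
preg_match_all('~/\*.*?\*/|//[^\r\n]*|(example\.com)|.~s', $code, $tokens, PREG_SET_ORDER);
foreach ($tokens as &$token) {
    if (!empty($token[1])) {
        // group 1 matched: example.com outside of a comment
        $token = str_replace('example.com', 'example.net', $token[1]);
    } else {
        // comment or any other character: keep as-is
        $token = $token[0];
    }
}
unset($token); // break the reference left over from the loop
$code = implode('', $tokens);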

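Note that this does not take any other tokens into account, such as string literals that contain comment-like text. So example.com will not be matched when it appears in a string but inside something that merely looks like a comment, as in: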

'foo /* not a comment example.com */ bar'
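
One possible way around this, sketched here under assumptions rather than taken from the original answer, is to let the hand-written tokenizer recognize quoted string literals as well, so that a /* inside a string is never treated as a comment opener. The constant TOKEN_PATTERN and the function name are hypothetical, and the pattern only knows about '...' and "..." literals with backslash escapes (no heredocs, regex literals or language-specific quoting).

```php
<?php
// Hypothetical pattern: comments are captured in group 1, strings and bare
// occurrences of the target are matched without a capture group.
const TOKEN_PATTERN = <<<'REGEX'
~
  ( /\* .*? \*/ | // [^\r\n]* )   # group 1: block or line comment
| ' (?: [^'\\] | \\. )* '         # single-quoted string literal
| " (?: [^"\\] | \\. )* "         # double-quoted string literal
| example\.com                    # the target outside of any literal
~sx
REGEX;

// Hypothetical helper; the name is illustrative only.
function replaceOutsideCommentsGeneric(string $code): string
{
    return preg_replace_callback(TOKEN_PATTERN, function (array $m) {
        if (isset($m[1]) && $m[1] !== '') {
            // A comment matched: return it unchanged
            return $m[0];
        }
        // String literal or bare occurrence: apply the replacement
        return str_replace('example.com', 'example.net', $m[0]);
    }, $code);
}

$code = "\$a = 'foo /* not a comment example.com */ bar'; /* example.com */\n";
echo replaceOutsideCommentsGeneric($code);
```

With this sketch the occurrence inside the string literal is replaced even though it sits between /* and */, while the real comment after it keeps the old domain. A stray apostrophe or quote outside of comments (for example in embedded HTML text) would still confuse it, so a real tokenizer remains the safer choice where one exists.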