How to ignore regex if part of a URL?

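On one of my PHP sites, I use this regex to automatically remove phone numbers from strings:

$text = preg_replace('/\+?[0-9][0-9()\s+-]{4,20}[0-9]/', '[removed]', $text);

However, when users post long URLs that contain several digits as part of their text, the URL is also affected by the preg_replace, which breaks the URL.

How can I make sure the above preg_replace does not alter URLs contained in $text?

EDIT:

As requested, here is an example of a URL being broken by the preg_replace above: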

$text = 'Please help me with my question here: https://stackoverflow.com/questions/20589314/  Thanks!';
$text = preg_replace('/\+?[0-9][0-9()\s+-]{4,20}[0-9]/', '[removed]', $text);
echo $text; 
//echoes: Please help me with my question here: https://stackoverflow.com/questions/[removed]/ Thanks!

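I think you have to parse both URLs and phone numbers, e.g. /(?: url \K | phone number)/ - sln
@sln: How would I do that? If it helps, here is a URL regex: stackoverflow.com/a/8234912/869849 - ProgrammerGirl

Here is an example that uses the provided URL regex together with the phone number regex:

PHP test case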

 $text = 'Please help me with my +44-83848-1234 question here: http://stackoverflow.com/+44-83848-1234questions/20589314/ phone #:+44-83848-1234-Thanks!';
 $str = preg_replace_callback('~((?:(?:[a-zA-Z]{3,9}:(?://)?)(?:[;:&=+$,\w-]+@)?[a-zA-Z0-9.-]+|(?:www\.|[;:&=+$,\w-]+@)[a-zA-Z0-9.-]+)(?:(?:/[+\~%/.\w-]*)?\??[+=&;%@.\w-]*\#?\w*)?)|(\+?[0-9][0-9()\s+-]{4,20}[0-9])~',
                   function( $matches ){
                        if ( $matches[1] != "" ) {
                             return $matches[1];
                        }
                        return '[removed]';
                   },
                   $text);
 print $str;

Output >>

 Please help me with my [removed] question here: http://stackoverflow.com/+44-83848-1234questions/20589314/ phone #:[removed]-Thanks!

The regex, formatted with RegexFormat

 # '~((?:(?:[a-zA-Z]{3,9}:(?://)?)(?:[;:&=+$,\w-]+@)?[a-zA-Z0-9.-]+|(?:www\.|[;:&=+$,\w-]+@)[a-zA-Z0-9.-]+)(?:(?:/[+\~%/.\w-]*)?\??[+=&;%@.\w-]*\#?\w*)?)|(\+?[0-9][0-9()\s+-]{4,20}[0-9])~'
     (                                  # (1 start), URL
          (?:
               (?:
                    [a-zA-Z]{3,9} :
                    (?: // )?
               )
               (?: [;:&=+$,\w-]+ @ )?
               [a-zA-Z0-9.-]+ 
            |  
               (?: www \. | [;:&=+$,\w-]+ @ )
               [a-zA-Z0-9.-]+ 
          )
          (?:
               (?: / [+~%/.\w-]* )?
               \??
               [+=&;%@.\w-]* 
               \#?
               \w* 
          )?
     )                                  # (1 end)
  |  
     (                                  # (2 start), Phone Num
          \+? 
          [0-9] 
          [0-9()\s+-]{4,20} 
          [0-9] 
     )                                  # (2 end)
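
The same "match the URL first, then the phone number" idea can also be written without a callback by using PCRE's backtracking verbs. The following is only a minimal sketch, not part of the answer above: the function name removePhoneNumbers is hypothetical, and the URL sub-pattern https?://\S+ is a deliberately simple assumption that stands in for the longer URL regex.

```php
<?php
// Hypothetical helper, not from the original answer.
// Assumption: a URL is "http(s)://" followed by non-whitespace characters.
function removePhoneNumbers(string $text): string
{
    $pattern = '~https?://\S+(*SKIP)(*FAIL)'       // consume a whole URL, then discard that match
             . '|\+?[0-9][0-9()\s+-]{4,20}[0-9]~'; // phone number pattern from the question

    // preg_replace returns null on error; fall back to the untouched text.
    return preg_replace($pattern, '[removed]', $text) ?? $text;
}

echo removePhoneNumbers('Call +44-83848-1234 or see https://stackoverflow.com/questions/20589314/ Thanks!');
// prints: Call [removed] or see https://stackoverflow.com/questions/20589314/ Thanks!
```

Because (*SKIP) makes the engine resume scanning after the URL, digits inside the URL are never offered to the phone number alternative. If URLs without a scheme (such as www.example.com) must be protected too, the longer URL pattern would have to be substituted for the simple one assumed here.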

You should do a bit more coding here, so instead of scratching your head, go stroke your ego!

<?php
    $text = "This is my number20558789yes with no spaces
    and this is yours 254785961
    But this 20558474 is within http://stackoverflow.com/questions/20558474/
    So I don't remove it
    and this is another url http://stackoverflow.com/questions/20589314/ 
    Thanks!";
    $up = "(https?://[-.a-zA-Z0-9]+\.[a-zA-Z]{2,3}/\S*)"; // to catch urls
    $np = "(\+?[0-9][0-9()\s+-]{4,20}[0-9])"; // you know this pattern already
    preg_match_all("#{$up}|{$np}#", $text, $matches); // match all above patterns together ($matches[1] contains urls, $matches[2] contains numbers)
    preg_match_all("#{$np}#", print_r(array_filter($matches[1]), true), $urls_numbers); // extract numbers from urls, actually if we have any
    $diff = array_diff(array_filter($matches[2]), $urls_numbers[0]); // an array with numbers that we should replace
    $text = str_replace($diff, "[removed]", $text); // replacing
    echo $text; // here you are

And the output is:

This is my number[removed]yes with no spaces
and this is yours [removed]
But this 20558474 is within http://stackoverflow.com/questions/20558474/
So I don't remove it
and this is another url http://stackoverflow.com/questions/20589314/ 
Thanks!
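
Note that this approach leaves a number alone everywhere in the text once it also appears inside a URL, which is why 20558474 survives outside the URL in the output. If that is not wanted, one possible variation is to split the text on URLs and run the phone number replacement only on the pieces in between. The sketch below is an assumption-laden illustration rather than the answerer's method: the function name removePhonesOutsideUrls is hypothetical, and it reuses the two patterns from the code above with the hyphen moved to the end of the character class.

```php
<?php
// Hypothetical helper, not from the original answer.
function removePhonesOutsideUrls(string $text): string
{
    $urlPattern   = '#(https?://[-.a-zA-Z0-9]+\.[a-zA-Z]{2,3}/\S*)#';
    $phonePattern = '#\+?[0-9][0-9()\s+-]{4,20}[0-9]#';

    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even indexes hold plain text, odd indexes hold the captured URLs.
    $parts = preg_split($urlPattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Only the text between URLs is searched for phone numbers.
            $parts[$i] = preg_replace($phonePattern, '[removed]', $part);
        }
    }

    return implode('', $parts);
}

echo removePhonesOutsideUrls('My number 20558474 is in http://stackoverflow.com/questions/20558474/ too');
// prints: My number [removed] is in http://stackoverflow.com/questions/20558474/ too
```

Unlike the array_diff version, this removes the standalone 20558474 while still leaving the URL intact, so which behaviour is preferable depends on the site's requirements.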

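Is it fair to assume that phone numbers are usually preceded by whitespace or sit at the start of a line? If so, that would keep you from accidentally altering URLs, because neither whitespace nor line breaks occur in the middle of a URL:

// The m flag lets ^ match at the start of every line; $1 puts the matched whitespace back.
$text = preg_replace('/(^|\s)\+?[0-9][0-9()\s+-]{4,20}[0-9]/m', '$1[removed]', $text);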
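
The same assumption can also be expressed with a negative lookbehind, so that the whitespace is only asserted and never becomes part of the match. This is a minimal sketch under that assumption; the function name removeStandalonePhones is hypothetical.

```php
<?php
// Hypothetical helper, not from the original answer.
function removeStandalonePhones(string $text): string
{
    // (?<!\S) succeeds when the previous character is whitespace
    // or when the match starts at the very beginning of the string.
    $pattern = '/(?<!\S)\+?[0-9][0-9()\s+-]{4,20}[0-9]/';

    // preg_replace returns null on error; fall back to the untouched text.
    return preg_replace($pattern, '[removed]', $text) ?? $text;
}

echo removeStandalonePhones('Please help me with my question here: https://stackoverflow.com/questions/20589314/  Thanks!');
// prints the input unchanged, because the digits follow a "/" rather than whitespace
```

The trade-off is that numbers glued to other text, such as number20558789yes, are no longer removed, so this only fits if the whitespace assumption really holds for the site's users.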