Regular expression match, extracting only wanted segments of string

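I'm trying to extract three segments from a string. As I'm not particularly good with regular expressions, I think what I've done could probably be done better.

I would like to extract the three ANYTHING_HERE parts of the following string:

Some Text: ANYTHING_HERE (Old=ANYTHING_HERE, New=ANYTHING_HERE)

Some examples could be:

ABC: Some_Field (Old=,New=123)

ABC: Some_Field (Old=ABCde,New=1234)

ABC: Some_Field (Old=Hello World,New=Bye Bye World)

So, for the first example, this would return the following matches: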

$matches[0] = 'Some_Field';
$matches[1] = '';
$matches[2] = '123';

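So far I have the following code:

preg_match_all('/^([a-z]*\:(\s?)+)(.+)(\s?)+\(old=(.+)\,(\s?)+new=(.+)\)/i',$string,$matches);

The issue with the above is that it returns a match for every individual segment of the string. I'm not sure how to use a regular expression to ensure the string is in the correct format without capturing and storing each match, if that makes sense?

So, my question, if it's not already clear: how can I retrieve just the segments that I want from the above string?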

You don't need preg_match_all. You can use this preg_match call:

$s = 'SOMETEXT: ANYTHING_HERE (Old=ANYTHING_HERE1, New=ANYTHING_HERE2)';
if (preg_match('/[^:]*:\s*(\w*)\s*\(Old=(\w*),\s*New=(\w*)/i', $s, $arr))
   print_r($arr);

Output:

Array
(
    [0] => SOMETEXT: ANYTHING_HERE (Old=ANYTHING_HERE1, New=ANYTHING_HERE2
    [1] => ANYTHING_HERE
    [2] => ANYTHING_HERE1
    [3] => ANYTHING_HERE2
)
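
As a minimal sketch of how that call could be used to hand back only the three wanted segments: the helper name `extractSegments` is hypothetical and not part of the answer, and the pattern is the one shown above, unchanged.

```php
<?php
// Hypothetical helper wrapping the preg_match call from the answer above.
function extractSegments(string $line): ?array
{
    $pattern = '/[^:]*:\s*(\w*)\s*\(Old=(\w*),\s*New=(\w*)/i';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // the line is not in the expected format
    }
    // $m[0] is the whole match; the three wanted segments are $m[1] to $m[3].
    return array_slice($m, 1);
}

print_r(extractSegments('ABC: Some_Field (Old=,New=123)'));
var_dump(extractSegments('ABC: Some_Field (Old=Hello World,New=Bye Bye World)'));
```

Because `\w` does not match spaces, the second call returns null: values such as "Hello World" would need a wider class than `\w*`.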

if(preg_match_all('/([a-z]*)\:\s*.+\(Old=(.+),\s*New=(.+)\)/i',$string,$matches)) {
    print_r($matches);
}

Example:

$string = 'ABC: Some_Field (Old=Hello World,New=Bye Bye World)';

Will match:

Array
(
    [0] => Array
        (
            [0] => ABC: Some_Field (Old=Hello World,New=Bye Bye World)
        )
    [1] => Array
        (
            [0] => ABC
        )
    [2] => Array
        (
            [0] => Hello World
        )
    [3] => Array
        (
            [0] => Bye Bye World
        )
)

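The problem is that you're using more parentheses than you need, and so you're capturing more segments of the input than you want.

For example, each (\s?)+ segment should just be \s*

The regular expression you're looking for is:

[^:]+:\s*(.+)\s*\(old=(.*)\s*,\s*new=(.*)\)

In PHP:

preg_match_all('/[^:]+:\s*(.+)\s*\(old=(.*)\s*,\s*new=(.*)\)/i',$string,$matches);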
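
One detail worth checking with this pattern: the greedy (.+) in the first group also takes the space in front of the opening parenthesis, because the \s* after it is allowed to match nothing. The sketch below compares the pattern as given with an assumed variant that uses a lazy (.+?) instead. It uses preg_match rather than preg_match_all only to keep the result array flat.

```php
<?php
$string = 'ABC: Some_Field (Old=Hello World,New=Bye Bye World)';

// Pattern from the answer: the greedy (.+) also swallows the space before "(".
preg_match('/[^:]+:\s*(.+)\s*\(old=(.*)\s*,\s*new=(.*)\)/i', $string, $greedy);

// Assumed variant: (.+?) is lazy, so the following \s* gets to consume that space.
preg_match('/[^:]+:\s*(.+?)\s*\(old=(.*)\s*,\s*new=(.*)\)/i', $string, $lazy);

var_dump($greedy[1]); // string(11) "Some_Field "
var_dump($lazy[1]);   // string(10) "Some_Field"
```

Calling trim() on the captured value would be another way to get the same result without changing the pattern.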

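A useful tool can be found here: http://www.myregextester.com/index.php

The tool has an "Explain" checkbox (along with a "PHP" checkbox and an "i" flag checkbox, both of which you'll want to select) that gives a full explanation of the regular expression. For posterity, I've included the explanation below: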

NODE                     EXPLANATION
----------------------------------------------------------------------
(?i-msx:                 group, but do not capture (case-insensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  [^:]+                    any character except: ':' (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .+                       any character except \n (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  \(                       '('
----------------------------------------------------------------------
  old=                     'old='
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  ,                        ','
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  new=                     'new='
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \)                       ')'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

What about something simpler like this ^_^

[:=]\s*([\w\s]*)

Live demo
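
A minimal sketch of how this shorter pattern might be used from PHP, assuming preg_match_all is the intended call (the answer does not say). Each match starts at a ":" or "=" and captures the word characters and whitespace that follow.

```php
<?php
$string = 'ABC: Some_Field (Old=,New=123)';

// One match per ":" or "=" in the input; group 1 is the text that follows it.
preg_match_all('/[:=]\s*([\w\s]*)/', $string, $matches);

// $matches[1] holds the captured values; trim() removes the space
// that [\w\s]* picks up after Some_Field.
$segments = array_map('trim', $matches[1]);
print_r($segments);
```

This pattern does not check for the literal Old= and New= labels, so any other ":" or "=" in the input would produce an extra match.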

:\s*([^(\s]+)\s*\(Old=([^,]*),New=([^)]*)

Live demo

Also, let me know if you need an explanation.
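
For reference, the second pattern can be written out in PCRE extended mode (the x modifier), with one comment per part. This is only a sketch: the function name `extractWithNegatedClasses` is hypothetical, and the pattern is the one above with whitespace and comments added.

```php
<?php
// Hypothetical helper: the negated-character-class pattern in extended (x) mode.
function extractWithNegatedClasses(string $line): ?array
{
    $pattern = '/
        :\s*         # the colon after the prefix, then optional whitespace
        ([^(\s]+)    # group 1: field name, up to whitespace or an opening parenthesis
        \s*\(Old=    # optional whitespace, then the literal "(Old="
        ([^,]*)      # group 2: old value, everything before the comma (may be empty)
        ,New=        # the literal ",New="
        ([^)]*)      # group 3: new value, everything before the closing parenthesis
    /x';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // the line is not in the expected format
    }
    return array_slice($m, 1); // drop the full match at index 0
}

print_r(extractWithNegatedClasses('ABC: Some_Field (Old=Hello World,New=Bye Bye World)'));
```

Each negated class stops at the delimiter that ends its segment, so values containing spaces are captured and no backtracking over a greedy (.+) is needed.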