Regex mixing greedy and non-greedy?

I have a string that I'm trying to extract easy-to-work-with data from. For this example, I want the revenue as well as the consensus figures.

$digits = '[\$]?[\d]{1,3}(?:[\.][\d]{1,2})?';
$price = '(?:' . $digits . '(?:[\-])?' . $digits . '[\s]?(?:million|billion)?)';
$str = 'revenue of $31-34 billion, versus the consensus of $29.3 billion';
preg_match_all('/(?:revenue|consensus)(?:.*)' . $price . '/U', $str, $matches[]);
print_r($matches);

Returns:

Array (
    [0] => Array (
        [0] => Array (
            [0] => 'revenue of $31'
            [1] => 'consensus of $29'
        )
    )
)
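The short matches come from what the U (PCRE_UNGREEDY) modifier does: it inverts the greediness of every quantifier in the pattern, not only the leading .*. Inside $price, {1,3}, {1,2} and each ? therefore also settle for the fewest characters that still let the match succeed, which is how $31-34 billion shrinks to $31. A minimal sketch on a made-up string of digits:

```php
<?php
// Without U, \d{1,3} is greedy and takes as many digits as it can (up to 3).
preg_match('/\d{1,3}/', '12345', $greedy);

// With U, the same quantifier becomes lazy and takes as few digits as it can.
preg_match('/\d{1,3}/U', '12345', $ungreedy);

// Under U, a trailing ? flips an individual quantifier back to greedy.
preg_match('/\d{1,3}?/U', '12345', $flippedBack);

echo $greedy[0], "\n";      // 123
echo $ungreedy[0], "\n";    // 1
echo $flippedBack[0], "\n"; // 123
```

Because U affects the whole pattern, it is a blunt tool when only one wildcard should be lazy.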

What I'm expecting:

Array (
    [0] => Array (
        [0] => Array (
            [0] => 'revenue of $31-34 billion'
            [1] => 'consensus of $29.3 billion'
        )
    )
)

When I omit the U modifier:

Array (
    [0] => Array (
        [0] => Array (
            [0] => 'revenue of $31-34 billion, versus the consensus of $29.3 billion'
        )
    )
)
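Without U, the wildcard (?:.*) is greedy: it first consumes the rest of the string and then gives characters back only until $price can match, so the single match ends at the last price in the string. A lazy .*? instead grows one character at a time and stops at the first position where the rest of the pattern fits. A small sketch with a hypothetical input string:

```php
<?php
$text = 'a1b then a2b';

// Greedy: .* runs to the end of the string, then backtracks to the last 'b',
// so the whole string is a single match.
preg_match_all('/a.*b/', $text, $greedy);

// Lazy: .*? expands one character at a time and stops at the first 'b',
// so each "a...b" pair is matched separately.
preg_match_all('/a.*?b/', $text, $lazy);

echo implode(' | ', $greedy[0]), "\n"; // a1b then a2b
echo implode(' | ', $lazy[0]), "\n";   // a1b | a2b
```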

I can't use "of" (as in revenue of $31-34 billion) as a fixed part of the pattern, because the data may or may not contain it, so I used (?:.*) instead.

preg_match_all('/(?:revenue|consensus)(?:.*?)' . $price . '/', $str, $matches[]);
                                           ^               ^  

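You can make one particular wildcard non-greedy by appending ?, as in .*?. Drop the global /U modifier, change the wildcard above to non-greedy, and leave $digits and $price alone. This gives: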

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => revenue of $31-34 billion
                    [1] => consensus of $29.3 billion
                )
        )
)
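Putting the answer together as one self-contained script (a sketch reusing the question's $digits and $price unchanged): the original call passes $matches[], which appends a new element to $matches and binds that element as the output array, which is where the extra outer [0] => Array level in the printed results comes from. Passing $matches directly removes that level.

```php
<?php
$digits = '[\$]?[\d]{1,3}(?:[\.][\d]{1,2})?';
$price  = '(?:' . $digits . '(?:[\-])?' . $digits . '[\s]?(?:million|billion)?)';
$str    = 'revenue of $31-34 billion, versus the consensus of $29.3 billion';

// Only the leading wildcard is lazy; the quantifiers inside $price stay greedy,
// so each price is matched in full.
$count = preg_match_all('/(?:revenue|consensus)(?:.*?)' . $price . '/', $str, $matches);

// $matches[0] holds the full-pattern matches.
print_r($matches[0]);
```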