Is this conditional regex the most efficient?

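I'll give my example in PHP. I'm testing whether a quoted string is properly closed (e.g. a quoted string must end with a double quote if it started with one). There must be at least 1 character between the quotes, and the characters between the quotes cannot include the same starting/ending quote character. For example: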

$myString = "hello";// 'hello' also good but "hello' should fail
if (preg_match("/^(\")?[^\"]+(?(1)\")|(\')?[^\']+(?(1)\')$/", $myString)) {
    die('1');
} else {
    die('2');
}
// The string '1' is outputted which is correct

I'm new to conditional regexes, but it seems to me that I can't make the preg_match() any simpler. Is that correct?

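To do this, there is no need for the "conditional feature". But you do need to check the string from start to end (in other words, you can't check only part of the string):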

preg_match('~\A[^"\']*+(?:"[^"\\\\]*+(?:\\\\.[^"\\\\]*)*+"[^"\']*|\'[^\'\\\\]*+(?:\\\\.[^\'\\\\]*)*+\'[^"\']*)*+\z~s', $str)

If you absolutely want at least one character inside the quotes, you need to add these lookaheads, (?=[^"]) and (?=[^']):

preg_match('~\A[^"\']*+(?:"(?=[^"])[^"\\\\]*+(?:\\\\.[^"\\\\]*)*+"[^"\']*|\'(?=[^\'])[^\'\\\\]*+(?:\\\\.[^\'\\\\]*)*+\'[^"\']*)*+\z~s', $str)

Details:

~
\A  # start of the string
[^"']*+ #"# all that is not a quote
(?:
    " #"# opening quote
    (?=[^"]) #"# at least one character that isn't a quote
    [^"\\]*+ #"# all characters that are not quotes or backslashes
    (?:\\.[^"\\]*)*+ #"# an escaped character and the same (zero or more times)
    " #"# closing quote
    [^"']*
  | #"# or same thing for single quotes
    '(?=[^'])[^'\\]*+(?:\\.[^'\\]*)*+'[^"']*
)*+
\z  # end of the string
~s  # singleline mode: the dot matches newlines too
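
The annotated layout above can also be run directly if the `x` (extended) flag is added, so that whitespace and `#` comments in the pattern are ignored. The following is only a sketch under that assumption: the variable names and sample strings are illustrative and are not part of the original answer.

```php
<?php
// Extended-mode (x flag) form of the lookahead variant, stored in a nowdoc
// so that no PHP-level escaping is needed. $balancedQuotes is a made-up name.
$balancedQuotes = <<<'REGEX'
~
\A                    # start of the string
[^"']*+               # everything that is not a quote
(?:
    "(?=[^"])         # opening double quote, next character is not a double quote
    [^"\\]*+          # characters that are neither a double quote nor a backslash
    (?:\\.[^"\\]*)*+  # an escaped character, then more ordinary characters
    "                 # closing double quote
    [^"']*            # text up to the next quote
  |
    '(?=[^'])[^'\\]*+(?:\\.[^'\\]*)*+'[^"']*   # same thing for single quotes
)*+
\z                    # end of the string
~sx
REGEX;

// Illustrative inputs mapped to the expected preg_match() result.
$samples = [
    '"hello"'              => 1,
    "'hello'"              => 1,
    '"hello\''             => 0, // opened with ", closed with '
    '""'                   => 0, // empty quoted string, rejected by the lookahead
    'say "hi" and \'bye\'' => 1, // several quoted parts in one string
    '"a \\" b"'            => 1, // escaped quote inside the quoted part
];

foreach ($samples as $input => $expected) {
    printf("%s => %d (expected %d)\n", $input, preg_match($balancedQuotes, $input), $expected);
}
```

Note that a string containing no quotes at all also matches, because the outer group may repeat zero times; the pattern checks that every quote in the whole string is balanced rather than that the string is a single quoted token.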

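Note that these patterns are designed to handle escaped characters.

Most of the time, a conditional can be replaced with a simple alternation.

By the way: don't assume that a shorter pattern is always better than a longer one. That is a mistaken idea.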
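
As a rough illustration of replacing a conditional with an alternation, the rules stated in the question (same quote at both ends, at least one character inside, and no occurrence of that quote in the body) can be written with one branch per quote type. This is a minimal sketch, not the answer's own pattern; the function name `isSimplyQuoted` is hypothetical, and unlike the longer patterns it makes no attempt to handle escaped quotes.

```php
<?php
// Hypothetical helper: true when $s is a single quoted string with at least
// one character inside and no occurrence of its own quote character.
function isSimplyQuoted(string $s): bool
{
    // One branch per quote type; \A and \z anchor the whole string.
    return preg_match('/\A(?:"[^"]+"|\'[^\']+\')\z/', $s) === 1;
}

var_dump(isSimplyQuoted('"hello"'));  // bool(true)
var_dump(isSimplyQuoted("'hello'"));  // bool(true)
var_dump(isSimplyQuoted('"hello\'')); // bool(false): mismatched quotes
var_dump(isSimplyQuoted('""'));       // bool(false): nothing between the quotes
```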

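Based on the two observations below, I built a regular expression that is simple and fast but does not handle escaped quotes:

  • The OP was specifically asked whether the string $str = "hello, I said: \"How are you?\""; is invalid, and did not respond.
  • The OP mentioned performance (efficiency as the criterion).

I also dislike code that is hard to read, so I used the <<< Nowdoc notation to avoid escaping anything in the regex pattern.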
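
To make the escaping point concrete, here is a small sketch that compares a nowdoc pattern with the equivalent single-quoted PHP string. The pattern itself (a quote character followed by a literal backslash) is made up purely for illustration.

```php
<?php
// Illustrative pattern: a quote character followed by a literal backslash.
// In a nowdoc, the regex is written exactly as PCRE will see it.
$nowdoc = <<<'RX'
/['"]\\/
RX;

// The same pattern as a single-quoted PHP string needs the quote escaped
// and every backslash doubled.
$singleQuoted = '/[\'"]\\\\/';

var_dump($nowdoc === $singleQuoted);    // bool(true)
var_dump(preg_match($nowdoc, 'a"\\b')); // int(1): the subject is a"\b
```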

My solution:

$strings = [
    "'hello's the word'",
    "'hello is the word'",
    '"hello "there" he said"',
    '"hello there he said"',
    '"Hi',
    "'hello",
    "no quotes",
    "''"
];
$regexp = <<< 'TEXT'
/^('|")(?:(?!\1).)+\1$/
TEXT;
foreach ($strings as $string):
    echo "$string - ".(preg_match($regexp,$string)?'true':'false')."<br/>";
endforeach;

Output:

'hello's the word' - false
'hello is the word' - true
"hello "there" he said" - false
"hello there he said" - true
"Hi - false
'hello - false
no quotes - false
'' - false

How it works:

^('|")   //starts with single or double-quote
(?:      //non-capturing group
  (?!\1) //next char is not the same as first single/double quote
  .      //advance one character
)+       //repeat group with next char (there must be at least one char)
\1$      //End with the same single or double-quote that started the string
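
Because the lookahead `(?!\1)` only compares each character with the opening quote, a backslash is treated as an ordinary character. The sketch below, with illustrative variable names, shows what that means for the string from the first observation: it is rejected, while the other kind of quote is allowed inside the body.

```php
<?php
$regexp = <<<'TEXT'
/^('|")(?:(?!\1).)+\1$/
TEXT;

// Backslash-escaped double quotes inside a double-quoted string.
// In a single-quoted PHP literal, \" stays as a backslash plus a quote.
$escaped = '"hello, I said: \"How are you?\""';

// A single quote inside a double-quoted string.
$mixed = '"it\'s fine"';

var_dump(preg_match($regexp, $escaped)); // int(0): the inner " ends the match early
var_dump(preg_match($regexp, $mixed));   // int(1): ' differs from the opening "
```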