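I'll give my example in PHP. I'm testing whether a quoted string is closed correctly (for example, a quoted string must end with a double quote if it starts with one). There must be at least 1 character between the quotes, and the characters between the quotes must not include the same starting/ending quote character. For example: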
$myString = "hello";// 'hello' also good but "hello' should fail
if (preg_match("/^(\")?[^\"]+(?(1)\")|(\')?[^\']+(?(1)\')$/", $myString)) {
    die('1');
} else {
    die('2');
}
// The string '1' is outputted which is correct
I'm new to conditional regular expressions, but it seems to me that I can't make the preg_match() any simpler. Is this correct?
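You don't need the "conditional feature" to do this. But you do need to check the string from start to end (in other words, you can't check only a part of the string):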
preg_match('~\A[^"\']*+(?:"[^"\\\\]*+(?:\\\\.[^"\\\\]*)*+"[^"\']*|\'[^\'\\\\]*+(?:\\\\.[^\'\\\\]*)*+\'[^"\']*)*+\z~s', $str)
If you absolutely want at least one character inside the quotes, you need to add the lookaheads (?=[^"]) and (?=[^']):
preg_match('~\A[^"\']*+(?:"(?=[^"])[^"\\\\]*+(?:\\\\.[^"\\\\]*)*+"[^"\']*|\'(?=[^\'])[^\'\\\\]*+(?:\\\\.[^\'\\\\]*)*+\'[^"\']*)*+\z~s', $str)
Details:
~
\A                     # start of the string
[^"']*+                # all that is not a quote
(?:
    "                  # opening quote
    (?=[^"])           # at least one character that isn't a quote
    [^"\\]*+           # all characters that are not quotes or backslashes
    (?:\\.[^"\\]*)*+   # an escaped character and the same (zero or more times)
    "                  # closing quote
    [^"']*
  |                    # or same thing for single quotes
    '(?=[^'])[^'\\]*+(?:\\.[^'\\]*)*+'[^"']*
)*+
\z                     # end of the string
~s                     # singleline mode: the dot matches newlines too
Note that these patterns are designed to handle escaped characters.
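As a quick check, here is a minimal sketch that runs the pattern with the lookaheads against a few inputs. The wrapper name hasBalancedQuotes() and the sample strings are assumptions made for illustration; they are not part of the answer.

```php
<?php
// Hypothetical wrapper around the pattern with the "at least one character" lookaheads.
function hasBalancedQuotes(string $str): bool
{
    // Single-quoted PHP string: \\\\ becomes \\ in the regex (a literal backslash),
    // and \' becomes a plain single quote.
    $pattern = '~\A[^"\']*+(?:"(?=[^"])[^"\\\\]*+(?:\\\\.[^"\\\\]*)*+"[^"\']*'
             . '|\'(?=[^\'])[^\'\\\\]*+(?:\\\\.[^\'\\\\]*)*+\'[^"\']*)*+\z~s';

    return preg_match($pattern, $str) === 1;
}

// Illustrative inputs only.
$samples = [
    'say "hello" now',        // balanced double quotes          => valid
    "it is 'fine' here",      // balanced single quotes          => valid
    'an "escaped \" quote"',  // backslash-escaped quote inside  => valid
    '"hello\'',               // opens with " but ends with '    => invalid
    'empty "" pair',          // nothing between the quotes      => invalid
];

foreach ($samples as $s) {
    echo $s, ' => ', hasBalancedQuotes($s) ? 'valid' : 'invalid', PHP_EOL;
}
```

Because the pattern is anchored with \A and \z and uses possessive quantifiers, a single unbalanced quote anywhere makes the whole match fail without backtracking.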
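Most of the time, a conditional can be replaced with a simple alternation.
By the way: don't assume that a shorter pattern is always better than a longer one; that is a mistaken idea.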
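To illustrate the point about alternation, here is a minimal sketch for the simpler case where the whole string must be exactly one quoted string. It assumes the quotes are required (the question's pattern makes them optional) and it does not handle escaped quotes; isQuotedString() is a hypothetical name.

```php
<?php
// Hypothetical helper: one branch per quote type instead of a (?(1)...) conditional.
function isQuotedString(string $s): bool
{
    // "..." with at least one non-" character, or '...' with at least one non-' character.
    return preg_match('/\A(?:"[^"]+"|\'[^\']+\')\z/', $s) === 1;
}

var_dump(isQuotedString('"hello"'));   // bool(true)
var_dump(isQuotedString("'hello'"));   // bool(true)
var_dump(isQuotedString('"hello\''));  // bool(false): opens with " but closes with '
var_dump(isQuotedString('""'));        // bool(false): nothing between the quotes
```

Each branch carries its own opening quote, character class and closing quote, so nothing has to remember which quote started the string.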
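Based on the two observations below, I built a simple and fast regex, but it does not handle escaped quotes.
- The OP was specifically asked whether the string
$str = "hello, I said: \"How are you?\""
would be invalid, and did not respond.
- The OP mentioned performance (efficiency as a criterion).
I also dislike hard-to-read code, so I use the <<< Nowdoc notation to avoid having to escape anything in the regex pattern.
My solution: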
$strings = [
"'hello's the word'",
"'hello is the word'",
'"hello "there" he said"',
'"hello there he said"',
'"Hi',
"'hello",
"no quotes",
"''"
];
$regexp = <<< 'TEXT'
/^('|")(?:(?!\1).)+\1$/
TEXT;
foreach ($strings as $string):
echo "$string - ".(preg_match($regexp,$string)?'true':'false')."<br/>";
endforeach;
Output:
'hello's the word' - false
'hello is the word' - true
"hello "there" he said" - false
"hello there he said" - true
"Hi - false
'hello - false
no quotes - false
'' - false
How it works:
^('|")      //starts with single or double-quote
(?:         //non-capturing group
  (?!\1)    //next char is not the same as first single/double quote
  .         //advance one character
)+          //repeat group with next char (there must be at least one char)
\1$         //End with the same single or double-quote that started the string
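The same backreference idea can also return the text between the quotes. The following is a sketch only: unquote() is a hypothetical helper, and the extra capture group around the repeated part is an addition to the pattern above.

```php
<?php
// Hypothetical helper: returns the text between matching quotes, or null if invalid.
function unquote(string $s): ?string
{
    // Nowdoc keeps \1 literal, so nothing needs escaping. The closing marker is
    // kept at column 0 so the block also parses on PHP versions before 7.3.
    $regexp = <<<'TEXT'
/^('|")((?:(?!\1).)+)\1$/
TEXT;

    // Group 1 is the opening quote, group 2 is everything between the quotes.
    return preg_match($regexp, $s, $m) === 1 ? $m[2] : null;
}

var_dump(unquote("'hello is the word'")); // string(17) "hello is the word"
var_dump(unquote('"it\'s"'));             // string(4) "it's": the other quote type is allowed inside
var_dump(unquote('"a \" b"'));            // NULL: escaped quotes are not handled
```

The last call shows the limitation stated above: a backslash-escaped quote still counts as the closing quote, so the string is rejected.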