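I found a PHP function that scrapes an item title from a web page. It uses the regular expression /<div class=\"detail\">(.*?)<p>/si, as shown in the code below. I understand that /<div class=\"detail\"> tries to match a specific div, and that (.*?)<p> matches any characters after that div and before the next <p>, non-greedily. But what does /si mean? Thanks!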
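<?php
// Get the title: capture the text between <div class="detail"> and the next <p>.
function match_title( $content ) {
    // preg_match() returns 1 on a match; $result[1] then holds the captured group.
    if ( preg_match( '/<div class=\"detail\">(.*?)<p>/si', $content, $result ) === 1 ) {
        return trim( addslashes( $result[1] ) );
    }
    return '';
}
$url = "http://a.m.taobao.com/i21708516412.htm";
$item = file_get_contents( $url );
// file_get_contents() returns false on failure, so guard before matching.
$title = ( $item !== false ) ? match_title( $item ) : '';
?>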
See the full list of modifiers: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
i (PCRE_CASELESS)
If this modifier is set, letters in the pattern match both upper and lower case letters.
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
In short: the dot also matches newlines, and the expression is case-insensitive.
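To see what each modifier contributes, here is a minimal sketch that runs the question's pattern with and without them. The sample markup in $html is made up for illustration (it is not the real page), with uppercase tags and a line break inside the div so that both modifiers are needed.

```php
<?php
// Hypothetical sample markup: uppercase tags and a newline before the title.
$html = "<DIV CLASS=\"detail\">\n  Sample item title\n<P>price</P></DIV>";

// The same pattern with each combination of modifiers.
$patterns = [
    'no modifiers' => '/<div class="detail">(.*?)<p>/',
    'i only'       => '/<div class="detail">(.*?)<p>/i',
    's only'       => '/<div class="detail">(.*?)<p>/s',
    'si'           => '/<div class="detail">(.*?)<p>/si',
];

$results = [];
foreach ( $patterns as $label => $pattern ) {
    // Store the trimmed capture on a match, or null when nothing matched.
    $results[$label] = preg_match( $pattern, $html, $m ) === 1 ? trim( $m[1] ) : null;
}

var_dump( $results );
```

With this input only the 'si' variant should match: without i the uppercase tags do not match the lowercase pattern, and without s the dot stops at the line break before reaching <P>.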
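/si
are regex pattern modifiers (matching modes).
i
makes the regex match case-insensitively.
s
enables "single-line mode". In this mode, the dot also matches newline characters.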
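The dot behaviour can be checked in isolation. The following is a small sketch on a made-up two-line string; it also illustrates the point from the PHP manual that a negated class such as [^a] matches a newline whether or not s is set.

```php
<?php
// A two-line test string: "x", a newline, then "y".
$text = "x\ny";

// Without s, the dot does not match the newline, so there is no match.
$plain_dot = preg_match( '/x.y/', $text );

// With s, the dot matches the newline.
$dotall = preg_match( '/x.y/s', $text );

// A negated class matches the newline even without s.
$negated_class = preg_match( '/x[^a]y/', $text );

var_dump( $plain_dot, $dotall, $negated_class ); // int(0) int(1) int(1)
```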
See also: Regex matching modes