I'm using:
public function __construct()
{
$this->EE =& get_instance();
$regex = '/(\S+@\S+\.\S+)/';
$replace = '<a href="mailto:$1">$1</a>';
$this->return_data = preg_replace($regex, $replace, ee()->TMPL->tagdata);
}
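It finds plain-text email addresses and turns them into mailto links, but the WYSIWYG editor places the closing paragraph tag right after the address, so the closing tag gets captured and ends up inside the mailto link. I need my regex to exclude anything after the .com, .net, or whatever the ending is. How can I do that?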
Right now it is returning mailto:email@domain.com</p>, and I need to exclude any and all tags after the .com. Here is part of the dump; this is what gets output:
<br />
Preston Newbill<br />
Manager<br />
pnewbill@domain.com</p>
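For illustration, here is a minimal standalone reproduction of the problem. It is a sketch that assumes the dump above as input: $tagdata is a hypothetical stand-in for ee()->TMPL->tagdata, since the ExpressionEngine template object is not available outside a plugin. Because \S matches any non-whitespace character, including <, / and >, the last \S+ runs straight through the closing </p>.

```php
<?php
// Hypothetical stand-in for ee()->TMPL->tagdata, built from the dump above.
$tagdata = "<br />\nPreston Newbill<br />\nManager<br />\npnewbill@domain.com</p>";

// The regex from the question: \S also matches '<', '/' and '>',
// so the final \S+ swallows the closing </p> tag.
$regex   = '/(\S+@\S+\.\S+)/';
$replace = '<a href="mailto:$1">$1</a>';
$result  = preg_replace($regex, $replace, $tagdata);

echo $result, "\n";
```

The last line of the output contains mailto:pnewbill@domain.com</p>, which is exactly the broken link described above.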
A very basic regex for grabbing email addresses without matching any HTML tags:
[\w\.]+@[\w\.\-]+
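Explained:
- \w: stands for "word character", usually [A-Za-z0-9_]. Note that it includes underscores and digits.
- \.: an escaped dot.
- [\w\.]+: matches one or more word characters and dots.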
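Applied to the dump from the question, this basic pattern stops at the first character that is not a word character or a dot (or, after the @, a hyphen), so the closing </p> stays outside the link. This is a sketch, not a complete address matcher; $tagdata is again a hypothetical stand-in for ee()->TMPL->tagdata.

```php
<?php
// Hypothetical stand-in for ee()->TMPL->tagdata, built from the dump above.
$tagdata = "<br />\nPreston Newbill<br />\nManager<br />\npnewbill@domain.com</p>";

// [\w.]+@[\w.\-]+ wrapped in a capture group: '<', '/' and '>' are in
// neither character class, so the match ends right before </p>.
$regex   = '/([\w\.]+@[\w\.\-]+)/';
$replace = '<a href="mailto:$1">$1</a>';
$result  = preg_replace($regex, $replace, $tagdata);

echo $result, "\n";
```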
Unfortunately, this does not match every possible email address. See this question for more details.
A fully RFC-822-compliant regex (source) would be:
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
 \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
 \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
 \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)
You could try changing the regex to the following:
/(\S+@\S+\.[^\<]+)/
This will stop capturing when it hits the first < after the top-level domain.
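A quick check of that pattern against the same dump, plus a second, made-up line that shows a side effect worth knowing about: [^\<]+ also accepts spaces, so any text between the address and the next tag ends up inside the link. Both input strings are hypothetical samples, and $tagdata stands in for ee()->TMPL->tagdata.

```php
<?php
$replace = '<a href="mailto:$1">$1</a>';
$regex   = '/(\S+@\S+\.[^\<]+)/';

// Hypothetical stand-in for ee()->TMPL->tagdata, built from the dump above.
$tagdata = "<br />\nPreston Newbill<br />\nManager<br />\npnewbill@domain.com</p>";
// [^\<]+ stops at the '<' of </p>, so the tag stays outside the link.
$result = preg_replace($regex, $replace, $tagdata);

// Made-up sample: [^\<]+ also matches spaces, so " today" is captured too.
$caveat = preg_replace($regex, $replace, "Mail a@b.com today<br />");

echo $result, "\n", $caveat, "\n";
```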
@ukliviu has proposed a stricter approach that will produce even fewer false positives than just stopping at HTML tags.
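Broadly speaking, trying to handle HTML markup with regular expressions is a bad idea. Your results will vary, far too much for a reliable script. If you need to parse HTML, use the HTML parser available in PHP, DomDocument.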
Getting rid of the HTML is even simpler. You can use strip_tags to remove any and all HTML from a string, even broken markup. Your code could simply be:
$this->return_data = strip_tags(ee()->TMPL->tagdata);
Proof of concept:
$sample1 = 'mailto:email@domain.com</p>';
echo 'dirty: '.htmlentities($sample1).', clean: '.htmlentities(strip_tags($sample1));
// output: dirty: mailto:email@domain.com</p>, clean: mailto:email@domain.com
See it in action here: http://codepad.viper-7.com/KHsIr0
One function call, and no crazy regex to maintain.
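If you still want the mailto links, one way to combine the two ideas is to strip the markup first and then run the original regex from the question on the plain text. Inside the plugin that would presumably be preg_replace($regex, $replace, strip_tags(ee()->TMPL->tagdata)); the standalone sketch below uses a hypothetical $tagdata string instead. Note that strip_tags also removes the <br /> line breaks. Its optional second argument can keep selected tags, but then \S+ can once again run into a kept tag such as <br.

```php
<?php
// Hypothetical stand-in for ee()->TMPL->tagdata, built from the dump above.
$tagdata = "<br />\nPreston Newbill<br />\nManager<br />\npnewbill@domain.com</p>";

// 1. Remove all tags, so \S+ has no markup left to swallow.
$text = strip_tags($tagdata);

// 2. Linkify with the regex from the question.
$result = preg_replace('/(\S+@\S+\.\S+)/', '<a href="mailto:$1">$1</a>', $text);

echo $result, "\n";
```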
Here is an example of how to do this with DomDocument:
// create a new DomDocument object
$doc = new DOMDocument();
// load the HTML into the DomDocument object (this would be your source HTML)
libxml_use_internal_errors(true);
$doc->loadHTML('
<p>
<br>
Preston Newbill<br>
Manager<br>
pnewbill@domain.com<br>
<a href="mailto:noob@aol.com">also email me @ noob@aol.com</a><br>
Party 9/15/2013@10:00pm!
');
libxml_clear_errors();
// grab the body, recursively check for child nodes. Turn any email addresses into links
$body = $doc->getElementsByTagName('body')->item(0);
checkDomNodeForEmailAddress($body);
// strip off the doctype and the html and body wrappers
$doc->removeChild($doc->firstChild);
$doc->replaceChild($doc->firstChild->firstChild->firstChild, $doc->firstChild);
die('<hr>final product:'.htmlentities($doc->saveHtml()));
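function checkDomNodeForEmailAddress(DOMNode $domNode) {
// copy the live node list first, because matching text nodes get replaced below
foreach (iterator_to_array($domNode->childNodes) as $node) {
if ($node->hasChildNodes()) {
// recurse into child elements, but leave existing <a> links alone
if (strtolower($node->nodeName) != 'a')
checkDomNodeForEmailAddress($node);
} elseif ($node instanceof DOMText) {
// assigning markup to nodeValue would only store it as escaped text,
// so build the link markup and import it as a document fragment instead
$html = preg_replace('/(\S+@\S+\.[^\<]+)/', '<a href="mailto:$1">$1</a>', htmlspecialchars($node->nodeValue), -1, $count);
if ($count > 0) {
$fragment = $node->ownerDocument->createDocumentFragment();
$fragment->appendXML($html);
$node->parentNode->replaceChild($fragment, $node);
}
}
}
}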
Try it here: http://codepad.viper-7.com/EpdBKx
Documentation
- strip_tags - http://php.net/manual/en/function.strip-tags.php
- DomDocument - http://php.net/manual/en/class.domdocument.php