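I have a text string and use a PHP regex to pull the URLs out of it. There can be any number of links, so I am using
preg_match_all
The problem is that, for some reason, when I put in one link it thinks there are three. When I run array_unique, it filters out the middle value but not the last one.
Here is the code: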
$bodyMessage = imap_body($hMail,$idxMsg);
$bodyMessage = quoted_printable_decode($bodyMessage);
preg_match_all('((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)', $bodyMessage, $matches, PREG_PATTERN_ORDER);
$links = array_unique($matches[0]);
print_r($links);
The output of print_r($links) is:
Array ( [0] => http://usnews.msnbc.msn.com/_news/2012/07/20/12861792-6-year-old-girl-confirmed-to-have-been-killed-in-colorado-theater-shootings?lite
[2] => http://usnews.msnbc.msn.com/_news/2012/07/20/12861792-6-year-old-girl-confirmed-to-have-been-killed-in-colorado-theater-shootings?lite
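When two entries print identically yet both survive array_unique, the strings usually differ in a character that print_r does not show. A minimal sketch for checking this, using made-up sample data (the example.com URLs and the trailing "\r" are assumptions for illustration, not taken from the actual message):

```php
<?php
// Hypothetical data: two URLs that look the same when printed,
// but the second one carries a trailing carriage return.
$links = [
    0 => "http://example.com/page?lite",
    2 => "http://example.com/page?lite\r",
];

foreach ($links as $key => $link) {
    // strlen() and json_encode() expose characters that print_r() hides.
    printf("[%d] length=%d %s\n", $key, strlen($link), json_encode($link));
}

// array_unique() compares the raw strings, so both entries are kept.
$unique = array_unique($links);
```

If the lengths differ, the extra bytes shown by json_encode indicate what needs to be stripped before deduplicating.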
The email body being parsed is:
--20cf300e4d7d02c34004c55e1489 Content-Type: text/plain; charset=ISO-8859-1 @bill http://usnews.msnbc.msn.com/_news/2012/07/20/12861792-6-year-old-girl-confirmed-to-have-been-killed-in-colorado-theater-shootings?lite --20cf300e4d7d02c34004c55e1489 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable @bill
Any ideas? Thanks
Edit:
I followed the suggestion to trim the values, and that returned an empty array:
function trims($l){
    trim($l);
}
$links = $matches[0];
$trimmedLinks = array_map("trims", $links);
$trimmedLinks = array_unique($trimmedLinks);
print_r($trimmedLinks); // = Array ( [0] => )
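Note that the trims callback above has no return statement, so array_map receives null for every element, which would explain the empty result. A minimal sketch of the same step with the built-in trim passed directly as the callback (the $links values here are hypothetical sample data):

```php
<?php
// Hypothetical sample data with stray whitespace around the URLs.
$links = [
    "http://example.com/a?lite",
    "http://example.com/a?lite\r\n",
    " http://example.com/a?lite ",
];

// trim() returns the trimmed string; passing it directly avoids a wrapper
// that forgets to return a value.
$trimmedLinks = array_unique(array_map('trim', $links));

print_r($trimmedLinks);
```

Alternatively, keeping the wrapper works as long as it reads return trim($l);.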
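Edit:
I think this may have to do with how the body is fetched from IMAP. When I copy and paste the text string that comes from imap and assign it to $bodyMessage, it works... Suggestions?
You should have a pattern like this: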
((?:https?|ftp|gopher|telnet|file|notes|ms-help):(?:(?://)|(?:\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)
with non-capturing groups. If you put ?: right after an opening parenthesis, you get a non-capturing group. The array will then be:
Array ( [0] => http://usnews.msnbc.msn.com/_news/2012/07/20/12861792-6-year-old-girl-confirmed-to-have-been-killed-in-colorado-theater-shootings?lite )
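To illustrate what non-capturing groups change in the shape of $matches, here is a small sketch with a deliberately simplified pattern (the pattern, the ~ delimiter, and the sample text are assumptions for the demo, not the pattern from the question):

```php
<?php
$text = 'see http://example.com/a and ftp://example.org/b';

// Capturing groups: $matches gets one extra sub-array per group.
preg_match_all('~(https?|ftp)://([\w./-]+)~', $text, $capturing);

// Non-capturing groups (?:...): only the full matches remain.
preg_match_all('~(?:https?|ftp)://(?:[\w./-]+)~', $text, $nonCapturing);

echo count($capturing), "\n";     // sub-arrays with capturing groups
echo count($nonCapturing), "\n";  // sub-arrays with non-capturing groups
```

In both forms index 0 holds the full matches; the non-capturing version simply omits the per-group sub-arrays.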
Edit: the answer to this problem was to use imap_fetchbody instead of
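A rough sketch of how that could look, fetching a single section of the message rather than the whole raw body that the question reads. The helper names extractLinks and linksFromMessage, the simplified URL pattern, and the choice of section "1" are all assumptions for illustration; section "1" is often the text/plain part of a multipart message, but that is not guaranteed:

```php
<?php
// Hypothetical helper: pull unique URLs out of a piece of text.
function extractLinks(string $text): array
{
    // Simplified pattern for the sketch, not the pattern from the question.
    preg_match_all('~(?:https?|ftp)://[^\s<>"]+~', $text, $matches);
    return array_values(array_unique($matches[0]));
}

// Hypothetical helper: read only section "1" of a message.
// Assumes the IMAP extension is loaded. Whether the part needs
// quoted_printable_decode() depends on its Content-Transfer-Encoding,
// so no decoding is applied here.
function linksFromMessage($hMail, int $idxMsg): array
{
    $part = imap_fetchbody($hMail, $idxMsg, '1');
    return extractLinks($part);
}
```

Here $hMail and $idxMsg stand for the connection and message number used in the question, so a call would look like linksFromMessage($hMail, $idxMsg).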