<Link to: http://www.someurl(.+)> maybe some text here(.*) <Link: www.someotherurl(.+)> maybe even more text(.*)
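Given that this is all on one line, how can I match, or better yet extract, all of the complete URLs and text? For this example, I would like to extract:
http://www.someurl(.+)
maybe some text here(.*)
www.someotherurl(.+)
maybe even more text(.*)
Basically, <Link.*:.* starts each link capture and > ends it. All the text after the first capture should then be captured as well, up to the next link capture, which may occur zero or more times.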
I tried:
preg_match_all('/<Link.*?:.*?(https|http|www)(.+?)>(.*?)/', $v1, $m4);
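But I need a way to capture the text after the closing >. The problem is that there may or may not be another link after the first one (and of course there may be no links at all!).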
$string = "<Link to: http://www.someurl(.+)> maybe some text here(.*) <Link: www.someotherurl(.+)> maybe even more text(.*)";
$string = preg_split('~<link(?: to)?:\s*([^>]+)>~i',$string,-1,PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
echo "<pre>";
print_r($string);
Output:
Array
(
[0] => http://www.someurl(.+)
[1] => maybe some text here(.*)
[2] => www.someotherurl(.+)
[3] => maybe even more text(.*)
)
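One detail worth knowing about the split approach: the text pieces keep whatever whitespace surrounded them in the input. The following is only a minimal sketch of the same preg_split call with the pieces trimmed afterwards; the $parts name is just illustrative and not part of the answer above.

```php
<?php
$string = "<Link to: http://www.someurl(.+)> maybe some text here(.*) <Link: www.someotherurl(.+)> maybe even more text(.*)";

// Split on each <Link ...: url> tag. PREG_SPLIT_DELIM_CAPTURE keeps the
// captured URL in the result; PREG_SPLIT_NO_EMPTY drops the empty piece
// that would otherwise appear before the first tag.
$parts = preg_split(
    '~<link(?: to)?:\s*([^>]+)>~i',
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// The text pieces still carry leading/trailing spaces, so trim them.
$parts = array_map('trim', $parts);

var_dump($parts);
```

Since the result is a flat list, links and texts only alternate reliably when every link is followed by some text and nothing precedes the first link.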
You can use this pattern:
preg_match_all('~<link\b[^:]*:\s*\K(?<link>[^\s>]++)[^>]*>\s*(?<text>[^<]++)~i',
$txt, $matches, PREG_SET_ORDER);
foreach($matches as $match) {
printf("<br/>link: %s\n<br/>text: %s", $match['link'], $match['text']);
}
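The question also mentions that a link may have no text after it, or that there may be no links at all. Because the pattern above requires at least one character of text ([^<]++), a link followed directly by another link would not match. A possible variation, shown here only as a sketch, makes the text group optional with [^<]*. The input string and the $pairs name are made up for illustration.

```php
<?php
// Hypothetical input: the first link has no text after it.
$txt = "<Link to: http://a.example.com><Link: www.b.example.com> some text";

// Same idea as the pattern above, but the text group may be empty.
preg_match_all(
    '~<link\b[^:]*:\s*\K(?<link>[^\s>]++)[^>]*>\s*(?<text>[^<]*)~i',
    $txt,
    $matches,
    PREG_SET_ORDER
);

// Collect link/text pairs; the text may be an empty string.
$pairs = [];
foreach ($matches as $match) {
    $pairs[] = ['link' => $match['link'], 'text' => rtrim($match['text'])];
}

var_dump($pairs);
```

If the input contains no links, preg_match_all returns 0 and $matches is an empty array, so the loop simply does nothing.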