Extract entire url content using Regex


Keywords: url, content extraction, regular expression | Updated: 2023-09-27

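OK, I'm using (PHP) file_get_contents to read some websites, and each of these sites has only one Facebook link. Once I have the whole page, I want to find the complete Facebook URL.

So somewhere in the page there will be: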

<a href="http://facebook.com/username" >

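I want to get http://facebook.com/username, that is, everything from the first (") to the last ("). The username varies; it may also be something like username.some. There may be other attributes before or after "href".

Just in case I wasn't clear: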

<a href="http://facebook.com/username" >  //I want http://facebook.com/username
<a href="http://www.facebook.com/username" >  //I want http://www.facebook.com/username
<a class="value" href="http://facebook.com/username. some" attr="value" >  //I want http://facebook.com/username. some

Or any of the examples above, possibly with single quotes:

<a href='http://facebook.com/username' > //I want http://facebook.com/username

Thanks to all

Don't use regexes on HTML. That shotgun will blow your leg off sooner or later. Use DOM instead:

$dom = new DOMDocument;
$dom->loadHTML(...);
$xp = new DOMXPath($dom);
$a_tags = $xp->query("//a");
foreach($a_tags as $a) {
   echo $a->getAttribute('href');
}
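
Since the question is only after the Facebook link, the XPath query above can be narrowed with a predicate. The following is a minimal sketch, not part of the original answer: the sample markup and the variable names $html and $fbHrefs are assumptions made for illustration.

```php
<?php
// Sample markup modelled on the question's examples (an assumption, not real data).
$html = '<p><a href="/home">Home</a> '
      . '<a class="value" href="http://facebook.com/username" attr="value">FB</a></p>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect warnings about sloppy markup instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// Select only the <a> elements whose href contains "facebook.com".
$fbHrefs = [];
foreach ($xp->query('//a[contains(@href, "facebook.com")]') as $a) {
    $fbHrefs[] = $a->getAttribute('href');
}

print_r($fbHrefs);
```

Note that contains() is a plain, case-sensitive substring test, so it would also match a URL that merely mentions facebook.com somewhere in its path or query string. For stricter matching, checking the parsed host of each href is probably safer.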

I'd suggest using DOMDocument rather than a regex. Here is a quick code sample for your case:

$dom = new DOMDocument();
$dom->loadHTML($content);
// To hold all your links...
$links = array();
$hrefTags = $dom->getElementsByTagName("a");
foreach ($hrefTags as $hrefTag) {
    $links[] = $hrefTag->getAttribute("href");
}
print_r($links); // dump all links
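
The snippet above collects every link on the page. To keep only the Facebook one, each href can be checked by its host. What follows is a rough sketch under some assumptions: findFacebookLinks() is a hypothetical helper name, the sample markup is made up to resemble the question's examples, and "Facebook link" is taken to mean a host of facebook.com or any subdomain of it.

```php
<?php
// Hypothetical helper (not from the original answers): returns the href of
// every <a> element whose host is facebook.com or a subdomain of it.
function findFacebookLinks(string $html): array
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // relative link, or no parsable host
        }
        $host = strtolower($host);
        // '.facebook.com' is 13 characters long.
        if ($host === 'facebook.com' || substr($host, -13) === '.facebook.com') {
            $links[] = $href;
        }
    }
    return $links;
}

// Sample markup modelled on the examples in the question.
$content = <<<HTML
<html><body>
<a href="/about">About</a>
<a class="value" href="http://www.facebook.com/username" attr="value">FB</a>
<a href='http://example.com/'>Other</a>
</body></html>
HTML;

$facebookLinks = findFacebookLinks($content);
print_r($facebookLinks);
```

Because the DOM parser handles the attribute quoting, this works the same for single-quoted and double-quoted href values, and regardless of which attributes come before or after href.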