How to get the URL from the HTML code in an efficient way using PHP?

I have the following two variables containing HTML code:

$var1 = 'Profile photo uploaded<div class="comment_attach_image">
<a class="group1 cboxElement" 
   href="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" >
  <img src="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" height="150px" width="150px" />
</a>
<a class="comment_attach_image_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_a4ea5532b83a56bbbae2fffc80de4fee.png" >Download</a>
</div>';
$var2 = 'PDF file added<div class="comment_attach_file">
        <a class="comment_attach_file_link" href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >1b87d4420c693f2bbdf738cbf2457d89.pdf</a>
        <a class="comment_attach_file_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >Download</a>
        </div>';

I want to extract only the URL from each of the two variables above. The result I want is:

$new_var1 = 'http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png';
$new_var2 = 'http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf';

How can I do this in PHP in an efficient and smart way?

Or do it the PHP way (yes... j/k):

<?php
$var1 = 'Profile photo uploaded<div class="comment_attach_image">
<a class="group1 cboxElement" 
   href="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" >
  <img src="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" height="150px" width="150px" />
</a>
<a class="comment_attach_image_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_a4ea5532b83a56bbbae2fffc80de4fee.png" >Download</a>
</div>';
$var2 = 'PDF file added<div class="comment_attach_file">
        <a class="comment_attach_file_link" href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >1b87d4420c693f2bbdf738cbf2457d89.pdf</a>
        <a class="comment_attach_file_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >Download</a>
        </div>';
$url_regex = '/(href|src)="(.*?)"/';
preg_match_all($url_regex, $var1, $matches);
var_dump($matches);
preg_match_all($url_regex, $var2, $matches);
var_dump($matches);

This will produce:

array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(86) "href="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png""
    [1]=>
    string(85) "src="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png""
    [2]=>
    string(100) "href="http://52.1.47.143/feed/download/year_2015/month_03/file_a4ea5532b83a56bbbae2fffc80de4fee.png""
  }
  [1]=>
  array(3) {
    [0]=>
    string(4) "href"
    [1]=>
    string(3) "src"
    [2]=>
    string(4) "href"
  }
  [2]=>
  array(3) {
    [0]=>
    string(79) "http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png"
    [1]=>
    string(79) "http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png"
    [2]=>
    string(93) "http://52.1.47.143/feed/download/year_2015/month_03/file_a4ea5532b83a56bbbae2fffc80de4fee.png"
  }
}
array(3) {
  [0]=>
  array(2) {
    [0]=>
    string(100) "href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf""
    [1]=>
    string(100) "href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf""
  }
  [1]=>
  array(2) {
    [0]=>
    string(4) "href"
    [1]=>
    string(4) "href"
  }
  [2]=>
  array(2) {
    [0]=>
    string(93) "http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf"
    [1]=>
    string(93) "http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf"
  }
}

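See the documentation for preg_match_all for what the result contains: index 0 holds the full matches, index 1 the attribute name (href or src), and index 2 the URL itself. If you really only need the first matching URL, use preg_match instead. It takes the same arguments as preg_match_all, but stops after the first match and fills $matches with a flat array for that single match.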
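
To make the preg_match suggestion concrete, here is a minimal sketch that reuses the pattern from the answer and returns only the first URL. The helper name `first_url` is hypothetical (it is not part of the answer), and the `$var2` fragment is shortened to a single anchor for brevity.

```php
<?php
// Hypothetical helper, not from the original answer: returns the value of
// the first href/src attribute in an HTML fragment, or null if none exists.
function first_url(string $html): ?string
{
    // Same pattern as in the answer; group 2 captures the attribute value.
    if (preg_match('/(href|src)="(.*?)"/', $html, $m) === 1) {
        return $m[2];
    }
    return null;
}

// Shortened copy of $var2 from the question.
$var2 = 'PDF file added<div class="comment_attach_file">
<a class="comment_attach_file_link" href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >1b87d4420c693f2bbdf738cbf2457d89.pdf</a>
</div>';

$new_var2 = first_url($var2);
echo $new_var2, "\n";
```

Like the regex in the answer, this only handles double-quoted attribute values, so it is a shortcut for markup you control rather than a general HTML parser.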

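If you are trying to parse a DOM, JavaScript would be a better choice. But if you insist on using PHP, try downloading the HTML parser called Simple HTML DOM. Its website has good documentation, but for what you are trying to do I would use the following: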

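// Load the Simple HTML DOM library (path assumed; adjust to where you saved it)
include 'simple_html_dom.php';
// Get the contents of your page; for HTML already held in a string,
// the library also provides str_get_html($var1)
$html = file_get_html('http://linkto.com/yourfile.html');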
// Find all links this way
foreach($html->find('a') as $element)  {
   echo $element->href.'<br>';
}
// Target the two particular variables as follows
// Target the first variable by the anchor tag's class name
$new_var1 = $html->find('a[class=group1 cboxElement]', 0)->href; 
$new_var2 = $html->find('a[class=comment_attach_file_link_dwl]', 0)->href;
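
If installing a third-party parser is not an option, PHP's bundled DOM extension can probably do the same job. The following is a minimal sketch that swaps Simple HTML DOM for DOMDocument plus DOMXPath; it assumes the DOM extension is enabled, that `$class` is a plain class name without quotes, and `hrefs_by_class` is a hypothetical helper name. The `$var1` and `$var2` fragments are shortened copies of the ones in the question.

```php
<?php
// Hypothetical helper: collects the href values of all <a> elements whose
// class attribute contains the given class name as a whole token.
function hrefs_by_class(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // the input is a fragment, not a complete HTML document.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the class attribute with spaces so "foo" does not match "foo_dwl".
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]/@href',
        $class
    );

    $urls = [];
    foreach ($xpath->query($query) as $attr) {
        $urls[] = $attr->nodeValue;
    }
    return $urls;
}

$var1 = 'Profile photo uploaded<div class="comment_attach_image">
<a class="group1 cboxElement"
   href="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" >
  <img src="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" height="150px" width="150px" />
</a>
</div>';
$var2 = 'PDF file added<div class="comment_attach_file">
<a class="comment_attach_file_link" href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >1b87d4420c693f2bbdf738cbf2457d89.pdf</a>
<a class="comment_attach_file_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >Download</a>
</div>';

// Take the first matching anchor from each fragment, or null if none matched.
$new_var1 = hrefs_by_class($var1, 'group1')[0] ?? null;
$new_var2 = hrefs_by_class($var2, 'comment_attach_file_link')[0] ?? null;

echo $new_var1, "\n", $new_var2, "\n";
```

Matching on a single class token rather than the full attribute string keeps the lookup working if the markup later gains or reorders classes.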