Problems reading image URLs from an RSS feed using DOMDocument

I have an RSS feed:

<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">  
<item> 
      <title>VIDEO: Have you heard of Alibaba?</title>  
      <description>Alibaba is the world's biggest e-commerce firm but most people in the West haven't heard of it.</description>  
      <link>http://www.bbc.co.uk/news/business-29216696#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>  
      <guid isPermaLink="false">http://www.bbc.co.uk/news/business-29216696</guid>  
      <pubDate>Tue, 16 Sep 2014 02:29:17 GMT</pubDate>  
      <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/77609000/jpg/_77609399_73619721.jpg"/>  
      <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/77609000/jpg/_77609400_73619721.jpg"/> 
    </item>  
    <item> 
      <title>VIDEO: Phones 4U shops closing for business</title>  
      <description>Retailer Phones 4U has gone into administration putting 5,596 jobs at risk.</description>  
      <link>http://www.bbc.co.uk/news/business-29202179#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>  
      <guid isPermaLink="false">http://www.bbc.co.uk/news/business-29202179</guid>  
      <pubDate>Mon, 15 Sep 2014 22:15:50 GMT</pubDate>  
      <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/77587000/jpg/_77587217_77587209.jpg"/>  
      <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/77587000/jpg/_77587218_77587209.jpg"/> 
    </item> 
</rss>

I am able to read the title and description from this RSS feed using PHP's DOMDocument class.

Here is my code:

$xml = 'http://feeds.bbci.co.uk/news/video_and_audio/business/rss.xml';
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
$items = $xmlDoc->getElementsByTagName('item');
foreach ($items as $item) {
    $item_title = $item->getElementsByTagName('title')->item(0)->childNodes->item(0)->nodeValue;
    $item_link = $item->getElementsByTagName('link')->item(0)->childNodes->item(0)->nodeValue;
    $item_desc = $item->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue;
}

But how can I read the url attribute of the "media:thumbnail" tags of each item?

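Because the element is namespaced, use getElementsByTagNameNS() together with ->getAttribute() in this case. The first argument is the namespace URI declared by xmlns:media, not the "media" prefix. Example: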

$xml = 'http://feeds.bbci.co.uk/news/video_and_audio/business/rss.xml' ;
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
$items = $xmlDoc->getElementsByTagName('item');
foreach($items as $key => $item) {
    $item_title= $item->getElementsByTagName('title')->item(0)->childNodes->item(0)->nodeValue;
    $item_link= $item->getElementsByTagName('link')->item(0)->childNodes->item(0)->nodeValue;
    $item_desc= $item->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue;
    $media = $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'thumbnail');
    foreach($media as $thumb) {
        echo $thumb->getAttribute('url') . '<br/>';
    }
}
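
For trying this out without hitting the live feed, here is a minimal self-contained sketch of the same idea. It assumes a trimmed inline copy of the XML from the question; the helper name thumbnailsByTitle and the MRSS_NS constant are made up for illustration and are not part of any library.

```php
<?php
// Inline sample standing in for the live feed (trimmed copy of the XML from the question).
$feed = <<<'XML'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>VIDEO: Have you heard of Alibaba?</title>
      <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/77609000/jpg/_77609399_73619721.jpg"/>
      <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/77609000/jpg/_77609400_73619721.jpg"/>
    </item>
  </channel>
</rss>
XML;

// Namespace URI bound to the "media" prefix in the feed.
const MRSS_NS = 'http://search.yahoo.com/mrss/';

// Hypothetical helper: returns [item title => list of thumbnail URLs].
function thumbnailsByTitle(string $xmlSource): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xmlSource);

    $result = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        $title = $item->getElementsByTagName('title')->item(0)->textContent;
        $urls = [];
        // Match on namespace URI + local name, so the prefix used in the feed does not matter.
        foreach ($item->getElementsByTagNameNS(MRSS_NS, 'thumbnail') as $thumb) {
            $urls[] = $thumb->getAttribute('url');
        }
        $result[$title] = $urls;
    }
    return $result;
}

print_r(thumbnailsByTitle($feed));
```

Collecting the URLs into an array instead of echoing them makes it easier to pick one thumbnail per item later on.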

SimpleXMLElement variant:

$xml = simplexml_load_file('http://feeds.bbci.co.uk/news/video_and_audio/business/rss.xml');
foreach($xml->channel->item as $item) {
    $title = $item->title;
    $description = $item->description;
    $link = $item->link;
    // children() takes (namespace URI) or (prefix, true); the URI form does not depend on the feed's prefix
    $media = $item->children('http://search.yahoo.com/mrss/');
    foreach($media->thumbnail as $thumb) {
        echo $thumb->attributes()->url . '<br/>';
    }
}

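Use XPath. It is part of the DOM extension and lets you use expressions to fetch nodes and values from a DOM. As in XML itself, XPath lets you define your own prefixes/aliases for namespaces, so the prefix registered in the code does not have to match the one used in the feed.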

$dom = new DOMDocument;
// Here $xml must hold the XML source as a string; use $dom->load($url) to read from a URL instead
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('m', 'http://search.yahoo.com/mrss/');
$xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');
foreach ($xpath->evaluate('//item') as $itemNode) {
  $item = [
    'title' => $xpath->evaluate('string(title)', $itemNode),
    'link' => $xpath->evaluate('string(link)', $itemNode),
    'description' => $xpath->evaluate('string(description)', $itemNode),
  ];
  foreach ($xpath->evaluate('m:thumbnail/@url', $itemNode) as $urlAttribute) {
    $item['thumbnails'][] = $urlAttribute->value;
  }  
  var_dump($item);
}
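
Since every item carries two thumbnails, the same approach can be narrowed to one URL per item. The following is only a sketch under assumptions: it uses a trimmed inline copy of the feed from the question, and it assumes "the widest thumbnail" is the one you want. The predicate keeps a thumbnail only if no sibling thumbnail has a larger width.

```php
<?php
// Inline sample standing in for the live feed (trimmed copy of the XML from the question).
$feed = <<<'XML'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>VIDEO: Have you heard of Alibaba?</title>
      <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/77609000/jpg/_77609399_73619721.jpg"/>
      <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/77609000/jpg/_77609400_73619721.jpg"/>
    </item>
    <item>
      <title>VIDEO: Phones 4U shops closing for business</title>
      <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/77587000/jpg/_77587217_77587209.jpg"/>
      <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/77587000/jpg/_77587218_77587209.jpg"/>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($feed);
$xpath = new DOMXPath($dom);
// "m" is our own alias; only the namespace URI has to match the feed.
$xpath->registerNamespace('m', 'http://search.yahoo.com/mrss/');

$largest = [];
foreach ($xpath->evaluate('//item') as $itemNode) {
    $title = $xpath->evaluate('string(title)', $itemNode);
    // Keep the thumbnail for which no sibling thumbnail has a greater width,
    // then cast its url attribute to a string.
    $largest[$title] = $xpath->evaluate(
        'string(m:thumbnail[not(@width < ../m:thumbnail/@width)]/@url)',
        $itemNode
    );
}
var_dump($largest);
```

If an item has no thumbnail at all, string() yields an empty string, so the result can be checked with a simple comparison against ''.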