Removing Item From RSS Error

I use the following PHP to remove entries older than 8 days from an XML file I have. It used to work fine, but now it gives me this error:

Call to a member function removeChild() on a non-object in /Users//DateTest-3.php on line 40

Line 40 is

$node->parentNode->removeChild($node);

Any idea why it would throw this error?

<?php
$rss = new DOMDocument();
$url = 'http://URL.com/Test.xml';
$rss->load($url);
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $item = array ( 
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date' => $node->getElementsByTagName('date')->item(0)->nodeValue,
    );
    array_push($feed, $item);
}
$limit = 50;
for ($i = 0; $i < count($feed); $i++) {
    date_default_timezone_set('America/Los_Angeles');
    $newDate = strtotime("-8 day");
    $date = strtotime($feed[$i]['date']);
    if ($date > $newDate) {
        echo "Don't delete";
    } else {
        echo "Delete";
        $node->parentNode->removeChild($node);
    }
}
$rss->save("Test.xml")


?>
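  • There is no 'date' element in RSS 1.0, but 'dc:date' comes into play there. http://web.resource.org/rss/1.0/spec#s5.5

  • In RSS 2.0 there is no 'date' either, only 'pubDate'. http://cyber.law.harvard.edu/rss/rss.html#hrelementsOfLtitemgt

  • Decide whether you want to look for 'date', 'dc:date', or 'pubDate'. The code below works with pubDate.

  • $limit = 50; is never used.

  • Removing nodes from the nodeList you are iterating over will not work, because the list is live and shifts under the iterator as nodes disappear. This is old hat! See the comments here: http://php.net/manual/de/domnode.removechild.php. The solution is to use a queue to mark the bad nodes and remove them afterwards.

  • I took the liberty of reworking the code a bit. I deliberately left the debug output active, mainly for the date comparison and the display of the reduced list.

  • Please adjust the feed URL and the "-x days" in the condition. I had to use a public RSS feed to test things.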
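For the 'date' / 'dc:date' / 'pubDate' decision in the list above, a fallback lookup could be sketched roughly as follows. The helper name findItemDate and the inline sample feed are made up for illustration, and the Dublin Core namespace URI is assumed to be the one the feed declares for the dc prefix.

```php
<?php
// Hypothetical helper (not part of the original script): return the first
// date-like child of an <item>, or null when none is found.
function findItemDate(DOMElement $item): ?string
{
    // Assumption: the feed binds the "dc" prefix to the Dublin Core namespace.
    $dcNs = 'http://purl.org/dc/elements/1.1/';

    // RSS 2.0 style: <pubDate>
    $found = $item->getElementsByTagName('pubDate');
    if ($found->length > 0) {
        return trim($found->item(0)->nodeValue);
    }
    // RSS 1.0 style: <dc:date>, looked up by namespace rather than by prefix
    $found = $item->getElementsByTagNameNS($dcNs, 'date');
    if ($found->length > 0) {
        return trim($found->item(0)->nodeValue);
    }
    // Non-standard plain <date>, as used in the question's XML
    $found = $item->getElementsByTagName('date');
    if ($found->length > 0) {
        return trim($found->item(0)->nodeValue);
    }
    return null;
}

// Small inline sample feed so the sketch runs without a network request.
$doc = new DOMDocument();
$doc->loadXML(
    '<rss xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>'
    . '<item><pubDate>Mon, 01 Jan 2001 00:00:00 +0000</pubDate></item>'
    . '<item><dc:date>2001-01-02T00:00:00Z</dc:date></item>'
    . '<item><title>no date here</title></item>'
    . '</channel></rss>'
);
foreach ($doc->getElementsByTagName('item') as $item) {
    echo findItemDate($item) ?? '(no date)', "\n";
}
```

Checking length before calling item(0) also avoids a null dereference when an item carries no date element at all.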


<?php
date_default_timezone_set('America/Los_Angeles');
$feed = array(); // target array for filtered items
$nodesToRemoveQueue = array(); // stores all nodes to remove
$rss = new DOMDocument();
$url = 'http://rss.nytimes.com/services/xml/rss/nyt/Space.xml';
$rss->load($url);
$nodeList = $rss->getElementsByTagName('item');
foreach ($nodeList as $node)
{
    $pubDate = $node->getElementsByTagName('pubDate')->item(0)->nodeValue;
    // if the date in the XML feed is older than the desired number of days, queue the node
    // for removal and proceed with the iteration (do not transfer the data into the $feed array)
    if(isDateOlderThenDays($pubDate, '-5 days')) {
        echo 'Removed ' . $pubDate . '<br>';
        // $node->parentNode->removeChild($node); this won't work!!
        $nodesToRemoveQueue[] = $node; // put node in queue, remove later
        continue;
    }
    echo 'Kept ' . $pubDate . '<br>';
    // build item for $feed array, then add item to $feed array
    $item = array (
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date' => $pubDate,
    );
    $feed[] = $item;
}
// helper to compare dates
function isDateOlderThenDays($date, $days)
{
    // true when the pubDate ($date) is earlier (older) than the relative offset $days, e.g. '-5 days'
    return strtotime($date) < strtotime($days);
}
// feed array contains all the not "outdated" items
var_dump($feed);
// finally: remove the "outdated" nodes
foreach($nodesToRemoveQueue as $node){
  $node->parentNode->removeChild($node);
}
// nodelist reduction check: this should only display the dates that were kept
$nodeList = $rss->getElementsByTagName('item');
foreach ($nodeList as $node) {
    echo $node->getElementsByTagName('pubDate')->item(0)->nodeValue . '<br>';
}
// write reduced RSS XML to file
$rss->save(__DIR__.'/Test.xml');

Another way to save the XML is:

$xmlString = $rss->saveXML();
file_put_contents(__DIR__.'/Test.xml', $xmlString);
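As a variation on the queue idea, the live node list can be copied into a plain array before looping, so that removals no longer affect the iteration. This is only a sketch: the inline feed, its titles, and the 8-day cutoff are assumptions chosen to make it runnable without a network request.

```php
<?php
date_default_timezone_set('America/Los_Angeles');

// Sample feed built relative to "now": two stale items and one fresh item.
$stale1 = date(DATE_RSS, strtotime('-30 days'));
$stale2 = date(DATE_RSS, strtotime('-20 days'));
$fresh  = date(DATE_RSS, strtotime('-1 day'));
$xml = '<rss version="2.0"><channel>'
     . "<item><title>stale 1</title><pubDate>{$stale1}</pubDate></item>"
     . "<item><title>stale 2</title><pubDate>{$stale2}</pubDate></item>"
     . "<item><title>fresh</title><pubDate>{$fresh}</pubDate></item>"
     . '</channel></rss>';

$rss = new DOMDocument();
$rss->loadXML($xml);
$cutoff = strtotime('-8 days');

// iterator_to_array() takes a snapshot of the live DOMNodeList.
$items = iterator_to_array($rss->getElementsByTagName('item'));
foreach ($items as $item) {
    $pubDate = $item->getElementsByTagName('pubDate')->item(0)->nodeValue;
    if (strtotime($pubDate) < $cutoff) {
        // Safe here: we iterate over the array copy, not the live list.
        $item->parentNode->removeChild($item);
    }
}

$remaining = $rss->getElementsByTagName('item')->length;
echo $remaining, " item(s) left\n"; // prints "1 item(s) left"
```

The explicit queue in the script above and this snapshot do the same job; the snapshot just saves the second loop.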

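Is it intentional that you only work with the last node after this loop?

foreach ($rss->getElementsByTagName('item') as $node)

After that loop, $node still holds the last item assigned from $rss->getElementsByTagName('item'). The first removal detaches that node, so on the next pass $node->parentNode is null and removeChild() fails. Or is some code missing?

In the second loop, reassign $node on each iteration, e.g. $node = $feed[$i]. Note that as written $feed[$i] is a plain array, so the DOM node itself would have to be stored in each $feed entry first.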
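A minimal sketch of that reassignment, keeping close to the question's code: each $feed entry also stores its DOM node under a 'node' key, and the second loop removes that node instead of the leftover $node. The 'node' key and the inline sample feed with <date> elements are assumptions for illustration.

```php
<?php
date_default_timezone_set('America/Los_Angeles');

// Inline sample standing in for the question's Test.xml (assumed structure).
$fresh = date('Y-m-d', strtotime('-1 day'));
$stale = date('Y-m-d', strtotime('-30 days'));
$older = date('Y-m-d', strtotime('-9 days'));
$rss = new DOMDocument();
$rss->loadXML(
    '<rss><channel>'
    . "<item><title>stale</title><date>{$stale}</date></item>"
    . "<item><title>fresh</title><date>{$fresh}</date></item>"
    . "<item><title>older</title><date>{$older}</date></item>"
    . '</channel></rss>'
);

// First pass: collect the data AND the node it came from.
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $feed[] = array(
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'date'  => $node->getElementsByTagName('date')->item(0)->nodeValue,
        'node'  => $node, // hypothetical extra key holding the DOM node
    );
}

// Second pass: remove the node that belongs to each outdated entry.
$cutoff = strtotime('-8 day');
foreach ($feed as $entry) {
    if (strtotime($entry['date']) <= $cutoff) {
        $node = $entry['node'];
        $node->parentNode->removeChild($node);
    }
}

echo $rss->saveXML();
```

Nothing is removed during the first pass, so iterating the live list there is harmless; all removals happen while looping over the plain $feed array.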