Improving PHP Performance of Multiple RSS Feed Requests?

I'm working on a small mobile app version of www.sciencedaily.com. Just a little side project.

I'm using RSS_PHP to fetch the XML feeds, and the code I have (which works fine) looks like this, placed before the DOCTYPE:

require_once '_/rss_php.php';
$featuredRSS = new rss_php;
$healthMedRSS = new rss_php;
$mindBrainRSS = new rss_php;
$plantsAnimalsRSS = new rss_php;
$earthClimateRSS = new rss_php;
$spaceTimeRSS = new rss_php;
$matterEnergyRSS = new rss_php;
$compMathRSS = new rss_php;
$archaeoRSS = new rss_php;
$featuredRSS->load('http://www.sciencedaily.com/rss/top_news/top_science.xml');
$healthMedRSS->load('http://www.sciencedaily.com/rss/health_medicine.xml');
$mindBrainRSS->load('http://www.sciencedaily.com/rss/mind_brain.xml');
$plantsAnimalsRSS->load('http://www.sciencedaily.com/rss/plants_animals.xml');
$earthClimateRSS->load('http://www.sciencedaily.com/rss/earth_climate.xml');
$spaceTimeRSS->load('http://www.sciencedaily.com/rss/space_time.xml');
$matterEnergyRSS->load('http://www.sciencedaily.com/rss/matter_energy.xml');
$compMathRSS->load('http://www.sciencedaily.com/rss/computers_math.xml');
$archaeoRSS->load('http://www.sciencedaily.com/rss/fossils_ruins.xml');
$featuredItems = $featuredRSS->getItems();
$healthMedItems = $healthMedRSS->getItems();
$mindBrainItems = $mindBrainRSS->getItems();
$plantsAnimalsItems = $plantsAnimalsRSS->getItems();
$earthClimateItems = $earthClimateRSS->getItems();
$spaceTimeItems = $spaceTimeRSS->getItems();
$matterEnergyItems = $matterEnergyRSS->getItems();
$compMathItems = $compMathRSS->getItems();
$archaeoItems = $archaeoRSS->getItems();

Then, in the body, I echo out the results with the classic pattern, e.g.

foreach($items as $item) {
    echo $item['title'];
}
etc...

Like I said, it all works fine. But it's dog slow. I know there's a limit to how fast the app can be, since it has to fetch the feeds, but RSS_PHP doesn't have caching abilities the way SimplePie does.

Any ideas for improving the speed? Maybe loading the featured content first and everything else afterwards?

Thanks in advance!!

Use the cURL library. It has an option to make multiple requests at the same time, so all the requests run in parallel. Check this link for an example and tutorial: PHP curl parallel requests

Update

Look through the RSS_PHP documentation. Use

$testRSS->loadRSS($res); // $res is the string data from cURL, instead of a URL
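Putting the two together, here is a minimal sketch of fetching all the feeds in parallel with curl_multi and handing the raw XML to RSS_PHP, assuming loadRSS() accepts an XML string as described above:

```php
// Sketch: fetch the feeds in parallel with curl_multi, then parse each
// response with RSS_PHP via loadRSS(). Untested against rss_php itself.
require_once '_/rss_php.php';

$urls = array(
    'featured'  => 'http://www.sciencedaily.com/rss/top_news/top_science.xml',
    'healthMed' => 'http://www.sciencedaily.com/rss/health_medicine.xml',
    'mindBrain' => 'http://www.sciencedaily.com/rss/mind_brain.xml',
);

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $key => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body as a string
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't hang on a slow feed
    curl_multi_add_handle($mh, $ch);
    $handles[$key] = $ch;
}

// Run all transfers at once.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // sleep until there is activity, avoids busy-waiting
} while ($running > 0);

// Collect the results and parse each feed.
$feeds = array();
foreach ($handles as $key => $ch) {
    $xml = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);

    $rss = new rss_php;
    $rss->loadRSS($xml); // string data from cURL instead of a URL
    $feeds[$key] = $rss->getItems();
}
curl_multi_close($mh);
```

The total wall time is then roughly the slowest single feed rather than the sum of all nine.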

I'll try to get you started with processes. For clarity, I'm using a reduced number of RSS feeds.

require_once '_/rss_php.php';

class LoadFeeds
{
    private $workers = 0;

    private function launchWorker(&$feed, $url) { // NOTE: pass by reference!
        $pid = pcntl_fork();
        switch ($pid) {
            case -1: // fork failed
                error_log("ERROR: Worker fork failure. Running inline.");
                $feed->load($url);
                break;
            case 0: // child fork
                $feed->load($url);
                exit(0); // the child must exit, or it falls through into the parent's code
            default: // parent fork
                $this->workers++;
                echo "{$this->workers} launched.\n";
                break;
        }
    }

    public function load() {
        $featuredRSS = new rss_php;
        $healthMedRSS = new rss_php;
        $mindBrainRSS = new rss_php;
        // Start some workers
        $this->launchWorker($featuredRSS, 'http://www.sciencedaily.com/rss/top_news/top_science.xml');
        $this->launchWorker($healthMedRSS, 'http://www.sciencedaily.com/rss/health_medicine.xml');
        $this->launchWorker($mindBrainRSS, 'http://www.sciencedaily.com/rss/mind_brain.xml');
        $status = 0;
        while ($this->workers > 0) { // wait until all workers are done
            $pid = pcntl_wait($status, WUNTRACED); // block until a worker reports
            if ($pid > 0) {
                $this->workers--; // recover the worker
            } else {
                break; // no children left
            }
        }
        // Caveat: pcntl_fork creates separate processes, so the data loaded in
        // the children is not shared back into the parent's objects; you would
        // need shared memory or some other IPC mechanism to get the items back.
        $featuredItems = $featuredRSS->getItems();
        $healthMedItems = $healthMedRSS->getItems();
        $mindBrainItems = $mindBrainRSS->getItems();
    }
}

Note that I haven't tested this, since I don't have a setup handy at the moment, but it gives you the main components. If you run into problems, you can spin off further questions here.

You could:

  • Implement some form of caching, which will help in any case.
  • Maybe look at using some Ajax: the page itself would load quickly, and the RSS items would appear one by one as they come in. You might also be able to simulate running multiple calls that way.
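The caching idea can be sketched with a simple file cache. The helper name loadFeedCached, the cache path, and the 15-minute TTL below are illustrative, not part of RSS_PHP:

```php
// Minimal file-cache sketch: serve the cached copy if it is fresher than
// $ttl seconds, otherwise fetch the feed and rewrite the cache file.
// Assumes loadRSS() accepts an XML string, per the update above.
require_once '_/rss_php.php';

function loadFeedCached($url, $cacheFile, $ttl = 900)
{
    $rss = new rss_php;
    if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        $rss->loadRSS(file_get_contents($cacheFile));     // cache hit
    } else {
        $xml = @file_get_contents($url);                  // cache miss: fetch
        if ($xml !== false) {
            file_put_contents($cacheFile, $xml);          // refresh the cache
            $rss->loadRSS($xml);
        } elseif (file_exists($cacheFile)) {
            $rss->loadRSS(file_get_contents($cacheFile)); // fetch failed: serve stale
        }
    }
    return $rss->getItems();
}

$featuredItems = loadFeedCached(
    'http://www.sciencedaily.com/rss/top_news/top_science.xml',
    '/tmp/top_science.xml'
);
```

With a warm cache, only one request out of every TTL window pays the network cost; everything else is a local file read.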

Implement caching server-side. Here's a script I created a few years ago (correct as needed):

rss-fetch.sh:

#!/bin/bash
#------------------------------------------------------------------
#
# This script will run via CRONTAB and fetch RSS feeds from the
# urls.txt file, which can be used internally.  This way we minimize
# the number of requests externally for data.
#
# - created by Jakub - March 26 2008
#
#------------------------------------------------------------------
basedir=/htdocs/RSS
storedir=/htdocs/RSS/read/
sourcefile=/htdocs/RSS/urls.txt
#------------------------------------------------------------------
# Read the URLS.TXT file to get the URL/filename
#
# Formatted:
# http://rss.slashdot.org/Slashdot/slashdot/slashdot.xml
# ^- URL                    ^- filename to save as
for s in `cat "$sourcefile"`;
    do
        geturl=`dirname "$s"`;
        filename=`basename "$s"`;
        wget -q "$geturl" -O "${storedir}${filename}";
done;
#------------------------------------------------------------------

Then pull the local RSS feeds into your PHP for parsing; the latency of fetching the external sources is moved out of the page request entirely.

If you set the above script up on CRON, it will fetch at whatever frequency you want. Enjoy!

According to the documentation, it can also load data from a local URI, so why not fetch the remote feeds in a separate script, say every 15 minutes, and load only the local copies here? That reduces the load on the remote server and cuts your bandwidth usage.
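For instance, assuming a cron job has mirrored the feeds into /htdocs/RSS/read/ as in the script above (the file names here are illustrative), the page-side code shrinks to a local load:

```php
// Parse the locally mirrored copy; no network request at page-load time.
require_once '_/rss_php.php';

$featuredRSS = new rss_php;
$featuredRSS->load('/htdocs/RSS/read/top_science.xml'); // local URI instead of remote

$featuredItems = $featuredRSS->getItems();
```

The freshness of the data is then bounded by the cron interval, which is usually fine for a news feed.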