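Is it possible to use a Zend request to extract links and the strings inside tags from an external website, copy all of them into an array, and return it?
Take the website http://bills.ru/ as an example: extract the entries from the block "события на долговом рынке" and store all the data in an array with the following structure:
id, date, title, url
Or could someone at least give some good examples of using Zend Request?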
I'd suggest using something like Goutte, which lets you filter the returned HTML.
If you don't want to use an additional library, you can also use Zend\Dom to query the HTML from your request.
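To make the "query the HTML" step concrete, here is a minimal sketch that builds the id/date/title/url array. It uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than Zend\Dom or Goutte, only so that it runs without any dependencies. The sample markup, the `news` class names and the helper name `extractRows()` are assumptions for illustration; the real page layout on bills.ru may differ.

```php
<?php
// Hypothetical sample markup standing in for the fetched page body.
$html = <<<HTML
<html><body><table>
<tr><td class="news">01.02.2016</td><td><a class="news" href="/news/1">First item</a></td></tr>
<tr><td class="news">02.02.2016</td><td><a class="news" href="/news/2">Second item</a></td></tr>
</table></body></html>
HTML;

// extractRows() is a hypothetical helper: it returns one array of rows
// with the keys id, date, title and url.
function extractRows($html, $baseUrl)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real-world HTML is rarely valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = array();
    // Visit every table row that contains a link with class "news".
    foreach ($xpath->query('//tr[td/a[@class="news"]]') as $tr) {
        $date = $xpath->query('td[@class="news"]', $tr)->item(0);
        $link = $xpath->query('td/a[@class="news"]', $tr)->item(0);
        $rows[] = array(
            'id'    => count($rows) + 1,
            'date'  => $date ? trim($date->textContent) : null,
            'title' => trim($link->textContent),
            'url'   => $baseUrl . $link->getAttribute('href'),
        );
    }
    return $rows;
}

$rows = extractRows($html, 'http://bills.ru');
print_r($rows);
```

With a real page you would pass the response body instead of the sample string. Pages that are not plain ASCII may also need an explicit encoding hint before parsing.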
Here is the code I managed to write for the task, and it works.
<?php
use Zend\Http\Client;
use Zend\Dom\Query;
/**
 * Extracts dates, titles and links from the block "события на долговом рынке", saves all the data in one
 * array and prints it.
 *
 * Uses Zend\Http\Client to connect to the website and Zend\Dom\Query's CSS selectors to retrieve the
 * dates, links and titles within the block "события на долговом рынке". Three private methods return
 * the value of each type, and one public method, printTask(), retrieves them.
 *
 * @var client       Zend\Http\Client object; connects to the website set with setUri()
 * @var response     holds the response from the requested website
 * @var dom          Zend\Dom\Query object used to query the HTML fetched by Zend\Http\Client
 * @var results      Zend\Dom\NodeList object returned by execute()
 * @var result       current node in the foreach loop; used to retrieve the title and URL from an <a> tag
 * @var results_date same as results, but for date values
 * @var result_date  same as result, but for date values
 * @var dateArray    array where the date values are stored
 * @var valuesArray  array where the data is stored and printed afterwards
 * @var html         stores the body of the response obtained by client
 */
class BILLS
{
    public $client;
    public $response;
    public $dom;
    public $results;
    public $result;
    public $results_date;
    public $result_date;
    public $dateArray;
    public $valuesArray;
    public $html;
    /**
     * When an object of this class is created, a Zend\Http\Client object is created and its URI is set.
     * The request is sent and the response body is stored in $html for further use.
     * @see client, response, html
     */
    function __construct ()
    {
        $this->client = new \Zend\Http\Client();
        $this->client->setUri('http://bills.ru');
        $this->client->send();
        $this->response = $this->client->getResponse();
        $this->html = $this->response->getBody();
    }
    /**
     * Returns the date value of the current date node
     * @see result_date
     */
    private function _date()
    {
        return $this->result_date->textContent;
    }
    /**
     * Returns the text content of the current link node
     * @see result
     */
    private function _title()
    {
        return $this->result->textContent;
    }
    /**
     * Returns the URL (href attribute) of the current link node
     * @see result
     */
    private function _url()
    {
        return $this->result->getAttribute('href');
    }
    /**
     * If the request succeeded, a new Query object is created and searched for <a> tags with class "news".
     * The data found is then stored in an array inside a foreach loop and printed to the screen. Uses the
     * three private methods to return the value of each type that is stored in the array and printed.
     *
     * @see dom, results_date, dateArray, results, valuesArray, _date(), _url(), _title()
     */
    public function printTask()
    {
        $iteration = 0;
        $iterationData = 0;
        // Initialise the date list so that it is defined even when no dates are found.
        $this->dateArray = array();
        if ($this->response->getStatusCode() == 200)
        {
            $this->dom = new Query($this->html);
            // Collect the first five date cells.
            $this->results_date = $this->dom->execute('table tr td.news');
            foreach ($this->results_date as $this->result_date)
            {
                if ($iterationData < 5)
                {
                    $this->dateArray[$iterationData] = $this->_date();
                    $iterationData++;
                }
            }
            // Pair the first five links with the dates collected above.
            $this->results = $this->dom->execute('table tr td a.news');
            foreach ($this->results as $this->result)
            {
                if ($iteration < 5)
                {
                    $this->valuesArray = array(
                        'id' => $iteration + 1,
                        'date' => isset($this->dateArray[$iteration]) ? $this->dateArray[$iteration] : null,
                        'title' => $this->_title(),
                        'url' => "http://bills.ru" . $this->_url()
                    );
                    echo '<pre>';
                    print_r($this->valuesArray);
                    echo '</pre>';
                    $iteration++;
                }
            }
        }
    }
}
$object = new BILLS;
$object->printTask();
?>
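The question asked for the data to be returned in one array, while the class above prints each row separately. As a rough sketch of that last step, the pairing of dates and links could be moved into a small function that returns the rows. The name `buildRows()`, its parameters and the sample values below are all hypothetical; in practice the inputs would come from the two selector queries.

```php
<?php
/**
 * Pairs dates with links and returns the rows instead of printing them.
 * buildRows() and its parameters are hypothetical names for this sketch.
 */
function buildRows(array $dates, array $links, $baseUrl, $limit = 5)
{
    $rows = array();
    // Never read past the shorter of the two lists or past the limit.
    $count = min($limit, count($dates), count($links));
    for ($i = 0; $i < $count; $i++) {
        $rows[] = array(
            'id'    => $i + 1,
            'date'  => $dates[$i],
            'title' => $links[$i]['title'],
            'url'   => $baseUrl . $links[$i]['href'],
        );
    }
    return $rows;
}

// Made-up sample values, only to show the shape of the result.
$rows = buildRows(
    array('01.02.2016', '02.02.2016'),
    array(
        array('title' => 'First item', 'href' => '/news/1'),
        array('title' => 'Second item', 'href' => '/news/2'),
    ),
    'http://bills.ru'
);
print_r($rows);
```

Returning the array keeps the extraction separate from the output, so the caller can decide whether to print it, encode it as JSON or store it.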