I want to fetch the HTML of a URL. Basically, this is what I'm doing:
$client = new Zend_Http_Client($url);
$client->setConfig(array('strictredirects' => true, 'timeout'=> 100, 'storeresponse' => true));
$response = $client->request();
$html = $response->getBody();
For some URLs that redirect, I get the following error:
Invalid URI supplied
For example, consider the following URL:
http://www.hiexpress.com/redirect?path=hd&brandCode=ex&hotelCode=housl&regionCode=1&localeCode=en&cm_mmc=mdpr-_-GoogleMaps-_-ex-_-housl
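It redirects to another URL. When I try to get the last response, it gives me nothing. How can I get the HTML of this URL? I tried the strictredirects config option, but I still get the same error. How can I solve this?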
Try this. Put it in your controller:
// @$uri your url
$client = new Zend_Http_Client($uri, array(
    'maxredirects' => 2,
    'timeout' => 10,
));
// Try to mimic the requesting user's UA
$client->setHeaders(array(
    'User-Agent' => $_SERVER['HTTP_USER_AGENT'],
    'X-Powered-By' => 'Zend Framework'
));
$response = $client->request();
$body = $response->getBody();
$body = trim($body);
// Get DOM
if( class_exists('DOMDocument') ) {
    $dom = new Zend_Dom_Query($body);
} else {
    $dom = null; // Maybe add b/c later
}
$title = null;
if( $dom ) {
    $titleList = $dom->query('title');
    if( count($titleList) > 0 ) {
        $title = trim($titleList->current()->textContent);
        $title = substr($title, 0, 255);
    }
}
$this->view->title = $title;//Title of the page
$description = null;
if( $dom ) {
    $descriptionList = $dom->queryXpath("//meta[@name='description']");
    // Why are they using caps? -_-
    if( count($descriptionList) == 0 ) {
        $descriptionList = $dom->queryXpath("//meta[@name='Description']");
    }
    if( count($descriptionList) > 0 ) {
        $description = trim($descriptionList->current()->getAttribute('content'));
        $description = substr($description, 0, 255);
    }
}
$this->view->description = $description;// Description of the page
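The title/description extraction above can also be sketched without Zend_Dom_Query, using only PHP's built-in DOM extension. This is a minimal sketch, not the answerer's method: the helper name `extractTitleAndDescription()` and the sample HTML are made up for illustration, and it assumes ext-dom is available. It matches the `name` attribute case-insensitively, so `description` and `Description` are handled by one query.

```php
<?php
// Hypothetical helper: pull <title> and the meta description out of an HTML string.
function extractTitleAndDescription(string $html): array
{
    $result = array('title' => null, 'description' => null);
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '' || !class_exists('DOMDocument')) {
        return $result;
    }

    // Real-world HTML is often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $titles = $xpath->query('//title');
    if ($titles->length > 0) {
        $result['title'] = substr(trim($titles->item(0)->textContent), 0, 255);
    }

    // translate() lowercases @name so both "description" and "Description" match.
    $metas = $xpath->query(
        "//meta[translate(@name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'description']"
    );
    if ($metas->length > 0) {
        $result['description'] = substr(trim($metas->item(0)->getAttribute('content')), 0, 255);
    }

    return $result;
}

// Sample input (made up for this sketch).
$sample = '<html><head><title> Example Hotel </title>'
        . '<meta name="Description" content="A sample page."></head><body></body></html>';
$info = extractTitleAndDescription($sample);
echo $info['title'], "\n", $info['description'], "\n";
```

In a controller, the two array entries could be assigned to `$this->view->title` and `$this->view->description` just as in the code above.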
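For some reason the Zend Http Client did not work, so I had to use cURL to get the job done. To get the HTML:
// Each CURLOPT_HTTPHEADER entry is a single "Name: value" line; cURL adds the line endings itself
$headers = array("User-Agent: MyAgent/1.0");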
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_MAXREDIRS, 50);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_TIMEOUT, 20);
$html = curl_exec($curl);
curl_close($curl);
echo $html;
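One thing the snippet above does not show is failure handling: `curl_exec()` returns `false` when the transfer fails, and `curl_error()` gives the reason. Here is a minimal sketch of the same request wrapped in a function that surfaces that error. The name `fetchHtml()` and the default of 10 redirects are assumptions for illustration, and unlike the snippet above it leaves certificate verification at cURL's default instead of disabling it.

```php
<?php
// Hypothetical helper: fetch a URL, follow redirects, and throw if the transfer fails.
function fetchHtml(string $url, int $maxRedirects = 10): string
{
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // assumed limit, adjust as needed
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 20,
        CURLOPT_HTTPHEADER     => array('User-Agent: MyAgent/1.0'),
    ));

    $html = curl_exec($curl);
    if ($html === false) {
        // curl_error() describes why the transfer failed (DNS, timeout, too many redirects, ...).
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }

    // The handle is released automatically when $curl goes out of scope.
    return $html;
}
```

Calling `fetchHtml($url)` inside a `try`/`catch (RuntimeException $e)` block then makes it possible to tell a real failure apart from an empty page.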
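To get the effective (redirected) URL:
// Each CURLOPT_HTTPHEADER entry is a single "Name: value" line; cURL adds the line endings itself
$headers = array("User-Agent: MyAgent/1.0");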
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_MAXREDIRS, 50);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_TIMEOUT, 20);
$content = curl_exec($curl);
$redirectedUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
curl_close($curl);
echo $redirectedUrl;
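`CURLINFO_EFFECTIVE_URL` only reports the final URL. If the goal is to see which hop hands back a URI that a stricter client might reject, it can help to list every `Location` header along the way. The sketch below is only a debugging aid and does not claim to identify the cause of the error in the question: the helper name `extractRedirectChain()` and the sample headers are made up for illustration.

```php
<?php
// Hypothetical helper: list the Location header of every hop in a raw header dump.
// With cURL, the raw headers could be obtained by setting CURLOPT_HEADER to true and
// taking substr($response, 0, curl_getinfo($curl, CURLINFO_HEADER_SIZE)).
function extractRedirectChain(string $rawHeaders): array
{
    $locations = array();
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Header names are case-insensitive, hence the /i flag.
        if (preg_match('/^Location:\s*(.+)$/i', $line, $matches)) {
            $locations[] = trim($matches[1]);
        }
    }
    return $locations;
}

// Sample header blocks for three hops (made up for this sketch).
$sample = "HTTP/1.1 302 Found\r\nLocation: http://example.com/step2\r\n\r\n"
        . "HTTP/1.1 301 Moved Permanently\r\nlocation: /final page\r\n\r\n"
        . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
$chain = extractRedirectChain($sample);
print_r($chain);
```

In the sample, the second hop returns a relative target containing a space, which is the kind of value worth checking when a client reports an invalid URI after a redirect.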