Simple HTML DOM URL length error

<?php
include('../simple_html_dom.php');
$fname = "http://www.myurl.com";
$html = file_get_html($fname);
$divs = $html->find('h6');
foreach($divs as $element)
{
 $title = $element->find('a', 0)->plaintext;
 echo $title.'<br>';
}
echo '<br>';
?>

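I get this error:

"failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error......."

My URL is very long; its actual length is 750 characters. If I use wget, it says "File name too long".

How can I fix this? I need it to work with Simple HTML DOM.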

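A URL length of 750 characters is fine. The most commonly cited practical limit is about 2,000 characters (2,083 in older versions of Internet Explorer). wget's "File name too long" refers to the local file it tries to save, which it names after the URL, not to the URL itself.

You should try emulating a web browser when making the request. See this other question.

Edit: use cURL in your code: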

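<?php
// include is not a function, so don't use parentheses (and prefer require here)
require '../simple_html_dom.php';
$fname = "http://www.myurl.com";
// Send a browser-like User-Agent; some servers reject requests without one
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
// Skips TLS certificate checks (only relevant for https URLs, and insecure)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// Left off so it doesn't pollute your output; uncomment to debug
//curl_setopt($ch, CURLOPT_VERBOSE, true);
// Make curl_exec() return the response body instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, $fname);
$result = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($result === false || $status !== 200) {
    die('Request failed: HTTP ' . $status . ' ' . curl_error($ch));
}
curl_close($ch);
$html = new simple_html_dom();
$html->load($result);
$divs = $html->find('h6');
foreach ($divs as $element)
{
    // Skip <h6> elements that contain no link
    $link = $element->find('a', 0);
    if ($link) {
        echo $link->plaintext . '<br>';
    }
}
echo '<br>';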
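Since the question asks to stay with Simple HTML DOM's file_get_html() (which, as far as I know, wraps file_get_contents()), here is a sketch of the same browser emulation using PHP's http stream wrapper instead of cURL. It sends a User-Agent through a stream context and sets ignore_errors, so a 500 response can be inspected instead of ending in "failed to open stream". The function names fetchWithStatus and statusFromHeaders are hypothetical.

```php
<?php
// Sketch: fetch a page through PHP's http stream wrapper (not cURL), sending a
// browser-like User-Agent and keeping the response even on 4xx/5xx statuses.

// Extract the status code from raw header lines such as $http_response_header.
// After redirects there are several status lines; the last one is the final response.
function statusFromHeaders(array $headers): ?int {
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: returns [status code or null, body string or false].
function fetchWithStatus(string $url, string $userAgent): array {
    $context = stream_context_create([
        'http' => [
            'user_agent'    => $userAgent,
            'ignore_errors' => true, // return the body instead of failing on a 500
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    // PHP fills $http_response_header in this scope when a response arrives
    $headers = $http_response_header ?? [];
    return [statusFromHeaders($headers), $body];
}
```

Usage: [$status, $body] = fetchWithStatus($fname, $agent); then, if $status is 200, build the DOM with str_get_html($body); otherwise print $status (and perhaps $body) to see what the server is objecting to.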

The URL length is fine. The link may be broken or may have expired. I tried the link shown below, and the result looks fine:

<?php
include("simple_html_dom.php");
$fname = "http://www.youtubeonfire.com/?genre=0&language=0&next_token=rO0ABXNyACdjb20uYW1hem9uLnNkcy5RdWVyeVByb2Nlc3Nvci5Nb3JlVG9rZW7racXLnINNqwMA%0AC0kAFGluaXRpYWxDb25qdW5jdEluZGV4WgAOaXNQYWdlQm91bmRhcnlKAAxsYXN0RW50aXR5SURa%0AAApscnFFbmFibGVkSQAPcXVlcnlDb21wbGV4aXR5SgATcXVlcnlTdHJpbmdDaGVja3N1bUkACnVu%0AaW9uSW5kZXhaAA11c2VRdWVyeUluZGV4TAANY29uc2lzdGVudExTTnQAEkxqYXZhL2xhbmcvU3Ry%0AaW5nO0wAEmxhc3RBdHRyaWJ1dGVWYWx1ZXEAfgABTAAJc29ydE9yZGVydAAvTGNvbS9hbWF6b24v%0Ac2RzL1F1ZXJ5UHJvY2Vzc29yL1F1ZXJ5JFNvcnRPcmRlcjt4cAAAAAEAAAAAAAABds0AAAAAAQAA%0AAAC71ED7AAAAAAFwdAAQMDAwMDAwMDAwMDAwMjAxM35yAC1jb20uYW1hem9uLnNkcy5RdWVyeVBy%0Ab2Nlc3Nvci5RdWVyeSRTb3J0T3JkZXIAAAAAAAAAABIAAHhyAA5qYXZhLmxhbmcuRW51bQAAAAAA%0AAAAAEgAAeHB0AApERVNDRU5ESU5HeA%3D%3D&sort=2";
$html = file_get_html($fname);
$divs = $html->find("h6");
foreach($divs as $element) {
    $title = $element->find("a", 0)->plaintext;
    echo($title . "<br />");
}
echo("<br />");
Output:

Spider (2013)
500 MPH STORM 2013 HD
Van Diemans Land (Action,Adventure,20...
Good Agent is A Bad Agent (Full HQ En...
Employee of the Month (Full HQ Englis...
The Croods (2013)
GIRLFRIENDS - 2013
Boys Are Pigs-2013
The Patriot -2013
My Daughter&#x27;s Secret -2013
Dead on Arrival [2013]
Flght 2013XViD1
Samsung Galaxy S4 Presentation UNPACK...
Affinity 2013
Golden Globe Awards 2013: Full Show
Parker-2013
Hells&#x27; Kitchen-  New Action Movie 2013
ALIENS [2013]
7 Nights Of Darkness -2013
Hansel And Gretel 2013
The Collection (2012)
Mac And Devin Go To High School 2012
Red Dawn (2012)
Hijacked -2012
Bending The Rules -2012
Inside -2012
VAMPIRELAND-2012
Dead Mine -2012
Devil Seed-2012
Kill Em All -2012
One In The Chamber -2012
The Forger - 2012
Dark Desire -2012
A Common Man -2012 .
The Helpers -2012
Red Dawn- 2012 720p

So once you fix the problem with the URL, everything should work!

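You say your URL works in your browser, yet everyone here gets a 500 error, just like your script does.

The site may be validating the token in the URL against your IP address and other request headers. If so, you need a way to obtain a tokenized URL from within your PHP script.

To do this, first download the main page from your PHP script, then find the URL of the "next" link and use that link in your script.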
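The steps above can be sketched as follows. This assumes the main page's "next" link is an <a> whose href contains next_token (the parameter name comes from the URL in the question); the helper names findNextLink and resolveUrl are hypothetical, and PHP's built-in DOM extension stands in for simple_html_dom so the example runs on its own.

```php
<?php
// Sketch under assumptions: the "next" link is an <a> whose href contains
// "next_token". Uses PHP's DOM extension instead of simple_html_dom.

// Return the absolute URL of the first "next_token" link, or null if none.
function findNextLink(string $html, string $baseUrl): ?string {
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = $xpath->query('//a[contains(@href, "next_token")]');
    if ($links === false || $links->length === 0) {
        return null;
    }
    // getAttribute() returns the href with entities such as &amp; already decoded
    $href = $links->item(0)->getAttribute('href');
    return resolveUrl($baseUrl, $href);
}

// Hypothetical helper: resolve the common relative-link forms against $base.
// Not a full RFC 3986 resolver (it ignores "../" segments, for example).
function resolveUrl(string $base, string $href): string {
    $p = parse_url($base);
    if (preg_match('#^https?://#i', $href)) {
        return $href;                              // already absolute
    }
    if (str_starts_with($href, '//')) {
        return $p['scheme'] . ':' . $href;         // protocol-relative
    }
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if (str_starts_with($href, '/')) {
        return $origin . $href;                    // root-relative
    }
    $path = $p['path'] ?? '/';
    if (str_starts_with($href, '?')) {
        return $origin . $path . $href;            // query only: same path
    }
    // Path-relative: replace the last segment of the base path
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $href;
}
```

For example, download the front page (with a browser User-Agent if the site requires one), call findNextLink($page, 'http://www.youtubeonfire.com/'), and pass the result to file_get_html() in place of the hard-coded URL. If the site does tie tokens to the requesting IP and headers, the token is then issued to your script rather than to your browser.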