cURL with specified URLs works, with preg_match URLs fails

The website I'm working with stores two cookies when you visit it (ASP.NET_SessionID and __RequestVerificationToken_XXXXXXXXX).

The page consists of a div with a link to the pdf and an iframe whose source is a "pdf viewer".

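I'm trying to use cURL to retrieve both cookies and then download the pdf. I found that I had to set several options in cURL. However, I still can't download the pdf.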

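My current setup is:

  1. Hit the main page, (a) save the ASP.NET_SessionID cookie, (b) find the "pdf viewer" URL from the iframe, (c) find the pdf download URL
  2. Hit the "pdf viewer" URL and save the __RequestVerificationToken_XXXXXXXXX cookie
  3. Create the cookie header from steps 1 and 2
  4. Download the file with cURL, using the pdf download URL and sending the cookie header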
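
For reference, the steps above could also be sketched with cURL's own cookie engine, so that cookies set in steps 1 and 2 are replayed automatically instead of being copied into a header by hand. This is only a minimal sketch under assumptions, not the code from this post: the helper name `session_curl_options` is hypothetical, and it assumes every request in the session shares the same cookie file.

```php
<?php
// Hypothetical helper (not from the original post): builds one option set
// that every request in the session can share.
function session_curl_options(string $cookieJar, string $agent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $agent,
        // COOKIEFILE turns on the cookie engine and sends cookies found in the file;
        // COOKIEJAR writes received cookies back when the handle is closed or freed.
        CURLOPT_COOKIEFILE     => $cookieJar,
        CURLOPT_COOKIEJAR      => $cookieJar,
    ];
}

// Usage sketch: apply the same options to each handle in turn.
// $ch = curl_init($url);
// curl_setopt_array($ch, session_curl_options('cookie.txt', $agent));
```

Setting only `CURLOPT_COOKIEJAR` saves cookies but does not send them on later requests, so pairing it with `CURLOPT_COOKIEFILE` is the part that matters here.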

However, the file I end up with is just a login page.

First cURL:

$agent= 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0';
$report_url = "[my_main_url_here]";
$ch1 = curl_init($report_url);
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch1, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch1, CURLOPT_HEADER, true);
curl_setopt($ch1, CURLOPT_SSLVERSION, 4);
curl_setopt($ch1, CURLOPT_USERAGENT, $agent);
curl_setopt($ch1, CURLOPT_SSL_CIPHER_LIST, 'AES128-SHA:RC2-CBC-MD5');
curl_setopt($ch1, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch1, CURLOPT_HEADER, 1);
curl_setopt($ch1, CURLOPT_VERBOSE, true);
curl_setopt($ch1, CURLOPT_NOBODY, false);
$output1 = curl_exec($ch1);
curl_close($ch1);

I use preg_match to find the pdf download link:

preg_match("/'/ReportID=.{30}/", $output1, $pdf_link);
$pdf_viewer_full = "https://gate.aon.com" . $pdf_link[0];

Then I hit the pdf viewer URL to get the second cookie:

$ch2 = curl_init($viewer_url_full);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch2, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch2, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch2, CURLOPT_HEADER, true);
curl_setopt($ch2, CURLOPT_SSLVERSION, 4);
curl_setopt($ch2, CURLOPT_USERAGENT, $agent);
curl_setopt($ch2, CURLOPT_SSL_CIPHER_LIST, 'AES128-SHA:RC2-CBC-MD5');
curl_setopt($ch2, CURLOPT_HEADER, 1);
curl_setopt($ch2, CURLOPT_VERBOSE, true);
curl_setopt($ch2, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch2, CURLOPT_NOBODY, false);
$output2 = curl_exec($ch2);
curl_close($ch2);

Then I pull the cookies out of the headers of the two responses:

preg_match("/ASP.NET_SessionId=......................../", $output1, $cookie1);
preg_match("/__RequestVerificationToken_.{145}/", $output2, $cookie2);
$cookies = 'Cookie: ' . $cookie1[0] . '; ' . $cookie2[0];
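
The fixed-length patterns above depend on the exact length of each cookie value. As a hedged alternative, the cookies could be read from the `Set-Cookie` headers directly. The helper names `extract_cookies` and `cookie_header` and the sample response are hypothetical, and the sketch assumes the response was fetched with `CURLOPT_HEADER` enabled so the headers are part of the returned string.

```php
<?php
// Hypothetical helper: collects name => value pairs from every Set-Cookie
// header in a raw response (headers followed by body).
function extract_cookies(string $rawResponse): array
{
    $cookies = [];
    // Capture the cookie name and its value up to the first ";".
    // Attributes such as path or HttpOnly are ignored.
    preg_match_all(
        '/^Set-Cookie:\s*([^=\s]+)=([^;\r\n]*)/mi',
        $rawResponse,
        $matches,
        PREG_SET_ORDER
    );
    foreach ($matches as $match) {
        $cookies[$match[1]] = $match[2];
    }
    return $cookies;
}

// Hypothetical helper: builds a Cookie request header from parsed cookies.
function cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}

// Made-up sample response, for illustration only.
$sample = "HTTP/1.1 200 OK\r\n"
        . "Set-Cookie: ASP.NET_SessionId=abc123; path=/; HttpOnly\r\n"
        . "\r\n<html></html>";
echo cookie_header(extract_cookies($sample)), "\n";
```

With two responses, something like `cookie_header(extract_cookies($output1) + extract_cookies($output2))` would merge both sets into one header.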

Then I try to download the file:

$headers = array ($cookies);
$file = fopen ('Report.pdf', 'w+');
$ch3 = curl_init($pdf_link_full);
curl_setopt($ch3, CURLOPT_SSL_CIPHER_LIST, 'AES128-SHA:RC2-CBC-MD5');
curl_setopt($ch3, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch3, CURLOPT_FILE, $file);
curl_setopt($ch3, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch3, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch3, CURLOPT_SSLVERSION, 4);
curl_setopt($ch3, CURLOPT_USERAGENT, $agent);
curl_setopt($ch3, CURLOPT_COOKIEFILE, "cookie.txt");
$output3 = curl_exec($ch3);
curl_close($ch3);

Edit: If I set $pdf_link_full manually, it works. However, if I find it with preg_match (as above), it fails.

However, if I print $pdf_link_full and $pdf_link_full_2, they look exactly the same. Am I missing an encoding step or something else? Thanks

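The problem was with my preg_match. It was returning a URL containing &amp;, whereas when I set the URL manually I used only a plain ampersand (&).

Replacing &amp; with & solved the issue.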
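
A minimal sketch of that fix, assuming the link is scraped out of an HTML attribute, where an ampersand is written as the entity `&amp;`. The sample markup, path, and query parameters below are made up for illustration. `html_entity_decode` is used here instead of a plain string replacement so that other entities would be decoded as well.

```php
<?php
// Hypothetical sample: a link as it might appear inside the page source.
$html = '<a href="/Reports/Download?ReportID=123&amp;Format=pdf">PDF</a>';

preg_match('/href="([^"]+)"/', $html, $m);
$raw = $m[1];                                        // still contains "&amp;"
$url = html_entity_decode($raw, ENT_QUOTES, 'UTF-8'); // "&amp;" becomes "&"

var_dump($raw === $url);                             // false: the strings differ
echo $url, "\n";
```

If the two strings were compared by printing them into a web page, the browser would presumably have rendered `&amp;` as `&`, which would explain why they looked identical. Comparing them with `var_dump` or `strlen` in a plain-text context would show the difference.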