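In my code, my regexp works fine when I assign the HTML content directly to a variable, but not when I fetch the content from the URL with cURL. With the URL, I get an empty array.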
<?php
$productmfgno = "154637401";
$url = "http://www.pandorasoem.com/search#q=".$productmfgno;
$ch1= curl_init();
curl_setopt ($ch1, CURLOPT_URL, $url );
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch1,CURLOPT_VERBOSE,1);
curl_setopt($ch1, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)');
curl_setopt ($ch1, CURLOPT_REFERER,'http://www.google.com'); //just a fake referer
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch1,CURLOPT_POST,0);
curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, 20);
$htmlContent= curl_exec($ch1);
curl_close($ch1);
/* It works when I assign this html content to $htmlContent variable but not working with cURL url
$htmlContent = '<div class="findify-navigation-header findify-clearfix"> <div class="findify-pagination findify-push-right"></div> <div class="findify-header">Showing 2 results for <span class="findify-query">"154637401"</span>. <span id="findify-didyoumean"></span></div> </div>';
*/
preg_match_all('/<div.*class=\"findify\-header\".*?>(.*?)<span.*class=\"findify-query\">.*?<\/div>/Us', $htmlContent, $count);
print_r($count);
Expected result: Showing 2 results for
so that I can extract the result count.
The problem is that the page you are requesting contains no results. The actual search is performed via Ajax after the page has loaded.
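One way to see why the cURL request comes back without results is to look at how the URL is split. This is only a small illustration using PHP's built-in `parse_url`: the search term sits in the fragment (the part after `#`), which is handled by the browser and is not part of the HTTP request, so the server only ever sees `/search`.

```php
<?php
// The fragment (everything after '#') stays on the client side;
// it is not included in the HTTP request that cURL sends.
$url = "http://www.pandorasoem.com/search#q=154637401";
$parts = parse_url($url);

echo $parts['path'], "\n";      // the path the server is actually asked for
echo $parts['fragment'], "\n";  // read only by the page's JavaScript in a browser
```

Since the search term never reaches the server this way, the returned HTML cannot contain the "Showing 2 results" block, whatever regexp is applied to it.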
What you are probably looking for is the search's Ajax endpoint, which returns the results as JavaScript code (not plain JSON). Here it is:
http://api.findify.io/v1.0/store/search?callback=jQuery111206735094679573879_1458022087824&q=154637401&key=5b31ee91-78fa-48e1-9338-1748ca55028e&analytics%5Bkey%5D=5b31ee91-78fa-48e1-9338-1748ca55028e&analytics%5Bvisit%5D=true&analytics%5Buniq%5D=true&analytics%5Burl%5D=http%253A%252F%252Fwww.pandorasoem.com%252Fsearch%2523q%253D154637401&analytics%5Bbaseurl%5D=http%253A%252F%252Fwww.pandorasoem.com%252Fsearch%2523q%253D154637401&analytics%5Bhost%5D=www.pandorasoem.com&analytics%5Bwidth%5D=1920&analytics%5Bheight%5D=1200&analytics%5Binner_width%5D=1438&analytics%5Binner_height%5D=667&analytics%5Bdoc_width%5D=1438&analytics%5Bdoc_height%5D=915&analytics%5Bscroll_x%5D=0&analytics%5Bscroll_y%5D=0&analytics%5Bvisit_id%5D=Ts22zuHHGJRZc3U1&analytics%5Buniq_id%5D=BoeCUKSzgdML6C50&byPage=24&page=0&_=1458022087825
UPD: Since the format is different, you will need a new regular expression. Something like this should work:
preg_match_all('/["\']?totalHits["\']?\s*:\s*(\d+)/i', $htmlContent, $count);
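As a minimal sketch of how a `totalHits` pattern could be applied to the endpoint's response, the helper below wraps the match in a function. The function name `extractTotalHits` and the sample response body are hypothetical; the real payload layout is an assumption, and only the `totalHits` field name comes from the pattern in this answer. Note that PHP's PCRE functions have no `g` modifier, so only `i` is used here.

```php
<?php
/**
 * Pull the total hit count out of a response body.
 * Returns null when no totalHits field is found.
 * (Hypothetical helper, not part of any library.)
 */
function extractTotalHits(string $body): ?int
{
    // Optional quotes around the key, optional whitespace around the colon.
    if (preg_match('/["\']?totalHits["\']?\s*:\s*(\d+)/i', $body, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Made-up response body for illustration; the real structure may differ.
$sample = 'jQuery111206735094679573879_1458022087824({"meta":{"totalHits":2}})';

var_dump(extractTotalHits($sample));          // int(2)
var_dump(extractTotalHits('no count here'));  // NULL
```

In the original script, the string returned by `curl_exec` for the endpoint URL would be passed in place of `$sample`. Using `preg_match` rather than `preg_match_all` is a design choice here, on the assumption that only the first count is needed.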