Regexp not working with cURL url contents

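In my code, my regexp works fine when I assign the HTML content to a variable directly, but when I fetch the page from the URL with cURL I get an empty array.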

    <?php
    $productmfgno = "154637401";
    $url = "http://www.pandorasoem.com/search#q=" . $productmfgno;

    $ch1 = curl_init();
    curl_setopt($ch1, CURLOPT_URL, $url);
    curl_setopt($ch1, CURLOPT_HEADER, 0);
    curl_setopt($ch1, CURLOPT_VERBOSE, 1);
    curl_setopt($ch1, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)');
    curl_setopt($ch1, CURLOPT_REFERER, 'http://www.google.com'); // just a fake referer
    curl_setopt($ch1, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch1, CURLOPT_POST, 0);
    curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, true); // boolean flag
    curl_setopt($ch1, CURLOPT_MAXREDIRS, 20);        // redirect limit goes here
    $htmlContent = curl_exec($ch1);
    if ($htmlContent === false) {
        echo 'cURL error: ' . curl_error($ch1);
    }
    curl_close($ch1);

    /* It works when I assign this HTML content to $htmlContent, but not with the cURL URL:
    $htmlContent = '<div class="findify-navigation-header findify-clearfix"> <div class="findify-pagination findify-push-right"></div> <div class="findify-header">Showing 2 results for <span class="findify-query">"154637401"</span>. <span id="findify-didyoumean"></span></div> </div>';
    */
    preg_match_all('/<div.*class="findify-header".*?>(.*?)<span.*class="findify-query">.*?<\/div>/Us', $htmlContent, $count);
    print_r($count);

Expected result: Showing 2 results for

so that I can get the result count from it.
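Since the goal is the number itself, a narrower pattern that captures only the digits may be easier to work with than capturing the whole header text. A minimal sketch, checked only against the static snippet from the question; the helper name `extractResultCount` is hypothetical:

```php
<?php
// Hypothetical helper: pulls the result count out of the Findify header text.
// Returns null when the text is absent (as in the HTML that cURL actually receives).
function extractResultCount(string $html): ?int
{
    if (preg_match('/Showing\s+(\d+)\s+results?\s+for/i', $html, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

$htmlContent = '<div class="findify-navigation-header findify-clearfix"> <div class="findify-pagination findify-push-right"></div> <div class="findify-header">Showing 2 results for <span class="findify-query">"154637401"</span>. <span id="findify-didyoumean"></span></div> </div>';
var_dump(extractResultCount($htmlContent)); // int(2)
```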

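The problem is that the page you are requesting contains no results. The actual search is performed via Ajax after the page has loaded, and cURL does not run the page's JavaScript.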
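One way to see why the fetched HTML cannot contain the results: everything after `#` in the question's URL is a fragment, which an HTTP client keeps to itself and never sends to the server. The server only sees `/search`; presumably the page's JavaScript reads the fragment and fires the Ajax search. PHP's built-in `parse_url()` shows the split (a small illustration, not part of the original answer):

```php
<?php
$url = "http://www.pandorasoem.com/search#q=154637401";

// What the server receives in the request line: the path (plus a query string, if any).
$path     = parse_url($url, PHP_URL_PATH);     // "/search"
$query    = parse_url($url, PHP_URL_QUERY);    // null: there is no "?..." part
// What stays on the client side and is never transmitted.
$fragment = parse_url($url, PHP_URL_FRAGMENT); // "q=154637401"

var_dump($path, $query, $fragment);
```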

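What you are probably looking for is the result of the search's Ajax endpoint, which returns JavaScript code (a JSONP callback) rather than plain JSON. Here it is: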

http://api.findify.io/v1.0/store/search?callback=jQuery111206735094679573879_1458022087824&q=154637401&key=5b31ee91-78fa-48e1-9338-1748ca55028e&analytics%5Bkey%5D=5b31ee91-78fa-48e1-9338-1748ca55028e&analytics%5Bvisit%5D=true&analytics%5Buniq%5D=true&analytics%5Burl%5D=http%253A%252F%252Fwww.pandorasoem.com%252Fsearch%2523q%253D154637401&analytics%5Bbaseurl%5D=http%253A%252F%252Fwww.pandorasoem.com%252Fsearch%2523q%253D154637401&analytics%5Bhost%5D=www.pandorasoem.com&analytics%5Bwidth%5D=1920&analytics%5Bheight%5D=1200&analytics%5Binner_width%5D=1438&analytics%5Binner_height%5D=667&analytics%5Bdoc_width%5D=1438&analytics%5Bdoc_height%5D=915&analytics%5Bscroll_x%5D=0&analytics%5Bscroll_y%5D=0&analytics%5Bvisit_id%5D=Ts22zuHHGJRZc3U1&analytics%5Buniq_id%5D=BoeCUKSzgdML6C50&byPage=24&page=0&_=1458022087825
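Because the endpoint wraps its payload in the function named by the `callback` parameter (JSONP), an alternative to a regex is to strip the wrapper and decode the JSON. A sketch, assuming the body is exactly `callbackName({...});` with nothing before the callback name, and that the payload has a top-level `totalHits` field (as the `totalHits` regex in the update suggests). The helper name and the sample body are hypothetical:

```php
<?php
// Hypothetical helper: strips a JSONP wrapper such as `cb({...});` and decodes the JSON inside.
// Returns null if the body has no parenthesised payload or the JSON is not an object/array.
function decodeJsonp(string $body): ?array
{
    $start = strpos($body, '(');  // first "(" follows the callback name
    $end   = strrpos($body, ')'); // last ")" closes the callback call
    if ($start === false || $end === false || $end <= $start) {
        return null;
    }
    $json = substr($body, $start + 1, $end - $start - 1);
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

// Made-up sample body; the real response carries many more fields.
$body = 'jQuery111206735094679573879_1458022087824({"totalHits":2,"items":[]});';
$data = decodeJsonp($body);
echo $data['totalHits'] ?? 'not found', PHP_EOL; // 2
```

Decoding is less brittle than a regex if the API reorders or reformats fields, but it relies on the wrapper assumption above.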

UPD: Since the format is different, you need a new regex. Something like this should do:

    preg_match_all('/["\']?totalHits["\']?\s*:\s*(\d+)/i', $htmlContent, $count);
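A quick check of the `totalHits` pattern against a made-up JSONP body (the sample string is hypothetical; the real response may format `totalHits` differently). PHP's PCRE functions have no `g` modifier and reject it with an "Unknown modifier" warning: `preg_match_all()` is already global, and `preg_match()` is enough when only the first hit matters.

```php
<?php
// Pattern with only the `i` modifier; `\'` inside single quotes is a literal quote.
$pattern = '/["\']?totalHits["\']?\s*:\s*(\d+)/i';

// Hypothetical JSONP body, shaped like what the Findify endpoint is assumed to return.
$htmlContent = 'jQuery111206735094679573879_1458022087824({"totalHits":2,"items":[]});';

$total = null;
if (preg_match($pattern, $htmlContent, $m) === 1) {
    $total = (int) $m[1];
}
echo $total, PHP_EOL; // 2
```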