How to scrape data from asmx web service generated page

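I have been searching the web but haven't found anything useful. I need to update my product prices automatically from a supplier's website, and I want to scrape the information for all products at once from the category page.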

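I used Simple HTML DOM to fetch the data. When I used the tags I found with the Firebug extension in Firefox to retrieve the prices, it printed nothing. I tried printing all the links on the category page, but none of them were product links. When I right-click the page and view the site's source, I don't see any product-related markup. The div is empty, like this: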

<div class="coll-2 fleft"> </div>

But in Firebug the same div is full of markup. Then I found a JS file with code like this:

function GetProductListHeader() {
    var startPage = GetStartPage();
    if (pageName == 'kategori' || pageName == 'reyon') {
        // Filters are read from the page's query string
        var BrandList = GetQueryStringByName("Brand");
        var ColorList = GetQueryStringByName("Color");
        var PropList = GetQueryStringByName("propid");
        var ItemDim1CodeList = GetQueryStringByName("vcode");
        var QPrice = GetQueryStringByName("price");
        var cFilter = GetQueryStringByName("cfilter");
        var parametre = { PageName: pageName, pUrl: PageUrl, BrandList: BrandList, ColorList: ColorList, ItemDim1CodeList: ItemDim1CodeList, PropList: PropList, QPrice: QPrice, cFilter: cFilter, startPage: startPage };
        // POST the parameters as a JSON body to the ASMX web method
        $.ajax({
            url: '/WS/wsProduct.asmx/GetProductListHeader',
            type: 'POST',
            processData: false,
            contentType: 'application/json; charset=utf-8',
            data: JSON.stringify(parametre),
            dataType: 'json',
            async: true
        })
        .done(function (e) {
            // ASP.NET wraps the result in e.d; here it is HTML injected into the empty div
            if (e.d != "") {
                $('.coll-2').html(e.d);
                GetProductList(startPage);
            }
        });
    }
}
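The parametre object in that script is only the JSON body of a POST request, so PHP can produce the same body with json_encode. A minimal sketch, assuming the field set of the GetProductList request captured in the edit further down (which sends pIndex for the page number instead of startPage); the function name build_product_list_payload is hypothetical:

```php
<?php
// Hypothetical helper: builds the JSON body that the site's jQuery code POSTs to
// /WS/wsProduct.asmx/GetProductList. Field names are copied from the captured request;
// empty strings mean "no filter".
function build_product_list_payload(string $pageName, string $pUrl, int $pIndex = 1): string
{
    $params = [
        'PageName'         => $pageName, // 'kategori' or 'reyon'
        'pUrl'             => $pUrl,     // last URL segment, e.g. 'bakim-cantalari'
        'pIndex'           => $pIndex,   // page number
        'BrandList'        => '',
        'ColorList'        => '',
        'ItemDim1CodeList' => '',
        'PropList'         => '',
        'QPrice'           => '',
        'cFilter'          => '',
    ];
    return json_encode($params);
}

// prints {"PageName":"kategori","pUrl":"bakim-cantalari","pIndex":1,"BrandList":"","ColorList":"","ItemDim1CodeList":"","PropList":"","QPrice":"","cFilter":""}
echo build_product_list_payload('kategori', 'bakim-cantalari'), PHP_EOL;
```

Building the body with json_encode avoids hand-escaping quotes for whichever shell or HTTP client ends up sending it.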

Is there a way to get this data with PHP?

Thanks.

Edit: After copying the cURL command from Chrome's Network tab, I tried to adapt it and used the following script:

$html = 'curl "http://bebekbayi.com/WS/wsProduct.asmx/GetProductList"
    -H "Cookie: ASP.NET_SessionId=wy5hyt1bujcrdka2hpbp2wnm; _gat=1; _ga=GA1.2.1204447549.1447830812"
    -H "Origin: http://bebekbayi.com"
    -H "Accept-Encoding: gzip, deflate"
    -H "Accept-Language: tr-TR,tr;q=0.8,en-US;q=0.6,en;q=0.4"
    -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
    -H "Content-Type: application/json; charset=UTF-8"
    -H "Accept: application/json, text/javascript, */*; q=0.01"
    -H "Cache-Control: max-age=0"
    -H "X-Requested-With: XMLHttpRequest"
    -H "Connection: keep-alive"
    -H "Referer: http://bebekbayi.com/kategori/bakim-cantalari"
    --data-binary "{""PageName"":""kategori"",""pUrl"":""bakim-cantalari"",""pIndex"":1,""BrandList"":"""",""ColorList"":"""",""ItemDim1CodeList"":"""",""PropList"":"""",""QPrice"":"""",""cFilter"":""""}" --compressed';
exec($html, $result);
foreach ($result as $res) {
    echo $res . '<br>';
}

It returned: [InvalidOperationException: Request format is unrecognized for URL unexpectedly ending in '/GetProductList'.]

I think your task has become easier now, because you can reach the data source directly.

All you need to do is take the full URL of the web service and make a cURL call from PHP.
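The code below shells out to the curl program; PHP's own cURL extension can send the same request without any shell quoting. A minimal sketch, assuming ext-curl is enabled; post_asmx_json is a hypothetical name, and only the headers that matter for a JSON call are kept (the session cookie from the captured request is left out and could be added with CURLOPT_COOKIE if the server turns out to need it):

```php
<?php
// Hypothetical helper (requires PHP's cURL extension): POSTs a JSON body to an ASMX
// script-service method and returns the raw response text.
function post_asmx_json(string $url, string $jsonBody, string $referer): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $jsonBody,
        CURLOPT_HTTPHEADER     => [
            // ASMX script services expect this header for a JSON call
            'Content-Type: application/json; charset=UTF-8',
            'Accept: application/json, text/javascript, */*; q=0.01',
            'X-Requested-With: XMLHttpRequest',
            'Referer: ' . $referer,
        ],
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_ENCODING       => '',   // advertise and transparently decode gzip/deflate
        CURLOPT_TIMEOUT        => 30,
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        // Error responses (for example HTTP 500) carry the server's message in the body
        throw new RuntimeException("HTTP $status: " . substr($response, 0, 200));
    }
    return $response;
}
```

It could be called, for example, as post_asmx_json('http://bebekbayi.com/WS/wsProduct.asmx/GetProductList', $body, 'http://bebekbayi.com/reyon/Anne'), where $body is the JSON string passed to --data-binary in the command below; the returned text can then go through json_decode($response, true).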

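You will then get the response. Often it will be XML, but that depends on how the web service is written; this one returns JSON when it is called with Content-Type: application/json, which is what the code below decodes.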
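Because the page's own script reads e.d, the JSON from this service evidently wraps its result in a d property (the usual ASP.NET script-service convention), and for GetProductListHeader that value is an HTML fragment. A minimal sketch of unwrapping it and querying the fragment with DOMXPath; the helper names asmx_unwrap and extract_texts are hypothetical, and the span.price markup in the example call is invented, since the real product markup isn't shown here:

```php
<?php
// Hypothetical helper: returns the value of the "d" wrapper property, or null if the
// text is not JSON or has no "d" key.
function asmx_unwrap(string $json)
{
    $obj = json_decode($json, true);
    if (!is_array($obj) || !array_key_exists('d', $obj)) {
        return null;
    }
    return $obj['d'];
}

// Hypothetical helper: parses an HTML fragment and returns the trimmed text of every
// node matched by the XPath expression.
function extract_texts(string $html, string $xpath): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // fragments are rarely valid; collect warnings quietly
    // The XML declaration tells libxml the fragment is UTF-8 (Turkish characters survive)
    $doc->loadHTML('<?xml encoding="utf-8" ?><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $texts = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Example with invented markup; the real class names must be read from the HTML in "d".
$sample = '{"d":"<span class=\"price\">12,90 TL</span>"}';
print_r(extract_texts(asmx_unwrap($sample), '//span[@class="price"]'));
```

This is the same Simple HTML DOM idea from the question, applied to the fragment the service returns instead of the empty page source.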

Here is the code:

$html = "curl 'http://bebekbayi.com/WS/wsProduct.asmx/GetProductList'"
    . " -H 'Origin: http://bebekbayi.com' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.8'"
    . " -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36'"
    . " -H 'Content-Type: application/json; charset=UTF-8' -H 'Accept: application/json, text/javascript, */*; q=0.01'"
    . " -H 'Referer: http://bebekbayi.com/reyon/Anne' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive'"
    . " --data-binary '{\"PageName\":\"reyon\",\"pUrl\":\"Anne\",\"pIndex\":1,\"BrandList\":\"\",\"ColorList\":\"\",\"ItemDim1CodeList\":\"\",\"PropList\":\"\",\"QPrice\":\"\",\"cFilter\":\"\"}' --compressed";
exec($html, $result); // one command line, so every -H header reaches curl
$obj = json_decode(implode("", $result), true); // join curl's output lines, then decode the JSON
print_r($obj);
exit;
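The request above fetches only pIndex 1. To collect a whole category, one option is to keep incrementing pIndex until the service returns an empty d, the same emptiness check the site's JavaScript performs. A minimal sketch; fetch_all_pages is a hypothetical name, it assumes (untested against this site) that an empty d means there are no more pages, and $fetch stands for any callable that returns the raw JSON for a page number, such as a wrapper around the command above:

```php
<?php
// Hypothetical helper: requests pages 1, 2, 3, ... until "d" comes back empty or
// $maxPages is reached. Returns the "d" values keyed by page number.
function fetch_all_pages(callable $fetch, int $maxPages = 50): array
{
    $pages = [];
    for ($pIndex = 1; $pIndex <= $maxPages; $pIndex++) {
        $obj = json_decode($fetch($pIndex), true);
        // Treat invalid JSON or a missing "d" the same as an empty page
        $d = is_array($obj) ? ($obj['d'] ?? '') : '';
        if ($d === '') {
            break; // assumed end of the category
        }
        $pages[$pIndex] = $d;
    }
    return $pages;
}

// Usage with a fake fetcher standing in for the real request:
$fake = function (int $pIndex): string {
    return json_encode(['d' => $pIndex <= 2 ? "<div>page $pIndex</div>" : '']);
};
print_r(fetch_all_pages($fake)); // two entries, keyed 1 and 2
```

The $maxPages cap keeps the loop finite if the assumption about an empty d on the last page turns out to be wrong.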