How to extract information from a script tag

In my code, after a few queries, I end up with variable content like the following:

<!DOCTYPE html>
<html dir=ltr>
  <head>
    <script>
      mapslite = {
        START_TIME: new Date()
      };
      mapslite.getBasePageResponse = function(cacheResponse) {
        delete mapslite.getBasePageResponse;
        cacheResponse([[[3988.776886432477,103.7950744,1.3090672],[0,0,0],[1024,768],13.10000038146973],"/maps-lite/js/2/maps_lite_20160404_RC01",107,null,null,["en",""],["/maps/lite/ApplicationService.GetEntityDetails","/maps/lite/ApplicationService.UpdateStarring","/maps/lite/ApplicationService.Search",null,"/maps/lite/suggest","/maps/lite/directions","/maps/lite/MapsLiteService.GetHotelAvailability",null,"https://www.google.com/maps/api/js/.....
,[null,null,1.3090672,103.7950744],null,"11401",null,"PjoDV_jjE8yPuATo_LmYDA","Asia/Singapore",[["\u003cb\u003eBuses\u003c/b\u003e from this station",[[3,"bus.png",null,"Bus",[["https://maps.gstatic.com/mapfiles/transit/iw2/b/bus.png",0,[15,15],null,0]]]],[[null,null,null,null,"0x31da18325b415901:0xeb661015c651c24a",[[5,["48",1,"#ffffff"]]]],[null,null,null,null,"0x31da19f34e04d59b:0x5758ef6990938b",[[5,["61",1,"#ffffff"]]]],[null,null,null,null,"0x31da1a5b8b75c379:0x6a13e189555f9fab",[[5,["95",1,"#ffffff"]]]],[null,null,null,null,"0x31da1a16ea23bf95:0xd7c90f15535c2b9f",[[5,["106",1,"#ffffff"]]]],[null,null,null,null,"0x31da10a7613d616f:0xf1f61ffeac2ea8a4",[[5,["970",1,"#ffffff"]]]],[null,null,null,null,"0x31da1a0bd6262d0b:0xfbd5d2bfd7a1252",[[5,["NR8",1,"#ffffff"]]]]],null,0,"5"]]],["http://www.google.com/search?q=
....
[0,0,"",0,1,null,null,null,0,0,1,1,0,"map,common",null,0,0,1,null,null,1,"1","2,1","","",0],null,null,"PjoDV_jjE8yPuATo_LmYDA",null,null,null,null,"//consent.google.com","2.maps_lite_20160404_RC01"]);
      };
      executeOgJs = function() {
        delete executeOgJs;
      };
    </script>

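The important information I want to extract is all the numbers on the "Buses from this station" line: "48, 61, 95, 106, 970, NR8" (each one sits right next to ,1,"#ffffff").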

I tried the following Python code:

tree = html.fromstring(buspage, base_url=detail['result']['url'])
bus_elm = tree.xpath("/html/body/div[1]/div/div[4]/div[4]/div/div/div[2]/div/div[2]/div[1]/div[2]/div/div/div[2]/div/table/tr/td")

But I ran into errors and difficulties with it. Is there a convenient way to do this in PHP?

I believe that if you are sure the data always has this specific structure, a regex is your best option.

An expression that matches an "array" such as ["5N4",323,"#asdasd"] is \["[a-zA-Z0-9]*?",\d*?,".*?"\].

You can then use explode() in PHP or split() in Python to get the number you want (5N4 in this example), as follows:

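function get_numbers_from($input) {
    $numbers = [];
    // Match arrays such as ["5N4",323,"#asdasd"]; group 1 holds the text between the brackets.
    preg_match_all('/\[("[a-zA-Z0-9]*?",\d*?,".*?")\]/', $input, $matches);
    foreach ($matches[1] as $match) {
        // The first comma-separated field is the quoted number; strip its quotes.
        $numbers[] = trim(explode(',', $match)[0], '"');
    }
    return $numbers;
}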
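
Since the question started out in Python, here is a rough equivalent using the standard re module. It is only a sketch under the same assumption as above, namely that each entry always looks like ["48",1,"#ffffff"]. The names ROUTE_PATTERN and sample are made up for illustration, and the pattern is slightly stricter than the PHP one (it requires at least one character in the label and at least one digit).

```python
import re

# Assumed shape of each entry: ["<label>",<integer>,"<string>"].
# The capture group keeps only the label, so no split() is needed afterwards.
ROUTE_PATTERN = re.compile(r'\["([a-zA-Z0-9]+)",\d+,"[^"]*"\]')


def get_numbers_from(text):
    """Return the label of every ["label",int,"string"] array found in text."""
    return ROUTE_PATTERN.findall(text)


# Hypothetical shortened sample in the same shape as the page content.
sample = '[[5,["48",1,"#ffffff"]]]],[null,"0x31da",[[5,["NR8",1,"#ffffff"]]]]'
print(get_numbers_from(sample))  # ['48', 'NR8']
```

With a single capture group, re.findall returns just the captured labels, which plays the role of the explode()/split() step. If the page layout changes, the pattern will silently return an empty list, so it is worth checking for that case.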