In my code, after several queries, I end up with the following variable content:
<!DOCTYPE html>
<html dir=ltr>
<head>
<script>
mapslite = {
START_TIME: new Date()
};
mapslite.getBasePageResponse = function(cacheResponse) {
delete mapslite.getBasePageResponse;
cacheResponse([[[3988.776886432477,103.7950744,1.3090672],[0,0,0],[1024,768],13.10000038146973],"/maps-lite/js/2/maps_lite_20160404_RC01",107,null,null,["en",""],["/maps/lite/ApplicationService.GetEntityDetails","/maps/lite/ApplicationService.UpdateStarring","/maps/lite/ApplicationService.Search",null,"/maps/lite/suggest","/maps/lite/directions","/maps/lite/MapsLiteService.GetHotelAvailability",null,"https://www.google.com/maps/api/js/.....
,[null,null,1.3090672,103.7950744],null,"11401",null,"PjoDV_jjE8yPuATo_LmYDA","Asia/Singapore",[["\u003cb\u003eBuses\u003c/b\u003e from this station",[[3,"bus.png",null,"Bus",[["https://maps.gstatic.com/mapfiles/transit/iw2/b/bus.png",0,[15,15],null,0]]]],[[null,null,null,null,"0x31da18325b415901:0xeb661015c651c24a",[[5,["48",1,"#ffffff"]]]],[null,null,null,null,"0x31da19f34e04d59b:0x5758ef6990938b",[[5,["61",1,"#ffffff"]]]],[null,null,null,null,"0x31da1a5b8b75c379:0x6a13e189555f9fab",[[5,["95",1,"#ffffff"]]]],[null,null,null,null,"0x31da1a16ea23bf95:0xd7c90f15535c2b9f",[[5,["106",1,"#ffffff"]]]],[null,null,null,null,"0x31da10a7613d616f:0xf1f61ffeac2ea8a4",[[5,["970",1,"#ffffff"]]]],[null,null,null,null,"0x31da1a0bd6262d0b:0xfbd5d2bfd7a1252",[[5,["NR8",1,"#ffffff"]]]]],null,0,"5"]]],["http://www.google.com/search?q=
....
[0,0,"",0,1,null,null,null,0,0,1,1,0,"map,common",null,0,0,1,null,null,1,"1","2,1","","",0],null,null,"PjoDV_jjE8yPuATo_LmYDA",null,null,null,null,"//consent.google.com","2.maps_lite_20160404_RC01"]);
};
executeOgJs = function() {
delete executeOgJs;
};
</script>
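The information I want to extract is every route number on the "Buses from this station" line: 48, 61, 95, 106, 970 and NR8 (each one sits right before ,1,"#ffffff").
I tried this Python code: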
from lxml import html
tree = html.fromstring(buspage, base_url=detail['result']['url'])
bus_elm = tree.xpath("/html/body/div[1]/div/div[4]/div[4]/div/div/div[2]/div/div[2]/div[1]/div[2]/div/div/div[2]/div/table/tr/td")
but I ran into several errors and difficulties. Is there a convenient way to do this in PHP?
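I believe a regex is the best option if you are confident the data always has this particular structure.
An expression that matches an "array" such as ["5N4",323,"#asdasd"] is the pattern passed to preg_match_all() in the function below.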
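Broken down piece by piece, one pattern that fits arrays like ["5N4",323,"#asdasd"] can be written with Python's re.VERBOSE flag so each part carries a comment. The exact pattern and the name ARRAY_PATTERN are assumptions reconstructed for illustration, not quoted from the original answer; note that it deliberately skips arrays such as ["en",""] that lack the integer field.

```python
import re

# Assumed pattern (illustrative, not from the original answer):
# a quoted alphanumeric label, an integer, and a quoted string inside [ ].
ARRAY_PATTERN = re.compile(
    r'''
    \[                  # opening bracket of the inner array
    (                   # group 1: everything between the brackets
      "[a-zA-Z0-9]*?"   # quoted label, e.g. "5N4" or "48"
      ,\d*?             # comma, then the integer field, e.g. 323
      ,".*?"            # comma, then a quoted string, e.g. "#asdasd"
    )
    \]                  # closing bracket
    ''',
    re.VERBOSE,
)

sample = 'noise ["5N4",323,"#asdasd"] more noise ["en",""]'
print(ARRAY_PATTERN.findall(sample))  # ['"5N4",323,"#asdasd"']
```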
You can then use explode() in PHP, or split() in Python, to get the number you want (5N4 in this example), as follows:
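function get_numbers_from($input) {
    $numbers = [];
    // Match arrays shaped like ["48",1,"#ffffff"]; group 1 captures their contents
    preg_match_all('/\[("[a-zA-Z0-9]*?",\d*?,".*?")\]/', $input, $matches);
    foreach ($matches[1] as $match) {
        // The first comma-separated field is the quoted label, e.g. "48"
        $numbers[] = trim(explode(',', $match)[0], '"');
    }
    return $numbers;
}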
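Since split() is mentioned as the Python counterpart of explode(), here is a minimal Python sketch of the same function. It assumes the same reconstructed pattern (not taken verbatim from the original answer) and runs it against a trimmed excerpt of the question's cacheResponse(...) payload; the names ROUTE_ARRAY and sample are illustrative.

```python
import re
from typing import List

# Assumed pattern, same shape as the PHP version: ["<label>",<int>,"<string>"]
ROUTE_ARRAY = re.compile(r'\[("[a-zA-Z0-9]*?",\d*?,".*?")\]')


def get_numbers_from(page_text: str) -> List[str]:
    """Return the first field of every matching array, with its quotes stripped."""
    numbers = []
    for inner in ROUTE_ARRAY.findall(page_text):
        # split(',') mirrors PHP's explode(); field 0 is the quoted label, e.g. '"48"'
        numbers.append(inner.split(',')[0].strip('"'))
    return numbers


# Trimmed excerpt of the payload shown in the question
sample = (
    '[[3,"bus.png",null,"Bus",[["https://maps.gstatic.com/mapfiles/transit/iw2/b/bus.png",0,[15,15],null,0]]]],'
    '[[null,null,null,null,"0x31da18325b415901:0xeb661015c651c24a",[[5,["48",1,"#ffffff"]]]],'
    '[null,null,null,null,"0x31da19f34e04d59b:0x5758ef6990938b",[[5,["61",1,"#ffffff"]]]],'
    '[null,null,null,null,"0x31da1a5b8b75c379:0x6a13e189555f9fab",[[5,["95",1,"#ffffff"]]]],'
    '[null,null,null,null,"0x31da1a16ea23bf95:0xd7c90f15535c2b9f",[[5,["106",1,"#ffffff"]]]],'
    '[null,null,null,null,"0x31da10a7613d616f:0xf1f61ffeac2ea8a4",[[5,["970",1,"#ffffff"]]]],'
    '[null,null,null,null,"0x31da1a0bd6262d0b:0xfbd5d2bfd7a1252",[[5,["NR8",1,"#ffffff"]]]]]'
    ',["en",""]'
)

print(get_numbers_from(sample))  # ['48', '61', '95', '106', '970', 'NR8']
```

Because the pattern relies only on the shape [label, integer, string], it would also pick up any other array in the page with that shape; matching the "#ffffff" field literally is one way to narrow it if that becomes a problem.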