How to write a regular expression in PHP to scrape a complex array from the following code

I want to retrieve some data from this website: http://sites.target.com/site/en/spot/search_results.jsp?&mapType=enhanced&startAddress=72756&startingLat=36.322757720947266&startingLong=-93.99922943115234&_requestid=2573952

The HTML source of that page contains the following lines:

<script>
    function GetMap(){
    var results = new Array();
    results[results.length] = {"lat" : "36.299484", "lon" : "-94.173495", "id" : "2498", "name" : "test", "phoneNumber" : "(479) 986-1100", "hours" : "test", "address" : {"city" : "Rogers", "state" : "AR", "zip" : "72758", "street" : "2404 Promenade Blvd" }, "concepts" : [{"name" : "Pharmacy", "phoneNumber" : "(479) 986-1101", "hours" : "<b>M-Fr:</b> 10:00AM-7:00PM<br><b>Sa:</b> 10:00AM-7:00PM" }, {"name" : "PhotoLab", "hours" : "<b>M-Fr:</b> 9:00AM-9:00PM<br><b>Sa:</b> 9:00AM-9:00PM<br><b>Su:</b> 10:00AM-8:00PM" }, {"name" : "Wine"}, {"name" : "Starbucks"}]};
    results[results.length] = {"lat" : "36.1157", "lon" : "-94.1555", "id" : "1470", "name" : "test", "phoneNumber" : "(479) 443-5517", "hours" : "test", "address" : {"city" : "Fayetteville", "state" : "AR", "zip" : "72703", "street" : "3545 N Shiloh Dr" }, "concepts" : [{"name" : "Pharmacy", "phoneNumber" : "(479) 443-5628", "hours" : "<b>M-Fr:</b> 9:00AM-7:00PM<br><b>Sa:</b> 9:00AM-5:00PM<br><b>Su:</b> 11:00AM-5:00PM" }]};
    results[results.length] = {"lat" : "36.6738", "lon" : "-93.2257", "id" : "2098", "name" : "test", "phoneNumber" : "(417) 243-4500", "hours" : "test", "address" : {"city" : "Branson", "state" : "MO", "zip" : "65616", "street" : "1200 Branson Hills Pkwy" }, "concepts" : [{"name" : "Pharmacy", "phoneNumber" : "(417) 243-4513", "hours" : "<b>M-Fr:</b> 9:00AM-9:00PM<br><b>Sa:</b> 9:00AM-6:00PM<br><b>Su:</b> 9:00AM-6:00PM" }, {"name" : "PhotoLab", "phoneNumber" : "(417) 243-4500", "hours" : "<b>M-Fr:</b> 12:00PM-6:00PM<br><b>Sa:</b> 12:00PM-6:00PM<br><b>Su:</b> 12:00PM-6:00PM" }, {"name" : "Wine"}, {"name" : "Starbucks"}]};
    results[results.length] = {"lat" : "37.0849", "lon" : "-94.474", "id" : "774", "name" : "test", "phoneNumber" : "(417) 659-8755", "hours" : "test", "address" : {"city" : "Joplin", "state" : "MO", "zip" : "64801", "street" : "3151 E 7th St" }, "concepts" : [{"name" : "FreshGrocery"}, {"name" : "Pharmacy", "phoneNumber" : "(417) 206-3377", "hours" : "<b>M-Fr:</b> 9:00AM-9:00PM<br><b>Sa:</b> 9:00AM-6:00PM<br><b>Su:</b> 9:00AM-6:00PM" }, {"name" : "Wine"}, {"name" : "Starbucks"}]};
    results[results.length] = {"lat" : "37.1511", "lon" : "-93.2623", "id" : "1031", "name" : "test", "phoneNumber" : "(417) 889-1511", "hours" : "test", "address" : {"city" : "Springfield", "state" : "MO", "zip" : "65804", "street" : "1825 E Primrose St" }, "concepts" : [{"name" : "FreshGrocery"}, {"name" : "Pharmacy", "phoneNumber" : "(417) 520-1745", "hours" : "<b>M-Fr:</b> 9:00AM-7:00PM<br><b>Sa:</b> 9:00AM-5:00PM<br><b>Su:</b> 11:00AM-5:00PM" }, {"name" : "PhotoLab", "phoneNumber" : "(417) 889-1511", "hours" : "<b>M-Fr:</b> 12:00PM-6:00PM<br><b>Sa:</b> 12:00PM-6:00PM<br><b>Su:</b> 12:00PM-6:00PM" }, {"name" : "Starbucks"}]};

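I want to extract the latitude, longitude, ID, name, phone number, city, state and zip code as separate fields.

Is it possible to parse the data out of the JavaScript code above? I'm confused about how to write a regular expression for such complex lines.

Can we get the data in PHP in the following format?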

Array
(
    [lat] => 36.299484
    [lon] => -94.173495
    [id] => 2498
    [name] => test
    [phoneNumber] => (479) 986-1100
    [city] => Rogers
    [state] => AR
    [zipcode] => 72758
)

Here is my incomplete approach:

    $fp = fopen("file.csv", "w");
    $contents = file_get_contents('http://sites.target.com/site/en/spot/search_results.jsp?&mapType=enhanced&startAddress=72756&startingLat=36.322757720947266&startingLong=-93.99922943115234&_requestid=2573952');
    // Placeholder: match every "results[results.length] = {...};" record
    preg_match_all('Regular Expression Here', $contents, $records);
    foreach ($records[1] as $record) {
        // Placeholder: isolate the object body of this one record
        preg_match('Regular Expression Here', $record, $body);
        // Placeholder: capture key/value pairs (keys in [1], values in [2])
        preg_match_all('Regular Expression Here', $body[1], $pairs);
        $results = array();
        $c = count($pairs[1]);
        for ($i = 0; $i < $c; $i++) {
            // Strip surrounding quotes from each value
            $results[$pairs[1][$i]] = trim($pairs[2][$i], "\"'");
        }
        fwrite($fp, implode(";", array_values($results)) . "\r\n");
    }
    fclose($fp);
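For reference, here is a minimal sketch of one way the regex placeholders could be filled. It assumes every record sits on a single line ending in "};" as in the sample HTML, and that the captured object is valid JSON (it is in the sample), so only one regex is needed and json_decode does the rest. The function name extractRecordsWithRegex() is hypothetical.

```php
<?php
// Sketch: capture the object literal assigned on each
// "results[results.length] = ...;" line, then decode it with json_decode.
// Assumes every record sits on a single line ending in "};".

function extractRecordsWithRegex(string $html): array
{
    // Without the /s modifier "." does not match newlines, so the greedy
    // (\{.*\}) runs only to the last "}" before ";" on the same line.
    preg_match_all('/results\[results\.length\]\s*=\s*(\{.*\});/', $html, $m);

    $records = [];
    foreach ($m[1] as $json) {
        $data = json_decode($json, true); // true: decode to associative arrays
        if (is_array($data)) {
            $records[] = $data;
        }
    }
    return $records;
}

// Two records trimmed down from the sample HTML above.
$html = "<script>\n"
    . '    results[results.length] = {"lat" : "36.1157", "lon" : "-94.1555", "id" : "1470", "address" : {"city" : "Fayetteville", "state" : "AR"}};' . "\n"
    . '    results[results.length] = {"lat" : "37.0849", "lon" : "-94.474", "id" : "774", "address" : {"city" : "Joplin", "state" : "MO"}};' . "\n";

foreach (extractRecordsWithRegex($html) as $r) {
    echo $r['id'], ' ', $r['address']['city'], "\n";
}
```

Letting json_decode handle the nested "address" and "concepts" parts avoids writing a regex for every field.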

If the format is fixed as in your example, I wouldn't bother with regular expressions.

What I would do is:

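  1. Use file instead of file_get_contents to get the contents as an array (one array element per line of the file);
  2. Loop through the array;
  3. Use some of the many string functions to check for results[results.length] = and remove everything before it, including results[results.length] = itself;
  4. Use trim to strip the leading and trailing whitespace and the trailing ; (add ; to trim's character list, since it is not stripped by default);
  5. Use json_decode to decode the remaining JSON object, see this example.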
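The steps above could look roughly like this. It is a sketch, not a tested scraper: the helper name parseResultLines() is hypothetical, and a two-line excerpt of the page stands in for the array that file() would return.

```php
<?php
// Sketch of the five steps above. parseResultLines() is a hypothetical
// helper that takes the array returned by file().

function parseResultLines(array $lines): array
{
    $marker = 'results[results.length] =';
    $records = [];

    // Step 2: walk the lines one by one.
    foreach ($lines as $line) {
        // Step 3: skip lines without the marker, then drop everything up to
        // and including it.
        $pos = strpos($line, $marker);
        if ($pos === false) {
            continue;
        }
        $json = substr($line, $pos + strlen($marker));

        // Step 4: trim the default whitespace characters plus ";".
        $json = trim($json, " \t\n\r\0\x0B;");

        // Step 5: decode the JSON object into an associative array.
        $data = json_decode($json, true);
        if (is_array($data)) {
            $records[] = $data;
        }
    }
    return $records;
}

// Step 1 would be: $lines = file($url); (one element per line, newlines kept).
$lines = [
    "<script>\n",
    '    results[results.length] = {"lat" : "36.6738", "lon" : "-93.2257", "id" : "2098", "address" : {"city" : "Branson", "state" : "MO"}};' . "\n",
];

$records = parseResultLines($lines);
echo $records[0]['id'], ' ', $records[0]['address']['city'], "\n";
```

file() can read the URL directly only when allow_url_fopen is enabled; otherwise download the page first and pass the saved file.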

Those objects are written in a format called JSON, and PHP has a JSON parser:

http://php.net/manual/en/book.json.php
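Once a record is decoded, getting the flat array from the question is a matter of copying keys out of the nested "address" object. A minimal sketch, assuming json_decode is called with true so you get arrays rather than stdClass objects; the helper name flattenStore() is hypothetical, and the CSV goes to an in-memory stream instead of file.csv so the sketch has no side effects.

```php
<?php
// Sketch: turn one decoded record into the flat array from the question
// and write it as a ";"-separated CSV row. flattenStore() is hypothetical.

function flattenStore(array $r): array
{
    return [
        'lat'         => $r['lat'],
        'lon'         => $r['lon'],
        'id'          => $r['id'],
        'name'        => $r['name'],
        'phoneNumber' => $r['phoneNumber'],
        'city'        => $r['address']['city'],
        'state'       => $r['address']['state'],
        'zipcode'     => $r['address']['zip'],
    ];
}

$json = '{"lat" : "36.299484", "lon" : "-94.173495", "id" : "2498", "name" : "test", "phoneNumber" : "(479) 986-1100", "hours" : "test", "address" : {"city" : "Rogers", "state" : "AR", "zip" : "72758", "street" : "2404 Promenade Blvd" }}';

// true: decode JSON objects as associative arrays instead of stdClass
$row = flattenStore(json_decode($json, true));
print_r($row);

// php://memory stands in for fopen("file.csv", "w"); fputcsv quotes fields
// that need it. Empty $escape disables PHP's non-standard escape character.
$fp = fopen('php://memory', 'w+');
fputcsv($fp, array_values($row), ';', '"', '');
rewind($fp);
echo stream_get_contents($fp);
fclose($fp);
```

fputcsv is used instead of implode so that values containing spaces or ";" stay valid CSV.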