循环遍历大数组的最佳实践


Best practice looping through huge arrays

我有两个大数组:

  • 数组A有4900个项目(每个项目是一个小数组)
  • 数组B有700个项目(每个项目也是一个小数组)

基本上这些就是我的数组:

A = array (
    [0] => array(
        "name" => "KE-KE IMPEX ",
        "email" => "someemai@gmail.com",
        "kezbCompany" => "Fragrance",
        "startDate" => "2013-03-25 00:00:00",
        "endDate" => "2014-03-25 00:00:00",
        "companyBase" => "06 20 232 2534"
    )
    ...
    [4900] => array(
        "name" => "Jane Doe",
        "email" => "zzer@sad.com",
        "kezbCompany" => "sadsad",
        "startDate" => "2013-03-25 00:00:00",
        "endDate" => "2014-03-25 00:00:00",
        "companyBase" => "06 20 232 2534"
    )
)
B = array (
    [0] => array(
        "name" => "KE-KE IMPEX 46554 sda",
        "email" => "xxx@gmail.com",
        "kezbCompany" => "546wer",
        "startDate" => "2013-03-25 00:00:00",
        "endDate" => "2014-03-25 00:00:00",
        "companyBase" => "06 20 232 2534"
    )
    ...
    [700] => array(
        "name" => "45 Jane Doe",
        "email" => "kekeimpex@gmail.com",
        "kezbCompany" => "asd",
        "startDate" => "2013-03-25 00:00:00"        
    )
)

小物件看起来像这样(A和B摊位):

array(
  'name' => 'John Doe',
  'email' => 'john@doe.com'
)

那么我需要做的是:检查哪个小数组有相同的名字

但是请记住,大多数情况下,这两个小数组的结构不会相同。

例如,他们的电子邮件是不同的。现在,如果我先循环A,然后再循环B,这会花费很多时间

这是我当前的代码:

$szData = file_get_contents('szData.txt');
$kData = file_get_contents('kData.txt');
$A = json_decode($szData);
$B = json_decode($kData);
$foundNr = 0;
foreach ($A as $key => $sz)
{
    $cName = $sz->companyName;
    foreach ($B as $index => $k)
    {
        $pattern = '/^(.*)+('.$cName.')/i';
        echo "SzSor: " . $key . " --- Ksor: " . $index . "</br>";
        if (preg_match($pattern, $k->companyName))
        {
            $founData[] = $k->companyName;
            ++$foundNr;
        }
    }
}

任何想法?

一种解决方案是首先将子数组中的所有数据合并为内爆字符串(John Doe-john@doe.com),然后使用快速函数比较匹配的字符串:

$szData = file_get_contents('szData.txt');
$kData = file_get_contents('kData.txt');
$A = json_decode($szData);
$B = json_decode($kData);
$foundNr = 0;
$B_sample = array();
foreach  ($B as $index => $data){
    $B_sample[$index] = implode('-',$data);
    //or use some other custom procedure to create unique string from the data.
}

$B_sample = array();
foreach  ($A as $index => $data){
    $A_sample[$index] = implode('-',$data);
}
$A_B_intersect = array_intersect ($A_sample , $B_sample ); // the KEYS in the result array are from $A
$foundNr = count($A_B_intersect);
foreach($A_B_intersect as $key->$data){
    $founData[] = $A[$key]->companyName;
}

这有额外的好处,如果你可以创建$A_sample,同时构建$A,重用相同的循环。

您最好尽量避免使用自己的循环之类的东西,而尽量使用内置的PHP函数,如:

in_array:

摘自docs的代码:

<?php
$os = array("Mac", "NT", "Irix", "Linux");
if (in_array("Irix", $os)) {
    echo "Got Irix";
}
if (in_array("mac", $os)) {
    echo "Got mac";
}
?>
The second condition fails because in_array() is case-sensitive, so the program above will display:
Got Irix
Example #2 in_array() with strict example
<?php
$a = array('1.10', 12.4, 1.13);
if (in_array('12.4', $a, true)) {
    echo "'12.4' found with strict check'n";
}
if (in_array(1.13, $a, true)) {
    echo "1.13 found with strict check'n";
}
?>
The above example will output:
1.13 found with strict check

因此,虽然您可能仍然需要循环遍历一个数组来比较所有可能的值,但最有效的代码通常来自使用内置函数。

说了这么多,写几个简单的循环并测试一下,看看哪个性能最好。如果您知道数据的结构,您可能能够完全跳过许多元素—例如,如果您知道数据将以字母顺序出现,并且您正在寻找以"S"开头的内容,则执行一次跳过100条记录的循环,直到获得"S"值—然后返回到它之前的最后一个条目。诸如此类。这是一种使代码快速高效的思维方式——如果您可以从100个元素中跳过99个元素,在5000个元素中的前3500个元素中不执行逐个元素的搜索。