Compare arrays and count which ones have similar values and which do not

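I have a situation where I need to compare dynamic arrays and get a count of only those arrays whose first four keys have similar values. For example: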

Array[0]
(
    [item] => 1
    [size] => 1
    [pair] => 1
    [pay] => 1
    [name] => 
    [msg] => 
    [email] => 
    [b19e19b13682bcfef93651c86f9ad9e6] => eih6j74035oj17bvnses32km23
)
Array[1]
(
    [item] => 1
    [size] => 2
    [pair] => 1
    [pay] => 1
    [name] => 
    [msg] => 
    [email] => 
    [b19e19b13682bcfef93651c86f9ad9e6] => eih6j74035oj17bvnses32km23
)
Array[2]
(
    [item] => 1
    [size] => 2
    [pair] => 2
    [pay] => 2
    [name] => 
    [msg] => 
    [email] => 
    [b19e19b13682bcfef93651c86f9ad9e6] => eih6j74035oj17bvnses32km23
)
Array[3]
(
    [item] => 1
    [size] => 1
    [pair] => 1
    [pay] => 1
    [name] => 
    [msg] => 
    [email] => 
    [b19e19b13682bcfef93651c86f9ad9e6] => eih6j74035oj17bvnses32km23
)

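In the set of arrays above, the first and the last array have similar values (for the first four keys). From this I need to derive something like (0,3),(1),(2). Is there any solution?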
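The comparison criterion in the question (same values in the first four keys) can be expressed with array_slice(). A minimal sketch, assuming the four relevant keys are always the first four entries of every row, as in the dump above; the helper name sameFirstFour and the sample rows are hypothetical:

```php
<?php
// Hypothetical helper: true when two rows hold the same values in their first
// four entries. array_slice() follows insertion order, so this assumes every
// row lists item, size, pair and pay first.
function sameFirstFour(array $a, array $b): bool
{
    return array_slice($a, 0, 4, true) == array_slice($b, 0, 4, true);
}

$row0 = ['item' => 1, 'size' => 1, 'pair' => 1, 'pay' => 1, 'name' => ''];
$row1 = ['item' => 1, 'size' => 2, 'pair' => 1, 'pay' => 1, 'name' => ''];
$row3 = ['item' => 1, 'size' => 1, 'pair' => 1, 'pay' => 1, 'name' => 'x'];

var_dump(sameFirstFour($row0, $row3)); // bool(true): only 'name' differs
var_dump(sameFirstFour($row0, $row1)); // bool(false): 'size' differs
```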

This should work:

Put your arrays inside one array, as I did with $arrays, and then:

<?php
$arrays = [
    array('a'=>1, 'b'=>2, 'c'=>3, 'd'=>4),
    array('a'=>1, 'b'=>2, 'c'=>3, 'd'=>4),
    array('a'=>1, 'b'=>2, 'c'=>3, 'd'=>4),
    array('a'=>1, 'b'=>2, 'c'=>4, 'd'=>3),
];
$result = [];
$grouped = []; // indexes already placed in a group, so they are not compared again
//get the keys of a sub-array that is inside $arrays, to be used later
$keys = array_keys($arrays[0]);
for($i=0; $i < sizeof($arrays); $i++){
    if(in_array($i, $grouped)){
        continue;
    }
    $sa = array(); // to store similar arrays indexes
    for($k=$i+1; $k < sizeof($arrays); $k++){
        if(in_array($k, $grouped)){
            continue;
        }
        $similar = true;
        //compare the values of keys in the two arrays. Just compare the first 4 keys (as the question asks)
        for($j=0; $j < 4; $j++){
            //stop at the first key whose values differ
            if($arrays[$i][$keys[$j]] != $arrays[$k][$keys[$j]]){
                $similar = false;
                break;
            }
        }
        //if they are similar, push both indexes into $sa (each only once)
        if($similar){
            if(!in_array($i, $sa)){
                $sa[] = $i;
            }
            $sa[] = $k;
            $grouped[] = $k;
        }
    }
    //if $sa not empty, push it to $result
    if(!empty($sa)){
        $result[] = $sa;
    }
}
/* 
// at this stage, $result includes all the similar arrays
// so we need another loop to push the unique arrays to $result
// just check whether each index of $arrays is in a sub-array of $result; if not, push it as a one-element array
*/
for($j=0; $j < sizeof($arrays); $j++){
    $f = false;
    for($i=0; $i < sizeof($result); $i++){
        in_array($j, $result[$i]) ? $f = true : null;
    }
    if(!$f){
        $sa = array();
        array_push($sa, $j);
        array_push($result, $sa);
    }
}

Finally, $result is an array of arrays in which each sub-array's values are indexes into $arrays. If the output is:

array(2) { 
    [0]=> array(3) { 
            [0]=> int(0) 
            [1]=> int(1) 
            [2]=> int(2) 
    }
    [1]=> array(1) { 
            [0]=> int(3) 
    } 
}

this means that $arrays contains two groups of sub-arrays: $arrays[0], $arrays[1] and $arrays[2] are similar (group 1), and $arrays[3] is unique (group 2).
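To print $result in the (0,3),(1),(2) style the question asks for, each group can be wrapped in parentheses and the groups joined with commas. A small sketch that hard-codes the $result from the output above:

```php
<?php
// $result as produced for the sample $arrays (hard-coded for illustration)
$result = [[0, 1, 2], [3]];

// Turn each group of indexes into "(i,j,...)"
$groups = array_map(function (array $group) {
    return '(' . implode(',', $group) . ')';
}, $result);

// Join the groups with commas
echo implode(',', $groups), "\n"; // prints (0,1,2),(3)
```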

N.B.: I would appreciate it if someone could optimize my answer.
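On the optimization question: the loop-based answers here compare pairs of arrays, which takes O(n²) comparisons in the worst case. A different technique, not taken from any of the answers, is to bucket the rows by a string built from their first four values, so each row is visited once. This is a sketch under assumptions: the function name groupByFirstFour is hypothetical, the four relevant keys are assumed to come first and in the same order in every row, and serialize() makes the match strict (1 and "1" fall into different groups):

```php
<?php
// Hypothetical helper: group row indexes by the values of their first four entries.
// serialize() turns those entries into a string key, so rows with identical
// leading keys and values land in the same bucket.
function groupByFirstFour(array $arrays): array
{
    $buckets = [];
    foreach ($arrays as $index => $row) {
        $key = serialize(array_slice($row, 0, 4, true));
        $buckets[$key][] = $index; // indexes sharing the same first four values
    }
    return array_values($buckets); // drop the string keys, keep first-seen order
}

$arrays = [
    ['item' => 1, 'size' => 1, 'pair' => 1, 'pay' => 1, 'name' => ''],
    ['item' => 1, 'size' => 2, 'pair' => 1, 'pay' => 1, 'name' => ''],
    ['item' => 1, 'size' => 2, 'pair' => 2, 'pay' => 2, 'name' => ''],
    ['item' => 1, 'size' => 1, 'pair' => 1, 'pay' => 1, 'name' => ''],
];

print_r(groupByFirstFour($arrays)); // groups: (0,3), (1), (2)
```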

Personally, I prefer an OOP approach: cleaner, reusable...

用法

$o = new SOF_ArrayComapare($yourInputArray, array('item', 'size', 'pair', 'pay'));
$arraysEqual = $o->getEqualArrays();
print $o->toString();

Class definition

class SOF_ArrayComapare {
    private $_keysToMatch   = array();
    private $_array         = array();
    private $_equalArrays   = array();
    private $_indexToEscape = array();

    public function __construct($array, $keysToMatch) {
        $this->_array       = $array;
        $this->_keysToMatch = $keysToMatch;
    }

    // Groups the indexes of arrays whose $keysToMatch values are identical.
    public function getEqualArrays() {
        // reset state so the method can safely be called more than once
        $this->_equalArrays   = array();
        $this->_indexToEscape = array();
        $size = count($this->_array);
        for ($i = 0; $i < $size; $i++) {
            // skip indexes already placed in an earlier group
            if (in_array($i, $this->_indexToEscape)) continue;
            $this->_indexToEscape[]   = $i;
            $this->_equalArrays[$i][] = $i;
            for ($j = $i + 1; $j < $size; $j++) {
                if (in_array($j, $this->_indexToEscape)) continue;
                if ($this->areEquals($this->_array[$i], $this->_array[$j])) {
                    $this->_indexToEscape[]   = $j;
                    $this->_equalArrays[$i][] = $j;
                }
            }
        }
        return $this->_equalArrays;
    }

    // True when both arrays contain every key to match with identical (===) values.
    // array_key_exists() is used instead of isset() so that keys holding null still match.
    private function areEquals($a1, $a2) {
        foreach ($this->_keysToMatch as $key) {
            if (
                !array_key_exists($key, $a1) ||
                !array_key_exists($key, $a2) ||
                $a1[$key] !== $a2[$key]
            ) {
                return FALSE;
            }
        }
        return TRUE;
    }

    public function toString($htmlFormat = TRUE) {
        $newLine = ($htmlFormat === TRUE) ? '<br />' : "\n";
        $report = "These arrays are equal: " . $newLine;
        foreach ($this->_equalArrays as $array) {
            $report .= '(' . implode(',', $array) . ')' . $newLine;
        }
        return $report;
    }
}
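Whether areEquals() guards the lookups with isset() or array_key_exists() matters when a matched field can be null: isset() reports a key holding null as absent, so two rows that both contain 'name' => null would be treated as different, while array_key_exists() only checks that the key is present. A tiny illustration with a made-up row:

```php
<?php
$row = ['item' => 1, 'name' => null];

// isset() is false for a missing key AND for a key whose value is null
var_dump(isset($row['name']));             // bool(false)
// array_key_exists() only checks whether the key is present
var_dump(array_key_exists('name', $row));  // bool(true)
var_dump(array_key_exists('email', $row)); // bool(false)
```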

I'll explain this at the end, but the code is fairly self-explanatory:

function getDuplicatesArray(array $arrays)
{
    $foundIndexes = array(); // indexes already known to have a duplicate
    $arraysCount = count($arrays);
    $resultArray = array();
    // $i is current index
    // $j is compared index
    for ($i = 0; $i < $arraysCount; $i++)
    {
        if (in_array($i, $foundIndexes))
            continue;
        $currentResultArray = array($i);
        for ($j = $i + 1; $j < $arraysCount; $j++)
        {
            if (in_array($j, $foundIndexes))
                continue;
            if (areFirstValsSame($arrays[$i], $arrays[$j]))
            {
                $currentResultArray[] = $j;
                if (count($currentResultArray) == 2) // first $j for this $i
                    $foundIndexes[] = $i;
                $foundIndexes[] = $j;
            }
        }
        $resultArray[] = $currentResultArray;
    }//.. for i
    return $resultArray;
}//.. getDuplicatesArray

function areFirstValsSame(array $array1, array $array2)
{
    $toCompare = 4;
    // the keys are associative, so compare the values by position
    $values1 = array_values($array1);
    $values2 = array_values($array2);
    for ($i = 0; $i < $toCompare; $i++)
        if ($values1[$i] != $values2[$i])
            return false;
    return true;
}

The "found" array ($foundIndexes) holds every index that has a duplicate.

When a duplicate is found, the indexes involved are added to "found".

The current result array ($currentResultArray) holds the indexes of all arrays that duplicate the array currently being compared.

Before testing the next index, the code checks whether it has already been found; if so, it skips it.

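Once an index has been compared against the rest (whether or not any duplicates were found), the current result array is added to the result.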

If all four arrays are the same, you get (0,1,2,3); if the first and third are duplicates and the second and fourth are duplicates, you get (0,2), (1,3), and so on.

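You never check the same duplicate twice. However, you still have to re-read the values of the arrays that, so far, have not been found to duplicate the index being checked.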

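This could be optimized with recursion, but that would use more memory, and for a small number of arrays the difference would not even be noticeable.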
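The answer does not show the recursive version it mentions. One possible reading, sketched here with a hypothetical groupRecursive() function: take the first remaining index, split the other indexes into those whose first four values match it and those that do not, and recurse on the second list. The intermediate arrays and call frames are where the extra memory goes. The sample data reproduces the (0,2),(1,3) case described above:

```php
<?php
// Hypothetical recursive grouping of indexes by their first four values.
function groupRecursive(array $arrays, array $indexes): array
{
    if (empty($indexes)) {
        return []; // nothing left to group
    }
    $first = array_shift($indexes);
    $head = array_slice($arrays[$first], 0, 4, true);
    $group = [$first];
    $rest = [];
    foreach ($indexes as $index) {
        if (array_slice($arrays[$index], 0, 4, true) == $head) {
            $group[] = $index; // same first four values: same group
        } else {
            $rest[] = $index;  // handled by the recursive call
        }
    }
    return array_merge([$group], groupRecursive($arrays, $rest));
}

$arrays = [
    ['a' => 1, 'b' => 1, 'c' => 1, 'd' => 1],
    ['a' => 2, 'b' => 2, 'c' => 2, 'd' => 2],
    ['a' => 1, 'b' => 1, 'c' => 1, 'd' => 1],
    ['a' => 2, 'b' => 2, 'c' => 2, 'd' => 2],
];

print_r(groupRecursive($arrays, array_keys($arrays))); // groups: (0,2), (1,3)
```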