Calculate closest matches from array value combinations

I have an array of part lengths, for example:

array(150, 180, 270);

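Then I have a measurement ($a = 440).

I need to calculate the two closest possible combinations of lengths that are greater than $a, without having to write out hundreds of possible combinations by hand.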

So:

150
180
270

150 + 150
150 + 180
150 + 270

180 + 180
180 + 270

270 + 270

150 + 150 + 150
150 + 150 + 180

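etc.

This needs to run a set number of times rather than stopping at the first two matches, because 150 + 150 + 150 is a closer match to $a than 270 + 270 but may only be generated later.

Edit: I also need to store the combination of parts that made up each match, preferably in an array.

I hope I have explained this well enough for somebody to understand.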

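As this is quite a resource-heavy script, I thought it would be a good idea to offer the option of generating the choices beforehand, then using that data to create a variable/object/SQL script that stores the data permanently. You could then do something like: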

SELECT * FROM combination_total WHERE size > YOUR_SIZE ORDER BY size ASC LIMIT 2;

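My new script is similar, but it just generates an array of all combinations without any duplicates. It seems quite quick again. Note the $maxLength variable, which is currently set to 2000 and can be changed to your own largest possible size.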

<?php
$partLengths = array(150, 180, 270);
$currentCombinations = array(
    array(
        'total' => 150,
        'combination' => array(150)
    ),
    array(
        'total' => 180,
        'combination' => array(180)
    ),
    array(
        'total' => 270,
        'combination' => array(270)
    )
);
$maxLength = 2000;
$largestSize = 0;
function generateCombination() {
    global $currentCombinations, $largestSize, $partLengths;
    $tmpCombinations = $currentCombinations;
    foreach ($tmpCombinations as $combination) {
        foreach ($partLengths as $partLength) {
            $newCombination = $combination['combination'];
            $newCombination[] = $partLength;
            sort($newCombination);
            $newCombinationTotal = array_sum($newCombination);
            if (!combinationExists($newCombination)) {
                $currentCombinations[] = array(
                        'total' => $newCombinationTotal,
                        'combination' => $newCombination
                );
            }
            $largestSize = ($newCombinationTotal > $largestSize) ? $newCombinationTotal : $largestSize;
        }
    }
}
function combinationExists($combination) {
    global $currentCombinations;
    foreach ($currentCombinations as $currentCombination) {
        if ($combination == $currentCombination['combination']) {
            return true;
        }
    }
    return false;
}
while ($largestSize < $maxLength) {
    generateCombination();
}
// here you can use $currentCombinations to generate sql/object/etc
var_dump($currentCombinations);
?>
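
Once the array has been generated, the lookup from the SQL query above can also be done directly in PHP. The following is only a minimal sketch: closestAbove() is a hypothetical helper name, it assumes every entry has the same 'total' and 'combination' keys as $currentCombinations, and the sample data is typed in by hand for illustration. It needs PHP 7 or later because of the <=> operator.

```php
<?php
// Hypothetical helper (not part of the script above): returns the $limit
// entries with the smallest totals strictly greater than $size.
function closestAbove(array $combinations, $size, $limit = 2) {
    // WHERE size > YOUR_SIZE
    $candidates = array_filter($combinations, function ($c) use ($size) {
        return $c['total'] > $size;
    });
    // ORDER BY size ASC
    usort($candidates, function ($x, $y) {
        return $x['total'] <=> $y['total'];
    });
    // LIMIT 2
    return array_slice($candidates, 0, $limit);
}

// Hand-written sample in the same shape as $currentCombinations.
$sample = array(
    array('total' => 420, 'combination' => array(150, 270)),
    array('total' => 450, 'combination' => array(150, 150, 150)),
    array('total' => 450, 'combination' => array(180, 270)),
    array('total' => 480, 'combination' => array(150, 150, 180)),
    array('total' => 540, 'combination' => array(270, 270)),
);
$matches = closestAbove($sample, 440);
```

Note that two different combinations with the same total both count as matches here, just as they would in the SQL query. If you want two distinct totals, remove entries with a repeated total before slicing.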

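This code works out the closest combination above $a, and the next closest one after that. It removes duplicates to speed things up a bit. It is not heavily optimized, but initial tests show it performs reasonably well as long as the initial value of $a is not massive.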

<?php
/* value in cm */
$a = 1020;
$partLengths = array(150, 180, 270);
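// Start both results at a total of 0 so the comparisons below never read an undefined index
$closestValue = array('total' => 0, 'combination' => array());
$secondClosest = array('total' => 0, 'combination' => array());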
$currentCombinations = array(
    array(
        'total' => 150,
        'combination' => array(150)
    ),
    array(
        'total' => 180,
        'combination' => array(180)
    ),
    array(
        'total' => 270,
        'combination' => array(270)
    )
);
function getCombinations(&$currentCombinations, $partLengths,$a, &$closestValue, &$secondClosest) { 
    $tmpCombinations = $currentCombinations;
    static $secondMatch = true;
    for ($x=0;$x<count($partLengths);$x++) {
        for ($y=0;$y<count($tmpCombinations);$y++) {
            $newCombination = $tmpCombinations[$y]['combination'];
            $newCombination[] = $partLengths[$x];
            $newCombinationTotal = array_sum($newCombination);
            sort($newCombination);
            if (!combinationExists($currentCombinations, $newCombination, $newCombinationTotal)) {
                $currentCombinations[] = array('total' => $newCombinationTotal, 'combination' => $newCombination);
            }
            if ($closestValue['total'] < $a) {
                $oldGap = $a - $closestValue['total'];
                $newGap = $a - $newCombinationTotal;
                $newGap = ($newGap < 0) ? 0 - $newGap : $newGap;
                if ($newGap < $oldGap) {
                    $secondClosest = $closestValue;
                    $closestValue['total'] = $newCombinationTotal;
                    $closestValue['combination'] = $newCombination;
                }
            } else {
                $oldGap = $a - $secondClosest['total'];
                $newGap = $a - $newCombinationTotal;
                $oldGap = ($oldGap < 0) ? 0 - $oldGap : $oldGap;
                $newGap = ($newGap < 0) ? 0 - $newGap : $newGap;
                if ($newCombinationTotal > $a && $newCombinationTotal > $closestValue['total']) {
                    if ($secondMatch || $newGap < $oldGap) {
                        $secondMatch = false;
                        $secondClosest['total'] = $newCombinationTotal;
                        $secondClosest['combination'] = $newCombination;
                    }
                }
            }
        }
    }
}
function combinationExists(&$currentCombinations, $newCombination, $newCombinationTotal) {
    foreach ($currentCombinations as $currentCombination) {
        // A duplicate has the same total and the same (sorted) list of parts
        if ($currentCombination['total'] == $newCombinationTotal && $currentCombination['combination'] == $newCombination) {
            return true;
        }
    }
    return false;
}
while ($secondClosest['total'] <= $a) {
    getCombinations($currentCombinations, $partLengths, $a, $closestValue, $secondClosest);
}
var_dump($closestValue);
var_dump($secondClosest);
?>
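
As a cross-check for the results, here is a different technique from the script above: a small dynamic-programming pass over every reachable total. It is only a sketch under stated assumptions: closestTotalsAbove() is a hypothetical name, the part lengths must be positive integers, and it returns distinct totals with one combination for each rather than every combination.

```php
<?php
// Sketch (not the method used above): find the $count smallest distinct
// totals above $a, with one combination of parts for each.
// Assumes positive integer part lengths.
function closestTotalsAbove(array $partLengths, $a, $count = 2) {
    // Adding the shortest part to a reachable total gives another reachable
    // total, so $count totals above $a must exist at or below this limit.
    $limit = $a + $count * min($partLengths);
    // $lastPart[$sum] = a part that was added last to reach $sum (0 = start).
    $lastPart = array(0 => 0);
    for ($sum = 1; $sum <= $limit; $sum++) {
        foreach ($partLengths as $part) {
            if ($sum - $part >= 0 && isset($lastPart[$sum - $part])) {
                $lastPart[$sum] = $part;
                break;
            }
        }
    }
    $results = array();
    for ($sum = $a + 1; $sum <= $limit && count($results) < $count; $sum++) {
        if (!isset($lastPart[$sum])) {
            continue; // this total cannot be built from the parts
        }
        // Walk back through $lastPart to rebuild one combination.
        $combination = array();
        for ($rest = $sum; $rest > 0; $rest -= $lastPart[$rest]) {
            $combination[] = $lastPart[$rest];
        }
        sort($combination);
        $results[] = array('total' => $sum, 'combination' => $combination);
    }
    return $results;
}

$matches = closestTotalsAbove(array(150, 180, 270), 440);
```

For the lengths from the question and $a = 440 this yields 450 (150 + 150 + 150) and 480 (150 + 150 + 180). The work grows with the size of $a rather than with the number of combinations, which is the trade-off of this approach.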

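If speed does become an issue, another suggestion is to pre-generate all combinations and save them in some kind of hash, database, etc. that you can access easily.

The following code is brute force and only tests possible combinations of 2 values, so I know it is not complete. It is, however, a start.

UPDATE: See my other answer, below, for a far better solution that works with any possible combination, not just 2, and that is optimized.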

<?php
    echo "<html><head><title>Test Array Sums</title></head><body>";
    $testarray = array(2, 5, 9, 78, 332);
    $target_value = 10;
    $closest1 = 0;
    $closest2 = 0;
    $closest_sum = 0;
    $closest_difference = 0;
    $first_time_in_loop = TRUE;
    foreach ($testarray AS $entry1)
    {
        foreach ($testarray AS $entry2)
        {
            if ($first_time_in_loop)
            {
                $first_time_in_loop = FALSE;
                $closest1 = $entry1;
                $closest2 = $entry2;
                $closest_sum = $closest1 + $closest2;
                $closest_difference = abs($target_value - $closest_sum);
            }
            $test_sum = $entry1 + $entry2;
            if (abs($test_sum - $target_value) < $closest_difference)
            {
                if ($test_sum - $target_value >= 0)
                {
                    // Definitely the best so far
                    $closest1 = $entry1;
                    $closest2 = $entry2;
                    $closest_sum = $closest1 + $closest2;
                    $closest_difference = abs($closest_sum - $target_value);
                }
                else if ($closest_sum - $target_value < 0)
                {
                    // The sum isn't big enough, but neither was the previous best option
                    // and at least this is closer
                    $closest1 = $entry1;
                    $closest2 = $entry2;
                    $closest_sum = $closest1 + $closest2;
                    $closest_difference = abs($closest_sum - $target_value);
                }
            }
            else
            {
                if ($closest_sum - $target_value < 0 && $test_sum - $target_value >= 0)
                {
                    // $test_value is farther away from the target than the previous best option,
                    // but at least it's bigger than the target value (the previous best option wasn't)
                    $closest1 = $entry1;
                    $closest2 = $entry2;
                    $closest_sum = $closest1 + $closest2;
                    $closest_difference = abs($closest_sum - $target_value);
                }
            }
        }
    }
    echo "Best pair: " . $closest1 . ", " . $closest2 . "<br />";
    echo "</body></html>";
?>

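Can you limit the total number of test values to 3? Or do you truly need to extend it to all possible combinations (i.e., if 4+4+5+4+5+3+5+4 is closer than 26+26, do you need to find it)?

If you can limit the number tested to, say, 5, then you could just extend the loop above to handle up to 5 choices. Otherwise, a more sophisticated loop would need to be written.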
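
Rather than nesting five foreach loops by hand, the same bounded search can be written once with an explicit stack. This is a minimal sketch, not the code above extended: bestWithinLimit() is a hypothetical name, and "best" is taken to mean the smallest sum that is at least the target.

```php
<?php
// Hypothetical helper: the smallest total that is at least $target, using
// no more than $maxParts entries (repeats allowed). Returns null if no
// combination within the limit reaches the target.
function bestWithinLimit(array $values, $target, $maxParts) {
    $values = array_values(array_unique($values));
    $best = null;
    // Each stack item: combination so far, its sum, index of the first value still allowed.
    $stack = array(array(array(), 0, 0));
    while ($stack) {
        list($set, $sum, $from) = array_pop($stack);
        for ($i = $from; $i < count($values); $i++) {
            $newSet = $set;
            $newSet[] = $values[$i];
            $newSum = $sum + $values[$i];
            if ($newSum >= $target) {
                // Big enough: keep it only if it beats the best so far.
                if ($best === null || $newSum < $best['total']) {
                    $best = array('total' => $newSum, 'combination' => $newSet);
                }
            } elseif (count($newSet) < $maxParts) {
                // Still short: extend later, using only values at index >= $i
                // so that 2+5 and 5+2 are not both tried.
                $stack[] = array($newSet, $newSum, $i);
            }
        }
    }
    return $best;
}

$pair = bestWithinLimit(array(2, 5, 9, 78, 332), 10, 2);
```

With the test array from the listing above and a limit of 2 entries, this finds 5 + 5 = 10. Raising $maxParts widens the search at the cost of more combinations to visit.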

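Improving on my previous answer, here is a version that tests combinations of any number of entries, up to a maximum number.

UPDATE: (Optimization added; see comments below)

For instance, if the desired value is 15 and the list is (1, 17, 20), the best choice is 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1, so you have to allow $max_loops below to be at least 15 in order to find this match, even though there are only 3 values in the list! It is worse for (1, 133, 138) where the desired value is, say, 130. In that case you need 130 recursions! You can see that this could be an optimization nightmare. But the algorithm below does work and is fairly well optimized.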

<?php
    echo "<html><head><title>Test Array Sums</title></head><body>";
    $testarray = array(1, 3, 6);
    $target_value = 10;
    $current_closest_sum = 0;
    $current_closest_difference = 0;
    $first_time_in_loop = TRUE;
    $max_loops = 10;
    $current_loop = 0;
    $best_set = array();
    $current_set = array();
    $sums_already_evaluated = array();
    function nestedLoop($current_test = 0)
    {
        global $testarray, $target_value, $current_closest_sum, $current_closest_difference, $first_time_in_loop, $max_loops, $current_loop, $best_set, $current_set, $sums_already_evaluated;
        ++$current_loop;
        foreach ($testarray AS $entry)
        {
            $current_set_temp = $current_set;
            $current_set[] = $entry;
            if ($first_time_in_loop)
            {
                $first_time_in_loop = FALSE;
                $current_closest_sum = $entry + $current_test;
                $current_closest_difference = abs($target_value - $current_closest_sum);
                $best_set[] = $entry;
            }
            $test_sum = $entry + $current_test;
            if (in_array($test_sum, $sums_already_evaluated))
            {
                // no need to test a sum that has already been tested
                $current_set = $current_set_temp;
                continue;
            }
            $sums_already_evaluated[] = $test_sum;
            if ($test_sum > $target_value && $current_closest_sum > $target_value && $test_sum >= $current_closest_sum)
            {
                // No need to evaluate a sum that is certainly worse even by itself
                $current_set = $current_set_temp;
                continue;
            }
            $set_best = FALSE;
            if (abs($test_sum - $target_value) < $current_closest_difference)
            {
                if ($test_sum - $target_value >= 0)
                {
                    // Definitely the best so far
                    $set_best = TRUE;
                }
                else if ($current_closest_sum - $target_value < 0)
                {
                    // The sum isn't big enough, but neither was the previous best option
                    // and at least this is closer
                    $set_best = TRUE;
                }
            }
            else
            {
                if ($current_closest_sum - $target_value < 0 && $test_sum - $target_value >= 0)
                {
                    // $test_value is farther away from the target than the previous best option,
                    // but at least it's bigger than the target value (the previous best option wasn't)
                    $set_best = TRUE;
                }
            }
            if ($set_best)
            {
                $current_closest_sum = $test_sum;
                $current_closest_difference = abs($current_closest_sum - $target_value);
                $best_set = $current_set;
            }
            if ($current_loop < $max_loops)
            {
                if ($test_sum - $target_value < 0)
                {
                    nestedLoop($test_sum);
                }
            }
            $current_set = $current_set_temp;
        }
        --$current_loop;
    }
    // make array unique
    $testarray = array_unique($testarray);
    rsort($testarray, SORT_NUMERIC);
    // Enter the recursion
    nestedLoop();
    echo "Best set: ";
    foreach ($best_set AS $best_set_entry)
    {
        echo $best_set_entry . " ";
    }
    echo "<br />";
    echo "</body></html>";
?>
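
The $max_loops value in the listing does not have to be guessed. Once a partial sum reaches the target there is no reason to add more entries, and every entry is at least as large as the smallest one, so no useful combination has more than ceil(target / smallest) entries. The helper below is a hypothetical sketch of that bound; it assumes all entries are positive and that "match" means a sum of at least the target, as in the listing.

```php
<?php
// Hypothetical helper: upper bound on how many entries a useful
// combination can contain when looking for a sum of at least $target.
// Assumes every entry is a positive number.
function requiredDepth(array $values, $target) {
    return (int) ceil($target / min($values));
}

$depthA = requiredDepth(array(1, 17, 20), 15);       // 15
$depthB = requiredDepth(array(1, 133, 138), 130);    // 130
$depthC = requiredDepth(array(150, 180, 270), 440);  // 3
```

These match the two examples given before the listing. For a strictly greater sum, the bound becomes floor(target / smallest) + 1 instead.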

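UPDATE: I have added two small optimizations that seem to help a great deal and that avoid memory overload or hash-table lookups. They are:

(1) Track all previously evaluated sums and do not evaluate them again.

(2) If a sum (by itself) is already worse than a previous test, skip any further tests with that sum.

I think that, with these two optimizations, the algorithm may well work in your realistic situation.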
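
To show the two optimizations in isolation, here is a stripped-down sketch that tracks only the sum and not the set of entries behind it. It is not the recursive function above: smallestSumAtLeast() is a hypothetical name, it assumes positive integer entries, and it keys the evaluated sums in an array so the check is an isset() call rather than an in_array() scan.

```php
<?php
// Sketch of the two prunings, assuming positive integer entries.
// Returns the smallest reachable sum that is at least $target.
function smallestSumAtLeast(array $values, $target) {
    $seen = array();      // optimization (1): sums already evaluated, keyed by sum
    $best = null;         // best sum found so far that is >= $target
    $pending = array(0);  // partial sums still to be extended
    while ($pending) {
        $base = array_pop($pending);
        foreach ($values as $value) {
            $sum = $base + $value;
            if (isset($seen[$sum])) {
                continue; // (1) this sum was evaluated before
            }
            $seen[$sum] = true;
            if ($best !== null && $sum >= $best) {
                continue; // (2) no better than the best so far
            }
            if ($sum >= $target) {
                $best = $sum;
            } else {
                $pending[] = $sum; // still short: keep adding entries
            }
        }
    }
    return $best;
}

$closest = smallestSumAtLeast(array(150, 180, 270), 440);
```

Because each sum below the target is extended at most once, the number of steps is bounded by the target value times the number of entries, regardless of how many combinations exist.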

Previous comments below, now somewhat moot

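My previous comments, below, are somewhat moot since the two optimizations above do seem to work quite well. But I include the comments anyway.

Unfortunately, as noted, the above loop is highly unoptimized. It would have to be optimized to avoid duplicate tests (among other optimizations) in order to work in a realistic situation. However, it demonstrates an algorithm that works.

Note that this is a complex area mathematically. Various optimizations may help in one scenario but not in another. Therefore, to make the above algorithm work efficiently, you would need to discuss realistic usage scenarios. Will there be a limit on the largest length in the list of parts? What is the range of lengths? Other features of the parts list and of the desired goal, though subtle, could make a big difference in how to go about optimizing the algorithm.

This is a case where the "theoretical" problem is not sufficient to produce a desired solution, since optimization is so critical. Therefore, making optimization suggestions is not particularly useful.

Leonard's optimization, for instance (avoiding duplicates by saving all combinations previously tested), works nicely for a small set, but memory usage explodes for larger sets (as he pointed out). It is not a simple problem.

(Code edited about 2 hours later to handle combinations that could be missed because the recursion is limited to a certain depth, by initially sorting the array from high to low.)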