Weighted Standard Deviation in PHP code

a1 = [14, 12, 11, 9, 9, 8, 8]
a2 = [12, 13, 14, 9, 9, 8]
...
...
std_dev_a1 = 2.267786838
std_dev_a2 = 2.483277404
...
...

a3 is a1 and a2 concatenated:

a3 = [14, 12, 11, 9, 9, 8, 8, 12, 13, 14, 9, 9, 8]
std_dev_a3 = 2.295480509

Of course, I can't simply take a weighted average: std_dev_a3 != (std_dev_a1 * 7 + std_dev_a2 * 6) / 13

My question is: can I get std_dev_a3 from std_dev_a1 and std_dev_a2 alone?

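The problem came up while I was writing PHP code to compute the stddev of an array. Because the array keeps growing, the script eventually runs out of memory, so I unset() the array on every iteration, and that is where the problem starts. What I keep from the previous iteration is the array's mean, its stddev, and its length. So is it possible to compute the std_dev of the combined array (the old array plus the new one) from those saved values?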

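You can't compute it exactly from the two standard deviations alone, because the standard deviation formula works with the difference between each element and the mean, and the combined array has a different mean than either part.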

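But you can get a reasonable approximation with the following formula (the pooled standard deviation), which is accurate when the two arrays have similar means: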

$var_a3 = (($n1 - 1) * pow($std_dev_a1, 2) + ($n2 - 1) * pow($std_dev_a2, 2)) / ($n1 + $n2 - 2);
$std_dev_a3 = sqrt($var_a3);
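If the mean and length of each part are also kept, as the question describes, the combination can be made exact by adding a term for the gap between the two means. The following is a minimal sketch of that idea, not code from the original answer: the function name merge_stats and the [count, mean, std dev] layout are hypothetical, it assumes sample standard deviations (n - 1 denominator), and it needs PHP 7.1 or later for the array destructuring.

```php
<?php
// Hypothetical helper: merges the summary statistics of two batches into the
// statistics of their concatenation.
// Each argument is [count, mean, sample std dev (n - 1 denominator)].
function merge_stats(array $a, array $b): array
{
    [$n1, $mean1, $sd1] = $a;
    [$n2, $mean2, $sd2] = $b;

    $n = $n1 + $n2;
    $mean = ($n1 * $mean1 + $n2 * $mean2) / $n;

    // Sum of squared deviations inside each batch (same terms as the pooled formula)...
    $ss = ($n1 - 1) * $sd1 ** 2 + ($n2 - 1) * $sd2 ** 2;
    // ...plus the contribution of the gap between the two batch means.
    $ss += ($n1 * $n2 / $n) * ($mean1 - $mean2) ** 2;

    return [$n, $mean, sqrt($ss / ($n - 1))];
}

// Values taken from the question: a1 has 7 elements summing to 71,
// a2 has 6 elements summing to 65.
$a1 = [7, 71 / 7, 2.267786838];
$a2 = [6, 65 / 6, 2.483277404];
[$n3, $mean3, $std_dev_a3] = merge_stats($a1, $a2);
printf("n = %d, mean = %.4f, std dev = %.5f\n", $n3, $mean3, $std_dev_a3);
```

With the numbers from the question this reproduces std_dev_a3 = 2.29548 to within rounding, and the returned triple can be fed back into merge_stats for the next batch, so no raw values need to stay in memory.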

You mentioned that you use this approach because you are running out of memory.

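Storing the data in a frequency table avoids running out of memory, as long as the number of distinct values stays small: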

{[8] => 3, [9] => 4, ..., [14] => 2}

With this data structure you can compute the standard deviation:

// This should be built from your data
$freq = array(8 => 3, 9 => 4, 11 => 1, 12 => 2, 13 => 1, 14 => 2);
// Calculate mean
$mean = 0;
$n = 0;
foreach ($freq as $value => $count) {
  $mean += $value * $count;
  $n += $count;
}
$mean = $mean / $n;
// Calculate std dev
$std_dev = 0;
foreach ($freq as $value => $count) {
  $std_dev += ($count * pow($value - $mean, 2));
}
$std_dev = sqrt($std_dev/($n - 1));
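
The frequency table itself can be filled one batch at a time, so each batch can be discarded once it has been counted. This is only a sketch under assumptions: the helper name add_to_freq is hypothetical, the values are assumed to be integers (PHP array keys must be integers or strings), and the ?? operator needs PHP 7.0 or later.

```php
<?php
// Hypothetical helper: adds one batch of integer values to the running
// frequency table and returns the updated table.
function add_to_freq(array $freq, array $batch): array
{
    foreach ($batch as $value) {
        // Start at 0 the first time a value is seen, then count it.
        $freq[$value] = ($freq[$value] ?? 0) + 1;
    }
    return $freq;
}

$freq = [];
$freq = add_to_freq($freq, [14, 12, 11, 9, 9, 8, 8]); // a1 from the question
$freq = add_to_freq($freq, [12, 13, 14, 9, 9, 8]);    // a2 from the question
ksort($freq); // sort by value, only to make the table easier to read
print_r($freq);
```

For the two arrays in the question this yields the same table as the hard-coded $freq above. Memory use then grows with the number of distinct values rather than with the number of elements.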