理解一致哈希 - Understanding Consistent Hashing

在过去的几天里，我一直在研究PHP的一致性哈希算法。我希望更好地了解一致性哈希的实际工作方式，这样我就可以在一些即将到来的项目中使用它。在我看来，Flexihash实际上是唯一一个易于理解的纯PHP实现，所以我从中做了一些笔记

我创建了一个自己的小算法，试图了解它是如何工作的，以及如何使它尽可能快地工作。我很惊讶我的算法与Flexihash相比有多快，这让我怀疑我的实现是否在某种程度上有缺陷，或者我可能没有抓住整个概念的关键部分。

下面列出了在100万个顺序键（0到1000000）的迭代中的速度差异。显示每个节点以显示实际散列到该特定节点的键数。

Flexihash:
 Time: 269 seconds
Speed: 3714 hashes/sec
 node1: 80729
 node2: 88390
 node3: 103623
 node4: 112338
 node5: 79893
 node6: 85377
 node7: 80966
 node8: 134462
 node9: 117046
node10: 117176
My implementation:
 Time: 3 seconds
Speed: 265589 hashes/sec
 node1: 100152
 node2: 100018
 node3: 99884
 node4: 99889
 node5: 100057
 node6: 100030
 node7: 99948
 node8: 99918
 node9: 100011
node10: 100093

这是我目前实现的哈希算法。

class Base_Hasher_Consistent
{
    protected $_nodes;
    public function __construct($nodes=NULL)
    {
        $this->_nodes = array();
        $node_count = count($nodes);
        $hashes_per_node = (int)(PHP_INT_MAX / $node_count);
        $old_hash_count = 0;
        foreach($nodes as $node){
            if(!($node == $nodes[$node_count-1])){
                $this->_nodes[] = array(
                    'name' => $node,
                    'begin' => $old_hash_count,
                    'end' => $old_hash_count + $hashes_per_node - 1
                );
                $old_hash_count += $hashes_per_node;
            }else{
                $this->_nodes[] = array(
                    'name' => $node,
                    'begin' => $old_hash_count,
                    'end' => PHP_INT_MAX
                );
            }
        }
    }
    public function hashToNode($data_key,$count=0)
    {
        $hash_code = (int)abs(crc32($data_key));
        foreach($this->_nodes as $node){
            if($hash_code >= $node['begin']){
                if($hash_code <= $node['end']){
                    return($node['name']);
                }
            }
        }
    }
}

是我遗漏了什么，还是算法真的比Flexihash快？此外，我知道Flexihash支持查找多个节点，所以我不确定这是否与它有关

我只是想得到一些保证，我理解一致性哈希是如何工作的，或者可能链接到一些真正能很好地解释它的文章。

谢谢！

那么，你想知道crc32是如何工作的，还是只是想知道如何编写一个好的"bucket"实现？

您的bucket实现看起来不错。如果你只是这样做，你可能会做得更快：

$bucket_index = floor($hash_code / $hashes_per_node);
return $this->_nodes[$bucket_index]['name'];

您编写的"算法"只会使$nodes具有存储桶的数量，并计算出$data_key应该放在其中的哪个存储桶中。你使用的实际哈希算法是crc32，如果你使用bucket，这可能不是一个理想的算法。

如果你想知道crc32是如何工作的，以及为什么哈希是一致的。。在维基百科上查一下。据我所知，不存在不一致的哈希，因此所有哈希算法都是一致的。

哈希算法的一个特点是，对于相似的数据密钥，它可以生成非常不同的哈希。crc32也是如此，因为crc32旨在检测微小的变化。但它不能保证的是，由此产生的散列是"良好传播的"。由于你在做bucket，你想要一个哈希算法，它具有在所有频谱上生成哈希的特定特性。对于crc32来说，它可能只是巧合地工作得很好。我只想用md5。

EstelS表示：

我只是想得到一些保证，我理解一致性哈希是如何工作的，或者可能链接到一些真正能很好地解释它的文章。

这是一篇让我写Flexihash的优秀文章：http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/

我已经很久没有看代码了——很可能你的代码要快得多。。速度从来不是我关心的问题。但也有可能你的做法完全不同：）

另请参阅rs在GitHub上提交的这篇文章，该文章声称通过使用二进制搜索将速度提高了70倍。如果有人能确认这是货物，我会把它合并在一起。

干杯！Paul