Finding how closely a set of strings is related

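I have an array of strings (shown in the example below). I just want to know which one is the most common among them. I define the most common string like this: if "Apple Ipod touch" appears 10 times (say) and "Apple Ipod" appears 8 times, then I would say "Apple Ipod touch" is the dominant/common string across all the elements.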

Apple iPod touch, 8GB (with FaceTime Camera and Retina Display)
Aple Ipod Clasic 80gb 6th Generation Black
iPod classic 160GB - Silver
Apple 8GB iPod Touch
Apple Ipod Touch 8gb 4th Generation Mc540ll/a 8 Gb Newest Model
Apple iPod touch Black 4th Generation 8GB Touch Screen Wi-Fi MP3
Apple 8GB iPod touch
Apple 8GB iPod touch MC540LL/A
Apple MC540LL/A - 8GB iPod Touch w/ Camera (4th Gen) (Newest Model)
Apple iPod Touch - 8 GB - Electronics
Apple iPod 8GB 4th Generation Black Touch
Apple iPod touch 8GB 4th Gen (Refurbished)
Apple Ipod Touch Digital Player - Apple Ios 5
Apple Ipod Touch 8G - White (4Th Gen)
Apple MC540LL/A iPod Touch 8GB (4th Generation)
(refurbished) Apple Ipod Touch 8gb (4th Generation)
Apple Ipod Touch 8Gb 4Th Generation
iPod Touch 8GB (4th Gen)
Apple Ipod Touch 32G - White (4Th Gen)
Apple iPod touch 8GB (4th Gen), White
Apple iPod touch White 4th Generation 8GB Touch Screen Wi-Fi MP3
Apple 32GB Black 4th Generation iPod Touch - MC544LL/A
Apple 8GB iPod touch
Apple iPod touch 8GB - White - Electronics
Apple MC544LL/A - 32GB iPod Touch w/ Camera (4th Gen) (Newest Model)

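So, can anyone suggest a good algorithm for this? The problem is that I have no standard or baseline to compare against. I just need to compare all the elements with one another and find the most common one. This has to be implemented in PHP or JavaScript.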
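
To make the definition of "most common" concrete, here is a minimal JavaScript sketch of the counting idea. It assumes a phrase means a run of consecutive words, compared case-insensitively with punctuation ignored, and that each phrase is counted at most once per string. The function names (`tokenize`, `countPhrases`, `mostCommonPhrase`) and the short `sample` array are made up for illustration.

```javascript
// Hypothetical helper: lowercase, then split on anything that is not a letter or digit.
function tokenize(title) {
  return title.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// For every run of `size` consecutive words, count how many titles contain it.
function countPhrases(titles, size) {
  const counts = new Map();
  for (const title of titles) {
    const words = tokenize(title);
    const seen = new Set(); // count each phrase at most once per title
    for (let i = 0; i + size <= words.length; i++) {
      seen.add(words.slice(i, i + size).join(" "));
    }
    for (const phrase of seen) {
      counts.set(phrase, (counts.get(phrase) || 0) + 1);
    }
  }
  return counts;
}

// Return the phrase found in the most titles (the first one seen wins a tie).
function mostCommonPhrase(titles, size) {
  let best = null;
  let bestCount = 0;
  for (const [phrase, count] of countPhrases(titles, size)) {
    if (count > bestCount) {
      best = phrase;
      bestCount = count;
    }
  }
  return { phrase: best, count: bestCount };
}

const sample = [
  "Apple 8GB iPod Touch",
  "Apple iPod touch 8GB (4th Gen), White",
  "iPod classic 160GB - Silver",
];

console.log(mostCommonPhrase(sample, 2)); // { phrase: 'ipod touch', count: 2 }
```

This only catches words that appear next to each other in the same order, so "Apple 8GB iPod touch" and "Apple iPod touch" share "ipod touch" but not a three-word phrase, which is part of why I am asking for something better.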

I hope my question is clear. If it isn't, please leave a comment.

I'm not sure whether you've looked at PHP's similar_text function, or whether there is an equivalent function in JavaScript. A quick Google search also turned up http://cambiatablog.wordpress.com/2011/03/25/algorithm-for-string-similarity-better-than-levenshtein-and-similar_text/

EDIT: A JavaScript version of similar_text! http://phpjs.org/functions/similar_text:902
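
As a rough sketch of the "compare every element with every other" idea, the snippet below scores each string by its total similarity to all the others and picks the highest scorer. To stay self-contained it uses Jaccard word overlap as a stand-in measure, not similar_text itself; any pairwise similarity function could be dropped into the same loop. The names `wordSet`, `jaccard` and `mostRepresentative` are hypothetical.

```javascript
// Hypothetical helper: the set of lowercase words in a title, punctuation ignored.
function wordSet(title) {
  return new Set(title.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: shared words divided by all distinct words (0 to 1).
function jaccard(a, b) {
  let shared = 0;
  for (const word of a) {
    if (b.has(word)) shared++;
  }
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Pick the title whose summed similarity to every other title is highest.
function mostRepresentative(titles) {
  const sets = titles.map(wordSet);
  let bestIndex = -1;
  let bestScore = -Infinity;
  for (let i = 0; i < sets.length; i++) {
    let score = 0;
    for (let j = 0; j < sets.length; j++) {
      if (i !== j) score += jaccard(sets[i], sets[j]);
    }
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  }
  return bestIndex === -1 ? null : { title: titles[bestIndex], score: bestScore };
}

const titles = [
  "Apple 8GB iPod touch",
  "Apple iPod touch 8GB (4th Gen), White",
  "iPod classic 160GB - Silver",
];

console.log(mostRepresentative(titles).title); // Apple 8GB iPod touch
```

This compares every pair, so the work grows with the square of the number of strings. That should be fine for a list of a few dozen titles like yours.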