Finding similar strings in array

Keywords: string, similar, find, array | Last updated: 2024-03-02

I need to use similar_text() on a set of values that looks like this:

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];

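What I want to do is find the words that are effectively the same, i.e. lawyer and lawyers in the array above, and add their counts together in a new array.

So lawyer would be 4, because the count for lawyers would be folded into the original string lawyer.

Bear in mind that this array will only ever contain single words and is of unspecified length, which could range from 1 to more than 99 entries.

I had no idea where to start, so I had a crack at it with a nested foreach loop as shown below, but the output is not what I expected.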

foreach ( $strings as $key_one => $count_one ) {
    foreach ( $strings as $key_two => $count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            if(!isset($counts[$key_one])) {
                $counts[$key_one] = $count_one;
            } else {
                $counts[$key_one] += $count_two;
            }
        }
    }
}

Note: the match threshold in this example is 80% (lawyer and lawyers match at ~92%).

This ends up giving me something like the following:
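
To see where the ~92% figure comes from, here is a minimal sketch that compares each key from the question against "lawyer". The variable names $words and $percents are only illustrative; similar_text() writes the similarity percentage into its third argument.

```php
<?php
// Compare every key from the question against "lawyer".
$words = ["lawyer", "business", "lawyers", "a"];
$percents = [];

foreach ($words as $word) {
    // similar_text() returns the number of matching characters and
    // stores the similarity percentage in $percent (passed by reference).
    similar_text($word, "lawyer", $percent);
    $percents[$word] = $percent;
    printf("%-8s vs lawyer: %.1f%%\n", $word, $percent);
}
// "lawyers" shares 6 characters with "lawyer": 6 * 2 / (7 + 6) * 100 ≈ 92.3
```

Only lawyer itself and lawyers clear the 80% threshold; business and a fall well below it.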

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
    [lawyers] => 2
)

Whereas I need it to be:

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
)

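Note that I need it to actually remove lawyers and add its count to lawyer.

Your difficulty is that just as lawyer is similar to lawyers, lawyers is also similar to lawyer. So each of them gets its count bumped up by the other.

Try this: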
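
The symmetry can be checked directly. The following sketch (the $matches array is just an illustrative name) lists every ordered pair of distinct keys whose similarity exceeds 80%, using the same argument order as the loop in the question:

```php
<?php
$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
$matches = [];

foreach (array_keys($strings) as $one) {
    foreach (array_keys($strings) as $two) {
        if ($one === $two) {
            continue; // a word always matches itself, so skip that case
        }
        similar_text($two, $one, $percent);
        if ($percent > 80) {
            // $one would receive the count of $two in the original loop
            $matches[] = "$one <- $two";
        }
    }
}

print_r($matches); // the pair shows up in both directions
```

Because the pair appears once in each direction, both lawyer and lawyers end up with an entry in $counts unless one of them is explicitly marked as already handled.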

$counts = array();
foreach ( $strings as $key_one => &$count_one ) {
    if ($count_one == 0) continue; // skip it if we've already processed it
    if (!isset($counts[$key_one])) {
        $counts[$key_one] = $count_one;
        $count_one = 0; // zero it so the inner loop does not add it a second time
    }
    foreach ( $strings as $key_two => &$count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $count_two = 0; // mark this entry as merged
        }
    }
}
unset($count_one, $count_two); // break the references left behind by foreach

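The downside of this is that it modifies the original $strings array (every count ends up as 0), which may not be desirable. Here is another approach that keeps track of the strings already processed in a separate hash: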

$already = $counts = array(); // not really necessary, but nice to init
foreach ( $strings as $key_one => $count_one ) {
    if (isset($already[$key_one])) continue; // skip if already processed
    $counts[$key_one] = $count_one; // by definition this should be new
    $already[$key_one] = true; // so the inner loop does not count it twice
    foreach ( $strings as $key_two => $count_two ) {
        if (isset($already[$key_two])) continue; // skip self and merged keys
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $already[$key_two] = true;
        }
    }
}

I recommend the second solution.

Alternatively, you can always remove the duplicate entry with:
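
As a rough sketch of how the second approach might be packaged for reuse, the function below wraps it with a configurable threshold. The name mergeSimilarCounts and the $threshold parameter are assumptions made for illustration; they do not come from the original answer. Keys are cast to string because PHP turns numeric string keys into integers.

```php
<?php
/**
 * Hypothetical helper: merge the counts of keys whose similar_text()
 * percentage exceeds $threshold. The first key seen represents its group.
 */
function mergeSimilarCounts(array $strings, float $threshold = 80.0): array
{
    $counts = [];
    $already = []; // keys that have already been assigned to a group

    foreach ($strings as $keyOne => $countOne) {
        if (isset($already[$keyOne])) {
            continue; // already merged into an earlier key
        }
        $counts[$keyOne] = $countOne;
        $already[$keyOne] = true;

        foreach ($strings as $keyTwo => $countTwo) {
            if (isset($already[$keyTwo])) {
                continue; // skip the key itself and anything already merged
            }
            similar_text((string) $keyTwo, (string) $keyOne, $percent);
            if ($percent > $threshold) {
                $counts[$keyOne] += $countTwo;
                $already[$keyTwo] = true;
            }
        }
    }

    return $counts;
}

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
$merged = mergeSimilarCounts($strings);
print_r($merged); // lawyer => 4, business => 3, a => 3
```

The input array is left untouched. Note that the grouping is greedy: whichever key comes first in the array becomes the representative, so reordering the input can change which spelling survives.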

unset( $counts[$key_two] ) ;