How to check a partial similarity of two strings in PHP

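Is there a function in PHP that checks the % similarity of two strings?

For example:

$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

function($string1, $string2) would return true, because the words "how", "are", "you" appear in both lines.

Or, even better, it would return 60% similarity, because "how", "are", "you" make up 3/5 of $string1.

Does such a function exist in PHP?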

As this is a nice question, I put some effort into it:

<?php
$string1="Hello how are you doing";
$string2= " hi, how are you";
echo 'Compare result: ' . compareStrings($string1, $string2) . '%';
//60%

function compareStrings($s1, $s2) {
    // one of the strings is empty, so there is nothing to compare
    if (strlen($s1)==0 || strlen($s2)==0) {
        return 0;
    }
    // replace non-alphanumeric characters with spaces
    // I left "-" in, in case it's used to combine words
    $s1clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s2);
    // collapse runs of spaces and trim, so explode() yields no empty words
    $s1clean = trim(preg_replace('/ +/', ' ', $s1clean));
    $s2clean = trim(preg_replace('/ +/', ' ', $s2clean));
    //create arrays
    $ar1 = explode(" ",$s1clean);
    $ar2 = explode(" ",$s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);
    //flip the arrays if needed so ar1 is always largest.
    if ($l2>$l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }
    //flip array 2, to make the words the keys
    $ar2 = array_flip($ar2);

    $maxwords = max($l1, $l2);
    $matches = 0;
    //find matching words
    foreach($ar1 as $word) {
        if (array_key_exists($word, $ar2))
            $matches++;
    }
    return ($matches / $maxwords) * 100;    
}
?>

As other answers have already said, you can use similar_text. Here is a demo:

$string1="Hello how are you doing" ;
$string2= " hi, how are you";
echo similar_text($string1, $string2, $perc); //12
echo $perc; //61.538461538462

This returns 12 and sets $perc to the similarity percentage you asked for.
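
For what it's worth, the percentage appears to be twice the number of matching characters divided by the combined length of both strings. A minimal sketch that checks this relation against the values above (the variable names $common and $expected are my own):

```php
<?php
$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

// similar_text() returns the number of matching characters and
// writes the percentage into the third argument (passed by reference).
$common = similar_text($string1, $string2, $perc);

// Assumed relation: percent = 2 * common / (len1 + len2) * 100
$expected = 2 * $common / (strlen($string1) + strlen($string2)) * 100;

printf("%d matching characters, %.2f%% similar (recomputed: %.2f%%)\n", $common, $perc, $expected);
```

Because the score is character based, it is driven by the long shared run " how are you" rather than by whole words.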

In addition to Alex Siri's answer, and according to the following article:

http://docstore.mik.ua/orelly/webprog/php/ch04_06.htm

PHP provides several functions that let you test whether two strings are approximately equal:

$string1="Hello how are you doing" ;
$string2= " hi, how are you";

SOUNDEX

if (soundex($string1) == soundex($string2)) {
  echo "similar";
} else {
  echo "not similar";
}

METAPHONE

if (metaphone($string1) == metaphone($string2)) {
   echo "similar";
} else {
  echo "not similar";
}
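
Keep in mind that soundex() always returns a four-character key, so for a whole sentence it mostly reflects the first word. A rough sketch of a per-word comparison instead, assuming a hypothetical helper phoneticWordMatches() that is not part of PHP:

```php
<?php
// Hypothetical helper: counts the words of $a whose soundex key
// also occurs among the soundex keys of the words of $b.
function phoneticWordMatches(string $a, string $b): int
{
    // split on anything that is not an ASCII letter, dropping empty pieces
    $wordsA = preg_split('/[^A-Za-z]+/', $a, -1, PREG_SPLIT_NO_EMPTY);
    $wordsB = preg_split('/[^A-Za-z]+/', $b, -1, PREG_SPLIT_NO_EMPTY);

    $keysB = array_map('soundex', $wordsB);

    $matches = 0;
    foreach ($wordsA as $word) {
        if (in_array(soundex($word), $keysB, true)) {
            $matches++;
        }
    }
    return $matches;
}

echo phoneticWordMatches("Hello how are you doing", " hi, how are you");
```

The same idea should work with metaphone() in place of soundex(); which of the two gives better matches depends on your data.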

SIMILAR_TEXT

$similarity = similar_text($string1, $string2);

LEVENSHTEIN

$distance = levenshtein($string1, $string2); 
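
levenshtein() returns an edit distance rather than a percentage. One common way to turn it into a rough similarity score is to divide by the length of the longer string. A minimal sketch, assuming a hypothetical helper levenshteinPercent() (not a PHP built-in):

```php
<?php
// Hypothetical helper: converts the edit distance into a 0..100 score.
function levenshteinPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty strings are identical
    }
    // the distance can never exceed the length of the longer string
    return (1 - levenshtein($a, $b) / $longest) * 100;
}

printf("%.1f%%\n", levenshteinPercent("kitten", "sitting"));
```

Like similar_text, this works on characters, so it says nothing about how many whole words two sentences share.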

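OK, here is my function, to make things more interesting.

I'm checking the approximate similarity of strings.

These are the criteria I use:

  1. The order of the words matters.
  2. Words only need to be 85% similar.

Example:

$string1 = "How much will it cost to me";  // string in vocabulary
$string2 = "How much does costs it ";      // user input ("costs" instead of "cost" is a mistake)

Algorithm:

  1. Check the similarity of the words and build a clean string from the "right" words, in the order they appear in the vocabulary. Output: "How much it cost"
  2. Build a clean string from the "right" words, in the order they appear in the user input. Output: "How much cost it"
  3. Compare the two outputs: return no if they differ, yes if they are the same.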

error_reporting(E_ALL);
ini_set('display_errors', true);
$string1 = "сколько это стоит ваще";
$string2 = "сколько будет стоить это будет мне";
if (compareStrings($string1, $string2)) {
    echo "yes";
} else {
    echo "no";
}

function compareStrings($s1, $s2) {
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return false;
    }
    // split on spaces, ignoring empty pieces caused by repeated spaces
    $ar1 = preg_split('/ +/', $s1, -1, PREG_SPLIT_NO_EMPTY);
    $ar2 = preg_split('/ +/', $s2, -1, PREG_SPLIT_NO_EMPTY);
    $meaning = [];    // matched vocabulary words, in vocabulary order
    $rightorder = []; // the same words, keyed by their position in the user input
    foreach ($ar1 as $w1) {
        foreach ($ar2 as $j => $w2) {
            similar_text($w1, $w2, $percent);
            if ($percent >= 85) {
                $meaning[] = $w1;
                $rightorder[$j] = $w1;
                break; // take the first sufficiently similar word
            }
        }
    }
    // put the matched words into user-input order
    ksort($rightorder);
    // same words in the same order => the strings are considered similar
    return $meaning === array_values($rightorder);
}

I'd love to hear your opinions and suggestions on how to improve it.

You can use PHP's similar_text function:

int similar_text ( string $first , string $second [, float &$percent ] )

See the PHP documentation: http://php.net/manual/en/function.similar-text.php

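Although this question is quite old, I'm adding my solution for a few reasons. First, the author wants to compare similar words rather than strings, as his comment makes clear. Second, most answers try to solve it with similar_text, which is not a good fit for this problem: it finds similarity by comparing characters, so it also reports matches for completely different strings. The first answer, given by @Hugo Delsing, uses array_flip to swap keys and values, but if a word is repeated more than once it only considers that word once. I'm posting the answer below, which compares words. Its only issue is that it does not take the order of the words into account much.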

function compareStrings($s1, $s2)
{
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }
    // split on anything that is not a word character or a hyphen
    $ar1 = preg_split('/[^\w\-]+/', strtolower($s1), -1, PREG_SPLIT_NO_EMPTY);
    $ar2 = preg_split('/[^\w\-]+/', strtolower($s2), -1, PREG_SPLIT_NO_EMPTY);
    $l1 = count($ar1);
    $l2 = count($ar2);
    $ar2_copy = array_values($ar2);
    $matched_indices = [];
    $word_map = [];
    foreach ($ar1 as $k => $w1) {
        if (!empty($word_map[$w1])) {
            if ($word_map[$w1][0] >= $k) {
                $matched_indices[$k] = $word_map[$w1][0];
            }
            array_splice($word_map[$w1], 0, 1);
        } else {
            $indices = array_keys($ar2_copy, $w1);
            $index_count = count($indices);
            if ($index_count) {
                if ($index_count == 1) {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at given index from second array so that it won't repeat again
                    unset($ar2_copy[$indices[0]]);
                } else {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at given indices from second array so that it won't repeat again
                    foreach ($indices as $index) {
                        unset($ar2_copy[$index]);
                    }
                    array_splice($indices, 0, 1);
                    $word_map[$w1] = $indices;
                }
            }
        }
    }
    return round(count($matched_indices) * 100 / $l1, 2);
}
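
To illustrate the array_flip point from the explanation above: flipping an array keeps only one entry per distinct word, so repeated words collapse into one. A small sketch (the sample words are my own); array_count_values is one way to keep the counts instead:

```php
<?php
$words = ['are', 'you', 'you', 'how', 'you'];

// array_flip() turns values into keys; duplicate values overwrite each
// other, so only the last index of each word survives.
$flipped = array_flip($words);
print_r($flipped);

// array_count_values() keeps how many times each word occurred.
$counts = array_count_values($words);
print_r($counts);
```

Here $flipped has three entries even though the input has five words, while $counts still records that "you" occurred three times.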