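Keywords: all words longer than 3 characters.
I want to compare the keywords of two strings under the following conditions:
- Word order does not matter (see Example 1).
- Words shorter than 3 characters are not counted (see Example 2).
- The shorter sentence (by character count) goes into str1 (see Example 3).
- I only want the words in str1 that differ from those in str2 (see Example 4).
In fact, I have a bot that crawls two news websites every day and copies the news into my database. I then need an algorithm that compares news titles and identifies duplicate news. (As you know, the same story gets different titles on different news sites, but the titles of the same story usually contain the same keywords.)
Example 1: word order does not matter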
str1= 'hello petter'
str2= 'petter hello'
result: 0
Example 2: words shorter than 3 characters are not counted
str1= 'hello !!'
str2= 'petter hello'
result: 0 // '!!' is shorter than 3 characters, so str1 becomes 'hello' and the result is 0
or
str1= 'hello petter how are u?'
str2= 'petter hello how are you'
result: 0 // str1 is 'hello petter how are'
Example 3: the variables must be swapped
str1= 'hello petter how are you ?'
str2= 'petter hello how are you?'
// Then
str1= 'hello petter how are you?'
str2= 'petter hello how are you ?'
result: 1 // 1 is for 'you?' (in str1)
Example 4: differing words in str2 do not matter
str1= 'hello petter how are you?'
str2= 'petter hello how are you ?'
result: 1 // str2 is 'petter hello how are you', then 1 is for: 'you?' (in str1)
Note: 'you' (in str2) does not matter to me, because it does not match any word in str1.
Combined example (for more detail):
str1= 'petter hello how are you pal?'
str2= 'petter hello how are... !!'
// First, swap str1 and str2 (str2 is the shorter one)
str1= 'petter hello how are... !!'
str2= 'petter hello how are you pal?'
// Then remove '!!' (in str1)
str1= 'petter hello how are...'
str2= 'petter hello how are you pal?'
result: 1 // 1 is for 'are...' (in str1); 'are', 'you' and 'pal?' in str2 do not matter
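The rules and examples above can be sketched in a few lines of PHP. This is only an illustration under stated assumptions, not the asker's or any answerer's code: the helper names `keyword_tokens` and `count_unmatched_words` are hypothetical, words are split on whitespace, length is measured in bytes with `strlen`, matching is exact (so 'you?' and 'you' differ, as in Example 3), and the minimum length is taken as 3 because that is what the examples use. It assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: split on whitespace and drop tokens shorter than $minLen.
function keyword_tokens(string $text, int $minLen = 3): array
{
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_filter(
        $tokens,
        fn(string $t): bool => strlen($t) >= $minLen
    ));
}

// Hypothetical helper: count the words of the shorter string that have
// no exact match in the other string.
function count_unmatched_words(string $a, string $b, int $minLen = 3): int
{
    // The shorter string (by character count) plays the role of str1.
    [$str1, $str2] = strlen($a) <= strlen($b) ? [$a, $b] : [$b, $a];

    $words1 = keyword_tokens($str1, $minLen);
    $words2 = preg_split('/\s+/', $str2, -1, PREG_SPLIT_NO_EMPTY);

    // array_diff keeps the values of $words1 that are absent from $words2.
    return count(array_diff($words1, $words2));
}

echo count_unmatched_words('hello petter', 'petter hello'), "\n";                                 // 0 (Example 1)
echo count_unmatched_words('hello !!', 'petter hello'), "\n";                                     // 0 (Example 2)
echo count_unmatched_words('hello petter how are you ?', 'petter hello how are you?'), "\n";      // 1 (Example 3)
echo count_unmatched_words('petter hello how are you pal?', 'petter hello how are... !!'), "\n";  // 1 (combined example)
```

Because the function picks the shorter string itself, the two titles can be passed in either order.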
Finally, I need a function that identifies duplicate news from the result and the number of keywords (all words longer than 3 characters).
$keywords_numb=7;
$result=2;
function identify_duplicate($keywords_numb,$result){
if($keywords_numb / 3 >= $result){
$Specified = 'this is a new news';
}
else $Specified = 'this is a duplicate news';
return $Specified;
}
echo identify_duplicate($keywords_numb, $result); // call the function and print its return value
Output:
this is a new news
Does anyone know how I can write this program? Regards
You don't need regex. You can use the following function and pass the strings in any order:
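function identify_duplicate($var1, $var2){
    // Put the shorter string in $str1, as the question requires
    if(strlen($var1) <= strlen($var2)){
        $str1 = $var1;
        $str2 = $var2;
    }
    else{
        $str1 = $var2;
        $str2 = $var1;
    }
    $str1 = explode(" ", $str1);
    $str2 = explode(" ", $str2);
    // Start from the word count of $str1 and subtract every word that
    // also appears in $str2 or is at most 3 characters long
    $return = sizeof($str1);
    foreach($str1 as $val){
        if(in_array($val, $str2) || strlen($val) <= 3){
            $return = $return - 1;
        }
    }
    return $return; // number of keywords in $str1 not found in $str2
}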
With the help of @karthik manchala, I got it working...
$str1='this news is about a player named Ronaldo';
$str2='The player who called Ronaldo';
function identify_duplicate($str1, $str2){
if(strlen($str1)>strlen($str2)){
list($str1, $str2) = array($str2, $str1); // swap two variables
}
$str1 = explode(" ", $str1);
$str2 = explode(" ", $str2);
$words_numb = sizeof($str1);
$result=$words_numb;
foreach($str1 as $val){
if(in_array($val, $str2) || strlen($val) <= 3){
$result--;
}
}
if($words_numb / 3 >=$result){
$Specified = 'this is a duplicate news';
}
else $Specified = 'this is a new news';
return $Specified;
}
$out=identify_duplicate($str1, $str2);
echo $out;
Output:
this is a duplicate news
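One possible refinement, sketched here as an assumption rather than as part of the accepted approach: the function above compares words exactly, so 'The' and 'the' or 'you?' and 'you' count as different. For news titles it may help to lowercase the text and strip punctuation before comparing, and to return a boolean instead of a message. The names `normalized_keywords` and `is_duplicate_title` are hypothetical, the punctuation stripping only handles ASCII letters and digits, and the threshold is taken over the keyword count (as in the question's `$keywords_numb`) rather than over all words. It assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: lowercase, strip punctuation, keep words longer than 3 characters.
// Assumes ASCII titles; other characters are removed by the pattern below.
function normalized_keywords(string $title): array
{
    $clean = preg_replace('/[^a-z0-9\s]+/', '', strtolower($title));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique(array_filter(
        $words,
        fn(string $w): bool => strlen($w) > 3
    )));
}

// Hypothetical helper: true when at most one third of the shorter title's
// keywords are missing from the longer title.
function is_duplicate_title(string $a, string $b): bool
{
    [$short, $long] = strlen($a) <= strlen($b) ? [$a, $b] : [$b, $a];

    $keywords = normalized_keywords($short);
    if ($keywords === []) {
        return false; // assumption: with no keywords there is nothing to match
    }

    $unmatched = count(array_diff($keywords, normalized_keywords($long)));
    return $unmatched <= count($keywords) / 3;
}

var_dump(is_duplicate_title(
    'this news is about a player named Ronaldo',
    'The player who called Ronaldo'
)); // bool(true)
var_dump(is_duplicate_title('Stock markets rally', 'Ronaldo scores twice')); // bool(false)
```

In the first call the shorter title yields the keywords player, called and ronaldo; only called is missing from the longer title, and 1 out of 3 is within the threshold.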