Bad Word Filter, how do I replace words by their length?

I have this PHP code, which generates random characters wherever a bad word appears:

$wordreplace = array (
  "!",
  "#",
  "@",
  "&",
  "^",
  "$",
  "%",
  "*"
);
class badword {
  function word_fliter($content) {
    global $badwords, $wordreplace;
    $count = count($badwords);
    $countfilter = count($wordreplace);
    // Loop through the badwords array
    for ($n = 0; $n < $count; ++$n, next ($badwords)) {
      //Create random replace characters
      $x = 2;
      $y = rand(3,5);
      $filter = "";
      while ($x <= $y) {
        // rand() is inclusive at both ends, so the upper bound is the last valid index
        $f = rand(0, $countfilter - 1);
        $filter .= $wordreplace[$f];
        $x++;
      }
      //Search for badwords in content
      $search = "$badwords[$n]";
      $content = preg_replace("'$search'i","<i>$filter</i>",$content);
    }
    return $content;
  }
}

However, I need it to generate as many asterisks as there are letters in the bad word.

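To replace bad words in a string, you can use preg_replace_callback, which performs a regular expression search and replaces each match using a callback.

The following example replaces all bad words with * characters.

Output: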

PHP is a ***ular general-purpose scripting language that is especially suited to web *****opment.

<?php
header('Content-Type: text/plain');
// Array of bad words to filter
$badwords = array('pop', 'devel');
// String to clean of bad words
$text = 'PHP is a popular general-purpose scripting language that is especially suited to web development.';
// The /\w+/ regex sends each word to the replace_badwords function
$text = preg_replace_callback('/\w+/', 'replace_badwords', $text);
echo $text;
// replace_badwords checks whether the word contains a bad word; if so, that part
// is replaced with * characters, otherwise the original word is returned
function replace_badwords($ocr) {
  global $badwords;
  // Work on a copy so that several bad words inside one word are all masked
  $newstr = $ocr[0];
  foreach ($badwords as $value) {
        $start = strpos($newstr, $value);
        if ($start !== false) {
            $length = strlen($value);
            $newstr = substr_replace($newstr, str_repeat("*", $length), $start, $length);
        }
  }
  return $newstr;
}
?>
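
The filter in the question matched case-insensitively (the i flag), whereas strpos is case-sensitive. Below is a minimal sketch of a case-insensitive variant, assuming stripos is an acceptable swap. The helper name mask_badwords is hypothetical and not part of the original answer; the sketch also masks repeated occurrences within a word.

```php
<?php
// Hypothetical helper: masks every occurrence of each bad word, ignoring case.
function mask_badwords(string $text, array $badwords): string
{
    return preg_replace_callback('/\w+/', function ($m) use ($badwords) {
        $word = $m[0];
        foreach ($badwords as $bad) {
            if ($bad === '') {
                continue; // an empty needle would never advance the offset
            }
            $length = strlen($bad);
            $offset = 0;
            // stripos is the case-insensitive counterpart of strpos
            while (($start = stripos($word, $bad, $offset)) !== false) {
                $word = substr_replace($word, str_repeat('*', $length), $start, $length);
                $offset = $start + $length;
            }
        }
        return $word;
    }, $text);
}

echo mask_badwords('Popular web DEVELopment', array('pop', 'devel'));
// prints: ***ular web *****opment
```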

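You may be overthinking this. The whole first part of the code exists only to build a random replacement for each bad word. Only the last two lines matter, and you can replace them with: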

$content = preg_replace_callback(
    $badwords,
    function ($matches) {
        return str_repeat('*', strlen($matches[0]));
    },
    $content
);

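For this to work, though, each bad word has to become a regex pattern, here by wrapping it in delimiters and a "capture group" (the delimiters are the required part; $matches[0] holds the whole match either way):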

$capture_badwords = [];
foreach ($badwords as $word) {
    // preg_quote escapes any regex metacharacters in the word
    $capture_badwords[] = "/(" . preg_quote($word, "/") . ")/";
}

This should produce an array like this:

["/(abadword)/", "/(anotherbadword)/", "/(afourletterword)/", ... ]

preg_replace_callback lets you define a function that receives each match (and its groups) and returns the replacement.

For example:

$badwords = [ "/(dog)/", "/(cat)/", "/(here)/" ];
//"#\((\d+)\)#"
$phrase = "It is just dogs chasing cats in here.";
echo preg_replace_callback(
        $badwords,
        function ($matches) {
            return str_repeat('*', strlen($matches[0]));
        },
        $phrase
    );

Yields:

It is just ***s chasing ***s in ****.
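
Putting the pieces together, here is one way the filter from the question might look once the random characters are dropped. This is only a sketch: the class name BadWordFilter, its constructor, and the method name filter are hypothetical, and the word list is passed in rather than read through global. It keeps the case-insensitive i flag from the question's code.

```php
<?php
// Hypothetical rewrite of the question's class; all names are illustrative only.
class BadWordFilter
{
    private $patterns = [];

    public function __construct(array $badwords)
    {
        foreach ($badwords as $word) {
            // Escape regex metacharacters, then add delimiters and the i flag
            $this->patterns[] = '/' . preg_quote($word, '/') . '/i';
        }
    }

    public function filter($content)
    {
        return preg_replace_callback(
            $this->patterns,
            function ($matches) {
                // One asterisk per byte of the matched bad word
                return str_repeat('*', strlen($matches[0]));
            },
            $content
        );
    }
}

$filter = new BadWordFilter(['dog', 'cat', 'here']);
echo $filter->filter('It is just Dogs chasing cats in here.');
// prints: It is just ***s chasing ***s in ****.
```

Note that strlen counts bytes, so for bad words containing multibyte characters a character-based length (for example mb_strlen from the mbstring extension) may be the better fit.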