Filtering bad words from text

This function filters email addresses out of a text and returns the matched patterns:

  function parse($text, $words)
  {
    $resultSet = array();
    foreach ($words as $word){
      $pattern = 'regex to match emails';
      preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE );
      $this->pushToResultSet($matches);
    }
    return $resultSet;
  }

In a similar way, I want to match bad words in the text and return them as $resultSet.

Here is my code for filtering bad words (test here):

$badwords = array('shit', 'fuck'); // Here we can use all bad words from database
$text = 'Man, I shot this f*ck, sh/t! fucking fu*ker sh!t f*cking  sh\'t ;)';
echo "filtered words <br>";
echo $text."<br/>";
$words = explode(' ', $text);
foreach ($words as $word) {
    $bad = false;
    foreach ($badwords as $badword) {
        if (strlen($word) >= strlen($badword)) {
            // Compare character by character; a non-letter in $word acts as a wildcard.
            $wordOk = false;
            for ($i = 0; $i < strlen($badword); $i++) {
                if ($badword[$i] !== $word[$i] && ctype_alpha($word[$i])) {
                    $wordOk = true;
                    break;
                }
            }
            if (!$wordOk) {
                $bad = true;
                break;
            }
        }
    }
    echo $bad ? 'beep ' : ($word . ' '); // Here $bad words can be returned and replaced with *.
}

This replaces the bad words with beep.

But I want to push the matched bad words to $this->pushToResultSet() and return them, as in the first snippet for email filtering.

Can I do this with my bad-word filtering code?

Roughly converting David Atchley's answer to PHP, does this work the way you want?

$blocked = array('fuck','shit','damn','hell','ass');
$text = 'Man, I shot this f*ck, damn sh/t! fucking fu*ker sh!t f*cking  sh\'t ;)';
$matched = preg_match_all("/(".implode('|', $blocked).")/i", $text, $matches);
$filter = preg_replace("/(".implode('|', $blocked).")/i", 'beep', $text);
var_dump($filter);
var_dump($matches);
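
As a rough sketch of how this could feed a result set like the one in the question: the hypothetical helper `findBlockedWords()` below (my own name, not from either post) escapes each word with `preg_quote()` so that regex metacharacters in the list are matched literally, and passes `PREG_OFFSET_CAPTURE` so every hit carries its byte offset. It assumes PHP 7.1 or later and is not the asker's `pushToResultSet()` method, whose implementation was not shown.

```php
<?php
// Hypothetical helper (not from the original posts): collects every blocked
// word found in $text together with its byte offset.
function findBlockedWords(string $text, array $blocked): array
{
    if ($blocked === []) {
        return [];
    }
    // Escape regex metacharacters so each entry is matched literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $blocked);
    $pattern = '/(' . implode('|', $escaped) . ')/i';

    $resultSet = [];
    if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
        // $matches[0] holds [matchedText, byteOffset] pairs for the full pattern.
        foreach ($matches[0] as [$word, $offset]) {
            $resultSet[] = ['word' => $word, 'offset' => $offset];
        }
    }
    return $resultSet;
}

$blocked = ['fuck', 'shit', 'damn', 'hell', 'ass'];
$text = 'Man, I shot this f*ck, damn sh/t! fucking fu*ker sh!t f*cking  sh\'t ;)';
$found = findBlockedWords($text, $blocked);
print_r($found);
```

For the sample text this finds only `damn` and the `fuck` inside `fucking`; masked spellings such as `f*ck` or `sh/t` are not matched by a literal alternation.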

Here is a JSFiddle with a working example.

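Yes, you can match the bad words (saving them for later), replace them in the text, and build the regex dynamically from the array of bad words you are trying to filter (which you could store in a DB, load from JSON, and so on). Here is the main part of the working example: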

var blocked = ['fuck','shit','damn','hell','ass'],
    matchBlocked = new RegExp("("+blocked.join('|')+")", 'gi'),
    text = $('.unfiltered').text(),
    matched = text.match(matchBlocked),
    filtered = text.replace(matchBlocked, 'beep');

See the JSFiddle link above for the complete working example.
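
If you would rather keep the character-wise loop from the question, which also catches masked spellings such as `f*ck`, a minimal PHP sketch could collect the matches instead of echoing them. The function name `collectBadWords()` and the shape of the returned array are assumptions of mine, and I have swapped `explode(' ', ...)` for `preg_split()` with `PREG_SPLIT_OFFSET_CAPTURE` so that each token keeps its byte offset. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the original posts): same character-wise
// comparison as the loop in the question, but matched words are collected
// instead of echoed.
function collectBadWords(string $text, array $badwords): array
{
    $resultSet = [];
    // Split on whitespace; PREG_SPLIT_OFFSET_CAPTURE keeps each token's byte offset.
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);
    foreach ($tokens as [$word, $offset]) {
        foreach ($badwords as $badword) {
            if (strlen($word) < strlen($badword)) {
                continue;
            }
            $isBad = true;
            for ($i = 0; $i < strlen($badword); $i++) {
                // A differing letter clears the word; a non-letter acts as a wildcard.
                if ($badword[$i] !== $word[$i] && ctype_alpha($word[$i])) {
                    $isBad = false;
                    break;
                }
            }
            if ($isBad) {
                $resultSet[] = ['word' => $word, 'offset' => $offset, 'matches' => $badword];
                break;
            }
        }
    }
    return $resultSet;
}

$badwords = ['shit', 'fuck'];
$text = 'Man, I shot this f*ck, sh/t! fucking fu*ker sh!t f*cking  sh\'t ;)';
$found = collectBadWords($text, $badwords);
print_r(array_column($found, 'word'));
```

Like the original loop, the comparison is case-sensitive and only looks at the start of each token, and a token made up entirely of non-letters that is at least as long as a bad word would be flagged as well, so treat this as a starting point rather than a finished filter.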