Split text into chunks of 2 sentences in PHP

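I have a long text string. I want to store it in an array with two sentences per element. I think this should be done by exploding the text on period + space; however, there are also elements such as "Mr." that I don't know how to exclude from the explode function.

I also don't know how to adjust it so that the text is split every 2 sentences instead of every single sentence.

Maybe something like: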

$min_sentence_length = 100;
$ignore_words = array('mr.','ms.');
$text = "some texing alsie urj skdkd. and siks ekka lls. lorem ipsum some.";
$parts = explode(" ", $text);
$sentences = array();
$cur_sentence = "";
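foreach ($parts as $part) {
  // Append the word first so the closing period stays with its own sentence
  $cur_sentence .= $part . " ";
  // Close the chunk once it is long enough and the word ends with a real
  // period (the ignore list is lowercase, so compare case-insensitively)
  if (strlen($cur_sentence) > $min_sentence_length &&
    substr($part, -1) == "." && !in_array(strtolower($part), $ignore_words)) {
    $sentences[] = rtrim($cur_sentence);
    $cur_sentence = "";
  }
}
// Keep whatever is left after the last split
if (strlen($cur_sentence) > 0) {
  $sentences[] = rtrim($cur_sentence);
}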

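The comments on the question link to answers that use preg_split() instead of explode() to describe more precisely how and when to split the input. That might work for you. Another approach is to split the input at every occurrence of ". " into a temporary array, then loop over that array and stitch the pieces back together as needed. For example: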

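// $input holds the full text to split
$tempArray = explode('. ', trim($input));
$outputArray = array();
$outputElement = '';
$sentenceCount = 0;
foreach ($tempArray as $part) {
  // explode() strips the '. ' separators, so restore the period;
  // the final part may still carry its own, hence the rtrim()
  $outputElement .= rtrim($part, '.') . '. ';
  // put other exceptions here, not just "Mr."
  if (!preg_match('/\bMr$/', $part)) {
    $sentenceCount++;
  }
  if ($sentenceCount == 2) {
    $outputArray[] = rtrim($outputElement);
    $outputElement = '';
    $sentenceCount = 0;
  }
}
// keep a trailing chunk that holds fewer than two sentences
if ($outputElement != '') {
  $outputArray[] = rtrim($outputElement);
}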
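
For comparison, here is a rough sketch of the preg_split() route mentioned above. It is not taken from the linked answers: the function name splitIntoSentenceChunks, its parameters, and the abbreviation list are all assumptions made for illustration. It splits on whitespace that follows ".", "!" or "?", skips the split when one of the listed abbreviations comes right before it, and then groups the sentences with array_chunk().

```php
<?php
// Hypothetical helper: split $text into chunks of $perChunk sentences.
// $abbreviations is an assumed, non-exhaustive list; extend it as needed.
function splitIntoSentenceChunks(
    string $text,
    int $perChunk = 2,
    array $abbreviations = ['Mr', 'Mrs', 'Ms', 'Dr']
): array {
    // Build one negative lookbehind per abbreviation, e.g. (?<!\bMr\.)
    $lookbehinds = '';
    foreach ($abbreviations as $abbr) {
        $lookbehinds .= '(?<!\b' . preg_quote($abbr, '/') . '\.)';
    }

    // Split on whitespace that follows ., ! or ?,
    // unless a listed abbreviation sits right before that whitespace
    $pattern = '/(?<=[.!?])' . $lookbehinds . '\s+/';
    $sentences = preg_split($pattern, trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Group the sentences and glue each group back into a single string
    return array_map(
        function (array $group): string {
            return implode(' ', $group);
        },
        array_chunk($sentences, $perChunk)
    );
}
```

Calling splitIntoSentenceChunks($text) should give one array element per pair of sentences, with a shorter last element when the sentence count is odd. The abbreviation match is case-sensitive, and text such as "e.g." or decimal numbers followed by a space would need extra rules.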