RegEx to match specific words unless it's the last word in a sentence (titleize)

Keywords: word, It, last, sentence, RegEx | Updated: 2023-11-26

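I'm uppercasing all words and then lowercasing words like a, of, and. The first and last words should remain uppercase. I've tried using \s instead of \b, and that caused some strange issues. I've also tried [^$], but that doesn't seem to mean "not end of string".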

function titleize($string){
  return ucfirst(
     preg_replace("/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)\b/uie",
     "strtolower('$1')", 
     ucwords($string))
  );
}

This is the only failing test I'm trying to fix. The "in" at the end should remain capitalized.

titleize("gotta give up, gotta give in");
//Gotta Give Up, Gotta Give In

These tests pass:

titleize('if i told you this was killing me, would you stop?');
//If I Told You This Was Killing Me, Would You Stop?
titleize("we're at the top of the world (to the simple two)");
//We're at the Top of the World (to the Simple Two)
titleize("and keep reaching for those stars");
//And Keep Reaching for Those Stars
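
A note on why [^$] does not behave like "not end of string": inside square brackets, $ loses its anchor meaning, so [^$] is a character class that matches, and consumes, any one character other than a literal dollar sign. The short sketch below illustrates this with made-up sample strings; it is an illustration only, not part of the original question.

```php
<?php
// [^$] must consume one real character, so the space after "In" is swallowed.
$consumed = preg_replace('/In[^$]/', 'in', 'Give In To');
echo $consumed, "\n"; // Give inTo

// At the very end there is no character left to consume, so nothing matches...
$atEnd = preg_match('/In[^$]/', 'Give In');   // 0
// ...and a literal "$" after the word is rejected as well.
$dollar = preg_match('/In[^$]/', 'Give In$'); // 0
var_dump($atEnd, $dollar);
```

A zero-width assertion such as a lookahead checks a position without consuming anything, which is why it is the better fit for an "unless at the end" condition.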

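You apply ucwords() before sending the string to the regex replace, and then ucfirst() again after it returns from the regex (for listed words that appear at the start of the string). This can be shortened by relying on the convention that the words at the very beginning and the very end of the string are not surrounded by whitespace on both sides. Using this convention, we can use a regex like '/(?<=\s)( ... )(?=\s)/'. This simplifies your function somewhat: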

function titleize2($str) {
 $NoUc = Array('A','Of','An','At','The','With','In','To','And','But','Is','For');
 $reg = '/(?<=\s)('      # set lowercase only if surrounded by whitespace
      . join('|', $NoUc) # add OR'ed list of words
      . ')(?=\s)/e';     # regex-eval mode (PHP 5 only; /e was removed in PHP 7)
 return preg_replace( $reg, 'strtolower("\\1")', ucwords($str) );
}

If this is tested with:

...
$Strings = Array('gotta give up, gotta give in',
                 'if i told you this was killing me, would you stop?',
                 'we\'re at the top of the world (to the simple two)',
                 'and keep reaching for those stars');
foreach ($Strings as $s)
   print titleize2($s) . "\n";
...

it returns the correct results.
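
The /e modifier was deprecated in PHP 5.5 and removed in PHP 7, so the function above only runs on PHP 5. A minimal sketch of the same whitespace-lookaround idea using preg_replace_callback(), the documented replacement for /e, might look like this. The name titleize2_cb is made up for illustration.

```php
<?php
// Sketch only: titleize2_cb is a hypothetical name, not from the original answer.
function titleize2_cb(string $str): string {
    $noUc = ['A','Of','An','At','The','With','In','To','And','But','Is','For'];
    // Lowercase a listed word only when whitespace sits on both sides of it,
    // which leaves the first and last words of the string untouched.
    $reg = '/(?<=\s)(' . implode('|', $noUc) . ')(?=\s)/';
    return preg_replace_callback(
        $reg,
        function (array $m): string { return strtolower($m[1]); },
        ucwords($str)
    );
}

echo titleize2_cb('gotta give up, gotta give in'), "\n";
// Gotta Give Up, Gotta Give In
```

Because the lookarounds require whitespace on both sides, a listed word that is directly followed by punctuation (for example "The,") stays capitalized; whether that is acceptable depends on the titles being processed.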

Try this regex:

/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)(?!$)\b/uie

The negative lookahead (?!$) excludes matches that are followed by the end of the line.
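
To see what the lookahead does in isolation, the sketch below wraps each match in brackets instead of lowercasing it. The pattern is shortened to two words and the sample strings are invented for illustration. It also shows two edge cases worth knowing: without the D modifier, $ also matches just before a final newline, and a last word followed by punctuation is not at the end of the string, so it is still matched.

```php
<?php
// Shortened, illustrative version of the pattern from the answer above.
$pattern = '/\b(In|To)(?!$)\b/u';

// Only the final "In" sits at the end of the string, so it is left alone.
$plain = preg_replace($pattern, '[$1]', 'Give In To In');
echo $plain, "\n"; // Give [In] [To] In

// "$" also matches before a trailing newline, so this "In" is skipped too.
$newline = preg_replace($pattern, '[$1]', "Give In\n");
echo $newline;     // Give In

// Trailing punctuation means the word is no longer at the end: it matches.
$punct = preg_replace($pattern, '[$1]', 'Give In.');
echo $punct, "\n"; // Give [In].
```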

Adding a negative lookahead for the end of the line, (?!$), should do it:

function titleize($string){
  return ucfirst(
     preg_replace("/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)\b(?!$)/uie",
     "strtolower('$1')", 
     ucwords(inflector::humanize($string)))
  );
}