Regex to match from first uppercase to end of sentence of string to highlight array of words

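I know the title is hard to understand. Basically, I have a text of roughly 20,000 characters.

When I run a search, I want to extract the sentences in which any of the search words is found and highlight those words.

I have an array of words to highlight, called $words, and the main text is called $text. My code is as follows: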

foreach($words as $word):
    $regex = '/[^.!?\n]*\b'.preg_quote($word,"/").'\b[^.!?\n]*/i';
    preg_match_all($regex, $text, $matches);
    // show at most 3 matching sentences per word
    $search_q = min(3, count($matches[0]));
    for ($i=0; $i < $search_q; $i++):
        echo preg_replace('/\b('.preg_quote($word,"/").')\b/i','<span class="highlighted">$1</span>',$matches[0][$i]).'[..]  ';
    endfor;
endforeach;

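The problem with this code is that when two words belong to the same sentence, that sentence is printed twice. I want it printed once, with both words highlighted, but I don't know how to do that.

Thanks, everyone, for your help.

Update: test scenario

Let's assume: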

$text="A new holiday shopping tradition: Smartphones and social networks
Many consumers will take out their phones before their wallets this holiday season with even more visiting social media sites before tackling their gift lists.
More than one-quarter (27 percent) of smartphone owners plan to use their devices for holiday shopping to search for store locations (67 percent), compare prices (59 percent) and check product availability (46 percent).  Additionally, 44 percent say they plan to use social media to seek discounts, read reviews and check family and friends’ gift lists.
“Consumers are using online and mobile platforms to make the most of their holiday budgets, and the survey indicates that they will do more than just compare prices,” said Paul.  “Retailers that use mobile and online channels to show product availability, locations and pricing but add customized promotions and gift ideas may encourage shoppers to come in the door for a specific gift and take additional items to the register.”";

And the words are:

$words=array('social','media');

With my code I get this:

A new holiday shopping tradition: Smartphones and **social** networks[..]
Many consumers will take out their phones before their wallets this holiday season with even more visiting **social** media sites before tackling their gift lists[..]
Additionally, 44 percent say they plan to use **social** media to seek discounts, read reviews and check family and friends’ gift lists[..]
Many consumers will take out their phones before their wallets this holiday season with even more visiting social **media** sites before tackling their gift lists[..]
Additionally, 44 percent say they plan to use social **media** to seek discounts, read reviews and check family and friends’ gift lists[..]

What I want instead is:

A new holiday shopping tradition: Smartphones and **social** networks[..]
Many consumers will take out their phones before their wallets this holiday season with even more visiting **social** **media** sites before tackling their gift lists[..]
Additionally, 44 percent say they plan to use **social** **media** to seek discounts, read reviews and check family and friends’ gift lists[..]

With fge's code I get:

social[..] 
social[..] 
social[..] 
media[..] 
media[..] 

I hope the example makes this easy to understand. Thanks, everyone.

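You may have fewer headaches if you split the text into an array of sentences and check each sentence in turn. If the word list isn't too long, you can put the whole list into a single regex, for example: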

/\b(\Qword1\E|\Qword2\E|\Qword3\E)\b/
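A minimal sketch of that idea, in case it helps. The function name highlightSentences, the $limit parameter and the sentence-splitting rule (split after ., ! or ? followed by whitespace, and on line breaks) are assumptions made for illustration, not something taken from the question; the splitting rule in particular will mis-split abbreviations such as "e.g. this".

```php
<?php
// Hypothetical helper: returns up to $limit sentences that contain any of
// $words, with every matching word wrapped in a highlight span.
function highlightSentences(string $text, array $words, int $limit = 3): array
{
    // One alternation for the whole list, e.g. /\b(social|media)\b/i.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    // Assumed sentence boundary: whitespace after . ! ?, or any line break.
    $sentences = preg_split('/(?<=[.!?])\s+|\R+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $result = [];
    foreach ($sentences as $sentence) {
        if (count($result) >= $limit) {
            break;
        }
        // $count receives the number of words highlighted in this sentence.
        $highlighted = preg_replace(
            $pattern,
            '<span class="highlighted">$1</span>',
            $sentence,
            -1,
            $count
        );
        if ($count > 0) {
            $result[] = $highlighted;
        }
    }
    return $result;
}
```

Because each sentence is visited once and all words are replaced in a single pass, a sentence containing both "social" and "media" ends up in the result once, with both words highlighted. Printing is then just a loop such as foreach (highlightSentences($text, $words) as $s) { echo $s . '[..]  '; }.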

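First, I don't understand why you use such a complicated regex: you already use word anchors, so why bother with the complemented character classes?

Second, this solution assumes that the words do not contain special regex characters…

You can do this: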

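// preg_quote() alone makes the word literal; wrapping its output in \Q...\E
// as well would double-escape it (a quoted "\." would then match a backslash)
$fullword = '\b' . preg_quote($word, '/') . '\b';
$regex = '/' . $fullword . '(?!.*' . $fullword . ')/i';

Explanation: preg_quote() makes every character of the word literal, which means you are safe if the word contains a dot. \Q...\E is the in-pattern equivalent (everything from \Q up to \E is treated literally), but the two should not be combined. So you match your word (it is anchored with \b), and then you say that the word must not match again afterwards: (?!.*\bwordhere\b)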

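This means that if a sentence contains a word several times, only the last occurrence of that word is matched! (Strictly speaking, the last occurrence on the line, because . does not match a newline by default.)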
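To see the "last occurrence only" behaviour concretely, here is a small sketch. The sample line and the variable names are made up for illustration, and the word is quoted with preg_quote() alone.

```php
<?php
// Illustrative input: "media" appears twice on the same line.
$word = 'media';
$line = 'social media and more media sites';

// Word with anchors, followed by a negative lookahead that rejects any
// match that still has another occurrence of the word later on the line.
$fullword = '\b' . preg_quote($word, '/') . '\b';
$regex = '/' . $fullword . '(?!.*' . $fullword . ')/i';

preg_match_all($regex, $line, $matches, PREG_OFFSET_CAPTURE);

// Only the second "media" survives the lookahead.
echo count($matches[0]), "\n";   // 1
echo $matches[0][0][1], "\n";    // 22 (byte offset of the last "media")
```

This is why the pattern is useful for locating a word once per line, but not for highlighting every occurrence; highlighting needs the plain anchored word without the lookahead.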

Finally, use:

// PHP has no "g" modifier; preg_replace() already replaces every match
preg_replace('/(' . $fullword . ')/i', '<span class="highlighted">$1</span>', $text);