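I have two strings. One of them contains <em> tags, is entirely lowercase, and contains no delimiters or common words such as "the", "in", etc., while the other one does not. An example: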
$str1 = 'world <em>round</em>';
$str2 = 'World - is Round';
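I want to turn $str2 into 'World - is <em>Round</em>' by checking which lowercase words in $str1 carry the <em> tag. So far I have done the following, but it fails if the number of words is not equal in both strings.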
public static function applyHighlighingOnDisplayName($str1, $str2) {
    $str1_w = explode(' ', $str1);
    $str2_w = explode(' ', $str2);
    for ($i=0; $i<count($str1_w); $i++) {
        if (strpos($str1_w[$i], '<em>') !== false) {
            $str2_w[$i] = '<em>' . $str2_w[$i] . '</em>';
        }
    }
    return implode(' ', $str2_w);
}
$str1 = '<em>cup</em> <em>cakes</em>' & $str2 = 'Cup Cakes':
applyHighlighingOnDisplayName($str1, $str2) : '<em>Cup</em> <em>Cakes</em>': Correct
$str1 = 'cup <em>cakes</em>' & $str2 = 'The Cup Cakes':
applyHighlighingOnDisplayName($str1, $str2) : 'The <em>Cup</em> Cakes': Incorrect
How should I change my approach?
As others have said, regex is the way to go. Here is a working example with detailed comments:
$string1 = 'world <em>round</em>';
$string2 = 'World is - Round';
// extract what's in between <em> and </em> - it will be stored in $matches[1]
preg_match('/<em>(.+)<\/em>/i', $string1, $matches);
if (!$matches) {
    echo 'The first string does not contain <em>';
    exit();
}
// replace what we found in the previous operation
$newString = preg_replace('/\b' . preg_quote($matches[1], '/') . '\b/i', '<em>$0</em>', $string2);
echo $newString;
More information:
- http://php.net/manual/en/function.preg-replace.php
- http://php.net/manual/en/function.preg-match.php
Later edit, covering multiple matches:
$string1 = 'world <em>round</em> not <em>flat</em>';
$string2 = 'World is - Round not Flat! Round, ok?';
// extract what's in between <em> and </em> - it will be stored in $matches[1]
preg_match_all('/<em>(.+?)<\/em>/i', $string1, $matches);
// preg_match_all always fills $matches, so check the captured group instead
if (empty($matches[1])) {
    echo 'The first string does not contain <em>';
    exit();
}
foreach ($matches[1] as $match) {
    // replace each emphasised word found in the previous operation
    $string2 = preg_replace('/\b' . preg_quote($match, '/') . '\b/i', '<em>$0</em>', $string2);
}
echo $string2;
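Your current method depends on the number of words in the strings; a better solution is to let regular expressions do the matching for you. The version below works safely even when some emphasised words are substrings of other emphasised words (e.g. "cat" and "cat's cradle" or "cat litter").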
function applyHighlighingOnDisplayName($str1, $str2) {
    # if we have strings surrounded by <em> tags...
    if (preg_match_all("#<em>(.+?)</em>#", $str1, $match)) {
        # sort the match strings by length, descending
        usort($match[1], function($a,$b){ return strlen($b) - strlen($a); } );
        # all the match words are in $match[1]
        foreach ($match[1] as $m) {
            # replace every match with a string that is very unlikely to occur
            # this prevents \b matching the start or end of <em> and </em>
            $str2 = preg_replace('#\b(' . preg_quote($m, '#') . ')\b#i',
                'ZZZZ$1ZZZZ',
                $str2);
        }
        # replace ZZZZ with the <em> tags
        return preg_replace("#ZZZZ(.*?)ZZZZ#", "<em>$1</em>", $str2);
    }
    return $str2;
}
$str1 = 'cup <em>cakes</em>';
$str2 = 'Cup Cakes';
print applyHighlighingOnDisplayName($str1, $str2) . PHP_EOL;
$str2 = 'The Cup Cakes';
print applyHighlighingOnDisplayName($str1, $str2) . PHP_EOL;
Output:
Cup <em>Cakes</em>
The Cup <em>Cakes</em>
Two strings with no <em>'d words:
$str1 = 'cup cakes';
$str2 = 'Cup Cakes';
print applyHighlighingOnDisplayName($str1, $str2) . PHP_EOL;
Output:
Cup Cakes
Now for something trickier: lots of short words, one of which is a substring of all the others:
$str1 = '<em>i</em> <em>if</em> <em>in</em> <em>i\'ve</em> <em>is</em> <em>it</em>';
$str2 = 'I want to make the str2 as "World - is Round", by comparing which lowercase word in the str1 contains the em tag. So far, I\'ve done the following, but it fails if number of words aren\'t equal in both strings.';
Output:
<em>I</em> want to make the str2 as "World - <em>is</em> Round", by comparing which lowercase word <em>in</em> the str1 contains the em tag. So far, <em>I've</em> done the following, but <em>it</em> fails <em>if</em> number of words aren't equal <em>in</em> both strings.
That's because your highlighting code expects a 1:1 correspondence between word positions in the two strings:
cup   <em>cakes</em>
1     2
Cup   Cakes
But on your incorrect sample:
cup   <em>cakes</em>
1     2               3
The   Cup             Cakes
In other words, you found the <em> at word #2, so you highlight word #2 in the other string, but in that string word #2 is Cup.
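A better algorithm would be to strip the HTML from the original string, so that you end up with just cup cakes. Then look for cup cakes in the other string and highlight the second word at that location. This compensates for any "shift" caused by extra (or fewer) words in the string.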
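The strip-and-locate idea described above could be sketched roughly as follows. This is only an illustration, not a tested drop-in replacement: the function name highlightByPhrase is hypothetical, each <em> is assumed to wrap exactly one space-separated word, and the stripped phrase is assumed to appear contiguously in the second string (so it will not handle 'World - is Round', where extra words sit between the matches).

```php
<?php
// Hypothetical helper: strip the markup from $str1, find the plain phrase
// in $str2, then wrap the words at the emphasised positions.
function highlightByPhrase($str1, $str2)
{
    // Remember which word positions in $str1 carry an <em> tag.
    // Assumption: every <em>...</em> wraps a single word.
    $flags = array();
    foreach (explode(' ', $str1) as $word) {
        $flags[] = strpos($word, '<em>') !== false;
    }

    // Plain phrase without markup, e.g. "cup cakes".
    $plain = strip_tags($str1);
    if ($plain === '') {
        return $str2;
    }

    // Locate the phrase in $str2, ignoring case.
    $pos = stripos($str2, $plain);
    if ($pos === false) {
        return $str2; // phrase not found contiguously: leave unchanged
    }

    // Split the matched region into words and wrap the flagged ones.
    $found = explode(' ', substr($str2, $pos, strlen($plain)));
    foreach ($found as $i => $word) {
        if (!empty($flags[$i])) {
            $found[$i] = '<em>' . $word . '</em>';
        }
    }

    // Reassemble: text before the phrase, highlighted phrase, text after.
    return substr($str2, 0, $pos)
        . implode(' ', $found)
        . substr($str2, $pos + strlen($plain));
}

echo highlightByPhrase('cup <em>cakes</em>', 'The Cup Cakes') . PHP_EOL;
// prints: The Cup <em>Cakes</em>
echo highlightByPhrase('<em>cup</em> <em>cakes</em>', 'Cup Cakes') . PHP_EOL;
// prints: <em>Cup</em> <em>Cakes</em>
```

Because stripos does a plain substring search, this sketch can also match inside longer words (for example "cup cakes" inside "teacup cakes"); the regex-based answers above avoid that by using \b word boundaries.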