Heyho,
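I want to replace some words with links, but only inside the first 3 p tags ($limit_p = 3), and only the first occurrence within each p tag. The word list and the link list are in separate arrays. I have a preg_replace_callback function that does the replacement. It works, but there are problems when one word is part of another, and it replaces the word every time: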
$text = "<p>Lorem ipsum Hello World lorem ipsum.</p><p>Hello you</p>";
$arr1 = array('/ Hello World /','/ Hello /');
$arr2 = array(' <a href="link2">Hello World</a> ',' <a href="link1">Hello</a> ');
$limit_p = 3;
$limit_tag = 1;
$res = preg_replace_callback(
'/(<p[^>]*>)(.+?)(<\/p>)/Ui',
function ($m) use (&$arr1, &$arr2, &$limit_tag) {
list (, $s, $t, $e) = $m;
$t = preg_replace($arr1, $arr2, $t, $limit_tag);
//$t = str_replace($find, $repl, $t);
return "$s$t$e";
},
$text, $limit_p
);
What I get is:
<p>Lorem ipsum <a href="link2"><a href="link1">Hello</a> World</a> lorem ipsum.</p><p><a href="link1">Hello</a> you</p>
What I want is:
<p>Lorem ipsum <a href="link2">Hello World</a> lorem ipsum.</p><p><a href="link1">Hello</a> you</p>
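So I only want to replace a word if it is not already inside an a tag. Also, if the same word appears in 2 p tags, it gets replaced both times, which I don't want. Only the first match should be replaced.
Can you help me?
Thanks a lot!
With Niet's help, I now have this solution: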
$dom = new DOMDocument();
// loadXml needs properly formatted documents, so it's better to use loadHtml, but it needs a hack to properly handle UTF-8 encoding
$previous_value = libxml_use_internal_errors(TRUE);
$dom->loadHtml(mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8"));
libxml_clear_errors();
libxml_use_internal_errors($previous_value);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//text()[not(ancestor::a) and (ancestor::p) and not(ancestor::strong)]') as $node)
{
$replaced = preg_replace_callback(
'/\b(?:('.implode(')|(',$arr1).'))\b/',
function($m) use (&$arr1,&$arr2) {
// find which pattern matched
array_shift($m);
$result = array_filter($m);
$keys = array_keys($result);
$matched = $keys[0];
// apply match and remove from search list
$result = @$arr2[$matched];
unset($arr1[$matched], $arr2[$matched]);
return $result;
},
$node->wholeText, -1
);
//$replaced = str_ireplace('match this text', 'MATCH', $node->wholeText);
$newNode = $dom->createDocumentFragment();
if($replaced && $replaced != "")
$newNode->appendXML($replaced);
$node->parentNode->replaceChild($newNode, $node);
}
// get only the body tag with its contents, then trim the body tag itself to get only the original content
return mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");
It works, but the HTML has to be valid and it sometimes breaks (I'm fairly sure mine is valid); I get this error: Warning: DOMDocumentFragment::appendXML(): Entity: line 1: parser error : xmlParseEntityRef: no name in [...]
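One possible cause, offered as an assumption about the input rather than a diagnosis: libxml reports "xmlParseEntityRef: no name" when the string passed to appendXML() contains a bare "&". wholeText returns the decoded text, so an "&amp;" in the source HTML arrives as "&". The sketch below re-escapes the text before adding the link markup; the sample paragraph and the single hard-coded pattern are made up for illustration.

```php
<?php
// Hypothetical sample input: a paragraph whose text contains an ampersand.
$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML('<p>Tom &amp; Jerry say Hello</p>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$node = $xpath->query('//p/text()')->item(0);

// wholeText is decoded ("Tom & Jerry say Hello"), so escape &, < and >
// again before the string is parsed as XML by appendXML().
$escaped = htmlspecialchars($node->wholeText, ENT_NOQUOTES | ENT_XML1, 'UTF-8');

// Add the link markup only after escaping, so the <a> tag itself stays intact.
$replaced = preg_replace('/\bHello\b/', '<a href="link1">Hello</a>', $escaped, 1);

$fragment = $dom->createDocumentFragment();
$fragment->appendXML($replaced);
$node->parentNode->replaceChild($fragment, $node);

// Serialize just the paragraph to inspect the result.
$html = $dom->saveHTML($dom->getElementsByTagName('p')->item(0));
echo $html, "\n";
```

Note that a search phrase that itself contains "&", "<" or ">" would have to be escaped the same way before matching against the escaped text.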
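The problem is that you are applying the replacements sequentially, so a later pattern can match inside the text an earlier replacement has just inserted.
Instead, try applying them all in a single pass.
With $arr1 like this: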
$arr1 = array("Hello World","Hello");
In your innermost code:
$t = preg_replace_callback(
'/\b(?:('.implode(')|(',$arr1).'))\b/',
function($m) use (&$arr1,&$arr2) {
// find which pattern matched
array_shift($m);
$result = array_filter($m);
$keys = array_keys($result);
$matched = $keys[0];
// apply match and remove from search list
$result = $arr2[$matched];
unset($arr1[$matched], $arr2[$matched]);
return $result;
},
$t
);
Assuming I haven't messed it up, this should work nicely.
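For reference, here is a self-contained sketch of the same single-pass idea, combined with the paragraph limit from the question. It is not the code above: it looks replacements up by the matched phrase rather than by capture-group position, since positional lookup depends on $arr1 keeping contiguous keys after unset(). The function name link_first_occurrences and the phrase => link map are hypothetical, introduced only for this example.

```php
<?php
// Hypothetical helper: link each phrase once, inside the first $limitP <p> blocks.
function link_first_occurrences(string $text, array $map, int $limitP = 3): string
{
    // Longer phrases first, so "Hello World" is tried before "Hello".
    uksort($map, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    return preg_replace_callback(
        '/(<p[^>]*>)(.+?)(<\/p>)/is',
        function ($m) use (&$map) {
            if (!$map) {
                return $m[0]; // every phrase has already been linked
            }
            // preg_quote() keeps regex metacharacters in the phrases literal.
            $alternatives = array_map(function ($phrase) {
                return preg_quote($phrase, '/');
            }, array_keys($map));

            $inner = preg_replace_callback(
                '/\b(?:' . implode('|', $alternatives) . ')\b/',
                function ($hit) use (&$map) {
                    $phrase = $hit[0];
                    if (!isset($map[$phrase])) {
                        return $phrase; // linked earlier in this paragraph
                    }
                    $link = $map[$phrase];
                    unset($map[$phrase]); // first occurrence only
                    return $link;
                },
                $m[2]
            );
            return $m[1] . $inner . $m[3];
        },
        $text,
        $limitP
    );
}

$text = "<p>Lorem ipsum Hello World lorem ipsum.</p><p>Hello you</p>";
$map = array(
    'Hello World' => '<a href="link2">Hello World</a>',
    'Hello'       => '<a href="link1">Hello</a>',
);
echo link_first_occurrences($text, $map), "\n";
```

Because this is still regex-based, it does not skip text that already sits inside an a tag or inside an attribute value; the DOM/XPath approach handles that case.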
How about:
$text = "<p>Lorem ipsum Hello World lorem ipsum.</p><p>Hello you</p>";
$arr1 = array('Hello World','Hello');
$arr2 = array('<a href="link2">Hello World</a>','<a href="link1">Hello</a>');
print strtr($text, array_combine($arr1, $arr2));
// <p>Lorem ipsum <a href="link2">Hello World</a> lorem ipsum.</p><p><a href="link1">Hello</a> you</p>
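A small sketch of what strtr() does and does not give you here, using a made-up input with repeated words. With an array argument, strtr() tries the longest keys first and never rescans text it has already replaced, which is why the links do not nest. It does, however, replace every occurrence, so it honors neither the "first occurrence only" rule nor the 3-paragraph limit from the question.

```php
<?php
// Hypothetical input in which "Hello" appears several times.
$text = "<p>Hello World and Hello</p><p>Hello again</p>";
$map = array(
    'Hello World' => '<a href="link2">Hello World</a>',
    'Hello'       => '<a href="link1">Hello</a>',
);

// Longest key wins at each position; replaced text is not searched again.
$out = strtr($text, $map);
echo $out, "\n";

// Count how often each link was inserted.
echo substr_count($out, 'href="link1"'), " x link1, ";
echo substr_count($out, 'href="link2"'), " x link2\n";
```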