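In some of our articles, images have links mistakenly hard-coded inside the title/alt attributes of the image tag, which breaks the display of the image. For example: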
<img src="/imgs/my-image.jpg" title="This is a picture of a <a href="/blob.html">blob</a>." />
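I have tried using preg_replace_callback, but the repeated quotes inside the link make it hard to match the complete title.
I would like to be able to do this programmatically for any string so the output is always correct. Any ideas?
You can try this pattern: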
$pattern = <<<'EOD'
~
(?:
    \G(?!\A)                        # second entry point
    (?:                             # content up to the next alt/title attribute (optional)
        [^><"]* "                   # end of the previous attribute
        (?> [^><"]* " [^"]* " )*?   # other attributes (optional)
        [^><"]*                     # spaces or attributes without values (optional)
        \b(?:alt|title)\s*=\s*"     # the next alt/title attribute
    )?+                             # make the whole group optional
  |
    <img\s[^>]*?                    # first entry point
    \b(?:alt|title)\s*=\s*"
)
[^<"]*+\K
(?:                                 # two possibilities:
    </?a[^>]*>                      # an "a" tag (opening or closing)
  |                                 # OR
    (?=")                           # followed by the closing quote
)
~x
EOD;
$result = preg_replace($pattern, '', $html);
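This pattern relies on the contiguity of successive matches provided by the \G anchor: each new match must start exactly where the previous one ended.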
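To see the \G technique in isolation, here is a minimal sketch on a much simpler problem: replacing commas with semicolons, but only inside a [...] group. The function name replaceCommasInBrackets and the sample string are made up for illustration and are not part of the answer above. The structure mirrors the larger pattern: one entry point that opens the region, a \G(?!\A) branch that continues from the previous match, and \K to keep the skipped text out of the replacement.

```php
<?php
// Hypothetical helper for illustration only (not from the original answer):
// replaces commas with semicolons, but only inside a [...] group.
function replaceCommasInBrackets(string $text): string
{
    $pattern = <<<'EOD'
~
(?:
    \G(?!\A)   # continue exactly where the previous match ended
  |
    \[         # or enter at an opening bracket
)
[^\],]*+\K     # skip non-comma content, then drop it from the match
,              # the comma that gets replaced
~x
EOD;
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, ';', $text) ?? $text;
}

echo replaceCommasInBrackets('a,b [c,d,e] f,g'), "\n";
// prints: a,b [c;d;e] f,g
```

Once the closing bracket is reached, the \G branch fails, the chain of contiguous matches is broken, and later commas are left alone unless another opening bracket starts a new chain.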