PHP regex to clean a specific string from URLs only


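Is there a regex ninja out there who can come up with a PHP solution that strips the tags from any http URL but leaves them intact in the rest of the text?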

For example:

the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com

should become:

the word <cite>printing</cite> is in http://www.thisisprinting.com

This is what I would do:

<?php
// Callback wrapper for strip_tags(): $matches[0] is the whole matched URL
function strip($matches){
    return strip_tags($matches[0]);
}
// The input string
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
// Match each URL (from "://" up to the next whitespace) and run the strip callback on it
$str = preg_replace_callback('/:\/\/[^\s]*/', 'strip', $str);
// Prove that it works
var_dump(htmlentities($str));

http://codepad.viper-7.com/XiPcs9
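A quick way to see why the tag in the prose survives is to list what the pattern actually hands to the callback. This sketch reuses the question's example string: the match starts at "://" (so "http" is left outside it, which is harmless) and stops at the first whitespace. Note that strip_tags() removes every tag inside the match, not only <cite>.

```php
<?php
// The question's example string and the pattern from the answer above.
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
preg_match_all('/:\/\/[^\s]*/', $str, $matches);

// Only the URL tail is matched; the <cite> in "the word ..." is never passed to strip_tags().
print_r($matches[0]);
```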

A suitable regex for this replacement might be:

#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s

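The s modifier makes the dot match newline characters as well.
Lazy quantifiers (.*?) between the tags keep the match from running past the first closing tag and swallowing other, similar tags.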
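To illustrate the point about lazy quantifiers, here is a small comparison on a hypothetical input in which the URL is followed by a second <cite> pair in plain text. The greedy version runs to the last tag pair in the string; the lazy version stops at the first one after the scheme.

```php
<?php
// Hypothetical input: a URL containing a <cite> pair, followed by another <cite> in prose.
$str = 'http://www.thisis<cite>printing</cite>.com and <cite>word</cite>';

// Greedy (.*) backtracks from the end, so it pairs with the LAST <cite>...</cite>.
$greedy = preg_replace('#(https?://)(.*)<cite>(.*)</cite>([^\s]*)#s', '$1$2$3$4', $str);

// Lazy (.*?) expands one character at a time, so it pairs with the FIRST <cite>...</cite>.
$lazy = preg_replace('#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s', '$1$2$3$4', $str);

echo $greedy, "\n"; // http://www.thisis<cite>printing</cite>.com and word
echo $lazy, "\n";   // http://www.thisisprinting.com and <cite>word</cite>
```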

Snippet:

<?php
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
// $1 = scheme, $2 = URL text before the tag, $3 = tag content, $4 = rest of the URL
$replaced = preg_replace('#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s', '$1$2$3$4', $str);
echo $replaced;
// Output: the word <cite>printing</cite> is in http://www.thisisprinting.com
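One caveat worth checking: because (.*?) with the s modifier can cross whitespace, a URL that contains no <cite> of its own can reach forward and pull in a tag from the prose after it. The sketch below shows that on a hypothetical input, together with a tightened variant (an assumption, not part of the answer) that forbids whitespace and "<" in the URL part. Like the original, the tightened pattern removes only one <cite> pair per URL.

```php
<?php
$pattern = '#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s';
// Hypothetical tightened variant: the URL part may not contain whitespace or '<',
// so a match can never leave the URL it started in.
$tight = '#(https?://[^\s<]*)<cite>([^\s<]*)</cite>#';

// Hypothetical input: the URL has no <cite>, but the prose after it does.
$str = 'see http://example.com for the word <cite>printing</cite>';

$loose  = preg_replace($pattern, '$1$2$3$4', $str);
$strict = preg_replace($tight, '$1$2', $str);

echo $loose, "\n";  // see http://example.com for the word printing
echo $strict, "\n"; // see http://example.com for the word <cite>printing</cite>

// The tightened pattern still handles the question's URL.
echo preg_replace($tight, '$1$2', 'http://www.thisis<cite>printing</cite>.com'), "\n"; // http://www.thisisprinting.com
```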


Assuming you can identify the URLs in the text, you can do:

$str = 'http://www.thisis<cite>printing</cite>.com';
$str = preg_replace('~</?cite>~i', "", $str);
echo $str;

Output:

http://www.thisisprinting.com
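One way to do the "identify the URLs" step is to combine the two ideas: find URL-like runs with preg_replace_callback() and apply the ~</?cite>~i removal only inside each run. This is a minimal sketch, assuming a URL is "http:// or https:// followed by non-whitespace"; the function name stripCiteFromUrls is hypothetical, and the arrow function needs PHP 7.4 or later. Unlike strip_tags(), it removes only <cite> tags, and it handles any number of them per URL.

```php
<?php
// Hypothetical helper: remove <cite>/</cite> only inside URL-like runs of text.
// Assumption: a URL is "http://" or "https://" followed by non-whitespace characters.
function stripCiteFromUrls(string $text): string
{
    return preg_replace_callback(
        '~https?://\S+~i',
        // $m[0] is one URL-like run; drop every opening or closing <cite> tag in it.
        fn(array $m): string => preg_replace('~</?cite>~i', '', $m[0]),
        $text
    ) ?? $text; // fall back to the input if the regex engine reports an error
}

echo stripCiteFromUrls('the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com'), "\n";
```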