PHP/Regex. How to replace all percent (%) chars in url after domain?

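I'm stuck trying to build a regular expression for preg_replace that replaces every percent (%) character with "_" in the part of a URL that comes after the domain (that is, in the path).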

Example:

This is c%ontent with 1 url her%e http://example.com/this%is%image.jpg
and 1 url here http://anotherexample.com/t%his%is%image2.jpg

Result:

This is c%ontent with 1 url her%e http://example.com/this_is_image.jpg
and 1 url here http://anotherexample.com/t_his_is_image2.jpg

My question is: how can I do this with preg_replace?

All I have so far is a regex that selects the domain inside an img tag:

/<img [^>]*src="([^"]+example\.com\/[^"]+)"[^>]*>/

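If your regex replacement needs some logic, I'd suggest using preg_replace_callback:

http://php.net/manual/en/function.preg-replace-callback.php

Keep in mind that replacing % in a URL can be dangerous, because the URL may contain valid percent-encoded characters, as in http://foo.bar/here%20/index.html, where %20 is an encoded space.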
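
To make that caveat concrete, here is a minimal sketch that leaves valid percent-escapes (a % followed by two hex digits) alone and only replaces stray % characters inside URLs. The helper name replaceStrayPercents and the deliberately simple URL pattern are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: replace only "%" characters that do not start a
// percent-escape ("%" followed by two hex digits), and only inside URLs.
function replaceStrayPercents(string $text): string
{
    // Deliberately simple URL pattern (an assumption): scheme, then non-whitespace.
    return preg_replace_callback('#\bhttps?://\S+#', function (array $m): string {
        // The negative lookahead skips escapes such as %20.
        return preg_replace('/%(?![0-9A-Fa-f]{2})/', '_', $m[0]);
    }, $text);
}

echo replaceStrayPercents('10% off http://foo.bar/here%20/my%image.jpg'), "\n";
// prints: 10% off http://foo.bar/here%20/my_image.jpg
```

Note that this heuristic cannot tell a stray % that happens to be followed by two hex digits (for example "%ad") from a real escape, so it is only a partial safeguard.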

Example:

$haystack = 'This is c%ontent with 1 url her%e http://example.com/this%is%image.jpg 
and 1 url here http://anotherexample.com/t%his%is%image2.jpg';
// please use your fav url regex here
$urlRegex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
$haystack = preg_replace_callback($urlRegex, function($url){
    return str_replace('%', '_', $url[0]);
}, $haystack);
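
Since the question asks for the replacement only after the domain, a variation on the same callback idea is sketched below: it captures the scheme and host in one group and the rest of the URL in another, and touches only the second. The helper name replacePercentInUrlPaths and the pattern are assumptions for illustration, and the "rest" group also covers any query string or fragment.

```php
<?php
// Hypothetical helper: replace "%" with "_" only in the part of each URL
// that follows the host, leaving the scheme and domain untouched.
function replacePercentInUrlPaths(string $text): string
{
    // Group 1: scheme and host; group 2: everything after the host up to whitespace.
    $pattern = '#\b(https?://[^/\s]+)(/\S*)#';
    return preg_replace_callback($pattern, function (array $m): string {
        return $m[1] . str_replace('%', '_', $m[2]);
    }, $text);
}

$haystack = "This is c%ontent with 1 url her%e http://example.com/this%is%image.jpg\n"
          . "and 1 url here http://anotherexample.com/t%his%is%image2.jpg";
echo replacePercentInUrlPaths($haystack), "\n";
```

Run on the question's sample text, this should leave "c%ontent" and "her%e" untouched and rewrite only the two image paths.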

You can match the URLs in the string with a simple regex:

// $subject is the string
preg_match_all('/http:\/\/[^\s]+/', $subject, $matches);

Then loop over the matches, replace % with _ in each URL, and substitute the result back into the original $subject:

// $matches[0] holds the full-pattern matches, i.e. the URLs themselves
foreach ($matches[0] as $match) {
    $replace = str_replace('%', '_', $match);
    $subject = str_replace($match, $replace, $subject);
}

How about using dirname(), basename() and str_replace(), like this:

$haystack = 'This is c%ontent with 1 url her%e http://example.com/this%is%image.jpg';
$result = dirname($haystack) . '/' . str_replace('%','_',basename($haystack));
echo $result;

Result:

This is c%ontent with 1 url her%e http://example.com/this_is_image.jpg

This should be considerably more efficient than using preg_replace() with a regex, since only plain string functions are involved.

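Update:

As ins0 pointed out, the answer above depends on the string containing only a single URL, located at the end. That's not very flexible. Here is another idea, based on what I posted above: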

$haystack = 'This is c%ontent with 1 url her%e http://example.com/this%is%image.jpg 
and 1 url here http://anotherexample.com/t%his%is%image2.jpg';
$parts = explode(' ',$haystack);
foreach ($parts as &$part) {
    if (strpos($part,'http://') !== false || strpos($part,'https://') !== false) {
        $part = dirname($part) . '/'. str_replace('%','_',basename($part));
    }
}
unset($part); // break the reference left over from the foreach
$haystack = implode(' ',$parts);
echo $haystack;

Result:

This is c%ontent with 1 url her%e http://example.com/this_is_image.jpg
and 1 url here http://anotherexample.com/t_his_is_image2.jpg
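
The explode(' ') approach only splits on single spaces, and dirname()/basename() assume each URL ends in a file name. As a rough sketch of the same token-by-token idea that copes with tabs and newlines and leaves the domain alone, the variant below splits on any whitespace while keeping the separators. The helper name replacePercentInUrlTokens and its patterns are assumptions for illustration only.

```php
<?php
// Hypothetical helper: split on any whitespace (keeping the separators),
// then rewrite only the tokens that start with an http(s) scheme and host.
function replacePercentInUrlTokens(string $text): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the whitespace runs so the text can be rebuilt exactly.
    $tokens = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($tokens as $i => $token) {
        // $m[0] is the "scheme://host" prefix, which is left untouched.
        if (preg_match('#^https?://[^/]+#', $token, $m)) {
            $rest = (string) substr($token, strlen($m[0]));
            $tokens[$i] = $m[0] . str_replace('%', '_', $rest);
        }
    }
    return implode('', $tokens);
}

echo replacePercentInUrlTokens("a%b http://example.com/this%is.jpg\n\tand http://x.org/t%his.jpg"), "\n";
```

Tokens that are not URLs, including the whitespace separators themselves, pass through unchanged.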

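This isn't as elegant as @ins0's answer, but I came up with another solution. I don't usually code in PHP, so this may not be optimal. If it can be improved, please leave a comment.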

$str3 = "This is c%ontent with 1 url her%e http://example.com/this%is%image.jpg and 1 url here http://anotherexample.com/t%his%is%image2.jpg ";
$regex = "(http\\S+(\\s|$))";
$unmatched = preg_split($regex, $str3);
preg_match_all($regex, $str3, $matches);
$substituted = (str_replace("%", "_", $matches[0]));
$result = "";
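foreach ($substituted as $key => $value) {
    $result .= $unmatched[$key];
    $result .= $value;
}
// preg_split() returns one more piece than there are matches: append the trailing text
$result .= $unmatched[count($substituted)];
print $result; # for testing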
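
One possible simplification of this split-and-rejoin idea, sketched under the assumption that a URL is "http(s)://" followed by non-whitespace: a single preg_split() call with PREG_SPLIT_DELIM_CAPTURE returns the plain text and the URLs interleaved, so the separate preg_match_all() is not needed. The helper name replacePercentBySplitting is hypothetical.

```php
<?php
// Hypothetical helper: split the text on URLs while capturing them, then
// replace "%" only in the captured URL pieces and glue everything back together.
function replacePercentBySplitting(string $text): string
{
    // The whole URL pattern is one capture group, so the result alternates
    // text, URL, text, URL, ... and the odd indexes hold the URLs.
    $pieces = preg_split('#(https?://\S+)#', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 1) {
            $pieces[$i] = str_replace('%', '_', $piece);
        }
    }
    return implode('', $pieces);
}

echo replacePercentBySplitting('c%x http://example.com/a%b.jpg tail%'), "\n";
// prints: c%x http://example.com/a_b.jpg tail%
```

Because the surrounding text pieces are kept as they are, text after the last URL is preserved without special handling.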