preg_match to match src=, background= and url(..)

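I'm looking for a regular expression that (in a given HTML string) finds images in the following forms:

  • caught in: src=""
  • caught in: src=''
  • caught in: background=""
  • caught in: background=''
  • caught in: url("")
  • caught in: url('')
  • caught in: url()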

So far I have come up with:

preg_match_all("/src=((\"|'|)?(.*\.(png|gif|jpg))(\"|'|))/Ui", $strHTML, $arrMatches);
preg_match_all("/background=((\"|'|)?(.*\.(png|gif|jpg))(\"|'|))/Ui", $strHTML, $arrMatches);
preg_match_all("/url\((\"|'|)?((.*\.(png|gif|jpg))(\"|'|))\)/Ui", $strHTML, $arrMatches);

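However, these are incomplete because the match does not include the prefix (src/background/url). Also, security-wise, I think they could be improved further to prevent someone from slipping in something like src="http://somesite.com/someurl.exe?ext=jpg"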
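One way to address the someurl.exe?ext=jpg concern is to stop relying on the regex for the extension check and validate each extracted URL separately. The following is only a sketch: hasImageExtension() is a hypothetical helper name, and the allowed-extension list is an assumption taken from the patterns in this question. It looks at the path component only, so a query string cannot fake an image extension.

```php
<?php
// Hypothetical helper (not part of the original question): returns true only
// when the *path* of the URL ends in an allowed image extension.
function hasImageExtension(string $url, array $allowed = ['png', 'gif', 'jpg', 'jpeg']): bool
{
    // parse_url() returns null when there is no path and false on malformed input.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }
    // pathinfo() extracts the text after the last dot of the path's final segment.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}

var_dump(hasImageExtension('http://somesite.com/someurl.exe?ext=jpg')); // bool(false)
var_dump(hasImageExtension('images/test1.gif'));                        // bool(true)
```

Note that this only checks the name; it says nothing about what the server actually returns for that URL.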

Any help in the right direction is appreciated.

Edit:

I think I've got it, although the code can certainly be improved, and perhaps even combined and/or optimized :)

/* match CSS url() links */
preg_match_all("/(url\((\"|'|)(.*\.(png|gif|jpg|jpeg))(\"|'|)\))/Ui", $strHTML, $arrMatches);
print_r($arrMatches);
Array
(
    [0] => Array
        (
            [0] => url('test1.gif')
            [1] => url(test2.gif)
            [2] => url("test3.gif")
        )
    [1] => Array
        (
            [0] => url('test1.gif')
            [1] => url(test2.gif)
            [2] => url("test3.gif")
        )
    [2] => Array
        (
            [0] => '
            [1] => 
            [2] => "
        )
    [3] => Array
        (
            [0] => test1.gif
            [1] => test2.gif
            [2] => test3.gif
        )
    [4] => Array
        (
            [0] => gif
            [1] => gif
            [2] => gif
        )
    [5] => Array
        (
            [0] => '
            [1] => 
            [2] => "
        )
)
/* match img links */
preg_match_all("/(src=(\"|'|)(.*\.(png|gif|jpg|jpeg))(\"|'|))/Ui", $strHTML, $arrMatches);
/* match background links */
preg_match_all("/(background=(\"|'|)(.*\.(png|gif|jpg|jpeg))(\"|'|))/Ui", $strHTML, $arrMatches);
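As for combining the three calls, a single pattern with an alternation for the prefix and a backreference for the closing quote might look like the sketch below. This is an assumption-laden example, not a tested production pattern: the sample $strHTML is made up, and the pattern deliberately ignores whitespace around = or inside url( ). The file reference may not contain quotes, parentheses, angle brackets or whitespace, and the extension must be the last thing before the closing delimiter.

```php
<?php
// Made-up sample input, for illustration only.
$strHTML = <<<'HTML'
<img src="images/test1.png">
<td background='bg.jpg'></td>
<div style="background-image: url(test2.gif)"></div>
<div style='background-image: url("test3.jpeg")'></div>
<img src="http://somesite.com/someurl.exe?ext=jpg">
HTML;

// Group 1 ("quote") is the optional opening quote; \1 requires the same
// character (or nothing) to close. Group "image" is the file reference.
// The negative lookahead rejects names such as test.gif.exe.
$pattern = '~\b(?:(?:src|background)=|url\()(?<quote>["\']?)'
         . '(?<image>[^"\'()<>\s]+\.(?:png|gif|jpe?g))'
         . '(?![^"\'()<>\s])\1~i';

preg_match_all($pattern, $strHTML, $arrMatches);
$images = $arrMatches['image'];

echo implode(', ', $images), "\n";
// images/test1.png, bg.jpg, test2.gif, test3.jpeg
```

The someurl.exe?ext=jpg line is skipped because no allowed extension is preceded by a literal dot in that URL.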

If you are sure about those attribute names (src, url and background)...

$arr = array(
    'url("http://somesite.com/someurl.exe?src=jpg")',
    'url(http://somesite.com/someurl.exe?src=jpg)',
    'src="http://somesite.com/someurl.exe?src=jpg"',
    'src="http://somesite.com/someurl.exe?ext=jpg"',
    'background="http://somesite.com/someurl.exe?src=jpg"'
);
foreach ($arr as $str) {
    preg_match_all('/(?<=src=|background=|url\()(\'|")?(?<image>.*?)(?=\1|\))/i', $str, $matches);
    echo $str;
    foreach($matches['image'] as $img) {
        echo "\nimage: <b>$img</b>\n";
    }
    echo "\n";
}