Extract page content and image using regex


Keywords: image, regex, extraction | Updated: 2023-09-27

I get the page content with the following:

$data = file_get_contents($url);

Now I want to extract:

- the images, and
- the data part, leaving out the scripts and HTML code.

This is the regex I use for the image:

function get_logo($data)
{
    return preg_match("/<img(.*?)src=(\"|\')(.+?)(gif|jpg|png|bmp)(\"|\')(.*?)(\/)?>(<\/img>)?/", $html, $matches) ? $matches[1] : '';
}

It returns nothing.
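Two things in the function above likely explain the empty result. The pattern is run against $html, but the parameter is named $data, so preg_match() searches an undefined variable. And even on a match, $matches[1] is the text between <img and src=, not the image URL; with this pattern the URL is split across group 3 (everything up to the extension, including the dot) and group 4 (the extension). A minimal corrected sketch under those assumptions, with the pattern itself left as it was:

```php
<?php
// Corrected sketch of the question's get_logo(): search the $data parameter
// and rebuild the URL from capture groups 3 (path up to the dot) and 4 (extension).
function get_logo(string $data): string
{
    $pattern = '/<img(.*?)src=("|\')(.+?)(gif|jpg|png|bmp)("|\')(.*?)(\/)?>(<\/img>)?/';
    return preg_match($pattern, $data, $matches) ? $matches[3] . $matches[4] : '';
}

echo get_logo('<div><img class="logo" src="/images/logo.png" alt="x"></div>'), "\n";
// prints /images/logo.png
```

preg_match() stops at the first match, so this returns only the first image on the page; preg_match_all() is needed to collect all of them.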

Don't use regular expressions to parse HTML!

I suggest you use an HTML DOM parser such as PHP Simple HTML DOM Parser.
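PHP Simple HTML DOM Parser is a third-party library; PHP's bundled DOM extension (DOMDocument) can do the same job without an extra dependency. The sketch below swaps in DOMDocument and covers both parts of the question: it collects the src of every img element and returns the page text with script and style contents dropped. The function name extract_images_and_text is hypothetical.

```php
<?php
// Hypothetical helper built on PHP's bundled DOM extension (ext-dom).
// Returns ['images' => list of img src values, 'text' => visible body text].
function extract_images_and_text(string $html): array
{
    // loadHTML() rejects an empty string, so handle that case up front.
    if (trim($html) === '') {
        return ['images' => [], 'text' => ''];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // 1. Image URLs: the src attribute of every <img> that has one.
    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $images[] = $img->getAttribute('src');
        }
    }

    // 2. Text: drop <script> and <style> elements first. The node lists are
    //    live, so copy the nodes into an array before removing them.
    $toRemove = [];
    foreach (['script', 'style'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $toRemove[] = $node;
        }
    }
    foreach ($toRemove as $node) {
        $node->parentNode->removeChild($node);
    }

    // textContent concatenates all remaining text nodes under <body>;
    // collapse the whitespace runs left behind by the markup.
    $body = $doc->getElementsByTagName('body')->item(0);
    $text = $body !== null ? $body->textContent : '';
    $text = trim(preg_replace('/\s+/', ' ', $text));

    return ['images' => $images, 'text' => $text];
}

$sample = '<html><head><style>p { color: red; }</style></head><body>'
    . '<p>Hello <b>world</b>!</p><script>var x = 1;</script>'
    . '<img src="/a.png"><img alt="no src"></body></html>';
print_r(extract_images_and_text($sample));
```

Two caveats: textContent does not insert spaces between adjacent block elements, so <h1>A</h1><p>B</p> yields "AB"; and DOMDocument::loadHTML() treats input as ISO-8859-1 unless the markup declares its charset, so UTF-8 pages without such a declaration may need extra handling.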

The following regex extracts the image URLs from the variable $data:

preg_match_all('/<img[^>]+src=([\'"])([^"\']+)\1/i', $data, $matches);
var_dump($matches[2]);

The array $matches[2] will then hold every image link found in $data.
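To show what the backreference \1 does, here is that answer's pattern wrapped in a hypothetical helper, extract_image_urls(), and run on sample markup. The closing quote must be the same character as the opening one, so both double- and single-quoted src values are captured, while an unquoted src is not matched at all.

```php
<?php
// Hypothetical helper wrapping the answer's regex: returns the src value
// of every <img> whose src is quoted with ' or ".
function extract_image_urls(string $html): array
{
    // Group 1 captures the opening quote; \1 requires the same quote to close it.
    preg_match_all('/<img[^>]+src=([\'"])([^"\']+)\1/i', $html, $matches);
    return $matches[2];
}

$data = '<div><IMG alt="a" src="/a.png"><img src=\'b.gif\' /><img src=c.jpg></div>';
var_dump(extract_image_urls($data)); // the unquoted c.jpg is not included
```

Because of the /i flag, upper-case tags such as IMG are matched too.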

1) We can't see your HTML, so it's hard to understand what you need.

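2) The following call matches every img tag whose src ends in .gif, .jpg, .png or .bmp; $matches[1] then holds the image paths (name plus extension) and $matches[2] the extensions alone:

preg_match_all("/<img[^>]+src=[\"']([^\"']+\.(gif|jpg|png|bmp))[\"']/im", $html, $matches);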
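A sketch of point 2's approach on sample markup. It writes the quote class as ["'] (inside a character class, | is a literal character, so ['"|'] would also accept a pipe) and uses [^"']+ rather than .+ so that a match cannot run past the closing quote into a later tag on the same line. Pairing each file name with its extension through array_map is illustrative and not part of the original answer; the arrow function requires PHP 7.4 or later.

```php
<?php
// Sample markup (hypothetical) with two images on one line.
$html = '<p><img src="/img/logo.png" alt="Logo"> <img class="x" src=\'photo.JPG\'></p>';

// Group 1: full image path, group 2: extension (case-insensitive via /i).
preg_match_all("/<img[^>]+src=[\"']([^\"']+\.(gif|jpg|png|bmp))[\"']/im", $html, $matches);

// Pair each file name with its normalised extension.
$images = array_map(
    fn ($path, $ext) => ['file' => basename($path), 'ext' => strtolower($ext)],
    $matches[1],
    $matches[2]
);
print_r($images);
```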