Find a string that begins with "X" and ends with "Y" and replace the content in between

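I'm trying to process some HTML and replace the src of every img tag with base64. I've written the function below to convert an image and return it as base64. What I need help with is this: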

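I need to use str_replace, preg_replace, or some kind of regex to scan some HTML and replace every "src" with the base64 representation of the image. The HTML is stored in a variable, not as an actual HTML document. For example, if I have some HTML like: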

$htmlSample = "<div>Some text, yada yada and now an image <img src='image1.png' /></div>";

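I need to scan it and replace src='image1.png' with the base64 equivalent, for example src="data:image/png;base64,/9j/4WvuRXhpZgAASUkqAAgAAAAIAA8BAgASAAAAbgAABAgAK" (this is not actual base64, just some filler text). The function needs to be able to do this for multiple images in the HTML. If you could point me in the right direction, I'd appreciate it. Thanks!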

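function convertImage($file)
{
    // Open the image in binary mode; the body runs only if the file can be opened
    if ($fp = fopen($file, "rb"))
    {
        $picture = fread($fp, filesize($file));
        fclose($fp);
        $base64 = base64_encode($picture);
        // Note: this returns a whole <img> tag, not just the src value
        $tag = '<img src="data:image/png;base64,' . $base64 . '" />';
        return $tag;
    }
}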

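Take a look at a DOM manipulator like SimpleDOM. It lets you parse the HTML document in a more object-oriented way instead of using messy regexes, and the library is more likely to handle cases you might not think of.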
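
To illustrate the DOM-based idea without any third-party library, here is a minimal sketch that uses PHP's built-in DOMDocument instead of SimpleDOM. This is a swapped-in technique, not what the answer above recommends by name. The function name inlineImages and the $encode callback are hypothetical, made up for this example; the callback receives the original src and returns a data URI string, or null to leave the image alone.

```php
<?php
// Sketch only: rewrite every <img src> in an HTML fragment with PHP's DOM extension.
// inlineImages() and $encode are hypothetical names, not from the original thread.
function inlineImages(string $html, callable $encode): string
{
    // Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so there is one known container to serialize back out.
    // Caveat: loadHTML() may not treat the input as UTF-8; non-ASCII text needs
    // extra handling that is not shown here.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        if (!$img->hasAttribute('src')) {
            continue; // nothing to replace on this tag
        }
        $dataUri = $encode($img->getAttribute('src'));
        if (is_string($dataUri)) {
            $img->setAttribute('src', $dataUri);
        }
    }

    // The wrapper is the first <div> in document order; emit only its children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with a stub encoder that returns filler text instead of reading a file.
$sample = "Some text <img width='50' src='image1.png' /> and more <img src='image2.png' />";
echo inlineImages($sample, function (string $src) {
    return 'data:image/png;base64,AAAA';
});
```

Passing the encoder as a callback keeps the file reading separate from the HTML rewriting, so a function like convertImage can be plugged in as long as it returns only the src value.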

As Adam suggested, I was able to get this done using SimpleDOM (link: simplehtmldom.sourceforge.net).

require_once('simple_html_dom.php');

$html = "This is some test code <img width='50' src='img/paddock1.jpg' /> And this is some additional text and an image: <img src='img/paddock2.jpg' />";

// Use str_get_html() from simple_html_dom.php to make the HTML parsable
$doc = str_get_html($html);

// Find each image in the HTML and convert its src
foreach ($doc->find('img[src]') as $img)
{
    // Read the current src, then swap in the data URI returned by convertImage()
    $src = $img->src;
    $imageBase = convertImage($src);
    // convertImage() returns nothing when the file cannot be opened; keep the original src in that case
    if ($imageBase !== null)
    {
        $img->src = $imageBase;
    }
}

$html = (string) $doc;
echo $html;

function convertImage($file)
{
    // Open the file named by $src above; the body runs only if it can be opened
    if ($fp = fopen($file, "rb"))
    {
        $picture = fread($fp, filesize($file));
        fclose($fp);
        // Convert the image bytes to base64
        $base64 = base64_encode($picture);
        // Return the data URI prefix plus the base64 payload, to be assigned to the img src above.
        // Note: the MIME type is hardcoded to image/png even though the sample files are .jpg.
        return 'data:image/png;base64,' . $base64;
    }
    return null;
}
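
One thing worth noting about the code above: convertImage always labels the result image/png, while the sample files are .jpg. Browsers tend to cope with that, but a data URI is meant to carry the real MIME type. Below is a rough sketch of a variant that picks the type from the file extension. The name toDataUri and the extension map are assumptions made for illustration, not part of the original answer, and the map only covers a few common formats.

```php
<?php
// Hypothetical helper (not from the original answer): build a data URI whose
// MIME type follows the file extension instead of always claiming image/png.
function toDataUri(string $file): ?string
{
    // Small extension-to-MIME map; extend it for other formats as needed.
    $types = [
        'png'  => 'image/png',
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'gif'  => 'image/gif',
        'webp' => 'image/webp',
        'svg'  => 'image/svg+xml',
    ];

    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    if (!isset($types[$ext]) || !is_file($file) || !is_readable($file)) {
        // Unknown type or missing file: return null so the caller can keep the original src.
        return null;
    }

    $bytes = file_get_contents($file);
    if ($bytes === false) {
        return null;
    }

    return 'data:' . $types[$ext] . ';base64,' . base64_encode($bytes);
}
```

In the loop above, this could stand in for convertImage, for example $img->src = toDataUri($src) ?? $src; so that images that cannot be read keep their original src.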