PHP find certain character in HTML tag and replace the whole tag by string

I fetched a string value from an SQL table, as shown below:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p> 
<p><img alt='"'" src='"ckeditor/plugins/imageuploader/uploads/986dfdea.png'" 
style='"height:163px; width:650px'" /></p></p> 
<p>end of string</p>

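I want to get the image name 986dfdea.png from inside the HTML tag (the string contains many <p></p> tags, and I need to know which one holds an image) and replace the whole tag with a placeholder such as "#image1".

In the end it should look like this: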

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p> 
#image1 
<p>end of string</p>

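I'm developing an API for a mobile app, but my PHP skills are at a beginner level, and I still can't get this working even after reading the following questions:

PHP/regex: How to get the string value of HTML tag?

How to extract img src, title and alt from html using php?

Please help.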

Yes, you could use a regex, and it would take less code, but we shouldn't parse HTML with regex, so here is what you need:

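  1. Your string contains invalid HTML (</p></p>), so we clean it up with tidy_repair_string
  2. Use DOMXpath() to query the img tags that sit inside p tags
  3. Strip any extra " and use basename on getAttribute("src") to get the image file name
  4. Create a new text node with createTextNode whose value is #imagename
  5. Use replaceChild to swap the img inside its p for the text node created above
  6. Clean up the !DOCTYPE, html and body tags that DOMDocument adds automatically when it loads the HTML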

<?php
$html = <<< EOF
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt='"'" src='"ckeditor/plugins/imageuploader/uploads/986dfdea.png'"
style='"height:163px; width:650px'" /></p></p>
<p>end of string</p>
EOF;

$html = tidy_repair_string($html,array(
                           'output-html'   => true,
                           'wrap'           => 80,
                           'show-body-only' => true,
                           'clean' => true,
                           'input-encoding' => 'utf8',
                           'output-encoding' => 'utf8',
                                          ));

$dom = new DOMDocument();
$dom->loadHtml($html);

$x = new DOMXpath($dom);
foreach($x->query('//p/img') as $pImg){
    // get the image file name, dropping the stray " left in the src attribute
    $imgFileName = basename(str_replace('"', "", $pImg->getAttribute("src")));
    $replace = $dom->createTextNode("#$imgFileName");
    $pImg->parentNode->replaceChild($replace, $pImg);
}

// the cleanup runs once, after the loop, so several images are handled correctly
# loadHTML causes a !DOCTYPE tag to be added, so remove it:
$dom->removeChild($dom->firstChild);
# it also wraps the code in <html><body></body></html>, so remove that:
$dom->replaceChild($dom->firstChild->firstChild, $dom->firstChild);
echo str_replace(array("<body>", "</body>"), "", $dom->saveHTML());

Output:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p>#986dfdea.png</p>
<p>end of string</p>
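
The output above keeps the <p> wrapper and uses the file name as the placeholder, whereas the question asked for the whole tag to become something like #image1. Here is a minimal sketch of that variant. The function name replaceImageParagraphs, the wrapper id "wrap" and the cleaned-up sample markup (plain double quotes, no stray </p>) are my own assumptions for illustration and are not part of the code above. It also skips tidy and relies on DOMDocument's lenient parser.

```php
<?php
// Hypothetical helper: swaps every <p> that directly contains an <img> for a
// numbered placeholder and records which file each placeholder stands for.
function replaceImageParagraphs(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $dom = new DOMDocument();
    // Wrap the fragment in a known container so it can be pulled back out
    // without the <html><body> scaffolding that loadHTML adds.
    $dom->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($dom);
    $images = [];
    foreach ($xpath->query('//p[img]') as $p) {
        $img = $xpath->query('img', $p)->item(0);
        $key = '#image' . (count($images) + 1);
        // trim() drops stray quote characters around the path, if any
        $images[$key] = basename(trim($img->getAttribute('src'), "\"'"));
        $p->parentNode->replaceChild($dom->createTextNode($key), $p);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out  = '';
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    foreach ($wrap->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return [trim($out), $images];
}

// Sample input: the string from the question with its quoting cleaned up (assumption).
$html = '<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>' . "\n"
      . '<p><img alt="" src="ckeditor/plugins/imageuploader/uploads/986dfdea.png" '
      . 'style="height:163px; width:650px" /></p>' . "\n"
      . '<p>end of string</p>';

[$out, $images] = replaceImageParagraphs($html);
echo $out, "\n";
print_r($images);
```

Here $images maps #image1 to 986dfdea.png, so an API could send the placeholder text and the list of files as separate fields.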

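
For comparison, the shorter regex route mentioned at the start of the answer might look roughly like this. Treat it as a brittle sketch: the pattern and the variable names are my own assumptions, and it only copes with an <img> that is the sole content of a plain <p> with no attributes. Anything messier is better left to the DOM approach.

```php
<?php
// Sample input: the string from the question with its quoting cleaned up (assumption).
$html = <<<'EOF'
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt="" src="ckeditor/plugins/imageuploader/uploads/986dfdea.png" style="height:163px; width:650px" /></p>
<p>end of string</p>
EOF;

$names = [];
$result = preg_replace_callback(
    // <p>, optional whitespace, an <img ...> whose src value is captured, then </p>
    '~<p>\s*<img\b[^>]*\bsrc\s*=\s*["\']+([^"\'>]+)["\']+[^>]*>\s*</p>~i',
    function (array $m) use (&$names) {
        $names[] = basename($m[1]);      // remember the file name
        return '#image' . count($names); // numbered placeholder: #image1, #image2, ...
    },
    $html
);

echo $result, "\n";
print_r($names);
```

The callback numbers the placeholders in the order the images appear, and $names keeps the matching file names.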