Using PHP preg_replace to prepend to the src values regardless of how badly formed the img element is

My HTML content looks like this:

<div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/>

It is one long unbroken line: there are no line breaks separating the img elements and no indentation.

The PHP code I am using is as follows:

/**
 *
 * Take in html content as a string, find all the <link>, <script> or <img>
 * elements (depending on $tag) and add $prepend to their href/src values,
 * except when the value starts with http: or https:
 *
 * @param $html String The html content
 * @param $prepend String The prefix to put in front of each matching href/src value
 * @param $tag String One of 'css', 'js' or 'img'
 * @return String The new $html content after find and replace.
 *
 */
    protected static function _prependAttrForTags($html, $prepend, $tag) {
        if ($tag == 'css') {
            $element = 'link';
            $attr = 'href';
        }
        else if ($tag == 'js') {
            $element = 'script';
            $attr = 'src';
        }
        else if ($tag == 'img') {
            $element = 'img';
            $attr = 'src';
        }
        else {
            // wrong tag so return unchanged
            return $html;
        }
        // this checks for all the "yada.*"
        $html = preg_replace('/(<'.$element.'\b.+'.$attr.'=")(?!http)([^"]*)(".*>)/', '$1'.$prepend.'$2$3$4', $html);
        // this checks for all the 'yada.*'
        $html = preg_replace('/(<'.$element.'\b.+'.$attr.'='."'".')(?!http)([^"]*)('."'".'.*>)/', '$1'.$prepend.'$2$3$4', $html);
        return $html;
    }
}

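I want my function to work no matter how badly formed the img element is.

It must work regardless of where the src attribute appears in the tag.

The only thing it should do is prepend something to the src value.

Also note that the preg_replace must not happen if the src value starts with http.

Right now, my code only works when my content is: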

<div class="preload">
    <img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"></img>
    <img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u15_line.png" width="1" height="1"/>

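As you may guess, it succeeds, but only for the first img element, because that element is followed by a line break and has no / at the end of its opening img tag.

Please tell me how to improve my function.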
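To make the failure concrete, here is a minimal reproduction using shortened, made-up file names (a.png, b.png, c.png). It assumes the pattern from the function above with $element = 'img' and $attr = 'src'. Because .+ and .* are greedy and the whole input is one line, the pattern matches the line once, so only one src is rewritten. The second pattern is just one possible tightening (an assumption, not the asker's code) and is still fragile compared with a real parser.

```php
<?php
// Three img elements on a single line, as in the question (file names shortened).
$html = '<div class="preload"><img src="a.png" width="1"/><img src="b.png" width="1"/><img src="c.png" width="1"/>';

// The question's pattern: greedy .+ runs to the LAST src=" on the line.
$greedy = '/(<img\b.+src=")(?!http)([^"]*)(".*>)/';
$greedyOut = preg_replace($greedy, '$1prepended/$2$3', $html, -1, $greedyCount);

// One possible tightening: stay inside a single tag with a lazy [^>]*?.
$lazy = '/(<img\b[^>]*?\bsrc=")(?!http)([^"]*)/';
$lazyOut = preg_replace($lazy, '$1prepended/$2', $html, -1, $lazyCount);

echo $greedyCount, "\n"; // number of replacements made by the greedy pattern
echo $lazyCount, "\n";   // number of replacements made by the lazy pattern
echo $lazyOut, "\n";
```

With the greedy pattern only the last image (c.png) gets the prefix; the lazy variant touches each tag separately.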

UPDATE:

I used DOMDocument and it works! After prepending the src values, I now need to replace each element with a PHP code snippet.

So the original:

<img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/>

After using DOMDocument and adding my prefix string:

<img src="prepended/PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1" />

Now I need to replace the whole thing with:

<?php echo $this->Html->img('prepended/PRODUCTPAGE_files/read_icon_u12_normal.png', array('width'=>'1', 'height'=>'1')); ?>

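Can I still use DOMDocument? Or do I need to use preg_replace?

DOMDocument was built to parse HTML no matter how messed up it is, so rather than building your own HTML parser, why not use it?

With a combination of DOMDocument and XPath you can do it like this: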

<?php
$html = <<<HTML
<script src="test"/><link href="test"/><div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/><img width="1" height="1" src="httpPRODUCTPAGE_files/line_u14_line.png"/>
HTML;
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$searchTags = $xpath->query('//img | //link | //script');
$length = $searchTags->length;
for ($i = 0; $i < $length; $i++) {
    $element = $searchTags->item($i);
    if ($element->tagName == 'link')
        $attr = 'href';
    else
        $attr = 'src';
    $src = $element->getAttribute($attr);
    if (!startsWith($src, 'http'))
    {
        $element->setAttribute($attr, "whatever" . $src);
    }
}
// this small function will check the start of a string 
// with a given term, in your case http or http://
function startsWith($haystack, $needle)
{
    return !strncmp($haystack, $needle, strlen($needle));
}
$result = $doc->saveHTML();
echo $result;

Here is a live demo of it working.
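One thing to be aware of: the startsWith($src, 'http') check above treats any value beginning with the letters http as absolute, including the relative httpPRODUCTPAGE_files/line_u14_line.png path in the sample input. If only real http:// and https:// URLs should be skipped, a stricter check might look like the sketch below. The function name isAbsoluteHttpUrl is made up for this example.

```php
<?php
// Hypothetical helper: true only for values that start with http:// or https://
// (case-insensitive), not for any string that merely begins with "http".
function isAbsoluteHttpUrl(string $value): bool
{
    return preg_match('~^https?://~i', $value) === 1;
}

var_dump(isAbsoluteHttpUrl('http://example.com/a.png'));                // bool(true)
var_dump(isAbsoluteHttpUrl('HTTPS://example.com/a.png'));               // bool(true)
var_dump(isAbsoluteHttpUrl('httpPRODUCTPAGE_files/line_u14_line.png')); // bool(false)
```

Whether the looser or the stricter check is right depends on what your src values actually look like.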

If your HTML is messed up, e.g. missing closing tags, you can set the following before @$doc->loadHTML($html);:

$doc->recover = true;
$doc->strictErrorChecking = false;
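As an alternative to silencing loadHTML() with @, libxml's own error buffer can be used, which also lets you inspect what the parser complained about. This is a minimal sketch with a made-up, deliberately sloppy input string; which messages are reported (if any) depends on the libxml version.

```php
<?php
// Sloppy markup: unclosed div, stray </span>, unquoted-looking structure.
$html = '<div class="preload"><img src="a.png" width="1" height="1"><img src="b.png"></span>';

// Collect parser warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$errors = libxml_get_errors(); // array of LibXMLError objects, possibly empty
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// Both img elements are still found despite the broken markup.
$imgCount = $doc->getElementsByTagName('img')->length;
echo $imgCount, "\n";
```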

If you want formatted output, you can set this before @$doc->loadHTML($html);:

$doc->formatOutput = true;

With XPath we capture only the elements that need editing, so there is no need to worry about any other elements.
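If you want to keep the shape of the function from the question, the same idea can be wrapped up roughly as follows. This is only a sketch: prependAttrForTags is a hypothetical standalone version of the question's _prependAttrForTags, and the sample input and 'prepended/' prefix are made up.

```php
<?php
// Hypothetical standalone version of the question's function, using
// DOMDocument and XPath instead of preg_replace.
function prependAttrForTags(string $html, string $prepend, string $tag): string
{
    // Map the short names used in the question to element/attribute pairs.
    $map = [
        'css' => ['link', 'href'],
        'js'  => ['script', 'src'],
        'img' => ['img', 'src'],
    ];
    if (!isset($map[$tag])) {
        return $html; // wrong tag, so return unchanged
    }
    [$element, $attr] = $map[$tag];

    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ as in the answer above: silence parser warnings

    // Select only elements of the wanted type that actually carry the attribute.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//' . $element . '[@' . $attr . ']') as $node) {
        $value = $node->getAttribute($attr);
        if (strncmp($value, 'http', 4) !== 0) {
            $node->setAttribute($attr, $prepend . $value);
        }
    }
    return $doc->saveHTML();
}

$out = prependAttrForTags(
    '<div class="preload"><img width="1" src="a.png"/><img src="http://example.com/b.png"></div>',
    'prepended/',
    'img'
);
echo $out;
```

The position of src inside the tag does not matter here, because the attribute is read from the parsed element rather than matched as text.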

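Keep in mind that if your HTML is missing tags such as body, html, doctype or head, they will be added automatically; if they are already present, nothing further is done.

However, if you want to remove them, you can use the following instead of just $doc->saveHTML();: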

$result = preg_replace('~<(?:!DOCTYPE|/?(?:html|head|body))[^>]*>\s*~i', '', $doc->saveHTML());
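Another option, assuming PHP 5.4 or later with a reasonably recent libxml, is to ask loadHTML() not to add the wrapper in the first place via the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags. The sketch below uses a made-up fragment with a single, properly closed root element; with several top-level elements the resulting tree may not be what you expect, so treat this as something to test against your own markup.

```php
<?php
// A fragment with ONE root element (the div is closed here, unlike the question's sample).
$html = '<div class="preload"><img src="a.png" width="1" height="1"/></div>';

$doc = new DOMDocument();
// NOIMPLIED: do not add implied html/body elements.
// NODEFDTD:  do not add a default doctype.
@$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

foreach ($doc->getElementsByTagName('img') as $img) {
    $img->setAttribute('src', 'prepended/' . $img->getAttribute('src'));
}

$out = $doc->saveHTML();
echo $out;
```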

If you want to replace an element with a newly created one, you can use this:

$newElement = $doc->createElement($element->tagName, '');
$newElement->setAttribute($attr, "prepended/" . $src);
$myArrayWithAttributes = array ('width' => '1', 'height' => '1');
foreach ($myArrayWithAttributes as $attribute=>$value)
    $newElement->setAttribute($attribute, $value);
$element->parentNode->replaceChild($newElement, $element);
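The snippet above relies on $element, $attr and $src from the earlier loop. Put together as a self-contained sketch (made-up input, img elements only), it could look like this. Note that the new element carries only the attributes you set on it, so anything else on the original, such as alt in this example, is dropped.

```php
<?php
$doc = new DOMDocument();
@$doc->loadHTML('<div><img alt="x" src="a.png"><img src="http://example.com/b.png"></div>');

$xpath = new DOMXPath($doc);
// The XPath result is a snapshot, so replacing nodes while looping over it is safe.
foreach ($xpath->query('//img') as $element) {
    $src = $element->getAttribute('src');
    if (strncmp($src, 'http', 4) === 0) {
        continue; // leave values starting with http untouched
    }

    // Build a fresh element and give it exactly the attributes we want.
    $newElement = $doc->createElement($element->tagName);
    $newElement->setAttribute('src', 'prepended/' . $src);
    foreach (array('width' => '1', 'height' => '1') as $attribute => $value) {
        $newElement->setAttribute($attribute, $value);
    }

    $element->parentNode->replaceChild($newElement, $element);
}

$out = $doc->saveHTML();
echo $out;
```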

Or by creating a fragment:

$frag = $doc->createDocumentFragment();
$frag->appendXML('<?php echo $this->Html->img("prepended/PRODUCTPAGE_files/read_icon_u12_normal.png", array("width"=>"1", "height"=>"1")); ?>');
$element->parentNode->replaceChild($frag, $element);

Live demo

You can format the HTML with Tidy:
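The fragment above hard-codes one file name and one set of attributes. If the PHP snippet has to be generated per element, the string can be built from the element itself. The following is a rough sketch: buildImgHelperCall is a hypothetical helper, and $this->Html->img(...) is simply copied from the question, not checked against any framework's API.

```php
<?php
// Hypothetical helper: build the PHP statement from the question for one <img> element.
function buildImgHelperCall(DOMElement $img, string $prepend): string
{
    $options = [];
    foreach ($img->attributes as $attribute) {
        if ($attribute->name !== 'src') {
            // var_export() yields a safely quoted PHP string literal.
            $options[] = var_export($attribute->name, true) . ' => ' . var_export($attribute->value, true);
        }
    }
    $src = var_export($prepend . $img->getAttribute('src'), true);

    return 'echo $this->Html->img(' . $src . ', array(' . implode(', ', $options) . '));';
}

$doc = new DOMDocument();
@$doc->loadHTML('<img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/>');
$img = $doc->getElementsByTagName('img')->item(0);

$code = buildImgHelperCall($img, 'prepended/');
echo '<?php ' . $code . ' ?>', "\n";
```

The resulting string can then be passed to appendXML() in the same way as the literal in the fragment example; how the instruction is serialized on output is worth checking in your own environment.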

$tidy = tidy_parse_string($result, array(
    'indent' => TRUE,
    'output-xhtml' => TRUE,
    'indent-spaces' => 4
));
$tidy->cleanRepair();
echo $tidy;