Strip out the first IMG element in an HTML block

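I have a PHP application that fetches HTML from a third-party source, and that HTML may contain one or more IMG elements. I want to grab the first IMG instance in its entirety, but I'm not sure how to go about it.

Can anyone point me in the right direction?

Thanks.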

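You can parse the HTML with XPath and extract the data you need that way. It's a little more involved than checking string positions, but it is more robust, which pays off if you decide you want something more specific (for example, the src and alt of the first img tag).

First load the HTML string into a DOMDocument, then hand that document to DOMXPath.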

// Load the HTML into a DOMDocument, then set up XPath
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
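Because the HTML comes from a third party, it may well be malformed, and DOMDocument::loadHTML() reports every parse problem as a PHP warning. A minimal sketch, assuming you would rather collect those problems quietly, is to switch on libxml's internal error handling around the load. The sample markup below is invented for illustration.

```php
<?php
// Assumption: $html is third-party markup that may be malformed (sample is made up).
$html = '<div><p>Unclosed paragraph<img src="a.jpg" alt="A"><span>stray</div>';

// Collect libxml parse errors instead of emitting PHP warnings for them.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$errors = libxml_get_errors();          // inspect these if you want to know what was wrong
libxml_clear_errors();
libxml_use_internal_errors($previous);  // restore the earlier setting

// The parser still builds a usable tree, so XPath works as usual.
$xpath = new DOMXPath($doc);
$first = $xpath->query('/descendant::img[1]')->item(0);
echo $first->getAttribute('src'), "\n";
```

Restoring the previous setting keeps the change local to this parse instead of silencing libxml warnings for the rest of the script.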

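We want the first img that appears on the page, so we use the selector /descendant::img[1]. N.B. this is not the same as //img[1], although that often gives similar results: //img[1] selects every img that is the first img child of its own parent, whereas /descendant::img[1] selects only the first img in the whole document. There's a good explanation of the difference here: https://stackoverflow.com/a/453902/2287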

$matches = $xpath->evaluate("/descendant::img[1]");
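The difference between the two selectors only shows up when images sit in different parents. A small sketch with two paragraphs, each holding one image (the markup is made up for the demonstration):

```php
<?php
// Two <img> elements in different parents: each is the first img child of its <p>.
$html = '<p><img src="one.jpg"></p><p><img src="two.jpg"></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// //img[1]: every img that is the first img child of its own parent.
$perParent = $xpath->query('//img[1]');
// /descendant::img[1]: only the first img in document order.
$firstInDoc = $xpath->query('/descendant::img[1]');

echo $perParent->length, "\n";                          // 2
echo $firstInDoc->length, "\n";                         // 1
echo $firstInDoc->item(0)->getAttribute('src'), "\n";   // one.jpg
```

With //img[1] you would have to take the first entry of the result yourself; /descendant::img[1] already narrows it to one node.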

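One drawback of using XPath is that it doesn't make it easy to say "give me back the full string that matched the img tag", so we can put together a simple function that iterates over the matched node's attributes and rebuilds the img tag.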

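// $node is one matched DOMElement from $matches
$tag = "<img ";
$vals = array();
foreach ($node->attributes as $attr) {
    // DOM decodes entities in attribute values, so re-escape them
    $vals[] = $attr->name . '="' . htmlspecialchars($attr->value, ENT_QUOTES) . '"';
}
$tag .= implode(" ", $vals) . " />";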
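As an alternative to rebuilding the tag by hand, DOMDocument::saveHTML() accepts an optional node argument (since PHP 5.3.6) and serialises just that node. A minimal sketch, assuming $html holds the fetched markup (the sample here is invented):

```php
<?php
$html = '<p>Intro <img src="/images/my-image.png" alt="My image"> more <img src="second.jpg"></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$node = $xpath->query('/descendant::img[1]')->item(0);
$out = null;
if ($node !== null) {
    // With a node argument, saveHTML() returns the markup of that node only.
    // libxml writes it as HTML, so void elements like img have no trailing " />".
    $out = $doc->saveHTML($node);
    echo $out, "\n";
}
```

The output is libxml's serialisation rather than the exact original substring, so attribute quoting and spacing may differ from what the third-party source sent.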

Putting it all together, we get something like this:

<?php
// Example html
$html = '<html><body>'
    . ' <img src="/images/my-image.png" alt="My image" width="100" height="100" />'
    . 'Some text here <img src="do-not-want-second.jpg" alt="No thanks" />';
// Load the HTML into a DOMDocument, then set up XPath
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Get the first img in the doc
// N.B. Not the same as "//img[1]" - see https://stackoverflow.com/a/453902/2287
$matches = $xpath->evaluate("/descendant::img[1]");
foreach ($matches as $match) {
    echo buildImgTag($match);
}
/**
 * Build an img tag given its matched node
 *
 * @param DOMElement $node Img node
 *
 * @return string Rebuilt img tag
 */
function buildImgTag($node) {
    $tag = "<img ";
    $vals = array();
    foreach ($node->attributes as $attr) {
        // DOM decodes entities in attribute values, so re-escape them here
        $vals[] = $attr->name . '="' . htmlspecialchars($attr->value, ENT_QUOTES) . '"';
    }
    $tag .= implode(" ", $vals) . " />";
    return $tag;
}


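So overall, this is a slightly more involved approach than running strpos or a regex over the HTML, but it should give you more flexibility if you decide to do anything else with the img tag, such as extracting specific attributes.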
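For the attribute case, XPath is not strictly required: DOMDocument::getElementsByTagName() returns matches in document order, so item(0) is the first img. This is a swapped-in technique, not the one used above, and the sample markup is made up:

```php
<?php
$html = '<div><img src="/images/my-image.png" alt="My image" width="100"><img src="second.jpg"></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Matches come back in document order, so item(0) is the first img (or null if none).
$img = $doc->getElementsByTagName('img')->item(0);

if ($img !== null) {
    $src = $img->getAttribute('src');      // "/images/my-image.png"
    $alt = $img->getAttribute('alt');      // "My image"
    $title = $img->getAttribute('title');  // "" because DOMElement::getAttribute() returns "" when absent
    echo $src, ' | ', $alt, "\n";
}
```

Checking for null first matters because the third-party HTML may contain no images at all.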

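The example below works if you can assume the HTML is valid, but we can't assume that! If you are 100% sure it is valid HTML, go ahead and use it; if not, I'd suggest a better approach, shown below.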

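$html = '<br />First<img src="path/abc.jpg" />Next<img src="path/cde.jpg" />';
$start = stripos($html, '<img');           // position of the first "<img", or false
if ($start !== false) {
    $extracted = substr($html, $start);    // everything from that "<img" onwards
    $end = stripos($extracted, '>');       // first ">" after it, relative to $start
    echo substr($html, $start, $end + 1);  // the complete tag, including ">"
}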

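This code gives you: <img src="path/abc.jpg" />

Find the first occurrence of <img with the case-insensitive function stripos
Cut the data from that first occurrence onwards
Find the first occurrence of > in what remains, again with stripos
Extract everything between the start and end points with substr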
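To see why string searching is fragile, a sketch comparing it with the DOM approach on markup where an `<img` appears inside an HTML comment. Both helper names, firstImgByString() and firstImgByDom(), are hypothetical, and the input is invented:

```php
<?php
// Hypothetical helper: the stripos/substr steps above, wrapped in a function.
function firstImgByString(string $html): ?string
{
    $start = stripos($html, '<img');
    if ($start === false) {
        return null;                       // no img at all
    }
    $end = strpos($html, '>', $start);     // first ">" at or after the "<img"
    return $end === false ? null : substr($html, $start, $end - $start + 1);
}

// Hypothetical helper: the same lookup done through DOMDocument/DOMXPath.
function firstImgByDom(string $html): ?string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $node = (new DOMXPath($doc))->query('/descendant::img[1]')->item(0);
    return $node === null ? null : $doc->saveHTML($node);
}

// A parser treats the commented-out <img> as comment text; stripos does not.
$html = '<p><!-- <img src="old.jpg"> --><img src="new.jpg"></p>';
echo firstImgByString($html), "\n";
echo firstImgByDom($html), "\n";
```

The string version returns the commented-out tag, while the DOM version returns the image a browser would actually render.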

A better way:

PHP Simple HTML DOM Parser (see its manual)

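include 'simple_html_dom.php';
// Create DOM from URL or file (use str_get_html() for an HTML string)
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element) {
    echo $element->src . '<br>';
}
// Or only the first image, as the complete tag
$first = $html->find('img', 0);
if ($first) {
    echo $first->outertext;
}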

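An HTML DOM parser written in PHP 5+ that lets you work with HTML in a very simple way:
Requires PHP 5+
Supports invalid HTML
Finds tags on an HTML page with jQuery-like selectors
Extracts content from HTML in a single line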

jQuery can do this for you, although it runs in the browser rather than in PHP.

$('img')[0]

If the HTML you care about is only a smaller subsection of the page, adjust the selector accordingly.