PHP regex to check if image is wrapped with a tag

I'm creating a WordPress function and need to determine whether an image in the content is wrapped in an a tag that links to a PDF or DOC file, e.g.

<a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>

How would I go about doing this in PHP?

Thanks

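I would very strongly suggest not using a regular expression for this. Besides being more error-prone and harder to read, a regex does not give you an easy way to manipulate the content.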

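You are better off loading the content into a DOMDocument, retrieving all <img> elements and checking whether their parent is an <a> element. All that is left then is to check whether the value of the href attribute ends with the desired extension.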

A very crude implementation would look a bit like this:

<?php
$sHtml = <<<HTML
<html>
<body>
    <img src="../images/image.jpg" />
    <a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>
    <a href="www.site.com/document.txt"><img src="../images/image.jpg" /></a>
    <p>this is some text <a href="site.com/doc.pdf"> more text</p> 
</body>
</html>
HTML;
$oDoc = new DOMDocument();
$oDoc->loadHTML($sHtml);
$oNodeList = $oDoc->getElementsByTagName('img');
foreach($oNodeList as $t_oNode)
{
    if($t_oNode->parentNode->nodeName === 'a')
    {
        $sLinkValue = $t_oNode->parentNode->getAttribute('href');
        $sExtension = substr($sLinkValue, strrpos($sLinkValue, '.'));
        echo '<li>I am wrapped in an anchor tag '
           . 'and I link to a ' . $sExtension . ' file</li>'
        ;
    }
}
?>

I'll leave an exact implementation as an exercise for the reader ;-)
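One possible way to tighten the crude version above is to wrap the same DOMDocument approach in a function that returns the matching links instead of echoing them. This is only a sketch: the function name `findDocumentLinksWrappingImages` and its `$extensions` parameter are made up for illustration, it assumes PHP 7 or later, and it only looks at the image's direct parent, as the code above does.

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): returns the href of
 * every <a> that directly wraps an <img> and links to one of the extensions.
 */
function findDocumentLinksWrappingImages(string $html, array $extensions = ['pdf', 'doc']): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $parent = $img->parentNode;
        if (!$parent instanceof DOMElement || $parent->nodeName !== 'a') {
            continue; // image is not directly inside a link
        }

        $href = $parent->getAttribute('href');
        // Use only the path part so a query string or fragment is ignored.
        $path = (string) parse_url($href, PHP_URL_PATH);
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        if (in_array($extension, $extensions, true)) {
            $links[] = $href;
        }
    }

    return $links;
}
```

Calling `findDocumentLinksWrappingImages($content)` on the post content would then give an array of hrefs to loop over; an empty array means no image is wrapped in a PDF or DOC link.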

Here is some DOM-parsing-based code that you can use:

$html = <<< EOF
<a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>
<img src="../images/image1.jpg" />
<a href="www.site.com/document.txt"><IMG src="../images/image2.jpg" /></a>
<a href="www.site.com/document.doc"><img src="../images/image3.jpg" /></a>
<a href="www.site.com/document1.pdf">My PDF</a>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$nodeList = $doc->getElementsByTagName('a');
for($i=0; $i < $nodeList->length; $i++) {
    $node = $nodeList->item($i);
    $children = $node->childNodes; 
    $hasImage = false;
    foreach ($children as $child) { 
       if ($child->nodeName == 'img') {
          $hasImage = true;
          break;
       }
    }
    if (!$hasImage)
       continue;
    if ($node->hasAttributes())
       foreach ($node->attributes as $attr) {
          $name = $attr->nodeName;
          $value = $attr->nodeValue;
          if ($attr->nodeName == 'href' &&
              preg_match('/\.(doc|pdf)$/i', $attr->nodeValue)) {
                echo $attr->nodeValue .
                     " - Image is wrapped in a link to a PDF or DOC file\n";
                break;
          }
       }
}

Live demo: http://ideone.com/dwJNAj
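The loop above only inspects the direct children of each link, so an image nested one level deeper (inside a span, say) would be missed. If that matters, one alternative is to let an XPath query pick the links. The snippet below is a sketch using DOMXPath rather than the child-node loop from the answer; the sample markup and the `$wrapped` variable are made up for illustration.

```php
<?php
// Hypothetical sample markup; the second image sits inside a <span> within the link.
$html = <<<HTML
<a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>
<a href="www.site.com/manual.doc"><span><img src="../images/image2.jpg" /></span></a>
<a href="www.site.com/notes.txt"><img src="../images/image3.jpg" /></a>
<a href="www.site.com/document1.pdf">My PDF</a>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$wrapped = [];
// Select every <a> that has an href and an <img> anywhere beneath it.
foreach ($xpath->query('//a[@href][.//img]') as $link) {
    $href = $link->getAttribute('href');
    // Same extension test as in the answer above.
    if (preg_match('/\.(doc|pdf)$/i', $href)) {
        $wrapped[] = $href;
    }
}

print_r($wrapped);
```

Changing `.//img` to `img` in the query would restrict the match to direct children again.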