Retrieve attributes and content of multiple script tags inside the head tag with PHP

I found several different questions related to my problem, but I have not been able to combine them into a single function.

Here is my HTML:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>

Here is my current code:

$filePath = "directory/file.html";
retrieveScriptContentandAttributes($filePath);
function retrieveScriptContentandAttributes($filePath) {
$dom = new DOMDocument;
@$dom->loadHTMLFile($filePath);
//var_dump($dom->loadHTMLFile($filePath));
$head = $dom->getElementsByTagName('head')->item(0);
$xp = new DOMXpath($dom);
$script = $xp->query("script", $head);
for ($row = 0; $row < 5; $row++) {
    echo $script->item($row)->textContent;
    if ($script->item($row) instanceof DOMNode) {
        if ($script->item($row)->hasAttributes()) {
            foreach ($script->item($row)->attributes as $attr) {
                $name = $attr->nodeName;
                $value = $attr->nodeValue;
                $scriptAttr[] = array('attr'=>$name, 'value'=>$value);
            }
            echo $scriptAttr;
        }
    }
}
}

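The result I get is "ArrayAC_FL_RunContent = 0;ArrayNotice: Trying to get property of non-object" on the line "echo $script->item($row)->textContent;". The odd thing is that the line itself executes fine. What I need is a way to print $scriptAttr as an array like this: language=>javascript. Then again for the next script tag: src=>Scripts/AC_RunActiveContent.js, language=>javascript.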

Thanks for your help!!

Try DOMXPath (see: http://php.net/manual/en/class.domxpath.php):

<?php
$dom = new DOMDocument();
$dom->loadHtml('<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>
');
$xpath = new DOMXPath($dom);
$scriptAttributes = array();
/* //head/script[@src] would only select nodes with an src attribute */
foreach ($xpath->query('//head/script') as $node) {
    $attributes =& $scriptAttributes[];
    foreach ($node->attributes as $name => $attribute) {
        $attributes[$name] = $attribute->nodeValue;
    }
}
var_dump($scriptAttributes);

Output:

array(2) {
  [0]=>
  array(1) {
    ["language"]=>
    string(10) "javascript"
  }
  [1]=>
  array(2) {
    ["src"]=>
    string(30) "Scripts/AC_RunActiveContent.js"
    ["language"]=>
    string(10) "javascript"
  }
}
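The question asks for the content of each script tag as well as its attributes, and the snippet above only gathers attributes. A minimal sketch that keeps both per script might look like the following. The function name collectHeadScripts and the 'attributes' / 'content' array keys are hypothetical names chosen for illustration; they do not come from the original post. It also prints each name/value pair individually, since echoing an array directly only prints the word "Array".

```php
<?php
// Hypothetical helper (name not from the original post): returns one entry
// per <script> in <head>, holding its attributes and its text content.
function collectHeadScripts(string $html): array
{
    $dom = new DOMDocument();
    // @ suppresses parser warnings on imperfect markup, as in the question's code.
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $scripts = array();

    foreach ($xpath->query('//head/script') as $node) {
        // Start with a fresh array for every script so values do not pile up.
        $attributes = array();
        foreach ($node->attributes as $name => $attribute) {
            $attributes[$name] = $attribute->nodeValue;
        }
        $scripts[] = array(
            'attributes' => $attributes,
            'content'    => $node->textContent,
        );
    }

    return $scripts;
}

$html = '<html><head>
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head><body></body></html>';

$scripts = collectHeadScripts($html);

// Print each pair in the name=>value form the question asks for.
foreach ($scripts as $script) {
    foreach ($script['attributes'] as $name => $value) {
        echo $name, '=>', $value, "\n";
    }
    echo 'content: ', $script['content'], "\n";
}
```

To read from a file instead, the same body should work with loadHTMLFile($filePath) in place of loadHTML($html).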

You can clean up the code a bit and eliminate the getElementsByTagName call:

$dom = new DOMDocument;
@$dom->loadHTMLFile($filePath);
$xp = new DOMXpath($dom);
$scripts = $xp->query("//head/script"); // find only script tags in the head block, ignoring scripts elsewhere
foreach($scripts as $script) {
    // ... your stuff here ...
}

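The DOMNodeList returned by an XPath query is iterable, so you can simply foreach over it instead of using a count/for loop. Because the XPath query selects the nodes directly, you also do not have to check whether $script is a script node: that is the only kind of node the query will return.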
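As a rough illustration of why the fixed "$row < 5" loop in the question produces the "Trying to get property of non-object" notice: DOMNodeList::item() returns null once the index runs past the last node, while foreach only visits nodes that exist. The variable names $count, $missing and $names below are made up for this sketch, and the HTML is trimmed to the two script tags from the question.

```php
<?php
$dom = new DOMDocument();
@$dom->loadHTML('<html><head>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head><body></body></html>');

$xp = new DOMXPath($dom);
$scripts = $xp->query('//head/script');

// length holds the number of nodes the query matched.
$count = $scripts->length;

// Asking for an index past the end yields null, so reading ->textContent
// on it is what raises the notice in the question's loop.
$missing = $scripts->item($count);

// foreach stops after the last real node, so no bounds check is needed.
$names = array();
foreach ($scripts as $script) {
    $names[] = $script->nodeName;
}

var_dump($count, $missing, $names);
```

If an index-based loop is still preferred, bounding it with $scripts->length instead of a hard-coded 5 should avoid the notice as well.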