Retrieve attributes and content of multiple script tags inside the head tag with PHP

I found several different questions related to mine, but I can't manage to combine them into a single function.

Here is my HTML:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>

Here is my current code:

$filePath = "directory/file.html";
retrieveScriptContentandAttributes($filePath);
function retrieveScriptContentandAttributes($filePath) {
    $dom = new DOMDocument;
    @$dom->loadHTMLFile($filePath);
    //var_dump($dom->loadHTMLFile($filePath));
    $head = $dom->getElementsByTagName('head')->item(0);
    $xp = new DOMXpath($dom);
    $script = $xp->query("script", $head);
    for ($row = 0; $row < 5; $row++) {
        echo $script->item($row)->textContent;
        if ($script->item($row) instanceof DOMNode) {
            if ($script->item($row)->hasAttributes()) {
                foreach ($script->item($row)->attributes as $attr) {
                    $name = $attr->nodeName;
                    $value = $attr->nodeValue;
                    $scriptAttr[] = array('attr'=>$name, 'value'=>$value);
                }
                echo $scriptAttr;
            }
        }
    }
}

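The result I get is "ArrayAC_FL_RunContent = 0;ArrayNotice: Trying to get property of non-object" on the line "echo $script->item($row)->textContent;". Oddly, that line itself executes fine. What I need is a way to print $scriptAttr as an array like this: language=>javascript. Then again for the next script tag: src=>Scripts/AC_RunActiveContent.js, language=>javascript.

Thanks for your help!!

Try DOMXPath (see: http://php.net/manual/en/class.domxpath.php):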

<?php
$dom = new DOMDocument();
$dom->loadHtml('<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>
');
$xpath = new DOMXPath($dom);
$scriptAttributes = array();
/* //head/script[@src] would only select nodes with an src attribute */
foreach ($xpath->query('//head/script') as $node) {
    $attributes =& $scriptAttributes[];
    foreach ($node->attributes as $name => $attribute) {
        $attributes[$name] = $attribute->nodeValue;
    }
}
var_dump($scriptAttributes);

Output:

array(2) {
  [0]=>
  array(1) {
    ["language"]=>
    string(10) "javascript"
  }
  [1]=>
  array(2) {
    ["src"]=>
    string(30) "Scripts/AC_RunActiveContent.js"
    ["language"]=>
    string(10) "javascript"
  }
}
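
The question also asks for each tag's content, which the snippet above does not collect. A minimal sketch of one way to gather both, assuming the same sample markup: the function name collectHeadScripts and the 'attributes' / 'content' array keys are hypothetical names chosen for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: collects the attributes and the inline content of
// every <script> element that is a direct child of <head>.
function collectHeadScripts(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $scripts = [];
    foreach ($xpath->query('//head/script') as $node) {
        $attributes = [];
        foreach ($node->attributes as $attribute) {
            $attributes[$attribute->nodeName] = $attribute->nodeValue;
        }
        $scripts[] = [
            'attributes' => $attributes,
            'content'    => $node->textContent, // empty string for external scripts
        ];
    }
    return $scripts;
}

$html = '<head>
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>';

$scripts = collectHeadScripts($html);
print_r($scripts);
```

Building a fresh $attributes array per node avoids the reference assignment used above, at the cost of one extra line.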

You can clean the code up a bit by eliminating the getElementsByTagName call:

$dom = new DOMDocument;
@$dom->loadHTMLFile($filePath);
$xp = new DOMXpath($dom);
$scripts = $xp->query("//head/script"); // find only script tags in the head block, ignoring scripts elsewhere
foreach($scripts as $script) {
    // ... your stuff here ...
}

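The DOMNodeList returned by the XPath query is iterable, so you can simply foreach over it instead of using a count/for loop. Because the XPath query selects the nodes directly, you don't have to check whether each $script node is a script node: that is the only kind of node the query will return.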
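
As a rough sketch of what the loop body could look like for the output format requested in the question: the helper name formatScriptAttributes is hypothetical, and the HTML is inlined only to keep the example self-contained (with a real file you would keep the loadHTMLFile($filePath) call). Iterating the list also sidesteps the notice from the question, since item($row) returns null once $row runs past the two nodes actually found.

```php
<?php
// Hypothetical helper: renders one script element's attributes as
// "name=>value" pairs separated by ", ".
function formatScriptAttributes(DOMElement $script): string
{
    $pairs = [];
    foreach ($script->attributes as $attr) {
        $pairs[] = $attr->nodeName . '=>' . $attr->nodeValue;
    }
    return implode(', ', $pairs);
}

$dom = new DOMDocument;
// Inline markup is an assumption for this sketch; the question loads a file.
@$dom->loadHTML('<head>
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head>');
$xp = new DOMXPath($dom);

$lines = [];
foreach ($xp->query('//head/script') as $script) {
    // Each result is an element node, so no instanceof check is needed.
    $lines[] = formatScriptAttributes($script);
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```

For the sample markup this should print one line per script tag: "language=>javascript", then "src=>Scripts/AC_RunActiveContent.js, language=>javascript".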