Extract portion of HTML document - need to include xHTML markup

I have a situation where I need to extract a portion of an xHTML page, including the markup.

Regex isn't the right approach here, because I can't guarantee the exact number of child divs.

<div id="myDiv">
    <div><p>This is some content</p></div>
    <div><p>This additional content</p></div>
</div>

So, in the snippet above, I need to extract <div><p>This is some content</p></div>, including the markup.

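I've done some research into using XPath, and it seems to be the way to do this, but I'm not sure how to make it return all the associated markup rather than just the node's value.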

You are correct, this can be done with DOMDocument and XPath, like so:

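$doc = new DOMDocument();
$doc->loadHTML( $html); // Load the HTML snippet ($html holds the markup from the question)
$xpath = new DOMXPath( $doc);
$node = $xpath->query( '//div[@id="myDiv"]/div')->item(0); // Get the first child <div>
$saved_node = $doc->saveHTML( $node); // Export that node, markup included (the node argument needs PHP >= 5.3.6)
var_dump( htmlentities( $saved_node)); // Escape the tags so they are visible in a browser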

In the output, you can see the desired string, including the markup:

string(62) "<div><p>This is some content</p></div>"

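Note that I had to run the output through htmlentities() so that you can see the <div> without viewing the page source. That is also why the reported length is 62: it is the length of the escaped string, while the raw markup is 38 characters.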
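
Since the question mentions that the number of child divs isn't fixed, the same idea can be extended to loop over every match instead of taking only item(0). The following is a minimal sketch, not part of the original answer: the helper name extractChildDivs is hypothetical, the scalar type hints assume PHP 7 or later, and the id is assumed not to contain double quotes because it is interpolated directly into the XPath expression.

```php
<?php
// Hypothetical helper (not from the original answer): returns the outer HTML
// of every direct child <div> of the element with the given id.
function extractChildDivs(string $html, string $id): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally so imperfect markup does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $fragments = [];

    // Assumption: $id contains no double quotes, since it is interpolated as-is.
    foreach ($xpath->query('//div[@id="' . $id . '"]/div') as $node) {
        $fragments[] = $doc->saveHTML($node); // Markup of this node only
    }

    return $fragments;
}

$html = '<div id="myDiv">
    <div><p>This is some content</p></div>
    <div><p>This additional content</p></div>
</div>';

$fragments = extractChildDivs($html, 'myDiv');

// Print the raw markup of the first child; use htmlentities() when echoing into a page.
echo $fragments[0], PHP_EOL;
```

With the snippet from the question, $fragments should hold one string per child div, each including its tags. If only the text is wanted, $node->textContent would give the value without the markup, which is the behavior the question was trying to avoid.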