Scraping HTML outside of found element


Keywords: scraping, HTML, element | Updated: 2023-09-27

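I'm using Simple HTML DOM Parser to match an element and extract the content I need. What I would like to do, however, is get all the HTML outside of the found element.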

Suppose the HTML is:

<body>
<div id="otherContent"></div>
<div id="content"></div>
<div id="otherContent2"></div>
</body>

I would like to be able to get everything outside of the #content div.

Is this possible with Simple HTML DOM Parser? I imagine it could be done with regular expressions, but a more elegant solution, such as an HTML parser, would be great.

Yes, Simple HTML DOM Parser can do this. For example:

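include 'simple_html_dom.php'; // the Simple HTML DOM library file; adjust the path as needed

$html = str_get_html("<your_html_here>"); // parse the markup string into a DOM object
$content = $html->find("#content", 0);    // index 0 returns the first match instead of an array
$innertext = $content->innertext; // if you need all markup from #content
$plaintext = $content->plaintext; // if you need only text
$outertext = $content->outertext; // try it yourself :)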

You can also clear out any HTML:

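// assumes simple_html_dom.php has already been included
$html = str_get_html("<your_html_here>");
$html->find("#content", 0)->outertext = ""; // now you've all markup in $html except #content
$result = $html->save(); // serialize the modified document back to a string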

Read more in the manual.
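
For comparison, the same idea (remove the matched element and keep whatever is left) can also be sketched with PHP's built-in DOM extension, which needs no third-party library. This is not the approach from the answer above, only an illustration under some assumptions: the function name htmlOutsideElement is hypothetical, the id is assumed to contain no quote characters, and the input is assumed to have (or be wrapped by the parser in) a body element.

```php
<?php
// Hypothetical helper (not part of any library): returns the inner markup of
// <body> after removing the element whose id attribute equals $id.
function htmlOutsideElement(string $html, string $id): string
{
    // Collect parser warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Match on the id attribute via XPath.
    // Assumption: $id contains no double quotes.
    $xpath = new DOMXPath($doc);
    $matches = iterator_to_array($xpath->query('//*[@id="' . $id . '"]'));
    foreach ($matches as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize every remaining child of <body>, including whitespace text nodes.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with the sample markup from the question.
$sample = '<body>
<div id="otherContent"></div>
<div id="content"></div>
<div id="otherContent2"></div>
</body>';

echo htmlOutsideElement($sample, 'content');
```

The result should contain the two otherContent divs but not #content. Whitespace between the elements is preserved as text nodes, so the output is not necessarily byte-identical to the input with one line removed.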

You can use phpQuery (the library is large, but very useful). Examples are available here: https://code.google.com/p/phpquery/