Retrieve a text node with Simple HTML DOM Parser

Keywords: retrieve text node, Simple HTML DOM | Updated: 2023-09-27

I am new to Simple HTML DOM Parser. I want to get a child element from the following HTML:

<div class="article">
 <div style="text-align:justify">
    <img src="image.jpg" title="image">
    <br>
    <br>
    "Text to grab"
    <div>......</div>
    <br></br>
    ................
    ................
  </div>
</div>

I am trying to get the text "Text to grab".

So far I have tried the following query:

$html->find('div[class=article] div')->children(3);

But it does not work. Any idea how to fix this?

You don't need simple_html_dom for this. It can be done with DOMDocument and DOMXPath, both of which belong to the DOM extension bundled with PHP.

Example:

// your sample data
$html = <<<EOF
<div class="article">
 <div style="text-align:justify">
    <img src="image.jpg" title="image">
    <br>
    <br>
    "Text to grab"
    <div>......</div>
    <br></br>
    ................
    ................
  </div>
</div>
EOF;
// create a document from the above snippet
// if you are loading HTML from a remote URL use:
//   $doc->loadHTMLFile($url);
$doc = new DOMDocument();
// the sample is not strictly valid HTML (e.g. </br>), so libxml may emit warnings here
$doc->loadHTML($html);
// initialize an XPath selector
$selector = new DOMXPath($doc);
// get the text node (text in XML/HTML is also a node):
// the first text node following the second <br> inside the inner <div>
$query = '//div[@class="article"]/div/br[2]/following-sibling::text()[1]';
$textToGrab = $selector->query($query)->item(0);
// remove leading and trailing whitespace using trim() and output the text
echo trim($textToGrab->nodeValue);

Output:

"Text to grab"
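
The query above relies on the text sitting right after the second <br>. If that layout might change, one possible variation is to ask for the first direct text child of the inner div that is not just whitespace. This is only a sketch under assumptions: the helper name firstDirectText() is hypothetical and not part of the original answer, and it assumes the wanted text is the first visible text node directly inside that div.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// non-whitespace text node that is a direct child of the inner <div>,
// or null when no such node exists.
function firstDirectText(string $html): ?string
{
    // Collect libxml warnings (e.g. for the stray </br>) instead of printing them
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // text()[normalize-space()] keeps only text nodes that contain visible characters
    $nodes = $xpath->query('//div[@class="article"]/div/text()[normalize-space()][1]');

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->nodeValue);
}

// Same sample markup as in the question
$sample = <<<EOF
<div class="article">
 <div style="text-align:justify">
    <img src="image.jpg" title="image">
    <br>
    <br>
    "Text to grab"
    <div>......</div>
    <br></br>
    ................
    ................
  </div>
</div>
EOF;

$text = firstDirectText($sample);
echo $text ?? 'no text node found', PHP_EOL;
```

Returning null instead of dereferencing item(0) directly avoids a fatal error when the markup does not match, which the shorter example does not guard against.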

If it is always in the same place, you can do this:

$html->find('.article text', 4);