How can I select only the immediate parent node of a text string using XPath for every match?

Note: this question differs from the one below, because here the value occurs both in a node and in a child node of that same node:

XPath contains(text(),'some string') doesn't work when used on a node with more than one text subnode

Given the following HTML:

$content = 
'<html>
 <body>
  <div>
   <p>During the interim there shall be nourishment supplied</p>
  </div>
  <div>
   <p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p>
  </div>
  <div>
   <ul><li>During the interim there shall be nourishment supplied</li></ul>
  </div>
 </body>
</html>';

and the following XPath:

//*[contains(text(),'interim')]

...I get only 3 matches, whereas I want 4. As per the comments, the four elements I expect are P P A LI.

This works exactly as expected when you query the text nodes themselves. See this glot.io link.

<?php
$html = <<<HTML
<html>
 <body>
  <div>
   <p>During the interim there shall be nourishment supplied</p>
  </div>
  <div>
   <p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p>
  </div>
  <div>
   <ul><li>During the interim there shall be nourishment supplied</li></ul>
  </div>
 </body>
</html>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//*/text()[contains(.,"interim")]') as $n) var_dump($n->getNodePath());

You will get four matches:

  • /html/body/div[1]/p/text()
  • /html/body/div[2]/p/a/text()
  • /html/body/div[2]/p/text()[2]
  • /html/body/div[3]/ul/li/text()
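
The four paths above are text nodes, while the question asks for their parent elements (P P A LI). In XPath 1.0, contains(text(),'interim') converts the text() node-set to the string value of its first node only, which is likely why the second P is missed: its first text child is "During the ". Below is a minimal sketch, assuming the same markup as in the question, that moves the test into a predicate on each text child. The helper name matchedNames is hypothetical and exists only for this illustration.

```php
<?php
// Same markup as in the question.
$html = <<<HTML
<html>
 <body>
  <div>
   <p>During the interim there shall be nourishment supplied</p>
  </div>
  <div>
   <p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p>
  </div>
  <div>
   <ul><li>During the interim there shall be nourishment supplied</li></ul>
  </div>
 </body>
</html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper (not part of the DOM extension): collects the
// node names of everything a query matches, in document order.
function matchedNames(DOMXPath $xpath, string $query): array
{
    $names = [];
    foreach ($xpath->query($query) as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

// Query from the question: only the FIRST text child of each element is tested.
$original = matchedNames($xpath, "//*[contains(text(),'interim')]");

// Alternative: test EVERY text child, and keep the element that owns it.
$fixed = matchedNames($xpath, "//*[text()[contains(.,'interim')]]");

echo implode(' ', $original), "\n"; // p a li
echo implode(' ', $fixed), "\n";    // p p a li
```

An equivalent formulation should be //text()[contains(.,'interim')]/.., which starts from the matching text nodes and steps up to each parent. Either way an element is returned only once, even if several of its text children contain the string.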