XPath: preserving line breaks and other HTML tags

Here is the source of the HTML page:

<h3>Background</h3>
<p>Example 1<br>Example 2<br> </br> <ul></li>ABC<li></ul>
</p>
<h3>Job Description</h3>
<p>content of job description</p>
This is the XPath query:
//node()[preceding::h3[text()="Background"] and following-sibling::h3[text()="Job Description"]]

I need output like this:

<p>Example 1<br>Example 2<br> </br> <ul></li>ABC<li></ul>
    </p>

With Simple HTML DOM, you would do the following:

$html = str_get_html($str);
foreach($html->find('h3') as $h3){
  if($h3->text() == 'Background'){
    echo $h3->next_sibling();
  }
}
// <p>Example 1<br>Example 2<br> </br> <ul></li>ABC<li></ul>  </p>

DOMXPath can't get you there because the HTML is too invalid (a ul inside a p).

This line fixed the code. It now preserves the line-break tags and the <li> tags.

//node()[preceding::h3[text()="Background"] and following-sibling::h3[text()="Job Description"]]/node()

I added /node() to the end of the string.
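
For reference, here is a minimal sketch of how that query might be run with PHP's built-in DOMDocument and DOMXPath while keeping the tags in the output. The helper name extractBetweenHeadings and the tidied-up sample markup are assumptions made for illustration, not part of the original post; the markup was cleaned up because how libxml repairs the original invalid nesting can differ. The sketch also assumes each matched node is serialized with saveHTML($node), since reading nodeValue would drop the tags.

```php
<?php
// Hypothetical helper (not from the original post): returns the markup found
// between the "Background" and "Job Description" headings.
function extractBetweenHeadings(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $query = '//node()[preceding::h3[text()="Background"]'
           . ' and following-sibling::h3[text()="Job Description"]]/node()';

    $out = '';
    foreach ($xpath->query($query) as $node) {
        // saveHTML($node) serializes the node itself, tags included;
        // nodeValue would return only its text content.
        $out .= $dom->saveHTML($node);
    }
    return $out;
}

// Tidied-up version of the sample markup (an assumption for this sketch).
$sample = '<h3>Background</h3>'
        . '<p>Example 1<br>Example 2<br></p>'
        . '<ul><li>ABC</li></ul>'
        . '<h3>Job Description</h3>'
        . '<p>content of job description</p>';

echo extractBetweenHeadings($sample), "\n";
```

Because of the trailing /node(), the query returns the children of each matched block rather than the blocks themselves, so the wrapping <p> and <ul> are not part of the result. Dropping /node() and serializing the matched nodes the same way should keep those wrappers.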