Parsing an HTML page that has two different formats for the same elements


Keywords: html, page, elements, two formats | Updated: 2023-09-27

The same HTML page contains the same kind of content in two different formats.

The first is:

<div class="gs"><h3 class="gsr"><a href="http://www.example1.com/">title1</a>

The second is:

<div class="gs"><h3 class="gsr"><span class="gsc"></span><a href="http://www.example2.com/">title2</a>

How can I get the link and the title with a single piece of code that handles both formats using simple_html_dom? I tried this code, but it doesn't work:

foreach($html->find('h3[class=gsr]') as $docLink){
   $link = $docLink->first_child();
   echo $link->plaintext;
   echo $link->href;
}

Use getElementsByTagName($tag);

It locates every occurrence of the specified tag in the DOM.

See the documentation for getElementsByTagName.
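
As a rough sketch of this suggestion, assuming PHP's built-in DOM extension (DOMDocument) is used in place of simple_html_dom, the code below looks up the first <a> anywhere inside each h3 with class gsr. The helper name extractLinks and the closing tags added to the sample markup are assumptions made for illustration, not part of the original question.

```php
<?php
// Hypothetical helper (not from the original post): collect title/href pairs
// from <h3 class="gsr"> elements using PHP's built-in DOM extension.
function extractLinks(string $markup): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $results = [];
    foreach ($doc->getElementsByTagName('h3') as $h3) {
        if ($h3->getAttribute('class') !== 'gsr') {
            continue; // skip headings that are not result titles
        }
        // getElementsByTagName searches all descendants, so a leading
        // <span> before the anchor makes no difference.
        $anchor = $h3->getElementsByTagName('a')->item(0);
        if ($anchor !== null) {
            $results[] = [
                'title' => trim($anchor->textContent),
                'href'  => $anchor->getAttribute('href'),
            ];
        }
    }
    return $results;
}

// Sample markup covering both formats from the question (closing tags assumed).
$sample = '<div class="gs"><h3 class="gsr"><a href="http://www.example1.com/">title1</a></h3></div>'
        . '<div class="gs"><h3 class="gsr"><span class="gsc"></span><a href="http://www.example2.com/">title2</a></h3></div>';

foreach (extractLinks($sample) as $link) {
    echo $link['title'], ' ', $link['href'], PHP_EOL;
}
```

Because the lookup is by tag name rather than by child position, both headings in the sample should yield their anchor.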

From the documentation, there appears to be a concept of descendant selectors:

// Find all <td> in <table> which class=hello 
$es = $html->find('table.hello td');

Then

foreach($html->find('h3[class=gsr] a') as $link) {
   echo $link->plaintext;
   echo $link->href;
}

should do the job. [I don't really know simple_html_dom, by the way ;) Give it a try.]
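
For readers without simple_html_dom installed, the same descendant-selector idea can be sketched with PHP's built-in DOMXPath. This is only an illustration under that assumption: the XPath expression stands in for the 'h3[class=gsr] a' selector, and the names findHeadingLinks and $page are hypothetical.

```php
<?php
// Hypothetical helper (not from the original post): the XPath query plays the
// role of the 'h3[class=gsr] a' descendant selector.
function findHeadingLinks(string $markup): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect markup
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $links = [];
    // The '//' before 'a' matches anchors at any depth below the heading,
    // so it does not matter whether a <span> comes first.
    foreach ($xpath->query('//h3[@class="gsr"]//a') as $a) {
        $links[] = [
            'title' => trim($a->textContent),
            'href'  => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Both formats from the question, with closing tags assumed.
$page = '<div class="gs"><h3 class="gsr"><a href="http://www.example1.com/">title1</a></h3></div>'
      . '<div class="gs"><h3 class="gsr"><span class="gsc"></span><a href="http://www.example2.com/">title2</a></h3></div>';

foreach (findHeadingLinks($page) as $link) {
    echo $link['title'], ' ', $link['href'], PHP_EOL;
}
```

Note that @class="gsr" matches only when the class attribute is exactly gsr, which mirrors the [class=gsr] attribute selector used above.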

Edit

There are also nested selectors:

// Find first <li> in first <ul> 
$e = $html->find('ul', 0)->find('li', 0);

So

foreach($html->find('h3[class=gsr]') as $docTitle) {
   $link = $docTitle->find('a', 0); //get the first anchor tag
   echo $link->plaintext;
   echo $link->href;
}

should also work.