PHP Simple HTML DOM counts the wrong number of elements

With this code I want to count the number of elements (dt) with class "level3" inside a particular node:

include_once('simple_html_dom.php');
ini_set("memory_limit", "-1");
ini_set('max_execution_time', 1200);
function parseInit($url) {
  $ch = curl_init();
  $timeout = 0;
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);     
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); 
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}
$data = parseInit("https://www.smile-dental.de/index.php");
$html = new simple_html_dom();
$html = $html->load($data);
$struct = $html->find("dt.level1", 0)->next_sibling()->find("dt.level2", 0)->next_sibling()->find("dt.level3");
echo count($struct);
$html->clear();  
unset($html);

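But the result is not what I expect. The count should be 2, but I get 53 (the total number of dt elements with class "level3" that belong to the first dt node with class "level1"). Can you help me understand where the problem is?

Thanks in advance!

---EDIT--- In general, I want to build the hierarchy of links from the left navigation bar. I wrote the function below, but it does not work correctly, perhaps because of the issue described above. Apart from that, there may be other problems in the code.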

function get_links($struct) {
    static $iter = 1;
    $nav_left_links = $struct->find("dt.level".$iter);
    echo "<ul>";   
    foreach ($nav_left_links as $links) {
        echo "<li>".$links->find("a", 0)->href;
        echo str_pad('',4096)."\n";
        ob_flush();
        flush();
        usleep(500000);
        $iter++;
        if ($links->next_sibling() && count($links->next_sibling()->find("dt")) > 0) {
            get_links($links->next_sibling());
        } else {
            $iter--;
            if ($key == count($nav_left_links)) {
                break;
            } else {
                continue;   
            }
        }
        echo "</li>";  
    }
    echo "</ul>";
    $iter--;
}
$data = parseInit("https://www.smile-dental.de/index.php");
$html = new simple_html_dom();
$html = $html->load($data);
$struct = $html->find(".mod_vertical_dropmenu_142_inner", 0);
get_links($struct);
$html->clear();  
unset($html); 

Alternatively, if anyone knows how to rewrite this code without PHP Simple HTML DOM, using classic parsing methods, I would be grateful.
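
One way to approach this without Simple HTML DOM is PHP's bundled DOM extension (DOMDocument and DOMXPath). The following is a minimal sketch, not a verified solution for the live site: it assumes the menu is built from nested dl lists in which each dt is immediately followed by a dd holding its submenu, as the traversal in the first snippet suggests. The helper names classTest, firstMatch and countLevel3, and the sample markup, are made up for illustration.

```php
<?php
// Builds an XPath test that matches one class name inside a multi-class attribute.
function classTest(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Returns the first node matched by $query (relative to $context if given), or null.
function firstMatch(DOMXPath $xpath, string $query, ?DOMNode $context = null): ?DOMNode
{
    $nodes = $xpath->query($query, $context);
    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0) : null;
}

// Counts dt.level3 elements under the first dt.level1 / first dt.level2 branch.
function countLevel3(string $markup): int
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // First dt.level1 in the document, then the element right after it (assumed to be its dd).
    $level1 = firstMatch($xpath, '(//dt[' . classTest('level1') . '])[1]/following-sibling::*[1]');
    if ($level1 === null) {
        return 0;
    }

    // Same step one level down, searching only inside the level1 container.
    $level2 = firstMatch($xpath, '(.//dt[' . classTest('level2') . '])[1]/following-sibling::*[1]', $level1);
    if ($level2 === null) {
        return 0;
    }

    $level3 = $xpath->query('.//dt[' . classTest('level3') . ']', $level2);
    return $level3 === false ? 0 : $level3->length;
}

// Illustrative markup only; the real page structure may differ.
$sample = <<<HTML
<dl>
  <dt class="level1 first">A</dt>
  <dd class="level1">
    <dl>
      <dt class="level2">A1</dt>
      <dd class="level2">
        <dl>
          <dt class="level3">A1a</dt><dd class="level3"></dd>
          <dt class="level3">A1b</dt><dd class="level3"></dd>
        </dl>
      </dd>
      <dt class="level2">A2</dt>
      <dd class="level2">
        <dl><dt class="level3">A2a</dt><dd class="level3"></dd></dl>
      </dd>
    </dl>
  </dd>
</dl>
HTML;

echo countLevel3($sample), "\n"; // 2 for this sample: only the A1 branch is counted
```

Because every query after the first is evaluated relative to a context node, the count stays inside the selected branch instead of spilling over to sibling branches. To try it on the real page, the string returned by parseInit() would be passed to countLevel3() in place of $sample.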

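Unfortunately, it seems you have found a bug. I did some experiments, and even after correcting the validation errors, Simple HTML DOM could not traverse the dl/dt/dd elements correctly. When I used a regex to convert all dl elements to ul, and the dd and dt elements to li, I did get it to work:

$html->find("li.level1", 1)->find("li.level2", 1)->find("li.level3"); results in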

<li class="level3 off-nav-321-8120 notparent first"><span class="outer"> <span class="inner"> <a href="/index.php?option=com_virtuemart&amp;view=productdetails&amp;virtuemart_category_id=321&amp;virtuemart_product_id=8120"><span>Pro-Seal Versiegeler</span></a> </span> </span></li>
<li class="level3 off-nav-321-8120 notparent first"></li>
<li class="level3 off-nav-321-8122 notparent last"><span class="outer"> <span class="inner"> <a href="/index.php?option=com_virtuemart&amp;view=productdetails&amp;virtuemart_category_id=321&amp;virtuemart_product_id=8122"><span>Pro-Seal L.E.D. Versiegeler</span></a> </span> </span></li>
<li class="level3 off-nav-321-8122 notparent last"></li>
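
The regex conversion described above could look roughly like the following. This is only a sketch of the idea: the function name convertDlToUl and the exact patterns are assumptions, not the code that produced the output shown, and rewriting HTML with regular expressions is only safe for simple, predictable markup such as this menu.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): rewrites definition-list
// tags into ul/li tags so the parser sees an ordinary nested list.
function convertDlToUl(string $markup): string
{
    // <dl ...> and </dl> become <ul ...> and </ul>; attributes are left untouched.
    $markup = preg_replace('~<(/?)dl\b~i', '<${1}ul', $markup);
    // <dt ...>, </dt>, <dd ...> and </dd> all become <li ...> and </li>.
    return preg_replace('~<(/?)d[td]\b~i', '<${1}li', $markup);
}

// Illustrative input only.
$converted = convertDlToUl('<dl><dt class="level1">A</dt><dd class="level1"></dd></dl>');
echo $converted, "\n";
```

The converted string can then be handed to Simple HTML DOM, for example with str_get_html($converted), before running the li-based selectors. Since both dt and dd become li and keep their class attributes, each menu entry yields two li elements, which matches the empty li elements in the output above.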