PHP: regex or HTML DOM parsing?



I'm using regex to parse HTML, but I need your help parsing the following table:

            <table class="resultstable" width="100%" align="center">
                <tr>
                    <th width="10">#</th>
                    <th width="10"></th>
                    <th width="100">External Volume</th>
                </tr>                   
                <tr class='odd'>
                        <td align="center">1</td>
                        <td align="left">
                            <a href="#" title="http://xyz.com">http://xyz.com</a>
                            &nbsp;
                        </td>
                        <td align="right">210,779,783<br />(939,265&nbsp;/&nbsp;499,584)</td>
                    </tr>
                    <tr class='even'>
                        <td align="center">2</td>
                        <td align="left">
                            <a href="#" title="http://abc.com">http://abc.com</a>
                            &nbsp;
                        </td>
                        <td align="right">57,450,834<br />(288,915&nbsp;/&nbsp;62,935)</td>
                    </tr>
            </table>

I want to get the volume for every domain (in an array or a variable), for example:

http://xyz.com - 210,779,783

Should I use regex or an HTML DOM parser in this case? I don't know how to parse a large table like this. Can you help? Thanks.

Here is an XPath example, using PHP's DOMDocument and DOMXPath, that parses exactly the HTML in the question.

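<?php
$dom = new DOMDocument();
// Silence libxml warnings about non-standard markup in the page.
libxml_use_internal_errors(true);
$dom->loadHTMLFile("./input.html");
$xpath = new DOMXPath($dom);
// Rows of the table with class "resultstable".
$trs = $xpath->query("//table[@class='resultstable'][1]/tr");
foreach ($trs as $tr) {
  // The domain is the link text in the 2nd cell; the header row has no link, so skip it.
  $tdList = $xpath->query("td[2]/a", $tr);
  if ($tdList->length == 0) continue;
  $name = $tdList->item(0)->nodeValue;
  // The total volume is the first child (a text node) of the 3rd cell, before the <br />.
  $tdList = $xpath->query("td[3]", $tr);
  $vol = $tdList->item(0)->childNodes->item(0)->nodeValue;
  echo "name: {$name}, vol: {$vol}\n";
}
?>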
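The code above echoes each pair, but the question asks for the results in an array. The following is a minimal sketch that collects the same values into an associative array keyed by domain. It assumes the markup matches the sample (the link in the second cell, the total volume as the first text node of the third cell), and it parses an inline string with loadHTML() instead of reading ./input.html. The function name extractVolumes is hypothetical.

```php
<?php
// Hypothetical helper: returns [domain => volume string] from the results table.
function extractVolumes(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $volumes = [];
    // Only data rows have a link in the 2nd cell, so the header row is excluded.
    foreach ($xpath->query("//table[@class='resultstable']//tr[td[2]/a]") as $tr) {
        $name = trim($xpath->query("td[2]/a", $tr)->item(0)->textContent);
        // First text node of the 3rd cell: the total, before the <br />.
        $vol = trim($xpath->query("td[3]/text()[1]", $tr)->item(0)->nodeValue);
        $volumes[$name] = $vol;
    }
    return $volumes;
}

// Sample input copied from the question.
$html = <<<HTML
<table class="resultstable" width="100%" align="center">
  <tr><th width="10">#</th><th width="10"></th><th width="100">External Volume</th></tr>
  <tr class='odd'>
    <td align="center">1</td>
    <td align="left"><a href="#" title="http://xyz.com">http://xyz.com</a>&nbsp;</td>
    <td align="right">210,779,783<br />(939,265&nbsp;/&nbsp;499,584)</td>
  </tr>
  <tr class='even'>
    <td align="center">2</td>
    <td align="left"><a href="#" title="http://abc.com">http://abc.com</a>&nbsp;</td>
    <td align="right">57,450,834<br />(288,915&nbsp;/&nbsp;62,935)</td>
  </tr>
</table>
HTML;

$volumes = extractVolumes($html);
foreach ($volumes as $name => $vol) {
    // Prints lines in the format asked for, e.g. "http://xyz.com - 210,779,783".
    echo "{$name} - {$vol}\n";
}
```

The volumes are kept as strings with their thousands separators. If you need numbers, strip the commas first, for example (int) str_replace(',', '', $vol).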