Pull node attribute out as well as node value from td element

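I have the PHP code below, which takes an HTML file, extracts the table from it, then parses the table and returns the cell data as shown in Current Output. I am trying to also get the href attribute in the output, as in the Desired Output snippet. I can't see how to target just the href within a cell when one exists; I only seem to be able to get the node value. Any help is much appreciated.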

Current Output

Array
(
    [0] => Array
        (
            [id] => 213
            [url] => Website
        )
)

Desired Output

Array
(
    [0] => Array
        (
            [id] => 213
            [url] => Website
            [link] => example.com/page/1/
        )
)

HTML

<table>
    <tr>
        <td>213</td>
        <td><a href="example.com/page/1/">Website</a></td>
    </tr>
</table>

PHP

$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = null;
foreach($cols AS $node) {
    $row_headers[] = $node->nodeValue;
}
$table = array();
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach($rows AS $row) {
    $cols = $row->getElementsByTagName('td');
    $row = array();
    $i = 0;
    foreach($cols AS $node) {
        if ($row_headers != null) {
            $row[$row_headers[$i]] = $node->nodeValue;
        }
        $i++;
    }
    if (!empty($row)) {
        $table[] = $row;
    }
}

I tried $row['link'] = $node->getAttribute('href'); inside the nested foreach($cols AS $node) loop, but that does not seem to work either.

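The href attribute belongs to the a element inside the td, not to the td itself, so you need to fetch the a element first and read the attribute from it. See the code below and the inline comments: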

$html = '<table>
    <tr>
        <td>213</td>
        <td><a href="example.com/page/1/">Website</a></td>
    </tr>
    <tr>
        <td>444</td>
        <td><a href="example.org/page/1/">not a website</a></td>
    </tr>
</table>';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // must be set before loading to have any effect
$dom->loadHTML($html); // returns a bool, so do not assign it back to $html
$rows = $dom->getElementsByTagName('tr');
$result = array();
foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    if ($cols->length < 2) {
        continue; // skip rows without the two expected cells (e.g. header rows)
    }
    $id     = $cols->item(0)->nodeValue; // get the id, the first td element, index=0
    $anchor = $cols->item(1)->nodeValue; // get the anchor text, the second td element, index=1
    // get the url from the href attribute of the a element inside the second td
    $link = $cols->item(1)->getElementsByTagName('a')->item(0);
    $url  = $link !== null ? $link->getAttribute('href') : null; // null when the cell has no link
    $result[] = array(
        'id' => $id,
        'anchor' => $anchor,
        'url' => $url,
    );
}
print_r($result);

This should output:

Array
(
    [0] => Array
        (
            [id] => 213
            [anchor] => Website
            [url] => example.com/page/1/
        )
    [1] => Array
        (
            [id] => 444
            [anchor] => not a website
            [url] => example.org/page/1/
        )
)
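
If you would rather keep the key names from the question (id, url, plus link), the same idea can be folded into a generic loop over the cells. The following is only a sketch under some assumptions: extractRows is a hypothetical helper name, the column keys are passed in by hand instead of being read from th cells, and only the first a element in each cell is considered. Calling getAttribute('href') directly on the td returns an empty string because the td has no such attribute, which is why the sketch looks up the a element first.

```php
<?php
// Hypothetical helper (not from the original post): returns one array per
// table row, keyed by $keys, plus a 'link' entry when a cell contains a link.
function extractRows(string $html, array $keys): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = array();
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $row = array();
        $i = 0;
        foreach ($tr->getElementsByTagName('td') as $td) {
            // Fall back to the numeric index when no key was supplied.
            $key = isset($keys[$i]) ? $keys[$i] : $i;
            $row[$key] = trim($td->nodeValue);

            // The href sits on the <a> child, so look that element up first.
            $anchor = $td->getElementsByTagName('a')->item(0);
            if ($anchor !== null && $anchor->hasAttribute('href')) {
                $row['link'] = $anchor->getAttribute('href');
            }
            $i++;
        }
        if (!empty($row)) {
            $table[] = $row; // rows without td cells are skipped
        }
    }
    return $table;
}

$html = '<table><tr><td>213</td><td><a href="example.com/page/1/">Website</a></td></tr></table>';
$rows = extractRows($html, array('id', 'url'));
print_r($rows);
```

For the sample row this should give id, url and link entries matching the Desired Output in the question. If a row can contain more than one linked cell, the single link key would be overwritten by the last one, so you may want a per-column key instead.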