How to find specific data using Simple HTML DOM in PHP

When I scrape the table, the tr and td values change from page to page. Below are the original tables.

<table class="scoretable">
<tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Location</td><td class="fullhead">Madrid</td></tr>
<tr><td class="jdhead">Country</td><td class="fullhead">Spain</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody>
</table>
<table class="scoretable">
<tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody>
</table>

The two tables above come from different pages. I need to scrape the Name, Phone and Role values.

$url = "http://name.com/listings";
$html = file_get_html( $url );
$posts1 = $html->find('td[class=fullhead]',1);
foreach ( $posts1 as $post1 ) {
    $poster1 = $post1->outertext;
    echo $poster1;
}

I would try to preg_match the required values out of the HTML, like this:

<?php
$url = 'http://name.com/listings';
$html = file_get_contents($url);
if (preg_match('~<tr><td class="jdhead">Name</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
    echo $matches[1]; // here is your name
}
if (preg_match('~<tr><td class="jdhead">Phone</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
    echo $matches[1]; // here is your phone
}
if (preg_match('~<tr><td class="jdhead">Role</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
    echo $matches[1]; // here is your role
}

Update (see the comments below):

<?php
$url = 'http://jobsearch.naukri.com/job-listings-010915006292';
$html = file_get_contents($url);
if (preg_match('~<TR VALIGN="top"> <TD CLASS="jdHead">Job Posted </TD> <TD VALIGN="top" CLASS="detailJob">([^<]*)</TD> </TR>~', $html, $matches)) {
    echo 'Job Posted: ' . $matches[1] . '<br><br>';
}

if (preg_match('~<TR VALIGN="top"> <TD CLASS="jdHead">Job Description</TD> <TD VALIGN="top" CLASS="detailJob">(.*?)</TD> </TR>~', $html, $matches)) {
    echo 'Job Description: ' . $matches[1] . '<br><br>';
}
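
Both snippets above hard-code the exact markup of one page, so a change in attributes, letter case or spacing breaks the match. A minimal sketch of a more tolerant variant follows, assuming each label cell is directly followed by its value cell. The helper name extract_field and the inline sample HTML are hypothetical and stand in for the fetched page.

```php
<?php
// Hypothetical helper: returns the text of the cell that follows the cell
// containing $label, or null when the label is not on the page.
function extract_field(string $html, string $label): ?string
{
    // Accept any attributes on the <td> tags, any letter case (i flag),
    // whitespace between tags, and multi-line cell content (s flag).
    $pattern = '~<td[^>]*>\s*' . preg_quote($label, '~')
             . '\s*</td>\s*<td[^>]*>(.*?)</td>~is';
    if (preg_match($pattern, $html, $matches)) {
        // Drop nested markup and decode entities before returning the value.
        return trim(html_entity_decode(strip_tags($matches[1])));
    }
    return null;
}

// Inline sample (an assumption, not real site output). The Phone row mimics
// the upper-case, space-separated markup from the update above.
$html = '<table class="scoretable"><tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<TR VALIGN="top"> <TD CLASS="jdHead">Phone </TD> <TD VALIGN="top" CLASS="fullhead">91234988788</TD> </TR>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody></table>';

$result = [];
foreach (['Name', 'Phone', 'Role', 'Location'] as $label) {
    $result[$label] = extract_field($html, $label);
}
print_r($result);
```

A label that is missing from a page, such as Location here, comes back as null instead of raising an error, which suits tables whose rows vary between pages. Regular expressions remain fragile on nested or malformed HTML, so treat this as a quick fix rather than a parser.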

I have a solution that should work for you, for example:

<?php
// load the saved HTML (tabledata.html holds the tables from the question)
$doc = new DOMDocument();
$doc->loadHTMLFile("tabledata.html");
// labels of the rows we want
$required_data = ['Name', 'Phone', 'Role'];
$tbody_elements = $doc->getElementsByTagName('tbody');
// xpath object
$xpath = new DOMXPath($doc);
// array for final data
$finaldata = [];
// each tbody is one user
foreach ($tbody_elements as $key => $tbody)
{
    // iterate through the required labels
    foreach ($required_data as $data)
    {
        // rows of this tbody that contain a td whose text equals the label
        $return = $xpath->query("tr[td[text()='$data']]", $tbody);
        foreach ($return as $node)
        {
            // textContent of the whole row: label and value joined, e.g. "NameJohn"
            $finaldata[$key][$data] = $node->textContent;
        }
    }
}
var_dump($finaldata);

Output:

array(2) {
  [0]=>
  array(3) {
    ["Name"]=>
    string(8) "NameJohn"
    ["Phone"]=>
    string(16) "Phone91234988788"
    ["Role"]=>
    string(11) "RoleManager"
  }
  [1]=>
  array(3) {
    ["Name"]=>
    string(8) "NameJohn"
    ["Phone"]=>
    string(16) "Phone91234988788"
    ["Role"]=>
    string(11) "RoleManager"
  }
}
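
In the output above each value still carries its label ("NameJohn"), because textContent is read from the whole row. A minimal sketch of one way to get only the value follows, assuming the value cell is always the td right after the label cell. The inline HTML is a copy of one table from the question so that the example runs without tabledata.html.

```php
<?php
// Inline copy of one table from the question (stands in for the real page).
$html = '<table class="scoretable"><tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$values = [];
foreach (['Name', 'Phone', 'Role'] as $label) {
    // Find the label cell, then step to the first td that follows it in the same row.
    $cells = $xpath->query(
        "//td[normalize-space(text())='$label']/following-sibling::td[1]"
    );
    if ($cells->length > 0) {
        $values[$label] = trim($cells->item(0)->textContent);
    }
}
var_dump($values);
```

For this sample, $values should hold John, 91234988788 and Manager under the keys Name, Phone and Role. Because the lookup goes by label text rather than by row position, it keeps working when rows such as Location or Country are present on one page and absent on another.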