HTML DOM parser: extracting an href from a span's sibling

This is my HTML file. It contains dates and links inside <span> tags in a table. Can someone help me find the link for a specific date?

<table>
<tbody>
<tr class="c0">
<td class="c11">
<td class="c8">
<ul class="c2 lst-kix_h6z8amo254ry-0 start">
<li class="c1">
<span>1st Apr 2014 - </span>
<span class="c6"><a class="c4" href="/link.html">View</a>
</span>
</li>
</ul>
</td>
</tr>
</td>
</table>

I want to retrieve the link for a specific date.

My code looks like this:

include('simple_html_dom.php');    
$html = file_get_html('link.html');
//store the links in array
foreach($html->find('span') as $value)
{
    //echo $value->plaintext . '<br />';
    $date = $value->plaintext;
    if (strpos($date,$compare_text)) {
         //$linkeachday = $value->find('span[class=c1]')->href;
        //$day_url[] = $value->href;
        //$day_url = Array("text" => $value->plaintext);
        $day_url = Array("text" => $date, "link" =>$linkeachday);
        //echo $value->next_sibling (a);
    }
}

$spans = $html->find('table',0)->find('li')->find('span');
echo $spans;
 $num = null;
 foreach($spans as $span){
     if($span->plaintext == $compare_text){
        $next_span = $span->next_sibling();
        $num = $next_span->plaintext;
         echo($num);    
        break; 
     }
 }
 echo($num);

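Your last example was essentially correct...

I modified it a little and ended up with the following. It gets all the spans, tests whether each one has the searched text, and if so prints the content of its next sibling (see the comments in the code):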

$input =  <<<_DATA_
    <table>
        <tbody>
            <tr class="c0">
                <td class="c11">
                    <td class="c8">
                        <ul class="c2 lst-kix_h6z8amo254ry-0 start">
                            <li class="c1">
                                <span>1st Apr 2013 - </span>
                                <span>1st Apr 2014 - </span>
                                <span class="c6">
                                    <a class="c4" href="/link.html">View</a>
                                </span>
                                <span>1st Apr 2015 - </span>
                            </li>
                        </ul>
                    </td>
                </td>
            </tr>
        </tbody>
    </table>
_DATA_;
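// Load the Simple HTML DOM library (assumed to sit next to this script)
include('simple_html_dom.php');
// Create a DOM object
$html = new simple_html_dom();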
// Load HTML from a string
$html->load($input);
// Searched value
$searchDate = '1st Apr 2014';
// Find all the spans that are direct children of an li, which is a descendant of table
$spans = $html->find('table li > span');
// Loop through all the spans
foreach ($spans as $span) {
    // If the span starts with the searched text && has a following sibling
    if ( strpos($span->plaintext, $searchDate) === 0 && $sibling = $span->next_sibling()) {
        // Then, print its text content
        echo $sibling->plaintext;    // or ->innertext for raw content
        // And stop (if only one result is needed)
        break;
    }
}

Output:

View

For the string comparison, you can also (and preferably) use a regex...

So, in the code above, add this to build your pattern:

$pattern = sprintf('~^\s*%s~i', preg_quote($searchDate, '~'));

Then use preg_match to test for a match:

if ( preg_match($pattern, $span->plaintext) && $sibling = $span->next_sibling()) {
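
To see what that pattern accepts, here is a small standalone sketch. The function name `startsWithDate` is made up for illustration and is not part of Simple HTML DOM; it only wraps the two lines above so they can be tried on plain strings.

```php
<?php
// Hypothetical helper: true when $text begins with $searchDate,
// ignoring leading whitespace and letter case.
function startsWithDate(string $text, string $searchDate): bool
{
    // preg_quote() escapes regex metacharacters and the '~' delimiter,
    // so the date is matched literally.
    $pattern = sprintf('~^\s*%s~i', preg_quote($searchDate, '~'));
    return preg_match($pattern, $text) === 1;
}

var_dump(startsWithDate("  1st Apr 2014 - ", '1st Apr 2014'));   // bool(true)
var_dump(startsWithDate("1st Apr 2013 - ", '1st Apr 2014'));     // bool(false)
var_dump(startsWithDate("\n\t1ST APR 2014 - ", '1st Apr 2014')); // bool(true)
```

Compared with `strpos(...) === 0`, the `\s*` part tolerates indentation or newlines before the date, which is common when the span text comes from formatted HTML.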

I don't know Simple HTML DOM, but the built-in PHP DOM library should be enough.

Assuming your date is a string like this...

$date = '1st Apr 2014';

With an XPath expression it is easy to find the corresponding link. For example:

$doc = new DOMDocument();
$doc->loadHTMLFile('link.html');
$xp = new DOMXpath($doc);
$query = sprintf('//span[starts-with(., "%s")]/following-sibling::span/a', $date);
$links = $xp->query($query);
if ($links->length) {
    $href = $links->item(0)->getAttribute('href');
}
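
The same idea can be tried without a file on disk. The sketch below is only an illustration: the function name `findLinkForDate` and the sample markup are assumptions, not taken from the question. It also narrows the query to the nearest following span with `span[1]` and trims the span text with `normalize-space()`, which is a small variation on the query above.

```php
<?php
// Hypothetical helper: returns the href that follows the span starting
// with $date, or null when nothing matches.
function findLinkForDate(string $html, string $date): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    // normalize-space() trims whitespace around the span text;
    // span[1] keeps only the nearest following span sibling.
    // Assumption: $date contains no double quote, since it is inlined here.
    $query = sprintf(
        '//span[starts-with(normalize-space(.), "%s")]/following-sibling::span[1]/a',
        $date
    );
    $links = $xp->query($query);

    return $links->length ? $links->item(0)->getAttribute('href') : null;
}

// Sample markup invented for this example
$sample = <<<HTML
<ul>
  <li>
    <span>1st Apr 2013 - </span>
    <span class="c6"><a href="/old.html">View</a></span>
  </li>
  <li>
    <span>1st Apr 2014 - </span>
    <span class="c6"><a href="/link.html">View</a></span>
  </li>
</ul>
HTML;

var_dump(findLinkForDate($sample, '1st Apr 2014')); // string(10) "/link.html"
var_dump(findLinkForDate($sample, '2nd May 2014')); // NULL
```

With `span[1]`, a date span that is followed by another date span rather than a link span yields null instead of picking up a link that belongs to a later date.
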

include('simple_html_dom.php');

$html = file_get_html('link.html');
$compare_text = "1st Apr 2013";

// Collect every span inside the second table (index 1)
$tds = $html->find('table', 1)->find('span');
$num = 0;
foreach ($tds as $td) {
    // Match spans whose text contains the date
    if (strpos($td->plaintext, $compare_text) !== false) {
        $next_td = $td->next_sibling();
        // The link is the <a> inside the next sibling span
        foreach ($next_td->find('a') as $elm) {
            $num = $elm->href;
        }
        //$day_url = array($day => array(daylink => $day, text => $td->plaintext, link => $num));
        echo $td->plaintext . "<br />";
        echo $num . "<br />";
    }
}