Can't get exact values from another page

Updated: 2023-09-27

I'm trying to get the league table from this page: http://www.skysports.com/football/competitions/bundesliga/table. I'm using

$bundes = file('http://www.skysports.com/football/competitions/bundesliga/table');

To display the array $bundes, I do this:

echo '<pre>', print_r($bundes), '</pre>';

The output of that code looks like this:

[1437] => 
[1022] => German Bundesliga 2015/16
#   Team    Pl  W   D   L   F   A   GD  Pts Last 6
1   [1059] => [1060] => Bayern Munich [1061] => [1062] =>   9   9   0   0   29  4   25  27  [1072] =>
[1073] =>
[1074] =>

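This is the first row of the table. I can display $bundes[1060] and get Bayern Munich as output, but how can I get the values from $bundes[1062], which are 9, 9, 0, 0, 29, 4, 25 and 27? I need to display these values in <td></td>. When I try to echo $bundes[1062], I get nothing.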
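A likely explanation (an assumption, not confirmed in the thread) is that file() returns the page's raw HTML split into lines, so each array element holds markup rather than plain text. When the dump is echoed to a browser, the tags are rendered instead of shown, which can make the numbers appear detached from their keys and make an element that holds only a tag look empty. The minimal sketch below uses a hypothetical two-element stand-in for $bundes (the real page's markup is not reproduced) to show how to inspect the raw lines and strip the markup from one of them.

```php
<?php
// Hypothetical stand-in for a few lines returned by file();
// the real page's markup may differ.
$bundes = [
    "<td>9</td>\n",
    "<a href=\"#\">Bayern Munich</a>\n",
];

// print_r(..., true) returns the dump as a string instead of printing it
// directly; htmlspecialchars() makes the browser display the tags as text
// instead of rendering them.
$dump = htmlspecialchars(print_r($bundes, true));
echo '<pre>', $dump, '</pre>';

// strip_tags() removes the markup, leaving only the text of one line.
$value = trim(strip_tags($bundes[0]));
echo $value, "\n"; // prints 9
```

Picking values out by line number is still fragile, since any change to the page's layout shifts the indexes; parsing the HTML structure, as the answer does, avoids that.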

A more reliable way to extract the data is to use the DOM classes, like this:

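$doc = new \DOMDocument();
// The page's HTML is not well-formed; collect libxml warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTMLFile('http://www.skysports.com/football/competitions/bundesliga/table');
libxml_clear_errors();
$xpath = new \DOMXPath($doc);
// Each table body row holds one team
$rows = $xpath->query('//tbody/tr');
$data = [];
foreach ($rows as $i => $row) {
    // Collect the trimmed text of every cell in the row
    $columns = $xpath->query('td', $row);
    foreach ($columns as $column) {
        $data[$i][] = trim($column->textContent);
    }
}
print_r($data);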

This gives:

Array
(
    [0] => Array
        (
            [0] => 1
            [1] => Bayern Munich
            [2] => 9
            [3] => 9
            [4] => 0
            [5] => 0
            [6] => 29
            [7] => 4
            [8] => 25
            [9] => 27
            [10] => 
        )
...
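The question also asks how to show these values inside <td></td>. A minimal sketch, assuming the page still puts one team per <tbody> row with the columns in the order listed in the question (#, Team, Pl, W, D, L, F, A, GD, Pts, Last 6): it parses a hypothetical inline fixture so it runs offline, labels the first ten cells with illustrative keys (pos, team, pl and so on, which are not names used by the site), and echoes each row back out as escaped <td> cells.

```php
<?php
// Hypothetical fixture: a trimmed-down copy of one league-table row.
// The real page's markup may differ; adjust the XPath to match it.
$html = <<<HTML
<table>
  <thead><tr><th>#</th><th>Team</th><th>Pl</th><th>W</th><th>D</th><th>L</th><th>F</th><th>A</th><th>GD</th><th>Pts</th><th>Last 6</th></tr></thead>
  <tbody>
    <tr><td>1</td><td> Bayern Munich </td><td>9</td><td>9</td><td>0</td><td>0</td><td>29</td><td>4</td><td>25</td><td>27</td><td></td></tr>
  </tbody>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect malformed-HTML warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Illustrative labels for the first ten columns; "Last 6" is dropped.
$keys = ['pos', 'team', 'pl', 'w', 'd', 'l', 'f', 'a', 'gd', 'pts'];

$table = [];
foreach ($xpath->query('//tbody/tr') as $row) {
    $cells = [];
    foreach ($xpath->query('td', $row) as $td) {
        $cells[] = trim($td->textContent);
    }
    // Skip rows that do not have enough cells to be a team row.
    if (count($cells) < count($keys)) {
        continue;
    }
    $table[] = array_combine($keys, array_slice($cells, 0, count($keys)));
}

// Re-emit each team as a table row, escaping the cell text.
foreach ($table as $team) {
    echo '<tr>';
    foreach ($team as $value) {
        echo '<td>', htmlspecialchars($value), '</td>';
    }
    echo "</tr>\n";
}
```

To run it against the live page, replace loadHTML($html) with loadHTMLFile() on the Sky Sports URL, as in the answer's code. Named keys make the output loop independent of positional indexes, so a single value such as $table[0]['pts'] can also be printed on its own.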

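Regarding Dagon's comment: there is no clause prohibiting scraping and extracting the data (as long as you do it at a reasonable rate that doesn't affect the site's performance). However, the terms of use and copyright law do govern what you can and can't do with the scraped content (e.g. republishing it).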

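Web scraping may be against the terms of use of some websites. The enforceability of these terms is unclear (see "FAQ about linking – Are website terms of use binding contracts?").
- Wikipedia, Web scraping: Legal issues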

By the way, the page's robots meta tag allows INDEX.