Extracting Link Text From Specific Links


Keywords: extracting link text | Updated: 2023-09-27

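I would like to know how to get only the movie titles from this page.

I have the code below, but I can't get it to work, and I don't know DomDocument very well. It fetches every link on the page, whereas I only need the links for the listed movie titles.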

$content =  file_get_contents("http://www.imdb.com/movies-in-theaters/");
$dom = new DomDocument();
$dom->loadHTML($content);
$urls = $dom->getElementsByTagName('a');

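$dom = new DomDocument();
// "@" suppresses the warnings libxml raises for malformed HTML on the live page
@$dom->loadHTMLFile('http://www.imdb.com/movies-in-theaters/');
$urls = $dom->getElementsByTagName('a');
$titles = array();
foreach ($urls as $url)
{
    // Keep only links whose grandparent element has the class "overview-top"
    $grandparent = $url->parentNode->parentNode;
    if ($grandparent instanceof DOMElement && 'overview-top' === $grandparent->getAttribute('class'))
    {
        $titles[] = $url->nodeValue;
    }
}
print_r($titles);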

This will output:

Array
(
    [0] =>  Star Trek Into Darkness (2013)
    [1] =>  Frances Ha (2012)
    [2] =>  Stories We Tell (2012)
    [3] =>  Erased (2012)
    [4] =>  The English Teacher (2013)
    [5] =>  Augustine (2012)
    [6] =>  Black Rock (2012)
    [7] =>  State 194 (2012)
    [8] =>  Iron Man 3 (2013)
    [9] =>  The Great Gatsby (2013)
    [10] =>  Pain & Gain (2013)
    [11] =>  Peeples (2013)
    [12] =>  42 (2013)
    [13] =>  Oblivion (2013)
    [14] =>  The Croods (2013)
    [15] =>  The Big Wedding (2013)
    [16] =>  Mud (2012)
    [17] =>  Oz the Great and Powerful (2013)
)

You could also do this with XPath, but I don't know it well enough to show how.
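
For what it's worth, here is a rough sketch of what the XPath route might look like. It is not tested against the live page. The inline $html sample is hypothetical markup that only assumes what the loop above assumes: each title link sits two levels below an element whose class is exactly "overview-top". The hrefs, names and titles in it are made up for illustration.

```php
<?php
// Hypothetical sample markup standing in for the real page (an assumption,
// not the actual IMDb HTML, which may have changed).
$html = <<<'HTML'
<html><body>
  <a href="/">Home</a>
  <div class="overview-top">
    <h4><a href="/title/tt0000001/">Example Movie One (2013)</a></h4>
    <div class="outline"><span><a href="/name/nm0000001/">Some Director</a></span></div>
  </div>
  <div class="overview-top">
    <h4><a href="/title/tt0000002/">Example Movie Two (2012)</a></h4>
  </div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$titles = array();
// Select <a> elements whose grandparent has class "overview-top",
// mirroring the parentNode->parentNode check in the loop version.
foreach ($xpath->query('//*[@class="overview-top"]/*/a') as $link) {
    $titles[] = trim($link->nodeValue);
}
print_r($titles);
```

To try it on the real page, swap $dom->loadHTML($html) for the loadHTMLFile() call used earlier. Note that @class="..." is an exact match on the whole attribute value, so it behaves like the === comparison above and would miss elements that carry additional classes.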