Fetch pagination links

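I'm new to PHP. What I want to do is fetch the pagination links. The page has page numbers, and of course the link changes as we select a page. How can I get the pagination URLs while staying on the main page http://ahadith.co.uk/sahihmuslim.php?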

<?php
        // Fetch the main page of the site
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, "http://ahadith.co.uk/sahihmuslim.php");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        $output = curl_exec($ch);
        curl_close($ch);

        // This regex extracts the chapter links (href='...?cid=N') from the main page
        $pattern = "/href=[']([^'][a-zA-Z]+.[a-zA-Z]+.[cid]+=[0-9]+)[']?/";
        preg_match_all($pattern, $output, $matches, PREG_PATTERN_ORDER);

        foreach ($matches[1] as $data) {
            // Download each chapter page found above into $homepage
            $homepage = file_get_contents('http://ahadith.co.uk/' . $data);

            // Extract the chapter titles between <h2> and </h2> from $homepage
            $pattern_chapter = '/(?<=<h2>)\s*(.*?)\s*(?=<\/h2>)/';
            preg_match_all($pattern_chapter, $homepage, $matches_chapter, PREG_PATTERN_ORDER);
            foreach ($matches_chapter[1] as $chapters) {
                echo $chapters, PHP_EOL;
            }
        }
?>

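Now I need to get the pagination links from the data stored in $homepage. In this case there are 44 pages, and I want the links to all 44 of them. This is the regex I use to match the pagination links:

http:\/\/([a-zA-Z]+.[a-zA-Z]+.[a-zA-Z]+.[a-zA-Z]+.[a-zA-Z]+.[cid]+=[0-9]&[a-zA-Z]+=[0-9]&[a-zA-Z]+=[0-9]+)

I have searched in many places but couldn't find anything related to this. Can anyone help?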
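If you stay with regular expressions, a less brittle approach than spelling out every `[a-zA-Z]+.` segment is to match any href value that contains `cid=<digits>&...`. The sketch below assumes, based only on the regex in the question, that the pagination links carry `cid` plus further `&`-separated query parameters; the function name `extractPaginationLinks` and the parameter name `page` in the sample HTML are hypothetical. It also decodes `&amp;`, which is how `&` usually appears in HTML source, and drops duplicates such as a "Next" link pointing at an already listed page.

```php
<?php
/**
 * Hypothetical helper: collect the distinct pagination links from a page's HTML.
 * Assumption: pagination hrefs contain "cid=<digits>" followed by more
 * "&"-separated query parameters, as the regex in the question suggests.
 */
function extractPaginationLinks(string $html): array
{
    // Match href="..." or href='...' whose value contains cid=<digits>&...
    $pattern = '/href\s*=\s*["\']([^"\']*\bcid=\d+&[^"\']*)["\']/i';
    preg_match_all($pattern, $html, $matches);

    $links = [];
    foreach ($matches[1] as $href) {
        // In HTML source "&" is usually written as "&amp;"; decode it
        $links[] = html_entity_decode($href, ENT_QUOTES | ENT_HTML5);
    }
    // The same link can appear twice (e.g. "2" and "Next"); keep the first one
    return array_values(array_unique($links));
}

// Usage with a made-up HTML fragment (file and parameter names are hypothetical)
$sample = '<a href="chapter.php?cid=1&amp;page=1">1</a> '
        . "<a href='chapter.php?cid=1&amp;page=2'>2</a> "
        . '<a href="chapter.php?cid=1&amp;page=2">Next</a>';
print_r(extractPaginationLinks($sample));
```

In the question's loop you would call `extractPaginationLinks($homepage)` and prefix each result with `http://ahadith.co.uk/` if the hrefs turn out to be relative.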

Use "HtmlPageDom". It is a third-party library for easily manipulating HTML documents through the DOM. You can extract any kind of data you need from any page.

https://github.com/wasinger/htmlpagedom/blob/master/README.md
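The same DOM-based idea can be tried without installing anything, using PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`). This is a swapped-in technique, not HtmlPageDom's own API (see its README for that). The sketch walks every `<a>` element that has an `href` and keeps those whose value contains a given substring; the function name `findLinks` and the `'cid='` filter are illustrative assumptions.

```php
<?php
/**
 * Hypothetical helper using PHP's built-in DOM extension (not HtmlPageDom).
 * Returns the href of every <a> whose href contains $needle, without duplicates.
 */
function findLinks(string $html, string $needle): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        // The parser has already decoded entities such as &amp; in the attribute
        $href = $a->getAttribute('href');
        if (strpos($href, $needle) !== false && !in_array($href, $links, true)) {
            $links[] = $href;
        }
    }
    return $links;
}

// Usage with a made-up fragment; in the question this would be $homepage
$html = '<div class="pages"><a href="chapter.php?cid=1&amp;page=1">1</a>'
      . '<a href="chapter.php?cid=1&amp;page=2">2</a>'
      . '<a href="about.php">About</a></div>';
print_r(findLinks($html, 'cid='));
```

Working on the parsed DOM means the result does not depend on attribute quoting, whitespace, or the order of query parameters, which is what makes the hand-written pagination regex fragile. Relative hrefs still need the `http://ahadith.co.uk/` prefix, as in the question's code.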