Not able to scrape more than 90 pages at a time using Curl and PHP?


Updated: 2023-09-27

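I built a scraper for one of my clients. It mainly scrapes a few partner websites and collects data from them. The scraper works very well up to 90 pages, but once it reaches 90 it keeps scraping the same page over and over. I'm really confused about why this happens.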

Can anyone help me with this?

function getFlightCharges() {
    $baseHeaders = requestHeaderProperties ();
    $sql = 'select * from F_charge where enabled = 0';
    $details = DatabaseHandler::GetAll ( $sql );
    foreach ( $details as $detail ) {
        $url = $detail ['product_link'];
        // Copy the base headers for each request so the Referer header
        // is not appended again on every iteration.
        $requestHeaders = $baseHeaders;
        $requestHeaders [] = 'Referer: example.com/';
        $html = getHTMLContentFromURL ( $url, $requestHeaders );
        if (! $html) {
            continue; // skip pages that could not be fetched or parsed
        }
        // Reset per page so a value from the previous page is never reused.
        $price = null;
        foreach ( $html->find ( '.no-touch' ) as $e ) {
            foreach ( $e->find ( '.content-well' ) as $e1 ) {
                foreach ( $e1->find ( '.price' ) as $prices ) {
                    // Keep the last matching price on the page.
                    $price = strip_tags ( $prices->innertext );
                }
                foreach ( $e1->find ( '.article-body' ) as $desc ) {
                    // Collected but not stored by the insert below.
                    $description = strip_tags ( $desc->innertext );
                }
            }
        }
        if ($price === null) {
            continue; // no price found on this page, nothing to insert
        }
        $sql = "INSERT INTO price_data(product_price) VALUES (:product_price)";
        $params = array (':product_price' => $price);
        DatabaseHandler::Execute ( $sql, $params );
    }
    DatabaseHandler::Close ();
}

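So that is the function: it gets the links to the partner sites, fetches the current price, and updates my database. It works fine up to page 89 or 90, but shortly after that it gets stuck on page 90 and keeps scraping it and updating the database.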


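Basically, every PHP script has an execution time limit, set by max_execution_time. Some servers limit it to 30 seconds, which is also PHP's default, so a run that processes 90 pages may well exceed that limit.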
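To make the time limit concrete, here is a minimal sketch that reads the configured limit and stops the loop before it is reached. The helper name hasTimeLeft, the 5-second safety margin, and the placeholder URLs are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: true while the script still has time budget left.
// A limit of 0 means "no limit", which is what PHP uses on the command line.
function hasTimeLeft(float $startedAt, int $limitSeconds, float $safetyMargin = 5.0): bool
{
    if ($limitSeconds <= 0) {
        return true;
    }
    $elapsed = microtime(true) - $startedAt;
    return $elapsed < ($limitSeconds - $safetyMargin);
}

$startedAt = microtime(true);
$limit = (int) ini_get('max_execution_time'); // typically 30 on web servers

$urls = ['example.com/a', 'example.com/b']; // placeholder links
$processed = [];
foreach ($urls as $url) {
    if (!hasTimeLeft($startedAt, $limit)) {
        break; // stop cleanly; the remaining pages wait for the next run
    }
    // ... fetch and parse $url here ...
    $processed[] = $url;
}
```

Checking the wall clock this way is conservative: the script leaves the loop on its own terms instead of being cut off in the middle of a page.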

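Your options are:

Limit the number of pages each run of the script visits.
Use cron to execute the script on a schedule.
Because the script is then called at different times, add a timestamp column to the db table that records when each row was last updated.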
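The three steps above can be sketched as follows. This is only an illustration under assumptions: the column name last_scraped_at, the batch size, and the functions pickBatch and markScraped are hypothetical, and the rows are an in-memory array standing in for the F_charge table.

```php
<?php
// Pick the $batchSize rows that were scraped least recently.
// Rows that were never scraped (null timestamp) come first.
// In SQL this would be roughly (column name is an assumption):
//   SELECT * FROM F_charge ORDER BY last_scraped_at ASC LIMIT 20
// on a database that sorts NULLs first, as MySQL does.
function pickBatch(array $rows, int $batchSize): array
{
    usort($rows, function (array $a, array $b): int {
        // Treat null as 0 so unscraped rows sort before any timestamp.
        return ($a['last_scraped_at'] ?? 0) <=> ($b['last_scraped_at'] ?? 0);
    });
    return array_slice($rows, 0, $batchSize);
}

// Stamp a row once it has been processed, so the next run skips past it.
function markScraped(array $row, int $now): array
{
    $row['last_scraped_at'] = $now;
    return $row;
}

$rows = [
    ['id' => 1, 'product_link' => 'example.com/a', 'last_scraped_at' => 1700000300],
    ['id' => 2, 'product_link' => 'example.com/b', 'last_scraped_at' => null],
    ['id' => 3, 'product_link' => 'example.com/c', 'last_scraped_at' => 1700000100],
];

$batch = pickBatch($rows, 2); // rows 2 and 3: never scraped, then oldest
foreach ($batch as $row) {
    // ... scrape $row['product_link'] here ...
    $row = markScraped($row, time());
}
```

A crontab entry such as "*/5 * * * * php /path/to/scrape.php" (the path is a placeholder) would then run one batch every five minutes, and each run picks up the rows that have waited longest.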