User agent in a PHP scraper script

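I have a PHP scraper script that scrapes pages on my website. The script then parses the content as HTML and outputs it to the user. I came across the user agent setting in PHP, which lets you pretend to be a crawler such as GoogleBot. How can I combine my two scripts so that the page I am scraping thinks I am a crawler?

My scraper PHP code is: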

$html = file_get_contents("search.php?q=$query");
// Extract link, title, description and display URL from each result block
preg_match_all(
    '/<div class="cl1 cld">.*?<a rel="nofollow" class="l le" href="(.*?)">(.*?)<\/a>.*?<div class="cra">(.*?)<\/div>.*?<div class="clud">(.*?)<\/div>.*?<\/div>/s',
    $html,
    $posts, // will contain the blog posts
    PREG_SET_ORDER // formats data into an array of posts
);
foreach ($posts as $post) {
    $link = $post[1];
    $title = $post[2];
    $description = $post[3];
    $url = $post[4];
    echo "<div class='result'><div class='title'><a href='$link'>$title</a></div>$description<div class='url'>$url</div></div>";
}


My user agent code is:

$userAgent = 'MyScraperBot (';

If you want to keep using file_get_contents, you can set the user agent that PHP's internal HTTP fopen wrapper sends:

 ini_set("user_agent", 'MyScraperBot (');
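If changing the setting for the whole script is undesirable, a stream context can carry the user agent for a single request instead. The following is a minimal sketch, not the asker's code: the user agent string `MyScraperBot/1.0` and the commented-out URL are placeholders chosen for illustration.

```php
<?php
// Hypothetical user agent string, used only for illustration.
$userAgent = 'MyScraperBot/1.0';

// Build a stream context whose HTTP options carry the user agent.
$context = stream_context_create([
    'http' => [
        'user_agent' => $userAgent,
        'timeout'    => 10, // seconds before the request is abandoned
    ],
]);

// Assumed placeholder URL; pass the context as the third argument.
// $html = file_get_contents('http://example.com/search.php?q=test', false, $context);

// Inspect the context to confirm which user agent would be sent.
$options = stream_context_get_options($context);
echo $options['http']['user_agent'], PHP_EOL;
```

Only calls that receive this context use the custom user agent; other file_get_contents calls keep the default.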

You need to use curl_setopt with cURL:

// spoofing FireFox 2.0
$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20061204 Firefox/";
$ch = curl_init();
// set user agent
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
// set the rest of your cURL options here
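For completeness, the remaining cURL options might look like the sketch below. It assumes the cURL extension is loaded; the helper names `build_curl_options` and `fetch_page`, the user agent string, and the URL in the usage comment are all hypothetical and not part of the original answer.

```php
<?php
// Build the cURL options for a request that sends a custom User-Agent header.
function build_curl_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // seconds
    ];
}

// Fetch a page and return its body, throwing on a transport error.
function fetch_page(string $url, string $userAgent): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $userAgent));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (placeholder URL and user agent):
// $html = fetch_page('http://example.com/search.php?q=test', 'MyScraperBot/1.0');
```

The returned string can then be passed to the same preg_match_all call the scraper already uses in place of the file_get_contents result.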