Store data from a cURL-fetched website into an array using a for loop

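I'm trying to "steal" product names from a website so I can list them myself. I'd like to store the values in an array. So far I've managed to print them out using cURL, with all the styling stripped.

Here is my code: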

<?php
$ch = curl_init("http://www.nrs.com/category/3101/whitewater-kayaking/helmets");
$fp = fopen("example_homepage.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
$website = file_get_contents('example_homepage.txt');
//COLLECTED AND STORED WEBSITE AS VARIABLE
preg_match_all('#<h2>(.+?)</h2>#s', $website, $unfiltered);
$products = array_pop($unfiltered);
$remove_how_much = (count($unfiltered[0]))-(array_search('Follow Us:',$products));

for($count=1;$count<=$remove_how_much;$count++) {
    array_pop($products);
}
for($counter=0;$counter<=(count($products)-1);$counter++) {
    $explode1 = explode('>',$products[$counter]);
    $explode2 = explode ('</a',$explode1[1]);
    echo $explode2[0];
    echo '<br/>';
}
?>

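Go ahead and test it; you'll see it prints them out. I'd like to save these values into an array, check for duplicates, and remove the words "- Closeout" from all of the values.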

I'm also stuck on checking the other paginated pages, so I need to loop through

http://www.nrs.com/category/3101/whitewater-kayaking/helmets?pg=1

http://www.nrs.com/category/3101/whitewater-kayaking/helmets?pg=2

and so on, until it receives an error or a repeated page.

Any ideas?

Also, is there a way to improve my current code so it fetches this more efficiently?

Use the PHP Simple HTML DOM Parser:

<?php
include("simple_html_dom.php");
$html = file_get_html('http://www.nrs.com/category/3101/whitewater-kayaking/helmets?ppg=all');

foreach($html->find('h2') as $element)
       echo $element->plaintext."<br />";
/* OUTPUT
WRSI Trident Composite Helmet
WRSI Moment Fullface Helmet With Vents
WRSI Moment Fullface Helmet Without Vents
WRSI Current Pro Helmet
WRSI Current Helmet Without Vents
WRSI Current Helmet Without Vents
WRSI Current Helmet With Vents
WRSI Current Helmet With Vents
WRSI Current Rescue Helmet without Vents
WRSI Current Rescue Helmet with Vents
WRSI Limited Edition Current Helmet
NRS Chaos Helmet - Side Cut - Closeout
...
*/
?>
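
To get those names into an array rather than echoing them, the cleanup asked for in the question (removing "- Closeout" and dropping duplicates) could look roughly like the sketch below. `clean_product_names` is a hypothetical helper name, not part of Simple HTML DOM, and it assumes the marker always sits at the end of the name, as in the output above.

```php
<?php
// Hypothetical helper (not part of any library): strips a trailing
// "- Closeout" marker from each name and removes duplicate names.
function clean_product_names(array $names): array
{
    $cleaned = [];
    foreach ($names as $name) {
        // Decode entities such as &amp; and trim surrounding whitespace.
        $name = trim(html_entity_decode($name, ENT_QUOTES, 'UTF-8'));
        // Assumption: the marker only appears at the very end of a name.
        $name = preg_replace('/\s*-\s*Closeout\s*$/i', '', $name);
        if ($name !== '') {
            $cleaned[] = $name;
        }
    }
    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($cleaned));
}
```

With the parser above, you could fill the input with `$names[] = $element->plaintext;` inside the `foreach` and then call `clean_product_names($names)`. Non-product headings, such as the 'Follow Us:' one that the question's code trims off, would still need to be filtered out separately.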

Homepage: http://simplehtmldom.sourceforge.net/
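
If `?ppg=all` is not available and you do need to walk the `?pg=N` pages from the question, one possible approach is sketched below. It is only a sketch under assumptions: it uses PHP's bundled cURL and DOM extensions instead of Simple HTML DOM, all function names are hypothetical, the `$maxPages` cap is an assumed safety limit, and a "repeated page" is taken to mean a page whose `<h2>` list is identical to the previous one. `CURLOPT_RETURNTRANSFER` also avoids the temporary file used in the question.

```php
<?php
// Returns the trimmed text of every <h2> in an HTML string (bundled DOM extension).
function extract_h2_texts(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world markup is rarely well-formed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('h2') as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Fetches a URL into a string; returns null on a transport error or non-200 status.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($body === false || $status !== 200) ? null : $body;
}

// $fetchPage is any callable taking a page number and returning an array of
// names, or null on failure. Stops on an error, an empty page, a page
// identical to the previous one, or after $maxPages pages.
function collect_all_pages(callable $fetchPage, int $maxPages = 50): array
{
    $all = [];
    $previous = null;
    for ($page = 1; $page <= $maxPages; $page++) {
        $names = $fetchPage($page);
        if (empty($names) || $names === $previous) {
            break;
        }
        $all = array_merge($all, $names);
        $previous = $names;
    }
    return $all;
}

// Wires the pieces together for a category URL using the ?pg=N pattern
// from the question. Defined here but not called.
function collect_category(string $baseUrl): array
{
    return collect_all_pages(function (int $page) use ($baseUrl): ?array {
        $html = fetch_html($baseUrl . '?pg=' . $page);
        return $html === null ? null : extract_h2_texts($html);
    });
}
```

Passing the fetcher in as a callable keeps the stopping logic separate from the network code, so the loop can be checked against canned data before pointing it at the live site.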