I am scraping data from a website. The page source is at
view-source:http://www.pakdukaan.com/75-computer-cases
The code I use to scrape the data is as follows:
<?php
$html = file_get_contents('http://www.pakdukaan.com/75-computer-cases');
$pk_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
    $pk_doc->loadHTML($html);
    libxml_clear_errors();
    $pk_xpath = new DOMXPath($pk_doc);
    $pk_list = array();
    $pk_and_price = $pk_xpath->query('//div[@class="product_list list row "]');
    if($pk_and_price->length > 0){
        foreach($pk_and_price as $pat){
            $name = $pk_xpath->query('//h5[@class="name"]', $pat)->item(0)->nodeValue;
            $pkmn_types = array();
            $price = $pk_xpath->query('//span[@class="price product-price"]', $pat);
            foreach($price as $type){
                $pkmn_types[] = $type->nodeValue;
            }
            $pk_list[] = array('name' => $name, 'price' => $pkmn_types);
        }
    }
}
//output what we have
echo "<pre>";
echo print_r($pk_list);
echo "</pre>";
?>
But instead of the names of all the cases, I get the name of only one case, and I get the price of every case twice.
Output:
Array
(
[0] => Array
(
[name] =>
Thermaltake V2 Plus + 350W Power Supply
[price] => Array
(
[0] =>
Rs. 4,099
[1] =>
Rs. 4,099
[2] =>
Rs. 5,899
[3] =>
Rs. 5,899
[4] =>
Rs. 8,499
[5] =>
Rs. 8,499
[6] =>
Rs. 9,499
[7] =>
Rs. 9,499
[8] =>
Rs. 10,350
[9] =>
Rs. 10,350
[10] =>
Rs. 12,999
[11] =>
Rs. 12,999
[12] =>
Rs. 17,799
[13] =>
Rs. 17,799
[14] =>
Rs. 16,199
[15] =>
Rs. 16,199
[16] =>
Rs. 17,299
[17] =>
Rs. 17,299
[18] =>
Rs. 16,500
[19] =>
Rs. 16,500
[20] =>
Rs. 5,899
[21] =>
Rs. 5,899
[22] =>
Rs. 8,399
[23] =>
Rs. 8,399
[24] =>
Rs. 4,999
[25] =>
Rs. 4,999
[26] =>
Rs. 7,599
[27] =>
Rs. 7,599
[28] =>
Rs. 9,999
[29] =>
Rs. 9,999
)
)
)
1
Can anyone help me solve this? I have tried many different div classes from the site's source code, but I cannot get the right result.
So, let's go through your mistakes.
First: you query $pk_xpath->query('//h5[@class="name"]', $pat) and then take only item(0). This means you skip all the other DOMNodes that the XPath query returns. But if you do this:
$names = $pk_xpath->query('//h5[@class="name"]', $pat);
foreach ($names as $n) {
echo $n->nodeValue . PHP_EOL;
}
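you will see all the names on the page.
Second: the prices. If you check the HTML of the scraped page, you will see that span[@class="price product-price"] appears twice for each item. One span is visible; the second sits in a pop-up block that is currently hidden. So you need a different XPath query: for example, you can find all the .product-meta items and then search for price product-price inside each of them.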
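Putting both points together, here is a minimal sketch of that approach. It runs against a small inline HTML sample rather than the live site, and the sample markup is an assumption: the product-block and popup wrappers are hypothetical, and the real page may nest things differently. Note the leading "." in the inner queries. In XPath, an expression that starts with "//" always searches from the document root, even when a context node is passed as the second argument, which is why the original loop returned every name and price on the page.

```php
<?php
// Hypothetical markup mimicking the structure described above. Each product
// has a visible price inside .product-meta and a hidden pop-up copy.
$html = <<<HTML
<div class="product_list list row ">
  <div class="product-block">
    <div class="product-meta">
      <h5 class="name">Case A</h5>
      <span class="price product-price">Rs. 4,099</span>
    </div>
    <div class="popup">
      <span class="price product-price">Rs. 4,099</span>
    </div>
  </div>
  <div class="product-block">
    <div class="product-meta">
      <h5 class="name">Case B</h5>
      <span class="price product-price">Rs. 5,899</span>
    </div>
    <div class="popup">
      <span class="price product-price">Rs. 5,899</span>
    </div>
  </div>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$pk_list = array();
// One iteration per product: select every element carrying the product-meta class.
$metas = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " product-meta ")]'
);
foreach ($metas as $meta) {
    // ".//" keeps the search inside $meta; a bare "//" would scan the whole document.
    $nameNode  = $xpath->query('.//h5[@class="name"]', $meta)->item(0);
    $priceNode = $xpath->query('.//span[@class="price product-price"]', $meta)->item(0);
    if ($nameNode === null || $priceNode === null) {
        continue; // skip blocks that lack a name or a price
    }
    $pk_list[] = array(
        'name'  => trim($nameNode->nodeValue),
        'price' => trim($priceNode->nodeValue),
    );
}

echo "<pre>";
print_r($pk_list);
echo "</pre>";
```

To try it on the real page, replace the inline sample with the result of file_get_contents() and adjust the class names if the live markup differs. The trim() calls remove the whitespace and newlines that show up around the values in the output above.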