DOM and XPath scraping - What's wrong here?

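I need to scrape a piece of text from a web page. I'm using DOM and XPath to find the data, but I can't seem to select the exact information I need. Here is my code so far. The problem is with the item(0)->nodeValue part: it works for my other scrape, which targets a different page, but not for this one.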

$argos_html = file_get_html('http://www.argos.co.uk/static/Product/partNumber/9282197/Trail/searchtext%3EIPOD+TOUCH.htm');
$dom_argos= new DOMDocument();
$dom_argos->loadHTML($argos_html);
$xpath_argos = new DOMXpath($dom_argos);
$expr_currys = "/html/body/div[4]/div[3]/form/div[2]/div/div[5]/ul/li[3]/span";
$nodes_argos = $xpath_argos->query($expr_argos);
$argos_stock_data = $nodes_argos->item(0)->nodeValue;

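Can anybody tell me where I'm going wrong? I always get an error that relates to the ->item(0)->nodeValue; part. If I comment it out there is no error, but no data is collected either...

It should be ->nodeValue;

I know it's probably due to the page structure, but I'm new to all this! Thx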

Running your code, I first get:

Notice: Undefined variable: expr_argos
Warning: DOMXPath::query() [domxpath.query]: Invalid expression

So, first, make sure you pass query() a variable that actually holds your XPath expression. For example, you should write:

$nodes_argos = $xpath_argos->query($expr_currys);

instead of the current

$nodes_argos = $xpath_argos->query($expr_argos);

Then you get the following error, raised by this line:

Notice: Trying to get property of non-object

$argos_stock_data = $nodes_argos->item(0)->nodeValue;

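Basically, this means you are trying to read the nodeValue property on something that is not an object: $nodes_argos->item(0);

I'm guessing your XPath query doesn't match anything, so the call to the query() method doesn't return anything interesting.

You should check your (quite long, and not easy to understand) XPath query and make sure it matches something in the HTML page.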
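One way to see whether a query matched anything is to check the result before dereferencing it. The following is only a minimal sketch: the inline HTML is a hypothetical stand-in for the downloaded page, and first_node_value() is a helper name made up for this example, not part of PHP's DOM extension.

```php
<?php
// Hypothetical stand-in for the downloaded page, so the sketch runs offline.
$html = '<html><body><div id="deliveryInformation"><ul>'
      . '<li class="home"><span>Home delivery within 2 days</span></li>'
      . '</ul></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect markup warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Hypothetical helper: returns the text of the first matching node,
// or null when the expression is malformed or matches nothing.
function first_node_value(DOMXPath $xpath, string $expr): ?string
{
    $nodes = $xpath->query($expr);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// An expression anchored on id/class attributes present in the sample markup.
$good = first_node_value($xpath, '//div[@id="deliveryInformation"]/ul/li[@class="home"]/span');

// The positional expression from the question; it matches nothing in the sample markup.
$bad = first_node_value($xpath, '/html/body/div[4]/div[3]/form/div[2]/div/div[5]/ul/li[3]/span');

echo $good ?? 'no match', PHP_EOL;
echo $bad ?? 'no match', PHP_EOL;
```

With a guard like this, a query that matches nothing shows up as a null you can test for, rather than as a "Trying to get property of non-object" notice.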

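It is not surprising that the XPath works fine in Firefox but does not work with DOM. I assume you got the XPath from some kind of browser plugin that can return the path to an element. However, you should not trust XPaths returned by browser plugins, because the browser modifies the DOM through JavaScript and adds implied elements where necessary. Use the raw source code instead.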
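As an illustration of how an implied element can break a browser-derived path, here is a small sketch. It assumes a made-up table fragment (the id "stock" and the cell text are hypothetical, not taken from the Argos page) and uses tbody as the example: browsers typically insert a tbody element into the DOM of a table that lacks one, while, as far as I know, libxml's HTML parser behind DOMDocument leaves the table as written.

```php
<?php
// Hypothetical raw source: a table written without <tbody>.
$html = '<html><body><table id="stock"><tr><td>In stock</td></tr></table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings about imperfect markup
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Path as a browser plugin would likely report it, including the implied <tbody>.
$browserStyle = $xpath->query('//table[@id="stock"]/tbody/tr/td');

// Path written against the raw source, where no <tbody> exists.
$sourceStyle = $xpath->query('//table[@id="stock"]/tr/td');

// Compare how many nodes each expression finds in the parsed raw source.
echo 'browser-style matches: ', $browserStyle->length, PHP_EOL;
echo 'source-style matches:  ', $sourceStyle->length, PHP_EOL;
```

If the browser-style expression finds nothing while the source-style one finds the cell, the difference comes from the element the browser added, not from the XPath syntax.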

Your XPath evaluates to "Home delivery within 2 days" in Firefox, which is not what I would expect in a variable called "stock_data". But anyway, this should do it:

$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('http://www.argos.co.uk/static/Product/partNumber/9282197/Trail/searchtext%3EIPOD+TOUCH.htm');
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$nodes = $xpath->query(
    '/html/body//div[@id="deliveryInformation"]/ul/li[@class="home"]/span'
);
echo $nodes->item(0)->nodeValue; // "Home delivery within 2 days"