I have this page: http://www.statistics.com/index.php?page=glossary&term_id=703
Specifically, this part:
<b>Additive Error:</b>
<p> Additive error is the error that is added to the true value and does not
depend on the true value itself. In other words, the result of the measurement is
considered as a sum of the true value and the additive error: </p>
I'm doing my best to get the text between the <p> and </p> tags, using this:
include('simple_html_dom.php');

$url = 'http://www.statistics.com/index.php?page=glossary&term_id=703';

// Fetch the raw HTML of the page
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);

// Parse it with simple_html_dom
$html = new simple_html_dom();
$html->load($curl_scraped_page);

// Print the inner HTML of every <b> element
foreach ( $html->find('b') as $e ) {
    echo $e->innertext . '<br>';
}
It gives me:
Additive Error:
Browse Other Glossary Entries
I tried changing the foreach to:
foreach ( $html->find('b p') as $e ) {
and then to:
foreach ( $html->find('/b p') as $e ) {
but it just keeps giving me a blank page. What am I doing wrong? Thanks.
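Why the changed selectors come back empty: in CSS-style selectors, which simple_html_dom's find() follows, 'b p' means a <p> nested somewhere inside a <b>. In the glossary markup the <p> is not inside the <b>; it follows it as a sibling, so nothing matches and the loop echoes nothing. The sketch below checks this offline with PHP's built-in DOMXPath instead of simple_html_dom, on an abridged copy of the snippet above (the wrapping <html><body> is an assumption added so the parser does not have to infer the structure).

```php
<?php
// Collect parser warnings instead of printing them
libxml_use_internal_errors(true);

// Abridged copy of the glossary snippet (assumption: stands in for the live page)
$sample = '<html><body>'
    . '<b>Additive Error:</b>'
    . '<p>Additive error is the error that is added to the true value and does not '
    . 'depend on the true value itself.</p>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

// "b p" is a descendant selector: a <p> somewhere *inside* a <b>.
// Its XPath equivalent is //b//p, and it finds nothing in this markup.
$descendants = $xpath->query('//b//p');

// The <p> actually comes *after* the <b>, as its following sibling.
$siblings = $xpath->query('//b/following-sibling::p');

echo 'inside <b>: ', $descendants->length, PHP_EOL; // prints 0
echo 'after <b>: ', $siblings->length, PHP_EOL;     // prints 1
echo $siblings->item(0)->textContent, PHP_EOL;
```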
Why not use PHP's built-in DOM extension and XPath?
libxml_use_internal_errors(true); // <- you might need this if the page has markup errors
$dom = new DOMDocument();
$dom->loadHTML($curl_scraped_page);
$xpath = new DOMXPath($dom);
print $xpath->evaluate('string(//p[preceding::b]/text())');
// string(...) gives you the text content of the first <p> tag that comes after a <b> tag
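A minimal offline sketch of the XPath approach above, assuming an abridged copy of the glossary markup in place of the page that cURL would fetch (the $page string is illustrative, not the live page):

```php
<?php
// Tolerate sloppy real-world markup without printing warnings
libxml_use_internal_errors(true);

// Assumption: a local, abridged copy of the glossary snippet
$page = '<html><body><b>Additive Error:</b>'
    . '<p>Additive error is the error that is added to the true value.</p>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);

// string(...) converts the node-set to the text of its first node in
// document order, so evaluate() returns a plain PHP string here.
$text = $xpath->evaluate('string(//p[preceding::b]/text())');
echo $text, PHP_EOL; // prints: Additive error is the error that is added to the true value.
```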
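If more than one <p> tag is preceded by a <b> tag and you only want the first one, adjust the XPath query to:
string((//p[preceding::b]/text())[1])
(Strictly, string() already uses the first node in document order, so the [1] just makes that explicit.)
To get all of them as a DOMNodeList, drop the string() function:
//p[preceding::b]/text()
Then you can iterate over the list and read each node's textContent property.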
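Both variants can be tried side by side on hypothetical markup with two paragraphs after the <b>; the $page string below is invented for illustration:

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical markup with two <p> elements after the <b>
$page = '<html><body><b>Additive Error:</b>'
    . '<p>First paragraph.</p>'
    . '<p>Second paragraph.</p>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);

// Only the first matching text node, as a string
$first = $xpath->evaluate('string((//p[preceding::b]/text())[1])');

// Without string(), evaluate() returns a DOMNodeList of text nodes
$texts = [];
foreach ($xpath->evaluate('//p[preceding::b]/text()') as $node) {
    $texts[] = $node->textContent;
}

echo $first, PHP_EOL;                 // prints: First paragraph.
echo implode(' | ', $texts), PHP_EOL; // prints: First paragraph. | Second paragraph.
```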
If you want everything inside the b or p tags, you can simply do:
foreach ($html->find('b,p') as $e) { ... }
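With the built-in DOM extension instead of simple_html_dom, XPath's union operator | plays the same role as the 'b,p' selector list. A sketch on an abridged, assumed copy of the glossary markup:

```php
<?php
libxml_use_internal_errors(true);

// Abridged copy of the glossary snippet (assumption: stands in for the live page)
$page = '<html><body><b>Additive Error:</b>'
    . '<p>Additive error is the error that is added to the true value.</p>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);

// "|" is XPath's counterpart of the 'b,p' selector list:
// it matches every <b> element and every <p> element.
$parts = [];
foreach ($xpath->query('//b | //p') as $node) {
    // Record which tag each piece of text came from
    $parts[] = $node->nodeName . ': ' . $node->textContent;
}
echo implode(PHP_EOL, $parts), PHP_EOL;
```

Tagging each entry with nodeName makes it easy to tell which element a piece of text came from.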
Try this:
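<?php
// Collect parser warnings about the page's invalid markup (instead of using @)
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.statistics.com/index.php?page=glossary&term_id=703');
$xpath = new DOMXPath($dom);

$mytext = '';
// Take the first <p> inside the first <font> element on the page
foreach ($xpath->query('//font') as $font) {
    $p = $xpath->query('.//p', $font)->item(0);
    if ($p !== null) { // guard against a <font> with no <p> inside
        $mytext = $p->nodeValue;
    }
    break;
}
echo $mytext;
?>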