Parsing WP Web Scraping Data with Regex

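I am using the WordPress plugin "WP Web Scraper" to collect some data from a site. The function wpws_get_content returns the result "Raised (number%)", which I want to clean up so that only the number remains. With the following code I have managed to get it down to (90%):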

<p id="number1"><?php echo wpws_get_content('http://my.sportrelief.com/sponsor/sachazarb', '#totalizer_percent', array( 'replace_query' => '/Raised/', 'replace_query_type' => 'regex', 'replace_with' => ' ', ) ); ?></p>

I can't get any further than that. It also returns the data twice, i.e.

<!--
 Start of web scrap (created by wp-web-scraper)
 Source URL: http://my.sportrelief.com/sponsor/sachazarb
 Query: #totalizer_percent (cssselector)
 Other options: Array
(
    [headers] => 
    [cache] => 60
    [useragent] => WPWS bot (http://windreeladprint.com)
    [timeout] => 2
    [on_error] => error_show
    [output] => html
    [glue] => 
    [eq] => 
    [gt] => 
    [lt] => 
    [query_type] => cssselector
    [remove_query] => 
    [remove_query_type] => cssselector
    [replace_query] => /Raised/
    [replace_query_type] => regex
    [replace_with] =>  
    [basehref] => 1
    [a_target] => 
    [callback_raw] => 
    [callback] => 
    [debug] => 1
    [charset] => UTF-8
)
--><span id="totalizer_percent" class="percent">  (90%)</span><span id="totalizer_percent" class="percent">  (90%)</span><!--
 End of web scrap
 WPWS Cache Control: Remote-fetch via WP_Http
 Computing time: 1.306 seconds
-->
You can try the regex Raised\((\d+)%\) and replace the match with the first captured group, using \1 or $1.

Regex101 demo
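
As a rough illustration of the capture-group idea, here is a minimal standalone PHP sketch that runs outside the plugin. It rests on a few assumptions: the sample string is modelled on the output in the question and assumes the live element reads "Raised (90%)" with a space before the parenthesis (hence the added \s*), and first_raised_percent is a hypothetical helper, not part of WP Web Scraper. It uses preg_match rather than a replacement for the duplicated output, since preg_match stops at the first match.

```php
<?php
// Hypothetical helper (not part of WP Web Scraper): returns the first
// "Raised (NN%)" percentage found in a scraped HTML fragment, or null.
function first_raised_percent(string $html): ?int
{
    // strip_tags() removes the <span> wrappers, leaving only the text.
    $text = strip_tags($html);

    // \s* tolerates optional whitespace between "Raised" and "(".
    if (preg_match('/Raised\s*\((\d+)%\)/', $text, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Sample input modelled on the question's output, including the duplicate span.
$scraped = '<span id="totalizer_percent" class="percent">Raised (90%)</span>'
         . '<span id="totalizer_percent" class="percent">Raised (90%)</span>';

// The replacement form suggested in the answer, applied to one plain-text value.
echo preg_replace('/Raised\s*\((\d+)%\)/', '$1', 'Raised (90%)'), "\n"; // 90

// preg_match() stops at the first match, so the repeated span is ignored.
echo first_raised_percent($scraped), "\n"; // 90
```

Inside the plugin call, the equivalent would presumably be to pass the same pattern as 'replace_query' and '$1' as 'replace_with'. Whether the plugin forwards these unchanged to preg_replace is an assumption worth checking against its documentation.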