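I have a simple document that I need to split into events, grouped by day. Unfortunately, the document also contains information I don't need (such as event details), which I have to skip over to get at the data I want. An excerpt of the document looks like this: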
10th March 2015
Baseball 10:00 Please remember to bring your bats
Soccer 14:00 over 18s only
11th March 2015
Swimming 10:00 Children only
Soccer 14:00 Over 14s team training
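My initial plan was to use preg_split to split the string at the dates and then loop over each piece, but I need to preserve the structure of the document.
Ideally, I would like the data returned in an array like this: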
arr[
'days' =>[
'date' => '10th March 2015',
'events' => ['Baseball 10:00', 'Soccer 14:00'],
]
]
What is the best way to do this? Regex is not my strong point, but I know enough to capture the days with ([0-9]{1,2}[a-z]{2}\s[a-z]+\s[0-9]{4}) and the events with ([a-zA-Z]+\s[0-9]{2}:[0-9]{2}).
You can use this regex:
/(?:\b(\d+th\h+.*?\d{4})\b|\G)\s+(\S+\h+\d{2}:\d{2}\b).*?(?=\s+(?>\S+\h+\d{2}:\d{2}|\d+th\h+|\z))/i
Then use some PHP code to loop over the results.
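The looping step could look something like the sketch below. It is only one possible reading of the approach: the variable names and the 'date'/'events' keys are my own choices (taken from the array shape asked for in the question), and it assumes the pattern is written with backslash escapes and that the input ends with a newline, since the final lookahead needs at least one whitespace character before the end of the string. Note that, as written, the pattern only recognises dates ending in "th".

```php
<?php
// The pattern from the answer, written with backslash escapes.
$pattern = '/(?:\b(\d+th\h+.*?\d{4})\b|\G)\s+(\S+\h+\d{2}:\d{2}\b).*?(?=\s+(?>\S+\h+\d{2}:\d{2}|\d+th\h+|\z))/i';

// Sample input; the trailing newline is an assumption (see the note above).
$text = "10th March 2015\n"
      . "Baseball 10:00 Please remember to bring your bats\n"
      . "Soccer 14:00 over 18s only\n"
      . "\n"
      . "11th March 2015\n"
      . "Swimming 10:00 Children only\n"
      . "Soccer 14:00 Over 14s team training\n";

$days = [];
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // Group 1 is only filled on the first event of a day; start a new day then.
    if ($m[1] !== '') {
        $days[] = ['date' => $m[1], 'events' => []];
    }
    // Group 2 is the "Name HH:MM" part; attach it to the most recent day.
    if ($days) {
        $days[count($days) - 1]['events'][] = $m[2];
    }
}

print_r($days);
```

Events that continue a day are matched through \G, so their first capture group comes back as an empty string; that is what the loop uses to tell a new day from a continuation.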
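This is what I came up with. I used explode() to split the text into sections, and then each section into lines. Only at the very end do I use preg_match() to pull out the specific sport and time.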
<?php
$text = <<<EOD
10th March 2015
Baseball 10:00 Please remember to bring your bats
Soccer 14:00 over 18s only

11th March 2015
Swimming 10:00 Children only
Soccer 14:00 Over 14s team training
EOD;

$days = array();
// Each day is a block separated from the next by a blank line.
$sections = explode("\n\n", $text);
foreach ($sections as $k => $section) {
    $events = array();
    $lines = explode("\n", $section);
    // The first line of a block is the date; the remaining lines are events.
    $day = $lines[0];
    unset($lines[0]);
    foreach ($lines as $line) {
        // Keep only the "Name HH:MM" part and drop the trailing details.
        if (preg_match('/\w+\s\d{2}:\d{2}/', $line, $matches)) {
            $events[] = $matches[0];
        }
    }
    $days[$k] = array(
        'day' => $day,
        'events' => $events
    );
}
// Pass true so print_r() returns the string instead of printing it directly.
echo '<pre>', print_r($days, true), '</pre>';