Using regex to group data and its children

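I have a simple document that I need to split into events (by day). Unfortunately, the document also contains other useless information (such as event details) that I have to crawl through to retrieve what I need. An excerpt of the document looks like this: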

10th March 2015
Baseball 10:00 Please remember to bring your bats
Soccer 14:00 over 18s only
11th March 2015
Swimming 10:00 Children only
Soccer 14:00 Over 14s team training

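My initial plan was to use preg_split to split the string at the dates and then loop over each piece, but I need to maintain the structure of the document.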

Ideally, I would like to return the data in an array like the following:

arr[
   'days' =>[
        'date' => '10th March 2015',
        'events' => ['Baseball 10:00', 'Soccer 14:00'],
    ]
]

How would I best do this? Regex is not my strong suit, but I know enough to capture the days with ([0-9]{1,2}[a-z]{2}\s[a-z]+\s[0-9]{4}) and the events with ([a-zA-Z]+\s[0-9]{2}:[0-9]{2}).

You can use this regex:

/(?:\b(\d+th\h+.*?\d{4})\b|\G)\s+(\S+\h+\d{2}:\d{2}\b).*?(?=\s+(?>\S+\h+\d{2}:\d{2}|\d+th\h+|\z))/i

Then use some PHP code to loop over the results.

RegEx Demo
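
The looping step is not spelled out above, so here is a minimal sketch of one way it might look, assuming the regex is written with backslash escapes as /(?:\b(\d+th\h+.*?\d{4})\b|\G)\s+(\S+\h+\d{2}:\d{2}\b).*?(?=\s+(?>\S+\h+\d{2}:\d{2}|\d+th\h+|\z))/i. The variable names and the 'date'/'events' keys are assumptions taken from the question's desired output, not part of the answer. A newline is appended to the input because the final lookahead needs at least one whitespace character before the end of the string; without it the last event is not matched. Note also that \d+th only matches dates ending in "th".

```php
<?php
// Sample input copied from the question.
$text = <<<EOD
10th March 2015
Baseball 10:00 Please remember to bring your bats
Soccer 14:00 over 18s only
11th March 2015
Swimming 10:00 Children only
Soccer 14:00 Over 14s team training
EOD;

// Single quotes keep the backslashes literal for the regex engine.
$pattern = '/(?:\b(\d+th\h+.*?\d{4})\b|\G)\s+(\S+\h+\d{2}:\d{2}\b).*?(?=\s+(?>\S+\h+\d{2}:\d{2}|\d+th\h+|\z))/i';

// Assumption: append "\n" so the lookahead's \s+ can match before \z.
preg_match_all($pattern, $text . "\n", $matches, PREG_SET_ORDER);

$days = [];
foreach ($matches as $m) {
    // Group 1 (the date) is only non-empty on the first event of a day;
    // later events of the same day are reached through \G.
    if ($m[1] !== '') {
        $days[] = ['date' => $m[1], 'events' => []];
    }
    // Group 2 is the "name hh:mm" part; attach it to the current day.
    if ($days) {
        $days[count($days) - 1]['events'][] = $m[2];
    }
}

print_r($days);
```

Each match carries one event, so the loop only needs to start a new day whenever group 1 is filled in.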

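This is what I came up with. I used explode() to split the different sections, and then again to split the lines. Only at the very end do I use preg_match() to get the specific sport/time.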

<?php
$text = <<<EOD
10th March 2015
Baseball 10:00 Please remember to bring your bats
Soccer 14:00 over 18s only

11th March 2015
Swimming 10:00 Children only
Soccer 14:00 Over 14s team training
EOD;

$days = array();
// Each day is a block separated from the next one by a blank line.
$sections = explode("\n\n", $text);
foreach ($sections as $k => $section) {
    $events = array();
    $lines = explode("\n", $section);
    // The first line of a block is the date; the remaining lines are events.
    $day = array_shift($lines);
    foreach ($lines as $line) {
        // Keep only "<name> <hh:mm>" and drop the trailing details.
        if (preg_match('/\w+\s\d{2}:\d{2}/', $line, $matches)) {
            $events[] = $matches[0];
        }
    }
    $days[$k] = array(
        'day' => $day,
        'events' => $events
    );
}
// Pass true so print_r() returns the dump instead of printing it and leaving a stray "1".
echo '<pre>', print_r($days, true), '</pre>';
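
The explode() approach relies on a blank line between days. The excerpt in the question has none, so here is a hedged line-by-line variant that decides per line whether it is a date or an event. The function name groupEventsByDay and both patterns are assumptions for illustration (the date pattern accepts st/nd/rd/th suffixes); the 'days', 'date' and 'events' keys follow the structure requested in the question.

```php
<?php
// Hypothetical helper, not taken from the answers above.
function groupEventsByDay(string $text): array
{
    $days = [];
    // \R matches any line-break sequence, so Windows line endings also work.
    foreach (preg_split('/\R/', $text, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $line = trim($line);
        if (preg_match('/^\d{1,2}(?:st|nd|rd|th)\s+[a-z]+\s+\d{4}$/i', $line)) {
            // A date line starts a new day.
            $days[] = ['date' => $line, 'events' => []];
        } elseif ($days && preg_match('/^[a-z]+\s+\d{2}:\d{2}/i', $line, $m)) {
            // Keep only "name hh:mm"; the trailing details are dropped.
            $days[count($days) - 1]['events'][] = $m[0];
        }
    }
    return ['days' => $days];
}

$text = <<<EOD
10th March 2015
Baseball 10:00 Please remember to bring your bats
Soccer 14:00 over 18s only
11th March 2015
Swimming 10:00 Children only
Soccer 14:00 Over 14s team training
EOD;

$result = groupEventsByDay($text);
print_r($result);
```

Lines that match neither pattern, and event lines that appear before any date, are silently skipped; whether that is acceptable depends on how clean the real document is.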