Need some assistance with Regex (PHP)

I want to use preg_replace to parse a txt file into HTML in order to add formatting. The file is formatted like this:

09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!

This should be treated as one group and parsed into a table like this:

<table>
<tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr>
<tr><td>1234567</td><td>(optional)</td><td>Today is a beautiful day</td></tr>
<tr><td>1234568</td><td>(optional)</td><td>Tomorrow will be even better</td></tr>
<tr><td>1234569</td><td>(optional)</td><td>December is the best month of the year!</td></tr>
</table>

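Currently I use two separate preg_replace calls, one for the first line (the date) and another for the lines that follow, of which there can be just one or up to 100 or so. However, the file can also contain other text that needs to be ignored (left unreplaced), yet if such a line has more or less the same format (7 digits and some text), it gets formatted too: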

$file = preg_replace('~^\s*((\[.*\]){0,2}\d{1,2}:\d{2}:\d{2}(\[/.*\]){0,2})\s(\d{2}-\d{2}-\d{2}(\[/.*\]){0,2})\s+(?:\d{2}/\d{3}\s+|)(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\s+(.+)$~m', '<table class="file"><tr class="entry"><td class="time">$1 $4</td><td class="day">$6</td><td class="message">$7</td></tr>', $file);
$file = preg_replace('~^\s*(.{0,11}?)\s*((\[.+?\])?\d{7}(\[/.+?\])?)\s+(.+?)$~m', '<tr class="id"><td class="optional">$1</td><td class="id">$2</td><td class="message">$5</td></tr>', $file);

How can this be improved? For example, if I have this content:

09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better

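So I only want to capture and convert the first and the last block, each starting with a time/date line followed by lines that start with a 7-digit ID.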

Thanks for reading this far ;)

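I think this does what you are trying to do.

There is one line where it is not clear to me why it should be ignored:

1234570 This line should be ignored

That line meets the "7 digits and some text" requirement.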

The regex I came up with is:

/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3}day)?\h*(.+)/m

Here is a regex101 demo: https://regex101.com/r/qB0gH6/1

And used in PHP:

$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace('/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3}day)?\h*(.+)/m', '<td>$1</td><td>$2</td><td>$3</td>', $string);

Output:

<td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234567</td><td></td><td>Today is a beautiful day</td>
<td>1234568</td><td></td><td>Tomorrow will be even better</td>
<td>1234569</td><td></td><td>December is the best month of the year!</td>
Liverpool - WBA 2-2
<td>1234570</td><td></td><td>This line should be ignored</td>
<td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234571</td><td></td><td>Today is a beautiful day</td>
<td>1234572</td><td></td><td>Tomorrow will be even better</td>
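
To see why the middle cell is empty on the ID lines, it can help to look at the capture groups directly. The sketch below reuses the pattern unchanged; the helper name `captureGroups` and the `$variant` pattern are made up for illustration and are not part of the answer. It also shows one thing to watch for: `[a-zA-Z]{3}day` only matches six-letter day names (Sunday, Monday, Friday), so a line with "Tuesday" would still match, but the day would end up in the message group. Spelling the weekdays out, as the question's own pattern does, is one possible way around that.

```php
<?php
// Pattern from the answer above, unchanged.
$original = '/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3}day)?\h*(.+)/m';

// Hypothetical variant (an assumption, not the answer's regex):
// list the weekdays explicitly so all seven names can match.
$days    = '(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day';
$variant = '/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*(' . $days . ')?\h*(.+)/m';

// Illustrative helper: returns [id-or-timestamp, weekday, text], or null.
function captureGroups(string $pattern, string $line): ?array
{
    if (!preg_match($pattern, $line, $m)) {
        return null; // line starts with neither a timestamp nor a 7-digit ID
    }
    // An optional group that did not take part in the match comes back as ''.
    return [$m[1], $m[2], $m[3]];
}

// "Today" is not three letters followed by "day", so group 2 stays empty.
echo json_encode(captureGroups($original, '1234567 Today is a beautiful day')), "\n";

// With the original pattern, "Tuesday" falls through into the text group.
echo json_encode(captureGroups($original, '09:19:49 15-12-15 Tuesday Hello World')), "\n";

// With the variant, the weekday is captured in group 2.
echo json_encode(captureGroups($variant, '09:19:49 15-12-15 Tuesday Hello World')), "\n";

// A line that starts with neither form is not matched at all.
echo json_encode(captureGroups($original, 'Liverpool - WBA 2-2')), "\n";
```

If the input only ever contains the days used in the samples, the original pattern is enough; the variant matters once other weekdays show up.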

Okay, based on your update it is a bit more complicated, but I think this does it:

$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace_callback('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3}day)?\h*(.+?)\n((\d{7})\h+(.+?)(\n|$))+/',
                    function ($matches) {
                        // $matches[0] is one whole block: the date line plus its ID lines.
                        $lines = explode("\n", $matches[0]);
                        $theoutput = '<table><tr>';
                        foreach($lines as $line) {
                            if(preg_match('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3}day)?\h*(.*)/', $line, $output)) {
                                // This is the first line of the block, the date string.
                                foreach($output as $key => $values) {
                                    if(!empty($key)) { // skip index 0, the full match
                                        $theoutput .= '<td>' . $values . '</td>';
                                    }
                                }
                            } else {
                                if(preg_match('/(\d{7})\h*(.*)/', $line, $output)) {
                                    // An ID line: close the previous row and start a new one.
                                    $theoutput .= '</tr><tr>';
                                    foreach($output as $key => $values) {
                                        if(!empty($key)) {
                                            $theoutput .= '<td>' . $values . '</td>';
                                        }
                                    }
                                }
                            }
                        }
                        $theoutput .= '</tr></table>';
                        return $theoutput;
                    }, $string);

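Output (shown with a line break around each table for readability, since the match consumes the adjacent newline):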

<table><tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234567</td><td>Today is a beautiful day</td></tr><tr><td>1234568</td><td>Tomorrow will be even better</td></tr><tr><td>1234569</td><td>December is the best month of the year!</td></tr></table>
Liverpool - WBA 2-2
1234570 This line should be ignored
<table><tr><td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234571</td><td>Today is a beautiful day</td></tr><tr><td>1234572</td><td>Tomorrow will be even better</td></tr></table>
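
For comparison, the same grouping can also be done without a nested regex, by walking the text line by line and tracking whether a block is currently open. This is only a sketch under a few assumptions: the function name `blocksToTables` is made up, the weekday is assumed to be one of the seven English day names, and the empty middle cell on ID rows follows the table layout shown in the question. It also escapes the cell text with `htmlspecialchars`, which the snippets above do not do.

```php
<?php
// Hypothetical helper: turns each timestamp-headed block into a table
// and leaves every other line untouched.
function blocksToTables(string $text): string
{
    $header = '/^(\d{2}:\d{2}:\d{2}\h+\d{1,2}-\d{1,2}-\d{1,2})\h+'
            . '((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)\h+(.+)$/';
    $idLine = '/^(\d{7})\h+(.+)$/';

    // Builds one escaped table row from a list of cell values.
    $row = function (array $cells): string {
        $cells = array_map('htmlspecialchars', $cells);
        return '<tr><td>' . implode('</td><td>', $cells) . '</td></tr>';
    };

    $out    = [];
    $inside = false; // true while the previous line belonged to a block

    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match($header, $line, $m)) {
            if ($inside) {
                $out[] = '</table>'; // a new header closes the previous block
            }
            $out[]  = '<table>';
            $out[]  = $row([$m[1], $m[2], $m[3]]);
            $inside = true;
        } elseif ($inside && preg_match($idLine, $line, $m)) {
            // ID lines only count when they directly continue a block.
            $out[] = $row([$m[1], '', $m[2]]);
        } else {
            if ($inside) {
                $out[]  = '</table>';
                $inside = false;
            }
            $out[] = $line; // unrelated text, including stray ID lines
        }
    }
    if ($inside) {
        $out[] = '</table>';
    }
    return implode("\n", $out);
}
```

Called as `echo blocksToTables($string);` with the sample above, this should wrap each of the two blocks in its own table and pass `Liverpool - WBA 2-2` and `1234570 This line should be ignored` through as they are. Unlike the regex above, it also emits a table for a header line that has no ID lines after it, so an extra check would be needed if that is not wanted.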