PHP Regex on CD Tracklists



I'm using preg_match to format track lists so that the track number, title, and duration end up in separate table cells:

<td>01</td><td>Track Title</td><td>01:23</td>

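The problem is that a track line can take any of the following forms (leading zeros on the track number and duration are not always present):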

01. Track Title (01:23)
01. Track Title 01:23
01. Track Title
1 Track Title (01:23)
1 Track Title 01:23
1 Track Title

The following only works for tracks that have a timestamp:

/([0-9]+)\.?[\s+](.*)[\s+](\(?[0-5]?[0-9]:[0-5][0-9]\)?)/

So I added a ? to the timestamp group to make it optional:

/([0-9]+)\.?[\s+](.*)[\s+](\(?[0-5]?[0-9]:[0-5][0-9]\)?)?/

This works for tracks without a timestamp, but for tracks that have one, the timestamp ends up stuck in the title cell, like this:

<td>01</td><td>Track Title 01:23</td><td></td>

EDIT: The track lists are plain text, pulled from an SQL table before parsing.

Try this:

/^([0-9]+)\.?[\s]+(.*)([\s]+(\(?[0-5]?[0-9]:[0-5][0-9]\)?))?$/U

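Note that I've used the ungreedy pattern modifier U so that each part matches the shortest possible string, and I've anchored the pattern to the start and end of the string.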
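As a rough sketch of how the pattern above might be applied, the snippet below runs it over the six sample lines from the question. The helper name parseTrack() and the array keys are assumptions made for illustration, not part of the answer. Note that capture group 4 still contains the optional parentheses, so they are stripped afterwards.

```php
<?php
// Sketch only: parseTrack() is a hypothetical helper name, not part of the answer.
function parseTrack(string $line): ?array
{
    // Anchored at both ends; the U modifier makes every quantifier prefer the shortest match.
    $pattern = '/^([0-9]+)\.?[\s]+(.*)([\s]+(\(?[0-5]?[0-9]:[0-5][0-9]\)?))?$/U';

    if (!preg_match($pattern, $line, $m)) {
        return null; // not a recognisable track line
    }

    return [
        'number'   => $m[1],
        'title'    => $m[2],
        // Group 4 is missing from $m when there is no duration, and it
        // still includes the optional parentheses, so strip them here.
        'duration' => isset($m[4]) ? trim($m[4], '()') : '',
    ];
}

$samples = [
    '01. Track Title (01:23)',
    '01. Track Title 01:23',
    '01. Track Title',
    '1 Track Title (01:23)',
    '1 Track Title 01:23',
    '1 Track Title',
];

foreach ($samples as $line) {
    $t = parseTrack($line);
    printf("%s | %s | %s\n", $t['number'], $t['title'], $t['duration']);
}
```

Returning null for lines that do not match leaves it to the caller to decide whether to skip them or print them unformatted.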

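By default, regular expressions are greedy, so the .* part that matches the title eats the rest of the string, because the final part with the duration is optional.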

Using the /U modifier turns on ungreedy behavior. Look for PCRE_UNGREEDY at http://us1.php.net/manual/en/reference.pcre.pattern.modifiers.php
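To make the greedy versus ungreedy difference concrete, here is a small sketch that uses deliberately simplified patterns (they are not the ones from the question or the answer). The three patterns differ only in how greedy the title group is: plain .*, .* under the U modifier, and the lazy quantifier .*? without any modifier.

```php
<?php
$line = '01. Track Title 01:23';

// Simplified patterns for illustration; only the greediness of the title group differs.
$greedy   = '/^([0-9]+)\.?\s+(.*)(\s+[0-9]+:[0-9]{2})?$/';
$ungreedy = '/^([0-9]+)\.?\s+(.*)(\s+[0-9]+:[0-9]{2})?$/U';
$lazy     = '/^([0-9]+)\.?\s+(.*?)(\s+[0-9]+:[0-9]{2})?$/';

preg_match($greedy, $line, $g);
preg_match($ungreedy, $line, $u);
preg_match($lazy, $line, $l);

// Greedy: the title swallows the duration and the optional group never takes part.
echo $g[2], "\n";            // Track Title 01:23
var_dump(isset($g[3]));      // bool(false)

// Ungreedy (U modifier) and lazy (.*?) both stop the title before the duration.
echo $u[2], ' / ', trim($u[3]), "\n";   // Track Title / 01:23
echo $l[2], ' / ', trim($l[3]), "\n";   // Track Title / 01:23
```

The U modifier flips the default for every quantifier in the pattern, whereas .*? changes only the one quantifier it is attached to, which can be easier to reason about in longer patterns.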

How about:

^([0-9]+)\.?\s+(.*?)(?:\(?([0-5]?[0-9]:[0-5][0-9])\)?)?$

Explanation:

The regular expression:

(?-imsx:^([0-9]+)\.?\s+(.*?)(?:\(?([0-5]?[0-9]:[0-5][0-9])\)?)?$)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \.?                      '.' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \(?                      '(' (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      [0-5]?                   any character of: '0' to '5' (optional
                               (matching the most amount possible))
----------------------------------------------------------------------
      [0-9]                    any character of: '0' to '9'
----------------------------------------------------------------------
      :                        ':'
----------------------------------------------------------------------
      [0-5]                    any character of: '0' to '5'
----------------------------------------------------------------------
      [0-9]                    any character of: '0' to '9'
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
    \)?                      ')' (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
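One possible way to turn this pattern into the table row from the question is sketched below. The helper name trackToRow(), the HTML escaping, and the trimming are assumptions added for illustration. The trim on the title is needed because the lazy group \2 keeps the whitespace that precedes the duration.

```php
<?php
// Sketch only: trackToRow() is a hypothetical helper, not part of the answer.
function trackToRow(string $line): ?string
{
    // The pattern from the answer, wrapped in / delimiters for preg_match.
    $pattern = '/^([0-9]+)\.?\s+(.*?)(?:\(?([0-5]?[0-9]:[0-5][0-9])\)?)?$/';

    if (!preg_match($pattern, trim($line), $m)) {
        return null; // not a recognisable track line
    }

    $number   = $m[1];
    $title    = trim($m[2]);   // the lazy group keeps the space before the duration
    $duration = $m[3] ?? '';   // group 3 is absent when the line has no duration

    return sprintf(
        '<td>%s</td><td>%s</td><td>%s</td>',
        htmlspecialchars($number),
        htmlspecialchars($title),
        htmlspecialchars($duration)
    );
}

$samples = [
    '01. Track Title (01:23)',
    '01. Track Title 01:23',
    '01. Track Title',
    '1 Track Title (01:23)',
    '1 Track Title 01:23',
    '1 Track Title',
];

foreach ($samples as $line) {
    echo trackToRow($line), "\n";
}
```

Because the parentheses sit outside capture group 3, the duration comes back without them and needs no further cleanup.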