Problem in regex syntax

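I have a txt file with many lines, and I want to search it for versions and dates. Which regular expression would be suitable for getting content like v 1.31.6.7 2008/03/07 into an array?

It comes from many txt files like this:

This file may contain rules that were created, tested, and certified by Sourcefire, Inc. (the "VRT Certified Rules") as well as rules created by Sourcefire and other third parties, and $Id: ddos.rules,v 1.31.6.7 2008/03/07 20:53:40 vrtbuild Exp DDOS rules

The version may differ, for example: v 1.48.6.12

Something in that format.

The date differs as well.

Suppose I have many repeated lines: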

$Id: ddos.rules,v 1.31.6.7 2008/03/07 20:53:40 vrtbuild Exp
$Id: exploit.rules,v 1.116.6.53 2008/11/18 16:36:27 vrtbuild Exp $
$Id: misc.rules,v 1.77.6.20 2008/10/17 19:36:59 vrtbuild Exp $
$Id: smtp.rules,v 1.77.6.19 2008/10/17 19:37:00 vrtbuild Exp $
$Id: tftp.rules,v 1.28.6.6 2008/07/22 17:59:06 vrtbuild Exp $
$Id: web-iis.rules,v 1.110.6.11 2008/07/22 17:59:06 vrtbuild Exp $
$Id: web-attacks.rules,v 1.23 2005/05/16 22:18:17 mwatchinski Exp $

with different dates and v (version) values.

I found a date pattern like this:

^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))$

Can someone explain it?

Your date regex:

^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))$

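is very interesting. I decided to analyze it to see exactly what it matches. It turns out that this regex matches all valid dates in DD/MM/YYYY format from 1900 to 9999. Interestingly, it also correctly matches all valid leap days from 1597 to 9999. The regex knows how many days each month has: it knows that May has 31 days and June only 30. It also knows that February has 28 days, or 29 in a leap year. Here it is broken down so that mere mortals can read it: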

$re_date = '%
    # Match all valid DD/MM/YYYY dates from 1900 to 9999 and
    #   all leap days from year 1597 to 9999.
    ^                           # Anchor to start of string.
    ( # $1:
      (  # $2: Date format alternative 1: (months having 31 days)
        (0[1-9]|[12]\d|3[01])   # $3: Day: 01-09,10-19,20-29,30,31
        \/
        (0[13578]|1[02])        # $4: Month: 01,03,05,07,08,10,12
        \/
        ((19|[2-9]\d)\d{2})     # $5,$6: Year: 1900-9999
      )                         # End $2:
    | (  # $7: Date format alternative 2: (months having at least 30 days)
        (0[1-9]|[12]\d|30)      # $8: Day: 01-09,10-19,20-29,30
        \/
        (0[13456789]|1[012])    # $9: Month: 01,03-09,10-12
        \/
        ((19|[2-9]\d)\d{2})     # $10,$11: Year: 1900-9999
      )                         # End $7:
    | (  # $12: Date format alternative 3: (month having 28 days)
        (0[1-9]|1\d|2[0-8])     # $13: Day 01-09,10-19,20-28
        \/
        02                      # Month: 02
        \/
        ((19|[2-9]\d)\d{2})     # $14,$15: Year: 1900-9999
      )                         # End $12:
    | (  # $16: Date format alternative 4: (leap days)
        29                      # Day: 29
        \/
        02                      # Month: 02
        \/ # Match all valid leap day dates from year 1597 to 9999.
        (                       # $17: Year alt 1 (divisible by 4 but not 100)
          (1[6-9]|[2-9]\d)      # $18: Century part: 16-19,20-99
          ( 0[48]               # $19: Year part: Either 04 or 08
          | [2468][048]         # or 20,24,28,40,44,48,60,64,68,80,84,88
          | [13579][26]         # or 12,16,32,36,52,56,72,76,92,96
          )                     # End $19:
        | (                     # or $20: Year alternative 2 (divisible by 400)
            ( 16                # $21: Century part: Either 16
            | [2468][048]       # or 20,24,28,40,44,48,60,64,68,80,84,88
            | [3579][26]        # or 32,36,52,56,72,76,92,96
            )                   # End $21:
            00                  # Year part: 00
          )                     # End $20:
        )                       # End $17:
      )                         # End $16:
    )                           # End $1:
    $                           # Anchor to end of string.
    %x';
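
The analysis above can be spot-checked with a short script. This is only a sketch: it uses the compact one-line form of the pattern, and the sample dates and the variable names `$re`, `$samples`, and `$results` are chosen here for illustration. The last sample is written in the YYYY/MM/DD order used by the `$Id` lines in the question, to show how this DD/MM/YYYY pattern treats it.

```php
<?php
// Compact form of the DD/MM/YYYY date pattern discussed above.
$re = '%^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))'
    . '|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))'
    . '|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))'
    . '|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])'
    . '|((16|[2468][048]|[3579][26])00))))$%';

// Illustrative inputs (assumptions, not taken from the rule files).
$samples = [
    '31/05/2008',  // May has 31 days
    '31/06/2008',  // June has only 30 days
    '29/02/2008',  // leap day in a leap year
    '29/02/1900',  // 1900 is divisible by 100 but not 400
    '29/02/2000',  // 2000 is divisible by 400
    '2008/03/07',  // YYYY/MM/DD order, as in the $Id lines
];

$results = [];
foreach ($samples as $s) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$s] = preg_match($re, $s) === 1;
    printf("%-12s %s\n", $s, $results[$s] ? 'match' : 'no match');
}
```

Because the pattern is anchored with `^` and `$` and expects the day first, it is better suited to validating a standalone DD/MM/YYYY string than to extracting the YYYY/MM/DD dates embedded in the `$Id` lines.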

To solve the problem at hand, here is a more precise regex:

$count = preg_match_all('%
    # Match version/date sub-string
    \b          # Anchor to word boundary.
    (           # $1: Version number.
      [Vv]      # Version identifier (allow V or v).
      [ ]+      # One or more spaces.
      [0-9]+    # Major version number is one or more digits.
      (?:       # Group minor version numbers.
        \.      # Minor versions separated by dot.
        [0-9]+  # Minor version is one or more digits.
      )*        # Zero or more minor versions.
    )           # End $1: Version number.
    [ ]+        # One or more spaces.
    (           # $2: Date.
      [0-9]{4}  # Year is four digits.
      /         # / Separator.
      [0-9]{2}  # Month is two digits.
      /         # / Separator.
      [0-9]{2}  # Day is two digits.
    )           # End $2: Date.
    %x', $text, $matches);
if ($count > 0) {
    $versions = $matches[1];
    $dates    = $matches[2];
    printf("Found %d matches:\n", $count);
    for ($i = 0; $i < $count; ++$i) {
        printf("  Match%3d:  Version: %-15s  Date: %s\n",
            $i + 1, $versions[$i], $dates[$i]);
    }
} else {
    echo("No matches found.\n");
}

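Note: when dealing with non-trivial regexes like this one, it is best to write them in 'x' free-spacing mode. This allows plenty of comments and indentation, which makes them much easier to read.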
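
To try the version/date pattern end to end, here is a self-contained sketch. It assumes `$text` holds three of the `$Id` lines quoted in the question, and it collects the results into a `$pairs` array (a name introduced here for illustration). The pattern is a condensed version of the one above, still written in free-spacing mode.

```php
<?php
// Sample input: three $Id lines from the question.
// A nowdoc (<<<'EOT') is used so that "$Id" is not treated as a PHP variable.
$text = <<<'EOT'
$Id: ddos.rules,v 1.31.6.7 2008/03/07 20:53:40 vrtbuild Exp
$Id: exploit.rules,v 1.116.6.53 2008/11/18 16:36:27 vrtbuild Exp $
$Id: web-attacks.rules,v 1.23 2005/05/16 22:18:17 mwatchinski Exp $
EOT;

// Condensed version/date pattern in free-spacing (x) mode.
$re = '%
    \b
    ( [Vv] [ ]+ [0-9]+ (?: \. [0-9]+ )* )   # $1: version, such as "v 1.31.6.7"
    [ ]+
    ( [0-9]{4} / [0-9]{2} / [0-9]{2} )      # $2: date as YYYY/MM/DD
    %x';

// PREG_SET_ORDER yields one array per match: [full match, version, date].
$count = preg_match_all($re, $text, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    $pairs[] = ['version' => $m[1], 'date' => $m[2]];
}
print_r($pairs);
```

With `PREG_SET_ORDER`, each version stays paired with its date in a single entry, which may be more convenient than the two parallel arrays when the results are passed on to other code.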

foreach ($lines as $line){
    if (preg_match("|v (.*?) (.*?) |", $line, $match)){
        echo "found version ".$match[1]." date ".$match[2];
    }
}

Is this exactly what you want?