How to extract data (a string) in PHP using regex?

I am trying to extract the fields from this string:

$str = "Instant Oatmeal - Corn Flavour 175g (35g x 5)";
preg_match('/(?P<name>.*) (?P<total_weight>\d+)(?P<total_weight_unit>.*) \((?P<unitWeight>\d+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/', $str, $m);

The result is correct:

Instant Oatmeal - Corn Flavour 175g (35g x 5)
name : Instant Oatmeal - Corn Flavour
total_weight : 175 g
#portion : 5
unit weight : 35 g

But if I try to extract from:

$str = "Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)";

the result is wrong:

Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)
name : Cholcolate Sandwich Cookies (Tray)
total_weight : 264 .6g
#portion : 9
unit weight : 29 .4g

How can I fix this?

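Use free-spacing mode for non-trivial regexes!

When dealing with a non-trivial regex like this one, you can dramatically improve readability (and maintainability) by writing it in free-spacing format with plenty of comments (and indentation for any nested parentheses). Here is your original regex in free-spacing format with comments: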

$re_orig = '/# Original regex with added comments.
    (?P<name>.*)               # $name:
    [ ]                        # Space separates name from weight.
    (?P<total_weight>\d+)      # $total_weight:
    (?P<total_weight_unit>.*)  # $total_weight_unit:
    [ ]                        # Space separates total units from portions data.
    \(                         # Literal parens enclosing portions data.
    (?P<unitWeight>\d+)        # $unitWeight:
    (?P<unitWeight_unit>.*)    # $unitWeight_unit:
    [ ]x[ ]                    # "space-X-space" separates portions data.
    (?P<portion_no>\d+)        # $portion_no:
    \)                         # Literal parens enclosing portions data.
    /x';

Here is an improved version:

$re_improved = '/# Match Name, total weight, units and portions data.
    ^                       # Anchor to start of string.
    (?P<name>.*?)           # $name:
    [ ]+                    # Space(s) separate name from weight.
    (?P<total_weight>       # $total_weight:
      \d+                   # Required integer portion.
      (?:\.\d*)?            # Optional fractional portion.
    )
    (?P<total_weight_unit>  # $total_weight_unit:
      .+?                   # Units consist of any chars.
    )
    [ ]+                    # Space(s) separate total from portions.
    \(                      # Literal parens enclosing portions data.
    (?P<unitWeight>         # $unitWeight:
      \d+                   # Required integer portion.
      (?:\.\d*)?            # Optional fractional portion.
    )
    (?P<unitWeight_unit>    # $unitWeight_unit:
      .+?                   # Units consist of any chars.
    )
    [ ]+x[ ]+               # "space-X-space" separates portions data.
    (?P<portion_no>         # $portion_no:
      \d+                   # Required integer portion.
      (?:\.\d*)?            # Optional fractional portion.
    )
    \)                      # Literal parens enclosing portions data.
    $                       # Anchor to end of string.
    /xi';

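Notes:

  • The expressions for all numeric quantities have been improved to allow an optional fractional part.
  • Start-of-string and end-of-string anchors have been added.
  • The i (ignore-case) modifier has been added in case the X in the portions data is uppercase.

I don't know how you are applying this regex, but the improved version should solve your immediate problem.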
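
As a minimal sketch of one way to apply it: the helper name `parse_product` and the shape of the returned array are assumptions made for illustration, not something from the question, and the type declarations assume PHP 7.1 or later. The pattern is the improved regex from above with most comments trimmed, repeated so the snippet runs on its own.

```php
<?php
// Hypothetical helper: the name and the returned array shape are
// illustrative assumptions, not part of the original question.
function parse_product(string $str): ?array
{
    $re = '/# Match name, total weight, units and portions data.
        ^
        (?P<name>.*?)                     # Lazy, so it stops before the weight.
        [ ]+
        (?P<total_weight>\d+(?:\.\d*)?)   # Integer plus optional fraction.
        (?P<total_weight_unit>.+?)
        [ ]+
        \(
        (?P<unitWeight>\d+(?:\.\d*)?)
        (?P<unitWeight_unit>.+?)
        [ ]+x[ ]+
        (?P<portion_no>\d+(?:\.\d*)?)
        \)
        $
        /xi';

    if (preg_match($re, $str, $m) !== 1) {
        return null; // No match, or a regex error.
    }

    // preg_match() stores each named group under its name and its number;
    // keep only the string keys.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$samples = [
    "Instant Oatmeal - Corn Flavour 175g (35g x 5)",
    "Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)",
];

foreach ($samples as $s) {
    print_r(parse_product($s));
}
```

Returning null on a failed match lets the caller tell "no match" apart from an empty field, which matters here because the pattern is anchored at both ends and rejects any string with extra trailing text.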

Edit: 2011-10-09 11:17 MDT Changed the expression for the units to allow for the case pointed out by Ilmari Karonen.

Use:

/(?P<name>.*) (?P<total_weight>\b[0-9]*\.?[0-9]+)(?P<total_weight_unit>.*) \((?P<unitWeight>\b[0-9]*\.?[0-9]+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/

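Your problem was that you did not account for floating-point numbers: \d+ stops at the decimal point, so the rest of the number (".6g") ends up in the unit group. I corrected this. Note that the portion count is still an integer, but I guess that is logical :)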
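
A short sketch of this pattern in use, under the assumption that the captured values will be used as numbers afterwards. The variable names `$total`, `$unit` and `$portion` are illustrative only.

```php
<?php
$re = '/(?P<name>.*) (?P<total_weight>\b[0-9]*\.?[0-9]+)(?P<total_weight_unit>.*) \((?P<unitWeight>\b[0-9]*\.?[0-9]+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/';

$str = "Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)";

if (preg_match($re, $str, $m)) {
    // preg_match() returns every capture as a string,
    // so cast before doing any arithmetic.
    $total   = (float) $m['total_weight'];
    $unit    = (float) $m['unitWeight'];
    $portion = (int) $m['portion_no'];

    echo "name : {$m['name']}\n";
    echo "total_weight : {$total} {$m['total_weight_unit']}\n";
    echo "#portion : {$portion}\n";
    echo "unit weight : {$unit} {$m['unitWeight_unit']}\n";
}
```

Because this pattern is not anchored and uses greedy .* groups, it relies on backtracking to find the right split. That works for the two sample strings, but input with additional numbers or parentheses may split differently.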