Complex regular expression to resolve

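I am trying to parse a URL with a regular expression to capture its elements, but I don't know how to go about it. The URLs look like this:

Examples: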
  • location-cottage with $path_ => array(type => cottage)
  • location-cottage-p1 with $path_ => array(type => cottage, page => p1)
  • location-cottage-my-region-r01 with $path_ => array(type => cottage, region => r01)
  • location-cottage-my-department-d01 with $path_ => array(type => cottage, department => d01)
  • location-cottage-my-department-d01-p1 with $path_ => array(type => cottage, department => d01, page => p1)

I would like to do this with a single regular expression, but I don't know how. This is what I tried:

$expression = '#location-(?P<type>cottage|house)[a-z,-]*';
$expression.= '(?P<region>r[0-9]{2}|)';
$expression.= '(?P<department>d[0-9]{2})';
$expression.= '(?P<town>v[0-9]{5}|)';
$expression.= '[-]*(?P<page>[p0-9]*)$#';
preg_match($expression, $_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'], $path_);

Can anyone help me?

As a second step, if possible, I would like to keep only 01 rather than d01, and only 1 rather than p1, like this:

  • location-cottage-my-department-d01-p1 with $path_ => array(type => cottage, department => 01, page => 1)

First, use the x modifier (the trailing #x) to make your regex more readable. Then put a ? after each optional capture group:

$expression = <<<RX
    #
      location-(?P<type>cottage|house)[a-z,-]*
      (?P<region> r[0-9]{2}|)   ?
      (?P<department> d[0-9]{2})   ?
      (?P<town> v[0-9]{5}|)   ?
      [-]*(?P<page> [p0-9]*)   ?
    $#x
RX;

If you don't want to capture the d, move it out of the named capture group and wrap the whole thing in (?: )?.
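To make that last point concrete, here is a minimal sketch of how it could look. It is not the regex from the question: the function name parse_location_path, the constant LOCATION_PATTERN, and the ~ delimiter (chosen so that # can start comments in x mode) are my own choices, and I am assuming that the free-text words such as "my-region" contain only lowercase letters.

```php
<?php
// Hypothetical pattern: each letter prefix sits outside the named group,
// so only the digits are captured.
const LOCATION_PATTERN = <<<'RX'
~
  location-(?P<type>cottage|house)   # listing type
  (?:-[a-z]+)*?                      # free-text words like "my-region" (lazy)
  (?:-r(?P<region>[0-9]{2}))?        # region, digits only
  (?:-d(?P<department>[0-9]{2}))?    # department, digits only
  (?:-v(?P<town>[0-9]{5}))?          # town, digits only
  (?:-p(?P<page>[0-9]+))?            # page number, digits only
$~x
RX;

// Hypothetical helper: returns only the named groups that matched,
// or null when the string does not end in a recognised location path.
function parse_location_path(string $path): ?array
{
    if (!preg_match(LOCATION_PATTERN, $path, $m)) {
        return null;
    }
    $result = [];
    foreach ($m as $key => $value) {
        // preg_match() returns numeric and named keys; unmatched groups
        // are empty strings (or absent), so drop both kinds.
        if (is_string($key) && $value !== '') {
            $result[$key] = $value;
        }
    }
    return $result;
}

var_export(parse_location_path('location-cottage-my-department-d01-p1'));
```

The pattern is only anchored at the end, so it should also work on $_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'] as in the question, as long as nothing (a query string, for instance) follows the path.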

You can parse the string yourself instead of using a regular expression (which is overkill in most cases):

list($locationString, $type, $region, $department, $town, $page) = array_pad(explode('-', $path), 6, null);

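Now validate each parameter separately (note that missing parameters are null because of array_pad()). This is not any easier to read, but it is easier to modify later, for example when you want to add new types.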
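One caveat: the sample URLs contain free-text words ("my-department") and skip some parts, so a purely positional assignment will not line up with them. A rough sketch of a variant that splits on - and then recognises each token by its letter prefix might look like the following. The function name parse_location_tokens and the $rules table are hypothetical, and the digit counts are taken from the regex in the question.

```php
<?php
// Hypothetical helper: split on "-" and classify each token by its prefix.
function parse_location_tokens(string $path): ?array
{
    $tokens = explode('-', $path);
    if (count($tokens) < 2
        || $tokens[0] !== 'location'
        || !in_array($tokens[1], ['cottage', 'house'], true)) {
        return null; // not a location path at all
    }

    // prefix => [result key, required digit count (null = any length)]
    $rules = [
        'r' => ['region', 2],
        'd' => ['department', 2],
        'v' => ['town', 5],
        'p' => ['page', null],
    ];

    $result = ['type' => $tokens[1]];
    foreach (array_slice($tokens, 2) as $token) {
        $prefix = $token[0] ?? '';
        $digits = (string) substr($token, 1);
        if (!isset($rules[$prefix]) || !ctype_digit($digits)) {
            continue; // free-text word such as "my" or "department"
        }
        [$name, $length] = $rules[$prefix];
        if ($length === null || strlen($digits) === $length) {
            $result[$name] = $digits; // keep the digits only, e.g. "01"
        }
    }
    return $result;
}

var_export(parse_location_tokens('location-cottage-my-department-d01-p1'));
```

Adding a new type or a new prefixed parameter then means touching one array rather than rewriting a pattern. Unlike the regex, this version does not enforce the order of the parts.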