Parse URL string (path and parameters) into array

I just wrote a small function here and need some help optimizing it!

All requests are redirected to the index page.

There I have a function that parses the URL into an array.

The URLs have this form:

http://localhost/{user}/{page}/?sub_page={sub_page}&action={action}

For example:

http://localhost/admin/stock/?sub_page=products&action=add

When the URI is requested, the domain is excluded, so my function receives a string like this:

/admin/stock/?sub_page=products&action=add

My function is below; be warned, it is very procedural.

For those too lazy to read and understand it, I've added an explanation at the bottom ;)

function uri_to_array($uri){
    // uri will be in format: /{user}/{page}/?sub_page={subpage}&action={action} ... && plus additional parameters
    // define array that will be returned
    $return_uri_array = array();
    // separate path from querystring;
    $array_tmp_uri = explode("?", $uri);
    // if explode returns the same as input $string, no delimiter was found
    if ($uri == $array_tmp_uri[0]){ 
        // no question mark found.
        // format either '/{user}/{page}/' or '/{user}/'
        $uri = trim($array_tmp_uri[0], "/");
        // remove excess baggage
        unset ($array_tmp_uri);
        // format either '{user}/{page}' or '{user}'
        $array_uri = explode("/", $uri);
        // if explode returns the same as input $string, no delimiter was found
        if ($uri == $array_uri[0]){
            // no {page} defined, just user.
            $return_uri_array["user"] = $array_uri[0];
        }
        else{
            // {user} and {page} defined.
            $return_uri_array["user"] = $array_uri[0];
            $return_uri_array["page"] = $array_uri[1];            
        }
    }
    else{
        // query string is defined
        // format either '/{user}/{page}/' or '/{user}/'
        $uri = trim($array_tmp_uri[0], "/");
        $parameters = trim($array_tmp_uri[1]);
        // PARSE PATH
        // remove excess baggage
        unset ($array_tmp_uri);
        // format either '{user}/{page}' or '{user}'
        $array_uri = explode("/", $uri);
        // if explode returns the same as input $string, no delimiter was found
        if ($uri == $array_uri[0]){
            // no {page} defined, just user.
            $return_uri_array["user"] = $array_uri[0];
        }
        else{
            // {user} and {page} defined.
            $return_uri_array["user"] = $array_uri[0];
            $return_uri_array["page"] = $array_uri[1];            
        }
        // parse parameter string
        $parameter_array = array();
        parse_str($parameters, $parameter_array);
        // copy parameter array into return array
        foreach ($parameter_array as $key => $value){
            $return_uri_array[$key] = $value;
        }
    }
    return $return_uri_array;
}

Basically there is one main if statement: one branch handles the case where no query string is defined (no "?"), the other the case where one is.

I just want to make this function better.

Is it worth making it a class?

Essentially, I need a function that takes /{user}/{page}/?sub_page={sub_page}&action={action} as its argument and returns

array(
    "user" => {user},
    "page" => {page},
    "sub_page" => {sub_page},
    "action" => {action}
)

Cheers, Alex

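If you want to:

- do it properly,
- use regular expressions, and
- parse all URLs with the same method (parse_url() does not support relative paths, referred to as only_path below),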

then this might be to your taste:

$url = 'http://localhost/admin/stock/?sub_page=products&action=add';
// host class is [\w\-.%]: PCRE2 (PHP 7.3+) rejects an unescaped hyphen right after \d, as in [\w\d-.%]
preg_match ("!^((?P<scheme>[a-zA-Z][a-zA-Z\d+-.]*):)?(((//(((?P<credentials>([a-zA-Z\d\-._~\!$&'()*+,;=%]*)(:([a-zA-Z\d\-._~\!$&'()*+,;=:%]*))?)@)?(?P<host>([\w\-.%]+)|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(\[([a-fA-F\d.:]+)\]))?(:(?P<port>\d*))?))(?<path>(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))|(?P<only_path>(/(([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))?)|([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*)))?(?P<query>\?([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*))?(?P<fragment>#([a-zA-Z\d\-._~\!$&'()*+,;=:@%/?]*))?$!u", $url, $matches);
$parts = array_intersect_key ($matches, array ('scheme' => '', 'credentials' => '', 'host' => '', 'port' => '', 'path' => '', 'query' => '', 'fragment' => '', 'only_path' => '', ));
var_dump ($parts);

It should cover every possible well-formed URL.

If host is empty, only_path should contain the path, i.e. for a URL without protocol and without host.

Update:

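Maybe I should have read the question a little more carefully. This splits the URL into components, which make it easier to get at the parts you are actually interested in: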

// split the URL
preg_match ('!^((?P<scheme>[a-zA-Z][a-zA-Z\d+-.]*):)?(((//(((?P<credentials>([a-zA-Z\d\-._~\!$&\'()*+,;=%]*)(:([a-zA-Z\d\-._~\!$&\'()*+,;=:%]*))?)@)?(?P<host>([\w\-.%]+)|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(\[([a-fA-F\d.:]+)\]))?(:(?P<port>\d*))?))(?<path>(/[a-zA-Z\d\-._~\!$&\'()*+,;=:@%]*)*))|(?P<only_path>(/(([a-zA-Z\d\-._~\!$&\'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&\'()*+,;=:@%]*)*))?)|([a-zA-Z\d\-._~\!$&\'()*+,;=:@%]+(/[a-zA-Z\d\-._~\!$&\'()*+,;=:@%]*)*)))?(\?(?P<query>([a-zA-Z\d\-._~\!$&\'()*+,;=:@%/?]*)))?(#(?P<fragment>([a-zA-Z\d\-._~\!$&\'()*+,;=:@%/?]*)))?$!u', $url, $matches);
$parts = array_intersect_key ($matches, array ('scheme' => '', 'credentials' => '', 'host' => '', 'port' => '', 'path' => '', 'query' => '', 'fragment' => '', 'only_path' => '', ));
// extract the user and page
preg_match ('!/*(?P<user>.*)/(?P<page>.*)/!u', $parts['path'], $matches);
$user_and_page = array_intersect_key ($matches, array ('user' => '', 'page' => '', ));
// the query string stuff
$query = array ();
parse_str ($parts['query'], $query);
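The update stops once it has `$user_and_page` and `$query`, while the question wants one combined array. A minimal sketch of that last step follows, with one stated assumption: to stay short and self-contained it gets the path and query string from PHP's built-in `parse_url()` instead of the long regex above. The function name `uri_to_array_sketch` is hypothetical.

```php
<?php
// Sketch only: parse_url() replaces the long regex (assumption, for brevity);
// the user/page regex and the parse_str() step are the ones from the answer.
function uri_to_array_sketch($uri)
{
    $parts = parse_url($uri);
    $path = isset($parts['path']) ? $parts['path'] : '';
    $query_string = isset($parts['query']) ? $parts['query'] : '';

    // extract the user and page (same pattern as in the answer)
    $matches = array();
    preg_match('!/*(?P<user>.*)/(?P<page>.*)/!u', $path, $matches);
    $user_and_page = array_intersect_key($matches, array('user' => '', 'page' => ''));

    // the query string stuff
    $query = array();
    parse_str($query_string, $query);

    // user/page go last, so a query parameter named "user" cannot override them
    return array_merge($query, $user_and_page);
}

print_r(uri_to_array_sketch('/admin/stock/?sub_page=products&action=add'));
```

Because `parse_url()` also accepts the full `http://localhost/...` form, the same sketch works whether or not the domain is still attached.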

References:

To clarify, these are the relevant documents the regex was built from:

RFC 3986 scheme/protocol
RFC 3986 user and password
RFC 1035 host name
- or RFC 3986 IPv4
- or RFC 2732 IPv6
RFC 3986 query
RFC 3986 fragment

Something like this?

function uri_to_array($uri){
  $result = array();
  parse_str(substr($uri, strpos($uri, '?') + 1), $result);
  list($result['user'], $result['page']) = explode('/', trim($uri, '/'));
  return $result;
}
print_r(
  uri_to_array('/admin/stock/?sub_page=products&action=add')
);
/*
Array
(
    [sub_page] => products
    [action] => add
    [page] => stock
    [user] => admin
)
*/
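One caveat with this compact version: when the URI has no `?`, `strpos()` returns `false`, so `substr($uri, false + 1)` passes the path itself to `parse_str()` and you get a stray key such as `admin/stock/`; and for a bare `/{user}/` the `list()` assignment has no second element for `page`, which raises an undefined-offset notice. A guarded variant, sketched under the assumption that a missing page should simply leave the `page` key out (as the question's own function does):

```php
<?php
// Variant of the compact answer that also handles a URI without a query
// string and a URI without a {page} segment.
function uri_to_array($uri)
{
    $result = array();
    $qpos = strpos($uri, '?');

    // only hand the part after "?" to parse_str() when there is one
    $path = ($qpos === false) ? $uri : substr($uri, 0, $qpos);
    if ($qpos !== false) {
        parse_str(substr($uri, $qpos + 1), $result);
    }

    $segments = explode('/', trim($path, '/'));
    $result['user'] = $segments[0];
    if (isset($segments[1])) {
        $result['page'] = $segments[1];
    }
    return $result;
}

print_r(uri_to_array('/admin/stock/'));
print_r(uri_to_array('/admin/'));
```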

Demo: http://codepad.org/nBCj38zT

A few suggestions for improving this function.

First, use parse_url instead of explode to separate the host name, path and query string.
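For the path-plus-query strings in the question, `parse_url()` already does the splitting: given a component constant it returns just that piece, and `null` when the piece is absent. A short illustration using the example URI from the question:

```php
<?php
// parse_url() with a component constant returns a single piece of the URI.
$uri = '/admin/stock/?sub_page=products&action=add';

$path  = parse_url($uri, PHP_URL_PATH);   // "/admin/stock/"
$query = parse_url($uri, PHP_URL_QUERY);  // "sub_page=products&action=add"

// With no "?" in the URI, the query component is missing and null comes back.
$no_query = parse_url('/admin/stock/', PHP_URL_QUERY);

var_dump($path, $query, $no_query);
```

That `null` check replaces the "did explode() return the input unchanged?" test in the original function.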

Second, put the code that parses the path before the check for a query string, since the path has to be parsed either way.

Third, instead of copying the parameters with a foreach loop, use array_merge, like this:

// put $return_uri_array last so $parameter_array can't override values
$return_uri_array = array_merge($parameter_array, $return_uri_array);
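Put together, the three suggestions might look roughly like this. It is a sketch rather than the answerer's code: it keeps the question's function and variable names, and it assumes a missing `{page}` should simply leave the `page` key out, as the original does.

```php
<?php
// Refactor along the three suggestions: parse_url() instead of explode("?"),
// the path parsed once whether or not there is a query string, and
// array_merge() instead of a foreach copy.
function uri_to_array($uri)
{
    $return_uri_array = array();

    // 1. split path and query string
    $parts = parse_url($uri);

    // 2. parse the path first; it exists with or without a query string
    $path = isset($parts['path']) ? trim($parts['path'], '/') : '';
    $array_uri = explode('/', $path);
    $return_uri_array['user'] = $array_uri[0];
    if (isset($array_uri[1])) {
        $return_uri_array['page'] = $array_uri[1];
    }

    // 3. merge the query parameters, if any; $return_uri_array goes last
    //    so a parameter cannot override user/page
    if (isset($parts['query'])) {
        $parameter_array = array();
        parse_str($parts['query'], $parameter_array);
        $return_uri_array = array_merge($parameter_array, $return_uri_array);
    }

    return $return_uri_array;
}

print_r(uri_to_array('/admin/stock/?sub_page=products&action=add'));
```

The single `if` on the query string is all that is left of the original two-branch structure.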

Whether this should be a class depends on your programming style. Personally I always use classes, because they are easier to mock in unit tests.
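If you do go the class route, a minimal sketch could look like the following. Both `UriParser` and its `Router` consumer are hypothetical names chosen for illustration; the point is only that code which receives a parser object can be handed a mock in a unit test, which a hard-wired call to a global function does not allow.

```php
<?php
// Hypothetical class wrapping the parsing logic.
class UriParser
{
    public function parse($uri)
    {
        $result = array();
        $path  = parse_url($uri, PHP_URL_PATH);
        $query = parse_url($uri, PHP_URL_QUERY);

        $segments = explode('/', trim((string) $path, '/'));
        $result['user'] = $segments[0];
        if (isset($segments[1])) {
            $result['page'] = $segments[1];
        }
        if ($query !== null) {
            $params = array();
            parse_str($query, $params);
            // user/page last so parameters cannot override them
            $result = array_merge($params, $result);
        }
        return $result;
    }
}

// Hypothetical consumer: it is given a parser instead of calling a function,
// so a test can pass in a mock UriParser.
class Router
{
    private $parser;

    public function __construct(UriParser $parser)
    {
        $this->parser = $parser;
    }

    public function route($uri)
    {
        $parts = $this->parser->parse($uri);
        return isset($parts['page']) ? $parts['page'] : 'index';
    }
}

$router = new Router(new UriParser());
echo $router->route('/admin/stock/?sub_page=products&action=add'), "\n";
```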

The most compact approach is a regular expression like this one (not thoroughly tested, just to show the principle):

if(preg_match('!http://localhost/(?P<user>\w+)(?:/(?P<page>\w+))/(?:\?sub_page=(?P<sub_page>\w+)&action=(?P<action>\w+))!', $uri, $matches)) {
  return $matches;
}

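The resulting array will also contain numeric indexes for the matches, but you can ignore them or keep only the keys you want with array_intersect_key. The \w+ pattern matches all "word" characters; you can replace it with a character class such as [-a-zA-Z0-9_] or similar.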
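A sketch of that filtering step with `array_intersect_key()` and `array_flip()`. It assumes the input is the path-plus-query string from the question, so the pattern drops the `http://localhost` prefix and is anchored with `^`; otherwise it is the regex above.

```php
<?php
// Keep only the named groups from a preg_match() result.
$uri = '/admin/stock/?sub_page=products&action=add';
$pattern = '!^/(?P<user>\w+)(?:/(?P<page>\w+))/(?:\?sub_page=(?P<sub_page>\w+)&action=(?P<action>\w+))!';

$wanted = array('user', 'page', 'sub_page', 'action');
$result = array();
if (preg_match($pattern, $uri, $matches)) {
    // array_flip() turns the list of names into keys; array_intersect_key()
    // then drops the numeric indexes (0, 1, 2, ...) that preg_match() adds.
    $result = array_intersect_key($matches, array_flip($wanted));
}
print_r($result);
```

Note that this pattern still requires both `{page}` and both parameters in a fixed order; a URI such as `/admin/` simply does not match, and `$result` stays empty.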