Regular expression to partially extract PHP code (array definition)

I have PHP code (an array definition) stored in a string like this:

$code=' array(
  0  => "a",
 "a" => $GlobalScopeVar,
 "b" => array("nested"=>array(1,2,3)),  
 "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';

Is there a regular expression to extract this array? I mean, I want something like this:

$array=array(
  0  => '"a"',
 'a' => '$GlobalScopeVar',
 'b' => 'array("nested"=>array(1,2,3))',
 'c' => 'function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }',
);

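PS: I did some research trying to find a regular expression, but found nothing.
PS2: Gods of Stack Overflow, let me put a bounty on this now; I will offer 400 :3
PS3: This will be used in an internal application where I need to extract the array from some PHP files in order to partially "process" it. I tried to explain it with this: codepad.org/td6LVVme

Regex

So here is the MEGA regex I came up with: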

\s*                                     # white spaces
########################## KEYS START ##########################
(?:                                     # We\'ll use this to make keys optional
(?P<keys>                               # named group: keys
\d+                                     # match digits
|                                       # or
"(?(?=\\\\")..|[^"])*"                  # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
\'(?(?=\\\\\')..|[^\'])*\'              # match string between \'\', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
)                                       # close group: keys
########################## KEYS END ##########################
\s*                                     # white spaces
=>                                      # match =>
)?                                      # make keys optional
\s*                                     # white spaces
########################## VALUES START ##########################
(?P<values>                             # named group: values
\d+                                     # match digits
|                                       # or
"(?(?=\\\\")..|[^"])*"                  # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
\'(?(?=\\\\\')..|[^\'])*\'              # match string between \'\', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
|                                       # or
array\s*\((?:[^()]|(?R))*\)             # match an array()
|                                       # or
\[(?:[^[\]]|(?R))*\]                    # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
|                                       # or
(?:function\s+)?\w+\s*                  # match functions: helloWorld, function name
(?:\((?:[^()]|(?R))*\))                 # match function parameters (wut), (), (array(1,2,4))
(?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
\{(?:[^{}]|(?R))*\}                     # match { whatever}
)?;?                                    # match ; (optionally)
)                                       # close group: values
########################## VALUES END ##########################
\s*                                     # white spaces

I added some comments; note that you need to use 3 modifiers:
x: lets me add comments
s: makes the dot match newlines
i: makes the match case-insensitive
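
As a small illustration of what each modifier does, here is a toy pattern (made up for this example, not part of the regex above) that needs all three to match:

```php
<?php
// Hypothetical toy pattern, only meant to show the x, s and i modifiers.
$pattern = '~
    hello   # x: whitespace and comments inside the pattern are ignored
    .       # s: the dot is allowed to match a newline
    world   # i: letters match regardless of case
~xsi';

// The subject has upper-case letters and a newline between the words.
var_dump(preg_match($pattern, "HELLO\nWorld")); // int(1)
```

Without the s modifier the dot would refuse the newline, and without i the upper-case letters would not match.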

PHP

$code='array(0  => "a", 123 => 123, $_POST["hello"][\'world\'] => array("is", "actually", "An array !"), 1234, \'got problem ?\', 
 "a" => $GlobalScopeVar, $test_further => function test($noway){echo "this works too !!!";}, "yellow" => "blue",
 "b" => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3)), "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
  "bug", "fixed", "mwahahahaa" => "Yeaaaah"
);'; // Sample data
$code = preg_replace('#(^\s*array\s*\(\s*)|(\s*\)\s*;?\s*$)#s', '', $code); // Just to get rid of array( at the beginning, and ); at the end
preg_match_all('~
\s*                                     # white spaces
########################## KEYS START ##########################
(?:                                     # We\'ll use this to make keys optional
(?P<keys>                               # named group: keys
\d+                                     # match digits
|                                       # or
"(?(?=\\\\")..|[^"])*"                  # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
\'(?(?=\\\\\')..|[^\'])*\'              # match string between \'\', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
)                                       # close group: keys
########################## KEYS END ##########################
\s*                                     # white spaces
=>                                      # match =>
)?                                      # make keys optional
\s*                                     # white spaces
########################## VALUES START ##########################
(?P<values>                             # named group: values
\d+                                     # match digits
|                                       # or
"(?(?=\\\\")..|[^"])*"                  # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
\'(?(?=\\\\\')..|[^\'])*\'              # match string between \'\', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
|                                       # or
array\s*\((?:[^()]|(?R))*\)             # match an array()
|                                       # or
\[(?:[^[\]]|(?R))*\]                    # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
|                                       # or
(?:function\s+)?\w+\s*                  # match functions: helloWorld, function name
(?:\((?:[^()]|(?R))*\))                 # match function parameters (wut), (), (array(1,2,4))
(?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
\{(?:[^{}]|(?R))*\}                     # match { whatever}
)?;?                                    # match ; (optionally)
)                                       # close group: values
########################## VALUES END ##########################
\s*                                     # white spaces
~xsi', $code, $m); // Matching :p
print_r($m['keys']); // Print keys
print_r($m['values']); // Print values

// Since some keys may be empty in case you didn't specify them in the array, let's fill them up !
foreach($m['keys'] as $index => &$key){
    if($key === ''){
        $key = 'made_up_index_'.$index;
    }
}
$results = array_combine($m['keys'], $m['values']);
print_r($results); // printing results

Output

Array
(
    [0] => 0
    [1] => 123
    [2] => $_POST["hello"]['world']
    [3] => 
    [4] => 
    [5] => "a"
    [6] => $test_further
    [7] => "yellow"
    [8] => "b"
    [9] => "c"
    [10] => 
    [11] => 
    [12] => "mwahahahaa"
    [13] => "this is"
)
Array
(
    [0] => "a"
    [1] => 123
    [2] => array("is", "actually", "An array !")
    [3] => 1234
    [4] => 'got problem ?'
    [5] => $GlobalScopeVar
    [6] => function test($noway){echo "this works too !!!";}
    [7] => "blue"
    [8] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
    [9] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
    [10] => "bug"
    [11] => "fixed"
    [12] => "Yeaaaah"
    [13] => "a test"
)
Array
(
    [0] => "a"
    [123] => 123
    [$_POST["hello"]['world']] => array("is", "actually", "An array !")
    [made_up_index_3] => 1234
    [made_up_index_4] => 'got problem ?'
    ["a"] => $GlobalScopeVar
    [$test_further] => function test($noway){echo "this works too !!!";}
    ["yellow"] => "blue"
    ["b"] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
    ["c"] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
    [made_up_index_10] => "bug"
    [made_up_index_11] => "fixed"
    ["mwahahahaa"] => "Yeaaaah"
    ["this is"] => "a test"
)

Online regex demo | Online PHP demo

Known bug (fixed)

    $code='array("aaa", "sdsd" => "dsdsd");'; // fail
    $code='array(\'aaa\', \'sdsd\' => "dsdsd");'; // fail
    $code='array("aaa", \'sdsd\' => "dsdsd");'; // succeed
    // Which means, if a value with no keys is followed
    // by key => value and they are using the same quotation
    // then it will fail (first value gets merged with the key)

Online bug demo

Credits

Goes to Bart Kiers for his recursive pattern to match nested brackets.
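
For readers who have not met that trick before, this is roughly how the recursive part works when taken in isolation. It is a minimal sketch with a made-up sample string, not the exact pattern from the credited answer:

```php
<?php
// \( (?: [^()] | (?R) )* \)  matches one balanced (...) group:
// either a character that is not a parenthesis, or the whole pattern again.
$pattern = '~\((?:[^()]|(?R))*\)~';

preg_match($pattern, 'array(1, array(2, 3), foo(4)) + (5)', $m);
echo $m[0], "\n"; // (1, array(2, 3), foo(4))
```

The MEGA regex uses the same idea for (), [] and {}, which is why nested arrays and closure bodies end up inside a single value.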

Advice

You should probably use a parser, because regexes are fragile. @bwoebi did a great job in his answer.

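Even though you asked for a regex, this can also be done in pure PHP. token_get_all is the key function here. For a regex, check out @HamZa's answer.

The advantage here is that it is more dynamic than a regex. A regex has a static pattern, whereas with token_get_all you can decide what to do after every single token. It even escapes single quotes and backslashes where necessary, which a regex wouldn't do.

Also, with a regex, even a commented one, it is hard to picture what it is supposed to do; PHP code is much easier to follow when you read it.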
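
To get a feel for what token_get_all hands back before reading the full solution, a minimal sketch like this one (the sample string is made up) just walks the tokens and prints them:

```php
<?php
// A minimal sketch; the input string is only an illustration.
$tokens = token_get_all('<?php array("a" => 1);');
$names = array();
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array tokens are [token id, source text, line number].
        $names[] = token_name($token[0]);
        echo token_name($token[0]), ': ', $token[1], "\n";
    } else {
        // Punctuation such as ( ) , ; comes back as a plain string.
        $names[] = $token;
        echo 'char: ', $token, "\n";
    }
}
```

The => shows up as its own T_DOUBLE_ARROW token, and that is the hook the code below relies on to tell keys from values.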

$code = ' array(
  0  => "a",
  "a" => $GlobalScopeVar,
  "b" => array("nested"=>array(1,2,3)),  
  "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
  "string_literal",
  12345
); ';
$token = token_get_all("<?php ".$code);
$newcode = "";
$i = 0;
while (++$i < count($token)) { // enter into array; then start.
        if (is_array($token[$i]))
                $newcode .= $token[$i][1];
        else
                $newcode .= $token[$i];
        if ($token[$i] == "(") {
                $ending = ")";
                break;
        }
        if ($token[$i] == "[") {
                $ending = "]";
                break;
        }
}
// init variables
$escape = 0;
$wait_for_non_whitespace = 0;
$parenthesis_count = 0;
$entry = "";
// main loop
while (++$i < count($token)) {
        // don't match commas in func($a, $b)
        if ($token[$i] == "(" || $token[$i] == "{") // ( -> normal parenthesis; { -> closures
                $parenthesis_count++;
        if ($token[$i] == ")" || $token[$i] == "}")
                $parenthesis_count--;
        // begin new string after T_DOUBLE_ARROW
        if (!$escape && $wait_for_non_whitespace && (!is_array($token[$i]) || $token[$i][0] != T_WHITESPACE)) {
                $escape = 1;
                $wait_for_non_whitespace = 0;
                $entry .= "'";
        }
        // here is a T_DOUBLE_ARROW, there will be a string after this
        if (is_array($token[$i]) && $token[$i][0] == T_DOUBLE_ARROW && !$escape) {
                $wait_for_non_whitespace = 1;
        }
        // entry ended: comma reached
        if (!$parenthesis_count && $token[$i] == "," || ($parenthesis_count == -1 && $token[$i] == ")" && $ending == ")") || ($ending == "]" && $token[$i] == "]")) {
                // go back to the first non-whitespace
                $whitespaces = "";
                if ($parenthesis_count == -1 || ($ending == "]" && $token[$i] == "]")) {
                        $cut_at = strlen($entry);
                        while ($cut_at && ord($entry[--$cut_at]) <= 0x20); // 0x20 == " "
                        $whitespaces = substr($entry, $cut_at + 1, strlen($entry));
                        $entry = substr($entry, 0, $cut_at + 1);
                }
                // $escape == true means: there was somewhere a T_DOUBLE_ARROW
                if ($escape) {
                        $escape = 0;
                        $newcode .= $entry."'";
                } else {
                        $newcode .= "'".addcslashes($entry, "'\\")."'";
                }
                $newcode .= $whitespaces.($parenthesis_count?")":(($ending == "]" && $token[$i] == "]")?"]":","));
                // reset
                $entry = "";
        } else {
                // add actual token to $entry
                if (is_array($token[$i])) {
                        $addChar = $token[$i][1];
                } else {
                        $addChar = $token[$i];
                }
                if ($entry == "" && $token[$i][0] == T_WHITESPACE) {
                        $newcode .= $addChar;
                } else {
                        $entry .= $escape?str_replace(array("'", "\\"), array("\\'", "\\\\"), $addChar):$addChar;
                }
        }
}
//append remaining chars like whitespaces or ;
$newcode .= $entry;
print $newcode;

Demo: http://3v4l.org/qe4Q1

This should output:

array(
  0  => '"a"',
  "a" => '$GlobalScopeVar',
  "b" => 'array("nested"=>array(1,2,3))',  
  "c" => 'function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }',
  '"string_literal"',
  '12345'
) 

You can get the array's data with print_r(eval("return $newcode;")); which gives the entries of the array:

Array
(
    [0] => "a"
    [a] => $GlobalScopeVar
    [b] => array("nested"=>array(1,2,3))
    [c] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
    [1] => "string_literal"
    [2] => 12345
)

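The clean way to do this is obviously to use a tokenizer (but keep in mind that a tokenizer alone doesn't solve the problem).

For the challenge, I am going with a regex approach.

The idea is not to describe the PHP syntax, but rather to describe it in a negative way (in other words, I describe only the basic PHP structures needed to obtain the result). The advantage of this basic description is that it copes with objects more complex than functions, strings, integers or booleans. The result is a more flexible pattern that can deal with, for example, multi-line/single-line comments and heredoc/nowdoc syntax: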

<pre><?php
$code=' array(
  0   => "a",
  "a" => $GlobalScopeVar,
  "b" => array("nested"=>array(1,2,3)),  
  "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';
$pattern = <<<'EOD'
~
# elements
(?(DEFINE)
    # comments
    (?<comMulti> /\* .*? (?:\*/|\z) )                                              # multiline comment
    (?<comInlin> (?://|\#) \N* $ )                                                 # inline comment
    (?<comments> \g<comMulti> | \g<comInlin> )
    # strings
    (?<strDQ> " (?>[^"\\]+|\\.)* ")                                                # double quote string
    (?<strSQ> ' (?>[^'\\]+|\\.)* ')                                                # single quote string
    (?<strHND> <<<(["']?)([a-zA-Z]\w*)\g{-2} (?>\R \N*)*? \R \g{-1} ;? (?=\R|$) )  # heredoc and nowdoc syntax
    (?<string> \g<strDQ> | \g<strSQ> | \g<strHND> )
    # brackets
    (?<braCrl> { (?> \g<nobracket> | \g<brackets> )* } )
    (?<braRnd> \( (?> \g<nobracket> | \g<brackets> )* \) )
    (?<braSqr> \[ (?> \g<nobracket> | \g<brackets> )* ] )
    (?<brackets> \g<braCrl> | \g<braRnd> | \g<braSqr> )
    # nobracket: content between brackets except other brackets
    (?<nobracket> (?> [^][)(}{"'</\#]+ | \g<comments> | / | \g<string> | <+ )+ )
    # ignored elements
    (?<s> \s+ | \g<comments> )
)
# array components
(?(DEFINE)    
    # key
    (?<key> [0-9]+ | \g<string> )
    # value
    (?<value> (?> [^][)(}{"'</\#,\s]+ | \g<s> | / | \g<string> | <+ | \g<brackets> )+? (?=\g<s>*[,)]) )
)
(?J)
(?: \G (?!\A)(?<!\)) | array \g<s>* \( ) \g<s>* \K
    (?: (?<key> \g<key> ) \g<s>* => \g<s>* )? (?<value> \g<value> ) \g<s>* (?:,|,?\g<s>*(?<stop> \) ))
~xsm
EOD;

if (preg_match_all($pattern, $code, $m, PREG_SET_ORDER)) {
    foreach($m as $v) {
        echo "\n<strong>Whole match:</strong> " . $v[0]
           . "\n<strong>Key</strong>:\t" . $v['key']
           . "\n<strong>Value</strong>:\t" . $v['value'] . "\n";
        if (isset($v['stop']))
            echo "\n<strong>done</strong>\n\n"; 
    }
}
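
If the (?(DEFINE)...) blocks look intimidating, the mechanism is easier to see on a much smaller case. The following is a cut-down sketch that reuses only the two quoted-string definitions; the sample input is made up:

```php
<?php
// Subpatterns inside (?(DEFINE)...) never match on their own;
// they are only named building blocks that \g<name> calls later.
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<strDQ> " (?> [^"\\]+ | \\. )* " )   # double quote string
    (?<strSQ> ' (?> [^'\\]+ | \\. )* ' )   # single quote string
    (?<string> \g<strDQ> | \g<strSQ> )
)
\g<string>
~x
EOD;

preg_match_all($pattern, 'echo "a \" b", \'c\';', $m);
print_r($m[0]); // two matches: "a \" b" and 'c'
```

The full pattern above is built the same way, just with more building blocks (comments, heredocs, brackets) stacked on top of each other.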

This is what you asked for, very compact. Let me know if you would like any tweaks.

Code (you can run it directly in PHP)

$code=' array(
  0  => "a",
 "a" => $GlobalScopeVar,
 "b" => array("nested"=>array(1,2,3)),  
 "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';
$regex = "~(?xm)
^[\s'\"]*([^'\"\s]+)['\"\s]*
=>\s*+
(.*?)\s*,?\s*$~";
if(preg_match_all($regex,$code,$matches,PREG_SET_ORDER)) {
    $array=array();
    foreach($matches as $match) {
        $array[$match[1]] = $match[2];
    }
    echo "<pre>";
    print_r($array);
    echo "</pre>";
} // END IF

Output

Array
(
    [0] => "a"
    [a] => $GlobalScopeVar
    [b] => array("nested"=>array(1,2,3))
    [c] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
)

$array contains your array.

Do you like it?

Let me know if you have any questions or need tweaks. :)

Just for this case:

$code=' array(
  0=>"a",
  "a"=>$GlobalScopeVar,
  "b"=>array("nested"=>array(1,2,3)),  
  "c"=>function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';
preg_match_all('#\s*(.*?)\s*=>\s*(.*?)\s*,?\s*$#m', $code, $m);
$array = array_combine($m[1], $m[2]);
print_r($array);

Output:

Array
(
    [0] => "a"
    ["a"] => $GlobalScopeVar
    ["b"] => array("nested"=>array(1,2,3))
    ["c"] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
)