Getting all possible matches from a regex capture group

Consider the following regular expression:

/\<form.+?((action|id|method|name)=(\"|\')(.*?)?(\"|\')).*?\>/i

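It is adequate for capturing something as basic as <form>, and it can also capture something like <form action="post.php" method="post" name="form1">, as well as various other combinations of the four attributes listed in the expression above.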

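The reason I chose this expression over the basic /\<form.*?\>/i is that I want to get the values from capture groups 2 and 4 (the attribute name and the attribute value). However, when I run this expression against the more complex form element above, it only returns action and post.php. I would like it to return an array of matches.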

Here is some sample code:

<?php
    $string = '<form action="post.php" method="post" name="form1">';
    preg_match_all('/\<form.+?((action|id|method|name)=(\"|\')(.*?)?(\"|\')).*?\>/i', $string, $forms);
    print_r($forms);
?>

When this is run from the command line for demonstration purposes, the output is:

c:\Users\Aaron\Desktop>php test.php
Array
(
    [0] => Array
        (
            [0] => <form action="post.php" method="post" name="form1">
        )
    [1] => Array
        (
            [0] => action="post.php"
        )
    [2] => Array
        (
            [0] => action
        )
    [3] => Array
        (
            [0] => "
        )
    [4] => Array
        (
            [0] => post.php
        )
    [5] => Array
        (
            [0] => "
        )
)

The output I want looks like this:

c:\Users\Aaron\Desktop>php test.php
Array
(
    [0] => Array
        (
            [0] => <form action="post.php" method="post" name="form1">
            [1] => <form action="post.php" method="post" name="form1">
            [2] => <form action="post.php" method="post" name="form1">
        )
    [1] => Array
        (
            [0] => action="post.php"
            [1] => method="post"
            [2] => name="form1"
        )
    [2] => Array
        (
            [0] => action
            [1] => method
            [2] => name
        )
    [3] => Array
        (
            [0] => "
            [1] => "
            [2] => "
        )
    [4] => Array
        (
            [0] => post.php
            [1] => post
            [2] => form1
        )
    [5] => Array
        (
            [0] => "
            [1] => "
            [2] => "
        )
)

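I am currently able to work around this by finding the form element and then running an expression once for each attribute I want to search for. Here is the code. But I can't help thinking there must be a simpler way.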

So the question is: can I get all of the matches from a capture group, rather than just the first one?

Thanks in advance.

I sincerely suggest that you not use regular expressions to process HTML; just use a DOM parser instead.

The code

<?php
$string = '<form action="post.php" method="post" name="form1">';
$attrib = []; // stays empty if no <form> with attributes is found
$dom = new DOMDocument;
$dom->loadHTML($string);
foreach ($dom->getElementsByTagName('form') as $ftag) {
    if ($ftag->hasAttributes()) {
        foreach ($ftag->attributes as $attribute) {
            $attrib[$attribute->nodeName] = $attribute->nodeValue;
        }
    }
}
print_r($attrib);

Output:

Array
(
    [action] => post.php
    [method] => post
    [name] => form1
)
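
If only the four attributes from the question are of interest, the same DOM approach can be narrowed down. The following is a minimal sketch, not part of the original answer; the helper name `formAttributes` and its `$wanted` parameter are made up for illustration, and the `@` is only there on the assumption that libxml may emit warnings for an HTML fragment like this one.

```php
<?php
// Hypothetical helper: collect selected attributes from every <form> in an HTML fragment.
function formAttributes(string $html, array $wanted = ['action', 'id', 'method', 'name']): array
{
    $dom = new DOMDocument;
    // Suppress warnings that libxml may raise for fragments or sloppy markup.
    @$dom->loadHTML($html);

    $result = [];
    foreach ($dom->getElementsByTagName('form') as $form) {
        $attrs = [];
        foreach ($wanted as $name) {
            // Only keep the attributes that are actually present on this tag.
            if ($form->hasAttribute($name)) {
                $attrs[$name] = $form->getAttribute($name);
            }
        }
        $result[] = $attrs; // one entry per <form> element
    }
    return $result;
}

print_r(formAttributes('<form action="post.php" method="post" name="form1">'));
```

Each form gets its own array, so attributes from several forms in one document do not overwrite each other.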

You have to find the form element first.

<?php
 $string = '<form action="post.php" method="post" name="form1">';
 preg_match_all('/<form[^>]*>/i', $string, $forms);

Then apply a second regular expression to each tag that was found:

 foreach ($forms[0] as $form) {
  // Group 2 captures the attribute name, group 3 the quoted value.
  preg_match_all('/((action|id|method|name)=("[^"]*"|\'[^\']*\'))/i', $form, $attrs);
  print_r($attrs);
 }
?>

I don't have a machine at hand to test whether this works. Hope it does the job :)
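
For comparison, the two steps can be folded into one `preg_match_all` call by using PCRE's `\G` anchor, which matches where the previous match ended. This is only a sketch and not part of the original answer. It assumes quoted attribute values and no `>` inside them, and it is not a substitute for a real parser.

```php
<?php
$string = '<form action="post.php" method="post" name="form1">';

// Either start at "<form", or continue (\G) right after the previous match,
// so every attribute inside the same tag produces its own match.
// (?!\A) stops \G from matching at the very start of the string.
$pattern = '/(?:<form\b|\G(?!\A))[^>]*?\s(action|id|method|name)=(["\'])(.*?)\2/i';

preg_match_all($pattern, $string, $m);

// $m[1] holds the attribute names, $m[2] the quote characters, $m[3] the values.
$attrs = array_combine($m[1], $m[3]);
print_r($attrs);
```

Because `[^>]*?` cannot cross a `>`, the chain of `\G` matches ends at the closing bracket of the tag and only restarts at the next `<form`.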