Regex not working with multi-line text

I'm creating a price guide for a trading card game. Please excuse the level of nerdiness here. I pull data from a website:

$data = mb_convert_encoding(file_get_contents("http://yugioh.wikia.com/api.php?action=query&prop=revisions&titles=Elemental%20HERO%20Shining%20Flare%20Wingman&rvprop=content&format=php"), "HTML-ENTITIES", "UTF-8");

Then I parse it with a series of regex statements:

preg_match_all('/(?<=\|\slore)\s+\=(.*)/', $data, $matches);
$text = $matches[1][0]; //it prints out here just fine
$text = preg_replace("/(\[\[(\w+|\s)*\|)/sx", "" , $text); //it disappears if I try to print it here
$text = preg_replace("/\[\[/", "" , $text);
$text = preg_replace("/\]\]/", "" , $text);

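As you can see from the lines above, if I put a print_r statement after the second line, where I grab the match, it prints the text. If I put a print statement after the next line, it prints nothing. By that logic, the regex isn't parsing correctly. What am I doing wrong? I thought it had something to do with multi-line text, but I tried that and it didn't help.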
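One way to confirm that the regex is failing, rather than legitimately producing an empty string: preg_replace() returns NULL when PCRE reports an error, and printing NULL shows nothing. A minimal sketch, assuming a hypothetical wrapper named regex_replace_or_throw (not a PHP built-in) that turns that silent NULL into an exception carrying preg_last_error():

```php
<?php
// Hypothetical helper (not part of PHP): behaves like preg_replace() on a
// string subject, but throws instead of silently returning null when PCRE
// reports an error (for example, an exceeded backtrack limit).
function regex_replace_or_throw(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() holds the PREG_*_ERROR code of the failed call.
        throw new RuntimeException('preg_replace failed, preg_last_error() = ' . preg_last_error());
    }
    return $result;
}

// Usage: same call shape as the cleanup steps in the question.
$text = regex_replace_or_throw('/\]\]/', '', 'This card gains 300 [[ATK]]');
echo $text, "\n"; // This card gains 300 [[ATK
```

Swapping it in for the preg_replace calls in the question would make the failing step throw instead of quietly leaving $text empty.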

EDIT

Here is the text after the first pull:

 "[[Elemental HERO Flame Wingman]]" + "[[Elemental HERO Sparkman]]"
Must be [[Fusion Summon]]ed and cannot be [[Special Summon]]ed by other ways. This card gains 300 [[ATK]] for each "[[Elemental HERO]]" card in your [[Graveyard]]. When this card [[destroy]]s a [[Monster Card|monster]] [[Destroyed by Battle|by battle]] and [[send]]s it to the Graveyard: Inflict [[Effect Damage|damage]] to your opponent equal to the ATK of the destroyed monster in the Graveyard.

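The regex /(\[\[(\w+|\s)*\|)/sx contains nested quantifiers: \w+ carries its own + quantifier, and * is applied to the whole alternation group around it. This produces a huge number of backtracking steps and leads to catastrophic backtracking. When PCRE gives up (for example because pcre.backtrack_limit is exceeded), preg_replace returns NULL, and printing NULL outputs nothing.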

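The best way to avoid this is a character class, [\w\s]*, which matches zero or more word characters (letters, digits, underscore) or whitespace.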
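The difference between the two patterns can be reproduced on a short string. This sketch assumes pcre.jit is switched off and pcre.backtrack_limit is lowered to 10000 purely so that the blow-up surfaces quickly on a short sample; those ini values are illustrative and are not what the question used.

```php
<?php
// Illustrative settings only: disable JIT and lower the backtrack limit so
// the nested pattern hits the limit on a short string.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '10000');

// The sample needs a '|' later on: the pattern requires a literal '|', and
// PCRE can rule out a match up front if that character never appears.
$subject = '[[Elemental HERO Flame Wingman]] [[Monster Card|monster]]';

// (\w+|\s)* can split each word into runs in exponentially many ways, and
// tries them all before giving up at the "]]" of the first link.
$nested = preg_replace('/(\[\[(\w+|\s)*\|)/', '', $subject);
$nestedError = preg_last_error(); // read before the next preg_* call resets it

// [\w\s]* can match a given stretch of text in only one way, so a failed
// attempt at each "[[" costs work linear in the bracketed text.
$flat = preg_replace('/(\[\[([\w\s]*)\|)/', '', $subject);

var_dump($nested === null); // bool(true)
echo 'error code: ', $nestedError, "\n";
echo $flat, "\n"; // [[Elemental HERO Flame Wingman]] monster]]
```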

See the IDEONE demo:

$s = "\"[[Elemental HERO Flame Wingman]]\" + \"[[Elemental HERO Sparkman]]\"\nMust be [[Fusion Summon]]ed and cannot be [[Special Summon]]ed by other ways. This card gains 300 [[ATK]] for each \"[[Elemental HERO]]\" card in your [[Graveyard]]. When this card [[destroy]]s a [[Monster Card|monster]] [[Destroyed by Battle|by battle]] and [[send]]s it to the Graveyard: Inflict [[Effect Damage|damage]] to your opponent equal to the ATK of the destroyed monster in the Graveyard.";
$s = preg_replace('/(\[\[([\w\s]*)\|)/', "" , $s);
echo $s;
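Combining the fixed pattern with the question's other two cleanup steps gives a single pass over the card text. A sketch, assuming a hypothetical helper named strip_wiki_links; it relies on preg_replace() accepting an array of patterns (applied in order) and drops the capture groups, which a plain removal does not need:

```php
<?php
// Hypothetical helper: removes MediaWiki-style link markup from card text
// using the character-class pattern, then strips the leftover brackets.
function strip_wiki_links(string $text): string
{
    $result = preg_replace(
        [
            '/\[\[[\w\s]*\|/', // "[[Target|" prefix of a piped link
            '/\[\[/',           // remaining opening brackets
            '/\]\]/',           // remaining closing brackets
        ],
        '',
        $text
    );
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, preg_last_error() = ' . preg_last_error());
    }
    return $result;
}

echo strip_wiki_links('When this card [[destroy]]s a [[Monster Card|monster]] [[Destroyed by Battle|by battle]]'), "\n";
// When this card destroys a monster by battle
```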

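Also note that you need neither the x modifier (the pattern contains no comments or insignificant whitespace) nor the s modifier (there is no . in the pattern).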
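A quick check of that claim: the same pattern with and without /sx gives identical output. The two-line sample below is hypothetical, built from links in the card text.

```php
<?php
// Without a "." in the pattern, /s changes nothing; without literal
// whitespace or comments, /x changes nothing either (\s is an escape, not
// whitespace, so /x leaves it alone).
$sample = "[[Monster Card|monster]]\n[[Effect Damage|damage]]";

$withModifiers    = preg_replace('/(\[\[([\w\s]*)\|)/sx', '', $sample);
$withoutModifiers = preg_replace('/(\[\[([\w\s]*)\|)/', '', $sample);

var_dump($withModifiers === $withoutModifiers); // bool(true)
```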