PHP RegExp: Capture all HTML closing tags followed by a newline character


Keywords: newline, closing tag, HTML, RegExp, PHP | Updated: 2023-09-27

I want to capture any HTML closing tag that is followed by a newline character and replace it with just the tag.

For example, I want to turn this:

<ul>\n
    <li>element</li>\n
</ul>\n\n
<br/>\n\n
Some text\n

into this:

<ul>
    <li>element</li>
</ul>\n
<br/>\n
Some text\n
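
A minimal PHP sketch of the transformation above, assuming the input holds real newline characters and that the goal is to drop exactly one line break directly after a tag. The example also strips the break after the opening <ul>, so the sketch does not restrict itself to closing tags. The function name stripOneNewlineAfterTag is hypothetical; \R is the PCRE escape for any line-break sequence (\n, \r\n, \r and a few others).

```php
<?php
// Hypothetical helper: drop one line break that directly follows a tag.
function stripOneNewlineAfterTag(string $html): string
{
    // (<[^>]+>) captures a whole tag such as <ul>, </li> or <br/>;
    // \R then consumes a single line break (\n, \r\n, \r, ...).
    return preg_replace('~(<[^>]+>)\R~', '$1', $html);
}

$in = "<ul>\n    <li>element</li>\n</ul>\n\n<br/>\n\nSome text\n";
var_dump(stripOneNewlineAfterTag($in));
```

Because each match consumes only one break, "</ul>\n\n" becomes "</ul>\n", while "Some text\n" is left alone since no tag precedes its newline.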

The problem is that I can't capture the \n character with a regex:

preg_match_all('/(<\/[a-zA-Z]*>|<[a-zA-Z]*\/>)\n/s', $in, $matches);

As soon as I put \n anywhere in the pattern, the matches array comes back empty.

Interestingly, if I try to match the \n character on its own, it finds all of them:

preg_match_all('/\n/s', $in, $matches);
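
The question does not say where $in comes from, so the following is only one possible explanation, not a confirmed diagnosis: if the text uses Windows-style \r\n line endings, the bare \n search still finds every line feed, but each closing tag is followed by \r rather than \n, so the tag pattern never matches. The sample string below is hypothetical; replacing \n with PCRE's \R (any line-break sequence) lets the same pattern match.

```php
<?php
// Hypothetical input with Windows (CRLF) line endings.
$in = "<ul>\r\n    <li>element</li>\r\n</ul>\r\n";

// Pattern from the question: a tag immediately followed by \n.
$strict = preg_match_all('/(<\/[a-zA-Z]*>|<[a-zA-Z]*\/>)\n/s', $in);

// Same pattern, but \R accepts \n, \r\n or \r as the line break.
$any = preg_match_all('/(<\/[a-zA-Z]*>|<[a-zA-Z]*\/>)\R/s', $in);

// The bare \n search from the question still finds every line feed.
$bare = preg_match_all('/\n/s', $in);

var_dump($strict, $any, $bare); // 0 matches, 2 (</li> and </ul>), 3
```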

Try:

preg_match_all('/(<\/[a-zA-Z]*>|<[a-zA-Z]*\/>)\\n/s', $in, $matches);

You have to escape the \ character.
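
A note on that suggestion, based on how PHP parses single-quoted strings: only \\ and \' are escape sequences there, so '\\n' and '\n' both reach PCRE as the same two characters (backslash, n), which PCRE reads as a line feed. Doubling the backslash only changes the result when it is doubled again, to match the literal text \n (a backslash followed by n) instead of a real line break, which would matter if the input contains escaped text. The sketch below checks both points.

```php
<?php
// The question's pattern and the suggested one, both single-quoted.
$plain   = '/(<\/[a-zA-Z]*>|<[a-zA-Z]*\/>)\n/s';
$escaped = '/(<\/[a-zA-Z]*>|<[a-zA-Z]*\/>)\\n/s';
var_dump($plain === $escaped); // bool(true): '\\' collapses to a single '\'

// Matching the literal two characters "\n" needs \\n in the regex,
// which is written \\\\n inside a single-quoted PHP string.
$literal = '/(<\/[a-zA-Z]*>)\\\\n/';
var_dump(preg_match($literal, '</li>\n')); // int(1): subject holds a backslash and an n
var_dump(preg_match($literal, "</li>\n")); // int(0): subject holds a real line feed
```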

You can use the following:

(<[^>]+>)$\R{2}
# capture anything between a pair of < and > at the end of the line
# followed by two newline characters

You need to use multiline mode; see the demo on regex101.com.
In PHP, this would be:

$regex = '~(<[^>]+>)$\R{2}~m';
$string = preg_replace($regex, "$1", $your_string_here);
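
The sketch below runs this pattern on the question's sample, assuming the sample contains real \n line feeds. It shows two things: without the m modifier, $ only matches at the very end of the string (or before a final newline), so nothing is found; and with the replacement "$1", both line breaks after </ul> and <br/> are removed. The "\$1\n" variant, which keeps one of the two breaks, is an assumption about the intended output rather than part of the answer.

```php
<?php
$input = "<ul>\n    <li>element</li>\n</ul>\n\n<br/>\n\nSome text\n";

// Without /m, $ only matches at the end of the subject (or before a final \n).
$withoutM = preg_match_all('~(<[^>]+>)$\R{2}~', $input);

// With /m, $ also matches right before every line feed inside the subject.
$regex = '~(<[^>]+>)$\R{2}~m';
$withM = preg_match_all($regex, $input);

// As in the answer: the tag is kept and both line breaks are dropped.
$dropBoth = preg_replace($regex, "$1", $input);

// Hypothetical variant: keep one break, so "\n\n" collapses to "\n".
$keepOne = preg_replace($regex, "\$1\n", $input);

var_dump($withoutM, $withM, $dropBoth, $keepOne);
```

In double quotes, "$1" stays literal because PHP variable names cannot start with a digit; single quotes ('$1') avoid relying on that.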

In general, the DomDocument parser lets you keep or discard whitespace, so you may be better off using it.
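
A sketch of the DOM route, under several assumptions: the fragment is well-formed enough to wrap in a hypothetical <root> element and load with DOMDocument::loadXML, and whitespace-only text nodes are removed explicitly with a DOMXPath query rather than through the preserveWhiteSpace option, whose effect on mixed content depends on libxml's blank-node heuristics. Text that mixes words and newlines, such as the trailing "Some text", is left as it is.

```php
<?php
$fragment = "<ul>\n    <li>element</li>\n</ul>\n\n<br/>\n\nSome text\n";

$doc = new DOMDocument();
// Wrap the fragment in a hypothetical <root> so it parses as one XML document.
$doc->loadXML('<root>' . $fragment . '</root>');

// Select every text node that contains nothing but whitespace.
$xpath = new DOMXPath($doc);
$blank = $xpath->query('//text()[normalize-space(.) = ""]');

// Copy to an array first, then detach each node from its parent.
foreach (iterator_to_array($blank) as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize the children of <root> back to a string, without the wrapper.
$out = '';
foreach ($doc->documentElement->childNodes as $child) {
    $out .= $doc->saveXML($child);
}
var_dump($out);
```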