How to convert HTML entities and then use preg_replace in PHP

I'm trying to convert &nbsp; to whitespace and then run some regular expressions with preg_replace, like this:

$title = '&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2';
$title = mb_convert_encoding($title, 'UTF-8', 'HTML-ENTITIES');
//$title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');
// (Meaning: I can use either mb_convert_encoding() or html_entity_decode();
// both give the same output: TEST < Ok.2-2)
// So now I have TEST < Ok.2-2
// I want a space after "Ok.", so I use preg_replace()
$replace = "~\s+(ok[.]?)~i";
$title = preg_replace($replace, ' OK. ', $title, -1);
$title = preg_replace('/\s+/', ' ', $title);
$title = trim($title);
// The result = TEST < Ok.2-2 (doesn't work!)
echo($title);

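With this code, mb_convert_encoding and html_entity_decode both work fine, but when I then use preg_replace to match the whitespace with a regex, it doesn't seem to find the converted spaces.

Current output: TEST < Ok.2-2

Expected output: TEST < OK. 2-2

My solution for now

I added a hard-coded str_replace that turns &nbsp; into a space, and I use mb_convert_encoding or html_entity_decode to convert the remaining HTML entities.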

$title = '&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2';
$title = str_replace('&nbsp;', ' ', $title);
$title = mb_convert_encoding($title, 'UTF-8', 'HTML-ENTITIES');
//$title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');
// (Meaning: I can use either mb_convert_encoding() or html_entity_decode();
// both give the same output: TEST < Ok.2-2)
// So now I have TEST < Ok.2-2
// I want a space after "Ok.", so I use preg_replace()
$replace = '~\s+(ok[.]?)~i';
$title = preg_replace($replace, ' OK. ', $title, -1);
$title = preg_replace('/\s+/', ' ', $title);
$title = trim($title);
// The result = TEST < OK. 2-2 (works!)
echo($title);

My output now: TEST < OK. 2-2

What I expected: TEST < OK. 2-2

Any suggestions for a better solution?

I think this will give you what you're after.

$title = trim(
     preg_replace('~\s+~', ' ', 
          str_ireplace(array('&nbsp;', ' ok.'), array(' ', ' OK. '), 
     "&nbsp;TEST&nbsp;Ok.2-2")
     )
);

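This will:

  1. Strip leading and trailing whitespace (trim)
  2. Replace runs of whitespace with a single space (preg_replace('~\s+~', ' ', ...))
  3. Replace &nbsp; with a single space (str_ireplace)
  4. Replace " ok." with " OK. ", case-insensitively (str_ireplace)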
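The list above reads from the outermost call inward, while PHP evaluates the innermost call first. As a rough sketch of that execution order, here is the same nested call unrolled into separate statements; the intermediate comments are only for illustration.

```php
<?php
// Unrolled version of the nested call, innermost step first.
$title = "&nbsp;TEST&nbsp;Ok.2-2";

// Steps 3 and 4: str_ireplace applies the search/replace pairs in order,
// case-insensitively. "&nbsp;" becomes " ", then " Ok." becomes " OK. ".
$title = str_ireplace(array('&nbsp;', ' ok.'), array(' ', ' OK. '), $title);
// $title is now " TEST OK. 2-2"

// Step 2: collapse any run of whitespace into a single space.
$title = preg_replace('~\s+~', ' ', $title);

// Step 1: strip leading and trailing whitespace.
$title = trim($title);

echo $title; // TEST OK. 2-2
```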

Output:

TEST OK. 2-2

Your HTML entity decoding example is correct: http://sandbox.onlinephpfunctions.com/code/eed7e30d507f7197585f29c1fdde9e7744fc572d

$title = html_entity_decode("&nbsp;TEST&nbsp;Ok.2-2", ENT_NOQUOTES, 'UTF-8');
echo $title;

Output:

TEST Ok.2-2 (the gaps here are non-breaking spaces, U+00A0, not ordinary spaces)
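A likely reason the regex in the question finds nothing after decoding: html_entity_decode turns &nbsp; into the character U+00A0, which is two bytes in UTF-8, and a byte-oriented \s normally does not treat those bytes as whitespace. The following is a small diagnostic sketch, not part of the original answer. No expected result is given for the plain \s check because it can depend on the PCRE build and locale.

```php
<?php
$decoded = html_entity_decode('&nbsp;', ENT_NOQUOTES, 'UTF-8');

// The decoded character is the two-byte UTF-8 sequence for U+00A0.
echo bin2hex($decoded), "\n"; // c2a0

// Byte-oriented match: this is the kind of check that failed in the question.
var_dump(preg_match('~\s~', $decoded));

// With the u modifier the pattern works on code points, so U+00A0
// can be matched explicitly.
var_dump(preg_match('~\x{A0}~u', $decoded)); // int(1)
```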

Edit:

<?php
$title = '&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2';
$title = trim(preg_replace('~\s+~', ' ', str_ireplace(array('&nbsp;', '&lt;', 'Ok.'), array(' ', '', ' OK. '), $title)));
echo $title;

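It is probably safer to remove only those 2 entities with str_replace. If your string were <h1>&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2</h1> and you decoded it and then removed every <, the markup would no longer work properly.

Output:

TEST OK. 2-2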
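If you do want to keep the decoded < and get the output the question expected, one possible variant is to decode first and then match U+00A0 explicitly in UTF-8 mode. This is only a sketch: normalizeTitle is a hypothetical helper name, and it assumes the input is valid UTF-8 (otherwise preg_replace with the u modifier returns null).

```php
<?php
// Hypothetical helper, not from the original post.
function normalizeTitle(string $raw): string
{
    // Decode entities: &nbsp; becomes U+00A0, &lt; becomes "<".
    $text = html_entity_decode($raw, ENT_NOQUOTES, 'UTF-8');

    // Collapse any run of whitespace or U+00A0 into one plain space.
    $text = preg_replace('~[\s\x{A0}]+~u', ' ', $text);

    // Replace "ok" or "ok." (any case) and the whitespace before it with " OK. ".
    $text = preg_replace('~\s+ok[.]?~iu', ' OK. ', $text);

    // Collapse whitespace again and trim the ends.
    $text = preg_replace('~\s+~u', ' ', $text);
    return trim($text);
}

echo normalizeTitle('&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2'); // TEST < OK. 2-2
```

Because the result contains a literal <, it would need to be escaped again (for example with htmlspecialchars) before being written back into HTML, which is the risk the str_replace approach above avoids.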