Converting a JavaScript function to a PHP function (used to convert a string to HTML-encoded text)

I am working from the function at http://www.unicodetools.com/unicode/convert-to-html.php, which converts a string to HTML-encoded text.

The JavaScript is:

function a(b) {
    var c= '';
    for(i=0; i<b.length; i++) {
        if(b.charCodeAt(i)>127) {
            c += '&#' + b.charCodeAt(i) + ';'; 
        } else { 
            c += b.charAt(i); 
        }
  }
  document.forms.conversionForm.outputText.value = c;
}

And my attempt is:

function str_to_html_entity($str) {
    $output = NULL;
    for($i = 0; $i < strlen($str); $i++) {
        if(ord($str) > 127) {
            $output .= '&#' + ord($str) + ';'; 
        } else { 
            $output .= substr($str, $i); 
        }
  }
  return $output;
}
echo str_to_html_entity("Thére Àre sôme spëcial charâcters ïn thìs têxt");

My PHP function runs without errors, but the result is not what I expected.

My result:

Thére Àre sôme spëcial charâcters ïn thìs têxthére Àre sôme spëcial charâcters ïn thìs têxtére Àre sôme spëcial charâcters ïn thìs têxt�re Àre sôme spëcial charâcters ïn thìs têxtre Àre sôme spëcial charâcters ïn thìs têxte Àre sôme spëcial charâcters ïn thìs têxt Àre sôme spëcial charâcters ïn thìs têxtÀre sôme spëcial charâcters ïn thìs têxt�re sôme spëcial charâcters ïn thìs têxtre sôme spëcial charâcters ïn thìs têxte sôme spëcial charâcters ïn thìs têxt sôme spëcial charâcters ïn thìs têxtsôme spëcial charâcters ïn thìs têxtôme spëcial charâcters ïn thìs têxt�me spëcial charâcters ïn thìs têxtme spëcial charâcters ïn thìs têxte spëcial charâcters ïn thìs têxt spëcial charâcters ïn thìs têxtspëcial charâcters ïn thìs têxtpëcial charâcters ïn thìs têxtëcial charâcters ïn thìs têxt�cial charâcters ïn thìs têxtcial charâcters ïn thìs têxtial charâcters ïn thìs têxtal charâcters ïn thìs têxtl charâcters ïn thìs têxt charâcters ïn thìs têxtcharâcters ïn thìs têxtharâcters ïn thìs têxtarâcters ïn thìs têxtrâcters ïn thìs têxtâcters ïn thìs têxt�cters ïn thìs têxtcters ïn thìs têxtters ïn thìs têxters ïn thìs têxtrs ïn thìs têxts ïn thìs têxt ïn thìs têxtïn thìs têxt�n thìs têxtn thìs têxt thìs têxtthìs têxthìs têxtìs têxt�s têxts têxt têxttêxtêxt�xtxtt

Expected result:

Th&#233;re &#192;re s&#244;me sp&#235;cial char&#226;cters &#239;n th&#236;s t&#234;xt

Can anyone tell me what is wrong with my PHP function?

Thanks.

Update:
function str_to_html_entity($str) {
    $result = null;
    for ($i = 0, $length = mb_strlen($str, 'UTF-8'); $i < $length; $i++) {
        $character = mb_substr($str, $i, 1, 'UTF-8');
        if (strlen($character) > 1) {  // the character consists of more than 1 byte
            $character = htmlentities($character, ENT_COMPAT, 'UTF-8');
        }
        $result .= $character;
    }
  return $result;
}
echo str_to_html_entity("Thére Àre"); // Th&eacute;re &Agrave;re
echo str_to_html_entity("中"); // 中

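In general:

  • JavaScript strings are Unicode-aware, which means str[0] returns one character regardless of how many bytes that character takes, and charCodeAt returns its character code. (Strictly, this holds for characters in the Basic Multilingual Plane; characters beyond it are stored as two UTF-16 code units.)
  • PHP strings are dumb byte arrays in which one character may occupy several bytes. $str[0] and ord only deal with single bytes and will therefore mangle any multi-byte character. See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text for an in-depth explanation.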
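To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and PHP 7.2+ (for mb_ord); the string is written with byte escapes so the result does not depend on how the source file is saved.

```php
<?php
// "é" (U+00E9) encoded as UTF-8 is the two bytes 0xC3 0xA9.
$str = "\xC3\xA9";

// Byte-oriented functions see two separate bytes.
$byteLength = strlen($str);   // 2
$firstByte  = ord($str[0]);   // 195, the lead byte, not the code point

// Multi-byte-aware functions see a single character.
$charLength = mb_strlen($str, 'UTF-8');  // 1
$codePoint  = mb_ord($str, 'UTF-8');     // 233, i.e. U+00E9

printf("%d bytes, first byte %d, %d character, code point %d\n",
    $byteLength, $firstByte, $charLength, $codePoint);
```

The value 233 is the one the JavaScript charCodeAt call yields for this character, which is why the expected output contains &#233;.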

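Therefore you cannot replicate the exact same algorithm in PHP. In addition, inside the loop you are passing the whole $str rather than a string offset, which is your other main problem. To make it Unicode-aware, this is probably the best approach: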

$result = '';
// An empty pattern in UTF-8 mode (/u) splits between characters, not bytes.
foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $character) {
    if (strlen($character) > 1) {  // the character consists of more than 1 byte
        $character = mb_convert_encoding($character, 'HTML-ENTITIES', 'UTF-8');
    }
    $result .= $character;
}

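This expects the string to be UTF-8 encoded. As you can see, there is a nice function called mb_convert_encoding that can escape a whole block of text at once, so you are essentially reinventing it.

Alternative version for a Unicode-impaired PCRE: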

$result = null;
for ($i = 0, $length = mb_strlen($str, 'UTF-8'); $i < $length; $i++) {
    $character = mb_substr($str, $i, 1, 'UTF-8');
    if (strlen($character) > 1) {  // the character consists of more than 1 byte
        $character = mb_convert_encoding($character, 'HTML-ENTITIES', 'UTF-8');
    }
    $result .= $character;
}

But seriously, just use $str = mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8') and be done with it. No loop needed.
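One caveat for newer PHP versions: the 'HTML-ENTITIES' pseudo-encoding of mb_convert_encoding is deprecated as of PHP 8.2. A possible alternative that produces the numeric entities shown in the expected result is mb_encode_numericentity. The following is only a sketch; the helper name str_to_numeric_entities and the chosen conversion map are my own assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper (name chosen for illustration only).
function str_to_numeric_entities(string $str): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // This covers everything from U+0080 upward and leaves ASCII untouched.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    // Assumes $str is valid UTF-8 and the mbstring extension is loaded.
    return mb_encode_numericentity($str, $convmap, 'UTF-8');
}

// "\u{E9}" is é and "\u{C0}" is À (escape syntax needs PHP 7.0+).
echo str_to_numeric_entities("Th\u{E9}re \u{C0}re"), "\n";  // Th&#233;re &#192;re
```

Like the one-liner, this handles the whole string in a single call, so no per-character loop is needed.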

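There are several errors in your function. Have a look at my fixes:

function str_to_html_entity($str) {
    $output = '';
    $length = strlen($str);
    for($i = 0; $i < $length; $i++) {
        // Index into the string so each iteration looks at a single byte.
        if(ord($str[$i]) > 127) {
            // Use "." for concatenation; "+" is arithmetic addition in PHP.
            $output .= '&#' . ord($str[$i]) . ';';
        } else {
            $output .= $str[$i];
        }
    }
    return $output;
}

Edit 1

Also, use

   $length = strlen($str);
as an optimization, so that strlen is not called on every iteration.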
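Note that this loop works on bytes, so what it prints depends on the input encoding. The sketch below repeats the same logic under a different, hypothetical name (bytewise_entities) and feeds it the same word in two encodings; the byte escapes are there so the result does not depend on how the file is saved.

```php
<?php
// Same byte-wise logic as the function above; the name is illustrative only.
function bytewise_entities(string $str): string
{
    $output = '';
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($str[$i]);
        $output .= $byte > 127 ? '&#' . $byte . ';' : $str[$i];
    }
    return $output;
}

$utf8   = "Th\xC3\xA9re";  // "Thére" in UTF-8: é is the two bytes C3 A9
$latin1 = "Th\xE9re";      // "Thére" in ISO-8859-1: é is the single byte E9

echo bytewise_entities($utf8), "\n";    // Th&#195;&#169;re
echo bytewise_entities($latin1), "\n";  // Th&#233;re
```

So the byte-wise version gives the expected &#233; only when the input is in a single-byte encoding such as ISO-8859-1; for UTF-8 input, one of the multi-byte approaches is needed.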