PHP Utf8 Decoding Issue

Keywords: PHP, Utf8, decoding, issue | Last updated: 2023-09-27

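I have the following address line: Praha 5, Staré Město,

I need to run utf8_decode() on this string before I can write it to a PDF file (using the domPDF library).

However, PHP's utf8_decode() seems to decode the address line above incorrectly (or rather, incompletely).

The following code: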

<?php echo utf8_decode('Praha 5, Staré Město,'); ?>

produces:

Praha 5, Staré M?sto,

Any idea why ě is not being decoded?

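utf8_decode converts a string from UTF-8 encoding to ISO-8859-1, also known as "Latin-1".
The Latin-1 encoding cannot represent the letter "ě". It's that simple. utf8_decode substitutes a "?" for any character the target encoding lacks, which is the "?" in the output above.
"Decode" is a complete misnomer; the function performs the same conversion as iconv('UTF-8', 'ISO-8859-1', $string).

See "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text".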
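To make the point above concrete, here is a minimal sketch that prints the bytes produced when "é" and "ě" are converted out of UTF-8. It uses mb_convert_encoding() rather than utf8_decode(), since the latter is deprecated as of PHP 8.2. The helper name convert_hex and the comparison with ISO-8859-2 (Latin-2) are illustrative assumptions, not part of the original answer, and the mbstring extension is assumed to be loaded.

```php
<?php
// Make the replacement character explicit: '?' (0x3F) is written for
// anything the target charset cannot represent.
mb_substitute_character(0x3F);

// Hypothetical helper: convert UTF-8 text to $charset and return the bytes as hex.
function convert_hex(string $utf8, string $charset): string
{
    return bin2hex(mb_convert_encoding($utf8, $charset, 'UTF-8'));
}

$eAcute = "\xC3\xA9"; // "é" (U+00E9) as UTF-8 bytes
$eCaron = "\xC4\x9B"; // "ě" (U+011B) as UTF-8 bytes

echo convert_hex($eAcute, 'ISO-8859-1'), "\n"; // e9: Latin-1 has "é"
echo convert_hex($eCaron, 'ISO-8859-1'), "\n"; // 3f: "?", Latin-1 has no "ě"
echo convert_hex($eCaron, 'ISO-8859-2'), "\n"; // ec: Latin-2 does have "ě"
```

The string literals are written as byte escapes so the result does not depend on how the source file itself is saved.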

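I ended up using a home-grown UTF-8/UTF-16 decoding function (it converts to the &#number; representation). I never found a pattern that explains why UTF-8 was not detected; I suspect it is because the "encoded as" sequence is not always at exactly the same position in the returned string. You can do some additional checking.

Three-byte UTF-8 indicator (the byte order mark): $startutf8 = chr(0xEF).chr(187).chr(191); (if you see this anywhere, not just in the first three bytes, the string is UTF-8 encoded)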
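As a rough sketch of the "additional checking" mentioned above, the snippet below tests for the three-byte marker at the start of a string and, separately, validates the whole string as UTF-8. The helper names has_utf8_bom and is_valid_utf8 are hypothetical, and using mb_check_encoding() is an assumption on my part (it requires the mbstring extension); the original answer only describes the marker itself.

```php
<?php
// Same three bytes as chr(0xEF).chr(187).chr(191).
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the string starts with the UTF-8 byte order mark.
function has_utf8_bom(string $s): bool
{
    return strncmp($s, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: true if the whole string is well-formed UTF-8.
function is_valid_utf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

$utf8   = "Star\xC3\xA9 M\xC4\x9Bsto"; // "Staré Město" as UTF-8 bytes
$latin1 = "Star\xE9 Mesto";            // "é" as a single Latin-1 byte

var_dump(has_utf8_bom(UTF8_BOM . $utf8)); // bool(true)
var_dump(has_utf8_bom($utf8));            // bool(false): valid UTF-8 often has no marker
var_dump(is_valid_utf8($utf8));           // bool(true)
var_dump(is_valid_utf8($latin1));         // bool(false)
```

The marker is optional, so its absence says nothing about the encoding; validating the bytes is usually the more reliable check.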

Decode according to the UTF-8 rules; this replaces an earlier version that worked one byte at a time:

function charset_decode_utf_8($string) {
    /* Only do the slow convert if there are 8-bit characters */
    /* (0xA0 is left out of the ranges, as in the original ereg() check) */
    if (!preg_match('/[\x80-\x9F\xA1-\xFF]/', $string)) {
        return $string;
    }
    // ereg() and the /e modifier were removed in PHP 7, so the
    // replacements are done with preg_replace_callback() instead.
    // decode three-byte UTF-8 sequences into &#number; entities
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
        },
        $string
    );
    // decode two-byte UTF-8 sequences into &#number; entities
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $string
    );
    return $string;
}
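For comparison, the same UTF-8 to &#number; conversion can probably be done with the mbstring extension instead of hand-written regular expressions. This is only a sketch under that assumption: the helper name utf8_to_numeric_entities is hypothetical, and the answer above does not use mb_encode_numericentity() itself.

```php
<?php
// Hypothetical helper: replace every non-ASCII character in a UTF-8 string
// with its decimal numeric entity (&#NNN;).
function utf8_to_numeric_entities(string $utf8): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// "Praha 5, Staré Město," written with explicit UTF-8 byte escapes.
echo utf8_to_numeric_entities("Praha 5, Star\xC3\xA9 M\xC4\x9Bsto,"), "\n";
// Praha 5, Star&#233; M&#283;sto,
```

The arithmetic matches the two-byte rule in the function above: for "ě" (bytes 0xC4 0x9B), (196 - 192) * 64 + (155 - 128) = 283.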

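The problem is the encoding of your PHP file: save the file as UTF-8, and then you do not even need utf8_decode. If you are getting the data 'Praha 5, Staré Město,' from a database, it is better to change its character set to UTF-8.

You don't need it (@Rajeev): this string is automatically detected as UTF-8 encoded: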

echo mb_detect_encoding('Praha 5, Staré Město,');

will always return UTF-8.
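What mb_detect_encoding() returns depends on the list of candidate encodings it is given (or the configured detect order). The sketch below passes an explicit list and enables strict mode, so the outcome does not rely on server settings. The helper name detect_ascii_or_utf8 and the choice of candidates are assumptions for illustration.

```php
<?php
// Hypothetical helper: strict detection against an explicit candidate list.
// Returns the encoding name, or false if no candidate fits.
function detect_ascii_or_utf8(string $s)
{
    return mb_detect_encoding($s, ['ASCII', 'UTF-8'], true);
}

// "Staré Město" as UTF-8 bytes.
var_dump(detect_ascii_or_utf8("Star\xC3\xA9 M\xC4\x9Bsto")); // string(5) "UTF-8"

// The same "é" as a single Latin-1 byte is neither ASCII nor valid UTF-8.
var_dump(detect_ascii_or_utf8("Star\xE9 Mesto"));            // bool(false)
```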

You would do better to look at: https://code.google.com/p/dompdf/wiki/CPDFUnicode