Character encoding issues - UTF-8 / Issue while transmitting data on the internet?

I am sending data from the client, and it is sent like this:

// $booktitle = "Comí habitación bailé"
$xml_obj = new DOMDocument('1.0', 'utf-8');
// node created with booktitle and added to xml_obj 
// NO htmlentities / other transformations done
$returnHeader = drupal_http_request($url, $headers = array("Content-Type:  text/xml; charset=utf-8"), $method = 'POST', $data = $xml_data, $retry = 3);

When I receive it on my end (via drupal_http_request), I run htmlentities on it and get the following:

 Com&Atilde;&shy; habitaci&Atilde;&sup3;n bail&Atilde;&copy;

When displayed, it looks like gibberish:

 Comí Habitación Bailé

What is going wrong?


Edit 1)

<?php
$title = "Comí habitación bailé";
echo "title=$title\n";
echo 'encoding is '.mb_detect_encoding($title);
$heutf8 = htmlentities($title, ENT_COMPAT, "UTF-8");
echo "heutf8=$heutf8\n";
?>

Running this test script on a Windows machine and redirecting the output to a file shows:

title=Comí habitación bailé
encoding is UTF-8heutf8=

Running it on a Linux system:

title=Comí habitación bailé
encoding is UTF-8PHP Warning:  htmlentities(): Invalid multibyte sequence in argument in /home/testaccount/public_html/test2.php on line 5
heutf8=

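I don't think you should use htmlentities to encode entities just to get the text to display correctly (you should use htmlspecialchars on comments to avoid cross-site scripting). Just set the correct header and meta tag, then echo the value as usual: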

<?php
 header ('Content-type: text/html; charset=utf-8');
 ?>
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 </head>
 <body>
 </body>
 </html>
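
As a minimal sketch of the htmlspecialchars advice above: the comment text here is a made-up example, and the accented character is written as explicit UTF-8 bytes so the result does not depend on how the file is saved. Only the HTML-significant characters are escaped; the UTF-8 text itself passes through unchanged and is rendered correctly once the page declares charset=utf-8.

```php
<?php
// Hypothetical user-supplied comment: markup plus "Comí" as raw UTF-8 bytes.
$comment = "<script>alert(1)</script> Com\xC3\xAD";

// Escape only <, >, &, and quotes; accented characters are left as UTF-8.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
```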

htmlentities interprets its input as ISO-8859-1 by default (in PHP versions before 5.4); are you passing UTF-8 as the charset parameter?
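
A small sketch of what the charset argument changes, assuming the received title really is UTF-8. The string is spelled out as explicit bytes so the example does not depend on the encoding of the script file; the variable names are only illustrative.

```php
<?php
// "Comí habitación bailé" as explicit UTF-8 bytes.
$title = "Com\xC3\xAD habitaci\xC3\xB3n bail\xC3\xA9";

// Read as ISO-8859-1, every byte of a two-byte UTF-8 character
// becomes its own entity, which is what renders as gibberish.
$wrong = htmlentities($title, ENT_COMPAT, 'ISO-8859-1');

// Read as UTF-8, each accented character maps to a single entity.
$right = htmlentities($title, ENT_COMPAT, 'UTF-8');

echo $wrong, "\n"; // Com&Atilde;&shy; habitaci&Atilde;&sup3;n bail&Atilde;&copy;
echo $right, "\n"; // Com&iacute; habitaci&oacute;n bail&eacute;
```

If the second call returns an empty string with an "Invalid multibyte sequence" warning, the input bytes are probably not valid UTF-8 to begin with, so the encoding of the source is worth checking before anything else.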

Try passing the header information as a key/value array.

For example:

$headers = array("Content-Type" => "text/xml; charset=utf-8");
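
To illustrate why the array shape can matter, here is a sketch under one assumption: that the HTTP client builds each header line by joining the array key and value as "Name: value". The helper build_header_lines is hypothetical and is not Drupal's code; it only mimics that joining step.

```php
<?php
// Hypothetical helper (not part of Drupal): joins key and value as "Name: value".
function build_header_lines(array $headers): array
{
    $lines = array();
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Keyed array: the key is the header name.
$keyed = build_header_lines(array('Content-Type' => 'text/xml; charset=utf-8'));

// Plain list, as in the question: the key is the numeric index 0.
$numeric = build_header_lines(array('Content-Type:  text/xml; charset=utf-8'));

echo $keyed[0], "\n";   // Content-Type: text/xml; charset=utf-8
echo $numeric[0], "\n"; // 0: Content-Type:  text/xml; charset=utf-8
```

Under that assumption, the list form produces a header named "0", so the receiving side would never see the charset declaration.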