Can't insert string into MySQL when using UTF-8

I want to insert some data that I get from Google Translate. For example: http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello

After receiving the result, I want to insert it into a MySQL table, so I wrote the following code:

$link     = "http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=";
$server   = "127.0.0.1";
$username = "AliAhmadi";
$password = "AliAhmadi";
$database = "AliAhmadi";
$conn     = mysql_pconnect($server, $username, $password);
if (!$conn)
     die("Bye Bye");
mysql_select_db($database, $conn);
mysql_set_charset('utf8',$conn);
$ch       = curl_init();
$url          = $link."hello";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$WebContent   = curl_exec($ch);
$update_query = 'update `en_db` SET `meaning`="'.mysql_real_escape_string($WebContent).'" where `id`=1';
mysql_query($update_query,$conn);
mysql_close($conn);

Google returns the following text:

[["سلام", "hello", "]], [["interjection", ["سلام", "هالو", "الو"], [["سلام", ["hello", "hi", "aloha", "all hail"]], ["هالو", ["hallo", "hello", "hallo"]], ["الو", ["hello"]]]], "en", , [["سلام", [5], 0, 0 , 1000, 0, 1, 0]], [["hello", 4, , , "], ["hello", 5, [["سلام", 1000, 0, 0], ["خوش", 0, 0, 0], ["میهمان گرامی", 0, 0, 0], ["خوش آمدید", 0, 0, 0], ["درود کاربر", 0, 0, 0]], [[0, 5]], "hello"]], , , [["en"]], 74]

But only the first part of the string is saved in the table:

[
[["

I think the problem comes from Unicode, because when I comment out mysql_set_charset('utf8',$conn); something does get saved in the table, but it looks like this:

[[["Èå","to",","]],[["preposition",["Èå","ÈÑÇ''u06CC","ÏÑ","ÏÑ ÈÑÇÈÑ","''u06CCÔ","Óæ''u06CC","äÒÏ","ØÑÝ","ÈÓæ''u06CC","

ÊÇ äÓÈÊ Èå","ÈÑ ÍÓÈ","ÈØÑÝ","ÑæÈØÑÝ"],[["Èå",["to","in","on","at","against "]],["ÈÑÇ''u06CC",["for","to","on","for the sake of","to","to","so that"]],["ÏÑ",["in","to","about","to"]],["ÏÑ ÈÑÇÈÑ",["against","toward","to","for","to"]],["''u06CCÔ",["before","to","with","to"]],["Óæ''u06CC",["to","to"]],["to","near","about"]],["ØÑÝ",["to","to","to"]],["ÈÓæ''u06CC",["toward","to","into","off","to","in"]],["ÊÇ äÓÈÊ Èå",["to","to"]],["ÈÑ ÍÓÈ",["according to","in","at","to"]],["ÈØÑÝ
",["toward","at","unto","to","in","into"]],["ÑæÈØÑÝ",["unto","to"]]]],[",["ÚáÇãÊ ãÕÏÑ Çäá''u06CCÓ''u06CC ÇÓÊ"],,[["ÚáÇãÊ ãÕÏÑ Çäá''u06CCÓ''u06CC ÇÓÊ",["to"]]]],"en",,[["Èå",[5],0,0,0,1000,0,1,0]],[["to",4,,,"],["to",5,["Èå",1000,0,0],["ÈÑÇ''u06CC",0,0,0],["ÊÇ",0,0,0],["ÑÇ Èå",0,0,0],["Èå ãäÙæÑ",0,0,0]],[[0,2]],"to"]],,,,5]

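What kind of Unicode does Google Translate return, and where is the problem in my code? I have switched the collation between utf8_unicode_ci, utf8_general_ci and utf8_persian_ci, but the problem still occurs.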

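I believe your en_db.meaning column is defined with the default collation, latin1_swedish_ci. That collation uses MySQL's latin1 encoding (essentially ISO-8859-1, Latin-1), which cannot store Arabic characters.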

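(When you remove the mysql_set_charset call, MySQL misinterprets your UTF-8 Arabic text as Latin-1 characters. Those do fit in the column, but they look completely wrong.)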
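The misinterpretation described above can be reproduced without a database. The following is only a minimal sketch: it assumes the mbstring extension is loaded and uses ISO-8859-1 as a stand-in for MySQL's latin1; the variable names are illustrative.

```php
<?php
// The Persian word from the question, written with Unicode escapes (PHP 7+)
// so the example does not depend on the encoding of this file.
$utf8 = "\u{0633}\u{0644}\u{0627}\u{0645}";

// Treat the UTF-8 bytes as if they were ISO-8859-1 and re-encode them as UTF-8.
// Every single byte is now taken for a separate Latin-1 character.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo strlen($utf8), " bytes, ", mb_strlen($utf8, 'UTF-8'), " characters\n";
echo mb_strlen($garbled, 'UTF-8'), " characters after the misreading\n";

// Undoing the same conversion recovers the original text, provided no bytes
// were lost or replaced along the way.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8);
```

Four letters become eight unrelated characters, which is the kind of garbage the question shows in the table.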

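Make sure you specify a UTF-8 collation when you create the table, e.g. CREATE TABLE en_db (...) COLLATE utf8_general_ci, or more generally (...) CHARACTER SET utf8 (or utf8mb4 if available, which adds support for astral-plane characters).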
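As a rough illustration of such a statement, here is a sketch of the DDL held in a PHP string. Only the names en_db, id and meaning come from the question; the column types and the choice of utf8mb4_unicode_ci are assumptions.

```php
<?php
// Illustrative DDL: column types are assumed, not taken from the question.
$createSql = <<<'SQL'
CREATE TABLE en_db (
    id      INT UNSIGNED NOT NULL PRIMARY KEY,
    meaning TEXT NOT NULL
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
SQL;

// With an open mysqli connection in $conn, this could be run as:
// $conn->query($createSql);
echo $createSql, "\n";
```

Declaring the character set on the table means every text column inherits it unless a column overrides it explicitly.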

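You can change the collation of an existing table and all text columns in it with ALTER TABLE en_db CONVERT TO CHARACTER SET utf8, but if the table already contains non-ASCII characters, they will probably be wrong either way.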
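Before converting, it may help to check which collation each column currently has. The sketch below assumes the rows come from SHOW FULL COLUMNS, which reports a Collation value per column (NULL for non-text columns); tableNeedsConversion is a hypothetical helper, and utf8mb4 is used in place of utf8 on the assumption that it is available.

```php
<?php
// Lists every column of en_db together with its collation.
$inspectSql = 'SHOW FULL COLUMNS FROM en_db';

// Converts the table default and all of its text columns in one statement.
$convertSql = 'ALTER TABLE en_db CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci';

// Hypothetical helper: returns true if any text column uses a collation
// whose name does not start with "utf8" (for example latin1_swedish_ci).
function tableNeedsConversion(array $columns): bool
{
    foreach ($columns as $col) {
        $collation = $col['Collation'] ?? null;
        if ($collation !== null && strpos($collation, 'utf8') !== 0) {
            return true;
        }
    }
    return false;
}

// Example rows shaped like the output of SHOW FULL COLUMNS.
$rows = [
    ['Field' => 'id',      'Collation' => null],
    ['Field' => 'meaning', 'Collation' => 'latin1_swedish_ci'],
];
var_dump(tableNeedsConversion($rows));
```

With a live mysqli connection, the rows would come from `$conn->query($inspectSql)->fetch_all(MYSQLI_ASSOC)`, and `$convertSql` would be run only when the helper returns true.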

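<?php
// At the beginning of the PHP code, send the page itself as UTF-8.
header("Content-Type: text/html; charset=UTF-8");
// Create the connection first: the charset call below needs an open link.
$CNN = mysql_connect("localhost", "usr_urdu", "123") or die('Unable to Connect');
$DB  = mysql_select_db('db_urdu', $CNN) or die('Unable to select DB');
// Then declare that this connection sends and receives UTF-8.
// mysql_set_charset() is preferred over "SET NAMES" queries because it also
// informs mysql_real_escape_string() about the charset.
mysql_set_charset('utf8', $CNN);
// Note: the mysql_* extension was removed in PHP 7; mysqli offers equivalent calls.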
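Since the mysql_* functions no longer exist in current PHP versions, the same idea might look roughly like this with mysqli. This is a sketch only: storeMeaning is a hypothetical helper, the table and credentials are taken from the question, and the UTF-8 validity check is an added precaution rather than part of either answer.

```php
<?php
// Hypothetical helper: writes $text into en_db.meaning for the given id.
function storeMeaning(mysqli $conn, int $id, string $text): bool
{
    // Skip input that is not valid UTF-8 instead of sending it to MySQL.
    if (!mb_check_encoding($text, 'UTF-8')) {
        return false;
    }
    // A prepared statement replaces manual escaping of the value.
    $stmt = $conn->prepare('UPDATE `en_db` SET `meaning` = ? WHERE `id` = ?');
    if ($stmt === false) {
        return false;
    }
    $stmt->bind_param('si', $text, $id);
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}

// Usage (needs a running MySQL server, so it is left commented out):
// $conn = new mysqli('127.0.0.1', 'AliAhmadi', 'AliAhmadi', 'AliAhmadi');
// $conn->set_charset('utf8mb4');   // connection charset, set right after connecting
// storeMeaning($conn, 1, $WebContent);
```

The order matters here as well: the connection is opened first, its charset is set next, and only then is any text sent.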