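I want to insert some data that I get from Google Translate. For example: http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello
After receiving the result, I want to insert it into a MySQL table, so I wrote the following code: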
$link = "http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=";
$server = "127.0.0.1";
$username = "AliAhmadi";
$password = "AliAhmadi";
$database = "AliAhmadi";
$conn = mysql_pconnect($server, $username, $password);
if (!$conn)
die("Bye Bye");
mysql_select_db($database, $conn);
mysql_set_charset('utf8',$conn);
$ch = curl_init();
$url = $link."hello";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$WebContent = curl_exec($ch);
$update_query = 'update `en_db` SET `meaning`="'.mysql_real_escape_string($WebContent).'" where `id`=1';
mysql_query($update_query,$conn);
mysql_close($conn);
Google sends back the following text file:
[["سلام", "hello", "]], [["interjection", ["سلام", "هالو", "الو"], [["سلام", ["hello", "hi", "aloha", "all hail"]], ["هالو", ["hallo", "hello", "hallo"]], ["الو", ["hello"]]]], "en", , [["سلام", [5], 0, 0 , 1000, 0, 1, 0]], [["hello", 4, , , "], ["hello", 5, [["سلام", 1000, 0, 0], ["خوش", 0, 0, 0], ["میهمان گرامی", 0, 0, 0], ["خوش آمدید", 0, 0, 0], ["درود کاربر", 0, 0, 0]], [[0, 5]], "hello"]], , , [["en"]], 74]
But only the first part of the string is saved in the table:
[[["
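I think the problem is related to Unicode, because when I comment out mysql_set_charset('utf8',$conn); something does get saved in the table, but it looks like this:
[[["Èå","to",","]],[["preposition",["Èå","ÈÑÇ''u06CC","ÏÑ","ÏÑ ÈÑÇÈÑ","''u06CCÔ","Óæ''u06CC","äÒÏ","ØÑÝ","ÈÓæ''u06CC","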
ÊÇ äÓÈÊ Èå","ÈÑ ÍÓÈ","ÈØÑÝ","ÑæÈØÑÝ"],[["Èå",["to","in","on","at","against "]],["ÈÑÇ''u06CC",["为","至","上","为缘故","至","至","以致"]],["ÏÑ",["在","至","关于","至"]],["ÏÑ ÈÑÇÈÑ",["反对","对","至","为","至"]],["''u06CCÔ",["之前","至","与","至"]],["Óæ''u06CC",["至","至"]],["至","近","关于"]],["ØÑÝ",["至","至","至"]],["ÈÓæ''u06CC",["向","至","入","关闭","至","在"]],["ÊÇ äÓÈÊ Èå",["至","至"]],["ÈÑ ÍÓÈ",["根据","在","at","to"]],["ÈØÑÝ",["toward","at","unto","to","in","into"]],["ÑæÈØÑÝ",["unto","to"]]]],[",["ÚáÇãÊ ãÕÏÑ Çäá''u06CCÓ''u06CC ÇÓÊ"],,[["ÚáÇãÊ ãÕÏÑ Çäá''u06CCÓ''u06CC ÇÓÊ",["to"]]]],"en",,[["Èå",[5],0,0,0,1000,0,1,0]],[["to",4,,,"],["to",5,["Èå",1000,0,0],["ÈÑÇ''u06CC",0,0,0],["ÊÇ",0,0,0],["ÑÇ Èå",0,0,0],["Èå ãäÙæÑ",0,0,0]],[[0,2]],"to"]],,,,5]
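Which Unicode encoding does Google Translate return, and what is wrong with my code? I have switched the collation between utf8_unicode_ci, utf8_general_ci and utf8_persian_ci, but the problem is still there.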
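Your en_db.meaning column is defined with the default collation latin1_swedish_ci. That collation uses the ISO-8859-1 (Latin-1) encoding, which cannot store Arabic characters.
(When you remove the mysql_set_charset call, MySQL misinterprets your UTF-8 Arabic text as Latin characters, which do fit in the column but look completely wrong.)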
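The misreading described above can be reproduced without a database. The following is a minimal sketch, assuming the mbstring extension is available; ISO-8859-1 stands in for MySQL's latin1 here (which is closer to Windows-1252), so the exact garbage characters can differ from what ends up in a real table.

```php
<?php
// "سلام" written as explicit UTF-8 bytes, so this file's own encoding cannot interfere.
$utf8 = "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85";

// Treat every byte as one ISO-8859-1 character and re-encode the result as UTF-8.
// This imitates a server that was told the incoming bytes are Latin-1.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

printf("original: %d characters in %d bytes\n", mb_strlen($utf8, 'UTF-8'), strlen($utf8));
printf("misread:  %d characters in %d bytes\n", mb_strlen($misread, 'UTF-8'), strlen($misread));

// Reversing the conversion recovers the text, provided nothing was truncated or replaced.
$restored = mb_convert_encoding($misread, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8);
```

Every byte of the Persian word becomes a separate Latin character, which is why the stored value is longer than the original and unreadable.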
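Make sure you specify a collation that uses UTF-8 when you create your tables, e.g. CREATE TABLE en_db (...) COLLATE utf8_general_ci, or more generally (...) CHARACTER SET utf8 (or utf8mb4, if available, with a matching collation, for astral-plane support).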
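As an illustration of the table definition suggested above, here is a sketch using mysqli. The column types are assumptions (the question only shows that en_db has `id` and `meaning` columns), and `create_en_db` is a hypothetical helper name.

```php
<?php
// Hypothetical column definitions: only the names `id` and `meaning` come from the question.
const CREATE_EN_DB = <<<SQL
CREATE TABLE IF NOT EXISTS `en_db` (
    `id`      INT UNSIGNED NOT NULL PRIMARY KEY,
    `meaning` TEXT
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
SQL;

/**
 * Runs the statement on an already opened connection.
 * Returns true on success; on failure it returns false or throws,
 * depending on the mysqli error reporting mode.
 */
function create_en_db(mysqli $db): bool
{
    return $db->query(CREATE_EN_DB) === true;
}
```

On servers without utf8mb4, the same statement with utf8 and utf8_general_ci matches the wording of the answer.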
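You can change the collation of an existing table and all text columns in it with ALTER TABLE en_db CONVERT TO CHARACTER SET utf8, but if the table already contains non-ASCII characters, they are likely to be wrong either way.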
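Before converting, it can help to check which collation each column actually uses. The sketch below assumes a mysqli connection; `quote_identifier`, `column_collations` and `convert_statement` are hypothetical helper names, not part of any library.

```php
<?php
/** Quotes a table name with backticks, doubling any backtick inside it. */
function quote_identifier(string $name): string
{
    return '`' . str_replace('`', '``', $name) . '`';
}

/** Returns [column name => collation] for the text columns of a table. */
function column_collations(mysqli $db, string $table): array
{
    $result = $db->query('SHOW FULL COLUMNS FROM ' . quote_identifier($table));
    $collations = [];
    while ($row = $result->fetch_assoc()) {
        if ($row['Collation'] !== null) { // non-text columns report NULL
            $collations[$row['Field']] = $row['Collation'];
        }
    }
    return $collations;
}

/** Builds the conversion statement for the given table and character set. */
function convert_statement(string $table, string $charset = 'utf8'): string
{
    if (!preg_match('/^[a-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException('Unexpected character set name');
    }
    return 'ALTER TABLE ' . quote_identifier($table) . ' CONVERT TO CHARACTER SET ' . $charset;
}
```

If `column_collations($db, 'en_db')` reports latin1_swedish_ci for `meaning`, passing `convert_statement('en_db')` to `$db->query()` applies the conversion. Rows that were already stored incorrectly stay incorrect and need to be fetched and saved again.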
<?php
// At the beginning of the PHP code, declare the page as UTF-8:
header("Content-Type: text/html; charset=UTF-8");
// Create the connection first; the charset statements need an open connection.
$CNN = mysql_connect("localhost", "usr_urdu", "123") or die('Unable to Connect');
$DB = mysql_select_db('db_urdu', $CNN) or die('Unable to select DB');
// Then switch the connection to UTF-8:
mysql_query("SET NAMES 'utf8'", $CNN);
mysql_query('SET CHARACTER SET utf8', $CNN);
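The mysql_* functions used in this thread were deprecated in PHP 5.5 and removed in PHP 7.0. A rough equivalent with mysqli might look like the sketch below; the function names are hypothetical, and the table and column names are taken from the question.

```php
<?php
/** Opens a connection whose client character set is utf8mb4. */
function open_utf8_connection(string $host, string $user, string $pass, string $name): mysqli
{
    // Make mysqli throw exceptions instead of returning false on errors.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
    $db = new mysqli($host, $user, $pass, $name);
    // Sets the connection charset on the server and in the client library.
    $db->set_charset('utf8mb4');
    return $db;
}

/** Stores $text in the `meaning` column of row $id using a prepared statement. */
function save_meaning(mysqli $db, int $id, string $text): bool
{
    $stmt = $db->prepare('UPDATE `en_db` SET `meaning` = ? WHERE `id` = ?');
    $stmt->bind_param('si', $text, $id); // s = string, i = integer
    return $stmt->execute();
}
```

With a prepared statement the translated text is sent separately from the SQL, so no manual escaping is needed. The connection charset only helps if the `meaning` column itself uses a UTF-8 character set.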