I want to get a short extract from a Wikipedia article. I use the following URL in my browser: http://en.wikipedia.org//w/api.php?action=query&prop=extracts&format=txt&exsentences=2&exlimit=10&exintro=&explaintext=&iwurl=&titles=Greek%20language
and I get the following result in the browser:
Array
(
[query] => Array
(
[pages] => Array
(
[11887] => Array
(
[pageid] => 11887
[ns] => 0
[title] => Greek language
[extract] => Greek (Modern Greek: ελληνικά [eliniˈka] "Greek" and ελληνική γλώσσα [eliniˈci ˈɣlosa] ( ) "Greek language") is an independent branch of the Indo-European family of languages. Native to the southern Balkans, western Asia Minor, Greece, the Aegean Islands, and Cyprus it has the longest documented history of any Indo-European language, spanning 34 centuries of written records.
)
)
)
)
This is fine.
The problem is that when I fetch the same URL server-side with PHP cURL, the non-Latin characters show up as gibberish. This is how I do it:
$url = 'http://en.wikipedia.org//w/api.php?action=query&prop=extracts&format=txt&exsentences=2&exlimit=10&exintro=&explaintext=&iwurl=&titles=Greek%20language';
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, "TestScript");
$c = curl_exec($ch);
echo $c;
which gives the following result:
Array ( [query] => Array ( [pages] => Array ( [11887] => Array ( [pageid] => 11887 [ns] => 0 [title] => Greek language [extract] => Greek (Modern Greek: ελληνικά [eliniˈka] "Greek" and ελληνική γλώσσα [eliniˈci ˈɣlosa] ( ) "Greek language") is an independent branch of the Indo-European family of languages. Native to the southern Balkans, western Asia Minor, Greece, the Aegean Islands, and Cyprus it has the longest documented history of any Indo-European language, spanning 34 centuries of written records. ) ) ) )
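But the foreign words come out as gibberish. I get the same result with other articles about foreign languages. How can I receive and display foreign characters correctly?
You need to set the header: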
<?php
header('Content-Type: text/html;charset=utf-8'); //<--- Add this
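This is because the characters are Unicode (the response is UTF-8 encoded), so you need to set the header explicitly to reflect the character set. Without a declared charset, the browser may guess a different encoding and render the same bytes as gibberish.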
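Putting the header together with the cURL code from the question, a minimal sketch could look like the one below. The helper names `buildExtractUrl` and `fetchUrl` are made up for illustration and are not from the original post. The sketch also assumes the script is served through a web server and that nothing has been output before `header()` is called.

```php
<?php
// Hypothetical helper: builds a URL equivalent to the one in the question.
function buildExtractUrl(string $title): string
{
    $params = [
        'action'      => 'query',
        'prop'        => 'extracts',
        'format'      => 'txt',
        'exsentences' => 2,
        'exlimit'     => 10,
        'exintro'     => '',
        'explaintext' => '',
        'iwurl'       => '',
        'titles'      => $title,
    ];
    // PHP_QUERY_RFC3986 encodes spaces as %20 instead of +.
    return 'http://en.wikipedia.org/w/api.php?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// Hypothetical helper: fetches a URL with cURL and returns the raw body.
function fetchUrl(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'TestScript');
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Only emit the page when running under a web server, not on the command line.
if (PHP_SAPI !== 'cli') {
    // Must be sent before any output so the browser decodes the bytes as UTF-8.
    header('Content-Type: text/html; charset=utf-8');
    $text = fetchUrl(buildExtractUrl('Greek language'));
    // Escape the text and keep the print_r-style line breaks readable.
    echo '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</pre>';
}
```

cURL itself returns the bytes unchanged, so the only thing that changes here is how the browser is told to interpret them.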