Words show up as gibberish when using the Wikipedia API server-side

I want to get a short extract from a Wikipedia article. In my browser I use the following URL: http://en.wikipedia.org//w/api.php?action=query&prop=extracts&format=txt&exsentences=2&exlimit=10&exintro=&explaintext=&iwurl=&titles=Greek%20language

In the browser I get the following result:

Array
(
[query] => Array
    (
        [pages] => Array
            (
                [11887] => Array
                    (
                        [pageid] => 11887
                        [ns] => 0
                        [title] => Greek language
                        [extract] => Greek (Modern Greek: ελληνικά [eliniˈka] "Greek" and ελληνική γλώσσα [eliniˈci ˈɣlosa] ( ) "Greek language") is an independent branch of the Indo-European family of languages. Native to the southern Balkans, western Asia Minor, Greece, the Aegean Islands, and Cyprus it has the longest documented history of any Indo-European language, spanning 34 centuries of written records. 
                    )
            )
    )
)

This works fine.

The problem is that when I fetch the same URL server-side with PHP cURL, the foreign letters show up as gibberish. This is how I'm doing it:

$url = 'http://en.wikipedia.org//w/api.php?action=query&prop=extracts&format=txt&exsentences=2&exlimit=10&exintro=&explaintext=&iwurl=&titles=Greek%20language';
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, "TestScript"); 
$c = curl_exec($ch);
echo $c;

This gives the following result:

Array ( [query] => Array ( [pages] => Array ( [11887] => Array ( [pageid] => 11887 [ns] => 0 [title] => Greek language [extract] => Greek (Modern Greek: ελληνικά [eliniˈka] "Greek" and ελληνική γλώσσα [eliniˈci ˈɣlosa] ( ) "Greek language") is an independent branch of the Indo-European family of languages. Native to the southern Balkans, western Asia Minor, Greece, the Aegean Islands, and Cyprus it has the longest documented history of any Indo-European language, spanning 34 centuries of written records. ) ) ) )

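But the foreign words are gibberish. I get the same result with other articles about foreign languages. How do I correctly receive and display the foreign letters?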

You need to set the Content-Type header, including the charset, before any output is sent:

<?php
header('Content-Type: text/html; charset=utf-8'); // <-- add this before any output (e.g. before echo $c)
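Hardcoding utf-8 is fine here because the Wikipedia API responds in UTF-8. If the same script might relay other sources, one option is to forward whatever charset the upstream response declares. The following is a minimal sketch under that assumption: the helper name charset_from_content_type is hypothetical, its input would come from curl_getinfo($ch, CURLINFO_CONTENT_TYPE) after curl_exec() in the real script (a literal string stands in for it here), and UTF-8 is assumed when no charset is declared.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// value such as "text/html; charset=utf-8". Returns null if none is present.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null) {
        return null; // curl_getinfo() gives null when no valid Content-Type was sent
    }
    // Match "; charset=value" or '; charset="value"', case-insensitively.
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m) === 1) {
        return strtolower($m[1]);
    }
    return null;
}

// In the question's script this would be:
//   $upstream = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
// A literal value stands in for it so the sketch runs without network access.
$upstream = 'text/html; charset=utf-8';

// Assumption: fall back to UTF-8 if the upstream response names no charset.
$charset = charset_from_content_type($upstream) ?? 'utf-8';

// Must still run before any output is echoed.
header('Content-Type: text/html; charset=' . $charset);
echo $charset, PHP_EOL;
```

Run from the command line, header() has no visible effect and the script prints utf-8; under a web server the header is sent with the page.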

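The header is needed because the API sends those characters encoded as UTF-8, where each Greek letter takes two bytes. If your page does not declare charset=utf-8, the browser may fall back to a different default encoding (such as Windows-1252) and display each of those bytes as a separate character, which is the gibberish you see. Setting the header explicitly tells the browser how to decode the response.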
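To make the failure concrete, here is a small sketch that imitates a browser decoding UTF-8 bytes as ISO-8859-1 (Latin-1) instead of UTF-8. The function name misread_as_latin1 is made up for this illustration, and the file is assumed to be saved as UTF-8. Windows-1252 would show the same result for these particular bytes, because it agrees with Latin-1 on the range 0xA0 to 0xFF.

```php
<?php
// Illustrative only: treat every byte of a UTF-8 string as one Latin-1
// character, then re-encode that character as UTF-8 so it can be printed.
function misread_as_latin1(string $utf8Bytes): string
{
    $out = '';
    $len = strlen($utf8Bytes); // strlen() counts bytes, not characters
    for ($i = 0; $i < $len; $i++) {
        $b = ord($utf8Bytes[$i]);
        if ($b < 0x80) {
            // ASCII bytes mean the same thing in both encodings.
            $out .= chr($b);
        } else {
            // Latin-1 byte 0x80-0xFF is code point U+0080-U+00FF,
            // which needs two bytes in UTF-8.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$word = 'ελληνικά';                      // 8 Greek letters
echo strlen($word), PHP_EOL;              // 16: each letter is 2 bytes in UTF-8
echo bin2hex('ε'), PHP_EOL;               // ceb5
echo misread_as_latin1($word), PHP_EOL;   // ÎµÎ»Î»Î·Î½Î¹ÎºÎ¬
```

Each two-byte Greek letter turns into two Latin-1 characters beginning with Î (the byte 0xCE), which is the typical pattern of UTF-8 text displayed with the wrong charset.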