Google Speech API duplicates responses

I'm using the Speech API v2 with PHP. Here is my code:

 // The '@' prefix makes cURL upload the file as a multipart/form-data field
 $file_to_upload = array('myfile' => '@' . $filename . '.flac');
 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL, "https://www.google.com/speech-api/v2/recognize?output=json&lang=ru-RU&key=___my_api_key___");
 curl_setopt($ch, CURLOPT_POST, 1);
 curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: audio/x-flac; rate=8000"));
 curl_setopt($ch, CURLOPT_POSTFIELDS, $file_to_upload);
 // Return the response body as a string instead of printing it
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 $result = curl_exec($ch);

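Google responds with two JSON objects: the first is empty, and the second contains the valid response I expect. This makes parsing and further processing difficult. See the HTTP dump: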

My POST request:

POST /speech-api/v2/recognize?output=json&lang=ru-RU&key=___my_api_key___ HTTP/1.1
Host: www.google.com
Accept: */*
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36
Content-Length: 13123
Expect: 100-continue
Content-Type: audio/x-flac; rate=8000; boundary=----------------------------9641e899ac92
------------------------------9641e899ac92
Content-Disposition: form-data; name="myfile"; filename="/tmp/voice/1400157667.6440-in.wav.flac"
Content-Type: application/octet-stream
fLaC..."......e..'......! ..{..!y>..7..............................( ...reference libFLAC 1.2.1 20070917.
...encoded binary data...
------------------------------9641e899ac92--

The response with the duplicated recognition result:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Disposition: attachment
Cache-Control: no-transform
X-Content-Type-Options: nosniff
Pragma: no-cache
Date: Thu, 15 May 2014 12:41:09 GMT
Server: S3 v1.0
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic
Transfer-Encoding: chunked
e
{"result":[]}    <--- first one
f8
{"result":[{"alternative":[{"transcript":"............","confidence":0.73531097},{"transcript":"................"},{"transcript":".............."},{"transcript":"................"},{"transcript":"............ .."}],"final":true}],"result_index":0}   <--- second one
0

Why does this happen? When I used API v1, there was only one response. The other v2 examples I found on the internet also show only one.
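
For the parsing difficulty itself, here is a minimal PHP sketch that treats the body as one JSON object per line and skips any object whose "result" list is empty. It assumes the objects are newline-separated, which is consistent with the dump above: the first chunk size is e (14 bytes), and the empty object is 13 characters long. The function name parse_speech_response and the placeholder transcript "hello" are made up for illustration and are not part of any Google library. It is a workaround for reading the response, not an explanation of why the empty object is sent.

```php
<?php
// Hypothetical helper: split a body holding one JSON object per line
// and keep only the objects that carry a non-empty "result" list.
function parse_speech_response($body)
{
    $results = array();
    foreach (preg_split('/\R/', $body) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines between objects
        }
        $decoded = json_decode($line, true);
        // Skip lines that are not valid JSON or whose "result" is empty
        if (!is_array($decoded) || empty($decoded['result'])) {
            continue;
        }
        $results[] = $decoded;
    }
    return $results;
}

// Example body shaped like the dump above (the transcript is a placeholder)
$body = "{\"result\":[]}\n"
      . "{\"result\":[{\"alternative\":[{\"transcript\":\"hello\",\"confidence\":0.73531097}],\"final\":true}],\"result_index\":0}\n";

$parsed = parse_speech_response($body);
$best = $parsed[0]['result'][0]['alternative'][0]['transcript'];
echo $best, "\n"; // prints "hello"
```

With the code from the question, this would be called as parse_speech_response($result), since cURL hands back the decoded body when CURLOPT_RETURNTRANSFER is set.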

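First, make sure that speaker diarization is available for the language you are using. For example, Google does not offer speaker diarization for Colombian Spanish, but it does for Spanish from Spain:

Language support

In addition, it is sometimes necessary to adjust the audio slightly, which you can do with ffmpeg: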

Implementation
ffmpeg -i input.wav -ac 1 -ab 128k -filter:a "volume=0.9,equalizer=f=4000:t=h:w=200:g=-2" output.wav