Fetching YouTube comments with PHP without the API

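I am using PHP and trying to scrape YouTube comments without using the YouTube API.

I can fetch the first page of comments just fine, but the "show more" form, which carries a token, is proving difficult.

Here are the live HTTP headers of the "show more" request: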

https://www.youtube.com/comment_ajax?action_load_comments=1&filter=-kWHMH2kxXs&order_by_time=false
POST /comment_ajax?action_load_comments=1&filter=-kWHMH2kxXs&order_by_time=false HTTP/1.1
Host: www.youtube.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
X-YouTube-Page-CL: 78947767
X-YouTube-Page-Timestamp: Fri Oct 31 12:43:20 2014 (1414784600)
X-YouTube-Variants-Checksum: 9225d6367a37f5c51f11f11009c7ed18
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: https://www.youtube.com/all_comments?v=-kWHMH2kxXs
Content-Length: 1712
Cookie: VISITOR_INFO1_LIVE=xJf1SvXMyvQ; PREF=fv=12.0.0&al=en&f5=30&f1=50000000; YSC=AQRpt5kYK6k; ACTIVITY=1414984537233; SID=DQAAANgAAAAVDrV25qIMIY1h4EnHCkB8QuHQpVPP2YhTT2PPvE2wkYbGLdHG9xQWEX_ADKYlKolQJRwza-Js_dmVlB-No68zaXnhLFm0NnbUaEV4zsaUwT5R_Kg1YvR2RvixP0OIw603Gax8sXIfXHmALqdYxWJ46Dt1qh2TmVoX06w7KlOQgvBE6_yViqu4j0b1iUSdVwJfMkhi8NVymGGsHWOVm027hdYdKKJTUC8-PJYbVKvItugatr0dJRL5_s6_l-P1ZFP2-OKrhb0H3ORmPU1EaFtsbPB3ZFjut09hxPFKCOq51w; HSID=A7izZQDyAow9noXRe; SSID=ALKL6xYLDPNuetixT; APISID=mOy8lSC4EJ0mUUs_/AjHuF16GPSuuL0gzQ; SAPISID=tmpcNtK_8ScRg3Mc/AMrBhFSc1O6ejr4HK; LOGIN_INFO=fe8cf1ebd49b7874eaf88ae7e3930925c2sAAAB7IjgiOiAxMzY1NjQ3NTc1MzUsICI0IjogIkdBSUEiLCAiNyI6IDE0MTQ5NDk2NDcsICIxIjogMSwgIjIiOiAiSWtxR3c2a2ZQdVRRQktZWHhGdWRaZz09IiwgIjMiOiAyNzc5ODQzNTk2fQ==; lwb=1; wide=1
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
page_token=Cg0Q8o2Ks8HdwQIgACgBErMHCAIQ6OuMlsHdwQIqpQe2m7DWq%2BSSlskBkZa45rrvrcmUAfyw%2B4WRtdrj%2FAG57%2FHkzJuNvcYB2POv74G5pZ8Mo%2Frf4Om0q%2FgS39TjhqfnzrMd6JbK9qnmj5sCueOj2YLYwozGAe7z0di9pPucHO3%2Fs42y3ejI%2BQHLhcen4eTv5x7NgNyI35nQrIcB1emWlMOJtcnlAdflt%2BbyjLnC4QGZktCSyZro5Anvhui%2BqObE%2B9kB5cahueaIkf27Ae7kkt%2Fry%2BqyOKfNh%2BCooZ%2FwcJWwt%2BP2xZmjGpO5lIqhrayYlQG4t9z%2Bo%2FDA2sEB94DOtZmc2dQG%2FavmlfWh5PkehJ6lrrLIzIW6Adr9%2BuHWkLmPFITArbbFleOkzQH8ytjyg7iuloABg6j8t%2Bj2ypG6AaeNhpf5yq6FnAGxq7u%2Bx6Sjl4EBg4vMmNDj3ulAsK7JrsSUz9%2FKAeTs4OfIwt33ygG277%2BRvaHOpE7Vqtuw6ZOy%2FR%2By7Nrixrj%2BzgWqs8avyoSK2tgBp6fHg%2BKLh4dr6NOqqLnb8JnnAYKH87SF%2B%2BXujgHDv7SArt7Z0ZUBu6Sy3YTBo%2FZ9%2Ft7QwYSLneytAaXG%2BfKblNOsVO3Bj%2BHxiqnLSJmuyeXj6ZT67wGK%2F%2FO9n9nA0uIBuIaez6%2BP1aWWAeybq8L%2F44efFuWNrO7h8JCCBPiVz8eq0sCnQcHjjaS2go3M5QH0wI3WwNa%2B9Uq1o8mZ85SZ4jiKs5O38prnsq8B5pDPvK%2FxlbFGz7%2FA%2BuSR2cffAZePkYuJ35n6Mt%2FUyO73kfqiOMGetu%2F%2Fr%2BCDkAHgkeLxh5K1ugWmoLjQis%2FZlBzfhbLI%2Ff7%2BjpkBrYHs0fvW%2B%2FvAAYCl99yYrb66hwGP6vuCparSjqYB1LbP14TY8ZJg7oyBnOGppMCgAY6u1rXq9pr9nAGn167P5Mmp1iCR%2F%2FLzxI%2Fw1njc5KjE2cGZpGSKivTwzv7HgmnkmIjsn7ra6vkB%2FI2mxMXS9ZPaAarLnYXDnumlggHvobaC%2BYuQr9kBk7i9hsKjgd1XiY7T2u%2BHys3PAb2Nw97A7Z2vrQGfof2Coeay7nSG68mDmpjcnQqWw7W157Tw9GLdq9TXpajvrbcByp2Iy8Cex6rdAfqvsIGn0uHwaMOap9bYho%2BaVLrAxbKru6C1ngGt1drm%2FsWs9yW%2Fm9722KfXxr4Bq8GDlZbnz4wPj4GPj5bd0MzOAb3ZiovalKGiL7zYhbO81fvlPJjlto%2Bh26qewgHUy%2B3ZxuXk4AYYAQ%3D%3D&session_token=QUFFLUhqa0RnaTI1Z3dnTVdTaTZEbUM2Vkp4WWpnTlJPUXxBQ3Jtc0tuN2g1WTh0c2FIa2JMU0FER1oxWU5HTzNtMzMyLXRuUEYzeldzMU5VMnhmOXdUN0U2TnEzNW9KSjFXb0FoV1Y0QUxSTG9SWXVlSXk0am50RTFBcUhnSi10QklHdXVlcE5LUGVYLXZ0YzNvajUzYTZGRUhtbWVISklFS2JSSjVwRXdRWERua04yYktDd243NGFfOGcyVUdIZXVaMmc%3D
HTTP/1.1 200 OK
Alternate-Protocol: 443:quic,p=0.01
Cache-Control: no-cache
Content-Disposition: attachment
Content-Encoding: gzip
Content-Length: 13424
Content-Type: application/json; charset=UTF-8
Date: Mon, 03 Nov 2014 04:24:24 GMT
Expires: Tue, 27 Apr 1971 19:44:06 EST
Server: gwiseguy/2.0
x-content-type-options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Firefox-Spdy: 3.1

I can build my cURL request with all of the URL parameters; that is the easy part:

https://www.youtube.com/comment_ajax?action_load_comments=1&filter=-kWHMH2kxXs&order_by_time=false

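But how do I include the page_token?

I have already extracted the token from the "show more" form on the comments page, but I don't know how to include it in the request.

I tried this: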

$headers = array(
    "Cache-Control: no-cache",
    "page_token=" . $dataToken
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

But I keep getting this error back:

[HTTP/1.1 403 Forbidden Date: Mon, 03 Nov 2014 04:58:24 GMT Server: gwiseguy/2.0 Cache-Control: no-cache X-Frame-Options: SAMEORIGIN X-Content-Type-Options: nosniff Content-Type: text/html; charset=utf-8 X-XSS-Protection: 1; mode=block; report=https://www.google.com/appserve/security-bugs/log/youtube Expires: Tue, 27 Apr 1971 19:44:06 EST Content-Length: 0 Alternate-Protocol: 443:quic,p=0.01 ] 

I have all the data I need; I am just not sure how to build my cURL request.

Any help would be greatly appreciated.

Thanks.

Although comments are currently supported only by the v2 API, you should reconsider using the API:

https://gdata.youtube.com/feeds/api/videos/-kWHMH2kxXs/comments?orderby=published

You only need to follow the feed's next link to get the next batch of comments.
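As a rough illustration of following the next link, the sketch below pulls the next-page URL out of one page of the feed. It assumes the feed is an Atom document that advertises its next page through a `<link rel="next" href="...">` element. The function name `findNextFeedLink` is hypothetical, and how the XML is downloaded is left to the caller.

```php
<?php
// Hypothetical helper: returns the href of the Atom <link rel="next"> element,
// or null when the feed has no further page or the XML cannot be parsed.
function findNextFeedLink(string $atomXml): ?string
{
    if ($atomXml === '') {
        return null;
    }

    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($atomXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // Atom elements live in a default namespace, so XPath needs a prefix for it.
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
    $nodes = $xpath->query('/atom:feed/atom:link[@rel="next"]/@href');

    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : null;
}
```

Calling this on each downloaded page and requesting the returned URL until it comes back as null would walk through every batch of comments.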

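page_token is not part of the headers; it is a POST variable. You have to put it in the body of your request.

With cURL, I think you need something like the following: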

// page_token goes into the POST body as a URL-encoded field, not into the headers
$fs = "page_token=" . urlencode("<your_token_goes_here>");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fs);
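Putting the pieces together, a fuller request might look like the sketch below. It is based only on the captured request in the question, where the body carries both page_token and session_token, so sending session_token as well is an assumption about what the server expects. The function names and the `$cookieHeader` parameter are hypothetical, and the tokens are assumed to be passed in raw (not already URL-encoded) form.

```php
<?php
// Hypothetical helper: rebuilds the endpoint URL seen in the captured request.
function buildCommentUrl(string $videoId): string
{
    return 'https://www.youtube.com/comment_ajax?' . http_build_query([
        'action_load_comments' => 1,
        'filter'               => $videoId,
        'order_by_time'        => 'false',
    ], '', '&');
}

// Hypothetical helper: URL-encodes both tokens into an
// application/x-www-form-urlencoded POST body.
function buildCommentPostBody(string $pageToken, string $sessionToken): string
{
    return http_build_query([
        'page_token'    => $pageToken,
        'session_token' => $sessionToken,
    ], '', '&');
}

// Sends the POST and returns the response body, or false on a transport
// error or any status other than 200.
function fetchMoreComments(string $videoId, string $pageToken, string $sessionToken, string $cookieHeader)
{
    $ch = curl_init(buildCommentUrl($videoId));
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => buildCommentPostBody($pageToken, $sessionToken),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING       => '', // accept and decode any encoding cURL supports (e.g. gzip)
        CURLOPT_REFERER        => 'https://www.youtube.com/all_comments?v=' . rawurlencode($videoId),
        CURLOPT_COOKIE         => $cookieHeader, // cookies from the first page load
        CURLOPT_HTTPHEADER     => ['Cache-Control: no-cache'],
    ]);

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return ($body !== false && $status === 200) ? $body : false;
}
```

Because the body is passed to CURLOPT_POSTFIELDS as a string, cURL sends it with Content-Type application/x-www-form-urlencoded, matching the captured request. The response was served as application/json there, so the returned string would presumably go through json_decode.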