Scraping HTML with PHP and encoding problems


Keywords: encoding problems, HTML, PHP, scraping | Updated: 2023-09-27

I'm trying to scrape the following URL with PHP: http://www.clubedoricardo.com.br/Produto/Smartphone-Samsung-Galaxy-Win-2-Duos-G360-Cinza-Dual-Chip-4G-Tela-45-Camera-5MP-Frontal-2MP-Quad-Core-12Ghz-8GB/44-491-496-568187

$url="http://www.clubedoricardo.com.br/Produto/Smartphone-Samsung-Galaxy-Win-2-Duos-G360-Cinza-Dual-Chip-4G-Tela-45-Camera-5MP-Frontal-2MP-Quad-Core-12Ghz-8GB/44-491-496-568187";
$dom = new DOMDocument;
$dom->loadHTMLFile($url);
$page_content = $dom->saveHTML();
echo($page_content);

But the text contains some strange characters. I tried converting it to UTF-8 and to ISO-8859-1, but nothing changed.
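Strange characters like these usually mean the bytes were decoded with a different charset than the one they were written in, so converting the already-garbled output does not help. Note also that libxml's HTML parser, which DOMDocument uses, typically falls back to ISO-8859-1 when a page does not declare its charset. The sketch below normalizes the raw page bytes to UTF-8 before any parsing. It assumes the page is either UTF-8 or ISO-8859-1 (the two encodings tried above); `to_utf8` is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: normalize raw page bytes to UTF-8 before parsing.
// ASSUMPTION: anything that is not valid UTF-8 is ISO-8859-1; check the
// server's Content-Type header if the page might use another charset.
function to_utf8(string $bytes): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8 (or plain ASCII): leave untouched
    }
    // Reinterpret each byte as an ISO-8859-1 character and re-encode as UTF-8.
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

// "C\xE2mera" is "Câmera" encoded as ISO-8859-1 (byte 0xE2 = â).
echo to_utf8("C\xE2mera"), "\n"; // Câmera
echo to_utf8("Câmera"), "\n";    // Câmera (unchanged)
```

The UTF-8 check is a heuristic: an ISO-8859-1 string that happens to form valid UTF-8 byte sequences would be left as-is, although this is rare for ordinary accented text.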

有什么想法吗？

When I open the link you provided, I get a blank page. Try:

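$html = file_get_contents($url); // fetch the page body first; converting $url itself would only convert the URL string
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));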
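Using `HTML-ENTITIES` as an mbstring encoding is deprecated as of PHP 8.2. The following is a sketch of the same idea without it, assuming the input string is already UTF-8: `mb_encode_numericentity` turns every non-ASCII character into a `&#NNN;` reference, which the parser decodes to the same character whatever charset it assumes for the raw bytes. `load_utf8_html` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: parse a UTF-8 HTML string with DOMDocument
// without relying on the page's own charset declaration.
function load_utf8_html(string $html): DOMDocument
{
    // Map every code point from U+0080 upward to a numeric entity
    // (offset 0, so &#226; for â), leaving only ASCII for the parser.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

$dom = load_utf8_html('<p>Câmera Frontal 2MP</p>');
echo $dom->getElementsByTagName('p')->item(0)->textContent, "\n"; // Câmera Frontal 2MP
```

For the page in the question, the input would be the string returned by `file_get_contents($url)`. DOMDocument stores text nodes as UTF-8 internally, so values read through `textContent` come back as UTF-8.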