Scraping HTML with PHP and encoding problems

I'm trying to scrape the following URL with PHP: http://www.clubedoricardo.com.br/Produto/Smartphone-Samsung-Galaxy-Win-2-Duos-G360-Cinza-Dual-Chip-4G-Tela-45-Camera-5MP-Frontal-2MP-Quad-Core-12Ghz-8GB/44-491-496-568187

$url="http://www.clubedoricardo.com.br/Produto/Smartphone-Samsung-Galaxy-Win-2-Duos-G360-Cinza-Dual-Chip-4G-Tela-45-Camera-5MP-Frontal-2MP-Quad-Core-12Ghz-8GB/44-491-496-568187";
$dom = new DOMDocument;
$dom->loadHTMLFile($url);
$page_content = $dom->saveHTML();
echo($page_content);

But the text contains some strange characters. I tried converting the encoding to UTF-8 and to ISO-8859, but nothing changed.
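A likely cause, offered as an assumption since the live page can't be inspected here: when the markup carries no charset declaration that libxml recognizes, DOMDocument::loadHTML() falls back to ISO-8859-1, so each multibyte UTF-8 character is read as several Latin-1 characters (for example "é" becomes "Ã©"). A minimal sketch of one workaround is to declare the charset explicitly by prepending a `<meta http-equiv>` tag before parsing. The helper name `loadHtmlWithMetaCharset` and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: parse UTF-8 HTML after prepending an explicit charset
// declaration, so libxml does not fall back to its ISO-8859-1 default.
function loadHtmlWithMetaCharset(string $html): DOMDocument
{
    $dom = new DOMDocument();
    // Collect libxml's warnings about malformed or HTML5 markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

// Usage with a hypothetical fragment standing in for the scraped page.
$dom = loadHtmlWithMetaCharset('<p>Câmera frontal</p>');
echo $dom->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```

This assumes the page bytes really are UTF-8; if the site serves another charset, the declared value has to match it.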

Any ideas?

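When I open the link you provided, I just get a blank page. Note that loadHTML() expects the HTML markup itself, not the URL, so fetch the page first and convert that string. Try:

$html = file_get_contents($url);
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));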
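Passing 'HTML-ENTITIES' to mb_convert_encoding() is deprecated as of PHP 8.2. A sketch of an equivalent that avoids it, assuming the fetched page is UTF-8: mb_encode_numericentity() turns every non-ASCII character into a numeric character reference, which DOMDocument decodes back into UTF-8 no matter which charset it assumes for the raw bytes. The helper name `loadHtmlAsNumericEntities` is hypothetical.

```php
<?php
// Hypothetical helper: make a UTF-8 HTML string pure ASCII before parsing by
// turning every code point from U+0080 upward into a numeric reference (&#NNN;).
// libxml decodes those references itself, so its default charset no longer matters.
function loadHtmlAsNumericEntities(string $html): DOMDocument
{
    // Map format: [first code point, last code point, offset, mask].
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    // Collect libxml's warnings about malformed or HTML5 markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

// In the scraper the string would come from file_get_contents($url);
// a literal is used here so the sketch runs offline.
$dom = loadHtmlAsNumericEntities('<p>Câmera frontal 2MP</p>');
echo $dom->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```

If the site turns out to serve ISO-8859-1 instead, the string would first need mb_convert_encoding($html, 'UTF-8', 'ISO-8859-1') before this step.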