Regexp on HTML cannot extract specific info from meta

Fellows, I have the following string:

<meta charset="UTF-8">

It can be any one of several variants.

From an HTML string, I want to extract UTF-8. I tried the following code:

preg_match_all('/^(<\s*meta\s*) charset=[^"]\s*($>)*/ix', $contents, $matches);

But somehow it does not work, and I don't know why.

preg_match_all('/^<meta\s[^>]*charset=["\']([^>]+)["\']/i', $contents, $matches);
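A minimal sketch of the pattern above in use, wrapped in a hypothetical helper extractCharset(). It is not the answerer's exact code: the unbalanced closing parenthesis at the end of the answer's pattern is left out, the ^ anchor is dropped (assumption: in a real page the <meta> tag is not at the very start of the string), and the captured value is narrowed to [^"'>] so that a later quoted attribute on the same tag cannot end up inside the capture.

```php
<?php
// Hypothetical helper: returns the charset value of the first <meta ... charset="...">
// tag found anywhere in $html, or null if there is none.
function extractCharset(string $html): ?string
{
    // Group 1 captures the value between the quotes (single or double).
    if (preg_match('/<meta\s[^>]*charset=["\']([^"\'>]+)["\']/i', $html, $m) === 1) {
        return $m[1];
    }
    return null; // no quoted charset attribute found
}

echo extractCharset('<meta charset="UTF-8">'), PHP_EOL;        // UTF-8
echo extractCharset("<meta charset='ISO-8859-1'>"), PHP_EOL;   // ISO-8859-1
```

Because it uses preg_match rather than preg_match_all, only the first match is returned, which is usually what you want for a page's charset.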

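Your pattern fragment
charset=[^"]\s*($>)*
has several problems:
[^"] = "not a double quote": it matches exactly one character that is not ", but the character right after charset= is the opening ", so the match fails at that point.
\s* = zero or more whitespace characters (this is OK, but unnecessary).
($>)* = not sure what your intent is here. $ anchors at the end of the string, so you are trying to match/capture zero or more occurrences of "a > after the end of the string", which will always be zero.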
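To see the first point in action, here is a small sketch that runs the question's pattern (with \s restored) against the sample tag, next to one possible corrected pattern. The corrected pattern and the variable names are illustrative assumptions, not taken from either answer.

```php
<?php
$contents = '<meta charset="UTF-8">';

// The question's pattern. Under the x flag the literal space before "charset"
// is ignored; [^"] then has to match one non-quote character, but the character
// after "charset=" is the opening quote, so nothing matches.
$original = '/^(<\s*meta\s*) charset=[^"]\s*($>)*/ix';
var_dump(preg_match($original, $contents)); // int(0)

// One possible correction (illustrative): consume an optional quote, then
// capture everything up to the next quote, whitespace or ">".
$fixed = '/^<\s*meta\s+charset=["\']?([^"\'\s>]+)/i';
if (preg_match($fixed, $contents, $m) === 1) {
    echo $m[1], PHP_EOL; // UTF-8
}
```

Making the quote optional also lets the same pattern accept an unquoted value such as <meta charset=UTF-8>.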

For this case, using the DOMDocument class is a more appropriate and more reliable approach than a regex:

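$html_string = '<meta charset="UTF-8">';
$doc = new \DOMDocument();
// libxml may emit warnings for HTML5 markup it does not know; collect them instead of printing
libxml_use_internal_errors(true);
$doc->loadHTML($html_string);
// Take the first <meta> element anywhere in the document instead of relying on node positions
$meta = $doc->getElementsByTagName("meta")->item(0);
$charset = $meta !== null ? $meta->getAttribute("charset") : "";
print_r($charset);  // UTF-8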

Finally, I switched to Guzzle HTTP and got the encoding from the HTTP headers.
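A sketch of the header-based approach, assuming the Content-Type value has already been read from a Guzzle response with $response->getHeaderLine('Content-Type') (Guzzle responses implement the PSR-7 interface, which defines getHeaderLine). The helper charsetFromContentType() is hypothetical and only handles the common "type/subtype; charset=value" form; quoted values containing ";" are not supported.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type header value,
// e.g. "text/html; charset=UTF-8". Returns null when no charset parameter is present.
function charsetFromContentType(string $contentType): ?string
{
    // Parameters follow the media type and are separated by ";"
    foreach (array_slice(explode(';', $contentType), 1) as $param) {
        $parts = explode('=', trim($param), 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), 'charset') === 0) {
            // The value may be quoted, as in charset="utf-8"
            return trim(trim($parts[1]), '"');
        }
    }
    return null;
}

var_dump(charsetFromContentType('text/html; charset=UTF-8')); // string(5) "UTF-8"
var_dump(charsetFromContentType('text/html'));                // NULL
```

The header is not always present or may omit the charset, so in practice it can make sense to fall back to the <meta> tag when this returns null.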