How to extract a single line from a block of HTML code

I have the following content:

<meta property="og:type" content="article" />
<meta property="og:url" content="http://website/fox/" />
<meta property="og:site_name" content="The Fox" />
<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />
<meta property="og:title" content="Fox goes to forest" />

I need to extract just one line, the meta tag with property="og:image", so the result should be:

<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />

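Extracting a "line" of HTML, or parsing HTML with regular expressions in general, is fragile. A more robust approach is to use an HTML parser, such as the one provided by PHP's DOM extension.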

Example:

<?php
// The markup from the question, as a nowdoc string.
$html = <<<'HTML'
<meta property="og:type" content="article" />
<meta property="og:url" content="http://website/fox/" />
<meta property="og:site_name" content="The Fox" />
<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />
<meta property="og:title" content="Fox goes to forest" />
HTML;

// Parse the markup with the HTML parser instead of matching raw text.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Select every <meta> element whose property attribute is "og:image".
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//meta[@property="og:image"]');

// Serialize each matching element back to HTML.
foreach ($nodes as $node) {
    echo $dom->saveHTML($node);
}

Output:

<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209">
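saveHTML() writes the element as HTML, so the trailing " />" from the question is not reproduced. The following is a small variation on the same approach, not part of the original answer. It assumes you want either a self-closing form of the tag or only the image URL. DOMDocument::saveXML() serializes the node as XML, which should write the empty meta element in self-closing form. DOMElement::getAttribute() reads the content attribute directly. The variable names $lines and $urls are illustrative only.

```php
<?php
// Same sample input as in the question, shortened to two tags for brevity.
$html = <<<'HTML'
<meta property="og:type" content="article" />
<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$lines = []; // hypothetical: serialized og:image tags
$urls = [];  // hypothetical: just the image URLs

foreach ($xpath->query('//meta[@property="og:image"]') as $node) {
    // saveXML() serializes the element as XML, so the empty <meta>
    // is written self-closing rather than as saveHTML()'s "<meta ...>".
    $lines[] = $dom->saveXML($node);
    // If only the image URL is needed, read the attribute directly.
    $urls[] = $node->getAttribute('content');
}

echo implode("\n", $lines), "\n";
echo implode("\n", $urls), "\n";
```

Reading the attribute is usually what you want in practice. It avoids depending on how either serializer formats the tag.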

^<meta property="og:image".*$

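Try this regex with the m (multiline) and g (global) flags set, so that ^ and $ match at the start and end of each line. See the demo: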

http://regex101.com/r/hQ1rP0/48
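The demo above runs in an online regex tester. The following is a rough PHP translation of the same pattern, offered as a sketch rather than the answerer's own code. PCRE in PHP has an /m modifier but no /g modifier. Calling preg_match_all() is the usual way to collect every match. The pattern assumes each tag sits on its own line and that the attributes appear in exactly this order, which is the fragility the DOM-based answer warns about.

```php
<?php
// Sample input: the meta block from the question, shortened.
$html = <<<'HTML'
<meta property="og:type" content="article" />
<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />
<meta property="og:title" content="Fox goes to forest" />
HTML;

// /m makes ^ and $ match at every line boundary instead of only at the
// start and end of the whole string. PHP has no /g modifier;
// preg_match_all() plays the "find every match" role.
preg_match_all('/^<meta property="og:image".*$/m', $html, $matches);

// $matches[0] holds the full text of each matching line.
foreach ($matches[0] as $line) {
    echo $line, "\n";
}
```

If the input uses Windows (\r\n) line endings, .* will also capture the trailing \r. You would then need to trim each match.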