How to get stylesheet URLs using php dom xpath or regex?

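I'm building a custom library that combines all screen CSS stylesheets, but I'm not sure how to fetch only the stylesheets for media type screen. For example: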

<!-- This should be fetched -->
<link href="http://www.domain.com/style.css" rel="stylesheet" type="text/css" />
<!-- This should be fetched -->
<link href="http://www.domain.com/ie.css" rel="stylesheet" type="text/css" />
<style type="text/css" media="all">
  <!-- This should be fetched -->
  @import url("http://static.php.net/www.php.net/styles/phpnet.css");
</style>
<style type="text/css" media="screen">
   <!-- This should be fetched -->
  @import url("http://static.php.net/www.php.net/styles/site.css");
</style>
<style type="text/css" media="print">
  <!-- This should NOT be fetched since it is media type print -->
  @import url("http://static.php.net/www.php.net/styles/print.css");
</style>

Given the string above, I only want to extract the href and url() values. I'm not sure how to do that. I tried:

preg_match_all("/(url\([\'\"]?)([^\"\'\)]+)([\"\']?\))/", $html, $matches);
print_r($matches);

but it returns nothing.

Is there a solution using PHP DOM, XPath, or regex?
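The regex route can work for markup as regular as the example above, as long as the media check is done per tag rather than across the whole string. Below is a minimal regex-only sketch, assuming quoted attributes and well-formed `<link>`/`<style>` tags; the helper names `attrsHaveScreenMedia` and `extractScreenCssUrls` are hypothetical. Regex is not an HTML parser, so a DOM/XPath approach is the safer choice for arbitrary pages.

```php
<?php
// Regex-only sketch. Assumes attributes are quoted and tags are as regular as
// in the example; regex is not an HTML parser, so prefer DOM/XPath for real pages.

// Hypothetical helper: true when an attribute string has no media attribute
// (HTML treats that as "all") or a media value that mentions "all" or "screen".
function attrsHaveScreenMedia(string $attrs): bool
{
    if (!preg_match('/\bmedia\s*=\s*["\']([^"\']*)["\']/i', $attrs, $m)) {
        return true;
    }
    return preg_match('/\b(all|screen)\b/i', $m[1]) === 1;
}

// Hypothetical helper: collects stylesheet URLs that apply to screen media.
function extractScreenCssUrls(string $html): array
{
    $urls = [];

    // <link ...> tags: keep the href when rel="stylesheet" and the media fits.
    preg_match_all('/<link\b([^>]*)>/i', $html, $links);
    foreach ($links[1] as $attrs) {
        if (preg_match('/\brel\s*=\s*["\']stylesheet["\']/i', $attrs)
            && attrsHaveScreenMedia($attrs)
            && preg_match('/\bhref\s*=\s*["\']([^"\']+)["\']/i', $attrs, $m)) {
            $urls[] = $m[1];
        }
    }

    // <style ...>...</style> blocks: collect url(...) values when the media fits.
    preg_match_all('#<style\b([^>]*)>(.*?)</style>#is', $html, $styles, PREG_SET_ORDER);
    foreach ($styles as $style) {
        if (attrsHaveScreenMedia($style[1])) {
            preg_match_all('/url\(\s*["\']?([^"\')]+)["\']?\s*\)/i', $style[2], $m);
            $urls = array_merge($urls, $m[1]);
        }
    }

    return $urls;
}
```

Run on the example markup, this should return style.css, ie.css, phpnet.css and site.css, skipping print.css. Note that URLs from `<link>` tags always come before those from `<style>` blocks, regardless of their order in the page.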

Here is working code! I've also put it on codepad for you: http://codepad.org/WQzcO3k3

<?php
$inputString = '<!-- This should be fetched -->
<link href="http://www.domain.com/style.css" rel="stylesheet" type="text/css" />
<!-- This should be fetched -->
<link href="http://www.domain.com/ie.css" rel="stylesheet" type="text/css" />
<style type="text/css" media="all">
  <!-- This should be fetched -->
  @import url("http://static.php.net/www.php.net/styles/phpnet.css");
</style>
<style type="text/css" media="screen">
   <!-- This should be fetched -->
  @import url("http://static.php.net/www.php.net/styles/site.css");
</style>
<style type="text/css" media="print">
  <!-- This should NOT be fetched since it is media type print -->
  @import url("http://static.php.net/www.php.net/styles/print.css");
</style>';
$outputUrls = array();
libxml_use_internal_errors(true); // suppress the warnings loadHTML() emits for imperfect markup
$doc = new DOMDocument();
$doc->loadHTML($inputString);
$xml = simplexml_import_dom($doc); // SimpleXML only to make the XPath call simpler
$linksOrStyles = $xml->xpath('//*[@rel="stylesheet" or @media="all" or @media="screen"]');

//print_r($linksOrStyles);
foreach ($linksOrStyles as $linkOrStyleSimpleXMLElementObj)
{
    if (isset($linkOrStyleSimpleXMLElementObj['href'])) {
        // <link> element: take its href attribute
        $outputUrls[] = (string) $linkOrStyleSimpleXMLElementObj['href'];
    } else {
        // <style> element: get the url() value from its text content
        $css = (string) $linkOrStyleSimpleXMLElementObj;
        $httpStart = strpos($css, 'http://');
        $httpEnd = strpos($css, '"', $httpStart);
        $outputUrls[] = substr($css, $httpStart, $httpEnd - $httpStart);
        // NOTE: preg_match is the cleaner way to get the URL; I had to use
        // strpos here only because codepad.org doesn't support preg.
        /*
        preg_match(
            "#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#i",
            ' ' . $linkOrStyleSimpleXMLElementObj,
            $matches
        );
        print_r($matches);
        $outputUrls[] = $matches[1]; // group 1 is the URL without the trailing delimiter
        */
    }
}
echo 'Output Url list: ';
print_r($outputUrls);
?>
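The same idea can be written with DOMDocument and DOMXPath directly, skipping the SimpleXML conversion and doing the media check per element. This is a sketch under stated assumptions: a missing media attribute is treated as "all", a media value counts as screen when it mentions "all" or "screen" (queries such as "not screen" are not handled), and @import targets are read with a regex. The function names are hypothetical.

```php
<?php
// Sketch using DOMDocument + DOMXPath directly (no SimpleXML).
// Assumptions: a missing media attribute means "all"; a media value counts as
// screen when it mentions "all" or "screen" ("not screen" etc. not handled).

function isScreenMedia(string $media): bool
{
    return trim($media) === '' || preg_match('/\b(all|screen)\b/i', $media) === 1;
}

function screenStylesheetUrls(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep HTML parse warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the caller's setting

    $xpath = new DOMXPath($doc);
    $urls = [];

    // Select stylesheet <link> elements and all <style> elements.
    foreach ($xpath->query('//link[@rel="stylesheet"] | //style') as $node) {
        if (!isScreenMedia($node->getAttribute('media'))) {
            continue;
        }
        if ($node->nodeName === 'link') {
            $urls[] = $node->getAttribute('href');
        } else {
            // Matches @import url("..."), @import url(...) and @import "..."
            preg_match_all('/@import\s+(?:url\(\s*)?["\']?([^"\')\s;]+)/i', $node->textContent, $m);
            $urls = array_merge($urls, $m[1]);
        }
    }
    return $urls;
}
```

Unlike the XPath in the answer above, this also drops a `<link rel="stylesheet" media="print">` and keeps a `<style>` element with no media attribute, since HTML treats a missing media attribute as "all".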