How to get the domain name of an embed code (the best way)

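How can I get the domain of the URL in an embed code? I have 400k videos that I collected from many websites; some of them use an iframe and others use an object. What is the simplest and best way to get the domain of the embed code?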

Iframe code example:

<iframe src="http://www.websites-test.com/video231/" frameborder=0 width=510 height=400 scrolling=no></iframe>

Embed code example:

<object width="990" height="750"> <param name="movie" value="http://www.websites-test.com/video231/"></param><param name="AllowScriptAccess" value="always"></param><param name="wmode" value="transparent"></param><embed src="http://www.websites-test.com/video231/" type="application/x-shockwave-flash" wmode="transparent"` AllowScriptAccess="always" width="990" height="750"></embed></object>

So, let's say $Domain_Embed = websites-test.com

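I suggest you parse the HTML code (see "How do you parse and process HTML/XML in PHP?") and then extract the domain from the appropriate attribute. For example: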

<?php
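/**
 * Returns the host of the first embed URL found in $html, or null if none is found.
 * With $all = true, returns an array with the host of every iframe src,
 * object data, param "movie" value and embed src found in $html.
 */
function getDomainFromEmbed($html, $all = false)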
{
    $result = array();
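    $doc = new DOMDocument();
    // Embed snippets are rarely valid HTML; collect libxml warnings instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);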
    $iframes = $doc->getElementsByTagName('iframe');
    if (!empty($iframes)) {
        foreach ($iframes as $iframe) {
            if ($iframe->hasAttribute('src')) {
                $url = parse_url($iframe->getAttribute('src'), PHP_URL_HOST);
                if ($all) {
                    $result[] = $url;
                } else {
                    return $url;
                }
            }
        }
    }
    $objects = $doc->getElementsByTagName('object');
    if (!empty($objects)) {
        foreach ($objects as $object) {
            if ($object->hasAttribute('data')) {
                $url = parse_url($object->getAttribute('data'), PHP_URL_HOST);
                if ($all) {
                    $result[] = $url;
                } else {
                    return $url;
                }
            }
            $params = $object->getElementsByTagName('param');
            if (!empty($params)) {
                foreach ($params as $param) {
                    if ($param->hasAttribute('name') && $param->hasAttribute('value') && 'movie' === $param->getAttribute('name')) {
                        $url = parse_url($param->getAttribute('value'), PHP_URL_HOST);
                        if ($all) {
                            $result[] = $url;
                        } else {
                            return $url;
                        }
                    }
                }
            }
        }
    }
    $embeds = $doc->getElementsByTagName('embed');
    if (!empty($embeds)) {
        foreach ($embeds as $embed) {
            if ($embed->hasAttribute('src')) {
                $url = parse_url($embed->getAttribute('src'), PHP_URL_HOST);
                if ($all) {
                    $result[] = $url;
                } else {
                    return $url;
                }
            }
        }
    }
    return $all ? $result : null;
}
echo '<pre>';
var_dump(getDomainFromEmbed('<iframe src="http://www.websites-test.com/video231/" frameborder=0 width=510 height=400 scrolling=no></iframe>'));
var_dump(getDomainFromEmbed('<object width="990" height="750"> <param name="movie" value="http://www.websites-test.com/video231/"></param><param name="AllowScriptAccess" value="always"></param><param name="wmode" value="transparent"></param><embed src="http://www.websites-test.com/video231/" type="application/x-shockwave-flash" wmode="transparent"` AllowScriptAccess="always" width="990" height="750"></embed></object>'));
echo '</pre>';
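
Note that parse_url() returns the host exactly as it appears in the URL, so for the sample embeds you get www.websites-test.com rather than the websites-test.com from the question. If the bare form is what you want, here is a minimal sketch, assuming that dropping a single leading "www." is enough. The helper name stripWww is hypothetical and not part of the answer above, and it does not try to reduce other subdomains (that would need a public suffix list).

```php
<?php
// Hypothetical helper: lowercase the host and drop one leading "www.".
// Returns null when no usable host was given (parse_url() may return null or false).
function stripWww($host)
{
    if (!is_string($host) || $host === '') {
        return null;
    }
    $host = strtolower($host);
    // Only a prefix match counts, so "awww.example.com" is left alone.
    return strpos($host, 'www.') === 0 ? substr($host, 4) : $host;
}

$host = parse_url('http://www.websites-test.com/video231/', PHP_URL_HOST);
$Domain_Embed = stripWww($host);
echo $Domain_Embed, "\n"; // websites-test.com
```

The same helper can be applied to whatever getDomainFromEmbed() returns before you store it.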

Try this code:

function getDomain($html) {
    // Capture the value of a src attribute (quoted or unquoted) in the first tag that has one.
    preg_match('`<[^>]*src=["\'\s]?([^"\'\s>]+)["\'\s]?[^>]*>`i', $html, $matches);
    if(isset($matches[1]))
        return parse_url($matches[1], PHP_URL_HOST);  
    return false;
}
$html = '<iframe src="http://www.websites-test.com/video231/" frameborder=0 width=510 height=400 scrolling=no></iframe>';
echo getDomain($html);
echo '<br />';
$html = '<object width="990" height="750"> <param name="movie" value="http://www.websites-test.com/video231/"></param><param name="AllowScriptAccess" value="always"></param><param name="wmode" value="transparent"></param><embed src="http://www.websites-test.com/video231/" type="application/x-shockwave-flash" wmode="transparent"` AllowScriptAccess="always" width="990" height="750"></embed></object>';
echo getDomain($html);

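Of course, instead of echo getDomain($html) you can assign the result to a variable as needed, e.g. $Domain_Embed = getDomain($html). Here $html is the HTML code containing the tags with the src attribute you mentioned.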

For multiple objects in the same $html, you can change the function to return an array of results:

function getDomains($html) {
    $results = array();
    preg_match_all('`<[^>]*src=["\'\s]?([^"\'\s>]+)["\'\s]?[^>]*>`i', $html, $matches);
    if(isset($matches[1]) && is_array($matches[1]))
        foreach($matches[1] as $match)
            $results[] = parse_url($match, PHP_URL_HOST);
    return empty($results) ? false : $results;
}
echo '<pre>' . print_r(getDomains($html), true) . '</pre>';
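
When one $html holds several embeds, the same host will often show up more than once (for example in both the param and the embed tag of one object). As an alternative to the regular expression, here is a rough sketch that uses DOMDocument with a single XPath query and returns each host only once. The function name getUniqueEmbedHosts and the second host in the usage example are made up for illustration, and the list of attributes is only an assumption about which ones matter for your embeds.

```php
<?php
// Hypothetical helper: collect the distinct hosts of all embed URLs in $html.
function getUniqueEmbedHosts($html)
{
    if (!is_string($html) || trim($html) === '') {
        return array();
    }

    $doc = new DOMDocument();
    // Embed snippets are rarely valid HTML, so keep libxml warnings internal.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // src of iframe/embed, data of object, and value of <param name="movie">.
    $attributes = $xpath->query(
        '//iframe/@src | //embed/@src | //object/@data | //param[@name="movie"]/@value'
    );

    $hosts = array();
    foreach ($attributes as $attribute) {
        $host = parse_url($attribute->value, PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            // Using the host as the array key removes duplicates.
            $hosts[strtolower($host)] = true;
        }
    }
    return array_keys($hosts);
}

$html = '<iframe src="http://www.websites-test.com/video231/"></iframe>'
      . '<object><param name="movie" value="http://www.websites-test.com/video231/"></param>'
      . '<embed src="http://cdn.example.org/player.swf"></embed></object>';
print_r(getUniqueEmbedHosts($html));
```

For 400k videos this also gives you a short list of distinct hosts to review, rather than one entry per tag.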