Links that don't look like links

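I have some code that gets all the links on a page, but some of the links it returns don't look like links. For example, indexes 0-4 contain "javascript:void(0)", and index 5 is a blank link that is just "/". How can I fix this? Thanks.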

$content = file_get_contents("http://bestspace.co"); //get content of page
$links = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>"; //set regular expression to get links
preg_match_all("/$links/siU", $content, $matches); //get all links on page and store in array $matches[2]
print_r($matches[2]);

Output:

Array ( 
[0] => javascript:void(0) 
[1] => javascript:void(0) 
[2] => javascript:void(0) 
[3] => javascript:void(0) 
[4] => javascript:void(0) 
[5] => / 
[6] => /bestdeals 
[7] => /about-us 
[8] => /why-choose-us 
[9] => /products 
[10] => https://cloud.bestspace.co/clientarea.php 
etc... );

Use array_filter to remove all the JavaScript links.

$links = array_filter($matches[2], function($x) {
    return substr($x, 0, 11) != 'javascript:';
});
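
Two things to watch with the filter above: array_filter keeps the original keys, so the result starts at index 5 rather than 0, and the comparison is case-sensitive, so an href written as "JavaScript:..." would slip through. Note also that "/" on its own is a valid root-relative link to the home page, so it probably should stay. Here is a minimal sketch that handles both points. The $hrefs sample array and the is_javascript_href helper are made up for illustration; in the question's code you would pass $matches[2] instead.

```php
<?php
// Hypothetical sample data standing in for $matches[2] from the question.
$hrefs = [
    'javascript:void(0)',
    'JavaScript:void(0)',
    '/',
    '/bestdeals',
    'https://cloud.bestspace.co/clientarea.php',
];

// Hypothetical helper: true when the href uses the javascript: pseudo-scheme.
function is_javascript_href(string $href): bool
{
    // strncasecmp compares the first 11 characters case-insensitively;
    // ltrim drops any leading whitespace inside the attribute value.
    return strncasecmp(ltrim($href), 'javascript:', 11) === 0;
}

// array_filter preserves the original keys, so array_values reindexes from 0.
$links = array_values(array_filter($hrefs, function ($href) {
    return !is_javascript_href($href);
}));

print_r($links);
```

If you need absolute URLs afterwards, the relative entries such as "/" and "/bestdeals" would still have to be resolved against the site's base URL; that step is not shown here.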