Remove href links and label using html dom parser


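First I fetch the HTML of a web page, then remove the href links that usually appear on the left or right side of the page (not in the page body). The href links are being removed, but their labels are not.

Example: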

<a href='http://test.blogspot.com/2012/11/myblog.html'>London</a>

The link is removed, but not its label, i.e. "London". How can I remove the complete line from the HTML source? I am using the following code:

$string = strip_tags($html_source_code, '<a>', TRUE);
function strip_tags($text, $tags = '', $invert = FALSE) {
      preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
      $tags = array_unique($tags[1]);
      if(is_array($tags) AND count($tags) > 0) {
        if($invert == FALSE) {
          return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        else {
          return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
        }
      }
      elseif($invert == FALSE) {
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
      }
      return $text;
}

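If I use your code, I get a fatal error: Cannot redeclare strip_tags(). PHP already has a built-in function with that name.

Renaming the function to something like my_strip_tags works well.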

function my_strip_tags($text, $tags = '', $invert = FALSE) {
      preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
      $tags = array_unique($tags[1]);
      if(is_array($tags) AND count($tags) > 0) {
        if($invert == FALSE) {
          return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        else {
          return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
        }
      }
      elseif($invert == FALSE) {
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
      }
      return $text;
}
$html_source_code = "Beginning of content ... <a href='http://test.blogspot.com/2012/11/myblog.html'>London</a> ... end of content.";
echo "<p>".$html_source_code."</p>";
$string = my_strip_tags($html_source_code, '<a>', TRUE);
echo "<p>".$string."</p>";

Prints:

Beginning of content ... London ... end of content.

Beginning of content ... ... end of content.
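Since the question title mentions an HTML DOM parser, the same result can also be reached without regular expressions. The following is only a minimal sketch using PHP's bundled DOM extension; the function name remove_anchor_elements and the wrapper div are assumptions made for this illustration, not part of the code above, and character-encoding handling for non-ASCII pages is left out.

```php
<?php
// Hypothetical helper (not from the thread): removes every <a> element
// from an HTML fragment, including the label text inside it.
function remove_anchor_elements(string $html): string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a div so its contents can be read back afterwards.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it to an array
    // before removing nodes from the tree.
    $anchors = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($anchors as $anchor) {
        $anchor->parentNode->removeChild($anchor);
    }

    // The wrapper is the first div in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

$html_source_code = "Beginning of content ... <a href='http://test.blogspot.com/2012/11/myblog.html'>London</a> ... end of content.";
echo "<p>" . remove_anchor_elements($html_source_code) . "</p>";
```

Removing the element node drops the link and its label together, which is the behaviour the question asks for. Unlike the regular expression, it does not depend on how the tag is written in the source.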

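$link = "<a href='http://test.blogspot.com/2012/11/myblog.html'>London</a>";

// Returns an empty string when $theLink contains $checkTag,
// otherwise returns $theLink unchanged.
function erraser($theLink, $checkTag){
    // strpos() returns 0 for a match at the very start of the string,
    // so compare strictly against false instead of using == true.
    if(strpos($theLink, $checkTag) !== false){
        return '';
    }
    return $theLink;
}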

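Now, let's take a look:

All you have to do is pass the erraser() function two arguments: the variable holding the link, and any text that identifies it as a link.

If you do that, for example: echo erraser($link, 'href');, it removes the link and returns nothing. But if you give it ---- as in echo erraser($link, '----');, it outputs the link London, meaning that it checks whether the string is a link and acts accordingly.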
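The question only wants the links in the side columns removed, not the ones in the page body. One possible way to scope the removal is an XPath query against a container element. This is a rough sketch under assumptions: the helper name remove_links_in and the id "sidebar" are hypothetical, and a real page would need its own container expression.

```php
<?php
// Hypothetical helper (not from the thread): removes <a> elements, labels
// included, only inside the containers matched by $containerXPath.
function remove_links_in(string $html, string $containerXPath): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select every <a> that is a descendant of a matched container.
    $anchors = $xpath->query($containerXPath . '//a');
    if ($anchors === false) {
        // Invalid XPath expression: leave the input untouched.
        return $html;
    }

    // Copy the node list first, then detach each anchor from its parent.
    foreach (iterator_to_array($anchors) as $anchor) {
        $anchor->parentNode->removeChild($anchor);
    }
    return $doc->saveHTML();
}

// Assumed page layout: a sidebar div and a content div.
$page = '<html><body>'
      . '<div id="sidebar"><a href="http://test.blogspot.com/2012/11/myblog.html">London</a></div>'
      . '<div id="content">Read about <a href="http://example.com/paris">Paris</a>.</div>'
      . '</body></html>';

echo remove_links_in($page, '//*[@id="sidebar"]');
```

In this sketch the London link in the sidebar is dropped together with its label, while the Paris link in the content div is kept.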