How to get links not from a website using PHP and regular expressions

I want to add rel="nofollow" to every link on my website that points to another website.

For example,

$str = "<a href='www.linktoothersite.com'>I swear this isn't spam!</a><br><a href='www.mywebsite.com'>Hello World</a>";

The output should be

$str = "<a href='www.linktoothersite.com' rel=\"nofollow\">I swear this isn't spam!</a><br><a href='www.mywebsite.com'>Hello World</a>";

I really want a regular expression rather than DOMDocument, because whenever I use DOMDocument I get the error "Warning: DOMDocument::loadHTML() [DOMDocument.loadHTML]: htmlParseEntityRef: expecting ';' in Entity".

Use a DOM parser and loop over all the links, checking each href attribute for other sites. This is untested and may need some tweaking.

// assuming your html is in $HTMLstring
// Invalid HTML (e.g. an unescaped "&") makes loadHTML() emit warnings;
// collect them internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($HTMLstring);
libxml_clear_errors();
// Get all the links
$links = $dom->getElementsByTagName("a");
foreach ($links as $link) {
  $href = $link->getAttribute("href");
  // Find out if the link points to a domain other than yours
  // If your internal links are relative, you'll have to do something fancier to check
  // their destinations than this simple strpos()
  // strpos() returns false (not -1) when the needle is not found
  if (strpos($href, "yourdomain.example.com") === false) {
    // Add the attribute
    $link->setAttribute("rel", "nofollow");
  }
}
// Save the html
$output = $dom->saveHTML();
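
If a plain substring check is too loose (for example when internal links are relative), one possible refinement is to compare hosts with parse_url(). The following is only a sketch under a few assumptions: the function name addNofollowToExternalLinks and the $myHost parameter are made up for illustration, hrefs that start with "www." (as in the question) are treated as absolute URLs, and any link without a host is treated as internal. It also turns on libxml's internal error handling so that malformed markup does not print the htmlParseEntityRef warning.

```php
<?php
// Hypothetical helper (not part of the original answer): adds rel="nofollow"
// to every <a> whose href points at a host other than $myHost.
function addNofollowToExternalLinks(string $html, string $myHost): string
{
    // Collect libxml parse warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = trim($link->getAttribute('href'));

        // Assumption: a bare "www.example.com" href is meant as an absolute URL.
        // parse_url() needs a scheme to recognise the host part.
        $candidate = (stripos($href, 'www.') === 0) ? 'http://' . $href : $href;
        $host = parse_url($candidate, PHP_URL_HOST);

        // No host: relative link, fragment, mailto:, etc. Treated as internal here.
        if (!is_string($host)) {
            continue;
        }

        // Host names are case-insensitive
        if (strcasecmp($host, $myHost) !== 0) {
            $link->setAttribute('rel', 'nofollow');
        }
    }

    return $dom->saveHTML();
}

// Usage with the string from the question
$str = "<a href='www.linktoothersite.com'>I swear this isn't spam!</a><br>"
     . "<a href='www.mywebsite.com'>Hello World</a>";
$output = addNofollowToExternalLinks($str, 'www.mywebsite.com');
```

Note that saveHTML() returns a whole document, so the result is wrapped in a doctype plus html and body tags, and attribute values come back in double quotes. This exact-host comparison also treats subdomains as external, which may or may not be what you want.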