How can I escape all code within <code></code> tags to allow people to post code?

What I want to do is let users post code when they need to, so that it is displayed as text rather than rendered. For example:

<span>
<div id="hkhsdfhu"></div>
</span>
<h1>Hello</h1>

should become:

&lt;span&gt;
&lt;div id="hkhsdfhu"&gt;&lt;/div&gt;
&lt;/span&gt;
&lt;h1&gt;Hello&lt;/h1&gt;

but only when it is wrapped in <code></code> tags. Right now I use the following function to allow only certain HTML tags and escape every other tag:

function allowedHtml($str) {
  $allowed_tags = array("b", "strong", "i", "em");
  $sans_tags = str_replace(array("<", ">"), array("&lt;","&gt;"), $str);
  $regex = sprintf("~&lt;(/)?(%s)&gt;~", implode("|",$allowed_tags));
  $with_allowed = preg_replace($regex, "<\\1\\2>", $sans_tags);
  return $with_allowed;
}

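However, if a user wraps their code in <code></code> tags and it contains any of the tags allowed by the function above, those tags are rendered instead of being escaped. How can I make sure that everything inside <code></code> tags is escaped (or at least that < and > are converted to &lt; and &gt;)? I know about htmlentities(), but I don't want to apply it to the whole post, only to the content inside <code></code> tags.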

Thanks in advance!

Just use a single preg_replace() call with the e modifier to run htmlentities() on everything inside the <code> tags.

Edit

function allowedHtml($str) {
  $str = htmlentities($str, ENT_QUOTES, "UTF-8");
  $allowed_tags = array("b", "strong", "i", "em", "code");
  foreach ($allowed_tags as $tag) {
    $str = preg_replace("#&lt;" . $tag . "&gt;(.*?)&lt;/" . $tag . "&gt;#i", "<" . $tag . ">$1</" . $tag . ">", $str);
  }
  return $str;
}
$reply = allowedHtml($_POST['reply']);
$reply = preg_replace("#\<code\>(.+?)\</code\>#e", "'<code>'.htmlentities('$1', ENT_QUOTES, 'UTF-8').'</code>'", $reply);
$reply = str_replace("&amp;", "&", $reply);

Rewrite your allowedHtml() function as shown above and add a str_replace() at the end.

It's tested and should now work perfectly :)
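The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the snippet above only runs on older versions. The following is a minimal sketch of the same goal without it: split the post on <code>...</code> blocks, escape the blocks, and run the whitelist only on the text in between, so allowed tags inside code are never restored. renderPost() is a hypothetical name, and allowedHtml() is the asker's original function repeated here so the block is self-contained.

```php
<?php
// The asker's whitelist function: escape every tag, then restore the allowed ones.
function allowedHtml(string $str): string
{
    $allowed_tags = array("b", "strong", "i", "em");
    $sans_tags = str_replace(array("<", ">"), array("&lt;", "&gt;"), $str);
    $regex = sprintf("~&lt;(/)?(%s)&gt;~", implode("|", $allowed_tags));
    return preg_replace($regex, "<\\1\\2>", $sans_tags);
}

// Hypothetical helper (not from the thread): treats <code> blocks and the
// surrounding text separately.
function renderPost(string $post): string
{
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, odd indexes hold
    // the <code>...</code> blocks and even indexes hold the text around them.
    $parts = preg_split('#(<code>.*?</code>)#is', $post, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Strip the 6-character opening tag and the 7-character closing tag.
            $inner = substr($part, 6, -7);
            $out .= '<code>' . htmlspecialchars($inner, ENT_QUOTES, 'UTF-8') . '</code>';
        } else {
            $out .= allowedHtml($part);
        }
    }
    return $out;
}

echo renderPost('<b>hi</b> <code><b>x</b></code> <u>no</u>');
// <b>hi</b> <code>&lt;b&gt;x&lt;/b&gt;</code> &lt;u&gt;no&lt;/u&gt;
```

The sketch assumes <code> tags are written without attributes and are not nested.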

Update - new solution

function convertHtml($reply, $revert = false) {
  $specials = array("**", "*", "_", "-");
  $tags = array("b", "i", "u", "s");
  foreach ($tags as $key => $tag) {
    $open = "<" . $tag . ">";
    $close = "</" . $tag . ">";
    if ($revert == true) {
      $special = $specials[$key];
      $reply = preg_replace("#" . $open . "(.+?)" . $close . "#i", $special . "$1" . $special, $reply);
    }
    else {
      $special = str_replace("*", "\\*", $specials[$key]);
      $reply = preg_replace("#" . $special . "(.+?)" . $special . "#i", $open . "$1" . $close, $reply);
    }
  }
  return $reply;
}
$reply = htmlentities($reply, ENT_QUOTES, "UTF-8");
$reply = convertHtml($reply);
$reply = preg_replace("#[^\S\r\n]{4}(.+?)(?!.+)#i", "<pre><code>$1</code></pre>", $reply);
$reply = preg_replace("#\</code\>\</pre\>(\s*)\<pre\>\<code\>#i", "$1", $reply);
$reply = nl2br($reply);
$reply = preg_replace("#\<pre\>\<code\>(.*?)\</code\>\</pre\>#se", "'<pre><code>'.convertHtml(str_replace('<br />', '', '$1'), true).'</code></pre>'", $reply);

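Another solution came up in discussion, and the code above implements it. It works much like the HTML conversion on Stack Overflow: ** becomes bold, * becomes italic, _ becomes underline, and - is strikethrough. In addition, every line that starts with 4 or more spaces is output as code.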
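The "4 or more spaces means code" rule can also be sketched line by line, without the e modifier. This is only an illustration of the rule, not the answer's implementation: indentedLinesToCode() is a hypothetical name, the inline markers (**, *, _, -) are left out, and the text is escaped up front so nothing inside a code block can render.

```php
<?php
// Hypothetical helper: wraps consecutive lines indented by 4+ spaces
// in a single <pre><code> block.
function indentedLinesToCode(string $text): string
{
    // Normalise line endings, then escape everything first.
    $escaped = htmlspecialchars(str_replace("\r\n", "\n", $text), ENT_QUOTES, 'UTF-8');

    $out = [];
    $buffer = []; // consecutive indented lines collected so far
    foreach (explode("\n", $escaped) as $line) {
        if (preg_match('/^ {4}(.*)$/', $line, $m)) {
            $buffer[] = $m[1]; // drop the 4-space marker, keep the rest
            continue;
        }
        if ($buffer !== []) {
            $out[] = '<pre><code>' . implode("\n", $buffer) . '</code></pre>';
            $buffer = [];
        }
        $out[] = $line;
    }
    if ($buffer !== []) {
        // Flush a code block that runs to the end of the text.
        $out[] = '<pre><code>' . implode("\n", $buffer) . '</code></pre>';
    }
    return implode("\n", $out);
}

echo indentedLinesToCode("Example:\n    <h1>Hello</h1>\n    <b>x</b>\nDone");
```

Collecting adjacent indented lines into one buffer replaces the second preg_replace() in the answer, which merges neighbouring <pre><code> blocks after the fact.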

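I think you would be better off working with the DOM directly instead of using regular expressions to parse out the allowed tags. For example, to walk the DOM and escape the contents of a <code> tag, you could do something like this: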

$doc = new DOMDocument();
$doc->loadHTML($postHtml);
$codeNode = $doc->getElementsByTagName('code')->item(0);
$escapedCode = htmlspecialchars($codeNode->nodeValue);
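Note that nodeValue returns only the text content of the node, so any tags inside <code> are dropped rather than escaped. A hedged sketch that keeps them is shown here: it serializes each child of every <code> element with saveHTML() and escapes the result. escapedCodeBlocks() is a hypothetical name, and the output reflects the parser's view of the markup, which may be normalized or restructured compared with what the user typed.

```php
<?php
// Hypothetical helper: returns the escaped inner HTML of every <code> element.
function escapedCodeBlocks(string $postHtml): array
{
    $doc = new DOMDocument();

    // User-submitted markup is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($postHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $blocks = [];
    foreach ($doc->getElementsByTagName('code') as $codeNode) {
        $inner = '';
        foreach ($codeNode->childNodes as $child) {
            // saveHTML($node) serializes a single node, tags included.
            $inner .= $doc->saveHTML($child);
        }
        $blocks[] = htmlspecialchars($inner, ENT_QUOTES, 'UTF-8');
    }
    return $blocks;
}

$blocks = escapedCodeBlocks('<p>Example:</p><code><span>hi</span></code>');
echo $blocks[0];
```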

Here is a way to do it with preg_replace(). Just make sure you call this before your allowedHtml function, so that the tags have already been replaced.

<?php
$post = <<<EOD
I am a person writing a post
How can I write this code?
Example:
<code>
<span>
<div id="hkhsdfhu"></div>
</span>
<h1>Hello</h1>
</code>
Pls help me...
EOD;

$post = preg_replace('/<code>(.*?)<\/code>/ise',
                     "'<code>' . htmlspecialchars('$1') . '</code>'",
                      $post);
var_dump($post);

Result:

string(201) "I am a person writing a post
How can I write this code?
Example:
<code>
&lt;span&gt;
&lt;div id=\&quot;hkhsdfhu\&quot;&gt;&lt;/div&gt;
&lt;/span&gt;
&lt;h1&gt;Hello&lt;/h1&gt;
</code>
Pls help me..."
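The backslashes in front of &quot; in the result come from the e modifier, which escapes quotes in the matched text before evaluating the replacement. That modifier was also removed in PHP 7.0. A minimal sketch of the same replacement with preg_replace_callback() avoids both problems; escapeCodeTags() is a hypothetical name, not part of the answer.

```php
<?php
// Hypothetical helper: escapes everything between <code> and </code>.
function escapeCodeTags(string $post): string
{
    $result = preg_replace_callback(
        '/<code>(.*?)<\/code>/is',
        static function (array $m): string {
            // $m[1] is the raw text between the tags, with no added slashes.
            return '<code>' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '</code>';
        },
        $post
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $post;
}

echo escapeCodeTags('a <code><div id="x"></div></code> b');
// a <code>&lt;div id=&quot;x&quot;&gt;&lt;/div&gt;</code> b
```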

Here's one.

$str = preg_replace_callback('/(?<=<code>)(.*?)(?=<\/code>)/si','escape_code',$str);
function escape_code($matches) {
    $tags = array('b','strong','i','em');
    // declare the tags in this array
    $allowed = implode('|',$tags);
    $match = htmlentities($matches[0],ENT_NOQUOTES,'UTF-8');
    return preg_replace('~&lt;(/)?('.$allowed.')(\s*/)?&gt;~i','<$1$2$3>',$match);
}